Managing User Information with NoSQL - dummies

Managing User Information with NoSQL

By Adam Fowler

There’s mission‐critical data, and there’s supporting data. It’s okay if your mission‐critical data appears a little slowly because you want to be sure it’s safe and properly managed. But you don’t want the supporting data of your application to hinder overall transactions and user experiences.

Although the supporting data may be lower in value, its need to scale up is great — typically by providing delivery of query responses in less than ten milliseconds. Much of this supporting data helps users access a system, tailor a service to their needs, or find other available services or products.

Delivering web advertisements

Although advertisements are critical to companies marketing their wares or services on the web, they aren’t essential to many users’ web‐browsing experiences. However, the loading time of web pages is important to them, and as soon as a slowly delivered ad starts adding to a page’s load time, users start moving to alternative, faster, websites.

Serving advertisements fast is, therefore, a key concern. Doing so isn’t a simple business, though. Which advertisement is shown to which user depends on a very large number of factors, often determined by such factors as the user’s tracked activity online, language, and location.

Companies that target their advertisements to the right customers receive more click‐throughs, and thus more profit. However, the business of targeted advertising is increasingly scientific.

Key‐value stores are used mainly by web advertisement companies. (You can find case studies about such usage on key‐value NoSQL vendors’ websites.) Utilizing their proprietary software, these companies use a combination of factors to determine what a user wants or is interested in so that they can target advertisements to that user effectively.

You can think of this combination of factors as being a key, and it’s this composite key that points to the most compelling advertisement. Everything that is needed to serve the advertisement is kept as the value within a key‐value store.

If you need to serve data fast based on a set of known factors, then a key‐value store is an excellent match. All you need to do is set up the key effectively.

To set up the key, perform some offline analysis of which advertisements will be relevant to each combined profile of people. If the information you have on the visiting user is country, language, and favorite category of purchases on Amazon, then perhaps an appropriate key would be UK‐English‐guitars.

This prevents having to do any complex queries at ad serving time — just instead concatenate these fields together to form a key and ask for the value of that key.

Handling user sessions

You can spend all the money you want on a state‐of‐the‐art datacenter for your transactional data, but if your website is slow, people will say that your entire service is slow. In fact, when companies and governments launch new online services that can’t handle the load placed on them, the press eats them for breakfast.

Typically, the problem isn’t that a primary processing system goes down; rather, it’s because the users’ identities or sessions are handled poorly. Perhaps the username isn’t cached, or every request requires opening a new session from the application server instead of than caching this information between requests.

A user session may track how a user walks through an application, adding data on each page. The data can then be saved at the end of this journey in a single hit to the database, rather than in a sequence of small requests across many page requests. Users often don’t mind waiting a couple of seconds after clicking a save button. Providing an effective user session on a website that has low latency has a couple of benefits:

  • The user (soon to be customer!) receives good service.

  • Partially complete data doesn’t get saved to your main back‐end transactional database.

Websites use a cookie to track the user’s interaction with a website. A cookie is a small file linked to a unique ID, just like a record in a key‐value store. The server uses these cookies to identify that it already knows a user on their second or subsequent requests, so the server needs to fetch a session using this data quickly. In this way, when users log in, the websites recognize who they are, which pages they visit, and what information they’re looking for.

This unique ID is typically a random number, perhaps our old friend, the Universally Unique Identifier (UUID). The website may need to store various types of data. Typically, this data is short‐lived — the length of a user’s session, perhaps just a few minutes.

Key‐value stores are, therefore, ideal for storing and retrieving session data at high speeds. The ability to tombstone (that is delete) data once a timestamp is exceeded is also useful. In this way, the application doesn’t need to check the timestamp of the session on each request — if the session isn’t in the database, it’s been tombstoned. So the session is no longer valid, which removes some of the application programmer’s administrative burden.

Supporting personalization

Similar to the user‐session requirement, but longer‐lived, is the concept of user service personalization. This is where the front‐end application is configured by users for their specific needs.

Again, this is a front‐end secondary type of data, not the primary transactional data within a system. For example, imagine that you have a primary database showing the work levels for all your team, the current case files they’re working on, and all the related data. This is the primary data of the application. Perhaps it’s stored in an Oracle relational database or a MarkLogic NoSQL document database.

Use of the data can vary. For instance, one user may want to view a summary of only his team’s workload, whereas a manager might want to track all employees on a team.

These users are receiving different personalized views of the same data. These view preferences need to be saved somewhere. You probably don’t want to overload your case database with this personalization data; it’s specific to the front‐end application, not the core case‐management system.

Using a key‐value store with a composite key containing user id (not session id) and the service name allows you to store the personalization settings as a value, which makes lookups very quick and prevents the performance of your primary systems from being negatively affected.