High‐Speed Key Access with NoSQL

By Adam Fowler

Key‐value stores in NoSQL are all about speed. You can use various techniques to maximize that speed, from caching data, to keeping multiple copies of data, to using the most appropriate storage structures.

Caching data in memory

Because data is easily accessed when it’s stored in random access memory (RAM), choosing a key‐value store that caches data in RAM can significantly speed up your access to data, albeit at the price of higher server costs.

Often, though, this tradeoff is worth making. You can easily calculate what percentage of your stored data is requested frequently. If you know five percent is generally requested every few minutes, then take five percent of your data size and add that amount of spare RAM across your database servers.
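As a rough illustration of that calculation, the following Python sketch works out the extra RAM each server would need. The function name and the figures (2,000 GB of data on four servers) are hypothetical, and the sketch assumes the frequently requested data is spread evenly across the servers, which may not hold in practice.

```python
def cache_ram_per_server_gb(total_data_gb, hot_fraction, server_count):
    """Return the extra RAM (in GB) each server needs to hold the hot data."""
    if server_count < 1:
        raise ValueError("server_count must be at least 1")
    if not 0 <= hot_fraction <= 1:
        raise ValueError("hot_fraction must be between 0 and 1")
    hot_data_gb = total_data_gb * hot_fraction
    # Assumption: hot data is spread evenly across all servers.
    return hot_data_gb / server_count


# Hypothetical cluster: 2,000 GB of data, 5 percent hot, 4 servers.
extra_gb = cache_ram_per_server_gb(2000, 0.05, 4)
print(f"Add about {extra_gb:.0f} GB of RAM per server for caching")
```

The result is only the cache allowance; it comes on top of whatever the servers already need.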

Bear in mind that the operating system, other applications, and the database server have memory requirements, too.

Replicating data to slaves

In key‐value stores, a particular key is stored on one of the servers in the cluster, a scheme called key partitioning. As a result, if that key is requested constantly, the node holding it receives the bulk of the requests. That node therefore responds more slowly than the cluster average, potentially affecting the quality of service to your users.
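To make key partitioning concrete, here is a minimal sketch in Python, assuming the store picks a server by hashing the key and taking the remainder. Real products use their own partitioning schemes; the function name, the key names, and the three-server cluster are illustrative assumptions.

```python
import hashlib
from collections import Counter


def server_for_key(key, server_count):
    """Map a key to a server index using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % server_count


# Hypothetical workload: one hot key receives 90 of 100 requests.
requests = ["user:1"] * 90 + [f"user:{n}" for n in range(2, 12)]
load = Counter(server_for_key(key, 3) for key in requests)
hot_server = server_for_key("user:1", 3)

print(f"server {hot_server} handled {load[hot_server]} of {len(requests)} requests")
```

Because a key always hashes to the same server, every request for the hot key lands on one node, however many servers the cluster has.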

To avoid this situation, some key‐value stores support adding read‐only replicas, also referred to as slaves. Redis, Riak, and Aerospike are good examples. Replication allows the key to be stored multiple times across several servers, which increases response speed but at the cost of more hardware.

Some key‐value stores guarantee that the replicas of the key will always have the same value as the master. This guarantee is called being fully consistent. If an update happens on the master server holding the key, all the replicas are guaranteed to be up to date.

Not all key‐value stores make this guarantee (Riak, for example, is eventually consistent by default), so if it’s important to be up to date to the millisecond, then choose a database whose replicas are fully consistent (such as Aerospike).
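The difference between the two behaviors can be sketched with a toy in-memory model in Python. This is not how any particular product implements replication; the class names and the synchronous flag are assumptions made for illustration.

```python
class Replica:
    """Read-only copy of the master's data."""

    def __init__(self):
        self.data = {}

    def get(self, key):
        return self.data.get(key)


class Master:
    def __init__(self, replicas, synchronous):
        self.data = {}
        self.replicas = replicas
        self.synchronous = synchronous
        self.pending = []  # updates not yet sent to the replicas

    def put(self, key, value):
        self.data[key] = value
        if self.synchronous:
            # Fully consistent: replicas are updated before put() returns.
            for replica in self.replicas:
                replica.data[key] = value
        else:
            # Not fully consistent: replicas catch up later.
            self.pending.append((key, value))

    def flush(self):
        """Deliver any queued updates to the replicas."""
        for key, value in self.pending:
            for replica in self.replicas:
                replica.data[key] = value
        self.pending.clear()


consistent_replica = Replica()
Master([consistent_replica], synchronous=True).put("price", 10)
consistent_read = consistent_replica.get("price")  # 10, visible immediately

lagging_replica = Replica()
lagging_master = Master([lagging_replica], synchronous=False)
lagging_master.put("price", 10)
stale_read = lagging_replica.get("price")  # None until the update arrives
lagging_master.flush()
fresh_read = lagging_replica.get("price")  # 10 after replication catches up
```

In the second case, a client reading from the replica between put() and flush() sees out-of-date data, which is the window a fully consistent store closes.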

Data modeling in key‐value stores

Many key‐value stores support only basic structures for their value types, leaving the application programmer with the job of interpreting the data. Simple data type support typically includes strings, integers, JSON, and binary values.

For many use cases, this works well, but sometimes a slightly more granular access to data is useful. Redis, for example, supports the following data value types:

  • String

  • List

  • Set

  • Sorted set

  • Hash

  • Bit arrays

  • HyperLogLog

Sorted sets can be queried for matching ranges of values — much like querying an index of values sorted by date, which is very useful for searching for a subset of typed data.
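The range query described above can be sketched in plain Python. This is a toy model of a sorted set, not Redis itself (in Redis, the equivalent query is issued with the ZRANGEBYSCORE command); the class, the order keys, and the use of dates written as YYYYMMDD integers for scores are assumptions for illustration.

```python
import bisect


class SortedSet:
    """Toy sorted set: unique members ordered by a numeric score."""

    def __init__(self):
        self._entries = []  # sorted list of (score, member) pairs
        self._scores = {}   # member -> current score

    def add(self, member, score):
        if member in self._scores:
            # Remove the old entry so each member appears only once.
            old_entry = (self._scores[member], member)
            del self._entries[bisect.bisect_left(self._entries, old_entry)]
        self._scores[member] = score
        bisect.insort(self._entries, (score, member))

    def range_by_score(self, low, high):
        """Return members whose score is between low and high, inclusive."""
        scores = [score for score, _ in self._entries]
        start = bisect.bisect_left(scores, low)
        end = bisect.bisect_right(scores, high)
        return [member for _, member in self._entries[start:end]]


orders = SortedSet()
orders.add("order:1", 20240105)
orders.add("order:2", 20240212)
orders.add("order:3", 20240220)
orders.add("order:4", 20240301)

february = orders.range_by_score(20240201, 20240229)
print(february)  # ['order:2', 'order:3']
```

Keeping the entries sorted by score is what lets a range be located with two binary searches rather than a scan of every member.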

Operating on data

Redis includes operations to increment and decrement key values directly, without having to do a read‐modify‐update (RMU) set of steps. Redis performs each of these operations atomically, as a single step, which ensures that no other application changes the value during an update. These data‐type‐specific operations also include adding items to, and removing items from, lists and sets.
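To show why a direct increment is safer than separate read and write steps, here is a small in-memory model in Python. It is a sketch of the idea only, not Redis code (in Redis, the corresponding commands are INCR and DECR); the Store class and the "visits" key are hypothetical.

```python
import threading


class Store:
    """Toy key-value store with plain get/set and an atomic increment."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def get(self, key):
        return self._data.get(key, 0)

    def set(self, key, value):
        self._data[key] = value

    def incr(self, key, amount=1):
        # Read, add, and write happen under one lock, so no other
        # caller can slip in between the read and the write.
        with self._lock:
            self._data[key] = self._data.get(key, 0) + amount
            return self._data[key]


store = Store()
store.set("visits", 10)

# Read-modify-update from two clients, interleaved: one update is lost.
seen_by_a = store.get("visits")
seen_by_b = store.get("visits")
store.set("visits", seen_by_a + 1)
store.set("visits", seen_by_b + 1)
lost_update_result = store.get("visits")  # 11, not the expected 12

# The same two updates as atomic increments.
store.set("visits", 10)
store.incr("visits")
store.incr("visits")
atomic_result = store.get("visits")  # 12
```

With the increment done inside the store, clients never hold a stale copy of the value, so there is nothing to overwrite.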

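You can even provide autocomplete functionality on an application’s user interface by using the Redis ZRANGEBYLEX command. This command retrieves the members of a sorted set that fall within a lexicographical range, which lets you fetch every member that begins with a given string. So, if you were to type “NoSQL for” in the search bar of an application built on Redis, you would see the suggestion “NoSQL For Dummies.” Because the comparison is case‐sensitive, the application needs to store and compare titles in a normalized form, such as lowercase, for that match to work.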
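The prefix lookup behind this kind of autocomplete can be sketched in plain Python with a sorted list and binary search. This models the idea rather than calling Redis; the book titles other than “NoSQL For Dummies,” the function names, and the lowercase normalization are assumptions for illustration.

```python
import bisect


def build_index(titles):
    """Return (sorted lowercase keys, mapping from key to display title)."""
    display = {title.lower(): title for title in titles}
    return sorted(display), display


def complete(index, display, typed):
    """Return the titles whose lowercase form starts with the typed text."""
    prefix = typed.lower()
    start = bisect.bisect_left(index, prefix)
    # The highest code point sorts after any real continuation of the prefix.
    end = bisect.bisect_left(index, prefix + "\U0010ffff")
    return [display[key] for key in index[start:end]]


index, display = build_index([
    "NoSQL For Dummies",
    "NoSQL Distilled",
    "Node.js in Action",
    "SQL Cookbook",
])

suggestions = complete(index, display, "NoSQL for")
print(suggestions)  # ['NoSQL For Dummies']
```

In Redis, the same range is commonly expressed by passing the prefix as the lower bound and the prefix followed by a high byte as the upper bound, for example ZRANGEBYLEX titles "[nosql for" "[nosql for\xff", where titles is a hypothetical sorted set whose members all share one score.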

Evaluating Redis

Redis prides itself on being a very lightweight but blazingly fast key‐value store. It was originally designed to be an in‐memory key‐value store, but now boasts disk‐based data storage.

You can use Redis to safeguard data by enabling AOF (append only file) mode and instructing Redis to force data to disk on each write command (known as forced fsync flushing). AOF does slow down writes, of course, but it provides a higher level of durability for data. Be aware, though, that if you instead configure Redis to flush to disk only once per second, it’s still possible to lose up to one second of commands.
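The idea behind an append-only file can be sketched in a few lines of Python: every command is appended to a log, optionally forced to disk, and replayed after a restart. This is a conceptual model, not Redis’s actual file format; the class, the file name, and the sample commands are hypothetical. In Redis itself, the behavior is controlled by the appendonly and appendfsync settings in redis.conf.

```python
import os
import tempfile


class AppendOnlyLog:
    """Toy append-only command log with optional fsync on every write."""

    def __init__(self, path, fsync_every_write):
        self._file = open(path, "ab")
        self._fsync_every_write = fsync_every_write

    def append(self, command):
        self._file.write(command.encode("utf-8") + b"\n")
        if self._fsync_every_write:
            # Slower, but the command is on disk before we return.
            self._file.flush()
            os.fsync(self._file.fileno())

    def close(self):
        self._file.flush()
        os.fsync(self._file.fileno())
        self._file.close()


def replay(path):
    """Read the logged commands back, as a restart would."""
    with open(path, "rb") as log_file:
        return [line.decode("utf-8").rstrip("\n") for line in log_file]


with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "appendonly.log")
    log = AppendOnlyLog(path, fsync_every_write=True)
    for command in ["SET visits 10", "INCR visits"]:
        log.append(command)
    log.close()
    recovered = replay(path)

print(recovered)  # ['SET visits 10', 'INCR visits']
```

With fsync_every_write set to False, commands written since the last flush sit in memory buffers and can be lost if the server fails, which is the tradeoff between write speed and durability.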

Also, Redis only recently added support for clustering. In fact, at the time of this writing, Redis’s clustering support is in the beta testing phase. Fortunately, Redis uses a shared‐nothing cluster model, with masters for particular keys and slaves that clients never write to directly; only the master accepts client writes and passes them on to its slaves. This shared‐nothing design should make reliable clustering easier for Redis to implement than it is for databases that allow writes to all replicas.
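The write path in this model can be sketched as follows in Python. It is a simplified illustration, not Redis Cluster’s implementation: the classes are hypothetical, and the CRC32-based choice of master is a stand-in for whatever key-to-master mapping the real cluster uses.

```python
import zlib


class ReadOnlyError(Exception):
    """Raised when a client tries to write to a replica."""


class Replica:
    def __init__(self):
        self.data = {}

    def client_write(self, key, value):
        raise ReadOnlyError("replicas accept writes only from their master")


class Master:
    def __init__(self, replica):
        self.data = {}
        self.replica = replica

    def client_write(self, key, value):
        self.data[key] = value
        # Replication from the master, not a write by a client.
        self.replica.data[key] = value


def master_for_key(masters, key):
    """Pick the single master that owns this key (stand-in hashing)."""
    return masters[zlib.crc32(key.encode("utf-8")) % len(masters)]


masters = [Master(Replica()), Master(Replica())]
owner = master_for_key(masters, "visits")
owner.client_write("visits", 10)

try:
    owner.replica.client_write("visits", 99)
    rejected = False
except ReadOnlyError:
    rejected = True
```

Because exactly one node accepts writes for any given key, two nodes never hold conflicting writes for that key that would need to be reconciled.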

If you want a very high‐speed, in‐memory caching layer in front of another database — MongoDB or Riak are commonly used with Redis — then evaluate Redis as an option. As support for clustering and data durability evolves, perhaps Redis can overtake other back‐end databases.
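A caching layer of this kind is often built with a cache-aside pattern: look in the cache first, and go to the back-end database only on a miss. The Python sketch below models that flow with a dictionary standing in for each system; the class name, the sample key, and the hit and miss counters are assumptions for illustration.

```python
class CacheAside:
    """Serve reads from an in-memory cache, loading misses from a database."""

    def __init__(self, load_from_database):
        self._cache = {}
        self._load = load_from_database
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._cache:
            self.hits += 1
            return self._cache[key]
        # Miss: fetch from the slower back end, then remember the value.
        self.misses += 1
        value = self._load(key)
        self._cache[key] = value
        return value

    def invalidate(self, key):
        """Drop a cached value, for example after the database row changes."""
        self._cache.pop(key, None)


database = {"user:1": "Ada"}  # stand-in for the back-end database
cache = CacheAside(database.get)

first = cache.get("user:1")   # miss: loaded from the database
second = cache.get("user:1")  # hit: served from memory
print(first, second, cache.hits, cache.misses)
```

The application has to invalidate or refresh cached entries when the underlying data changes; otherwise, the cache keeps serving the old value.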