Reducing Time to Value in NoSQL

By Adam Fowler

Time to value is the time between starting an IT project and being able to realize a business benefit from it. That benefit can be tangible, such as reduced costs or the ability to transact new business, or intangible, such as better customer service or better products.

Key‐value stores have the simplest data model of any NoSQL database, so you can build applications on them quickly, especially if you apply a few key principles, including reviewing how you manage data structures.

Using simple structures

Key‐value stores are more flexible than relational databases in the format of data they accept. Use this flexibility to maximize your application’s throughput by storing data in the form in which it will be consumed. For example, if you’re storing map tiles, store them in a format that can be rendered immediately in a browser, rather than one that must be converted on every request.

In your application, store easy‐to‐use structures that don’t require much processing time. These structures can be simple intrinsic types like integers, strings, and dates, or more sophisticated structures like lists, sorted sets, or even JSON documents stored as a string.

Because it can be interpreted directly by a JavaScript web application, use JSON for simple web app status or preference storage. If you’re storing log data, store it in the format most appropriate for retrieval and analysis.
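As a minimal sketch of storing preferences as a JSON string, the example below uses a plain Python dict as a stand-in for the key‐value store. The key naming scheme (prefs:<user id>), the function names, and the preference fields are all assumptions made for illustration; a real store would offer its own get and put calls.

```python
import json

# A plain dict stands in for the key-value store (an assumption for
# illustration; a real store would expose comparable get/put calls).
store = {}


def save_preferences(user_id, prefs):
    """Serialize the preferences to a JSON string and store it under one key."""
    store[f"prefs:{user_id}"] = json.dumps(prefs)


def load_preferences(user_id):
    """Fetch the JSON string with a single read and decode it."""
    raw = store.get(f"prefs:{user_id}")
    return json.loads(raw) if raw is not None else {}


# Hypothetical user and preference values.
save_preferences("u42", {"theme": "dark", "page_size": 25})
prefs = load_preferences("u42")
```

The stored value is an ordinary string, so the same text could be handed to a JavaScript front end and parsed there without any conversion on the server.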

Use the structure most appropriate for your application, not the one most convenient for your database administrator. Also consider the effects of time on your database. Will you want to modify data structures in the future to support new features?

Data structures change over time. A flexible JSON document is better than a CSV or fixed‐width data file because its structure can vary over time: code reads properties by name, so adding or removing one property doesn’t disturb the others. Change a column in a CSV file stored in a key‐value store, and you must update all of the application code that reads it! This isn’t the case with a JSON document, where older code simply ignores new properties.
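The difference can be illustrated with a short sketch. The two reader functions below represent older application code written when a record held only a name and a city; the record contents, field names, and function names are hypothetical. A new email field is then added to both formats.

```python
import csv
import io
import json


def read_json_record(raw):
    """Older reader: looks up properties by name and ignores the rest."""
    record = json.loads(raw)
    return record["name"], record["city"]


def read_csv_record(raw):
    """Older reader: assumes column 0 is the name and column 1 is the city."""
    row = next(csv.reader(io.StringIO(raw)))
    return row[0], row[1]


# Hypothetical records before and after an email field is introduced.
old_json = '{"name": "A Fowler", "city": "London"}'
new_json = '{"name": "A Fowler", "email": "a@example.com", "city": "London"}'

old_csv = "A Fowler,London"
new_csv = "A Fowler,a@example.com,London"  # new column inserted in the middle

json_old_result = read_json_record(old_json)
json_new_result = read_json_record(new_json)  # still returns name and city
csv_old_result = read_csv_record(old_csv)
csv_new_result = read_csv_record(new_csv)     # silently returns the email as the city
```

The JSON reader keeps working unchanged, while the positional CSV reader returns the wrong value without raising any error, which is why every piece of code that reads the CSV has to be revisited.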

Complex structure handling

If you have complex interrelated data sets, give careful thought to the data structures in your key‐value store. Store data sets in a way that allows easy retrieval. Rather than store eight items separately that will require eight reads, denormalize the data — write the data to the same record at ingestion time — so that only one read is needed later.

This does mean some data will be stored multiple times. An example is storing the customer name in an order document. Although the customer name is then repeated across many orders, showing an order summary no longer requires discovering that the value customer_number=12 means Mr A Fowler, which saves an additional read request.
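The saving can be sketched with a small in‐memory store that counts reads. Everything here is illustrative: the CountingStore class, the key names, the order number, and the order total are assumptions, and only customer_number=12 and the customer name come from the example above.

```python
import json


class CountingStore:
    """Minimal in-memory key-value store that counts reads (illustrative only)."""

    def __init__(self):
        self._data = {}
        self.reads = 0

    def put(self, key, value):
        self._data[key] = json.dumps(value)

    def get(self, key):
        self.reads += 1
        return json.loads(self._data[key])


store = CountingStore()
store.put("customer:12", {"name": "Mr A Fowler"})

# Normalized order: holds only a reference to the customer.
store.put("order:1001:normalized", {"customer_number": 12, "total": 59.90})

# Denormalized order: the customer name is copied in at ingestion time.
store.put("order:1001", {"customer_number": 12,
                         "customer_name": "Mr A Fowler",
                         "total": 59.90})


def summary_normalized(order_key):
    order = store.get(order_key)                                  # read 1
    customer = store.get(f"customer:{order['customer_number']}")  # read 2
    return f"{customer['name']}: {order['total']:.2f}"


def summary_denormalized(order_key):
    order = store.get(order_key)                                  # single read
    return f"{order['customer_name']}: {order['total']:.2f}"


store.reads = 0
normalized_summary = summary_normalized("order:1001:normalized")
normalized_reads = store.reads

store.reads = 0
denormalized_summary = summary_denormalized("order:1001")
denormalized_reads = store.reads
```

Both functions produce the same summary text, but the normalized version needs two reads and the denormalized version needs one. The cost is that the customer name is now stored in every order record.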

Denormalization consumes more disk space than relational databases’ normal form, but greatly increases query throughput. It’s the NoSQL equivalent of a materialized view in a relational database. You’re sacrificing storage space for speed — the classic computer science tradeoff.

For computer scientists of a certain generation, it’s considered heresy to keep multiple copies of the same data. It’s simply inefficient. Relational database lecturers would eat you for breakfast!

However, with the current low cost of storage and the increasing demands of modern applications, it’s much better to sacrifice storage for speed in reading data. So, consider denormalization as a friend.