Using Pluggable Storage with NoSQL

By Adam Fowler

There are times when you want to provide key‐value style, high‐speed access to data held in another database. This database could be, for example, an embedded store such as Berkeley DB (Java Edition for Voldemort) or a relational database such as MySQL.

Providing key‐value‐like access to this data requires a key‐value store to be layered directly over one of these other databases. Basically, you use another database as the storage layer, rather than a combination of a file system for storage and an ingestion pipeline for copying data from a relational database.

This approach makes it simpler to provide a high‐speed key‐value store while still using a traditional database for storage.

Changing storage engines

Different workloads require different storage engines and performance characteristics. Aerospike is great for high ingest; Redis is great for high numbers of reads. Each is built around a specific use case.

Voldemort takes a different approach. Rather than treating the key‐value store as a separate tier of data management, Voldemort treats the key‐value store as an API and adds an in‐memory caching layer, which means that you can plug into the back end that makes the most sense for your particular needs.

If you want a straightforward disk storage tier, you can use the Berkeley DB Java Edition storage engine. If instead you want to store relational data, you can use MySQL as a back‐end to Voldemort.
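To make the idea of a pluggable back end concrete, here is a minimal sketch in Python. It is not Voldemort's actual API: the names StorageEngine, DictEngine, SqlEngine, and KeyValueStore are hypothetical, and SQLite stands in for MySQL only so that the example runs without a database server.

```python
import sqlite3
from typing import Optional, Protocol


class StorageEngine(Protocol):
    """Hypothetical back-end contract: get and put bytes by key."""

    def get(self, key: str) -> Optional[bytes]: ...
    def put(self, key: str, value: bytes) -> None: ...


class DictEngine:
    """Simple in-process engine, standing in for an embedded store."""

    def __init__(self) -> None:
        self._data: dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value


class SqlEngine:
    """Relational engine; SQLite is an assumption standing in for MySQL."""

    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v BLOB NOT NULL)"
        )

    def get(self, key: str) -> Optional[bytes]:
        row = self._conn.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
        return None if row is None else bytes(row[0])

    def put(self, key: str, value: bytes) -> None:
        with self._conn:  # commits on success, rolls back on error
            self._conn.execute(
                "INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value)
            )


class KeyValueStore:
    """The API callers see; the engine behind it is chosen at construction."""

    def __init__(self, engine: StorageEngine) -> None:
        self._engine = engine

    def get(self, key: str) -> Optional[bytes]:
        return self._engine.get(key)

    def put(self, key: str, value: bytes) -> None:
        self._engine.put(key, value)
```

Calling code only ever sees get and put, so swapping KeyValueStore(DictEngine()) for KeyValueStore(SqlEngine()) changes where the data lives without changing the application.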

This capability combined with custom data types allows you to use a key‐value store’s simple store/retrieve API to effectively pull back and directly cache information in a different back‐end store.

This approach contrasts with the usual approach of having separate databases: one in, say, Oracle for transactional data and another in your key‐value store (Riak, for example). With this two‐tier approach, you have to develop code to move data from one tier to the other for caching. With Voldemort, there is one combined tier (your data tier), so that extra code is unnecessary.

Caching data in memory

Voldemort has a built‐in in‐memory cache, which decreases the load on the storage engine and increases query performance. You don’t need a separate caching layer, such as Redis or Oracle’s Coherence Java application data caching product, on top.

The capability to provide high‐speed storage tiering with caching is why LinkedIn uses Voldemort for certain high‐performance use cases.

With Voldemort, you get the best of both worlds: a storage engine for your exact data requirements and a high‐speed in‐memory cache to reduce the load on that engine. You also get simple key‐value store/retrieve semantics on top of your storage engine.
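The effect of putting a cache in front of a storage engine can be sketched as follows. This is an illustration, not Voldemort's implementation: the CachedStore class is hypothetical, and the least-recently-used eviction policy and the write-through behavior are assumptions, because the policy Voldemort actually uses isn't described here.

```python
from collections import OrderedDict
from typing import Callable, Optional


class CachedStore:
    """Read-through, write-through LRU cache in front of a slower back end."""

    def __init__(
        self,
        load: Callable[[str], Optional[bytes]],
        save: Callable[[str, bytes], None],
        capacity: int = 1024,
    ) -> None:
        self._load = load          # reads one key from the storage engine
        self._save = save          # writes one key to the storage engine
        self._capacity = capacity  # maximum number of cached entries
        self._cache: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, key: str) -> Optional[bytes]:
        if key in self._cache:
            self._cache.move_to_end(key)  # mark as most recently used
            return self._cache[key]
        value = self._load(key)           # cache miss: ask the engine
        if value is not None:
            self._remember(key, value)
        return value

    def put(self, key: str, value: bytes) -> None:
        self._save(key, value)            # write through to the engine first
        self._remember(key, value)

    def _remember(self, key: str, value: bytes) -> None:
        self._cache[key] = value
        self._cache.move_to_end(key)
        if len(self._cache) > self._capacity:
            self._cache.popitem(last=False)  # evict least recently used
```

Repeated reads of a hot key are served from memory, so the storage engine is consulted only on the first read or after the key has been evicted.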

Evaluating Voldemort

In the Harry Potter books, Lord Voldemort held a lot of magic in him, both good and bad, although he used it for terrorizing muggles. The Voldemort database, as it turns out, can also store vast amounts of data, but can be used for good by data magicians everywhere!

Voldemort is still a product in development. Many pieces are still missing, so it doesn’t support the variety of storage engines you might expect. This narrow range of engines is likely because Voldemort is built in the Java programming language, so integrating with most C or C++ based databases requires a Java Native Interface (JNI) connector to be built.

Voldemort has good integration with serialization frameworks, though. Supported frameworks include Java serialization, Avro, Thrift, and Protocol Buffers. This means that the provided API wrappers match the familiar serialization method of each programming language, making the development of applications intuitive.
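The way a serialization framework slots in between your objects and the stored bytes can be sketched like this. The SerializingStore class and json_store helper are hypothetical, and JSON is used only as a stand-in available in Python's standard library; it is not one of the frameworks listed above.

```python
import json
from typing import Any, Callable, Dict, Optional


class SerializingStore:
    """Key-value store that converts values to and from bytes at the edge."""

    def __init__(
        self,
        to_bytes: Callable[[Any], bytes],
        from_bytes: Callable[[bytes], Any],
    ) -> None:
        self._to_bytes = to_bytes
        self._from_bytes = from_bytes
        # A plain dict stands in for the real storage engine in this sketch.
        self._raw: Dict[bytes, bytes] = {}

    def put(self, key: str, value: Any) -> None:
        self._raw[key.encode("utf-8")] = self._to_bytes(value)

    def get(self, key: str) -> Optional[Any]:
        raw = self._raw.get(key.encode("utf-8"))
        return None if raw is None else self._from_bytes(raw)


def json_store() -> SerializingStore:
    """Build a store whose values are serialized as UTF-8 JSON."""
    return SerializingStore(
        to_bytes=lambda obj: json.dumps(obj, sort_keys=True).encode("utf-8"),
        from_bytes=lambda raw: json.loads(raw.decode("utf-8")),
    )
```

Because the serializer is passed in, the application works with ordinary objects while the storage engine only ever sees bytes.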

Voldemort doesn’t handle consistency as well as other systems do. Voldemort uses the read repair approach, where replicas holding inconsistent versions of the same record are detected and fixed at read time, rather than being kept consistent at write time.
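Read repair can be sketched in a few lines. This is a simplified illustration rather than Voldemort's code: the Versioned, Replica, write, and read_with_repair names are hypothetical, and a single integer version number is an assumption made for brevity (Voldemort itself tracks versions with vector clocks, which can also detect concurrent writes).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Versioned:
    version: int   # simplified: one counter instead of a vector clock
    value: bytes


class Replica:
    """One copy of the data; may fall behind the others."""

    def __init__(self) -> None:
        self.data: Dict[str, Versioned] = {}


def write(replicas: List[Replica], key: str, record: Versioned) -> None:
    """Write only to the replicas given; any others are left stale."""
    for replica in replicas:
        replica.data[key] = record


def read_with_repair(replicas: List[Replica], key: str) -> Optional[Versioned]:
    """Return the newest version and push it to replicas that lag behind."""
    found = [replica.data.get(key) for replica in replicas]
    present = [record for record in found if record is not None]
    if not present:
        return None
    newest = max(present, key=lambda record: record.version)
    for replica, seen in zip(replicas, found):
        if seen is None or seen.version < newest.version:
            replica.data[key] = newest  # the repair happens at read time
    return newest
```

Note the trade-off: a stale replica stays stale until somebody reads that key, which is why this approach is weaker than keeping replicas consistent at write time.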

There is also no secondary indexing or query support; Voldemort expects you to use the facilities of the underlying storage engine to cope with that use case. Also, Voldemort doesn’t have native database triggers or an alerting or event processing framework with which to build one.

If you do need a key‐value store that is highly available, is partition‐tolerant, runs in Java, and uses different storage back ends, then Voldemort may be for you.