Taking Advantage of Flash Storage in NoSQL

By Adam Fowler

When you need incredibly fast writes, flash storage is called for (as opposed to calling for Flash Gordon). The fastest writes land in RAM, but that comes at the cost of RAM space, of course: writing to RAM will get you, well, about as far as the size of your RAM. So having a very high‐speed storage option immediately behind your server’s RAM is a good idea.

This way, when a checkpoint operation flushes the data to persistent storage, the flush finishes quickly and space in RAM is cleared as soon as possible.

Spending money for speed

Flash is expensive: more so per gigabyte than traditional spinning disk, though less so than RAM. It’s possible to make do without flash by using RAID 10 spinning disk arrays, but these will get you only so far.

A logical approach is to look at how fast data streams into your database and size each storage tier accordingly: perhaps 100 percent of the size of your stored data on spinning disk, 10 percent on flash, and 1 percent in RAM. These figures will vary depending on your application’s data access profile and how often the same data is accessed.
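As a rough illustration of that rule of thumb, the short Python sketch below turns a stored data size into per‐tier capacities. The function name `tier_sizes` and its parameters are made up for this example, and the 10 and 1 percent defaults are only the starting figures mentioned above, not a recommendation for any particular workload.

```python
def tier_sizes(stored_gb, flash_percent=10, ram_percent=1):
    """Return the capacity to provision per tier, in GB.

    Spinning disk holds 100 percent of the stored data; flash and RAM
    hold the given percentages of it (assumed starting points only).
    """
    return {
        "disk_gb": stored_gb,
        "flash_gb": stored_gb * flash_percent / 100,
        "ram_gb": stored_gb * ram_percent / 100,
    }


# Example: 2,000 GB of stored data.
sizes = tier_sizes(2000)
print(sizes)
```

If your data is read again soon after it is written, you would raise the flash and RAM percentages; if most data is written once and rarely touched, you could lower them.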

Of course, if you’re in an industry where data ages quickly and you absolutely need to guarantee write throughput, then an expensive all‐flash infrastructure could be for you.

To give you an idea about the possible scale achievable in a key‐value store that supports native flash, Aerospike claims that, with native flash for data and RAM for indexes, 99.9 percent of reads and writes are completed within one millisecond.
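If you want to check a claim like this against your own measurements, one simple way is to count how many recorded latencies fall within the threshold. The sketch below assumes you already have a list of latencies in milliseconds; the function name `meets_latency_target` and the sample numbers are invented for illustration and are not Aerospike figures.

```python
def meets_latency_target(latencies_ms, threshold_ms=1.0, target_permille=999):
    """Return True if at least target_permille out of every 1,000
    operations completed within threshold_ms.

    The comparison uses integers to avoid floating-point rounding.
    """
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    within = sum(1 for ms in latencies_ms if ms <= threshold_ms)
    return within * 1000 >= target_permille * len(latencies_ms)


# Made-up sample: 9,990 fast operations and 10 slow ones.
samples = [0.2] * 9990 + [3.5] * 10
ok = meets_latency_target(samples)
print(ok)
```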

Context computing

Aerospike espouses a concept called context‐aware computing. Context‐aware computing is where you have a very short window of time to respond to a request, and the correct response is dictated by some properties of the user, such as age or products purchased. These properties could include:

  • Identity: Session IDs, cookies, IP addresses

  • Attributes: Demographic or geographic

  • Behavior: Presence (swipe, search, share), channels (web, phone), services (frequency, sophistication)

  • Segments: Attitudes, values, lifestyle, history

  • Transactions: Payments, campaigns

The general idea is to mine data from a transactional system to determine the most appropriate advertisement or recommendation for a customer based on various factors. You can do so, for example, by running a Hadoop Map/Reduce job over data from a transactional Oracle relational database.

The outputs are then stored in Aerospike so that when a particular customer arrives on your website and they have a mixture of the preceding list of factors (modeled as a composite key), the appropriate advertisement or recommendation is immediately given to the customer.
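The lookup pattern described here can be sketched in a few lines of Python. This is only an illustration: a plain dictionary stands in for the key‐value store, and the factor names (`segment`, `age_band`, `channel`), the `|` separator, and all function names are assumptions made for the example rather than anything prescribed by Aerospike.

```python
# A plain dict stands in for the key-value store in this sketch.
recommendations = {}


def composite_key(segment, age_band, channel):
    """Join the chosen user factors into a single lookup key."""
    return f"{segment}|{age_band}|{channel}"


def store_batch_output(rows):
    """Load precomputed results (for example, from a batch job)."""
    for row in rows:
        key = composite_key(row["segment"], row["age_band"], row["channel"])
        recommendations[key] = row["ad"]


def recommend(segment, age_band, channel, default="generic-offer"):
    """Single key lookup at request time, with a fallback."""
    return recommendations.get(composite_key(segment, age_band, channel), default)


store_batch_output([
    {"segment": "outdoors", "age_band": "25-34", "channel": "web", "ad": "tent-sale"},
    {"segment": "outdoors", "age_band": "25-34", "channel": "phone", "ad": "trail-app"},
])

choice = recommend("outdoors", "25-34", "web")
fallback = recommend("gaming", "18-24", "web")
print(choice, fallback)
```

The expensive analysis happens ahead of time; the request path is one key lookup, which is what makes the short response window achievable.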

Evaluating Aerospike

Aerospike is the king of flash support. Rather than use the operating system’s file system support on top of flash, as other databases do (that is, they basically treat a flash disk as any other hard disk), Aerospike natively accesses the flash.

This behavior provides Aerospike with maximum throughput, because it doesn’t have to wait for operating system function calls to be completed; it simply accesses the raw flash blocks directly. Moreover, Aerospike can take advantage of the physical attributes of flash storage in order to eke out every last bit of performance.

Aerospike is starting to overtake Riak in large enterprises and mission‐critical use cases. It has enterprise‐level features lacking in other databases, including the following:

  • Full ACID consistency: Ensures data is safe and consistent.

  • Shared‐nothing cluster: Has synchronous replication to keep data consistent.

  • Automatic rebalancing: Automatically moves some data to new nodes, evening out read times and allowing for scale out and scale back in a cluster.

  • Support for UDFs and Hadoop: User defined functions can run next to the data for aggregation queries, and Hadoop Map/Reduce is supported for more complex requirements.

  • Secondary indexes: Adds indexes on data value fields for fast querying.

  • Large data types: Supports custom and large data types; allows for complex data models and use cases.

  • Automatic storage tier flushing on writes: Flushes RAM to flash storage (SSDs) and disk when space on the faster tier is nearly exhausted.
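To make the last item in the list more concrete, here is a toy Python sketch of flushing from a fast tier to a slower one when the fast tier is nearly full. It is not how Aerospike implements tiering: the class `TieredStore`, the 80 percent high‐water mark, and the choice to flush the oldest entries down to half capacity are all assumptions made for illustration, and dictionaries stand in for RAM and flash.

```python
from collections import OrderedDict


class TieredStore:
    """Toy two-tier store: a bounded 'RAM' tier backed by a 'flash' tier."""

    def __init__(self, ram_capacity, high_water_percent=80):
        self.ram = OrderedDict()   # fast tier, oldest entries first
        self.flash = {}            # slower tier, unbounded in this sketch
        self.ram_capacity = ram_capacity
        self.high_water_percent = high_water_percent

    def put(self, key, value):
        self.ram[key] = value
        self.ram.move_to_end(key)  # mark as most recently written
        # Flush once RAM usage exceeds the high-water mark.
        if len(self.ram) * 100 > self.ram_capacity * self.high_water_percent:
            self._flush()

    def _flush(self):
        # Move the oldest entries to flash until RAM is half full.
        target = self.ram_capacity // 2
        while len(self.ram) > target:
            key, value = self.ram.popitem(last=False)
            self.flash[key] = value

    def get(self, key):
        # RAM always holds the newest copy, so check it first.
        if key in self.ram:
            return self.ram[key]
        return self.flash.get(key)


store = TieredStore(ram_capacity=10)
for i in range(9):
    store.put(f"k{i}", i)

print(len(store.ram), sorted(store.flash))
```

In this run the ninth write pushes the fast tier past its high‐water mark, so the four oldest entries move to the slower tier and reads for them fall through to it.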

Whether or not you need blazing‐fast flash support, these other features should really interest people with mission‐critical use cases. If you’re evaluating Riak for a mission‐critical system, definitely evaluate Aerospike as well.