Big Data For Dummies

By far, the simplest NoSQL (not-only-SQL) databases in a big data environment are those that use the key-value pair (KVP) model. Unlike RDBMSs, KVP databases do not require a schema, which gives them great flexibility and scalability.

KVP databases generally do not offer full ACID (Atomicity, Consistency, Isolation, Durability) capability. They also require implementers to think about data placement, replication, and fault tolerance, because these are not expressly controlled by the technology itself. KVP databases are not typed, so most of the data is stored as strings.

Key         Value
Color       Blue
Libation    Beer
Hero        Soldier
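
To make the model concrete, here is a minimal in-memory sketch of a key-value store in Python. It is not how any particular product is built, and the class name KeyValueStore is hypothetical. Mirroring the untyped model described above, it converts every value to a string on write, so there is no schema and no type checking.

```python
from __future__ import annotations


class KeyValueStore:
    """A toy, schema-less key-value store (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def put(self, key: str, value: object) -> None:
        # No schema or type check: whatever arrives is stored as its string form.
        self._data[key] = str(value)

    def get(self, key: str, default: str | None = None) -> str | None:
        return self._data.get(key, default)

    def delete(self, key: str) -> None:
        # Deleting a missing key is a no-op rather than an error.
        self._data.pop(key, None)


store = KeyValueStore()
store.put("Color", "Blue")
store.put("Libation", "Beer")
store.put("Hero", "Soldier")
store.put("Answer", 42)           # stored as the string "42"
print(store.get("Color"))         # Blue
print(repr(store.get("Answer")))  # '42'
```

Because the value is only a string, the application, not the database, decides what "42" or "Blue" means.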

This is a very simplified set of keys and values. In a big data implementation, many individuals will have differing ideas about colors, libations, and heroes, so each key must also identify whose opinion it records, as in the following example:

Key                             Value
FacebookUser12345_Color         Red
TwitterUser67890_Color          Brownish
FoursquareUser45678_Libation    “White wine”
Google+User24356_Libation       “Dry martini with a twist”
LinkedInUser87654_Hero          “Top sales performer”

As the number of users increases, keeping track of precise keys and their related values can be challenging. If you need to track the opinions of millions of users, the number of key-value pairs grows with the number of users multiplied by the number of attributes you record for each of them, and it can quickly reach many millions. If you do not want to constrain the choices for values, the generic string representation of KVP provides flexibility and readability.
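
One common way to keep such keys manageable is to build them from their parts with a fixed convention, as the user-plus-attribute keys in the table above suggest. The sketch below assumes "_" separates the user ID from the attribute name; the helper names make_key and split_key are hypothetical. It also shows why tracking is hard: in a plain key-value map, finding every key for one attribute means scanning all keys.

```python
from __future__ import annotations

SEPARATOR = "_"  # assumption: "_" never appears inside an attribute name


def make_key(user_id: str, attribute: str) -> str:
    """Build a composite key such as 'FacebookUser12345_Color'."""
    if SEPARATOR in attribute:
        raise ValueError(f"attribute must not contain {SEPARATOR!r}: {attribute!r}")
    return f"{user_id}{SEPARATOR}{attribute}"


def split_key(key: str) -> tuple[str, str]:
    """Recover (user_id, attribute) by splitting on the last separator."""
    user_id, sep, attribute = key.rpartition(SEPARATOR)
    if not sep:
        raise ValueError(f"not a composite key: {key!r}")
    return user_id, attribute


opinions: dict[str, str] = {
    make_key("FacebookUser12345", "Color"): "Red",
    make_key("TwitterUser67890", "Color"): "Brownish",
    make_key("FoursquareUser45678", "Libation"): "White wine",
    make_key("Google+User24356", "Libation"): "Dry martini with a twist",
    make_key("LinkedInUser87654", "Hero"): "Top sales performer",
}

# With no index, finding every key for one attribute requires a full scan.
color_fans = [split_key(k)[0] for k in opinions if split_key(k)[1] == "Color"]
print(color_fans)  # ['FacebookUser12345', 'TwitterUser67890']

# The number of pairs grows as users x attributes tracked per user.
users, attributes_per_user = 1_000_000, 3
print(users * attributes_per_user)  # 3000000
```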

You might need some additional help organizing data in a key-value database. Most offer the capability to aggregate keys (and their related values) into a collection. Collections can consist of any number of key-value pairs and do not require exclusive control of the individual KVP elements.
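
One reading of "do not require exclusive control" is that a collection only refers to pairs, so the same pair can sit in several collections at once. The sketch below models that reading in plain Python; the collection names and helper functions are hypothetical, and real products implement collections in their own ways.

```python
from __future__ import annotations

from collections import defaultdict

# The underlying key-value pairs (values kept as strings).
pairs: dict[str, str] = {
    "FacebookUser12345_Color": "Red",
    "TwitterUser67890_Color": "Brownish",
    "FoursquareUser45678_Libation": "White wine",
}

# Hypothetical collections: each holds key references, not copies of values,
# so one pair can belong to several collections without being owned by any.
groups: defaultdict[str, set[str]] = defaultdict(set)


def add_to_collection(name: str, key: str) -> None:
    if key not in pairs:
        raise KeyError(key)
    groups[name].add(key)


def collection_items(name: str) -> dict[str, str]:
    """Return the current key-value pairs referenced by a collection."""
    return {key: pairs[key] for key in sorted(groups[name])}


add_to_collection("colors", "FacebookUser12345_Color")
add_to_collection("colors", "TwitterUser67890_Color")
add_to_collection("facebook", "FacebookUser12345_Color")  # same pair, second collection

pairs["FacebookUser12345_Color"] = "Green"  # one update is visible in both collections
print(collection_items("facebook"))  # {'FacebookUser12345_Color': 'Green'}
```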

One widely used open source key-value pair database is Riak. It was originally developed and supported by a company called Basho Technologies and is made available under the Apache License, Version 2.0.

Riak is a very fast and scalable implementation of a key-value database. It supports a high-volume environment with fast-changing data because it is lightweight. Riak is particularly effective at real-time analysis of trading in financial services. It uses “buckets” as an organizing mechanism for collections of keys and values.
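
Buckets show up directly in Riak's HTTP interface, where an object is addressed by its bucket and key. The sketch below only builds requests with Python's standard library and never sends them; it assumes a local node on Riak's default HTTP port 8098 and uses the older /buckets/<bucket>/keys/<key> path (later Riak releases also accept a bucket-type prefix, so check the HTTP API reference for your version). The bucket name "opinions" is hypothetical.

```python
from __future__ import annotations

import urllib.parse
import urllib.request

# Assumption: a Riak node exposing its HTTP interface on the default port 8098.
RIAK_URL = "http://localhost:8098"


def _quote(segment: str) -> str:
    # Percent-encode everything except unreserved characters (so "+" becomes %2B).
    return urllib.parse.quote(segment, safe="")


def object_url(bucket: str, key: str) -> str:
    """URL of one object, addressed by bucket and key."""
    return f"{RIAK_URL}/buckets/{_quote(bucket)}/keys/{_quote(key)}"


def store_request(bucket: str, key: str, value: str) -> urllib.request.Request:
    """Build (but do not send) a PUT that stores a plain-text value."""
    return urllib.request.Request(
        object_url(bucket, key),
        data=value.encode("utf-8"),
        headers={"Content-Type": "text/plain"},
        method="PUT",
    )


def fetch_request(bucket: str, key: str) -> urllib.request.Request:
    """Build (but do not send) a GET for the object stored under bucket/key."""
    return urllib.request.Request(object_url(bucket, key), method="GET")


req = store_request("opinions", "Google+User24356_Libation", "Dry martini with a twist")
print(req.get_method(), req.full_url)
# PUT http://localhost:8098/buckets/opinions/keys/Google%2BUser24356_Libation
# To actually send it against a running node: urllib.request.urlopen(req)
```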

Riak implementations are clusters of physical or virtual nodes arranged in a peer-to-peer fashion. Because no master node exists, the cluster has no single point of failure, which makes it resilient and highly scalable. All data and operations are distributed across the cluster, so adding nodes generally increases capacity and throughput. Nodes communicate using a gossip protocol, which spreads status information about the cluster and shares information about buckets.
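
Riak spreads data across peers with consistent hashing: each bucket/key pair is hashed onto a 160-bit ring that is divided into partitions, and the object is replicated to the next few partitions along the ring. The sketch below is a simplified model, not Riak's code. The partition count, replica count, node names, and the round-robin assignment of partitions to nodes are all assumptions; real Riak uses a more involved claim algorithm and relies on gossip to tell every node who owns which partition.

```python
from __future__ import annotations

import hashlib

RING_BITS = 160                               # SHA-1 output size
RING_SIZE = 2 ** RING_BITS
NUM_PARTITIONS = 64                           # assumption: fixed partition count
N_VAL = 3                                     # assumption: three replicas per object
NODES = ["node1", "node2", "node3", "node4"]  # hypothetical peer nodes

PARTITION_WIDTH = RING_SIZE // NUM_PARTITIONS
# Partitions are handed out round-robin, so no node acts as a master.
OWNERS = [NODES[i % len(NODES)] for i in range(NUM_PARTITIONS)]


def ring_position(bucket: str, key: str) -> int:
    """Hash the bucket/key pair onto the 160-bit ring."""
    digest = hashlib.sha1(f"{bucket}/{key}".encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def preference_list(bucket: str, key: str) -> list[tuple[int, str]]:
    """The N partitions (and their owning nodes) that hold replicas of this object."""
    first = ring_position(bucket, key) // PARTITION_WIDTH
    partitions = [(first + i) % NUM_PARTITIONS for i in range(N_VAL)]
    return [(p, OWNERS[p]) for p in partitions]


print(preference_list("opinions", "FacebookUser12345_Color"))
```

Any node can compute the same preference list from the key alone, which is why requests can go to any peer rather than through a central coordinator.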

Riak has many features and is part of an ecosystem consisting of the following:

  • Parallel processing: Riak uses MapReduce to decompose queries across the cluster and recombine the results for real-time analysis and computation.

  • Links and link walking: Riak can be used to mimic a graph database by means of links. A link can be thought of as a one-way connection between key-value pairs. Walking (following) the links provides a map of the relationships between key-value pairs.

  • Search: Riak Search provides fault-tolerant, distributed full-text search. Buckets can be indexed so that searches on values resolve quickly to the matching keys.

  • Secondary indexes: Developers can tag stored values with one or more indexed field values. The application can then query the index and get back a list of matching keys. This can be very useful in big data implementations because the operation is atomic and supports real-time behavior.
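
Link walking and secondary-index lookups can both be pictured with a toy in-memory model. The sketch below is plain Python, not Riak's API; the keys, the "friend" link tag, the "city" index field, and every function name are hypothetical. Unlike Riak, it does not remove stale index entries when a value is overwritten.

```python
from __future__ import annotations

from collections import defaultdict, deque

# Toy object store: key -> value, plus one-way links and secondary-index tags.
values: dict[str, str] = {}
links: defaultdict[str, list[tuple[str, str]]] = defaultdict(list)  # key -> [(tag, target)]
index: defaultdict[tuple[str, str], set[str]] = defaultdict(set)    # (field, term) -> keys


def put(key: str, value: str, tags: dict[str, str] | None = None) -> None:
    """Store a value and, in the same call, record its index entries."""
    values[key] = value
    for field, term in (tags or {}).items():
        index[(field, term)].add(key)


def link(source: str, tag: str, target: str) -> None:
    """Add a one-way link from source to target."""
    links[source].append((tag, target))


def walk(start: str, tag: str) -> list[str]:
    """Follow links with the given tag breadth-first; return keys reached, in order."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        for link_tag, target in links[queue.popleft()]:
            if link_tag == tag and target not in seen:
                seen.add(target)
                order.append(target)
                queue.append(target)
    return order


def query_index(field: str, term: str) -> list[str]:
    """Return the keys tagged with field=term, like a secondary-index query."""
    return sorted(index[(field, term)])


put("alice", "Alice", {"city": "Boston"})
put("bob", "Bob", {"city": "Boston"})
put("carol", "Carol", {"city": "Denver"})
link("alice", "friend", "bob")
link("bob", "friend", "carol")

print(walk("alice", "friend"))        # ['bob', 'carol']
print(query_index("city", "Boston"))  # ['alice', 'bob']
```

The index query returns keys only; the application then fetches the values it needs, which keeps the lookup itself cheap.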

Riak implementations are best suited for the following:

  • User data for social networks, communities, or gaming

  • High-volume, media-rich data gathering and storage

  • Caching layers for connecting RDBMS and NoSQL databases

  • Mobile applications requiring flexibility and dependability

About This Article

This article is from the book: Big Data For Dummies

About the book authors:

Judith Hurwitz is an expert in cloud computing, information management, and business strategy. Alan Nugent has extensive experience in cloud-based big data solutions. Dr. Fern Halper specializes in big data and analytics. Marcia Kaufman specializes in cloud infrastructure, information management, and analytics.
