Key-Value Pair Databases in a Big Data Environment
By far, the simplest of the NoSQL (not-only-SQL) databases used in big data environments are those employing the key-value pair (KVP) model. Unlike RDBMSs, KVP databases do not require a schema, which gives them great flexibility and scalability.
KVP databases typically do not offer ACID (atomicity, consistency, isolation, durability) guarantees. Because the technology does not expressly control data placement, replication, and fault tolerance, implementers must plan for them. KVP databases are also untyped, so most of the data is stored as strings.
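To make the point about untyped values concrete, here is a minimal sketch of a store that keeps every value as a string. The class and key names (`StringKVStore`, `put`, `get`, `User42_Age`) are hypothetical; the sketch shows only that the store accepts whatever string it is given, so the application must serialize and parse typed data itself.

```python
from typing import Dict, Optional


class StringKVStore:
    """Hypothetical, schema-less store: every key and value is a plain string."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        # No schema and no type checks: any string is accepted as-is.
        self._data[key] = value

    def get(self, key: str) -> Optional[str]:
        # Returns None when the key has never been written.
        return self._data.get(key)


store = StringKVStore()
# The application serializes the number to a string before storing it...
store.put("User42_Age", str(37))
# ...and must parse it back itself when reading, because the store
# records no type information.
age = int(store.get("User42_Age"))
# Nothing stops another writer from storing a non-numeric string under a
# similar key; the store cannot detect the inconsistency.
store.put("User43_Age", "thirty-seven")
```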
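A very simple key-value set might map attributes such as a favorite color, libation, or hero to a single value each. In a big data implementation, however, many individuals will have differing ideas about colors, libations, and heroes, so each key must also identify the user it belongs to, as in these pairs: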
|Key|Value|
|---|---|
|Google+User24356_Libation|“Dry martini with a twist”|
|LinkedInUser87654_Hero|“Top sales performer”|
As the number of users increases, keeping track of precise keys and their related values becomes challenging. If you need to track the opinions of millions of users, the number of key-value pairs grows with every user and every attribute you record, quickly reaching into the millions. If you do not want to constrain the choices for values, the generic string representation of KVP provides flexibility and readability.
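The user-qualified keys shown above follow a recognizable pattern. The sketch below assumes that pattern is `<service>User<id>_<attribute>` (an inference from the two example keys, not a rule stated by any product; `make_key`, `parse_key`, and `keys_for_attribute` are hypothetical helpers). It shows how an application might build and parse such keys, and why finding every key for one attribute requires a full scan when no index exists.

```python
from typing import Dict, List, Tuple


def make_key(service: str, user_id: int, attribute: str) -> str:
    # Assumed convention, inferred from the example keys:
    # <service>User<id>_<attribute>, e.g. "LinkedInUser87654_Hero".
    return f"{service}User{user_id}_{attribute}"


def parse_key(key: str) -> Tuple[str, int, str]:
    # The attribute follows the last underscore; the user id follows the last "User".
    owner, attribute = key.rsplit("_", 1)
    service, user_id = owner.rsplit("User", 1)
    return service, int(user_id), attribute


def keys_for_attribute(pairs: Dict[str, str], attribute: str) -> List[str]:
    # Without an index, finding every key for one attribute means scanning all keys.
    return [key for key in pairs if parse_key(key)[2] == attribute]


pairs = {
    make_key("Google+", 24356, "Libation"): "Dry martini with a twist",
    make_key("LinkedIn", 87654, "Hero"): "Top sales performer",
}
```

With U users and A tracked attributes, this scheme yields U × A pairs; one million users with three attributes each already means three million keys, which is why a scan like `keys_for_attribute` stops being practical and why collections and indexes become useful.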
You might need some additional help organizing data in a key-value database. Most KVP databases can aggregate keys (and their related values) into a collection. A collection can hold any number of key-value pairs and does not require exclusive control of the individual KVP elements.
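A collection can be pictured as a named group of key-value pairs. In this hypothetical sketch (`CollectionStore` and the collection names are illustrative, not any product's API), each collection acts as its own namespace, so the same key can hold different values in different collections.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Optional


class CollectionStore:
    """Hypothetical store that groups key-value pairs into named collections."""

    def __init__(self) -> None:
        # collection name -> {key -> value}
        self._collections: DefaultDict[str, Dict[str, str]] = defaultdict(dict)

    def put(self, collection: str, key: str, value: str) -> None:
        # Writing to a new collection name creates that collection implicitly.
        self._collections[collection][key] = value

    def get(self, collection: str, key: str) -> Optional[str]:
        # Keys are looked up within one collection only.
        return self._collections.get(collection, {}).get(key)

    def keys(self, collection: str) -> List[str]:
        # List the keys held in one collection, in sorted order.
        return sorted(self._collections.get(collection, {}))


store = CollectionStore()
store.put("libations", "Google+User24356", "Dry martini with a twist")
store.put("heroes", "LinkedInUser87654", "Top sales performer")
store.put("heroes", "Google+User24356", "Astronaut")  # hypothetical value
```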
Riak is a very fast and scalable implementation of a key-value database. Because it is lightweight, it supports high-volume environments with fast-changing data. Riak is particularly effective for real-time analysis of trading in financial services. It uses “buckets” as the organizing mechanism for collections of keys and values.
Riak implementations are clusters of physical or virtual nodes arranged in a peer-to-peer fashion. Because no master node exists, there is no single point of failure, so the cluster is resilient and highly scalable. All data and operations are distributed across the cluster, and larger clusters perform better and faster than clusters with fewer nodes. Nodes communicate through a gossip protocol, which spreads status information about the cluster and shares information about buckets.
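How data can be spread across a masterless cluster is easiest to see with consistent hashing, the technique Riak uses to place keys on its ring. The sketch below is a simplification, not Riak's code: it assumes an 8-partition ring, four hypothetical nodes (`node1` to `node4`) that claim partitions round-robin, SHA-1 over a `bucket/key` string, and three replicas placed on consecutive partitions. In a real cluster, the partition-ownership table is the kind of status information that nodes exchange through gossip.

```python
import hashlib
from typing import List

RING_SIZE = 8          # number of partitions on the ring (assumed, kept small)
HASH_SPACE = 2 ** 160  # SHA-1 digests are 160 bits wide


def partition_for(bucket: str, key: str) -> int:
    # Hash "bucket/key" to a point on the ring and find the partition containing it.
    digest = hashlib.sha1(f"{bucket}/{key}".encode("utf-8")).digest()
    point = int.from_bytes(digest, "big")
    return point // (HASH_SPACE // RING_SIZE)


def claim_partitions(nodes: List[str]) -> List[str]:
    # Hypothetical round-robin claim: partition i belongs to nodes[i % len(nodes)].
    return [nodes[i % len(nodes)] for i in range(RING_SIZE)]


def preference_list(bucket: str, key: str, owners: List[str], n_val: int = 3) -> List[str]:
    # Replicas go to the owning partition and the next n_val - 1 partitions clockwise.
    start = partition_for(bucket, key)
    return [owners[(start + i) % RING_SIZE] for i in range(n_val)]


owners = claim_partitions(["node1", "node2", "node3", "node4"])
replicas = preference_list("libations", "Google+User24356", owners)
```

Because every node can compute the same preference list from the shared ownership table, any node can route a request for a key without consulting a master.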
Riak has many features and is part of an ecosystem consisting of the following:
Parallel processing: Using MapReduce, Riak can decompose a query, run the parts across the cluster, and recombine the results for real-time analysis and computation.
Links and link walking: Riak can be used to mimic a graph database by means of links. A link is a one-way connection from one key-value pair to another. Walking (following) the links produces a map of the relationships between key-value pairs.
Search: Riak Search provides fault-tolerant, distributed full-text search. Buckets can be indexed so that values can be resolved to their keys rapidly.
Secondary indexes: Developers can tag values with one or more index field values. The application can then query the index and get back a list of matching keys. This can be very useful in big data implementations because the operation is atomic and supports real-time behavior.
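What links, link walking, secondary indexes, and a MapReduce pass each compute can be modeled in a few lines. This is a single-process toy under stated assumptions: `TinyStore`, its methods, the `friend` link tag, the `city` index, and the sample users are all hypothetical and do not reflect Riak's client API. In Riak itself, map functions run on the nodes that hold the data, whereas here everything runs locally.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, Dict, List, Optional, Set, Tuple


class TinyStore:
    """Hypothetical in-memory model of links, secondary indexes, and MapReduce."""

    def __init__(self) -> None:
        self.values: Dict[str, str] = {}
        # One-way links: source key -> list of (tag, target key).
        self.links: DefaultDict[str, List[Tuple[str, str]]] = defaultdict(list)
        # Secondary index: (index name, index value) -> set of keys.
        self.index: DefaultDict[Tuple[str, str], Set[str]] = defaultdict(set)

    def put(self, key: str, value: str, tags: Optional[Dict[str, str]] = None) -> None:
        # The value and its index tags are written together in one call.
        self.values[key] = value
        for name, tag_value in (tags or {}).items():
            self.index[(name, tag_value)].add(key)

    def link(self, source: str, tag: str, target: str) -> None:
        # A link points one way only: the target gets no link back to the source.
        self.links[source].append((tag, target))

    def walk(self, start: str, tag: str, depth: int) -> List[str]:
        # Follow links carrying `tag`, one hop per level, for `depth` hops.
        frontier: List[str] = [start]
        reached: List[str] = []
        for _ in range(depth):
            frontier = [
                target
                for key in frontier
                for (link_tag, target) in self.links.get(key, [])
                if link_tag == tag
            ]
            reached.extend(frontier)
        return reached

    def query_index(self, name: str, tag_value: str) -> List[str]:
        # Return the keys whose values were tagged with this index value.
        return sorted(self.index.get((name, tag_value), set()))

    def map_reduce(
        self,
        keys: List[str],
        mapper: Callable[[str, str], List[Any]],
        reducer: Callable[[List[Any]], Any],
    ) -> Any:
        # Map each (key, value) pair to zero or more outputs, then reduce them all.
        mapped = [out for key in keys for out in mapper(key, self.values[key])]
        return reducer(mapped)


def count_by_value(pairs: List[Tuple[str, int]]) -> Dict[str, int]:
    # Reducer: sum the counts emitted for each distinct value.
    totals: DefaultDict[str, int] = defaultdict(int)
    for value, n in pairs:
        totals[value] += n
    return dict(totals)


store = TinyStore()
store.put("user1", "beer", {"city": "Boston"})
store.put("user2", "white wine", {"city": "Boston"})
store.put("user3", "beer", {"city": "Denver"})
store.link("user1", "friend", "user2")
store.link("user2", "friend", "user3")

friends_within_two_hops = store.walk("user1", "friend", 2)
boston_users = store.query_index("city", "Boston")
libation_counts = store.map_reduce(
    ["user1", "user2", "user3"], lambda key, value: [(value, 1)], count_by_value
)
```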
Riak implementations are best suited for:
User data for social networks, communities, or gaming
High-volume, media-rich data gathering and storage
Caching layers for connecting RDBMS and NoSQL databases
Mobile applications requiring flexibility and dependability