Key-Value Pair Databases in a Big Data Environment

Judith S. Hurwitz

Alan Nugent

Fern Halper

Marcia Kaufman

Updated

2016-03-26 15:07:01

From the book

Big Data For Dummies

By far, the simplest of the NoSQL (not-only-SQL) databases in a big data environment are those employing the key-value pair (KVP) model. Unlike RDBMSs, KVP databases do not require a schema, and they offer great flexibility and scalability.

KVP databases typically do not offer ACID (Atomicity, Consistency, Isolation, Durability) capability. They require implementers to think about data placement, replication, and fault tolerance, because these are not expressly controlled by the technology itself. KVP databases are not typed. As a result, most of the data is stored as strings.

Key	Value
Color	Blue
Libation	Beer
Hero	Soldier

This is a very simplified set of keys and values. In a big data implementation, many individuals will have differing ideas about colors, libations, and heroes.

Key	Value
FacebookUser12345_Color	Red
TwitterUser67890_Color	Brownish
FoursquareUser45678_Libation	“White wine”
Google+User24356_Libation	“Dry martini with a twist”
LinkedInUser87654_Hero	“Top sales performer”

As the number of users increases, keeping track of precise keys and related values can be challenging. If you need to keep track of the opinions of millions of users, the number of key-value pairs associated with them grows very quickly: roughly the number of users multiplied by the number of attributes tracked for each user. If you do not want to constrain choices for the values, the generic string representation of KVP provides flexibility and readability.
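The key naming pattern in the table above can be sketched in a few lines of Python. This is only an illustration: a plain dictionary stands in for the database, and the helper names (`make_key`, `put`, `get`, `values_for_attribute`) are hypothetical rather than part of any product's API.

```python
# Illustrative only: a plain dict stands in for a key-value store.
store = {}


def make_key(network, user_id, attribute):
    """Build a composite key such as 'FacebookUser12345_Color'."""
    return f"{network}User{user_id}_{attribute}"


def put(network, user_id, attribute, value):
    # Values are kept as untyped strings, as the text describes.
    store[make_key(network, user_id, attribute)] = str(value)


def get(network, user_id, attribute):
    # Returns None when the key does not exist.
    return store.get(make_key(network, user_id, attribute))


def values_for_attribute(attribute):
    """Collect every value for one attribute by scanning all keys.

    A full scan like this becomes costly as the number of keys grows,
    which is one reason key design matters.
    """
    suffix = f"_{attribute}"
    return sorted(v for k, v in store.items() if k.endswith(suffix))


put("Facebook", 12345, "Color", "Red")
put("Twitter", 67890, "Color", "Brownish")
put("Foursquare", 45678, "Libation", "White wine")
```

Because the store has no schema, nothing stops one user from recording "Brownish" while another records "Red"; the application has to interpret the strings.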

You might need some additional help organizing data in a key-value database. Most offer the capability to aggregate keys (and their related values) into a collection. Collections can consist of any number of key-value pairs and do not require exclusive control of the individual KVP elements.
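As a rough sketch of the collection idea, the following Python class groups key-value pairs under a named collection. The class and method names (`CollectionStore`, `put`, `get`, `keys`) are assumptions made for illustration and do not correspond to a specific database's interface.

```python
from collections import defaultdict


class CollectionStore:
    """Toy in-memory store that groups key-value pairs into named collections."""

    def __init__(self):
        # collection name -> {key: value}
        self._collections = defaultdict(dict)

    def put(self, collection, key, value):
        # Values are stored as strings, matching the untyped KVP model.
        self._collections[collection][key] = str(value)

    def get(self, collection, key):
        # .get() avoids creating an empty collection on a read miss.
        return self._collections.get(collection, {}).get(key)

    def keys(self, collection):
        return sorted(self._collections.get(collection, {}))


db = CollectionStore()
db.put("colors", "FacebookUser12345", "Red")
db.put("colors", "TwitterUser67890", "Brownish")
db.put("libations", "FoursquareUser45678", "White wine")
```

With collections, the attribute name no longer has to be encoded in every key, so listing all color opinions is a lookup in one collection instead of a scan over every key.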

One widely used open source key-value pair database is called Riak. It is developed and supported by a company called Basho Technologies and is made available under the Apache Software License v2.0.

Riak is a very fast and scalable implementation of a key-value database. It supports a high-volume environment with fast-changing data because it is lightweight. Riak is particularly effective at real-time analysis of trading in financial services. It uses “buckets” as an organizing mechanism for collections of keys and values.

Riak implementations are clusters of physical or virtual nodes arranged in a peer-to-peer fashion. No master node exists, so the cluster is resilient and highly scalable. All data and operations are distributed across the cluster. Larger clusters perform better and faster than clusters with fewer nodes. Communication in the cluster is implemented via a special protocol called Gossip. Gossip stores status information about the cluster and shares information about buckets.
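To give a feel for how status information can spread through a cluster with no master node, here is a minimal simulation of a generic gossip exchange in Python. It is not Riak's actual protocol: the `Node` class, the version-number merge rule, and the random peer selection are all simplifying assumptions.

```python
import random


class Node:
    def __init__(self, name):
        self.name = name
        # Local view of the cluster: node name -> (version, status).
        # Each node initially knows only about itself.
        self.view = {name: (1, "up")}

    def merge(self, other_view):
        # Keep whichever entry carries the higher version number.
        for name, (version, status) in other_view.items():
            if name not in self.view or version > self.view[name][0]:
                self.view[name] = (version, status)


def gossip_round(nodes, rng):
    """Each node picks one random peer and the two exchange their views."""
    for node in nodes:
        peer = rng.choice([n for n in nodes if n is not node])
        snapshot = dict(node.view)  # what the node knew before the exchange
        node.merge(peer.view)
        peer.merge(snapshot)


def converged(nodes):
    return all(n.view == nodes[0].view for n in nodes)


def run_gossip(nodes, seed=0, max_rounds=200):
    """Run rounds until every node holds the same view; return rounds used."""
    rng = random.Random(seed)
    rounds = 0
    while not converged(nodes) and rounds < max_rounds:
        gossip_round(nodes, rng)
        rounds += 1
    return rounds


cluster = [Node(f"node{i}") for i in range(5)]
rounds_needed = run_gossip(cluster)
```

No node coordinates the exchange, yet every node ends up with the same picture of the cluster after a small number of rounds, which is the property that makes a masterless design workable.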

Riak has many features and is part of an ecosystem consisting of the following:

Parallel processing: Using MapReduce, Riak supports a capability to decompose and recompose queries across the cluster for real-time analysis and computation.
Links and link walking: Riak can be constructed to mimic a graph database using links. A link can be thought of as a one-way connection between key-value pairs. Walking (following) the links will provide a map of relationships between key-value pairs.
Search: Riak Search has a fault-tolerant, distributed full-text searching capability. Buckets can be indexed for rapid resolution of values to keys.
Secondary indexes: Developers can tag values with one or more key field values. The application can then query the index and return a list of matching keys. This can be very useful in big data implementations because the operation is atomic and will support real-time behaviors.
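Two of the features listed above, secondary indexes and link walking, can be illustrated with a small in-memory model. The Python sketch below is not Riak's API; the `TinyStore` class, its method names, and the `user:` key format are hypothetical and exist only to show the ideas.

```python
from collections import defaultdict


class TinyStore:
    """Toy store with secondary indexes and one-way links between keys."""

    def __init__(self):
        self.values = {}
        self.links = defaultdict(list)  # key -> [(tag, target_key), ...]
        self.index = defaultdict(set)   # (field, field_value) -> {keys}

    def put(self, key, value, indexes=None, links=None):
        # Simplification: re-putting a key does not clear its old index entries.
        self.values[key] = value
        for field, field_value in (indexes or {}).items():
            self.index[(field, field_value)].add(key)
        for tag, target in (links or []):
            self.links[key].append((tag, target))

    def query_index(self, field, field_value):
        """Return the keys tagged with the given field value."""
        return sorted(self.index.get((field, field_value), set()))

    def walk(self, start, tag, depth):
        """Follow one-way links carrying `tag` for up to `depth` hops."""
        frontier, seen = [start], []
        for _ in range(depth):
            next_frontier = []
            for key in frontier:
                for link_tag, target in self.links.get(key, []):
                    if link_tag == tag and target != start and target not in seen:
                        seen.append(target)
                        next_frontier.append(target)
            frontier = next_frontier
        return seen


db2 = TinyStore()
db2.put("user:alice", "Alice", indexes={"city": "Boston"},
        links=[("friend", "user:bob")])
db2.put("user:bob", "Bob", indexes={"city": "Boston"},
        links=[("friend", "user:carol")])
db2.put("user:carol", "Carol", indexes={"city": "Austin"})
```

Querying the index returns keys rather than values, so the application fetches the matching values in a second step. Walking the `friend` links from one user yields the keys reachable within the requested number of hops.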

Riak implementations are best suited for the following:

User data for social networks, communities, or gaming
High-volume, media-rich data gathering and storage
Caching layers for connecting RDBMS and NoSQL databases
Mobile applications requiring flexibility and dependability

About This Article

About the book author:

Judith Hurwitz is an expert in cloud computing, information management, and business strategy.

Alan Nugent has extensive experience in cloud-based big data solutions.

Dr. Fern Halper specializes in big data and analytics.

Marcia Kaufman specializes in cloud infrastructure, information management, and analytics.

This article can be found in the category:

Big Data
