Key Value Pairs in the HBase Data Model

By Dirk deRoos

The logical HBase data model is simple yet elegant, and it provides a natural data storage mechanism for all kinds of data — especially unstructured big data sets. All the parts of the data model converge into a key-value pair.

First off, if you think of the row key as the primary key for data stored in HBase, how do you make use of the rest of the data model components? Well, it all depends on how much data you want a query to return and how long you’re willing to wait.

Specifying only the row key can potentially return a ton of data, because an individual row can have millions of columns. Also, with only the row key to work from, HBase can return every column qualifier, version, and value related to the row key.

What if you want only a particular column or version of your data? From the example shown, can you see what happens if you want only the last name of a particular customer? The solution is to build a more complex key to specify exactly what you need. A key-value pair can look like this: RowKey:(Column Family:Column Qualifier:Version) => Value

Logical View of Customer Contact Information in HBase
Row Key   Column Family: {Column Qualifier:Version:Value}
00001     CustomerName: {‘FN’: …,
                         ‘LN’: 1383859182858:‘Smith’,
                         ‘MN’: 1383859183001:‘Timothy’,
                         ‘MN’: 1383859182915:‘T’}
          ContactInfo:  {‘EA’: …,
                         ‘SA’: 1383859183073:‘1 Hadoop Lane, NY’}
00002     CustomerName: {‘FN’: …,
                         ‘LN’: 1383859183163:‘Doe’}
          ContactInfo:  {‘SA’: 1383859185577:‘7 HBase Ave, CA’}
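To make the RowKey:(Column Family:Column Qualifier:Version) => Value idea concrete, here is a minimal Python sketch that stores the known cells from the table in a plain dict keyed by (row key, column family, column qualifier, version) tuples. It then lists them in the order HBase sorts cells: row key, family, and qualifier ascending, with versions newest first. The dict layout and the `sorted_cells` helper are hypothetical illustrations of the logical model, not the HBase client API, and cells whose values are missing from the table are simply left out.

```python
# Hypothetical in-memory model of HBase's logical data model: every cell is a
# key-value pair whose key is (row key, column family, column qualifier, version).
cells = {
    ("00001", "CustomerName", "LN", 1383859182858): "Smith",
    ("00001", "CustomerName", "MN", 1383859183001): "Timothy",
    ("00001", "CustomerName", "MN", 1383859182915): "T",
    ("00001", "ContactInfo", "SA", 1383859183073): "1 Hadoop Lane, NY",
    ("00002", "CustomerName", "LN", 1383859183163): "Doe",
    ("00002", "ContactInfo", "SA", 1383859185577): "7 HBase Ave, CA",
}


def sorted_cells(cells):
    """Return (key, value) pairs ordered like HBase cells: row, family, and
    qualifier ascending; version (time stamp) descending."""
    return sorted(
        cells.items(),
        key=lambda kv: (kv[0][0], kv[0][1], kv[0][2], -kv[0][3]),
    )


# Print each cell in the RowKey:Family:Qualifier:Version => Value notation.
for (row, family, qualifier, version), value in sorted_cells(cells):
    print(f"{row}:{family}:{qualifier}:{version} => {value}")
```

Within row 00001, the two middle-name cells print as `00001:CustomerName:MN:1383859183001 => Timothy` followed by the older `00001:CustomerName:MN:1383859182915 => T`, because the newer time stamp sorts first.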

After you specify the row key, the rest of the key is optional. The more specific you make the query, however (moving from left to right), the more granular the results. Performance may suffer, because the system has to spend more time locating the exact value or values you need, but less data is returned when the query finishes.
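The left-to-right narrowing can be sketched with a hypothetical `query` helper that treats any key part left as `None` as unconstrained. This is a toy filter over a dict, not how HBase actually locates data, and unlike a real HBase `Get` (which by default returns only the newest version of each column) it returns every stored version.

```python
# Hypothetical cells for row 00001, keyed by (row, family, qualifier, version).
cells = {
    ("00001", "CustomerName", "LN", 1383859182858): "Smith",
    ("00001", "CustomerName", "MN", 1383859183001): "Timothy",
    ("00001", "CustomerName", "MN", 1383859182915): "T",
    ("00001", "ContactInfo", "SA", 1383859183073): "1 Hadoop Lane, NY",
}


def query(cells, row, family=None, qualifier=None, version=None):
    """Return every cell whose key matches the parts that were specified;
    a part left as None matches anything."""
    wanted = (row, family, qualifier, version)
    return {
        key: value
        for key, value in cells.items()
        if all(w is None or w == k for w, k in zip(wanted, key))
    }


print(len(query(cells, "00001")))                        # 4
print(len(query(cells, "00001", "CustomerName")))        # 3
print(len(query(cells, "00001", "CustomerName", "MN")))  # 2
print(query(cells, "00001", "CustomerName", "MN", 1383859182915))
# {('00001', 'CustomerName', 'MN', 1383859182915): 'T'}
```

Each extra key part shrinks the result: the row key alone returns every cell in the row, while the full key pins down a single value.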

So keys are more complex than you might imagine from studying the table. For example, if you want the most recent middle name (or the only middle name so far) of the customer in row ‘00001’, the resulting key-value pair would look like this: ‘00001:CustomerName:MN’ => ‘Timothy’

Remember that versions are implemented using time stamps by default and are sorted in decreasing order, so you automatically get the most recent value if you don’t specify a version. If you want a prior middle initial for your customer (refer to Table 12-2), your resulting key-value pair would look like this: ‘00001:CustomerName:MN:1383859182915’ => ‘T’
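The version lookup described in the last two paragraphs can be sketched in a few lines of Python. Here the versions of a single column are a dict keyed by time stamp, and a hypothetical `get_value` helper returns the newest value unless a specific version is requested; the names are illustrative only, not part of any HBase API.

```python
# Hypothetical versions of the column 00001:CustomerName:MN, keyed by time stamp.
middle_name_versions = {
    1383859183001: "Timothy",
    1383859182915: "T",
}


def get_value(versions, version=None):
    """Return the value stored at `version`, or the newest value when no
    version is given (time stamps sorted in decreasing order)."""
    if version is None:
        newest_first = sorted(versions, reverse=True)
        return versions[newest_first[0]]
    return versions[version]


print(get_value(middle_name_versions))                 # Timothy
print(get_value(middle_name_versions, 1383859182915))  # T
```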

We hope that the various descriptions of HBase are starting to take shape in your mind. Specifically, HBase is both a column-family-oriented data store and a key-value-pair data store. Referring to HBase as simply a “column-oriented” data store leaves a lot to the imagination.

In case you were curious, there are no data types in HBase: values in HBase are just arrays of bytes. Again, this is simple but powerful, because you can store anything!
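Because HBase stores only bytes, the application decides how to encode values on the way in and decode them on the way out. This Python sketch assumes a simple convention: UTF-8 for strings and an 8-byte big-endian signed integer for numbers, which is the layout HBase’s Java `Bytes.toBytes` helpers produce for a `String` and a `long`. The variable names are illustrative.

```python
import struct

# Encode a string as UTF-8 bytes and an integer as 8 big-endian bytes;
# to HBase, both are just opaque byte arrays.
name_bytes = "Timothy".encode("utf-8")
count_bytes = struct.pack(">q", 42)

print(name_bytes)                           # b'Timothy'
print(len(count_bytes))                     # 8
# Reading the values back requires knowing which encoding was used.
print(name_bytes.decode("utf-8"))           # Timothy
print(struct.unpack(">q", count_bytes)[0])  # 42
```

Nothing in the stored bytes records which encoding was chosen, so every reader of the table has to agree on the same convention.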