Row Keys in the HBase Data Model

HBase data stores consist of one or more tables, which are indexed by row keys. Data is stored in rows with columns, and rows can have multiple versions. By default, data versioning for rows is implemented with time stamps.

Logical View of Customer Contact Information in HBase

Row Key   Column Family: {Column Qualifier:Version:Value}
00001     CustomerName: {‘FN’: 1383859182496:‘John’,
                         ‘LN’: 1383859182858:‘Smith’,
                         ‘MN’: 1383859183001:‘Timothy’,
                         ‘MN’: 1383859182915:‘T’}
          ContactInfo:  {‘EA’: 1383859183030:‘John.Smith@xyz.com’,
                         ‘SA’: 1383859183073:‘1 Hadoop Lane, NY 11111’}
00002     CustomerName: {‘FN’: 1383859183103:‘Jane’,
                         ‘LN’: 1383859183163:‘Doe’}
          ContactInfo:  {‘SA’: 1383859185577:‘7 HBase Ave, CA 22222’}
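To make the logical view concrete, here is a minimal sketch that models the same two rows as nested Python dictionaries: row key, then column family, then column qualifier, then a map of time stamp to value. This is a plain in-memory illustration, not the HBase client API; the names table and latest are hypothetical, and returning the newest version when none is requested is an assumption of the sketch.

```python
# Hypothetical in-memory model of the logical view; not the HBase client API.
# Layout: row key -> column family -> column qualifier -> {time stamp: value}
table = {
    b"00001": {
        "CustomerName": {
            "FN": {1383859182496: "John"},
            "LN": {1383859182858: "Smith"},
            # Two versions of the same cell, told apart by time stamp.
            "MN": {1383859183001: "Timothy", 1383859182915: "T"},
        },
        "ContactInfo": {
            "EA": {1383859183030: "John.Smith@xyz.com"},
            "SA": {1383859183073: "1 Hadoop Lane, NY 11111"},
        },
    },
    b"00002": {
        "CustomerName": {
            "FN": {1383859183103: "Jane"},
            "LN": {1383859183163: "Doe"},
        },
        "ContactInfo": {
            "SA": {1383859185577: "7 HBase Ave, CA 22222"},
        },
    },
}


def latest(table, row_key, family, qualifier):
    """Return the value with the highest time stamp, or None if the cell is absent."""
    versions = table.get(row_key, {}).get(family, {}).get(qualifier)
    if not versions:
        return None
    return versions[max(versions)]
```

Asking for the MN cell of row 00001 picks the version with the larger time stamp, and asking for a cell that was never written (such as EA for row 00002) simply finds nothing, because absent columns take up no space in this model.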

For the sake of illustration, the table has two simple row keys: 00001 and 00002. Row keys are implemented as byte arrays, and are sorted in byte-lexicographical order, which simply means that the row keys are sorted, byte by byte, from left to right.

If you think in terms of numeric values when designing row keys, then sorting is simple. Given two keys, if the byte at Index 1 in Key 1 is less than the byte at Index 1 in Key 2, Row Key 1 will always be stored before Row Key 2, no matter what’s next in the sequence of bytes.
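The byte-by-byte comparison described above can be sketched in a few lines of Python. The function name compare_row_keys is made up for this illustration, and the sketch assumes that bytes are compared as unsigned values (0 to 255) and that a key that is a prefix of another key sorts first.

```python
def compare_row_keys(a: bytes, b: bytes) -> int:
    """Compare two row keys in byte-lexicographical order.

    Returns a negative number if a sorts before b, a positive number if a
    sorts after b, and 0 if the keys are identical.
    """
    # Walk both keys from left to right; the first differing byte decides.
    for x, y in zip(a, b):
        if x != y:
            return -1 if x < y else 1
    # All shared positions match, so the shorter key (a prefix) sorts first.
    return (len(a) > len(b)) - (len(a) < len(b))
```

For example, compare_row_keys(b"\x01\xff", b"\x02\x00") is negative: the first byte already differs, so the bytes that follow never get a vote.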

However, it’s common to use printable (ASCII) characters rather than numeric values for row keys in HBase. If you do, you need to understand that the Java language represents characters using the Unicode Standard, so keys sort by the numeric codes of their characters rather than in dictionary order. The following example illustrates this design consideration for Basic Latin (ASCII).

"RowA" precedes "Rowa"
"Row-1" precedes "Row11"
"Row1" precedes "RowA"
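You can check orderings like these yourself by encoding each key to bytes and comparing the results. The sketch below is a plain Python illustration under the assumption that keys are stored as their UTF-8 bytes (which match ASCII for Basic Latin); the helper name precedes and the sample keys, written in the style of the example above, are chosen for illustration only.

```python
def precedes(first: str, second: str) -> bool:
    """Return True if first sorts strictly before second as encoded bytes."""
    # Python compares bytes objects byte by byte, left to right.
    return first.encode("utf-8") < second.encode("utf-8")


# Sample keys in the style of the example above.
keys = ["Rowa", "Row11", "RowA", "Row1", "Row-1"]

# Sort by encoded bytes: '-' (0x2D) < '1' (0x31) < 'A' (0x41) < 'a' (0x61).
ordered = sorted(keys, key=lambda k: k.encode("utf-8"))
```

The hyphen, the digits, the uppercase letters, and the lowercase letters each occupy their own range of codes, so a key’s position depends on which kind of character appears first where two keys differ.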

You may wonder why you would bother with this fine detail with respect to row keys. The reason for this special attention is that proper row key design is crucial to achieving good performance in HBase; without it, you won’t realize the full value of your HBase cluster. Because rows are stored in sorted key order, keys that place related rows next to one another can help you access your data faster.
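To see why sorted keys help, consider this minimal sketch of a range lookup over an ordered list of keys. It uses Python’s standard bisect module rather than anything from HBase, and the function name scan and its start-inclusive, stop-exclusive convention are assumptions made for the illustration.

```python
from bisect import bisect_left


def scan(sorted_keys, start, stop):
    """Return the keys k with start <= k < stop from an already sorted list."""
    # Binary search finds both ends of the range without reading every key.
    lo = bisect_left(sorted_keys, start)
    hi = bisect_left(sorted_keys, stop)
    # Everything between the two positions is one contiguous slice.
    return sorted_keys[lo:hi]


# Hypothetical zero-padded keys, sorted in byte-lexicographical order.
row_keys = sorted([b"00100", b"00002", b"00010", b"00001"])
matches = scan(row_keys, b"00002", b"00100")
```

Because the keys are sorted, every match sits in one unbroken run, so the lookup jumps to the first match and stops at the last instead of checking the whole table.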
