Column Qualifiers in the HBase Data Model

By Dirk deRoos

In the HBase data model, column qualifiers are the specific names assigned to your data values so that you can accurately identify them. Unlike column families, column qualifiers can be virtually unlimited in content, length, and number.

If you omit the column qualifier, HBase stores the value under an empty qualifier (a zero-length byte array). Qualifiers don’t have to be printable characters; any sequence of bytes can serve as a column qualifier. Because the set of column qualifiers isn’t fixed in advance, new columns can be added to a column family on the fly, which makes HBase flexible and highly scalable.
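To make the idea concrete, here is a minimal sketch of a single row modeled as nested Python dictionaries. It is a toy model, not the HBase client API: the `put` helper, the sample values, and the fallback to an empty byte string when no qualifier is given are all assumptions made for illustration.

```python
from collections import defaultdict

# Toy in-memory model of ONE row (hypothetical, not the HBase client API):
# column family name -> qualifier bytes -> value bytes.
row = defaultdict(dict)

def put(family: str, value: bytes, qualifier: bytes = b"") -> None:
    """Store a value; an omitted qualifier is assumed to be the empty byte string."""
    row[family][qualifier] = value

put("CustomerName", b"Smith", b"LN")           # short, printable qualifier
put("CustomerName", b"\x01", b"\x00\xff\x10")  # arbitrary, non-printable bytes (made-up value)
put("ContactInfo", b"n/a")                     # no qualifier supplied (made-up value)

# Qualifiers are just bytes, so they sort byte-wise within a family.
qualifiers = sorted(row["CustomerName"])
```

Nothing had to be declared up front for the second qualifier to exist; writing a value under it was enough.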

But there’s a cost to consider: HBase stores the column qualifier with your value (it’s actually part of the key), and since HBase doesn’t limit the number of column qualifiers you can have, creating long column qualifiers can be quite costly in terms of storage.
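A rough back-of-the-envelope sketch shows how qualifier length adds up. It assumes the classic uncompressed KeyValue layout (length fields, row, family, qualifier, timestamp, key type, value) and ignores tags, compression, and data block encoding, so treat the numbers as illustrative rather than as what a real cluster would report. The `cell_size` helper is a hypothetical name.

```python
# Fixed per-cell framing in the assumed KeyValue layout:
# 4 (key length) + 4 (value length) + 2 (row length) + 1 (family length)
# + 8 (timestamp) + 1 (key type) = 20 bytes.
FIXED_OVERHEAD = 4 + 4 + 2 + 1 + 8 + 1

def cell_size(row_key: bytes, family: bytes, qualifier: bytes, value: bytes) -> int:
    """Estimate the stored size of one cell, in bytes, before any compression."""
    return FIXED_OVERHEAD + len(row_key) + len(family) + len(qualifier) + len(value)

short_size = cell_size(b"00001", b"CustomerName", b"LN", b"Smith")
long_size = cell_size(b"00001", b"CustomerName", b"LastName", b"Smith")

# Extra bytes paid per cell for spelling the qualifier out in full.
extra_per_cell = long_size - short_size

# The same difference across a hypothetical one million rows.
extra_for_million_rows = extra_per_cell * 1_000_000
```

Under these assumptions the longer qualifier costs 6 extra bytes on every cell that uses it, and that overhead repeats for every version of every row.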

That’s why the column qualifiers in the following table are abbreviated (for example, “LN” is used instead of “LastName”). Notice in this logical representation of the customer contact information that HBase takes advantage of its sparse data support in the case of Jane Doe.

Logical View of Customer Contact Information in HBase
Row Key   Column Family: {Column Qualifier:Version:Value}
00001     CustomerName: {‘FN’:
                         ‘LN’: 1383859182858:‘Smith’,
                         ‘MN’: 1383859183001:‘Timothy’,
                         ‘MN’: 1383859182915:‘T’}
          ContactInfo:  {‘EA’:
                         ‘SA’: 1383859183073:‘1 Hadoop Lane, NY
00002     CustomerName: {‘FN’:
                         ‘LN’: 1383859183163:‘Doe’,
          ContactInfo:  {
                         ‘SA’: 1383859185577:‘7 HBase Ave, CA

Assuming this table represents customer contact information from a service company, the company isn’t too worried about Jane’s middle name (abbreviated ‘MN’) or e-mail address (abbreviated ‘EA’) for now, but hopes to gather that information progressively over time.
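The sparse, versioned layout in the table can be sketched as nested dictionaries: row key, then column family, then qualifier, then a map of timestamp to value. This is only an illustrative model, not how HBase is accessed in practice. It includes just the cells that appear complete in the table, and the `get_latest` and `put` helpers, along with the later middle-name value and timestamp for row 00002, are hypothetical.

```python
# row key -> column family -> qualifier -> {timestamp: value}
table = {
    "00001": {
        "CustomerName": {
            "LN": {1383859182858: "Smith"},
            "MN": {1383859183001: "Timothy", 1383859182915: "T"},
        },
    },
    "00002": {
        "CustomerName": {
            "LN": {1383859183163: "Doe"},
            # No 'MN' entry at all: a missing cell takes up no space.
        },
    },
}

def get_latest(row_key, family, qualifier):
    """Return the value with the highest timestamp, or None if the cell is absent."""
    versions = table.get(row_key, {}).get(family, {}).get(qualifier)
    if not versions:
        return None
    return versions[max(versions)]

def put(row_key, family, qualifier, timestamp, value):
    """Add a new version of a cell, creating the qualifier on the fly if needed."""
    cells = table.setdefault(row_key, {}).setdefault(family, {})
    cells.setdefault(qualifier, {})[timestamp] = value

newest_mn = get_latest("00001", "CustomerName", "MN")    # newest of two versions
before = get_latest("00002", "CustomerName", "MN")       # absent for row 00002

# Hypothetical later update: timestamp and value are invented for this sketch.
put("00002", "CustomerName", "MN", 1383859190000, "Q")
after = get_latest("00002", "CustomerName", "MN")
```

Row 00001 keeps both versions of its middle name and returns the newer one, while row 00002 gains an ‘MN’ column only at the moment a value is written.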