When Does HBase Make Sense for You?

By Dirk deRoos

So, when should you consider using HBase? Though the answer to this question isn’t necessarily straightforward for everyone, for starters you clearly must have a big data requirement and sufficient hardware resources.

  • A big data requirement: Terabytes to petabytes of data. Otherwise, you’ll have a lot of idle servers in your racks.

  • Sufficient hardware resources: Five servers is a good starting point.

When deciding between HBase and an RDBMS, also consider requirements such as transaction support, rich data types, indexes, and query language support, though these factors aren’t as black and white as the preceding two bullets. Rich data types, indexes, and query language support can be added via other technologies, such as Hive or commercial products.

“What about transactions?” you ask.

Certain use cases for RDBMSs, like online transaction processing, depend on ACID-compliant transactions between the client and the RDBMS for the system to function properly. (ACID stands for atomicity, consistency, isolation, and durability.)

When compared to an RDBMS, HBase isn’t considered an ACID-compliant database as of this writing. HBase does not support ACID-compliant transactions over multiple rows or across tables. However, HBase does guarantee the following aspects:

  • Atomicity: All row-level operations within a table are atomic. This guarantee is maintained even when a row spans more than one column family.

  • Consistency: Every row returned by a scan operation is a consistent view of that row as it existed at some point in the past. Concurrent clients could update rows during a multi-row scan, so a scan isn’t a snapshot of the whole table, but each row it returns always contains valid data from some point in the past.

  • Durability: Any data that can be retrieved from HBase has also been made durable to disk (persisted to HDFS, in other words).

One of the exciting aspects of HBase and other open source Apache projects is that someone in the community is always innovating and trying to improve the technology. For example, HBase does support multi-row transactions if all the rows involved are in the same region. This feature, which requires additional coding, was introduced in HBase version 0.94.0. (If you’re curious, the additional coding focuses on HBase’s split policy, which has to keep the related rows together in one region.)

When HBase clients require ACID properties, design the HBase schema so that cross-row and cross-table data operations are not required. Keeping related data within a single row provides atomicity.
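
To make the single-row design concrete, here is a minimal Python sketch. It does not use the HBase client API: ToyTable is a hypothetical in-memory stand-in that mimics only the row-level atomicity described above, and the row key order#1001 and the info and payment column families are made-up names for illustration.

```python
import threading


class ToyTable:
    """Hypothetical in-memory model of row-level atomicity (NOT the HBase API)."""

    def __init__(self):
        self._rows = {}    # row key -> {"family:qualifier": value}
        self._locks = {}   # row key -> lock guarding that one row
        self._guard = threading.Lock()  # protects the _locks dict itself

    def _lock_for(self, row_key):
        # Create the per-row lock on first use.
        with self._guard:
            return self._locks.setdefault(row_key, threading.Lock())

    def put(self, row_key, cells):
        """Apply all cells to one row in a single step, across column families."""
        with self._lock_for(row_key):
            new_row = dict(self._rows.get(row_key, {}))
            new_row.update(cells)
            self._rows[row_key] = new_row  # readers see all cells or none

    def get(self, row_key):
        """Return a copy of the row, or an empty dict if the row is absent."""
        with self._lock_for(row_key):
            return dict(self._rows.get(row_key, {}))


table = ToyTable()

# Single-row design: order status and payment details share one row key,
# spread over two (made-up) column families, "info" and "payment".
table.put("order#1001", {
    "info:status": "PAID",
    "payment:txn_id": "T-77",
    "payment:amount_cents": 4999,
})

row = table.get("order#1001")
print(row["info:status"], row["payment:amount_cents"])
```

If the order and its payment lived in separate rows or tables instead, the same update would take two put calls, and a reader could see the order marked as paid before the payment details exist.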