RegionServers in HBase

By Dirk deRoos

RegionServers are the software processes (often called daemons) you activate to store and retrieve data in HBase (Hadoop Database). In production environments, each RegionServer is deployed on its own dedicated compute node. When you start using HBase, you create a table and then begin storing and retrieving your data.

However, at some point (and perhaps quite quickly in big data use cases) the table grows beyond a configurable size limit. At this point, the HBase system automatically splits the table into smaller pieces, called regions, and distributes part of the load to another RegionServer.
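To make the splitting idea concrete, here is a minimal sketch in Python. It is a toy model, not HBase code: the class ToyTable, the max_rows_per_region parameter (a stand-in for the configurable size limit, counted in rows rather than bytes) and the split-at-the-middle-key rule are all assumptions made for illustration.

```python
from bisect import bisect_right


class ToyTable:
    """Hypothetical toy model: a table that splits into key ranges as it grows."""

    def __init__(self, max_rows_per_region):
        # Stand-in for the configurable limit (real systems measure bytes, not rows).
        self.max_rows = max_rows_per_region
        self.start_keys = [""]   # sorted start key of each key range
        self.regions = [{}]      # one dict of row_key -> value per key range

    def _index_for(self, row_key):
        # Locate the key range whose start key is the largest one <= row_key.
        return bisect_right(self.start_keys, row_key) - 1

    def put(self, row_key, value):
        i = self._index_for(row_key)
        self.regions[i][row_key] = value
        if len(self.regions[i]) > self.max_rows:
            self._split(i)

    def get(self, row_key):
        return self.regions[self._index_for(row_key)].get(row_key)

    def _split(self, i):
        # Split at the middle key: lower half stays, upper half becomes a new range.
        keys = sorted(self.regions[i])
        mid = keys[len(keys) // 2]
        lower = {k: v for k, v in self.regions[i].items() if k < mid}
        upper = {k: v for k, v in self.regions[i].items() if k >= mid}
        self.regions[i] = lower
        self.regions.insert(i + 1, upper)
        self.start_keys.insert(i + 1, mid)


table = ToyTable(max_rows_per_region=4)
for n in range(10):
    table.put(f"row{n:02d}", n)

print(len(table.regions), "key ranges with start keys", table.start_keys)
```

The caller only ever uses put and get; the splits happen behind the scenes, which is the point of auto-sharding. In this sketch every piece stays in one process, whereas a real cluster would hand the new pieces to other servers.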

In this process, often referred to as auto-sharding, HBase automatically scales as you add data to the system. This is a huge benefit compared to most database management systems, which require manual intervention to scale the overall system beyond a single server. With HBase, as long as another configured spare server is available in the rack, scaling is automatic!

Why set a limit on tables and then split them? After all, HDFS is the underlying storage mechanism, so all available disks in the HDFS cluster are available for storing your tables. (Not counting the replication factor, of course.) If you have an entire cluster at your disposal, why limit yourself to one RegionServer to manage your tables?

Simple. You may have any number of tables, large or small, and you’ll want HBase to leverage all available RegionServers when managing your data. You want to take full advantage of the cluster’s compute performance. Furthermore, with many clients accessing your HBase system, you’ll want to use many RegionServers to meet the demand.
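The benefit of spreading table pieces over many RegionServers can be sketched in a few lines of Python. This is an illustrative simplification, not HBase's actual balancing logic: the function assign_regions, the names rs1 through rs3, and the "fewest pieces wins" rule are hypothetical.

```python
from collections import Counter


def assign_regions(region_names, server_names):
    """Hypothetical rule: give each piece to the server holding the fewest so far."""
    load = {server: 0 for server in server_names}
    assignment = {}
    for region in region_names:
        # On a tie, min() keeps the first server in the list.
        target = min(server_names, key=lambda s: load[s])
        assignment[region] = target
        load[target] += 1
    return assignment


regions = [f"region-{i}" for i in range(6)]   # made-up names for table pieces
servers = ["rs1", "rs2", "rs3"]               # made-up RegionServer names
assignment = assign_regions(regions, servers)
per_server = Counter(assignment.values())

for server in servers:
    print(server, "serves", per_server[server], "pieces")
```

With six pieces and three servers, each server ends up with two, so client requests and storage work are shared rather than funneled through a single process.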

HBase addresses all of these concerns for you and scales automatically in terms of storage capacity and compute power.