By Dirk deRoos

RegionServers are one thing, but you also have to take a look at how individual regions work. In HBase, a table is both spread across a number of RegionServers and made up of individual regions. As tables are split, the splits become regions. Each region stores a contiguous, sorted range of key-value pairs, and each RegionServer manages a configurable number of regions.
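If you're curious how one of your own tables is carved up, you can ask the HBase client API directly. The following Java sketch is just an illustration: it assumes an HBase 2.x client on the classpath and a table named mytable (a made-up name), and it prints each region's start key, end key, and the RegionServer hosting it.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionLocation;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.util.Bytes;

public class ListRegions {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();  // picks up hbase-site.xml from the classpath
    try (Connection connection = ConnectionFactory.createConnection(conf);
         RegionLocator locator = connection.getRegionLocator(TableName.valueOf("mytable"))) {
      // Each HRegionLocation pairs one region (a sorted slice of the table's keys)
      // with the RegionServer currently hosting it.
      for (HRegionLocation location : locator.getAllRegionLocations()) {
        System.out.println("Region ["
            + Bytes.toStringBinary(location.getRegion().getStartKey()) + ", "
            + Bytes.toStringBinary(location.getRegion().getEndKey()) + ") on "
            + location.getHostname());
      }
    }
  }
}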

But what do the individual regions look like? HBase is a column-family-oriented data store, so how do the individual regions store key-value pairs based on the column families they belong to? The following figure begins to answer these questions and helps you digest more vital information about the architecture of HBase.

[Figure: image0.jpg]

HBase is written in Java, like the vast majority of Hadoop technologies. Java is an object-oriented programming language and an elegant technology for distributed computing. So, as you continue to find out more about HBase, remember that all of the components in the architecture are ultimately Java objects.

First off, the preceding figure gives a pretty good idea of what region objects actually look like, generally speaking. It also makes it clear that regions separate data into column families and store the data in the HDFS using HFile objects.

When clients put key-value pairs into the system, each pair is routed to the store for the column family it belongs to. As shown in the figure, each column family store object has a read cache called the BlockCache and a write cache called the MemStore. The BlockCache helps with random read performance.
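Here's a rough sketch of what that looks like from the client side, using the standard HBase Java API. The table name users and the column families personal and contact are made up for the example; the point is that each addColumn call names the column family whose store will receive the cell.

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class PutExample {
  public static void main(String[] args) throws Exception {
    try (Connection connection = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Table table = connection.getTable(TableName.valueOf("users"))) {  // made-up table name
      Put put = new Put(Bytes.toBytes("row-0001"));  // the row key
      // Each cell names its column family, so the region can hand it to that
      // family's store object (MemStore now, HFile later).
      put.addColumn(Bytes.toBytes("personal"), Bytes.toBytes("name"), Bytes.toBytes("Ada"));
      put.addColumn(Bytes.toBytes("contact"), Bytes.toBytes("email"), Bytes.toBytes("ada@example.com"));
      table.put(put);
    }
  }
}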

Data is read in blocks from the HDFS and stored in the BlockCache. Subsequent reads for the data — or data stored in close proximity — will be read from RAM instead of disk, improving overall performance. The Write Ahead Log (WAL, for short) ensures that your HBase writes are reliable. There is one WAL per RegionServer.
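From the client's point of view, the BlockCache is mostly invisible: reads go through Get (or Scan) objects, and block caching is turned on by default. Here's a minimal sketch, again using the made-up users table, that reads back a single cell; the blocks fetched from the HDFS to answer it stick around in the BlockCache for later reads.

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class GetExample {
  public static void main(String[] args) throws Exception {
    try (Connection connection = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Table table = connection.getTable(TableName.valueOf("users"))) {  // made-up table name
      Get get = new Get(Bytes.toBytes("row-0001"));
      get.addFamily(Bytes.toBytes("personal"));
      get.setCacheBlocks(true);  // the default: blocks read for this Get are kept in the BlockCache
      Result result = table.get(get);
      System.out.println(Bytes.toString(
          result.getValue(Bytes.toBytes("personal"), Bytes.toBytes("name"))));
    }
  }
}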

[Figure: image1.jpg]

Always heed the Iron Law of Distributed Computing: A failure isn't the exception; it's the norm, especially when clustering hundreds or even thousands of servers. Google followed the Iron Law in designing BigTable, and HBase followed suit.

When you write or modify data in HBase, the data is first persisted to the WAL, which is stored in the HDFS, and then written to the MemStore cache. When a MemStore fills to a configurable size (or after a configurable interval), its key-value pairs are flushed to HFiles in the HDFS, and afterward the corresponding WAL entries can be erased.
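You normally just let this write path happen behind the scenes, but the client API does expose a couple of knobs that make the sequence easier to see. The sketch below (same made-up users table) spells out the usual WAL-then-MemStore behavior on a Put and then requests a flush through the Admin interface; in a real cluster the flush kicks in on its own once a MemStore reaches the size set by hbase.hregion.memstore.flush.size.

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Durability;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class WritePathExample {
  public static void main(String[] args) throws Exception {
    TableName users = TableName.valueOf("users");  // made-up table name
    try (Connection connection = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Table table = connection.getTable(users);
         Admin admin = connection.getAdmin()) {
      Put put = new Put(Bytes.toBytes("row-0002"));
      put.addColumn(Bytes.toBytes("personal"), Bytes.toBytes("name"), Bytes.toBytes("Grace"));
      put.setDurability(Durability.SYNC_WAL);  // spell out the usual behavior: sync the WAL before acking
      table.put(put);                          // 1) appended to the WAL, 2) buffered in the MemStore

      admin.flush(users);  // ask for the MemStore-to-HFile flush now (normally size-triggered)
    }
  }
}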

If a failure occurs after the initial WAL write but before the final MemStore write to disk, the WAL can be replayed to avoid any data loss.

In the figure, three HFile objects appear in one column family and two in the other. HBase is designed to flush the data in a column family's MemStore to one new HFile per flush. Then, at configurable intervals, the smaller HFiles are merged into larger HFiles; this merging is the critical compaction operation in HBase.
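Compactions normally run on their own schedule, but the Admin interface lets you request one yourself, which is a handy way to watch the HFile merging just described. A minimal sketch, once more against the made-up users table:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class CompactExample {
  public static void main(String[] args) throws Exception {
    TableName users = TableName.valueOf("users");  // made-up table name
    try (Connection connection = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Admin admin = connection.getAdmin()) {
      admin.compact(users);       // minor compaction: merge some of the smaller HFiles in each store
      admin.majorCompact(users);  // major compaction: rewrite each store's HFiles into a single file
    }
  }
}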