The HBase Client Ecosystem

By Dirk deRoos

HBase is written in Java, an elegant language for building distributed technologies like HBase, but face it — not everyone who wants to take advantage of HBase innovations is a Java developer. That’s why there’s a rich HBase client ecosystem out there whose sole purpose is to do the heavy Java lifting for you and let you concentrate on making HBase work for you.

Rich is usually a good characteristic, but when that adjective crosses the line into overwhelming, you start having a problem. Here is an overview of the client ecosystem in diagram form. Note that the diagram is similar to the HBase architecture diagram, with an exploded view of the client box.

[Figure: Overview of the HBase client ecosystem]

The following list summarizes your options. It starts with the HBase clients that are part of the Apache Hadoop ecosystem and then covers the technologies bundled with HBase that are designed to help you build HBase clients:

  • Hive: Hive is another top-level Apache project that provides its own take on data warehousing capabilities on top of Apache Hadoop. It comes with a storage handler for HBase and provides the HiveQL query language, which is quite similar to SQL. With Hive, you can query HBase using HiveQL, and here’s the kicker: no Java coding is required when you’re using HBase with Hive.

  • MapReduce: MapReduce is part of the Apache Hadoop framework. MapReduce’s claim to fame is that it’s a programming model for processing data in parallel on a distributed cluster. In the Hadoop universe, HBase is (as the name implies) the “Hadoop Database.” HBase stores its data in the Hadoop Distributed File System (HDFS), and MapReduce jobs can in turn work with HBase: an HBase table can serve as a source or a sink for parallel MapReduce jobs.

  • Pig: Pig is another technology in the Apache Hadoop ecosystem and, as with Hive, Pig can leverage HBase. Pig takes you up a level by giving you a higher-level programming language called Pig Latin, which can do the heavy MapReduce lifting for you.

  • Multi-Language Thrift System: Thrift provides a language-neutral approach to building HBase clients. Developed by Facebook, Thrift’s Interface Definition Language (IDL) allows you to define data types and service interfaces so that two different systems written in different languages can communicate with one another. After the IDL is written, Thrift generates the code necessary for communication.

  • Java Client: If you happen to be a Java developer and you understand the ins and outs of Java packages, then you’ll want to check out the Java client package that comes bundled with the HBase distribution.

  • REST System: Probably the quickest way to start accessing an HBase table is to leverage the REST interface. REST, which stands for Representational State Transfer, is the architectural style behind the web that your browser uses every day. Most folks just take web browsers for granted these days, so what could be more natural than using your favorite browser as the gateway to an HBase cluster?

    As with the Thrift approach, the REST gateway server ships with HBase and you need to start at least one in order to enable browser interaction with your tables. To do that, just pick a port number for your gateway server and type the following command:

    $INSTALL_DIR/hbase-0.94.7/bin/hbase rest start -p 7777

  • JRuby (HBase Shell): The fastest way to roll up your sleeves and learn to use HBase is via the HBase shell. As you’ve probably already seen in the hands-on example of the HBase shell in the previous section, the shell is a powerful tool for interacting with HBase. The HBase shell is based on JRuby’s Interactive Ruby Shell, or IRB for short.

    Keep in mind, however, that you can also write scripts and execute them in batch mode.
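
To make the REST option above a little more concrete, here is a minimal Python sketch of a non-Java client talking to the REST gateway. It is an illustration under stated assumptions, not a reference implementation. It assumes the gateway listens on port 7777 as in the command shown earlier, that a single row is addressed as /table/rowkey, and that the JSON the gateway returns carries base64-encoded row keys, column names, and values under the keys "Row", "Cell", "column", and "$". The host, the table name users, the row key row1, and the column cf:name are hypothetical placeholders.

```python
import base64
import json
from urllib.parse import quote
from urllib.request import Request, urlopen


def row_url(host, port, table, row_key):
    """Build the URL for one row, assuming the /table/rowkey layout."""
    return f"http://{host}:{port}/{quote(table, safe='')}/{quote(row_key, safe='')}"


def decode_rows(payload):
    """Convert the gateway's JSON into {row_key: {column: value}}.

    Assumes keys, column names, and values arrive base64-encoded and
    hold UTF-8 text, which may not be true for every table.
    """
    def b64(text):
        return base64.b64decode(text).decode("utf-8")

    rows = {}
    for row in payload.get("Row", []):
        cells = {b64(cell["column"]): b64(cell["$"]) for cell in row.get("Cell", [])}
        rows[b64(row["key"])] = cells
    return rows


def fetch_row(host, port, table, row_key):
    """GET a single row as JSON. Needs a running REST gateway."""
    request = Request(row_url(host, port, table, row_key),
                      headers={"Accept": "application/json"})
    with urlopen(request) as response:
        return decode_rows(json.load(response))


if __name__ == "__main__":
    # Hypothetical host, table, and row key; adjust to your own cluster.
    print(fetch_row("localhost", 7777, "users", "row1"))
```

Because decode_rows is separate from the network call, you can try it on a saved response without a cluster. The base64 step is the part most likely to surprise a first-time user of the gateway: without it, the row keys and values look like gibberish.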