Take HBase for a Test Run - dummies

By Dirk deRoos

Here, you find out how to download and deploy HBase in standalone mode. Installing HBase and starting to use it is simple. Keep in mind that HBase is typically deployed on a cluster of commodity servers, but you can also deploy it in a standalone configuration for learning or demonstration purposes.

Like Apache Hadoop, HBase primarily supports Linux, but you can use Windows in non-production environments if you first install Cygwin, which gives Microsoft Windows users a Unix shell with its commands and utilities. Whichever platform you choose, follow the HBase Quick Start Guide and download the latest HBase release.

You get to choose where to install HBase. Before you can start HBase, though, you need to edit a couple of files in its conf directory. The first file, hbase-site.xml, is shown in the following listing. The changes you'll want to make are the property elements inside the configuration element:

<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file:///home/biadmin/my-local-hbase/hbase-data</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2222</value>
    <description>Property from ZooKeeper's config zoo.cfg.
      The port at which the clients will connect.
    </description>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/home/biadmin/my-local-hbase/zookeeper</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>bivm</value>
  </property>
</configuration>

The hbase.rootdir property specifies a directory in the local file system where HBase stores its data. In production environments, this property would point to HDFS for the data store. Setting hbase.cluster.distributed to true starts HBase in pseudo-distributed mode. For the sake of illustration, pseudo-distributed mode causes HBase to start a RegionServer instance, a MasterServer instance, and a Zookeeper process, each as a separate process on the same machine.

Additionally, you need to specify the directory where Zookeeper will store its data (hbase.zookeeper.property.dataDir) and a list of servers on which Zookeeper will run to form a quorum (hbase.zookeeper.quorum). For a single-machine install, you specify only the single Zookeeper server.
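After editing the file, it can help to read a property back and confirm that HBase will see the value you intended. The following is a minimal sketch, not an HBase tool: get_hbase_property is a hypothetical helper name, and it assumes the file keeps each name and value tag on its own line, as in the listing above.

```shell
# Hypothetical helper (not part of HBase): print the <value> that follows a
# given <name> in an hbase-site.xml-style file.
# Assumes one tag per line, with <value> appearing after its <name>.
# Usage: get_hbase_property PROPERTY_NAME FILE
get_hbase_property() {
  awk -v n="$1" '
    # Remember that we have seen the exact <name> we are looking for.
    index($0, "<name>" n "</name>") { found = 1; next }
    # The first <value> line after it holds the setting; strip the tags.
    found && /<value>/ {
      sub(/.*<value>/, "")
      sub(/<\/value>.*/, "")
      print
      exit
    }
  ' "$2"
}
```

For example, running get_hbase_property hbase.zookeeper.quorum conf/hbase-site.xml from the HBase install directory should print the host name you entered. The helper prints nothing when the property is absent.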

Getting started with HBase in standalone mode is very straightforward in part because HBase manages Zookeeper for you. You can download a separate Zookeeper release and point HBase to it, but for standalone installs, you’ll find it much easier to let HBase manage Zookeeper for you.

To crystallize the decision to let HBase manage Zookeeper for you, here’s how to set an environment variable in another HBase file, hbase-env.sh. The following listing shows what needs to be added:

# Tell HBase whether it should manage its own instance of Zookeeper or not.
export HBASE_MANAGES_ZK=true
# The java implementation to use. Java 1.6 required.
export JAVA_HOME=/opt/ibm/biginsights/jdk

Make sure that JAVA_HOME points to your chosen JDK. Finally, you need to specify the name of your Linux system in another file, regionservers. (In a fully distributed production environment, this file would list, one per line, all servers on which HBase can start the RegionServer process.)
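As a rough illustration of that last step, the sketch below writes the local host name into a regionservers file. CONF_DIR is an assumed variable name for your HBase conf directory; by default the sketch writes to a scratch directory so that it never overwrites a real file.

```shell
# Assumption: CONF_DIR points at your HBase conf directory. A scratch
# directory is used by default so the sketch never touches a real install.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"

# `uname -n` prints this machine's network node name (its host name).
uname -n > "$CONF_DIR/regionservers"

# Show the result: a single host name on a single line.
cat "$CONF_DIR/regionservers"
```

To apply it to a real install, set CONF_DIR to the conf directory of your HBase release before running the lines above.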

You can now start up HBase and test your install. To start HBase, use the start-hbase.sh script as spelled out in the following listing.

$ cd $INSTALL_DIR/hbase-0.94.7/bin
$ ./start-hbase.sh
bivm: starting zookeeper, logging to /home/biadmin/my-local-hbase/hbase-0.94.7/bin/../logs/hbase-biadmin-zookeeper-bivm.out
starting master, logging to /home/biadmin/my-local-hbase/hbase-0.94.7/bin/../logs/hbase-biadmin-master-bivm.out
localhost: starting regionserver, logging to /home/biadmin/my-local-hbase/hbase-0.94.7/bin/../logs/hbase-biadmin-regionserver-bivm.out

Note that the cd (change directory) command on the first line relies on an environment variable, INSTALL_DIR. You have to set that variable to the directory that contains your HBase install, or type out the full path instead.
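One way to set the variable is sketched below. The directory is an assumption taken from the log paths in the listing above, and HBASE_BIN is only an illustrative variable name; substitute the location where you actually unpacked HBase.

```shell
# Assumption: HBase 0.94.7 was unpacked under /home/biadmin/my-local-hbase,
# the parent directory that appears in the log paths of the listing.
export INSTALL_DIR=/home/biadmin/my-local-hbase

# The directory that the cd command in the listing resolves to.
HBASE_BIN="$INSTALL_DIR/hbase-0.94.7/bin"
echo "$HBASE_BIN"
```

Adding the export line to your shell profile keeps the variable available in later sessions.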

Next, use the JConsole tool, which comes bundled with Java, to check which processes are running after the script finishes. You can start JConsole by typing the following command: $JAVA_HOME/bin/jconsole.

JConsole reveals that the three processes that the script claimed to start are indeed running — the zookeeper, the master, and the RegionServer processes.

image0.jpg
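If you prefer the command line, the jps tool that ships with the JDK lists running Java processes by main class name. The sketch below is a hypothetical helper, not part of HBase or the JDK: it reads jps-style output on standard input and reports whether the three daemons appear. It assumes the usual class names for an HBase-managed setup: HQuorumPeer for Zookeeper, HMaster, and HRegionServer.

```shell
# Hypothetical helper: read `jps`-style output ("<pid> <MainClass>" per line)
# on stdin and report whether each expected HBase daemon is listed.
# Returns 0 when all three are present, 1 otherwise.
check_hbase_daemons() {
  input=$(cat)
  status=0
  for daemon in HQuorumPeer HMaster HRegionServer; do
    # -w matches whole words only, so HMaster does not match a longer name.
    if printf '%s\n' "$input" | grep -qw "$daemon"; then
      echo "$daemon: running"
    else
      echo "$daemon: NOT found"
      status=1
    fi
  done
  return "$status"
}
```

A typical call would be "$JAVA_HOME/bin/jps" | check_hbase_daemons, run as the same user that started HBase.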

To put HBase through its paces, you interact with all three HBase processes, starting with the MasterServer. By default, the MasterServer reports on the system status by way of a browser user interface on port number 60010. In the example, you can confirm that the MasterServer is running correctly by entering the following URL in a web browser: http://bivm:60010/. Doing so brings up the information you see here.

image1.jpg
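The same check can be scripted when no browser is at hand. The following is a minimal sketch that assumes curl is installed; master_ui_url and check_master_ui are hypothetical helper names, and the default port is the 60010 mentioned above.

```shell
# Hypothetical helper: build the MasterServer status URL for a host.
# The port defaults to 60010, the value used in this example.
master_ui_url() {
  printf 'http://%s:%s/\n' "$1" "${2:-60010}"
}

# Hypothetical helper: succeed only if the status page answers without an
# HTTP error within five seconds. Requires curl.
check_master_ui() {
  curl --silent --fail --output /dev/null --max-time 5 "$(master_ui_url "$@")"
}
```

For example, check_master_ui bivm && echo "master UI is up" prints the message only when the page at http://bivm:60010/ responds.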