By Dirk deRoos

The first Hive client is the Hive command-line interface (CLI). To master the finer points of the Hive CLI client, it might help to review the (somewhat busy-looking) Hive architecture.

[Figure: The Apache Hive architecture.]

In the second figure, the architecture is streamlined to focus only on the components that are required when running the CLI.

[Figure: The Hive components used when running the CLI.]

These are the components of Hive that are needed when running the CLI on a Hadoop cluster. Here, you run Hive in local mode, which uses local storage, rather than the HDFS, for your data.

To run the Hive CLI, you execute the hive command and specify the CLI as the service you want to run. In the following listing, you can see the command that’s required as well as some of our first HiveQL statements. (A steps annotation using the A-B-C model is included in the listing to direct your attention to the key commands.)

(A) $ $HIVE_HOME/bin/hive --service cli
(B) hive> set hive.cli.print.current.db=true;
(C) hive (default)> CREATE DATABASE ourfirstdatabase;
OK
Time taken: 3.756 seconds
(D) hive (default)> USE ourfirstdatabase;
OK
Time taken: 0.039 seconds
(E) hive (ourfirstdatabase)> CREATE TABLE our_first_table (
                       > FirstName       STRING,
                       > LastName        STRING,
                       > EmployeeId      INT);
OK
Time taken: 0.043 seconds
hive (ourfirstdatabase)> quit;
(F) $ ls /home/biadmin/Hive/warehouse/ourfirstdatabase.db
our_first_table

The first command (see Step A) starts the Hive CLI using the $HIVE_HOME environment variable. The --service cli command-line option directs the Hive system to start the command-line interface, though you could have chosen other services.

Next, in Step B, you tell the Hive CLI to print your current working database so that you know where you are in the namespace. (This statement will make sense after we explain how to use the next command, so hold tight.)

In Step C, you use HiveQL’s data definition language (DDL) to create your first database. (Remember that databases in Hive are simply namespaces where particular tables reside; because a set of tables can be thought of as a database or schema, you could have used the keyword SCHEMA in place of DATABASE to accomplish the same result.)
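For example, because SCHEMA is a synonym for DATABASE in HiveQL DDL, the following statement would create the same namespace (the IF NOT EXISTS clause is added here so the statement is safe to run even after the listing above has already created the database):

```sql
-- SCHEMA is interchangeable with DATABASE in Hive DDL;
-- this is equivalent to CREATE DATABASE IF NOT EXISTS ourfirstdatabase;
CREATE SCHEMA IF NOT EXISTS ourfirstdatabase;
```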

More specifically, you’re using DDL to tell the system to create a database called ourfirstdatabase and then to make this database the default for subsequent HiveQL DDL commands using the USE command in Step D. In Step E, you create your first table and give it the (quite appropriate) name our_first_table.

(Until now, you may have noticed that all this looks a lot like SQL, with perhaps a few minor differences in syntax depending on which RDBMS you’re accustomed to; you would have been right.) The last command, in Step F, carries out a directory listing of your chosen Hive warehouse directory so that you can see that our_first_table has in fact been stored on disk.

You set the hive.metastore.warehouse.dir variable to point to the local directory /home/biadmin/Hive/warehouse in your Linux virtual machine rather than use the HDFS as you would on a proper Hadoop cluster.
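If you want to reproduce this local-mode setup, one common way to set that variable is in Hive’s hive-site.xml configuration file; the fragment below is a sketch using the same local path as this example (on a real cluster, the value would be an HDFS path instead):

```xml
<!-- hive-site.xml: point the Hive warehouse at local storage for local mode -->
<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/home/biadmin/Hive/warehouse</value>
</property>
```

Alternatively, the same property can be set for a single session from the Hive CLI with a SET statement.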

After you’ve created a table, it’s interesting to view the table’s metadata. In production environments, you might have dozens of tables or more, so it’s helpful to be able to review the table structures from time to time. You can do this with HiveQL commands from the Hive CLI, but the Hive Web Interface (HWI) Server provides a helpful interface for this type of operation.
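From the CLI, for instance, the metadata for the table created earlier can be reviewed with standard HiveQL statements like these:

```sql
-- List the tables in the current database
SHOW TABLES;

-- Display the column names and data types for a table
DESCRIBE our_first_table;

-- Add FORMATTED to also see the storage location, owner, and other details
DESCRIBE FORMATTED our_first_table;
```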

Using the HWI Server instead of the CLI can also be more secure. Careful consideration must be given to using the CLI in production environments, because the machine running the CLI must have access to the entire Hadoop cluster.

Therefore, system administrators typically put tools like the secure shell (ssh) in place to provide controlled, secure access to the machine running the CLI, as well as to provide network encryption. When the HWI Server is employed, however, users can access only the Hive data that the HWI Server exposes, via their web browsers.