Hadoop and Hive - dummies

By Dirk deRoos

To make a long story short, Hive provides Hadoop with a bridge to the RDBMS world and provides an SQL dialect known as Hive Query Language (HiveQL), which can be used to perform SQL-like tasks. That’s the big news, but there’s more to Hive than meets the eye, as they say, or more applications of this new technology than you can present in a standard elevator pitch.

For example, Hive also makes possible the concept known as enterprise data warehouse (EDW) augmentation, a leading use case for Apache Hadoop. (Data warehouses are typically set up as RDBMSs built specifically for data analysis and reporting.)

Now, some experts will argue that Hadoop (with Hive, HBase, Sqoop, and its assorted buddies) can replace the EDW. However, the view taken here is that Apache Hadoop is a great addition to the enterprise and that it can augment and complement existing EDWs. Hive, HBase, and Sqoop enable EDW augmentation.

Closely associated with RDBMS/EDW technology is extract, transform, and load (ETL) technology. To grasp what ETL does, it helps to know that, in many use cases, data cannot be immediately loaded into the relational database — it must first be extracted from its native source, transformed into an appropriate format, and then loaded into the RDBMS or EDW.

For example, a company or an organization might extract unstructured text data from an Internet forum, transform the data into a structured format that’s both valuable and useful, and then load the structured data into its EDW.
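To make the extract, transform, and load steps concrete, here is a minimal Python sketch of the forum example. Everything in it is an assumption made for illustration: the sample posts, the `user=... | date | text` line layout, the `posts` table, and the use of an in-memory SQLite database as a stand-in for the EDW. It is not how Hive or any particular ETL product does the job.

```python
import re
import sqlite3

# Hypothetical raw forum posts: one free-text line per post.
RAW_POSTS = [
    "user=alice | 2014-03-01 | Hive makes SQL on Hadoop easy!",
    "user=bob | 2014-03-02 | Pig or Hive for ETL?",
    "malformed line without fields",
]

# Assumed line layout: user=<name> | <YYYY-MM-DD> | <post text>
LINE_PATTERN = re.compile(r"user=(\w+) \| (\d{4}-\d{2}-\d{2}) \| (.+)")


def extract(lines):
    """Extract: pull raw lines from the source (here, an in-memory list)."""
    for line in lines:
        yield line.strip()


def transform(lines):
    """Transform: parse each line into a structured row, skipping lines that do not parse."""
    for line in lines:
        match = LINE_PATTERN.fullmatch(line)
        if match is None:
            continue
        author, posted_on, body = match.groups()
        yield (author, posted_on, body, len(body.split()))


def load(rows, conn):
    """Load: insert the structured rows into a relational table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS posts "
        "(author TEXT, posted_on TEXT, body TEXT, word_count INTEGER)"
    )
    conn.executemany("INSERT INTO posts VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def run_etl(lines, conn):
    """Run the three steps in order and return the number of rows in the table."""
    load(transform(extract(lines)), conn)
    return conn.execute("SELECT COUNT(*) FROM posts").fetchone()[0]


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")  # stand-in for the EDW
    print(run_etl(RAW_POSTS, connection))
```

The three functions are kept separate on purpose: each step can be swapped out (a different source, a different parser, a different target database) without touching the other two.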

You can see that Hive is a powerful ETL tool in its own right, along with the major player in this realm: Apache Pig. Again, users may try to set up Hive and Pig as the new ETL tools for the data center. (Let them try.)

As with the debate over EDW versus Apache Hadoop, these Apache Hadoop technologies are not direct replacements for existing ETL tools but instead are powerful new ETL tools to be used when appropriate.

Last but not least, Apache Hive gives you powerful analytical tools, all within the framework of HiveQL. These tools should look and feel quite familiar to IT professionals who understand how to use SQL.
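To give a feel for what "familiar to SQL users" means, here is a small Python sketch that runs an ordinary aggregate query. The `posts` table, its columns, and the sample rows are hypothetical, and SQLite is used only so that the example runs without a Hadoop cluster. The query uses nothing beyond basic `SELECT`, `GROUP BY`, and `ORDER BY`, so HiveQL should accept essentially the same statement, though that has not been checked against a Hive installation here.

```python
import sqlite3

# Plain SQL: count posts and average their length per author.
# Assumption: the same statement is expected to work as HiveQL.
QUERY = """
SELECT author, COUNT(*) AS post_count, AVG(word_count) AS avg_words
FROM posts
GROUP BY author
ORDER BY post_count DESC, author
"""


def summarize(rows):
    """Load (author, word_count) rows into a temporary table and run QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE posts (author TEXT, word_count INTEGER)")
    conn.executemany("INSERT INTO posts VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    sample = [("alice", 6), ("bob", 5), ("alice", 4)]
    for row in summarize(sample):
        print(row)
```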