Importing Data into Hive with Sqoop - dummies

Importing Data into Hive with Sqoop

By Dirk deRoos

Here, you import all of the Service Order Database directly from MySQL into Hive and run a HiveQL query against the newly imported database on Apache Hadoop. The following listing shows you how it’s done.

hive> create database serviceorderdb;
OK
Time taken: 1.343 seconds
hive> use serviceorderdb;
OK
Time taken: 0.062 seconds
$ sqoop import --connect jdbc:mysql://localhost/serviceorderdb 
   --username root -P 
   --table productinfo 
   --hive-import 
   --hive-table serviceorderdb.productinfo -m 1
Enter password:
...
13/08/16 15:17:08 INFO hive.HiveImport: Hive import complete.
$ sqoop import --connect jdbc:mysql://localhost/serviceorderdb 
   --username root -P 
   --table customercontactinfo 
   --hive-import 
   --hive-table serviceorderdb.customercontactinfo -m 1
Enter password:
...
13/08/16 17:21:35 INFO hive.HiveImport: Hive import complete.
$ sqoop import --connect jdbc:mysql://localhost/serviceorderdb 
   --username root -P 
   --table serviceorders 
   --hive-import 
   --hive-table serviceorderdb.serviceorders -m 1
Enter password:
...
13/08/16 17:26:56 INFO hive.HiveImport: Hive import complete.

When the import operations are complete, you run the show tables command to list the newly imported tables (see the following listing):

hive> show tables;
OK
customercontactinfo
productinfo
serviceorders
Time taken: 0.074 seconds

Then run a Hive query to show which Apache Hadoop technologies have open service orders in the database:

hive> SELECT productdesc FROM productinfo
   > INNER JOIN serviceorders
   > ON productinfo.productnum = serviceorders.productnum;
...
OK
HBase Support Product
Hive Support Product
Sqoop Support Product
Pig Support Product
Time taken: 28.552 seconds

You can confirm the results. You have four open service orders on the products in bold. The Sqoop Hive import operation worked, and now the service company can leverage Hive to query, analyze, and transform its service order structured data.

Additionally, the company can now combine its relational data with other data types (perhaps unstructured) as part of any new Hadoop analytics applications. Many possibilities now exist with Apache Hadoop being part of the overall IT strategy!