How to Use Hive’s Create Table As Select (CTAS)

By Dirk deRoos

In the Hive DML example shown here, the powerful technique in Hive known as Create Table As Select, or CTAS is illustrated. Its constructs allow you to quickly derive Hive tables from other tables as you build powerful schemas for big data analysis.

(A) hive> CREATE TABLE myflightinfo2007 AS
    > SELECT Year, Month, DepTime, ArrTime, FlightNum, Origin, Dest FROM FlightInfo2007
    > WHERE (Month = 7 AND DayofMonth = 3) AND (Origin='JFK' AND Dest='ORD');
(B) hive> SELECT * FROM myFlightInfo2007;
OK
2007    7       700     834     5447    JFK     ORD
2007    7       1633    1812    5469    JFK     ORD
2007    7       1905    2100    5492    JFK     ORD
2007    7       1453    1624    4133    JFK     ORD
2007    7       1810    1956    4392    JFK     ORD
2007    7       643     759     903     JFK     ORD
2007    7       939     1108    907     JFK     ORD
2007    7       1313    1436    915     JFK     ORD
2007    7       1617    1755    917     JFK     ORD
2007    7       2002    2139    919     JFK     ORD
Time taken: 0.089 seconds, Fetched: 10 row(s)
hive> CREATE TABLE myFlightInfo2008 AS
    > SELECT Year, Month, DepTime, ArrTime, FlightNum, Origin, Dest FROM FlightInfo2008
    > WHERE (Month = 7 AND DayofMonth = 3) AND (Origin='JFK' AND Dest='ORD');
hive> SELECT * FROM myFlightInfo2008;
OK
2008    7       930     1103    5199    JFK     ORD
2008    7       705     849     5687    JFK     ORD
2008    7       1645    1914    5469    JFK     ORD
2008    7       1345    1514    4392    JFK     ORD
2008    7       1718    1907    1217    JFK     ORD
2008    7       757     929     1323    JFK     ORD
2008    7       928     1057    907     JFK     ORD
2008    7       1358    1532    915     JFK     ORD
2008    7       1646    1846    917     JFK     ORD
2008    7       2129    2341    919     JFK     ORD
Time taken: 0.186 seconds, Fetched: 10 row(s)

In Step A, you build two smaller tables derived from the FlightInfo2007 and FlightInfo2008 by selecting a subset of fields from the larger tables for a particular day (in this case, July 3), where the origin of the flight is New York’s JFK airport (JFK) and the destination is Chicago’s O’Hare airport (ORD).

Then in Step B you simply dump the contents of these small tables so that you can view the data.