Your First Hadoop Program: Hello Hadoop!

After the Hadoop cluster is installed and running, you can run your first Hadoop program. This application is very simple: it calculates the total miles flown by all flights in one year. The year is determined by the data file that the application reads.

To keep things a bit simpler here, you’ll run a Pig script to calculate the total miles flown. You will see the map and reduce phases fly by in the output.

Here is the code for this Pig script:

-- Load the flight records; only Distance needs an explicit type
records = LOAD '2013_subset.csv' USING PigStorage(',') AS
    (Year,Month,DayofMonth,DayOfWeek,DepTime,CRSDepTime,ArrTime,
    CRSArrTime,UniqueCarrier,FlightNum,TailNum,ActualElapsedTime,
    CRSElapsedTime,AirTime,ArrDelay,DepDelay,Origin,Dest,
    Distance:int,TaxiIn,TaxiOut,Cancelled,CancellationCode,
    Diverted,CarrierDelay,WeatherDelay,NASDelay,SecurityDelay,
    LateAircraftDelay);
-- Put every record into a single group so one total can be computed
milage_recs = GROUP records ALL;
tot_miles = FOREACH milage_recs GENERATE SUM(records.Distance);
-- Write the result to HDFS
STORE tot_miles INTO '/user/root/totalmiles';
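To see what the script computes without a cluster, here is a rough Python sketch of the same logic: pull the Distance field out of each record (the map side), then add everything up in a single group (the reduce side). This is only an illustration, not how Pig runs the job. The function names, the DISTANCE_INDEX constant, and the sample rows are made up for this example. The sketch also assumes that values that cannot be read as integers are skipped, in the same way Pig treats a failed int cast as a null that SUM ignores.

```python
import csv

# Distance is the 19th field in the LOAD schema, so its zero-based index is 18.
DISTANCE_INDEX = 18
FIELD_COUNT = 29  # number of fields named in the Pig schema


def map_distance(row):
    """Map side: return the Distance of one record, or None if it is not an integer."""
    try:
        return int(row[DISTANCE_INDEX])
    except (IndexError, ValueError):
        return None


def total_miles(lines):
    """Reduce side: sum every usable Distance, like GROUP ... ALL followed by SUM."""
    rows = csv.reader(lines)
    distances = (map_distance(row) for row in rows)
    return sum(d for d in distances if d is not None)


def make_row(distance):
    """Build a made-up CSV record with placeholder values in every other field."""
    fields = ["0"] * FIELD_COUNT
    fields[DISTANCE_INDEX] = str(distance)
    return ",".join(fields)


# Hypothetical sample data: two valid flights and one record with a missing distance.
sample_lines = [make_row(100), make_row(250), make_row("NA")]

if __name__ == "__main__":
    print(total_miles(sample_lines))
```

To try it on real data, you could pass an open file object for your CSV file to total_miles instead of sample_lines.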

Put this code in a file on your VM. Right-click the desktop of your VM, select Create Document from the contextual menu that appears, and name the document totalmiles.pig, which is the file name the pig command in this article expects. Then open the document in an editor, paste in the code, and save the file.

From the command line, run the Pig script with the following command:

pig totalmiles.pig

You will see many lines of output, then a “Success!” message, followed by more statistics, and finally the command prompt. After your Pig job has completed, you can view the output with the following command:

hdfs dfs -cat /user/root/totalmiles/part-r-00000

Drumroll, please… And the answer is: 775009272

And with that, you’ve run your first Hadoop application!
