How to Launch a MapReduce Application in Hadoop 1
To see how the JobTracker and TaskTracker work together to run a MapReduce application, walk through the execution of one from start to finish. The figure shows the interactions, and the following step list lays out the play-by-play:
The client application submits an application request to the JobTracker.
The JobTracker determines how many processing resources are needed to execute the entire application.
It does this by requesting from the NameNode the names and locations of the files and data blocks the application needs, and then calculating how many map tasks and reduce tasks are needed to process all this data.
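At a high level, the map-task count is a ceiling division of the input size by the HDFS block size (64 MB by default in Hadoop 1). The sketch below shows only that arithmetic; the real calculation also accounts for configured split sizes and whether the input format is splittable.

```java
// Toy sketch: estimate the map-task count the way the JobTracker does at a
// high level -- one map task per HDFS block of input (64 MB default in Hadoop 1).
public class SplitMath {
    static long mapTasks(long inputBytes, long blockBytes) {
        // Ceiling division: a partial final block still needs its own map task.
        return (inputBytes + blockBytes - 1) / blockBytes;
    }

    public static void main(String[] args) {
        long blockSize = 64L << 20;                          // 64 MB, the Hadoop 1 default
        System.out.println(mapTasks(200L << 20, blockSize)); // 200 MB input -> prints 4
    }
}
```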
The JobTracker looks at the state of the slave nodes and queues all the map tasks and reduce tasks for execution.
As processing slots become available on the slave nodes, map tasks are deployed to the slave nodes.
Wherever possible, a map task is assigned to a node where its block of data is stored, so the data does not have to be moved across the network.
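The locality preference can be pictured with a toy chooser (hypothetical helper names; the JobTracker's actual scheduling logic is considerably more involved): given the hosts that hold a block's replicas and the hosts with free slots, pick a data-local host when one is available.

```java
import java.util.List;
import java.util.Set;

// Toy locality-aware assignment: prefer a node that stores the block locally;
// otherwise fall back to any node with a free slot. (Hypothetical sketch, not
// the JobTracker's actual scheduler.)
public class LocalityPick {
    static String pickNode(List<String> replicaHosts, Set<String> freeHosts) {
        for (String host : replicaHosts) {
            if (freeHosts.contains(host)) {
                return host;                // data-local assignment
            }
        }
        return freeHosts.iterator().next(); // remote read as a last resort
    }

    public static void main(String[] args) {
        Set<String> free = Set.of("node2", "node3");
        // node1 is busy, but node3 holds a replica and has a free slot.
        System.out.println(pickNode(List.of("node1", "node3"), free)); // prints node3
    }
}
```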
The JobTracker monitors task progress, and in the event of a task failure or a node failure, the task is restarted on the next available slot.
If the same task fails four times (the default limit, which can be customized), the whole job fails.
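In Hadoop 1 these retry limits are exposed as job configuration properties; a sketch of raising them in a job's configuration (these are the Hadoop 1 property names, later superseded by `mapreduce.*` equivalents in Hadoop 2):

```xml
<!-- Raise the per-task retry limit from the default of 4 to 6. -->
<property>
  <name>mapred.map.max.attempts</name>
  <value>6</value>
</property>
<property>
  <name>mapred.reduce.max.attempts</name>
  <value>6</value>
</property>
```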
After the map tasks are finished, reduce tasks process the intermediate result sets produced by the map tasks.
The result set is returned to the client application.
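The overall data flow can be pictured with an in-memory word count. This is a toy simulation of the map, shuffle, and reduce phases, not the Hadoop API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy simulation of MapReduce word count: map emits (word, 1) pairs, the
// shuffle groups them by key, and reduce sums each group. Not the Hadoop API.
public class ToyWordCount {
    static Map<String, Integer> run(List<String> lines) {
        // Map phase: emit one (word, 1) pair per word.
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            for (String word : line.split("\\s+")) {
                pairs.add(Map.entry(word, 1));
            }
        }
        // Shuffle + reduce: group the pairs by word and sum the counts.
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("to be or not", "to be"))); // prints {be=2, not=1, or=1, to=2}
    }
}
```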
More complicated applications can have multiple rounds of map/reduce phases, where the result of one round is used as input for the next. This is quite common with SQL-style workloads that involve, for example, join and group-by operations.