YARN’s Node Manager in Hadoop

By Dirk deRoos

Each slave node in Yet Another Resource Negotiator (YARN) runs a Node Manager daemon, which acts as a slave to the Resource Manager. As with the TaskTracker in Hadoop 1, each slave node hosts both a processing service (the Node Manager) and a storage service (the DataNode); together, these two services enable Hadoop to function as a distributed system.

Each Node Manager tracks the available data processing resources on its slave node and sends regular reports, known as heartbeats, to the Resource Manager.
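
You can watch these reports from the other side by asking the Resource Manager for its view of the cluster. The following Java sketch, which assumes a running cluster whose yarn-site.xml is on the classpath, uses the standard YarnClient API to print each running node's total and used resources as last reported by its Node Manager.

    import java.util.List;
    import org.apache.hadoop.yarn.api.records.NodeReport;
    import org.apache.hadoop.yarn.api.records.NodeState;
    import org.apache.hadoop.yarn.client.api.YarnClient;
    import org.apache.hadoop.yarn.conf.YarnConfiguration;

    public class NodeReportPrinter {
      public static void main(String[] args) throws Exception {
        // Reads yarn-site.xml from the classpath to locate the Resource Manager
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(new YarnConfiguration());
        yarnClient.start();

        // Each report reflects what that node's Node Manager last sent to the
        // Resource Manager in its heartbeat
        List<NodeReport> reports = yarnClient.getNodeReports(NodeState.RUNNING);
        for (NodeReport report : reports) {
          System.out.printf("%s: %d of %d MB used, %d vcores%n",
              report.getNodeId(),
              report.getUsed().getMemory(),
              report.getCapability().getMemory(),
              report.getCapability().getVirtualCores());
        }
        yarnClient.stop();
      }
    }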

The processing resources in a Hadoop cluster are consumed in bite-size pieces called containers. A container is a collection of all the resources necessary to run an application: CPU cores, memory, network bandwidth, and disk space. A deployed container runs as an individual process on a slave node in a Hadoop cluster.
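
To make this concrete, here is a minimal sketch of how an application asks for a container using the AMRMClient API. In Hadoop 2, the schedulable resources are memory and virtual cores; the amounts and the priority value below are illustrative, and a real application master would register with the Resource Manager before submitting the request.

    import org.apache.hadoop.yarn.api.records.Priority;
    import org.apache.hadoop.yarn.api.records.Resource;
    import org.apache.hadoop.yarn.client.api.AMRMClient;
    import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

    public class ContainerRequestExample {
      public static void main(String[] args) {
        // Describe the container's resources: 1,024 MB of memory, 1 virtual core
        Resource capability = Resource.newInstance(1024, 1);

        // No node or rack placement constraint (null, null); priority 0 is arbitrary
        ContainerRequest request =
            new ContainerRequest(capability, null, null, Priority.newInstance(0));

        // A real application master would init(), start(), and register this
        // client with the Resource Manager before adding the request
        AMRMClient<ContainerRequest> amRMClient = AMRMClient.createAMRMClient();
        amRMClient.addContainerRequest(request);
      }
    }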

The concept of a container may remind you of a slot, the unit of processing used by the JobTracker and TaskTracker, but they have some notable differences. Most significantly, containers are generic and can run whatever application logic they’re given, unlike slots, which are specifically defined to run either map or reduce tasks. Also, containers can be requested with custom amounts of resources, while slots are all uniform.

As long as the requested resources fall within the minimum and maximum bounds configured for containers, the Resource Manager will grant and schedule that container. If the requested amount of memory is not an exact multiple of the minimum allocation, the scheduler rounds the request up to the nearest multiple.
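
For example, with the default minimum allocation of 1,024 MB (set by yarn.scheduler.minimum-allocation-mb), a request for 1,500 MB would be rounded up to 2,048 MB. The helper below is hypothetical, not part of the YARN API; it simply mirrors that round-up arithmetic.

    public class AllocationRounding {
      // Hypothetical helper mirroring how the scheduler normalizes a memory
      // request: round up to the nearest multiple of the minimum allocation
      static int normalize(int requestedMb, int minimumMb) {
        return ((requestedMb + minimumMb - 1) / minimumMb) * minimumMb;
      }

      public static void main(String[] args) {
        System.out.println(normalize(1500, 1024)); // prints 2048
      }
    }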

All container processes running on a slave node are initially provisioned, monitored, and tracked by that slave node’s Node Manager daemon.
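
To show where the Node Manager enters the picture, the sketch below uses the NMClient API to start a granted container. The Container object is assumed to come from an earlier AMRMClient allocation, and the launch command is illustrative; the Node Manager on that slave node then provisions the process and monitors it for the life of the container.

    import java.util.Collections;
    import org.apache.hadoop.yarn.api.records.Container;
    import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
    import org.apache.hadoop.yarn.client.api.NMClient;
    import org.apache.hadoop.yarn.conf.YarnConfiguration;

    public class ContainerLauncher {
      // 'container' is assumed to be one already granted by the Resource Manager
      static void launch(Container container) throws Exception {
        NMClient nmClient = NMClient.createNMClient();
        nmClient.init(new YarnConfiguration());
        nmClient.start();

        // The launch context carries the command line; a real application would
        // also set local resources, environment variables, and security tokens
        ContainerLaunchContext ctx = ContainerLaunchContext.newInstance(
            null, null, Collections.singletonList("sleep 60"), null, null, null);

        // From here on, this slave node's Node Manager provisions, monitors,
        // and tracks the container process
        nmClient.startContainer(container, ctx);
      }
    }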