YARN’s Resource Manager - dummies

By Dirk deRoos

The core component of YARN (Yet Another Resource Negotiator) is the Resource Manager, which governs all the data processing resources in the Hadoop cluster. Simply put, the Resource Manager is a dedicated scheduler that assigns resources to requesting applications. Its only tasks are maintaining a global view of all resources in the cluster, handling resource requests, scheduling those requests, and assigning resources to the requesting applications.
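
To make those four duties concrete, here is a minimal Python sketch of a toy Resource Manager. It is a teaching model under stated assumptions, not YARN's actual code: the class names (Node, ResourceRequest, ToyResourceManager), the first-fit placement rule, and the memory and vcore figures are all hypothetical. Notice that a request carries nothing but an application ID and an amount of memory and CPU.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One worker node as seen by the (hypothetical) toy Resource Manager."""
    name: str
    free_memory_mb: int
    free_vcores: int


@dataclass
class ResourceRequest:
    """A request holds only an opaque app ID and a resource amount.

    There is deliberately no notion of map or reduce tasks here.
    """
    app_id: str
    memory_mb: int
    vcores: int


@dataclass
class ToyResourceManager:
    """Hypothetical model: a global view of node capacity plus a pending queue."""
    nodes: List[Node]
    pending: List[ResourceRequest] = field(default_factory=list)

    def total_free(self):
        # Global view: sum the free capacity across every node in the cluster.
        return (sum(n.free_memory_mb for n in self.nodes),
                sum(n.free_vcores for n in self.nodes))

    def submit(self, request):
        # Handle a resource request by queuing it for the next scheduling pass.
        self.pending.append(request)

    def schedule(self):
        """Assign each pending request to the first node that can fit it.

        Returns a list of (app_id, node_name) assignments; requests that
        fit nowhere stay in the pending queue.
        """
        assignments = []
        still_pending = []
        for req in self.pending:
            node = next((n for n in self.nodes
                         if n.free_memory_mb >= req.memory_mb
                         and n.free_vcores >= req.vcores), None)
            if node is None:
                still_pending.append(req)
                continue
            # Assign: deduct the granted resources from the chosen node.
            node.free_memory_mb -= req.memory_mb
            node.free_vcores -= req.vcores
            assignments.append((req.app_id, node.name))
        self.pending = still_pending
        return assignments


rm = ToyResourceManager([Node("node1", 4096, 4), Node("node2", 2048, 2)])
rm.submit(ResourceRequest("app-1", 3072, 2))
rm.submit(ResourceRequest("app-2", 2048, 2))
rm.submit(ResourceRequest("app-3", 2048, 2))
print(rm.schedule())    # → [('app-1', 'node1'), ('app-2', 'node2')]
print(rm.total_free())  # → (1024, 2)
```

In this run, app-3 simply waits in the pending queue until enough capacity frees up; the model never asks what kind of work app-3 will do.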

The Resource Manager, a critical component in a Hadoop cluster, should run on a dedicated master node.

Although the Resource Manager is essentially a pure scheduler, it delegates the actual scheduling logic to pluggable scheduler modules. You can choose from the same schedulers that were available in Hadoop 1, all of which have been updated to work with YARN: FIFO (first in, first out), Capacity, or Fair Share.
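
The plug-in idea can be sketched as a strategy object that the scheduler consults to decide who goes next. In a real cluster, you pick the scheduler through the yarn.resourcemanager.scheduler.class property in yarn-site.xml (for example, org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler). The Python below is only a hypothetical model: SchedulingPolicy, FifoPolicy, FairPolicy, and run are invented names, the "fair" rule is simplified to "serve the application holding the least memory," and Capacity scheduling isn't modeled at all.

```python
from abc import ABC, abstractmethod
from collections import defaultdict


class SchedulingPolicy(ABC):
    """Hypothetical plug-in interface: decides which pending request is served next."""

    @abstractmethod
    def pick(self, pending, allocated_mb):
        """Return the index of the next request to serve from `pending`."""


class FifoPolicy(SchedulingPolicy):
    def pick(self, pending, allocated_mb):
        # First in, first out: always serve the oldest request.
        return 0


class FairPolicy(SchedulingPolicy):
    def pick(self, pending, allocated_mb):
        # Simplified fairness: serve the request whose application currently
        # holds the least memory; min() keeps the earliest index on ties.
        return min(range(len(pending)),
                   key=lambda i: allocated_mb[pending[i][0]])


def run(policy, pending, capacity_mb):
    """Grant requests in the order the policy chooses until memory runs out.

    `pending` is a list of (app_id, memory_mb) tuples in submission order.
    Returns the app IDs in the order their requests were granted.
    """
    pending = list(pending)
    allocated_mb = defaultdict(int)
    grants = []
    while pending:
        i = policy.pick(pending, allocated_mb)
        app_id, mem = pending[i]
        if mem > capacity_mb:
            break  # the chosen request doesn't fit; stop this round
        pending.pop(i)
        capacity_mb -= mem
        allocated_mb[app_id] += mem
        grants.append(app_id)
    return grants


requests = [("big", 1024), ("big", 1024), ("big", 1024), ("small", 1024)]
print(run(FifoPolicy(), requests, 2048))  # → ['big', 'big']
print(run(FairPolicy(), requests, 2048))  # → ['big', 'small']
```

The scheduling loop in run never changes; swapping the policy object alone changes who gets served, which is the point of keeping scheduling logic in a separate module.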

The Resource Manager is completely agnostic with regard to both applications and frameworks — it doesn’t have any dogs in those particular hunts, in other words. It has no concept of map or reduce tasks, it doesn’t track the progress of jobs or their individual tasks, and it doesn’t handle failovers.

In short, the Resource Manager is a complete departure from the JobTracker daemon of Hadoop 1 environments. What the Resource Manager does do is schedule workloads, and it does that job well.

This strict separation of duties (concentrating on one aspect while ignoring everything else) is exactly what makes YARN much more scalable, able to provide a generic platform for applications, and able to support a multi-tenant Hadoop cluster, meaning that different business units can share the same Hadoop cluster.
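
One way to picture multi-tenant sharing is max-min fairness: each business unit gets an equal slice of the cluster, and whatever a unit doesn't need is redistributed among the others. The sketch below is a simplified Python model that assumes a single resource (memory, in GB) and made-up tenant names and demands. YARN's Fair Scheduler works with queues, weights, and more than one resource type, so treat this as an illustration of the idea, not as that scheduler's algorithm.

```python
def max_min_fair_share(capacity, demands):
    """Split `capacity` among tenants using max-min fairness (progressive filling).

    `demands` maps a (hypothetical) tenant name to the amount it requests.
    Each round splits the remaining capacity evenly among tenants that still
    want more; tenants needing less than their slice take only what they need.
    """
    shares = {tenant: 0.0 for tenant in demands}
    unsatisfied = {t for t, d in demands.items() if d > 0}
    remaining = float(capacity)
    while unsatisfied and remaining > 1e-9:
        equal_slice = remaining / len(unsatisfied)
        for tenant in sorted(unsatisfied):
            # Never grant more than the tenant still needs.
            grant = min(equal_slice, demands[tenant] - shares[tenant])
            shares[tenant] += grant
            remaining -= grant
        unsatisfied = {t for t in unsatisfied if demands[t] - shares[t] > 1e-9}
    return shares


shares = max_min_fair_share(100, {"marketing": 20, "finance": 50, "research": 60})
print({t: round(v, 2) for t, v in shares.items()})
# → {'marketing': 20.0, 'finance': 40.0, 'research': 40.0}
```

Marketing asks for less than an equal third, so it gets its full 20 GB, and finance and research split the remaining 80 GB evenly.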