Slave Nodes in the Hadoop Distributed File System (HDFS)

By Dirk deRoos

In a Hadoop cluster, each data node (also known as a slave node) runs a background process named DataNode. This background process (also known as a daemon) keeps track of the slices of data that the system stores on its computer. It regularly talks to the master server for HDFS (known as the NameNode) to report on the health and status of the locally stored data.

Data blocks are stored as raw files in the local file system. From the perspective of a Hadoop user, you have no idea which of the slave nodes has the pieces of the file you need to process. From within Hadoop, you don’t see data blocks or how they’re distributed across the cluster — all you see is a listing of files in HDFS.

The complexity of how the file blocks are distributed across the cluster is hidden from you — you don’t know how complicated it all is, and you don’t need to know. Actually, the slave nodes themselves don’t even know what’s inside the data blocks they’re storing. It’s the NameNode server that knows the mappings of which data blocks compose the files stored in HDFS.

Better living through redundancy

One core design principle of HDFS is the concept of minimizing the cost of the individual slave nodes by using commodity hardware components. For massively scalable systems, this idea is a sensible one because costs escalate quickly when you need hundreds or thousands of slave nodes. Using lower-cost hardware has a consequence, though, in that individual components aren’t as reliable as more expensive hardware.

When you’re choosing storage options, consider the impact of using commodity drives rather than more expensive enterprise-quality drives. Imagine that you have a 750-node cluster, where each node has 12 hard disk drives dedicated to HDFS storage.

Based on an annual failure rate (AFR) of 4 percent for commodity disk drives (a given hard disk drive has a 4 percent likelihood of failing in a given year, in other words), your cluster will likely experience a hard disk failure every day of the year.

Because there can be so many slave nodes, their failure is also a common occurrence in larger clusters with hundreds or more nodes. With this information in mind, HDFS has been engineered on the assumption that all hardware components, even at the slave node level, are unreliable.

HDFS overcomes the unreliability of individual hardware components by way of redundancy: That’s the idea behind those three copies of every file stored in HDFS, distributed throughout the system. More specifically, each file block stored in HDFS has a total of three replicas. If one system breaks with a specific file block that you need, you can turn to the other two.

Sketching out slave node server design

To balance such important factors as total cost of ownership, storage capacity, and performance, you need to carefully plan the design of your slave nodes.

You commonly see slave nodes now where each node typically has between 12 and 16 locally attached 3TB hard disk drives. Slave nodes use moderately fast dual-socket CPUs with six to eight cores each — no speed demons, in other words. This is accompanied by 48GB of RAM. In short, this server is optimized for dense storage.

Because HDFS is a user-space-level file system, it’s important to optimize the local file system on the slave nodes to work with HDFS. In this regard, one high-impact decision when setting up your servers is choosing a file system for the Linux installation on the slave nodes.

Ext3 is the most commonly deployed file system because it has been the most stable option for a number of years. Take a look at Ext4, however. It’s the next version of Ext3, and it has been available long enough to be widely considered stable and reliable.

More importantly for our purposes, it has a number of optimizations for handling large files, which makes it an ideal choice for HDFS slave node servers.

Don’t use the Linux Logical Volume Manager (LVM) — it represents an additional layer between the Linux file system and HDFS, which prevents Hadoop from optimizing its performance. Specifically, LVM aggregates disks, which hampers the resource management that HDFS and YARN do, based on how files are distributed on the physical drives.