Replicating Data Blocks in the Hadoop Distributed File System

By Dirk deRoos

Hadoop Distributed File System (HDFS) is designed to store data on inexpensive, and more unreliable, hardware. Inexpensive has an attractive ring to it, but it does raise concerns about the reliability of the system as a whole, especially for ensuring the high availability of the data.

Planning ahead for disaster, the brains behind HDFS made the decision to set up the system so that it would store three (count ’em — three) copies of every data block.

HDFS assumes that every disk drive and every slave node is inherently unreliable, so, clearly, care must be taken in choosing where the three copies of the data blocks are stored.

The figure shows how data blocks from the earlier file are striped across the Hadoop cluster — meaning they are evenly distributed between the slave nodes so that a copy of the block will still be available regardless of disk, node, or rack failures.


The file shown has five data blocks, labeled a, b, c, d, and e. If you take a closer look, you can see this particular cluster is made up of two racks with two nodes apiece, and that the three copies of each data block have been spread out across the various slave nodes.

Every component in the Hadoop cluster is seen as a potential failure point, so when HDFS stores the replicas of the original blocks across the Hadoop cluster, it tries to ensure that the block replicas are stored in different failure points.

For example, take a look at Block A. At the time it needed to be stored, Slave Node 3 was chosen, and the first copy of Block A was stored there. For multiple rack systems, HDFS then determines that the remaining two copies of block A need to be stored in a different rack. So the second copy of block A is stored on Slave Node 1.

The final copy can be stored on the same rack as the second copy, but not on the same slave node, so it gets stored on Slave Node 2.