Many of the decisions you need to make about rack composition and networking depend on the scale of your Hadoop cluster, which has three main permutations.


Single-rack Hadoop deployment

A single-rack deployment is an ideal starting point for a Hadoop cluster.

Here, the cluster is fairly self-contained, but because it still has relatively few slave nodes, the true benefits of Hadoop’s resiliency aren’t yet apparent.


Three-rack Hadoop deployment

A medium-size cluster spans multiple racks, with the three master nodes distributed across them.

Hadoop’s resiliency is starting to become apparent: Even if an entire rack were to fail (for example, both ToR switches in a single rack), the cluster would still function, albeit at a lower level of performance. A slave node failure would barely be noticeable.
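Hadoop can only survive a rack failure like this if it knows which nodes live in which rack, so that HDFS places block replicas on more than one rack. That rack awareness isn't automatic: you supply a topology script and point the `net.topology.script.file.name` property in core-site.xml at it. Here's a minimal sketch of such a script; the `10.1.<rack>.*` addressing scheme and rack names are assumptions for illustration, not a Hadoop convention:

```shell
# rack_of: map a node address to a rack path, the way a Hadoop
# topology script must. Hadoop invokes the script with one or more
# node addresses as arguments and reads one rack path per address
# from stdout. The 10.1.<rack>.* subnet scheme is hypothetical.
rack_of() {
  case "$1" in
    10.1.1.*) echo "/rack1" ;;
    10.1.2.*) echo "/rack2" ;;
    10.1.3.*) echo "/rack3" ;;
    *)        echo "/default-rack" ;;  # unknown nodes fall back here
  esac
}

# Emit a rack path for every address passed on the command line.
for node in "$@"; do
  rack_of "$node"
done
```

With this mapping in place, HDFS's default placement policy spreads replicas so that losing one rack never loses every copy of a block.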


Large-scale Hadoop deployment

In larger clusters with many racks, like the example shown, the networking architecture required is pretty sophisticated.

Regardless of how many racks your cluster expands to, the slave nodes in any rack need to be able to “talk” efficiently to any master node.

As the number of slave nodes increases to the point where you have more than three racks, additional racks are composed only of slave nodes, aside from the ToR switches. If you’re using HBase heavily on your cluster, you may add master nodes to host additional HMaster and ZooKeeper services.
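Pointing HBase at a ZooKeeper ensemble spread across your master nodes is a matter of one property in hbase-site.xml. A sketch, with hypothetical hostnames:

```xml
<!-- hbase-site.xml fragment: a three-node ZooKeeper quorum hosted
     on the master nodes. Hostnames here are placeholders. -->
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>master1.example.com,master2.example.com,master3.example.com</value>
</property>
```

Additional backup HMaster processes are simply listed, one hostname per line, in HBase’s conf/backup-masters file.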

If you graduate to a truly massive scale, where you have hundreds of slave nodes, you may need to use the HDFS federation capabilities so that large portions of your data sets are managed by different NameNode services.

For every additional Active NameNode, you need a corresponding Standby NameNode, as well as two more master nodes to host these services. With HDFS federation, the sky is truly the limit in terms of how far you can scale out your clusters.
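To make this concrete, here’s a sketch of an hdfs-site.xml fragment declaring two federated nameservices, each backed by its own Active/Standby NameNode pair. The nameservice IDs (ns1, ns2) and hostnames are hypothetical:

```xml
<!-- hdfs-site.xml fragment: two federated nameservices, each an HA
     pair on its own two master nodes. IDs and hosts are placeholders. -->
<property>
  <name>dfs.nameservices</name>
  <value>ns1,ns2</value>
</property>
<!-- HA NameNode pair for nameservice ns1 -->
<property>
  <name>dfs.ha.namenodes.ns1</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.ns1.nn1</name>
  <value>master1.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.ns1.nn2</name>
  <value>master2.example.com:8020</value>
</property>
<!-- ns2 is configured the same way on two further master nodes -->
```

Each nameservice then manages its own slice of the namespace (for example, different top-level directories), while all of them share the same pool of slave-node storage.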