Checkpointing Updates in Hadoop Distributed File System

By Dirk deRoos

Hadoop Distributed File System (HDFS) is a journaled file system, where new changes to files in HDFS are captured in an edit log that’s stored on the NameNode in a file named edits. Periodically, when the edits file reaches a certain threshold or after a certain period has elapsed, the journaled entries need to be committed to the master fsimage file.
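The two triggers described above (a size threshold or an elapsed period) can be sketched as a simple check. This is a minimal illustration, not Hadoop code: the function name is hypothetical, and the default values are assumptions modeled on the Hadoop 2 settings dfs.namenode.checkpoint.period (seconds) and dfs.namenode.checkpoint.txns (transactions), where the threshold is counted in edit-log transactions rather than bytes.

```python
# Assumed defaults, modeled on the Hadoop 2 settings
# dfs.namenode.checkpoint.period and dfs.namenode.checkpoint.txns.
CHECKPOINT_PERIOD_SECONDS = 3600
CHECKPOINT_TXNS = 1_000_000


def should_checkpoint(seconds_since_last, uncheckpointed_txns,
                      period=CHECKPOINT_PERIOD_SECONDS,
                      txns=CHECKPOINT_TXNS):
    """Return True when either trigger fires: elapsed time or edit-log growth."""
    return seconds_since_last >= period or uncheckpointed_txns >= txns
```

Either condition alone is enough, which is why a busy cluster may checkpoint well before the period elapses.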

The NameNode itself doesn’t do this, because it’s designed to answer application requests as quickly as possible. More importantly, considerable risk is involved in having this metadata update operation managed by a single master server.

If the metadata describing the mappings between the data blocks and their corresponding files becomes corrupted, the original data is as good as lost.

Checkpointing services for a Hadoop cluster are handled by one of four possible daemons, each of which needs to run on its own dedicated master node, separate from the NameNode daemon’s master node:

  • Secondary NameNode: Prior to Hadoop 2, this was the only checkpointing daemon, performing the checkpointing process described in this section. The Secondary NameNode has a notoriously inaccurate name because it is in no way “secondary” or a “standby” for the NameNode.

  • Checkpoint Node: The Checkpoint Node is the replacement for the Secondary NameNode. It performs checkpointing and nothing more.

  • Backup Node: Provides checkpointing service, but also maintains a backup of the fsimage and edits files.

  • Standby NameNode: Performs checkpointing service and, unlike the old Secondary NameNode, the Standby NameNode is a true standby server, enabling a hot-swap of the NameNode process to avoid any downtime.

The checkpointing process

The following steps describe the checkpointing process as it’s carried out by the NameNode and the checkpointing service (note that four possible daemons can be used for checkpointing):

  1. When it’s time to perform the checkpoint, the NameNode creates a new file to accept the journaled file system changes.

    It names the new file edits.new.

  2. As a result, the edits file accepts no further changes and is copied to the checkpointing service, along with the fsimage file.

  3. The checkpointing service merges these two files, creating a file named fsimage.ckpt.

  4. The checkpointing service copies the fsimage.ckpt file to the NameNode.

  5. The NameNode overwrites the fsimage file with fsimage.ckpt.

  6. The NameNode renames the edits.new file to edits.
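The six steps above can be traced in a toy model. The sketch below is an illustration only: it uses a plain dictionary in place of the NameNode's storage directory, assumes the conventional file names edits, edits.new, fsimage, and fsimage.ckpt, and invents a simple edit format of (operation, path, blocks). The function names are hypothetical and none of this is Hadoop's actual API.

```python
def apply_edit(namespace, edit):
    """Apply one journaled change to a namespace dict (path -> block list)."""
    op, path, blocks = edit
    if op == "create":
        namespace[path] = list(blocks)
    elif op == "delete":
        namespace.pop(path, None)
    else:
        raise ValueError(f"unknown op: {op}")


def checkpoint(namenode):
    """Run the six steps on a dict that stands in for the NameNode's storage."""
    # Step 1: open edits.new so incoming changes have somewhere to go.
    namenode["edits.new"] = []
    # Step 2: copy the now-frozen edits and fsimage to the checkpointing service.
    service_fsimage = dict(namenode["fsimage"])
    service_edits = list(namenode["edits"])
    # Step 3: merge them into fsimage.ckpt by replaying each edit in order.
    fsimage_ckpt = service_fsimage
    for edit in service_edits:
        apply_edit(fsimage_ckpt, edit)
    # Step 4: copy fsimage.ckpt back to the NameNode.
    namenode["fsimage.ckpt"] = fsimage_ckpt
    # Step 5: overwrite fsimage with fsimage.ckpt.
    namenode["fsimage"] = namenode.pop("fsimage.ckpt")
    # Step 6: rename edits.new to edits.
    namenode["edits"] = namenode.pop("edits.new")
    return namenode
```

Calling checkpoint() on a dictionary that holds an fsimage mapping and an edits list leaves a merged fsimage and an empty edits list, with no temporary entries remaining.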


Backup Node considerations

In addition to providing checkpointing functionality, the Backup Node maintains the current state of all the HDFS block metadata in memory, just like the NameNode. In this sense, it maintains a real-time backup of the NameNode’s state.

As a result of keeping the block metadata in memory, the Backup Node is far more efficient than the Checkpoint Node at performing the checkpointing task, because the fsimage and edits files don’t need to be transferred and then merged. These changes are already merged in memory.
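To illustrate why an in-memory copy removes the transfer-and-merge step, here is a minimal sketch. The class name, method names, edit format, and JSON serialization are all assumptions made for illustration, not the Backup Node's real interface.

```python
import json


class BackupNodeSketch:
    """Toy model: keeps the namespace in memory and applies edits as they arrive."""

    def __init__(self, fsimage):
        self.namespace = dict(fsimage)  # in-memory copy of the metadata
        self.journal = []               # edits received since the last checkpoint

    def receive_edit(self, op, path, blocks=()):
        """Apply a streamed edit immediately, so the state is always merged."""
        if op == "create":
            self.namespace[path] = list(blocks)
        elif op == "delete":
            self.namespace.pop(path, None)
        else:
            raise ValueError(f"unknown op: {op}")
        self.journal.append((op, path, list(blocks)))

    def save_checkpoint(self):
        """Serialize the current state; nothing is fetched or replayed first."""
        image = json.dumps(self.namespace, sort_keys=True)
        self.journal.clear()
        return image
```

Because every edit is applied on arrival, saving a checkpoint in this model is only a serialization of state that already exists.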

Another benefit of using the Backup Node is that the NameNode can be configured to delegate to the Backup Node the job of persisting journal data to disk.

If you’re using the Backup Node, you can’t run the Checkpoint Node. There’s no need to do so, because the checkpointing process is already being taken care of.

Standby NameNode considerations

The Standby NameNode is the designated hot standby master server for the NameNode. While serving as standby, it also performs the checkpointing process. As such, you can’t also run the Secondary NameNode, Checkpoint Node, or Backup Node.

Secondary NameNode, Checkpoint Node, Backup Node, and Standby NameNode master server design

The master server running the Secondary NameNode, Checkpoint Node, Backup Node, or Standby NameNode daemon has the same hardware requirements as the one deployed for the NameNode master server. The reason is that these servers also load into memory all the metadata and location data about all the data blocks stored in HDFS.
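Because the whole namespace is held in memory, a rough sizing estimate can be made from the number of files and blocks. The sketch below is a back-of-the-envelope calculation only. The figure of roughly 150 bytes per namespace object is a commonly quoted rule of thumb, treated here as an assumption rather than a measurement, and the function name is hypothetical. It ignores directories and JVM overhead.

```python
def estimate_metadata_bytes(num_files, blocks_per_file, bytes_per_object=150):
    """Rough heap estimate: one object per file plus one object per block."""
    objects = num_files + num_files * blocks_per_file
    return objects * bytes_per_object
```

Whatever figure this produces for the NameNode applies equally to the server hosting the checkpointing daemon, since it loads the same metadata.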