Apache Zookeeper and NoSQL Databases

By Adam Fowler

A large cluster of NoSQL databases is an unwieldy thing to manage. Apache Zookeeper to the rescue! Keeping track of which nodes are in the cluster, what data each is managing, and ensuring that new masters are selected when a master fails aren’t easy tasks.

Coordinating large distributed systems is, therefore, very difficult. Both Hadoop and distributed NoSQL databases need a way to manage the configuration of an entire cluster. This process also needs to be highly available so that it isn’t single point of failure in the overall system.

This is where Apache Zookeeper comes in. Zookeeper provides a distributed, transactionally consistent coordination service.

Several other products use Zookeeper for cluster management:

  • Apache Hadoop

  • Solr Cloud

  • Neo4j

  • Accumulo

  • HBase

  • Rackspace

  • Zynga

  • Yahoo! (for several of its services)

Zookeeper provides an in-memory hierarchical storage structure that’s similar to a computer file system. This structure is managed by the current Zookeeper master and replicated among the other nodes in the cluster. A Zookeeper cluster is called a Zookeeper ensemble.

Only the master manages updates (writes) to storage. These changes are checkpointed to disk to make them durable and then replicated to the other Zookeeper instances in the ensemble.

These services store their cluster configuration data in Zookeeper. Some of them store the key ranges for shards of their database, too. This enables clients who are utilizing a NoSQL database that uses Zookeeper to communicate with any Zookeeper server in the ensemble. In this way, clients can discover which NoSQL servers hold the data they’re interested in.

Looking up which node stores which key range minimizes the load on the NoSQL servers, because they don’t need to forward requests for data from one NoSQL node to the node that actually holds the data.

You can also use Zookeeper’s data storage for ephemeral storage (storage that won’t last beyond a restart of the service), which is useful for storing session or other runtime data.

Zookeeper servers use this ephemeral storage to determine who takes over if a master fails. Each server creates a numbered ephemeral znode (storage file) in the key space. If the Zookeeper master (leader) suffers a hardware failure, then the owner of the next znode in the sequence becomes the master. This is an elegant solution and avoids the “herd” effect where all servers communicate frantically with each other for a few seconds to select a new master.

Zookeeper is a great Java solution to the problems inherent in coordinated systems management and high availability. You can use it to implement highly available services, including messaging services. So, if you need to create a new distributed service, consider using Zookeeper.