Applying Consistency Methods in NoSQL
The consistency property of a database means that once data is written to a database successfully, queries that follow are able to access the data and get a consistent view of it. In practice, this means that if you write a record to a database and then immediately request that record, you’re guaranteed to see it. This guarantee is particularly important for things like Amazon orders and bank transfers.
Consistency is a sliding scale, though, and a subject too deep to cover here. However, in the NoSQL world, consistency generally falls into one of two camps:
ACID Consistency (ACID stands for Atomicity, Consistency, Isolation, Durability): ACID means that once data is written, you have full consistency in reads.
Eventual Consistency (BASE stands for Basically Available, Soft state, Eventual consistency): BASE means that once data is written, it will eventually appear for reading.
A battle has been raging between people who believe strong consistency in a database isn’t required and those who believe it absolutely is required (for “people,” read “NoSQL companies’ marketing departments”!).
The reality is somewhere in between. Does it matter that a person’s Facebook post isn’t seen by all his friends for five minutes? No, probably not. Change “Facebook post” to “billion‐dollar financial transaction,” though, and your attitude changes rapidly! Which consistency approach you pick depends on the situation. In my experience, though, strong consistency is always the choice in mission‐critical enterprise systems.
ACID is a general set of principles for transactional systems, not something linked purely to relational systems, or even just databases, so it’s well worth knowing about. ACID basically means, “This database has facilities to stop you from corrupting or losing data,” which isn’t a given for all databases. In fact, the vast majority of NoSQL databases don’t provide ACID guarantees.
FoundationDB, MarkLogic, and Neo4j are notable exceptions. Some NoSQL databases provide a lower‐grade guarantee called Check and Set, which verifies whether someone else has altered a document before allowing a transaction to complete. This behavior is usually limited because it tends to be implemented on a single‐record basis.
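The Check and Set idea can be sketched in a few lines of Python. This is a minimal illustration, not any particular database’s API: the Store, Record, and VersionConflict names are hypothetical, and the version counter is one assumed way of detecting that a record has changed. A real database would perform the check and the set as a single atomic step.

```python
from dataclasses import dataclass


@dataclass
class Record:
    value: dict
    version: int = 0


class VersionConflict(Exception):
    """Raised when the record changed since the caller read it."""


class Store:
    """Hypothetical in-memory document store, for illustration only."""

    def __init__(self):
        self._records = {}

    def read(self, key):
        record = self._records.setdefault(key, Record(value={}))
        # Return a copy of the value plus the version the caller saw.
        return dict(record.value), record.version

    def check_and_set(self, key, new_value, expected_version):
        record = self._records.setdefault(key, Record(value={}))
        # Check: refuse the write if someone else updated the record first.
        if record.version != expected_version:
            raise VersionConflict(f"{key} is at version {record.version}")
        # Set: apply the change and bump the version.
        record.value = dict(new_value)
        record.version += 1
        return record.version


store = Store()
doc, seen = store.read("order-1")
store.check_and_set("order-1", {"status": "paid"}, seen)  # succeeds
try:
    # Reuses the old version number, so the check fails.
    store.check_and_set("order-1", {"status": "cancelled"}, seen)
    conflict = False
except VersionConflict:
    conflict = True
```

Note that the guard covers one record only. Nothing here coordinates changes across several records, which is why this is a weaker guarantee than a full ACID transaction.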
MongoDB is a notable database that provides Check and Set capabilities. With MongoDB, an entire node‐worth of data can be locked during an update, thereby preventing all read and all write operations until the operation completes. The company is working on removing this limitation, though.
BASE means that rather than make ACID guarantees, the database has a tunable balance of consistency and data availability. This is typically the case when nodes in a given database cluster act as primary managers of a part of the database, and other nodes hold read‐only replicas.
To ensure that every client sees all updates (that is, they have a consistent view of the data), a write to the primary node holding the data needs to lock until all read replicas are up to date. This is called a two‐phase commit: the change is first prepared on every node, and it is applied and confirmed to the client only when all nodes have agreed to the update.
BASE relaxes this requirement, requiring only a subset of the nodes holding the same data to be updated in order for the transaction to succeed. Sometime after the transaction is committed, the remaining read‐only replicas are updated.
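A simplified sketch of a two‐phase commit follows. It assumes a hypothetical Replica class with prepare, commit, and abort methods, and it ignores timeouts, crashes between phases, and recovery logs, all of which a production protocol has to handle.

```python
class Replica:
    """Hypothetical replica node; names are illustrative."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.data = {}
        self.staged = {}

    def prepare(self, key, value):
        # Phase 1: stage the change and vote on whether it can be committed.
        if not self.healthy:
            return False
        self.staged[key] = value
        return True

    def commit(self, key):
        # Phase 2: make the staged change visible to readers.
        self.data[key] = self.staged.pop(key)

    def abort(self, key):
        self.staged.pop(key, None)


def two_phase_write(nodes, key, value):
    """Return True only if every node prepared and then committed the change."""
    prepared = []
    for node in nodes:
        if node.prepare(key, value):
            prepared.append(node)
        else:
            # Any "no" vote aborts the whole transaction.
            for p in prepared:
                p.abort(key)
            return False
    for node in nodes:
        node.commit(key)
    return True  # only now is the write confirmed to the client


primary, r1, r2 = Replica("primary"), Replica("r1"), Replica("r2")
ok = two_phase_write([primary, r1, r2], "balance", 100)

r2.healthy = False
failed = two_phase_write([primary, r1, r2], "balance", 50)
```

In the second call one replica cannot prepare, so no node applies the change and every node still holds the earlier value.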
The advantage of this approach is that transactions are committed faster. Having readable live replicas also means you can spread your data read load, making reading quicker.
The downside is that clients connecting to some of the read replicas may see out‐of‐date information for an unspecified period of time. In some scenarios, this state is fine. If you post a new message on Facebook and some of your friends don’t see it for a couple of minutes, it’s not a huge loss. If you send a payment order to your bank, though, you may want an immediate transaction.
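The trade‐off can be illustrated with a toy model. The EventuallyConsistentCluster class and its write_acks parameter are assumptions made for this sketch; real products expose this tuning under their own names and replicate in the background rather than through an explicit call.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}


class EventuallyConsistentCluster:
    """Hypothetical cluster: a write succeeds once `write_acks` nodes have it."""

    def __init__(self, nodes, write_acks):
        self.nodes = nodes
        self.write_acks = write_acks
        self.pending = []  # (node, key, value) updates not yet applied

    def write(self, key, value):
        # Update only a subset synchronously, then confirm to the client.
        for node in self.nodes[: self.write_acks]:
            node.data[key] = value
        # The remaining replicas are queued for later replication.
        for node in self.nodes[self.write_acks:]:
            self.pending.append((node, key, value))
        return True

    def replicate(self):
        # Stands in for the background replication that runs after the commit.
        for node, key, value in self.pending:
            node.data[key] = value
        self.pending.clear()


nodes = [Node("primary"), Node("replica-1"), Node("replica-2")]
cluster = EventuallyConsistentCluster(nodes, write_acks=2)
cluster.write("post", "hello")

stale = nodes[2].data.get("post")  # read before replication catches up
cluster.replicate()
fresh = nodes[2].data.get("post")  # read after replication
```

Between the write and the replicate call, a client reading from the third node does not see the post at all. That window is the unspecified period of out‐of‐date data described above.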
An alternative approach to read‐only replicas is to have a shared‐nothing cluster, in which exactly one node in the cluster serves a particular part of the database at any time.
Shared‐nothing doesn’t mean you lose replication, though. Databases that employ this method typically do replicate their data to a secondary area on another primary node or nodes — but only one node is the master for reads and writes at any time.
Shared‐nothing clusters have the advantage of a simpler consistency model but require a two‐phase commit to replicas. This fact means the transaction locks while all replicas are updated. (An internal lock plus locking for other nodes gives you two phases.)
This typically has less impact than on shared‐data clusters with read‐only replicas, though, because shared‐nothing replica data areas don’t receive read requests for that part of the database. Therefore, two‐phase commits are faster on a shared‐nothing cluster than on a cluster with readable replicas.
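A rough sketch of shared‐nothing routing is shown below. The SharedNothingCluster class, the hash‐based choice of master, and the rule that the copy lives on the next node in the list are all assumptions made for illustration; real databases use their own partitioning and failover schemes.

```python
import zlib


class SharedNothingCluster:
    """Hypothetical cluster in which each partition has exactly one master."""

    def __init__(self, node_names):
        self.node_names = node_names
        # Each node has a primary area it serves and a secondary area that
        # only holds copies of another node's data.
        self.primary_area = {name: {} for name in node_names}
        self.secondary_area = {name: {} for name in node_names}

    def master_for(self, key):
        # A stable hash decides which single node owns this key.
        index = zlib.crc32(key.encode("utf-8")) % len(self.node_names)
        return self.node_names[index]

    def backup_for(self, key):
        # Assumption: the copy lives on the next node in the list.
        index = self.node_names.index(self.master_for(key))
        return self.node_names[(index + 1) % len(self.node_names)]

    def write(self, key, value):
        self.primary_area[self.master_for(key)][key] = value
        # Replicated before the write returns, but never read from directly.
        self.secondary_area[self.backup_for(key)][key] = value

    def read(self, key):
        # Reads always go to the master for that key.
        return self.primary_area[self.master_for(key)].get(key)


cluster = SharedNothingCluster(["node-a", "node-b", "node-c"])
cluster.write("customer:42", {"name": "Ada"})
owner = cluster.master_for("customer:42")
backup = cluster.backup_for("customer:42")
```

Because every read for a key goes to its single master, the secondary copy exists only for recovery and never competes with the primary for read traffic.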
Choosing ACID or BASE?
As you might expect, much of the argument arises because NoSQL vendors can differentiate themselves from their competitors by claiming a different, unique approach. It’s interesting to note, however, how many NoSQL vendors have ACID compliance on their roadmap even though they are proponents of BASE, which shows how relevant ACID guarantees are to enterprise, mission‐critical systems.
Many companies use BASE‐consistency products when testing ideas because they are free but then migrate to an ACID‐compliant paid‐for database when they want to go live on a mission‐critical system.
The easiest way to decide whether you need ACID is to consider the interactions people and other systems have with your data. For example, if you add or update data, is it important that the very next query is able to see the change? In other words, are important decisions hanging on the current state of the database? Would seeing slightly out‐of‐date data mean that those decisions could be fatally flawed?
In financial services, the need for consistency is obvious. Think of traders purchasing stock. They need to check the cash balance before trading to ensure that they have the money to cover the trade. If they don’t see the correct balance, they may decide to spend money they don’t have on another transaction. If the database they’re querying is only eventually consistent, they may not see a lack of sufficient funds, thus exposing their organization to financial risk.
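The trading scenario reduces to a simple check whose outcome depends entirely on which copy of the balance is read. The figures and the can_cover_trade function below are hypothetical and exist only to make the risk concrete.

```python
def can_cover_trade(balance_seen, trade_cost):
    """Approve a trade only if the balance the trader sees covers it."""
    return balance_seen >= trade_cost


# Hypothetical figures: an earlier trade has already reduced the balance to 200,
# but an eventually consistent replica still reports the old value of 1000.
primary_balance = 200
stale_replica_balance = 1000
trade_cost = 500

approved_on_stale_read = can_cover_trade(stale_replica_balance, trade_cost)
approved_on_consistent_read = can_cover_trade(primary_balance, trade_cost)

# If the trade goes ahead on the stale figure, this is the uncovered amount.
shortfall = trade_cost - primary_balance if approved_on_stale_read else 0
```

The check itself is identical in both cases. Only the freshness of the data differs, and that alone decides whether the organization ends up exposed.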
Similar cases can be built for ACID over BASE in health care, defense, intelligence, and other sectors. It all boils down to the data, though, and the importance of both timeliness and data security.