The Twelve Days of NoSQL: Day Five: Replication and Eventual Consistency
On the fifth day of Christmas, my true love gave to me
Five golden rings.
By now, you must be wondering when I’m going to get around to explaining how to create a NoSQL database. When I was a junior programmer, quite early in my career, my friends and I were assigned to work on a big software development project for which we would have to use technologies with which we were completely unfamiliar. We were promised that training would be provided before the project started. The very first thing the instructor said was (paraphrasing) “First you have to insert your definitions into the C.D.D.” and he walked to the board and wrote the commands that we needed for the purpose. Needless to say, we were quite flustered because we had no idea what those “definitions” might be or what a “C.D.D.” was and how it fit into the big picture.
NoSQL is being taught without reference to the big picture. None of the current books on NoSQL mention functional segmentation even though it is the underpinning principle of NoSQL. All the current books on NoSQL imply that NoSQL principles are in conflict with the relational model. If you are in a hurry to create your first NoSQL database, I can recommend Seven Databases in Seven Weeks: A Guide to Modern Databases and the NoSQL Movement. But as Leonardo da Vinci, one of the world’s greatest geniuses, said: “Those who are in love with practice without science are like the sailor who gets into a ship without rudder or compass, who is never certain where he is going. Practice must always be built on sound theory … The painter who copies by practice and judgement of eye, without rules, is like a mirror which imitates within itself all the things placed before it without any understanding of them.” (On the errors of those who rely on practice without science).
Continuing the train of thought from Day Four, the Dynamo developers saw that one of the keys to extreme availability was data replication. Multiple copies of the shopping cart are allowed to exist and, if one of the replicas becomes unresponsive, the data can be served by one of the other replicas. However, because of network latencies, the copies may occasionally get out of sync and the customer may occasionally encounter a stale version of the shopping cart. Once again, this can be handled appropriately by the application tier; the node that falls behind can catch up eventually or inconsistencies can be detected and resolved at an opportune time, such as at checkout. This technique is called “eventual consistency.”
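To make the idea concrete, here is a minimal Python sketch of a replicated shopping cart that tolerates a stale read and is reconciled at checkout. It is a toy model, not Dynamo's actual implementation: the Replica class, the function names, the `up` flag that simulates an unresponsive node, and the choice of set union as the merge rule are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One copy of a shopping cart; `up` simulates node availability (hypothetical)."""
    name: str
    items: set = field(default_factory=set)
    up: bool = True


def add_item(replicas, item):
    """Write to every reachable replica; unreachable ones fall behind."""
    for r in replicas:
        if r.up:
            r.items.add(item)


def read_cart(replicas):
    """Serve the cart from the first reachable replica, which may be stale."""
    for r in replicas:
        if r.up:
            return set(r.items)
    raise RuntimeError("no replica available")


def reconcile(replicas):
    """Merge divergent copies by set union and write the result back to all."""
    merged = set().union(*(r.items for r in replicas))
    for r in replicas:
        r.items = set(merged)
    return merged


a, b = Replica("a"), Replica("b")
add_item([a, b], "book")
b.up = False                 # replica b becomes unresponsive
add_item([a, b], "lamp")     # only replica a records the lamp
b.up, a.up = True, False     # b recovers while a is unreachable
stale = read_cart([a, b])    # the customer briefly sees a cart without the lamp
a.up = True
final = reconcile([a, b])    # at checkout, the copies converge
```

The cart stays readable and writable throughout, at the price of one stale read. A union merge never loses an added item, but it cannot tell a deleted item from one that a replica never saw, so a real system needs additional bookkeeping to handle removals.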
The inventor of relational theory, Dr. Codd, was acutely aware of the potential overhead of consistency checking. In his 1970 paper, he said:
“There are, of course, several possible ways in which a system can detect inconsistencies and respond to them. In one approach the system checks for possible inconsistency whenever an insertion, deletion, or key update occurs. Naturally, such checking will slow these operations down. [emphasis added] If an inconsistency has been generated, details are logged internally, and if it is not remedied within some reasonable time interval, either the user or someone responsible for the security and integrity of the data is notified. Another approach is to conduct consistency checking as a batch operation once a day or less frequently.”
In other words, the inventor of relational theory would not have found a conflict between his relational model and the “eventual consistency” that is one of the hallmarks of the NoSQL products of today. The Dynamo developers nevertheless imagined a conflict because they, quite understandably, conflated the relational model with the ACID guarantees of database management systems. However, ACID has nothing to do with the relational model per se (although relational theory does come in very handy in defining consistency constraints); pre-relational database management systems such as IMS provided ACID guarantees and so did post-relational object-oriented database management systems.
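The two approaches that Dr. Codd describes in the passage quoted above can be contrasted in a short Python sketch. Everything in it is hypothetical: the customers and orders data, the function names, and the use of a foreign-key-style rule as the consistency constraint are assumptions chosen only to illustrate the tradeoff.

```python
customers = {1, 2}      # hypothetical parent keys
orders = []             # (order_id, customer_id) rows
pending = []            # inconsistencies logged at write time


def insert_with_check(order_id, customer_id):
    """Approach 1: check on every insert and log any inconsistency."""
    orders.append((order_id, customer_id))
    if customer_id not in customers:
        pending.append((order_id, customer_id))


def insert_without_check(order_id, customer_id):
    """Approach 2, write path: no checking, so the insert stays cheap."""
    orders.append((order_id, customer_id))


def batch_check():
    """Approach 2, batch path: scan all rows for dangling references."""
    return [row for row in orders if row[1] not in customers]


insert_with_check(100, 1)
insert_with_check(101, 9)      # inconsistency logged immediately
insert_without_check(102, 7)   # goes unnoticed until the batch run
found = batch_check()
```

Note that in both approaches the inconsistent row is allowed to exist for a while; the only difference is when it is detected. That is the sense in which the quoted passage is compatible with eventual consistency.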
I should not defend eventual consistency simply by using a convenient quote from the writings of Dr. Codd. “The devil can cite Scripture for his purpose. An evil soul producing holy witness is like a villain with a smiling cheek, a goodly apple rotten at the heart. O, what a goodly outside falsehood hath!” (from the Shakespeare play The Merchant of Venice). If I am in favor of eventual consistency, I should explain why, not simply quote from the writings of Dr. Codd. If I can defend my own beliefs, I free myself to disagree with Dr. Codd, as I plan to do later in this series. I have in fact come to accept that real-time consistency checking should be a design choice, not a scriptural mandate. I may have had a different opinion in the past but “a foolish consistency is the hobgoblin of little minds, adored by little statesmen and philosophers and divines. … Speak what you think now in hard words, and to-morrow speak what to-morrow thinks in hard words again, though it contradict every thing you said to-day.” (from the Emerson essay Self-Reliance).
The tradeoff between consistency and performance is as important in the wired world of today as it was in Dr. Codd’s world. We cannot cast stones at Dynamo for the infraction of not guaranteeing the synchronization of replicated data (or allowing temporary inconsistencies between functional segments), because violations of the consistency requirement are equally commonplace in the relational camp. The replication technique used by Dynamo has a close parallel in the technique of “multimaster replication” used in the relational camp. Application developers in the relational camp are warned about the negative impact of integrity constraints (see the numbered quotations at the end of this article). And, most importantly, no mainstream DBMS currently implements the SQL-92 “CREATE ASSERTION” feature that is necessary to provide the consistency guarantee. For a detailed analysis of this anomaly, refer to Toon Koppelaars’s article “CREATE ASSERTION: The Impossible Dream” in the August 2013 issue of the NoCOUG Journal.
1. “Using primary and foreign keys can impact performance. Avoid using them when possible.” (http://docs.oracle.com/cd/E17904_01/core.1111/e10108/adapters.htm#BABCCCIH)
2. “For performance reasons, the Oracle BPEL Process Manager, Oracle Mediator, human workflow, Oracle B2B, SOA Infrastructure, and Oracle BPM Suite schemas have no foreign key constraints to enforce integrity.” (http://docs.oracle.com/cd/E23943_01/admin.1111/e10226/soaadmin_partition.htm#CJHCJIJI)
3. “For database independence, applications typically do not store the primary key-foreign key relationships in the database itself; rather, the relationships are enforced in the application.” (http://docs.oracle.com/cd/E25178_01/fusionapps.1111/e14496/securing.htm#CHDDGFHH)
4. “The ETL process commonly verifies that certain constraints are true. For example, it can validate all of the foreign keys in the data coming into the fact table. This means that you can trust it to provide clean data, instead of implementing constraints in the data warehouse.” (http://docs.oracle.com/cd/E24693_01/server.11203/e16579/constra.htm#i1006300)