Day One: Disruptive Innovation
Day Two: Requirements and Assumptions
Day Three: Functional Segmentation
Day Four: Sharding
Day Five: Replication and Eventual Consistency
Day Six: The False Premise of NoSQL
Day Seven: Schemaless Design
Day Eight: Oracle NoSQL Database
Day Nine: NoSQL Taxonomy
Day Ten: Big Data
Day Eleven: Mistakes of the relational camp
Day Twelve: Concluding Remarks
On the twelfth day of Christmas, my true love gave to me
Twelve drummers drumming.
The relational camp put productivity, ease-of-use, and logical elegance front and center. However, the mistakes and misconceptions of the relational camp prevent mainstream database management systems from achieving the performance levels required by modern applications. For example, Dr. Codd forbade nested relations (a.k.a.unnormalized relations) and mainstream database management systems equate the normalized set with the stored set.
The NoSQL camp on the other hand put performance, scalability, and reliability front and center. Understandably the NoSQL camp could not see past the mistakes and misconceptions of the relational camp and lost the opportunity to take the relational model to the next level. Just like the relational camp, the NoSQL camp believes that normalization dictates physical storage choices. Just like the relational camp, the NoSQL camp believes that non-relational APIs are forbidden by the relational model. And the NoSQL camp believes that relational is synonomous with ACID (Atomicity, Consistency, Isolation, Durability).
The NoSQL camp created a number of innovations that are disruptive in the sense used by Harvard Business School professor Clayton Christensen: functional segmentation, sharding, replication, eventual consistency, and schemaless design. Since these innovations are compatible with the relational model, I hope that they will eventually be absorbed by mainstream database management systems.
There are already proofs that performance, scalability, and reliability can be achieved without abandoning the relational model. For example, ScaleBase provides sharding and replication on top of MySQL storage nodes. Another good example to study is VoltDB which claims to be the world’s fastest OLTP database (though it has never published an audited TPC benchmark). A counter-example to Amazon is eBay which arguably has equal scale and equally high performance, scalability, and reliability requirements. eBay uses performance segmentation, sharding, replication, and eventual consistency but continues to use Oracle (and SQL) to manage the local database. I asked Randy Shoup, one of the architects of the eBay e-commerce platform, why eBay did not abandon Oracle Database and he answered in one word: “comfort.” Here are links to some of his presentations and articles on the eBay architecture:
- eBay’s Scaling Odyssey: Growing and Evolving a Large eCommerce Site (Slide deck)
- The eBay Architecture: Striking a balance between site stability, feature velocity, performance, and cost (Slide deck)
- Randy Shoup Discusses the eBay Architecture (video and transcript)
- Randy Shoup on eBay’s Architectural Principles (video and transcript)
- Scalability Best Practices: Lessons from eBay (blog post)
Finally, I should point out that are very good reasons to criticize current NoSQL products; for example, lack of standards, primitive feature sets, primitive security, and primitive management tools, unproven claims, and traps for the unwary. MongoDB uses a database-wide lock for reads and writes …
I hope that you enjoyed reading this series of posts as much as I enjoyed writing it. Happy new year!
On the seventh day of Christmas, my true love gave to me
Seven swans a-swimming.
As we discussed on Day One, NoSQL consists of “disruptive innovations” that are gaining steam and moving upmarket. So far, we have discussed functional segmentation (the pivotal innovation), sharding, asynchronous replication, eventual consistency (resulting from lack of distributed transactions across functional segments and from asynchronous replication), and blobs.
The final innovation of the NoSQL camp is “schemaless design.” In database management systems of the NoSQL kind, data is stored in “blobs” and documents the database management system does not police their structure. In mainstream database management systems on the other hand, doctrinal purity requires that the schema be designed before data is inserted. Let’s do a thought experiment.
Suppose that we don’t have a schema and let’s suppose that the following facts are known.
- Iggy Fernandez is an employee with EMPLOYEE_ID=1 and SALARY=$1000.
- Mogens Norgaard is a commissioned employee with EMPLOYEE_ID=2, SALARY=€1000, and COMMISSION_PCT=25.
- Morten Egan is a commissioned employee with EMPLOYEE_ID=3, SALARY=€1000, and unknown COMMISSION_PCT.
Could we ask the following questions and expect to receive correct answers?
- Question: What is the salary of Iggy Fernandez?
- Correct answer: $1000.
- Question: What is the commission percentage of Iggy Fernandez?
- Correct answer: Invalid question.
- Question: What is the commission percentage of Mogens Norgaard?
- Correct answer: 25%
- Question: What is the commission percentage of Morten Egan?
- Correct answer: Unknown.
If we humans can process the above data and correctly answer the above questions, then surely we can program computers to do so.
The above data could be modeled with the following three relations. It is certainly disruptive to suggest that this be done on the fly by the database management system but not outside the realm of possibility.
EMPLOYEES EMPLOYEE_ID NOT NULL NUMBER(6) EMPLOYEE_NAME VARCHAR2(128) UNCOMMISSIONED_EMPLOYEES EMPLOYEE_ID NOT NULL NUMBER(6) SALARY NUMBER(8,2) COMMISSIONED_EMPLOYEES EMPLOYEE_ID NOT NULL NUMBER(6) SALARY NUMBER(8,2) COMMISSION_PCT NUMBER(2,2)
A NoSQL company called Hadapt has already stepped forward with such a feature:
“While it is true that SQL requires a schema, it is entirely untrue that the user has to define this schema in advance before query processing. There are many data sets out there, including JSON, XML, and generic key-value data sets that are self-describing — each value is associated with some key that describes what entity attribute this value is associated with [emphasis added]. If these data sets are stored in Hadoop, there is no reason why Hadoop cannot automatically generate a virtual schema against which SQL queries can be issued. And if this is true, users should not be forced to define a schema before using a SQL-on-Hadoop solution — they should be able to effortlessly issue SQL against a schema that was automatically generated for them when data was loaded into Hadoop.
Thanks to the hard work of many people at Hadapt from several different groups, including the science team who developed an initial design of the feature, the engineering team who continued to refine the design and integrate it into Hadapt’s SQL-on-Hadoop solution, and the customer solutions team who worked with early customers to test and collect feedback on the functionality of this feature, this feature is now available in Hadapt.” (http://hadapt.com/blog/2013/10/28/all-sql-on-hadoop-solutions-are-missing-the-point-of-hadoop/)
This is not really new ground. Oracle Database provides the ability to convert XML documents into relational tables (http://docs.oracle.com/cd/E11882_01/appdev.112/e23094/xdb01int.htm#ADXDB0120) though it ought to be possible to view XML data as tables while physically storing it in XML format in order to benefit certain use cases. It should also be possible to redundantly store data in both XML and relational formats in order to benefit other use cases.
In “Extending the Database Relational Model to Capture More Meaning,” Dr. Codd explains how a “formatted database” is created from a collection of facts:
“Suppose we think of a database initially as a set of formulas in first-order predicate logic. Further, each formula has no free variables and is in as atomic a form as possible (e.g, A & B would be replaced by the component formulas A, B). Now suppose that most of the formulas are simple assertions of the form Pab…z (where P is a predicate and a, b, … , z are constants), and that the number of distinct predicates in the database is few compared with the number of simple assertions. Such a database is usually called formatted, because the major part of it lends itself to rather regular structuring. One obvious way is to factor out the predicate common to a set of simple assertions and then treat the set as an instance of an n-ary relation and the predicate as the name of the relation.”
In other words, a collection of facts can always be organized into relations if necessary.
On the sixth day of Christmas, my true love gave to me
Six geese a-laying.
The final hurdle was extreme performance, and that’s where the Dynamo developers went astray. The Dynamo developers believed that the relational model imposes a “join penalty” and therefore chose to store data as “blobs.” This objection to the relational model is colorfully summarized by the following statement attributed to Esther Dyson, the editor of the Release 1.0 newsletter, “Using tables to store objects is like driving your car home and then disassembling it to put it in the garage. It can be assembled again in the morning, but one eventually asks whether this is the most efficient way to park a car.”  The statement dates back to 1988 and was much quoted when object-oriented databases were in vogue.
Since the shopping cart is an object, doesn’t disassembling it for storage make subsequent data retrieval and updates inefficient? The belief stems from an unfounded assumption that has found its way into every mainstream DBMS—that every table should map to physical storage. In reality, the relational model is a logical model and, therefore, it does not concern itself with storage details at all. It would be perfectly legitimate to store the shopping cart in a physical form that resembled a shopping cart while still offering a relational model of the data complete with SQL. In other words, the physical representation could be optimized for the most important use case—retrieving the entire shopping-cart object using its key—without affecting the relational model of the data. It would also be perfectly legitimate to provide a non-relational API for the important use cases. Dr. Codd himself gave conditional blessing to such non-relational APIs in his 1985 Computerworld article, “Is Your DBMS Really Relational?”, in which he says, “If a relational system has a low-level (single-record-at-a-time) language, that low level [should not] be used to subvert or bypass the integrity rules and constraints expressed in the higher level relational language (multiple-records-at-a-time).”
The key-blob or “key-value” approach used by Dynamo and successor products would be called “zeroth” normal form in relational terminology. In his 1970 paper, Dr. Codd says: “Nonatomic values can be discussed within the relational framework. Thus, some domains may have relations as elements. These relations may, in turn, be defined on nonsimple domains, and so on. For example, one of the domains on which the relation employee is defined might be salary history. An element of the salary history domain is a binary relation defined on the domain date and the domain salary. The salary history domain is the set of all such binary relations. At any instant of time there are as many instances of the salary history relation in the data bank as there are employees. In contrast, there is only one instance of the employee relation.” In common parlance, a relation with non-simple domains is said to be in “zeroth” normal form or unnormalized. Dr. Codd suggested that unnormalized relations should be normalized for ease of use. Here again is the unnormalized employee relation from Dr. Codd’s paper:
employee ( employee#, name, birthdate, jobhistory (jobdate, title, salaryhistory (salarydate, salary)), children (childname, birthyear) )
The above unnormalized relation can be decomposed into four normalized relations as follows.
employee' (employee#, name, birthdate) jobhistory' (employee#, jobdate, title) salaryhistory' (employee#, jobdate, salarydate, salary) children' (employee#, childname, birthyear)
However, this is not to suggest that these normalized relations must necessarily be mapped to individual buckets of physical storage. Dr. Codd differentiated between the stored set, the named set, and the expressible set. In the above example, we have one unnormalized relation and four normalized relations, if we preferred it, the unnormalized employee relation could be the only member of the stored set. Alternatively, if we preferred it, all five relations could be part of the stored set; that is, we could legitimately store redundant representations of the data. However, the common belief blessed by current practice is that the normalized relations should be the only members of the stored set.
Even if the stored set contains only normalized relations, they need not map to different buckets of physical storage. Oracle is unique among mainstream database management systems in providing a convenient construct called the “table cluster” that is suitable for hierarchical schemas. In Dr. Codd’s example, employee# would be the cluster key, and rows corresponding to the same cluster key from all four tables could be stored in the same physical block on disk thus avoiding the join penalty. If the cluster was a “hash cluster,” no indexes would be required for the use case of retrieving records belonging to a single cluster key.
The mistake made by the Dynamo developers is really a mistake perpetuated by the relational camp but it is a mistake nevertheless.
Tomorrow: Schemaless Design
1. I’ve been unable to find the statement in the Release 1.0 archives at http://www.sbw.org/release1.0/ so I don’t really know the true source or author of the statement. However, the statement is popularly attributed to Esther Dyson and claimed to have been published in the Release 1.0 newsletter. I found a claim that the statement is found in the September 1988 issue but that didn’t pan out.
Appendix: Table Clusters in Oracle Database
Here’s a demonstration of using Oracle table clusters to store records from four tables in the same block and retrieving all the components of the “employee cart” without using indexes. First we create four normalized tables and prove that all the records of a single employee including job history, salary history, and children are stored in a single database block so that there is never any join-penalty when assembling employee data. Then we create an object-relational view that assembles employee information into a single unnormalized structure and show how to insert into this view using an “INSTEAD OF” trigger.
The following demonstration was performed using a pre-Built developer VM for Oracle VM VirtualBox. The version of Oracle Database is 188.8.131.52.
SQL*Plus: Release 184.108.40.206.0 Production on Sun Jul 28 19:44:23 2013 Copyright (c) 1982, 2010, Oracle. All rights reserved. Connected to: Oracle Database 11g Enterprise Edition Release 220.127.116.11.0 - Production With the Partitioning, OLAP, Data Mining and Real Application Testing options
First, we create a table cluster and add four tables to the cluster.
SQL> CREATE CLUSTER employees (employee# INTEGER) hashkeys 1000; Cluster created. SQL> CREATE TABLE employees 2 ( 3 employee# INTEGER NOT NULL, 4 name VARCHAR2(16), 5 birth_date DATE, 6 CONSTRAINT employees_pk PRIMARY KEY (employee#) 7 ) 8 CLUSTER employees (employee#); Table created. SQL> CREATE TABLE job_history 2 ( 3 employee# INTEGER NOT NULL, 4 job_date DATE NOT NULL, 5 title VARCHAR2(16), 6 CONSTRAINT job_history_pk PRIMARY KEY (employee#, job_date), 7 CONSTRAINT job_history_fk1 FOREIGN KEY (employee#) REFERENCES employees 8 ) 9 CLUSTER employees (employee#); Table created. SQL> CREATE TABLE salary_history 2 ( 3 employee# INTEGER NOT NULL, 4 job_date DATE NOT NULL, 5 salary_date DATE NOT NULL, 6 salary NUMBER, 7 CONSTRAINT salary_history_pk PRIMARY KEY (employee#, job_date, salary_date), 8 CONSTRAINT salary_history_fk1 FOREIGN KEY (employee#) REFERENCES employees, 9 CONSTRAINT salary_history_fk2 FOREIGN KEY (employee#, job_date) REFERENCES job_history 10 ) 11 CLUSTER employees (employee#); Table created. SQL> CREATE TABLE children 2 ( 3 employee# INTEGER NOT NULL, 4 child_name VARCHAR2(16) NOT NULL, 5 birth_date DATE, 6 CONSTRAINT children_pk PRIMARY KEY (employee#, child_name), 7 CONSTRAINT children_fk1 FOREIGN KEY (employee#) REFERENCES employees 8 ) 9 CLUSTER employees (employee#); Table created.
Then we insert data into all four tables. We find that all the records have been stored in the same database block even though they belong to different tables. Therefore the join-penalty has been eliminated.
SQL> INSERT INTO employees VALUES (1, 'IGNATIUS', '01-JAN-1970'); 1 row created. SQL> INSERT INTO children VALUES (1, 'INIGA', '01-JAN-2001'); 1 row created. SQL> INSERT INTO children VALUES (1, 'INIGO', '01-JAN-2002'); 1 row created. SQL> INSERT INTO job_history VALUES (1, '01-JAN-1991', 'PROGRAMMER'); 1 row created. SQL> INSERT INTO job_history VALUES (1, '01-JAN-1992', 'DATABASE ADMIN'); 1 row created. SQL> INSERT INTO salary_history VALUES (1, '01-JAN-1991', '1-FEB-1991', 1000); 1 row created. SQL> INSERT INTO salary_history VALUES (1, '01-JAN-1991', '1-MAR-1991', 1000); 1 row created. SQL> INSERT INTO salary_history VALUES (1, '01-JAN-1992', '1-FEB-1992', 2000); 1 row created. SQL> INSERT INTO salary_history VALUES (1, '01-JAN-1992', '1-MAR-1992', 2000); 1 row created. SQL> SELECT DISTINCT DBMS_ROWID.ROWID_BLOCK_NUMBER(rowid) AS block_number FROM employees where employee# = 1; BLOCK_NUMBER ------------ 22881 SQL> SELECT DISTINCT DBMS_ROWID.ROWID_BLOCK_NUMBER(rowid) AS block_number FROM children where employee# = 1; BLOCK_NUMBER ------------ 22881 SQL> SELECT DISTINCT DBMS_ROWID.ROWID_BLOCK_NUMBER(rowid) AS block_number FROM job_history where employee# = 1; BLOCK_NUMBER ------------ 22881 SQL> SELECT DISTINCT DBMS_ROWID.ROWID_BLOCK_NUMBER(rowid) AS block_number FROM salary_history where employee# = 1; BLOCK_NUMBER ------------ 22881
Next we create an object-relational view that presents each employee as an object.
SQL> CREATE OR REPLACE TYPE children_rec AS OBJECT (child_name VARCHAR2(16), birth_date DATE) 2 / Type created. SQL> CREATE OR REPLACE TYPE children_tab AS TABLE OF children_rec 2 / Type created. SQL> CREATE OR REPLACE TYPE salary_history_rec AS OBJECT (salary_date DATE, salary NUMBER) 2 / Type created. SQL> CREATE OR REPLACE TYPE salary_history_tab AS TABLE OF salary_history_rec 2 / Type created. SQL> CREATE OR REPLACE TYPE job_history_rec AS OBJECT (job_date DATE, title VARCHAR2(16), salary_history SALARY_HISTORY_TAB) 2 / Type created. SQL> CREATE OR REPLACE TYPE job_history_tab AS TABLE of job_history_rec 2 / Type created. SQL> create or replace view employees_view as 2 SELECT 3 employee#, 4 name, 5 birth_date, 6 CAST 7 ( 8 MULTISET 9 ( 10 SELECT 11 child_name, 12 birth_date 13 FROM children 14 WHERE employee#=e.employee# 15 ) 16 AS children_tab 17 ) children, 18 CAST 19 ( 20 MULTISET 21 ( 22 SELECT 23 job_date, 24 title, 25 CAST 26 ( 27 MULTISET 28 ( 29 SELECT salary_date, salary 30 FROM salary_history 31 WHERE employee#=e.employee# 32 AND job_date=jh.job_date 33 ) 34 AS salary_history_tab 35 ) salary_history 36 FROM job_history jh 37 WHERE employee#=e.employee# 38 ) 39 AS job_history_tab 40 ) job_history 41 FROM employees e; View created.
Let’s retrieve one employee object and look at the query execution plan. No indexes are used in retrieving records from each of the four tables. The cost of the plan is just 1. This is the minimum achievable cost, indicating that there is no join-penalty.
SQL> alter session set "_rowsource_execution_statistics"=true; Session altered. SQL> SELECT * FROM employees_view WHERE employee# = 1; EMPLOYEE# NAME BIRTH_DAT ---------- ---------------- --------- CHILDREN(CHILD_NAME, BIRTH_DATE) ------------------------------------------------------------------------------------------------------------------------------------ JOB_HISTORY(JOB_DATE, TITLE, SALARY_HISTORY(SALARY_DATE, SALARY)) ------------------------------------------------------------------------------------------------------------------------------------ 1 IGNATIUS 01-JAN-70 CHILDREN_TAB(CHILDREN_REC('INIGA', '01-JAN-01'), CHILDREN_REC('INIGO', '01-JAN-02')) JOB_HISTORY_TAB(JOB_HISTORY_REC('01-JAN-91', 'PROGRAMMER', SALARY_HISTORY_TAB(SALARY_HISTORY_REC('01-FEB-91', 1000), SALARY_HISTORY_ REC('01-MAR-91', 1000))), JOB_HISTORY_REC('01-JAN-92', 'DATABASE ADMIN', SALARY_HISTORY_TAB(SALARY_HISTORY_REC('01-FEB-92', 2000), S ALARY_HISTORY_REC('01-MAR-92', 2000)))) SQL> SELECT * FROM TABLE(dbms_xplan.display_cursor(null, null, 'TYPICAL IOSTATS LAST')); PLAN_TABLE_OUTPUT ------------------------------------------------------------------------------------------------------------------------------------ SQL_ID aaxmaqz947aa0, child number 0 ------------------------------------- SELECT * FROM employees_view WHERE employee# = 1 Plan hash value: 2117652374 -------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost | A-Rows | A-Time | Buffers | -------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | | 1 | 1 |00:00:00.01 | 1 | |* 1 | TABLE ACCESS HASH| CHILDREN | 1 | 1 | 32 | | 2 |00:00:00.01 | 1 | |* 2 | TABLE ACCESS HASH| SALARY_HISTORY | 2 | 1 | 44 | | 4 |00:00:00.01 | 3 | |* 3 | TABLE ACCESS HASH| JOB_HISTORY | 1 | 1 | 32 | | 2 |00:00:00.01 | 1 | |* 4 | TABLE ACCESS HASH| EMPLOYEES | 1 | 845 | 27040 | | 1 |00:00:00.01 | 1 | -------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - access("EMPLOYEE#"=:B1) 2 - access("EMPLOYEE#"=:B1) filter("JOB_DATE"=:B1) 3 - access("EMPLOYEE#"=:B1) 4 - access("EMPLOYEE#"=1) Note ----- - cpu costing is off (consider enabling it) - dynamic sampling used for this statement (level=2) 30 rows selected.
Next, let’s create an “INSTEAD OF” trigger so that we insert into the view directly; that is, use a single insert statement instead of multiple insert statements. The trigger will do all the heavy-lifting for us.
SQL> CREATE OR REPLACE TRIGGER employees_view_insert 2 INSTEAD OF INSERT ON employees_view 3 REFERENCING NEW AS n 4 FOR EACH ROW 5 DECLARE 6 i NUMBER; 7 BEGIN 8 INSERT INTO employees 9 VALUES 10 ( 11 :n.employee#, 12 :n.name, 13 :n.birth_date 14 ); 15 16 FOR i IN :n.children.FIRST .. :n.children.LAST 17 LOOP 18 INSERT INTO children 19 VALUES 20 ( 21 :n.employee#, 22 :n.children(i).child_name, 23 :n.children(i).birth_date 24 ); 25 END LOOP; 26 27 FOR i IN :n.job_history.FIRST .. :n.job_history.LAST 28 LOOP 29 INSERT INTO job_history VALUES 30 ( 31 :n.employee#, 32 :n.job_history(i).job_date, 33 :n.job_history(i).title 34 ); 35 FOR j IN :n.job_history(i).salary_history.FIRST .. :n.job_history(i).salary_history.LAST 36 LOOP 37 INSERT INTO salary_history 38 VALUES 39 ( 40 :n.employee#, 41 :n.job_history(i).job_date, 42 :n.job_history(i).salary_history(j).salary_date, 43 :n.job_history(i).salary_history(j).salary 44 ); 45 END LOOP; 46 END LOOP; 47 END; 48 / Trigger created.
Finally, let’s insert an employee object directly into the view and confirm that we can read it back.
SQL> INSERT INTO employees_view 2 VALUES 3 ( 4 2, 5 'YGNACIO', 6 '01-JAN-70', 7 CHILDREN_TAB 8 ( 9 CHILDREN_REC('INIGA', '01-JAN-01'), 10 CHILDREN_REC('INIGO', '01-JAN-02') 11 ), 12 JOB_HISTORY_TAB 13 ( 14 JOB_HISTORY_REC 15 ( 16 '01-JAN-91', 17 'PROGRAMMER', 18 SALARY_HISTORY_TAB 19 ( 20 SALARY_HISTORY_REC('01-FEB-91', 1000), 21 SALARY_HISTORY_REC('01-MAR-91', 1000) 22 ) 23 ), 24 JOB_HISTORY_REC 25 ( 26 '01-JAN-92', 27 'DATABASE ADMIN', 28 SALARY_HISTORY_TAB 29 ( 30 SALARY_HISTORY_REC('01-FEB-92', 2000), 31 SALARY_HISTORY_REC('01-MAR-92', 2000) 32 ) 33 ) 34 ) 35 ); 1 row created. SQL> SELECT * FROM employees_view WHERE employee# = 2; EMPLOYEE# NAME BIRTH_DAT ---------- ---------------- --------- CHILDREN(CHILD_NAME, BIRTH_DATE) ------------------------------------------------------------------------------------------------------------------------------------ JOB_HISTORY(JOB_DATE, TITLE, SALARY_HISTORY(SALARY_DATE, SALARY)) ------------------------------------------------------------------------------------------------------------------------------------ 2 YGNACIO 01-JAN-70 CHILDREN_TAB(CHILDREN_REC('INIGA', '01-JAN-01'), CHILDREN_REC('INIGO', '01-JAN-02')) JOB_HISTORY_TAB(JOB_HISTORY_REC('01-JAN-91', 'PROGRAMMER', SALARY_HISTORY_TAB(SALARY_HISTORY_REC('01-FEB-91', 1000), SALARY_HISTORY_ REC('01-MAR-91', 1000))), JOB_HISTORY_REC('01-JAN-92', 'DATABASE ADMIN', SALARY_HISTORY_TAB(SALARY_HISTORY_REC('01-FEB-92', 2000), S ALARY_HISTORY_REC('01-MAR-92', 2000))))
On the fifth day of Christmas, my true love gave to me
Five golden rings.
By now, you must be wondering when I’m going to get around to explaining how to create a NoSQL database. When I was a junior programmer, quite early in my career, my friends and I were assigned to work on a big software development project for which we would have to use technologies with which we were completely unfamiliar. We were promised that training would be provided before the project started. The very first thing the instructor said was (paraphrasing) “First you have to insert your definitions into the C.D.D.” and he walked to the board and wrote the commands that we needed for the purpose. Needless to say, we were quite flustered because we had no idea what those “definitions” might be or what a “C.D.D.” was and how it fit into the big picture.
NoSQL is being taught without reference to the big picture. None of the current books on NoSQL mention functional segmentation even though it is the underpinning principle of NoSQL. All the current books on NoSQL imply that NoSQL principles are in conflict with the relational model. If you are in a hurry to create your first NoSQL database, I can recommend Seven Databases in Seven Weeks: A Guide to Modern Databases and the NoSQL Movement. But as one of the world’s greatest geniuses Leonardo da Vinci has said: “Those who are in love with practice without science are like the sailor who gets into a ship without rudder or compass, who is never certain where he is going. Practice must always be built on sound theory … The painter who copies by practice and judgement of eye, without rules, is like a mirror which imitates within itself all the things placed before it without any understanding of them.” (On the errors of those who rely on practice without science).
Continuing the train of thought from Day Four, the Dynamo developers saw that one of the keys to extreme availability was data replication. Multiple copies of the shopping cart are allowed to exist and, if one of the replicas becomes unresponsive, the data can be served by one of the other replicas. However, because of network latencies, the copies may occasionally get out of sync and the customer may occasionally encounter a stale version of the shopping cart. Once again, this can be handled appropriately by the application tier; the node that falls behind can catch up eventually or inconsistencies can be detected and resolved at an opportune time, such as at checkout. This technique is called “eventual consistency.”
The inventor of relational theory, Dr. Codd, was acutely aware of the potential overhead of consistency checking. In his 1970 paper, he said:
“There are, of course, several possible ways in which a system can detect inconsistencies and respond to them. In one approach the system checks for possible inconsistency whenever an insertion, deletion, or key update occurs. Naturally, such checking will slow these operations down. [emphasis added] If an inconsistency has been generated, details are logged internally, and if it is not remedied within some reasonable time interval, either the user or someone responsible for the security and integrity of the data is notified. Another approach is to conduct consistency checking as a batch operation once a day or less frequently.”
In other words, the inventor of relational theory would not have found a conflict between his relational model and the “eventual consistency” that is one of the hallmarks of the NoSQL products of today. However, the Dynamo developers imagined a conflict because it quite understandably conflated the relational model with the ACID guarantees of database management systems. However, ACID has nothing to do with the relational model per se (although relational theory does come in very handy in defining consistency constraints); pre-relational database management systems such as IMS provided ACID guarantees and so did post-relational object-oriented database management systems.
I should not defend eventual consistency simply by using a convenient quote from the writings of Dr. Codd. “The devil can cite Scripture for his purpose. An evil soul producing holy witness is like a villain with a smiling cheek, a goodly apple rotten at the heart. O, what a goodly outside falsehood hath!” (from the Shakespeare play The Merchant of Venice) If I am in favor of eventual consistency, I should explain why, not simply quote from the writings of Dr. Codd. If I can defend my own beliefs, I free myself to disagree with Dr. Codd as I plan to do later in this series. I have in fact come to accept that real-time consistency checking should be a design choice not a scriptural mandate. I may have had a different opinion in the past but “a foolish consistency is the hobgoblin of little minds, adored by little statesmen and philosophers and divines. … Speak what you think now in hard words, and to-morrow speak what to-morrow thinks in hard words again, though it contradict every thing you said to-day.” (from the Emerson essay Self-Reliance).
The tradeoff between consistency and performance is as important in the wired world of today as it was in Dr. Codd’s world. We cannot cast stones at Dynamo for the infraction of not guaranteeing the synchronization of replicated data (or allowing temporary inconsistencies between functional segments), because violations of the consistency requirement are equally commonplace in the relational camp. The replication technique used by Dynamo has a close parallel in the technique of “multimaster replication” used in the relational camp. Application developers in the relational camp are warned about the negative impact of integrity constraints.     And, most importantly, no mainstream DBMS currently implements the SQL-92 “CREATE ASSERTION” feature that is necessary to provide the consistency guarantee. For a detailed analysis of this anomaly, refer to Toon Koppelaars’s article “CREATE ASSERTION: The Impossible Dream” in the August 2013 issue of the NoCOUG Journal.
1. “Using primary and foreign keys can impact performance. Avoid using them when possible.” (http://docs.oracle.com/cd/E17904_01/core.1111/e10108/adapters.htm#BABCCCIH)
2. “For performance reasons, the Oracle BPEL Process Manager, Oracle Mediator, human workflow, Oracle B2B, SOA Infrastructure, and Oracle BPM Suite schemas have no foreign key constraints to enforce integrity.” (http://docs.oracle.com/cd/E23943_01/admin.1111/e10226/soaadmin_partition.htm#CJHCJIJI)
3. “For database independence, applications typically do not store the primary key-foreign key relationships in the database itself; rather, the relationships are enforced in the application.” (http://docs.oracle.com/cd/E25178_01/fusionapps.1111/e14496/securing.htm#CHDDGFHH)
4. “The ETL process commonly verifies that certain constraints are true. For example, it can validate all of the foreign keys in the data coming into the fact table. This means that you can trust it to provide clean data, instead of implementing constraints in the data warehouse.” (http://docs.oracle.com/cd/E24693_01/server.11203/e16579/constra.htm#i1006300)
On the fourth day of Christmas, my true love gave to me
Four colly birds.
Let’s recap what we have covered so far.
- Day One: NoSQL technology consists of “disruptive innovations” which present a dilemma for established players.
- Day Two: The goals of NoSQL technology are extreme performance, extreme scalability, and extreme availability.
- Day Three: The underpinning component of NoSQL technology is functional segmentation which results in simple hierarchical schemas.
Amazon’s next design decision was “sharding” or horizontal partitioning of all the tables in a hierarchical schema. Hash-partitioning is typically used. Each table is partitioned in the same way as the other tables in the schema and each set of partitions is placed in a separate database referred to as a “shard.” The shards are independent of each other; that is, there is no clustering (as in Oracle RAC) or federation (as in IBM DB2).
Note that the hierarchical schemas that result from functional segmentation are always shardable; that is, hierarchical schemas are shardable by definition.
Returning to the example from Ted Codd’s 1970 paper on the relational model:
- employee (man#, name, birthdate) with primary key (man#)
- children (man#, childname, birthyear) with primary key (man#, childname)
- jobhistory (man#, jobdate, title) with primary key (man#, jobdate)
- salaryhistory (man#, jobdate, salarydate, salary) with primary key (man#, jobdate, salarydate)
Note that the jobhistory, salaryhistory, and children tables have composite keys. In each case, the leading column of the composite key is the man#. Therefore, all four tables can be partitioned using the man#.
Sharding is an essential component of NoSQL designs but it does not present a conflict with the relational model; it too is simply a physical database design decision. In the relational model, the collection of standalone databases or shards can be logically viewed as a single distributed database.
On the third day of Christmas, my true love gave to me
Three French hens.
Amazon’s pivotal design decision was to break its monolithic enterprise-wide database service into simpler component services such as a best-seller list service, a shopping cart service, a customer preferences service, a sales rank service, and a product catalog service. This avoided a single point of failure. In an interview for the NoCOUG Journal, Amazon’s first database administrator, Jeremiah Wilton explains the rationale behind Amazon’s approach: “The best availability in the industry comes from application software that is predicated upon a surprising assumption: The databases upon which the software relies will inevitably fail. The better the software’s ability to continue operating in such a situation, the higher the overall service’s availability will be. But isn’t Oracle unbreakable? At the database level, regardless of the measures taken to improve availability, outages will occur from time to time. An outage may be from a required upgrade or a bug. Knowing this, if you engineer application software to handle this eventuality, then a database outage will have less or no impact on end users. In summary, there are many ways to improve a single database’s availability. But the highest availability comes from thoughtful engineering of the entire application architecture.” As an example, the shopping cart service should not be affected if the checkout service is unavailable or not performing well.
I said that this was the pivotal design decision made by Amazon. I cannot emphasize this enough. If you resist functional segmentation, you are not ready for NoSQL. If you miss the point, you will not understand NoSQL.
Note that functional segmentation results in simple hierarchical schemas. Here is an example of a simple hierarchical schema from Ted Codd’s 1970 paper on the relational model, meticulously reproduced in the 100th issue of the NoCOUG Journal. This schema stores information about employees, their children, their job histories, and their salary histories.
- employee (man#, name, birthdate)
- children (man#, childname, birthyear)
- jobhistory (man#, jobdate, title)
- salaryhistory (man#, jobdate, salarydate, salary)
Functional segmentation is the underpinning of NoSQL technology but it does not present a conflict with the relational model; it is simply a physical database design decision. Each functional segment is usually assigned its own standalone database. The collection of functional segments could be regarded as a single distributed database. However, distributed transactions are verboten in the NoSQL world. Functional segmentation can therefore result in temporary inconsistencies if, for example, the shopping cart data is not in the same database as the product catalog and occasional inconsistencies result. Occasionally, an item that is present in a shopping cart may go out of stock. Occasionally, an item that is present in a shopping cart may be repriced. The problems can be resolved when the customer decides to check out, if not earlier. As an Amazon customer, I occasionally leave items in my shopping cart but don’t complete a purchase. When I resume shopping, I sometimes get a notification that an item in my shopping chart is no longer in stock or has been repriced. This technique is called “eventual consistency” and the application is responsible for ensuring that inconsistencies are eventually corrected. Randy Shoup, one of the architects of the eBay’s ecommerce platform, explains how:
“At eBay, we allow absolutely no client-side or distributed transactions of any kind – no two-phase commit. In certain well-defined situations, we will combine multiple statements on a single database into a single transactional operation. For the most part, however, individual statements are auto-committed. While this intentional relaxation of orthodox ACID properties does not guarantee immediate consistency everywhere, the reality is that most systems are available the vast majority of the time. Of course, we do employ various techniques to help the system reach eventual consistency: careful ordering of database operations, asynchronous recovery events, and reconciliation or settlement batches. We choose the technique according to the consistency demands of the particular use case.” (Scalability Best Practices: Lessons from eBay)
The eventual consistency technique receives a lot of attention because it is supposedly in conflict with the relational model. We will return to this subject later in this series and argue that eventual consistency is not in conflict with the relational model.
Here is the poll data from the Confio-sponsored webinar “NoSQL and Big Data for the Oracle DBA” and my answers to the questions asked in the chat. The recording of the webinar is now available at http://www.confio.com/webinars/nosql-big-data/. The slide deck is at http://iggyfernandez.files.wordpress.com/2013/10/nosql-and-big-data-for-oracle-dbas-oct-2013.pdf.
|Are NoSQL products and technologies being deployed at your organization?|
|Total Responses: 145 of 303 (48%)|
|Answer||Total Number||Total %|
|Are Big Data products and technologies being deployed at your organization?|
|Total Responses: 138 of 303 (46%)|
|Answer||Total Number||Total %|
|Is there any merit to the claim that NoSQL technology beats relational technology in performance, scalability, and availability?|
|Total Responses: 143 of 303 (47%)|
|Answer||Total Number||Total %|
|No merit whatsoever||5||3%|
|A lot of merit||6||4%|
|Don’t have enough information to judge||94||66%|
SQL v/s NoSQL
Q. Where would I use a NoSQL database v/s Cloudera Hadoop? (G. B.)
A. You would use a NoSQL database where you are dealing with simple schemas such as in the Amazon examples. You would use Cloudera Hadoop when you want to process large amounts of filesystem data using the parallel capabilities of the Map/Reduce algorithm.
Q. I’m a little confused on the objective of the webinar. Is it to indicate that a NoSQL solution is always unnecessary and one can use Oracle? Or that there are situations where a NoSQL data store is relevant and one can still use Oracle? Or something else? (R. B.)
A. There are certainly situations where NoSQL technologies excel, specifically situations that benefit from sharding and replication. However, the eBay example proves that it is also possible to design a modern e-commerce platform without completely abandoning relational technology. The unstated objective of the presentation was to prove that Amazon missed the opportunity to take relational technology to the next level. Amazon believed that relational technology requires a join penalty but it was wrong. See my post http://iggyfernandez.wordpress.com/2013/07/28/no-to-sql-and-no-to-nosql/.
Q. Are you implying that NoSQL is a solution for everything relational or that it has areas where it excels? What are those areas for NoSQL vs. Relational? (C. R.)
A. Answered above.
Q. Are you also going to discuss the cost of Oracle v/s NoSQL? (D. T.)
A. Many NoSQL technologies are open-source. However, there is an argument to be made that you get what you pay for. Oracle Database is a feature-rich and mature product.
Q. If I understood your explanations, you mean that Oracle, through the 12c version, is now able to create data or reorganize any existing ones the NoSQL way and then can still use the powerful SQL language which does not require high technical skills? (J. M. A.)
Clusters have been available in Oracle Database for an extremely long time. For example, refer to the Oracle7 Server Concepts Manual at http://docs.oracle.com/cd/A57673_01/DOC/server/doc/SCN73/ch5.htm#toc052. Object-relational capabilities were introduced in Oracle Database 8.0.
Q. Is there a benefit for eBay to transfer its data into NoSQL clusters given it is already using Oracle the SQL way? (J. M. A.)
The eBay example proves that it is also possible to design a modern e-commerce platform without completely abandoning relational technology.
Q. Why do you think people are not using clustered tables? Are there any downsides with it? (A. K. E.)
A. They are not used because most application developers and database administrators haven’t heard of them even though Oracle uses them for data dictionary tables such as TAB$ and COL$ and even though Oracle uses single-table hash clusters in TPC-C benchmarks.
Clusters cannot be partitioned but you could use partition views to emulate partitioning as in the example at http://iggyfernandez.wordpress.com/2013/01/22/we-dont-use-databases-we-dont-use-indexes/. Note that Oracle used an undocumented patch to partition hash clusters in a recent TPC-C benchmark. See http://iggyfernandez.wordpress.com/2011/05/10/major-new-undocumented-partitioning-feature-in-oracle-database-11g-release-2/.
One issue with hash clusters is the potential for hash collisions and block chains. In the TPC-C benchmarks, Oracle pre-allocates space for all the expected rows and uses the “HASH IS” clause to prevent hash collisions. An alternative is to use indexed clusters.
Q. What if you have large tables in terms of rows. How will table clusters perform? (T. D.)
A. Clusters cannot be partitioned but you could use partition views to emulate partitioning as in the example at http://iggyfernandez.wordpress.com/2013/01/22/we-dont-use-databases-we-dont-use-indexes/. Note that parallel UNION ALL is only available in Oracle Database 12c. Note that Oracle used an undocumented patch to partition hash clusters in a recent TPC-C benchmark. See http://iggyfernandez.wordpress.com/2011/05/10/major-new-undocumented-partitioning-feature-in-oracle-database-11g-release-2/ so it is not unreasonable to hope that the feature will be implemented someday.
Q. How will table clusters perform for large amounts of data? (M. R.)
A. Answered above.
Q. Is there a multi-block read penalty for blocks that are read from the clustered tables in your example when you want to report across all employees? (K. H.)
A. Yes, there is a penalty. But NoSQL optimizes for a specific use case and we did the same. It is good practice to optimize for the most important use case. In the example, we optimized for the use case of retrieving all data for a single employee.
Q. How about the employee object-relational view. Doesn’t it use join operations in the background at the time of select? (K. N.)
A. Yes, it does. But the query execution plan shows that there is no join penalty. When retrieving one row from the view, the estimated cost is 1 and the actual number of blocks touched is also 1.
Q. You mentioned writing Java code [for Hadoop]. How about using C++, Python, etc.? (K. Z.)
Q. How does what you present compare to Aster SQL/MapReduce? (L. L.)
A. The detailed explanation is in Aster Data’s paper “SQL/MapReduce: A practical approach to self-describing, polymorphic, and parallelizable user-defined functions” available at http://pdf.aminer.org/000/225/039/a_practical_approach_to_static_node_positioning.pdf. You might also want to read Oracle’s paper “In-Database Map-Reduce” available at http://www.oracle.com/technetwork/database/bi-datawarehousing/twp-indatabase-mapreduce-128831.pdf.
Q. What is the best database to load Twitter and Facebook data and analyze? (G.B.)
A. Facebook cannot provide public access to user posts because of privacy restrictions but Twitter provides public access to its tweet stream. The choice of technology for analyzing the tweet stream depends on the kind of analysis. Last year, Twitter engineers made presentations on the Twitter architecture to students at the University of California at Berkeley. For their projects, the students analyzed the Twitter data using various technologies. The architecture presentations by the Twitter engineers and the project presentations by the students are available at http://blogs.ischool.berkeley.edu/i290-abdt-s12/. The “map of a tweet” is at http://www.scribd.com/doc/30146338/map-of-a-tweet.
Q. What skills can an Oracle DBA take to use in Big Data world? (G. B.)
A. In the Big Data world, you’re probably going to need Linux, SQL, and programming skills. You can leverage your previous experience as an administrator or an application developer.
Q. Assuming NoSQL takes over, what do you think will be the roles of Database Administrators? (K. I.)
A. I don’t foresee NoSQL taking over. But I see innovation continuing in the relational and non-relational spaces. I see relational and non-relational systems co-existing. I see some companies winning and some companies losing. I see both the relational and non-relational camps adopting each other’s best ideas. However, the tasks will remain the same; that is, installing, configuring, upgrading, monitoring, tuning, programming, etc.
Q. With Big Data and the Cloud coming into the picture, will roles like Oracle DBA and SQL Server DBA be replaced by Big Data DBA and Cloud DBA roles? (V. V.)
A. They won’t be replaced because Oracle Database and SQL Server are not going away.
Q. If NoSQL people are reinventing SQL in one way or another, then what is the future of SQL? (J. G.)
A. Relational algebra is the right tool for a lot of tasks, so the future of SQL is assured.