Day One: Disruptive Innovation
Day Two: Requirements and Assumptions
Day Three: Functional Segmentation
Day Four: Sharding
Day Five: Replication and Eventual Consistency
Day Six: The False Premise of NoSQL
Day Seven: Schemaless Design
Day Eight: Oracle NoSQL Database
Day Nine: NoSQL Taxonomy
Day Ten: Big Data
Day Eleven: Mistakes of the Relational Camp
Day Twelve: Concluding Remarks
On the twelfth day of Christmas, my true love gave to me
Twelve drummers drumming.
The relational camp put productivity, ease-of-use, and logical elegance front and center. However, the mistakes and misconceptions of the relational camp prevent mainstream database management systems from achieving the performance levels required by modern applications. For example, Dr. Codd forbade nested relations (a.k.a. unnormalized relations) and mainstream database management systems equate the normalized set with the stored set.
The NoSQL camp, on the other hand, put performance, scalability, and reliability front and center. Understandably, the NoSQL camp could not see past the mistakes and misconceptions of the relational camp and lost the opportunity to take the relational model to the next level. Just like the relational camp, the NoSQL camp believes that normalization dictates physical storage choices. Just like the relational camp, the NoSQL camp believes that non-relational APIs are forbidden by the relational model. And the NoSQL camp believes that relational is synonymous with ACID (Atomicity, Consistency, Isolation, Durability).
The NoSQL camp created a number of innovations that are disruptive in the sense used by Harvard Business School professor Clayton Christensen: functional segmentation, sharding, replication, eventual consistency, and schemaless design. Since these innovations are compatible with the relational model, I hope that they will eventually be absorbed by mainstream database management systems.
There is already proof that performance, scalability, and reliability can be achieved without abandoning the relational model. For example, ScaleBase provides sharding and replication on top of MySQL storage nodes. Another good example to study is VoltDB, which claims to be the world’s fastest OLTP database (though it has never published an audited TPC benchmark). A counter-example to Amazon is eBay, which arguably has equal scale and equally demanding performance, scalability, and reliability requirements. eBay uses functional segmentation, sharding, replication, and eventual consistency but continues to use Oracle (and SQL) to manage the local database. I asked Randy Shoup, one of the architects of the eBay e-commerce platform, why eBay did not abandon Oracle Database and he answered in one word: “comfort.” Here are links to some of his presentations and articles on the eBay architecture:
- eBay’s Scaling Odyssey: Growing and Evolving a Large eCommerce Site (Slide deck)
- The eBay Architecture: Striking a balance between site stability, feature velocity, performance, and cost (Slide deck)
- Randy Shoup Discusses the eBay Architecture (video and transcript)
- Randy Shoup on eBay’s Architectural Principles (video and transcript)
- Scalability Best Practices: Lessons from eBay (blog post)
Finally, I should point out that there are very good reasons to criticize current NoSQL products: for example, lack of standards; primitive feature sets, security, and management tools; unproven claims; and traps for the unwary. MongoDB, for instance, uses a database-wide lock for reads and writes …
I hope that you enjoyed reading this series of posts as much as I enjoyed writing it. Happy new year!
On the eleventh day of Christmas, my true love gave to me
Eleven pipers piping.
Over a lifespan of four and a half decades, the relational camp made a series of strategic mistakes that made NoSQL possible. The mistakes started very early.
Mistake #1: Emphasizing productivity and ease of use at the expense of performance
The biggest mistake is enshrined in the first sentence of the first paper on relational theory by none other than its inventor, Dr. Edgar Codd: “Future users of large data banks must be protected from having to know how the data is organized in the machine (the internal representation).” (A Relational Model of Data for Large Shared Data Banks)
How likely is it that application developers will develop scalable high-performance applications if they are shielded from the internal representation of data? SQL was never intended for serious application development. As explained by the creators of SQL in their 1974 paper, there is “a large class of users who, while they are not computer specialists, would be willing to learn to interact with a computer in a reasonably high-level, non-procedural query language. Examples of such users are accountants, engineers, architects, and urban planners [emphasis added]. It is for this class of users that SEQUEL is intended. For this reason, [SQL] emphasizes simple data structures and operations [emphasis added].” (http://faculty.cs.tamu.edu/yurttas/PL/DBL/docs/sequel-1974.pdf).
A case in point: if you were the manager of a bookstore, how would you stock the shelves? Would you stand at the door and fling books onto any shelf that had some free space, perhaps recording their locations in a notebook for future reference? Of course not! And would you scatter related books all over the bookstore? Of course not! Then why do we store rows of data in random fashion? The default Oracle table storage structure is the unorganized heap, and yet it is chosen 99.9% of the time.
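To make the alternative concrete, here is a minimal sketch (hypothetical table) of an index-organized table, one of the organized storage structures Oracle has offered all along; rows are kept physically ordered by primary key instead of being flung into a heap:

CREATE TABLE books
(
  isbn    VARCHAR2(13) NOT NULL,
  subject VARCHAR2(30),
  title   VARCHAR2(100),
  CONSTRAINT books_pk PRIMARY KEY (isbn)
)
ORGANIZATION INDEX;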
Productivity and ease-of-use were the goals of the relational model, not performance. In Normalized Data Base Structure: A Brief Tutorial (1971), Dr. Codd said “What is less understandable is the trend toward more and more complexity in the data structures with which application programmers and terminal users directly interact. Surely, in the choice of logical data structures that a system is to support, there is one consideration of absolutely paramount importance – and that is the convenience of the majority of users. … To make formatted data bases readily accessible to users (especially casual users) who have little or no training in programming we must provide the simplest possible data structures and almost natural language. … What could be a simpler, more universally needed, and more universally understood data structure than a table?”
Dr. Codd emphasized the productivity benefits of the relational model in his acceptance speech for the 1981 Turing Award: “It is well known that the growth in demands from end users for new applications is outstripping the capability of data processing departments to implement the corresponding application programs. There are two complementary approaches to attacking this problem (and both approaches are needed): one is to put end users into direct touch with the information stored in computers; the other is to increase the productivity of data processing professionals in the development of application programs. It is less well known that a single technology, relational database management, provides a practical foundation to both approaches.”
The emphasis on productivity and ease of use at the expense of performance was the biggest mistake of the relational camp.
Mistake #2: Forbidding nested relations
Dr. Codd made a second serious mistake in his 1970 paper by forbidding nested relations in the interests of productivity and ease-of-use. He incorrectly argued (in an unpublished version of that paper) that nested relations would mean that “The second-order predicate calculus (rather than first-order) is needed because the domains on which relations are defined may themselves have relations as elements.” Not anticipating markup languages like XML, he also argued that “The simplicity of the array representation which becomes feasible when all relations are cast in normal form is not only an advantage for storage purposes but also for communication of bulk data between systems which use widely different representations of the data.” This caused the detractors of the relational model to observe that “Using tables to store objects is like driving your car home and then disassembling it to put it in the garage. It can be assembled again in the morning, but one eventually asks whether this is the most efficient way to park a car.” (incorrectly attributed to Esther Dyson, the editor of Release 1.0).
Even though he made a serious mistake by forbidding nested relations, Dr. Codd never said that
- data should be stored in normalized form only;
- each stored table should occupy a separate storage bucket;
- a single data block should only contain data from a single table;
- data should be stored in row-major order; and
- stored tables have only one storage representation each.
Oracle Database has limited support for nested tables, but they are not stored inline, they are “collections” rather than real tables, and you cannot define primary-key or foreign-key constraints on them.
Mistake #3: Favoring relational calculus over relational algebra
At the heart of the relational model are the operations of the “relational algebra”: equi-join, theta-join, outer-join, semi-join, anti-join, union, difference, intersection, etc. In the interests of productivity and ease of use, however, Dr. Codd favored “relational calculus” over relational algebra. In Relational Completeness of Data Base Sublanguages, he proved the equivalence of relational calculus and relational algebra, and then made the decision for you and me: “Clearly, the majority of users should not have to learn either the relational calculus or algebra in order to interact with data bases. However, requesting data by its properties is far more natural than devising a particular algorithm or sequence of operations for its retrieval. Thus, a calculus-oriented language provides a good target language for a more user-oriented source language.” SQL was therefore based on relational calculus instead of relational algebra, and it became the job of the database management system to convert SQL statements into equivalent sequences of relational algebra operations. This is not a trivial problem; it is an extremely hard problem. Yet the relational camp is adamant that the optimizer always knows best (it doesn’t).
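To illustrate, here is a minimal sketch using the hypothetical employees and children tables that reappear later in this series. The first query merely states the properties of the desired rows, calculus-style, leaving the optimizer to derive an algebra plan such as a semi-join. The second shows optimizer hints, the escape hatch Oracle provides for when the optimizer does not know best:

-- Calculus-style: state the properties of the desired rows; the
-- optimizer must choose among semi-join, nested loops, hash join, etc.
SELECT e.name
FROM employees e
WHERE EXISTS (SELECT 1 FROM children c WHERE c.employee# = e.employee#);

-- Hints override the optimizer's algebra choices when it chooses badly.
SELECT /*+ LEADING(e c) USE_HASH(c) */ e.name, c.child_name
FROM employees e JOIN children c ON c.employee# = e.employee#;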
Mistake #4: Equating the normalized set with the stored set
Equating the normalized set with the stored set ensures that mainstream database management systems will be permanently handicapped in the performance race. For example, a nested relation can be normalized into “first normal form” (no nested relations). As another example, if B and C are column-subsets of A and A = B EQUIJOIN C, then we can perform a “lossless decomposition” of A into B and C. However, the relational model does not require that each normalized relation be mapped to a separate bucket of physical storage. Dr. Codd differentiated between the “stored” set, the “named” set, and the “expressible” set, but mainstream database management systems conflate the stored set with the normalized set: they only allow integrity constraints to be specified on the stored set and only permit DML operations on the stored set. If B and C are column-subsets of A and A = B EQUIJOIN C, it ought to be possible to store A only, while defining primary-key and foreign-key constraints on B and C and allowing DML operations on B and C, but mainstream database management systems do not permit this.
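As a rough sketch of how close one can get today (hypothetical tables; note that Oracle accepts constraints on views only in unenforced DISABLE NOVALIDATE mode, and DML against the views would require INSTEAD OF triggers, which is precisely the gap described above):

-- Only A is stored; B and C are column-subsets of A exposed as views.
CREATE TABLE a (k INTEGER PRIMARY KEY, b1 INTEGER, c1 INTEGER);
CREATE VIEW b AS SELECT k, b1 FROM a;
CREATE VIEW c AS SELECT k, c1 FROM a;

-- Declared but never enforced: the closest a mainstream DBMS comes to
-- defining keys on the normalized set rather than the stored set.
ALTER VIEW b ADD CONSTRAINT b_pk PRIMARY KEY (k) DISABLE NOVALIDATE;
ALTER VIEW c ADD CONSTRAINT c_pk PRIMARY KEY (k) DISABLE NOVALIDATE;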
Mistake #5: Incomplete implementation
Strange as it may sound, the relational model has not yet been fully implemented by mainstream database management systems. You’ve probably heard of Codd’s “twelve rules,” published in a 1985 Computerworld article. More than a quarter-century later, mainstream database management systems still do not follow all twelve rules. DML operations are still not permitted on arbitrary views (Rule 6 and Rule 7). Arbitrary integrity constraints still cannot be declared and enforced (Rule 10). Integrity constraints still cannot span separate databases (Rule 11). It would be hard to abandon a database management system that had these capabilities. It is easier to abandon a database management system that does not have these capabilities.
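For example, Rule 10 corresponds to the CREATE ASSERTION statement of the ISO SQL standard, which, to my knowledge, no mainstream database management system implements; a statement like the following (hypothetical table and constraint) is rejected by Oracle, SQL Server, MySQL, and PostgreSQL alike:

-- ISO SQL syntax for an arbitrary integrity constraint (Rule 10);
-- not implemented by mainstream database management systems.
CREATE ASSERTION salary_cap CHECK
  (NOT EXISTS (SELECT * FROM employees WHERE salary > 100000));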
Mistake #6: The marriage of relational and transactional
ACID transactions have nothing to do with the relational model per se, although relational theory does come in very handy in defining consistency constraints. (See What’s so sacred about relational anyway?) Pre-relational database management systems such as IMS provided ACID guarantees, and so did post-relational object-oriented database management systems. We should be able to store non-transactional data outside a transactional database management system while continuing to exploit the entire spectrum of indexing, partitioning, and clustering techniques. See We don’t use databases; we don’t use indexes.
On the tenth day of Christmas, my true love gave to me
Ten lords a-leaping.
The topic of Big Data is often brought up in NoSQL discussions so let’s give it a nod. In 1998, Sergey Brin and Larry Page invented the PageRank algorithm for ranking web pages (The Anatomy of a Large-Scale Hypertextual Web Search Engine by Brin and Page) and founded Google. The PageRank algorithm required very large matrix-vector multiplications (Mining of Massive Datasets Ch. 5 by Rajaraman and Ullman) so the MapReduce technique was invented to handle such large computations (MapReduce: Simplified Data Processing on Large Clusters). Smart people then realized that the MapReduce technique could be used for other classes of problems and an open-source project called Hadoop was created to popularize the MapReduce technique (The history of Hadoop: From 4 nodes to the future of data). Other smart people realized that MapReduce could handle the operations of relational algebra such as join, anti-join, semi-join, union, difference, and intersection (Mining of Massive Datasets Ch. 2 by Rajaraman and Ullman) and began looking at the possibility of processing large volumes of business data (a.k.a. “Big Data”) better and cheaper than mainstream database management systems. Initially programmers had to write Java code for the “mappers” and “reducers” used by MapReduce. However, smart people soon realized that SQL queries could be automatically translated into the necessary Java code and “SQL-on-Hadoop” was born. Big Data thus became about processing large volumes of business data with SQL but better and cheaper than mainstream database management systems. However, the smart people have now realized that MapReduce is not the best solution for low-latency queries (Facebook open sources its SQL-on-Hadoop engine, and the web rejoices). Big Data has finally become about processing large volumes of business data with SQL but better and cheaper than mainstream database management systems and with or without MapReduce.
That’s the fast-moving story of Big Data in a nutshell.
On the ninth day of Christmas, my true love gave to me
Nine ladies dancing.
NoSQL databases can be classified into the following categories:
- Key-value stores: The archetype is Amazon Dynamo of which DynamoDB is the commercial successor. Key-value stores basically allow applications to “put” and “get” values but each product has differentiators. For example, DynamoDB supports “tables” (namespaces) while Oracle NoSQL Database offers “major” and “minor” key paths.
- Column-family stores: Column-family stores allow data associated with a single key to be spread over multiple storage nodes. Each storage node only stores a subset of the data associated with the key; hence the name “column-family.” A key is therefore composed of a “row key” and a “column key” in a manner analogous to the major and minor key paths of Oracle NoSQL Database.
- Graph databases: Graph databases are non-relational databases that use graph concepts such as nodes and edges to solve certain classes of problems: for example, the shortest route between two towns on a map. The concepts of functional segmentation, sharding, replication, eventual consistency, and schemaless design do not apply to graph databases, so I will not discuss them further.
NoSQL products are numerous and rapidly evolving. There is a crying need for a continuously updated encyclopedia of NoSQL products but none exists. There is a crying need for an independent benchmarking organization but none exists. My best advice is to do a proof of concept (POC) as well as a PSR (Performance Scalability Reliability) test before committing to using a NoSQL product. Back in the day, in 1985 to be precise, Dr. Codd had words of advice for those who were debating between the new relational products and the established pre-relational products of his day. The advice is as solid today as it was in Dr. Codd’s day.
“Any buyer confronted with the decision of which DBMS to acquire should weigh three factors heavily.
The first factor is the buyer’s performance requirements, often expressed in terms of the number of transactions that must be executed per second. The average complexity of each transaction is also an important consideration. Only if the performance requirements are extremely severe should buyers rule out present relational DBMS products on this basis. Even then buyers should design performance tests of their own, rather than rely on vendor-designed tests or vendor-declared strategies. [emphasis added]
The second factor is reduced costs for developing new databases and new application programs …
The third factor is protecting future investments in application programs by acquiring a DBMS with a solid theoretical foundation …
In every case, a relational DBMS wins on factors two and three. In many cases, it can win on factor one also—in spite of all the myths about performance.”
—An Evaluation Scheme for Database Management Systems that are claimed to be Relational
On the eighth day of Christmas, my true love gave to me
Eight maids a-milking.
In May 2011, Oracle Corporation published a scathing indictment of NoSQL, the last words being “Go for the tried and true path. Don’t be risking your data on NoSQL databases.” Just a few months later, however, in September of that year, Oracle Corporation released Oracle NoSQL Database. Oracle removed the NoSQL criticism from its website, but since information published on the internet is immortal, archived copies can easily be found if you know what you are looking for. In the white paper that accompanied the release of Oracle NoSQL Database, Oracle Corporation claimed that the demands of certain applications could not be met by mainstream database management systems:
“The Oracle NoSQL Database, with its “No Single Point of Failure” architecture, is the right solution when data access is “simple” in nature and application demands exceed the volume or latency capability of traditional data management solutions. For example, click-stream data from high volume web sites, high-throughput event processing and social networking communications all represent application domains that produce extraordinary volumes of simple keyed data. Monitoring online retail behavior, accessing customer profiles, pulling up appropriate customer ads and storing and forwarding real-time communication are examples of domains requiring the ultimate in low-latency access. Highly distributed applications such as real-time sensor aggregation and scalable authentication also represent domains well-suited to Oracle NoSQL Database.”
Oracle NoSQL Database has two features that distinguish it from other key-value stores.
- A key is the concatenation of a “major key path” and a “minor key path.” All records with the same “major key path” will be colocated on the same storage node.
- Oracle NoSQL provides transactional support for modifying multiple records with the same major key path.
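As a rough sketch of what this looks like in the Java API (the store name, host, and keys below are hypothetical), the operations passed to a single execute() call must share a major key path and are applied atomically:

// Requires: import java.util.*; import oracle.kv.*;
final KVStore store = KVStoreFactory.getStore(
    new KVStoreConfig("kvstore", "localhost:5000"));

// Both records share the major key path ["cart", "42"], so they are
// colocated on a single storage node and can be modified atomically.
final OperationFactory factory = store.getOperationFactory();
final List<Operation> ops = new ArrayList<Operation>();
ops.add(factory.createPut(
    Key.createKey(Arrays.asList("cart", "42"), Arrays.asList("item", "1")),
    Value.createValue("widget".getBytes())));
ops.add(factory.createPut(
    Key.createKey(Arrays.asList("cart", "42"), Arrays.asList("item", "2")),
    Value.createValue("gadget".getBytes())));
store.execute(ops); // all-or-nothing within the major key path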
Here are some resources to get you started with Oracle NoSQL Database:
- The white paper on Oracle NoSQL Database v2.0; an updated version of the original September 2011 paper.
- Download the community edition of Oracle NoSQL Database v2.0 from http://download.oracle.com/otn-pub/otn_software/nosql-database/kv-ce-2.0.26.zip. The prerequisite is JDK 1.6 or higher which you can download from http://www.oracle.com/technetwork/java/javase/downloads/index.html.
- Installation instructions and a five-minute quickstart for Oracle NoSQL Database are at http://docs.oracle.com/cd/NOSQL/html/quickstart.html. It includes a “Hello World” teaching example illustrating the “put” and “get” function calls which are the basic operations in key-value stores.
// store is an open oracle.kv.KVStore handle (see the quickstart)
final String keyString = "Hello";
final String valueString = "Big Data World!";
store.put(Key.createKey(keyString), Value.createValue(valueString.getBytes()));
final ValueVersion valueVersion = store.get(Key.createKey(keyString));
System.out.println(keyString + " " + new String(valueVersion.getValue().getValue()));
- Oracle NoSQL Database and Oracle Relational Database – A Perfect Fit, a presentation by Dave Rubin, Director of NoSQL Database development at Oracle Corporation.
- Data Management in Oracle NoSQL Database, a presentation by Anuj Sahni, Principal Product Manager at Oracle Corporation, as well as a hands-on database administration workshop.
- The Oracle NoSQL Database resource page on the Oracle Corporation website.
On the seventh day of Christmas, my true love gave to me
Seven swans a-swimming.
As we discussed on Day One, NoSQL consists of “disruptive innovations” that are gaining steam and moving upmarket. So far, we have discussed functional segmentation (the pivotal innovation), sharding, asynchronous replication, eventual consistency (resulting from lack of distributed transactions across functional segments and from asynchronous replication), and blobs.
The final innovation of the NoSQL camp is “schemaless design.” In database management systems of the NoSQL kind, data is stored in “blobs” or documents, and the database management system does not police their structure. In mainstream database management systems, on the other hand, doctrinal purity requires that the schema be designed before data is inserted. Let’s do a thought experiment.
Suppose that we don’t have a schema and let’s suppose that the following facts are known.
- Iggy Fernandez is an employee with EMPLOYEE_ID=1 and SALARY=$1000.
- Mogens Norgaard is a commissioned employee with EMPLOYEE_ID=2, SALARY=€1000, and COMMISSION_PCT=25.
- Morten Egan is a commissioned employee with EMPLOYEE_ID=3, SALARY=€1000, and unknown COMMISSION_PCT.
Could we ask the following questions and expect to receive correct answers?
- Question: What is the salary of Iggy Fernandez?
- Correct answer: $1000.
- Question: What is the commission percentage of Iggy Fernandez?
- Correct answer: Invalid question.
- Question: What is the commission percentage of Mogens Norgaard?
- Correct answer: 25%
- Question: What is the commission percentage of Morten Egan?
- Correct answer: Unknown.
If we humans can process the above data and correctly answer the above questions, then surely we can program computers to do so.
The above data could be modeled with the following three relations. It is certainly disruptive to suggest that this be done on the fly by the database management system but not outside the realm of possibility.
EMPLOYEES
  EMPLOYEE_ID     NOT NULL  NUMBER(6)
  EMPLOYEE_NAME             VARCHAR2(128)

UNCOMMISSIONED_EMPLOYEES
  EMPLOYEE_ID     NOT NULL  NUMBER(6)
  SALARY                    NUMBER(8,2)

COMMISSIONED_EMPLOYEES
  EMPLOYEE_ID     NOT NULL  NUMBER(6)
  SALARY                    NUMBER(8,2)
  COMMISSION_PCT            NUMBER(2,2)
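Given these three relations, ordinary SQL returns the correct answers to the questions above; a minimal sketch against the schema just shown:

-- Salary of Iggy Fernandez: $1000
SELECT salary FROM uncommissioned_employees WHERE employee_id = 1;

-- Commission percentage of Iggy Fernandez: no row exists in
-- COMMISSIONED_EMPLOYEES, signalling that the question is invalid.
SELECT commission_pct FROM commissioned_employees WHERE employee_id = 1;

-- Commission percentage of Morten Egan: a row exists but
-- COMMISSION_PCT is NULL, signalling that the answer is unknown.
SELECT commission_pct FROM commissioned_employees WHERE employee_id = 3;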
A NoSQL company called Hadapt has already stepped forward with such a feature:
“While it is true that SQL requires a schema, it is entirely untrue that the user has to define this schema in advance before query processing. There are many data sets out there, including JSON, XML, and generic key-value data sets that are self-describing — each value is associated with some key that describes what entity attribute this value is associated with [emphasis added]. If these data sets are stored in Hadoop, there is no reason why Hadoop cannot automatically generate a virtual schema against which SQL queries can be issued. And if this is true, users should not be forced to define a schema before using a SQL-on-Hadoop solution — they should be able to effortlessly issue SQL against a schema that was automatically generated for them when data was loaded into Hadoop.
Thanks to the hard work of many people at Hadapt from several different groups, including the science team who developed an initial design of the feature, the engineering team who continued to refine the design and integrate it into Hadapt’s SQL-on-Hadoop solution, and the customer solutions team who worked with early customers to test and collect feedback on the functionality of this feature, this feature is now available in Hadapt.” (http://hadapt.com/blog/2013/10/28/all-sql-on-hadoop-solutions-are-missing-the-point-of-hadoop/)
This is not really new ground. Oracle Database provides the ability to convert XML documents into relational tables (http://docs.oracle.com/cd/E11882_01/appdev.112/e23094/xdb01int.htm#ADXDB0120) though it ought to be possible to view XML data as tables while physically storing it in XML format in order to benefit certain use cases. It should also be possible to redundantly store data in both XML and relational formats in order to benefit other use cases.
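For example, the XMLTABLE operator (a minimal sketch with hypothetical XML and column names) exposes an XML document as rows and columns without first shredding it into relational storage:

SELECT x.employee_id, x.salary
FROM XMLTABLE
     (
       '/employees/employee'
       PASSING XMLTYPE('<employees>
                          <employee>
                            <employee_id>1</employee_id>
                            <salary>1000</salary>
                          </employee>
                        </employees>')
       COLUMNS
         employee_id NUMBER PATH 'employee_id',
         salary      NUMBER PATH 'salary'
     ) x;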
In “Extending the Database Relational Model to Capture More Meaning,” Dr. Codd explains how a “formatted database” is created from a collection of facts:
“Suppose we think of a database initially as a set of formulas in first-order predicate logic. Further, each formula has no free variables and is in as atomic a form as possible (e.g, A & B would be replaced by the component formulas A, B). Now suppose that most of the formulas are simple assertions of the form Pab…z (where P is a predicate and a, b, … , z are constants), and that the number of distinct predicates in the database is few compared with the number of simple assertions. Such a database is usually called formatted, because the major part of it lends itself to rather regular structuring. One obvious way is to factor out the predicate common to a set of simple assertions and then treat the set as an instance of an n-ary relation and the predicate as the name of the relation.”
In other words, a collection of facts can always be organized into relations if necessary.
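To make Dr. Codd’s construction concrete with a toy example (hypothetical predicate and constants): the simple assertions SALARY(1, 1000), SALARY(2, 1000), and SALARY(3, 1000) share the predicate SALARY, so we factor the predicate out and treat it as the name of a binary relation whose rows are the assertions:

CREATE TABLE salary (employee# INTEGER, amount NUMBER);
INSERT INTO salary VALUES (1, 1000);
INSERT INTO salary VALUES (2, 1000);
INSERT INTO salary VALUES (3, 1000);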
On the sixth day of Christmas, my true love gave to me
Six geese a-laying.
The final hurdle was extreme performance, and that’s where the Dynamo developers went astray. The Dynamo developers believed that the relational model imposes a “join penalty” and therefore chose to store data as “blobs.” This objection to the relational model is colorfully summarized by the following statement attributed to Esther Dyson, the editor of the Release 1.0 newsletter, “Using tables to store objects is like driving your car home and then disassembling it to put it in the garage. It can be assembled again in the morning, but one eventually asks whether this is the most efficient way to park a car.”  The statement dates back to 1988 and was much quoted when object-oriented databases were in vogue.
Since the shopping cart is an object, doesn’t disassembling it for storage make subsequent data retrieval and updates inefficient? The belief stems from an unfounded assumption that has found its way into every mainstream DBMS—that every table should map to physical storage. In reality, the relational model is a logical model and, therefore, it does not concern itself with storage details at all. It would be perfectly legitimate to store the shopping cart in a physical form that resembled a shopping cart while still offering a relational model of the data complete with SQL. In other words, the physical representation could be optimized for the most important use case—retrieving the entire shopping-cart object using its key—without affecting the relational model of the data. It would also be perfectly legitimate to provide a non-relational API for the important use cases. Dr. Codd himself gave conditional blessing to such non-relational APIs in his 1985 Computerworld article, “Is Your DBMS Really Relational?”, in which he says, “If a relational system has a low-level (single-record-at-a-time) language, that low level [should not] be used to subvert or bypass the integrity rules and constraints expressed in the higher level relational language (multiple-records-at-a-time).”
The key-blob or “key-value” approach used by Dynamo and successor products would be called “zeroth” normal form in relational terminology. In his 1970 paper, Dr. Codd says: “Nonatomic values can be discussed within the relational framework. Thus, some domains may have relations as elements. These relations may, in turn, be defined on nonsimple domains, and so on. For example, one of the domains on which the relation employee is defined might be salary history. An element of the salary history domain is a binary relation defined on the domain date and the domain salary. The salary history domain is the set of all such binary relations. At any instant of time there are as many instances of the salary history relation in the data bank as there are employees. In contrast, there is only one instance of the employee relation.” In common parlance, a relation with non-simple domains is said to be in “zeroth” normal form or unnormalized. Dr. Codd suggested that unnormalized relations should be normalized for ease of use. Here again is the unnormalized employee relation from Dr. Codd’s paper:
employee
( employee#,
  name,
  birthdate,
  jobhistory (jobdate, title,
              salaryhistory (salarydate, salary)),
  children (childname, birthyear)
)
The above unnormalized relation can be decomposed into four normalized relations as follows.
employee' (employee#, name, birthdate)
jobhistory' (employee#, jobdate, title)
salaryhistory' (employee#, jobdate, salarydate, salary)
children' (employee#, childname, birthyear)
However, this is not to suggest that these normalized relations must necessarily be mapped to individual buckets of physical storage. Dr. Codd differentiated between the stored set, the named set, and the expressible set. In the above example, we have one unnormalized relation and four normalized relations. If we preferred, the unnormalized employee relation could be the only member of the stored set. Alternatively, all five relations could be part of the stored set; that is, we could legitimately store redundant representations of the data. However, the common belief blessed by current practice is that the normalized relations should be the only members of the stored set.
Even if the stored set contains only normalized relations, they need not map to different buckets of physical storage. Oracle is unique among mainstream database management systems in providing a convenient construct called the “table cluster” that is suitable for hierarchical schemas. In Dr. Codd’s example, employee# would be the cluster key, and rows corresponding to the same cluster key from all four tables could be stored in the same physical block on disk thus avoiding the join penalty. If the cluster was a “hash cluster,” no indexes would be required for the use case of retrieving records belonging to a single cluster key.
The mistake made by the Dynamo developers is really a mistake perpetuated by the relational camp but it is a mistake nevertheless.
Tomorrow: Schemaless Design
1. I’ve been unable to find the statement in the Release 1.0 archives at http://www.sbw.org/release1.0/ so I don’t really know the true source or author of the statement. However, the statement is popularly attributed to Esther Dyson and claimed to have been published in the Release 1.0 newsletter. I found a claim that the statement is found in the September 1988 issue but that didn’t pan out.
Appendix: Table Clusters in Oracle Database
Here’s a demonstration of using Oracle table clusters to store records from four tables in the same block and retrieving all the components of the “employee cart” without using indexes. First we create four normalized tables and prove that all the records of a single employee including job history, salary history, and children are stored in a single database block so that there is never any join-penalty when assembling employee data. Then we create an object-relational view that assembles employee information into a single unnormalized structure and show how to insert into this view using an “INSTEAD OF” trigger.
The following demonstration was performed using a pre-built developer VM for Oracle VM VirtualBox. The version of Oracle Database is 11.2.0.2.

SQL*Plus: Release 11.2.0.2.0 Production on Sun Jul 28 19:44:23 2013

Copyright (c) 1982, 2010, Oracle.  All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
First, we create a table cluster and add four tables to the cluster.
SQL> CREATE CLUSTER employees (employee# INTEGER) hashkeys 1000;

Cluster created.

SQL> CREATE TABLE employees
(
  employee# INTEGER NOT NULL,
  name VARCHAR2(16),
  birth_date DATE,
  CONSTRAINT employees_pk PRIMARY KEY (employee#)
)
CLUSTER employees (employee#);

Table created.

SQL> CREATE TABLE job_history
(
  employee# INTEGER NOT NULL,
  job_date DATE NOT NULL,
  title VARCHAR2(16),
  CONSTRAINT job_history_pk PRIMARY KEY (employee#, job_date),
  CONSTRAINT job_history_fk1 FOREIGN KEY (employee#) REFERENCES employees
)
CLUSTER employees (employee#);

Table created.

SQL> CREATE TABLE salary_history
(
  employee# INTEGER NOT NULL,
  job_date DATE NOT NULL,
  salary_date DATE NOT NULL,
  salary NUMBER,
  CONSTRAINT salary_history_pk PRIMARY KEY (employee#, job_date, salary_date),
  CONSTRAINT salary_history_fk1 FOREIGN KEY (employee#) REFERENCES employees,
  CONSTRAINT salary_history_fk2 FOREIGN KEY (employee#, job_date) REFERENCES job_history
)
CLUSTER employees (employee#);

Table created.

SQL> CREATE TABLE children
(
  employee# INTEGER NOT NULL,
  child_name VARCHAR2(16) NOT NULL,
  birth_date DATE,
  CONSTRAINT children_pk PRIMARY KEY (employee#, child_name),
  CONSTRAINT children_fk1 FOREIGN KEY (employee#) REFERENCES employees
)
CLUSTER employees (employee#);

Table created.
Then we insert data into all four tables. We find that all the records have been stored in the same database block even though they belong to different tables. Therefore the join-penalty has been eliminated.
SQL> INSERT INTO employees VALUES (1, 'IGNATIUS', '01-JAN-1970');

1 row created.

SQL> INSERT INTO children VALUES (1, 'INIGA', '01-JAN-2001');

1 row created.

SQL> INSERT INTO children VALUES (1, 'INIGO', '01-JAN-2002');

1 row created.

SQL> INSERT INTO job_history VALUES (1, '01-JAN-1991', 'PROGRAMMER');

1 row created.

SQL> INSERT INTO job_history VALUES (1, '01-JAN-1992', 'DATABASE ADMIN');

1 row created.

SQL> INSERT INTO salary_history VALUES (1, '01-JAN-1991', '1-FEB-1991', 1000);

1 row created.

SQL> INSERT INTO salary_history VALUES (1, '01-JAN-1991', '1-MAR-1991', 1000);

1 row created.

SQL> INSERT INTO salary_history VALUES (1, '01-JAN-1992', '1-FEB-1992', 2000);

1 row created.

SQL> INSERT INTO salary_history VALUES (1, '01-JAN-1992', '1-MAR-1992', 2000);

1 row created.

SQL> SELECT DISTINCT DBMS_ROWID.ROWID_BLOCK_NUMBER(rowid) AS block_number
FROM employees WHERE employee# = 1;

BLOCK_NUMBER
------------
       22881

SQL> SELECT DISTINCT DBMS_ROWID.ROWID_BLOCK_NUMBER(rowid) AS block_number
FROM children WHERE employee# = 1;

BLOCK_NUMBER
------------
       22881

SQL> SELECT DISTINCT DBMS_ROWID.ROWID_BLOCK_NUMBER(rowid) AS block_number
FROM job_history WHERE employee# = 1;

BLOCK_NUMBER
------------
       22881

SQL> SELECT DISTINCT DBMS_ROWID.ROWID_BLOCK_NUMBER(rowid) AS block_number
FROM salary_history WHERE employee# = 1;

BLOCK_NUMBER
------------
       22881
Next we create an object-relational view that presents each employee as an object.
SQL> CREATE OR REPLACE TYPE children_rec AS OBJECT (child_name VARCHAR2(16), birth_date DATE)
/

Type created.

SQL> CREATE OR REPLACE TYPE children_tab AS TABLE OF children_rec
/

Type created.

SQL> CREATE OR REPLACE TYPE salary_history_rec AS OBJECT (salary_date DATE, salary NUMBER)
/

Type created.

SQL> CREATE OR REPLACE TYPE salary_history_tab AS TABLE OF salary_history_rec
/

Type created.

SQL> CREATE OR REPLACE TYPE job_history_rec AS OBJECT (job_date DATE, title VARCHAR2(16), salary_history SALARY_HISTORY_TAB)
/

Type created.

SQL> CREATE OR REPLACE TYPE job_history_tab AS TABLE OF job_history_rec
/

Type created.

SQL> CREATE OR REPLACE VIEW employees_view AS
SELECT
  employee#,
  name,
  birth_date,
  CAST
  (
    MULTISET
    (
      SELECT
        child_name,
        birth_date
      FROM children
      WHERE employee#=e.employee#
    )
    AS children_tab
  ) children,
  CAST
  (
    MULTISET
    (
      SELECT
        job_date,
        title,
        CAST
        (
          MULTISET
          (
            SELECT salary_date, salary
            FROM salary_history
            WHERE employee#=e.employee#
            AND job_date=jh.job_date
          )
          AS salary_history_tab
        ) salary_history
      FROM job_history jh
      WHERE employee#=e.employee#
    )
    AS job_history_tab
  ) job_history
FROM employees e;

View created.
Let’s retrieve one employee object and look at the query execution plan. No indexes are used in retrieving records from each of the four tables. The cost of the plan is just 1. This is the minimum achievable cost, indicating that there is no join-penalty.
SQL> alter session set "_rowsource_execution_statistics"=true;

Session altered.

SQL> SELECT * FROM employees_view WHERE employee# = 1;

 EMPLOYEE# NAME             BIRTH_DAT
---------- ---------------- ---------
CHILDREN(CHILD_NAME, BIRTH_DATE)
------------------------------------------------------------------------------------------------------------------------------------
JOB_HISTORY(JOB_DATE, TITLE, SALARY_HISTORY(SALARY_DATE, SALARY))
------------------------------------------------------------------------------------------------------------------------------------
         1 IGNATIUS         01-JAN-70
CHILDREN_TAB(CHILDREN_REC('INIGA', '01-JAN-01'), CHILDREN_REC('INIGO', '01-JAN-02'))
JOB_HISTORY_TAB(JOB_HISTORY_REC('01-JAN-91', 'PROGRAMMER', SALARY_HISTORY_TAB(SALARY_HISTORY_REC('01-FEB-91', 1000), SALARY_HISTORY_
REC('01-MAR-91', 1000))), JOB_HISTORY_REC('01-JAN-92', 'DATABASE ADMIN', SALARY_HISTORY_TAB(SALARY_HISTORY_REC('01-FEB-92', 2000), S
ALARY_HISTORY_REC('01-MAR-92', 2000))))

SQL> SELECT * FROM TABLE(dbms_xplan.display_cursor(null, null, 'TYPICAL IOSTATS LAST'));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  aaxmaqz947aa0, child number 0
-------------------------------------
SELECT * FROM employees_view WHERE employee# = 1

Plan hash value: 2117652374

--------------------------------------------------------------------------------------------------------------
| Id  | Operation         | Name           | Starts | E-Rows |E-Bytes| Cost  | A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                |      1 |        |       |     1 |      1 |00:00:00.01 |       1 |
|*  1 |  TABLE ACCESS HASH| CHILDREN       |      1 |      1 |    32 |       |      2 |00:00:00.01 |       1 |
|*  2 |  TABLE ACCESS HASH| SALARY_HISTORY |      2 |      1 |    44 |       |      4 |00:00:00.01 |       3 |
|*  3 |  TABLE ACCESS HASH| JOB_HISTORY    |      1 |      1 |    32 |       |      2 |00:00:00.01 |       1 |
|*  4 |  TABLE ACCESS HASH| EMPLOYEES      |      1 |    845 | 27040 |       |      1 |00:00:00.01 |       1 |
--------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("EMPLOYEE#"=:B1)
   2 - access("EMPLOYEE#"=:B1)
       filter("JOB_DATE"=:B1)
   3 - access("EMPLOYEE#"=:B1)
   4 - access("EMPLOYEE#"=1)

Note
-----
   - cpu costing is off (consider enabling it)
   - dynamic sampling used for this statement (level=2)

30 rows selected.
Next, let’s create an “INSTEAD OF” trigger so that we insert into the view directly; that is, use a single insert statement instead of multiple insert statements. The trigger will do all the heavy-lifting for us.
SQL> CREATE OR REPLACE TRIGGER employees_view_insert
INSTEAD OF INSERT ON employees_view
REFERENCING NEW AS n
FOR EACH ROW
DECLARE
  i NUMBER;
BEGIN
  INSERT INTO employees
  VALUES
  (
    :n.employee#,
    :n.name,
    :n.birth_date
  );

  FOR i IN :n.children.FIRST .. :n.children.LAST
  LOOP
    INSERT INTO children
    VALUES
    (
      :n.employee#,
      :n.children(i).child_name,
      :n.children(i).birth_date
    );
  END LOOP;

  FOR i IN :n.job_history.FIRST .. :n.job_history.LAST
  LOOP
    INSERT INTO job_history VALUES
    (
      :n.employee#,
      :n.job_history(i).job_date,
      :n.job_history(i).title
    );
    FOR j IN :n.job_history(i).salary_history.FIRST .. :n.job_history(i).salary_history.LAST
    LOOP
      INSERT INTO salary_history
      VALUES
      (
        :n.employee#,
        :n.job_history(i).job_date,
        :n.job_history(i).salary_history(j).salary_date,
        :n.job_history(i).salary_history(j).salary
      );
    END LOOP;
  END LOOP;
END;
/

Trigger created.
Finally, let’s insert an employee object directly into the view and confirm that we can read it back.
SQL> INSERT INTO employees_view
VALUES
(
  2,
  'YGNACIO',
  '01-JAN-70',
  CHILDREN_TAB
  (
    CHILDREN_REC('INIGA', '01-JAN-01'),
    CHILDREN_REC('INIGO', '01-JAN-02')
  ),
  JOB_HISTORY_TAB
  (
    JOB_HISTORY_REC
    (
      '01-JAN-91',
      'PROGRAMMER',
      SALARY_HISTORY_TAB
      (
        SALARY_HISTORY_REC('01-FEB-91', 1000),
        SALARY_HISTORY_REC('01-MAR-91', 1000)
      )
    ),
    JOB_HISTORY_REC
    (
      '01-JAN-92',
      'DATABASE ADMIN',
      SALARY_HISTORY_TAB
      (
        SALARY_HISTORY_REC('01-FEB-92', 2000),
        SALARY_HISTORY_REC('01-MAR-92', 2000)
      )
    )
  )
);

1 row created.

SQL> SELECT * FROM employees_view WHERE employee# = 2;

 EMPLOYEE# NAME             BIRTH_DAT
---------- ---------------- ---------
CHILDREN(CHILD_NAME, BIRTH_DATE)
------------------------------------------------------------------------------------------------------------------------------------
JOB_HISTORY(JOB_DATE, TITLE, SALARY_HISTORY(SALARY_DATE, SALARY))
------------------------------------------------------------------------------------------------------------------------------------
         2 YGNACIO          01-JAN-70
CHILDREN_TAB(CHILDREN_REC('INIGA', '01-JAN-01'), CHILDREN_REC('INIGO', '01-JAN-02'))
JOB_HISTORY_TAB(JOB_HISTORY_REC('01-JAN-91', 'PROGRAMMER', SALARY_HISTORY_TAB(SALARY_HISTORY_REC('01-FEB-91', 1000), SALARY_HISTORY_
REC('01-MAR-91', 1000))), JOB_HISTORY_REC('01-JAN-92', 'DATABASE ADMIN', SALARY_HISTORY_TAB(SALARY_HISTORY_REC('01-FEB-92', 2000), S
ALARY_HISTORY_REC('01-MAR-92', 2000))))