“The devil can cite Scripture for his purpose.
An evil soul producing holy witness
Is like a villain with a smiling cheek,
A goodly apple rotten at the heart:
O, what a goodly outside falsehood hath!”
—from The Merchant of Venice (1596) by William Shakespeare
“Yet see what strong intellects dare not yet hear God himself, unless he speak the phraseology of I know not what David, or Jeremiah, or Paul. We shall not always set so great a price on a few texts, on a few lives. We are like children who repeat by rote the sentences of grandames and tutors, and, as they grow older, of the men of talents and character they chance to see,—painfully recollecting the exact words they spoke; afterwards, when they come into the point of view which those had who uttered these sayings, they understand them, and are willing to let the words go; for, at any time, they can use words as good when occasion comes. If we live truly, we shall see truly. It is as easy for the strong man to be strong, as it is for the weak to be weak. When we have new perception, we shall gladly disburden the memory of its hoarded treasures as old rubbish.”—from Self-Reliance (1841) by Ralph Waldo Emerson
Performance trumps theory:
“Any buyer confronted with making a decision regarding which DBMS to acquire should weigh three factors heavily. The first factor is his performance requirements, often expressed in terms of the number of transactions which must be executed per second, even though the average complexity of each transaction is an important consideration also. Only if his performance requirements are extremely severe does he need [emphasis added] to rule out present releases of relational DBMS products on this basis.”—from An Evaluation Scheme for Database Management Systems that are claimed to be Relational by Dr. Edgar Codd, the inventor of the relational model
Graph problems don’t benefit from the relational model:
“It may be argued that in some applications the problems have an immediate natural formulation in terms of networks. This is true of some applications, such as studies of transportation networks, power-line networks, computer design, and the like. We shall call these network applications and consider their special needs later. The numerous data bases which reflect the daily operations and transactions of commercial and industrial enterprises are, for the most part, concerned with non-network applications. … Except in network applications, links should not be employed in the user’s data model.”—from Normalized Data Base Structure: A Brief Tutorial by Dr. Edgar Codd, the inventor of the relational model
Eventual consistency is a valid design choice:
“There are, of course, several possible ways in which a system can detect inconsistencies and respond to them. In one approach the system checks for possible inconsistency whenever an insertion, deletion, or key update occurs. Naturally, such checking will slow these operations down. [emphasis added] If an inconsistency has been generated, details are logged internally, and if it is not remedied within some reasonable time interval, either the user or someone responsible for the security and integrity of the data is notified. Another approach is to conduct consistency checking as a batch operation once a day or less frequently. Inputs causing the inconsistencies which remain in the data bank state at checking time can be tracked down if the system maintains a journal of all state-changing transactions. This latter approach would certainly be superior if few non-transitory inconsistencies occurred.”—from A Relational Model of Data for Large Shared Data Banks by Dr. Edgar Codd, the inventor of the relational model
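Codd's second approach, which accepts writes unchecked and runs a periodic batch check that uses a journal of state-changing transactions to trace the inputs behind any inconsistency, can be sketched in a few lines of Python. This is a toy under stated assumptions: the `Store` class, the department/employee reference rule, and the journal format are all hypothetical and are not taken from Codd's paper or from any DBMS.

```python
from dataclasses import dataclass, field


# Toy model; every name here is hypothetical.
@dataclass
class Store:
    """Data bank with a set of department ids and an emp_id -> dept_id map."""
    departments: set = field(default_factory=set)
    employees: dict = field(default_factory=dict)
    # Journal of all state-changing transactions: (sequence number, operation, arguments).
    journal: list = field(default_factory=list)

    def _log(self, op: str, *args: int) -> None:
        self.journal.append((len(self.journal) + 1, op, args))

    # Writes are applied without any consistency check, so they stay fast.
    def insert_department(self, dept_id: int) -> None:
        self.departments.add(dept_id)
        self._log("insert_department", dept_id)

    def delete_department(self, dept_id: int) -> None:
        self.departments.discard(dept_id)
        self._log("delete_department", dept_id)

    def insert_employee(self, emp_id: int, dept_id: int) -> None:
        self.employees[emp_id] = dept_id
        self._log("insert_employee", emp_id, dept_id)


def batch_consistency_check(store: Store) -> dict:
    """Run once per batch window: map each employee whose department is missing
    to the journal sequence numbers of the inputs that may have caused it."""
    report = {}
    for emp_id, dept_id in store.employees.items():
        if dept_id in store.departments:
            continue  # consistent, or a transitory inconsistency already remedied
        report[emp_id] = [
            seq
            for seq, op, args in store.journal
            if (op == "insert_employee" and args[0] == emp_id)
            or (op.endswith("_department") and args[0] == dept_id)
        ]
    return report


store = Store()
store.insert_department(10)   # journal entry 1
store.insert_employee(1, 10)  # journal entry 2
store.insert_employee(2, 20)  # journal entry 3: department 20 does not exist
store.delete_department(10)   # journal entry 4: orphans employee 1
print(batch_consistency_check(store))  # {1: [1, 2, 4], 2: [3]}
```

If department 20 were inserted before the check ran, employee 2 would drop out of the report: only inconsistencies that remain at checking time are reported, which is why Codd considered this approach superior when few non-transitory inconsistencies occur.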
The underlying storage structures may validly include XML, object-relational, key-value, document, and column-family storage structures:
“It is important to remember that we are not making a case for or against any physical [emphasis in the original text] storage structures.”—from Normalized Data Base Structure: A Brief Tutorial by Dr. Edgar Codd, the inventor of the relational model
SQL is far from perfect:
“SQL departs significantly from the relational model.”—from The Relational Model for Database Management Version 2 by Dr. Edgar Codd, the inventor of the relational model
Here is the poll data from the Confio-sponsored webinar “NoSQL and Big Data for the Oracle DBA” and my answers to the questions asked in the chat. The recording of the webinar is now available at http://www.confio.com/webinars/nosql-big-data/. The slide deck is at http://iggyfernandez.files.wordpress.com/2013/10/nosql-and-big-data-for-oracle-dbas-oct-2013.pdf.
Are NoSQL products and technologies being deployed at your organization?
Total Responses: 145 of 303 (48%)

| Answer | Total Number | Total % |
|---|---|---|

Are Big Data products and technologies being deployed at your organization?
Total Responses: 138 of 303 (46%)

| Answer | Total Number | Total % |
|---|---|---|

Is there any merit to the claim that NoSQL technology beats relational technology in performance, scalability, and availability?
Total Responses: 143 of 303 (47%)

| Answer | Total Number | Total % |
|---|---|---|
| No merit whatsoever | 5 | 3% |
| A lot of merit | 6 | 4% |
| Don’t have enough information to judge | 94 | 66% |
SQL v/s NoSQL
Q. Where would I use a NoSQL database v/s Cloudera Hadoop? (G. B.)
A. You would use a NoSQL database where you are dealing with simple schemas such as in the Amazon examples. You would use Cloudera Hadoop when you want to process large amounts of filesystem data using the parallel capabilities of the Map/Reduce algorithm.
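The map, shuffle, and reduce steps can be illustrated with the classic word-count example. This is a single-process Python sketch, not Hadoop code: the function names are hypothetical, and in a real Hadoop job the map and reduce calls would be distributed across the nodes that hold the file blocks.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


# Single-process sketch of the Map/Reduce pattern; names are hypothetical.
def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input record."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key (the framework does this between phases)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key."""
    return key, sum(values)


def map_reduce(lines: Iterable[str]) -> Dict[str, int]:
    # Each map_phase call is independent of the others, which is what lets a cluster run them in parallel.
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


print(map_reduce(["the cat sat", "The dog sat"]))  # {'the': 2, 'cat': 1, 'sat': 2, 'dog': 1}
```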
Q. I’m a little confused on the objective of the webinar. Is it to indicate that a NoSQL solution is always unnecessary and one can use Oracle? Or that there are situations where a NoSQL data store is relevant and one can still use Oracle? Or something else? (R. B.)
A. There are certainly situations where NoSQL technologies excel, specifically situations that benefit from sharding and replication. However, the eBay example proves that it is also possible to design a modern e-commerce platform without completely abandoning relational technology. The unstated objective of the presentation was to prove that Amazon missed the opportunity to take relational technology to the next level. Amazon believed that relational technology necessarily incurs a join penalty, but it was wrong. See my post http://iggyfernandez.wordpress.com/2013/07/28/no-to-sql-and-no-to-nosql/.
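Sharding and replication can be illustrated with a toy key-value store that hash-partitions keys across shards and copies each key to the next few shards, so a read still succeeds when one shard is down. This is a simplified sketch under stated assumptions (MD5 as the partitioning hash, consecutive shards as replicas, and the hypothetical `ShardedStore` class); it is not a description of Amazon's actual system.

```python
import hashlib
from typing import Any, List, Optional


# Toy sketch; the partitioning scheme and class are assumptions for illustration.
def shard_for(key: str, num_shards: int) -> int:
    """Stable hash partitioning (Python's built-in hash() is randomized per process)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def replica_shards(key: str, num_shards: int, replication_factor: int) -> List[int]:
    """Primary shard plus the next replication_factor - 1 shards, wrapping around."""
    if not 1 <= replication_factor <= num_shards:
        raise ValueError("replication_factor must be between 1 and num_shards")
    primary = shard_for(key, num_shards)
    return [(primary + i) % num_shards for i in range(replication_factor)]


class ShardedStore:
    """Hypothetical key-value store; each shard is just an in-memory dict here."""

    def __init__(self, num_shards: int, replication_factor: int = 3) -> None:
        self.shards = [dict() for _ in range(num_shards)]
        self.replication_factor = replication_factor
        self.down = set()  # shard numbers currently unavailable

    def _replicas(self, key: str) -> List[int]:
        return replica_shards(key, len(self.shards), self.replication_factor)

    def put(self, key: str, value: Any) -> None:
        # Write to every reachable replica (no recovery of missed writes in this sketch).
        for s in self._replicas(key):
            if s not in self.down:
                self.shards[s][key] = value

    def get(self, key: str) -> Optional[Any]:
        # Read from the first reachable replica that holds the key.
        for s in self._replicas(key):
            if s not in self.down and key in self.shards[s]:
                return self.shards[s][key]
        return None


store = ShardedStore(num_shards=5, replication_factor=3)
store.put("cart:42", ["book"])
store.down.add(shard_for("cart:42", 5))  # the primary shard fails
print(store.get("cart:42"))  # ['book'], served by a replica
```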
Q. Are you implying that NoSQL is a solution for everything relational or that it has areas where it excels? What are those areas for NoSQL vs. Relational? (C. R.)
A. Answered above.
Q. Are you also going to discuss the cost of Oracle v/s NoSQL? (D. T.)
A. Many NoSQL technologies are open-source. However, there is an argument to be made that you get what you pay for. Oracle Database is a feature-rich and mature product.
Q. If I understood your explanations, you mean that Oracle, through the 12c version, is now able to create data or reorganize any existing ones the NoSQL way and then can still use the powerful SQL language which does not require high technical skills? (J. M. A.)
A. Clusters have been available in Oracle Database for an extremely long time. For example, refer to the Oracle7 Server Concepts Manual at http://docs.oracle.com/cd/A57673_01/DOC/server/doc/SCN73/ch5.htm#toc052. Object-relational capabilities were introduced in Oracle Database 8.0.
Q. Is there a benefit for eBay to transfer its data into NoSQL clusters given it is already using Oracle the SQL way? (J. M. A.)
A. The eBay example proves that it is also possible to design a modern e-commerce platform without completely abandoning relational technology.
Q. Why do you think people are not using clustered tables? Are there any downsides with it? (A. K. E.)
A. They are rarely used because most application developers and database administrators haven’t heard of them, even though Oracle itself uses clusters for data dictionary tables such as TAB$ and COL$ and uses single-table hash clusters in its TPC-C benchmarks.
Clusters cannot be partitioned, but you could use partition views to emulate partitioning as in the example at http://iggyfernandez.wordpress.com/2013/01/22/we-dont-use-databases-we-dont-use-indexes/. Note that Oracle used an undocumented patch to partition hash clusters in a recent TPC-C benchmark. See http://iggyfernandez.wordpress.com/2011/05/10/major-new-undocumented-partitioning-feature-in-oracle-database-11g-release-2/.
One issue with hash clusters is the potential for hash collisions and the chains of overflow blocks they can cause. In the TPC-C benchmarks, Oracle pre-allocates space for all the expected rows and uses the “HASH IS” clause to prevent hash collisions. An alternative is to use indexed clusters.
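The effect of the “HASH IS” clause can be illustrated with a toy simulation in which each hash value owns one block and a full block gets an overflow block chained to it. The `ToyHashCluster` class, the MD5-based stand-in for Oracle's internal hash function, and the one-key-per-block sizing are all assumptions for illustration; Oracle's actual storage logic (for example, how it sizes and numbers hash values) is not modeled.

```python
import hashlib
from collections import defaultdict
from typing import Callable, Dict, List, Optional


def generic_hash(key: int) -> int:
    """Stand-in for a system hash function (NOT Oracle's actual algorithm)."""
    return int.from_bytes(hashlib.md5(str(key).encode()).digest()[:8], "big")


class ToyHashCluster:
    """Hypothetical model of a single-table hash cluster."""

    def __init__(self, hashkeys: int, rows_per_block: int,
                 hash_is: Optional[Callable[[int], int]] = None) -> None:
        self.hashkeys = hashkeys              # number of hash values, one home block each
        self.rows_per_block = rows_per_block  # how many cluster keys fit in a block
        self.hash_is = hash_is                # user-supplied hash expression, if any
        # hash value -> chain of blocks; chain[0] is the home block, the rest are overflow
        self.chains: Dict[int, List[List[int]]] = defaultdict(list)

    def _hash_value(self, key: int) -> int:
        h = self.hash_is(key) if self.hash_is else generic_hash(key)
        return h % self.hashkeys

    def insert(self, key: int) -> None:
        chain = self.chains[self._hash_value(key)]
        if not chain or len(chain[-1]) == self.rows_per_block:
            chain.append([])  # home block, or a new overflow block chained to it
        chain[-1].append(key)

    def blocks_read(self, key: int) -> int:
        """Blocks visited, following the chain, to find the key."""
        chain = self.chains[self._hash_value(key)]
        for n, block in enumerate(chain, start=1):
            if key in block:
                return n
        return len(chain)

    def overflow_blocks(self) -> int:
        return sum(len(chain) - 1 for chain in self.chains.values() if chain)


keys = range(1, 1001)
default_cluster = ToyHashCluster(hashkeys=1000, rows_per_block=1)
hash_is_cluster = ToyHashCluster(hashkeys=1000, rows_per_block=1, hash_is=lambda k: k)
for k in keys:
    default_cluster.insert(k)
    hash_is_cluster.insert(k)
print(default_cluster.overflow_blocks() > 0)              # True: collisions create overflow chains
print(hash_is_cluster.overflow_blocks())                  # 0: every key has its own block
print(max(hash_is_cluster.blocks_read(k) for k in keys))  # 1
```

When the cluster key is a unique integer and there are at least as many hash values as keys, using the key itself as the hash value gives every key its own block, so no lookup ever has to follow an overflow chain.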
Q. What if you have tables with a large number of rows? How will table clusters perform? (T. D.)
A. Clusters cannot be partitioned, but you could use partition views to emulate partitioning as in the example at http://iggyfernandez.wordpress.com/2013/01/22/we-dont-use-databases-we-dont-use-indexes/. Note that parallel UNION ALL is not available before Oracle Database 12c. Oracle used an undocumented patch to partition hash clusters in a recent TPC-C benchmark (see http://iggyfernandez.wordpress.com/2011/05/10/major-new-undocumented-partitioning-feature-in-oracle-database-11g-release-2/), so it is not unreasonable to hope that the feature will be officially supported someday.
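Partition pruning in a partition view can be sketched as follows: each branch of the UNION ALL view is a separate table (which could itself be a cluster) with a CHECK constraint on the key range, and any branch whose constraint rules out the query predicate is skipped without being read. The `Partition` class, the table names, and the half-open ranges below are hypothetical; this models the idea, not Oracle's optimizer.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


# Toy model of a UNION ALL partition view; names and ranges are hypothetical.
@dataclass
class Partition:
    """One branch of the view: a separate table with CHECK (key >= low AND key < high)."""
    name: str
    low: int
    high: int
    keys: List[int] = field(default_factory=list)


def query_partition_view(partitions: List[Partition],
                         key_low: int, key_high: int) -> Tuple[List[int], List[str]]:
    """Evaluate WHERE key >= key_low AND key < key_high against the view.

    Returns the matching keys and the names of the branches actually scanned."""
    matches: List[int] = []
    scanned: List[str] = []
    for p in partitions:
        # Pruning: the CHECK constraint proves that no row in this branch can match.
        if p.high <= key_low or p.low >= key_high:
            continue
        scanned.append(p.name)
        matches.extend(k for k in p.keys if key_low <= k < key_high)
    return matches, scanned


view = [
    Partition("orders_p1", 0, 100, [5, 50]),
    Partition("orders_p2", 100, 200, [150, 199]),
    Partition("orders_p3", 200, 300, [250]),
]
print(query_partition_view(view, 120, 180))  # ([150], ['orders_p2'])
```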
Q. How will table clusters perform for large amounts of data? (M. R.)
A. Answered above.
Q. Is there a multi-block read penalty for blocks that are read from the clustered tables in your example when you want to report across all employees? (K. H.)
A. Yes, there is a penalty. But NoSQL optimizes for a specific use case and we did the same. It is good practice to optimize for the most important use case. In the example, we optimized for the use case of retrieving all data for a single employee.
Q. What about the employee object-relational view? Doesn’t it use join operations in the background at the time of select? (K. N.)
A. Yes, it does. But the query execution plan shows that there is no join penalty. When retrieving one row from the view, the estimated cost is 1 and the actual number of blocks touched is also 1.
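The trade-off described in the last two answers can be shown with a toy block-counting model that compares separate heap tables with a layout in which each employee's row and job-history rows share one cluster block. Everything here is an assumption for illustration (the row counts, rows per block, history rows arriving interleaved over time, and one cluster key per block); the numbers are not measurements from the webinar example.

```python
from typing import Dict, Iterable, Tuple

# Toy model; row and block identifiers are hypothetical.
Row = Tuple  # ("emp", emp_id) or ("hist", emp_id, seq)
Block = Tuple[str, int]


def heap_layout(num_emps: int, history_per_emp: int, rows_per_block: int) -> Dict[Row, Block]:
    """Separate heap tables: employee rows packed densely, history rows appended over time."""
    location: Dict[Row, Block] = {}
    for e in range(num_emps):
        location[("emp", e)] = ("emp_block", e // rows_per_block)
    position = 0
    for seq in range(history_per_emp):  # history arrives over time,
        for e in range(num_emps):       # interleaved across all employees
            location[("hist", e, seq)] = ("hist_block", position // rows_per_block)
            position += 1
    return location


def clustered_layout(num_emps: int, history_per_emp: int, rows_per_block: int) -> Dict[Row, Block]:
    """Cluster on emp_id: each employee's row and history rows share one block."""
    if 1 + history_per_emp > rows_per_block:
        raise ValueError("this sketch assumes one cluster key fits in one block")
    location: Dict[Row, Block] = {}
    for e in range(num_emps):
        location[("emp", e)] = ("cluster_block", e)
        for seq in range(history_per_emp):
            location[("hist", e, seq)] = ("cluster_block", e)
    return location


def blocks_touched(location: Dict[Row, Block], rows: Iterable[Row]) -> int:
    """Number of distinct blocks that must be read to fetch the given rows."""
    return len({location[row] for row in rows})


n, h, rpb = 100, 4, 10
heap, clustered = heap_layout(n, h, rpb), clustered_layout(n, h, rpb)
one_employee = [("emp", 7)] + [("hist", 7, seq) for seq in range(h)]
all_employees = [("emp", e) for e in range(n)]
print(blocks_touched(heap, one_employee), blocks_touched(clustered, one_employee))    # 5 1
print(blocks_touched(heap, all_employees), blocks_touched(clustered, all_employees))  # 10 100
```

In this model, fetching one employee with its history touches 1 block in the clustered layout versus 5 in the heap layout, while reading every employee row touches 100 cluster blocks versus 10 densely packed heap blocks. That is the multi-block read penalty accepted in exchange for fast single-employee retrieval.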
Q. You mentioned writing Java code [for Hadoop]. How about using C++, Python, etc.? (K. Z.)
Q. How does what you present compare to Aster SQL/MapReduce? (L. L.)
A. The detailed explanation is in Aster Data’s paper “SQL/MapReduce: A practical approach to self-describing, polymorphic, and parallelizable user-defined functions” available at http://pdf.aminer.org/000/225/039/a_practical_approach_to_static_node_positioning.pdf. You might also want to read Oracle’s paper “In-Database Map-Reduce” available at http://www.oracle.com/technetwork/database/bi-datawarehousing/twp-indatabase-mapreduce-128831.pdf.
Q. What is the best database for loading and analyzing Twitter and Facebook data? (G. B.)
A. Facebook cannot provide public access to user posts because of privacy restrictions, but Twitter provides public access to its tweet stream. The choice of technology for analyzing the tweet stream depends on the kind of analysis. Last year, Twitter engineers made presentations on the Twitter architecture to students at the University of California at Berkeley. For their projects, the students analyzed the Twitter data using various technologies. The architecture presentations by the Twitter engineers and the project presentations by the students are available at http://blogs.ischool.berkeley.edu/i290-abdt-s12/. The “map of a tweet” is at http://www.scribd.com/doc/30146338/map-of-a-tweet.
Q. What skills can an Oracle DBA carry over to the Big Data world? (G. B.)
A. In the Big Data world, you’re probably going to need Linux, SQL, and programming skills. You can leverage your previous experience as an administrator or an application developer.
Q. Assuming NoSQL takes over, what do you think will be the roles of Database Administrators? (K. I.)
A. I don’t foresee NoSQL taking over. Instead, I see innovation continuing in the relational and non-relational spaces. I see relational and non-relational systems co-existing. I see some companies winning and some companies losing. I see both the relational and non-relational camps adopting each other’s best ideas. In any case, the tasks will remain the same: installing, configuring, upgrading, monitoring, tuning, programming, etc.
Q. With Big Data and the Cloud coming into the picture, will roles like Oracle DBA and SQL Server DBA be replaced by Big Data DBA and Cloud DBA roles? (V. V.)
A. They won’t be replaced because Oracle Database and SQL Server are not going away.
Q. If NoSQL people are reinventing SQL in one way or another, then what is the future of SQL? (J. G.)
A. Relational algebra is the right tool for a lot of tasks, so the future of SQL is assured.
Update: The recording is now available at http://www.confio.com/webinars/nosql-big-data/. The slide deck is at http://iggyfernandez.files.wordpress.com/2013/10/nosql-and-big-data-for-oracle-dbas-oct-2013.pdf.
In my webinar for Confio on October 10, I will explain that the deficiencies of relational technology are actually a result of deliberate choices made by the relational movement in its early years. The relational camp needs to revisit these choices if it wants to compete with NoSQL and Big Data technologies in the areas of performance, scalability, and availability. Previous versions of this presentation have been delivered at the Great Lakes Oracle Conference, NoCOUG, and OakTableWorld. New material is constantly being added to the presentation based on attendee feedback and additional research. Attendees seem most interested in learning that eBay had the same performance, scalability, and availability requirements as Amazon but stuck with Oracle and SQL.
- The origins of NoSQL
  - Amazon’s requirements
  - Amazon’s solution
  - Amazon v/s eBay
- The false premise of NoSQL
  - Zeroth normal form
  - The problem with flat tables
  - Eliminating the join penalty
  - Dr. Codd on eventual consistency
- The NoSQL landscape
  - Key-value databases
  - Document databases
  - Column-family databases
  - Graph databases
- What makes relational technology so sacred anyway?
- Mistakes of the relational camp
  - De-emphasizing physical database design
  - Discarding nested relations
  - Favoring relational calculus over relational algebra
  - Equating the normalized set with the stored set
  - Marrying relational theory to ACID DBMS
  - Ignoring SQL-92 CREATE ASSERTION
- Going one better than Hadoop
  - Preprocessor feature of external tables
  - Partition pruning in partition views
  - Parallel UNION ALL in Oracle Database 12c
- Bonus slides
  - NoSQL Buyer’s Guide