Why SQL Loses and NoSQL Wins
The First Reason: SQL Can’t Really be Optimized
SQL was not originally intended for the use of software developers. It was originally intended for the use of accountants, engineers, architects, and urban planners who, “while they are not computer specialists, would be willing to learn to interact with a computer in a reasonably high-level, non-procedural query language.” This is a direct quote from the 1972 paper that first presented SQL to the world, “SEQUEL: A STRUCTURED ENGLISH QUERY LANGUAGE,” by IBM researchers Donald Chamberlin and Raymond Boyce. Therein lies the first reason why SQL loses: it is a high-level, non-procedural query language designed for the use of those who are not computer specialists. As the 1972 Turing Award winner, Edsger Dijkstra, said: A number of our established “powerful” programming language features, even beloved ones, could very well turn out to belong to the “problem set” than to the “solution set.” (Correctness Concerns, and Among Other Things, Why They Are Resented)
So what’s the big problem with SQL being a high-level, non-procedural query language? Let’s start with two words: query optimization. Database performance expert Wolfgang Breitling complains: “I became interested in the CBO’s selectivity calculations trying to understand why it comes up with some of the ridiculously low cardinality estimates (like 1 when in reality there are 80,000+) which then lead to disastrous access plans that take hours, provided they finish at all, instead of minutes or seconds.” It will require a hefty book to explain why the optimization problem is difficult—you could start with Cost-Based Oracle Fundamentals by Jonathan Lewis—so I won’t trivialize the problem by trying to explain it in a few paragraphs in this position piece.
It doesn’t help that SQL allows us to formulate a query in so many different ways and that that current SQL optimizers cannot resolve all equivalent SQL queries into a single canonical form. (No EXPLAIN PLAN To Rule Them All). It doesn’t help that SQL allows duplicates. (Why Duplicate Tuples are Prohibited). It doesn’t help that the optimizer’s focus is “optimal” query plans instead of “robust” query plans (query plans that work reasonably well for different values of the bind variables). It doesn’t help that the available methods for specifying and freezing query plans are convoluted, inadequate, only partly documented, and not foolproof. And don’t let me rant about bind variable peeking. (Bind Variable Peeking: Bane or Boon?)
I’ll let blogger Christopher Charles have the last word:
SQL can’t really be optimized. Why not? Because it breaks so many of the fundamental rules for a relational language that it can’t be systematically and unambiguously optimized. One of the rules of relational algebra is that an arbitrary [sequence of operations] can be evaluated in any [semantically equivalent] order and still produce the same result. However, because SQL allows duplicates, this isn’t true. Different orders of evaluation can yield different results. Further complicating this is the syntactic redundancy mentioned earlier – making the task of transforming expressions into a canonical form even more difficult. An optimizer that is unable to always transform equivalent expressions to a single form and that is unable to always arrange those expressions in the most optimal manner is not going to be an optimizer that is always able to produce the best (or even an acceptable) execution plan. (My Brilliant Career Tuning SQL)
The Second Reason: SQL is Used Inappropriately
The second reason why SQL loses is that it is used for purposes for which it is unsuited. In my mind, SQL shines in business data processing (BDP) but does not shine in scientific number crunching (Results of the First International NoCOUG SQL Challenge), messaging, and other areas where better tools exist. Even the pioneer of the relational database movement Dr. Edgar (Ted) Codd (1923–2003) recognized that the relational model was most suited for business data processing. In NORMALIZED DATA BASE STRUCTURE: A BRIEF TUTORIAL he said: “It may be argued that in some applications the problems have an immediate natural formulation in terms of networks. This is true of some applications, such as studies of transportation networks, power-line networks, computer design, and the like. […] The numerous data bases which reflect the daily operations and transactions of commercial and industrial enterprises are, for the most part, concerned with non-network applications.”
The Solution: Jump on the NoSQL Bandwagon
The solution is to use the momentum of the “NoSQL” movement to create a replacement for SQL. We don’t need to abandon relational databases—that would be akin to throwing out the baby with the bathwater—but to use relational databases only in situations where they excel such as business data processing. And SQL should give way to a language that is designed for software developers instead of for “accountants, engineers, architects, and urban planners.” A fitting name for such a new language might well be “NoSQL.” This language would be a rigorous implementation of relational algebra or relational calculus. Because I favor putting more control in the hands of software developers, I would prefer a language based on relational algebra (including complex set operations such as semi-join, anti-join, division, and perhaps even exotic operations such as “star join”) with the provision for the software developer to optionally specify join orders, join algorithms, and any other execution detail. I am a software developer not an accountant or urban planner; I can handle relational algebra.