Home > DBA, NoSQL, Oracle, SQL > Why SQL Loses and NoSQL Wins

Why SQL Loses and NoSQL Wins


The First Reason: SQL Can’t Really be Optimized

SQL was not originally intended for the use of software developers. It was originally intended for the use of accountants, engineers, architects, and urban planners who, “while they are not computer specialists, would be willing to learn to interact with a computer in a reasonably high-level, non-procedural query language.” This is a direct quote from the 1972 paper that first presented SQL to the world, “SEQUEL: A STRUCTURED ENGLISH QUERY LANGUAGE,” by IBM researchers Donald Chamberlin and Raymond Boyce. Therein lies the first reason why SQL loses: it is a high-level, non-procedural query language designed for the use of those who are not computer specialists. As the 1972 Turing Award winner, Edsger Dijkstra, said: A number of our established “powerful” programming language features, even beloved ones, could very well turn out to belong to the “problem set” than to the “solution set.” (Correctness Concerns, and Among Other Things, Why They Are Resented)

So what’s the big problem with SQL being a high-level, non-procedural query language? Let’s start with two words: query optimization. Database performance expert Wolfgang Breitling complains: “I became interested in the CBO’s selectivity calculations trying to understand why it comes up with some of the ridiculously low cardinality estimates (like 1 when in reality there are 80,000+) which then lead to disastrous access plans that take hours, provided they finish at all, instead of minutes or seconds.” It will require a hefty book to explain why the optimization problem is difficult—you could start with Cost-Based Oracle Fundamentals by Jonathan Lewis—so I won’t trivialize the problem by trying to explain it in a few paragraphs in this position piece.

It doesn’t help that SQL allows us to formulate a query in so many different ways and that that current SQL optimizers cannot resolve all equivalent SQL queries into a single canonical form. (No EXPLAIN PLAN To Rule Them All). It doesn’t help that SQL allows duplicates. (Why Duplicate Tuples are Prohibited). It doesn’t help that the optimizer’s focus is “optimal” query plans instead of “robust” query plans (query plans that work reasonably well for different values of the bind variables). It doesn’t help that the available methods for specifying and freezing query plans are convoluted, inadequate, only partly documented, and not foolproof. And don’t let me rant about bind variable peeking. (Bind Variable Peeking: Bane or Boon?)

I’ll let blogger Christopher Charles have the last word:

SQL can’t really be optimized. Why not? Because it breaks so many of the fundamental rules for a relational language that it can’t be systematically and unambiguously optimized. One of the rules of relational algebra is that an arbitrary [sequence of operations] can be evaluated in any [semantically equivalent] order and still produce the same result. However, because SQL allows duplicates, this isn’t true. Different orders of evaluation can yield different results. Further complicating this is the syntactic redundancy mentioned earlier – making the task of transforming expressions into a canonical form even more difficult. An optimizer that is unable to always transform equivalent expressions to a single form and that is unable to always arrange those expressions in the most optimal manner is not going to be an optimizer that is always able to produce the best (or even an acceptable) execution plan. (My Brilliant Career Tuning SQL)

The Second Reason: SQL is Used Inappropriately

The second reason why SQL loses is that it is used for purposes for which it is unsuited. In my mind, SQL shines in business data processing (BDP) but does not shine in scientific number crunching (Results of the First International NoCOUG SQL Challenge), messaging, and other areas where better tools exist. Even the pioneer of the relational database movement Dr. Edgar (Ted) Codd (1923–2003) recognized that the relational model was most suited for business data processing. In NORMALIZED DATA BASE STRUCTURE: A BRIEF TUTORIAL he said: “It may be argued that in some applications the problems have an immediate natural formulation in terms of networks. This is true of some applications, such as studies of transportation networks, power-line networks, computer design, and the like. […] The numerous data bases which reflect the daily operations and transactions of commercial and industrial enterprises are, for the most part, concerned with non-network applications.”

The Solution: Jump on the NoSQL Bandwagon

The solution is to use the momentum of the “NoSQL” movement to create a replacement for SQL. We don’t need to abandon relational databases—that would be akin to throwing out the baby with the bathwater—but to use relational databases only in situations where they excel such as business data processing. And SQL should give way to a language that is designed for software developers instead of for “accountants, engineers, architects, and urban planners.” A fitting name for such a new language might well be “NoSQL.” This language would be a rigorous implementation of relational algebra or relational calculus. Because I favor putting more control in the hands of software developers, I would prefer a language based on relational algebra (including complex set operations such as semi-join, anti-join, division, and perhaps even exotic operations such as “star join”) with the provision for the software developer to optionally specify join orders, join algorithms, and any other execution detail. I am a software developer not an accountant or urban planner; I can handle relational algebra.

About these ads
Categories: DBA, NoSQL, Oracle, SQL
  1. June 5, 2011 at 5:49 pm

    Iggy,

    I am not familiar with noSQL and therefore I reserve judgment.

    With respect to the “first reason”, your comments and those by the blogger validate my and others’ long-time opinion that the people who designed SQL did not have a sufficient knowledge, understanding and appreciation of the relational model. Had that not been the case, many of the problems mentioned would have not existed. The relational model intended and permits a higher level language without
    as many optimization problems as SQL’s and, in fact, one of the main purposes was to raise the level of the data loanguage for both end-users and programmers (see below).

    Please bear in mind that SQL was essentially a research prototype which was turned into a commercial product without much thought as to the potential problems, of which Codd vehemently warned IBM to no avail.

    It is fair ro say that SQL was targetted at the business domain amenable to predicate logic/set theory. I do not consider myself sufficiently competent to assess whether one data language can be designed that can address both business and science problems, so maybe somebody with better expertise can enlighten US.
    My guess would be yes, but would require serious expertise and very careful design, which neither the academia nor the industry care for these days.

    What I can say is that there are circumstances in which both end users and programmers can use the same properly designed relational DATA language (which is what you suggest) that has also non-data components which programmers
    have access to (and to which, of course, the relational model does not apply, but which must have a smooth interaction with the data component. That was essentially Codd’s concept of a sublanguage, but unfortunately the combination of SQL and existing standard programming languages was not what he had in mind. The industry
    was what it was and he could not do much about it, no matter haow hard he tried.

    Such a “optimal” language could be designed and used under effectively only under one set of circumstances: end-users, programmers, DBAs and product designers would have to undergo a solid and thorough basic education–before they even get into computer science–in logic and mathematics (as well as other subjects) to the degree that they come easy, a non-trivial teaching task. This, of course, would
    serve them well in life beyond databases.

    Unfortunately, to the extent that some of that was done in the past, the industry and academia have moved away from education, substituting it with training (essentially products), which is hardly the same thing. This is a fundamental societal problem not limited to data management or computer science and we can see its consequences in the decline of the West. Unfortunately, what that
    means is that whatever languages and products emerge, they will not be fundamentally sound and will have no less, and probably less problems than SQL.

    Many years ago Chris Date participated in a panel with Stonebraker, who was a prof at Berkeley. Date said “We must give up on the older generation of database professionals and rely on the young graduates who are coming out of academia to know, understand and appreciate the relational model. To which Stonebraker replied “Chris, you don’t know what you’re talking about. The young kids were not around when we had the huge problems with non-relational DBMSs and they are reinventing all of them”. Today there is practically no serious research on or teaching in data fundamentals. Don’t bother them with all that theory, they have practical things to do. Like XML and clouds.

  2. June 5, 2011 at 5:53 pm

    Correction:

    Unfortunately, what that
    means is that whatever languages and products emerge, they will not be fundamentally sound and will have no less, and probably ***MORE*** problems than SQL.

  3. June 5, 2011 at 6:24 pm

    As Fabian said:

    a full circle has now been closed. We now are throwing out past experience and knowledge in the name of expediency and “product certification”.
    Enjoy the consequences. Apparently, they are “cost-effective”.
    Of course: if one never audits the results, they are ALWAYS such…

  4. June 6, 2011 at 5:50 pm

    I disagree. It might have been a noble intention to develop a language that would be usable by accountants…. but SQL that is existant now is far removed from any design intentions of the 1970s.
    To take an example : Oracle SQL has very powerful constructs.
    The issue is that developers of applications (COTS or inhouse) do not spend enough time learning the intricacies. I suspect that they are not given enough time and experience with all the different commands, keywords and combinations possible in an SQL statement. Also, where the developers are keen to “move” to the “new languages” they aren’t going to delve deep enough.

    It is frequently stated that NoSQL / “Hadoop” / “eventually consistent” databases are not really transaction oriented. SQL can still be used for Analytical processing “in a node” and “in a cluster”.

    To add to some of Fabian’s comments — in 2009 I did 2 semesters of a Masters program in Software Engineering. I was utterly flabbergasted when the faculty members stated that Stored Procedures would not be accepted in the Project — ostensibly because they couldn’t understand Stored Procedures, only Java.

  5. Iggy Fernandez
    June 6, 2011 at 6:29 pm

    Hemant K Chitale :

    I disagree. It might have been a noble intention to develop a language that would be usable by accountants…. but SQL that is existant now is far removed from any design intentions of the 1970s.
    To take an example : Oracle SQL has very powerful constructs.

    The crux of my argument is that SQL is not well-suited to machine optimization because of flawed design decisions dating back to the 1970s. It allows duplicates. It has so much syntactic redundancy that reduction to canonical form appears to be an impossible task. It does not directly support relational algebra operations such as semijoin, antijoin, and division. It does not directly support the universal quantifier of relational calculus. It is a weird marriage of relational algebra and relational calculus without supporting either well. A cleanly designed “data base sublanguage” (a term coined by Dr. Codd) should only allow one or the other to be used at a time. SQL should therefore be discarded and replaced with a language that not only is well-suited to machine optimization but also allows the programmer to take control of the execution plan if desired or required.

  6. Antony
    June 7, 2011 at 8:17 am

    I agree that SQl is not efficient for number crunching ability, but SQL has served the purpose it is invented for and still serving.

  7. Enrique Aviles
    June 7, 2011 at 10:06 am

    Reason #2 is not SQL’s fault. That only shows a case where SQL is not the best choice. The same thing can be said about so many other technologies. Use the right tool to solve the problem.

    “use relational databases only in situations where they excel such as business data processing.”

    I’d say that’s where SQL and relational databases are used the most, for business data processing (or at least that’s what it should be used for instead of just treating the DB as a bit bucket and doing all processing in the application layer).

    “I would prefer a language based on relational algebra (including complex set operations such as semi-join, anti-join, and division) with the provision for the software developer to optionally specify join orders, join algorithms, and any other execution detail.”

    That sounds really nice but the average developer has trouble with the current version of SQL. I can only imagine how much more difficult it will be if a language that provides all those features becomes available.

    The optimizer is not perfect and there are cases where a (good) developer is able to come up with a better execution plan but in my experience those are rare events not the norm.

    SQL is not perfect (no language is) and it can be improved but to say it needs to be replaced with something completely different seems a bit extreme.

  8. Iggy Fernandez
    June 7, 2011 at 1:54 pm

    Enrique Aviles :

    That sounds really nice but the average developer has trouble with the current version of SQL. I can only imagine how much more difficult it will be if a language that provides all those features becomes available.

    Thanks for the comments, Enrique. My explanation for why the average developer has so much difficulty with SQL is that the connection between SQL and relational algebra and relational calculus is not strong enough. SQL is a mixture of relational algebra and relational calculus but does not support either very well. The Anti-Join, Semi-Join, and Division operations of relational algebra are missing from SQL and one must simulate them using the existential quantifier of relational calculus (EXISTS). The universal quantifier (FORALL) of relational calculus is also missing and that too must be simulated using the existential quantifier. The usual way to teach SQL is to say “A SQL statement has the form SELECT select-list FROM table-list WHERE condition-list” without explaining that this is simply a convenient shorthand for multiple selection, projection, and join operations of relational algebra. I believe that SQL should be abandoned in favor of a complete and relationally rigorous language based on relational algebra that can be taught rigourously and is more amenable to machine-optimization than the current language. The reason that I prefer a language based on relational algebra instead of relational calculus is that provision could be made for the software developer to take control of the execution plan if desired or required.

  9. joel garry
    June 7, 2011 at 4:28 pm

    I’d say (and I’ve said it for a long time) that SQL is difficult for business application developers because business has not been designed relationally. It is even more inappropriate as a de facto data interchange language.

    The basis of all business going back to cuneiform times is the transaction. All the stuff layered on top of that since then can be difficult to express relationally, as it developed organically. Accounting in particular has developed specifically with redundancies to correct manual errors and disallow collusion. What happens when you computerize accounting systems to make things more “efficient?” Enron and Bernie Madoff and collaterallized debt obligations. Applying optimization algorithms to real world design can be a mistake, you simply don’t want to wind up with the toilet next to your desk (“This time that efficiency analyst has gone too far!”). Forcing design to match machine is just plain backwards.

    So while Fabian is fundamentally correct, that analysis sidesteps the even more fundamental issue. Trust and control. You can’t design a replacement business language without the hardware, software and development infrastructure to support it. Without non-repudiability down to the chip level, we might as well be playing Dungeons and Dragons. And yet to implement such infrastructure would be the doorway to Big Brother.

    So we’re perhaps better off with the status quo, lots of middling programmers making middling systems, muddling through with the rare peek of a bright shining tomorrow, bubbling startups in between catastrophes. nosql may or may not be appropriate for doc stores or social networking, but it is certainly not for business programming.

    There may be a case for a new paradigm for programming, but you have a huge amount of inertia to overcome, starting with education, plus what Noons said.

    There’s no economics to support it.

  10. June 7, 2011 at 9:52 pm

    “I’d say (and I’ve said it for a long time) that SQL is difficult for business application developers because business has not been designed relationally.”
    It is even more inappropriate as a de facto data interchange language.”

    1. Are you aware that what you are saying is that business data is not amenable to logic????????? What would you replace logic with?????

    2. What Iggy and I are saying is that SQL is not really relational. According to your reasoning, it should be then good for business.

    3. Why should a data language be an interchange language? Are the 2 functions the same?

    I am the last person to defend SQL, but it should be rejected for real, not absurd reasons and there are enough of the former.

  11. June 9, 2011 at 10:22 pm

    For a taste of the level of complete ignorance in today’s database management field see the discussion about NoSQL at

    http://news.ycombinator.net/item?id=1847575.

    People who exchange messages about absolutely nothing.

    It validates my guess that the chance of NoSQL being anything of substance is nil and my claim that anything that comes out these days can only be regress, not progress.

  1. July 20, 2012 at 10:40 am

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 772 other followers

%d bloggers like this: