Home > DBA, NoSQL, Oracle, SQL > Do you agree that SQL needs massive improvement?

Do you agree that SQL needs massive improvement?

To find out whether others share my belief that SQL needs massive improvement, I’ve added a poll to my post Why SQL Loses and NoSQL Wins. You can vote below or on the original post; it’s the same poll, so it doesn’t matter where you vote. At the time of writing, it’s a neck-and-neck race between Disagree and Agree.

Here’s a summary of my original post:

The First Reason Why SQL Loses: SQL Can’t Really be Optimized: SQL has so many design flaws that it is not as amenable to machine optimization as a properly designed “data base sublanguage” could be. A number of good reading links are provided in the original post.

The Second Reason Why SQL Loses: SQL is Used Inappropriately: SQL is best suited for business data processing (BDP) and badly suited for messaging applications such as Oracle Streams. Using the SQL interface for row-by-row processing of large amounts of data makes the applications run “slow by slow.”

The Solution: Jump on the NoSQL Bandwagon: Leverage the momentum of the NoSQL movement to create a SQL replacement that is well-suited to machine optimization and allows software developers to control the execution plan if they wish to do so. An appropriate name for this replacement might well be “NoSQL.” SQL will not die easily but perhaps, just perhaps, some upstart little company will challenge the old guard and reinvigorate relational technology.

P.S. I have a lot of opinions but am willing to change them when confronted with fresh facts. As the American philosopher Ralph Waldo Emerson said in his essay on self-reliance: “A foolish consistency is the hobgoblin of little minds, adored by little statesmen and philosophers and divines. Speak what you think now in hard words, and tomorrow speak what tomorrow thinks in hard words again, though it contradict every thing you said today.”

As Mogens Norgaard famously said at the end of one of his essays: “Now prove me wrong.”

Categories: DBA, NoSQL, Oracle, SQL
  1. June 10, 2011 at 12:34 pm

    Interesting points but I feel an like the discussion is marrying two disconnected topics – fixing SQL and noSQL. As far as I can tell these two discussions have nothing to do with each other except for possibly the desire to harness the energy of the noSQL wave to transform SQL. NoSQL on the other hand , AFAIK, is 1) about mining, in parallel, massive amounts of non-relational structured data that have little need for a transactional model, and 2) for key value stores for the young web programmers out there who don’t understand relational databases and don’t have the inclination to learn (lamentable but understandable) These are 2 specific cases in a huge spectrum of data manipulation types.
    SQL handles these areas, though probably not as well as the noSQL options. On the other hand SQL handles a vaste range of data manipulation and data such as relational data and transactional data that these systems are incapable of handling. SQL has issues for sure, as Date points out, for example with the classic duplicate problems and it would be cool if SQL was tightened up in regards to these issues. It would be awesome if some join types were easier to define and clearer in the code in SQL.

    I don’t think giving developers direct access to access plans helps. (you can always run in rule mode in Oracle if you want dependability or run some other rule based optimizer database)

    “Therein lies the first reason why SQL loses: it is a high-level, non-procedural query language designed for the use of those who are not computer specialists.”

    Are you kidding me ?! SQL if anything is considered too complex and so complex that it’s often reserved for a SQL specialist and if there is any lack of specialist horse power then developers resort to minimizing the SQL usage, doing row level interactions, and setting up entity attribute models (EAV). Developers are use to object oriented programming and/or procedural programing which doesn’t naturally mesh with set based functionality
    There are several different programmer pitfalls with developers and then there is a different though not distinct set of SQL issues.
    SQL issues are like
    out join errors
    duplicate row issues
    semi-join issues
    schema errors and or missing schema constraints
    Developer issues are
    single row interactions for multi-row and full table operations
    joining tables in application code
    filtering tables in application code
    throwing the whole relational model out and using entity attribute value caches

    SQL with its automatic handling of the internals of tuning and still SQL intimidates people. Are we saying that by allowing the user to tune their own queries that we will produce a better language? I don’t think so. I can just see it – the database request will fail to run unless we tell the database how to access the data, in what order and with what join methods. Hmmmm.

    I think the only way things will work better is by taking care of the issues for the users/programmers. Identifying flawed data access patterns and flagging them or adding bandaids – query caching, prefetching etc.
    Programmers don’t understand how to use SQL and I do think that points out a ***flaw in SQL***, but by putting more power into the users hands I think the problem will become worse, since the issues that affect SQL tuning multiply exponentially with the complexity of the statement. If anything we should be looking for ways to simplify SQL , patch the wholes and make it the optimal methods of using SQL more obvious.

    Twenty plus years ago when I started using SQL I hated it exactly because I didn’t know what it was doing – the join access and types were not defined in the language and I wanted to control them. That was naive at the time, but it’s beyond naive now. At the time there was a lot less SQL in the world and a lot less complexity in the SQL that existed. Now complexity is rampant. It’s not feasible to ask developers to optimize queries. Query plans of course should change depending on predicates. How would developer handle this?
    Sure Oracle’s optimizer screws things up and always will because there will always be intractible SQL being written but rather than bash the optimizer which handles a huge range of issues, I’d look for ways to make it better. We can determine if a SQL is intractable before running it. I want to see ways of analyzing SQL and pointing out user flaws not just blindly grinding any possible SQL through the database.. . My problem is (beyond the SQL language itself ) is once the optimizer screws up, it’s pretty dang hard to figure out why or what to do about. Thus I spent much of the last two years addressing this through visualizations.

    I’m a strong advocate of visualizing quantitative data, and thus worked on softwqare visualization of SQL tuning – VST (the structure, relationships, data sizes and organization of queries , ***not the execution plan ***). The VST is the map – the execution plan is the path taken on the map. The current VST diagrams are a n awesome start but could be improved a lot. Even with the visualizations of SQL, it’s can still be a daunting task but much easier than with just the SQL text itself. Unfortunately I’ve had to move onto other projects

  2. June 10, 2011 at 2:05 pm

    It’s worth musing upon how integrated relational databases are in the computer world. Talking to two different banks recently they had 4500 and 6000 Oracle databases respectively. Relational database here to stay, at least for long long time. On the other hand there are areas where they fail. Besides non-relational structured data mining and key value stores relational database also fail at general indexing like google, or splunk etc. When it comes to getting general, close enough information relational databases fail miserably.
    Now we can try and force a relational database to do generalized close enough searches but why try? We can force database to load and crunch massive amounts of non-relational structure data by why when there are proven alternatives.
    Now the key value store issue is trickier. There might be instances where such storage is appropriate but if the data is going to be joined or constrained then those key value stores start to break down rapidly. Even indexing becomes a problem.

  3. Iggy Fernandez
    June 12, 2011 at 7:18 am

    Kyle Hailey :

    Interesting points but I feel like the discussion is marrying two disconnected topics – fixing SQL and noSQL. As far as I can tell these two discussions have nothing to do with each other except for possibly the desire to harness the energy of the noSQL wave to transform SQL.

    Correct. I propose that the relational community harness the energy of the NoSQL movement to replace SQL with something that is both easier for application developers to learn as well as easier for the optimizer to optimize. NoSQL would be an appropriate name for the replacement. 🙂

    Kyle Hailey :

    I don’t think giving developers direct access to access plans helps.

    I believe that predictability trumps the search for optimality because optimality is such a difficult goal for so many reasons that require a book (such as Cost Based Optimizer Fundamentals by Jonathan Lewis) to explain. Optimization technology still has many miles to go. As a simple example, current optimizers are unable to determine when UNION has the same effect as UNION ALL or when SELECT DISTINCT has the same effect as SELECT. Since we cannot currently guarantee that the optimizer will find the optimal or a nearly optimal execution strategy, why not allow the developers to control the execution strategy if required or desired? Interestingly, we willingly believe the claims that the optimizer is capable of finding optimal or nearly optimal execution plans for complex queries even though there are no studies that prove it. 🙂

    Kyle Hailey :

    “Therein lies the first reason why SQL loses: it is a high-level, non-procedural query language designed for the use of those who are not computer specialists.”

    Are you kidding me ?! SQL if anything is considered too complex and so complex that it’s often reserved for a SQL specialist

    I believe that the reason SQL is considered complex is that the learning and teaching of it are distanced from relational algebra and relational calculus. Leonardo da Vinci said: “Those who are in love with practice without [theoretical] knowledge are like the sailor who gets into a ship without rudder or compass and who can never be certain [where] he is going. Practice must always be founded on sound theory.” (http://books.google.com/books?id=M_MVAAAAYAAJ&pg=PA18)

    However, SQL is a convoluted mixture of both relational algebra and relational calculus without supporting either well. This makes it especially hard to understand, hard to teach, and, even worse, hard to optimize. I would like to see a replacement based on relational algebra with support for complex operations such as Semi Join, Anti Join (Semi Difference), Division, and even Star Join. Support for complex operations would nullify the advantage of relational calculus over relational algebra to which Codd alluded when he said: “The relational calculus permits a user to request the data he desires by its properties. This is an ideal starting point for search optimization. The algebra, on the other hand, requires the user to formulate a sequence of algebraic operations that will generate the desired data from the data base relations. For queries other than very simple ones, the properties of the desired data tend to get hidden in the particular operation sequence (one of many possible ones) which the user selects. Therefore, starting from an algebraic source language, one has the choice of locally optimizing the execution of each operation (a very limited form of optimization) or tackling the difficult problem of analyzing sequences of such operations to discover the intended defining properties of the desired data.” (http://www.cs.berkeley.edu/~christos/classics/Codd72a.pdf#page=35)

    Therefore, I wasn’t being facetious when I suggested that SQL’s original goal of being a high-level, non-procedural query language for the use of those who are not computer specialists is what is holding relational databases back. It seems to me that SQL has become the elephant in the room (http://en.wikipedia.org/wiki/Elephant_in_the_room). As Djikstra said: A number of our established “powerful” programming language features, even beloved ones, could very well turn out to belong to the “problem set” than to the “solution set.” (Correctness Concerns, and Among Other Things, Why They Are Resented http://www.cs.utexas.edu/users/EWD/transcriptions/EWD04xx/EWD450.html).

  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: