Professor Stonebraker’s strong opinions on SQL, NoSQL, NewSQL, and Oracle


As published in the 100th issue of the NoCOUG Journal (November 2011)

Michael Stonebraker has been a pioneer of database research and technology for more than a quarter of a century. He was the main architect of the INGRES relational DBMS, the object-relational DBMS POSTGRES, and the federated data system Mariposa. All three prototypes were developed at the University of California at Berkeley, where Stonebraker was a professor of computer science for 25 years. In 2001, Stonebraker moved to MIT, where he focused on database scalability and challenged the long-held idea that one size fits all. He was instrumental in building Aurora (an early stream-processing engine), C-Store (one of the first column stores), H-Store (a shared-nothing row store for OLTP), and SciDB (a DBMS for scientists). He epitomizes the philosophy of the American essayist and philosopher Ralph Waldo Emerson, who said: “A foolish consistency is the hobgoblin of little minds, adored by little statesmen and philosophers and divines…speak what you think today in words as hard as cannon-balls, and tomorrow speak what tomorrow thinks in hard words again, though it contradict every thing you said to-day.” (http://books.google.com/books?id=RI09AAAAcAAJ&pg=PA30)

INGRES AND POSTGRES—THE BACKSTORY

The Ingres RDBMS was the first open-source software product, wasn’t it? There wasn’t a GNU General Public License at the time, so its code was used to create commercial products. Why was Ingres distributed so freely? Which commercial database management systems owe their beginnings to the Ingres project?

Essentially all of the early RDBMS implementations borrowed from either Ingres or System R. Berkeley/CS has a tradition of open-source projects (Unix 4BSD, Ingres, Postgres, etc.).

The embedded query language used by the Ingres RDBMS was QUEL, not SQL. Like SQL, QUEL was based on relational calculus; unlike SQL, however, it failed to win acceptance in the marketplace. Why did QUEL fail to win acceptance?

QUEL is an obviously better language than SQL. See a long paper by Chris Date from 1985 for all of the reasons why. The only reason SQL won in the marketplace was that IBM released DB2 in 1984 without changing the System R query language. At the time, IBM had sufficient “throw-weight” to ensure that SQL won. If IBM hadn’t released DB2, Ingres Corp. and Oracle Corp. would have traded futures.

Postgres and PostgreSQL succeeded Ingres. Why was a replacement for Ingres necessary?

RDBMSs (at the time) were good at business data processing but not at geographic data, medical data, etc. Postgres was designed to extend database technology into other areas. All of the RDBMS vendors have implemented the Postgres extensibility ideas.

PostgreSQL continues to innovate. The recent 9.1 release offers synchronous replication, K-Nearest Neighbor indexing, and foreign data wrappers, among other goodies. Will PostgreSQL succeed where Ingres failed?

I don’t have any visibility into Postgres futures or prospects, since I am not involved.

STRUCTURED QUERY LANGUAGE

According to the inventors of SQL, Donald Chamberlin and Raymond Boyce, SQL was intended for the use of accountants, engineers, architects, and urban planners who, “while they are not computer specialists, would be willing to learn to interact with a computer in a reasonably high-level, non-procedural query language.” (http://www.joakimdalby.dk/HTM/sequel.pdf) Why didn’t things work out the way Chamberlin and Boyce predicted?

SQL is a language for programmers. That was well known by 1985. Vendors implemented other forms-based notations for non-programmers.

Chris Date quotes you as having said that “SQL is intergalactic data-speak.” (http://archive.computerhistory.org/resources/access/text/Oral_History/102658166.05.01.acc.pdf#page=43) What did you mean?

SQL is intergalactic data-speak; that is, it is the standard way for programmers to talk to databases.

Dr. Edgar Codd said in 1972: “Requesting data by its properties is far more natural than devising a particular algorithm or sequence of operations for its retrieval. Thus a calculus-oriented language provides a good target language for a more user-oriented source language.” (http://www.eecs.berkeley.edu/~christos/classics/Codd72a.pdf) With the benefit of hindsight, should we have rejected user-oriented, calculus-oriented languages in favor of programmer-oriented, algebra-oriented languages with full support for complex operations such as relational division, outer join, semi join, anti join, and star join?

Mere mortals cannot understand division. That doomed the relational algebra. It is interesting to note that science users seem to want algebraic query languages rather than calculus ones. Hence, SciDB supports both.
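For readers who have not met the operator: relational division R(a, b) ÷ S(b) returns each value of a that R pairs with every value of b in S. The minimal Python sketch below assumes hypothetical data (student enrollments and a set of required courses), and the function names are illustrative only. The second function returns the same answer but mirrors the doubly nested NOT EXISTS that SQL typically requires for this kind of query.

```python
from collections import defaultdict


def divide(dividend, divisor):
    """Relational division: dividend(a, b) ÷ divisor(b).

    Returns every value `a` that is paired in `dividend` with *all* values in `divisor`.
    """
    required = set(divisor)
    seen = defaultdict(set)  # a -> set of b values it is paired with
    for a, b in dividend:
        seen[a].add(b)
    # Keep `a` only if its b-values cover every required value.
    return {a for a, bs in seen.items() if required <= bs}


def divide_double_negation(dividend, divisor):
    # Same result, phrased the way SQL usually expresses it:
    # keep `a` if there is NO required `b` for which there is NO (a, b) row.
    pairs = set(dividend)
    candidates = {a for a, _ in pairs}
    return {a for a in candidates if not any((a, b) not in pairs for b in divisor)}


# Hypothetical data: (student, course) enrollments and the required courses.
enrolled = {("ann", "db"), ("ann", "os"), ("ann", "ml"),
            ("bob", "db"), ("bob", "ml"),
            ("cy", "os"), ("cy", "db")}
required = {"db", "os"}

print(sorted(divide(enrolled, required)))                  # ['ann', 'cy']
print(sorted(divide_double_negation(enrolled, required)))  # ['ann', 'cy']
```

Bob is excluded because he has not taken "os". Note that dividing by an empty divisor returns every student, which follows from the definition but often surprises people.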

NO TO STRUCTURED QUERY LANGUAGE?

NoSQL is confusing to many in the relational camp. Is NoSQL a rejection of SQL or of relational database management systems, or both? Or is it just confused?

NoSQL is a collection of 50 or 75 vendors with various objectives. For some of them, the goal is to go fast by rejecting SQL and ACID. I feel these folks are misguided, since SQL is not the performance problem in current RDBMSs. In fact, there is a NewSQL movement that contains very high-performance ACID/SQL implementations.

Other members of the NoSQL movement are focused on document management or semi-structured data—application areas where RDBMSs are known not to work very well. These folks seem to be filling a market not well served by RDBMSs.

You’ve been championing NewSQL as an answer to NoSQL? What exactly is NewSQL?

Current RDBMSs are too slow to meet the needs of some of today’s demanding applications. This drives some users to look for alternatives. NewSQL preserves SQL and ACID but gets much better performance by using a different architecture from the one used by the traditional RDBMS vendors.

Oracle Database did not enforce referential integrity constraints until Version 7. Back then, Berkeley/CS Professor Larry Rowe suggested that the best way for the CODASYL systems to compete against the relational systems was to point out that they did not [yet] support referential integrity. (http://findarticles.com/p/articles/mi_m0SMG/is_n1_v9/ai_7328281/) Can the new entrants in the DBMS marketplace prevail against the established players without enforcing integrity constraints and business rules?

I have seen several applications where the current RDBMS vendors are more than an order of magnitude too slow to meet the user’s needs. In this world, the traditional vendors are nonstarters, and users are looking for something that meets their needs.

The older players in the DBMS marketplace are encumbered by enterprise-grade capabilities that hamper performance. (http://www.think88.com/Examples/Think88_SybaseIQ_wp.pdf) Are enterprise-grade capabilities and performance mutually exclusive?

Everybody should read a great book by Clayton Christensen called The Innovator’s Dilemma. The established vendors are hampered (in my opinion) primarily by legacy code and an unwillingness to delete or change features in their products. As such, their products are 30-year-old technology that is no longer good at anything. The products from the current vendors deserve to be sent to the Home for Tired Software. How to morph from obsolete products to new ones without losing one’s customer base is a challenge, and that is the topic of the book above.

THE CUTTING EDGE

Why do you believe that it is time for a complete rewrite of relational database management systems?

In every market I can think of, the traditional vendors can be beaten by one to two orders of magnitude by something else. In OLTP, it is NewSQL; in data warehouses, it is column stores; in complex analytics, it is array stores; in document management, it is NoSQL. I see a world where there are (perhaps) a half-dozen differently architected DBMSs that are purpose built. In other words, I see the death of one-size-fits-all.

Your latest projects, Vertica and VoltDB, claim to leave legacy database management systems in the dust, yet neither of them has published TPC benchmarks. How relevant are TPC benchmarks today?

It is well understood that the standard benchmarks have been constructed largely by the traditional RDBMS vendors to highlight their products. Also, it is clear that they can make their products go an order of magnitude faster on standard benchmarks than on similar real workloads.

I encourage customers to demand benchmark numbers on their real applications.

A massively parallel, shared-nothing database management system scales very well if the data can be sharded and if each node has all the data it needs. However, if the physical database design does not meet the needs of the application, then broadcasting of data over the network will result in diminished scalability. How can this problem be prevented? (Question from Dave Abercrombie, Principal Database Architect, Convio)

Physical database design will continue to be a big challenge, for the reasons you mention. It is not clear how to get high performance from an application that does not shard well without giving something else up. This will allow application architects to earn the big bucks for the foreseeable future.
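The trade-off in the question can be illustrated with a toy simulation. The Python sketch below assumes a hypothetical four-node cluster, integer keys, and simple modulo partitioning standing in for a real hash function; the orders and customers tables and all function names are invented for illustration. Customers are assumed to be partitioned on customer_id. When orders are also partitioned on customer_id, every order lives on its customer’s node and the join runs locally; when orders are partitioned on order_id instead, many rows must cross the network before the join can run.

```python
from collections import defaultdict

NUM_NODES = 4  # hypothetical cluster size


def node_for(key: int, num_nodes: int = NUM_NODES) -> int:
    # Toy partitioning function: an integer key maps deterministically to one node.
    return key % num_nodes


def partition(rows, key_index, num_nodes=NUM_NODES):
    # Place each row on the node chosen by the value in column `key_index`.
    shards = defaultdict(list)
    for row in rows:
        shards[node_for(row[key_index], num_nodes)].append(row)
    return shards


def rows_shipped_for_join(order_shards, customer_key_index, num_nodes=NUM_NODES):
    # Customers are assumed to be partitioned on customer_id, so an order row can
    # be joined locally only if it already sits on its customer's node.
    # Every other row must be sent over the network (repartitioned) first.
    shipped = 0
    for node, rows in order_shards.items():
        for row in rows:
            if node_for(row[customer_key_index], num_nodes) != node:
                shipped += 1
    return shipped


# Hypothetical data: 1,000 orders (order_id, customer_id) spread over 50 customers.
orders = [(order_id, order_id % 50) for order_id in range(1000)]

co_located = partition(orders, key_index=1)  # sharded on the join key
mismatched = partition(orders, key_index=0)  # sharded on order_id

print(rows_shipped_for_join(co_located, customer_key_index=1))  # 0
print(rows_shipped_for_join(mismatched, customer_key_index=1))  # 500
```

In this toy layout, half of the orders sit on the wrong node when the table is sharded on order_id. Replicating a small table on every node is a common alternative that trades extra storage and update cost for local joins, but a large table can be partitioned on only one key at a time, so an application that joins on several unrelated keys will pay for network traffic somewhere.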

“GO WEST, YOUNG [WOMAN], GO WEST AND GROW UP WITH THE COUNTRY”

You’ve had a ringside seat during the relational era and have spent a lot of time in the ring yourself. What would you have changed if you could go back and start all over again?

I would have made Oracle do serious quality control and not confuse future tense and present tense with regard to product features.

George Orwell imagined that by the year 1984, a device called a “telescreen” would watch you day and night and hear everything you said. (http://books.google.com/books?id=w-rb62wiFAwC&pg=PA7) Substitute “database” for “telescreen” and “government,” “advertisers,” or “criminals” for “thought police,” and Orwell’s vision is not far from today’s reality. Big Data is watching us every minute of the day. Every movement is tracked and recorded by cell towers; every thought is tracked and recorded by search engines; every financial transaction is tracked and recorded by the financial industry; and every text message, email message, and phone conversation is tracked and recorded by service providers. Are databases more evil than good?

A good example is the imminent arrival of sensors in your car, put there by your insurance carrier in exchange for lower rates. Of course, the sensor tracks your every movement, and your privacy is compromised. I expect most customers to voluntarily relinquish their privacy in exchange for lower rates. Cell phones and credit cards are similar; we give up privacy in exchange for some sort of service. I expect that our privacy will be further compromised, off into the future.

As long as we feel this way as a society, privacy will be nonexistent.

What advice do you have for the young IT professional or computer science graduate just starting out today? Which way is west?

The Internet made text search a mainstream task. Ad placement and web mass personalization are doing likewise for machine learning. Databases are getting bigger faster than hardware is getting cheaper. Hence, I expect DBMS technology will continue to enjoy a place in the sun.