The Mistakes of the Relational Camp: Mistake #1: The de-emphasis of physical database design


See also :  No! to SQL and No! to NoSQL

The inventor of the relational model, Dr. Edgar “Ted” Codd, believed that the suppression of physical database design details was the chief advantage of the relational model. He made the case in the very first sentence of the very first paper on the relational model, saying “Future users of large data banks must be protected from having to know how the data is organized in the machine (the internal representation).” (“A Relational Model of Data for Large Shared Data Banks,” reprinted with permission in the 100th issue of the NoCOUG Journal.)

How likely is it that application developers will develop highly performant and scalable applications if they are shielded from the internal representation of data? The de-emphasis of physical database design was the biggest mistake of the relational camp and provided the opening for NoSQL and Big Data technologies to proliferate.

A case in point is that SQL, the language universally used by application developers, was not created with them in mind. As explained by the creators of SQL (originally called SEQUEL) in their 1974 paper, there is “a large class of users who, while they are not computer specialists, would be willing to learn to interact with a computer in a reasonably high-level, non-procedural query language. Examples of such users are accountants, engineers, architects, and urban planners [emphasis added]. It is for this class of users that SEQUEL is intended. For this reason, SEQUEL emphasizes simple data structures and operations [emphasis added].” (http://faculty.cs.tamu.edu/yurttas/PL/DBL/docs/sequel-1974.pdf)

If you were the manager of a bookstore, how would you stock the shelves? Would you stand at the door and fling books onto any shelf that had some free space, perhaps recording their locations in a notebook for future reference? Of course not! And would you scatter related books all over the bookstore? Of course not! Then why do we store rows of data in random fashion? The default Oracle table storage structure is the unorganized heap and it is chosen 99.9% of the time.

The de-emphasis of physical database design was an epic failure in the long run. Esther Dyson referred to the “join penalty” when she complained that “Using tables to store objects is like driving your car home and then disassembling it to put it in the garage. It can be assembled again in the morning, but one eventually asks whether this is the most efficient way to park a car.” [1]

It doesn’t have to be that way. Oracle Database has always provided a way to cluster rows of data from one or more tables, using single-table or multi-table clusters in hashed or indexed flavors, and thus to completely avoid the join penalty that Esther Dyson complained about. However, clusters must be the longest and best kept secret of Oracle Database, as suggested by their near-zero adoption rate, and have not been emulated by any other DBMS vendor. You can read more about them at http://iggyfernandez.wordpress.com/2013/07/28/no-to-sql-and-no-to-nosql/.
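
To make the contrast concrete, here is a minimal Python sketch of the idea, not of Oracle's actual storage engine. It is a toy model under stated assumptions: the block capacity, the department and employee rows, and every function name are invented for illustration. It counts how many storage blocks must be read to assemble one department together with its employees, first when rows are stored in arrival order (a heap) and then when rows sharing a key are stored together (a multi-table cluster).

```python
from collections import defaultdict

ROWS_PER_BLOCK = 4  # assumed block capacity, chosen only for illustration
DEPTS = [10, 20, 30, 40]

# Hypothetical rows: (table name, department number, payload).
dept_rows = [("dept", d, f"Department {d}") for d in DEPTS]
# Employees arrive interleaved across departments, as they might over time.
emp_rows = [("emp", d, f"Employee {n} of dept {d}")
            for n in range(1, 4) for d in DEPTS]
arrival_order = dept_rows + emp_rows


def heap_layout(rows):
    """Fill blocks in arrival order, wherever there is free space."""
    return [rows[i:i + ROWS_PER_BLOCK]
            for i in range(0, len(rows), ROWS_PER_BLOCK)]


def clustered_layout(rows):
    """Store rows from both tables that share a department together."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[1]].append(row)
    blocks = []
    for deptno in sorted(groups):
        group = groups[deptno]
        blocks.extend(group[i:i + ROWS_PER_BLOCK]
                      for i in range(0, len(group), ROWS_PER_BLOCK))
    return blocks


def blocks_touched(blocks, deptno):
    """Count the blocks holding at least one row for the given department."""
    return sum(1 for block in blocks
               if any(row[1] == deptno for row in block))


heap_blocks = heap_layout(arrival_order)
cluster_blocks = clustered_layout(arrival_order)

print("heap blocks read for dept 10:   ", blocks_touched(heap_blocks, 10))
print("cluster blocks read for dept 10:", blocks_touched(cluster_blocks, 10))
```

With this made-up data, both layouts use four blocks in total, but assembling department 10 touches all four blocks of the heap and only one block of the clustered layout. Real block sizes, row sizes, and arrival patterns would change the numbers; the sketch only shows why co-locating related rows can reduce the cost of a join.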

It doesn’t have to be that way. But it is.

1. Esther Dyson was the editor of a newsletter called Release 1.0. I’ve been unable to find the statement in the Release 1.0 archives at http://www.sbw.org/release1.0/ so I don’t really know the true source or author of the statement. However, the statement is popularly attributed to Esther Dyson and claimed to have been published in the Release 1.0 newsletter. I found a claim that the statement is found in the September 1988 issue but that didn’t pan out.

  1. September 10, 2013 at 1:01 am

    Iggy, Are you not missing the point of physical and logical models – that is to enable one to be changed without having to change the other as was the case with CODASYL (network) and hierarchical databases before relational databases were conceived?

  2. Iggy Fernandez
    September 10, 2013 at 5:31 pm

    John Seaman :

    Iggy, Are you not missing the point of physical and logical models – that is to enable one to be changed without having to change the other as was the case with CODASYL (network) and hierarchical databases before relational databases were conceived?

    In “Normalized Data Base Structure: A Brief Tutorial” Codd explains that the paramount consideration in hiding the internal representation of data is the convenience of application developers and simplified application development: “The complexity of the physical representations which these systems support is, perhaps, understandable, because these representations are selected in order to obtain high performance in access and update. What is less understandable is the trend toward more and more complexity in the data structures with which application programmers and terminal users directly interact. Surely, in the choice of logical data structures that a system is to support, there is one consideration of absolutely paramount importance [underlining occurs in the original text] – and that is the convenience of the majority of users. [emphasis added] … To make formatted data bases readily accessible to users (especially casual users) who have little or no training in programming we must provide the simplest possible data structures and almost natural language. … What could be a simpler, more universally needed, and more universally understood data structure than a table?”

    In his 1981 Turing Award Lecture “Relational Database: A Practical Foundation for Productivity” he presses the point: “It is well known that the growth in demands from end users for new applications is outstripping the capability of data processing departments to implement the corresponding application programs. There are two complementary approaches to attacking this problem (and both approaches are needed): one is to put end users into direct touch with the information stored in computers; the other is to increase the productivity of data processing professionals in the development of application programs.”

    But how likely is it that application developers will develop highly performant and scalable applications if they are shielded from the internal representation of data? The de-emphasis of physical database design was an epic failure in the long run and provided the opening for NoSQL and Big Data technologies to proliferate. A proof point is the almost-zero adoption rate of Oracle multi-table clusters.

    Physical Database Independence means that “application programs and terminal activities remain logically unimpared [sic] whenever any changes are made in either storage representations or access methods.” (“Is your DBMS really relational?” Computerworld, October 14, 1985) Physical database independence is also very convenient for application developers but also serves to de-emphasize physical database design. As a case in point, application developers avoid proprietary features of database management systems in an effort to be database agnostic.

    • September 11, 2013 at 3:48 am

      But weren’t you arguing recently that there was really no need for the NoSQL approach and isn’t that what Chris Date was arguing in his recent interview in the NoCOUG magazine (No to SQL! No to NOSQL http://iggyfernandez.wordpress.com/2013/07/28/no-to-sql-and-no-to-nosql/)?

      I have worked on many systems using Oracle over the years (including data warehouses, although the databases were maybe quite small compared to Amazon’s or Google’s) and all the performance issues have been down to poor application design rather than physical layout of the data. So maybe NoSQL is more of an edge application than mainstream (although possibly becoming more mainstream). In which case I would argue that there is nothing wrong with the relational model per se.

      Also remember that in the late 70s and early 80s, when Codd came up with his relational theory, computers were much less advanced, disks were much smaller and databases (as I said earlier) were either hierarchical or network and were (apparently) expensive, difficult and slow to change because the physical model was the same as the logical model. I think, therefore, that you’re being a bit harsh on the poor guy (with the benefit of hindsight) :-).
