Archive

Posts Tagged ‘Big Data’

The Mistakes of the Relational Camp: Mistake #1: The de-emphasis of physical database design

September 9, 2013 3 comments

See also :  No! to SQL and No! to NoSQL

The inventor of the relational model, Dr. Edgar “Ted” Codd believed that the suppression of physical database design details was the chief advantage of the relational model. He made the case in the very first sentence of the very first paper on the relational model saying “Future users of large data banks must be protected from having to know how the data is organized in the machine (the internal representation).” (“A Relational Model of Data for Large Shared Data Banks,” reprinted with permission in the 100th issue of the NoCOUG Journal.)

How likely is it that application developers will develop highly performant and scalable applications if they are shielded from the internal representation of data? The de-emphasis of physical database design was the biggest mistake of the relational camp and provided the opening for NoSQL and Big Data technologies to proliferate.

A case in point is that the language SQL which is universally used by application developers was not created with them in mind. As explained by the creators of SQL (originally called SEQUEL) in their 1974 paper, there is “a large class of users who, while they are not computer specialists, would be willing to learn to interact with a computer in a reasonably high-level, non-procedural query language. Examples of such users are accountants, engineers, architects, and urban planners [emphasis added]. It is for this class of users that SEQUEL is intended. For this reason, SEQUEL emphasizes simple data structures and operations [emphasis added].” (http://faculty.cs.tamu.edu/yurttas/PL/DBL/docs/sequel-1974.pdf)

If you were the manager of a bookstore, how would you stock the shelves? Would you stand at the door and fling books onto any shelf that had some free space, perhaps recording their locations in a notebook for future reference. Of course not! And would you scatter related books all over the bookstore? Of course not! Then why do we store rows of data in random fashion? The default Oracle table storage structure is the unorganized heap and it is chosen 99.9% of the time.

The de-emphasis of physical database design was an epic failure in the long run. Esther Dyson referred to the “join penalty” when she complained that “Using tables to store objects is like driving your car home and then disassembling it to put it in the garage. It can be assembled again in the morning, but one eventually asks whether this is the most efficient way to park a car.” [1]

It doesn’t have to be that way. Oracle Database has always provided a way to cluster rows of data from one or more tables using single-table or multi-table clusters in hashed or indexed flavors and thus to completely avoid the join penalty that Esther Dyson complained about. However, they must be the longest and best kept secret of Oracle Database—as suggested by their near-zero adoption rate—and have not been emulated by any other DBMS vendor. You can read more about them at https://iggyfernandez.wordpress.com/2013/07/28/no-to-sql-and-no-to-nosql/.

It doesn’t have to be that way. But it is.

1. Esther Dyson was the editor of a newsletter called Release 1.0. I’ve been unable to find the statement in the Release 1.0 archives at http://www.sbw.org/release1.0/ so I don’t really know the true source or author of the statement. However, the statement is popularly attributed to Esther Dyson and claimed to have been published in the Release 1.0 newsletter. I found a claim that the statement is found in the September 1988 issue but that didn’t pan out.

See also :  No! to SQL and No! to NoSQL

%d bloggers like this: