Archive for the ‘NoCOUG’ Category

Oracle Trivia Quiz

October 29, 2014

All the answers can be found in the November 2014 issue of the NoCOUG Journal.

I am the editor of the NoCOUG Journal. What’s NoCOUG, you ask? Only the oldest and most active Oracle users group in the world. If you live in the San Francisco Bay Area and have never ever attended a NoCOUG conference in the last 28 years (that’s how long NoCOUG has been around), please contact me for a free pass to our upcoming conference at eBay on November 19, featuring presentations such as “Managing the Largest OLTP Database in the World” and “I Didn’t Know that Toad could do That!” Continental breakfast, hot lunch (can you say “Chicken Tikka Masala”?), and never-ending refreshments. Who says there’s no such thing as a free lunch?
  1. Which executive vice-president of product development at Oracle began as the PL/SQL product manager? (page 23)
  2. Which senior vice-president of server technologies at Oracle wrote the B-Tree indexing code back in the day? (page 23)
  3. What is the evil twin of relational algebra? (page 17)
  4. If you have SELECT privilege on a table, can you also “SELECT FOR UPDATE”? (page 19)
  5. Who coined the term “compulsive tuning disorder”? (page 13 footnote)
  6. What percentage of women leave high technology at the mid-career mark? (page 4)
  7. How can you empower application developers with access to live performance data from the production database? (page 6)
  8. What kind of optimization is the root of all evil according to Professor Donald Knuth? (page 10)
  9. Can UNION ALL materialized views be fast-refreshed on commit? (page 18)
  10. What will be the winning lottery numbers for the next Powerball drawing? (Just checking if you’re awake.)

P.S. A 10-year digital archive of the NoCOUG Journal is available at http://nocoug.wordpress.com/nocoug-journal-archive/.

Categories: NoCOUG, Oracle

Results of the NoCOUG SQL Mini-Challenge

October 28, 2014

As published in the November 2014 issue of the NoCOUG Journal

The inventor of the relational model, Dr. Edgar Codd, was of the opinion that “[r]equesting data by its properties is far more natural than devising a particular algorithm or sequence of operations for its retrieval. Thus, a calculus-oriented language provides a good target language for a more user-oriented source language” (“Relational Completeness of Data Base Sublanguages”). Therefore, with the exception of the union operation, the original version of SQL was based on relational calculus, although, over time, other elements of relational algebra like difference (minus), intersection, and outer join crept in.

Testing for existence in a set using subqueries is the hallmark of relational calculus. The following example has nested subqueries. It lists the locations containing a department that contains an employee named Steven King, an employee who holds the title of President, or an employee who has previously held the title of President.

SELECT l.location_id, l.city
FROM locations l
WHERE EXISTS (
  SELECT * FROM departments d, employees e, jobs j
  WHERE d.location_id = l.location_id
  AND e.department_id = d.department_id
  AND j.job_id = e.job_id
  AND (
    (e.first_name = 'Steven' AND e.last_name = 'King')
    OR j.job_title  = 'President'
    OR EXISTS (
      SELECT * FROM job_history jh, jobs j2
      WHERE jh.employee_id = e.employee_id
      AND j2.job_id = jh.job_id
      AND j2.job_title = 'President'
    )
  )
);

The challenge was to rewrite the above query without subqueries and in the most efficient way possible, as measured by the number of consistent gets—the Buffers column in the query execution plan—when executed in the HR sample schema. The competitors published their submissions on the NoCOUG blog nocoug.wordpress.com. Oracle ACE Directors Kyle Hailey and Tim Gorman helped judge the competition.

Backgrounder

The secret sauce of relational database management systems is composed of what Dr. Edgar Codd referred to as “relational calculus” and “relational algebra.” He was a mathematician by training so he instinctively used complex-sounding mathematical terms that normal folk don’t use a lot. Relational calculus has nothing to do with college calculus; a relational calculus expression is simply an English-like non-procedural specification of the user’s query requirements. Relational algebra, on the other hand, is a step-by-step procedural recipe that details how to meet the user’s requirements. The whole reason why there is so much interest in EXPLAIN PLAN is that it documents the sequence of relational algebra operations that Oracle Database used at runtime to execute any particular query; that is, it documents the query execution plan used by Oracle Database.
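
Here is a minimal sketch of how to display that recipe, using the locations table from the HR sample schema (the literal location_id is purely illustrative):

EXPLAIN PLAN FOR
SELECT l.city
FROM locations l
WHERE l.location_id = 1700;

-- Print the plan that the optimizer chose
SELECT * FROM TABLE(dbms_xplan.display);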

More precisely, relational algebra is a collection of operations that can be used to combine tables. Just as you can combine numbers using the operations of addition, subtraction, multiplication, and division, you can combine tables using operations like “restriction,” “projection,” “union,” “difference,” and “Cartesian join.” These five operations are provably sufficient to implement any requirement expressed in relational calculus. They are also primitive; that is, none of them is a combination of the others. Composite relational operations can be devised by combining these five operations; examples include “inner join,” “intersection,” “outer join,” “semi-join,” “anti-join,” and “division.” For example, inner join is obviously a combination of Cartesian join and restriction. Another example is A INTERSECTION B = A MINUS (A MINUS B).
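
The last identity is easy to verify in SQL. Assuming two hypothetical single-column tables a and b, the INTERSECT query below always returns the same rows as the MINUS-only query that follows it:

-- Intersection written with the INTERSECT operator
SELECT x FROM a
INTERSECT
SELECT x FROM b;

-- The same intersection written with MINUS alone
SELECT x FROM a
MINUS
(SELECT x FROM a
 MINUS
 SELECT x FROM b);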

In “Relational Completeness of Data Base Sublanguages” Codd showed how to mechanically convert a relational calculus expression into a relational algebra expression. Refer to http://goo.gl/SHcWLE for a worked example. In the case of the challenge query, a little thought—or a lot of thought—will convince you that the EXISTS clauses can be eliminated by the simple expedient of selecting distinct values of location_id and city from the join of all the tables in the original query, as shown below. Note that the jobs table has to be joined twice because it occurs twice in the challenge query.

SELECT DISTINCT l.location_id, l.city
FROM locations l, departments d, employees e, jobs j, job_history jh, jobs j2
WHERE d.location_id = l.location_id
AND e.department_id = d.department_id
AND j.job_id = e.job_id
AND (
  (e.first_name = 'Steven' AND e.last_name = 'King')
  OR j.job_title = 'President'
  OR (
    jh.employee_id = e.employee_id
    AND j2.job_id = jh.job_id
    AND j2.job_title = 'President'
  )
);

However, such a mechanical conversion from relational calculus to relational algebra may not be your best bet. In general, there is more than one way to write a SQL query, and the query optimizer may choose a different query execution plan for each version. In other words, functionally equivalent SQL queries may not be equally efficient in practice. This is the biggest unsolved problem of the relational era.

Judging Criteria

The criteria used by the judges were correctness and efficiency. Efficiency was measured by the number of consistent gets, that is, the Buffers column in the query execution plan. The GATHER_PLAN_STATISTICS hint instructs Oracle Database to keep track of “rowsource execution statistics,” which can then be printed out by DBMS_XPLAN.DISPLAY_CURSOR. Judging correctness is a much harder task, because no tools are available for determining the equivalence of two queries. In fact, the reason why the query optimizer may choose different query execution plans for two functionally equivalent queries is that it cannot tell if they are equivalent. What is needed is a systematic method of converting a SQL query into a “canonical” version such that all functionally equivalent queries would be converted into the same canonical version. In the absence of automated tools, we have to resort to thinking very hard and writing test cases.
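
As a minimal sketch of such a test case, suppose the original and rewritten queries have been wrapped in hypothetical views query_a and query_b; then the symmetric difference below should return no rows, both before and after a test case updates the underlying tables:

-- rows in query_a but not in query_b
(SELECT location_id, city FROM query_a
 MINUS
 SELECT location_id, city FROM query_b)
UNION ALL
-- rows in query_b but not in query_a
(SELECT location_id, city FROM query_b
 MINUS
 SELECT location_id, city FROM query_a);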

We Have a Winner!

Kim Berg Hansen from Denmark wins the mini-challenge. Not only did his solution survive the test cases written by the judges, but he also reduced the number of consistent gets to the theoretical minimum of 1 by using a UNION ALL materialized view. The first branch of the UNION ALL finds employees named Steven King or who hold the title of President, while the second branch finds employees who previously held the title of President. Materialized views can be indexed just like tables, so Kim created an index consisting only of the location_id and city columns. He also took pains to ensure that the materialized view was fast-refreshable on commit. Therefore, his solution continues to return correct results even after the tables are updated.

CREATE MATERIALIZED VIEW LOG ON employees WITH ROWID, COMMIT SCN
   (employee_id, job_id, first_name, last_name, department_id)
   INCLUDING NEW VALUES;

CREATE MATERIALIZED VIEW LOG ON departments WITH ROWID, COMMIT SCN
   (department_id, location_id)
   INCLUDING NEW VALUES;

CREATE MATERIALIZED VIEW LOG ON locations WITH ROWID, COMMIT SCN
   (location_id, city)
   INCLUDING NEW VALUES;

CREATE MATERIALIZED VIEW LOG ON jobs WITH ROWID, COMMIT SCN
   (job_id, job_title)
   INCLUDING NEW VALUES;

CREATE MATERIALIZED VIEW LOG ON job_history WITH ROWID, COMMIT SCN
   (job_id, employee_id)
   INCLUDING NEW VALUES;

CREATE MATERIALIZED VIEW employees_mv
BUILD IMMEDIATE
REFRESH FAST ON COMMIT
ENABLE QUERY REWRITE
AS
SELECT 'CURRENT' record_type, l.location_id, l.city, j.rowid j_rowid, NULL jh_rowid, e.rowid e_rowid, d.rowid d_rowid, l.rowid l_rowid
FROM locations l, departments d, employees e, jobs j
WHERE (
  d.location_id = l.location_id
  AND e.department_id = d.department_id
  AND j.job_id = e.job_id
  AND (
    (e.first_name = 'Steven' AND e.last_name = 'King')
    OR j.job_title  = 'President'
  )
)
UNION ALL
SELECT 'HISTORY' record_type, l.location_id, l.city, j2.rowid j_rowid, jh.rowid jh_rowid, e.rowid e_rowid, d.rowid d_rowid, l.rowid l_rowid
FROM locations l, departments d, employees e, job_history jh, jobs j2
WHERE (
  d.location_id = l.location_id
  AND e.department_id = d.department_id
  AND (
      jh.employee_id = e.employee_id
      AND j2.job_id = jh.job_id
      AND j2.job_title = 'President'
  )
);

CREATE INDEX employees_mv_ix on employees_mv (
   location_id, city
);

Kim’s solution to the mini-challenge mimics the definition of the materialized view in the hope that “query rewrite” will kick in. Because Oracle B-tree indexes do not include rows in which all indexed columns are null, the redundant clause “WHERE location_id IS NOT NULL” is added to ensure that the query optimizer can use the index.

SELECT /*+ GATHER_PLAN_STATISTICS */ DISTINCT location_id, city
FROM (
  SELECT l.location_id, l.city
  FROM locations l, departments d, employees e, jobs j
  WHERE (
    d.location_id = l.location_id
    AND e.department_id = d.department_id
    AND j.job_id = e.job_id
    AND (
      (e.first_name = 'Steven' AND e.last_name = 'King')
      OR j.job_title  = 'President'
    )
  )
  UNION ALL
  SELECT l.location_id, l.city
  FROM locations l, departments d, employees e,
    job_history jh, jobs j2
  WHERE (
    d.location_id = l.location_id
    AND e.department_id = d.department_id
    AND (
        jh.employee_id = e.employee_id
        AND j2.job_id = jh.job_id
        AND j2.job_title = 'President'
    )
  )
)
WHERE location_id IS NOT NULL;

On checking the query execution plan, we find that the data was indeed obtained from the index on the materialized view. Only one consistent get operation was needed.

SELECT * FROM
TABLE(dbms_xplan.display_cursor(format => 'TYPICAL IOSTATS LAST'));

Plan hash value: 4093995400

--------------------------------------------------------------------------
| Id  | Operation          | Name            | Starts | A-Rows | Buffers |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |                 |      1 |      1 |       1 |
|   1 |  SORT UNIQUE NOSORT|                 |      1 |      1 |       1 |
|*  2 |   INDEX FULL SCAN  | EMPLOYEES_MV_IX |      1 |      1 |       1 |
--------------------------------------------------------------------------

Kim wins a $75 Amazon gift certificate for his most excellent effort. The entries of the other competitors have been published on the NoCOUG blog nocoug.wordpress.com.

Categories: NoCOUG, Oracle, SQL

The people behind the NoCOUG Journal

Now in its 28th year, the NoCOUG Journal is the oldest Oracle user group publication in the world. No other small user group in the world has a printed journal. Most large user groups do not have printed journals either. But little NoCOUG does. I am the editor of the NoCOUG Journal and—I must confess—I get sad when I see a discarded copy of the NoCOUG Journal at a NoCOUG conference. But the person who discarded it probably didn’t realize how much it costs to produce the Journal—$15 per copy—and how much volunteer effort goes into each issue. A very special mention goes to Brian Hitchcock, who has written dozens of book reviews for the Journal over a 12-year period.

And the production qualities of the Journal are simply awesome. The Journal is professionally copyedited and proofread by veteran copyeditor Karen Mead of Creative Solutions. Karen polishes phrasing and calls out misused words (such as the noun “reminiscence” instead of the verb “reminisce”). She dots every i, crosses every t, checks every quote, and verifies every URL. Then, the Journal is expertly designed by graphics duo Kenneth Lockerbie and Richard Repas of San Francisco-based Giraffex. And, finally, Jo Dziubek at Andover Printing Services deftly brings the Journal to life on an HP Indigo digital printer. The professional pictures on the front cover are supplied by Photos.com.

The content of the Journal is beyond awesome. But I’ll let you judge that for yourself. Click on the icons below to download the last four issues of the Journal.

[Icons: Spring 2014, February 2014, November 2013, and August 2013 issues of the NoCOUG Journal]

Categories: DBA, NoCOUG, Oracle

Cupcake Wars at NoCOUG Spring Conference


FOR IMMEDIATE RELEASE

Cupcake Wars at NoCOUG Spring Conference on May 15 at UCSC Extension Silicon Valley

SILICON VALLEY (APRIL 1, 2014) – In a bold experiment aimed at increasing attendance at its awesome educational conferences, the Northern California Oracle Users Group (NoCOUG) is considering changing the format of its spring conference to that of Food Network’s “Cupcake Wars.”

Distinguished Oracle Product Manager Bryn Llewellyn will lead the PL/SQL team, OraPub founder Craig Shallahamer will lead the DBA team, Hadoop maven Gwen Shapira will lead the Big Data team, and Database Specialists Director of Managed Services Terry Sutton will lead the RAC team. NoCOUG president Hanan Hit will stride from one room to another shouting “TEN MINUTES, BAKERS! YOU HAVE TEN MINUTES LEFT!”

“NoCOUG has been serving the Oracle community for 28 years, but our conferences are best known for their awesome educational content. We want our conferences to also be a place where people can come together on a social level,” said NoCOUG president Hanan Hit when asked for comment.

Registration for the spring conference is now open. Click here to view the complete agenda and register.

Also in today’s news:

  • Want to make easy money? “Airbrb,” based on the apartment-renting app Airbnb, lets you rent out your office desk while you hang out at the water cooler or take a bio break.
  • Convert any website into emoticon characters: Google now lets you emojify the web.
  • Oracle 12c Gives Fresh Life to the Relational Database Movement
  • Explosive Revelations about Dr. Edgar (Ted) Codd Rock the NoSQL World
  • Secret of Oracle Database Performance Found in 1897 Sears Roebuck Catalog

The Mistakes of the Relational Camp: Mistake #1: The de-emphasis of physical database design

September 9, 2013

See also: No! to SQL and No! to NoSQL

The inventor of the relational model, Dr. Edgar “Ted” Codd, believed that the suppression of physical database design details was the chief advantage of the relational model. He made the case in the very first sentence of the very first paper on the relational model, saying “Future users of large data banks must be protected from having to know how the data is organized in the machine (the internal representation).” (“A Relational Model of Data for Large Shared Data Banks,” reprinted with permission in the 100th issue of the NoCOUG Journal.)

How likely is it that application developers will develop highly performant and scalable applications if they are shielded from the internal representation of data? The de-emphasis of physical database design was the biggest mistake of the relational camp and provided the opening for NoSQL and Big Data technologies to proliferate.

A case in point is that the language SQL, which is universally used by application developers, was not created with them in mind. As explained by the creators of SQL (originally called SEQUEL) in their 1974 paper, there is “a large class of users who, while they are not computer specialists, would be willing to learn to interact with a computer in a reasonably high-level, non-procedural query language. Examples of such users are accountants, engineers, architects, and urban planners [emphasis added]. It is for this class of users that SEQUEL is intended. For this reason, SEQUEL emphasizes simple data structures and operations [emphasis added].” (http://faculty.cs.tamu.edu/yurttas/PL/DBL/docs/sequel-1974.pdf)

If you were the manager of a bookstore, how would you stock the shelves? Would you stand at the door and fling books onto any shelf that had some free space, perhaps recording their locations in a notebook for future reference? Of course not! And would you scatter related books all over the bookstore? Of course not! Then why do we store rows of data in random fashion? The default Oracle table storage structure is the unorganized heap, and it is chosen 99.9% of the time.
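
To be fair, the heap is not the only choice Oracle offers. Here is a minimal sketch (the bookstore table is hypothetical) of an index-organized table, which stores rows in primary-key order instead of flinging them onto any block with free space:

CREATE TABLE book_inventory
(
  shelf_id INTEGER NOT NULL,
  book_id  INTEGER NOT NULL,
  title    VARCHAR2(64),
  CONSTRAINT book_inventory_pk PRIMARY KEY (shelf_id, book_id)
)
ORGANIZATION INDEX; -- rows are stored in the primary key index itself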

The de-emphasis of physical database design was an epic failure in the long run. Esther Dyson referred to the “join penalty” when she complained that “Using tables to store objects is like driving your car home and then disassembling it to put it in the garage. It can be assembled again in the morning, but one eventually asks whether this is the most efficient way to park a car.” [1]

It doesn’t have to be that way. Oracle Database has always provided a way to cluster rows of data from one or more tables using single-table or multi-table clusters in hashed or indexed flavors and thus to completely avoid the join penalty that Esther Dyson complained about. However, they must be the best-kept secret of Oracle Database—as suggested by their near-zero adoption rate—and they have not been emulated by any other DBMS vendor. You can read more about them at http://iggyfernandez.wordpress.com/2013/07/28/no-to-sql-and-no-to-nosql/.
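
As a taste, here is a minimal sketch of a single-table hash cluster (the order_lines table is hypothetical): all the order lines for a given order hash to the same block and can be retrieved without an index.

-- All rows with the same order_id hash to the same block
CREATE CLUSTER order_lines_cluster (order_id INTEGER)
SIZE 512 SINGLE TABLE HASHKEYS 1000;

CREATE TABLE order_lines
(
  order_id   INTEGER NOT NULL,
  line#      INTEGER NOT NULL,
  product_id INTEGER,
  quantity   NUMBER,
  CONSTRAINT order_lines_pk PRIMARY KEY (order_id, line#)
)
CLUSTER order_lines_cluster (order_id);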

It doesn’t have to be that way. But it is.

1. Esther Dyson was the editor of a newsletter called Release 1.0. I’ve been unable to find the statement in the Release 1.0 archives at http://www.sbw.org/release1.0/ so I don’t really know the true source or author of the statement. However, the statement is popularly attributed to Esther Dyson and claimed to have been published in the Release 1.0 newsletter. I found a claim that the statement is found in the September 1988 issue but that didn’t pan out.

See also: No! to SQL and No! to NoSQL

Solve the Oracle Database murder mystery and win a free ticket for yourself and a friend to the NoCOUG conference

You may remember this children’s song from kindergarten or you can listen to this YouTube video:

“Ten green bottles hanging on the wall
Ten green bottles hanging on the wall
And if one green bottle should accidentally fall
There’ll be nine green bottles hanging on the wall.”

In this Oracle Database murder mystery, there were no green bottles left hanging on the wall after the first bottle fell. Send your solution to contact_nocoug@nocoug.org and receive a free ticket for yourself and a friend to the NoCOUG conference on Thursday, August 15 featuring performance guru Craig Shallahamer, a full track of Oracle Database 12c presentations, and alternative technology presentations on MySQL, NoSQL, and Big Data. Click here to review the detailed agenda.

It was a beautiful spring day. Popcorn was popping on the apricot tree. What does this have to do with databases? Nothing, but I’m trying to write a novel!

As I said, it was a beautiful Wednesday morning in spring. The time was exactly 9:12:00 AM PST. A database user noticed that her favorite database was down and called her favorite DBA—let’s call him Jack—for help.

For the record, I am not related to Jack.

Young Jack jumped to it and restarted the database lickety-split. Then disaster struck! The nine other databases on that database server—a Linux box with NetApp storage—crashed like bowling pins!

An unseen hand restarted all the databases immediately but the damage was done. Jack was dragged to the DBA interrogation chamber—the DBA manager’s office—and made to sit on the hot seat.

It was a sunny day and the sun was streaming in through the plate glass windows, which explains why the seat was so hot. Besides, the air-conditioning was not working that day.

“WHAT HAVE YOU DONE?” bellowed the furious DBA manager. “I was only trying to help,” said poor Jack.

“HELP!? DO YOU CALL THAT HELPING!?” bellowed the furious DBA manager. The database alert logs were examined. The first database log showed that someone had used the command “STARTUP FORCE” at precisely 9:12:00 AM PST.

“DID YOU DO THAT!? DID YOU DO THAT!?” bellowed the furious DBA manager. “Yes, I did that,” said poor Jack, “but I was only trying to help.”

A single tear slowly streamed down young Jack’s cheek.

“HELP!? DO YOU CALL THAT HELPING!?” bellowed the furious DBA manager, unmoved by Jack’s obvious distress. The remaining database alert logs were examined. Each of them showed that someone had used the command “SHUTDOWN IMMEDIATE” followed by “STARTUP” right after the first database was restarted by Jack. “DID YOU DO THAT!? DID YOU DO THAT!?” bellowed the furious DBA manager.

“I didn’t do any of that,” said poor Jack.

If you believe Jack’s protestations of innocence, figure out how and why nine databases were mysteriously stopped and restarted. Send your solution to contact_nocoug@nocoug.org and receive a free ticket for yourself and your friend to the summer conference on Thursday, August 15 at Chevron in San Ramon. RSVP here.

Happy sleuthing!

P.S. For more than 25 years, NoCOUG has helped Oracle professionals like you continuously improve and enhance your skill sets through our conferences and Journal. Our conferences are held on the third Thursday of February (winter conference), May (spring conference), August (summer conference), and November (fall conference) and are filled with practical and cutting-edge content for application developers as well as database administrators. Please help spread the word about NoCOUG by forwarding this message to your friends and colleagues. They can join our email list at http://www.nocoug.org/.

C J Date

This colleague of Dr. Edgar “Ted” Codd was featured in the latest NoCOUG Journal

Categories: Announcements, DBA, Humor, NoCOUG

No! to SQL and No! to NoSQL

July 28, 2013

See also: What’s so sacred about relational anyway?

The opinions of relational theoreticians C. J. Date and Hugh Darwen about SQL are well known, but they have not previously commented on the NoSQL phenomenon, so their interview in the latest issue of the NoCOUG Journal—the official publication of the Northern California Oracle Users Group—makes for interesting reading. In my opinion, their comments are right on the mark even though Date admits that he knows almost nothing about NoSQL products.

In discussing Amazon Dynamo—the forerunner of the NoSQL movement—and those that came after it, Date and Darwen make these astute observations:

“Developers tend to be more concerned with convenience in database definition and updating than with the ease of deriving useful and reliable information from the database.”—Darwen

 “If there’s a suggestion that Amazon’s various disaster scenarios, regarding tornados and the rest, are somehow more of a problem for relational systems than they are for nonrelational ones, then of course I reject that suggestion 100 percent.”—Date

“Those who disparage relational are almost invariably very far from being properly informed and almost invariably equate ‘relational’ with [current implementations].”—Darwen

Dynamo Assumptions and Requirements

Date and Darwen’s remarks are spot on. Here is the Dynamo use case from the 2007 ACM paper by Amazon: “Customers should be able to view and add items to their shopping cart even if disks are failing, network routes are flapping, or data centers are being destroyed by tornados. Therefore, the service responsible for managing shopping carts requires that it can always write to and read from its data store, and that its data needs to be available across multiple data centers. … There are many services on Amazon’s platform that only need primary-key access to a data store. For many services, such as those that provide best seller lists, shopping carts, customer preferences, session management, sales rank, and product catalog, the common pattern of using a relational database would lead to inefficiencies and limit scale and availability. Dynamo provides a simple primary-key only interface to meet the requirements of these applications. … Experience at Amazon has shown that data stores that provide ACID guarantees tend to have poor availability. … Dynamo targets applications that operate with weaker consistency (the “C” in ACID) if this results in high availability.”

To paraphrase, Amazon’s goals were extreme performance, extreme scalability, and extreme availability and it concluded that the only way to achieve its ends was to discard the relational model.

Compartmentation

Amazon started off on the right track. Its first innovation was to break up the traditional monolithic enterprise-wide database service into simpler component services such as the best seller list service, the shopping cart service, the customer preferences service, the sales rank service, and the product catalog service. This avoids a single point of failure.

Amazon’s first DBA, Jeremiah Wilton, explained Amazon’s approach in his answer to the question “Is 24×7 a myth?” in an interview published in the NoCOUG Journal in 2007. He said, “The best availability in the industry comes from application software that is predicated upon a surprising assumption: The databases upon which the software relies will inevitably fail. The better the software’s ability to continue operating in such a situation, the higher the overall service’s availability will be. But isn’t Oracle unbreakable? At the database level, regardless of the measures taken to improve availability, outages will occur from time to time. An outage may be from a required upgrade or a bug. Knowing this, if you engineer application software to handle this eventuality, then a database outage will have less or no impact on end users. In summary, there are many ways to improve a single database’s availability. But the highest availability comes from thoughtful engineering of the entire application architecture.”

As an example, the shopping cart service should not be affected if the checkout service is having hiccups.

There’s no conflict with the relational model there, even if the shopping cart data is not in the same database as the product catalog and occasional inconsistencies result. Occasionally, an item that is present in a shopping cart may go out of stock. Occasionally, an item that is present in a shopping cart may be repriced. The problems can be resolved when the customer decides to check out, if not earlier. As an Amazon customer, I occasionally leave items in my shopping cart but don’t complete a purchase. When I resume shopping, I sometimes get a notification that an item in my shopping cart is no longer in stock or has been repriced.

In any case, current relational implementations do not allow referential integrity constraints to span databases. See, for example, http://docs.oracle.com/cd/E11882_01/server.112/e25494/ds_appdev003.htm.

Extreme Scalability

Focusing next on extreme scalability, Amazon saw that the solution was “sharding” or horizontal partitioning of a hierarchical schema amongst shared-nothing database servers. The simple hierarchical schemas that resulted from compartmentation were very shardable. No conflict with the relational model there.

Extreme Availability

Continuing this line of thought, Amazon saw that the key to extreme availability was data replication. Multiple copies of the shopping cart are allowed to exist and, if one of the replicas becomes unresponsive, the data can be served by one of the other replicas. The technique used by Dynamo has a close parallel in the well-known technique of “multimaster replication.” However, because of network latencies, the copies may occasionally get out of sync and the customer may occasionally encounter a stale version of the shopping cart. Once again, this can be appropriately handled by the application tier; the node that falls behind can catch up eventually or inconsistencies can be detected and resolved at an opportune time such as checkout. This technique is called “eventual consistency.”

The inventor of relational theory, Dr. Edgar “Ted” Codd, himself blessed eventual consistency in the very first paper on relational theory, “A Relational Model of Data for Large Shared Data Banks,” which was reprinted with permission in the 100th issue of the NoCOUG Journal. In that paper, Dr. Codd said:

“There are, of course, several possible ways in which a system can detect inconsistencies and respond to them. In one approach the system checks for possible inconsistency whenever an insertion, deletion, or key update occurs. Naturally, such checking will slow these operations down. [emphasis added] If an inconsistency has been generated, details are logged internally, and if it is not remedied within some reasonable time interval, either the user or someone responsible for the security and integrity of the data is notified. Another approach is to conduct consistency checking as a batch operation once a day or less frequently.”

In other words, the inventor of relational theory did not see a conflict between his relational model and the “eventual consistency” that is one of the hallmarks of the NoSQL products of today. Amazon imagined a conflict because it quite understandably conflated the relational model with the ACID guarantees of database management systems. However, ACID has nothing to do with the relational model per se (though relational theory does come in very handy in defining consistency constraints); pre-relational database management systems such as IMS provided ACID guarantees, and so did post-relational object-oriented database management systems.

The tradeoff between consistency and performance is as important in the wired world of today as it was in Codd’s world. We cannot cast stones at Dynamo for the infraction of not guaranteeing the synchronization of replicated data, because violations of the consistency requirement are commonplace in the relational camp. The replication technique used by Dynamo is similar to the well-known technique of “multimaster replication.” Application developers are warned about the negative impact of integrity constraints. [1] [2] [3] [4] And, most importantly, no DBMS that aspires to the relational moniker currently implements the SQL-92 “CREATE ASSERTION” feature that is necessary in order to provide the consistency guarantee. For a detailed analysis of this anomaly, refer to Toon Koppelaars’s article “CREATE ASSERTION: The Impossible Dream” in the August 2013 issue of the NoCOUG Journal.
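
To make the omission concrete, here is a sketch of what a SQL-92 assertion might look like, borrowing the employees and jobs tables of the HR sample schema (the assertion itself is hypothetical): a declarative constraint spanning multiple rows and tables that no mainstream DBMS, Oracle included, will accept.

-- A hypothetical SQL-92 assertion: at most one employee may hold
-- the title of President. No mainstream DBMS accepts this statement.
CREATE ASSERTION only_one_president CHECK
(
  (SELECT COUNT(*)
   FROM employees e, jobs j
   WHERE e.job_id = j.job_id
   AND j.job_title = 'President') <= 1
);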

Extreme Performance

The final hurdle was extreme performance and that’s where Amazon went astray. Amazon believed that the relational model makes data retrieval and updates inefficient. This objection to the relational model is colorfully summarized by the following statement attributed to Esther Dyson, the editor of the Release 1.0 newsletter, “Using tables to store objects is like driving your car home and then disassembling it to put it in the garage. It can be assembled again in the morning, but one eventually asks whether this is the most efficient way to park a car.” The statement dates back to 1988 and was much quoted when object-oriented databases were in vogue. [5]

Since the shopping cart is an “object,” surely disassembling it for storage makes data retrieval and updates inefficient? The belief stems from an unfounded assumption that has found its way into every relational implementation to date—that a “table” should map to physical storage. In reality, the relational model is a logical model and does not concern itself with storage details at all. It would be perfectly legitimate to store the shopping cart in a physical form that resembled a shopping cart while still offering a relational model of the data complete with SQL. In other words, the physical representation could be optimized for the most important use case—retrieving the entire shopping-cart object using its key—without affecting the relational model of the data. It would also be perfectly legitimate to provide a non-relational API for the important use cases. Dr. Codd himself gave conditional blessing to such non-relational APIs in his 1985 Computerworld article “Is Your DBMS Really Relational?” where he says “If a relational system has a low-level (single-record-at-a-time) language, that low level [should not] be used to subvert or bypass the integrity rules and constraints expressed in the higher level relational language (multiple-records-at-a-time).”

Have I mentioned that Dr. Codd invented relational theory?

Zero-th Normal Form

In fact, the key-value approach used by Dynamo and those that came after it is exactly equivalent to “zero-th” normal form in relational terminology. [6] In his 1970 paper, Codd says: “Nonatomic values can be discussed within the relational framework. Thus, some domains may have relations as elements. These relations may, in turn, be defined on nonsimple domains, and so on. For example, one of the domains on which the relation employee is defined might be salary history. An element of the salary history domain is a binary relation defined on the domain date and the domain salary. The salary history domain is the set of all such binary relations. At any instant of time there are as many instances of the salary history relation in the data bank as there are employees. In contrast, there is only one instance of the employee relation.” In relational terminology, a relation with non-simple domains is said to be in “zero-th” normal form or unnormalized. Codd suggested that unnormalized relations should be normalized for ease of use. Here is the unnormalized employee relation from Codd’s paper.

employee (
  employee#,
  name,
  birthdate,
  jobhistory (jobdate, title, salaryhistory (salarydate, salary)),
  children (childname, birthyear)
)

The above unnormalized relation can be decomposed into four normalized relations as follows.

employee' (employee#, name, birthdate)
jobhistory' (employee#, jobdate, title)
salaryhistory' (employee#, jobdate, salarydate, salary)
children' (employee#, childname, birthyear)

However, this is not to suggest that these normalized relations must necessarily be mapped to individual buckets of physical storage. Codd differentiated between the stored set, the named set, and the expressible set. In the above example, we have five relations but, if we preferred it, the unnormalized employee relation could be the only member of the stored set. Alternatively, if we preferred it, all five relations could be part of the stored set; that is, we could legitimately store redundant representations of the data. However, the common belief blessed by current practice is that the normalized relations should be the only members of the stored set.

Even if the stored set contains only normalized relations, they need not map to different buckets of physical storage. Oracle is unique amongst RDBMS vendors in providing a convenient construct called the “table cluster” which is suitable for hierarchical schemas. In Codd’s example, employee# would be the cluster key, and rows corresponding to the same cluster key from all four tables could be stored in the same physical block on disk. If the cluster was a “hash cluster,” no indexes would be required to retrieve records corresponding to the same cluster key from all four tables.

Conclusion

The conclusion is that Amazon had an opportunity to kick the relational model up a notch but did not rise to the occasion. Amazon could have eaten its cake—extreme performance, extreme scalability, and extreme availability for important use cases such as shopping carts—and had it too—the relational model with all its wonderful declarative power. As Hugh Darwen said in the NoCOUG Journal interview: “Developers tend to be more concerned with convenience in database definition and updating than with the ease of deriving useful and reliable information from the database. … Those who disparage relational are almost invariably very far from being properly informed and almost invariably equate ‘relational’ with [current implementations].” I’ll leave you with that thought.

See also: What’s so sacred about relational anyway?

1. “Using primary and foreign keys can impact performance. Avoid using them when possible.” (http://docs.oracle.com/cd/E17904_01/core.1111/e10108/adapters.htm#BABCCCIH)

2. “For performance reasons, the Oracle BPEL Process Manager, Oracle Mediator, human workflow, Oracle B2B, SOA Infrastructure, and Oracle BPM Suite schemas have no foreign key constraints to enforce integrity.” (http://docs.oracle.com/cd/E23943_01/admin.1111/e10226/soaadmin_partition.htm#CJHCJIJI)

3. “For database independence, applications typically do not store the primary key-foreign key relationships in the database itself; rather, the relationships are enforced in the application.” (http://docs.oracle.com/cd/E25178_01/fusionapps.1111/e14496/securing.htm#CHDDGFHH)

4. “The ETL process commonly verifies that certain constraints are true. For example, it can validate all of the foreign keys in the data coming into the fact table. This means that you can trust it to provide clean data, instead of implementing constraints in the data warehouse.” (http://docs.oracle.com/cd/E24693_01/server.11203/e16579/constra.htm#i1006300)

5. I’ve been unable to find the statement in the Release 1.0 archives at http://www.sbw.org/release1.0/ so I don’t really know the true source or author of the statement. However, the statement is popularly attributed to Esther Dyson and claimed to have been published in the Release 1.0 newsletter. I found a claim that the statement is found in the September 1988 issue but that didn’t pan out.

6. Chris Date is a strong proponent of “relation-valued attributes” (RVAs) and argues that relations with RVAs are as “normal” as those without. See “What First Normal Form Really Means” in Date on Database: Writings 2000-2006 (Apress, 2006).

Appendix: Table Clusters in Oracle Database

Here’s a demonstration of using Oracle table clusters to store records from four tables in the same block and retrieving all the components of the “employee cart” without using indexes. First we create four normalized tables and prove that all the records of a single employee including job history, salary history, and children are stored in a single database block so that there is never any join-penalty when assembling employee data. Then we create an object-relational view that assembles employee information into a single unnormalized structure and show how to insert into this view using an “INSTEAD OF” trigger.

The following demonstration was performed using a pre-built developer VM for Oracle VM VirtualBox. The version of Oracle Database is 11.2.0.2.

SQL*Plus: Release 11.2.0.2.0 Production on Sun Jul 28 19:44:23 2013

Copyright (c) 1982, 2010, Oracle.  All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

First, we create a table cluster and add four tables to the cluster.

SQL> CREATE CLUSTER employees (employee# INTEGER) hashkeys 1000;

Cluster created.

SQL> CREATE TABLE employees
  2  (
  3    employee# INTEGER NOT NULL,
  4    name VARCHAR2(16),
  5    birth_date DATE,
  6    CONSTRAINT employees_pk PRIMARY KEY (employee#)
  7  )
  8  CLUSTER employees (employee#);

Table created.

SQL> CREATE TABLE job_history
  2  (
  3    employee# INTEGER NOT NULL,
  4    job_date DATE NOT NULL,
  5    title VARCHAR2(16),
  6    CONSTRAINT job_history_pk PRIMARY KEY (employee#, job_date),
  7    CONSTRAINT job_history_fk1 FOREIGN KEY (employee#) REFERENCES employees
  8  )
  9  CLUSTER employees (employee#);

Table created.

SQL> CREATE TABLE salary_history
  2  (
  3    employee# INTEGER NOT NULL,
  4    job_date DATE NOT NULL,
  5    salary_date DATE NOT NULL,
  6    salary NUMBER,
  7    CONSTRAINT salary_history_pk PRIMARY KEY (employee#, job_date, salary_date),
  8    CONSTRAINT salary_history_fk1 FOREIGN KEY (employee#) REFERENCES employees,
  9    CONSTRAINT salary_history_fk2 FOREIGN KEY (employee#, job_date) REFERENCES job_history
 10  )
 11  CLUSTER employees (employee#);

Table created.

SQL> CREATE TABLE children
  2  (
  3    employee# INTEGER NOT NULL,
  4    child_name VARCHAR2(16) NOT NULL,
  5    birth_date DATE,
  6    CONSTRAINT children_pk PRIMARY KEY (employee#, child_name),
  7    CONSTRAINT children_fk1 FOREIGN KEY (employee#) REFERENCES employees
  8  )
  9  CLUSTER employees (employee#);

Table created.

Then we insert data into all four tables. We find that all the records have been stored in the same database block even though they belong to different tables. Therefore the join-penalty has been eliminated.

SQL> INSERT INTO employees VALUES (1, 'IGNATIUS', '01-JAN-1970');

1 row created.

SQL> INSERT INTO children VALUES (1, 'INIGA', '01-JAN-2001');

1 row created.

SQL> INSERT INTO children VALUES (1, 'INIGO', '01-JAN-2002');

1 row created.

SQL> INSERT INTO job_history VALUES (1, '01-JAN-1991', 'PROGRAMMER');

1 row created.

SQL> INSERT INTO job_history VALUES (1, '01-JAN-1992', 'DATABASE ADMIN');

1 row created.

SQL> INSERT INTO salary_history VALUES (1, '01-JAN-1991', '1-FEB-1991', 1000);

1 row created.

SQL> INSERT INTO salary_history VALUES (1, '01-JAN-1991', '1-MAR-1991', 1000);

1 row created.

SQL> INSERT INTO salary_history VALUES (1, '01-JAN-1992', '1-FEB-1992', 2000);

1 row created.

SQL> INSERT INTO salary_history VALUES (1, '01-JAN-1992', '1-MAR-1992', 2000);

1 row created.

SQL> SELECT DISTINCT DBMS_ROWID.ROWID_BLOCK_NUMBER(rowid) AS block_number FROM employees where employee# = 1;

BLOCK_NUMBER
------------
       22881

SQL> SELECT DISTINCT DBMS_ROWID.ROWID_BLOCK_NUMBER(rowid) AS block_number FROM children where employee# = 1;

BLOCK_NUMBER
------------
       22881

SQL> SELECT DISTINCT DBMS_ROWID.ROWID_BLOCK_NUMBER(rowid) AS block_number FROM job_history where employee# = 1;

BLOCK_NUMBER
------------
       22881

SQL> SELECT DISTINCT DBMS_ROWID.ROWID_BLOCK_NUMBER(rowid) AS block_number FROM salary_history where employee# = 1;

BLOCK_NUMBER
------------
       22881

Next we create an object-relational view that presents each employee as an object.

SQL> CREATE OR REPLACE TYPE children_rec AS OBJECT (child_name VARCHAR2(16), birth_date DATE)
  2  /

Type created.

SQL> CREATE OR REPLACE TYPE children_tab AS TABLE OF children_rec
  2  /

Type created.

SQL> CREATE OR REPLACE TYPE salary_history_rec AS OBJECT (salary_date DATE, salary NUMBER)
  2  /

Type created.

SQL> CREATE OR REPLACE TYPE salary_history_tab AS TABLE OF salary_history_rec
  2  /

Type created.

SQL> CREATE OR REPLACE TYPE job_history_rec AS OBJECT (job_date DATE, title VARCHAR2(16), salary_history SALARY_HISTORY_TAB)
  2  /

Type created.

SQL> CREATE OR REPLACE TYPE job_history_tab AS TABLE of job_history_rec
  2  /

Type created.

SQL> create or replace view employees_view as
  2  SELECT
  3    employee#,
  4    name,
  5    birth_date,
  6    CAST
  7    (
  8      MULTISET
  9      (
 10        SELECT
 11          child_name,
 12          birth_date
 13        FROM children
 14        WHERE employee#=e.employee#
 15      )
 16      AS children_tab
 17    ) children,
 18    CAST
 19    (
 20      MULTISET
 21      (
 22        SELECT
 23          job_date,
 24          title,
 25          CAST
 26          (
 27            MULTISET
 28            (
 29              SELECT salary_date, salary
 30              FROM salary_history
 31              WHERE employee#=e.employee#
 32              AND job_date=jh.job_date
 33            )
 34            AS salary_history_tab
 35          ) salary_history
 36        FROM job_history jh
 37        WHERE employee#=e.employee#
 38      )
 39      AS job_history_tab
 40    ) job_history
 41  FROM employees e;

View created.

Let’s retrieve one employee object and look at the query execution plan. No indexes are used in retrieving records from each of the four tables. The cost of the plan is just 1. This is the minimum achievable cost, indicating that there is no join-penalty.

SQL> alter session set "_rowsource_execution_statistics"=true;

Session altered.

SQL> SELECT * FROM employees_view WHERE employee# = 1;

 EMPLOYEE# NAME             BIRTH_DAT
---------- ---------------- ---------
CHILDREN(CHILD_NAME, BIRTH_DATE)
------------------------------------------------------------------------------------------------------------------------------------
JOB_HISTORY(JOB_DATE, TITLE, SALARY_HISTORY(SALARY_DATE, SALARY))
------------------------------------------------------------------------------------------------------------------------------------
         1 IGNATIUS         01-JAN-70
CHILDREN_TAB(CHILDREN_REC('INIGA', '01-JAN-01'), CHILDREN_REC('INIGO', '01-JAN-02'))
JOB_HISTORY_TAB(JOB_HISTORY_REC('01-JAN-91', 'PROGRAMMER', SALARY_HISTORY_TAB(SALARY_HISTORY_REC('01-FEB-91', 1000), SALARY_HISTORY_
REC('01-MAR-91', 1000))), JOB_HISTORY_REC('01-JAN-92', 'DATABASE ADMIN', SALARY_HISTORY_TAB(SALARY_HISTORY_REC('01-FEB-92', 2000), S
ALARY_HISTORY_REC('01-MAR-92', 2000))))

SQL> SELECT * FROM TABLE(dbms_xplan.display_cursor(null, null, 'TYPICAL IOSTATS LAST'));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  aaxmaqz947aa0, child number 0
-------------------------------------
SELECT * FROM employees_view WHERE employee# = 1

Plan hash value: 2117652374

--------------------------------------------------------------------------------------------------------------
| Id  | Operation         | Name           | Starts | E-Rows |E-Bytes| Cost  | A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                |      1 |        |       |     1 |      1 |00:00:00.01 |       1 |
|*  1 |  TABLE ACCESS HASH| CHILDREN       |      1 |      1 |    32 |       |      2 |00:00:00.01 |       1 |
|*  2 |  TABLE ACCESS HASH| SALARY_HISTORY |      2 |      1 |    44 |       |      4 |00:00:00.01 |       3 |
|*  3 |  TABLE ACCESS HASH| JOB_HISTORY    |      1 |      1 |    32 |       |      2 |00:00:00.01 |       1 |
|*  4 |  TABLE ACCESS HASH| EMPLOYEES      |      1 |    845 | 27040 |       |      1 |00:00:00.01 |       1 |
--------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("EMPLOYEE#"=:B1)
   2 - access("EMPLOYEE#"=:B1)
       filter("JOB_DATE"=:B1)
   3 - access("EMPLOYEE#"=:B1)
   4 - access("EMPLOYEE#"=1)

Note
-----
   - cpu costing is off (consider enabling it)
   - dynamic sampling used for this statement (level=2)

30 rows selected.

Next, let’s create an “INSTEAD OF” trigger so that we can insert into the view directly; that is, use a single insert statement instead of multiple insert statements. The trigger will do all the heavy lifting for us.

SQL> CREATE OR REPLACE TRIGGER employees_view_insert
  2  INSTEAD OF INSERT ON employees_view
  3  REFERENCING NEW AS n
  4  FOR EACH ROW
  5  DECLARE
  6    i NUMBER;
  7  BEGIN
  8    INSERT INTO employees
  9    VALUES
 10    (
 11      :n.employee#,
 12      :n.name,
 13      :n.birth_date
 14    );
 15
 16    FOR i IN :n.children.FIRST .. :n.children.LAST
 17    LOOP
 18      INSERT INTO children
 19      VALUES
 20      (
 21        :n.employee#,
 22        :n.children(i).child_name,
 23        :n.children(i).birth_date
 24      );
 25    END LOOP;
 26
 27    FOR i IN :n.job_history.FIRST .. :n.job_history.LAST
 28    LOOP
 29      INSERT INTO job_history VALUES
 30      (
 31        :n.employee#,
 32        :n.job_history(i).job_date,
 33        :n.job_history(i).title
 34      );
 35      FOR j IN :n.job_history(i).salary_history.FIRST .. :n.job_history(i).salary_history.LAST
 36      LOOP
 37        INSERT INTO salary_history
 38        VALUES
 39        (
 40          :n.employee#,
 41          :n.job_history(i).job_date,
 42          :n.job_history(i).salary_history(j).salary_date,
 43          :n.job_history(i).salary_history(j).salary
 44         );
 45      END LOOP;
 46    END LOOP;
 47  END;
 48  /

Trigger created.

Finally, let’s insert an employee object directly into the view and confirm that we can read it back.

SQL> INSERT INTO employees_view
  2  VALUES
  3  (
  4    2,
  5    'YGNACIO',
  6    '01-JAN-70',
  7    CHILDREN_TAB
  8    (
  9      CHILDREN_REC('INIGA', '01-JAN-01'),
 10      CHILDREN_REC('INIGO', '01-JAN-02')
 11    ),
 12    JOB_HISTORY_TAB
 13    (
 14      JOB_HISTORY_REC
 15      (
 16        '01-JAN-91',
 17        'PROGRAMMER',
 18        SALARY_HISTORY_TAB
 19        (
 20          SALARY_HISTORY_REC('01-FEB-91', 1000),
 21          SALARY_HISTORY_REC('01-MAR-91', 1000)
 22        )
 23      ),
 24      JOB_HISTORY_REC
 25      (
 26        '01-JAN-92',
 27        'DATABASE ADMIN',
 28        SALARY_HISTORY_TAB
 29        (
 30          SALARY_HISTORY_REC('01-FEB-92', 2000),
 31          SALARY_HISTORY_REC('01-MAR-92', 2000)
 32        )
 33      )
 34    )
 35  );

1 row created.

SQL> SELECT * FROM employees_view WHERE employee# = 2;

 EMPLOYEE# NAME             BIRTH_DAT
---------- ---------------- ---------
CHILDREN(CHILD_NAME, BIRTH_DATE)
------------------------------------------------------------------------------------------------------------------------------------
JOB_HISTORY(JOB_DATE, TITLE, SALARY_HISTORY(SALARY_DATE, SALARY))
------------------------------------------------------------------------------------------------------------------------------------
         2 YGNACIO          01-JAN-70
CHILDREN_TAB(CHILDREN_REC('INIGA', '01-JAN-01'), CHILDREN_REC('INIGO', '01-JAN-02'))
JOB_HISTORY_TAB(JOB_HISTORY_REC('01-JAN-91', 'PROGRAMMER', SALARY_HISTORY_TAB(SALARY_HISTORY_REC('01-FEB-91', 1000), SALARY_HISTORY_
REC('01-MAR-91', 1000))), JOB_HISTORY_REC('01-JAN-92', 'DATABASE ADMIN', SALARY_HISTORY_TAB(SALARY_HISTORY_REC('01-FEB-92', 2000), S
ALARY_HISTORY_REC('01-MAR-92', 2000))))