On the Toad World site, I’m writing a series of blog posts and articles on the subject of EXPLAIN PLAN. I’m using EXPLAIN PLAN as a motif to teach not just SQL tuning but also relational theory, logical database design, and physical database design. In a year’s time, I hope to have enough material for a self-published book.
Part 1—DON’T PANIC: Even experienced application developers may not understand EXPLAIN PLAN output. As the great Renaissance artist Leonardo da Vinci said in his discourse on painting: “Those who are in love with practice without science are like the sailor who gets into a ship without rudder or compass, who is never certain where he is going. Practice must always be built on sound theory … The painter who copies by practice and judgement of eye, without rules, is like a mirror which imitates within itself all the things placed before it without any understanding of them.”
Part 2—A Long Time Ago: When magnetic disk drives first became a reality in the 1960s, a software engineer at General Electric, Charles Bachman, invented the first database management system (DBMS). He conceived of the application developer as a navigator, navigating through records of different types using indexes and pointers. For his tremendous accomplishment, he received the ACM Turing Award, the equivalent of the Nobel Prize in the computer field. We call his system the “network model” because of its emphasis on the pathways between individual records.
Part 3—The Impossible Dream: To make application development easier, an IBM researcher named Edgar Codd suggested that application developers should not have to concern themselves with indexes and pointers. Instead, they should use non-procedural languages and leave the choice of access paths to the DBMS. Dr. Codd used the mathematical term “relation” (table) for a set of records of a single type and called his theory the “relational model.” In contrast, Bachman had never once used the words “table” or “relation.” Dr. Codd also received the ACM Turing Award.
Part 4—Secret Sauce: The secret sauces of the relational model are “relational algebra” and “relational calculus.” Relational algebra is a collection of operations (such as selection, projection, union, difference, and join) that can be used to produce new tables from old, while relational calculus is an English-like non-procedural language for specifying the characteristics of the desired information. A relational calculus expression has to be converted into an equivalent sequence of relational algebra operations (a query execution plan), but this is the job of the DBMS, not the application developer.
- Introduction to Relational Algebra and Relational Calculus: Just as you can combine numbers using the operations of addition, subtraction, multiplication, and division, you can combine tables using operations like “selection,” “projection,” “union,” “difference,” and “join.” However, Codd was of the opinion that “Requesting data by its properties is far more natural than devising a particular algorithm or sequence of operations for its retrieval. Thus, a calculus-oriented language provides a good target language for a more user-oriented source language.” With the exception of the union operation, the original version of SQL was based on relational calculus though, over time, other elements of relational algebra like difference (minus), intersection, and outer join crept in.
- Equivalence of Relational Algebra and Relational Calculus: Dr. Codd showed how to systematically convert a relational calculus expression into an equivalent relational algebra expression. A given collection of relational algebra operations is “complete” if the operations in the collection can be used to translate all relational calculus expressions into equivalent algebra expressions.
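To make the idea of "producing new tables from old" concrete, here is a minimal Python sketch of the five operations named above. It is an illustration only, not how any DBMS implements them: relations are modeled as lists of dicts, the join is a naive nested-loop natural join, and the `emp` and `dept` tables are hypothetical sample data.

```python
def _dedupe(rows):
    """Relations are sets, so drop duplicate rows (first occurrence wins)."""
    seen, out = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out

def select(rel, predicate):
    """Selection: keep the rows that satisfy the predicate."""
    return [row for row in rel if predicate(row)]

def project(rel, columns):
    """Projection: keep only the named columns, then remove duplicates."""
    return _dedupe([{c: row[c] for c in columns} for row in rel])

def union(r, s):
    """Union: rows that appear in either relation."""
    return _dedupe(r + s)

def difference(r, s):
    """Difference: rows of r that do not appear in s."""
    return [row for row in r if row not in s]

def join(r, s):
    """Natural join: pair up rows that agree on all shared column names."""
    out = []
    for a in r:
        for b in s:
            if all(a[c] == b[c] for c in a.keys() & b.keys()):
                out.append({**a, **b})
    return _dedupe(out)

# Hypothetical sample data, invented for this sketch.
emp = [
    {"name": "Ann", "dept": 10},
    {"name": "Bob", "dept": 20},
    {"name": "Cy", "dept": 10},
]
dept = [
    {"dept": 10, "city": "Oakland"},
    {"dept": 20, "city": "Fresno"},
]

# "Names of employees who work in Oakland" as a sequence of algebra operations.
oakland_names = project(
    select(join(emp, dept), lambda row: row["city"] == "Oakland"),
    ["name"],
)
print(oakland_names)
```

The nested call at the end is already a tiny query execution plan: join first, then select, then project. A calculus-style request would only state the desired property (the employee's department is in Oakland) and leave that sequence to the DBMS.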
Part 5—SQL Sucks: In practice, we don’t use relational algebra or relational calculus but an English-like query language called SQL. As with relational calculus expressions, a SQL statement must be converted into an equivalent sequence of relational algebra operations (a query execution plan) by the DBMS. SQL is a heavily redundant language offering multiple ways of posing the same query. Unfortunately, and through no fault of the application developer, semantically equivalent but syntactically different SQL statements typically end up with different execution plans that are not equally efficient.
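The redundancy of SQL is easy to see in a small experiment. The sketch below poses one question three ways (join, IN, and EXISTS) and prints the plan reported for each. It uses SQLite only because it ships with Python; SQLite's optimizer and its EXPLAIN QUERY PLAN output are not Oracle's, so treat this as an analogy rather than a reproduction. The tables and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (dept_id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE emp  (emp_id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER);
    INSERT INTO dept VALUES (10, 'Oakland'), (20, 'Fresno');
    INSERT INTO emp  VALUES (1, 'Ann', 10), (2, 'Bob', 20), (3, 'Cy', 10);
""")

# Three syntactically different ways of asking for employees in Oakland.
QUERIES = {
    "join": """SELECT e.name FROM emp e
               JOIN dept d ON d.dept_id = e.dept_id
               WHERE d.city = 'Oakland'""",
    "in": """SELECT name FROM emp
             WHERE dept_id IN (SELECT dept_id FROM dept WHERE city = 'Oakland')""",
    "exists": """SELECT name FROM emp e
                 WHERE EXISTS (SELECT 1 FROM dept d
                               WHERE d.dept_id = e.dept_id AND d.city = 'Oakland')""",
}

def run(sql):
    """Return the query result as a sorted list of names."""
    return sorted(row[0] for row in conn.execute(sql))

def plan(sql):
    """Return the human-readable detail column of SQLite's plan rows."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

results = {label: run(sql) for label, sql in QUERIES.items()}

for label, sql in QUERIES.items():
    print(label, results[label])
    for step in plan(sql):
        print("   ", step)
```

All three formulations return the same rows. Whether the printed plans match depends on the SQLite version, which is exactly the point: the developer cannot tell from the SQL text alone.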
Part 6—Trees Rule: The DBMS converts your SQL statement into an equivalent sequence of relational algebra operations (the query execution plan). EXPLAIN PLAN output is simply a listing of that query execution plan. The Oracle documentation incorrectly states that “The execution order in EXPLAIN PLAN output begins with the line that is the furthest indented to the right.” In reality, the EXPLAIN PLAN is a “tree” structure.
Part 7—Don’t pre-order your EXPLAIN PLAN: An EXPLAIN PLAN is a “tree” structure corresponding to a relational algebra expression. It is printed in “pre-order” sequence (visit the root of the tree, then traverse each subtree—if any—in pre-order sequence) but should be read in “post-order” sequence (first traverse each subtree—if any—in post-order sequence, then only visit the root of the tree).
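The difference between the two traversals is easiest to see on a small tree. Here is a minimal Python sketch; the `PlanNode` class and the operations in the sample plan are hypothetical and do not come from real EXPLAIN PLAN output.

```python
from dataclasses import dataclass, field

@dataclass
class PlanNode:
    operation: str
    children: list = field(default_factory=list)

def pre_order(node, depth=0):
    """Visit the root first, then each subtree: the order in which a plan is printed."""
    yield depth, node.operation
    for child in node.children:
        yield from pre_order(child, depth + 1)

def post_order(node):
    """Traverse each subtree first, then visit the root: the order in which to read it."""
    for child in node.children:
        yield from post_order(child)
    yield node.operation

# Hypothetical plan: a hash join whose second input is itself a join.
plan = PlanNode("HASH JOIN", [
    PlanNode("TABLE ACCESS FULL T1"),
    PlanNode("NESTED LOOPS", [
        PlanNode("TABLE ACCESS FULL T2"),
        PlanNode("INDEX RANGE SCAN T3_IX"),
    ]),
])

print("Printed (pre-order):")
for depth, operation in pre_order(plan):
    print("  " * depth + operation)

print("Read (post-order):")
for step, operation in enumerate(post_order(plan), start=1):
    print(step, operation)
```

In this tree the post-order reading starts with TABLE ACCESS FULL T1, which sits at depth 1, and not with either of the depth-2 lines. That is why "start with the line that is furthest indented" is not a reliable rule.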
Part 8—Tree Menagerie: There are four varieties of EXPLAIN PLAN trees: deep-left trees, deep-right trees, zigzag trees, and bushy trees. Deep-left trees are very common because the optimizer typically picks a “driving” table and then joins tables to it one by one. Deep-right trees are useful in data warehouses for joining large fact tables to small dimension tables using hash joins. Hash tables are best constructed from the smaller of the inputs, so Oracle will switch the order of the inputs when necessary; this results in zigzag trees. The optimizer does not generally consider bushy trees because they increase the search space beyond its capabilities. However, it is forced to use a bushy tree when faced with an unmergeable view.
Part 9—A Forest Hymn: It is popularly believed that the number of join orderings of N tables is FACTORIAL(N) = N * (N – 1) * (N – 2) * … * 3 * 2 * 1 because FACTORIAL(N) is the number of possible permutations of N objects. FACTORIAL(N) is actually the number of deep-left trees; it omits all the other possibilities. The actual number of join orderings is much larger.
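A quick calculation shows how fast the gap opens up. The sketch below compares FACTORIAL(N) with one standard way of counting every tree shape: the number of binary trees with N labeled leaves, which is N! multiplied by the Catalan number C(N-1) and simplifies to (2N-2)!/(N-1)!. This is my assumption about what "all the other possibilities" covers. It treats "A join B" and "B join A" as different orderings, and the full article may count differently.

```python
from math import factorial

def deep_left_orderings(n):
    """Number of deep-left join trees over n tables: the n! permutations."""
    return factorial(n)

def all_tree_orderings(n):
    """Number of binary join trees of any shape over n tables.

    Assumption: counts ordered binary trees with n labeled leaves,
    n! * Catalan(n - 1) = (2n - 2)! / (n - 1)!, so mirror images are distinct.
    """
    return factorial(2 * (n - 1)) // factorial(n - 1)

for n in range(2, 11):
    print(n, deep_left_orderings(n), all_tree_orderings(n))
```

For four tables this gives 24 deep-left trees but 120 trees in all; the ratio between the two counts is the Catalan number, which grows exponentially with N.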
Part 10—Mystery Tree: EXPLAIN PLAN output can sometimes be very confusing. In the EXPLAIN PLAN output that we obtained for the relational calculus solution of our first teaching example “employees who have worked in all accounting job classifications,” some operations seem to be located in the wrong nodes of the tree. The mystery can be solved by referring to the “predicate information” section of the EXPLAIN PLAN output and inserting additional nodes into the tree.
My paper on NoSQL and Big Data won the Editor’s Choice award at ODTUG Kscope14. Here are some key points from the paper:
- The relational camp made serious mistakes that limited the performance and usefulness of the relational model.
- NoSQL is based on the incorrect premise that tables in the relational model must be mapped to separate buckets of physical storage.
- All the innovations of the NoSQL camp could have been implemented within a relational framework.
Click here to download the paper. Most of it is based on my blog posts.
Over at Toad World:
Part 5: SQL Sucks!
Part 6: Trees Rule
Part 8: Tree Menagerie
Bonus article: Equivalence of Relational Algebra and Relational Calculus
The story so far:
- A relational database is “a database in which: the data is perceived by the user as tables (and nothing but tables) and the operators available to the user for (for example) retrieval are operators that derive ‘new’ tables from ‘old’ ones.” (An Introduction to Database Systems by Chris Date)
- SQL is a non-procedural language; that is, a SQL query specifies what data is needed but does not specify how to obtain it. The “query optimizer” automagically constructs a query execution plan for us.
- The query execution plan can and does change when the inputs (values of bind variables, data distribution statistics, etc.) change. This comes as a great surprise to everybody but that’s how it was always intended to work.
- A huge problem with relational databases is that semantically equivalent statements do not result in the same run-time query execution plan. That’s not how it was ever intended to work.
- The EXPLAIN PLAN documents the query execution plan used by Oracle Database; that is, it documents the sequence of relational algebra operations that Oracle Database uses at run-time to execute any particular SQL query.
- An EXPLAIN PLAN is a “tree” structure corresponding to a relational algebra expression. It is printed in “pre-order” sequence (visit the root of the tree, then traverse each subtree, if any, in pre-order sequence) but should be read in “post-order” sequence (first traverse each subtree, if any, in post-order sequence, and only then visit the root of the tree).
- The Oracle documentation incorrectly states that “The execution order in EXPLAIN PLAN output begins with the line that is the furthest indented to the right.”
On the Toad World site, I’m publishing a whole series of blog posts and articles on the subject of EXPLAIN PLAN. I’m using EXPLAIN PLAN as a central motif to teach not just SQL tuning but relational theory, logical database design, and physical database design. In a year’s time, I hope to have enough material for a self-published book.
I’ve published five blog posts so far:
Part 1: DON’T PANIC
Part 2: A Long Time Ago
Part 4: Secret Sauce
Bonus article: Introduction to Relational Algebra and Relational Calculus
You can register at Toad World if you would like to be notified whenever there’s a new post.
Now in its 28th year, the NoCOUG Journal is the oldest Oracle user group publication in the world. No other small user group in the world has a printed journal. Most large user groups do not have printed journals either. But little NoCOUG does. I am the editor of the NoCOUG Journal and—I must confess—I get sad when I see a discarded copy of the NoCOUG Journal at a NoCOUG conference. But the person who discarded it probably didn’t realize how much it costs to produce the Journal—$15 per copy—and how much volunteer effort goes into each issue. A very special mention goes to Brian Hitchcock, who has written dozens of book reviews for the Journal over a 12-year period.
And the production qualities of the Journal are simply awesome. The Journal is professionally copyedited and proofread by veteran copyeditor Karen Mead of Creative Solutions. Karen polishes phrasing and calls out misused words (such as the noun “reminiscence” instead of the verb “reminisce”). She dots every i, crosses every t, checks every quote, and verifies every URL. Then, the Journal is expertly designed by graphics duo Kenneth Lockerbie and Richard Repas of San Francisco-based Giraffex. And, finally, Jo Dziubek at Andover Printing Services deftly brings the Journal to life on an HP Indigo digital printer. The professional pictures on the front cover are supplied by Photos.com.
The content of the Journal is beyond awesome. But I’ll let you judge that for yourself. Click on the icons below to download the last four issues of the Journal.
FOR IMMEDIATE RELEASE
Cupcake Wars at NoCOUG Spring Conference on May 15 at UCSC Extension Silicon Valley
SILICON VALLEY (APRIL 1, 2014) – In a bold experiment aimed at increasing attendance at its awesome educational conferences, the Northern California Oracle Users Group (NoCOUG) is considering changing the format of its spring conference to that of Food Network’s “Cupcake Wars.”
Distinguished Oracle Product Manager Bryn Llewellyn will lead the PL/SQL team, OraPub founder Craig Shallahamer will lead the DBA team, Hadoop maven Gwen Shapira will lead the Big Data team, and Database Specialists Director of Managed Services Terry Sutton will lead the RAC team. NoCOUG president Hanan Hit will stride from one room to another shouting “TEN MINUTES, BAKERS! YOU HAVE TEN MINUTES LEFT!”
“NoCOUG has been serving the Oracle community for 28 years but our conferences are best known for their awesome educational content. We want our conferences to also be a place where people can come together on a social level,” said NoCOUG president Hanan Hit when asked for comment.
Registration for the spring conference is now open. Click here to view the complete agenda and register.
Also in today’s news:
- Want to make easy money? “Airbrb,” based on the apartment-renting app Airbnb, lets you rent out your office desk while you hang out at the water cooler or take a bio break.
- Convert any website into emoticon characters: Google now lets you emojify the web.
Originally posted on So Many Oracle Manuals, So Little Time:
Day One: Disruptive Innovation
Day Two: Requirements and Assumptions
Day Three: Functional Segmentation
Day Four: Sharding
Day Five: Replication and Eventual Consistency
Day Six: The False Premise of NoSQL
Day Seven: Schemaless Design
Day Eight: Oracle NoSQL Database
Day Nine: NoSQL Taxonomy
Day Ten: Big Data
Day Eleven: Mistakes of the relational camp
Day Twelve: Concluding Remarks
On the twelfth day of Christmas, my true love gave to me
Twelve drummers drumming.
The relational camp put productivity, ease-of-use, and logical elegance front and center. However, the mistakes and misconceptions of the relational camp prevent mainstream database management systems from achieving the performance levels required by modern applications. For example, Dr. Codd forbade nested relations (a.k.a. unnormalized relations) and mainstream database management systems equate the normalized set with the stored set.
The NoSQL camp on the other hand put performance, scalability, and reliability front and center. Understandably the NoSQL…