Dear NoCOUG Members and Friends,
Oracle indexes can be given one, two, or three stars. A three-star index is the best possible index for a SQL query.
- The first star is given if the index entries relevant to the query are next to each other or at least as close to each other as possible. This minimizes the thickness of the index slice that must be scanned.
- The second star is given if the index rows are in the right order for the query. This eliminates sorting.
- The third star is given if the index rows contain all the columns referred to by the query. This eliminates table access: the access path is index-only.
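To make the three stars concrete, consider a hypothetical query that lists one customer's invoices in date order. The sketch below uses illustrative table and index names of my own invention, and uses SQLite (via Python) for brevity; the reasoning carries over to Oracle B-tree indexes. A composite index whose leading column matches the equality predicate, whose next column supplies the sort order, and whose trailing columns cover the select list earns all three stars, and the resulting plan confirms an index-only, sort-free access path:

```python
import sqlite3

# Hypothetical invoice table (illustration only).
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE invoice (
        customer_id  INTEGER NOT NULL,
        invoice_date TEXT    NOT NULL,
        invoice_no   INTEGER NOT NULL,
        amount       REAL    NOT NULL
    )
""")

# Star 1: customer_id leads, so the relevant entries sit next to each other.
# Star 2: invoice_date comes next, so rows emerge already in ORDER BY order.
# Star 3: invoice_no and amount are included, so no table access is needed.
con.execute("""
    CREATE INDEX invoice_3star
        ON invoice (customer_id, invoice_date, invoice_no, amount)
""")

plan = con.execute("""
    EXPLAIN QUERY PLAN
    SELECT invoice_no, amount
      FROM invoice
     WHERE customer_id = 42
     ORDER BY invoice_date
""").fetchall()
for row in plan:
    print(row[3])  # plan detail mentions a covering index; no sort step appears
```

The leading column narrows the scan to a thin index slice (star one), the next column delivers the ORDER BY order without a sort (star two), and the trailing columns make table access unnecessary (star three).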
You can read more about three-star indexes in Relational Database Index Design and the Optimizers by Lahdenmaki and Leach, which is #1 on the top-ten list of Oracle books recommended by Oracle performance expert Jonathan Lewis in his interview “Everybody Loves Jonathan,” printed in the November 2008 issue of the NoCOUG Journal.
Similarly, Oracle conferences can be given one, two, or three stars. A three-star conference is the best possible conference you can imagine.
- The first star is given for great speakers.
- The second star is given for a great location.
- The third star is given for a great occasion.
NoCOUG Conference #100, sponsored by Quest Software—Simplicity at Work, on Wednesday, November 9, is a three-star conference.
- It will feature some of the best speakers in the world, including Oracle ACE Directors Steven Feuerstein, Craig Shallahamer, and Alex Gorbachev.
- It will be held at the greatest location in the world for an Oracle conference—the Computer History Museum in Mountain View. The museum features wonderful computing artifacts including Hollerith’s 1890 Census Tabulator—the product that made IBM famous—and a Babbage Difference Engine—one of only two that have ever been constructed.
- It is a history-making occasion. NoCOUG is celebrating its 25th anniversary in 2011 and the upcoming conference is NoCOUG’s 100th quarterly conference.
Conference 100 will be our biggest, baddest conference ever! Don’t you want to be there? We’ve opened registration early because we expect so many attendees. Please review the agenda and RSVP right away.
Dave Abercrombie, Journal Editor
Eric Hutchinson, Webmaster
Gwen Shapira, Conference Director
Hanan Hit, Vice President
Iggy Fernandez, President
Jen Hong, Board Member at Large
Naren Nagtode, Secretary and Treasurer
Omar Anwar, Vendor Coordinator
Randy Samberg, Training Coordinator
Scott Alexander, Speaker Coordinator
In most people’s vocabularies, design means veneer. It’s interior decorating. It’s the fabric of the curtains and the sofa. But to me, nothing could be further from the meaning of design. Design is the fundamental soul of a man-made creation that ends up expressing itself in successive outer layers of the product or service.—Steve Jobs, “Apple’s One-Dollar-a-Year Man,” Fortune Magazine, 24 Jan 2000.
Physical database design improvements are a cost-effective solution to a performance and scalability challenge.
Or are they?
Economic principles can equally well be used to make the case for faster hardware. In his book, Oracle on VMware, Dr. Bert Scalzo says:
Person hours cost so much more now than computer hardware even with inexpensive offshore outsourcing. It is now considered a sound business decision these days to throw cheap hardware at problems. It is at least, if not more, cost effective than having the staff [sic] tuned and optimized for the same net effect.
Besides, a failed tuning and optimization effort leaves you exactly where you started. At least the hardware upgrade approach results in a faster/better server experiencing the same problem that may still have future value to the business once the fundamental problem is eventually corrected. And, if nothing else, the hardware can be depreciated, whereas the time spent tuning is always just a cost taken off the bottom line. So, with such cheap hardware, it might be a wiser business bet to throw hardware at some solutions sooner than was done in the past. One might go so far as to make an economic principle claim that the opportunity cost of tuning is foregoing cheap upgrades that might fix the issue and also possess intrinsic value. Stated this way, it is a safe bet that is where the business people would vote to spend.
When I interviewed Jonathan Lewis for the November 2008 issue of the NoCOUG Journal, I asked him for a second opinion. He homed in on a key element of Dr. Scalzo’s argument.
I think the key sentence in Dr. Scalzo’s case is “So, with such cheap hardware, it might be a wiser business bet to throw hardware at some solutions sooner than was done in the past.” And in this sentence, the most important word is “sooner.”
Throwing hardware at a problem has always been an option—and sometimes the correct option—but 20 years ago hardware was much more expensive than it is now, took up more space, used more power, and generated more heat so, in the standard analysis of cost/benefit/risk, correcting the bad code was usually seen to be more cost effective than using more hardware to live with bad code.
Nothing has really changed in the analysis—except the pricing. If you know that more hardware will give you the scalability you need for as long as you need; the installation process is fast enough and safe enough; and the alternative is going to take too long, need too much testing, or introduce too much risk, then the hardware is the right choice.
It has to be said, though, that the analysis is not always carried out correctly. Issues of scalability can be deceptive—especially when you are talking about highly concurrent systems—and installing new hardware isn’t always risk-free or fast. The main drawback to the hardware solution (if the analysis isn’t done properly) is that you can still end up having to fix the code, except the fix is now an emergency that appears unexpectedly at some future time.
When I interviewed Dr. Scalzo for the next issue of the Journal, I asked him to clarify his remarks. He focused on the cost of tuning:
No—Oracle is a great database with extensive flexibility. Instead, it’s the cost of tuning (whether for database, operating system, and/or application) that has become somewhat cost ineffective. Let me give an example. A decade ago, tuning the SGA memory allocation and usage was critical—and it still is today. But with dirt-cheap memory, servers that can accommodate tons of memory, and Oracle automatic memory management—just buy more memory. It’s often far cheaper to add 16 GB RAM than pay two days’ expert database consultation fees. Yes, there are some scenarios where simply adding hardware might make a database bottleneck situation worse. But the odds and cost are now such that it’s a worthwhile first step—since you can always tune if that fails. I’m just preaching to try cheap and easy solutions first.
In the next issue of the Journal, I interviewed Karen Morton and discussed the same topic with her. She emphasized the need for diagnosis before attempting any fix:
If you have performance problems that have been proven, through careful analysis, to have their root cause in substandard hardware, then the solution may be to upgrade. The investment made to make sure personnel are adequately trained to properly diagnose and repair performance problems can benefit your company in the long run much more so than taking what appears to be a quick fix by throwing hardware at the problem in the short run.
I think that when personnel are properly trained and knowledgeable, they will not only be able to quickly, efficiently, and cost effectively find and fix problems when they arise, but they will not make the mistakes that cause the problems in the first place.
I believe the truth is that if hardware isn’t your real problem, you’ll eventually have no other choice than to address the actual root cause, because hardware upgrades can only be a Band-aid on the wound for so long. I’ve seen too many people try this approach only to pay hundreds of thousands of dollars for hardware they don’t really even need and still have performance problems. When they finally do address the real root cause and fix it, they often find that they could have bypassed the entire expenditure on hardware and saved more time and money if they’d just done it right in the first place.
To summarize, while physical database design improvements may be the elegant and efficient solution to a performance and scalability challenge, more powerful hardware can sometimes be the simplest, quickest, and most cost-effective solution. For example, GridIron Systems claims that its “TurboCharger” caching solution “boosts Oracle database performance by providing up to 1000x faster access to the data blocks containing database tables and indexes.” However, diligence and diagnosis are necessary because a hardware upgrade may not improve the performance of your task and can sometimes make it worse. As Cary Millsap points out in Why “System” is a Four-Letter Word, “performance of your task gets worse if the ‘improvement’ intensifies competition for the resource that is the bottleneck for your task.”
April Fools’ Day Joke
I started writing my book on April 1, 2011, when enthusiasm for Oracle Exadata Database Machine was at its height. The strengths of Oracle Exadata Database Machine are Smart Scans, Storage Indexes, and Hybrid Columnar Compression, which dramatically reduce the cost of full table scans. However, it was clear to me that powerful hardware did not eliminate the need for physical database design. I also noted that Oracle’s most recent attempt at the TPC-H benchmark did not use Exadata. I wrote the following tongue-in-cheek posting in honor of the day.
SECRET OF ORACLE DATABASE PERFORMANCE FOUND IN 1897 SEARS ROEBUCK CATALOG
Redwood Shores, April 1, 2011 – In between mouthfuls of raw tofu slices delicately dressed with chili oil and shredded seaweed, Oracle CEO Larry Ellison related his game-changing epiphany about database performance.
“My personal ivory buyer was showing me a picture of ivory chopsticks in the 1897 Sears Roebuck catalog when I noticed the following words at the back of the catalog: IF YOU DON’T FIND IT IN THE INDEX, LOOK VERY CAREFULLY THROUGH THE ENTIRE CATALOG.
“That’s when the truth hit me like a ton of bricks. Physical database design is hard. There are few good books about it. Why should our loyal customers spend a lot of money on database consultants when they can spend an equal amount of money on Oracle software and hardware instead? We could tell ’em to stop worrying about indexes, clustering, partitioning, and materialization and sell ’em a big honkin’ piece of hardware that can simply scan entire tables every time.”
Mr. Ellison went on: “It’s not that we haven’t tried the traditional route. Consider Multidimensional Clustering for example. We’ve offered it from day one but the only places it is ever used are the Oracle data dictionary and our own TPC-C benchmarks. It’s time to admit that our customers are right and to give them what they want, not what they need.”
When asked how he came up with the “Smart Scan” moniker for what was formerly called a full table scan, Mr. Ellison was equally candid. “When other companies sell fish, they bill it as dead fish. We call it sushi.”
When told that a certain database maverick named Iggy Fernandez had started writing a book with the ridiculously long title of “Expert Oracle Indexing, Clustering, Partitioning, and Materialization for Performance and Scalability in Oracle Database 11g Release 2” that he hopes to release in time for OpenWorld 2011, Mr. Ellison let out a derisive snort and predicted that Mr. Fernandez was unlikely to get rich from it. When contacted for comment, Mr. Fernandez reluctantly agreed with Mr. Ellison’s prediction and quoted American poet laureate Robert Frost: “Two roads diverged in a wood, and I—I took the one less traveled by.”
I myself have encountered at least one database application that did not use any indexes whatsoever but performed adequately even though every query required full table scans. In fact, a database vendor named ParAccel recently performed the 1 TB TPC-H data warehouse benchmark without any indexes at all; it calls this approach “Load-and-Go.” You may wonder how a database can perform joins—inner, outer, semi, and anti joins—without indexes. Oracle Database has two methods at its disposal—the “hash join” and the “sort merge join.” Please refer to the links for more information.
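For readers wondering how a hash join proceeds without an index, here is a minimal sketch in Python (the data and function are my own illustrations, not Oracle's actual implementation): the smaller input is loaded into an in-memory hash table keyed on the join column, and the larger input is then scanned once, probing the hash table for matches. Neither input needs an index, and each input is read exactly once.

```python
# Minimal sketch of a hash join: build a hash table on the smaller input,
# then probe it once per row of the larger input.
def hash_join(build_rows, probe_rows, build_key, probe_key):
    table = {}
    for row in build_rows:                       # build phase
        table.setdefault(build_key(row), []).append(row)
    result = []
    for row in probe_rows:                       # probe phase
        for match in table.get(probe_key(row), []):
            result.append(match + row)           # inner join: emit matched pairs
    return result

# Hypothetical rows: parts are (part#, part_name);
# commits are (part#, project#, quantity_committed).
parts = [(1, "bolt"), (2, "nut"), (3, "washer")]
commits = [(1, 100, 500), (1, 200, 250), (3, 100, 75)]

joined = hash_join(parts, commits,
                   build_key=lambda p: p[0],
                   probe_key=lambda c: c[0])
print(joined)
```

Part 2 ("nut") appears in no commit row, so it drops out of the result, exactly as an inner join requires; an outer join would emit it padded with nulls, and semi- and anti-joins would test the hash table for presence or absence instead of emitting pairs.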
- A hardware upgrade can sometimes be the simplest, quickest, and most cost-effective solution to a performance and scalability problem.
- Performance of your task gets worse if a hardware upgrade intensifies competition for the resource that is the bottleneck for your task.
- Oracle Database can use the “hash join” and “sort merge join” techniques to perform joins of all kinds without the use of indexes.
- Use Statspack or AWR to investigate whether your database application will benefit from faster I/O.
- Evaluate SSD products from vendors such as Kaminario, Fusion-io, Texas Memory Systems, and GridIron Systems.
Now that my publisher and I have amicably parted ways, I am free to self-publish my work on physical database design. Some explanation is called for. I did not want to rewrite the excellent sections on physical database design that can be found in the free online guides provided by Oracle Corporation; instead I wanted to direct the reader to the online guides for supplemental information. This would have meant that my book would be far shorter, and would take far longer to write, than the contract called for. Now that the publishing rights have reverted to me, I plan to serialize my work here in the form of short lessons with links to the relevant sections of the online Oracle guides for supplemental information. Comments and criticism will always be welcome; they will help me improve the work, which I hope to eventually publish in book form under the title Physical Database Design for Oracle Databases: Indexes, Clusters, Partitions, and Materialized Views.
In the first lesson you will learn the necessity of physical database design and find out what it involves.
The year was … well, let’s just say it was a long time ago. My friends Andrea, Mabel, and Sameer and I had been assigned to work on a big software development project. We were unfamiliar with the technologies that would be used—VMS, ACMS, TDMS, and Rdb—but training was promised. The very first thing the instructor said was “First you have to insert your definitions into the CDD,” and he walked to the chalkboard and wrote the commands that we needed for the purpose. Needless to say, we were quite flustered because we had no idea what those “definitions” might be, what a “CDD” was, or how it fit into the big picture. We were being taught practice before theory.
Those who are in love with practice without knowledge are like the sailor who gets into a ship without rudder or compass and who never can be certain where he is going. Practice must always be founded on sound theory.—The Discourse on Painting by Leonardo da Vinci
You’ll appreciate that I don’t begin by saying “First you have to create your indexes” and giving you the commands that you need for the purpose. Instead I’ll first put physical database design into perspective, explain its purpose, and provide a definition.
Definitions of Logical and Physical Database Design
Database design can be divided into a logical phase (logical database design) and a physical phase (physical database design). Logical database design is the process of designing normalized database tables. (Lightstone et al. 2007) Consider the following example taken from a paper by Edgar Codd, the inventor of relational database theory:
Suppose the data bank contains information about parts and projects. For each part, the part number, part name, part description, quantity-on-hand, and quantity-on-order are recorded. For each project, the project number, project name, and project description are recorded. Whenever a project makes use of a certain part, the quantity of that part committed to the given project is also recorded.
You could conceivably construct the following solution consisting of just one table:
CREATE TABLE part_project_commit (
  -- part information
  part#               NUMBER(6)    NOT NULL,
  part_name           VARCHAR2(16) NOT NULL,
  part_description    VARCHAR2(64) NOT NULL,
  quantity_on_hand    NUMBER(18,6) NOT NULL,
  quantity_on_order   NUMBER(18,6) NOT NULL,
  -- project information
  project#            NUMBER(6)    NOT NULL,
  project_name        VARCHAR2(16) NOT NULL,
  project_description VARCHAR2(64) NOT NULL,
  -- quantity of part committed to project
  quantity_committed  NUMBER(18,6) NOT NULL,
  CONSTRAINT part_project_commit_pk PRIMARY KEY (part#, project#)
);
The above solution is obviously problematic; for example, multiple rows may have to be updated whenever the quantity_on_hand of a part changes. Through a process of data modeling and database normalization, a better solution involving three tables can be constructed as follows:
CREATE TABLE part (
  part#             NUMBER(6)    NOT NULL,
  part_name         VARCHAR2(16) NOT NULL,
  part_description  VARCHAR2(64) NOT NULL,
  quantity_on_hand  NUMBER(18,6) NOT NULL,
  quantity_on_order NUMBER(18,6) NOT NULL,
  CONSTRAINT part_pk PRIMARY KEY (part#)
);

CREATE TABLE project (
  project#            NUMBER(6)    NOT NULL,
  project_name        VARCHAR2(16) NOT NULL,
  project_description VARCHAR2(64) NOT NULL,
  CONSTRAINT project_pk PRIMARY KEY (project#)
);

CREATE TABLE commit (
  part#              NUMBER(6)    NOT NULL,
  project#           NUMBER(6)    NOT NULL,
  quantity_committed NUMBER(18,6) NOT NULL,
  CONSTRAINT commit_pk PRIMARY KEY (part#, project#),
  CONSTRAINT commit_part_fk FOREIGN KEY (part#) REFERENCES part,
  CONSTRAINT commit_proj_fk FOREIGN KEY (project#) REFERENCES project
);
The above CREATE TABLE commands can be used to construct tables in your Oracle database. This concludes the process of logical database design as we defined it. However, the work of database design is not complete; we still have to consider the question of performance. The default organization of an Oracle database table is an unordered heap; in other words, an Oracle database table starts out as an unorganized collection of records. The SQL queries that retrieve data from these tables will need the help of indexes and other mechanisms to perform efficiently. They are transformed by the query optimizer into sequences of database operations such as restriction, projection, inner join, outer join, semi-join, anti-join, aggregation, and sorting. These operations are illustrated in the following examples; some experience with SQL is assumed.
Our first example illustrates the restriction and projection operations. The restriction operation eliminates rows from the target of the query while the projection operation eliminates columns.
-- Illustration of restriction and projection
-- List part names for certain parts
SELECT prt.part#, prt.part_name
FROM part prt
WHERE part# BETWEEN 10000 AND 20000
ORDER BY prt.part#;
Our second example illustrates the inner join operation. It is used to associate rows from two tables.
-- Illustration of inner join
-- List each project and the parts it uses
-- Projects that use no parts are not listed
SELECT prj.project#, prj.project_name, cmt.part#, cmt.quantity_committed
FROM project prj INNER JOIN commit cmt ON (cmt.project# = prj.project#)
ORDER BY prj.project#, cmt.part#;
Our third example illustrates the left outer join operation. Like the inner join operation, it is also used to associate rows from two tables. However, rows from the left table that have no corresponding rows in the right table are also included in the result. The right outer join operation is similar to the left outer join operation; rows from the right table that have no corresponding rows in the left table are included in the result. The full outer join operation includes rows from both the left and right tables even if they have no corresponding rows in the right and left tables respectively.
-- Illustration of left outer join
-- List each part and the projects in which it is used
-- Parts that are not used in any project are also listed
SELECT prt.part#, prt.part_name, cmt.project#, cmt.quantity_committed
FROM part prt LEFT OUTER JOIN commit cmt ON (cmt.part# = prt.part#)
WHERE prt.part# BETWEEN 10000 AND 20000
ORDER BY prt.part#, cmt.project#;
Our fourth example illustrates a check for existence in a correlated subquery. The corresponding database operation is called a semi-join.
-- Illustration of semi-join
-- List parts that are used in at least one project
SELECT prt.part#, prt.part_name, prt.part_description,
       prt.quantity_on_hand, prt.quantity_on_order
FROM part prt
WHERE EXISTS (
  -- Correlated subquery
  SELECT * FROM commit cmt WHERE cmt.part# = prt.part#
)
ORDER BY prt.part#;
Our fifth example illustrates a check for non-existence in a correlated subquery. The corresponding database operation is called an anti-join.
-- Illustration of anti-join
-- List parts that are not used in any projects
SELECT prt.part#, prt.part_name, prt.part_description,
       prt.quantity_on_hand, prt.quantity_on_order
FROM part prt
WHERE NOT EXISTS (
  -- Correlated subquery
  SELECT * FROM commit cmt WHERE cmt.part# = prt.part#
)
ORDER BY prt.part#;
Our sixth example illustrates aggregation.
-- Illustration of aggregation
-- List the total committed quantity for each part
SELECT cmt.part#, SUM(cmt.quantity_committed) AS tot_quantity_committed
FROM commit cmt
GROUP BY cmt.part#
ORDER BY cmt.part#;
Obviously, SQL queries can be much more complicated than the above examples; queries that are thousands of lines long are not unheard of. Queries can include any number of tables and can build upon intermediate queries, as in the following example, which lists projects that use more than 50% of the committed quantity of any part.
-- A complex query
-- List projects which use more than 50% of the committed usage of any part
SELECT cmt.project#, cmt.part#, cmt.quantity_committed,
       cmt.quantity_committed / tot.tot_quantity_committed * 100 AS pct_usage
FROM commit cmt INNER JOIN (
  -- Intermediate query
  -- List the total committed quantity for each part
  SELECT cmt.part#, SUM(cmt.quantity_committed) AS tot_quantity_committed
  FROM commit cmt
  GROUP BY cmt.part#
) tot ON (cmt.part# = tot.part#)
WHERE cmt.quantity_committed / tot.tot_quantity_committed * 100 > 50
ORDER BY cmt.project#, cmt.part#;
Three broad categories of performance mechanisms are available to improve the performance of database operations. Indexes help in restriction, projection, joins of all types, and sorting. Partitions and clusters can be used to co-locate data with similar characteristics in order to limit the amount of physical I/O. Finally, materialized views can be used to perform joins and aggregations ahead of time.
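The materialized view idea can be sketched in miniature as follows (SQLite via Python, with invented table names; SQLite has no materialized views, so the precomputed aggregate is simply stored as an ordinary table, which is in essence what a materialized view is, apart from Oracle's automatic refresh and query rewrite):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE commitment (
        part_no    INTEGER NOT NULL,
        project_no INTEGER NOT NULL,
        qty        REAL    NOT NULL
    );
    INSERT INTO commitment VALUES (1, 100, 500), (1, 200, 250), (3, 100, 75);

    -- "Materialized view": the aggregation is performed ahead of time and its
    -- result stored, so queries read the small summary table instead of
    -- re-aggregating the detail rows on every execution.
    CREATE TABLE part_totals AS
        SELECT part_no, SUM(qty) AS tot_qty
          FROM commitment
         GROUP BY part_no;
""")

rows = con.execute(
    "SELECT part_no, tot_qty FROM part_totals ORDER BY part_no"
).fetchall()
print(rows)  # [(1, 750.0), (3, 75.0)]
```

The trade-off is the classic one of precomputation: queries against the summary are fast because the join or aggregation work has already been done, but the stored result must be kept in step with the detail rows, which Oracle handles through its materialized view refresh mechanisms.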
Physical database design is the process of designing indexes, partitions, clusters, and materialized views. (Lightstone et al. 2007) Its purpose is to improve the efficiency of database operations such as restriction, projection, joining, aggregation, and sorting.
This concludes the first lesson.
- Database design can be divided into a logical phase (logical database design) and a physical phase (physical database design).
- Logical database design is the process of designing normalized database tables. (Lightstone et al. 2007)
- Physical database design is the process of designing indexes, partitions, clusters, and materialized views. (Lightstone et al. 2007)
- The purpose of physical database design is to improve the efficiency of database operations such as restriction, projection, joining, aggregation, and sorting.
- Lightstone, S., T. Teorey, and T. Nadeau. Physical Database Design. Morgan Kaufmann, 2007.
- Oracle Corporation. SQL Developer Data Modeler User’s Guide Release 3.0. 2011. http://download.oracle.com/docs/cd/E18464_01/doc.30/e15802/tut_data_modeling.htm (accessed September 4, 2011).
- Microsoft Support. “Description of the database normalization basics.” http://support.microsoft.com/kb/283878 (accessed September 4, 2011).
- Perform the database design exercise in SQL Developer Data Modeler User’s Guide Release 3.0.
- Discuss whether physical database design is required for small databases that completely fit into memory.
- Discuss whether database normalization affects query performance.
- Create the Part, Project, and Commit tables using the CREATE TABLE statements provided in the lesson. Identify the indexes that were automatically created by Oracle even though we did not explicitly ask for them to be created. Justify why Oracle decides to create these indexes automatically.
- Use the UNION clause instead of the OUTER JOIN clause to list each part and the projects in which it is used, including the parts which are not used in any projects. Use the INNER JOIN clause instead of a correlated subquery to list parts that are used in at least one project. Use the MINUS clause instead of a correlated subquery to list parts that are not used in any project. Use analytic functions instead of the GROUP BY clause to list projects that account for more than 50% of the committed usage of any part. Discuss the performance impacts of rewriting queries and whether the physical database design is influenced by how queries are written.