When to Rebuild Secondary Indexes on Index-Organized Tables?
An “index-organized table” (or clustered index in SQL Server terminology) is just a conflation of a primary key index and a table. The unindexed columns are stored in the leaf blocks of the B-tree index instead of just a ROWID as in the case of regular B-tree indexes on heap tables. The “TABLE ACCESS BY INDEX ROWID” step is thus eliminated because the primary key index is now the table and the table is now the primary key index. The need for a separate table is eliminated and a quick check of DBA_SEGMENTS will show that there is no longer a table segment for the table, just an index segment with the same name as the primary key constraint.
Index-organized tables are important for two reasons. The first reason is that they eliminate the extra I/O operation needed to locate the row in a heap table. But there is a far more important reason: IOTs provide organization. Why is organization so important you ask? We wouldn’t patronize a library or book store which had a great catalog but whose books were randomly scattered throughout the bookstore, would we? Then why do we put so much stock in heap tables? I’ll tell you why. On page 379 of Effective Oracle by Design, Tom Kyte quotes Steve Adams as saying: “If a schema has no IOTs or clusters, that is a good indication that no thought has been given to the matter of optimizing data access.” I agree. In my experience, most applications use little more than heap tables and single-column indexes. Also, the modern trend in data modeling is to use an “artificial” key instead of a natural key as the primary key. However, consider the Customers table in the TPC-C benchmark; it has a very natural key composed of the warehouse ID, the district ID, and the customer ID. Each warehouse is made up of up to 10 districts and each district provides services to a maximum of 3000 customers. An IOT would not only provide great compression of the primary key columns but would also provide valuable organization to the customers table.
There are three reasons why organization is so important. Firstly, storing logically related records in the same data blocks increases the chances that a required block is already in memory. For example, bookstore customers who look at Beginning Oracle Database 11g Administration by yours truly may also look at Expert Oracle Database 11g Administration by my arch-rival, Sam Alapati. Secondly, it often happens that certain subsets of data are more active than others. For example, cookbooks and computer books may sell better than books on paleontology and paleobotany. As another example, bank customers usually check recent bank statements more frequently than old bank statements. And, thirdly, it may happen that different subsets of data are active at different times. For example, customers in the United States need their data when customers in the European Union are fast asleep and vice versa. This allows more caching of active data.
There is an important caveat about creating indexes on an IOT. Since an IOT is really a B-tree index, it is subject to splitting of leaf blocks and branch blocks and, therefore, rows will be forced to migrate from one block to another. Large numbers of rows may be affected by a splitting operation and it would be prohibitively expensive to update all the indexes in real time. Therefore, a secondary index on an IOT stores primary keys in addition to ROWIDs. If a row has migrated and a ROWID lookup can no longer find the row, Oracle can use the primary key as a fallback option. As more and more rows migrate, Oracle finds itself doing more and more additional I/O in order to find a row. Therefore, Oracle maintains a rough estimate of the percentage of index entries that have valid ROWIDs. This estimate is listed as PCT_DIRECT_ACCESS in ALL_INDEXES. Oracle uses PCT_DIRECT_ACCESS to calculate whether to use the primary key directly without bothering to use the ROWID first. The first question is: What is the threshold at which it makes sense to completely ignore the ROWIDs and start using the primary key exclusively?
Here is my back-of-the-envelope calculation:
Let N be the number of records that we expect to retrieve
Let P be the fraction of valid ROWIDs in the secondary index; that is, P = PCT_DIRECT_ACCESS/100.
Let C be the cost of finding a single record using the IOT; that is, C = BLEVEL + 1
The threshold can then be computed using the following equation:
Consistent gets during two-step procedure < Consistent gets during one-step procedure <=> NP + N(1 – P)(1 + C) < NC <=> P + (1 - P)(1 + C) < C <=> P + 1 + C - P - PC < C <=> 1 - PC < 0 <=> PC > 1 <=> P > 1/C <=> PCT_DIRECT_ACCESS > 100/(BLEVEL + 1)
If BLEVEL = 3, then the threshold at which it makes sense to ignore the ROWIDs and use the primary keys exclusively is as low as 25. But you probably wouldn’t want to wait until PCT_DIRECT_ACCESS falls to that level because Oracle needs to perform (1 – P) * C extra consistent get operations for every 100 access operations. If BLEVEL = 3 and PCT_DIRECT_ACCESS = 25, Oracle has to perform 3 extra consistent get operations or 300% more operations than would have been required if all ROWIDs were valid. If BLEVEL = 3 and PCT_DIRECT_ACCESS = 75, Oracle has to perform 1 extra consistent get operation or 100% more operations than would have been required if all ROWIDs were valid.
Let X% be the percentage of extra operations that we are willing to tolerate. The threshold at which we need to update block references can then be computed using the following equation:
(1 - P) * C = X/100 <=> P = 1 - X/(100 * C) <=> PCT_DIRECT_ACCESS = 100 - X/(BLEVEL + 1)
Here is a table showing the threshold for different values of X. Once the threshold is breached, block references can be updated using the UPDATE BLOCK REFERENCES or REBUILD options of the ALTER INDEX command.
Comments are welcome.