Take That, Exadata: Fast Index Creation using NOPARALLEL
The development team needed a copy of the production schema and index creation was taking forever. The development server had lots of CPU and memory but IO was sucky. A half-dozen ginormous tables were especially problematic; it took 40 minutes to an hour to create each index with PARALLEL 4 because each of these tables was close to 10 GB in size. To make matters worse, each of these tables had nearly one hundred indexes and we calculated that it would take a month to create all the indexes. We tried to run multiple index creation scripts simultaneously but it saturated the bandwidth and made things even slower. One hundred times 10 GB is one terabyte; a lot of IO for a sucky IO system limited by a sucky network connection.
My first impulse was to reach for an Exadata (picture of Larry Ellison reaching for an Exadata). After all, the surest way to squish an insect is to use a steamroller instead of a slipper. But I didn’t have an Exadata handy. 🙂
My second impulse was to google for help. Why think when you can just google? But the most frequent advice found by Google was to use PARALLEL. Many hours of determined googling later, I found an excellent discussion on Oracle-L about Oracle’s inability to create multiple indexes simultaneously. Obviously, there’s no real need to read a 10 GB table one hundred times in order to create one hundred indexes; why not create multiple indexes in one pass through the table? (Other databases can create multiple indexes in one pass over a table.) David Aldridge had a wonderful suggestion that I wished I had thought of myself: leverage the Oracle cache. This is one case where “direct path reads” and parallelism hurt instead of helping. The trick is to use NOPARALLEL so that data is loaded into shared memory using “db file scattered reads”—instead of being loaded into private memory using direct path reads—and to run multiple CREATE INDEX statements simultaneously. In the words of David Aldridge, you’ll get “1 index creation process performing the physical reads and the other 9 (for example) processes greedily waiting for the blocks to load into the buffer.” That worked wonderfully well for me; here’s the output I saw. For the read events, P1 is the file#, P2 is the block#, and P3 is the number of blocks—starting at block#—that the session is trying to read.
select p1, p2, event, sid, p3, wait_time_micro
from v$session
where program like '%sqlplus%'
order by p1, p2, event, sid;

        P1         P2 EVENT                                 SID         P3 WAIT_TIME_MICRO
---------- ---------- ------------------------------ ---------- ---------- ---------------
        11     733116 db file scattered read               2181          4           45001
        11     733116 read by other session                1088          1           44271
        11     733116 read by other session                1089          1           44390
        11     733116 read by other session                1093          1           44451
        11     733116 read by other session                1094          1           44344
        11     733116 read by other session                1096          1           44573
        11     733116 read by other session                1100          1           44442
        11     733116 read by other session                1102          1           44292
        11     733116 read by other session                2166          1           44733
        11     733116 read by other session                2168          1           44804
        11     733116 read by other session                2176          1           44665
        11     733116 read by other session                2177          1           44751
        11     733116 read by other session                2182          1           45021
        11     733116 read by other session                2185          1           44886
        11     733116 read by other session                2186          1           44956
        11     733116 read by other session                2187          1           44937
1650815232          1 SQL*Net message to client            1078          0               2
Now, blocks read through “db file scattered read” operations are cached at the least recently used end of the Least Recently Used (LRU) list, and only a limited number of these blocks are permitted to exist in the cache at one time in order to prevent a full-table scan from flushing the entire buffer cache (Shallahamer, Craig. “All About Oracle’s Touch Count Data Block Buffer Cache Algorithm.” 05 Jan 2004). Since I had lots of memory and the tables in question could be accommodated in the cache, I used the CACHE option to allow the blocks to be cached using the regular caching algorithm. You may not want to create one hundred indexes simultaneously, so caching the data will help speed up successive runs. Here is a little shell script that reads CREATE INDEX NOPARALLEL statements from a file and runs a configurable number of them simultaneously. (The NOPARALLEL option is the default, so you don’t really need to specify it explicitly.) All the CREATE INDEX statements pertain to a single table, so the CACHE option is enabled at the beginning and disabled at the end. Also note that I manually set the SORT_AREA_SIZE to 1000M for every index creation; you’ll have to set WORKAREA_SIZE_POLICY to MANUAL in order for this to have any effect.
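Condensed, the per-session recipe amounts to the following SQL. This is only a sketch; big_table and the index/column names are placeholders, not objects from the actual schema.

```sql
-- Let full-scan blocks be cached with the normal caching algorithm
ALTER TABLE big_table CACHE;

-- Required for SORT_AREA_SIZE to take effect
ALTER SESSION SET workarea_size_policy = MANUAL;
ALTER SESSION SET sort_area_size = 1048576000;  -- roughly 1000M

-- NOPARALLEL is the default; spelled out here for emphasis.
-- Run several of these in separate sessions at the same time.
CREATE INDEX big_table_ix1 ON big_table (col1) NOPARALLEL;

-- Restore the default once all the index builds are done
ALTER TABLE big_table NOCACHE;
```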
#!/bin/ksh
# This script requires four parameters: Table Name, DDL File, Log File, and Maximum Parallelism

# Parameter settings
tableName=$1
ddlFile=$2
logFile=$3
maxParallelism=$4

# Function enableCaching
# Enable caching during full table scans
enableCaching () {
  sqlplus -s "/ as sysdba" <<EOF
alter table $tableName CACHE;
EOF
}

# Function disableCaching
# Disable caching during full table scans
disableCaching () {
  sqlplus -s "/ as sysdba" <<EOF
alter table $tableName NOCACHE;
EOF
}

# Function ddlLoop
# Loop through the ddlFile
ddlLoop () {
  threadNumber=0
  cat $ddlFile | while read ddlCommand
  do
    ((threadNumber+=1))
    echo $(date) $ddlCommand >>$logFile
    sqlplus -s "/ as sysdba" <<EOF >>$logFile 2>&1 &
set feedback off
alter session set sort_area_size=1048576000;
$ddlCommand
EOF
    if [[ $threadNumber -eq $maxParallelism ]]
    then
      wait # for parallel threads to complete
      threadNumber=0
    fi
  done
  wait # for remaining parallel threads to complete
}

enableCaching
ddlLoop
disableCaching
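The batching logic in the script can be exercised on its own, without an Oracle instance, by substituting a plain echo for the sqlplus call. The DDL strings below are placeholders; the point is the pattern of launching at most maxParallelism background jobs and then waiting for the batch to drain before starting the next one.

```shell
#!/bin/sh
# Stand-alone sketch of the batching pattern used above: start at most
# $maxParallelism background jobs, wait for the batch, repeat.
maxParallelism=2
threadNumber=0
tmpLog=$(mktemp)
for ddlCommand in "create index i1 on t(c1)" "create index i2 on t(c2)" \
                  "create index i3 on t(c3)" "create index i4 on t(c4)" \
                  "create index i5 on t(c5)"
do
    threadNumber=$((threadNumber + 1))
    ( echo "$ddlCommand done" >>"$tmpLog" ) &   # background worker (sqlplus in the real script)
    if [ "$threadNumber" -eq "$maxParallelism" ]
    then
        wait                                    # block until the current batch finishes
        threadNumber=0
    fi
done
wait                                            # catch the final, possibly partial, batch
completed=$(grep -c 'done' "$tmpLog")
rm -f "$tmpLog"
echo "completed $completed statements"          # prints "completed 5 statements"
```

With maxParallelism=2 the five jobs run in batches of 2, 2, and 1; the trailing wait is what prevents the script from exiting before the last partial batch completes.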
Using the above script, I was able to create all the required indexes in 24 hours instead of the projected 24 days. I restricted myself to 16 index creation statements at a time because I didn’t want to choke the server during the writing phase.
Sometimes a slipper is a much better tool than a steamroller to squish an insect.
Take that, Exadata.
posting code in wordpress is a pain, your <<EOF was translated into < …
Thanks, Laurent; I’ve fixed it now. Switching between HTML and Visual editing modes is what sometimes messes up the source code in the sourcecode sections.
Very informative. Thanks for sharing this.
Your solution is OK in case you have a relatively small table (less than the SGA).
Alexandru,
Correct; if the number of indexes is too much for one pass (close to a hundred in the example), then caching the whole table in the SGA is the way to go. However, even if the table does not completely fit in the SGA, there is a huge advantage to be gained from running multiple NOPARALLEL statements simultaneously (16 in the example).
I’m a little late to this discussion, but I too found very little online about doing something such as this. Everything revolves around only using parallel. When migrating a database to 11G on ASM using datapump I was having terrible I/O issues while building indexes. The worst was on a 40G table. Our Netapp array was just maxed out on what it could provide.
What I did was to set filesystemio_options=ASYNCH (instead of SETALL), thus disabling direct I/O. I’m on ASM so I’m not using file system caching. I then set the SGA to 43 GB, the PGA to 10 GB, workarea_size_policy=MANUAL, and sort_area_size=2147483647 (2 GB, which is the maximum when using manual), and altered the table to be cached for full scans.
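The setup described above, condensed into SQL as a sketch: the memory values are the commenter's, big_table is a placeholder, and the exact parameter scopes are assumptions (filesystemio_options is a static parameter and needs a restart).

```sql
-- Instance parameters (restart required for the static ones)
ALTER SYSTEM SET filesystemio_options = ASYNCH SCOPE = SPFILE;  -- async I/O only, no direct I/O
ALTER SYSTEM SET sga_target = 43G SCOPE = SPFILE;
ALTER SYSTEM SET pga_aggregate_target = 10G;

-- Per-session sort settings for the index builds
ALTER SESSION SET workarea_size_policy = MANUAL;
ALTER SESSION SET sort_area_size = 2147483647;  -- 2 GB, the manual-mode maximum

-- Keep the 40 GB table's blocks in the buffer cache on full scans
ALTER TABLE big_table CACHE;
```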
Then I built all my indexes as normal in one sql script, using parallel 4. The first index build on the 40G table took the normal amount of time, but each subsequent index build absolutely flew because they used the buffer cache. Since it was parallel, each parallel process could use up to 2GB of pga memory, so a total of 8GB was available for each index build.
Parallel was important here, as it allowed me to easily give each index build a total of 8GB from the pga. I saved quite a lot of time by avoiding temp. Single threaded I’d have a max of 2GB for pga and I would have gone to temp frequently.
Overall, my index build time went from 1 hour 22 minutes to 17 minutes.
In re-reading my post, it seems to make sense only when I/O is slow, memory is small, and you cannot use PARALLEL.