Day 8: The Twelve Days of SQL: Statistics are a double-edged sword
On the eighth day of Christmas, my true love gave to me
Eight maids a-milking.
Database administrators as well as developers, managers, and even end-users put a lot of faith in statistics. When application performance degrades, there is a chorus of requests to collect fresh statistics. However, statistics can be a double-edged sword. My favorite story about statistics is when performance suddenly tanked one day in a database that had never experienced any performance issues since it had been created about a year ago—so they told me. We checked the statistics and they were as fresh as could be. Several shouting matches later, I just so happened to look at the cron schedule and found the still-smoking gun. It turned out that this database did not refresh statistics on a regular basis. Then why were the statistics so fresh? They had first been collected one year ago using a cron job scheduled for a particular day of a particular month but the person who created the cron job had forgotten to disable it. This business-critical database had stellar performance for a whole year without fresh statistics but performance nosedived when the cron job was executed again, exactly one year later. True story.
Mogens Norgaard said in an “Ask the Oracles” article titled “Statistics—How and When” published in the August 2007 issue of the NoCOUG Journal: “I have advised many customers to stop analyzing, thereby creating a more stable environment overnight.” Here are some more memorable quotes from that article. Jonathan Lewis has the perfect summary.
“Oh, and by the way, could you please stop gathering statistics constantly? I don’t know much about databases, but I do think I know the following: small tables tend to stay small, large tables tend to stay large, unique indexes have a tendency to stay unique, and non-unique indexes often stay non-unique.”—Dave Ensor as remembered by Mogens Norgaard
“Monitor the changes in execution plans and/or performance for the individual SQL statements … and perhaps as a consequence re-gather stats. That way, you’d leave stuff alone that works very well, thank you, and you’d put your efforts into exactly the things that have become worse.”—Mogens Norgaard
“It is my firm belief that most scheduled statistics-gathering jobs do not cause much harm only because (most) changes in the statistics were insignificant as far as the optimizer is concerned—meaning that it was an exercise in futility.”—Wolfgang Breitling
“There are some statistics about your data that can be left unchanged for a long time, possibly forever; there are some statistics that need to be changed periodically; and there are some statistics that need to be changed constantly. … The biggest problem is that you need to understand the data.”—Jonathan Lewis
But my all-time favorite quote about statistics is:
“It astonishes me how many shops prohibit any un-approved production changes and yet re-analyze schema stats weekly. Evidently, they do not understand that the [possible consequence] of schema re-analysis is to change their production SQL execution plans, and they act surprised when performance changes!”—Don Burleson
To fully understand what the above authors were trying to convey, you have to read the quotes in context. Please click here to read their full opinions (PDF). Also review the Should we stop analyzing? discussion on the Oracle-L mailing list in December 2003 by Mogens Norgaard, Wolfgang Breitling, Jonathan Lewis, Don Burleson, and others. A quick refresher on statistics can be found in the Performance Tuning Guide.
P.S. You may ask how we recovered from the statistics problem. There was no undoing the changes because this was an Oracle 8i database which did not have the wonderful statistics restore capability possessed by Oracle 10g and 11g databases. Several donuts later, I decided to collect statistics again but with histograms this time. That fixed it. I hope I remembered to remove that annual cron job.