Day 5: The Twelve Days of SQL: The query cost is only an estimate
On the fifth day of Christmas, my true love gave to me
Five golden rings.
The Cost column in query plans can be confusing because a query can take a long time to execute even if the cost is low. The natural assumption is that execution time will be low if the cost is low and high if the cost is high. However, the cost is only an estimate. The query optimizer is given very limited time and very limited information to solve a very complex problem. It chooses the query plan with the lowest cost estimate it can find under the circumstances. However, an estimate is not the same thing as a guarantee.
The unit of cost can also be confusing. Prior to Oracle Database 9i, the query optimizer attempted to minimize the number of I/O requests; that is the sum of single block reads and multi-block reads. Therefore, the “cost” reported in query plans prior to Oracle Database 9i was the estimated number of I/O requests. Beginning with Oracle Database 9i, the query optimizer performs more sophisticated calculations and attempts to minimize the actual amount of time required to execute the query. However, the resulting time estimate is divided by an estimate of the time to read one database block from storage (SREADTIM) obtained from the “system statistics.”
You can use statistical theory to validate or disprove the hypothesis that execution time is strongly correlated with the cost. You can calculate the “coefficient of determination” R2 for the cost estimates and execution times recorded in your shared pool (Khan Academy video on the coefficient of determination). The coefficient of determination measures what fraction of the variation in the “response variable” (query execution time in this case) can be attributed to the variation in the “explanatory variable” (the cost estimate in this case). The coefficient of determination therefore ranges from 0 to 1 and will be a perfect 1 for perfect linear correlation. Multiply by 100 to express it as a percentage. The following query calculates the coefficient of determination for schemas with at least 100 different queries in the shared pool. The closer to 100%, the higher the correlation between query plan cost and actual execution time. You can vote multiple times in the poll below.
The coefficient of determination will quantify the fraction of variation in the execution time that can be attributed to the cost estimate. Low values are unavoidable in real life because the query optimizer is given limited information and limited time to solve a very complex problem so the estimated cost may be too low or too high. Also, the execution time is affected by the run time values of the bind variables, the amount of caching, contention, etc.
Update: Thanks to Jonathan Lewis for suggesting that the investigation be limited to SELECT statements only (COMMAND_TYPE = 3).
set linesize 100 column query_count format 999,999,990 column total_executions format 999,999,990 column total_execution_seconds format 999,999,990 column R2 format 99.90 SELECT parsing_schema_name AS schema, COUNT(*) AS query_count, SUM(executions) AS total_executions, SUM(elapsed_time) / 1000000 AS total_execution_seconds, 100 * REGR_R2(optimizer_cost, elapsed_time/executions) AS R2 -- use the average elapsed time as the response variable FROM v$sql WHERE parsing_schema_name != 'SYS' AND command_type = 3 -- SELECT statements only AND optimizer_cost > 0 AND executions > 0 AND elapsed_time > 0 GROUP BY parsing_schema_name HAVING COUNT(*) > 100 -- ensure a reasonable sample size ORDER BY 5; SCHEMA QUERY_COUNT TOTAL_EXECUTIONS TOTAL_EXECUTION_SECONDS R2 ------------------------------ ------------ ---------------- ----------------------- ------ OLTP 864 52,771 4,371 24.08