Home > DBA, Oracle, SQL > Day 5: The Twelve Days of SQL: The query cost is only an estimate

Day 5: The Twelve Days of SQL: The query cost is only an estimate


On the fifth day of Christmas, my true love gave to me
Five golden rings.

Day 5: The query cost is only an estimate (Day 4: The way you write your query matters)(Day 6: The execution plan is a tree)

The Cost column in query plans can be confusing because a query can take a long time to execute even if the cost is low. The natural assumption is that execution time will be low if the cost is low and high if the cost is high. However, the cost is only an estimate. The query optimizer is given very limited time and very limited information to solve a very complex problem. It chooses the query plan with the lowest cost estimate it can find under the circumstances. However, an estimate is not the same thing as a guarantee.

The unit of cost can also be confusing. Prior to Oracle Database 9i, the query optimizer attempted to minimize the number of I/O requests; that is the sum of single block reads and multi-block reads. Therefore, the “cost” reported in query plans prior to Oracle Database 9i was the estimated number of I/O requests. Beginning with Oracle Database 9i, the query optimizer performs more sophisticated calculations and attempts to minimize the actual amount of time required to execute the query.  However, the resulting time estimate is divided by an estimate of the time to read one database block from storage (SREADTIM) obtained from the “system statistics.”

You can use statistical theory to validate or disprove the hypothesis that execution time is strongly correlated with the cost. You can calculate the “coefficient of determination” R2 for the cost estimates and execution times recorded in your shared pool (Khan Academy video on the coefficient of determination). The coefficient of determination measures what fraction of the variation in the “response variable” (query execution time in this case) can be attributed to the variation in the “explanatory variable” (the cost estimate in this case). The coefficient of determination therefore ranges from 0 to 1 and will be a perfect 1 for perfect linear correlation. Multiply by 100 to express it as a percentage. The following query calculates the coefficient of determination for schemas with at least 100 different queries in the shared pool. The closer to 100%, the higher the correlation between query plan cost and actual execution time. You can vote multiple times in the poll below.

The coefficient of determination will quantify the fraction of variation in the execution time that can be attributed to the cost estimate. Low values are unavoidable in real life because the query optimizer is given limited information and limited time to solve a very complex problem so the estimated cost may be too low or too high. Also, the execution time is affected by the run time values of the bind variables, the amount of caching, contention, etc.

Update: Thanks to Jonathan Lewis for suggesting that the investigation be limited to SELECT statements only (COMMAND_TYPE = 3).

set linesize 100

column query_count format 999,999,990
column total_executions format 999,999,990
column total_execution_seconds format 999,999,990
column R2 format 99.90

SELECT
  parsing_schema_name AS schema,
  COUNT(*) AS query_count,
  SUM(executions) AS total_executions,
  SUM(elapsed_time) / 1000000 AS total_execution_seconds,
  100 * REGR_R2(optimizer_cost, elapsed_time/executions) AS R2 -- use the average elapsed time as the response variable
FROM v$sql
WHERE parsing_schema_name != 'SYS'
AND command_type = 3 -- SELECT statements only
AND optimizer_cost > 0
AND executions > 0
AND elapsed_time > 0
GROUP BY parsing_schema_name
HAVING COUNT(*) > 100 -- ensure a reasonable sample size
ORDER BY 5;

SCHEMA                          QUERY_COUNT TOTAL_EXECUTIONS TOTAL_EXECUTION_SECONDS     R2
------------------------------ ------------ ---------------- ----------------------- ------
OLTP                                    864           52,771                   4,371  24.08

Day 4: The way you write your query matters

Day 6: The execution plan is a tree

Categories: DBA, Oracle, SQL
  1. Rich Headrick
    December 5, 2011 at 6:37 pm

    Funny you mentioned SREADTIM. Please have a look at bug 9842771. We were discussing this today. There is an API to simply change the value of SREADTIM and MREADTIME in case you are hitting the bug.

    Example:

    SELECT pname, pval1
    FROM sys.aux_stats$
    WHERE sname = ‘SYSSTATS_MAIN’;

    Example:

    PNAME PVAL1
    ———– ———-
    CPUSPEEDNW 1553
    IOSEEKTIM 10
    IOTFRSPEED 4096
    SREADTIM 57932.193 <– Should be 5.8ms or just 6ms
    MREADTIM 67233.684 <– Should be 6.7 or just 7ms
    CPUSPEED 1554
    MBRC 6
    MAXTHR
    SLAVETHR

    Adjust using the following:

    exec dbms_stats.set_systems_stats('SREADTIM', 6);
    exec dbms_stats.set_systems_stats('MREADTIM', 7);

  2. December 8, 2011 at 3:41 pm

    great idea Ziggy.
    One comment, make sure and comment out the rows with cost=0 which I think are from rule based optimizations. With the filter I got 76 and with out I got 32.
    – Kyle

  3. Iggy Fernandez
    December 8, 2011 at 7:22 pm

    Kyle Hailey :

    make sure and comment out the rows with cost=0 which I think are from rule based optimizations. With the filter I got 76 and with out I got 32.

    Thanks Kyle, I’ve fixed the query and reset the poll.

  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: