Home > DBA, Oracle, SQL > Query Tuning By Example: Throw Away That Execution Plan

Query Tuning By Example: Throw Away That Execution Plan


I was asked to tune the following SQL query; it usually retrieved less than a hundred rows but took several minutes of execution time. Table names and column names have been changed in the interests of confidentiality.

SELECT DISTINCT
  o.order_id,
  o.order_date
FROM
  orders o,
  order_items ol
WHERE
  LOWER(o.order_state) = : order_state AND
  ol.order_id = o.order_id AND
  ol.item_id IN (
    SELECT item_id
    FROM items
    WHERE item_state = : item_state
  )
ORDER BY o.order_date;

The first thing that most of us would want to do is to look at the execution plan generated by EXPLAIN PLAN or DBMS_XPLAN.DISPLAY. However, watch this fascinating YouTube video Rewriting SQL Queries for Performance in 9 Minutes by Stephane Faroult in which he claims that he doesn’t find execution plans very useful in improving queries. Stephane’s point seems to be that if you write high-quality SQL queries and your tables are well indexed, you can be reasonably confident that your queries will perform reasonably well. The analogy he gives is the problem of translating from one language to another using computer software; instead of constantly wondering how the language translation engine works and why it is producing poor-quality output, focus instead on writing well-structured and unambiguous input for the engine.

The first thing that I noticed about the problem query was that the FROM clause included tables from which nothing was being selected in the SELECT clause. This is a classic performance trap. For example, consider the query: Which departments have at least one employee? It can be answered in the following ways:

SELECT DISTINCT d.department_name
FROM departments d, employees e
WHERE e.department_id = d.department_id;

SELECT d.department_name
FROM departments d
WHERE EXISTS (
  SELECT *
  FROM employees e
  WHERE department_id = d.department_id
);

Even though these two queries mean exactly the same thing and produce the same results, Oracle does not process them in the same way. In the first case, Oracle compares every record in the Departments table with every record in the Employees table, thus collecting duplicate department names which have to be eliminated (SELECT DISTINCT). In the second case, Oracle performs an existence check (semi-join) for each record in the department table; no duplicates are produced.

The second thing that I noticed about the problem query was the use of the LOWER function; it would prevent Oracle from using an index on the order_state column. I questioned whether there was any need to use the LOWER function and whether there was a “function-based index” on the expression LOWER(order_state).

The third thing that I noticed was the use of the preposition “IN” to perform an existence check instead of “EXISTS.” We need to be very careful when using “IN,” “ANY,” “SOME,” and “ALL” in SQL queries. They behave in unintuitive ways in the presence of nulls (unknowns). For example, the value of the boolean expression “1 IN (2, Null)” is not False as one might expect, it is considered to be unknown. Similarly the value of the expression “1 NOT IN (2, Null)” is not True as one might expect, it is also considered to be unknown. Finally—since an object either belongs or does not belong to a set—one might expect that the value of the expression “(1 IN (2, Null)) OR (1 NOT IN (2, Null))” is True but that is not so; it is also considered to be unknown. None of this is very intuitive, to say the least. However, refer to the section titled Use of EXISTS versus IN for Subqueries in the Oracle Database 10g Performance Tuning Guide. A similar note appears in the Oracle Database 9i guide but I could not find anything like it in the Oracle Database 11g guide; presumably the Oracle Database 11g query optimizer is smarter than previous versions.

It was time to rewrite the problem query. This is what I wrote:

SELECT
  order_id,
  order_date
FROM orders o
WHERE order_state = : order_state
AND EXISTS (
  SELECT *
  FROM order_items ol
  WHERE order_id = o.order_id
  AND EXISTS (
    SELECT *
    FROM items
    WHERE item_id = ol.item_id
    AND item_state = : item_state
  )
)
ORDER BY order_date;

In the manner of Stephane Faroult, I confidently declared that the rewritten query would be much faster than the original version if the tables were well indexed. I was right; the rewritten query completed in a very small fraction of a second. Stephane was right too!

Categories: DBA, Oracle, SQL
  1. October 3, 2009 at 12:11 pm

    11g optimizer is smarter.

    There is only one exists needed to satisfy FROM including only tables that are in SELECT clause.
    -A tuning tip from version 8 times is to prefer using joins over in or exists.
    -If there is a DISTINCT in a select clause one should question is there something wrong in the query.
    Putting these together my approach would be.

    SELECT o.order_id, o.order_date
    FROM orders o
    WHERE o.order_state = : order_state
    AND EXISTS (
    SELECT null
    FROM order_items ol, items i
    WHERE i.item_id = ol.item_id
    AND ol.order_id = o.order_id
    AND i.item_state = : item_state
    )
    ORDER BY order_date;

    • Iggy Fernandez
      October 3, 2009 at 2:08 pm

      Thanks for the comment, Timo. In relational calculus, only one variable is bound to each existential qualifier. In the spirit of what Stephane Faroult said in his video, I therefore chose to give the optimizer an SQL statement that most closely adhered to first principles.

  1. May 26, 2020 at 6:56 am

Leave a comment