Query Tuning By Example: Throw Away That Execution Plan
I was asked to tune the following SQL query; it usually retrieved less than a hundred rows but took several minutes of execution time. Table names and column names have been changed in the interests of confidentiality.
SELECT DISTINCT o.order_id, o.order_date FROM orders o, order_items ol WHERE LOWER(o.order_state) = : order_state AND ol.order_id = o.order_id AND ol.item_id IN ( SELECT item_id FROM items WHERE item_state = : item_state ) ORDER BY o.order_date;
The first thing that most of us would want to do is to look at the execution plan generated by EXPLAIN PLAN or DBMS_XPLAN.DISPLAY. However, watch this fascinating YouTube video Rewriting SQL Queries for Performance in 9 Minutes by Stephane Faroult in which he claims that he doesn’t find execution plans very useful in improving queries. Stephane’s point seems to be that if you write high-quality SQL queries and your tables are well indexed, you can be reasonably confident that your queries will perform reasonably well. The analogy he gives is the problem of translating from one language to another using computer software; instead of constantly wondering how the language translation engine works and why it is producing poor-quality output, focus instead on writing well-structured and unambiguous input for the engine.
The first thing that I noticed about the problem query was that the FROM clause included tables from which nothing was being selected in the SELECT clause. This is a classic performance trap. For example, consider the query: Which departments have at least one employee? It can be answered in the following ways:
SELECT DISTINCT d.department_name FROM departments d, employees e WHERE e.department_id = d.department_id; SELECT d.department_name FROM departments d WHERE EXISTS ( SELECT * FROM employees e WHERE department_id = d.department_id );
Even though these two queries mean exactly the same thing and produce the same results, Oracle does not process them in the same way. In the first case, Oracle compares every record in the Departments table with every record in the Employees table, thus collecting duplicate department names which have to be eliminated (SELECT DISTINCT). In the second case, Oracle performs an existence check (semi-join) for each record in the department table; no duplicates are produced.
The second thing that I noticed about the problem query was the use of the LOWER function; it would prevent Oracle from using an index on the order_state column. I questioned whether there was any need to use the LOWER function and whether there was a “function-based index” on the expression LOWER(order_state).
The third thing that I noticed was the use of the preposition “IN” to perform an existence check instead of “EXISTS.” We need to be very careful when using “IN,” “ANY,” “SOME,” and “ALL” in SQL queries. They behave in unintuitive ways in the presence of nulls (unknowns). For example, the value of the boolean expression “1 IN (2, Null)” is not False as one might expect, it is considered to be unknown. Similarly the value of the expression “1 NOT IN (2, Null)” is not True as one might expect, it is also considered to be unknown. Finally—since an object either belongs or does not belong to a set—one might expect that the value of the expression “(1 IN (2, Null)) OR (1 NOT IN (2, Null))” is True but that is not so; it is also considered to be unknown. None of this is very intuitive, to say the least. However, refer to the section titled Use of EXISTS versus IN for Subqueries in the Oracle Database 10g Performance Tuning Guide. A similar note appears in the Oracle Database 9i guide but I could not find anything like it in the Oracle Database 11g guide; presumably the Oracle Database 11g query optimizer is smarter than previous versions.
It was time to rewrite the problem query. This is what I wrote:
SELECT order_id, order_date FROM orders o WHERE order_state = : order_state AND EXISTS ( SELECT * FROM order_items ol WHERE order_id = o.order_id AND EXISTS ( SELECT * FROM items WHERE item_id = ol.item_id AND item_state = : item_state ) ) ORDER BY order_date;
In the manner of Stephane Faroult, I confidently declared that the rewritten query would be much faster than the original version if the tables were well indexed. I was right; the rewritten query completed in a very small fraction of a second. Stephane was right too!