The way you write your query matters: Performance comparison between CONNECT BY and recursive subquery factoring
Follow @Oratweets
I was asked to compare the performance of traditional CONNECT BY syntax with recursive subquery factoring (a.k.a. recursive common table expressions) but I only have part of the answer. To summarize my answer, the CONNECT BY NO FILTERING and CONNECT BY WITHOUT FILTERING strategies are not available for queries that use recursive subquery factoring. These strategies require fewer buffer gets and it appears that Oracle builds structures that allow it to perform the recursion without needing additional buffer gets after the first pass through the data set. These strategies may therefore be more efficient for small data sets but I don’t have the exact details of how exactly they work and would appreciate any insights that anybody can provide. These examples are further evidence that the way you write your query has performance implications. However, Dan Tow pointed out some confusing aspects of CONNECT BY syntax to me and therefore I recommend recursive subquery factoring for general use because of its greater clarity (and greater generality), performance implications notwithstanding.
Consider the well-known example of the employee hierarchy.
select
rpad(' ', 2 * (level - 1)) || first_name || ' ' || last_name as name,
emp.hire_date
from employees emp
start with emp.manager_id is null
connect by emp.manager_id = prior emp.employee_id
order siblings by hire_date;
The recursive subquery equivalent is extremely verbose in comparison. “RCTE” in the query below stands for “recursive common table expression.” Refer to my previous article I Heart Recursive Subquery Factoring (Recursive Common Table Expressions for a detailed explanation of such queries.
with rcte(lvl, employee_id, first_name, last_name, hire_date) as
(
-- anchor member
select
1 as lvl,
emp.employee_id,
emp.first_name,
emp.last_name,
emp.hire_date
from employees emp
where emp.manager_id is null
union all
-- recursive member
select
rcte.lvl + 1 as lvl,
emp.employee_id,
emp.first_name,
emp.last_name,
emp.hire_date
from
rcte,
employees emp
where rcte.employee_id = emp.manager_id
)
search depth first by hire_date set rank
select
rpad(' ', 2 * (lvl - 1)) || first_name || ' ' || last_name as name,
hire_date
from rcte;
Here is the execution plan for the above recursive subquery. It is a UNION ALL (step Id 2) of the TABLE ACCESS FULL operation (step Id 3) which retrieves the starting data set and the NESTED LOOPS operation (step Id 4) which retrieves initial rows.
SQL_ID 42cthn2bdc7s8, child number 0
Plan hash value: 3762696923
----------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers |
----------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 8 (100)| | 107 |00:00:00.05 | 23 |
| 1 | VIEW | | 1 | 7 | 336 | 8 (13)| 00:00:01 | 107 |00:00:00.05 | 23 |
| 2 | UNION ALL (RECURSIVE WITH) DEPTH FIRST| | 1 | | | | | 107 |00:00:00.05 | 23 |
|* 3 | TABLE ACCESS FULL | EMPLOYEES | 1 | 1 | 61 | 3 (0)| 00:00:01 | 1 |00:00:00.01 | 7 |
| 4 | NESTED LOOPS | | 4 | | | | | 106 |00:00:00.02 | 16 |
| 5 | NESTED LOOPS | | 4 | 6 | 522 | 4 (0)| 00:00:01 | 106 |00:00:00.01 | 5 |
| 6 | RECURSIVE WITH PUMP | | 4 | | | | | 107 |00:00:00.01 | 0 |
|* 7 | INDEX RANGE SCAN | EMP_MANAGER_IX | 107 | 6 | | 0 (0)| | 106 |00:00:00.01 | 5 |
| 8 | TABLE ACCESS BY INDEX ROWID | EMPLOYEES | 106 | 6 | 366 | 1 (0)| 00:00:01 | 106 |00:00:00.04 | 11 |
----------------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter("EMP"."MANAGER_ID" IS NULL)
7 - access("RCTE"."EMPLOYEE_ID"="EMP"."MANAGER_ID")
Three different execution strategies are available for the traditional CONNECT BY syntax. In the examples below, I used the mysteriously named NO_CONNECT_BY_FILTERING, NO_CONNECT_BY_COMBINE_SW, and CONNECT_BY_FILTERING hints respectively to force the optimizer to use the strategies I wanted. I can’t explain the allusions to filtering in the names so any insights would be appreciated. My best guess is that NO_CONNECT_BY_COMBINE_SW means that the optimizer will not include the cost of the START WITH clause in its costing calculations. The CONNECT_BY_FILTERING strategy is very similar to the strategy used above for recursive subquery factoring.
Here is the execution plan for the traditional query with the addition of the NO_CONNECT_BY_FILTERING hint. The number of buffer gets is only 7 and the employee table is touched only once implying that Oracle has created a hashed structure to perform the recursion.
SQL_ID 2xfgzbz36s2qu, child number 0
Plan hash value: 1986864582
-----------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers |
-----------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 4 (100)| | 107 |00:00:00.01 | 7 |
|* 1 | CONNECT BY NO FILTERING WITH START-WITH| | 1 | | | | | 107 |00:00:00.01 | 7 |
| 2 | TABLE ACCESS FULL | EMPLOYEES | 1 | 107 | 6527 | 3 (0)| 00:00:01 | 107 |00:00:00.01 | 7 |
-----------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("EMP"."MANAGER_ID"=PRIOR NULL)
filter("EMP"."MANAGER_ID" IS NULL)
Here is the execution plan for the traditional query with the addition of the NO_CONNECT_BY_COMBINE_SW hint. There are two passes through the Employees table. The number of buffer gets is 14. Based on the values in the Starts and A-Rows column, it appears that Oracle has created a hashed structure from the results of the second pass. Therefore, this strategy may not offer any advantages over the NO_CONNECT_BY_FILTERING strategy.
SQL_ID a9bux4t6t76qw, child number 0
Plan hash value: 1497856380
------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers |
------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 7 (100)| | 107 |00:00:00.01 | 14 |
|* 1 | CONNECT BY WITHOUT FILTERING| | 1 | | | | | 107 |00:00:00.01 | 14 |
|* 2 | TABLE ACCESS FULL | EMPLOYEES | 1 | 1 | 61 | 3 (0)| 00:00:01 | 1 |00:00:00.01 | 7 |
| 3 | TABLE ACCESS FULL | EMPLOYEES | 1 | 107 | 6527 | 3 (0)| 00:00:01 | 107 |00:00:00.01 | 7 |
------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("EMP"."MANAGER_ID"=PRIOR NULL)
2 - filter("EMP"."MANAGER_ID" IS NULL)
Here is the execution plan for the traditional query with the addition of the CONNECT_BY_FILTERING hint. The execution plan is now very similar to that in the case of the recursive subquery factoring example. Once again there is a TABLE ACCESS FULL (step Id 2) to retrieve the starting data set and a NESTED LOOPS operation (step Id 3) to retrieve additional rows. The number of buffer gets (22) is almost identical to that of the recursive subquery factoring case (23).
SQL_ID 24wb2xqwkrm4x, child number 0
Plan hash value: 2829436659
------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers |
------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 8 (100)| | 107 |00:00:00.01 | 22 |
|* 1 | CONNECT BY WITH FILTERING | | 1 | | | | | 107 |00:00:00.01 | 22 |
|* 2 | TABLE ACCESS FULL | EMPLOYEES | 1 | 1 | 61 | 3 (0)| 00:00:01 | 1 |00:00:00.01 | 7 |
| 3 | NESTED LOOPS | | 4 | 6 | 444 | 4 (0)| 00:00:01 | 106 |00:00:00.01 | 15 |
| 4 | CONNECT BY PUMP | | 4 | | | | | 107 |00:00:00.01 | 0 |
| 5 | TABLE ACCESS BY INDEX ROWID| EMPLOYEES | 107 | 6 | 366 | 1 (0)| 00:00:01 | 106 |00:00:00.01 | 15 |
|* 6 | INDEX RANGE SCAN | EMP_MANAGER_IX | 107 | 6 | | 0 (0)| | 106 |00:00:00.01 | 5 |
------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("EMP"."MANAGER_ID"=PRIOR NULL)
2 - filter("EMP"."MANAGER_ID" IS NULL)
6 - access("EMP"."MANAGER_ID"="connect$_by$_pump$_002"."prior emp.employee_id ")
Dan Tow pointed out some confusing aspects of CONNECT BY syntax to me and therefore I recommend recursive subquery factoring for general use because of its clarity. Consider the following two queries; at first glance they appear to be functionally identical but their results prove otherwise. The first uses INNER JOIN while the second does not. Both include the condition “dep.department_name = ‘Sales’.”
-- INNER JOIN version
select
rpad(' ', 2 * (level - 1)) || first_name || ' ' || last_name as name,
emp.hire_date,
dep.department_name
from
employees emp
inner join departments dep on (emp.department_id = dep.department_id and dep.department_name = 'Sales')
-- No START WITH clause is required in this example
connect by emp.manager_id = prior emp.employee_id
order siblings by hire_date;
-- Traditional version
select
rpad(' ', 2 * (level - 1)) || first_name || ' ' || last_name as name,
emp.hire_date,
dep.department_name
from
employees emp,
departments dep
where emp.department_id = dep.department_id
and dep.department_name = 'Sales'
-- No START WITH clause is required in this example
connect by emp.manager_id = prior emp.employee_id
order siblings by hire_date;
Here are the execution plan and results of the INNER JOIN version. 63 rows are produced. The condition “dep.department_name = ‘Sales’” is checked at step Id 4.
SQL_ID 0gpmzjr5zcybv, child number 0
Plan hash value: 849430486
---------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 4 (100)| | 63 |00:00:00.01 | 117 |
|* 1 | CONNECT BY WITHOUT FILTERING | | 1 | | | | | 63 |00:00:00.01 | 117 |
| 2 | NESTED LOOPS | | 1 | 10 | 1040 | 3 (0)| 00:00:01 | 34 |00:00:00.01 | 117 |
| 3 | TABLE ACCESS FULL | EMPLOYEES | 1 | 107 | 7918 | 3 (0)| 00:00:01 | 107 |00:00:00.01 | 7 |
|* 4 | TABLE ACCESS BY INDEX ROWID| DEPARTMENTS | 107 | 1 | 30 | 0 (0)| | 34 |00:00:00.01 | 110 |
|* 5 | INDEX UNIQUE SCAN | DEPT_ID_PK | 107 | 1 | | 0 (0)| | 106 |00:00:00.01 | 4 |
---------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("EMP"."MANAGER_ID"=PRIOR NULL)
4 - filter("DEP"."DEPARTMENT_NAME"='Sales')
5 - access("EMP"."DEPARTMENT_ID"="DEP"."DEPARTMENT_ID")
NAME HIRE_DATE DEPARTMENT_NAME
------------------------------ --------- ------------------------------
John Russell 01-OCT-96 Sales
Peter Tucker 30-JAN-97 Sales
David Bernstein 24-MAR-97 Sales
Peter Hall 20-AUG-97 Sales
Christopher Olsen 30-MAR-98 Sales
Nanette Cambrault 09-DEC-98 Sales
Oliver Tuvault 23-NOV-99 Sales
Karen Partners 05-JAN-97 Sales
Janette King 30-JAN-96 Sales
Patrick Sully 04-MAR-96 Sales
Allan McEwen 01-AUG-96 Sales
Lindsey Smith 10-MAR-97 Sales
Louise Doran 15-DEC-97 Sales
Sarath Sewall 03-NOV-98 Sales
Alberto Errazuriz 10-MAR-97 Sales
Clara Vishney 11-NOV-97 Sales
Danielle Greene 19-MAR-99 Sales
Mattea Marvins 24-JAN-00 Sales
David Lee 23-FEB-00 Sales
Sundar Ande 24-MAR-00 Sales
Amit Banda 21-APR-00 Sales
Gerald Cambrault 15-OCT-99 Sales
Lisa Ozer 11-MAR-97 Sales
Tayler Fox 24-JAN-98 Sales
Harrison Bloom 23-MAR-98 Sales
William Smith 23-FEB-99 Sales
Elizabeth Bates 24-MAR-99 Sales
Sundita Kumar 21-APR-00 Sales
Eleni Zlotkey 29-JAN-00 Sales
Ellen Abel 11-MAY-96 Sales
Alyssa Hutton 19-MAR-97 Sales
Jonathon Taylor 24-MAR-98 Sales
Jack Livingston 23-APR-98 Sales
Charles Johnson 04-JAN-00 Sales
Peter Tucker 30-JAN-97 Sales
David Bernstein 24-MAR-97 Sales
Peter Hall 20-AUG-97 Sales
Christopher Olsen 30-MAR-98 Sales
Nanette Cambrault 09-DEC-98 Sales
Oliver Tuvault 23-NOV-99 Sales
Janette King 30-JAN-96 Sales
Patrick Sully 04-MAR-96 Sales
Allan McEwen 01-AUG-96 Sales
Lindsey Smith 10-MAR-97 Sales
Louise Doran 15-DEC-97 Sales
Sarath Sewall 03-NOV-98 Sales
Clara Vishney 11-NOV-97 Sales
Danielle Greene 19-MAR-99 Sales
Mattea Marvins 24-JAN-00 Sales
David Lee 23-FEB-00 Sales
Sundar Ande 24-MAR-00 Sales
Amit Banda 21-APR-00 Sales
Lisa Ozer 11-MAR-97 Sales
Tayler Fox 24-JAN-98 Sales
Harrison Bloom 23-MAR-98 Sales
William Smith 23-FEB-99 Sales
Elizabeth Bates 24-MAR-99 Sales
Sundita Kumar 21-APR-00 Sales
Ellen Abel 11-MAY-96 Sales
Alyssa Hutton 19-MAR-97 Sales
Jonathon Taylor 24-MAR-98 Sales
Jack Livingston 23-APR-98 Sales
Charles Johnson 04-JAN-00 Sales
63 rows selected.
Here are the execution plan and results of the more traditional version. 97 rows are produced instead of 63. The condition “dep.department_name = ‘Sales’” is checked at step Id 1.
SQL_ID 156g27gqtx4qu, child number 0
Plan hash value: 658926182
----------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers |
----------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 4 (100)| | 97 |00:00:00.01 | 117 |
|* 1 | FILTER | | 1 | | | | | 97 |00:00:00.01 | 117 |
|* 2 | CONNECT BY WITHOUT FILTERING | | 1 | | | | | 312 |00:00:00.05 | 117 |
| 3 | NESTED LOOPS | | 1 | 106 | 11024 | 3 (0)| 00:00:01 | 106 |00:00:00.01 | 117 |
| 4 | TABLE ACCESS FULL | EMPLOYEES | 1 | 107 | 7918 | 3 (0)| 00:00:01 | 107 |00:00:00.01 | 7 |
| 5 | TABLE ACCESS BY INDEX ROWID| DEPARTMENTS | 107 | 1 | 30 | 0 (0)| | 106 |00:00:00.01 | 110 |
|* 6 | INDEX UNIQUE SCAN | DEPT_ID_PK | 107 | 1 | | 0 (0)| | 106 |00:00:00.01 | 4 |
----------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("DEP"."DEPARTMENT_NAME"='Sales')
2 - access("EMP"."MANAGER_ID"=PRIOR NULL)
6 - access("EMP"."DEPARTMENT_ID"="DEP"."DEPARTMENT_ID")
NAME HIRE_DATE DEPARTMENT_NAME
------------------------------ --------- ------------------------------
John Russell 01-OCT-96 Sales
Peter Tucker 30-JAN-97 Sales
David Bernstein 24-MAR-97 Sales
Peter Hall 20-AUG-97 Sales
Christopher Olsen 30-MAR-98 Sales
Nanette Cambrault 09-DEC-98 Sales
Oliver Tuvault 23-NOV-99 Sales
Karen Partners 05-JAN-97 Sales
Janette King 30-JAN-96 Sales
Patrick Sully 04-MAR-96 Sales
Allan McEwen 01-AUG-96 Sales
Lindsey Smith 10-MAR-97 Sales
Louise Doran 15-DEC-97 Sales
Sarath Sewall 03-NOV-98 Sales
Alberto Errazuriz 10-MAR-97 Sales
Clara Vishney 11-NOV-97 Sales
Danielle Greene 19-MAR-99 Sales
Mattea Marvins 24-JAN-00 Sales
David Lee 23-FEB-00 Sales
Sundar Ande 24-MAR-00 Sales
Amit Banda 21-APR-00 Sales
Gerald Cambrault 15-OCT-99 Sales
Lisa Ozer 11-MAR-97 Sales
Tayler Fox 24-JAN-98 Sales
Harrison Bloom 23-MAR-98 Sales
William Smith 23-FEB-99 Sales
Elizabeth Bates 24-MAR-99 Sales
Sundita Kumar 21-APR-00 Sales
Eleni Zlotkey 29-JAN-00 Sales
Ellen Abel 11-MAY-96 Sales
Alyssa Hutton 19-MAR-97 Sales
Jonathon Taylor 24-MAR-98 Sales
Jack Livingston 23-APR-98 Sales
Charles Johnson 04-JAN-00 Sales
Peter Tucker 30-JAN-97 Sales
David Bernstein 24-MAR-97 Sales
Peter Hall 20-AUG-97 Sales
Christopher Olsen 30-MAR-98 Sales
Nanette Cambrault 09-DEC-98 Sales
Oliver Tuvault 23-NOV-99 Sales
Janette King 30-JAN-96 Sales
Patrick Sully 04-MAR-96 Sales
Allan McEwen 01-AUG-96 Sales
Lindsey Smith 10-MAR-97 Sales
Louise Doran 15-DEC-97 Sales
Sarath Sewall 03-NOV-98 Sales
Clara Vishney 11-NOV-97 Sales
Danielle Greene 19-MAR-99 Sales
Mattea Marvins 24-JAN-00 Sales
David Lee 23-FEB-00 Sales
Sundar Ande 24-MAR-00 Sales
Amit Banda 21-APR-00 Sales
Lisa Ozer 11-MAR-97 Sales
Tayler Fox 24-JAN-98 Sales
Harrison Bloom 23-MAR-98 Sales
William Smith 23-FEB-99 Sales
Elizabeth Bates 24-MAR-99 Sales
Sundita Kumar 21-APR-00 Sales
Ellen Abel 11-MAY-96 Sales
Alyssa Hutton 19-MAR-97 Sales
Jonathon Taylor 24-MAR-98 Sales
Jack Livingston 23-APR-98 Sales
Charles Johnson 04-JAN-00 Sales
John Russell 01-OCT-96 Sales
Peter Tucker 30-JAN-97 Sales
David Bernstein 24-MAR-97 Sales
Peter Hall 20-AUG-97 Sales
Christopher Olsen 30-MAR-98 Sales
Nanette Cambrault 09-DEC-98 Sales
Oliver Tuvault 23-NOV-99 Sales
Karen Partners 05-JAN-97 Sales
Janette King 30-JAN-96 Sales
Patrick Sully 04-MAR-96 Sales
Allan McEwen 01-AUG-96 Sales
Lindsey Smith 10-MAR-97 Sales
Louise Doran 15-DEC-97 Sales
Sarath Sewall 03-NOV-98 Sales
Alberto Errazuriz 10-MAR-97 Sales
Clara Vishney 11-NOV-97 Sales
Danielle Greene 19-MAR-99 Sales
Mattea Marvins 24-JAN-00 Sales
David Lee 23-FEB-00 Sales
Sundar Ande 24-MAR-00 Sales
Amit Banda 21-APR-00 Sales
Gerald Cambrault 15-OCT-99 Sales
Lisa Ozer 11-MAR-97 Sales
Tayler Fox 24-JAN-98 Sales
Harrison Bloom 23-MAR-98 Sales
William Smith 23-FEB-99 Sales
Elizabeth Bates 24-MAR-99 Sales
Sundita Kumar 21-APR-00 Sales
Eleni Zlotkey 29-JAN-00 Sales
Ellen Abel 11-MAY-96 Sales
Alyssa Hutton 19-MAR-97 Sales
Jonathon Taylor 24-MAR-98 Sales
Jack Livingston 23-APR-98 Sales
Charles Johnson 04-JAN-00 Sales
97 rows selected.
This is not a bug. According to the SQL Language Reference:
If the query contains a WHERE clause without a join, then Oracle eliminates all rows from the hierarchy that do not satisfy the condition of the WHERE clause. Oracle evaluates this condition for each row individually, rather than removing all the children of a row that does not satisfy the condition.
To make things clearer, here is the recursive subquery factoring version of the above query. Notice that the condition “department_name = ‘Sales’” has been placed outside the recursive subquery.
with rcte(lvl, employee_id, first_name, last_name, hire_date, department_name) as
(
select
1 as lvl,
emp.employee_id,
emp.first_name,
emp.last_name,
emp.hire_date,
dep.department_name
from
employees emp,
departments dep
where emp.department_id = dep.department_id
union all
select
rcte.lvl + 1 as lvl,
emp.employee_id,
emp.first_name,
emp.last_name,
emp.hire_date,
dep.department_name
from
rcte,
employees emp,
departments dep
where rcte.employee_id = emp.manager_id
and emp.department_id = dep.department_id
)
search depth first by hire_date set rank
select
rpad(' ', 2 * (lvl - 1)) || first_name || ' ' || last_name as name,
hire_date,
department_name
from rcte
where department_name = 'Sales';
All the above examples were produced using Oracle Database 11g Release 2 and the Developer Days Virtual Machine.


Recent Comments