The way you write your query matters: Performance comparison between CONNECT BY and recursive subquery factoring
Follow @Oratweets
I was asked to compare the performance of traditional CONNECT BY syntax with recursive subquery factoring (a.k.a. recursive common table expressions) but I only have part of the answer. To summarize my answer, the CONNECT BY NO FILTERING and CONNECT BY WITHOUT FILTERING strategies are not available for queries that use recursive subquery factoring. These strategies require fewer buffer gets and it appears that Oracle builds structures that allow it to perform the recursion without needing additional buffer gets after the first pass through the data set. These strategies may therefore be more efficient for small data sets but I don’t have the exact details of how exactly they work and would appreciate any insights that anybody can provide. These examples are further evidence that the way you write your query has performance implications. However, Dan Tow pointed out some confusing aspects of CONNECT BY syntax to me and therefore I recommend recursive subquery factoring for general use because of its greater clarity (and greater generality), performance implications notwithstanding.
Consider the well-known example of the employee hierarchy.
select rpad(' ', 2 * (level - 1)) || first_name || ' ' || last_name as name, emp.hire_date from employees emp start with emp.manager_id is null connect by emp.manager_id = prior emp.employee_id order siblings by hire_date;
The recursive subquery equivalent is extremely verbose in comparison. “RCTE” in the query below stands for “recursive common table expression.” Refer to my previous article I Heart Recursive Subquery Factoring (Recursive Common Table Expressions for a detailed explanation of such queries.
with rcte(lvl, employee_id, first_name, last_name, hire_date) as ( -- anchor member select 1 as lvl, emp.employee_id, emp.first_name, emp.last_name, emp.hire_date from employees emp where emp.manager_id is null union all -- recursive member select rcte.lvl + 1 as lvl, emp.employee_id, emp.first_name, emp.last_name, emp.hire_date from rcte, employees emp where rcte.employee_id = emp.manager_id ) search depth first by hire_date set rank select rpad(' ', 2 * (lvl - 1)) || first_name || ' ' || last_name as name, hire_date from rcte;
Here is the execution plan for the above recursive subquery. It is a UNION ALL (step Id 2) of the TABLE ACCESS FULL operation (step Id 3) which retrieves the starting data set and the NESTED LOOPS operation (step Id 4) which retrieves initial rows.
SQL_ID 42cthn2bdc7s8, child number 0 Plan hash value: 3762696923 ---------------------------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers | ---------------------------------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | | 8 (100)| | 107 |00:00:00.05 | 23 | | 1 | VIEW | | 1 | 7 | 336 | 8 (13)| 00:00:01 | 107 |00:00:00.05 | 23 | | 2 | UNION ALL (RECURSIVE WITH) DEPTH FIRST| | 1 | | | | | 107 |00:00:00.05 | 23 | |* 3 | TABLE ACCESS FULL | EMPLOYEES | 1 | 1 | 61 | 3 (0)| 00:00:01 | 1 |00:00:00.01 | 7 | | 4 | NESTED LOOPS | | 4 | | | | | 106 |00:00:00.02 | 16 | | 5 | NESTED LOOPS | | 4 | 6 | 522 | 4 (0)| 00:00:01 | 106 |00:00:00.01 | 5 | | 6 | RECURSIVE WITH PUMP | | 4 | | | | | 107 |00:00:00.01 | 0 | |* 7 | INDEX RANGE SCAN | EMP_MANAGER_IX | 107 | 6 | | 0 (0)| | 106 |00:00:00.01 | 5 | | 8 | TABLE ACCESS BY INDEX ROWID | EMPLOYEES | 106 | 6 | 366 | 1 (0)| 00:00:01 | 106 |00:00:00.04 | 11 | ---------------------------------------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 3 - filter("EMP"."MANAGER_ID" IS NULL) 7 - access("RCTE"."EMPLOYEE_ID"="EMP"."MANAGER_ID")
Three different execution strategies are available for the traditional CONNECT BY syntax. In the examples below, I used the mysteriously named NO_CONNECT_BY_FILTERING, NO_CONNECT_BY_COMBINE_SW, and CONNECT_BY_FILTERING hints respectively to force the optimizer to use the strategies I wanted. I can’t explain the allusions to filtering in the names so any insights would be appreciated. My best guess is that NO_CONNECT_BY_COMBINE_SW means that the optimizer will not include the cost of the START WITH clause in its costing calculations. The CONNECT_BY_FILTERING strategy is very similar to the strategy used above for recursive subquery factoring.
Here is the execution plan for the traditional query with the addition of the NO_CONNECT_BY_FILTERING hint. The number of buffer gets is only 7 and the employee table is touched only once implying that Oracle has created a hashed structure to perform the recursion.
SQL_ID 2xfgzbz36s2qu, child number 0 Plan hash value: 1986864582 ----------------------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers | ----------------------------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | | 4 (100)| | 107 |00:00:00.01 | 7 | |* 1 | CONNECT BY NO FILTERING WITH START-WITH| | 1 | | | | | 107 |00:00:00.01 | 7 | | 2 | TABLE ACCESS FULL | EMPLOYEES | 1 | 107 | 6527 | 3 (0)| 00:00:01 | 107 |00:00:00.01 | 7 | ----------------------------------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - access("EMP"."MANAGER_ID"=PRIOR NULL) filter("EMP"."MANAGER_ID" IS NULL)
Here is the execution plan for the traditional query with the addition of the NO_CONNECT_BY_COMBINE_SW hint. There are two passes through the Employees table. The number of buffer gets is 14. Based on the values in the Starts and A-Rows column, it appears that Oracle has created a hashed structure from the results of the second pass. Therefore, this strategy may not offer any advantages over the NO_CONNECT_BY_FILTERING strategy.
SQL_ID a9bux4t6t76qw, child number 0 Plan hash value: 1497856380 ------------------------------------------------------------------------------------------------------------------------------------ | Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers | ------------------------------------------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 1 | | | 7 (100)| | 107 |00:00:00.01 | 14 | |* 1 | CONNECT BY WITHOUT FILTERING| | 1 | | | | | 107 |00:00:00.01 | 14 | |* 2 | TABLE ACCESS FULL | EMPLOYEES | 1 | 1 | 61 | 3 (0)| 00:00:01 | 1 |00:00:00.01 | 7 | | 3 | TABLE ACCESS FULL | EMPLOYEES | 1 | 107 | 6527 | 3 (0)| 00:00:01 | 107 |00:00:00.01 | 7 | ------------------------------------------------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 1 - access("EMP"."MANAGER_ID"=PRIOR NULL) 2 - filter("EMP"."MANAGER_ID" IS NULL)
Here is the execution plan for the traditional query with the addition of the CONNECT_BY_FILTERING hint. The execution plan is now very similar to that in the case of the recursive subquery factoring example. Once again there is a TABLE ACCESS FULL (step Id 2) to retrieve the starting data set and a NESTED LOOPS operation (step Id 3) to retrieve additional rows. The number of buffer gets (22) is almost identical to that of the recursive subquery factoring case (23).
SQL_ID 24wb2xqwkrm4x, child number 0 Plan hash value: 2829436659 ------------------------------------------------------------------------------------------------------------------------------------------ | Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers | ------------------------------------------------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 1 | | | 8 (100)| | 107 |00:00:00.01 | 22 | |* 1 | CONNECT BY WITH FILTERING | | 1 | | | | | 107 |00:00:00.01 | 22 | |* 2 | TABLE ACCESS FULL | EMPLOYEES | 1 | 1 | 61 | 3 (0)| 00:00:01 | 1 |00:00:00.01 | 7 | | 3 | NESTED LOOPS | | 4 | 6 | 444 | 4 (0)| 00:00:01 | 106 |00:00:00.01 | 15 | | 4 | CONNECT BY PUMP | | 4 | | | | | 107 |00:00:00.01 | 0 | | 5 | TABLE ACCESS BY INDEX ROWID| EMPLOYEES | 107 | 6 | 366 | 1 (0)| 00:00:01 | 106 |00:00:00.01 | 15 | |* 6 | INDEX RANGE SCAN | EMP_MANAGER_IX | 107 | 6 | | 0 (0)| | 106 |00:00:00.01 | 5 | ------------------------------------------------------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 1 - access("EMP"."MANAGER_ID"=PRIOR NULL) 2 - filter("EMP"."MANAGER_ID" IS NULL) 6 - access("EMP"."MANAGER_ID"="connect$_by$_pump$_002"."prior emp.employee_id ")
Dan Tow pointed out some confusing aspects of CONNECT BY syntax to me and therefore I recommend recursive subquery factoring for general use because of its clarity. Consider the following two queries; at first glance they appear to be functionally identical but their results prove otherwise. The first uses INNER JOIN while the second does not. Both include the condition “dep.department_name = ‘Sales’.”
-- INNER JOIN version select rpad(' ', 2 * (level - 1)) || first_name || ' ' || last_name as name, emp.hire_date, dep.department_name from employees emp inner join departments dep on (emp.department_id = dep.department_id and dep.department_name = 'Sales') -- No START WITH clause is required in this example connect by emp.manager_id = prior emp.employee_id order siblings by hire_date; -- Traditional version select rpad(' ', 2 * (level - 1)) || first_name || ' ' || last_name as name, emp.hire_date, dep.department_name from employees emp, departments dep where emp.department_id = dep.department_id and dep.department_name = 'Sales' -- No START WITH clause is required in this example connect by emp.manager_id = prior emp.employee_id order siblings by hire_date;
Here are the execution plan and results of the INNER JOIN version. 63 rows are produced. The condition “dep.department_name = ‘Sales’” is checked at step Id 4.
SQL_ID 0gpmzjr5zcybv, child number 0 Plan hash value: 849430486 --------------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers | --------------------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | | 4 (100)| | 63 |00:00:00.01 | 117 | |* 1 | CONNECT BY WITHOUT FILTERING | | 1 | | | | | 63 |00:00:00.01 | 117 | | 2 | NESTED LOOPS | | 1 | 10 | 1040 | 3 (0)| 00:00:01 | 34 |00:00:00.01 | 117 | | 3 | TABLE ACCESS FULL | EMPLOYEES | 1 | 107 | 7918 | 3 (0)| 00:00:01 | 107 |00:00:00.01 | 7 | |* 4 | TABLE ACCESS BY INDEX ROWID| DEPARTMENTS | 107 | 1 | 30 | 0 (0)| | 34 |00:00:00.01 | 110 | |* 5 | INDEX UNIQUE SCAN | DEPT_ID_PK | 107 | 1 | | 0 (0)| | 106 |00:00:00.01 | 4 | --------------------------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - access("EMP"."MANAGER_ID"=PRIOR NULL) 4 - filter("DEP"."DEPARTMENT_NAME"='Sales') 5 - access("EMP"."DEPARTMENT_ID"="DEP"."DEPARTMENT_ID") NAME HIRE_DATE DEPARTMENT_NAME ------------------------------ --------- ------------------------------ John Russell 01-OCT-96 Sales Peter Tucker 30-JAN-97 Sales David Bernstein 24-MAR-97 Sales Peter Hall 20-AUG-97 Sales Christopher Olsen 30-MAR-98 Sales Nanette Cambrault 09-DEC-98 Sales Oliver Tuvault 23-NOV-99 Sales Karen Partners 05-JAN-97 Sales Janette King 30-JAN-96 Sales Patrick Sully 04-MAR-96 Sales Allan McEwen 01-AUG-96 Sales Lindsey Smith 10-MAR-97 Sales Louise Doran 15-DEC-97 Sales Sarath Sewall 03-NOV-98 Sales Alberto Errazuriz 10-MAR-97 Sales Clara Vishney 11-NOV-97 Sales Danielle Greene 19-MAR-99 Sales Mattea Marvins 24-JAN-00 Sales David Lee 23-FEB-00 Sales Sundar Ande 24-MAR-00 Sales Amit Banda 21-APR-00 Sales Gerald Cambrault 15-OCT-99 Sales Lisa Ozer 11-MAR-97 Sales Tayler Fox 24-JAN-98 Sales Harrison Bloom 23-MAR-98 Sales William Smith 23-FEB-99 Sales Elizabeth Bates 24-MAR-99 Sales Sundita Kumar 21-APR-00 Sales Eleni Zlotkey 29-JAN-00 Sales Ellen Abel 11-MAY-96 Sales Alyssa Hutton 19-MAR-97 Sales Jonathon Taylor 24-MAR-98 Sales Jack Livingston 23-APR-98 Sales Charles Johnson 04-JAN-00 Sales Peter Tucker 30-JAN-97 Sales David Bernstein 24-MAR-97 Sales Peter Hall 20-AUG-97 Sales Christopher Olsen 30-MAR-98 Sales Nanette Cambrault 09-DEC-98 Sales Oliver Tuvault 23-NOV-99 Sales Janette King 30-JAN-96 Sales Patrick Sully 04-MAR-96 Sales Allan McEwen 01-AUG-96 Sales Lindsey Smith 10-MAR-97 Sales Louise Doran 15-DEC-97 Sales Sarath Sewall 03-NOV-98 Sales Clara Vishney 11-NOV-97 Sales Danielle Greene 19-MAR-99 Sales Mattea Marvins 24-JAN-00 Sales David Lee 23-FEB-00 Sales Sundar Ande 24-MAR-00 Sales Amit Banda 21-APR-00 Sales Lisa Ozer 11-MAR-97 Sales Tayler Fox 24-JAN-98 Sales Harrison Bloom 23-MAR-98 Sales William Smith 23-FEB-99 Sales Elizabeth Bates 24-MAR-99 Sales Sundita Kumar 21-APR-00 Sales Ellen Abel 11-MAY-96 Sales Alyssa Hutton 19-MAR-97 Sales Jonathon Taylor 24-MAR-98 Sales Jack Livingston 23-APR-98 Sales Charles Johnson 04-JAN-00 Sales 63 rows selected.
Here are the execution plan and results of the more traditional version. 97 rows are produced instead of 63. The condition “dep.department_name = ‘Sales’” is checked at step Id 1.
SQL_ID 156g27gqtx4qu, child number 0 Plan hash value: 658926182 ---------------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers | ---------------------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | | 4 (100)| | 97 |00:00:00.01 | 117 | |* 1 | FILTER | | 1 | | | | | 97 |00:00:00.01 | 117 | |* 2 | CONNECT BY WITHOUT FILTERING | | 1 | | | | | 312 |00:00:00.05 | 117 | | 3 | NESTED LOOPS | | 1 | 106 | 11024 | 3 (0)| 00:00:01 | 106 |00:00:00.01 | 117 | | 4 | TABLE ACCESS FULL | EMPLOYEES | 1 | 107 | 7918 | 3 (0)| 00:00:01 | 107 |00:00:00.01 | 7 | | 5 | TABLE ACCESS BY INDEX ROWID| DEPARTMENTS | 107 | 1 | 30 | 0 (0)| | 106 |00:00:00.01 | 110 | |* 6 | INDEX UNIQUE SCAN | DEPT_ID_PK | 107 | 1 | | 0 (0)| | 106 |00:00:00.01 | 4 | ---------------------------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("DEP"."DEPARTMENT_NAME"='Sales') 2 - access("EMP"."MANAGER_ID"=PRIOR NULL) 6 - access("EMP"."DEPARTMENT_ID"="DEP"."DEPARTMENT_ID") NAME HIRE_DATE DEPARTMENT_NAME ------------------------------ --------- ------------------------------ John Russell 01-OCT-96 Sales Peter Tucker 30-JAN-97 Sales David Bernstein 24-MAR-97 Sales Peter Hall 20-AUG-97 Sales Christopher Olsen 30-MAR-98 Sales Nanette Cambrault 09-DEC-98 Sales Oliver Tuvault 23-NOV-99 Sales Karen Partners 05-JAN-97 Sales Janette King 30-JAN-96 Sales Patrick Sully 04-MAR-96 Sales Allan McEwen 01-AUG-96 Sales Lindsey Smith 10-MAR-97 Sales Louise Doran 15-DEC-97 Sales Sarath Sewall 03-NOV-98 Sales Alberto Errazuriz 10-MAR-97 Sales Clara Vishney 11-NOV-97 Sales Danielle Greene 19-MAR-99 Sales Mattea Marvins 24-JAN-00 Sales David Lee 23-FEB-00 Sales Sundar Ande 24-MAR-00 Sales Amit Banda 21-APR-00 Sales Gerald Cambrault 15-OCT-99 Sales Lisa Ozer 11-MAR-97 Sales Tayler Fox 24-JAN-98 Sales Harrison Bloom 23-MAR-98 Sales William Smith 23-FEB-99 Sales Elizabeth Bates 24-MAR-99 Sales Sundita Kumar 21-APR-00 Sales Eleni Zlotkey 29-JAN-00 Sales Ellen Abel 11-MAY-96 Sales Alyssa Hutton 19-MAR-97 Sales Jonathon Taylor 24-MAR-98 Sales Jack Livingston 23-APR-98 Sales Charles Johnson 04-JAN-00 Sales Peter Tucker 30-JAN-97 Sales David Bernstein 24-MAR-97 Sales Peter Hall 20-AUG-97 Sales Christopher Olsen 30-MAR-98 Sales Nanette Cambrault 09-DEC-98 Sales Oliver Tuvault 23-NOV-99 Sales Janette King 30-JAN-96 Sales Patrick Sully 04-MAR-96 Sales Allan McEwen 01-AUG-96 Sales Lindsey Smith 10-MAR-97 Sales Louise Doran 15-DEC-97 Sales Sarath Sewall 03-NOV-98 Sales Clara Vishney 11-NOV-97 Sales Danielle Greene 19-MAR-99 Sales Mattea Marvins 24-JAN-00 Sales David Lee 23-FEB-00 Sales Sundar Ande 24-MAR-00 Sales Amit Banda 21-APR-00 Sales Lisa Ozer 11-MAR-97 Sales Tayler Fox 24-JAN-98 Sales Harrison Bloom 23-MAR-98 Sales William Smith 23-FEB-99 Sales Elizabeth Bates 24-MAR-99 Sales Sundita Kumar 21-APR-00 Sales Ellen Abel 11-MAY-96 Sales Alyssa Hutton 19-MAR-97 Sales Jonathon Taylor 24-MAR-98 Sales Jack Livingston 23-APR-98 Sales Charles Johnson 04-JAN-00 Sales John Russell 01-OCT-96 Sales Peter Tucker 30-JAN-97 Sales David Bernstein 24-MAR-97 Sales Peter Hall 20-AUG-97 Sales Christopher Olsen 30-MAR-98 Sales Nanette Cambrault 09-DEC-98 Sales Oliver Tuvault 23-NOV-99 Sales Karen Partners 05-JAN-97 Sales Janette King 30-JAN-96 Sales Patrick Sully 04-MAR-96 Sales Allan McEwen 01-AUG-96 Sales Lindsey Smith 10-MAR-97 Sales Louise Doran 15-DEC-97 Sales Sarath Sewall 03-NOV-98 Sales Alberto Errazuriz 10-MAR-97 Sales Clara Vishney 11-NOV-97 Sales Danielle Greene 19-MAR-99 Sales Mattea Marvins 24-JAN-00 Sales David Lee 23-FEB-00 Sales Sundar Ande 24-MAR-00 Sales Amit Banda 21-APR-00 Sales Gerald Cambrault 15-OCT-99 Sales Lisa Ozer 11-MAR-97 Sales Tayler Fox 24-JAN-98 Sales Harrison Bloom 23-MAR-98 Sales William Smith 23-FEB-99 Sales Elizabeth Bates 24-MAR-99 Sales Sundita Kumar 21-APR-00 Sales Eleni Zlotkey 29-JAN-00 Sales Ellen Abel 11-MAY-96 Sales Alyssa Hutton 19-MAR-97 Sales Jonathon Taylor 24-MAR-98 Sales Jack Livingston 23-APR-98 Sales Charles Johnson 04-JAN-00 Sales 97 rows selected.
This is not a bug. According to the SQL Language Reference:
If the query contains a WHERE clause without a join, then Oracle eliminates all rows from the hierarchy that do not satisfy the condition of the WHERE clause. Oracle evaluates this condition for each row individually, rather than removing all the children of a row that does not satisfy the condition.
To make things clearer, here is the recursive subquery factoring version of the above query. Notice that the condition “department_name = ‘Sales’” has been placed outside the recursive subquery.
with rcte(lvl, employee_id, first_name, last_name, hire_date, department_name) as ( select 1 as lvl, emp.employee_id, emp.first_name, emp.last_name, emp.hire_date, dep.department_name from employees emp, departments dep where emp.department_id = dep.department_id union all select rcte.lvl + 1 as lvl, emp.employee_id, emp.first_name, emp.last_name, emp.hire_date, dep.department_name from rcte, employees emp, departments dep where rcte.employee_id = emp.manager_id and emp.department_id = dep.department_id ) search depth first by hire_date set rank select rpad(' ', 2 * (lvl - 1)) || first_name || ' ' || last_name as name, hire_date, department_name from rcte where department_name = 'Sales';
All the above examples were produced using Oracle Database 11g Release 2 and the Developer Days Virtual Machine.
Recent Comments