I Heart Recursive Subquery Factoring (Recursive Common Table Expressions)
Follow @Oratweets
Also see Performance comparison between CONNECT BY and recursive subquery factoring
I enjoy attending RMOUG Training Days because it is an educational workshop, not a gigantic trade show like Oracle OpenWorld. The presentations are not “customer success stories” or sales pitches; they are technical presentations on technical topics that matter to database administrators and developers. Here is a picture taken at the speaker reception at Training Days 2010 sent to me by Dan Hotka who has a quarter of a century of experience with Oracle Database. Dan is now inventing the future of Oracle training with Oracle Training at Your Desk; I highly recommend it.
Seated from left to right: Guy Harrison, Steven Feuerstein, John Beresniewicz, Maria Colgan, and Daniel Liu. Standing from left to right: Iggy Fernandez (me), Chen Shapira, and Sumit Sengupta.
One of my presentations at Training Days 2010 was on recursive common table expressions in Oracle 11gR2. Oracle was late to the table with recursive common table expressions which have been part of the SQL standard since 2003 but, to Oracle’s credit, it has provided the CONNECT BY clause for hierarchical queries from the very beginning. However, recursive common table expressions can be used for much more than hierarchical queries. Also note that Oracle uses the non-standard terms “subquery factoring” and “recursive subquery factoring” for “common table expressions” and “recursive common table expressions” respectively.
Here is a quick recap of common table expressions of the non-recursive kind. They are an alternative to inline views and make a complex SQL query much easier to read and maintain. Another advantage is that they only need to be evaluated once if they are used more than once within the query. My favorite example of using common table expressions to make complex SQL statements readable is the winning solution to the First International NoCOUG SQL Challenge by Alberto Dell’Era from Italy. A simpler example—Suppliers Who Supply All Parts—is in my book.
On to recursive common table expressions. A recursive common table expression (recursive CTE) contains subqueries called “anchor members” and “recursive members.” The rows produced by the anchor members are processed by the recursive members. The recursive members produce other rows that are fed right back to them for further processing. Recursion stops only when the recursive members fail to produce additional rows. The explanation in the Oracle documentation is fairly cryptic but a good explanation can be found on the Microsoft Developer Network.
- Split the CTE expression into anchor and recursive members.
- Run the anchor member(s) creating the first invocation or base result set (T0).
- Run the recursive member(s) with Ti as an input and Ti+1 as an output.
- Repeat step 3 until an empty set is returned.
- Return the result set. This is a UNION ALL of T0 to Tn.
Here’s a simple example of a recursive common table expression: a number generator. The following example generates the consecutive numbers from 1 to 9. The anchor member generates the first row which is then processed by the recursive member. The recursive member uses the name of the recursive CTE as a placeholder for the output of the anchor member or a previous execution of the recursive CTE. In this example, each execution of the recursive member produces one more row. Recursion stops when nine records have been produced.
WITH -- The following number generator is a simple example of a recursive CTE. It -- produces the consecutive digits from 1 to 9. Numbers(n) AS ( -- The "anchor member." It contains exactly one row (N = 1). SELECT 1 AS N FROM dual UNION ALL -- The "recursive member." Notice that it references the name of the recursive -- CTE as a placeholder for the results of the anchor member or the previous -- execution of the recursive CTE. Each iteration of the recursive member -- produces the next value of N. Recursive execution stops when N = 9. SELECT N + 1 AS N FROM Numbers WHERE N < 9 ) SELECT * FROM Numbers;
The above example can be simply duplicated using the CONNECT BY clause as follows:
SELECT level AS N FROM dual CONNECT BY level <= 9;
Next consider a standard hierarchical query; an org-chart of managers and employees. First, here’s the old solution using the CONNECT BY clause; it is short and sweet.
SELECT
LPAD (' ', 4 * (LEVEL - 1)) || first_name || ' ' || last_name AS name
FROM employees
START WITH manager_id IS NULL
CONNECT BY manager_id = PRIOR employee_id;
The solution using recursive common table expressions is much more verbose. Note especially the SEARCH DEPTH FIRST clause; refer to the Oracle documentation for an explanation.
WITH
RecursiveCTE (employee_id, first_name, last_name, lvl) AS
(
-- The "anchor member" of the recursive CTE. It locates employees who don't
-- have any manager; presumably there is at least one such employee.
SELECT
employee_id,
first_name,
last_name,
1 AS lvl
FROM
employees
WHERE manager_id IS NULL
UNION ALL
-- The "recursive member" of the recursive CTE. Notice that it uses the name
-- of the recursive CTE as a placeholder for the results of the anchor member
-- or the previous execution of the recursive CTE. Each iteration of the
-- recursive member locates the employees who report to the employees located
-- in the previous iteration. Recursive execution stops when all employees
-- have been located.
SELECT
e.employee_id,
e.first_name,
e.last_name,
lvl + 1 AS lvl
FROM
RecursiveCTE r INNER JOIN employees e
ON (r.employee_id = e.manager_id)
)
-- Go deep in order to produce records in exactly the same order as the CONNECT
-- BY clause. The default order of processing is BREADTH FIRST which would
-- produce all managers at the same level before any of their employees; this is
-- not not the order in which the CONNECT BY produces rows. The pseudocolumn
-- seq# has been designated here to capture the order in which records are
-- produced by the recursive CTE; it will be used in the main query.
SEARCH DEPTH FIRST BY employee_id ASC SET seq#
-- This is the main query. It processes the results produced by the recursive
-- CTE.
SELECT LPAD (' ', 4 * (lvl - 1)) || first_name || ' ' || last_name AS name
FROM RecursiveCTE
ORDER BY seq#;
From the two examples above, it might appear that a recursive CTE is little more than a verbose way of specifying what could be more succinctly achieved with the CONNECT BY clause. However recursive common table expressions are more powerful than the CONNECT BY clause. For example, consider the following example from the TSQL Challenges team:
Given a list of products and a list of discount coupons, we need to use the following rules to find the minimum price for each product:
- Not more than two coupons can be applied to the same product.
- The discounted price cannot be less than 70% of the original price.
- The total amount of the discount cannot exceed $30.
Syed Mehroz Alam has provided an elegant solution to the above problem using recursive common table expressions.
My final exhibit is a Sudoku puzzle. It turns out that a Sudoku puzzle can be elegantly solved with recursive common table expressions. The solution was discovered by Anton Scheffer. The listing below is a heavily annotated version of Anton Scheffer’s solution with some cosmetic changes for better understandability.
-- The following SQL statement solves a Sudoku puzzle that is provided in the
-- form of a one-dimensional array of 81 digits. Note that a Sudoku puzzle is
-- really a 9x9 square grid. Here is how the positions in the one-dimensional
-- array correspond to positions in the 9x9 grid.
-- +---------+---------+---------+
-- | 1 2 3| 4 5 6| 7 8 9|
-- | 10 11 12| 13 14 15| 16 17 18|
-- | 19 20 21| 22 23 24| 25 26 27|
-- +---------+---------+---------+
-- | 28 29 30| 31 32 33| 34 35 36|
-- | 37 38 39| 40 41 42| 43 44 45|
-- | 46 47 48| 49 50 51| 52 53 54|
-- +---------+---------+---------+
-- | 55 56 57| 58 59 60| 61 62 63|
-- | 64 65 66| 67 68 69| 70 71 72|
-- | 73 74 75| 76 77 78| 79 80 81|
-- +---------+---------+---------+
-- A recursive common table expression (CTE) is used to solve the puzzle. The
-- "anchor member" of the recursive CTE contains the unsolved Sudoku puzzle. The
-- "recursive member" generates partially solved Sudokus. Each iteration of the
-- recursive member completes one blank cell. Recursive execution stops when no
-- more blank cells are left or if no value can be found for a blank cell
-- (meaning that the Sudoku has no solution). All solutions are produced if the
-- puzzle has multiple solutions.
-- Consider the following Sudoku puzzle: (All but the last 9 cells are filled.)
-- 534678912672195348198342567859761423426853791713924856961537284287419635
-- Here is the output produced for the above puzzle: (The output has been
-- condensed in order to accomodate it within the available space.)
-- Partially Solved Sudoku
-- ----------------------------------------------------------------------------
-- 534678912672195348198342567859 ... 53791713924856961537284287419635
-- 534678912672195348198342567859 ... 537917139248569615372842874196353
-- 534678912672195348198342567859 ... 5379171392485696153728428741963534
-- 534678912672195348198342567859 ... 53791713924856961537284287419635345
-- 534678912672195348198342567859 ... 537917139248569615372842874196353452
-- 534678912672195348198342567859 ... 5379171392485696153728428741963534528
-- 534678912672195348198342567859 ... 53791713924856961537284287419635345286
-- 534678912672195348198342567859 ... 537917139248569615372842874196353452861
-- 534678912672195348198342567859 ... 5379171392485696153728428741963534528617
-- 534678912672195348198342567859 ... 53791713924856961537284287419635345286179
SET sqlblanklines on
SET linesize 132
SET pagesize 66
WITH
-- The following number generator is itself an example of a recursive CTE. It
-- produces the consecutive digits from 1 to 9.
Numbers(n) AS
(
-- The "anchor member." It contains exactly one row (N = 1).
SELECT 1 AS N
FROM dual
UNION ALL
-- The "recursive member." Each iteration of the recursive member produces the
-- next value of N. Recursive execution stops when N = 9.
SELECT N + 1 AS N
FROM Numbers
WHERE N < 9
),
RecursiveCTE(PartiallySolvedSudoku, BlankCell) AS
(
-- The "anchor member" of the recursive CTE. It contains exactly one row: the
-- unsolved Sudoku puzzle.
SELECT
cast(rpad('&&SudokuPuzzle', 81) AS VARCHAR2(81)) AS SudokuPuzzle,
instr(rpad('&&SudokuPuzzle', 81), ' ', 1) AS FirstBlankCell
FROM dual
UNION ALL
-- The "recursive member" of the recursive CTE. It generates intermediate
-- solutions by providing values for the first blank cell in the intermediate
-- solutions produced by the previous iteration. Recursive execution stops
-- when no more blank cells are left or if none of the intermediate solutions
-- generated by the previous iteration can be improved (meaning that the
-- Sudoku puzzle has no solution). All solutions are generated if the puzzle
-- has multiple solutions.
SELECT
-- Construct an intermediate solution by completing one blank cell.
cast(
substr(RecursiveCTE.PartiallySolvedSudoku, 1, BlankCell - 1)
|| to_char(Candidates.N)
|| substr(RecursiveCTE.PartiallySolvedSudoku, BlankCell + 1)
AS VARCHAR2(81)
) AS PartiallySolvedSudoku,
-- Locate the next blank cell, if any. Note that the result of instr is 0 if
-- the string we are searching does not contain what we are looking for.
instr(
RecursiveCTE.PartiallySolvedSudoku,
' ',
RecursiveCTE.BlankCell + 1
) AS NextBlankCell
FROM
-- The intermediate solutions from the previous iteration.
RecursiveCTE,
-- Consider all 9 candidate values from the Numbers table.
Numbers Candidates
WHERE NOT EXISTS
-- Filter out candidate values that have already been used in the same row,
-- the same column, or the same 3x3 grid as the blank cell. Note that a Sudoku
-- puzzle is really a 9x9 grid but we are using a one-dimensional array of 81
-- cells instead. Recursive execution will stop if none of the intermediate
-- solutions generated by the previous iteration can be improved.
-- The position within the one-dimensional array of the first cell in the same
-- row of the 9x9 grid as the blank cell is trunc((BlankCell-1)/9)*9 + 1. The
-- position of the Nth cell is obtained by adding N-1. For example, if
-- BlankCell = 41, then the positions of the cells in the same row are 37,
-- 37+1, 37+2, 37+3, 37+4, 37+5, 37+6, 37+7, and 37+8.
-- The position of the first cell in the same column of the 9x9 grid as the
-- blank cell is mod(BlankCell-1,9) + 1. The position of the Nth cell is
-- obtained by adding 9*(N-1). For example, if BlankCell = 41, then the
-- positions of the cells in the same column are 5, 5+9, 5+18, 5+27, 5+36,
-- 5+45, 5+54, 5+63, and 5+72.
-- The position of the first cell in the same 3x3 grid as the blank cell is
-- trunc((BlankCell-1)/27)*27 + mod(trunc((BlankCell-1)/3),3)*3 + 1. The
-- position of the Nth cell in the same 3x3 grid is obtained by adding (N-1) +
-- trunc((N-1)/3)*6. For example, if BlankCell = 41, then the positions of the
-- cells of in the same 3x3 grid are 31, 31+1, 31+2, 31+9, 31+10, 31+11,
-- 31+18, 31+19, and 31+20.
(
SELECT N FROM Numbers
WHERE to_char(Candidates.N) IN
(
-- Check the contents of the row containing the blank cell
substr
(
RecursiveCTE.PartiallySolvedSudoku,
trunc((BlankCell-1)/9)*9 + 1
+ (N-1),
1
),
-- Check the contents of the column containing the blank cell
substr
(
RecursiveCTE.PartiallySolvedSudoku,
mod(BlankCell-1,9) + 1
+ 9*(N-1),
1
),
-- Check the contents of the 3x3 grid containing the blank cell
substr
(
RecursiveCTE.PartiallySolvedSudoku,
trunc((BlankCell-1)/27)*27 + mod(trunc((BlankCell-1)/3),3)*3 + 1
+ (N-1) + trunc((N-1)/3)*6,
1
)
)
)
-- Stop processing when no more blank cells are left.
AND BlankCell > 0
)
SELECT PartiallySolvedSudoku "Partially Solved Sudoku"
FROM RecursiveCTE;
See Results of the Second International NoCOUG SQL Challenge and Results of the Third International NoCOUG SQL Challenge for other examples of the use of the recursive subquery factoring technique.
P.S. The recursive examples in this post will only work in Oracle Database 11g Release 2. Anton Scheffer has a solution for Sudoku using the MODEL clause and a solution for Sudoku using collections which work with Oracle Database 10g Release 2.
Performance comparison between CONNECT BY and recursive subquery factoring



SELECT level AS N
FROM dual
CONNECT BY level <= 9
seems faster, I guess CONNECT BY is more mature but Recursive Subquery is way more artistic, for a developer…
Nice query from Anton, this guy is mega-cool !
re: performance
Perhaps we should continue using the CONNECT BY clause whenever it is sufficient for the job and restrict the use of recursive common table expressions to situations where it is really needed. One place where recursive common table expressions come up short is CONNECT_BY_ISLEAF but one of Anton Scheffer’s colleagues has shown how to emulate it using the LEAD analytic function. See http://technology.amis.nl/blog/6533/oracle-11gr2-alternative-for-connect_by_isleaf-function-for-recursive-subquery-factoring-dedicated-to-anton.
re: Anton Scheffer
Anton has provided two other solutions for Sudoku: one using the MODEL clause and one using collections. I’ve added the information to the post.
CONNECT_BY_ISLEAF can be done with Recursive Subquery Factoring/Common Table Expressions.
Here are self contained examples:
First the STD CONNECT BY syntax
SQL> with test_data as ( 2 select /*+ inline gather_plan_statistics */ 3 column_value parent, decode(column_value,null,0,5,null,column_value)+1 child 4 from ( 5 table( 6 sys.odcinumberlist(null,1,2,3,4,5) 7 ) 8 ) 9 ) 10 select t.parent, t.child, connect_by_isleaf 11 from test_data t 12 connect by prior t.child = t.parent 13 start with t.parent is null; PARENT CHILD CONNECT_BY_ISLEAF ------- ------- ----------------- 1 0 1 2 0 2 3 0 3 4 0 4 5 0 5 1 6 rows selected. Next, a recursive subquery SQL> with test_data as ( 2 select /*+ inline gather_plan_statistics */ 3 column_value parent, decode(column_value,null,0,5,null,column_value)+1 child 4 from ( 5 table( 6 sys.odcinumberlist(null,1,2,3,4,5) 7 ) 8 ) 9 ), 10 test_recurse (parent,child,isleaf) as ( 11 select t.parent, t.child, decode(t.child,null,1,0) isleaf 12 from test_data t 13 where parent is null 14 union all 15 select t.parent, nvl(t.child,null) child, decode(t.child,null,1,0) isleaf 16 from test_data t 17 join test_recurse tr on tr.child = t.parent 18 ) 19 select t.parent parent, t.child, t.isleaf 20 from test_recurse t; PARENT CHILD ISLEAF ------- ------- ---------- 1 0 1 2 0 2 3 0 3 4 0 4 5 0 5 1 6 rows selected.Hmmm… that didn’t turn out too well.
If I can edit that I will, as it is not formatted well.
I May have spoken too soon – there doesn’t seem to be a way do this with real data min a recursive CTE.
OK, it can be done with recursive CTE, but it isn’t pretty
I will try embedding this with gist.
If it doesn’t appear, just head on over to this URL: http://gist.github.com/516760
Thanks, Jared. The sourcecode tag within square brackets can be used in WordPress comments to preserve formatting of SQL code. I’ve patched up your first commment and it is now readable.
Here’s the code that you posted on GitHub:
-- connect_by_isleaf with standard connect by select lpad(' ',2*(level-1)) || last_name last_name, connect_by_isleaf from hr.employees start with manager_id is null connect by prior employee_id = manager_id; /* LAST_NAME CONNECT_BY_ISLEAF -------------------- ----------------- King 0 Kochhar 0 Greenberg 0 Faviet 1 Chen 1 Sciarra 1 Urman 1 Popp 1 Whalen 1 Mavris 1 Baer 1 Higgins 0 Gietz 1 De Haan 0 Hunold 0 Ernst 1 ... */ -- Now using recursive subquery factoring with leaves as ( select employee_id from hr.employees where employee_id not in ( select manager_id from hr.employees where manager_id is not null ) ), emp(manager_id,employee_id,last_name,lvl,isleaf) as ( select e.manager_id, e.employee_id, e.last_name, 1 as lvl, 0 as isleaf from hr.employees e where e.manager_id is null union all select e.manager_id, nvl(e.employee_id,null) employee_id, e.last_name, emp.lvl + 1 as lvl , decode(l.employee_id,null,0,1) isleaf from hr.employees e join emp on emp.employee_id = e.manager_id left outer join leaves l on l.employee_id = e.employee_id --join hr.employees e on e.manager_id = emp.employee_id --join emp on emp.employee_id = e.manager_id ) search depth first by last_name set order1 select lpad(' ',2*(lvl-1)) || last_name last_name, isleaf from emp / /* LAST_NAME ISLEAF -------------------- ---------- King 0 Cambrault 0 Bates 1 Bloom 1 Fox 1 Kumar 1 Ozer 1 Smith 1 De Haan 0 Hunold 0 Austin 1 ... */Thanks Iggy, I didn’t know about the square brackets.
Though most (if not all, not sure yet) of the functionality that Oracle provides via connect by can be accomplished with RCTE, I’m with you on this: Unless you have some need to do this with ANSI standards, the CONNECT BY may be much simpler. Using CONNECT BY in a subfactored query is a powerful combination.
Hi Iggy, thanks for article!
Have you tried RCTEs with DATE data types? It seems to be a whole world of pain. I couldn’t see any reason why it wouldn’t be supported.
On 11.2.0.1, we get either a compilation error or curiously wrong results.
– This fails with
–ORA-01790: expression must have same datatype as corresponding expression
with test(X) as
( select date ’2007-01-01′
from dual
union all
select (X + 1)
from test
where X <= to_date('2007-01-02','YYYY-MM-DD')
)
select *
from test;
– This executes but returns dates earlier(!)
– than the 1st of January 2007 back to 31st December 1999(!)
with test(X) as
( select cast(date '2007-01-01' as date)
from dual
union all
select (X + 1)
from test
where X <= to_date('2007-01-02','YYYY-MM-DD')
)
select *
from test;
– This executes but returns only one row.
with test(X) as
( select cast(date '2008-01-01' as date)
from dual
union all
select (X + 1)
from test
where X <= to_date('2008-01-10','YYYY-MM-DD')
)
select *
from test;
– The difference in whether it fails immediately
– is the encoding type.
– One is the in-memory type (13) and the other
– is the storage type (12)
select dump(date '2011-01-01' ,16)
, dump(cast(date '2007-01-01' as date) ,16)
from dual;
Hi, Tim,
I’m interested in what Oracle Support had to say. Perhaps you would consider working around the problem in the following way. Suppose that you wanted to generate a contiguous sequence of dates starting with a particular date. You could use something like the following:
WITH test(n) AS ( SELECT 0 FROM dual UNION ALL SELECT n + 1 FROM test WHERE n < &num_days - 1 ) SELECT to_date('&start_date','YYYY-MM-DD') + n FROM test;Kindest regards,
Iggy