Home > Oracle, SQL > I Heart Recursive Subquery Factoring (Recursive Common Table Expressions)

I Heart Recursive Subquery Factoring (Recursive Common Table Expressions)


Also see Performance comparison between CONNECT BY and recursive subquery factoring

I enjoy attending RMOUG Training Days because it is an educational workshop, not a gigantic trade show like Oracle OpenWorld. The presentations are not “customer success stories” or sales pitches; they are technical presentations on technical topics that matter to database administrators and developers. Here is a picture taken at the speaker reception at Training Days 2010 sent to me by Dan Hotka who has a quarter of a century of experience with Oracle Database. Dan is now inventing the future of Oracle training with Oracle Training at Your Desk; I highly recommend it.

Seated from left to right: Guy Harrison, Steven Feuerstein, John Beresniewicz, Maria Colgan, and Daniel Liu. Standing from left to right: Iggy Fernandez (me), Chen Shapira, and Sumit Sengupta.

One of my presentations at Training Days 2010 was on recursive common table expressions in Oracle 11gR2. Oracle was late to the table with recursive common table expressions which have been part of the SQL standard since 2003 but, to Oracle’s credit, it has provided the CONNECT BY clause for hierarchical queries from the very beginning. However, recursive common table expressions can be used for much more than hierarchical queries. Also note that Oracle uses the non-standard terms “subquery factoring” and “recursive subquery factoring” for “common table expressions” and “recursive common table expressions” respectively.

Here is a quick recap of common table expressions of the non-recursive kind. They are an alternative to inline views and make a complex SQL query much easier to read and maintain. Another advantage is that they only need to be evaluated once if they are used more than once within the query. My favorite example of using common table expressions to make complex SQL statements readable is the winning solution to the First International NoCOUG SQL Challenge by Alberto Dell’Era from Italy. A simpler example—Suppliers Who Supply All Parts—is in my book.

On to recursive common table expressions. A recursive common table expression (recursive CTE) contains subqueries called “anchor members” and “recursive members.” The rows produced by the anchor members are processed by the recursive members. The recursive members produce other rows that are fed right back to them for further processing. Recursion stops only when the recursive members fail to produce additional rows. The explanation in the Oracle documentation is fairly cryptic but a good explanation can be found on the Microsoft Developer Network.

 

  • Split the CTE expression into anchor and recursive members.
  • Run the anchor member(s) creating the first invocation or base result set (T0).
  • Run the recursive member(s) with Ti as an input and Ti+1 as an output.
  • Repeat step 3 until an empty set is returned.
  • Return the result set. This is a UNION ALL of T0 to Tn.

 

Here’s a simple example of a recursive common table expression: a number generator. The following example generates the consecutive numbers from 1 to 9. The anchor member generates the first row which is then processed by the recursive member. The recursive member uses the name of the recursive CTE as a placeholder for the output of the anchor member or a previous execution of the recursive CTE. In this example, each execution of the recursive member produces one more row. Recursion stops when nine records have been produced.

WITH

-- The following number generator is a simple example of a recursive CTE. It
-- produces the consecutive digits from 1 to 9.

Numbers(n) AS
(

  -- The "anchor member." It contains exactly one row (N = 1).

  SELECT 1 AS N
  FROM dual

  UNION ALL

  -- The "recursive member." Notice that it references the name of the recursive
  -- CTE as a placeholder for the results of the anchor member or the previous
  -- execution of the recursive CTE. Each iteration of the recursive member
  -- produces the next value of N. Recursive execution stops when N = 9.

  SELECT N + 1 AS N
  FROM Numbers
  WHERE N < 9

)

SELECT *
FROM Numbers;

The above example can be simply duplicated using the CONNECT BY clause as follows:

SELECT level AS N
FROM dual
CONNECT BY level <= 9;

Next consider a standard hierarchical query; an org-chart of managers and employees. First, here’s the old solution using the CONNECT BY clause; it is short and sweet.

SELECT
  LPAD (' ', 4 * (LEVEL - 1)) || first_name || ' ' || last_name AS name
FROM employees
START WITH manager_id IS NULL
CONNECT BY manager_id = PRIOR employee_id;

The solution using recursive common table expressions is much more verbose. Note especially the SEARCH DEPTH FIRST clause; refer to the Oracle documentation for an explanation.

WITH

RecursiveCTE (employee_id, first_name, last_name, lvl) AS
(

  -- The "anchor member" of the recursive CTE. It locates employees who don't
  -- have any manager; presumably there is at least one such employee.

  SELECT
    employee_id,
    first_name,
    last_name,
    1 AS lvl
  FROM
    employees
  WHERE manager_id IS NULL

  UNION ALL

  -- The "recursive member" of the recursive CTE. Notice that it uses the name
  -- of the recursive CTE as a placeholder for the results of the anchor member
  -- or the previous execution of the recursive CTE. Each iteration of the
  -- recursive member locates the employees who report to the employees located
  -- in the previous iteration. Recursive execution stops when all employees
  -- have been located.

  SELECT
    e.employee_id,
    e.first_name,
    e.last_name,
    lvl + 1 AS lvl
  FROM
    RecursiveCTE r INNER JOIN employees e
    ON (r.employee_id = e.manager_id)

)

-- Go deep in order to produce records in exactly the same order as the CONNECT
-- BY clause. The default order of processing is BREADTH FIRST which would
-- produce all managers at the same level before any of their employees; this is
-- not not the order in which the CONNECT BY produces rows. The pseudocolumn
-- seq# has been designated here to capture the order in which records are
-- produced by the recursive CTE; it will be used in the main query.

SEARCH DEPTH FIRST BY employee_id ASC SET seq#

-- This is the main query. It processes the results produced by the recursive
-- CTE.

SELECT LPAD (' ', 4 * (lvl - 1)) || first_name || ' ' || last_name AS name
FROM RecursiveCTE
ORDER BY seq#;

From the two examples above, it might appear that a recursive CTE is little more than a verbose way of specifying what could be more succinctly achieved with the CONNECT BY clause. However recursive common table expressions are more powerful than the CONNECT BY clause. For example, consider the following example from the TSQL Challenges team:

Given a list of products and a list of discount coupons, we need to use the following rules to find the minimum price for each product:

  1. Not more than two coupons can be applied to the same product.
  2. The discounted price cannot be less than 70% of the original price.
  3. The total amount of the discount cannot exceed $30.

Syed Mehroz Alam has provided an elegant solution to the above problem using recursive common table expressions.

My final exhibit is a Sudoku puzzle. It turns out that a Sudoku puzzle can be elegantly solved with recursive common table expressions. The solution was discovered by Anton Scheffer. The listing below is a heavily annotated version of Anton Scheffer’s solution with some cosmetic changes for better understandability.

-- The following SQL statement solves a Sudoku puzzle that is provided in the
-- form of a one-dimensional array of 81 digits. Note that a Sudoku puzzle is
-- really a 9x9 square grid. Here is how the positions in the one-dimensional
-- array correspond to positions in the 9x9 grid.

-- +---------+---------+---------+
-- |  1  2  3|  4  5  6|  7  8  9|
-- | 10 11 12| 13 14 15| 16 17 18|
-- | 19 20 21| 22 23 24| 25 26 27|
-- +---------+---------+---------+
-- | 28 29 30| 31 32 33| 34 35 36|
-- | 37 38 39| 40 41 42| 43 44 45|
-- | 46 47 48| 49 50 51| 52 53 54|
-- +---------+---------+---------+
-- | 55 56 57| 58 59 60| 61 62 63|
-- | 64 65 66| 67 68 69| 70 71 72|
-- | 73 74 75| 76 77 78| 79 80 81|
-- +---------+---------+---------+

-- A recursive common table expression (CTE) is used to solve the puzzle. The
-- "anchor member" of the recursive CTE contains the unsolved Sudoku puzzle. The
-- "recursive member" generates partially solved Sudokus. Each iteration of the
-- recursive member completes one blank cell. Recursive execution stops when no
-- more blank cells are left or if no value can be found for a blank cell
-- (meaning that the Sudoku has no solution). All solutions are produced if the
-- puzzle has multiple solutions.

-- Consider the following Sudoku puzzle: (All but the last 9 cells are filled.)

-- 534678912672195348198342567859761423426853791713924856961537284287419635

-- Here is the output produced for the above puzzle: (The output has been
-- condensed in order to accomodate it within the available space.)

-- Partially Solved Sudoku
-- ----------------------------------------------------------------------------
-- 534678912672195348198342567859 ... 53791713924856961537284287419635
-- 534678912672195348198342567859 ... 537917139248569615372842874196353
-- 534678912672195348198342567859 ... 5379171392485696153728428741963534
-- 534678912672195348198342567859 ... 53791713924856961537284287419635345
-- 534678912672195348198342567859 ... 537917139248569615372842874196353452
-- 534678912672195348198342567859 ... 5379171392485696153728428741963534528
-- 534678912672195348198342567859 ... 53791713924856961537284287419635345286
-- 534678912672195348198342567859 ... 537917139248569615372842874196353452861
-- 534678912672195348198342567859 ... 5379171392485696153728428741963534528617
-- 534678912672195348198342567859 ... 53791713924856961537284287419635345286179

SET sqlblanklines on
SET linesize 132
SET pagesize 66

WITH

-- The following number generator is itself an example of a recursive CTE. It
-- produces the consecutive digits from 1 to 9.

Numbers(n) AS
(

  -- The "anchor member." It contains exactly one row (N = 1).

  SELECT 1 AS N
  FROM dual

  UNION ALL

  -- The "recursive member." Each iteration of the recursive member produces the
  -- next value of N. Recursive execution stops when N = 9.

  SELECT N + 1 AS N
  FROM Numbers
  WHERE N < 9

),

RecursiveCTE(PartiallySolvedSudoku, BlankCell) AS
(
  -- The "anchor member" of the recursive CTE. It contains exactly one row: the
  -- unsolved Sudoku puzzle.

  SELECT
    cast(rpad('&&SudokuPuzzle', 81) AS VARCHAR2(81)) AS SudokuPuzzle,
    instr(rpad('&&SudokuPuzzle', 81), ' ', 1) AS FirstBlankCell
  FROM dual

  UNION ALL

  -- The "recursive member" of the recursive CTE. It generates intermediate
  -- solutions by providing values for the first blank cell in the intermediate
  -- solutions produced by the previous iteration. Recursive execution stops
  -- when no more blank cells are left or if none of the intermediate solutions
  -- generated by the previous iteration can be improved (meaning that the
  -- Sudoku puzzle has no solution). All solutions are generated if the puzzle
  -- has multiple solutions.

  SELECT

    -- Construct an intermediate solution by completing one blank cell.

    cast(
      substr(RecursiveCTE.PartiallySolvedSudoku, 1, BlankCell - 1)
      || to_char(Candidates.N)
      || substr(RecursiveCTE.PartiallySolvedSudoku, BlankCell + 1)
      AS VARCHAR2(81)
    ) AS PartiallySolvedSudoku,

    -- Locate the next blank cell, if any. Note that the result of instr is 0 if
    -- the string we are searching does not contain what we are looking for.

    instr(
      RecursiveCTE.PartiallySolvedSudoku,
      ' ',
      RecursiveCTE.BlankCell + 1
    ) AS NextBlankCell

  FROM

    -- The intermediate solutions from the previous iteration.
    RecursiveCTE,

    -- Consider all 9 candidate values from the Numbers table.
    Numbers Candidates

  WHERE NOT EXISTS

  -- Filter out candidate values that have already been used in the same row,
  -- the same column, or the same 3x3 grid as the blank cell. Note that a Sudoku
  -- puzzle is really a 9x9 grid but we are using a one-dimensional array of 81
  -- cells instead. Recursive execution will stop if none of the intermediate
  -- solutions generated by the previous iteration can be improved.

  -- The position within the one-dimensional array of the first cell in the same
  -- row of the 9x9 grid as the blank cell is trunc((BlankCell-1)/9)*9 + 1. The
  -- position of the Nth cell is obtained by adding N-1. For example, if
  -- BlankCell = 41, then the positions of the cells in the same row are 37,
  -- 37+1, 37+2, 37+3, 37+4, 37+5, 37+6, 37+7, and 37+8.

  -- The position of the first cell in the same column of the 9x9 grid as the
  -- blank cell is mod(BlankCell-1,9) + 1. The position of the Nth cell is
  -- obtained by adding 9*(N-1). For example, if BlankCell = 41, then the
  -- positions of the cells in the same column are 5, 5+9, 5+18, 5+27, 5+36,
  -- 5+45, 5+54, 5+63, and 5+72.

  -- The position of the first cell in the same 3x3 grid as the blank cell is
  -- trunc((BlankCell-1)/27)*27 + mod(trunc((BlankCell-1)/3),3)*3 + 1. The
  -- position of the Nth cell in the same 3x3 grid is obtained by adding (N-1) +
  -- trunc((N-1)/3)*6. For example, if BlankCell = 41, then the positions of the
  -- cells of in the same 3x3 grid are 31, 31+1, 31+2, 31+9, 31+10, 31+11,
  -- 31+18, 31+19, and 31+20.

  (
    SELECT N FROM Numbers

    WHERE to_char(Candidates.N) IN
    (
      -- Check the contents of the row containing the blank cell

      substr
      (
        RecursiveCTE.PartiallySolvedSudoku,
        trunc((BlankCell-1)/9)*9 + 1
          + (N-1),
        1
      ),

      -- Check the contents of the column containing the blank cell

      substr
      (
        RecursiveCTE.PartiallySolvedSudoku,
        mod(BlankCell-1,9) + 1
          + 9*(N-1),
        1
      ),

      -- Check the contents of the 3x3 grid containing the blank cell

      substr
      (
        RecursiveCTE.PartiallySolvedSudoku,
        trunc((BlankCell-1)/27)*27 + mod(trunc((BlankCell-1)/3),3)*3 + 1
          + (N-1) + trunc((N-1)/3)*6,
        1
      )
    )
  )

  -- Stop processing when no more blank cells are left.

  AND BlankCell > 0

)

SELECT PartiallySolvedSudoku "Partially Solved Sudoku"
FROM RecursiveCTE;

See Results of the Second International NoCOUG SQL Challenge and Results of the Third International NoCOUG SQL Challenge for other examples of the use of the recursive subquery factoring technique.

P.S. The recursive examples in this post will only work in Oracle Database 11Release 2. Anton Scheffer has a solution for Sudoku using the MODEL clause and a solution for Sudoku using collections which work with Oracle Database 10Release 2.

Performance comparison between CONNECT BY and recursive subquery factoring

Categories: Oracle, SQL
  1. March 1, 2010 at 12:59 am

    SELECT level AS N
    FROM dual
    CONNECT BY level <= 9

    seems faster, I guess CONNECT BY is more mature but Recursive Subquery is way more artistic, for a developer…

    Nice query from Anton, this guy is mega-cool !

  2. Iggy Fernandez
    March 1, 2010 at 10:37 am

    re: performance

    Perhaps we should continue using the CONNECT BY clause whenever it is sufficient for the job and restrict the use of recursive common table expressions to situations where it is really needed. One place where recursive common table expressions come up short is CONNECT_BY_ISLEAF but one of Anton Scheffer’s colleagues has shown how to emulate it using the LEAD analytic function. See http://technology.amis.nl/blog/6533/oracle-11gr2-alternative-for-connect_by_isleaf-function-for-recursive-subquery-factoring-dedicated-to-anton.

    re: Anton Scheffer

    Anton has provided two other solutions for Sudoku: one using the MODEL clause and one using collections. I’ve added the information to the post.

  3. August 9, 2010 at 9:53 pm

    CONNECT_BY_ISLEAF can be done with Recursive Subquery Factoring/Common Table Expressions.

    Here are self contained examples:

    First the STD CONNECT BY syntax

    SQL> with test_data as (
      2     select /*+ inline gather_plan_statistics */
      3             column_value parent, decode(column_value,null,0,5,null,column_value)+1 child
      4     from (
      5        table(
      6           sys.odcinumberlist(null,1,2,3,4,5)
      7        )
      8     )
      9  )
     10  select t.parent, t.child, connect_by_isleaf
     11  from test_data t
     12  connect by prior t.child = t.parent
     13  start with t.parent is null;
    
     PARENT   CHILD CONNECT_BY_ISLEAF
    ------- ------- -----------------
                  1                 0
          1       2                 0
          2       3                 0
          3       4                 0
          4       5                 0
          5                         1
    
    6 rows selected.
    
    
    Next, a recursive subquery
    
    SQL> with test_data as (
      2     select /*+ inline gather_plan_statistics */
      3             column_value parent, decode(column_value,null,0,5,null,column_value)+1 child
      4     from (
      5        table(
      6           sys.odcinumberlist(null,1,2,3,4,5)
      7        )
      8     )
      9  ),
     10  test_recurse (parent,child,isleaf) as (
     11     select t.parent, t.child, decode(t.child,null,1,0) isleaf
     12     from test_data t
     13     where parent is null
     14     union all
     15     select t.parent, nvl(t.child,null) child, decode(t.child,null,1,0) isleaf
     16     from test_data t
     17     join test_recurse tr on tr.child  = t.parent
     18  )
     19  select  t.parent parent, t.child, t.isleaf
     20  from test_recurse t;
    
     PARENT   CHILD     ISLEAF
    ------- ------- ----------
                  1          0
          1       2          0
          2       3          0
          3       4          0
          4       5          0
          5                  1
    
    6 rows selected.
  4. August 9, 2010 at 10:05 pm

    Hmmm… that didn’t turn out too well.
    If I can edit that I will, as it is not formatted well.

  5. August 9, 2010 at 10:12 pm

    I May have spoken too soon – there doesn’t seem to be a way do this with real data min a recursive CTE.

  6. August 9, 2010 at 10:48 pm

    OK, it can be done with recursive CTE, but it isn’t pretty

    I will try embedding this with gist.
    If it doesn’t appear, just head on over to this URL: http://gist.github.com/516760

  7. Iggy Fernandez
    August 10, 2010 at 6:46 am

    Thanks, Jared. The sourcecode tag within square brackets can be used in WordPress comments to preserve formatting of SQL code. I’ve patched up your first commment and it is now readable.

    Here’s the code that you posted on GitHub:

    -- connect_by_isleaf with standard connect by
    
    select lpad(' ',2*(level-1)) || last_name last_name, connect_by_isleaf
    from hr.employees
    start with manager_id is null
    connect by prior employee_id = manager_id;
    
    /*
    
    LAST_NAME            CONNECT_BY_ISLEAF
    -------------------- -----------------
    King                                 0
      Kochhar                            0
        Greenberg                        0
          Faviet                         1
          Chen                           1
          Sciarra                        1
          Urman                          1
          Popp                           1
        Whalen                           1
        Mavris                           1
        Baer                             1
        Higgins                          0
          Gietz                          1
      De Haan                            0
        Hunold                           0
          Ernst                          1
    ...
    
    */
    
    -- Now using recursive subquery factoring
    
    with leaves as (
    	select employee_id
    	from hr.employees
    	where employee_id not in (
    		select manager_id
    		from hr.employees
    		where manager_id is not null
    	)
    ),
    emp(manager_id,employee_id,last_name,lvl,isleaf) as (
    	select e.manager_id, e.employee_id, e.last_name, 1 as lvl, 0 as isleaf
    	from hr.employees e
    	where e.manager_id is null
    	union all
    	select e.manager_id, nvl(e.employee_id,null) employee_id,  e.last_name, emp.lvl + 1 as lvl
    		, decode(l.employee_id,null,0,1) isleaf
    	from hr.employees e
    	join emp on emp.employee_id = e.manager_id
    	left outer join leaves l on l.employee_id = e.employee_id
    	--join hr.employees e on e.manager_id = emp.employee_id
    	--join emp on emp.employee_id = e.manager_id
    )
    search depth first by last_name set order1
    select lpad(' ',2*(lvl-1)) || last_name last_name, isleaf
    from emp
    /
    
    /*
    
    LAST_NAME                ISLEAF
    -------------------- ----------
    King                          0
      Cambrault                   0
        Bates                     1
        Bloom                     1
        Fox                       1
        Kumar                     1
        Ozer                      1
        Smith                     1
      De Haan                     0
        Hunold                    0
          Austin                  1
    ...
    */
    
  8. Jared
    August 10, 2010 at 8:22 am

    Thanks Iggy, I didn’t know about the square brackets.

    Though most (if not all, not sure yet) of the functionality that Oracle provides via connect by can be accomplished with RCTE, I’m with you on this: Unless you have some need to do this with ANSI standards, the CONNECT BY may be much simpler. Using CONNECT BY in a subfactored query is a powerful combination.

  9. Tim Hopkins
    February 14, 2011 at 10:40 am

    Hi Iggy, thanks for article!

    Have you tried RCTEs with DATE data types? It seems to be a whole world of pain. I couldn’t see any reason why it wouldn’t be supported.

    On 11.2.0.1, we get either a compilation error or curiously wrong results.

    — This fails with
    –ORA-01790: expression must have same datatype as corresponding expression
    with test(X) as
    ( select date ‘2007-01-01’
    from dual
    union all
    select (X + 1)
    from test
    where X <= to_date('2007-01-02','YYYY-MM-DD')
    )
    select *
    from test;

    — This executes but returns dates earlier(!)
    — than the 1st of January 2007 back to 31st December 1999(!)
    with test(X) as
    ( select cast(date '2007-01-01' as date)
    from dual
    union all
    select (X + 1)
    from test
    where X <= to_date('2007-01-02','YYYY-MM-DD')
    )
    select *
    from test;

    — This executes but returns only one row.
    with test(X) as
    ( select cast(date '2008-01-01' as date)
    from dual
    union all
    select (X + 1)
    from test
    where X <= to_date('2008-01-10','YYYY-MM-DD')
    )
    select *
    from test;

    — The difference in whether it fails immediately
    — is the encoding type.
    — One is the in-memory type (13) and the other
    — is the storage type (12)
    select dump(date '2011-01-01' ,16)
    , dump(cast(date '2007-01-01' as date) ,16)
    from dual;

  10. Iggy Fernandez
    February 14, 2011 at 9:19 pm

    Hi, Tim,

    Tim Hopkins :

    Have you tried RCTEs with DATE data types? It seems to be a whole world of pain. I couldn’t see any reason why it wouldn’t be supported. On 11.2.0.1, we get either a compilation error or curiously wrong results.

    I’m interested in what Oracle Support had to say. Perhaps you would consider working around the problem in the following way. Suppose that you wanted to generate a contiguous sequence of dates starting with a particular date. You could use something like the following:

    WITH test(n) AS
    (
      SELECT 0
      FROM dual
      UNION ALL
      SELECT n + 1
      FROM test
      WHERE n < &num_days - 1
    )
    SELECT to_date('&start_date','YYYY-MM-DD') + n
    FROM test;
    

    Kindest regards,

    Iggy

  1. No trackbacks yet.

Leave a comment