I Heart Recursive Subquery Factoring (Recursive Common Table Expressions)
I enjoy attending RMOUG Training Days because it is an educational workshop, not a gigantic trade show like Oracle OpenWorld. The presentations are not “customer success stories” or sales pitches; they are technical presentations on technical topics that matter to database administrators and developers. Here is a picture taken at the speaker reception at Training Days 2010 sent to me by Dan Hotka who has a quarter of a century of experience with Oracle Database. Dan is now inventing the future of Oracle training with Oracle Training at Your Desk; I highly recommend it.
One of my presentations at Training Days 2010 was on recursive common table expressions in Oracle 11gR2. Oracle was late to the table with recursive common table expressions which have been part of the SQL standard since 2003 but, to Oracle’s credit, it has provided the CONNECT BY clause for hierarchical queries from the very beginning. However, recursive common table expressions can be used for much more than hierarchical queries. Also note that Oracle uses the non-standard terms “subquery factoring” and “recursive subquery factoring” for “common table expressions” and “recursive common table expressions” respectively.
Here is a quick recap of common table expressions of the non-recursive kind. They are an alternative to inline views and make a complex SQL query much easier to read and maintain. Another advantage is that they only need to be evaluated once if they are used more than once within the query. My favorite example of using common table expressions to make complex SQL statements readable is the winning solution to the First International NoCOUG SQL Challenge by Alberto Dell’Era from Italy. A simpler example—Suppliers Who Supply All Parts—is in my book.
On to recursive common table expressions. A recursive common table expression (recursive CTE) contains subqueries called “anchor members” and “recursive members.” The rows produced by the anchor members are processed by the recursive members. The recursive members produce other rows that are fed right back to them for further processing. Recursion stops only when the recursive members fail to produce additional rows. The explanation in the Oracle documentation is fairly cryptic but a good explanation can be found on the Microsoft Developer Network.
- Split the CTE expression into anchor and recursive members.
- Run the anchor member(s) creating the first invocation or base result set (T0).
- Run the recursive member(s) with Ti as an input and Ti+1 as an output.
- Repeat step 3 until an empty set is returned.
- Return the result set. This is a UNION ALL of T0 to Tn.
Here’s a simple example of a recursive common table expression: a number generator. The following example generates the consecutive numbers from 1 to 9. The anchor member generates the first row which is then processed by the recursive member. The recursive member uses the name of the recursive CTE as a placeholder for the output of the anchor member or a previous execution of the recursive CTE. In this example, each execution of the recursive member produces one more row. Recursion stops when nine records have been produced.
WITH -- The following number generator is a simple example of a recursive CTE. It -- produces the consecutive digits from 1 to 9. Numbers(n) AS ( -- The "anchor member." It contains exactly one row (N = 1). SELECT 1 AS N FROM dual UNION ALL -- The "recursive member." Notice that it references the name of the recursive -- CTE as a placeholder for the results of the anchor member or the previous -- execution of the recursive CTE. Each iteration of the recursive member -- produces the next value of N. Recursive execution stops when N = 9. SELECT N + 1 AS N FROM Numbers WHERE N < 9 ) SELECT * FROM Numbers;
The above example can be simply duplicated using the CONNECT BY clause as follows:
SELECT level AS N FROM dual CONNECT BY level <= 9;
Next consider a standard hierarchical query; an org-chart of managers and employees. First, here’s the old solution using the CONNECT BY clause; it is short and sweet.
SELECT LPAD (' ', 4 * (LEVEL - 1)) || first_name || ' ' || last_name AS name FROM employees START WITH manager_id IS NULL CONNECT BY manager_id = PRIOR employee_id;
The solution using recursive common table expressions is much more verbose. Note especially the SEARCH DEPTH FIRST clause; refer to the Oracle documentation for an explanation.
WITH RecursiveCTE (employee_id, first_name, last_name, lvl) AS ( -- The "anchor member" of the recursive CTE. It locates employees who don't -- have any manager; presumably there is at least one such employee. SELECT employee_id, first_name, last_name, 1 AS lvl FROM employees WHERE manager_id IS NULL UNION ALL -- The "recursive member" of the recursive CTE. Notice that it uses the name -- of the recursive CTE as a placeholder for the results of the anchor member -- or the previous execution of the recursive CTE. Each iteration of the -- recursive member locates the employees who report to the employees located -- in the previous iteration. Recursive execution stops when all employees -- have been located. SELECT e.employee_id, e.first_name, e.last_name, lvl + 1 AS lvl FROM RecursiveCTE r INNER JOIN employees e ON (r.employee_id = e.manager_id) ) -- Go deep in order to produce records in exactly the same order as the CONNECT -- BY clause. The default order of processing is BREADTH FIRST which would -- produce all managers at the same level before any of their employees; this is -- not not the order in which the CONNECT BY produces rows. The pseudocolumn -- seq# has been designated here to capture the order in which records are -- produced by the recursive CTE; it will be used in the main query. SEARCH DEPTH FIRST BY employee_id ASC SET seq# -- This is the main query. It processes the results produced by the recursive -- CTE. SELECT LPAD (' ', 4 * (lvl - 1)) || first_name || ' ' || last_name AS name FROM RecursiveCTE ORDER BY seq#;
From the two examples above, it might appear that a recursive CTE is little more than a verbose way of specifying what could be more succinctly achieved with the CONNECT BY clause. However recursive common table expressions are more powerful than the CONNECT BY clause. For example, consider the following example from the TSQL Challenges team:
Given a list of products and a list of discount coupons, we need to use the following rules to find the minimum price for each product:
- Not more than two coupons can be applied to the same product.
- The discounted price cannot be less than 70% of the original price.
- The total amount of the discount cannot exceed $30.
Syed Mehroz Alam has provided an elegant solution to the above problem using recursive common table expressions.
My final exhibit is a Sudoku puzzle. It turns out that a Sudoku puzzle can be elegantly solved with recursive common table expressions. The solution was discovered by Anton Scheffer. The listing below is a heavily annotated version of Anton Scheffer’s solution with some cosmetic changes for better understandability.
-- The following SQL statement solves a Sudoku puzzle that is provided in the -- form of a one-dimensional array of 81 digits. Note that a Sudoku puzzle is -- really a 9x9 square grid. Here is how the positions in the one-dimensional -- array correspond to positions in the 9x9 grid. -- +---------+---------+---------+ -- | 1 2 3| 4 5 6| 7 8 9| -- | 10 11 12| 13 14 15| 16 17 18| -- | 19 20 21| 22 23 24| 25 26 27| -- +---------+---------+---------+ -- | 28 29 30| 31 32 33| 34 35 36| -- | 37 38 39| 40 41 42| 43 44 45| -- | 46 47 48| 49 50 51| 52 53 54| -- +---------+---------+---------+ -- | 55 56 57| 58 59 60| 61 62 63| -- | 64 65 66| 67 68 69| 70 71 72| -- | 73 74 75| 76 77 78| 79 80 81| -- +---------+---------+---------+ -- A recursive common table expression (CTE) is used to solve the puzzle. The -- "anchor member" of the recursive CTE contains the unsolved Sudoku puzzle. The -- "recursive member" generates partially solved Sudokus. Each iteration of the -- recursive member completes one blank cell. Recursive execution stops when no -- more blank cells are left or if no value can be found for a blank cell -- (meaning that the Sudoku has no solution). All solutions are produced if the -- puzzle has multiple solutions. -- Consider the following Sudoku puzzle: (All but the last 9 cells are filled.) -- 534678912672195348198342567859761423426853791713924856961537284287419635 -- Here is the output produced for the above puzzle: (The output has been -- condensed in order to accomodate it within the available space.) -- Partially Solved Sudoku -- ---------------------------------------------------------------------------- -- 534678912672195348198342567859 ... 53791713924856961537284287419635 -- 534678912672195348198342567859 ... 537917139248569615372842874196353 -- 534678912672195348198342567859 ... 5379171392485696153728428741963534 -- 534678912672195348198342567859 ... 53791713924856961537284287419635345 -- 534678912672195348198342567859 ... 537917139248569615372842874196353452 -- 534678912672195348198342567859 ... 5379171392485696153728428741963534528 -- 534678912672195348198342567859 ... 53791713924856961537284287419635345286 -- 534678912672195348198342567859 ... 537917139248569615372842874196353452861 -- 534678912672195348198342567859 ... 5379171392485696153728428741963534528617 -- 534678912672195348198342567859 ... 53791713924856961537284287419635345286179 SET sqlblanklines on SET linesize 132 SET pagesize 66 WITH -- The following number generator is itself an example of a recursive CTE. It -- produces the consecutive digits from 1 to 9. Numbers(n) AS ( -- The "anchor member." It contains exactly one row (N = 1). SELECT 1 AS N FROM dual UNION ALL -- The "recursive member." Each iteration of the recursive member produces the -- next value of N. Recursive execution stops when N = 9. SELECT N + 1 AS N FROM Numbers WHERE N < 9 ), RecursiveCTE(PartiallySolvedSudoku, BlankCell) AS ( -- The "anchor member" of the recursive CTE. It contains exactly one row: the -- unsolved Sudoku puzzle. SELECT cast(rpad('&&SudokuPuzzle', 81) AS VARCHAR2(81)) AS SudokuPuzzle, instr(rpad('&&SudokuPuzzle', 81), ' ', 1) AS FirstBlankCell FROM dual UNION ALL -- The "recursive member" of the recursive CTE. It generates intermediate -- solutions by providing values for the first blank cell in the intermediate -- solutions produced by the previous iteration. Recursive execution stops -- when no more blank cells are left or if none of the intermediate solutions -- generated by the previous iteration can be improved (meaning that the -- Sudoku puzzle has no solution). All solutions are generated if the puzzle -- has multiple solutions. SELECT -- Construct an intermediate solution by completing one blank cell. cast( substr(RecursiveCTE.PartiallySolvedSudoku, 1, BlankCell - 1) || to_char(Candidates.N) || substr(RecursiveCTE.PartiallySolvedSudoku, BlankCell + 1) AS VARCHAR2(81) ) AS PartiallySolvedSudoku, -- Locate the next blank cell, if any. Note that the result of instr is 0 if -- the string we are searching does not contain what we are looking for. instr( RecursiveCTE.PartiallySolvedSudoku, ' ', RecursiveCTE.BlankCell + 1 ) AS NextBlankCell FROM -- The intermediate solutions from the previous iteration. RecursiveCTE, -- Consider all 9 candidate values from the Numbers table. Numbers Candidates WHERE NOT EXISTS -- Filter out candidate values that have already been used in the same row, -- the same column, or the same 3x3 grid as the blank cell. Note that a Sudoku -- puzzle is really a 9x9 grid but we are using a one-dimensional array of 81 -- cells instead. Recursive execution will stop if none of the intermediate -- solutions generated by the previous iteration can be improved. -- The position within the one-dimensional array of the first cell in the same -- row of the 9x9 grid as the blank cell is trunc((BlankCell-1)/9)*9 + 1. The -- position of the Nth cell is obtained by adding N-1. For example, if -- BlankCell = 41, then the positions of the cells in the same row are 37, -- 37+1, 37+2, 37+3, 37+4, 37+5, 37+6, 37+7, and 37+8. -- The position of the first cell in the same column of the 9x9 grid as the -- blank cell is mod(BlankCell-1,9) + 1. The position of the Nth cell is -- obtained by adding 9*(N-1). For example, if BlankCell = 41, then the -- positions of the cells in the same column are 5, 5+9, 5+18, 5+27, 5+36, -- 5+45, 5+54, 5+63, and 5+72. -- The position of the first cell in the same 3x3 grid as the blank cell is -- trunc((BlankCell-1)/27)*27 + mod(trunc((BlankCell-1)/3),3)*3 + 1. The -- position of the Nth cell in the same 3x3 grid is obtained by adding (N-1) + -- trunc((N-1)/3)*6. For example, if BlankCell = 41, then the positions of the -- cells of in the same 3x3 grid are 31, 31+1, 31+2, 31+9, 31+10, 31+11, -- 31+18, 31+19, and 31+20. ( SELECT N FROM Numbers WHERE to_char(Candidates.N) IN ( -- Check the contents of the row containing the blank cell substr ( RecursiveCTE.PartiallySolvedSudoku, trunc((BlankCell-1)/9)*9 + 1 + (N-1), 1 ), -- Check the contents of the column containing the blank cell substr ( RecursiveCTE.PartiallySolvedSudoku, mod(BlankCell-1,9) + 1 + 9*(N-1), 1 ), -- Check the contents of the 3x3 grid containing the blank cell substr ( RecursiveCTE.PartiallySolvedSudoku, trunc((BlankCell-1)/27)*27 + mod(trunc((BlankCell-1)/3),3)*3 + 1 + (N-1) + trunc((N-1)/3)*6, 1 ) ) ) -- Stop processing when no more blank cells are left. AND BlankCell > 0 ) SELECT PartiallySolvedSudoku "Partially Solved Sudoku" FROM RecursiveCTE;
See Results of the Second International NoCOUG SQL Challenge and Results of the Third International NoCOUG SQL Challenge for other examples of the use of the recursive subquery factoring technique.
P.S. The recursive examples in this post will only work in Oracle Database 11g Release 2. Anton Scheffer has a solution for Sudoku using the MODEL clause and a solution for Sudoku using collections which work with Oracle Database 10g Release 2.