Home > Oracle, SQL > Part 1: The Hitchhiker’s Guide to SQL: Lateral derived tables and other alternatives to GROUP BY

Part 1: The Hitchhiker’s Guide to SQL: Lateral derived tables and other alternatives to GROUP BY


“In many of the more relaxed civilizations on the Outer Eastern Rim of the Galaxy, the Hitchhiker’s Guide has already supplanted the great Encyclopaedia Galactica as the standard repository of all knowledge and wisdom, for though it has many omissions and contains much that is apocryphal, or at least wildly inaccurate, it scores over the older, more pedestrian work in two important respects. First, it is slightly cheaper; and secondly it has the words DON’T PANIC inscribed in large friendly letters on its cover.”The Hitchhiker’s Guide to the Galaxy by Douglas Adams

The father of relational theory, Dr. Edgar (Ted) Codd posed the following question as an exercise in his book The Relational Model for Database Management: Version 2:

“Consider the language SQL. It contains a feature called GROUP BY. Treat SQL, with this feature dropped as SQLX. Are there any queries expressible in SQL that are not expressible in SQLX? If your answer is yes, supply two examples. If the answer is no, supply a proof.”

DON’T PANIC. The answer is no. Here is a proof by example. Suppose that we have the following query requirement in the HR sample schema: find the sum of employee salaries for each department at location 1700 with more than 5 employees. Four different solutions are shown below, beginning with the traditional GROUP BY solution and followed by window functions, scalar subqueries, and lateral derived tables. Note that event 22829 needs to be set before using lateral derived tables; thanks to Jonathan Lewis for that tip. I conducted my tests using Oracle Database 11g Release 2 and the Developer Days Virtual Machine.

Traditional solution using GROUP BY:

SELECT
  /* GROUP BY */
  d.department_name,
  SUM(e.salary) AS sum
FROM departments d
INNER JOIN employees e
ON (e.department_id = d.department_id)
WHERE d.location_id = 1700
GROUP BY d.department_name
HAVING COUNT(*) > 5
ORDER BY d.department_name;

Solution using window functions:

SELECT DISTINCT
  /* Window functions */
  department_name,
  sum
FROM
  (
    SELECT
      d.department_name,
      SUM(e.salary) OVER (PARTITION BY d.department_name) AS sum,
      COUNT(*) OVER (PARTITION BY d.department_name) AS count
    FROM departments d
    INNER JOIN employees e
    ON (e.department_id = d.department_id)
    WHERE d.location_id = 1700
  )
WHERE COUNT > 5
ORDER BY department_name;

Solution using scalar subqueries:

SELECT
  /* Scalar subqueries */
  d.department_name,
  (
    SELECT SUM(salary)
    FROM employees
    WHERE department_id = d.department_id
  ) AS sum
FROM departments d
WHERE d.location_id = 1700
AND (SELECT COUNT(*) FROM employees WHERE department_id = d.department_id) > 5
ORDER BY d.department_name;

Unsupported solution using lateral derived tables:

ALTER session SET events '22829 trace name context forever';

SELECT
  /* Lateral derived tables */
  d.department_name,
  l.sum
FROM
  departments d,
  LATERAL
  (
    SELECT
      SUM(e.salary) AS sum,
      COUNT(*) AS count
    FROM employees e
    WHERE e.department_id = d.department_id
  ) l
WHERE d.location_id = 1700
AND l.count > 5
ORDER BY department_name;

Addendum: A reader asked what point I am trying to make and if I am suggesting that we stop using the GROUP BY clause. In a perfect world, we could use any of the above variations and it wouldn’t matter because all of them would have the same execution plan for any particular data distribution and hence would always have the same performance. In an imperfect world, we have to choose what works best in our case whether that be GROUP BY or something else. Read Day 4: The Twelve Days of SQL: The way you write your query matters.

Categories: Oracle, SQL
  1. Ravi
    April 15, 2012 at 8:31 pm

    What’s the point? Are you suggesting that we stop using GROUP BY clause?

  2. Iggy Fernandez
    April 16, 2012 at 6:23 am

    Ravi :

    Are you suggesting that we stop using GROUP BY clause?

    In a perfect world, we could use any of the above variations and it wouldn’t matter because all of them would have the same execution plan for any particular data distribution and hence would always have the same performance. In an imperfect world, we have to choose what works best in our case whether that be GROUP BY or something else. Read Day 4: The Twelve Days of SQL: The way you write your query matters.

  3. The Grey Ghost
    April 19, 2012 at 12:52 am

    It’s possible that Ted Codd was highlighting the non-relational nature of the SQL implementation. I guess a projection by a relational query projecting a sub set of table columns would return the equivalent of a SQL select distinct meaning there’s no need for a group by. Fabian Pascall and Chris Date and various others have written endlessly on poor nature of the SQL language, highlighting the endless varieties of ways of writing the same thing and hence the difficulty of writing an efficient optimiser. Don Chamberlain, one of the original authors of SQL pops up regularly in commercialisations of data management science, often falling foul of familiar traits. He’s also partly responsible for the development of the awful xquery FLOWR query syntax that follows a similar linear grammar found in SQL. Old dogs and new tricks and all that.

  4. Iggy Fernandez
    April 19, 2012 at 6:55 am

    Jonathan Gennick who is a very good teacher has an excellent article on XQuery “flowers” at http://www.oracle.com/technetwork/issue-archive/2005/05-sep/o55xquery-097999.html

  1. No trackbacks yet.

Leave a comment