## Results of the First International NoCOUG SQL Challenge

Follow @Oratweets

*The following report was published in the August 2009 issue of the* NoCOUG Journal*. (View **PDF**)*

The *First International NoCOUG SQL Challenge* was a great success; nine solutions were found by participants in seven countries and three continents. Alberto Dell’Era wins the contest for his wonderful solution using Discrete Fourier Transforms; the runner-up is André Araujo from Australia, who used binary arithmetic and common table expressions in his solution. The *August Order of the Wooden Pretzel* (Footnote 1) will be bestowed on Alberto but the real prize is six books of his choice from the Apress catalog. André will receive a prize of six e-books of his choice. Thanks to Chen Shapira for publicizing the event in her blog, Dan Tow for helping to judge the contest, and Apress for donating the books.

**The Challenge**

An ancient 20-sided die (icosahedron) was discovered in the secret chamber of mystery at Hogwash School of Es-Cue-El. A mysterious symbol was inscribed on each face of the die. The great Wizard of Odds discovered that each symbol represents a number. The great wizard discovered that the die was biased: that is, it was more probable that certain numbers would be displayed than others if the die were used in a game of chance. The great wizard recorded this information in tabular fashion as described below.

Name |
Null? |
Type |

FACE_ID | NOT NULL | INT |

FACE_VALUE | NOT NULL | INT |

PROBABILITY | NOT NULL | REAL |

The great wizard then invited all practitioners of the ancient arts of Es-Cue-El to create an Es-Cue-El spell that displays the probabilities of obtaining various sums when the die is thrown N times in succession in a game of chance, N being a *substitution variable* or *bind variable*. The rules of the competition can be found in the May 2009 issue of the *NoCOUG Journal* and on the Web at **www.nocoug.org/SQLchallenge/FirstSQLchallenge.pdf**.

**The Solutions**

The modus operandi was for each participant to post their solutions on their blog or website; you can find them all with a simple Google search. The first solution—by **Laurent Schneider** from Switzerland—used the *CONNECT BY* clause to join the table to itself N times and the *SYS_CONNECT_BY_PATH* and *XMLQUERY* functions to perform the necessary additions and multiplications. The number of records generated by CONNECT BY grows exponentially and hurts performance.

The second solution—by **Craig Martin** from the USA—used the CONNECT BY clause to join the table to itself N times and logarithms to perform the necessary additions and multiplications. The number of records grows exponentially in this case too.

The third solution—by **Rob van Wijk** from the Netherlands—used the *Model* clause to generate records. The number of records grows exponentially in this case too.

The fourth solution—by **Vadim Tropashko** from the USA—used *recursive common table expressions* to generate records. The number of records grows exponentially in this case too. Recursive common table expressions are available in Microsoft SQL Server and DB2 but are not presently available in Oracle 11*g*R1. Rumor has that they will be available in Oracle Database 11*g*R2.

The fifth and sixth solutions—by **Alberto Dell’Era** from Italy—used advanced mathematical techniques such as *convolutions*, *Discrete Fourier Transforms*, and *Fast Fourier Transforms*. The Fast Fourier Transform method is an efficient way of calculating Discrete Fourier Transforms and was implemented using the Model clause.

The seventh solution—by **Fabien Contaminard** from France—was based on the *multinomial* probability distribution, an extension of the binomial distribution.

The eighth solution—by a blogger named Cd-MaN from Romania—used *pipelined table functions* and recursion. The solution was demonstrated in a Postgres database but can easily be adapted for use in an Oracle Database. The use of recursion means that this is not a pure SQL solution.

The ninth solution—by **André Araujo** from Australia—used binary arithmetic and common table expressions.

**Judges’ Decision**

Dan Tow authored the following statement on behalf of the judging committee:

*To begin, we’d like to congratulate the contestants on finding so many different and clever ways to solve this problem, and on making the problem of picking a winner so difficult for us, personally. Each of the solutions had major advantages to recommend it. We were also very pleasantly surprised at the global extent of the entries, with entries from seven nations and three continents!*

*The criteria for the judging were stated in advance at www.nocoug.org/SQLchallenge/FirstSQLchallenge.pdf. The main criteria that separated the top scores from the rest, given that they were all quite good as technical solutions from one perspective or another, were the inclusion of commentary and test results, which were minimal or altogether lacking from most entries. (Hey, we understand that you’re busy, so we’re not surprised to see these missing or minimal, but they were important to getting a win here!)*

*There were elegant solutions using the Model clause, including one by Alberto Dell’Era that implemented a solution using Fast Fourier Transforms that was technically amazing, well-documented and tested, and scaled better than any other solution, with order N * Log(N) scaling, almost linear up to enormous numbers of throws of the dice. However, these used “iterate” loops that we believe violated the contest requirement that “Solutions that use procedural loops to multiply probabilities are not eligible” stated at the top of the “Judges’ Statement.” (We know that there are wonderful, super-efficient ways to solve this problem in procedural code like C, but the point behind that unbendable requirement was to get contestants “thinking in SQL,” doing the job in a set-wise manner, not just finding ways to bend SQL into doing what we’d be better off doing in C, procedurally.) The “procedural loop” component in these solutions was really minimal and easy to miss, even, in a casual examination of the code, but we think we have to stick with the pre-stated rules here and disqualify those solutions, even while we admire them.*

*Scaling almost as well (at order N * N), and also very well documented and tested, both from the perspective of performance and functionality, was the amazing Fourier-Transform-based solution, www.adellera.it/investigations/nocoug_challenge/index.html also by Alberto Dell’Era from Italy, that we think has to be declared the winner here, with no procedural loops and good-to-excellent scores in every stated judging criteria.*

*The runner-up choice is difficult, too, but we’d probably have to go with André Araujo of Australia, www.pythian.com/news/2385/nocoug-sql-challenge-entry. Of all the solutions, his probably best combined straightforward (very clever, but still straightforward to the reader!), portable SQL that could be easily understood and maintained by a developer without an advanced degree in mathematics, with fairly scalable and well-tested SQL that ran well up to quite high numbers of throws of the die (“N”). It is true that the SQL had a hard-coded limit of N = 511 (a limit that André documented well, to the credit of the solution), and that functional limit lost a few points, but we should keep in mind that this high value of N is one that most of the implementations (other than Alberto Dell’Era’s) would never reach in our lifetime, anyway, owing to their comparative lack of scalability—being logically correct at high N is worth nothing if the program never finishes! If we had to actually maintain one of these in a production environment, and we didn’t anticipate needing results at very high values of N, we’d probably go with André’s solution, just because we’d be frightened of long-term maintenance on the high mathematics of Alberto Dell’Era’s brilliant but more complex and technically harder-to-follow solution.*

**Analysis of the Winning Solution**

Alberto published two solutions. The first solution uses a mathematical tool called *Discrete Fourier Transforms* used in *Digital Signal Processing*. This solution will work with most databases—including SQL Server and DB2—with a little tweaking. The second solution provided by Alberto used Oracle’s proprietary *Model* clause to implement an efficient technique called *Fast Fourier Transforms* for computing Discrete Fourier Transforms.

Alberto recognized that the contents of the die table define a mathematical function and that the process of joining the table with itself and grouping the results is the so-called *convolution* of this function with itself. Throwing the die N times is therefore equivalent to performing N – 1 convolutions. For example, for N = 3, we have to perform two convolutions. This is best expressed using *common table expressions* as follows. Notice that the definition of the second convolution references the first convolution.

WITH first_convolution AS ( SELECT face_value, SUM (probability) AS probability FROM ( SELECT d1.face_value + d2.face_value AS face_value, d1.probability * d2.probability AS probability FROM die d1 CROSS JOIN die d2 ) GROUP BY face_value ), second_convolution AS ( SELECT face_value, SUM (probability) AS probability FROM ( SELECT d1.face_value + d2.face_value AS face_value, d1.probability * d2.probability AS probability FROM first_convolution d1 CROSS JOIN die d2 ) GROUP BY face_value ) SELECT face_value, probability FROM second_convolution ORDER BY face_value;

The most common solution of the problem requires an N-way cross join. Convolutions have obvious advantages over an N-way cross join because they keep the size of intermediate results in check. The question is how to compute the required N – 1 convolutions with a single SQL statement if the value of N is not known in advance. One solution is to recursively invoke a *table function* as was done by the Romanian contestant; that is, we have to resort to procedural programming. Alberto was able to avoid procedural programming using a *Fourier transform;* that is, a certain function whose definition is derived from the original function. The Fourier transform has the interesting property that the transform of the convolution of functions is the simple product of the individual transforms. Therefore, the Fourier transform of the N-way convolution of our function with itself is the Nth power of the Fourier transform of the function. The Fourier transform is straightforward to compute and—with a little mathematical trick involving a conversion from the Cartesian coordinate system to the polar coordinate system—the Nth power of the Fourier transform is also straightforward to compute. At this point, Alberto has the Fourier transform of the N-way convolution. To obtain the result that he really needs, all that is left for him to do is to compute the *Inverse Fourier Transform* of the Fourier transform. An explanation of Fourier Transforms can be found on the Web; efficient C-language implementations can also be readily found.

For readability and maintainability, Alberto uses a sequence of *common table expressions*. The following is a simplified version of his solution; the full solution handles more general cases and uses some tricks to reduce the number of scientific computations. Hints to guide the optimizer and improve efficiency are included in the version shown below.

First Alberto creates a one-column table of sequence numbers using the CONNECT BY method; this table is used several times in the rest of the solution. The number of elements in the table is one more than the product of the number of sides of the die and the number of throws.

sequence AS ( SELECT /*+ NO_MERGE */ LEVEL - 1 AS n FROM dual CONNECT BY LEVEL <= (:N * :sides + 1) )

Alberto then constructs a *discrete function* whose domain is the numbers in the sequence table. The function is called *discrete* because it is only defined for certain discrete values—not for all values in a range as in the case of a *continuous* function. Whenever possible, the function uses the values in the die table. If a value is not found, the value of the function is set to zero. This is done using an *outer join*.

function AS ( SELECT /*+ NO_MERGE */ n, COALESCE(probability, 0) AS x FROM sequence LEFT OUTER JOIN die ON (n = face_value) )

Alberto then computes the *Fourier transform* of the discrete function. There’s some advanced math going on here but it’s easy to take in small doses; you’ll recognize Pi as the well-known mathematical constant. A *cross join* with the Sequence table is required to calculate sums.

transform AS ( SELECT /*+ NO_MERGE LEADING(function) */ sequence.n, SUM(x * COS(-2 * :pi * sequence.n * function.n / (:N * :sides + 1))) AS x, SUM(x * SIN(-2 * :pi * sequence.n * function.n / (:N * :sides + 1))) AS y FROM function CROSS JOIN sequence GROUP BY sequence.n )

In another little dose of math, Alberto switches to *polar form* for ease of further computation.

polar AS ( SELECT /*+ NO_MERGE */ n, SQRT((x * x) + (y * y)) AS r, CASE WHEN ABS(y) < 0.000001 AND ABS(x) < 0.000001 THEN 0 ELSE ATAN2(y, x) END AS theta FROM transform )

Computing the Nth power of the Fourier transform is then very easy.

power AS ( SELECT /*+ NO_MERGE */ n, POWER(r, :N) AS r, theta * :N AS theta FROM polar )

Alberto has no more use for the polar form, so he converts back to Cartesian form.

cartesian AS ( SELECT /*+ NO_MERGE */ n, r * COS(theta) AS x, r * SIN(theta) AS y FROM power )

So far, Alberto has calculated the Fourier transform of the discrete function and computed its Nth power. As explained earlier, this is the Fourier transform of the N-way convolution of the discrete function. To obtain the convolution itself, Alberto computes the *Inverse Fourier Transform* using another *cross join* with the Sequence table.

convolution AS ( SELECT /*+ NO_MERGE LEADING(cartesian) */ sequence.n, SUM ( x * COS(+2 * :pi * cartesian.n * sequence.n / (:N * :sides + 1)) - y * SIN(+2 * :pi * cartesian.n * sequence.n / (:N * :sides + 1)) ) / (:N * :sides + 1) AS x FROM cartesian CROSS JOIN sequence GROUP BY sequence.n )

Finally, Alberto can display the results. The result has more than 30 digits of decimal precision; only 30 are displayed in the interests of accuracy.

SELECT n AS face_value, ROUND(x, 30) AS probability FROM convolution WHERE n >= :N ORDER BY n

As you can see, Alberto’s solution used advanced mathematical techniques, but it is not very long and the use of *common table expressions* makes it quite readable. We have a winner!

Footnote 1: Steven Feuerstein was asked the following question in an interview published in the May 2006 issue of the *NoCOUG Journal*: *“SQL is a set-oriented non-procedural language; i.e., it works on sets and does not specify access paths. PL/SQL on the other hand is a record-oriented procedural language, as is very clear from the name. What is the place of a record-oriented procedural language in the relational world?”* Steven replied: *“Its place is proven: SQL is not a complete language. Some people can perform seeming miracles with straight SQL, but the statements can end up looking like pretzels created by someone who is experimenting with hallucinogens. We need more than SQL to build our applications, whether it is the implementation of business rules or application logic. PL/SQL remains the fastest and easiest way to access and manipulate data in an Oracle RDBMS, and I am certain it is going to stay that way for decades.”*

Download the complete SQL script (simplified version of Alberto Dell’Era’s solution)

Subscribe to this blog by Email

Congratulations to Alberto and André !

Thanks Iggy and NoCOUG for this great challenge !

And don’t forget to publish your own solution !

My solution wasn’t a pure SQL solution like yours 🙂 First, I dynamically constructed an SQL query that is very similar to the one in Andre Araujo’s solution. I then executed it dynamically using DBMS_XMLGEN.GETXMLTYPE (http://download.oracle.com/docs/cd/B19306_01/appdev.102/b14258/d_xmlgen.htm#i1015152).

I’ll publish my solution soon but I like your solution better 🙂

I’ve published my solution at https://iggyfernandez.wordpress.com/2009/08/09/the-tenth-solution/.

Hi Iggy,

thank you for both setting up the Challenge and especially for the time and effort you spent reviewing the solutions – you examined and commented the solutions with such a dedication and love for detail that really amazed me. Thanks !

Alberto –

now a proud member of the Wooden Pretzel Society

Congrats, Alberto! Amazing solution!

Thanks, Iggy, for organising the challenge. I’m already looking forward to the next one 🙂

André

Hi Iggy,

This remonds me of some of Joe Celko’s quizzes!

While these are fun and a great test of skill, I was wondering what your take is on the pgarmatism of solving complex procedural problems with SQL, as opposed to a procedural language?

Some folks argue that simpler, procedural solutions to these types of complex problems are easier to maintain, and frequently run faster than SQL . . .

Eons ago, when I was in College, programming competitions were quite different, because hardware was super-expensive, relative to programmers slaries. The code was not judged on who could solve the problem the fastest or most elegantly, but on which entry ran the fastest!

You know, it would have been interesting to run a contest where the criteria is not only to solve the problem but judge your entries on how fast the SQL runs . . .

Hi, Don,

I think that SQL is best suited for the common kinds of queries that occur in business data processing. C and FORTRAN are better choices for scientific problems. Wikipedia says:

“Since Fortran has been in use for more than fifty years, there is a vast body of Fortran in daily use throughout the scientific and engineering communities. It is the primary language for some of the most intensive supercomputing tasks, such as weather and climate modeling, computational fluid dynamics, computational chemistry, computational economics, and computational physics. Even today, half a century later, many of the floating-point benchmarks to gauge the performance of new computer processors are still written in Fortran (e.g., CFP2006, the floating-point component of the SPEC CPU2006 benchmarks).”