The Tenth Solution
I got the idea for the First International NoCOUG SQL Challenge from a trick question that I was asked at a job interview several years ago: “How would you calculate the probabilities of obtaining various sums in two throws of a [six-sided fair] die?” No, this was not at Google—I’ve never got past the front door there—but I’ve heard that Google too likes to stress candidates by asking questions that are completely unrelated to the job they will be doing; for example, “How would you sell ice to an Eskimo?” I’d prefer to be honest when asked such questions and say—with an engaging smile, of course—“I’d like to pass on the question because it has nothing to do with the job that I’ll be doing” but I know that you’re supposed to play along with such tomfoolery if you really want the job.
I flubbed the trick question but the answer came to me on the drive home; it always does. One simply has to enumerate each possibility—for example, two sixes—and compute the probability of each possibility. Since the die has six faces and since the die is thrown twice, there are 36 possibilities. Since the die is fair and since the two throws are independent of each other, each possibility has the exact same probability; that is, 1/6 * 1/6 which equals 1/36. However, some possibilities give rise to the same sum—for example, 3 + 4 equals 4 + 3—and we have to add up their probabilities. Here is the exhaustive enumeration of possibilities.
|First throw||Second Throw||Sum||Probability|
Upon grouping by the sum of the faces, we arrive at the final answer.
|2||1 * 1/36|
|3||2 * 1/36|
|4||3 * 1/36|
|5||4 * 1/36|
|6||5 * 1/36|
|7||6 * 1/36|
|8||5 * 1/36|
|9||4 * 1/36|
|10||3 * 1/36|
|11||2 * 1/36|
|12||1 * 1/36|
I then got even cleverer—as inevitably happens to me on the drive home from an unsatisfactory interview. I realized that I could solve the problem with a clever little SQL statement. Assume that the following data is stored in a table called Die.
The following SQL statement can then be used to process the data and compute the required probabilities.
SELECT face_value, SUM (probability) AS probability FROM ( SELECT d1.face_value + d2.face_value AS face_value, d1.probability * d2.probability AS probability FROM die d1 CROSS JOIN die d2 ) GROUP BY face_value;
Isn’t that clever? But what if the die was not fair? What if the die had more than six faces? It wouldn’t matter; my clever query would still work. I mentally patted myself on the back even though I knew in my heart that my chance had passed and I would not get the job. Lady Luck knocks only once, after that she sends her daughter, Miss Fortune.
While waiting for the telephone call that never came, I toyed with the problem of computing the probabilities of obtaining various sums in more than two throws. That was simple enough, it just required more joins. For example, here’s what to do in the case of three throws.
SELECT face_value, SUM (probability) AS probability FROM ( SELECT d1.face_value + d2.face_value + d3.face_value AS face_value, d1.probability * d2.probability * d3.probability AS probability FROM die d1 CROSS JOIN die d2 CROSS JOIN die d3 ) GROUP BY face_value;
But what if the number of throws was not specified in advance? What if it was an input to the program? That was more difficult but I finally found a way. I cooked up a silly story about the Wizard of “Odds” at the School of “Hogwash” (Hogwarts) having discovered an ancient jade icosahedron in the magic chamber of mystery and published the puzzle in the November 2007 issue of the NoCOUG Journal. There were no entries but at least one person mentioned to me that he had spent a couple of hours trying to find a solution.
A year and a half passed and I proposed to the board of the Northern California Oracle Users Group (NoCOUG) that we hold an international SQL competition. I got Apress to offer some prizes; they are always very supportive of user groups such as NoCOUG. The results of the competition were spectacular; there were nine radically different entries from seven countries and three continents.
Here is my own solution to the problem. It is a variant of André Araujo’s solution; it is equally fast but has no logical limitations. However, it does have a hidden physical limitation which I will discuss later.
Even though my solution is quite fast, it would not have pleased the judges because it depended on a trick that does not work in any database except Oracle. I dynamically generate an SQL statement and execute it using the DBMS_XMLGEN.getxmltype function; I then use the XMLTABLE function to display the results.
As is the case with André Araujo’s solution, my solution does not require as many joins as there are throws; that is, using binary arithmetic and frequent grouping, one can drastically reduce the number of joins.
Here is the SQL statement that I generate for seven throws of the die. I use a sequence of common table expressions. Here, first, are the probabilities for one throw of the die.
l0 AS ( SELECT face_value AS f, probability AS p FROM die )
The probabilities for two throws of the die are computed by joining the previous expression with itself.
l1 AS ( SELECT f, SUM (p) AS p FROM ( SELECT t1.f + t2.f AS f, t1.p * t2.p AS p FROM l0 t1, l0 t2 ) GROUP BY f )
The probabilities for four throws of the die are likewise obtained by joining the previous expression with itself.
l2 AS ( SELECT f, SUM (p) AS p FROM ( SELECT t1.f + t2.f AS f, t1.p * t2.p AS p FROM l1 t1, l1 t2 ) GROUP BY f )
The probabilities for seven throws of the die can be obtained by joining all three of the above expressions because 7 = 4 + 2 + 1. There is a join with the Dual table in the very center of the query; this is the same trick that is used to great effect by André Araujo.
SELECT f, SUM (p) AS p FROM ( SELECT t1.f + t2.f AS f, t1.p * t2.p AS p FROM l2 t1, ( SELECT f, SUM (p) AS p FROM ( SELECT t1.f + t2.f AS f, t1.p * t2.p AS p FROM l1 t1, ( SELECT f, SUM (p) AS p FROM ( SELECT t1.f + t2.f AS f, t1.p * t2.p AS p FROM l0 t1, ( ( SELECT 0 AS f, 1 AS p FROM DUAL ) ) t2 ) GROUP BY f ) t2 ) GROUP BY f ) t2 ) GROUP BY f
The above statement is actually generated in a series of small steps. Here are the clauses of the above SQL query sorted by step number and clause number.
SECTION CLAUSE# CLAUSE ---------- ---------- ---------------------------------------- 1 1 WITH l0 as(SELECT face_value as f,probab ility as p FROM die) 2 1 ,l1 AS(SELECT f,SUM(p) AS p FROM(SELECT t1.f+t2.f AS f, t1.p*t2.p AS p FROM l0 t 1,l0 t2)GROUP BY f) 2 2 ,l2 AS(SELECT f,SUM(p) AS p FROM(SELECT t1.f+t2.f AS f, t1.p*t2.p AS p FROM l1 t 1,l1 t2)GROUP BY f) 3 1 SELECT f,sum(p) AS p FROM(SELECT t1.f+t2 .f AS f,t1.p*t2.p AS p FROM l2 t1,( 3 2 SELECT f,sum(p) AS p FROM(SELECT t1.f+t2 .f AS f,t1.p*t2.p AS p FROM l1 t1,( 3 3 SELECT f,sum(p) AS p FROM(SELECT t1.f+t2 .f AS f,t1.p*t2.p AS p FROM l0 t1,( 4 1 (SELECT 0 AS f, 1 AS p FROM dual) 5 1 )t2)GROUP BY f 5 2 )t2)GROUP BY f 5 3 )t2)GROUP BY f
Enough said. Here is the complete solution. As usual, common table expressions are used for readability. In the first step, I generate the first section of the dynamically generated SQL statement. The fake CARDINALITY hint is required because you will encounter an ORA-600 with argument 15160 if the number of throws is extremely large and the expected cost is extremely high; we have therefore to trick the query optimizer into thinking that the table Die has only one row.
section1 AS ( SELECT 1 AS section, 1 AS clause#, 'WITH l0 as(SELECT/*+CARDINALITY(die 1)*/face_value as f,probability as p FROM die)' AS clause FROM DUAL )
In the following step, I generate the second section of the dynamically generated SQL statement. Each common table expression generated in this section is constructed by joining the preceding expression with itself. For example, l1 = l0 CROSS JOIN l0 and l2 = l1 CROSS JOIN l1.
section2 AS ( SELECT 2 AS section, LEVEL AS clause#, ',l' || TO_CHAR (LEVEL) || ' AS(SELECT f,SUM(p) AS p FROM(SELECT t1.f+t2.f AS f, t1.p*t2.p AS p FROM l' || TO_CHAR (LEVEL - 1) || ' t1,l' || TO_CHAR (LEVEL - 1) || ' t2)GROUP BY f)' AS clause FROM DUAL CONNECT BY POWER (2, LEVEL) <= :n )
In the following step, I generate the third section of the dynamically generated SQL statement. The BITAND function is used to extract the bits from the binary representation of N—the number of throws. Notice the use of the DESC clause in the ORDER BY specification.
section3 AS ( SELECT 3 AS section, ROW_NUMBER() OVER (ORDER BY LEVEL DESC) AS clause#, DECODE (BITAND (:n, POWER (2, LEVEL - 1)), 0, '', 'SELECT f,sum(p) AS p FROM(SELECT t1.f+t2.f AS f,t1.p*t2.p AS p FROM l' || TO_CHAR (LEVEL - 1) || ' t1,(') AS clause FROM DUAL CONNECT BY POWER (2, LEVEL - 1) <= :n )
In the following step, I generate the fourth section of the dynamically generated SQL statement. I construct an “identity” table that contains only one row and does not affect the table to which it is joined.
section4 AS ( SELECT 4 AS section, 1 AS clause#, '(SELECT 0 AS f, 1 AS p FROM dual)' AS clause FROM DUAL )
In the following step, I generate the fifth and final section of the dynamically generated SQL statement. I generate closing parentheses to match the opening parentheses generated in section 3.
section5 AS ( SELECT 5 AS section, ROW_NUMBER() OVER (ORDER BY LEVEL) AS clause#, DECODE (BITAND (:n, POWER (2, LEVEL - 1)), 0, '', ')t2)GROUP BY f') AS clause FROM DUAL CONNECT BY POWER (2, LEVEL - 1) <= :n )
I now collect clauses from all five sections. Empty clauses are eliminated and row numbers are assigned to the remaining clauses.
all_sections AS ( SELECT ROW_NUMBER() OVER (ORDER BY section, clause#) AS clause#, clause FROM (SELECT section, clause#, clause FROM section1 UNION ALL SELECT section, clause#, clause FROM section2 UNION ALL SELECT section, clause#, clause FROM section3 UNION ALL SELECT section, clause#, clause FROM section4 UNION ALL SELECT section, clause#, clause FROM section5) WHERE LENGTH(clause) != 0 )
Next, all SQL clauses are concatentated into a single VARCHAR2 string using the CONNECT BY method. The reason why I used short variable names and avoided any unnecessary white space while constructing SQL clauses is that the maximum length of a VARCHAR2 column is 2048 characters. This is the physical limitation to which I alluded earlier.
sql_text AS ( SELECT REPLACE (sql_text, '|', '') AS sql_text FROM (SELECT RANK () OVER (ORDER BY lvl DESC) AS RANK, sql_text FROM (SELECT LEVEL AS lvl, SYS_CONNECT_BY_PATH (clause, '|') AS sql_text FROM all_sections START WITH clause# = 1 CONNECT BY clause# = LEVEL)) WHERE RANK = 1 )
I now execute the dynamically generated SQL statement using the DBMS_XMLGET.GETXMLTYPE function. The result is a CLOB containing the results formatted using XML.
xml_string AS ( SELECT DBMS_XMLGEN.getxmltype (sql_text) xml_string FROM sql_text )
Finally, I use the XMLTABLE function to extract the results of the dynamically generated SQL statement from the XML string.
SELECT sum, probability FROM xml_string, XMLTABLE ( '$x/ROWSET/ROW' PASSING xml_string AS "x" COLUMNS sum NUMBER PATH 'F', probability NUMBER PATH 'P' ) x ORDER BY sum
It’s not very pretty but I was quite pleased with myself at the time. I had no idea that there were so many good solutions out there. Congratulations to Laurent Schneider (Switzerland), Craig Martin (USA), Rob van Wijk (Netherlands), Vadim Tropashko (USA), Alberto Dell’Era (Italy), Fabien Contaminard (France), Cd-MaN (Romania), and André Araujo (Australia) for their original solutions. Congratulations especially to Alberto Dell’Era, Knight of the August Order of the Wooden Pretzel!
P.S. Here is a graph of the results I obtained on my laptop with a 2.00 GHz Intel Core2 processor.
|Throws (Thousands)||Elapsed Time (Minutes)|