## SQL Challenge Part II: Are You Smarter Than a Fifth-Grader?

After Jeff Kemp quickly solved the original problem, I realized that I had unnecessarily required that every table in the FROM clause be joined to at least one other table. Eliminating the restriction makes the problem tougher.

A certain SQL query—the text of which is unavailable to us—has exactly 1000 tables listed in the FROM clause. No table name is repeated and there are no views involved. Call the tables T1, T2, T3, …, T1000; that is, the FROM clause is “FROM T1, T2, T3, …, T1000.” On inspecting the WHERE clause next, we find that every table in the FROM clause is joined to at least one other table listed in the FROM clause and, obviously, each table is joined to no more than 999 other tables. Let’s say that T1 is joined to J1 other tables, T2 is joined to J2 other tables, T3 is joined to J3 other tables, …, and T1000 is joined to J1000 other tables. Prove that there must be at least two tables that are joined to the same number of tables; that is, there must be at least one pair of tables—say Tx and Ty—in the SQL query such that Jx and Jy are equal.

Can you solve it? Are *you* smarter than a fifth-grader?

This seems like the same thing… If T1 joins to no other tables, you are left with 999 tables with 998 remaining possible values for J (0 isn’t allowable since it is already used and 999 isn’t allowable since that would require T1 to have a join).

Right 🙂 There are two cases. In the first case, every table joins to at least one other table and J can range from 1 to 999. In the second case, there is at least one table that has not joined to any other table and J can range from 0 to 998. In both cases, there are exactly 999 possible values for J. Since there are 1000 tables, at least one value of J must repeat.

This is a variant of the Handshake Problem. The underlying principle is called the Pigeonhole Principle.