The 12 Days of SQL: 12 Things Every Oracle DBA and Developer Should Know About SQL (Day 1)
Inspired by Andrew Zitelli’s presentation Four Things Every DBA and Developer Should Know About Oracle at NoCOUG Conference #100 at the Computer History Museum (Conference 100 Report: Pictures at an Exhibition)
On the first day of Christmas, my true love gave to me
A partridge in a pear tree.
Day 1: SQL is based on relational calculus and relational algebra (Day 2: SQL is a non-procedural language)
As explained by IBM researcher Dr. Edgar (Ted) Codd in his seminal 1970 paper “A Relational Model of Data for Large Shared Data Banks” (reproduced in its entirety in the 100th issue of the NoCOUG Journal), the power of the relational model comes from the ability to specify data requirements by writing a “relational algebra” expression using relational operations (operations on sets of data) such as Restrict (a subset of rows is targeted), Project (a subject of columns is targeted), Union, Minus, Intersect, Cross Join, Theta Join (a combination of the Cross Join and Restrict operations), Antijoin, Semijoin, Outer Join, and Division.
Dr. Codd published a subsequent paper “Relational Completeness of Data Base Sublanguages” in which he outlined a non-procedural method of specifying data requirements using “quantifiers” such as Exists. He called this method “relational calculus” and showed that how a database engine like Oracle could convert non-procedural relational calculus expressions into equivalent relational algebra expressions. At the end of the paper he declared that “Clearly, the majority of users should not have to learn either the relational calculus or algebra in order to interact with data bases. However, requesting data by its properties is far more natural than devising a particular algorithm or sequence of operations for its retrieval. Thus, a calculus-oriented language provides a good target language for a more user-oriented source language.”
Two of Codd’s IBM colleagues, Donald Chamberlin and Raymond Boyce, then created a language called SEQUEL (Structured English Query Language) using the ideas of relational calculus (http://faculty.cs.tamu.edu/yurttas/PL/DBL/docs/sequel-1974.pdf). The acronym was shortened to SQL when it was discovered that SEQUEL was someone else’s registered trademark. SQL was not intended to be a rigorous implementation of relational calculus—it was originally intended for the use of “accountants, engineers, architects, and urban planners” who, “while they are not computer specialists, would be willing to learn to interact with a computer in a reasonably high-level, non-procedural query language”—and had flaws which persist to this day; for example, it permitted duplicates. Also, it used the Union, Minus, and Intersect operations which are relational operations. Over time, explicit support for other relational operations such as Cross Join, Theta Join, and Outer Join was added. SQL is therefore a redundant hybrid of relational calculus and relational algebra.
Here are some examples from the Human Resources sample schema (ER diagram)
Employees who earn more than $10,000 (the relational calculus version)
select d.department_id, d.department_name, e.employee_id, e.first_name, e.last_name, e.salary from employees e, departments d where e.salary > 10000 and e.department_id = d.department_id
The above expression does not explicitly mention the need to join the employee and department tables; it is the task of the database engine to find a sequence of relational operations that can produce the required data.
Employees who earn more than $10,000 (the relational algebra version)
select d.department_id, d.department_name, -- Project operation on the Departments table e.employee_id, e.first_name, e.last_name, e.salary -- Project operation on the Employees table from departments d join employees e on (e.department_id = d.department_id) -- Theta Join operation on the Employees and Departments tables where e.salary > 10000 -- Restriction operation on the Employees table
The above expression contains two Project operations (one each on the Employees and Departments tables), one Theta Join operation, and one Restrict operation.
Managers who earn less than one of their employees (the relational calculus version)
select m.employee_id, m.first_name, m.last_name from employees m where exists ( select * from employees e where m.employee_id = e.manager_id and m.salary < e.salary )
Notice the use of the Exists quantifier in the above expression.
Managers who earn less than one of their employees (the relational algebra version)
select distinct m.employee_id, m.first_name, m.last_name from employees m join employees e on (m.employee_id = e.manager_id) where m.salary < e.salary
The above expression implicitly requires at least two Project operations; can you find them?
Employees who have worked in every department (the relational calculus version)
select e.employee_id, e.first_name, e.last_name from employees e where not exists ( select * from departments d where (d.department_id != e.department_id and e.department_id is not null) and not exists ( select * from job_history j Where j.employee_id = e.employee_id and j.department_id = d.department_id ) )
Employees who have worked in every department (the relational algebra version)
Left as an exercise 🙂 Rewrite without using the Exists quantifier.
A rigorous treatment of relational algebra and relational calculus can be found in Database System Concepts by Korth and Silberschatz. As Leonardo da Vinci said in The Discourse On Painting “Those who are in love with practice without knowledge are like the sailor who gets into a ship without rudder or compass and who never can be certain whether he is going. Practice must always be founded on sound theory.”