SQL Quiz: Which Query is Better?
In general, there are lots of ways of expressing a particular query requirement in SQL with implications for query performance. For example, which departments have employees with salaries greater than a certain cutoff? Here are two ways to express this query requirement in SQL. The first uses a conventional correlated subquery while the second uses ANSI join syntax. Which is better? From a theoretical perspective? From a practical perspective? In certain situations? Since I can only pick one, which one should I pick?
VARIABLE salary_cutoff NUMBER -- Correlated subquery SELECT d.department_name FROM departments d WHERE EXISTS ( SELECT * FROM employees e WHERE salary > :salary_cutoff AND department_id = d.department_id ); -- ANSI join syntax SELECT d.department_name FROM ( SELECT DISTINCT department_id FROM employees WHERE salary > :salary_cutoff ) e JOIN departments d ON e.department_id = d.department_id;
Inquiring minds want to know!
P.S. The above queries will work unmodified in the HR sample schema.


XE:
Correlated: 0 sorts (memory) and understandable syntax
ANSI: 1 sorts (memory)
My rule of thumb is, a table should only be in the FROM clause if columns in that table appear in the SELECT clause. Otherwise you are not selecting from it, and its place is in the where clause.
The best choice depends on the data sizes and distribution patterns, index definitions, and Oracle version.
Without checking the HR schema, I’d
have to say that your interviewer might have been correct. If there were an index on salary a very high cutoff value would presumably find very few employees, have to do a very small sort, and then index into a very small number of departments.