Chapter 8: Key Takeaways

DataField.Dev

Chapter 8: Key Takeaways

Core Concepts

A subquery is a SELECT inside another SQL statement. It can appear in the WHERE clause, the SELECT list, the FROM clause, or within HAVING. The placement determines what the subquery must return (scalar, list, or table).
Scalar subqueries return one row, one column. They substitute for a single value. If the subquery returns more than one row, DB2 raises SQLCODE -811. If it returns no rows, the result is NULL.
IN and NOT IN compare against a set. IN checks membership; NOT IN checks non-membership. NOT IN fails silently when the subquery result contains NULL — always prefer NOT EXISTS for negative existence checks.
Correlated subqueries reference the outer query. They are logically re-evaluated for every outer row. This makes them powerful but potentially expensive. Check EXPLAIN plans and ensure indexes exist on correlation columns.
EXISTS checks for row existence, not row values. It returns TRUE if the subquery produces at least one row. DB2 executes it as a semi-join and can stop at the first match. NOT EXISTS is the anti-semi-join counterpart.
Derived tables are subqueries in the FROM clause. They produce an inline result set. Useful for pre-aggregation before joins and for filtering on computed columns.
CTEs (WITH clause) name subqueries for readability and reuse. They transform inside-out nested SQL into top-to-bottom logical steps. Multiple CTEs can be chained, and each can reference previously defined CTEs.
CTEs can be referenced multiple times. Unlike derived tables, a CTE defined once can appear in multiple places in the main query. This avoids duplicating logic and reduces inconsistency risk.
DB2 can inline or materialize CTEs. Inlining merges the CTE into the main query for holistic optimization. Materialization stores the result in a temp area. DB2 generally prefers inlining and materializes when it estimates a benefit (especially when the CTE is referenced multiple times).
Recursive CTEs traverse hierarchies. An anchor member plus a recursive member that references the CTE itself allows traversal of parent-child structures to arbitrary depth.

Decision Rules

Situation	Use This
Simple scalar comparison	Scalar subquery in WHERE
Check if a value is in a set	IN with subquery
Check if a value is NOT in a set	NOT EXISTS (not NOT IN)
Existence check	EXISTS
Multiple levels of logic	CTE chain
Same intermediate result used twice	CTE
Reused across multiple SQL statements	VIEW
Hierarchical / tree traversal	Recursive CTE

Performance Rules of Thumb

Uncorrelated subqueries run once. They are cheap.
Correlated subqueries run once per outer row (logically). They can be expensive. Ensure indexed correlation columns.
EXISTS with an index on the correlation column is typically the fastest existence check.
NOT EXISTS beats NOT IN for correctness (NULL safety) and often for performance.
Pre-aggregate in a CTE before joining to avoid the fan-out problem.
Check EXPLAIN before optimizing. The optimizer may already be transforming your subquery into a join.

Common Mistakes to Avoid

Writing a scalar subquery that can return multiple rows for some data states.
Using NOT IN when the subquery column is nullable.
Deeply nesting subqueries when CTEs would be clearer.
Assuming correlated subqueries are always slow — check the plan first.
Forgetting that CTEs exist only for one statement — they are not stored objects.
Not casting string columns to sufficient length in recursive CTE anchors.