Chapter 6: Key Takeaways
Core Concepts
1. Joins Exist Because of Normalization
Relational databases store each fact once, spreading data across multiple tables. Joins reassemble these tables into meaningful results. Without joins, you cannot answer any question that spans two or more entities.
2. INNER JOIN Is the Default Workhorse
- Returns only rows with matching values in both tables
- Unmatched rows on either side are silently excluded
- Write
INNER JOINexplicitly for clarity, even thoughJOINalone defaults to inner
3. LEFT OUTER JOIN Preserves the Left Side
- Returns all rows from the left table, whether or not they match the right table
- Unmatched right-side columns are filled with NULL
- The anti-join pattern (
LEFT OUTER JOIN ... WHERE right.key IS NULL) finds rows with no match — invaluable for data quality checks
4. RIGHT OUTER JOIN Is the Mirror of LEFT
- Preserves all rows from the right table
- In practice, prefer LEFT OUTER JOIN by placing the "must-preserve" table on the left
- RIGHT OUTER JOIN and LEFT OUTER JOIN are interchangeable by swapping table positions
5. FULL OUTER JOIN Preserves Both Sides
- Returns all rows from both tables, with NULLs on whichever side has no match
- Primary use case: data reconciliation — comparing two data sources to find gaps on either side
6. CROSS JOIN Produces the Cartesian Product
- Every row from table A is paired with every row from table B
- Result size = rows(A) x rows(B)
- Use deliberately for generating reporting skeletons, date ranges, or test combinations
- Never create one accidentally by omitting a join condition
7. LATERAL Enables Correlated FROM-Clause Subqueries
- The right-side subquery can reference columns from the left side
- Ideal for "top N per group" queries
- Use
LEFT OUTER JOIN LATERAL ... ON 1=1to preserve left-side rows when the subquery returns no results
Critical Rules
Join Conditions vs. WHERE Conditions (Outer Joins)
| Placement | Effect on Outer Join |
|---|---|
| ON clause | Filters during join; unmatched rows still preserved with NULLs |
| WHERE clause | Filters after join; eliminates preserved NULL rows, converting outer join to inner join |
Rule: Place filters on the "optional" (outer) table in the ON clause.
Table Aliases
- Required for self-joins (mandatory — DB2 cannot disambiguate otherwise)
- Strongly recommended for all multi-table queries
- Use consistent conventions:
cfor CUSTOMER,afor ACCOUNT,bfor BRANCH,tfor TRANSACTION,efor EMPLOYEE
Multi-Table Joins
- Chain joins by adding more
INNER JOIN ... ONorLEFT OUTER JOIN ... ONclauses - You can mix join types freely in a single query
- List tables in logical order for readability — the optimizer chooses its own processing order
- For N tables, you need at least N-1 join conditions to avoid Cartesian products
Common Mistakes to Avoid
- Accidental Cartesian product — Missing or incorrect ON clause produces m x n rows
- Wrong outer join direction — Preserving the wrong table (check which side is LEFT vs. RIGHT)
- WHERE clause on outer table — Nullifies the outer join, silently converting it to inner
- Row multiplication in aggregates — Joining one-to-many then summing duplicates the parent values; aggregate in subqueries first
- Old-style comma joins — Mix join and filter logic; use explicit
JOIN ... ONinstead - Ignoring NULLs in COUNT — Use
COUNT(column)notCOUNT(*)when counting outer-joined rows
Performance Preview
| Join Method | Best For | Cost Model |
|---|---|---|
| Nested Loop | Small outer table, indexed inner table | outer_rows x index_lookup |
| Merge Scan | Large sorted/indexed tables | sort_cost + (rows_a + rows_b) |
| Hash Join | Large tables, no useful indexes, equi-joins | rows_a + rows_b (+ memory for hash table) |
You influence the optimizer by: - Creating indexes on foreign key and frequently filtered columns - Running RUNSTATS to keep statistics current - Writing clear, unambiguous join conditions
Quick Reference: Which Join to Use
| Business Question Pattern | Join Type |
|---|---|
| "Show X with their Y" (both must exist) | INNER JOIN |
| "Show all X, even those without Y" | LEFT OUTER JOIN |
| "Find X that have no Y" | LEFT OUTER JOIN + WHERE Y.key IS NULL |
| "Compare two sources, find gaps on both sides" | FULL OUTER JOIN |
| "Generate all combinations of X and Y" | CROSS JOIN |
| "For each X, find the top N related Y" | LATERAL |
Looking Ahead
- Chapter 7: Aggregation with GROUP BY and HAVING builds directly on joins
- Chapter 14: Window functions provide alternatives to some LATERAL patterns
- Chapter 20: Indexing strategies to optimize join performance
- Chapter 22: Query performance tuning — reading EXPLAIN plans, diagnosing slow joins