Chapter 6: Key Takeaways

Core Concepts

1. Joins Exist Because of Normalization

Relational databases store each fact once, spreading data across multiple tables. Joins reassemble these tables into meaningful results. Without joins, you cannot answer any question that spans two or more entities.

2. INNER JOIN Is the Default Workhorse

  • Returns only rows with matching values in both tables
  • Unmatched rows on either side are silently excluded
  • Write INNER JOIN explicitly for clarity, even though JOIN alone defaults to inner

3. LEFT OUTER JOIN Preserves the Left Side

  • Returns all rows from the left table, whether or not they match the right table
  • Unmatched right-side columns are filled with NULL
  • The anti-join pattern (LEFT OUTER JOIN ... WHERE right.key IS NULL) finds rows with no match — invaluable for data quality checks

4. RIGHT OUTER JOIN Is the Mirror of LEFT

  • Preserves all rows from the right table
  • In practice, prefer LEFT OUTER JOIN by placing the "must-preserve" table on the left
  • RIGHT OUTER JOIN and LEFT OUTER JOIN are interchangeable by swapping table positions

5. FULL OUTER JOIN Preserves Both Sides

  • Returns all rows from both tables, with NULLs on whichever side has no match
  • Primary use case: data reconciliation — comparing two data sources to find gaps on either side

6. CROSS JOIN Produces the Cartesian Product

  • Every row from table A is paired with every row from table B
  • Result size = rows(A) x rows(B)
  • Use deliberately for generating reporting skeletons, date ranges, or test combinations
  • Never create one accidentally by omitting a join condition

7. LATERAL Enables Correlated FROM-Clause Subqueries

  • The right-side subquery can reference columns from the left side
  • Ideal for "top N per group" queries
  • Use LEFT OUTER JOIN LATERAL ... ON 1=1 to preserve left-side rows when the subquery returns no results

Critical Rules

Join Conditions vs. WHERE Conditions (Outer Joins)

Placement Effect on Outer Join
ON clause Filters during join; unmatched rows still preserved with NULLs
WHERE clause Filters after join; eliminates preserved NULL rows, converting outer join to inner join

Rule: Place filters on the "optional" (outer) table in the ON clause.

Table Aliases

  • Required for self-joins (mandatory — DB2 cannot disambiguate otherwise)
  • Strongly recommended for all multi-table queries
  • Use consistent conventions: c for CUSTOMER, a for ACCOUNT, b for BRANCH, t for TRANSACTION, e for EMPLOYEE

Multi-Table Joins

  • Chain joins by adding more INNER JOIN ... ON or LEFT OUTER JOIN ... ON clauses
  • You can mix join types freely in a single query
  • List tables in logical order for readability — the optimizer chooses its own processing order
  • For N tables, you need at least N-1 join conditions to avoid Cartesian products

Common Mistakes to Avoid

  1. Accidental Cartesian product — Missing or incorrect ON clause produces m x n rows
  2. Wrong outer join direction — Preserving the wrong table (check which side is LEFT vs. RIGHT)
  3. WHERE clause on outer table — Nullifies the outer join, silently converting it to inner
  4. Row multiplication in aggregates — Joining one-to-many then summing duplicates the parent values; aggregate in subqueries first
  5. Old-style comma joins — Mix join and filter logic; use explicit JOIN ... ON instead
  6. Ignoring NULLs in COUNT — Use COUNT(column) not COUNT(*) when counting outer-joined rows

Performance Preview

Join Method Best For Cost Model
Nested Loop Small outer table, indexed inner table outer_rows x index_lookup
Merge Scan Large sorted/indexed tables sort_cost + (rows_a + rows_b)
Hash Join Large tables, no useful indexes, equi-joins rows_a + rows_b (+ memory for hash table)

You influence the optimizer by: - Creating indexes on foreign key and frequently filtered columns - Running RUNSTATS to keep statistics current - Writing clear, unambiguous join conditions

Quick Reference: Which Join to Use

Business Question Pattern Join Type
"Show X with their Y" (both must exist) INNER JOIN
"Show all X, even those without Y" LEFT OUTER JOIN
"Find X that have no Y" LEFT OUTER JOIN + WHERE Y.key IS NULL
"Compare two sources, find gaps on both sides" FULL OUTER JOIN
"Generate all combinations of X and Y" CROSS JOIN
"For each X, find the top N related Y" LATERAL

Looking Ahead

  • Chapter 7: Aggregation with GROUP BY and HAVING builds directly on joins
  • Chapter 14: Window functions provide alternatives to some LATERAL patterns
  • Chapter 20: Indexing strategies to optimize join performance
  • Chapter 22: Query performance tuning — reading EXPLAIN plans, diagnosing slow joins