Chapter 6 — Key Takeaways

The big idea

A join reassembles related rows split across tables — the payoff of good design (Ch. 3) and, mechanically, a Cartesian product narrowed by a match condition (Ch. 4). Real questions are just chains of two-table joins; you built a six-table business query one join at a time.

The join types

Join Keeps Use for
INNER JOIN only matching rows (both sides) "X with its Y" when every X has a Y
LEFT JOIN all left rows + matches (NULL if none) "all X, with Y if any"
LEFT JOIN … WHERE right IS NULL left rows with no match "X that never Y" (anti-join)
RIGHT JOIN mirror of LEFT (rarely used — swap sides)
FULL OUTER JOIN all rows from both sides reconciliation
CROSS JOIN every pairing (the bare product) deliberate combinations
self-join a table joined to itself (two aliases) adjacency: manager/employee, etc.

Building multi-table queries

  • Alias every table (o, c, oi, p); qualify columns (o.order_id).
  • Follow the foreign keys outward: orders → order_items → products → categories, plus customers, employees.
  • The ON clause is the match (σ); it can be any condition (equi- or non-equi join). USING(col) is shorthand when names match.

The five classic mistakes

  1. Accidental Cartesian product — missing/wrong ON → far too many rows.
  2. Fan-out (one-to-many over-counting) — joining grains then SUM-ing a coarser value triples it; pre-aggregate to the right grain, don't paper over with DISTINCT. (Case Study 1.)
  3. Wrong join typeINNER where you needed LEFT silently drops no-match rows and biases counts/averages. (Case Study 2.)
  4. Ambiguous column — qualify with the alias.
  5. Outer-join WHERE trap — a condition on the right table in WHERE turns LEFT back into INNER. Filter the optional side in ON, the preserved side in WHERE.

Two patterns to memorize

  • Anti-join: A LEFT JOIN B … WHERE B.key IS NULL → "A with no B."
  • Include the zeros: A LEFT JOIN B … COALESCE(SUM(B.val), 0) → per-A totals counting empty A's as 0.

Always ask, of every join

"What does one row represent now (the grain)?" and "Do the no-match rows matter to this question?" The first guards against fan-out; the second tells you INNER vs. LEFT.

Performance (theme #5)

Index the columns you join on — especially foreign keys. An unindexed join on a large table is a classic 45-second query. The optimizer chooses nested-loop / hash / merge joins; you'll read its choice with EXPLAIN in Ch. 24.

You can now…

  • ☐ Write every join type, including self-joins.
  • ☐ Build 4–6 table queries by chaining two-table joins.
  • ☐ Use the anti-join and "include the zeros" patterns.
  • ☐ Diagnose Cartesian products, fan-out, and the outer-join WHERE trap.
  • ☐ Reason about grain and no-match rows before trusting a result.

Looking ahead

Chapter 7 — Aggregation. COUNT, SUM, AVG, GROUP BY, HAVING — condensing joined rows into the totals and averages that answer business questions (and the right place to fix fan-out).

One sentence to carry forward: Joins reassemble well-designed data — but every join has a grain and a choice of "keep the non-matches or not," and getting those two things right is the difference between a correct answer and a confident wrong one.