Chapter 6 — Key Takeaways
The big idea
A join reassembles related rows split across tables — the payoff of good design (Ch. 3) and, mechanically, a Cartesian product narrowed by a match condition (Ch. 4). Real questions are just chains of two-table joins; you built a six-table business query one join at a time.
The join types
| Join | Keeps | Use for |
|---|---|---|
INNER JOIN |
only matching rows (both sides) | "X with its Y" when every X has a Y |
LEFT JOIN |
all left rows + matches (NULL if none) | "all X, with Y if any" |
LEFT JOIN … WHERE right IS NULL |
left rows with no match | "X that never Y" (anti-join) |
RIGHT JOIN |
mirror of LEFT (rarely used — swap sides) | — |
FULL OUTER JOIN |
all rows from both sides | reconciliation |
CROSS JOIN |
every pairing (the bare product) | deliberate combinations |
| self-join | a table joined to itself (two aliases) | adjacency: manager/employee, etc. |
Building multi-table queries
- Alias every table (
o,c,oi,p); qualify columns (o.order_id). - Follow the foreign keys outward:
orders → order_items → products → categories, pluscustomers,employees. - The
ONclause is the match (σ); it can be any condition (equi- or non-equi join).USING(col)is shorthand when names match.
The five classic mistakes
- Accidental Cartesian product — missing/wrong
ON→ far too many rows. - Fan-out (one-to-many over-counting) — joining grains then
SUM-ing a coarser value triples it; pre-aggregate to the right grain, don't paper over withDISTINCT. (Case Study 1.) - Wrong join type —
INNERwhere you neededLEFTsilently drops no-match rows and biases counts/averages. (Case Study 2.) - Ambiguous column — qualify with the alias.
- Outer-join
WHEREtrap — a condition on the right table inWHEREturnsLEFTback intoINNER. Filter the optional side inON, the preserved side inWHERE.
Two patterns to memorize
- Anti-join:
A LEFT JOIN B … WHERE B.key IS NULL→ "A with no B." - Include the zeros:
A LEFT JOIN B … COALESCE(SUM(B.val), 0)→ per-A totals counting empty A's as 0.
Always ask, of every join
"What does one row represent now (the grain)?" and "Do the no-match rows matter to this question?" The first guards against fan-out; the second tells you
INNERvs.LEFT.
Performance (theme #5)
Index the columns you join on — especially foreign keys. An unindexed join on a large table is a classic 45-second query. The optimizer chooses nested-loop / hash / merge joins; you'll read its choice with EXPLAIN in Ch. 24.
You can now…
- ☐ Write every join type, including self-joins.
- ☐ Build 4–6 table queries by chaining two-table joins.
- ☐ Use the anti-join and "include the zeros" patterns.
- ☐ Diagnose Cartesian products, fan-out, and the outer-join
WHEREtrap. - ☐ Reason about grain and no-match rows before trusting a result.
Looking ahead
Chapter 7 — Aggregation. COUNT, SUM, AVG, GROUP BY, HAVING — condensing joined rows into the totals and averages that answer business questions (and the right place to fix fan-out).
One sentence to carry forward: Joins reassemble well-designed data — but every join has a grain and a choice of "keep the non-matches or not," and getting those two things right is the difference between a correct answer and a confident wrong one.