Chapter 4 — Key Takeaways
The big idea
Relational algebra is the small set of operations — taking relations in, returning relations out — that every SQL query is built from. Learn to see the algebra inside the SQL, and queries become compositions you reason about instead of incantations you memorize. This is the bridge from the model (Ch. 3) to fluent SQL (Part II).
The operations and their SQL
| Algebra | Does | SQL |
|---|---|---|
| σ selection | keep rows matching a condition | WHERE |
| π projection | keep chosen columns | SELECT list (+ DISTINCT for true set) |
| × Cartesian product | every pairing of two relations | CROSS JOIN / FROM a, b |
| ⋈ join | product + selection on a match | JOIN ... ON |
| ∪ ∩ − set ops | union / intersection / difference | UNION / INTERSECT / EXCEPT |
| ρ rename | rename relation/attributes | AS / aliases (essential for self-joins) |
| extended | grouping + aggregation, outer joins | GROUP BY, LEFT JOIN, … |
⚠️ False friend: SQL's
SELECTlist is projection (π); relational selection (σ) is theWHEREclause.
Big ideas to keep
- Closure (relations in → relations out) is why subqueries, CTEs, and views work — you can query the result of a query.
- A join is a Cartesian product followed by a selection. A missing join condition → an accidental full product (a row explosion). (Ch. 6 prevents this.)
- "Never / none / not any" is a difference (anti-join), not a row filter. Row filters (
WHERE) can't express the absence of a related row — useEXCEPT,NOT EXISTS, orLEFT JOIN … IS NULL. (Case Study 2.) - Optimization = equivalent algebra, cheaper plan. The optimizer rewrites your expression (e.g., pushing selection down — filter before joining) to a faster but equal form. Equivalence guarantees the same answer, not the same cost. (Case Study 1.)
- SQL works on bags (duplicates allowed), pure algebra on sets. Hence
DISTINCTand theUNIONvs.UNION ALLdistinction.
Why it pays off
- Reading a query plan (Ch. 24) = reading an algebra expression the optimizer chose.
- Writing filter-early shapes gives the optimizer better options — a performance instinct (theme #5).
- Classifying a question by its operation (join? difference? intersection?) makes the SQL mechanical.
You can now…
- ☐ Name the core operations and map each to its SQL clause.
- ☐ Explain a join as product + selection (and why a missing
ONexplodes rows). - ☐ Recognize "never bought Y"–style questions as set differences / anti-joins.
- ☐ Explain closure and how it enables subqueries/CTEs/views.
- ☐ Explain how equivalent algebra expressions drive query optimization.
- ☐ Distinguish set vs. bag semantics (
DISTINCT,UNION ALL).
Looking ahead — and a Part I milestone
Part I is complete. You understand databases, you have PostgreSQL + Mercado running, and you know the relational model and the algebra beneath SQL.
Chapter 5 (Part II begins) — SELECT, FROM, WHERE. You start writing real queries in earnest, and the algebra is right there under the syntax: FROM chooses the relation, WHERE is σ, the SELECT list is π. Time to become fluent.
One sentence to carry forward: Every SQL query is a composition of a few simple operations on relations — and once you can see those operations, you can read, write, and reason about any query.