Chapter 4 — Key Takeaways

The big idea

Relational algebra is the small set of operations — taking relations in, returning relations out — that every SQL query is built from. Learn to see the algebra inside the SQL, and queries become compositions you reason about instead of incantations you memorize. This is the bridge from the model (Ch. 3) to fluent SQL (Part II).

The operations and their SQL

Algebra Does SQL
σ selection keep rows matching a condition WHERE
π projection keep chosen columns SELECT list (+ DISTINCT for true set)
× Cartesian product every pairing of two relations CROSS JOIN / FROM a, b
join product + selection on a match JOIN ... ON
∪ ∩ − set ops union / intersection / difference UNION / INTERSECT / EXCEPT
ρ rename rename relation/attributes AS / aliases (essential for self-joins)
extended grouping + aggregation, outer joins GROUP BY, LEFT JOIN, …

⚠️ False friend: SQL's SELECT list is projection (π); relational selection (σ) is the WHERE clause.

Big ideas to keep

  • Closure (relations in → relations out) is why subqueries, CTEs, and views work — you can query the result of a query.
  • A join is a Cartesian product followed by a selection. A missing join condition → an accidental full product (a row explosion). (Ch. 6 prevents this.)
  • "Never / none / not any" is a difference (anti-join), not a row filter. Row filters (WHERE) can't express the absence of a related row — use EXCEPT, NOT EXISTS, or LEFT JOIN … IS NULL. (Case Study 2.)
  • Optimization = equivalent algebra, cheaper plan. The optimizer rewrites your expression (e.g., pushing selection down — filter before joining) to a faster but equal form. Equivalence guarantees the same answer, not the same cost. (Case Study 1.)
  • SQL works on bags (duplicates allowed), pure algebra on sets. Hence DISTINCT and the UNION vs. UNION ALL distinction.

Why it pays off

  • Reading a query plan (Ch. 24) = reading an algebra expression the optimizer chose.
  • Writing filter-early shapes gives the optimizer better options — a performance instinct (theme #5).
  • Classifying a question by its operation (join? difference? intersection?) makes the SQL mechanical.

You can now…

  • ☐ Name the core operations and map each to its SQL clause.
  • ☐ Explain a join as product + selection (and why a missing ON explodes rows).
  • ☐ Recognize "never bought Y"–style questions as set differences / anti-joins.
  • ☐ Explain closure and how it enables subqueries/CTEs/views.
  • ☐ Explain how equivalent algebra expressions drive query optimization.
  • ☐ Distinguish set vs. bag semantics (DISTINCT, UNION ALL).

Looking ahead — and a Part I milestone

Part I is complete. You understand databases, you have PostgreSQL + Mercado running, and you know the relational model and the algebra beneath SQL.

Chapter 5 (Part II begins) — SELECT, FROM, WHERE. You start writing real queries in earnest, and the algebra is right there under the syntax: FROM chooses the relation, WHERE is σ, the SELECT list is π. Time to become fluent.

One sentence to carry forward: Every SQL query is a composition of a few simple operations on relations — and once you can see those operations, you can read, write, and reason about any query.