Chapter 4 — Exercises

A mix of paper exercises (translate between algebra and SQL) and hands-on ones (run the SQL against mercado). Don't worry about perfect Greek notation — capturing the operations is what matters. (answer in Appendix) = worked solution in Answers. ⭐ = stretch.


Group A — Identify the operation

For each English description, name the relational-algebra operation (selection, projection, product, join, union, intersection, difference, rename) and the SQL clause it maps to.

4.1 "Keep only the products costing more than $200." (answer in Appendix)

4.2 "Show only the name and email columns of customers."

4.3 "Pair each order with its customer's details."

4.4 "List the emails of customers who have never placed an order."

4.5 "Compare each employee to their manager (same table, twice)."

4.6 ⭐ "Products that appear in both the 'on sale' list and the 'best sellers' list." Which set operation?


Group B — Algebra → SQL

Write the SQL for each algebra expression (against Mercado).

4.7 π first_name, last_name (σ loyalty_tier = 'gold' (customers)) (answer in Appendix)

4.8 σ price > 100 AND is_active = true (products)

4.9 π name, price (products ⋈ (products.category_id = categories.category_id) categories), keeping only category 'Books'

4.10 ⭐ π email (customers) − π email (customers ⋈ orders) — "customers who never ordered." (Hint: you can use EXCEPT, or an anti-join with NOT EXISTS — try the EXCEPT version.)


Group C — SQL → algebra

For each query, write the algebra expression (loosely) and label each operation.

4.11

SELECT sku, price FROM products WHERE price < 50;

(answer in Appendix)

4.12

SELECT o.order_id, c.email
FROM orders o
JOIN customers c ON c.customer_id = o.customer_id
WHERE o.status = 'delivered';

4.13

SELECT DISTINCT category_id FROM products;

What does the DISTINCT correspond to in pure relational algebra, and why does SQL need it here?


Group D — Joins as product + selection

4.14 Run SELECT count(*) FROM customers; and SELECT count(*) FROM orders;. Multiply the two counts. Now run SELECT count(*) FROM customers CROSS JOIN orders;. Does it match your product? Explain what the Cartesian product produced. (answer in Appendix)

4.15 Add a join condition to turn that Cartesian product into a meaningful join:

SELECT count(*) FROM customers c JOIN orders o ON o.customer_id = c.customer_id;

Why is the count now equal to the number of orders rather than the product?

4.16 ⭐ Explain, in terms of σ and ×, exactly what JOIN ... ON o.customer_id = c.customer_id is doing. Why does forgetting the ON condition produce a row explosion?


Group E — Optimization & equivalence

4.17 Two expressions: (a) σ customer_id = 5 (orders ⋈ customers) and (b) (σ customer_id = 5 (orders)) ⋈ customers. Are they equivalent? Which is cheaper to compute, and why? (answer in Appendix)

4.18 The optimizer "pushes selection down." In your own words, what does that mean and why does it help?

4.19 ⭐ Give one other algebra-preserving rewrite the optimizer might perform (besides pushing selection down). Why must such a rewrite preserve the result exactly?


Group F — Sets vs. bags

4.20 Run both and explain the difference in row counts:

SELECT category_id FROM products;
SELECT DISTINCT category_id FROM products;

(answer in Appendix)

4.21 What is the difference between UNION and UNION ALL? Which matches pure relational algebra, and which is faster?

4.22 ⭐ Why does SQL default to bag semantics (keeping duplicates) instead of set semantics? Give a performance reason.


Group G — Progressive project

4.23 Pick two questions your project must answer. For each, sketch the algebra: which tables (join?), which rows (σ), which columns (π)?

4.24 Is any of your questions a set question ("X but not Y," "X in both A and B")? If so, identify which set operation it needs.

4.25 ⭐ For one question, write two equivalent ways to express it (e.g., a join vs. a subquery, or filtering before vs. after a join). You'll later test which the optimizer prefers (Chapter 24).


Self-check. If you can translate fluidly between an English question, its algebra, and SQL — and explain a join as product-plus-selection — you're ready for Part II. The algebra is the skeleton; Chapter 5 starts putting flesh on it.