Chapter 7 — Exercises

Run against mercado. Aggregation is the analyst's core skill — do these until GROUP BY is automatic. (answer in Appendix) = worked solution in Answers. ⭐ = stretch.


Group A — Whole-table aggregates

7.1 How many products are in the catalog? (answer in Appendix)

7.2 What are the minimum, maximum, and average product price (round the average to 2 decimals)?

7.3 How many customers have a phone number on file? Compare COUNT(*) to COUNT(phone) and explain the difference. (answer in Appendix)

7.4 How many distinct loyalty tiers actually appear in customers?

7.5 ⭐ What is the total catalog value (sum of all prices)? Then the total of only active products.


Group B — GROUP BY

7.6 Count customers per loyalty tier, most common tier first. (answer in Appendix)

7.7 For each category_id, show the number of products and their average price.

7.8 Count orders per status.

7.9 For each product, show its average review rating and the number of reviews (join reviews). (answer in Appendix)

7.10 ⭐ Average product price per (category_id, is_active) combination.


Group C — HAVING

7.11 Which loyalty tiers have more than 2 customers? (answer in Appendix)

7.12 Which categories have an average product price above $100?

7.13 Which products have received more than one review?

7.14 ⭐ Among active products only (WHERE), which categories have 2 or more products (HAVING)? Show count and average price.


Group D — WHERE vs HAVING

7.15 Explain why this errors, and fix it: (answer in Appendix)

SELECT loyalty_tier, COUNT(*) FROM customers WHERE COUNT(*) > 2 GROUP BY loyalty_tier;

7.16 Rewrite "categories whose average active-product price exceeds $100" putting the row filter in WHERE and the group filter in HAVING.

7.17 ⭐ A query uses HAVING is_active = true. Why does it run but is poor style? Where should that condition go, and why?


Group E — Aggregation across joins

7.18 Total revenue (SUM(quantity * unit_price)) per customer, including customers with no orders (show them as 0). Use LEFT JOIN + COALESCE. (answer in Appendix)

7.19 Number of orders per customer. Use COUNT(DISTINCT order_id) and explain why COUNT(*) would be wrong if you also join order_items.

7.20 Revenue per category (join order_items → products → categories), highest first.

7.21 ⭐ Average order value: total revenue divided by number of distinct orders. (Be careful about fan-out and about which denominator you use.)


Group F — Predict / fix

7.22 Predict the difference, then run both: (answer in Appendix)

SELECT AVG(discount) FROM order_items;
SELECT AVG(COALESCE(discount, 0)) FROM order_items;

(In Mercado discount is NOT NULL with default 0 — so reason about the general case where it could be NULL.)

7.23 This "number of orders per customer" double-counts. Find and fix the bug:

SELECT o.customer_id, COUNT(*) AS orders
FROM orders o JOIN order_items oi ON oi.order_id = o.order_id
GROUP BY o.customer_id;

7.24 ⭐ Explain why grouping by customer_id lets you also SELECT first_name without aggregating it, but grouping by loyalty_tier does not let you SELECT first_name.


Group G — Advanced grouping

7.25 Revenue per category with a grand-total row using ROLLUP. (answer in Appendix)

7.26 ⭐ Using GROUPING SETS, produce, in one query: revenue by category, and revenue by order status, and the grand total.


Group H — Progressive project

7.27 Write three summaries for your domain: one whole-table aggregate, one GROUP BY, one GROUP BY + HAVING.

7.28 ⭐ Write one summary across a join that uses LEFT JOIN + COALESCE (include empty groups) and COUNT(DISTINCT ...) (defeat fan-out).


Self-check. If you can write GROUP BY ... HAVING from a plain-English request, explain WHERE vs HAVING, and aggregate correctly across a fan-out join, you've got the analyst's core skill. Next: the functions that compute and clean the values you aggregate.