Chapter 7 — Key Takeaways
The big idea
Aggregation condenses rows into summaries — the numbers that answer business questions. Aggregate functions collapse rows; GROUP BY does it per group; HAVING filters groups. With joins, this answers almost any analytical question.
Aggregate functions
COUNT(*)= rows;COUNT(col)= non-NULL values;COUNT(DISTINCT col)= distinct non-NULL values.SUM,AVG,MIN,MAXskipNULLs — useCOALESCE(col, 0)if "missing" should mean "zero."
GROUP BY
- Produces one row per distinct group (single or multi-column).
- Cardinal rule: every
SELECT-list column must be inside an aggregate or inGROUP BY(or functionally determined by a grouped PK).
WHERE vs HAVING
| Clause | Filters | When |
|---|---|---|
WHERE |
rows | before grouping (can't use aggregates) |
HAVING |
groups | after aggregation (uses aggregates) |
Put row conditions in WHERE (faster, clearer); put aggregate conditions in HAVING.
Full logical evaluation order
FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY → LIMIT
Explains why WHERE can't see aggregates, HAVING can, and ORDER BY can use SELECT aliases.
Aggregating across joins — two rules
- Include the empties:
LEFT JOIN+COALESCE(SUM(...), 0)so zero-groups show as 0. (Ch. 6.) - Defeat fan-out: a one-to-many join multiplies rows, so
COUNT(*)counts the "many." UseCOUNT(DISTINCT key)(or aggregate the "one" table separately) to count the "one." (Case Study 2.)
Two analytics traps
- Average of averages ≠ overall average. Averaging per-group averages weights groups equally regardless of size (Simpson's paradox). Compute metrics at their natural grain:
SUM(value)/COUNT(*), and always show counts beside averages. (Case Study 1.) - Mixing grains in one query makes at least one metric wrong. Count orders at order-grain, sum revenue at line-item grain.
Advanced grouping
GROUP BY ROLLUP (...) adds a grand total; CUBE adds all-combination subtotals; GROUPING SETS picks exact levels — for dashboards and pivot reports.
Performance (theme #5)
Filter with WHERE before grouping; index grouped/filtered columns; cache hot summaries with materialized views (Ch. 15). Quantify with EXPLAIN (Ch. 24).
You can now…
- ☐ Use all aggregates and the three
COUNTforms. - ☐ Write
GROUP BY(obeying the cardinal rule) andHAVING. - ☐ Distinguish
WHEREvsHAVINGand state the full evaluation order. - ☐ Aggregate across joins with
LEFT JOIN+COALESCEandCOUNT(DISTINCT …). - ☐ Avoid the average-of-averages and mixed-grain traps; validate against a join-free baseline.
- ☐ Produce subtotals with
ROLLUP/CUBE/GROUPING SETS.
Looking ahead
Chapter 8 — Built-in Functions. The string, numeric, date/time, and type-conversion functions you'll use to clean and compute the values you select, filter, group, and aggregate.
One sentence to carry forward: Every aggregate has a grain and a NULL behavior — get those right, show your denominators, and check against the one-line baseline, and your summaries will be trustworthy.