Chapter 7 — Key Takeaways

The big idea

Aggregation condenses rows into summaries — the numbers that answer business questions. Aggregate functions collapse rows; GROUP BY does it per group; HAVING filters groups. With joins, this answers almost any analytical question.

Aggregate functions

  • COUNT(*) = rows; COUNT(col) = non-NULL values; COUNT(DISTINCT col) = distinct non-NULL values.
  • SUM, AVG, MIN, MAX skip NULLs — use COALESCE(col, 0) if "missing" should mean "zero."

GROUP BY

  • Produces one row per distinct group (single or multi-column).
  • Cardinal rule: every SELECT-list column must be inside an aggregate or in GROUP BY (or functionally determined by a grouped PK).

WHERE vs HAVING

Clause Filters When
WHERE rows before grouping (can't use aggregates)
HAVING groups after aggregation (uses aggregates)

Put row conditions in WHERE (faster, clearer); put aggregate conditions in HAVING.

Full logical evaluation order

FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY → LIMIT

Explains why WHERE can't see aggregates, HAVING can, and ORDER BY can use SELECT aliases.

Aggregating across joins — two rules

  • Include the empties: LEFT JOIN + COALESCE(SUM(...), 0) so zero-groups show as 0. (Ch. 6.)
  • Defeat fan-out: a one-to-many join multiplies rows, so COUNT(*) counts the "many." Use COUNT(DISTINCT key) (or aggregate the "one" table separately) to count the "one." (Case Study 2.)

Two analytics traps

  • Average of averages ≠ overall average. Averaging per-group averages weights groups equally regardless of size (Simpson's paradox). Compute metrics at their natural grain: SUM(value)/COUNT(*), and always show counts beside averages. (Case Study 1.)
  • Mixing grains in one query makes at least one metric wrong. Count orders at order-grain, sum revenue at line-item grain.

Advanced grouping

GROUP BY ROLLUP (...) adds a grand total; CUBE adds all-combination subtotals; GROUPING SETS picks exact levels — for dashboards and pivot reports.

Performance (theme #5)

Filter with WHERE before grouping; index grouped/filtered columns; cache hot summaries with materialized views (Ch. 15). Quantify with EXPLAIN (Ch. 24).

You can now…

  • ☐ Use all aggregates and the three COUNT forms.
  • ☐ Write GROUP BY (obeying the cardinal rule) and HAVING.
  • ☐ Distinguish WHERE vs HAVING and state the full evaluation order.
  • ☐ Aggregate across joins with LEFT JOIN+COALESCE and COUNT(DISTINCT …).
  • ☐ Avoid the average-of-averages and mixed-grain traps; validate against a join-free baseline.
  • ☐ Produce subtotals with ROLLUP/CUBE/GROUPING SETS.

Looking ahead

Chapter 8 — Built-in Functions. The string, numeric, date/time, and type-conversion functions you'll use to clean and compute the values you select, filter, group, and aggregate.

One sentence to carry forward: Every aggregate has a grain and a NULL behavior — get those right, show your denominators, and check against the one-line baseline, and your summaries will be trustworthy.