Chapter 7: Key Takeaways
The Five-Minute Review
If you have five minutes before a meeting, a test, or a code review, read these points. They capture the most important and most commonly misunderstood concepts from Chapter 7.
1. COUNT Has Three Distinct Forms
- COUNT(*) counts rows. Always. Regardless of NULLs.
- COUNT(column) counts non-NULL values in that column.
- COUNT(DISTINCT column) counts unique non-NULL values.
Confusing these three is one of the most common production SQL bugs. When in doubt, ask yourself: "Am I counting rows, non-null values, or unique values?"
2. All Aggregates (Except COUNT(*)) Ignore NULLs
SUM, AVG, MIN, MAX, STDDEV, and VARIANCE all skip NULL values. They operate only on the rows where the column has a value. This means:
- AVG divides by the count of non-NULL values, not total rows.
- SUM over an empty set (or all NULLs) returns NULL, not zero.
- Use COALESCE(SUM(...), 0) whenever a sum might operate on an empty set.
- Use AVG(COALESCE(column, 0)) only when you explicitly want NULLs treated as zeros.
3. The GROUP BY Rule Is Non-Negotiable
Every column in the SELECT list must either: - Appear in the GROUP BY clause, or - Be inside an aggregate function.
There are no exceptions. DB2 enforces this at parse time.
4. WHERE Filters Rows; HAVING Filters Groups
| Clause | When it runs | What it can reference | Use for |
|---|---|---|---|
| WHERE | Before GROUP BY | Individual columns | Row-level filters |
| HAVING | After GROUP BY | Aggregate functions | Group-level filters |
If a filter does not involve an aggregate, put it in WHERE. It is both syntactically correct and more performant.
5. ROLLUP, CUBE, and GROUPING SETS Are Shortcuts
| Technique | Levels Produced | Pattern |
|---|---|---|
| ROLLUP(a, b, c) | N+1 = 4 | (a,b,c), (a,b), (a), () |
| CUBE(a, b, c) | 2^N = 8 | Every possible combination |
| GROUPING SETS(...) | Custom | Exactly what you specify |
ROLLUP is for hierarchies (drill-down reports). CUBE is for cross-tabulation. GROUPING SETS is for when you need specific combinations that neither ROLLUP nor CUBE produces exactly.
6. Use GROUPING() to Identify Summary Rows
GROUPING(column) returns 1 when the column's NULL indicates a summary row (not real NULL data). Pair it with CASE to produce clean labels:
CASE GROUPING(branch_name)
WHEN 1 THEN '--- All Branches ---'
ELSE branch_name
END
7. LEFT JOIN + Aggregation Requires Care
When using LEFT JOIN with GROUP BY: - Use COUNT(right_table.column) instead of COUNT() to get correct zero counts. - Place right-table filters in the ON clause, not WHERE, to preserve unmatched rows. - Wrap aggregates in COALESCE* to prevent NULL totals for unmatched groups.
8. The Standard Reporting Pattern
The majority of real-world reporting queries follow this template:
SELECT descriptive_columns, aggregate_functions
FROM main_table
JOIN lookup_tables ON ...
WHERE row_level_filters
GROUP BY descriptive_columns
HAVING group_level_filters
ORDER BY presentation_order
Join first to get human-readable names. Filter before grouping. Group. Filter groups. Order for presentation.
9. Performance Essentials
- Filter early: WHERE before GROUP BY reduces the data that needs to be sorted or hashed.
- Covering indexes: If every column in the query exists in an index, DB2 can use index-only access.
- SORTHEAP: GROUP BY often triggers sorts. If sorts overflow to disk, increase the SORTHEAP parameter.
- Materialized Query Tables (MQTs): For frequently-run reports on large tables, pre-compute the aggregation and query the summary instead.
10. Conditional Aggregation Is Your Most Versatile Tool
SUM(CASE WHEN condition THEN value ELSE 0 END)
This pattern lets you split one column into multiple aggregated buckets in a single pass. Use it for credits vs. debits, checking vs. savings breakdowns, and any report that pivots data without a PIVOT operator.
One-Line Summaries by Section
| Section | One-Line Summary |
|---|---|
| 7.1 | Aggregate functions collapse many rows into one value; know the three forms of COUNT. |
| 7.2 | GROUP BY partitions rows into buckets; every SELECT column must be grouped or aggregated. |
| 7.3 | WHERE filters rows before grouping; HAVING filters groups after aggregation. |
| 7.4 | All aggregates except COUNT(*) ignore NULLs; use COALESCE defensively. |
| 7.5 | GROUPING SETS produce custom multi-level summaries in one pass. |
| 7.6 | ROLLUP produces hierarchical subtotals: N columns yield N+1 levels. |
| 7.7 | CUBE produces every combination: N columns yield 2^N levels. |
| 7.8 | GROUPING() distinguishes summary NULLs from data NULLs. |
| 7.9 | Join tables for descriptive columns, then aggregate — the standard reporting pattern. |
| 7.10 | Meridian Bank reports combine conditional SUM, ROLLUP, and multi-table joins. |
| 7.11 | Filter early, use covering indexes, size SORTHEAP, and consider MQTs for large tables. |