Chapter 7: Key Takeaways

DataField.Dev

Chapter 7: Key Takeaways

The Five-Minute Review

If you have five minutes before a meeting, a test, or a code review, read these points. They capture the most important and most commonly misunderstood concepts from Chapter 7.

1. COUNT Has Three Distinct Forms

COUNT(*) counts rows. Always. Regardless of NULLs.
COUNT(column) counts non-NULL values in that column.
COUNT(DISTINCT column) counts unique non-NULL values.

Confusing these three is one of the most common production SQL bugs. When in doubt, ask yourself: "Am I counting rows, non-null values, or unique values?"

2. All Aggregates (Except COUNT(*)) Ignore NULLs

SUM, AVG, MIN, MAX, STDDEV, and VARIANCE all skip NULL values. They operate only on the rows where the column has a value. This means:

AVG divides by the count of non-NULL values, not total rows.
SUM over an empty set (or all NULLs) returns NULL, not zero.
Use COALESCE(SUM(...), 0) whenever a sum might operate on an empty set.
Use AVG(COALESCE(column, 0)) only when you explicitly want NULLs treated as zeros.

3. The GROUP BY Rule Is Non-Negotiable

Every column in the SELECT list must either: - Appear in the GROUP BY clause, or - Be inside an aggregate function.

There are no exceptions. DB2 enforces this at parse time.

4. WHERE Filters Rows; HAVING Filters Groups

Clause	When it runs	What it can reference	Use for
WHERE	Before GROUP BY	Individual columns	Row-level filters
HAVING	After GROUP BY	Aggregate functions	Group-level filters

If a filter does not involve an aggregate, put it in WHERE. It is both syntactically correct and more performant.

5. ROLLUP, CUBE, and GROUPING SETS Are Shortcuts

Technique	Levels Produced	Pattern
ROLLUP(a, b, c)	N+1 = 4	(a,b,c), (a,b), (a), ()
CUBE(a, b, c)	2^N = 8	Every possible combination
GROUPING SETS(...)	Custom	Exactly what you specify

ROLLUP is for hierarchies (drill-down reports). CUBE is for cross-tabulation. GROUPING SETS is for when you need specific combinations that neither ROLLUP nor CUBE produces exactly.

6. Use GROUPING() to Identify Summary Rows

GROUPING(column) returns 1 when the column's NULL indicates a summary row (not real NULL data). Pair it with CASE to produce clean labels:

CASE GROUPING(branch_name)
    WHEN 1 THEN '--- All Branches ---'
    ELSE branch_name
END

7. LEFT JOIN + Aggregation Requires Care

When using LEFT JOIN with GROUP BY: - Use COUNT(right_table.column) instead of COUNT() to get correct zero counts. - Place right-table filters in the ON clause, not WHERE, to preserve unmatched rows. - Wrap aggregates in COALESCE* to prevent NULL totals for unmatched groups.

8. The Standard Reporting Pattern

The majority of real-world reporting queries follow this template:

SELECT descriptive_columns, aggregate_functions
FROM main_table
JOIN lookup_tables ON ...
WHERE row_level_filters
GROUP BY descriptive_columns
HAVING group_level_filters
ORDER BY presentation_order

Join first to get human-readable names. Filter before grouping. Group. Filter groups. Order for presentation.

9. Performance Essentials

Filter early: WHERE before GROUP BY reduces the data that needs to be sorted or hashed.
Covering indexes: If every column in the query exists in an index, DB2 can use index-only access.
SORTHEAP: GROUP BY often triggers sorts. If sorts overflow to disk, increase the SORTHEAP parameter.
Materialized Query Tables (MQTs): For frequently-run reports on large tables, pre-compute the aggregation and query the summary instead.

10. Conditional Aggregation Is Your Most Versatile Tool

SUM(CASE WHEN condition THEN value ELSE 0 END)

This pattern lets you split one column into multiple aggregated buckets in a single pass. Use it for credits vs. debits, checking vs. savings breakdowns, and any report that pivots data without a PIVOT operator.

One-Line Summaries by Section

Section	One-Line Summary
7.1	Aggregate functions collapse many rows into one value; know the three forms of COUNT.
7.2	GROUP BY partitions rows into buckets; every SELECT column must be grouped or aggregated.
7.3	WHERE filters rows before grouping; HAVING filters groups after aggregation.
7.4	All aggregates except COUNT(*) ignore NULLs; use COALESCE defensively.
7.5	GROUPING SETS produce custom multi-level summaries in one pass.
7.6	ROLLUP produces hierarchical subtotals: N columns yield N+1 levels.
7.7	CUBE produces every combination: N columns yield 2^N levels.
7.8	GROUPING() distinguishes summary NULLs from data NULLs.
7.9	Join tables for descriptive columns, then aggregate — the standard reporting pattern.
7.10	Meridian Bank reports combine conditional SUM, ROLLUP, and multi-table joins.
7.11	Filter early, use covering indexes, size SORTHEAP, and consider MQTs for large tables.