Chapter 10 Key Takeaways

The One-Sentence Summary

Window functions, recursive CTEs, temporal tables, and OLAP grouping extensions let you reason about relationships between rows — across partitions, through hierarchies, over time, and at multiple levels of aggregation — transforming SQL from a row-retrieval language into a complete analytical platform.


Window Functions

  1. Window functions preserve rows; GROUP BY collapses them. A window function adds a computed column to every row without reducing the result set. This is the fundamental distinction.

  2. The OVER clause has three components: PARTITION BY (divides rows into independent groups), ORDER BY (defines sequence within each partition), and the frame specification (defines which rows the function can "see").

  3. Window functions execute after WHERE, GROUP BY, and HAVING. You cannot use a window function in a WHERE clause. To filter on a window function result, wrap it in a CTE or subquery.

  4. Always specify the frame explicitly when computing running totals. The default frame with ORDER BY is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which includes all rows sharing the same ORDER BY value — not a row-by-row running total. Use ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW with a tiebreaker in ORDER BY.

Ranking Functions

  1. Choose the right ranking function for the business requirement: - ROW_NUMBER: unique sequential numbers (pagination, deduplication) - RANK: competition-style ranking (gaps after ties) - DENSE_RANK: contiguous ranking (no gaps after ties) - NTILE(n): equal-sized buckets (percentile analysis)

  2. ROW_NUMBER is nondeterministic for ties. When two rows have the same ORDER BY value, which gets the lower number is unpredictable. Add a tiebreaker column to ORDER BY if determinism matters.

Aggregate Window Functions and Frames

  1. Any aggregate function becomes a window function by adding OVER. SUM, AVG, COUNT, MIN, MAX — all work with OVER, PARTITION BY, ORDER BY, and frame specifications.

  2. ROWS counts physical positions; RANGE compares values. ROWS BETWEEN 6 PRECEDING AND CURRENT ROW is always 7 rows. RANGE BETWEEN 100 PRECEDING AND CURRENT ROW includes all rows whose ORDER BY value is within 100 of the current row — the count varies.

  3. LAST_VALUE requires an explicit frame. Without ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING, the default frame ends at CURRENT ROW, making LAST_VALUE return the current row's own value.

LAG, LEAD, and Value Functions

  1. LAG looks backward; LEAD looks forward. Both accept an offset (default 1) and an optional default value for when the offset exceeds the partition boundary.

  2. LAG enables period-over-period comparisons without self-joins. Month-over-month, year-over-year, and sequential gap analysis all become single-pass operations.

Recursive CTEs

  1. Recursive CTEs have two parts: the anchor member (base case) and the recursive member (self-referencing step). They are connected by UNION ALL. The engine iterates until the recursive member returns no new rows.

  2. Always include a depth limit. Add WHERE level < N in the recursive member to prevent runaway recursion from data quality issues or unexpected cycles.

  3. Cycle detection requires tracking visited nodes. Build a path string and check it with LOCATE before recursing into a node. This is a safety net for hierarchical data that might contain circular references.

  4. Recursive CTEs materialize intermediate results. Keep CTE columns narrow and hierarchies shallow for performance. For frequently queried hierarchies, consider materializing paths in the base table.

Temporal Tables

  1. Three temporal models serve three purposes:

    • System-time: automatic audit trail (when the database changed)
    • Business-time: application-managed validity periods (when facts are effective)
    • Bitemporal: both dimensions (the regulatory gold standard)
  2. System-time versioning is automatic and transparent. Once enabled, every UPDATE and DELETE preserves the previous version in the history table. No application changes are required.

  3. FOR SYSTEM_TIME AS OF is "time travel." It returns the row version that was active at a specific timestamp, transparently querying both current and history tables.

  4. Business-time WITHOUT OVERLAPS enforces data integrity. The constraint guarantees that no two rows with the same key prefix have overlapping validity periods.

  5. Bitemporal queries use two FOR clauses. FOR SYSTEM_TIME AS OF selects the database snapshot; FOR BUSINESS_TIME AS OF selects the business-effective version within that snapshot. This answers "what did we believe at time X about what was true at time Y?"

OLAP Grouping Extensions

  1. ROLLUP produces hierarchical subtotals. ROLLUP(A, B, C) generates groupings (A,B,C), (A,B), (A), and () — progressively removing columns from right to left.

  2. CUBE produces all possible subtotal combinations. CUBE(A, B) generates (A,B), (A), (B), and () — useful for cross-tabulation analysis.

  3. GROUPING SETS gives explicit control. Specify exactly which groupings you need — no more, no less. More efficient than CUBE when you only need specific combinations.

  4. Use GROUPING() to distinguish subtotal NULLs from real NULLs. The GROUPING() function returns 1 when a column's NULL represents "all values" in a subtotal row.

The Meta-Lesson

  1. These features change how you think, not just what you can write. Once you internalize window functions, you will see opportunities in old code that you previously solved with subqueries, self-joins, and application logic. That perceptual shift — from "process rows" to "reason about relationships between rows" — is the true takeaway of this chapter.