Chapter 12 — Key Takeaways

The big idea

Window functions compute over a set of related rows (a window) without collapsing them — the analytics GROUP BY can't do. They replace whole categories of slow self-joins and correlated subqueries with single, fast, one-pass expressions.

The OVER clause

function(...) OVER (PARTITION BY <cols> ORDER BY <cols> <frame>)
  • PARTITION BY — groups without collapsing (the row-preserving GROUP BY).
  • ORDER BY — orders within the partition (needed for running totals, ranking, LAG/LEAD).
  • OVER () — whole result is one window (grand total, % of total).

The function families

  • Ranking: ROW_NUMBER (unique), RANK (ties skip: 1,1,3), DENSE_RANK (ties no gap: 1,1,2), NTILE(n) (buckets).
  • Windowed aggregates: SUM/AVG/COUNT OVER (...) — group totals alongside rows; with ORDER BYrunning totals.
  • Offset: LAG/LEAD — previous/next row (period-over-period).
  • Positional: FIRST_VALUE/LAST_VALUE/NTH_VALUE.

Patterns to memorize

  • Top-N-per-group: ROW_NUMBER() OVER (PARTITION BY g ORDER BY x DESC) in a subquery/CTE, then WHERE rn <= N.
  • % of total: 100.0 * x / SUM(x) OVER ().
  • Running total: SUM(x) OVER (ORDER BY t).
  • Period-over-period: LAG(x) OVER (ORDER BY t).
  • Moving average: AVG(x) OVER (ORDER BY t ROWS BETWEEN n PRECEDING AND CURRENT ROW).

Two essential gotchas

  • Can't filter a window function in WHERE (it's computed after WHERE). Wrap-and-filter: compute in a subquery/CTE, filter outside. (The top-N pattern.)
  • Default frame is cumulative (... UNBOUNDED PRECEDING AND CURRENT ROW), so LAST_VALUE returns the current row. Use an explicit full frame, or FIRST_VALUE(... ORDER BY t DESC). (Case Study 2.)
  • ROWS (physical rows) vs RANGE (peer values) frames differ on ties.

Performance (theme #5)

One ordered pass instead of the O(n²) of a correlated-subquery running total (Case Study 1). A matching index can avoid the sort. When you reach for a self-join "to compare a row to its group," reach for a window function instead.

You can now…

  • ☐ Compute per-group values without collapsing rows (PARTITION BY).
  • ☐ Rank and do top-N-per-group (wrap-and-filter).
  • ☐ Compute running totals, % of total, and moving averages (frames).
  • ☐ Compare to adjacent rows with LAG/LEAD.
  • ☐ Avoid the WHERE-filter and default-frame/LAST_VALUE traps.

Looking ahead

Chapter 13 — Data Modification. INSERT, UPDATE, DELETE, TRUNCATE, RETURNING, UPSERT. You've read data for eight chapters; now you change it — safely (an UPDATE without WHERE is a production horror story).

One sentence to carry forward: Whenever you want to compute something across rows while keeping every row, it's a window function — just mind the default frame and that you can't filter it in WHERE.