> Where you are: Part II, Chapter 12 of 40. GROUP BY (Chapter 7) collapses rows into summaries. Window functions compute summaries while keeping every row — rankings, running totals, moving averages, "compare this row to its group." This is the most...
In This Chapter
- The thing GROUP BY can't do
- The OVER clause: defining the window
- Ranking functions
- Aggregate window functions: running totals and group totals
- LAG and LEAD: reaching adjacent rows
- FIRST_VALUE, LAST_VALUE, NTH_VALUE
- Frames: moving averages and windowed ranges
- You can't filter on a window function in WHERE
- A mental model: see the window, keep the row
- PARTITION BY, deeper
- Deduplication with ROW_NUMBER: the keep-one-per-group trick
- Named windows: defining a window once
- A performance note (preview of Chapter 24)
- Distribution and statistical window functions
- Gaps and islands: a classic window pattern
- A worked scenario: a month-over-month dashboard
- Window functions versus the alternatives
- Frames in depth: ROWS, RANGE, and GROUPS
- A field guide to window-function pitfalls
- Why this chapter matters most for analysts
- Portability, history, and the standard
- Progressive project: add analytics
- Summary
Chapter 12: Window Functions — Analytics Without Losing Row Detail
Where you are: Part II, Chapter 12 of 40.
GROUP BY(Chapter 7) collapses rows into summaries. Window functions compute summaries while keeping every row — rankings, running totals, moving averages, "compare this row to its group." This is the most underused feature in SQL, and the one that most often makes people say "I didn't know SQL could do that."Learning paths: 💻 📊 🔬 🏗️ — everyone, but 📊 analysts: this is your power tool. Master it and you'll replace dozens of awkward self-joins and correlated subqueries.
The thing GROUP BY can't do
Suppose you want, for each product, its price and the average price of its category, on the same row. With GROUP BY you can get the per-category average — but only as collapsed summary rows; you lose the individual products. To attach the group's average to every product row, you'd historically write a self-join or a correlated subquery (Chapter 9) — clumsy and often slow.
Window functions solve this directly: they compute a value over a set of rows related to the current row (a "window") without collapsing those rows. Every input row stays in the output, now carrying its computed value. That's the whole idea, and it's transformative:
SELECT name, category_id, price,
ROUND(AVG(price) OVER (PARTITION BY category_id), 2) AS category_avg,
ROUND(price - AVG(price) OVER (PARTITION BY category_id), 2) AS vs_category_avg
FROM products
ORDER BY category_id, price DESC;
name | category_id | price | category_avg | vs_category_avg
-------------------+-------------+--------+--------------+-----------------
UltraBook 16 Pro | 3 | 1899.00| 1186.50 | 712.50
UltraBook 14 | 3 | 1299.00| 1186.50 | 112.50
BudgetBook 13 | 3 | 549.00| 1186.50 | -637.50
...
Every product row remains, and each shows its category's average and how far it deviates. No join, no subquery — just AVG(...) OVER (...).
The OVER clause: defining the window
A window function is an ordinary function followed by OVER (...). The OVER clause defines which rows the function looks at — the window — using up to three parts:
function(...) OVER (
PARTITION BY <cols> -- split rows into groups (optional)
ORDER BY <cols> -- order within each group (optional)
<frame> -- which rows in the group count (optional)
)
PARTITION BYdivides rows into groups (likeGROUP BY, but without collapsing). The function restarts for each partition. Omit it, and the whole result is one window.ORDER BYorders rows within each partition — essential for running totals, rankings, andLAG/LEAD.- The frame narrows the window to a range of rows relative to the current one (for moving averages). We'll get to it.
OVER () with nothing inside means "the entire result set is the window" — useful for "% of grand total."
Ranking functions
These assign a rank to each row within its partition, based on the window's ORDER BY:
SELECT name, category_id, price,
ROW_NUMBER() OVER (PARTITION BY category_id ORDER BY price DESC) AS rn,
RANK() OVER (PARTITION BY category_id ORDER BY price DESC) AS rnk,
DENSE_RANK() OVER (PARTITION BY category_id ORDER BY price DESC) AS dense
FROM products
ORDER BY category_id, price DESC;
ROW_NUMBER()— a unique sequential number per partition (1, 2, 3, …), even for ties.RANK()— same rank for ties, then skips (1, 1, 3).DENSE_RANK()— same rank for ties, no gaps (1, 1, 2).NTILE(n)— divides rows intonroughly equal buckets (quartiles, deciles).
The difference between ROW_NUMBER, RANK, and DENSE_RANK matters: for "the single most expensive product per category," ROW_NUMBER() = 1 gives exactly one; RANK() = 1 could give several if there's a price tie.
The top-N-per-group pattern. Window ranking is the clean solution to "the top N rows within each group" — a question that's painful with joins. Rank in a subquery/CTE, then filter:
sql SELECT * FROM ( SELECT name, category_id, price, ROW_NUMBER() OVER (PARTITION BY category_id ORDER BY price DESC) AS rn FROM products ) ranked WHERE rn <= 2; -- the 2 priciest products in each category(Why the subquery? See "you can't filter on a window function in WHERE" below.)
Aggregate window functions: running totals and group totals
Any aggregate (SUM, AVG, COUNT, MIN, MAX) becomes a window aggregate with OVER. Two big uses:
Group total alongside each row (no collapsing) — and "% of total":
SELECT name, price,
SUM(price) OVER () AS catalog_total, -- grand total on every row
ROUND(100.0 * price / SUM(price) OVER (), 2) AS pct_of_catalog
FROM products
ORDER BY price DESC;
Running total (a cumulative sum in order) — add ORDER BY inside OVER:
SELECT order_id, order_date,
SUM(amount) OVER (ORDER BY order_date) AS running_revenue
FROM payments
ORDER BY order_date;
When OVER has an ORDER BY, the aggregate becomes cumulative — it sums from the start of the partition up to the current row. Add PARTITION BY to get a running total per group (e.g., running revenue per customer).
LAG and LEAD: reaching adjacent rows
LAG and LEAD access a previous or next row within the window — the key to period-over-period comparisons ("how did this month compare to last month?"):
WITH monthly AS (
SELECT date_trunc('month', order_date) AS month,
SUM(amount) AS revenue
FROM payments
GROUP BY 1
)
SELECT month, revenue,
LAG(revenue) OVER (ORDER BY month) AS prev_month,
revenue - LAG(revenue) OVER (ORDER BY month) AS change,
ROUND(100.0 * (revenue - LAG(revenue) OVER (ORDER BY month))
/ NULLIF(LAG(revenue) OVER (ORDER BY month), 0), 1) AS pct_change
FROM monthly
ORDER BY month;
LAG(revenue) returns the previous month's revenue on each row, so you can compute the change and percent change inline. LEAD does the same looking forward. Before window functions, this required an awkward self-join on "month = month − 1"; now it's a single expression. (Note the NULLIF guard for the first month, which has no predecessor — Chapter 8.)
FIRST_VALUE, LAST_VALUE, NTH_VALUE
These return a specific row's value from within the window — e.g., each product's price and the highest price in its category, side by side:
SELECT name, category_id, price,
FIRST_VALUE(name) OVER (PARTITION BY category_id ORDER BY price DESC) AS priciest_in_category
FROM products;
Common mistake —
LAST_VALUEand the default frame.LAST_VALUEoften surprises people by returning the current row instead of the partition's last. That's because when you specifyORDER BYin a window, the default frame is "from the start up to the current row" — so the "last value" is, by default, the current row. To get the true last value, set the frame explicitly:LAST_VALUE(x) OVER (PARTITION BY ... ORDER BY ... ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING). Understanding frames (next) resolves this.
Frames: moving averages and windowed ranges
The frame narrows the window to rows relative to the current row, which is how you compute moving averages and rolling sums. The syntax is ROWS BETWEEN <start> AND <end>:
-- 3-row moving average of daily revenue (current row + 2 preceding)
WITH daily AS (
SELECT order_date::date AS day, SUM(amount) AS revenue
FROM payments GROUP BY 1
)
SELECT day, revenue,
ROUND(AVG(revenue) OVER (ORDER BY day
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW), 2) AS moving_avg_3
FROM daily
ORDER BY day;
Frame bounds: UNBOUNDED PRECEDING, n PRECEDING, CURRENT ROW, n FOLLOWING, UNBOUNDED FOLLOWING. The crucial default to remember:
- With
ORDER BYand no explicit frame, the default frame isRANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW— i.e., cumulative (this is why running totals work, and whyLAST_VALUEsurprises). - With no
ORDER BY, the frame is the whole partition.
ROWSvsRANGE.ROWScounts physical rows (exactly N preceding);RANGEgroups by value (all rows with the sameORDER BYvalue count together, andRANGEis needed for value-based windows like "within 7 days"). For most moving-average work you wantROWS. The distinction matters with ties — worth knowing, detailed in the docs.
You can't filter on a window function in WHERE
Window functions are computed at the SELECT stage — after WHERE, GROUP BY, and HAVING (recall the evaluation order, Chapter 7). So you cannot reference a window function in WHERE or HAVING:
-- ERROR: window functions are not allowed in WHERE
SELECT name, ROW_NUMBER() OVER (ORDER BY price DESC) AS rn
FROM products WHERE rn <= 3;
To filter on a window result, compute it in a subquery or CTE, then filter in the outer query (the top-N-per-group pattern above). This is the single most common window-function stumbling block — and the fix is always "wrap it and filter outside."
A mental model: see the window, keep the row
The single idea that unlocks window functions is right there in the name: each function looks through a window onto the data — a set of rows related to the current row — computes something over what it sees, and writes the answer onto the current row, which stays. Contrast GROUP BY, which gathers rows into a pile and replaces the whole pile with one summary row. A window function never replaces anything; it annotates. Every input row survives to the output, now carrying one more piece of information computed from its neighbors.
This "annotate, don't collapse" framing explains every behavior in the chapter. Why can you show a product's price and its category average on the same row? Because the window function annotated each product row with a value computed over the window of its category-mates, without disturbing the rows themselves. Why does a running total work? Because each row's window is "everything from the start up to me," and the function annotates each row with the sum over that window — which grows row by row. Why does LAG reach the previous row? Because its window includes the neighbor, and it annotates the current row with a value pulled from there. In every case, the recipe is identical: define a window relative to each row, compute over it, annotate the row.
The window itself is defined by the OVER clause in up to three independent decisions, and separating them in your mind is the key to fluency. PARTITION BY decides which rows are even eligible to be in the window — it walls off the data into groups, and a row's window can only contain rows from its own partition. ORDER BY decides the sequence of rows within the partition, which matters whenever "before" and "after" are meaningful (running totals, ranking, LAG/LEAD). The frame decides which of the ordered rows actually count — all of them, or just a sliding range around the current row. Most confusion about window functions dissolves once you ask, for any given function, three separate questions: partitioned how, ordered how, framed how? Answer those and you've fully specified what the function sees.
Holding this model — window as a per-row lens, defined by partition/order/frame, that annotates rather than collapses — turns the rest of the chapter from a list of functions to memorize into variations on one clear theme. The functions differ (rank, sum, lag, first-value), but the machinery they ride on is always the same window, and you control that machinery with the same three knobs every time.
PARTITION BY, deeper
PARTITION BY is where most of a window function's power lives, so it rewards a closer look. It works like GROUP BY in deciding the groups, but — the crucial difference, restated because it's everything — it doesn't collapse them. The function restarts at each partition boundary: a ROW_NUMBER() resets to 1 in each new partition, a running SUM starts over, a RANK re-ranks from the top. You can think of PARTITION BY category_id as saying "do this computation separately within each category, but leave all the rows in place."
You can partition by multiple columns, and the partition is then each distinct combination — exactly like multi-column GROUP BY. "Rank products by price within each (category, active-status) group" partitions by both columns, so active and inactive products in the same category are ranked separately. And you can omit PARTITION BY entirely, which makes the whole result set one big partition — the form you use for "% of grand total" (SUM(x) OVER ()), where every row should see the total over everything, not a per-group subtotal.
A subtle and powerful capability: different window functions in the same query can use different partitions. One column can show a value relative to the category while another shows a value relative to the supplier, each with its own OVER clause, all on the same row:
SELECT name, category_id, supplier_id, price,
RANK() OVER (PARTITION BY category_id ORDER BY price DESC) AS rank_in_category,
RANK() OVER (PARTITION BY supplier_id ORDER BY price DESC) AS rank_for_supplier,
ROUND(AVG(price) OVER (PARTITION BY category_id), 2) AS category_avg
FROM products
ORDER BY category_id, price DESC;
This is something GROUP BY fundamentally cannot do — a single GROUP BY collapses to one grouping, but a query can carry as many different windows as you like, each annotating the same surviving rows from a different angle. Seeing a product's standing within its category and within its supplier's catalog and against the category average, all at once, on one row, is the kind of multi-perspective analysis that makes window functions the analyst's favorite tool. Each OVER clause is independent; compose as many as the question needs.
Deduplication with ROW_NUMBER: the keep-one-per-group trick
Beyond top-N, ROW_NUMBER() solves another extremely common real-world problem so cleanly it deserves its own treatment: deduplication — keeping exactly one row per group when your data has unwanted duplicates. Suppose an import left you with several rows per customer email and you want to keep only the most recent of each. Rank within each email by recency, then keep rank 1:
WITH ranked AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY email ORDER BY created_at DESC) AS rn
FROM customers_messy
)
SELECT * FROM ranked WHERE rn = 1; -- one row per email: the newest
The PARTITION BY email makes each email its own group; the ORDER BY created_at DESC puts the newest first within each group; ROW_NUMBER() numbers them so the newest gets rn = 1; and the outer filter keeps just those. The result is exactly one row per email — the deduplication done declaratively, with full control over which duplicate survives (newest, oldest, highest-value — just change the ORDER BY). This is dramatically cleaner than the GROUP BY-and-rejoin gymnastics people often resort to, and it generalizes to any "keep the best/latest/first row per group" task.
Why ROW_NUMBER and not RANK here? Because ROW_NUMBER guarantees a unique number even when the ORDER BY values tie — so you always keep exactly one row per group, never accidentally two. (If two messy rows shared the same created_at, RANK would give both rank 1 and you'd keep both; ROW_NUMBER breaks the tie arbitrarily but decisively.) That guarantee is precisely what deduplication needs. This single pattern — partition by the identity, order by your tiebreaker, keep ROW_NUMBER() = 1 — is one of the most useful things in all of SQL, and you'll reach for it constantly in data cleaning and reporting. It's worth committing to memory whole.
Named windows: defining a window once
When several functions in a query share the same window specification, repeating the OVER (PARTITION BY ... ORDER BY ...) clause is verbose and error-prone — change it in one place and forget another, and your columns silently disagree. SQL lets you define a window once, name it, and reference it, with the WINDOW clause:
SELECT name, category_id, price,
RANK() OVER w AS rnk,
DENSE_RANK() OVER w AS dense,
ROUND(AVG(price) OVER w, 2) AS category_avg
FROM products
WINDOW w AS (PARTITION BY category_id ORDER BY price DESC)
ORDER BY category_id, price DESC;
The WINDOW w AS (...) clause (which sits after HAVING and before ORDER BY) defines the window named w, and each function references it with OVER w. Now the partition and order are stated exactly once; changing them updates every function at once, and there's no risk of three slightly-different copies drifting apart. You can even define several named windows and reference them independently, and one named window can build on another. For any query with more than one or two window functions sharing a specification, named windows are the readable, maintainable choice — the same impulse toward naming-for-clarity that CTEs embody, applied to window definitions. It's a small feature, but it markedly cleans up analytics queries, which tend to accumulate many windowed columns.
A performance note (preview of Chapter 24)
Window functions require the database to sort rows by the PARTITION BY/ORDER BY columns, so a matching index can avoid a sort. They're generally much faster than the self-joins or correlated subqueries they replace — computing a running total with a window function is one pass; doing it with a correlated subquery is O(n²). When you find yourself reaching for a self-join "to compare each row to others in its group," a window function is almost always the better answer. EXPLAIN (Chapter 24) shows the WindowAgg node.
Distribution and statistical window functions
Beyond ranking and running totals, a family of window functions describes where each row sits in a distribution — invaluable for analytics that ask "how does this compare to everything else?" PERCENT_RANK() gives each row's relative rank as a fraction from 0 to 1 (the cheapest product is near 0, the priciest near 1); CUME_DIST() gives the cumulative distribution (the fraction of rows at or below this one); and NTILE(n) slots rows into n equal-sized buckets, which is how you compute quartiles, deciles, or percentile bands.
SELECT name, price,
ROUND(PERCENT_RANK() OVER (ORDER BY price)::numeric, 2) AS pct_rank,
NTILE(4) OVER (ORDER BY price) AS price_quartile
FROM products
ORDER BY price;
NTILE(4) labels each product 1–4 by price quartile, so you can instantly segment a catalog into "budget / lower-mid / upper-mid / premium" bands without hand-picking thresholds — the data decides the boundaries. These are the tools behind customer segmentation ("which decile of spend is this customer in?"), performance bands ("top quartile of sales reps"), and outlier detection ("flag rows above the 95th percentile"). They partition just like any window function, so "quartile of price within each category" is a PARTITION BY category_id away. Together with the percentile_cont aggregate from Chapter 7, they give PostgreSQL a genuinely capable statistical vocabulary — another instance of theme #4, where the database absorbs work people often assume needs a separate analytics tool.
Gaps and islands: a classic window pattern
One of the most celebrated uses of window functions is solving the "gaps and islands" problem — finding consecutive runs (islands) and the breaks between them (gaps) in a sequence. It sounds abstract but appears constantly: consecutive days a user was active, unbroken streaks of on-time deliveries, contiguous ranges of available seat numbers, periods of uninterrupted subscription. The technique is clever and worth knowing, because once you've seen it you'll recognize the pattern everywhere.
The trick exploits a beautiful property: if you number rows sequentially with ROW_NUMBER() and subtract that from a value that should increase by one each step (like a day number), the difference stays constant within a consecutive run and jumps at each gap. Rows sharing the same difference belong to the same island. So you compute that difference, then GROUP BY it to collapse each island:
-- Find streaks of consecutive active days per customer.
WITH daily AS (
SELECT DISTINCT customer_id, order_date::date AS day FROM orders
),
grouped AS (
SELECT customer_id, day,
day - (ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY day))::int AS grp
FROM daily
)
SELECT customer_id, MIN(day) AS streak_start, MAX(day) AS streak_end,
COUNT(*) AS streak_length
FROM grouped
GROUP BY customer_id, grp
HAVING COUNT(*) >= 2
ORDER BY customer_id, streak_start;
The day - ROW_NUMBER() expression is the heart of it: within a run of consecutive days, both the day and the row number increase by one each step, so their difference is constant; the instant a day is skipped, the day jumps but the row number doesn't, so the difference changes and a new island begins. Grouping by that difference (grp) collapses each island into a single row with its start, end, and length. This is the kind of result that looks almost impossible to a beginner and falls out in a few lines once you know the pattern. It combines nearly everything in this chapter — ROW_NUMBER, partitioning, ordering — with the aggregation of Chapter 7, and it's a rite of passage in SQL mastery. Recognizing "this is a gaps-and-islands problem" when a question involves consecutive or streak or contiguous is the real skill; the technique is then reusable verbatim.
A worked scenario: a month-over-month dashboard
Let's assemble a realistic analytics dashboard query that uses several window functions together, the way a real reporting task would. The business wants, per month: revenue, the running cumulative revenue for the year, the previous month's revenue, the month-over-month percentage change, and each month's share of the year's total. Every one of those is a window function, and they compose into a single query over a monthly-aggregated base.
WITH monthly AS (
SELECT date_trunc('month', order_date) AS month,
SUM(total_amount) AS revenue
FROM orders
GROUP BY date_trunc('month', order_date)
)
SELECT
month,
revenue,
SUM(revenue) OVER (ORDER BY month) AS cumulative_revenue,
LAG(revenue) OVER (ORDER BY month) AS prev_month,
ROUND(100.0 * (revenue - LAG(revenue) OVER (ORDER BY month))
/ NULLIF(LAG(revenue) OVER (ORDER BY month), 0), 1) AS mom_pct_change,
ROUND(100.0 * revenue / SUM(revenue) OVER (), 1) AS pct_of_year
FROM monthly
ORDER BY month;
Trace each measure to its tool. The base CTE aggregates orders to monthly grain (Chapter 7) — window functions need a clean per-period base to operate over. SUM(revenue) OVER (ORDER BY month) is a running total — the cumulative-revenue column, growing month by month. LAG(revenue) OVER (ORDER BY month) reaches back to the previous month, enabling both the raw previous figure and the month-over-month percentage change (with a NULLIF guard for the first month and 100.0 to force decimal division — Chapter 8 lessons still earning their keep). And SUM(revenue) OVER () — no ORDER BY, no PARTITION BY — is the grand total over every month, used for each month's share of the year.
This one query would have required, before window functions, a tangle of self-joins (for the previous month), a correlated subquery (for the running total), and another subquery (for the grand total) — slow, verbose, and hard to verify. With window functions it's a single readable pass, each column a clear expression. This is exactly why analysts prize the feature: real dashboards are full of "compared to last period," "running total," "share of whole," and "rank against peers," and window functions express all of them natively. Build a query like this and you've built the analytical core of a real reporting dashboard.
Window functions versus the alternatives
It's worth being explicit about what window functions replace, because recognizing those situations is what prompts you to reach for the better tool. Three older patterns are the ones window functions most often supersede, and each was genuinely painful.
The self-join for comparing a row to its group — "show each product alongside its category average" done by joining the table to an aggregated copy of itself — is replaced by a single AVG(...) OVER (PARTITION BY ...). The window version is shorter, clearer, and usually faster, because it computes the average in one pass rather than materializing and joining an aggregate. The correlated subquery for per-row computation — "for each product, a subquery counting its reviews" — is replaced by a window function or a join-and-window, turning a potential O(n²) per-row re-scan into a single O(n log n) pass. And the self-join for adjacent rows — comparing each month to the previous by joining on "month = month − 1" — is replaced by LAG/LEAD, which is both simpler and immune to the off-by-one and missing-row bugs that plague the join version.
The practical heuristic: whenever you catch yourself about to join a table to itself, or to an aggregated copy of itself, or to write a correlated subquery that recomputes something per row, pause and ask whether a window function does it better. The answer is yes more often than not. This isn't to say joins and subqueries are obsolete — they remain the right tools for combining different tables and for many filtering tasks — but for the specific shape of "compute something over a group of rows while keeping the rows," window functions are almost always the superior choice on every axis: brevity, readability, and speed. Internalizing that recognition is what moves you from "I've heard of window functions" to "I instinctively reach for them," which is the leap this chapter exists to produce.
Frames in depth: ROWS, RANGE, and GROUPS
The frame is the most intricate part of window functions, and a deeper look pays off because the default frame is the source of the chapter's most notorious surprise. Recall the three frame modes. ROWS counts physical rows — "the 2 rows before this one and this one" is ROWS BETWEEN 2 PRECEDING AND CURRENT ROW, a precise count regardless of values. RANGE works by value — with an ORDER BY day, RANGE BETWEEN '7 days' PRECEDING AND CURRENT ROW includes every row whose day is within seven days, however many that is. GROUPS (newer) counts peer groups of equal ORDER BY values. For most moving-average and rolling-sum work you want ROWS, because you mean "the last N rows"; reach for RANGE when the window is genuinely value-based, like a time interval.
The critical thing to internalize is the default frame, because you get it whenever you write ORDER BY in a window without specifying a frame, and it's not always what you'd guess. With an ORDER BY present, the default is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW — everything from the partition's start up to (and including) the current row, with ties at the current value all included. This default is exactly why running totals work without you asking for a frame: each row sees everything up to itself, so the sum accumulates. And it's exactly why LAST_VALUE surprises people — "the last value in the window" is, under this default frame, the current row, because the window only extends up to the current row, not to the partition's end. To get the true partition-wide last value you must widen the frame explicitly to ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING.
-- WRONG (default frame stops at current row, so this returns the current name):
LAST_VALUE(name) OVER (PARTITION BY category_id ORDER BY price)
-- RIGHT (widen the frame to the whole partition):
LAST_VALUE(name) OVER (PARTITION BY category_id ORDER BY price
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
So the rule worth memorizing: ORDER BY in a window silently imposes a cumulative (start-to-current) frame. That one fact explains running totals, explains the LAST_VALUE trap, and tells you when you must override the frame. Ranking functions (ROW_NUMBER, RANK) and LAG/LEAD ignore the frame entirely (they're defined by position, not by a range of rows), so frames matter mainly for the aggregate window functions and the FIRST_VALUE/LAST_VALUE/NTH_VALUE family. When an aggregate-over-window returns something unexpected, the frame is the first thing to check.
A field guide to window-function pitfalls
Gathering the traps in one checklist, because window functions have a few sharp edges that catch everyone once:
Filtering on a window result in WHERE. Window functions are computed at the SELECT stage, after WHERE/GROUP BY/HAVING, so you can't reference them in those clauses. The fix is always the same: compute the window value in a subquery or CTE, then filter in the outer query (the wrap-and-filter pattern). This is the number-one window-function error, and recognizing "I need to filter on a rank" as "wrap it" is the cure.
The default-frame / LAST_VALUE surprise. As above — ORDER BY imposes a cumulative frame, so LAST_VALUE returns the current row unless you widen the frame. Whenever a windowed aggregate or value function returns something that looks "as of the current row" when you wanted "over the whole partition," suspect the default frame.
ROW_NUMBER versus RANK for top-N and dedup. For "exactly one per group," use ROW_NUMBER (unique even on ties); for "all rows tied for the top," use RANK. Choosing wrong gives you either too many rows (using RANK when you wanted one) or arbitrarily-dropped ties (using ROW_NUMBER when you wanted all the tied winners). Decide consciously based on how you want ties handled.
Forgetting PARTITION BY and getting a global computation. Omitting PARTITION BY makes the window the whole result set. Sometimes that's what you want (% of grand total); often it's a bug — you meant "per category" but got "over everything." If a windowed value looks like it ignored the groups, check whether you forgot to partition.
Performance from missing sort support. Window functions sort by the partition/order columns; without a supporting index on a large table, that sort can dominate the query. It's still usually far faster than the self-join it replaces, but for hot queries, an index matching the PARTITION BY/ORDER BY can eliminate the sort (Chapter 23).
Run a misbehaving window query past these five and you'll diagnose nearly everything. The functions are powerful and, once the model and these pitfalls are internalized, remarkably predictable — which is exactly why they repay the investment of learning them properly rather than copying patterns half-understood.
Why this chapter matters most for analysts
If you take one capability from Part II into an analytics career, make it this one. The reason is that the questions real organizations ask are overwhelmingly comparative and sequential: not "what is the revenue" but "how does this month compare to last," "which products rank top in their category," "what's the running total to date," "what share of the whole is this segment," "how many consecutive days did this continue." Every one of those is a window function, and expressing them any other way is painful enough that, before windows existed, a lot of this analysis simply got exported to spreadsheets and done by hand. Window functions bring it back into the database, where it's repeatable, fast, and close to the data.
The deeper reason is that window functions change how you think about analytical problems. Once the "annotate each row with something computed over its neighbors" model is natural, you start seeing data not as flat rows but as ordered, partitioned sequences with rich relationships between rows — this row versus its group, versus its predecessor, versus the whole. That perspective is precisely how a skilled analyst reads data, and the window-function vocabulary is what lets you express it directly to the database. Mastering this chapter is, in a real sense, learning to think analytically in SQL rather than merely retrieving with it. It is the capstone of the querying skills in Part II, and the tool that most distinguishes someone who can get data from someone who can analyze it.
A closing encouragement, because window functions intimidate people more than they should: there is genuinely nothing here harder than the GROUP BY you already know. A window function is GROUP BY's computation without GROUP BY's collapse — same aggregates, same partitioning instinct, just keeping the rows. The unfamiliarity is all in the OVER syntax and a few specific gotchas (the default frame, the can't-filter-in-WHERE rule), every one of which this chapter has named and defused. Type the examples, adapt them to your own data, hit the gotchas once or twice, and within a week the OVER clause will feel as natural as WHERE. The payoff — a permanent, portable, high-leverage analytical skill that comparatively few people truly possess — is out of all proportion to that modest week of practice. Make the investment; it is among the best you can make in all of SQL.
Portability, history, and the standard
Window functions are part of the SQL standard (they first appeared in SQL:2003) and are now supported by every major database — PostgreSQL, SQL Server, Oracle, and, since version 8.0, MySQL, along with modern SQLite. This is reassuring for portability: the core syntax you've learned here — OVER, PARTITION BY, the ranking functions, LAG/LEAD, windowed aggregates — works almost identically across them. The skills transfer, which matters because window functions are exactly the kind of high-value capability you'll want to carry between jobs and databases.
There's a small historical irony worth appreciating: window functions are simultaneously one of SQL's most powerful features and, for years, one of its most overlooked. They've been in PostgreSQL since version 8.4 (released in 2009) and in the standard since 2003, yet a remarkable number of working developers have never used them, defaulting instead to the self-joins and correlated subqueries this chapter showed they replace. Part of the reason is simply that introductory SQL courses often stop at GROUP BY, treating window functions as "advanced" — when in truth they're not conceptually harder than grouping, just less commonly taught. That gap is an opportunity: fluency with window functions is a genuine differentiator, because so many people who "know SQL" don't really know these. The analyst or engineer who reaches naturally for a window function where others reach for a clumsy self-join is visibly more capable, and the difference shows up in both the quality and the performance of their queries.
The newer corners — the GROUPS frame mode, the WINDOW clause's full flexibility, FILTER combined with window functions — vary a bit more across databases and versions, so check the specifics when you stray into them (Appendix J notes the differences). But the everyday 90% is standard and stable. Learn it once, on PostgreSQL, and you can wield it nearly everywhere, for the rest of your career, on data of any size. Few features in this book offer that combination of power, portability, and relative rarity — which is why the chapter has pressed so hard on mastering it rather than merely meeting it. The investment you make here repays itself every time a "compared to," "running," "ranked," or "streak" question appears — which, in analytical work, is most of the time.
Progressive project: add analytics
In your domain:
- Rank within groups (top 3 most-borrowed books per genre; highest-GPA students per major) with
ROW_NUMBER/RANK+ the wrap-and-filter pattern. - A running total (cumulative fines over time; enrollments to date) with
SUM(...) OVER (ORDER BY ...). - Period-over-period with
LAG(this month vs. last month). - % of total with
SUM(...) OVER ().
Add these to project-notes.md — they're the analytics your reports will showcase.
Summary
Window functions compute over a set of related rows (a window) without collapsing them — the analytics GROUP BY can't do. The OVER (PARTITION BY ... ORDER BY ... <frame>) clause defines the window. Ranking functions (ROW_NUMBER, RANK, DENSE_RANK, NTILE) number rows within partitions — the basis of top-N-per-group. Aggregate window functions give group totals alongside rows and, with ORDER BY, running totals. LAG/LEAD reach adjacent rows for period-over-period comparisons. Frames (ROWS BETWEEN ...) enable moving averages — mind the default frame (cumulative), which explains the LAST_VALUE surprise. You can't filter on a window function in WHERE — wrap it in a subquery/CTE and filter outside. Window functions replace whole categories of slow self-joins and correlated subqueries with single, fast expressions.
You can now: - Use
OVER (PARTITION BY ... ORDER BY ...)to compute per-group values without collapsing rows. - Rank withROW_NUMBER/RANK/DENSE_RANK/NTILEand do top-N-per-group. - Compute running totals, group totals, and % of total with windowed aggregates. - Compare to adjacent rows withLAG/LEAD(period-over-period). - Define frames for moving averages, and explain the default-frame /LAST_VALUEgotcha. - Filter on window results correctly (wrap-and-filter).
What's next. Chapter 13 — Data Modification — INSERT, UPDATE, DELETE, TRUNCATE, RETURNING, and UPSERT (ON CONFLICT). You've been reading data for eight chapters; now you'll change it — safely, because UPDATE without a WHERE is the stuff of production horror stories.
Practice in exercises.md, test yourself with the quiz, apply it in the case studies, review the key takeaways, and go deeper with further reading.