Chapter 10: Advanced SQL — Window Functions, OLAP Functions, Recursive Queries, and Temporal Queries

"These are the SQL features that separate someone who knows SQL from someone who THINKS in SQL."

Every chapter so far has built your vocabulary. You can retrieve data, filter it, join tables, group results, and modify rows with confidence. You have written hundreds of queries against our Meridian National Bank schema and you understand how the database engine processes them. That foundation matters — but it has also been limiting you in ways you may not yet realize.

Consider a deceptively simple request from Meridian's CFO: "For each branch, show me every customer's checking account balance alongside the branch average, and flag anyone whose balance exceeds twice that average." With what you know today, you might reach for a correlated subquery or a self-join with a GROUP BY. Both approaches work, but they force the engine to scan the same data multiple times, and the SQL reads like a legal contract instead of a business question.

Window functions solve this in a single, elegant pass. They let you compute aggregates, rankings, and comparisons across related rows without collapsing those rows into groups. The query reads almost like the English sentence the CFO spoke. And the engine, freed from redundant scans, often executes it faster.

This chapter introduces four families of advanced SQL features that share a common theme: they let you reason about relationships between rows rather than treating each row in isolation.

  • Window functions compute values across a set of rows related to the current row.
  • Recursive common table expressions traverse hierarchical relationships — organizational charts, bills of materials, account referral chains.
  • Temporal tables let DB2 manage the dimension of time itself, enabling "time-travel" queries that answer questions like "What did this customer's profile look like on March 15 of last year?"
  • OLAP grouping extensions — ROLLUP, CUBE, and GROUPING SETS — produce multi-level summaries in a single query.

By the end of this chapter you will not merely use these features; you will think in them. You will look at old queries and see opportunities you never noticed. That shift in perception is the hallmark of an advanced SQL practitioner.


10.1 Window Functions — A New Way to Think About Rows

10.1.1 The Problem Window Functions Solve

Let us begin with a concrete scenario. Meridian National Bank tracks every teller transaction in a table called TELLER_TRANSACTIONS:

CREATE TABLE TELLER_TRANSACTIONS (
    txn_id        INTEGER NOT NULL GENERATED ALWAYS AS IDENTITY,
    branch_id     INTEGER NOT NULL,
    teller_id     INTEGER NOT NULL,
    txn_date      DATE NOT NULL,
    txn_amount    DECIMAL(15,2) NOT NULL,
    txn_type      CHAR(1) NOT NULL  -- 'D' deposit, 'W' withdrawal
);

The branch manager asks: "List every deposit from January 2025, and next to each one show the running total of deposits for that branch."

Without window functions, you would write something like this:

-- Pre-window-function approach: correlated subquery
SELECT t.txn_id,
       t.branch_id,
       t.teller_id,
       t.txn_date,
       t.txn_amount,
       (SELECT SUM(t2.txn_amount)
        FROM   TELLER_TRANSACTIONS t2
        WHERE  t2.branch_id = t.branch_id
          AND  t2.txn_type  = 'D'
          AND  t2.txn_date <= t.txn_date
          AND  t2.txn_date >= '2025-01-01')  AS running_deposit_total
FROM   TELLER_TRANSACTIONS t
WHERE  t.txn_type  = 'D'
  AND  t.txn_date >= '2025-01-01'
  AND  t.txn_date <  '2025-02-01'
ORDER BY t.branch_id, t.txn_date;

This works, but the correlated subquery executes once per row in the outer query. For 10,000 January transactions, that is 10,000 subquery executions. The engine might optimize some of this away, but the fundamental approach is O(n^2).

Now the window function version:

-- Window function approach
SELECT txn_id,
       branch_id,
       teller_id,
       txn_date,
       txn_amount,
       SUM(txn_amount) OVER (
           PARTITION BY branch_id
           ORDER BY txn_date
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS running_deposit_total
FROM   TELLER_TRANSACTIONS
WHERE  txn_type  = 'D'
  AND  txn_date >= '2025-01-01'
  AND  txn_date <  '2025-02-01'
ORDER BY branch_id, txn_date;

Same result. One pass through the data. And the intent — "sum the amounts, partitioned by branch, ordered by date, from the start up to the current row" — is stated declaratively in the OVER clause.

10.1.2 Anatomy of the OVER Clause

Every window function follows the same pattern:

function_name(...) OVER (
    [PARTITION BY column_list]
    [ORDER BY column_list [ASC|DESC] [NULLS FIRST|NULLS LAST]]
    [frame_specification]
)

Three components, each optional but profoundly important:

Component            Purpose                                                               Analogy
-------------------  --------------------------------------------------------------------  ------------------------------------------
PARTITION BY         Divides rows into independent groups (windows)                        Like GROUP BY, but without collapsing rows
ORDER BY             Defines the sequence of rows within each partition                    Determines what "before" and "after" mean
Frame specification  Restricts which rows in the ordered partition the function can see    A sliding "viewport" over the partition

If you omit PARTITION BY, the entire result set is one partition. If you omit ORDER BY, the order within the partition is undefined (which matters enormously for running totals). If you omit the frame specification, DB2 uses a default that depends on whether ORDER BY is present — we will cover this in detail in Section 10.4.

10.1.3 Window Functions vs. GROUP BY

This distinction is so critical that it deserves explicit treatment. GROUP BY collapses rows — ten branch-101 transactions become one summary row. Window functions preserve every row and add computed columns. Think of it this way:

  • GROUP BY: "Tell me the total deposits per branch." (One row per branch.)
  • Window function: "Tell me every deposit, and next to each one show the branch total." (One row per transaction, with the branch total repeated.)

You can use both in the same query. In fact, combining GROUP BY for initial aggregation with window functions for cross-group analysis is one of the most powerful patterns in analytical SQL.

10.1.4 Execution Order and the Logical Query Pipeline

Understanding when window functions execute in the logical query pipeline explains many behaviors that otherwise seem mysterious:

FROM / JOIN          -- 1. Assemble rows
WHERE                -- 2. Filter rows
GROUP BY             -- 3. Aggregate
HAVING               -- 4. Filter groups
SELECT (expressions) -- 5. Compute columns
  Window functions   -- 5b. Computed AFTER other SELECT expressions
DISTINCT             -- 6. Remove duplicates
ORDER BY             -- 7. Sort final result
FETCH FIRST / LIMIT  -- 8. Limit rows

Window functions execute at step 5b — after WHERE, GROUP BY, and HAVING have already filtered the data, but before DISTINCT and the final ORDER BY. This means:

  • You cannot use a window function in a WHERE clause. The rows have not been windowed yet when WHERE executes.
  • You can reference a window function result in an outer query by wrapping it in a subquery or CTE.
  • If you use GROUP BY, the window function operates on the grouped result, not the original rows.

-- Window function on grouped data
-- "Rank branches by their total January deposits"
SELECT branch_id,
       SUM(txn_amount) AS total_deposits,
       RANK() OVER (ORDER BY SUM(txn_amount) DESC) AS deposit_rank
FROM   TELLER_TRANSACTIONS
WHERE  txn_type = 'D'
  AND  txn_date >= '2025-01-01'
  AND  txn_date <  '2025-02-01'
GROUP BY branch_id;

Here, GROUP BY first collapses rows to one per branch, then RANK() operates on those summary rows.


10.2 Ranking Functions

Ranking functions assign an ordinal position to each row within its partition. DB2 supports four ranking functions, each with subtly different behavior when ties occur.

10.2.1 ROW_NUMBER()

ROW_NUMBER() assigns a unique sequential integer to each row within its partition. When two rows have the same ORDER BY value, the assignment is nondeterministic — DB2 picks one, but you cannot predict which. This makes ROW_NUMBER ideal for pagination and deduplication, but dangerous for "fair" ranking.

-- Assign a unique row number to each customer per branch, ordered by balance
SELECT ROW_NUMBER() OVER (
           PARTITION BY branch_id
           ORDER BY current_balance DESC
       ) AS row_num,
       branch_id,
       customer_id,
       account_number,
       current_balance
FROM   ACCOUNTS
WHERE  account_type = 'CHK';

Meridian Bank Use Case — Pagination: The web banking portal shows 20 accounts per page. ROW_NUMBER provides clean pagination:

WITH numbered AS (
    SELECT ROW_NUMBER() OVER (ORDER BY account_number) AS rn,
           account_number,
           customer_name,
           current_balance
    FROM   ACCOUNTS a
    JOIN   CUSTOMERS c ON a.customer_id = c.customer_id
    WHERE  a.account_type = 'CHK'
)
SELECT account_number, customer_name, current_balance
FROM   numbered
WHERE  rn BETWEEN 41 AND 60;  -- Page 3

Meridian Bank Use Case — Deduplication: Duplicate customer records sometimes enter the system during mergers. Keep only the most recently updated record:

-- DB2 does not accept a common table expression on a DELETE statement,
-- so the ranked rows go in a nested table expression instead
DELETE FROM CUSTOMERS
WHERE  customer_id IN (
    SELECT customer_id
    FROM (
        SELECT customer_id,
               ROW_NUMBER() OVER (
                   PARTITION BY ssn_hash
                   ORDER BY last_updated DESC
               ) AS rn
        FROM   CUSTOMERS
    ) AS ranked
    WHERE  rn > 1
);

10.2.2 RANK()

RANK() assigns the same rank to tied values, then skips the next rank(s). If two customers share rank 3, the next customer gets rank 5, not rank 4. Think of it like Olympic medals — if two athletes tie for silver, there is no bronze.

-- Rank customers by total relationship balance across all accounts
SELECT customer_id,
       customer_name,
       total_balance,
       RANK() OVER (ORDER BY total_balance DESC) AS balance_rank
FROM (
    SELECT c.customer_id,
           c.customer_name,
           SUM(a.current_balance) AS total_balance
    FROM   CUSTOMERS c
    JOIN   ACCOUNTS a ON c.customer_id = a.customer_id
    GROUP BY c.customer_id, c.customer_name
) AS cust_totals;

Result pattern with ties:

customer_name        total_balance    balance_rank
-------------------  -------------    ------------
Hargrove Industries   2,450,000.00               1
Chen Family Trust     1,800,000.00               2
Westfield LLC         1,800,000.00               2
Donovan Holdings      1,200,000.00               4

Notice rank 3 is missing — that is the defining behavior of RANK().

10.2.3 DENSE_RANK()

DENSE_RANK() assigns the same rank to tied values but does not skip ranks. Using the same data:

customer_name        total_balance    dense_rank
-------------------  -------------    ----------
Hargrove Industries   2,450,000.00             1
Chen Family Trust     1,800,000.00             2
Westfield LLC         1,800,000.00             2
Donovan Holdings      1,200,000.00             3

DENSE_RANK is preferable when you need the ranks to be contiguous — for example, "show me the top 5 balance tiers" regardless of how many customers share each tier.

-- Top 3 balance tiers per branch (may return more than 3 rows if ties exist)
WITH ranked AS (
    SELECT branch_id,
           customer_id,
           current_balance,
           DENSE_RANK() OVER (
               PARTITION BY branch_id
               ORDER BY current_balance DESC
           ) AS tier
    FROM   ACCOUNTS
    WHERE  account_type = 'SAV'
)
SELECT branch_id, customer_id, current_balance, tier
FROM   ranked
WHERE  tier <= 3
ORDER BY branch_id, tier;

10.2.4 NTILE(n)

NTILE(n) distributes rows into n approximately equal groups (tiles). This is invaluable for percentile analysis.

-- Divide customers into quartiles by total balance
SELECT customer_id,
       customer_name,
       total_balance,
       NTILE(4) OVER (ORDER BY total_balance DESC) AS quartile
FROM (
    SELECT c.customer_id,
           c.customer_name,
           SUM(a.current_balance) AS total_balance
    FROM   CUSTOMERS c
    JOIN   ACCOUNTS a ON c.customer_id = a.customer_id
    GROUP BY c.customer_id, c.customer_name
) AS cust_totals;

Meridian Bank Use Case — Relationship Pricing: The bank offers tiered interest rates based on customer quartile. NTILE(4) cleanly identifies which customers fall into each tier.

If the number of rows is not evenly divisible by n, DB2 assigns the remainder rows to the first groups. With 103 rows and NTILE(4): groups 1-3 get 26 rows each, group 4 gets 25 rows.
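That remainder arithmetic can be sketched in a few lines of Python (an illustrative model, not DB2 code; `ntile_sizes` is a name invented here):

```python
# Model of how NTILE(n) sizes its tiles: the first (row_count % n)
# tiles each receive one extra row.
def ntile_sizes(row_count, n):
    base, remainder = divmod(row_count, n)
    return [base + 1 if i < remainder else base for i in range(n)]

print(ntile_sizes(103, 4))  # [26, 26, 26, 25]
```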

10.2.5 Choosing the Right Ranking Function

Scenario                                             Use
---------------------------------------------------  ------------
Unique numbering needed (pagination, deduplication)  ROW_NUMBER()
Competition-style ranking (gaps after ties)          RANK()
Contiguous ranking (no gaps after ties)              DENSE_RANK()
Distribute into equal buckets (percentiles, tiers)   NTILE(n)
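The tie-handling differences are easy to model outside the database. A small illustrative Python sketch (not DB2 code), applied to the balances from the Section 10.2.2 example:

```python
# Model of ROW_NUMBER, RANK, and DENSE_RANK on descending balances;
# the tie at 1,800,000 mirrors the text's example data.
balances = [2_450_000, 1_800_000, 1_800_000, 1_200_000]  # already DESC

row_number = list(range(1, len(balances) + 1))            # unique, arbitrary among ties
rank       = [1 + sum(1 for x in balances if x > b) for b in balances]
dense_rank = [1 + len({x for x in balances if x > b}) for b in balances]

print(row_number)  # [1, 2, 3, 4]
print(rank)        # [1, 2, 2, 4]  <- rank 3 skipped
print(dense_rank)  # [1, 2, 2, 3]  <- no gap
```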

10.3 Aggregate Window Functions

Any standard aggregate function — SUM, AVG, COUNT, MIN, MAX — can be used as a window function by adding an OVER clause. This unlocks running totals, moving averages, and comparative analytics without subqueries.

10.3.1 Running Totals

The running total is the canonical window function use case. At Meridian Bank, daily cash position reporting requires a running total of deposits and withdrawals by branch:

SELECT txn_date,
       branch_id,
       txn_amount,
       txn_type,
       SUM(CASE WHEN txn_type = 'D' THEN txn_amount ELSE -txn_amount END)
           OVER (
               PARTITION BY branch_id
               ORDER BY txn_date, txn_id
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS running_net_position
FROM   TELLER_TRANSACTIONS
WHERE  txn_date >= '2025-01-01'
  AND  txn_date <  '2025-02-01'
ORDER BY branch_id, txn_date, txn_id;

The CASE expression converts withdrawals to negative amounts, and the window function accumulates from the first row of each branch's partition through the current row.

10.3.2 Moving Averages

A 7-day moving average smooths daily fluctuations and reveals trends. Meridian's treasury team uses this to predict cash requirements:

SELECT txn_date,
       branch_id,
       daily_total,
       AVG(daily_total) OVER (
           PARTITION BY branch_id
           ORDER BY txn_date
           ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
       ) AS moving_avg_7day
FROM (
    SELECT txn_date,
           branch_id,
           SUM(txn_amount) AS daily_total
    FROM   TELLER_TRANSACTIONS
    WHERE  txn_type = 'D'
    GROUP BY txn_date, branch_id
) AS daily
ORDER BY branch_id, txn_date;

The frame ROWS BETWEEN 6 PRECEDING AND CURRENT ROW creates a 7-row window (6 before + current). For the first 6 days of data, the window is smaller — the average adapts gracefully.
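The shrinking-window behavior can be modeled in plain Python (illustrative only; `moving_avg` is a hypothetical helper, not a DB2 function):

```python
# Model of ROWS BETWEEN 6 PRECEDING AND CURRENT ROW: for the first
# few rows the window simply contains fewer rows, so the average
# is taken over whatever is available.
def moving_avg(values, preceding=6):
    out = []
    for i in range(len(values)):
        window = values[max(0, i - preceding): i + 1]
        out.append(sum(window) / len(window))
    return out

daily = [100, 200, 300, 400, 500, 600, 700, 800]
print(moving_avg(daily))
# [100.0, 150.0, 200.0, 250.0, 300.0, 350.0, 400.0, 500.0]
```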

10.3.3 Cumulative Distribution and Ratios

Window functions shine when you need both detail and summary. Show each account's balance as a percentage of its branch total:

SELECT branch_id,
       account_number,
       customer_name,
       current_balance,
       SUM(current_balance) OVER (PARTITION BY branch_id) AS branch_total,
       DECIMAL(
           current_balance * 100.0 /
           SUM(current_balance) OVER (PARTITION BY branch_id),
           5, 2
       ) AS pct_of_branch
FROM   ACCOUNTS a
JOIN   CUSTOMERS c ON a.customer_id = c.customer_id
WHERE  account_type = 'CHK'
ORDER BY branch_id, current_balance DESC;

No GROUP BY needed. No self-join. The SUM(...) OVER (PARTITION BY branch_id) computes the branch total and repeats it on every row, allowing the percentage calculation inline.

10.3.4 COUNT for Sequence Detection

COUNT as a window function can detect patterns in sequences:

-- Flag transactions that are the 5th or later in a single day for a teller
-- (potential fraud indicator)
SELECT txn_id,
       teller_id,
       txn_date,
       txn_amount,
       COUNT(*) OVER (
           PARTITION BY teller_id, txn_date
           ORDER BY txn_id
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS daily_txn_sequence,
       CASE
           WHEN COUNT(*) OVER (
               PARTITION BY teller_id, txn_date
               ORDER BY txn_id
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) >= 5
           THEN 'REVIEW'
           ELSE 'OK'
       END AS flag
FROM   TELLER_TRANSACTIONS
ORDER BY teller_id, txn_date, txn_id;

10.4 Frame Specifications

The frame specification is the most nuanced part of window function syntax. It defines the "sliding viewport" — which rows within the ordered partition are visible to the aggregate function for each row.

10.4.1 The Complete Frame Syntax

{ ROWS | RANGE | GROUPS } BETWEEN
    { UNBOUNDED PRECEDING | n PRECEDING | CURRENT ROW }
  AND
    { UNBOUNDED FOLLOWING | n FOLLOWING | CURRENT ROW }

Three frame types:

  • ROWS: Physical row positions. ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING means "the 3 rows before me, me, and the 1 row after me" — exactly 5 rows (when available).
  • RANGE: Logical value ranges. RANGE BETWEEN 100 PRECEDING AND 100 FOLLOWING means "all rows whose ORDER BY value is within 100 of mine." The number of rows varies.
  • GROUPS: Groups of tied ORDER BY values. GROUPS BETWEEN 1 PRECEDING AND 1 FOLLOWING means "my peer group plus the groups immediately before and after."

10.4.2 Default Frame Behavior

This is a common source of bugs. The defaults differ depending on whether ORDER BY is present:

ORDER BY present?   Default frame
-----------------   ----------------------------------------------------------------------------
No                  RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING (entire partition)
Yes                 RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW

The second default is subtle and dangerous. With RANGE, all rows sharing the same ORDER BY value as the current row are included. If you ORDER BY txn_date and three transactions share the same date, all three see each other's amounts in the running total — the total "jumps" by the sum of all three at each of those rows rather than incrementing one at a time.

To get a true row-by-row running total, always use ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW explicitly, and include a tiebreaker (like txn_id) in the ORDER BY.
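A small Python model (illustrative, not DB2 code) makes the "jump" concrete; three transactions here share the same date:

```python
# Model of ROWS vs RANGE for a running total when ORDER BY values tie.
txns = [('2025-01-01', 50), ('2025-01-02', 10),
        ('2025-01-02', 20), ('2025-01-02', 30)]

# ROWS ... CURRENT ROW: strictly row-by-row accumulation
rows_total, acc = [], 0
for _, amt in txns:
    acc += amt
    rows_total.append(acc)

# RANGE ... CURRENT ROW: every row also sees ALL of its date peers
range_total = [sum(a for d2, a in txns if d2 <= d) for d, _ in txns]

print(rows_total)   # [50, 60, 80, 110]   <- increments one row at a time
print(range_total)  # [50, 110, 110, 110] <- jumps by the whole tied group
```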

10.4.3 Frame Examples at Meridian Bank

3-month moving average of monthly branch revenue:

WITH monthly_rev AS (
    SELECT branch_id,
           YEAR(txn_date) AS yr,
           MONTH(txn_date) AS mo,
           SUM(txn_amount) AS monthly_revenue
    FROM   TELLER_TRANSACTIONS
    WHERE  txn_type = 'D'
    GROUP BY branch_id, YEAR(txn_date), MONTH(txn_date)
)
SELECT branch_id,
       yr,
       mo,
       monthly_revenue,
       AVG(monthly_revenue) OVER (
           PARTITION BY branch_id
           ORDER BY yr, mo
           ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
       ) AS moving_avg_3mo
FROM   monthly_rev
ORDER BY branch_id, yr, mo;

Year-to-date cumulative:

SELECT branch_id,
       txn_date,
       txn_amount,
       SUM(txn_amount) OVER (
           PARTITION BY branch_id, YEAR(txn_date)
           ORDER BY txn_date, txn_id
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS ytd_total
FROM   TELLER_TRANSACTIONS
WHERE  txn_type = 'D'
ORDER BY branch_id, txn_date;

The PARTITION BY includes YEAR(txn_date), so the running total resets at the start of each year — exactly the behavior the CFO expects.

Comparing each row to partition extremes:

SELECT account_number,
       branch_id,
       current_balance,
       MIN(current_balance) OVER (PARTITION BY branch_id) AS branch_min,
       MAX(current_balance) OVER (PARTITION BY branch_id) AS branch_max,
       current_balance - AVG(current_balance) OVER (PARTITION BY branch_id)
           AS deviation_from_avg
FROM   ACCOUNTS
WHERE  account_type = 'CHK';

No frame specification is needed here because there is no ORDER BY — the default encompasses the entire partition.

10.4.4 Performance Implications of Frame Specifications

Frame specifications directly affect query performance, and the choice between ROWS and RANGE has practical consequences beyond correctness.

ROWS frames are generally faster than RANGE frames. With ROWS, the engine maintains a fixed-size sliding window — it knows exactly how many rows to include. With RANGE, it must evaluate the ORDER BY values for every potential peer, which can vary in cost depending on data distribution.

Unbounded frames are cheaper than bounded frames for aggregates. ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW accumulates incrementally — each row adds to the previous total. ROWS BETWEEN 6 PRECEDING AND CURRENT ROW requires the engine to both add the new row and subtract the row that just left the window.
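The add-one/subtract-one bookkeeping can be sketched in Python (an illustrative model of the incremental strategy, not actual DB2 internals):

```python
# Model of a bounded ROWS frame maintained incrementally: each step
# adds the entering row and, once the frame is full, subtracts the
# row that just left it.
def bounded_running_sums(values, preceding=6):
    out, window_sum = [], 0
    for i, v in enumerate(values):
        window_sum += v                              # row enters the frame
        if i > preceding:
            window_sum -= values[i - preceding - 1]  # row leaves the frame
        out.append(window_sum)
    return out

# 3-row window (2 PRECEDING AND CURRENT ROW)
print(bounded_running_sums([1, 2, 3, 4, 5, 6, 7, 8], preceding=2))
# [1, 3, 6, 9, 12, 15, 18, 21]
```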

Multiple window functions with the same OVER clause share a single sort. DB2 recognizes when multiple functions use identical PARTITION BY and ORDER BY specifications and performs the sort once. This is why you should aim for consistency in your OVER clauses:

-- GOOD: same OVER clause reused — one sort operation
SELECT txn_id,
       txn_amount,
       SUM(txn_amount) OVER w AS running_total,
       AVG(txn_amount) OVER w AS running_avg,
       COUNT(*)         OVER w AS running_count
FROM   TELLER_TRANSACTIONS
WHERE  branch_id = 101
WINDOW w AS (ORDER BY txn_date, txn_id
             ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW);

The WINDOW clause (shown above), covered fully in Section 10.4.5, lets you name a window specification and reuse it. This is not merely syntactic sugar: it guarantees the optimizer recognizes the shared sort opportunity.

Avoid unnecessary ORDER BY in window functions. If you need a partition-level aggregate (like the branch total in a percentage calculation), do not include ORDER BY — it forces a sort and changes the default frame from "entire partition" to "up to current row," which is almost certainly not what you want.

10.4.5 Named Windows with the WINDOW Clause

DB2 supports the SQL standard WINDOW clause, which defines a named window specification that can be referenced by multiple functions:

SELECT txn_id,
       branch_id,
       txn_date,
       txn_amount,
       SUM(txn_amount) OVER w_running   AS running_total,
       AVG(txn_amount) OVER w_running   AS running_avg,
       MIN(txn_amount) OVER w_partition AS branch_min,
       MAX(txn_amount) OVER w_partition AS branch_max
FROM   TELLER_TRANSACTIONS
WHERE  txn_type = 'D'
  AND  txn_date >= '2025-01-01'
  AND  txn_date <  '2025-02-01'
WINDOW
    w_running   AS (PARTITION BY branch_id
                    ORDER BY txn_date, txn_id
                    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW),
    w_partition AS (PARTITION BY branch_id)
ORDER BY branch_id, txn_date, txn_id;

Named windows improve readability when the same window specification appears in three or more functions. They also make it harder to introduce subtle inconsistencies — a common source of bugs when window specifications are copy-pasted and then partially modified.


10.5 LAG and LEAD — Accessing Adjacent Rows

10.5.1 LAG(): Looking Backward

LAG(expression, offset, default) returns the value of expression from the row that is offset rows before the current row in the partition. If no such row exists, default is returned (or NULL if omitted).

Month-over-month deposit comparison:

WITH monthly AS (
    SELECT branch_id,
           YEAR(txn_date) AS yr,
           MONTH(txn_date) AS mo,
           SUM(txn_amount) AS total_deposits
    FROM   TELLER_TRANSACTIONS
    WHERE  txn_type = 'D'
    GROUP BY branch_id, YEAR(txn_date), MONTH(txn_date)
)
SELECT branch_id,
       yr,
       mo,
       total_deposits,
       LAG(total_deposits, 1) OVER (
           PARTITION BY branch_id
           ORDER BY yr, mo
       ) AS prev_month_deposits,
       total_deposits - LAG(total_deposits, 1) OVER (
           PARTITION BY branch_id
           ORDER BY yr, mo
       ) AS month_over_month_change,
       CASE
           WHEN LAG(total_deposits, 1) OVER (
               PARTITION BY branch_id ORDER BY yr, mo
           ) IS NOT NULL
           THEN DECIMAL(
               (total_deposits - LAG(total_deposits, 1) OVER (
                   PARTITION BY branch_id ORDER BY yr, mo
               )) * 100.0 /
               LAG(total_deposits, 1) OVER (
                   PARTITION BY branch_id ORDER BY yr, mo
               ), 7, 2)
           ELSE NULL
       END AS pct_change
FROM   monthly
ORDER BY branch_id, yr, mo;

The first month for each branch shows NULL for previous values — there is no prior row. The third argument to LAG can supply a default value (e.g., 0) but NULL is often more honest.

10.5.2 LEAD(): Looking Forward

LEAD(expression, offset, default) is the mirror image — it looks at future rows. Meridian uses this to calculate days until next transaction:

SELECT customer_id,
       txn_date,
       txn_amount,
       LEAD(txn_date, 1) OVER (
           PARTITION BY customer_id
           ORDER BY txn_date
       ) AS next_txn_date,
       DAYS(LEAD(txn_date, 1) OVER (
           PARTITION BY customer_id
           ORDER BY txn_date
       )) - DAYS(txn_date) AS days_until_next_txn
FROM   TELLER_TRANSACTIONS
WHERE  customer_id = 10042
ORDER BY txn_date;

A customer with a long gap between transactions may be at risk of attrition — the customer relationship team monitors this metric.

10.5.3 Multi-Period Comparisons

LAG and LEAD accept any positive integer as the offset. Year-over-year comparison at the monthly grain:

WITH monthly AS (
    SELECT branch_id,
           YEAR(txn_date) AS yr,
           MONTH(txn_date) AS mo,
           SUM(txn_amount) AS total_deposits
    FROM   TELLER_TRANSACTIONS
    WHERE  txn_type = 'D'
    GROUP BY branch_id, YEAR(txn_date), MONTH(txn_date)
)
SELECT branch_id, yr, mo, total_deposits,
       LAG(total_deposits, 12) OVER (
           PARTITION BY branch_id
           ORDER BY yr, mo
       ) AS same_month_prior_year,
       total_deposits - LAG(total_deposits, 12) OVER (
           PARTITION BY branch_id
           ORDER BY yr, mo
       ) AS yoy_change
FROM   monthly
ORDER BY branch_id, yr, mo;

10.6 FIRST_VALUE, LAST_VALUE, and NTH_VALUE

These functions retrieve a specific value from the window frame — not a rank or an aggregate, but an actual column value from a particular row.

10.6.1 FIRST_VALUE()

Returns the first value in the ordered partition (or frame). Meridian uses this to show each customer's first transaction amount and date alongside every row:

SELECT customer_id,
       txn_date,
       txn_amount,
       FIRST_VALUE(txn_amount) OVER (
           PARTITION BY customer_id
           ORDER BY txn_date
           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
       ) AS first_txn_amount,
       FIRST_VALUE(txn_date) OVER (
           PARTITION BY customer_id
           ORDER BY txn_date
           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
       ) AS first_txn_date
FROM   TELLER_TRANSACTIONS
WHERE  customer_id IN (10042, 10078)
ORDER BY customer_id, txn_date;

10.6.2 LAST_VALUE()

Returns the last value in the frame. Warning: because the default frame with ORDER BY is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, the "last" value is the current row itself — almost certainly not what you want. Always specify the full frame:

SELECT customer_id,
       txn_date,
       txn_amount,
       LAST_VALUE(txn_amount) OVER (
           PARTITION BY customer_id
           ORDER BY txn_date
           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
       ) AS most_recent_txn_amount
FROM   TELLER_TRANSACTIONS
WHERE  customer_id = 10042
ORDER BY txn_date;

Without ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING, every row would see its own amount as the "last" value. This is one of the most common window function bugs. Remember it.

10.6.3 NTH_VALUE()

NTH_VALUE(expression, n) returns the value from the nth row in the frame. Want the third-largest deposit ever made by a customer?

SELECT DISTINCT
       customer_id,
       NTH_VALUE(txn_amount, 3) OVER (
           PARTITION BY customer_id
           ORDER BY txn_amount DESC
           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
       ) AS third_largest_deposit
FROM   TELLER_TRANSACTIONS
WHERE  txn_type = 'D';

If the customer has fewer than 3 deposits, NTH_VALUE returns NULL.

10.6.4 Practical Patterns: Combining Value Functions

Value functions become especially powerful when combined with other window functions. Here is a pattern Meridian's analysts use frequently — showing each transaction alongside the first and most recent transactions for context:

SELECT customer_id,
       txn_date,
       txn_amount,
       txn_type,
       ROW_NUMBER() OVER w AS txn_sequence,
       FIRST_VALUE(txn_date) OVER w AS first_ever_txn_date,
       FIRST_VALUE(txn_amount) OVER w AS first_ever_txn_amount,
       LAST_VALUE(txn_date) OVER w AS most_recent_txn_date,
       LAST_VALUE(txn_amount) OVER w AS most_recent_txn_amount,
       txn_amount - FIRST_VALUE(txn_amount) OVER w AS change_from_first
FROM   TELLER_TRANSACTIONS
WHERE  customer_id = 10042
  AND  txn_type = 'D'
WINDOW w AS (
    PARTITION BY customer_id
    ORDER BY txn_date, txn_id
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
)
ORDER BY txn_date;

Notice the WINDOW clause — naming the window once and referencing it six times eliminates redundancy and ensures every function operates on the same window specification. This is a best practice whenever you reference the same window more than twice.

When to use FIRST_VALUE/LAST_VALUE vs. MIN/MAX: FIRST_VALUE and LAST_VALUE respect the ORDER BY and return the value from a specific position in the ordered frame. MIN and MAX return the extreme values regardless of position. If your ORDER BY column is the same as the column you are aggregating (e.g., ordering by balance and wanting the minimum balance), the results coincide. But if they differ (e.g., ordering by date and wanting the amount of the first transaction), only FIRST_VALUE gives the correct result.
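A tiny Python model (illustrative only, hypothetical data) of that distinction, ordering by date while extracting amounts:

```python
# FIRST_VALUE respects the ordering column (date); MIN ignores position
# and returns the smallest amount wherever it occurs.
txns = [('2025-01-05', 300), ('2025-01-10', 75), ('2025-01-20', 500)]

ordered = sorted(txns)                    # ORDER BY txn_date
first_value_amount = ordered[0][1]        # amount on the earliest date
min_amount = min(a for _, a in txns)      # smallest amount, any date

print(first_value_amount)  # 300  <- FIRST_VALUE(txn_amount) ORDER BY txn_date
print(min_amount)          # 75   <- MIN(txn_amount)
```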


10.7 Recursive CTEs — Querying Hierarchical Data

10.7.1 The Hierarchical Data Problem

Relational tables are flat. But business data is often hierarchical: organizational charts, account referral chains, product category trees, branch reporting structures. Meridian National Bank's employee table contains a classic self-referential hierarchy:

CREATE TABLE EMPLOYEES (
    employee_id    INTEGER NOT NULL PRIMARY KEY,
    employee_name  VARCHAR(100) NOT NULL,
    title          VARCHAR(100),
    department     VARCHAR(50),
    manager_id     INTEGER,  -- references employee_id
    branch_id      INTEGER,
    hire_date      DATE,
    salary         DECIMAL(12,2),
    FOREIGN KEY (manager_id) REFERENCES EMPLOYEES(employee_id)
);

The CEO has manager_id = NULL. Vice presidents report to the CEO. Directors report to VPs. And so on. Querying "who reports to whom, at all levels" requires recursion.

10.7.2 The WITH RECURSIVE Syntax

DB2 supports recursive common table expressions with the following structure:

WITH recursive_cte (column_list) AS (
    -- Anchor member: the starting point
    SELECT ...
    FROM   ...
    WHERE  ... (base case)

    UNION ALL

    -- Recursive member: references the CTE itself
    SELECT ...
    FROM   table
    JOIN   recursive_cte ON ... (recursive step)
)
SELECT * FROM recursive_cte;

The engine executes this iteratively:

  1. Run the anchor member, producing the initial result set.
  2. Run the recursive member, joining the table against the previous iteration's output.
  3. Repeat step 2 until the recursive member returns no new rows.
  4. Combine all iterations with UNION ALL.
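Those steps can be modeled in plain Python (an illustrative sketch of the iteration, not DB2 internals; the tiny employee table is hypothetical, and a cyclic hierarchy would loop forever here just as it would without a level limit in SQL):

```python
# Model of the anchor/recursive-member loop: start from rows with no
# manager, then repeatedly join the table against the previous
# iteration's output until no new rows appear.
employees = [  # (employee_id, manager_id)
    (1, None), (2, 1), (3, 1), (4, 2), (5, 4),
]

result = [(eid, 0) for eid, mgr in employees if mgr is None]  # anchor member
frontier = result
while frontier:                                               # fixpoint loop
    frontier = [(eid, lvl + 1)
                for eid, mgr in employees
                for prev_id, lvl in frontier
                if mgr == prev_id]                            # recursive member
    result.extend(frontier)                                   # UNION ALL

print(result)  # [(1, 0), (2, 1), (3, 1), (4, 2), (5, 3)]
```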

10.7.3 Organizational Chart Traversal

List every employee who reports to the CEO (directly or indirectly), with their level in the hierarchy:

WITH org_chart (employee_id, employee_name, title, manager_id, lvl) AS (
    -- Anchor: the CEO (no manager)
    SELECT employee_id,
           employee_name,
           title,
           manager_id,
           0 AS lvl
    FROM   EMPLOYEES
    WHERE  manager_id IS NULL

    UNION ALL

    -- Recursive: find employees who report to someone already in the result
    SELECT e.employee_id,
           e.employee_name,
           e.title,
           e.manager_id,
           oc.lvl + 1
    FROM   EMPLOYEES e
    JOIN   org_chart oc ON e.manager_id = oc.employee_id
)
SELECT REPEAT('  ', lvl) || employee_name AS org_tree,
       title,
       lvl
FROM   org_chart
ORDER BY lvl, employee_name;

Result:

ORG_TREE                     TITLE                   LVL
---                          ---                     ---
Margaret Chen                CEO                       0
  David Okonkwo              VP Operations             1
  Sarah Lindström            VP Technology             1
    James Moreau             Director, IT Infra        2
    Priya Sharma             Director, App Dev         2
      Carlos Rivera          Sr. Developer             3
      Emily Watson           Sr. Developer             3

10.7.4 Building a Path String

For reporting, it is often useful to show the full management chain:

WITH org_path (employee_id, employee_name, mgr_chain, lvl) AS (
    SELECT employee_id,
           employee_name,
           CAST(employee_name AS VARCHAR(1000)),
           0
    FROM   EMPLOYEES
    WHERE  manager_id IS NULL

    UNION ALL

    SELECT e.employee_id,
           e.employee_name,
           CAST(op.mgr_chain || ' > ' || e.employee_name AS VARCHAR(1000)),
           op.lvl + 1
    FROM   EMPLOYEES e
    JOIN   org_path op ON e.manager_id = op.employee_id
)
SELECT employee_name, mgr_chain, lvl
FROM   org_path
ORDER BY mgr_chain;

Carlos Rivera's path: Margaret Chen > Sarah Lindström > Priya Sharma > Carlos Rivera.

10.7.5 Bill of Materials Explosion

Meridian's vendor management tracks composite products (e.g., a "Premium Account Package" that contains sub-products). A PRODUCT_COMPONENTS table:

CREATE TABLE PRODUCT_COMPONENTS (
    parent_product_id  INTEGER NOT NULL,
    child_product_id   INTEGER NOT NULL,
    quantity           INTEGER NOT NULL DEFAULT 1,
    PRIMARY KEY (parent_product_id, child_product_id)
);

Explode a product into all its components at every level:

WITH bom (root_product, product_id, product_name, lvl, qty_path) AS (
    -- Anchor: the top-level product
    SELECT p.product_id,
           p.product_id,
           p.product_name,
           0,
           1
    FROM   PRODUCTS p
    WHERE  p.product_id = 5001  -- Premium Account Package

    UNION ALL

    SELECT b.root_product,
           pc.child_product_id,
           p.product_name,
           b.lvl + 1,
           b.qty_path * pc.quantity
    FROM   bom b
    JOIN   PRODUCT_COMPONENTS pc ON b.product_id = pc.parent_product_id
    JOIN   PRODUCTS p ON pc.child_product_id = p.product_id
)
SELECT REPEAT('  ', lvl) || product_name AS component_tree,
       qty_path AS total_quantity,
       lvl
FROM   bom
ORDER BY root_product, lvl;

10.7.6 Depth Limiting and Cycle Detection

Runaway recursion is a real danger. DB2 will eventually hit its recursion limit and return an error, but it is better to be explicit.

Depth limiting — add a condition in the recursive member:

WITH org_chart (employee_id, employee_name, lvl) AS (
    SELECT employee_id, employee_name, 0
    FROM   EMPLOYEES
    WHERE  manager_id IS NULL

    UNION ALL

    SELECT e.employee_id, e.employee_name, oc.lvl + 1
    FROM   EMPLOYEES e
    JOIN   org_chart oc ON e.manager_id = oc.employee_id
    WHERE  oc.lvl < 10  -- never recurse deeper than 10 levels
)
SELECT * FROM org_chart;

Cycle detection — if the data might contain cycles (an employee who indirectly manages their own manager), track visited nodes:

WITH org_chart (employee_id, employee_name, lvl, path) AS (
    SELECT employee_id,
           employee_name,
           0,
           CAST(',' || RTRIM(CHAR(employee_id)) || ',' AS VARCHAR(4000))
    FROM   EMPLOYEES
    WHERE  manager_id IS NULL

    UNION ALL

    SELECT e.employee_id,
           e.employee_name,
           oc.lvl + 1,
           CAST(oc.path || RTRIM(CHAR(e.employee_id)) || ',' AS VARCHAR(4000))
    FROM   EMPLOYEES e
    JOIN   org_chart oc ON e.manager_id = oc.employee_id
    WHERE  LOCATE(',' || RTRIM(CHAR(e.employee_id)) || ',', oc.path) = 0  -- not already visited
      AND  oc.lvl < 20
)
SELECT * FROM org_chart;

Each employee ID is wrapped in comma delimiters before being appended to the path, so the LOCATE check matches whole IDs only; without the delimiters, employee 12 would falsely appear "already visited" whenever employee 120 was on the path. (RTRIM strips the trailing blanks that CHAR adds when converting an integer.) The check prevents any employee from appearing twice in their own management chain. It is a safeguard against data integrity problems: in production the data should not contain cycles, but defensive coding prevents infinite loops during development and data migration.
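The simplest cycles, two employees recorded as each other's manager, can also be caught proactively without any recursion. A minimal diagnostic sketch against the same EMPLOYEES table:

```sql
-- Find pairs of employees who are each other's manager (a two-node cycle).
-- The e1.employee_id < e2.employee_id condition reports each pair only once.
SELECT e1.employee_id AS emp_a,
       e2.employee_id AS emp_b
FROM   EMPLOYEES e1
JOIN   EMPLOYEES e2
       ON  e1.manager_id = e2.employee_id
       AND e2.manager_id = e1.employee_id
WHERE  e1.employee_id < e2.employee_id;
```

Longer cycles (three or more nodes) still require the path-tracking recursive check; this query is a cheap first-line data quality test.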

10.7.7 Generating Sequences with Recursive CTEs

Recursive CTEs are not limited to hierarchical traversal. They can generate sequences — an invaluable technique for filling gaps in time-series data. Meridian's reporting team uses this to ensure every day appears in daily reports, even days with no transactions:

-- Generate all dates in January 2025
WITH date_series (dt) AS (
    SELECT DATE('2025-01-01')
    FROM   SYSIBM.SYSDUMMY1

    UNION ALL

    SELECT dt + 1 DAY
    FROM   date_series
    WHERE  dt < DATE('2025-01-31')
)
SELECT ds.dt AS report_date,
       COALESCE(daily.total_deposits, 0) AS total_deposits,
       COALESCE(daily.txn_count, 0) AS txn_count
FROM   date_series ds
LEFT JOIN (
    SELECT txn_date,
           SUM(txn_amount) AS total_deposits,
           COUNT(*) AS txn_count
    FROM   TELLER_TRANSACTIONS
    WHERE  branch_id = 101
      AND  txn_type = 'D'
      AND  txn_date BETWEEN '2025-01-01' AND '2025-01-31'
    GROUP BY txn_date
) daily ON ds.dt = daily.txn_date
ORDER BY ds.dt;

The LEFT JOIN ensures weekends and holidays appear as zero-deposit days rather than missing rows — essential for charts and moving-average calculations that assume continuous time series.

10.7.8 Recursive CTE Performance Considerations

Recursive CTEs are powerful but carry performance implications:

  1. Each iteration materializes — DB2 stores intermediate results. Deep hierarchies with wide rows consume TEMP space.
  2. No indexes on intermediate results — the join in the recursive member scans the previous iteration's output. Keep the CTE columns narrow.
  3. UNION ALL is mandatory — UNION (with duplicate elimination) is not supported in recursive CTEs in DB2. If you need deduplication, do it in the outer query.
  4. Consider adjacency list vs. materialized path — for frequently queried hierarchies, consider storing the materialized path (e.g., /1/4/12/) in the table and querying with LIKE instead of recursion.
  5. Test with realistic data volumes — a recursive CTE that runs instantly on 50 employees may grind to a halt on 50,000. Profile the number of iterations and intermediate row counts before deploying to production.
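To illustrate point 4, here is a sketch of the materialized-path alternative. It assumes a hypothetical org_path column, maintained by the application or by triggers, which is not part of the Meridian schema as defined above:

```sql
-- Hypothetical column: org_path VARCHAR(255), e.g. '/1/4/12/' meaning
-- employee 12 reports to employee 4, who reports to employee 1.
-- All employees under employee 4, at any depth, without recursion:
SELECT employee_id, employee_name
FROM   EMPLOYEES
WHERE  org_path LIKE '/1/4/%';
```

A LIKE predicate with a literal prefix can use an index on org_path, which is why this pattern often outperforms a recursive CTE on large, frequently queried hierarchies. The trade-off is that the application must keep the path column correct whenever an employee moves in the hierarchy.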

10.8 Temporal Tables — Time Travel in DB2

10.8.1 Why Temporal Tables Matter

Banking regulations (Dodd-Frank, Basel III, SOX) require institutions to answer questions like:

  • "What was this customer's credit rating on the date we approved the loan?"
  • "Show me the complete history of changes to this account's terms."
  • "As of the last audit date, what did we believe this customer's address was?"

You could build this yourself with trigger-maintained history tables, effective-date columns, and complex WHERE clauses. DB2 temporal tables automate all of it, guaranteeing correctness and providing clean query syntax.

10.8.2 Three Types of Temporal Tables

DB2 supports three temporal models:

  • System-time (system-period temporal): managed automatically by the DB2 engine; tracks when rows were physically changed in the database. Typical use: audit trails, regulatory history.
  • Business-time (application-period temporal): managed explicitly by the application; tracks when facts are effective in the real world. Typical use: policy effective dates, rate schedules.
  • Bitemporal: managed by both; tracks both system-time and business-time. Typical use: full regulatory compliance, backdated corrections with an audit trail.

10.8.3 System-Time Temporal Tables

System-time versioning means DB2 automatically maintains a history of every INSERT, UPDATE, and DELETE. You never lose data.

Creating a system-time temporal table:

CREATE TABLE CUSTOMER_PROFILES (
    customer_id      INTEGER NOT NULL,
    customer_name    VARCHAR(100) NOT NULL,
    credit_rating    CHAR(2),
    annual_income    DECIMAL(15,2),
    risk_category    VARCHAR(20),
    sys_start        TIMESTAMP(12) NOT NULL
                     GENERATED ALWAYS AS ROW BEGIN,
    sys_end          TIMESTAMP(12) NOT NULL
                     GENERATED ALWAYS AS ROW END,
    txn_start_id     TIMESTAMP(12)
                     GENERATED ALWAYS AS TRANSACTION START ID,
    PERIOD SYSTEM_TIME (sys_start, sys_end),
    PRIMARY KEY (customer_id)
);

-- History table — same structure, no GENERATED clauses
CREATE TABLE CUSTOMER_PROFILES_HIST (
    customer_id      INTEGER NOT NULL,
    customer_name    VARCHAR(100) NOT NULL,
    credit_rating    CHAR(2),
    annual_income    DECIMAL(15,2),
    risk_category    VARCHAR(20),
    sys_start        TIMESTAMP(12) NOT NULL,
    sys_end          TIMESTAMP(12) NOT NULL,
    txn_start_id     TIMESTAMP(12)
);

-- Link them
ALTER TABLE CUSTOMER_PROFILES
    ADD VERSIONING USE HISTORY TABLE CUSTOMER_PROFILES_HIST;

Now, every UPDATE or DELETE automatically copies the old row to the history table with the time range it was valid. You write normal DML — the engine handles history.

Querying system-time:

-- Current data (normal query)
SELECT * FROM CUSTOMER_PROFILES
WHERE  customer_id = 10042;

-- Data as it existed on a specific date
SELECT * FROM CUSTOMER_PROFILES
    FOR SYSTEM_TIME AS OF '2024-06-15-00.00.00.000000000000'
WHERE  customer_id = 10042;

-- All versions of a customer's profile
SELECT * FROM CUSTOMER_PROFILES
    FOR SYSTEM_TIME FROM '2023-01-01' TO '2025-12-31'
WHERE  customer_id = 10042
ORDER BY sys_start;

-- All versions between two points
SELECT * FROM CUSTOMER_PROFILES
    FOR SYSTEM_TIME BETWEEN '2024-01-01' AND '2024-12-31'
WHERE  customer_id = 10042
ORDER BY sys_start;

The FOR SYSTEM_TIME AS OF clause transparently queries both the current table and the history table, returning whichever row was active at that moment. This is "time travel."

Understanding the three temporal query predicates:

  • FOR SYSTEM_TIME AS OF timestamp: returns the single row version that was active at that exact moment.
  • FOR SYSTEM_TIME FROM t1 TO t2: returns all versions whose system-time period overlaps [t1, t2). The end timestamp t2 is exclusive.
  • FOR SYSTEM_TIME BETWEEN t1 AND t2: returns all versions whose system-time period overlaps [t1, t2]. The end timestamp t2 is inclusive.

The distinction between FROM...TO (exclusive end) and BETWEEN...AND (inclusive end) matters when you need exact boundary control. For most reporting scenarios, AS OF is the clearest choice — it returns exactly one version per primary key value, making it a direct substitute for a regular query.
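To make the boundary behavior concrete, suppose one version of customer 10042's profile has a sys_start of exactly 2024-07-01-00.00.00.000000000000 (an illustrative timestamp, not from the sample data):

```sql
-- FROM ... TO: the end point is exclusive, so a version that begins
-- exactly at the end timestamp is NOT returned.
SELECT * FROM CUSTOMER_PROFILES
    FOR SYSTEM_TIME FROM '2024-01-01' TO '2024-07-01'
WHERE  customer_id = 10042;

-- BETWEEN ... AND: the end point is inclusive, so that same version IS returned.
SELECT * FROM CUSTOMER_PROFILES
    FOR SYSTEM_TIME BETWEEN '2024-01-01' AND '2024-07-01'
WHERE  customer_id = 10042;
```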

Deleting from system-time tables: When you DELETE a row from a system-time table, the row is not physically destroyed — it is moved to the history table with its sys_end set to the current timestamp. The row is gone from the current table's perspective, but a FOR SYSTEM_TIME AS OF query at a prior timestamp will still find it. This is fundamental to the audit trail guarantee: no data is ever truly lost.
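A short sketch of this behavior, using the CUSTOMER_PROFILES table defined above (the timestamp is illustrative):

```sql
-- Issue a normal DELETE...
DELETE FROM CUSTOMER_PROFILES
WHERE  customer_id = 10042;

-- ...the row no longer exists in the current table...
SELECT COUNT(*) FROM CUSTOMER_PROFILES
WHERE  customer_id = 10042;          -- returns 0

-- ...but time travel to any moment before the delete still finds it.
SELECT * FROM CUSTOMER_PROFILES
    FOR SYSTEM_TIME AS OF '2024-06-15-00.00.00.000000000000'
WHERE  customer_id = 10042;
```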

10.8.4 Business-Time Temporal Tables

Business-time tracks when facts are effective in the real world, independent of when they were recorded in the database. Meridian's loan rate schedule is a perfect example — a rate may be agreed today but take effect next month:

CREATE TABLE LOAN_RATES (
    loan_type       VARCHAR(30) NOT NULL,
    rate_pct        DECIMAL(5,3) NOT NULL,
    bus_start       DATE NOT NULL,
    bus_end         DATE NOT NULL,
    PERIOD BUSINESS_TIME (bus_start, bus_end),
    PRIMARY KEY (loan_type, BUSINESS_TIME WITHOUT OVERLAPS)
);

The WITHOUT OVERLAPS clause on the primary key guarantees that no two rows for the same loan type have overlapping business-time periods. DB2 enforces this at the constraint level.
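A sketch of the constraint in action (the rates and dates are illustrative):

```sql
-- First row: 30-year fixed at 6.500% for all of 2025.
INSERT INTO LOAN_RATES (loan_type, rate_pct, bus_start, bus_end)
VALUES ('30YR_FIXED', 6.500, '2025-01-01', '2026-01-01');

-- Second row overlaps June through August 2025 for the same loan type:
-- DB2 rejects it with a duplicate-key violation.
INSERT INTO LOAN_RATES (loan_type, rate_pct, bus_start, bus_end)
VALUES ('30YR_FIXED', 6.250, '2025-06-01', '2025-09-01');
```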

Querying business-time:

-- What rate applies to 30-year fixed mortgages today?
SELECT * FROM LOAN_RATES
    FOR BUSINESS_TIME AS OF CURRENT DATE
WHERE  loan_type = '30YR_FIXED';

-- What rate was in effect on 2024-03-15?
SELECT * FROM LOAN_RATES
    FOR BUSINESS_TIME AS OF '2024-03-15'
WHERE  loan_type = '30YR_FIXED';

-- All rate changes for 30-year fixed in 2024
SELECT * FROM LOAN_RATES
    FOR BUSINESS_TIME FROM '2024-01-01' TO '2025-01-01'
WHERE  loan_type = '30YR_FIXED'
ORDER BY bus_start;

Temporal DML — automatic period splitting:

When you UPDATE or DELETE with a FOR PORTION OF clause, DB2 automatically splits periods:

-- Change the 30-year fixed rate for Q3 2025 only
UPDATE LOAN_RATES
    FOR PORTION OF BUSINESS_TIME FROM '2025-07-01' TO '2025-10-01'
SET    rate_pct = 6.250
WHERE  loan_type = '30YR_FIXED';

If the original row covered all of 2025, DB2 splits it into three rows: Jan-Jun at the old rate, Jul-Sep at 6.250%, and Oct-Dec at the old rate. The engine handles the splitting automatically, maintaining the no-overlap constraint.
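Assuming the original row covered all of 2025 at 6.500% (illustrative values), the table contents before and after the FOR PORTION OF update look like this:

```sql
-- Before the update:
--   30YR_FIXED  6.500  2025-01-01  2026-01-01
--
-- After UPDATE ... FOR PORTION OF BUSINESS_TIME FROM '2025-07-01' TO '2025-10-01':
--   30YR_FIXED  6.500  2025-01-01  2025-07-01   (old rate, trimmed)
--   30YR_FIXED  6.250  2025-07-01  2025-10-01   (new rate for Q3)
--   30YR_FIXED  6.500  2025-10-01  2026-01-01   (old rate resumes)
SELECT loan_type, rate_pct, bus_start, bus_end
FROM   LOAN_RATES
WHERE  loan_type = '30YR_FIXED'
ORDER BY bus_start;
```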

10.8.5 Bitemporal Tables

Bitemporal tables combine both dimensions. This is the gold standard for regulatory compliance — you can answer: "As of today, what did we believe the rate was on March 15, 2024?" vs. "As of last year's audit date, what did we believe the rate was on March 15, 2024?" These are different questions if the data was corrected between the audit and today.

CREATE TABLE LOAN_RATES_BT (
    loan_type       VARCHAR(30) NOT NULL,
    rate_pct        DECIMAL(5,3) NOT NULL,
    bus_start       DATE NOT NULL,
    bus_end         DATE NOT NULL,
    sys_start       TIMESTAMP(12) NOT NULL
                    GENERATED ALWAYS AS ROW BEGIN,
    sys_end         TIMESTAMP(12) NOT NULL
                    GENERATED ALWAYS AS ROW END,
    txn_start_id    TIMESTAMP(12)
                    GENERATED ALWAYS AS TRANSACTION START ID,
    PERIOD BUSINESS_TIME (bus_start, bus_end),
    PERIOD SYSTEM_TIME (sys_start, sys_end),
    PRIMARY KEY (loan_type, BUSINESS_TIME WITHOUT OVERLAPS)
);

CREATE TABLE LOAN_RATES_BT_HIST LIKE LOAN_RATES_BT;

ALTER TABLE LOAN_RATES_BT
    ADD VERSIONING USE HISTORY TABLE LOAN_RATES_BT_HIST;

Bitemporal query:

-- What rate do we CURRENTLY believe was effective for 30YR_FIXED on 2024-03-15?
SELECT * FROM LOAN_RATES_BT
    FOR BUSINESS_TIME AS OF '2024-03-15'
WHERE  loan_type = '30YR_FIXED';

-- What rate did we believe ON 2024-12-31 was effective for 30YR_FIXED on 2024-03-15?
SELECT * FROM LOAN_RATES_BT
    FOR SYSTEM_TIME AS OF '2024-12-31-00.00.00.000000000000'
    FOR BUSINESS_TIME AS OF '2024-03-15'
WHERE  loan_type = '30YR_FIXED';

The second query is the regulatory auditor's dream — it reconstructs the bank's belief at a specific point in time about what was effective at another point in time. Without bitemporal tables, this requires extraordinarily complex manual versioning.


10.9 OLAP Expressions and Analytical Patterns

10.9.1 DB2 OLAP Functions vs. Window Functions

In DB2 documentation, you will encounter the term "OLAP functions" (also called "OLAP specifications" or "online analytical processing functions"). This can be confusing because the industry broadly uses "window functions" for the same concept. In DB2's terminology:

  • OLAP aggregate functions: standard aggregates (SUM, AVG, etc.) with an OVER clause — what this chapter calls "aggregate window functions."
  • OLAP ranking functions: ROW_NUMBER, RANK, DENSE_RANK — what this chapter calls "ranking functions."
  • OLAP numbering function: ROW_NUMBER specifically.
  • OLAP grouping extensions: ROLLUP, CUBE, GROUPING SETS — these are not window functions; they are extensions to GROUP BY.

We have covered the first three categories. Now let us address the grouping extensions.

10.9.2 ROLLUP — Hierarchical Subtotals

ROLLUP adds subtotals at each level of a grouping hierarchy, plus a grand total:

SELECT COALESCE(CHAR(branch_id), 'ALL BRANCHES') AS branch,
       COALESCE(CHAR(YEAR(txn_date)), 'ALL YEARS')  AS yr,
       COALESCE(CHAR(MONTH(txn_date)), 'ALL MONTHS') AS mo,
       SUM(txn_amount) AS total_deposits,
       COUNT(*) AS txn_count
FROM   TELLER_TRANSACTIONS
WHERE  txn_type = 'D'
GROUP BY ROLLUP(branch_id, YEAR(txn_date), MONTH(txn_date))
ORDER BY branch_id, YEAR(txn_date), MONTH(txn_date);

ROLLUP(A, B, C) produces groupings: (A, B, C), (A, B), (A), and (). The order matters — ROLLUP(A, B) is not the same as ROLLUP(B, A).

For Meridian's executive dashboard, this produces:

  • Detail: deposits per branch per year per month
  • Subtotal: deposits per branch per year
  • Subtotal: deposits per branch
  • Grand total: all deposits

10.9.3 CUBE — All Possible Combinations

CUBE produces subtotals for every combination of the grouped columns:

SELECT branch_id,
       account_type,
       SUM(current_balance) AS total_balance,
       COUNT(*) AS account_count,
       GROUPING(branch_id)  AS is_branch_subtotal,
       GROUPING(account_type) AS is_type_subtotal
FROM   ACCOUNTS
GROUP BY CUBE(branch_id, account_type)
ORDER BY branch_id, account_type;

CUBE(A, B) produces groupings: (A, B), (A), (B), and (). With three columns, CUBE produces 2^3 = 8 groupings. Use CUBE when you need cross-tabulation analysis — "total by branch," "total by account type," and "grand total" all in one query.

The GROUPING() function returns 1 when the column is aggregated (i.e., the NULL in that column represents "all values," not a real NULL). This helps distinguish genuine NULLs from subtotal NULLs.
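A common use of GROUPING() is to label subtotal rows directly, so reports never show an ambiguous NULL. A sketch against the same ACCOUNTS cube:

```sql
SELECT CASE WHEN GROUPING(branch_id) = 1
            THEN 'ALL BRANCHES'
            ELSE CHAR(branch_id) END      AS branch,
       CASE WHEN GROUPING(account_type) = 1
            THEN 'ALL TYPES'
            ELSE account_type END         AS acct_type,
       SUM(current_balance)               AS total_balance
FROM   ACCOUNTS
GROUP BY CUBE(branch_id, account_type)
ORDER BY 1, 2;
```

Unlike the COALESCE approach shown earlier, this labeling is reliable even when a real NULL appears in the data.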

10.9.4 GROUPING SETS — Explicit Control

GROUPING SETS lets you specify exactly which groupings you want, without computing all possible combinations:

SELECT branch_id,
       account_type,
       YEAR(open_date) AS open_year,
       SUM(current_balance) AS total_balance,
       COUNT(*) AS account_count
FROM   ACCOUNTS
GROUP BY GROUPING SETS (
    (branch_id, account_type),
    (branch_id, YEAR(open_date)),
    (account_type),
    ()
)
ORDER BY branch_id, account_type, open_year;

This produces exactly four groupings — no more, no less. GROUPING SETS is more efficient than CUBE when you only need specific combinations, and it is the most flexible of the three.

Equivalences:

  • ROLLUP(A, B, C) = GROUPING SETS((A, B, C), (A, B), (A), ())
  • CUBE(A, B) = GROUPING SETS((A, B), (A), (B), ())

10.9.5 Combining OLAP Grouping with Window Functions

The real power emerges when you combine grouping extensions with window functions:

-- Executive dashboard: branch performance with rank within each subtotal level
WITH branch_summary AS (
    SELECT branch_id,
           account_type,
           SUM(current_balance) AS total_balance,
           GROUPING(branch_id) AS grp_branch,
           GROUPING(account_type) AS grp_type
    FROM   ACCOUNTS
    GROUP BY ROLLUP(branch_id, account_type)
)
SELECT branch_id,
       account_type,
       total_balance,
       CASE
           WHEN grp_branch = 0 AND grp_type = 0
           THEN RANK() OVER (
               PARTITION BY account_type
               ORDER BY total_balance DESC
           )
       END AS rank_within_type,
       grp_branch,
       grp_type
FROM   branch_summary
ORDER BY grp_branch, grp_type, branch_id;

10.10 Meridian Bank Advanced Analytics

Let us bring everything together with four realistic analytical queries that Meridian National Bank's data team might build for the executive dashboard.

10.10.1 Customer Lifetime Value Calculation

Customer lifetime value (CLV) estimates the total revenue a customer will generate. We approximate it using historical transaction data, account tenure, and product holdings:

WITH customer_metrics AS (
    SELECT c.customer_id,
           c.customer_name,
           c.signup_date,
           DAYS(CURRENT DATE) - DAYS(c.signup_date) AS tenure_days,
           COUNT(DISTINCT a.account_id) AS product_count,
           SUM(a.current_balance) AS total_balance,
           COUNT(t.txn_id) AS total_transactions,
           SUM(CASE WHEN t.txn_type = 'D' THEN t.txn_amount ELSE 0 END)
               AS lifetime_deposits,
           SUM(CASE WHEN t.txn_date >= CURRENT DATE - 365 DAYS
                    THEN t.txn_amount ELSE 0 END) AS last_year_volume
    FROM   CUSTOMERS c
    LEFT JOIN ACCOUNTS a ON c.customer_id = a.customer_id
    LEFT JOIN TELLER_TRANSACTIONS t ON a.account_id = t.account_id
    GROUP BY c.customer_id, c.customer_name, c.signup_date
),
clv_scored AS (
    SELECT customer_id,
           customer_name,
           tenure_days,
           product_count,
           total_balance,
           total_transactions,
           lifetime_deposits,
           last_year_volume,
           -- CLV approximation: weighted scoring
           (total_balance * 0.03)          -- 3% annual margin on balances
           + (last_year_volume * 0.001)    -- fee revenue estimate
           + (product_count * 500)          -- cross-sell value
           AS estimated_annual_value,
           NTILE(5) OVER (ORDER BY total_balance + last_year_volume DESC)
               AS value_quintile,
           RANK() OVER (ORDER BY total_balance + last_year_volume DESC)
               AS overall_rank
    FROM   customer_metrics
)
SELECT customer_id,
       customer_name,
       tenure_days,
       product_count,
       total_balance,
       estimated_annual_value,
       value_quintile,
       overall_rank,
       FIRST_VALUE(customer_name) OVER (
           PARTITION BY value_quintile
           ORDER BY estimated_annual_value DESC
           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
       ) AS top_customer_in_quintile
FROM   clv_scored
ORDER BY overall_rank;

This query uses CTEs for staged calculation, NTILE for segmentation, RANK for ordering, and FIRST_VALUE for reference comparisons — all in a single statement.

10.10.2 Transaction Trend Analysis with Moving Averages

The treasury team needs to visualize deposit trends by branch with smoothing:

WITH daily_deposits AS (
    SELECT branch_id,
           txn_date,
           SUM(txn_amount) AS daily_total,
           COUNT(*) AS daily_count
    FROM   TELLER_TRANSACTIONS
    WHERE  txn_type = 'D'
      AND  txn_date >= CURRENT DATE - 180 DAYS
    GROUP BY branch_id, txn_date
)
SELECT branch_id,
       txn_date,
       daily_total,
       daily_count,
       -- 7-day moving average
       AVG(daily_total) OVER (
           PARTITION BY branch_id
           ORDER BY txn_date
           ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
       ) AS ma_7day,
       -- 30-day moving average
       AVG(daily_total) OVER (
           PARTITION BY branch_id
           ORDER BY txn_date
           ROWS BETWEEN 29 PRECEDING AND CURRENT ROW
       ) AS ma_30day,
       -- Day-over-day change
       daily_total - LAG(daily_total, 1, 0) OVER (
           PARTITION BY branch_id
           ORDER BY txn_date
       ) AS dod_change,
       -- Cumulative YTD
       SUM(daily_total) OVER (
           PARTITION BY branch_id, YEAR(txn_date)
           ORDER BY txn_date
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS ytd_cumulative,
       -- Rank day within branch by volume
       RANK() OVER (
           PARTITION BY branch_id
           ORDER BY daily_total DESC
       ) AS volume_rank_in_branch
FROM   daily_deposits
ORDER BY branch_id, txn_date;

Five different window function computations on the same data set, each providing a distinct analytical lens, all in a single pass.

10.10.3 Branch Performance Ranking Over Time

Management wants to see how each branch's rank changes month over month:

WITH monthly_branch AS (
    SELECT t.branch_id,
           YEAR(t.txn_date) AS yr,
           MONTH(t.txn_date) AS mo,
           SUM(t.txn_amount) AS monthly_deposits,
           COUNT(DISTINCT a.customer_id) AS active_customers
    FROM   TELLER_TRANSACTIONS t
    JOIN   ACCOUNTS a ON t.account_id = a.account_id
    WHERE  t.txn_type = 'D'
    GROUP BY t.branch_id, YEAR(t.txn_date), MONTH(t.txn_date)
),
ranked AS (
    SELECT branch_id,
           yr,
           mo,
           monthly_deposits,
           active_customers,
           RANK() OVER (
               PARTITION BY yr, mo
               ORDER BY monthly_deposits DESC
           ) AS deposit_rank,
           RANK() OVER (
               PARTITION BY yr, mo
               ORDER BY active_customers DESC
           ) AS customer_rank
    FROM   monthly_branch
)
SELECT branch_id,
       yr,
       mo,
       monthly_deposits,
       deposit_rank,
       LAG(deposit_rank, 1) OVER (
           PARTITION BY branch_id
           ORDER BY yr, mo
       ) AS prev_month_rank,
       deposit_rank - LAG(deposit_rank, 1) OVER (
           PARTITION BY branch_id
           ORDER BY yr, mo
       ) AS rank_change,
       CASE
           WHEN deposit_rank < LAG(deposit_rank, 1) OVER (
               PARTITION BY branch_id ORDER BY yr, mo
           ) THEN 'IMPROVED'
           WHEN deposit_rank > LAG(deposit_rank, 1) OVER (
               PARTITION BY branch_id ORDER BY yr, mo
           ) THEN 'DECLINED'
           ELSE 'STABLE'
       END AS trend
FROM   ranked
ORDER BY yr, mo, deposit_rank;

10.10.4 Audit Trail with Temporal Tables

The compliance team needs to demonstrate that the bank can reconstruct any customer's profile as it existed at any point in time:

-- Compliance scenario: Show customer 10042's profile as it existed
-- at the end of each quarter in 2024

SELECT 'Q1 2024' AS quarter,
       cp.*
FROM   CUSTOMER_PROFILES cp
    FOR SYSTEM_TIME AS OF '2024-03-31-23.59.59.999999999999'
WHERE  cp.customer_id = 10042

UNION ALL

SELECT 'Q2 2024',
       cp.*
FROM   CUSTOMER_PROFILES cp
    FOR SYSTEM_TIME AS OF '2024-06-30-23.59.59.999999999999'
WHERE  cp.customer_id = 10042

UNION ALL

SELECT 'Q3 2024',
       cp.*
FROM   CUSTOMER_PROFILES cp
    FOR SYSTEM_TIME AS OF '2024-09-30-23.59.59.999999999999'
WHERE  cp.customer_id = 10042

UNION ALL

SELECT 'Q4 2024',
       cp.*
FROM   CUSTOMER_PROFILES cp
    FOR SYSTEM_TIME AS OF '2024-12-31-23.59.59.999999999999'
WHERE  cp.customer_id = 10042

ORDER BY quarter;

This produces a quarterly snapshot of the customer's profile — credit rating changes, income updates, risk category reclassifications — all without the application having written a single line of history-management code.

For the complete audit trail showing every change:

SELECT customer_id,
       customer_name,
       credit_rating,
       annual_income,
       risk_category,
       sys_start,
       sys_end,
       CASE
           WHEN sys_end = '9999-12-30-00.00.00.000000000000'
           THEN 'CURRENT'
           ELSE 'HISTORICAL'
       END AS version_status
FROM   CUSTOMER_PROFILES
    FOR SYSTEM_TIME FROM '2020-01-01' TO '9999-12-30'
WHERE  customer_id = 10042
ORDER BY sys_start;

Spaced Review — Chapters 3, 6, and 8

Before proceeding to the exercises, let us reinforce concepts from earlier chapters that are deeply connected to this chapter's material.

From Chapter 3: SELECT Fundamentals

Connection to Chapter 10: Window functions extend SELECT with the OVER clause. But remember — window functions execute after WHERE and GROUP BY in the logical query pipeline. You cannot filter on a window function result in the same query level; wrap it in a CTE or subquery first.
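For example, this sketch finds the top three balances per branch; the ROW_NUMBER result must be computed in a CTE before it can appear in a WHERE clause:

```sql
WITH ranked AS (
    SELECT branch_id,
           account_number,
           current_balance,
           ROW_NUMBER() OVER (
               PARTITION BY branch_id
               ORDER BY current_balance DESC
           ) AS rn
    FROM   ACCOUNTS
)
SELECT branch_id, account_number, current_balance
FROM   ranked
WHERE  rn <= 3          -- legal here; illegal inside the inner query's WHERE
ORDER BY branch_id, rn;
```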

Review Question 1: Write a query that lists all checking accounts with a balance above $10,000 and includes a column showing how many such accounts exist at the same branch. (Hint: COUNT with OVER and PARTITION BY.)

SELECT branch_id,
       account_number,
       customer_name,
       current_balance,
       COUNT(*) OVER (PARTITION BY branch_id) AS high_balance_count_at_branch
FROM   ACCOUNTS a
JOIN   CUSTOMERS c ON a.customer_id = c.customer_id
WHERE  account_type = 'CHK'
  AND  current_balance > 10000
ORDER BY branch_id, current_balance DESC;

From Chapter 6: Joins

Connection to Chapter 10: The self-join pattern from Chapter 6 (EMPLOYEES e JOIN EMPLOYEES m ON e.manager_id = m.employee_id) solves one level of the hierarchy. Recursive CTEs generalize this to n levels. You have now graduated from the self-join to the recursive CTE.

Review Question 2: Rewrite a self-join that finds each employee and their manager's name. Then extend it to show the entire management chain using a recursive CTE. Which approach scales to arbitrary depth?
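A sketch of both approaches (the recursive version reuses the org_chart pattern from Section 10.7.3):

```sql
-- Self-join: exactly one level (each employee and their direct manager).
SELECT e.employee_name,
       m.employee_name AS manager_name
FROM   EMPLOYEES e
LEFT JOIN EMPLOYEES m ON e.manager_id = m.employee_id;

-- Recursive CTE: the entire management chain, to arbitrary depth.
WITH chain (employee_id, employee_name, mgr_chain) AS (
    SELECT employee_id, employee_name,
           CAST(employee_name AS VARCHAR(1000))
    FROM   EMPLOYEES
    WHERE  manager_id IS NULL

    UNION ALL

    SELECT e.employee_id, e.employee_name,
           CAST(c.mgr_chain || ' > ' || e.employee_name AS VARCHAR(1000))
    FROM   EMPLOYEES e
    JOIN   chain c ON e.manager_id = c.employee_id
)
SELECT employee_name, mgr_chain FROM chain;
```

Only the recursive CTE scales to arbitrary depth; the self-join must be repeated once per level.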

From Chapter 8: Subqueries and CTEs

Connection to Chapter 10: Chapter 8 introduced CTEs with WITH. Recursive CTEs add the UNION ALL self-reference pattern. The discipline of naming CTEs clearly and building queries in stages is even more important with recursive queries, where debugging a malformed recursive member can be maddening.

Review Question 3: Convert the following correlated subquery to a window function approach:

-- Correlated subquery version
SELECT a.account_number,
       a.current_balance,
       (SELECT AVG(a2.current_balance)
        FROM ACCOUNTS a2
        WHERE a2.branch_id = a.branch_id
          AND a2.account_type = a.account_type) AS avg_balance
FROM   ACCOUNTS a;

-- Window function version
SELECT account_number,
       current_balance,
       AVG(current_balance) OVER (
           PARTITION BY branch_id, account_type
       ) AS avg_balance
FROM   ACCOUNTS;

Both produce identical results, but the window function version is clearer, more concise, and typically faster.


Chapter Summary

This chapter introduced four families of advanced SQL features that fundamentally change how you approach data problems:

  1. Window functions let you compute aggregates, rankings, and comparisons across related rows without collapsing them. The OVER clause with PARTITION BY, ORDER BY, and frame specifications gives you precise control over which rows participate in each computation.

  2. Ranking functions (ROW_NUMBER, RANK, DENSE_RANK, NTILE) assign ordinal positions within partitions, with different tie-handling semantics that matter for different business requirements.

  3. LAG, LEAD, FIRST_VALUE, LAST_VALUE, and NTH_VALUE let you access specific rows relative to the current row — enabling period-over-period comparisons, gap analysis, and reference lookups without self-joins.

  4. Frame specifications (ROWS BETWEEN, RANGE BETWEEN) define the sliding viewport that aggregate window functions operate on. Understanding the default frame behavior is essential for avoiding subtle bugs.

  5. Recursive CTEs traverse hierarchical relationships to arbitrary depth. They open up org chart queries, bill of materials explosions, and graph traversals that are impossible with standard joins.

  6. Temporal tables — system-time, business-time, and bitemporal — let DB2 manage the time dimension, providing automatic history tracking and "time-travel" queries that are essential for regulatory compliance.

  7. OLAP grouping extensions — ROLLUP, CUBE, and GROUPING SETS — produce multi-level summaries in a single query, replacing multiple UNION ALL operations with a clean, declarative syntax.

These are the tools that separate a SQL user from a SQL thinker. In the chapters ahead, we will build on these foundations as we move into database design, stored procedures, and performance tuning. But the analytical power you have gained in this chapter will remain central to your work — these queries form the backbone of every reporting system, every dashboard, and every data pipeline you will ever build.


What Comes Next

Chapter 11 takes us beyond DML into the world of stored procedures, user-defined functions, and triggers. We will embed the analytical SQL you learned here inside procedural logic — creating reusable, parameterized analytical routines that Meridian's applications can call with a single statement. The window functions and recursive CTEs from this chapter will appear frequently in those procedures.


The queries in this chapter were developed against IBM DB2 for LUW version 11.5. Temporal table syntax and specific function availability may vary slightly in DB2 for z/OS or earlier versions. Consult the IBM Knowledge Center for your specific platform and version.