Case Study 1: Financial Reporting Queries at Scale
Background
Regional First Bancorp (RFB) is a mid-sized bank holding company operating 47 branches across three states: Pennsylvania, Delaware, and New Jersey. The bank manages approximately 380,000 customer accounts with a total deposit base of $4.2 billion. Its transaction table contains 18 months of history — roughly 45 million rows — and grows by about 250,000 rows per business day.
RFB's reporting infrastructure was built five years ago by a consulting firm. The consultants created a set of stored procedures that generate nightly reports by running aggregate queries against the production DB2 for LUW database. At the time, the approach worked. The largest table had 12 million rows, and the nightly batch window could accommodate the reporting workload alongside the usual batch processing.
Today, the system is breaking.
The Problem
Three specific issues have brought the reporting team to the DBA group:
Issue 1: The Branch Performance Report takes 47 minutes. This report aggregates transaction data by branch, month, and transaction type using a three-level ROLLUP. It joins TRANSACTIONS (45 million rows) with ACCOUNTS (380,000 rows) and BRANCHES (47 rows). Five years ago, with 12 million transactions, it ran in 8 minutes. It has been growing linearly with data volume.
Issue 2: The Customer Segmentation Report produces incorrect averages. The report groups customers into wealth tiers based on total deposits. The marketing department noticed that the "average balance per customer" in the mid-tier segment was $23,000 — but a manual audit of 50 customers in that segment showed an average of $31,000. The discrepancy has undermined trust in the reporting system.
Issue 3: The Channel Mix CUBE report exhausts temporary table space. A CUBE operation on four dimensions — state, branch, account type, and channel — produces 2^4 = 16 grouping levels. Combined with the large data volume, the intermediate sort and hash operations exceed the configured temporary tablespace, and the query fails with SQLSTATE 57011 (out of temporary space).
Investigation
Issue 1: The 47-Minute Report
The DBA team ran EXPLAIN on the Branch Performance Report and found:
- The query performed a full table scan on TRANSACTIONS. The WHERE clause filtered on TRANSACTION_DATE >= '2024-07-01', but the existing index on TRANSACTION_DATE was not used: the optimizer estimated that the filter covered most of the table's range, so nearly the full 18 months of data had to be read anyway.
- The GROUP BY ROLLUP triggered a sort operation that overflowed the sort heap. SORTHEAP was configured at 2048 4K pages (8 MB), which was sufficient five years ago but is inadequate for a sort involving 45 million rows with multiple grouping columns.
- The plan joined TRANSACTIONS to ACCOUNTS to BRANCHES with nested loop joins, where hash joins would have been far more efficient at this data volume.
Resolution:
The team applied three fixes:
First, they narrowed the date range in the reporting query. The ROLLUP report only needed 12 months of data, not 18, so they added AND TRANSACTION_DATE >= CURRENT DATE - 12 MONTHS. This allowed the optimizer to use the TRANSACTION_DATE index for range elimination, cutting the scanned rows from 45 million to approximately 30 million.
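The case study does not show the full report query, but the revised filter might look like the following sketch; the SELECT list and grouping columns are illustrative, based on the report's description (branch, month, and transaction type):

```sql
-- Sketch of the narrowed report query. The STATUS = 'C' predicate is
-- taken from the MQT definition later in the case; the rest of the
-- query shape is assumed.
SELECT a.BRANCH_ID,
       MONTH(t.TRANSACTION_DATE) AS TXN_MONTH,
       t.TRANSACTION_TYPE,
       COUNT(*)      AS TXN_COUNT,
       SUM(t.AMOUNT) AS TOTAL_AMOUNT
FROM TRANSACTIONS t
JOIN ACCOUNTS a ON t.ACCOUNT_ID = a.ACCOUNT_ID
WHERE t.STATUS = 'C'
  AND t.TRANSACTION_DATE >= CURRENT DATE - 12 MONTHS  -- enables index range scan
GROUP BY ROLLUP (a.BRANCH_ID, MONTH(t.TRANSACTION_DATE), t.TRANSACTION_TYPE);
```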
Second, they increased SORTHEAP from 2048 to 8192 pages (32 MB) and set SHEAPTHRES_SHR appropriately to allow the larger sort heap without starving other workloads.
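Applied through the DB2 command line processor, the parameter changes might look like this. RFBPROD is an assumed database name, and the SHEAPTHRES_SHR value is a placeholder that would need tuning against the instance's actual memory budget:

```
-- Hypothetical CLP script (run with: db2 -tvf tune_sort.clp)
UPDATE DB CFG FOR RFBPROD USING SORTHEAP 8192;
UPDATE DB CFG FOR RFBPROD USING SHEAPTHRES_SHR 40000;
```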
Third, and most significantly, they created a Materialized Query Table (MQT):
CREATE TABLE MQT_BRANCH_MONTHLY_SUMMARY AS (
SELECT a.BRANCH_ID,
YEAR(t.TRANSACTION_DATE) AS TXN_YEAR,
MONTH(t.TRANSACTION_DATE) AS TXN_MONTH,
t.TRANSACTION_TYPE,
COUNT(*) AS TXN_COUNT,
SUM(t.AMOUNT) AS TOTAL_AMOUNT,
SUM(CASE WHEN t.AMOUNT > 0 THEN t.AMOUNT ELSE 0 END) AS CREDITS,
SUM(CASE WHEN t.AMOUNT < 0 THEN ABS(t.AMOUNT) ELSE 0 END) AS DEBITS
FROM TRANSACTIONS t
JOIN ACCOUNTS a ON t.ACCOUNT_ID = a.ACCOUNT_ID
WHERE t.STATUS = 'C'
GROUP BY a.BRANCH_ID, YEAR(t.TRANSACTION_DATE),
MONTH(t.TRANSACTION_DATE), t.TRANSACTION_TYPE
) DATA INITIALLY DEFERRED REFRESH DEFERRED;
REFRESH TABLE MQT_BRANCH_MONTHLY_SUMMARY;
The MQT pre-aggregates 45 million transaction rows into approximately 8,500 rows (47 branches x 18 months x roughly 10 transaction types). The nightly batch refreshes the MQT after transaction processing completes. The ROLLUP report now runs against the MQT instead of the base tables:
SELECT b.STATE_CODE,
b.BRANCH_NAME,
m.TXN_YEAR,
m.TXN_MONTH,
SUM(m.TXN_COUNT) AS txn_count,
SUM(m.CREDITS) AS credits,
SUM(m.DEBITS) AS debits
FROM MQT_BRANCH_MONTHLY_SUMMARY m
JOIN BRANCHES b ON m.BRANCH_ID = b.BRANCH_ID
GROUP BY ROLLUP (b.STATE_CODE, b.BRANCH_NAME, m.TXN_YEAR, m.TXN_MONTH)
ORDER BY ...;
Run time dropped from 47 minutes to 3 seconds.
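A statistics-collection step after the refresh helps the optimizer recognize how small the summary table is. A sketch of the nightly batch step, with RPT as an assumed schema name:

```sql
-- Refresh the summary after transaction posting completes, then
-- update statistics so the optimizer sees accurate cardinalities
-- for the ~8,500-row table.
REFRESH TABLE RPT.MQT_BRANCH_MONTHLY_SUMMARY;
RUNSTATS ON TABLE RPT.MQT_BRANCH_MONTHLY_SUMMARY
    WITH DISTRIBUTION AND INDEXES ALL;
```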
Issue 2: The Incorrect Averages
The marketing team's Customer Segmentation Report contained this query:
SELECT segment,
COUNT(*) AS customer_count,
AVG(CURRENT_BALANCE) AS avg_balance
FROM (
SELECT c.CUSTOMER_ID,
CASE WHEN SUM(a.CURRENT_BALANCE) < 25000 THEN 'Mass Market'
WHEN SUM(a.CURRENT_BALANCE) < 250000 THEN 'Affluent'
ELSE 'High Net Worth'
END AS segment,
a.CURRENT_BALANCE
FROM CUSTOMERS c
JOIN ACCOUNTS a ON c.CUSTOMER_ID = a.CUSTOMER_ID
WHERE a.STATUS = 'A'
GROUP BY c.CUSTOMER_ID, a.CURRENT_BALANCE
) sub
GROUP BY segment;
The bug was in the subquery. By including a.CURRENT_BALANCE in the GROUP BY, the subquery produced one row per customer per distinct balance, not one row per customer. A customer with three accounts (balances of $10,000, $15,000, and $8,000) produced three rows, each with a different CURRENT_BALANCE value. SUM(a.CURRENT_BALANCE) was therefore evaluated within each (customer, balance) group rather than across the whole customer, so the segment assignment itself could misclassify multi-account customers, and the outer AVG(CURRENT_BALANCE) averaged individual account balances rather than total customer balances. The outer COUNT(*) likewise counted rows, not customers.
Resolution:
SELECT segment,
COUNT(*) AS customer_count,
AVG(total_balance) AS avg_balance
FROM (
SELECT c.CUSTOMER_ID,
SUM(a.CURRENT_BALANCE) AS total_balance,
CASE WHEN SUM(a.CURRENT_BALANCE) < 25000 THEN 'Mass Market'
WHEN SUM(a.CURRENT_BALANCE) < 250000 THEN 'Affluent'
ELSE 'High Net Worth'
END AS segment
FROM CUSTOMERS c
JOIN ACCOUNTS a ON c.CUSTOMER_ID = a.CUSTOMER_ID
WHERE a.STATUS = 'A'
GROUP BY c.CUSTOMER_ID
) sub
GROUP BY segment;
The fix was to remove a.CURRENT_BALANCE from the inner GROUP BY and instead compute SUM(a.CURRENT_BALANCE) AS total_balance in the subquery's SELECT. The outer query then averages the per-customer totals. The corrected report matched the manual audit.
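One way to catch this class of bug early is a row-count sanity check on the inner query. A sketch, reusing the tables from the report:

```sql
-- If the inner query really produces one row per customer, these two
-- counts must match; a mismatch points directly at the GROUP BY clause.
SELECT COUNT(*)                    AS rows_produced,
       COUNT(DISTINCT CUSTOMER_ID) AS distinct_customers
FROM (
    SELECT c.CUSTOMER_ID
    FROM CUSTOMERS c
    JOIN ACCOUNTS a ON c.CUSTOMER_ID = a.CUSTOMER_ID
    WHERE a.STATUS = 'A'
    GROUP BY c.CUSTOMER_ID, a.CURRENT_BALANCE   -- the buggy grouping
) sub;
```

With the buggy grouping, rows_produced exceeds distinct_customers for any customer holding multiple accounts with different balances.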
Issue 3: The CUBE Temporary Space Exhaustion
The four-dimension CUBE produced 16 grouping levels over data that, before grouping, involved 45 million transaction rows joined to accounts and branches. DB2 evaluates a CUBE as multiple internal grouping passes, and the intermediate sort and hash results exceeded the 5 GB temporary tablespace.
Resolution:
The team realized that management did not actually need all 16 combinations. They only needed:
- State + Branch + Account Type + Channel (detail)
- State + Branch + Account Type (channel subtotal)
- State + Branch (account type subtotal)
- State (branch subtotal)
- Grand total
This is a ROLLUP, not a CUBE. Switching from CUBE to ROLLUP reduced the grouping levels from 16 to 5 and, more importantly, allowed DB2 to use a single sorted pass rather than multiple hash-based passes. Additionally, the team filtered the data to the most recent 6 months before the ROLLUP, further reducing the working set.
For the few cases where management genuinely needed cross-dimensional subtotals (e.g., "total by channel across all branches"), they used GROUPING SETS with only the specific combinations needed, rather than the full CUBE.
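A sketch of both rewrites. The view name CHANNEL_MIX_V is hypothetical, standing in for the joined, six-month-filtered source data with the four dimension columns described above:

```sql
-- Hierarchical subtotals only: 5 grouping levels instead of 16.
SELECT STATE_CODE, BRANCH_ID, ACCOUNT_TYPE, CHANNEL,
       SUM(AMOUNT) AS TOTAL_AMOUNT
FROM CHANNEL_MIX_V
GROUP BY ROLLUP (STATE_CODE, BRANCH_ID, ACCOUNT_TYPE, CHANNEL);

-- Specific cross-dimensional subtotals without the full CUBE:
SELECT STATE_CODE, BRANCH_ID, ACCOUNT_TYPE, CHANNEL,
       SUM(AMOUNT) AS TOTAL_AMOUNT
FROM CHANNEL_MIX_V
GROUP BY GROUPING SETS ((STATE_CODE, BRANCH_ID, ACCOUNT_TYPE, CHANNEL),
                        (CHANNEL),   -- total by channel across all branches
                        ());         -- grand total
```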
Lessons Learned
- Aggregation performance degrades with data growth. A query that runs in 8 minutes on 12 million rows will not run in 8 minutes on 45 million rows. Monitor growth and plan for it.
- Materialized Query Tables are the single most effective tool for speeding up reporting queries. Pre-aggregating from millions of rows to thousands transforms minutes into seconds.
- AVG bugs are subtle. The denominator of AVG is not always what you expect, especially in subqueries with incorrect GROUP BY columns. When aggregation results do not match manual audits, check the GROUP BY clause first.
- CUBE is expensive. 2^N grouping levels grow exponentially. If you need a hierarchy, use ROLLUP (N+1 levels). If you need specific combinations, use GROUPING SETS. Only use CUBE when you genuinely need every combination.
- Temporary tablespace must be sized for the worst-case query. ROLLUP, CUBE, and large GROUP BY operations can produce enormous intermediate result sets. Monitor temporary space utilization and increase it proactively as data grows.
Discussion Questions
- The MQT solved the performance problem, but it introduces a data freshness issue: the MQT is only as current as its last refresh. How would you handle a request for a "real-time" branch summary report while still using the MQT for historical data?
- The AVG bug persisted for months before the marketing team noticed. What testing practices could have caught it earlier?
- If the bank decides to add a fifth dimension to the CUBE (e.g., customer segment), the grouping levels would increase from 16 to 32. At what point is it better to move this analysis to a separate data warehouse with a star schema rather than running it against the OLTP database?