Case Study 2: Index-Only Access Optimization — Eliminating Table Scans

"The Dashboard That Took 45 Seconds" — How a Covering Index Strategy Transformed a FinTech Application


The Company

Bridgepoint Financial Technologies was a San Francisco-based FinTech company that provided real-time portfolio analytics to institutional investors. Their platform served 340 hedge funds and asset managers with dashboards showing portfolio positions, risk metrics, and performance attribution, updated every 30 seconds throughout the trading day.

The backend ran on DB2 for LUW, version 11.5.8, deployed on a cluster of high-memory Linux servers (256 GB RAM each) on AWS. The core database held 4.2 billion rows across the main tables, with the POSITION table (current portfolio holdings) at 180 million rows and the TRADE table (historical trades) at 3.8 billion rows.

The Problem

In late 2025, client complaints escalated. The portfolio summary dashboard — the most-viewed page in the application — was taking 30-45 seconds to load during market hours. The SLA was 3 seconds. Clients were threatening to leave for competitors.

The dashboard query was conceptually simple. For a given fund, it needed to show:

SELECT P.SECURITY_ID,
       P.SECURITY_NAME,
       P.QUANTITY,
       P.MARKET_VALUE,
       P.WEIGHT_PCT,
       P.UNREALIZED_PNL
FROM POSITION P
WHERE P.FUND_ID = ?
  AND P.AS_OF_DATE = CURRENT DATE
  AND P.QUANTITY <> 0
ORDER BY P.MARKET_VALUE DESC;

A typical fund held 200-500 positions. The query should have returned a few hundred rows from a 180-million-row table. On paper, this was a straightforward indexed lookup. In practice, it was a performance disaster.

The Investigation

Bridgepoint's lead DBA, Raj Patel, began his investigation with the EXPLAIN output:

Access Plan:
  RETURN (Cost: 45,230)
    SORT (ORDER BY)
      FETCH (Table: POSITION)
        IXSCAN (Index: IX_POSITION_FUND_DATE)
          Matching columns: 2
          Index-only: NO

The access plan showed that DB2 was using the index IX_POSITION_FUND_DATE on (FUND_ID, AS_OF_DATE) to locate the rows — both columns were matching. But then it was performing a FETCH operation: for each qualifying index entry, DB2 read the corresponding data page to retrieve SECURITY_NAME, QUANTITY, MARKET_VALUE, WEIGHT_PCT, and UNREALIZED_PNL.

Raj examined the POSITION table's physical characteristics:

  • Table size: 180 million rows across 7.2 million data pages (about 28 GB)
  • Average rows per fund per day: 350
  • Cluster ratio on IX_POSITION_FUND_DATE: 23%

The cluster ratio was the smoking gun. At 23%, the data pages for a given fund's positions were scattered almost randomly throughout the tablespace. Each of the 350 qualifying rows was likely on a different data page. That meant 350 random data page reads for every dashboard load.
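
On DB2 for LUW, the cluster ratio is visible in the system catalog; a check along these lines would have surfaced the problem (the schema name here is hypothetical):

SELECT INDNAME, CLUSTERRATIO, CLUSTERFACTOR
FROM SYSCAT.INDEXES
WHERE TABSCHEMA = 'PORTFOLIO'    -- hypothetical schema name
  AND TABNAME = 'POSITION';

Note that CLUSTERRATIO is reported as -1 when detailed statistics have been gathered; in that case CLUSTERFACTOR (a value between 0 and 1) carries the clustering information instead.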

Why was the cluster ratio so low? The POSITION table was loaded by a batch process that ran at end of day. It processed funds in alphabetical order, inserting all of Fund A's positions, then Fund B's, and so on, repeating across roughly 17 intraday adjustment cycles. The table also retained 90 days of history, so a given fund's rows for different dates landed in widely separated regions of the tablespace: yesterday's Fund A positions were on pages 1-5, today's on pages 7,200,001-7,200,005. The clustering index was on (FUND_ID, AS_OF_DATE), but the physical load order was chronological first and by fund second, so index order and physical order diverged, and the adjustment cycles scattered even a single day's rows for one fund across multiple batches.

The First Attempt: REORG

Raj's first instinct was to REORG the tablespace to restore clustering. The REORG took 6 hours (28 GB of data, rebuilding all indexes). It improved the cluster ratio to 98%. Dashboard response time dropped to 2 seconds. Problem solved — until the next day's load, when the cluster ratio began degrading. Within a week, it was back below 40%. Daily REORGs were not feasible; the table was too large and the maintenance window too short.
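
The commands for that first attempt would have looked roughly like this (a sketch; in practice they are issued from the command line processor or through SYSPROC.ADMIN_CMD, and the names would be schema-qualified):

-- rebuild the table in clustering-index order
REORG TABLE POSITION INDEX IX_POSITION_FUND_DATE;
-- refresh statistics so the optimizer sees the improved cluster ratio
RUNSTATS ON TABLE POSITION WITH DISTRIBUTION AND DETAILED INDEXES ALL;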

The Second Attempt: Covering Index

Raj reconsidered the problem. The dashboard query needed only 6 columns from the POSITION table: SECURITY_ID, SECURITY_NAME, QUANTITY, MARKET_VALUE, WEIGHT_PCT, and UNREALIZED_PNL. The WHERE clause used FUND_ID, AS_OF_DATE, and QUANTITY. The ORDER BY used MARKET_VALUE.

What if all these columns were in the index?

CREATE INDEX IX_POSITION_DASHBOARD
    ON POSITION (FUND_ID, AS_OF_DATE, MARKET_VALUE DESC)
    INCLUDE (SECURITY_ID, SECURITY_NAME, QUANTITY,
             WEIGHT_PCT, UNREALIZED_PNL);

This index design was deliberate:

  1. FUND_ID as the first key column: equality predicate, most selective.
  2. AS_OF_DATE as the second key column: equality predicate.
  3. MARKET_VALUE DESC as the third key column: matches the ORDER BY, eliminating the SORT operation.
  4. INCLUDE columns: SECURITY_ID, SECURITY_NAME, QUANTITY, WEIGHT_PCT, UNREALIZED_PNL — everything else the query needs.

With this index, DB2 could satisfy the entire query from the index leaf pages: find the entries for the given FUND_ID and AS_OF_DATE (matching scan on 2 columns), deliver them in MARKET_VALUE DESC order (third key column, no sort needed), and return all needed columns from the INCLUDE data (no table fetch needed). The QUANTITY <> 0 filter would be applied to each leaf entry, but since QUANTITY was an INCLUDE column, it was available without a data page read.
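
A sketch of how the new plan could be verified, assuming the explain tables already exist (for example, created from EXPLAIN.DDL) and statistics were refreshed after the index build:

RUNSTATS ON TABLE POSITION WITH DISTRIBUTION AND DETAILED INDEXES ALL;

EXPLAIN PLAN FOR
SELECT P.SECURITY_ID, P.SECURITY_NAME, P.QUANTITY,
       P.MARKET_VALUE, P.WEIGHT_PCT, P.UNREALIZED_PNL
FROM POSITION P
WHERE P.FUND_ID = ?
  AND P.AS_OF_DATE = CURRENT DATE
  AND P.QUANTITY <> 0
ORDER BY P.MARKET_VALUE DESC;

-- then format the captured plan: db2exfmt -d <dbname> -1 -o plan.txt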

The New EXPLAIN Output

Access Plan:
  RETURN (Cost: 42)
    IXSCAN (Index: IX_POSITION_DASHBOARD)
      Matching columns: 2
      Index-only: YES

Cost dropped from 45,230 to 42. The FETCH and SORT operators disappeared entirely.

Measuring the Impact

Raj tested the new index in the staging environment with production-scale data:

Metric                 Before              After                Improvement
Dashboard query time   30-45 seconds       80-120 ms            99.7%
I/O per query          ~370 random reads   ~8 sequential reads  97.8%
CPU per query          12 ms               2 ms                 83%
Sort operations        1 per query         0                    Eliminated

The index added 4.2 GB of storage (the INCLUDE columns made each leaf entry substantially wider). But the buffer pool impact was actually positive: the dashboard query no longer read 350 random data pages, freeing buffer pool space for other queries.

The Trade-Off Analysis

Raj carefully evaluated the write overhead. The POSITION table was loaded once per day in a batch window. The load process inserted approximately 2 million rows (350 positions x 340 funds x ~17 adjustment cycles). With the new covering index, each INSERT required one additional index entry write — wider than a standard index entry due to the INCLUDE columns.

Measured impact on the nightly load:

  • Before: 45 minutes for the POSITION load
  • After: 52 minutes for the POSITION load
  • Overhead: 7 minutes (a 15% increase)

Seven minutes of additional batch time bought a saving of roughly 40 seconds per dashboard load, for 340 clients refreshing every 30 seconds throughout a 6.5-hour trading day. The math:

  • Dashboard loads per day: 340 clients x (6.5 hours x 120 refreshes/hour) = 265,200 queries
  • Time saved per query: ~40 seconds
  • Total time saved: 265,200 x 40 seconds ≈ 2,947 hours of cumulative client wait time per day
  • Time cost: 7 minutes of batch window
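
The arithmetic checks out (SYSIBM.SYSDUMMY1 is DB2's standard one-row dummy table for scratch queries):

SELECT 340 * 6.5 * 120             AS QUERIES_PER_DAY,     -- 265,200
       340 * 6.5 * 120 * 40 / 3600 AS HOURS_SAVED_PER_DAY  -- about 2,947
FROM SYSIBM.SYSDUMMY1;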

The trade-off was not even close.

Additional Optimizations

Emboldened by the dashboard success, Raj identified four more high-frequency queries that could benefit from covering indexes:

Query 2 — Position Detail Drill-Down:

SELECT SECURITY_ID, SECURITY_NAME, QUANTITY, AVG_COST,
       MARKET_PRICE, MARKET_VALUE, UNREALIZED_PNL, SECTOR
FROM POSITION
WHERE FUND_ID = ? AND AS_OF_DATE = CURRENT DATE
  AND SECURITY_ID = ?;

This query was already partially served by the dashboard index (FUND_ID and AS_OF_DATE matching). But it also needed AVG_COST, MARKET_PRICE, and SECTOR — columns not in the dashboard index. Rather than widening the dashboard index further (which would slow inserts), Raj created a separate index:

CREATE UNIQUE INDEX IX_POSITION_DETAIL
    ON POSITION (FUND_ID, AS_OF_DATE, SECURITY_ID)
    INCLUDE (SECURITY_NAME, QUANTITY, AVG_COST,
             MARKET_PRICE, MARKET_VALUE, UNREALIZED_PNL,
             WEIGHT_PCT, SECTOR);

Query 3 — Fund Summary (total market value and P&L):

SELECT SUM(MARKET_VALUE), SUM(UNREALIZED_PNL)
FROM POSITION
WHERE FUND_ID = ? AND AS_OF_DATE = CURRENT DATE
  AND QUANTITY <> 0;

This was already index-only with the dashboard index: MARKET_VALUE was its third key column, and UNREALIZED_PNL and QUANTITY were INCLUDE columns, with QUANTITY available for the filter. No new index needed.

Query 4 — Sector Allocation:

SELECT SECTOR, SUM(MARKET_VALUE) AS SECTOR_VALUE,
       SUM(WEIGHT_PCT) AS SECTOR_WEIGHT
FROM POSITION
WHERE FUND_ID = ? AND AS_OF_DATE = CURRENT DATE
  AND QUANTITY <> 0
GROUP BY SECTOR;

Served by the detail index (IX_POSITION_DETAIL), which included SECTOR, MARKET_VALUE, WEIGHT_PCT, and QUANTITY.

In the end, two covering indexes served all four critical queries with index-only access. Total additional storage: 9.1 GB. Total batch load overhead: 12 minutes. Total OLTP improvement: transformative.
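
Both covering indexes, their leaf-page counts, and key cardinalities could then be tracked from the catalog over time (a sketch):

SELECT INDNAME, NLEAF, FULLKEYCARD
FROM SYSCAT.INDEXES
WHERE TABNAME = 'POSITION'
  AND INDNAME IN ('IX_POSITION_DASHBOARD', 'IX_POSITION_DETAIL');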

The Broader Lesson

Raj documented three principles from this experience:

1. Cluster ratio is the hidden variable. A query that looks efficient on paper (matching index scan on two selective columns) can be devastatingly slow if the cluster ratio is low. The EXPLAIN output showed the right index was being used — the problem was not the access plan but the physical data layout. Always check the cluster ratio before assuming an indexed lookup is fast.

2. Index-only access changes the equation fundamentally. When the cluster ratio is low, adding more data to the index (via INCLUDE columns) can be more effective than fixing the cluster ratio. A covering index bypasses the data pages entirely, making the cluster ratio irrelevant for that query. This is especially powerful for read-heavy workloads where the table is large and clustering is difficult to maintain.

3. Measure the trade-off in real units. "The index is wider" and "the insert is slower" are not arguments — they are observations. The argument must be quantified: how much wider, how much slower, and does the read benefit exceed the write cost? In Raj's case, 12 minutes of batch time bought roughly 2,947 hours of client wait time savings. That is not a close call.

Epilogue

Bridgepoint's client complaints dropped to zero within a week of deploying the covering indexes. The portfolio summary dashboard loaded in under 150ms — so fast that the UX team added a subtle animation to the page transition because the instant load felt "jarring" to users accustomed to waiting.

Raj presented the case study at the annual IDUG conference in Dallas. The title of his talk: "Stop Fetching: How INCLUDE Columns Saved Our Business." It was the highest-rated technical session of the conference.

