Case Study 2: OLTP Query Tuning Sprint

Background

Meridian National Bank has deployed a new teller workstation application. Within the first week of production, the help desk is flooded with complaints: "The system is slow." Response times that were sub-second in testing now exceed 5 seconds during peak hours (10:00 AM - 2:00 PM). Branch managers are escalating to the CIO.

The DBA team captures a workload trace and identifies the top five most frequently executed queries, sorted by total CPU consumption. These five queries account for 78% of the total CPU used by the teller application during peak hours.

The Top 5 Problem Queries

Query 1: Customer Lookup by Name (executed ~2,000 times/hour)

SELECT c.customer_id, c.customer_name, c.phone, c.email,
       c.address_line1, c.city, c.state, c.zip_code
FROM   CUSTOMERS c
WHERE  UPPER(c.customer_name) LIKE '%' || UPPER(:search_term) || '%'
ORDER BY c.customer_name
FETCH FIRST 20 ROWS ONLY;

Measured: Average 3.2 seconds per execution. Table space scan on every execution. At 2,000 executions per hour, this single query accumulates 1.78 hours of elapsed time per peak hour (2,000 × 3.2s ≈ 6,400s; executions overlap across teller sessions).

Query 2: Account Summary (executed ~3,500 times/hour)

SELECT a.account_id, a.account_number, a.account_type,
       a.current_balance, a.status,
       (SELECT MAX(t.txn_date) FROM TRANSACTIONS t
        WHERE t.account_id = a.account_id) AS last_activity,
       (SELECT COUNT(*) FROM HOLDS h
        WHERE h.account_id = a.account_id AND h.status = 'A') AS active_holds
FROM   ACCOUNTS a
WHERE  a.customer_id = :customer_id;

Measured: Average 1.8 seconds per execution. The correlated subquery on TRANSACTIONS is the bottleneck (50M row table, no index on account_id + txn_date for MAX).

Query 3: Recent Transactions (executed ~4,000 times/hour)

SELECT t.txn_id, t.txn_date, t.txn_type, t.txn_amount,
       t.description, t.running_balance
FROM   TRANSACTIONS t
WHERE  t.account_id = :account_id
  AND  t.txn_date >= CURRENT DATE - 30 DAYS
ORDER BY t.txn_date DESC, t.txn_id DESC;

Measured: Average 0.8 seconds. The query is sargable but returns ALL transactions from the last 30 days — some active accounts have 500+ transactions. The application only displays the first 25.

Query 4: Balance Inquiry with Hold Calculation (executed ~5,000 times/hour)

SELECT a.account_number, a.account_type, a.current_balance,
       a.current_balance - COALESCE(
           (SELECT SUM(h.hold_amount) FROM HOLDS h
            WHERE h.account_id = a.account_id AND h.status = 'A'), 0
       ) AS available_balance
FROM   ACCOUNTS a
WHERE  a.account_number = :account_number;

Measured: Average 0.4 seconds. Not terrible individually, but at 5,000 executions/hour, the correlated subquery overhead accumulates.

Query 5: Duplicate Transaction Check (executed ~1,500 times/hour)

SELECT COUNT(*)
FROM   TRANSACTIONS t
WHERE  t.account_id = :account_id
  AND  t.txn_amount = :amount
  AND  t.txn_type = :type
  AND  t.txn_date = CURRENT DATE
  AND  HOUR(t.txn_timestamp) = HOUR(CURRENT TIMESTAMP)
  AND  MINUTE(t.txn_timestamp) BETWEEN MINUTE(CURRENT TIMESTAMP) - 5
                                    AND MINUTE(CURRENT TIMESTAMP);

Measured: Average 1.1 seconds. The HOUR() and MINUTE() functions on txn_timestamp are non-sargable, and the MINUTE() range silently misses duplicates that straddle an hour boundary (e.g., a 10:58 transaction checked at 11:02). This check runs before every teller transaction.
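A small Python sketch (timestamps are hypothetical examples) contrasts the two predicates at an hour boundary:

```python
# Contrast the HOUR()/MINUTE() predicate with a plain timestamp range
# at an hour boundary. Timestamps here are hypothetical examples.
from datetime import datetime, timedelta

def hour_minute_match(txn_ts: datetime, now: datetime) -> bool:
    """Mimics: HOUR(txn_ts) = HOUR(now)
       AND MINUTE(txn_ts) BETWEEN MINUTE(now) - 5 AND MINUTE(now)."""
    return txn_ts.hour == now.hour and now.minute - 5 <= txn_ts.minute <= now.minute

def range_match(txn_ts: datetime, now: datetime) -> bool:
    """Mimics the sargable rewrite: txn_ts BETWEEN now - 5 MINUTES AND now."""
    return now - timedelta(minutes=5) <= txn_ts <= now

now = datetime(2024, 3, 1, 11, 2, 0)   # teller submits at 11:02
dup = datetime(2024, 3, 1, 10, 58, 0)  # same transaction posted at 10:58

print(hour_minute_match(dup, now))  # False -- hour changed, duplicate missed
print(range_match(dup, now))        # True  -- range check catches it
```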

Your Task

Part 1: Prioritize

Rank the five queries by impact (frequency × average time). Which query should be tuned first?

Part 2: Rewrite Each Query

For each of the five queries, provide:

1. A rewritten version that addresses all identified problems
2. Supporting index recommendations
3. Expected improvement in average response time

Part 3: Quick Wins vs Long-Term Fixes

Categorize your changes into:

- Quick wins: changes that can be deployed today (SQL rewrites, index additions)
- Medium-term: changes requiring application code modifications
- Long-term: changes requiring schema modifications or architectural changes

Part 4: Capacity Planning

After tuning, estimate the new peak-hour CPU profile. If the current server is at 85% CPU during peak, what will the new utilization be?

Solution Outline

Impact Ranking

Query                 Freq/hr   Avg Time   CPU-Hours/Hr   Rank
Q1: Customer Lookup     2,000       3.2s           1.78      1
Q2: Account Summary     3,500       1.8s           1.75      2
Q3: Recent Txns         4,000       0.8s           0.89      3
Q4: Balance Inquiry     5,000       0.4s           0.56      4
Q5: Duplicate Check     1,500       1.1s           0.46      5
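The ranking follows directly from frequency × average time; a quick sketch of the arithmetic, using the measurements above:

```python
# Impact = executions/hour x average seconds, expressed in CPU-hours per clock hour.
queries = {
    "Q1: Customer Lookup": (2000, 3.2),
    "Q2: Account Summary": (3500, 1.8),
    "Q3: Recent Txns":     (4000, 0.8),
    "Q4: Balance Inquiry": (5000, 0.4),
    "Q5: Duplicate Check": (1500, 1.1),
}
impact = {name: freq * secs / 3600 for name, (freq, secs) in queries.items()}
for rank, (name, hrs) in enumerate(sorted(impact.items(), key=lambda kv: -kv[1]), 1):
    print(f"{rank}. {name}: {hrs:.2f} CPU-hours/hr")  # Q1 tops the list at 1.78
```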

Query 1 Rewrite: Customer Lookup

-- Change 1: Trailing wildcard only (tellers must search by name prefix, e.g. 'SMITH', not a fragment like 'MITH')
-- Change 2: Remove UPPER() by standardizing customer_name to upper case at insert time
-- Change 3: Add OPTIMIZE FOR so the optimizer chooses a first-rows plan
CREATE INDEX IX_CUST_NAME ON CUSTOMERS(customer_name);

SELECT c.customer_id, c.customer_name, c.phone, c.email,
       c.address_line1, c.city, c.state, c.zip_code
FROM   CUSTOMERS c
WHERE  c.customer_name LIKE :search_term || '%'
ORDER BY c.customer_name
FETCH FIRST 20 ROWS ONLY
OPTIMIZE FOR 20 ROWS;
-- Expected: 3.2s → 0.05s

If substring search is truly required, implement a text search index as a medium-term project.

Query 2 Rewrite: Account Summary

-- Replace correlated subqueries with LEFT JOINs
CREATE INDEX IX_TXN_ACCT_DATE ON TRANSACTIONS(account_id, txn_date DESC);
CREATE INDEX IX_HOLDS_ACCT_STATUS ON HOLDS(account_id, status, hold_amount);
-- hold_amount as a trailing key column (DB2 allows INCLUDE only on unique indexes)

SELECT a.account_id, a.account_number, a.account_type,
       a.current_balance, a.status,
       la.last_activity,
       COALESCE(hc.active_holds, 0) AS active_holds
FROM   ACCOUNTS a
LEFT JOIN (
    SELECT account_id, MAX(txn_date) AS last_activity
    FROM   TRANSACTIONS
    GROUP BY account_id
) la ON la.account_id = a.account_id
LEFT JOIN (
    SELECT account_id, COUNT(*) AS active_holds
    FROM   HOLDS WHERE status = 'A'
    GROUP BY account_id
) hc ON hc.account_id = a.account_id
WHERE  a.customer_id = :customer_id;
-- Expected: 1.8s → 0.1s

The derived-table form aggregates the full 50M-row TRANSACTIONS table before joining, so verify with EXPLAIN that the optimizer pushes the customer_id predicate into the derived tables. If it does not, use a correlated LATERAL join or a denormalized last-activity column maintained by the application:

-- Even faster: lateral join reads only one row per account
SELECT a.account_id, a.account_number, a.account_type,
       a.current_balance, a.status,
       la.last_activity,
       COALESCE(hc.active_holds, 0) AS active_holds
FROM   ACCOUNTS a
LEFT JOIN LATERAL (
    SELECT MAX(t.txn_date) AS last_activity
    FROM   TRANSACTIONS t
    WHERE  t.account_id = a.account_id
) la ON TRUE
LEFT JOIN LATERAL (
    SELECT COUNT(*) AS active_holds
    FROM   HOLDS h
    WHERE  h.account_id = a.account_id AND h.status = 'A'
) hc ON TRUE
WHERE  a.customer_id = :customer_id;

Query 3 Rewrite: Recent Transactions

-- Add FETCH FIRST to match what the application actually displays
SELECT t.txn_id, t.txn_date, t.txn_type, t.txn_amount,
       t.description, t.running_balance
FROM   TRANSACTIONS t
WHERE  t.account_id = :account_id
  AND  t.txn_date >= CURRENT DATE - 30 DAYS
ORDER BY t.txn_date DESC, t.txn_id DESC
FETCH FIRST 25 ROWS ONLY
OPTIMIZE FOR 25 ROWS;

-- Supporting index (descending for sort avoidance):
CREATE INDEX IX_TXN_ACCT_DATE_DESC ON TRANSACTIONS(account_id, txn_date DESC, txn_id DESC);
-- Expected: 0.8s → 0.03s

Query 4 Rewrite: Balance Inquiry

-- Pre-aggregate holds or use a maintained column
-- Quick win: add index and restructure
CREATE INDEX IX_HOLDS_ACCT_STATUS ON HOLDS(account_id, status, hold_amount);
-- trailing hold_amount key column enables index-only access for the SUM

SELECT a.account_number, a.account_type, a.current_balance,
       a.current_balance - COALESCE(h.total_holds, 0) AS available_balance
FROM   ACCOUNTS a
LEFT JOIN (
    SELECT account_id, SUM(hold_amount) AS total_holds
    FROM   HOLDS WHERE status = 'A'
    GROUP BY account_id
) h ON h.account_id = a.account_id
WHERE  a.account_number = :account_number;
-- Expected: 0.4s → 0.02s

Query 5 Rewrite: Duplicate Transaction Check

-- Replace HOUR/MINUTE functions with a sargable timestamp range.
-- The application supplies :window_start = current time minus 5 minutes
-- (it could equally be written inline as CURRENT TIMESTAMP - 5 MINUTES).
SELECT COUNT(*)
FROM   TRANSACTIONS t
WHERE  t.account_id = :account_id
  AND  t.txn_amount = :amount
  AND  t.txn_type = :type
  AND  t.txn_date = CURRENT DATE
  AND  t.txn_timestamp >= :window_start
  AND  t.txn_timestamp <= CURRENT TIMESTAMP;

-- Supporting index:
CREATE INDEX IX_TXN_DUP_CHECK
    ON TRANSACTIONS(account_id, txn_date, txn_type, txn_amount, txn_timestamp);
-- Expected: 1.1s → 0.01s
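A minimal sketch of the application-side window calculation the comment above assumes (function and variable names are hypothetical):

```python
# Application-side calculation of the duplicate-check window.
# Function and variable names are hypothetical illustrations.
from datetime import datetime, timedelta

def duplicate_check_window(now: datetime) -> tuple[datetime, datetime]:
    """Returns (window_start, window_end) for the 5-minute range
    bound to :window_start and compared against CURRENT TIMESTAMP."""
    return now - timedelta(minutes=5), now

start, end = duplicate_check_window(datetime(2024, 3, 1, 11, 2, 0))
print(start)  # 2024-03-01 10:57:00
```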

Projected Impact

Query   Before    After   Freq/hr   CPU Saved/hr
Q1        3.2s    0.05s     2,000       1.75 hrs
Q2        1.8s    0.10s     3,500       1.65 hrs
Q3        0.8s    0.03s     4,000       0.86 hrs
Q4        0.4s    0.02s     5,000       0.53 hrs
Q5        1.1s    0.01s     1,500       0.45 hrs
Total                                   5.24 hrs saved

If the server runs at 85% CPU during peak and these five queries account for 78% of the teller application's CPU, the tuning removes roughly 96% of their cost (5.24 of 5.44 CPU-hours per hour) — about a 75% reduction in teller CPU overall. Assuming the teller application drives roughly 60% of total server CPU, overall utilization would fall from 85% to approximately 45-50%, providing ample headroom for growth.
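A sketch of the capacity arithmetic; the 60% teller share of total server CPU is an assumption for illustration, not a measured figure:

```python
# Capacity arithmetic. Assumption (not measured): the teller app drives
# ~60% of total server CPU. All other figures come from the tables above.
before = {"Q1": (2000, 3.2), "Q2": (3500, 1.8), "Q3": (4000, 0.8),
          "Q4": (5000, 0.4), "Q5": (1500, 1.1)}
after = {"Q1": 0.05, "Q2": 0.10, "Q3": 0.03, "Q4": 0.02, "Q5": 0.01}

cpu_before = sum(f * s for f, s in before.values()) / 3600            # ~5.44 hrs/hr
cpu_after = sum(f * after[q] for q, (f, _) in before.items()) / 3600  # ~0.19 hrs/hr
pct_saved = 1 - cpu_after / cpu_before                                # ~96% of these queries

teller_share, top5_share, peak_util = 0.60, 0.78, 0.85
new_util = peak_util * (1 - teller_share * top5_share * pct_saved)
print(f"saved {cpu_before - cpu_after:.2f} CPU-hrs/hr; new utilization ~ {new_util:.0%}")
```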