Case Study 1: Batch Data Processing -- End-of-Day at Meridian Bank


Background

Every weekday at 6:00 PM Eastern, Meridian National Bank's mainframe begins the End-of-Day (EOD) batch cycle. The batch window is four hours -- from 6:00 PM to 10:00 PM. During this window, the system must process the day's transactions, calculate balances, update summaries, feed data to downstream systems, and prepare for the next business day. If the batch does not complete by 10:00 PM, the overnight maintenance utilities (REORG, RUNSTATS, backups) are delayed, which in turn can delay the start of the next business day. A missed batch window is a serious operational incident.

Marcus Chen, a mid-level DB2 DBA at Meridian, has been assigned to investigate and optimize the EOD batch. The batch has been growing steadily as the bank adds customers, and last week it exceeded the window for the first time -- completing at 10:47 PM. Management wants it running within three hours to provide a safety margin.


The Current Batch Process

Marcus examines the existing EOD batch, which consists of five major steps:

Step 1: Transaction Posting (Current: 45 minutes)

The day's transactions are moved from the PENDING_TRANSACTIONS table to the TRANSACTION_HISTORY table. Currently implemented as:

-- Current implementation: row-by-row cursor processing
DECLARE post_cursor CURSOR FOR
    SELECT transaction_id, account_id, transaction_type,
           amount, transaction_date, description
    FROM meridian.pending_transactions
    WHERE status = 'VALIDATED'
      AND transaction_date = CURRENT DATE;

OPEN post_cursor;

FETCH post_cursor INTO ...;

WHILE SQLCODE = 0 DO
    INSERT INTO meridian.transaction_history
        (transaction_id, account_id, transaction_type,
         amount, transaction_date, description, posted_timestamp)
    VALUES (..., CURRENT TIMESTAMP);

    UPDATE meridian.pending_transactions
    SET status = 'POSTED'
    WHERE CURRENT OF post_cursor;

    SET v_count = v_count + 1;
    IF MOD(v_count, 100) = 0 THEN
        COMMIT;
    END IF;

    FETCH post_cursor INTO ...;
END WHILE;

COMMIT;

Problems identified:

- Row-by-row cursor processing (the classic "slow-by-slow" anti-pattern)
- Committing every 100 rows adds excessive commit overhead
- No restart capability if the job fails partway through

Step 2: Balance Recalculation (Current: 55 minutes)

Every account's balance is recalculated from the day's posted transactions:

-- Current implementation: correlated UPDATE, all accounts at once
UPDATE meridian.account a
SET current_balance = current_balance + (
    SELECT COALESCE(SUM(amount), 0)
    FROM meridian.transaction_history t
    WHERE t.account_id = a.account_id
      AND t.transaction_date = CURRENT DATE
      AND t.posted_timestamp >= :batch_start_time
);

Problems identified:

- Updates ALL accounts, including those with no transactions today
- Single massive transaction -- risk of lock escalation and log fill
- Correlated subquery executes for every account in the table

Step 3: Daily Summary Generation (Current: 35 minutes)

Generates DAILY_ACCOUNT_SUMMARY records. Currently uses DELETE-then-INSERT instead of MERGE:

-- Current: delete existing summaries, then insert new ones
DELETE FROM meridian.daily_account_summary
WHERE summary_date = CURRENT DATE;

COMMIT;

INSERT INTO meridian.daily_account_summary
    (account_id, summary_date, transaction_count,
     total_credits, total_debits, net_change)
SELECT
    account_id,
    CURRENT DATE,
    COUNT(*),
    SUM(CASE WHEN amount > 0 THEN amount ELSE 0 END),
    SUM(CASE WHEN amount < 0 THEN ABS(amount) ELSE 0 END),
    SUM(amount)
FROM meridian.transaction_history
WHERE transaction_date = CURRENT DATE
GROUP BY account_id;

COMMIT;

Problems identified:

- DELETE-then-INSERT is not atomic -- if the job fails between the DELETE and the INSERT, the summaries are lost
- Full logging on both the DELETE and the INSERT operations
- No handling for accounts that appear in existing summaries but had no transactions today

Step 4: Downstream Feed Generation (Current: 25 minutes)

Exports data for the regulatory reporting system and the analytics warehouse. This step is primarily SELECT-based and performs adequately.

Step 5: Cleanup (Current: 20 minutes)

Archives posted transactions and cleans staging tables. Currently uses individual DELETE statements with small batch sizes.
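The case study does not show the cleanup code, but the batched-DELETE pattern it describes typically looks something like the sketch below (table name from the case study; the batch size and loop framing are assumptions):

-- Assumed shape of the current cleanup: delete a small batch,
-- commit, and repeat until no rows remain (SQLCODE +100)
DELETE FROM (
    SELECT 1
    FROM meridian.daily_feed_staging
    FETCH FIRST 1000 ROWS ONLY
);
COMMIT;
-- ...the job reissues this pair until the DELETE affects zero rows

Each iteration acquires and releases its own locks and writes its own log records, which is why this pattern is safe but slow on large tables.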

Total current time: approximately 3 hours, occasionally exceeding 4 hours under peak volume.


Marcus's Optimization Plan

Optimization 1: Replace Cursor with Set-Based INSERT

Marcus replaces the row-by-row cursor in Step 1 with a set-based INSERT...SELECT:

-- Optimized Step 1: Set-based transaction posting
INSERT INTO meridian.transaction_history
    (transaction_id, account_id, transaction_type,
     amount, transaction_date, description, posted_timestamp)
SELECT
    transaction_id, account_id, transaction_type,
    amount, transaction_date, description, CURRENT TIMESTAMP
FROM meridian.pending_transactions
WHERE status = 'VALIDATED'
  AND transaction_date = CURRENT DATE;

-- Mark as posted in a single UPDATE
UPDATE meridian.pending_transactions
SET status = 'POSTED'
WHERE status = 'VALIDATED'
  AND transaction_date = CURRENT DATE;

COMMIT;

Result: Step 1 drops from 45 minutes to 8 minutes. The set-based approach eliminates cursor overhead, reduces commit frequency (a single commit instead of thousands), and allows DB2 to use its most efficient access paths.

Risk mitigation: Marcus adds a row-count check after each statement. If the INSERT count does not match the UPDATE count, the transaction is rolled back and the job is flagged for investigation.
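In DB2 SQL PL, that check can be sketched with GET DIAGNOSTICS (variable names and the SQLSTATE value here are illustrative, not from Marcus's actual job):

-- Hedged sketch: compare rows inserted vs. rows flagged as POSTED
DECLARE v_inserted BIGINT DEFAULT 0;
DECLARE v_updated  BIGINT DEFAULT 0;

-- (the INSERT...SELECT from the optimized Step 1 runs here)
GET DIAGNOSTICS v_inserted = ROW_COUNT;

-- (the single UPDATE from the optimized Step 1 runs here)
GET DIAGNOSTICS v_updated = ROW_COUNT;

IF v_inserted <> v_updated THEN
    ROLLBACK;
    SIGNAL SQLSTATE '75001'
        SET MESSAGE_TEXT = 'EOD Step 1: posted/updated counts differ';
ELSE
    COMMIT;
END IF;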

Optimization 2: Targeted Balance Update with MERGE

Marcus replaces the correlated UPDATE in Step 2 with a MERGE that only touches accounts with transactions:

-- Optimized Step 2: MERGE-based balance update
MERGE INTO meridian.account AS target
USING (
    SELECT account_id, SUM(amount) AS day_net
    FROM meridian.transaction_history
    WHERE transaction_date = CURRENT DATE
      AND posted_timestamp >= :batch_start_time
    GROUP BY account_id
) AS source
ON target.account_id = source.account_id
WHEN MATCHED THEN
    UPDATE SET
        target.current_balance = target.current_balance + source.day_net,
        target.last_transaction_date = CURRENT DATE;

Result: Step 2 drops from 55 minutes to 12 minutes. The MERGE only updates accounts that had transactions (typically 15-20% of total accounts), and the aggregation in the USING subquery means the join occurs against a much smaller derived table.

Optimization 3: MERGE for Summary Generation

Marcus replaces the DELETE-then-INSERT in Step 3 with a single MERGE:

-- Optimized Step 3: MERGE-based summary generation
MERGE INTO meridian.daily_account_summary AS target
USING (
    SELECT
        account_id,
        CURRENT DATE AS summary_date,
        COUNT(*) AS transaction_count,
        SUM(CASE WHEN amount > 0 THEN amount ELSE 0 END) AS total_credits,
        SUM(CASE WHEN amount < 0 THEN ABS(amount) ELSE 0 END) AS total_debits,
        SUM(amount) AS net_change
    FROM meridian.transaction_history
    WHERE transaction_date = CURRENT DATE
    GROUP BY account_id
) AS source
ON target.account_id = source.account_id
   AND target.summary_date = source.summary_date
WHEN MATCHED THEN
    UPDATE SET
        target.transaction_count = source.transaction_count,
        target.total_credits = source.total_credits,
        target.total_debits = source.total_debits,
        target.net_change = source.net_change,
        target.last_updated = CURRENT TIMESTAMP
WHEN NOT MATCHED THEN
    INSERT (account_id, summary_date, transaction_count,
            total_credits, total_debits, net_change, last_updated)
    VALUES (source.account_id, source.summary_date,
            source.transaction_count, source.total_credits,
            source.total_debits, source.net_change, CURRENT TIMESTAMP);

COMMIT;

Result: Step 3 drops from 35 minutes to 10 minutes. The MERGE is atomic (no window where summaries are missing), generates roughly half the log volume (no DELETE log records), and is idempotent -- running it twice produces the same result.
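One case the MERGE does not cover (noted among the Step 3 problems) is a stale summary row for today with no matching transactions behind it -- for example, after a same-day rerun in which earlier transactions were backed out. If such rows must be removed, a follow-up DELETE along these lines would close the gap (a sketch, assuming the schema shown above):

-- Assumed follow-up: remove today's summary rows that no longer
-- have any posted transactions behind them
DELETE FROM meridian.daily_account_summary AS s
WHERE s.summary_date = CURRENT DATE
  AND NOT EXISTS (
      SELECT 1
      FROM meridian.transaction_history t
      WHERE t.account_id = s.account_id
        AND t.transaction_date = CURRENT DATE
  );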

Optimization 4: TRUNCATE for Staging Cleanup

Marcus replaces the DELETE in Step 5's staging table cleanup with TRUNCATE:

-- Optimized Step 5: TRUNCATE staging tables
TRUNCATE TABLE meridian.pending_transactions IMMEDIATE;
TRUNCATE TABLE meridian.daily_feed_staging IMMEDIATE;

Result: Step 5 drops from 20 minutes to under 1 minute. Instead of logging individual row deletions, TRUNCATE deallocates the table's pages. Note that with IMMEDIATE the truncation takes effect as soon as the statement executes and cannot be rolled back, and the TRUNCATE must be the first statement in its unit of work.


Results Summary

Step                       Before     After      Improvement
1. Transaction Posting     45 min     8 min      82%
2. Balance Recalculation   55 min     12 min     78%
3. Summary Generation      35 min     10 min     71%
4. Downstream Feeds        25 min     25 min     (no change)
5. Cleanup                 20 min     1 min      95%
Total                      180 min    56 min     69%

The batch now completes in under one hour, well within the four-hour window and far below the three-hour target. The optimizations also reduced total log volume by approximately 60%, reduced lock contention (fewer individual lock acquisitions), and added idempotency to the critical MERGE steps.


Lessons Learned

  1. Set-based operations beat cursor processing. The single biggest improvement came from replacing the row-by-row cursor with an INSERT...SELECT. DB2's optimizer can plan a single set-based statement far more efficiently than thousands of individual row operations issued in a loop.

  2. MERGE is the natural tool for batch synchronization. The DELETE-then-INSERT pattern is fragile and wasteful. MERGE handles the same logic atomically and with less logging.

  3. Commit frequency matters, but not the way you might expect. Committing every 100 rows (Step 1's original approach) added more overhead than benefit. The optimized approach uses fewer, larger transactions with strategic commit points.

  4. TRUNCATE exists for a reason. When you need to empty a table and do not need trigger execution or row-level logging, TRUNCATE is orders of magnitude faster than DELETE.

  5. Measure before optimizing. Marcus started by timing each step individually. Without this measurement, he might have spent time optimizing Step 4 (which was already efficient) instead of focusing on the highest-impact steps.


Discussion Questions

  1. The optimized Step 1 processes all transactions in a single INSERT...SELECT. What happens if this set contains 2 million rows? At what point should Marcus add intermediate commits, and how would that change the implementation?

  2. The MERGE in Step 2 uses a subquery in the USING clause that aggregates by account_id. What index on meridian.transaction_history would most benefit this query?

  3. Step 5 uses TRUNCATE, which does not fire triggers. If there were a DELETE trigger on pending_transactions that logged archived records, how would Marcus need to modify his approach?

  4. Marcus's optimizations reduced the batch from 3 hours to 1 hour. What would you monitor going forward to detect if the batch is growing toward the window limit again? What proactive actions would you take?


Return to Chapter 9 | Continue to Case Study 2