Case Study 2: Scaling the Batch Window

Background

Regional Savings Bank processes approximately 80,000 transactions per night through a COBOL batch system similar to the one designed in this chapter. The system has a 6-hour batch window (10 PM to 4 AM), and for years the job stream completed in about 3.5 hours, leaving comfortable margin.

Over the past 18 months, the bank has been growing — acquiring two smaller banks and expanding its online banking services. Transaction volume has increased from 80,000 to 210,000 per night. The batch run now takes 5 hours and 40 minutes, leaving only 20 minutes of margin.

"If we have one more month of growth, we're going to blow the batch window," said Operations Manager Rick Torres. "When that happens, the online system can't start on time, and customers can't access their accounts."

The Analysis

The development team — led by senior developer Claudia Voss — was tasked with reducing batch processing time by at least 40% to accommodate projected growth. They began by profiling the existing system.

Current Job Stream:

  1. Sort transactions by transaction time — 15 minutes
  2. Transaction processing (TXN-PROC) — 4 hours 20 minutes
  3. Balance reconciliation — 25 minutes
  4. Audit trail archival — 15 minutes
  5. Backup — 25 minutes

TXN-PROC consumed 76% of the total run time. This was the obvious target.

TXN-PROC Profile:

  - 210,000 READ operations on the transaction file (sequential, fast)
  - 210,000 READ operations on the VSAM master (random, slow)
  - 185,000 REWRITE operations on the VSAM master (random, slow)
  - 210,000 WRITE operations on the audit file (sequential, fast)

The bottleneck was clear: random VSAM I/O. Each random READ or REWRITE required a physical disk access (the VSAM file was 2.1 GB, far too large to cache entirely). With 395,000 random I/O operations at an average access time of 4 milliseconds each, the I/O alone consumed approximately 26 minutes (395,000 × 4 ms ≈ 1,580 seconds). The actual time was much longer because of control interval (CI) and control area (CA) splits caused by insufficient free space, buffer pool contention, and channel-busy conditions during peak I/O.

The Solutions

Claudia's team implemented four optimizations:

1. Sort Input to Match VSAM Key Sequence

The transaction file was already sorted, but by transaction time — not by account number. Changing the sort to account number sequence meant that consecutive transactions for the same account hit the same VSAM CI, which was already in the buffer pool. This single change reduced VSAM I/O time by approximately 35%.
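In practice this is a one-statement change to the sort step's control cards. A sketch using DFSORT (the dataset names, space allocations, and field positions are illustrative; it assumes a 10-byte account number starting in column 1 of the transaction record):

       //SORTTXN  EXEC PGM=SORT
       //SORTIN   DD  DSN=REGIONAL.TXN.DAILY,DISP=SHR
       //SORTOUT  DD  DSN=REGIONAL.TXN.SORTED,
       //             DISP=(NEW,CATLG,DELETE),
       //             SPACE=(CYL,(50,10)),UNIT=SYSDA
       //SYSOUT   DD  SYSOUT=*
       //SYSIN    DD  *
       * SORT BY ACCOUNT NUMBER (WAS: TRANSACTION TIMESTAMP)
         SORT FIELDS=(1,10,CH,A)
       /*

Because the sort step already existed, its elapsed time was essentially unchanged; only the key specification differs.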

2. Batch Multiple Transactions Per Account

Instead of reading the account master, applying one transaction, and rewriting — then reading the same account again for the next transaction — the modified program accumulated all transactions for the same account and applied them in a single read-modify-rewrite cycle. For accounts with 5-10 transactions per day, this reduced VSAM I/O operations dramatically.

      * BEFORE: ONE READ-REWRITE PER TRANSACTION
      * AFTER:  ONE READ, MULTIPLE APPLIES, ONE REWRITE PER ACCOUNT
       2000-PROCESS-ACCOUNT-GROUP.
           MOVE TXN-ACCT-NUMBER TO WS-CURRENT-ACCT
           PERFORM 2100-READ-ACCOUNT-MASTER
      *    2200 POSTS THE CURRENT TRANSACTION AND READS THE NEXT,
      *    SO EACH ITERATION ADVANCES TOWARD THE UNTIL CONDITION
           PERFORM 2200-APPLY-TRANSACTION
               UNTIL TXN-ACCT-NUMBER NOT = WS-CURRENT-ACCT
                  OR WS-END-OF-FILE
           PERFORM 2300-REWRITE-ACCOUNT-MASTER
           .
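For this loop to terminate, 2200-APPLY-TRANSACTION must both post the current transaction and read the next one. A minimal sketch of that paragraph (the transaction-type field, its values, and the balance arithmetic are illustrative, not taken from the bank's actual program; WS-END-OF-FILE is assumed to be a level-88 condition name):

       2200-APPLY-TRANSACTION.
      *    POST THE CURRENT TRANSACTION TO THE MASTER RECORD IN MEMORY
           EVALUATE TXN-TYPE
               WHEN 'D'  ADD TXN-AMOUNT TO ACCT-BALANCE
               WHEN 'W'  SUBTRACT TXN-AMOUNT FROM ACCT-BALANCE
           END-EVALUATE
      *    ADVANCE TO THE NEXT TRANSACTION; AT END SETS THE EOF FLAG
           READ TRANSACTION-FILE
               AT END SET WS-END-OF-FILE TO TRUE
           END-READ
           .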

3. Increase VSAM Buffer Allocation

The original JCL allocated default buffers (2 data buffers, 1 index buffer). Claudia increased these to 20 data buffers and 5 index buffers using the AMP parameter:

//ACCTMSTR DD  DSN=REGIONAL.ACCT.MASTER,DISP=SHR,
//             AMP=('BUFND=20,BUFNI=5')

This change alone reduced elapsed time by 18% by keeping more of the VSAM index and recently accessed data CIs in memory.

4. Reorganize the VSAM File

IDCAMS LISTCAT showed that the VSAM file had 847 CI splits and 12 CA splits — far more than expected. The file had not been reorganized in 8 months. An EXPORT/IMPORT with fresh free space definitions restored optimal performance.
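One common way to script such a reorganization is an IDCAMS REPRO/DELETE/DEFINE/REPRO sequence (an alternative to EXPORT/IMPORT that makes the new free space explicit). The sketch below is illustrative only: the key length, record size, and space values are assumptions, not the bank's actual definitions:

       //REORG    EXEC PGM=IDCAMS
       //SYSPRINT DD  SYSOUT=*
       //SYSIN    DD  *
         /* UNLOAD THE MASTER TO A FLAT BACKUP FILE */
         REPRO INDATASET(REGIONAL.ACCT.MASTER) -
               OUTDATASET(REGIONAL.ACCT.BACKUP)
         /* DELETE AND REDEFINE WITH FRESH FREE SPACE */
         DELETE REGIONAL.ACCT.MASTER CLUSTER
         DEFINE CLUSTER (NAME(REGIONAL.ACCT.MASTER) -
                INDEXED KEYS(10 0) RECORDSIZE(200 200) -
                FREESPACE(20 10) CYLINDERS(300 50))
         /* RELOAD IN KEY SEQUENCE; CI/CA SPLITS DROP TO ZERO */
         REPRO INDATASET(REGIONAL.ACCT.BACKUP) -
               OUTDATASET(REGIONAL.ACCT.MASTER)
       /*

Reloading in key sequence also restores the free space in every CI, which is what prevents splits from recurring until the file fills again.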

Results

  Metric                  Before    After     Improvement
  TXN-PROC elapsed time   4h 20m    1h 45m    60% reduction
  Total batch window      5h 40m    3h 05m    46% reduction
  VSAM I/O operations     395,000   148,000   63% reduction
  Throughput (txn/sec)    13.5      33.3      147% increase

The batch window now had nearly 3 hours of margin — enough to accommodate 18 months of projected growth.

Discussion Questions

  1. Why does sorting the input file to match the VSAM key sequence improve performance so dramatically?
  2. The "batch multiple transactions per account" optimization changes the program's structure significantly. What new error handling challenges does this introduce?
  3. At what point would you recommend migrating from VSAM to DB2 for the account master? What factors would influence this decision?
  4. If the bank continues to grow, eventually even the optimized batch will exceed the window. What architectural changes would you recommend for the long term?
  5. How do the buffer allocation changes interact with other batch jobs that may be running concurrently? What is the risk of allocating too many buffers?