Case Study 1: GlobalBank Batch Transaction Processing Optimization
Background
GlobalBank's nightly batch cycle processes all daily transactions against the master account file. By 2022, the bank's transaction volume had grown from 800,000 to 2.3 million transactions per day, and the batch window was in crisis. The main processing job, TXN-PROC, had grown from a comfortable 45-minute run to a nail-biting 2 hours and 50 minutes, dangerously close to the 3-hour batch window deadline.
Priya Kapoor, GlobalBank's systems architect, was tasked with optimizing TXN-PROC without a full rewrite. "We do not have time for a rewrite," she told the project team. "We need to find performance within the existing structure."
The Original Code
The original TXN-PROC main loop was straightforward but inefficient:
```cobol
       1300-PROCESS-ALL-TXNS.
           READ TXN-FILE INTO WS-TXN-RECORD
               AT END SET END-OF-FILE TO TRUE
           END-READ
           PERFORM 2000-PROCESS-ONE-TXN
               UNTIL END-OF-FILE
           .

       2000-PROCESS-ONE-TXN.
           MOVE 0 TO WS-ACCT-FOUND
           PERFORM 2100-SEARCH-ACCT-TABLE
               VARYING WS-SEARCH-IDX FROM 1 BY 1
               UNTIL WS-SEARCH-IDX > WS-ACCT-TABLE-SIZE
                  OR WS-ACCT-FOUND = 1
           IF WS-ACCT-FOUND = 1
               PERFORM 2200-APPLY-TXN
               PERFORM 2300-UPDATE-RUNNING-TOTALS
               PERFORM 2400-CHECK-ALERTS
           ELSE
               PERFORM 2500-WRITE-REJECT
           END-IF
           READ TXN-FILE INTO WS-TXN-RECORD
               AT END SET END-OF-FILE TO TRUE
           END-READ
           .

       2100-SEARCH-ACCT-TABLE.
           IF WS-ACCT-NUM(WS-SEARCH-IDX) =
              WS-TXN-ACCT-NUM
               MOVE 1 TO WS-ACCT-FOUND
           END-IF
           .
```
Problems Identified
Priya's analysis revealed three critical issues:
Problem 1: Linear Search on Sorted Data
The account table was loaded in account-number order from the VSAM master file, but the search used a linear scan. With 150,000 accounts, a successful search examined 75,000 entries on average (half the table). Across 2.3 million transactions, that came to roughly 172 billion comparisons per night (75,000 × 2.3 million).
Problem 2: No Defensive Controls
The loop had no safety counter, no checkpoint logging, and no progress monitoring. On three occasions in the previous year, a corrupted input file had caused the job to loop for the entire batch window before an operator noticed.
Problem 3: Redundant Work Inside the Loop
The 2400-CHECK-ALERTS paragraph recalculated the alert threshold on every iteration, even though the threshold only changed once per day.
The Optimization
Fix 1: Binary Search with Early Exit
Since the account table was sorted, Priya replaced the linear search with a binary search, using PERFORM ... UNTIL and manual index manipulation. The loop exits as soon as a match is found:
```cobol
       2100-SEARCH-ACCT-TABLE.
           MOVE 1 TO WS-LOW-IDX
           MOVE WS-ACCT-TABLE-SIZE TO WS-HIGH-IDX
           MOVE 0 TO WS-ACCT-FOUND
           PERFORM 2110-BINARY-SEARCH
               UNTIL WS-LOW-IDX > WS-HIGH-IDX
                  OR WS-ACCT-FOUND = 1
           .

       2110-BINARY-SEARCH.
      * Midpoint of the current range; an integer WS-MID-IDX
      * drops any fraction.
           COMPUTE WS-MID-IDX =
               (WS-LOW-IDX + WS-HIGH-IDX) / 2
           EVALUATE TRUE
               WHEN WS-ACCT-NUM(WS-MID-IDX) =
                    WS-TXN-ACCT-NUM
                   MOVE 1 TO WS-ACCT-FOUND
                   MOVE WS-MID-IDX TO WS-SEARCH-IDX
      * Target is higher: discard the lower half
               WHEN WS-ACCT-NUM(WS-MID-IDX) <
                    WS-TXN-ACCT-NUM
                   ADD 1 TO WS-MID-IDX
                       GIVING WS-LOW-IDX
      * Target is lower: discard the upper half
               WHEN WS-ACCT-NUM(WS-MID-IDX) >
                    WS-TXN-ACCT-NUM
                   SUBTRACT 1 FROM WS-MID-IDX
                       GIVING WS-HIGH-IDX
           END-EVALUATE
           .
```
This reduced the average search from 75,000 comparisons to approximately 17 probes of the table, since log2(150,000) ≈ 17.2.
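Because the table is sorted, COBOL's built-in SEARCH ALL verb is another way to express the same lookup without hand-managed low and high indexes; IBM's Enterprise COBOL documentation describes SEARCH ALL as a binary search. The sketch below is not the code Priya used. It assumes the table can be declared with ASCENDING KEY and INDEXED BY, that account numbers are PIC X(10), and it uses a five-entry table in place of the VSAM load. ACCT-IDX, WS-FOUND-POS, and the program name are hypothetical.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. SRCHDEMO.
      * Sketch only: table contents and field sizes are assumed.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-ACCT-TABLE-SIZE      PIC 9(06) VALUE 5.
       01  WS-TXN-ACCT-NUM         PIC X(10) VALUE '0000000030'.
       01  WS-ACCT-FOUND           PIC 9(01) VALUE 0.
       01  WS-FOUND-POS            PIC 9(06) VALUE 0.
      * DEPENDING ON limits the search to the loaded entries
       01  WS-ACCT-TABLE.
           05  WS-ACCT-ENTRY OCCURS 1 TO 150000 TIMES
                   DEPENDING ON WS-ACCT-TABLE-SIZE
                   ASCENDING KEY IS WS-ACCT-NUM
                   INDEXED BY ACCT-IDX.
               10  WS-ACCT-NUM     PIC X(10).

       PROCEDURE DIVISION.
       0000-MAIN.
      * Load five sorted keys (stand-in for the master file load)
           MOVE '0000000010' TO WS-ACCT-NUM(1)
           MOVE '0000000020' TO WS-ACCT-NUM(2)
           MOVE '0000000030' TO WS-ACCT-NUM(3)
           MOVE '0000000040' TO WS-ACCT-NUM(4)
           MOVE '0000000050' TO WS-ACCT-NUM(5)
           PERFORM 2100-SEARCH-ACCT-TABLE
           IF WS-ACCT-FOUND = 1
               DISPLAY 'FOUND AT ENTRY ' WS-FOUND-POS
           ELSE
               DISPLAY 'NOT FOUND'
           END-IF
           STOP RUN.

       2100-SEARCH-ACCT-TABLE.
           MOVE 0 TO WS-ACCT-FOUND
           SEARCH ALL WS-ACCT-ENTRY
               AT END
                   MOVE 0 TO WS-ACCT-FOUND
               WHEN WS-ACCT-NUM(ACCT-IDX) = WS-TXN-ACCT-NUM
                   MOVE 1 TO WS-ACCT-FOUND
      * Convert the index to an occurrence number
                   SET WS-FOUND-POS TO ACCT-IDX
           END-SEARCH
           .
```

With the sample key shown, the program should report a match at the third entry. Like the hand-written version, SEARCH ALL gives undefined results if the table is not actually in key order, so the sorted-load assumption still has to hold.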
Fix 2: Comprehensive Defensive Controls
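Priya added a record-count ceiling, a checkpoint interval, and fields for detecting a stuck key, then tightened the loop's exit conditions and reported why the loop ended:

```cobol
       01  WS-DEFENSE.
           05  WS-MAX-RECORDS      PIC 9(09) VALUE 5000000.
           05  WS-RECORDS-READ     PIC 9(09) VALUE 0.
           05  WS-CHECKPOINT-INT   PIC 9(07) VALUE 50000.
           05  WS-LAST-KEY         PIC X(10) VALUE SPACES.
           05  WS-STUCK-COUNT      PIC 9(05) VALUE 0.
           05  WS-STUCK-LIMIT      PIC 9(05) VALUE 500.
      * Assumed definitions: the loop below uses these two fields,
      * but the original listing does not show them.
           05  WS-ABORT-FLAG       PIC X(01) VALUE 'N'.
           05  WS-ABORT-REASON     PIC X(40) VALUE SPACES.
```

```cobol
       1300-PROCESS-ALL-TXNS.
           PERFORM 2050-READ-NEXT-TXN
           PERFORM 2000-PROCESS-ONE-TXN
               UNTIL END-OF-FILE
                  OR WS-RECORDS-READ >= WS-MAX-RECORDS
                  OR WS-ABORT-FLAG = 'Y'
           PERFORM 1310-VERIFY-COMPLETION
           .

       1310-VERIFY-COMPLETION.
           EVALUATE TRUE
               WHEN END-OF-FILE
                   DISPLAY 'BATCH COMPLETE: '
                           WS-RECORDS-READ ' records'
               WHEN WS-RECORDS-READ >= WS-MAX-RECORDS
                   DISPLAY 'SAFETY LIMIT HIT at '
                           WS-RECORDS-READ
                   MOVE 8 TO RETURN-CODE
               WHEN WS-ABORT-FLAG = 'Y'
                   DISPLAY 'ABORTED: ' WS-ABORT-REASON
                   MOVE 16 TO RETURN-CODE
           END-EVALUATE
           .
```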
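The listing above calls 2050-READ-NEXT-TXN and declares WS-LAST-KEY, WS-STUCK-COUNT, and WS-CHECKPOINT-INT without showing how they are used. One plausible reading, sketched below, is that the read paragraph counts records, logs a checkpoint at each interval, and sets the abort flag when the same key comes back WS-STUCK-LIMIT times in a row. Everything here beyond the field names in WS-DEFENSE is an assumption: the in-memory WS-FAKE-TABLE stands in for TXN-FILE so the program runs on its own, WS-TXN-KEY is assumed to be a unique transaction key, the 2055, 2060, and 2070 paragraphs are hypothetical, and the limits are shrunk so the demo trips the guard.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. READDEMO.
      * Sketch only. An in-memory table stands in for TXN-FILE and
      * the limits are shrunk so this small demo trips the guard.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-DEFENSE.
           05  WS-MAX-RECORDS      PIC 9(09) VALUE 100.
           05  WS-RECORDS-READ     PIC 9(09) VALUE 0.
           05  WS-CHECKPOINT-INT   PIC 9(07) VALUE 2.
           05  WS-LAST-KEY         PIC X(10) VALUE SPACES.
           05  WS-STUCK-COUNT      PIC 9(05) VALUE 0.
           05  WS-STUCK-LIMIT      PIC 9(05) VALUE 3.
      * Hypothetical helper fields, not shown in the case study
       01  WS-ABORT-FLAG           PIC X(01) VALUE 'N'.
       01  WS-ABORT-REASON         PIC X(40) VALUE SPACES.
       01  WS-EOF-FLAG             PIC X(01) VALUE 'N'.
           88  END-OF-FILE                   VALUE 'Y'.
       01  WS-TXN-KEY              PIC X(10) VALUE SPACES.
       01  WS-CKPT-QUOT            PIC 9(09) VALUE 0.
       01  WS-CKPT-REM             PIC 9(07) VALUE 0.
      * Simulated input: the fourth key repeats (a damaged file)
       01  WS-FAKE-INPUT.
           05  FILLER              PIC X(10) VALUE 'TXN0000001'.
           05  FILLER              PIC X(10) VALUE 'TXN0000002'.
           05  FILLER              PIC X(10) VALUE 'TXN0000003'.
           05  FILLER              PIC X(10) VALUE 'TXN0000004'.
           05  FILLER              PIC X(10) VALUE 'TXN0000004'.
           05  FILLER              PIC X(10) VALUE 'TXN0000004'.
           05  FILLER              PIC X(10) VALUE 'TXN0000004'.
           05  FILLER              PIC X(10) VALUE 'TXN0000005'.
       01  WS-FAKE-TABLE REDEFINES WS-FAKE-INPUT.
           05  WS-FAKE-KEY         PIC X(10) OCCURS 8 TIMES.
       01  WS-FAKE-POS             PIC 9(02) VALUE 0.

       PROCEDURE DIVISION.
       1300-PROCESS-ALL-TXNS.
           PERFORM 2050-READ-NEXT-TXN
           PERFORM 2000-PROCESS-ONE-TXN
               UNTIL END-OF-FILE
                  OR WS-RECORDS-READ >= WS-MAX-RECORDS
                  OR WS-ABORT-FLAG = 'Y'
           IF WS-ABORT-FLAG = 'Y'
               DISPLAY 'ABORTED: ' WS-ABORT-REASON
               MOVE 16 TO RETURN-CODE
           END-IF
           STOP RUN.

       2000-PROCESS-ONE-TXN.
      * Business logic omitted; only the read-ahead is shown
           PERFORM 2050-READ-NEXT-TXN
           .

       2050-READ-NEXT-TXN.
           PERFORM 2055-FETCH-RECORD
           IF NOT END-OF-FILE
               ADD 1 TO WS-RECORDS-READ
               PERFORM 2060-CHECK-STUCK-KEY
               PERFORM 2070-CHECKPOINT-IF-DUE
           END-IF
           .

       2055-FETCH-RECORD.
      * Stand-in for READ TXN-FILE ... AT END SET END-OF-FILE
           ADD 1 TO WS-FAKE-POS
           IF WS-FAKE-POS > 8
               SET END-OF-FILE TO TRUE
           ELSE
               MOVE WS-FAKE-KEY(WS-FAKE-POS) TO WS-TXN-KEY
           END-IF
           .

       2060-CHECK-STUCK-KEY.
      * Count consecutive repeats of the same key
           IF WS-TXN-KEY = WS-LAST-KEY
               ADD 1 TO WS-STUCK-COUNT
           ELSE
               MOVE 0 TO WS-STUCK-COUNT
               MOVE WS-TXN-KEY TO WS-LAST-KEY
           END-IF
           IF WS-STUCK-COUNT >= WS-STUCK-LIMIT
               MOVE 'Y' TO WS-ABORT-FLAG
               MOVE 'SAME KEY READ REPEATEDLY'
                   TO WS-ABORT-REASON
           END-IF
           .

       2070-CHECKPOINT-IF-DUE.
      * Log progress every WS-CHECKPOINT-INT records
           DIVIDE WS-RECORDS-READ BY WS-CHECKPOINT-INT
               GIVING WS-CKPT-QUOT REMAINDER WS-CKPT-REM
           IF WS-CKPT-REM = 0
               DISPLAY 'CHECKPOINT: ' WS-RECORDS-READ
                       ' LAST KEY ' WS-LAST-KEY
           END-IF
           .
```

In this demo the seventh read is the third consecutive repeat of the same key, so the loop should stop there with the abort flag set rather than running to the end of the input. If the key were an account number instead of a unique transaction key, legitimate runs of transactions against one account would look like a stuck file, so the choice of key matters.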
Fix 3: Constant Hoisting
The alert threshold calculation was moved out of the loop into a setup paragraph that runs once per batch:

```cobol
       1200-PRE-LOOP-SETUP.
      * Calculate constants ONCE before the loop
           COMPUTE WS-ALERT-THRESHOLD =
               WS-DAILY-LIMIT * WS-ALERT-PCT / 100
           MOVE 10000.00 TO WS-CTR-THRESHOLD
           MOVE FUNCTION CURRENT-DATE TO WS-BATCH-DATE
           .
```
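The case study does not show what 2400-CHECK-ALERTS looks like after the change. A minimal sketch of the idea, assuming the paragraph only needs to compare each amount against the precomputed threshold, might look like the following. WS-TXN-AMOUNT, WS-ALERT-COUNT, WS-I, the program name, and all of the values are hypothetical; the five simulated amounts stand in for real transactions.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. HOISTDEM.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Illustrative values, not GlobalBank's real limits
       01  WS-DAILY-LIMIT          PIC 9(07)V99 VALUE 50000.00.
       01  WS-ALERT-PCT            PIC 9(03)    VALUE 80.
       01  WS-ALERT-THRESHOLD      PIC 9(07)V99 VALUE 0.
       01  WS-TXN-AMOUNT           PIC 9(07)V99 VALUE 0.
       01  WS-ALERT-COUNT          PIC 9(05)    VALUE 0.
       01  WS-I                    PIC 9(02)    VALUE 0.

       PROCEDURE DIVISION.
       0000-MAIN.
           PERFORM 1200-PRE-LOOP-SETUP
           PERFORM 2400-CHECK-ALERTS
               VARYING WS-I FROM 1 BY 1 UNTIL WS-I > 5
           DISPLAY 'ALERTS RAISED: ' WS-ALERT-COUNT
           STOP RUN.

       1200-PRE-LOOP-SETUP.
      * Invariant: computed once, before any transaction
           COMPUTE WS-ALERT-THRESHOLD =
               WS-DAILY-LIMIT * WS-ALERT-PCT / 100
           .

       2400-CHECK-ALERTS.
      * Simulated amounts: 10000, 20000, ... 50000
           COMPUTE WS-TXN-AMOUNT = WS-I * 10000
      * Per-transaction work is now a single comparison
           IF WS-TXN-AMOUNT > WS-ALERT-THRESHOLD
               ADD 1 TO WS-ALERT-COUNT
           END-IF
           .
```

With these sample values the threshold works out to 40,000, so only the last simulated amount should raise an alert.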
Results
| Metric | Before | After | Improvement |
|---|---|---|---|
| Average search comparisons per TXN | 75,000 | 17 | 99.98% |
| Total batch run time | 2h 50m | 42m | 75% |
| Infinite loop incidents (annual) | 3 | 0 | 100% |
| Time to detect stuck loop | 45+ min | < 1 min | 98% |
The optimization brought the batch well within the 3-hour window, with room for continued transaction volume growth.
Lessons Learned
- Algorithm choice matters more than micro-optimization. The binary search improvement alone accounted for 70% of the time savings. No amount of COBOL-level tuning would have compensated for a linear search on 150,000 entries.
- Defensive programming is not overhead; it is insurance. The checkpoint and safety counter code added microseconds per iteration but prevented hours-long production incidents.
- Loop constants should be computed once. Moving invariant calculations outside the loop is a simple change with measurable impact when multiplied by millions of iterations.
- Profile before you optimize. Priya used the IBM Debug Tool's performance profiling to identify that 94% of CPU time was in the search paragraph before making any changes.
Discussion Questions
- The binary search assumes the account table is sorted. What defensive check should be added to verify this assumption? What should happen if the table is not sorted?
- The safety counter is set to 5,000,000 (more than double the expected volume). How would you determine the right value for this limit? What are the tradeoffs of setting it too high vs. too low?
- The checkpoint interval is 50,000 records. What factors should influence this value? How would you adjust it for a system processing 10 million records per night?
- Could the search be further optimized by caching recently accessed accounts? What data structure would you use, and what are the tradeoffs?