Case Study 1: GlobalBank Batch Transaction Processing Optimization

Background

GlobalBank's nightly batch cycle processes all daily transactions against the master account file. In 2022, the bank's transaction volume had grown from 800,000 to 2.3 million transactions per day, and the batch window was in crisis. The main processing job, TXN-PROC, had grown from a comfortable 45-minute run to a nail-biting 2 hours and 50 minutes — dangerously close to the 3-hour batch window deadline.

Priya Kapoor, GlobalBank's systems architect, was tasked with optimizing TXN-PROC without a full rewrite. "We do not have time for a rewrite," she told the project team. "We need to find performance within the existing structure."

The Original Code

The original TXN-PROC main loop was straightforward but inefficient:

       1300-PROCESS-ALL-TXNS.
           READ TXN-FILE INTO WS-TXN-RECORD
               AT END SET END-OF-FILE TO TRUE
           END-READ
           PERFORM 2000-PROCESS-ONE-TXN
               UNTIL END-OF-FILE
           .

       2000-PROCESS-ONE-TXN.
           MOVE 0 TO WS-ACCT-FOUND
           PERFORM 2100-SEARCH-ACCT-TABLE
               VARYING WS-SEARCH-IDX FROM 1 BY 1
               UNTIL WS-SEARCH-IDX > WS-ACCT-TABLE-SIZE
               OR WS-ACCT-FOUND = 1

           IF WS-ACCT-FOUND = 1
               PERFORM 2200-APPLY-TXN
               PERFORM 2300-UPDATE-RUNNING-TOTALS
               PERFORM 2400-CHECK-ALERTS
           ELSE
               PERFORM 2500-WRITE-REJECT
           END-IF

           READ TXN-FILE INTO WS-TXN-RECORD
               AT END SET END-OF-FILE TO TRUE
           END-READ
           .

       2100-SEARCH-ACCT-TABLE.
           IF WS-ACCT-NUM(WS-SEARCH-IDX) =
              WS-TXN-ACCT-NUM
               MOVE 1 TO WS-ACCT-FOUND
           END-IF
           .

Problems Identified

Priya's analysis revealed three critical issues:

Problem 1: Linear Search on Sorted Data

The account table was loaded in account-number order from the VSAM master file, but the search used a linear scan. With 150,000 accounts, the average search examined 75,000 entries per transaction. For 2.3 million transactions, that was approximately 172 billion comparisons per night.
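The arithmetic behind those figures can be checked in a few lines of Python (account and transaction counts taken from the text):

```python
# Back-of-envelope check of the comparison counts quoted above.
accounts = 150_000
transactions = 2_300_000

avg_linear_comparisons = accounts // 2              # a hit lands mid-table on average
total_per_night = transactions * avg_linear_comparisons

print(avg_linear_comparisons)   # 75000
print(total_per_night)          # 172500000000 -- about 172.5 billion
```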

Problem 2: No Defensive Controls

The loop had no safety counter, no checkpoint logging, and no progress monitoring. On three occasions in the previous year, a corrupted input file had caused the job to loop for the entire batch window before an operator noticed.

Problem 3: Redundant Work Inside the Loop

The 2400-CHECK-ALERTS paragraph recalculated the alert threshold on every iteration, even though the threshold only changed once per day.

The Optimization

Fix 1: Binary Search with Early Exit

Since the account table was sorted, Priya replaced the linear search with a binary search, implemented as an out-of-line PERFORM ... UNTIL loop with manual low/high index manipulation:

       2100-SEARCH-ACCT-TABLE.
           MOVE 1 TO WS-LOW-IDX
           MOVE WS-ACCT-TABLE-SIZE TO WS-HIGH-IDX
           MOVE 0 TO WS-ACCT-FOUND

           PERFORM 2110-BINARY-SEARCH
               UNTIL WS-LOW-IDX > WS-HIGH-IDX
               OR WS-ACCT-FOUND = 1
           .

       2110-BINARY-SEARCH.
           COMPUTE WS-MID-IDX =
               (WS-LOW-IDX + WS-HIGH-IDX) / 2

           EVALUATE TRUE
               WHEN WS-ACCT-NUM(WS-MID-IDX) =
                    WS-TXN-ACCT-NUM
                   MOVE 1 TO WS-ACCT-FOUND
                   MOVE WS-MID-IDX TO WS-SEARCH-IDX
               WHEN WS-ACCT-NUM(WS-MID-IDX) <
                    WS-TXN-ACCT-NUM
                   ADD 1 TO WS-MID-IDX
                       GIVING WS-LOW-IDX
               WHEN WS-ACCT-NUM(WS-MID-IDX) >
                    WS-TXN-ACCT-NUM
                   SUBTRACT 1 FROM WS-MID-IDX
                       GIVING WS-HIGH-IDX
           END-EVALUATE
           .

This reduced the average search from 75,000 comparisons to approximately 17 (log2 of 150,000).
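To see where that figure comes from, here is a Python sketch of the same halving logic as 2110-BINARY-SEARCH (illustrative only, not the production COBOL), counting probes against a sorted 150,000-entry stand-in for the account table:

```python
# Mirrors the low/mid/high logic of 2110-BINARY-SEARCH, counting probes.
def binary_search(table, target):
    low, high = 0, len(table) - 1
    probes = 0
    while low <= high:
        probes += 1
        mid = (low + high) // 2
        if table[mid] == target:
            return mid, probes
        elif table[mid] < target:
            low = mid + 1           # same move as ADD 1 ... GIVING WS-LOW-IDX
        else:
            high = mid - 1          # same move as SUBTRACT 1 ... GIVING WS-HIGH-IDX
    return -1, probes

accounts = list(range(150_000))     # sorted stand-in for the account table
_, probes = binary_search(accounts, 149_999)
print(probes)                       # 18 probes for this worst case; average is ~17
```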

Fix 2: Comprehensive Defensive Controls

       01  WS-DEFENSE.
           05  WS-MAX-RECORDS       PIC 9(09) VALUE 5000000.
           05  WS-RECORDS-READ      PIC 9(09) VALUE 0.
           05  WS-CHECKPOINT-INT    PIC 9(07) VALUE 50000.
           05  WS-LAST-KEY          PIC X(10) VALUE SPACES.
           05  WS-STUCK-COUNT       PIC 9(05) VALUE 0.
           05  WS-STUCK-LIMIT       PIC 9(05) VALUE 500.
           05  WS-ABORT-FLAG        PIC X(01) VALUE 'N'.
           05  WS-ABORT-REASON      PIC X(40) VALUE SPACES.

       1300-PROCESS-ALL-TXNS.
           PERFORM 2050-READ-NEXT-TXN
           PERFORM 2000-PROCESS-ONE-TXN
               UNTIL END-OF-FILE
               OR WS-RECORDS-READ >= WS-MAX-RECORDS
               OR WS-ABORT-FLAG = 'Y'

           PERFORM 1310-VERIFY-COMPLETION
           .

       1310-VERIFY-COMPLETION.
           EVALUATE TRUE
               WHEN END-OF-FILE
                   DISPLAY 'BATCH COMPLETE: '
                       WS-RECORDS-READ ' records'
               WHEN WS-RECORDS-READ >= WS-MAX-RECORDS
                   DISPLAY 'SAFETY LIMIT HIT at '
                       WS-RECORDS-READ
                   MOVE 8 TO RETURN-CODE
               WHEN WS-ABORT-FLAG = 'Y'
                   DISPLAY 'ABORTED: ' WS-ABORT-REASON
                   MOVE 16 TO RETURN-CODE
           END-EVALUATE
           .
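The overall shape of this defense can be sketched in Python. The listing above does not show how WS-LAST-KEY, WS-STUCK-COUNT, and WS-STUCK-LIMIT are used (that logic lives in paragraphs not reproduced here), so this sketch assumes they detect a key that repeats without progress; names and limits mirror the COBOL fields:

```python
# Sketch of the WS-DEFENSE pattern: a read loop guarded by a hard record
# limit, periodic checkpoints, and (assumed) stuck-key detection.
MAX_RECORDS = 5_000_000
CHECKPOINT_INT = 50_000
STUCK_LIMIT = 500

def process_batch(keys, process):
    records_read = 0
    last_key = None
    stuck_count = 0
    for key in keys:
        records_read += 1
        if records_read >= MAX_RECORDS:
            return "SAFETY LIMIT HIT", records_read       # maps to RETURN-CODE 8
        if key == last_key:                               # same key seen again
            stuck_count += 1
            if stuck_count >= STUCK_LIMIT:                # maps to RETURN-CODE 16
                return "ABORTED: stuck on key " + str(key), records_read
        else:
            stuck_count = 0
            last_key = key
        if records_read % CHECKPOINT_INT == 0:
            pass  # checkpoint log line would be written here
        process(key)
    return "BATCH COMPLETE", records_read
```

A corrupted file that keeps yielding the same key now aborts after 500 repeats instead of spinning for the whole batch window.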

Fix 3: Constant Hoisting

Alert threshold calculation was moved outside the loop:

       1200-PRE-LOOP-SETUP.
      *    Calculate constants ONCE before the loop
           COMPUTE WS-ALERT-THRESHOLD =
               WS-DAILY-LIMIT * WS-ALERT-PCT / 100
           MOVE 10000.00 TO WS-CTR-THRESHOLD
           MOVE FUNCTION CURRENT-DATE TO WS-BATCH-DATE
           .
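The payoff of hoisting is easiest to see side by side. In this Python sketch the threshold is computed once before the loop, leaving only a cheap comparison per transaction (the daily-limit and percentage values are illustrative, not from the case study):

```python
# Fix 3 in miniature: the alert threshold depends only on per-day values,
# so compute it once, not once per transaction.
daily_limit = 50_000.00    # illustrative stand-in for WS-DAILY-LIMIT
alert_pct = 80             # illustrative stand-in for WS-ALERT-PCT

alert_threshold = daily_limit * alert_pct / 100   # hoisted: computed once

def check_alerts(txn_amount, running_total):
    # inside the loop, only the comparison remains
    return running_total + txn_amount > alert_threshold
```

Across 2.3 million iterations, removing even one multiply-and-divide per transaction adds up to a measurable saving.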

Results

   Metric                               Before    After     Improvement
   Average search comparisons per TXN   75,000    17        99.98%
   Total batch run time                 2h 50m    42m       75%
   Infinite loop incidents (annual)     3         0         100%
   Time to detect stuck loop            45+ min   < 1 min   98%

The optimization brought the batch well within the 3-hour window, with room for continued transaction volume growth.

Lessons Learned

  1. Algorithm choice matters more than micro-optimization. The binary search improvement alone accounted for 70% of the time savings. No amount of COBOL-level tuning would have compensated for a linear search on 150,000 entries.

  2. Defensive programming is not overhead — it is insurance. The checkpoint and safety counter code added microseconds per iteration but prevented hours-long production incidents.

  3. Loop constants should be computed once. Moving invariant calculations outside the loop is a simple change with measurable impact when multiplied by millions of iterations.

  4. Profile before you optimize. Priya used the IBM Debug Tool's performance profiling to identify that 94% of CPU time was in the search paragraph before making any changes.

Discussion Questions

  1. The binary search assumes the account table is sorted. What defensive check should be added to verify this assumption? What should happen if the table is not sorted?

  2. The safety counter is set to 5,000,000 (more than double the expected volume). How would you determine the right value for this limit? What are the tradeoffs of setting it too high vs. too low?

  3. The checkpoint interval is 50,000 records. What factors should influence this value? How would you adjust it for a system processing 10 million records per night?

  4. Could the search be further optimized by caching recently accessed accounts? What data structure would you use, and what are the tradeoffs?