Case Study: GlobalBank — Optimizing the Nightly Batch from 4 Hours to 90 Minutes

The Crisis

On a Friday night in October 2025, GlobalBank's nightly batch cycle overran its window for the first time. The cycle — scheduled to complete by 5:00 AM — didn't finish until 5:47 AM. Online banking was unavailable for 47 minutes during a period when early-rising customers on the East Coast were checking balances and initiating transfers.

The incident made the local news. The CIO called an emergency meeting on Monday.

The Data

Maria Chen pulled historical batch completion data:

Month      Batch Duration   Change vs. Prior Quarter
Jan 2024   2h 50m
Apr 2024   3h 05m           +15m
Jul 2024   3h 22m           +17m
Oct 2024   3h 35m           +13m
Jan 2025   3h 48m           +13m
Apr 2025   3h 55m           +7m
Jul 2025   4h 02m           +7m
Oct 2025   4h 47m           +45m (overrun!)

The trend was clear: batch duration was growing at approximately 4-5 minutes per month, driven by account growth (12% year-over-year) and new regulatory feeds. The October spike was caused by a quarter-end regulatory report that processed additional data.

Any optimization would also have to leave headroom: the batch would need to absorb 15-20% annual data growth for the foreseeable future.

The Profiling Phase

Maria ran Strobe profiling on each job in the batch cycle over five consecutive nights (Monday-Friday) to get a representative sample:

BAL-CALC (72 minutes average)

The interest calculation program was the single longest job. Strobe revealed:

- 36.9% of CPU in 3110-COMPOUND-DAILY, the hot paragraph, called 2.3 million times
- All arithmetic fields were DISPLAY format: 9 bytes per field instead of 5 (COMP-3)
- Daily rate was recomputed inside the main loop for every record, despite being constant for each account type
- VSAM block size was 4,096: adequate but not optimal

TXN-POST (55 minutes average)

The transaction posting program read a sequential transaction file and updated the VSAM account master. Strobe showed almost no CPU utilization: the program was I/O bound, spending 95% of its time waiting for VSAM random reads.

Investigation revealed that the transaction file was not sorted by account key. Each transaction required a random VSAM read. With 800,000 transactions against 2.3 million accounts, most reads were cache misses.

RPT-DAILY (45 minutes average)

The daily report job spent 70% of its time in a DFSORT step. The sort processed 5 million records. Only one SORTWK DD was defined (the JCL had been copied from a template that predated sort optimization guidelines).
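The single-SORTWK finding suggests the shape of a fix. The fragment below is a minimal sketch, not the actual RPT-DAILY JCL: the step name, data set names, space amounts, and sort key are hypothetical placeholders. It shows only the idea of giving DFSORT several work data sets, so that intermediate sort work is spread across them instead of funneled through one.

```jcl
//* Sketch only: names, sizes, and the sort key are hypothetical.
//RPTSORT  EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=PROD.RPT.EXTRACT,DISP=SHR
//SORTOUT  DD DSN=&&RPTSRT,DISP=(NEW,PASS),
//            UNIT=SYSDA,SPACE=(CYL,(200,50),RLSE)
//* Several work data sets instead of one.
//SORTWK01 DD UNIT=SYSDA,SPACE=(CYL,(100))
//SORTWK02 DD UNIT=SYSDA,SPACE=(CYL,(100))
//SORTWK03 DD UNIT=SYSDA,SPACE=(CYL,(100))
//SORTWK04 DD UNIT=SYSDA,SPACE=(CYL,(100))
//SYSIN    DD *
  SORT FIELDS=(1,10,CH,A)
  OPTION MAINSIZE=MAX
/*
```

The right number and size of work data sets depends on the input volume and the site's DASD configuration, so the values here should be treated as illustrative.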

REG-FEED (38 minutes average)

The regulatory feed job wrote a 50-million-record file to sequential output. Maria noticed the block size was 800 bytes — the same as the logical record length. Records were completely unblocked. Every WRITE was a separate I/O operation.

"Someone set BLKSIZE equal to LRECL," Maria said. "That means every single record is its own block. Fifty million I/Os instead of 1.25 million at 40 records per block."

The Optimization Plan

Maria prioritized optimizations by expected impact:

Optimization             Target Job   Expected Savings   Risk       Effort
Block size fix           REG-FEED     30 min             Very low   1 hour
Sort optimization        RPT-DAILY    25 min             Low        2 hours
Sort + sequential read   TXN-POST     35 min             Low        4 hours
Data type conversion     BAL-CALC     8 min              Medium     8 hours
Loop optimization        BAL-CALC     6 min              Low        2 hours
Compiler options         BAL-CALC     4 min              Low        1 hour
VSAM buffer tuning       BAL-CALC     17 min             Low        1 hour
DB2 query rewrite        ACCT-MAINT   14 min             Medium     6 hours

Total expected savings: 139 minutes, against a savings target of 150 minutes (the reduction needed to bring a 240-minute cycle down to 90 minutes).

Implementation

The One-Line Fix (REG-FEED)

The single most impactful change was one line of JCL:

//* BEFORE:
//OUTPUT   DD DSN=PROD.REG.FEED,DISP=(NEW,CATLG),
//            DCB=(RECFM=FB,LRECL=800,BLKSIZE=800)

//* AFTER:
//OUTPUT   DD DSN=PROD.REG.FEED,DISP=(NEW,CATLG),
//            DCB=(RECFM=FB,LRECL=800,BLKSIZE=32000)

Result: 38 minutes down to 8 minutes. No code changes. No testing required (the output file content was identical, just blocked differently). The downstream consumer read the file with the same JCL since the system handles deblocking automatically.
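A related variation, offered as a sketch rather than as what GlobalBank did: for a new DASD data set, coding BLKSIZE=0 (or omitting BLKSIZE) asks the system to choose a block size suited to the device. This assumes system-determined block size is in effect at the site, and that no program or downstream JCL overrides the block size.

```jcl
//* Variation (assumption, not from the case): system picks BLKSIZE.
//OUTPUT   DD DSN=PROD.REG.FEED,DISP=(NEW,CATLG),
//            DCB=(RECFM=FB,LRECL=800,BLKSIZE=0)
```

For these attributes on a 3390 volume, the system would typically choose 27,200 bytes (34 records per block, half-track blocking). The design appeal is that the JCL no longer hard-codes a device-dependent number, so a copied template cannot carry BLKSIZE=LRECL forward unnoticed.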

The Sort Pre-Step (TXN-POST)

Maria added a sort step before TXN-POST:

//SORT     EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=PROD.TXN.DAILY,DISP=SHR
//SORTOUT  DD DSN=&&SORTED,DISP=(NEW,PASS),
//            SPACE=(CYL,(50,10)),
//            DCB=(RECFM=FB,LRECL=100,BLKSIZE=32000)
//SORTWK01 DD UNIT=SYSDA,SPACE=(CYL,(30))
//SORTWK02 DD UNIT=SYSDA,SPACE=(CYL,(30))
//SORTWK03 DD UNIT=SYSDA,SPACE=(CYL,(30))
//SYSIN    DD *
  SORT FIELDS=(1,10,CH,A)
  OPTION MAINSIZE=MAX
/*

The sort took 3 minutes. But because TXN-POST could now read the VSAM master sequentially (skipping non-matching records) instead of randomly, processing dropped from 55 minutes to 15 minutes. Net savings: 37 minutes.

The BAL-CALC Overhaul

This was the most complex change. Maria converted 23 DISPLAY fields to COMP-3, moved the daily rate computation outside the main loop, changed compiler options to OPTIMIZE(FULL) and NUMPROC(PFD), and increased VSAM buffers.
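Of these changes, the VSAM buffer increase is the one most often made in JCL. The fragment below is a minimal sketch of how that might look using the AMP parameter on the DD statement for the account master. The step, program, DD, and data set names and the buffer counts are hypothetical; the case study does not give the values Maria chose.

```jcl
//* Sketch only: names and buffer counts are hypothetical,
//* not GlobalBank's actual values.
//BALCALC  EXEC PGM=BALCALC
//* BUFND = data buffers, BUFNI = index buffers.
//ACCTMSTR DD DSN=PROD.ACCT.MASTER,DISP=SHR,
//            AMP=('BUFND=30,BUFNI=5')
```

Extra data buffers (BUFND) mainly help sequential processing, while extra index buffers (BUFNI) mainly help keyed random access, so the balance between the two should follow the program's access pattern and be confirmed by measurement.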

The unit test suite from Chapter 34 was essential — every change was verified against the 64-test regression suite before promotion.

Post-Optimization Results

After all optimizations were implemented and tested over a one-week parallel run period:

Job          Before    After    Savings
BAL-CALC     72 min    25 min   65%
TXN-POST     55 min    15 min   73%
RPT-DAILY    45 min    12 min   73%
REG-FEED     38 min    8 min    79%
ACCT-MAINT   22 min    8 min    64%
Other        8 min     7 min    13%
Total        240 min   75 min   69%

The batch cycle now completed by 12:15 AM — nearly five hours before the deadline.

Six-Month Follow-Up

Six months later, with continued 12% annual account growth, the batch cycle had grown to 82 minutes — still well within the 6-hour window. The optimization had bought approximately 8 years of growth headroom at current rates.

Discussion Questions

  1. The REG-FEED block size fix saved 30 minutes with a one-line JCL change. How could this misconfiguration have been prevented? What process would catch similar issues in the future?

  2. Maria prioritized optimizations by expected savings. If she had instead prioritized by effort (easiest first), would the order change? Would the outcome be different?

  3. The sort pre-step for TXN-POST added 3 minutes but saved 40. Under what circumstances would adding a sort step NOT be beneficial?

  4. Maria used NUMPROC(PFD) for BAL-CALC. What testing would you require before allowing this change in production? What would happen if the data contained non-preferred signs?