Case Study 1: CNB's Batch Performance Optimization Project

"We didn't buy a faster mainframe. We learned how to use the one we had."

— Kwame Mensah, Chief Architect, Continental National Bank


The Setup

Continental National Bank's Q4 batch window crisis (Chapter 23) had been resolved through architectural changes — dependency cleanup, parallel stream redesign, and job splitting. The critical path dropped from 420 minutes (the blown-window night) to 310 minutes. The batch window closed at 5:05 AM most nights, leaving 55 minutes of margin. Enough to sleep through the night. Not enough to sleep well.

Rob Calloway was direct in the post-crisis review: "We fixed the architecture. Now we need to fix the programs. Three hundred ten minutes is too long for what these jobs actually do. I've been watching EXCP counters for 17 years, and half these jobs are doing three times the I/O they should be."

Kwame Mensah agreed. He chartered a 12-week performance optimization project with three goals:

  1. Reduce critical path elapsed time by at least 30%
  2. Reduce total batch EXCP count by at least 40%
  3. Reduce GP MSU consumption by at least 20%

Lisa Tran would lead the DB2 optimization workstream. Rob would lead the I/O and scheduling workstream. And a senior COBOL developer named Jerome Washington — quiet, methodical, and the best code-level diagnostician in the shop — would lead the COBOL and SORT optimization workstream.

Phase 1: The Measurement Baseline (Weeks 1-2)

Jerome's first action was to ban all optimization discussions until the measurement baseline was complete. "I've been on three performance projects where people started tuning before they measured," he said. "All three of them made things worse."

The team collected SMF Type 30 data for 14 consecutive overnight runs, covering two full weeks including month-end processing. They supplemented with SMF Type 14/15 (non-VSAM dataset activity), Type 16 (DFSORT), Type 101 (DB2 accounting), and Type 102 (DB2 performance trace).

The baseline revealed patterns that surprised even Rob:

Discovery 1: The BLKSIZE Problem

Dataset                      LRECL   BLKSIZE   Records/Block   % of Half-Track
─────────────────────────────────────────────────────────────────────────
CNB.EOD.TRANS                200     4096      20              14.5%
CNB.EOD.SORTED.TRANS         200     6160      30              21.8%
CNB.EOD.VALID.TRANS          200     200       1               0.7%
CNB.GL.JOURNAL               150     4050      27              14.3%
CNB.STMT.EXTRACT             250     8000      32              28.3%
CNB.REG.FILING               300     4500      15              15.9%
CNB.ACH.OUTPUT               94      4042      43              14.3%

Seven critical-path datasets. Not one with half-track blocking. The worst offender — CNB.EOD.VALID.TRANS with BLKSIZE=200 (one record per block) — had been created in 1998 by a programmer who misunderstood DCB parameters. It had been processing 50 million records with 50 million EXCP every night for 26 years.

Jerome did the math: the optimal half-track BLKSIZE for LRECL=200 is 27,800 (139 records per block). That's a ratio of 139:1 versus the current 1:1. The validation step was issuing 139 times as many I/O operations as necessary.

Rob Calloway stared at the numbers. "Twenty-six years," he said quietly. "Twenty-six years of paying for 139x the I/O because nobody looked."
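Jerome's arithmetic generalizes to any fixed-block dataset. A quick sketch of it (Python used purely for illustration; 27,998 bytes is the standard 3390 half-track capacity, and EXCPs are approximated as one per physical block):

```python
import math

HALF_TRACK = 27998  # usable bytes in half a 3390 track

def optimal_blksize(lrecl):
    """Largest multiple of LRECL that fits in a 3390 half-track."""
    return (HALF_TRACK // lrecl) * lrecl

def excp_estimate(records, lrecl, blksize):
    """Approximate EXCP count: one I/O per physical block."""
    return math.ceil(records / (blksize // lrecl))

# CNB.EOD.VALID.TRANS: 50 million 200-byte records
before = excp_estimate(50_000_000, 200, 200)                  # 1 record/block
after = excp_estimate(50_000_000, 200, optimal_blksize(200))  # 139 records/block
```

Running this reproduces the numbers above: optimal_blksize(200) is 27,800, and the nightly EXCP count falls from 50,000,000 to roughly 360,000.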

Discovery 2: The Buffer Desert

Job Name        Input BUFNO   Output BUFNO   VSAM BUFNI   VSAM BUFND
─────────────────────────────────────────────────────────────────────
CNBEOD-POST     (default=5)   (default=5)    1            2
CNBEOD-VALID    (default=5)   (default=5)    1            2
CNBEOD-STMT     (default=5)   (default=5)    N/A          N/A
CNBEOD-GL03     (default=5)   (default=5)    N/A          N/A
CNBEOD-REG      (default=5)   (default=5)    1            2

Every critical-path job was running with default buffer settings. No BUFNO overrides. No VSAM buffer optimization. The account master VSAM cluster — 5 million records, 3-level index — was being accessed with BUFNI=1, meaning every random lookup required three index I/O operations.
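The cost of index-buffer starvation is easy to quantify. A minimal model (Python for illustration; it assumes the worst case the team observed, where BUFNI=1 forces every index level to be re-read on each random lookup, and it ignores data-CI caching):

```python
INDEX_LEVELS = 3  # account master: 5 million records, 3-level index

def ios_per_random_read(cached_index_levels):
    """Physical I/Os per keyed read: uncached index levels plus one data CI."""
    return (INDEX_LEVELS - cached_index_levels) + 1

starved = ios_per_random_read(0)              # BUFNI=1: 3 index + 1 data = 4
buffered = ios_per_random_read(INDEX_LEVELS)  # full index in buffers: 1 data
```

At millions of random lookups per night, a 4:1 reduction in physical I/O per lookup is a substantial share of a job's elapsed time.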

Discovery 3: The COBOL Sort Bottleneck

Jerome examined the EOD transaction sort (CNBEOD-SORT):

      * In CNBEOD-SORT (compiled 2019, OPT(0)):
       SORT SORT-WORK-FILE
           ON ASCENDING KEY SW-ACCOUNT-NUMBER
           INPUT PROCEDURE IS 1000-FILTER-INPUT
           OUTPUT PROCEDURE IS 2000-ADD-TRAILER.

Three problems:

  1. The INPUT PROCEDURE and OUTPUT PROCEDURE disabled FASTSRT, so every record passed through COBOL instead of DFSORT's optimized path.
  2. The INPUT PROCEDURE merely filtered records (only type 'P' and 'S' transactions), achievable with DFSORT INCLUDE.
  3. The OUTPUT PROCEDURE merely added a single trailer record, achievable with DFSORT OUTFIL TRAILER1.

The entire SORT program — 340 lines of COBOL — existed to do what DFSORT could do in 8 lines of control statements.

Discovery 4: The Compiler Archaeology

Program           Compiled    OPT Level   FASTSRT   NUMPROC
──────────────────────────────────────────────────────────
CNBEOD-POST       2021-03     OPT(0)      NO        NOPFD
CNBEOD-VALID      2019-11     OPT(0)      NO        NOPFD
CNBEOD-INTST      2020-06     OPT(0)      NO        NOPFD
CNBEOD-STMT       2018-04     OPT(0)      NO        NOPFD
CNBEOD-GL03       2022-01     OPT(1)      NO        NOPFD
CNBEOD-BAL        2021-09     OPT(0)      NO        NOPFD
CNBEOD-RECON      2020-02     OPT(0)      NO        NOPFD

Nine of ten critical-path programs compiled with OPT(0). No FASTSRT anywhere. NUMPROC(NOPFD) — the most conservative packed-decimal handling — on every program, even though all packed decimal fields in the CNB system have valid signs.

Phase 2: The Low-Hanging Fruit (Weeks 3-5)

BLKSIZE Remediation

Jerome created new dataset definitions with optimal BLKSIZE for all seven critical-path datasets. The change was purely JCL and SMS — no COBOL modifications required:

//* BEFORE:
//VALIDOUT DD DSN=CNB.EOD.VALID.TRANS,DISP=(NEW,CATLG),
//            SPACE=(CYL,(200,20)),
//            DCB=(RECFM=FB,LRECL=200,BLKSIZE=200)
//*
//* AFTER:
//VALIDOUT DD DSN=CNB.EOD.VALID.TRANS,DISP=(NEW,CATLG),
//            SPACE=(CYL,(200,20)),
//            DCB=(RECFM=FB,LRECL=200,BLKSIZE=27800)

Results after the first night:

Dataset                  EXCP Before    EXCP After     Reduction
────────────────────────────────────────────────────────────────
CNB.EOD.TRANS            2,500,000      360,000        85.6%
CNB.EOD.SORTED.TRANS     1,667,000      360,000        78.4%
CNB.EOD.VALID.TRANS      50,000,000     360,000        99.3%
CNB.GL.JOURNAL           741,000        107,000        85.6%
CNB.STMT.EXTRACT         7,813,000      1,799,000      77.0%
CNB.REG.FILING           3,333,000      567,000        83.0%
CNB.ACH.OUTPUT           11,628,000     1,700,000      85.4%

The validation dataset went from 50 million EXCP to 360,000. A 99.3% reduction. Jerome ran the numbers twice to be sure.

Total EXCP saved from BLKSIZE alone: 71.6 million per night. Critical path reduction: 18 minutes.

Buffer Optimization

Jerome added BUFNO overrides to all critical-path JCL and VSAM buffer specifications:

//TRANFILE DD DSN=CNB.EOD.TRANS,DISP=SHR,BUFNO=25
//ACCTMSTR DD DSN=CNB.VSAM.ACCOUNTS,DISP=SHR,
//            AMP=('BUFND=20,BUFNI=12400')

The VSAM full-index buffering for the account master was the biggest single win. The posting job (CNBEOD-POST) went from 43 minutes to 34 minutes — nine minutes saved from buffer changes alone.

Compiler Recompilation

All ten critical-path programs were recompiled with the CNB production compiler option set:

OPT(2), FASTSRT, NOSEQCHK, NUMPROC(PFD), TRUNC(OPT), SSRANGE, RENT, RMODE(ANY)

Three of the programs contained SORT verbs that qualified for FASTSRT once recompiled. Combined CPU savings: 14 minutes per night.

Phase 3: DFSORT Replacement (Weeks 6-8)

Jerome identified four COBOL programs whose entire purpose was sort, merge, or reformat operations:

Program          Lines   Function                         DFSORT Replacement
─────────────────────────────────────────────────────────────────────────────
CNBEOD-SORT      340     Sort + filter + trailer          8 control stmt lines
CNBEOD-MERGE     520     Merge 3 sorted files + reformat  12 control stmt lines
CNBEOD-RPTFMT    890     Read sorted, reformat for report 6 control stmt lines
CNBEOD-DEDUP     280     Sort + eliminate duplicates       4 control stmt lines

Total: 2,030 lines of COBOL replaced by 30 lines of DFSORT control statements.

The SORT replacement was the most impactful. The COBOL SORT with INPUT/OUTPUT PROCEDURE ran in 5 minutes. The DFSORT replacement:

  SORT FIELDS=(1,10,CH,A)
  INCLUDE COND=((85,1,CH,EQ,C'P',OR,85,1,CH,EQ,C'S'),AND,
                (100,8,PD,GT,+0))
  OUTFIL TRAILER1=(C' RECORD COUNT: ',
                   COUNT=(M10,LENGTH=10))
  OPTION MAINSIZE=MAX,HIPRMAX=OPTIMAL

Elapsed time: 2 minutes 18 seconds. A 54% reduction from the COBOL version.
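For readers who don't speak DFSORT, the filter-and-trailer logic is compact enough to restate. A sketch (Python for illustration, operating on fixed-width byte records; column 85 is the 1-based transaction-type position from the control statements, and the packed-decimal amount test is omitted for brevity):

```python
def filter_with_trailer(records):
    """Keep type 'P' and 'S' records, then append a count trailer
    (the INCLUDE + OUTFIL TRAILER1 logic, minus the sort itself)."""
    kept = [r for r in records if r[84:85] in (b'P', b'S')]  # column 85, 1-based
    kept.append(b' RECORD COUNT: %10d' % len(kept))          # TRAILER1 equivalent
    return kept

# four 200-byte dummy records with type bytes P, X, S, P
recs = [b' ' * 84 + t + b' ' * 115 for t in (b'P', b'X', b'S', b'P')]
out = filter_with_trailer(recs)  # three kept records plus one trailer
```

This is essentially all the 340-line COBOL program did, which is why DFSORT could absorb it so cheaply.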

Phase 4: DB2 Optimization (Weeks 9-10)

Lisa Tran's workstream focused on the three DB2-bound critical-path jobs: CNBEOD-BAL (balance calculation), CNBEOD-INTST (interest accrual), and CNBEOD-RECON (reconciliation).

Commit Frequency Adjustment

All three programs were committing after every record. Lisa changed them to commit every 5,000 records, cutting commits per run from 5,000,000 to 1,000:

Job              Commits Before  Commits After  DB2 Wait Before   DB2 Wait After
──────────────────────────────────────────────────────────────────────────────────
CNBEOD-BAL       5,000,000       1,000          11:12             6:08
CNBEOD-INTST     5,000,000       1,000          15:30             9:44
CNBEOD-RECON     5,000,000       1,000          6:54              4:18

Total DB2 wait time saved: 13 minutes 26 seconds.
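The change itself is the standard counter-driven commit loop. A generic sketch (Python for illustration, with stand-in apply/commit callbacks; the real programs are COBOL issuing EXEC SQL COMMIT):

```python
def process_with_commit_interval(records, apply, commit, interval=5000):
    """Apply each record; commit every `interval` records, plus a
    final commit for any partial batch at end-of-file."""
    pending = 0
    commits = 0
    for rec in records:
        apply(rec)
        pending += 1
        if pending == interval:
            commit()
            commits += 1
            pending = 0
    if pending:  # harden the tail batch
        commit()
        commits += 1
    return commits

# 5,000,000 records at a 5,000-record interval -> 1,000 commits per run
commits = process_with_commit_interval(range(5_000_000),
                                       lambda r: None, lambda: None)
```

The tail-batch commit matters: without it, the last partial interval's updates would remain uncommitted until the thread ends.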

Cursor Optimization

Lisa added FOR FETCH ONLY to 14 read-only cursors and verified in EXPLAIN output that sequential prefetch was active for all batch access paths. Two cursors were missing suitable indexes; she added them and saw list prefetch engage, converting scattered random I/O into near-sequential patterns.

BIND with DEGREE(ANY)

Re-binding the three DB2-bound batch plans with DEGREE(ANY) enabled partition-level parallelism for the balance calculation (the ACCOUNTS table is partitioned by account number range into 10 partitions). CNBEOD-BAL dropped from 15:33 to 10:22.

Phase 5: Advanced Techniques (Weeks 11-12)

Hiperbatch Deployment

Kwame Mensah championed Hiperbatch for the five most-read batch datasets. The DLF configuration was straightforward:

OBJECT(CNB.EOD.TRANS)         CONNECT(YES) RLSQ(YES)
OBJECT(CNB.EOD.SORTED.TRANS)  CONNECT(YES) RLSQ(YES)
OBJECT(CNB.EOD.VALID.TRANS)   CONNECT(YES) RLSQ(YES)
OBJECT(CNB.ACCT.MASTER.FLAT)  CONNECT(YES) RLSQ(YES)
OBJECT(CNB.RATE.TABLES)       CONNECT(YES) RLSQ(YES)

The transaction extract file (CNB.EOD.TRANS) was read by six jobs. Hiperbatch reduced the total EXCP for those six reads from 2.16 million (6 × 360,000) to 360,000 (one physical read + five cache reads). Across all five datasets, Hiperbatch eliminated 4.2 million EXCP per night.
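The Hiperbatch arithmetic follows directly: N sequential readers of the same dataset cost one physical pass instead of N. A sketch (Python for illustration, using the figures above):

```python
def shared_read_excp(readers, blocks):
    """EXCPs for `readers` full sequential passes over `blocks` physical
    blocks: without sharing vs. with a shared cached copy (the first
    reader does real I/O, the rest hit the cache)."""
    return readers * blocks, blocks

without_dlf, with_dlf = shared_read_excp(readers=6, blocks=360_000)
```

That is 2,160,000 EXCPs without DLF versus 360,000 with it: the 1.8 million saved on CNB.EOD.TRANS alone.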

zIIP Offload

Working with the systems programming team, Kwame enabled zIIP offload for DB2 batch processing. The DB2 subsystem parameter MAX_ZIIP_OFFLOAD was set to 100 (full offload).

The impact on GP MSU was immediate: 31 minutes of CPU time per night shifted from GP to zIIP. At CNB's MSU rate, this translated to a $14,000/month reduction in software licensing costs.

The Results

After 12 weeks, the batch performance dashboard told the story:

CNB BATCH PERFORMANCE PROJECT — FINAL RESULTS
═══════════════════════════════════════════════════════

                        BASELINE    WEEK 12     CHANGE
────────────────────────────────────────────────────────
Critical path (min):    310         188         -39.4%
Total elapsed (min):    375         228         -39.2%
Total EXCP (millions):  48.2        18.7        -61.2%
GP CPU time (min):      142         89          -37.3%
zIIP CPU time (min):    0           31          (new)
Window margin (min):    30→55*      162         +194%
Monthly MSU:            3,840       2,590       -32.6%
Monthly savings:        —           $14,000/mo  (zIIP offload)

* Margin was 30 min before Ch 23 arch changes, 55 after, 162 after Ch 26.

BREAKDOWN BY OPTIMIZATION LAYER:
────────────────────────────────────────────────────────
BLKSIZE remediation:           -18 min elapsed, -71.6M EXCP
Buffer optimization:           -22 min elapsed, -15.3M EXCP
Compiler recompilation:        -14 min CPU
DFSORT replacement:            -8 min elapsed, -1.8M EXCP
DB2 optimization:              -18 min elapsed (DB2 wait)
Hiperbatch:                    -4.2M EXCP, -6 min I/O wait
zIIP offload:                  -31 min GP CPU (→ zIIP)

The Aftermath

Three months after the project completed, Rob Calloway presented the results to the CIO — the same CIO who had received the 5:47 AM phone call during the Q4 crisis.

"We went from a batch window that was blowing at 375 minutes to one that finishes in 228 minutes with 162 minutes of margin," Rob said. "That margin gives us 12-15 months of volume growth before we need to re-optimize. And we did it without buying a single piece of hardware."

The CIO's question was pointed: "Why wasn't this done before the crisis?"

Kwame answered: "Because nobody measured. The batch window worked, so nobody looked inside it. We had BLKSIZE settings from 1998 that were costing us 50 million unnecessary I/O operations every night. We had programs compiled with default optimization for 26 years. We had every buffer at default. It all worked — it just worked 40% slower than it should have."

Jerome Washington, characteristically quiet, added the technical coda: "The platform was never the bottleneck. The configuration was. A 3390 doesn't care if you read 200 bytes per I/O or 27,800 bytes per I/O — it does the I/O either way. The difference is whether you ask it to do the I/O 50 million times or 360,000 times."

The project established a permanent batch performance review process at CNB. Every critical-path job now has a quarterly performance review. Every new batch program must pass a performance checklist before production deployment. And every JCL template in the shop includes optimal BLKSIZE and BUFNO settings.

Marcus Whitfield at Federal Benefits, hearing about the results through Sandra Chen's industry network, started his own performance baseline. He found similar patterns — default BLKSIZE, default buffers, OPT(0) everywhere. "If CNB was leaving 40% on the table," Marcus said, "I guarantee we're leaving 50%."

He was right. His initial baseline showed 52% improvement potential. But that's a story for his retirement — and Sandra's continuation.


Lessons Learned

  1. Measure before you touch anything. Two weeks of SMF collection prevented three months of misdirected effort.

  2. The biggest wins are the simplest changes. BLKSIZE correction and buffer allocation — zero code changes — delivered more improvement than all the COBOL and DB2 work combined.

  3. Defaults are not decisions. Every default setting is a non-decision by a programmer who didn't know or didn't care. In batch performance, defaults are almost always wrong.

  4. Compound optimization. Each layer of improvement enables the next. Better BLKSIZE means fewer EXCP means buffers are more effective means Hiperbatch caches more efficiently. The layers multiply.

  5. Performance is maintenance. The CNB batch window degraded over 26 years because nobody maintained the performance configuration. Performance isn't a one-time project — it's a discipline.

  6. The platform is not the problem. z/OS, properly configured, is the highest-throughput batch processing platform ever built. When performance is bad, the configuration is bad.