Case Study 1: Continental National Bank's U4038 Cascade and LE Governance Overhaul

Background

Continental National Bank's CICS environment runs 30 production regions across four LPARs. These regions serve online banking, teller operations, ATM processing, and wire transfers — processing approximately 500 million transactions per day. The COBOL programs in these regions have been compiled over a span of twelve years, using Enterprise COBOL versions from V4.1 through V6.3.

In September 2022, IBM released a PTF (Program Temporary Fix) for Language Environment, which CNB's systems programming team later applied during a routine maintenance window. The PTF delivered a performance improvement in LE's condition handling, a change that 99% of programs would never notice. But it also tightened a compatibility check for programs compiled with Enterprise COBOL V4.x.

Nobody knew that 17 production CICS programs had been compiled with Enterprise COBOL V4.x (14 with V4.2 and 3 with V4.1) and never recompiled.

The Incident

November 7, 2022 — Monday

The z/OS maintenance was applied on Sunday night. Monday morning operations began normally.

  • 08:14 AM — CICS AOR CNBAO03 on CNBPROD1: Transaction XVAL (account validation) abends with U4038. The CEEDUMP shows:
      CEE3501S LE enclave initialization failed.
        Reason: The load module CNBVAL50 was created with a version of
        the compiler that is not compatible with the current Language
        Environment runtime.
        Module compiled with: Enterprise COBOL V4.2 (LE Compatibility Level 11)
        Current LE runtime:   z/OS 2.5 LE V2.5 (Minimum Compatibility Level 14)
  • 08:14-08:16 — 43 additional U4038 abends in CNBAO03 as users retry the failed transaction.
  • 08:17 — CICSPlex SM detects the failure rate and routes XVAL transactions to other AORs. But CNBVAL50 is installed in all AORs — the failures follow.
  • 08:19 — All 8 AORs that serve XVAL are experiencing U4038 abends. The XVAL transaction is effectively down.
  • 08:22 — Kwame Mensah receives the alert. He's already reading the first CEEDUMP.

Kwame's Diagnosis

The CEEDUMP was unambiguous. Kwame's analysis took less than four minutes:

  1. The CEEDUMP says "Enterprise COBOL V4.2, LE Compatibility Level 11." The installed LE requires minimum Compatibility Level 14. The PTF applied Sunday night tightened this requirement from Level 10 to Level 14.

  2. CNBVAL50 was compiled in 2011. The compile listing (retrieved from the source management system) showed the COBOL compiler version: Enterprise COBOL for z/OS V4.2.

  3. The fix: Recompile CNBVAL50 with the current compiler (Enterprise COBOL V6.3). No source changes needed.

  • 08:26 — Kwame instructed the build team to recompile CNBVAL50 with Enterprise COBOL V6.3.
  • 08:34 — Recompilation complete. New load module promoted to production.
  • 08:36 — NEWCOPY issued in all AORs.
  • 08:38 — XVAL transactions processing normally.

Total outage for XVAL: 24 minutes.
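Kwame's first diagnostic step reduces to a single comparison. The sketch below is a hypothetical model, assuming the compatibility levels behave exactly as the CEEDUMP describes them: each module carries one integer level, and LE rejects any module whose level is below the runtime minimum. The name `le_accepts` is illustrative only; LE exposes no such function, and the real check runs inside enclave initialization.

```python
# Hypothetical model of the LE compatibility check reported in the CEEDUMP.
# Levels 10, 11, and 14 come from the incident; the function is illustrative.

MIN_LEVEL_BEFORE_PTF = 10  # minimum enforced before Sunday's maintenance
MIN_LEVEL_AFTER_PTF = 14   # minimum enforced once the PTF was applied
CNBVAL50_LEVEL = 11        # Enterprise COBOL V4.2, per the CEEDUMP


def le_accepts(module_level: int, runtime_minimum: int) -> bool:
    """Return True if a module at module_level meets the runtime's minimum."""
    return module_level >= runtime_minimum


if __name__ == "__main__":
    print("before PTF:", le_accepts(CNBVAL50_LEVEL, MIN_LEVEL_BEFORE_PTF))  # True
    print("after PTF:", le_accepts(CNBVAL50_LEVEL, MIN_LEVEL_AFTER_PTF))    # False
```

Nothing about CNBVAL50 changed overnight. Only the threshold moved, and a module that had passed every check since 2011 fell below it.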

The Cascade

But CNBVAL50 was not the only V4.2-compiled program. Over the next three hours:

Time   Program    Transaction              Abend   Compiler
09:15  CNBRTE20   XRTE (routing)           U4038   V4.2
09:47  CNBACC15   XACC (account access)    U4038   V4.2
10:05  CNBFEE10   XFEE (fee calculation)   U4038   V4.2
10:22  CNBLMT30   XLMT (limit checking)    U4038   V4.2
10:55  CNBINT05   XINT (interest calc)     U4038   V4.1 (older still)

Each failure required the same fix: recompile with V6.3, promote, and issue NEWCOPY. But each one still caused 5 to 20 minutes of transaction outage, roughly 50 minutes in total across the five programs.

By 11:30 AM, Kwame realized the reactive approach was unsustainable. He issued a directive: "Identify every load module in every production CICS region compiled with a compiler version below V5.1. Recompile all of them. Today."

The Audit

Lisa Tran's team ran a utility that extracted the compiler version stamp from every load module in the production CICS load libraries. The results:

Compiler Version          Production Modules   Oldest Compile Date
Enterprise COBOL V6.3     287                  2021-03-15
Enterprise COBOL V6.2     124                  2019-07-22
Enterprise COBOL V6.1     68                   2018-01-10
Enterprise COBOL V5.2     45                   2016-09-05
Enterprise COBOL V5.1     31                   2015-04-18
Enterprise COBOL V4.2     14                   2011-02-14
Enterprise COBOL V4.1     3                    2009-11-30
All versions              572
Total at risk (V4.x)      17

Seventeen production load modules compiled with V4.x — all at risk of U4038 under the new LE PTF.

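Six had already failed and been recompiled: CNBVAL50 plus the five in the cascade. The remaining eleven had not yet been invoked since the maintenance window. The team recompiled those eleven on Monday afternoon and evening, deploying them via NEWCOPY during low-volume hours, so that all 17 were running V6.3-compiled code by the end of the day.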

No further U4038 abends occurred.
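The triage in the audit table can be reproduced from its counts. This is a minimal sketch, assuming each module's compiler version has already been extracted as a string such as "V4.2"; the names `AUDIT`, `parse_version`, and `count_below` are hypothetical. Versions are compared as (major, minor) integer tuples rather than as strings, so a two-digit major version would never sort incorrectly.

```python
# Sketch of the audit triage, using the counts from the audit table.
# Versions are (major, minor) tuples so they compare numerically.

AUDIT = {
    (6, 3): 287,
    (6, 2): 124,
    (6, 1): 68,
    (5, 2): 45,
    (5, 1): 31,
    (4, 2): 14,
    (4, 1): 3,
}


def parse_version(text: str) -> tuple[int, int]:
    """Turn 'V4.2' or '4.2' into (4, 2)."""
    major, minor = text.upper().lstrip("V").split(".")
    return int(major), int(minor)


def count_below(audit: dict[tuple[int, int], int], cutoff: str) -> int:
    """Count modules compiled with a version strictly below cutoff."""
    limit = parse_version(cutoff)
    return sum(n for version, n in audit.items() if version < limit)


if __name__ == "__main__":
    total = sum(AUDIT.values())
    at_risk = count_below(AUDIT, "V5.1")                 # the V4.x modules
    next_at_risk = count_below(AUDIT, "V6.1") - at_risk  # V5.1 and V5.2
    print(total, at_risk, next_at_risk)                  # 572 17 76
```

Moving the cutoff from V5.1 to V6.1 yields the 76 V5.1/V5.2 modules that CNB later scheduled as next-at-risk.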

Root Cause Analysis

Why Were 17 Programs Never Recompiled?

The RCA meeting identified three contributing factors:

1. No mandatory recompilation policy. CNB had no policy requiring periodic recompilation of production load modules. The prevailing philosophy was "if it works, don't touch it." Programs compiled in 2009 and 2011 had been running successfully for over a decade. Nobody questioned whether they would continue to work after LE upgrades.

2. Source management disconnect. The source code for all 17 programs existed in the source management system (Endevor). The compile listings were archived. But there was no automated check that connected the production load library to the source management system to verify compiler version currency.

3. PTF review process gap. The systems programming team reviewed the LE PTF's documentation before applying it. The documentation mentioned the tightened compatibility level, but described it as affecting "programs compiled with very old compilers." The team interpreted "very old" as VS COBOL II era — not Enterprise COBOL V4. They did not check which compiler versions were in use in production.

The Real Root Cause

Kwame's assessment in the RCA report was blunt: "The root cause is not the PTF. The root cause is that we had production software compiled twelve years ago and no process to ensure it remained compatible with the evolving runtime environment. The PTF exposed the gap; it didn't create it."

Remediation

Short-Term: Emergency Recompilation

All 17 V4.x programs recompiled with V6.3 on the day of the incident. An additional 76 V5.1/V5.2 programs identified as next-at-risk were scheduled for recompilation within 30 days.

Medium-Term: Annual Recompilation Policy

Kwame established a new policy:

CNB Architecture Standard ARC-2022-11: Annual Load Module Recompilation

SCOPE: All COBOL production load modules in batch and CICS environments.

REQUIREMENT: Every production COBOL load module must be recompiled
with the current Enterprise COBOL compiler version at least once per
calendar year, even if the source code has not changed.

SCHEDULE: January of each year. All load modules recompiled, tested
in QA, and promoted to production by January 31.

EXCEPTION: Programs with no available source code are exempt but must
be documented on the LE Compatibility Risk Register and tested against
each z/OS/LE upgrade before production cutover.

RATIONALE: Annual recompilation ensures LE compatibility, captures
compiler optimizer improvements, and eliminates the risk of compiler-
version-related abends during z/OS maintenance.

Long-Term: Automated Compiler Version Monitoring

Lisa Tran's team implemented an automated check:

  1. Weekly scan: A batch job scans all production CICS load libraries and extracts the compiler version stamp from every load module.
  2. Age check: Any module compiled more than 14 months ago (giving a 2-month buffer past the annual recompilation deadline) is flagged.
  3. Alert: Flagged modules generate a report sent to the application team and Kwame's architecture team.
  4. z/OS upgrade pre-check: Before any z/OS maintenance that includes LE PTFs, the scan runs and identifies every module compiled with a compiler version below the PTF's minimum compatibility level.

The scan is a simple REXX program that reads the load module's header (the PDS directory entry and the module's eye-catcher area) to extract the Enterprise COBOL version stamp.
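Steps 2 and 4 of the monitoring job amount to two filters over the scan results. The sketch below illustrates that logic in Python; it is not the REXX program itself. It assumes the scan has already produced one record per module with a compiler version and a compile date, and that the PTF's minimum compatibility level has been translated into a minimum compiler version. `Module`, `add_months`, `stale`, `below_minimum`, and the module names are all hypothetical.

```python
# Hypothetical sketch of the weekly scan's age check (step 2) and
# z/OS upgrade pre-check (step 4). Records here are illustrative.
import calendar
from dataclasses import dataclass
from datetime import date

MAX_AGE_MONTHS = 14  # annual cycle plus a 2-month buffer


@dataclass
class Module:
    name: str
    compiler: tuple[int, int]  # e.g. (6, 3) for Enterprise COBOL V6.3
    compiled: date


def add_months(d: date, months: int) -> date:
    """Shift d forward by whole months, clamping the day (Jan 31 + 1 month -> Feb 28 or 29)."""
    index = d.month - 1 + months
    year, month = d.year + index // 12, index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def stale(modules: list[Module], today: date) -> list[str]:
    """Step 2: names of modules compiled more than MAX_AGE_MONTHS before today."""
    return [m.name for m in modules
            if add_months(m.compiled, MAX_AGE_MONTHS) < today]


def below_minimum(modules: list[Module], minimum: tuple[int, int]) -> list[str]:
    """Step 4: names of modules built with a compiler older than minimum."""
    return [m.name for m in modules if m.compiler < minimum]


if __name__ == "__main__":
    # Hypothetical module names and dates, for illustration only.
    inventory = [
        Module("MODOLD", (4, 2), date(2011, 2, 14)),
        Module("MODV52", (5, 2), date(2023, 1, 20)),
        Module("MODNEW", (6, 3), date(2024, 1, 10)),
    ]
    print(stale(inventory, date(2024, 3, 1)))  # ['MODOLD']
    print(below_minimum(inventory, (5, 1)))    # ['MODOLD']
```

Adding 14 calendar months to the compile date, rather than approximating with 30-day months, keeps the threshold aligned with the month-based wording of the policy.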

Financial and Operational Impact

Impact                                         Details
Transaction outage (XVAL)                      24 minutes
Transaction outages (5 subsequent programs)    5-20 minutes each, ~50 minutes cumulative
Total user-facing impact                       ~75 minutes of partial service degradation
Emergency recompilation effort                 ~8.5 person-hours (17 programs × ~30 min each, including testing)
Revenue impact                                 $95,000 (failed wire transfers, delayed teller transactions, ATM fallback fees)
Annual recompilation effort (ongoing)          ~40 person-hours per year (all ~570 load modules)
Automated monitoring (one-time)                ~16 person-hours to develop and test

Kwame's assessment: "Forty hours per year to recompile everything. Sixteen hours to build the monitoring. Compare that to $95,000 in one Monday morning. The ROI is immediate."
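The figures in the impact table and Kwame's comparison can be checked with a few lines of arithmetic. Every input comes from the case study. The break-even labor rate is a derived illustration, not a figure CNB reported.

```python
# Arithmetic check of the impact table and the ROI remark.
# Inputs come from the case study; outputs are derived, not measured.
from datetime import datetime

first_abend = datetime(2022, 11, 7, 8, 14)    # first U4038 in CNBAO03
xval_restored = datetime(2022, 11, 7, 8, 38)  # XVAL processing normally
xval_outage_min = int((xval_restored - first_abend).total_seconds() // 60)

cascade_min = 50                               # ~50 min across 5 later programs
total_impact_min = xval_outage_min + cascade_min

emergency_hours = 17 * 30 / 60                 # 17 programs at ~30 min each
first_year_hours = 40 + 16                     # annual recompile + monitoring build
revenue_impact = 95_000                        # dollars, one Monday

# Labor rate at which the first year of prevention would cost as much as
# this single incident (ignoring the emergency effort it would also avoid).
break_even_rate = revenue_impact / first_year_hours

print(xval_outage_min, total_impact_min, emergency_hours)    # 24 74 8.5
print(f"break-even rate: ${break_even_rate:,.0f} per hour")  # break-even rate: $1,696 per hour
```

Put differently, staff time would have to cost roughly $1,700 an hour before the first year of prevention cost more than this one incident.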

Connections to LE Concepts

This incident illustrates several concepts from the chapter:

  1. LE initialization sequence (Section 3.2): The U4038 occurred at step 2a — LE version verification. This is one of the first things LE does, before any COBOL code executes. The failure was instantaneous and affected every invocation.

  2. Enclave model (Section 3.3): Each CICS task created a new enclave, and each enclave failed at initialization. With 44 failures in about two minutes (the first abend plus 43 retries), CNBAO03 was flooded with failed enclaves, each briefly consuming DSA storage for its CEEDUMP before being cleaned up.

  3. CEEDUMP analysis (Section 3.6): The CEEDUMP contained the exact information needed for diagnosis: compiler version, LE compatibility level, and the mismatch. Kwame diagnosed the issue in under four minutes because he could read the CEEDUMP.

  4. LE configuration governance (Section 3.7): The lack of a recompilation policy and of compiler version monitoring allowed the problem to accumulate for more than a decade. The new ARC-2022-11 standard is designed to prevent a recurrence.

Discussion Questions

  1. CNB's annual recompilation costs about 40 person-hours. Some organizations resist mandatory recompilation because "it introduces risk: recompiling stable code might introduce bugs." Evaluate this argument. Is the risk of recompilation greater or less than the risk of LE incompatibility?

  2. Three of the 17 at-risk programs were compiled with Enterprise COBOL V4.1, which is no longer supported by IBM. What additional risks exist beyond LE compatibility when running unsupported compiler versions?

  3. The systems programming team read the PTF documentation but did not connect "very old compilers" to the V4.x programs in production. What process improvement would have caught this? Design a pre-PTF checklist for LE maintenance.

  4. Sandra Chen at Federal Benefits Administration has 23 programs compiled with VS COBOL II (which predates Enterprise COBOL entirely). If FBA applied this same LE PTF, what would happen? How does her situation differ from CNB's?

  5. The automated compiler version scan runs weekly. Could it be event-driven instead (triggered by each NEWCOPY or load module promotion)? What are the trade-offs between scheduled scanning and event-driven scanning?