In This Chapter
- Introduction: The Modernization Imperative
- Phase 1: Documentation and Understanding
- Phase 2: Refactoring for Modularity
- The Refactoring Process in Detail
- Testing During Refactoring
- Understanding the COBOL Compilation Pipeline
- The DB2 Migration Decision
- Phase 3: Adding DB2
- Phase 4: Exposing as API
- Phase 5: CI/CD Pipeline and Testing
- The Web Service in Detail
- Building the Test Automation Framework
- The Modernization Results
- Lessons from the Trenches
- Common Modernization Anti-Patterns
- Working with the Student Mainframe Lab
- Understanding the Modernization Timeline
- Summary
- Chapter Reflection: What Modernization Really Means
Chapter 44: Capstone 2 — Legacy System Modernization Case Study
"The worst thing you can do to a legacy system is rewrite it from scratch. The second worst thing is leave it exactly as it is." — James Okafor, to the MedClaim modernization steering committee
Introduction: The Modernization Imperative
In Capstone 1, you built a system from scratch. That was the easy part. Most of your career will be spent working with systems that already exist — systems written by people who are no longer with the organization, documented partially or not at all, running on technology that has been patched and extended for decades. These systems work. They process millions of transactions, adjudicate hundreds of thousands of claims, and run the infrastructure of modern commerce. But they are increasingly difficult to maintain, extend, and integrate with modern systems.
This capstone takes you through the complete modernization of a legacy COBOL system. Not a rewrite — a modernization. The distinction matters. A rewrite discards decades of accumulated business logic, introduces new bugs, and takes years to complete. A modernization preserves what works while making the system easier to maintain, extend, and integrate. It is incremental, reversible, and — most importantly — it can deliver value at every phase.
💡 The Modernization Spectrum. There is no single "right" way to modernize a legacy system. The spectrum ranges from "document and stabilize" (least disruptive) to "rewrite in a new language" (most disruptive). Most successful modernizations operate in the middle of this spectrum: refactoring for clarity, migrating data stores, exposing APIs, and automating testing and deployment. This capstone walks through each of these phases.
The System: MedClaim Insurance Processing
MedClaim Health Services processes approximately 500,000 insurance claims per month. Their core processing system, MEDCLAIM-PROC, is approximately 800,000 lines of COBOL running on z/OS with DB2. The system has been in production for 18 years, and it works — claims are processed accurately, providers are paid on time, and regulatory requirements are met.
But the system has problems:
- Only two developers understand it. James Okafor (team lead, 15 years) and one other senior developer maintain the entire system. If both left, the organization would be in crisis.
- No automated tests. Changes are tested manually against a copy of production data. Test cycles take 2-3 days.
- Flat files everywhere. While the core data is in DB2, many interfaces between programs use flat sequential files. These interfaces are fragile and difficult to change.
- No API access. Partner organizations need claim status information, but the only way to get it is through a batch extract that runs nightly. Real-time access does not exist.
- Inconsistent copybooks. Over 18 years, multiple developers have created slightly different versions of the same copybooks. Some programs use the "official" copybook; others use local copies with modifications.
James Okafor has been asked to lead the modernization effort. He has a budget for 12 months of work, a team of three (himself, Sarah Kim as business analyst, and Tomás Rivera as DBA), and a mandate from management: "Make this system sustainable for the next 10 years."
James accepts the assignment with a mixture of excitement and trepidation. He knows this system better than anyone alive — he has been maintaining it for 15 years, fixing bugs at 2 AM, adding features under deadline pressure, and watching the technical debt accumulate. He has wanted to modernize it for years but never had the budget or the organizational support. Now he has both, and the pressure to deliver is real.
⚖️ The Human Factor. Notice that the modernization is driven by a human problem — knowledge concentration in too few people — not a technical problem. The system works fine technically. But a system that only two people can maintain is an organizational risk. This is a pattern you will see repeatedly in your career: the decision to modernize is rarely about technology. It is about people, risk, and sustainability.
Phase 1: Documentation and Understanding
Code Archaeology
Before changing anything, James must understand what exists. This is code archaeology — the disciplined process of reading, documenting, and mapping a system you did not write.
James begins with an inventory. He uses a combination of JCL analysis, COBOL cross-reference listings, and manual review to create a complete picture of the system.
System Inventory:
| Component | Count | Description |
|---|---|---|
| COBOL programs | 47 | Batch and CICS programs |
| Copybooks | 83 | Record layouts and common data |
| JCL procedures | 12 | Cataloged procedures for job streams |
| JCL job streams | 8 | Daily, weekly, monthly, ad-hoc |
| DB2 tables | 23 | Core data store |
| VSAM files | 6 | Work files and lookup tables |
| Sequential files | 31 | Interface files between programs |
| CICS transactions | 5 | Online inquiry and maintenance |
| BMS maps | 5 | Screen definitions |
"Forty-seven programs," Sarah Kim observes. "That's more than I expected."
"Wait until you see how they're connected," James replies. He draws a program dependency diagram on the whiteboard. It looks like a plate of spaghetti.
The Legacy Code Assessment
James categorizes each program into one of four modernization tiers:
Tier 1 — Leave Alone (12 programs): These programs are simple, well-structured, and rarely modified. They work, they are readable, and changing them would add risk without benefit. Examples: date conversion utilities, standard report headers, parameter validation routines.
Tier 2 — Document and Stabilize (15 programs): These programs are complex but stable. They have not been modified in years and rarely cause production issues. They need documentation (comments, flow diagrams, data maps) but not structural changes. Examples: monthly regulatory report generators, year-end processing.
Tier 3 — Refactor (14 programs): These programs are complex, frequently modified, and difficult to understand. They contain duplicated code, inconsistent copybooks, and tangled control flow. They need structural improvement. Examples: claim adjudication, provider payment calculation, eligibility verification.
Tier 4 — Redesign (6 programs): These programs are fundamentally flawed in their current design. They cannot be refactored incrementally — they need to be redesigned and rewritten, one at a time, with the same interfaces so the rest of the system is unaffected. Examples: the claim intake program (which mixes I/O, validation, and business logic in a single 12,000-line program), the batch scheduler.
📊 The 80/20 Rule of Modernization. James estimates that 80% of the modernization value will come from the Tier 3 and Tier 4 programs — which represent only 20 programs out of 47. This is typical. Most legacy systems have a core of critical, complex programs surrounded by a periphery of simpler support programs. Focus your modernization effort on the core.
Documenting the Existing System
For each Tier 2 and Tier 3 program, James creates three documentation artifacts:
1. Program Specification: A one-page summary of what the program does, its inputs, outputs, key business rules, and known issues.
2. Data Flow Diagram: A visual showing which files and tables the program reads and writes, and how data flows through the program's major sections.
3. Business Rule Catalog: A structured list of every business rule embedded in the code. Each rule is tagged with a line number reference, a plain-English description, and a confidence level (High/Medium/Low) indicating how well the rule is understood.
Here is an excerpt from the business rule catalog for CLM-ADJUD (the claim adjudication program):
| Rule ID | Description | Code Location | Confidence |
|---|---|---|---|
| ADJ-001 | Claims over $50,000 require manual review | Lines 1247-1260 | High |
| ADJ-002 | Provider must be in active status | Lines 1305-1318 | High |
| ADJ-003 | Duplicate claim check uses member ID + date + provider + amount | Lines 1402-1456 | Medium |
| ADJ-004 | Emergency room claims bypass pre-authorization check | Lines 1520-1535 | High |
| ADJ-005 | Mental health parity calculation uses 2008 rule table | Lines 1678-1742 | Low |
🔴 Confidence Levels Matter. Rule ADJ-005 has Low confidence because the code references a "2008 rule table" that James cannot find documentation for. The code works — claims are adjudicated correctly — but nobody can explain why the calculation uses the specific factors it does. This is a common pattern in legacy systems: the code embodies institutional knowledge that was never written down anywhere else. If the code is lost, the knowledge is lost.
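To see how a catalog entry maps back to code, consider rule ADJ-003. The duplicate-claim key it describes — member ID + date + provider + amount — might be sketched as a working-storage structure. The field names and sizes below are illustrative; the actual CLM-ADJUD layout is not reproduced in this chapter.

```cobol
      * Illustrative sketch only: the ADJ-003 duplicate-claim key.
      * Field names and sizes are assumptions, not the actual
      * CLM-ADJUD source.
       01 WS-DUPLICATE-KEY.
          05 WS-DUP-MEMBER-ID PIC X(12).
          05 WS-DUP-SVC-DATE PIC 9(08).
          05 WS-DUP-PROV-ID PIC X(10).
          05 WS-DUP-AMOUNT PIC 9(07)V99.
```

Two claims that produce the same key value would be flagged as potential duplicates. Writing the rule out this way is useful during archaeology: it forces you to state exactly which fields participate in the comparison, which is the detail the Medium confidence rating says is only partly understood.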
The Original Legacy Code
Let us examine a representative piece of MedClaim's legacy code — the claim intake program. This is what James's team found when they opened CLM-INTAKE:
IDENTIFICATION DIVISION.
PROGRAM-ID. CLM-INTAKE.
* CLAIM INTAKE - MODIFIED 03/2012 JO
* MODIFIED 07/2014 JO - ADDED TYPE 7 PROCESSING
* MODIFIED 11/2016 RK - FIXED BUG IN DATE CHECK
* MODIFIED 02/2019 JO - NEW PROVIDER FORMAT
* MODIFIED 09/2021 JO - COVID CODES
ENVIRONMENT DIVISION.
INPUT-OUTPUT SECTION.
FILE-CONTROL.
SELECT CLAIM-IN ASSIGN TO CLMIN
FILE STATUS IS WS-FS1.
SELECT CLAIM-OUT ASSIGN TO CLMOUT
FILE STATUS IS WS-FS2.
SELECT ERR-FILE ASSIGN TO ERROUT
FILE STATUS IS WS-FS3.
SELECT PROV-FILE ASSIGN TO PROVFL
ORGANIZATION IS INDEXED
ACCESS IS RANDOM
RECORD KEY IS PROV-ID
FILE STATUS IS WS-FS4.
SELECT MEMB-FILE ASSIGN TO MEMBFL
ORGANIZATION IS INDEXED
ACCESS IS RANDOM
RECORD KEY IS MEMB-ID
FILE STATUS IS WS-FS5.
DATA DIVISION.
FILE SECTION.
FD CLAIM-IN.
01 CLAIM-IN-REC.
05 CI-MEMB-ID PIC X(12).
05 CI-PROV-ID PIC X(10).
05 CI-SVC-DATE PIC 9(08).
05 CI-DIAG-CODE PIC X(07).
05 CI-PROC-CODE PIC X(05).
05 CI-AMOUNT PIC 9(07)V99.
05 CI-CLM-TYPE PIC 9(01).
05 CI-PLACE-SVC PIC X(02).
05 CI-MODIFIER PIC X(02).
05 CI-AUTH-NUM PIC X(12).
05 CI-FILLER PIC X(32).
FD CLAIM-OUT.
01 CLAIM-OUT-REC PIC X(200).
FD ERR-FILE.
01 ERR-REC PIC X(132).
FD PROV-FILE.
01 PROV-REC.
05 PROV-ID PIC X(10).
05 PROV-NAME PIC X(30).
05 PROV-STAT PIC X(01).
05 PROV-TYPE PIC 9(02).
05 PROV-REST PIC X(57).
FD MEMB-FILE.
01 MEMB-REC.
05 MEMB-ID PIC X(12).
05 MEMB-NAME PIC X(30).
05 MEMB-EFF-DT PIC 9(08).
05 MEMB-TERM-DT PIC 9(08).
05 MEMB-PLAN PIC X(04).
05 MEMB-REST PIC X(38).
WORKING-STORAGE SECTION.
01 WS-FS1 PIC X(02).
01 WS-FS2 PIC X(02).
01 WS-FS3 PIC X(02).
01 WS-FS4 PIC X(02).
01 WS-FS5 PIC X(02).
01 WS-EOF PIC X VALUE 'N'.
01 WS-CTR1 PIC 9(07) VALUE 0.
01 WS-CTR2 PIC 9(07) VALUE 0.
01 WS-CTR3 PIC 9(07) VALUE 0.
01 WS-ERR-MSG PIC X(80) VALUE SPACES.
01 WS-WORK-DATE PIC 9(08).
01 WS-TODAY PIC 9(08).
01 WS-CLM-WORK PIC X(200).
PROCEDURE DIVISION.
0000-MAIN.
PERFORM 1000-INIT.
PERFORM 2000-PROC UNTIL WS-EOF = 'Y'.
PERFORM 3000-TERM.
STOP RUN.
1000-INIT.
OPEN INPUT CLAIM-IN PROV-FILE MEMB-FILE.
OPEN OUTPUT CLAIM-OUT ERR-FILE.
ACCEPT WS-TODAY FROM DATE YYYYMMDD.
READ CLAIM-IN
AT END MOVE 'Y' TO WS-EOF.
2000-PROC.
ADD 1 TO WS-CTR1.
* CHECK MEMBER
MOVE CI-MEMB-ID TO MEMB-ID.
READ MEMB-FILE.
IF WS-FS5 NOT = '00'
MOVE 'MEMBER NOT FOUND' TO WS-ERR-MSG
PERFORM 8000-ERR
GO TO 2000-EXIT
END-IF.
* CHECK DATES
IF CI-SVC-DATE < MEMB-EFF-DT OR
CI-SVC-DATE > MEMB-TERM-DT
MOVE 'SERVICE DATE OUT OF COVERAGE' TO WS-ERR-MSG
PERFORM 8000-ERR
GO TO 2000-EXIT
END-IF.
* CHECK PROVIDER
MOVE CI-PROV-ID TO PROV-ID.
READ PROV-FILE.
IF WS-FS4 NOT = '00'
MOVE 'PROVIDER NOT FOUND' TO WS-ERR-MSG
PERFORM 8000-ERR
GO TO 2000-EXIT
END-IF.
IF PROV-STAT NOT = 'A'
MOVE 'PROVIDER NOT ACTIVE' TO WS-ERR-MSG
PERFORM 8000-ERR
GO TO 2000-EXIT
END-IF.
* CHECK AMOUNT
IF CI-AMOUNT NOT > 0
MOVE 'INVALID AMOUNT' TO WS-ERR-MSG
PERFORM 8000-ERR
GO TO 2000-EXIT
END-IF.
* CHECK AUTH FOR NON-EMERGENCY
IF CI-PLACE-SVC NOT = '23'
IF CI-AUTH-NUM = SPACES
IF CI-CLM-TYPE NOT = 7
MOVE 'AUTH REQUIRED' TO WS-ERR-MSG
PERFORM 8000-ERR
GO TO 2000-EXIT
END-IF
END-IF
END-IF.
* COVID OVERRIDE - ADDED 09/2021
IF CI-DIAG-CODE(1:3) = 'U07'
MOVE SPACES TO CI-AUTH-NUM
MOVE 0 TO CI-CLM-TYPE
END-IF.
* WRITE GOOD CLAIM
MOVE CLAIM-IN-REC TO WS-CLM-WORK.
WRITE CLAIM-OUT-REC FROM WS-CLM-WORK.
ADD 1 TO WS-CTR2.
2000-EXIT.
READ CLAIM-IN
AT END MOVE 'Y' TO WS-EOF.
3000-TERM.
DISPLAY 'CLAIMS READ: ' WS-CTR1.
DISPLAY 'CLAIMS WRITTEN: ' WS-CTR2.
DISPLAY 'CLAIMS ERRORED: ' WS-CTR3.
CLOSE CLAIM-IN CLAIM-OUT ERR-FILE PROV-FILE MEMB-FILE.
8000-ERR.
STRING CI-MEMB-ID DELIMITED SIZE
' ' DELIMITED SIZE
WS-ERR-MSG DELIMITED SIZE
INTO ERR-REC.
WRITE ERR-REC.
ADD 1 TO WS-CTR3.
This code works. It has been processing claims for 18 years. But it has significant problems:
- No copybooks. Record layouts are defined inline. If the claim format changes, every program that reads claims must be updated independently.
- Cryptic variable names. WS-FS1, WS-CTR1, CI-MEMB-ID — these are understandable if you know the conventions, but they are not self-documenting.
- Inline file status checking. The IF WS-FS4 NOT = '00' pattern is repeated everywhere without 88-level conditions.
- Hardcoded business rules. The COVID override (checking for diagnosis code 'U07') is hardcoded. When COVID rules change, the program must be modified and recompiled.
- No audit trail. Rejected claims are written to an error file, but there is no record of why the claim was rejected with enough detail for systematic analysis.
- Mixed concerns. I/O, validation, and business rules are all in the same paragraph. This makes the program difficult to test and modify.
- GO TO for flow control. The GO TO 2000-EXIT pattern is a common legacy idiom but makes the control flow harder to follow than structured alternatives.
- Unsigned amount field. CI-AMOUNT is PIC 9(07)V99 — unsigned. If a negative amount were received (perhaps from a reversal), it would be silently treated as positive.
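Two of these problems — the raw status checks and the unsigned amount — have small, local fixes, and the refactored CLM-INTAKE shown later in this chapter uses exactly this approach. As a sketch:

```cobol
      * Sketch: 88-level conditions give file status checks a
      * name, and a signed COMP-3 field preserves negative
      * amounts instead of silently treating them as positive.
       01 WS-PROV-STATUS PIC X(02).
          88 WS-PROV-OK VALUE '00'.
          88 WS-PROV-NOT-FOUND VALUE '23'.
       01 WS-CLAIM-AMOUNT PIC S9(7)V99 COMP-3.
      * In the PROCEDURE DIVISION, the check reads as intent:
      *    IF WS-PROV-NOT-FOUND
      *       PERFORM 8000-ERR
      *    END-IF
```

The condition name also distinguishes "not found" ('23') from genuine I/O failures, a distinction the legacy IF WS-FS4 NOT = '00' pattern cannot make.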
⚠️ Do Not Judge. It is easy — and wrong — to look at legacy code and conclude that the original developers were incompetent. They were not. They were working with the standards, tools, and time pressures of their era. The code worked then, and it works now. Our job is not to judge it but to improve it — incrementally, carefully, and respectfully. As James tells his team: "This code paid the bills for 18 years. Show it some respect."
Analyzing the Legacy Code: A Detailed Walk-Through
Let us trace through the legacy CLM-INTAKE code to understand both what it does well and where it falls short. This kind of analysis is the core skill of code archaeology.
What the legacy code does well:
- It checks file status after every I/O. The IF WS-FS5 NOT = '00' pattern is not pretty, but it works. Every file read is verified. Many legacy programs do not check file status at all — they simply assume every READ succeeds.
- It validates in a reasonable order. Member check comes before provider check, which comes before financial checks. This ordering makes sense: there is no point checking the provider if the member does not exist.
- It has counters. WS-CTR1, WS-CTR2, and WS-CTR3 track reads, writes, and errors. The DISPLAY statements in 3000-TERM produce a processing summary. This is elementary audit control, but it is present — many legacy programs do not even have this.
- It handles COVID. The U07 diagnosis code check (added in 2021) shows that the program has been maintained and adapted to changing business requirements. The modification comment at the top documents when and why the change was made.
Where the legacy code falls short:
- The GO TO 2000-EXIT pattern creates a spaghetti-like control flow that is difficult to trace. Each validation check has its own GO TO, and understanding the flow requires mentally tracking which GO TOs can be reached from which IF conditions. In the refactored version, cascading IFs make the flow explicit: "if valid so far, check the next thing."
- The COVID override is dangerous. It clears the authorization number and claim type for COVID diagnosis codes. But what if a claim has a legitimate authorization? Clearing it discards information. The refactored version would add a COVID-OVERRIDE flag without destroying existing data.
- No distinction between "not found" and "file error." The legacy code treats WS-FS5 NOT = '00' as "member not found." But VSAM file status '23' (not found) is very different from '24' (boundary violation) or '9x' (physical I/O error). The legacy code conflates all non-zero statuses into a single error path.
- The error report is unstructured. The STRING in paragraph 8000-ERR produces a free-form error line that is difficult to parse programmatically. If the operations team wants to count errors by type, they must write a custom parser. The refactored version uses structured error records with separate fields for each piece of information.
Understanding these strengths and weaknesses is the foundation for effective refactoring. You preserve the strengths (file status checking, validation ordering, counters) while addressing the weaknesses (GO TO flow, crude error handling, inline definitions).
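The structured alternative to the free-form error line is the ERRREC audit record used by the refactored program. Its exact layout is not reproduced in this chapter, but from the fields the refactored 2810-WRITE-AUDIT paragraph populates and the 150-character record length in its FD, a plausible sketch is:

```cobol
      * Plausible sketch of the ERRREC copybook. The field names
      * match the refactored 2810-WRITE-AUDIT paragraph shown
      * later in this chapter; the sizes are inferred, not quoted.
       01 AUDIT-RECORD.
          05 AUD-CLAIM-ID PIC X(15).
          05 AUD-MEMBER-ID PIC X(12).
          05 AUD-REJECT-CODE PIC X(04).
          05 AUD-REJECT-DESC PIC X(60).
          05 AUD-DATE PIC 9(08).
          05 FILLER PIC X(51).
```

Because every field occupies a fixed position, the operations team can count rejections by AUD-REJECT-CODE with a simple sort-and-summarize step — no custom parser required.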
The Knowledge Transfer Problem
James faces a challenge that many modernization leaders encounter: the original developer of CLM-INTAKE left MedClaim four years ago. The modification log shows names (JO for James Okafor, RK for an unknown developer), but the reasoning behind design decisions is lost.
For example, the COVID override checks CI-DIAG-CODE(1:3) = 'U07'. Why the first three characters? COVID diagnosis codes are U07.1 (COVID-19, virus identified) and U07.2 (COVID-19, virus not identified). Checking three characters catches both codes. But it also catches any future code starting with U07 — is that intentional? Nobody knows.
James documents his assumption: "The U07 check is intentionally broad, catching any current or future U07.x code. If WHO assigns a non-COVID code in the U07 range, this check will need to be revised." By documenting the assumption, James ensures that the next developer who encounters this code understands the reasoning — or the lack thereof.
This kind of archaeological inference — reading the code, forming hypotheses about intent, documenting assumptions — is a skill that separates competent legacy developers from exceptional ones. The code tells you what happens; documentation tells you why. When documentation is missing, you must reconstruct the "why" from the "what" and record it for future developers.
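One way to make such an assumption executable rather than merely documented is to give it a name in the code. A sketch — the condition name and the 2900-APPLY-COVID-OVERRIDE paragraph are illustrative, not part of the actual program:

```cobol
      * Sketch: naming the assumption makes it visible at the
      * point of use. If WHO ever assigns a non-COVID code in
      * the U07 range, revising the VALUE is a one-line change.
       01 WS-DIAGNOSIS-WORK.
          05 WS-DIAG-PREFIX PIC X(03).
             88 WS-COVID-DIAG-RANGE VALUE 'U07'.
      * In the PROCEDURE DIVISION:
      *    MOVE CI-DIAG-CODE(1:3) TO WS-DIAG-PREFIX
      *    IF WS-COVID-DIAG-RANGE
      *       PERFORM 2900-APPLY-COVID-OVERRIDE
      *    END-IF
```

The 88-level VALUE now carries the documented assumption; the next developer sees WS-COVID-DIAG-RANGE and knows the three-character match is deliberate.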
Phase 2: Refactoring for Modularity
Creating Standard Copybooks
The first concrete modernization step is copybook consolidation. James identifies 83 copybooks in the system, but many are duplicates or near-duplicates. After analysis, he reduces the canonical set to 34 copybooks and creates a naming standard:
| Prefix | Meaning | Example |
|---|---|---|
| CLM- | Claim records | CLMREC, CLMHIST |
| PRV- | Provider records | PRVREC, PRVADDR |
| MBR- | Member records | MBRREC, MBRELIG |
| PAY- | Payment records | PAYREC, PAYHIST |
| ERR- | Error/audit records | ERRREC, ERRCODE |
| WS- | Common working storage | WSDATE, WSCNTRS |
| RPT- | Report layouts | RPTHDR, RPTDTL |
The new claim record copybook:
*================================================================*
* COPYBOOK: CLMREC *
* Insurance Claim Record Layout - Version 2.0 *
* Modernized: 2024-03-15 by James Okafor *
* Previous: Multiple inline definitions (legacy) *
*================================================================*
01 CLAIM-RECORD.
05 CLM-HEADER.
10 CLM-CLAIM-ID PIC X(15).
10 CLM-RECEIVED-DATE PIC 9(08).
10 CLM-RECEIVED-TIME PIC 9(06).
10 CLM-SOURCE PIC X(02).
88 CLM-FROM-ELECTRONIC VALUE 'EL'.
88 CLM-FROM-PAPER VALUE 'PA'.
88 CLM-FROM-PHONE VALUE 'PH'.
88 CLM-VALID-SOURCE VALUE 'EL' 'PA' 'PH'.
05 CLM-MEMBER-INFO.
10 CLM-MEMBER-ID PIC X(12).
10 CLM-MEMBER-NAME PIC X(30).
10 CLM-PLAN-CODE PIC X(04).
05 CLM-PROVIDER-INFO.
10 CLM-PROVIDER-ID PIC X(10).
10 CLM-PROVIDER-NAME PIC X(30).
10 CLM-PROVIDER-TYPE PIC 9(02).
88 CLM-PROV-PHYSICIAN VALUE 01.
88 CLM-PROV-HOSPITAL VALUE 02.
88 CLM-PROV-LAB VALUE 03.
88 CLM-PROV-PHARMACY VALUE 04.
05 CLM-SERVICE-INFO.
10 CLM-SERVICE-DATE PIC 9(08).
10 CLM-DIAGNOSIS-CODE PIC X(07).
10 CLM-PROCEDURE-CODE PIC X(05).
10 CLM-MODIFIER PIC X(02).
10 CLM-PLACE-OF-SVC PIC X(02).
88 CLM-EMERGENCY-ROOM VALUE '23'.
10 CLM-CLAIM-TYPE PIC 9(01).
88 CLM-TYPE-STANDARD VALUE 0.
88 CLM-TYPE-EMERGENCY VALUE 1.
88 CLM-TYPE-MENTAL-HTH VALUE 2.
88 CLM-TYPE-DENTAL VALUE 3.
88 CLM-TYPE-VISION VALUE 4.
88 CLM-TYPE-EXEMPT VALUE 7.
05 CLM-FINANCIAL-INFO.
10 CLM-CHARGED-AMOUNT PIC S9(7)V99 COMP-3.
10 CLM-ALLOWED-AMOUNT PIC S9(7)V99 COMP-3.
10 CLM-PAID-AMOUNT PIC S9(7)V99 COMP-3.
10 CLM-COPAY-AMOUNT PIC S9(5)V99 COMP-3.
10 CLM-DEDUCTIBLE-AMT PIC S9(5)V99 COMP-3.
05 CLM-AUTHORIZATION.
10 CLM-AUTH-NUMBER PIC X(12).
10 CLM-AUTH-REQUIRED PIC X(01).
88 CLM-NEEDS-AUTH VALUE 'Y'.
88 CLM-NO-AUTH-NEEDED VALUE 'N'.
05 CLM-STATUS-INFO.
10 CLM-STATUS PIC X(02).
88 CLM-RECEIVED VALUE 'RC'.
88 CLM-VALIDATED VALUE 'VL'.
88 CLM-ADJUDICATED VALUE 'AJ'.
88 CLM-PAID VALUE 'PD'.
88 CLM-DENIED VALUE 'DN'.
88 CLM-PENDING-REVIEW VALUE 'PR'.
10 CLM-REASON-CODE PIC X(04).
05 FILLER PIC X(20).
Compare this to the original inline definition. The modernized version:
- Groups related fields logically
- Uses 88-level conditions for every coded field
- Uses signed COMP-3 for all monetary fields
- Has a consistent naming convention
- Includes FILLER for future expansion
- Documents its purpose, version, and history in comments
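The 88-levels pay off immediately in procedure code. The fragment below is a sketch using only names defined in the CLMREC copybook above; the surrounding logic is illustrative:

```cobol
      * Sketch: condition names from CLMREC replace literal
      * codes in procedure logic.
           IF NOT CLM-VALID-SOURCE
              DISPLAY 'UNKNOWN CLAIM SOURCE: ' CLM-SOURCE
           END-IF
           IF CLM-EMERGENCY-ROOM
              SET CLM-NO-AUTH-NEEDED TO TRUE
           END-IF
           SET CLM-VALIDATED TO TRUE
```

No reader needs to know that 'EL', 'PA', and 'PH' are the valid sources, that '23' means emergency room, or that 'VL' is the validated status — the copybook encodes those facts once.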
Refactoring CLM-INTAKE
With the new copybook in place, James refactors CLM-INTAKE. The refactored version separates concerns into distinct paragraphs, uses the standard copybook, and follows modern COBOL conventions.
IDENTIFICATION DIVISION.
PROGRAM-ID. CLM-INTAKE.
*================================================================*
* Program: CLM-INTAKE *
* Purpose: Validate and route incoming insurance claims *
* Author: Original unknown; refactored by James Okafor *
* Date: Refactored 2024-03-20 *
* System: MedClaim Insurance Processing *
*================================================================*
* Modification Log: *
* Date Author Description *
* ---------- ------- ------------------------------------------- *
* 2024-03-20 JO Refactored: copybooks, structured code, *
* audit trail, separated validation *
* 2024-03-25 JO Added external rule table for auth check *
*================================================================*
ENVIRONMENT DIVISION.
INPUT-OUTPUT SECTION.
FILE-CONTROL.
SELECT CLAIM-INPUT
ASSIGN TO CLMIN
ORGANIZATION IS SEQUENTIAL
FILE STATUS IS WS-INPUT-STATUS.
SELECT CLAIM-OUTPUT
ASSIGN TO CLMOUT
ORGANIZATION IS SEQUENTIAL
FILE STATUS IS WS-OUTPUT-STATUS.
SELECT ERROR-REPORT
ASSIGN TO ERROUT
ORGANIZATION IS SEQUENTIAL
FILE STATUS IS WS-ERROR-STATUS.
SELECT PROVIDER-FILE
ASSIGN TO PROVFL
ORGANIZATION IS INDEXED
ACCESS MODE IS RANDOM
RECORD KEY IS PRV-PROVIDER-ID
FILE STATUS IS WS-PROV-STATUS.
SELECT MEMBER-FILE
ASSIGN TO MEMBFL
ORGANIZATION IS INDEXED
ACCESS MODE IS RANDOM
RECORD KEY IS MBR-MEMBER-ID
FILE STATUS IS WS-MEMB-STATUS.
SELECT AUDIT-FILE
ASSIGN TO AUDOUT
ORGANIZATION IS SEQUENTIAL
FILE STATUS IS WS-AUDIT-STATUS.
DATA DIVISION.
FILE SECTION.
FD CLAIM-INPUT
RECORDING MODE IS F
RECORD CONTAINS 200 CHARACTERS.
COPY CLMREC.
FD CLAIM-OUTPUT
RECORDING MODE IS F
RECORD CONTAINS 200 CHARACTERS.
01 CLAIM-OUTPUT-REC PIC X(200).
FD ERROR-REPORT
RECORDING MODE IS F
RECORD CONTAINS 132 CHARACTERS.
01 ERROR-REPORT-LINE PIC X(132).
FD PROVIDER-FILE
RECORD CONTAINS 100 CHARACTERS.
COPY PRVREC.
FD MEMBER-FILE
RECORD CONTAINS 100 CHARACTERS.
COPY MBRREC.
FD AUDIT-FILE
RECORDING MODE IS F
RECORD CONTAINS 150 CHARACTERS.
COPY ERRREC.
WORKING-STORAGE SECTION.
01 WS-FILE-STATUSES.
05 WS-INPUT-STATUS PIC X(02).
88 WS-INPUT-OK VALUE '00'.
88 WS-INPUT-EOF VALUE '10'.
05 WS-OUTPUT-STATUS PIC X(02).
88 WS-OUTPUT-OK VALUE '00'.
05 WS-ERROR-STATUS PIC X(02).
05 WS-PROV-STATUS PIC X(02).
88 WS-PROV-OK VALUE '00'.
88 WS-PROV-NOT-FOUND VALUE '23'.
05 WS-MEMB-STATUS PIC X(02).
88 WS-MEMB-OK VALUE '00'.
88 WS-MEMB-NOT-FOUND VALUE '23'.
05 WS-AUDIT-STATUS PIC X(02).
01 WS-COUNTERS.
05 WS-CLAIMS-READ PIC 9(07) VALUE ZERO.
05 WS-CLAIMS-ACCEPTED PIC 9(07) VALUE ZERO.
05 WS-CLAIMS-REJECTED PIC 9(07) VALUE ZERO.
01 WS-FLAGS.
05 WS-EOF-FLAG PIC X(01) VALUE 'N'.
88 WS-END-OF-INPUT VALUE 'Y'.
88 WS-MORE-INPUT VALUE 'N'.
05 WS-CLAIM-VALID PIC X(01).
88 WS-CLAIM-IS-VALID VALUE 'Y'.
88 WS-CLAIM-IS-INVALID VALUE 'N'.
01 WS-REJECTION-INFO.
05 WS-REJECT-CODE PIC X(04).
05 WS-REJECT-MESSAGE PIC X(60).
01 WS-DATE-WORK.
05 WS-CURRENT-DATE PIC 9(08).
PROCEDURE DIVISION.
0000-MAIN.
PERFORM 1000-INITIALIZE
PERFORM 2000-PROCESS-CLAIMS
UNTIL WS-END-OF-INPUT
PERFORM 3000-TERMINATE
STOP RUN
.
1000-INITIALIZE.
OPEN INPUT CLAIM-INPUT
PROVIDER-FILE
MEMBER-FILE
OPEN OUTPUT CLAIM-OUTPUT
ERROR-REPORT
AUDIT-FILE
IF NOT WS-INPUT-OK
DISPLAY 'FATAL: CANNOT OPEN CLAIM INPUT: '
WS-INPUT-STATUS
MOVE 16 TO RETURN-CODE
STOP RUN
END-IF
ACCEPT WS-CURRENT-DATE FROM DATE YYYYMMDD
PERFORM 2100-READ-CLAIM
.
2000-PROCESS-CLAIMS.
ADD 1 TO WS-CLAIMS-READ
SET WS-CLAIM-IS-VALID TO TRUE
MOVE SPACES TO WS-REJECT-CODE
MOVE SPACES TO WS-REJECT-MESSAGE
PERFORM 2200-VALIDATE-MEMBER
IF WS-CLAIM-IS-VALID
PERFORM 2300-VALIDATE-PROVIDER
END-IF
IF WS-CLAIM-IS-VALID
PERFORM 2400-VALIDATE-SERVICE
END-IF
IF WS-CLAIM-IS-VALID
PERFORM 2500-VALIDATE-AUTHORIZATION
END-IF
IF WS-CLAIM-IS-VALID
PERFORM 2600-VALIDATE-FINANCIAL
END-IF
IF WS-CLAIM-IS-VALID
PERFORM 2700-WRITE-ACCEPTED-CLAIM
ELSE
PERFORM 2800-WRITE-REJECTED-CLAIM
END-IF
PERFORM 2100-READ-CLAIM
.
2100-READ-CLAIM.
READ CLAIM-INPUT
EVALUATE TRUE
WHEN WS-INPUT-OK
CONTINUE
WHEN WS-INPUT-EOF
SET WS-END-OF-INPUT TO TRUE
WHEN OTHER
DISPLAY 'READ ERROR: ' WS-INPUT-STATUS
SET WS-END-OF-INPUT TO TRUE
END-EVALUATE
.
2200-VALIDATE-MEMBER.
MOVE CLM-MEMBER-ID TO MBR-MEMBER-ID
READ MEMBER-FILE
EVALUATE TRUE
WHEN WS-MEMB-OK
IF MBR-TERM-DATE < CLM-SERVICE-DATE
OR MBR-EFF-DATE > CLM-SERVICE-DATE
SET WS-CLAIM-IS-INVALID TO TRUE
MOVE 'MCOV' TO WS-REJECT-CODE
MOVE 'SERVICE DATE OUTSIDE COVERAGE PERIOD'
TO WS-REJECT-MESSAGE
END-IF
WHEN WS-MEMB-NOT-FOUND
SET WS-CLAIM-IS-INVALID TO TRUE
MOVE 'MNFD' TO WS-REJECT-CODE
MOVE 'MEMBER NOT FOUND' TO WS-REJECT-MESSAGE
WHEN OTHER
SET WS-CLAIM-IS-INVALID TO TRUE
MOVE 'MERR' TO WS-REJECT-CODE
MOVE 'MEMBER FILE READ ERROR'
TO WS-REJECT-MESSAGE
END-EVALUATE
.
2300-VALIDATE-PROVIDER.
MOVE CLM-PROVIDER-ID TO PRV-PROVIDER-ID
READ PROVIDER-FILE
EVALUATE TRUE
WHEN WS-PROV-OK
IF NOT PRV-ACTIVE
SET WS-CLAIM-IS-INVALID TO TRUE
MOVE 'PINA' TO WS-REJECT-CODE
MOVE 'PROVIDER NOT IN ACTIVE STATUS'
TO WS-REJECT-MESSAGE
END-IF
WHEN WS-PROV-NOT-FOUND
SET WS-CLAIM-IS-INVALID TO TRUE
MOVE 'PNFD' TO WS-REJECT-CODE
MOVE 'PROVIDER NOT FOUND'
TO WS-REJECT-MESSAGE
WHEN OTHER
SET WS-CLAIM-IS-INVALID TO TRUE
MOVE 'PERR' TO WS-REJECT-CODE
MOVE 'PROVIDER FILE READ ERROR'
TO WS-REJECT-MESSAGE
END-EVALUATE
.
2400-VALIDATE-SERVICE.
IF CLM-SERVICE-DATE > WS-CURRENT-DATE
SET WS-CLAIM-IS-INVALID TO TRUE
MOVE 'SFUT' TO WS-REJECT-CODE
MOVE 'SERVICE DATE IN THE FUTURE'
TO WS-REJECT-MESSAGE
END-IF
.
2500-VALIDATE-AUTHORIZATION.
IF NOT CLM-EMERGENCY-ROOM
AND NOT CLM-TYPE-EXEMPT
AND CLM-AUTH-NUMBER = SPACES
SET WS-CLAIM-IS-INVALID TO TRUE
MOVE 'AUTH' TO WS-REJECT-CODE
MOVE 'AUTHORIZATION REQUIRED FOR THIS SERVICE'
TO WS-REJECT-MESSAGE
END-IF
.
2600-VALIDATE-FINANCIAL.
IF CLM-CHARGED-AMOUNT NOT > ZERO
SET WS-CLAIM-IS-INVALID TO TRUE
MOVE 'IAMT' TO WS-REJECT-CODE
MOVE 'INVALID CLAIM AMOUNT'
TO WS-REJECT-MESSAGE
END-IF
.
2700-WRITE-ACCEPTED-CLAIM.
SET CLM-VALIDATED TO TRUE
WRITE CLAIM-OUTPUT-REC FROM CLAIM-RECORD
IF WS-OUTPUT-OK
ADD 1 TO WS-CLAIMS-ACCEPTED
ELSE
DISPLAY 'WRITE ERROR ON OUTPUT: ' WS-OUTPUT-STATUS
END-IF
.
2800-WRITE-REJECTED-CLAIM.
SET CLM-DENIED TO TRUE
MOVE WS-REJECT-CODE TO CLM-REASON-CODE
ADD 1 TO WS-CLAIMS-REJECTED
PERFORM 2810-WRITE-AUDIT
PERFORM 2820-WRITE-ERROR-REPORT
.
2810-WRITE-AUDIT.
INITIALIZE AUDIT-RECORD
MOVE CLM-CLAIM-ID TO AUD-CLAIM-ID
MOVE CLM-MEMBER-ID TO AUD-MEMBER-ID
MOVE WS-REJECT-CODE TO AUD-REJECT-CODE
MOVE WS-REJECT-MESSAGE TO AUD-REJECT-DESC
MOVE WS-CURRENT-DATE TO AUD-DATE
WRITE AUDIT-RECORD
.
2820-WRITE-ERROR-REPORT.
INITIALIZE ERROR-REPORT-LINE
STRING CLM-CLAIM-ID DELIMITED BY SPACES
' | ' DELIMITED BY SIZE
CLM-MEMBER-ID DELIMITED BY SPACES
' | ' DELIMITED BY SIZE
WS-REJECT-CODE DELIMITED BY SIZE
' | ' DELIMITED BY SIZE
WS-REJECT-MESSAGE DELIMITED BY SPACES
INTO ERROR-REPORT-LINE
END-STRING
WRITE ERROR-REPORT-LINE
.
3000-TERMINATE.
DISPLAY '======================================'
DISPLAY 'CLM-INTAKE PROCESSING COMPLETE'
DISPLAY '======================================'
DISPLAY 'CLAIMS READ: ' WS-CLAIMS-READ
DISPLAY 'CLAIMS ACCEPTED: ' WS-CLAIMS-ACCEPTED
DISPLAY 'CLAIMS REJECTED: ' WS-CLAIMS-REJECTED
DISPLAY '======================================'
CLOSE CLAIM-INPUT
CLAIM-OUTPUT
ERROR-REPORT
PROVIDER-FILE
MEMBER-FILE
AUDIT-FILE
IF WS-CLAIMS-REJECTED > ZERO
MOVE 4 TO RETURN-CODE
ELSE
MOVE 0 TO RETURN-CODE
END-IF
.
What changed and why:
- Copybooks replace inline definitions. CLMREC, PRVREC, MBRREC, and ERRREC are now shared across all programs.
- Meaningful file status variables. WS-INPUT-STATUS with 88-levels replaces WS-FS1.
- Separated validation paragraphs. Each validation concern has its own paragraph. This makes testing possible — you can test member validation independently from provider validation.
- Structured rejection handling. A reject code and message are captured for every rejection, then written to both an audit file and an error report.
- No GO TO. The cascading IF pattern replaces the GO TO 2000-EXIT pattern. Both achieve the same result — skipping later validations after a failure — but the structured version is easier to trace.
- Audit trail added. Every rejected claim now generates an audit record, enabling systematic analysis of rejection patterns.
- Authorization logic uses 88-levels. Instead of checking CI-PLACE-SVC NOT = '23' and CI-CLM-TYPE NOT = 7, the code uses NOT CLM-EMERGENCY-ROOM and NOT CLM-TYPE-EXEMPT.
✅ Theme: Readability is a Feature. Compare the two versions of the authorization check:
Legacy: IF CI-PLACE-SVC NOT = '23' ... IF CI-CLM-TYPE NOT = 7
Refactored: IF NOT CLM-EMERGENCY-ROOM AND NOT CLM-TYPE-EXEMPT
The refactored version reads like a business rule: "If this is not an emergency room visit and not an exempt claim type, authorization is required." A business analyst can verify that the code matches the business requirement. The legacy version requires the reader to know that '23' means emergency room and 7 means exempt.
Extracting Validation Subprograms
James takes the refactoring one step further. The validation logic in CLM-INTAKE is also needed by other programs (CLM-ADJUD uses the same member and provider checks). Instead of duplicating the code, James extracts the validation logic into a called subprogram.
IDENTIFICATION DIVISION.
PROGRAM-ID. CLM-VALID.
*================================================================*
* Program: CLM-VALID *
* Purpose: Centralized claim validation subprogram *
* Called by: CLM-INTAKE, CLM-ADJUD, CLM-AMEND *
* Author: James Okafor *
* Date: 2024-04-01 *
*================================================================*
DATA DIVISION.
WORKING-STORAGE SECTION.
01 WS-INTERNAL-FLAGS.
05 WS-PROV-STATUS PIC X(02).
88 WS-PROV-FOUND VALUE '00'.
05 WS-MEMB-STATUS PIC X(02).
88 WS-MEMB-FOUND VALUE '00'.
LINKAGE SECTION.
01 LS-VALIDATION-REQUEST.
05 LS-FUNCTION PIC X(02).
88 LS-VALIDATE-MEMBER VALUE 'VM'.
88 LS-VALIDATE-PROVIDER VALUE 'VP'.
88 LS-VALIDATE-AUTH VALUE 'VA'.
88 LS-VALIDATE-ALL VALUE 'AL'.
05 LS-MEMBER-ID PIC X(12).
05 LS-PROVIDER-ID PIC X(10).
05 LS-SERVICE-DATE PIC 9(08).
05 LS-PLACE-OF-SERVICE PIC X(02).
05 LS-CLAIM-TYPE PIC 9(01).
05 LS-AUTH-NUMBER PIC X(12).
05 LS-RESULT.
10 LS-VALID-FLAG PIC X(01).
88 LS-IS-VALID VALUE 'Y'.
88 LS-IS-INVALID VALUE 'N'.
10 LS-REJECT-CODE PIC X(04).
10 LS-REJECT-MSG PIC X(60).
PROCEDURE DIVISION USING LS-VALIDATION-REQUEST.
0000-MAIN.
SET LS-IS-VALID TO TRUE
MOVE SPACES TO LS-REJECT-CODE
MOVE SPACES TO LS-REJECT-MSG
EVALUATE TRUE
WHEN LS-VALIDATE-MEMBER
PERFORM 1000-CHECK-MEMBER
WHEN LS-VALIDATE-PROVIDER
PERFORM 2000-CHECK-PROVIDER
WHEN LS-VALIDATE-AUTH
PERFORM 3000-CHECK-AUTH
WHEN LS-VALIDATE-ALL
PERFORM 1000-CHECK-MEMBER
IF LS-IS-VALID
PERFORM 2000-CHECK-PROVIDER
END-IF
IF LS-IS-VALID
PERFORM 3000-CHECK-AUTH
END-IF
WHEN OTHER
SET LS-IS-INVALID TO TRUE
MOVE 'IFNC' TO LS-REJECT-CODE
MOVE 'INVALID VALIDATION FUNCTION'
TO LS-REJECT-MSG
END-EVALUATE
GOBACK
.
1000-CHECK-MEMBER.
* Member validation logic here
* (reads MEMBER-FILE via CICS or VSAM as appropriate)
CONTINUE
.
2000-CHECK-PROVIDER.
* Provider validation logic here
CONTINUE
.
3000-CHECK-AUTH.
* Authorization validation logic here
IF LS-PLACE-OF-SERVICE NOT = '23'
AND LS-CLAIM-TYPE NOT = 7
AND LS-AUTH-NUMBER = SPACES
SET LS-IS-INVALID TO TRUE
MOVE 'AUTH' TO LS-REJECT-CODE
MOVE 'AUTHORIZATION REQUIRED'
TO LS-REJECT-MSG
END-IF
.
Now CLM-INTAKE, CLM-ADJUD, and CLM-AMEND all call CLM-VALID instead of implementing their own validation. When a validation rule changes, it changes in one place.
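The call interface of CLM-VALID can be sketched outside COBOL as a small dispatcher. A hedged Python analogue of the function codes and the short-circuit behavior of LS-VALIDATE-ALL (the check functions are stand-ins passed by the caller, not the real validation rules):

```python
def validate(request, check_member, check_provider, check_auth):
    """Mimic CLM-VALID: dispatch on a function code, stop at the first failure.

    Each check function returns None on success, or a (code, msg) tuple
    on failure -- the analogue of LS-REJECT-CODE / LS-REJECT-MSG.
    """
    checks = {
        'VM': [check_member],
        'VP': [check_provider],
        'VA': [check_auth],
        'AL': [check_member, check_provider, check_auth],
    }
    if request.get('function') not in checks:
        # WHEN OTHER branch of the EVALUATE
        return {'valid': False, 'code': 'IFNC', 'msg': 'INVALID VALIDATION FUNCTION'}
    for check in checks[request['function']]:
        result = check(request)
        if result is not None:          # first failure wins, later checks skipped
            code, msg = result
            return {'valid': False, 'code': code, 'msg': msg}
    return {'valid': True, 'code': '', 'msg': ''}
```

As in the COBOL, a failed member check means the provider and authorization checks never run.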
🔗 Theme: The Modernization Spectrum. Notice that James has not changed the system's architecture. Programs still read the same files, produce the same outputs, and run in the same JCL. He has changed the internal structure — extracting copybooks, separating concerns, creating reusable subprograms — without changing any external interface. This is the least risky form of modernization: improving the code without changing what it does.
The Refactoring Process in Detail
Step-by-Step Refactoring Methodology
James establishes a disciplined refactoring methodology that he requires every team member to follow. Undisciplined refactoring — changing code without a systematic process — is more dangerous than leaving the code alone.
Step 1: Baseline. Before changing any program, create a baseline: capture the program's output for a known set of inputs. This output becomes the "expected result" for all future comparisons.
//*--- CREATE BASELINE OUTPUT ---
//BASELN EXEC PGM=CLMINTAK
//STEPLIB DD DSN=MEDCLAIM.PROD.LOADLIB,DISP=SHR
//CLMIN DD DSN=MEDCLAIM.TEST.CLAIMS,DISP=SHR
//CLMOUT DD DSN=MEDCLAIM.TEST.BASELINE.CLMOUT,
// DISP=(NEW,CATLG),
// SPACE=(CYL,(1,1)),
// DCB=(RECFM=FB,LRECL=200)
//ERROUT DD DSN=MEDCLAIM.TEST.BASELINE.ERRORS,
// DISP=(NEW,CATLG),
// SPACE=(TRK,(5,1)),
// DCB=(RECFM=FB,LRECL=132)
//PROVFL DD DSN=MEDCLAIM.TEST.PROVIDER,DISP=SHR
//MEMBFL DD DSN=MEDCLAIM.TEST.MEMBER,DISP=SHR
//SYSOUT DD SYSOUT=*
Step 2: Understand. Read the program from top to bottom. Document every paragraph's purpose. Identify the data flow: which files are inputs, which are outputs, what working storage is shared between paragraphs. Create the three documentation artifacts (program spec, data flow diagram, business rule catalog).
Step 3: Test harness. Create a JCL procedure that runs the program against the test data and compares the output to the baseline. This is your safety net — after every change, you run this procedure. If the output changes, you have introduced a bug.
//*--- COMPARE REFACTORED OUTPUT TO BASELINE ---
//COMPARE EXEC PGM=IEBCOMPR
//SYSPRINT DD SYSOUT=*
//SYSUT1 DD DSN=MEDCLAIM.TEST.BASELINE.CLMOUT,DISP=SHR
//SYSUT2 DD DSN=MEDCLAIM.TEST.REFACTOR.CLMOUT,DISP=SHR
//SYSIN DD *
COMPARE TYPORG=PS
/*
Step 4: Refactor in small steps. Each refactoring change should be small enough that if the comparison test fails, you can identify what broke within minutes. Typical steps:
- Replace inline record definitions with COPY statements (recompile, compare)
- Replace numeric file status variables with named 88-levels (recompile, compare)
- Extract one validation concern into its own paragraph (recompile, compare)
- Replace GO TO with structured alternative (recompile, compare)
- Add audit trail writing (recompile, compare — output will differ, verify new output is correct)
Each step produces a compilable, testable program. If any step breaks the comparison, you revert that single step and investigate.
Step 5: Parallel run. After all refactoring is complete, run both the legacy and refactored versions against the full test dataset (not just the small test file — the production-volume test data). Compare outputs byte-for-byte. Only when the parallel run shows zero differences is the refactored version promoted to production.
The GO TO Debate
James expects pushback on eliminating GO TO statements, and he gets it. One of MedClaim's senior developers argues: "GO TO 2000-EXIT is a perfectly valid pattern. Every COBOL programmer recognizes it. Changing it just to satisfy a textbook rule introduces risk for no benefit."
James acknowledges the point. "You're right that GO TO 2000-EXIT is a recognized pattern. But here's the problem: when a paragraph has eight GO TO 2000-EXIT statements scattered through twenty validation checks, the control flow becomes hard to trace. Which checks were skipped? Did we skip the right ones? The cascading-IF pattern makes this explicit."
He demonstrates with a concrete example. The legacy authorization check:
* Legacy: Three levels of nesting with GO TO exit
IF CI-PLACE-SVC NOT = '23'
IF CI-AUTH-NUM = SPACES
IF CI-CLM-TYPE NOT = 7
MOVE 'AUTH REQUIRED' TO WS-ERR-MSG
PERFORM 8000-ERR
GO TO 2000-EXIT
END-IF
END-IF
END-IF.
The refactored version:
* Refactored: Flat condition with 88-levels
IF NOT CLM-EMERGENCY-ROOM
AND NOT CLM-TYPE-EXEMPT
AND CLM-AUTH-NUMBER = SPACES
SET WS-CLAIM-IS-INVALID TO TRUE
MOVE 'AUTH' TO WS-REJECT-CODE
MOVE 'AUTHORIZATION REQUIRED FOR THIS SERVICE'
TO WS-REJECT-MESSAGE
END-IF
The refactored version is easier to read, easier to verify against the business requirement, and produces the same result. The GO TO is not needed because the validation flag (WS-CLAIM-IS-INVALID) controls whether subsequent validations are executed.
"I'm not against GO TO on principle," James clarifies. "I'm against GO TO when there's a clearer alternative. In this case, there is."
📊 Measuring Refactoring Impact. After refactoring CLM-INTAKE, James measures two things: (1) the time it takes a new developer to understand the program (before: ~3 days; after: ~4 hours), and (2) the time it takes to make a typical change (before: ~2 days including testing; after: ~3 hours). These are informal measurements, but they make the economic value of readability concrete.
Copybook Migration Strategy
Migrating from inline record definitions to shared copybooks is the single most impactful refactoring in MedClaim's modernization. But it must be done carefully.
The Migration Paradox: You cannot change 47 programs simultaneously. But if you change only some programs to use the new copybook while others still use inline definitions, you create a temporary inconsistency. If the new copybook differs from any inline definition — even by one byte — the programs will interpret data differently.
James solves this with a three-phase approach:
Phase A: Create copybooks that match existing definitions exactly. The first version of CLMREC must produce exactly the same record layout as the inline definitions it replaces. No field name changes, no type changes, no field additions. This is purely a structural change.
Phase B: Migrate programs one at a time. Replace each program's inline definition with COPY CLMREC. Recompile. Run the comparison test. If the output matches the baseline, the migration is successful. Move to the next program.
Phase C: Improve the copybook. Once all programs use the shared copybook, improve it: add 88-level conditions, change field names to follow the naming convention, add FILLER. Then recompile all programs and run the full regression suite.
This approach ensures that at no point do programs disagree about the record layout. Phase A is invisible to the programs (same layout, different source). Phase B is invisible to the data (same layout, same programs). Phase C changes the layout but all programs change simultaneously.
⚠️ The Recompile Rule. When a copybook changes, every program that COPYs it must be recompiled. James creates a cross-reference file that maps every copybook to every program that uses it:
CLMREC -> CLM-INTAKE, CLM-ADJUD, CLM-PAY, CLM-AMEND,
CLM-REVERSE, CLM-EXTRACT, CLM-API
PRVREC -> CLM-INTAKE, CLM-ADJUD, PRV-MAINT, PRV-REPORT
MBRREC -> CLM-INTAKE, CLM-ADJUD, MBR-MAINT, MBR-REPORT,
MBR-ELIG
This cross-reference is maintained manually — and James acknowledges this is a weakness. "In an ideal world, we'd have automated dependency tracking. That's on the roadmap for Phase 5."
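The cross-reference James maintains by hand can be generated mechanically by scanning source members for COPY statements. A minimal Python sketch of the idea (the in-memory source layout is hypothetical; real fixed-format COBOL also has sequence numbers in columns 1-6, which this ignores):

```python
import re
from collections import defaultdict

COPY_RE = re.compile(r'\bCOPY\s+([A-Z0-9-]+)', re.IGNORECASE)

def build_xref(sources):
    """sources: {program_name: source_text} -> {copybook: sorted program list}."""
    xref = defaultdict(set)
    for program, text in sources.items():
        for line in text.splitlines():
            if len(line) > 6 and line[6] == '*':   # fixed-format comment line
                continue
            match = COPY_RE.search(line)
            if match:
                xref[match.group(1).upper()].add(program)
    return {copybook: sorted(programs) for copybook, programs in xref.items()}
```

Run against the source library, this produces exactly the CLMREC/PRVREC/MBRREC mapping shown above, and it cannot drift out of date the way a manual list can.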
Testing During Refactoring
The Comparison Testing Framework
James's refactoring strategy is built on a simple but powerful principle: the refactored program must produce byte-for-byte identical output to the legacy program when given the same input. This principle transforms refactoring from a risky activity into a safe, repeatable process.
The comparison testing framework works as follows:
1. Capture baseline output. Run the legacy program against a standardized test dataset and save all output files (valid claims, error reports, audit records, counters).
2. Run refactored program. Run the refactored program against the same test dataset and save all output files.
3. Compare outputs. Use the z/OS IEBCOMPR utility (or the diff command in GnuCOBOL environments) to compare the outputs byte by byte.
4. Investigate differences. Any difference — even a single byte — must be investigated and explained. Some differences are intentional (e.g., the refactored version writes more detailed audit records). These are documented and approved by Sarah Kim. Unintentional differences indicate a bug in the refactoring.
//COMPARE EXEC PGM=IEBCOMPR
//SYSPRINT DD SYSOUT=*
//SYSUT1 DD DSN=MEDCLAIM.TEST.LEGACY.OUTPUT,DISP=SHR
//SYSUT2 DD DSN=MEDCLAIM.TEST.REFACT.OUTPUT,DISP=SHR
//SYSIN DD DUMMY
If IEBCOMPR returns RC=0, the outputs are identical. If it returns RC=8, there are differences, and the SYSPRINT will show the location of the first difference.
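In a GnuCOBOL development environment, the same byte-for-byte check can be scripted. A small Python sketch that mirrors IEBCOMPR's RC=0/RC=8 behavior (the file paths are placeholders supplied by the caller):

```python
def compare_outputs(baseline_path, candidate_path):
    """Return (rc, message): rc 0 if the files are byte-identical,
    rc 8 with the first differing offset otherwise."""
    with open(baseline_path, 'rb') as f:
        baseline = f.read()
    with open(candidate_path, 'rb') as f:
        candidate = f.read()
    if baseline == candidate:
        return 0, 'IDENTICAL'
    for offset in range(min(len(baseline), len(candidate))):
        if baseline[offset] != candidate[offset]:
            return 8, f'FIRST DIFFERENCE AT OFFSET {offset}'
    return 8, f'LENGTH MISMATCH: {len(baseline)} VS {len(candidate)}'
```

Like IEBCOMPR's SYSPRINT, the message points at the first difference so the investigation starts at the right record.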
Building the Test Dataset
The test dataset is a critical artifact. It must cover every business rule and edge case in the program. James builds it from three sources:
1. Production sampling. James extracts a representative sample of 10,000 claims from the previous month's production data. This sample includes the full spectrum of claim types, provider types, member statuses, and error conditions that occur in real production.
2. Edge cases. James adds synthetic records that test specific edge cases:
| Test Case | Purpose |
|---|---|
| Service date = effective date | Boundary test for date coverage |
| Service date = termination date | Boundary test for date coverage |
| Service date = termination date + 1 | Should reject (one day after coverage ends) |
| Claim amount = $0.00 | Should reject (zero amount) |
| Claim amount = $99,999.99 | Maximum valid amount |
| Provider status = 'S' (suspended) | Should reject |
| Emergency room + no auth | Should accept (ER exemption) |
| COVID diagnosis code U07.1 | Should trigger COVID override |
| COVID diagnosis code U07.2 | Should trigger COVID override |
| Member ID not in file | Should reject (member not found) |
| Provider ID not in file | Should reject (provider not found) |
| Duplicate claim (same member, date, provider, amount) | Should accept (duplicate check is downstream) |
3. Regression captures. Every production bug that has been fixed in the past 5 years is added as a test case. If bug #4521 was caused by a claim with a hyphenated diagnosis code, there is now a test record with a hyphenated diagnosis code. This ensures that the refactoring does not reintroduce previously fixed bugs.
Sarah Kim reviews the test dataset against the business rule catalog (from Phase 1) and verifies that every cataloged rule has at least one test case that exercises it. Rules ADJ-001 through ADJ-005 all have corresponding test records.
The Refactoring Safety Net
The comparison testing framework creates a safety net for refactoring. Each refactoring step follows this cycle:
- Make a single structural change (e.g., replace inline record definition with copybook)
- Recompile
- Run against test dataset
- Compare outputs
- If identical: commit the change, move to the next refactoring step
- If different: investigate, fix, and repeat step 3
This cycle means that at every point during the refactoring, the team has a working program that produces correct output. If the refactoring is interrupted (by a production emergency, a budget cut, or a team member leaving), the most recent committed version is safe to deploy.
James tracks the refactoring progress in a spreadsheet that lists every planned change, its status (planned/in-progress/tested/committed), and the comparison test result. After eight weeks of refactoring, CLM-INTAKE has undergone 23 structural changes, each individually tested and committed:
| Change # | Description | Test Result |
|---|---|---|
| 1 | Replace inline claim definition with COPY CLMREC | Identical |
| 2 | Replace inline provider definition with COPY PRVREC | Identical |
| 3 | Replace inline member definition with COPY MBRREC | Identical |
| 4 | Rename WS-FS1 through WS-FS5 to meaningful names | Identical |
| 5 | Add 88-level conditions to file status fields | Identical |
| 6 | Replace GO TO 2000-EXIT with cascading IF | Identical |
| 7 | Separate member validation into 2200-VALIDATE-MEMBER | Identical |
| 8 | Separate provider validation into 2300-VALIDATE-PROVIDER | Identical |
| ... | ... | ... |
| 22 | Add audit trail WRITE for rejected claims | Audit records added (approved) |
| 23 | Add return code setting based on rejection count | RC differs (approved) |
Changes 22 and 23 produce different outputs by design — the refactored version generates audit records and return codes that the legacy version did not. Sarah Kim reviewed and approved these differences as intentional improvements.
🔵 Theme: Readability is a Feature. The comparison testing framework ensures that readability improvements (renaming variables, restructuring paragraphs, adding comments) do not accidentally change behavior. This is the foundation of safe refactoring: change the structure without changing the meaning.
Understanding the COBOL Compilation Pipeline
A key concept in MedClaim's modernization is understanding how COBOL programs move from source code to running production code. Many modernization mistakes — particularly the "forgot to rebind" problem with DB2 — stem from not understanding this pipeline.
From Source to Load Module
The compilation pipeline for a standard COBOL program (without DB2) has three stages:
Source Code → [COBOL Compiler (IGYCRCTL)] → Object Module
Object Module → [Linkage Editor (IEWL)] → Load Module
Load Module → Production Load Library → Execution
The COBOL Compiler translates COBOL source into machine-language object code. It resolves COPY statements (inserting copybook contents), checks syntax, and generates diagnostics. The output is an object module — machine code that references external programs by name but does not contain their code.
The Linkage Editor resolves external references. If CLM-INTAKE calls CLM-VALID, the linkage editor finds CLM-VALID's object module and combines them into a single load module (or creates a reference for dynamic calls). The output is a load module — an executable program ready to run.
The Load Library stores load modules. When JCL specifies EXEC PGM=CLM-INTAKE, the operating system finds the load module in the load library and executes it.
The DB2 Compilation Pipeline
When a COBOL program contains EXEC SQL statements, the pipeline adds two stages:
Source Code → [DB2 Precompiler (DSNHPC)] → Modified Source + DBRM
Modified Source → [COBOL Compiler] → Object Module
Object Module → [Linkage Editor] → Load Module
DBRM → [BIND PLAN/PACKAGE (DSN)] → DB2 Plan (in DB2 catalog)
The DB2 Precompiler is the critical addition. It:
1. Extracts every EXEC SQL block from the source code
2. Replaces each block with a CALL to the DB2 interface module (DSNHLI)
3. Creates a DBRM (Database Request Module) containing the extracted SQL
4. Generates host variable declarations
The DBRM is then BOUND to create a DB2 plan or package. The BIND process:
1. Validates the SQL syntax against the DB2 catalog (do the tables and columns exist?)
2. Optimizes the SQL (choosing access paths, index usage, join strategies)
3. Stores the optimized access plan in the DB2 catalog
The load module and the DB2 plan must be in sync. The precompiler embeds a consistency token (timestamp) in both the DBRM and the modified source. At runtime, DB2 compares the token in the load module against the token in the plan. If they do not match (SQLCODE -818), the program ABENDs.
This is why the "forgot to rebind" mistake is so insidious: the program compiles successfully, the linkage editor succeeds, and the load module looks correct. But at runtime, DB2 detects the timestamp mismatch and the program fails. The fix is to always include the BIND step in the compilation JCL:
//*-----------------------------------------------------------
//* COMPILE, LINK, AND BIND - COMPLETE PIPELINE
//*-----------------------------------------------------------
//PRECOMP EXEC PGM=DSNHPC
//DBRMLIB DD DSN=MEDCLAIM.DBRM(CLMINTAK),DISP=SHR
//SYSIN DD DSN=MEDCLAIM.SOURCE(CLMINTAK),DISP=SHR
//SYSLIB DD DSN=MEDCLAIM.COPYLIB,DISP=SHR
//SYSPRINT DD SYSOUT=*
//SYSCIN DD DSN=&&MODIFIED,DISP=(,PASS),UNIT=SYSDA
//*
//COMPILE EXEC PGM=IGYCRCTL,COND=(4,LT)
//SYSLIB DD DSN=MEDCLAIM.COPYLIB,DISP=SHR
//SYSIN DD DSN=&&MODIFIED,DISP=(OLD,DELETE)
//SYSLIN DD DSN=&&OBJECT,DISP=(,PASS),UNIT=SYSDA
//SYSPRINT DD SYSOUT=*
//*
//LINK EXEC PGM=IEWL,COND=(4,LT)
//SYSLIN DD DSN=&&OBJECT,DISP=(OLD,DELETE)
//SYSLMOD DD DSN=MEDCLAIM.LOADLIB(CLMINTAK),DISP=SHR
//SYSPRINT DD SYSOUT=*
//*
//BIND EXEC PGM=IKJEFT01,COND=(4,LT)
//SYSTSIN DD *
DSN SYSTEM(DSN1)
BIND PACKAGE(MEDCLAIM) -
MEMBER(CLMINTAK) -
ACTION(REPLACE) -
VALIDATE(BIND) -
ISOLATION(CS) -
RELEASE(COMMIT)
END
/*
//SYSTSPRT DD SYSOUT=*
James creates this as a cataloged procedure so that developers cannot accidentally skip the BIND step. The procedure takes the program name as a symbolic parameter and executes all four steps automatically.
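The consistency-token mechanism behind SQLCODE -818 can be illustrated with a toy model. This Python sketch is not DB2 internals, just the idea: the precompiler stamps one token into both artifacts, and the runtime refuses to run if they disagree (all names here are illustrative):

```python
import itertools

_token_seq = itertools.count(1)   # stand-in for the precompiler timestamp

def precompile(source):
    """Toy precompiler: stamp the same token into the 'load module' and the 'DBRM'."""
    token = f'TOKEN-{next(_token_seq)}'
    load_module = {'program': source, 'token': token}
    dbrm = {'sql': 'extracted SQL here', 'token': token}
    return load_module, dbrm

def bind(dbrm):
    """Toy BIND: the plan carries the DBRM's token into the catalog."""
    return {'access_path': 'optimized', 'token': dbrm['token']}

def run(load_module, plan):
    """Toy runtime check: mismatched tokens raise the -818 condition."""
    if load_module['token'] != plan['token']:
        return -818   # recompiled but not rebound: timestamp mismatch
    return 0
```

Recompiling without rebinding produces a new token in the load module while the plan still holds the old one, which is exactly the "forgot to rebind" failure.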
The DB2 Migration Decision
Why DB2? Why Now?
Moving from VSAM to DB2 is a significant undertaking. James must justify this decision to MedClaim's steering committee with concrete business benefits, not just technical arguments.
Business Case:
- Ad-hoc queries. MedClaim's business analysts currently request custom reports by filing tickets with the development team. Average turnaround: 5 business days. With DB2, analysts can write SQL queries directly using SPUFI or QMF. Turnaround: immediate.
- Referential integrity. VSAM files have no concept of relationships. A claim can reference a provider ID that does not exist in the provider file. DB2 foreign keys prevent this — the INSERT fails if the referenced provider does not exist.
- Concurrent access. VSAM's share options are coarse (file-level or CI-level). DB2's locking is row-level by default, allowing much more granular concurrent access. This is essential for Phase 4 (APIs).
- Recovery. VSAM recovery requires explicit EXPORT/IMPORT or REPRO. DB2 provides point-in-time recovery using its transaction log. If a batch program corrupts data, DB2 can recover to any point before the corruption.
What NOT to Migrate:
Not everything should move to DB2. James identifies three types of files that should remain as VSAM or sequential:
- Work files: Temporary files used within a single job stream do not benefit from DB2 overhead.
- Archive files: Historical data that is only read sequentially for reporting is more efficiently stored as compressed sequential files.
- High-performance lookup tables: Small, frequently accessed tables (like code-to-description mappings) may perform better as VSAM files with aggressive buffering.
DCLGEN: Keeping COBOL and DB2 in Sync
When Tomás Rivera creates a DB2 table, he runs DCLGEN (Declaration Generator) to produce a COBOL copybook that exactly matches the table definition:
//DCLGEN EXEC PGM=IKJEFT01
//SYSTSPRT DD SYSOUT=*
//SYSTSIN DD *
DSN SYSTEM(DSN1)
DCLGEN TABLE(MEDCLAIM.CLAIM) -
LIBRARY('MEDCLAIM.COPYLIB(DCLCLM)') -
ACTION(REPLACE) -
LANGUAGE(COBOL) -
STRUCTURE(DCL-CLAIM) -
APOST -
LABEL(YES)
END
/*
DCLGEN generates a copybook like this:
* DCLGEN TABLE(MEDCLAIM.CLAIM)
* ... GENERATED BY DCLGEN
01 DCL-CLAIM.
10 CLAIM-ID PIC X(15).
10 MEMBER-ID PIC X(12).
10 PROVIDER-ID PIC X(10).
10 SERVICE-DATE PIC X(10).
10 DIAGNOSIS-CODE PIC X(7).
10 PROCEDURE-CODE PIC X(5).
10 CHARGED-AMT PIC S9(7)V9(2) COMP-3.
10 ALLOWED-AMT PIC S9(7)V9(2) COMP-3.
10 PAID-AMT PIC S9(7)V9(2) COMP-3.
10 CLAIM-STATUS PIC X(2).
10 REASON-CODE PIC X(4).
10 RECEIVED-TS PIC X(26).
10 PROCESSED-TS PIC X(26).
The key benefit: if someone changes the DB2 table definition (ALTER TABLE), rerunning DCLGEN produces an updated copybook. The recompile rule then forces all programs to be recompiled against the new definitions — preventing the copybook-mismatch problem that caused the incident in Case Study 1.
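The mapping DCLGEN performs is mechanical. A simplified Python sketch of the DB2-type-to-COBOL-PICTURE rules for the types used in the CLAIM table (covering only CHAR, SMALLINT, DECIMAL, DATE, and TIMESTAMP; real DCLGEN handles many more types, and VARCHAR actually generates a group item with a length field):

```python
import re

def db2_to_pic(db2_type):
    """Map a DB2 column type to the COBOL PICTURE clause DCLGEN would emit
    (simplified sketch, not the full DCLGEN rule set)."""
    t = db2_type.upper().strip()
    if t == 'DATE':
        return 'PIC X(10)'             # ISO form YYYY-MM-DD
    if t == 'TIMESTAMP':
        return 'PIC X(26)'             # YYYY-MM-DD-hh.mm.ss.nnnnnn
    if t == 'SMALLINT':
        return 'PIC S9(4) COMP'
    m = re.match(r'CHAR\((\d+)\)$', t)
    if m:
        return f'PIC X({m.group(1)})'
    m = re.match(r'DECIMAL\((\d+),(\d+)\)$', t)
    if m:
        precision, scale = int(m.group(1)), int(m.group(2))
        return f'PIC S9({precision - scale})V9({scale}) COMP-3'
    raise ValueError(f'unmapped type: {db2_type}')
```

Note how DECIMAL(9,2) becomes PIC S9(7)V9(2) COMP-3 — exactly the CHARGED-AMT field in the generated copybook above.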
SQL in COBOL: The Precompiler Process
COBOL programs that contain EXEC SQL statements go through an additional compilation step: the DB2 precompiler. The precompiler:
- Extracts all EXEC SQL blocks from the COBOL source
- Replaces them with CALL statements to the DB2 interface module
- Creates a DBRM (Database Request Module) containing the SQL statements
The DBRM is then bound to a DB2 plan, which DB2 optimizes.
The compilation sequence for a DB2/COBOL program:
COBOL Source → [DB2 Precompiler] → Modified COBOL Source + DBRM
Modified COBOL Source → [COBOL Compiler] → Object Module
Object Module → [Linkage Editor] → Load Module
DBRM → [BIND PLAN] → DB2 Plan (stored in DB2 catalog)
Understanding this process helps debugging: if a program ABENDs with an SQL error, the DBRM and plan must be in sync with the COBOL source. If someone recompiles the COBOL without rebinding the plan, the program may produce unpredictable results.
🔴 A Common Mistake: Forgetting to Rebind. James has seen this happen multiple times: a developer changes an EXEC SQL statement, recompiles the COBOL, but forgets to rebind the DB2 plan. The program runs with the OLD SQL (from the previous BIND) against the NEW COBOL logic. Sometimes it works by coincidence; sometimes it produces wrong results; sometimes it ABENDs with SQLCODE -805 (DBRM or package not found in the plan) or -818 (timestamp mismatch). The fix is simple: always rebind after recompiling. Better yet: automate the compile-bind sequence so developers cannot forget.
Phase 3: Adding DB2
Migrating from Flat Files to DB2
The next phase moves lookup data from VSAM files to DB2 tables. This provides several benefits:
- SQL access: Ad-hoc queries become possible without writing COBOL programs
- Referential integrity: DB2 enforces relationships between tables
- Concurrent access: DB2's locking model is more sophisticated than VSAM share options
- Recovery: DB2's logging and recovery are more robust than VSAM backup/restore
Tomás Rivera designs the DB2 schema:
-- Provider table
CREATE TABLE MEDCLAIM.PROVIDER (
PROVIDER_ID CHAR(10) NOT NULL,
PROVIDER_NAME VARCHAR(30) NOT NULL,
PROVIDER_TYPE SMALLINT NOT NULL,
PROVIDER_STAT CHAR(1) NOT NULL
DEFAULT 'A',
EFF_DATE DATE NOT NULL,
TERM_DATE DATE,
TAX_ID CHAR(9),
NPI_NUMBER CHAR(10),
LAST_UPDATE TIMESTAMP NOT NULL
DEFAULT CURRENT TIMESTAMP,
PRIMARY KEY (PROVIDER_ID)
);
-- Member table
CREATE TABLE MEDCLAIM.MEMBER (
MEMBER_ID CHAR(12) NOT NULL,
MEMBER_NAME VARCHAR(30) NOT NULL,
PLAN_CODE CHAR(4) NOT NULL,
EFF_DATE DATE NOT NULL,
TERM_DATE DATE,
DATE_OF_BIRTH DATE,
GENDER CHAR(1),
MEMBER_STATUS CHAR(1) NOT NULL
DEFAULT 'A',
LAST_UPDATE TIMESTAMP NOT NULL
DEFAULT CURRENT TIMESTAMP,
PRIMARY KEY (MEMBER_ID)
);
-- Claim table
CREATE TABLE MEDCLAIM.CLAIM (
CLAIM_ID CHAR(15) NOT NULL,
MEMBER_ID CHAR(12) NOT NULL,
PROVIDER_ID CHAR(10) NOT NULL,
SERVICE_DATE DATE NOT NULL,
DIAGNOSIS_CODE CHAR(7) NOT NULL,
PROCEDURE_CODE CHAR(5) NOT NULL,
CHARGED_AMT DECIMAL(9,2) NOT NULL,
ALLOWED_AMT DECIMAL(9,2),
PAID_AMT DECIMAL(9,2),
CLAIM_STATUS CHAR(2) NOT NULL
DEFAULT 'RC',
REASON_CODE CHAR(4),
RECEIVED_TS TIMESTAMP NOT NULL
DEFAULT CURRENT TIMESTAMP,
PROCESSED_TS TIMESTAMP,
PRIMARY KEY (CLAIM_ID),
FOREIGN KEY (MEMBER_ID)
REFERENCES MEDCLAIM.MEMBER (MEMBER_ID),
FOREIGN KEY (PROVIDER_ID)
REFERENCES MEDCLAIM.PROVIDER (PROVIDER_ID)
);
-- Index for common queries
CREATE INDEX MEDCLAIM.CLM_MEMBER_IX
ON MEDCLAIM.CLAIM (MEMBER_ID, SERVICE_DATE);
CREATE INDEX MEDCLAIM.CLM_PROVIDER_IX
ON MEDCLAIM.CLAIM (PROVIDER_ID, SERVICE_DATE);
CREATE INDEX MEDCLAIM.CLM_STATUS_IX
ON MEDCLAIM.CLAIM (CLAIM_STATUS);
Modifying COBOL Programs for DB2
The member validation in CLM-INTAKE changes from VSAM READ to SQL SELECT:
* BEFORE: VSAM READ
MOVE CLM-MEMBER-ID TO MBR-MEMBER-ID.
READ MEMBER-FILE.
IF WS-MEMB-STATUS NOT = '00' ...
* AFTER: DB2 SQL
EXEC SQL
SELECT MEMBER_NAME,
PLAN_CODE,
EFF_DATE,
TERM_DATE
INTO :WS-MEMBER-NAME,
:WS-PLAN-CODE,
:WS-EFF-DATE,
:WS-TERM-DATE
FROM MEDCLAIM.MEMBER
WHERE MEMBER_ID = :CLM-MEMBER-ID
END-EXEC
EVALUATE SQLCODE
WHEN 0
PERFORM 2210-CHECK-COVERAGE-DATES
WHEN +100
SET WS-CLAIM-IS-INVALID TO TRUE
MOVE 'MNFD' TO WS-REJECT-CODE
MOVE 'MEMBER NOT FOUND'
TO WS-REJECT-MESSAGE
WHEN OTHER
PERFORM 9000-DB2-ERROR
END-EVALUATE
Key differences between VSAM and DB2 access:
- SQLCODE replaces file status. SQLCODE 0 means success, +100 means not found (equivalent to VSAM status '23'), and negative values indicate errors.
- Host variables. The :CLM-MEMBER-ID and :WS-MEMBER-NAME syntax identifies COBOL variables used in SQL statements. The colon prefix is required in SQL; it is not present in regular COBOL.
- No explicit OPEN/CLOSE. DB2 tables do not need to be opened or closed in the program. Connection management is handled by the DB2 subsystem.
- DCLGEN copybooks. DB2 provides a utility (DCLGEN) that generates COBOL copybooks matching DB2 table definitions. This ensures that COBOL field definitions always match the DB2 schema.
📊 The Migration Strategy. Tomás migrates one file at a time, starting with the least-used lookup files and working toward the most-used. Each migration follows the same pattern: (1) create the DB2 table, (2) load data from the VSAM file, (3) modify programs to use SQL instead of VSAM, (4) run parallel tests comparing SQL and VSAM results, (5) remove the VSAM file from production. This incremental approach means the system is never fully in VSAM or fully in DB2 — it operates in a mixed mode during the transition.
The Data Migration Process
Loading 18 years of VSAM data into DB2 tables is not a simple REPRO. The data must be validated, transformed, and loaded in a way that preserves referential integrity.
Tomás writes a COBOL data migration program for each table. The program reads the VSAM file, validates each record, transforms field formats as needed, and inserts into DB2:
IDENTIFICATION DIVISION.
PROGRAM-ID. MBR-MIGRATE.
*================================================================*
* Program: MBR-MIGRATE *
* Purpose: Migrate member data from VSAM to DB2 *
* Author: Tomás Rivera *
* Date: 2024-07-15 *
*================================================================*
DATA DIVISION.
WORKING-STORAGE SECTION.
COPY MBRREC.
01 WS-COUNTERS.
05 WS-READ PIC 9(07) VALUE ZERO.
05 WS-INSERTED PIC 9(07) VALUE ZERO.
05 WS-SKIPPED PIC 9(07) VALUE ZERO.
05 WS-ERRORS PIC 9(07) VALUE ZERO.
05 WS-COMMIT-COUNT PIC 9(07) VALUE ZERO.
01 WS-EOF-FLAG PIC X(01) VALUE 'N'.
88 WS-END-OF-INPUT VALUE 'Y'.
01 WS-DB2-FIELDS.
05 WS-DB-MEMBER-ID PIC X(12).
05 WS-DB-MEMBER-NAME PIC X(30).
05 WS-DB-EFF-DATE PIC X(10).
05 WS-DB-TERM-DATE PIC X(10).
05 WS-DB-PLAN-CODE PIC X(04).
05 WS-DB-STATUS PIC X(01).
EXEC SQL INCLUDE SQLCA END-EXEC.
PROCEDURE DIVISION.
0000-MAIN.
PERFORM 1000-INITIALIZE
PERFORM 2000-PROCESS-RECORDS
THRU 2000-PROCESS-RECORDS-EXIT
UNTIL WS-END-OF-INPUT
PERFORM 3000-FINALIZE
STOP RUN
.
2000-PROCESS-RECORDS.
ADD 1 TO WS-READ
* Transform VSAM date format (YYYYMMDD) to DB2 date
* format (YYYY-MM-DD) for both date fields
STRING MBR-EFF-DATE(1:4) DELIMITED BY SIZE
'-' DELIMITED BY SIZE
MBR-EFF-DATE(5:2) DELIMITED BY SIZE
'-' DELIMITED BY SIZE
MBR-EFF-DATE(7:2) DELIMITED BY SIZE
INTO WS-DB-EFF-DATE
END-STRING
STRING MBR-TERM-DATE(1:4) DELIMITED BY SIZE
'-' DELIMITED BY SIZE
MBR-TERM-DATE(5:2) DELIMITED BY SIZE
'-' DELIMITED BY SIZE
MBR-TERM-DATE(7:2) DELIMITED BY SIZE
INTO WS-DB-TERM-DATE
END-STRING
* Skip obviously invalid records
IF MBR-MEMBER-ID = SPACES OR LOW-VALUES
ADD 1 TO WS-SKIPPED
PERFORM 2100-READ-NEXT
GO TO 2000-PROCESS-RECORDS-EXIT
END-IF
EXEC SQL
INSERT INTO MEDCLAIM.MEMBER
(MEMBER_ID, MEMBER_NAME, EFF_DATE,
TERM_DATE, PLAN_CODE, MEMBER_STATUS)
VALUES
(:MBR-MEMBER-ID, :MBR-MEMBER-NAME,
:WS-DB-EFF-DATE, :WS-DB-TERM-DATE,
:MBR-PLAN-CODE, :MBR-MEMBER-STATUS)
END-EXEC
EVALUATE SQLCODE
WHEN 0
ADD 1 TO WS-INSERTED
ADD 1 TO WS-COMMIT-COUNT
IF WS-COMMIT-COUNT >= 1000
EXEC SQL COMMIT END-EXEC
MOVE ZERO TO WS-COMMIT-COUNT
END-IF
WHEN -803
* Duplicate key - skip (already migrated)
ADD 1 TO WS-SKIPPED
WHEN OTHER
ADD 1 TO WS-ERRORS
DISPLAY 'DB2 ERROR FOR: ' MBR-MEMBER-ID
' SQLCODE: ' SQLCODE
END-EVALUATE
PERFORM 2100-READ-NEXT
.
2000-PROCESS-RECORDS-EXIT.
EXIT
.
Key migration design decisions:
- Commit every 1,000 rows. Without periodic commits, a migration of 500,000 member records would create a single massive DB2 transaction. If the migration fails at row 499,999, all 499,998 previous inserts are rolled back. Committing every 1,000 rows limits the maximum rollback to 1,000 rows.
- Handle duplicates gracefully. SQLCODE -803 means the primary key already exists. This allows the migration to be re-run safely — if it fails partway through and is restarted, it will skip already-migrated records instead of failing on duplicates.
- Date format transformation. VSAM stores dates as PIC 9(08) in YYYYMMDD format. DB2 stores dates as DATE type in YYYY-MM-DD format. The STRING statement handles the transformation.
- Skip invalid records. Rather than ABENDing on bad data (which would stop the entire migration), the program logs invalid records and continues. The skipped records are investigated manually.
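These decisions can be modeled compactly. A hedged Python sketch of the migration loop, using in-memory stand-ins for the VSAM file and the DB2 table (the record layout and counter names are illustrative; the date transform mirrors the STRING logic above):

```python
def to_db2_date(yyyymmdd):
    """YYYYMMDD -> YYYY-MM-DD, mirroring the STRING transformation."""
    return f'{yyyymmdd[0:4]}-{yyyymmdd[4:6]}-{yyyymmdd[6:8]}'

def migrate(records, table, commit_every=1000):
    """Insert records into `table` (a dict standing in for DB2),
    skipping blanks and duplicates, committing in batches."""
    counters = {'read': 0, 'inserted': 0, 'skipped': 0, 'commits': 0}
    pending = 0
    for rec in records:
        counters['read'] += 1
        if not rec['member_id'].strip():       # skip obviously invalid records
            counters['skipped'] += 1
            continue
        if rec['member_id'] in table:          # SQLCODE -803: already migrated
            counters['skipped'] += 1
            continue
        table[rec['member_id']] = {'eff_date': to_db2_date(rec['eff_date'])}
        counters['inserted'] += 1
        pending += 1
        if pending >= commit_every:            # periodic COMMIT limits rollback
            counters['commits'] += 1
            pending = 0
    return counters
```

Because duplicates are skipped rather than treated as errors, running `migrate` twice over the same input leaves the table unchanged — the restart-safety property the -803 handling buys.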
DB2 Error Handling Patterns
After migrating to DB2, COBOL programs need robust SQL error handling. James establishes a standard error handling paragraph that all DB2-enabled programs must use:
9000-DB2-ERROR.
* Standard DB2 error handler
* Log error details for debugging
DISPLAY '*** DB2 ERROR ***'
DISPLAY 'SQLCODE: ' SQLCODE
DISPLAY 'SQLERRMC: ' SQLERRMC
DISPLAY 'PROGRAM: CLM-INTAKE'
DISPLAY 'LOCATION: ' WS-CURRENT-PARAGRAPH
* Determine severity
EVALUATE TRUE
WHEN SQLCODE = -911 OR SQLCODE = -913
* Deadlock or timeout - may be recoverable
DISPLAY 'DEADLOCK/TIMEOUT - WILL RETRY'
EXEC SQL ROLLBACK END-EXEC
WHEN SQLCODE = -904
* Resource unavailable - system issue
DISPLAY 'RESOURCE UNAVAILABLE'
MOVE 16 TO RETURN-CODE
WHEN SQLCODE = -551
* Authorization failure
DISPLAY 'AUTHORIZATION FAILURE'
MOVE 16 TO RETURN-CODE
WHEN OTHER
* Unknown error - log and continue or abort
DISPLAY 'UNHANDLED DB2 ERROR'
MOVE 16 TO RETURN-CODE
END-EVALUATE
.
The error handler distinguishes between transient errors (deadlocks, timeouts) that may succeed on retry, and permanent errors (authorization failures, resource unavailable) that require operator intervention. This pattern is reused across all 34 programs that access DB2 after the migration.
🔵 SQLCODE Cheat Sheet for COBOL Developers. The most common SQLCODEs that COBOL developers encounter:
| SQLCODE | Meaning | Action |
|---|---|---|
| 0 | Success | Continue processing |
| +100 | No rows found (SELECT) or no more rows (FETCH) | Handle as "not found" |
| -803 | Duplicate key on INSERT | Record already exists |
| -811 | SELECT returned more than one row | Add more WHERE conditions |
| -818 | Timestamp mismatch (plan vs. program) | Rebind the plan |
| -904 | Resource unavailable | Wait and retry, or notify operator |
| -911 | Deadlock detected, transaction rolled back | Retry the transaction |
| -913 | Unsuccessful execution, deadlock timeout | Retry the transaction |
This cheat sheet is posted in the team's work area and included in every new developer's onboarding packet.
Phase 4: Exposing as API
From Batch to Real-Time: CICS Web Services
The most transformative modernization phase exposes MedClaim's data through APIs. Partner organizations — hospitals, clinics, and other insurers — currently receive claim status information through nightly batch extracts. They want real-time access.
James's team builds a CICS web service that accepts JSON requests and returns JSON responses. The CICS program reads from DB2 (migrated in Phase 3) and formats the response using the JSON GENERATE statement available in Enterprise COBOL v6+.
IDENTIFICATION DIVISION.
PROGRAM-ID. CLM-API.
*================================================================*
* Program: CLM-API *
* Purpose: CICS web service for claim status inquiry *
* Channel: CLMCHNL *
* Author: James Okafor / Priya Kapoor *
* Date: 2024-06-15 *
*================================================================*
DATA DIVISION.
WORKING-STORAGE SECTION.
01 WS-REQUEST-DATA.
05 WS-REQ-CLAIM-ID PIC X(15).
05 WS-REQ-MEMBER-ID PIC X(12).
05 WS-REQ-FUNCTION PIC X(10).
01 WS-RESPONSE-DATA.
05 WS-RSP-STATUS PIC X(07).
05 WS-RSP-MESSAGE PIC X(100).
05 WS-RSP-CLAIM.
10 WS-RSP-CLAIM-ID PIC X(15).
10 WS-RSP-MEMBER-ID PIC X(12).
10 WS-RSP-MEMBER-NAME PIC X(30).
10 WS-RSP-PROVIDER-ID PIC X(10).
10 WS-RSP-SVC-DATE PIC X(10).
10 WS-RSP-DIAG-CODE PIC X(07).
10 WS-RSP-CLM-STATUS PIC X(02).
* Numeric-edited so packed DB2 amounts can be moved directly
10 WS-RSP-CHARGED-AMT PIC -9(7).99.
10 WS-RSP-PAID-AMT PIC -9(7).99.
01 WS-JSON-BUFFER PIC X(2000).
01 WS-JSON-LENGTH PIC 9(04) COMP.
01 WS-CONTAINER-NAME PIC X(16).
01 WS-CHANNEL-NAME PIC X(16) VALUE 'CLMCHNL'.
01 WS-RESP-CODE PIC S9(08) COMP.
01 WS-DB2-FIELDS.
05 WS-DB-CLAIM-ID PIC X(15).
05 WS-DB-MEMBER-ID PIC X(12).
05 WS-DB-MEMBER-NAME PIC X(30).
05 WS-DB-PROVIDER-ID PIC X(10).
05 WS-DB-SVC-DATE PIC X(10).
05 WS-DB-DIAG-CODE PIC X(07).
05 WS-DB-STATUS PIC X(02).
05 WS-DB-CHARGED PIC S9(7)V99 COMP-3.
05 WS-DB-PAID PIC S9(7)V99 COMP-3.
EXEC SQL INCLUDE SQLCA END-EXEC.
PROCEDURE DIVISION.
0000-MAIN.
PERFORM 1000-RECEIVE-REQUEST
PERFORM 2000-PROCESS-REQUEST
PERFORM 3000-SEND-RESPONSE
EXEC CICS RETURN END-EXEC
.
1000-RECEIVE-REQUEST.
MOVE 'CLMREQUEST' TO WS-CONTAINER-NAME
EXEC CICS GET CONTAINER(WS-CONTAINER-NAME)
CHANNEL(WS-CHANNEL-NAME)
INTO(WS-JSON-BUFFER)
FLENGTH(WS-JSON-LENGTH)
RESP(WS-RESP-CODE)
END-EXEC
IF WS-RESP-CODE NOT = DFHRESP(NORMAL)
MOVE 'ERROR' TO WS-RSP-STATUS
MOVE 'FAILED TO RECEIVE REQUEST CONTAINER'
TO WS-RSP-MESSAGE
PERFORM 3000-SEND-RESPONSE
EXEC CICS RETURN END-EXEC
END-IF
JSON PARSE WS-JSON-BUFFER
INTO WS-REQUEST-DATA
END-JSON
.
2000-PROCESS-REQUEST.
EVALUATE WS-REQ-FUNCTION
WHEN 'STATUS'
PERFORM 2100-CLAIM-STATUS
WHEN 'HISTORY'
PERFORM 2200-CLAIM-HISTORY
WHEN OTHER
MOVE 'ERROR' TO WS-RSP-STATUS
MOVE 'UNKNOWN FUNCTION' TO WS-RSP-MESSAGE
END-EVALUATE
.
2100-CLAIM-STATUS.
EXEC SQL
SELECT C.CLAIM_ID,
C.MEMBER_ID,
M.MEMBER_NAME,
C.PROVIDER_ID,
CHAR(C.SERVICE_DATE, ISO),
C.DIAGNOSIS_CODE,
C.CLAIM_STATUS,
C.CHARGED_AMT,
C.PAID_AMT
INTO :WS-DB-CLAIM-ID,
:WS-DB-MEMBER-ID,
:WS-DB-MEMBER-NAME,
:WS-DB-PROVIDER-ID,
:WS-DB-SVC-DATE,
:WS-DB-DIAG-CODE,
:WS-DB-STATUS,
:WS-DB-CHARGED,
:WS-DB-PAID
FROM MEDCLAIM.CLAIM C
JOIN MEDCLAIM.MEMBER M
ON C.MEMBER_ID = M.MEMBER_ID
WHERE C.CLAIM_ID = :WS-REQ-CLAIM-ID
END-EXEC
EVALUATE SQLCODE
WHEN 0
MOVE 'SUCCESS' TO WS-RSP-STATUS
MOVE 'CLAIM FOUND' TO WS-RSP-MESSAGE
MOVE WS-DB-CLAIM-ID TO WS-RSP-CLAIM-ID
MOVE WS-DB-MEMBER-ID TO WS-RSP-MEMBER-ID
MOVE WS-DB-MEMBER-NAME TO WS-RSP-MEMBER-NAME
MOVE WS-DB-PROVIDER-ID TO WS-RSP-PROVIDER-ID
MOVE WS-DB-SVC-DATE TO WS-RSP-SVC-DATE
MOVE WS-DB-DIAG-CODE TO WS-RSP-DIAG-CODE
MOVE WS-DB-STATUS TO WS-RSP-CLM-STATUS
MOVE WS-DB-CHARGED TO WS-RSP-CHARGED-AMT
MOVE WS-DB-PAID TO WS-RSP-PAID-AMT
WHEN +100
MOVE 'NOTFND' TO WS-RSP-STATUS
MOVE 'CLAIM NOT FOUND' TO WS-RSP-MESSAGE
WHEN OTHER
MOVE 'ERROR' TO WS-RSP-STATUS
MOVE 'INTERNAL PROCESSING ERROR'
TO WS-RSP-MESSAGE
* Log the SQLCODE for operations; it is
* never returned to the caller
DISPLAY 'CLM-API DB2 ERROR SQLCODE: ' SQLCODE
END-EVALUATE
.
2200-CLAIM-HISTORY.
* History retrieval using cursor (simplified)
MOVE 'SUCCESS' TO WS-RSP-STATUS
MOVE 'HISTORY FUNCTION NOT YET IMPLEMENTED'
TO WS-RSP-MESSAGE
.
3000-SEND-RESPONSE.
JSON GENERATE WS-JSON-BUFFER
FROM WS-RESPONSE-DATA
COUNT WS-JSON-LENGTH
END-JSON
MOVE 'CLMRESPONSE' TO WS-CONTAINER-NAME
EXEC CICS PUT CONTAINER(WS-CONTAINER-NAME)
CHANNEL(WS-CHANNEL-NAME)
FROM(WS-JSON-BUFFER)
FLENGTH(WS-JSON-LENGTH)
RESP(WS-RESP-CODE)
END-EXEC
.
The Architecture of a COBOL Web Service:
The program uses CICS channels and containers — a modern alternative to the traditional COMMAREA. A container holds the JSON request; the program parses it, queries DB2, builds a JSON response, and places it in another container. The CICS web service infrastructure handles HTTP protocol, content negotiation, and routing.
From the outside, this is a REST API. A partner system sends an HTTP GET request and receives a JSON response. From the inside, it is a COBOL program reading DB2 — exactly what MedClaim has been doing for 18 years, just with a different interface.
🔵 The JSON Bridge. The JSON GENERATE and JSON PARSE statements (Enterprise COBOL v6.1+) are the bridge between COBOL's fixed-format data and the modern web's preference for JSON. They work by mapping COBOL data items to JSON fields based on the item names. WS-RSP-CLAIM-ID in COBOL becomes "WS-RSP-CLAIM-ID" in JSON. You can customize the mapping with the NAME phrase to produce cleaner JSON field names.
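To see why the default name mapping matters, here is a small Python sketch of the two serializations. The dictionary stands in for the COBOL group item, and the field values are illustrative:

```python
import json

# Sketch: JSON GENERATE's default behavior maps COBOL data names to
# JSON field names verbatim, so responses contain keys like
# "WS-RSP-CLAIM-ID" unless a NAME phrase overrides them.
cobol_record = {
    "WS-RSP-STATUS": "SUCCESS",
    "WS-RSP-CLAIM-ID": "CLM000098765",
}
default_json = json.dumps(cobol_record)

# With NAME phrases (conceptually, NAME OF WS-RSP-CLAIM-ID IS 'claimId'),
# the same record serializes with API-friendly keys:
name_overrides = {"WS-RSP-STATUS": "status", "WS-RSP-CLAIM-ID": "claimId"}
clean_json = json.dumps(
    {name_overrides[k]: v for k, v in cobol_record.items()}
)
```

Partner-facing APIs almost always warrant the cleaner names; internal tooling can often live with the defaults.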
Phase 5: CI/CD Pipeline and Testing
Building Automated Tests
The final phase establishes automated testing and continuous integration. Without automated tests, every change to the system requires manual verification — the 2-3 day test cycle that has been slowing MedClaim down.
James creates three tiers of tests:
Unit Tests: Individual program tests that verify specific validation rules, calculations, and edge cases. These use synthetic test data and check specific outputs.
//*================================================================*
//* UNIT TEST: CLM-INTAKE MEMBER VALIDATION *
//* Tests: Valid member, invalid member, expired coverage *
//*================================================================*
//UTEST01 EXEC PGM=CLMINTAK
//STEPLIB DD DSN=MEDCLAIM.TEST.LOADLIB,DISP=SHR
//CLMIN DD DSN=MEDCLAIM.TEST.CLAIMS.MEMBER,DISP=SHR
//CLMOUT DD DSN=&&CLMOUT,DISP=(NEW,PASS),
// SPACE=(TRK,(1,1)),
// DCB=(RECFM=FB,LRECL=200)
//ERROUT DD DSN=&&ERROUT,DISP=(NEW,PASS),
// SPACE=(TRK,(1,1)),
// DCB=(RECFM=FB,LRECL=132)
//PROVFL DD DSN=MEDCLAIM.TEST.PROVIDER,DISP=SHR
//MEMBFL DD DSN=MEDCLAIM.TEST.MEMBER,DISP=SHR
//AUDOUT DD DSN=&&AUDOUT,DISP=(NEW,PASS),
// SPACE=(TRK,(1,1))
//SYSOUT DD SYSOUT=*
//*
//*--- VERIFY EXPECTED OUTPUT ---
//*
//VERIFY EXEC PGM=IEBCOMPR,COND=(8,LT,UTEST01)
//SYSPRINT DD SYSOUT=*
//SYSUT1 DD DSN=&&CLMOUT,DISP=(OLD,DELETE)
//SYSUT2 DD DSN=MEDCLAIM.TEST.EXPECTED.MEMBER,DISP=SHR
//SYSIN DD *
COMPARE TYPORG=PS
/*
Integration Tests: End-to-end tests that run the complete job stream with test data and verify final outputs. These catch interface problems between programs.
Regression Tests: The complete test suite, run automatically after every code change. James configures the mainframe's automation tool to trigger regression tests whenever a program is compiled into the test load library.
The CI/CD Pipeline
James establishes a deployment pipeline using IBM's z/OS tools:
Developer makes change
│
▼
Compile to TEST loadlib
│
▼
Run unit tests (automated)
│
▼
Run integration tests (automated)
│
▼
Code review (manual - James or Sarah)
│
▼
Promote to QA loadlib
│
▼
Run regression suite against QA (automated)
│
▼
User acceptance testing (Sarah Kim, manual)
│
▼
Promote to PRODUCTION loadlib
│
▼
Post-deployment verification (automated smoke test)
This is not a sophisticated CI/CD pipeline by modern standards. There is no container orchestration, no blue-green deployment, no canary releases. But it is a massive improvement over MedClaim's previous process, which was: compile, test manually, promote to production, and hope.
⚠️ Pragmatic Modernization. A common mistake in modernization is trying to adopt every modern practice at once. James resists the pressure to implement Kubernetes, microservices, and DevOps-as-code. Instead, he implements the practices that provide the most value for MedClaim's specific situation: automated testing (reduces the 2-3 day test cycle to 2 hours), code review (catches bugs before production), and a promotion pipeline (prevents accidental deployment of untested code). The goal is not to be trendy — it is to be effective.
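At its core, the promotion pipeline is an ordered list of gates where the first failure stops promotion. A minimal sketch of that control flow, with stage names from the diagram and hypothetical callables standing in for the real compile, test, and promote jobs:

```python
# Sketch: the promotion pipeline as an ordered list of gates.
# Each stage is a (name, func) pair; func returns True (pass) or
# False (fail). The first failure stops promotion, mirroring the
# diagram above.

def run_pipeline(stages):
    """Run stages in order; return the names of stages that passed."""
    completed = []
    for name, gate in stages:
        if not gate():
            print(f"PIPELINE STOPPED AT: {name}")
            break
        completed.append(name)
    return completed
```

For example, a failing integration test run prevents the promote-to-QA stage from ever executing, which is exactly the property that made MedClaim's old compile-and-hope process so risky.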
Test Data Management
One of the underappreciated challenges of automated testing is test data management. Each test run needs consistent, predictable data. If test data changes between runs, test results become unreliable.
James establishes a test data strategy with three tiers:
Tier 1: Static Reference Data. Provider and member files that do not change between test runs. These are loaded once into test VSAM files (and later into test DB2 tables) and refreshed only when the data model changes.
Tier 2: Test Input Data. Claim records that exercise specific validation rules and edge cases. Each test case has a documented purpose, and the complete set is version-controlled alongside the source code.
Tier 3: Expected Output Data. The "golden" output files that represent correct behavior. When a test runs, its output is compared against the golden file. If the program behavior changes intentionally (e.g., a new audit field is added), the golden file is updated and the change is documented.
The test data is stored in a dedicated set of PDS members:
MEDCLAIM.TEST.INPUT(CLMMBR01) - Member validation tests
MEDCLAIM.TEST.INPUT(CLMPROV01) - Provider validation tests
MEDCLAIM.TEST.INPUT(CLMAUTH01) - Authorization tests
MEDCLAIM.TEST.INPUT(CLMEDGE01) - Edge cases
MEDCLAIM.TEST.INPUT(CLMREGR01) - Regression (from past bugs)
MEDCLAIM.TEST.EXPECT(CLMMBR01) - Expected output for member tests
MEDCLAIM.TEST.EXPECT(CLMPROV01) - Expected output for provider tests
...
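The Tier 3 golden-file check is conceptually simple: the output must match the baseline byte for byte, and any difference is a test failure until a human decides whether the change was intentional. A sketch (the function name is illustrative):

```python
# Sketch: golden-file comparison for regression tests. A test passes
# only if program output matches the stored "golden" baseline exactly;
# an intentional behavior change means regenerating the golden file.

def golden_check(actual: bytes, golden: bytes) -> str:
    if actual == golden:
        return "PASS"
    # Report the first differing offset to speed up investigation
    for i, (a, g) in enumerate(zip(actual, golden)):
        if a != g:
            return f"FAIL at byte {i}"
    return f"FAIL: length {len(actual)} vs {len(golden)}"
```

Reporting the first differing offset, rather than just pass/fail, is what turns a red test into a five-minute investigation instead of an hour of manual file comparison.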
Measuring Modernization ROI
After completing all five phases, James prepares a modernization ROI report for the steering committee. The results quantify what modernization delivered:
Developer Productivity:
| Metric | Before | After | Improvement |
|---|---|---|---|
| Time to understand a program | 3 days | 4 hours | 6x faster |
| Time to make a typical change | 2 days | 3 hours | 5x faster |
| Time to test a change | 2-3 days | 2 hours | 12x faster |
| Time to deploy to production | 1 day | 2 hours | 4x faster |
| New developer onboarding | 6 months | 6 weeks | 4x faster |
System Quality:
| Metric | Before | After | Improvement |
|---|---|---|---|
| Production incidents per month | 4.2 | 0.8 | 81% reduction |
| Mean time to resolve incident | 6 hours | 1.5 hours | 75% reduction |
| Deployments per month | 1-2 | 8-12 | 6x more frequent |
| Test coverage (business rules) | ~30% | ~90% | 3x coverage |
| Code duplication rate | ~35% | ~8% | 77% reduction |
Business Capability:
| Capability | Before | After |
|---|---|---|
| Real-time claim status | Not available | Available via API |
| Ad-hoc data queries | 5-day request cycle | Self-service SQL |
| New partner integration | 3-6 months | 2-4 weeks |
| Regulatory reporting changes | 4-6 weeks | 1-2 weeks |
The total investment was 12 months of three-person effort — approximately 3 person-years. The annual savings in incident resolution, deployment efficiency, and developer productivity more than pay for the investment within the first year.
"The numbers tell one story," James says in his final presentation. "But the real ROI is harder to measure: we can now hire new developers and have them contributing within weeks instead of months. We can accept new partner integrations without a six-month project. And I can take a vacation without worrying that the system will fail while I'm gone."
🔴 Theme: The Human Factor. The modernization ROI is ultimately about people. Faster onboarding means the organization is not dependent on one or two developers. More frequent deployments mean changes reach users sooner. Lower incident rates mean the operations team spends less time firefighting and more time improving. The technology changes enable these human outcomes, but the human outcomes are what justify the investment.
The Web Service in Detail
CICS Pipeline Configuration
The CLM-API program does not exist in isolation. It is deployed within a CICS web service pipeline — a configuration that handles HTTP protocol, content type negotiation, and routing. The pipeline definition (typically in XML) specifies:
- Provider pipeline: The inbound pipeline that receives HTTP requests, converts them to CICS containers, and invokes the target program.
- URIMAP: A mapping from URL path to CICS program. For example, /medclaim/api/v1/claims/{claimId} maps to transaction CLMQ, which invokes CLM-API.
- JSON binding: The mapping between JSON field names and COBOL data item names. By default, COBOL item names become JSON field names. The NAME phrase on JSON GENERATE/PARSE allows customization.
The CICS URIMAP definition:
DEFINE URIMAP(CLMAPI01)
GROUP(MEDCLAIM)
USAGE(PIPELINE)
SCHEME(HTTPS)
HOST(medclaim.example.com)
PATH(/api/v1/claims/*)
TRANSACTION(CLMQ)
PIPELINE(CLMPIPE)
TCPIPSERVICE(HTTPSPT)
STATUS(ENABLED)
When an HTTP GET request arrives at https://medclaim.example.com/api/v1/claims/CLM000098765, CICS routes it through the pipeline, places the request body in a container, and starts transaction CLMQ, which invokes CLM-API.
Security Considerations for APIs
Exposing internal data through APIs creates security requirements that batch systems never faced:
Authentication: Every API call must identify the caller. Options include HTTP Basic Auth (username/password), OAuth 2.0 tokens, or client certificates. CICS integrates with RACF for authentication.
Authorization: Not every authenticated user should access every claim. James implements role-based access: hospitals can only see claims from their own members; MedClaim staff can see all claims.
Rate Limiting: A misbehaving client sending thousands of requests per second could overwhelm the CICS region. The API gateway (z/OS Connect or CICS web interface) should enforce request rate limits.
Encryption: All API traffic uses HTTPS (TLS). The TCPIPSERVICE definition specifies HTTPS and the SSL certificate.
Input Validation: The CLM-API program validates every input field before using it. Because the program uses static SQL with host variables, classic SQL injection is already blocked at the SQL level, but validating input provides defense in depth and lets the API return clear error messages instead of database errors. Batch programs never faced this concern because their input came from trusted internal files.
* Validate claim ID format before using in SQL
IF WS-REQ-CLAIM-ID = SPACES
OR WS-REQ-CLAIM-ID = LOW-VALUES
MOVE 'ERROR' TO WS-RSP-STATUS
MOVE 'CLAIM ID IS REQUIRED' TO WS-RSP-MESSAGE
PERFORM 3000-SEND-RESPONSE
EXEC CICS RETURN END-EXEC
END-IF
* Check for embedded SQL injection characters
* (INSPECT adds to the tally, so reset it first)
MOVE ZERO TO WS-TALLY-COUNT
INSPECT WS-REQ-CLAIM-ID TALLYING WS-TALLY-COUNT
FOR ALL "'" ALL ";" ALL "--"
IF WS-TALLY-COUNT > 0
MOVE 'ERROR' TO WS-RSP-STATUS
MOVE 'INVALID CHARACTERS IN CLAIM ID'
TO WS-RSP-MESSAGE
PERFORM 3000-SEND-RESPONSE
EXEC CICS RETURN END-EXEC
END-IF
📊 API Usage Patterns. Within the first month after deployment, the CLM-API handles approximately 8,000 requests per day. The breakdown: 60% claim status inquiries from partner hospitals, 25% member claim history from call center agents, 15% automated monitoring checks from partner systems. Peak usage is 11 AM-2 PM Eastern (during business hours at partner organizations).
Error Handling in Web Services
Web services require a different error handling philosophy than batch programs. In batch, an error is logged to a file and the next record is processed. In a web service, the error must be communicated to the caller through the HTTP response:
| Situation | HTTP Status | JSON Response |
|---|---|---|
| Claim found | 200 OK | Full claim data |
| Claim not found | 404 Not Found | {"status":"NOTFND","message":"Claim not found"} |
| Invalid input | 400 Bad Request | {"status":"ERROR","message":"Invalid claim ID format"} |
| DB2 error | 500 Internal Server Error | {"status":"ERROR","message":"Internal processing error"} |
| Authentication failed | 401 Unauthorized | {"status":"ERROR","message":"Authentication required"} |
The CLM-API program maps internal processing results to appropriate HTTP status codes. Notice that internal error details (like SQLCODE values) are NOT exposed in the response — they could reveal database structure to potential attackers. Instead, the program logs detailed error information to a CICS system log and returns a generic error message to the caller.
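The mapping and the log-but-never-expose rule can be sketched in a few lines of Python. The situation labels and the `audit_log` list are hypothetical internal names, not part of CLM-API:

```python
# Sketch: mapping internal processing results to HTTP responses.
# Detailed diagnostics (like the SQLCODE) go to the audit log;
# the caller receives only a generic message.

audit_log = []

STATUS_MAP = {
    "FOUND":     (200, "OK"),
    "NOT_FOUND": (404, "Claim not found"),
    "BAD_INPUT": (400, "Invalid claim ID format"),
    "DB2_ERROR": (500, "Internal processing error"),
    "NO_AUTH":   (401, "Authentication required"),
}

def to_http(situation, sqlcode=None):
    code, message = STATUS_MAP.get(situation,
                                   (500, "Internal processing error"))
    if sqlcode is not None:
        audit_log.append(f"sqlcode={sqlcode}")   # detail stays internal
    return code, {"message": message}
```

The unknown-situation default of 500 follows the same fail-safe philosophy as the WHEN OTHER branch in the COBOL error handler.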
Building the Test Automation Framework
The Three Tiers of Testing
James's testing framework uses three tiers, each serving a different purpose:
Tier 1: Unit Tests (per-program, per-function)
Unit tests validate individual validation rules and calculations. Each test provides specific input and verifies specific output. Unit tests are fast (seconds) and focused.
Example unit test for CLM-INTAKE member validation:
TEST: Valid member with active coverage
INPUT: Member MBR100000001, Service Date 20240315
EXPECTED: Claim accepted (no rejection)
TEST: Member not found
INPUT: Member MBR999999999, Service Date 20240315
EXPECTED: Claim rejected, code MNFD
TEST: Service date outside coverage
INPUT: Member MBR100000001, Service Date 20250101 (after term date)
EXPECTED: Claim rejected, code MCOV
TEST: Service date in the future
INPUT: Member MBR100000001, Service Date 20991231
EXPECTED: Claim rejected, code SFUT
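The same four cases translate naturally into a table-driven test in any language. In this Python sketch, `validate_member`, the coverage table, and the assumed current date are all hypothetical stand-ins for CLM-INTAKE's actual logic:

```python
# Sketch: the member-validation unit tests above as a table-driven
# test. Coverage dates and the "current date" are illustrative.

COVERAGE = {"MBR100000001": ("20240101", "20241231")}  # start, term
TODAY = "20250601"  # assumed current date for the SFUT check

def validate_member(member_id, service_date):
    """Return 'OK' or a rejection code, mirroring the cases above."""
    if member_id not in COVERAGE:
        return "MNFD"                      # member not found
    if service_date > TODAY:
        return "SFUT"                      # service date in the future
    start, term = COVERAGE[member_id]
    if not (start <= service_date <= term):
        return "MCOV"                      # outside coverage period
    return "OK"

CASES = [
    ("MBR100000001", "20240315", "OK"),    # active coverage
    ("MBR999999999", "20240315", "MNFD"),  # unknown member
    ("MBR100000001", "20250101", "MCOV"),  # after term date
    ("MBR100000001", "20991231", "SFUT"),  # future service date
]
```

Because YYYYMMDD dates compare correctly as strings, the checks need no date parsing, which is the same property the COBOL programs rely on.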
Tier 2: Integration Tests (multi-program, full job stream)
Integration tests run the complete processing pipeline with test data. They verify that programs work together correctly — that CLM-INTAKE's output is correctly processed by CLM-ADJUD, and CLM-ADJUD's output is correctly processed by CLM-PAY.
Tier 3: Regression Tests (full system, production-volume data)
Regression tests use a sanitized copy of production data (with personal information masked). They run the complete job stream and verify that output matches a known-good baseline. Regression tests are run after every code change, no matter how small.
Automated Test Execution
James creates a JCL PROC (cataloged procedure) that runs all three tiers:
//MCTEST PROC TIER='ALL'
//*================================================================*
//* MEDCLAIM AUTOMATED TEST SUITE *
//* PARM: TIER = 'UNIT' | 'INTG' | 'REGR' | 'ALL' *
//*================================================================*
//*--- TIER 1: UNIT TESTS ---
//UNIT01 EXEC PGM=CLMINTAK,
// COND=(0,NE) SKIP IF TIER NOT 'UNIT' OR 'ALL'
//STEPLIB DD DSN=MEDCLAIM.TEST.LOADLIB,DISP=SHR
// ... (unit test DD statements)
//*
//UNIT01V EXEC PGM=IEBCOMPR,COND=(8,LT,UNIT01)
// ... (comparison to expected output)
//*
//*--- TIER 2: INTEGRATION TESTS ---
//INTG01 EXEC PGM=CLMINTAK,
// COND=(4,LT,UNIT01V) SKIP IF UNIT TESTS FAILED
// ... (integration test DD statements)
The test suite runs automatically every night at 8 PM, after the day's development is complete. Results are emailed to the team. If any test fails, the team investigates before promoting code.
🧪 The Test Data Challenge. Creating realistic test data is one of the hardest parts of automated testing. MedClaim's test data must include: valid claims for every combination of member type, provider type, and claim type; invalid claims for every validation rule; edge cases like zero-dollar claims, claims on the exact coverage start and end dates, and claims with the maximum allowed amount. James maintains a test data generation program that creates these scenarios systematically.
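A systematic generator in the spirit of James's program can be sketched as a cross-product of the type dimensions plus explicit edge-case amounts. The type lists and amounts here are illustrative, not MedClaim's real values:

```python
from itertools import product

# Sketch: systematic test-data generation. One claim per combination
# of member/provider/claim type, plus dollar-amount edge cases.
MEMBER_TYPES = ["INDIVIDUAL", "FAMILY"]
PROVIDER_TYPES = ["HOSPITAL", "CLINIC", "LAB"]
CLAIM_TYPES = ["MEDICAL", "DENTAL"]
EDGE_AMOUNTS = [0.00, 0.01, 9999999.99]   # zero, minimum, maximum

def generate_cases():
    """Return one valid claim per combination, plus amount edge cases."""
    cases = []
    for m, p, c in product(MEMBER_TYPES, PROVIDER_TYPES, CLAIM_TYPES):
        cases.append({"member": m, "provider": p,
                      "claim": c, "amount": 100.00})
    for amt in EDGE_AMOUNTS:
        cases.append({"member": "INDIVIDUAL", "provider": "CLINIC",
                      "claim": "MEDICAL", "amount": amt})
    return cases
```

Generating combinations programmatically, rather than hand-writing them, guarantees that adding a new claim type automatically adds test cases for every member and provider type.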
The Modernization Results
After 12 months, James presents the results to MedClaim's steering committee:
Quantitative Results
| Metric | Before | After | Change |
|---|---|---|---|
| Developers who understand the system | 2 | 5 | +150% |
| Test cycle time | 2-3 days | 2 hours | -96% |
| Production incidents per quarter | 12 | 3 | -75% |
| Time to add new claim type | 3-4 weeks | 3-5 days | -80% |
| Partner data latency | 24 hours (batch) | Sub-second (API) | -99.99% |
| Copybook inconsistencies | 49 | 0 | -100% |
| Programs with automated tests | 0 | 34 (of 47) | +34 |
Qualitative Results
- Knowledge is no longer concentrated. Three new developers can now maintain the system because the code is documented, readable, and has automated tests that catch regressions.
- Change is no longer frightening. With automated tests, developers can modify programs with confidence that they will not break something unexpectedly. This has accelerated the pace of enhancement.
- Partners are satisfied. The API provides real-time access to claim status, eliminating the nightly batch extract that was MedClaim's most frequent source of partner complaints.
- The system is positioned for the future. With DB2, APIs, and automated testing in place, MedClaim can continue modernizing incrementally. The next phase might add event-driven processing, move to a cloud-hosted z/OS environment, or expose additional APIs.
What Was NOT Changed
Equally important is what James did not change:
- The core adjudication logic. The business rules that determine how claims are paid were not modified. They were documented, tested, and preserved exactly as they were.
- The batch job streams. Daily, weekly, and monthly batch processing continues to run exactly as before. The API is an addition, not a replacement.
- The programming language. Everything is still COBOL. No Java wrappers, no Python scripts, no "strangler fig" pattern. COBOL on z/OS is the right tool for this workload, and modernizing the code does not require changing the language.
🔴 Theme: Legacy != Obsolete. MedClaim's COBOL system is now modern in every way that matters: well-documented, well-tested, modular, API-enabled, and maintainable by a team. It is still COBOL. It still runs on z/OS. It still processes claims in batch. But it is no longer a liability — it is an asset. The system is not legacy because it is old; it was legacy because it was unmaintainable. Now it is maintainable, and "legacy" no longer applies.
Lessons from the Trenches
Lesson 1: Start with Documentation, Not Code
James's first instinct was to start refactoring immediately. Sarah Kim talked him out of it. "If you change code you don't understand, you'll introduce bugs. Document first, change second." She was right. The three months spent on Phase 1 (documentation) saved six months of debugging in Phases 2-5.
Lesson 2: Modernize Incrementally
Every phase delivered value independently. Phase 1 (documentation) made the system understandable. Phase 2 (refactoring) made it maintainable. Phase 3 (DB2) made it queryable. Phase 4 (API) made it accessible. Phase 5 (CI/CD) made it safe to change. If the project had been canceled after any phase, MedClaim would still have benefited.
Lesson 3: Preserve the Business Logic
The most dangerous part of modernization is accidentally changing business logic. James's rule was: "The refactored program must produce exactly the same output as the legacy program for the same input." Every refactoring was verified by running both the old and new versions against the same test data and comparing outputs byte-for-byte.
Lesson 4: Automated Tests Are the Safety Net
Before Phase 5, every change was a risk. After Phase 5, changes were routine. The automated test suite is not glamorous — it is JCL that runs programs and compares outputs — but it is the single most valuable artifact of the entire modernization effort.
Lesson 5: People Matter More Than Technology
The modernization succeeded because James had the right team. Sarah Kim understood the business domain and could verify that refactored code preserved business intent. Tomás Rivera understood DB2 and could design efficient schemas. James himself understood the legacy code and could guide the refactoring. No amount of technology could have compensated for a team that did not understand the system.
🧪 Theme: Defensive Programming. Throughout the modernization, James maintained a "parallel run" capability. For every program that was refactored, both the old and new versions were available in production. If the new version produced unexpected results, the old version could be reinstated within minutes. This defensive approach meant that modernization never put production processing at risk.
Common Modernization Anti-Patterns
James has seen other organizations attempt modernizations that failed. He shares the most common anti-patterns:
Anti-Pattern 1: The Big Bang Rewrite
"Let's rewrite the whole system in Java." This approach takes years, costs millions, and usually fails because the new system cannot replicate the accumulated business logic of the legacy system. By the time the rewrite is "done," the requirements have changed, and the team has burned out.
Anti-Pattern 2: The Screen Scraper
Wrapping a 3270 terminal interface in a web browser does not modernize anything. It adds a layer of complexity and latency without improving the underlying system. Real modernization changes the code, not just the presentation.
Anti-Pattern 3: The Technology-First Approach
"Let's adopt Kubernetes, then figure out what to put in it." Technology decisions should follow business needs, not precede them. MedClaim did not need containers — they needed readable code, automated tests, and an API. The technology choices followed from those needs.
Anti-Pattern 4: The Documentation Skip
"We don't have time to document. Let's just start coding." This is how bugs are introduced. If you do not understand the existing code, you cannot safely change it. Documentation is not overhead — it is prerequisite.
Anti-Pattern 5: The All-or-Nothing Fallacy
"If we can't modernize everything, why modernize anything?" This mindset prevents incremental progress. Even documenting and testing the most critical 20% of the system delivers enormous value.
Working with the Student Mainframe Lab
Adapting MedClaim's Modernization for GnuCOBOL
Students working with GnuCOBOL or the Student Mainframe Lab can practice the core modernization concepts using simplified versions of MedClaim's programs.
Phase 1 (Documentation) requires no technology at all. Take any COBOL program — one from this textbook, one from an open-source repository, or one you have written yourself — and practice the code archaeology process:
- Create a program specification (one page)
- Draw a data flow diagram
- Build a business rule catalog
- Assign a modernization tier (1-4)
This exercise develops the analytical skills that are the foundation of all modernization work.
Phase 2 (Refactoring) translates directly to GnuCOBOL. The refactoring techniques — copybook creation, 88-level conditions, structured paragraphs, subprogram extraction — are standard COBOL and work identically in GnuCOBOL. Write the legacy version of CLM-INTAKE (from this chapter), then refactor it step by step, running comparison tests at each step.
The comparison testing framework is simpler in GnuCOBOL:
# Compile and run legacy version
cobc -x -o clm-intake-legacy clm-intake-legacy.cbl
./clm-intake-legacy < test-claims.dat > legacy-output.dat
# Compile and run refactored version
cobc -x -o clm-intake-new clm-intake-new.cbl
./clm-intake-new < test-claims.dat > new-output.dat
# Compare outputs
diff legacy-output.dat new-output.dat
If diff produces no output, the refactoring preserved behavior. If it shows differences, investigate.
Phase 3 (DB2 Migration) can be approximated with SQLite or PostgreSQL. Replace the EXEC SQL blocks using an open-source embedded-SQL precompiler for GnuCOBOL (such as ocesql, which connects through ODBC or PostgreSQL), or simulate the migration by replacing VSAM READ with indexed file access patterns that mirror DB2 SELECT.
Phase 4 (API) is the hardest to simulate without CICS. One approach is to build a simple command-line interface that accepts a claim ID as a parameter and returns the claim data formatted as JSON:
IDENTIFICATION DIVISION.
PROGRAM-ID. CLM-QUERY.
* Simplified claim query - command-line version
* Usage: ./clm-query CLM000098765
DATA DIVISION.
WORKING-STORAGE SECTION.
01 WS-CLAIM-ID PIC X(15).
01 WS-JSON-OUTPUT PIC X(500).
PROCEDURE DIVISION.
ACCEPT WS-CLAIM-ID FROM COMMAND-LINE
PERFORM 1000-LOOKUP-CLAIM
DISPLAY WS-JSON-OUTPUT
STOP RUN.
This is not a web service, but it exercises the same pattern: receive input, look up data, format output as JSON.
Phase 5 (CI/CD) can be practiced with shell scripts and cron jobs. Write a shell script that compiles all programs, runs all tests, and reports results. Schedule it to run automatically after code changes.
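The automation script can equally be a small Python driver. This sketch assumes cobc is on the PATH and that each compiled program performs its own self-checks when run; the function name and the injectable `run` hook are illustrative (inject a fake `run` to dry-run without a compiler installed):

```python
import subprocess
from pathlib import Path

# Sketch: a minimal CI driver for the GnuCOBOL exercise. It compiles
# every .cbl file in a directory, runs each resulting program, and
# reports PASS/FAIL per source file.

def ci_run(source_dir, run=None):
    """Compile and test each .cbl file; return {filename: result}."""
    run = run or (lambda cmd: subprocess.call(cmd) == 0)
    results = {}
    for src in sorted(Path(source_dir).glob("*.cbl")):
        exe = src.with_suffix("")
        ok = run(["cobc", "-x", "-o", str(exe), str(src)])
        if ok:
            ok = run([str(exe)])          # program's own self-test
        results[src.name] = "PASS" if ok else "FAIL"
    return results
```

Scheduled from cron after each day's commits, even this small driver gives the nightly red/green signal that makes the MedClaim-style promotion discipline practical.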
The Modernization Mindset
The most important takeaway from this capstone is not any specific technology — it is the modernization mindset. This mindset includes:
Respect for existing code. Legacy code works. It has been tested by millions of production transactions. Treating it with contempt leads to bad modernization decisions.
Incremental progress over revolutionary change. Each phase delivers value independently. If the project is canceled after Phase 2, the code is still better than before.
Testing as the foundation. You cannot safely change what you cannot test. Build the test framework first, then change the code.
Documentation as investment. Documentation is not overhead — it is the enabler of all subsequent work. Skipping documentation to "save time" costs more time in debugging and rework.
People over technology. The right team with simple tools will outperform the wrong team with sophisticated tools every time.
These principles apply regardless of the technology stack. Whether you are modernizing COBOL on z/OS, Java on WebSphere, or Python on AWS, the fundamentals are the same: understand first, test always, change incrementally, and invest in people.
Understanding the Modernization Timeline
Month-by-Month Breakdown
James's 12-month project followed a deliberate timeline. Understanding why each phase took the time it did helps with planning future modernization efforts.
Months 1-3: Phase 1 (Documentation and Understanding)
This phase consumed 25% of the project timeline, which surprised MedClaim's management. "Three months of documentation before any code changes?" was the reaction from the VP of IT.
James defended the timeline: "We are documenting 18 years of accumulated business logic. If we skip this phase, we will introduce bugs in every subsequent phase — bugs that will cost more than three months to find and fix."
The deliverables:
- Complete system inventory (programs, copybooks, files, jobs)
- Program specifications for all 47 programs
- Business rule catalog with 247 cataloged rules
- Data flow diagrams for 8 job streams
- Modernization tier assignments for all programs
Months 4-6: Phase 2 (Refactoring)
Refactoring proceeded at approximately 5 programs per month. Each program went through the full cycle: create comparison baseline, refactor one step at a time, test after each step, commit. The 20 Tier 3 and Tier 4 programs were all refactored by the end of month 6.
Key milestone: by the end of month 5, all 47 programs were using standardized copybooks. This alone eliminated the inconsistency problem that had caused two production incidents in the previous year.
Months 7-9: Phase 3 (DB2 Migration)
Tomás Rivera led the DB2 migration. He created the schema design in month 7, built the data migration JCL in month 8, and modified programs to use SQL instead of VSAM in month 9. The migration was done one table at a time, with parallel runs at each step.
The most challenging part was not the SQL — it was the BIND process. DB2 packages had to be bound with specific isolation levels, and tablespaces defined with appropriate lock sizes, so that locking behavior matched the VSAM access patterns the programs expected.
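A minimal sketch of the kind of BIND job involved, run through the TSO batch interface (IKJEFT01); the subsystem, collection, member, and library names are hypothetical:

```jcl
//BINDPKG  EXEC PGM=IKJEFT01
//STEPLIB  DD DSN=DSN.SDSNLOAD,DISP=SHR
//SYSTSPRT DD SYSOUT=*
//SYSTSIN  DD *
  DSN SYSTEM(DSN1)
  BIND PACKAGE(CLMBATCH) MEMBER(CLMPOST) -
       ISOLATION(CS) RELEASE(COMMIT)     -
       CURRENTDATA(NO) ACTION(REPLACE)
  END
/*
```

ISOLATION(CS), cursor stability, is one common choice for approximating record-at-a-time VSAM locking. Lock size, by contrast, is not a BIND option at all; it is declared on the tablespace DDL (for example, CREATE TABLESPACE ... LOCKSIZE PAGE), which is part of why tuning the two together takes care.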
Month 10: Phase 4 (API)
Building the CLM-API web service took approximately 4 weeks. The COBOL code was straightforward — it was essentially a CICS program that reads DB2 and formats output, which MedClaim had been doing for years. The configuration — CICS pipeline definitions, URIMAP, security setup — took more time than the code itself.
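A sketch of the kind of resource definition behind such an endpoint; the resource names and path are hypothetical, and the pipeline itself points at a configuration file defined separately:

```text
CEDA DEFINE URIMAP(CLMURI) GROUP(CLMAPI)
     USAGE(PIPELINE) SCHEME(HTTP)
     HOST(*) PATH(/medclaim/claims/*)
     PIPELINE(CLMPIPE) WEBSERVICE(CLMSTAT)
     TRANSACTION(CPIH)
```

The URIMAP routes inbound HTTP requests that match PATH to the named pipeline, which invokes the web service under the CPIH transaction; the TCPIPSERVICE, SSL, and user-mapping pieces of the security setup are layered on top of this definition.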
Months 11-12: Phase 5 (CI/CD)
James and Sarah Kim spent the final two months building the automated test suite and deployment pipeline. The test data creation was the largest effort — building comprehensive test cases for 34 programs required deep understanding of every business rule.
The 80/20 Rule in Practice
Looking back, James observes that the Pareto principle applied at every level:
- 80% of the modernization value came from 20% of the programs (the Tier 3 and Tier 4 programs)
- 80% of the test cases came from 20% of the business rules (the complex adjudication rules)
- 80% of the refactoring effort went to 20% of the programs (the 6 Tier 4 programs, especially the 12,000-line CLM-ADJUD)
- 80% of the production incidents were caused by 20% of the code (the copybook-inconsistency areas)
This pattern is not unique to MedClaim. It applies to virtually every legacy modernization effort. The practical implication: focus your resources on the critical 20%. Do not waste time polishing programs that are already clean and rarely modified.
Summary
This capstone demonstrated the complete modernization of a legacy COBOL system through five phases:
- Documentation and Understanding: Code archaeology, system inventory, business rule cataloging
- Refactoring for Modularity: Copybook consolidation, concern separation, subprogram extraction
- Adding DB2: Migrating from VSAM to relational tables, SQL access from COBOL
- Exposing as API: CICS web services, JSON generation, real-time access
- CI/CD Pipeline and Testing: Automated unit, integration, and regression tests; deployment pipeline
All five themes converge in this capstone:
- Legacy != Obsolete: The modernized system is still COBOL on z/OS, but it is no longer a liability
- Readability is a Feature: Copybooks, 88-levels, and meaningful names make the code self-documenting
- The Modernization Spectrum: Each phase moved the system along the spectrum without disrupting production
- Defensive Programming: Parallel runs, automated tests, and rollback capability ensured safety
- The Human Factor: The project succeeded because of the team, not the technology
The modernization journey does not end here. MedClaim's system is now positioned for further evolution: event-driven processing, machine learning for fraud detection, cloud deployment. But those are topics for Capstone 3.
Chapter Reflection: What Modernization Really Means
Looking back at MedClaim's 12-month modernization, the transformation is clear — but it is important to understand what changed and what did not.
What changed: The code is readable. The data is accessible. The system is testable. The team is capable. The interfaces are modern.
What did not change: The business logic. The programming language. The batch processing schedule. The z/OS platform. The core architecture.
This is the essence of modernization. It is not about replacing old with new — it is about making existing systems sustainable, maintainable, and extensible. A COBOL program with clear structure, standardized copybooks, comprehensive tests, and an API is a modern program — regardless of when the language was designed or when the program was first written.
James often uses an analogy from architecture: "You don't demolish a building because the plumbing is old. You upgrade the plumbing while keeping the structure. Our programs are the structure. The copybooks, the file handling, the interfaces — those are the plumbing. We upgraded the plumbing."
The Broader Context
MedClaim's modernization is not unique. Across the mainframe industry, organizations are following similar paths:
- Financial services: Banks are modernizing COBOL payment systems to support real-time payments, APIs for fintech integration, and cloud-native front-ends backed by mainframe transaction engines.
- Insurance: Health insurers are modernizing claims systems to support API-based provider portals, real-time eligibility verification, and interoperability requirements (like FHIR for healthcare data exchange).
- Government: Federal and state agencies are modernizing tax processing, benefits administration, and social services systems — often with multi-year, multi-phase modernization plans similar to MedClaim's five-phase approach.
In every case, the pattern is the same: understand first, refactor incrementally, add modern interfaces, automate testing, and invest in the team. The technology varies, but the principles are universal.
Derek Washington, who has been observing MedClaim's modernization as preparation for Capstone 3, draws a key insight: "Building a new system is engineering. Modernizing an existing system is engineering plus archaeology plus diplomacy. You need all three to succeed."
Sarah Kim adds the business perspective: "The steering committee approved this project because we showed them the risk — two developers maintaining 800,000 lines of critical code. The technology improvements are real, but the risk reduction is what got us funded. Always lead with the business case."
These lessons — lead with business value, invest in people, modernize incrementally, test continuously — are the foundation of professional COBOL development. They are the lessons that James Okafor has learned over 15 years, and they are the lessons that this capstone aims to teach.