
Chapter 44: Capstone 2 — Legacy System Modernization Case Study

"The worst thing you can do to a legacy system is rewrite it from scratch. The second worst thing is leave it exactly as it is." — James Okafor, to the MedClaim modernization steering committee

Introduction: The Modernization Imperative

In Capstone 1, you built a system from scratch. That was the easy part. Most of your career will be spent working with systems that already exist — systems written by people who are no longer with the organization, documented partially or not at all, running on technology that has been patched and extended for decades. These systems work. They process millions of transactions, adjudicate hundreds of thousands of claims, and run the infrastructure of modern commerce. But they are increasingly difficult to maintain, extend, and integrate with modern systems.

This capstone takes you through the complete modernization of a legacy COBOL system. Not a rewrite — a modernization. The distinction matters. A rewrite discards decades of accumulated business logic, introduces new bugs, and takes years to complete. A modernization preserves what works while making the system easier to maintain, extend, and integrate. It is incremental, reversible, and — most importantly — it can deliver value at every phase.

💡 The Modernization Spectrum. There is no single "right" way to modernize a legacy system. The spectrum ranges from "document and stabilize" (least disruptive) to "rewrite in a new language" (most disruptive). Most successful modernizations operate in the middle of this spectrum: refactoring for clarity, migrating data stores, exposing APIs, and automating testing and deployment. This capstone walks through each of these phases.

The System: MedClaim Insurance Processing

MedClaim Health Services processes approximately 500,000 insurance claims per month. Their core processing system, MEDCLAIM-PROC, is approximately 800,000 lines of COBOL running on z/OS with DB2. The system has been in production for 18 years, and it works — claims are processed accurately, providers are paid on time, and regulatory requirements are met.

But the system has problems:

  • Only two developers understand it. James Okafor (team lead, 15 years) and one other senior developer maintain the entire system. If both left, the organization would be in crisis.
  • No automated tests. Changes are tested manually against a copy of production data. Test cycles take 2-3 days.
  • Flat files everywhere. While the core data is in DB2, many interfaces between programs use flat sequential files. These interfaces are fragile and difficult to change.
  • No API access. Partner organizations need claim status information, but the only way to get it is through a batch extract that runs nightly. Real-time access does not exist.
  • Inconsistent copybooks. Over 18 years, multiple developers have created slightly different versions of the same copybooks. Some programs use the "official" copybook; others use local copies with modifications.

James Okafor has been asked to lead the modernization effort. He has a budget for 12 months of work, a team of three (himself, Sarah Kim as business analyst, and Tomás Rivera as DBA), and a mandate from management: "Make this system sustainable for the next 10 years."

James accepts the assignment with a mixture of excitement and trepidation. He knows this system better than anyone alive — he has been maintaining it for 15 years, fixing bugs at 2 AM, adding features under deadline pressure, and watching the technical debt accumulate. He has wanted to modernize it for years but never had the budget or the organizational support. Now he has both, and the pressure to deliver is real.

⚖️ The Human Factor. Notice that the modernization is driven by a human problem — knowledge concentration in too few people — not a technical problem. The system works fine technically. But a system that only two people can maintain is an organizational risk. This is a pattern you will see repeatedly in your career: the decision to modernize is rarely about technology. It is about people, risk, and sustainability.


Phase 1: Documentation and Understanding

Code Archaeology

Before changing anything, James must understand what exists. This is code archaeology — the disciplined process of reading, documenting, and mapping a system you did not write.

James begins with an inventory. He uses a combination of JCL analysis, COBOL cross-reference listings, and manual review to create a complete picture of the system.

System Inventory:

Component           Count   Description
COBOL programs         47   Batch and CICS programs
Copybooks              83   Record layouts and common data
JCL procedures         12   Cataloged procedures for job streams
JCL job streams         8   Daily, weekly, monthly, ad-hoc
DB2 tables             23   Core data store
VSAM files              6   Work files and lookup tables
Sequential files       31   Interface files between programs
CICS transactions       5   Online inquiry and maintenance
BMS maps                5   Screen definitions

"Forty-seven programs," Sarah Kim observes. "That's more than I expected."

"Wait until you see how they're connected," James replies. He draws a program dependency diagram on the whiteboard. It looks like a plate of spaghetti.

The Legacy Code Assessment

James categorizes each program into one of four modernization tiers:

Tier 1 — Leave Alone (12 programs): These programs are simple, well-structured, and rarely modified. They work, they are readable, and changing them would add risk without benefit. Examples: date conversion utilities, standard report headers, parameter validation routines.

Tier 2 — Document and Stabilize (15 programs): These programs are complex but stable. They have not been modified in years and rarely cause production issues. They need documentation (comments, flow diagrams, data maps) but not structural changes. Examples: monthly regulatory report generators, year-end processing.

Tier 3 — Refactor (14 programs): These programs are complex, frequently modified, and difficult to understand. They contain duplicated code, inconsistent copybooks, and tangled control flow. They need structural improvement. Examples: claim adjudication, provider payment calculation, eligibility verification.

Tier 4 — Redesign (6 programs): These programs are fundamentally flawed in their current design. They cannot be refactored incrementally — they need to be redesigned and rewritten, one at a time, with the same interfaces so the rest of the system is unaffected. Examples: the claim intake program (which mixes I/O, validation, and business logic in a single 12,000-line program), the batch scheduler.

📊 The 80/20 Rule of Modernization. James estimates that 80% of the modernization value will come from the Tier 3 and Tier 4 programs — which represent only 20 programs out of 47. This is typical. Most legacy systems have a core of critical, complex programs surrounded by a periphery of simpler support programs. Focus your modernization effort on the core.

Documenting the Existing System

For each Tier 2 and Tier 3 program, James creates three documentation artifacts:

1. Program Specification: A one-page summary of what the program does, its inputs, outputs, key business rules, and known issues.

2. Data Flow Diagram: A visual showing which files and tables the program reads and writes, and how data flows through the program's major sections.

3. Business Rule Catalog: A structured list of every business rule embedded in the code. Each rule is tagged with a line number reference, a plain-English description, and a confidence level (High/Medium/Low) indicating how well the rule is understood.

Here is an excerpt from the business rule catalog for CLM-ADJUD (the claim adjudication program):

Rule ID  Description                                                      Code Location    Confidence
ADJ-001  Claims over $50,000 require manual review                        Lines 1247-1260  High
ADJ-002  Provider must be in active status                                Lines 1305-1318  High
ADJ-003  Duplicate claim check uses member ID + date + provider + amount  Lines 1402-1456  Medium
ADJ-004  Emergency room claims bypass pre-authorization check             Lines 1520-1535  High
ADJ-005  Mental health parity calculation uses 2008 rule table            Lines 1678-1742  Low

🔴 Confidence Levels Matter. Rule ADJ-005 has Low confidence because the code references a "2008 rule table" that James cannot find documentation for. The code works — claims are adjudicated correctly — but nobody can explain why the calculation uses the specific factors it does. This is a common pattern in legacy systems: the code embodies institutional knowledge that was never written down anywhere else. If the code is lost, the knowledge is lost.

The Original Legacy Code

Let us examine a representative piece of MedClaim's legacy code — the claim intake program. This is what James's team found when they opened CLM-INTAKE:

       IDENTIFICATION DIVISION.
       PROGRAM-ID. CLM-INTAKE.
      * CLAIM INTAKE - MODIFIED 03/2012 JO
      * MODIFIED 07/2014 JO - ADDED TYPE 7 PROCESSING
      * MODIFIED 11/2016 RK - FIXED BUG IN DATE CHECK
      * MODIFIED 02/2019 JO - NEW PROVIDER FORMAT
      * MODIFIED 09/2021 JO - COVID CODES

       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT CLAIM-IN ASSIGN TO CLMIN
               FILE STATUS IS WS-FS1.
           SELECT CLAIM-OUT ASSIGN TO CLMOUT
               FILE STATUS IS WS-FS2.
           SELECT ERR-FILE ASSIGN TO ERROUT
               FILE STATUS IS WS-FS3.
           SELECT PROV-FILE ASSIGN TO PROVFL
               ORGANIZATION IS INDEXED
               ACCESS IS RANDOM
               RECORD KEY IS PROV-ID
               FILE STATUS IS WS-FS4.
           SELECT MEMB-FILE ASSIGN TO MEMBFL
               ORGANIZATION IS INDEXED
               ACCESS IS RANDOM
               RECORD KEY IS MEMB-ID
               FILE STATUS IS WS-FS5.

       DATA DIVISION.
       FILE SECTION.
       FD  CLAIM-IN.
       01  CLAIM-IN-REC.
           05  CI-MEMB-ID       PIC X(12).
           05  CI-PROV-ID       PIC X(10).
           05  CI-SVC-DATE      PIC 9(08).
           05  CI-DIAG-CODE     PIC X(07).
           05  CI-PROC-CODE     PIC X(05).
           05  CI-AMOUNT        PIC 9(07)V99.
           05  CI-CLM-TYPE      PIC 9(01).
           05  CI-PLACE-SVC     PIC X(02).
           05  CI-MODIFIER      PIC X(02).
           05  CI-AUTH-NUM      PIC X(12).
           05  CI-FILLER        PIC X(32).

       FD  CLAIM-OUT.
       01  CLAIM-OUT-REC        PIC X(200).

       FD  ERR-FILE.
       01  ERR-REC              PIC X(132).

       FD  PROV-FILE.
       01  PROV-REC.
           05  PROV-ID          PIC X(10).
           05  PROV-NAME        PIC X(30).
           05  PROV-STAT        PIC X(01).
           05  PROV-TYPE        PIC 9(02).
           05  PROV-REST        PIC X(57).

       FD  MEMB-FILE.
       01  MEMB-REC.
           05  MEMB-ID          PIC X(12).
           05  MEMB-NAME        PIC X(30).
           05  MEMB-EFF-DT      PIC 9(08).
           05  MEMB-TERM-DT     PIC 9(08).
           05  MEMB-PLAN        PIC X(04).
           05  MEMB-REST        PIC X(38).

       WORKING-STORAGE SECTION.
       01  WS-FS1               PIC X(02).
       01  WS-FS2               PIC X(02).
       01  WS-FS3               PIC X(02).
       01  WS-FS4               PIC X(02).
       01  WS-FS5               PIC X(02).
       01  WS-EOF               PIC X VALUE 'N'.
       01  WS-CTR1              PIC 9(07) VALUE 0.
       01  WS-CTR2              PIC 9(07) VALUE 0.
       01  WS-CTR3              PIC 9(07) VALUE 0.
       01  WS-ERR-MSG           PIC X(80) VALUE SPACES.
       01  WS-WORK-DATE         PIC 9(08).
       01  WS-TODAY             PIC 9(08).
       01  WS-CLM-WORK          PIC X(200).

       PROCEDURE DIVISION.
       0000-MAIN.
           PERFORM 1000-INIT.
           PERFORM 2000-PROC THRU 2000-EXIT
               UNTIL WS-EOF = 'Y'.
           PERFORM 3000-TERM.
           STOP RUN.

       1000-INIT.
           OPEN INPUT CLAIM-IN PROV-FILE MEMB-FILE.
           OPEN OUTPUT CLAIM-OUT ERR-FILE.
           ACCEPT WS-TODAY FROM DATE YYYYMMDD.
           READ CLAIM-IN
               AT END MOVE 'Y' TO WS-EOF.

       2000-PROC.
           ADD 1 TO WS-CTR1.
      * CHECK MEMBER
           MOVE CI-MEMB-ID TO MEMB-ID.
           READ MEMB-FILE.
           IF WS-FS5 NOT = '00'
               MOVE 'MEMBER NOT FOUND' TO WS-ERR-MSG
               PERFORM 8000-ERR
               GO TO 2000-EXIT
           END-IF.
      * CHECK DATES
           IF CI-SVC-DATE < MEMB-EFF-DT OR
              CI-SVC-DATE > MEMB-TERM-DT
               MOVE 'SERVICE DATE OUT OF COVERAGE' TO WS-ERR-MSG
               PERFORM 8000-ERR
               GO TO 2000-EXIT
           END-IF.
      * CHECK PROVIDER
           MOVE CI-PROV-ID TO PROV-ID.
           READ PROV-FILE.
           IF WS-FS4 NOT = '00'
               MOVE 'PROVIDER NOT FOUND' TO WS-ERR-MSG
               PERFORM 8000-ERR
               GO TO 2000-EXIT
           END-IF.
           IF PROV-STAT NOT = 'A'
               MOVE 'PROVIDER NOT ACTIVE' TO WS-ERR-MSG
               PERFORM 8000-ERR
               GO TO 2000-EXIT
           END-IF.
      * CHECK AMOUNT
           IF CI-AMOUNT NOT > 0
               MOVE 'INVALID AMOUNT' TO WS-ERR-MSG
               PERFORM 8000-ERR
               GO TO 2000-EXIT
           END-IF.
      * CHECK AUTH FOR NON-EMERGENCY
           IF CI-PLACE-SVC NOT = '23'
               IF CI-AUTH-NUM = SPACES
                   IF CI-CLM-TYPE NOT = 7
                       MOVE 'AUTH REQUIRED' TO WS-ERR-MSG
                       PERFORM 8000-ERR
                       GO TO 2000-EXIT
                   END-IF
               END-IF
           END-IF.
      * COVID OVERRIDE - ADDED 09/2021
           IF CI-DIAG-CODE(1:3) = 'U07'
               MOVE SPACES TO CI-AUTH-NUM
               MOVE 0 TO CI-CLM-TYPE
           END-IF.
      * WRITE GOOD CLAIM
           MOVE CLAIM-IN-REC TO WS-CLM-WORK.
           WRITE CLAIM-OUT-REC FROM WS-CLM-WORK.
           ADD 1 TO WS-CTR2.

       2000-EXIT.
           READ CLAIM-IN
               AT END MOVE 'Y' TO WS-EOF.

       3000-TERM.
           DISPLAY 'CLAIMS READ:    ' WS-CTR1.
           DISPLAY 'CLAIMS WRITTEN: ' WS-CTR2.
           DISPLAY 'CLAIMS ERRORED: ' WS-CTR3.
           CLOSE CLAIM-IN CLAIM-OUT ERR-FILE PROV-FILE MEMB-FILE.

       8000-ERR.
           STRING CI-MEMB-ID DELIMITED BY SIZE
                  ' ' DELIMITED BY SIZE
                  WS-ERR-MSG DELIMITED BY SIZE
                  INTO ERR-REC.
           WRITE ERR-REC.
           ADD 1 TO WS-CTR3.

This code works. It has been processing claims for 18 years. But it has significant problems:

  1. No copybooks. Record layouts are defined inline. If the claim format changes, every program that reads claims must be updated independently.
  2. Cryptic variable names. WS-FS1, WS-CTR1, CI-MEMB-ID — these are understandable if you know the conventions, but they are not self-documenting.
  3. Inline file status checking. The IF WS-FS4 NOT = '00' pattern is repeated everywhere without 88-level conditions.
  4. Hardcoded business rules. The COVID override (checking for diagnosis code 'U07') is hardcoded. When COVID rules change, the program must be modified and recompiled.
  5. No audit trail. Rejected claims are written to an error file, but there is no record of why the claim was rejected with enough detail for systematic analysis.
  6. Mixed concerns. I/O, validation, and business rules are all in the same paragraph. This makes the program difficult to test and modify.
  7. GO TO for flow control. The GO TO 2000-EXIT pattern is a common legacy idiom but makes the control flow harder to follow than structured alternatives.
  8. Unsigned amount field. CI-AMOUNT is PIC 9(07)V99 — unsigned. If a negative amount were received (perhaps from a reversal), it would be silently treated as positive.
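Item 8 is worth seeing in action. The sketch below is a hypothetical standalone program (not part of MEDCLAIM-PROC) showing how an unsigned picture clause like CI-AMOUNT's silently drops the sign of a reversal amount:

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. SIGNDEMO.
      * HYPOTHETICAL SKETCH - DEMONSTRATES SIGN LOSS IN AN
      * UNSIGNED FIELD LIKE CI-AMOUNT (PIC 9(07)V99).
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-REVERSAL-AMT      PIC S9(7)V99 VALUE -125.50.
       01  WS-UNSIGNED-AMT      PIC 9(7)V99.
       PROCEDURE DIVISION.
           MOVE WS-REVERSAL-AMT TO WS-UNSIGNED-AMT
      * WS-UNSIGNED-AMT NOW HOLDS 125.50 - THE REVERSAL HAS
      * BECOME A POSITIVE CHARGE. A SIGNED FIELD (PIC S9(7)V99)
      * WOULD PRESERVE THE -125.50.
           DISPLAY 'UNSIGNED VALUE: ' WS-UNSIGNED-AMT
           STOP RUN.
```

This is why every monetary field in the modernized copybook later in this chapter is declared with a leading S and stored as COMP-3.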

⚠️ Do Not Judge. It is easy — and wrong — to look at legacy code and conclude that the original developers were incompetent. They were not. They were working with the standards, tools, and time pressures of their era. The code worked then, and it works now. Our job is not to judge it but to improve it — incrementally, carefully, and respectfully. As James tells his team: "This code paid the bills for 18 years. Show it some respect."

Analyzing the Legacy Code: A Detailed Walk-Through

Let us trace through the legacy CLM-INTAKE code to understand both what it does well and where it falls short. This kind of analysis is the core skill of code archaeology.

What the legacy code does well:

  1. It checks file status after every I/O. The IF WS-FS5 NOT = '00' pattern is not pretty, but it works. Every file read is verified. Many legacy programs do not check file status at all — they simply assume every READ succeeds.

  2. It validates in a reasonable order. Member check comes before provider check, which comes before financial checks. This ordering makes sense: there is no point checking the provider if the member does not exist.

  3. It has counters. WS-CTR1, WS-CTR2, and WS-CTR3 track reads, writes, and errors. The DISPLAY statements in 3000-TERM produce a processing summary. This is elementary audit control, but it is present — many legacy programs do not even have this.

  4. It handles COVID. The U07 diagnosis code check (added in 2021) shows that the program has been maintained and adapted to changing business requirements. The modification comment at the top documents when and why the change was made.

Where the legacy code falls short:

  1. The GO TO 2000-EXIT pattern creates a spaghetti-like control flow that is difficult to trace. Each validation check has its own GO TO, and understanding the flow requires mentally tracking which GO TOs can be reached from which IF conditions. In the refactored version, cascading IFs make the flow explicit: "if valid so far, check the next thing."

  2. The COVID override is dangerous. Lines 249-252 clear the authorization number and claim type for COVID diagnosis codes. But what if a claim has a legitimate authorization? Clearing it discards information. The refactored version would add a COVID-OVERRIDE flag without destroying existing data.

  3. No distinction between "not found" and "file error." The legacy code treats WS-FS5 NOT = '00' as "member not found." But VSAM file status '23' (not found) is very different from '24' (boundary violation) or '9x' (physical I/O error). The legacy code conflates all non-zero statuses into a single error path.

  4. The error report is unstructured. The STRING in paragraph 8000-ERR produces a free-form error line that is difficult to parse programmatically. If the operations team wants to count errors by type, they must write a custom parser. The refactored version uses structured error records with separate fields for each piece of information.

Understanding these strengths and weaknesses is the foundation for effective refactoring. You preserve the strengths (file status checking, validation ordering, counters) while addressing the weaknesses (GO TO flow, crude error handling, inline definitions).
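To make point 4 concrete, here is a hypothetical sketch of what a structured audit record might look like. The field names and sizes are illustrative, not the actual ERRREC copybook; the point is that with fixed positions, counting rejections by code becomes a simple sort-and-tally job instead of a custom parser:

```cobol
      * HYPOTHETICAL SKETCH OF A STRUCTURED AUDIT RECORD (150 BYTES).
      * EACH FIELD HAS A FIXED POSITION, SO DOWNSTREAM JOBS CAN SORT
      * AND COUNT BY REJECT CODE WITHOUT PARSING FREE-FORM TEXT.
       01  AUDIT-RECORD.
           05  AUD-TIMESTAMP        PIC 9(14).
           05  AUD-PROGRAM-ID       PIC X(08).
           05  AUD-CLAIM-ID         PIC X(15).
           05  AUD-MEMBER-ID        PIC X(12).
           05  AUD-REJECT-CODE      PIC X(04).
           05  AUD-REJECT-MESSAGE   PIC X(60).
           05  FILLER               PIC X(37).
```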

The Knowledge Transfer Problem

James faces a challenge that many modernization leaders encounter: the original developer of CLM-INTAKE left MedClaim four years ago. The modification log shows names (JO for James Okafor, RK for an unknown developer), but the reasoning behind design decisions is lost.

For example, line 249 checks CI-DIAG-CODE(1:3) = 'U07'. Why the first three characters? COVID diagnosis codes are U07.1 (COVID-19, virus identified) and U07.2 (COVID-19, virus not identified). Checking three characters catches both codes. But it also catches any future code starting with U07 — is that intentional? Nobody knows.

James documents his assumption: "The U07 check is intentionally broad, catching any current or future U07.x code. If WHO assigns a non-COVID code in the U07 range, this check will need to be revised." By documenting the assumption, James ensures that the next developer who encounters this code understands the reasoning — or the lack thereof.
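The same assumption can also be made explicit in the code itself with an 88-level condition name. The fragment below is a hypothetical sketch (WS-DIAGNOSIS-PREFIX, WS-COVID-DIAGNOSIS, and 2900-APPLY-COVID-OVERRIDE are invented names, not part of the current program); it behaves identically to the (1:3) test but names the intent:

```cobol
      * HYPOTHETICAL SKETCH - NAMING THE PREFIX TEST DOCUMENTS THE
      * INTENT THAT ANY CODE IN THE U07 RANGE TRIGGERS THE OVERRIDE.
       01  WS-DIAGNOSIS-PREFIX      PIC X(03).
           88  WS-COVID-DIAGNOSIS       VALUE 'U07'.

      * IN THE PROCEDURE DIVISION:
           MOVE CI-DIAG-CODE(1:3) TO WS-DIAGNOSIS-PREFIX
           IF WS-COVID-DIAGNOSIS
               PERFORM 2900-APPLY-COVID-OVERRIDE
           END-IF
```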

This kind of archaeological inference — reading the code, forming hypotheses about intent, documenting assumptions — is a skill that separates competent legacy developers from exceptional ones. The code tells you what happens; documentation tells you why. When documentation is missing, you must reconstruct the "why" from the "what" and record it for future developers.


Phase 2: Refactoring for Modularity

Creating Standard Copybooks

The first concrete modernization step is copybook consolidation. James identifies 83 copybooks in the system, but many are duplicates or near-duplicates. After analysis, he reduces the canonical set to 34 copybooks and creates a naming standard:

Prefix  Meaning                 Example
CLM-    Claim records           CLMREC, CLMHIST
PRV-    Provider records        PRVREC, PRVADDR
MBR-    Member records          MBRREC, MBRELIG
PAY-    Payment records         PAYREC, PAYHIST
ERR-    Error/audit records     ERRREC, ERRCODE
WS-     Common working storage  WSDATE, WSCNTRS
RPT-    Report layouts          RPTHDR, RPTDTL

The new claim record copybook:

      *================================================================*
      * COPYBOOK: CLMREC                                                *
      * Insurance Claim Record Layout - Version 2.0                     *
      * Modernized: 2024-03-15 by James Okafor                        *
      * Previous:   Multiple inline definitions (legacy)               *
      *================================================================*
       01  CLAIM-RECORD.
           05  CLM-HEADER.
               10  CLM-CLAIM-ID        PIC X(15).
               10  CLM-RECEIVED-DATE   PIC 9(08).
               10  CLM-RECEIVED-TIME   PIC 9(06).
               10  CLM-SOURCE          PIC X(02).
                   88  CLM-FROM-ELECTRONIC VALUE 'EL'.
                   88  CLM-FROM-PAPER      VALUE 'PA'.
                   88  CLM-FROM-PHONE      VALUE 'PH'.
                   88  CLM-VALID-SOURCE    VALUE 'EL' 'PA' 'PH'.
           05  CLM-MEMBER-INFO.
               10  CLM-MEMBER-ID       PIC X(12).
               10  CLM-MEMBER-NAME     PIC X(30).
               10  CLM-PLAN-CODE       PIC X(04).
           05  CLM-PROVIDER-INFO.
               10  CLM-PROVIDER-ID     PIC X(10).
               10  CLM-PROVIDER-NAME   PIC X(30).
               10  CLM-PROVIDER-TYPE   PIC 9(02).
                   88  CLM-PROV-PHYSICIAN  VALUE 01.
                   88  CLM-PROV-HOSPITAL   VALUE 02.
                   88  CLM-PROV-LAB        VALUE 03.
                   88  CLM-PROV-PHARMACY   VALUE 04.
           05  CLM-SERVICE-INFO.
               10  CLM-SERVICE-DATE    PIC 9(08).
               10  CLM-DIAGNOSIS-CODE  PIC X(07).
               10  CLM-PROCEDURE-CODE  PIC X(05).
               10  CLM-MODIFIER        PIC X(02).
               10  CLM-PLACE-OF-SVC    PIC X(02).
                   88  CLM-EMERGENCY-ROOM  VALUE '23'.
               10  CLM-CLAIM-TYPE      PIC 9(01).
                   88  CLM-TYPE-STANDARD   VALUE 0.
                   88  CLM-TYPE-EMERGENCY  VALUE 1.
                   88  CLM-TYPE-MENTAL-HTH VALUE 2.
                   88  CLM-TYPE-DENTAL     VALUE 3.
                   88  CLM-TYPE-VISION     VALUE 4.
                   88  CLM-TYPE-EXEMPT     VALUE 7.
           05  CLM-FINANCIAL-INFO.
               10  CLM-CHARGED-AMOUNT  PIC S9(7)V99 COMP-3.
               10  CLM-ALLOWED-AMOUNT  PIC S9(7)V99 COMP-3.
               10  CLM-PAID-AMOUNT     PIC S9(7)V99 COMP-3.
               10  CLM-COPAY-AMOUNT    PIC S9(5)V99 COMP-3.
               10  CLM-DEDUCTIBLE-AMT  PIC S9(5)V99 COMP-3.
           05  CLM-AUTHORIZATION.
               10  CLM-AUTH-NUMBER     PIC X(12).
               10  CLM-AUTH-REQUIRED   PIC X(01).
                   88  CLM-NEEDS-AUTH      VALUE 'Y'.
                   88  CLM-NO-AUTH-NEEDED  VALUE 'N'.
           05  CLM-STATUS-INFO.
               10  CLM-STATUS          PIC X(02).
                   88  CLM-RECEIVED        VALUE 'RC'.
                   88  CLM-VALIDATED       VALUE 'VL'.
                   88  CLM-ADJUDICATED     VALUE 'AJ'.
                   88  CLM-PAID            VALUE 'PD'.
                   88  CLM-DENIED          VALUE 'DN'.
                   88  CLM-PENDING-REVIEW  VALUE 'PR'.
               10  CLM-REASON-CODE     PIC X(04).
           05  FILLER                  PIC X(20).

Compare this to the original inline definition. The modernized version:

  • Groups related fields logically
  • Uses 88-level conditions for every coded field
  • Uses signed COMP-3 for all monetary fields
  • Follows a consistent naming convention
  • Includes FILLER for future expansion
  • Documents its purpose, version, and history in comments
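The payoff of the 88-level conditions shows up in every program that copies CLMREC. A hypothetical fragment using only condition names defined in the copybook above:

```cobol
      * HYPOTHETICAL FRAGMENT - CONDITION NAMES FROM CLMREC REPLACE
      * MAGIC LITERALS, AND SET UPDATES CODED FIELDS WITHOUT MOVES.
           IF CLM-EMERGENCY-ROOM OR CLM-TYPE-EXEMPT
               SET CLM-NO-AUTH-NEEDED TO TRUE
           ELSE
               SET CLM-NEEDS-AUTH TO TRUE
           END-IF

           IF NOT CLM-VALID-SOURCE
               SET CLM-DENIED TO TRUE
           END-IF
```

If the claim source codes ever change, only the copybook's VALUE clauses change; the reading programs recompile without modification.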

Refactoring CLM-INTAKE

With the new copybook in place, James refactors CLM-INTAKE. The refactored version separates concerns into distinct paragraphs, uses the standard copybook, and follows modern COBOL conventions.

       IDENTIFICATION DIVISION.
       PROGRAM-ID. CLM-INTAKE.
      *================================================================*
      * Program:  CLM-INTAKE                                           *
      * Purpose:  Validate and route incoming insurance claims          *
      * Author:   Original unknown; refactored by James Okafor         *
      * Date:     Refactored 2024-03-20                                *
      * System:   MedClaim Insurance Processing                        *
      *================================================================*
      * Modification Log:                                               *
      * Date       Author  Description                                  *
      * ---------- ------- ------------------------------------------- *
      * 2024-03-20 JO      Refactored: copybooks, structured code,    *
      *                     audit trail, separated validation           *
      * 2024-03-25 JO      Added external rule table for auth check    *
      *================================================================*

       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT CLAIM-INPUT
               ASSIGN TO CLMIN
               ORGANIZATION IS SEQUENTIAL
               FILE STATUS IS WS-INPUT-STATUS.

           SELECT CLAIM-OUTPUT
               ASSIGN TO CLMOUT
               ORGANIZATION IS SEQUENTIAL
               FILE STATUS IS WS-OUTPUT-STATUS.

           SELECT ERROR-REPORT
               ASSIGN TO ERROUT
               ORGANIZATION IS SEQUENTIAL
               FILE STATUS IS WS-ERROR-STATUS.

           SELECT PROVIDER-FILE
               ASSIGN TO PROVFL
               ORGANIZATION IS INDEXED
               ACCESS MODE IS RANDOM
               RECORD KEY IS PRV-PROVIDER-ID
               FILE STATUS IS WS-PROV-STATUS.

           SELECT MEMBER-FILE
               ASSIGN TO MEMBFL
               ORGANIZATION IS INDEXED
               ACCESS MODE IS RANDOM
               RECORD KEY IS MBR-MEMBER-ID
               FILE STATUS IS WS-MEMB-STATUS.

           SELECT AUDIT-FILE
               ASSIGN TO AUDOUT
               ORGANIZATION IS SEQUENTIAL
               FILE STATUS IS WS-AUDIT-STATUS.

       DATA DIVISION.
       FILE SECTION.

       FD  CLAIM-INPUT
           RECORDING MODE IS F
           RECORD CONTAINS 200 CHARACTERS.
       COPY CLMREC.

       FD  CLAIM-OUTPUT
           RECORDING MODE IS F
           RECORD CONTAINS 200 CHARACTERS.
       01  CLAIM-OUTPUT-REC            PIC X(200).

       FD  ERROR-REPORT
           RECORDING MODE IS F
           RECORD CONTAINS 132 CHARACTERS.
       01  ERROR-REPORT-LINE           PIC X(132).

       FD  PROVIDER-FILE
           RECORD CONTAINS 100 CHARACTERS.
       COPY PRVREC.

       FD  MEMBER-FILE
           RECORD CONTAINS 100 CHARACTERS.
       COPY MBRREC.

       FD  AUDIT-FILE
           RECORDING MODE IS F
           RECORD CONTAINS 150 CHARACTERS.
       COPY ERRREC.

       WORKING-STORAGE SECTION.

       01  WS-FILE-STATUSES.
           05  WS-INPUT-STATUS        PIC X(02).
               88  WS-INPUT-OK           VALUE '00'.
               88  WS-INPUT-EOF          VALUE '10'.
           05  WS-OUTPUT-STATUS       PIC X(02).
               88  WS-OUTPUT-OK          VALUE '00'.
           05  WS-ERROR-STATUS        PIC X(02).
           05  WS-PROV-STATUS         PIC X(02).
               88  WS-PROV-OK            VALUE '00'.
               88  WS-PROV-NOT-FOUND     VALUE '23'.
           05  WS-MEMB-STATUS         PIC X(02).
               88  WS-MEMB-OK            VALUE '00'.
               88  WS-MEMB-NOT-FOUND     VALUE '23'.
           05  WS-AUDIT-STATUS        PIC X(02).

       01  WS-COUNTERS.
           05  WS-CLAIMS-READ         PIC 9(07) VALUE ZERO.
           05  WS-CLAIMS-ACCEPTED     PIC 9(07) VALUE ZERO.
           05  WS-CLAIMS-REJECTED     PIC 9(07) VALUE ZERO.

       01  WS-FLAGS.
           05  WS-EOF-FLAG            PIC X(01) VALUE 'N'.
               88  WS-END-OF-INPUT       VALUE 'Y'.
               88  WS-MORE-INPUT         VALUE 'N'.
           05  WS-CLAIM-VALID         PIC X(01).
               88  WS-CLAIM-IS-VALID     VALUE 'Y'.
               88  WS-CLAIM-IS-INVALID   VALUE 'N'.

       01  WS-REJECTION-INFO.
           05  WS-REJECT-CODE         PIC X(04).
           05  WS-REJECT-MESSAGE      PIC X(60).

       01  WS-DATE-WORK.
           05  WS-CURRENT-DATE        PIC 9(08).

       PROCEDURE DIVISION.

       0000-MAIN.
           PERFORM 1000-INITIALIZE
           PERFORM 2000-PROCESS-CLAIMS
               UNTIL WS-END-OF-INPUT
           PERFORM 3000-TERMINATE
           STOP RUN
           .

       1000-INITIALIZE.
           OPEN INPUT  CLAIM-INPUT
                       PROVIDER-FILE
                       MEMBER-FILE
           OPEN OUTPUT CLAIM-OUTPUT
                       ERROR-REPORT
                       AUDIT-FILE

           IF NOT WS-INPUT-OK
               DISPLAY 'FATAL: CANNOT OPEN CLAIM INPUT: '
                       WS-INPUT-STATUS
               MOVE 16 TO RETURN-CODE
               STOP RUN
           END-IF

           ACCEPT WS-CURRENT-DATE FROM DATE YYYYMMDD
           PERFORM 2100-READ-CLAIM
           .

       2000-PROCESS-CLAIMS.
           ADD 1 TO WS-CLAIMS-READ
           SET WS-CLAIM-IS-VALID TO TRUE
           MOVE SPACES TO WS-REJECT-CODE
           MOVE SPACES TO WS-REJECT-MESSAGE

           PERFORM 2200-VALIDATE-MEMBER
           IF WS-CLAIM-IS-VALID
               PERFORM 2300-VALIDATE-PROVIDER
           END-IF
           IF WS-CLAIM-IS-VALID
               PERFORM 2400-VALIDATE-SERVICE
           END-IF
           IF WS-CLAIM-IS-VALID
               PERFORM 2500-VALIDATE-AUTHORIZATION
           END-IF
           IF WS-CLAIM-IS-VALID
               PERFORM 2600-VALIDATE-FINANCIAL
           END-IF

           IF WS-CLAIM-IS-VALID
               PERFORM 2700-WRITE-ACCEPTED-CLAIM
           ELSE
               PERFORM 2800-WRITE-REJECTED-CLAIM
           END-IF

           PERFORM 2100-READ-CLAIM
           .

       2100-READ-CLAIM.
           READ CLAIM-INPUT
           EVALUATE TRUE
               WHEN WS-INPUT-OK
                   CONTINUE
               WHEN WS-INPUT-EOF
                   SET WS-END-OF-INPUT TO TRUE
               WHEN OTHER
                   DISPLAY 'READ ERROR: ' WS-INPUT-STATUS
                   SET WS-END-OF-INPUT TO TRUE
           END-EVALUATE
           .

       2200-VALIDATE-MEMBER.
           MOVE CLM-MEMBER-ID TO MBR-MEMBER-ID
           READ MEMBER-FILE
           EVALUATE TRUE
               WHEN WS-MEMB-OK
                   IF MBR-TERM-DATE < CLM-SERVICE-DATE
                      OR MBR-EFF-DATE > CLM-SERVICE-DATE
                       SET WS-CLAIM-IS-INVALID TO TRUE
                       MOVE 'MCOV' TO WS-REJECT-CODE
                       MOVE 'SERVICE DATE OUTSIDE COVERAGE PERIOD'
                           TO WS-REJECT-MESSAGE
                   END-IF
               WHEN WS-MEMB-NOT-FOUND
                   SET WS-CLAIM-IS-INVALID TO TRUE
                   MOVE 'MNFD' TO WS-REJECT-CODE
                   MOVE 'MEMBER NOT FOUND' TO WS-REJECT-MESSAGE
               WHEN OTHER
                   SET WS-CLAIM-IS-INVALID TO TRUE
                   MOVE 'MERR' TO WS-REJECT-CODE
                   MOVE 'MEMBER FILE READ ERROR'
                       TO WS-REJECT-MESSAGE
           END-EVALUATE
           .

       2300-VALIDATE-PROVIDER.
           MOVE CLM-PROVIDER-ID TO PRV-PROVIDER-ID
           READ PROVIDER-FILE
           EVALUATE TRUE
               WHEN WS-PROV-OK
                   IF NOT PRV-ACTIVE
                       SET WS-CLAIM-IS-INVALID TO TRUE
                       MOVE 'PINA' TO WS-REJECT-CODE
                       MOVE 'PROVIDER NOT IN ACTIVE STATUS'
                           TO WS-REJECT-MESSAGE
                   END-IF
               WHEN WS-PROV-NOT-FOUND
                   SET WS-CLAIM-IS-INVALID TO TRUE
                   MOVE 'PNFD' TO WS-REJECT-CODE
                   MOVE 'PROVIDER NOT FOUND'
                       TO WS-REJECT-MESSAGE
               WHEN OTHER
                   SET WS-CLAIM-IS-INVALID TO TRUE
                   MOVE 'PERR' TO WS-REJECT-CODE
                   MOVE 'PROVIDER FILE READ ERROR'
                       TO WS-REJECT-MESSAGE
           END-EVALUATE
           .

       2400-VALIDATE-SERVICE.
           IF CLM-SERVICE-DATE > WS-CURRENT-DATE
               SET WS-CLAIM-IS-INVALID TO TRUE
               MOVE 'SFUT' TO WS-REJECT-CODE
               MOVE 'SERVICE DATE IN THE FUTURE'
                   TO WS-REJECT-MESSAGE
           END-IF
           .

       2500-VALIDATE-AUTHORIZATION.
           IF NOT CLM-EMERGENCY-ROOM
              AND NOT CLM-TYPE-EXEMPT
              AND CLM-AUTH-NUMBER = SPACES
               SET WS-CLAIM-IS-INVALID TO TRUE
               MOVE 'AUTH' TO WS-REJECT-CODE
               MOVE 'AUTHORIZATION REQUIRED FOR THIS SERVICE'
                   TO WS-REJECT-MESSAGE
           END-IF
           .

       2600-VALIDATE-FINANCIAL.
           IF CLM-CHARGED-AMOUNT NOT > ZERO
               SET WS-CLAIM-IS-INVALID TO TRUE
               MOVE 'IAMT' TO WS-REJECT-CODE
               MOVE 'INVALID CLAIM AMOUNT'
                   TO WS-REJECT-MESSAGE
           END-IF
           .

       2700-WRITE-ACCEPTED-CLAIM.
           SET CLM-VALIDATED TO TRUE
           WRITE CLAIM-OUTPUT-REC FROM CLAIM-RECORD
           IF WS-OUTPUT-OK
               ADD 1 TO WS-CLAIMS-ACCEPTED
           ELSE
               DISPLAY 'WRITE ERROR ON OUTPUT: ' WS-OUTPUT-STATUS
           END-IF
           .

       2800-WRITE-REJECTED-CLAIM.
           SET CLM-DENIED TO TRUE
           MOVE WS-REJECT-CODE TO CLM-REASON-CODE
           ADD 1 TO WS-CLAIMS-REJECTED
           PERFORM 2810-WRITE-AUDIT
           PERFORM 2820-WRITE-ERROR-REPORT
           .

       2810-WRITE-AUDIT.
           INITIALIZE AUDIT-RECORD
           MOVE CLM-CLAIM-ID      TO AUD-CLAIM-ID
           MOVE CLM-MEMBER-ID     TO AUD-MEMBER-ID
           MOVE WS-REJECT-CODE    TO AUD-REJECT-CODE
           MOVE WS-REJECT-MESSAGE TO AUD-REJECT-DESC
           MOVE WS-CURRENT-DATE   TO AUD-DATE
           WRITE AUDIT-RECORD
           .

       2820-WRITE-ERROR-REPORT.
           INITIALIZE ERROR-REPORT-LINE
           STRING CLM-CLAIM-ID    DELIMITED BY SPACES
                  ' | '           DELIMITED BY SIZE
                  CLM-MEMBER-ID   DELIMITED BY SPACES
                  ' | '           DELIMITED BY SIZE
                  WS-REJECT-CODE  DELIMITED BY SIZE
                  ' | '           DELIMITED BY SIZE
                   WS-REJECT-MESSAGE DELIMITED BY SIZE
                  INTO ERROR-REPORT-LINE
           END-STRING
           WRITE ERROR-REPORT-LINE
           .

       3000-TERMINATE.
           DISPLAY '======================================'
           DISPLAY 'CLM-INTAKE PROCESSING COMPLETE'
           DISPLAY '======================================'
           DISPLAY 'CLAIMS READ:     ' WS-CLAIMS-READ
           DISPLAY 'CLAIMS ACCEPTED: ' WS-CLAIMS-ACCEPTED
           DISPLAY 'CLAIMS REJECTED: ' WS-CLAIMS-REJECTED
           DISPLAY '======================================'

           CLOSE CLAIM-INPUT
                 CLAIM-OUTPUT
                 ERROR-REPORT
                 PROVIDER-FILE
                 MEMBER-FILE
                 AUDIT-FILE

           IF WS-CLAIMS-REJECTED > ZERO
               MOVE 4 TO RETURN-CODE
           ELSE
               MOVE 0 TO RETURN-CODE
           END-IF
           .

What changed and why:

  1. Copybooks replace inline definitions. CLMREC, PRVREC, MBRREC, and ERRREC are now shared across all programs.
  2. Meaningful file status variables. WS-INPUT-STATUS with 88-levels replaces WS-FS1.
  3. Separated validation paragraphs. Each validation concern has its own paragraph. This makes testing possible — you can test member validation independently from provider validation.
  4. Structured rejection handling. A reject code and message are captured for every rejection, then written to both an audit file and an error report.
  5. No GO TO. The cascading IF pattern replaces the GO TO 2000-EXIT pattern. Both achieve the same result — skipping later validations after a failure — but the structured version is easier to trace.
  6. Audit trail added. Every rejected claim now generates an audit record, enabling systematic analysis of rejection patterns.
  7. Authorization logic uses 88-levels. Instead of checking CI-PLACE-SVC NOT = '23' and CI-CLM-TYPE NOT = 7, the code uses NOT CLM-EMERGENCY-ROOM and NOT CLM-TYPE-EXEMPT.

Theme: Readability is a Feature. Compare the two versions of the authorization check:

Legacy: IF CI-PLACE-SVC NOT = '23' ... IF CI-CLM-TYPE NOT = 7

Refactored: IF NOT CLM-EMERGENCY-ROOM AND NOT CLM-TYPE-EXEMPT

The refactored version reads like a business rule: "If this is not an emergency room visit and not an exempt claim type, authorization is required." A business analyst can verify that the code matches the business requirement. The legacy version requires the reader to know that '23' means emergency room and 7 means exempt.
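The same principle carries over to any language. As an illustration outside COBOL, this hedged Python sketch (field names are hypothetical) plays the role of the 88-levels: the magic values '23' and 7 appear in exactly one place each, behind names a business analyst can read.

```python
# Hypothetical Python rendering of the refactored authorization rule.
# The named predicates stand in for COBOL 88-level conditions.

EMERGENCY_ROOM_PLACE_OF_SERVICE = "23"
EXEMPT_CLAIM_TYPE = 7

def is_emergency_room(claim: dict) -> bool:
    return claim["place_of_service"] == EMERGENCY_ROOM_PLACE_OF_SERVICE

def is_type_exempt(claim: dict) -> bool:
    return claim["claim_type"] == EXEMPT_CLAIM_TYPE

def validate_authorization(claim: dict):
    """Mirror 2500-VALIDATE-AUTHORIZATION: (ok, reject_code, reject_message)."""
    if (not is_emergency_room(claim)
            and not is_type_exempt(claim)
            and claim["auth_number"].strip() == ""):
        return (False, "AUTH", "AUTHORIZATION REQUIRED FOR THIS SERVICE")
    return (True, "", "")
```

As in the COBOL version, the condition reads as the business rule itself: not an ER visit, not an exempt type, and no authorization number means the claim is rejected.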

Extracting Validation Subprograms

James takes the refactoring one step further. The validation logic in CLM-INTAKE is also needed by other programs (CLM-ADJUD uses the same member and provider checks). Instead of duplicating the code, James extracts the validation logic into a called subprogram.

       IDENTIFICATION DIVISION.
       PROGRAM-ID. CLM-VALID.
       *================================================================*
       * Program:   CLM-VALID                                           *
       * Purpose:   Centralized claim validation subprogram             *
       * Called by: CLM-INTAKE, CLM-ADJUD, CLM-AMEND                    *
       * Author:    James Okafor                                        *
       * Date:      2024-04-01                                          *
       *================================================================*

       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-INTERNAL-FLAGS.
           05  WS-PROV-STATUS         PIC X(02).
               88  WS-PROV-FOUND         VALUE '00'.
           05  WS-MEMB-STATUS         PIC X(02).
               88  WS-MEMB-FOUND         VALUE '00'.

       LINKAGE SECTION.
       01  LS-VALIDATION-REQUEST.
           05  LS-FUNCTION            PIC X(02).
               88  LS-VALIDATE-MEMBER    VALUE 'VM'.
               88  LS-VALIDATE-PROVIDER  VALUE 'VP'.
               88  LS-VALIDATE-AUTH      VALUE 'VA'.
               88  LS-VALIDATE-ALL       VALUE 'AL'.
           05  LS-MEMBER-ID           PIC X(12).
           05  LS-PROVIDER-ID         PIC X(10).
           05  LS-SERVICE-DATE        PIC 9(08).
           05  LS-PLACE-OF-SERVICE    PIC X(02).
           05  LS-CLAIM-TYPE          PIC 9(01).
           05  LS-AUTH-NUMBER         PIC X(12).
           05  LS-RESULT.
               10  LS-VALID-FLAG      PIC X(01).
                   88  LS-IS-VALID        VALUE 'Y'.
                   88  LS-IS-INVALID      VALUE 'N'.
               10  LS-REJECT-CODE     PIC X(04).
               10  LS-REJECT-MSG      PIC X(60).

       PROCEDURE DIVISION USING LS-VALIDATION-REQUEST.

       0000-MAIN.
           SET LS-IS-VALID TO TRUE
           MOVE SPACES TO LS-REJECT-CODE
           MOVE SPACES TO LS-REJECT-MSG

           EVALUATE TRUE
               WHEN LS-VALIDATE-MEMBER
                   PERFORM 1000-CHECK-MEMBER
               WHEN LS-VALIDATE-PROVIDER
                   PERFORM 2000-CHECK-PROVIDER
               WHEN LS-VALIDATE-AUTH
                   PERFORM 3000-CHECK-AUTH
               WHEN LS-VALIDATE-ALL
                   PERFORM 1000-CHECK-MEMBER
                   IF LS-IS-VALID
                       PERFORM 2000-CHECK-PROVIDER
                   END-IF
                   IF LS-IS-VALID
                       PERFORM 3000-CHECK-AUTH
                   END-IF
               WHEN OTHER
                   SET LS-IS-INVALID TO TRUE
                   MOVE 'IFNC' TO LS-REJECT-CODE
                   MOVE 'INVALID VALIDATION FUNCTION'
                       TO LS-REJECT-MSG
           END-EVALUATE

           GOBACK
           .

       1000-CHECK-MEMBER.
      *    Member validation logic here
      *    (reads MEMBER-FILE via CICS or VSAM as appropriate)
           CONTINUE
           .

       2000-CHECK-PROVIDER.
      *    Provider validation logic here
           CONTINUE
           .

       3000-CHECK-AUTH.
      *    Authorization validation logic here
           IF LS-PLACE-OF-SERVICE NOT = '23'
              AND LS-CLAIM-TYPE NOT = 7
              AND LS-AUTH-NUMBER = SPACES
               SET LS-IS-INVALID TO TRUE
               MOVE 'AUTH' TO LS-REJECT-CODE
               MOVE 'AUTHORIZATION REQUIRED'
                   TO LS-REJECT-MSG
           END-IF
           .

Now CLM-INTAKE, CLM-ADJUD, and CLM-AMEND all call CLM-VALID instead of implementing their own validation. When a validation rule changes, it changes in one place.
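The subprogram's dispatch shape can be sketched outside COBOL as well. This hypothetical Python rendering mirrors the EVALUATE in 0000-MAIN — the function codes 'VM'/'VP'/'VA'/'AL' come from the LINKAGE SECTION above, and the member and provider checks are stubbed, just as they are in CLM-VALID itself:

```python
# Sketch of CLM-VALID's dispatch. Each check returns None on success or a
# (reject_code, reject_message) tuple on failure, mirroring LS-RESULT.

def check_member(req):    # stub for 1000-CHECK-MEMBER
    return None

def check_provider(req):  # stub for 2000-CHECK-PROVIDER
    return None

def check_auth(req):      # 3000-CHECK-AUTH
    if (req["place_of_service"] != "23" and req["claim_type"] != 7
            and not req["auth_number"].strip()):
        return ("AUTH", "AUTHORIZATION REQUIRED")
    return None

def validate(function_code: str, req: dict):
    """Dispatch like the EVALUATE; VALIDATE-ALL short-circuits on first failure."""
    checks = {"VM": [check_member], "VP": [check_provider], "VA": [check_auth],
              "AL": [check_member, check_provider, check_auth]}
    if function_code not in checks:
        return ("N", "IFNC", "INVALID VALIDATION FUNCTION")
    for check in checks[function_code]:
        result = check(req)
        if result is not None:      # first failure wins; later checks skipped
            return ("N", *result)
    return ("Y", "", "")
```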

🔗 Theme: The Modernization Spectrum. Notice that James has not changed the system's architecture. Programs still read the same files, produce the same outputs, and run in the same JCL. He has changed the internal structure — extracting copybooks, separating concerns, creating reusable subprograms — without changing any external interface. This is the least risky form of modernization: improving the code without changing what it does.


The Refactoring Process in Detail

Step-by-Step Refactoring Methodology

James establishes a disciplined refactoring methodology that he requires every team member to follow. Undisciplined refactoring — changing code without a systematic process — is more dangerous than leaving the code alone.

Step 1: Baseline. Before changing any program, create a baseline: capture the program's output for a known set of inputs. This output becomes the "expected result" for all future comparisons.

//*--- CREATE BASELINE OUTPUT ---
//BASELN   EXEC PGM=CLMINTAK
//STEPLIB  DD  DSN=MEDCLAIM.PROD.LOADLIB,DISP=SHR
//CLMIN    DD  DSN=MEDCLAIM.TEST.CLAIMS,DISP=SHR
//CLMOUT   DD  DSN=MEDCLAIM.TEST.BASELINE.CLMOUT,
//             DISP=(NEW,CATLG),
//             SPACE=(CYL,(1,1)),
//             DCB=(RECFM=FB,LRECL=200)
//ERROUT   DD  DSN=MEDCLAIM.TEST.BASELINE.ERRORS,
//             DISP=(NEW,CATLG),
//             SPACE=(TRK,(5,1)),
//             DCB=(RECFM=FB,LRECL=132)
//PROVFL   DD  DSN=MEDCLAIM.TEST.PROVIDER,DISP=SHR
//MEMBFL   DD  DSN=MEDCLAIM.TEST.MEMBER,DISP=SHR
//SYSOUT   DD  SYSOUT=*

Step 2: Understand. Read the program from top to bottom. Document every paragraph's purpose. Identify the data flow: which files are inputs, which are outputs, what working storage is shared between paragraphs. Create the three documentation artifacts (program spec, data flow diagram, business rule catalog).

Step 3: Test harness. Create a JCL procedure that runs the program against the test data and compares the output to the baseline. This is your safety net — after every change, you run this procedure. If the output changes, you have introduced a bug.

//*--- COMPARE REFACTORED OUTPUT TO BASELINE ---
//COMPARE  EXEC PGM=IEBCOMPR
//SYSPRINT DD  SYSOUT=*
//SYSUT1   DD  DSN=MEDCLAIM.TEST.BASELINE.CLMOUT,DISP=SHR
//SYSUT2   DD  DSN=MEDCLAIM.TEST.REFACTOR.CLMOUT,DISP=SHR
//SYSIN    DD  *
  COMPARE TYPORG=PS
/*

Step 4: Refactor in small steps. Each refactoring change should be small enough that if the comparison test fails, you can identify what broke within minutes. Typical steps:

  1. Replace inline record definitions with COPY statements (recompile, compare)
  2. Replace numeric file status variables with named 88-levels (recompile, compare)
  3. Extract one validation concern into its own paragraph (recompile, compare)
  4. Replace GO TO with structured alternative (recompile, compare)
  5. Add audit trail writing (recompile, compare — output will differ, verify new output is correct)

Each step produces a compilable, testable program. If any step breaks the comparison, you revert that single step and investigate.

Step 5: Parallel run. After all refactoring is complete, run both the legacy and refactored versions against the full test dataset (not just the small test file — the production-volume test data). Compare outputs byte-for-byte. Only when the parallel run shows zero differences is the refactored version promoted to production.
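Off the mainframe — for example, when exercising GnuCOBOL ports of these programs — the same byte-for-byte comparison can be scripted. A minimal sketch (file paths are illustrative), following IEBCOMPR's RC=0 / RC=8 convention:

```python
# Minimal comparison harness: return 0 if the two output files match
# byte-for-byte, otherwise report the first difference and return 8.

def compare_outputs(baseline_path: str, candidate_path: str) -> int:
    with open(baseline_path, "rb") as f:
        baseline = f.read()
    with open(candidate_path, "rb") as f:
        candidate = f.read()
    if baseline == candidate:
        return 0
    for offset in range(min(len(baseline), len(candidate))):
        if baseline[offset] != candidate[offset]:
            print(f"First difference at byte offset {offset}")
            return 8
    print(f"Files differ in length: {len(baseline)} vs {len(candidate)}")
    return 8
```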

The GO TO Debate

James expects pushback on eliminating GO TO statements, and he gets it. One of MedClaim's senior developers argues: "GO TO 2000-EXIT is a perfectly valid pattern. Every COBOL programmer recognizes it. Changing it just to satisfy a textbook rule introduces risk for no benefit."

James acknowledges the point. "You're right that GO TO 2000-EXIT is a recognized pattern. But here's the problem: when a paragraph has eight GO TO 2000-EXIT statements scattered through twenty validation checks, the control flow becomes hard to trace. Which checks were skipped? Did we skip the right ones? The cascading-IF pattern makes this explicit."

He demonstrates with a concrete example. The legacy authorization check:

      * Legacy: Three levels of nesting with GO TO exit
           IF CI-PLACE-SVC NOT = '23'
               IF CI-AUTH-NUM = SPACES
                   IF CI-CLM-TYPE NOT = 7
                       MOVE 'AUTH REQUIRED' TO WS-ERR-MSG
                       PERFORM 8000-ERR
                       GO TO 2000-EXIT
                   END-IF
               END-IF
           END-IF.

The refactored version:

      * Refactored: Flat condition with 88-levels
           IF NOT CLM-EMERGENCY-ROOM
              AND NOT CLM-TYPE-EXEMPT
              AND CLM-AUTH-NUMBER = SPACES
               SET WS-CLAIM-IS-INVALID TO TRUE
               MOVE 'AUTH' TO WS-REJECT-CODE
               MOVE 'AUTHORIZATION REQUIRED FOR THIS SERVICE'
                   TO WS-REJECT-MESSAGE
           END-IF

The refactored version is easier to read, easier to verify against the business requirement, and produces the same result. The GO TO is not needed because the validation flag (WS-CLAIM-IS-INVALID) controls whether subsequent validations are executed.

"I'm not against GO TO on principle," James clarifies. "I'm against GO TO when there's a clearer alternative. In this case, there is."

📊 Measuring Refactoring Impact. After refactoring CLM-INTAKE, James measures two things: (1) the time it takes a new developer to understand the program (before: ~3 days; after: ~4 hours), and (2) the time it takes to make a typical change (before: ~2 days including testing; after: ~3 hours). These are rough, informal measurements, but they make the economic value of readability concrete.

Copybook Migration Strategy

Migrating from inline record definitions to shared copybooks is the single most impactful refactoring in MedClaim's modernization. But it must be done carefully.

The Migration Paradox: You cannot change 47 programs simultaneously. But if you change only some programs to use the new copybook while others still use inline definitions, you create a temporary inconsistency. If the new copybook differs from any inline definition — even by one byte — the programs will interpret data differently.

James solves this with a three-phase approach:

Phase A: Create copybooks that match existing definitions exactly. The first version of CLMREC must produce exactly the same record layout as the inline definitions it replaces. No field name changes, no type changes, no field additions. This is purely a structural change.

Phase B: Migrate programs one at a time. Replace each program's inline definition with COPY CLMREC. Recompile. Run the comparison test. If the output matches the baseline, the migration is successful. Move to the next program.

Phase C: Improve the copybook. Once all programs use the shared copybook, improve it: add 88-level conditions, change field names to follow the naming convention, add FILLER. Then recompile all programs and run the full regression suite.

This approach ensures that at no point do programs disagree about the record layout. Phase A is invisible to the programs (same layout, different source). Phase B is invisible to the data (same layout, same programs). Phase C changes the layout but all programs change simultaneously.
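Phase A's "match exactly" requirement can be checked mechanically rather than by eye. A hedged sketch in Python — the field names and lengths here are illustrative, and a real check would be driven by the compiler's data-map listing rather than hand-typed layouts:

```python
# Sketch: verify that a candidate copybook layout is byte-identical to the
# inline definition it replaces. A layout is a list of (field_name, length)
# pairs; only offsets and lengths matter for Phase A, not the names.

def layout_offsets(layout):
    """Yield (offset, length) for each field in declaration order."""
    offset = 0
    for _name, length in layout:
        yield (offset, length)
        offset += length

def layouts_match(inline, copybook) -> bool:
    return list(layout_offsets(inline)) == list(layout_offsets(copybook))

# Illustrative: same byte layout, different (renamed) fields -> still a match.
inline_claim = [("CI-CLAIM-ID", 15), ("CI-MEMBER-ID", 12), ("CI-PROV-ID", 10)]
clmrec_v1    = [("CLM-CLAIM-ID", 15), ("CLM-MEMBER-ID", 12), ("CLM-PROVIDER-ID", 10)]
```

Note that two layouts with the same total length can still disagree byte-for-byte if any intermediate field length differs — exactly the one-byte drift the Migration Paradox warns about.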

⚠️ The Recompile Rule. When a copybook changes, every program that COPYs it must be recompiled. James creates a cross-reference file that maps every copybook to every program that uses it:

CLMREC    -> CLM-INTAKE, CLM-ADJUD, CLM-PAY, CLM-AMEND,
             CLM-REVERSE, CLM-EXTRACT, CLM-API
PRVREC    -> CLM-INTAKE, CLM-ADJUD, PRV-MAINT, PRV-REPORT
MBRREC    -> CLM-INTAKE, CLM-ADJUD, MBR-MAINT, MBR-REPORT,
             MBR-ELIG

This cross-reference is maintained manually — and James acknowledges this is a weakness. "In an ideal world, we'd have automated dependency tracking. That's on the roadmap for Phase 5."
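Until automated dependency tracking arrives, even the manual cross-reference can be bootstrapped by scanning source for COPY statements. A rough Python sketch — the program names and source text are illustrative, and a production scanner would need to skip comment lines and handle COPY … REPLACING:

```python
import re

# Sketch: build a copybook -> programs cross-reference by scanning COBOL
# source for COPY statements. Real source would come from PDS members;
# here the sources are in-memory strings for illustration.

COPY_STMT = re.compile(r"\bCOPY\s+([A-Z0-9-]+)", re.IGNORECASE)

def build_xref(sources: dict) -> dict:
    """sources: program name -> source text. Returns copybook -> sorted programs."""
    xref = {}
    for program, text in sources.items():
        for copybook in COPY_STMT.findall(text):
            xref.setdefault(copybook.upper(), set()).add(program)
    return {book: sorted(progs) for book, progs in xref.items()}
```

Given such a cross-reference, the Recompile Rule becomes a lookup: when CLMREC changes, the list of programs to recompile is `xref["CLMREC"]`.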


Testing During Refactoring

The Comparison Testing Framework

James's refactoring strategy is built on a simple but powerful principle: the refactored program must produce byte-for-byte identical output to the legacy program when given the same input. This principle transforms refactoring from a risky activity into a safe, repeatable process.

The comparison testing framework works as follows:

  1. Capture baseline output. Run the legacy program against a standardized test dataset and save all output files (valid claims, error reports, audit records, counters).

  2. Run refactored program. Run the refactored program against the same test dataset and save all output files.

  3. Compare outputs. Use the z/OS IEBCOMPR utility (or the open-source diff command in GnuCOBOL environments) to compare the outputs byte-by-byte.

  4. Investigate differences. Any difference — even a single byte — must be investigated and explained. Some differences are intentional (e.g., the refactored version writes more detailed audit records). These are documented and approved by Sarah Kim. Unintentional differences indicate a bug in the refactoring.

//COMPARE  EXEC PGM=IEBCOMPR
//SYSPRINT DD  SYSOUT=*
//SYSUT1   DD  DSN=MEDCLAIM.TEST.LEGACY.OUTPUT,DISP=SHR
//SYSUT2   DD  DSN=MEDCLAIM.TEST.REFACT.OUTPUT,DISP=SHR
//SYSIN    DD  DUMMY

If IEBCOMPR returns RC=0, the outputs are identical. If it returns RC=8, there are differences, and the SYSPRINT will show the location of the first difference.

Building the Test Dataset

The test dataset is a critical artifact. It must cover every business rule and edge case in the program. James builds it from three sources:

1. Production sampling. James extracts a representative sample of 10,000 claims from the previous month's production data. This sample includes the full spectrum of claim types, provider types, member statuses, and error conditions that occur in real production.

2. Edge cases. James adds synthetic records that test specific edge cases:

Test Case                                              Purpose
-----------------------------------------------------  ---------------------------------------------
Service date = effective date                          Boundary test for date coverage
Service date = termination date                        Boundary test for date coverage
Service date = termination date + 1                    Should reject (one day after coverage ends)
Claim amount = $0.00                                   Should reject (zero amount)
Claim amount = $99,999.99                              Maximum valid amount
Provider status = 'S' (suspended)                      Should reject
Emergency room + no auth                               Should accept (ER exemption)
COVID diagnosis code U07.1                             Should trigger COVID override
COVID diagnosis code U07.2                             Should trigger COVID override
Member ID not in file                                  Should reject (member not found)
Provider ID not in file                                Should reject (provider not found)
Duplicate claim (same member, date, provider, amount)  Should accept (duplicate check is downstream)

3. Regression captures. Every production bug that has been fixed in the past 5 years is added as a test case. If bug #4521 was caused by a claim with a hyphenated diagnosis code, there is now a test record with a hyphenated diagnosis code. This ensures that the refactoring does not reintroduce previously fixed bugs.

Sarah Kim reviews the test dataset against the business rule catalog (from Phase 1) and verifies that every cataloged rule has at least one test case that exercises it. Rules ADJ-001 through ADJ-005 all have corresponding test records.
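That review can itself be mechanized: every cataloged rule must map to at least one test record. A small sketch, assuming each test case is tagged with the rule IDs it exercises (the tagging scheme here is hypothetical):

```python
# Sketch: verify every cataloged business rule has at least one test case.
# Rule IDs follow the ADJ-nnn convention from the business rule catalog;
# the test-case records and their tags are illustrative.

def uncovered_rules(rule_catalog, test_cases) -> list:
    """Return cataloged rule IDs not exercised by any test case."""
    covered = {rule for case in test_cases for rule in case["rules"]}
    return sorted(set(rule_catalog) - covered)

catalog = ["ADJ-001", "ADJ-002", "ADJ-003", "ADJ-004", "ADJ-005"]
cases = [
    {"id": "TC-01", "rules": ["ADJ-001", "ADJ-002"]},
    {"id": "TC-02", "rules": ["ADJ-003"]},
    {"id": "TC-03", "rules": ["ADJ-005"]},
]
```

Running the check on the illustrative data above would flag ADJ-004 as uncovered — the signal to add a test record before refactoring touches that rule.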

The Refactoring Safety Net

The comparison testing framework creates a safety net for refactoring. Each refactoring step follows this cycle:

  1. Make a single structural change (e.g., replace inline record definition with copybook)
  2. Recompile
  3. Run against test dataset
  4. Compare outputs
  5. If identical: commit the change, move to the next refactoring step
  6. If different: investigate, fix, and repeat step 3

This cycle means that at every point during the refactoring, the team has a working program that produces correct output. If the refactoring is interrupted (by a production emergency, a budget cut, or a team member leaving), the most recent committed version is safe to deploy.

James tracks the refactoring progress in a spreadsheet that lists every planned change, its status (planned/in-progress/tested/committed), and the comparison test result. After eight weeks of refactoring, CLM-INTAKE has undergone 23 structural changes, each individually tested and committed:

Change #  Description                                               Test Result
--------  --------------------------------------------------------  ------------------------------
1         Replace inline claim definition with COPY CLMREC          Identical
2         Replace inline provider definition with COPY PRVREC       Identical
3         Replace inline member definition with COPY MBRREC         Identical
4         Rename WS-FS1 through WS-FS5 to meaningful names          Identical
5         Add 88-level conditions to file status fields             Identical
6         Replace GO TO 2000-EXIT with cascading IF                 Identical
7         Separate member validation into 2200-VALIDATE-MEMBER      Identical
8         Separate provider validation into 2300-VALIDATE-PROVIDER  Identical
...       ...                                                       ...
22        Add audit trail WRITE for rejected claims                 Audit records added (approved)
23        Add return code setting based on rejection count          RC differs (approved)

Changes 22 and 23 produce different outputs by design — the refactored version generates audit records and return codes that the legacy version did not. Sarah Kim reviewed and approved these differences as intentional improvements.

🔵 Theme: Readability is a Feature. The comparison testing framework ensures that readability improvements (renaming variables, restructuring paragraphs, adding comments) do not accidentally change behavior. This is the foundation of safe refactoring: change the structure without changing the meaning.


Understanding the COBOL Compilation Pipeline

A key concept in MedClaim's modernization is understanding how COBOL programs move from source code to running production code. Many modernization mistakes — particularly the "forgot to rebind" problem with DB2 — stem from not understanding this pipeline.

From Source to Load Module

The compilation pipeline for a standard COBOL program (without DB2) has three stages:

Source Code → [COBOL Compiler (IGYCRCTL)] → Object Module
Object Module → [Linkage Editor (IEWL)] → Load Module
Load Module → Production Load Library → Execution

The COBOL Compiler translates COBOL source into machine-language object code. It resolves COPY statements (inserting copybook contents), checks syntax, and generates diagnostics. The output is an object module — machine code that references external programs by name but does not contain their code.

The Linkage Editor resolves external references. If CLM-INTAKE calls CLM-VALID, the linkage editor finds CLM-VALID's object module and combines them into a single load module (or creates a reference for dynamic calls). The output is a load module — an executable program ready to run.

The Load Library stores load modules. When JCL specifies EXEC PGM=CLM-INTAKE, the operating system finds the load module in the load library and executes it.

The DB2 Compilation Pipeline

When a COBOL program contains EXEC SQL statements, the pipeline adds two stages:

Source Code → [DB2 Precompiler (DSNHPC)] → Modified Source + DBRM
Modified Source → [COBOL Compiler] → Object Module
Object Module → [Linkage Editor] → Load Module
DBRM → [BIND PLAN/PACKAGE (DSN)] → DB2 Plan (in DB2 catalog)

The DB2 Precompiler is the critical addition. It:

  1. Extracts every EXEC SQL block from the source code
  2. Replaces each block with a CALL to the DB2 interface module (DSNHLI)
  3. Creates a DBRM (Database Request Module) containing the extracted SQL
  4. Generates host variable declarations

The DBRM is then BOUND to create a DB2 plan or package. The BIND process:

  1. Validates the SQL syntax against the DB2 catalog (do the tables and columns exist?)
  2. Optimizes the SQL (choosing access paths, index usage, join strategies)
  3. Stores the optimized access plan in the DB2 catalog

The load module and the DB2 plan must be in sync. The precompiler embeds a consistency token (timestamp) in both the DBRM and the modified source. At runtime, DB2 compares the token in the load module against the token in the plan. If they do not match (SQLCODE -818), the program ABENDs.

This is why the "forgot to rebind" mistake is so insidious: the program compiles successfully, the linkage editor succeeds, and the load module looks correct. But at runtime, DB2 detects the timestamp mismatch and the program fails. The fix is to always include the BIND step in the compilation JCL:

//*-----------------------------------------------------------
//* COMPILE, LINK, AND BIND - COMPLETE PIPELINE
//*-----------------------------------------------------------
//PRECOMP EXEC PGM=DSNHPC
//DBRMLIB DD DSN=MEDCLAIM.DBRM(CLMINTAK),DISP=SHR
//SYSIN   DD DSN=MEDCLAIM.SOURCE(CLMINTAK),DISP=SHR
//SYSLIB  DD DSN=MEDCLAIM.COPYLIB,DISP=SHR
//SYSPRINT DD SYSOUT=*
//SYSCIN  DD DSN=&&MODIFIED,DISP=(,PASS),UNIT=SYSDA,
//           SPACE=(CYL,(1,1))
//*
//COMPILE EXEC PGM=IGYCRCTL,COND=(4,LT)
//SYSLIB  DD DSN=MEDCLAIM.COPYLIB,DISP=SHR
//SYSIN   DD DSN=&&MODIFIED,DISP=(OLD,DELETE)
//SYSLIN  DD DSN=&&OBJECT,DISP=(,PASS),UNIT=SYSDA,
//           SPACE=(CYL,(1,1))
//SYSPRINT DD SYSOUT=*
//*
//LINK    EXEC PGM=IEWL,COND=(4,LT)
//SYSLIN  DD DSN=&&OBJECT,DISP=(OLD,DELETE)
//SYSLMOD DD DSN=MEDCLAIM.LOADLIB(CLMINTAK),DISP=SHR
//SYSPRINT DD SYSOUT=*
//*
//BIND    EXEC PGM=IKJEFT01,COND=(4,LT)
//SYSTSPRT DD SYSOUT=*
//DBRMLIB DD DSN=MEDCLAIM.DBRM,DISP=SHR
//SYSTSIN DD *
  DSN SYSTEM(DSN1)
  BIND PACKAGE(MEDCLAIM) -
       MEMBER(CLMINTAK) -
       ACTION(REPLACE) -
       VALIDATE(BIND) -
       ISOLATION(CS) -
       RELEASE(COMMIT)
  END
/*

James creates this as a cataloged procedure so that developers cannot accidentally skip the BIND step. The procedure takes the program name as a symbolic parameter and executes all four steps automatically.
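The consistency-token handshake that makes the BIND step mandatory can be modeled in a few lines. This Python sketch is a simplified analogy, not DB2's actual mechanism — real tokens are precompiler-generated timestamps, and the comparison happens inside DB2 at the first SQL call:

```python
import itertools

# Toy model of the consistency-token handshake. precompile() stamps one
# fresh token into both outputs; bind() carries the DBRM's token into the
# plan; run() mimics DB2's runtime check, returning -818 on a mismatch.

_tokens = itertools.count(1)

def precompile(source: str):
    token = f"TKN{next(_tokens):06d}"        # one fresh token per precompile
    load_module = {"token": token, "code": f"compiled:{source}"}
    dbrm = {"token": token, "sql": "...extracted EXEC SQL..."}
    return load_module, dbrm

def bind(dbrm: dict) -> dict:
    return {"token": dbrm["token"]}          # the plan inherits the DBRM token

def run(load_module: dict, plan: dict) -> int:
    if load_module["token"] != plan["token"]:
        return -818                          # consistency tokens disagree
    return 0                                 # tokens match; SQL executes
```

Recompiling without rebinding produces a new load-module token while the plan keeps the old one — which is precisely why the cataloged procedure bundles BIND with the compile.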


The DB2 Migration Decision

Why DB2? Why Now?

Moving from VSAM to DB2 is a significant undertaking. James must justify this decision to MedClaim's steering committee with concrete business benefits, not just technical arguments.

Business Case:

  1. Ad-hoc queries. MedClaim's business analysts currently request custom reports by filing tickets with the development team. Average turnaround: 5 business days. With DB2, analysts can write SQL queries directly using SPUFI or QMF. Turnaround: immediate.

  2. Referential integrity. VSAM files have no concept of relationships. A claim can reference a provider ID that does not exist in the provider file. DB2 foreign keys prevent this — the INSERT fails if the referenced provider does not exist.

  3. Concurrent access. VSAM's share options are coarse (file-level or CI-level). DB2's locking is row-level by default, allowing much more granular concurrent access. This is essential for Phase 4 (APIs).

  4. Recovery. VSAM recovery requires explicit EXPORT/IMPORT or REPRO. DB2 provides point-in-time recovery using its transaction log. If a batch program corrupts data, DB2 can recover to any point before the corruption.

What NOT to Migrate:

Not everything should move to DB2. James identifies three types of files that should remain as VSAM or sequential:

  • Work files: Temporary files used within a single job stream do not benefit from DB2 overhead.
  • Archive files: Historical data that is only read sequentially for reporting is more efficiently stored as compressed sequential files.
  • High-performance lookup tables: Small, frequently accessed tables (like code-to-description mappings) may perform better as VSAM files with aggressive buffering.

DCLGEN: Keeping COBOL and DB2 in Sync

When Tomás Rivera creates a DB2 table, he runs DCLGEN (Declaration Generator) to produce a COBOL copybook that exactly matches the table definition:

//DCLGEN   EXEC PGM=IKJEFT01
//SYSTSPRT DD  SYSOUT=*
//SYSTSIN  DD  *
  DSN SYSTEM(DSN1)
  DCLGEN TABLE(MEDCLAIM.CLAIM) -
         LIBRARY('MEDCLAIM.COPYLIB(DCLCLM)') -
         ACTION(REPLACE) -
         LANGUAGE(COBOL) -
         STRUCTURE(DCL-CLAIM) -
         APOST -
         LABEL(YES)
  END
/*

DCLGEN generates a copybook like this:

      * DCLGEN TABLE(MEDCLAIM.CLAIM)
      *   ... GENERATED BY DCLGEN
       01  DCL-CLAIM.
           10  CLAIM-ID           PIC X(15).
           10  MEMBER-ID          PIC X(12).
           10  PROVIDER-ID        PIC X(10).
           10  SERVICE-DATE       PIC X(10).
           10  DIAGNOSIS-CODE     PIC X(7).
           10  PROCEDURE-CODE     PIC X(5).
           10  CHARGED-AMT        PIC S9(7)V9(2) COMP-3.
           10  ALLOWED-AMT        PIC S9(7)V9(2) COMP-3.
           10  PAID-AMT           PIC S9(7)V9(2) COMP-3.
           10  CLAIM-STATUS       PIC X(2).
           10  REASON-CODE        PIC X(4).
           10  RECEIVED-TS        PIC X(26).
           10  PROCESSED-TS       PIC X(26).

The key benefit: if someone changes the DB2 table definition (ALTER TABLE), rerunning DCLGEN produces an updated copybook. The recompile rule then forces all programs to be recompiled against the new definitions — preventing the copybook-mismatch problem that caused the incident in Case Study 1.
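The core of what DCLGEN automates is a type mapping from DB2 column definitions to COBOL pictures. A hedged sketch of that mapping for the types used in MEDCLAIM.CLAIM — the real utility also handles null indicators, VARCHAR length fields, and many more types:

```python
# Sketch of DCLGEN's type mapping for a few DB2 column types. DATE and
# TIMESTAMP become fixed-length character host variables; DECIMAL(p,s)
# becomes packed decimal (COMP-3) with p-s integer digits and s decimals.

def cobol_pic(db2_type: str, *args) -> str:
    if db2_type == "CHAR":
        return f"PIC X({args[0]})"
    if db2_type == "DATE":
        return "PIC X(10)"                   # 'YYYY-MM-DD'
    if db2_type == "TIMESTAMP":
        return "PIC X(26)"                   # 'YYYY-MM-DD-hh.mm.ss.nnnnnn'
    if db2_type == "DECIMAL":
        precision, scale = args
        return f"PIC S9({precision - scale})V9({scale}) COMP-3"
    raise ValueError(f"unmapped type: {db2_type}")
```

Applied to CHARGED_AMT DECIMAL(9,2), the mapping yields PIC S9(7)V9(2) COMP-3 — exactly the host variable in the generated DCL-CLAIM copybook above.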

SQL in COBOL: The Precompiler Process

COBOL programs that contain EXEC SQL statements go through an additional compilation step: the DB2 precompiler. The precompiler:

  1. Extracts all EXEC SQL blocks from the COBOL source
  2. Replaces them with CALL statements to the DB2 interface module
  3. Creates a DBRM (Database Request Module) containing the SQL statements
  4. The DBRM is bound to a DB2 plan, which DB2 optimizes

The compilation sequence for a DB2/COBOL program:

COBOL Source → [DB2 Precompiler] → Modified COBOL Source + DBRM
Modified COBOL Source → [COBOL Compiler] → Object Module
Object Module → [Linkage Editor] → Load Module
DBRM → [BIND PLAN] → DB2 Plan (stored in DB2 catalog)

Understanding this process helps with debugging: the load module and the DB2 plan must come from the same precompile of the COBOL source. If someone recompiles the COBOL without rebinding the plan, the two fall out of sync and the program fails at run time in confusing ways.

🔴 A Common Mistake: Forgetting to Rebind. James has seen this happen multiple times: a developer changes an EXEC SQL statement, recompiles the COBOL, but forgets to rebind the DB2 plan. The program runs with the OLD SQL (from the previous BIND) against the NEW COBOL logic. Sometimes it works by coincidence; sometimes it produces wrong results; sometimes it ABENDs with SQLCODE -805 (package not found in plan) or -818 (consistency-token mismatch). The fix is simple: always rebind after recompiling. Better yet, automate the compile-bind sequence so developers cannot forget.


Phase 3: Adding DB2

Migrating from Flat Files to DB2

The next phase moves lookup data from VSAM files to DB2 tables. This provides several benefits:

  • SQL access: Ad-hoc queries become possible without writing COBOL programs
  • Referential integrity: DB2 enforces relationships between tables
  • Concurrent access: DB2's locking model is more sophisticated than VSAM share options
  • Recovery: DB2's logging and recovery are more robust than VSAM backup/restore

Tomás Rivera designs the DB2 schema:

-- Provider table
CREATE TABLE MEDCLAIM.PROVIDER (
    PROVIDER_ID    CHAR(10)     NOT NULL,
    PROVIDER_NAME  VARCHAR(30)  NOT NULL,
    PROVIDER_TYPE  SMALLINT     NOT NULL,
    PROVIDER_STAT  CHAR(1)      NOT NULL
                   DEFAULT 'A',
    EFF_DATE       DATE         NOT NULL,
    TERM_DATE      DATE,
    TAX_ID         CHAR(9),
    NPI_NUMBER     CHAR(10),
    LAST_UPDATE    TIMESTAMP    NOT NULL
                   DEFAULT CURRENT TIMESTAMP,
    PRIMARY KEY (PROVIDER_ID)
);

-- Member table
CREATE TABLE MEDCLAIM.MEMBER (
    MEMBER_ID      CHAR(12)     NOT NULL,
    MEMBER_NAME    VARCHAR(30)  NOT NULL,
    PLAN_CODE      CHAR(4)      NOT NULL,
    MEMBER_STATUS  CHAR(1)      NOT NULL
                   DEFAULT 'A',
    EFF_DATE       DATE         NOT NULL,
    TERM_DATE      DATE,
    DATE_OF_BIRTH  DATE,
    GENDER         CHAR(1),
    LAST_UPDATE    TIMESTAMP    NOT NULL
                   DEFAULT CURRENT TIMESTAMP,
    PRIMARY KEY (MEMBER_ID)
);

-- Claim table
CREATE TABLE MEDCLAIM.CLAIM (
    CLAIM_ID       CHAR(15)     NOT NULL,
    MEMBER_ID      CHAR(12)     NOT NULL,
    PROVIDER_ID    CHAR(10)     NOT NULL,
    SERVICE_DATE   DATE         NOT NULL,
    DIAGNOSIS_CODE CHAR(7)      NOT NULL,
    PROCEDURE_CODE CHAR(5)      NOT NULL,
    CHARGED_AMT    DECIMAL(9,2) NOT NULL,
    ALLOWED_AMT    DECIMAL(9,2),
    PAID_AMT       DECIMAL(9,2),
    CLAIM_STATUS   CHAR(2)      NOT NULL
                   DEFAULT 'RC',
    REASON_CODE    CHAR(4),
    RECEIVED_TS    TIMESTAMP    NOT NULL
                   DEFAULT CURRENT TIMESTAMP,
    PROCESSED_TS   TIMESTAMP,
    PRIMARY KEY (CLAIM_ID),
    FOREIGN KEY (MEMBER_ID)
        REFERENCES MEDCLAIM.MEMBER (MEMBER_ID),
    FOREIGN KEY (PROVIDER_ID)
        REFERENCES MEDCLAIM.PROVIDER (PROVIDER_ID)
);

-- Indexes for common queries
CREATE INDEX MEDCLAIM.CLM_MEMBER_IX
    ON MEDCLAIM.CLAIM (MEMBER_ID, SERVICE_DATE);

CREATE INDEX MEDCLAIM.CLM_PROVIDER_IX
    ON MEDCLAIM.CLAIM (PROVIDER_ID, SERVICE_DATE);

CREATE INDEX MEDCLAIM.CLM_STATUS_IX
    ON MEDCLAIM.CLAIM (CLAIM_STATUS);

Modifying COBOL Programs for DB2

The member validation in CLM-INTAKE changes from VSAM READ to SQL SELECT:

      * BEFORE: VSAM READ
           MOVE CLM-MEMBER-ID TO MBR-MEMBER-ID.
           READ MEMBER-FILE.
           IF WS-MEMB-STATUS NOT = '00' ...

      * AFTER: DB2 SQL
           EXEC SQL
               SELECT MEMBER_NAME,
                      PLAN_CODE,
                      EFF_DATE,
                      TERM_DATE
               INTO  :WS-MEMBER-NAME,
                     :WS-PLAN-CODE,
                     :WS-EFF-DATE,
                     :WS-TERM-DATE
               FROM  MEDCLAIM.MEMBER
               WHERE MEMBER_ID = :CLM-MEMBER-ID
           END-EXEC

           EVALUATE SQLCODE
               WHEN 0
                   PERFORM 2210-CHECK-COVERAGE-DATES
               WHEN +100
                   SET WS-CLAIM-IS-INVALID TO TRUE
                   MOVE 'MNFD' TO WS-REJECT-CODE
                   MOVE 'MEMBER NOT FOUND'
                       TO WS-REJECT-MESSAGE
               WHEN OTHER
                   PERFORM 9000-DB2-ERROR
           END-EVALUATE

Key differences between VSAM and DB2 access:

  1. SQLCODE replaces file status. SQLCODE 0 means success, +100 means not found (equivalent to VSAM status '23'), and negative values indicate errors.

  2. Host variables. The :CLM-MEMBER-ID and :WS-MEMBER-NAME syntax identifies COBOL variables used in SQL statements. The colon prefix is required in SQL; it is not present in regular COBOL.

  3. No explicit OPEN/CLOSE. DB2 tables do not need to be opened or closed in the program. Connection management is handled by the DB2 subsystem.

  4. DCLGEN copybooks. DB2 provides a utility (DCLGEN) that generates COBOL copybooks matching DB2 table definitions. This ensures that COBOL field definitions always match the DB2 schema.

📊 The Migration Strategy. Tomás migrates one file at a time, starting with the least-used lookup files and working toward the most-used. Each migration follows the same pattern: (1) create the DB2 table, (2) load data from the VSAM file, (3) modify programs to use SQL instead of VSAM, (4) run parallel tests comparing SQL and VSAM results, (5) remove the VSAM file from production. This incremental approach means the system is never fully in VSAM or fully in DB2 — it operates in a mixed mode during the transition.
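The parallel-test step (4) amounts to a keyed diff of the two paths' outputs. A hedged sketch of that comparison logic in Python; the field names here are invented for illustration:

```python
def compare_parallel_runs(vsam_rows, db2_rows, key="member_id"):
    """Compare the output of the VSAM path and the DB2 path record by record.

    Both inputs are lists of dicts keyed by a business identifier.
    Returns keys missing on either side plus keys whose field values
    disagree -- the three things a parallel test must surface.
    """
    vsam = {r[key]: r for r in vsam_rows}
    db2 = {r[key]: r for r in db2_rows}
    missing_in_db2 = sorted(set(vsam) - set(db2))
    missing_in_vsam = sorted(set(db2) - set(vsam))
    mismatched = sorted(k for k in set(vsam) & set(db2) if vsam[k] != db2[k])
    return missing_in_db2, missing_in_vsam, mismatched

old = [{"member_id": "M1", "plan": "A100"}, {"member_id": "M2", "plan": "B200"}]
new = [{"member_id": "M1", "plan": "A100"}, {"member_id": "M2", "plan": "B999"}]
print(compare_parallel_runs(old, new))  # → ([], [], ['M2'])
```

An empty result on all three lists is the signal that the VSAM file can safely be retired.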

The Data Migration Process

Loading 18 years of VSAM data into DB2 tables is not a simple REPRO. The data must be validated, transformed, and loaded in a way that preserves referential integrity.

Tomás writes a COBOL data migration program for each table. The program reads the VSAM file, validates each record, transforms field formats as needed, and inserts into DB2:

       IDENTIFICATION DIVISION.
       PROGRAM-ID. MBR-MIGRATE.
      *================================================================*
      * Program:  MBR-MIGRATE                                          *
      * Purpose:  Migrate member data from VSAM to DB2                 *
      * Author:   Tomás Rivera                                        *
      * Date:     2024-07-15                                           *
      *================================================================*

       DATA DIVISION.
       WORKING-STORAGE SECTION.
       COPY MBRREC.

       01  WS-COUNTERS.
           05  WS-READ              PIC 9(07) VALUE ZERO.
           05  WS-INSERTED          PIC 9(07) VALUE ZERO.
           05  WS-SKIPPED           PIC 9(07) VALUE ZERO.
           05  WS-ERRORS            PIC 9(07) VALUE ZERO.
           05  WS-COMMIT-COUNT      PIC 9(07) VALUE ZERO.

       01  WS-DB2-FIELDS.
           05  WS-DB-MEMBER-ID      PIC X(12).
           05  WS-DB-MEMBER-NAME    PIC X(30).
           05  WS-DB-EFF-DATE       PIC X(10).
           05  WS-DB-TERM-DATE      PIC X(10).
           05  WS-DB-PLAN-CODE      PIC X(04).
           05  WS-DB-STATUS         PIC X(01).

           EXEC SQL INCLUDE SQLCA END-EXEC.

       PROCEDURE DIVISION.
       0000-MAIN.
           PERFORM 1000-INITIALIZE
           PERFORM 2000-PROCESS-RECORDS
               THRU 2000-PROCESS-RECORDS-EXIT
               UNTIL WS-END-OF-INPUT
           PERFORM 3000-FINALIZE
           STOP RUN
           .

       2000-PROCESS-RECORDS.
           ADD 1 TO WS-READ
      *    Transform VSAM date format (YYYYMMDD) to DB2 date
      *    format (YYYY-MM-DD)
           STRING MBR-EFF-DATE(1:4)  DELIMITED BY SIZE
                  '-'                DELIMITED BY SIZE
                  MBR-EFF-DATE(5:2)  DELIMITED BY SIZE
                  '-'                DELIMITED BY SIZE
                  MBR-EFF-DATE(7:2)  DELIMITED BY SIZE
                  INTO WS-DB-EFF-DATE
           END-STRING
      *    (TERM-DATE is transformed the same way; null-indicator
      *    handling for blank term dates is omitted from this excerpt)

      *    Skip obviously invalid records
           IF MBR-MEMBER-ID = SPACES OR LOW-VALUES
               ADD 1 TO WS-SKIPPED
               PERFORM 2100-READ-NEXT
               GO TO 2000-PROCESS-RECORDS-EXIT
           END-IF

           EXEC SQL
               INSERT INTO MEDCLAIM.MEMBER
               (MEMBER_ID, MEMBER_NAME, EFF_DATE,
                TERM_DATE, PLAN_CODE, MEMBER_STATUS)
               VALUES
               (:MBR-MEMBER-ID, :MBR-MEMBER-NAME,
                :WS-DB-EFF-DATE, :WS-DB-TERM-DATE,
                :MBR-PLAN-CODE, :MBR-MEMBER-STATUS)
           END-EXEC

           EVALUATE SQLCODE
               WHEN 0
                   ADD 1 TO WS-INSERTED
                   ADD 1 TO WS-COMMIT-COUNT
                   IF WS-COMMIT-COUNT >= 1000
                       EXEC SQL COMMIT END-EXEC
                       MOVE ZERO TO WS-COMMIT-COUNT
                   END-IF
               WHEN -803
      *            Duplicate key - skip (already migrated)
                   ADD 1 TO WS-SKIPPED
               WHEN OTHER
                   ADD 1 TO WS-ERRORS
                   DISPLAY 'DB2 ERROR FOR: ' MBR-MEMBER-ID
                           ' SQLCODE: ' SQLCODE
           END-EVALUATE

           PERFORM 2100-READ-NEXT
           .

       2000-PROCESS-RECORDS-EXIT.
           EXIT
           .

Key migration design decisions:

  1. Commit every 1,000 rows. Without periodic commits, a migration of 500,000 member records would create a single massive DB2 transaction. If the migration fails at row 499,999, all 499,998 previous inserts are rolled back. Committing every 1,000 rows limits the maximum rollback to 1,000 rows.

  2. Handle duplicates gracefully. SQLCODE -803 means the primary key already exists. This allows the migration to be re-run safely — if it fails partway through and is restarted, it will skip already-migrated records instead of failing on duplicates.

  3. Date format transformation. VSAM stores dates as PIC 9(08) in YYYYMMDD format. DB2 stores dates as DATE type in YYYY-MM-DD format. The STRING statement handles the transformation.

  4. Skip invalid records. Rather than ABENDing on bad data (which would stop the entire migration), the program logs invalid records and continues. The skipped records are investigated manually.
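Decisions 1 through 4 can be seen working together in one loop. A language-neutral sketch of the same migration logic in Python, with `insert` and `commit` standing in for the EXEC SQL statements:

```python
def to_db2_date(yyyymmdd: str) -> str:
    """YYYYMMDD (VSAM PIC 9(8)) → YYYY-MM-DD (DB2 DATE)."""
    return f"{yyyymmdd[0:4]}-{yyyymmdd[4:6]}-{yyyymmdd[6:8]}"

def migrate(records, insert, commit, batch_size=1000):
    """Batched, restartable load loop.

    `insert` returns "OK", "DUPLICATE" (the -803 case) or "ERROR".
    Duplicates are skipped so a failed run can simply be restarted;
    commits every `batch_size` rows cap the worst-case rollback.
    """
    counts = {"read": 0, "inserted": 0, "skipped": 0, "errors": 0}
    pending = 0
    for rec in records:
        counts["read"] += 1
        if not rec.get("member_id", "").strip():
            counts["skipped"] += 1          # obviously invalid: log, continue
            continue
        rec["eff_date"] = to_db2_date(rec["eff_date"])
        status = insert(rec)
        if status == "OK":
            counts["inserted"] += 1
            pending += 1
            if pending >= batch_size:
                commit()
                pending = 0
        elif status == "DUPLICATE":
            counts["skipped"] += 1          # already migrated: safe to re-run
        else:
            counts["errors"] += 1
    commit()                                # flush the final partial batch
    return counts
```

The COBOL program does the same work with EVALUATE SQLCODE and WS-COMMIT-COUNT; the shape of the loop is what matters.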

DB2 Error Handling Patterns

After migrating to DB2, COBOL programs need robust SQL error handling. James establishes a standard error handling paragraph that all DB2-enabled programs must use:

       9000-DB2-ERROR.
      *    Standard DB2 error handler
      *    Log error details for debugging
           DISPLAY '*** DB2 ERROR ***'
           DISPLAY 'SQLCODE:  ' SQLCODE
           DISPLAY 'SQLERRMC: ' SQLERRMC
           DISPLAY 'PROGRAM:  CLM-INTAKE'
           DISPLAY 'LOCATION: ' WS-CURRENT-PARAGRAPH

      *    Determine severity
           EVALUATE TRUE
               WHEN SQLCODE = -911 OR SQLCODE = -913
      *            Deadlock or timeout - may be recoverable
                   DISPLAY 'DEADLOCK/TIMEOUT - WILL RETRY'
                   EXEC SQL ROLLBACK END-EXEC
               WHEN SQLCODE = -904
      *            Resource unavailable - system issue
                   DISPLAY 'RESOURCE UNAVAILABLE'
                   MOVE 16 TO RETURN-CODE
               WHEN SQLCODE = -551
      *            Authorization failure
                   DISPLAY 'AUTHORIZATION FAILURE'
                   MOVE 16 TO RETURN-CODE
               WHEN OTHER
      *            Unknown error - log and continue or abort
                   DISPLAY 'UNHANDLED DB2 ERROR'
                   MOVE 16 TO RETURN-CODE
           END-EVALUATE
           .

The error handler distinguishes transient errors (deadlocks, timeouts), which may succeed on retry, from errors that require operator attention: authorization failures and unavailable resources. This pattern is reused across all 34 programs that access DB2 after the migration.
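The retry half of this policy, re-driving a unit of work after a deadlock or timeout, looks like this in outline (a Python sketch; the SQLCODE sets mirror the handler above):

```python
import time

TRANSIENT = {-911, -913}     # deadlock / timeout: a retry may succeed
PERMANENT = {-904, -551}     # resource or authorization: operator needed

def with_retry(operation, rollback, max_attempts=3, delay=0.0):
    """Retry transient SQLCODEs, fail fast on everything else.

    `operation` returns an SQLCODE (0 = success); `rollback` stands in
    for EXEC SQL ROLLBACK, which resets the unit of work before a retry.
    """
    for attempt in range(1, max_attempts + 1):
        code = operation()
        if code == 0:
            return "SUCCESS"
        if code in TRANSIENT and attempt < max_attempts:
            rollback()
            time.sleep(delay)    # back off before retrying
            continue
        return "FAILED"          # permanent error, or retries exhausted
    return "FAILED"
```

Capping the attempt count matters: a hard deadlock that recurs on every retry must eventually surface as a failure rather than loop forever.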

🔵 SQLCODE Cheat Sheet for COBOL Developers. The most common SQLCODEs that COBOL developers encounter:

SQLCODE  Meaning                                             Action
0        Success                                             Continue processing
+100     No rows found (SELECT) or no more rows (FETCH)      Handle as "not found"
-803     Duplicate key on INSERT                             Record already exists
-805     DBRM or package not found in plan                   Rebind the plan
-811     SELECT returned more than one row                   Add more WHERE conditions
-818     Timestamp mismatch (plan vs. load module)           Rebind the plan
-904     Resource unavailable                                Wait and retry, or notify operator
-911     Deadlock or timeout; unit of work rolled back       Retry the transaction
-913     Deadlock or timeout; statement failed, no rollback  Retry, or roll back explicitly

This cheat sheet is posted in the team's work area and included in every new developer's onboarding packet.


Phase 4: Exposing as API

From Batch to Real-Time: CICS Web Services

The most transformative modernization phase exposes MedClaim's data through APIs. Partner organizations — hospitals, clinics, and other insurers — currently receive claim status information through nightly batch extracts. They want real-time access.

James's team builds a CICS web service that accepts JSON requests and returns JSON responses. The CICS program reads from DB2 (migrated in Phase 3) and formats the response with the JSON GENERATE statement (Enterprise COBOL 6.1+; the matching JSON PARSE statement arrived in 6.2).

       IDENTIFICATION DIVISION.
       PROGRAM-ID. CLM-API.
      *================================================================*
      * Program:  CLM-API                                              *
      * Purpose:  CICS web service for claim status inquiry             *
      * Channel:  CLMCHNL                                              *
      * Author:   James Okafor / Priya Kapoor                         *
      * Date:     2024-06-15                                           *
      *================================================================*

       DATA DIVISION.
       WORKING-STORAGE SECTION.

       01  WS-REQUEST-DATA.
           05  WS-REQ-CLAIM-ID        PIC X(15).
           05  WS-REQ-MEMBER-ID       PIC X(12).
           05  WS-REQ-FUNCTION        PIC X(10).

       01  WS-RESPONSE-DATA.
           05  WS-RSP-STATUS          PIC X(07).
           05  WS-RSP-MESSAGE         PIC X(100).
           05  WS-RSP-CLAIM.
               10  WS-RSP-CLAIM-ID    PIC X(15).
               10  WS-RSP-MEMBER-ID   PIC X(12).
               10  WS-RSP-MEMBER-NAME PIC X(30).
               10  WS-RSP-PROVIDER-ID PIC X(10).
               10  WS-RSP-SVC-DATE    PIC X(10).
               10  WS-RSP-DIAG-CODE   PIC X(07).
               10  WS-RSP-CLM-STATUS  PIC X(02).
               10  WS-RSP-CHARGED-AMT PIC -9(7).99.
               10  WS-RSP-PAID-AMT    PIC -9(7).99.

       01  WS-JSON-BUFFER             PIC X(2000).
       01  WS-JSON-LENGTH             PIC S9(08) COMP.

       01  WS-CONTAINER-NAME          PIC X(16).
       01  WS-CHANNEL-NAME            PIC X(16) VALUE 'CLMCHNL'.
       01  WS-RESP-CODE               PIC S9(08) COMP.

       01  WS-DB2-FIELDS.
           05  WS-DB-CLAIM-ID         PIC X(15).
           05  WS-DB-MEMBER-ID        PIC X(12).
           05  WS-DB-MEMBER-NAME      PIC X(30).
           05  WS-DB-PROVIDER-ID      PIC X(10).
           05  WS-DB-SVC-DATE         PIC X(10).
           05  WS-DB-DIAG-CODE        PIC X(07).
           05  WS-DB-STATUS           PIC X(02).
           05  WS-DB-CHARGED          PIC S9(7)V99 COMP-3.
           05  WS-DB-PAID             PIC S9(7)V99 COMP-3.

           EXEC SQL INCLUDE SQLCA END-EXEC.

       PROCEDURE DIVISION.

       0000-MAIN.
           PERFORM 1000-RECEIVE-REQUEST
           PERFORM 2000-PROCESS-REQUEST
           PERFORM 3000-SEND-RESPONSE
           EXEC CICS RETURN END-EXEC
           .

       1000-RECEIVE-REQUEST.
           MOVE 'CLMREQUEST' TO WS-CONTAINER-NAME

           EXEC CICS GET CONTAINER(WS-CONTAINER-NAME)
               CHANNEL(WS-CHANNEL-NAME)
               INTO(WS-JSON-BUFFER)
               FLENGTH(WS-JSON-LENGTH)
               RESP(WS-RESP-CODE)
           END-EXEC

           IF WS-RESP-CODE NOT = DFHRESP(NORMAL)
               MOVE 'ERROR' TO WS-RSP-STATUS
               MOVE 'FAILED TO RECEIVE REQUEST CONTAINER'
                   TO WS-RSP-MESSAGE
               PERFORM 3000-SEND-RESPONSE
               EXEC CICS RETURN END-EXEC
           END-IF

      *    Parse only the bytes actually received
           JSON PARSE WS-JSON-BUFFER(1:WS-JSON-LENGTH)
               INTO WS-REQUEST-DATA
           END-JSON
           .

       2000-PROCESS-REQUEST.
           EVALUATE WS-REQ-FUNCTION
               WHEN 'STATUS'
                   PERFORM 2100-CLAIM-STATUS
               WHEN 'HISTORY'
                   PERFORM 2200-CLAIM-HISTORY
               WHEN OTHER
                   MOVE 'ERROR' TO WS-RSP-STATUS
                   MOVE 'UNKNOWN FUNCTION' TO WS-RSP-MESSAGE
           END-EVALUATE
           .

       2100-CLAIM-STATUS.
           EXEC SQL
               SELECT C.CLAIM_ID,
                      C.MEMBER_ID,
                      M.MEMBER_NAME,
                      C.PROVIDER_ID,
                      CHAR(C.SERVICE_DATE, ISO),
                      C.DIAGNOSIS_CODE,
                      C.CLAIM_STATUS,
                      C.CHARGED_AMT,
                      C.PAID_AMT
               INTO  :WS-DB-CLAIM-ID,
                     :WS-DB-MEMBER-ID,
                     :WS-DB-MEMBER-NAME,
                     :WS-DB-PROVIDER-ID,
                     :WS-DB-SVC-DATE,
                     :WS-DB-DIAG-CODE,
                     :WS-DB-STATUS,
                     :WS-DB-CHARGED,
                     :WS-DB-PAID
               FROM  MEDCLAIM.CLAIM C
               JOIN  MEDCLAIM.MEMBER M
                 ON  C.MEMBER_ID = M.MEMBER_ID
               WHERE C.CLAIM_ID = :WS-REQ-CLAIM-ID
           END-EXEC

           EVALUATE SQLCODE
               WHEN 0
                    MOVE 'SUCCESS' TO WS-RSP-STATUS
                    MOVE 'CLAIM FOUND' TO WS-RSP-MESSAGE
                    MOVE WS-DB-CLAIM-ID    TO WS-RSP-CLAIM-ID
                    MOVE WS-DB-MEMBER-ID   TO WS-RSP-MEMBER-ID
                    MOVE WS-DB-MEMBER-NAME TO WS-RSP-MEMBER-NAME
                    MOVE WS-DB-PROVIDER-ID TO WS-RSP-PROVIDER-ID
                    MOVE WS-DB-SVC-DATE    TO WS-RSP-SVC-DATE
                    MOVE WS-DB-DIAG-CODE   TO WS-RSP-DIAG-CODE
                    MOVE WS-DB-STATUS      TO WS-RSP-CLM-STATUS
                    MOVE WS-DB-CHARGED     TO WS-RSP-CHARGED-AMT
                    MOVE WS-DB-PAID        TO WS-RSP-PAID-AMT
               WHEN +100
                   MOVE 'NOTFND' TO WS-RSP-STATUS
                   MOVE 'CLAIM NOT FOUND' TO WS-RSP-MESSAGE
               WHEN OTHER
      *            Log details internally; never expose SQLCODE
      *            or other DB2 internals to external callers
                    MOVE 'ERROR' TO WS-RSP-STATUS
                    MOVE 'INTERNAL PROCESSING ERROR'
                        TO WS-RSP-MESSAGE
           END-EVALUATE
           .

       2200-CLAIM-HISTORY.
      *    History retrieval using cursor (simplified)
           MOVE 'SUCCESS' TO WS-RSP-STATUS
           MOVE 'HISTORY FUNCTION NOT YET IMPLEMENTED'
               TO WS-RSP-MESSAGE
           .

       3000-SEND-RESPONSE.
           JSON GENERATE WS-JSON-BUFFER
               FROM WS-RESPONSE-DATA
               COUNT IN WS-JSON-LENGTH
           END-JSON

           MOVE 'CLMRESPONSE' TO WS-CONTAINER-NAME

           EXEC CICS PUT CONTAINER(WS-CONTAINER-NAME)
               CHANNEL(WS-CHANNEL-NAME)
               FROM(WS-JSON-BUFFER)
               FLENGTH(WS-JSON-LENGTH)
               RESP(WS-RESP-CODE)
           END-EXEC
           .

The Architecture of a COBOL Web Service:

The program uses CICS channels and containers — a modern alternative to the traditional COMMAREA. A container holds the JSON request; the program parses it, queries DB2, builds a JSON response, and places it in another container. The CICS web service infrastructure handles HTTP protocol, content negotiation, and routing.

From the outside, this is a REST API. A partner system sends an HTTP GET request and receives a JSON response. From the inside, it is a COBOL program reading DB2 — exactly what MedClaim has been doing for 18 years, just with a different interface.

🔵 The JSON Bridge. The JSON GENERATE statement (Enterprise COBOL 6.1+) and JSON PARSE statement (6.2+) are the bridge between COBOL's fixed-format data and the modern web's preference for JSON. They work by mapping COBOL data items to JSON fields based on the item names. WS-RSP-CLAIM-ID in COBOL becomes "WS-RSP-CLAIM-ID" in JSON. You can customize the mapping with the NAME phrase to produce cleaner JSON field names.
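The effect of the NAME phrase is just a rename applied during serialization. A Python sketch of the default mapping plus two illustrative renames (`claimId` and `claimStatus` are invented examples, not MedClaim's actual binding):

```python
import json

# COBOL-item → JSON-field renames: the effect of the NAME phrase
NAME_MAP = {
    "WS-RSP-CLAIM-ID": "claimId",
    "WS-RSP-CLM-STATUS": "claimStatus",
}

def cobol_record_to_json(record: dict) -> str:
    """Serialize a flat COBOL-style record to JSON, applying renames.

    Items with no mapping keep their raw data-item name -- exactly the
    default behavior described above. Trailing spaces from fixed-length
    fields are trimmed.
    """
    return json.dumps({NAME_MAP.get(k, k): v.strip() for k, v in record.items()})

print(cobol_record_to_json({"WS-RSP-CLAIM-ID": "CLM000098765  ",
                            "WS-RSP-CLM-STATUS": "PD"}))
# → {"claimId": "CLM000098765", "claimStatus": "PD"}
```

Without the mapping table, partners would see raw names like "WS-RSP-CLAIM-ID" in every response, which is why the NAME phrase is worth the configuration effort.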


Phase 5: CI/CD Pipeline and Testing

Building Automated Tests

The final phase establishes automated testing and continuous integration. Without automated tests, every change to the system requires manual verification — the 2-3 day test cycle that has been slowing MedClaim down.

James creates three tiers of tests:

Unit Tests: Individual program tests that verify specific validation rules, calculations, and edge cases. These use synthetic test data and check specific outputs.

//*================================================================*
//* UNIT TEST: CLM-INTAKE MEMBER VALIDATION                        *
//* Tests: Valid member, invalid member, expired coverage           *
//*================================================================*
//UTEST01  EXEC PGM=CLMINTAK
//STEPLIB  DD  DSN=MEDCLAIM.TEST.LOADLIB,DISP=SHR
//CLMIN    DD  DSN=MEDCLAIM.TEST.CLAIMS.MEMBER,DISP=SHR
//CLMOUT   DD  DSN=&&CLMOUT,DISP=(NEW,PASS),
//             SPACE=(TRK,(1,1)),
//             DCB=(RECFM=FB,LRECL=200)
//ERROUT   DD  DSN=&&ERROUT,DISP=(NEW,PASS),
//             SPACE=(TRK,(1,1)),
//             DCB=(RECFM=FB,LRECL=132)
//PROVFL   DD  DSN=MEDCLAIM.TEST.PROVIDER,DISP=SHR
//MEMBFL   DD  DSN=MEDCLAIM.TEST.MEMBER,DISP=SHR
//AUDOUT   DD  DSN=&&AUDOUT,DISP=(NEW,PASS),
//             SPACE=(TRK,(1,1))
//SYSOUT   DD  SYSOUT=*
//*
//*--- VERIFY EXPECTED OUTPUT ---
//*
//VERIFY   EXEC PGM=IEBCOMPR,COND=(8,LT,UTEST01)
//SYSPRINT DD  SYSOUT=*
//SYSUT1   DD  DSN=&&CLMOUT,DISP=(OLD,DELETE)
//SYSUT2   DD  DSN=MEDCLAIM.TEST.EXPECTED.MEMBER,DISP=SHR
//SYSIN    DD  *
  COMPARE TYPORG=PS
/*

Integration Tests: End-to-end tests that run the complete job stream with test data and verify final outputs. These catch interface problems between programs.

Regression Tests: The complete test suite, run automatically after every code change. James configures the mainframe's automation tool to trigger regression tests whenever a program is compiled into the test load library.

The CI/CD Pipeline

James establishes a deployment pipeline using IBM's z/OS tools:

Developer makes change
        │
        ▼
Compile to TEST loadlib
        │
        ▼
Run unit tests (automated)
        │
        ▼
Run integration tests (automated)
        │
        ▼
Code review (manual - James or Sarah)
        │
        ▼
Promote to QA loadlib
        │
        ▼
Run regression suite against QA (automated)
        │
        ▼
User acceptance testing (Sarah Kim, manual)
        │
        ▼
Promote to PRODUCTION loadlib
        │
        ▼
Post-deployment verification (automated smoke test)

This is not a sophisticated CI/CD pipeline by modern standards. There is no container orchestration, no blue-green deployment, no canary releases. But it is a massive improvement over MedClaim's previous process, which was: compile, test manually, promote to production, and hope.

⚠️ Pragmatic Modernization. A common mistake in modernization is trying to adopt every modern practice at once. James resists the pressure to implement Kubernetes, microservices, and DevOps-as-code. Instead, he implements the practices that provide the most value for MedClaim's specific situation: automated testing (reduces the 2-3 day test cycle to 2 hours), code review (catches bugs before production), and a promotion pipeline (prevents accidental deployment of untested code). The goal is not to be trendy — it is to be effective.

Test Data Management

One of the underappreciated challenges of automated testing is test data management. Each test run needs consistent, predictable data. If test data changes between runs, test results become unreliable.

James establishes a test data strategy with three tiers:

Tier 1: Static Reference Data. Provider and member files that do not change between test runs. These are loaded once into test VSAM files (and later into test DB2 tables) and refreshed only when the data model changes.

Tier 2: Test Input Data. Claim records that exercise specific validation rules and edge cases. Each test case has a documented purpose, and the complete set is version-controlled alongside the source code.

Tier 3: Expected Output Data. The "golden" output files that represent correct behavior. When a test runs, its output is compared against the golden file. If the program behavior changes intentionally (e.g., a new audit field is added), the golden file is updated and the change is documented.

The test data is stored in a dedicated set of PDS members:

MEDCLAIM.TEST.INPUT(CLMMBR01)    - Member validation tests
MEDCLAIM.TEST.INPUT(CLMPROV01)   - Provider validation tests
MEDCLAIM.TEST.INPUT(CLMAUTH01)   - Authorization tests
MEDCLAIM.TEST.INPUT(CLMEDGE01)   - Edge cases
MEDCLAIM.TEST.INPUT(CLMREGR01)   - Regression (from past bugs)
MEDCLAIM.TEST.EXPECT(CLMMBR01)   - Expected output for member tests
MEDCLAIM.TEST.EXPECT(CLMPROV01)  - Expected output for provider tests
...
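The golden-file check itself is conceptually simple: a line-by-line comparison with a readable report. A Python sketch of what the IEBCOMPR verification step establishes:

```python
def compare_to_golden(actual_lines, golden_lines):
    """Line-by-line comparison against a golden output file.

    Returns an empty list on a pass, or (line number, expected, actual)
    tuples for every difference -- the same pass/fail signal IEBCOMPR
    produces, but with a human-readable diff attached.
    """
    diffs = []
    for n, (exp, act) in enumerate(zip(golden_lines, actual_lines), start=1):
        if exp != act:
            diffs.append((n, exp, act))
    if len(actual_lines) != len(golden_lines):
        diffs.append(("length", len(golden_lines), len(actual_lines)))
    return diffs
```

When a test fails, the diff points directly at the diverging line, which is considerably faster to act on than a raw condition code.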

Measuring Modernization ROI

After completing all five phases, James prepares a modernization ROI report for the steering committee. The results quantify what modernization delivered:

Developer Productivity:

Metric                         Before    After    Improvement
Time to understand a program   3 days    4 hours  6x faster
Time to make a typical change  2 days    3 hours  5x faster
Time to test a change          2-3 days  2 hours  12x faster
Time to deploy to production   1 day     2 hours  4x faster
New developer onboarding       6 months  6 weeks  4x faster

System Quality:

Metric                          Before   After      Improvement
Production incidents per month  4.2      0.8        81% reduction
Mean time to resolve incident   6 hours  1.5 hours  75% reduction
Deployments per month           1-2      8-12       6x more frequent
Test coverage (business rules)  ~30%     ~90%       3x coverage
Code duplication rate           ~35%     ~8%        77% reduction

Business Capability:

Capability                    Before               After
Real-time claim status        Not available        Available via API
Ad-hoc data queries           5-day request cycle  Self-service SQL
New partner integration       3-6 months           2-4 weeks
Regulatory reporting changes  4-6 weeks            1-2 weeks

The total investment was 12 months of three-person effort — approximately 3 person-years. The annual savings in incident resolution, deployment efficiency, and developer productivity more than pay for the investment within the first year.

"The numbers tell one story," James says in his final presentation. "But the real ROI is harder to measure: we can now hire new developers and have them contributing within weeks instead of months. We can accept new partner integrations without a six-month project. And I can take a vacation without worrying that the system will fail while I'm gone."

🔴 Theme: The Human Factor. The modernization ROI is ultimately about people. Faster onboarding means the organization is not dependent on one or two developers. More frequent deployments mean changes reach users sooner. Lower incident rates mean the operations team spends less time firefighting and more time improving. The technology changes enable these human outcomes, but the human outcomes are what justify the investment.


The Web Service in Detail

CICS Pipeline Configuration

The CLM-API program does not exist in isolation. It is deployed within a CICS web service pipeline — a configuration that handles HTTP protocol, content type negotiation, and routing. The pipeline definition (typically in XML) specifies:

  1. Provider pipeline: The inbound pipeline that receives HTTP requests, converts them to CICS containers, and invokes the target program.
  2. URIMAP: A mapping from URL path to CICS program. For example, /medclaim/api/v1/claims/{claimId} maps to transaction CLMQ, which invokes CLM-API.
  3. JSON binding: The mapping between JSON field names and COBOL data item names. By default, COBOL item names become JSON field names. The NAME clause on JSON GENERATE/PARSE allows customization.

The CICS URIMAP definition:

DEFINE URIMAP(CLMAPI01)
    GROUP(MEDCLAIM)
    USAGE(SERVER)
    SCHEME(HTTPS)
    HOST(medclaim.example.com)
    PATH(/api/v1/claims/*)
    TRANSACTION(CLMQ)
    PIPELINE(CLMPIPE)
    TCPIPSERVICE(HTTPSPT)
    STATUS(ENABLED)

When an HTTP GET request arrives at https://medclaim.example.com/api/v1/claims/CLM000098765, CICS routes it through the pipeline, makes the request data (including the claim ID from the URL path) available in a container, and starts transaction CLMQ, which invokes CLM-API.

Security Considerations for APIs

Exposing internal data through APIs creates security requirements that batch systems never faced:

Authentication: Every API call must identify the caller. Options include HTTP Basic Auth (username/password), OAuth 2.0 tokens, or client certificates. CICS integrates with RACF for authentication.

Authorization: Not every authenticated user should access every claim. James implements role-based access: hospitals can only see claims from their own members; MedClaim staff can see all claims.

Rate Limiting: A misbehaving client sending thousands of requests per second could overwhelm the CICS region. The API gateway (z/OS Connect or CICS web interface) should enforce request rate limits.

Encryption: All API traffic uses HTTPS (TLS). The TCPIPSERVICE definition specifies HTTPS and the SSL certificate.

Input Validation: The CLM-API program validates every input field before using it. Because the program uses static SQL with host variables, the claim ID is passed as data and never spliced into SQL text, so classic SQL injection cannot occur here. The validation still matters: it rejects malformed requests early and protects any future dynamic SQL. Batch programs never faced this concern because their input came from trusted internal files.

      * Validate claim ID format before using in SQL
           IF WS-REQ-CLAIM-ID = SPACES
              OR WS-REQ-CLAIM-ID = LOW-VALUES
               MOVE 'ERROR' TO WS-RSP-STATUS
               MOVE 'CLAIM ID IS REQUIRED' TO WS-RSP-MESSAGE
               PERFORM 3000-SEND-RESPONSE
               EXEC CICS RETURN END-EXEC
           END-IF

      *    Check for embedded SQL injection characters
      *    (WS-TALLY-COUNT is defined in WORKING-STORAGE; INSPECT
      *    TALLYING adds to the tally, so it must be reset first)
           MOVE ZERO TO WS-TALLY-COUNT
           INSPECT WS-REQ-CLAIM-ID TALLYING WS-TALLY-COUNT
               FOR ALL "'" ALL ";" ALL "--"
           IF WS-TALLY-COUNT > 0
               MOVE 'ERROR' TO WS-RSP-STATUS
               MOVE 'INVALID CHARACTERS IN CLAIM ID'
                   TO WS-RSP-MESSAGE
               PERFORM 3000-SEND-RESPONSE
               EXEC CICS RETURN END-EXEC
           END-IF
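The rate limiting discussed above is enforced outside the COBOL program, at the gateway. For intuition, here is a minimal token-bucket limiter of the kind such gateways implement (a sketch only, not z/OS Connect's actual algorithm):

```python
class TokenBucket:
    """Minimal token-bucket rate limiter.

    `rate` tokens are replenished per second up to `capacity`; each
    request consumes one token, and a request arriving with the bucket
    empty is rejected rather than queued.
    """
    def __init__(self, rate: float, capacity: int):
        self.rate, self.capacity = rate, capacity
        self.tokens = float(capacity)
        self.last = 0.0

    def allow(self, now: float) -> bool:
        # Refill according to elapsed time, then try to spend one token
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

The capacity absorbs short bursts while the refill rate caps sustained throughput, which is exactly the protection a CICS region needs against a misbehaving client.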

📊 API Usage Patterns. Within the first month after deployment, the CLM-API handles approximately 8,000 requests per day. The breakdown: 60% claim status inquiries from partner hospitals, 25% member claim history from call center agents, 15% automated monitoring checks from partner systems. Peak usage is 11 AM-2 PM Eastern (during business hours at partner organizations).

Error Handling in Web Services

Web services require a different error handling philosophy than batch programs. In batch, an error is logged to a file and the next record is processed. In a web service, the error must be communicated to the caller through the HTTP response:

Situation              HTTP Status                JSON Response
Claim found            200 OK                     Full claim data
Claim not found        404 Not Found              {"status":"NOTFND","message":"Claim not found"}
Invalid input          400 Bad Request            {"status":"ERROR","message":"Invalid claim ID format"}
DB2 error              500 Internal Server Error  {"status":"ERROR","message":"Internal processing error"}
Authentication failed  401 Unauthorized           {"status":"ERROR","message":"Authentication required"}

The CLM-API program maps internal processing results to appropriate HTTP status codes. Notice that internal error details (like SQLCODE values) are NOT exposed in the response — they could reveal database structure to potential attackers. Instead, the program logs detailed error information to a CICS system log and returns a generic error message to the caller.
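That mapping (detailed diagnostics to the log, sanitized messages to the caller) can be sketched as follows; the `BADREQ` internal code is invented for illustration:

```python
def to_http_response(internal_status, sqlcode, log):
    """Map an internal result to (HTTP status, sanitized body).

    Detailed diagnostics, including the SQLCODE, go only to the
    server-side log; the caller sees a generic message, so database
    internals are never leaked in the response.
    """
    if internal_status == "SUCCESS":
        return 200, {"status": "SUCCESS"}
    if internal_status == "NOTFND":
        return 404, {"status": "NOTFND", "message": "Claim not found"}
    if internal_status == "BADREQ":
        return 400, {"status": "ERROR", "message": "Invalid claim ID format"}
    log(f"internal error, sqlcode={sqlcode}")   # full detail, server side only
    return 500, {"status": "ERROR", "message": "Internal processing error"}
```

Note that the one branch that knows about SQLCODE writes it to the log and nowhere else; the returned body is the same for every internal failure.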


Building the Test Automation Framework

The Three Tiers of Testing

James's testing framework uses three tiers, each serving a different purpose:

Tier 1: Unit Tests (per-program, per-function)

Unit tests validate individual validation rules and calculations. Each test provides specific input and verifies specific output. Unit tests are fast (seconds) and focused.

Example unit test for CLM-INTAKE member validation:

TEST: Valid member with active coverage
INPUT: Member MBR100000001, Service Date 20240315
EXPECTED: Claim accepted (no rejection)

TEST: Member not found
INPUT: Member MBR999999999, Service Date 20240315
EXPECTED: Claim rejected, code MNFD

TEST: Service date outside coverage
INPUT: Member MBR100000001, Service Date 20250101 (after term date)
EXPECTED: Claim rejected, code MCOV

TEST: Service date in the future
INPUT: Member MBR100000001, Service Date 20991231
EXPECTED: Claim rejected, code SFUT
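
In spirit, each unit test is just input, expected output, and a comparison. A Python sketch of the same four tests makes the pattern concrete (the `validate_member` function, the member table, and the fixed processing date are a hypothetical stand-in for CLM-INTAKE's logic, not the real program):

```python
# Hypothetical stand-in for CLM-INTAKE's member validation.
ACTIVE_MEMBERS = {"MBR100000001": ("20200101", "20241231")}  # id -> (start, term)
TODAY = "20250401"  # fixed processing date so the tests are deterministic

def validate_member(member_id, service_date):
    """Return a rejection code, or None if the claim passes."""
    if service_date > TODAY:          # YYYYMMDD strings compare correctly
        return "SFUT"                 # service date in the future
    if member_id not in ACTIVE_MEMBERS:
        return "MNFD"                 # member not found
    start, term = ACTIVE_MEMBERS[member_id]
    if not (start <= service_date <= term):
        return "MCOV"                 # service date outside coverage
    return None

TESTS = [
    ("MBR100000001", "20240315", None),    # valid member, active coverage
    ("MBR999999999", "20240315", "MNFD"),  # member not found
    ("MBR100000001", "20250101", "MCOV"),  # after the term date
    ("MBR100000001", "20991231", "SFUT"),  # service date in the future
]

for member_id, service_date, expected in TESTS:
    assert validate_member(member_id, service_date) == expected
print("all intake unit tests passed")
```

Rule order matters: the future-date check runs first, so a far-future date is rejected as SFUT even though it also falls outside coverage.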

Tier 2: Integration Tests (multi-program, full job stream)

Integration tests run the complete processing pipeline with test data. They verify that programs work together correctly — that CLM-INTAKE's output is correctly processed by CLM-ADJUD, and CLM-ADJUD's output is correctly processed by CLM-PAY.

Tier 3: Regression Tests (full system, production-volume data)

Regression tests use a sanitized copy of production data (with personal information masked). They run the complete job stream and verify that output matches a known-good baseline. Regression tests are run after every code change, no matter how small.
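
Sanitizing the production copy is mechanical once the record layout is known. A Python sketch of the idea (the column positions, field names, and sample record are hypothetical; a real masking job would take the layout from the documented copybooks):

```python
def mask_record(record):
    """Mask the member name (cols 16-45) and SSN (cols 46-54) in a
    fixed-width claim record, leaving the claim ID and amounts intact
    so control totals still match the known-good baseline."""
    masked_name = "MASKED NAME".ljust(30)
    masked_ssn = "999999999"
    return record[:15] + masked_name + masked_ssn + record[54:]

# 15-char claim ID + 30-char name + 9-char SSN + zoned-decimal amount
rec = "CLM000098765ABC" + "OKAFOR, JAMES".ljust(30) + "123456789" + "0001500{"
out = mask_record(rec)
assert len(out) == len(rec)          # record length is unchanged
assert "OKAFOR" not in out and "123456789" not in out
```

Keeping record lengths and amounts untouched is what lets the baseline comparison stay byte-for-byte meaningful after masking.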

Automated Test Execution

James creates a JCL PROC (cataloged procedure) that runs all three tiers:

//MCTEST   PROC TIER='ALL'
//*================================================================*
//* MEDCLAIM AUTOMATED TEST SUITE                                   *
//* PARM: TIER = 'UNIT' | 'INTG' | 'REGR' | 'ALL'                *
//*================================================================*
//*--- TIER 1: UNIT TESTS ---
//UNIT01   EXEC PGM=CLMINTAK,
//         COND=(0,NE)        BYPASS IF ANY PRIOR STEP RC NOT 0
//STEPLIB  DD  DSN=MEDCLAIM.TEST.LOADLIB,DISP=SHR
// ... (unit test DD statements)
//*
//UNIT01V  EXEC PGM=IEBCOMPR,COND=(8,LT,UNIT01)
// ... (comparison to expected output)
//*
//*--- TIER 2: INTEGRATION TESTS ---
//INTG01   EXEC PGM=CLMINTAK,
//         COND=(4,LT,UNIT01V)  SKIP IF UNIT TESTS FAILED
// ... (integration test DD statements)

The test suite runs automatically every night at 8 PM, after the day's development is complete. Results are emailed to the team. If any test fails, the team investigates before promoting code.

🧪 The Test Data Challenge. Creating realistic test data is one of the hardest parts of automated testing. MedClaim's test data must include: valid claims for every combination of member type, provider type, and claim type; invalid claims for every validation rule; edge cases like zero-dollar claims, claims on the exact coverage start and end dates, and claims with the maximum allowed amount. James maintains a test data generation program that creates these scenarios systematically.
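
The systematic part of such a generator is easy to sketch. An illustrative Python version (the type codes, amounts, and field names are invented for the example, not MedClaim's actual codes):

```python
import itertools

MEMBER_TYPES = ["IND", "FAM"]
PROVIDER_TYPES = ["HOSP", "PHYS"]
CLAIM_TYPES = ["MED", "DENT"]

def generate_valid_claims():
    """One valid claim per member/provider/claim type combination."""
    combos = itertools.product(MEMBER_TYPES, PROVIDER_TYPES, CLAIM_TYPES)
    for i, (m, p, c) in enumerate(combos, 1):
        yield {"claim_id": f"TST{i:09d}", "member_type": m,
               "provider_type": p, "claim_type": c, "amount": "100.00"}

def generate_edge_cases(coverage_start, coverage_end, max_amount):
    """Edge cases: zero dollars, exact coverage boundaries, maximum amount."""
    yield {"claim_id": "TSTEDGE0001", "amount": "0.00"}
    yield {"claim_id": "TSTEDGE0002", "service_date": coverage_start}
    yield {"claim_id": "TSTEDGE0003", "service_date": coverage_end}
    yield {"claim_id": "TSTEDGE0004", "amount": max_amount}

claims = list(generate_valid_claims())
edges = list(generate_edge_cases("20240101", "20241231", "9999999.99"))
assert len(claims) == 8   # 2 member x 2 provider x 2 claim types
```

Enumerating combinations with a generator, rather than hand-writing records, is what keeps coverage complete as new member, provider, and claim types are added.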


The Modernization Results

After 12 months, James presents the results to MedClaim's steering committee:

Quantitative Results

Metric                               | Before           | After            | Change
Developers who understand the system | 2                | 5                | +150%
Test cycle time                      | 2-3 days         | 2 hours          | -96%
Production incidents per quarter     | 12               | 3                | -75%
Time to add new claim type           | 3-4 weeks        | 3-5 days         | -80%
Partner data latency                 | 24 hours (batch) | Sub-second (API) | -99.99%
Copybook inconsistencies             | 49               | 0                | -100%
Programs with automated tests        | 0                | 34 (of 47)       | +34

Qualitative Results

  1. Knowledge is no longer concentrated. Three new developers can now maintain the system because the code is documented, readable, and has automated tests that catch regressions.

  2. Change is no longer frightening. With automated tests, developers can modify programs with confidence that they will not break something unexpectedly. This has accelerated the pace of enhancement.

  3. Partners are satisfied. The API provides real-time access to claim status, eliminating the nightly batch extract that was MedClaim's most frequent source of partner complaints.

  4. The system is positioned for the future. With DB2, APIs, and automated testing in place, MedClaim can continue modernizing incrementally. The next phase might add event-driven processing, move to a cloud-hosted z/OS environment, or expose additional APIs.

What Was NOT Changed

Equally important is what James did not change:

  • The core adjudication logic. The business rules that determine how claims are paid were not modified. They were documented, tested, and preserved exactly as they were.
  • The batch job streams. Daily, weekly, and monthly batch processing continues to run exactly as before. The API is an addition, not a replacement.
  • The programming language. Everything is still COBOL. No Java wrappers, no Python scripts, no "strangler fig" pattern. COBOL on z/OS is the right tool for this workload, and modernizing the code does not require changing the language.

🔴 Theme: Legacy != Obsolete. MedClaim's COBOL system is now modern in every way that matters: well-documented, well-tested, modular, API-enabled, and maintainable by a team. It is still COBOL. It still runs on z/OS. It still processes claims in batch. But it is no longer a liability — it is an asset. The system is not legacy because it is old; it was legacy because it was unmaintainable. Now it is maintainable, and "legacy" no longer applies.


Lessons from the Trenches

Lesson 1: Start with Documentation, Not Code

James's first instinct was to start refactoring immediately. Sarah Kim talked him out of it. "If you change code you don't understand, you'll introduce bugs. Document first, change second." She was right. The three months spent on Phase 1 (documentation) saved six months of debugging in Phases 2-5.

Lesson 2: Modernize Incrementally

Every phase delivered value independently. Phase 1 (documentation) made the system understandable. Phase 2 (refactoring) made it maintainable. Phase 3 (DB2) made it queryable. Phase 4 (API) made it accessible. Phase 5 (CI/CD) made it safe to change. If the project had been canceled after any phase, MedClaim would still have benefited.

Lesson 3: Preserve the Business Logic

The most dangerous part of modernization is accidentally changing business logic. James's rule was: "The refactored program must produce exactly the same output as the legacy program for the same input." Every refactoring was verified by running both the old and new versions against the same test data and comparing outputs byte-for-byte.

Lesson 4: Automated Tests Are the Safety Net

Before Phase 5, every change was a risk. After Phase 5, changes were routine. The automated test suite is not glamorous — it is JCL that runs programs and compares outputs — but it is the single most valuable artifact of the entire modernization effort.

Lesson 5: People Matter More Than Technology

The modernization succeeded because James had the right team. Sarah Kim understood the business domain and could verify that refactored code preserved business intent. Tomás Rivera understood DB2 and could design efficient schemas. James himself understood the legacy code and could guide the refactoring. No amount of technology could have compensated for a team that did not understand the system.

🧪 Theme: Defensive Programming. Throughout the modernization, James maintained a "parallel run" capability. For every program that was refactored, both the old and new versions were available in production. If the new version produced unexpected results, the old version could be reinstated within minutes. This defensive approach meant that modernization never put production processing at risk.


Common Modernization Anti-Patterns

James has seen other organizations attempt modernizations that failed. He shares the most common anti-patterns:

Anti-Pattern 1: The Big Bang Rewrite

"Let's rewrite the whole system in Java." This approach takes years, costs millions, and usually fails because the new system cannot replicate the accumulated business logic of the legacy system. By the time the rewrite is "done," the requirements have changed, and the team has burned out.

Anti-Pattern 2: The Screen Scraper

Wrapping a 3270 terminal interface in a web browser does not modernize anything. It adds a layer of complexity and latency without improving the underlying system. Real modernization changes the code, not just the presentation.

Anti-Pattern 3: The Technology-First Approach

"Let's adopt Kubernetes, then figure out what to put in it." Technology decisions should follow business needs, not precede them. MedClaim did not need containers — they needed readable code, automated tests, and an API. The technology choices followed from those needs.

Anti-Pattern 4: The Documentation Skip

"We don't have time to document. Let's just start coding." This is how bugs are introduced. If you do not understand the existing code, you cannot safely change it. Documentation is not overhead — it is a prerequisite.

Anti-Pattern 5: The All-or-Nothing Fallacy

"If we can't modernize everything, why modernize anything?" This mindset prevents incremental progress. Even documenting and testing the most critical 20% of the system delivers enormous value.


Working with the Student Mainframe Lab

Adapting MedClaim's Modernization for GnuCOBOL

Students working with GnuCOBOL or the Student Mainframe Lab can practice the core modernization concepts using simplified versions of MedClaim's programs.

Phase 1 (Documentation) requires no technology at all. Take any COBOL program — one from this textbook, one from an open-source repository, or one you have written yourself — and practice the code archaeology process:

  1. Create a program specification (one page)
  2. Draw a data flow diagram
  3. Build a business rule catalog
  4. Assign a modernization tier (1-4)

This exercise develops the analytical skills that are the foundation of all modernization work.

Phase 2 (Refactoring) translates directly to GnuCOBOL. The refactoring techniques — copybook creation, 88-level conditions, structured paragraphs, subprogram extraction — are standard COBOL and work identically in GnuCOBOL. Write the legacy version of CLM-INTAKE (from this chapter), then refactor it step by step, running comparison tests at each step.

The comparison testing framework is simpler in GnuCOBOL:

# Compile and run legacy version
cobc -x -o clm-intake-legacy clm-intake-legacy.cbl
./clm-intake-legacy < test-claims.dat > legacy-output.dat

# Compile and run refactored version
cobc -x -o clm-intake-new clm-intake-new.cbl
./clm-intake-new < test-claims.dat > new-output.dat

# Compare outputs
diff legacy-output.dat new-output.dat

If diff produces no output, the refactoring preserved behavior. If it shows differences, investigate.
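
diff answers pass/fail; when a comparison fails, knowing which record differs saves time. A small Python helper in the same spirit (illustrative — the demonstration files below are temporary stand-ins for the legacy and refactored outputs):

```python
from itertools import zip_longest
import os
import tempfile

def compare_outputs(legacy_path, new_path):
    """Return None if the files match byte-for-byte, else a tuple
    (record_number, legacy_record, new_record) for the first mismatch."""
    with open(legacy_path, "rb") as f1, open(new_path, "rb") as f2:
        for n, (a, b) in enumerate(zip_longest(f1, f2), 1):
            if a != b:          # a missing record shows up as None
                return (n, a, b)
    return None

# Quick demonstration with two tiny output files.
d = tempfile.mkdtemp()
p_legacy = os.path.join(d, "legacy-output.dat")
p_new = os.path.join(d, "new-output.dat")
with open(p_legacy, "wb") as f:
    f.write(b"REC1\nREC2\n")
with open(p_new, "wb") as f:
    f.write(b"REC1\nREC2X\n")

diff = compare_outputs(p_legacy, p_new)
assert diff is not None and diff[0] == 2   # first mismatch is record 2
```

Using zip_longest rather than zip means a file with extra trailing records is also reported as a mismatch instead of silently passing.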

Phase 3 (DB2 Migration) can be approximated with SQLite or PostgreSQL. Keep the EXEC SQL blocks and run them through an open-source embedded-SQL precompiler for GnuCOBOL (such as Open Cobol ESQL or GixSQL, both of which support PostgreSQL), or simulate the migration by replacing VSAM READ statements with indexed file access patterns that mirror DB2 SELECT.

Phase 4 (API) is the hardest to simulate without CICS. One approach is to build a simple command-line interface that accepts a claim ID as a parameter and returns the claim data formatted as JSON:

       IDENTIFICATION DIVISION.
       PROGRAM-ID. CLM-QUERY.
      * Simplified claim query - command-line version
      * Usage: ./clm-query CLM000098765

       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-CLAIM-ID         PIC X(15).
       01  WS-JSON-OUTPUT      PIC X(500).

       PROCEDURE DIVISION.
           ACCEPT WS-CLAIM-ID FROM COMMAND-LINE
           PERFORM 1000-LOOKUP-CLAIM
           DISPLAY FUNCTION TRIM(WS-JSON-OUTPUT)
           STOP RUN.

       1000-LOOKUP-CLAIM.
      * A full version would READ an indexed claim file here; this
      * stub formats a fixed response so the program runs end to end.
           STRING '{"claimId":"'          DELIMITED BY SIZE
                  FUNCTION TRIM(WS-CLAIM-ID) DELIMITED BY SIZE
                  '","status":"PAID"}'    DELIMITED BY SIZE
                  INTO WS-JSON-OUTPUT.

This is not a web service, but it exercises the same pattern: receive input, look up data, format output as JSON.

Phase 5 (CI/CD) can be practiced with shell scripts and cron jobs. Write a shell script that compiles all programs, runs all tests, and reports results. Schedule it to run automatically after code changes.
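
The same pipeline can also be expressed as a small driver program. A Python sketch of the idea (the step names and commands are placeholders — a real pipeline would invoke cobc and the comparison tests):

```python
import subprocess
import sys

def run_pipeline(steps):
    """Run each (name, argv) step in order; stop at the first failure."""
    for name, argv in steps:
        result = subprocess.run(argv, capture_output=True, text=True)
        status = "PASS" if result.returncode == 0 else "FAIL"
        print(f"{status}  {name}")
        if result.returncode != 0:
            print(result.stderr)
            return False
    return True

# Placeholder steps; a real pipeline would look more like
#   ("compile intake", ["cobc", "-x", "clm-intake.cbl"]),
#   ("compare outputs", ["diff", "legacy-output.dat", "new-output.dat"]), ...
steps = [("sanity check", [sys.executable, "-c", "print('ok')"])]
ok = run_pipeline(steps)
assert ok   # every step returned RC 0
```

Stopping at the first failure mirrors the COND logic in the mainframe JCL version: there is no point running integration tests when unit tests fail.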

The Modernization Mindset

The most important takeaway from this capstone is not any specific technology — it is the modernization mindset. This mindset includes:

Respect for existing code. Legacy code works. It has been tested by millions of production transactions. Treating it with contempt leads to bad modernization decisions.

Incremental progress over revolutionary change. Each phase delivers value independently. If the project is canceled after Phase 2, the code is still better than before.

Testing as the foundation. You cannot safely change what you cannot test. Build the test framework first, then change the code.

Documentation as investment. Documentation is not overhead — it is the enabler of all subsequent work. Skipping documentation to "save time" costs more time in debugging and rework.

People over technology. The right team with simple tools will outperform the wrong team with sophisticated tools every time.

These principles apply regardless of the technology stack. Whether you are modernizing COBOL on z/OS, Java on WebSphere, or Python on AWS, the fundamentals are the same: understand first, test always, change incrementally, and invest in people.


Understanding the Modernization Timeline

Month-by-Month Breakdown

James's 12-month project followed a deliberate timeline. Understanding why each phase took the time it did helps with planning future modernization efforts.

Months 1-3: Phase 1 (Documentation and Understanding)

This phase consumed 25% of the project timeline, which surprised MedClaim's management. "Three months of documentation before any code changes?" was the reaction from the VP of IT.

James defended the timeline: "We are documenting 18 years of accumulated business logic. If we skip this phase, we will introduce bugs in every subsequent phase — bugs that will cost more than three months to find and fix."

The deliverables:

  • Complete system inventory (programs, copybooks, files, jobs)
  • Program specifications for all 47 programs
  • Business rule catalog with 247 cataloged rules
  • Data flow diagrams for 8 job streams
  • Modernization tier assignments for all programs
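
A business rule catalog needs no special tooling — each entry is a small structured record. One possible shape, sketched in Python (the fields and the sample rule are illustrative, not MedClaim's actual catalog format):

```python
from dataclasses import dataclass, field

@dataclass
class BusinessRule:
    rule_id: str           # e.g. "BR-0147"
    program: str           # program that implements the rule
    description: str       # plain-language statement of the rule
    source_location: str   # where in the source the rule lives
    verified_by: str = ""  # SME who confirmed the rule is intentional
    test_cases: list = field(default_factory=list)  # filled in Phase 5

# A hypothetical catalog entry:
rule = BusinessRule(
    rule_id="BR-0147",
    program="CLM-ADJUD",
    description="Out-of-network claims pay a reduced percentage "
                "of the allowed amount",
    source_location="CLM-ADJUD, adjudication paragraphs",
    verified_by="Sarah Kim",
)
assert rule.test_cases == []
```

Recording who verified each rule is what turns the catalog into a diplomacy tool as well as a technical one: every preserved behavior has a named business owner.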

Months 4-6: Phase 2 (Refactoring)

Refactoring proceeded at approximately 5 programs per month. Each program went through the full cycle: create comparison baseline, refactor one step at a time, test after each step, commit. The 20 Tier 3 and Tier 4 programs were all refactored by the end of month 6.

Key milestone: by the end of month 5, all 47 programs were using standardized copybooks. This alone eliminated the inconsistency problem that had caused two production incidents in the previous year.

Months 7-9: Phase 3 (DB2 Migration)

Tomás Rivera led the DB2 migration. He created the schema design in month 7, built the data migration JCL in month 8, and modified programs to use SQL instead of VSAM in month 9. The migration was done one table at a time, with parallel runs at each step.

The most challenging part was not the SQL — it was the BIND process. DB2 plans needed to be bound with specific isolation levels and lock sizes to match the VSAM access patterns that the programs expected.

Month 10: Phase 4 (API)

Building the CLM-API web service took approximately 4 weeks. The COBOL code was straightforward — it was essentially a CICS program that reads DB2 and formats output, which MedClaim had been doing for years. The configuration — CICS pipeline definitions, URIMAP, security setup — took more time than the code itself.

Months 11-12: Phase 5 (CI/CD)

James and Sarah Kim spent the final two months building the automated test suite and deployment pipeline. The test data creation was the largest effort — building comprehensive test cases for 34 programs required deep understanding of every business rule.

The 80/20 Rule in Practice

Looking back, James observes that the Pareto principle applied at every level:

  • 80% of the modernization value came from 20% of the programs (the Tier 3 and Tier 4 programs)
  • 80% of the test cases came from 20% of the business rules (the complex adjudication rules)
  • 80% of the refactoring effort went to 20% of the programs (the 6 Tier 4 programs, especially the 12,000-line CLM-ADJUD)
  • 80% of the production incidents were caused by 20% of the code (the copybook-inconsistency areas)

This pattern is not unique to MedClaim. It applies to virtually every legacy modernization effort. The practical implication: focus your resources on the critical 20%. Do not waste time polishing programs that are already clean and rarely modified.


Summary

This capstone demonstrated the complete modernization of a legacy COBOL system through five phases:

  1. Documentation and Understanding: Code archaeology, system inventory, business rule cataloging
  2. Refactoring for Modularity: Copybook consolidation, concern separation, subprogram extraction
  3. Adding DB2: Migrating from VSAM to relational tables, SQL access from COBOL
  4. Exposing as API: CICS web services, JSON generation, real-time access
  5. CI/CD Pipeline and Testing: Automated unit, integration, and regression tests; deployment pipeline

All five themes converge in this capstone:

  • Legacy != Obsolete: The modernized system is still COBOL on z/OS, but it is no longer a liability
  • Readability is a Feature: Copybooks, 88-levels, and meaningful names make the code self-documenting
  • The Modernization Spectrum: Each phase moved the system along the spectrum without disrupting production
  • Defensive Programming: Parallel runs, automated tests, and rollback capability ensured safety
  • The Human Factor: The project succeeded because of the team, not the technology

The modernization journey does not end here. MedClaim's system is now positioned for further evolution: event-driven processing, machine learning for fraud detection, cloud deployment. But those are topics for Capstone 3.


Chapter Reflection: What Modernization Really Means

Looking back at MedClaim's 12-month modernization, the transformation is clear — but it is important to understand what changed and what did not.

What changed: The code is readable. The data is accessible. The system is testable. The team is capable. The interfaces are modern.

What did not change: The business logic. The programming language. The batch processing schedule. The z/OS platform. The core architecture.

This is the essence of modernization. It is not about replacing old with new — it is about making existing systems sustainable, maintainable, and extensible. A COBOL program with clear structure, standardized copybooks, comprehensive tests, and an API is a modern program — regardless of when the language was designed or when the program was first written.

James often uses an analogy from architecture: "You don't demolish a building because the plumbing is old. You upgrade the plumbing while keeping the structure. Our programs are the structure. The copybooks, the file handling, the interfaces — those are the plumbing. We upgraded the plumbing."

The Broader Context

MedClaim's modernization is not unique. Across the mainframe industry, organizations are following similar paths:

  • Financial services: Banks are modernizing COBOL payment systems to support real-time payments, APIs for fintech integration, and cloud-native front-ends backed by mainframe transaction engines.

  • Insurance: Health insurers are modernizing claims systems to support API-based provider portals, real-time eligibility verification, and interoperability requirements (like FHIR for healthcare data exchange).

  • Government: Federal and state agencies are modernizing tax processing, benefits administration, and social services systems — often with multi-year, multi-phase modernization plans similar to MedClaim's five-phase approach.

In every case, the pattern is the same: understand first, refactor incrementally, add modern interfaces, automate testing, and invest in the team. The technology varies, but the principles are universal.

Derek Washington, who has been observing MedClaim's modernization as preparation for Capstone 3, draws a key insight: "Building a new system is engineering. Modernizing an existing system is engineering plus archaeology plus diplomacy. You need all three to succeed."

Sarah Kim adds the business perspective: "The steering committee approved this project because we showed them the risk — two developers maintaining 800,000 lines of critical code. The technology improvements are real, but the risk reduction is what got us funded. Always lead with the business case."

These lessons — lead with business value, invest in people, modernize incrementally, test continuously — are the foundation of professional COBOL development. They are the lessons that James Okafor has learned over 15 years, and they are the lessons that this capstone aims to teach.