Case Study 9.2: MedClaim's Cross-System Copybook Strategy

Background

MedClaim Health Services processes 500,000 medical claims per month through a pipeline of four COBOL programs: CLM-INTAKE (intake), CLM-ADJUD (adjudication), CLM-PAY (payment processing), and RPT-PROVIDER (provider reporting). All four programs share the claim record layout, defined in the CLM-REC copybook.

But MedClaim's ecosystem extends beyond COBOL. A Java-based web portal allows providers to check claim status. A Python-based analytics system produces trend reports. An EDI (Electronic Data Interchange) gateway translates between MedClaim's internal format and the ANSI X12 837 standard.

James Okafor, the team lead, needed a strategy that kept all these systems in sync with the canonical claim record definition.

The Challenge

When the CLAIM-RECORD layout changes in CLM-REC, the following must also change:

  1. Four COBOL programs — recompile against the new copybook
  2. Java web portal — update the ClaimRecord Java class
  3. Python analytics — update the claim record parsing logic
  4. EDI gateway — update the mapping between X12 837 and MedClaim internal format
  5. Database — update the DB2 table if the claim record maps to a DB2 row
  6. Documentation — update the data dictionary and integration guide

Historically, each system was updated independently, and discrepancies crept in. Sometimes the copies happened to agree: the Java class and the COBOL copybook both used a width of 10 for CLM-PROVIDER-NPI, which was fine. But when a Java developer misread the width of a COMP-3 field, the portal displayed garbled financial amounts for two days before anyone noticed.
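The COMP-3 mistake is easy to make because a packed-decimal field stores two digits per byte, with the sign in the final half-byte, so its byte width is not its digit count. The following Python sketch illustrates the arithmetic. The function names and the sample value are hypothetical; the case study does not show the portal's actual decoding code.

```python
from decimal import Decimal


def comp3_byte_length(digits: int) -> int:
    """Bytes occupied by a COMP-3 field holding the given number of digits."""
    # Two digits per byte, plus one half-byte (nibble) for the sign.
    return digits // 2 + 1


def unpack_comp3(raw: bytes, scale: int = 0) -> Decimal:
    """Decode packed-decimal bytes; scale is the number of digits after the V."""
    nibbles = []
    for byte in raw:
        nibbles.append(byte >> 4)
        nibbles.append(byte & 0x0F)
    sign_nibble = nibbles.pop()  # the last nibble carries the sign
    value = 0
    for digit in nibbles:
        if digit > 9:
            raise ValueError(f"invalid packed digit: {digit:#x}")
        value = value * 10 + digit
    if sign_nibble in (0x0B, 0x0D):            # negative sign codes
        value = -value
    elif sign_nibble not in (0x0A, 0x0C, 0x0E, 0x0F):  # positive or unsigned
        raise ValueError(f"invalid sign nibble: {sign_nibble:#x}")
    return Decimal(value).scaleb(-scale)


# Hypothetical amount field declared as PIC S9(7)V99 COMP-3: 9 digits, 5 bytes.
amount = unpack_comp3(bytes.fromhex("001234567C"), scale=2)
```

A reader who assumes that a 9-digit field occupies 9 bytes will read past the end of the field and misalign everything after it, which is consistent with the garbled amounts described above.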

James's Solution: Copybook as Source of Truth

James established a principle: the COBOL copybook is the canonical source of truth for the claim record layout. All other representations are derived from it, not independently maintained.

The Derivation Pipeline

CLM-REC.cpy (COBOL copybook)
    │
    ├──> cobol2json.py ──> claim-record.json (JSON Schema)
    │                           │
    │                           ├──> Java class generator
    │                           └──> Python dataclass generator
    │
    ├──> cobol2ddl.py ──> claim-table.sql (DB2 DDL)
    │
    └──> cobol2edi.py ──> claim-mapping.xml (EDI mapping)

The cobol2json.py script parses CLM-REC.cpy and produces a JSON Schema that describes every field, its type, its length, and its valid values (from 88-level conditions). Language-specific generators then produce the Java and Python representations.
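The case study does not reproduce cobol2json.py, but the core idea can be sketched in a few lines of Python. Everything below is an assumption made for illustration: the copybook fragment, the field names other than CLM-PROVIDER-NPI, the function name, and the shape of the output. The sketch handles only elementary items of the form PIC X(n) or PIC 9(n) and 88-levels with a single quoted literal; a real parser would also need to cope with groups, OCCURS, REDEFINES, and COMP fields.

```python
import json
import re

# Hypothetical copybook fragment; the real CLM-REC layout is not shown here.
SAMPLE_COPYBOOK = """\
       01  CLAIM-RECORD.
           05  CLM-PROVIDER-NPI      PIC X(10).
           05  CLM-LINE-COUNT        PIC 9(03).
           05  CLM-STATUS            PIC X(01).
               88  CLM-STATUS-OPEN   VALUE 'O'.
               88  CLM-STATUS-PAID   VALUE 'P'.
"""

FIELD_RE = re.compile(
    r"^\s*(\d{2})\s+([A-Z0-9-]+)\s+PIC\s+([X9])\((\d+)\)\s*\.\s*$"
)
COND_RE = re.compile(
    r"^\s*88\s+([A-Z0-9-]+)\s+VALUE\s+'([^']*)'\s*\.\s*$"
)


def copybook_to_schema(text: str, title: str = "CLAIM-RECORD") -> dict:
    """Build a small JSON-Schema-like dict from simple copybook lines."""
    properties = {}
    current = None  # the most recent elementary field, owner of any 88-levels
    for line in text.splitlines():
        cond = COND_RE.match(line)
        if cond and current is not None:
            # 88-level condition values become the field's allowed values.
            current.setdefault("enum", []).append(cond.group(2))
            continue
        field = FIELD_RE.match(line)
        if field:
            _level, name, kind, length = field.groups()
            if kind == "X":
                current = {"type": "string", "maxLength": int(length)}
            else:
                current = {"type": "integer", "maximum": 10 ** int(length) - 1}
            properties[name] = current
    return {"title": title, "type": "object", "properties": properties}


schema = copybook_to_schema(SAMPLE_COPYBOOK)
schema_json = json.dumps(schema, indent=2)
```

Lines the sketch does not recognize are silently skipped. A production tool would more likely fail loudly on them, so that an unsupported construct cannot quietly drop a field from the derived artifacts.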

The REPLACING Strategy for Testing

James also uses REPLACING creatively for testing. The production copybook defines full-width fields. A test version uses REPLACING to narrow fields, creating smaller test records:

      * Production — full 500-byte record
       COPY CLM-REC.

      * Testing — abbreviated record for unit tests
       COPY CLM-REC REPLACING
           ==PIC X(30)== BY ==PIC X(10)==
           ==PIC X(26)== BY ==PIC X(10)==.

Note: This is a simplified illustration. In practice, the test copybook would be a separate member that mirrors the structure with smaller fields, since REPLACING on PIC clauses affects all matching fields.
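One way to gauge the blast radius of such a REPLACING phrase is to list every field that carries the targeted PIC clause before relying on it. The Python sketch below does this with a line-based scan. The copybook fragment and all field names except CLM-PROVIDER-NPI are hypothetical, and the scan only approximates the compiler, which matches sequences of text words rather than whole lines.

```python
import re

# Hypothetical fragment; the real CLM-REC layout is not shown in this case study.
SAMPLE_COPYBOOK = """\
           05  CLM-PATIENT-NAME      PIC X(30).
           05  CLM-PROVIDER-NAME     PIC X(30).
           05  CLM-DIAG-DESC         PIC X(26).
           05  CLM-PROVIDER-NPI      PIC X(10).
"""


def fields_matching(copybook_text: str, pic_clause: str) -> list:
    """Return the names of fields whose PIC clause equals pic_clause, e.g. 'X(30)'."""
    pattern = re.compile(
        r"^\s*\d{2}\s+([A-Z0-9-]+)\s+PIC\s+" + re.escape(pic_clause) + r"\s*\."
    )
    names = []
    for line in copybook_text.splitlines():
        match = pattern.match(line)
        if match:
            names.append(match.group(1))
    return names


# Every name in this list would be narrowed by ==PIC X(30)== BY ==PIC X(10)==.
affected = fields_matching(SAMPLE_COPYBOOK, "X(30)")
```

If the list contains a field that should have kept its full width, the blanket REPLACING is the wrong tool and a separate test member is the safer choice.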

Results

Metric                                           Before (Manual Sync)   After (Derived)
-----------------------------------------------  ---------------------  ---------------
Cross-system discrepancies per year              6-10                   0-1
Time to propagate a copybook change              2-3 weeks              2-3 days
Integration test failures from layout mismatch   Monthly                Quarterly
Documentation accuracy                           ~85%                   ~99%

The Secondary Provider Addition

When Sarah Kim brought the secondary provider requirement (Section 9.10 in the chapter), James followed the established pipeline:

  1. Updated CLM-REC.cpy (added CLM-PRV2 sub-copybook, carved from FILLER)
  2. Ran cobol2json.py to produce the updated JSON Schema
  3. Generated the new Java class and Python dataclass
  4. Updated the DB2 DDL (ALTER TABLE)
  5. Updated the EDI mapping
  6. Recompiled all four COBOL programs

Total time: 2.5 days, including testing. Previous similar changes had taken 2-3 weeks with manual synchronization.
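The DB2 step in the list above lends itself to the same kind of derivation. The Python sketch below maps a PIC clause to a DB2 column type and emits an ALTER TABLE statement. It is not the cobol2ddl.py used at MedClaim, which the case study does not show: the function names, the table name CLAIM, the field name CLM-PRV2-NPI, and the type mapping itself are assumptions. Only the forms X(n), 9(n), and S9(n)V9(m) are handled.

```python
import re


def pic_to_db2_type(pic: str) -> str:
    """Map a simple PIC clause to a DB2 column type (assumed mapping)."""
    pic = pic.strip().upper()
    alpha = re.fullmatch(r"X\((\d+)\)", pic)
    if alpha:
        return f"CHAR({int(alpha.group(1))})"
    numeric = re.fullmatch(r"S?9\((\d+)\)(?:V9\((\d+)\))?", pic)
    if numeric:
        integer_digits = int(numeric.group(1))
        decimal_digits = int(numeric.group(2) or 0)
        # DECIMAL(precision, scale): precision counts all digits.
        return f"DECIMAL({integer_digits + decimal_digits}, {decimal_digits})"
    raise ValueError(f"unsupported PIC clause: {pic}")


def add_column_ddl(table: str, cobol_name: str, pic: str) -> str:
    """Emit an ALTER TABLE statement for one new copybook field."""
    column = cobol_name.replace("-", "_")  # hyphens are not valid in plain SQL names
    return f"ALTER TABLE {table} ADD COLUMN {column} {pic_to_db2_type(pic)};"


# Hypothetical new field from the secondary provider change.
ddl = add_column_ddl("CLAIM", "CLM-PRV2-NPI", "X(10)")
```

Raising on an unsupported clause is deliberate in this sketch: a derivation tool that guesses a column type would reintroduce the silent drift the pipeline exists to prevent.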

Lessons Learned

  1. Single source of truth eliminates drift. When one artifact is the canonical definition and everything else is derived from it, a discrepancy can arise only from a defect in the derivation tools, not from independent manual edits.

  2. Invest in tooling. The cobol2json.py script took James two weeks to write. It saved months of effort over the following years.

  3. COBOL copybooks are a natural interface definition. They describe data precisely — types, sizes, valid values — which is exactly what integration requires.

  4. The FILLER strategy enables backward compatibility. By carving new fields from FILLER, the record length stays constant, and old data files remain readable.
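The FILLER claim in the last lesson can be checked mechanically. The Python sketch below compares two layouts and reports whether the total length is unchanged and every existing named field keeps its offset and length. The layouts, the field name CLM-PRV2-NPI, and the function names are hypothetical, and the record is shortened to 50 bytes for readability.

```python
# Hypothetical layouts as (field name, length in bytes) pairs.
BEFORE = [("CLM-PROVIDER-NPI", 10), ("FILLER", 40)]
AFTER = [("CLM-PROVIDER-NPI", 10), ("CLM-PRV2-NPI", 10), ("FILLER", 30)]


def layout(fields):
    """Return ({name: (offset, length)}, total_length), ignoring FILLER names."""
    positions = {}
    offset = 0
    for name, length in fields:
        if name != "FILLER":
            positions[name] = (offset, length)
        offset += length
    return positions, offset


def is_backward_compatible(before, after) -> bool:
    """True if old data files can still be read with the new layout."""
    old_positions, old_length = layout(before)
    new_positions, new_length = layout(after)
    if old_length != new_length:
        return False  # record length changed, so old files no longer line up
    # Every pre-existing field must sit at the same offset with the same length.
    return all(new_positions.get(name) == spot
               for name, spot in old_positions.items())


compatible = is_backward_compatible(BEFORE, AFTER)
```

A check like this could run as part of the derivation pipeline, so that a change which shifts an existing field is caught before any program is recompiled.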

Discussion Questions

  1. What happens if the cobol2json.py parser encounters a COBOL feature it does not understand (e.g., REDEFINES, DEPENDING ON)? How would you handle these cases?

  2. James chose the COBOL copybook as the source of truth. Could the JSON Schema or the DB2 DDL serve as the source instead? What are the trade-offs?

  3. How would this pipeline change if MedClaim adopted a microservices architecture where the claim record is transmitted as JSON over REST APIs rather than as a fixed-length record on a mainframe file?

  4. The REPLACING strategy for testing has limitations (it replaces all matching PIC clauses, not just the ones you intend). Propose an alternative approach for creating test copybooks.