In This Chapter
- 41.1 The Reality of Legacy COBOL
- 41.2 A Systematic Approach to Code Reading
- 41.3 Data Flow Analysis
- 41.4 Impact Analysis
- 41.5 Reverse Engineering Business Rules
- 41.6 Documentation Recovery
- 41.7 Tools for Code Archaeology
- 41.8 Common Legacy Patterns and Anti-Patterns
- 41.9 Working with Tribal Knowledge
- 41.10 GlobalBank: Archaeology of a 1987 Module
- 41.11 MedClaim: Reverse Engineering Adjudication Rules
- 41.12 Try It Yourself: Analyzing an Unknown Program
- 41.13 Call Graph Construction Methods
- 41.14 Data Dictionary Recovery
- 41.15 COBOL Cross-Reference Analysis
- 41.16 Working with Tribal Knowledge: Advanced Techniques
- 41.17 Try It Yourself: Building a Cross-Reference Report
- 41.18 Documentation Templates for Legacy Systems
- 41.19 MedClaim: The Full Archaeology Report
- 41.20 Dead Code Detection and Removal
- 41.21 GlobalBank: The Retirement Knowledge Transfer
- 41.22 Chapter Summary
Chapter 41: Legacy Code Archaeology
"The code is the documentation. Unfortunately, the code was written in 1987 by someone who thought comments were a waste of punch cards." — Maria Chen, opening a 4,200-line program with three comment lines
Every COBOL developer, at some point in their career, will face this moment: a production problem emerges in a program that nobody currently on the team wrote, nobody fully understands, and nobody has touched in years. The original developer retired. The documentation, if it ever existed, is either missing or so outdated that it describes a version of the program that no longer exists. The program works — it has been working for decades — but now something needs to change, and someone needs to understand what this code actually does.
That someone is you.
Legacy code archaeology is the systematic process of understanding undocumented code. It is not random reading — it is structured investigation. You do not start at line 1 and read to line 5,000. You start with questions: What does this program do? What files does it touch? What are the inputs and outputs? Where are the business rules? Then you use specific techniques to answer those questions efficiently.
This chapter teaches you those techniques. By the end, you will be able to pick up any COBOL program — no matter how old, how long, or how uncommented — and systematically extract an understanding of its purpose, its logic, and its behavior.
41.1 The Reality of Legacy COBOL
Let us be honest about what you will encounter. The average piece of production COBOL code is somewhere between 25 and 40 years old. Much of it was written before structured programming was widely adopted in COBOL shops, and much of it has been modified dozens of times by dozens of developers, each with their own style and their own understanding (or misunderstanding) of the original design.
What You Will Find
💡 Common Characteristics of Legacy COBOL:
- Paragraph names like 2000-PROCESS and 3000-PROCESSING (what is the difference?)
- Variables named WS-WRK-FLD-1 through WS-WRK-FLD-47
- GO TO statements creating spaghetti control flow
- PERFORM THRU paragraphs with fall-through logic
- Nested IF statements 10 levels deep with no scope terminators (pre-COBOL-85)
- Commented-out code that may or may not be relevant
- Multiple REDEFINES on the same data area for different record types
- COPY members that have been modified in place (different versions in different libraries)
What You Will Not Find
- Inline comments explaining business rules
- A design document that matches the current code
- Unit tests
- A developer who remembers why paragraph 4700-SPECIAL-CALC exists
⚠️ Critical Mindset: Do not judge the original developers. They wrote this code under constraints you may not understand — tight deadlines, limited disk space, compiler limitations, performance requirements that forced certain design choices. The goal is not to criticize the code but to understand it.
41.2 A Systematic Approach to Code Reading
Random reading is the enemy of understanding. When you open a 5,000-line COBOL program, reading from top to bottom is like reading a novel by starting at page 200 — you will understand individual sentences but miss the story. Instead, follow this systematic approach.
Step 1: Identify the Program's Purpose
Start with the external evidence — everything except the code itself:
- JCL: What datasets do the DD statements allocate? What DD names are used? What is the job name?
- Program name: Does the name suggest a function? (CLMADJ = claim adjudication, RPTMTLY = monthly report)
- File names: Input and output dataset names often reveal purpose
- Scheduling: When does this job run? What runs before and after it?
//CLMADJ EXEC PGM=CLMADJ01
//STEPLIB DD DSN=MEDCL.PROD.LOADLIB,DISP=SHR
//CLMIN DD DSN=MEDCL.CLAIMS.PENDING,DISP=SHR
//CLMOUT DD DSN=MEDCL.CLAIMS.ADJUDICATED,...
//PAYTBL DD DSN=MEDCL.PAYMENT.SCHEDULE,DISP=SHR
//ERRRPT DD SYSOUT=*
//CTLTOT DD DSN=MEDCL.CTL.CLMADJ,...
From this JCL alone, you can deduce: this program reads pending claims (CLMIN), produces adjudicated claims (CLMOUT), uses a payment schedule table (PAYTBL), writes error reports (ERRRPT), and produces control totals (CTLTOT). You know what it does before reading a single line of COBOL.
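You can partly automate this first pass. The sketch below (the member name, path, and contents are invented for illustration) pulls every DD name and its dataset out of a job so you can see the program's inputs and outputs at a glance:

```shell
# Create a small sample JCL member (hypothetical content, for illustration).
cat > /tmp/clmadj.jcl <<'EOF'
//CLMADJ   EXEC PGM=CLMADJ01
//STEPLIB  DD DSN=MEDCL.PROD.LOADLIB,DISP=SHR
//CLMIN    DD DSN=MEDCL.CLAIMS.PENDING,DISP=SHR
//CLMOUT   DD DSN=MEDCL.CLAIMS.ADJUDICATED,DISP=(NEW,CATLG)
//ERRRPT   DD SYSOUT=*
EOF

# List each DD name alongside its dataset (SYSOUT DDs have no DSN).
grep '^//' /tmp/clmadj.jcl | awk '/ DD /{
    dd = substr($1, 3)          # strip the leading //
    dsn = ""
    for (i = 2; i <= NF; i++)
        if ($i ~ /^DSN=/) { dsn = $i; sub(/^DSN=/, "", dsn); sub(/,.*/, "", dsn) }
    printf "%-8s %s\n", dd, dsn
}'
```

The same pipeline works against an entire JCL library, which gives you a crude but fast dataset-to-program map.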
Step 2: Map the Data Division
The DATA DIVISION tells you what the program works with. Focus on:
File Section: Identify all files — their record layouts tell you what data flows in and out.
Working-Storage: Identify the major groups:
- Control flags and switches (88-level items)
- Counters and accumulators
- Work areas and intermediate fields
- Tables and arrays (OCCURS)
- Constants and parameters
*  Look for meaningful group names:
01  WS-CLAIM-WORK-AREA.     *> claim processing fields
01  WS-PAYMENT-CALC.        *> payment calculation
01  WS-ERROR-HANDLING.      *> error management
01  WS-CONTROL-TOTALS.      *> control totals
01  WS-FLAGS.               *> processing flags
01  WS-TABLE-AREAS.         *> lookup tables
📊 Data Division Survey Checklist:
- [ ] How many files? Input vs. output?
- [ ] How many record types (REDEFINES on FD records)?
- [ ] What are the key fields (account numbers, claim numbers)?
- [ ] What flags control processing flow (88-level items)?
- [ ] What tables are loaded (OCCURS DEPENDING ON)?
- [ ] Where are the financial totals (COMP-3 accumulators)?
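Several of the checklist questions can be answered with quick counts before you read a single paragraph of logic. A minimal sketch, using an invented scratch file to stand in for the real program:

```shell
# Build a tiny stand-in COBOL fragment to survey (hypothetical content).
cat > /tmp/survey.cbl <<'EOF'
       SELECT CLAIM-FILE ASSIGN TO CLMIN.
       SELECT ADJUD-FILE ASSIGN TO CLMOUT.
       01  WS-FLAGS.
           05  WS-EOF-FLAG        PIC X.
               88  WS-EOF         VALUE 'Y'.
           05  WS-TOTAL-PAID      PIC S9(9)V99 COMP-3.
           05  WS-RATE-TABLE OCCURS 50 TIMES PIC S9(5)V99.
EOF

# One count per checklist question.
echo "Files (SELECT):         $(grep -c ' SELECT ' /tmp/survey.cbl)"
echo "Condition names (88):   $(grep -c ' 88 '     /tmp/survey.cbl)"
echo "Tables (OCCURS):        $(grep -c 'OCCURS'   /tmp/survey.cbl)"
echo "Packed fields (COMP-3): $(grep -c 'COMP-3'   /tmp/survey.cbl)"
```

The counts are approximate (a `SELECT` inside a comment would be counted too), but they give you the shape of the program in seconds.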
Step 3: Trace the Main Control Flow
Look at the PROCEDURE DIVISION's main paragraph:
0000-MAIN-CONTROL.
PERFORM 1000-INITIALIZE
PERFORM 2000-PROCESS UNTIL WS-EOF
PERFORM 9000-TERMINATE
STOP RUN.
This gives you the skeleton. Now trace each PERFORM one level deep:
2000-PROCESS.
PERFORM 2100-READ-INPUT
IF WS-VALID-RECORD
PERFORM 2200-VALIDATE
IF WS-PASSES-VALIDATION
PERFORM 2300-ADJUDICATE
PERFORM 2400-CALCULATE-PAYMENT
PERFORM 2500-WRITE-OUTPUT
ELSE
PERFORM 2600-WRITE-ERROR
END-IF
END-IF.
Now you have the program's story: read a claim, validate it, adjudicate it, calculate payment, write the result. The details are in the sub-paragraphs, but you understand the narrative.
Step 4: Build a Call Graph
A call graph shows which paragraphs call which other paragraphs. For a large program, this is essential:
0000-MAIN-CONTROL
├── 1000-INITIALIZE
│ ├── 1100-OPEN-FILES
│ ├── 1200-LOAD-TABLES
│ └── 1300-READ-PARMS
├── 2000-PROCESS
│ ├── 2100-READ-INPUT
│ ├── 2200-VALIDATE
│ │ ├── 2210-CHECK-MEMBER
│ │ ├── 2220-CHECK-PROVIDER
│ │ └── 2230-CHECK-ELIGIBILITY
│ ├── 2300-ADJUDICATE
│ │ ├── 2310-DETERMINE-COVERAGE
│ │ ├── 2320-APPLY-DEDUCTIBLE
│ │ ├── 2330-APPLY-COPAY
│ │ └── 2340-APPLY-LIMITS
│ ├── 2400-CALCULATE-PAYMENT
│ └── 2500-WRITE-OUTPUT
└── 9000-TERMINATE
├── 9100-VERIFY-TOTALS
├── 9200-WRITE-REPORT
└── 9300-CLOSE-FILES
💡 Key Insight: You can build a call graph quickly by searching for PERFORM statements. In a well-structured program, the call graph reveals the entire business process at a glance. In a poorly structured program, it reveals exactly where the complexity hides.
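One quick way to start that search from the command line is to tally how often each paragraph is PERFORMed; the most-called paragraphs are usually the shared utilities. The file below is an invented stand-in:

```shell
# Minimal stand-in program (hypothetical) with a few PERFORMs.
cat > /tmp/flow.cbl <<'EOF'
       0000-MAIN.
           PERFORM 1000-INIT
           PERFORM 2000-PROCESS UNTIL WS-EOF
           PERFORM 9000-TERMINATE.
       2000-PROCESS.
           PERFORM 2100-READ
           PERFORM 2200-VALIDATE.
EOF

# Tally distinct PERFORM targets; heavily-called paragraphs surface first.
grep -o 'PERFORM  *[0-9][0-9A-Z-]*' /tmp/flow.cbl |
    awk '{print $2}' | sort | uniq -c | sort -rn
```

Section 41.13 builds this idea out into a full call-graph script.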
Step 5: Focus on the Business Logic
The business rules live in the detail paragraphs — the ones that do the actual calculations, validations, and decisions. These are the paragraphs you need to understand most carefully:
2310-DETERMINE-COVERAGE.
* This is where the money is.
* This EVALUATE determines how much the plan pays.
EVALUATE CLM-PLAN-TYPE ALSO CLM-SVC-CATEGORY
WHEN 'HMO' ALSO 'PREV' ...
WHEN 'PPO' ALSO 'SPEC' ...
WHEN OTHER ...
END-EVALUATE.
41.3 Data Flow Analysis
Understanding where data comes from and where it goes is often more important than understanding the procedural logic. Data flow analysis answers the question: for a given field, what value does it hold, and how did it get there?
Forward Tracing
Start with an input field and trace it forward through the program:
CLM-CHARGED-AMOUNT (input record)
→ MOVE to WS-CHARGED-AMT (working storage)
→ Used in COMPUTE WS-ALLOWED-AMT (2320-APPLY-DEDUCTIBLE)
→ WS-ALLOWED-AMT used in COMPUTE WS-PLAN-PAYS (2340-APPLY-LIMITS)
→ WS-PLAN-PAYS MOVE to OUT-PAYMENT-AMOUNT (output record)
This trace shows you exactly how the input charge becomes the output payment.
Backward Tracing
Start with an output field and trace it backward to find its origin:
OUT-PAYMENT-AMOUNT (what we need to understand)
← MOVE from WS-PLAN-PAYS
← COMPUTED in 2340-APPLY-LIMITS
← Uses WS-ALLOWED-AMT and WS-PLAN-PCT
← WS-ALLOWED-AMT computed in 2320-APPLY-DEDUCTIBLE
← Uses WS-CHARGED-AMT minus WS-DEDUCTIBLE-AMT
← WS-CHARGED-AMT from CLM-CHARGED-AMOUNT (input)
Using grep for Data Flow
On a Unix/Linux system or with GnuCOBOL, you can use grep to quickly trace a field:
# Find every reference to a field
grep -n "WS-ALLOWED-AMT" CLMADJ01.cbl
# Find where a field is set (MOVE, COMPUTE, ADD, etc.)
grep -n "TO WS-ALLOWED-AMT\|WS-ALLOWED-AMT =" CLMADJ01.cbl
grep -n "COMPUTE WS-ALLOWED-AMT" CLMADJ01.cbl
# Find where a field is used (but not set)
grep -n "WS-ALLOWED-AMT" CLMADJ01.cbl | grep -v "TO WS-ALLOWED-AMT"
On the mainframe, ISPF's FIND command (or SDSF) serves the same purpose:
Command ===> FIND WS-ALLOWED-AMT ALL
📊 Data Flow Symbols
When documenting data flow, use these conventions:
- → Direct MOVE
- ⇒ Computed from (COMPUTE, ADD, SUBTRACT)
- ? Conditional assignment (IF/EVALUATE)
- ⟲ Loop accumulation (ADD within PERFORM loop)
- ⊗ Unchanged (pass-through from input to output)
41.4 Impact Analysis
Impact analysis answers the question: if I change this field/paragraph/copybook, what else is affected?
Field Impact
To assess the impact of changing a field:
- Find all references in the current program (grep or FIND)
- Find all COPY members that define the field
- Find all programs that COPY the same copybook
- Find all JCL that uses the same file
# Find all programs that use a copybook
grep -l "COPY CLAIMCPY" /path/to/source/*.cbl
# Find all JCL that references a dataset
grep -l "MEDCL.CLAIMS.PENDING" /path/to/jcl/*.jcl
Paragraph Impact
Changing a paragraph affects:
- Every paragraph that PERFORMs it
- Every field it modifies (downstream effects)
- Any paragraph it PERFORMs (if you change how it calls them)
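The "who PERFORMs it" question is a one-line search. A sketch against an invented stand-in file (the paragraph and field names below are hypothetical):

```shell
# Stand-in program (hypothetical) to search for callers.
cat > /tmp/impact.cbl <<'EOF'
       2000-PROCESS.
           PERFORM 2320-APPLY-DEDUCTIBLE.
       2300-ADJUDICATE.
           PERFORM 2320-APPLY-DEDUCTIBLE.
       2320-APPLY-DEDUCTIBLE.
           SUBTRACT WS-DEDUCTIBLE-AMT FROM WS-CHARGED-AMT
               GIVING WS-ALLOWED-AMT.
EOF

# Callers: every line that PERFORMs the paragraph, with line numbers.
grep -n 'PERFORM 2320-APPLY-DEDUCTIBLE' /tmp/impact.cbl
```

Each match is a paragraph you must re-examine after the change; the fields the paragraph sets (here WS-ALLOWED-AMT) seed the downstream trace.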
Ripple Effect Analysis
Change: Modify WS-DEDUCTIBLE calculation in 2320-APPLY-DEDUCTIBLE
Direct impact:
- WS-DEDUCTIBLE-AMT changes
- WS-ALLOWED-AMT changes (depends on WS-DEDUCTIBLE-AMT)
Downstream impact:
- WS-PLAN-PAYS changes (depends on WS-ALLOWED-AMT)
- OUT-PAYMENT-AMOUNT changes (depends on WS-PLAN-PAYS)
- WS-TOTAL-PAYMENTS accumulator changes
- Control total report changes
- Payment file downstream systems affected
Indirect impact:
- GL reconciliation may fail (different total)
- Provider payment amounts change
- EOB (Explanation of Benefits) amounts change
- Regulatory reports affected
⚠️ The Iceberg Principle: The visible change (modifying one paragraph) is the tip. The downstream, indirect, and cross-program impacts are the iceberg beneath the surface. Impact analysis maps the entire iceberg before you start changing code.
41.5 Reverse Engineering Business Rules
The most valuable output of code archaeology is a business rule catalog — a plain-language description of every decision the program makes.
Extracting Rules from EVALUATE
EVALUATE statements are the richest source of business rules:
EVALUATE CLM-PLAN-TYPE
ALSO CLM-SVC-CATEGORY
ALSO CLM-NETWORK-STATUS
WHEN 'HMO' ALSO 'PREV' ALSO 'IN '
MOVE 100 TO WS-COVERAGE-PCT
MOVE 0 TO WS-COPAY
WHEN 'HMO' ALSO 'PREV' ALSO 'OUT'
MOVE 0 TO WS-COVERAGE-PCT
MOVE 0 TO WS-COPAY
Extracted business rules:

| # | Rule | Plan | Service | Network | Coverage | Copay |
|---|------|------|---------|---------|----------|-------|
| BR-001 | HMO preventive in-network | HMO | Preventive | In | 100% | $0 |
| BR-002 | HMO preventive out-of-network | HMO | Preventive | Out | 0% | $0 |
Extracting Rules from IF Statements
Nested IF statements encode business rules too, but they are harder to extract:
IF CLM-AMOUNT > 50000
IF CLM-PRE-AUTH = 'Y'
IF CLM-AUTH-DAYS-REMAINING > 0
PERFORM 2500-PROCESS-LARGE-CLAIM
ELSE
MOVE 'AUTH-EXP' TO WS-DENY-REASON
PERFORM 2600-DENY-CLAIM
END-IF
ELSE
MOVE 'NO-AUTH' TO WS-DENY-REASON
PERFORM 2600-DENY-CLAIM
END-IF
ELSE
PERFORM 2500-PROCESS-NORMAL-CLAIM
END-IF
Extracted business rules:
- BR-010: Claims over $50,000 require pre-authorization
- BR-011: Pre-authorization must not be expired (days remaining > 0)
- BR-012: Claims of $50,000 or less do not require pre-authorization
- BR-013: Denial reasons: AUTH-EXP (expired authorization), NO-AUTH (no authorization on file)
Building a Business Rule Catalog
Document each extracted rule in a standard format:
Rule ID: BR-010
Source: CLMADJ01.cbl, paragraph 2400-CHECK-AUTH, line 847
Description: Claims with charged amount exceeding $50,000
require valid pre-authorization
Condition: CLM-AMOUNT > 50000
Action: If no pre-auth, deny with reason NO-AUTH
If pre-auth expired, deny with reason AUTH-EXP
If pre-auth valid, process as large claim
Dependencies: CLM-AMOUNT (input), CLM-PRE-AUTH (input),
CLM-AUTH-DAYS-REMAINING (calculated in 2350)
Last Modified: Unknown (no change history)
Verified By: [analyst initials and date]
✅ Best Practice: Have a business analyst review your extracted rules. Code tells you what the program does, but only a domain expert can confirm that what it does is correct. You may discover that the code implements a rule that is no longer valid, or implements it differently than the business intends.
41.6 Documentation Recovery
Documentation recovery is the process of creating documentation for a system that has none. It is different from documentation writing — you are not designing something new; you are describing something that already exists.
The Documentation Stack
Build documentation from the bottom up:
- Data Dictionary: Every field, its type, its purpose, its valid values
- Program Inventory: Every program, its purpose, its inputs and outputs
- Call Graph / Program Flow: How programs call each other
- Business Rule Catalog: Every decision the system makes
- Job Stream Map: How jobs relate and depend on each other
- System Overview: High-level architecture and data flow
Data Dictionary Template
Field Name: CLM-CHARGED-AMOUNT
Program: CLMADJ01
Copybook: CLAIMCPY
PIC: S9(07)V99 COMP-3
Description: Total amount charged by provider for the
service. Used as the starting point for
payment calculation.
Valid Range: 0.01 to 9,999,999.99
Source: Input file CLMIN (MEDCL.CLAIMS.PENDING)
Derived From: Provider submission (EDI 837 or paper)
Used In: 2300-ADJUDICATE (calculate allowed amount)
2400-CALCULATE-PAYMENT (determine plan pays)
9100-VERIFY-TOTALS (accumulate for reconciliation)
Related Fields: WS-ALLOWED-AMT, WS-PLAN-PAYS,
OUT-PAYMENT-AMOUNT
Program Inventory Template
Program ID: CLMADJ01
Description: Claims adjudication - applies benefit
rules to pending claims and calculates
payment amounts
Language: COBOL (Enterprise COBOL 5.2)
Lines of Code: 4,247
Input Files: CLMIN (claims pending),
PAYTBL (payment schedule)
Output Files: CLMOUT (adjudicated claims),
ERRRPT (error report),
CTLTOT (control totals)
Called By: JCL CLMADJ step in MEDCLAIM nightly batch
Calls: DATECALC (date calculation subprogram)
Copybooks: CLAIMCPY, PAYCPY, MEMBCPY, ERRCPY
DB2 Tables: None (file-based)
Key Business Rules: BR-001 through BR-047
Last Modified: 2019-04-17 (per Endevor history)
Modified By: J. Ramirez (no longer with company)
41.7 Tools for Code Archaeology
IBM Application Discovery and Delivery Intelligence (ADDI)
IBM ADDI (formerly IBM Application Discovery) automatically analyzes COBOL programs and produces:
- Program call graphs
- Data flow diagrams
- Cross-reference reports
- Dead code identification
- Complexity metrics
COBOL Analyzers
Several vendor tools provide static analysis:
- Micro Focus (OpenText) Enterprise Analyzer: Cross-reference, flow analysis, impact analysis
- Compuware (BMC) Topaz: Program visualization, data lineage
- Sonar COBOL Plugin: Code quality metrics, complexity measurement
Command-Line Techniques
When you do not have access to commercial tools, command-line utilities are surprisingly effective:
# Count paragraphs
grep -c "^ [0-9A-Z].*\.$" program.cbl
# List all paragraph names
grep "^ [0-9A-Z].*\.$" program.cbl
# Find all PERFORM statements
grep "PERFORM " program.cbl
# Find all file I/O operations
grep -E "READ |WRITE |REWRITE |DELETE |START " program.cbl
# Find all MOVE statements for a specific field
grep "MOVE.*TO WS-AMOUNT\|MOVE WS-AMOUNT" program.cbl
# Find GO TO statements (potential spaghetti code)
grep "GO TO" program.cbl
# Count lines of code (excluding comments and blanks)
grep -v "^......\*\|^$" program.cbl | wc -l
# Find all COPY statements
grep "COPY " program.cbl
# Find EVALUATE statements (business rule locations)
grep -c "EVALUATE" program.cbl
# Find all 88-level items (flags and conditions)
grep "88 " program.cbl
Building a Cross-Reference
A cross-reference lists every variable and every paragraph, showing where each is defined and referenced:
#!/bin/bash
# Simple cross-reference generator
PROGRAM=$1
echo "=== Cross-Reference for $PROGRAM ==="
# Extract all Working-Storage variables
echo "--- Variables ---"
grep "05 \|10 \|15 " "$PROGRAM" | \
awk '{print $2}' | sort -u | while read VAR; do
COUNT=$(grep -c "$VAR" "$PROGRAM")
echo " $VAR ($COUNT references)"
done
# Extract all paragraphs
echo "--- Paragraphs ---"
grep "^ [0-9A-Z].*\.$" "$PROGRAM" | while read PARA; do
NAME=$(echo "$PARA" | awk '{print $1}' | tr -d '.')
CALLS=$(grep -c "PERFORM $NAME" "$PROGRAM")
echo " $NAME (called $CALLS times)"
done
41.8 Common Legacy Patterns and Anti-Patterns
Knowing what to look for accelerates your understanding. Here are patterns you will encounter repeatedly:
Pattern: The Status Code
Legacy programs often use numeric status codes instead of named conditions:
MOVE 1 TO WS-STATUS
* What does status 1 mean? You must find where
* WS-STATUS is tested to understand.
...
IF WS-STATUS = 1
PERFORM 3000-NORMAL
ELSE IF WS-STATUS = 2
PERFORM 4000-ERROR
ELSE IF WS-STATUS = 3
PERFORM 5000-SKIP
Archaeology technique: Search for every reference to WS-STATUS. Map each value to its meaning. Create a legend.
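A hedged starting point for that search (the file and its contents are invented stand-ins): list every line that sets the status, then every line that tests it, and pair the two lists into a legend.

```shell
# Stand-in fragment (hypothetical) that uses a numeric status code.
cat > /tmp/status.cbl <<'EOF'
           MOVE 1 TO WS-STATUS.
           MOVE 2 TO WS-STATUS.
           IF WS-STATUS = 1
               PERFORM 3000-NORMAL.
           IF WS-STATUS = 2
               PERFORM 4000-ERROR.
EOF

# Where each value is assigned...
grep -n 'MOVE .* TO WS-STATUS' /tmp/status.cbl
# ...and where each value is tested.
grep -n 'WS-STATUS = ' /tmp/status.cbl
```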
Anti-Pattern: PERFORM THRU with Fall-Through
PERFORM 2000-START THRU 2999-END.
2000-START.
... some logic ...
2100-MIDDLE.
... more logic ...
2999-END.
EXIT.
The PERFORM THRU executes every paragraph from 2000-START through 2999-END sequentially. Any paragraph in that range is part of the PERFORM, even if it looks independent. This makes it dangerous to insert new paragraphs in the range.
Archaeology technique: Identify all PERFORM THRU ranges. Mark the start and end paragraphs. Be extremely careful not to add or remove paragraphs within the range.
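Finding every THRU range is itself a one-line search. A sketch, using an invented scratch file:

```shell
# Stand-in fragment (hypothetical) with PERFORM THRU ranges.
cat > /tmp/thru.cbl <<'EOF'
           PERFORM 2000-START THRU 2999-END.
           PERFORM 5000-CALC THRU 5099-CALC-EXIT.
EOF

# Each match marks a range whose interior paragraphs must not be disturbed.
grep -n 'PERFORM .* THRU ' /tmp/thru.cbl
```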
Anti-Pattern: GO TO Spaghetti
1000-PROCESS.
IF WS-TYPE = 'A'
GO TO 3000-TYPE-A
END-IF
IF WS-TYPE = 'B'
GO TO 4000-TYPE-B
END-IF
GO TO 5000-DEFAULT.
3000-TYPE-A.
...
GO TO 6000-CONTINUE.
4000-TYPE-B.
...
IF WS-SPECIAL = 'Y'
GO TO 3000-TYPE-A
END-IF
GO TO 6000-CONTINUE.
Archaeology technique: Draw a GO TO flow diagram. Identify all entry points and exit points for each paragraph. This is the only way to understand the actual control flow.
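The raw edge list for that diagram can be extracted mechanically; the drawing still has to be done by hand or fed into a graphing tool. A sketch against an invented stand-in file:

```shell
# Stand-in fragment (hypothetical) with GO TO branches.
cat > /tmp/goto.cbl <<'EOF'
       1000-PROCESS.
           GO TO 3000-TYPE-A.
       3000-TYPE-A.
           GO TO 6000-CONTINUE.
       4000-TYPE-B.
           GO TO 3000-TYPE-A.
EOF

# One line per GO TO edge: source line number -> target paragraph.
grep -n 'GO TO ' /tmp/goto.cbl |
    awk -F: '{n = split($2, w, " "); sub(/\.$/, "", w[n]);
              print "line " $1 " -> " w[n]}'
```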
Pattern: The Working Storage Calculator
01 WS-WORK-1 PIC S9(11)V99 COMP-3.
01 WS-WORK-2 PIC S9(11)V99 COMP-3.
01 WS-WORK-3 PIC S9(11)V99 COMP-3.
01 WS-WORK-4 PIC S9(11)V99 COMP-3.
These generic work fields are reused throughout the program for different purposes. WS-WORK-1 might hold a balance in one paragraph and a payment amount in another.
Archaeology technique: Trace each work field through every paragraph that uses it. Document what it holds at each point. Flag places where it is repurposed.
Pattern: The Implicit Record Type
01 INPUT-RECORD.
05 INP-REC-TYPE PIC X(02).
05 INP-DATA PIC X(198).
01 INP-HEADER REDEFINES INPUT-RECORD.
05 FILLER PIC X(02).
05 HDR-DATE PIC 9(08).
05 HDR-SOURCE PIC X(10).
01 INP-DETAIL REDEFINES INPUT-RECORD.
05 FILLER PIC X(02).
05 DTL-ACCOUNT PIC X(10).
05 DTL-AMOUNT PIC S9(09)V99 COMP-3.
01 INP-TRAILER REDEFINES INPUT-RECORD.
05 FILLER PIC X(02).
05 TRL-RECORD-COUNT PIC 9(08).
05 TRL-TOTAL-AMOUNT PIC S9(13)V99 COMP-3.
The first two bytes determine which REDEFINES to use. This is a common and legitimate pattern, but it can be confusing if you do not recognize it.
Archaeology technique: Search for all REDEFINES on the FD record. Map each record type code to its corresponding REDEFINES. This reveals the file format.
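That search is a single grep over the source. The sketch below uses a stand-in file built from the layout above:

```shell
# Stand-in record layout (hypothetical file) with multiple REDEFINES.
cat > /tmp/redef.cbl <<'EOF'
       01  INPUT-RECORD.
           05  INP-REC-TYPE   PIC X(02).
       01  INP-HEADER  REDEFINES INPUT-RECORD.
       01  INP-DETAIL  REDEFINES INPUT-RECORD.
       01  INP-TRAILER REDEFINES INPUT-RECORD.
EOF

# Every alternate layout of the record, with its line number.
grep -n 'REDEFINES' /tmp/redef.cbl
```

Combine the output with the record-type values (here the first two bytes, INP-REC-TYPE) and you have documented the file format.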
41.9 Working with Tribal Knowledge
In many organizations, the most important documentation is not written down — it exists only in the heads of experienced developers, operators, and business users. This is tribal knowledge, and capturing it before people retire or leave is one of the most urgent challenges in mainframe shops.
Interviewing Techniques
When you have access to someone who knows the system, use structured interview techniques:
Start broad:
- "What does this system do in business terms?"
- "What would happen if this program stopped running?"
- "What are the most common problems with this program?"
Then narrow:
- "Why does paragraph 4700 exist? What special case does it handle?"
- "This EVALUATE has 47 WHEN clauses. Are all of them still valid?"
- "I see the deductible is calculated differently for plan type 'X'. Why?"
Record everything:
- Take detailed notes
- Record the conversation (with permission)
- Follow up with a written summary for the interviewee to review
The Knowledge Transfer Checklist
When a senior developer announces retirement, prioritize knowledge transfer for:
- Programs that only they maintain (single points of knowledge)
- Programs with no documentation (no alternative information source)
- Programs with unusual behavior ("Oh, that program has a special mode for leap years that is not documented anywhere")
- Production incident history ("In 2019, we had to add paragraph 4700 because...")
- Business rules that are not obvious from the code ("The regulatory requirement changed but we did not update the comments")
🔴 The Retirement Risk: The U.S. Government Accountability Office (GAO) has repeatedly warned about the risk of mainframe knowledge loss. The average COBOL developer is over 55 years old. When they retire, they take decades of context with them. Knowledge transfer is not optional — it is a business continuity issue.
41.10 GlobalBank: Archaeology of a 1987 Module
Maria Chen received a request to modify a program called GLRECON — the general ledger reconciliation module. The program had been written in 1987, modified sporadically through the 1990s, and untouched since 2003. It was 3,800 lines of COBOL with 12 comment lines (all in the IDENTIFICATION DIVISION).
Maria's Approach
Day 1: External Evidence
Maria started with the JCL:
- Input: GBANK.GL.DAILY (general ledger transactions), GBANK.ACCT.MASTER (account master)
- Output: GBANK.GL.RECONCILED, GBANK.GL.EXCEPTIONS, a SYSOUT report
- Run position: Step 7 in the nightly batch, after interest calculation, before statement generation
This told her: GLRECON compares account balances against the general ledger, finds discrepancies, and writes exceptions for investigation.
Day 2: Data Division Survey
She cataloged the data structures:
- 4 input/output files
- 47 working-storage variables (many with unhelpful names like WS-A1, WS-A2, WS-B1)
- 3 tables loaded from a parameter file
- 23 flags (88-level items)
Day 3: Call Graph
She built the call graph and found the program was mostly well-structured, with one exception: paragraphs 4700 through 4799 were a PERFORM THRU block that handled "special reconciliation adjustments" for a set of account types that had different GL mappings.
Day 4: Business Rule Extraction
The most complex paragraph was 3200-COMPARE-BALANCES. It contained an EVALUATE with 31 WHEN clauses, each mapping an account type to a GL category. Some of the account types had comments from the original 1987 development; others had been added later with no comments at all.
Maria created a business rule matrix:
| Account Type | GL Category | Reconciliation Rule | Added |
|---|---|---|---|
| CHK | 1000 | Balance must match within $0.01 | 1987 |
| SAV | 1010 | Balance must match within $0.01 | 1987 |
| CD | 1020 | Balance must match exactly | 1987 |
| MMA | 1030 | Balance must match within $0.01 | 1993 |
| IRA | 1040 | Special accrual adjustment | 1997 |
| HSA | 1050 | Separate reconciliation path | 2003 |
Day 5: The Mystery of Paragraph 4700
Paragraph 4700-SPECIAL-RECON contained logic that nobody on the current team understood. It read a parameter file with a list of account numbers and applied a fixed adjustment to their GL balances before comparison. Maria found 15 account numbers in the parameter file, some belonging to accounts that had been closed for years.
She tracked down Harold Mercer, a retired developer who had worked at GlobalBank in the 1990s. Through a phone call (arranged by the HR department's alumni network), she learned: "Those are accounts that were involved in a system conversion in 1995. The conversion left a rounding difference on each account, and rather than fix the underlying data, we added the adjustment to the reconciliation program. We always meant to go back and fix it properly."
That conversation took 20 minutes and saved Maria weeks of confusion.
The Outcome
Maria produced:
1. A 24-page documentation package for GLRECON
2. A business rule catalog with 31 reconciliation rules
3. A recommendation to remove the 4700-SPECIAL-RECON logic (the 15 accounts no longer existed)
4. Her original modification (adding a new GL category for a new account type), completed confidently because she now understood the system
⚖️ Theme — The Human Factor: Maria's archaeology of GLRECON illustrates the human factor theme perfectly. The code was the easy part — she could read COBOL. The hard part was understanding why the code was written that way. That answer was in Harold Mercer's memory, not in the source code.
41.11 MedClaim: Reverse Engineering Adjudication Rules
James Okafor faced a different challenge. MedClaim's adjudication program, CLM-ADJUD, had accumulated business rules over 20 years. Different developers had added rules at different times, and there was no single document listing all the adjudication rules currently in effect. The compliance team needed a complete catalog for a regulatory audit.
The Scale of the Problem
CLM-ADJUD was 7,200 lines of COBOL. It contained:
- 14 EVALUATE statements
- 47 nested IF blocks
- 23 PERFORM THRU ranges
- 112 distinct business rules (as it turned out)
Sarah Kim's Contribution
Sarah Kim, the business analyst, worked alongside James. For each rule James extracted from the code, Sarah verified it against the current benefit plan documentation:
- 82 rules matched the current documentation exactly
- 19 rules were implemented correctly but undocumented
- 7 rules implemented a requirement that had since been superseded
- 4 rules were incorrect (implemented a misunderstood requirement)
The 7 superseded rules were legacy artifacts — they had been correct when written but the regulations had changed without the code being updated. Because the rules were lenient (they approved claims that should have been denied under new rules), they had not caused visible problems, but they represented a compliance risk.
The 4 incorrect rules were a discovery that James described as "finding a ticking time bomb." One of them incorrectly calculated the out-of-pocket maximum for a specific plan type, potentially exposing MedClaim to regulatory penalties.
The Archaeology Process
James used a structured approach:
- Pass 1 — EVALUATE extraction: Documented every EVALUATE statement and its WHEN clauses
- Pass 2 — IF extraction: Documented every nested IF block (focusing on those containing financial calculations)
- Pass 3 — Cross-reference: Linked each rule to its input fields and output effects
- Pass 4 — Business verification: Sarah verified each rule against plan documentation
- Pass 5 — Gap analysis: Identified rules in the plan documentation that had no corresponding code
The complete catalog took three weeks to produce. James estimated it would have taken three months without Sarah's domain knowledge.
🔵 MedClaim Lesson: Code archaeology is most effective when a technical person (who can read the code) works alongside a domain expert (who knows the business). Neither one alone can produce a complete and accurate understanding.
41.12 Try It Yourself: Analyzing an Unknown Program
Student Lab Exercise
The code directory for this chapter contains a COBOL program called MYSTERY.cbl. It is 800 lines of intentionally undocumented code with unhelpful variable names and no comments (beyond the PROGRAM-ID).
Your assignment:
- Do NOT read the program from top to bottom. Follow the systematic approach from this chapter.
- Start with the PROCEDURE DIVISION's main control paragraph. What is the program's structure?
- Build a call graph showing all PERFORM relationships.
- Survey the DATA DIVISION. How many files? What are the record layouts?
- Trace the data flow for the primary input field through to the primary output field.
- Extract at least 5 business rules from EVALUATE and IF statements.
- Write a one-page summary describing what the program does, as if you were explaining it to a new team member.
Time limit: 2 hours. This simulates a real-world scenario where you need to quickly understand an unfamiliar program.
41.13 Call Graph Construction Methods
Building a call graph is one of the first steps in code archaeology (section 41.2, Step 4). For small programs, you can trace PERFORM statements manually. For large systems — thousands of programs across hundreds of copybooks — you need systematic methods.
Method 1: grep-Based Call Graph
The simplest automated approach uses grep to extract PERFORM statements and build a tree:
#!/bin/bash
# call-graph.sh — Build a call graph from a COBOL program
# Usage: ./call-graph.sh PROGRAM.cbl
PROGRAM=$1
echo "=== Call Graph for $PROGRAM ==="
echo ""
# Extract all paragraph names (Area A, starting in column 8, ending with a period)
echo "--- Paragraph Definitions ---"
grep -n "^ [0-9A-Z][0-9A-Z-]*\." "$PROGRAM" | \
sed 's/\.$//' | \
awk -F: '{printf " Line %4d: %s\n", $1, $2}'
echo ""
echo "--- PERFORM Relationships ---"
# For each paragraph, find what it PERFORMs
grep -n "^ [0-9A-Z][0-9A-Z-]*\." "$PROGRAM" | \
sed 's/\.$//' | \
awk -F: '{print $2}' | \
while read PARA; do
PARA_NAME=$(echo "$PARA" | awk '{print $1}')
# Find the line range of this paragraph (to next paragraph)
START=$(grep -n "^ ${PARA_NAME}\." "$PROGRAM" | \
head -1 | cut -d: -f1)
# Find next paragraph start
END=$(awk -v start="$START" \
'NR > start && /^ [0-9A-Z][0-9A-Z-]*\./ \
{print NR; exit}' "$PROGRAM")
[ -z "$END" ] && END=$(wc -l < "$PROGRAM")
# Extract PERFORMs within this paragraph
PERFORMS=$(sed -n "${START},${END}p" "$PROGRAM" | \
grep "PERFORM " | \
grep -oP "PERFORM \K[0-9A-Z][0-9A-Z-]*" | \
sort -u)
if [ -n "$PERFORMS" ]; then
echo " $PARA_NAME"
echo "$PERFORMS" | while read P; do
echo " └── $P"
done
fi
done
This script produces output like:
--- PERFORM Relationships ---
0000-MAIN-CONTROL
└── 1000-INITIALIZATION
└── 2000-PROCESS
└── 9000-TERMINATION
2000-PROCESS
└── 2100-READ-INPUT
└── 2200-VALIDATE
└── 2300-ADJUDICATE
└── 2400-CALCULATE-PAYMENT
└── 2500-WRITE-OUTPUT
2200-VALIDATE
└── 2210-CHECK-MEMBER
└── 2220-CHECK-PROVIDER
└── 2230-CHECK-ELIGIBILITY
Method 2: The Compiler Cross-Reference
Enterprise COBOL produces a cross-reference listing when compiled with the XREF option:
//COMPILE EXEC PGM=IGYCRCTL,
// PARM='XREF(FULL),MAP,LIST,OFFSET'
The XREF listing shows, for every data name and procedure name, every line where it is referenced and whether the reference is a definition, modification, or use. This is the most authoritative call graph source because it comes directly from the compiler.
Cross-Reference of Procedures
Paragraph Defined References
0000-MAIN-CONTROL 47
1000-INITIALIZATION 53 P47
1100-OPEN-FILES 68 P55
1200-LOAD-TABLES 89 P56
2000-PROCESS 112 P48
2100-READ-INPUT 118 P113
2200-VALIDATE 125 P114
2210-CHECK-MEMBER 148 P126
...
The "P47" means "PERFORMed at line 47." This tells you that 1000-INITIALIZATION is called from line 47, which is within 0000-MAIN-CONTROL.
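This listing format is regular enough to process mechanically. Below is a minimal Python sketch (the function name and the bisect-based caller lookup are our own) that turns procedure XREF lines into caller-to-callee edges by resolving each Pnn site to the paragraph whose definition line most recently precedes it:

```python
import bisect

def parse_xref(lines):
    """Turn procedure XREF lines ('NAME DEFLINE Pnn ...') into a
    caller -> [callees] dictionary."""
    defs = []           # (definition line, paragraph name)
    perform_sites = []  # (line of the PERFORM, callee name)
    for line in lines:
        parts = line.split()
        if len(parts) < 2 or not parts[1].isdigit():
            continue    # skip headers and blank lines
        name = parts[0]
        defs.append((int(parts[1]), name))
        for ref in parts[2:]:
            if ref.startswith("P") and ref[1:].isdigit():
                perform_sites.append((int(ref[1:]), name))
    defs.sort()
    def_lines = [line_no for line_no, _ in defs]
    edges = {}
    for site, callee in perform_sites:
        # Caller = the paragraph whose definition line most
        # recently precedes the PERFORM site
        i = bisect.bisect_right(def_lines, site) - 1
        edges.setdefault(defs[i][1], []).append(callee)
    return edges

listing = [
    "0000-MAIN-CONTROL     47",
    "1000-INITIALIZATION   53  P47",
    "1100-OPEN-FILES       68  P55",
    "2000-PROCESS         112  P48",
]
print(parse_xref(listing))
```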
Method 3: IBM ADDI (Application Discovery)
For large-scale archaeology across thousands of programs, IBM ADDI automatically:
- Scans all COBOL programs in a library
- Builds inter-program call graphs (CALL statements across programs)
- Maps copybook usage across all programs
- Identifies dead code (paragraphs never PERFORMed)
- Produces visual dependency diagrams
The output is a web-based interactive graph where you can click on a program to see everything it calls and everything that calls it. For a system with 2,000 COBOL programs, this is the only practical approach to understanding the full architecture.
📊 Call Graph Complexity Metrics
| Metric | Description | Concern Level |
|---|---|---|
| Max depth | Deepest nesting of PERFORMs | > 8 levels = complex |
| Fan-out | Most paragraphs called by any single paragraph | > 10 = possibly doing too much |
| Fan-in | Most callers for any single paragraph | High fan-in = widely reused utility |
| Orphan paragraphs | Paragraphs never PERFORMed | Possible dead code |
| Circular references | A PERFORMs B PERFORMs A | Recursive logic (rare in COBOL) |
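These metrics can be computed directly from a caller-to-callees map such as the one produced by the grep script in Method 1. A Python sketch, assuming a single entry paragraph (the function name is ours):

```python
def call_graph_metrics(edges):
    """Compute call-graph complexity metrics from a
    caller -> [callees] dictionary."""
    all_paras = set(edges) | {c for cs in edges.values() for c in cs}
    callers = {p: [] for p in all_paras}
    for caller, callees in edges.items():
        for c in callees:
            callers[c].append(caller)

    def depth(p, seen):
        if p in seen:                       # circular PERFORM chain
            return float("inf")
        return 1 + max((depth(c, seen | {p}) for c in edges.get(p, [])),
                       default=0)

    callees_all = {c for cs in edges.values() for c in cs}
    root = next(iter(all_paras - callees_all))  # assumes one entry paragraph
    return {
        "max_depth": depth(root, set()),
        "fan_out": max(len(cs) for cs in edges.values()),
        "fan_in": max(len(cs) for cs in callers.values()),
        "orphans": sorted(p for p in all_paras
                          if not callers[p] and p != root),
    }

edges = {
    "0000-MAIN": ["1000-INIT", "2000-PROCESS", "9000-TERM"],
    "2000-PROCESS": ["2100-READ", "2200-VALIDATE"],
}
print(call_graph_metrics(edges))
```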
41.14 Data Dictionary Recovery
When no data dictionary exists, you must build one from the source code. This is tedious but invaluable — once complete, it becomes the authoritative reference for everyone working on the system.
Automated Data Dictionary Extraction
#!/bin/bash
# extract-data-dict.sh — Extract data items from COBOL source
# Produces a CSV file of all data items
# Note: requires GNU awk (gawk) for the three-argument match()
PROGRAM=$1
OUTPUT="${PROGRAM%.cbl}-data-dict.csv"
echo "Field Name,Level,PIC,Usage,Defined In,Line" > "$OUTPUT"
# Extract all data items with their PIC clauses
gawk '
/^ +[0-9][0-9] +[0-9A-Z]/ {
    level = $1
    name  = $2
    pic   = ""
    usage = "DISPLAY"    # default
    # Look for a PIC or PICTURE clause on this line
    if (match($0, /PIC(TURE)?[ \t]+([^ .]+)/, arr)) {
        pic = arr[2]
    }
    if ($0 ~ /COMP-3/)      usage = "COMP-3"
    else if ($0 ~ /COMP/)   usage = "COMP"
    else if ($0 ~ /BINARY/) usage = "BINARY"
    if (name != "FILLER") {
        printf "%s,%s,%s,%s,%s,%d\n", name, level, pic, usage,
               FILENAME, FNR
    }
}' "$PROGRAM" >> "$OUTPUT"
echo "Data dictionary written to $OUTPUT"
echo "$(($(wc -l < "$OUTPUT") - 1)) fields extracted"
The Data Dictionary Template
For each significant field, create a detailed entry:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
DATA DICTIONARY ENTRY
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Field: CLM-OUT-OF-POCKET-ACCUM
Level: 05
Parent: CLM-MEMBER-BENEFITS
PIC: S9(07)V99 COMP-3
Bytes: 5
Description: Year-to-date out-of-pocket accumulator for
the member. Includes deductible payments,
copays, and coinsurance. Excludes premium
payments and non-covered services.
Valid Range: 0.00 to 99,999.99 (per plan year)
Reset: Set to 0.00 on January 1 (or plan anniversary)
by BNFRESET batch program.
Set By: CLM-ADJ paragraph 3400-UPDATE-ACCUMULATORS
CLM-VOID paragraph 2200-REVERSE-ACCUMULATORS
Used By: CLM-ADJ paragraph 2310-CHECK-OOP-MAX
ELIGCHK paragraph 4000-CALC-PATIENT-COST
BNFSTMT paragraph 3100-MEMBER-SUMMARY
Related: CLM-OOP-MAXIMUM (annual limit)
CLM-DEDUCTIBLE-ACCUM (subset of OOP)
Source: CLMCPY copybook
Programs: CLM-ADJ, CLM-VOID, ELIGCHK, BNFSTMT, BNFRESET
Notes: When OOP accumulator reaches OOP maximum,
plan pays 100% (no further patient cost).
This is a federal ACA requirement.
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
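The Bytes value in an entry like this can be derived from the PIC and USAGE clauses. A rough Python sketch covering only the common forms (9, X, S, V, and (n) repeat factors); `field_bytes` is our own name:

```python
import re

def field_bytes(pic, usage="DISPLAY"):
    """Storage bytes for an elementary item. A sketch that covers
    only the common PIC forms: 9, X, S, V and (n) repeat factors."""
    # Expand repeat factors: S9(07)V99 -> S9999999V99
    expanded = re.sub(r"([9X])\((\d+)\)",
                      lambda m: m.group(1) * int(m.group(2)), pic)
    if "X" in expanded:                 # alphanumeric: one byte per X
        return expanded.count("X")
    digits = expanded.count("9")        # S and V occupy no storage here
    if usage == "COMP-3":               # packed: 2 digits/byte + sign nibble
        return (digits + 2) // 2
    if usage in ("COMP", "BINARY"):     # halfword / fullword / doubleword
        return 2 if digits <= 4 else 4 if digits <= 9 else 8
    return digits                       # DISPLAY with embedded sign

print(field_bytes("S9(07)V99", "COMP-3"))   # 5, matching the entry above
```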
💡 Key Insight: The most valuable part of a recovered data dictionary is the "Notes" field — the business context that is not captured in the PIC clause. At MedClaim, Sarah Kim contributed the notes for every field in the claims processing data dictionary, adding business context that no amount of code reading could reveal.
41.15 COBOL Cross-Reference Analysis
A cross-reference report is one of the most powerful tools for understanding legacy code. It tells you, for every name in the program, exactly where and how it is used.
Generating Cross-References
Enterprise COBOL (z/OS):
//COMPILE EXEC PGM=IGYCRCTL,
// PARM='XREF(FULL),VBREF,MAP,OFFSET'
- XREF(FULL) produces a cross-reference of data names and procedure names
- VBREF adds a verb cross-reference (every COBOL verb and where it is used)
- MAP produces a data map showing the offset and length of every field
GnuCOBOL:
cobc -x -t PROGRAM.lst PROGRAM.cbl
The -t option writes a compile listing; the cross-reference sections it includes are controlled by additional listing options that vary by GnuCOBOL release — check cobc --help for your version.
Reading the Cross-Reference
The data name cross-reference tells you which paragraphs modify each field:
Data Name Defn References
WS-PAYMENT-AMOUNT 047 M125 M237 R089 R142 R318 C415
^ ^ ^ ^ ^ ^
| | | | | |
| | | | | Compared at 415
| | | | Referenced at 318
| | | Referenced at 142
| | Referenced at 89
| Modified at 237
Modified at 125
Legend: M = Modified R = Referenced C = Compared
This immediately tells you that WS-PAYMENT-AMOUNT is set at lines 125 and 237, and used (read) at lines 89, 142, 318, and 415. If you need to understand how the payment amount is calculated, you examine lines 125 and 237. If you need to understand who depends on it, you examine lines 89, 142, 318, and 415.
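Splitting such a line into its M/R/C buckets is straightforward to script. A small Python sketch (the helper name is our own):

```python
def parse_data_xref(line):
    """Split a data-name XREF line into modified / referenced /
    compared line numbers, per the M/R/C legend above."""
    parts = line.split()
    out = {"name": parts[0], "defined": int(parts[1]),
           "modified": [], "referenced": [], "compared": []}
    kinds = {"M": "modified", "R": "referenced", "C": "compared"}
    for ref in parts[2:]:
        out[kinds[ref[0]]].append(int(ref[1:]))
    return out

x = parse_data_xref(
    "WS-PAYMENT-AMOUNT  047  M125 M237 R089 R142 R318 C415")
print(x["modified"], x["referenced"], x["compared"])
```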
Using Cross-References for Impact Analysis
When asked "what happens if I change the size of CLM-CHARGED-AMOUNT from PIC S9(07)V99 to PIC S9(09)V99?", the cross-reference tells you every program and every line that references this field:
# Find all programs that reference CLM-CHARGED-AMOUNT
# (search across all source in the library)
grep -rn "CLM-CHARGED-AMOUNT" /path/to/source/*.cbl
# Find the copybook where it is defined
grep -rn "CLM-CHARGED-AMOUNT" /path/to/copybooks/*.cpy
From the cross-reference, you build an impact matrix:
| Program | Lines Affected | Type of Reference | Risk |
|---|---|---|---|
| CLM-ADJ | 125, 237, 318, 415 | M, M, R, C | HIGH — calculates payment from this field |
| CLM-RPT | 089, 142 | R, R | LOW — display only |
| CLM-VOID | 201, 245, 267 | M, R, C | HIGH — reversal logic |
| ELIGCHK | none | none | NONE — does not reference this field |
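A first cut at the "Type of Reference" column can be scripted by classifying each referencing line by the verb around it: MOVE/ADD ... TO and COMPUTE targets count as modifications, IF/WHEN/UNTIL lines as comparisons, everything else as a plain reference. A rough Python sketch (a heuristic only; the compiler XREF remains authoritative):

```python
import re

def classify(line, field):
    """Rough M/R/C classification of one line that names the field."""
    if re.search(rf"\bTO +{field}(?![0-9A-Z-])", line):
        return "M"                      # MOVE/ADD ... TO field
    if re.search(rf"\bCOMPUTE +{field}(?![0-9A-Z-])", line):
        return "M"                      # COMPUTE field = ...
    if re.search(rf"\b(IF|WHEN|UNTIL)\b.*{field}", line):
        return "C"
    return "R"

def impact_matrix(sources, field):
    """sources: {program name: source text}. Returns, per program,
    the referencing lines and their M/R/C classification."""
    matrix = {}
    for program, text in sources.items():
        refs = [(n, classify(line, field))
                for n, line in enumerate(text.splitlines(), 1)
                if field in line]
        if refs:
            matrix[program] = refs
    return matrix

sources = {"CLM-RPT": (
    "           MOVE CLM-CHARGED-AMOUNT TO RPT-AMT\n"
    "           IF CLM-CHARGED-AMOUNT > 50000\n"
    "           MOVE WS-AMT TO CLM-CHARGED-AMOUNT\n")}
print(impact_matrix(sources, "CLM-CHARGED-AMOUNT"))
```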
⚠️ The Hidden Impact: Changing a field's PIC size in a copybook affects every program that copies it — even programs that do not directly reference the field. The copybook change shifts the offsets of every field defined after it, which can corrupt data in programs that use REDEFINES or reference modification on the record.
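The shift is easy to demonstrate on a toy layout. The record below is hypothetical; the byte counts follow packed-decimal sizing (S9(07)V99 COMP-3 occupies 5 bytes, S9(09)V99 COMP-3 occupies 6):

```python
def offsets(fields):
    """Map each (name, bytes) field to its byte offset in the record."""
    pos, table = 0, {}
    for name, size in fields:
        table[name] = pos
        pos += size
    return table

before = [("CLM-ID", 10), ("CLM-CHARGED-AMOUNT", 5),   # S9(07)V99 COMP-3
          ("CLM-PAID-AMOUNT", 5), ("CLM-STATUS", 1)]
after  = [("CLM-ID", 10), ("CLM-CHARGED-AMOUNT", 6),   # S9(09)V99 COMP-3
          ("CLM-PAID-AMOUNT", 5), ("CLM-STATUS", 1)]

# Every field after the change moves by one byte, breaking any
# REDEFINES or reference modification that assumed the old offsets.
for field in ("CLM-PAID-AMOUNT", "CLM-STATUS"):
    print(field, offsets(before)[field], "->", offsets(after)[field])
```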
41.16 Working with Tribal Knowledge: Advanced Techniques
Section 41.9 introduced the concept of tribal knowledge and basic interviewing techniques. Here we explore additional methods for capturing and preserving institutional knowledge.
The Annotated Code Review
Sit with the subject matter expert (SME) and walk through the code on screen, recording their running commentary:
* [SME annotation session — Harold Mercer, 2026-02-15]
*
* Harold: "This paragraph is the heart of the program.
* It was originally just the first three IF blocks.
* The rest was added over the years."
*
4700-SPECIAL-RECON.
* Harold: "These 15 accounts were left over from the
* 1995 conversion. Each has a small rounding error.
* The adjustment amounts are in the PARM file."
PERFORM VARYING WS-ADJ-IDX
FROM 1 BY 1
UNTIL WS-ADJ-IDX > WS-ADJ-COUNT
IF ACCT-NUMBER =
WS-ADJ-ACCT(WS-ADJ-IDX)
* Harold: "The adjustment is always a credit
* because the conversion shorted each account
* by a few cents."
ADD WS-ADJ-AMOUNT(WS-ADJ-IDX)
TO WS-GL-BALANCE
END-IF
END-PERFORM.
*
* Harold: "If the adjustment accounts are ever closed,
* you can remove this entire paragraph and the
* parameter file. Nobody will miss it."
This annotated code becomes permanent documentation. The SME's words, attached to the specific lines they explain, are worth more than any amount of after-the-fact documentation.
The Scenario Walkthrough
Instead of asking general questions about code, walk through specific scenarios:
Scenario: "A claim comes in for $75,000 with a plan type of PPO and no pre-authorization. Walk me through what happens."
The SME traces the execution path while you document each decision point, field assignment, and branch. This produces a concrete trace that validates your understanding of the business rules.
Documentation Templates for Knowledge Transfer
Program Summary Template:
PROGRAM KNOWLEDGE TRANSFER DOCUMENT
====================================
Program: GLRECON
SME: Harold Mercer (retired, phone available)
Date: 2026-02-15
Scribe: Maria Chen
PURPOSE:
[1-2 sentence business description]
WHEN IT RUNS:
[Schedule, dependencies, batch position]
WHAT CAN GO WRONG:
[Top 3 failure modes and recovery procedures]
BUSINESS RULES NOT OBVIOUS FROM CODE:
1. [Rule description, which paragraph, why it exists]
2. [...]
3. [...]
HISTORY / WHY IT IS THE WAY IT IS:
[Key historical context — conversions, mergers,
regulatory changes that shaped the code]
IF YOU NEED TO CHANGE THIS PROGRAM:
[Specific warnings, gotchas, areas of fragility]
CONTACT:
[Who to call if this program fails at 3 AM]
Impact Analysis Tools
Beyond manual grep and cross-reference analysis, several tools specialize in COBOL impact analysis:
IBM ADDI Dependency Analysis:
- Traces data flow across programs (field A in program X feeds field B in program Y via a shared file)
- Identifies all programs affected by a copybook change
- Maps DB2 table usage across all programs

Micro Focus (OpenText) Enterprise Analyzer:
- Provides visual impact analysis diagrams
- Supports "what-if" analysis — change a field and see the full ripple effect
- Tracks data lineage from source to destination across the entire system

BMC Compuware Topaz for Total Test:
- Records actual execution paths during testing
- Identifies code paths that are never exercised (dead code candidates)
- Generates test data from production patterns
📊 Impact Analysis Effort by System Size
| System Size | Programs | Manual Analysis Time | Tool-Assisted Time |
|---|---|---|---|
| Small | 10-50 | 1-3 days | 2-4 hours |
| Medium | 50-500 | 2-4 weeks | 1-3 days |
| Large | 500-5,000 | 3-6 months | 1-3 weeks |
| Enterprise | 5,000+ | Not feasible | 1-3 months |
For systems above 500 programs, manual impact analysis is effectively impossible — the number of cross-program data flows exceeds what a human can track. Tool-assisted analysis is not optional; it is essential.
41.17 Try It Yourself: Building a Cross-Reference Report
Student Lab Exercise
Write a shell script (or COBOL program, if you prefer) that accepts a COBOL source file and produces:
- Paragraph inventory: Every paragraph name, its line number, and how many times it is PERFORMed
- Dead paragraph detection: Paragraphs that are defined but never PERFORMed (excluding the main paragraph)
- Variable inventory: Every WORKING-STORAGE variable with its PIC clause and reference count
- Unused variable detection: Variables defined but never referenced outside their definition
- PERFORM depth analysis: For the main control paragraph, calculate the maximum nesting depth of PERFORM calls
Test your tool against the MYSTERY.cbl program from section 41.12. Compare your tool's output with your manual analysis to verify accuracy.
🧪 Extension Challenge: Enhance your tool to detect PERFORM THRU ranges and flag them with a warning. PERFORM THRU is the single most common source of confusion in legacy COBOL, and automatically identifying these ranges accelerates code understanding significantly.
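As a starting point for the challenge, the range detection itself needs only a regular expression. A minimal Python sketch (the function name is ours; it accepts both THRU and THROUGH):

```python
import re

THRU = re.compile(r"PERFORM\s+([0-9A-Z][0-9A-Z-]*)\s+"
                  r"(?:THRU|THROUGH)\s+([0-9A-Z][0-9A-Z-]*)")

def find_thru_ranges(source):
    """Return (start-paragraph, end-paragraph, line-no) for every
    PERFORM THRU in the source text."""
    return [(m.group(1), m.group(2), n)
            for n, line in enumerate(source.splitlines(), 1)
            if (m := THRU.search(line))]

src = """\
           PERFORM 2000-PROCESS THRU 2000-EXIT
           PERFORM 2100-READ-INPUT
           PERFORM 3000-CALC THROUGH 3000-EXIT
"""
for start, end, line_no in find_thru_ranges(src):
    print(f"WARNING line {line_no}: PERFORM {start} THRU {end}")
```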
41.18 Documentation Templates for Legacy Systems
When you complete a code archaeology project, the documentation you produce becomes the authoritative reference for everyone who touches the system. Having standard templates ensures consistency and completeness.
System Overview Template
SYSTEM OVERVIEW: [System Name]
================================
Date: [YYYY-MM-DD]
Author: [Name]
Reviewed By: [Name, Date]
1. BUSINESS PURPOSE
[2-3 sentences describing what this system does
in business terms, not technical terms]
2. SYSTEM BOUNDARIES
Inputs:
- [Source system] → [File/Queue/API] → [Program]
- ...
Outputs:
- [Program] → [File/Queue/API] → [Target system]
- ...
3. PROGRAM INVENTORY
[Program ID] [Lines] [Purpose] [Criticality]
CLMADJ01 4,247 Claims adjudication HIGH
CLMVAL01 2,891 Claims validation HIGH
CLMRPT01 1,456 Claims reporting MEDIUM
...
4. BATCH SCHEDULE
[Job Name] [Schedule] [Duration] [Dependencies]
CLMBATCH1 Daily 23:00 45 min None
CLMBATCH2 Daily 23:45 90 min CLMBATCH1
...
5. DATA STORES
[Dataset/Table] [Type] [Records] [Owner]
MEDCL.CLAIMS.PENDING VSAM ~500K CLM-INT
MEDCL.CLAIMS.ADJUDICATED SEQ ~18K/day CLM-ADJ
MEMBER_COVERAGE DB2 ~2M ELIGCHK
...
6. KNOWN ISSUES
- [Issue description, workaround, risk level]
- ...
7. CHANGE HISTORY (from source control or Endevor)
[Date] [Program] [Developer] [Description]
2019-04-17 CLMADJ01 J.Ramirez Added HSA plan type
...
8. CONTACTS
Primary: [Name, phone, email]
Backup: [Name, phone, email]
Business: [Name, phone, email]
Job Stream Map Template
JOB STREAM MAP: [Stream Name]
==============================
Date: [YYYY-MM-DD]
TRIGGER: [Time/Event/Predecessor]
STEP 1: [Job Name]
Program: [Program ID]
Input: [File(s)]
Output: [File(s)]
Control: [Control total file]
Recovery: [RERUN/RESTART/STOP]
Max RC: [0/4/8]
Notes: [Special considerations]
↓ (RC ≤ 4)
STEP 2: [Job Name]
Program: [Program ID]
Input: [File(s) — output from Step 1]
Control: [Verify against Step 1 control totals]
...
↓ (RC ≤ 4) ↓ (RC = 4, warning)
STEP 3a: [Job] STEP 3b: [Alert Job]
... Notify operations team
Business Rule Catalog Template
BUSINESS RULE CATALOG: [System Name]
=====================================
Version: [N.N]
Date: [YYYY-MM-DD]
Verified By: [Business Analyst Name]
CATEGORY: [Category Name, e.g., "Eligibility Determination"]
BR-[NNN]: [Rule Name]
Source: [Program, Paragraph, Line]
Description: [Plain language description of the rule]
Condition: [Technical condition from code]
Action: [What happens when condition is true]
Action: [What happens when condition is false]
Regulatory: [Regulatory citation, if applicable]
Effective: [Date rule became effective]
Verified: [Y/N] [Date] [Initials]
Notes: [Any additional context]
BR-001: Minimum Eligibility Age
Source: ELIGCHK, 1000-CHECK-MEMBER, line 142
Description: Member must be at least 18 years old for
individual coverage, or be a dependent
under 26 for family coverage
Condition: MEMBER-AGE < 18 AND MEMBER-TYPE ≠ 'DEP'
Action TRUE: Set COMM-NOT-ELIGIBLE, msg 'Under 18'
Action FALSE: Continue eligibility check
Regulatory: ACA Section 2714 (dependents to age 26)
Effective: 2010-09-23
Verified: Y 2026-02-15 SK
Notes: Age 26 cutoff applies to end of birth month
✅ Best Practice: Business rule catalogs should be living documents maintained in version control alongside the source code. When a developer changes a business rule in the code, the corresponding catalog entry should be updated in the same commit. At MedClaim, Sarah Kim reviews every pull request that modifies an EVALUATE or complex IF statement to ensure the business rule catalog stays current.
41.19 MedClaim: The Full Archaeology Report
To illustrate what a complete code archaeology output looks like, here is the table of contents from James Okafor's documentation of the CLM-ADJUD program — the 7,200-line adjudication engine described in section 41.11.
The Deliverables
CLM-ADJUD SYSTEM DOCUMENTATION
Version 1.0 — 2026-03-01
Prepared by: James Okafor, Sarah Kim
Table of Contents:
1. Executive Summary (2 pages)
- What CLM-ADJUD does in business terms
- Why this documentation was created
- Key findings and recommendations
2. System Overview (5 pages)
- Architecture diagram
- Input/output file descriptions
- Batch schedule and dependencies
- Program inventory
3. Data Dictionary (28 pages)
- 147 fields documented
- Each field: name, PIC, description, valid values,
source, destination, business meaning
4. Call Graph (4 pages)
- Visual hierarchy of all 89 paragraphs
- PERFORM THRU ranges highlighted
- Dead code paragraphs identified (3 found)
5. Business Rule Catalog (35 pages)
- 112 rules documented
- Each rule: ID, source location, description,
conditions, actions, regulatory citations
- Verification status (82 confirmed, 19 undocumented,
7 superseded, 4 incorrect)
6. Data Flow Diagrams (8 pages)
- Input-to-output flow for charge amount
- Input-to-output flow for payment amount
- Accumulator flows (deductible, OOP, totals)
7. Known Issues and Recommendations (3 pages)
- 7 superseded rules to remove
- 4 incorrect rules to fix (PRIORITY: HIGH)
- 3 dead code paragraphs to remove
- 23 PERFORM THRU ranges to refactor (long-term)
8. Appendices (12 pages)
- Cross-reference listing
- Test case inventory
- Revision history
This documentation package took three weeks to produce — one week of James reading code and building the call graph, one week of rule extraction with Sarah Kim verifying each rule, and one week of writing and reviewing. It has since been referenced hundreds of times by developers, business analysts, and compliance auditors.
"Three weeks of work that saves hundreds of hours over its lifetime," James said. "The ROI is infinite."
🔵 MedClaim Lesson: Documentation is not a luxury or an afterthought — it is a deliverable. At MedClaim, every major code archaeology project now produces a documentation package following this template. The packages are stored in the same Git repository as the source code, reviewed through pull requests, and updated when the code changes.
41.20 Dead Code Detection and Removal
Legacy programs accumulate dead code over decades — paragraphs that were once active but are no longer PERFORMed, variables that were once used but are now unused, and commented-out code blocks that clutter the program.
Types of Dead Code in COBOL
Unreachable paragraphs: Paragraphs that are never PERFORMed or GO TO'd from any other paragraph (CALL targets whole programs, not paragraphs). They may have been active in a previous version but were orphaned when the calling logic was changed.
Unused variables: WORKING-STORAGE fields that are defined but never referenced in the PROCEDURE DIVISION. They may have been used by a paragraph that was removed, or they may be remnants of a planned feature that was never implemented.
Commented-out code: Large blocks of commented code are a common legacy pattern. Developers commented out code instead of deleting it, "just in case." After 20 years, nobody remembers what the commented code was for or whether it is still relevant.
Conditional dead code: Code that is technically reachable but can never execute because the condition that triggers it can never be true. For example, an EVALUATE WHEN clause for an account type that no longer exists in the system.
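Conditional dead code cannot be found by call-graph analysis alone, but a heuristic helps: compare the literals in WHEN clauses against the values that still occur in current reference data. A Python sketch with hypothetical plan-type codes:

```python
import re

LIVE_PLAN_TYPES = {"HMO", "PPO", "POS", "HDHP"}   # from current reference data

def stale_when_clauses(source, live_values):
    """Flag EVALUATE WHEN literals that no longer occur in the data,
    i.e. candidates for conditional dead code."""
    stale = []
    for n, line in enumerate(source.splitlines(), 1):
        m = re.search(r"WHEN\s+'([^']*)'", line)
        if m and m.group(1) not in live_values:
            stale.append((n, m.group(1)))
    return stale

src = """\
           EVALUATE CLM-PLAN-TYPE
               WHEN 'PPO'  PERFORM 3100-PPO-PRICING
               WHEN 'HMO'  PERFORM 3200-HMO-PRICING
               WHEN 'IND'  PERFORM 3300-INDEMNITY-PRICING
           END-EVALUATE
"""
print(stale_when_clauses(src, LIVE_PLAN_TYPES))
```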
Detecting Dead Code
#!/bin/bash
# dead-code-detector.sh — Find unreachable paragraphs
# Assumes fixed-format source: paragraph names start in Area A (column 8)
PROGRAM=$1
echo "=== Dead Code Report for $PROGRAM ==="
# Get all paragraph names
PARAGRAPHS=$(grep "^       [0-9A-Z][0-9A-Z-]*\." "$PROGRAM" | \
  awk '{print $1}' | tr -d '.')
# Get the main paragraph (first paragraph)
MAIN_PARA=$(echo "$PARAGRAPHS" | head -1)
echo ""
echo "Unreachable Paragraphs:"
for PARA in $PARAGRAPHS; do
  if [ "$PARA" = "$MAIN_PARA" ]; then
    continue # Skip the main paragraph
  fi
  # Check if this paragraph is PERFORMed or GO TO'd anywhere.
  # The trailing pattern keeps 2200-VALIDATE from matching inside
  # a longer name like 2200-VALIDATE-EXIT.
  REFS=$(grep -cE "PERFORM .*${PARA}([^0-9A-Z-]|$)" "$PROGRAM")
  GOTO_REFS=$(grep -cE "GO TO .*${PARA}([^0-9A-Z-]|$)" "$PROGRAM")
  TOTAL=$((REFS + GOTO_REFS))
  if [ "$TOTAL" -eq 0 ]; then
    LINE=$(grep -n "^       ${PARA}\." "$PROGRAM" | \
      head -1 | cut -d: -f1)
    echo " WARNING: $PARA (line $LINE) — never called"
  fi
done
echo ""
echo "Unused Variables:"
# Check 05-level variables in WORKING-STORAGE
grep -E "^ +05 +[A-Z][A-Z0-9-]*" "$PROGRAM" | \
  awk '{print $2}' | while read -r VAR; do
    # Count whole-name references (the definition counts as one)
    REFS=$(grep -cE "(^|[^0-9A-Z-])${VAR}([^0-9A-Z-]|$)" "$PROGRAM")
    if [ "$REFS" -le 1 ]; then
      LINE=$(grep -nE "^ +05 +${VAR}([^0-9A-Z-]|$)" "$PROGRAM" | \
        head -1 | cut -d: -f1)
      echo " WARNING: $VAR (line $LINE) — defined but unused"
    fi
  done
Safe Dead Code Removal
Removing dead code requires caution. Before deleting:
1. Verify with the compiler cross-reference — do not rely solely on grep. The compiler's XREF listing is authoritative.
2. Check for PERFORM THRU ranges — a paragraph that appears unreachable may be within a PERFORM THRU range and executed implicitly.
3. Check for indirect references — the paragraph might be reached through a GO TO ... DEPENDING ON or an ALTERed GO TO (rare, but possible).
4. Test thoroughly — compile and test after every removal. Remove one paragraph or variable at a time.
5. Keep a record — document what you removed and why, in the commit message and in the change log.
* DEAD CODE REMOVED — 2026-03-10 — Maria Chen
* Paragraph 4700-SPECIAL-RECON removed.
* Per Harold Mercer: adjustment accounts from 1995
* conversion have all been closed. Parameter file
* GBANK.GLRECON.ADJPARM also deleted.
* Approval: Change Request CR-2026-0147
⚠️ The "Just In Case" Trap: Developers are often reluctant to remove dead code, reasoning "we might need it later." This is almost always wrong. Dead code is not a safety net — it is a liability. It confuses future developers, clutters search results, and inflates complexity metrics. If the code is in source control, it can always be retrieved from history if needed. Remove dead code aggressively, with proper documentation and testing.
41.21 GlobalBank: The Retirement Knowledge Transfer
When Robert announced his retirement with 18 months' notice, Maria Chen initiated a structured knowledge transfer program. Robert maintained 23 COBOL programs, including several that only he fully understood.
The Knowledge Transfer Schedule
MONTH 1-3: INVENTORY AND PRIORITY
- Robert documented all 23 programs he maintained
- Ranked by risk: HIGH (no one else understands),
MEDIUM (partial understanding), LOW (well-documented)
- 7 programs ranked HIGH, 9 MEDIUM, 7 LOW
MONTH 4-9: HIGH-RISK PROGRAMS
- Robert paired with Derek for 2 HIGH programs
- Robert paired with Jasmine for 2 HIGH programs
- Robert paired with Ananya for 3 HIGH programs
- Each pair: 2 weeks of annotated code review,
1 week of shadowed maintenance, 1 week of solo
maintenance with Robert available for questions
MONTH 10-14: MEDIUM-RISK PROGRAMS
- Same pairing approach, accelerated schedule
- 1 week annotated review, 1 week shadowed, then solo
MONTH 15-17: VALIDATION
- Each transferee independently handles a maintenance
request on their assigned programs
- Robert reviews the change without doing the work
- Any knowledge gaps identified and addressed
MONTH 18: ROBERT'S LAST MONTH
- Final documentation review
- "Robert's Rules" document: a collection of tips,
gotchas, and institutional knowledge that did not
fit anywhere else
- Farewell presentation to the team
Robert's Rules (excerpts)
ROBERT'S RULES — Things I Wish Someone Had Told Me
═══════════════════════════════════════════════════
1. GBGLREC runs after GBPOST but before GBSTMT.
If you ever need to rerun GBGLREC, you MUST also
rerun GBSTMT afterward, because GBSTMT reads the
reconciled balances, not the pre-recon balances.
2. The month-end close job (GBCLOSE) must run BEFORE
midnight on the last business day, not after.
The reason is that it uses FUNCTION CURRENT-DATE
to determine the month, and if it runs at 12:01 AM,
it thinks it is the next month.
3. Never change the SORT parameters in GBSORT01
without checking with the VALIDATE team. The sort
output format is not documented anywhere except
in the GBVALID copybook (TXNSRTCPY).
4. Paragraph 3800 in GBINTCALC has a rounding adjustment
   that adds 0.005 before truncating to 2 decimal places.
   This is intentional — it implements round-half-up
   rounding of interest amounts, per the bank's
   regulatory guidance.
5. If GBARCHIVE fails, do NOT rerun it without first
checking whether the GDG generation was created.
A partial GDG generation can cause the next night's
archive to roll off a good generation.
These five rules took Robert 30 seconds each to write. They encode decades of experience that would have taken his successors months to discover independently — if they discovered them at all.
💡 Key Insight: The most valuable knowledge in any legacy system is the "why," not the "what." The code tells you what it does. Only the people who built and maintained it can tell you why it does it that way. Capturing this knowledge before people leave is the single highest-ROI activity in legacy system management.
41.22 Chapter Summary
Legacy code archaeology is not glamorous, but it is one of the most valuable skills a COBOL developer can possess. The 220 billion lines of COBOL in production today were not all written yesterday — most of them have been running for decades, accumulating business rules, workarounds, and institutional knowledge that exists nowhere except in the code itself.
The systematic approach — external evidence first, then data division survey, then control flow tracing, then data flow analysis, then business rule extraction — turns a daunting wall of uncommented code into a manageable, understandable system. Tools help (grep, ISPF FIND, IBM ADDI), but the most important tool is a disciplined mind that asks the right questions in the right order.
And sometimes, the most important step is picking up the phone and calling Harold Mercer.
Maria Chen's advice to Derek Washington, who was staring at GLRECON for the first time: "Do not try to understand every line. Understand the story. What goes in, what comes out, and what decisions happen in between. The details will fill in as you need them."
Derek's response, after completing his first code archaeology assignment: "I feel like an anthropologist who just deciphered an ancient language. Except the language is still in production and moves three trillion dollars a day."
🔗 Looking Ahead: Chapter 42, the final chapter, looks to the future. What role will COBOL play in 2030 and beyond? How will AI change COBOL development? What does the career path look like? All five themes of this textbook converge in the final chapter.