> "Every bug in a COBOL program was put there by a person. Understanding the bug means understanding the person's mistake. That requires empathy as much as skill." — James Okafor, MedClaim team lead

Chapter 33: Debugging Strategies

It is 2:47 AM. James Okafor's phone rings. Production is down. The nightly claim processing batch job has abended with S0C7 at offset +003A2E in module CLM-ADJUD. Thousands of claims are stuck in the pipeline, and the morning processing window opens in four hours. James pulls up the dump, starts tracing through hexadecimal storage, and within twenty minutes identifies the problem: a provider record with an alphabetic character in a numeric field, introduced by a data conversion three weeks ago that nobody caught. He patches the data, restarts the job, and goes back to sleep.

This chapter teaches you the debugging skills that James Okafor uses daily — skills that separate effective mainframe developers from those who stare helplessly at dumps. You will learn to use DISPLAY statements strategically, leverage compiler debugging options, read abend dumps, use interactive debuggers, and develop the systematic thinking that turns mysterious failures into understood problems.

33.1 The Debugging Mindset

Before we discuss tools and techniques, let us establish the right mindset. Debugging is not random guessing. It is systematic hypothesis testing:

  1. Observe: What exactly happened? What abend code? What data was being processed? What was the expected behavior?
  2. Hypothesize: Based on the symptoms, what could cause this? Generate multiple hypotheses.
  3. Test: For each hypothesis, what evidence would confirm or refute it? Examine that evidence.
  4. Narrow: Eliminate hypotheses until one remains. If all are eliminated, generate new ones.
  5. Fix: Apply the smallest correct fix. Verify that it resolves the issue without introducing new ones.
  6. Prevent: Add validation, error handling, or tests to prevent recurrence.

🧪 The Human Factor: The hardest debugging is reading code someone else wrote years ago. You are not just debugging logic — you are reverse-engineering another person's intent. Comments may be wrong or missing. Variable names may be misleading. The code may have been modified multiple times by different people. Approach unfamiliar code with patience and humility.

33.2 DISPLAY Statement Debugging

The DISPLAY statement is the oldest and most universal debugging tool in COBOL. It writes text to SYSOUT (typically the job log), allowing you to trace program execution and inspect data values.

33.2.1 Strategic DISPLAY Placement

Do not scatter DISPLAYs randomly. Place them at key decision points:

      *--- Entry/exit of major paragraphs ---*
       2000-PROCESS-CLAIM.
           IF WS-DEBUG-MODE = 'Y'
               DISPLAY 'ENTERING 2000-PROCESS-CLAIM'
               DISPLAY '  CLAIM-ID: ' WS-CLAIM-ID
               DISPLAY '  CLAIM-AMT: ' WS-CLAIM-AMT
           END-IF

           PERFORM 2100-VALIDATE-CLAIM
           PERFORM 2200-ADJUDICATE-CLAIM
           PERFORM 2300-CREATE-PAYMENT

           IF WS-DEBUG-MODE = 'Y'
               DISPLAY 'EXITING 2000-PROCESS-CLAIM'
               DISPLAY '  STATUS: ' WS-CLAIM-STATUS
               DISPLAY '  APPROVED-AMT: ' WS-APPROVED-AMT
           END-IF.

33.2.2 Debug Levels

Implement a debug level system to control verbosity without recompiling:

       01  WS-DEBUG-CONTROL.
           05  WS-DEBUG-LEVEL   PIC 9(1) VALUE 0.
      *        0 = No debug output
      *        1 = Major paragraph entry/exit
      *        2 = Key decision points and data values
      *        3 = Every paragraph, detailed data
      *        9 = Full trace (extremely verbose)

       ...

      *--- Level 1: Major flow ---*
           IF WS-DEBUG-LEVEL >= 1
               DISPLAY 'DBG1: ENTERING PROCESS-CLAIM'
           END-IF

      *--- Level 2: Decisions and data ---*
           IF WS-DEBUG-LEVEL >= 2
               DISPLAY 'DBG2: PROVIDER-ID=' WS-PROVIDER-ID
               DISPLAY 'DBG2: BILLED-AMT=' WS-BILLED-AMT
               DISPLAY 'DBG2: PLAN-CODE=' WS-PLAN-CODE
           END-IF

      *--- Level 3: Detailed trace ---*
           IF WS-DEBUG-LEVEL >= 3
               DISPLAY 'DBG3: FEE-SCHED LOOKUP RESULT='
                       WS-FEE-RESULT
               DISPLAY 'DBG3: COPAY=' WS-COPAY-AMT
               DISPLAY 'DBG3: DEDUCTIBLE=' WS-DEDUCT-AMT
           END-IF
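Below is a minimal runnable sketch of the level tests above. It assumes a fixed-format compiler such as GnuCOBOL; the chapter's own programs run on z/OS. It uses 88-level condition names, an idiom the original code does not use, so that each test reads as a flag. The level is hard-coded to 2 rather than read from a PARM, and the field values are made up.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. DBGLEVEL.
      * Sketch: debug levels expressed as 88-level conditions.
      * The level is hard-coded; a real job would take it from
      * a PARM or control card. Field values are made up.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-DEBUG-CONTROL.
           05  WS-DEBUG-LEVEL    PIC 9(1) VALUE 2.
      *        Each condition is true at its level and above
               88  DBG-FLOW      VALUE 1 THRU 9.
               88  DBG-DATA      VALUE 2 THRU 9.
               88  DBG-DETAIL    VALUE 3 THRU 9.
       01  WS-PROVIDER-ID        PIC X(10) VALUE 'PROV001234'.
       01  WS-COPAY-AMT          PIC 9(5)V99 VALUE 25.00.
       01  WS-COPAY-OUT          PIC Z(4)9.99.
       PROCEDURE DIVISION.
       0000-MAIN.
           IF DBG-FLOW
               DISPLAY 'DBG1: ENTERING PROCESS-CLAIM'
           END-IF
           IF DBG-DATA
               DISPLAY 'DBG2: PROVIDER-ID=' WS-PROVIDER-ID
           END-IF
           IF DBG-DETAIL
               MOVE WS-COPAY-AMT TO WS-COPAY-OUT
               DISPLAY 'DBG3: COPAY=' WS-COPAY-OUT
           END-IF
           STOP RUN.
```

At level 2 the DBG1 and DBG2 lines print and the DBG3 line does not. Because every range runs "THRU 9", raising the level never hides the output of a lower level.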

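Read the debug level from a control card or PARM so you can change it without recompiling. A JCL PARM is not read with ACCEPT. The system passes it to the main program as a halfword length followed by the PARM text, so declare it in the LINKAGE SECTION and name it on PROCEDURE DIVISION USING. With PARM='2' on the EXEC statement, LS-PARM-LEN is 1 and LS-PARM-LEVEL is '2' (LS-PARM and its fields are illustrative names):

       LINKAGE SECTION.
       01  LS-PARM.
           05  LS-PARM-LEN      PIC S9(4) COMP.
           05  LS-PARM-LEVEL    PIC X(1).
       PROCEDURE DIVISION USING LS-PARM.
       1000-INITIALIZE.
           IF LS-PARM-LEN > 0 AND LS-PARM-LEVEL IS NUMERIC
               MOVE LS-PARM-LEVEL TO WS-DEBUG-LEVEL
           END-IF
           IF WS-DEBUG-LEVEL > 0
               DISPLAY 'DEBUG MODE ACTIVE - LEVEL ' WS-DEBUG-LEVEL
           END-IF.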

33.2.3 Displaying Hex Values

When debugging data exceptions, you often need to see the hexadecimal representation of data. Traditional COBOL has no built-in hex display (recent Enterprise COBOL releases add a HEX-OF intrinsic function), but you can build one with this technique:

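       01  WS-HEX-DISPLAY.
           05  WS-HEX-TABLE      PIC X(16)
                                 VALUE '0123456789ABCDEF'.
           05  WS-HEX-CHARS REDEFINES WS-HEX-TABLE.
               10  WS-HEX-CHAR   PIC X(1) OCCURS 16 TIMES.
           05  WS-HEX-OUTPUT     PIC X(80).
           05  WS-HEX-PTR        PIC S9(4) COMP.
           05  WS-BYTE-VAL       PIC 9(3).
           05  WS-HIGH-NIBBLE    PIC 9(2).
           05  WS-LOW-NIBBLE     PIC 9(2).
           05  WS-IDX            PIC S9(4) COMP.
      *--- Copy up to 40 suspect bytes into WS-HEX-FIELD (for
      *--- example by moving the group that holds the field)
      *--- and set WS-FIELD-LENGTH to the number of bytes
           05  WS-HEX-FIELD      PIC X(40).
           05  WS-FIELD-LENGTH   PIC S9(4) COMP.

       DISPLAY-HEX-FIELD.
      *--- Display a field's hex content ---*
           MOVE SPACES TO WS-HEX-OUTPUT
           MOVE 1 TO WS-HEX-PTR
           PERFORM VARYING WS-IDX FROM 1 BY 1
                       UNTIL WS-IDX > WS-FIELD-LENGTH
      *---     ORD is 1-based, so subtract 1 for the byte value
               COMPUTE WS-BYTE-VAL =
                   FUNCTION ORD(WS-HEX-FIELD(WS-IDX:1)) - 1
               DIVIDE WS-BYTE-VAL BY 16
                   GIVING WS-HIGH-NIBBLE
                   REMAINDER WS-LOW-NIBBLE
               STRING WS-HEX-CHAR(WS-HIGH-NIBBLE + 1)
                      WS-HEX-CHAR(WS-LOW-NIBBLE + 1)
                      DELIMITED BY SIZE
                      INTO WS-HEX-OUTPUT
                      WITH POINTER WS-HEX-PTR
               END-STRING
           END-PERFORM

           DISPLAY 'HEX: ' WS-HEX-OUTPUT.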
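Here is the hex-display technique as a complete program you can run off the mainframe. This sketch assumes GnuCOBOL or another fixed-format compiler that supports FUNCTION ORD. Each test field sits inside a group, so a group MOVE copies its bytes into the work area unchanged. The field names and values are made up for illustration.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. HEXDEMO.
      * Sketch: print the raw bytes of a field as hex digits.
      * Field names and values are hypothetical.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-HEX-TABLE          PIC X(16)
                                 VALUE '0123456789ABCDEF'.
       01  WS-HEX-CHARS REDEFINES WS-HEX-TABLE.
           05  WS-HEX-CHAR       PIC X OCCURS 16 TIMES.
       01  WS-HEX-OUTPUT         PIC X(80).
       01  WS-HEX-PTR            PIC S9(4) COMP.
       01  WS-IDX                PIC S9(4) COMP.
       01  WS-BYTE-VAL           PIC 9(3).
       01  WS-HIGH-NIBBLE        PIC 9(2).
       01  WS-LOW-NIBBLE         PIC 9(2).
       01  WS-HEX-FIELD          PIC X(40).
       01  WS-FIELD-LENGTH       PIC S9(4) COMP.
       01  WS-LABEL              PIC X(12).
      * Each test field sits in a group so a group MOVE copies
      * its bytes unchanged into WS-HEX-FIELD.
       01  WS-PACKED-GROUP.
           05  WS-APPROVED-AMT   PIC S9(5)V99 COMP-3
                                 VALUE +5000.00.
       01  WS-TEXT-GROUP.
           05  WS-COPAY-TEXT     PIC X(5) VALUE '12345'.
       PROCEDURE DIVISION.
       0000-MAIN.
           MOVE 'PACKED:' TO WS-LABEL
           MOVE WS-PACKED-GROUP TO WS-HEX-FIELD
           MOVE LENGTH OF WS-PACKED-GROUP TO WS-FIELD-LENGTH
           PERFORM 1000-DISPLAY-HEX
           MOVE 'CHARACTER:' TO WS-LABEL
           MOVE WS-TEXT-GROUP TO WS-HEX-FIELD
           MOVE LENGTH OF WS-TEXT-GROUP TO WS-FIELD-LENGTH
           PERFORM 1000-DISPLAY-HEX
           STOP RUN.
       1000-DISPLAY-HEX.
           MOVE SPACES TO WS-HEX-OUTPUT
           MOVE 1 TO WS-HEX-PTR
           PERFORM VARYING WS-IDX FROM 1 BY 1
                     UNTIL WS-IDX > WS-FIELD-LENGTH
      *        ORD is 1-based, so subtract 1 for the byte value
               COMPUTE WS-BYTE-VAL =
                   FUNCTION ORD(WS-HEX-FIELD(WS-IDX:1)) - 1
               DIVIDE WS-BYTE-VAL BY 16
                   GIVING WS-HIGH-NIBBLE
                   REMAINDER WS-LOW-NIBBLE
               STRING WS-HEX-CHAR(WS-HIGH-NIBBLE + 1)
                      WS-HEX-CHAR(WS-LOW-NIBBLE + 1)
                      DELIMITED BY SIZE
                      INTO WS-HEX-OUTPUT
                      WITH POINTER WS-HEX-PTR
               END-STRING
           END-PERFORM
      *    Show only the characters actually built
           DISPLAY WS-LABEL WS-HEX-OUTPUT(1:WS-HEX-PTR - 1).
```

The packed line prints 0500000C on any platform: the digits 0500000 followed by the positive sign nibble C. The character line depends on the code page. It prints F1F2F3F4F5 on z/OS (EBCDIC) and 3132333435 on an ASCII system.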

For production emergencies, a simpler approach is to display the raw data and examine it in the job log:

           DISPLAY 'RAW DATA: >' WS-SUSPECT-FIELD '<'

The angle brackets delimit the field so you can spot trailing spaces, low values, or other unexpected characters in the SYSOUT.

💡 Pro Tip: When adding temporary DISPLAY statements for debugging, prefix them with a unique tag like *DBG* so you can easily find and remove them later:

           DISPLAY '*DBG* BALANCE=' WS-BALANCE

After fixing the bug, search for *DBG* and remove all debug statements. Better yet, use conditional debug levels (Section 33.2.2) and leave them in permanently.

33.3 Compiler Debugging Options

The Enterprise COBOL compiler offers several options that detect bugs at compile time or runtime:

33.3.1 SSRANGE — Subscript Range Checking

SSRANGE checks that all table subscripts and indexes are within the declared bounds. Without SSRANGE, an out-of-bounds subscript silently accesses whatever memory happens to be adjacent to the table — a classic source of mysterious data corruption.

//COBOL  EXEC PGM=IGYCRCTL,
//       PARM='SSRANGE,TEST(DWARF),OPT(0)'

With SSRANGE active:

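       01  WS-TABLE.
           05  WS-ENTRY PIC X(10) OCCURS 100 TIMES.
       01  WS-SUB       PIC S9(4) COMP.

       ...

      *--- With SSRANGE, this causes a runtime error.   ---*
      *--- (A literal subscript such as WS-ENTRY(101) is ---*
      *--- normally flagged at compile time instead.)    ---*
       MOVE 101 TO WS-SUB
       MOVE 'OVERFLOW' TO WS-ENTRY(WS-SUB).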

Without SSRANGE, the MOVE silently writes past the table into whatever follows it in WORKING-STORAGE, corrupting data. With SSRANGE, the program abends with a clear diagnostic message telling you which subscript was out of range.

⚠️ Warning: SSRANGE adds runtime overhead (approximately 5-15% depending on the program). Use it during development and testing. Some shops disable it for production, though Maria Chen argues it should stay on: "The performance cost of SSRANGE is nothing compared to the cost of a production data corruption incident."
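SSRANGE reports an out-of-range subscript only after the fact. When a subscript is computed from input data, you can instead check the bounds explicitly before the table reference, so the program rejects the value and keeps running. The sketch below assumes a fixed-format compiler such as GnuCOBOL. WS-TABLE-MAX, WS-REJECT-COUNT and 1000-STORE-ENTRY are hypothetical names.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. SUBGUARD.
      * Sketch: check a computed subscript against the table
      * bounds before using it. Names are hypothetical.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-TABLE.
           05  WS-ENTRY          PIC X(10) OCCURS 100 TIMES.
       01  WS-TABLE-MAX          PIC S9(4) COMP VALUE 100.
       01  WS-SUB                PIC S9(4) COMP.
       01  WS-SUB-OUT            PIC -(4)9.
       01  WS-REJECT-COUNT       PIC 9(4) VALUE 0.
       PROCEDURE DIVISION.
       0000-MAIN.
           MOVE SPACES TO WS-TABLE
           MOVE 100 TO WS-SUB
           PERFORM 1000-STORE-ENTRY
           MOVE 101 TO WS-SUB
           PERFORM 1000-STORE-ENTRY
           DISPLAY 'ENTRY 100: ' WS-ENTRY(100)
           DISPLAY 'REJECTED: ' WS-REJECT-COUNT
           STOP RUN.
       1000-STORE-ENTRY.
      *    Reject any subscript outside 1 through WS-TABLE-MAX
           IF WS-SUB < 1 OR WS-SUB > WS-TABLE-MAX
               ADD 1 TO WS-REJECT-COUNT
               MOVE WS-SUB TO WS-SUB-OUT
               DISPLAY 'SUBSCRIPT OUT OF RANGE:' WS-SUB-OUT
           ELSE
               MOVE 'OVERFLOW' TO WS-ENTRY(WS-SUB)
           END-IF.
```

A guard like this complements SSRANGE rather than replacing it. SSRANGE still catches the references nobody thought to guard.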

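33.3.2 NUMCHECK and CHECK(ON)

The compiler option that validates numeric fields at runtime is NUMCHECK, available in newer Enterprise COBOL releases. It catches invalid data before the data reaches a decimal instruction and causes an S0C7 abend. Do not confuse it with CHECK(ON), which is a Language Environment runtime option that enables the range checks compiled in by SSRANGE:

//COBOL  EXEC PGM=IGYCRCTL,
//       PARM='NUMCHECK(ZON,PAC)'

NUMCHECK catches non-numeric data in zoned-decimal (ZON) and packed-decimal (PAC) fields, which is the classic S0C7 cause. It does not catch arithmetic overflow or division by zero. For those, code ON SIZE ERROR on the arithmetic statements that need the checks.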

33.3.3 TEST — Debug Symbol Generation

The TEST option generates debugging information that interactive debuggers use to map machine code back to COBOL source lines:

//COBOL  EXEC PGM=IGYCRCTL,
//       PARM='TEST(DWARF,SOURCE,SEPARATE)'

TEST options:

  • DWARF: Generate DWARF debug information (modern standard)
  • SOURCE: Include source statement information
  • SEPARATE: Store debug info in a separate file (reduces load module size)

33.3.4 LIST and MAP — Compiler Listings

While not runtime options, LIST and MAP generate essential debugging artifacts:

  • LIST: Produces an assembler listing showing the generated machine code, critical for mapping dump offsets to COBOL statements
  • MAP: Shows the storage layout (where every variable lives in memory), essential for reading dumps

//COBOL  EXEC PGM=IGYCRCTL,
//       PARM='LIST,MAP,OFFSET,XREF'

MAP output (partial example):

 DATA                     BASE   DISPL  DEFINITION
 DIVISION                 LOC
 -------                  ----   -----  ----------
 WS-CLAIM-ID             BL=01   000    PIC X(15)
 WS-CLAIM-AMT            BL=01   00F    PIC S9(9)V99 COMP-3
 WS-CLAIM-STATUS         BL=01   015    PIC X(2)
 WS-PROVIDER-ID          BL=01   017    PIC X(10)

The DISPL (displacement) column tells you exactly where each variable lives relative to the base register. When a dump shows data at a specific offset, the MAP listing lets you identify which variable it is.

33.3.5 OPTIMIZE vs. Debugging

Optimization and debugging are at odds. The compiler's optimizer reorders instructions, eliminates dead code, and consolidates calculations — all of which make it harder to map dump offsets to source lines.

  • Development/Test: OPT(0),TEST(DWARF,SOURCE),SSRANGE
  • Production (normal): OPT(2),NOTEST,NOSSRANGE
  • Production (debugging): OPT(0),TEST(DWARF,SOURCE),LIST,MAP

📊 Compiler Options Quick Reference:

Option    Purpose                    Overhead           When to Use
--------  -------------------------  -----------------  ----------------------------------------
SSRANGE   Subscript bounds checking  5-15%              Development, test, ideally production
NUMCHECK  Numeric validation         10-20%             Development, test
TEST      Debug symbols              Compile-time only  Whenever interactive debugging is needed
LIST      Assembler listing          Compile-time only  Always (for dump analysis)
MAP       Storage map                Compile-time only  Always (for dump analysis)
OFFSET    Statement offsets          Compile-time only  Always (for dump analysis)

33.4 READY TRACE and RESET TRACE

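Some COBOL compilers provide built-in tracing through the READY TRACE and RESET TRACE statements. These statements date from IBM's older OS/VS COBOL compiler and are still accepted by compilers such as GnuCOBOL and Micro Focus COBOL. Enterprise COBOL does not support them, so on z/OS use conditional DISPLAYs (Section 33.2) or an interactive debugger (Section 33.6) instead. Where the statements are supported, they bracket the code to trace: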

       READY TRACE.
      *--- From here, COBOL traces every paragraph entered ---*
       PERFORM 2000-PROCESS-RECORD.
       PERFORM 3000-UPDATE-DATABASE.

       RESET TRACE.
      *--- Tracing stops ---*

READY TRACE causes the runtime to display the name of every paragraph or section as it is entered. This produces voluminous output but is invaluable when you need to understand the actual execution path.

Sample output:

READY TRACE
 2000-PROCESS-RECORD
 2100-VALIDATE-INPUT
 2200-LOOKUP-PROVIDER
 2210-CHECK-CREDENTIALS
 2200-LOOKUP-PROVIDER       (returned)
 2300-CALCULATE-PAYMENT
 2310-APPLY-DEDUCTIBLE
 2320-APPLY-COPAY
 2300-CALCULATE-PAYMENT     (returned)
 2000-PROCESS-RECORD        (returned)
 3000-UPDATE-DATABASE

⚠️ Warning: READY TRACE generates enormous amounts of output for programs that execute millions of PERFORMs. Use it selectively — bracket it around the suspected problem area, and if possible, add a counter to activate it only for specific records:

           IF WS-RECORD-COUNT = WS-PROBLEM-RECORD-NUM
               READY TRACE
           END-IF

           PERFORM 2000-PROCESS-RECORD

           IF WS-RECORD-COUNT = WS-PROBLEM-RECORD-NUM
               RESET TRACE
           END-IF.

33.5 Abend Analysis — Reading Dumps

When a COBOL program terminates abnormally (abends), the system produces a dump — a snapshot of the program's memory at the moment of failure. Reading dumps is the most important debugging skill for mainframe developers.

33.5.1 Common Abend Codes

Every mainframe developer should have these memorized:

Code   Name                        Usual Cause
-----  --------------------------  ----------------------------------------------------------------------
S0C1   Operation Exception         Branch to non-executable storage (bad CALL, corrupted branch address)
S0C4   Protection Exception        Accessing storage outside your address space (wild pointer, unallocated memory)
S0C7   Data Exception              Non-numeric data in a packed decimal field (the #1 COBOL abend)
S0CB   Decimal Divide Exception    Division by zero or quotient too large
S322   Time Limit Exceeded         Job ran too long; likely an infinite loop
S806   Module Not Found            CALL to a program that does not exist in the load library
S80A   Not Enough Virtual Storage  REGION too small or storage leak
S0C5   Addressing Exception        Similar to S0C4: accessing an invalid address
S013   Conflicting DCB             File attributes don't match between JCL and program
S001   I/O Error                   Physical read/write error on a dataset

33.5.2 The Anatomy of S0C7

S0C7 (data exception) deserves special attention because it is by far the most common COBOL abend. It occurs when the hardware attempts a decimal arithmetic or comparison instruction on a field that does not contain valid packed decimal data.

What causes S0C7:

  • Uninitialized COMP-3 (packed decimal) fields
  • Alphabetic data moved to a numeric field
  • File records with corrupted data
  • Reading past end-of-file
  • Uninitialized group-level moves that overlay numeric fields
  • Incorrect REDEFINES over numeric fields

How to diagnose S0C7:

Step 1: Find the failing instruction's offset from the abend message:

IEA995I SYMPTOM DUMP:
SYSTEM COMPLETION CODE=0C7  REASON CODE=00000000
  PSW AT TIME OF ERROR  078D1000  A003A2E0
  ILC 06  INTC 07
  ACTIVE LOAD MODULE        CLM-ADJUD
  OFFSET IN LOAD MODULE     0003A2E0

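The PSW address points to the instruction after the one that failed. ILC 06 says the failing instruction is 6 bytes long, so subtract 6 from the offset: X'3A2E0' - 6 = X'3A2DA'.

Step 2: Find that offset in the compiler listing (compiled with LIST and OFFSET):

03A2DA  AP    APPROVED-AMT(6,WS-BASE+180),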
              COPAY-AMT(5,WS-BASE+186)

Step 3: The AP (Add Packed) instruction is failing. One of the two operands — APPROVED-AMT or COPAY-AMT — contains non-packed-decimal data.

Step 4: Find these fields in the dump. The MAP listing tells you their displacement from the base register. Look at the hex data:

WS-BASE+180:  00 00 05 00 00 0C  (valid packed: +5000.00)
WS-BASE+186:  F1 F2 F3 F4 F5     (INVALID: this is EBCDIC "12345", not packed)

The COPAY-AMT field contains EBCDIC character data instead of packed decimal. Now you trace backward to find how that field got its value.
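The rule the decimal instructions enforce is simple enough to check by hand or in code. Every nibble except the last must be a digit 0-9, and the last nibble must be a sign code A-F. C and F are the usual positive signs and D is negative. The sketch below applies the rule to the two byte strings from the dump above. It assumes a fixed-format compiler such as GnuCOBOL, and the program and field names are hypothetical.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. PACKCHK.
      * Sketch: apply the packed-decimal validity rule to raw
      * bytes. All nibbles but the last must be 0-9; the last
      * nibble must be a sign code A-F. Names are hypothetical.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-GOOD-BYTES         PIC X(6) VALUE X'00000500000C'.
       01  WS-BAD-BYTES          PIC X(5) VALUE X'F1F2F3F4F5'.
       01  WS-CHECK-FIELD        PIC X(16).
       01  WS-CHECK-LEN          PIC S9(4) COMP.
       01  WS-IDX                PIC S9(4) COMP.
       01  WS-BAD-POS            PIC S9(4) COMP.
       01  WS-BAD-POS-OUT        PIC Z9.
       01  WS-BYTE-VAL           PIC 9(3).
       01  WS-HIGH-NIBBLE        PIC 9(2).
       01  WS-LOW-NIBBLE         PIC 9(2).
       PROCEDURE DIVISION.
       0000-MAIN.
           MOVE WS-GOOD-BYTES TO WS-CHECK-FIELD
           MOVE LENGTH OF WS-GOOD-BYTES TO WS-CHECK-LEN
           PERFORM 1000-CHECK-PACKED
           MOVE WS-BAD-BYTES TO WS-CHECK-FIELD
           MOVE LENGTH OF WS-BAD-BYTES TO WS-CHECK-LEN
           PERFORM 1000-CHECK-PACKED
           STOP RUN.
       1000-CHECK-PACKED.
           MOVE 0 TO WS-BAD-POS
           PERFORM VARYING WS-IDX FROM 1 BY 1
                     UNTIL WS-IDX > WS-CHECK-LEN
                        OR WS-BAD-POS > 0
      *        Split the byte into its two nibbles (0-15 each)
               COMPUTE WS-BYTE-VAL =
                   FUNCTION ORD(WS-CHECK-FIELD(WS-IDX:1)) - 1
               DIVIDE WS-BYTE-VAL BY 16
                   GIVING WS-HIGH-NIBBLE
                   REMAINDER WS-LOW-NIBBLE
      *        The high nibble is always a digit position
               IF WS-HIGH-NIBBLE > 9
                   MOVE WS-IDX TO WS-BAD-POS
               END-IF
               IF WS-IDX < WS-CHECK-LEN
      *            Inner byte: low nibble must also be a digit
                   IF WS-LOW-NIBBLE > 9
                       MOVE WS-IDX TO WS-BAD-POS
                   END-IF
               ELSE
      *            Last byte: low nibble must be a sign (A-F)
                   IF WS-LOW-NIBBLE < 10
                       MOVE WS-IDX TO WS-BAD-POS
                   END-IF
               END-IF
           END-PERFORM
           IF WS-BAD-POS = 0
               DISPLAY 'VALID PACKED DECIMAL'
           ELSE
               MOVE WS-BAD-POS TO WS-BAD-POS-OUT
               DISPLAY 'INVALID PACKED DECIMAL AT BYTE' WS-BAD-POS-OUT
           END-IF.
```

The second string fails at its very first byte. In X'F1' the high nibble F is not a digit.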

33.5.3 Reading a Dump — Step by Step

Here is a systematic approach to reading dumps:

Step 1: Identify the abend code and module.

COMPLETION CODE - SYSTEM=0C7  USER=0000
MODULE - CLM-ADJUD  OFFSET - 0003A2E0

Step 2: Get the compiler listing for the module. You need the listing from the exact compile that produced the load module in production.

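Step 3: Map the offset to a COBOL statement. The offset in the abend message is taken from the PSW, which points just past the failing instruction. First subtract the instruction length (the ILC, 6 bytes for an AP instruction): X'3A2E0' - 6 = X'3A2DA'. The OFFSET compiler option produces a table mapping offsets to source line numbers:

LINE #   OFFSET
001250   03A2D2
001251   03A2DA    <--- Our failing instruction
001252   03A2E0    <--- The PSW points here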

Line 1251 is:

001251     ADD WS-COPAY-AMT TO WS-APPROVED-AMT.

Step 4: Find the operands in the dump. The MAP listing gives you displacements:

WS-COPAY-AMT      BL=01  186   PIC S9(7)V99 COMP-3
WS-APPROVED-AMT   BL=01  180   PIC S9(9)V99 COMP-3

Step 5: Find the address held in base locator BL=01 (a base locator cell, not general register 1). The register and base locator display shows:

REGS AT ENTRY TO ABENDING MODULE:
R13=00A34000 (save area)
...
BL=01 address = 00B5C000

Step 6: Read the data at that address.

00B5C000+180:  00 00 50 00 00 0C  (WS-APPROVED-AMT = +50000.00)
00B5C000+186:  F1 F2 F3 F4 F5     (WS-COPAY-AMT = INVALID)

Step 7: Diagnose. WS-COPAY-AMT contains X'F1F2F3F4F5' which is EBCDIC "12345". Someone moved display-format data into a COMP-3 field.

33.5.4 CEEDUMP and Language Environment Debugging

IBM's Language Environment (LE) provides enhanced debugging through CEEDUMP. When an LE-managed program abends, CEEDUMP produces a formatted dump that includes:

  • Traceback of active routines (call stack)
  • Condition information (what caused the abend)
  • Variable values at the point of failure
  • Argument values for each routine in the traceback

CEEDUMP output:

CEE3DMP V2 R5.0: Condition processing resulted in the
unhandled condition.
 Information for enclave CLM-ADJUD

  Condition Information for Active Routines
    Condition Information for  (DSA address 2098B528)
      CIB Address: 2098B3E0
      Current Condition:
        CEE3207S The system detected a data exception
                 (System Completion Code=0C7).
      Location:
        Program Unit:  CLM-ADJUD
        Entry:         CLM-ADJUD
        Statement:     1251
        Offset:        +0003A2E0

  Traceback:
    DSA    Entry      Statement  Offset
    1      CLM-ADJUD  1251       +0003A2E0
    2      CLM-MAIN   450        +00001A8C
    3      CEEMAIN    ---        +00000166

  Variables for Active Routine CLM-ADJUD:
    WS-CLAIM-ID          = "CLM000045892   "
    WS-APPROVED-AMT      = +50000.00
    WS-COPAY-AMT         = (INVALID DATA)
    WS-PROVIDER-ID       = "PROV001234"

💡 Key Advantage: CEEDUMP shows you the COBOL statement number and variable values without needing to manually map offsets and read hex. This dramatically reduces debugging time.

To enable CEEDUMP, add this to your JCL:

//CEEOPTS DD *
  TRAP(ON,SPIE)
  TERMTHDACT(UADUMP)
/*
//CEEDUMP DD SYSOUT=*

33.6 Interactive Debugging Tools

33.6.1 IBM Debug Tool (z/OS Debugger)

IBM Debug Tool (now called IBM z/OS Debugger) allows you to set breakpoints, step through COBOL code, inspect variables, and modify values at runtime — the mainframe equivalent of GDB or Visual Studio's debugger.

Key capabilities:

  • Set breakpoints at COBOL paragraph names or statement numbers
  • Step through code line by line
  • Display and modify variable values
  • Set conditional breakpoints (break only when a condition is true)
  • Monitor variables for changes
  • Examine call stacks

Starting a debug session in batch:

//TEST   EXEC PGM=EQANMDBG,PARM='TEST(,,,DBMDT*)'
//INSPLOG DD SYSOUT=*
//INSPPREF DD DUMMY

Common debug commands:

AT LABEL 2000-PROCESS-CLAIM   (breakpoint at paragraph)
AT LINE 1251                  (breakpoint at line number)
AT LINE 1251 WHEN WS-CLAIM-ID = 'CLM000045892'
                              (conditional breakpoint)
GO                            (run until next breakpoint)
STEP                          (execute one statement)
LIST WS-CLAIM-ID              (display variable value)
LIST %HEX(WS-APPROVED-AMT)    (display in hex)
LIST (WS-TABLE(1:5))          (display first 5 table entries)
MOVE 3 TO WS-DEBUG-LEVEL      (modify a variable)
QUERY LOCATION                (show current position)
LIST CALLS                    (show call stack)

33.6.2 Compuware Xpediter

Xpediter is a popular third-party interactive debugger used at many mainframe shops. It provides similar capabilities to IBM Debug Tool with a different interface:

Key Xpediter features:

  • Source-level debugging with COBOL source display
  • Before/after data inspection
  • Automatic data formatting (displays COMP-3 fields in decimal)
  • File I/O interception: see exactly what records are being read/written
  • Abend-Aid integration: enhanced abend analysis

Xpediter workflow:

  1. Compile with the TEST option
  2. Start an Xpediter session (ISPF option or JCL PARM)
  3. Set breakpoints by typing 'B' next to lines on the source display
  4. Run the program; execution stops at breakpoints
  5. Inspect variables in the working storage display panel
  6. Step through code, watching values change

At GlobalBank, Derek Washington prefers Xpediter because of its visual interface: "I can see the source code, the data values, and the execution flow all on one screen. It's like having X-ray vision into the program."

33.6.3 Choosing the Right Debugger

Feature           IBM Debug Tool       Xpediter
----------------  -------------------  --------------------
Cost              Included with z/OS   Separate license
Interface         Commands or Eclipse  ISPF panels
COBOL support     Excellent            Excellent
CICS debugging    Yes                  Yes
DB2 debugging     Yes                  Yes
Batch debugging   Yes                  Yes
Remote debugging  Via Eclipse          Via ISPF
Abend analysis    Basic                Advanced (Abend-Aid)

33.7 Debugging CICS Programs

CICS programs present unique debugging challenges because they run inside the CICS region, not as standalone batch jobs.

33.7.1 CEDF — CICS Execution Diagnostic Facility

CEDF is CICS's built-in debugger. It intercepts every EXEC CICS command and lets you see the before/after values:

To start CEDF: type CEDF on a CICS terminal, then start your transaction.

TRANSACTION: XINQ  PROGRAM: CUSTINQ1
ABOUT TO EXECUTE COMMAND:
  EXEC CICS READ
    FILE     ('CUSTFILE')
    INTO     (X'00A340A0')
    LENGTH   (250)
    RIDFLD   (X'C3F0F0F0F0F1F0F0F0F1')
    RESP     (0)
    RESP2    (0)

PRESS ENTER TO CONTINUE

After the command executes:

COMMAND EXECUTION COMPLETE
  EXEC CICS READ
    FILE     ('CUSTFILE')
    INTO     (X'00A340A0')
    LENGTH   (250)
    RIDFLD   (X'C3F0F0F0F0F1F0F0F0F1')
    RESP     (NORMAL)
    RESP2    (0)

RESPONSE: NORMAL

PRESS ENTER TO CONTINUE

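33.7.2 Source-Level Debugging Under CICS

EDF (the Execution Diagnostic Facility, which the CEDF transaction starts) stops only at EXEC CICS commands. To see the COBOL statements that run between commands, run the transaction under a source-level debugger with CICS support, such as IBM z/OS Debugger or Xpediter (see the comparison in Section 33.6.3). A session stepping through the code looks like this: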

STATEMENT 1250:
    MOVE WS-CUST-ID TO RIDFLD-CUST-KEY

ABOUT TO EXECUTE COMMAND:
    EXEC CICS READ FILE('CUSTFILE') ...

STATEMENT 1255:
    IF WS-CICS-RESP = DFHRESP(NORMAL)

STATEMENT 1257:
    PERFORM 2200-DISPLAY-CUSTOMER

33.7.3 CICS Debugging Best Practices

  1. Always check RESP codes: The equivalent of checking SQLCODE in DB2.
       EXEC CICS READ
           FILE('CUSTFILE')
           INTO(WS-CUST-REC)
           RIDFLD(WS-CUST-KEY)
           RESP(WS-RESP)
           RESP2(WS-RESP2)
       END-EXEC

       IF WS-RESP NOT = DFHRESP(NORMAL)
           MOVE WS-RESP  TO WS-DIAG-RESP
           MOVE WS-RESP2 TO WS-DIAG-RESP2
           EXEC CICS WRITEQ TD
               QUEUE('CSMT')
               FROM(WS-DIAG-MSG)
               LENGTH(LENGTH OF WS-DIAG-MSG)
           END-EXEC
       END-IF.
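  2. Write to CSMT for diagnostics: The CSMT transient data queue goes to the CICS message log, which is visible to system programmers.

  3. Use auxiliary trace: CICS trace captures detailed execution flow. Auxiliary trace is normally controlled with the CETR transaction. A suitably authorized program can also set the trace flag for its own task with a system programming command:

EXEC CICS SET TRACEFLAG SINGLEON END-EXEC.
  4. Avoid HANDLE CONDITION in new code: Use RESP/RESP2 instead. HANDLE CONDITION masks errors and makes debugging harder.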

33.8 Debugging Embedded SQL

33.8.1 SQLCODE Checking

After every EXEC SQL statement, check SQLCODE:

       01  WS-SQL-DIAG.
           05  WS-DIAG-SQLCODE  PIC S9(9) COMP.
           05  WS-DIAG-SQLERRM  PIC X(70).
           05  WS-DIAG-STMT     PIC X(30).

       EXEC SQL
           SELECT CUST_NAME, CUST_STATUS
           INTO :WS-CUST-NAME, :WS-CUST-STATUS
           FROM CUSTOMER
           WHERE CUST_ID = :WS-CUST-ID
       END-EXEC.

       IF SQLCODE NOT = 0 AND SQLCODE NOT = +100
           MOVE SQLCODE TO WS-DIAG-SQLCODE
           MOVE SQLERRMC TO WS-DIAG-SQLERRM
           MOVE 'CUSTOMER SELECT' TO WS-DIAG-STMT
           DISPLAY 'SQL ERROR IN ' WS-DIAG-STMT
           DISPLAY '  SQLCODE: ' WS-DIAG-SQLCODE
           DISPLAY '  SQLERRM: ' WS-DIAG-SQLERRM
       END-IF.

33.8.2 Common SQLCODEs

SQLCODE  Meaning                    Debugging Action
-------  -------------------------  ----------------------------------------------------------------
0        Success                    None
+100     Not found / end of cursor  Verify WHERE clause, check data exists
-180     Invalid date/time          Check date format (YYYY-MM-DD for DB2)
-204     Object not found           Table/view name or schema qualifier wrong (a missing GRANT gives -551)
-206     Column not in table        Typo in column name
-305     Null indicator needed      Column allows NULLs; add indicator variable
-530     Foreign key violation      Parent row doesn't exist
-803     Duplicate key              Unique index conflict
-811     Multiple rows returned     SELECT INTO returned more than one row
-904     Resource unavailable       Database or tablespace stopped/locked
-911     Deadlock/timeout           Retry the transaction
-922     Authorization failure      Missing GRANT

33.8.3 DSNTEP2 and SPUFI

DSNTEP2 is DB2's batch SQL processor. Use it to test SQL statements outside your COBOL program:

//DSNTEP2 EXEC PGM=IKJEFT01,DYNAMNBR=20
//SYSTSPRT DD SYSOUT=*
//SYSTSIN  DD *
  DSN SYSTEM(DB2P)
  RUN PROGRAM(DSNTEP2) PLAN(DSNTEP21)
  END
/*
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  SELECT CUST_NAME, CUST_STATUS
  FROM CUSTOMER
  WHERE CUST_ID = 'C000010001';
/*

SPUFI (SQL Processor Using File Input) is the interactive ISPF-based SQL tool. It provides the same capability through an online panel.

Both tools are invaluable for:

  • Testing SQL syntax before embedding it in COBOL
  • Verifying that your WHERE clause returns the expected data
  • Examining the actual data in tables to understand why your program behaves unexpectedly

Try It Yourself: Take a program that accesses DB2 and add comprehensive SQLCODE checking after every EXEC SQL statement. Create a diagnostic paragraph that logs SQLCODE, SQLERRMC, and the statement identifier to SYSOUT. Test with a deliberately wrong table name to verify your error handling works.

33.9 Common COBOL Bugs and Their Symptoms

Experience teaches patterns. Here are the bugs that James Okafor sees most often at MedClaim:

33.9.1 Uninitialized Fields

Symptom: S0C7 on first calculation, or incorrect results that change with every run. Cause: WORKING-STORAGE fields not initialized with VALUE clauses or explicit MOVEs.

      *--- BUG: WS-TOTAL never initialized ---*
       01  WS-TOTAL  PIC S9(9)V99 COMP-3.
       ...
       ADD WS-AMOUNT TO WS-TOTAL.    *> S0C7 if uninit

      *--- FIX: Initialize in declaration or in code ---*
       01  WS-TOTAL  PIC S9(9)V99 COMP-3 VALUE ZEROS.
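You can reproduce the symptom without a mainframe by overlaying a COMP-3 field on bytes that hold low-values (X'00'), a common leftover in storage that nothing has initialized. This sketch assumes a fixed-format compiler such as GnuCOBOL. The REDEFINES exists only to simulate the garbage, and the names are hypothetical.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. INITDEMO.
      * Sketch: simulate an uninitialized COMP-3 total by
      * overlaying it on low-values. Names are hypothetical.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-TOTAL-AREA         PIC X(6) VALUE LOW-VALUES.
       01  WS-TOTAL REDEFINES WS-TOTAL-AREA
                                 PIC S9(9)V99 COMP-3.
       01  WS-AMOUNT             PIC S9(7)V99 COMP-3
                                 VALUE +125.50.
       01  WS-TOTAL-OUT          PIC Z(8)9.99.
       PROCEDURE DIVISION.
       0000-MAIN.
      *    X'000000000000' ends in nibble 0, which is not a
      *    sign code, so the field is not valid packed data
           IF WS-TOTAL IS NUMERIC
               DISPLAY 'BEFORE INIT: VALID'
           ELSE
               DISPLAY 'BEFORE INIT: NOT NUMERIC'
           END-IF
      *    The fix: give the field a known value before use
           MOVE ZERO TO WS-TOTAL
           ADD WS-AMOUNT TO WS-TOTAL
           MOVE WS-TOTAL TO WS-TOTAL-OUT
           DISPLAY 'AFTER INIT: TOTAL=' WS-TOTAL-OUT
           STOP RUN.
```

On z/OS, an ADD against the field before the MOVE ZERO would raise the S0C7. The IS NUMERIC test reports the same problem without abending.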

33.9.2 Off-by-One in Table Processing

Symptom: Last record not processed, or S0C4/S0C7 from processing one record past the end. Cause: Loop boundary error.

      *--- BUG: Processes one too many ---*
       PERFORM VARYING WS-IDX FROM 1 BY 1
                   UNTIL WS-IDX > 101
           MOVE WS-TABLE-ENTRY(WS-IDX) TO ...
       END-PERFORM.

      *--- WS-TABLE only has 100 entries! ---*

      *--- FIX: Use the actual count, capped at the table size ---*
       PERFORM VARYING WS-IDX FROM 1 BY 1
                   UNTIL WS-IDX > WS-TABLE-COUNT
                      OR WS-IDX > 100
           MOVE WS-TABLE-ENTRY(WS-IDX) TO ...
       END-PERFORM.
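Here is a runnable version of the fixed loop. It assumes a fixed-format compiler such as GnuCOBOL. The table size of 5, the count of 3 loaded entries, and the entry values are made up.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. LOOPBND.
      * Sketch: loop over only the entries actually loaded,
      * and never past the OCCURS limit. Sizes are made up.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-TABLE-MAX          PIC S9(4) COMP VALUE 5.
       01  WS-TABLE-COUNT        PIC S9(4) COMP VALUE 3.
       01  WS-TABLE.
           05  WS-TABLE-ENTRY    PIC X(4) OCCURS 5 TIMES.
       01  WS-IDX                PIC S9(4) COMP.
       01  WS-PROCESSED          PIC 9(2) VALUE 0.
       PROCEDURE DIVISION.
       0000-MAIN.
           MOVE 'AAA1' TO WS-TABLE-ENTRY(1)
           MOVE 'BBB2' TO WS-TABLE-ENTRY(2)
           MOVE 'CCC3' TO WS-TABLE-ENTRY(3)
      *    Stop at the loaded count, and never pass the table
           PERFORM VARYING WS-IDX FROM 1 BY 1
                     UNTIL WS-IDX > WS-TABLE-COUNT
                        OR WS-IDX > WS-TABLE-MAX
               DISPLAY 'ENTRY: ' WS-TABLE-ENTRY(WS-IDX)
               ADD 1 TO WS-PROCESSED
           END-PERFORM
           DISPLAY 'PROCESSED: ' WS-PROCESSED
           STOP RUN.
```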

33.9.3 MOVE with Truncation

Symptom: Data silently truncated, incorrect values in output. Cause: Moving a larger field to a smaller one without checking.

       01  WS-FULL-NAME    PIC X(50).
       01  WS-SHORT-NAME   PIC X(20).

      *--- BUG: Silently truncates names longer than 20 ---*
       MOVE WS-FULL-NAME TO WS-SHORT-NAME.

      *--- Detection: compare the trimmed length first ---*
       IF FUNCTION LENGTH(
           FUNCTION TRIM(WS-FULL-NAME TRAILING)) > 20
           DISPLAY 'WARNING: NAME TRUNCATED: '
                   WS-FULL-NAME
       END-IF.
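Here is the truncation check as a complete program. It assumes a fixed-format compiler such as GnuCOBOL, and the names and sample data are made up. The check runs before the MOVE, so the warning carries the full name that is about to lose characters.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. TRUNCCHK.
      * Sketch: warn before an alphanumeric MOVE silently drops
      * characters. Names and sample data are made up.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-FULL-NAME          PIC X(50).
       01  WS-SHORT-NAME         PIC X(20).
       01  WS-USED-LEN           PIC S9(4) COMP.
       PROCEDURE DIVISION.
       0000-MAIN.
           MOVE 'ANN LEE' TO WS-FULL-NAME
           PERFORM 1000-MOVE-NAME
           MOVE 'JONATHAN ALEXANDER WORTHINGTON-SMYTHE'
             TO WS-FULL-NAME
           PERFORM 1000-MOVE-NAME
           STOP RUN.
       1000-MOVE-NAME.
      *    Length of the name without its trailing spaces
           COMPUTE WS-USED-LEN = FUNCTION LENGTH(
               FUNCTION TRIM(WS-FULL-NAME TRAILING))
           IF WS-USED-LEN > LENGTH OF WS-SHORT-NAME
               DISPLAY 'WARNING: NAME TRUNCATED: '
                       FUNCTION TRIM(WS-FULL-NAME TRAILING)
           END-IF
           MOVE WS-FULL-NAME TO WS-SHORT-NAME
           DISPLAY 'SHORT NAME: [' WS-SHORT-NAME ']'.
```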

33.9.4 Numeric Field Mismatch

Symptom: S0C7 or incorrect arithmetic results. Cause: Moving data between incompatible numeric types.

      *--- BUG: Moving display numeric to COMP-3 via group move ---*
       01  WS-INPUT-REC.
           05  IN-AMOUNT    PIC 9(7)V99.     *> Display numeric

       01  WS-WORK-REC.
           05  WK-AMOUNT    PIC S9(7)V99 COMP-3. *> Packed

      *--- This is a GROUP move — no conversion! ---*
       MOVE WS-INPUT-REC TO WS-WORK-REC.    *> BUG!

      *--- FIX: Move at the elementary level ---*
       MOVE IN-AMOUNT TO WK-AMOUNT.          *> Proper conversion

🔴 Critical Concept: Group moves are character moves — they copy bytes without any numeric conversion. Elementary moves between numeric fields perform the appropriate conversion (display to packed, packed to binary, etc.). This distinction is the #1 source of S0C7 abends in COBOL programs.
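An IS NUMERIC test makes the difference between the two moves easy to see. This sketch assumes a fixed-format compiler such as GnuCOBOL. It reuses the record layouts above with a made-up value.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. GRPMOVE.
      * Sketch: group MOVE (bytes copied as-is) versus
      * elementary MOVE (display converted to packed).
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-INPUT-REC.
           05  IN-AMOUNT         PIC 9(7)V99 VALUE 12345.67.
       01  WS-WORK-REC.
           05  WK-AMOUNT         PIC S9(7)V99 COMP-3.
       01  WS-AMOUNT-OUT         PIC Z(6)9.99.
       PROCEDURE DIVISION.
       0000-MAIN.
      *    Group move: the first 5 display bytes ("00123") land
      *    in the 5-byte packed field unconverted
           MOVE WS-INPUT-REC TO WS-WORK-REC
           IF WK-AMOUNT IS NUMERIC
               DISPLAY 'GROUP MOVE: VALID PACKED'
           ELSE
               DISPLAY 'GROUP MOVE: INVALID PACKED'
           END-IF
      *    Elementary move: the compiler converts the value
           MOVE IN-AMOUNT TO WK-AMOUNT
           IF WK-AMOUNT IS NUMERIC
               MOVE WK-AMOUNT TO WS-AMOUNT-OUT
               DISPLAY 'ELEMENTARY MOVE: ' WS-AMOUNT-OUT
           ELSE
               DISPLAY 'ELEMENTARY MOVE: INVALID PACKED'
           END-IF
           STOP RUN.
```

The group move fails the test on both ASCII and EBCDIC platforms. The last byte copied is a character digit, and its low nibble (3) is not a sign code.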

33.9.5 Missing Period in EVALUATE/IF

Symptom: Wrong code path executed, unexpected fall-through. Cause: In older-style COBOL without explicit scope terminators, a missing period changes the meaning of the code.

      *--- BUG (old style): Missing period ---*
       IF WS-STATUS = 'A'
           PERFORM PROCESS-ACTIVE
       IF WS-STATUS = 'I'
           PERFORM PROCESS-INACTIVE.

      *--- The second IF is INSIDE the first IF's
      *--- true branch! PROCESS-INACTIVE only runs
      *--- when status is 'A' AND 'I' (never).

      *--- FIX: Use END-IF scope terminators ---*
       IF WS-STATUS = 'A'
           PERFORM PROCESS-ACTIVE
       END-IF
       IF WS-STATUS = 'I'
           PERFORM PROCESS-INACTIVE
       END-IF.

33.9.6 File Status Not Checked

Symptom: Processing wrong data, infinite loop, mysterious S0C7. Cause: Continuing to process after a failed file operation.

      *--- BUG: No status check after READ ---*
       READ CUSTOMER-FILE INTO WS-CUST-REC.
       PERFORM PROCESS-CUSTOMER.
      *--- If READ failed, WS-CUST-REC is stale! ---*

      *--- FIX: Check file status ---*
       READ CUSTOMER-FILE INTO WS-CUST-REC
       EVALUATE WS-FILE-STATUS
           WHEN '00'
               PERFORM PROCESS-CUSTOMER
           WHEN '10'
               SET END-OF-FILE TO TRUE
           WHEN OTHER
               DISPLAY 'READ ERROR: ' WS-FILE-STATUS
               PERFORM ABEND-HANDLER
       END-EVALUATE.
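The status-checking pattern can also be tried as a self-contained program. This sketch assumes GnuCOBOL, where ASSIGN can name a file directly; on z/OS the ASSIGN would name a DD statement. It writes a two-record scratch file called custdemo.dat in the working directory (a made-up name) and reads it back. The READ carries AT END CONTINUE so that end-of-file is handled explicitly, and the EVALUATE on the status code drives the logic.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. FSTATDEM.
      * Sketch: check FILE STATUS after OPEN and READ. Writes a
      * two-record scratch file, then reads it back.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT CUSTOMER-FILE ASSIGN TO 'custdemo.dat'
               ORGANIZATION IS SEQUENTIAL
               FILE STATUS IS WS-FILE-STATUS.
       DATA DIVISION.
       FILE SECTION.
       FD  CUSTOMER-FILE.
       01  CUST-RECORD           PIC X(10).
       WORKING-STORAGE SECTION.
       01  WS-FILE-STATUS        PIC XX.
       01  WS-CUST-REC           PIC X(10).
       01  WS-EOF-FLAG           PIC X VALUE 'N'.
           88  END-OF-FILE       VALUE 'Y'.
       01  WS-READ-COUNT         PIC 9(3) VALUE 0.
       PROCEDURE DIVISION.
       0000-MAIN.
           OPEN OUTPUT CUSTOMER-FILE
           IF WS-FILE-STATUS NOT = '00'
               DISPLAY 'OPEN OUTPUT FAILED: ' WS-FILE-STATUS
               STOP RUN
           END-IF
           MOVE 'CUST000001' TO CUST-RECORD
           WRITE CUST-RECORD
           MOVE 'CUST000002' TO CUST-RECORD
           WRITE CUST-RECORD
           CLOSE CUSTOMER-FILE
           OPEN INPUT CUSTOMER-FILE
           IF WS-FILE-STATUS NOT = '00'
               DISPLAY 'OPEN INPUT FAILED: ' WS-FILE-STATUS
               STOP RUN
           END-IF
           PERFORM UNTIL END-OF-FILE
               READ CUSTOMER-FILE INTO WS-CUST-REC
                   AT END CONTINUE
               END-READ
      *        The status code, not the READ itself, decides
               EVALUATE WS-FILE-STATUS
                   WHEN '00'
                       ADD 1 TO WS-READ-COUNT
                       DISPLAY 'READ: ' WS-CUST-REC
                   WHEN '10'
                       SET END-OF-FILE TO TRUE
                   WHEN OTHER
                       DISPLAY 'READ ERROR: ' WS-FILE-STATUS
                       SET END-OF-FILE TO TRUE
               END-EVALUATE
           END-PERFORM
           CLOSE CUSTOMER-FILE
           DISPLAY 'RECORDS READ: ' WS-READ-COUNT
           STOP RUN.
```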

33.9.7 PERFORM THRU with Fall-Through

Symptom: Code executing that should not be. Cause: Adding new paragraphs between PERFORM THRU boundaries.

      *--- Original code ---*
       PERFORM 2000-START THRU 2000-EXIT.

       2000-START.
           ...
       2000-EXIT.
           EXIT.

      *--- Someone later adds: ---*
       2000-START.
           ...
       2050-NEW-PARAGRAPH.
           ...
       2000-EXIT.
           EXIT.

      *--- 2050-NEW-PARAGRAPH now executes inside the
      *--- PERFORM THRU, which may not be intended! ---*

💡 Prevention: Avoid PERFORM THRU when possible. Use structured PERFORM paragraph. If THRU is required, clearly document the boundaries.
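The following minimal program shows the fall-through. It assumes a fixed-format compiler such as GnuCOBOL, and the paragraph names follow the example above.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. THRUDEMO.
      * Sketch: a paragraph added between the THRU boundaries
      * runs as part of the PERFORM, intended or not.
       PROCEDURE DIVISION.
       0000-MAIN.
           PERFORM 2000-START THRU 2000-EXIT
           DISPLAY 'BACK IN 0000-MAIN'
           STOP RUN.
       2000-START.
           DISPLAY 'IN 2000-START'.
      * Added later; it now sits inside the THRU range
       2050-NEW-PARAGRAPH.
           DISPLAY 'IN 2050-NEW-PARAGRAPH'.
       2000-EXIT.
           EXIT.
```

The second line of output is the surprise. Performing each paragraph by name makes that control flow visible at the call site.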

33.10 GlobalBank Case Study: Debugging a S0C7 in BAL-CALC

Let us walk through a real-world debugging scenario at GlobalBank.

The Problem: The nightly BAL-CALC (balance calculation) batch job abends with S0C7 after processing 47,293 records successfully. It has been running without issues for three years.

Step 1: Read the abend message.

IEA995I SYMPTOM DUMP:
SYSTEM COMPLETION CODE=0C7
ACTIVE LOAD MODULE  BAL-CALC
OFFSET              0001A3F0

Step 2: Maria Chen pulls up the compiler listing and finds offset 0001A3F0:

LINE    OFFSET  SOURCE
003847  01A3E8  COMPUTE WS-NET-BALANCE =
003848  01A3F0      WS-GROSS-BALANCE - WS-HOLD-AMOUNT

Line 3848: the COMPUTE is failing. Either WS-GROSS-BALANCE or WS-HOLD-AMOUNT contains non-numeric data.

Step 3: Check the CEEDUMP for variable values.

Variables at Statement 3848:
  WS-ACCT-NUMBER     = "ACCT00047294"
  WS-GROSS-BALANCE   = +1234567.89
  WS-HOLD-AMOUNT     = (INVALID DATA - hex: 40404040404040)

WS-HOLD-AMOUNT contains X'40' (EBCDIC spaces). A COMP-3 field full of spaces is not valid packed decimal.

Step 4: Trace where WS-HOLD-AMOUNT gets its value.

Maria searches the source code and finds:

003820  MOVE ACCT-HOLD-AMT TO WS-HOLD-AMOUNT.

ACCT-HOLD-AMT comes from the account master file record. The record for ACCT00047294 must have spaces in the hold amount field.

Step 5: Verify the data.

//VERIFY EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  PRINT INFILE(ACCTMAST) -
    FROMKEY(ACCT00047294) -
    COUNT(1) CHARACTER
/*

Output confirms: the ACCT-HOLD-AMT field contains spaces.

Step 6: Find the root cause.

Maria checks the change log: three weeks ago, a data conversion program migrated accounts from an old system. The old system stored spaces to mean "no hold amount," and the conversion program carried those spaces into the new packed field instead of storing packed zeros:

      *--- Bug in conversion program ---*
       IF OLD-HOLD-AMT = SPACES
           CONTINUE            *> Should have moved zeros!
       ELSE
           MOVE OLD-HOLD-AMT TO NEW-HOLD-AMT
       END-IF.

Step 7: Fix the immediate problem.

      *--- Add validation before the calculation ---*
       IF ACCT-HOLD-AMT IS NOT NUMERIC
           MOVE ZEROS TO WS-HOLD-AMOUNT
           ADD 1 TO WS-INVALID-DATA-COUNT
           IF WS-DEBUG-LEVEL >= 2
               DISPLAY 'DBG2: NON-NUMERIC HOLD-AMT FOR '
                       WS-ACCT-NUMBER
                       ' - DEFAULTED TO ZERO'
           END-IF
       ELSE
           MOVE ACCT-HOLD-AMT TO WS-HOLD-AMOUNT
       END-IF.
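To see why the IS NUMERIC guard in Step 7 works, here is a minimal standalone sketch (the program name PACKDEMO and the 4-byte field are hypothetical) that lays spaces over a COMP-3 field and applies the class test before and after MOVE ZEROS. It assumes a compiler such as GnuCOBOL or Enterprise COBOL. On an ASCII platform spaces are X'20' rather than X'40', but the final sign nibble is X'0' in both cases, so the class test should fail either way.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. PACKDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      *--- 4 bytes of spaces, viewed as a packed-decimal field ---*
       01  WS-RAW-BYTES      PIC X(4) VALUE SPACES.
       01  WS-HOLD-AMOUNT REDEFINES WS-RAW-BYTES
                             PIC S9(7) COMP-3.
       PROCEDURE DIVISION.
       0000-MAIN.
      *--- Spaces leave X'0' in the sign nibble: not valid ---*
           IF WS-HOLD-AMOUNT IS NUMERIC
               DISPLAY 'BEFORE: NUMERIC'
           ELSE
               DISPLAY 'BEFORE: NOT NUMERIC'
           END-IF
      *--- MOVE ZEROS stores a properly signed packed zero ---*
           MOVE ZEROS TO WS-HOLD-AMOUNT
           IF WS-HOLD-AMOUNT IS NUMERIC
               DISPLAY 'AFTER: NUMERIC'
           ELSE
               DISPLAY 'AFTER: NOT NUMERIC'
           END-IF
           STOP RUN.
```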

Step 8: Fix the root cause.

Write a cleanup program to fix all accounts with spaces in the hold amount:

      *--- Data cleanup program (ACCT-MASTER opened I-O) ---*
       PERFORM UNTIL END-OF-ACCT-FILE
           READ ACCT-MASTER
               AT END SET END-OF-ACCT-FILE TO TRUE
           END-READ

           IF NOT END-OF-ACCT-FILE
               IF ACCT-HOLD-AMT IS NOT NUMERIC
                   MOVE ZEROS TO ACCT-HOLD-AMT
      *--- REWRITE the record area itself: REWRITE ... FROM a
      *--- copy taken at READ time would overwrite this fix ---*
                   REWRITE ACCT-RECORD
                   ADD 1 TO WS-FIX-COUNT
               END-IF
           END-IF
       END-PERFORM

       DISPLAY 'RECORDS FIXED: ' WS-FIX-COUNT.

Step 9: Prevent recurrence. Add the IS NUMERIC test to BAL-CALC permanently, and add data quality validation to all conversion programs.

⚠️ Defensive Programming Lesson: The bug was not in BAL-CALC. BAL-CALC had run correctly for three years. The bug was in a data conversion program written three weeks earlier. Maria's fix does two things: corrects the data and adds defensive validation to BAL-CALC. "Never trust input data," she says. "Not even if it comes from your own system."

33.11 MedClaim Case Study: Finding Why Claims Are Miscalculated

The Problem: Claims for procedure code "99214" are being paid at 80% of the contracted rate instead of 100%. Sarah Kim, the business analyst, noticed the discrepancy in a weekly report.

Step 1: James Okafor examines the adjudication logic.

The claim adjudication program CLM-ADJUD calculates the approved amount based on a coverage table:

       EVALUATE CLM-PROC-CATEGORY
           WHEN 'PREVENTIVE'
               MOVE 1.00 TO WS-COVERAGE-PCT
           WHEN 'DIAGNOSTIC'
               MOVE 0.80 TO WS-COVERAGE-PCT
           WHEN 'SURGICAL'
               MOVE 0.70 TO WS-COVERAGE-PCT
           WHEN OTHER
               MOVE 0.80 TO WS-COVERAGE-PCT
       END-EVALUATE.

       COMPUTE WS-APPROVED-AMT =
           WS-ALLOWED-AMT * WS-COVERAGE-PCT.

Procedure 99214 (office visit, established patient, moderate complexity) should be "PREVENTIVE" but is being categorized as something else.

Step 2: Add DISPLAY debugging.

       IF WS-DEBUG-LEVEL >= 2
           DISPLAY 'DBG2: CLAIM=' WS-CLAIM-ID
           DISPLAY 'DBG2: PROC-CODE=' CLM-PROC-CODE
           DISPLAY 'DBG2: PROC-CATEGORY=>'
                   CLM-PROC-CATEGORY '<'
           DISPLAY 'DBG2: COVERAGE-PCT=' WS-COVERAGE-PCT
       END-IF

The angle brackets around CLM-PROC-CATEGORY are deliberate: they show exactly where the field starts and ends, so trailing spaces and other invisible bytes stand out.

Step 3: Run with debug level 2 for a test claim.

Output:

DBG2: CLAIM=CLM000098765
DBG2: PROC-CODE=99214
DBG2: PROC-CATEGORY=>PREVENTIVE <
DBG2: COVERAGE-PCT=0.80

The category appears to be PREVENTIVE with a trailing space. The EVALUATE compares against 'PREVENTIVE' (10 characters), but the field is 11 characters long.

Wait: in COBOL, that should not matter. A comparison pads the shorter operand with spaces, so 'PREVENTIVE ' equals 'PREVENTIVE'. James looks more carefully...

Step 4: Deeper investigation.

       DISPLAY 'CATEGORY LENGTH='
               FUNCTION LENGTH(CLM-PROC-CATEGORY)
       DISPLAY 'CATEGORY HEX='
      *   ... hex display routine ...

Output:

CATEGORY LENGTH=11
CATEGORY HEX=D7D9C5E5C5D5E3C9E5C500

The last byte is X'00' — a null character! The field is 'PREVENTIVE\0'. The null character causes the EVALUATE to fall through to WHEN OTHER.

Step 5: Trace the source of the null.

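CLM-PROC-CATEGORY comes from a Java-to-COBOL interface via MQ. The Java program sends null-terminated strings, a C convention (Java's own String type is not null-terminated, but code that builds fixed-length byte buffers can easily leave X'00' in them), while COBOL expects space-padded fields.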

Step 6: Fix.

Add an INSPECT to clean the data:

      *--- Clean null characters from MQ input fields ---*
       INSPECT CLM-PROC-CATEGORY
           REPLACING ALL LOW-VALUES BY SPACES.

Or better, add a reusable paragraph:

       8000-CLEAN-INPUT-FIELDS.
           INSPECT CLM-PROC-CODE
               REPLACING ALL LOW-VALUES BY SPACES
           INSPECT CLM-PROC-CATEGORY
               REPLACING ALL LOW-VALUES BY SPACES
           INSPECT CLM-DIAG-CODE
               REPLACING ALL LOW-VALUES BY SPACES
           INSPECT CLM-MODIFIER
               REPLACING ALL LOW-VALUES BY SPACES.
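The null-byte effect is easy to reproduce outside CLM-ADJUD. A minimal sketch, assuming a hypothetical standalone program NULLDEMO with an 11-byte field standing in for CLM-PROC-CATEGORY: the padded comparison fails while the X'00' is present and succeeds once INSPECT replaces it with a space.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. NULLDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-CATEGORY       PIC X(11).
       PROCEDURE DIVISION.
       0000-MAIN.
      *--- Simulate the null-terminated field from the Java side ---*
           MOVE 'PREVENTIVE' TO WS-CATEGORY
           MOVE X'00' TO WS-CATEGORY(11:1)
      *--- Literal is padded with a space; X'00' is not a space ---*
           IF WS-CATEGORY = 'PREVENTIVE'
               DISPLAY 'BEFORE INSPECT: MATCH'
           ELSE
               DISPLAY 'BEFORE INSPECT: NO MATCH'
           END-IF
      *--- Replace nulls with spaces, then compare again ---*
           INSPECT WS-CATEGORY REPLACING ALL LOW-VALUES BY SPACES
           IF WS-CATEGORY = 'PREVENTIVE'
               DISPLAY 'AFTER INSPECT: MATCH'
           ELSE
               DISPLAY 'AFTER INSPECT: NO MATCH'
           END-IF
           STOP RUN.
```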

🧪 The Human Factor in Debugging: This bug existed because two teams (Java and COBOL) had different assumptions about string representation. The Java side terminated its strings with X'00', a C convention; COBOL uses fixed-length, space-padded fields. Neither team was "wrong": they were working in different paradigms. The fix requires understanding both worlds. James Okafor's ability to debug this comes from his experience working across the interface boundary.

33.12 Production Debugging — Limited Access

In production, you typically cannot run interactive debuggers or add DISPLAY statements. You must work with what is available:

33.12.1 Reading Existing Logs

Check the job log (SYSOUT), CICS logs (CSMT, CESE), and DB2 diagnostic logs for clues. Many shops have application-level logging that writes to a dataset or DB2 table.

33.12.2 Analyzing Dumps After the Fact

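Ask operations to keep the dump when a production abend occurs (for example, a SYSMDUMP written by the failing step, or an SVC dump in a SYS1.DUMPxx data set). Use IPCS (Interactive Problem Control System) to format and analyze the dump offline; the Language Environment verb exit LEDATA formats the LE information, including a CEEDUMP-style traceback:

VERBX LEDATA 'CEEDUMP'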

33.12.3 Reproducing in Test

The gold standard: reproduce the problem in a test environment.

  1. Identify the failing input data (from the dump or logs)
  2. Copy or recreate that data in the test environment
  3. Run the program with debugging options enabled (SSRANGE, CHECK, TEST)
  4. Step through with an interactive debugger
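
When the failing data is hard to copy into test, a test mode can hardcode the failing scenario instead:

      *--- Create a test harness that feeds specific data ---*
       WORKING-STORAGE SECTION.
       01  WS-TEST-MODE   PIC X(1) VALUE 'N'.

      *--- A JCL PARM arrives as a halfword length plus text ---*
       LINKAGE SECTION.
       01  LS-JCL-PARM.
           05  LS-PARM-LEN    PIC S9(4) COMP.
           05  LS-PARM-TEXT   PIC X(100).

       PROCEDURE DIVISION USING LS-JCL-PARM.
       1000-INITIALIZE.
      *--- PARM='Y' on the EXEC statement selects test mode ---*
           IF LS-PARM-LEN > 0
               MOVE LS-PARM-TEXT(1:1) TO WS-TEST-MODE
           END-IF
           IF WS-TEST-MODE = 'Y'
               PERFORM 1100-SETUP-TEST-DATA
           ELSE
               PERFORM 1200-NORMAL-INIT
           END-IF.

       1100-SETUP-TEST-DATA.
      *--- Hardcode the failing scenario ---*
           MOVE 'CLM000045892' TO WS-CLAIM-ID
           MOVE '99214'        TO CLM-PROC-CODE
           MOVE 'PREVENTIVE'   TO CLM-PROC-CATEGORY
           MOVE X'00' TO CLM-PROC-CATEGORY(11:1).
      *--- Now the program processes exactly the failing case ---*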

33.12.4 Post-Mortem with SMF Records

IBM's System Management Facilities (SMF) records capture detailed execution data. SMF Type 30 records contain job and step execution statistics. SMF Type 101 records contain DB2 accounting data, and Type 102 records contain DB2 performance trace data. These can help you understand what happened during a production failure without needing to reproduce it.

33.13 Building a Debugging Toolkit

Every experienced COBOL developer maintains a personal toolkit of debugging aids. Here is a starter kit:

33.13.1 Standard Debug Copybook

      *================================================================*
      * COPYBOOK: DEBUGCTL                                             *
      * PURPOSE:  Standard debugging control fields                    *
      *================================================================*
       01  WS-DEBUG-CONTROL.
           05  WS-DEBUG-LEVEL     PIC 9(1) VALUE 0.
           05  WS-DEBUG-START-REC PIC 9(9) VALUE ZEROS.
           05  WS-DEBUG-END-REC   PIC 9(9) VALUE 999999999.
           05  WS-DEBUG-PARA-NAME PIC X(30).
           05  WS-DEBUG-MSG       PIC X(120).
           05  WS-DEBUG-TIMESTAMP PIC X(26).

33.13.2 Standard Diagnostic Paragraph

       8888-DEBUG-DISPLAY.
           IF WS-DEBUG-LEVEL > 0
               MOVE FUNCTION CURRENT-DATE
                   TO WS-DEBUG-TIMESTAMP
               DISPLAY WS-DEBUG-TIMESTAMP(1:16) ' '
                       'L' WS-DEBUG-LEVEL ' '
                       WS-DEBUG-PARA-NAME ': '
                       WS-DEBUG-MSG
           END-IF.

33.13.3 SQL Diagnostic Paragraph

       8800-SQL-DIAGNOSTIC.
           DISPLAY '*** SQL ERROR ***'
           DISPLAY '  SQLCODE:   ' SQLCODE
           DISPLAY '  SQLERRMC:  ' SQLERRMC
           DISPLAY '  STATEMENT: ' WS-DEBUG-PARA-NAME
           DISPLAY '  SQLERRD(3):' SQLERRD(3)
           IF SQLCODE = -911
               DISPLAY '  DEADLOCK/TIMEOUT - RETRY NEEDED'
           END-IF
           IF SQLCODE = -803
               DISPLAY '  DUPLICATE KEY'
           END-IF
           IF SQLCODE = -904
               DISPLAY '  RESOURCE UNAVAILABLE'
           END-IF.

33.13.4 Hex Dump Utility Paragraph

       01  WS-HEX-WORK.
           05  WS-HEX-CHARS     PIC X(16)
               VALUE '0123456789ABCDEF'.
           05  WS-HEX-TBL REDEFINES WS-HEX-CHARS.
               10  WS-HEX-DIGIT PIC X(1) OCCURS 16.
      *--- Caller sets WS-HEX-INPUT and WS-HEX-LEN; each input ---*
      *--- byte needs 3 output bytes, so 66 x 3 fits in 200   ---*
           05  WS-HEX-INPUT     PIC X(66).
           05  WS-HEX-LEN       PIC 9(3) COMP.
           05  WS-HEX-IDX       PIC 9(3) COMP.
           05  WS-HEX-OUT       PIC X(200).
           05  WS-HEX-PTR       PIC 9(3).
           05  WS-HEX-BYTE      PIC 9(3) COMP.
           05  WS-HEX-HIGH      PIC 9(3) COMP.
           05  WS-HEX-LOW       PIC 9(3) COMP.

       8700-DISPLAY-HEX.
      *--- Call with WS-HEX-INPUT and WS-HEX-LEN set ---*
           IF WS-HEX-LEN > 66
               MOVE 66 TO WS-HEX-LEN
           END-IF
           MOVE 1 TO WS-HEX-PTR
           MOVE SPACES TO WS-HEX-OUT
           PERFORM VARYING WS-HEX-IDX FROM 1 BY 1
                       UNTIL WS-HEX-IDX > WS-HEX-LEN
      *--- ORD returns the byte's code point plus one ---*
               MOVE FUNCTION ORD(
                   WS-HEX-INPUT(WS-HEX-IDX:1))
                   TO WS-HEX-BYTE
               SUBTRACT 1 FROM WS-HEX-BYTE
      *--- High and low nibbles select the two hex digits ---*
               DIVIDE WS-HEX-BYTE BY 16
                   GIVING WS-HEX-HIGH
                   REMAINDER WS-HEX-LOW
               ADD 1 TO WS-HEX-HIGH
               ADD 1 TO WS-HEX-LOW
               MOVE WS-HEX-DIGIT(WS-HEX-HIGH)
                   TO WS-HEX-OUT(WS-HEX-PTR:1)
               ADD 1 TO WS-HEX-PTR
               MOVE WS-HEX-DIGIT(WS-HEX-LOW)
                   TO WS-HEX-OUT(WS-HEX-PTR:1)
               ADD 1 TO WS-HEX-PTR
               MOVE ' ' TO WS-HEX-OUT(WS-HEX-PTR:1)
               ADD 1 TO WS-HEX-PTR
           END-PERFORM
           DISPLAY 'HEX: ' WS-HEX-OUT.
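To try the hex paragraph on its own, the following is a self-contained driver (the program name HEXDEMO is hypothetical) with its own copy of the working fields and a compact variant of 8700-DISPLAY-HEX that uses COMPUTE and relative subscripts. It formats the null-terminated category from the 33.11 case study. On z/OS the output would be the EBCDIC codes D7 D9 C5 ... 00; on an ASCII platform such as GnuCOBOL on Linux, the same text yields ASCII codes instead, which is what the expected output below assumes.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. HEXDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-HEX-WORK.
           05  WS-HEX-CHARS     PIC X(16)
               VALUE '0123456789ABCDEF'.
           05  WS-HEX-TBL REDEFINES WS-HEX-CHARS.
               10  WS-HEX-DIGIT PIC X(1) OCCURS 16.
           05  WS-HEX-INPUT     PIC X(66).
           05  WS-HEX-LEN       PIC 9(3) COMP.
           05  WS-HEX-IDX       PIC 9(3) COMP.
           05  WS-HEX-OUT       PIC X(200).
           05  WS-HEX-PTR       PIC 9(3) COMP.
           05  WS-HEX-BYTE      PIC 9(3) COMP.
           05  WS-HEX-HIGH      PIC 9(3) COMP.
           05  WS-HEX-LOW       PIC 9(3) COMP.
       PROCEDURE DIVISION.
       0000-MAIN.
      *--- Recreate the null-terminated category from 33.11 ---*
           MOVE 'PREVENTIVE' TO WS-HEX-INPUT
           MOVE X'00' TO WS-HEX-INPUT(11:1)
           MOVE 11 TO WS-HEX-LEN
           PERFORM 8700-DISPLAY-HEX
           STOP RUN.

       8700-DISPLAY-HEX.
           MOVE 1 TO WS-HEX-PTR
           MOVE SPACES TO WS-HEX-OUT
           PERFORM VARYING WS-HEX-IDX FROM 1 BY 1
                   UNTIL WS-HEX-IDX > WS-HEX-LEN
      *--- ORD returns the byte's code point plus one ---*
               COMPUTE WS-HEX-BYTE =
                   FUNCTION ORD(WS-HEX-INPUT(WS-HEX-IDX:1)) - 1
               DIVIDE WS-HEX-BYTE BY 16
                   GIVING WS-HEX-HIGH REMAINDER WS-HEX-LOW
      *--- Two hex digits, then a space left from MOVE SPACES ---*
               MOVE WS-HEX-DIGIT(WS-HEX-HIGH + 1)
                   TO WS-HEX-OUT(WS-HEX-PTR:1)
               MOVE WS-HEX-DIGIT(WS-HEX-LOW + 1)
                   TO WS-HEX-OUT(WS-HEX-PTR + 1:1)
               ADD 3 TO WS-HEX-PTR
           END-PERFORM
           DISPLAY 'HEX: ' FUNCTION TRIM(WS-HEX-OUT TRAILING).
```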

33.14 Debugging Performance Problems

Not all bugs cause abends or incorrect results. Some bugs cause performance degradation — programs that produce correct output but take far longer than they should. These are among the hardest bugs to diagnose because there is no error message pointing to the problem.

33.14.1 Identifying Performance Bottlenecks

The first step is measurement. On z/OS, several tools provide performance data:

  • SMF records: Type 30 records capture CPU time, elapsed time, and I/O counts for each job step
  • DB2 accounting reports: Show SQL execution statistics, lock waits, and buffer pool hit rates
  • CICS monitoring: Transaction response times, CPU usage, and wait times
  • RMF (Resource Measurement Facility): System-wide resource utilization

For application-level profiling, add timing instrumentation:

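       01  WS-TIMING.
           05  WS-CLOCK.
               10  WS-CLOCK-HH   PIC 9(2).
               10  WS-CLOCK-MM   PIC 9(2).
               10  WS-CLOCK-SS   PIC 9(2).
               10  WS-CLOCK-HS   PIC 9(2).
           05  WS-START-HS       PIC S9(9) COMP.
           05  WS-END-HS         PIC S9(9) COMP.
           05  WS-ELAPSED-HS     PIC S9(9) COMP.
           05  WS-ELAPSED-DISP   PIC Z(8)9.

       PERFORM-WITH-TIMING.
      *--- ACCEPT FROM TIME returns HHMMSShh (hundredths), ---*
      *--- so convert it to hundredths since midnight      ---*
           ACCEPT WS-CLOCK FROM TIME
           COMPUTE WS-START-HS =
               ((WS-CLOCK-HH * 60 + WS-CLOCK-MM) * 60
                 + WS-CLOCK-SS) * 100 + WS-CLOCK-HS

           PERFORM 2000-PROCESS-CLAIMS

      *--- Capture end time the same way ---*
           ACCEPT WS-CLOCK FROM TIME
           COMPUTE WS-END-HS =
               ((WS-CLOCK-HH * 60 + WS-CLOCK-MM) * 60
                 + WS-CLOCK-SS) * 100 + WS-CLOCK-HS
           COMPUTE WS-ELAPSED-HS = WS-END-HS - WS-START-HS
      *--- Ran past midnight: add one day (8,640,000) ---*
           IF WS-ELAPSED-HS < 0
               ADD 8640000 TO WS-ELAPSED-HS
           END-IF
           MOVE WS-ELAPSED-HS TO WS-ELAPSED-DISP
           DISPLAY 'ELAPSED HUNDREDTHS OF A SECOND: '
                   WS-ELAPSED-DISP.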
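ACCEPT ... FROM TIME returns the time of day as HHMMSShh (hundredths of a second), not a clock count, so two raw values cannot be subtracted directly, and a job that runs past midnight produces a negative difference. A minimal sketch of the conversion, using hardcoded times (hypothetical values straddling midnight) instead of live ACCEPTs: start 23:59:59.50 is 8,639,950 hundredths, end 00:00:01.25 is 125, and 125 - 8,639,950 + 8,640,000 = 175.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. TIMEDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      *--- Same layout as ACCEPT ... FROM TIME: HHMMSShh ---*
       01  WS-CLOCK.
           05  WS-CLOCK-HH      PIC 9(2).
           05  WS-CLOCK-MM      PIC 9(2).
           05  WS-CLOCK-SS      PIC 9(2).
           05  WS-CLOCK-HS      PIC 9(2).
       01  WS-NOW-HS            PIC S9(9) COMP.
       01  WS-START-HS          PIC S9(9) COMP.
       01  WS-END-HS            PIC S9(9) COMP.
       01  WS-ELAPSED-HS        PIC S9(9) COMP.
       01  WS-ELAPSED-DISP      PIC Z(8)9.
       PROCEDURE DIVISION.
       0000-MAIN.
      *--- Hardcoded start 23:59:59.50 (instead of ACCEPT) ---*
           MOVE '23595950' TO WS-CLOCK
           PERFORM 1000-CLOCK-TO-HS
           MOVE WS-NOW-HS TO WS-START-HS
      *--- Hardcoded end 00:00:01.25, after midnight ---*
           MOVE '00000125' TO WS-CLOCK
           PERFORM 1000-CLOCK-TO-HS
           MOVE WS-NOW-HS TO WS-END-HS
           COMPUTE WS-ELAPSED-HS = WS-END-HS - WS-START-HS
      *--- A negative result means the clock passed midnight ---*
           IF WS-ELAPSED-HS < 0
               ADD 8640000 TO WS-ELAPSED-HS
           END-IF
           MOVE WS-ELAPSED-HS TO WS-ELAPSED-DISP
           DISPLAY 'ELAPSED HUNDREDTHS: ' WS-ELAPSED-DISP
           STOP RUN.

       1000-CLOCK-TO-HS.
           COMPUTE WS-NOW-HS =
               ((WS-CLOCK-HH * 60 + WS-CLOCK-MM) * 60
                 + WS-CLOCK-SS) * 100 + WS-CLOCK-HS.
```

This measures elapsed (wall-clock) time only; CPU time still comes from SMF or accounting reports.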

33.14.2 Common Performance Bugs in COBOL

The Accidental Full Table Scan:

      *--- BUG: WHERE clause does not qualify on the key ---*
       EXEC SQL
           SELECT CUST_NAME INTO :WS-CUST-NAME
           FROM CUSTOMER
           WHERE CUST_STATUS = 'A'
       END-EXEC.
      *--- Every active customer (possibly millions) qualifies,
      *--- so this singleton SELECT fails with SQLCODE -811
      *--- (more than one row). Without an index on CUST_STATUS,
      *--- DB2 may also choose a tablespace scan to evaluate it.
      *--- FIX: qualify on the unique key (CUST_ID) as well.

The N+1 Query Problem:

      *--- BUG: One SQL call per customer in a loop ---*
       PERFORM VARYING WS-IDX FROM 1 BY 1
                   UNTIL WS-IDX > WS-CUST-COUNT
           EXEC SQL
               SELECT ACCT_BALANCE
               INTO :WS-BALANCE
               FROM ACCOUNT
               WHERE CUST_ID = :WS-CUST-TABLE(WS-IDX)
           END-EXEC
       END-PERFORM.

      *--- FIX: Use a cursor with JOIN or array fetch ---*
       EXEC SQL DECLARE BALANCE-CURSOR CURSOR FOR
           SELECT C.CUST_ID, A.ACCT_BALANCE
           FROM CUSTOMER C
           JOIN ACCOUNT A ON C.CUST_ID = A.CUST_ID
           WHERE C.CUST_STATUS = 'A'
           ORDER BY C.CUST_ID
       END-EXEC.

The Unnecessary SORT:

      *--- BUG: Sorting in COBOL when DB2 can do it ---*
       EXEC SQL DECLARE C1 CURSOR FOR
           SELECT * FROM TRANSACTIONS
           WHERE ACCT_ID = :WS-ACCT
       END-EXEC.
      *--- Fetch all into table, then SORT table ---*

      *--- FIX: Add ORDER BY to the SQL ---*
       EXEC SQL DECLARE C1 CURSOR FOR
           SELECT * FROM TRANSACTIONS
           WHERE ACCT_ID = :WS-ACCT
           ORDER BY TRANS_DATE DESC
       END-EXEC.

The Inefficient Table Search:

      *--- BUG: Linear scan that keeps going after a match ---*
       PERFORM VARYING WS-IDX FROM 1 BY 1
                   UNTIL WS-IDX > WS-TABLE-SIZE
           IF WS-TABLE-ENTRY(WS-IDX) = WS-SEARCH-VALUE
               MOVE WS-IDX TO WS-FOUND-IDX
           END-IF
       END-PERFORM.

      *--- FIX: Use SEARCH ALL (binary search) on a sorted table.
      *--- The table must be declared with ASCENDING KEY
      *--- WS-TABLE-KEY and INDEXED BY WS-IDX, and loaded in
      *--- key order. WS-IDX is then an index, so use SET.
       SEARCH ALL WS-TABLE-ENTRY
           AT END
               MOVE ZERO TO WS-FOUND-IDX
           WHEN WS-TABLE-KEY(WS-IDX) = WS-SEARCH-VALUE
               SET WS-FOUND-IDX TO WS-IDX
       END-SEARCH.
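SEARCH ALL performs a binary search, which only works if the table is declared with ASCENDING (or DESCENDING) KEY and INDEXED BY and is loaded in key order. A minimal runnable sketch, with a hypothetical five-entry table built from a literal (program name SRCHDEMO is also hypothetical), showing one hit and one miss:

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. SRCHDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      *--- Five 5-byte keys, already in ascending order ---*
       01  WS-KEY-DATA          PIC X(25)
           VALUE '1000120003300044000550007'.
       01  WS-KEY-TABLE REDEFINES WS-KEY-DATA.
           05  WS-TABLE-ENTRY   OCCURS 5 TIMES
                   ASCENDING KEY IS WS-TABLE-KEY
                   INDEXED BY WS-IDX.
               10  WS-TABLE-KEY PIC X(5).
       01  WS-SEARCH-VALUE      PIC X(5).
       01  WS-FOUND-IDX         PIC 9(4) VALUE ZERO.
       PROCEDURE DIVISION.
       0000-MAIN.
           MOVE '40005' TO WS-SEARCH-VALUE
           PERFORM 1000-LOOKUP
           MOVE '99999' TO WS-SEARCH-VALUE
           PERFORM 1000-LOOKUP
           STOP RUN.

       1000-LOOKUP.
           SEARCH ALL WS-TABLE-ENTRY
               AT END
                   DISPLAY WS-SEARCH-VALUE ' NOT FOUND'
               WHEN WS-TABLE-KEY(WS-IDX) = WS-SEARCH-VALUE
      *--- An index is not a number: use SET, not MOVE ---*
                   SET WS-FOUND-IDX TO WS-IDX
                   DISPLAY WS-SEARCH-VALUE ' FOUND AT ENTRY '
                           WS-FOUND-IDX
           END-SEARCH.
```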

33.14.3 Using DB2 EXPLAIN

The DB2 EXPLAIN statement shows you the access path DB2 will use for a query. This is invaluable for understanding why a query is slow:

//EXPLAIN EXEC PGM=IKJEFT01
//SYSTSPRT DD SYSOUT=*
//SYSTSIN DD *
  DSN SYSTEM(DB2P)
  RUN PROGRAM(DSNTEP2) PLAN(DSNTEP21)
  END
/*
//SYSIN DD *
  EXPLAIN ALL SET QUERYNO = 1 FOR
  SELECT C.CUST_NAME, A.ACCT_BALANCE
  FROM CUSTOMER C
  JOIN ACCOUNT A ON C.CUST_ID = A.CUST_ID
  WHERE C.CUST_ID = 'C000010001';
/*

The EXPLAIN output (in the PLAN_TABLE) tells you whether DB2 is using an index (ACCESSTYPE = 'I') or a tablespace scan (ACCESSTYPE = 'R'), and which join method it chose (METHOD = 1 for nested loop join, 2 for merge scan join, 4 for hybrid join). If you see a tablespace scan on a large table, you likely need an index on the WHERE clause columns.

📊 Performance Debugging Rule of Thumb: When a program is slow, the problem is almost always I/O — too many SQL calls, too many file reads, or SQL calls that trigger table scans. CPU-bound performance problems are rare in COBOL. Start by counting the number of SQL calls and file reads, then check whether each one is using an index.

33.15 Debugging Batch Programs with Large Data Volumes

Batch programs that process millions of records present unique debugging challenges. You cannot step through millions of iterations in a debugger, and adding DISPLAY statements to every record generates unmanageable output.

33.15.1 The Targeted Debug Window

Process normally until you reach the problem area, then activate debugging:

       01  WS-DEBUG-WINDOW.
           05  WS-DEBUG-START   PIC 9(9) VALUE 47290.
           05  WS-DEBUG-END     PIC 9(9) VALUE 47300.
           05  WS-CURRENT-REC   PIC 9(9) VALUE ZEROS.

       2000-PROCESS-LOOP.
           PERFORM UNTIL END-OF-FILE
               READ INPUT-FILE INTO WS-INPUT-REC
                   AT END SET END-OF-FILE TO TRUE
               END-READ

               IF NOT END-OF-FILE
                   ADD 1 TO WS-CURRENT-REC

      *--- Activate debugging only in the window ---*
                   IF WS-CURRENT-REC >= WS-DEBUG-START
                      AND WS-CURRENT-REC <= WS-DEBUG-END
                       MOVE 3 TO WS-DEBUG-LEVEL
                   ELSE
                       MOVE 0 TO WS-DEBUG-LEVEL
                   END-IF

                   PERFORM 2100-PROCESS-ONE-RECORD
               END-IF
           END-PERFORM.

This produces detailed debug output only for records 47,290 through 47,300 — a window around the known failure point.

33.15.2 The Binary Search Debug Strategy

When you do not know which record causes the failure, use a binary search approach:

  1. Run the program with a checkpoint every 100,000 records
  2. The program abends between records 3,200,000 and 3,300,000
  3. Re-run from the checkpoint with checkpoints every 10,000 records
  4. It abends between 3,240,000 and 3,250,000
  5. Re-run with debug output for records 3,240,000 to 3,250,000
  6. The exact failing record is identified

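Because each run narrows the suspect range by a factor of ten, this approach converges in logarithmic time: even a file of tens of millions of records needs only a handful of runs rather than a record-by-record scan.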
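Step 1's checkpoint can be as simple as a progress message every N records; the last message before the abend brackets the failing record. A minimal sketch, where the interval, the record count, and the program name CKPTDEMO are hypothetical and a counting loop stands in for the READ loop:

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. CKPTDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-CURRENT-REC       PIC 9(9) VALUE ZERO.
      *--- Hypothetical interval; 100000 in the first real pass ---*
       01  WS-CHECKPOINT-EVERY  PIC 9(9) VALUE 10.
       01  WS-TOTAL-RECS        PIC 9(9) VALUE 35.
       PROCEDURE DIVISION.
       0000-MAIN.
      *--- Stand-in for the READ loop: count 35 records ---*
           PERFORM UNTIL WS-CURRENT-REC >= WS-TOTAL-RECS
               ADD 1 TO WS-CURRENT-REC
               IF FUNCTION MOD(WS-CURRENT-REC, WS-CHECKPOINT-EVERY)
                  = 0
                   DISPLAY 'CHECKPOINT: ' WS-CURRENT-REC
               END-IF
           END-PERFORM
           STOP RUN.
```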

33.15.3 The Canary Record

Insert a known-bad record into your test data at a specific position. Verify that your error handling catches it:

      *--- Insert a canary at position 500 in test data ---*
      *--- Record 500 has spaces in the amount field ---*
      *--- If the program processes past 500 without
      *--- logging an error, your validation is broken ---*

This technique verifies that your defensive programming actually works before you encounter real bad data in production.
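A canary check can also be automated, so the run itself reports whether validation fired. A minimal sketch, assuming a hypothetical in-memory table of five 5-digit amounts with the canary (spaces) planted at position 3; a missed canary sets RETURN-CODE 16 so the job step visibly fails (program name CANARY is hypothetical too):

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. CANARY.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      *--- Five test amounts; entry 3 is the canary (spaces) ---*
       01  WS-TEST-DATA         PIC X(25)
           VALUE '0001000020     0004000050'.
       01  WS-TEST-TABLE REDEFINES WS-TEST-DATA.
           05  WS-TEST-AMT      PIC 9(5) OCCURS 5 TIMES.
       01  WS-IDX               PIC 9(2).
       01  WS-CANARY-POS        PIC 9(2) VALUE 3.
       01  WS-CANARY-CAUGHT     PIC X VALUE 'N'.
       PROCEDURE DIVISION.
       0000-MAIN.
           PERFORM VARYING WS-IDX FROM 1 BY 1 UNTIL WS-IDX > 5
      *--- The same validation the production program uses ---*
               IF WS-TEST-AMT(WS-IDX) IS NOT NUMERIC
                   DISPLAY 'INVALID AMOUNT IN RECORD ' WS-IDX
                   IF WS-IDX = WS-CANARY-POS
                       MOVE 'Y' TO WS-CANARY-CAUGHT
                   END-IF
               END-IF
           END-PERFORM
           IF WS-CANARY-CAUGHT = 'Y'
               DISPLAY 'CANARY CAUGHT: VALIDATION WORKS'
           ELSE
               DISPLAY 'CANARY MISSED: VALIDATION IS BROKEN'
               MOVE 16 TO RETURN-CODE
           END-IF
           STOP RUN.
```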

33.16 Debugging Multi-Program Systems

Production mainframe systems often involve chains of programs communicating through files, databases, queues, or CALL interfaces. A bug in one program may manifest as incorrect behavior in a downstream program.

33.16.1 The Interface Verification Pattern

At every program interface (file handoff, MQ message, database shared table), verify that the data conforms to the expected format:

       1500-VERIFY-INPUT-INTERFACE.
      *--- Verify record count matches header ---*
           IF WS-HEADER-COUNT NOT = WS-ACTUAL-COUNT
               DISPLAY '*** INTERFACE ERROR ***'
               DISPLAY '  EXPECTED: ' WS-HEADER-COUNT
               DISPLAY '  ACTUAL:   ' WS-ACTUAL-COUNT
               PERFORM 9999-ABEND
           END-IF

      *--- Verify control totals match ---*
           IF WS-HEADER-TOTAL NOT = WS-ACTUAL-TOTAL
               DISPLAY '*** CONTROL TOTAL MISMATCH ***'
               DISPLAY '  EXPECTED: ' WS-HEADER-TOTAL
               DISPLAY '  ACTUAL:   ' WS-ACTUAL-TOTAL
               PERFORM 9999-ABEND
           END-IF

      *--- Verify no null contamination ---*
      *--- (INSPECT TALLYING adds to the count: reset it first) ---*
           MOVE ZERO TO WS-NULL-COUNT
           INSPECT WS-FIRST-RECORD
               TALLYING WS-NULL-COUNT
               FOR ALL LOW-VALUES
           IF WS-NULL-COUNT > 0
               DISPLAY '*** NULL CHARACTERS IN INPUT ***'
               DISPLAY '  COUNT: ' WS-NULL-COUNT
           END-IF.
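INSPECT ... TALLYING adds to its counter rather than replacing it, so a counter that is not reset before each INSPECT reports inflated totals on the second and later records. A minimal sketch (the program TALLYDMO and its 6-byte record are hypothetical) shows the effect:

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. TALLYDMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-FIRST-RECORD      PIC X(6) VALUE 'AB'.
       01  WS-NULL-COUNT        PIC 9(4) VALUE ZERO.
       PROCEDURE DIVISION.
       0000-MAIN.
      *--- Plant two nulls in positions 3 and 4 ---*
           MOVE LOW-VALUES TO WS-FIRST-RECORD(3:2)
           INSPECT WS-FIRST-RECORD
               TALLYING WS-NULL-COUNT FOR ALL LOW-VALUES
           DISPLAY 'FIRST PASS:  ' WS-NULL-COUNT
      *--- Without a reset, the second INSPECT adds to the total ---*
           INSPECT WS-FIRST-RECORD
               TALLYING WS-NULL-COUNT FOR ALL LOW-VALUES
           DISPLAY 'NO RESET:    ' WS-NULL-COUNT
      *--- Reset before each INSPECT to get a true count ---*
           MOVE ZERO TO WS-NULL-COUNT
           INSPECT WS-FIRST-RECORD
               TALLYING WS-NULL-COUNT FOR ALL LOW-VALUES
           DISPLAY 'AFTER RESET: ' WS-NULL-COUNT
           STOP RUN.
```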

33.16.2 The Breadcrumb Trail

When data flows through multiple programs, add a processing trail that records which programs have touched each record:

       01  WS-BREADCRUMB.
           05  BC-PROGRAM-NAME   PIC X(8).
           05  BC-TIMESTAMP      PIC X(26).
           05  BC-ACTION         PIC X(20).

      *--- Each program adds its breadcrumb ---*
       2500-ADD-BREADCRUMB.
           MOVE 'CLMADJ01' TO BC-PROGRAM-NAME
           MOVE 'ADJUDICATED' TO BC-ACTION

      *--- Let DB2 stamp the time: CURRENT-DATE's layout
      *--- (YYYYMMDDHHMMSShh+hhmm) is not a valid DB2
      *--- TIMESTAMP string, if PROCESS_TIME is a TIMESTAMP ---*
           EXEC SQL
               INSERT INTO PROCESSING_TRAIL
               (CLAIM_ID, PROGRAM_NAME, PROCESS_TIME,
                ACTION)
               VALUES
               (:WS-CLAIM-ID, :BC-PROGRAM-NAME,
                CURRENT TIMESTAMP, :BC-ACTION)
           END-EXEC.

When a downstream program encounters unexpected data, the breadcrumb trail shows exactly which programs processed the record and when. This is invaluable for narrowing down which program introduced the problem.

🧪 The Human Factor: James Okafor estimates that 40% of his debugging time is spent on interface issues — data format mismatches between programs written by different teams. "The bug is always at the boundary," he says. "Two programs can each be individually correct and still produce incorrect results when connected, because they make different assumptions about the data format."

33.17 Debugging COBOL Called Programs (Subprograms)

Many COBOL applications consist of a main program that CALLs multiple subprograms. Debugging across CALL boundaries presents additional challenges.

33.17.1 Verifying CALL Parameters

A frequent source of bugs is mismatched parameters between the calling program and the called program. The COBOL compiler does not validate parameter layouts across separate compilation units.

      *--- Calling program expects 3 parameters ---*
       CALL 'CALCMOD' USING WS-INPUT-REC
                             WS-RESULT
                             WS-ERROR-CODE.

      *--- Called program (CALCMOD) expects 3 parameters ---*
       PROCEDURE DIVISION USING LS-INPUT-REC
                                LS-RESULT
                                LS-ERROR-CODE.

If the field sizes do not match between the calling and called program, data corruption occurs silently. For example, if the calling program defines WS-INPUT-REC as PIC X(100) but the called program defines LS-INPUT-REC as PIC X(150), the called program will read, and may overwrite, 50 bytes beyond the caller's allocated storage.

Debug technique: Add a length verification at the start of every called program:

       LINKAGE SECTION.
       01  LS-INPUT-REC.
           05  LS-EYECATCHER    PIC X(8).
           05  LS-DATA-LENGTH   PIC 9(5).
           05  LS-DATA          PIC X(92).

       PROCEDURE DIVISION USING LS-INPUT-REC ...

       1000-VERIFY-INTERFACE.
           IF LS-EYECATCHER NOT = 'CALCMOD1'
               DISPLAY '*** INTERFACE ERROR ***'
               DISPLAY '  EXPECTED EYECATCHER: CALCMOD1'
               DISPLAY '  RECEIVED: ' LS-EYECATCHER
               PERFORM 9999-ABEND
           END-IF
           IF LS-DATA-LENGTH NOT = 92
               DISPLAY '*** DATA LENGTH MISMATCH ***'
               DISPLAY '  EXPECTED: 92'
               DISPLAY '  RECEIVED: ' LS-DATA-LENGTH
               PERFORM 9999-ABEND
           END-IF.

The eyecatcher pattern ensures that the calling program is passing the correct parameter structure. This is especially valuable when a subprogram is called by multiple different main programs.

33.17.2 Debugging Static vs. Dynamic Calls

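COBOL supports both static calls (resolved at link time) and dynamic calls (resolved at runtime). In IBM Enterprise COBOL, a CALL with a literal is static under the NODYNAM compiler option and dynamic under DYNAM; a CALL with an identifier is always dynamic: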

      *--- Static call — linked into the load module ---*
       CALL 'CALCMOD' USING ...

      *--- Dynamic call — loaded at runtime from a library ---*
       MOVE 'CALCMOD' TO WS-PROGRAM-NAME
       CALL WS-PROGRAM-NAME USING ...

For dynamic calls, the S806 abend (module not found) is common. Debugging checklist for S806:

  1. Is the module name spelled correctly? (Check for trailing spaces)
  2. Is the load library included in the STEPLIB/JOBLIB DD?
  3. Was the module compiled and link-edited successfully?
  4. Is the module in a library that the program has access to?
  5. For CICS, is the program defined in the CSD?

      *--- Defensive dynamic CALL ---*
       MOVE 'CALCMOD ' TO WS-PROGRAM-NAME
       CALL WS-PROGRAM-NAME USING WS-INPUT-REC
                                   WS-RESULT
                                   WS-ERROR-CODE
           ON EXCEPTION
               DISPLAY '*** CALL FAILED: '
                       WS-PROGRAM-NAME
               DISPLAY '*** CHECK STEPLIB/JOBLIB'
               PERFORM 9999-ABEND
       END-CALL.

The ON EXCEPTION clause catches the S806 condition and allows the program to produce a meaningful error message rather than a cryptic abend.
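The same ON EXCEPTION pattern can be exercised without a real subprogram by calling a name that exists in no library. A minimal sketch, where NOSUCHPG is a deliberately nonexistent module name and CALLDEMO is hypothetical; instead of abending, control passes to the ON EXCEPTION branch and the program keeps running:

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. CALLDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-PROGRAM-NAME      PIC X(8).
       PROCEDURE DIVISION.
       0000-MAIN.
      *--- NOSUCHPG is deliberately absent from every library ---*
           MOVE 'NOSUCHPG' TO WS-PROGRAM-NAME
           CALL WS-PROGRAM-NAME
               ON EXCEPTION
                   DISPLAY '*** CALL FAILED: ' WS-PROGRAM-NAME
               NOT ON EXCEPTION
                   DISPLAY 'CALL SUCCEEDED'
           END-CALL
           DISPLAY 'EXECUTION CONTINUES AFTER CALL'
           STOP RUN.
```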

33.17.3 Debugging CANCEL and Re-CALL

When a program CALLs a subprogram, the subprogram's WORKING-STORAGE is initialized only once, on the first CALL; later CALLs find the values the previous call left behind. If each call must start fresh, CANCEL the subprogram first, declare it with the IS INITIAL clause on its PROGRAM-ID, or design it with an explicit initialization parameter.

      *--- Subprogram retains state between calls ---*
       CALL 'COUNTER' USING WS-COUNT.  *> Returns 1
       CALL 'COUNTER' USING WS-COUNT.  *> Returns 2
       CALL 'COUNTER' USING WS-COUNT.  *> Returns 3

      *--- Reset by canceling and re-calling ---*
       CANCEL 'COUNTER'.
       CALL 'COUNTER' USING WS-COUNT.  *> Returns 1 again

Failing to understand this behavior is a common source of bugs in programs that call subprograms repeatedly with the expectation that each call starts fresh.

33.18 Establishing a Debugging Culture

Debugging is not just a technical skill — it is a team practice. The best debugging happens in organizations that build a culture around learning from bugs.

33.18.1 Post-Incident Reviews

After every production incident, conduct a blameless post-mortem:

  1. What happened? (Timeline of events)
  2. What was the root cause?
  3. How was it detected?
  4. How was it resolved?
  5. What prevented earlier detection?
  6. What changes will prevent recurrence?

At MedClaim, James Okafor maintains a "Bug Book" — a shared document that records every significant production bug, its root cause, and the fix. New team members read the Bug Book as part of onboarding. "Every bug is a lesson," James says. "If you don't write it down, you learn the same lesson twice."

33.18.2 Defensive Coding Standards

Encoding debugging knowledge into coding standards prevents entire categories of bugs:

  • All numeric fields must have VALUE clauses (prevents S0C7 from uninitialized fields)
  • All file operations must check status codes (prevents processing stale data)
  • All moves into numeric fields must be elementary moves, never group moves (prevents data corruption)
  • All EVALUATE statements must have WHEN OTHER (prevents silent fall-through)
  • All programs must include debug level infrastructure (enables runtime diagnostics)
  • All CALL statements must use ON EXCEPTION (catches S806 before abend)

These standards, enforced through code review, eliminate the most common classes of COBOL bugs before they ever reach test.

33.18.3 The Debug Pair

When debugging a particularly stubborn problem, pair with a colleague. The second person brings fresh eyes and challenges your assumptions. Maria Chen and James Okafor have a standing agreement: when either one has spent more than 30 minutes on a bug without progress, they call the other. "The fresh perspective is worth more than any debugger," Maria says.

33.19 Debugging Checklist

When you encounter a bug, work through this checklist systematically:

For abends:

  1. Record the abend code, module name, and offset
  2. Get the compiler listing (with LIST, MAP, OFFSET)
  3. Map the offset to a COBOL statement
  4. Identify the failing instruction (AP, CP, ZAP for S0C7; branching/addressing for S0C1/S0C4)
  5. Check the operand values in the dump
  6. Trace how those operands got their values

For incorrect results:

  1. Identify the first point where actual output diverges from expected
  2. Add DISPLAY statements (or use debugger) at key decision points upstream
  3. Inspect input data: is it what you expect?
  4. Check EVALUATE/IF logic: are conditions tested correctly?
  5. Check numeric conversions: group moves vs. elementary moves?
  6. Check for boundary conditions: first record, last record, empty file

For performance problems:

  1. Check DB2 access paths (EXPLAIN)
  2. Count SQL calls: is the program issuing more than expected?
  3. Check for missing indexes
  4. Check commit frequency
  5. Look for unnecessary I/O (reading files multiple times)
  6. Check for CPU-intensive operations in loops

For intermittent failures:

  1. Check for timing-dependent logic
  2. Check for data-dependent paths (does it only fail on certain inputs?)
  3. Check for uninitialized fields (values differ by run)
  4. Check for concurrent access issues (locking, deadlocks)
  5. Check for environmental differences (test vs. production)

For interface failures (cross-program or cross-platform):

  1. Check data encoding (EBCDIC vs. ASCII)
  2. Check for null characters (X'00' from Java/C systems)
  3. Check field lengths: are the sending and receiving copybooks in sync?
  4. Check numeric formats: is the sender using display, packed, or binary?
  5. Check byte order: big-endian (z/OS) vs. little-endian (x86)
  6. Verify record counts and control totals at every handoff point

33.20 A Debugging Decision Tree

When you first encounter a problem, this decision tree helps you choose the right debugging approach:

Is the program abending?
├── YES: What is the abend code?
│   ├── S0C7 → Check numeric fields in dump
│   │         → Look for group moves, uninitialized fields
│   │         → Check input data quality
│   ├── S0C4 → Check CALL parameters
│   │         → Check table subscripts
│   │         → Look for corrupted pointers
│   ├── S0C1 → Check CALL target existence
│   │         → Check for corrupted branch addresses
│   ├── S322 → Check for infinite loops
│   │         → Check TIME parameter in JCL
│   ├── S806 → Check STEPLIB, module name spelling
│   │         → Check compile/link-edit success
│   └── Other → Consult z/OS System Codes manual
│
├── NO: Is the output incorrect?
│   ├── YES: Where does output first diverge?
│   │   ├── Input data wrong → Debug upstream program
│   │   ├── Logic error → Add DISPLAY at decision points
│   │   ├── Data conversion → Check group vs. elementary moves
│   │   └── Comparison fails → Check for hidden characters
│   │
│   └── NO: Is it a performance problem?
│       ├── YES: Measure I/O counts
│       │   ├── Too many SQL calls → Use JOIN, cursor
│       │   ├── Table scans → Add index, fix WHERE clause
│       │   ├── Lock contention → Check lock duration, ordering
│       │   └── CPU-bound → Check loops, SORT efficiency
│       │
│       └── NO: Is it an intermittent failure?
│           ├── YES: Check for:
│           │   ├── Uninitialized fields
│           │   ├── Timing-dependent logic
│           │   ├── Data-dependent paths
│           │   └── Concurrent access issues
│           │
│           └── Clarify the symptom before proceeding

This tree is not exhaustive, but it covers the most common scenarios. Print it and keep it at your desk — it saves time when you are under pressure at 3 AM.

33.21 Try It Yourself: Debug Challenge Lab

This lab exercise gives you hands-on practice with the debugging techniques from this chapter.

Challenge 1: The S0C7 Hunt

A program processes employee payroll records. It abends with S0C7 on the 847th record. You are given:

  • The abend offset (from the job log)
  • The compiler listing (with LIST, MAP, OFFSET)
  • A hex dump of WORKING-STORAGE at the time of the abend

Your task: identify the failing statement, the corrupted field, the invalid hex data, and the root cause. (Hint: the 847th employee has a name that starts with a digit, and a group MOVE copies the name field over a numeric field.)

Challenge 2: The Silent Wrong Answer

A claim adjudication program calculates the correct approved amount for 99.8% of claims but produces $0.00 for claims with procedure code "99214" from providers who joined the network in 2024. The program does not abend.

Your task: add targeted DISPLAY debugging to trace the data flow for a failing claim, identify the root cause (a date comparison that uses display-format dates incorrectly), and apply the fix.

Challenge 3: The Intermittent Deadlock

A CICS transfer transaction works correctly 99.9% of the time but occasionally returns an error to the user. The CICS log shows SQLCODE -911.

Your task: analyze the locking pattern in the program, identify why deadlocks occur for specific account number combinations, implement resource ordering, and add deadlock retry logic.

These challenges mirror real-world debugging scenarios that you will encounter in production mainframe environments. Work through them systematically using the techniques from this chapter.

33.22 The Economics of Debugging

It may seem odd to discuss economics in a debugging chapter, but understanding the cost structure of bugs motivates the investment in prevention and detection.

33.22.1 The Cost Multiplier

A bug caught during development costs minutes to fix. The same bug caught in testing costs hours. The same bug found in production costs days or weeks when you include the investigation, the fix, the retesting, the emergency deployment, and the business impact.

At MedClaim, the null-character bug (Case Study 2) underpaid providers by $340,000 over two months. The fix took 30 minutes of coding. But the total cost included:

  • 2 person-days of investigation
  • 5 days of claim reprocessing
  • Float cost on the $340,000 in supplemental payments
  • Provider relationship damage (unmeasured)
  • Regulatory reporting of the error

The total cost was estimated at $45,000 — for a bug that could have been prevented by a single INSPECT statement.
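A single INSPECT of that kind might look like the following. It is a minimal sketch assuming a hypothetical alphanumeric field, WS-PROVIDER-NAME; this section does not restate which field carried the nulls in Case Study 2. One INSPECT statement both counts the X'00' bytes, so the repair can be logged, and replaces them with spaces.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. NULLFIX.
      * Sketch: scrub null bytes (X'00') that a cross-platform
      * interface left in a text field before the field is used.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-PROVIDER-NAME        PIC X(12) VALUE 'ACME CLINIC'.
       01  WS-NULL-COUNT           PIC 9(4)  VALUE ZERO.
       PROCEDURE DIVISION.
      * Simulate the bad feed: a null byte in the last position.
           MOVE X'00' TO WS-PROVIDER-NAME(12:1)
      * One statement: tally the nulls, then replace them.
           INSPECT WS-PROVIDER-NAME
               TALLYING WS-NULL-COUNT FOR ALL X'00'
               REPLACING ALL X'00' BY SPACE
           DISPLAY 'NULLS REPLACED: ' WS-NULL-COUNT
           DISPLAY 'NAME: [' WS-PROVIDER-NAME ']'
           STOP RUN.
```

Placing the scrub where interface data enters the program, rather than where each field is later used, means every downstream paragraph works with clean data.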

33.22.2 Investment in Prevention

Based on this cost analysis, James Okafor justified a project to add defensive data validation to all of MedClaim's claim processing programs. The project cost $120,000 (six person-months of development). In the first year after implementation, it caught 847 data quality issues that would previously have caused miscalculations or abends. James estimates that the project prevented at least $500,000 in incident costs.

"Debugging is reactive. Defensive programming is proactive. We should invest more in the proactive side," James told his management. The Bug Book became the evidence that convinced them.

33.23 Chapter Summary

Debugging is a skill that improves with practice and systematic thinking. In this chapter, you learned:

  • DISPLAY debugging: Strategic placement of DISPLAY statements with debug levels for controlled verbosity
  • Compiler options: SSRANGE for subscript checking, CHECK for numeric validation, TEST for debugger support, LIST/MAP/OFFSET for dump analysis
  • READY TRACE: Built-in paragraph-level tracing for understanding execution flow
  • Abend analysis: Reading dumps to identify the failing instruction and corrupted data, with special attention to S0C7 (data exceptions)
  • CEEDUMP: Language Environment's formatted dump with COBOL variable values and call stacks
  • Interactive debuggers: IBM Debug Tool and Xpediter for breakpoints, stepping, and variable inspection
  • CICS debugging: CEDF/EDF for intercepting EXEC CICS commands; RESP/RESP2 for programmatic error detection
  • SQL debugging: SQLCODE checking, DSNTEP2/SPUFI for testing SQL, common SQLCODE meanings
  • Common bugs: Uninitialized fields, group moves, off-by-one errors, missing status checks, null characters from cross-platform interfaces
  • Production debugging: Working with limited access — dump analysis, log reading, and reproducing in test environments

The best debugging skill, however, is one we have not discussed: prevention. Every bug you find teaches you something about how bugs are created. Use that knowledge to write more defensive code — validate inputs, check status codes, initialize fields, use scope terminators, and test boundary conditions. The best debugger is a programmer who writes code that does not need debugging.


"I've been debugging COBOL for fifteen years. The bugs haven't changed much — it's still uninitialized fields, unchecked status codes, and group moves. What's changed is how fast I find them." — James Okafor, reflecting on his craft