> "Every bug in a COBOL program was put there by a person. Understanding the bug means understanding the person's mistake. That requires empathy as much as skill." — James Okafor, MedClaim team lead
In This Chapter
- 33.1 The Debugging Mindset
- 33.2 DISPLAY Statement Debugging
- 33.3 Compiler Debugging Options
- 33.4 READY TRACE and RESET TRACE
- 33.5 Abend Analysis — Reading Dumps
- 33.6 Interactive Debugging Tools
- 33.7 Debugging CICS Programs
- 33.8 Debugging Embedded SQL
- 33.9 Common COBOL Bugs and Their Symptoms
- 33.10 GlobalBank Case Study: Debugging a S0C7 in BAL-CALC
- 33.11 MedClaim Case Study: Finding Why Claims Are Miscalculated
- 33.12 Production Debugging — Limited Access
- 33.13 Building a Debugging Toolkit
- 33.14 Debugging Performance Problems
- 33.15 Debugging Batch Programs with Large Data Volumes
- 33.16 Debugging Multi-Program Systems
- 33.17 Debugging COBOL Called Programs (Subprograms)
- 33.18 Establishing a Debugging Culture
- 33.19 Debugging Checklist
- 33.20 A Debugging Decision Tree
- 33.21 Try It Yourself: Debug Challenge Lab
- 33.22 The Economics of Debugging
- 33.23 Chapter Summary
Chapter 33: Debugging Strategies
It is 2:47 AM. James Okafor's phone rings. Production is down. The nightly claim processing batch job has abended with S0C7 at offset +003A2E in module CLM-ADJUD. Thousands of claims are stuck in the pipeline, and the morning processing window opens in four hours. James pulls up the dump, starts tracing through hexadecimal storage, and within twenty minutes identifies the problem: a provider record with an alphabetic character in a numeric field, introduced by a data conversion three weeks ago that nobody caught. He patches the data, restarts the job, and goes back to sleep.
This chapter teaches you the debugging skills that James Okafor uses daily — skills that separate effective mainframe developers from those who stare helplessly at dumps. You will learn to use DISPLAY statements strategically, leverage compiler debugging options, read abend dumps, use interactive debuggers, and develop the systematic thinking that turns mysterious failures into understood problems.
33.1 The Debugging Mindset
Before we discuss tools and techniques, let us establish the right mindset. Debugging is not random guessing. It is systematic hypothesis testing:
- Observe: What exactly happened? What abend code? What data was being processed? What was the expected behavior?
- Hypothesize: Based on the symptoms, what could cause this? Generate multiple hypotheses.
- Test: For each hypothesis, what evidence would confirm or refute it? Examine that evidence.
- Narrow: Eliminate hypotheses until one remains. If all are eliminated, generate new ones.
- Fix: Apply the smallest correct fix. Verify that it resolves the issue without introducing new ones.
- Prevent: Add validation, error handling, or tests to prevent recurrence.
🧪 The Human Factor: The hardest debugging is reading code someone else wrote years ago. You are not just debugging logic — you are reverse-engineering another person's intent. Comments may be wrong or missing. Variable names may be misleading. The code may have been modified multiple times by different people. Approach unfamiliar code with patience and humility.
33.2 DISPLAY Statement Debugging
The DISPLAY statement is the oldest and most universal debugging tool in COBOL. It writes text to SYSOUT (typically the job log), allowing you to trace program execution and inspect data values.
33.2.1 Strategic DISPLAY Placement
Do not scatter DISPLAYs randomly. Place them at key decision points:
*--- Entry/exit of major paragraphs ---*
2000-PROCESS-CLAIM.
IF WS-DEBUG-MODE = 'Y'
DISPLAY 'ENTERING 2000-PROCESS-CLAIM'
DISPLAY ' CLAIM-ID: ' WS-CLAIM-ID
DISPLAY ' CLAIM-AMT: ' WS-CLAIM-AMT
END-IF
PERFORM 2100-VALIDATE-CLAIM
PERFORM 2200-ADJUDICATE-CLAIM
PERFORM 2300-CREATE-PAYMENT
IF WS-DEBUG-MODE = 'Y'
DISPLAY 'EXITING 2000-PROCESS-CLAIM'
DISPLAY ' STATUS: ' WS-CLAIM-STATUS
DISPLAY ' APPROVED-AMT: ' WS-APPROVED-AMT
END-IF.
33.2.2 Debug Levels
Implement a debug level system to control verbosity without recompiling:
01 WS-DEBUG-CONTROL.
05 WS-DEBUG-LEVEL PIC 9(1) VALUE 0.
* 0 = No debug output
* 1 = Major paragraph entry/exit
* 2 = Key decision points and data values
* 3 = Every paragraph, detailed data
* 9 = Full trace (extremely verbose)
...
*--- Level 1: Major flow ---*
IF WS-DEBUG-LEVEL >= 1
DISPLAY 'DBG1: ENTERING PROCESS-CLAIM'
END-IF
*--- Level 2: Decisions and data ---*
IF WS-DEBUG-LEVEL >= 2
DISPLAY 'DBG2: PROVIDER-ID=' WS-PROVIDER-ID
DISPLAY 'DBG2: BILLED-AMT=' WS-BILLED-AMT
DISPLAY 'DBG2: PLAN-CODE=' WS-PLAN-CODE
END-IF
*--- Level 3: Detailed trace ---*
IF WS-DEBUG-LEVEL >= 3
DISPLAY 'DBG3: FEE-SCHED LOOKUP RESULT='
WS-FEE-RESULT
DISPLAY 'DBG3: COPAY=' WS-COPAY-AMT
DISPLAY 'DBG3: DEDUCTIBLE=' WS-DEDUCT-AMT
END-IF
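Read the debug level from a control card (SYSIN) so you can change it without recompiling. A JCL PARM value can serve the same purpose, but it reaches the program through the LINKAGE SECTION rather than through an ACCEPT statement:
1000-INITIALIZE.
*--- One-digit level on the first SYSIN card ---*
ACCEPT WS-DEBUG-LEVEL FROM SYSIN
IF WS-DEBUG-LEVEL > 0
DISPLAY 'DEBUG MODE ACTIVE - LEVEL '
WS-DEBUG-LEVEL
END-IF.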
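When the level is supplied on the JCL PARM instead, the program receives it as a parameter. The following is a minimal sketch, assuming the usual z/OS batch convention of a halfword length followed by the PARM text; the program name DBGPARM and the LS-PARM names are illustrative, not taken from the MedClaim code.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. DBGPARM.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-DEBUG-LEVEL       PIC 9(1) VALUE 0.
       LINKAGE SECTION.
      * Assumed z/OS layout: halfword length, then the PARM text.
       01  LS-PARM.
           05  LS-PARM-LENGTH   PIC S9(4) COMP.
           05  LS-PARM-TEXT     PIC X(100).
       PROCEDURE DIVISION USING LS-PARM.
       0000-MAIN.
      *    Accept the level only if a digit was actually supplied
           IF LS-PARM-LENGTH > 0
               IF LS-PARM-TEXT(1:1) IS NUMERIC
                   MOVE LS-PARM-TEXT(1:1) TO WS-DEBUG-LEVEL
               END-IF
           END-IF
           IF WS-DEBUG-LEVEL > 0
               DISPLAY 'DEBUG MODE ACTIVE - LEVEL ' WS-DEBUG-LEVEL
           END-IF
           GOBACK.
```

Running the step with something like `//STEP1 EXEC PGM=DBGPARM,PARM='2'` would select level 2. With no PARM, the level stays at 0.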
33.2.3 Displaying Hex Values
When debugging data exceptions, you often need to see the hexadecimal representation of data. Older COBOL compilers have no built-in hex display (recent Enterprise COBOL releases add FUNCTION HEX-OF), but you can use this technique:
01 WS-HEX-DISPLAY.
05 WS-HEX-TABLE.
10 FILLER PIC X(16) VALUE '0123456789ABCDEF'.
05 WS-HEX-CHARS REDEFINES WS-HEX-TABLE
PIC X(1) OCCURS 16.
05 WS-HEX-OUTPUT PIC X(80).
DISPLAY-HEX-FIELD.
*--- Display a field's hex content ---*
*--- WS-BYTE(n) is assumed to hold byte n as a number 0-255 ---*
MOVE SPACES TO WS-HEX-OUTPUT
MOVE 1 TO WS-HEX-PTR
PERFORM VARYING WS-IDX FROM 1 BY 1
UNTIL WS-IDX > WS-FIELD-LENGTH
DIVIDE WS-BYTE(WS-IDX) BY 16
GIVING WS-HIGH-NIBBLE
REMAINDER WS-LOW-NIBBLE
STRING WS-HEX-CHARS(WS-HIGH-NIBBLE + 1)
WS-HEX-CHARS(WS-LOW-NIBBLE + 1)
DELIMITED BY SIZE
INTO WS-HEX-OUTPUT
WITH POINTER WS-HEX-PTR
END-PERFORM
DISPLAY 'HEX: ' WS-HEX-OUTPUT.
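As a self-contained illustration of the same idea, the sketch below converts each byte of a field into two hex digits. It is a variation on the routine above, not a copy of it: the program name HEXDEMO, the sample field contents, and the use of FUNCTION ORD to obtain each byte's numeric value are choices made here for illustration.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. HEXDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical suspect field; contents chosen for the demo.
       01  WS-SUSPECT-FIELD     PIC X(5)  VALUE '12 AB'.
       01  WS-HEX-DIGITS        PIC X(16) VALUE '0123456789ABCDEF'.
       01  WS-HEX-OUTPUT        PIC X(10) VALUE SPACES.
       01  WS-IDX               PIC 9(4) COMP.
       01  WS-OUT-POS           PIC 9(4) COMP.
       01  WS-BYTE-VALUE        PIC 9(4) COMP.
       01  WS-HIGH-NIBBLE       PIC 9(4) COMP.
       01  WS-LOW-NIBBLE        PIC 9(4) COMP.
       PROCEDURE DIVISION.
       0000-MAIN.
           MOVE 1 TO WS-OUT-POS
           PERFORM VARYING WS-IDX FROM 1 BY 1
                   UNTIL WS-IDX > LENGTH OF WS-SUSPECT-FIELD
      *        ORD is 1-based, so subtract 1 to get the byte value
               COMPUTE WS-BYTE-VALUE =
                   FUNCTION ORD(WS-SUSPECT-FIELD(WS-IDX:1)) - 1
      *        Split the byte into its two 4-bit halves
               DIVIDE WS-BYTE-VALUE BY 16
                   GIVING WS-HIGH-NIBBLE
                   REMAINDER WS-LOW-NIBBLE
               MOVE WS-HEX-DIGITS(WS-HIGH-NIBBLE + 1:1)
                   TO WS-HEX-OUTPUT(WS-OUT-POS:1)
               MOVE WS-HEX-DIGITS(WS-LOW-NIBBLE + 1:1)
                   TO WS-HEX-OUTPUT(WS-OUT-POS + 1:1)
               ADD 2 TO WS-OUT-POS
           END-PERFORM
           DISPLAY 'HEX: ' WS-HEX-OUTPUT
           GOBACK.
```

On an EBCDIC system the output should read `HEX: F1F240C1C2`; on an ASCII platform the same characters print as `3132204142`. Using reference modification instead of STRING avoids having to manage a separate pointer field.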
For production emergencies, a simpler approach is to display the raw data and examine it in the job log:
DISPLAY 'RAW DATA: >' WS-SUSPECT-FIELD '<'
The angle brackets delimit the field so you can spot trailing spaces, low values, or other unexpected characters in the SYSOUT.
💡 Pro Tip: When adding temporary DISPLAY statements for debugging, prefix them with a unique tag like *DBG* so you can easily find and remove them later:
DISPLAY '*DBG* BALANCE=' WS-BALANCE
After fixing the bug, search for *DBG* and remove all debug statements. Better yet, use conditional debug levels (Section 33.2.2) and leave them in permanently.
33.3 Compiler Debugging Options
The Enterprise COBOL compiler offers several options that detect bugs at compile time or runtime:
33.3.1 SSRANGE — Subscript Range Checking
SSRANGE checks that all table subscripts and indexes are within the declared bounds. Without SSRANGE, an out-of-bounds subscript silently accesses whatever memory happens to be adjacent to the table — a classic source of mysterious data corruption.
//COBOL EXEC PGM=IGYCRCTL,
// PARM='SSRANGE,TEST,OPT(0)'
With SSRANGE active:
01 WS-TABLE.
05 WS-ENTRY PIC X(10) OCCURS 100 TIMES.
...
*--- With SSRANGE, this causes a runtime error ---*
MOVE 'OVERFLOW' TO WS-ENTRY(101).
Without SSRANGE, the MOVE silently writes past the table into whatever follows it in WORKING-STORAGE, corrupting data. With SSRANGE, the program abends with a clear diagnostic message telling you which subscript was out of range.
⚠️ Warning: SSRANGE adds runtime overhead (approximately 5-15% depending on the program). Use it during development and testing. Some shops disable it for production, though Maria Chen argues it should stay on: "The performance cost of SSRANGE is nothing compared to the cost of a production data corruption incident."
33.3.2 CHECK — Numeric Data Checking
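In recent Enterprise COBOL releases (6.2 and later), the NUMCHECK compiler option validates numeric fields at runtime, catching bad data before it causes an S0C7 abend or a silently wrong result. (CHECK itself is a Language Environment runtime option that older releases paired with SSRANGE.)
//COBOL EXEC PGM=IGYCRCTL,
// PARM='NUMCHECK(ZON,PAC,BIN)'
NUMCHECK catches:
- Invalid digits or sign in zoned decimal fields (ZON)
- Invalid digits or sign in packed decimal fields (PAC), the classic S0C7 cause
- Binary values that exceed the field's PICTURE (BIN)
Decimal overflow and division by zero are not covered; handle those with ON SIZE ERROR.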
33.3.3 TEST — Debug Symbol Generation
The TEST option generates debugging information that interactive debuggers use to map machine code back to COBOL source lines:
//COBOL EXEC PGM=IGYCRCTL,
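// PARM='TEST(SOURCE,SEPARATE)'
TEST options:
- SOURCE: Include source statement information so the debugger can display the program text
- SEPARATE: Store debug info in a separate file (reduces load module size)
The debug information itself is written in the DWARF format, a widely used standard. Exact suboptions vary by compiler release, so check the documentation for your version.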
33.3.4 LIST and MAP — Compiler Listings
While not runtime options, LIST and MAP generate essential debugging artifacts:
- LIST: Produces an assembler listing showing the generated machine code, critical for mapping dump offsets to COBOL statements
- MAP: Shows the storage layout — where every variable lives in memory, essential for reading dumps
//COBOL EXEC PGM=IGYCRCTL,
// PARM='LIST,MAP,OFFSET,XREF'
MAP output (partial example):
DATA BASE DISPL DEFINITION
DIVISION LOC
------- ---- ----- ----------
WS-CLAIM-ID BL=01 000 PIC X(15)
WS-CLAIM-AMT BL=01 00F PIC S9(9)V99 COMP-3
WS-CLAIM-STATUS BL=01 015 PIC X(2)
WS-PROVIDER-ID BL=01 017 PIC X(10)
The DISPL (displacement) column tells you exactly where each variable lives relative to the base register. When a dump shows data at a specific offset, the MAP listing lets you identify which variable it is.
33.3.5 OPTIMIZE vs. Debugging
Optimization and debugging are at odds. The compiler's optimizer reorders instructions, eliminates dead code, and consolidates calculations — all of which make it harder to map dump offsets to source lines.
Development/Test: OPT(0),TEST(SOURCE),SSRANGE
Production (normal): OPT(2),NOTEST,NOSSRANGE
Production (debugging): OPT(0),TEST(SOURCE),LIST,MAP
📊 Compiler Options Quick Reference:
| Option | Purpose | Overhead | When to Use |
|---|---|---|---|
| SSRANGE | Subscript bounds checking | 5-15% | Development, test, ideally production |
| CHECK | Numeric validation | 10-20% | Development, test |
| TEST | Debug symbols | Compile-time only | Whenever interactive debugging is needed |
| LIST | Assembler listing | Compile-time only | Always (for dump analysis) |
| MAP | Storage map | Compile-time only | Always (for dump analysis) |
| OFFSET | Statement offsets | Compile-time only | Always (for dump analysis) |
33.4 READY TRACE and RESET TRACE
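Older compilers such as OS/VS COBOL provide built-in tracing through the READY TRACE and RESET TRACE statements. Enterprise COBOL still accepts the syntax but the statements have no effect at run time, so check your compiler's documentation before relying on them: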
READY TRACE.
*--- From here, COBOL traces every paragraph entered ---*
PERFORM 2000-PROCESS-RECORD.
PERFORM 3000-UPDATE-DATABASE.
RESET TRACE.
*--- Tracing stops ---*
READY TRACE causes the runtime to display the name of every paragraph or section as it is entered. This produces voluminous output but is invaluable when you need to understand the actual execution path.
Sample output:
READY TRACE
2000-PROCESS-RECORD
2100-VALIDATE-INPUT
2200-LOOKUP-PROVIDER
2210-CHECK-CREDENTIALS
2200-LOOKUP-PROVIDER (returned)
2300-CALCULATE-PAYMENT
2310-APPLY-DEDUCTIBLE
2320-APPLY-COPAY
2300-CALCULATE-PAYMENT (returned)
2000-PROCESS-RECORD (returned)
3000-UPDATE-DATABASE
⚠️ Warning: READY TRACE generates enormous amounts of output for programs that execute millions of PERFORMs. Use it selectively — bracket it around the suspected problem area, and if possible, add a counter to activate it only for specific records:
IF WS-RECORD-COUNT = WS-PROBLEM-RECORD-NUM
READY TRACE
END-IF
PERFORM 2000-PROCESS-RECORD
IF WS-RECORD-COUNT = WS-PROBLEM-RECORD-NUM
RESET TRACE
END-IF.
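Where READY TRACE has no effect, a debugging declarative can produce a similar paragraph trace. The following is a minimal sketch, assuming the compiler supports USE FOR DEBUGGING (Enterprise COBOL does, and additionally needs the DEBUG runtime option to activate it). The program name TRACEDMO and the stub paragraphs are illustrative.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. TRACEDMO.
       ENVIRONMENT DIVISION.
       CONFIGURATION SECTION.
      * WITH DEBUGGING MODE compiles the debugging declarative in.
       SOURCE-COMPUTER. IBM-390 WITH DEBUGGING MODE.
       PROCEDURE DIVISION.
       DECLARATIVES.
       TRACE-PROCEDURES SECTION.
           USE FOR DEBUGGING ON ALL PROCEDURES.
       TRACE-DISPLAY.
      *    DEBUG-NAME holds the procedure that triggered the trace
           DISPLAY 'TRACE: ' DEBUG-NAME.
       END DECLARATIVES.
       MAIN-LOGIC SECTION.
       0000-MAIN.
           PERFORM 2000-PROCESS-RECORD
           PERFORM 3000-UPDATE-DATABASE
           GOBACK.
       2000-PROCESS-RECORD.
           DISPLAY 'PROCESSING RECORD'.
       3000-UPDATE-DATABASE.
           DISPLAY 'UPDATING DATABASE'.
```

Removing the WITH DEBUGGING MODE clause compiles the trace out without touching the rest of the program, which makes this approach easy to leave in place between investigations.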
33.5 Abend Analysis — Reading Dumps
When a COBOL program terminates abnormally (abends), the system produces a dump — a snapshot of the program's memory at the moment of failure. Reading dumps is the most important debugging skill for mainframe developers.
33.5.1 Common Abend Codes
Every mainframe developer should have these memorized:
| Code | Name | Usual Cause |
|---|---|---|
| S0C1 | Operation Exception | Branch to non-executable storage (bad CALL, corrupted branch address) |
| S0C4 | Protection Exception | Accessing storage outside your address space (wild pointer, unallocated memory) |
| S0C7 | Data Exception | Non-numeric data in a packed decimal field (the #1 COBOL abend) |
| S0CB | Decimal Divide Exception | Division by zero or quotient too large |
| S322 | Time Limit Exceeded | Job ran too long — likely an infinite loop |
| S806 | Module Not Found | CALL to a program that does not exist in the load library |
| S80A | Not Enough Virtual Storage | REGION too small or storage leak |
| S0C5 | Addressing Exception | Similar to S0C4, accessing invalid address |
| S013 | Conflicting DCB | File attributes don't match between JCL and program |
| S001 | I/O Error | Physical read/write error on a dataset |
33.5.2 The Anatomy of S0C7
S0C7 (data exception) deserves special attention because it is by far the most common COBOL abend. It occurs when the hardware attempts a decimal arithmetic or comparison instruction on a field that does not contain valid packed decimal data.
What causes S0C7:
- Uninitialized COMP-3 (packed decimal) fields
- Alphabetic data moved to a numeric field
- File records with corrupted data
- Reading past end-of-file
- Uninitialized group-level moves that overlay numeric fields
- Incorrect REDEFINES over numeric fields
How to diagnose S0C7:
Step 1: Find the failing instruction's offset from the abend message:
IEA995I SYMPTOM DUMP:
SYSTEM COMPLETION CODE=0C7 REASON CODE=00000000
PSW AT TIME OF ERROR 078D1000 A003A2E0
ILC 06 INTC 07
ACTIVE LOAD MODULE CLM-ADJUD
OFFSET IN LOAD MODULE 00003A2E
Step 2: Find that offset in the compiler listing (compiled with LIST and OFFSET):
003A2E AP APPROVED-AMT(6,WS-BASE+180),
COPAY-AMT(5,WS-BASE+186)
Step 3: The AP (Add Packed) instruction is failing. One of the two operands — APPROVED-AMT or COPAY-AMT — contains non-packed-decimal data.
Step 4: Find these fields in the dump. The MAP listing tells you their displacement from the base register. Look at the hex data:
WS-BASE+180: 00 00 50 00 00 0C (valid packed: +50000.00)
WS-BASE+186: F1 F2 F3 F4 F5 (INVALID: this is EBCDIC "12345", not packed)
The COPAY-AMT field contains EBCDIC character data instead of packed decimal. Now you trace backward to find how that field got its value.
33.5.3 Reading a Dump — Step by Step
Here is a systematic approach to reading dumps:
Step 1: Identify the abend code and module.
COMPLETION CODE - SYSTEM=0C7 USER=0000
MODULE - CLM-ADJUD OFFSET - 00003A2E
Step 2: Get the compiler listing for the module. You need the listing from the exact compile that produced the load module in production.
Step 3: Map the offset to a COBOL statement. The OFFSET compiler option produces a table mapping offsets to source line numbers:
LINE # OFFSET
001250 003A2A
001251 003A2E <--- Our failing offset
001252 003A36
Line 1251 is:
001251 ADD WS-COPAY-AMT TO WS-APPROVED-AMT.
Step 4: Find the operands in the dump. The MAP listing gives you displacements:
WS-COPAY-AMT BL=01 186 PIC S9(7)V99 COMP-3
WS-APPROVED-AMT BL=01 180 PIC S9(9)V99 COMP-3
Step 5: Find Base Register 01 in the dump. The register save area shows:
REGS AT ENTRY TO ABENDING MODULE:
R13=00A34000 (save area)
...
BL=01 address = 00B5C000
Step 6: Read the data at that address.
00B5C000+180: 00 00 50 00 00 0C (WS-APPROVED-AMT = +50000.00)
00B5C000+186: F1 F2 F3 F4 F5 (WS-COPAY-AMT = INVALID)
Step 7: Diagnose. WS-COPAY-AMT contains X'F1F2F3F4F5' which is EBCDIC "12345". Someone moved display-format data into a COMP-3 field.
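To make the byte patterns in Step 6 concrete, the sketch below shows how +50000.00 is stored in a PIC S9(9)V99 COMP-3 field and confirms that the characters "12345" do not form valid packed decimal. It is an illustration under assumptions: the program name PACKDEMO, the REDEFINES views, and the hex conversion via FUNCTION ORD are introduced here and are not part of CLM-ADJUD.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. PACKDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-APPROVED-AMT      PIC S9(9)V99 COMP-3 VALUE 50000.00.
      * Hypothetical byte-level view of the same six bytes.
       01  WS-APPROVED-BYTES REDEFINES WS-APPROVED-AMT PIC X(6).
      * Character data sitting where packed data is expected.
       01  WS-COPAY-BYTES       PIC X(5) VALUE '12345'.
       01  WS-COPAY-AMT REDEFINES WS-COPAY-BYTES
                                PIC S9(7)V99 COMP-3.
       01  WS-HEX-DIGITS        PIC X(16) VALUE '0123456789ABCDEF'.
       01  WS-HEX-OUTPUT        PIC X(12) VALUE SPACES.
       01  WS-IDX               PIC 9(4) COMP.
       01  WS-BYTE-VALUE        PIC 9(4) COMP.
       01  WS-HIGH-NIBBLE       PIC 9(4) COMP.
       01  WS-LOW-NIBBLE        PIC 9(4) COMP.
       PROCEDURE DIVISION.
       0000-MAIN.
      *    Render the six packed bytes as twelve hex digits
           PERFORM VARYING WS-IDX FROM 1 BY 1 UNTIL WS-IDX > 6
               COMPUTE WS-BYTE-VALUE =
                   FUNCTION ORD(WS-APPROVED-BYTES(WS-IDX:1)) - 1
               DIVIDE WS-BYTE-VALUE BY 16
                   GIVING WS-HIGH-NIBBLE REMAINDER WS-LOW-NIBBLE
               MOVE WS-HEX-DIGITS(WS-HIGH-NIBBLE + 1:1)
                   TO WS-HEX-OUTPUT(WS-IDX * 2 - 1:1)
               MOVE WS-HEX-DIGITS(WS-LOW-NIBBLE + 1:1)
                   TO WS-HEX-OUTPUT(WS-IDX * 2:1)
           END-PERFORM
           DISPLAY 'APPROVED-AMT BYTES: ' WS-HEX-OUTPUT
      *    The class test inspects digits and sign without abending
           IF WS-COPAY-AMT IS NUMERIC
               DISPLAY 'COPAY-AMT IS VALID PACKED DECIMAL'
           ELSE
               DISPLAY 'COPAY-AMT IS NOT VALID PACKED DECIMAL'
           END-IF
           GOBACK.
```

The first line should show `00005000000C`: eleven digits followed by the sign nibble C. The second test should report the copay field as not valid, because the last nibble of the character data is 5, which is not a legal packed sign.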
33.5.4 CEEDUMP and Language Environment Debugging
IBM's Language Environment (LE) provides enhanced debugging through CEEDUMP. When an LE-managed program abends, CEEDUMP produces a formatted dump that includes:
- Traceback of active routines (call stack)
- Condition information (what caused the abend)
- Variable values at the point of failure
- Argument values for each routine in the traceback
CEEDUMP output:
CEE3DMP V2 R5.0: Condition processing resulted in the
unhandled condition.
Information for enclave CLM-ADJUD
Condition Information for Active Routines
Condition Information for (DSA address 2098B528)
CIB Address: 2098B3E0
Current Condition:
CEE3207S The system detected a data exception
(System Completion Code=0C7).
Location:
Program Unit: CLM-ADJUD
Entry: CLM-ADJUD
Statement: 1251
Offset: +00003A2E
Traceback:
DSA Entry Statement Offset
1 CLM-ADJUD 1251 +00003A2E
2 CLM-MAIN 450 +00001A8C
3 CEEMAIN --- +00000166
Variables for Active Routine CLM-ADJUD:
WS-CLAIM-ID = "CLM000045892 "
WS-APPROVED-AMT = +50000.00
WS-COPAY-AMT = (INVALID DATA)
WS-PROVIDER-ID = "PROV001234"
💡 Key Advantage: CEEDUMP shows you the COBOL statement number and variable values without needing to manually map offsets and read hex. This dramatically reduces debugging time.
To enable CEEDUMP, add this to your JCL:
//CEEOPTS DD *
TRAP(ON,SPIE)
TERMTHDACT(UADUMP)
/*
//CEEDUMP DD SYSOUT=*
33.6 Interactive Debugging Tools
33.6.1 IBM Debug Tool (z/OS Debugger)
IBM Debug Tool (now called IBM z/OS Debugger) allows you to set breakpoints, step through COBOL code, inspect variables, and modify values at runtime — the mainframe equivalent of GDB or Visual Studio's debugger.
Key capabilities:
- Set breakpoints at COBOL paragraph names or statement numbers
- Step through code line by line
- Display and modify variable values
- Set conditional breakpoints (break only when a condition is true)
- Monitor variables for changes
- Examine call stacks
Starting a debug session in batch:
//TEST EXEC PGM=EQANMDBG,PARM='TEST(,,,DBMDT*)'
//INSPLOG DD SYSOUT=*
//INSPPREF DD DUMMY
Common debug commands:
AT 2000-PROCESS-CLAIM (breakpoint at paragraph)
AT LINE 1251 (breakpoint at line number)
AT LINE 1251 WHEN WS-CLAIM-ID = 'CLM000045892'
(conditional breakpoint)
GO (run until next breakpoint)
STEP (execute one statement)
LIST WS-CLAIM-ID (display variable value)
LIST WS-APPROVED-AMT HEX (display in hex)
LIST (WS-TABLE(1:5)) (display first 5 table entries)
SET WS-DEBUG-LEVEL = 3 (modify a variable)
QUERY LOCATION (show current position)
LIST CALLS (show call stack)
33.6.2 Compuware Xpediter
Xpediter is a popular third-party interactive debugger used at many mainframe shops. It provides similar capabilities to IBM Debug Tool with a different interface:
Key Xpediter features:
- Source-level debugging with COBOL source display
- Before/after data inspection
- Automatic data formatting (displays COMP-3 fields in decimal)
- File I/O interception: see exactly what records are being read/written
- Abend-Aid integration for enhanced abend analysis
Xpediter workflow:
1. Compile with TEST option
2. Start Xpediter session (ISPF option or JCL PARM)
3. Set breakpoints by typing 'B' next to lines on the source display
4. Run the program; execution stops at breakpoints
5. Inspect variables in the working storage display panel
6. Step through code, watching values change
At GlobalBank, Derek Washington prefers Xpediter because of its visual interface: "I can see the source code, the data values, and the execution flow all on one screen. It's like having X-ray vision into the program."
33.6.3 Choosing the Right Debugger
| Feature | IBM Debug Tool | Xpediter |
|---|---|---|
| Cost | Separately licensed IBM product | Separate license |
| Interface | Commands or Eclipse | ISPF panels |
| COBOL support | Excellent | Excellent |
| CICS debugging | Yes | Yes |
| DB2 debugging | Yes | Yes |
| Batch debugging | Yes | Yes |
| Remote debugging | Via Eclipse | Via ISPF |
| Abend analysis | Basic | Advanced (Abend-Aid) |
33.7 Debugging CICS Programs
CICS programs present unique debugging challenges because they run inside the CICS region, not as standalone batch jobs.
33.7.1 CEDF — CICS Execution Diagnostic Facility
CEDF is CICS's built-in debugger. It intercepts every EXEC CICS command and lets you see the before/after values:
To start CEDF: type CEDF on a CICS terminal, then start your transaction.
TRANSACTION: XINQ PROGRAM: CUSTINQ1
ABOUT TO EXECUTE COMMAND:
EXEC CICS READ
FILE ('CUSTFILE')
INTO (X'00A340A0')
LENGTH (250)
RIDFLD (X'C3F0F0F0F0F1F0F0F0F1')
RESP (0)
RESP2 (0)
PRESS ENTER TO CONTINUE
After the command executes:
COMMAND EXECUTION COMPLETE
EXEC CICS READ
FILE ('CUSTFILE')
INTO (X'00A340A0')
LENGTH (250)
RIDFLD (X'C3F0F0F0F0F1F0F0F0F1')
RESP (NORMAL)
RESP2 (0)
RESPONSE: NORMAL
PRESS ENTER TO CONTINUE
33.7.2 EDF — Extended Diagnostic Facility
EDF extends CEDF by showing COBOL statements between EXEC CICS commands. It provides a more complete view of program execution:
STATEMENT 1250:
MOVE WS-CUST-ID TO RIDFLD-CUST-KEY
ABOUT TO EXECUTE COMMAND:
EXEC CICS READ FILE('CUSTFILE') ...
STATEMENT 1255:
IF WS-CICS-RESP = DFHRESP(NORMAL)
STATEMENT 1257:
PERFORM 2200-DISPLAY-CUSTOMER
33.7.3 CICS Debugging Best Practices
- Always check RESP codes: The equivalent of checking SQLCODE in DB2.
EXEC CICS READ
FILE('CUSTFILE')
INTO(WS-CUST-REC)
RIDFLD(WS-CUST-KEY)
RESP(WS-RESP)
RESP2(WS-RESP2)
END-EXEC
IF WS-RESP NOT = DFHRESP(NORMAL)
MOVE WS-RESP TO WS-DIAG-RESP
MOVE WS-RESP2 TO WS-DIAG-RESP2
EXEC CICS WRITEQ TD
QUEUE('CSMT')
FROM(WS-DIAG-MSG)
LENGTH(LENGTH OF WS-DIAG-MSG)
END-EXEC
END-IF.
- Write to CSMT for diagnostics: The CSMT transient data queue goes to the CICS message log, which is visible to system programmers.
- Use auxiliary trace: CICS trace captures detailed execution flow.
EXEC CICS SET TRACEDEST AUXSTATUS(DFHVALUE(AUXSTART)) END-EXEC.
- Avoid HANDLE CONDITION in new code: Use RESP/RESP2 instead. HANDLE CONDITION masks errors and makes debugging harder.
33.8 Debugging Embedded SQL
33.8.1 SQLCODE Checking
After every EXEC SQL statement, check SQLCODE:
01 WS-SQL-DIAG.
05 WS-DIAG-SQLCODE PIC S9(9) COMP.
05 WS-DIAG-SQLERRM PIC X(70).
05 WS-DIAG-STMT PIC X(30).
EXEC SQL
SELECT CUST_NAME, CUST_STATUS
INTO :WS-CUST-NAME, :WS-CUST-STATUS
FROM CUSTOMER
WHERE CUST_ID = :WS-CUST-ID
END-EXEC.
IF SQLCODE NOT = 0 AND SQLCODE NOT = +100
MOVE SQLCODE TO WS-DIAG-SQLCODE
MOVE SQLERRMC TO WS-DIAG-SQLERRM
MOVE 'CUSTOMER SELECT' TO WS-DIAG-STMT
DISPLAY 'SQL ERROR IN ' WS-DIAG-STMT
DISPLAY ' SQLCODE: ' WS-DIAG-SQLCODE
DISPLAY ' SQLERRM: ' WS-DIAG-SQLERRM
END-IF.
33.8.2 Common SQLCODEs
| SQLCODE | Meaning | Debugging Action |
|---|---|---|
| 0 | Success | — |
| +100 | Not found / end of cursor | Verify WHERE clause, check data exists |
| -180 | Invalid date/time | Check date format (YYYY-MM-DD for DB2) |
| -204 | Object not found | Table/view name or qualifier is wrong, or the object does not exist |
| -206 | Column not in table | Typo in column name |
| -305 | Null indicator needed | Column allows NULLs; add indicator variable |
| -530 | Foreign key violation | Parent row doesn't exist |
| -803 | Duplicate key | Unique index conflict |
| -811 | Multiple rows returned | SELECT INTO returned more than one row |
| -904 | Resource unavailable | Database or tablespace stopped/locked |
| -911 | Deadlock/timeout | Retry the transaction |
| -922 | Authorization failure | Missing GRANT |
33.8.3 DSNTEP2 and SPUFI
DSNTEP2 is DB2's batch SQL processor. Use it to test SQL statements outside your COBOL program:
//DSNTEP2 EXEC PGM=IKJEFT01,DYNAMNBR=20
//SYSTSPRT DD SYSOUT=*
//SYSTSIN DD *
DSN SYSTEM(DB2P)
RUN PROGRAM(DSNTEP2) PLAN(DSNTEP21)
END
/*
//SYSPRINT DD SYSOUT=*
//SYSIN DD *
SELECT CUST_NAME, CUST_STATUS
FROM CUSTOMER
WHERE CUST_ID = 'C000010001';
/*
SPUFI (SQL Processor Using File Input) is the interactive ISPF-based SQL tool. It provides the same capability through an online panel.
Both tools are invaluable for:
- Testing SQL syntax before embedding it in COBOL
- Verifying that your WHERE clause returns the expected data
- Examining the actual data in tables to understand why your program behaves unexpectedly
✅ Try It Yourself: Take a program that accesses DB2 and add comprehensive SQLCODE checking after every EXEC SQL statement. Create a diagnostic paragraph that logs SQLCODE, SQLERRMC, and the statement identifier to SYSOUT. Test with a deliberately wrong table name to verify your error handling works.
33.9 Common COBOL Bugs and Their Symptoms
Experience teaches patterns. Here are the bugs that James Okafor sees most often at MedClaim:
33.9.1 Uninitialized Fields
Symptom: S0C7 on first calculation, or incorrect results that change with every run.
Cause: WORKING-STORAGE fields not initialized with VALUE clauses or explicit MOVEs.
*--- BUG: WS-TOTAL never initialized ---*
01 WS-TOTAL PIC S9(9)V99 COMP-3.
...
ADD WS-AMOUNT TO WS-TOTAL. *> S0C7 if uninit
*--- FIX: Initialize in declaration or in code ---*
01 WS-TOTAL PIC S9(9)V99 COMP-3 VALUE ZEROS.
33.9.2 Off-by-One in Table Processing
Symptom: Last record not processed, or S0C4/S0C7 from processing one record past the end.
Cause: Loop boundary error.
*--- BUG: Processes one too many ---*
PERFORM VARYING WS-IDX FROM 1 BY 1
UNTIL WS-IDX > 101
MOVE WS-TABLE-ENTRY(WS-IDX) TO ...
END-PERFORM.
*--- WS-TABLE only has 100 entries! ---*
*--- FIX: Stop at the populated count, never past the table size ---*
PERFORM VARYING WS-IDX FROM 1 BY 1
UNTIL WS-IDX > WS-TABLE-COUNT
OR WS-IDX > 100
MOVE WS-TABLE-ENTRY(WS-IDX) TO ...
END-PERFORM.
33.9.3 MOVE with Truncation
Symptom: Data silently truncated, incorrect values in output.
Cause: Moving a larger field to a smaller one without checking.
01 WS-FULL-NAME PIC X(50).
01 WS-SHORT-NAME PIC X(20).
*--- BUG: Silently truncates names longer than 20 ---*
MOVE WS-FULL-NAME TO WS-SHORT-NAME.
*--- Detection: compare the trimmed length with the target size ---*
IF FUNCTION LENGTH(
FUNCTION TRIM(WS-FULL-NAME TRAILING)) > 20
DISPLAY 'WARNING: NAME TRUNCATED: '
WS-FULL-NAME
END-IF.
33.9.4 Numeric Field Mismatch
Symptom: S0C7 or incorrect arithmetic results.
Cause: Moving data between incompatible numeric types.
*--- BUG: Moving display numeric to COMP-3 via group move ---*
01 WS-INPUT-REC.
05 IN-AMOUNT PIC 9(7)V99. *> Display numeric
01 WS-WORK-REC.
05 WK-AMOUNT PIC S9(7)V99 COMP-3. *> Packed
*--- This is a GROUP move — no conversion! ---*
MOVE WS-INPUT-REC TO WS-WORK-REC. *> BUG!
*--- FIX: Move at the elementary level ---*
MOVE IN-AMOUNT TO WK-AMOUNT. *> Proper conversion
🔴 Critical Concept: Group moves are character moves — they copy bytes without any numeric conversion. Elementary moves between numeric fields perform the appropriate conversion (display to packed, packed to binary, etc.). This distinction is the #1 source of S0C7 abends in COBOL programs.
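The difference can be observed safely by testing the target field with IS NUMERIC after each kind of move instead of doing arithmetic on it. The sketch below reuses the record layouts from the example above; the program name GRPMOVE, the sample value, and the reporting paragraph are illustrative additions.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. GRPMOVE.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-INPUT-REC.
           05  IN-AMOUNT        PIC 9(7)V99 VALUE 1234.56.
       01  WS-WORK-REC.
           05  WK-AMOUNT        PIC S9(7)V99 COMP-3 VALUE 0.
       PROCEDURE DIVISION.
       0000-MAIN.
      *    Group move: bytes are copied as characters, unconverted
           MOVE WS-INPUT-REC TO WS-WORK-REC
           DISPLAY 'AFTER GROUP MOVE:'
           PERFORM 1000-REPORT-VALIDITY
      *    Elementary move: display numeric is converted to packed
           MOVE IN-AMOUNT TO WK-AMOUNT
           DISPLAY 'AFTER ELEMENTARY MOVE:'
           PERFORM 1000-REPORT-VALIDITY
           GOBACK.
       1000-REPORT-VALIDITY.
      *    Test the field before any arithmetic touches it
           IF WK-AMOUNT IS NUMERIC
               DISPLAY '  WK-AMOUNT IS VALID PACKED DECIMAL'
           ELSE
               DISPLAY '  WK-AMOUNT IS NOT VALID PACKED DECIMAL'
           END-IF.
```

After the group move, the five-byte packed field holds the first five display characters of IN-AMOUNT, so the class test should report it as not valid. After the elementary move, it should report valid.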
33.9.5 Missing Period in EVALUATE/IF
Symptom: Wrong code path executed, unexpected fall-through.
Cause: In older-style COBOL without explicit scope terminators, a missing period changes the meaning of the code.
*--- BUG (old style): Missing period ---*
IF WS-STATUS = 'A'
PERFORM PROCESS-ACTIVE
IF WS-STATUS = 'I'
PERFORM PROCESS-INACTIVE.
*--- The second IF is INSIDE the first IF's
*--- true branch! PROCESS-INACTIVE only runs
*--- when status is 'A' AND 'I' (never).
*--- FIX: Use END-IF scope terminators ---*
IF WS-STATUS = 'A'
PERFORM PROCESS-ACTIVE
END-IF
IF WS-STATUS = 'I'
PERFORM PROCESS-INACTIVE
END-IF.
33.9.6 File Status Not Checked
Symptom: Processing wrong data, infinite loop, mysterious S0C7.
Cause: Continuing to process after a failed file operation.
*--- BUG: No status check after READ ---*
READ CUSTOMER-FILE INTO WS-CUST-REC.
PERFORM PROCESS-CUSTOMER.
*--- If READ failed, WS-CUST-REC is stale! ---*
*--- FIX: Check file status ---*
READ CUSTOMER-FILE INTO WS-CUST-REC
IF WS-FILE-STATUS = '00'
PERFORM PROCESS-CUSTOMER
ELSE IF WS-FILE-STATUS = '10'
SET END-OF-FILE TO TRUE
ELSE
DISPLAY 'READ ERROR: ' WS-FILE-STATUS
PERFORM ABEND-HANDLER
END-IF.
33.9.7 PERFORM THRU with Fall-Through
Symptom: Code executing that should not be.
Cause: Adding new paragraphs between PERFORM THRU boundaries.
*--- Original code ---*
PERFORM 2000-START THRU 2000-EXIT.
2000-START.
...
2000-EXIT.
EXIT.
*--- Someone later adds: ---*
2000-START.
...
2050-NEW-PARAGRAPH.
...
2000-EXIT.
EXIT.
*--- 2050-NEW-PARAGRAPH now executes inside the
*--- PERFORM THRU, which may not be intended! ---*
💡 Prevention: Avoid PERFORM THRU when possible. PERFORM a single paragraph or section instead. If THRU is required, clearly document the boundaries.
33.10 GlobalBank Case Study: Debugging a S0C7 in BAL-CALC
Let us walk through a real-world debugging scenario at GlobalBank.
The Problem: The nightly BAL-CALC (balance calculation) batch job abends with S0C7 after processing 47,293 records successfully. It has been running without issues for three years.
Step 1: Read the abend message.
IEA995I SYMPTOM DUMP:
SYSTEM COMPLETION CODE=0C7
ACTIVE LOAD MODULE BAL-CALC
OFFSET 0001A3F0
Step 2: Maria Chen pulls up the compiler listing and finds offset 0001A3F0:
LINE OFFSET SOURCE
003847 01A3E8 COMPUTE WS-NET-BALANCE =
003848 01A3F0 WS-GROSS-BALANCE - WS-HOLD-AMOUNT
Line 3848: the COMPUTE is failing. Either WS-GROSS-BALANCE or WS-HOLD-AMOUNT contains non-numeric data.
Step 3: Check the CEEDUMP for variable values.
Variables at Statement 3848:
WS-ACCT-NUMBER = "ACCT00047294"
WS-GROSS-BALANCE = +1234567.89
WS-HOLD-AMOUNT = (INVALID DATA - hex: 40404040404040)
WS-HOLD-AMOUNT contains X'40' (EBCDIC spaces). A COMP-3 field full of spaces is not valid packed decimal.
Step 4: Trace where WS-HOLD-AMOUNT gets its value.
Maria searches the source code and finds:
003820 MOVE ACCT-HOLD-AMT TO WS-HOLD-AMOUNT.
ACCT-HOLD-AMT comes from the account master file record. The record for ACCT00047294 must have spaces in the hold amount field.
Step 5: Verify the data.
//VERIFY EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN DD *
PRINT INFILE(ACCTMAST) -
FROMKEY(ACCT00047294) -
COUNT(1) CHARACTER
/*
Output confirms: the ACCT-HOLD-AMT field contains spaces.
Step 6: Find the root cause.
Maria checks the change log: three weeks ago, a data conversion program migrated accounts from an old system. That program used spaces as "no hold amount" instead of packed zeros. The conversion program had a bug:
*--- Bug in conversion program ---*
IF OLD-HOLD-AMT = SPACES
CONTINUE *> Should have moved zeros!
ELSE
MOVE OLD-HOLD-AMT TO NEW-HOLD-AMT
END-IF.
Step 7: Fix the immediate problem.
*--- Add validation before the calculation ---*
IF ACCT-HOLD-AMT IS NOT NUMERIC
MOVE ZEROS TO WS-HOLD-AMOUNT
ADD 1 TO WS-INVALID-DATA-COUNT
IF WS-DEBUG-LEVEL >= 2
DISPLAY 'DBG2: NON-NUMERIC HOLD-AMT FOR '
WS-ACCT-NUMBER
' — DEFAULTED TO ZERO'
END-IF
ELSE
MOVE ACCT-HOLD-AMT TO WS-HOLD-AMOUNT
END-IF.
Step 8: Fix the root cause.
Write a cleanup program to fix all accounts with spaces in the hold amount:
*--- Data cleanup program ---*
PERFORM UNTIL END-OF-ACCT-FILE
READ ACCT-MASTER INTO WS-ACCT-REC
AT END SET END-OF-ACCT-FILE TO TRUE
END-READ
IF NOT END-OF-ACCT-FILE
IF ACCT-HOLD-AMT IS NOT NUMERIC
MOVE ZEROS TO ACCT-HOLD-AMT
REWRITE ACCT-RECORD FROM WS-ACCT-REC
ADD 1 TO WS-FIX-COUNT
END-IF
END-IF
END-PERFORM
DISPLAY 'RECORDS FIXED: ' WS-FIX-COUNT.
Step 9: Prevent recurrence. Add the IS NUMERIC test to BAL-CALC permanently, and add data quality validation to all conversion programs.
⚠️ Defensive Programming Lesson: The bug was not in BAL-CALC. BAL-CALC had run correctly for three years. The bug was in a data conversion program written three weeks earlier. Maria's fix does two things: corrects the data and adds defensive validation to BAL-CALC. "Never trust input data," she says. "Not even if it comes from your own system."
33.11 MedClaim Case Study: Finding Why Claims Are Miscalculated
The Problem: Claims for procedure code "99214" are being paid at 80% of the contracted rate instead of 100%. Sarah Kim, the business analyst, noticed the discrepancy in a weekly report.
Step 1: James Okafor examines the adjudication logic.
The claim adjudication program CLM-ADJUD calculates the approved amount based on a coverage table:
EVALUATE CLM-PROC-CATEGORY
WHEN 'PREVENTIVE'
MOVE 1.00 TO WS-COVERAGE-PCT
WHEN 'DIAGNOSTIC'
MOVE 0.80 TO WS-COVERAGE-PCT
WHEN 'SURGICAL'
MOVE 0.70 TO WS-COVERAGE-PCT
WHEN OTHER
MOVE 0.80 TO WS-COVERAGE-PCT
END-EVALUATE.
COMPUTE WS-APPROVED-AMT =
WS-ALLOWED-AMT * WS-COVERAGE-PCT.
Procedure 99214 (office visit, established patient, moderate complexity) should be "PREVENTIVE" but is being categorized as something else.
Step 2: Add DISPLAY debugging.
IF WS-DEBUG-LEVEL >= 2
DISPLAY 'DBG2: CLAIM=' WS-CLAIM-ID
DISPLAY 'DBG2: PROC-CODE=' CLM-PROC-CODE
DISPLAY 'DBG2: PROC-CATEGORY=>'
CLM-PROC-CATEGORY '<'
DISPLAY 'DBG2: COVERAGE-PCT=' WS-COVERAGE-PCT
END-IF
Notice the angle brackets around CLM-PROC-CATEGORY. They are deliberate: delimiters make trailing spaces and other invisible characters show up in the output.
Step 3: Run with debug level 2 for a test claim.
Output:
DBG2: CLAIM=CLM000098765
DBG2: PROC-CODE=99214
DBG2: PROC-CATEGORY=>PREVENTIVE <
DBG2: COVERAGE-PCT=0.80
The category is PREVENTIVE — with a trailing space! The EVALUATE is comparing against 'PREVENTIVE' (10 characters) but the field contains 'PREVENTIVE ' (with a trailing space making it 11 characters).
Wait — in COBOL, that should not matter. COBOL pads shorter operands with spaces during comparison. Let James look more carefully...
Step 4: Deeper investigation.
DISPLAY 'CATEGORY LENGTH='
FUNCTION LENGTH(CLM-PROC-CATEGORY)
DISPLAY 'CATEGORY HEX='
* ... hex display routine ...
Output:
CATEGORY LENGTH=11
CATEGORY HEX=D7D9C5E5C5D5E3C9E5C500
The last byte is X'00' — a null character! The field is 'PREVENTIVE\0'. The null character causes the EVALUATE to fall through to WHEN OTHER.
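The difference between a trailing space and a trailing null can be reproduced in isolation. The following is a minimal sketch, not the CLM-ADJUD code: the program name NULLCMP and the field WS-CATEGORY are hypothetical, and the null byte is planted with a hex literal rather than arriving from MQ.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. NULLCMP.
      *--- Hypothetical demo: space padding vs. a null byte ---*
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-CATEGORY             PIC X(11).
       PROCEDURE DIVISION.
       0000-MAIN.
      *--- Case 1: MOVE pads the 11th byte with a space ---*
           MOVE 'PREVENTIVE' TO WS-CATEGORY
           PERFORM 1000-CLASSIFY
      *--- Case 2: overwrite the 11th byte with X'00' ---*
           MOVE X'00' TO WS-CATEGORY(11:1)
           PERFORM 1000-CLASSIFY
      *--- Case 3: clean the field, then classify again ---*
           INSPECT WS-CATEGORY
               REPLACING ALL LOW-VALUES BY SPACES
           PERFORM 1000-CLASSIFY
           STOP RUN.
       1000-CLASSIFY.
      *--- The literal is space-padded to 11 bytes to compare ---*
           EVALUATE WS-CATEGORY
               WHEN 'PREVENTIVE'
                   DISPLAY 'MATCHED PREVENTIVE'
               WHEN OTHER
                   DISPLAY 'FELL THROUGH TO OTHER'
           END-EVALUATE.
```

Cases 1 and 3 match because the comparison pads the shorter literal with spaces. Case 2 falls through to WHEN OTHER because X'00' is not equal to a space.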
Step 5: Trace the source of the null.
CLM-PROC-CATEGORY comes from a Java-to-COBOL interface via MQ. The Java program sends fields terminated or padded with null bytes (the C string convention, and also what a zero-initialized Java byte array contains), but COBOL expects space-padded fields.
Step 6: Fix.
Add an INSPECT to clean the data:
*--- Clean null characters from MQ input fields ---*
INSPECT CLM-PROC-CATEGORY
REPLACING ALL LOW-VALUES BY SPACES.
Or better, add a reusable paragraph:
8000-CLEAN-INPUT-FIELDS.
INSPECT CLM-PROC-CODE
REPLACING ALL LOW-VALUES BY SPACES
INSPECT CLM-PROC-CATEGORY
REPLACING ALL LOW-VALUES BY SPACES
INSPECT CLM-DIAG-CODE
REPLACING ALL LOW-VALUES BY SPACES
INSPECT CLM-MODIFIER
REPLACING ALL LOW-VALUES BY SPACES.
🧪 The Human Factor in Debugging: This bug existed because two teams (Java and COBOL) had different assumptions about string representation. The Java side terminated and filled its message fields with null bytes; COBOL uses fixed-length, space-padded fields. Neither team was "wrong"; they were working in different paradigms. The fix requires understanding both worlds. James Okafor's ability to debug this comes from his experience working across the interface boundary.
33.12 Production Debugging — Limited Access
In production, you typically cannot run interactive debuggers or add DISPLAY statements. You must work with what is available:
33.12.1 Reading Existing Logs
Check the job log (SYSOUT), CICS logs (CSMT, CESE), and DB2 diagnostic logs for clues. Many shops have application-level logging that writes to a dataset or DB2 table.
33.12.2 Analyzing Dumps After the Fact
Request that operations save the dump (SYS1.DUMPxx) when a production abend occurs. Use IPCS (Interactive Problem Control System) to format and analyze the dump offline:
VERBEXIT LEDATA 'CEEDUMP COMP(COBOL)'
33.12.3 Reproducing in Test
The gold standard: reproduce the problem in a test environment.
- Identify the failing input data (from the dump or logs)
- Copy or recreate that data in the test environment
- Run the program with debugging options enabled (SSRANGE, CHECK, TEST)
- Step through with an interactive debugger
*--- Create a test harness that feeds specific data ---*
01 WS-TEST-MODE PIC X(1) VALUE 'N'.
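1000-INITIALIZE.
*--- Read the test-mode flag from SYSIN. A JCL PARM would
*--- instead arrive through the LINKAGE SECTION. ---*
ACCEPT WS-TEST-MODE FROM SYSIN.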
IF WS-TEST-MODE = 'Y'
PERFORM 1100-SETUP-TEST-DATA
ELSE
PERFORM 1200-NORMAL-INIT
END-IF.
1100-SETUP-TEST-DATA.
*--- Hardcode the failing scenario ---*
MOVE 'CLM000045892' TO WS-CLAIM-ID
MOVE '99214' TO CLM-PROC-CODE
MOVE 'PREVENTIVE' TO CLM-PROC-CATEGORY
MOVE X'00' TO CLM-PROC-CATEGORY(11:1)
*--- Now the program processes exactly the failing case ---*
33.12.4 Post-Mortem with SMF Records
IBM's System Management Facilities (SMF) records capture detailed execution data. SMF Type 30 records contain program execution statistics. SMF Type 101 records contain DB2 accounting data, and Type 102 records carry DB2 performance trace data. These can help you understand what happened during a production failure without needing to reproduce it.
33.13 Building a Debugging Toolkit
Every experienced COBOL developer maintains a personal toolkit of debugging aids. Here is a starter kit:
33.13.1 Standard Debug Copybook
*================================================================*
* COPYBOOK: DEBUGCTL *
* PURPOSE: Standard debugging control fields *
*================================================================*
01 WS-DEBUG-CONTROL.
05 WS-DEBUG-LEVEL PIC 9(1) VALUE 0.
05 WS-DEBUG-START-REC PIC 9(9) VALUE ZEROS.
05 WS-DEBUG-END-REC PIC 9(9) VALUE 999999999.
05 WS-DEBUG-PARA-NAME PIC X(30).
05 WS-DEBUG-MSG PIC X(120).
05 WS-DEBUG-TIMESTAMP PIC X(26).
33.13.2 Standard Diagnostic Paragraph
8888-DEBUG-DISPLAY.
IF WS-DEBUG-LEVEL > 0
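*--- CURRENT-DATE returns 21 characters: the first 16 are
*--- YYYYMMDDHHMMSSss, the last 5 are the UTC offset ---*
MOVE FUNCTION CURRENT-DATE
TO WS-DEBUG-TIMESTAMP
DISPLAY WS-DEBUG-TIMESTAMP(1:16) ' '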
'L' WS-DEBUG-LEVEL ' '
WS-DEBUG-PARA-NAME ': '
WS-DEBUG-MSG
END-IF.
33.13.3 SQL Diagnostic Paragraph
8800-SQL-DIAGNOSTIC.
DISPLAY '*** SQL ERROR ***'
DISPLAY ' SQLCODE: ' SQLCODE
DISPLAY ' SQLERRM: ' SQLERRMC
DISPLAY ' STATEMENT: ' WS-DEBUG-PARA-NAME
DISPLAY ' SQLERRD(3):' SQLERRD(3)
IF SQLCODE = -911
DISPLAY ' DEADLOCK/TIMEOUT — RETRY NEEDED'
END-IF
IF SQLCODE = -803
DISPLAY ' DUPLICATE KEY'
END-IF
IF SQLCODE = -904
DISPLAY ' RESOURCE UNAVAILABLE'
END-IF.
33.13.4 Hex Dump Utility Paragraph
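01 WS-HEX-WORK.
05 WS-HEX-CHARS PIC X(16)
VALUE '0123456789ABCDEF'.
05 WS-HEX-TBL REDEFINES WS-HEX-CHARS
PIC X(1) OCCURS 16.
*--- Caller-supplied input: at most 66 bytes, because each
*--- byte takes 3 positions in the 200-byte output line ---*
05 WS-HEX-INPUT PIC X(66).
05 WS-HEX-LEN PIC 9(3) COMP.
05 WS-HEX-IDX PIC 9(3) COMP.
05 WS-HEX-OUT PIC X(200).
05 WS-HEX-PTR PIC 9(3).
05 WS-HEX-BYTE PIC 9(3) COMP.
05 WS-HEX-HIGH PIC 9(3) COMP.
05 WS-HEX-LOW PIC 9(3) COMP.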
8700-DISPLAY-HEX.
*--- Call with WS-HEX-INPUT and WS-HEX-LEN set ---*
MOVE 1 TO WS-HEX-PTR
MOVE SPACES TO WS-HEX-OUT
PERFORM VARYING WS-HEX-IDX FROM 1 BY 1
UNTIL WS-HEX-IDX > WS-HEX-LEN
MOVE FUNCTION ORD(
WS-HEX-INPUT(WS-HEX-IDX:1))
TO WS-HEX-BYTE
SUBTRACT 1 FROM WS-HEX-BYTE
DIVIDE WS-HEX-BYTE BY 16
GIVING WS-HEX-HIGH
REMAINDER WS-HEX-LOW
ADD 1 TO WS-HEX-HIGH
ADD 1 TO WS-HEX-LOW
MOVE WS-HEX-TBL(WS-HEX-HIGH)
TO WS-HEX-OUT(WS-HEX-PTR:1)
ADD 1 TO WS-HEX-PTR
MOVE WS-HEX-TBL(WS-HEX-LOW)
TO WS-HEX-OUT(WS-HEX-PTR:1)
ADD 1 TO WS-HEX-PTR
MOVE ' ' TO WS-HEX-OUT(WS-HEX-PTR:1)
ADD 1 TO WS-HEX-PTR
END-PERFORM
DISPLAY 'HEX: ' WS-HEX-OUT.
33.14 Debugging Performance Problems
Not all bugs cause abends or incorrect results. Some bugs cause performance degradation — programs that produce correct output but take far longer than they should. These are among the hardest bugs to diagnose because there is no error message pointing to the problem.
33.14.1 Identifying Performance Bottlenecks
The first step is measurement. On z/OS, several tools provide performance data:
- SMF records: Type 30 records capture CPU time, elapsed time, and I/O counts for each job step
- DB2 accounting reports: Show SQL execution statistics, lock waits, and buffer pool hit rates
- CICS monitoring: Transaction response times, CPU usage, and wait times
- RMF (Resource Measurement Facility): System-wide resource utilization
For application-level profiling, add timing instrumentation:
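01 WS-TIMING.
05 WS-TIME-NOW.
10 WS-TIME-HH PIC 9(2).
10 WS-TIME-MM PIC 9(2).
10 WS-TIME-SS PIC 9(2).
10 WS-TIME-HS PIC 9(2).
05 WS-START-HS PIC S9(9) COMP.
05 WS-END-HS PIC S9(9) COMP.
05 WS-ELAPSED-HS PIC S9(9) COMP.
05 WS-ELAPSED-DISP PIC Z(8)9.
PERFORM-WITH-TIMING.
*--- ACCEPT FROM TIME returns HHMMSSss (hundredths of a
*--- second), so convert to hundredths since midnight ---*
ACCEPT WS-TIME-NOW FROM TIME
COMPUTE WS-START-HS =
((WS-TIME-HH * 60 + WS-TIME-MM) * 60
+ WS-TIME-SS) * 100 + WS-TIME-HS
PERFORM 2000-PROCESS-CLAIMS
ACCEPT WS-TIME-NOW FROM TIME
COMPUTE WS-END-HS =
((WS-TIME-HH * 60 + WS-TIME-MM) * 60
+ WS-TIME-SS) * 100 + WS-TIME-HS
COMPUTE WS-ELAPSED-HS =
WS-END-HS - WS-START-HS
*--- Adjust if the run crossed midnight ---*
IF WS-ELAPSED-HS < 0
ADD 8640000 TO WS-ELAPSED-HS
END-IF
MOVE WS-ELAPSED-HS TO WS-ELAPSED-DISP
DISPLAY 'ELAPSED (1/100 SEC): '
WS-ELAPSED-DISP.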
33.14.2 Common Performance Bugs in COBOL
The Accidental Full Table Scan:
*--- BUG: Missing WHERE clause qualification ---*
EXEC SQL
SELECT CUST_NAME INTO :WS-CUST-NAME
FROM CUSTOMER
WHERE CUST_STATUS = 'A'
END-EXEC.
*--- This returns ALL active customers (millions)
*--- and DB2 returns -811 (multiple rows).
*--- Even if it doesn't error, it scans the whole table.
The N+1 Query Problem:
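*--- BUG: One SQL call per customer in a loop ---*
PERFORM VARYING WS-IDX FROM 1 BY 1
UNTIL WS-IDX > WS-CUST-COUNT
*--- Host variables cannot be subscripted, so copy the
*--- table entry to a scalar (WS-CUST-ID) first ---*
MOVE WS-CUST-TABLE(WS-IDX) TO WS-CUST-ID
EXEC SQL
SELECT ACCT_BALANCE
INTO :WS-BALANCE
FROM ACCOUNT
WHERE CUST_ID = :WS-CUST-ID
END-EXEC
END-PERFORM.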
*--- FIX: Use a cursor with JOIN or array fetch ---*
EXEC SQL DECLARE BALANCE-CURSOR CURSOR FOR
SELECT C.CUST_ID, A.ACCT_BALANCE
FROM CUSTOMER C
JOIN ACCOUNT A ON C.CUST_ID = A.CUST_ID
WHERE C.CUST_STATUS = 'A'
ORDER BY C.CUST_ID
END-EXEC.
The Unnecessary SORT:
*--- BUG: Sorting in COBOL when DB2 can do it ---*
EXEC SQL DECLARE C1 CURSOR FOR
SELECT * FROM TRANSACTIONS
WHERE ACCT_ID = :WS-ACCT
END-EXEC.
*--- Fetch all into table, then SORT table ---*
*--- FIX: Add ORDER BY to the SQL ---*
EXEC SQL DECLARE C1 CURSOR FOR
SELECT * FROM TRANSACTIONS
WHERE ACCT_ID = :WS-ACCT
ORDER BY TRANS_DATE DESC
END-EXEC.
The Inefficient Table Search:
*--- BUG: Linear scan of a large table, with no early exit ---*
PERFORM VARYING WS-IDX FROM 1 BY 1
UNTIL WS-IDX > WS-TABLE-SIZE
IF WS-TABLE-ENTRY(WS-IDX) = WS-SEARCH-VALUE
MOVE WS-IDX TO WS-FOUND-IDX
END-IF
END-PERFORM.
*--- FIX: Use SEARCH ALL (binary search). The table must be
*--- sorted and declared with ASCENDING KEY IS WS-TABLE-KEY
*--- and INDEXED BY an index (here called WS-TBL-IX) ---*
SEARCH ALL WS-TABLE-ENTRY
AT END
MOVE ZERO TO WS-FOUND-IDX
WHEN WS-TABLE-KEY(WS-TBL-IX) = WS-SEARCH-VALUE
SET WS-FOUND-IDX TO WS-TBL-IX
END-SEARCH.
33.14.3 Using DB2 EXPLAIN
The DB2 EXPLAIN statement shows you the access path DB2 will use for a query. This is invaluable for understanding why a query is slow:
//EXPLAIN EXEC PGM=IKJEFT01
//SYSTSPRT DD SYSOUT=*
//SYSTSIN DD *
DSN SYSTEM(DB2P)
RUN PROGRAM(DSNTEP2) PLAN(DSNTEP21)
END
/*
//SYSIN DD *
EXPLAIN ALL SET QUERYNO = 1 FOR
SELECT C.CUST_NAME, A.ACCT_BALANCE
FROM CUSTOMER C
JOIN ACCOUNT A ON C.CUST_ID = A.CUST_ID
WHERE C.CUST_ID = 'C000010001';
/*
The EXPLAIN output (in the PLAN_TABLE) tells you whether DB2 is using an index scan or a table space scan, and which join method it chose (nested loop, merge scan, or hybrid join). If you see a table space scan (ACCESSTYPE = 'R') on a large table, you likely need an index on the WHERE clause columns.
📊 Performance Debugging Rule of Thumb: When a program is slow, the problem is almost always I/O — too many SQL calls, too many file reads, or SQL calls that trigger table scans. CPU-bound performance problems are rare in COBOL. Start by counting the number of SQL calls and file reads, then check whether each one is using an index.
33.15 Debugging Batch Programs with Large Data Volumes
Batch programs that process millions of records present unique debugging challenges. You cannot step through millions of iterations in a debugger, and adding DISPLAY statements to every record generates unmanageable output.
33.15.1 The Targeted Debug Window
Process normally until you reach the problem area, then activate debugging:
01 WS-DEBUG-WINDOW.
05 WS-DEBUG-START PIC 9(9) VALUE 47290.
05 WS-DEBUG-END PIC 9(9) VALUE 47300.
05 WS-CURRENT-REC PIC 9(9) VALUE ZEROS.
2000-PROCESS-LOOP.
PERFORM UNTIL END-OF-FILE
READ INPUT-FILE INTO WS-INPUT-REC
AT END SET END-OF-FILE TO TRUE
END-READ
IF NOT END-OF-FILE
ADD 1 TO WS-CURRENT-REC
*--- Activate debugging only in the window ---*
IF WS-CURRENT-REC >= WS-DEBUG-START
AND WS-CURRENT-REC <= WS-DEBUG-END
MOVE 3 TO WS-DEBUG-LEVEL
ELSE
MOVE 0 TO WS-DEBUG-LEVEL
END-IF
PERFORM 2100-PROCESS-ONE-RECORD
END-IF
END-PERFORM.
This produces detailed debug output only for records 47,290 through 47,300 — a window around the known failure point.
33.15.2 The Binary Search Debug Strategy
When you do not know which record causes the failure, use a binary search approach:
- Run the program with a checkpoint every 100,000 records
- The program abends between records 3,200,000 and 3,300,000
- Re-run from the checkpoint with checkpoints every 10,000 records
- It abends between 3,240,000 and 3,250,000
- Re-run with debug output for records 3,240,000 to 3,250,000
- The exact failing record is identified
Each pass shrinks the search range by a constant factor (here, ten), so the number of reruns grows with the logarithm of the file size rather than with the file size itself.
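The same narrowing idea can be taken to a strict bisection, halving the range on every rerun. The sketch below only simulates the process: the failing record number 3,247,113, the file size of 5,000,000, and all field names are hypothetical, and the test "does a run over records WS-LOW through WS-MID abend?" is replaced by a simple comparison.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. BISECT.
      *--- Hypothetical simulation of bisecting on a bad record ---*
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-LOW                  PIC 9(9) VALUE 1.
       01  WS-HIGH                 PIC 9(9) VALUE 5000000.
       01  WS-MID                  PIC 9(9) VALUE 0.
       01  WS-BAD-REC              PIC 9(9) VALUE 3247113.
       01  WS-RUNS                 PIC 9(3) VALUE 0.
       PROCEDURE DIVISION.
       0000-MAIN.
      *--- Invariant: the bad record lies in WS-LOW..WS-HIGH ---*
           PERFORM UNTIL WS-LOW >= WS-HIGH
      *--- Integer midpoint; the fraction is truncated ---*
               COMPUTE WS-MID = (WS-LOW + WS-HIGH) / 2
               ADD 1 TO WS-RUNS
      *--- Stands in for: rerun WS-LOW..WS-MID, did it abend? ---*
               IF WS-BAD-REC <= WS-MID
                   MOVE WS-MID TO WS-HIGH
               ELSE
                   COMPUTE WS-LOW = WS-MID + 1
               END-IF
           END-PERFORM
           DISPLAY 'FAILING RECORD: ' WS-LOW
           DISPLAY 'RERUNS NEEDED:  ' WS-RUNS
           STOP RUN.
```

In practice each "rerun" is a restart from a checkpoint, so fewer, coarser passes (as in the factor-of-ten procedure) are often cheaper than a strict bisection even though they narrow the range in uneven steps.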
33.15.3 The Canary Record
Insert a known-bad record into your test data at a specific position. Verify that your error handling catches it:
*--- Insert a canary at position 500 in test data ---*
*--- Record 500 has spaces in the amount field ---*
*--- If the program processes past 500 without
*--- logging an error, your validation is broken ---*
This technique verifies that your defensive programming actually works before you encounter real bad data in production.
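As a rough illustration of the canary idea, the sketch below uses a four-entry in-memory table instead of a 500-record test file. The program name, the table contents, and the canary position (entry 3) are all hypothetical.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. CANARY.
      *--- Hypothetical demo: does validation catch the canary? ---*
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      *--- Four test amounts; entry 3 (spaces) is the canary ---*
       01  WS-TEST-DATA.
           05  FILLER              PIC X(7) VALUE '0012550'.
           05  FILLER              PIC X(7) VALUE '0000999'.
           05  FILLER              PIC X(7) VALUE SPACES.
           05  FILLER              PIC X(7) VALUE '0100000'.
       01  WS-TEST-TABLE REDEFINES WS-TEST-DATA.
           05  WS-TEST-AMT         PIC 9(5)V99 OCCURS 4 TIMES.
       01  WS-IDX                  PIC 9(2) VALUE 0.
       01  WS-CANARY-POS           PIC 9(2) VALUE 3.
       01  WS-CANARY-CAUGHT        PIC X    VALUE 'N'.
       01  WS-TOTAL                PIC 9(7)V99 VALUE 0.
       PROCEDURE DIVISION.
       0000-MAIN.
           PERFORM VARYING WS-IDX FROM 1 BY 1
                   UNTIL WS-IDX > 4
      *--- Validate before any arithmetic touches the field ---*
               IF WS-TEST-AMT(WS-IDX) IS NOT NUMERIC
                   DISPLAY 'REJECTED RECORD ' WS-IDX
                   IF WS-IDX = WS-CANARY-POS
                       MOVE 'Y' TO WS-CANARY-CAUGHT
                   END-IF
               ELSE
                   ADD WS-TEST-AMT(WS-IDX) TO WS-TOTAL
               END-IF
           END-PERFORM
      *--- The test passes only if the canary was rejected ---*
           IF WS-CANARY-CAUGHT = 'Y'
               DISPLAY 'CANARY CAUGHT: VALIDATION WORKS'
           ELSE
               DISPLAY 'CANARY MISSED: VALIDATION IS BROKEN'
           END-IF
           STOP RUN.
```

If the IS NOT NUMERIC test were removed, the ADD would operate on a field of spaces, which is exactly the S0C7 scenario the canary is meant to expose in test rather than in production.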
33.16 Debugging Multi-Program Systems
Production mainframe systems often involve chains of programs communicating through files, databases, queues, or CALL interfaces. A bug in one program may manifest as incorrect behavior in a downstream program.
33.16.1 The Interface Verification Pattern
At every program interface (file handoff, MQ message, shared database table), verify that the data conforms to the expected format:
1500-VERIFY-INPUT-INTERFACE.
*--- Verify record count matches header ---*
IF WS-HEADER-COUNT NOT = WS-ACTUAL-COUNT
DISPLAY '*** INTERFACE ERROR ***'
DISPLAY ' EXPECTED: ' WS-HEADER-COUNT
DISPLAY ' ACTUAL: ' WS-ACTUAL-COUNT
PERFORM 9999-ABEND
END-IF
*--- Verify control totals match ---*
IF WS-HEADER-TOTAL NOT = WS-ACTUAL-TOTAL
DISPLAY '*** CONTROL TOTAL MISMATCH ***'
DISPLAY ' EXPECTED: ' WS-HEADER-TOTAL
DISPLAY ' ACTUAL: ' WS-ACTUAL-TOTAL
PERFORM 9999-ABEND
END-IF
*--- Verify no null contamination ---*
*--- TALLYING adds to the counter, so reset it first ---*
MOVE ZERO TO WS-NULL-COUNT
INSPECT WS-FIRST-RECORD
TALLYING WS-NULL-COUNT
FOR ALL LOW-VALUES
IF WS-NULL-COUNT > 0
DISPLAY '*** NULL CHARACTERS IN INPUT ***'
DISPLAY ' COUNT: ' WS-NULL-COUNT
END-IF.
33.16.2 The Breadcrumb Trail
When data flows through multiple programs, add a processing trail that records which programs have touched each record:
01 WS-BREADCRUMB.
05 BC-PROGRAM-NAME PIC X(8).
05 BC-TIMESTAMP PIC X(26).
05 BC-ACTION PIC X(20).
*--- Each program adds its breadcrumb ---*
2500-ADD-BREADCRUMB.
MOVE 'CLMADJ01' TO BC-PROGRAM-NAME
*--- Let DB2 supply the timestamp: FUNCTION CURRENT-DATE
*--- returns 21 characters, not a DB2 timestamp string ---*
MOVE 'ADJUDICATED' TO BC-ACTION
EXEC SQL
INSERT INTO PROCESSING_TRAIL
(CLAIM_ID, PROGRAM_NAME, PROCESS_TIME,
ACTION)
VALUES
(:WS-CLAIM-ID, :BC-PROGRAM-NAME,
CURRENT TIMESTAMP, :BC-ACTION)
END-EXEC.
When a downstream program encounters unexpected data, the breadcrumb trail shows exactly which programs processed the record and when. This is invaluable for narrowing down which program introduced the problem.
🧪 The Human Factor: James Okafor estimates that 40% of his debugging time is spent on interface issues — data format mismatches between programs written by different teams. "The bug is always at the boundary," he says. "Two programs can each be individually correct and still produce incorrect results when connected, because they make different assumptions about the data format."
33.17 Debugging COBOL Called Programs (Subprograms)
Many COBOL applications consist of a main program that CALLs multiple subprograms. Debugging across CALL boundaries presents additional challenges.
33.17.1 Verifying CALL Parameters
A frequent source of bugs is mismatched parameters between the calling program and the called program. The COBOL compiler does not validate parameter layouts across separate compilation units.
*--- Calling program passes 3 parameters ---*
CALL 'CALCMOD' USING WS-INPUT-REC
WS-RESULT
WS-ERROR-CODE.
*--- Called program (CALCMOD) expects 3 parameters ---*
PROCEDURE DIVISION USING LS-INPUT-REC
LS-RESULT
LS-ERROR-CODE.
If the field sizes do not match between the calling and called program, data corruption occurs silently. For example, if the calling program defines WS-INPUT-REC as PIC X(100) but the called program defines LS-INPUT-REC as PIC X(150), the called program will read 50 bytes beyond the caller's allocated storage.
Debug technique: Add a length verification at the start of every called program:
LINKAGE SECTION.
01 LS-INPUT-REC.
05 LS-EYECATCHER PIC X(8).
05 LS-DATA-LENGTH PIC 9(5).
05 LS-DATA PIC X(92).
PROCEDURE DIVISION USING LS-INPUT-REC ...
1000-VERIFY-INTERFACE.
IF LS-EYECATCHER NOT = 'CALCMOD1'
DISPLAY '*** INTERFACE ERROR ***'
DISPLAY ' EXPECTED EYECATCHER: CALCMOD1'
DISPLAY ' RECEIVED: ' LS-EYECATCHER
PERFORM 9999-ABEND
END-IF
IF LS-DATA-LENGTH NOT = 92
DISPLAY '*** DATA LENGTH MISMATCH ***'
DISPLAY ' EXPECTED: 92'
DISPLAY ' RECEIVED: ' LS-DATA-LENGTH
PERFORM 9999-ABEND
END-IF.
The eyecatcher pattern ensures that the calling program is passing the correct parameter structure. This is especially valuable when a subprogram is called by multiple different main programs.
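The caller's side of the eyecatcher handshake might look like the sketch below. It is an illustration under stated assumptions: CALCMOD is written here as a contained (nested) program so the example is self-contained, it returns a hypothetical error code (10 or 20) instead of abending, and the driver name EYEDEMO is invented.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. EYEDEMO.
      *--- Hypothetical caller that fills in the eyecatcher ---*
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-INPUT-REC.
           05  WS-EYECATCHER       PIC X(8).
           05  WS-DATA-LENGTH      PIC 9(5).
           05  WS-DATA             PIC X(92).
       01  WS-ERROR-CODE           PIC 9(2) VALUE 0.
       PROCEDURE DIVISION.
       0000-MAIN.
      *--- Correct structure: eyecatcher and length both set ---*
           MOVE 'CALCMOD1' TO WS-EYECATCHER
           MOVE FUNCTION LENGTH(WS-DATA) TO WS-DATA-LENGTH
           MOVE SPACES TO WS-DATA
           CALL 'CALCMOD' USING WS-INPUT-REC WS-ERROR-CODE
           DISPLAY 'GOOD CALL, ERROR CODE: ' WS-ERROR-CODE
      *--- Simulate a caller passing the wrong structure ---*
           MOVE 'WRONGREC' TO WS-EYECATCHER
           CALL 'CALCMOD' USING WS-INPUT-REC WS-ERROR-CODE
           DISPLAY 'BAD CALL,  ERROR CODE: ' WS-ERROR-CODE
           STOP RUN.

       IDENTIFICATION DIVISION.
       PROGRAM-ID. CALCMOD.
       DATA DIVISION.
       LINKAGE SECTION.
       01  LS-INPUT-REC.
           05  LS-EYECATCHER       PIC X(8).
           05  LS-DATA-LENGTH      PIC 9(5).
           05  LS-DATA             PIC X(92).
       01  LS-ERROR-CODE           PIC 9(2).
       PROCEDURE DIVISION USING LS-INPUT-REC LS-ERROR-CODE.
       1000-VERIFY-INTERFACE.
      *--- 10 = bad eyecatcher, 20 = bad length (made-up codes) ---*
           MOVE 0 TO LS-ERROR-CODE
           IF LS-EYECATCHER NOT = 'CALCMOD1'
               MOVE 10 TO LS-ERROR-CODE
           END-IF
           IF LS-DATA-LENGTH NOT = 92
               MOVE 20 TO LS-ERROR-CODE
           END-IF
           GOBACK.
       END PROGRAM CALCMOD.
       END PROGRAM EYEDEMO.
```

Setting the length with FUNCTION LENGTH rather than a hardcoded 92 means the caller reports what it actually allocated, which is the value the called program needs to check.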
33.17.2 Debugging Static vs. Dynamic Calls
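COBOL supports both static calls (resolved at link time) and dynamic calls (resolved at runtime). On z/OS, a CALL with a literal name is static only when the program is compiled with NODYNAM; with DYNAM it becomes dynamic. A CALL through a data item is always dynamic:
*--- Static call (literal name, NODYNAM): linked into the load module ---*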
CALL 'CALCMOD' USING ...
*--- Dynamic call — loaded at runtime from a library ---*
MOVE 'CALCMOD' TO WS-PROGRAM-NAME
CALL WS-PROGRAM-NAME USING ...
For dynamic calls, the S806 abend (module not found) is common. Debugging checklist for S806:
1. Is the module name spelled correctly? (Check for trailing spaces)
2. Is the load library included in the STEPLIB/JOBLIB DD?
3. Was the module compiled and link-edited successfully?
4. Is the module in a library that the program has access to?
5. For CICS, is the program defined in the CSD?
*--- Defensive dynamic CALL ---*
MOVE 'CALCMOD ' TO WS-PROGRAM-NAME
CALL WS-PROGRAM-NAME USING WS-INPUT-REC
WS-RESULT
WS-ERROR-CODE
ON EXCEPTION
DISPLAY '*** CALL FAILED: '
WS-PROGRAM-NAME
DISPLAY '*** CHECK STEPLIB/JOBLIB'
PERFORM 9999-ABEND
END-CALL.
The ON EXCEPTION clause catches the S806 condition and allows the program to produce a meaningful error message rather than a cryptic abend.
33.17.3 Debugging CANCEL and Re-CALL
When a program CALLs a subprogram, the subprogram's WORKING-STORAGE is initialized once, on the first CALL. Subsequent CALLs see the values left behind by the previous call, and this holds for static and dynamic calls alike. If you need the subprogram to start fresh, you can CANCEL it first (which works only for dynamically called subprograms), declare it with the INITIAL attribute, or design it with an explicit initialization parameter.
*--- Subprogram retains state between calls ---*
CALL 'COUNTER' USING WS-COUNT. *> Returns 1
CALL 'COUNTER' USING WS-COUNT. *> Returns 2
CALL 'COUNTER' USING WS-COUNT. *> Returns 3
*--- Reset by canceling and re-calling (dynamic calls only) ---*
CANCEL 'COUNTER'.
CALL 'COUNTER' USING WS-COUNT. *> Returns 1 again
Failing to understand this behavior is a common source of bugs in programs that call subprograms repeatedly with the expectation that each call starts fresh.
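The retained-state behavior, and one way to opt out of it, can be seen side by side in the sketch below. It is a hedged illustration: COUNTER is written as a contained program so the example is self-contained, and FRESHCTR (a hypothetical name) shows the INITIAL attribute as an alternative to CANCEL.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. STATEDEM.
      *--- Hypothetical demo: retained vs. reinitialized state ---*
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-COUNT                PIC 9(4) VALUE 0.
       PROCEDURE DIVISION.
       0000-MAIN.
      *--- COUNTER keeps WORKING-STORAGE between calls ---*
           PERFORM 3 TIMES
               CALL 'COUNTER' USING WS-COUNT
               DISPLAY 'COUNTER  RETURNED ' WS-COUNT
           END-PERFORM
      *--- FRESHCTR is INITIAL: state is reset on every call ---*
           PERFORM 3 TIMES
               CALL 'FRESHCTR' USING WS-COUNT
               DISPLAY 'FRESHCTR RETURNED ' WS-COUNT
           END-PERFORM
           STOP RUN.

       IDENTIFICATION DIVISION.
       PROGRAM-ID. COUNTER.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-CALLS                PIC 9(4) VALUE 0.
       LINKAGE SECTION.
       01  LS-COUNT                PIC 9(4).
       PROCEDURE DIVISION USING LS-COUNT.
           ADD 1 TO WS-CALLS
           MOVE WS-CALLS TO LS-COUNT
           GOBACK.
       END PROGRAM COUNTER.

       IDENTIFICATION DIVISION.
       PROGRAM-ID. FRESHCTR IS INITIAL.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-CALLS                PIC 9(4) VALUE 0.
       LINKAGE SECTION.
       01  LS-COUNT                PIC 9(4).
       PROCEDURE DIVISION USING LS-COUNT.
           ADD 1 TO WS-CALLS
           MOVE WS-CALLS TO LS-COUNT
           GOBACK.
       END PROGRAM FRESHCTR.
       END PROGRAM STATEDEM.
```

COUNTER returns 1, 2, 3 across the three calls, while FRESHCTR returns 1 each time because its VALUE clauses are reapplied on entry. When a "fresh" subprogram unexpectedly behaves like COUNTER, retained WORKING-STORAGE is the first thing to check.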
33.18 Establishing a Debugging Culture
Debugging is not just a technical skill — it is a team practice. The best debugging happens in organizations that build a culture around learning from bugs.
33.18.1 Post-Incident Reviews
After every production incident, conduct a blameless post-mortem:
1. What happened? (Timeline of events)
2. What was the root cause?
3. How was it detected?
4. How was it resolved?
5. What prevented earlier detection?
6. What changes will prevent recurrence?
At MedClaim, James Okafor maintains a "Bug Book" — a shared document that records every significant production bug, its root cause, and the fix. New team members read the Bug Book as part of onboarding. "Every bug is a lesson," James says. "If you don't write it down, you learn the same lesson twice."
33.18.2 Defensive Coding Standards
Encoding debugging knowledge into coding standards prevents entire categories of bugs:
- All numeric fields must have VALUE clauses (prevents S0C7 from uninitialized fields)
- All file operations must check status codes (prevents processing stale data)
- Numeric data must be moved with elementary moves, never with group moves into numeric targets (prevents data corruption)
- All EVALUATE statements must have WHEN OTHER (prevents silent fall-through)
- All programs must include debug level infrastructure (enables runtime diagnostics)
- All dynamic CALL statements must use ON EXCEPTION (handles a missing module before it becomes an S806 abend)
These standards, enforced through code review, eliminate the most common classes of COBOL bugs before they ever reach test.
33.18.3 The Debug Pair
When debugging a particularly stubborn problem, pair with a colleague. The second person brings fresh eyes and challenges your assumptions. Maria Chen and James Okafor have a standing agreement: when either one has spent more than 30 minutes on a bug without progress, they call the other. "The fresh perspective is worth more than any debugger," Maria says.
33.19 Debugging Checklist
When you encounter a bug, work through this checklist systematically:
For abends:
1. Record the abend code, module name, and offset
2. Get the compiler listing (with LIST, MAP, OFFSET)
3. Map the offset to a COBOL statement
4. Identify the failing instruction (AP, CP, ZAP for S0C7; branching/addressing for S0C1/S0C4)
5. Check the operand values in the dump
6. Trace how those operands got their values
For incorrect results:
1. Identify the first point where actual output diverges from expected
2. Add DISPLAY statements (or use debugger) at key decision points upstream
3. Inspect input data: is it what you expect?
4. Check EVALUATE/IF logic: are conditions tested correctly?
5. Check numeric conversions: group moves vs. elementary moves?
6. Check for boundary conditions: first record, last record, empty file
For performance problems:
1. Check DB2 access paths (EXPLAIN)
2. Count SQL calls: is the program issuing more than expected?
3. Check for missing indexes
4. Check commit frequency
5. Look for unnecessary I/O (reading files multiple times)
6. Check for CPU-intensive operations in loops
For intermittent failures:
1. Check for timing-dependent logic
2. Check for data-dependent paths (does it only fail on certain inputs?)
3. Check for uninitialized fields (values differ by run)
4. Check for concurrent access issues (locking, deadlocks)
5. Check for environmental differences (test vs. production)
For interface failures (cross-program or cross-platform):
1. Check data encoding (EBCDIC vs. ASCII)
2. Check for null characters (X'00' from Java/C systems)
3. Check field lengths: are the sending and receiving copybooks in sync?
4. Check numeric formats: is the sender using display, packed, or binary?
5. Check byte order: big-endian (z/OS) vs. little-endian (x86)
6. Verify record counts and control totals at every handoff point
33.20 A Debugging Decision Tree
When you first encounter a problem, this decision tree helps you choose the right debugging approach:
Is the program abending?
├── YES: What is the abend code?
│ ├── S0C7 → Check numeric fields in dump
│ │ → Look for group moves, uninitialized fields
│ │ → Check input data quality
│ ├── S0C4 → Check CALL parameters
│ │ → Check table subscripts
│ │ → Look for corrupted pointers
│ ├── S0C1 → Check CALL target existence
│ │ → Check for corrupted branch addresses
│ ├── S322 → Check for infinite loops
│ │ → Check TIME parameter in JCL
│ ├── S806 → Check STEPLIB, module name spelling
│ │ → Check compile/link-edit success
│ └── Other → Consult z/OS System Codes manual
│
├── NO: Is the output incorrect?
│ ├── YES: Where does output first diverge?
│ │ ├── Input data wrong → Debug upstream program
│ │ ├── Logic error → Add DISPLAY at decision points
│ │ ├── Data conversion → Check group vs. elementary moves
│ │ └── Comparison fails → Check for hidden characters
│ │
│ └── NO: Is it a performance problem?
│ ├── YES: Measure I/O counts
│ │ ├── Too many SQL calls → Use JOIN, cursor
│ │ ├── Table scans → Add index, fix WHERE clause
│ │ ├── Lock contention → Check lock duration, ordering
│ │ └── CPU-bound → Check loops, SORT efficiency
│ │
│ └── NO: Is it an intermittent failure?
│ ├── YES: Check for:
│ │ ├── Uninitialized fields
│ │ ├── Timing-dependent logic
│ │ ├── Data-dependent paths
│ │ └── Concurrent access issues
│ │
│ └── Clarify the symptom before proceeding
This tree is not exhaustive, but it covers the most common scenarios. Print it and keep it at your desk — it saves time when you are under pressure at 3 AM.
33.21 Try It Yourself: Debug Challenge Lab
This lab exercise gives you hands-on practice with the debugging techniques from this chapter.
Challenge 1: The S0C7 Hunt
A program processes employee payroll records. It abends with S0C7 on the 847th record. You are given:
- The abend offset (from the job log)
- The compiler listing (with LIST, MAP, OFFSET)
- A hex dump of WORKING-STORAGE at the time of the abend
Your task: identify the failing statement, the corrupted field, the invalid hex data, and the root cause. (Hint: the 847th employee has a name that starts with a digit, and a group MOVE copies the name field over a numeric field.)
Challenge 2: The Silent Wrong Answer
A claim adjudication program calculates the correct approved amount for 99.8% of claims but produces $0.00 for claims with procedure code "99214" from providers who joined the network in 2024. The program does not abend.
Your task: add targeted DISPLAY debugging to trace the data flow for a failing claim, identify the root cause (a date comparison that uses display-format dates incorrectly), and apply the fix.
Challenge 3: The Intermittent Deadlock
A CICS transfer transaction works correctly 99.9% of the time but occasionally returns an error to the user. The CICS log shows SQLCODE -911.
Your task: analyze the locking pattern in the program, identify why deadlocks occur for specific account number combinations, implement resource ordering, and add deadlock retry logic.
These challenges mirror real-world debugging scenarios that you will encounter in production mainframe environments. Work through them systematically using the techniques from this chapter.
33.22 The Economics of Debugging
It may seem odd to discuss economics in a debugging chapter, but understanding the cost structure of bugs motivates the investment in prevention and detection.
33.22.1 The Cost Multiplier
A bug caught during development costs minutes to fix. The same bug caught in testing costs hours. The same bug found in production costs days or weeks when you include the investigation, the fix, the retesting, the emergency deployment, and the business impact.
At MedClaim, the null-character bug (Case Study 2) underpaid providers by $340,000 over two months. The fix took 30 minutes of coding. But the total cost included:
- 2 person-days of investigation
- 5 days of claim reprocessing
- $340,000 in supplemental payments (float cost)
- Provider relationship damage (unmeasured)
- Regulatory reporting of the error
The total cost was estimated at $45,000 — for a bug that could have been prevented by a single INSPECT statement.
33.22.2 Investment in Prevention
Based on this cost analysis, James Okafor justified a project to add defensive data validation to all of MedClaim's claim processing programs. The project cost $120,000 (six person-months of development). In the first year after implementation, it caught 847 data quality issues that would previously have caused miscalculations or abends. James estimates that the project prevented at least $500,000 in incident costs.
"Debugging is reactive. Defensive programming is proactive. We should invest more in the proactive side," James told his management. The Bug Book became the evidence that convinced them.
33.23 Chapter Summary
Debugging is a skill that improves with practice and systematic thinking. In this chapter, you learned:
- DISPLAY debugging: Strategic placement of DISPLAY statements with debug levels for controlled verbosity
- Compiler options: SSRANGE for subscript checking, CHECK for numeric validation, TEST for debugger support, LIST/MAP/OFFSET for dump analysis
- READY TRACE: Built-in paragraph-level tracing for understanding execution flow
- Abend analysis: Reading dumps to identify the failing instruction and corrupted data, with special attention to S0C7 (data exceptions)
- CEEDUMP: Language Environment's formatted dump with COBOL variable values and call stacks
- Interactive debuggers: IBM Debug Tool and Xpediter for breakpoints, stepping, and variable inspection
- CICS debugging: CEDF/EDF for intercepting EXEC CICS commands; RESP/RESP2 for programmatic error detection
- SQL debugging: SQLCODE checking, DSNTEP2/SPUFI for testing SQL, common SQLCODE meanings
- Common bugs: Uninitialized fields, group moves, off-by-one errors, missing status checks, null characters from cross-platform interfaces
- Production debugging: Working with limited access — dump analysis, log reading, and reproducing in test environments
The best debugging skill, however, is prevention. Every bug you find teaches you something about how bugs are created. Use that knowledge to write more defensive code: validate inputs, check status codes, initialize fields, use scope terminators, and test boundary conditions. The best debugger is a programmer who writes code that does not need debugging.
"I've been debugging COBOL for fifteen years. The bugs haven't changed much — it's still uninitialized fields, unchecked status codes, and group moves. What's changed is how fast I find them." — James Okafor, reflecting on his craft