In This Chapter
- 10.1 The Error Handling Philosophy
- 10.2 FILE STATUS Codes — Your First Line of Defense
- 10.3 Arithmetic Error Handlers
- 10.4 INVALID KEY for Indexed and Relative Files
- 10.5 DECLARATIVES and USE Statements
- 10.6 Return Codes and Error Propagation
- 10.7 Input Validation Techniques
- 10.8 Boundary Condition Handling
- 10.9 Graceful Degradation Patterns
- 10.10 Logging and Audit Trails
- 10.11 GlobalBank Case Study: Error Handling in TXN-PROC
- 10.12 MedClaim Case Study: Handling Malformed Claim Records
- 10.13 Defensive Coding for COMP-3 Fields
- 10.14 Environment-Specific Defensive Techniques
- 10.15 The ABEND Handler Pattern
- 10.16 Putting It All Together — A Defensive Programming Template
- 10.17 Best Practices Summary
- Chapter Summary
Chapter 10: Defensive Programming
"In production COBOL, the question is never if something will go wrong. It's when, and whether your program will handle it gracefully or bring down the batch window." — James Okafor, Team Lead, MedClaim Health Services
Every COBOL program you wrote in your introductory course probably assumed perfect inputs, abundant disk space, and cooperative file systems. You read a record, it was there. You wrote output, it worked. The AT END condition was the only exception you handled. In the real world — the world of 2.3 million daily transactions at GlobalBank, of 500,000 monthly claims at MedClaim — things go wrong constantly. Files are locked by other jobs. Records contain unexpected characters. Disk allocations fill up. Network connections to DB2 drop. Integer overflows silently truncate important calculations.
Defensive programming is the discipline of anticipating what can go wrong and handling it before it causes data corruption, an ABEND, or a silent miscalculation. It is not pessimism — it is professionalism. This chapter teaches you the specific COBOL constructs and patterns that make programs resilient: FILE STATUS codes, arithmetic error handlers, DECLARATIVES, input validation techniques, and the broader strategies of error propagation, graceful degradation, and audit logging.
This chapter is the primary home of the Defensive Programming theme that threads through the rest of this textbook. The techniques introduced here will be applied and extended in every subsequent chapter.
Think of defensive programming not as additional work layered on top of "real" programming, but as an integral part of writing correct software. In the same way that a structural engineer does not view load calculations as "extra" — they are the core of the work — a COBOL developer should view FILE STATUS checking, input validation, and error logging as intrinsic to the craft, not as overhead. The programs in this chapter may look longer than their undefended counterparts, but they are also the programs that run unattended for years without intervention, processing billions of records while their operators sleep soundly.
10.1 The Error Handling Philosophy
Before we dive into syntax, let us establish a philosophy. Defensive programming in COBOL rests on four principles:
Principle 1: Never Assume Success. Every I/O operation, every arithmetic operation that could overflow, every data field that comes from an external source — check the result. Do not assume the operation succeeded just because it usually does.
Principle 2: Fail Loudly. When something goes wrong, make sure someone knows. A program that encounters an error and silently continues is far more dangerous than one that ABENDs. Silent failures cause data corruption that may not be discovered for days or weeks.
Principle 3: Fail Safely. When an error is unrecoverable, stop processing in a way that leaves data in a consistent state. Close files. Set return codes. Write a final log entry. Do not leave partial updates.
Principle 4: Log Everything. Every error should be recorded with enough context to diagnose the problem: what happened, when, in which program, at which paragraph, processing which record. Tomorrow morning's production support team will thank you.
💡 Key Insight — Legacy != Obsolete The defensive programming patterns in COBOL predate modern concepts like exception handling in Java or Python. But they are not inferior; they are different. COBOL's explicit, procedural error handling forces developers to think through every failure mode, which is exactly what you want for systems that process millions of financial transactions or medical claims. Several modern languages have moved toward similarly explicit error handling (Go's convention of returning an error value from each call is one example) precisely because exception-based handling can obscure error flows.
The Cost of Not Being Defensive
James Okafor tells a story from his early days at MedClaim. A batch program that processed claim payments did not check FILE STATUS after writing to the payment file. One night, the output disk allocation filled up. The WRITE operation silently failed for the last 3,000 records. The program completed with a return code of zero — "success." The next morning, 3,000 healthcare providers did not receive their payments. It took two days to identify the problem, regenerate the missing payments, and restore trust. "That's when I became a fanatic about checking every single I/O operation," James says.
10.2 FILE STATUS Codes — Your First Line of Defense
The FILE STATUS clause is the single most important defensive programming feature in COBOL. It tells you whether an I/O operation succeeded and, if it failed, why. Without FILE STATUS, a failed I/O operation might silently set the record area to unpredictable values, leaving your program to process garbage data as if nothing were wrong. With FILE STATUS, you know immediately that something went wrong, exactly what went wrong (via the two-character code), and you can decide how to respond — retry, skip, log, or ABEND.
Every professional COBOL developer treats FILE STATUS as mandatory. It appears in every SELECT statement, and its value is checked after every I/O operation. If you take only one technique from this entire chapter and apply it to every program you write for the rest of your career, let it be this one.
Declaring FILE STATUS
In the FILE-CONTROL paragraph, every SELECT statement should include a FILE STATUS clause:
FILE-CONTROL.
SELECT ACCT-MASTER-FILE
ASSIGN TO ACCTMAST
ORGANIZATION IS INDEXED
ACCESS MODE IS DYNAMIC
RECORD KEY IS ACCT-NUMBER
FILE STATUS IS WS-ACCT-STATUS.
SELECT TXN-INPUT-FILE
ASSIGN TO TXNIN
FILE STATUS IS WS-TXN-STATUS.
SELECT REPORT-FILE
ASSIGN TO RPTOUT
FILE STATUS IS WS-RPT-STATUS.
The status variable must be defined in WORKING-STORAGE as a two-character field:
01 WS-ACCT-STATUS PIC XX.
01 WS-TXN-STATUS PIC XX.
01 WS-RPT-STATUS PIC XX.
The Status Code Table
After every file operation (OPEN, READ, WRITE, REWRITE, DELETE, START, CLOSE), the runtime sets the status variable to a two-character code:
| Code | Category | Meaning |
|---|---|---|
| 00 | Success | Operation completed successfully |
| 02 | Success | READ successful, duplicate alternate key exists |
| 04 | Success | READ successful, record length does not match FD |
| 05 | Success | OPEN successful, optional file not present (created) |
| 07 | Success | CLOSE with NO REWIND/REEL on non-reel device |
| 10 | End | AT END — no more records to read |
| 14 | End | Sequential READ on relative file, record number too large |
| 21 | Invalid key | Sequence error on sequential write to indexed file |
| 22 | Invalid key | Duplicate key on WRITE or REWRITE |
| 23 | Invalid key | Record not found on READ/START/DELETE |
| 24 | Invalid key | Boundary violation — file full or key out of range |
| 30 | Permanent error | I/O error (hardware/system failure) |
| 34 | Permanent error | Boundary violation — file full, sequential |
| 35 | Permanent error | OPEN failed — file not found (non-optional) |
| 37 | Permanent error | OPEN failed — file type conflict |
| 38 | Permanent error | OPEN failed — file locked by another process |
| 39 | Permanent error | OPEN failed — FD attributes conflict with actual file |
| 41 | Logic error | OPEN on already-open file |
| 42 | Logic error | CLOSE on already-closed file |
| 43 | Logic error | REWRITE/DELETE without prior READ in sequential mode |
| 44 | Logic error | REWRITE with changed record length (fixed-length file) |
| 46 | Logic error | Sequential READ without positioning (no prior READ/START) |
| 47 | Logic error | READ on file not opened for INPUT or I-O |
| 48 | Logic error | WRITE on file not opened for OUTPUT, I-O, or EXTEND |
| 49 | Logic error | REWRITE/DELETE on file not opened for I-O |
| 9x | Implementation | Vendor-specific status codes |
📊 Status Code Categories at a Glance

| First Digit | Meaning |
|-------------|---------|
| 0 | Successful completion |
| 1 | AT END condition |
| 2 | Invalid key condition |
| 3 | Permanent error |
| 4 | Logic error (programmer mistake) |
| 9 | Vendor-specific |
Checking FILE STATUS — The Pattern
Here is the fundamental pattern that every production COBOL program should follow:
2000-READ-TRANSACTION.
READ TXN-INPUT-FILE INTO WS-TXN-RECORD
EVALUATE WS-TXN-STATUS
WHEN '00'
ADD 1 TO WS-TXN-READ-COUNT
PERFORM 3000-PROCESS-TRANSACTION
WHEN '10'
SET WS-END-OF-FILE TO TRUE
WHEN OTHER
MOVE 'READ FAILED ON TXN FILE' TO WS-ERR-MSG
MOVE WS-TXN-STATUS TO WS-ERR-STATUS
PERFORM 9800-LOG-ERROR
PERFORM 9900-ABEND-PROGRAM
END-EVALUATE.
⚠️ Critical Rule Check FILE STATUS after every file operation. Not just READs — also OPENs, WRITEs, REWRITEs, DELETEs, STARTs, and CLOSEs. The OPEN check is especially important because a failed OPEN means every subsequent operation on that file will also fail, potentially in confusing ways.
88-Level Conditions for Status Codes
Using 88-level condition names makes status checking more readable:
01 WS-ACCT-STATUS PIC XX.
88 ACCT-SUCCESS VALUE '00'.
88 ACCT-SUCCESS-DUP VALUE '02'.
88 ACCT-EOF VALUE '10'.
88 ACCT-DUP-KEY VALUE '22'.
88 ACCT-NOT-FOUND VALUE '23'.
88 ACCT-FILE-FULL VALUE '24'.
88 ACCT-PERM-ERROR VALUE '30'.
88 ACCT-NOT-EXISTS VALUE '35'.
88 ACCT-LOCKED VALUE '38'.
...
2100-READ-ACCOUNT.
READ ACCT-MASTER-FILE
EVALUATE TRUE
WHEN ACCT-SUCCESS
PERFORM 2200-VALIDATE-ACCOUNT
WHEN ACCT-NOT-FOUND
PERFORM 2300-HANDLE-NOT-FOUND
WHEN ACCT-PERM-ERROR
PERFORM 9800-LOG-ERROR
PERFORM 9900-ABEND-PROGRAM
WHEN OTHER
MOVE 'UNEXPECTED ACCT STATUS' TO WS-ERR-MSG
PERFORM 9800-LOG-ERROR
PERFORM 9900-ABEND-PROGRAM
END-EVALUATE.
Checking OPEN Status — A Complete Pattern
1000-INITIALIZE.
OPEN INPUT ACCT-MASTER-FILE
IF NOT ACCT-SUCCESS
DISPLAY 'FATAL: Cannot open ACCT-MASTER-FILE'
DISPLAY ' FILE STATUS: ' WS-ACCT-STATUS
MOVE 16 TO RETURN-CODE
STOP RUN
END-IF
OPEN INPUT TXN-INPUT-FILE
IF NOT TXN-SUCCESS
DISPLAY 'FATAL: Cannot open TXN-INPUT-FILE'
DISPLAY ' FILE STATUS: ' WS-TXN-STATUS
CLOSE ACCT-MASTER-FILE
MOVE 16 TO RETURN-CODE
STOP RUN
END-IF
OPEN OUTPUT REPORT-FILE
IF NOT RPT-SUCCESS
DISPLAY 'FATAL: Cannot open REPORT-FILE'
DISPLAY ' FILE STATUS: ' WS-RPT-STATUS
CLOSE ACCT-MASTER-FILE
CLOSE TXN-INPUT-FILE
MOVE 16 TO RETURN-CODE
STOP RUN
END-IF.
Notice the cascading CLOSE operations — if the third file fails to open, we close the first two files that were opened successfully. This ensures clean resource cleanup.
Try It Yourself — FILE STATUS Exercise
Write a program that opens three files: an input file, an output file, and a report file. Implement the full OPEN checking pattern above. Then read the input file in a loop, checking status after every READ. Introduce deliberate errors (reference a non-existent file, try to read from an output file) and observe the status codes.
10.3 Arithmetic Error Handlers
COBOL provides built-in phrases for detecting arithmetic errors at the statement level.
ON SIZE ERROR
The ON SIZE ERROR phrase detects when an arithmetic result is too large for the receiving field, or when a divide-by-zero occurs:
ADD TXN-AMOUNT TO ACCT-BALANCE
ON SIZE ERROR
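* Build the message in one STRING. The sending and receiving
* fields must not overlap, and STRING does not pad the target.
MOVE SPACES TO WS-ERR-MSG
STRING 'BALANCE OVERFLOW FOR ACCT: '
DELIMITED BY SIZE
ACCT-NUMBER DELIMITED BY SIZE
INTO WS-ERR-MSG
END-STRING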
PERFORM 9800-LOG-ERROR
SET WS-TXN-REJECTED TO TRUE
NOT ON SIZE ERROR
SET WS-TXN-PROCESSED TO TRUE
END-ADD
Size error detection applies to ADD, SUBTRACT, MULTIPLY, DIVIDE, and COMPUTE:
COMPUTE WS-INTEREST =
ACCT-BALANCE * WS-INTEREST-RATE / 365
ON SIZE ERROR
DISPLAY 'Interest calculation overflow'
DISPLAY 'Balance: ' ACCT-BALANCE
DISPLAY 'Rate: ' WS-INTEREST-RATE
MOVE ZERO TO WS-INTEREST
NOT ON SIZE ERROR
CONTINUE
END-COMPUTE
⚠️ Important: SIZE ERROR Does Not Detect Truncation of Decimal Places ON SIZE ERROR only fires when the integer portion of the result exceeds the receiving field. Truncation of decimal digits is not a size error. If you COMPUTE a value of 123.456 into a PIC 9(3)V99 field, the result is 123.45 — the trailing 6 is silently truncated, and no SIZE ERROR occurs. Be aware of this limitation when designing your data items.
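The difference between truncation and a true size error is easy to see in a small experiment. The following is a minimal sketch, not code from GlobalBank or MedClaim; the program name and data names are hypothetical. It stores the same kind of value three ways: plain COMPUTE, COMPUTE with ROUNDED, and a value whose integer part does not fit.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. TRUNCDMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical names, chosen for this sketch only.
       01  WS-SOURCE          PIC 9(4)V999 VALUE 123.456.
       01  WS-RESULT          PIC 9(3)V99  VALUE ZERO.
       01  WS-SHOW            PIC ZZ9.99.
       PROCEDURE DIVISION.
       0000-MAIN.
      * Case 1: low-order digit is dropped, no SIZE ERROR.
           COMPUTE WS-RESULT = WS-SOURCE
               ON SIZE ERROR
                   DISPLAY 'CASE 1: SIZE ERROR'
               NOT ON SIZE ERROR
                   MOVE WS-RESULT TO WS-SHOW
                   DISPLAY 'CASE 1: STORED ' WS-SHOW
           END-COMPUTE
      * Case 2: ROUNDED rounds the dropped digit instead.
           COMPUTE WS-RESULT ROUNDED = WS-SOURCE
               ON SIZE ERROR
                   DISPLAY 'CASE 2: SIZE ERROR'
               NOT ON SIZE ERROR
                   MOVE WS-RESULT TO WS-SHOW
                   DISPLAY 'CASE 2: STORED ' WS-SHOW
           END-COMPUTE
      * Case 3: integer part needs four digits, SIZE ERROR fires.
           MOVE 1234.5 TO WS-SOURCE
           COMPUTE WS-RESULT = WS-SOURCE
               ON SIZE ERROR
                   DISPLAY 'CASE 3: SIZE ERROR'
               NOT ON SIZE ERROR
                   MOVE WS-RESULT TO WS-SHOW
                   DISPLAY 'CASE 3: STORED ' WS-SHOW
           END-COMPUTE
           STOP RUN.
```

Case 1 stores 123.45 and case 2 stores 123.46, both without raising SIZE ERROR; only case 3 takes the ON SIZE ERROR branch. If lost decimal digits matter to the business rule, ROUNDED or a wider receiving field is the defense, because the SIZE ERROR phrase will not report them.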
DIVIDE and REMAINDER
Division deserves special attention because of the divide-by-zero risk:
IF WS-TRANSACTION-COUNT > ZERO
DIVIDE WS-TOTAL-AMOUNT BY WS-TRANSACTION-COUNT
GIVING WS-AVERAGE-AMOUNT
REMAINDER WS-REMAINDER
ON SIZE ERROR
MOVE ZERO TO WS-AVERAGE-AMOUNT
DISPLAY 'Division overflow in average calc'
END-DIVIDE
ELSE
MOVE ZERO TO WS-AVERAGE-AMOUNT
DISPLAY 'No transactions — average is zero'
END-IF
The defensive pattern here is twofold: check for zero before dividing, and use ON SIZE ERROR as a safety net in case the non-zero divisor still produces an overflow.
ON OVERFLOW for STRING and UNSTRING
The STRING and UNSTRING statements have their own overflow conditions:
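* The pointer must start at 1; an out-of-range pointer
* raises the overflow condition before any data moves.
MOVE SPACES TO WS-FULL-NAME
MOVE 1 TO WS-STRING-PTR
STRING WS-LAST-NAME DELIMITED BY SPACES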
', ' DELIMITED BY SIZE
WS-FIRST-NAME DELIMITED BY SPACES
INTO WS-FULL-NAME
WITH POINTER WS-STRING-PTR
ON OVERFLOW
DISPLAY 'Name too long for WS-FULL-NAME'
MOVE WS-LAST-NAME TO WS-FULL-NAME
END-STRING
ON OVERFLOW fires when the receiving field is full before all source data has been moved. Without this check, data is silently truncated.
10.4 INVALID KEY for Indexed and Relative Files
For indexed and relative file operations, the INVALID KEY phrase detects key-related errors:
READ ACCT-MASTER-FILE
INVALID KEY
IF ACCT-NOT-FOUND
PERFORM 2300-HANDLE-NOT-FOUND
ELSE
MOVE 'UNEXPECTED KEY ERROR' TO WS-ERR-MSG
PERFORM 9800-LOG-ERROR
END-IF
NOT INVALID KEY
PERFORM 2200-PROCESS-ACCOUNT
END-READ
INVALID KEY applies to READ (random/dynamic), WRITE, REWRITE, DELETE, and START operations on indexed and relative files:
WRITE ACCT-RECORD
INVALID KEY
EVALUATE TRUE
WHEN ACCT-DUP-KEY
DISPLAY 'Duplicate account: ' ACCT-NUMBER
ADD 1 TO WS-DUP-COUNT
WHEN ACCT-FILE-FULL
DISPLAY 'ACCT file full — cannot add'
PERFORM 9900-ABEND-PROGRAM
WHEN OTHER
MOVE 'WRITE KEY ERROR' TO WS-ERR-MSG
PERFORM 9800-LOG-ERROR
END-EVALUATE
END-WRITE
💡 Pro Tip — FILE STATUS vs. INVALID KEY You might wonder: if I check FILE STATUS after every operation, do I also need INVALID KEY? Technically, no — FILE STATUS gives you all the information INVALID KEY does, and more. However, many shops use both as a belt-and-suspenders approach. The FILE STATUS check is the authoritative error handler, while INVALID KEY provides inline handling for expected conditions (like record-not-found during a lookup). Use whichever approach your shop's standards require, but always use FILE STATUS at minimum.
10.5 DECLARATIVES and USE Statements
DECLARATIVES provide a mechanism for associating error-handling procedures with specific files or I/O operations. They are defined at the beginning of the PROCEDURE DIVISION and are invoked automatically by the runtime when certain conditions occur.
Syntax
PROCEDURE DIVISION.
DECLARATIVES.
ACCT-FILE-ERROR SECTION.
USE AFTER STANDARD ERROR PROCEDURE ON ACCT-MASTER-FILE.
ACCT-FILE-ERROR-PARA.
DISPLAY 'I/O ERROR ON ACCT-MASTER-FILE'
DISPLAY 'FILE STATUS: ' WS-ACCT-STATUS
DISPLAY 'OPERATION: ' WS-LAST-OPERATION
PERFORM 9800-LOG-ERROR.
TXN-FILE-ERROR SECTION.
USE AFTER STANDARD ERROR PROCEDURE ON TXN-INPUT-FILE.
TXN-FILE-ERROR-PARA.
DISPLAY 'I/O ERROR ON TXN-INPUT-FILE'
DISPLAY 'FILE STATUS: ' WS-TXN-STATUS
PERFORM 9800-LOG-ERROR.
END DECLARATIVES.
MAIN-PROGRAM SECTION.
0000-MAIN.
PERFORM 1000-INITIALIZE
...
When DECLARATIVES Fire
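A USE AFTER STANDARD ERROR procedure is invoked after an unsuccessful I/O operation on a file it covers. Permanent errors (3x), logic errors (4x), and vendor-specific errors (9x) always qualify. An at-end (1x) or invalid-key (2x) condition qualifies only when the I/O statement has no AT END or INVALID KEY phrase to handle it. The procedure runs before control returns to the statement following the failed I/O operation.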
Scope of DECLARATIVES
- USE AFTER ERROR PROCEDURE ON file-name — applies to a specific file
- USE AFTER ERROR PROCEDURE ON INPUT — applies to all input files
- USE AFTER ERROR PROCEDURE ON OUTPUT — applies to all output files
- USE AFTER ERROR PROCEDURE ON I-O — applies to all I-O files
- USE AFTER ERROR PROCEDURE ON EXTEND — applies to all extend files
Practical Considerations
DECLARATIVES are powerful but have some constraints:
- They must appear at the very beginning of the PROCEDURE DIVISION, before any other sections.
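- Under COBOL-74 rules, code within DECLARATIVES could not PERFORM procedures outside DECLARATIVES. COBOL-85 and later compilers generally allow it, which is what the example above relies on when it performs 9800-LOG-ERROR. Check your compiler's documentation before depending on it.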
- They add a layer of indirection that can make control flow harder to follow.
Many modern COBOL shops prefer explicit FILE STATUS checking over DECLARATIVES because it keeps the error handling visible at the point of the I/O operation. However, DECLARATIVES are useful as a safety net — a catch-all that fires if a programmer forgets to check FILE STATUS at a particular I/O statement.
⚖️ Design Decision — DECLARATIVES vs. Inline Checking At GlobalBank, Maria Chen uses a hybrid approach: DECLARATIVES as a safety net for permanent errors, with inline FILE STATUS checking for expected conditions. "DECLARATIVES are my smoke alarm," she says. "I never want it to go off, but I'm glad it's there." James Okafor at MedClaim prefers purely inline checking: "I want every error handler visible right next to the I/O statement. No surprises."
10.6 Return Codes and Error Propagation
In batch COBOL programs, the RETURN-CODE special register communicates the program's final status to the operating system (or to a calling program):
01 WS-RETURN-CODE-VALUES.
05 RC-SUCCESS PIC S9(04) COMP VALUE +0.
05 RC-WARNING PIC S9(04) COMP VALUE +4.
05 RC-ERROR PIC S9(04) COMP VALUE +8.
05 RC-SEVERE PIC S9(04) COMP VALUE +12.
05 RC-FATAL PIC S9(04) COMP VALUE +16.
Setting RETURN-CODE
9000-TERMINATE.
EVALUATE TRUE
WHEN WS-ERROR-COUNT > WS-MAX-ERRORS
MOVE RC-SEVERE TO RETURN-CODE
WHEN WS-ERROR-COUNT > 0
MOVE RC-WARNING TO RETURN-CODE
WHEN WS-REJECT-COUNT > 0
MOVE RC-WARNING TO RETURN-CODE
WHEN OTHER
MOVE RC-SUCCESS TO RETURN-CODE
END-EVALUATE
DISPLAY 'PROGRAM COMPLETE — RC=' RETURN-CODE
DISPLAY 'RECORDS PROCESSED: ' WS-PROCESS-COUNT
DISPLAY 'RECORDS REJECTED: ' WS-REJECT-COUNT
DISPLAY 'ERRORS: ' WS-ERROR-COUNT
* CLOSE needs explicit file names; list every open file
CLOSE ACCT-MASTER-FILE
TXN-INPUT-FILE
REPORT-FILE
STOP RUN.
JCL Condition Code Checking
The RETURN-CODE value becomes the JCL condition code, which subsequent job steps can test:
//STEP02 EXEC PGM=NEXTSTEP,COND=(8,LT,STEP01)
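This says: bypass STEP02 if 8 is less than STEP01's return code. In practice, this means STEP02 runs only if STEP01 returned 8 or lower; a return code of 12 or 16 causes it to be skipped. The COND test is easy to misread because it states the condition for skipping the step, not for running it.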
📊 Industry Convention for Return Codes

| Code | Meaning | Typical Action |
|------|---------|----------------|
| 0 | Success | Continue normally |
| 4 | Warning | Continue but review |
| 8 | Error | Subsequent steps may be skipped |
| 12 | Severe error | Most subsequent steps skipped |
| 16 | Fatal error | All subsequent steps skipped |
Error Propagation in Called Programs
When a program CALLs a subprogram, the subprogram can communicate errors back through:
- RETURN-CODE — set by the subprogram, visible to the caller after the CALL returns
- Explicit parameters — a return status field passed as a parameter
- Shared working storage — via EXTERNAL data items (less common)
* In the calling program:
CALL 'VALIDATE-ACCT' USING ACCT-RECORD
WS-VALIDATE-RESULT
EVALUATE WS-VALIDATE-RESULT
WHEN 'OK'
PERFORM 3000-PROCESS-VALID-ACCT
WHEN 'NF'
PERFORM 3100-HANDLE-NOT-FOUND
WHEN 'LK'
PERFORM 3200-HANDLE-LOCKED
WHEN OTHER
MOVE 'UNKNOWN VALIDATE RESULT'
TO WS-ERR-MSG
PERFORM 9800-LOG-ERROR
END-EVALUATE
* In the called subprogram VALIDATE-ACCT:
LINKAGE SECTION.
01 LS-ACCT-RECORD.
COPY ACCT-REC.
01 LS-RESULT PIC XX.
PROCEDURE DIVISION USING LS-ACCT-RECORD
LS-RESULT.
READ ACCT-MASTER-FILE
EVALUATE WS-ACCT-STATUS
WHEN '00'
MOVE 'OK' TO LS-RESULT
WHEN '23'
MOVE 'NF' TO LS-RESULT
WHEN '38'
MOVE 'LK' TO LS-RESULT
WHEN OTHER
MOVE 'ER' TO LS-RESULT
END-EVALUATE
GOBACK.
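The example above uses an explicit parameter. The first option in the list, RETURN-CODE, can be sketched as follows. This is an illustration under stated assumptions rather than production code: the program names RCCALLER and RCCHECK, the data names, and the rule that a zero account number is invalid are all hypothetical, and the subprogram is written as a nested program only so that the example compiles as a single source file.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. RCCALLER.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical account number passed to the subprogram.
       01  WS-ACCT-NUMBER     PIC 9(06) VALUE ZERO.
       PROCEDURE DIVISION.
       0000-MAIN.
           CALL 'RCCHECK' USING WS-ACCT-NUMBER
      * RETURN-CODE now holds whatever the subprogram set.
           EVALUATE RETURN-CODE
               WHEN 0
                   DISPLAY 'ACCOUNT NUMBER ACCEPTED'
               WHEN 8
                   DISPLAY 'ACCOUNT NUMBER REJECTED'
               WHEN OTHER
                   DISPLAY 'UNEXPECTED RC: ' RETURN-CODE
           END-EVALUATE
      * RETURN-CODE still holds the subprogram's value here and
      * becomes the step's condition code unless it is reset.
           STOP RUN.

       IDENTIFICATION DIVISION.
       PROGRAM-ID. RCCHECK.
       DATA DIVISION.
       LINKAGE SECTION.
       01  LS-ACCT-NUMBER     PIC 9(06).
       PROCEDURE DIVISION USING LS-ACCT-NUMBER.
       0000-CHECK.
      * Assumed rule for this sketch: zero is not a valid account.
           IF LS-ACCT-NUMBER = ZERO
               MOVE 8 TO RETURN-CODE
           ELSE
               MOVE 0 TO RETURN-CODE
           END-IF
           GOBACK.
       END PROGRAM RCCHECK.
       END PROGRAM RCCALLER.
```

One design consequence is worth noticing: because RETURN-CODE is shared, a value left behind by a subprogram silently becomes the caller's own return code. A caller that handles the error and wants to finish cleanly should move its intended value to RETURN-CODE before ending, which is one reason many shops prefer the explicit parameter approach for anything beyond simple pass/fail.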
10.7 Input Validation Techniques
Data coming from external sources — files, user input, data feeds, EDI transmissions — should never be trusted. This is not paranoia; it is experience. James Okafor keeps a log of every data quality issue MedClaim has encountered over the past five years. The log currently has 847 entries. The most common categories are: non-numeric data in numeric fields (23%), missing required fields (19%), dates in unexpected formats (14%), amount values out of valid range (12%), and duplicate records (8%). Every one of these issues would cause a program failure or incorrect output if not caught by validation. Defensive programs validate every field before processing.
Numeric Validation
COBOL's NUMERIC class test checks whether a field contains valid numeric data:
IF WS-INPUT-AMOUNT IS NUMERIC
MOVE WS-INPUT-AMOUNT TO WS-TXN-AMOUNT
ELSE
MOVE 'NON-NUMERIC AMOUNT' TO WS-ERR-MSG
PERFORM 9800-LOG-ERROR
SET WS-RECORD-INVALID TO TRUE
END-IF
⚠️ Caution — NUMERIC Test Behavior The NUMERIC test checks if a field contains characters that are valid for the field's USAGE. For a DISPLAY numeric field (PIC 9), valid characters are 0-9 and, for signed fields, the sign encoding. For COMP-3 (packed decimal), the test checks for valid packed-decimal encoding. For alphanumeric fields (PIC X), the NUMERIC test checks if all characters are in the range 0-9. Be sure you understand which test is being applied based on the field's definition.
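For the alphanumeric case, a short experiment makes the rule concrete. The sketch below is illustrative only; the program and data names are hypothetical. It tests a PIC X(05) field holding clean digits, digits with an embedded space, and digits with a leading sign character.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. NUMTEST.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical raw field as it might arrive from a file.
       01  WS-RAW-AMOUNT      PIC X(05).
       01  WS-AMOUNT          PIC 9(05) VALUE ZERO.
       PROCEDURE DIVISION.
       0000-MAIN.
           MOVE '12345' TO WS-RAW-AMOUNT
           PERFORM 1000-CHECK-AMOUNT
      * Embedded space: not all digits, so the test fails.
           MOVE '12 45' TO WS-RAW-AMOUNT
           PERFORM 1000-CHECK-AMOUNT
      * A sign character is not a digit in a PIC X field.
           MOVE '-1234' TO WS-RAW-AMOUNT
           PERFORM 1000-CHECK-AMOUNT
           STOP RUN.
       1000-CHECK-AMOUNT.
      * Test the raw field before moving it to a numeric item.
           IF WS-RAW-AMOUNT IS NUMERIC
               MOVE WS-RAW-AMOUNT TO WS-AMOUNT
               DISPLAY 'ACCEPTED: ' WS-AMOUNT
           ELSE
               DISPLAY 'REJECTED: [' WS-RAW-AMOUNT ']'
           END-IF.
```

Only the first value passes. Testing the raw alphanumeric field, and moving it to a numeric item only after it passes, keeps invalid bytes out of every numeric field downstream.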
Range Validation
IF CLM-BILLED-AMT >= ZERO
AND CLM-BILLED-AMT <= WS-MAX-CLAIM-AMOUNT
CONTINUE
ELSE
* STRING does not pad the target; clear it first
MOVE SPACES TO WS-ERR-MSG
STRING 'CLAIM AMOUNT OUT OF RANGE: '
DELIMITED BY SIZE
CLM-ID DELIMITED BY SIZE
INTO WS-ERR-MSG
END-STRING
PERFORM 9800-LOG-ERROR
SET WS-RECORD-INVALID TO TRUE
END-IF
Date Validation
Dates are notoriously tricky. A defensive program validates not just format but logical correctness:
5000-VALIDATE-DATE.
* Assumes WS-VAL-DATE is PIC 9(8) in YYYYMMDD format
MOVE WS-VAL-DATE(1:4) TO WS-VAL-YEAR
MOVE WS-VAL-DATE(5:2) TO WS-VAL-MONTH
MOVE WS-VAL-DATE(7:2) TO WS-VAL-DAY
IF WS-VAL-YEAR < 1900 OR WS-VAL-YEAR > 2099
SET WS-DATE-INVALID TO TRUE
MOVE 'YEAR OUT OF RANGE' TO WS-VAL-ERR-MSG
ELSE IF WS-VAL-MONTH < 01 OR WS-VAL-MONTH > 12
SET WS-DATE-INVALID TO TRUE
MOVE 'MONTH OUT OF RANGE' TO WS-VAL-ERR-MSG
ELSE
PERFORM 5100-VALIDATE-DAY
END-IF.
5100-VALIDATE-DAY.
EVALUATE WS-VAL-MONTH
WHEN 01 WHEN 03 WHEN 05 WHEN 07
WHEN 08 WHEN 10 WHEN 12
IF WS-VAL-DAY < 01 OR WS-VAL-DAY > 31
SET WS-DATE-INVALID TO TRUE
END-IF
WHEN 04 WHEN 06 WHEN 09 WHEN 11
IF WS-VAL-DAY < 01 OR WS-VAL-DAY > 30
SET WS-DATE-INVALID TO TRUE
END-IF
WHEN 02
PERFORM 5200-CHECK-FEBRUARY
END-EVALUATE.
5200-CHECK-FEBRUARY.
IF FUNCTION MOD(WS-VAL-YEAR, 4) = 0
AND (FUNCTION MOD(WS-VAL-YEAR, 100) NOT = 0
OR FUNCTION MOD(WS-VAL-YEAR, 400) = 0)
IF WS-VAL-DAY < 01 OR WS-VAL-DAY > 29
SET WS-DATE-INVALID TO TRUE
END-IF
ELSE
IF WS-VAL-DAY < 01 OR WS-VAL-DAY > 28
SET WS-DATE-INVALID TO TRUE
END-IF
END-IF.
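Where the compiler supports it, the calendar rules above can also be delegated to an intrinsic function. The sketch below assumes a compiler that implements the COBOL 2002 function TEST-DATE-YYYYMMDD (GnuCOBOL does; check your own compiler's language reference before relying on it). The program name and WS-DATE-RESULT are hypothetical. The function returns 0 for a valid Gregorian date and 1, 2, or 3 when the year, month, or day is at fault.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. DATECHK.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-VAL-DATE        PIC 9(08).
      * Hypothetical field that receives the function result.
       01  WS-DATE-RESULT     PIC 9(01).
       PROCEDURE DIVISION.
       0000-MAIN.
      * 2024 is a leap year, 2023 is not.
           MOVE 20240229 TO WS-VAL-DATE
           PERFORM 5000-VALIDATE-DATE
           MOVE 20230229 TO WS-VAL-DATE
           PERFORM 5000-VALIDATE-DATE
           STOP RUN.
       5000-VALIDATE-DATE.
      * Guard first: non-numeric data must not reach the function.
           IF WS-VAL-DATE IS NOT NUMERIC
               DISPLAY WS-VAL-DATE ': NOT NUMERIC'
           ELSE
               COMPUTE WS-DATE-RESULT =
                   FUNCTION TEST-DATE-YYYYMMDD(WS-VAL-DATE)
               EVALUATE WS-DATE-RESULT
                   WHEN 0
                       DISPLAY WS-VAL-DATE ': VALID'
                   WHEN 1
                       DISPLAY WS-VAL-DATE ': BAD YEAR'
                   WHEN 2
                       DISPLAY WS-VAL-DATE ': BAD MONTH'
                   WHEN 3
                       DISPLAY WS-VAL-DATE ': BAD DAY'
                   WHEN OTHER
                       DISPLAY WS-VAL-DATE ': UNEXPECTED RESULT'
               END-EVALUATE
           END-IF.
```

The function checks calendar validity only. A business rule such as the 1900 to 2099 year range used above still needs its own explicit test, since the function accepts a much wider span of years.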
Condition-Name Validation
88-level condition names provide elegant validation for coded fields:
01 WS-INPUT-ACCT-TYPE PIC X(02).
88 VALID-ACCT-TYPE VALUE 'CH' 'SV' 'MM' 'CD'.
...
IF VALID-ACCT-TYPE
CONTINUE
ELSE
* STRING does not pad the target; clear it first
MOVE SPACES TO WS-ERR-MSG
STRING 'INVALID ACCT TYPE: '
DELIMITED BY SIZE
WS-INPUT-ACCT-TYPE
DELIMITED BY SIZE
INTO WS-ERR-MSG
END-STRING
PERFORM 9800-LOG-ERROR
SET WS-RECORD-INVALID TO TRUE
END-IF
Cross-Field Validation
Sometimes individual fields are valid but their combination is not:
* A closed account should have a zero balance
IF ACCT-CLOSED AND ACCT-CURR-BALANCE NOT = ZERO
MOVE 'CLOSED ACCT WITH NON-ZERO BALANCE'
TO WS-ERR-MSG
PERFORM 9800-LOG-ERROR
SET WS-RECORD-SUSPICIOUS TO TRUE
END-IF
* A withdrawal cannot exceed available balance
IF TXN-WITHDRAW AND
TXN-AMOUNT > ACCT-AVAIL-BALANCE
MOVE 'WITHDRAWAL EXCEEDS AVAILABLE BAL'
TO WS-ERR-MSG
PERFORM 9800-LOG-ERROR
SET WS-TXN-REJECTED TO TRUE
END-IF
Building a Validation Framework
For programs that validate many fields, a structured approach prevents the code from becoming an unreadable chain of IF statements:
01 WS-VALIDATION-RESULTS.
05 WS-VAL-ERROR-COUNT PIC 9(03) VALUE ZERO.
05 WS-VAL-ERRORS.
10 WS-VAL-ERROR OCCURS 20 TIMES.
15 WS-VAL-ERR-FIELD PIC X(30).
15 WS-VAL-ERR-MSG PIC X(50).
05 WS-VAL-OVERALL PIC X(01).
88 WS-RECORD-VALID VALUE 'V'.
88 WS-RECORD-INVALID VALUE 'I'.
...
4000-VALIDATE-CLAIM.
MOVE ZERO TO WS-VAL-ERROR-COUNT
SET WS-RECORD-VALID TO TRUE
PERFORM 4100-VAL-CLAIM-ID
PERFORM 4200-VAL-CLAIM-TYPE
PERFORM 4300-VAL-MEMBER-INFO
PERFORM 4400-VAL-PROVIDER-INFO
PERFORM 4500-VAL-FINANCIAL
PERFORM 4600-VAL-DATES
PERFORM 4700-VAL-CROSS-FIELD
IF WS-VAL-ERROR-COUNT > 0
SET WS-RECORD-INVALID TO TRUE
END-IF.
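4100-VAL-CLAIM-ID.
IF CLM-ID = SPACES OR CLM-ID = LOW-VALUES
SET WS-RECORD-INVALID TO TRUE
* Record the detail only while the 20-entry table has room
IF WS-VAL-ERROR-COUNT < 20
ADD 1 TO WS-VAL-ERROR-COUNT
MOVE 'CLM-ID'
TO WS-VAL-ERR-FIELD(WS-VAL-ERROR-COUNT)
MOVE 'CLAIM ID IS BLANK'
TO WS-VAL-ERR-MSG(WS-VAL-ERROR-COUNT)
END-IF
END-IF.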
This pattern collects all validation errors for a record rather than stopping at the first one, which makes error reporting much more useful.
Try It Yourself — Validation Gauntlet
Create a program that reads employee records and runs them through a comprehensive validation gauntlet. Each record has these fields:
- Employee ID (PIC 9(6)) — must be numeric, non-zero
- Department (PIC X(3)) — must be one of: ENG, MKT, FIN, HR, OPS
- Salary (PIC 9(6)V99) — must be between 25,000.00 and 500,000.00
- Hire date (PIC 9(8)) — must be a valid YYYYMMDD date, not in the future
- Termination date (PIC 9(8)) — if non-zero, must be after hire date
- Manager ID (PIC 9(6)) — must be different from employee ID
Collect all errors per record (do not stop at the first). Write valid records to one file and invalid records to another, with all error descriptions appended. Count the number of records with 1 error, 2 errors, 3+ errors, and display these statistics at termination.
This exercise forces you to think about cross-field validation (termination date vs. hire date, manager ID vs. employee ID) and multi-error collection, which are skills that distinguish production-quality code from classroom exercises.
Validation Performance Considerations
In a high-volume batch program processing millions of records, validation can become a performance bottleneck. Some techniques to keep validation fast:
- Use 88-level conditions instead of IF chains. Checking IF VALID-DEPT-CODE (an 88-level with all valid values) is typically faster than IF DEPT = 'ENG' OR DEPT = 'MKT' OR DEPT = 'FIN' OR ....
- Validate in decreasing likelihood of failure. If 99% of records have valid department codes but only 90% have valid dates, check dates first. This allows you to skip remaining validations sooner for invalid records (if using a stop-at-first-error strategy).
- Pre-load reference data. If validation requires checking against a reference table (e.g., valid provider NPIs), load the table into working-storage arrays during initialization rather than performing file I/O during validation.
- Consider binary search for large reference tables. If your validation reference table has thousands of entries, use SEARCH ALL (binary search) rather than SEARCH (sequential search). Binary search is O(log n) versus O(n) for sequential search.
01 WS-VALID-CODES.
05 WS-CODE-COUNT PIC 9(04) VALUE 50.
05 WS-CODE-TABLE.
10 WS-CODE-ENTRY PIC X(05)
OCCURS 50 TIMES
ASCENDING KEY WS-CODE-ENTRY
INDEXED BY WS-CODE-IDX.
...
* Binary search for valid code. SEARCH ALL positions the
* index itself, so no SET is needed before it.
SEARCH ALL WS-CODE-ENTRY
AT END
SET WS-RECORD-INVALID TO TRUE
MOVE 'INVALID CODE' TO WS-ERR-MSG
WHEN WS-CODE-ENTRY(WS-CODE-IDX) =
WS-INPUT-CODE
CONTINUE
END-SEARCH
10.8 Boundary Condition Handling
Boundary conditions are the edges of your program's expected input space — the first record, the last record, empty files, maximum-length fields, zero amounts, negative values, and so on.
Empty File Handling
2000-PROCESS-LOOP.
PERFORM 2100-READ-NEXT-TXN
IF WS-END-OF-FILE AND WS-TXN-READ-COUNT = ZERO
DISPLAY 'WARNING: Input file is empty'
MOVE RC-WARNING TO RETURN-CODE
ELSE
PERFORM UNTIL WS-END-OF-FILE
PERFORM 3000-PROCESS-TRANSACTION
PERFORM 2100-READ-NEXT-TXN
END-PERFORM
END-IF.
Table Overflow
When loading data into a table (OCCURS), always check bounds:
01 WS-ERROR-TABLE.
05 WS-MAX-ERRORS PIC 9(03) VALUE 100.
05 WS-ERROR-COUNT PIC 9(03) VALUE ZERO.
05 WS-ERRORS OCCURS 100 TIMES.
10 WS-ERR-CODE PIC X(04).
10 WS-ERR-TEXT PIC X(60).
...
ADD-ERROR-ENTRY.
IF WS-ERROR-COUNT >= WS-MAX-ERRORS
DISPLAY 'ERROR TABLE FULL — ERRORS TRUNCATED'
ELSE
ADD 1 TO WS-ERROR-COUNT
MOVE WS-NEW-ERR-CODE
TO WS-ERR-CODE(WS-ERROR-COUNT)
MOVE WS-NEW-ERR-TEXT
TO WS-ERR-TEXT(WS-ERROR-COUNT)
END-IF.
Counter Overflow
Even counters can overflow. If your program processes millions of records but your counter is PIC 9(5) (max 99,999), you have a problem:
* Safe counter increment (assumes WS-RECORD-COUNT is PIC 9(07))
IF WS-RECORD-COUNT < 9999999
ADD 1 TO WS-RECORD-COUNT
ELSE
DISPLAY 'WARNING: Record counter overflow'
DISPLAY ' Count frozen at 9999999'
END-IF
Better yet, size your counters appropriately from the start. GlobalBank's TXN-PROC uses PIC 9(09) for transaction counters, which holds up to 999,999,999, just under a billion records.
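When a counter's limit cannot be guaranteed in advance, ON SIZE ERROR on the ADD is an alternative to comparing against a hard-coded maximum. The following is a minimal sketch with hypothetical names and a deliberately undersized counter so that the overflow is easy to trigger.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. CNTRDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Deliberately undersized counter (max 99) for the demo.
       01  WS-SMALL-COUNT     PIC 9(02) VALUE ZERO.
       01  WS-LOOP-IDX        PIC 9(03) VALUE ZERO.
       01  WS-OVERFLOW-FLAG   PIC X(01) VALUE 'N'.
           88  COUNT-OVERFLOWED         VALUE 'Y'.
       PROCEDURE DIVISION.
       0000-MAIN.
      * Simulate counting 100 records.
           PERFORM VARYING WS-LOOP-IDX FROM 1 BY 1
                   UNTIL WS-LOOP-IDX > 100
               ADD 1 TO WS-SMALL-COUNT
                   ON SIZE ERROR
      * The counter keeps its previous value; warn only once.
                       IF NOT COUNT-OVERFLOWED
                           DISPLAY 'COUNTER OVERFLOW AT RECORD '
                                   WS-LOOP-IDX
                           SET COUNT-OVERFLOWED TO TRUE
                       END-IF
               END-ADD
           END-PERFORM
           DISPLAY 'FINAL COUNT: ' WS-SMALL-COUNT
           STOP RUN.
```

With the SIZE ERROR phrase present, the failed ADD leaves the counter at 99 rather than wrapping, and the flag keeps the warning from being repeated for every later record. The advantage over a literal comparison is that the check stays correct if someone later changes the counter's PICTURE.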
Try It Yourself — Boundary Testing
Write a program that demonstrates all of the boundary conditions described above:
- Create an input file with a known number of records (e.g., exactly 100).
- Process the file with a PIC 9(2) counter (max 99) and observe what happens at record 100.
- Add a table (OCCURS 20 TIMES) and try to add 25 entries.
- Create an empty file and run the program — verify your empty file handling works.
- Include a counter that will overflow with ON SIZE ERROR handling.
For each boundary, document: (a) what happens without the defensive check, and (b) what happens with it. This exercise builds the instinct to always ask "what happens at the edges?"
The Sentinel Value Pattern
A common boundary technique is the sentinel value — a special value that signals the end of data. In COBOL, HIGH-VALUES (x'FF...') and LOW-VALUES (x'00...') are natural sentinels.
When processing two sorted files in parallel (such as in the match-update pattern from Chapter 11), set the key to HIGH-VALUES when a file reaches end-of-file. This causes the exhausted file's key to compare higher than any valid key in the other file, so all remaining records from the non-exhausted file are processed naturally:
2100-READ-MASTER.
READ MASTER-FILE INTO WS-MASTER-RECORD
EVALUATE WS-MASTER-STATUS
WHEN '00'
ADD 1 TO WS-MASTER-READ
WHEN '10'
MOVE HIGH-VALUES TO WS-MASTER-KEY
SET MASTER-EOF TO TRUE
WHEN OTHER
PERFORM 9900-ABEND-PROGRAM
END-EVALUATE.
This technique eliminates the need for complex end-of-file logic in the main processing loop. The loop simply compares keys, and the sentinel values handle the exhaustion cases automatically. It is elegant and widely used in production COBOL.
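The main loop that this paragraph describes can be sketched as follows. To keep the example self-contained, two small in-memory tables stand in for the sorted master and transaction files; the keys, table names, and program name are hypothetical, and the logic assumes at most one transaction per master key.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. SENTINEL.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical in-memory stand-ins for two sorted files.
       01  WS-MASTER-DATA     PIC X(09) VALUE 'A01A03A05'.
       01  FILLER REDEFINES WS-MASTER-DATA.
           05  WS-MST-ENTRY   PIC X(03) OCCURS 3 TIMES.
       01  WS-TXN-DATA        PIC X(12) VALUE 'A02A03A07A09'.
       01  FILLER REDEFINES WS-TXN-DATA.
           05  WS-TXN-ENTRY   PIC X(03) OCCURS 4 TIMES.
       01  WS-MST-POS         PIC 9(02) VALUE ZERO.
       01  WS-TXN-POS         PIC 9(02) VALUE ZERO.
       01  WS-MASTER-KEY      PIC X(03).
       01  WS-TXN-KEY         PIC X(03).
       PROCEDURE DIVISION.
       0000-MAIN.
           PERFORM 2100-READ-MASTER
           PERFORM 2200-READ-TXN
      * One test ends the loop: both keys are at the sentinel.
           PERFORM UNTIL WS-MASTER-KEY = HIGH-VALUES
                     AND WS-TXN-KEY = HIGH-VALUES
               EVALUATE TRUE
                   WHEN WS-MASTER-KEY < WS-TXN-KEY
                       DISPLAY 'MASTER ONLY: ' WS-MASTER-KEY
                       PERFORM 2100-READ-MASTER
                   WHEN WS-MASTER-KEY > WS-TXN-KEY
                       DISPLAY 'TXN ONLY:    ' WS-TXN-KEY
                       PERFORM 2200-READ-TXN
                   WHEN OTHER
                       DISPLAY 'MATCHED:     ' WS-MASTER-KEY
                       PERFORM 2100-READ-MASTER
                       PERFORM 2200-READ-TXN
               END-EVALUATE
           END-PERFORM
           STOP RUN.
       2100-READ-MASTER.
      * Simulated READ: past the last entry, set the sentinel.
           ADD 1 TO WS-MST-POS
           IF WS-MST-POS > 3
               MOVE HIGH-VALUES TO WS-MASTER-KEY
           ELSE
               MOVE WS-MST-ENTRY(WS-MST-POS) TO WS-MASTER-KEY
           END-IF.
       2200-READ-TXN.
      * Same simulated READ for the transaction side.
           ADD 1 TO WS-TXN-POS
           IF WS-TXN-POS > 4
               MOVE HIGH-VALUES TO WS-TXN-KEY
           ELSE
               MOVE WS-TXN-ENTRY(WS-TXN-POS) TO WS-TXN-KEY
           END-IF.
```

Once the master side is exhausted, its key compares higher than every remaining transaction key, so the leftover transactions fall through the "TXN ONLY" branch with no separate end-of-file logic. In a real program the two read paragraphs would issue READ statements and check FILE STATUS as shown in 2100-READ-MASTER above.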
10.9 Graceful Degradation Patterns
Not every error should be fatal. Defensive programs distinguish between errors that must stop processing and errors that can be handled, logged, and skipped.
The Error Threshold Pattern
01 WS-ERROR-CONTROL.
05 WS-MAX-ERRORS PIC 9(05) VALUE 100.
05 WS-CONSEC-ERRORS PIC 9(03) VALUE ZERO.
05 WS-MAX-CONSEC PIC 9(03) VALUE 10.
05 WS-TOTAL-ERRORS PIC 9(05) VALUE ZERO.
...
3000-PROCESS-TRANSACTION.
PERFORM 3100-VALIDATE-TXN
IF WS-RECORD-VALID
PERFORM 3200-APPLY-TXN
MOVE ZERO TO WS-CONSEC-ERRORS
ELSE
ADD 1 TO WS-TOTAL-ERRORS
ADD 1 TO WS-CONSEC-ERRORS
PERFORM 3300-WRITE-REJECT
IF WS-TOTAL-ERRORS >= WS-MAX-ERRORS
DISPLAY 'MAX TOTAL ERRORS REACHED'
PERFORM 9900-ABEND-PROGRAM
END-IF
IF WS-CONSEC-ERRORS >= WS-MAX-CONSEC
DISPLAY 'MAX CONSECUTIVE ERRORS REACHED'
DISPLAY 'POSSIBLE SYSTEMIC DATA PROBLEM'
PERFORM 9900-ABEND-PROGRAM
END-IF
END-IF.
The consecutive error counter is particularly valuable. If ten records in a row fail validation, the problem is probably systemic (wrong file, corrupt data, schema mismatch) rather than individual bad records. Continuing would be pointless.
The Bypass Pattern
When a non-critical operation fails, log it and continue:
3500-GENERATE-NOTIFICATION.
* Notification is nice-to-have, not critical
CALL 'NOTIFY-SVC' USING WS-NOTIFY-PARMS
WS-NOTIFY-STATUS
IF WS-NOTIFY-STATUS NOT = 'OK'
DISPLAY 'WARNING: Notification failed'
DISPLAY ' Continuing processing'
ADD 1 TO WS-WARNING-COUNT
END-IF.
The Retry Pattern
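For transient errors (file lock, network timeout), retrying can resolve the issue. The status value that reports a lock conflict varies by compiler and access method, so confirm it for your platform; the example keeps this chapter's convention of '38':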
01 WS-RETRY-CONTROL.
05 WS-MAX-RETRIES PIC 9(02) VALUE 03.
05 WS-RETRY-COUNT PIC 9(02) VALUE ZERO.
05 WS-RETRY-NEEDED PIC X(01).
88 RETRY-YES VALUE 'Y'.
88 RETRY-NO VALUE 'N'.
...
2500-READ-WITH-RETRY.
MOVE ZERO TO WS-RETRY-COUNT
SET RETRY-YES TO TRUE
PERFORM UNTIL RETRY-NO
READ ACCT-MASTER-FILE
EVALUATE WS-ACCT-STATUS
WHEN '00'
SET RETRY-NO TO TRUE
PERFORM 2600-PROCESS-RECORD
WHEN '38'
ADD 1 TO WS-RETRY-COUNT
IF WS-RETRY-COUNT >= WS-MAX-RETRIES
DISPLAY 'FILE LOCKED AFTER '
WS-MAX-RETRIES ' RETRIES'
SET RETRY-NO TO TRUE
PERFORM 9800-LOG-ERROR
ELSE
DISPLAY 'FILE LOCKED - RETRY '
WS-RETRY-COUNT
END-IF
WHEN OTHER
SET RETRY-NO TO TRUE
PERFORM 9800-LOG-ERROR
END-EVALUATE
END-PERFORM.
🧪 Real-World Note In true z/OS batch COBOL, you cannot easily implement a timed wait between retries (there is no SLEEP verb). The retry loop above retries immediately. In CICS (online) COBOL, you have access to EXEC CICS DELAY, which enables proper backoff-and-retry patterns. We will cover CICS error handling in Chapter 30.
The Dead Letter Queue Pattern
Borrowed from messaging systems, the dead letter queue pattern captures records that cannot be processed for any reason — not just validation failures, but also records that cause unexpected errors during processing. The dead letter file contains the original record plus diagnostic information:
01 WS-DEAD-LETTER.
05 DL-ORIGINAL-RECORD PIC X(500).
05 DL-ERROR-TYPE PIC X(10).
88 DL-VALIDATION VALUE 'VALIDATION'.
88 DL-PROCESSING VALUE 'PROCESSING'.
88 DL-REFERENCE VALUE 'REFERENCE '.
88 DL-SYSTEM VALUE 'SYSTEM '.
05 DL-ERROR-CODE PIC X(04).
05 DL-ERROR-MESSAGE PIC X(80).
05 DL-PROGRAM-NAME PIC X(08).
05 DL-PARAGRAPH-NAME PIC X(30).
05 DL-TIMESTAMP PIC X(26).
05 DL-RECORD-NUMBER PIC 9(09).
05 DL-RETRY-COUNT PIC 9(02) VALUE ZERO.
The DL-ERROR-TYPE field distinguishes between different classes of failure: VALIDATION errors are data quality issues, PROCESSING errors are logic failures, REFERENCE errors are referential integrity failures (e.g., the account does not exist), and SYSTEM errors are I/O failures or resource issues.
A companion program can later read the dead letter file, attempt to reprocess correctable records, and generate a human-readable exception report for records that require manual intervention. This pattern is particularly useful in MedClaim's environment, where claims that fail processing should not be silently discarded — they represent revenue.
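As an illustration of how such a record might be filled in at the point of failure, here is a minimal sketch. The program name DLQDEMO, the paragraph 3400-BUILD-DEAD-LETTER, the error code 'R001', and the sample claim data are all hypothetical; a real program would WRITE the record to the dead letter file rather than DISPLAY it.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. DLQDEMO.
      * Hypothetical demo: builds one dead letter record.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-INPUT-RECORD         PIC X(500)
               VALUE 'CLM0000123 MEMBER=M998877'.
       01  WS-DEAD-LETTER.
           05  DL-ORIGINAL-RECORD  PIC X(500).
           05  DL-ERROR-TYPE       PIC X(10).
               88  DL-VALIDATION   VALUE 'VALIDATION'.
               88  DL-REFERENCE    VALUE 'REFERENCE '.
           05  DL-ERROR-CODE       PIC X(04).
           05  DL-ERROR-MESSAGE    PIC X(80).
           05  DL-PROGRAM-NAME     PIC X(08).
           05  DL-PARAGRAPH-NAME   PIC X(30).
           05  DL-TIMESTAMP        PIC X(26).
           05  DL-RECORD-NUMBER    PIC 9(09).
           05  DL-RETRY-COUNT      PIC 9(02).
       PROCEDURE DIVISION.
       0000-MAIN.
           PERFORM 3400-BUILD-DEAD-LETTER
           DISPLAY DL-ERROR-TYPE ' ' DL-ERROR-CODE ' '
                   DL-ERROR-MESSAGE
           STOP RUN.
       3400-BUILD-DEAD-LETTER.
      * Clear the record so no field carries stale data.
           INITIALIZE WS-DEAD-LETTER
           MOVE WS-INPUT-RECORD     TO DL-ORIGINAL-RECORD
           SET DL-REFERENCE         TO TRUE
           MOVE 'R001'              TO DL-ERROR-CODE
           MOVE 'MEMBER ID NOT ON MEMBER FILE'
                                    TO DL-ERROR-MESSAGE
           MOVE 'DLQDEMO'           TO DL-PROGRAM-NAME
           MOVE '3400-BUILD-DEAD-LETTER'
                                    TO DL-PARAGRAPH-NAME
           MOVE FUNCTION CURRENT-DATE TO DL-TIMESTAMP
           MOVE 42                  TO DL-RECORD-NUMBER.
```

Keeping the original record intact in DL-ORIGINAL-RECORD is what lets a companion program attempt reprocessing without going back to the provider.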
🔵 Modern Practice The dead letter queue concept is native to modern messaging systems like IBM MQ, Apache Kafka, and Amazon SQS. In mainframe COBOL batch processing, the reject file serves the same purpose. When mainframe shops integrate with message queues, they often map between COBOL reject files and MQ dead letter queues as part of their modernization strategy.
Defensive Programming and Performance
A common objection to defensive programming is performance overhead. Checking FILE STATUS, validating every field, and writing structured log entries all cost CPU time. How significant is the overhead?
In practice, the overhead is negligible compared to I/O costs. For a batch program that processes 2 million records, the FILE STATUS checks add perhaps 0.5 seconds of CPU time. The validations add perhaps 2-3 seconds. The I/O operations — reading from disk, writing to disk — consume 95% of the elapsed time. The defensive checks occupy less than 1% of the total runtime.
James Okafor puts it bluntly: "I've never seen a COBOL batch program where defensive programming was the performance bottleneck. I've seen dozens where the lack of defensive programming was the availability bottleneck — because the program ABENDed and had to be rerun, costing hours of wall-clock time."
There are two exceptions where performance matters:
- Reference validation with file I/O. If you validate every claim's member ID by reading the MEMBER-FILE, that is an I/O operation per record. For high-volume files, consider loading reference data into working-storage tables during initialization.
- Logging at trace level. Writing a log entry for every successfully processed record (not just errors) can generate enormous log files. Use severity-based logging: log everything at ERROR and FATAL levels, but log INFO and DEBUG only when explicitly enabled (perhaps via a runtime parameter).
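Severity-based logging can be sketched as a numeric threshold test. The example below is an assumption-laden illustration, not a prescribed design: the names SEVDEMO, WS-LOG-THRESHOLD, WS-MSG-LEVEL, and 9850-LOG-IF-ENABLED are hypothetical, and the threshold is hard-coded where a production job would more likely take it from a PARM or control card.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. SEVDEMO.
      * Hypothetical demo of threshold-based log filtering.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Higher numbers are more serious. Entries below the
      * threshold are counted but not written.
       01  WS-LOG-THRESHOLD        PIC 9(01) VALUE 4.
       01  WS-MSG-LEVEL            PIC 9(01) VALUE ZERO.
           88  LVL-DEBUG           VALUE 1.
           88  LVL-INFO            VALUE 2.
           88  LVL-WARNING         VALUE 3.
           88  LVL-ERROR           VALUE 4.
           88  LVL-FATAL           VALUE 5.
       01  WS-MSG-TEXT             PIC X(40).
       01  WS-SUPPRESSED-COUNT     PIC 9(09) VALUE ZERO.
       PROCEDURE DIVISION.
       0000-MAIN.
           SET LVL-INFO TO TRUE
           MOVE 'RECORD 000001 PROCESSED' TO WS-MSG-TEXT
           PERFORM 9850-LOG-IF-ENABLED
           SET LVL-ERROR TO TRUE
           MOVE 'RECORD 000002 REJECTED' TO WS-MSG-TEXT
           PERFORM 9850-LOG-IF-ENABLED
           DISPLAY 'SUPPRESSED ENTRIES: ' WS-SUPPRESSED-COUNT
           STOP RUN.
       9850-LOG-IF-ENABLED.
      * DISPLAY stands in for the WRITE to the log file.
           IF WS-MSG-LEVEL >= WS-LOG-THRESHOLD
               DISPLAY WS-MSG-LEVEL ' ' WS-MSG-TEXT
           ELSE
               ADD 1 TO WS-SUPPRESSED-COUNT
           END-IF.
```

Here the INFO entry is suppressed and the ERROR entry is written. Lowering WS-LOG-THRESHOLD to 1 for a diagnostic run would enable every level without a code change to the callers.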
10.10 Logging and Audit Trails
A well-designed error log is worth more than the cleverest error-handling code. When problems occur at 2 AM and the on-call operator needs to diagnose and resolve the issue, the log is all they have.
Log Record Design
01 WS-LOG-RECORD.
05 WS-LOG-TIMESTAMP PIC X(26).
05 FILLER PIC X(01) VALUE '|'.
05 WS-LOG-SEVERITY PIC X(05).
05 FILLER PIC X(01) VALUE '|'.
05 WS-LOG-PROGRAM PIC X(08).
05 FILLER PIC X(01) VALUE '|'.
05 WS-LOG-PARAGRAPH PIC X(30).
05 FILLER PIC X(01) VALUE '|'.
05 WS-LOG-FILE-STATUS PIC X(02).
05 FILLER PIC X(01) VALUE '|'.
05 WS-LOG-RECORD-KEY PIC X(20).
05 FILLER PIC X(01) VALUE '|'.
05 WS-LOG-MESSAGE PIC X(80).
The Logging Paragraph
9800-LOG-ERROR.
MOVE FUNCTION CURRENT-DATE TO WS-CURR-DATE-TIME
* STRING does not pad its target, so clear it first
MOVE SPACES TO WS-LOG-TIMESTAMP
STRING WS-CURR-YEAR DELIMITED BY SIZE
'-' DELIMITED BY SIZE
WS-CURR-MONTH DELIMITED BY SIZE
'-' DELIMITED BY SIZE
WS-CURR-DAY DELIMITED BY SIZE
' ' DELIMITED BY SIZE
WS-CURR-HOUR DELIMITED BY SIZE
':' DELIMITED BY SIZE
WS-CURR-MINUTE DELIMITED BY SIZE
':' DELIMITED BY SIZE
WS-CURR-SECOND DELIMITED BY SIZE
INTO WS-LOG-TIMESTAMP
END-STRING
MOVE 'ERROR' TO WS-LOG-SEVERITY
MOVE WS-PROGRAM-NAME TO WS-LOG-PROGRAM
MOVE WS-CURRENT-PARA TO WS-LOG-PARAGRAPH
MOVE WS-CURRENT-STATUS TO WS-LOG-FILE-STATUS
MOVE WS-CURRENT-KEY TO WS-LOG-RECORD-KEY
MOVE WS-ERR-MSG TO WS-LOG-MESSAGE
WRITE LOG-FILE-RECORD FROM WS-LOG-RECORD
IF NOT LOG-SUCCESS
DISPLAY 'CRITICAL: Cannot write to error log'
DISPLAY WS-LOG-RECORD
END-IF
ADD 1 TO WS-ERROR-COUNT.
Note the recursive defense: the logging paragraph itself checks whether the log write succeeded. If the log file is also broken, it falls back to DISPLAY, which goes to SYSOUT on z/OS — always available, if not always monitored.
Structured Logging for Modern Analysis
Modern operations teams parse logs with tools like Splunk or ELK. Producing structured output helps:
* JSON-style log entry for Splunk ingestion.
* STRING does not pad its target, so clear it first.
MOVE SPACES TO WS-JSON-LOG-LINE
STRING '{"ts":"' DELIMITED BY SIZE
WS-LOG-TIMESTAMP DELIMITED BY SIZE
'","pgm":"' DELIMITED BY SIZE
WS-LOG-PROGRAM DELIMITED BY SPACES
'","sev":"' DELIMITED BY SIZE
WS-LOG-SEVERITY DELIMITED BY SPACES
'","para":"' DELIMITED BY SIZE
WS-LOG-PARAGRAPH DELIMITED BY SPACES
'","fs":"' DELIMITED BY SIZE
WS-LOG-FILE-STATUS DELIMITED BY SIZE
'","key":"' DELIMITED BY SIZE
WS-LOG-RECORD-KEY DELIMITED BY SPACES
'","msg":"' DELIMITED BY SIZE
WS-LOG-MESSAGE DELIMITED BY '  '
'"}' DELIMITED BY SIZE
INTO WS-JSON-LOG-LINE
END-STRING
🔵 Modern Practice Several mainframe shops now write COBOL error logs directly as JSON lines, which are then streamed to Splunk or Elasticsearch via z/OS Connect or MQ. This bridges the gap between traditional batch processing and modern observability platforms.
Summary Statistics at Termination
Every production program should display summary statistics at termination. This is part of defensive programming because it provides a quick health check — the operator or reviewer can scan the numbers and immediately spot anomalies:
9000-TERMINATE.
DISPLAY '================================='
DISPLAY 'PROGRAM: ' WS-PROGRAM-NAME
DISPLAY 'RUN DATE: ' WS-CURRENT-DATE
DISPLAY '================================='
DISPLAY 'Records read: ' WS-READ-COUNT
DISPLAY 'Records processed: ' WS-PROCESS-COUNT
DISPLAY 'Records rejected: ' WS-REJECT-COUNT
DISPLAY 'Records in error: ' WS-ERROR-COUNT
DISPLAY 'Total amount: ' WS-TOTAL-AMOUNT
DISPLAY '---------------------------------'
DISPLAY 'INFO log entries: ' WS-INFO-LOG-COUNT
DISPLAY 'WARNING log entries:' WS-WARN-LOG-COUNT
DISPLAY 'ERROR log entries: ' WS-ERROR-LOG-COUNT
DISPLAY '================================='
* Verify record counts balance
COMPUTE WS-BALANCE-CHECK =
WS-PROCESS-COUNT + WS-REJECT-COUNT
IF WS-BALANCE-CHECK NOT = WS-READ-COUNT
DISPLAY '*** RECORD COUNT IMBALANCE ***'
DISPLAY ' Read: ' WS-READ-COUNT
DISPLAY ' Proc+Rej: ' WS-BALANCE-CHECK
MOVE RC-ERROR TO RETURN-CODE
END-IF.
Notice the balancing check at the end. The number of records read should equal the number processed plus the number rejected. If it does not, something went wrong — perhaps a record was silently skipped without being counted in either category. This kind of self-auditing catches bugs that might otherwise go unnoticed.
Maria Chen makes her programs display these statistics both to SYSOUT (for the operator console) and to the error log file (for permanent record). "The console display is for the night operator," she says. "The log file entry is for Monday morning's review meeting."
10.11 GlobalBank Case Study: Error Handling in TXN-PROC
Let us trace what happens when a transaction fails in GlobalBank's TXN-PROC program.
Scenario: Transaction Against a Non-Existent Account
Derek Washington runs TXN-PROC for the first time in production. A transaction references account number 9999999999 — an account that does not exist in the master file.
Here is the error handling flow:
1. TXN-PROC reads TXN-RECORD with TXN-ACCT-FROM = '9999999999'
2. TXN-PROC does a random READ on ACCT-MASTER-FILE using the key
3. FILE STATUS returns '23' (record not found)
4. 88-level ACCT-NOT-FOUND is TRUE
5. Program executes PERFORM 2300-HANDLE-NOT-FOUND
6. 2300 writes a reject record to the reject file
7. 2300 calls 9800-LOG-ERROR with details
8. 2300 increments WS-REJECT-COUNT
9. Control returns to the main processing loop
10. Next transaction is read and processed
The key point: the program does not ABEND. It logs the error, rejects the transaction, and continues. A human reviewer will examine the reject file and determine whether account 9999999999 is a data entry error, a test record that leaked into production, or a genuine problem.
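The flow above can be sketched in a small runnable program. This is an illustration under stated assumptions: a table stands in for the transaction file, and the paragraph 2200-READ-ACCOUNT simulates the random READ by setting the status field directly. The program name NFDEMO, WS-APPLIED-COUNT, and the sample account numbers are hypothetical.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. NFDEMO.
      * Hypothetical demo of reject-and-continue on status 23.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-ACCT-STATUS          PIC XX.
           88  ACCT-SUCCESS        VALUE '00'.
           88  ACCT-NOT-FOUND      VALUE '23'.
       01  WS-TXN-DATA.
           05  FILLER              PIC X(10) VALUE '1000000001'.
           05  FILLER              PIC X(10) VALUE '9999999999'.
           05  FILLER              PIC X(10) VALUE '1000000002'.
       01  WS-TXN-TABLE REDEFINES WS-TXN-DATA.
           05  TXN-ACCT-FROM       PIC X(10) OCCURS 3 TIMES.
       01  WS-TXN-IDX              PIC 9(02).
       01  WS-APPLIED-COUNT        PIC 9(07) VALUE ZERO.
       01  WS-REJECT-COUNT         PIC 9(07) VALUE ZERO.
       PROCEDURE DIVISION.
       0000-MAIN.
           PERFORM VARYING WS-TXN-IDX FROM 1 BY 1
                   UNTIL WS-TXN-IDX > 3
               PERFORM 2200-READ-ACCOUNT
               EVALUATE TRUE
                   WHEN ACCT-SUCCESS
                       ADD 1 TO WS-APPLIED-COUNT
                   WHEN ACCT-NOT-FOUND
                       PERFORM 2300-HANDLE-NOT-FOUND
                   WHEN OTHER
                       DISPLAY 'UNEXPECTED STATUS ' WS-ACCT-STATUS
                       MOVE 16 TO RETURN-CODE
                       STOP RUN
               END-EVALUATE
           END-PERFORM
           DISPLAY 'APPLIED:  ' WS-APPLIED-COUNT
           DISPLAY 'REJECTED: ' WS-REJECT-COUNT
           STOP RUN.
       2200-READ-ACCOUNT.
      * Stand-in for the random READ on ACCT-MASTER-FILE.
           IF TXN-ACCT-FROM(WS-TXN-IDX) = '9999999999'
               SET ACCT-NOT-FOUND TO TRUE
           ELSE
               SET ACCT-SUCCESS TO TRUE
           END-IF.
       2300-HANDLE-NOT-FOUND.
      * A real program would WRITE a reject record and
      * PERFORM 9800-LOG-ERROR here.
           DISPLAY 'REJECT: ACCOUNT NOT FOUND '
                   TXN-ACCT-FROM(WS-TXN-IDX)
           ADD 1 TO WS-REJECT-COUNT.
```

The transaction after the bad one is still applied, which is the behavior the scenario describes. Only an unexpected status takes the fatal path.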
Scenario: Account File is Locked
During a month-end run, another job has exclusive access to ACCT-MASTER-FILE. TXN-PROC's OPEN returns status '38' (file locked).
1000-INITIALIZE.
OPEN I-O ACCT-MASTER-FILE
EVALUATE TRUE
WHEN ACCT-SUCCESS
CONTINUE
WHEN ACCT-LOCKED
DISPLAY 'ACCT FILE LOCKED BY ANOTHER JOB'
DISPLAY 'RETRY AFTER MONTH-END COMPLETES'
MOVE RC-SEVERE TO RETURN-CODE
STOP RUN
WHEN ACCT-NOT-EXISTS
DISPLAY 'ACCT FILE NOT FOUND - CHECK JCL'
MOVE RC-FATAL TO RETURN-CODE
STOP RUN
WHEN OTHER
DISPLAY 'UNEXPECTED ERROR OPENING ACCT FILE'
DISPLAY 'STATUS: ' WS-ACCT-STATUS
MOVE RC-FATAL TO RETURN-CODE
STOP RUN
END-EVALUATE.
The error message is specific and actionable: it tells the operator exactly what happened and what to do about it.
Scenario: Disk Full During WRITE
Mid-way through processing, the output file's disk allocation fills up. FILE STATUS returns '34' (boundary violation — sequential file full).
Maria's code handles this by:
1. Logging the error with the last successfully written record's key
2. Setting RETURN-CODE to RC-SEVERE (12)
3. Closing all files in an orderly fashion
4. Stopping the program
The JCL for subsequent steps checks the condition code and skips downstream processing. The operations team expands the disk allocation and reruns the job.
10.12 MedClaim Case Study: Handling Malformed Claim Records
MedClaim receives claims from hundreds of healthcare providers, each with their own systems and varying data quality. James Okafor's CLM-INTAKE program is the front line of defense.
The Validation Pipeline
3000-VALIDATE-CLAIM.
INITIALIZE WS-VALIDATION-RESULTS
SET WS-RECORD-VALID TO TRUE
* Level 1: Field-level validation
PERFORM 3100-VAL-CLAIM-ID
PERFORM 3110-VAL-CLAIM-TYPE
PERFORM 3120-VAL-MEMBER-ID
PERFORM 3130-VAL-PROVIDER-NPI
PERFORM 3140-VAL-BILLED-AMOUNT
PERFORM 3150-VAL-DIAGNOSIS-CODES
PERFORM 3160-VAL-PROCEDURE-CODE
PERFORM 3170-VAL-SERVICE-DATE
* Level 2: Cross-field validation
IF WS-RECORD-VALID
PERFORM 3200-VAL-TYPE-AMOUNT
PERFORM 3210-VAL-DATES-SEQUENCE
PERFORM 3220-VAL-DIAG-PROC-MATCH
END-IF
* Level 3: Referential validation
IF WS-RECORD-VALID
PERFORM 3300-VAL-MEMBER-EXISTS
PERFORM 3310-VAL-PROVIDER-EXISTS
PERFORM 3320-VAL-COVERAGE-ACTIVE
END-IF.
Notice the tiered approach: field-level validation runs first. If any field is invalid, cross-field validation is skipped (because comparing invalid fields is meaningless). Referential validation — checking against the member and provider files — only runs if the basic data is clean.
Handling Malformed Numeric Fields
A common problem: a provider's system sends non-numeric data in a numeric field. If COBOL tries to use this data in an arithmetic operation, the result is an ABEND (specifically, an S0C7 data exception on z/OS).
3140-VAL-BILLED-AMOUNT.
* The billed amount comes as display numeric
* in the raw input record
IF WS-RAW-BILLED-AMT IS NUMERIC
MOVE WS-RAW-BILLED-AMT TO CLM-BILLED-AMT
IF CLM-BILLED-AMT < 0.01
PERFORM 3141-ADD-ERROR
MOVE 'BILLED AMOUNT LESS THAN $0.01'
TO WS-VAL-ERR-MSG(WS-VAL-ERROR-COUNT)
END-IF
IF CLM-BILLED-AMT > 999999.99
PERFORM 3141-ADD-ERROR
MOVE 'BILLED AMOUNT EXCEEDS MAXIMUM'
TO WS-VAL-ERR-MSG(WS-VAL-ERROR-COUNT)
END-IF
ELSE
PERFORM 3141-ADD-ERROR
MOVE 'BILLED AMOUNT IS NOT NUMERIC'
TO WS-VAL-ERR-MSG(WS-VAL-ERROR-COUNT)
END-IF.
3141-ADD-ERROR.
ADD 1 TO WS-VAL-ERROR-COUNT
SET WS-RECORD-INVALID TO TRUE
MOVE 'CLM-BILLED-AMT'
TO WS-VAL-ERR-FIELD(WS-VAL-ERROR-COUNT).
The Reject File Pattern
Invalid records are written to a reject file with error details, enabling human review and resubmission:
01 WS-REJECT-RECORD.
05 REJ-ORIGINAL-DATA PIC X(500).
05 REJ-ERROR-COUNT PIC 9(03).
05 REJ-ERROR-DETAILS.
10 REJ-ERROR OCCURS 20 TIMES.
15 REJ-ERR-FIELD PIC X(30).
15 REJ-ERR-MSG PIC X(50).
05 REJ-TIMESTAMP PIC X(26).
05 REJ-SOURCE-FILE PIC X(44).
✅ Defensive Programming Checklist Before submitting any COBOL program for production deployment, verify:
- [ ] FILE STATUS defined for every file
- [ ] FILE STATUS checked after every I/O operation
- [ ] ON SIZE ERROR on all arithmetic that could overflow
- [ ] Input validation for all external data
- [ ] Date validation (not just format, but logical correctness)
- [ ] Boundary checks for all table subscripts
- [ ] Counter sizes appropriate for expected volumes
- [ ] Error threshold for graceful degradation
- [ ] Meaningful RETURN-CODE set at termination
- [ ] Error log with timestamp, program, paragraph, and context
- [ ] All files closed in termination logic (including error paths)
10.13 Defensive Coding for COMP-3 Fields
One of the most insidious errors in COBOL is the S0C7 data exception on z/OS — a system ABEND that occurs when the program attempts to use a COMP-3 (packed decimal) field that contains invalid data. Understanding why this happens and how to prevent it is a core defensive programming skill.
How COMP-3 Works
COMP-3 stores two decimal digits per byte, with the sign stored in the lower nibble of the last byte. For example, the value +12345 in PIC S9(5) COMP-3 is stored as:
Hex: 12 34 5C
^^ ^^ ^^
12 34 5+ (C = positive sign)
The sign nibbles COBOL normally generates are C (positive), D (negative), and F (unsigned, treated as positive). The hardware also accepts A and E as positive and B as negative, but a digit 0-9 in the sign position, or a hex value greater than 9 in a digit nibble, causes a data exception when the field is used in arithmetic.
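A short sketch can make the byte layout tangible. It overlays a three-byte alphanumeric field on a PIC S9(5) COMP-3 item and applies the NUMERIC class test to three bit patterns. The program name PACKDEMO and the field names are hypothetical, and the exact format in which the valid value is displayed may differ between compilers.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. PACKDEMO.
      * Hypothetical demo: inspect packed-decimal validity.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Three bytes viewed two ways: raw and packed decimal.
       01  WS-RAW-BYTES            PIC X(03).
       01  WS-PACKED REDEFINES WS-RAW-BYTES
                                   PIC S9(5) COMP-3.
       PROCEDURE DIVISION.
       0000-MAIN.
      * +12345: digits 1 2 3 4 5, sign nibble C.
           MOVE X'12345C' TO WS-RAW-BYTES
           PERFORM 1000-CHECK
      * All zeros: sign nibble 0 is not a sign code.
           MOVE X'000000' TO WS-RAW-BYTES
           PERFORM 1000-CHECK
      * All ones: F is not a valid digit nibble.
           MOVE X'FFFFFF' TO WS-RAW-BYTES
           PERFORM 1000-CHECK
           STOP RUN.
       1000-CHECK.
      * The class test inspects the field without doing
      * arithmetic, so it is safe on invalid data.
           IF WS-PACKED IS NUMERIC
               DISPLAY 'VALID PACKED DECIMAL: ' WS-PACKED
           ELSE
               DISPLAY 'INVALID PACKED DECIMAL - DO NOT COMPUTE'
           END-IF.
```

The first pattern should pass the test and the other two should fail it. The point of the design is that the class test is a comparison, not an arithmetic operation, so it gives the program a chance to reject the field before any instruction that could raise a data exception.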
Common Sources of COMP-3 Corruption
- Moving alphanumeric data to a COMP-3 field without validation. If a field comes from an external source as PIC X(10) and you move it directly to a PIC S9(7)V99 COMP-3 field, invalid characters become invalid packed-decimal bytes.
- Reading a file with a mismatched copybook. If the copybook defines a field as COMP-3 but the actual data was written with a different layout, the bytes at that position may not be valid packed decimal.
- Uninitialized WORKING-STORAGE. A field with no VALUE clause holds whatever the storage happened to contain, and compilers and runtime environments differ on whether that storage is cleared. Even storage cleared to LOW-VALUES (x'00') is not safe: a COMP-3 field of all x'00' has 0 in its sign nibble, which is not a valid sign, and x'FF' puts F in the digit nibbles, which is not a valid digit. Either can trigger a data exception.
- Copybook version mismatch. A program compiled with version 1 of a copybook reads a file written by a program compiled with version 2, where field positions have shifted.
The Defense: Validate Before You Compute
* WRONG — no protection against invalid COMP-3
ADD TXN-AMOUNT TO ACCT-BALANCE
* RIGHT — validate the raw input, then compute
IF WS-RAW-TXN-AMOUNT IS NUMERIC
MOVE WS-RAW-TXN-AMOUNT TO TXN-AMOUNT
ADD TXN-AMOUNT TO ACCT-BALANCE
ON SIZE ERROR
PERFORM HANDLE-OVERFLOW
END-ADD
ELSE
MOVE 'NON-NUMERIC TXN AMOUNT' TO WS-ERR-MSG
PERFORM 9800-LOG-ERROR
PERFORM REJECT-TRANSACTION
END-IF
The FUNCTION TEST-NUMVAL Approach
COBOL 2002 introduced intrinsic functions for validating numeric representations. FUNCTION TEST-NUMVAL returns zero if the argument is a valid numeric value:
IF FUNCTION TEST-NUMVAL(WS-RAW-AMOUNT) = 0
COMPUTE WS-VALIDATED-AMOUNT =
FUNCTION NUMVAL(WS-RAW-AMOUNT)
ELSE
MOVE 'INVALID NUMERIC FORMAT' TO WS-ERR-MSG
PERFORM 9800-LOG-ERROR
END-IF
Not all COBOL compilers support these functions. GnuCOBOL 3.x does; for IBM Enterprise COBOL, support depends on the release, so check the Language Reference for your compiler level.
INITIALIZE as a Safety Net
Always INITIALIZE working-storage group items before use, especially those that contain COMP-3 fields:
INITIALIZE WS-TRANSACTION-WORK
INITIALIZE WS-CALCULATION-FIELDS
INITIALIZE sets numeric fields to zero and alphanumeric fields to spaces, ensuring COMP-3 fields contain valid packed-decimal data.
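The effect can be checked with a small experiment. The sketch below deliberately fills a group with HIGH-VALUES to simulate garbage in storage, then tests the COMP-3 fields before and after INITIALIZE. The program name INITDEMO and the elementary field names are hypothetical.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. INITDEMO.
      * Hypothetical demo: INITIALIZE repairs packed fields.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-CALCULATION-FIELDS.
           05  WS-CALC-AMOUNT      PIC S9(7)V99 COMP-3.
           05  WS-CALC-COUNT       PIC S9(5)    COMP-3.
           05  WS-CALC-NOTE        PIC X(20).
       PROCEDURE DIVISION.
       0000-MAIN.
      * Simulate garbage left in storage (demo assumption).
           MOVE HIGH-VALUES TO WS-CALCULATION-FIELDS
           PERFORM 1000-REPORT
      * Numeric items become zero, alphanumeric items spaces.
           INITIALIZE WS-CALCULATION-FIELDS
           PERFORM 1000-REPORT
           STOP RUN.
       1000-REPORT.
           IF WS-CALC-AMOUNT IS NUMERIC
              AND WS-CALC-COUNT IS NUMERIC
               DISPLAY 'SAFE TO COMPUTE'
           ELSE
               DISPLAY 'NOT SAFE - INVALID PACKED DATA'
           END-IF.
```

The first report should take the "not safe" branch and the second the "safe" branch. Note that a group-level MOVE of SPACES or LOW-VALUES would not have the same effect, because it copies bytes without regard to each field's usage.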
10.14 Environment-Specific Defensive Techniques
z/OS-Specific Defenses
On z/OS, several platform-specific features support defensive programming:
ABEND codes and their meaning:
| ABEND | Cause | Defensive Response |
|---|---|---|
| S0C7 | Data exception (invalid packed decimal) | Validate all numeric data before arithmetic |
| S0C4 | Protection exception (invalid memory access) | Check subscripts, validate linkage section pointers |
| S0CB | Division by zero | Use ON SIZE ERROR or pre-check divisor |
| S0C1 | Operation exception | Usually a program structure issue |
| S013 | Dataset open error | Check FILE STATUS after OPEN |
| S001 | I/O error | Check FILE STATUS after every I/O |
The CEEDUMP and Language Environment: When a program ABENDs under IBM Language Environment, the runtime produces a CEEDUMP: a formatted dump that shows the traceback (call chain), the condition that was raised, register contents, and where in each program execution stopped. While you want to avoid ABENDs, understanding CEEDUMP output is essential for production support.
Compile options for defensive programming:
//COBOL EXEC PGM=IGYCRCTL,
// PARM='SSRANGE,TEST,SOURCE,NUMCHECK'
| Option | Purpose |
|---|---|
| SSRANGE | Runtime range checking: stops the program with a diagnostic message when a subscript, index, or reference modification goes out of bounds, rather than letting it overwrite storage or fail later with an S0C4 |
| NUMCHECK | Runtime numeric validity checking: catches invalid COMP-3 (and zoned decimal) data at the point of use rather than later |
| TEST | Enables symbolic debugging |
| SOURCE | Produces the source listing, including text brought in by COPY statements |
⚠️ Performance Note SSRANGE and NUMCHECK add runtime overhead (typically 5-15%). Many shops enable them during testing and disable them in production. James Okafor at MedClaim keeps SSRANGE enabled even in production: "The performance cost is negligible compared to the cost of a data corruption incident."
GnuCOBOL-Specific Defenses
GnuCOBOL provides its own set of runtime checks:
# Compile with all run-time checks and all warnings enabled
cobc -x -debug -Wall myprogram.cbl
The -debug flag enables runtime checks for subscript bounds, numeric validity, and other common errors. The -Wall flag enables all compiler warnings, which can catch potential issues at compile time.
10.15 The ABEND Handler Pattern
When an error is truly unrecoverable, the program must stop — but it should stop in an orderly fashion:
9900-ABEND-PROGRAM.
DISPLAY '*** PROGRAM ABEND ***'
DISPLAY 'PROGRAM: ' WS-PROGRAM-NAME
DISPLAY 'PARAGRAPH: ' WS-CURRENT-PARA
DISPLAY 'ERROR: ' WS-ERR-MSG
DISPLAY 'RECORDS PROCESSED: ' WS-RECORD-COUNT
DISPLAY 'ERRORS LOGGED: ' WS-ERROR-COUNT
* Write final entry to log. 9800-LOG-ERROR builds the
* record from WS-ERR-MSG, so set the message there.
MOVE 'FATAL - PROGRAM ABENDING' TO WS-ERR-MSG
PERFORM 9800-LOG-ERROR
* Close all open files
PERFORM 9910-CLOSE-ALL-FILES
* Set return code and stop
MOVE RC-FATAL TO RETURN-CODE
STOP RUN.
9910-CLOSE-ALL-FILES.
* Close each file, ignoring errors on close
* (we are already in error state)
CLOSE ACCT-MASTER-FILE
CLOSE TXN-INPUT-FILE
CLOSE REPORT-FILE
CLOSE ERROR-LOG-FILE.
💡 Key Insight — Closing Files in Error Paths Even in an ABEND situation, close your files. On z/OS, an unclosed VSAM file can be left in a "not properly closed" state that requires manual intervention (VERIFY) before any other program can access it. The few microseconds spent on CLOSE can save hours of operational recovery.
10.16 Putting It All Together — A Defensive Programming Template
Here is a structural template that incorporates all the patterns from this chapter. Use it as a starting point for any production COBOL program:
IDENTIFICATION DIVISION.
PROGRAM-ID. TEMPLATE.
*================================================================*
* TEMPLATE - Defensive Programming Template
* Replace TEMPLATE with actual program name
*================================================================*
ENVIRONMENT DIVISION.
INPUT-OUTPUT SECTION.
FILE-CONTROL.
SELECT INPUT-FILE
ASSIGN TO INFILE
FILE STATUS IS WS-INPUT-STATUS.
SELECT OUTPUT-FILE
ASSIGN TO OUTFILE
FILE STATUS IS WS-OUTPUT-STATUS.
SELECT ERROR-LOG
ASSIGN TO ERRLOG
FILE STATUS IS WS-LOG-STATUS.
DATA DIVISION.
FILE SECTION.
* ... FD entries ...
WORKING-STORAGE SECTION.
* --- File Status Fields ---
01 WS-INPUT-STATUS PIC XX.
88 INPUT-SUCCESS VALUE '00'.
88 INPUT-EOF VALUE '10'.
01 WS-OUTPUT-STATUS PIC XX.
88 OUTPUT-SUCCESS VALUE '00'.
01 WS-LOG-STATUS PIC XX.
88 LOG-SUCCESS VALUE '00'.
* --- Program Control ---
01 WS-PROGRAM-NAME PIC X(08) VALUE 'TEMPLATE'.
01 WS-CURRENT-PARA PIC X(30).
01 WS-END-OF-FILE PIC X VALUE 'N'.
88 END-OF-FILE VALUE 'Y'.
88 NOT-END-OF-FILE VALUE 'N'.
* --- Counters ---
01 WS-COUNTERS.
05 WS-READ-COUNT PIC 9(09) VALUE ZERO.
05 WS-WRITE-COUNT PIC 9(09) VALUE ZERO.
05 WS-ERROR-COUNT PIC 9(07) VALUE ZERO.
05 WS-REJECT-COUNT PIC 9(07) VALUE ZERO.
* --- Error Handling ---
01 WS-ERR-MSG PIC X(80).
01 WS-MAX-ERRORS PIC 9(05) VALUE 100.
* --- Return Codes ---
01 RC-SUCCESS PIC S9(04) COMP VALUE +0.
01 RC-WARNING PIC S9(04) COMP VALUE +4.
01 RC-ERROR PIC S9(04) COMP VALUE +8.
01 RC-SEVERE PIC S9(04) COMP VALUE +12.
01 RC-FATAL PIC S9(04) COMP VALUE +16.
PROCEDURE DIVISION.
0000-MAIN.
PERFORM 1000-INITIALIZE
PERFORM 2000-PROCESS UNTIL END-OF-FILE
PERFORM 9000-TERMINATE
STOP RUN.
1000-INITIALIZE.
MOVE '1000-INITIALIZE' TO WS-CURRENT-PARA
* Open files with full status checking
* Initialize working storage
* Read first record (priming read)
.
2000-PROCESS.
MOVE '2000-PROCESS' TO WS-CURRENT-PARA
* Validate current record
* Process if valid, reject if invalid
* Check error threshold
* Read next record
.
9000-TERMINATE.
MOVE '9000-TERMINATE' TO WS-CURRENT-PARA
DISPLAY 'PROCESSING COMPLETE'
DISPLAY 'RECORDS READ: ' WS-READ-COUNT
DISPLAY 'RECORDS WRITTEN: ' WS-WRITE-COUNT
DISPLAY 'RECORDS REJECTED: ' WS-REJECT-COUNT
DISPLAY 'ERRORS: ' WS-ERROR-COUNT
EVALUATE TRUE
WHEN WS-ERROR-COUNT > WS-MAX-ERRORS
MOVE RC-SEVERE TO RETURN-CODE
WHEN WS-ERROR-COUNT > 0
MOVE RC-WARNING TO RETURN-CODE
WHEN OTHER
MOVE RC-SUCCESS TO RETURN-CODE
END-EVALUATE
CLOSE INPUT-FILE
OUTPUT-FILE
ERROR-LOG.
9800-LOG-ERROR.
* Write structured error log entry
* Increment error count
* Check for log write failure
.
9900-ABEND-PROGRAM.
* Display fatal error information
* Write final log entry
* Close all files
* Set RC-FATAL
* STOP RUN
.
10.17 Best Practices Summary
- Check FILE STATUS after every I/O. No exceptions. Use 88-level conditions for readability.
- Use ON SIZE ERROR on all arithmetic. Especially on COMPUTE, DIVIDE, and any calculation involving external data.
- Validate all external input. Never assume data from files, users, or other systems is clean.
- Use tiered validation. Field-level first, then cross-field, then referential.
- Implement error thresholds. Both total and consecutive error limits prevent infinite bad-data processing.
- Set meaningful return codes. 0/4/8/12/16: let the JCL and operations team know the result.
- Log with context. Timestamp, program name, paragraph, file status, record key, and message. Always.
- Close files in error paths. Even when ABENDing, close your files.
- Separate error severity. Not all errors are equal. Distinguish between warnings (continue), errors (reject and continue), and fatal errors (stop).
- Document your error strategy. In the program header comment, describe what errors the program handles and how.
✅ Chapter Checkpoint You should now be able to:
- Declare and check FILE STATUS codes for every file operation
- Use ON SIZE ERROR, ON OVERFLOW, and INVALID KEY handlers
- Implement DECLARATIVES and USE statements
- Set and propagate return codes through JCL condition code checking
- Build input validation frameworks with tiered checking
- Handle boundary conditions (empty files, table overflow, counter limits)
- Implement graceful degradation with error thresholds
- Design structured error logs for production diagnostics
- Write orderly ABEND handlers that leave data in a clean state
Chapter Summary
Defensive programming is not about expecting failure — it is about being prepared for it. In a production COBOL environment, where programs process millions of records containing financial or medical data, the cost of an unhandled error can range from hours of manual recovery to regulatory violations and financial loss.
The techniques in this chapter form a toolkit that you will apply throughout the rest of this textbook. FILE STATUS checking accompanies every file operation in Chapters 11-16. Input validation underpins the data manipulation techniques of Chapters 17-21. Return codes and error propagation become essential in the modular design of Chapters 22-26. And the testing chapters (33-37) will show you how to deliberately inject errors to verify that your defensive measures work.
James Okafor summarizes the philosophy: "Write every program as if the person who will debug it at 2 AM on a Saturday is a sleep-deprived version of yourself. Give that person everything they need — clear error messages, clean logs, and an orderly shutdown. They'll thank you."
The investment in defensive programming pays for itself many times over. A program that takes an extra day to write because of comprehensive error handling saves weeks of debugging, recovery, and reputation repair over its lifetime. And in COBOL, lifetimes are measured in decades.
In the next chapter, we move to Sequential File Processing — the most common I/O pattern in batch COBOL, where every defensive technique from this chapter will be put into practice.