Chapter 10: Defensive Programming

Generated by Claude

In This Chapter

10.1 The Error Handling Philosophy
10.2 FILE STATUS Codes — Your First Line of Defense
10.3 Arithmetic Error Handlers
10.4 INVALID KEY for Indexed and Relative Files
10.5 DECLARATIVES and USE Statements
10.6 Return Codes and Error Propagation
10.7 Input Validation Techniques
10.8 Boundary Condition Handling
10.9 Graceful Degradation Patterns
10.10 Logging and Audit Trails
10.11 GlobalBank Case Study: Error Handling in TXN-PROC
10.12 MedClaim Case Study: Handling Malformed Claim Records
10.13 Defensive Coding for COMP-3 Fields
10.14 Environment-Specific Defensive Techniques
10.15 The ABEND Handler Pattern
10.16 Putting It All Together — A Defensive Programming Template
10.17 Best Practices Summary
Chapter Summary

"In production COBOL, the question is never if something will go wrong. It's when, and whether your program will handle it gracefully or bring down the batch window." — James Okafor, Team Lead, MedClaim Health Services

Every COBOL program you wrote in your introductory course probably assumed perfect inputs, abundant disk space, and cooperative file systems. You read a record, it was there. You wrote output, it worked. The AT END condition was the only exception you handled. In the real world — the world of 2.3 million daily transactions at GlobalBank, of 500,000 monthly claims at MedClaim — things go wrong constantly. Files are locked by other jobs. Records contain unexpected characters. Disk allocations fill up. Network connections to DB2 drop. Integer overflows silently truncate important calculations.

Defensive programming is the discipline of anticipating what can go wrong and handling it before it causes data corruption, an ABEND, or a silent miscalculation. It is not pessimism — it is professionalism. This chapter teaches you the specific COBOL constructs and patterns that make programs resilient: FILE STATUS codes, arithmetic error handlers, DECLARATIVES, input validation techniques, and the broader strategies of error propagation, graceful degradation, and audit logging.

This chapter is the primary home of the Defensive Programming theme that threads through the rest of this textbook. The techniques introduced here will be applied and extended in every subsequent chapter.

Think of defensive programming not as additional work layered on top of "real" programming, but as an integral part of writing correct software. In the same way that a structural engineer does not view load calculations as "extra" — they are the core of the work — a COBOL developer should view FILE STATUS checking, input validation, and error logging as intrinsic to the craft, not as overhead. The programs in this chapter may look longer than their undefended counterparts, but they are also the programs that run unattended for years without intervention, processing billions of records while their operators sleep soundly.

10.1 The Error Handling Philosophy

Before we dive into syntax, let us establish a philosophy. Defensive programming in COBOL rests on four principles:

Principle 1: Never Assume Success. Every I/O operation, every arithmetic operation that could overflow, every data field that comes from an external source — check the result. Do not assume the operation succeeded just because it usually does.

Principle 2: Fail Loudly. When something goes wrong, make sure someone knows. A program that encounters an error and silently continues is far more dangerous than one that ABENDs. Silent failures cause data corruption that may not be discovered for days or weeks.

Principle 3: Fail Safely. When an error is unrecoverable, stop processing in a way that leaves data in a consistent state. Close files. Set return codes. Write a final log entry. Do not leave partial updates.

Principle 4: Log Everything. Every error should be recorded with enough context to diagnose the problem: what happened, when, in which program, at which paragraph, processing which record. Tomorrow morning's production support team will thank you.

💡 Key Insight — Legacy != Obsolete The defensive programming patterns in COBOL predate modern concepts like exception handling in Java or Python. They are not inferior, only different. COBOL's explicit, procedural error handling forces developers to think through every failure mode, which is exactly what you want for systems that process millions of financial transactions or medical claims. Newer languages have adopted similarly explicit patterns (Go's convention of returning an error value from each call is one example) precisely because exception-based handling can obscure error flows.

The Cost of Not Being Defensive

James Okafor tells a story from his early days at MedClaim. A batch program that processed claim payments did not check FILE STATUS after writing to the payment file. One night, the output disk allocation filled up. The WRITE operation silently failed for the last 3,000 records. The program completed with a return code of zero — "success." The next morning, 3,000 healthcare providers did not receive their payments. It took two days to identify the problem, regenerate the missing payments, and restore trust. "That's when I became a fanatic about checking every single I/O operation," James says.

10.2 FILE STATUS Codes — Your First Line of Defense

The FILE STATUS clause is the single most important defensive programming feature in COBOL. It tells you whether an I/O operation succeeded and, if it failed, why. Without FILE STATUS, a failed I/O operation might silently set the record area to unpredictable values, leaving your program to process garbage data as if nothing were wrong. With FILE STATUS, you know immediately that something went wrong, exactly what went wrong (via the two-character code), and you can decide how to respond — retry, skip, log, or ABEND.

Every professional COBOL developer treats FILE STATUS as mandatory. It appears in every SELECT statement, and its value is checked after every I/O operation. If you take only one technique from this entire chapter and apply it to every program you write for the rest of your career, let it be this one.

Declaring FILE STATUS

In the FILE-CONTROL paragraph, every SELECT statement should include a FILE STATUS clause:

       FILE-CONTROL.
           SELECT ACCT-MASTER-FILE
               ASSIGN TO ACCTMAST
               ORGANIZATION IS INDEXED
               ACCESS MODE IS DYNAMIC
               RECORD KEY IS ACCT-NUMBER
               FILE STATUS IS WS-ACCT-STATUS.

           SELECT TXN-INPUT-FILE
               ASSIGN TO TXNIN
               FILE STATUS IS WS-TXN-STATUS.

           SELECT REPORT-FILE
               ASSIGN TO RPTOUT
               FILE STATUS IS WS-RPT-STATUS.

The status variable must be defined in WORKING-STORAGE as a two-character field:

       01  WS-ACCT-STATUS           PIC XX.
       01  WS-TXN-STATUS            PIC XX.
       01  WS-RPT-STATUS            PIC XX.

The Status Code Table

After every file operation (OPEN, READ, WRITE, REWRITE, DELETE, START, CLOSE), the runtime sets the status variable to a two-character code:

Code	Category	Meaning
`00`	Success	Operation completed successfully
`02`	Success	READ successful, duplicate alternate key exists
`04`	Success	READ successful, record length does not match FD
`05`	Success	OPEN successful, optional file not present (created)
`07`	Success	CLOSE with NO REWIND/REEL on non-reel device
`10`	End	AT END — no more records to read
`14`	End	Sequential READ on relative file, record number too large
`21`	Invalid key	Sequence error on sequential write to indexed file
`22`	Invalid key	Duplicate key on WRITE or REWRITE
`23`	Invalid key	Record not found on READ/START/DELETE
`24`	Invalid key	Boundary violation — file full or key out of range
`30`	Permanent error	I/O error (hardware/system failure)
`34`	Permanent error	Boundary violation — file full, sequential
`35`	Permanent error	OPEN failed — file not found (non-optional)
`37`	Permanent error	OPEN failed — file type conflict
`38`	Permanent error	OPEN failed — file was previously closed WITH LOCK
`39`	Permanent error	OPEN failed — FD attributes conflict with actual file
`41`	Logic error	OPEN on already-open file
`42`	Logic error	CLOSE on already-closed file
`43`	Logic error	REWRITE/DELETE without prior READ in sequential mode
`44`	Logic error	REWRITE with changed record length (fixed-length file)
`46`	Logic error	Sequential READ without positioning (no prior READ/START)
`47`	Logic error	READ on file not opened for INPUT or I-O
`48`	Logic error	WRITE on file not opened for OUTPUT, I-O, or EXTEND
`49`	Logic error	REWRITE/DELETE on file not opened for I-O
`9x`	Implementation	Vendor-specific status codes

📊 Status Code Categories at a Glance

| First Digit | Meaning |
|-------------|---------|
| 0 | Successful completion |
| 1 | AT END condition |
| 2 | Invalid key condition |
| 3 | Permanent error |
| 4 | Logic error (programmer mistake) |
| 9 | Vendor-specific |
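As a rough illustration of the first-digit rule, the sketch below classifies a few sample status values by inspecting only their first character. The program name STATCAT and the WS- fields are hypothetical, and the sample codes are hard-coded rather than produced by real I/O.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. STATCAT.
      * Sketch only: STATCAT and the WS- names are illustrative.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-FILE-STATUS           PIC XX.
       01  WS-STATUS-CATEGORY       PIC X(16).

       PROCEDURE DIVISION.
       0000-MAIN.
      * Hard-coded sample values stand in for real I/O results.
           MOVE '00' TO WS-FILE-STATUS
           PERFORM 1000-CLASSIFY-STATUS
           MOVE '23' TO WS-FILE-STATUS
           PERFORM 1000-CLASSIFY-STATUS
           MOVE '35' TO WS-FILE-STATUS
           PERFORM 1000-CLASSIFY-STATUS
           STOP RUN.

       1000-CLASSIFY-STATUS.
      * Reference modification (1:1) isolates the first digit.
           EVALUATE WS-FILE-STATUS(1:1)
               WHEN '0'
                   MOVE 'SUCCESS'         TO WS-STATUS-CATEGORY
               WHEN '1'
                   MOVE 'AT END'          TO WS-STATUS-CATEGORY
               WHEN '2'
                   MOVE 'INVALID KEY'     TO WS-STATUS-CATEGORY
               WHEN '3'
                   MOVE 'PERMANENT ERROR' TO WS-STATUS-CATEGORY
               WHEN '4'
                   MOVE 'LOGIC ERROR'     TO WS-STATUS-CATEGORY
               WHEN '9'
                   MOVE 'VENDOR-SPECIFIC' TO WS-STATUS-CATEGORY
               WHEN OTHER
                   MOVE 'UNKNOWN'         TO WS-STATUS-CATEGORY
           END-EVALUATE
           DISPLAY WS-FILE-STATUS ' -> ' WS-STATUS-CATEGORY.
```

A classifier like this is useful in a shared logging paragraph, where the category can be written alongside the raw two-character code. It does not replace checking for the specific codes a given operation is expected to return.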

Checking FILE STATUS — The Pattern

Here is the fundamental pattern that every production COBOL program should follow:

       2000-READ-TRANSACTION.
           READ TXN-INPUT-FILE INTO WS-TXN-RECORD
           EVALUATE WS-TXN-STATUS
               WHEN '00'
                   ADD 1 TO WS-TXN-READ-COUNT
                   PERFORM 3000-PROCESS-TRANSACTION
               WHEN '10'
                   SET WS-END-OF-FILE TO TRUE
               WHEN OTHER
                   MOVE 'READ FAILED ON TXN FILE' TO WS-ERR-MSG
                   MOVE WS-TXN-STATUS TO WS-ERR-STATUS
                   PERFORM 9800-LOG-ERROR
                   PERFORM 9900-ABEND-PROGRAM
           END-EVALUATE.

⚠️ Critical Rule Check FILE STATUS after every file operation. Not just READs — also OPENs, WRITEs, REWRITEs, DELETEs, STARTs, and CLOSEs. The OPEN check is especially important because a failed OPEN means every subsequent operation on that file will also fail, potentially in confusing ways.

88-Level Conditions for Status Codes

Using 88-level condition names makes status checking more readable:

       01  WS-ACCT-STATUS           PIC XX.
           88  ACCT-SUCCESS          VALUE '00'.
           88  ACCT-SUCCESS-DUP      VALUE '02'.
           88  ACCT-EOF              VALUE '10'.
           88  ACCT-DUP-KEY          VALUE '22'.
           88  ACCT-NOT-FOUND        VALUE '23'.
           88  ACCT-FILE-FULL        VALUE '24'.
           88  ACCT-PERM-ERROR       VALUE '30'.
           88  ACCT-NOT-EXISTS       VALUE '35'.
           88  ACCT-LOCKED           VALUE '38'.

       ...

       2100-READ-ACCOUNT.
           READ ACCT-MASTER-FILE
           EVALUATE TRUE
               WHEN ACCT-SUCCESS
                   PERFORM 2200-VALIDATE-ACCOUNT
               WHEN ACCT-NOT-FOUND
                   PERFORM 2300-HANDLE-NOT-FOUND
               WHEN ACCT-PERM-ERROR
                   PERFORM 9800-LOG-ERROR
                   PERFORM 9900-ABEND-PROGRAM
               WHEN OTHER
                   MOVE 'UNEXPECTED ACCT STATUS' TO WS-ERR-MSG
                   PERFORM 9800-LOG-ERROR
                   PERFORM 9900-ABEND-PROGRAM
           END-EVALUATE.

Checking OPEN Status — A Complete Pattern

       1000-INITIALIZE.
           OPEN INPUT ACCT-MASTER-FILE
           IF NOT ACCT-SUCCESS
               DISPLAY 'FATAL: Cannot open ACCT-MASTER-FILE'
               DISPLAY '       FILE STATUS: ' WS-ACCT-STATUS
               MOVE 16 TO RETURN-CODE
               STOP RUN
           END-IF

           OPEN INPUT TXN-INPUT-FILE
           IF NOT TXN-SUCCESS
               DISPLAY 'FATAL: Cannot open TXN-INPUT-FILE'
               DISPLAY '       FILE STATUS: ' WS-TXN-STATUS
               CLOSE ACCT-MASTER-FILE
               MOVE 16 TO RETURN-CODE
               STOP RUN
           END-IF

           OPEN OUTPUT REPORT-FILE
           IF NOT RPT-SUCCESS
               DISPLAY 'FATAL: Cannot open REPORT-FILE'
               DISPLAY '       FILE STATUS: ' WS-RPT-STATUS
               CLOSE ACCT-MASTER-FILE
               CLOSE TXN-INPUT-FILE
               MOVE 16 TO RETURN-CODE
               STOP RUN
           END-IF.

Notice the cascading CLOSE operations — if the third file fails to open, we close the first two files that were opened successfully. This ensures clean resource cleanup.

Try It Yourself — FILE STATUS Exercise

Write a program that opens three files: an input file, an output file, and a report file. Implement the full OPEN checking pattern above. Then read the input file in a loop, checking status after every READ. Introduce deliberate errors (reference a non-existent file, try to read from an output file) and observe the status codes.

10.3 Arithmetic Error Handlers

COBOL provides built-in phrases for detecting arithmetic errors at the statement level.

ON SIZE ERROR

The ON SIZE ERROR phrase detects when an arithmetic result is too large for the receiving field, or when a divide-by-zero occurs:

           ADD TXN-AMOUNT TO ACCT-BALANCE
               ON SIZE ERROR
      * Build the message in a single STRING. The sending and
      * receiving fields of a STRING must not overlap.
                   MOVE SPACES TO WS-ERR-MSG
                   STRING 'BALANCE OVERFLOW FOR ACCT: '
                              DELIMITED BY SIZE
                          ACCT-NUMBER DELIMITED BY SIZE
                     INTO WS-ERR-MSG
                   END-STRING
                   PERFORM 9800-LOG-ERROR
                   SET WS-TXN-REJECTED TO TRUE
               NOT ON SIZE ERROR
                   SET WS-TXN-PROCESSED TO TRUE
           END-ADD

Size error detection applies to ADD, SUBTRACT, MULTIPLY, DIVIDE, and COMPUTE:

           COMPUTE WS-INTEREST =
               ACCT-BALANCE * WS-INTEREST-RATE / 365
               ON SIZE ERROR
                   DISPLAY 'Interest calculation overflow'
                   DISPLAY 'Balance: ' ACCT-BALANCE
                   DISPLAY 'Rate: ' WS-INTEREST-RATE
                   MOVE ZERO TO WS-INTEREST
               NOT ON SIZE ERROR
                   CONTINUE
           END-COMPUTE

⚠️ Important: SIZE ERROR Does Not Detect Truncation of Decimal Places ON SIZE ERROR only fires when the integer portion of the result exceeds the receiving field. Truncation of decimal digits is not a size error. If you COMPUTE a value of 123.456 into a PIC 9(3)V99 field, the result is 123.45 — the trailing 6 is silently truncated, and no SIZE ERROR occurs. Be aware of this limitation when designing your data items.
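The difference between truncation and a size error is easy to see in a small test program. The following sketch uses hypothetical names (SIZEDEMO, WS-SOURCE, WS-RESULT). It stores 123.456 into a PIC 9(3)V99 field with and without the ROUNDED phrase, then forces a genuine size error with a value whose integer part needs four digits.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. SIZEDEMO.
      * Sketch only: all names are illustrative.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-SOURCE                PIC 9(4)V999 VALUE 123.456.
       01  WS-RESULT                PIC 9(3)V99  VALUE ZERO.
       01  WS-RESULT-EDIT           PIC ZZ9.99.

       PROCEDURE DIVISION.
       0000-MAIN.
      * Case 1: low-order digit is dropped, no SIZE ERROR raised.
           COMPUTE WS-RESULT = WS-SOURCE
               ON SIZE ERROR
                   DISPLAY 'CASE 1: SIZE ERROR'
               NOT ON SIZE ERROR
                   MOVE WS-RESULT TO WS-RESULT-EDIT
                   DISPLAY 'CASE 1 TRUNCATED: ' WS-RESULT-EDIT
           END-COMPUTE

      * Case 2: ROUNDED rounds the dropped digit instead.
           COMPUTE WS-RESULT ROUNDED = WS-SOURCE
               ON SIZE ERROR
                   DISPLAY 'CASE 2: SIZE ERROR'
               NOT ON SIZE ERROR
                   MOVE WS-RESULT TO WS-RESULT-EDIT
                   DISPLAY 'CASE 2 ROUNDED:   ' WS-RESULT-EDIT
           END-COMPUTE

      * Case 3: integer part needs four digits, so SIZE ERROR fires.
           MOVE 1234.5 TO WS-SOURCE
           COMPUTE WS-RESULT = WS-SOURCE
               ON SIZE ERROR
                   DISPLAY 'CASE 3: SIZE ERROR'
               NOT ON SIZE ERROR
                   MOVE WS-RESULT TO WS-RESULT-EDIT
                   DISPLAY 'CASE 3 RESULT:    ' WS-RESULT-EDIT
           END-COMPUTE
           STOP RUN.
```

Under these assumptions, the first case should display 123.45, the second 123.46, and the third should take the ON SIZE ERROR branch. ROUNDED is the usual remedy when dropped low-order digits matter, as they do in interest calculations.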

DIVIDE and REMAINDER

Division deserves special attention because of the divide-by-zero risk:

           IF WS-TRANSACTION-COUNT > ZERO
               DIVIDE WS-TOTAL-AMOUNT BY WS-TRANSACTION-COUNT
                   GIVING WS-AVERAGE-AMOUNT
                   REMAINDER WS-REMAINDER
                   ON SIZE ERROR
                       MOVE ZERO TO WS-AVERAGE-AMOUNT
                       DISPLAY 'Division overflow in average calc'
               END-DIVIDE
           ELSE
               MOVE ZERO TO WS-AVERAGE-AMOUNT
               DISPLAY 'No transactions — average is zero'
           END-IF

The defensive pattern here is twofold: check for zero before dividing, and use ON SIZE ERROR as a safety net in case the non-zero divisor still produces an overflow.

ON OVERFLOW for STRING and UNSTRING

The STRING and UNSTRING statements have their own overflow conditions:

           MOVE SPACES TO WS-FULL-NAME
      * The POINTER item must hold a valid starting position.
           MOVE 1 TO WS-STRING-PTR
           STRING WS-LAST-NAME  DELIMITED BY SPACES
                  ', '           DELIMITED BY SIZE
                  WS-FIRST-NAME DELIMITED BY SPACES
             INTO WS-FULL-NAME
             WITH POINTER WS-STRING-PTR
             ON OVERFLOW
                 DISPLAY 'Name too long for WS-FULL-NAME'
                 MOVE WS-LAST-NAME TO WS-FULL-NAME
           END-STRING

ON OVERFLOW fires when the receiving field is full before all source data has been moved, or when the POINTER value is outside the valid range for the receiving field. Without this check, data is silently truncated.

10.4 INVALID KEY for Indexed and Relative Files

For indexed and relative file operations, the INVALID KEY phrase detects key-related errors:

           READ ACCT-MASTER-FILE
               INVALID KEY
                   IF ACCT-NOT-FOUND
                       PERFORM 2300-HANDLE-NOT-FOUND
                   ELSE
                       MOVE 'UNEXPECTED KEY ERROR' TO WS-ERR-MSG
                       PERFORM 9800-LOG-ERROR
                   END-IF
               NOT INVALID KEY
                   PERFORM 2200-PROCESS-ACCOUNT
           END-READ

INVALID KEY applies to READ (random/dynamic), WRITE, REWRITE, DELETE, and START operations on indexed and relative files:

           WRITE ACCT-RECORD
               INVALID KEY
                   EVALUATE TRUE
                       WHEN ACCT-DUP-KEY
                           DISPLAY 'Duplicate account: ' ACCT-NUMBER
                           ADD 1 TO WS-DUP-COUNT
                       WHEN ACCT-FILE-FULL
                           DISPLAY 'ACCT file full, cannot add'
                           PERFORM 9900-ABEND-PROGRAM
                       WHEN OTHER
                           MOVE 'WRITE KEY ERROR' TO WS-ERR-MSG
                           PERFORM 9800-LOG-ERROR
                   END-EVALUATE
           END-WRITE

💡 Pro Tip — FILE STATUS vs. INVALID KEY You might wonder: if I check FILE STATUS after every operation, do I also need INVALID KEY? Technically, no — FILE STATUS gives you all the information INVALID KEY does, and more. However, many shops use both as a belt-and-suspenders approach. The FILE STATUS check is the authoritative error handler, while INVALID KEY provides inline handling for expected conditions (like record-not-found during a lookup). Use whichever approach your shop's standards require, but always use FILE STATUS at minimum.

10.5 DECLARATIVES and USE Statements

DECLARATIVES provide a mechanism for associating error-handling procedures with specific files or I/O operations. They are defined at the beginning of the PROCEDURE DIVISION and are invoked automatically by the runtime when certain conditions occur.

Syntax

       PROCEDURE DIVISION.
       DECLARATIVES.
       ACCT-FILE-ERROR SECTION.
           USE AFTER STANDARD ERROR PROCEDURE ON ACCT-MASTER-FILE.
       ACCT-FILE-ERROR-PARA.
           DISPLAY 'I/O ERROR ON ACCT-MASTER-FILE'
           DISPLAY 'FILE STATUS: ' WS-ACCT-STATUS
           DISPLAY 'OPERATION: ' WS-LAST-OPERATION
           PERFORM 9800-LOG-ERROR.

       TXN-FILE-ERROR SECTION.
           USE AFTER STANDARD ERROR PROCEDURE ON TXN-INPUT-FILE.
       TXN-FILE-ERROR-PARA.
           DISPLAY 'I/O ERROR ON TXN-INPUT-FILE'
           DISPLAY 'FILE STATUS: ' WS-TXN-STATUS
           PERFORM 9800-LOG-ERROR.

       END DECLARATIVES.

       MAIN-PROGRAM SECTION.
       0000-MAIN.
           PERFORM 1000-INITIALIZE
           ...

When DECLARATIVES Fire

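A USE AFTER STANDARD ERROR procedure is invoked after an unsuccessful I/O operation on the associated file. That includes an AT END or invalid key condition when the statement carries no AT END or INVALID KEY phrase to handle it, and it always includes permanent errors (3x), logic errors (4x), and vendor-specific errors (9x). The procedure runs before control returns to the statement following the failed I/O operation.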

Scope of DECLARATIVES

USE AFTER ERROR PROCEDURE ON file-name — applies to a specific file
USE AFTER ERROR PROCEDURE ON INPUT — applies to all input files
USE AFTER ERROR PROCEDURE ON OUTPUT — applies to all output files
USE AFTER ERROR PROCEDURE ON I-O — applies to all I-O files
USE AFTER ERROR PROCEDURE ON EXTEND — applies to all extend files

Practical Considerations

DECLARATIVES are powerful but have some constraints:

They must appear at the very beginning of the PROCEDURE DIVISION, before any other sections.
Code within DECLARATIVES traditionally cannot PERFORM procedures outside DECLARATIVES. Many current compilers relax this, and the syntax example above relies on that relaxation when it performs 9800-LOG-ERROR.
They add a layer of indirection that can make control flow harder to follow.

Many modern COBOL shops prefer explicit FILE STATUS checking over DECLARATIVES because it keeps the error handling visible at the point of the I/O operation. However, DECLARATIVES are useful as a safety net — a catch-all that fires if a programmer forgets to check FILE STATUS at a particular I/O statement.

⚖️ Design Decision — DECLARATIVES vs. Inline Checking At GlobalBank, Maria Chen uses a hybrid approach: DECLARATIVES as a safety net for permanent errors, with inline FILE STATUS checking for expected conditions. "DECLARATIVES are my smoke alarm," she says. "I never want it to go off, but I'm glad it's there." James Okafor at MedClaim prefers purely inline checking: "I want every error handler visible right next to the I/O statement. No surprises."

10.6 Return Codes and Error Propagation

In batch COBOL programs, the RETURN-CODE special register communicates the program's final status to the operating system (or to a calling program):

       01  WS-RETURN-CODE-VALUES.
           05  RC-SUCCESS            PIC S9(04) COMP VALUE +0.
           05  RC-WARNING            PIC S9(04) COMP VALUE +4.
           05  RC-ERROR              PIC S9(04) COMP VALUE +8.
           05  RC-SEVERE             PIC S9(04) COMP VALUE +12.
           05  RC-FATAL              PIC S9(04) COMP VALUE +16.

Setting RETURN-CODE

       9000-TERMINATE.
           EVALUATE TRUE
               WHEN WS-ERROR-COUNT > WS-MAX-ERRORS
                   MOVE RC-SEVERE TO RETURN-CODE
               WHEN WS-ERROR-COUNT > 0
                   MOVE RC-WARNING TO RETURN-CODE
               WHEN WS-REJECT-COUNT > 0
                   MOVE RC-WARNING TO RETURN-CODE
               WHEN OTHER
                   MOVE RC-SUCCESS TO RETURN-CODE
           END-EVALUATE

           DISPLAY 'PROGRAM COMPLETE — RC=' RETURN-CODE
           DISPLAY 'RECORDS PROCESSED: ' WS-PROCESS-COUNT
           DISPLAY 'RECORDS REJECTED:  ' WS-REJECT-COUNT
           DISPLAY 'ERRORS:            ' WS-ERROR-COUNT
      * CLOSE takes explicit file names; there is no close-all form.
           CLOSE ACCT-MASTER-FILE
                 TXN-INPUT-FILE
                 REPORT-FILE
           STOP RUN.

JCL Condition Code Checking

The RETURN-CODE value becomes the JCL condition code, which subsequent job steps can test:

//STEP02   EXEC PGM=NEXTSTEP,COND=(8,LT,STEP01)

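This says: bypass STEP02 if 8 is less than STEP01's return code. COND names the condition for skipping a step, and the test reads as "value, operator, return code". In practice, this means STEP02 runs only if STEP01 returned 8 or lower, and it is skipped when STEP01 ended with a higher code such as 12 or 16.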

📊 Industry Convention for Return Codes

| Code | Meaning | Typical Action |
|------|---------|----------------|
| 0 | Success | Continue normally |
| 4 | Warning | Continue but review |
| 8 | Error | Subsequent steps may be skipped |
| 12 | Severe error | Most subsequent steps skipped |
| 16 | Fatal error | All subsequent steps skipped |

Error Propagation in Called Programs

When a program CALLs a subprogram, the subprogram can communicate errors back through:

RETURN-CODE — set by the subprogram, visible to the caller after the CALL returns
Explicit parameters — a return status field passed as a parameter
Shared working storage — via EXTERNAL data items (less common)

      * In the calling program:
           CALL 'VALIDATE-ACCT' USING ACCT-RECORD
                                      WS-VALIDATE-RESULT
           EVALUATE WS-VALIDATE-RESULT
               WHEN 'OK'
                   PERFORM 3000-PROCESS-VALID-ACCT
               WHEN 'NF'
                   PERFORM 3100-HANDLE-NOT-FOUND
               WHEN 'LK'
                   PERFORM 3200-HANDLE-LOCKED
               WHEN OTHER
                   MOVE 'UNKNOWN VALIDATE RESULT'
                       TO WS-ERR-MSG
                   PERFORM 9800-LOG-ERROR
           END-EVALUATE

      * In the called subprogram VALIDATE-ACCT:
       LINKAGE SECTION.
       01  LS-ACCT-RECORD.
           COPY ACCT-REC.
       01  LS-RESULT                 PIC XX.

       PROCEDURE DIVISION USING LS-ACCT-RECORD
                                LS-RESULT.
           READ ACCT-MASTER-FILE
           EVALUATE WS-ACCT-STATUS
               WHEN '00'
                   MOVE 'OK' TO LS-RESULT
               WHEN '23'
                   MOVE 'NF' TO LS-RESULT
               WHEN '38'
                   MOVE 'LK' TO LS-RESULT
               WHEN OTHER
                   MOVE 'ER' TO LS-RESULT
           END-EVALUATE
           GOBACK.

10.7 Input Validation Techniques

Data coming from external sources — files, user input, data feeds, EDI transmissions — should never be trusted. This is not paranoia; it is experience. James Okafor keeps a log of every data quality issue MedClaim has encountered over the past five years. The log currently has 847 entries. The most common categories are: non-numeric data in numeric fields (23%), missing required fields (19%), dates in unexpected formats (14%), amount values out of valid range (12%), and duplicate records (8%). Every one of these issues would cause a program failure or incorrect output if not caught by validation. Defensive programs validate every field before processing.

Numeric Validation

COBOL's NUMERIC class test checks whether a field contains valid numeric data:

           IF WS-INPUT-AMOUNT IS NUMERIC
               MOVE WS-INPUT-AMOUNT TO WS-TXN-AMOUNT
           ELSE
               MOVE 'NON-NUMERIC AMOUNT' TO WS-ERR-MSG
               PERFORM 9800-LOG-ERROR
               SET WS-RECORD-INVALID TO TRUE
           END-IF

⚠️ Caution — NUMERIC Test Behavior The NUMERIC test checks if a field contains characters that are valid for the field's USAGE. For a DISPLAY numeric field (PIC 9), valid characters are 0-9 and, for signed fields, the sign encoding. For COMP-3 (packed decimal), the test checks for valid packed-decimal encoding. For alphanumeric fields (PIC X), the NUMERIC test checks if all characters are in the range 0-9. Be sure you understand which test is being applied based on the field's definition.
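To make the alphanumeric case concrete, here is a minimal sketch that applies the NUMERIC test to a PIC X(05) field holding several sample values. The program name NUMTEST and the field WS-INPUT-FIELD are hypothetical.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. NUMTEST.
      * Sketch only: all names are illustrative.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-INPUT-FIELD           PIC X(05).

       PROCEDURE DIVISION.
       0000-MAIN.
      * All digits.
           MOVE '12345' TO WS-INPUT-FIELD
           PERFORM 1000-CHECK-NUMERIC
      * Embedded space.
           MOVE '12 45' TO WS-INPUT-FIELD
           PERFORM 1000-CHECK-NUMERIC
      * Leading sign character.
           MOVE '-1234' TO WS-INPUT-FIELD
           PERFORM 1000-CHECK-NUMERIC
      * Short value: the MOVE pads it with two trailing spaces.
           MOVE '123' TO WS-INPUT-FIELD
           PERFORM 1000-CHECK-NUMERIC
           STOP RUN.

       1000-CHECK-NUMERIC.
           IF WS-INPUT-FIELD IS NUMERIC
               DISPLAY '[' WS-INPUT-FIELD '] NUMERIC'
           ELSE
               DISPLAY '[' WS-INPUT-FIELD '] NOT NUMERIC'
           END-IF.
```

Only the first value should pass. The embedded space, the leading sign character, and the trailing spaces that pad the three-character value each make an alphanumeric field fail the test, which is why short or left-justified input often needs explicit handling before a NUMERIC check.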

Range Validation

           IF CLM-BILLED-AMT >= ZERO
               AND CLM-BILLED-AMT <= WS-MAX-CLAIM-AMOUNT
               CONTINUE
           ELSE
               STRING 'CLAIM AMOUNT OUT OF RANGE: '
                      DELIMITED BY SIZE
                      CLM-ID DELIMITED BY SIZE
                 INTO WS-ERR-MSG
               END-STRING
               PERFORM 9800-LOG-ERROR
               SET WS-RECORD-INVALID TO TRUE
           END-IF

Date Validation

Dates are notoriously tricky. A defensive program validates not just format but logical correctness:

       5000-VALIDATE-DATE.
      * Assumes WS-VAL-DATE is PIC 9(8) in YYYYMMDD format
      * Reject non-numeric input before comparing its parts
           IF WS-VAL-DATE IS NOT NUMERIC
               SET WS-DATE-INVALID TO TRUE
               MOVE 'DATE NOT NUMERIC' TO WS-VAL-ERR-MSG
           ELSE
               MOVE WS-VAL-DATE(1:4) TO WS-VAL-YEAR
               MOVE WS-VAL-DATE(5:2) TO WS-VAL-MONTH
               MOVE WS-VAL-DATE(7:2) TO WS-VAL-DAY
               EVALUATE TRUE
                   WHEN WS-VAL-YEAR < 1900 OR WS-VAL-YEAR > 2099
                       SET WS-DATE-INVALID TO TRUE
                       MOVE 'YEAR OUT OF RANGE' TO WS-VAL-ERR-MSG
                   WHEN WS-VAL-MONTH < 01 OR WS-VAL-MONTH > 12
                       SET WS-DATE-INVALID TO TRUE
                       MOVE 'MONTH OUT OF RANGE' TO WS-VAL-ERR-MSG
                   WHEN OTHER
                       PERFORM 5100-VALIDATE-DAY
               END-EVALUATE
           END-IF.

       5100-VALIDATE-DAY.
           EVALUATE WS-VAL-MONTH
               WHEN 01 WHEN 03 WHEN 05 WHEN 07
               WHEN 08 WHEN 10 WHEN 12
                   IF WS-VAL-DAY < 01 OR WS-VAL-DAY > 31
                       SET WS-DATE-INVALID TO TRUE
                   END-IF
               WHEN 04 WHEN 06 WHEN 09 WHEN 11
                   IF WS-VAL-DAY < 01 OR WS-VAL-DAY > 30
                       SET WS-DATE-INVALID TO TRUE
                   END-IF
               WHEN 02
                   PERFORM 5200-CHECK-FEBRUARY
           END-EVALUATE.

       5200-CHECK-FEBRUARY.
           IF FUNCTION MOD(WS-VAL-YEAR, 4) = 0
               AND (FUNCTION MOD(WS-VAL-YEAR, 100) NOT = 0
                OR FUNCTION MOD(WS-VAL-YEAR, 400) = 0)
               IF WS-VAL-DAY < 01 OR WS-VAL-DAY > 29
                   SET WS-DATE-INVALID TO TRUE
               END-IF
           ELSE
               IF WS-VAL-DAY < 01 OR WS-VAL-DAY > 28
                   SET WS-DATE-INVALID TO TRUE
               END-IF
           END-IF.

Condition-Name Validation

88-level condition names provide elegant validation for coded fields:

       01  WS-INPUT-ACCT-TYPE        PIC X(02).
           88  VALID-ACCT-TYPE        VALUE 'CH' 'SV' 'MM' 'CD'.

       ...

           IF VALID-ACCT-TYPE
               CONTINUE
           ELSE
               STRING 'INVALID ACCT TYPE: '
                      DELIMITED BY SIZE
                      WS-INPUT-ACCT-TYPE
                      DELIMITED BY SIZE
                 INTO WS-ERR-MSG
               END-STRING
               PERFORM 9800-LOG-ERROR
               SET WS-RECORD-INVALID TO TRUE
           END-IF

Cross-Field Validation

Sometimes individual fields are valid but their combination is not:

      * A closed account should have a zero balance
           IF ACCT-CLOSED AND ACCT-CURR-BALANCE NOT = ZERO
               MOVE 'CLOSED ACCT WITH NON-ZERO BALANCE'
                   TO WS-ERR-MSG
               PERFORM 9800-LOG-ERROR
               SET WS-RECORD-SUSPICIOUS TO TRUE
           END-IF

      * A withdrawal cannot exceed available balance
           IF TXN-WITHDRAW AND
              TXN-AMOUNT > ACCT-AVAIL-BALANCE
               MOVE 'WITHDRAWAL EXCEEDS AVAILABLE BAL'
                   TO WS-ERR-MSG
               PERFORM 9800-LOG-ERROR
               SET WS-TXN-REJECTED TO TRUE
           END-IF

Building a Validation Framework

For programs that validate many fields, a structured approach prevents the code from becoming an unreadable chain of IF statements:

       01  WS-VALIDATION-RESULTS.
           05  WS-VAL-ERROR-COUNT    PIC 9(03) VALUE ZERO.
           05  WS-VAL-ERRORS.
               10  WS-VAL-ERROR      OCCURS 20 TIMES.
                   15  WS-VAL-ERR-FIELD  PIC X(30).
                   15  WS-VAL-ERR-MSG    PIC X(50).
           05  WS-VAL-OVERALL        PIC X(01).
               88  WS-RECORD-VALID   VALUE 'V'.
               88  WS-RECORD-INVALID VALUE 'I'.

       ...

       4000-VALIDATE-CLAIM.
           MOVE ZERO TO WS-VAL-ERROR-COUNT
           SET WS-RECORD-VALID TO TRUE

           PERFORM 4100-VAL-CLAIM-ID
           PERFORM 4200-VAL-CLAIM-TYPE
           PERFORM 4300-VAL-MEMBER-INFO
           PERFORM 4400-VAL-PROVIDER-INFO
           PERFORM 4500-VAL-FINANCIAL
           PERFORM 4600-VAL-DATES
           PERFORM 4700-VAL-CROSS-FIELD

           IF WS-VAL-ERROR-COUNT > 0
               SET WS-RECORD-INVALID TO TRUE
           END-IF.

       4100-VAL-CLAIM-ID.
           IF CLM-ID = SPACES OR CLM-ID = LOW-VALUES
      * Guard the subscript: the error table holds 20 entries
               IF WS-VAL-ERROR-COUNT < 20
                   ADD 1 TO WS-VAL-ERROR-COUNT
                   MOVE 'CLM-ID'
                       TO WS-VAL-ERR-FIELD(WS-VAL-ERROR-COUNT)
                   MOVE 'CLAIM ID IS BLANK'
                       TO WS-VAL-ERR-MSG(WS-VAL-ERROR-COUNT)
               END-IF
           END-IF.

This pattern collects all validation errors for a record rather than stopping at the first one, which makes error reporting much more useful.

Try It Yourself — Validation Gauntlet

Create a program that reads employee records and runs them through a comprehensive validation gauntlet. Each record has these fields:

Employee ID (PIC 9(6)) — must be numeric, non-zero
Department (PIC X(3)) — must be one of: ENG, MKT, FIN, HR, OPS
Salary (PIC 9(6)V99) — must be between 25,000.00 and 500,000.00
Hire date (PIC 9(8)) — must be a valid YYYYMMDD date, not in the future
Termination date (PIC 9(8)) — if non-zero, must be after hire date
Manager ID (PIC 9(6)) — must be different from employee ID

Collect all errors per record (do not stop at the first). Write valid records to one file and invalid records to another, with all error descriptions appended. Count the number of records with 1 error, 2 errors, 3+ errors, and display these statistics at termination.

This exercise forces you to think about cross-field validation (termination date vs. hire date, manager ID vs. employee ID) and multi-error collection, which are skills that distinguish production-quality code from classroom exercises.

Validation Performance Considerations

In a high-volume batch program processing millions of records, validation can become a performance bottleneck. Some techniques to keep validation fast:

Use 88-level conditions instead of IF chains. Checking IF VALID-DEPT-CODE (an 88-level with all valid values) is easier to read and maintain than IF DEPT = 'ENG' OR DEPT = 'MKT' OR DEPT = 'FIN' OR ..., and compilers generally generate code for it that is at least as efficient.
Validate in decreasing likelihood of failure. If 99% of records have valid department codes but only 90% have valid dates, check dates first. This allows you to skip remaining validations sooner for invalid records (if using a stop-at-first-error strategy).
Pre-load reference data. If validation requires checking against a reference table (e.g., valid provider NPIs), load the table into working-storage arrays during initialization rather than performing file I/O during validation.
Consider binary search for large reference tables. If your validation reference table has thousands of entries, use SEARCH ALL (binary search) rather than SEARCH (sequential search). Binary search is O(log n) versus O(n) for sequential search.

       01  WS-VALID-CODES.
           05  WS-CODE-COUNT        PIC 9(04) VALUE 50.
           05  WS-CODE-TABLE.
               10  WS-CODE-ENTRY    PIC X(05)
                                    OCCURS 50 TIMES
                                    ASCENDING KEY WS-CODE-ENTRY
                                    INDEXED BY WS-CODE-IDX.

       ...

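      * Binary search for valid code. SEARCH ALL sets the
      * index itself, so no SET ... TO 1 is needed. The table
      * must be fully loaded and sorted in ascending order.
           SEARCH ALL WS-CODE-ENTRY
               AT END
                   SET RECORD-INVALID TO TRUE
                   MOVE 'INVALID CODE' TO WS-ERR-MSG
               WHEN WS-CODE-ENTRY(WS-CODE-IDX) =
                   WS-INPUT-CODE
                   CONTINUE
           END-SEARCH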
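The 88-level technique from the list above can be sketched as a small standalone program. This is a minimal illustration, not code from TXN-PROC or CLM-INTAKE: the program name DEPTCHK and the names WS-DEPT-CODE and VALID-DEPT-CODE are assumptions, and the department list is borrowed from the exercise earlier in this section.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. DEPTCHK.
      * Illustrative sketch: DEPTCHK, WS-DEPT-CODE and
      * VALID-DEPT-CODE are assumed names, not chapter code.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-DEPT-CODE              PIC X(03).
           88  VALID-DEPT-CODE       VALUE 'ENG' 'MKT' 'FIN'
                                           'HR ' 'OPS'.
       PROCEDURE DIVISION.
       0000-MAIN.
           MOVE 'FIN' TO WS-DEPT-CODE
           PERFORM 1000-CHECK-DEPT
           MOVE 'XYZ' TO WS-DEPT-CODE
           PERFORM 1000-CHECK-DEPT
           STOP RUN.

       1000-CHECK-DEPT.
      * One condition-name test replaces the chain of ORs
           IF VALID-DEPT-CODE
               DISPLAY WS-DEPT-CODE ' IS VALID'
           ELSE
               DISPLAY WS-DEPT-CODE ' IS NOT VALID'
           END-IF.
```

Keeping the valid values in the data definition means a new department code is a one-line change in WORKING-STORAGE (or in a copybook), with no edits to the procedure logic.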

10.8 Boundary Condition Handling

Boundary conditions are the edges of your program's expected input space — the first record, the last record, empty files, maximum-length fields, zero amounts, negative values, and so on.

Empty File Handling

       2000-PROCESS-LOOP.
           PERFORM 2100-READ-NEXT-TXN
           IF WS-END-OF-FILE AND WS-TXN-READ-COUNT = ZERO
               DISPLAY 'WARNING: Input file is empty'
               MOVE RC-WARNING TO RETURN-CODE
           ELSE
               PERFORM UNTIL WS-END-OF-FILE
                   PERFORM 3000-PROCESS-TRANSACTION
                   PERFORM 2100-READ-NEXT-TXN
               END-PERFORM
           END-IF.

Table Overflow

When loading data into a table (OCCURS), always check bounds:

       01  WS-ERROR-TABLE.
           05  WS-MAX-ERRORS         PIC 9(03) VALUE 100.
           05  WS-ERROR-COUNT        PIC 9(03) VALUE ZERO.
           05  WS-ERRORS OCCURS 100 TIMES.
               10  WS-ERR-CODE       PIC X(04).
               10  WS-ERR-TEXT       PIC X(60).

       ...

       ADD-ERROR-ENTRY.
           IF WS-ERROR-COUNT >= WS-MAX-ERRORS
               DISPLAY 'ERROR TABLE FULL — ERRORS TRUNCATED'
           ELSE
               ADD 1 TO WS-ERROR-COUNT
               MOVE WS-NEW-ERR-CODE
                   TO WS-ERR-CODE(WS-ERROR-COUNT)
               MOVE WS-NEW-ERR-TEXT
                   TO WS-ERR-TEXT(WS-ERROR-COUNT)
           END-IF.

Counter Overflow

Even counters can overflow. If your program processes millions of records but your counter is PIC 9(5) (max 99,999), you have a problem:

      * Safe counter increment (WS-RECORD-COUNT is PIC 9(07))
           IF WS-RECORD-COUNT < 9999999
               ADD 1 TO WS-RECORD-COUNT
           ELSE
               DISPLAY 'WARNING: Record counter overflow'
               DISPLAY '         Count frozen at 9999999'
           END-IF

Better yet, size your counters appropriately from the start. GlobalBank's TXN-PROC uses PIC 9(09) for transaction counters, which holds up to 999,999,999 records.
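An alternative to comparing against a hard-coded maximum is to let ON SIZE ERROR detect the overflow. The sketch below is a hedged illustration with assumed names (CNTSAFE, WS-SMALL-COUNT, COUNT-OVERFLOWED); it uses a deliberately undersized PIC 9(02) counter and reports the overflow once rather than on every later record.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. CNTSAFE.
      * Illustrative sketch: all names are assumptions.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-SMALL-COUNT            PIC 9(02) VALUE ZERO.
       01  WS-LOOP-IDX               PIC 9(03) VALUE ZERO.
       01  WS-OVERFLOW-FLAG          PIC X(01) VALUE 'N'.
           88  COUNT-OVERFLOWED      VALUE 'Y'.
       PROCEDURE DIVISION.
       0000-MAIN.
           PERFORM VARYING WS-LOOP-IDX FROM 1 BY 1
                   UNTIL WS-LOOP-IDX > 100
      * On SIZE ERROR the counter keeps its previous value
               ADD 1 TO WS-SMALL-COUNT
                   ON SIZE ERROR
                       IF NOT COUNT-OVERFLOWED
                           SET COUNT-OVERFLOWED TO TRUE
                           DISPLAY 'WARNING: COUNTER OVERFLOW AT '
                                   'ITERATION ' WS-LOOP-IDX
                       END-IF
               END-ADD
           END-PERFORM
           DISPLAY 'FINAL COUNT: ' WS-SMALL-COUNT
           STOP RUN.
```

Because the condition is raised by the arithmetic itself, the check stays correct if the counter's PICTURE is later widened, which is not true of a literal such as 9999999.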

Try It Yourself — Boundary Testing

Write a program that demonstrates all of the boundary conditions described above:

Create an input file with a known number of records (e.g., exactly 100).
Process the file with a PIC 9(2) counter (max 99) and observe what happens at record 100.
Add a table (OCCURS 20 TIMES) and try to add 25 entries.
Create an empty file and run the program — verify your empty file handling works.
Include a counter that will overflow with ON SIZE ERROR handling.

For each boundary, document: (a) what happens without the defensive check, and (b) what happens with it. This exercise builds the instinct to always ask "what happens at the edges?"

The Sentinel Value Pattern

A common boundary technique is the sentinel value — a special value that signals the end of data. In COBOL, HIGH-VALUES (x'FF...') and LOW-VALUES (x'00...') are natural sentinels.

When processing two sorted files in parallel (such as in the match-update pattern from Chapter 11), set the key to HIGH-VALUES when a file reaches end-of-file. This causes the exhausted file's key to compare higher than any valid key in the other file, so all remaining records from the non-exhausted file are processed naturally. The key must be an alphanumeric (PIC X) field for this to work, because HIGH-VALUES is not valid data for a numeric or COMP-3 key:

       2100-READ-MASTER.
           READ MASTER-FILE INTO WS-MASTER-RECORD
           EVALUATE WS-MASTER-STATUS
               WHEN '00'
                   ADD 1 TO WS-MASTER-READ
               WHEN '10'
                   MOVE HIGH-VALUES TO WS-MASTER-KEY
                   SET MASTER-EOF TO TRUE
               WHEN OTHER
                   PERFORM 9900-ABEND-PROGRAM
           END-EVALUATE.

This technique eliminates the need for complex end-of-file logic in the main processing loop. The loop simply compares keys, and the sentinel values handle the exhaustion cases automatically. It is elegant and widely used in production COBOL.
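To show how the sentinel simplifies the main loop, here is a minimal sketch in which two small in-memory tables stand in for the sorted master and transaction files. The program name, table contents, and the 2200-READ-TXN paragraph are assumptions made for the example; a real match-update would read from files and usually allows several transactions per master key.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. SENTINEL.
      * Illustrative sketch: two in-memory tables stand in for
      * the sorted master and transaction files. All names here
      * are assumptions for the example.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-MASTER-DATA.
           05  FILLER                PIC X(04) VALUE 'A001'.
           05  FILLER                PIC X(04) VALUE 'A003'.
       01  WS-MASTER-TABLE REDEFINES WS-MASTER-DATA.
           05  WS-MST-ENTRY          PIC X(04) OCCURS 2 TIMES.
       01  WS-TXN-DATA.
           05  FILLER                PIC X(04) VALUE 'A001'.
           05  FILLER                PIC X(04) VALUE 'A002'.
           05  FILLER                PIC X(04) VALUE 'A004'.
       01  WS-TXN-TABLE REDEFINES WS-TXN-DATA.
           05  WS-TXN-ENTRY          PIC X(04) OCCURS 3 TIMES.
       01  WS-MST-POS                PIC 9(02) VALUE ZERO.
       01  WS-TXN-POS                PIC 9(02) VALUE ZERO.
       01  WS-MASTER-KEY             PIC X(04).
       01  WS-TXN-KEY                PIC X(04).
       PROCEDURE DIVISION.
       0000-MAIN.
           PERFORM 2100-READ-MASTER
           PERFORM 2200-READ-TXN
      * Loop ends only when BOTH keys hold the sentinel
           PERFORM UNTIL WS-MASTER-KEY = HIGH-VALUES
                     AND WS-TXN-KEY = HIGH-VALUES
               EVALUATE TRUE
                   WHEN WS-MASTER-KEY = WS-TXN-KEY
                       DISPLAY 'MATCH:          ' WS-MASTER-KEY
                       PERFORM 2100-READ-MASTER
                       PERFORM 2200-READ-TXN
                   WHEN WS-MASTER-KEY < WS-TXN-KEY
                       DISPLAY 'MASTER ONLY:    ' WS-MASTER-KEY
                       PERFORM 2100-READ-MASTER
                   WHEN OTHER
                       DISPLAY 'UNMATCHED TXN:  ' WS-TXN-KEY
                       PERFORM 2200-READ-TXN
               END-EVALUATE
           END-PERFORM
           STOP RUN.

       2100-READ-MASTER.
      * Simulated READ: sentinel replaces the AT END branch
           IF WS-MST-POS < 2
               ADD 1 TO WS-MST-POS
               MOVE WS-MST-ENTRY(WS-MST-POS) TO WS-MASTER-KEY
           ELSE
               MOVE HIGH-VALUES TO WS-MASTER-KEY
           END-IF.

       2200-READ-TXN.
           IF WS-TXN-POS < 3
               ADD 1 TO WS-TXN-POS
               MOVE WS-TXN-ENTRY(WS-TXN-POS) TO WS-TXN-KEY
           ELSE
               MOVE HIGH-VALUES TO WS-TXN-KEY
           END-IF.
```

Note that the loop contains no end-of-file flags at all. Once the master table is exhausted, its key compares high and the remaining transaction falls through the WHEN OTHER branch without any special-case code.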

10.9 Graceful Degradation Patterns

Not every error should be fatal. Defensive programs distinguish between errors that must stop processing and errors that can be handled, logged, and skipped.

The Error Threshold Pattern

       01  WS-ERROR-CONTROL.
           05  WS-MAX-ERRORS         PIC 9(05) VALUE 100.
           05  WS-CONSEC-ERRORS      PIC 9(03) VALUE ZERO.
           05  WS-MAX-CONSEC         PIC 9(03) VALUE 10.
           05  WS-TOTAL-ERRORS       PIC 9(05) VALUE ZERO.

       ...

       3000-PROCESS-TRANSACTION.
           PERFORM 3100-VALIDATE-TXN
           IF WS-RECORD-VALID
               PERFORM 3200-APPLY-TXN
               MOVE ZERO TO WS-CONSEC-ERRORS
           ELSE
               ADD 1 TO WS-TOTAL-ERRORS
               ADD 1 TO WS-CONSEC-ERRORS
               PERFORM 3300-WRITE-REJECT
               IF WS-TOTAL-ERRORS >= WS-MAX-ERRORS
                   DISPLAY 'MAX TOTAL ERRORS REACHED'
                   PERFORM 9900-ABEND-PROGRAM
               END-IF
               IF WS-CONSEC-ERRORS >= WS-MAX-CONSEC
                   DISPLAY 'MAX CONSECUTIVE ERRORS REACHED'
                   DISPLAY 'POSSIBLE SYSTEMIC DATA PROBLEM'
                   PERFORM 9900-ABEND-PROGRAM
               END-IF
           END-IF.

The consecutive error counter is particularly valuable. If ten records in a row fail validation, the problem is probably systemic (wrong file, corrupt data, schema mismatch) rather than individual bad records. Continuing would be pointless.

The Bypass Pattern

When a non-critical operation fails, log it and continue:

       3500-GENERATE-NOTIFICATION.
      * Notification is nice-to-have, not critical
           CALL 'NOTIFY-SVC' USING WS-NOTIFY-PARMS
                                    WS-NOTIFY-STATUS
           IF WS-NOTIFY-STATUS NOT = 'OK'
               DISPLAY 'WARNING: Notification failed'
               DISPLAY '         Continuing processing'
               ADD 1 TO WS-WARNING-COUNT
           END-IF.

The Retry Pattern

For transient errors (file lock, network timeout), retrying can resolve the issue:

       01  WS-RETRY-CONTROL.
           05  WS-MAX-RETRIES        PIC 9(02) VALUE 03.
           05  WS-RETRY-COUNT        PIC 9(02) VALUE ZERO.
           05  WS-RETRY-NEEDED       PIC X(01).
               88  RETRY-YES         VALUE 'Y'.
               88  RETRY-NO          VALUE 'N'.

       ...

       2500-READ-WITH-RETRY.
           MOVE ZERO TO WS-RETRY-COUNT
           SET RETRY-YES TO TRUE
           PERFORM UNTIL RETRY-NO
               READ ACCT-MASTER-FILE
               EVALUATE WS-ACCT-STATUS
                   WHEN '00'
                       SET RETRY-NO TO TRUE
                       PERFORM 2600-PROCESS-RECORD
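      * '51' = record locked in COBOL 2002 and GnuCOBOL.
      * Lock status codes are implementation-dependent;
      * check your compiler's FILE STATUS table.
                   WHEN '51'
                       ADD 1 TO WS-RETRY-COUNT
                       IF WS-RETRY-COUNT >= WS-MAX-RETRIES
                           DISPLAY 'RECORD LOCKED AFTER '
                                   WS-MAX-RETRIES ' RETRIES'
                           SET RETRY-NO TO TRUE
                           PERFORM 9800-LOG-ERROR
                       ELSE
                           DISPLAY 'RECORD LOCKED, RETRY '
                                   WS-RETRY-COUNT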
                       END-IF
                   WHEN OTHER
                       SET RETRY-NO TO TRUE
                       PERFORM 9800-LOG-ERROR
               END-EVALUATE
           END-PERFORM.

🧪 Real-World Note In true z/OS batch COBOL, you cannot easily implement a timed wait between retries (there is no SLEEP verb). The retry loop above retries immediately. In CICS (online) COBOL, you have access to EXEC CICS DELAY, which enables proper backoff-and-retry patterns. We will cover CICS error handling in Chapter 30.

The Dead Letter Queue Pattern

Borrowed from messaging systems, the dead letter queue pattern captures records that cannot be processed for any reason — not just validation failures, but also records that cause unexpected errors during processing. The dead letter file contains the original record plus diagnostic information:

       01  WS-DEAD-LETTER.
           05  DL-ORIGINAL-RECORD    PIC X(500).
           05  DL-ERROR-TYPE         PIC X(10).
               88  DL-VALIDATION     VALUE 'VALIDATION'.
               88  DL-PROCESSING     VALUE 'PROCESSING'.
               88  DL-REFERENCE      VALUE 'REFERENCE '.
               88  DL-SYSTEM         VALUE 'SYSTEM    '.
           05  DL-ERROR-CODE         PIC X(04).
           05  DL-ERROR-MESSAGE      PIC X(80).
           05  DL-PROGRAM-NAME       PIC X(08).
           05  DL-PARAGRAPH-NAME     PIC X(30).
           05  DL-TIMESTAMP          PIC X(26).
           05  DL-RECORD-NUMBER      PIC 9(09).
           05  DL-RETRY-COUNT        PIC 9(02) VALUE ZERO.

The DL-ERROR-TYPE field distinguishes between different classes of failure: VALIDATION errors are data quality issues, PROCESSING errors are logic failures, REFERENCE errors are referential integrity failures (e.g., the account does not exist), and SYSTEM errors are I/O failures or resource issues.

A companion program can later read the dead letter file, attempt to reprocess correctable records, and generate a human-readable exception report for records that require manual intervention. This pattern is particularly useful in MedClaim's environment, where claims that fail processing should not be silently discarded — they represent revenue.
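As a rough sketch of how a program might fill in a dead letter entry, the example below uses a shortened version of the layout above and a DISPLAY in place of the WRITE to the dead letter file. The paragraph name 3400-WRITE-DEAD-LETTER, the error code 'E023', and the sample record contents are hypothetical.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. DLQDEMO.
      * Illustrative sketch with a shortened record layout;
      * DISPLAY stands in for the WRITE to the dead letter file.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-INPUT-RECORD           PIC X(40)
               VALUE 'CLM0001234 MEMBER 99887766'.
       01  WS-RECORD-NUMBER          PIC 9(09) VALUE 42.
       01  WS-DEAD-LETTER.
           05  DL-ORIGINAL-RECORD    PIC X(40).
           05  DL-ERROR-TYPE         PIC X(10).
               88  DL-VALIDATION     VALUE 'VALIDATION'.
               88  DL-REFERENCE      VALUE 'REFERENCE '.
           05  DL-ERROR-CODE         PIC X(04).
           05  DL-ERROR-MESSAGE      PIC X(40).
           05  DL-PROGRAM-NAME       PIC X(08).
           05  DL-PARAGRAPH-NAME     PIC X(30).
           05  DL-RECORD-NUMBER      PIC 9(09).
       PROCEDURE DIVISION.
       0000-MAIN.
           PERFORM 3400-WRITE-DEAD-LETTER
           STOP RUN.

       3400-WRITE-DEAD-LETTER.
      * Start from a clean record so no stale data leaks in
           INITIALIZE WS-DEAD-LETTER
           MOVE WS-INPUT-RECORD  TO DL-ORIGINAL-RECORD
           SET DL-REFERENCE      TO TRUE
           MOVE 'E023'           TO DL-ERROR-CODE
           MOVE 'MEMBER ID NOT ON MEMBER FILE'
                                 TO DL-ERROR-MESSAGE
           MOVE 'DLQDEMO'        TO DL-PROGRAM-NAME
           MOVE '3400-WRITE-DEAD-LETTER'
                                 TO DL-PARAGRAPH-NAME
           MOVE WS-RECORD-NUMBER TO DL-RECORD-NUMBER
           DISPLAY DL-ERROR-TYPE ' ' DL-ERROR-CODE ' REC '
                   DL-RECORD-NUMBER ' ' DL-ERROR-MESSAGE.
```

Setting the error type through its 88-level (SET DL-REFERENCE TO TRUE) keeps the literal values in one place, so the companion reprocessing program can test the same condition names.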

🔵 Modern Practice The dead letter queue concept is native to modern messaging systems like IBM MQ, Apache Kafka, and Amazon SQS. In mainframe COBOL batch processing, the reject file serves the same purpose. When mainframe shops integrate with message queues, they often map between COBOL reject files and MQ dead letter queues as part of their modernization strategy.

Defensive Programming and Performance

A common objection to defensive programming is performance overhead. Checking FILE STATUS, validating every field, and writing structured log entries all cost CPU time. How significant is the overhead?

In practice, the overhead is negligible compared to I/O costs. For a batch program that processes 2 million records, the FILE STATUS checks add perhaps 0.5 seconds of CPU time. The validations add perhaps 2-3 seconds. The I/O operations — reading from disk, writing to disk — consume 95% of the elapsed time. The defensive checks occupy less than 1% of the total runtime.

James Okafor puts it bluntly: "I've never seen a COBOL batch program where defensive programming was the performance bottleneck. I've seen dozens where the lack of defensive programming was the availability bottleneck — because the program ABENDed and had to be rerun, costing hours of wall-clock time."

There are two exceptions where performance matters:

Reference validation with file I/O. If you validate every claim's member ID by reading the MEMBER-FILE, that is an I/O operation per record. For high-volume files, consider loading reference data into working-storage tables during initialization.
Logging at trace level. Writing a log entry for every successfully processed record (not just errors) can generate enormous log files. Use severity-based logging: log everything at ERROR and FATAL levels, but log INFO and DEBUG only when explicitly enabled (perhaps via a runtime parameter).
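Severity-based logging can be sketched with a numeric threshold. The level numbering, the paragraph 9700-LOG-IF-ENABLED, and every field name below are assumptions for illustration; in a real program the threshold would more likely be set from a runtime parameter than from a VALUE clause.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. LOGLEVEL.
      * Illustrative sketch: the numeric level scheme and all
      * names are assumptions for this example.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * 1=DEBUG 2=INFO 3=WARN 4=ERROR 5=FATAL
       01  WS-LOG-THRESHOLD          PIC 9(01) VALUE 4.
       01  WS-MSG-LEVEL              PIC 9(01).
       01  WS-MSG-TEXT               PIC X(40).
       01  WS-SUPPRESSED-COUNT       PIC 9(09) VALUE ZERO.
       PROCEDURE DIVISION.
       0000-MAIN.
           MOVE 2 TO WS-MSG-LEVEL
           MOVE 'RECORD PROCESSED' TO WS-MSG-TEXT
           PERFORM 9700-LOG-IF-ENABLED
           MOVE 4 TO WS-MSG-LEVEL
           MOVE 'ACCOUNT NOT FOUND' TO WS-MSG-TEXT
           PERFORM 9700-LOG-IF-ENABLED
           DISPLAY 'SUPPRESSED ENTRIES: ' WS-SUPPRESSED-COUNT
           STOP RUN.

       9700-LOG-IF-ENABLED.
      * Write only entries at or above the threshold
           IF WS-MSG-LEVEL >= WS-LOG-THRESHOLD
               DISPLAY 'LOG[' WS-MSG-LEVEL '] ' WS-MSG-TEXT
           ELSE
               ADD 1 TO WS-SUPPRESSED-COUNT
           END-IF.
```

Counting the suppressed entries costs almost nothing and lets the termination summary show how much detail was skipped, which helps when deciding whether to rerun with a lower threshold.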

10.10 Logging and Audit Trails

A well-designed error log is worth more than the cleverest error-handling code. When problems occur at 2 AM and the on-call operator needs to diagnose and resolve the issue, the log is all they have.

Log Record Design

       01  WS-LOG-RECORD.
           05  WS-LOG-TIMESTAMP      PIC X(26).
           05  FILLER                 PIC X(01) VALUE '|'.
           05  WS-LOG-SEVERITY       PIC X(05).
           05  FILLER                 PIC X(01) VALUE '|'.
           05  WS-LOG-PROGRAM        PIC X(08).
           05  FILLER                 PIC X(01) VALUE '|'.
           05  WS-LOG-PARAGRAPH      PIC X(30).
           05  FILLER                 PIC X(01) VALUE '|'.
           05  WS-LOG-FILE-STATUS    PIC X(02).
           05  FILLER                 PIC X(01) VALUE '|'.
           05  WS-LOG-RECORD-KEY     PIC X(20).
           05  FILLER                 PIC X(01) VALUE '|'.
           05  WS-LOG-MESSAGE        PIC X(80).

The Logging Paragraph

       9800-LOG-ERROR.
           MOVE FUNCTION CURRENT-DATE TO WS-CURR-DATE-TIME
      * STRING does not pad the target, so clear it first
           MOVE SPACES TO WS-LOG-TIMESTAMP
           STRING WS-CURR-YEAR     DELIMITED BY SIZE
                  '-'               DELIMITED BY SIZE
                  WS-CURR-MONTH    DELIMITED BY SIZE
                  '-'               DELIMITED BY SIZE
                  WS-CURR-DAY      DELIMITED BY SIZE
                  ' '               DELIMITED BY SIZE
                  WS-CURR-HOUR     DELIMITED BY SIZE
                  ':'               DELIMITED BY SIZE
                  WS-CURR-MINUTE   DELIMITED BY SIZE
                  ':'               DELIMITED BY SIZE
                  WS-CURR-SECOND   DELIMITED BY SIZE
             INTO WS-LOG-TIMESTAMP
           END-STRING

           MOVE 'ERROR' TO WS-LOG-SEVERITY
           MOVE WS-PROGRAM-NAME TO WS-LOG-PROGRAM
           MOVE WS-CURRENT-PARA TO WS-LOG-PARAGRAPH
           MOVE WS-CURRENT-STATUS TO WS-LOG-FILE-STATUS
           MOVE WS-CURRENT-KEY TO WS-LOG-RECORD-KEY
           MOVE WS-ERR-MSG TO WS-LOG-MESSAGE

           WRITE LOG-FILE-RECORD FROM WS-LOG-RECORD
           IF NOT LOG-SUCCESS
               DISPLAY 'CRITICAL: Cannot write to error log'
               DISPLAY WS-LOG-RECORD
           END-IF

           ADD 1 TO WS-ERROR-COUNT.

Note the recursive defense: the logging paragraph itself checks whether the log write succeeded. If the log file is also broken, it falls back to DISPLAY, which goes to SYSOUT on z/OS — always available, if not always monitored.
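The logging paragraph relies on a WS-CURR-DATE-TIME group that is not shown above. A plausible layout, assuming the field names used in 9800-LOG-ERROR, follows the 21-character value returned by FUNCTION CURRENT-DATE. The two trailing fields (WS-CURR-HUNDREDTHS and WS-CURR-GMT-OFFSET) are names invented for this sketch.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. TSDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * FUNCTION CURRENT-DATE returns 21 characters:
      * YYYYMMDDHHMMSShh followed by the UTC offset (+/-hhmm)
       01  WS-CURR-DATE-TIME.
           05  WS-CURR-YEAR          PIC 9(04).
           05  WS-CURR-MONTH         PIC 9(02).
           05  WS-CURR-DAY           PIC 9(02).
           05  WS-CURR-HOUR          PIC 9(02).
           05  WS-CURR-MINUTE        PIC 9(02).
           05  WS-CURR-SECOND        PIC 9(02).
           05  WS-CURR-HUNDREDTHS    PIC 9(02).
           05  WS-CURR-GMT-OFFSET    PIC X(05).
       01  WS-LOG-TIMESTAMP          PIC X(26).
       PROCEDURE DIVISION.
       0000-MAIN.
           MOVE FUNCTION CURRENT-DATE TO WS-CURR-DATE-TIME
      * STRING does not pad the target, so clear it first
           MOVE SPACES TO WS-LOG-TIMESTAMP
           STRING WS-CURR-YEAR   '-' WS-CURR-MONTH  '-'
                  WS-CURR-DAY    ' ' WS-CURR-HOUR   ':'
                  WS-CURR-MINUTE ':' WS-CURR-SECOND
                  DELIMITED BY SIZE
             INTO WS-LOG-TIMESTAMP
           END-STRING
           DISPLAY 'TIMESTAMP: ' WS-LOG-TIMESTAMP
           STOP RUN.
```

The formatted value occupies 19 of the 26 characters in WS-LOG-TIMESTAMP; the remainder stays blank, leaving room for fractional seconds or an offset if a shop's log format requires them.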

Structured Logging for Modern Analysis

Modern operations teams parse logs with tools like Splunk or ELK. Producing structured output helps:

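      * JSON-style log entry for Splunk ingestion.
      * Values are not escaped: a quote or backslash in the
      * message would produce invalid JSON.
           MOVE SPACES TO WS-JSON-LOG-LINE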
           STRING '{"ts":"'          DELIMITED BY SIZE
                  WS-LOG-TIMESTAMP   DELIMITED BY SIZE
                  '","pgm":"'        DELIMITED BY SIZE
                  WS-LOG-PROGRAM     DELIMITED BY SPACES
                  '","sev":"'        DELIMITED BY SIZE
                  WS-LOG-SEVERITY    DELIMITED BY SPACES
                  '","para":"'       DELIMITED BY SIZE
                  WS-LOG-PARAGRAPH   DELIMITED BY SPACES
                  '","fs":"'         DELIMITED BY SIZE
                  WS-LOG-FILE-STATUS DELIMITED BY SIZE
                  '","key":"'        DELIMITED BY SIZE
                  WS-LOG-RECORD-KEY  DELIMITED BY SPACES
                  '","msg":"'        DELIMITED BY SIZE
                  WS-LOG-MESSAGE     DELIMITED BY '  '
                  '"}'               DELIMITED BY SIZE
             INTO WS-JSON-LOG-LINE
           END-STRING

🔵 Modern Practice Several mainframe shops now write COBOL error logs directly as JSON lines, which are then streamed to Splunk or Elasticsearch via z/OS Connect or MQ. This bridges the gap between traditional batch processing and modern observability platforms.

Summary Statistics at Termination

Every production program should display summary statistics at termination. This is part of defensive programming because it provides a quick health check — the operator or reviewer can scan the numbers and immediately spot anomalies:

       9000-TERMINATE.
           DISPLAY '================================='
           DISPLAY 'PROGRAM: ' WS-PROGRAM-NAME
           DISPLAY 'RUN DATE: ' WS-CURRENT-DATE
           DISPLAY '================================='
           DISPLAY 'Records read:       ' WS-READ-COUNT
           DISPLAY 'Records processed:  ' WS-PROCESS-COUNT
           DISPLAY 'Records rejected:   ' WS-REJECT-COUNT
           DISPLAY 'Records in error:   ' WS-ERROR-COUNT
           DISPLAY 'Total amount:       ' WS-TOTAL-AMOUNT
           DISPLAY '---------------------------------'
           DISPLAY 'INFO log entries:   ' WS-INFO-LOG-COUNT
           DISPLAY 'WARNING log entries:' WS-WARN-LOG-COUNT
           DISPLAY 'ERROR log entries:  ' WS-ERROR-LOG-COUNT
           DISPLAY '================================='

      * Verify record counts balance
           COMPUTE WS-BALANCE-CHECK =
               WS-PROCESS-COUNT + WS-REJECT-COUNT
           IF WS-BALANCE-CHECK NOT = WS-READ-COUNT
               DISPLAY '*** RECORD COUNT IMBALANCE ***'
               DISPLAY '  Read:     ' WS-READ-COUNT
               DISPLAY '  Proc+Rej: ' WS-BALANCE-CHECK
               MOVE RC-ERROR TO RETURN-CODE
           END-IF

Notice the balancing check at the end. The number of records read should equal the number processed plus the number rejected. If it does not, something went wrong — perhaps a record was silently skipped without being counted in either category. This kind of self-auditing catches bugs that might otherwise go unnoticed.

Maria Chen makes her programs display these statistics both to SYSOUT (for the operator console) and to the error log file (for permanent record). "The console display is for the night operator," she says. "The log file entry is for Monday morning's review meeting."

10.11 GlobalBank Case Study: Error Handling in TXN-PROC

Let us trace what happens when a transaction fails in GlobalBank's TXN-PROC program.

Scenario: Transaction Against a Non-Existent Account

Derek Washington runs TXN-PROC for the first time in production. A transaction references account number 9999999999 — an account that does not exist in the master file.

Here is the error handling flow:

1. TXN-PROC reads TXN-RECORD with TXN-ACCT-FROM = '9999999999'
2. TXN-PROC does a random READ on ACCT-MASTER-FILE using the key
3. FILE STATUS returns '23' (record not found)
4. 88-level ACCT-NOT-FOUND is TRUE
5. Program executes PERFORM 2300-HANDLE-NOT-FOUND
6. 2300 writes a reject record to the reject file
7. 2300 calls 9800-LOG-ERROR with details
8. 2300 increments WS-REJECT-COUNT
9. Control returns to the main processing loop
10. Next transaction is read and processed

The key point: the program does not ABEND. It logs the error, rejects the transaction, and continues. A human reviewer will examine the reject file and determine whether account 9999999999 is a data entry error, a test record that leaked into production, or a genuine problem.

Scenario: Account File is Locked

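During a month-end run, another job has exclusive access to ACCT-MASTER-FILE. TXN-PROC's OPEN fails with a file-unavailable status, which sets the ACCT-LOCKED condition. The exact code depends on the compiler and access method: the COBOL 2002 standard and GnuCOBOL use '61' for a file-sharing conflict, while Enterprise COBOL reports an unavailable VSAM resource as '93'. (Status '38' strictly means that an OPEN was attempted on a file previously closed WITH LOCK.)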

       1000-INITIALIZE.
           OPEN I-O ACCT-MASTER-FILE
           EVALUATE TRUE
               WHEN ACCT-SUCCESS
                   CONTINUE
               WHEN ACCT-LOCKED
                   DISPLAY 'ACCT FILE LOCKED BY ANOTHER JOB'
                   DISPLAY 'RETRY AFTER MONTH-END COMPLETES'
                   MOVE RC-SEVERE TO RETURN-CODE
                   STOP RUN
               WHEN ACCT-NOT-EXISTS
                   DISPLAY 'ACCT FILE NOT FOUND — CHECK JCL'
                   MOVE RC-FATAL TO RETURN-CODE
                   STOP RUN
               WHEN OTHER
                   DISPLAY 'UNEXPECTED ERROR OPENING ACCT FILE'
                   DISPLAY 'STATUS: ' WS-ACCT-STATUS
                   MOVE RC-FATAL TO RETURN-CODE
                   STOP RUN
           END-EVALUATE.

The error message is specific and actionable: it tells the operator exactly what happened and what to do about it.

Scenario: Disk Full During WRITE

Mid-way through processing, the output file's disk allocation fills up. FILE STATUS returns '34' (boundary violation — sequential file full).

Maria's code handles this by:

1. Logging the error with the last successfully written record's key
2. Setting RETURN-CODE to RC-SEVERE (12)
3. Closing all files in an orderly fashion
4. Stopping the program

The JCL for subsequent steps checks the condition code and skips downstream processing. The operations team expands the disk allocation and reruns the job.

10.12 MedClaim Case Study: Handling Malformed Claim Records

MedClaim receives claims from hundreds of healthcare providers, each with their own systems and varying data quality. James Okafor's CLM-INTAKE program is the front line of defense.

The Validation Pipeline

       3000-VALIDATE-CLAIM.
           INITIALIZE WS-VALIDATION-RESULTS
           SET WS-RECORD-VALID TO TRUE

      * Level 1: Field-level validation
           PERFORM 3100-VAL-CLAIM-ID
           PERFORM 3110-VAL-CLAIM-TYPE
           PERFORM 3120-VAL-MEMBER-ID
           PERFORM 3130-VAL-PROVIDER-NPI
           PERFORM 3140-VAL-BILLED-AMOUNT
           PERFORM 3150-VAL-DIAGNOSIS-CODES
           PERFORM 3160-VAL-PROCEDURE-CODE
           PERFORM 3170-VAL-SERVICE-DATE

      * Level 2: Cross-field validation
           IF WS-RECORD-VALID
               PERFORM 3200-VAL-TYPE-AMOUNT
               PERFORM 3210-VAL-DATES-SEQUENCE
               PERFORM 3220-VAL-DIAG-PROC-MATCH
           END-IF

      * Level 3: Referential validation
           IF WS-RECORD-VALID
               PERFORM 3300-VAL-MEMBER-EXISTS
               PERFORM 3310-VAL-PROVIDER-EXISTS
               PERFORM 3320-VAL-COVERAGE-ACTIVE
           END-IF.

Notice the tiered approach: field-level validation runs first. If any field is invalid, cross-field validation is skipped (because comparing invalid fields is meaningless). Referential validation — checking against the member and provider files — only runs if the basic data is clean.

Handling Malformed Numeric Fields

A common problem: a provider's system sends non-numeric data in a numeric field. If COBOL tries to use this data in an arithmetic operation, the result is an ABEND (specifically, an S0C7 data exception on z/OS).

       3140-VAL-BILLED-AMOUNT.
      * The billed amount comes as display numeric
      * in the raw input record
           IF WS-RAW-BILLED-AMT IS NUMERIC
               MOVE WS-RAW-BILLED-AMT TO CLM-BILLED-AMT
               IF CLM-BILLED-AMT < 0.01
                   PERFORM 3141-ADD-ERROR
                   MOVE 'BILLED AMOUNT LESS THAN $0.01'
                       TO WS-VAL-ERR-MSG(WS-VAL-ERROR-COUNT)
               END-IF
               IF CLM-BILLED-AMT > 999999.99
                   PERFORM 3141-ADD-ERROR
                   MOVE 'BILLED AMOUNT EXCEEDS MAXIMUM'
                       TO WS-VAL-ERR-MSG(WS-VAL-ERROR-COUNT)
               END-IF
           ELSE
               PERFORM 3141-ADD-ERROR
               MOVE 'BILLED AMOUNT IS NOT NUMERIC'
                   TO WS-VAL-ERR-MSG(WS-VAL-ERROR-COUNT)
           END-IF.

       3141-ADD-ERROR.
           ADD 1 TO WS-VAL-ERROR-COUNT
           SET WS-RECORD-INVALID TO TRUE
           MOVE 'CLM-BILLED-AMT'
               TO WS-VAL-ERR-FIELD(WS-VAL-ERROR-COUNT).

The Reject File Pattern

Invalid records are written to a reject file with error details, enabling human review and resubmission:

       01  WS-REJECT-RECORD.
           05  REJ-ORIGINAL-DATA     PIC X(500).
           05  REJ-ERROR-COUNT       PIC 9(03).
           05  REJ-ERROR-DETAILS.
               10  REJ-ERROR         OCCURS 20 TIMES.
                   15  REJ-ERR-FIELD PIC X(30).
                   15  REJ-ERR-MSG   PIC X(50).
           05  REJ-TIMESTAMP         PIC X(26).
           05  REJ-SOURCE-FILE       PIC X(44).

✅ Defensive Programming Checklist Before submitting any COBOL program for production deployment, verify:

- [ ] FILE STATUS defined for every file
- [ ] FILE STATUS checked after every I/O operation
- [ ] ON SIZE ERROR on all arithmetic that could overflow
- [ ] Input validation for all external data
- [ ] Date validation (not just format, but logical correctness)
- [ ] Boundary checks for all table subscripts
- [ ] Counter sizes appropriate for expected volumes
- [ ] Error threshold for graceful degradation
- [ ] Meaningful RETURN-CODE set at termination
- [ ] Error log with timestamp, program, paragraph, and context
- [ ] All files closed in termination logic (including error paths)

10.13 Defensive Coding for COMP-3 Fields

One of the most insidious errors in COBOL is the S0C7 data exception on z/OS — a system ABEND that occurs when the program attempts to use a COMP-3 (packed decimal) field that contains invalid data. Understanding why this happens and how to prevent it is a core defensive programming skill.

How COMP-3 Works

COMP-3 stores two decimal digits per byte, with the sign stored in the lower nibble of the last byte. For example, the value +12345 in PIC S9(5) COMP-3 is stored as:

Hex:  12 34 5C
      ^^ ^^ ^^
      12 34 5+ (C = positive sign)

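The sign nibbles COBOL normally generates are C (positive), D (negative), and F (unsigned, treated as positive). The hardware also accepts A and E as positive and B as negative, but compiled COBOL code does not normally produce them. A decimal digit (0-9) in the sign nibble, or a hex digit greater than 9 in a digit nibble, causes a data exception when the field is used in arithmetic.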

Common Sources of COMP-3 Corruption

Moving alphanumeric data to a COMP-3 field without validation. If a field comes from an external source as PIC X(10) and you move it directly to a PIC S9(7)V99 COMP-3 field, invalid characters become invalid packed-decimal bytes.
Reading a file with a mismatched copybook. If the copybook defines a field as COMP-3 but the actual data was written with a different layout, the bytes at that position may not be valid packed decimal.
Uninitialized WORKING-STORAGE. An item without a VALUE clause is not guaranteed to contain valid data. Storage often happens to contain LOW-VALUES (x'00'), but some compilers and runtime settings leave whatever was previously in memory. Neither case is safe for COMP-3: a field filled with x'00' has a sign nibble of 0, which is not a valid sign, and a field filled with x'FF' has invalid digit nibbles.
Copybook version mismatch. A program compiled with version 1 of a copybook reads a file written by a program compiled with version 2, where field positions have shifted.

The Defense: Validate Before You Compute

      * WRONG — no protection against invalid COMP-3
           ADD TXN-AMOUNT TO ACCT-BALANCE

      * RIGHT — validate the raw input, then compute
           IF WS-RAW-TXN-AMOUNT IS NUMERIC
               MOVE WS-RAW-TXN-AMOUNT TO TXN-AMOUNT
               ADD TXN-AMOUNT TO ACCT-BALANCE
                   ON SIZE ERROR
                       PERFORM HANDLE-OVERFLOW
               END-ADD
           ELSE
               MOVE 'NON-NUMERIC TXN AMOUNT' TO WS-ERR-MSG
               PERFORM 9800-LOG-ERROR
               PERFORM REJECT-TRANSACTION
           END-IF
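The check above validates the raw display input. When the packed field itself is suspect, for example after reading a record through a copybook that may not match the data, the NUMERIC class test can also be applied directly to a COMP-3 item. The following is a minimal sketch with assumed names (PACKCHK, WS-RAW-BYTES, WS-PACKED-AMT); the REDEFINES exists only to plant invalid bytes for the demonstration, and the first call is expected to take the rejection branch.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. PACKCHK.
      * Illustrative sketch: all names are assumptions.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-RAW-BYTES              PIC X(03).
       01  WS-PACKED-AMT REDEFINES WS-RAW-BYTES
                                     PIC S9(05) COMP-3.
       01  WS-TOTAL                  PIC S9(07) COMP-3 VALUE ZERO.
       PROCEDURE DIVISION.
       0000-MAIN.
      * Simulate a corrupt field: all x'00' has no valid sign
           MOVE LOW-VALUES TO WS-RAW-BYTES
           PERFORM 1000-ADD-IF-VALID
      * Now store a well-formed packed value
           MOVE 12345 TO WS-PACKED-AMT
           PERFORM 1000-ADD-IF-VALID
           DISPLAY 'TOTAL: ' WS-TOTAL
           STOP RUN.

       1000-ADD-IF-VALID.
      * Class test inspects the nibbles without doing arithmetic
           IF WS-PACKED-AMT IS NUMERIC
               ADD WS-PACKED-AMT TO WS-TOTAL
                   ON SIZE ERROR
                       DISPLAY 'TOTAL OVERFLOW'
               END-ADD
           ELSE
               DISPLAY 'INVALID PACKED DATA, FIELD SKIPPED'
           END-IF.
```

The class test is cheap compared with an ABEND and rerun, so it is worth applying at the point where data crosses into the program, such as immediately after a READ.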

The FUNCTION TEST-NUMVAL Approach

COBOL 2002 introduced intrinsic functions for validating numeric representations. FUNCTION TEST-NUMVAL returns zero if the argument is a valid numeric value:

           IF FUNCTION TEST-NUMVAL(WS-RAW-AMOUNT) = 0
               COMPUTE WS-VALIDATED-AMOUNT =
                   FUNCTION NUMVAL(WS-RAW-AMOUNT)
           ELSE
               MOVE 'INVALID NUMERIC FORMAT' TO WS-ERR-MSG
               PERFORM 9800-LOG-ERROR
           END-IF

Not all COBOL compilers support these functions. GnuCOBOL 3.x does, and recent Enterprise COBOL releases do as well; check the Language Reference for your compiler level before relying on them.

INITIALIZE as a Safety Net

Always INITIALIZE working-storage group items before use, especially those that contain COMP-3 fields:

           INITIALIZE WS-TRANSACTION-WORK
           INITIALIZE WS-CALCULATION-FIELDS

INITIALIZE sets numeric fields to zero and alphanumeric fields to spaces, ensuring COMP-3 fields contain valid packed-decimal data.
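A small experiment makes the effect visible. The sketch below deliberately fills a group with HIGH-VALUES, runs INITIALIZE, and then uses the class test to confirm that the numeric fields are usable again. The program name INITDEMO and the elementary field names are assumptions for the example.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. INITDEMO.
      * Illustrative sketch: all names are assumptions.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-CALCULATION-FIELDS.
           05  WS-CALC-AMOUNT        PIC S9(07)V99 COMP-3.
           05  WS-CALC-COUNT         PIC 9(05).
           05  WS-CALC-LABEL         PIC X(10).
       PROCEDURE DIVISION.
       0000-MAIN.
      * Force garbage into the group, as a mismatched record
      * or stale storage might
           MOVE HIGH-VALUES TO WS-CALCULATION-FIELDS
      * Numeric items become zero, alphanumeric items spaces
           INITIALIZE WS-CALCULATION-FIELDS
           IF WS-CALC-AMOUNT IS NUMERIC
              AND WS-CALC-COUNT IS NUMERIC
               ADD 1 TO WS-CALC-COUNT
               DISPLAY 'FIELDS ARE SAFE TO USE'
           ELSE
               DISPLAY 'FIELDS STILL INVALID'
           END-IF
           STOP RUN.
```

Keep in mind that INITIALIZE skips FILLER items by default, so it repairs named fields only.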

10.14 Environment-Specific Defensive Techniques

z/OS-Specific Defenses

On z/OS, several platform-specific features support defensive programming:

ABEND codes and their meaning:

ABEND	Cause	Defensive Response
S0C7	Data exception (invalid packed decimal)	Validate all numeric data before arithmetic
S0C4	Protection exception (invalid memory access)	Check subscripts, validate linkage section pointers
S0CB	Division by zero	Use ON SIZE ERROR or pre-check divisor
S0C1	Operation exception	Usually a program structure issue
S013	Dataset open error	Check FILE STATUS after OPEN
S001	I/O error	Check FILE STATUS after every I/O

The CEEDUMP and Language Environment: When a program ABENDs under IBM Language Environment, the runtime produces a CEEDUMP — a formatted dump that shows the call chain, register contents, and the last few lines of COBOL source executed. While you want to avoid ABENDs, understanding CEEDUMP output is essential for production support.

Compile options for defensive programming:

//COBOL    EXEC PGM=IGYCRCTL,
//         PARM='SSRANGE,TEST,SOURCE,NUMCHECK'

Option	Purpose
SSRANGE	Runtime subscript and reference-modification range checking; an out-of-range reference ends the program with a diagnostic message instead of silently overlaying storage
NUMCHECK	Runtime validity checking of numeric data; catches invalid zoned or packed decimal (COMP-3) contents at the point of use rather than later
TEST	Enables symbolic debugging
SOURCE	Produces a source listing, including copied (COPY) text

⚠️ Performance Note SSRANGE and NUMCHECK add runtime overhead (typically 5-15%). Many shops enable them during testing and disable them in production. James Okafor at MedClaim keeps SSRANGE enabled even in production: "The performance cost is negligible compared to the cost of a data corruption incident."

GnuCOBOL-Specific Defenses

GnuCOBOL provides its own set of runtime checks:

# Compile with all runtime checks enabled
cobc -x -debug -Wall myprogram.cbl

The -debug flag enables runtime checks for subscript bounds, numeric validity, and other common errors. The -Wall flag enables all compiler warnings, which can catch potential issues at compile time.

10.15 The ABEND Handler Pattern

When an error is truly unrecoverable, the program must stop — but it should stop in an orderly fashion:

       9900-ABEND-PROGRAM.
           DISPLAY '*** PROGRAM ABEND ***'
           DISPLAY 'PROGRAM:   ' WS-PROGRAM-NAME
           DISPLAY 'PARAGRAPH: ' WS-CURRENT-PARA
           DISPLAY 'ERROR:     ' WS-ERR-MSG
           DISPLAY 'RECORDS PROCESSED: ' WS-RECORD-COUNT
           DISPLAY 'ERRORS LOGGED:     ' WS-ERROR-COUNT

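      * Write final summary to log. 9800-LOG-ERROR stamps
      * every entry as ERROR and copies WS-ERR-MSG into the
      * log record, so mark the message text as fatal here.
           MOVE 'FATAL: PROGRAM ABENDING' TO WS-ERR-MSG
           PERFORM 9800-LOG-ERROR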

      * Close all open files
           PERFORM 9910-CLOSE-ALL-FILES

      * Set return code and stop
           MOVE RC-FATAL TO RETURN-CODE
           STOP RUN.

       9910-CLOSE-ALL-FILES.
      * Close each file, ignoring errors on close
      * (we are already in error state)
           CLOSE ACCT-MASTER-FILE
           CLOSE TXN-INPUT-FILE
           CLOSE REPORT-FILE
           CLOSE ERROR-LOG-FILE.

💡 Key Insight — Closing Files in Error Paths Even in an ABEND situation, close your files. On z/OS, an unclosed VSAM file can be left in a "not properly closed" state that requires manual intervention (VERIFY) before any other program can access it. The few microseconds spent on CLOSE can save hours of operational recovery.

10.16 Putting It All Together — A Defensive Programming Template

Here is a structural template that incorporates all the patterns from this chapter. Use it as a starting point for any production COBOL program:

       IDENTIFICATION DIVISION.
       PROGRAM-ID. TEMPLATE.
      *================================================================*
      * TEMPLATE - Defensive Programming Template
      * Replace TEMPLATE with actual program name
      *================================================================*

       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT INPUT-FILE
               ASSIGN TO INFILE
               FILE STATUS IS WS-INPUT-STATUS.
           SELECT OUTPUT-FILE
               ASSIGN TO OUTFILE
               FILE STATUS IS WS-OUTPUT-STATUS.
           SELECT ERROR-LOG
               ASSIGN TO ERRLOG
               FILE STATUS IS WS-LOG-STATUS.

       DATA DIVISION.
       FILE SECTION.
      * ... FD entries ...

       WORKING-STORAGE SECTION.
      * --- File Status Fields ---
       01  WS-INPUT-STATUS           PIC XX.
           88  INPUT-SUCCESS         VALUE '00'.
           88  INPUT-EOF             VALUE '10'.
       01  WS-OUTPUT-STATUS          PIC XX.
           88  OUTPUT-SUCCESS        VALUE '00'.
       01  WS-LOG-STATUS             PIC XX.
           88  LOG-SUCCESS           VALUE '00'.

      * --- Program Control ---
       01  WS-PROGRAM-NAME           PIC X(08) VALUE 'TEMPLATE'.
       01  WS-CURRENT-PARA           PIC X(30).
       01  WS-END-OF-FILE            PIC X VALUE 'N'.
           88  END-OF-FILE           VALUE 'Y'.
           88  NOT-END-OF-FILE       VALUE 'N'.

      * --- Counters ---
       01  WS-COUNTERS.
           05  WS-READ-COUNT         PIC 9(09) VALUE ZERO.
           05  WS-WRITE-COUNT        PIC 9(09) VALUE ZERO.
           05  WS-ERROR-COUNT        PIC 9(07) VALUE ZERO.
           05  WS-REJECT-COUNT       PIC 9(07) VALUE ZERO.

      * --- Error Handling ---
       01  WS-ERR-MSG                PIC X(80).
       01  WS-MAX-ERRORS             PIC 9(05) VALUE 100.

      * --- Return Codes ---
       01  RC-SUCCESS                PIC S9(04) COMP VALUE +0.
       01  RC-WARNING                PIC S9(04) COMP VALUE +4.
       01  RC-ERROR                  PIC S9(04) COMP VALUE +8.
       01  RC-SEVERE                 PIC S9(04) COMP VALUE +12.
       01  RC-FATAL                  PIC S9(04) COMP VALUE +16.

       PROCEDURE DIVISION.
       0000-MAIN.
           PERFORM 1000-INITIALIZE
           PERFORM 2000-PROCESS UNTIL END-OF-FILE
           PERFORM 9000-TERMINATE
           STOP RUN.

       1000-INITIALIZE.
           MOVE '1000-INITIALIZE' TO WS-CURRENT-PARA
      *    Open files with full status checking
      *    Initialize working storage
      *    Read first record (priming read)
           .

       2000-PROCESS.
           MOVE '2000-PROCESS' TO WS-CURRENT-PARA
      *    Validate current record
      *    Process if valid, reject if invalid
      *    Check error threshold
      *    Read next record
           .

       9000-TERMINATE.
           MOVE '9000-TERMINATE' TO WS-CURRENT-PARA
           DISPLAY 'PROCESSING COMPLETE'
           DISPLAY 'RECORDS READ:     ' WS-READ-COUNT
           DISPLAY 'RECORDS WRITTEN:  ' WS-WRITE-COUNT
           DISPLAY 'RECORDS REJECTED: ' WS-REJECT-COUNT
           DISPLAY 'ERRORS:           ' WS-ERROR-COUNT

           EVALUATE TRUE
               WHEN WS-ERROR-COUNT > WS-MAX-ERRORS
                   MOVE RC-SEVERE TO RETURN-CODE
               WHEN WS-ERROR-COUNT > 0
                   MOVE RC-WARNING TO RETURN-CODE
               WHEN OTHER
                   MOVE RC-SUCCESS TO RETURN-CODE
           END-EVALUATE

           CLOSE INPUT-FILE
                 OUTPUT-FILE
                 ERROR-LOG.

       9800-LOG-ERROR.
      *    Write structured error log entry
      *    Increment error count
      *    Check for log write failure
      *    (CONTINUE keeps the stub compilable until filled in)
           CONTINUE
           .

       9900-ABEND-PROGRAM.
      *    Display fatal error information
      *    Write final log entry
      *    Close all files
      *    Move RC-FATAL to RETURN-CODE
      *    STOP RUN
      *    (CONTINUE keeps the stub compilable until filled in)
           CONTINUE
           .
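The stubbed 1000-INITIALIZE and 2000-PROCESS paragraphs could be filled in along the following lines. This is one possible sketch, not the only reasonable layout, and it rests on several assumptions: the FD records are named INPUT-RECORD and OUTPUT-RECORD and have the same length, 2100-READ-INPUT is a new helper paragraph, the blank-record test stands in for real validation, and 9900-ABEND-PROGRAM has been completed as in Section 10.15 so that it ends the run. The paragraphs replace the stubs in the template rather than forming a standalone program.

```cobol
       1000-INITIALIZE.
           MOVE '1000-INITIALIZE' TO WS-CURRENT-PARA
           OPEN INPUT INPUT-FILE
           IF NOT INPUT-SUCCESS
               MOVE 'OPEN FAILED: INPUT-FILE' TO WS-ERR-MSG
               PERFORM 9900-ABEND-PROGRAM
           END-IF
           OPEN OUTPUT OUTPUT-FILE
           IF NOT OUTPUT-SUCCESS
               MOVE 'OPEN FAILED: OUTPUT-FILE' TO WS-ERR-MSG
               PERFORM 9900-ABEND-PROGRAM
           END-IF
           OPEN OUTPUT ERROR-LOG
           IF NOT LOG-SUCCESS
               MOVE 'OPEN FAILED: ERROR-LOG' TO WS-ERR-MSG
               PERFORM 9900-ABEND-PROGRAM
           END-IF
      *    Priming read, so 2000-PROCESS always has a record
           PERFORM 2100-READ-INPUT
           .

       2000-PROCESS.
           MOVE '2000-PROCESS' TO WS-CURRENT-PARA
      *    Placeholder validation (assumption): reject blank records
           IF INPUT-RECORD = SPACES
               ADD 1 TO WS-REJECT-COUNT
               ADD 1 TO WS-ERROR-COUNT
           ELSE
               WRITE OUTPUT-RECORD FROM INPUT-RECORD
               IF OUTPUT-SUCCESS
                   ADD 1 TO WS-WRITE-COUNT
               ELSE
                   MOVE 'WRITE FAILED ON OUTPUT-FILE' TO WS-ERR-MSG
                   PERFORM 9900-ABEND-PROGRAM
               END-IF
           END-IF
      *    Past the threshold, leave the loop; 9000-TERMINATE
      *    then sets RC-SEVERE
           IF WS-ERROR-COUNT > WS-MAX-ERRORS
               DISPLAY 'ERROR THRESHOLD EXCEEDED, STOPPING EARLY'
               SET END-OF-FILE TO TRUE
           ELSE
               PERFORM 2100-READ-INPUT
           END-IF
           .

       2100-READ-INPUT.
           MOVE '2100-READ-INPUT' TO WS-CURRENT-PARA
           READ INPUT-FILE
           EVALUATE TRUE
               WHEN INPUT-SUCCESS
                   ADD 1 TO WS-READ-COUNT
               WHEN INPUT-EOF
                   SET END-OF-FILE TO TRUE
               WHEN OTHER
                   MOVE 'READ FAILED ON INPUT-FILE' TO WS-ERR-MSG
                   PERFORM 9900-ABEND-PROGRAM
           END-EVALUATE
           .
```

Two design choices are worth noting. An empty input file sets END-OF-FILE on the priming read, so 2000-PROCESS never runs and the program finishes cleanly with zero counts. Exceeding the error threshold is treated as an early but orderly stop rather than a fatal error, which matches the RC-SEVERE branch already present in 9000-TERMINATE.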

10.17 Best Practices Summary

Check FILE STATUS after every I/O. No exceptions. Use 88-level conditions for readability.
Use ON SIZE ERROR on all arithmetic. Especially on COMPUTE, DIVIDE, and any calculation involving external data.
Validate all external input. Never assume data from files, users, or other systems is clean.
Use tiered validation. Field-level first, then cross-field, then referential.
Implement error thresholds. Limits on both total and consecutive errors stop a run before it churns through an entire file of bad data.
Set meaningful return codes. 0/4/8/12/16 — let the JCL and operations team know the result.
Log with context. Timestamp, program name, paragraph, file status, record key, and message. Always.
Close files in error paths. Even when ABENDing, close your files.
Separate error severity. Not all errors are equal. Distinguish between warnings (continue), errors (reject and continue), and fatal errors (stop).
Document your error strategy. In the program header comment, describe what errors the program handles and how.
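Several of these practices meet in one small pattern: trap the arithmetic error, assign it a severity, and make sure a later, milder condition can never lower the return code. The sketch below shows one way this could look. The program name SEVDEMO, the paragraph 9700-RAISE-RC, and the fields WS-HIGHEST-RC and WS-NEW-RC are hypothetical names introduced for illustration.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. SEVDEMO.
      * Hypothetical sketch: ON SIZE ERROR feeding a return code
      * that can only be raised, never lowered.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-TOTAL-AMT              PIC S9(07)V99 VALUE +1500.00.
       01  WS-ITEM-COUNT             PIC 9(05) VALUE ZERO.
       01  WS-AVERAGE-AMT            PIC S9(07)V99 VALUE ZERO.
       01  WS-HIGHEST-RC             PIC S9(04) COMP VALUE +0.
       01  WS-NEW-RC                 PIC S9(04) COMP VALUE +0.
       01  RC-ERROR                  PIC S9(04) COMP VALUE +8.

       PROCEDURE DIVISION.
       0000-MAIN.
      *    WS-ITEM-COUNT is zero, so the divide raises SIZE ERROR
           DIVIDE WS-TOTAL-AMT BY WS-ITEM-COUNT
               GIVING WS-AVERAGE-AMT
               ON SIZE ERROR
                   DISPLAY 'AVERAGE NOT COMPUTED: SIZE ERROR'
                   MOVE RC-ERROR TO WS-NEW-RC
                   PERFORM 9700-RAISE-RC
           END-DIVIDE
           MOVE WS-HIGHEST-RC TO RETURN-CODE
           STOP RUN.

       9700-RAISE-RC.
      *    Only ever raise the return code; a later warning must
      *    not mask an earlier error
           IF WS-NEW-RC > WS-HIGHEST-RC
               MOVE WS-NEW-RC TO WS-HIGHEST-RC
           END-IF.
```

Because division by zero triggers the SIZE ERROR condition, this run should display the message and end with return code 8, leaving WS-AVERAGE-AMT unchanged. Routing every severity through one paragraph keeps the "highest severity wins" rule in a single place instead of scattering comparisons throughout the program.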

✅ Chapter Checkpoint: You should now be able to:
- Declare and check FILE STATUS codes for every file operation
- Use ON SIZE ERROR, ON OVERFLOW, and INVALID KEY handlers
- Implement DECLARATIVES and USE statements
- Set and propagate return codes through JCL condition code checking
- Build input validation frameworks with tiered checking
- Handle boundary conditions (empty files, table overflow, counter limits)
- Implement graceful degradation with error thresholds
- Design structured error logs for production diagnostics
- Write orderly ABEND handlers that leave data in a clean state

Chapter Summary

Defensive programming is not about expecting failure — it is about being prepared for it. In a production COBOL environment, where programs process millions of records containing financial or medical data, the cost of an unhandled error can range from hours of manual recovery to regulatory violations and financial loss.

The techniques in this chapter form a toolkit that you will apply throughout the rest of this textbook. FILE STATUS checking accompanies every file operation in Chapters 11-16. Input validation underpins the data manipulation techniques of Chapters 17-21. Return codes and error propagation become essential in the modular design of Chapters 22-26. And the testing chapters (33-37) will show you how to deliberately inject errors to verify that your defensive measures work.

James Okafor summarizes the philosophy: "Write every program as if the person who will debug it at 2 AM on a Saturday is a sleep-deprived version of yourself. Give that person everything they need — clear error messages, clean logs, and an orderly shutdown. They'll thank you."

The investment in defensive programming pays for itself many times over. A program that takes an extra day to write because of comprehensive error handling saves weeks of debugging, recovery, and reputation repair over its lifetime. And in COBOL, lifetimes are measured in decades.

In the next chapter, we move to Sequential File Processing — the most common I/O pattern in batch COBOL, where every defensive technique from this chapter will be put into practice.