Appendix D: z/OS Return and Abend Codes

When a program abends in production, the abend code is your first clue. This appendix covers the abend codes you will actually encounter in COBOL work: not an exhaustive catalog of every possible code, but the ones behind the great majority of midnight calls. For each code: what it means, why it happens, how to diagnose it, and how to fix it.

Abend codes come in three flavors:
- System abends (Sxxx): Generated by z/OS or subsystem components. The "S" prefix is followed by a 3-digit hexadecimal code.
- User abends (Unnnn): Generated by application programs or runtime libraries via the ABEND macro or a service built on it, such as the LE callable service CEE3ABD. The "U" prefix is followed by a 4-digit decimal code. (STOP RUN with a non-zero RETURN-CODE sets the step's return code; it does not produce an abend.)
- Subsystem abends: DB2 and CICS have their own abend code formats.


System Abends

S0C1 — Operation Exception

What it means: The CPU attempted to execute an invalid instruction.

Common causes in COBOL:
- A PERFORM or CALL branched to an address containing data instead of code.
- A program was linked with an unresolved external reference (the entry point is garbage).
- A table subscript or pointer was corrupted, causing a branch to a data area.
- Calling a program compiled with incompatible compiler options (e.g., mixing RENT and NORENT incorrectly).

Diagnostic steps:
1. Check the PSW (Program Status Word) in the dump. The instruction address tells you where the invalid instruction was.
2. Map the offset back to the COBOL statement using the compiler OFFSET listing.
3. If the offset does not map to any COBOL statement, the problem is likely a corrupted branch address: examine the CALL chain and subscript values.
4. Check the linkage editor (binder) listing for unresolved externals.

Fix approaches:
- Verify all called programs are in the STEPLIB/JOBLIB/link list.
- Check that the CALL target is correct (no trailing spaces or invalid characters in dynamic CALL identifiers).
- Validate subscript values: an out-of-range subscript can corrupt adjacent storage, including branch addresses.
- Recompile with SSRANGE to catch subscript violations at runtime.


S0C4 — Protection Exception

What it means: The program attempted to access storage it does not own or that is protected.

Common causes in COBOL:
- Subscript or index out of range (the most common cause by far).
- Reference modification with an out-of-range start position or length.
- Linkage Section data item accessed before the address is established (i.e., before a CALL that passes the address, or before SET ADDRESS OF).
- Reading from or writing to a file that is not open.
- Moving data to a field defined in a COPY member whose layout changed but the program was not recompiled.

Diagnostic steps:
1. Check the PSW instruction address and map to the COBOL offset listing.
2. Look at the storage address that caused the violation (in the dump's PSW and interrupt information).
3. If the address is X'00000000' or close to zero, you are referencing uninitialized Linkage Section data.
4. Check all subscripts and reference modifications in the vicinity of the failing statement.
5. If the failing instruction is in a library module (not your code), the problem is almost certainly a corrupted parameter passed via CALL or EXEC CICS.

Fix approaches:
- Compile with SSRANGE to catch subscript and reference modification violations.
- Validate all subscripts before use. Never trust data-driven subscripts without range checking.
- Ensure every Linkage Section pointer is established before use.
- After a copybook change, recompile every program that uses that copybook.

This is the single most common abend in COBOL production. If you remember only one diagnostic technique, make it this: compile with SSRANGE in your test environment and run every test case.
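SSRANGE finds the problem in test; an explicit range check keeps it out of production. The following is a minimal sketch of validating a data-driven subscript before it touches the table. The program name, the data names (WS-RATE-TABLE, WS-REGION-CODE) and the 50-entry limit are assumptions made for illustration, not part of any real application.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. SUBCHK.
      * Sketch only: WS-RATE-TABLE, WS-REGION-CODE and the limit of
      * 50 entries are hypothetical names and values.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-MAX-ENTRIES      PIC 9(4) COMP VALUE 50.
      * 73 simulates a bad value arriving from input data.
       01  WS-REGION-CODE      PIC 9(4) COMP VALUE 73.
       01  WS-RATE-TABLE.
           05  WS-RATE         PIC S9(3)V99 COMP-3
                               OCCURS 50 TIMES.
       01  WS-SELECTED-RATE    PIC S9(3)V99 COMP-3 VALUE 0.
       PROCEDURE DIVISION.
       MAIN-PARA.
           INITIALIZE WS-RATE-TABLE
      * Range-check the data-driven subscript before using it.
           IF WS-REGION-CODE >= 1
              AND WS-REGION-CODE <= WS-MAX-ENTRIES
               MOVE WS-RATE (WS-REGION-CODE) TO WS-SELECTED-RATE
               DISPLAY 'RATE SELECTED'
           ELSE
               DISPLAY 'REGION CODE OUT OF RANGE: ' WS-REGION-CODE
               MOVE 16 TO RETURN-CODE
           END-IF
           GOBACK.
```

Keeping the upper bound in one data item (here WS-MAX-ENTRIES) rather than repeating a literal makes it harder for the check and the OCCURS clause to drift apart when the table grows.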


S0C7 — Data Exception

What it means: A packed decimal (COMP-3) or zoned decimal (DISPLAY numeric) field contains invalid data — characters that are not valid decimal digits.

Common causes in COBOL:
- A numeric field was never initialized (Working-Storage) or was initialized to spaces.
- Data read from a file contains characters in a field expected to be numeric.
- A group MOVE overlaid a numeric field with non-numeric data.
- A REDEFINES placed character data over a numeric field.
- An input file has a record layout mismatch: the program's copybook does not match the actual file format.

Diagnostic steps:
1. Map the PSW offset to the COBOL statement.
2. Identify which numeric field is being referenced.
3. Hex-dump that field's contents. Valid packed decimal has digits 0-9 in each nibble with a sign nibble in the last position (normally C, D, or F). Valid zoned decimal has digits F0-F9 with a sign in the last byte.
4. Trace back to where that field was populated: file READ, group MOVE, or initialization.

Fix approaches:
- Initialize all numeric fields: INITIALIZE WS-RECORD or explicit MOVE 0 TO field.
- Validate input data before arithmetic. Use class tests: IF WS-FIELD IS NUMERIC.
- Never use group MOVEs into structures containing numeric fields unless you are certain of the source data layout.
- Compile with NUMCHECK in test (available in Enterprise COBOL V6; CHECK(ON) is an LE runtime option, not a compiler option, and does not validate numeric data). NUMCHECK reports invalid numeric data where the field is first used as a sender, rather than later at the COMPUTE.

S0C7 is the second most common COBOL abend. The root cause is almost always one of: uninitialized data, file layout mismatch, or group MOVE corruption.
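A class test before arithmetic turns a potential S0C7 into a rejected record. The sketch below redefines an alphanumeric input field as zoned decimal and leaves it as spaces to simulate an unpopulated field; every data name in it is hypothetical.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. NUMCHK.
      * Sketch only: IN-AMOUNT-X / IN-AMOUNT and WS-TOTAL are
      * hypothetical names. The input is deliberately left as
      * spaces to simulate an unpopulated zoned-decimal field.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  IN-RECORD.
           05  IN-AMOUNT-X     PIC X(7) VALUE SPACES.
           05  IN-AMOUNT       REDEFINES IN-AMOUNT-X
                               PIC 9(5)V99.
       01  WS-TOTAL            PIC S9(9)V99 COMP-3 VALUE 0.
       01  WS-REJECT-COUNT     PIC 9(5) COMP VALUE 0.
       PROCEDURE DIVISION.
       MAIN-PARA.
      * Class test first: arithmetic on spaces would risk an S0C7.
           IF IN-AMOUNT IS NUMERIC
               ADD IN-AMOUNT TO WS-TOTAL
           ELSE
               ADD 1 TO WS-REJECT-COUNT
               DISPLAY 'NON-NUMERIC AMOUNT REJECTED: ' IN-AMOUNT-X
           END-IF
           DISPLAY 'REJECTED RECORDS: ' WS-REJECT-COUNT
           GOBACK.
```

Displaying the alphanumeric view of the field (IN-AMOUNT-X) in the error path is deliberate: it shows what the field actually contained without referencing it as a number.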


S0CB — Division by Zero

What it means: A DIVIDE or COMPUTE on packed decimal data attempted to divide by zero (a decimal divide exception). The same error on binary (COMP) data surfaces as S0C9, a fixed-point divide exception.

Common causes in COBOL:
- A divisor was not validated before use.
- A field used as a divisor was initialized to zero and never populated.
- A calculation chain produced zero in an intermediate result used as a divisor.

Diagnostic steps:
1. Map the offset to the COBOL DIVIDE or COMPUTE statement.
2. Display the divisor field's value at the time of the abend.

Fix approaches:
- Always validate divisors: IF WS-DIVISOR NOT = 0.
- Use ON SIZE ERROR on COMPUTE and DIVIDE statements.
- Consider whether a zero divisor is a data error (fix the data) or a valid condition (handle it in logic).
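The two defences listed above, an explicit divisor check and ON SIZE ERROR, can be combined. This is a minimal sketch with hypothetical data names and values; whether a zero divisor should produce a zero average or an error is a business decision the sketch only assumes.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. DIVCHK.
      * Sketch only: all data names and values are hypothetical.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-TOTAL-AMT        PIC S9(7)V99 COMP-3 VALUE 1250.00.
       01  WS-UNIT-COUNT       PIC S9(5)    COMP-3 VALUE 0.
       01  WS-AVERAGE          PIC S9(7)V99 COMP-3 VALUE 0.
       PROCEDURE DIVISION.
       MAIN-PARA.
      * Explicit guard: decide what a zero divisor means here.
           IF WS-UNIT-COUNT = 0
               DISPLAY 'NO UNITS: AVERAGE SET TO ZERO'
               MOVE 0 TO WS-AVERAGE
           ELSE
      * ON SIZE ERROR is a second line of defence; it also
      * traps a quotient too large for WS-AVERAGE.
               DIVIDE WS-TOTAL-AMT BY WS-UNIT-COUNT
                   GIVING WS-AVERAGE ROUNDED
                   ON SIZE ERROR
                       DISPLAY 'DIVIDE FAILED: SIZE ERROR'
                       MOVE 16 TO RETURN-CODE
               END-DIVIDE
           END-IF
           GOBACK.
```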


S80A — Insufficient Virtual Storage

What it means: The program requested more storage than the region allows.

Common causes:
- REGION on the JCL JOB or EXEC card is too small.
- The program allocates very large Working-Storage areas (multi-megabyte tables or arrays).
- A runaway loop is consuming stack frames (recursive or deeply nested PERFORMs).
- Multiple large programs are loaded into the same address space.
- LE runtime options (HEAP, STACK) are set too low.

Diagnostic steps:
1. Check the JCL REGION parameter. Is it explicitly limited?
2. Check the program's Working-Storage size (compiler listing).
3. Look at LE runtime option values in the dump (CEEOPTS or CEEDOPT).
4. If the program worked before and now fails, check for recently added storage areas or tables.

Fix approaches:
- Increase REGION (or use REGION=0M if the installation allows it).
- Reduce Working-Storage (use files instead of in-memory tables for large data).
- Check LE HEAP and STACK settings; increase if needed.
- For CICS programs, check DSA limits (see CICS abends below).


S878 — Insufficient Virtual Storage (Getmain Failed)

What it means: A GETMAIN or STORAGE OBTAIN request failed because not enough virtual storage was available. S878 and S80A describe the same underlying condition; they differ in which form of the storage request macro was issued, not in what you need to fix.

Common causes: Same as S80A, plus:
- Dynamic CALL loading many large subprograms.
- DB2 or CICS storage requirements exceeding expectations.

Diagnostic steps and fix: Same as S80A. Check the reason code (in register 15) for more specifics.


S806 — Module Not Found

What it means: A LOAD, LINK, XCTL, or dynamic CALL could not find the requested module.

Common causes:
- The load module is not in any library in the search order (STEPLIB, JOBLIB, link list, LPA).
- The module name is misspelled in the CALL statement.
- For dynamic CALLs, the program name in the identifier field has trailing spaces or low-values.
- The load library (PDS or PDSE) is unavailable.

Diagnostic steps:
1. Check the reason code (displayed with the abend): X'04' = module not found. X'0C' = BLDL error.
2. Check the module name being requested (in the dump or the SYSLOG).
3. Verify the STEPLIB/JOBLIB concatenation: is the correct load library included?
4. Verify the module exists in the library: use ISPF 3.4 or LISTDS.

Fix approaches:
- Add the correct load library to STEPLIB.
- Verify the module name. For dynamic CALLs, DISPLAY the call target before calling.
- If the module is new, make sure it was link-edited successfully.
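For dynamic CALLs, the program itself can show the target name and intercept a failed load. The sketch below assumes a hypothetical subprogram name (RATECALC) and parameter area; it relies on the ON EXCEPTION phrase of CALL, which receives control when the called program cannot be made available.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. CALLCHK.
      * Sketch only: RATECALC is a hypothetical subprogram name.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-PGM-NAME         PIC X(8) VALUE 'RATECALC'.
       01  WS-PARM-AREA        PIC X(80) VALUE SPACES.
       PROCEDURE DIVISION.
       MAIN-PARA.
      * Show the exact target, delimited so that stray blanks
      * or low-values in the name are visible in SYSOUT.
           DISPLAY 'CALLING [' WS-PGM-NAME ']'
      * Dynamic CALL by identifier. ON EXCEPTION receives control
      * if the module cannot be loaded, instead of an S806 abend.
           CALL WS-PGM-NAME USING WS-PARM-AREA
               ON EXCEPTION
                   DISPLAY 'MODULE NOT AVAILABLE: ' WS-PGM-NAME
                   MOVE 12 TO RETURN-CODE
               NOT ON EXCEPTION
                   DISPLAY 'CALL COMPLETED'
           END-CALL
           GOBACK.
```

Whether to continue after a failed CALL or end the step with a non-zero return code depends on the job; the sketch simply sets RETURN-CODE and leaves.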


S913 — RACF Security Violation

What it means: The job's userid does not have the required RACF (or equivalent security product) authorization for a resource — typically a data set, but can be any RACF-protected resource.

Diagnostic steps:
1. Check JESMSGLG for the ICH408I message, which identifies the resource, the userid, and the access level attempted.
2. Note the data set name and the access level (READ, UPDATE, ALTER).

Fix approaches:
- Request the appropriate RACF permission from your security team.
- Verify the data set name is correct: a typo can generate a false 913 if the wrong DSN matches a different security profile.


S222 — Job Cancelled by Operator

What it means: The job was cancelled without a dump, either by an operator (CANCEL command) or by an automated operations tool. A job that exceeds its CPU time limit (TIME parameter) does not get S222; it abends with S322.

Diagnostic steps:
1. Check JESMSGLG and the SYSLOG for the cancel reason and for who issued the cancel.
2. If the completion code is actually S322, the job exceeded its TIME limit; treat it as a timeout rather than a cancel.
3. If $HASP852 CANCEL -- JOBNAME appears, it was an operator or automation cancel.

Fix approaches:
- If the job timed out (S322), increase the TIME parameter or investigate why the job is running longer than expected.
- If operator-cancelled, investigate the reason (often a runaway job or resource contention).


Sx37 Family — Out of DASD Space

Code  Meaning
SB37  End of volume: another extent was needed, but the volume is full or the extent limit was reached, and no further volume is available
SD37  Primary allocation is full and no secondary allocation was specified
SE37  All extents used, or maximum volume count reached

Common causes:
- Output data set SPACE allocation too small.
- More data than expected (input volume spike).
- Secondary allocation too small or exhausted (a basic sequential data set is limited to 16 extents per volume).
- Data set on a nearly full volume.

Diagnostic steps:
1. Check the IEC030I (B37), IEC031I (D37), or IEC032I (E37) message for the DD name, data set name, and volume.
2. Check the data set's current space allocation with IDCAMS LISTCAT or ISPF 3.4.

Fix approaches:
- Increase primary and secondary space allocation.
- Code RLSE on other output data sets so that their unused space is released back to the volume.
- For VSAM, increase CA (control area) and CI (control interval) sizes.
- Consider compression (DFSMS data set compression for sequential; VSAM compression for KSDS).
- Request SMS to assign the data set to a less-full storage group.


User Abends (Language Environment / COBOL Runtime)

User abend codes from the Language Environment runtime start with U and are decimal numbers. The most common:

U0016 — LE Condition Was Raised

What it means: An unhandled LE condition was raised and no condition handler caught it. This is a "catch-all" for various LE errors.

Diagnostic steps:
1. Check CEEDUMP and the LE message output for the specific condition (e.g., CEE3207S, CEE3209S).
2. The LE condition token identifies the exact error.

Fix approaches: Depends on the specific condition. Common ones include:
- CEE3207S: Data exception, the LE report of an S0C7. Diagnose it as described under S0C7.
- CEE3209S: Fixed-point divide exception, the LE report of an S0C9. Validate binary divisors before dividing.


U1026 — LE AMODE Incompatibility

What it means: An AMODE 31 program called an AMODE 24 program (or vice versa) in an incompatible way.

Fix: Ensure all programs in the call chain use consistent AMODE. Recompile and relink as needed.


U4038 — LE Enclave Ended with an Unhandled Condition

What it means: LE terminated the enclave because a condition of severity 2 or higher was raised and nothing handled it. U4038 is the generic wrapper LE puts around the failure, not the failure itself.

Fix: Ensure a CEEDUMP DD is allocated in the JCL so the dump lands where you can find it, then read the CEEDUMP and the LE messages for the original condition. U4038 is not the real problem; the real problem is whatever raised that condition. Diagnose and fix the actual error.


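U4093 — LE Initialization Failed

What it means: LE could not initialize its runtime environment. The reason code that accompanies the abend says why; insufficient storage and a missing or incorrect runtime library are typical.

Fix: Check the reason code, ensure the LE runtime library (SCEERUN) is in STEPLIB or the link list, and confirm that REGION is adequate. This is almost always an environment setup issue.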


DB2 Abend Codes

DB2 issues its own abend codes when it terminates a thread or address space abnormally. They surface as z/OS system completion codes (S04E, S04F) accompanied by a DB2 reason code that identifies the specific failure.

04E — DB2 Internal Error (Resource Unavailable)

What it means: DB2 could not acquire a resource — typically an EDM pool page, a log buffer, or a thread.

Common causes:
- DB2 subsystem under extreme stress.
- EDM pool too small for the workload.
- Too many concurrent threads.

Diagnostic steps:
1. Check the DB2 message log for DSNB messages.
2. Check DB2 subsystem statistics: EDM pool full counts, thread usage.

Fix approaches:
- Work with the DBA to increase EDM pool, thread limits, or buffer pool sizes.
- Reduce concurrent application load.
- Optimize SQL to reduce resource consumption.


04F — DB2 Thread Abend

What it means: DB2 terminated the application thread due to an unrecoverable error — typically associated with SQLCODE -904, -911, or -913.

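Diagnostic steps:
1. Check the SQLCODE in the SQLCA (from the dump or program output).
2. Check the DB2 message log for the specific error.

Fix approaches: Address the underlying SQLCODE:
- -904: Resource unavailable. Check tablespace status.
- -911: Deadlock or timeout, and DB2 has already rolled back the unit of work. Restructure access order and reduce lock duration.
- -913: Deadlock or timeout, but DB2 has not rolled back. Apply the same remedies as for -911; the program must issue the ROLLBACK or retry itself.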


CICS Abend Codes

CICS abend codes are 4 characters, typically starting with A. They appear in the CICS log and in the transaction dump.

AICA — Runaway Task

What it means: The transaction exceeded the ICVR (runaway interval) without making a CICS request. CICS assumes it is in an infinite loop and terminates it.

Common causes:
- An actual infinite loop in the program.
- A long-running computation that does not make any CICS calls.
- The ICVR value is too low for the amount of processing required.

Diagnostic steps:
1. Check the CICS transaction dump for the current program counter position.
2. Map the offset to the COBOL statement.
3. Examine the loop: is the termination condition ever being met?

Fix approaches:
- Fix the infinite loop.
- If the computation is legitimately long, insert an EXEC CICS DELAY FOR HOURS(0) MINUTES(0) SECONDS(0) END-EXEC to reset the runaway timer (but consider whether CICS is the right environment for heavy computation).
- Increase ICVR in the SIT (but be cautious: this masks real runaway tasks).


ASRA — Program Check

What it means: A hardware program check (equivalent to an S0Cx system abend) occurred in a CICS program.

Common sub-codes (from the PSW interrupt code):

Code  Equivalent  Meaning
01    S0C1        Operation exception
04    S0C4        Protection exception
07    S0C7        Data exception

Diagnostic steps:
1. Request a CICS transaction dump (CEDF or CICS DUMP).
2. Map the PSW offset to the COBOL program using the compiler listing.
3. Diagnose as you would the equivalent S0Cx abend.

This is the most common CICS abend. It is essentially the CICS equivalent of S0C4 and S0C7 combined. The same diagnostic and prevention techniques apply.


ASRB — Operating System Abend

What it means: An operating system abend occurred in a CICS program. The system abend code accompanies the ASRB.

Diagnostic steps: Check the accompanying system abend code and diagnose accordingly.


AEI0 — Program Not Found (EXEC CICS LINK/XCTL)

What it means: CICS could not find the program specified in a LINK or XCTL command. It raised the PGMIDERR condition, the program did not handle it, and the task abended with AEI0. (AEY7, which is sometimes confused with it, is the unhandled NOTAUTH condition.)

Common causes:
- Program not defined in the CICS CSD (CEDA DEFINE PROGRAM).
- Program name misspelled.
- Load module not in the DFHRPL concatenation.
- Program defined but disabled.

Diagnostic steps:
1. Check the CICS CSMT log for the DFHPG0202 message, which identifies the program name.
2. Check CSD definitions: CEDA DISPLAY PROGRAM(pgmname) GROUP(grpname).
3. Check DFHRPL for the load library.

Fix approaches:
- Define the program in the CSD if missing.
- Verify the load library is in DFHRPL.
- Enable the program if disabled: CEMT SET PROGRAM(pgmname) ENABLED.
- Correct the program name in the COBOL source.


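AEIx / AEYx — Unhandled EXEC CICS Condition

What it means: A CICS command raised an exception condition that was not handled by RESP/RESP2 or HANDLE CONDITION. Each condition has its own code in the AEIx/AEYx range, for example AEIM for NOTFND and AEIV for LENGERR. AEY9 is a special case: it means the command could not be processed at all, most often an EXEC SQL call made while the CICS-DB2 attachment is not active.

Diagnostic steps:
1. Check the transaction dump for EIBRESP and EIBRESP2 values.
2. Identify which CICS command failed.

Fix approaches:
- Add RESP(ws-resp) RESP2(ws-resp2) to all CICS commands.
- Handle all expected conditions in an EVALUATE block after each command.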
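The RESP-plus-EVALUATE pattern can be sketched as follows. The file name CUSTFILE, the key, the record layout, and the abend code RCHK are all hypothetical, and the program needs the CICS translator (or the integrated coprocessor) to compile. Treat it as an illustration of the shape of the code, not as a complete error-handling policy.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. RESPCHK.
      * Sketch only: file CUSTFILE, the key value, the record
      * layout and abend code RCHK are hypothetical.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-RESP             PIC S9(8) COMP.
       01  WS-RESP2            PIC S9(8) COMP.
       01  WS-CUST-KEY         PIC X(6)  VALUE '000123'.
       01  WS-CUST-REC         PIC X(100).
       01  WS-REC-LEN          PIC S9(4) COMP VALUE 100.
       PROCEDURE DIVISION.
       MAIN-PARA.
           EXEC CICS READ FILE('CUSTFILE')
                     INTO(WS-CUST-REC)
                     LENGTH(WS-REC-LEN)
                     RIDFLD(WS-CUST-KEY)
                     RESP(WS-RESP)
                     RESP2(WS-RESP2)
           END-EXEC
      * With RESP coded, CICS returns control here instead of
      * abending the task on an exception condition.
           EVALUATE WS-RESP
               WHEN DFHRESP(NORMAL)
                   CONTINUE
               WHEN DFHRESP(NOTFND)
      * Expected condition: treat as "no such customer".
                   MOVE SPACES TO WS-CUST-REC
               WHEN OTHER
      * Unexpected condition: fail with a code of our choosing.
                   EXEC CICS ABEND ABCODE('RCHK') END-EXEC
           END-EVALUATE
           EXEC CICS RETURN END-EXEC
           GOBACK.
```

Abending deliberately in the WHEN OTHER branch may look odd, but a user-chosen code issued at a known point in the program is far easier to diagnose than a generic abend raised by CICS.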


AKCS — Task Purged Due to Deadlock Timeout

What it means: The transaction exceeded its DTIMOUT (deadlock timeout) waiting for a resource locked by another transaction. (The similarly named AKCT is a terminal read timeout, governed by RTIMOUT.)

Common causes:
- Two transactions locked resources in conflicting order (deadlock).
- A batch job held a DB2 lock that the CICS transaction needed.
- High contention on a hot resource (e.g., a sequence number table row).

Fix approaches:
- Restructure resource access order to prevent deadlock.
- Reduce lock duration (commit more frequently, use READ UNCOMMITTED where appropriate).
- Consider increasing DTIMOUT if the wait is expected to resolve.
- For batch/CICS conflicts, schedule batch outside peak CICS hours.


APCT — Map/BMS Error

What it means: A BMS SEND MAP or RECEIVE MAP command failed because CICS could not locate or load the mapset. (AEIO is the unhandled DUPKEY condition and has nothing to do with BMS. A RECEIVE MAP that finds no data to map raises MAPFAIL, which surfaces as AEI9 when unhandled.)

Common causes:
- Mapset name misspelled in the EXEC CICS SEND MAP or RECEIVE MAP command.
- Physical mapset load module not in DFHRPL.
- Map compiled with a different name than referenced.

Fix approaches:
- Verify mapset and map names in the source.
- Verify the mapset load module is in DFHRPL.
- Recompile the mapset (BMS assembler) and ensure the output is in the right library.


Diagnostic Decision Tree

When a COBOL program abends, follow this sequence:

1. What is the abend code?
   ├── S0C4 → Subscript/reference modification out of range?
   │         → Linkage Section pointer not established?
   │         → Copybook layout mismatch after changes?
   │
   ├── S0C7 → Which field contains invalid data?
   │         → How was that field populated? (READ, MOVE, INITIALIZE)
   │         → Is the file layout correct?
   │
   ├── S806 → Is the module in STEPLIB/JOBLIB?
   │         → Is the module name spelled correctly?
   │
   ├── S913 → Check ICH408I message for resource and userid
   │
   ├── Sx37 → Increase SPACE allocation
   │
   ├── ASRA → What is the sub-code (interrupt code)?
   │         → Treat as equivalent S0Cx
   │
   └── Other → Check z/OS Messages and Codes manual (SA38-0680)

2. Get the failing statement:
   ├── Batch → PSW offset → compiler OFFSET listing
   └── CICS  → Transaction dump offset → compiler OFFSET listing

3. Get the data values:
   ├── Batch → CEEDUMP, SYSUDUMP, or Debug Tool
   └── CICS  → Transaction dump, CEDF, or Debug Tool

4. Reproduce:
   ├── Same input data → Same abend? → Code fix
   └── Different data → Data-dependent? → Data validation fix

Prevention Checklist

The best abend is the one that never happens. This checklist prevents the most common production abends:

  1. Compile with SSRANGE in all test environments. It catches S0C4 causes before they reach production.
  2. Compile with NUMCHECK in test (Enterprise COBOL V6). It catches S0C7 causes where the bad field is first used as a sender, not later at the COMPUTE.
  3. Initialize all numeric fields. Use INITIALIZE or explicit MOVEs. Never rely on storage containing zeros.
  4. Always check FILE STATUS after every file I/O verb. Handle status 10 (end of file), 23 (not found), 35 (file not found), and 34 (space).
  5. Always check SQLCODE after every EXEC SQL statement. Handle +100 (not found), -803 (duplicate), -811 (multiple rows), -904 (resource unavailable), and -911/-913 (deadlock/timeout).
  6. Always use RESP/RESP2 on every EXEC CICS command. Handle NOTFND, DUPKEY, LENGERR, and PGMIDERR.
  7. Validate input data before arithmetic. Check IS NUMERIC before any computation on data that came from outside your program.
  8. Validate subscripts before use. If a subscript comes from input data, range-check it.
  9. Code ON SIZE ERROR on all COMPUTE, DIVIDE, MULTIPLY, ADD, and SUBTRACT statements that operate on external data.
  10. Allocate CEEDUMP and SYSUDUMP in all production JCL. When an abend does occur, you need the dump.
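
Checklist item 4 is the one most often skipped, so here is a minimal sketch of it: a sequential read loop that tests FILE STATUS after OPEN and after every READ. The DD name INFILE, the 80-byte fixed record, and all data names are assumptions made for the example.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. FSCHK.
      * Sketch only: DD name INFILE, the 80-byte record and all
      * data names are hypothetical.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT IN-FILE ASSIGN TO INFILE
               ORGANIZATION IS SEQUENTIAL
               FILE STATUS IS WS-FILE-STATUS.
       DATA DIVISION.
       FILE SECTION.
       FD  IN-FILE
           RECORDING MODE IS F.
       01  IN-REC              PIC X(80).
       WORKING-STORAGE SECTION.
       01  WS-FILE-STATUS      PIC X(2) VALUE SPACES.
           88  FS-OK           VALUE '00'.
           88  FS-EOF          VALUE '10'.
       01  WS-REC-COUNT        PIC 9(9) COMP VALUE 0.
       PROCEDURE DIVISION.
       MAIN-PARA.
           OPEN INPUT IN-FILE
      * Status 35 (file not found) and other OPEN errors land here.
           IF NOT FS-OK
               DISPLAY 'OPEN FAILED, FILE STATUS ' WS-FILE-STATUS
               MOVE 16 TO RETURN-CODE
               GOBACK
           END-IF
           PERFORM UNTIL FS-EOF
               READ IN-FILE
               EVALUATE TRUE
                   WHEN FS-OK
                       ADD 1 TO WS-REC-COUNT
                   WHEN FS-EOF
                       CONTINUE
                   WHEN OTHER
                       DISPLAY 'READ FAILED, FILE STATUS '
                               WS-FILE-STATUS
                       MOVE 16 TO RETURN-CODE
                       CLOSE IN-FILE
                       GOBACK
               END-EVALUATE
           END-PERFORM
           CLOSE IN-FILE
           DISPLAY 'RECORDS READ: ' WS-REC-COUNT
           GOBACK.
```

Because FILE STATUS is declared, a failed OPEN or READ returns control to the program with the status set, which lets the program report the status and end with a return code of its own choosing.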