Case Study 1: GlobalBank's Standardized Return Codes and Error Structures
Background
After modularizing the account maintenance system (Chapter 22), Maria Chen faced a new problem: each subprogram communicated errors differently. ACCTREAD returned file status codes. ACCTVAL returned ad-hoc numeric codes (1 through 7). ACCTCALC returned English text in a message field but no numeric code. ACCTUPD sometimes returned codes and sometimes did not, depending on which developer had last modified it.
"We had modularized the code, but we hadn't modularized the thinking," Maria explains. "Every time I debugged a production issue, I spent the first 30 minutes just figuring out what the error codes meant for the specific module that had failed. Is 3 an error or a warning in ACCTVAL? Nobody remembered."
Derek Washington, as the newest team member, felt the pain most acutely. "I'd see return code 2 from one module and assume it was a mild warning, because in another module it was. Turned out it meant 'account not found' — a real error. I let the job continue processing, and we posted transactions to a non-existent account. That was a bad day."
The Standardization Initiative
Maria proposed a three-part standard:
Part 1: Universal Return Code Convention
Every subprogram would use the RETCODCP copybook:
```cobol
      *================================================================
      * RETCODCP - GlobalBank Standard Return Code
      *================================================================
       01  RTN-CODE-AREA.
           05  RTN-CODE            PIC S9(4) COMP.
               88  RTN-SUCCESS     VALUE 0.
               88  RTN-WARNING     VALUE 4.
               88  RTN-ERROR       VALUE 8.
               88  RTN-SEVERE      VALUE 12.
               88  RTN-FATAL       VALUE 16.
               88  RTN-OK          VALUES 0 THRU 4.
               88  RTN-FAILED      VALUES 8 THRU 16.
```
No exceptions. No variations. No "but my module is different."
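As a minimal sketch of how the condition names read in practice, the program below copies the RETCODCP layout inline (so it compiles without a copybook library) and classifies a handful of sample values. The program name RCDEMO, the sample values, and the NONSTANDARD label are assumptions for illustration. Because RTN-OK is defined as a range (0 THRU 4), a stray value such as 2 still tests true for RTN-OK; testing the individual names first is what exposes it.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. RCDEMO.
      * Sketch: classify sample return codes with the RETCODCP
      * condition names (layout copied inline, not via COPY).
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  RTN-CODE-AREA.
           05  RTN-CODE            PIC S9(4) COMP.
               88  RTN-SUCCESS     VALUE 0.
               88  RTN-WARNING     VALUE 4.
               88  RTN-ERROR       VALUE 8.
               88  RTN-SEVERE      VALUE 12.
               88  RTN-FATAL       VALUE 16.
               88  RTN-OK          VALUES 0 THRU 4.
               88  RTN-FAILED      VALUES 8 THRU 16.
      * Hypothetical sample values; 02 is deliberately non-standard.
       01  WS-TEST-VALUES.
           05  FILLER              PIC 99 VALUE 00.
           05  FILLER              PIC 99 VALUE 02.
           05  FILLER              PIC 99 VALUE 04.
           05  FILLER              PIC 99 VALUE 08.
           05  FILLER              PIC 99 VALUE 16.
       01  WS-TEST-TABLE REDEFINES WS-TEST-VALUES.
           05  WS-TEST-RC          PIC 99 OCCURS 5 TIMES.
       01  WS-IDX                  PIC 9.
       01  WS-RC-DISP              PIC 99.
       01  WS-CLASS                PIC X(11).
       01  WS-OK-FLAG              PIC X.

       PROCEDURE DIVISION.
       MAIN-PARA.
           PERFORM VARYING WS-IDX FROM 1 BY 1 UNTIL WS-IDX > 5
               MOVE WS-TEST-RC (WS-IDX) TO RTN-CODE
               PERFORM CLASSIFY-CODE
               MOVE RTN-CODE TO WS-RC-DISP
               DISPLAY "RC " WS-RC-DISP ": " WS-CLASS
                       " RTN-OK=" WS-OK-FLAG
           END-PERFORM
           STOP RUN.

       CLASSIFY-CODE.
      *    Test the individual names; the RTN-OK range alone would
      *    accept a non-standard value such as 2.
           EVALUATE TRUE
               WHEN RTN-SUCCESS  MOVE "SUCCESS"     TO WS-CLASS
               WHEN RTN-WARNING  MOVE "WARNING"     TO WS-CLASS
               WHEN RTN-ERROR    MOVE "ERROR"       TO WS-CLASS
               WHEN RTN-SEVERE   MOVE "SEVERE"      TO WS-CLASS
               WHEN RTN-FATAL    MOVE "FATAL"       TO WS-CLASS
               WHEN OTHER        MOVE "NONSTANDARD" TO WS-CLASS
           END-EVALUATE
           IF RTN-OK
               MOVE "Y" TO WS-OK-FLAG
           ELSE
               MOVE "N" TO WS-OK-FLAG
           END-IF.
```

Compiled with a compiler such as GnuCOBOL (`cobc -x`), the program prints one line per sample value, and the line for 02 shows the gap between the range test and the named values.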
Part 2: Standard Error Structure
The ERRSTRCP copybook captured detailed error information:
```cobol
      *================================================================
      * ERRSTRCP - GlobalBank Standard Error Structure
      *================================================================
       01  ERR-STRUCTURE.
           05  ERR-PROGRAM-ID      PIC X(8).
           05  ERR-PARAGRAPH-ID    PIC X(30).
           05  ERR-TIMESTAMP       PIC X(26).
           05  ERR-SEVERITY        PIC S9(4) COMP.
           05  ERR-CATEGORY        PIC X(4).
           05  ERR-CODE            PIC X(8).
           05  ERR-SHORT-MSG       PIC X(40).
           05  ERR-LONG-MSG        PIC X(200).
           05  ERR-FIELD-INFO.
               10  ERR-FIELD-NAME  PIC X(30).
               10  ERR-FIELD-VAL   PIC X(50).
               10  ERR-EXPECTED    PIC X(50).
           05  ERR-CONTEXT.
               10  ERR-ACCT-ID     PIC X(10).
               10  ERR-TXN-ID      PIC X(15).
               10  ERR-USER-ID     PIC X(8).
```
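The copybook defines the layout but not how a subprogram fills it. A minimal sketch, assuming a subprogram reporting catalog entry ACRD0001: it clears the structure, sets the return code through the condition name, and copies that value into ERR-SEVERITY. The program name ERRDEMO, the paragraph name, the sample account ID, and the category value ACCT are hypothetical. Because the case study does not specify the 26-byte timestamp format, the sketch simply stores the 21-character FUNCTION CURRENT-DATE value, space-padded.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. ERRDEMO.
      * Sketch: filling ERR-STRUCTURE for catalog entry ACRD0001.
      * Layouts are copied inline; RETCODCP is abridged.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  RTN-CODE-AREA.
           05  RTN-CODE            PIC S9(4) COMP.
               88  RTN-ERROR       VALUE 8.
               88  RTN-FAILED      VALUES 8 THRU 16.
       01  ERR-STRUCTURE.
           05  ERR-PROGRAM-ID      PIC X(8).
           05  ERR-PARAGRAPH-ID    PIC X(30).
           05  ERR-TIMESTAMP       PIC X(26).
           05  ERR-SEVERITY        PIC S9(4) COMP.
           05  ERR-CATEGORY        PIC X(4).
           05  ERR-CODE            PIC X(8).
           05  ERR-SHORT-MSG       PIC X(40).
           05  ERR-LONG-MSG        PIC X(200).
           05  ERR-FIELD-INFO.
               10  ERR-FIELD-NAME  PIC X(30).
               10  ERR-FIELD-VAL   PIC X(50).
               10  ERR-EXPECTED    PIC X(50).
           05  ERR-CONTEXT.
               10  ERR-ACCT-ID     PIC X(10).
               10  ERR-TXN-ID      PIC X(15).
               10  ERR-USER-ID     PIC X(8).
      * Hypothetical account ID used for the demonstration.
       01  WS-ACCT-ID              PIC X(10) VALUE "0012345678".
       01  WS-SEV-DISP             PIC 99.

       PROCEDURE DIVISION.
       MAIN-PARA.
           PERFORM SET-ACCT-NOT-FOUND
           MOVE ERR-SEVERITY TO WS-SEV-DISP
           DISPLAY ERR-CODE " SEV=" WS-SEV-DISP
                   " ACCT=" ERR-ACCT-ID
           IF RTN-FAILED
               DISPLAY "CALLER SEES RTN-FAILED"
           END-IF
           STOP RUN.

       SET-ACCT-NOT-FOUND.
      *    Clear details left over from any earlier call.
           INITIALIZE ERR-STRUCTURE
           SET RTN-ERROR TO TRUE
      *    Program ID of the module being simulated.
           MOVE "ACCTREAD"            TO ERR-PROGRAM-ID
           MOVE "SET-ACCT-NOT-FOUND"  TO ERR-PARAGRAPH-ID
      *    Format not specified in the case study: store the
      *    21-character CURRENT-DATE value, space-padded to 26.
           MOVE FUNCTION CURRENT-DATE TO ERR-TIMESTAMP
      *    Severity mirrors the return code.
           MOVE RTN-CODE              TO ERR-SEVERITY
      *    Hypothetical category value.
           MOVE "ACCT"                TO ERR-CATEGORY
           MOVE "ACRD0001"            TO ERR-CODE
           MOVE "Account not found"   TO ERR-SHORT-MSG
           MOVE "ACCT-ID"             TO ERR-FIELD-NAME
           MOVE WS-ACCT-ID            TO ERR-FIELD-VAL
           MOVE WS-ACCT-ID            TO ERR-ACCT-ID.
```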
Part 3: Error Code Catalog
Maria created a central catalog mapping error codes to meanings:
| Code | Module | Severity | Description |
|---|---|---|---|
| ACRD0001 | ACCTREAD | 8 | Account not found |
| ACRD0002 | ACCTREAD | 12 | VSAM I/O error |
| ACRD0003 | ACCTREAD | 4 | Account in closed status |
| ACVL0001 | ACCTVAL | 8 | Invalid account type |
| ACVL0002 | ACCTVAL | 8 | Balance below minimum |
| ACVL0003 | ACCTVAL | 4 | Account approaching minimum |
| ACCL0001 | ACCTCALC | 8 | Interest rate out of range |
| ACCL0002 | ACCTCALC | 8 | Arithmetic overflow |
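The catalog is a keyed lookup from ERR-CODE to severity and description. As a sketch only (the case study keeps the catalog as a separate document, not in code), the same eight entries can be held in a COBOL table and searched linearly with SEARCH. The program name CATDEMO and the unknown code ZZZZ9999 are hypothetical; the module column is omitted because the code prefix already identifies the module.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. CATDEMO.
      * Sketch: the error code catalog as an in-memory table,
      * searched by ERR-CODE. Entries mirror the catalog table.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  CAT-VALUES.
           05 FILLER PIC X(8)  VALUE "ACRD0001".
           05 FILLER PIC 99    VALUE 08.
           05 FILLER PIC X(30) VALUE "Account not found".
           05 FILLER PIC X(8)  VALUE "ACRD0002".
           05 FILLER PIC 99    VALUE 12.
           05 FILLER PIC X(30) VALUE "VSAM I/O error".
           05 FILLER PIC X(8)  VALUE "ACRD0003".
           05 FILLER PIC 99    VALUE 04.
           05 FILLER PIC X(30) VALUE "Account in closed status".
           05 FILLER PIC X(8)  VALUE "ACVL0001".
           05 FILLER PIC 99    VALUE 08.
           05 FILLER PIC X(30) VALUE "Invalid account type".
           05 FILLER PIC X(8)  VALUE "ACVL0002".
           05 FILLER PIC 99    VALUE 08.
           05 FILLER PIC X(30) VALUE "Balance below minimum".
           05 FILLER PIC X(8)  VALUE "ACVL0003".
           05 FILLER PIC 99    VALUE 04.
           05 FILLER PIC X(30) VALUE "Account approaching minimum".
           05 FILLER PIC X(8)  VALUE "ACCL0001".
           05 FILLER PIC 99    VALUE 08.
           05 FILLER PIC X(30) VALUE "Interest rate out of range".
           05 FILLER PIC X(8)  VALUE "ACCL0002".
           05 FILLER PIC 99    VALUE 08.
           05 FILLER PIC X(30) VALUE "Arithmetic overflow".
      * Same storage viewed as a searchable table of 8 entries.
       01  CAT-TABLE REDEFINES CAT-VALUES.
           05  CAT-ENTRY OCCURS 8 TIMES INDEXED BY CAT-IDX.
               10  CAT-CODE        PIC X(8).
               10  CAT-SEVERITY    PIC 99.
               10  CAT-DESC        PIC X(30).
       01  WS-LOOKUP-CODE          PIC X(8).

       PROCEDURE DIVISION.
       MAIN-PARA.
           MOVE "ACRD0001" TO WS-LOOKUP-CODE
           PERFORM LOOKUP-CODE
           MOVE "ZZZZ9999" TO WS-LOOKUP-CODE
           PERFORM LOOKUP-CODE
           STOP RUN.

       LOOKUP-CODE.
      *    Linear search from the first entry.
           SET CAT-IDX TO 1
           SEARCH CAT-ENTRY
               AT END
                   DISPLAY WS-LOOKUP-CODE ": NOT IN CATALOG"
               WHEN CAT-CODE (CAT-IDX) = WS-LOOKUP-CODE
                   DISPLAY WS-LOOKUP-CODE " SEV="
                       CAT-SEVERITY (CAT-IDX) " "
                       FUNCTION TRIM(CAT-DESC (CAT-IDX))
           END-SEARCH.
```

The AT END branch is where a code that exists in a module but not in the catalog would surface.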
Implementation Approach
Maria used BY CONTENT for input parameters and BY REFERENCE for outputs, making the data flow direction visible at the call site:
```cobol
      * Before: ambiguous parameter direction
           CALL 'ACCTVAL' USING WS-ACCT-DATA
                                WS-RESULT

      * After: clear input/output distinction
           CALL 'ACCTVAL' USING BY CONTENT   WS-ACCT-ID
                                BY CONTENT   WS-ACCT-TYPE
                                BY REFERENCE WS-ACCT-STATUS
                                BY REFERENCE RTN-CODE-AREA
                                BY REFERENCE ERR-STRUCTURE
```
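The distinction at the call site rests on a language guarantee: BY CONTENT hands the callee a copy, so nothing the callee does to that parameter reaches the caller, while BY REFERENCE hands over the caller's own storage. A small sketch demonstrates this with a contained (nested) program; VALDEMO is a hypothetical stand-in for a validation routine such as ACCTVAL, not its actual code.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. BYDEMO.
      * Sketch: BY CONTENT versus BY REFERENCE. VALDEMO is a
      * hypothetical callee nested inside this program.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-ACCT-ID              PIC X(10) VALUE "0012345678".
       01  WS-RC                   PIC S9(4) COMP VALUE 0.
       01  WS-RC-DISP              PIC 99.
       PROCEDURE DIVISION.
       MAIN-PARA.
           CALL "VALDEMO" USING BY CONTENT   WS-ACCT-ID
                                BY REFERENCE WS-RC
           MOVE WS-RC TO WS-RC-DISP
           DISPLAY "ACCT-ID AFTER CALL: " WS-ACCT-ID
           DISPLAY "RC AFTER CALL: " WS-RC-DISP
           STOP RUN.

       IDENTIFICATION DIVISION.
       PROGRAM-ID. VALDEMO.
       DATA DIVISION.
       LINKAGE SECTION.
       01  LK-ACCT-ID              PIC X(10).
       01  LK-RC                   PIC S9(4) COMP.
       PROCEDURE DIVISION USING LK-ACCT-ID LK-RC.
      *    Overwrites only the temporary copy made for BY CONTENT.
           MOVE ALL "X" TO LK-ACCT-ID
      *    Written directly into the caller's WS-RC.
           MOVE 8 TO LK-RC
           GOBACK.
       END PROGRAM VALDEMO.
       END PROGRAM BYDEMO.
```

The account ID comes back unchanged even though the callee overwrote its parameter, while the return code arrives as set.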
The driver program's error handling became uniform:
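```cobol
       CHECK-CALL-RESULT.
           EVALUATE TRUE
               WHEN RTN-SUCCESS
                   CONTINUE
               WHEN RTN-WARNING
                   ADD 1 TO WS-WARN-COUNT
                   CALL 'ERRLOG' USING ERR-STRUCTURE
                                       WS-LOG-RC
               WHEN RTN-ERROR
                   ADD 1 TO WS-ERR-COUNT
                   CALL 'ERRLOG' USING ERR-STRUCTURE
                                       WS-LOG-RC
                   EXIT PARAGRAPH
      *        Stacked WHEN phrases share one branch
               WHEN RTN-SEVERE
               WHEN RTN-FATAL
                   CALL 'ERRLOG' USING ERR-STRUCTURE
                                       WS-LOG-RC
                   SET STOP-PROCESSING TO TRUE
      *        A value outside the standard set (such as 2) is
      *        treated as severe rather than guessed at
               WHEN OTHER
                   CALL 'ERRLOG' USING ERR-STRUCTURE
                                       WS-LOG-RC
                   SET STOP-PROCESSING TO TRUE
           END-EVALUATE.
```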
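To see the driver logic run end to end, the sketch below feeds a CHECK-CALL-RESULT paragraph a hypothetical sequence of return codes. A DISPLAY paragraph stands in for the ERRLOG call, whose interface the case study does not show, and treating an unrecognized code as severe is an assumption of this sketch. COBOL lets several WHEN phrases share one set of statements, as with RTN-SEVERE and RTN-FATAL here; naming two conditions after a single WHEN is not valid syntax.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. DRVDEMO.
      * Sketch: CHECK-CALL-RESULT driven by simulated return codes.
      * LOG-RESULT is a hypothetical stand-in for the ERRLOG call.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  RTN-CODE-AREA.
           05  RTN-CODE            PIC S9(4) COMP.
               88  RTN-SUCCESS     VALUE 0.
               88  RTN-WARNING     VALUE 4.
               88  RTN-ERROR       VALUE 8.
               88  RTN-SEVERE      VALUE 12.
               88  RTN-FATAL       VALUE 16.
      * Hypothetical codes returned by five successive calls.
       01  WS-SIM-VALUES.
           05  FILLER              PIC 99 VALUE 04.
           05  FILLER              PIC 99 VALUE 00.
           05  FILLER              PIC 99 VALUE 08.
           05  FILLER              PIC 99 VALUE 12.
           05  FILLER              PIC 99 VALUE 00.
       01  WS-SIM-TABLE REDEFINES WS-SIM-VALUES.
           05  WS-SIM-RC           PIC 99 OCCURS 5 TIMES.
       01  WS-IDX                  PIC 9.
       01  WS-RC-DISP              PIC 99.
       01  WS-WARN-COUNT           PIC 99 VALUE 0.
       01  WS-ERR-COUNT            PIC 99 VALUE 0.
       01  WS-STOP-FLAG            PIC X VALUE "N".
           88  STOP-PROCESSING     VALUE "Y".

       PROCEDURE DIVISION.
       MAIN-PARA.
           PERFORM VARYING WS-IDX FROM 1 BY 1
                   UNTIL WS-IDX > 5 OR STOP-PROCESSING
               MOVE WS-SIM-RC (WS-IDX) TO RTN-CODE
               PERFORM CHECK-CALL-RESULT
           END-PERFORM
           DISPLAY "WARNINGS=" WS-WARN-COUNT
                   " ERRORS=" WS-ERR-COUNT
                   " STOPPED=" WS-STOP-FLAG
           STOP RUN.

       CHECK-CALL-RESULT.
           EVALUATE TRUE
               WHEN RTN-SUCCESS
                   CONTINUE
               WHEN RTN-WARNING
                   ADD 1 TO WS-WARN-COUNT
                   PERFORM LOG-RESULT
               WHEN RTN-ERROR
                   ADD 1 TO WS-ERR-COUNT
                   PERFORM LOG-RESULT
               WHEN RTN-SEVERE
               WHEN RTN-FATAL
                   PERFORM LOG-RESULT
                   SET STOP-PROCESSING TO TRUE
      *        Assumption: unknown codes are handled as severe.
               WHEN OTHER
                   PERFORM LOG-RESULT
                   SET STOP-PROCESSING TO TRUE
           END-EVALUATE.

       LOG-RESULT.
           MOVE RTN-CODE TO WS-RC-DISP
           DISPLAY "ERRLOG STAND-IN: RC=" WS-RC-DISP.
```

With these sample codes, processing stops at the fourth value (12), so the fifth call is never examined.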
Results
The standardization took three weeks — not because the coding was complex, but because every module needed to be modified and retested. The results were measured over six months:
| Metric | Before | After |
|---|---|---|
| Average time to diagnose a production error | 2.5 hours | 1.5 hours |
| Incidents caused by misinterpreted return codes | 4 per quarter | 0 |
| Time for new developer to learn error handling | 2 weeks | 2 days |
| Error log usefulness rating (team survey, 1-5) | 2.1 | 4.6 |
Derek Washington summed it up: "Now when I see RTN-ERROR, I know exactly what it means. I look at ERR-CODE in the error structure, I look it up in the catalog, and I know what happened. No archaeology required."
Lessons Learned
- Standardization trumps cleverness: A simple, consistent convention understood by everyone is worth more than a clever, module-specific approach understood by one developer.
- BY CONTENT for inputs makes intent visible: When you see BY CONTENT at the call site, you know that parameter is input-only. The parameter passing mode becomes documentation.
- Level 88 condition names on return codes eliminate magic numbers: `IF RTN-OK` is clearer than `IF WS-RC < 8`.
- Error catalogs are essential: Return codes communicate severity; error codes communicate specifics. You need both.
- The cost of NOT standardizing is paid in debugging time: Every hour spent deciphering ad-hoc error codes is an hour not spent fixing the actual problem.
Discussion Questions
- The error structure includes ERR-CONTEXT with account-specific fields. How would you generalize this for programs that don't process accounts?
- Maria chose to use separate parameters (data, return code, error structure) rather than a single COMMAREA. What influenced this decision? When might a COMMAREA be better?
- The error code catalog is maintained as a separate document. What are the risks of this approach? How could you keep the catalog synchronized with the code?
- How would you handle a situation where a subprogram encounters multiple errors on a single call? The current structure supports only one error.
- Derek's incident (processing transactions against a non-existent account) highlights the cost of inconsistent return codes. What testing strategy would catch this kind of cross-module misinterpretation?