Case Study 1: GlobalBank's Standardized Return Codes and Error Structures

Background

After modularizing the account maintenance system (Chapter 22), Maria Chen faced a new problem: each subprogram communicated errors differently. ACCTREAD returned file status codes. ACCTVAL returned ad-hoc numeric codes (1 through 7). ACCTCALC returned English text in a message field but no numeric code. ACCTUPD sometimes returned codes and sometimes did not, depending on which developer had last modified it.

"We had modularized the code, but we hadn't modularized the thinking," Maria explains. "Every time I debugged a production issue, I spent the first 30 minutes just figuring out what the error codes meant for the specific module that had failed. Is 3 an error or a warning in ACCTVAL? Nobody remembered."

Derek Washington, as the newest team member, felt the pain most acutely. "I'd see return code 2 from one module and assume it was a mild warning, because in another module it was. Turned out it meant 'account not found' — a real error. I let the job continue processing, and we posted transactions to a non-existent account. That was a bad day."

The Standardization Initiative

Maria proposed a three-part standard:

Part 1: Universal Return Code Convention

Every subprogram would use the RETCODCP copybook:

      *================================================================
      * RETCODCP — GlobalBank Standard Return Code
      *================================================================
       01  RTN-CODE-AREA.
           05  RTN-CODE           PIC S9(4) COMP.
               88  RTN-SUCCESS    VALUE 0.
               88  RTN-WARNING    VALUE 4.
               88  RTN-ERROR      VALUE 8.
               88  RTN-SEVERE     VALUE 12.
               88  RTN-FATAL      VALUE 16.
               88  RTN-OK         VALUES 0 THRU 4.
               88  RTN-FAILED     VALUES 8 THRU 16.

No exceptions. No variations. No "but my module is different."
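The following minimal sketch shows how the condition names are meant to be used. The program name RCDEMO and its paragraph names are hypothetical. The RETCODCP fields are copied inline instead of being brought in with COPY, so the example compiles on its own; it assumes a compiler such as GnuCOBOL in fixed format. The code is set with SET ... TO TRUE and tested through RTN-OK, so no numeric literal appears in the procedure logic.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. RCDEMO.
      * Hypothetical demo; RETCODCP is inlined instead of COPY.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  RTN-CODE-AREA.
           05  RTN-CODE           PIC S9(4) COMP.
               88  RTN-SUCCESS    VALUE 0.
               88  RTN-WARNING    VALUE 4.
               88  RTN-ERROR      VALUE 8.
               88  RTN-SEVERE     VALUE 12.
               88  RTN-FATAL      VALUE 16.
               88  RTN-OK         VALUES 0 THRU 4.
               88  RTN-FAILED     VALUES 8 THRU 16.
       01  WS-DISPLAY-RC          PIC 9(4).
       PROCEDURE DIVISION.
       MAIN-PARA.
      * Set the code by condition name, never by literal number.
           SET RTN-WARNING TO TRUE
           PERFORM SHOW-RESULT
           SET RTN-ERROR TO TRUE
           PERFORM SHOW-RESULT
           STOP RUN.
       SHOW-RESULT.
      * Convert the binary code to display form for printing.
           MOVE RTN-CODE TO WS-DISPLAY-RC
      * Callers test the broad outcome, not individual values.
           IF RTN-OK
               DISPLAY 'RC=' WS-DISPLAY-RC ' OK'
           ELSE
               DISPLAY 'RC=' WS-DISPLAY-RC ' FAILED'
           END-IF.
```

Run as written, it should display RC=0004 OK followed by RC=0008 FAILED.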

Part 2: Standard Error Structure

The ERRSTRCP copybook captured detailed error information:

      *================================================================
      * ERRSTRCP — GlobalBank Standard Error Structure
      *================================================================
       01  ERR-STRUCTURE.
           05  ERR-PROGRAM-ID     PIC X(8).
           05  ERR-PARAGRAPH-ID   PIC X(30).
           05  ERR-TIMESTAMP      PIC X(26).
           05  ERR-SEVERITY       PIC S9(4) COMP.
           05  ERR-CATEGORY       PIC X(4).
           05  ERR-CODE           PIC X(8).
           05  ERR-SHORT-MSG      PIC X(40).
           05  ERR-LONG-MSG       PIC X(200).
           05  ERR-FIELD-INFO.
               10  ERR-FIELD-NAME PIC X(30).
               10  ERR-FIELD-VAL  PIC X(50).
               10  ERR-EXPECTED   PIC X(50).
           05  ERR-CONTEXT.
               10  ERR-ACCT-ID    PIC X(10).
               10  ERR-TXN-ID     PIC X(15).
               10  ERR-USER-ID    PIC X(8).
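As a sketch of how a subprogram might fill in ERRSTRCP before returning, the hypothetical program ERRDEMO below builds the ACVL0001 entry from the catalog. The paragraph name VALIDATE-ACCT-TYPE, the category value 'DATA', the sample account ID, and the list of valid account types are all invented for illustration. FUNCTION CURRENT-DATE returns 21 characters, so the last 5 bytes of the 26-byte ERR-TIMESTAMP are left as spaces; if those 26 bytes are meant to hold a formatted timestamp, the program would have to build it explicitly.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. ERRDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * ERRSTRCP fields copied inline so the sketch stands alone.
       01  ERR-STRUCTURE.
           05  ERR-PROGRAM-ID     PIC X(8).
           05  ERR-PARAGRAPH-ID   PIC X(30).
           05  ERR-TIMESTAMP      PIC X(26).
           05  ERR-SEVERITY       PIC S9(4) COMP.
           05  ERR-CATEGORY       PIC X(4).
           05  ERR-CODE           PIC X(8).
           05  ERR-SHORT-MSG      PIC X(40).
           05  ERR-LONG-MSG       PIC X(200).
           05  ERR-FIELD-INFO.
               10  ERR-FIELD-NAME PIC X(30).
               10  ERR-FIELD-VAL  PIC X(50).
               10  ERR-EXPECTED   PIC X(50).
           05  ERR-CONTEXT.
               10  ERR-ACCT-ID    PIC X(10).
               10  ERR-TXN-ID     PIC X(15).
               10  ERR-USER-ID    PIC X(8).
       01  WS-SEV-DISP            PIC 9(4).
       PROCEDURE DIVISION.
       MAIN-PARA.
           PERFORM VALIDATE-ACCT-TYPE
           MOVE ERR-SEVERITY TO WS-SEV-DISP
           DISPLAY ERR-CODE ' SEV=' WS-SEV-DISP ' '
                   FUNCTION TRIM (ERR-SHORT-MSG)
           STOP RUN.
      * Hypothetical paragraph: pretend the account type failed.
       VALIDATE-ACCT-TYPE.
      * Clear every field so nothing leaks from an earlier error.
           INITIALIZE ERR-STRUCTURE
           MOVE 'ACCTVAL' TO ERR-PROGRAM-ID
           MOVE 'VALIDATE-ACCT-TYPE' TO ERR-PARAGRAPH-ID
      * CURRENT-DATE is 21 characters; the rest is space-filled.
           MOVE FUNCTION CURRENT-DATE TO ERR-TIMESTAMP
           MOVE 8 TO ERR-SEVERITY
           MOVE 'DATA' TO ERR-CATEGORY
           MOVE 'ACVL0001' TO ERR-CODE
           MOVE 'Invalid account type' TO ERR-SHORT-MSG
           MOVE 'ACCT-TYPE' TO ERR-FIELD-NAME
           MOVE 'ZZ' TO ERR-FIELD-VAL
           MOVE 'CH, SV or MM' TO ERR-EXPECTED
           MOVE '0012345678' TO ERR-ACCT-ID.
```

Only the deterministic fields are displayed, so the output should be ACVL0001 SEV=0008 Invalid account type regardless of when it runs.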

Part 3: Error Code Catalog

Maria created a central catalog mapping error codes to meanings:

Code      Module    Severity  Description
--------  --------  --------  ---------------------------
ACRD0001  ACCTREAD  8         Account not found
ACRD0002  ACCTREAD  12        VSAM I/O error
ACRD0003  ACCTREAD  4         Account in closed status
ACVL0001  ACCTVAL   8         Invalid account type
ACVL0002  ACCTVAL   8         Balance below minimum
ACVL0003  ACCTVAL   4         Account approaching minimum
ACCL0001  ACCTCALC  8         Interest rate out of range
ACCL0002  ACCTCALC  8         Arithmetic overflow
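The case study keeps the catalog as a document, but the same mapping could also be compiled into a program as a lookup table. The sketch below assumes that approach; the program name CATDEMO and all data names are hypothetical. Each entry holds the 8-character code, a 2-digit severity, and a 30-character description, and SEARCH scans the table serially until it finds a match or runs off the end.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. CATDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Catalog entries: code (8), severity (2), description (30).
       01  CAT-VALUES.
           05  FILLER PIC X(8)  VALUE 'ACRD0001'.
           05  FILLER PIC 9(2)  VALUE 8.
           05  FILLER PIC X(30) VALUE 'Account not found'.
           05  FILLER PIC X(8)  VALUE 'ACRD0002'.
           05  FILLER PIC 9(2)  VALUE 12.
           05  FILLER PIC X(30) VALUE 'VSAM I/O error'.
           05  FILLER PIC X(8)  VALUE 'ACRD0003'.
           05  FILLER PIC 9(2)  VALUE 4.
           05  FILLER PIC X(30) VALUE 'Account in closed status'.
           05  FILLER PIC X(8)  VALUE 'ACVL0001'.
           05  FILLER PIC 9(2)  VALUE 8.
           05  FILLER PIC X(30) VALUE 'Invalid account type'.
           05  FILLER PIC X(8)  VALUE 'ACVL0002'.
           05  FILLER PIC 9(2)  VALUE 8.
           05  FILLER PIC X(30) VALUE 'Balance below minimum'.
           05  FILLER PIC X(8)  VALUE 'ACVL0003'.
           05  FILLER PIC 9(2)  VALUE 4.
           05  FILLER PIC X(30) VALUE 'Account approaching minimum'.
           05  FILLER PIC X(8)  VALUE 'ACCL0001'.
           05  FILLER PIC 9(2)  VALUE 8.
           05  FILLER PIC X(30) VALUE 'Interest rate out of range'.
           05  FILLER PIC X(8)  VALUE 'ACCL0002'.
           05  FILLER PIC 9(2)  VALUE 8.
           05  FILLER PIC X(30) VALUE 'Arithmetic overflow'.
      * Overlay the literals with an indexed table.
       01  CAT-TABLE REDEFINES CAT-VALUES.
           05  CAT-ENTRY OCCURS 8 TIMES INDEXED BY CAT-IDX.
               10  CAT-CODE       PIC X(8).
               10  CAT-SEVERITY   PIC 9(2).
               10  CAT-DESC       PIC X(30).
       01  WS-LOOKUP-CODE         PIC X(8).
       PROCEDURE DIVISION.
       MAIN-PARA.
           MOVE 'ACVL0003' TO WS-LOOKUP-CODE
           PERFORM LOOKUP-CODE
           MOVE 'ACRD0009' TO WS-LOOKUP-CODE
           PERFORM LOOKUP-CODE
           STOP RUN.
       LOOKUP-CODE.
      * Serial SEARCH must start from the first entry each time.
           SET CAT-IDX TO 1
           SEARCH CAT-ENTRY
               AT END
                   DISPLAY WS-LOOKUP-CODE ' NOT IN CATALOG'
               WHEN CAT-CODE (CAT-IDX) = WS-LOOKUP-CODE
                   DISPLAY CAT-CODE (CAT-IDX) ' SEV='
                       CAT-SEVERITY (CAT-IDX) ' '
                       FUNCTION TRIM (CAT-DESC (CAT-IDX))
           END-SEARCH.
```

The first lookup should print ACVL0003 SEV=04 Account approaching minimum; the second, for a code that is not in the table, should print ACRD0009 NOT IN CATALOG.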

Implementation Approach

Maria used BY CONTENT for input parameters and BY REFERENCE for outputs, making the data flow direction visible at the call site:

      * Before: ambiguous parameter direction
       CALL 'ACCTVAL' USING WS-ACCT-DATA
                             WS-RESULT

      * After: clear input/output distinction
       CALL 'ACCTVAL' USING BY CONTENT WS-ACCT-ID
                             BY CONTENT WS-ACCT-TYPE
                             BY REFERENCE WS-ACCT-STATUS
                             BY REFERENCE RTN-CODE-AREA
                             BY REFERENCE ERR-STRUCTURE
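A small experiment shows the difference between the two modes. In this sketch, VALSTUB is a hypothetical stand-in for ACCTVAL, nested inside the hypothetical caller PARMDEMO so that the source compiles as a single unit. VALSTUB writes to both of its parameters, but only the BY REFERENCE change reaches the caller, because BY CONTENT hands the subprogram a copy of the data.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. PARMDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-ACCT-TYPE           PIC X(2)  VALUE 'ZZ'.
       01  WS-ACCT-STATUS         PIC X(1)  VALUE SPACE.
       PROCEDURE DIVISION.
       MAIN-PARA.
           CALL 'VALSTUB' USING BY CONTENT   WS-ACCT-TYPE
                                BY REFERENCE WS-ACCT-STATUS
           END-CALL
      * The subprogram overwrote both parameters, but only the
      * BY REFERENCE one is visible here.
           DISPLAY 'TYPE=' WS-ACCT-TYPE ' STATUS=' WS-ACCT-STATUS
           STOP RUN.

       IDENTIFICATION DIVISION.
       PROGRAM-ID. VALSTUB.
       DATA DIVISION.
       LINKAGE SECTION.
       01  LK-ACCT-TYPE           PIC X(2).
       01  LK-ACCT-STATUS         PIC X(1).
       PROCEDURE DIVISION USING LK-ACCT-TYPE LK-ACCT-STATUS.
      * Writing to a BY CONTENT parameter changes only the copy.
           MOVE 'XX' TO LK-ACCT-TYPE
           MOVE 'E'  TO LK-ACCT-STATUS
           GOBACK.
       END PROGRAM VALSTUB.
       END PROGRAM PARMDEMO.
```

The caller should display TYPE=ZZ STATUS=E: the input survives untouched even though the subprogram tried to change it.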

The driver program's error handling became uniform:

       CHECK-CALL-RESULT.
           EVALUATE TRUE
               WHEN RTN-SUCCESS
                   CONTINUE
               WHEN RTN-WARNING
                   ADD 1 TO WS-WARN-COUNT
                   CALL 'ERRLOG' USING ERR-STRUCTURE
                                       WS-LOG-RC
               WHEN RTN-ERROR
                   ADD 1 TO WS-ERR-COUNT
                   CALL 'ERRLOG' USING ERR-STRUCTURE
                                       WS-LOG-RC
                   EXIT PARAGRAPH
               WHEN RTN-SEVERE
               WHEN RTN-FATAL
                   CALL 'ERRLOG' USING ERR-STRUCTURE
                                       WS-LOG-RC
                   SET STOP-PROCESSING TO TRUE
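      * Codes outside the convention must never pass silently.
               WHEN OTHER
                   CALL 'ERRLOG' USING ERR-STRUCTURE
                                       WS-LOG-RC
                   SET STOP-PROCESSING TO TRUE
           END-EVALUATE.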
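Derek's incident came from a return code the caller did not expect. The runnable sketch below (hypothetical program EVALDEMO) classifies a few sample codes using the five severity condition names from RETCODCP. It stacks two WHEN phrases for the stop conditions and adds a WHEN OTHER branch that treats any value outside 0, 4, 8, 12, and 16 as a reason to stop. The ERRLOG calls are replaced by DISPLAY so that the example runs without the logging subprogram.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. EVALDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  RTN-CODE-AREA.
           05  RTN-CODE           PIC S9(4) COMP.
               88  RTN-SUCCESS    VALUE 0.
               88  RTN-WARNING    VALUE 4.
               88  RTN-ERROR      VALUE 8.
               88  RTN-SEVERE     VALUE 12.
               88  RTN-FATAL      VALUE 16.
      * Sample codes; 2 is a leftover ad-hoc value.
       01  WS-TEST-CODES.
           05  FILLER             PIC 9(2) VALUE 0.
           05  FILLER             PIC 9(2) VALUE 4.
           05  FILLER             PIC 9(2) VALUE 8.
           05  FILLER             PIC 9(2) VALUE 2.
       01  WS-TEST-TABLE REDEFINES WS-TEST-CODES.
           05  WS-TEST-RC         PIC 9(2) OCCURS 4 TIMES.
       01  WS-I                   PIC 9(2).
       01  WS-RC-DISP             PIC 9(2).
       PROCEDURE DIVISION.
       MAIN-PARA.
           PERFORM VARYING WS-I FROM 1 BY 1 UNTIL WS-I > 4
               MOVE WS-TEST-RC (WS-I) TO RTN-CODE
               PERFORM CLASSIFY-RC
           END-PERFORM
           STOP RUN.
       CLASSIFY-RC.
           MOVE RTN-CODE TO WS-RC-DISP
           EVALUATE TRUE
               WHEN RTN-SUCCESS
                   DISPLAY WS-RC-DISP ' SUCCESS'
               WHEN RTN-WARNING
                   DISPLAY WS-RC-DISP ' WARNING'
               WHEN RTN-ERROR
                   DISPLAY WS-RC-DISP ' ERROR'
      * Stacked WHEN phrases share one set of statements.
               WHEN RTN-SEVERE
               WHEN RTN-FATAL
                   DISPLAY WS-RC-DISP ' STOP'
      * Any value outside the convention is treated as severe.
               WHEN OTHER
                   DISPLAY WS-RC-DISP ' NONSTANDARD - STOP'
           END-EVALUATE.
```

Without the WHEN OTHER branch, the code 2 would match no WHEN phrase and processing would simply continue, which is the failure mode Derek described.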

Results

The standardization took three weeks — not because the coding was complex, but because every module needed to be modified and retested. The results were measured over six months:

Metric                                           Before         After
-----------------------------------------------  -------------  ---------
Average time to diagnose a production error      2.5 hours      1.5 hours
Incidents caused by misinterpreted return codes  4 per quarter  0
Time for new developer to learn error handling   2 weeks        2 days
Error log usefulness rating (team survey, 1-5)   2.1            4.6

Derek Washington summed it up: "Now when I see RTN-ERROR, I know exactly what it means. I look at ERR-CODE in the error structure, I look it up in the catalog, and I know what happened. No archaeology required."

Lessons Learned

  1. Standardization trumps cleverness: A simple, consistent convention understood by everyone is worth more than a clever, module-specific approach understood by one developer.

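  2. BY CONTENT for inputs makes intent visible: BY CONTENT passes the subprogram a copy of the data, so nothing the subprogram does can change the caller's field. When you see BY CONTENT at the call site, you know that parameter is input-only. The parameter passing mode becomes documentation.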

  3. Level 88 condition names on return codes eliminate magic numbers: IF RTN-OK is clearer than IF RTN-CODE < 8.

  4. Error catalogs are essential: Return codes communicate severity; error codes communicate specifics. You need both.

  5. The cost of NOT standardizing is paid in debugging time: Every hour spent deciphering ad-hoc error codes is an hour not spent fixing the actual problem.

Discussion Questions

  1. The error structure includes ERR-CONTEXT with account-specific fields. How would you generalize this for programs that don't process accounts?

  2. Maria chose to use separate parameters (data, return code, error structure) rather than a single COMMAREA. What influenced this decision? When might a COMMAREA be better?

  3. The error code catalog is maintained as a separate document. What are the risks of this approach? How could you keep the catalog synchronized with the code?

  4. How would you handle a situation where a subprogram encounters multiple errors on a single call? The current structure supports only one error.

  5. Derek's incident (processing transactions against a non-existent account) highlights the cost of inconsistent return codes. What testing strategy would catch this kind of cross-module misinterpretation?