In This Chapter
- 38.1 Batch Architecture: The Big Picture
- 38.2 Restart and Recovery
- 38.3 Control Totals
- 38.4 Audit Trails
- 38.5 The Balanced Update Pattern
- 38.6 Multi-Step Job Streams
- 38.7 JCL for Complex Batch
- 38.8 Generation Data Groups (GDG)
- 38.9 Try It Yourself: Building a Checkpoint/Restart Program
- 38.10 GlobalBank Case Study: The Nightly Batch Cycle
- 38.11 MedClaim Case Study: Claims Batch Processing Pipeline
- 38.12 Advanced Patterns
- 38.13 Return Codes and Job Stream Communication
- 38.14 Performance Considerations
- 38.15 Putting It All Together
- 38.16 GDG Management in Production
- 38.17 Production Scheduling: TWS and CA-7
- 38.18 Restart/Recovery: A Complete Worked Example
- 38.19 Multi-Step Job Dependencies: Advanced Patterns
- 38.20 Chapter Summary
Chapter 38: Batch Processing Patterns
"Batch processing is the heartbeat of the mainframe. Online systems get the glory, but batch jobs do the heavy lifting — every single night, without fail, while the rest of the world sleeps." — Maria Chen, reviewing GlobalBank's nightly batch schedule
You have written batch programs before. You have read records from a file, processed them, and written output. But the batch programs running in production at banks, insurance companies, and government agencies are a different animal entirely. They run in coordinated sequences. They restart cleanly after failures. They track every dollar, every record, every change. They produce audit trails that satisfy regulators. And they do all of this across millions of records every night within a window that keeps shrinking as transaction volumes grow.
This chapter teaches you the patterns that make enterprise batch processing reliable, auditable, and recoverable. These are not theoretical abstractions — they are the techniques that Maria Chen uses every night at GlobalBank and that James Okafor relies on at MedClaim. When a job fails at 2:47 AM and the operations team pages you, these patterns are what stand between a smooth restart and a very long night.
38.1 Batch Architecture: The Big Picture
Before we dive into code, you need to understand how enterprise batch processing is organized. A production batch environment is not a single program — it is an orchestrated system of interconnected jobs that must execute in a specific order, within a specific window, producing specific outputs.
The Batch Window
The batch window is the time period — typically overnight — when batch jobs run. At GlobalBank, the batch window opens at 11:00 PM when the last online CICS region quiesces, and it must close by 6:00 AM when online processing resumes. That gives Maria's team seven hours to process the entire day's work.
📊 GlobalBank Nightly Batch Window
- 11:00 PM — Online regions quiesce, batch window opens
- 11:15 PM — Extract jobs pull daily transactions
- 11:45 PM — Validation and enrichment jobs
- 12:30 AM — Core processing: interest calculation, fee assessment
- 2:00 AM — Account updates: post transactions to master files
- 3:00 AM — General ledger reconciliation
- 4:00 AM — Statement generation, regulatory reporting
- 5:00 AM — Data distribution: feeds to downstream systems
- 5:30 AM — Housekeeping: archive, backup, catalog maintenance
- 6:00 AM — Batch window closes, online regions restart
If any job runs long or fails, every subsequent job is affected. This is why restart/recovery is not optional — it is essential.
Job Streams and Dependencies
A job stream is a sequence of batch jobs that must execute in order. Dependencies can be:
- Sequential: Job B cannot start until Job A completes successfully
- Parallel: Jobs C and D can run simultaneously after Job B
- Conditional: Job E runs only if Job C produced output
- Resource-based: Job F requires exclusive access to a dataset
💡 Key Insight: Modern mainframe scheduling tools like CA-7, TWS (Tivoli Workload Scheduler), and Control-M manage these dependencies automatically. But as a COBOL developer, you must understand the dependencies because your program's design — especially its restart behavior — must align with the job stream architecture.
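The four dependency types reduce to a directed graph that the scheduler walks in topological order. As a language-neutral sketch (Python; the job names echo GlobalBank's but the dependency map is invented for illustration), here is how a scheduler could derive a valid execution order:

```python
from graphlib import TopologicalSorter

# Hypothetical dependencies: each job maps to the jobs it must wait for.
deps = {
    "GBSORT01": {"GBEXTRACT"},            # sequential: B after A
    "GBVALID":  {"GBSORT01"},
    "GBENRICH": {"GBSORT01"},             # may run parallel with GBVALID
    "GBPOST":   {"GBVALID", "GBENRICH"},  # waits on both branches
}

order = list(TopologicalSorter(deps).static_order())
print(order)  # GBEXTRACT first, GBPOST last
```

Real schedulers like TWS and Control-M layer conditions, calendars, and resources on top of this, but the core ordering problem is exactly this graph walk.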
The Typical Batch Program Structure
Most enterprise batch COBOL programs follow a common structure:
INITIALIZATION
- Open files
- Load reference tables
- Initialize counters and accumulators
- Read checkpoint file (for restart)
- Position to restart point (if restarting)
MAIN PROCESSING LOOP
- Read input record
- Validate record
- Apply business rules
- Write output
- Increment counters
- Write checkpoint at intervals
TERMINATION
- Write final control totals
- Write audit trail records
- Close all files
- Write completion status
- Return appropriate return code
Every pattern in this chapter fits into this structure. Let us examine each one.
38.2 Restart and Recovery
Restart/recovery is the most important pattern in enterprise batch processing. A program that processes 10 million records must be able to resume from a failure point without reprocessing records that were already handled successfully.
Why Restart Matters
Consider GlobalBank's nightly transaction posting job. It reads 2.3 million transactions and posts them against the account master file. If the job fails at transaction 1,500,000 due to a disk error, you have two choices:
1. Restart from the beginning: Reprocess 1.5 million transactions. But those transactions already updated the master file. You would double-post them, corrupting every affected account.
2. Restart from the failure point: Resume at transaction 1,500,001. This requires knowing exactly where you left off and ensuring the last batch of updates was either fully committed or fully backed out.
Option 2 is the only viable approach. This is what checkpoint/restart provides.
Checkpoint Records
A checkpoint is a snapshot of your program's state at a specific point in processing. It records:
- The number of records processed so far
- Current control total values
- The key of the last record successfully processed
- Any in-flight accumulations
- A timestamp
You write checkpoints at regular intervals — every 5,000 records, every 10,000 records, or at natural boundaries in your data (such as each time the account number changes).
01 WS-CHECKPOINT-RECORD.
05 CHKPT-PROGRAM-ID PIC X(08).
05 CHKPT-RUN-DATE PIC 9(08).
05 CHKPT-RUN-TIME PIC 9(06).
05 CHKPT-TIMESTAMP PIC X(26).
05 CHKPT-RECORDS-READ PIC 9(10).
05 CHKPT-RECORDS-WRITTEN PIC 9(10).
05 CHKPT-RECORDS-REJECTED PIC 9(10).
05 CHKPT-LAST-KEY PIC X(20).
05 CHKPT-HASH-TOTAL PIC S9(15) COMP-3.
05 CHKPT-FINANCIAL-TOTAL PIC S9(13)V99 COMP-3.
05 CHKPT-STATUS PIC X(01).
88 CHKPT-IN-PROGRESS VALUE 'P'.
88 CHKPT-COMPLETED VALUE 'C'.
88 CHKPT-ABENDED VALUE 'A'.
05 CHKPT-INTERVAL-COUNT PIC 9(05).
05 CHKPT-FILLER PIC X(50).
The Checkpoint/Restart Pattern
Here is the core logic for checkpoint/restart:
PROCEDURE DIVISION.
0000-MAIN-CONTROL.
PERFORM 1000-INITIALIZATION
PERFORM 2000-PROCESS-RECORDS
UNTIL WS-EOF-FLAG = 'Y'
PERFORM 9000-TERMINATION
STOP RUN.
1000-INITIALIZATION.
OPEN INPUT TXN-FILE
OPEN I-O ACCT-MASTER
OPEN I-O CHECKPOINT-FILE
* -----------------------------------------------
* Check for restart: read checkpoint file
* -----------------------------------------------
READ CHECKPOINT-FILE INTO WS-CHECKPOINT-RECORD
AT END
MOVE 'N' TO WS-RESTART-FLAG
NOT AT END
IF CHKPT-IN-PROGRESS
MOVE 'Y' TO WS-RESTART-FLAG
PERFORM 1500-POSITION-FOR-RESTART
ELSE
MOVE 'N' TO WS-RESTART-FLAG
END-IF
END-READ
IF WS-RESTART-FLAG = 'N'
INITIALIZE WS-CHECKPOINT-RECORD
MOVE 'TXNPOST' TO CHKPT-PROGRAM-ID
MOVE WS-RUN-DATE TO CHKPT-RUN-DATE
SET CHKPT-IN-PROGRESS TO TRUE
END-IF.
1500-POSITION-FOR-RESTART.
* -----------------------------------------------
* Skip records already processed
* Restore counters from checkpoint
* -----------------------------------------------
MOVE CHKPT-RECORDS-READ TO WS-RECORDS-READ
MOVE CHKPT-RECORDS-WRITTEN TO WS-RECORDS-WRITTEN
MOVE CHKPT-RECORDS-REJECTED TO WS-RECORDS-REJECTED
MOVE CHKPT-HASH-TOTAL TO WS-HASH-TOTAL
MOVE CHKPT-FINANCIAL-TOTAL TO WS-FINANCIAL-TOTAL
PERFORM VARYING WS-SKIP-COUNT
FROM 1 BY 1
UNTIL WS-SKIP-COUNT > CHKPT-RECORDS-READ
OR WS-EOF-FLAG = 'Y'
READ TXN-FILE INTO WS-TXN-RECORD
AT END
MOVE 'Y' TO WS-EOF-FLAG
END-READ
END-PERFORM
DISPLAY 'RESTART: Skipped ' CHKPT-RECORDS-READ
' records, resuming with the next record'.
⚠️ Critical Point: When restarting, you must skip exactly the right number of input records AND ensure that any partially updated master file records are handled correctly. This is why many shops use VSAM with backout capabilities or DB2 with commit/rollback — they provide transactional integrity that sequential files cannot.
Checkpoint Interval
How often should you write checkpoints? The answer involves a trade-off:
- More frequent checkpoints = less reprocessing after failure, but more I/O overhead
- Less frequent checkpoints = better performance, but more reprocessing after failure
A common guideline is to checkpoint every 5,000 to 10,000 records. At GlobalBank, Maria's team uses a configurable checkpoint interval stored in a parameter file:
01 WS-PARM-RECORD.
05 PARM-CHECKPOINT-INTERVAL PIC 9(05) VALUE 5000.
05 PARM-COMMIT-FREQUENCY PIC 9(05) VALUE 1000.
05 PARM-MAX-ERRORS PIC 9(05) VALUE 100.
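The trade-off can be quantified. With a checkpoint every K records, a single mid-run failure costs on average K/2 records of reprocessing, while a run of N records pays N/K checkpoint writes. A back-of-the-envelope sketch (Python; all timing figures are illustrative, not measured):

```python
def checkpoint_cost(n_records, interval, chkpt_write_ms, record_ms):
    """Expected overhead in ms: checkpoint I/O for the whole run,
    plus average reprocessing after one mid-run failure."""
    checkpoint_io = (n_records // interval) * chkpt_write_ms
    rework = (interval / 2) * record_ms   # average records redone
    return checkpoint_io + rework

# 2.3M records, 5 ms per checkpoint write, 0.2 ms per record
for k in (1_000, 5_000, 50_000):
    print(k, round(checkpoint_cost(2_300_000, k, 5.0, 0.2)))
```

Under these assumed numbers the middle interval wins, which is consistent with the 5,000-to-10,000 guideline; with different I/O costs the sweet spot moves, which is exactly why Maria's team keeps the interval in a parameter file rather than hard-coding it.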
Writing the Checkpoint
2500-WRITE-CHECKPOINT.
MOVE WS-RECORDS-READ TO CHKPT-RECORDS-READ
MOVE WS-RECORDS-WRITTEN TO CHKPT-RECORDS-WRITTEN
MOVE WS-RECORDS-REJECTED TO CHKPT-RECORDS-REJECTED
MOVE WS-LAST-KEY-PROCESSED
TO CHKPT-LAST-KEY
MOVE WS-HASH-TOTAL TO CHKPT-HASH-TOTAL
MOVE WS-FINANCIAL-TOTAL TO CHKPT-FINANCIAL-TOTAL
ADD 1 TO CHKPT-INTERVAL-COUNT
MOVE FUNCTION CURRENT-DATE TO CHKPT-TIMESTAMP
SET CHKPT-IN-PROGRESS TO TRUE
* Rewrite the single checkpoint record in place.
* (On a QSAM file every REWRITE must follow a
* READ; many shops use a one-record VSAM file
* for the checkpoint instead.)
REWRITE CHECKPOINT-RECORD FROM WS-CHECKPOINT-RECORD
IF CHKPT-FILE-STATUS NOT = '00'
DISPLAY 'FATAL: Checkpoint write failed, '
'status=' CHKPT-FILE-STATUS
MOVE 16 TO RETURN-CODE
STOP RUN
END-IF.
💡 Key Insight: Never ignore a checkpoint write failure. If you cannot write a checkpoint, you cannot guarantee restart capability. Abort immediately with a high return code so the scheduler knows the job failed.
38.3 Control Totals
Control totals are the immune system of batch processing. They catch errors that logic alone cannot detect — missing records, duplicate processing, data corruption, and rounding drift.
Types of Control Totals
There are three categories:
1. Record Counts
Simple counts of records read, written, processed, rejected, and skipped. These are the most basic control totals.
01 WS-RECORD-COUNTS.
05 WS-RECORDS-READ PIC 9(10) VALUE 0.
05 WS-RECORDS-WRITTEN PIC 9(10) VALUE 0.
05 WS-RECORDS-UPDATED PIC 9(10) VALUE 0.
05 WS-RECORDS-REJECTED PIC 9(10) VALUE 0.
05 WS-RECORDS-SKIPPED PIC 9(10) VALUE 0.
The fundamental control total equation is:
Records Read = Records Written + Records Rejected + Records Skipped
If this equation does not balance at the end of the run, something went wrong.
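The equation is trivial to encode, which is the point: a sketch of a verification check in Python (the figures come from the GlobalBank control-total report shown later in this section):

```python
def counts_balance(read, written, rejected, skipped):
    """The fundamental control-total equation:
    records read must equal written + rejected + skipped."""
    return read == written + rejected + skipped

# Balanced run:
print(counts_balance(2_347_891, 2_340_156, 7_412, 323))   # True
# Lose a single record anywhere and the equation fails:
print(counts_balance(2_347_891, 2_340_156, 7_412, 322))   # False
```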
2. Hash Totals
A hash total is a sum of a non-financial field — typically account numbers or record keys. The sum itself is meaningless, but if two systems independently compute the same hash total over the same set of records, you know they processed the same records.
01 WS-HASH-TOTALS.
05 WS-HASH-ACCT-NO PIC S9(15) COMP-3 VALUE 0.
05 WS-HASH-CLAIM-NO PIC S9(15) COMP-3 VALUE 0.
At MedClaim, James Okafor uses hash totals to verify that every claim that enters the intake process exits the adjudication process:
* Accumulate hash total of claim numbers
ADD CLM-CLAIM-NUMBER TO WS-HASH-CLAIM-NO
At the end of processing, the claim intake hash total must match the sum of adjudicated hash totals plus rejected hash totals.
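Because the hash is just a sum, partitioning the claims into adjudicated and rejected sets must conserve it. A sketch of James's check in Python (the claim numbers are invented):

```python
intake      = [90210014, 90210015, 90210016, 90210017]  # hypothetical claims
adjudicated = [90210014, 90210016, 90210017]
rejected    = [90210015]

intake_hash = sum(intake)
# Intake hash must equal adjudicated hash plus rejected hash:
print(intake_hash == sum(adjudicated) + sum(rejected))      # True
# Drop one claim anywhere in the pipeline and the totals disagree:
print(intake_hash == sum(adjudicated[1:]) + sum(rejected))  # False
```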
3. Financial Totals
Financial totals track monetary amounts. They must balance to the penny.
01 WS-FINANCIAL-TOTALS.
05 WS-TOTAL-DEBITS PIC S9(13)V99 COMP-3 VALUE 0.
05 WS-TOTAL-CREDITS PIC S9(13)V99 COMP-3 VALUE 0.
05 WS-NET-AMOUNT PIC S9(13)V99 COMP-3 VALUE 0.
05 WS-TOTAL-INTEREST PIC S9(11)V99 COMP-3 VALUE 0.
05 WS-TOTAL-FEES PIC S9(11)V99 COMP-3 VALUE 0.
📊 Control Total Report — GlobalBank Nightly Posting
GLOBALBANK TRANSACTION POSTING - CONTROL TOTAL REPORT
RUN DATE: 2026-03-10 RUN TIME: 01:47:33
===================================================
RECORD COUNTS:
TRANSACTIONS READ: 2,347,891
TRANSACTIONS POSTED: 2,340,156
TRANSACTIONS REJECTED: 7,412
TRANSACTIONS SKIPPED: 323
COUNT VERIFICATION: BALANCED ✓
HASH TOTALS:
INPUT ACCT HASH: 483,291,847,223
OUTPUT ACCT HASH: 483,291,847,223
HASH VERIFICATION: BALANCED ✓
FINANCIAL TOTALS:
TOTAL DEBITS: $ 847,291,433.27
TOTAL CREDITS: $ 851,003,892.41
NET MOVEMENT: $ -3,712,459.14
GL RECONCILIATION: BALANCED ✓
Implementing Control Total Verification
9100-VERIFY-CONTROL-TOTALS.
* -----------------------------------------------
* Verify record count balance
* -----------------------------------------------
COMPUTE WS-EXPECTED-TOTAL =
WS-RECORDS-WRITTEN +
WS-RECORDS-REJECTED +
WS-RECORDS-SKIPPED
IF WS-EXPECTED-TOTAL NOT = WS-RECORDS-READ
DISPLAY 'ERROR: Record count imbalance'
DISPLAY ' Read: ' WS-RECORDS-READ
DISPLAY ' Expected: ' WS-EXPECTED-TOTAL
MOVE 12 TO RETURN-CODE
MOVE 'Y' TO WS-ERROR-FLAG
END-IF
* -----------------------------------------------
* Verify hash total balance
* -----------------------------------------------
COMPUTE WS-HASH-DIFF =
WS-HASH-INPUT - WS-HASH-OUTPUT
IF WS-HASH-DIFF NOT = 0
DISPLAY 'ERROR: Hash total imbalance'
DISPLAY ' Input hash: ' WS-HASH-INPUT
DISPLAY ' Output hash: ' WS-HASH-OUTPUT
DISPLAY ' Difference: ' WS-HASH-DIFF
MOVE 12 TO RETURN-CODE
MOVE 'Y' TO WS-ERROR-FLAG
END-IF
* -----------------------------------------------
* Verify financial balance (debits = credits)
* -----------------------------------------------
COMPUTE WS-FINANCIAL-DIFF =
WS-TOTAL-DEBITS - WS-TOTAL-CREDITS
- WS-NET-AMOUNT
IF WS-FINANCIAL-DIFF NOT = 0
DISPLAY 'WARNING: Financial imbalance of '
WS-FINANCIAL-DIFF
MOVE 08 TO RETURN-CODE
END-IF.
⚠️ Common Trap: Never use floating-point (COMP-1 or COMP-2) for financial totals. The rounding errors will accumulate and your totals will not balance. Always use COMP-3 (packed decimal) or DISPLAY numeric for money.
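The drift the trap warns about is easy to demonstrate in any language. In this Python sketch, binary float plays the role COMP-1/COMP-2 would, and decimal.Decimal plays the role of packed decimal:

```python
from decimal import Decimal

# Post one thousand $0.10 fees with binary floating point...
float_total = sum(0.1 for _ in range(1000))
# ...and with exact decimal arithmetic, as COMP-3 would.
packed_total = sum(Decimal("0.10") for _ in range(1000))

print(float_total == 100.0)                # False: accumulated rounding drift
print(packed_total == Decimal("100.00"))   # True: balances to the penny
```

One thousand postings is a small batch; at 2.3 million transactions a night, the float total would not merely miss the penny, it would drift unpredictably from run to run.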
38.4 Audit Trails
Regulators, auditors, and compliance teams need to answer a simple question: What changed, when, and why? Audit trails provide the answer.
Before/After Images
The most rigorous audit approach records both the before image (the record before the change) and the after image (the record after the change).
01 WS-AUDIT-RECORD.
05 AUDIT-TIMESTAMP PIC X(26).
05 AUDIT-PROGRAM-ID PIC X(08).
05 AUDIT-USER-ID PIC X(08).
05 AUDIT-ACTION PIC X(01).
88 AUDIT-ADD VALUE 'A'.
88 AUDIT-UPDATE VALUE 'U'.
88 AUDIT-DELETE VALUE 'D'.
05 AUDIT-RECORD-KEY PIC X(20).
05 AUDIT-FIELD-NAME PIC X(30).
05 AUDIT-BEFORE-VALUE PIC X(50).
05 AUDIT-AFTER-VALUE PIC X(50).
05 AUDIT-REASON-CODE PIC X(04).
Writing Audit Records
3500-WRITE-AUDIT-TRAIL.
* -----------------------------------------------
* Compare before and after images field by field
* Write audit record for each changed field
* -----------------------------------------------
MOVE FUNCTION CURRENT-DATE
TO AUDIT-TIMESTAMP
MOVE 'TXNPOST' TO AUDIT-PROGRAM-ID
MOVE WS-USER-ID TO AUDIT-USER-ID
SET AUDIT-UPDATE TO TRUE
MOVE ACCT-ACCOUNT-NO TO AUDIT-RECORD-KEY
* Check each auditable field
IF WS-BEFORE-BALANCE NOT = WS-AFTER-BALANCE
MOVE 'ACCT-BALANCE' TO AUDIT-FIELD-NAME
* WS-BEFORE/AFTER-BALANCE are assumed to be
* display-format work fields; a packed (COMP-3)
* amount must first pass through a numeric-edited
* field before an alphanumeric MOVE like this
MOVE WS-BEFORE-BALANCE TO AUDIT-BEFORE-VALUE
MOVE WS-AFTER-BALANCE TO AUDIT-AFTER-VALUE
MOVE 'TPST' TO AUDIT-REASON-CODE
WRITE AUDIT-FILE-RECORD FROM WS-AUDIT-RECORD
END-IF
IF WS-BEFORE-STATUS NOT = WS-AFTER-STATUS
MOVE 'ACCT-STATUS' TO AUDIT-FIELD-NAME
MOVE WS-BEFORE-STATUS TO AUDIT-BEFORE-VALUE
MOVE WS-AFTER-STATUS TO AUDIT-AFTER-VALUE
MOVE 'STCH' TO AUDIT-REASON-CODE
WRITE AUDIT-FILE-RECORD FROM WS-AUDIT-RECORD
END-IF.
💡 Key Insight: Field-level audit trails are more useful than record-level ones. When an auditor asks "who changed this customer's credit limit?", you can answer precisely instead of forcing them to compare two complete records to find the difference.
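The field-by-field comparison in 3500-WRITE-AUDIT-TRAIL generalizes to any record layout. A sketch in Python (field names, values, and the record key are illustrative):

```python
def audit_changes(before, after, program_id, key):
    """Compare before/after images field by field and emit one
    audit record per changed field, as the COBOL routine does."""
    trail = []
    for field in before:
        if before[field] != after[field]:
            trail.append({
                "program": program_id, "key": key, "field": field,
                "before": before[field], "after": after[field],
            })
    return trail

before = {"ACCT-BALANCE": "1045.00", "ACCT-STATUS": "A"}
after  = {"ACCT-BALANCE": "1032.50", "ACCT-STATUS": "A"}
trail = audit_changes(before, after, "TXNPOST", "0000012345")
print(trail)  # one record: only ACCT-BALANCE changed
```

The COBOL version enumerates the auditable fields explicitly, which auditors often prefer: the program itself documents exactly which fields are tracked.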
Audit Trail for Batch Runs
Beyond individual record changes, you should also audit the batch run itself:
01 WS-BATCH-AUDIT-RECORD.
05 BA-RUN-ID PIC X(20).
05 BA-PROGRAM-ID PIC X(08).
05 BA-START-TIMESTAMP PIC X(26).
05 BA-END-TIMESTAMP PIC X(26).
05 BA-RECORDS-PROCESSED PIC 9(10).
05 BA-RECORDS-REJECTED PIC 9(10).
05 BA-RETURN-CODE PIC 9(04).
05 BA-COMPLETION-STATUS PIC X(08).
88 BA-SUCCESS VALUE 'SUCCESS '.
88 BA-WARNING VALUE 'WARNING '.
88 BA-FAILURE VALUE 'FAILURE '.
05 BA-CONTROL-TOTALS.
10 BA-HASH-TOTAL PIC S9(15) COMP-3.
10 BA-FIN-TOTAL PIC S9(13)V99 COMP-3.
05 BA-CHECKPOINT-COUNT PIC 9(05).
05 BA-RESTART-FLAG PIC X(01).
88 BA-FRESH-RUN VALUE 'N'.
88 BA-RESTART-RUN VALUE 'Y'.
38.5 The Balanced Update Pattern
The balanced update pattern (also called balanced line update or sequential master file update) is one of the most important patterns in batch COBOL. It processes a sorted transaction file against a sorted master file, producing an updated master file.
The Classic Algorithm
Both files must be sorted on the same key. The algorithm works by comparing keys:
IF master-key < transaction-key:
Write master record unchanged to new master
Read next master record
ELSE IF master-key = transaction-key:
Apply transaction to master record
(Don't write yet — there may be more transactions for this key)
ELSE (master-key > transaction-key):
If transaction is an ADD, create new master record
If transaction is UPDATE/DELETE, it's an error (no matching master)
Read next transaction
Here is the complete pattern:
2000-BALANCED-UPDATE.
PERFORM 2100-READ-MASTER
PERFORM 2200-READ-TRANSACTION
* -----------------------------------------------
* EOF needs no special WHEN clauses: the read
* paragraphs move HIGH-VALUES to the key at end
* of file, so the three comparisons cover it
* -----------------------------------------------
PERFORM UNTIL WS-BOTH-EOF
EVALUATE TRUE
WHEN MSTR-KEY < TXN-KEY
PERFORM 2300-WRITE-MASTER-UNCHANGED
PERFORM 2100-READ-MASTER
WHEN MSTR-KEY = TXN-KEY
* 2400 reads ahead through all txns for
* this key and writes the master itself
PERFORM 2400-APPLY-TRANSACTION
PERFORM 2100-READ-MASTER
WHEN MSTR-KEY > TXN-KEY
PERFORM 2500-UNMATCHED-TRANSACTION
PERFORM 2200-READ-TRANSACTION
END-EVALUATE
END-PERFORM.
2400-APPLY-TRANSACTION.
* -----------------------------------------------
* Process all transactions for the current key
* Then write the updated master
* -----------------------------------------------
EVALUATE TXN-ACTION-CODE
WHEN 'U'
PERFORM 2410-UPDATE-MASTER-FIELDS
WHEN 'D'
SET WS-DELETE-FLAG TO TRUE
WHEN OTHER
PERFORM 9500-WRITE-ERROR-RECORD
END-EVALUATE
* Check for more transactions with the same key
PERFORM 2200-READ-TRANSACTION
PERFORM UNTIL TXN-KEY NOT = MSTR-KEY
OR WS-TXN-EOF
EVALUATE TXN-ACTION-CODE
WHEN 'U'
PERFORM 2410-UPDATE-MASTER-FIELDS
WHEN 'D'
SET WS-DELETE-FLAG TO TRUE
WHEN OTHER
PERFORM 9500-WRITE-ERROR-RECORD
END-EVALUATE
PERFORM 2200-READ-TRANSACTION
END-PERFORM
IF NOT WS-DELETE-FLAG
PERFORM 2300-WRITE-MASTER-UPDATED
END-IF
* SET ... TO FALSE requires the 88 level for
* WS-DELETE-FLAG to specify WHEN SET TO FALSE
SET WS-DELETE-FLAG TO FALSE.
⚠️ Critical Pattern Rule: Always handle the case where multiple transactions exist for the same master key. The classic mistake is to write the updated master after the first matching transaction, then find a second transaction for the same key with no master to update.
High-Value Keys: End-of-File Handling
A subtle but critical detail in the balanced update is handling end-of-file conditions. When one file reaches EOF before the other, you need to continue processing the remaining file. The standard technique uses high-value keys:
2100-READ-MASTER.
READ MASTER-FILE INTO WS-MASTER-RECORD
AT END
MOVE HIGH-VALUES TO MSTR-KEY
MOVE 'Y' TO WS-MASTER-EOF-FLAG
END-READ.
2200-READ-TRANSACTION.
READ TXN-FILE INTO WS-TXN-RECORD
AT END
MOVE HIGH-VALUES TO TXN-KEY
MOVE 'Y' TO WS-TXN-EOF-FLAG
END-READ.
By setting the key to HIGH-VALUES at EOF, the comparison logic naturally handles the remaining records: a master key of HIGH-VALUES is greater than any real transaction key, so leftover transactions fall through the unmatched path, and a transaction key of HIGH-VALUES is greater than any real master key, so leftover masters are written through unchanged. When both keys reach HIGH-VALUES, processing is complete.
01 WS-EOF-FLAGS.
05 WS-MASTER-EOF-FLAG PIC X(01) VALUE 'N'.
88 WS-MASTER-EOF VALUE 'Y'.
05 WS-TXN-EOF-FLAG PIC X(01) VALUE 'N'.
88 WS-TXN-EOF VALUE 'Y'.
* An 88 level on a REDEFINES of the whole flag
* group gives a single WS-BOTH-EOF condition name
01 WS-EOF-FLAGS-R REDEFINES WS-EOF-FLAGS PIC X(02).
88 WS-BOTH-EOF VALUE 'YY'.
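The HIGH-VALUES sentinel technique translates directly to other languages. A Python sketch of the balanced update's merge loop, using a byte-string sentinel in place of HIGH-VALUES; records are simplified to (key, data) pairs with at most one transaction per key (the COBOL version's inner loop handles multiples):

```python
HIGH_VALUES = "\xff" * 20   # sorts after any real 20-byte key

def read_next(records, pos):
    """Return (key, record, next_pos); at EOF the key becomes the sentinel."""
    if pos >= len(records):
        return HIGH_VALUES, None, pos
    return records[pos][0], records[pos], pos + 1

def balanced_update(masters, txns):
    """Merge sorted transactions against a sorted master file."""
    out, unmatched = [], []
    mkey, mrec, mi = read_next(masters, 0)
    tkey, trec, ti = read_next(txns, 0)
    while not (mkey == HIGH_VALUES and tkey == HIGH_VALUES):
        if mkey < tkey:
            out.append(mrec)                    # no txn: copy unchanged
            mkey, mrec, mi = read_next(masters, mi)
        elif mkey == tkey:
            out.append((mkey, trec[1]))         # apply the update
            mkey, mrec, mi = read_next(masters, mi)
            tkey, trec, ti = read_next(txns, ti)
        else:
            unmatched.append(trec)              # txn with no master
            tkey, trec, ti = read_next(txns, ti)
    return out, unmatched

masters = [("A001", "old"), ("A002", "old"), ("A004", "old")]
txns    = [("A002", "new"), ("A003", "new")]
out, unmatched = balanced_update(masters, txns)
print(out)        # A002 updated, A001/A004 unchanged
print(unmatched)  # A003 has no matching master
```

Note how EOF never appears in the loop body: once a file is exhausted, its sentinel key loses every comparison, exactly as HIGH-VALUES does in the COBOL version.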
38.6 Multi-Step Job Streams
Real batch processing involves multiple programs running in sequence. Each program performs one step in a larger process. The output of one step becomes the input of the next.
GlobalBank Nightly Cycle — Step by Step
Let us trace through GlobalBank's nightly batch cycle to see how multi-step job streams work in practice. Maria Chen manages this sequence, and she will tell you that every step exists for a reason — usually a reason discovered at 3 AM during a production incident.
Step 1: Extract (GBEXTRACT) Extract the day's transactions from the online transaction log into a sequential file.
Step 2: Sort (GBSORT01) Sort the extracted transactions by account number for balanced update processing.
Step 3: Validate (GBVALID) Validate each transaction against business rules. Write valid transactions to a good file, invalid ones to an error file.
Step 4: Post (GBPOST) Balanced update: apply valid transactions to the account master file. This is the critical step — the one that changes real balances.
Step 5: Interest (GBINTCALC) Calculate daily interest on all accounts. Read the updated master, compute interest, write interest accrual records.
Step 6: Fees (GBFEECALC) Assess monthly fees for accounts that meet fee criteria.
Step 7: GL Reconciliation (GBGLREC) Reconcile account totals against the general ledger. This must balance to zero.
Step 8: Statements (GBSTMT) Generate account statements for accounts with statement cycle dates matching today.
Step 9: Archive (GBARCHIVE) Archive the day's transaction file to a GDG for historical retention.
Step 10: Housekeeping (GBCLEAN) Clean up work files, update run control tables, prepare for the next cycle.
Each step produces a return code:
- 0 = success
- 4 = warning (minor issues, continue processing)
- 8 = error (significant issues, may need investigation)
- 12 = severe error (stop the job stream)
- 16 = catastrophic failure (immediate page to on-call)
Designing for Restartability
When designing a multi-step job stream, each step must be independently restartable. This means:
- Each step must be idempotent — running it twice produces the same result as running it once (where possible)
- Each step must check for prior completion — skip processing if already done
- Each step must produce clear return codes
- Each step must write control totals — so the next step can verify it received the correct input
* -----------------------------------------------
* Step initialization: check prior completion
* -----------------------------------------------
1000-INITIALIZATION.
PERFORM 1100-CHECK-RUN-CONTROL
IF WS-ALREADY-COMPLETED
DISPLAY 'GBPOST already completed for '
WS-BUSINESS-DATE
DISPLAY 'Skipping - return code 0'
MOVE 0 TO RETURN-CODE
STOP RUN
END-IF
* Verify input control totals from prior step
PERFORM 1200-VERIFY-INPUT-CONTROLS
IF WS-INPUT-CONTROLS-BAD
DISPLAY 'ERROR: Input control totals do not'
' match GBVALID output totals'
MOVE 16 TO RETURN-CODE
STOP RUN
END-IF.
✅ Best Practice: Always verify input control totals against the prior step's output control totals. If Step 4 receives a file with 100,000 records but Step 3 reported writing 100,001 records, something was lost or corrupted in transit.
38.7 JCL for Complex Batch
The Job Control Language (JCL) that drives batch jobs is not just boilerplate — it contains critical logic for job stream management. While a full JCL course is beyond our scope, you must understand the patterns that affect your COBOL programs.
Conditional Execution with COND and IF/THEN/ELSE
The traditional COND parameter controls whether a step executes based on return codes from prior steps:
//GBPOST EXEC PGM=GBPOST
// COND=(8,LT)
This means: bypass GBPOST if 8 is less than the return code of any prior step. In other words, the step is skipped when any prior step returned a code greater than 8, and runs only when every prior return code is 8 or lower.
⚠️ JCL Gotcha: COND states the condition for skipping the step, not for running it, and the constant you code is the left operand of the comparison. COND=(8,LT) reads "bypass if 8 is less than the prior return code", not "run if the return code is less than 8", which is how most people first parse it. Misread COND parameters are a source of countless production incidents.
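Because COND trips up so many people, it is worth encoding the rule once and testing your intuition against it. A sketch in Python: the constant is the left operand, the prior step's return code is the right, and a true test bypasses the step.

```python
def cond_bypasses(constant, operator, prior_rc):
    """COND=(constant,operator): the step is BYPASSED when
    'constant operator prior_rc' is true."""
    tests = {
        "GT": constant > prior_rc,  "GE": constant >= prior_rc,
        "EQ": constant == prior_rc, "NE": constant != prior_rc,
        "LT": constant < prior_rc,  "LE": constant <= prior_rc,
    }
    return tests[operator]

# COND=(8,LT): bypass when 8 < RC, so the step RUNS for RC 0, 4, 8
# and is skipped for RC 12 and 16.
for rc in (0, 4, 8, 12, 16):
    print(rc, "bypassed" if cond_bypasses(8, "LT", rc) else "runs")
```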
Modern JCL uses IF/THEN/ELSE/ENDIF, which is far more readable:
// IF (GBVALID.RC <= 4) THEN
//GBPOST EXEC PGM=GBPOST
//STEPLIB DD DSN=GBANK.PROD.LOADLIB,DISP=SHR
//MASTIN DD DSN=GBANK.ACCT.MASTER,DISP=OLD
//MASTOUT DD DSN=GBANK.ACCT.MASTER.NEW,
// DISP=(NEW,CATLG,DELETE),
// SPACE=(CYL,(100,50),RLSE),
// DCB=(RECFM=FB,LRECL=500,BLKSIZE=27500)
//TXNIN DD DSN=GBANK.TXN.VALID,DISP=SHR
//ERROUT DD DSN=GBANK.TXN.ERRORS,
// DISP=(NEW,CATLG,DELETE),
// SPACE=(TRK,(50,25),RLSE)
//SYSOUT DD SYSOUT=*
// ELSE
// EXEC PGM=IEFBR14
// ENDIF
💡 Key Insight: IF/THEN/ELSE in JCL evaluates the same return code logic but in a readable format. Always prefer this over COND in new jobs.
Multi-Step JCL Example
Here is a simplified version of GlobalBank's nightly batch JCL:
//GBNITELY JOB (ACCT),'NIGHTLY BATCH',
// CLASS=A,MSGCLASS=H,
// NOTIFY=&SYSUID,
// RESTART=*
//*
//* =============================================
//* GLOBALBANK NIGHTLY BATCH CYCLE
//* =============================================
//*
//* STEP 1: EXTRACT DAILY TRANSACTIONS
//EXTRACT EXEC PGM=GBEXTRACT
//STEPLIB DD DSN=GBANK.PROD.LOADLIB,DISP=SHR
//TXNLOG DD DSN=GBANK.ONLINE.TXNLOG,DISP=SHR
//TXNOUT DD DSN=GBANK.TXN.EXTRACT,
// DISP=(NEW,CATLG,DELETE),
// SPACE=(CYL,(200,100),RLSE),
// DCB=(RECFM=FB,LRECL=200,BLKSIZE=27800)
//CTLOUT DD DSN=GBANK.CTL.EXTRACT,
// DISP=(NEW,CATLG,DELETE),
// SPACE=(TRK,(1,1))
//SYSOUT DD SYSOUT=*
//*
//* STEP 2: SORT BY ACCOUNT NUMBER
// IF (EXTRACT.RC <= 4) THEN
//SORT01 EXEC PGM=SORT
//SORTIN DD DSN=GBANK.TXN.EXTRACT,DISP=SHR
//SORTOUT DD DSN=GBANK.TXN.SORTED,
// DISP=(NEW,CATLG,DELETE),
// SPACE=(CYL,(200,100),RLSE)
//SYSIN DD *
SORT FIELDS=(1,10,CH,A)
SUM FIELDS=NONE
/*
//SYSOUT DD SYSOUT=*
// ENDIF
//*
//* STEP 3: VALIDATE
// IF (SORT01.RC <= 4) THEN
//VALID EXEC PGM=GBVALID
//STEPLIB DD DSN=GBANK.PROD.LOADLIB,DISP=SHR
//TXNIN DD DSN=GBANK.TXN.SORTED,DISP=SHR
//TXNOUT DD DSN=GBANK.TXN.VALID,
// DISP=(NEW,CATLG,DELETE),
// SPACE=(CYL,(200,100),RLSE)
//ERROUT DD DSN=GBANK.TXN.ERRORS,
// DISP=(NEW,CATLG,DELETE),
// SPACE=(CYL,(10,5),RLSE)
//CTLIN DD DSN=GBANK.CTL.EXTRACT,DISP=SHR
//CTLOUT DD DSN=GBANK.CTL.VALID,
// DISP=(NEW,CATLG,DELETE),
// SPACE=(TRK,(1,1))
//SYSOUT DD SYSOUT=*
// ENDIF
38.8 Generation Data Groups (GDG)
A Generation Data Group (GDG) is a collection of chronologically related datasets that share a common base name. Each time you create a new generation, it gets a version number. Old generations can be automatically deleted based on a retention limit.
Why GDGs Matter
Consider GlobalBank's daily transaction archive. Every night, the archive step creates a file containing the day's transactions. Without GDGs, Maria would need to manually manage dataset names:
GBANK.TXN.ARCHIVE.D20260301
GBANK.TXN.ARCHIVE.D20260302
GBANK.TXN.ARCHIVE.D20260303
...
With GDGs, the system manages generations automatically:
GBANK.TXN.ARCHIVE.G0001V00 (oldest)
GBANK.TXN.ARCHIVE.G0002V00
GBANK.TXN.ARCHIVE.G0003V00
...
GBANK.TXN.ARCHIVE.G0365V00 (newest)
GDG Relative References
The real power of GDGs is relative referencing:
- GBANK.TXN.ARCHIVE(0) — the current (most recent) generation
- GBANK.TXN.ARCHIVE(+1) — the next generation (the one you are creating)
- GBANK.TXN.ARCHIVE(-1) — the previous generation
- GBANK.TXN.ARCHIVE(-2) — two generations back
This makes JCL generic — it never needs date-specific dataset names:
//* Create new generation of transaction archive
//ARCHIVE EXEC PGM=GBARCHIVE
//STEPLIB DD DSN=GBANK.PROD.LOADLIB,DISP=SHR
//TXNIN DD DSN=GBANK.TXN.VALID,DISP=SHR
//ARCOUT DD DSN=GBANK.TXN.ARCHIVE(+1),
// DISP=(NEW,CATLG,DELETE),
// SPACE=(CYL,(200,100),RLSE),
// DCB=(RECFM=FB,LRECL=200,BLKSIZE=27800)
//*
//* Compare today's archive with yesterday's for anomalies
//COMPARE EXEC PGM=GBCOMPARE
//TODAY DD DSN=GBANK.TXN.ARCHIVE(0),DISP=SHR
//YESTERDAY DD DSN=GBANK.TXN.ARCHIVE(-1),DISP=SHR
//RPTOUT DD SYSOUT=*
GDG in COBOL Programs
Your COBOL program does not need to know about GDG generations — the JCL handles the dataset resolution. Your program simply reads from or writes to the DD name:
INPUT-OUTPUT SECTION.
FILE-CONTROL.
SELECT ARCHIVE-FILE
ASSIGN TO ARCOUT
ORGANIZATION IS SEQUENTIAL
FILE STATUS IS WS-ARCHIVE-STATUS.
* The COBOL program writes to ARCOUT.
* The JCL maps ARCOUT to GBANK.TXN.ARCHIVE(+1).
* The system creates the new generation automatically.
📊 GDG Configuration Example
GDG Base: GBANK.TXN.ARCHIVE
Limit: 365 (keep one year of daily files)
Empty: YES (allow empty GDG base to be referenced)
Scratch: YES (delete uncataloged generations)
Order: LIFO (reading all generations via the base name returns the newest first)
GDG Limit and Rolloff
When the number of generations reaches the limit, the oldest generation is rolled off — uncataloged and optionally deleted. This provides automatic data lifecycle management.
//* Define a GDG base with a limit of 30 generations
//DEFGDG EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN DD *
DEFINE GDG -
(NAME(GBANK.MONTHLY.STMTS) -
LIMIT(30) -
SCRATCH -
NOEMPTY)
/*
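The rolloff mechanics can be modeled in a few lines. A toy sketch in Python of a GDG base defined with LIMIT(3) and SCRATCH (class, method, and dataset names are all invented for illustration):

```python
from collections import deque

class GdgBase:
    """Toy model of a GDG base: new generations push in,
    and the oldest rolls off once the limit is reached."""
    def __init__(self, base, limit):
        self.base, self.gen = base, 0
        self.live = deque(maxlen=limit)   # rolloff happens automatically

    def new_generation(self):             # like writing to BASE(+1)
        self.gen += 1
        self.live.append(f"{self.base}.G{self.gen:04d}V00")

    def relative(self, n):                # BASE(0), BASE(-1), ...
        return self.live[len(self.live) - 1 + n]

gdg = GdgBase("GBANK.TXN.ARCHIVE", limit=3)
for _ in range(5):                        # five nightly runs
    gdg.new_generation()
print(list(gdg.live))      # G0003..G0005 remain; G0001/G0002 rolled off
print(gdg.relative(0))     # GBANK.TXN.ARCHIVE.G0005V00
print(gdg.relative(-1))    # GBANK.TXN.ARCHIVE.G0004V00
```

The point of the model: JCL that refers to (0) and (+1) never changes, while the absolute G0000V00-style names march forward every night and the catalog prunes itself.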
💡 Key Insight: GDGs are one of the mainframe's most elegant features. They solve a problem — versioned dataset management — that distributed systems often handle with ad hoc scripting and manual cleanup. When Derek Washington first saw GDGs in action at GlobalBank, he said: "So it's basically Git for datasets?" Maria replied: "Git wishes it were this reliable."
38.9 Try It Yourself: Building a Checkpoint/Restart Program
Let us build a complete checkpoint/restart program for the Student Mainframe Lab. This program processes a student transaction file and writes a summary report, with full restart capability.
The Scenario
You have a file of student financial aid disbursements. Each record contains a student ID, disbursement amount, and fund code. You need to post these disbursements to a student account master file, with checkpoint/restart capability.
Step 1: Define the Files
IDENTIFICATION DIVISION.
PROGRAM-ID. STUDCHKR.
*
* Student Checkpoint/Restart Exercise
* Demonstrates checkpoint/restart pattern with
* control totals and audit trail.
*
ENVIRONMENT DIVISION.
INPUT-OUTPUT SECTION.
FILE-CONTROL.
SELECT DISBURSEMENT-FILE
ASSIGN TO DISBIN
ORGANIZATION IS SEQUENTIAL
FILE STATUS IS WS-DISB-STATUS.
SELECT STUDENT-MASTER
ASSIGN TO STUDMSTR
ORGANIZATION IS INDEXED
ACCESS MODE IS RANDOM
RECORD KEY IS SM-STUDENT-ID
FILE STATUS IS WS-MSTR-STATUS.
SELECT CHECKPOINT-FILE
ASSIGN TO CHKPTFL
ORGANIZATION IS SEQUENTIAL
FILE STATUS IS WS-CHKPT-STATUS.
SELECT AUDIT-FILE
ASSIGN TO AUDITFL
ORGANIZATION IS SEQUENTIAL
FILE STATUS IS WS-AUDIT-STATUS.
SELECT REPORT-FILE
ASSIGN TO RPTOUT
ORGANIZATION IS SEQUENTIAL
FILE STATUS IS WS-RPT-STATUS.
DATA DIVISION.
FILE SECTION.
FD DISBURSEMENT-FILE
RECORDING MODE IS F
RECORD CONTAINS 80 CHARACTERS.
01 DISB-RECORD PIC X(80).
FD STUDENT-MASTER
RECORD CONTAINS 200 CHARACTERS.
01 STUDENT-MASTER-RECORD.
05 SM-STUDENT-ID PIC X(10).
05 SM-STUDENT-NAME PIC X(30).
05 SM-BALANCE PIC S9(7)V99 COMP-3.
05 SM-LAST-DISB-DATE PIC 9(08).
05 SM-TOTAL-DISBURSED PIC S9(9)V99 COMP-3.
05 SM-FILLER PIC X(141).
FD CHECKPOINT-FILE
RECORDING MODE IS F
RECORD CONTAINS 100 CHARACTERS.
01 CHECKPOINT-RECORD PIC X(100).
FD AUDIT-FILE
RECORDING MODE IS F
RECORD CONTAINS 150 CHARACTERS.
01 AUDIT-FILE-RECORD PIC X(150).
FD REPORT-FILE
RECORDING MODE IS F
RECORD CONTAINS 132 CHARACTERS.
01 REPORT-RECORD PIC X(132).
WORKING-STORAGE SECTION.
01 WS-FILE-STATUSES.
05 WS-DISB-STATUS PIC X(02).
05 WS-MSTR-STATUS PIC X(02).
05 WS-CHKPT-STATUS PIC X(02).
05 WS-AUDIT-STATUS PIC X(02).
05 WS-RPT-STATUS PIC X(02).
01 WS-FLAGS.
05 WS-EOF-FLAG PIC X(01) VALUE 'N'.
88 WS-EOF VALUE 'Y'.
05 WS-RESTART-FLAG PIC X(01) VALUE 'N'.
88 WS-IS-RESTART VALUE 'Y'.
05 WS-ERROR-FLAG PIC X(01) VALUE 'N'.
88 WS-HAS-ERROR VALUE 'Y'.
01 WS-COUNTERS.
05 WS-RECORDS-READ PIC 9(10) VALUE 0.
05 WS-RECORDS-POSTED PIC 9(10) VALUE 0.
05 WS-RECORDS-REJECTED PIC 9(10) VALUE 0.
05 WS-CHECKPOINT-COUNT PIC 9(05) VALUE 0.
05 WS-SKIP-COUNT PIC 9(10) VALUE 0.
01 WS-TOTALS.
05 WS-TOTAL-DISBURSED PIC S9(11)V99 COMP-3
VALUE 0.
05 WS-HASH-TOTAL PIC S9(15) COMP-3
VALUE 0.
01 WS-PARMS.
05 WS-CHECKPOINT-INTERVAL PIC 9(05) VALUE 5000.
01 WS-DISB-WS.
05 WS-DISB-STUDENT-ID PIC X(10).
05 WS-DISB-AMOUNT PIC S9(7)V99 COMP-3.
05 WS-DISB-FUND-CODE PIC X(04).
05 WS-DISB-DATE PIC 9(08).
05 WS-DISB-FILLER PIC X(53).
Step 2: Implement the Logic
The key paragraphs are in the code file STUDCHKR.cbl provided with this chapter. Study the restart logic carefully — note how the program reads the checkpoint file during initialization and either resumes from where it left off or starts fresh.
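The restart decision itself is language-neutral. Here is a minimal sketch of the pattern in Python (the `Checkpoint` fields and function names are illustrative, not taken from STUDCHKR): on startup the program either restores its counters from an in-progress checkpoint or starts from zero.

```python
# Illustrative checkpoint/restart initialization (not the actual STUDCHKR code).
from dataclasses import dataclass

@dataclass
class Checkpoint:
    status: str            # 'P' = in progress, 'C' = completed
    records_read: int
    total_disbursed: float
    hash_total: int

def initialize(checkpoint):
    """Mirror of the restart decision: fresh start vs. resume."""
    if checkpoint is None or checkpoint.status == 'C':
        # No prior run, or the prior run completed: start fresh.
        return {'restart': False, 'skip': 0,
                'records_read': 0, 'total': 0.0, 'hash': 0}
    # Prior run died mid-stream: restore counters, then skip the
    # input records that were already processed.
    return {'restart': True, 'skip': checkpoint.records_read,
            'records_read': checkpoint.records_read,
            'total': checkpoint.total_disbursed,
            'hash': checkpoint.hash_total}
```

On a restart, the caller then reads and discards `skip` input records before resuming normal processing, which is exactly what the COBOL program does during its positioning logic.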
🔗 Cross-Reference: This pattern builds on the file handling techniques from Chapters 11–16 and the error handling patterns from Chapter 22.
38.10 GlobalBank Case Study: The Nightly Batch Cycle
Maria Chen arrived at GlobalBank eight years ago, inheriting a nightly batch cycle that was already 15 years old. Over those eight years, she has refined it into a system that runs 363 nights out of 365 without human intervention. The other two nights? "Usually a disk issue or a feed from an external system arriving late," she says. "The COBOL code itself almost never fails."
The Architecture
GlobalBank's nightly cycle consists of 47 batch jobs organized into 12 job streams. The job streams run in a specific dependency order managed by TWS (Tivoli Workload Scheduler). Here is the high-level flow:
┌─────────────┐
│ EXTRACT │
│ (Stream 1) │
└──────┬──────┘
│
┌──────▼──────┐
│ SORT │
│ (Stream 2) │
└──────┬──────┘
│
┌────────────┼────────────┐
│ │ │
┌──────▼──────┐ ┌──▼────┐ ┌─────▼─────┐
│ VALIDATE │ │ ENRICH│ │ XREF │
│ (Stream 3) │ │(Str 4)│ │ (Str 5) │
└──────┬──────┘ └──┬────┘ └─────┬─────┘
│ │ │
└────────────┼────────────┘
│
┌──────▼──────┐
│ POST │
│ (Stream 6) │
└──────┬──────┘
│
┌────────────┼────────────┐
│ │ │
┌──────▼──────┐ ┌──▼────────┐ ┌─▼──────────┐
│ INTEREST │ │ FEES │ │ GL RECON │
│ (Stream 7) │ │(Stream 8) │ │ (Stream 9) │
└──────┬──────┘ └──┬────────┘ └─┬──────────┘
│ │ │
└────────────┼────────────┘
│
┌────────────┼────────────┐
│ │ │
┌──────▼──────┐ ┌──▼────────┐ ┌─▼──────────┐
│ STATEMENTS │ │ REPORTS │ │ ARCHIVE │
│(Stream 10) │ │(Stream 11)│ │(Stream 12) │
└─────────────┘ └──────────┘ └────────────┘
The Nightly Incident
One Tuesday night, the SORT step (Stream 2) completed but produced a file with 2,341,000 records — 6,891 fewer than the EXTRACT step reported extracting. The VALIDATE step detected the hash total mismatch and aborted with return code 16.
The operations team paged Maria at 12:45 AM.
Maria's investigation:
1. She checked the EXTRACT control total report: 2,347,891 records extracted, hash total 483,291,847,223
2. She checked the SORT output: 2,341,000 records, hash total 481,983,512,190
3. The difference: 6,891 records lost, hash total difference of 1,308,335,033
The cause? A SORT parameter error introduced during a maintenance change that afternoon. The SORT was configured with SUM FIELDS=(21,8,PD), which was intended to sum a packed decimal field but accidentally consolidated records with duplicate keys — 6,891 records that had the same account number (legitimate duplicates from multiple transactions on the same account) were summed instead of kept separate.
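To see why SUM FIELDS was dangerous here, the behavior is easy to reproduce outside of SORT. This minimal Python sketch (sample data invented) simulates key-consolidating summation and shows that a hash total over the key column exposes the loss, even though the step itself reports success:

```python
# Simulate DFSORT SUM FIELDS behavior on duplicate keys (illustrative only).

def sort_with_sum(records):
    """records: list of (account_no, amount). Records with duplicate
    keys are consolidated into one record whose amount is the sum."""
    merged = {}
    for acct, amt in records:
        merged[acct] = merged.get(acct, 0) + amt
    return sorted(merged.items())

records = [(1001, 50), (1002, 75), (1001, 25), (1003, 10)]  # 1001 twice
out = sort_with_sum(records)

# The sort "succeeds", but one record has vanished:
assert len(records) == 4 and len(out) == 3
# A hash total over the key column catches the loss:
hash_in  = sum(acct for acct, _ in records)   # 1001+1002+1001+1003 = 4007
hash_out = sum(acct for acct, _ in out)       # 1001+1002+1003      = 3006
assert hash_in != hash_out
```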
Maria fixed the SORT parameter, restarted from the SORT step, and the batch window completed by 5:15 AM — 45 minutes behind schedule but fully reconciled.
"This is why control totals exist," Maria told Derek the next morning. "Without the hash total check in GBVALID, those 6,891 transactions would have been silently lost. Customers would have had incorrect balances, and we might not have caught it for days."
Lessons from the Incident
- Control totals caught an error that logic alone could not detect. The SORT completed successfully (return code 0) but produced incorrect output.
- Step-to-step control total verification is essential. Every step must verify that its input matches the prior step's output.
- The batch architecture supported clean restart. Maria restarted from Step 2 without rerunning Step 1.
- GDGs preserved the evidence. The original extract file was still available for the re-sort because it had not been scratched yet.
⚖️ Theme — Defensive Programming: This incident illustrates why defensive programming is not paranoia — it is engineering discipline. The hash total check in GBVALID took Maria 30 minutes to code originally. It saved the bank from a potential multi-million-dollar reconciliation nightmare.
38.11 MedClaim Case Study: Claims Batch Processing Pipeline
At MedClaim, James Okafor manages a claims processing pipeline that handles 500,000 claims per month. The pipeline runs nightly, processing claims received during the day through intake, validation, adjudication, and payment stages.
The Pipeline Architecture
┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
│ INTAKE │───▶│ VALIDATE │───▶│ADJUDICATE│───▶│ PAY │
│ CLM-INT │ │ CLM-VAL │ │ CLM-ADJ │ │ CLM-PAY │
└──────────┘ └──────────┘ └──────────┘ └──────────┘
│ │ │ │
▼ ▼ ▼ ▼
┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
│ CTL TOT │ │ CTL TOT │ │ CTL TOT │ │ CTL TOT │
│ REPORT │ │ REPORT │ │ REPORT │ │ REPORT │
└──────────┘ └──────────┘ └──────────┘ └──────────┘
Intake Stage (CLM-INT)
The intake program reads electronic claims from multiple sources — EDI 837 transactions, direct provider submissions, and paper claims entered by the data entry team. Each claim is assigned a unique claim number and written to the claims pending file.
* -----------------------------------------------
* CLM-INT: Assign claim number and write to
* pending file. Track control totals by source.
* -----------------------------------------------
2000-PROCESS-CLAIM.
ADD 1 TO WS-CLAIMS-READ
PERFORM 2100-ASSIGN-CLAIM-NUMBER
PERFORM 2200-VALIDATE-FORMAT
IF WS-FORMAT-VALID
PERFORM 2300-WRITE-PENDING
ADD 1 TO WS-CLAIMS-ACCEPTED
ADD CLM-CHARGED-AMOUNT
TO WS-TOTAL-CHARGED
ADD CLM-CLAIM-NUMBER
TO WS-HASH-CLAIM-NO
ELSE
PERFORM 2400-WRITE-REJECT
ADD 1 TO WS-CLAIMS-REJECTED
END-IF
* Checkpoint every N records
IF FUNCTION MOD(WS-CLAIMS-READ
WS-CHECKPOINT-INTERVAL) = 0
PERFORM 2500-WRITE-CHECKPOINT
END-IF.
Adjudication Stage (CLM-ADJ)
The adjudication program is the most complex step. It applies hundreds of business rules to determine coverage, calculate allowed amounts, apply deductibles and copays, and produce an explanation of benefits.
Sarah Kim, the business analyst, maintains a business rules matrix that maps directly to EVALUATE statements in the adjudication code:
* -----------------------------------------------
* Apply benefit rules based on plan type and
* service category
* -----------------------------------------------
3100-APPLY-BENEFIT-RULES.
EVALUATE CLM-PLAN-TYPE
ALSO CLM-SERVICE-CATEGORY
WHEN 'HMO' ALSO 'PREV'
MOVE 100 TO WS-COVERAGE-PCT
MOVE 0 TO WS-COPAY-AMOUNT
WHEN 'HMO' ALSO 'SPEC'
MOVE 80 TO WS-COVERAGE-PCT
MOVE 40 TO WS-COPAY-AMOUNT
WHEN 'PPO' ALSO 'PREV'
MOVE 90 TO WS-COVERAGE-PCT
MOVE 20 TO WS-COPAY-AMOUNT
WHEN 'PPO' ALSO 'SPEC'
MOVE 70 TO WS-COVERAGE-PCT
MOVE 50 TO WS-COPAY-AMOUNT
WHEN OTHER
PERFORM 3110-LOOKUP-CUSTOM-RULES
END-EVALUATE.
* Apply deductible
IF WS-MEMBER-DEDUCTIBLE-MET = 'N'
COMPUTE WS-DEDUCTIBLE-REMAINING =
WS-ANNUAL-DEDUCTIBLE -
WS-DEDUCTIBLE-YTD
IF WS-ALLOWED-AMOUNT <=
WS-DEDUCTIBLE-REMAINING
MOVE WS-ALLOWED-AMOUNT
TO WS-DEDUCTIBLE-APPLIED
MOVE 0 TO WS-PAYABLE-AMOUNT
ELSE
MOVE WS-DEDUCTIBLE-REMAINING
TO WS-DEDUCTIBLE-APPLIED
COMPUTE WS-PAYABLE-AMOUNT =
(WS-ALLOWED-AMOUNT -
WS-DEDUCTIBLE-REMAINING) *
WS-COVERAGE-PCT / 100
- WS-COPAY-AMOUNT
END-IF
ELSE
COMPUTE WS-PAYABLE-AMOUNT =
WS-ALLOWED-AMOUNT *
WS-COVERAGE-PCT / 100
- WS-COPAY-AMOUNT
END-IF.
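The nested deductible logic is easier to verify with concrete numbers. Here is the same arithmetic as a minimal Python sketch (parameter names shortened and invented; the COBOL above remains the authoritative version):

```python
def payable(allowed, deductible_met, annual_deductible, deductible_ytd,
            coverage_pct, copay):
    """Mirror of the deductible branch in 3100-APPLY-BENEFIT-RULES."""
    if not deductible_met:
        remaining = annual_deductible - deductible_ytd
        if allowed <= remaining:
            return 0.0   # claim fully absorbed by the deductible
        # pay coverage % of the portion above the deductible, minus copay
        return (allowed - remaining) * coverage_pct / 100 - copay
    return allowed * coverage_pct / 100 - copay

# PPO specialist visit: 70% coverage, $50 copay, $200 of deductible left
assert payable(500.0, False, 1000.0, 800.0, 70, 50) == 160.0
# Deductible already met:
assert payable(500.0, True, 1000.0, 1000.0, 70, 50) == 300.0
# Small claim entirely absorbed by the remaining deductible:
assert payable(150.0, False, 1000.0, 800.0, 70, 50) == 0.0
```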
End-to-End Control Total Verification
At the end of the pipeline, a reconciliation job verifies that every claim that entered intake is accounted for — either adjudicated and paid, rejected, or pended for manual review:
MEDCLAIM PIPELINE RECONCILIATION - 2026-03-10
===================================================
INTAKE:
Claims received: 18,247
Claims accepted: 17,891
Claims rejected (format): 356
Hash total (intake): 892,441,337
VALIDATION:
Claims validated: 17,612
Claims failed validation: 279
Hash total (valid): 862,113,448
Hash total (invalid): 30,327,889
Combined hash: 892,441,337 ✓ MATCHES INTAKE
ADJUDICATION:
Claims adjudicated: 17,612
Claims approved: 16,198
Claims denied: 1,414
Total charged: $4,271,443.82
Total allowed: $3,198,204.11
Total payable: $2,847,291.03
PAYMENT:
Claims paid: 16,198
Total paid: $2,847,291.03 ✓ MATCHES ADJUDICATION
PIPELINE VERIFICATION: BALANCED ✓
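The ✓ marks on this report are simple partition checks: every claim's hash contribution and record count must land in exactly one downstream bucket. Using the figures from the report above:

```python
# Verify the intake hash equals the sum of the partitioned hashes,
# as the reconciliation job does (figures from the report above).

intake_hash  = 892_441_337
valid_hash   = 862_113_448
invalid_hash =  30_327_889
assert valid_hash + invalid_hash == intake_hash   # VALIDATION balances

# Record counts partition the same way:
received, accepted, rejected = 18_247, 17_891, 356
assert accepted + rejected == received            # INTAKE balances
validated, failed = 17_612, 279
assert validated + failed == accepted             # VALIDATION counts balance
```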
🔴 MedClaim Reality Check: James Okafor's team once discovered a bug where the adjudication program was not counting denied claims in its hash total. The pipeline reconciliation caught the error — the adjudication hash total was short by 47 claims. "Those 47 denied claims would have disappeared from our system," James said. "No denial letters, no member notifications, no regulatory reporting. Control totals saved us from a compliance violation."
38.12 Advanced Patterns
Pattern: Commit Scope Management
When working with DB2, you control the commit scope — how many records you process before issuing a COMMIT:
2000-PROCESS-RECORDS.
PERFORM UNTIL WS-EOF
READ INPUT-FILE INTO WS-INPUT-RECORD
AT END
SET WS-EOF TO TRUE
END-READ
IF NOT WS-EOF
PERFORM 2100-PROCESS-ONE-RECORD
ADD 1 TO WS-RECORDS-SINCE-COMMIT
IF WS-RECORDS-SINCE-COMMIT >=
WS-COMMIT-FREQUENCY
EXEC SQL COMMIT END-EXEC
MOVE 0 TO WS-RECORDS-SINCE-COMMIT
PERFORM 2500-WRITE-CHECKPOINT
END-IF
END-IF
END-PERFORM.
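Stripped of the SQL, commit-scope management reduces to a counter that resets on each commit. A minimal Python sketch (the `commit` and `checkpoint` callables are stand-ins for EXEC SQL COMMIT and the checkpoint-writing paragraph); note the final commit for the tail of the file, which in the COBOL version belongs in the termination logic:

```python
def process_with_commit_scope(records, commit_frequency, commit, checkpoint):
    """Process records, committing (and checkpointing) every
    commit_frequency records, plus a final commit at end-of-file."""
    since_commit = 0
    for record in records:
        # ... apply the record to the database here ...
        since_commit += 1
        if since_commit >= commit_frequency:
            commit()
            checkpoint()
            since_commit = 0
    if since_commit > 0:          # don't lose the tail of the file
        commit()
        checkpoint()

commits = []
process_with_commit_scope(range(25), 10, lambda: commits.append('C'),
                          lambda: commits.append('K'))
# 25 records at frequency 10: commits after 10, 20, and end-of-file
assert commits.count('C') == 3
```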
Pattern: Error Threshold Management
Production programs should not process indefinitely when errors are accumulating. Set an error threshold:
2100-PROCESS-ONE-RECORD.
PERFORM 2110-VALIDATE-RECORD
IF WS-RECORD-VALID
PERFORM 2120-APPLY-BUSINESS-RULES
PERFORM 2130-WRITE-OUTPUT
ADD 1 TO WS-RECORDS-PROCESSED
ELSE
PERFORM 2140-WRITE-ERROR
ADD 1 TO WS-RECORDS-REJECTED
* Check error threshold
* Multiply before dividing so integer
* truncation cannot hide the error rate
COMPUTE WS-ERROR-PCT =
(WS-RECORDS-REJECTED * 100) /
WS-RECORDS-READ
IF WS-ERROR-PCT > WS-MAX-ERROR-PCT
DISPLAY 'ERROR THRESHOLD EXCEEDED: '
WS-ERROR-PCT '% errors'
DISPLAY 'Maximum allowed: '
WS-MAX-ERROR-PCT '%'
MOVE 12 TO RETURN-CODE
PERFORM 9000-TERMINATION
STOP RUN
END-IF
END-IF.
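Because the counters are integers, ordering matters when computing the percentage: multiply before dividing, or integer truncation can report 0% errors until the job is nearly all errors. A minimal Python sketch of the threshold check (function names invented):

```python
def error_pct(rejected, read):
    """Integer-safe error percentage: multiply before dividing."""
    if read == 0:
        return 0
    return rejected * 100 // read

def threshold_exceeded(rejected, read, max_pct):
    return error_pct(rejected, read) > max_pct

# Naive integer division would report 0 here:
assert (50 // 1000) * 100 == 0
assert error_pct(50, 1000) == 5
assert threshold_exceeded(50, 1000, 4) is True
assert threshold_exceeded(50, 1000, 5) is False
```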
Pattern: Run Control Table
A run control table tracks which batch steps have completed for each business date:
01 WS-RUN-CONTROL-RECORD.
05 RC-BUSINESS-DATE PIC 9(08).
05 RC-STEP-NAME PIC X(08).
05 RC-STATUS PIC X(01).
88 RC-NOT-STARTED VALUE 'N'.
88 RC-IN-PROGRESS VALUE 'P'.
88 RC-COMPLETED VALUE 'C'.
88 RC-FAILED VALUE 'F'.
05 RC-START-TIMESTAMP PIC X(26).
05 RC-END-TIMESTAMP PIC X(26).
05 RC-RECORDS-PROCESSED PIC 9(10).
05 RC-RETURN-CODE PIC 9(04).
05 RC-CONTROL-TOTALS.
10 RC-HASH-TOTAL PIC S9(15) COMP-3.
10 RC-FIN-TOTAL PIC S9(13)V99 COMP-3.
10 RC-RECORD-COUNT PIC 9(10).
This table enables:
- Restart detection: Check if a step already completed for today
- Dependency verification: Confirm prerequisite steps completed
- Progress monitoring: Operations can see which step is running
- Audit: Complete history of all batch runs
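As a hedged sketch, restart detection against such a table is a lookup keyed on business date and step name (the function and data shapes are invented; the statuses mirror the 88-levels above):

```python
# Illustrative run-control lookup; statuses mirror RC-STATUS 88-levels.

def step_action(run_control, business_date, step_name):
    """Decide what to do for a step on a given business date."""
    status = run_control.get((business_date, step_name), 'N')
    return {'N': 'RUN',          # not started: run fresh
            'P': 'RESTART',      # in progress: prior run died mid-step
            'C': 'SKIP',         # completed: do not run twice
            'F': 'RERUN'}[status]   # failed: rerun after the fix

rc = {(20260310, 'GBEXTRACT'): 'C',
      (20260310, 'GBPOST'):    'P'}
assert step_action(rc, 20260310, 'GBEXTRACT') == 'SKIP'
assert step_action(rc, 20260310, 'GBPOST') == 'RESTART'
assert step_action(rc, 20260310, 'GBSTMT') == 'RUN'
```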
38.13 Return Codes and Job Stream Communication
COBOL programs communicate with JCL and the scheduler through return codes. Setting the return code correctly is critical for job stream behavior.
* Set return code based on processing results
9200-SET-RETURN-CODE.
EVALUATE TRUE
WHEN WS-HAS-ERROR AND WS-ERROR-SEVERE
MOVE 16 TO RETURN-CODE
WHEN WS-HAS-ERROR
MOVE 12 TO RETURN-CODE
WHEN WS-HAS-WARNING
MOVE 04 TO RETURN-CODE
WHEN OTHER
MOVE 00 TO RETURN-CODE
END-EVALUATE
DISPLAY 'Program ' WS-PROGRAM-ID
' completed with RC=' RETURN-CODE.
The scheduler uses these return codes to decide which downstream jobs to trigger:
| Return Code | Meaning | Scheduler Action |
|---|---|---|
| 0 | Normal completion | Continue to next job |
| 4 | Warning | Continue, but flag for review |
| 8 | Error | Stop dependent jobs, alert operator |
| 12 | Severe error | Stop job stream, page on-call |
| 16 | Catastrophic | Stop all processing, emergency page |
✅ Best Practice: Always display the return code before exiting. When operations is investigating a failure at 3 AM, this single line in the SYSOUT saves valuable time.
38.14 Performance Considerations
Batch processing performance matters because the batch window is finite. Here are the key techniques:
Buffering
FD LARGE-INPUT-FILE
BLOCK CONTAINS 0 RECORDS
RECORDING MODE IS F.
The BLOCK CONTAINS 0 RECORDS clause tells the system to use the blocksize specified in the JCL or dataset DCB, which is typically optimized for the device. This can reduce I/O by a factor of 10 or more.
Minimize OPEN/CLOSE
Opening and closing files is expensive. If your program processes multiple files, open all of them at the start and close them at the end — do not open and close files repeatedly within your processing loop.
Efficient Key Comparison
In the balanced update pattern, the key comparison happens for every record. Use numeric comparisons for numeric keys (faster than alphanumeric), and avoid unnecessary MOVEs in the inner loop.
SORT Integration
Let the system SORT utility handle sorting — it is highly optimized. Using an internal SORT (SORT verb in COBOL) is acceptable for small files, but for millions of records, the external SORT utility with its own JCL step is faster.
38.15 Putting It All Together
Let us trace through a complete batch processing scenario to see how all the patterns fit together.
Scenario: GlobalBank needs to process 2.3 million daily transactions. The processing must be restartable, auditable, and reconcilable.
The complete program structure:
IDENTIFICATION DIVISION.
PROGRAM-ID. GBPOST.
*
* GlobalBank Transaction Posting Program
* Posts daily transactions to the account master.
* Features: checkpoint/restart, control totals,
* audit trail, error threshold management.
*
PROCEDURE DIVISION.
0000-MAIN-CONTROL.
PERFORM 1000-INITIALIZATION
PERFORM 2000-PROCESS-TRANSACTIONS
UNTIL WS-EOF
PERFORM 8000-VERIFY-CONTROL-TOTALS
PERFORM 8500-WRITE-CONTROL-REPORT
PERFORM 9000-TERMINATION
STOP RUN.
1000-INITIALIZATION.
PERFORM 1100-OPEN-FILES
PERFORM 1200-READ-CHECKPOINT
IF WS-IS-RESTART
PERFORM 1300-POSITION-FOR-RESTART
ELSE
PERFORM 1400-FRESH-START
END-IF
PERFORM 1500-READ-PARMS.
2000-PROCESS-TRANSACTIONS.
READ TXN-FILE INTO WS-TXN-RECORD
AT END
SET WS-EOF TO TRUE
END-READ
IF NOT WS-EOF
ADD 1 TO WS-RECORDS-READ
PERFORM 2100-VALIDATE-TXN
IF WS-TXN-VALID
PERFORM 2200-LOOKUP-ACCOUNT
IF WS-ACCOUNT-FOUND
PERFORM 2300-SAVE-BEFORE-IMAGE
PERFORM 2400-APPLY-TXN
PERFORM 2500-REWRITE-ACCOUNT
PERFORM 2600-WRITE-AUDIT
ADD 1 TO WS-RECORDS-POSTED
ELSE
PERFORM 2700-WRITE-UNMATCHED
ADD 1 TO WS-RECORDS-REJECTED
END-IF
ELSE
PERFORM 2800-WRITE-INVALID
ADD 1 TO WS-RECORDS-REJECTED
END-IF
* Accumulate control totals
ADD TXN-AMOUNT TO WS-FINANCIAL-TOTAL
ADD TXN-ACCT-NO TO WS-HASH-TOTAL
* Checkpoint at interval
IF FUNCTION MOD(WS-RECORDS-READ
WS-CHECKPOINT-INTERVAL) = 0
PERFORM 2900-WRITE-CHECKPOINT
END-IF
* Check error threshold
PERFORM 2950-CHECK-ERROR-THRESHOLD
END-IF.
This program demonstrates every pattern from this chapter:
- Checkpoint/restart (paragraphs 1200, 1300, 2900)
- Control totals (paragraph 8000)
- Audit trail (paragraph 2600)
- Error threshold (paragraph 2950)
- Run control (paragraph 1000)
- Return code management (paragraph 9000)
38.16 GDG Management in Production
Generation Data Groups require careful management in production environments. The patterns described in section 38.8 cover the basics, but real-world GDG management involves several additional considerations that Maria Chen and her team handle routinely at GlobalBank.
GDG Model Copies and DCB Attributes
When you create a new generation, the system can inherit DCB attributes (record format, record length, block size) from a model dataset. This ensures consistency across generations:
//* Define a model dataset for the GDG
//DEFMODEL EXEC PGM=IEFBR14
//GDGMODEL DD DSN=GBANK.TXN.ARCHIVE.MODEL,
// DISP=(NEW,CATLG),
// SPACE=(TRK,(0)),
// DCB=(RECFM=FB,LRECL=200,BLKSIZE=27800)
//*
//* New generations inherit DCB from model
//ARCOUT DD DSN=GBANK.TXN.ARCHIVE(+1),
// DISP=(NEW,CATLG,DELETE),
// SPACE=(CYL,(200,100),RLSE),
// DCB=GBANK.TXN.ARCHIVE.MODEL
Without a model dataset, each generation must explicitly specify its DCB attributes in the JCL. If a developer forgets or uses inconsistent values, downstream programs that read the GDG may fail or produce incorrect results.
GDG Recoverability
When a batch job that creates a GDG generation fails, the partially written generation may or may not survive. DISP=(NEW,CATLG,DELETE) tells the system to delete the dataset if the step abends. But if the step instead completes with a high return code, the bad generation is cataloged anyway. Maria's standard practice is explicit cleanup:
//* Step to clean up failed GDG generation
// IF (ARCHIVE.RC > 4) THEN
//CLEANUP EXEC PGM=IEFBR14
//BADGEN DD DSN=GBANK.TXN.ARCHIVE(+1),
// DISP=(OLD,DELETE,DELETE)
// ENDIF
📊 GDG Housekeeping Checklist
- Verify the GDG limit is set appropriately (too low = premature rolloff, too high = wasted DASD)
- Monitor DASD consumption — 365 generations of a large file consume significant space
- Periodically verify that the GDG base and all active generations are consistent in the catalog
- Use IDCAMS LISTCAT to audit GDG contents during monthly maintenance windows
- Document which GDGs have downstream dependencies (regulatory archives, audit trails)
GDG and Tape Management
For long-term archival, older GDG generations are often migrated to tape. Tape management systems like CA-1 (TMS) or DFSMSrmm coordinate with GDG rolloff:
GDG Generation Policy Example:
Generations 0 to -30: DASD (immediate access)
Generations -31 to -365: Migrated to tape (recall delay ~30 seconds)
Generations older than -365: Expired and scratched
This tiered storage approach balances cost against access speed. Maria's team can access yesterday's archive instantly but must wait 30 seconds for last month's archive to be recalled from tape.
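The policy can be expressed as a simple mapping from relative generation number to storage tier (thresholds taken from the policy above; the function name is invented):

```python
def storage_tier(relative_gen):
    """relative_gen: 0 for the current generation, -1 for previous, etc."""
    if relative_gen > 0:
        raise ValueError('relative generation must be <= 0')
    if relative_gen >= -30:
        return 'DASD'       # immediate access
    if relative_gen >= -365:
        return 'TAPE'       # ~30 second recall
    return 'EXPIRED'        # rolled off and scratched

assert storage_tier(0) == 'DASD'
assert storage_tier(-30) == 'DASD'
assert storage_tier(-31) == 'TAPE'
assert storage_tier(-365) == 'TAPE'
assert storage_tier(-366) == 'EXPIRED'
```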
⚠️ Tape Recall Trap: If a production job references a GDG generation that has been migrated to tape, the job will wait for the tape mount. At 2 AM, there may be no operator available to mount the tape. Always verify that your job's GDG references point to DASD-resident generations, or ensure automated tape library support is available.
38.17 Production Scheduling: TWS and CA-7
Enterprise batch scheduling is far more sophisticated than simply running jobs in sequence. Scheduling tools manage dependencies, calendars, alerts, and automatic recovery.
Tivoli Workload Scheduler (TWS / IBM Workload Automation)
TWS (now IBM Workload Automation) is the scheduling system used at GlobalBank and many other mainframe shops. It manages:
- Job dependencies: Job B waits for Job A to complete with RC ≤ 4
- Resource dependencies: Only one job at a time can access the account master
- Calendar dependencies: Month-end jobs run only on the last business day
- Time dependencies: Statement generation cannot start before 4:00 AM
- Special resources: Printer availability, external feed arrival
Defining a Job Stream in TWS
TWS uses an application description to define job streams:
APPLICATION: GBNIGHTLY
OWNER: BATCHOPS
CALENDAR: GBANK-WEEKDAY
RUN DAYS: MON TUE WED THU FRI
EXCLUDED: GBANK-HOLIDAYS
OPERATION: GBEXTRACT
JOB: GBANK.PROD.JCL(GBEXTRACT)
PREDECESSORS: NONE (triggered by time 23:00)
RECOVERY: RERUN
OPERATION: GBSORT01
JOB: GBANK.PROD.JCL(GBSORT01)
PREDECESSORS: GBEXTRACT (RC ≤ 4)
RECOVERY: RERUN
OPERATION: GBVALID
JOB: GBANK.PROD.JCL(GBVALID)
PREDECESSORS: GBSORT01 (RC ≤ 4)
RECOVERY: RERUN
OPERATION: GBPOST
JOB: GBANK.PROD.JCL(GBPOST)
PREDECESSORS: GBVALID (RC ≤ 4)
SPECIAL RESOURCE: ACCT-MASTER (EXCLUSIVE)
RECOVERY: STOP (manual intervention required)
ALERT: PAGE ONCALL-DBA IF RC > 4
💡 Key Insight: Notice that GBPOST specifies RECOVERY: STOP instead of RERUN. This is because GBPOST modifies the account master file — rerunning it blindly could double-post transactions. The on-call DBA must verify the checkpoint status before restarting. This is a deliberate design decision that reflects the risk profile of the step.
CA-7 (Broadcom Workload Automation)
CA-7 is the other major mainframe scheduling system. It uses a different vocabulary but provides similar capabilities. Where TWS uses "applications" and "operations," CA-7 uses "job networks" and "schedule IDs":
SCHID: 001 JOB: GBEXTRACT TRIGTYPE: TIME(2300)
SCHID: 002 JOB: GBSORT01 TRIGTYPE: JOB(GBEXTRACT) MAXRC: 4
SCHID: 003 JOB: GBVALID TRIGTYPE: JOB(GBSORT01) MAXRC: 4
SCHID: 004 JOB: GBPOST TRIGTYPE: JOB(GBVALID) MAXRC: 4
Calendar Management
Production batch schedules must account for:
- Business days vs. calendar days: Month-end processing runs on the last business day, not December 31
- Holidays: No processing on bank holidays (but some feeds still arrive)
- Month-end / quarter-end / year-end: Additional jobs for period-close processing
- Daylight saving time: The batch window shifts by one hour twice a year
CALENDAR: GBANK-WEEKDAY
EXCLUDE: SATURDAY, SUNDAY
EXCLUDE: GBANK-HOLIDAYS (separate holiday calendar)
SPECIAL: LAST-BUSINESS-DAY (triggers month-end jobs)
SPECIAL: QUARTER-END (triggers regulatory reporting)
Maria Chen maintains GlobalBank's holiday calendar a year in advance. "Missing a holiday in the calendar means the batch tries to run when the Federal Reserve wire system is closed," she explains. "That is a very bad day."
Monitoring and Alerting
Production scheduling systems provide real-time monitoring dashboards:
GBNIGHTLY Status — 2026-03-11 02:47:33
═══════════════════════════════════════════
GBEXTRACT ▓▓▓▓▓▓▓▓▓▓ COMPLETE RC=0 23:15
GBSORT01 ▓▓▓▓▓▓▓▓▓▓ COMPLETE RC=0 23:28
GBVALID ▓▓▓▓▓▓▓▓▓▓ COMPLETE RC=0 00:05
GBPOST ▓▓▓▓▓▓▓▓░░ RUNNING --- 00:22
GBINTCALC ░░░░░░░░░░ WAITING --- ---
GBFEECALC ░░░░░░░░░░ WAITING --- ---
GBGLREC ░░░░░░░░░░ WAITING --- ---
GBSTMT ░░░░░░░░░░ WAITING --- ---
GBARCHIVE ░░░░░░░░░░ WAITING --- ---
Estimated completion: 05:12
Batch window closes: 06:00
Margin: 48 minutes
The operations team watches this dashboard throughout the night. If the estimated completion time approaches the batch window close time, they can take proactive measures — canceling low-priority jobs, increasing system priority for critical jobs, or extending the window if possible.
✅ Best Practice: Build slack into the batch window. If your batch cycle typically completes in 5 hours, the 7-hour window gives you 2 hours of margin. That margin is consumed by retries, restart/recovery, and the inevitable nights when volumes spike or a disk gets slow.
38.18 Restart/Recovery: A Complete Worked Example
Let us walk through a restart scenario step by step to see exactly how all the pieces fit together. This is the scenario that keeps COBOL developers employed at 3 AM.
The Scenario
It is Tuesday night at GlobalBank. The GBPOST job (transaction posting) starts at 12:22 AM with 2,347,891 transactions to process. The checkpoint interval is 10,000 records. At 1:47 AM, a few thousand records past checkpoint 123 (written at the 1,230,000-record mark), a DB2 tablespace runs out of space. The job abends with an S04E system abend code.
Step 1: Diagnosis (1:52 AM)
The operations team pages Maria. She checks the job output:
GBPOST - TRANSACTION POSTING
STARTED: 2026-03-11 00:22:15
CHECKPOINT 123 WRITTEN AT 01:43:22
RECORDS READ: 1,230,000
RECORDS POSTED: 1,221,847
RECORDS REJECTED: 8,153
HASH TOTAL: 287,441,338,291
FINANCIAL TOTAL: $412,847,291.33
LAST KEY PROCESSED: 4472819003
*** ABEND S04E AT 01:47:33 ***
RECORDS READ SINCE CHECKPOINT: 4,217
THESE RECORDS MAY NEED REPROCESSING
Step 2: Fix the Root Cause (2:05 AM)
Maria contacts the DBA on call. The DB2 tablespace for the audit trail table is full. The DBA extends the tablespace:
ALTER TABLESPACE GBANK.AUDITTS
ADD VOLUME(VOL003)
BUFFERPOOL BP32K;
Step 3: Verify Checkpoint Integrity (2:15 AM)
Maria reads the checkpoint file to verify it is consistent:
//CHKREAD EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN DD *
PRINT INFILE(CHKPTFL) CHARACTER
/*
//CHKPTFL DD DSN=GBANK.GBPOST.CHECKPOINT,DISP=SHR
The checkpoint shows:
- Program ID: GBPOST
- Status: P (in progress — confirming the job did not complete)
- Records read: 1,230,000
- Last key: 4472819003
- All control totals present
Step 4: Restart the Job (2:22 AM)
Maria restarts GBPOST. The scheduler submits the same JCL. The COBOL program:
- Opens the checkpoint file
- Reads the checkpoint record — finds status "P" (in progress)
- Sets restart flag to "Y"
- Restores all counters from the checkpoint
- Reads and skips the first 1,230,000 input records (about 90 seconds of sequential reading)
- Resumes processing at record 1,230,001
GBPOST - TRANSACTION POSTING (RESTART)
STARTED: 2026-03-11 02:22:45
RESTART DETECTED - CHECKPOINT 123
RESTORING COUNTERS FROM CHECKPOINT
RECORDS TO SKIP: 1,230,000
SKIPPING... COMPLETE (92 seconds)
RESUMING FROM RECORD 1,230,001
LAST KEY FROM CHECKPOINT: 4472819003
PROCESSING RESUMED AT 02:24:17
Step 5: Completion and Verification (3:48 AM)
The restarted job processes the remaining 1,117,891 transactions and completes:
GBPOST - COMPLETED (RESTART RUN)
TOTAL RECORDS READ: 2,347,891
TOTAL RECORDS POSTED: 2,331,204
TOTAL RECORDS REJECTED: 16,687
HASH TOTAL: 483,291,847,223
FINANCIAL TOTAL: $ 847,291,433.27
RETURN CODE: 0
CHECKPOINT FINAL STATUS: C (COMPLETED)
RESTART: YES (FROM CHECKPOINT 123)
ELAPSED TIME: 01:25:33 (restart portion)
Step 6: Continue the Batch Window (3:48 AM)
The scheduler sees GBPOST completed with RC=0 and triggers the downstream jobs: GBINTCALC, GBFEECALC, and GBGLREC. The batch window completes at 5:38 AM — 22 minutes behind the normal schedule but safely within the 6:00 AM deadline.
Lessons from This Restart
- The checkpoint interval of 10,000 records meant a maximum of 9,999 records needed reprocessing — in this case, the 4,217 records read between checkpoint 123 and the abend. But because those records had already been applied to the master file (VSAM REWRITE), the restart logic must handle idempotency: the COBOL program checks whether a transaction has already been posted (by comparing the transaction timestamp against the account's last-update timestamp) before applying it.
- The skip phase (90 seconds) was the longest part of the restart initialization. For very large files, consider using the checkpoint's record count to position the file with START (for indexed files) or a relative record number (for relative files). Sequential files require reading and discarding records, which is slower.
- The DB2 space issue was the root cause, not the COBOL program. Production failures are more often infrastructure issues (disk, memory, network) than logic bugs. Good restart/recovery handles both.
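The timestamp-based idempotency test from the first lesson can be sketched in a few lines (the timestamp format is illustrative; fixed-width timestamps compare correctly as strings):

```python
# Skip transactions already applied before the abend: a transaction is
# reapplied only if its timestamp is newer than the account's last update.

def should_apply(txn_timestamp, acct_last_update):
    """True if this transaction has not already been posted."""
    return txn_timestamp > acct_last_update

# Records read after checkpoint 123 may have been REWRITEd before the
# abend; those must not be double-posted on restart.
assert should_apply('2026-03-11-01.45.02', '2026-03-11-01.44.59') is True
assert should_apply('2026-03-11-01.45.02', '2026-03-11-01.45.02') is False
```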
🧪 Try It Yourself: Using the STUDCHKR program from section 38.9, deliberately interrupt the program (Ctrl+C or kill the process) during processing. Then restart it and verify that it resumes correctly from the last checkpoint. Compare the final control totals with an uninterrupted run to confirm they match.
38.19 Multi-Step Job Dependencies: Advanced Patterns
Real production batch environments have dependency patterns far more complex than simple linear chains. Understanding these patterns is essential for designing restartable, efficient batch architectures.
The Diamond Dependency
A diamond dependency occurs when two parallel streams must both complete before a downstream job can start:
GBEXTRACT
│
┌───┴───┐
▼ ▼
GBSORT GBXREF
│ │
└───┬───┘
▼
GBMERGE
GBMERGE requires both GBSORT and GBXREF to complete successfully. If GBSORT finishes but GBXREF fails, GBMERGE must wait. If both fail, each can be fixed and restarted independently, but GBMERGE will not start until both have completed cleanly.
In TWS, this is expressed as:
OPERATION: GBMERGE
PREDECESSORS: GBSORT (RC ≤ 4) AND GBXREF (RC ≤ 4)
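The readiness rule behind this predicate is simply "every predecessor complete with an acceptable return code". A minimal sketch (data shapes invented):

```python
def ready(job, predecessors, results, max_rc=4):
    """A job is ready when every predecessor has completed with
    RC <= max_rc. results maps job name -> return code, or None
    if the job is still running; absent jobs have not started."""
    return all(results.get(p) is not None and results[p] <= max_rc
               for p in predecessors[job])

preds = {'GBMERGE': ['GBSORT', 'GBXREF']}
assert ready('GBMERGE', preds, {'GBSORT': 0, 'GBXREF': 4}) is True
assert ready('GBMERGE', preds, {'GBSORT': 0}) is False           # GBXREF running
assert ready('GBMERGE', preds, {'GBSORT': 0, 'GBXREF': 8}) is False  # failed
```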
The Conditional Branch
Some jobs should only run when specific conditions are met:
GBPOST
│
┌─────┼─────┐
▼ ▼ ▼
GBFEES GBINT GBALERT
(always) (always) (only if RC=4 from GBPOST)
GBALERT is a notification job that runs only when GBPOST reports warnings (RC=4, indicating rejected transactions above a threshold). In normal operation (RC=0), it is skipped.
The Resource Fence
Some jobs cannot run simultaneously because they compete for a shared resource:
GBPOST: requires ACCT-MASTER (exclusive)
GBINTCALC: requires ACCT-MASTER (exclusive)
GBFEECALC: requires ACCT-MASTER (exclusive)
These three jobs CANNOT run in parallel even though
their logical dependencies would allow it.
TWS manages this with special resources:
OPERATION: GBPOST
SPECIAL RESOURCE: ACCT-MASTER (EXCLUSIVE)
OPERATION: GBINTCALC
SPECIAL RESOURCE: ACCT-MASTER (EXCLUSIVE)
PREDECESSORS: GBPOST (RC ≤ 4)
OPERATION: GBFEECALC
SPECIAL RESOURCE: ACCT-MASTER (EXCLUSIVE)
PREDECESSORS: GBINTCALC (RC ≤ 4)
The External Trigger
Some batch jobs depend on external events — a file arriving from another organization, a time window opening, or a manual approval:
OPERATION: GBCARDFEED
TRIGGER: EXTERNAL FILE ARRIVAL
DATASET: VISA.DAILY.FEED.D*
TIMEOUT: 01:30 (if file not received by 1:30 AM, alert)
At GlobalBank, the Visa card transaction feed arrives between 11:30 PM and 12:30 AM. If it has not arrived by 1:30 AM, the scheduler alerts the operations team, who contacts Visa's operations center. Maria Chen has been through this scenario dozens of times: "The feed is late about once a month. Usually it is a network issue on their end. We have a 90-minute buffer in the schedule for exactly this reason."
⚖️ Theme — Defensive Programming: Every dependency pattern represents a potential failure mode. Diamond dependencies can deadlock if one branch fails. Conditional branches can skip critical jobs if the condition is wrong. Resource fences can create bottlenecks. External triggers can block the entire batch window. Designing for these failure modes — not just the happy path — is what separates a reliable batch architecture from a fragile one.
38.20 Chapter Summary
Batch processing is the backbone of enterprise computing, and the patterns in this chapter are what make it reliable. Checkpoint/restart ensures recovery from failures. Control totals catch errors that logic cannot detect. Audit trails satisfy regulators and protect the organization. The balanced update pattern processes sorted files efficiently. GDGs manage versioned datasets automatically. And JCL ties it all together into orchestrated job streams.
These patterns are not glamorous. They do not make for exciting conference talks or viral blog posts. But they are the reason that 2.3 million transactions get posted correctly every single night at GlobalBank, that 500,000 claims get adjudicated every month at MedClaim, and that the financial infrastructure of the world keeps running while the rest of us sleep.
Derek Washington summed it up well after his first night observing the batch window: "It's like watching a symphony where every instrument plays exactly on cue, and nobody in the audience even knows it's happening."
Maria's response: "That's the point."
🔗 Looking Ahead: In Chapter 39, we will explore how COBOL systems integrate with real-time systems — message queues, web services, and APIs. The batch world and the online world are converging, and modern COBOL developers need to be comfortable in both.