In This Chapter
- 38.1 Batch Architecture: The Big Picture
- 38.2 Restart and Recovery
- 38.3 Control Totals
- 38.4 Audit Trails
- 38.5 The Balanced Update Pattern
- 38.6 Multi-Step Job Streams
- 38.7 JCL for Complex Batch
- 38.8 Generation Data Groups (GDG)
- 38.9 Try It Yourself: Building a Checkpoint/Restart Program
- 38.10 GlobalBank Case Study: The Nightly Batch Cycle
- 38.11 MedClaim Case Study: Claims Batch Processing Pipeline
- 38.12 Advanced Patterns
- 38.13 Return Codes and Job Stream Communication
- 38.14 Performance Considerations
- 38.15 Putting It All Together
- 38.16 GDG Management in Production
- 38.17 Production Scheduling: TWS and CA-7
- 38.18 Restart/Recovery: A Complete Worked Example
- 38.19 Multi-Step Job Dependencies: Advanced Patterns
- 38.20 Chapter Summary
Chapter 38: Batch Processing Patterns
"Batch processing is the heartbeat of the mainframe. Online systems get the glory, but batch jobs do the heavy lifting — every single night, without fail, while the rest of the world sleeps." — Maria Chen, reviewing GlobalBank's nightly batch schedule
You have written batch programs before. You have read records from a file, processed them, and written output. But the batch programs running in production at banks, insurance companies, and government agencies are a different animal entirely. They run in coordinated sequences. They restart cleanly after failures. They track every dollar, every record, every change. They produce audit trails that satisfy regulators. And they do all of this across millions of records every night within a window that keeps shrinking as transaction volumes grow.
This chapter teaches you the patterns that make enterprise batch processing reliable, auditable, and recoverable. These are not theoretical abstractions — they are the techniques that Maria Chen uses every night at GlobalBank and that James Okafor relies on at MedClaim. When a job fails at 2:47 AM and the operations team pages you, these patterns are what stand between a smooth restart and a very long night.
38.1 Batch Architecture: The Big Picture
Before we dive into code, you need to understand how enterprise batch processing is organized. A production batch environment is not a single program — it is an orchestrated system of interconnected jobs that must execute in a specific order, within a specific window, producing specific outputs.
The Batch Window
The batch window is the time period — typically overnight — when batch jobs run. At GlobalBank, the batch window opens at 11:00 PM when the last online CICS region quiesces, and it must close by 6:00 AM when online processing resumes. That gives Maria's team seven hours to process the entire day's work.
📊 GlobalBank Nightly Batch Window
- 11:00 PM — Online regions quiesce, batch window opens
- 11:15 PM — Extract jobs pull daily transactions
- 11:45 PM — Validation and enrichment jobs
- 12:30 AM — Core processing: interest calculation, fee assessment
- 2:00 AM — Account updates: post transactions to master files
- 3:00 AM — General ledger reconciliation
- 4:00 AM — Statement generation, regulatory reporting
- 5:00 AM — Data distribution: feeds to downstream systems
- 5:30 AM — Housekeeping: archive, backup, catalog maintenance
- 6:00 AM — Batch window closes, online regions restart
If any job runs long or fails, every subsequent job is affected. This is why restart/recovery is not optional — it is essential.
Job Streams and Dependencies
A job stream is a sequence of batch jobs that must execute in order. Dependencies can be:
- Sequential: Job B cannot start until Job A completes successfully
- Parallel: Jobs C and D can run simultaneously after Job B
- Conditional: Job E runs only if Job C produced output
- Resource-based: Job F requires exclusive access to a dataset
💡 Key Insight: Modern mainframe scheduling tools like CA-7, TWS (Tivoli Workload Scheduler), and Control-M manage these dependencies automatically. But as a COBOL developer, you must understand the dependencies because your program's design — especially its restart behavior — must align with the job stream architecture.
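The four dependency types reduce to a directed graph that the scheduler walks in topological order. As a language-neutral sketch (Python; the job names echo GlobalBank's but the dependency map is invented for illustration), here is how a scheduler could derive a valid execution order:

```python
from graphlib import TopologicalSorter

# Hypothetical dependencies: each job maps to the jobs it must wait for.
deps = {
    "GBSORT01": {"GBEXTRACT"},            # sequential: B after A
    "GBVALID":  {"GBSORT01"},
    "GBENRICH": {"GBSORT01"},             # may run parallel with GBVALID
    "GBPOST":   {"GBVALID", "GBENRICH"},  # waits on both branches
}

order = list(TopologicalSorter(deps).static_order())
print(order)  # GBEXTRACT first, GBPOST last
```

Real schedulers like TWS and Control-M layer conditions, calendars, and resources on top of this, but the core ordering problem is exactly this graph walk.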
The Typical Batch Program Structure
Most enterprise batch COBOL programs follow a common structure:
INITIALIZATION
- Open files
- Load reference tables
- Initialize counters and accumulators
- Read checkpoint file (for restart)
- Position to restart point (if restarting)
MAIN PROCESSING LOOP
- Read input record
- Validate record
- Apply business rules
- Write output
- Increment counters
- Write checkpoint at intervals
TERMINATION
- Write final control totals
- Write audit trail records
- Close all files
- Write completion status
- Return appropriate return code
Every pattern in this chapter fits into this structure. Let us examine each one.
38.2 Restart and Recovery
Restart/recovery is the most important pattern in enterprise batch processing. A program that processes 10 million records must be able to resume from a failure point without reprocessing records that were already handled successfully.
Why Restart Matters
Consider GlobalBank's nightly transaction posting job. It reads 2.3 million transactions and posts them against the account master file. If the job fails at transaction 1,500,000 due to a disk error, you have two choices:
1. Restart from the beginning: Reprocess 1.5 million transactions. But those transactions already updated the master file. You would double-post them, corrupting every affected account.
2. Restart from the failure point: Resume at transaction 1,500,001. This requires knowing exactly where you left off and ensuring the last batch of updates was either fully committed or fully backed out.
Option 2 is the only viable approach. This is what checkpoint/restart provides.
Checkpoint Records
A checkpoint is a snapshot of your program's state at a specific point in processing. It records:
- The number of records processed so far
- Current control total values
- The key of the last record successfully processed
- Any in-flight accumulations
- A timestamp
You write checkpoints at regular intervals — every 5,000 records, every 10,000 records, or at natural boundaries in your data (such as each time the account number changes).
01 WS-CHECKPOINT-RECORD.
05 CHKPT-PROGRAM-ID PIC X(08).
05 CHKPT-RUN-DATE PIC 9(08).
05 CHKPT-RUN-TIME PIC 9(06).
05 CHKPT-TIMESTAMP PIC X(26).
05 CHKPT-RECORDS-READ PIC 9(10).
05 CHKPT-RECORDS-WRITTEN PIC 9(10).
05 CHKPT-RECORDS-REJECTED PIC 9(10).
05 CHKPT-LAST-KEY PIC X(20).
05 CHKPT-HASH-TOTAL PIC S9(15) COMP-3.
05 CHKPT-FINANCIAL-TOTAL PIC S9(13)V99 COMP-3.
05 CHKPT-STATUS PIC X(01).
88 CHKPT-IN-PROGRESS VALUE 'P'.
88 CHKPT-COMPLETED VALUE 'C'.
88 CHKPT-ABENDED VALUE 'A'.
05 CHKPT-INTERVAL-COUNT PIC 9(05).
05 CHKPT-FILLER PIC X(50).
The Checkpoint/Restart Pattern
Here is the core logic for checkpoint/restart:
PROCEDURE DIVISION.
0000-MAIN-CONTROL.
PERFORM 1000-INITIALIZATION
PERFORM 2000-PROCESS-RECORDS
UNTIL WS-EOF-FLAG = 'Y'
PERFORM 9000-TERMINATION
STOP RUN.
1000-INITIALIZATION.
OPEN INPUT TXN-FILE
OPEN I-O ACCT-MASTER
OPEN I-O CHECKPOINT-FILE
* -----------------------------------------------
* Check for restart: read checkpoint file
* -----------------------------------------------
READ CHECKPOINT-FILE INTO WS-CHECKPOINT-RECORD
AT END
MOVE 'N' TO WS-RESTART-FLAG
NOT AT END
IF CHKPT-IN-PROGRESS
MOVE 'Y' TO WS-RESTART-FLAG
PERFORM 1500-POSITION-FOR-RESTART
ELSE
MOVE 'N' TO WS-RESTART-FLAG
END-IF
END-READ
IF WS-RESTART-FLAG = 'N'
INITIALIZE WS-CHECKPOINT-RECORD
MOVE 'TXNPOST' TO CHKPT-PROGRAM-ID
MOVE WS-RUN-DATE TO CHKPT-RUN-DATE
SET CHKPT-IN-PROGRESS TO TRUE
END-IF.
1500-POSITION-FOR-RESTART.
* -----------------------------------------------
* Skip records already processed
* Restore counters from checkpoint
* -----------------------------------------------
MOVE CHKPT-RECORDS-READ TO WS-RECORDS-READ
MOVE CHKPT-RECORDS-WRITTEN TO WS-RECORDS-WRITTEN
MOVE CHKPT-RECORDS-REJECTED TO WS-RECORDS-REJECTED
MOVE CHKPT-HASH-TOTAL TO WS-HASH-TOTAL
MOVE CHKPT-FINANCIAL-TOTAL TO WS-FINANCIAL-TOTAL
PERFORM VARYING WS-SKIP-COUNT
FROM 1 BY 1
UNTIL WS-SKIP-COUNT > CHKPT-RECORDS-READ
OR WS-EOF-FLAG = 'Y'
READ TXN-FILE INTO WS-TXN-RECORD
AT END
MOVE 'Y' TO WS-EOF-FLAG
END-READ
END-PERFORM
DISPLAY 'RESTART: Skipped ' CHKPT-RECORDS-READ
' records, resuming with the next record'.
⚠️ Critical Point: When restarting, you must skip exactly the right number of input records AND ensure that any partially updated master file records are handled correctly. This is why many shops use VSAM with backout capabilities or DB2 with commit/rollback — they provide transactional integrity that sequential files cannot.
Checkpoint Interval
How often should you write checkpoints? The answer involves a trade-off:
- More frequent checkpoints = less reprocessing after failure, but more I/O overhead
- Less frequent checkpoints = better performance, but more reprocessing after failure
A common guideline is to checkpoint every 5,000 to 10,000 records. At GlobalBank, Maria's team uses a configurable checkpoint interval stored in a parameter file:
01 WS-PARM-RECORD.
05 PARM-CHECKPOINT-INTERVAL PIC 9(05) VALUE 5000.
05 PARM-COMMIT-FREQUENCY PIC 9(05) VALUE 1000.
05 PARM-MAX-ERRORS PIC 9(05) VALUE 100.
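The trade-off can be quantified. With a checkpoint every K records, a single mid-run failure costs on average K/2 records of reprocessing, while a run of N records pays N/K checkpoint writes. A back-of-the-envelope sketch (Python; all timing figures are illustrative, not measured):

```python
def checkpoint_cost(n_records, interval, chkpt_write_ms, record_ms):
    """Expected overhead in ms: checkpoint I/O for the whole run,
    plus average reprocessing after one mid-run failure."""
    checkpoint_io = (n_records // interval) * chkpt_write_ms
    rework = (interval / 2) * record_ms   # average records redone
    return checkpoint_io + rework

# 2.3M records, 5 ms per checkpoint write, 0.2 ms per record
for k in (1_000, 5_000, 50_000):
    print(k, round(checkpoint_cost(2_300_000, k, 5.0, 0.2)))
```

Under these assumed numbers the middle interval wins, which is consistent with the 5,000-to-10,000 guideline; with different I/O costs the sweet spot moves, which is exactly why Maria's team keeps the interval in a parameter file rather than hard-coding it.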
Writing the Checkpoint
2500-WRITE-CHECKPOINT.
MOVE WS-RECORDS-READ TO CHKPT-RECORDS-READ
MOVE WS-RECORDS-WRITTEN TO CHKPT-RECORDS-WRITTEN
MOVE WS-RECORDS-REJECTED TO CHKPT-RECORDS-REJECTED
MOVE WS-LAST-KEY-PROCESSED
TO CHKPT-LAST-KEY
MOVE WS-HASH-TOTAL TO CHKPT-HASH-TOTAL
MOVE WS-FINANCIAL-TOTAL TO CHKPT-FINANCIAL-TOTAL
ADD 1 TO CHKPT-INTERVAL-COUNT
MOVE FUNCTION CURRENT-DATE TO CHKPT-TIMESTAMP
SET CHKPT-IN-PROGRESS TO TRUE
* Rewrite the single checkpoint record in place.
* (On a QSAM file every REWRITE must follow a
* READ; many shops use a one-record VSAM file
* for the checkpoint instead.)
REWRITE CHECKPOINT-RECORD FROM WS-CHECKPOINT-RECORD
IF CHKPT-FILE-STATUS NOT = '00'
DISPLAY 'FATAL: Checkpoint write failed, '
'status=' CHKPT-FILE-STATUS
MOVE 16 TO RETURN-CODE
STOP RUN
END-IF.
💡 Key Insight: Never ignore a checkpoint write failure. If you cannot write a checkpoint, you cannot guarantee restart capability. Abort immediately with a high return code so the scheduler knows the job failed.
38.3 Control Totals
Control totals are the immune system of batch processing. They catch errors that logic alone cannot detect — missing records, duplicate processing, data corruption, and rounding drift.
Types of Control Totals
There are three categories:
1. Record Counts
Simple counts of records read, written, processed, rejected, and skipped. These are the most basic control totals.
01 WS-RECORD-COUNTS.
05 WS-RECORDS-READ PIC 9(10) VALUE 0.
05 WS-RECORDS-WRITTEN PIC 9(10) VALUE 0.
05 WS-RECORDS-UPDATED PIC 9(10) VALUE 0.
05 WS-RECORDS-REJECTED PIC 9(10) VALUE 0.
05 WS-RECORDS-SKIPPED PIC 9(10) VALUE 0.
The fundamental control total equation is:
Records Read = Records Written + Records Rejected + Records Skipped
If this equation does not balance at the end of the run, something went wrong.
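The equation is trivial to encode, which is the point: a sketch of a verification check in Python (the figures come from the GlobalBank control-total report shown later in this section):

```python
def counts_balance(read, written, rejected, skipped):
    """The fundamental control-total equation:
    records read must equal written + rejected + skipped."""
    return read == written + rejected + skipped

# Balanced run:
print(counts_balance(2_347_891, 2_340_156, 7_412, 323))   # True
# Lose a single record anywhere and the equation fails:
print(counts_balance(2_347_891, 2_340_156, 7_412, 322))   # False
```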
2. Hash Totals
A hash total is a sum of a non-financial field — typically account numbers or record keys. The sum itself is meaningless, but if two systems independently compute the same hash total over the same set of records, you know they processed the same records.
01 WS-HASH-TOTALS.
05 WS-HASH-ACCT-NO PIC S9(15) COMP-3 VALUE 0.
05 WS-HASH-CLAIM-NO PIC S9(15) COMP-3 VALUE 0.
At MedClaim, James Okafor uses hash totals to verify that every claim that enters the intake process exits the adjudication process:
* Accumulate hash total of claim numbers
ADD CLM-CLAIM-NUMBER TO WS-HASH-CLAIM-NO
At the end of processing, the claim intake hash total must match the sum of adjudicated hash totals plus rejected hash totals.
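Because the hash is just a sum, partitioning the claims into adjudicated and rejected sets must conserve it. A sketch of James's check in Python (the claim numbers are invented):

```python
intake      = [90210014, 90210015, 90210016, 90210017]  # hypothetical claims
adjudicated = [90210014, 90210016, 90210017]
rejected    = [90210015]

intake_hash = sum(intake)
# Intake hash must equal adjudicated hash plus rejected hash:
print(intake_hash == sum(adjudicated) + sum(rejected))      # True
# Drop one claim anywhere in the pipeline and the totals disagree:
print(intake_hash == sum(adjudicated[1:]) + sum(rejected))  # False
```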
3. Financial Totals
Financial totals track monetary amounts. They must balance to the penny.
01 WS-FINANCIAL-TOTALS.
05 WS-TOTAL-DEBITS PIC S9(13)V99 COMP-3 VALUE 0.
05 WS-TOTAL-CREDITS PIC S9(13)V99 COMP-3 VALUE 0.
05 WS-NET-AMOUNT PIC S9(13)V99 COMP-3 VALUE 0.
05 WS-TOTAL-INTEREST PIC S9(11)V99 COMP-3 VALUE 0.
05 WS-TOTAL-FEES PIC S9(11)V99 COMP-3 VALUE 0.
📊 Control Total Report — GlobalBank Nightly Posting
GLOBALBANK TRANSACTION POSTING - CONTROL TOTAL REPORT
RUN DATE: 2026-03-10 RUN TIME: 01:47:33
===================================================
RECORD COUNTS:
TRANSACTIONS READ: 2,347,891
TRANSACTIONS POSTED: 2,340,156
TRANSACTIONS REJECTED: 7,412
TRANSACTIONS SKIPPED: 323
COUNT VERIFICATION: BALANCED ✓
HASH TOTALS:
INPUT ACCT HASH: 483,291,847,223
OUTPUT ACCT HASH: 483,291,847,223
HASH VERIFICATION: BALANCED ✓
FINANCIAL TOTALS:
TOTAL DEBITS: $ 847,291,433.27
TOTAL CREDITS: $ 851,003,892.41
NET MOVEMENT: $ -3,712,459.14
GL RECONCILIATION: BALANCED ✓
Implementing Control Total Verification
9100-VERIFY-CONTROL-TOTALS.
* -----------------------------------------------
* Verify record count balance
* -----------------------------------------------
COMPUTE WS-EXPECTED-TOTAL =
WS-RECORDS-WRITTEN +
WS-RECORDS-REJECTED +
WS-RECORDS-SKIPPED
IF WS-EXPECTED-TOTAL NOT = WS-RECORDS-READ
DISPLAY 'ERROR: Record count imbalance'
DISPLAY ' Read: ' WS-RECORDS-READ
DISPLAY ' Expected: ' WS-EXPECTED-TOTAL
MOVE 12 TO RETURN-CODE
MOVE 'Y' TO WS-ERROR-FLAG
END-IF
* -----------------------------------------------
* Verify hash total balance
* -----------------------------------------------
COMPUTE WS-HASH-DIFF =
WS-HASH-INPUT - WS-HASH-OUTPUT
IF WS-HASH-DIFF NOT = 0
DISPLAY 'ERROR: Hash total imbalance'
DISPLAY ' Input hash: ' WS-HASH-INPUT
DISPLAY ' Output hash: ' WS-HASH-OUTPUT
DISPLAY ' Difference: ' WS-HASH-DIFF
MOVE 12 TO RETURN-CODE
MOVE 'Y' TO WS-ERROR-FLAG
END-IF
* -----------------------------------------------
* Verify financial balance (debits = credits)
* -----------------------------------------------
COMPUTE WS-FINANCIAL-DIFF =
WS-TOTAL-DEBITS - WS-TOTAL-CREDITS
- WS-NET-AMOUNT
IF WS-FINANCIAL-DIFF NOT = 0
DISPLAY 'WARNING: Financial imbalance of '
WS-FINANCIAL-DIFF
MOVE 08 TO RETURN-CODE
END-IF.
⚠️ Common Trap: Never use floating-point (COMP-1 or COMP-2) for financial totals. The rounding errors will accumulate and your totals will not balance. Always use COMP-3 (packed decimal) or DISPLAY numeric for money.
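The drift the trap warns about is easy to demonstrate in any language. In this Python sketch, binary float plays the role COMP-1/COMP-2 would, and decimal.Decimal plays the role of packed decimal:

```python
from decimal import Decimal

# Post one thousand $0.10 fees with binary floating point...
float_total = sum(0.1 for _ in range(1000))
# ...and with exact decimal arithmetic, as COMP-3 would.
packed_total = sum(Decimal("0.10") for _ in range(1000))

print(float_total == 100.0)                # False: accumulated rounding drift
print(packed_total == Decimal("100.00"))   # True: balances to the penny
```

One thousand postings is a small batch; at 2.3 million transactions a night, the float total would not merely miss the penny, it would drift unpredictably from run to run.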
38.4 Audit Trails
Regulators, auditors, and compliance teams need to answer a simple question: What changed, when, and why? Audit trails provide the answer.
Before/After Images
The most rigorous audit approach records both the before image (the record before the change) and the after image (the record after the change).
01 WS-AUDIT-RECORD.
05 AUDIT-TIMESTAMP PIC X(26).
05 AUDIT-PROGRAM-ID PIC X(08).
05 AUDIT-USER-ID PIC X(08).
05 AUDIT-ACTION PIC X(01).
88 AUDIT-ADD VALUE 'A'.
88 AUDIT-UPDATE VALUE 'U'.
88 AUDIT-DELETE VALUE 'D'.
05 AUDIT-RECORD-KEY PIC X(20).
05 AUDIT-FIELD-NAME PIC X(30).
05 AUDIT-BEFORE-VALUE PIC X(50).
05 AUDIT-AFTER-VALUE PIC X(50).
05 AUDIT-REASON-CODE PIC X(04).
Writing Audit Records
3500-WRITE-AUDIT-TRAIL.
* -----------------------------------------------
* Compare before and after images field by field
* Write audit record for each changed field
* -----------------------------------------------
MOVE FUNCTION CURRENT-DATE
TO AUDIT-TIMESTAMP
MOVE 'TXNPOST' TO AUDIT-PROGRAM-ID
MOVE WS-USER-ID TO AUDIT-USER-ID
SET AUDIT-UPDATE TO TRUE
MOVE ACCT-ACCOUNT-NO TO AUDIT-RECORD-KEY
* Check each auditable field
IF WS-BEFORE-BALANCE NOT = WS-AFTER-BALANCE
MOVE 'ACCT-BALANCE' TO AUDIT-FIELD-NAME
* WS-BEFORE/AFTER-BALANCE are assumed to be
* display-format work fields; a packed (COMP-3)
* amount must first pass through a numeric-edited
* field before an alphanumeric MOVE like this
MOVE WS-BEFORE-BALANCE TO AUDIT-BEFORE-VALUE
MOVE WS-AFTER-BALANCE TO AUDIT-AFTER-VALUE
MOVE 'TPST' TO AUDIT-REASON-CODE
WRITE AUDIT-FILE-RECORD FROM WS-AUDIT-RECORD
END-IF
IF WS-BEFORE-STATUS NOT = WS-AFTER-STATUS
MOVE 'ACCT-STATUS' TO AUDIT-FIELD-NAME
MOVE WS-BEFORE-STATUS TO AUDIT-BEFORE-VALUE
MOVE WS-AFTER-STATUS TO AUDIT-AFTER-VALUE
MOVE 'STCH' TO AUDIT-REASON-CODE
WRITE AUDIT-FILE-RECORD FROM WS-AUDIT-RECORD
END-IF.
💡 Key Insight: Field-level audit trails are more useful than record-level ones. When an auditor asks "who changed this customer's credit limit?", you can answer precisely instead of forcing them to compare two complete records to find the difference.
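The field-by-field comparison in 3500-WRITE-AUDIT-TRAIL generalizes to any record layout. A sketch in Python (field names, values, and the record key are illustrative):

```python
def audit_changes(before, after, program_id, key):
    """Compare before/after images field by field and emit one
    audit record per changed field, as the COBOL routine does."""
    trail = []
    for field in before:
        if before[field] != after[field]:
            trail.append({
                "program": program_id, "key": key, "field": field,
                "before": before[field], "after": after[field],
            })
    return trail

before = {"ACCT-BALANCE": "1045.00", "ACCT-STATUS": "A"}
after  = {"ACCT-BALANCE": "1032.50", "ACCT-STATUS": "A"}
trail = audit_changes(before, after, "TXNPOST", "0000012345")
print(trail)  # one record: only ACCT-BALANCE changed
```

The COBOL version enumerates the auditable fields explicitly, which auditors often prefer: the program itself documents exactly which fields are tracked.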
Audit Trail for Batch Runs
Beyond individual record changes, you should also audit the batch run itself:
01 WS-BATCH-AUDIT-RECORD.
05 BA-RUN-ID PIC X(20).
05 BA-PROGRAM-ID PIC X(08).
05 BA-START-TIMESTAMP PIC X(26).
05 BA-END-TIMESTAMP PIC X(26).
05 BA-RECORDS-PROCESSED PIC 9(10).
05 BA-RECORDS-REJECTED PIC 9(10).
05 BA-RETURN-CODE PIC 9(04).
05 BA-COMPLETION-STATUS PIC X(08).
88 BA-SUCCESS VALUE 'SUCCESS '.
88 BA-WARNING VALUE 'WARNING '.
88 BA-FAILURE VALUE 'FAILURE '.
05 BA-CONTROL-TOTALS.
10 BA-HASH-TOTAL PIC S9(15) COMP-3.
10 BA-FIN-TOTAL PIC S9(13)V99 COMP-3.
05 BA-CHECKPOINT-COUNT PIC 9(05).
05 BA-RESTART-FLAG PIC X(01).
88 BA-FRESH-RUN VALUE 'N'.
88 BA-RESTART-RUN VALUE 'Y'.
38.5 The Balanced Update Pattern
The balanced update pattern (also called balanced line update or sequential master file update) is one of the most important patterns in batch COBOL. It processes a sorted transaction file against a sorted master file, producing an updated master file.
The Classic Algorithm
Both files must be sorted on the same key. The algorithm works by comparing keys:
IF master-key < transaction-key:
Write master record unchanged to new master
Read next master record
ELSE IF master-key = transaction-key:
Apply transaction to master record
(Don't write yet — there may be more transactions for this key)
ELSE (master-key > transaction-key):
If transaction is an ADD, create new master record
If transaction is UPDATE/DELETE, it's an error (no matching master)
Read next transaction
Here is the complete pattern:
2000-BALANCED-UPDATE.
PERFORM 2100-READ-MASTER
PERFORM 2200-READ-TRANSACTION
* -----------------------------------------------
* EOF needs no special WHEN clauses: the read
* paragraphs move HIGH-VALUES to the key at end
* of file, so the three comparisons cover it
* -----------------------------------------------
PERFORM UNTIL WS-BOTH-EOF
EVALUATE TRUE
WHEN MSTR-KEY < TXN-KEY
PERFORM 2300-WRITE-MASTER-UNCHANGED
PERFORM 2100-READ-MASTER
WHEN MSTR-KEY = TXN-KEY
* 2400 reads ahead through all txns for
* this key and writes the master itself
PERFORM 2400-APPLY-TRANSACTION
PERFORM 2100-READ-MASTER
WHEN MSTR-KEY > TXN-KEY
PERFORM 2500-UNMATCHED-TRANSACTION
PERFORM 2200-READ-TRANSACTION
END-EVALUATE
END-PERFORM.
2400-APPLY-TRANSACTION.
* -----------------------------------------------
* Process all transactions for the current key
* Then write the updated master
* -----------------------------------------------
EVALUATE TXN-ACTION-CODE
WHEN 'U'
PERFORM 2410-UPDATE-MASTER-FIELDS
WHEN 'D'
SET WS-DELETE-FLAG TO TRUE
WHEN OTHER
PERFORM 9500-WRITE-ERROR-RECORD
END-EVALUATE
* Check for more transactions with the same key
PERFORM 2200-READ-TRANSACTION
PERFORM UNTIL TXN-KEY NOT = MSTR-KEY
OR WS-TXN-EOF
EVALUATE TXN-ACTION-CODE
WHEN 'U'
PERFORM 2410-UPDATE-MASTER-FIELDS
WHEN 'D'
SET WS-DELETE-FLAG TO TRUE
WHEN OTHER
PERFORM 9500-WRITE-ERROR-RECORD
END-EVALUATE
PERFORM 2200-READ-TRANSACTION
END-PERFORM
IF NOT WS-DELETE-FLAG
PERFORM 2300-WRITE-MASTER-UPDATED
END-IF
* SET ... TO FALSE requires the 88 level for
* WS-DELETE-FLAG to specify WHEN SET TO FALSE
SET WS-DELETE-FLAG TO FALSE.
⚠️ Critical Pattern Rule: Always handle the case where multiple transactions exist for the same master key. The classic mistake is to write the updated master after the first matching transaction, then find a second transaction for the same key with no master to update.
High-Value Keys: End-of-File Handling
A subtle but critical detail in the balanced update is handling end-of-file conditions. When one file reaches EOF before the other, you need to continue processing the remaining file. The standard technique uses high-value keys:
2100-READ-MASTER.
READ MASTER-FILE INTO WS-MASTER-RECORD
AT END
MOVE HIGH-VALUES TO MSTR-KEY
MOVE 'Y' TO WS-MASTER-EOF-FLAG
END-READ.
2200-READ-TRANSACTION.
READ TXN-FILE INTO WS-TXN-RECORD
AT END
MOVE HIGH-VALUES TO TXN-KEY
MOVE 'Y' TO WS-TXN-EOF-FLAG
END-READ.
By setting the key to HIGH-VALUES at EOF, the comparison logic naturally handles the remaining records: a master key of HIGH-VALUES is greater than any real transaction key, so leftover transactions fall through the unmatched path, and a transaction key of HIGH-VALUES is greater than any real master key, so leftover masters are written through unchanged. When both keys reach HIGH-VALUES, processing is complete.
01 WS-EOF-FLAGS.
05 WS-MASTER-EOF-FLAG PIC X(01) VALUE 'N'.
88 WS-MASTER-EOF VALUE 'Y'.
05 WS-TXN-EOF-FLAG PIC X(01) VALUE 'N'.
88 WS-TXN-EOF VALUE 'Y'.
* An 88 level on a REDEFINES of the whole flag
* group gives a single WS-BOTH-EOF condition name
01 WS-EOF-FLAGS-R REDEFINES WS-EOF-FLAGS PIC X(02).
88 WS-BOTH-EOF VALUE 'YY'.
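The HIGH-VALUES sentinel technique translates directly to other languages. A Python sketch of the balanced update's merge loop, using a byte-string sentinel in place of HIGH-VALUES; records are simplified to (key, data) pairs with at most one transaction per key (the COBOL version's inner loop handles multiples):

```python
HIGH_VALUES = "\xff" * 20   # sorts after any real 20-byte key

def read_next(records, pos):
    """Return (key, record, next_pos); at EOF the key becomes the sentinel."""
    if pos >= len(records):
        return HIGH_VALUES, None, pos
    return records[pos][0], records[pos], pos + 1

def balanced_update(masters, txns):
    """Merge sorted transactions against a sorted master file."""
    out, unmatched = [], []
    mkey, mrec, mi = read_next(masters, 0)
    tkey, trec, ti = read_next(txns, 0)
    while not (mkey == HIGH_VALUES and tkey == HIGH_VALUES):
        if mkey < tkey:
            out.append(mrec)                    # no txn: copy unchanged
            mkey, mrec, mi = read_next(masters, mi)
        elif mkey == tkey:
            out.append((mkey, trec[1]))         # apply the update
            mkey, mrec, mi = read_next(masters, mi)
            tkey, trec, ti = read_next(txns, ti)
        else:
            unmatched.append(trec)              # txn with no master
            tkey, trec, ti = read_next(txns, ti)
    return out, unmatched

masters = [("A001", "old"), ("A002", "old"), ("A004", "old")]
txns    = [("A002", "new"), ("A003", "new")]
out, unmatched = balanced_update(masters, txns)
print(out)        # A002 updated, A001/A004 unchanged
print(unmatched)  # A003 has no matching master
```

Note how EOF never appears in the loop body: once a file is exhausted, its sentinel key loses every comparison, exactly as HIGH-VALUES does in the COBOL version.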
38.6 Multi-Step Job Streams
Real batch processing involves multiple programs running in sequence. Each program performs one step in a larger process. The output of one step becomes the input of the next.
GlobalBank Nightly Cycle — Step by Step
Let us trace through GlobalBank's nightly batch cycle to see how multi-step job streams work in practice. Maria Chen manages this sequence, and she will tell you that every step exists for a reason — usually a reason discovered at 3 AM during a production incident.
Step 1: Extract (GBEXTRACT) Extract the day's transactions from the online transaction log into a sequential file.
Step 2: Sort (GBSORT01) Sort the extracted transactions by account number for balanced update processing.
Step 3: Validate (GBVALID) Validate each transaction against business rules. Write valid transactions to a good file, invalid ones to an error file.
Step 4: Post (GBPOST) Balanced update: apply valid transactions to the account master file. This is the critical step — the one that changes real balances.
Step 5: Interest (GBINTCALC) Calculate daily interest on all accounts. Read the updated master, compute interest, write interest accrual records.
Step 6: Fees (GBFEECALC) Assess monthly fees for accounts that meet fee criteria.
Step 7: GL Reconciliation (GBGLREC) Reconcile account totals against the general ledger. This must balance to zero.
Step 8: Statements (GBSTMT) Generate account statements for accounts with statement cycle dates matching today.
Step 9: Archive (GBARCHIVE) Archive the day's transaction file to a GDG for historical retention.
Step 10: Housekeeping (GBCLEAN) Clean up work files, update run control tables, prepare for the next cycle.
Each step produces a return code:
- 0 = success
- 4 = warning (minor issues, continue processing)
- 8 = error (significant issues, may need investigation)
- 12 = severe error (stop the job stream)
- 16 = catastrophic failure (immediate page to on-call)
Designing for Restartability
When designing a multi-step job stream, each step must be independently restartable. This means:
- Each step must be idempotent — running it twice produces the same result as running it once (where possible)
- Each step must check for prior completion — skip processing if already done
- Each step must produce clear return codes
- Each step must write control totals — so the next step can verify it received the correct input
* -----------------------------------------------
* Step initialization: check prior completion
* -----------------------------------------------
1000-INITIALIZATION.
PERFORM 1100-CHECK-RUN-CONTROL
IF WS-ALREADY-COMPLETED
DISPLAY 'GBPOST already completed for '
WS-BUSINESS-DATE
DISPLAY 'Skipping - return code 0'
MOVE 0 TO RETURN-CODE
STOP RUN
END-IF
* Verify input control totals from prior step
PERFORM 1200-VERIFY-INPUT-CONTROLS
IF WS-INPUT-CONTROLS-BAD
DISPLAY 'ERROR: Input control totals do not'
' match GBVALID output totals'
MOVE 16 TO RETURN-CODE
STOP RUN
END-IF.
✅ Best Practice: Always verify input control totals against the prior step's output control totals. If Step 4 receives a file with 100,000 records but Step 3 reported writing 100,001 records, something was lost or corrupted in transit.
38.7 JCL for Complex Batch
The Job Control Language (JCL) that drives batch jobs is not just boilerplate — it contains critical logic for job stream management. While a full JCL course is beyond our scope, you must understand the patterns that affect your COBOL programs.
Conditional Execution with COND and IF/THEN/ELSE
The traditional COND parameter controls whether a step executes based on return codes from prior steps:
//GBPOST EXEC PGM=GBPOST
// COND=(8,LT)
This means: bypass GBPOST if 8 is less than the return code of any prior step. In other words, the step is skipped when any prior step returned a code greater than 8, and runs only when every prior return code is 8 or lower.
⚠️ JCL Gotcha: COND states the condition for skipping the step, not for running it, and the constant you code is the left operand of the comparison. COND=(8,LT) reads "bypass if 8 is less than the prior return code", not "run if the return code is less than 8", which is how most people first parse it. Misread COND parameters are a source of countless production incidents.
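Because COND trips up so many people, it is worth encoding the rule once and testing your intuition against it. A sketch in Python: the constant is the left operand, the prior step's return code is the right, and a true test bypasses the step.

```python
def cond_bypasses(constant, operator, prior_rc):
    """COND=(constant,operator): the step is BYPASSED when
    'constant operator prior_rc' is true."""
    tests = {
        "GT": constant > prior_rc,  "GE": constant >= prior_rc,
        "EQ": constant == prior_rc, "NE": constant != prior_rc,
        "LT": constant < prior_rc,  "LE": constant <= prior_rc,
    }
    return tests[operator]

# COND=(8,LT): bypass when 8 < RC, so the step RUNS for RC 0, 4, 8
# and is skipped for RC 12 and 16.
for rc in (0, 4, 8, 12, 16):
    print(rc, "bypassed" if cond_bypasses(8, "LT", rc) else "runs")
```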
Modern JCL uses IF/THEN/ELSE/ENDIF, which is far more readable:
// IF (GBVALID.RC <= 4) THEN
//GBPOST EXEC PGM=GBPOST
//STEPLIB DD DSN=GBANK.PROD.LOADLIB,DISP=SHR
//MASTIN DD DSN=GBANK.ACCT.MASTER,DISP=OLD
//MASTOUT DD DSN=GBANK.ACCT.MASTER.NEW,
// DISP=(NEW,CATLG,DELETE),
// SPACE=(CYL,(100,50),RLSE),
// DCB=(RECFM=FB,LRECL=500,BLKSIZE=27500)
//TXNIN DD DSN=GBANK.TXN.VALID,DISP=SHR
//ERROUT DD DSN=GBANK.TXN.ERRORS,
// DISP=(NEW,CATLG,DELETE),
// SPACE=(TRK,(50,25),RLSE)
//SYSOUT DD SYSOUT=*
// ELSE
// EXEC PGM=IEFBR14
// ENDIF
💡 Key Insight: IF/THEN/ELSE in JCL evaluates the same return code logic but in a readable format. Always prefer this over COND in new jobs.
Multi-Step JCL Example
Here is a simplified version of GlobalBank's nightly batch JCL:
//GBNITELY JOB (ACCT),'NIGHTLY BATCH',
// CLASS=A,MSGCLASS=H,
// NOTIFY=&SYSUID,
// RESTART=*
//*
//* =============================================
//* GLOBALBANK NIGHTLY BATCH CYCLE
//* =============================================
//*
//* STEP 1: EXTRACT DAILY TRANSACTIONS
//EXTRACT EXEC PGM=GBEXTRACT
//STEPLIB DD DSN=GBANK.PROD.LOADLIB,DISP=SHR
//TXNLOG DD DSN=GBANK.ONLINE.TXNLOG,DISP=SHR
//TXNOUT DD DSN=GBANK.TXN.EXTRACT,
// DISP=(NEW,CATLG,DELETE),
// SPACE=(CYL,(200,100),RLSE),
// DCB=(RECFM=FB,LRECL=200,BLKSIZE=27800)
//CTLOUT DD DSN=GBANK.CTL.EXTRACT,
// DISP=(NEW,CATLG,DELETE),
// SPACE=(TRK,(1,1))
//SYSOUT DD SYSOUT=*
//*
//* STEP 2: SORT BY ACCOUNT NUMBER
// IF (EXTRACT.RC <= 4) THEN
//SORT01 EXEC PGM=SORT
//SORTIN DD DSN=GBANK.TXN.EXTRACT,DISP=SHR
//SORTOUT DD DSN=GBANK.TXN.SORTED,
// DISP=(NEW,CATLG,DELETE),
// SPACE=(CYL,(200,100),RLSE)
//SYSIN DD *
SORT FIELDS=(1,10,CH,A)
SUM FIELDS=NONE
/*
//SYSOUT DD SYSOUT=*
// ENDIF
//*
//* STEP 3: VALIDATE
// IF (SORT01.RC <= 4) THEN
//VALID EXEC PGM=GBVALID
//STEPLIB DD DSN=GBANK.PROD.LOADLIB,DISP=SHR
//TXNIN DD DSN=GBANK.TXN.SORTED,DISP=SHR
//TXNOUT DD DSN=GBANK.TXN.VALID,
// DISP=(NEW,CATLG,DELETE),
// SPACE=(CYL,(200,100),RLSE)
//ERROUT DD DSN=GBANK.TXN.ERRORS,
// DISP=(NEW,CATLG,DELETE),
// SPACE=(CYL,(10,5),RLSE)
//CTLIN DD DSN=GBANK.CTL.EXTRACT,DISP=SHR
//CTLOUT DD DSN=GBANK.CTL.VALID,
// DISP=(NEW,CATLG,DELETE),
// SPACE=(TRK,(1,1))
//SYSOUT DD SYSOUT=*
// ENDIF
38.8 Generation Data Groups (GDG)
A Generation Data Group (GDG) is a collection of chronologically related datasets that share a common base name. Each time you create a new generation, it gets a version number. Old generations can be automatically deleted based on a retention limit.
Why GDGs Matter
Consider GlobalBank's daily transaction archive. Every night, the archive step creates a file containing the day's transactions. Without GDGs, Maria would need to manually manage dataset names:
GBANK.TXN.ARCHIVE.D20260301
GBANK.TXN.ARCHIVE.D20260302
GBANK.TXN.ARCHIVE.D20260303
...
With GDGs, the system manages generations automatically:
GBANK.TXN.ARCHIVE.G0001V00 (oldest)
GBANK.TXN.ARCHIVE.G0002V00
GBANK.TXN.ARCHIVE.G0003V00
...
GBANK.TXN.ARCHIVE.G0365V00 (newest)
GDG Relative References
The real power of GDGs is relative referencing:
- GBANK.TXN.ARCHIVE(0) — the current (most recent) generation
- GBANK.TXN.ARCHIVE(+1) — the next generation (the one you are creating)
- GBANK.TXN.ARCHIVE(-1) — the previous generation
- GBANK.TXN.ARCHIVE(-2) — two generations back
This makes JCL generic — it never needs date-specific dataset names:
//* Create new generation of transaction archive
//ARCHIVE EXEC PGM=GBARCHIVE
//STEPLIB DD DSN=GBANK.PROD.LOADLIB,DISP=SHR
//TXNIN DD DSN=GBANK.TXN.VALID,DISP=SHR
//ARCOUT DD DSN=GBANK.TXN.ARCHIVE(+1),
// DISP=(NEW,CATLG,DELETE),
// SPACE=(CYL,(200,100),RLSE),
// DCB=(RECFM=FB,LRECL=200,BLKSIZE=27800)
//*
//* Compare today's archive with yesterday's for anomalies
//COMPARE EXEC PGM=GBCOMPARE
//TODAY DD DSN=GBANK.TXN.ARCHIVE(0),DISP=SHR
//YESTERDAY DD DSN=GBANK.TXN.ARCHIVE(-1),DISP=SHR
//RPTOUT DD SYSOUT=*
GDG in COBOL Programs
Your COBOL program does not need to know about GDG generations — the JCL handles the dataset resolution. Your program simply reads from or writes to the DD name:
INPUT-OUTPUT SECTION.
FILE-CONTROL.
SELECT ARCHIVE-FILE
ASSIGN TO ARCOUT
ORGANIZATION IS SEQUENTIAL
FILE STATUS IS WS-ARCHIVE-STATUS.
* The COBOL program writes to ARCOUT.
* The JCL maps ARCOUT to GBANK.TXN.ARCHIVE(+1).
* The system creates the new generation automatically.
📊 GDG Configuration Example
GDG Base: GBANK.TXN.ARCHIVE
Limit: 365 (keep one year of daily files)
Empty: YES (allow empty GDG base to be referenced)
Scratch: YES (delete uncataloged generations)
Order: LIFO (reading all generations via the base name returns the newest first)
GDG Limit and Rolloff
When the number of generations reaches the limit, the oldest generation is rolled off — uncataloged and optionally deleted. This provides automatic data lifecycle management.
//* Define a GDG base with a limit of 30 generations
//DEFGDG EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN DD *
DEFINE GDG -
(NAME(GBANK.MONTHLY.STMTS) -
LIMIT(30) -
SCRATCH -
NOEMPTY)
/*
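The rolloff mechanics can be modeled in a few lines. A toy sketch in Python of a GDG base defined with LIMIT(3) and SCRATCH (class, method, and dataset names are all invented for illustration):

```python
from collections import deque

class GdgBase:
    """Toy model of a GDG base: new generations push in,
    and the oldest rolls off once the limit is reached."""
    def __init__(self, base, limit):
        self.base, self.gen = base, 0
        self.live = deque(maxlen=limit)   # rolloff happens automatically

    def new_generation(self):             # like writing to BASE(+1)
        self.gen += 1
        self.live.append(f"{self.base}.G{self.gen:04d}V00")

    def relative(self, n):                # BASE(0), BASE(-1), ...
        return self.live[len(self.live) - 1 + n]

gdg = GdgBase("GBANK.TXN.ARCHIVE", limit=3)
for _ in range(5):                        # five nightly runs
    gdg.new_generation()
print(list(gdg.live))      # G0003..G0005 remain; G0001/G0002 rolled off
print(gdg.relative(0))     # GBANK.TXN.ARCHIVE.G0005V00
print(gdg.relative(-1))    # GBANK.TXN.ARCHIVE.G0004V00
```

The point of the model: JCL that refers to (0) and (+1) never changes, while the absolute G0000V00-style names march forward every night and the catalog prunes itself.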
💡 Key Insight: GDGs are one of the mainframe's most elegant features. They solve a problem — versioned dataset management — that distributed systems often handle with ad hoc scripting and manual cleanup. When Derek Washington first saw GDGs in action at GlobalBank, he said: "So it's basically Git for datasets?" Maria replied: "Git wishes it were this reliable."
38.9 Try It Yourself: Building a Checkpoint/Restart Program
Let us build a complete checkpoint/restart program for the Student Mainframe Lab. This program processes a student transaction file and writes a summary report, with full restart capability.
The Scenario
You have a file of student financial aid disbursements. Each record contains a student ID, disbursement amount, and fund code. You need to post these disbursements to a student account master file, with checkpoint/restart capability.
Step 1: Define the Files
IDENTIFICATION DIVISION.
PROGRAM-ID. STUDCHKR.
*
* Student Checkpoint/Restart Exercise
* Demonstrates checkpoint/restart pattern with
* control totals and audit trail.
*
ENVIRONMENT DIVISION.
INPUT-OUTPUT SECTION.
FILE-CONTROL.
SELECT DISBURSEMENT-FILE
ASSIGN TO DISBIN
ORGANIZATION IS SEQUENTIAL
FILE STATUS IS WS-DISB-STATUS.
SELECT STUDENT-MASTER
ASSIGN TO STUDMSTR
ORGANIZATION IS INDEXED
ACCESS MODE IS RANDOM
RECORD KEY IS SM-STUDENT-ID
FILE STATUS IS WS-MSTR-STATUS.
SELECT CHECKPOINT-FILE
ASSIGN TO CHKPTFL
ORGANIZATION IS SEQUENTIAL
FILE STATUS IS WS-CHKPT-STATUS.
SELECT AUDIT-FILE
ASSIGN TO AUDITFL
ORGANIZATION IS SEQUENTIAL
FILE STATUS IS WS-AUDIT-STATUS.
SELECT REPORT-FILE
ASSIGN TO RPTOUT
ORGANIZATION IS SEQUENTIAL
FILE STATUS IS WS-RPT-STATUS.
DATA DIVISION.
FILE SECTION.
FD DISBURSEMENT-FILE
RECORDING MODE IS F
RECORD CONTAINS 80 CHARACTERS.
01 DISB-RECORD PIC X(80).
FD STUDENT-MASTER
RECORD CONTAINS 200 CHARACTERS.
01 STUDENT-MASTER-RECORD.
05 SM-STUDENT-ID PIC X(10).
05 SM-STUDENT-NAME PIC X(30).
05 SM-BALANCE PIC S9(7)V99 COMP-3.
05 SM-LAST-DISB-DATE PIC 9(08).
05 SM-TOTAL-DISBURSED PIC S9(9)V99 COMP-3.
05 SM-FILLER PIC X(141).
FD CHECKPOINT-FILE
RECORDING MODE IS F
RECORD CONTAINS 100 CHARACTERS.
01 CHECKPOINT-RECORD PIC X(100).
FD AUDIT-FILE
RECORDING MODE IS F
RECORD CONTAINS 150 CHARACTERS.
01 AUDIT-FILE-RECORD PIC X(150).
FD REPORT-FILE
RECORDING MODE IS F
RECORD CONTAINS 132 CHARACTERS.
01 REPORT-RECORD PIC X(132).
WORKING-STORAGE SECTION.
01 WS-FILE-STATUSES.
05 WS-DISB-STATUS PIC X(02).
05 WS-MSTR-STATUS PIC X(02).
05 WS-CHKPT-STATUS PIC X(02).
05 WS-AUDIT-STATUS PIC X(02).
05 WS-RPT-STATUS PIC X(02).
01 WS-FLAGS.
05 WS-EOF-FLAG PIC X(01) VALUE 'N'.
88 WS-EOF VALUE 'Y'.
05 WS-RESTART-FLAG PIC X(01) VALUE 'N'.
88 WS-IS-RESTART VALUE 'Y'.
05 WS-ERROR-FLAG PIC X(01) VALUE 'N'.
88 WS-HAS-ERROR VALUE 'Y'.
01 WS-COUNTERS.
05 WS-RECORDS-READ PIC 9(10) VALUE 0.
05 WS-RECORDS-POSTED PIC 9(10) VALUE 0.
05 WS-RECORDS-REJECTED PIC 9(10) VALUE 0.
05 WS-CHECKPOINT-COUNT PIC 9(05) VALUE 0.
05 WS-SKIP-COUNT PIC 9(10) VALUE 0.
01 WS-TOTALS.
05 WS-TOTAL-DISBURSED PIC S9(11)V99 COMP-3
VALUE 0.
05 WS-HASH-TOTAL PIC S9(15) COMP-3
VALUE 0.
01 WS-PARMS.
05 WS-CHECKPOINT-INTERVAL PIC 9(05) VALUE 5000.
01 WS-DISB-WS.
05 WS-DISB-STUDENT-ID PIC X(10).
05 WS-DISB-AMOUNT PIC S9(7)V99 COMP-3.
05 WS-DISB-FUND-CODE PIC X(04).
05 WS-DISB-DATE PIC 9(08).
05 WS-DISB-FILLER PIC X(53).
Step 2: Implement the Logic
The key paragraphs are in the code file STUDCHKR.cbl provided with this chapter. Study the restart logic carefully — note how the program reads the checkpoint file during initialization and either resumes from where it left off or starts fresh.
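The restart decision itself is language-neutral. Here is a minimal sketch of the pattern in Python (the `Checkpoint` fields and function names are illustrative, not taken from STUDCHKR): on startup the program either restores its counters from an in-progress checkpoint or starts from zero.

```python
# Illustrative checkpoint/restart initialization (not the actual STUDCHKR code).
from dataclasses import dataclass

@dataclass
class Checkpoint:
    status: str            # 'P' = in progress, 'C' = completed
    records_read: int
    total_disbursed: float
    hash_total: int

def initialize(checkpoint):
    """Mirror of the restart decision: fresh start vs. resume."""
    if checkpoint is None or checkpoint.status == 'C':
        # No prior run, or the prior run completed: start fresh.
        return {'restart': False, 'skip': 0,
                'records_read': 0, 'total': 0.0, 'hash': 0}
    # Prior run died mid-stream: restore counters, then skip the
    # input records that were already processed.
    return {'restart': True, 'skip': checkpoint.records_read,
            'records_read': checkpoint.records_read,
            'total': checkpoint.total_disbursed,
            'hash': checkpoint.hash_total}
```

On a restart, the caller then reads and discards `skip` input records before resuming normal processing, which is exactly what the COBOL program does during its positioning logic.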
🔗 Cross-Reference: This pattern builds on the file handling techniques from Chapters 11–16 and the error handling patterns from Chapter 22.
38.10 GlobalBank Case Study: The Nightly Batch Cycle
Maria Chen arrived at GlobalBank eight years ago, inheriting a nightly batch cycle that was already 15 years old. Over those eight years, she has refined it into a system that runs 363 nights out of 365 without human intervention. The other two nights? "Usually a disk issue or a feed from an external system arriving late," she says. "The COBOL code itself almost never fails."
The Architecture
GlobalBank's nightly cycle consists of 47 batch jobs organized into 12 job streams. The job streams run in a specific dependency order managed by TWS (Tivoli Workload Scheduler). Here is the high-level flow:
┌─────────────┐
│ EXTRACT │
│ (Stream 1) │
└──────┬──────┘
│
┌──────▼──────┐
│ SORT │
│ (Stream 2) │
└──────┬──────┘
│
┌────────────┼────────────┐
│ │ │
┌──────▼──────┐ ┌──▼────┐ ┌─────▼─────┐
│ VALIDATE │ │ ENRICH│ │ XREF │
│ (Stream 3) │ │(Str 4)│ │ (Str 5) │
└──────┬──────┘ └──┬────┘ └─────┬─────┘
│ │ │
└────────────┼────────────┘
│
┌──────▼──────┐
│ POST │
│ (Stream 6) │
└──────┬──────┘
│
┌────────────┼────────────┐
│ │ │
┌──────▼──────┐ ┌──▼────────┐ ┌─▼──────────┐
│ INTEREST │ │ FEES │ │ GL RECON │
│ (Stream 7) │ │(Stream 8) │ │ (Stream 9) │
└──────┬──────┘ └──┬────────┘ └─┬──────────┘
│ │ │
└────────────┼────────────┘
│
┌────────────┼────────────┐
│ │ │
┌──────▼──────┐ ┌──▼────────┐ ┌─▼──────────┐
│ STATEMENTS │ │ REPORTS │ │ ARCHIVE │
│(Stream 10) │ │(Stream 11)│ │(Stream 12) │
└─────────────┘ └──────────┘ └────────────┘
The Nightly Incident
One Tuesday night, the SORT step (Stream 2) completed but produced a file with 2,341,000 records — 6,891 fewer than the EXTRACT step reported extracting. The VALIDATE step detected the hash total mismatch and aborted with return code 16.
The operations team paged Maria at 12:45 AM.
Maria's investigation:
1. She checked the EXTRACT control total report: 2,347,891 records extracted, hash total 483,291,847,223
2. She checked the SORT output: 2,341,000 records, hash total 481,983,512,190
3. The difference: 6,891 records lost, hash total difference of 1,308,335,033
The cause? A SORT parameter error introduced during a maintenance change that afternoon. The SORT was configured with SUM FIELDS=(21,8,PD), which was intended to sum a packed decimal field but accidentally consolidated records with duplicate keys — 6,891 records that had the same account number (legitimate duplicates from multiple transactions on the same account) were summed instead of kept separate.
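To see why SUM FIELDS was dangerous here, the behavior is easy to reproduce outside of SORT. This minimal Python sketch (sample data invented) simulates key-consolidating summation and shows that a hash total over the key column exposes the loss, even though the step itself reports success:

```python
# Simulate DFSORT SUM FIELDS behavior on duplicate keys (illustrative only).

def sort_with_sum(records):
    """records: list of (account_no, amount). Records with duplicate
    keys are consolidated into one record whose amount is the sum."""
    merged = {}
    for acct, amt in records:
        merged[acct] = merged.get(acct, 0) + amt
    return sorted(merged.items())

records = [(1001, 50), (1002, 75), (1001, 25), (1003, 10)]  # 1001 twice
out = sort_with_sum(records)

# The sort "succeeds", but one record has vanished:
assert len(records) == 4 and len(out) == 3
# A hash total over the key column catches the loss:
hash_in  = sum(acct for acct, _ in records)   # 1001+1002+1001+1003 = 4007
hash_out = sum(acct for acct, _ in out)       # 1001+1002+1003      = 3006
assert hash_in != hash_out
```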
Maria fixed the SORT parameter, restarted from the SORT step, and the batch window completed by 5:15 AM — 45 minutes behind schedule but fully reconciled.
"This is why control totals exist," Maria told Derek the next morning. "Without the hash total check in GBVALID, those 6,891 transactions would have been silently lost. Customers would have had incorrect balances, and we might not have caught it for days."
Lessons from the Incident
- Control totals caught an error that logic alone could not detect. The SORT completed successfully (return code 0) but produced incorrect output.
- Step-to-step control total verification is essential. Every step must verify that its input matches the prior step's output.
- The batch architecture supported clean restart. Maria restarted from Step 2 without rerunning Step 1.
- GDGs preserved the evidence. The original extract file was still available for the re-sort because it had not been scratched yet.
⚖️ Theme — Defensive Programming: This incident illustrates why defensive programming is not paranoia — it is engineering discipline. The hash total check in GBVALID took Maria 30 minutes to code originally. It saved the bank from a potential multi-million-dollar reconciliation nightmare.
38.11 MedClaim Case Study: Claims Batch Processing Pipeline
At MedClaim, James Okafor manages a claims processing pipeline that handles 500,000 claims per month. The pipeline runs nightly, processing claims received during the day through intake, validation, adjudication, and payment stages.
The Pipeline Architecture
┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
│ INTAKE │───▶│ VALIDATE │───▶│ADJUDICATE│───▶│ PAY │
│ CLM-INT │ │ CLM-VAL │ │ CLM-ADJ │ │ CLM-PAY │
└──────────┘ └──────────┘ └──────────┘ └──────────┘
│ │ │ │
▼ ▼ ▼ ▼
┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
│ CTL TOT │ │ CTL TOT │ │ CTL TOT │ │ CTL TOT │
│ REPORT │ │ REPORT │ │ REPORT │ │ REPORT │
└──────────┘ └──────────┘ └──────────┘ └──────────┘
Intake Stage (CLM-INT)
The intake program reads electronic claims from multiple sources — EDI 837 transactions, direct provider submissions, and paper claims entered by the data entry team. Each claim is assigned a unique claim number and written to the claims pending file.
* -----------------------------------------------
* CLM-INT: Assign claim number and write to
* pending file. Track control totals by source.
* -----------------------------------------------
2000-PROCESS-CLAIM.
ADD 1 TO WS-CLAIMS-READ
PERFORM 2100-ASSIGN-CLAIM-NUMBER
PERFORM 2200-VALIDATE-FORMAT
IF WS-FORMAT-VALID
PERFORM 2300-WRITE-PENDING
ADD 1 TO WS-CLAIMS-ACCEPTED
ADD CLM-CHARGED-AMOUNT
TO WS-TOTAL-CHARGED
ADD CLM-CLAIM-NUMBER
TO WS-HASH-CLAIM-NO
ELSE
PERFORM 2400-WRITE-REJECT
ADD 1 TO WS-CLAIMS-REJECTED
END-IF
* Checkpoint every N records
IF FUNCTION MOD(WS-CLAIMS-READ
WS-CHECKPOINT-INTERVAL) = 0
PERFORM 2500-WRITE-CHECKPOINT
END-IF.
Adjudication Stage (CLM-ADJ)
The adjudication program is the most complex step. It applies hundreds of business rules to determine coverage, calculate allowed amounts, apply deductibles and copays, and produce an explanation of benefits.
Sarah Kim, the business analyst, maintains a business rules matrix that maps directly to EVALUATE statements in the adjudication code:
* -----------------------------------------------
* Apply benefit rules based on plan type and
* service category
* -----------------------------------------------
3100-APPLY-BENEFIT-RULES.
EVALUATE CLM-PLAN-TYPE
ALSO CLM-SERVICE-CATEGORY
WHEN 'HMO' ALSO 'PREV'
MOVE 100 TO WS-COVERAGE-PCT
MOVE 0 TO WS-COPAY-AMOUNT
WHEN 'HMO' ALSO 'SPEC'
MOVE 80 TO WS-COVERAGE-PCT
MOVE 40 TO WS-COPAY-AMOUNT
WHEN 'PPO' ALSO 'PREV'
MOVE 90 TO WS-COVERAGE-PCT
MOVE 20 TO WS-COPAY-AMOUNT
WHEN 'PPO' ALSO 'SPEC'
MOVE 70 TO WS-COVERAGE-PCT
MOVE 50 TO WS-COPAY-AMOUNT
WHEN OTHER
PERFORM 3110-LOOKUP-CUSTOM-RULES
END-EVALUATE.
* Apply deductible
IF WS-MEMBER-DEDUCTIBLE-MET = 'N'
COMPUTE WS-DEDUCTIBLE-REMAINING =
WS-ANNUAL-DEDUCTIBLE -
WS-DEDUCTIBLE-YTD
IF WS-ALLOWED-AMOUNT <=
WS-DEDUCTIBLE-REMAINING
MOVE WS-ALLOWED-AMOUNT
TO WS-DEDUCTIBLE-APPLIED
MOVE 0 TO WS-PAYABLE-AMOUNT
ELSE
MOVE WS-DEDUCTIBLE-REMAINING
TO WS-DEDUCTIBLE-APPLIED
COMPUTE WS-PAYABLE-AMOUNT =
(WS-ALLOWED-AMOUNT -
WS-DEDUCTIBLE-REMAINING) *
WS-COVERAGE-PCT / 100
- WS-COPAY-AMOUNT
END-IF
ELSE
COMPUTE WS-PAYABLE-AMOUNT =
WS-ALLOWED-AMOUNT *
WS-COVERAGE-PCT / 100
- WS-COPAY-AMOUNT
END-IF.
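The nested deductible logic is easier to verify with concrete numbers. Here is the same arithmetic as a minimal Python sketch (parameter names shortened and invented; the COBOL above remains the authoritative version):

```python
def payable(allowed, deductible_met, annual_deductible, deductible_ytd,
            coverage_pct, copay):
    """Mirror of the deductible branch in 3100-APPLY-BENEFIT-RULES."""
    if not deductible_met:
        remaining = annual_deductible - deductible_ytd
        if allowed <= remaining:
            return 0.0   # claim fully absorbed by the deductible
        # pay coverage % of the portion above the deductible, minus copay
        return (allowed - remaining) * coverage_pct / 100 - copay
    return allowed * coverage_pct / 100 - copay

# PPO specialist visit: 70% coverage, $50 copay, $200 of deductible left
assert payable(500.0, False, 1000.0, 800.0, 70, 50) == 160.0
# Deductible already met:
assert payable(500.0, True, 1000.0, 1000.0, 70, 50) == 300.0
# Small claim entirely absorbed by the remaining deductible:
assert payable(150.0, False, 1000.0, 800.0, 70, 50) == 0.0
```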
End-to-End Control Total Verification
At the end of the pipeline, a reconciliation job verifies that every claim that entered intake is accounted for — either adjudicated and paid, rejected, or pended for manual review:
MEDCLAIM PIPELINE RECONCILIATION - 2026-03-10
===================================================
INTAKE:
Claims received: 18,247
Claims accepted: 17,891
Claims rejected (format): 356
Hash total (intake): 892,441,337
VALIDATION:
Claims validated: 17,612
Claims failed validation: 279
Hash total (valid): 862,113,448
Hash total (invalid): 30,327,889
Combined hash: 892,441,337 ✓ MATCHES INTAKE
ADJUDICATION:
Claims adjudicated: 17,612
Claims approved: 16,198
Claims denied: 1,414
Total charged: $4,271,443.82
Total allowed: $3,198,204.11
Total payable: $2,847,291.03
PAYMENT:
Claims paid: 16,198
Total paid: $2,847,291.03 ✓ MATCHES ADJUDICATION
PIPELINE VERIFICATION: BALANCED ✓
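The ✓ marks on this report are simple partition checks: every claim's hash contribution and record count must land in exactly one downstream bucket. Using the figures from the report above:

```python
# Verify the intake hash equals the sum of the partitioned hashes,
# as the reconciliation job does (figures from the report above).

intake_hash  = 892_441_337
valid_hash   = 862_113_448
invalid_hash =  30_327_889
assert valid_hash + invalid_hash == intake_hash   # VALIDATION balances

# Record counts partition the same way:
received, accepted, rejected = 18_247, 17_891, 356
assert accepted + rejected == received            # INTAKE balances
validated, failed = 17_612, 279
assert validated + failed == accepted             # VALIDATION counts balance
```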
🔴 MedClaim Reality Check: James Okafor's team once discovered a bug where the adjudication program was not counting denied claims in its hash total. The pipeline reconciliation caught the error — the adjudication hash total was short by 47 claims. "Those 47 denied claims would have disappeared from our system," James said. "No denial letters, no member notifications, no regulatory reporting. Control totals saved us from a compliance violation."
38.12 Advanced Patterns
Pattern: Commit Scope Management
When working with DB2, you control the commit scope — how many records you process before issuing a COMMIT:
2000-PROCESS-RECORDS.
PERFORM UNTIL WS-EOF
READ INPUT-FILE INTO WS-INPUT-RECORD
AT END
SET WS-EOF TO TRUE
END-READ
IF NOT WS-EOF
PERFORM 2100-PROCESS-ONE-RECORD
ADD 1 TO WS-RECORDS-SINCE-COMMIT
IF WS-RECORDS-SINCE-COMMIT >=
WS-COMMIT-FREQUENCY
EXEC SQL COMMIT END-EXEC
MOVE 0 TO WS-RECORDS-SINCE-COMMIT
PERFORM 2500-WRITE-CHECKPOINT
END-IF
END-IF
END-PERFORM.
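Stripped of the SQL, commit-scope management reduces to a counter that resets on each commit. A minimal Python sketch (the `commit` and `checkpoint` callables are stand-ins for EXEC SQL COMMIT and the checkpoint-writing paragraph); note the final commit for the tail of the file, which in the COBOL version belongs in the termination logic:

```python
def process_with_commit_scope(records, commit_frequency, commit, checkpoint):
    """Process records, committing (and checkpointing) every
    commit_frequency records, plus a final commit at end-of-file."""
    since_commit = 0
    for record in records:
        # ... apply the record to the database here ...
        since_commit += 1
        if since_commit >= commit_frequency:
            commit()
            checkpoint()
            since_commit = 0
    if since_commit > 0:          # don't lose the tail of the file
        commit()
        checkpoint()

commits = []
process_with_commit_scope(range(25), 10, lambda: commits.append('C'),
                          lambda: commits.append('K'))
# 25 records at frequency 10: commits after 10, 20, and end-of-file
assert commits.count('C') == 3
```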
Pattern: Error Threshold Management
Production programs should not process indefinitely when errors are accumulating. Set an error threshold:
2100-PROCESS-ONE-RECORD.
PERFORM 2110-VALIDATE-RECORD
IF WS-RECORD-VALID
PERFORM 2120-APPLY-BUSINESS-RULES
PERFORM 2130-WRITE-OUTPUT
ADD 1 TO WS-RECORDS-PROCESSED
ELSE
PERFORM 2140-WRITE-ERROR
ADD 1 TO WS-RECORDS-REJECTED
* Check error threshold
* Multiply before dividing so integer
* truncation cannot hide the error rate
COMPUTE WS-ERROR-PCT =
(WS-RECORDS-REJECTED * 100) /
WS-RECORDS-READ
IF WS-ERROR-PCT > WS-MAX-ERROR-PCT
DISPLAY 'ERROR THRESHOLD EXCEEDED: '
WS-ERROR-PCT '% errors'
DISPLAY 'Maximum allowed: '
WS-MAX-ERROR-PCT '%'
MOVE 12 TO RETURN-CODE
PERFORM 9000-TERMINATION
STOP RUN
END-IF
END-IF.
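Because the counters are integers, ordering matters when computing the percentage: multiply before dividing, or integer truncation can report 0% errors until the job is nearly all errors. A minimal Python sketch of the threshold check (function names invented):

```python
def error_pct(rejected, read):
    """Integer-safe error percentage: multiply before dividing."""
    if read == 0:
        return 0
    return rejected * 100 // read

def threshold_exceeded(rejected, read, max_pct):
    return error_pct(rejected, read) > max_pct

# Naive integer division would report 0 here:
assert (50 // 1000) * 100 == 0
assert error_pct(50, 1000) == 5
assert threshold_exceeded(50, 1000, 4) is True
assert threshold_exceeded(50, 1000, 5) is False
```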
Pattern: Run Control Table
A run control table tracks which batch steps have completed for each business date:
01 WS-RUN-CONTROL-RECORD.
05 RC-BUSINESS-DATE PIC 9(08).
05 RC-STEP-NAME PIC X(08).
05 RC-STATUS PIC X(01).
88 RC-NOT-STARTED VALUE 'N'.
88 RC-IN-PROGRESS VALUE 'P'.
88 RC-COMPLETED VALUE 'C'.
88 RC-FAILED VALUE 'F'.
05 RC-START-TIMESTAMP PIC X(26).
05 RC-END-TIMESTAMP PIC X(26).
05 RC-RECORDS-PROCESSED PIC 9(10).
05 RC-RETURN-CODE PIC 9(04).
05 RC-CONTROL-TOTALS.
10 RC-HASH-TOTAL PIC S9(15) COMP-3.
10 RC-FIN-TOTAL PIC S9(13)V99 COMP-3.
10 RC-RECORD-COUNT PIC 9(10).
This table enables:
- Restart detection: Check if a step already completed for today
- Dependency verification: Confirm prerequisite steps completed
- Progress monitoring: Operations can see which step is running
- Audit: Complete history of all batch runs
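As a hedged sketch, restart detection against such a table is a lookup keyed on business date and step name (the function and data shapes are invented; the statuses mirror the 88-levels above):

```python
# Illustrative run-control lookup; statuses mirror RC-STATUS 88-levels.

def step_action(run_control, business_date, step_name):
    """Decide what to do for a step on a given business date."""
    status = run_control.get((business_date, step_name), 'N')
    return {'N': 'RUN',          # not started: run fresh
            'P': 'RESTART',      # in progress: prior run died mid-step
            'C': 'SKIP',         # completed: do not run twice
            'F': 'RERUN'}[status]   # failed: rerun after the fix

rc = {(20260310, 'GBEXTRACT'): 'C',
      (20260310, 'GBPOST'):    'P'}
assert step_action(rc, 20260310, 'GBEXTRACT') == 'SKIP'
assert step_action(rc, 20260310, 'GBPOST') == 'RESTART'
assert step_action(rc, 20260310, 'GBSTMT') == 'RUN'
```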
38.13 Return Codes and Job Stream Communication
COBOL programs communicate with JCL and the scheduler through return codes. Setting the return code correctly is critical for job stream behavior.
* Set return code based on processing results
9200-SET-RETURN-CODE.
EVALUATE TRUE
WHEN WS-HAS-ERROR AND WS-ERROR-SEVERE
MOVE 16 TO RETURN-CODE
WHEN WS-HAS-ERROR
MOVE 12 TO RETURN-CODE
WHEN WS-HAS-WARNING
MOVE 04 TO RETURN-CODE
WHEN OTHER
MOVE 00 TO RETURN-CODE
END-EVALUATE
DISPLAY 'Program ' WS-PROGRAM-ID
' completed with RC=' RETURN-CODE.
The scheduler uses these return codes to decide which downstream jobs to trigger:
| Return Code | Meaning | Scheduler Action |
|---|---|---|
| 0 | Normal completion | Continue to next job |
| 4 | Warning | Continue, but flag for review |
| 8 | Error | Stop dependent jobs, alert operator |
| 12 | Severe error | Stop job stream, page on-call |
| 16 | Catastrophic | Stop all processing, emergency page |
✅ Best Practice: Always display the return code before exiting. When operations is investigating a failure at 3 AM, this single line in the SYSOUT saves valuable time.
38.14 Performance Considerations
Batch processing performance matters because the batch window is finite. Here are the key techniques:
Buffering
FD LARGE-INPUT-FILE
BLOCK CONTAINS 0 RECORDS
RECORDING MODE IS F.
The BLOCK CONTAINS 0 RECORDS clause tells the system to use the blocksize specified in the JCL or dataset DCB, which is typically optimized for the device. This can reduce I/O by a factor of 10 or more.
Minimize OPEN/CLOSE
Opening and closing files is expensive. If your program processes multiple files, open all of them at the start and close them at the end — do not open and close files repeatedly within your processing loop.
Efficient Key Comparison
In the balanced update pattern, the key comparison happens for every record. Use numeric comparisons for numeric keys (faster than alphanumeric), and avoid unnecessary MOVEs in the inner loop.
SORT Integration
Let the system SORT utility handle sorting — it is highly optimized. Using an internal SORT (SORT verb in COBOL) is acceptable for small files, but for millions of records, the external SORT utility with its own JCL step is faster.
38.15 Putting It All Together
Let us trace through a complete batch processing scenario to see how all the patterns fit together.
Scenario: GlobalBank needs to process 2.3 million daily transactions. The processing must be restartable, auditable, and reconcilable.
The complete program structure:
IDENTIFICATION DIVISION.
PROGRAM-ID. GBPOST.
*
* GlobalBank Transaction Posting Program
* Posts daily transactions to the account master.
* Features: checkpoint/restart, control totals,
* audit trail, error threshold management.
*
PROCEDURE DIVISION.
0000-MAIN-CONTROL.
PERFORM 1000-INITIALIZATION
PERFORM 2000-PROCESS-TRANSACTIONS
UNTIL WS-EOF
PERFORM 8000-VERIFY-CONTROL-TOTALS
PERFORM 8500-WRITE-CONTROL-REPORT
PERFORM 9000-TERMINATION
STOP RUN.
1000-INITIALIZATION.
PERFORM 1100-OPEN-FILES
PERFORM 1200-READ-CHECKPOINT
IF WS-IS-RESTART
PERFORM 1300-POSITION-FOR-RESTART
ELSE
PERFORM 1400-FRESH-START
END-IF
PERFORM 1500-READ-PARMS.
2000-PROCESS-TRANSACTIONS.
READ TXN-FILE INTO WS-TXN-RECORD
AT END
SET WS-EOF TO TRUE
END-READ
IF NOT WS-EOF
ADD 1 TO WS-RECORDS-READ
PERFORM 2100-VALIDATE-TXN
IF WS-TXN-VALID
PERFORM 2200-LOOKUP-ACCOUNT
IF WS-ACCOUNT-FOUND
PERFORM 2300-SAVE-BEFORE-IMAGE
PERFORM 2400-APPLY-TXN
PERFORM 2500-REWRITE-ACCOUNT
PERFORM 2600-WRITE-AUDIT
ADD 1 TO WS-RECORDS-POSTED
ELSE
PERFORM 2700-WRITE-UNMATCHED
ADD 1 TO WS-RECORDS-REJECTED
END-IF
ELSE
PERFORM 2800-WRITE-INVALID
ADD 1 TO WS-RECORDS-REJECTED
END-IF
* Accumulate control totals
ADD TXN-AMOUNT TO WS-FINANCIAL-TOTAL
ADD TXN-ACCT-NO TO WS-HASH-TOTAL
* Checkpoint at interval
IF FUNCTION MOD(WS-RECORDS-READ
WS-CHECKPOINT-INTERVAL) = 0
PERFORM 2900-WRITE-CHECKPOINT
END-IF
* Check error threshold
PERFORM 2950-CHECK-ERROR-THRESHOLD
END-IF.
This program demonstrates every pattern from this chapter:
- Checkpoint/restart (paragraphs 1200, 1300, 2900)
- Control totals (paragraph 8000)
- Audit trail (paragraph 2600)
- Error threshold (paragraph 2950)
- Run control (paragraph 1000)
- Return code management (paragraph 9000)
38.16 GDG Management in Production
Generation Data Groups require careful management in production environments. The patterns described in section 38.8 cover the basics, but real-world GDG management involves several additional considerations that Maria Chen and her team handle routinely at GlobalBank.
GDG Model Copies and DCB Attributes
When you create a new generation, the system can inherit DCB attributes (record format, record length, block size) from a model dataset. This ensures consistency across generations:
//* Define a model dataset for the GDG
//DEFMODEL EXEC PGM=IEFBR14
//GDGMODEL DD DSN=GBANK.TXN.ARCHIVE.MODEL,
// DISP=(NEW,CATLG),
// SPACE=(TRK,(0)),
// DCB=(RECFM=FB,LRECL=200,BLKSIZE=27800)
//*
//* New generations inherit DCB from model
//ARCOUT DD DSN=GBANK.TXN.ARCHIVE(+1),
// DISP=(NEW,CATLG,DELETE),
// SPACE=(CYL,(200,100),RLSE),
// DCB=GBANK.TXN.ARCHIVE.MODEL
Without a model dataset, each generation must explicitly specify its DCB attributes in the JCL. If a developer forgets or uses inconsistent values, downstream programs that read the GDG may fail or produce incorrect results.
GDG Recoverability
When a batch job that creates a GDG generation fails, the partially written generation may or may not survive. DISP=(NEW,CATLG,DELETE) tells the system to delete the dataset if the step abends. But if the step instead completes with a high return code, the bad generation is cataloged anyway. Maria's standard practice is explicit cleanup:
//* Step to clean up failed GDG generation
// IF (ARCHIVE.RC > 4) THEN
//CLEANUP EXEC PGM=IEFBR14
//BADGEN DD DSN=GBANK.TXN.ARCHIVE(+1),
// DISP=(OLD,DELETE,DELETE)
// ENDIF
📊 GDG Housekeeping Checklist
- Verify the GDG limit is set appropriately (too low = premature rolloff, too high = wasted DASD)
- Monitor DASD consumption — 365 generations of a large file consume significant space
- Periodically verify that the GDG base and all active generations are consistent in the catalog
- Use IDCAMS LISTCAT to audit GDG contents during monthly maintenance windows
- Document which GDGs have downstream dependencies (regulatory archives, audit trails)
GDG and Tape Management
For long-term archival, older GDG generations are often migrated to tape. Tape management systems like CA-1 (TMS) or DFSMSrmm coordinate with GDG rolloff:
GDG Generation Policy Example:
Generations 0 to -30: DASD (immediate access)
Generations -31 to -365: Migrated to tape (recall delay ~30 seconds)
Generations older than -365: Expired and scratched
This tiered storage approach balances cost against access speed. Maria's team can access yesterday's archive instantly but must wait 30 seconds for last month's archive to be recalled from tape.
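The policy can be expressed as a simple mapping from relative generation number to storage tier (thresholds taken from the policy above; the function name is invented):

```python
def storage_tier(relative_gen):
    """relative_gen: 0 for the current generation, -1 for previous, etc."""
    if relative_gen > 0:
        raise ValueError('relative generation must be <= 0')
    if relative_gen >= -30:
        return 'DASD'       # immediate access
    if relative_gen >= -365:
        return 'TAPE'       # ~30 second recall
    return 'EXPIRED'        # rolled off and scratched

assert storage_tier(0) == 'DASD'
assert storage_tier(-30) == 'DASD'
assert storage_tier(-31) == 'TAPE'
assert storage_tier(-365) == 'TAPE'
assert storage_tier(-366) == 'EXPIRED'
```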
⚠️ Tape Recall Trap: If a production job references a GDG generation that has been migrated to tape, the job will wait for the tape mount. At 2 AM, there may be no operator available to mount the tape. Always verify that your job's GDG references point to DASD-resident generations, or ensure automated tape library support is available.
38.17 Production Scheduling: TWS and CA-7
Enterprise batch scheduling is far more sophisticated than simply running jobs in sequence. Scheduling tools manage dependencies, calendars, alerts, and automatic recovery.
Tivoli Workload Scheduler (TWS / IBM Workload Automation)
TWS (now IBM Workload Automation) is the scheduling system used at GlobalBank and many other mainframe shops. It manages:
- Job dependencies: Job B waits for Job A to complete with RC ≤ 4
- Resource dependencies: Only one job at a time can access the account master
- Calendar dependencies: Month-end jobs run only on the last business day
- Time dependencies: Statement generation cannot start before 4:00 AM
- Special resources: Printer availability, external feed arrival
Defining a Job Stream in TWS
TWS uses an application description to define job streams:
APPLICATION: GBNIGHTLY
OWNER: BATCHOPS
CALENDAR: GBANK-WEEKDAY
RUN DAYS: MON TUE WED THU FRI
EXCLUDED: GBANK-HOLIDAYS
OPERATION: GBEXTRACT
JOB: GBANK.PROD.JCL(GBEXTRACT)
PREDECESSORS: NONE (triggered by time 23:00)
RECOVERY: RERUN
OPERATION: GBSORT01
JOB: GBANK.PROD.JCL(GBSORT01)
PREDECESSORS: GBEXTRACT (RC ≤ 4)
RECOVERY: RERUN
OPERATION: GBVALID
JOB: GBANK.PROD.JCL(GBVALID)
PREDECESSORS: GBSORT01 (RC ≤ 4)
RECOVERY: RERUN
OPERATION: GBPOST
JOB: GBANK.PROD.JCL(GBPOST)
PREDECESSORS: GBVALID (RC ≤ 4)
SPECIAL RESOURCE: ACCT-MASTER (EXCLUSIVE)
RECOVERY: STOP (manual intervention required)
ALERT: PAGE ONCALL-DBA IF RC > 4
💡 Key Insight: Notice that GBPOST specifies RECOVERY: STOP instead of RERUN. This is because GBPOST modifies the account master file — rerunning it blindly could double-post transactions. The on-call DBA must verify the checkpoint status before restarting. This is a deliberate design decision that reflects the risk profile of the step.
CA-7 (Broadcom Workload Automation)
CA-7 is the other major mainframe scheduling system. It uses a different vocabulary but provides similar capabilities. Where TWS uses "applications" and "operations," CA-7 uses "job networks" and "schedule IDs":
SCHID: 001 JOB: GBEXTRACT TRIGTYPE: TIME(2300)
SCHID: 002 JOB: GBSORT01 TRIGTYPE: JOB(GBEXTRACT) MAXRC: 4
SCHID: 003 JOB: GBVALID TRIGTYPE: JOB(GBSORT01) MAXRC: 4
SCHID: 004 JOB: GBPOST TRIGTYPE: JOB(GBVALID) MAXRC: 4
Calendar Management
Production batch schedules must account for:
- Business days vs. calendar days: Month-end processing runs on the last business day, not December 31
- Holidays: No processing on bank holidays (but some feeds still arrive)
- Month-end / quarter-end / year-end: Additional jobs for period-close processing
- Daylight saving time: The batch window shifts by one hour twice a year
CALENDAR: GBANK-WEEKDAY
EXCLUDE: SATURDAY, SUNDAY
EXCLUDE: GBANK-HOLIDAYS (separate holiday calendar)
SPECIAL: LAST-BUSINESS-DAY (triggers month-end jobs)
SPECIAL: QUARTER-END (triggers regulatory reporting)
Maria Chen maintains GlobalBank's holiday calendar a year in advance. "Missing a holiday in the calendar means the batch tries to run when the Federal Reserve wire system is closed," she explains. "That is a very bad day."
Monitoring and Alerting
Production scheduling systems provide real-time monitoring dashboards:
GBNIGHTLY Status — 2026-03-11 02:47:33
═══════════════════════════════════════════
GBEXTRACT ▓▓▓▓▓▓▓▓▓▓ COMPLETE RC=0 23:15
GBSORT01 ▓▓▓▓▓▓▓▓▓▓ COMPLETE RC=0 23:28
GBVALID ▓▓▓▓▓▓▓▓▓▓ COMPLETE RC=0 00:05
GBPOST ▓▓▓▓▓▓▓▓░░ RUNNING --- 00:22
GBINTCALC ░░░░░░░░░░ WAITING --- ---
GBFEECALC ░░░░░░░░░░ WAITING --- ---
GBGLREC ░░░░░░░░░░ WAITING --- ---
GBSTMT ░░░░░░░░░░ WAITING --- ---
GBARCHIVE ░░░░░░░░░░ WAITING --- ---
Estimated completion: 05:12
Batch window closes: 06:00
Margin: 48 minutes
The operations team watches this dashboard throughout the night. If the estimated completion time approaches the batch window close time, they can take proactive measures — canceling low-priority jobs, increasing system priority for critical jobs, or extending the window if possible.
✅ Best Practice: Build slack into the batch window. If your batch cycle typically completes in 5 hours, the 7-hour window gives you 2 hours of margin. That margin is consumed by retries, restart/recovery, and the inevitable nights when volumes spike or a disk gets slow.
38.18 Restart/Recovery: A Complete Worked Example
Let us walk through a restart scenario step by step to see exactly how all the pieces fit together. This is the scenario that keeps COBOL developers employed at 3 AM.
The Scenario
It is Tuesday night at GlobalBank. The GBPOST job (transaction posting) starts at 12:22 AM with 2,347,891 transactions to process. The checkpoint interval is 10,000 records. At 1:47 AM, a few thousand records past checkpoint 123 (written at the 1,230,000-record mark), a DB2 tablespace runs out of space. The job abends with an S04E system abend code.
Step 1: Diagnosis (1:52 AM)
The operations team pages Maria. She checks the job output:
GBPOST - TRANSACTION POSTING
STARTED: 2026-03-11 00:22:15
CHECKPOINT 123 WRITTEN AT 01:43:22
RECORDS READ: 1,230,000
RECORDS POSTED: 1,221,847
RECORDS REJECTED: 8,153
HASH TOTAL: 287,441,338,291
FINANCIAL TOTAL: $412,847,291.33
LAST KEY PROCESSED: 4472819003
*** ABEND S04E AT 01:47:33 ***
RECORDS READ SINCE CHECKPOINT: 4,217
THESE RECORDS MAY NEED REPROCESSING
Step 2: Fix the Root Cause (2:05 AM)
Maria contacts the DBA on call. The DB2 tablespace for the audit trail table is full. The DBA extends the tablespace:
ALTER TABLESPACE GBANK.AUDITTS
ADD VOLUME(VOL003)
BUFFERPOOL BP32K;
Step 3: Verify Checkpoint Integrity (2:15 AM)
Maria reads the checkpoint file to verify it is consistent:
//CHKREAD EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN DD *
PRINT INFILE(CHKPTFL) CHARACTER
/*
//CHKPTFL DD DSN=GBANK.GBPOST.CHECKPOINT,DISP=SHR
The checkpoint shows:
- Program ID: GBPOST
- Status: P (in progress — confirming the job did not complete)
- Records read: 1,230,000
- Last key: 4472819003
- All control totals present
Step 4: Restart the Job (2:22 AM)
Maria restarts GBPOST. The scheduler submits the same JCL. The COBOL program:
- Opens the checkpoint file
- Reads the checkpoint record — finds status "P" (in progress)
- Sets restart flag to "Y"
- Restores all counters from the checkpoint
- Reads and skips the first 1,230,000 input records (about 90 seconds of sequential reading)
- Resumes processing at record 1,230,001
GBPOST - TRANSACTION POSTING (RESTART)
STARTED: 2026-03-11 02:22:45
RESTART DETECTED - CHECKPOINT 123
RESTORING COUNTERS FROM CHECKPOINT
RECORDS TO SKIP: 1,230,000
SKIPPING... COMPLETE (92 seconds)
RESUMING FROM RECORD 1,230,001
LAST KEY FROM CHECKPOINT: 4472819003
PROCESSING RESUMED AT 02:24:17
Step 5: Completion and Verification (3:48 AM)
The restarted job processes the remaining 1,117,891 transactions and completes:
GBPOST - COMPLETED (RESTART RUN)
TOTAL RECORDS READ: 2,347,891
TOTAL RECORDS POSTED: 2,331,204
TOTAL RECORDS REJECTED: 16,687
HASH TOTAL: 483,291,847,223
FINANCIAL TOTAL: $ 847,291,433.27
RETURN CODE: 0
CHECKPOINT FINAL STATUS: C (COMPLETED)
RESTART: YES (FROM CHECKPOINT 123)
ELAPSED TIME: 01:25:33 (restart portion)
Step 6: Continue the Batch Window (3:48 AM)
The scheduler sees GBPOST completed with RC=0 and triggers the downstream jobs: GBINTCALC, GBFEECALC, and GBGLREC. The batch window completes at 5:38 AM — 22 minutes behind the normal schedule but safely within the 6:00 AM deadline.
Lessons from This Restart
- The checkpoint interval of 10,000 records meant a maximum of 9,999 records needed reprocessing — in this case, the 4,217 records read between checkpoint 123 and the abend. But because those records had already been applied to the master file (VSAM REWRITE), the restart logic must handle idempotency: the COBOL program checks whether a transaction has already been posted (by comparing the transaction timestamp against the account's last-update timestamp) before applying it.
- The skip phase (90 seconds) was the longest part of the restart initialization. For very large files, consider using the checkpoint's record count to position the file with START (for indexed files) or a relative record number (for relative files). Sequential files require reading and discarding records, which is slower.
- The DB2 space issue was the root cause, not the COBOL program. Production failures are more often infrastructure issues (disk, memory, network) than logic bugs. Good restart/recovery handles both.
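The timestamp-based idempotency test from the first lesson can be sketched in a few lines (the timestamp format is illustrative; fixed-width timestamps compare correctly as strings):

```python
# Skip transactions already applied before the abend: a transaction is
# reapplied only if its timestamp is newer than the account's last update.

def should_apply(txn_timestamp, acct_last_update):
    """True if this transaction has not already been posted."""
    return txn_timestamp > acct_last_update

# Records read after checkpoint 123 may have been REWRITEd before the
# abend; those must not be double-posted on restart.
assert should_apply('2026-03-11-01.45.02', '2026-03-11-01.44.59') is True
assert should_apply('2026-03-11-01.45.02', '2026-03-11-01.45.02') is False
```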
🧪 Try It Yourself: Using the STUDCHKR program from section 38.9, deliberately interrupt the program (Ctrl+C or kill the process) during processing. Then restart it and verify that it resumes correctly from the last checkpoint. Compare the final control totals with an uninterrupted run to confirm they match.
38.19 Multi-Step Job Dependencies: Advanced Patterns
Real production batch environments have dependency patterns far more complex than simple linear chains. Understanding these patterns is essential for designing restartable, efficient batch architectures.
The Diamond Dependency
A diamond dependency occurs when two parallel streams must both complete before a downstream job can start:
GBEXTRACT
│
┌───┴───┐
▼ ▼
GBSORT GBXREF
│ │
└───┬───┘
▼
GBMERGE
GBMERGE requires both GBSORT and GBXREF to complete successfully. If GBSORT finishes but GBXREF fails, GBMERGE must wait. If both fail, each can be fixed and restarted independently, but GBMERGE will not start until both have completed cleanly.
In TWS, this is expressed as:
OPERATION: GBMERGE
PREDECESSORS: GBSORT (RC ≤ 4) AND GBXREF (RC ≤ 4)
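The readiness rule behind this predicate is simply "every predecessor complete with an acceptable return code". A minimal sketch (data shapes invented):

```python
def ready(job, predecessors, results, max_rc=4):
    """A job is ready when every predecessor has completed with
    RC <= max_rc. results maps job name -> return code, or None
    if the job is still running; absent jobs have not started."""
    return all(results.get(p) is not None and results[p] <= max_rc
               for p in predecessors[job])

preds = {'GBMERGE': ['GBSORT', 'GBXREF']}
assert ready('GBMERGE', preds, {'GBSORT': 0, 'GBXREF': 4}) is True
assert ready('GBMERGE', preds, {'GBSORT': 0}) is False           # GBXREF running
assert ready('GBMERGE', preds, {'GBSORT': 0, 'GBXREF': 8}) is False  # failed
```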
The Conditional Branch
Some jobs should only run when specific conditions are met:
GBPOST
│
┌─────┼─────┐
▼ ▼ ▼
GBFEES GBINT GBALERT
(always) (always) (only if RC=4 from GBPOST)
GBALERT is a notification job that runs only when GBPOST reports warnings (RC=4, indicating rejected transactions above a threshold). In normal operation (RC=0), it is skipped.
The Resource Fence
Some jobs cannot run simultaneously because they compete for a shared resource:
GBPOST: requires ACCT-MASTER (exclusive)
GBINTCALC: requires ACCT-MASTER (exclusive)
GBFEECALC: requires ACCT-MASTER (exclusive)
These three jobs CANNOT run in parallel even though
their logical dependencies would allow it.
TWS manages this with special resources:
OPERATION: GBPOST
SPECIAL RESOURCE: ACCT-MASTER (EXCLUSIVE)
OPERATION: GBINTCALC
SPECIAL RESOURCE: ACCT-MASTER (EXCLUSIVE)
PREDECESSORS: GBPOST (RC ≤ 4)
OPERATION: GBFEECALC
SPECIAL RESOURCE: ACCT-MASTER (EXCLUSIVE)
PREDECESSORS: GBINTCALC (RC ≤ 4)
The External Trigger
Some batch jobs depend on external events — a file arriving from another organization, a time window opening, or a manual approval:
OPERATION: GBCARDFEED
TRIGGER: EXTERNAL FILE ARRIVAL
DATASET: VISA.DAILY.FEED.D*
TIMEOUT: 01:30 (if file not received by 1:30 AM, alert)
At GlobalBank, the Visa card transaction feed arrives between 11:30 PM and 12:30 AM. If it has not arrived by 1:30 AM, the scheduler alerts the operations team, who contacts Visa's operations center. Maria Chen has been through this scenario dozens of times: "The feed is late about once a month. Usually it is a network issue on their end. We have a 90-minute buffer in the schedule for exactly this reason."
⚖️ Theme — Defensive Programming: Every dependency pattern represents a potential failure mode. Diamond dependencies can deadlock if one branch fails. Conditional branches can skip critical jobs if the condition is wrong. Resource fences can create bottlenecks. External triggers can block the entire batch window. Designing for these failure modes — not just the happy path — is what separates a reliable batch architecture from a fragile one.
38.20 Chapter Summary
Batch processing is the backbone of enterprise computing, and the patterns in this chapter are what make it reliable. Checkpoint/restart ensures recovery from failures. Control totals catch errors that logic cannot detect. Audit trails satisfy regulators and protect the organization. The balanced update pattern processes sorted files efficiently. GDGs manage versioned datasets automatically. And JCL ties it all together into orchestrated job streams.
These patterns are not glamorous. They do not make for exciting conference talks or viral blog posts. But they are the reason that 2.3 million transactions get posted correctly every single night at GlobalBank, that 500,000 claims get adjudicated every month at MedClaim, and that the financial infrastructure of the world keeps running while the rest of us sleep.
Derek Washington summed it up well after his first night observing the batch window: "It's like watching a symphony where every instrument plays exactly on cue, and nobody in the audience even knows it's happening."
Maria's response: "That's the point."
🔗 Looking Ahead: In Chapter 39, we will explore how COBOL systems integrate with real-time systems — message queues, web services, and APIs. The batch world and the online world are converging, and modern COBOL developers need to be comfortable in both.