Chapter 28: Batch Processing Patterns and Design

Introduction: The Backbone of Mainframe Workload

Despite the rise of real-time processing, web services, and event-driven architectures, batch processing remains the workhorse of enterprise computing. On IBM mainframes, batch workloads account for the majority of CPU cycles consumed each day. Banks process millions of transactions, insurance companies calculate premiums for entire portfolios, and retailers update inventory across thousands of stores---all through carefully orchestrated batch programs.

Batch processing is the execution of a series of programs on a computer without manual intervention. Data is collected, entered, processed, and output in discrete groups called "batches." In a mainframe COBOL environment, batch programs typically read sequential files, process records according to business rules, and produce output files and reports.

This chapter explores the fundamental patterns that COBOL programmers have refined over decades. These patterns are not mere historical curiosities---they represent battle-tested solutions to problems that every enterprise faces: How do you process millions of records efficiently? How do you recover when something goes wrong halfway through? How do you coordinate dozens of interdependent programs? Understanding these patterns is essential for any programmer working in a mainframe environment.


28.1 Batch vs. Online Processing

Before diving into patterns, it is important to understand where batch processing fits in the overall application landscape.

Characteristics of Batch Processing

Batch processing has several defining characteristics:

  • No user interaction: Programs run from start to completion without human input.
  • High volume: Batch programs typically process thousands to millions of records.
  • Sequential access: Most batch programs read files sequentially from beginning to end.
  • Scheduled execution: Jobs run at predetermined times, often during off-peak hours.
  • Resource intensive: Batch jobs consume significant CPU, I/O, and memory resources.
  • All-or-nothing semantics: A batch job either completes successfully or must be restarted.

Characteristics of Online Processing

Online (or interactive) processing, by contrast, involves:

  • User-driven: Each transaction is initiated by a user at a terminal or through a service call.
  • Low latency: Responses must be returned within seconds.
  • Random access: Programs typically access specific records by key using VSAM or DB2.
  • Continuous availability: Online systems must be available during business hours.
  • Single-record scope: Each transaction processes one record or a small set of records.

When to Use Batch Processing

Batch processing is the right choice when:

  1. Large volumes must be processed: Applying interest to every account in a bank, generating statements for all customers, or recalculating insurance premiums across an entire portfolio.
  2. Processing is periodic: End-of-day, end-of-month, or end-of-year operations that occur on a fixed schedule.
  3. Data must be coordinated across systems: Extract-Transform-Load operations that move data between databases, applications, or platforms.
  4. Audit trails are required: Batch programs naturally produce reports, control totals, and audit records.
  5. Resource-intensive calculations are needed: Complex algorithms that would be too slow for interactive response times.
  6. Sequential processing is natural: When every record in a file must be examined or updated, sequential batch processing is far more efficient than random access.

The Batch Window

The "batch window" is the period---typically overnight---when batch processing runs. As online systems have extended their hours of availability, the batch window has shrunk. A bank that once had from 6:00 PM to 6:00 AM for batch processing might now have only from midnight to 5:00 AM. This compression of the batch window has driven innovation in batch design, including the optimization techniques discussed later in this chapter.


28.2 The Fundamental Batch Pattern: Read-Process-Write

Every batch program follows the same basic structure:

PERFORM INITIALIZATION
PERFORM READ-FIRST-RECORD
PERFORM PROCESS-RECORDS UNTIL END-OF-FILE
PERFORM FINALIZATION
STOP RUN

This "Read-Process-Write" pattern has three phases:

Initialization Phase

During initialization, the program:

  • Opens all files
  • Reads control cards or parameter files
  • Initializes accumulators, counters, and work areas
  • Writes report headers
  • Records the start time for statistics

Processing Phase

The processing loop reads each input record, applies business logic, and writes output. The structure is almost always:

PERFORM UNTIL WS-EOF = 'Y'
    PERFORM PROCESS-CURRENT-RECORD
    PERFORM READ-NEXT-RECORD
END-PERFORM

Note the "read-ahead" technique: the first record is read before entering the loop, and each iteration reads the next record at the end. This ensures that the end-of-file condition is detected cleanly.

Finalization Phase

During finalization, the program:

  • Writes report footers and grand totals
  • Closes all files
  • Writes processing statistics (records read, written, rejected)
  • Sets the return code
  • Displays completion messages

This three-phase structure is so universal that experienced COBOL programmers can navigate any batch program by looking for these landmarks.


28.3 Control Break Processing

Control break processing is one of the most important batch patterns. It produces reports with subtotals at each level of a hierarchy. For example, a sales report might show totals by salesperson within district within region.

How Control Breaks Work

The input file must be sorted by the control fields in major-to-minor order. As the program reads each record, it compares the control fields to the values from the previous record. When a field changes, a "break" occurs at that level and all lower levels.

Consider a file sorted by Region, District, and Salesperson:

Region  District  Salesperson  Amount
East    Boston    Adams        1000
East    Boston    Adams         500
East    Boston    Baker         750
East    New York  Clark        2000
West    Denver    Davis        1500

When the program reads the record for Baker, the Salesperson field has changed---a minor break. When it reads Clark, the District field has changed---an intermediate break (which also triggers a minor break). When it reads Davis, the Region has changed---a major break (triggering intermediate and minor breaks as well).

Multi-Level Control Break Logic

The key insight is that breaks cascade: a major break always triggers intermediate and minor breaks. The standard logic is:

PERFORM UNTIL WS-EOF = 'Y'
    EVALUATE TRUE
        WHEN INPUT-REGION NOT = WS-PREV-REGION
            PERFORM MINOR-BREAK
            PERFORM INTERMEDIATE-BREAK
            PERFORM MAJOR-BREAK
        WHEN INPUT-DISTRICT NOT = WS-PREV-DISTRICT
            PERFORM MINOR-BREAK
            PERFORM INTERMEDIATE-BREAK
        WHEN INPUT-SALESPERSON NOT = WS-PREV-SALESPERSON
            PERFORM MINOR-BREAK
    END-EVALUATE
    PERFORM PROCESS-DETAIL-RECORD
    PERFORM READ-NEXT-RECORD
END-PERFORM

Each break paragraph prints the subtotal for its level, rolls its accumulator up into the next higher level, resets its own accumulator to zero, and saves the new control field value in the corresponding WS-PREV field for future comparisons.

Accumulators

Accumulators are the numeric fields that hold running totals. In a three-level control break:

  • WS-MINOR-TOTAL: Accumulates detail amounts for the current minor group.
  • WS-INTERMEDIATE-TOTAL: Accumulates minor totals for the current intermediate group.
  • WS-MAJOR-TOTAL: Accumulates intermediate totals for the current major group.
  • WS-GRAND-TOTAL: Accumulates major totals across the entire file.

When a minor break occurs:

  1. Print the minor total.
  2. Add the minor total to the intermediate total.
  3. Reset the minor total to zero.
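
A sketch of a minor break paragraph along these lines; the report record and field names are illustrative, and Example 1 shows the complete version:

MINOR-BREAK.
    MOVE WS-MINOR-TOTAL TO RPT-SALESPERSON-TOTAL
    WRITE REPORT-RECORD FROM SALESPERSON-TOTAL-LINE
    ADD WS-MINOR-TOTAL TO WS-INTERMEDIATE-TOTAL
    MOVE ZERO TO WS-MINOR-TOTAL
    MOVE INPUT-SALESPERSON TO WS-PREV-SALESPERSON.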

See Example 1 (example-01-control-break.cob) for a complete multi-level control break program with three levels of subtotals.


28.4 Sequential Master File Update

The sequential master file update is perhaps the most classic of all batch patterns. It takes an existing master file and a file of transactions, and produces a new master file that reflects all the changes.

The Balanced Line Algorithm

The "balanced line algorithm" (also called the "matching record" technique) processes two sorted files in a single pass. Both files must be sorted by the same key. The algorithm compares the keys and takes one of three actions:

  1. Master key < Transaction key: The master record has no matching transaction. Write it to the new master unchanged and read the next master record.
  2. Master key = Transaction key: The master record has a matching transaction. Apply the transaction (change or delete) and read the next records from both files.
  3. Master key > Transaction key: The transaction has no matching master. If it is an "add" transaction, write a new master record. Otherwise, report an error.

The algorithm uses "high-values" sentinels: when either file reaches end-of-file, its key is set to HIGH-VALUES (binary all-ones), which is greater than any real key. Processing continues until both keys are HIGH-VALUES.

PERFORM UNTIL WS-MASTER-KEY = HIGH-VALUES
               AND WS-TRANS-KEY = HIGH-VALUES
    EVALUATE TRUE
        WHEN WS-MASTER-KEY < WS-TRANS-KEY
            PERFORM WRITE-MASTER-UNCHANGED
            PERFORM READ-MASTER
        WHEN WS-MASTER-KEY = WS-TRANS-KEY
            PERFORM APPLY-TRANSACTION
            PERFORM READ-MASTER
            PERFORM READ-TRANSACTION
        WHEN WS-MASTER-KEY > WS-TRANS-KEY
            PERFORM PROCESS-UNMATCHED-TRANS
            PERFORM READ-TRANSACTION
    END-EVALUATE
END-PERFORM
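
The READ paragraphs supply the sentinels. A sketch of the master-side read, assuming illustrative names and a WS-MASTER-KEY defined as alphanumeric so that HIGH-VALUES can be moved to it and compared; the transaction-side read is analogous:

READ-MASTER.
    READ MASTER-FILE
        AT END
            MOVE HIGH-VALUES TO WS-MASTER-KEY
        NOT AT END
            MOVE MASTER-ACCOUNT-NO TO WS-MASTER-KEY
    END-READ.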

Transaction Types

Sequential update programs typically handle three transaction types:

  • Add (A): Creates a new master record. Valid only when no matching master exists.
  • Change (C): Modifies fields in an existing master record. Valid only when a matching master exists.
  • Delete (D): Removes a master record. Valid only when a matching master exists.

Multiple transactions for the same key must be handled carefully. A common approach is to collect all transactions for a given key before processing them.
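
Within the balanced line loop, the matched and unmatched branches enforce the validity rules above. A sketch with illustrative names, where WS-TRANS-TYPE holds the A/C/D code:

* Matched master and transaction (equal keys)
APPLY-TRANSACTION.
    EVALUATE WS-TRANS-TYPE
        WHEN 'A'
            PERFORM WRITE-REJECT
        WHEN 'C'
            PERFORM APPLY-FIELD-CHANGES
            WRITE NEW-MASTER-RECORD FROM MASTER-RECORD
        WHEN 'D'
            ADD 1 TO WS-DELETE-COUNT
        WHEN OTHER
            PERFORM WRITE-REJECT
    END-EVALUATE.

* Transaction with no matching master
PROCESS-UNMATCHED-TRANS.
    IF WS-TRANS-TYPE = 'A'
        PERFORM BUILD-NEW-MASTER
        WRITE NEW-MASTER-RECORD
    ELSE
        PERFORM WRITE-REJECT
    END-IF.

An add against an existing master and a change or delete against a missing master are rejected; a delete is implemented simply by not writing the old record to the new master.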

Error Handling in Master Updates

Robust master update programs detect and report:

  • Add transactions where a master already exists
  • Change or delete transactions where no master exists
  • Invalid transaction codes
  • Data validation failures

Rejected transactions are typically written to an error file along with a reason code.

See Example 2 (example-02-master-update.cob) for a complete sequential master file update program.


28.5 File Matching and Merging

File matching extends the balanced line algorithm to coordinate reads across multiple sorted input files. Common scenarios include:

  • Merging transactions from multiple sources into a single file
  • Matching records across systems to find discrepancies
  • Combining data from several extracts into a consolidated view

The Merge Pattern

When merging multiple files, the program compares the current keys from all files and selects the lowest. That record is written to the output, and the next record is read from the source file. This continues until all files are exhausted.

For two files, the logic is:

PERFORM UNTIL WS-KEY-A = HIGH-VALUES
               AND WS-KEY-B = HIGH-VALUES
    EVALUATE TRUE
        WHEN WS-KEY-A < WS-KEY-B
            WRITE OUTPUT-RECORD FROM INPUT-A-RECORD
            PERFORM READ-FILE-A
        WHEN WS-KEY-A > WS-KEY-B
            WRITE OUTPUT-RECORD FROM INPUT-B-RECORD
            PERFORM READ-FILE-B
        WHEN WS-KEY-A = WS-KEY-B
            PERFORM HANDLE-MATCH
            PERFORM READ-FILE-A
            PERFORM READ-FILE-B
    END-EVALUATE
END-PERFORM

The handling of matches depends on the business requirement. In a pure merge, both records are written. In a matching operation, only the matched pairs might be written. In a reconciliation, the unmatched records from each file are of primary interest.
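
For instance, in a reconciliation the matched records might only be counted, with any amount differences written to a discrepancy file; a sketch with illustrative names:

HANDLE-MATCH.
* Count the match and report any amount difference
    ADD 1 TO WS-MATCHED-COUNT
    IF REC-A-AMOUNT NOT = REC-B-AMOUNT
        MOVE REC-A-KEY    TO DISC-KEY
        MOVE REC-A-AMOUNT TO DISC-AMOUNT-A
        MOVE REC-B-AMOUNT TO DISC-AMOUNT-B
        WRITE DISCREPANCY-RECORD
    END-IF.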

Using COBOL's MERGE Statement

COBOL provides a built-in MERGE statement that can merge pre-sorted files:

MERGE SORT-FILE
    ON ASCENDING KEY SORT-KEY
    USING INPUT-FILE-A INPUT-FILE-B
    GIVING OUTPUT-FILE.

This is simpler but offers less control than a programmatic merge. When you need to apply business logic during the merge---validating, transforming, or filtering records---a programmatic approach is necessary.

See Example 3 (example-03-matching-merge.cob) for a complete matching and merging program.


28.6 Extract-Transform-Load (ETL)

ETL is a pattern for moving data between systems. It has three distinct phases:

Extract

Read data from the source system. In a mainframe context, this might mean reading a VSAM file, a DB2 table (via embedded SQL), or a sequential extract file produced by another program.

Transform

Apply business rules to convert the data from the source format to the target format:

  • Data type conversion: Convert packed decimal to display numeric, or EBCDIC to ASCII.
  • Code translation: Map source codes to target codes using reference tables.
  • Validation: Check that all required fields are present and valid.
  • Derivation: Calculate new fields from existing ones.
  • Aggregation: Summarize detail records into totals.
  • Filtering: Exclude records that do not meet selection criteria.

Load

Write the transformed data to the target system. This might be a sequential file for transmission to another platform, a VSAM file for an online application, or an insert into a DB2 table.

Validation and Cleansing

A critical part of the ETL pattern is data validation. Records that fail validation are typically split into two streams:

  • Clean file: Records that pass all validation checks, ready for loading.
  • Reject file: Records that fail one or more checks, along with error descriptions.

A well-designed ETL program validates:

  • Required fields are not empty or zero
  • Numeric fields contain valid numbers
  • Dates are valid calendar dates
  • Code fields contain recognized values
  • Cross-field edits (e.g., if status = "Active", then balance must be >= 0)
  • Referential integrity (e.g., department code exists in department table)
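
A sketch of how a couple of these checks and the clean/reject split might look; the record layouts and field names are illustrative, and Example 5 contains the full validation logic:

VALIDATE-INPUT-RECORD.
    MOVE 'Y' TO WS-RECORD-VALID
* Required-field and numeric checks (two of many)
    IF IN-ACCOUNT-NO = SPACES
        MOVE 'N' TO WS-RECORD-VALID
        MOVE 'MISSING ACCOUNT NUMBER' TO WS-REJECT-REASON
    END-IF
    IF IN-AMOUNT IS NOT NUMERIC
        MOVE 'N' TO WS-RECORD-VALID
        MOVE 'AMOUNT NOT NUMERIC' TO WS-REJECT-REASON
    END-IF
* Route the record to the clean or reject stream
    IF WS-RECORD-VALID = 'Y'
        WRITE CLEAN-RECORD FROM INPUT-RECORD
        ADD 1 TO WS-CLEAN-COUNT
    ELSE
        MOVE INPUT-RECORD     TO REJ-ORIGINAL-DATA
        MOVE WS-REJECT-REASON TO REJ-REASON
        WRITE REJECT-RECORD
        ADD 1 TO WS-REJECT-COUNT
    END-IF.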

See Example 5 (example-05-extract-transform.cob) for a complete ETL program with validation.


28.7 Advanced Batch Patterns

Checkpoint/Restart

When a batch program processes millions of records and fails near the end, restarting from the beginning wastes hours of processing time. The checkpoint/restart pattern solves this by periodically saving the program's state so that it can resume from the last checkpoint after a failure.

Writing Checkpoints

At regular intervals (every N records, or every N minutes), the program writes a checkpoint record containing:

  • The current position in the input file (record number or key)
  • All accumulators and counters
  • A timestamp

Restarting from a Checkpoint

When restarting, the program:

  1. Reads the checkpoint file
  2. Restores all accumulators and counters
  3. Repositions to the correct point in the input file (by reading and discarding records until the checkpoint position is reached)
  4. Resumes normal processing

COBOL RERUN Clause

COBOL provides the RERUN clause in the I-O-CONTROL paragraph for automatic checkpointing:

I-O-CONTROL.
    RERUN ON CHECKPOINT-FILE
        EVERY 10000 RECORDS OF INPUT-FILE.

However, most mainframe shops prefer programmatic checkpointing for greater control.
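
A sketch of programmatic checkpointing, assuming illustrative names, a checkpoint file opened for output or extend, and restart logic that reads back the last checkpoint record to restore the counters:

* In the main processing loop, after each record
    ADD 1 TO WS-RECS-SINCE-CHECKPOINT
    IF WS-RECS-SINCE-CHECKPOINT >= WS-CHECKPOINT-INTERVAL
        PERFORM TAKE-CHECKPOINT
        MOVE ZERO TO WS-RECS-SINCE-CHECKPOINT
    END-IF

* Save position, totals, and a timestamp
TAKE-CHECKPOINT.
    MOVE WS-RECORDS-READ       TO CKPT-RECORD-POSITION
    MOVE WS-CONTROL-TOTAL      TO CKPT-CONTROL-TOTAL
    MOVE FUNCTION CURRENT-DATE TO CKPT-TIMESTAMP
    WRITE CHECKPOINT-RECORD
    ADD 1 TO WS-CHECKPOINTS-TAKEN.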

See Example 4 (example-04-checkpoint-restart.cob) for a complete checkpoint/restart implementation.

Batch Window Optimization

As batch windows shrink, programmers must optimize their jobs:

  • Parallel processing: Identify independent jobs that can run simultaneously. If the accounts receivable update and the accounts payable update use different files, they can run in parallel.
  • Reduced I/O: Every file open, close, read, and write consumes time. Combine processing steps that read the same file into a single program rather than making multiple passes.
  • Efficient sorting: Use DFSORT or SYNCSORT for sorting rather than writing sort logic in COBOL. The sort utilities are highly optimized for the mainframe architecture.
  • Multi-step vs. multi-program: Sometimes a single program with multiple processing steps is more efficient than multiple programs, because it eliminates intermediate files.

Error Handling Framework

Production batch programs need robust error handling:

  • Error file: Write rejected or erroneous records to a separate file with error codes and descriptions.
  • Error counts: Track the number of errors by type.
  • Threshold-based abort: If errors exceed a configurable threshold (e.g., 1% of records or 1000 total errors), abort the program rather than continuing to produce unreliable output.
  • Return codes: Set appropriate return codes so that downstream job steps and scheduling systems can respond correctly.
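
A sketch of the threshold check, using illustrative names; the WS-ERROR-THRESHOLD value would typically come from a parameter file:

* After each rejected record
    ADD 1 TO WS-ERROR-COUNT
    IF WS-ERROR-COUNT > WS-ERROR-THRESHOLD
        DISPLAY 'ERROR THRESHOLD EXCEEDED - TERMINATING'
        PERFORM ABORT-RUN
    END-IF

* Controlled shutdown with a severe-error return code
ABORT-RUN.
    PERFORM WRITE-STATISTICS
    PERFORM CLOSE-ALL-FILES
    MOVE 12 TO RETURN-CODE
    STOP RUN.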

Statistics and Auditing

Every batch program should report:

  • Record counts: Records read, processed, written, rejected, and skipped. The input count should equal the sum of output counts.
  • Hash totals: Sum of a key field across all records. The hash total of the input should match the hash total of the output (adjusted for adds and deletes). This detects lost or duplicated records.
  • Control totals: Sum of monetary amounts. Input control total should equal output control total, providing assurance that no money was lost or created.
  • Processing time: Elapsed time and CPU time consumed.
  • Checkpoint information: Number of checkpoints taken, records processed since last checkpoint.

Header/Trailer Record Processing

Many batch files include header and trailer records:

  • Header record: Contains the file creation date, source system identifier, expected record count, and a file identifier. The program validates the header before processing.
  • Trailer record: Contains the actual record count, hash totals, and control totals. The program compares these to its own counts to verify that the entire file was received and processed correctly.
A typical record-type dispatch in the main loop:

EVALUATE INPUT-RECORD-TYPE
    WHEN 'H'
        PERFORM PROCESS-HEADER
    WHEN 'D'
        PERFORM PROCESS-DETAIL
    WHEN 'T'
        PERFORM PROCESS-TRAILER
    WHEN OTHER
        PERFORM HANDLE-INVALID-RECORD-TYPE
END-EVALUATE
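
A sketch of trailer validation, assuming the trailer carries a record count and an amount control total and that the program has accumulated its own figures (names illustrative):

PROCESS-TRAILER.
* Compare the counts and totals reported in the trailer
* with those accumulated while processing the details
    IF TRL-RECORD-COUNT NOT = WS-DETAIL-COUNT
        DISPLAY 'RECORD COUNT MISMATCH: TRAILER= '
                TRL-RECORD-COUNT ' ACTUAL= ' WS-DETAIL-COUNT
        MOVE 8 TO RETURN-CODE
    END-IF
    IF TRL-CONTROL-TOTAL NOT = WS-AMOUNT-TOTAL
        DISPLAY 'CONTROL TOTAL MISMATCH'
        MOVE 8 TO RETURN-CODE
    END-IF.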

28.8 Batch Job Design

Job Flow Design

A batch cycle is not a single program---it is a series of programs (jobs) that must execute in a specific order. Designing the job flow involves:

  1. Identifying the programs: List every program needed for the batch cycle.
  2. Defining dependencies: Program B cannot start until Program A completes successfully. Draw a dependency graph.
  3. Finding parallelism: Programs that have no dependencies on each other can run simultaneously.
  4. Identifying the critical path: The longest chain of sequential dependencies determines the minimum elapsed time.
  5. Balancing workload: Distribute programs across available processors to minimize total elapsed time.

A typical end-of-day batch cycle for a bank might look like:

Step 1: Sort transactions by account (SORT)
Step 2: Post transactions to accounts (COBOL program)
Step 3a: Calculate interest (COBOL) [parallel with 3b]
Step 3b: Generate regulatory extract (COBOL) [parallel with 3a]
Step 4: Generate statements (COBOL) [depends on 2 and 3a]
Step 5: Produce management reports (COBOL) [depends on 2]

Return Codes

Mainframe batch programs communicate success or failure through return codes (also called condition codes). The standard conventions are:

Return Code   Meaning        Action
0             Success        Continue with next step
4             Warning        Continue, but investigate
8             Error          Skip dependent steps
12            Severe error   Abort the job
16            Critical       Abort immediately, notify operations

In COBOL, the return code is set using:

MOVE 0 TO RETURN-CODE.

or, more commonly at the end of a program:

EVALUATE TRUE
    WHEN WS-ERROR-COUNT = 0
        MOVE 0 TO RETURN-CODE
    WHEN WS-ERROR-COUNT < WS-ERROR-THRESHOLD
        MOVE 4 TO RETURN-CODE
    WHEN OTHER
        MOVE 8 TO RETURN-CODE
END-EVALUATE.

Step Condition Code Checking

In JCL, the COND parameter controls whether a step executes based on the return codes of previous steps:

//STEP02  EXEC PGM=PROGRAM2,COND=(8,LT)

This means: skip STEP02 if 8 is less than any previous return code (i.e., if any previous step returned more than 8). The modern alternative is the IF/THEN/ELSE/ENDIF construct:

//     IF (STEP01.RC <= 4) THEN
//STEP02  EXEC PGM=PROGRAM2
//     ENDIF

Graceful Degradation

Well-designed batch jobs degrade gracefully rather than failing catastrophically:

  • If a non-critical step fails, subsequent steps can still run if they do not depend on it.
  • Warning return codes allow jobs to complete while flagging issues for investigation.
  • Error recovery procedures specify exactly what to do when each step fails.
  • Empty input files are handled gracefully (produce empty output, return code 0 or 4) rather than causing abends.

28.9 Batch Scheduling Concepts

Job Schedulers

In production, batch jobs are managed by scheduling software rather than being submitted manually. The major mainframe job schedulers are:

  • TWS (Tivoli Workload Scheduler), formerly OPC (Operations Planning and Control): IBM's scheduling product. Defines job streams with dependencies, time windows, and resource requirements.
  • CA-7 (Broadcom/CA): A widely used job scheduler that manages job dependencies, scheduling, and monitoring.
  • Control-M (BMC): A cross-platform scheduling tool that can coordinate batch jobs across mainframes, distributed systems, and cloud environments.

Job Dependencies and Triggers

Schedulers manage several types of dependencies:

  • Job-to-job: Job B starts only after Job A completes successfully.
  • File availability: A job starts when a particular file or dataset becomes available.
  • Time-based: A job starts at a specific time, such as 11:00 PM.
  • Event-based: A job starts in response to an external event, such as receiving a file from a business partner.
  • Resource-based: A job starts only when required resources (tape drives, disk space, database connections) are available.

Calendar-Based Scheduling

Batch schedules follow business calendars:

  • Daily jobs: Run every business day (not weekends or holidays).
  • Weekly jobs: Run on a specific day of the week.
  • Monthly jobs: Run on the last business day of the month, or the first business day.
  • Quarterly and annual jobs: Run at end of quarter or end of year.
  • Holiday adjustments: Jobs that normally run on a holiday are rescheduled to the previous or next business day.

Batch Window Management

Managing the batch window involves:

  • Monitoring elapsed time: Track how long each job takes. If a job runs longer than expected, alert the operations team.
  • Resource allocation: Ensure sufficient MIPS (processing power), DASD (disk), and channel bandwidth for the batch workload.
  • Conflict resolution: Prevent batch jobs from competing with online systems for the same resources.
  • Window compression: Continuously optimize to complete the batch cycle in less time, creating buffer for unexpected issues.

28.10 Performance Considerations

Block Size Optimization

Sequential files are read and written in blocks. Larger blocks reduce the number of I/O operations:

//MASTER   DD DSN=PROD.MASTER.FILE,DISP=SHR,
//            DCB=(RECFM=FB,LRECL=200,BLKSIZE=27800)

A block size of 27800 bytes for a 200-byte record means 139 records per block. Coding BLKSIZE=0 lets the operating system choose a system-determined optimal block size for the device type.

Buffering

The BUFNO parameter specifies the number of I/O buffers:

//MASTER   DD DSN=PROD.MASTER.FILE,DISP=SHR,
//            DCB=(BUFNO=20)

More buffers allow the system to read ahead, reducing wait time between I/O operations. For sequential processing, 5 to 20 buffers is typical. The trade-off is memory consumption.

Minimize File Opens and Closes

Each file open and close involves system overhead: catalog lookups, allocation, OPEN/CLOSE processing. If a batch job needs to read the same file in multiple steps, consider:

  • Processing all steps in a single program
  • Using the COBOL SORT with INPUT PROCEDURE and OUTPUT PROCEDURE to combine sort and processing

Efficient PERFORM Structure

The structure of PERFORM loops affects performance:

* Out-of-line PERFORM - control transfers to a separate
* paragraph on every iteration
PERFORM PROCESS-RECORD
    UNTIL WS-EOF = 'Y'.

* More efficient in some compilers - inline PERFORM keeps
* the loop body in line
PERFORM UNTIL WS-EOF = 'Y'
    ADD 1 TO WS-RECORD-COUNT
    PERFORM PROCESS-CURRENT-RECORD
    PERFORM READ-NEXT-RECORD
END-PERFORM

Inline PERFORMs avoid the overhead of transferring control to an out-of-line paragraph for each iteration.

Use Computational Data Types

For arithmetic operations, use COMP (binary) or COMP-3 (packed decimal) rather than DISPLAY (zoned decimal):

01  WS-COUNTER        PIC 9(8) COMP.
01  WS-AMOUNT         PIC S9(13)V99 COMP-3.

COMP-3 is particularly efficient for decimal arithmetic on the mainframe because the hardware has native packed decimal instructions. COMP is best for subscripts, counters, and integer arithmetic.

Additional Performance Tips

  • Avoid unnecessary MOVE operations: Each MOVE consumes CPU cycles. Do not move data to a work area if you can process it in place.
  • Use SEARCH ALL for table lookups: Binary search is O(log n) compared to O(n) for sequential search.
  • Minimize STRING and UNSTRING usage: These are relatively expensive operations. If the format is fixed, use reference modification or simple MOVEs instead.
  • Avoid unnecessary COMPUTE: For simple arithmetic (adding 1, multiplying by a constant), ADD, SUBTRACT, MULTIPLY, and DIVIDE are slightly more efficient than COMPUTE.
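
The SEARCH ALL lookup mentioned above requires a table defined with an ASCENDING KEY and an INDEXED BY phrase, and the table must actually be loaded in key order. A sketch with illustrative names:

01  WS-RATE-TABLE.
    05  WS-RATE-ENTRY OCCURS 500 TIMES
            ASCENDING KEY IS WS-TBL-PRODUCT-CODE
            INDEXED BY RATE-IDX.
        10  WS-TBL-PRODUCT-CODE  PIC X(4).
        10  WS-TBL-RATE          PIC S9(3)V9(4) COMP-3.

* In the PROCEDURE DIVISION
    SEARCH ALL WS-RATE-ENTRY
        AT END
            MOVE 'N' TO WS-RATE-FOUND
        WHEN WS-TBL-PRODUCT-CODE (RATE-IDX) = IN-PRODUCT-CODE
            MOVE 'Y' TO WS-RATE-FOUND
            MOVE WS-TBL-RATE (RATE-IDX) TO WS-APPLIED-RATE
    END-SEARCH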

28.11 Batch Testing Strategies

Test Data Creation

Creating realistic test data is critical for batch program testing:

  • Edge cases: Empty files, single-record files, files with only headers and trailers.
  • Boundary values: Maximum and minimum field values, dates at month/year boundaries.
  • Error conditions: Invalid data in every field, missing required fields, duplicate keys.
  • Volume testing: Test with production-scale volumes to verify performance and resource usage.
  • Sequence testing: For programs that require sorted input, include records that test the sort order.

A common approach is to extract a subset of production data (with sensitive fields masked) and augment it with hand-crafted test records for specific scenarios.

Parallel Testing

When replacing an existing batch program, parallel testing compares old and new results:

  1. Run both the old and new programs against the same input.
  2. Compare the outputs record by record.
  3. Investigate and explain every difference.
  4. Continue parallel testing for multiple cycles (daily, monthly) until confident.

The IDCAMS REPRO utility (to copy VSAM files into comparable sequential datasets), the IEBCOMPR compare utility, or custom comparison programs are typically used for this purpose.

Regression Testing

Maintain a library of test cases with known-good outputs:

  • After each program change, re-run all test cases.
  • Compare outputs to the known-good versions.
  • Any unexpected differences indicate a regression (a bug introduced by the change).
  • Automate this process so it runs as part of the build/deploy pipeline.

28.12 A Complete Batch Framework

Production batch programs share so many common elements that it makes sense to define a standard framework. Example 6 (example-06-batch-framework.cob) demonstrates a complete batch framework that includes:

  • Standardized initialization and termination
  • Parameter file processing
  • Error handling with configurable thresholds
  • Record counting and control totals
  • Processing statistics reporting
  • Return code management
  • Checkpoint/restart hooks

Using a consistent framework across all batch programs in a shop provides several benefits:

  • Consistency: All programs behave the same way, making them easier to understand and maintain.
  • Quality: Common error handling and statistics logic is written and tested once.
  • Productivity: Programmers can focus on business logic rather than infrastructure.
  • Operations: Operations staff know what to expect from every program: standard return codes, standard statistics reports, standard error handling.

28.13 Common Batch Processing Scenarios in Finance

End-of-Day Processing

The end-of-day (EOD) batch cycle is the most critical daily batch window. For a bank, it typically includes:

  1. Transaction posting: Apply the day's transactions to account balances.
  2. Balance reconciliation: Verify that debits equal credits across all accounts.
  3. Interest accrual: Calculate daily interest on all interest-bearing accounts.
  4. Fee assessment: Apply monthly or periodic fees where applicable.
  5. Overdraft processing: Identify overdrawn accounts and initiate notifications.
  6. GL posting: Summarize transactions for the General Ledger.
  7. Regulatory reporting: Generate required regulatory extracts (SAR, CTR, OFAC).
  8. Archive: Move processed transactions to archive files.

Interest Calculation and Posting

Interest calculation is a classic batch application:

For each account:
    Determine the interest rate (based on account type, balance tier, rate table)
    Calculate daily interest = balance * (annual rate / 365)
    Accumulate daily interest in an accrual field
    On posting day (monthly), add accrued interest to balance
    Generate interest posting transaction
    Reset accrual to zero

This requires reading every account, performing calculations, and writing updates---a perfect fit for sequential batch processing.
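
A sketch of the daily accrual step in COBOL, assuming illustrative field names and an annual rate stored as a percentage:

* Daily accrual for one account
    COMPUTE WS-DAILY-INTEREST ROUNDED =
        ACCT-BALANCE * (WS-ANNUAL-RATE / 100) / 365
    ADD WS-DAILY-INTEREST TO ACCT-ACCRUED-INTEREST

* On the monthly posting day, capitalize the accrual
    IF WS-POSTING-DAY = 'Y'
        ADD ACCT-ACCRUED-INTEREST TO ACCT-BALANCE
        PERFORM WRITE-INTEREST-TRANSACTION
        MOVE ZERO TO ACCT-ACCRUED-INTEREST
    END-IF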

Statement Generation

Monthly statement generation processes every active account:

  1. Read account master for current balances.
  2. Read transaction history for the statement period.
  3. Format the statement with opening balance, transactions, interest, fees, and closing balance.
  4. Write the statement to a print file or electronic delivery system.
  5. Update the "last statement date" on the account master.

For a large bank with millions of accounts, this job can run for hours and is a prime candidate for checkpoint/restart and parallelization (splitting accounts across multiple jobs by account number range).

Regulatory Reporting

Financial institutions must produce regular reports for regulatory agencies:

  • Call reports: Quarterly balance sheet and income statement data.
  • CTR (Currency Transaction Reports): Reports of cash transactions over $10,000.
  • SAR (Suspicious Activity Reports): Reports of potentially fraudulent activity.
  • OFAC screening: Checking customer names against sanctioned persons lists.

These are typically extract-transform-load operations that read from operational databases, apply formatting rules, and produce files in regulatory-specified layouts.

Archive and Purge

Over time, transaction files grow beyond what is needed for daily operations. Archive and purge batch jobs:

  1. Select records older than a retention threshold (e.g., 7 years for financial records).
  2. Copy selected records to archive storage (tape or cheaper disk).
  3. Delete the archived records from the operational database.
  4. Log the archive operation for audit purposes.

28.14 Putting It All Together

Batch processing is not glamorous, but it is essential. The patterns described in this chapter---control break, sequential update, matching/merging, ETL, checkpoint/restart, and the batch framework---form the foundation of mainframe application development. A programmer who masters these patterns can build reliable, efficient programs that process millions of records night after night, year after year.

The key principles to remember:

  1. Structure every program with initialization, processing, and finalization phases.
  2. Use the read-ahead technique for clean end-of-file handling.
  3. Validate input data rigorously and report errors clearly.
  4. Count everything: records in, records out, records rejected. The counts must balance.
  5. Set return codes that accurately reflect the outcome.
  6. Design for restart: checkpoints save hours when failures occur.
  7. Optimize for the batch window: block sizes, buffering, parallelism.
  8. Test with realistic data and volumes.

These are the disciplines that have kept mainframe batch systems running reliably for decades. They are as relevant today as they were when COBOL was first standardized, because the underlying requirements---processing large volumes of data accurately, reliably, and efficiently---have not changed.


Summary

This chapter covered the major patterns and principles of batch processing in a COBOL mainframe environment:

  • Batch vs. online processing: Batch handles high-volume, scheduled, sequential workloads without user interaction.
  • Read-Process-Write: The fundamental three-phase batch pattern.
  • Control break processing: Producing hierarchical reports with multi-level subtotals.
  • Sequential master file update: The balanced line algorithm for applying transactions to a master file.
  • File matching and merging: Coordinating reads across multiple sorted files.
  • ETL (Extract-Transform-Load): Moving and transforming data between systems.
  • Checkpoint/restart: Saving program state for recovery from failures.
  • Error handling: Error files, counts, thresholds, and return codes.
  • Statistics and auditing: Record counts, hash totals, control totals.
  • Job flow design: Dependencies, parallelism, critical path analysis.
  • Scheduling: Job schedulers, triggers, calendars, batch window management.
  • Performance: Block sizes, buffering, efficient data types, minimizing I/O.
  • Testing: Test data creation, parallel testing, regression testing.
  • Financial batch scenarios: EOD processing, interest calculation, statements, regulatory reporting, archive/purge.

The code examples that accompany this chapter demonstrate these patterns in complete, working COBOL programs with their associated JCL. Study them carefully, as you will encounter variations of these patterns in virtually every mainframe COBOL shop.


In the next chapter, we will explore CICS programming, where COBOL programs interact with users in real time through the Customer Information Control System.