Chapter 25: Parallel Batch Processing — Multi-Step Pipelines, Partitioned Processing, and DB2 Parallelism

You have a six-hour batch window. Your end-of-day processing takes seven hours running serially. You can optimize individual programs until the heat death of the universe and still not close that gap. The answer is not faster code — it is more code running at the same time.

Parallel batch processing is how every serious mainframe shop meets its batch window. It is not exotic. It is not optional. If you are running production batch on z/OS and you are not parallelizing, you are already behind.

This chapter covers three dimensions of parallelism that matter for COBOL batch: partitioned processing (splitting work across multiple copies of the same program), DB2 parallelism (letting the database engine use multiple engines and I/O paths simultaneously), and multi-step pipeline design (orchestrating parallel streams with fan-out and fan-in merge points). These are not theoretical concepts. They are what you will implement on Monday morning when the batch window shrinks again.

Learning Objectives:

By the end of this chapter, you will be able to:

  1. Design partitioned batch processing that splits work across multiple parallel tasks
  2. Implement COBOL programs that support partition-safe checkpoint/restart
  3. Configure DB2 parallelism (I/O parallelism, CP parallelism, Sysplex query parallelism) for batch workloads
  4. Design multi-step batch pipelines with parallel streams and merge points
  5. Architect parallel batch processing for the HA banking system

Spaced Review — Chapter 8 (DB2 Locking): Parallel batch processing multiplies lock contention. If you have four partitions hitting the same DB2 tablespace simultaneously, every locking concept from Chapter 8 is amplified. Before proceeding, recall: lock escalation thresholds, LOCKSIZE ROW vs PAGE vs TABLE, and the difference between shared and exclusive locks. Parallel batch that ignores locking will deadlock within minutes.

Spaced Review — Chapter 23 (Batch Window Management): Chapter 23 established critical path analysis for batch streams. Parallelism changes the critical path. A six-step serial chain becomes a three-step parallel execution with fan-out and fan-in — but the critical path is now determined by the slowest partition, not the total work. Review your batch dependency graphs before parallelizing.

Spaced Review — Chapter 24 (Checkpoint/Restart): Checkpoint/restart in a single-program context is already complex. Partition-level checkpoint/restart adds another dimension: each partition must checkpoint independently, and restart must resume only the failed partition without reprocessing work completed by successful partitions. Review your checkpoint interval calculations and commit strategies.


25.1 Why Parallelize Batch

The case for parallel batch processing is arithmetic. If a program processes 10 million records in 60 minutes serially, and you split those records into four partitions of 2.5 million each, and each partition runs on a separate task with its own I/O paths — you finish in roughly 15 minutes. Not exactly 15, because partitioning has overhead, partitions are rarely perfectly balanced, and shared resources (DB2, DASD controllers, coupling facility) create contention. But 18–20 minutes is realistic, and that is still a 65–70% reduction in elapsed time.
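The arithmetic can be made concrete with a back-of-envelope model, shown here in Python since the logic is language-neutral. The skew and overhead figures are illustrative assumptions, not measurements:

```python
def parallel_elapsed(serial_minutes, partitions, skew=1.15, overhead_minutes=2.0):
    """Estimate elapsed time after splitting serial work across partitions.

    skew:             worst partition's slowdown vs a perfect 1/N split (assumed)
    overhead_minutes: fixed cost of the split and merge steps (assumed)
    """
    perfect_split = serial_minutes / partitions
    return perfect_split * skew + overhead_minutes

serial = 60.0                                    # 10 million records, 60 minutes serial
elapsed = parallel_elapsed(serial, partitions=4)
print(f"{elapsed:.1f} min, {100 * (1 - elapsed / serial):.0f}% reduction")
# 19.2 min, 68% reduction -- inside the 18-20 minute range quoted above
```

Adjust skew upward for poorly balanced partitions and the model quickly shows why balance matters more than partition count.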

The Batch Window Problem

Every mainframe shop faces the same squeeze. Business grows. Transaction volumes increase. New regulatory requirements add processing steps. The online window expands as global operations demand 20-hour or 22-hour online availability. And the batch window shrinks.

Kwame at CNB has seen this firsthand. Five years ago, CNB's end-of-day batch ran from 11 PM to 4 AM — a comfortable five-hour window with margin. Today, the online system stays up until midnight for West Coast branches, and the morning online window opens at 4:30 AM for European correspondent banking. The batch window is four and a half hours, and the serial processing time is over six hours. Without parallelism, CNB cannot close its books.

Types of Parallelism

There are three distinct types of parallelism in mainframe batch, and they operate at different levels:

Application-Level Parallelism (Partitioned Processing): You split the input data and run multiple copies of the same program, each processing a different subset. This is the most common form and gives you the most control. It operates at the JCL and application level.

Database-Level Parallelism (DB2 Parallelism): DB2 can parallelize query execution across multiple engines, I/O paths, and Sysplex members. This operates within DB2 and is controlled through BIND parameters, ZPARM settings, and table design. A single COBOL program benefits without code changes.

Infrastructure-Level Parallelism (SORT and Utility Parallelism): DFSORT, ICETOOL, and IDCAMS can parallelize their own operations across multiple I/O paths and engines. This operates below the application level and is controlled through sort parameters and system configuration.

The most effective parallel batch designs combine all three. Your JCL splits work into four partitions. Each partition's COBOL program issues DB2 queries that themselves parallelize across two engines. And the merge step uses DFSORT with hiperspace buffering and parallel I/O. The multiplicative effect is dramatic.

When Not to Parallelize

Not every batch job benefits from parallelism. Jobs that are CPU-bound on a single task with no natural partition key gain nothing from splitting — you just add overhead. Jobs that process a tiny volume (under 100,000 records) rarely justify the complexity. Jobs with heavy inter-record dependencies where record N's processing depends on record N-1's result cannot be partitioned without fundamental redesign.

The decision framework is straightforward:

Factor                             Favors Parallelism                   Favors Serial
Volume                             > 500K records                       < 100K records
Elapsed time                       > 30 minutes                         < 10 minutes
Natural partition key              Yes — account range, region, date    No obvious split
I/O bound vs CPU bound             I/O bound (common)                   Pure CPU (rare in batch)
Inter-record dependency            None or within-partition only        Cross-partition dependencies
Batch window pressure              Tight window                         Plenty of margin
Operational complexity tolerance   Mature operations team               Skeleton crew

Sandra at Federal Benefits learned this the hard way. Her team parallelized a benefits recalculation job that processes only 80,000 records. The four-partition version finished in 3 minutes instead of 8 minutes — a savings of 5 minutes that cost two weeks of development, introduced partition-boundary bugs, and created an operational procedure that confused the night shift. The serial version was fine. Save parallelism for jobs where it matters.


25.2 Partitioned Processing Design

Partitioned processing is the workhorse of parallel batch. The concept is simple: split the input, run multiple tasks, combine the output. The engineering is in the details.

Choosing a Partition Key

The partition key determines how work is divided. A good partition key has four properties:

  1. Even distribution: Each partition gets roughly equal work. A key that puts 80% of records in one partition and 5% in each of the other four is worse than serial processing — you wait for the overloaded partition.

  2. No cross-partition dependencies: Processing a record in partition A must not require data from partition B. If account 1000001 in partition 1 needs the balance from account 5000001 in partition 3, you cannot parallelize by account range without a pre-pass.

  3. Alignment with physical data layout: If your VSAM file is keyed by account number, partitioning by account range means each partition reads a contiguous segment of the file. Partitioning by geographic region when the file is sorted by account number means every partition reads the entire file, which is worse than serial.

  4. Compatibility with checkpoint/restart: Each partition must be independently restartable. The partition key must support resuming from a checkpoint without reprocessing other partitions' work.

Partition Strategies

Key Range Partitioning: Split by ranges of the primary key. Account numbers 0000001–2500000 in partition 1, 2500001–5000000 in partition 2, and so on. This is the most common strategy and works well when the key is numeric, sequential, and evenly distributed.

Partition 1: ACCT-NUM 0000001 - 2500000
Partition 2: ACCT-NUM 2500001 - 5000000
Partition 3: ACCT-NUM 5000001 - 7500000
Partition 4: ACCT-NUM 7500001 - 9999999

The trap is uneven distribution. At CNB, account numbers starting with 1 and 2 were assigned to the original branches opened in the 1970s. Those branches have the most accounts. A naive four-way split by first digit puts 45% of accounts in the first partition. Lisa solved this by analyzing the actual distribution and computing balanced split points — the boundaries are 0000001/1847293/4102887/6850441, not neat round numbers.
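The split-point analysis Lisa's setup job performs amounts to picking quantile boundaries from the actual key population. A sketch in Python, using an illustrative skewed population rather than CNB's real data:

```python
def balanced_split_points(sorted_keys, partitions):
    """Return (low, high) key ranges giving each partition an equal share
    of the actual records, regardless of how the keys are distributed."""
    n = len(sorted_keys)
    return [(sorted_keys[p * n // partitions],
             sorted_keys[(p + 1) * n // partitions - 1])
            for p in range(partitions)]

# Skewed population: 45% of accounts crowd the low key range (illustrative)
keys = sorted(list(range(1_000_000, 1_004_500)) + list(range(7_000_000, 7_005_500)))
for p, (lo, hi) in enumerate(balanced_split_points(keys, 4), 1):
    print(f"Partition {p}: {lo:07d} - {hi:07d}")
```

Note that the computed boundaries fall wherever the data dictates, not at neat round numbers, exactly as in CNB's case.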

Hash Partitioning: Apply a hash function to the key and use modulo arithmetic to assign records to partitions. PARTITION-NUM = FUNCTION MOD(FUNCTION ORD(ACCT-NUM(1:1)) + ACCT-NUM-NUMERIC, 4) + 1. Hash partitioning gives nearly perfect balance regardless of key distribution, but it destroys key-order locality. Each partition's records are scattered across the file, which means random I/O instead of sequential.

Hash partitioning works well when:

  • The input is in a DB2 table (random access is the same cost as sequential)
  • You do not need sorted output from each partition
  • Key distribution is highly skewed and range partitioning cannot balance
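The balancing effect is easy to demonstrate; this sketch uses CRC32 as a stand-in for whatever hash a shop actually implements, on an illustrative skewed key population:

```python
import zlib
from collections import Counter

def hash_partition(acct_num, partitions=4):
    """Assign a record to a partition via hash plus modulo, as in the
    FUNCTION MOD expression above (CRC32 stands in for the hash)."""
    return zlib.crc32(acct_num.encode("ascii")) % partitions + 1

# Severely skewed keys: 45% begin with '1', yet assignment stays balanced
keys = [f"1{i:06d}" for i in range(4500)] + [f"9{i:06d}" for i in range(5500)]
spread = Counter(hash_partition(k) for k in keys)
print(sorted(spread.items()))   # each of the 4 partitions gets roughly 2,500
```

The same keys split by first digit would put 45% of the work in one partition; the hash spreads them nearly evenly, at the cost of destroying key-order locality.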

Geographic/Organizational Partitioning: Split by business-meaningful boundaries — region, branch, state, department. This is natural when the business already thinks in these terms and when regional data is stored in separate files or tablespaces.

At Pinnacle Health, Diane partitions claims processing by state groupings: Northeast (12 states), Southeast (12 states), Midwest (12 states), and West (14 states). The claim volumes are not perfectly balanced — California alone generates as much volume as the entire Midwest — so she further splits West into West-CA and West-Other, creating a five-partition scheme.

Temporal Partitioning: Split by date or time ranges. Process Monday's transactions in one partition, Tuesday's in another. This works for catch-up processing and historical reprocessing, but is unusual for standard daily batch because a single day's transactions cannot be further split by date.

The Partition Control Table

Production parallel batch uses a partition control table rather than hardcoded boundaries. This table lives in DB2 and is maintained by a setup job that runs before the parallel step:

CREATE TABLE BATCH.PARTITION_CONTROL (
    JOB_NAME         CHAR(8)     NOT NULL,
    RUN_DATE         DATE        NOT NULL,
    PARTITION_NUM    SMALLINT    NOT NULL,
    LOW_KEY          CHAR(20)    NOT NULL,
    HIGH_KEY         CHAR(20)    NOT NULL,
    EXPECTED_COUNT   INTEGER,
    ACTUAL_COUNT     INTEGER,
    STATUS           CHAR(1)     DEFAULT 'P',
    CHECKPOINT_KEY   CHAR(20),
    CHECKPOINT_COUNT INTEGER,
    CHECKPOINT_TIME  TIMESTAMP,
    START_TIME       TIMESTAMP,
    END_TIME         TIMESTAMP,
    RETURN_CODE      SMALLINT,
    PRIMARY KEY (JOB_NAME, RUN_DATE, PARTITION_NUM)
);

The STATUS column tracks partition lifecycle: P (pending), R (running), C (completed), F (failed), X (restarting). The setup job analyzes the input data, computes balanced split points, and populates this table. Each partition reads its own row to determine its key range. The merge job reads all rows to confirm all partitions completed successfully.

This design gives you operational visibility — you can query the table to see which partitions finished, which are still running, and which failed. More critically, it enables partition-level restart: if partition 3 fails, you restart only partition 3 by setting its status back to P and resubmitting that single task.
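The merge job's gate over the control table can be sketched as follows. The row layout mirrors the table above, held here as in-memory dicts rather than a live DB2 fetch:

```python
def partitions_ready_for_merge(rows):
    """Fan-in gate: every partition row must show STATUS 'C', and its
    ACTUAL_COUNT must reconcile with EXPECTED_COUNT when one was set."""
    for row in rows:
        if row["STATUS"] != "C":
            return False, f"partition {row['PARTITION_NUM']} is {row['STATUS']}"
        expected = row["EXPECTED_COUNT"]
        if expected is not None and row["ACTUAL_COUNT"] != expected:
            return False, f"partition {row['PARTITION_NUM']} count mismatch"
    return True, "all partitions complete"

rows = [
    {"PARTITION_NUM": 1, "STATUS": "C", "EXPECTED_COUNT": 2500, "ACTUAL_COUNT": 2500},
    {"PARTITION_NUM": 2, "STATUS": "C", "EXPECTED_COUNT": 2500, "ACTUAL_COUNT": 2500},
    {"PARTITION_NUM": 3, "STATUS": "F", "EXPECTED_COUNT": 2500, "ACTUAL_COUNT": 1700},
    {"PARTITION_NUM": 4, "STATUS": "C", "EXPECTED_COUNT": 2500, "ACTUAL_COUNT": 2500},
]
ok, reason = partitions_ready_for_merge(rows)
print(ok, "-", reason)   # False - partition 3 is F
```

The count reconciliation matters as much as the status check: a partition that "completed" but processed fewer records than expected is a silent data loss, not a success.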

Determining Partition Count

More partitions means more parallelism, but also more overhead. The sweet spot depends on:

  • Available engines (CPs/zIIPs): No point running 16 partitions on a 4-CP machine — you just time-slice. Four to eight partitions on a 4-CP machine is typical.
  • I/O bandwidth: Each partition needs its own I/O paths. If four partitions compete for the same DASD volume, you have serialized the I/O and gained nothing.
  • DB2 connection limits: Each partition is a separate DB2 thread. If your DB2 subsystem is configured for 200 threads and online uses 180, you have room for 20 batch threads — not 50.
  • Operational complexity: Four partitions are manageable. Thirty-two partitions create thirty-two places for things to go wrong.

Rob at CNB uses a formula: PARTITION_COUNT = MIN(available_CPs, available_IO_paths / 2, DB2_batch_thread_limit / 2, 8). The divide-by-two gives headroom. The cap at 8 reflects operational reality — CNB's night shift can handle eight parallel tasks but not sixteen.
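Rob's formula translates directly into code; the input figures below are illustrative, not CNB's actual configuration:

```python
def partition_count(cps, io_paths, db2_batch_threads, cap=8):
    """Rob's sizing rule: the divide-by-two terms leave headroom, and the
    cap reflects what the operations team can realistically supervise."""
    return min(cps, io_paths // 2, db2_batch_threads // 2, cap)

print(partition_count(cps=4, io_paths=12, db2_batch_threads=20))    # 4
print(partition_count(cps=16, io_paths=32, db2_batch_threads=40))   # 8 (the cap)
```

Whichever term produces the minimum tells you which resource would bottleneck a wider split.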


25.3 Implementing Partition-Safe COBOL Programs

A partition-safe COBOL program is one that can be run as multiple concurrent copies, each processing a different data subset, without interference. This requires discipline in four areas.

Partition Parameter Passing

Each copy of the program must know which partition it is processing. The standard approach is to pass the partition number and key boundaries through the JCL PARM or through SYSIN:

       IDENTIFICATION DIVISION.
       PROGRAM-ID. ACCTPROC.
      *
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      *
       01  WS-PARTITION-INFO.
           05  WS-PARTITION-NUM       PIC 9(02).
           05  WS-LOW-KEY             PIC X(10).
           05  WS-HIGH-KEY            PIC X(10).
           05  WS-TOTAL-PARTITIONS    PIC 9(02).
      *
       01  WS-PARM-DATA.
           05  WS-PARM-LENGTH         PIC S9(04) COMP.
           05  WS-PARM-STRING         PIC X(100).
      *
       LINKAGE SECTION.
       01  LS-PARM.
           05  LS-PARM-LENGTH         PIC S9(04) COMP.
           05  LS-PARM-DATA           PIC X(100).
      *
       PROCEDURE DIVISION USING LS-PARM.
       0000-MAIN.
           PERFORM 1000-INIT-PARTITION
           PERFORM 2000-PROCESS-PARTITION
           PERFORM 9000-CLEANUP
           STOP RUN.
      *
       1000-INIT-PARTITION.
           IF LS-PARM-LENGTH = ZEROS
              DISPLAY 'NO PARTITION PARM - ABORTING'
              MOVE 16 TO RETURN-CODE
              STOP RUN
           END-IF
           UNSTRING LS-PARM-DATA
              DELIMITED BY ','
              INTO WS-PARTITION-NUM
                   WS-LOW-KEY
                   WS-HIGH-KEY
                   WS-TOTAL-PARTITIONS
           END-UNSTRING
           DISPLAY 'PARTITION ' WS-PARTITION-NUM
                   ' OF ' WS-TOTAL-PARTITIONS
                   ' RANGE: ' WS-LOW-KEY
                   ' TO '     WS-HIGH-KEY
           .

Alternatively, the program reads the partition control table in DB2 using its own partition number (passed via PARM) as a key. This is cleaner and avoids long PARM strings:

       1000-INIT-PARTITION.
           EXEC SQL
              SELECT LOW_KEY, HIGH_KEY
              INTO :WS-LOW-KEY, :WS-HIGH-KEY
              FROM BATCH.PARTITION_CONTROL
              WHERE JOB_NAME = 'ACCTPROC'
                AND RUN_DATE = :WS-CURRENT-DATE
                AND PARTITION_NUM = :WS-PARTITION-NUM
           END-EXEC
           IF SQLCODE NOT = 0
              DISPLAY 'PARTITION CONTROL READ FAILED: '
                      SQLCODE
              MOVE 16 TO RETURN-CODE
              STOP RUN
           END-IF
           EXEC SQL
              UPDATE BATCH.PARTITION_CONTROL
              SET STATUS = 'R',
                  START_TIME = CURRENT TIMESTAMP
              WHERE JOB_NAME = 'ACCTPROC'
                AND RUN_DATE = :WS-CURRENT-DATE
                AND PARTITION_NUM = :WS-PARTITION-NUM
           END-EXEC
           EXEC SQL COMMIT END-EXEC
           .

Partition Key Selection Strategies

Choosing the right partition key is a design decision that determines whether your parallel batch will achieve near-linear speedup or devolve into one fast partition waiting for three slow ones. The key selection is not "pick a column and divide by N" — it requires understanding the data distribution, the processing cost per record, and the downstream merge requirements.

Strategy 1: Equal record count. Divide the keyspace so each partition processes the same number of records. This works when processing cost per record is roughly uniform. Query the data to find quantile boundaries:

WITH RANKED AS (
  SELECT ACCOUNT_NUM,
         NTILE(4) OVER (ORDER BY ACCOUNT_NUM) AS QUARTILE
  FROM ACCOUNT_MASTER
)
SELECT QUARTILE, MIN(ACCOUNT_NUM) AS LOW_KEY, MAX(ACCOUNT_NUM) AS HIGH_KEY,
       COUNT(*) AS RECORD_COUNT
FROM RANKED
GROUP BY QUARTILE
ORDER BY QUARTILE;

Strategy 2: Equal processing cost. When some records are far more expensive to process than others (a premium account with 200 sub-accounts costs 50x more than a basic savings account with one), equal record counts produce partition skew. Instead, weight each record by its estimated processing cost and partition so total weight is equal:

SELECT ACCOUNT_NUM
FROM (
  SELECT ACCOUNT_NUM,
         (SUB_ACCOUNT_COUNT * 3 + TRANSACTION_COUNT) AS ESTIMATED_COST,
         SUM(SUB_ACCOUNT_COUNT * 3 + TRANSACTION_COUNT)
           OVER (ORDER BY ACCOUNT_NUM) AS RUNNING_COST,
         SUM(SUB_ACCOUNT_COUNT * 3 + TRANSACTION_COUNT)
           OVER () AS TOTAL_COST
  FROM ACCOUNT_MASTER
) AS T
WHERE (RUNNING_COST >= TOTAL_COST / 4
       AND RUNNING_COST - ESTIMATED_COST < TOTAL_COST / 4)
   OR (RUNNING_COST >= TOTAL_COST / 2
       AND RUNNING_COST - ESTIMATED_COST < TOTAL_COST / 2)
   OR (RUNNING_COST >= TOTAL_COST * 3 / 4
       AND RUNNING_COST - ESTIMATED_COST < TOTAL_COST * 3 / 4);

Each row returned is the first account whose running total crosses a quartile of the total cost; those three accounts are the interior split boundaries. (An exact-equality test such as RUNNING_COST = TOTAL_COST / 4 would almost never match a real running total.)

Lisa at CNB switched from equal-count to equal-cost partitioning for their interest calculation batch. The commercial banking partition (25% of accounts but 60% of processing cost due to complex tiered interest structures) had been running 3x longer than the retail partition. After rebalancing by cost, all four partitions completed within 8% of each other, recovering 22 minutes of batch window.

Strategy 3: DB2 partition alignment. If the target DB2 table is already partitioned, align your batch partitions with the DB2 table partitions. This ensures each batch partition touches only its own DB2 partition, eliminating cross-partition lock contention entirely. The downside is that DB2 partition boundaries may not produce equal record counts — but the elimination of lock contention often more than compensates for any imbalance.

Strategy 4: Hash partitioning for skew-resistant distribution. When the natural key has severe distribution skew (some key ranges are densely populated, others sparse), apply a hash function to the key and partition on the hash. This guarantees near-equal distribution regardless of key patterns. The trade-off is that output is no longer in key order, so the merge step requires a full sort rather than a simple merge.

Resource Isolation

Multiple copies of the same program running concurrently must not collide on resources. This means:

Separate output files. Each partition writes to its own output dataset. The JCL uses the partition number in the dataset name or GDG generation:

//OUTFILE  DD DSN=PROD.ACCTPROC.PART&PARTNUM..OUTPUT,
//            DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(100,20),RLSE)

Separate sort work areas. If each partition sorts, it needs its own SORTWKnn datasets on different volumes. Two partitions sharing a SORTWK volume create I/O contention that destroys the parallelism benefit.

Separate checkpoint datasets. Each partition's checkpoint file must be unique. Using a shared checkpoint dataset causes ENQ conflicts and corrupted checkpoints.

No shared QSAM files. Two programs cannot write to the same sequential file simultaneously. Ever. VSAM files can support concurrent access with SHAREOPTIONS 4, but this requires careful design and is error-prone.

Partition-Safe DB2 Access

When four partitions hit the same DB2 table simultaneously, lock contention is the primary risk. Partition-safe DB2 access requires:

Row-level locking. Ensure the tablespace uses LOCKSIZE ROW. Page-level locking causes false contention when two partitions update different rows on the same page.

Partitioned tablespaces. If the DB2 table is partitioned (and it should be for any table large enough to warrant parallel batch processing), each partition should align with DB2 table partitions. Partition 1 of your batch processes DB2 partition 1 of the table. This eliminates cross-partition lock contention entirely.

Commit frequency. Each partition must commit regularly — every 500 to 2,000 rows depending on lock escalation thresholds. Infrequent commits cause lock escalation, which escalates to tablespace-level locks, which block all other partitions.

Deadlock handling. Even with good design, parallel batch can produce occasional deadlocks. Every partition-safe program must handle SQLCODE -911 and -913:

       5000-UPDATE-ACCOUNT.
           MOVE 0 TO WS-DEADLOCK-RETRIES
           PERFORM 5100-TRY-UPDATE
              UNTIL WS-UPDATE-DONE = 'Y'
                 OR WS-DEADLOCK-RETRIES > 3
           IF WS-DEADLOCK-RETRIES > 3
              DISPLAY 'DEADLOCK LIMIT EXCEEDED - '
                      'ACCT: ' WS-ACCOUNT-NUM
              MOVE 12 TO RETURN-CODE
              PERFORM 9000-CLEANUP
              STOP RUN
           END-IF
           .
      *
       5100-TRY-UPDATE.
           EXEC SQL
              UPDATE ACCOUNT_MASTER
              SET BALANCE = :WS-NEW-BALANCE,
                  LAST_UPDATED = CURRENT TIMESTAMP
              WHERE ACCOUNT_NUM = :WS-ACCOUNT-NUM
           END-EXEC
           EVALUATE SQLCODE
              WHEN 0
                 MOVE 'Y' TO WS-UPDATE-DONE
              WHEN -911
              WHEN -913
                 ADD 1 TO WS-DEADLOCK-RETRIES
                 DISPLAY 'DEADLOCK ON ACCT '
                         WS-ACCOUNT-NUM
                         ' - RETRY ' WS-DEADLOCK-RETRIES
                 EXEC SQL ROLLBACK END-EXEC
                 CALL 'CEE3DLY' USING WS-DELAY-SECONDS
              WHEN OTHER
                 DISPLAY 'SQL ERROR: ' SQLCODE
                 MOVE 16 TO RETURN-CODE
                 PERFORM 9000-CLEANUP
                 STOP RUN
           END-EVALUATE
           .

Notice the CEE3DLY call — a short delay before retry. Without it, two partitions that deadlock will immediately retry and deadlock again, creating a livelock. A delay of 1–3 seconds with a random component breaks the cycle.
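The retry-with-randomized-backoff pattern is worth isolating; a sketch of the same control flow, with the deadlock condition modeled as an exception standing in for SQLCODE -911/-913:

```python
import random
import time

class DeadlockError(Exception):
    """Stands in for SQLCODE -911/-913."""

def with_deadlock_retry(operation, max_retries=3, base_delay=1.0, jitter=2.0):
    """Retry a unit of work that may deadlock, sleeping a randomized interval
    between attempts so two colliding partitions do not retry in lockstep."""
    for attempt in range(max_retries + 1):
        try:
            return operation()
        except DeadlockError:
            if attempt == max_retries:
                raise                       # mirrors the DEADLOCK LIMIT EXCEEDED path
            time.sleep(base_delay + random.random() * jitter)
```

The essential detail is the `random.random() * jitter` term: a fixed delay merely synchronizes the colliding partitions at a later instant, while the random component desynchronizes them.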

Partition-Level Checkpoint/Restart

Chapter 24 covered checkpoint/restart for serial batch. Partition-level checkpoint/restart adds complexity because each partition checkpoints independently. The key principles:

Each partition has its own checkpoint ID. Do not use a shared checkpoint sequence number. Partition 1's checkpoint is ACCTPROC-P01-20260315-00047, not ACCTPROC-00047.

Restart resubmits only the failed partition. If partition 3 of 4 fails, you restart partition 3 only. Partitions 1, 2, and 4 already completed successfully — rerunning them wastes time and risks duplicate processing.

The partition control table tracks checkpoint state:

UPDATE BATCH.PARTITION_CONTROL
SET CHECKPOINT_KEY = :WS-LAST-PROCESSED-KEY,
    CHECKPOINT_COUNT = :WS-RECORDS-PROCESSED,
    CHECKPOINT_TIME = CURRENT TIMESTAMP
WHERE JOB_NAME = 'ACCTPROC'
  AND RUN_DATE = :WS-CURRENT-DATE
  AND PARTITION_NUM = :WS-PARTITION-NUM;
COMMIT;

On restart, the program reads its last checkpoint key and repositions:

       1500-RESTART-POSITION.
           EXEC SQL
              SELECT CHECKPOINT_KEY, CHECKPOINT_COUNT
              INTO :WS-RESTART-KEY, :WS-RESTART-COUNT
              FROM BATCH.PARTITION_CONTROL
              WHERE JOB_NAME = 'ACCTPROC'
                AND RUN_DATE = :WS-CURRENT-DATE
                AND PARTITION_NUM = :WS-PARTITION-NUM
                AND STATUS = 'X'
           END-EXEC
           IF SQLCODE = 0
              DISPLAY 'RESTARTING FROM KEY: '
                      WS-RESTART-KEY
              MOVE 'Y' TO WS-RESTART-MODE
              PERFORM 1510-REPOSITION-INPUT
           ELSE
              DISPLAY 'FRESH START - NO CHECKPOINT'
              MOVE 'N' TO WS-RESTART-MODE
           END-IF
           .

Output file handling on restart. This is the hardest part. If partition 3 wrote 60% of its output file before failing, you have two options. Option (a): empty the output file and reprocess the entire partition from its low key. This is simpler and safer, but the checkpoint buys you nothing for this partition. Option (b): keep the existing output, resume from the checkpoint key, and accept that records written between the last checkpoint and the failure point appear twice, once from the failed run and once from the restart. Option (b) preserves the checkpoint benefit, but the merge step must then deduplicate, keeping the latest version of each key.
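A merge-side deduplication pass can be sketched as follows; it assumes each output record carries its key and that a record re-output after restart supersedes the failed run's version:

```python
def merge_with_dedup(partition_files):
    """Concatenate partition outputs, keeping the last version written for
    each key, so records re-output after a restart supersede duplicates
    written before the failure."""
    merged = {}
    for records in partition_files:      # each element: one partition's output
        for key, payload in records:
            merged[key] = payload        # later occurrence wins
    return sorted(merged.items())

failed_run  = [("0001", "v1"), ("0002", "v1")]   # written before the abend
restart_run = [("0002", "v2"), ("0003", "v2")]   # re-output from the checkpoint
print(merge_with_dedup([failed_run, restart_run]))
# [('0001', 'v1'), ('0002', 'v2'), ('0003', 'v2')]
```

The order in which the restart file is merged after the failed run's output is what makes "last occurrence wins" correct; reverse the order and the stale versions survive.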


25.4 DB2 Parallelism for Batch

DB2 for z/OS has three forms of internal parallelism. These operate within the database engine and benefit your COBOL programs without application code changes — but they require deliberate configuration.

I/O Parallelism

I/O parallelism allows DB2 to issue multiple concurrent I/O requests for a single query. When a tablespace scan reads partition 1, 2, and 3 of a partitioned tablespace, I/O parallelism reads all three simultaneously instead of sequentially.

I/O parallelism is controlled by the CURRENT DEGREE special register and the DEGREE bind parameter:

SET CURRENT DEGREE = 'ANY';

Or at bind time:

BIND PLAN(ACCTPROC) DEGREE(ANY) ...

DEGREE(ANY) tells DB2 to use as many parallel operations as it determines beneficial. DEGREE(1) disables parallelism. There is no way to specify an exact degree — DB2 decides based on the number of partitions, available buffer pool pages, and current system load.

I/O parallelism is most effective when:

  • The tablespace is partitioned (the partitions provide natural units of parallel I/O)
  • The query performs a tablespace scan or partition scan (index access on a single row does not benefit)
  • Sufficient buffer pool pages are available (DB2 needs buffers for each parallel stream)
  • The underlying DASD has multiple I/O paths (parallelism to a single volume is limited by the volume's I/O rate)

Ahmad at Pinnacle Health measured a 3.2x improvement in a claims history query by enabling I/O parallelism on a 12-partition tablespace. The query scanned all partitions — serially, this meant 12 sequential reads. With I/O parallelism, DB2 read 4 partitions at a time (limited by buffer pool size), completing in roughly one-third the time.

CP Parallelism (Query CP Parallelism)

CP parallelism goes beyond I/O — it splits query processing work across multiple zIIP-eligible or CP-eligible tasks. A single SQL statement can use multiple engines for scanning, sorting, joining, and aggregating.

CP parallelism activates when:

  • DEGREE(ANY) is specified
  • The query accesses a partitioned tablespace
  • DB2 determines that the query cost justifies parallel execution
  • Sufficient system resources (engines, virtual storage, threads) are available

The access path in EXPLAIN shows parallelism in the PARALLELISM_MODE column:

Mode      Meaning
I         I/O parallelism only
C         CP parallelism (I/O + CPU)
X         Sysplex parallelism
(blank)   No parallelism

For batch workloads, CP parallelism is transformative for queries that scan large tablespaces and perform aggregation. A monthly interest calculation that scans 50 million account rows, computes interest for each, and updates the balance — with CP parallelism, DB2 splits this across available engines. On a 6-CP machine, you might see 4 parallel tasks for the query (DB2 reserves some capacity for other work).

Sysplex Query Parallelism

In a data sharing environment (multiple DB2 members in a Sysplex), Sysplex parallelism distributes a single query across multiple DB2 members. Partition 1 of the tablespace is processed by DB2 member DB2A, partition 2 by DB2B, partition 3 by DB2C.

This is the most powerful form of DB2 parallelism and the most complex to configure:

Group Buffer Pools (GBP): In data sharing, changed pages are written to the coupling facility's group buffer pool. Sysplex parallelism increases GBP traffic. You must size GBPs adequately — undersized GBPs cause GBP full conditions that force synchronous writes and destroy performance.

BIND parameters for Sysplex parallelism:

BIND PLAN(ACCTPROC) DEGREE(ANY) ...

The same DEGREE(ANY) enables all three forms. DB2 chooses the appropriate level based on the environment.

Workload balancing: Sysplex parallelism assumes all DB2 members have roughly equal capacity. If DB2A is on a 6-CP LPAR and DB2B is on a 2-CP LPAR, the work assigned to DB2B takes three times longer, and the overall query is gated by the slowest member.

Yuki at SecureFirst runs a 4-member DB2 data sharing group. Sysplex parallelism reduced their nightly fraud scoring batch from 2 hours 40 minutes to 52 minutes — a 3.1x improvement on a 4-member group. Not the theoretical 4x, because coupling facility overhead, GBP cross-invalidation, and member capacity differences consume some of the benefit.

Tuning DB2 Parallelism for Batch

DB2 parallelism is not "set DEGREE ANY and walk away." Production tuning requires:

1. Buffer pool sizing. Each parallel stream needs its own buffer pool pages. If you have 4 parallel streams and each needs 10,000 pages, you need 40,000 pages minimum. Undersized buffer pools degrade parallelism to serial I/O.

2. PARAMDEG ZPARM. This system parameter caps the maximum degree of parallelism. Default is 0 (no limit), but many shops set it to limit resource consumption. A value of 8 means no query uses more than 8 parallel tasks, regardless of how many partitions it accesses.

3. RID pool size. Parallel queries that use list prefetch consume RID pool pages. Insufficient RID pool forces fallback to sequential access.

4. Sort pool (MAXSORT_IN_MEMORY). Parallel queries that require sorting need sort work areas. Each parallel task gets its own sort pool allocation.

5. Thread limits (IDBACK, the MAX BATCH CONNECT parameter). Each parallel task inside DB2 counts as a thread for resource accounting. Excessive parallelism can exhaust batch thread limits.

6. Workfile database sizing. Parallel queries generate parallel sort and hash join workfiles. Each parallel task may require its own workfile allocation. If the workfile database runs out of space, DB2 falls back to non-parallel execution — silently. This is one of the most common reasons shops enable DEGREE(ANY) and see no improvement: the workfile database is undersized for the parallel workload.

Kwame's rule of thumb for workfile sizing: take the largest single-query workfile requirement from your existing serial batch, multiply by the maximum PARAMDEG value, and add 50% headroom. For CNB, this meant expanding the workfile database from 8 GB to 35 GB — a significant allocation, but the parallel batch time savings paid for the DASD in the first week.
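Kwame's rule of thumb is a one-line computation; the figures below are illustrative, not CNB's actual measurements:

```python
def workfile_size_gb(largest_serial_workfile_gb, max_degree, headroom=0.5):
    """Kwame's rule: largest single-query workfile requirement from serial
    batch, times the PARAMDEG cap, plus 50% headroom."""
    return largest_serial_workfile_gb * max_degree * (1 + headroom)

# Hypothetical shop: 4 GB worst-case serial workfile, PARAMDEG capped at 6
print(workfile_size_gb(largest_serial_workfile_gb=4, max_degree=6))   # 36.0
```

The multiplication by the degree cap is the part shops most often miss: each parallel task can demand its own workfile allocation simultaneously.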

7. Buffer pool separation for parallel batch. Consider dedicating a buffer pool (e.g., BP3) exclusively for parallel batch workloads. This prevents parallel batch from evicting pages needed by online transactions and vice versa. At Federal Benefits, Marcus discovered that parallel batch queries were polluting the shared buffer pool BP0 with sequential scan pages, causing CICS transaction buffer pool hit ratios to drop from 98% to 72% during the batch window. Creating a dedicated BP3 for batch with ALTER BUFFERPOOL(BP3) VPSIZE(150000) and reassigning batch tablespaces to BP3 resolved the conflict entirely.

Monitoring parallelism: DB2 provides accounting trace class 3 records that show actual parallelism achieved versus requested. The key fields are:

  • QXDEGAT — degree at which parallelism was attempted
  • QXDEGRD — degree to which parallelism was reduced (due to resource constraints)
  • QXREDRN — reason for reduction (buffer pool shortage, thread limit, etc.)

If QXDEGRD is consistently less than QXDEGAT, you have a resource bottleneck preventing full parallelism. The QXREDRN code tells you which resource to expand.
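The same comparison can be automated against an extract of the accounting records. A sketch, assuming the trace fields have been pulled into dictionaries keyed by the names above (the record layout and reason values here are hypothetical):

```python
from collections import Counter

# Hypothetical extract: one dict per query, using the field names from
# the text (QXDEGAT = attempted degree, QXDEGRD = degree actually run,
# QXREDRN = reason the degree was reduced).
records = [
    {"QXDEGAT": 8, "QXDEGRD": 8, "QXREDRN": None},
    {"QXDEGAT": 8, "QXDEGRD": 3, "QXREDRN": "BUFFERPOOL"},
    {"QXDEGAT": 8, "QXDEGRD": 3, "QXREDRN": "BUFFERPOOL"},
]

# Queries that ran below the attempted degree, grouped by reason
reduced = [r for r in records if r["QXDEGRD"] < r["QXDEGAT"]]
reasons = Counter(r["QXREDRN"] for r in reduced)
print(reasons.most_common(1))  # the resource to expand first
```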

Marcus at Federal Benefits discovered through accounting traces that their batch queries requested parallelism of 8 but consistently ran at 3 due to buffer pool constraints. Adding 100,000 pages to buffer pool BP2 increased actual parallelism to 7, reducing batch elapsed time by 40%.


25.5 Multi-Step Pipeline Design

Real batch processing is not a single parallel step. It is a pipeline: extract, transform, load, validate, report — with some steps that can run in parallel and others that must wait for predecessors. Designing these pipelines is batch architecture.

Pipeline Anatomy

A typical parallel batch pipeline has five phases:

Phase 1 — Setup: Analyze input data, compute partition boundaries, populate the partition control table, allocate partition-specific datasets. This phase is serial and runs once.

Phase 2 — Fan-Out (Split): Divide the input data into partition files. This can be a COBOL program that reads the input and writes to N output files, or it can be a DFSORT/ICETOOL operation. Alternatively, if the input is a DB2 table, there is no physical split — each partition simply queries its own key range.

Phase 3 — Parallel Processing: Run N copies of the processing program, one per partition. All run concurrently. This is where the elapsed time savings occur.

Phase 4 — Fan-In (Merge): Combine the partition outputs into a single consolidated output. This may involve sorting, deduplication, reconciliation, or simple concatenation.

Phase 5 — Finalization: Post-processing that must run after all partitions complete: summary reporting, control total validation, status updates, trigger downstream jobs.

JCL for Parallel Pipelines

Neither JES2 nor JES3 runs steps within a job concurrently, so the common approach uses a scheduling product (TWS/OPC, CA-7, Control-M) to run separate jobs in parallel:

JOB ACCTPROC-SETUP     (serial, runs first)
  |
  +-- JOB ACCTPROC-P01  (parallel, starts after SETUP)
  +-- JOB ACCTPROC-P02  (parallel, starts after SETUP)
  +-- JOB ACCTPROC-P03  (parallel, starts after SETUP)
  +-- JOB ACCTPROC-P04  (parallel, starts after SETUP)
  |
  +-- All four complete
  |
JOB ACCTPROC-MERGE     (serial, starts after all P* jobs)
  |
JOB ACCTPROC-REPORT    (serial, starts after MERGE)

Within a single JCL job, you cannot run steps in parallel (JES executes steps sequentially within a job). Parallel execution requires separate jobs coordinated by an external scheduler.

However, you can achieve a limited form of intra-job parallelism using started tasks or ATTACH/DETACH in assembler. These approaches are rare in production COBOL batch and add complexity that most shops avoid.

Fan-Out Design Patterns

Pattern 1: Pre-split with DFSORT ICETOOL. ICETOOL can split a file by key ranges with one operator per partition, each selecting its range through an INCLUDE condition in its control statements:

//SPLIT    EXEC PGM=ICETOOL
//TOOLMSG  DD SYSOUT=*
//DFSMSG   DD SYSOUT=*
//IN       DD DSN=PROD.DAILY.TRANS,DISP=SHR
//OUT1     DD DSN=PROD.TRANS.PART01,DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(50,10),RLSE)
//OUT2     DD DSN=PROD.TRANS.PART02,DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(50,10),RLSE)
//OUT3     DD DSN=PROD.TRANS.PART03,DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(50,10),RLSE)
//OUT4     DD DSN=PROD.TRANS.PART04,DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(50,10),RLSE)
//TOOLIN   DD *
  SORT FROM(IN) TO(OUT1) USING(SPL1)
  SORT FROM(IN) TO(OUT2) USING(SPL2)
  SORT FROM(IN) TO(OUT3) USING(SPL3)
  SORT FROM(IN) TO(OUT4) USING(SPL4)
/*

Each USING card references a control file that specifies the INCLUDE condition for that partition. This approach reads the input four times — once per partition — which is inefficient for large files.
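A single-pass alternative stays within DFSORT: one SORT FIELDS=COPY step with one OUTFIL statement per partition reads the input once and routes each record by its INCLUDE condition. A sketch, assuming the account number occupies columns 1-7 and using illustrative split points (both are assumptions, not taken from the SPL* control files):

```jcl
//SPLIT1P  EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=PROD.DAILY.TRANS,DISP=SHR
//OUT1     DD DSN=PROD.TRANS.PART01,DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(50,10),RLSE)
//OUT2     DD DSN=PROD.TRANS.PART02,DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(50,10),RLSE)
//OUT3     DD DSN=PROD.TRANS.PART03,DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(50,10),RLSE)
//OUT4     DD DSN=PROD.TRANS.PART04,DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(50,10),RLSE)
//SYSIN    DD *
  SORT FIELDS=COPY
  OUTFIL FNAMES=OUT1,INCLUDE=(1,7,CH,LT,C'2500000')
  OUTFIL FNAMES=OUT2,INCLUDE=(1,7,CH,GE,C'2500000',AND,1,7,CH,LT,C'5000000')
  OUTFIL FNAMES=OUT3,INCLUDE=(1,7,CH,GE,C'5000000',AND,1,7,CH,LT,C'7500000')
  OUTFIL FNAMES=OUT4,SAVE
/*
```

OUTFIL SAVE catches every record not selected by the other OUTFIL statements, so no record can be dropped by an off-by-one in the split points.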

Pattern 2: Single-pass COBOL splitter. A COBOL program reads the input once and writes to N output files based on the partition key:

       2000-SPLIT-RECORDS.
           READ INPUT-FILE INTO WS-INPUT-RECORD
              AT END MOVE 'Y' TO WS-EOF-FLAG
           END-READ
           IF WS-EOF-FLAG = 'N'
              EVALUATE TRUE
                WHEN WS-ACCT-NUM < WS-SPLIT-POINT(1)
                   WRITE PART01-RECORD FROM WS-INPUT-RECORD
                WHEN WS-ACCT-NUM < WS-SPLIT-POINT(2)
                   WRITE PART02-RECORD FROM WS-INPUT-RECORD
                WHEN WS-ACCT-NUM < WS-SPLIT-POINT(3)
                   WRITE PART03-RECORD FROM WS-INPUT-RECORD
                WHEN OTHER
                   WRITE PART04-RECORD FROM WS-INPUT-RECORD
              END-EVALUATE
              ADD 1 TO WS-RECORDS-WRITTEN
           END-IF
           .

Pattern 3: DB2 query partitioning (no physical split). When the input is a DB2 table, skip the split entirely. Each partition's COBOL program queries its own key range:

SELECT ACCOUNT_NUM, BALANCE, TRANS_COUNT
FROM ACCOUNT_MASTER
WHERE ACCOUNT_NUM BETWEEN :WS-LOW-KEY AND :WS-HIGH-KEY
ORDER BY ACCOUNT_NUM

This is the cleanest approach and eliminates the fan-out step entirely. DB2 partition pruning ensures each program reads only its assigned partitions of the tablespace.

Fan-In / Merge Design Patterns

Pattern 1: DFSORT merge. If partition outputs are sorted, use DFSORT MERGE to combine them in a single pass:

//MERGE   EXEC PGM=SORT
//SYSOUT  DD SYSOUT=*
//SORTIN01 DD DSN=PROD.ACCTPROC.PART01.OUTPUT,DISP=SHR
//SORTIN02 DD DSN=PROD.ACCTPROC.PART02.OUTPUT,DISP=SHR
//SORTIN03 DD DSN=PROD.ACCTPROC.PART03.OUTPUT,DISP=SHR
//SORTIN04 DD DSN=PROD.ACCTPROC.PART04.OUTPUT,DISP=SHR
//SORTOUT  DD DSN=PROD.ACCTPROC.MERGED.OUTPUT,
//            DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(200,40),RLSE)
//SYSIN    DD *
  MERGE FIELDS=(1,10,CH,A)
/*

MERGE is dramatically faster than SORT because the inputs are already sorted. A k-way merge is O(N log k) — effectively linear for a handful of partitions — rather than the O(N log N) of a full sort.
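The fan-in can be seen in miniature with a k-way merge: each sorted partition output is consumed exactly once, and at every step the smallest head-of-stream record is emitted. A sketch in Python (the account numbers are made up):

```python
import heapq

# Four sorted partition outputs (hypothetical account numbers)
part1 = [1000001, 1000007, 1000012]
part2 = [2500003, 2500004]
part3 = [5000002, 5000009, 5000011]
part4 = [7500005]

# heapq.merge performs a single-pass k-way merge over already-sorted
# inputs; nothing is re-sorted, which is why MERGE beats SORT.
merged = list(heapq.merge(part1, part2, part3, part4))
print(merged == sorted(part1 + part2 + part3 + part4))  # True
```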

Pattern 2: Concatenation (unsorted). If output order does not matter, simply concatenate the partition files:

//CONCAT  DD DSN=PROD.ACCTPROC.PART01.OUTPUT,DISP=SHR
//        DD DSN=PROD.ACCTPROC.PART02.OUTPUT,DISP=SHR
//        DD DSN=PROD.ACCTPROC.PART03.OUTPUT,DISP=SHR
//        DD DSN=PROD.ACCTPROC.PART04.OUTPUT,DISP=SHR

Pattern 3: DB2 is the merge. If each partition updated different rows in a DB2 table, there is no merge step. The table itself is the consolidated output. This is the simplest and most common pattern for DB2-centric batch.

Reconciliation

After merging, you must verify that the parallel execution produced correct results. Standard reconciliation checks:

Record count: Sum of partition record counts must equal the total input count. The setup job counts the input; each partition reports its count; the merge job validates.

Control totals: Hash totals, amount totals, and check digits computed by each partition are summed and compared against totals from the input.

Gap/overlap detection: For key-range partitioning, verify that no keys were missed (gaps) or processed twice (overlaps). The merge step checks that the minimum key of partition N+1 is exactly one greater than the maximum key of partition N.

Lisa at CNB built a reconciliation framework that runs after every parallel batch job. It queries the partition control table, sums the ACTUAL_COUNT columns, compares against the known input count, and produces a reconciliation report. Any discrepancy triggers an alert and halts downstream processing.
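The gap/overlap rule is simple enough to sketch directly. Assuming dense integer keys and boundaries stored as inclusive (low, high) pairs in partition order (the values below are hypothetical):

```python
# Partition boundaries as inclusive (low_key, high_key) pairs in
# partition order — a hypothetical four-way split of a 7-digit key space
bounds = [(1, 2500000), (2500001, 5000000),
          (5000001, 7500000), (7500001, 9999999)]

def check_coverage(bounds):
    """Gap/overlap detection: the low key of partition N+1 must be
    exactly one greater than the high key of partition N."""
    problems = []
    for n in range(len(bounds) - 1):
        step = bounds[n + 1][0] - bounds[n][1]
        if step > 1:
            problems.append(f"gap after partition {n + 1}")
        elif step < 1:
            problems.append(f"overlap between partitions {n + 1} and {n + 2}")
    return problems

print(check_coverage(bounds))  # [] means complete, non-overlapping coverage
```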


25.6 SORT Parallelism and DFSORT Tricks

DFSORT (and its compatible replacements SYNCSORT and MFISORT) provides its own parallelism capabilities that complement application-level partitioning.

DFSORT Parallel Sort

DFSORT can sort a dataset using multiple concurrent I/O operations. The key parameter is MAINSIZE:

//SYSIN DD *
  SORT FIELDS=(1,10,CH,A)
  OPTION MAINSIZE=MAX
/*

MAINSIZE=MAX tells DFSORT to use as much virtual storage as available. More memory means more data can be sorted in-memory, reducing I/O passes. DFSORT automatically uses parallel I/O when multiple SORTWK datasets are allocated on different volumes:

//SORTWK01 DD UNIT=SYSDA,SPACE=(CYL,(100)),VOL=SER=VOL001
//SORTWK02 DD UNIT=SYSDA,SPACE=(CYL,(100)),VOL=SER=VOL002
//SORTWK03 DD UNIT=SYSDA,SPACE=(CYL,(100)),VOL=SER=VOL003
//SORTWK04 DD UNIT=SYSDA,SPACE=(CYL,(100)),VOL=SER=VOL004

Four SORTWK datasets on four volumes enables four concurrent I/O streams during the sort phase. DFSORT manages the parallelism internally — you just provide the work files.

Hiperspace Sorting

DFSORT can use hiperspaces (data-in-memory objects on z/OS) as sort work areas, avoiding DASD I/O entirely:

//SYSIN DD *
  SORT FIELDS=(1,10,CH,A)
  OPTION HIPRMAX=OPTIMAL
/*

HIPRMAX=OPTIMAL lets DFSORT use as many hiperspace pages as it determines beneficial. For sorts that fit in available memory, this eliminates all sort I/O — the sort runs entirely in memory at CPU speed.

Carlos at SecureFirst uses hiperspace sorting for the fraud detection batch. The input dataset is 2 GB — large enough to benefit from sorting but small enough to fit in a hiperspace. The sort step dropped from 8 minutes (DASD-based) to 45 seconds (hiperspace-based).

ICETOOL Parallel Operations

ICETOOL can perform multiple operations in a single invocation:

//TOOLIN DD *
  SORT FROM(INPUT) TO(SORTED) USING(CTL1)
* CTL2 CONTAINS: INCLUDE COND=(45,10,ZD,GT,1000000)
  COPY FROM(SORTED) TO(HIGHVAL) USING(CTL2)
  STATS FROM(SORTED) ON(45,10,ZD)
  OCCUR FROM(SORTED) LIST(COUNTS) -
    ON(55,2,CH) -
    TITLE('RECORDS BY STATUS')
/*

While ICETOOL operations within a single invocation run serially, ICETOOL's strength for parallel batch is performing multiple transformations in a single job step. Instead of running four separate sort jobs, one ICETOOL invocation can sort, filter, calculate statistics, and produce counts — passing over the raw input only once.

IDCAMS REPRO for Partition Split

IDCAMS REPRO with FROMKEY/TOKEY is a zero-code way to split a VSAM KSDS into partitions:

//SPLIT1   EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//INPUT    DD DSN=PROD.ACCOUNT.MASTER,DISP=SHR
//OUTPUT   DD DSN=PROD.ACCOUNT.PART01,DISP=(NEW,CATLG)
//SYSIN    DD *
  REPRO INFILE(INPUT) OUTFILE(OUTPUT) -
    FROMKEY(0000001) TOKEY(2500000)
/*

Four REPRO steps — one per partition — can run as four parallel jobs. Each reads only its key range from the KSDS, which is efficient because VSAM positions directly to the starting key.

Parallel SORT in Practice

A practical pattern combining DFSORT parallelism with application-level partitioning:

  1. Pre-sort split: Use DFSORT OUTFIL with STARTREC/ENDREC to split a large flat file into N pieces by record number (not key). This is the fastest split method for flat files.

  2. Parallel sort: Run N parallel DFSORT jobs, each sorting its piece. Each job uses its own SORTWK volumes and hiperspace allocation.

  3. Merge: Use DFSORT MERGE to combine the N sorted pieces into a single sorted output.

This three-step pattern sorts a 100-million-record file in roughly one-quarter the time of a single sort, assuming four partitions with adequate I/O paths.

Rob at CNB uses this pattern for the end-of-day transaction sort. The raw transaction file contains 15 million records and must be sorted by account number for the posting programs. A single sort takes 25 minutes. The four-way parallel sort + merge takes 8 minutes — a savings that moves the posting programs 17 minutes earlier in the batch schedule.


25.7 Monitoring and Troubleshooting Parallel Batch

Parallel batch introduces failure modes that do not exist in serial processing. Monitoring and troubleshooting must account for partition interdependence, resource contention, and partial failure.

Monitoring Framework

Real-time partition tracking. The partition control table provides a live dashboard:

SELECT PARTITION_NUM, STATUS,
       ACTUAL_COUNT, EXPECTED_COUNT,
       TIMESTAMPDIFF(2, CHAR(END_TIME - START_TIME)) AS ELAPSED_SEC
FROM BATCH.PARTITION_CONTROL
WHERE JOB_NAME = 'ACCTPROC'
  AND RUN_DATE = CURRENT DATE
ORDER BY PARTITION_NUM;

A monitoring job or REXX script can poll this table every 60 seconds and alert if:

  • Any partition has been running longer than 150% of its expected time
  • Any partition has failed (STATUS = 'F')
  • Record counts deviate more than 10% from expected counts
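The three alert conditions translate directly into code. A sketch of the polling check, assuming the control-table rows have been fetched into dictionaries (the column-to-key mapping is illustrative):

```python
def partition_alerts(rows, expected_elapsed_sec):
    """Apply the three alert rules to partition-control rows."""
    alerts = []
    for r in rows:
        if r["STATUS"] == "F":
            alerts.append(f"partition {r['PARTITION_NUM']} failed")
        if (r["ELAPSED_SEC"] is not None
                and r["ELAPSED_SEC"] > 1.5 * expected_elapsed_sec):
            alerts.append(f"partition {r['PARTITION_NUM']} over 150% of expected time")
        if abs(r["ACTUAL_COUNT"] - r["EXPECTED_COUNT"]) > 0.10 * r["EXPECTED_COUNT"]:
            alerts.append(f"partition {r['PARTITION_NUM']} count off by more than 10%")
    return alerts

# Hypothetical rows: partition 1 healthy, 2 slow, 3 failed mid-run
rows = [
    {"PARTITION_NUM": 1, "STATUS": "C", "ELAPSED_SEC": 2700,
     "ACTUAL_COUNT": 500000, "EXPECTED_COUNT": 500000},
    {"PARTITION_NUM": 2, "STATUS": "C", "ELAPSED_SEC": 4300,
     "ACTUAL_COUNT": 498000, "EXPECTED_COUNT": 500000},
    {"PARTITION_NUM": 3, "STATUS": "F", "ELAPSED_SEC": None,
     "ACTUAL_COUNT": 120000, "EXPECTED_COUNT": 500000},
]
for alert in partition_alerts(rows, expected_elapsed_sec=2700):
    print(alert)
```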

SMF records. Type 30 (job accounting) records for each partition job provide CPU time, I/O counts, and elapsed time. Compare partition-to-partition — significant imbalance indicates uneven data distribution.

DB2 statistics. DB2 statistics trace (class 1 and class 3) for batch threads shows:

  • Lock wait time (should be minimal if partitions are aligned)
  • Buffer pool hit ratios (should be consistent across partitions)
  • Getpages per commit (high values indicate inefficient access paths)
  • Parallelism reduction (QXDEGRD < QXDEGAT)

Common Failure Modes

1. Partition skew. One partition takes three times longer than the others. The pipeline is gated by the slowest partition, so the parallel benefit is reduced. Diagnosis: check record counts per partition. Solution: recompute partition boundaries based on actual data distribution, not assumptions.
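Recomputing boundaries from the actual distribution is a cumulative-count walk. A sketch, assuming you can obtain sorted (key, record_count) pairs — for example from a GROUP BY on a leading key prefix (the counts below are invented to show skew):

```python
def split_points(key_counts, partitions):
    """Choose upper-bound keys so each partition carries roughly an
    equal share of records. key_counts: sorted (key, count) pairs."""
    total = sum(count for _, count in key_counts)
    target = total / partitions
    points, running, next_cut = [], 0, target
    for key, count in key_counts:
        running += count
        if running >= next_cut and len(points) < partitions - 1:
            points.append(key)
            next_cut += target
    return points  # partitions 1..N-1 end at these keys; the last takes the rest

# Skewed distribution: half the records share one key prefix
counts = [(1, 50), (2, 10), (3, 10), (4, 10), (5, 10), (6, 10)]
print(split_points(counts, 4))  # [1, 2, 4]
```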

2. Lock contention between partitions. Partitions updating different key ranges in the same tablespace should not contend — but if index pages are shared, or if lock escalation occurs, cross-partition locking happens. Diagnosis: DB2 statistics class 3, lock wait time per thread. Solution: ensure LOCKSIZE ROW, increase NUMLKTS (lock escalation threshold), align batch partitions with DB2 table partitions.

3. I/O bottleneck on shared volumes. Four partitions reading four different datasets that reside on the same DASD volume serialize their I/O. Diagnosis: RMF I/O reports showing response time spikes on specific volumes. Solution: spread partition datasets across different volumes, use SMS managed storage with striping.

4. Partial failure with downstream dependency. Partition 3 of 4 fails. The merge job cannot run because it requires all four partition outputs. The downstream reporting job cannot run because it requires the merge output. A single partition failure cascades through the pipeline. Solution: the merge job checks partition status before starting. If any partition failed, the merge job ends with a warning return code and triggers the restart procedure for the failed partition only.

5. Duplicate processing on restart. Partition 3 fails after processing 60% of its records. On restart, it reprocesses from the last checkpoint — but records between the checkpoint and the failure point may have been partially committed to DB2. Solution: idempotent processing (updates that produce the same result whether applied once or twice) or explicit duplicate checking on restart.
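The distinction is easiest to see in the UPDATE statements themselves. A hedged sketch (table and column names are illustrative, not the project schema): an additive update applied twice double-posts, while a status-guarded update is a no-op on replay:

```sql
-- NOT idempotent: replaying this after a restart double-posts the amount
UPDATE ACCOUNT_MASTER
   SET BALANCE = BALANCE + :WS-TRANS-AMT
 WHERE ACCOUNT_NUM = :WS-ACCT-NUM;

-- Idempotent: the status predicate makes a second application a no-op
UPDATE TRANS_DETAIL
   SET STATUS = 'POSTED'
 WHERE TRANS_ID = :WS-TRANS-ID
   AND STATUS <> 'POSTED';
```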

6. Resource exhaustion. Running eight partition jobs simultaneously consumes eight DB2 threads, eight initiators, eight sets of DASD allocations, and eight times the virtual storage. If the system cannot support this, jobs wait for resources, reducing or eliminating the parallelism benefit. Diagnosis: JES initiator waits, DB2 thread waits, DASD allocation waits. Solution: reduce partition count to match available resources.

Troubleshooting Playbook

When a parallel batch run has problems, work through this checklist:

  1. Identify the failure. Which partitions failed? What return codes? Check JESMSGLG, JESYSMSG, and SYSOUT for each partition job.

  2. Check for data issues. Did the splitter produce correct partition files? Record counts match expectations? Any empty partitions?

  3. Check for resource contention. Pull RMF reports for the batch window. Look for I/O response time spikes, CPU queue lengths, paging rates.

  4. Check DB2 health. Thread waits, lock timeouts, deadlocks, buffer pool shortages. The DB2 operator command -DIS THREAD(*) shows active batch threads.

  5. Check for environmental issues. DASD full, catalog contention, HSM recalls, tape mount waits. These affect parallel jobs multiplicatively — one stuck tape mount blocks one job, but if that job is on the critical path, it blocks the entire pipeline.

  6. Determine restart strategy. Can you restart only the failed partition? Or did the failure corrupt shared state (DB2 tables, control records) that requires a full rerun? The partition control table and checkpoint data inform this decision.

Performance Baselines

Establish baselines for parallel batch jobs:

Metric                         Baseline                        Alert threshold
Total elapsed time             Record average over 30 days     > 120% of average
Longest partition elapsed      Record per-partition averages   > 150% of partition average
Partition imbalance ratio      Max/min elapsed time ratio      > 1.5:1
DB2 lock wait time (total)     Record average                  > 200% of average
I/O rate per partition         Record average                  < 50% of average (bottleneck)
Reconciliation delta           Should be 0                     Any non-zero value

Kwame at CNB reviews these metrics weekly. The partition imbalance ratio crept from 1.1:1 to 1.8:1 over three months as account distribution shifted. He recomputed partition boundaries and restored the ratio to 1.15:1, recovering 12 minutes of batch window.

Trend analysis for capacity planning. Beyond daily monitoring, track parallel batch metrics over months. If total elapsed time is increasing 2% per month, you know the current partition count will be insufficient within a year. If partition skew is worsening despite stable partition boundaries, the underlying data distribution is shifting — perhaps one region is growing faster than others. This trend data informs the decision to add partitions proactively rather than reactively when the batch window overruns.


25.8 Progressive Project: HA Banking System — Parallel Batch Architecture

The HA Banking Transaction Processing System processes 8 million transactions per day. The end-of-day batch window is 4 hours. Serial processing takes 6.5 hours. You must design and implement a parallel batch architecture.

Requirements

  1. Transaction posting must complete within 90 minutes. Currently takes 180 minutes serially.
  2. Interest calculation must complete within 45 minutes. Currently takes 80 minutes serially.
  3. Statement generation must complete within 60 minutes. Currently takes 120 minutes serially.
  4. Fraud scoring must complete within 30 minutes. Currently takes 40 minutes serially (already fast, but must be faster).
  5. All jobs must support partition-level checkpoint/restart.
  6. The total pipeline must complete within the 4-hour batch window.

Design Tasks

Task 1: Partition Strategy. Design the partition scheme for each job:

  • Transaction posting: 4 partitions by account number range (2M accounts, 500K per partition)
  • Interest calculation: 6 partitions by account number range (aligned with DB2 table partitions)
  • Statement generation: 8 partitions by account number hash (output order irrelevant)
  • Fraud scoring: 2 partitions (low volume, parallelism for resilience more than speed)

Populate a partition control table for each job. Compute balanced split points using account distribution data.

Task 2: Pipeline Design. Design the end-of-day pipeline with parallel streams:

SETUP (5 min)
  |
  +-- Transaction Posting P1-P4 (45 min each, parallel)
  |
POSTING-MERGE (10 min)
  |
  +-- Interest Calc P1-P6 (12 min each, parallel)
  |   |
  |   +-- Statement Gen P1-P8 (15 min each, parallel)
  |
  +-- Fraud Scoring P1-P2 (18 min each, parallel)
  |
FINAL-RECONCILIATION (10 min)

Note that statement generation starts after interest calculation (because statements include interest), but fraud scoring can run in parallel with interest calculation (it operates on transaction data, not balance data).

Critical path: SETUP (5) + POSTING (45) + MERGE (10) + INTEREST (12) + STATEMENT (15) + RECONCILIATION (10) = 97 minutes. Well within the 4-hour window, with margin for partition imbalance and retries.
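That arithmetic is the longest path through the dependency graph, which is worth automating once durations start changing. A sketch using the durations above (step names are shorthand for the jobs in the diagram):

```python
from functools import cache

# step -> (duration_min, predecessors); each group of parallel partitions
# collapses to one node, since its elapsed time is the slowest copy's time
steps = {
    "SETUP":     (5,  []),
    "POSTING":   (45, ["SETUP"]),
    "MERGE":     (10, ["POSTING"]),
    "INTEREST":  (12, ["MERGE"]),
    "STATEMENT": (15, ["INTEREST"]),
    "FRAUD":     (18, ["MERGE"]),
    "RECON":     (10, ["STATEMENT", "FRAUD"]),
}

@cache
def finish_time(step: str) -> int:
    """Earliest completion: own duration plus the slowest predecessor."""
    duration, preds = steps[step]
    return duration + max((finish_time(p) for p in preds), default=0)

print(finish_time("RECON"))  # 97 — matches the critical path arithmetic
```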

Task 3: Implement Partition-Safe Transaction Posting. Code the COBOL program for transaction posting with:

  • Partition parameter reading from the control table
  • Key-range filtering for the assigned partition
  • Checkpoint every 1,000 transactions
  • Deadlock handling with retry
  • Commit every 500 updates
  • Completion status update to the control table

Task 4: Implement the Reconciliation Job. Code the reconciliation program that:

  • Reads all partition control table entries
  • Validates all partitions completed successfully
  • Sums record counts and compares to input count
  • Reports partition elapsed times and imbalance ratios
  • Sets return code 0 (all OK), 4 (warnings), or 8 (failures detected)

Deliverables

  • Pipeline dependency diagram
  • JCL for the complete pipeline (SETUP, four POSTING jobs, MERGE, six INTEREST jobs, eight STATEMENT jobs, two FRAUD jobs, RECONCILIATION)
  • Partition-safe COBOL for transaction posting with checkpoint/restart
  • Reconciliation COBOL program
  • Partition control table DDL and population SQL
  • Performance baseline targets for each parallel step

This project checkpoint builds directly on the checkpoint/restart framework from Chapter 24 and the batch window analysis from Chapter 23. The transaction posting program extends the basic posting program from Chapter 23 with partition awareness and parallel safety.


Chapter Summary

Parallel batch processing is not an optimization — it is a survival strategy. Every mainframe shop with growing volumes and shrinking batch windows must parallelize. The three levels of parallelism — application partitioning, DB2 internal parallelism, and SORT parallelism — combine multiplicatively.

The engineering discipline is in the details: balanced partition boundaries, resource isolation, deadlock handling, partition-level checkpoint/restart, and rigorous reconciliation. Get these right, and your six-hour serial batch runs in two hours with headroom. Get them wrong, and you have six hours of serial batch plus two hours of debugging parallel failures.

The partition control table is the linchpin. It provides operational visibility, enables partition-level restart, supports reconciliation, and documents the parallel execution for audit. Every parallel batch design should start with this table.

In the next chapter, we will examine batch performance tuning — how to optimize individual programs and system resources to squeeze more throughput from each partition, making your parallel batch even faster.


Key Terms

Parallel processing: Executing multiple tasks simultaneously to reduce elapsed time
Partitioned processing: Splitting input data and running multiple copies of a program, each processing a different subset
Partition key: The data element used to divide records among partitions (account number, region code, hash value)
Partition-safe: A program that can run as multiple concurrent copies without interference or data corruption
Multi-step pipeline: A sequence of batch jobs with dependency relationships, some running in parallel and some serially
Fan-out: The point in a pipeline where a single stream splits into multiple parallel streams
Fan-in: The point in a pipeline where multiple parallel streams converge back into a single stream
DB2 I/O parallelism: DB2's ability to issue multiple concurrent I/O requests for a single query
DB2 CP parallelism: DB2's ability to use multiple engines for scanning, sorting, and aggregating within a single query
DB2 Sysplex parallelism: DB2's ability to distribute query processing across multiple DB2 members in a data sharing group
DEGREE: The BIND parameter that enables DB2 parallelism; DEGREE(ANY) allows DB2 to choose the parallelism level
GBP (Group Buffer Pool): Coupling facility structure used for cross-member buffer coherency in DB2 data sharing
Parallel SORT: DFSORT's ability to use multiple I/O streams and hiperspace for concurrent sort operations
ICETOOL: DFSORT utility that performs multiple operations (sort, select, statistics, count) in a controlled sequence
DFSORT/SYNCSORT: IBM and vendor sort utilities that provide parallel processing capabilities for sort, merge, and copy
IDCAMS REPRO FROMKEY/TOKEY: IDCAMS command to copy a range of records from a VSAM KSDS, useful for partition splitting