Chapter 28 Quiz: Batch Processing Patterns and Design

Test your understanding of batch processing patterns and design principles for COBOL mainframe programs. Each question is followed by a hidden answer -- try to answer before revealing it.


Question 1

What are the three phases that every well-structured batch program follows?

Show Answer Every batch program follows the **Read-Process-Write** pattern with three phases: 1. **Initialization**: Open all files, read control cards or parameter files, initialize accumulators and counters, write report headers, and record the start time. 2. **Processing**: The main loop reads each input record, applies business logic (validation, transformation, calculation), and writes output. Uses the "read-ahead" technique where the first record is read before entering the loop. 3. **Finalization**: Write report footers and grand totals, close all files, display processing statistics (records read, written, rejected), and set the return code. This three-phase structure is universal across virtually all COBOL batch programs. Experienced programmers can navigate any batch program by looking for these landmarks.
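In code, the skeleton below is a minimal sketch of that three-phase shape (paragraph names and data items such as WS-COUNTERS are illustrative, not taken from a specific program):
0000-MAINLINE.
    PERFORM 1000-INITIALIZE.
    PERFORM 2000-PROCESS-RECORDS
        UNTIL WS-EOF = 'Y'.
    PERFORM 3000-FINALIZE.
    STOP RUN.

1000-INITIALIZE.
* Open files, load parameters, zero accumulators, write headers,
* then do the priming read (see Question 2)
    OPEN INPUT  TRANS-FILE
         OUTPUT REPORT-FILE.
    INITIALIZE WS-COUNTERS.
    PERFORM 8000-READ-RECORD.

3000-FINALIZE.
* Write totals, close files, report statistics
    PERFORM 3100-WRITE-GRAND-TOTALS.
    CLOSE TRANS-FILE REPORT-FILE.
    DISPLAY 'RECORDS READ:    ' WS-RECORDS-READ.
    DISPLAY 'RECORDS WRITTEN: ' WS-RECORDS-WRITTEN.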

Question 2

In the "read-ahead" technique, when is the first input record read?

A) At the beginning of the processing loop
B) Before entering the processing loop
C) After the first iteration of the processing loop
D) During the initialization phase, before opening the output file

Show Answer **B) Before entering the processing loop** The read-ahead technique reads the first record before entering the main processing loop:
PERFORM 1000-READ-RECORD.
PERFORM 2000-PROCESS-RECORDS
    UNTIL WS-EOF = 'Y'.
PERFORM 3000-FINALIZE.
Within the processing loop, the next record is read at the **end** of each iteration, after processing the current record:
2000-PROCESS-RECORDS.
    PERFORM 2100-PROCESS-CURRENT-RECORD.
    PERFORM 1000-READ-RECORD.
This ensures clean end-of-file handling: when EOF is detected, the loop exits immediately without processing garbage data. It also handles empty files correctly (the first read sets EOF, and the loop never executes).

Question 3

What is the balanced line algorithm, and what is its key prerequisite?

Show Answer The **balanced line algorithm** (also called the matching record technique) is the standard pattern for sequential master file updates. It processes two sorted files in a single pass, comparing keys to determine the action: - **Master key < Transaction key**: Write master unchanged; read next master - **Master key = Transaction key**: Apply transaction to master; read both - **Master key > Transaction key**: Process unmatched transaction (add or error); read next transaction The **key prerequisite** is that **both files must be sorted on the same key field** in the same order (typically ascending). Without this, the algorithm cannot correctly identify matching and unmatched records. The algorithm uses HIGH-VALUES sentinels: when either file reaches end-of-file, its key is set to HIGH-VALUES (all-ones, which sorts higher than any real key). Processing continues until both keys equal HIGH-VALUES.
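A minimal sketch of the key comparison at the heart of the loop is shown below (paragraph and data names follow the snippets in Questions 9, 12, and 22 but are still illustrative; the read paragraphs are assumed to set each key to HIGH-VALUES at end-of-file):
2000-MATCH-RECORDS.
    EVALUATE TRUE
        WHEN WS-MASTER-KEY < WS-TRANS-KEY
* Master with no transactions: copy it forward unchanged
            WRITE NEW-MASTER-RECORD FROM OLD-MASTER-RECORD
            PERFORM 1100-READ-MASTER
        WHEN WS-MASTER-KEY = WS-TRANS-KEY
* Matching transactions: apply them all, then write the updated master
            PERFORM 2200-APPLY-TRANSACTIONS
        WHEN WS-MASTER-KEY > WS-TRANS-KEY
* Transaction with no master: add it or report an error
            PERFORM 2300-PROCESS-UNMATCHED-TRANS
            PERFORM 1200-READ-TRANS
    END-EVALUATE.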

Question 4

In a three-level control break program (Region, District, Salesperson), a Region break occurs. Which break paragraphs execute, and in what order?

Show Answer When a **Region break** (major break) occurs, all three break paragraphs execute in **minor-to-major** order: 1. **Salesperson break** (minor) -- Print salesperson subtotal, roll up to district accumulator, reset salesperson accumulator 2. **District break** (intermediate) -- Print district subtotal, roll up to region accumulator, reset district accumulator 3. **Region break** (major) -- Print region subtotal, roll up to grand total accumulator, reset region accumulator This cascade is essential because a higher-level break implies breaks at all lower levels. When the region changes, the previous salesperson's and district's totals must be finalized before the region total is printed. The logic:
EVALUATE TRUE
    WHEN INPUT-REGION NOT = WS-PREV-REGION
        PERFORM SALESPERSON-BREAK
        PERFORM DISTRICT-BREAK
        PERFORM REGION-BREAK
    WHEN INPUT-DISTRICT NOT = WS-PREV-DISTRICT
        PERFORM SALESPERSON-BREAK
        PERFORM DISTRICT-BREAK
    WHEN INPUT-SALESPERSON NOT = WS-PREV-SALESPERSON
        PERFORM SALESPERSON-BREAK
END-EVALUATE

Question 5

True or False: In a control break program, the final break at the end of the file is handled automatically by the COBOL runtime system.

Show Answer **False.** The programmer must explicitly force the final break after the processing loop ends. When the last record has been processed and end-of-file is detected, the loop exits, but the accumulators for the last group at every level still contain unprinted totals. The programmer must add code after the loop to trigger all break levels:
PERFORM 2000-PROCESS-RECORDS
    UNTIL WS-EOF = 'Y'.

* Force final breaks for the last group
IF WS-RECORDS-READ > 0
    PERFORM SALESPERSON-BREAK
    PERFORM DISTRICT-BREAK
    PERFORM REGION-BREAK
    PERFORM GRAND-TOTAL
END-IF.
Forgetting the final break is one of the most common bugs in control break programs. It causes the last group's subtotals to be omitted from the report.

Question 6

What is the purpose of checkpoint/restart in batch processing, and when should a checkpoint be written?

Show Answer **Checkpoint/restart** saves the program's state periodically so that after a failure, the program can resume from the last checkpoint rather than restarting from the beginning. This is critical for long-running jobs that process millions of records. A checkpoint record typically contains: - Current position in the input file (record count or key value) - All accumulators and counters - A timestamp - An eye-catcher or validation marker **When to write checkpoints:** - At regular intervals (every N records, such as every 100,000 records) - At natural break points (after completing a logical group of records) - Before and after critical operations The tradeoff is between checkpoint frequency and overhead: - **Too infrequent**: A failure requires reprocessing more records - **Too frequent**: Checkpoint I/O overhead slows normal processing A common guideline is to checkpoint at intervals that represent 5-15 minutes of processing time, ensuring that the maximum restart penalty is reasonable.
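As an illustration only (the record layout, names, and interval are invented for this sketch, not a standard), a checkpoint record and the paragraph that writes it might look like:
01  CHECKPOINT-RECORD.
    05  CKPT-EYE-CATCHER      PIC X(08).
    05  CKPT-LAST-KEY         PIC X(10).
    05  CKPT-RECORDS-READ     PIC 9(09)      COMP-3.
    05  CKPT-TOTAL-AMOUNT     PIC S9(13)V99  COMP-3.
    05  CKPT-TIMESTAMP        PIC X(21).

* From the main loop, for example every 100,000 records:
    IF FUNCTION MOD(WS-RECORDS-READ, 100000) = 0
        PERFORM 6000-TAKE-CHECKPOINT
    END-IF

6000-TAKE-CHECKPOINT.
    MOVE 'CHKPOINT'            TO CKPT-EYE-CATCHER.
    MOVE WS-CURRENT-KEY        TO CKPT-LAST-KEY.
    MOVE WS-RECORDS-READ       TO CKPT-RECORDS-READ.
    MOVE WS-TOTAL-AMOUNT       TO CKPT-TOTAL-AMOUNT.
    MOVE FUNCTION CURRENT-DATE TO CKPT-TIMESTAMP.
    WRITE CHECKPOINT-RECORD.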

Question 7

What are the standard mainframe batch return code conventions?

A) 0 = success, 1 = warning, 2 = error, 3 = critical
B) 0 = success, 4 = warning, 8 = error, 12 = severe, 16 = critical
C) 0 = success, 2 = info, 4 = warning, 8 = error, 16 = fatal
D) 0 = success, 1-7 = warning, 8-15 = error, 16+ = critical

Show Answer **B) 0 = success, 4 = warning, 8 = error, 12 = severe, 16 = critical** The standard mainframe return code conventions use multiples of 4:

| RC | Meaning | Action |
|---|---|---|
| 0 | Success | Continue with next step |
| 4 | Warning | Continue, but investigate (e.g., some records rejected, below threshold) |
| 8 | Error | Skip dependent steps (e.g., error threshold exceeded, processing unreliable) |
| 12 | Severe error | Abort the job (e.g., critical data failure) |
| 16 | Critical | Abort immediately, notify operations (e.g., file corruption, system error) |

These values are set in COBOL using:
MOVE 4 TO RETURN-CODE.
JCL COND parameters and IF/THEN/ELSE constructs test these return codes to control step execution. The multiples-of-4 convention is a long-standing IBM mainframe tradition, though it is a guideline rather than a technical requirement.

Question 8

What is the difference between a control total and a hash total?

Show Answer **Control total**: A meaningful sum of a business field, typically a monetary amount. The control total of all transaction amounts should balance between input and output. For example, if the input transactions total $1,250,000 in debits, the output should show the same total (adjusted for any adds or deletes). Control totals verify that no money was lost or created during processing. **Hash total**: A meaningless sum of a non-monetary numeric field, such as account numbers. The hash total has no business meaning -- you would never report "the sum of all account numbers is 47,234,891." Its purpose is to detect lost or duplicated records. If the hash total of account numbers differs between input and output, a record was lost, duplicated, or had its account number corrupted. Both totals are calculated during processing and compared to expected values (from a trailer record, a control file, or a previous run). Together with record counts, they form a three-way verification: - **Record count**: Correct number of records - **Control total**: Correct monetary amounts - **Hash total**: Correct record identity (no substitutions)
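A short sketch of how both totals are typically accumulated and then checked against the trailer (data names are illustrative; see Question 14 for a trailer layout):
01  WS-BALANCING-TOTALS.
    05  WS-RECORD-COUNT    PIC 9(09)     COMP-3 VALUE ZERO.
    05  WS-CONTROL-TOTAL   PIC S9(13)V99 COMP-3 VALUE ZERO.
    05  WS-HASH-TOTAL      PIC 9(18)     COMP-3 VALUE ZERO.

* For each detail record processed
    ADD 1                 TO WS-RECORD-COUNT.
    ADD TR-AMOUNT         TO WS-CONTROL-TOTAL.
* WS-HASH-TOTAL is sized so the sum of account numbers cannot overflow
    ADD TR-ACCOUNT-NUMBER TO WS-HASH-TOTAL.

* At end of file, compare against the trailer record
    IF WS-RECORD-COUNT  NOT = TRL-RECORD-COUNT
    OR WS-CONTROL-TOTAL NOT = TRL-CONTROL-TOTAL
    OR WS-HASH-TOTAL    NOT = TRL-HASH-TOTAL
        DISPLAY '*** FILE OUT OF BALANCE ***'
        MOVE 8 TO RETURN-CODE
    END-IF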

Question 9

True or False: In a sequential master file update, when a transaction has no matching master record and the transaction type is "Add", the program should write a new master record to the output file.

Show Answer **True.** In the balanced line algorithm, when the master key is greater than the transaction key (meaning there is no master record for this transaction), the program checks the transaction type: - **Add (A)**: Valid. Create a new master record from the transaction data and write it to the new master file. - **Change (C)**: Error. Cannot change a record that does not exist. Write to the error file. - **Delete (D)**: Error. Cannot delete a record that does not exist. Write to the error file.
WHEN WS-MASTER-KEY > WS-TRANS-KEY
    IF TR-TYPE = 'A'
        PERFORM CREATE-NEW-MASTER
    ELSE
        PERFORM WRITE-ERROR-NO-MASTER
    END-IF
    PERFORM READ-TRANSACTION
The new master record is written to the output file in the correct key sequence because the balanced line algorithm processes keys in ascending order.

Question 10

What does the "batch window" refer to, and why has it become a challenge?

Show Answer The **batch window** is the period -- typically overnight -- when batch processing runs. During the batch window, online systems may be unavailable or running with reduced functionality because batch jobs need exclusive access to critical files and databases. A typical batch window might be from midnight to 5:00 AM (5 hours). The batch window has become a challenge because: 1. **Online systems now run 24/7**: Banks, airlines, and retailers need their systems available around the clock. This compresses the batch window. 2. **Data volumes have grown**: More customers, more transactions, and more regulatory requirements mean more records to process. 3. **Global operations**: A multinational bank cannot shut down its systems at night because it is always "business hours" somewhere. 4. **Real-time expectations**: Customers expect instant account updates, not next-day batch processing. Solutions include: parallelizing batch jobs, optimizing I/O (block sizes, buffering), consolidating steps to reduce passes over data, moving some processing to near-real-time micro-batches, and upgrading hardware (faster CPUs, faster I/O subsystems).

Question 11

A batch program reads a header record, 1,000,000 detail records, and a trailer record. The trailer contains the expected record count of 1,000,000. The program counts 999,998 detail records. What should the program do?

Show Answer The program should:

1. **Report the discrepancy** clearly in the processing log:
```
*** RECORD COUNT MISMATCH ***
Expected (from trailer): 1,000,000
Actual (program count):    999,998
Difference:                      2
```
2. **Set the return code to 8 or higher** (Error), because 2 missing records could represent significant financial data loss.
3. **Write the discrepancy to an audit file** for investigation.
4. **Do not produce the output file** (or produce it but flag it as unreliable), because the input data integrity is compromised. Two records may have been lost during file transmission, during a copy operation, or due to a truncated transfer.
5. **Notify operations** so the source system can retransmit the file.

The 2-record discrepancy is small in percentage terms (0.0002%), but in banking, even a single missing transaction could represent a large financial impact. The program should never silently accept a count mismatch.
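A sketch of the end-of-run check that enforces this (names are illustrative; 9999-ABNORMAL-TERMINATION is assumed to write the audit record, close files, and stop the run):
3100-VERIFY-TRAILER-COUNT.
    IF WS-DETAIL-COUNT NOT = TRL-RECORD-COUNT
        DISPLAY '*** RECORD COUNT MISMATCH ***'
        DISPLAY 'EXPECTED: ' TRL-RECORD-COUNT
        DISPLAY 'ACTUAL:   ' WS-DETAIL-COUNT
        MOVE 8 TO RETURN-CODE
        PERFORM 9999-ABNORMAL-TERMINATION
    END-IF.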

Question 12

What technique is used to handle end-of-file in the balanced line algorithm when only one of the two input files has been exhausted?

Show Answer The **HIGH-VALUES sentinel** technique. When either the master file or the transaction file reaches end-of-file, its key variable is set to HIGH-VALUES (binary all-ones, which is greater than any possible real key value):
1100-READ-MASTER.
    READ MASTER-FILE INTO MASTER-RECORD
        AT END
            MOVE HIGH-VALUES TO WS-MASTER-KEY
    END-READ.

1200-READ-TRANS.
    READ TRANS-FILE INTO TRANS-RECORD
        AT END
            MOVE HIGH-VALUES TO WS-TRANS-KEY
    END-READ.
Processing continues with the other file. Since HIGH-VALUES is always greater than any real key: - If the master file is exhausted (WS-MASTER-KEY = HIGH-VALUES), all remaining transactions are "unmatched transactions" (master key > trans key path). - If the transaction file is exhausted (WS-TRANS-KEY = HIGH-VALUES), all remaining masters are "unmatched masters" (master key < trans key path) and are written unchanged. Processing stops when **both** keys equal HIGH-VALUES:
PERFORM UNTIL WS-MASTER-KEY = HIGH-VALUES
           AND WS-TRANS-KEY = HIGH-VALUES

Question 13

Which batch processing pattern is used to produce a report that shows total sales by department within each region?

A) Sequential master file update
B) Extract-Transform-Load (ETL)
C) Control break processing
D) Checkpoint/restart

Show Answer **C) Control break processing** Control break processing produces hierarchical reports with subtotals at each level of a grouping hierarchy. For "total sales by department within each region," there are two control break levels: - **Major break**: Region (when the region changes, print region subtotal) - **Minor break**: Department (when the department changes, print department subtotal) The input file must be sorted by region and then by department within each region. The program detects changes in these control fields and triggers the appropriate break processing. The other patterns serve different purposes: - Sequential master file update: Applying transactions to a master file - ETL: Moving and transforming data between systems - Checkpoint/restart: Recovery from failures in long-running jobs
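A minimal sketch of the two-level break test for this report (field names are illustrative; note that a region change also forces the department break, as in Question 4):
2100-CHECK-CONTROL-BREAKS.
    EVALUATE TRUE
        WHEN SALES-REGION NOT = WS-PREV-REGION
            PERFORM 2300-DEPARTMENT-BREAK
            PERFORM 2400-REGION-BREAK
        WHEN SALES-DEPARTMENT NOT = WS-PREV-DEPARTMENT
            PERFORM 2300-DEPARTMENT-BREAK
    END-EVALUATE.
    MOVE SALES-REGION     TO WS-PREV-REGION.
    MOVE SALES-DEPARTMENT TO WS-PREV-DEPARTMENT.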

Question 14

What is the purpose of a trailer record in a batch file?

Show Answer A **trailer record** is the last record in a batch file and serves as a verification mechanism. It typically contains: 1. **Record type indicator**: Usually 'T' to identify it as a trailer 2. **Record count**: The number of detail records in the file 3. **Control totals**: Sum of monetary amount fields (e.g., total debits, total credits) 4. **Hash totals**: Sum of key fields (e.g., sum of all account numbers) 5. **File creation date/time**: When the file was generated 6. **Source system identifier**: Which system produced the file The receiving program compares the trailer values to its own counts and totals calculated during processing. Any discrepancy indicates data integrity problems such as: - Lost records (count mismatch) - Corrupted data (control total mismatch) - Duplicated or substituted records (hash total mismatch) - Wrong file version (date mismatch) Trailer records are a critical audit control in financial batch processing. Regulatory auditors expect to see trailer verification in every data exchange.
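A representative trailer record layout, as a sketch only (field names, sizes, and order vary by installation):
01  TRAILER-RECORD.
    05  TRL-RECORD-TYPE       PIC X(01).
    05  TRL-RECORD-COUNT      PIC 9(09).
    05  TRL-TOTAL-DEBITS      PIC S9(13)V99.
    05  TRL-TOTAL-CREDITS     PIC S9(13)V99.
    05  TRL-HASH-TOTAL        PIC 9(18).
    05  TRL-CREATE-TIMESTAMP  PIC X(14).
    05  TRL-SOURCE-SYSTEM     PIC X(08).
The receiving program moves the last record into this layout when its first byte is 'T', then performs the count and total comparisons described above.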

Question 15

True or False: In a batch program, it is acceptable to process all records and only check for errors at the end of the run.

Show Answer **False.** Batch programs should validate each record as it is processed and handle errors immediately for several reasons: 1. **Threshold-based abort**: If errors exceed a configured threshold (e.g., 1% of records), the program should abort early rather than processing millions of additional records with potentially corrupt data. 2. **Error isolation**: Recording errors at the point of detection preserves context (the input record, the specific field that failed, the position in the file). 3. **Performance**: If there is a systematic error (e.g., wrong file format), early detection saves hours of wasted processing. 4. **Data integrity**: Continuing to process after encountering bad data could corrupt downstream files and reports. The correct approach is to validate each record, write rejects to an error file immediately, maintain running error counts, and check against thresholds throughout processing:
IF WS-TOTAL-ERRORS > WS-ERROR-THRESHOLD
    DISPLAY 'ERROR THRESHOLD EXCEEDED - ABORTING'
    MOVE 12 TO RETURN-CODE
    PERFORM 9000-ABNORMAL-TERMINATION
END-IF

Question 16

What is the Extract-Transform-Load (ETL) pattern, and what are its three phases?

Show Answer **ETL (Extract-Transform-Load)** is a batch pattern for moving data between systems. Its three phases are: 1. **Extract**: Read data from the source system. On a mainframe, this might mean reading a VSAM file, executing SQL against DB2, or reading a sequential extract file from another application. 2. **Transform**: Convert the data from the source format to the target format: - Data type conversion (packed decimal to display, EBCDIC to ASCII) - Code translation (source system codes to target system codes) - Validation (check required fields, valid values, cross-field rules) - Derivation (calculate new fields from existing ones) - Aggregation (summarize detail records) - Filtering (exclude records that do not meet criteria) 3. **Load**: Write the transformed data to the target system. This could be a sequential file for transmission, a VSAM file for online access, or SQL INSERTs into a DB2 table. ETL programs typically produce two output streams: a **clean file** with valid, transformed records, and a **reject file** with records that failed validation, accompanied by error descriptions.
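A compressed sketch of the Transform phase for a single record, splitting output between a clean file and a reject file (field names, codes, and paragraph names are invented for illustration):
2100-TRANSFORM-RECORD.
    MOVE SPACES TO WS-ERROR-MESSAGE.

* Validate: required field present and code value recognized
    IF SRC-ACCOUNT-NUMBER = SPACES
        MOVE 'MISSING ACCOUNT NUMBER' TO WS-ERROR-MESSAGE
    END-IF.
    IF SRC-STATUS-CODE NOT = 'A' AND SRC-STATUS-CODE NOT = 'C'
        MOVE 'INVALID STATUS CODE' TO WS-ERROR-MESSAGE
    END-IF.

* Route: reject on any error, otherwise translate codes and load
    IF WS-ERROR-MESSAGE NOT = SPACES
        PERFORM 2900-WRITE-REJECT
    ELSE
        MOVE SRC-ACCOUNT-NUMBER TO TGT-ACCOUNT-ID
        EVALUATE SRC-STATUS-CODE
            WHEN 'A' MOVE 'ACTIVE' TO TGT-STATUS
            WHEN 'C' MOVE 'CLOSED' TO TGT-STATUS
        END-EVALUATE
        WRITE TGT-RECORD
    END-IF.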

Question 17

A sequential master file update produces a new master file. What happens to the old master file?

Show Answer The old master file is **preserved as a backup**. The sequential master file update pattern is non-destructive -- it reads the old master and writes a new master as a separate file. The old master is never modified. This design provides a built-in **grandfather-father-son** backup strategy: - **Son**: The new master file just created (current generation) - **Father**: The previous master file (previous generation, used as input for today's run) - **Grandfather**: The master file from two runs ago Using GDGs (Generation Data Groups), this versioning is automatic:
//OLDMAST  DD  DSN=PROD.ACCT.MASTER(0),DISP=SHR     Father (input)
//NEWMAST  DD  DSN=PROD.ACCT.MASTER(+1),              Son (output)
//             DISP=(NEW,CATLG,DELETE),...
If today's run produces incorrect results, the old master is still available. Recovery involves rerunning the job with corrected transactions against the old master. This is one of the key advantages of the sequential update pattern over in-place VSAM updates.

Question 18

What is the "batch window compression" problem, and name three techniques to address it.

Show Answer **Batch window compression** occurs when the available time for batch processing shrinks while the volume of data to process grows. As online systems extend their operating hours (24x7 availability), the overnight batch window becomes shorter, but the number of records, regulatory requirements, and complexity of processing increase. Three techniques to address it: 1. **Parallelization**: Identify independent jobs that can run simultaneously. If the accounts receivable and accounts payable jobs use different files, run them in parallel. Split large files by key range and process partitions concurrently. 2. **Step consolidation**: Combine multiple programs that read the same input file into a single program that makes one pass. Eliminating intermediate files reduces I/O, which is typically the biggest performance bottleneck. 3. **I/O optimization**: Maximize block sizes (BLKSIZE=0), increase buffering (BUFNO=10-20), use system-managed storage (SMS) for optimal placement, and allocate sort work files on separate volumes for parallel I/O. Optimized I/O can reduce batch elapsed time by 30-50%. Additional techniques include: sorting with DFSORT instead of COBOL SORT (DFSORT is highly optimized for z/OS hardware), using COMP and COMP-3 data types for arithmetic, and migrating some batch processes to near-real-time event-driven processing.

Question 19

What is the difference between a "record count" verification and a "hash total" verification?

Show Answer **Record count verification** checks that the correct **number** of records was processed. If the input has 1,000,000 records and the output has 999,999, one record was lost. However, record count alone cannot detect a more subtle problem: if one record was lost AND one record was duplicated, the count still shows 1,000,000. **Hash total verification** catches problems that record counts miss. A hash total is the sum of a key field (like account number) across all records. If account 12345 is replaced by a duplicate of account 67890:

- Record count: 1,000,000 (unchanged -- looks correct)
- Hash total: Different from expected (because 12345 was replaced by 67890)

Together, the three verification mechanisms provide comprehensive data integrity checking:

| Verification | Detects |
|---|---|
| Record count | Lost or extra records |
| Control total | Lost, extra, or corrupted monetary data |
| Hash total | Lost, duplicated, or substituted records (even when counts balance) |

In banking, all three are typically required for every file exchange.

Question 20

In a multi-step batch job, what does the following JCL conditional execution mean?

// IF (STEP010.RC <= 4 & STEP020.RC = 0) THEN
//STEP030  EXEC PGM=REPORT
// ENDIF
Show Answer This IF statement means: **Execute STEP030 only if STEP010 returned 0 or 4 (success or warning) AND STEP020 returned exactly 0 (success).** Breaking down the condition: - `STEP010.RC <= 4`: STEP010's return code is 0 or 4 (acceptable completion) - `&`: Logical AND -- both conditions must be true - `STEP020.RC = 0`: STEP020's return code is exactly 0 (clean success) STEP030 (the reporting program) will run only when both prior steps completed acceptably. If STEP010 returned 8 (error) or STEP020 returned 4 (warning), STEP030 is bypassed. This is a common pattern: generate reports only when all processing steps completed successfully. There is no point in generating reports from incomplete or unreliable data. The `&` operator performs logical AND; `|` performs logical OR; and `NOT` performs negation. These can be combined for complex conditions.

Question 21

True or False: The COBOL RERUN clause in the I-O-CONTROL paragraph is the recommended approach for checkpoint/restart in modern production programs.

Show Answer **False.** While COBOL does provide the RERUN clause for automatic checkpointing:
I-O-CONTROL.
    RERUN ON CHECKPOINT-FILE
        EVERY 10000 RECORDS OF INPUT-FILE.
Most production mainframe shops prefer **programmatic checkpointing** for several reasons: 1. **Greater control**: Programmatic checkpointing lets you choose exactly what data to save (accumulators, counters, state flags) and when to save it. 2. **Flexibility**: You can adjust checkpoint frequency based on processing conditions (more frequent during complex processing, less frequent during simple passes). 3. **Custom restart logic**: Programmatic restart can include validation, logging, and notification that the RERUN clause cannot provide. 4. **Multiple files**: When a program reads and writes multiple files, programmatic checkpointing can coordinate the positions across all files. 5. **Integration**: Programmatic checkpoints can be integrated with job scheduling systems and operational monitoring tools. The RERUN clause is part of the COBOL standard but is rarely used in modern enterprise environments.
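For contrast with RERUN, a stripped-down sketch of programmatic restart at initialization is shown below (the restart flag, checkpoint fields, and paragraph names are invented for illustration; the CKPT- fields follow the sketch in Question 6):
1500-CHECK-RESTART.
    IF PARM-RESTART-FLAG = 'Y'
        PERFORM 1510-READ-LAST-CHECKPOINT
        MOVE CKPT-RECORDS-READ TO WS-SKIP-COUNT
        MOVE CKPT-TOTAL-AMOUNT TO WS-TOTAL-AMOUNT
        MOVE ZERO              TO WS-RECORDS-SKIPPED
* Reposition: skip input records already processed before the failure
        PERFORM UNTIL WS-RECORDS-SKIPPED = WS-SKIP-COUNT
                   OR WS-EOF = 'Y'
            PERFORM 8000-READ-RECORD
            ADD 1 TO WS-RECORDS-SKIPPED
        END-PERFORM
        DISPLAY 'RESTARTED AFTER RECORD ' WS-SKIP-COUNT
    END-IF.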

Question 22

A batch program processes a file sorted by account number. Multiple transactions may exist for the same account. What technique ensures all transactions for one account are processed together before moving to the next account?

Show Answer The technique is to **collect all transactions for the same key** before processing them. This is done by reading ahead and comparing keys:
PERFORM UNTIL WS-TRANS-KEY NOT = WS-CURRENT-KEY
           OR WS-EOF = 'Y'
    PERFORM PROCESS-SINGLE-TRANSACTION
    PERFORM READ-NEXT-TRANSACTION
END-PERFORM
This is essential in the balanced line algorithm for sequential master updates. When the master key equals the transaction key, the program must process ALL transactions for that key before reading the next master:
2200-APPLY-TRANSACTIONS.
    MOVE MASTER-RECORD TO WS-WORK-MASTER.
    PERFORM UNTIL WS-TRANS-KEY NOT = WS-MASTER-KEY
        PERFORM APPLY-SINGLE-TRANSACTION
        PERFORM READ-NEXT-TRANSACTION
    END-PERFORM.
    WRITE NEW-MASTER-RECORD FROM WS-WORK-MASTER.
    PERFORM READ-NEXT-MASTER.
Without this grouping, the program would process only the first transaction for a key and then advance to the next master record, losing subsequent transactions. This is another common bug in master update programs.

Question 23

What are the advantages of the sequential master file update pattern over in-place VSAM KSDS updates for batch processing?

Show Answer The sequential master file update pattern has several advantages over in-place VSAM updates for batch processing: 1. **Built-in backup**: The old master is preserved. If the update produces incorrect results, you can rerun from the old master. With in-place VSAM updates, you must explicitly back up before updating. 2. **Atomicity**: The new master is either fully created or not. If the job fails midway, the old master is still intact. With in-place VSAM updates, a mid-job failure leaves the file in a partially updated, inconsistent state. 3. **Performance**: Sequential I/O is dramatically faster than random VSAM I/O. Reading a 5 million record file sequentially takes minutes; updating 5 million records randomly in a VSAM KSDS takes much longer due to index lookups and CI/CA splits. 4. **No contention**: The old master can remain available for online readers during the batch update (DISP=SHR on old master). With in-place VSAM updates, the file must be exclusively locked (DISP=OLD), blocking online access. 5. **Audit trail**: Keeping multiple generations provides a natural audit trail of the file's state at any point in time. The main disadvantage is disk space: maintaining two copies of a large master file requires double the storage. For very large files, this can be significant.

Question 24

What is the purpose of the error threshold in a batch program, and how should it be implemented?

Show Answer The **error threshold** is a configurable limit that determines when a batch program should abort due to excessive errors rather than continuing to produce unreliable output. **Purpose:** - A few errors (0.01%) in a 10 million record file are normal (bad data in individual records) - A large number of errors (5%) usually indicates a systemic problem (wrong file format, corrupted input, incorrect processing logic) - Processing millions of additional records when the input is fundamentally flawed wastes time and produces useless output **Implementation:**
01  WS-ERROR-THRESHOLD    PIC 9(03)V99.
01  WS-ERROR-COUNT        PIC 9(09)     VALUE ZERO.
01  WS-RECORDS-READ       PIC 9(09)     VALUE ZERO.
01  WS-ERROR-PCT          PIC 9(03)V99  VALUE ZERO.
01  WS-ABORT-FLAG         PIC X(01)     VALUE 'N'.

* Read threshold from parameter file
    MOVE PARM-ERR-THRESHOLD TO WS-ERROR-THRESHOLD.

* After each error, check threshold
    ADD 1 TO WS-ERROR-COUNT.
    IF WS-RECORDS-READ > 1000
        COMPUTE WS-ERROR-PCT =
            (WS-ERROR-COUNT * 100) / WS-RECORDS-READ
        IF WS-ERROR-PCT > WS-ERROR-THRESHOLD
            MOVE 'Y' TO WS-ABORT-FLAG
            DISPLAY 'ABORT: Error rate ' WS-ERROR-PCT
                '% exceeds threshold ' WS-ERROR-THRESHOLD '%'
            MOVE 12 TO RETURN-CODE
        END-IF
    END-IF.
The threshold should be: - **Configurable** (read from a parameter file, not hard-coded) - **Checked periodically** (not just at end of file) - **Based on percentage** (1% of 10 million is different from 1% of 1,000) - **Logged** when triggered (for diagnosis)

Question 25

What is the critical path in a batch job dependency graph, and why does it matter?

Show Answer The **critical path** is the longest chain of sequential dependencies through a batch job dependency graph. It represents the **minimum possible elapsed time** for the entire batch cycle, even with unlimited parallelism. **Example:**
SORT (20 min) -> POST (60 min) -> INTEREST (45 min) -> STATEMENTS (40 min) -> ARCHIVE (15 min)
                 POST          -> FEES (30 min)     -> STATEMENTS
                 POST          -> GL (25 min)       -> ARCHIVE
Critical path: SORT (20) + POST (60) + INTEREST (45) + STATEMENTS (40) + ARCHIVE (15) = **180 minutes** Even though FEES and GL can run in parallel with INTEREST, the critical path cannot be shortened below 180 minutes without optimizing the programs on that path. **Why it matters:** 1. It determines whether the batch cycle fits within the batch window 2. Optimization efforts should focus on programs on the critical path (speeding up GL does not help if it is not on the critical path) 3. It guides capacity planning (adding more CPUs helps parallel jobs but not the critical path) 4. It helps identify where parallelism can be exploited (only off-critical-path jobs benefit from parallelism)

Question 26

True or False: Using COMP-3 (packed decimal) data types for monetary calculations in batch COBOL programs provides better performance than using DISPLAY (zoned decimal) data types.

Show Answer **True.** COMP-3 (packed decimal) is significantly more efficient for decimal arithmetic on IBM mainframe hardware because: 1. **Native hardware support**: The IBM z/Architecture processor has dedicated packed decimal instructions (AP, SP, MP, DP, CP) that operate directly on COMP-3 data. DISPLAY data must first be converted to packed format before arithmetic can be performed. 2. **Smaller storage**: COMP-3 stores two digits per byte (plus a half-byte for the sign), while DISPLAY uses one byte per digit. PIC S9(13)V99 occupies 8 bytes as COMP-3 but 15 bytes as DISPLAY. Smaller data means fewer I/O operations when reading/writing files. 3. **Fewer conversions**: If intermediate calculations use COMP-3 and the output file also stores COMP-3, no conversion is needed. DISPLAY fields require pack-on-read and unpack-on-write. For batch programs processing millions of records, the performance difference can be substantial. Best practices: - Use **COMP-3** for all monetary amounts and arithmetic fields - Use **COMP** (binary) for subscripts, counters, and loop variables - Use **DISPLAY** only for fields that must be human-readable in the file (report output, display-format extract files)
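Illustrative declarations showing the same monetary field in both usages, plus a binary counter (the byte counts follow the rules above):
* Zoned decimal (DISPLAY): one byte per digit -- 15 bytes
01  WS-BALANCE-DISPLAY    PIC S9(13)V99.

* Packed decimal (COMP-3): two digits per byte plus a sign nibble -- 8 bytes
01  WS-BALANCE-PACKED     PIC S9(13)V99 COMP-3.

* Binary (COMP): for subscripts, counters, and loop variables
01  WS-SUB                PIC S9(04)    COMP.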

Question 27

A batch program must process records from three different sources: ATM transactions, teller transactions, and online transactions. Each source provides a separate sorted file. What are two approaches to processing all three files, and which is more efficient?

Show Answer **Approach 1: Concatenate and Sort** Concatenate all three files and sort the combined file by account number:
//SORTIN   DD  DSN=PROD.ATM.TRANS,DISP=SHR
//         DD  DSN=PROD.TELLER.TRANS,DISP=SHR
//         DD  DSN=PROD.ONLINE.TRANS,DISP=SHR
Then process the sorted output. **Approach 2: Merge** Since the files are already sorted, merge them using COBOL MERGE or DFSORT MERGE:
MERGE SORT-FILE
    ON ASCENDING KEY SORT-ACCOUNT
    USING ATM-FILE TELLER-FILE ONLINE-FILE
    GIVING MERGED-FILE.
**Approach 2 (MERGE) is more efficient** because: 1. The files are already sorted -- merging simply interleaves records from three sorted streams 2. No sort work files are needed (no SORTWK allocations) 3. The merge makes a single pass through each file (linear I/O) 4. A full sort of the concatenated file requires multiple passes and sort work I/O For three files of 1 million records each (3 million total), merge requires approximately 6 million I/Os (read 3M + write 3M). A sort requires reading 3M, writing to sort work files (multiple passes), and writing the final output -- potentially 15-20 million I/Os.

Question 28

What should a batch program do when it encounters an empty input file?

Show Answer A well-designed batch program handles empty input files gracefully rather than abending: 1. **Detect the empty file**: The first READ returns AT END immediately. 2. **Do not abend**: An empty file is a valid condition that the program should handle. 3. **Set an appropriate return code**: Typically RC=4 (warning) or RC=0 (success), depending on whether the empty file is expected: - If the file is expected to have records (e.g., daily transactions), RC=4 signals that something unusual happened - If the file might legitimately be empty (e.g., a reject file from a previous step), RC=0 is appropriate 4. **Produce valid output**: Write headers and trailers with zero counts. Create an empty output file (not no file). 5. **Log the condition**: Display a message indicating the file was empty.
PERFORM 1100-READ-FIRST-RECORD.
IF WS-EOF = 'Y'
    DISPLAY 'WARNING: Input file is empty'
    MOVE 4 TO RETURN-CODE
    PERFORM 3000-WRITE-EMPTY-OUTPUT
    PERFORM 4000-FINALIZE
    STOP RUN
END-IF.
Creating an empty output file (rather than no output file) is important because downstream programs and JCL steps may expect the file to exist.

Question 29

What is the grandfather-father-son backup strategy, and how does it relate to GDGs?

Show Answer The **grandfather-father-son** (GFS) strategy maintains three generations of a file: - **Son**: The current (newest) version, just created by today's batch run - **Father**: Yesterday's version, used as input for today's run - **Grandfather**: The version from two days ago If today's run produces incorrect results: 1. Discard the son 2. Rerun using the father as input with corrected transactions 3. If that also fails, the grandfather is still available **GDGs (Generation Data Groups)** automate this strategy on z/OS:
//OLDMAST  DD  DSN=PROD.ACCT.MASTER(0),DISP=SHR   Father
//NEWMAST  DD  DSN=PROD.ACCT.MASTER(+1),           Son
//             DISP=(NEW,CATLG,DELETE),...
- `(0)` = current generation (father for this run) - `(+1)` = new generation being created (son) - `(-1)` = previous generation (grandfather) The GDG base has a LIMIT parameter (e.g., LIMIT(30)) that controls how many generations are retained. When the limit is reached, the oldest generation is automatically deleted. This approach provides built-in recovery capability without requiring separate backup jobs or manual intervention.

Question 30

A bank needs to process 5 million account records to calculate daily interest. The processing takes 3 hours sequentially. The batch window is 2 hours. Propose a strategy to fit this job within the batch window.

Show Answer The primary strategy is **parallel partitioning** -- splitting the 5 million accounts into multiple partitions and processing them simultaneously:

**Step 1: Partition the accounts.** Split by account number range into 4 partitions (assuming account numbers are spread roughly evenly across the key range):
- Partition 1: Accounts 0000000001 - 1250000000
- Partition 2: Accounts 1250000001 - 2500000000
- Partition 3: Accounts 2500000001 - 3750000000
- Partition 4: Accounts 3750000001 - 5000000000

**Step 2: Run 4 parallel jobs.** Each job processes approximately 1.25 million accounts in approximately 45 minutes (3 hours / 4).

**Step 3: Merge results.** Merge the 4 output files into a single interest posting file (approximately 15 minutes).

**Total elapsed time**: 45 + 15 = **60 minutes** (well within the 2-hour window)
//* JOB 1 - Process partition 1
//INTCALC1 EXEC PGM=INTCALC,PARM='0000000001,1250000000'

//* JOB 2 - Process partition 2 (runs simultaneously)
//INTCALC2 EXEC PGM=INTCALC,PARM='1250000001,2500000000'

//* (Jobs 3 and 4 similar)

//* After all 4 complete - merge results
//MERGE    EXEC PGM=SORT
//SORTIN   DD DSN=PROD.INT.PART1,DISP=SHR
//         DD DSN=PROD.INT.PART2,DISP=SHR
//         DD DSN=PROD.INT.PART3,DISP=SHR
//         DD DSN=PROD.INT.PART4,DISP=SHR
Additional optimizations: - Increase buffer count (BUFNO=20) for each job - Use COMP-3 for all monetary calculations - Preload the rate table into a WORKING-STORAGE table to avoid file I/O for lookups - Use system-determined block sizes (BLKSIZE=0)