Chapter 25 Quiz: Parallel Batch Processing

Instructions: Select the best answer for each question. Each question has exactly one correct answer.


Question 1. What is the primary reason mainframe shops implement parallel batch processing?

A) To reduce CPU consumption by distributing work across engines
B) To meet batch window constraints when serial processing exceeds the available time
C) To simplify operations by breaking large jobs into smaller pieces
D) To comply with DB2 licensing requirements for multi-threaded access

Correct Answer: B
Explanation: The fundamental driver for parallel batch is the batch window squeeze — serial processing time exceeds the available window. While parallelism does distribute work and reduce elapsed (wall-clock) time, the primary business justification is always the batch window. Parallelism actually increases operational complexity rather than simplifying it, and DB2 licensing imposes no such requirement.


Question 2. Which property is NOT required for a good partition key?

A) Even distribution of records across partitions
B) No cross-partition processing dependencies
C) Sequential numeric values starting from 1
D) Alignment with the physical data layout

Correct Answer: C
Explanation: A partition key does not need to be sequential or start from 1. Alphanumeric keys, hash values, geographic codes, and date ranges all work as partition keys. The key requirements are even distribution (A), no cross-partition dependencies (B), and alignment with physical data (D) for I/O efficiency. Compatibility with checkpoint/restart is also important.


Question 3. A four-partition batch job has the following record distribution: P1 = 4.2M, P2 = 1.1M, P3 = 0.9M, P4 = 3.8M. What is the partition imbalance ratio and what does it indicate?

A) 4.67:1 — the partitions are well balanced
B) 4.67:1 — the partitions are severely imbalanced and partition boundaries need recomputation
C) 3.82:1 — the partitions are moderately imbalanced
D) 1.1:1 — the partitions are nearly perfectly balanced

Correct Answer: B
Explanation: The imbalance ratio is largest/smallest = 4.2M/0.9M = 4.67:1. This is severe imbalance — the total elapsed time is gated by the slowest partition (P1 with 4.2M records), while P3 finishes in roughly one-fifth the time and sits idle. Well-balanced partitions have a ratio below 1.3:1. Correcting the imbalance requires recomputing partition boundaries based on the actual data distribution.
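The arithmetic can be checked in a few lines of Python (the 1.3:1 threshold is the one quoted in the explanation):

```python
# Record counts from the question; the imbalance rule is the largest
# partition count divided by the smallest.
counts = {"P1": 4_200_000, "P2": 1_100_000, "P3": 900_000, "P4": 3_800_000}

ratio = max(counts.values()) / min(counts.values())

print(f"imbalance ratio = {ratio:.2f}:1")  # imbalance ratio = 4.67:1
print("rebalance needed:", ratio > 1.3)    # rebalance needed: True
```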


Question 4. What is the purpose of the partition control table in parallel batch processing?

A) To store the COBOL program source code for each partition
B) To track partition boundaries, status, checkpoints, and timing for operational visibility and restart
C) To replace the JCL PARM field, which has a 100-byte limit
D) To provide DB2 with partition pruning information for query optimization

Correct Answer: B
Explanation: The partition control table is the operational linchpin of parallel batch. It stores partition key boundaries, tracks status (pending/running/completed/failed), records checkpoint positions for restart, and captures timing data for monitoring and reconciliation. It provides real-time visibility into partition execution and enables partition-level restart without reprocessing successful partitions.


Question 5. Two partitions update different rows in the same DB2 tablespace. The tablespace uses LOCKSIZE PAGE. Why can deadlocks still occur?

A) Deadlocks cannot occur with LOCKSIZE PAGE — only with LOCKSIZE TABLE
B) Different rows may reside on the same page, causing page-level lock contention
C) LOCKSIZE PAGE is ignored during batch processing
D) DB2 always escalates page locks to tablespace locks during parallel batch

Correct Answer: B
Explanation: With LOCKSIZE PAGE, DB2 locks the entire page containing the row, not just the row. Two partitions updating different rows that happen to be on the same data page will compete for the same page lock. If partition 1 holds page lock A and waits for page lock B, while partition 2 holds page lock B and waits for page lock A, a deadlock occurs. LOCKSIZE ROW eliminates this specific problem (at the cost of more lock entries).


Question 6. A partition-safe COBOL program must handle SQLCODE -911. What does this SQLCODE indicate, and what is the correct response?

A) -911 indicates a syntax error; the program should terminate
B) -911 indicates a deadlock or timeout; the program should rollback and retry with a short delay
C) -911 indicates the table does not exist; the program should skip the update
D) -911 indicates a duplicate key; the program should update instead of insert

Correct Answer: B
Explanation: SQLCODE -911 indicates that the current unit of work was rolled back due to a deadlock or timeout. The correct response is to issue an explicit ROLLBACK (even though DB2 has already backed out the unit of work, this resets the application to a known state), introduce a short delay (1-3 seconds), and reattempt the operation. A retry limit (typically 3) prevents infinite retry loops, and giving the delay a random component prevents two deadlocking partitions from immediately deadlocking each other again.
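The rollback/delay/retry pattern can be sketched as follows. This is a minimal illustration, not a COBOL listing: `DeadlockError`, `unit_of_work`, and `rollback` are hypothetical stand-ins for the -911 condition and the program's DB2 calls.

```python
import random
import time

class DeadlockError(Exception):
    """Stand-in for an SQLCODE -911 condition (hypothetical exception)."""

def run_with_deadlock_retry(unit_of_work, rollback, max_retries=3, base_delay=1.0):
    """Sketch of -911 handling: rollback, short randomized delay, retry."""
    for attempt in range(1, max_retries + 1):
        try:
            return unit_of_work()
        except DeadlockError:
            rollback()                # explicit ROLLBACK for a known clean state
            if attempt == max_retries:
                raise                 # give up after the retry limit
            # 1-3 second delay with a random component so two partitions
            # do not immediately deadlock each other again
            time.sleep(base_delay * (1 + 2 * random.random()))
```

The randomized delay is the key design point: if both partitions retried after a fixed interval, they would tend to collide again on the same pages.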


Question 7. DB2 DEGREE(ANY) enables which forms of parallelism?

A) Only I/O parallelism
B) Only CP parallelism
C) I/O parallelism, CP parallelism, and Sysplex parallelism (DB2 chooses)
D) Application-level parallelism through multiple COBOL programs

Correct Answer: C
Explanation: DEGREE(ANY) is a single setting that enables all three forms of DB2 internal parallelism. DB2 determines which form(s) to use based on the query, tablespace characteristics, available resources, and system configuration. It does not affect application-level parallelism, which is managed through JCL and scheduling, not DB2 BIND parameters.


Question 8. DB2 accounting trace shows QXDEGAT = 8 and QXDEGRD = 3 for a batch query. What does this mean?

A) The query attempted 8 parallel tasks and successfully used 8
B) The query attempted 8 parallel tasks but was reduced to 3 due to resource constraints
C) The query ran 3 times and used 8 threads total
D) The query was 3/8 = 37.5% complete when the trace was captured

Correct Answer: B
Explanation: QXDEGAT (degree attempted) = 8 means DB2 determined that 8 parallel tasks would be optimal. QXDEGRD (degree reduced) = 3 means resource constraints forced DB2 to actually use only 3 parallel tasks. The QXREDRN (reason for reduction) field identifies the specific constraint — typically buffer pool shortage, thread limits, or PARAMDEG ZPARM cap. This indicates a tuning opportunity: resolving the constraining resource could improve parallelism from 3x to closer to 8x.


Question 9. In a multi-step parallel pipeline, what is the "fan-out" step?

A) The step where multiple parallel streams are merged back into a single stream
B) The step where a single input is split into multiple parallel streams for concurrent processing
C) The step where failed partitions are identified and restarted
D) The step where DB2 distributes queries across Sysplex members

Correct Answer: B
Explanation: Fan-out is the pipeline step where a single stream of data is divided into multiple parallel streams. This can be a DFSORT split, a COBOL splitter program, or simply multiple jobs querying different key ranges from a DB2 table. Fan-out precedes the parallel processing phase. The opposite step — where parallel streams converge back into a single stream — is fan-in (or merge).
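A key-range fan-out can be sketched in a few lines; the record keys and boundary values here are hypothetical, standing in for what a DFSORT split or COBOL splitter program would do:

```python
# Hypothetical fan-out: divide one record stream into per-partition
# streams by key range.
def fan_out(records, boundaries):
    """Route each (key, payload) record to the key range it falls in."""
    streams = [[] for _ in range(len(boundaries) + 1)]
    for key, payload in records:
        slot = sum(key > b for b in boundaries)  # count boundaries below the key
        streams[slot].append((key, payload))
    return streams

records = [("ADAMS", 1), ("JONES", 2), ("SMITH", 3), ("ZHOU", 4)]
streams = fan_out(records, boundaries=["F", "N", "T"])  # hypothetical boundaries
print(streams)  # one stream per partition: ADAMS | JONES | SMITH | ZHOU
```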


Question 10. Why can't steps within a single JCL job run in parallel?

A) JCL syntax does not support parallel EXEC statements
B) JES processes job steps sequentially within a job; parallel execution requires separate jobs coordinated by a scheduler
C) Parallel steps within a job would exceed the region size limit
D) IBM withdrew support for intra-job parallelism in z/OS 2.1

Correct Answer: B
Explanation: JES (Job Entry Subsystem) executes steps within a job sequentially — step 2 starts after step 1 completes. This is a fundamental characteristic of JCL job execution. To run processing in parallel, you must submit separate jobs and use an external scheduler (TWS/OPC, CA-7, Control-M) to manage the dependencies and parallel execution. The scheduler starts the partition jobs simultaneously after the setup job completes.


Question 11. DFSORT MERGE is used instead of DFSORT SORT for combining partition outputs because:

A) MERGE supports more input files than SORT
B) MERGE preserves duplicate records while SORT removes them
C) MERGE is O(N) for pre-sorted inputs, while SORT is O(N log N)
D) MERGE uses less DASD space for work files

Correct Answer: C
Explanation: When partition outputs are already sorted (because each partition processed data in key order), DFSORT MERGE combines them in a single pass through the data — O(N) complexity. SORT would re-sort all records at O(N log N) complexity, which is wasteful when the inputs are already sorted. MERGE also requires no SORTWK datasets because there is no intermediate sort phase, which does save DASD — but the primary benefit is the algorithmic efficiency.
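The single-pass behavior is easy to demonstrate with Python's standard-library k-way merge (the keys are hypothetical; `heapq.merge` plays the role of DFSORT MERGE here):

```python
import heapq

# Pre-sorted partition outputs: the merge examines each record once,
# always picking the smallest current key, a single O(N) pass.
p1, p2, p3 = [1, 4, 7], [2, 5, 8], [3, 6, 9]

merged = list(heapq.merge(p1, p2, p3))  # no re-sort, no work files
print(merged)  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Like DFSORT MERGE, `heapq.merge` assumes its inputs are already sorted; feed it unsorted streams and the output order is wrong, which is the analogue of merging partition outputs that were not produced in key order.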


Question 12. During partition-level checkpoint/restart, why must each partition have its own unique checkpoint ID?

A) DB2 requires unique checkpoint IDs for each concurrent thread
B) Shared checkpoint IDs cause ENQ conflicts and enable a failed partition's restart to corrupt a successful partition's checkpoint data
C) JES2 allocates checkpoint datasets by checkpoint ID
D) Unique checkpoint IDs are a coding standard but not technically required

Correct Answer: B
Explanation: If partitions share a checkpoint ID or checkpoint dataset, restart of a failed partition can overwrite or corrupt the checkpoint data of a successful partition. Additionally, concurrent writes to a shared checkpoint dataset cause ENQ (serialization) conflicts that serialize the partitions — defeating the purpose of parallelism. Each partition needs a unique checkpoint ID (typically incorporating the partition number) and its own checkpoint dataset.
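A naming scheme that incorporates the partition number might look like this sketch (the job name and ID format are hypothetical, purely for illustration):

```python
# Hypothetical checkpoint-ID scheme: embedding the partition number
# guarantees each partition its own checkpoint identity and dataset.
def checkpoint_id(job_name, partition):
    return f"{job_name}P{partition:02d}"

ids = [checkpoint_id("POSTRUN", p) for p in range(1, 5)]
print(ids)  # ['POSTRUNP01', 'POSTRUNP02', 'POSTRUNP03', 'POSTRUNP04']
assert len(set(ids)) == len(ids)  # all unique: no shared-ID collisions
```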


Question 13. What reconciliation check must run after a parallel merge step?

A) Verify that the merged file's record length matches the input
B) Verify that the sum of all partition record counts equals the total input count, and control totals match
C) Verify that DB2 statistics were updated during processing
D) Verify that all sort work datasets were deleted

Correct Answer: B
Explanation: The critical reconciliation check after merge is record count and control total verification. The setup job counts the total input records. Each partition reports its processed count. The merge/reconciliation step sums partition counts and compares to the input total. Any discrepancy indicates lost records (gap between partitions), duplicate records (overlap between partitions), or processing errors. Control totals (hash totals, amount sums) provide additional verification. This is the essential quality gate for parallel processing.
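The check itself is simple arithmetic; a sketch with hypothetical figures (the counts and amounts are invented for illustration):

```python
# Reconciliation sketch: counts must balance and control (amount)
# totals must match before the merged output is trusted.
def reconcile(input_count, input_total, partitions):
    count_sum = sum(p["count"] for p in partitions)
    total_sum = round(sum(p["total"] for p in partitions), 2)
    if count_sum < input_count:
        return False, f"lost records: {input_count - count_sum}"
    if count_sum > input_count:
        return False, f"duplicate records: {count_sum - input_count}"
    if total_sum != round(input_total, 2):
        return False, "control totals do not match"
    return True, "balanced"

parts = [{"count": 2_500_000, "total": 125.00}] * 4  # four equal partitions
print(reconcile(10_000_000, 500.00, parts))   # (True, 'balanced')
print(reconcile(10_000_100, 500.00, parts))   # (False, 'lost records: 100')
```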


Question 14. What is the Group Buffer Pool (GBP) in the context of DB2 Sysplex parallelism?

A) A shared memory area on each LPAR for DB2 buffer pools
B) A coupling facility structure that provides cross-member buffer coherency for data sharing
C) A DFSORT hiperspace used for parallel merge operations
D) A JES2 buffer pool for parallel job output

Correct Answer: B
Explanation: The Group Buffer Pool is a coupling facility structure used in DB2 data sharing environments. When multiple DB2 members access the same data, the GBP ensures that a page modified by member A is visible to member B. During Sysplex parallelism, multiple members process different partitions simultaneously, generating significant GBP traffic (writes of modified pages, cross-invalidation of cached copies). Undersized GBPs cause GBP full conditions that force synchronous writes, destroying the parallelism benefit.


Question 15. Which DFSORT parameter enables hiperspace sorting to avoid DASD I/O?

A) MAINSIZE=MAX
B) HIPRMAX=OPTIMAL
C) SORTSIZE=MEMORY
D) WORKSPACE=HIPER

Correct Answer: B
Explanation: HIPRMAX=OPTIMAL tells DFSORT to use hiperspace (data-in-memory objects on z/OS) as sort work areas, with DFSORT determining the optimal amount. Hiperspace sorting avoids sort work I/O to DASD, running the sort in memory at processor speed. MAINSIZE=MAX controls the virtual storage used for the sort, not hiperspace. The other options are not valid DFSORT parameters.


Question 16. Four SORTWK datasets are allocated for a DFSORT job. All four are on the same DASD volume. What is the impact?

A) No impact — DFSORT uses them round-robin regardless of volume placement
B) Sort performance is degraded because all four work streams compete for the same volume's I/O bandwidth
C) DFSORT will reject the configuration and abend
D) The sort uses only the first SORTWK and ignores the other three

Correct Answer: B
Explanation: DFSORT uses multiple SORTWK datasets to parallelize sort I/O — reading and writing to multiple work files simultaneously. If all work files are on the same volume, the I/O operations serialize on that volume's channel and control unit. The sort runs, but with severely degraded I/O performance. For maximum benefit, each SORTWK should be on a separate volume (ideally on separate channels/control units).


Question 17. IDCAMS REPRO with FROMKEY/TOKEY is used to split a VSAM KSDS for partitioning. What is the I/O efficiency concern?

A) REPRO always reads the entire KSDS regardless of FROMKEY
B) REPRO positions directly to FROMKEY using the VSAM index, so there is no efficiency concern
C) REPRO cannot split KSDS files — only ESDS files
D) REPRO locks the entire KSDS during the copy operation

Correct Answer: B
Explanation: IDCAMS REPRO with FROMKEY positions directly to the starting key using the VSAM index, then reads sequentially until TOKEY is reached. This means each partition's REPRO reads only its key range — not the entire file. This makes REPRO an efficient way to split a KSDS for partitioning. (Note: for ESDS files, REPRO would need FROMADDRESS/TOADDRESS and positioning is less efficient.) Multiple REPRO operations can run in parallel as separate jobs.


Question 18. A parallel posting job has 4 partitions. Partition 3 fails. Partitions 1, 2, and 4 completed successfully. What is the correct restart procedure?

A) Restart all 4 partitions from the beginning to ensure consistency
B) Restart only partition 3 from its last checkpoint, then run the merge job
C) Skip partition 3 and run the merge with only 3 partition outputs
D) Restart the entire pipeline from the setup step

Correct Answer: B
Explanation: The correct procedure is to restart only the failed partition from its last checkpoint. Partitions 1, 2, and 4 completed successfully — their output is correct and their DB2 updates are committed. Rerunning them wastes time and risks duplicate processing. Partition 3 restarts from its checkpoint key, processes the remaining records, and then the merge job runs with all four complete partition outputs. This is why partition-level checkpoint/restart and the partition control table are essential.
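The restart decision can be read straight off a control-table snapshot; the statuses and checkpoint key below are hypothetical:

```python
# Control-table snapshot after the failure: only partition 3 reruns,
# and the merge must wait until every partition shows COMPLETED.
control_table = [
    {"part": 1, "status": "COMPLETED", "ckpt_key": None},
    {"part": 2, "status": "COMPLETED", "ckpt_key": None},
    {"part": 3, "status": "FAILED",    "ckpt_key": "M4500012"},
    {"part": 4, "status": "COMPLETED", "ckpt_key": None},
]

restart = [(r["part"], r["ckpt_key"]) for r in control_table
           if r["status"] == "FAILED"]
merge_ready = all(r["status"] == "COMPLETED" for r in control_table)

print(restart)      # [(3, 'M4500012')]: restart P3 from its checkpoint
print(merge_ready)  # False: merge waits for P3
```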


Question 19. When designing the number of partitions for a parallel batch job, which factor is LEAST important?

A) The number of available CPs/engines
B) The number of available I/O paths to DASD
C) The total number of records in the input file
D) The DB2 batch thread limit

Correct Answer: C
Explanation: While the total number of records affects whether parallelism is worthwhile at all (very small files do not benefit), it does not significantly influence the optimal partition count. A 10-million-record file and a 100-million-record file might both use 4 partitions — the optimal count depends on available CPs (A), I/O paths (B), DB2 threads (D), and operational complexity tolerance. The partition count is a function of system capacity, not data volume (assuming the volume is large enough to justify parallelism).
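One deliberately simplistic way to express that rule, with entirely hypothetical capacity figures, is that the partition count is bounded by the tightest system constraint:

```python
# Sketch: partition count is bounded by capacity, not by record volume.
available_cps, io_paths, db2_batch_threads = 6, 4, 8  # hypothetical limits

partition_count = min(available_cps, io_paths, db2_batch_threads)
print(partition_count)  # 4, whether the file holds 10M or 100M records
```

Real sizing also weighs operational complexity tolerance, which no formula captures; this only illustrates that record volume is absent from the calculation.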


Question 20. A batch pipeline has this structure: SETUP → {P1, P2, P3, P4 parallel} → MERGE → REPORT. Each partition takes 20 minutes. SETUP takes 5 minutes. MERGE takes 8 minutes. REPORT takes 10 minutes. What is the critical path elapsed time?

A) 43 minutes (5 + 20 + 8 + 10)
B) 103 minutes (5 + 4×20 + 8 + 10)
C) 23 minutes (5 + 8 + 10)
D) 38 minutes (5 + 20 + 8 + 10/4)

Correct Answer: A
Explanation: The critical path is the longest path through the pipeline. SETUP (5 min) runs first. Then all four partitions run in parallel — the elapsed time is the longest partition, which is 20 minutes (they are all equal). MERGE (8 min) runs after all partitions complete. REPORT (10 min) runs after MERGE. Total critical path: 5 + 20 + 8 + 10 = 43 minutes. The serial total would be 5 + 80 + 8 + 10 = 103 minutes. The speedup from parallelism is 103/43 = 2.4x.
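The critical-path arithmetic from the question generalizes to any partition timings, since the parallel phase contributes only its slowest member:

```python
# Pipeline timings (minutes) from the question.
setup, partitions, merge, report = 5, [20, 20, 20, 20], 8, 10

critical_path = setup + max(partitions) + merge + report  # parallel phase = slowest member
serial_total  = setup + sum(partitions) + merge + report  # if everything ran serially

print(critical_path)                           # 43
print(serial_total)                            # 103
print(round(serial_total / critical_path, 1))  # 2.4 (speedup factor)
```

With unequal partitions (say [30, 15, 15, 20]), `max(partitions)` grows to 30 and the critical path stretches to 53 minutes, which is exactly the imbalance penalty discussed in Question 3.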