Chapter 22 Quiz: Data Integration Patterns
Question 1
Which integration pattern is MOST appropriate for transferring 14 million records from a mainframe to a data warehouse on a nightly schedule?
A) REST API via z/OS Connect B) MQ publish/subscribe C) File-based transfer via Connect:Direct D) Change Data Capture with real-time replication
Answer: C Explanation: File-based transfer via Connect:Direct is optimal for high-volume, batch-scheduled data movement. It provides checkpoint/restart, compression, and guaranteed delivery — all critical for a 14-million-record transfer. APIs would be far too slow (14 million individual request/response round trips), MQ's per-message overhead is prohibitive at this volume, and CDC captures incremental changes, not full extracts.
Question 2
What is the primary advantage of Connect:Direct over FTP for mainframe file transfers?
A) Connect:Direct supports EBCDIC encoding B) Connect:Direct provides checkpoint/restart for interrupted transfers C) Connect:Direct is free while FTP requires licensing D) Connect:Direct supports binary file transfer
Answer: B Explanation: Checkpoint/restart is Connect:Direct's killer feature. A 50GB transfer that fails at 38GB resumes from the last checkpoint rather than restarting from the beginning. FTP also supports EBCDIC and binary transfer, and Connect:Direct requires commercial licensing. The checkpoint/restart capability alone justifies Connect:Direct's cost for large enterprise transfers.
Question 3
In a GDG-based handoff pattern, what does the relative generation number (0) refer to?
A) The first generation ever created B) The generation currently being written C) The most recently cataloged generation D) The generation with the lowest sequence number
Answer: C Explanation: GDG relative generation (0) always refers to the most recently cataloged (current) generation. (+1) refers to the next generation to be created, and (-1) refers to the generation before the current one. This relative referencing is what makes GDGs powerful for producer-consumer patterns — the consumer always reads (0) regardless of the absolute generation number.
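The relative-to-absolute mapping can be sketched in a few lines. This is an illustrative model only — the dataset name, generation numbers, and `resolve_gdg` helper are invented, and real resolution is done by the z/OS catalog, not application code.

```python
def resolve_gdg(base, cataloged, relative):
    """Map a relative generation number to an absolute GxxxxV00 name.

    cataloged: ascending list of cataloged generation numbers, e.g. [41, 42, 43].
    relative:  0 = current, -1 = previous, +1 = next to be created.
    """
    if relative > 0:
        absolute = cataloged[-1] + relative      # next generation(s) to be created
    else:
        absolute = cataloged[relative - 1]       # (0) -> last cataloged, (-1) -> one before
    return f"{base}.G{absolute:04d}V00"

gens = [41, 42, 43]
print(resolve_gdg("PROD.DAILY.EXTRACT", gens, 0))    # current generation: G0043V00
print(resolve_gdg("PROD.DAILY.EXTRACT", gens, -1))   # previous generation: G0042V00
print(resolve_gdg("PROD.DAILY.EXTRACT", gens, +1))   # next to be created:  G0044V00
```

Note how the consumer's reference to (0) keeps working as new generations roll in — only the catalog state changes, never the consumer's JCL.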
Question 4
What is the purpose of a control record in a file-based integration?
A) To compress the file for efficient transfer B) To encrypt sensitive data fields C) To provide record counts, hash totals, and metadata for validation D) To specify the target system for the file
Answer: C Explanation: Control records (header and/or trailer) contain metadata that the consumer uses to validate file completeness and integrity. Record counts verify no records were lost during transfer. Hash totals verify data integrity. Timestamps and business dates confirm the file contains expected data. Without control records, a consumer cannot distinguish between a complete file and a truncated one.
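A consumer-side validation step might look like the following sketch. The trailer layout (a record count plus a hash total over an amount field) is an assumed example format, not a standard.

```python
def validate_file(records, trailer_count, trailer_hash_total):
    """Reject a file whose detail records don't match the trailer metadata."""
    actual_count = len(records)
    actual_hash = sum(r["amount"] for r in records)   # hash total over a key field
    if actual_count != trailer_count:
        raise ValueError(f"record count mismatch: got {actual_count}, trailer says {trailer_count}")
    if actual_hash != trailer_hash_total:
        raise ValueError(f"hash total mismatch: got {actual_hash}, trailer says {trailer_hash_total}")
    return True

records = [{"amount": 100}, {"amount": 250}, {"amount": 75}]
print(validate_file(records, trailer_count=3, trailer_hash_total=425))  # True
```

A truncated transfer fails the count check; a corrupted amount fails the hash check — either way the consumer refuses the file instead of silently loading partial data.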
Question 5
In MQ content-based routing, what should happen when a message has an unrecognized type?
A) Silently discard the message B) Return the message to the sender C) Route to a dead-letter or unrouted queue and generate an alert D) Hold the message until a new routing rule is added
Answer: C Explanation: Unrecognized messages must NEVER be silently discarded. Routing them to a designated unrouted/dead-letter queue preserves the message for investigation while generating an alert to notify operations staff. This catches cases where new message types are introduced upstream without corresponding routing rules, preventing silent data loss.
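The routing rule can be sketched as below. Queue objects are stand-in Python lists and the `alert` callback is hypothetical; a real implementation would put to actual MQ queues and raise an operations alert.

```python
def route(message, routing_table, unrouted_queue, alert):
    """Route by message type; unknown types go to the unrouted queue with an alert."""
    target = routing_table.get(message["type"])
    if target is None:
        unrouted_queue.append(message)          # preserve the message for investigation
        alert(f"unrecognized message type: {message['type']!r}")
        return "UNROUTED"
    target.append(message)
    return "ROUTED"

payments, unrouted, alerts = [], [], []
route({"type": "PAYMENT", "id": 1}, {"PAYMENT": payments}, unrouted, alerts.append)
route({"type": "MYSTERY", "id": 2}, {"PAYMENT": payments}, unrouted, alerts.append)
print(len(payments), len(unrouted), len(alerts))   # 1 1 1
```

The key property: no branch of the function ever drops a message on the floor.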
Question 6
What problem does a canonical data model solve in enterprise integration?
A) It eliminates the need for data encryption B) It reduces the number of transformations from N(N-1) to 2N C) It guarantees zero data loss during transfer D) It automatically converts between EBCDIC and ASCII
Answer: B Explanation: Without a canonical model, N systems exchanging data require up to N(N-1) point-to-point transformations. A canonical data model defines one common format; each system needs only a "to canonical" and "from canonical" transformer, reducing the total to 2N. For CNB with five downstream consumers plus the mainframe (N=6), that's 30 transformations without canonical versus 12 with canonical.
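The arithmetic is easy to verify directly:

```python
def point_to_point(n):
    return n * (n - 1)   # every system needs a transformer to every other system

def canonical(n):
    return 2 * n         # one "to canonical" + one "from canonical" per system

for n in (3, 6, 10):
    print(n, point_to_point(n), canonical(n))
# n=6 gives 30 point-to-point transformations versus 12 canonical,
# matching the CNB figures in the explanation above.
```

Note that canonical only wins for n > 3; with two or three systems, point-to-point is the simpler choice.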
Question 7
Which CDC approach requires NO changes to existing COBOL application programs?
A) Application-level CDC B) Trigger-based CDC C) Log-based CDC D) All CDC approaches require application changes
Answer: C Explanation: Log-based CDC reads the database recovery log to identify changes, operating entirely at the database infrastructure level. The COBOL programs are completely untouched. Trigger-based CDC requires creating database triggers (a schema change, not an application change). Application-level CDC requires modifying every program that updates the source data to write change records.
Question 8
What is the MOST dangerous failure mode in file-based integration?
A) A transfer that fails with an error code B) A file that arrives with a corrupted control record C) A file that never arrives (no file, no error) D) A file that arrives with the wrong character encoding
Answer: C Explanation: The most dangerous failure is the one that produces no error signal. When no file arrives, nothing fails — no transfer errors out, no job abends, no alert fires. This "silent failure" can go undetected for days. That's why monitoring must detect the ABSENCE of expected files, not just the failure of transfers. Active monitoring that expects a file by a certain time and alerts when it's missing is essential.
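A minimal sketch of absence monitoring, assuming a hypothetical landing path and deadline; real shops would drive this from a scheduler or managed file transfer tool rather than a script.

```python
import os
import time

def check_expected_file(path, deadline_epoch, now=None, alert=print):
    """Alert if an expected file has not arrived by its deadline.

    Returns 'ARRIVED', 'WAITING', or 'MISSING'.
    """
    now = time.time() if now is None else now
    if os.path.exists(path):
        return "ARRIVED"
    if now >= deadline_epoch:
        alert(f"MISSING: {path} not received by deadline")  # the silent failure made loud
        return "MISSING"
    return "WAITING"

# Before the deadline the monitor just waits; after it, the absence itself fires an alert.
print(check_expected_file("/landing/daily.extract", deadline_epoch=100, now=50))   # WAITING
print(check_expected_file("/landing/daily.extract", deadline_epoch=100, now=150))  # MISSING
```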
Question 9
Why must API-based integration include pagination for mainframe data queries?
A) Pagination reduces network bandwidth usage B) Unpaginated queries returning thousands of records can exhaust CICS resources and timeout C) Pagination is required by the HTTP/REST specification D) z/OS Connect only supports paginated responses
Answer: B Explanation: A query returning 50,000 records in a single API response monopolizes a CICS task for the duration, consumes excessive Liberty heap memory, and will likely timeout before completion. Pagination breaks large result sets into manageable chunks, keeping individual requests fast and resource-friendly. This is particularly important on mainframes where CPU (MIPS) is metered and expensive.
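The client side of offset-based pagination can be sketched generically. `fetch_page(offset, limit)` is an assumed data-access callable standing in for one paginated API call; each call stays small so no single request monopolizes a CICS task.

```python
def paginate(fetch_page, page_size=500):
    """Generator that pulls a large result set in bounded chunks."""
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        if not page:
            break                     # past the end of the result set
        yield from page
        offset += len(page)
        if len(page) < page_size:
            break                     # short page means this was the last one

# Stand-in backend: 1,234 records served one small slice at a time.
data = list(range(1234))
def fetch_page(offset, limit):
    return data[offset:offset + limit]

print(len(list(paginate(fetch_page))))   # 1234, fetched in 500/500/234-row requests
```

In practice mainframe APIs often prefer cursor/token pagination over raw offsets, since offsets against changing data can skip or duplicate rows; the bounded-request principle is the same.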
Question 10
What is a poison message in MQ-based integration?
A) A message containing malicious code B) A message that exceeds the maximum queue depth C) A message that repeatedly causes consumer failure and gets redelivered indefinitely D) A message with an expired TTL
Answer: C Explanation: A poison message is one that causes the consumer to fail every time it's processed — due to a data format error, invalid content, or a bug triggered by specific data values. Without protection, the queue manager redelivers the message, the consumer fails again, creating an infinite loop that blocks all processing on that queue. Checking the backout count and routing to a poison queue after a threshold is the standard defense.
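The backout-count defense can be sketched as follows. The threshold constant, dict-based message, and list-based queues are illustrative; in IBM MQ the redelivery count is carried in the message descriptor and the threshold is queue configuration.

```python
BACKOUT_THRESHOLD = 3   # assumed threshold; real systems read it from queue config

def consume(message, process, main_queue, poison_queue):
    """Process a message; after too many failed deliveries, quarantine it."""
    if message["backout_count"] >= BACKOUT_THRESHOLD:
        poison_queue.append(message)            # break the redelivery loop
        return "QUARANTINED"
    try:
        process(message)
        return "PROCESSED"
    except Exception:
        message["backout_count"] += 1           # the queue manager would redeliver
        main_queue.append(message)
        return "BACKED_OUT"

def bad_parser(msg):                            # a consumer bug triggered by this message
    raise ValueError("unparseable payload")

main_q, poison_q = [], []
msg = {"backout_count": 0, "body": "corrupt"}
for _ in range(4):
    print(consume(msg, bad_parser, main_q, poison_q))
# BACKED_OUT x3, then QUARANTINED -- the queue is unblocked after the third failure
```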
Question 11
When converting COBOL packed decimal (COMP-3) data for a distributed system, what must you do?
A) Transfer the bytes directly; Java can read COMP-3 B) Convert to a display numeric string representation C) Convert to IEEE 754 floating point D) Left-pad with zeros to a fixed width
Answer: B Explanation: Packed decimal (COMP-3) is a mainframe-specific binary format with no direct equivalent on distributed systems. A PIC S9(7)V99 COMP-3 field stores values in a proprietary packed format (each digit occupies a nibble, with the sign in the last nibble). The standard conversion is to a display numeric string (e.g., "12345.67") that any system can parse. Converting to floating point would introduce precision loss for financial data.
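A conversion-side sketch of the nibble layout described above, assuming the common sign conventions (0xC or 0xF positive, 0xD negative):

```python
def unpack_comp3(data: bytes, scale: int) -> str:
    """Decode packed-decimal bytes into a display numeric string.

    Each byte holds two decimal digits, one per nibble; the final nibble
    is the sign. `scale` is the number of implied decimal places.
    """
    nibbles = []
    for b in data:
        nibbles.append(b >> 4)
        nibbles.append(b & 0x0F)
    sign_nibble = nibbles.pop()                 # last nibble carries the sign
    sign = "-" if sign_nibble == 0x0D else ""
    digits = "".join(str(n) for n in nibbles)
    text = digits[:-scale] + "." + digits[-scale:] if scale else digits
    text = text.lstrip("0") or "0"
    if text.startswith("."):
        text = "0" + text
    return sign + text

# PIC S9(7)V99 COMP-3 holding +12345.67: nine digit nibbles plus sign nibble C.
print(unpack_comp3(b"\x00\x12\x34\x56\x7C", scale=2))   # 12345.67
```

Keeping the value as a decimal string (or a decimal type on the receiving side) preserves the exact precision that floating point would lose.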
Question 12
In the HA Banking System integration architecture, which pattern is used for the fraud detection engine integration?
A) File-based transfer via Connect:Direct B) REST API via z/OS Connect C) CDC events published via MQ topics D) Direct VSAM file sharing
Answer: C Explanation: The fraud detection engine needs near-real-time transaction data to score transactions as they occur. CDC captures changes to transaction data and publishes them as events on MQ topics. The fraud engine subscribes to these topics and receives transaction events within seconds. File-based transfer would be too slow (batch latency), APIs would require the fraud engine to poll continuously, and direct file sharing is not a valid cross-platform integration pattern.
Question 13
What is the "near-real-time with batch reconciliation" hybrid pattern?
A) APIs during the day, files at night B) CDC/MQ for real-time changes plus a nightly full-file extract for discrepancy detection C) MQ messages during batch window, APIs during online hours D) Connect:Direct with real-time monitoring
Answer: B Explanation: This hybrid uses CDC and MQ to publish changes in near-real-time throughout the day, combined with a nightly batch file containing the complete current state. The downstream system compares its accumulated real-time changes against the batch file and resolves any discrepancies. This provides both the low latency of messaging and the reliability/auditability of file-based integration.
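The nightly comparison step can be sketched as a three-way diff between the consumer's accumulated state and the batch snapshot. Both structures are modeled here as simple key-to-value dicts for illustration.

```python
def reconcile(realtime_state, batch_snapshot):
    """Compare accumulated real-time changes against the nightly full extract.

    Returns (mismatched, missed, stale):
      mismatched - keys present in both but with different values
      missed     - keys in the batch file the consumer never saw via CDC/MQ
      stale      - keys the consumer holds that the source no longer has
    """
    mismatched = {k for k in batch_snapshot
                  if k in realtime_state and realtime_state[k] != batch_snapshot[k]}
    missed = set(batch_snapshot) - set(realtime_state)
    stale = set(realtime_state) - set(batch_snapshot)
    return mismatched, missed, stale

realtime = {"ACCT-A": 100, "ACCT-B": 200, "ACCT-C": 300}   # built from CDC events
batch    = {"ACCT-A": 100, "ACCT-B": 250, "ACCT-D": 400}   # nightly full state
print(reconcile(realtime, batch))
# ({'ACCT-B'}, {'ACCT-D'}, {'ACCT-C'}) -- one drifted, one missed, one stale
```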
Question 14
Why should API rate limits change during the mainframe batch window?
A) Rate limits should never change; they must be consistent B) The batch window requires all mainframe CPU, so API traffic should be reduced or eliminated C) APIs run faster during batch window due to less CICS activity D) Regulatory requirements mandate constant rate limits
Answer: B Explanation: The mainframe batch window is typically when critical batch jobs (end-of-day processing, extracts, reports) run and consume significant CPU resources. API traffic during this window competes for the same CPU, potentially delaying batch completion. Lowering API rate limits during the batch window throttles or blocks that competing traffic and protects batch SLAs. Ignoring this schedule is the "Ignore the Mainframe Calendar" anti-pattern discussed in section 22.7.
Question 15
What is the primary risk of bidirectional data replication?
A) Network bandwidth consumption doubles B) Conflicting updates on both sides require conflict resolution strategies C) Both systems must use the same database technology D) Bidirectional replication violates ACID properties
Answer: B Explanation: When changes flow in both directions, both systems can modify the same record simultaneously, creating a conflict. If the mainframe updates account 12345's address at 2:00 PM and the web system updates the same address at 2:01 PM, which update wins? Without a clear conflict resolution strategy (last-write-wins, source priority, merge logic), data corruption is inevitable. This is why unidirectional replication with a single system of record is strongly preferred.
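The two strategies named above can be sketched side by side. The record shape (a `ts` timestamp and a `source` tag) is an assumed format for illustration.

```python
def resolve_conflict(local, remote, strategy="last-write-wins"):
    """Pick a winner for conflicting updates to the same record.

    'last-write-wins' keeps the later timestamp; 'source-priority'
    assumes the mainframe is the designated system of record.
    """
    if strategy == "last-write-wins":
        return local if local["ts"] >= remote["ts"] else remote
    if strategy == "source-priority":
        return local if local["source"] == "mainframe" else remote
    raise ValueError(f"unknown strategy: {strategy}")

mf  = {"source": "mainframe", "ts": 1400, "address": "12 Oak St"}    # 2:00 PM update
web = {"source": "web",       "ts": 1401, "address": "99 Elm Ave"}   # 2:01 PM update
print(resolve_conflict(mf, web)["address"])                          # 99 Elm Ave
print(resolve_conflict(mf, web, "source-priority")["address"])       # 12 Oak St
```

The two strategies give different winners for the same conflict, which is exactly why the strategy must be chosen explicitly up front rather than left to whichever replication direction happens to apply last.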
Question 16
How does an idempotent consumer handle receiving the same MQ message twice?
A) Processes the message again, producing duplicate results B) Checks a tracking table for the message ID and skips if already processed C) Returns the message to the queue for another consumer D) Deletes the message from the queue without processing
Answer: B Explanation: An idempotent consumer maintains a tracking table of processed message IDs. Before processing any message, it checks whether the message ID exists in the tracking table. If found, the message is a duplicate and is skipped (logged but not reprocessed). If not found, the message is processed and its ID is inserted into the tracking table within the same unit of work as the business update, ensuring atomicity.
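A minimal sketch using SQLite to stand in for the consumer's database; table and column names are illustrative. The point is that the tracking insert and the business update commit in one transaction, so a redelivered message can never apply the update twice.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE processed_msgs (msg_id TEXT PRIMARY KEY)")
db.execute("CREATE TABLE balances (acct TEXT PRIMARY KEY, amount INTEGER)")
db.execute("INSERT INTO balances VALUES ('12345', 100)")

def handle(msg_id, acct, delta):
    """Apply a deposit exactly once, even if the message is redelivered."""
    try:
        with db:  # one unit of work: tracking insert + business update commit together
            db.execute("INSERT INTO processed_msgs VALUES (?)", (msg_id,))
            db.execute("UPDATE balances SET amount = amount + ? WHERE acct = ?",
                       (delta, acct))
        return "PROCESSED"
    except sqlite3.IntegrityError:       # primary-key hit: we've seen this msg_id
        return "DUPLICATE SKIPPED"       # transaction already rolled back

print(handle("MSG-001", "12345", 50))    # PROCESSED
print(handle("MSG-001", "12345", 50))    # DUPLICATE SKIPPED -- balance unchanged
```

The primary key on `msg_id` does the duplicate detection; there is no separate check-then-insert race, because the insert itself is the check.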
Question 17
What happens when you transfer a COBOL COMP (binary) field from z/OS to an x86 Linux system without conversion?
A) The value is preserved correctly because binary is universal B) The value is corrupted because z/OS uses big-endian and x86 uses little-endian byte ordering C) The value is automatically converted by the network layer D) The value is preserved but the sign is reversed
Answer: B Explanation: z/OS stores binary integers in big-endian format (most significant byte first), while x86 systems use little-endian (least significant byte first). A binary value of 1000 is stored as the bytes 00 00 03 E8 on z/OS; read little-endian on x86 without conversion, those same bytes yield 0xE8030000, or 3,892,510,720. All binary fields must be explicitly converted during cross-platform data transfer.
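The byte-order corruption is easy to demonstrate with the exact bytes from the explanation:

```python
raw = bytes([0x00, 0x00, 0x03, 0xE8])   # a COMP fullword holding 1000 on z/OS

big    = int.from_bytes(raw, byteorder="big")     # correct z/OS interpretation
little = int.from_bytes(raw, byteorder="little")  # naive x86 interpretation

print(big)      # 1000
print(little)   # 3892510720 -- the same bytes, catastrophically misread
```

This is why cross-platform transfer layers either convert binary fields explicitly or avoid the problem entirely by sending display-format text.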
Question 18
Which anti-pattern involves sending all mainframe data through REST APIs, including high-volume batch extracts?
A) The "Giant File" anti-pattern B) The "API Everything" anti-pattern C) The "Synchronous Chain" anti-pattern D) The "No Schema" anti-pattern
Answer: B Explanation: The "API Everything" anti-pattern attempts to expose every mainframe function as an API, including high-volume batch extracts that are far better served by file-based transfer. This leads to excessive mainframe CPU consumption (APIs incur per-transaction overhead), tight coupling between all systems, and potential cascading failures. APIs are appropriate for low-volume, real-time interactions — not bulk data movement.
Question 19
In the five-question integration pattern selection framework, which question determines whether you need file-based or API-based integration?
A) Who initiates the exchange? B) What is the latency requirement? C) Does the exchange require a response? D) What is the security classification?
Answer: B Explanation: Latency requirement is the primary discriminator between patterns. Real-time (sub-second) needs point to APIs, near-real-time (seconds to minutes) points to messages, and batch (hours) points to files. While other questions (volume, failure handling, initiation, response needs) refine the choice, latency is the first and most decisive filter in the selection framework.
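The first-cut filter can be expressed as a simple function; the threshold values here are illustrative boundaries matching the sub-second/minutes/hours bands above, not fixed rules.

```python
def select_pattern(latency_seconds):
    """First-cut pattern choice from the latency requirement alone.

    The remaining framework questions (volume, initiation, response,
    failure handling) then refine or override this answer.
    """
    if latency_seconds < 1:
        return "API"        # real-time request/response
    if latency_seconds < 3600:
        return "MESSAGE"    # near-real-time events
    return "FILE"           # batch-scheduled bulk transfer

print(select_pattern(0.2))    # API
print(select_pattern(30))     # MESSAGE
print(select_pattern(7200))   # FILE
```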
Question 20
Why does CNB's API caching strategy reduce MIPS consumption by 67%?
A) Caching eliminates the need for VSAM file access B) Cached responses are served from a distributed cache layer without hitting the mainframe at all C) Caching compresses API responses to reduce processing D) Caching allows the mainframe to batch multiple API requests
Answer: B Explanation: A cache layer (Redis, API gateway cache) sitting in front of z/OS Connect intercepts repeated requests for the same data and serves cached responses directly, without any mainframe involvement. Since 60% of CNB's API traffic queries the same frequently-accessed accounts, a 60-second cache TTL means each account is queried once per minute on the mainframe regardless of how many API calls arrive. The mainframe CPU savings are proportional to the cache hit rate.
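The mechanism can be sketched with a minimal in-process TTL cache; a production deployment would use Redis or an API gateway cache rather than this illustrative class, and the account key and fetch function are invented.

```python
import time

class TtlCache:
    """Serve repeated reads from cache; count how often the backend is hit."""

    def __init__(self, ttl_seconds=60):
        self.ttl = ttl_seconds
        self.store = {}           # key -> (value, expiry time)
        self.backend_calls = 0    # proxy for mainframe MIPS consumed

    def get(self, key, fetch):
        entry = self.store.get(key)
        now = time.monotonic()
        if entry and entry[1] > now:
            return entry[0]                     # cache hit: mainframe untouched
        self.backend_calls += 1                 # cache miss: one backend query
        value = fetch(key)
        self.store[key] = (value, now + self.ttl)
        return value

cache = TtlCache(ttl_seconds=60)
for _ in range(100):                            # 100 API calls for one hot account
    cache.get("ACCT-12345", lambda k: {"acct": k, "balance": 100})
print(cache.backend_calls)                      # 1 -- only the first call reached the backend
```

With a 99% hit rate on hot keys like this, backend CPU scales with the number of distinct keys per TTL window, not with total API call volume — which is where the MIPS savings come from.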