Chapter 22 Exercises: Data Integration Patterns

Exercise Set A: File-Based Integration (Exercises 1-8)

Exercise 1: GDG Base Definition and Management

Bloom Level: Apply

Write the IDCAMS control statements to:

a) Define a GDG base called HABANK.PROD.ACCT.EXTRACT with a LIMIT of 30, NOEMPTY, and SCRATCH.
b) Define a model DSCB for the GDG with LRECL=400, BLKSIZE=32000, RECFM=FB.
c) Explain what happens when generation 31 is created. What happens to generation 1?
d) Write the JCL DD statement to create a new generation (+1) and the DD statement to read the current generation (0).

Exercise 2: Control Record Design

Bloom Level: Design

Design a control record structure for the HA Banking daily account extract file. Your control record must include:

- File identifier (12 bytes)
- Business date (8 bytes)
- GDG generation number (4 bytes)
- Record count excluding control record (9 bytes)
- Hash total of all account balances (15 bytes with sign)
- Creation timestamp (26 bytes)
- Source system identifier (8 bytes)
- File sequence number for multi-file extracts (3 bytes)

Write the COBOL 01-level definition for this control record. Then write the COBOL paragraph that builds the control record after processing all account records.
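Before coding the COBOL paragraph, the control-totals logic can be prototyped off-platform. A minimal Python sketch, with the field widths taken from the layout above (the function name and input shape are hypothetical):

```python
from decimal import Decimal

def build_control_totals(balances):
    """Accumulate the two trailer control values: a record count
    (excluding the control record itself) and a signed hash total
    of all account balances."""
    record_count = 0
    hash_total = Decimal("0")
    for balance in balances:
        record_count += 1
        hash_total += balance
    # Zero-filled fixed-width fields, matching the layout above:
    # a 9-byte count and a 15-byte signed hash total.
    return f"{record_count:09d}", f"{hash_total:+015.2f}"

count_fld, hash_fld = build_control_totals(
    [Decimal("100.50"), Decimal("-25.25"), Decimal("400.00")]
)
```

The hash total is a pure control sum, so negative balances legitimately reduce it; that is why the field carries a sign.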

Exercise 3: Connect:Direct Process Design

Bloom Level: Apply

Write a Connect:Direct process that:

a) Copies HABANK.PROD.ACCT.EXTRACT(0) from the mainframe (PNODE) to /data/warehouse/incoming/acct_extract_YYYYMMDD.dat on the warehouse server (SNODE).
b) Uses extended compression and checkpoints every 500,000 records.
c) Runs a post-transfer validation script on the SNODE that verifies the record count.
d) Sends a notification back to the PNODE on success or failure.
e) Includes error handling for network failures with automatic retry (3 attempts, 5-minute interval).

Exercise 4: FTP Automation with Error Handling

Bloom Level: Apply

Write a complete JCL job that:

a) Runs the account extract program (ACCTEXTR) in step 1.
b) Conditionally (only if step 1 RC=0) runs an FTP transfer in step 2 to send the extract to warehouse.habank.com.
c) Uses SFTP (SSH) rather than plain FTP.
d) Includes a step 3 that checks the FTP return code and sends an email notification on failure.
e) Explain why you should never put plaintext passwords in JCL and describe two alternatives.

Exercise 5: Dual-GDG Acknowledgment Pattern

Bloom Level: Analyze

CNB uses paired GDGs for critical feeds: a data GDG and an acknowledgment GDG.

a) Write the IDCAMS definitions for both GDGs.
b) Write a COBOL monitoring program that reads the catalog to determine the latest generation number for both GDGs, compares them, and raises an alert if the data GDG is more than 2 generations ahead.
c) How would this monitoring approach handle the case where the GDG has wrapped around its limit? Describe the edge case and how you'd address it.
d) What additional information should the acknowledgment record contain beyond just "received"?

Exercise 6: Multi-Format File Extract

Bloom Level: Apply

The regulatory reporting system requires data in a specific fixed-format layout that differs from the internal account record layout. Write a COBOL program that:

a) Reads the internal account VSAM file (KSDS).
b) Selects only accounts matching specific regulatory criteria (active accounts with balance > $10,000).
c) Transforms each record from the internal layout to the regulatory layout, including packed-to-display conversion and date reformatting.
d) Writes the output as a sequential file with both header and trailer control records.
e) Produces a summary report showing record counts by account type.

Exercise 7: File-Based Integration Failure Analysis

Bloom Level: Evaluate

Analyze the following production incident at CNB:

On March 10, the nightly account extract job completed normally with RC=0. The Connect:Direct transfer completed normally. The warehouse load completed normally. However, the next morning, the warehouse showed 14,203,847 accounts while the mainframe showed 14,204,122 accounts — a discrepancy of 275 records.

a) List at least five possible root causes for this discrepancy.
b) For each root cause, describe the diagnostic steps you would take.
c) What design changes to the integration would prevent this class of problem?
d) How should the reconciliation process handle these 275 "missing" records?

Exercise 8: File Transfer Performance Tuning

Bloom Level: Analyze

The HA Banking daily extract is 50GB (14 million records at ~3,700 bytes each). The current transfer over Connect:Direct takes 2 hours and 15 minutes. The batch window allows only 1 hour and 30 minutes.

a) Calculate the effective transfer rate in MB/sec.
b) List five specific tuning actions that could reduce the transfer time.
c) Calculate the expected improvement from enabling Connect:Direct compression, assuming 70% compression ratio on COBOL fixed-length records.
d) Propose an alternative architecture that splits the extract into parallel streams. What are the tradeoffs?
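As a cross-check for parts (a) and (c), the arithmetic can be laid out in a few lines of Python. The 70% compression ratio is the exercise's stated assumption, applied here as "wire volume drops to 30% of the original" at an unchanged link rate:

```python
# Figures from the exercise: a 50 GB file transferred in 2 h 15 min.
FILE_MB = 50 * 1024                     # 51,200 MB
CURRENT_SECONDS = (2 * 60 + 15) * 60    # 8,100 s

rate_mb_per_sec = FILE_MB / CURRENT_SECONDS   # effective rate, ~6.3 MB/s

# Part (c): 70% compression leaves 30% of the bytes on the wire.
compressed_seconds = CURRENT_SECONDS * 0.30
compressed_minutes = compressed_seconds / 60  # ~40.5 min, inside the window
```

Note the simplification: compression also costs CPU on both nodes, so the real elapsed time will be somewhat higher than the pure byte-count estimate.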


Exercise Set B: Message-Based Integration (Exercises 9-16)

Exercise 9: Content-Based Router

Bloom Level: Apply

Write a complete COBOL program that acts as a content-based router for the HA Banking system. The program should:

a) Get messages from queue Q.HABANK.INBOUND.
b) Parse the message type field (first 4 bytes of message).
c) Route messages as follows:
   - OPEN → Q.HABANK.ACCT.OPEN
   - CLOS → Q.HABANK.ACCT.CLOSE
   - XFER → Q.HABANK.TRANSFER.PROCESS
   - BALQ → Q.HABANK.BALANCE.INQUIRY
   - Any amount field > 50,000 → Q.HABANK.HIGHVALUE.REVIEW (in addition to normal routing)
   - Unknown → Q.HABANK.UNROUTED
d) Include counters for each route and a summary report at end of processing.
e) Handle poison messages (backout count > 3).
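For orientation, the routing rules in part (c) reduce to a small lookup plus one overriding rule. A Python sketch of that logic (the amount check is shown on an already-parsed field, which is an assumption; the COBOL version must extract it from the message body):

```python
# Queue names as given in the exercise; the 4-byte type prefix
# selects the primary route.
ROUTES = {
    "OPEN": "Q.HABANK.ACCT.OPEN",
    "CLOS": "Q.HABANK.ACCT.CLOSE",
    "XFER": "Q.HABANK.TRANSFER.PROCESS",
    "BALQ": "Q.HABANK.BALANCE.INQUIRY",
}

def route(message: str, amount: float = 0.0) -> list[str]:
    """Return every queue the message should be put to: its primary
    route (or the unrouted queue for unknown types), plus the
    high-value review queue when the amount exceeds 50,000."""
    msg_type = message[:4]                       # first 4 bytes carry the type
    targets = [ROUTES.get(msg_type, "Q.HABANK.UNROUTED")]
    if amount > 50_000:
        targets.append("Q.HABANK.HIGHVALUE.REVIEW")
    return targets
```

The high-value rule is additive, not a replacement: a large transfer still reaches Q.HABANK.TRANSFER.PROCESS and also lands on the review queue.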

Exercise 10: MQ Pub/Sub Configuration

Bloom Level: Apply

Design the MQ topic tree for the HA Banking system's transaction events.

a) Define the topic hierarchy with at least three levels (e.g., HABANK/ACCOUNTS/UPDATES).
b) Write the MQSC commands to define the topic objects and administrative subscriptions.
c) Design subscription filters for three consumers: the fraud engine (wants all transactions > $10,000), the mobile notification service (wants all transactions for enrolled accounts), and the data warehouse (wants all transactions).
d) Explain how adding a sixth consumer would work without changing the publisher.

Exercise 11: Message Transformation — COBOL to JSON

Bloom Level: Apply

Write a COBOL program that transforms an account transaction record from internal COBOL format to JSON. The internal format:

01  WS-TRANSACTION-RECORD.
    05  WS-TXN-ID           PIC X(16).
    05  WS-ACCT-NUMBER      PIC X(12).
    05  WS-TXN-TYPE         PIC X(04).
    05  WS-TXN-AMOUNT       PIC S9(11)V99 COMP-3.
    05  WS-TXN-DATE         PIC 9(08).
    05  WS-TXN-TIME         PIC 9(06).
    05  WS-TXN-DESC         PIC X(40).
    05  WS-MERCHANT-ID      PIC X(15).
    05  WS-TXN-STATUS       PIC X(02).

The target JSON format:

{
  "transactionId": "TXN-0000000001",
  "accountNumber": "ACCT-00001234",
  "type": "DEBIT",
  "amount": "1234.56",
  "currency": "USD",
  "timestamp": "2026-03-15T14:23:07",
  "description": "PURCHASE AT STORE",
  "merchantId": "MERCH-001",
  "status": "COMPLETED"
}

Handle all format conversions: packed decimal to string, YYYYMMDD+HHMMSS to ISO 8601, trailing spaces trimmed from strings.
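The two conversions that usually cause trouble, packed decimal and the date/time stitch, can be prototyped in Python before writing the COBOL. This sketch assumes the standard COMP-3 nibble layout (digit nibbles followed by a trailing sign nibble, 0xC or 0xF positive, 0xD negative); the function names are hypothetical:

```python
def unpack_comp3(data: bytes, scale: int = 2) -> str:
    """Decode an IBM packed-decimal (COMP-3) field into a signed
    decimal string. Each byte holds two digit nibbles; the final
    nibble is the sign."""
    nibbles = []
    for b in data:
        nibbles.extend((b >> 4, b & 0x0F))
    sign = "-" if nibbles.pop() == 0x0D else ""
    digits = "".join(str(n) for n in nibbles)
    # Insert the implied decimal point and drop leading zeros.
    return f"{sign}{int(digits[:-scale])}.{digits[-scale:]}"

def to_iso8601(yyyymmdd: str, hhmmss: str) -> str:
    """Stitch WS-TXN-DATE (YYYYMMDD) and WS-TXN-TIME (HHMMSS)
    into an ISO 8601 timestamp."""
    return (f"{yyyymmdd[:4]}-{yyyymmdd[4:6]}-{yyyymmdd[6:]}"
            f"T{hhmmss[:2]}:{hhmmss[2:4]}:{hhmmss[4:]}")
```

A PIC S9(11)V99 COMP-3 field occupies 7 bytes (13 digits plus the sign nibble), so the amount 1234.56 arrives as X'0000000123456C'.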

Exercise 12: Idempotent Consumer Implementation

Bloom Level: Apply

Write a COBOL/DB2 program that implements an idempotent MQ consumer:

a) Gets messages from a queue.
b) Checks a DB2 tracking table to see if the message ID was already processed.
c) If duplicate, skips processing and logs the duplicate.
d) If new, processes the message within a DB2 unit of work that includes both the business update and the tracking table insert.
e) Commits the MQ get and the DB2 updates as a single syncpoint.
f) Explain why the tracking table insert and the business update MUST be in the same unit of work.
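The core of the pattern in parts (b) through (d) is small enough to sketch in Python, with an in-memory set standing in for the DB2 tracking table. In the real program the membership test is a SELECT, the add is an INSERT, and, per part (f), both the insert and the business update sit inside one unit of work:

```python
# Illustrative names; the set plays the role of the DB2 tracking table.
processed_ids: set[str] = set()
duplicates: list[str] = []

def consume(msg_id: str, apply_business_update) -> bool:
    """Process a message exactly once. Returns True if processed,
    False if skipped as a duplicate."""
    if msg_id in processed_ids:
        duplicates.append(msg_id)      # part (c): log and skip
        return False
    apply_business_update()            # business update ...
    processed_ids.add(msg_id)          # ... and tracking insert, together
    return True
```

If the two writes were in separate units of work, a failure between them would leave the system either claiming a message was processed when it was not, or reprocessing one that was; keeping them atomic removes both windows.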

Exercise 13: Message Sequencing and Ordering

Bloom Level: Analyze

The fraud detection engine requires that transaction messages arrive in chronological order per account. However, MQ does not guarantee ordering across multiple messages when multiple producer threads are involved.

a) Design a sequencing scheme using message headers that allows the consumer to detect out-of-order delivery.
b) Write COBOL pseudocode for a consumer that detects gaps and out-of-order messages.
c) How would you handle the case where message 5 arrives before message 4? What are the options (hold, process anyway, reject)?
d) What is the performance impact of strict ordering versus best-effort ordering?
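A sketch of the gap detection asked for in part (b), in Python rather than COBOL pseudocode. The status strings and the skip-ahead policy on a gap are illustrative choices, not the only valid ones:

```python
# Next expected sequence number per account (hypothetical structure).
expected: dict[str, int] = {}

def check_sequence(account: str, seq: int) -> str:
    """Classify an arriving message against the next expected
    per-account sequence number."""
    want = expected.get(account, 1)
    if seq == want:
        expected[account] = want + 1
        return "IN-ORDER"
    if seq > want:
        expected[account] = seq + 1    # one policy: skip ahead, flag the gap
        return "GAP"
    return "DUPLICATE-OR-LATE"
```

The part (c) scenario falls out directly: when message 5 arrives first it is flagged as a GAP, and when message 4 then arrives it classifies as DUPLICATE-OR-LATE, forcing an explicit hold/process/reject decision.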

Exercise 14: Canonical Data Model Design

Bloom Level: Design

Design a canonical data model for the HA Banking system that supports account data exchange between all five downstream systems.

a) Define the JSON schema for the canonical account model, including all fields needed by all five consumers.
b) Write the COBOL "to canonical" transformer that converts from the internal VSAM record layout.
c) Write the COBOL "from canonical" transformer for the regulatory reporting format (fixed-width, ASCII, specific field ordering).
d) Explain how you'd version the canonical model when new fields are added. What happens to existing consumers?
e) Calculate the reduction in transformation count: N consumers without canonical vs. with canonical.
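For part (e), the counting argument is worth writing down explicitly. With S source formats and C consumer formats, point-to-point integration needs one transformer per (source, consumer) pair, while a canonical model needs one "to canonical" per source plus one "from canonical" per consumer:

```python
def transformer_counts(sources: int, consumers: int) -> tuple[int, int]:
    """Return (point_to_point, canonical) transformer counts for
    S source formats and C consumer formats."""
    return sources * consumers, sources + consumers

# e.g. five systems exchanging data in both roles:
direct, canonical = transformer_counts(5, 5)   # 25 vs. 10
```

The benefit grows multiplicatively with scale; conversely, with a single source and few consumers the canonical layer adds one transformer rather than saving any, which is worth acknowledging in your answer.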

Exercise 15: Dead Letter Queue Processing

Bloom Level: Apply

Write a COBOL program that processes the dead letter queue (DLQ) for the HA Banking system:

a) Reads messages from the DLQ.
b) Examines the dead-letter header (MQDLH) to determine the original destination queue and the reason code.
c) Categorizes failures: queue full, queue not found, message too large, expired, put inhibited.
d) For "queue full" failures, attempts to re-route to the original queue after a configurable delay.
e) For all other failures, writes the message to an error log dataset and sends an alert.
f) Produces a summary report of DLQ processing.

Exercise 16: Message Volume Capacity Planning

Bloom Level: Evaluate

The HA Banking system processes 2 million transactions per day. Management wants all transactions published via MQ for real-time consumption.

a) Calculate the messages per second during peak hours (assume 70% of daily volume in an 8-hour window).
b) If each message averages 2KB after transformation, calculate the hourly MQ storage requirement during peak hours.
c) If three subscribers each receive a copy of every message, calculate the total daily MQ throughput in GB.
d) At what point does message-based integration become impractical and file-based become necessary? Justify your threshold.
e) Design a hybrid approach that uses MQ for real-time critical consumers and batch files for non-critical consumers.
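One way to set up the arithmetic for parts (a) and (c), using only the figures given in the exercise:

```python
# 2 million messages/day, 70% of them in an 8-hour peak window,
# 2 KB per message, three subscribers each taking a full copy.
DAILY_MSGS = 2_000_000
PEAK_MSGS = DAILY_MSGS * 0.70

peak_rate = PEAK_MSGS / (8 * 3600)     # messages per second at peak, ~48.6

MSG_KB = 2
SUBSCRIBERS = 3
# Delivered volume only; the initial publish adds one more pass
# over the same bytes.
daily_delivered_gb = DAILY_MSGS * MSG_KB * SUBSCRIBERS / (1024 * 1024)
```

Roughly 49 messages/second at peak is well within MQ's comfort zone, which is the kind of quantitative anchor part (d) asks you to argue from.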


Exercise Set C: API-Based Integration (Exercises 17-21)

Exercise 17: z/OS Connect Service Design

Bloom Level: Design

Design the z/OS Connect service definition for the HA Banking account inquiry API:

a) Define the CICS COMMAREA copybook for the request and response.
b) Specify the RESTful endpoint design: URI, HTTP method, path parameters, query parameters.
c) Define the JSON request and response schemas.
d) Map each COBOL field to its JSON equivalent, noting any format transformations.
e) Specify error response formats for: account not found, invalid account number, system unavailable, authorization failure.

Exercise 18: API Versioning Strategy

Bloom Level: Evaluate

SecureFirst Insurance needs to add three new fields to their policy inquiry API. The existing API has 47 consumers.

a) Compare three versioning strategies: URL versioning (/v1/, /v2/), header versioning (Accept: application/vnd.securefirst.v2+json), and query parameter versioning (?version=2).
b) For each strategy, describe the impact on existing consumers.
c) Recommend a strategy and justify your choice.
d) Design a deprecation timeline: how long do you support v1 after v2 launches?
e) Write a z/OS Connect configuration snippet that routes requests to different CICS programs based on API version.

Exercise 19: API Caching Strategy

Bloom Level: Analyze

CNB's account inquiry API handles 50,000 calls per hour. 60% are for the same 5,000 accounts (frequent balance checkers). Each call costs 0.003 MIPS.

a) Calculate the daily MIPS consumption without caching.
b) If you cache account data with a 60-second TTL, estimate the MIPS reduction. State your assumptions about cache hit rates.
c) What data should NOT be cached? (Hint: think about regulatory and consistency requirements.)
d) Design a cache invalidation strategy that uses MQ messages to invalidate cache entries when account data changes.
e) Calculate the cost savings if 1 MIPS = $3,000/month.
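A possible framing for parts (a) and (b). The 90% hit rate on hot accounts is an illustrative assumption, exactly the kind part (b) asks you to state and defend:

```python
# Exercise figures: 50,000 calls/hour, 0.003 MIPS per call.
CALLS_PER_HOUR = 50_000
COST_PER_CALL = 0.003

daily_cost = CALLS_PER_HOUR * 24 * COST_PER_CALL   # uncached baseline

hot_share = 0.60          # 60% of calls hit the same 5,000 accounts
assumed_hit_rate = 0.90   # ASSUMPTION: a 60s TTL absorbs 90% of hot calls
saved = daily_cost * hot_share * assumed_hit_rate
cached_cost = daily_cost - saved
```

Treat the per-call "MIPS" figure as the exercise does, as a unit of CPU cost to be summed; the dollar conversion in part (e) then follows from the same totals.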

Exercise 20: API Rate Limiting and Throttling

Bloom Level: Design

Design a rate-limiting strategy for the HA Banking API layer:

a) Define rate limits for three consumer tiers: internal systems (high), partner systems (medium), and third-party applications (low).
b) Describe how rate limiting protects the mainframe during traffic spikes.
c) Write the HTTP response headers that communicate rate limit status to consumers.
d) Design a circuit breaker pattern that temporarily disables API access when mainframe response times exceed a threshold.
e) How do rate limits interact with the batch window? Should API rate limits change during batch processing?
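The tier limits in part (a) are commonly enforced with a token bucket per consumer tier. A minimal Python sketch; the rates and burst sizes here are illustrative, not prescribed by the exercise:

```python
import time

class TokenBucket:
    """Allow up to `burst` requests at once, refilled at `rate_per_sec`."""
    def __init__(self, rate_per_sec: float, burst: int):
        self.rate = rate_per_sec
        self.capacity = burst
        self.tokens = float(burst)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# Illustrative tiers for part (a).
tiers = {"internal": TokenBucket(500, 1000),
         "partner": TokenBucket(50, 100),
         "third_party": TokenBucket(5, 10)}
```

For part (c), consumers are commonly informed via headers such as X-RateLimit-Limit, X-RateLimit-Remaining, and Retry-After on a 429 response.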

Exercise 21: End-to-End API Latency Analysis

Bloom Level: Evaluate

Trace the complete request path for an account balance inquiry from a mobile app to the mainframe and back:

Mobile App → CDN/Load Balancer → API Gateway → z/OS Connect → CICS → VSAM → Response path

a) Estimate the latency contribution of each component (provide a range).
b) Identify the top three latency bottlenecks.
c) For each bottleneck, propose a mitigation strategy.
d) What is the realistic end-to-end latency target for this API? Under what conditions would this target be missed?
e) How would you monitor API latency in production? What alerting thresholds would you set?


Exercise Set D: Advanced Integration (Exercises 22-28)

Exercise 22: CDC Implementation for VSAM

Bloom Level: Apply

Design and code an application-level CDC solution for the HA Banking system's VSAM account master file:

a) Define the CDC record layout (before image, after image, operation type, timestamp, program ID, user ID).
b) Write a reusable COBOL subprogram (CDCWRITE) that any update program can call to write a CDC record.
c) Write the consumer program that reads CDC records and publishes them as MQ messages.
d) How do you ensure that every program that updates the VSAM file calls CDCWRITE? What happens if a program is missed?
e) Design a reconciliation process that detects CDC gaps by comparing the VSAM file state against accumulated CDC records.

Exercise 23: Data Warehouse Feed Architecture

Bloom Level: Design

Design the complete data warehouse feed for the HA Banking system. The warehouse needs:

- Daily full extract of all accounts (14 million records)
- Daily incremental extract of changed transactions
- Monthly full refresh of reference data (branches, products, officers)

a) Design the JCL job stream for the daily feed (all extracts, transfers, and notifications).
b) Design the file layouts for each extract, including control records.
c) How do you handle the case where the full extract shows 14,000,275 accounts but the warehouse's accumulated daily incrementals show 14,000,250? (25 missing accounts)
d) Estimate the total data volume for the daily feed and the monthly full refresh.
e) Design the monitoring dashboard that tracks feed health: completion times, record counts, transfer speeds, discrepancy counts.

Exercise 24: Regulatory Reporting Pipeline

Bloom Level: Design

Federal Benefits must produce a quarterly regulatory report containing all benefit payments, adjustments, and eligibility changes for 4.2 million beneficiaries. The report must be delivered to three federal agencies in three different file formats.

a) Design the extract process: what data sources, what selection criteria, what sorting?
b) Design the transformation layer that produces three output formats from a single canonical extract.
c) How do you ensure the report is complete and accurate before submission? Describe the validation steps.
d) Design the secure transfer mechanism for each agency (assume each has different connectivity: one uses Connect:Direct, one uses SFTP, one uses an API upload endpoint).
e) What happens if an agency rejects the file? Design the error handling and resubmission process.

Exercise 25: Integration Testing Strategy

Bloom Level: Evaluate

You've built the HA Banking integration layer with file, message, and API patterns. Now you need to test it.

a) Design test cases for file-based integration: normal processing, empty file, file with invalid control record, missing file, duplicate file.
b) Design test cases for message-based integration: normal routing, poison message, queue full, out-of-order delivery, duplicate message.
c) Design test cases for API-based integration: normal response, not found, timeout, rate limit exceeded, invalid input, concurrent requests.
d) How do you test the end-to-end flow from mainframe update through CDC to MQ to downstream consumption?
e) What is your strategy for testing with production-like volumes without impacting production?

Exercise 26: Integration Failure Recovery

Bloom Level: Evaluate

At 3:47 AM, the MQ channel between the mainframe and the fraud detection engine goes down. Transactions continue processing on the mainframe, but CDC messages queue up. The channel is restored at 5:12 AM — 85 minutes of queued messages.

a) How many messages are queued if the system processes 150 transactions per minute overnight?
b) What happens when the channel restores? Describe the message flood scenario.
c) How do you prevent the fraud engine from being overwhelmed by 12,750 messages arriving simultaneously?
d) What if some queued messages expired during the outage? How do you recover the lost messages?
e) Design an automated recovery procedure that handles this scenario without manual intervention.
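The backlog arithmetic for parts (a) and (c) takes a few lines; the 50 msg/s drain throttle is an illustrative assumption for how a rate-limited consumer might be configured:

```python
# Part (a): backlog accumulated during the 85-minute outage.
RATE_PER_MIN = 150
OUTAGE_MIN = 85
backlog = RATE_PER_MIN * OUTAGE_MIN          # 12,750 queued messages

# Part (c): drain time for a throttled consumer while new traffic
# keeps arriving. 50 msg/s is an assumed throttle, not a given.
drain_per_sec = 50
arrival_per_sec = RATE_PER_MIN / 60          # 2.5 msg/s still arriving
drain_seconds = backlog / (drain_per_sec - arrival_per_sec)
```

At that assumed throttle the queue clears in roughly four and a half minutes, which suggests the real design question is protecting the fraud engine's downstream dependencies, not raw drain time.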

Exercise 27: Format Conversion Edge Cases

Bloom Level: Analyze

Write COBOL code that handles the following format conversion edge cases:

a) A packed decimal field (PIC S9(11)V99 COMP-3) containing the maximum positive value, the maximum negative value, and zero.
b) An EBCDIC text field containing characters that have no ASCII equivalent (e.g., the EBCDIC cent sign, logical NOT).
c) A date field containing 00000000 (null date), 99991231 (no-end-date sentinel), and 20260229 (invalid — 2026 is not a leap year).
d) A variable-length COBOL field (OCCURS DEPENDING ON) that must be converted to a JSON array.
e) A REDEFINES clause where the same storage can represent either a numeric amount or an alphanumeric code depending on a type indicator.
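Part (c)'s sentinel-then-validate ordering can be prototyped directly. Python's datetime.date applies real calendar rules, so 20260229 is rejected (2026 is not a leap year); the status strings are hypothetical labels for the exercise's three cases:

```python
from datetime import date

def classify_date(yyyymmdd: str) -> str:
    """Check sentinel values first, then apply calendar validation."""
    if yyyymmdd == "00000000":
        return "NULL-DATE"
    if yyyymmdd == "99991231":
        return "NO-END-DATE"
    try:
        date(int(yyyymmdd[:4]), int(yyyymmdd[4:6]), int(yyyymmdd[6:]))
        return "VALID"
    except ValueError:
        return "INVALID"
```

The ordering matters: both sentinels would fail calendar validation (month 00; and 9999-12-31 is valid but means "open-ended" here), so they must be intercepted before the date check rather than after it.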

Exercise 28: Integration Architecture Review

Bloom Level: Evaluate

You are asked to review the following integration architecture proposed by a junior architect at Pinnacle Health:

"All integration will use REST APIs via z/OS Connect. The 500,000 daily claim records will be sent one at a time via API calls from the downstream billing system. Real-time eligibility checks will also use APIs. The nightly data warehouse feed will query the API 500,000 times to pull all claims. All APIs will share a single z/OS Connect instance with no caching or rate limiting."

a) Identify at least five critical problems with this architecture. b) For each problem, propose a specific remedy. c) Redesign the architecture using appropriate integration patterns for each use case. d) Estimate the MIPS impact of the original architecture versus your redesigned architecture. e) Write a one-page architecture decision record (ADR) documenting your recommended design and its rationale.