
Learning Objectives

  • Evaluate file-based, message-based, and API-based data integration patterns for COBOL systems
  • Design file-based integration with FTP/SFTP, Connect:Direct, and GDG-based handoff patterns
  • Implement message-based integration using MQ format transformation and routing
  • Design API-based integration patterns for real-time data exchange
  • Architect the data integration layer for the HA banking system

Chapter 22: Data Integration Patterns: File-Based, Message-Based, and API-Based Exchange with Distributed Systems

"Every mainframe I've ever worked on is the system of record for something. The question is never whether you need to share that data — it's how many systems need it, how fast they need it, and what happens when the feed breaks at 2 AM on a Saturday." — Rob Fielding, Integration Architect, Central National Bank


22.1 The Integration Challenge

Let me be blunt about something that every mainframe developer eventually learns: your COBOL programs don't exist in isolation, and they never have. The myth of the self-contained mainframe died sometime in the 1990s, and if you're still thinking of your batch jobs and CICS transactions as endpoints unto themselves, you're going to have a very bad time.

The modern mainframe sits at the center of a web of integrations that would make a spider jealous. At Central National Bank, Kwame Asante's team counted 347 distinct integration points flowing into and out of their z/OS environment last year. Three hundred and forty-seven. That number grew by 23% from the prior year, and nobody expects the trend to reverse.

The Mainframe as System of Record

The reason the mainframe remains so deeply integrated is simple: it holds the truth. For financial institutions, healthcare systems, government agencies, and insurance companies, the mainframe is the authoritative source for core business data. Account balances. Patient records. Benefit eligibility. Policy details. These aren't copies or caches — they are the data.

This creates a fundamental asymmetry in integration design. When your Java microservice disagrees with the VSAM file about a customer's balance, the VSAM file wins. Every time. This asymmetry has profound implications for how you design integration patterns, handle conflicts, and think about data consistency.

The N-System Problem

Consider what Sandra Chen faces at Federal Benefits Administration. The core eligibility system — 4.2 million lines of COBOL running batch and CICS — must feed data to:

  • A web portal (Java/React) serving 60 million beneficiaries
  • A mobile application (iOS/Android) with push notifications
  • A data warehouse (Teradata) for congressional reporting
  • Three state-level systems receiving daily eligibility files
  • An analytics platform (Hadoop/Spark) for fraud detection
  • An audit system maintaining seven-year transaction history
  • Two partner agency systems with real-time inquiry needs
  • A document generation system producing 2 million letters monthly

That's nine distinct consumers with different data formats, latency requirements, volume characteristics, and failure modes. And Sandra's team of twelve COBOL developers is responsible for the mainframe side of every single one.

This is the N-system problem. As N grows, point-to-point integration becomes untenable. If you have N systems each talking to every other system, you have N(N-1)/2 potential integration points. With Sandra's nine consumers plus the mainframe, that's 45 potential connections. Add three more systems next year and you're at 78.
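The connection arithmetic is worth internalizing. A quick sketch of the formula (Python here purely for illustration):

```python
def point_to_point_links(n: int) -> int:
    """Potential connections when each of n systems
    may talk to every other: n(n-1)/2."""
    return n * (n - 1) // 2

# Sandra's nine consumers plus the mainframe: 10 systems
print(point_to_point_links(10))   # 45
# Three more systems next year: 13 systems
print(point_to_point_links(13))   # 78
```

The quadratic growth is the whole argument for the hub patterns later in this chapter: every system you add multiplies work across all the others.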

Integration Pattern Categories

The industry has converged on three fundamental patterns for moving data between systems, and every integration you'll ever build falls into one of these categories or a hybrid:

Pattern        Latency             Volume                           Coupling  Complexity
File-Based     Minutes to hours    Very high (millions of records)  Loose     Low to moderate
Message-Based  Seconds to minutes  Moderate (thousands/sec)         Moderate  Moderate to high
API-Based      Milliseconds        Low to moderate (hundreds/sec)   Tight     Moderate

None of these patterns is inherently superior. Each has sweet spots and failure modes. The art of integration design is matching the pattern to the business requirement, and experienced practitioners know that most real-world systems use all three simultaneously.

ETL vs. ELT: Where Transformation Happens

Two philosophies govern how data moves between systems. In ETL (Extract, Transform, Load), you transform data on the source system before sending it. The mainframe COBOL program reads VSAM, converts packed decimal to display, reformats dates, translates EBCDIC to ASCII, and writes the output file. The downstream system loads the file as-is.

In ELT (Extract, Load, Transform), you send raw data and let the target system handle transformation. The mainframe writes the file in native format — EBCDIC, packed decimal, raw COMP fields — and the data warehouse or middleware handles conversion on ingest.

Most mainframe integrations use ETL. Why? Because the mainframe understands its own data formats better than any external system. A COBOL program knows that field X is PIC S9(7)V99 COMP-3 because the copybook says so. Asking a Java program to decode packed decimal from raw bytes is asking for trouble. Kwame's team at CNB spent two weeks debugging an ELT integration where the Java-side packed decimal parser mishandled negative values. They switched to ETL — COBOL does the conversion — and the problem disappeared.
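To see why distributed-side parsing is the risky half of ELT, here is a minimal packed-decimal (COMP-3) decoder in Python. The sign nibble on the final byte is exactly the detail CNB's Java parser mishandled; this is a sketch for illustration, not a production decoder:

```python
def unpack_comp3(raw: bytes, scale: int = 0):
    """Decode an IBM packed-decimal (COMP-3) field.

    Each byte carries two decimal digits, except the last byte,
    which carries one digit plus a sign nibble:
    0xC = positive, 0xD = negative, 0xF = unsigned.
    """
    digits = []
    sign = 1
    for i, byte in enumerate(raw):
        hi, lo = byte >> 4, byte & 0x0F
        if i < len(raw) - 1:
            digits += [hi, lo]
        else:
            digits.append(hi)
            sign = -1 if lo == 0x0D else 1   # the nibble Java got wrong
    value = sign * int("".join(map(str, digits)))
    return value / 10**scale if scale else value

# PIC S9(7)V99 COMP-3 value -1234567.89 occupies 5 bytes:
print(unpack_comp3(b"\x12\x34\x56\x78\x9D", scale=2))  # -1234567.89
```

Every downstream consumer that does ELT must reimplement (and regression-test) this logic; with ETL, the COBOL compiler does it correctly for free.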

The exception is when you're sending data to another mainframe. Mainframe-to-mainframe transfers can use native formats because both sides understand EBCDIC, packed decimal, and the full COBOL data representation repertoire.

The Cost of Integration

Integration isn't free. At CNB, Kwame estimates that 35% of his team's development effort goes to integration-related work: building new feeds, maintaining existing ones, troubleshooting failures, and responding to downstream consumer requests. At Federal Benefits, Sandra's estimate is even higher — 40%. The mainframe team spends more time sharing data than processing it.

This cost is invisible to most stakeholders. A business user who requests a "simple data feed" doesn't see the JCL, the COBOL transformation program, the Connect:Direct process, the GDG management, the control record validation, the monitoring, the on-call rotation, and the documentation. A "simple" feed is typically 2-3 weeks of work for an experienced developer, and it requires ongoing maintenance indefinitely.

Understanding this cost drives good architecture decisions. Every integration you avoid — by consolidating feeds, using canonical models, or implementing pub/sub instead of point-to-point — saves not just development time but ongoing maintenance burden.

Spaced Review: Datasets (Chapter 4)

Before we dive into file-based integration, recall what you learned about z/OS datasets in Chapter 4. Sequential datasets (PS), partitioned datasets (PDS/PDSE), and VSAM files each have characteristics that directly influence integration design. A sequential dataset is a natural fit for file-based integration — it's essentially a flat file ready for transfer. VSAM KSDS files support keyed access patterns that matter for CDC. GDG (Generation Data Group) bases provide automatic versioning that we'll exploit heavily in this chapter. If these concepts feel fuzzy, revisit Chapter 4 sections 4.3 through 4.5 before continuing.


22.2 File-Based Integration

File-based integration is the oldest pattern in the mainframe world, and it remains the workhorse of enterprise data movement. When Kwame's team at CNB needs to send 14 million account records to the data warehouse every night, they don't use an API. They write a file, transfer it, and let the downstream system process it.

Why File-Based Integration Persists

Critics dismiss file-based integration as legacy thinking. They're wrong. File-based integration persists because it has properties that no other pattern matches:

Decoupling in time. The producer and consumer don't need to be running simultaneously. Your batch job can write the file at 2 AM, and the downstream system can pick it up at 6 AM. If the downstream system is down for maintenance, the file waits.

Volume efficiency. Nothing moves large volumes of structured data more efficiently than a well-designed flat file transfer. MQ and APIs incur per-message overhead that becomes punishing at scale. A 50GB file transfer over Connect:Direct saturates the network pipe. Sending 50GB as individual MQ messages adds headers, acknowledgments, and session management overhead that can triple the elapsed time.

Recoverability. If a file transfer fails, you retransmit the file. If a message-based feed loses messages, you're in a much harder recovery scenario. Files are inherently idempotent — you can re-send the same file and get the same result.

Auditability. The file exists as a physical artifact. You can inspect it, count its records, validate its checksums, and archive it for regulatory compliance. Try doing that with a stream of MQ messages after the fact.

Transfer Mechanisms

FTP/SFTP

FTP has been moving files off mainframes since the 1970s, and z/OS Communications Server provides a robust FTP server. SFTP (SSH File Transfer Protocol) adds encryption and has largely replaced plain FTP for cross-platform transfers.

Key considerations for mainframe FTP/SFTP:

  • Character encoding. FTP between z/OS and distributed systems must handle EBCDIC-to-ASCII conversion. The SITE subcommands and transfer mode (ASCII vs. BINARY) control this. Get it wrong and your downstream system sees gibberish.
  • Dataset allocation. When receiving files, you must pre-allocate the target dataset or use JCL-based triggers. FTP can create sequential datasets but the space parameters need attention.
  • Security. RACF profiles control FTP access. Every FTP user maps to a z/OS userid with appropriate dataset access. SFTP uses SSH keys, which means managing key distribution and rotation.
  • Automation. FTP transfers triggered by batch jobs use JCL and the FTP client program (PGM=FTP). The control statements go in the INPUT DD. Error handling requires checking return codes from the FTP step.
//FTPSTEP  EXEC PGM=FTP,PARM='(EXIT'
//SYSPRINT DD SYSOUT=*
//OUTPUT   DD SYSOUT=*
//INPUT    DD *
 OPEN WAREHOUSE.BANK.COM
 USER WHADMIN ********
 ASCII
 PUT 'CNB.PROD.ACCT.EXTRACT.G0000V00' ACCT_EXTRACT.DAT
 QUIT
/*

That's a minimal example. Production jobs add error trapping, retry logic, and notification steps. Lisa Park at CNB maintains a standardized FTP JCL library with 47 variations covering their common transfer scenarios.

The critical gotcha with FTP from JCL: the FTP program's return code doesn't always reflect what you think. A return code of 0 means the FTP client ran successfully, not necessarily that all transfers completed. You must parse SYSPRINT output for transfer confirmation or use a verification step that checks for the file's existence on the remote system. Rob Fielding has a rule: never trust an FTP return code — always verify independently.
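Rob's "verify independently" rule can live in the receiving-side automation as well as in JCL. A sketch of the verification logic, assuming fixed-length records and binary transfer (the function name and parameters are illustrative; the remote size would come from an SFTP stat call as noted in the comment):

```python
def transfer_verified(local_size: int, remote_size,
                      expected_records: int, lrecl: int) -> bool:
    """Independent post-transfer check: the remote file must exist
    and its size must equal both the local size and the expected
    record-count * record-length (fixed-length, binary transfer)."""
    if remote_size is None:   # stat failed: the file never arrived
        return False
    expected_size = expected_records * lrecl
    return remote_size == local_size == expected_size

# With paramiko, the remote size would come from something like:
#   remote_size = sftp.stat("/data/incoming/acct_extract.dat").st_size
print(transfer_verified(local_size=1_400_000, remote_size=1_400_000,
                        expected_records=14_000, lrecl=100))  # True
```

The point is the independence: the check uses numbers the FTP client never reported, so a lying return code cannot fool it.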

FTP vs. SFTP Decision Matrix:

Factor                 FTP                                       SFTP
Encryption             None (plaintext)                          Full (SSH tunnel)
Authentication         User/password                             SSH keys or user/password
Firewall friendliness  Poor (dual port)                          Good (single port 22)
z/OS support           Native (Communications Server)            Via USS SSH
Performance            Slightly faster (no encryption overhead)  Slightly slower
Regulatory compliance  Fails most audits                         Passes most audits

For any new integration in 2026, use SFTP. Plain FTP is a compliance finding waiting to happen. The only exception is mainframe-to-mainframe transfers within a secured data center network where both nodes are behind the same firewall — and even then, your security team will push back.

Connect:Direct (NDM)

Connect:Direct — still called NDM (Network Data Mover) by veterans who remember its Sterling Commerce origins — is the enterprise standard for mainframe file transfer. If you work in financial services, healthcare, or government, you're using Connect:Direct. Period.

Connect:Direct offers capabilities that FTP simply cannot match:

Checkpoint/restart. A 50GB transfer that fails at 38GB resumes from the last checkpoint, not from the beginning. For transfers running over WAN links, this alone justifies Connect:Direct's licensing cost.

Compression. Built-in compression reduces transfer volumes by 60-80% for typical COBOL record layouts. The repetitive nature of fixed-length records compresses extremely well.

Process automation. Connect:Direct "processes" are scripts that define multi-step transfer workflows including pre- and post-processing, conditional logic, and notification. A single process can extract data, transfer it, and trigger downstream processing.

Guaranteed delivery. Connect:Direct maintains a transmission queue and guarantees delivery even if the remote node is temporarily unavailable. Messages are queued and forwarded when connectivity resumes.

Scheduling. Built-in scheduling eliminates the need for external schedulers for file transfer timing.

A Connect:Direct process for CNB's nightly account extract looks like this:

ACCTXFER PROCESS SNODE=WAREHOUSE
  STEP01  COPY FROM (
              DSN='CNB.PROD.ACCT.EXTRACT.G0000V00'
              DISP=SHR
              PNODE)
          TO (
              FILE=/data/incoming/acct_extract.dat
              DISP=RPL
              SNODE)
          COMPRESS EXTENDED
          CHECKPOINT=1000000
  STEP02  RUN TASK (
              PGM='/opt/scripts/trigger_warehouse_load.sh'
              SNODE)
  IF (STEP01 = 0 AND STEP02 = 0) THEN
    STEP03  RUN TASK (
                PGM=IKJEFT01
                PARM='NOTIFY ACCTXFER COMPLETE'
                PNODE)
  ELSE
    STEP04  RUN TASK (
                PGM=IKJEFT01
                PARM='NOTIFY ACCTXFER FAILED'
                PNODE)
  ENDIF
ACCTXFER ENDPROCESS

This process copies the latest GDG generation, compresses it during transfer, checkpoints every million records, triggers the warehouse load script on the remote node, and notifies the mainframe of success or failure. All as a single atomic unit of work.

GDG-Based Handoff Patterns

Generation Data Groups are the unsung heroes of file-based integration. If you're not using GDGs for your integration files, you're doing it wrong.

A GDG base with a LIMIT of 30 gives you automatic versioning, rotation, and cleanup. Every time your batch job creates a new extract file, it writes to the next generation (G0001V00 becomes G0002V00, and so on). The previous 29 generations remain available for reprocessing, auditing, or recovery.

The Producer-Consumer GDG Pattern:

Producer Job (runs at 02:00):
  → Writes CNB.PROD.ACCT.EXTRACT(+1)
  → Catalogs new generation G0035V00
  → Previous generation G0034V00 still available

Transfer Job (runs at 03:00):
  → Reads CNB.PROD.ACCT.EXTRACT(0)  [current = G0035V00]
  → Transfers to downstream system
  → Writes transfer confirmation record

Consumer Job (downstream, runs at 06:00):
  → Processes received file
  → Sends acknowledgment back

The beauty of this pattern is recovery. If the transfer job fails, you rerun it — (0) still points to G0035V00. If the consumer needs yesterday's file, it's at (-1), which is G0034V00. If an auditor needs the file from two weeks ago, it's at (-14) assuming daily processing and a sufficient GDG limit.

The Dual-GDG Acknowledgment Pattern:

For critical feeds, CNB uses paired GDGs:

  • CNB.PROD.ACCT.EXTRACT — the data GDG (producer writes, transfer reads)
  • CNB.PROD.ACCT.EXTRACT.ACK — the acknowledgment GDG (consumer writes, monitoring reads)

Each acknowledgment generation corresponds to a data generation. The monitoring job compares the latest generation numbers. If the data GDG is two or more generations ahead of the acknowledgment GDG, an alert fires. This simple pattern has caught dozens of silent feed failures at CNB.
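The monitoring comparison itself is one line of logic. A sketch, assuming the latest generation numbers have already been read from the catalog (the threshold and names are illustrative):

```python
def feed_alert(data_gen: int, ack_gen: int, max_lag: int = 1) -> bool:
    """Fire an alert when the data GDG has run ahead of the
    acknowledgment GDG by more than max_lag generations --
    meaning the consumer has silently stopped acknowledging."""
    return (data_gen - ack_gen) > max_lag

print(feed_alert(data_gen=35, ack_gen=34))  # False: one behind is normal
print(feed_alert(data_gen=35, ack_gen=33))  # True: two behind, alert
```

Note what this catches that transfer monitoring cannot: the consumer received the file, started processing, and died quietly. No transfer failed, yet the feed is broken.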

Designing Robust File-Based Feeds

Rob Fielding learned the following rules the hard way over two decades. Every rule represents at least one production incident:

  1. Always include a control record. First or last record contains record count, hash total, timestamp, and generation identifier. The consumer validates before processing.

  2. Use fixed-length records for mainframe-to-mainframe. Variable-length records introduce RDW handling complexity. Fixed-length records are simpler, faster, and compress better.

  3. Timestamp your filenames. Even with GDGs, include the business date in the data. CNB.PROD.ACCT.EXTRACT.D20260315 is self-documenting. A GDG generation number tells you nothing about the business date.

  4. Define restart points. Large files should have logical restart markers. If you're processing a 14-million-record file and fail at record 8 million, you need to know where to restart.

  5. Never assume the file arrived complete. Check the control record. Check the file size. Check the record count. Then — and only then — begin processing.

  6. Monitor for missing files, not just failed transfers. The most dangerous failure is the one that doesn't happen. If no file arrives, no error occurs. Build monitoring that detects the absence of expected files.

  7. Version your file layouts. Include a layout version identifier in the header record. When you change the record layout, increment the version. The consumer checks the version and applies the appropriate parsing logic. This enables rolling upgrades — you can produce the new format while consumers transition at their own pace.

  8. Document the business calendar. Not every day produces a file. Weekends, holidays, and quarter-end processing can alter the schedule. The monitoring system needs to know which days to expect files and which days to skip. At CNB, Lisa maintains a "feed calendar" dataset that lists every expected file for the next 90 days, updated quarterly.
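Rules 1 and 5 combine into a single gatekeeper on the consumer side. A minimal sketch, assuming a trailer control record carrying a record count and a hash total of the amount field (the layout, offsets, and sample records are invented for illustration):

```python
def validate_feed(records: list) -> bool:
    """Trailer record layout assumed: 'CTL' + 9-digit record count
    + 15-digit hash total of the amount fields."""
    *data, trailer = records
    if not trailer.startswith("CTL"):
        return False                      # no control record: reject
    expected_count = int(trailer[3:12])
    expected_hash = int(trailer[12:27])
    # Amount field assumed at offset 21, length 11, in each record
    actual_hash = sum(int(r[21:32]) for r in data)
    return len(data) == expected_count and actual_hash == expected_hash

feed = [
    "00847291035 CHECKING 00001542387",
    "00847291036 SAVINGS  00000099500",
    "CTL000000002000000001641887",
]
print(validate_feed(feed))  # True
```

Only after this returns true does processing begin; a short file, a truncated transfer, or a corrupted amount all fail the same cheap check.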

File-Based Integration at Pinnacle Health

Diane Morrison's team at Pinnacle Health faces a unique challenge: HIPAA compliance. Every file containing Protected Health Information (PHI) must be encrypted at rest and in transit. Their file-based integration adds two steps to the standard pattern:

Pre-transfer encryption. A z/OS batch step encrypts the extract file using PGP encryption before Connect:Direct picks it up. The encryption keys are managed through RACF and rotated quarterly.

Post-receipt decryption. The downstream system decrypts after transfer, validates the control record, and only then begins processing.

This adds approximately 15 minutes to a typical transfer cycle — negligible for batch feeds but significant for near-real-time requirements. Ahmad Patel evaluated moving their patient eligibility feed from files to MQ specifically to reduce the encryption overhead (MQ channel encryption is handled at the transport layer, avoiding file-level encryption/decryption cycles).


22.3 Message-Based Integration

When file-based integration's latency is too high, you move to messages. Chapter 19 covered MQ fundamentals — queue managers, channels, persistent messaging, the whole stack. This section focuses on using MQ for data integration patterns that go beyond simple request-reply.

Spaced Review: MQ Fundamentals (Chapter 19)

Recall from Chapter 19 that IBM MQ provides guaranteed message delivery through persistent queues, with exactly-once semantics when used with CICS or batch syncpoint. Messages flow through channels between queue managers. Dead-letter queues catch undeliverable messages. Triggering enables automatic program execution when messages arrive. If you're shaky on any of these concepts, revisit Chapter 19 sections 19.3 through 19.5 before continuing.

Message Routing Patterns

Point-to-Point Routing

The simplest MQ integration is point-to-point: one sender, one receiver, one queue. Your COBOL program puts a message on a queue, and a consumer program gets it off. This works fine for bilateral integrations but creates the same N-squared scaling problem as point-to-point file transfers.

Publish-Subscribe

MQ's pub/sub capability (available since MQ v7) decouples producers from consumers. Your COBOL program publishes account update messages to a topic. Any number of subscribers receive copies. Adding a new consumer doesn't require changing the producer.

       PERFORM MQ-PUBLISH-UPDATE
       ...
       MQ-PUBLISH-UPDATE.
           MOVE 'CNB/ACCOUNTS/UPDATES' TO WS-TOPIC-STRING
           MOVE LENGTH OF WS-TOPIC-STRING
                                        TO WS-TOPIC-LENGTH
      *    Before the open, WS-OBJECT-DESC (an MQOD) must carry
      *    the topic: set MQOD-OBJECTTYPE to MQOT-TOPIC and point
      *    MQOD-OBJECTSTRING at the string and length set above

           CALL 'MQOPEN' USING WS-HCONN
                                WS-OBJECT-DESC
                                MQOO-OUTPUT
                                WS-HOBJ
                                WS-COMPCODE
                                WS-REASON

           IF WS-COMPCODE = MQCC-OK
               CALL 'MQPUT' USING WS-HCONN
                                   WS-HOBJ
                                   WS-MSG-DESC
                                   WS-PUT-OPTIONS
                                   WS-MSG-LENGTH
                                   WS-MSG-BUFFER
                                   WS-COMPCODE
                                   WS-REASON
           END-IF
           .

At CNB, pub/sub reduced Kwame's team's integration maintenance by 40%. Before pub/sub, adding a new consumer of account updates meant modifying the producer program, testing the change, and coordinating deployment across both systems. Now the producer is untouched — the new consumer simply subscribes to the topic.

Content-Based Routing

Content-based routing (CBR) examines message content and routes to different destinations based on data values. In mainframe integration, this typically means a routing program that reads each message, inspects key fields, and puts the message on the appropriate output queue.

       EVALUATE WS-MSG-TYPE
           WHEN 'ACCT-OPEN'
               MOVE 'Q.ACCT.OPEN.PROCESS' TO WS-TARGET-QUEUE
           WHEN 'ACCT-CLOSE'
               MOVE 'Q.ACCT.CLOSE.PROCESS' TO WS-TARGET-QUEUE
           WHEN 'ACCT-UPDATE'
               PERFORM ROUTE-BY-REGION
           WHEN 'BALANCE-ADJ'
               IF WS-ADJ-AMOUNT > 10000
                   MOVE 'Q.HIGVAL.REVIEW' TO WS-TARGET-QUEUE
               ELSE
                   MOVE 'Q.BALANCE.PROCESS' TO WS-TARGET-QUEUE
               END-IF
           WHEN OTHER
               MOVE 'Q.UNROUTED.DLQ' TO WS-TARGET-QUEUE
               ADD 1 TO WS-UNROUTED-COUNT
       END-EVALUATE

Notice the WHEN OTHER clause routing unrecognized messages to a dead-letter-like queue. Never silently drop unrecognized messages. Ever. Ahmad Patel at Pinnacle Health learned this when a new message type was introduced by an upstream system and 12,000 patient eligibility updates silently vanished into a black hole because the router had no default case.

Message Transformation

The real complexity in message-based integration isn't routing — it's transformation. Your COBOL program produces data in EBCDIC with packed decimal amounts and dates in YYYYMMDD format. The Java consumer wants UTF-8 JSON with string amounts and ISO 8601 dates. Something has to bridge that gap.

Transformation Approaches

In-program transformation. The COBOL program itself formats the message for the consumer. This is the most common approach and the one I recommend for simple integrations. Your program knows its own data structures and can build the target format directly.

MQ exits. Channel exits can transform messages in flight. Useful for encoding conversion but too limited for complex structural transformation.

Integration middleware. IBM Integration Bus (IIB, now App Connect Enterprise), DataPower, or similar middleware sits between MQ queue managers and handles complex transformations, enrichment, and routing. This is the right answer for high-volume, complex-format integrations.

z/OS Connect EE. Specifically designed for mainframe-to-API transformation. We'll cover this in section 22.4.

Building a Canonical Data Model

When you have N systems exchanging data, the transformation count explodes. System A speaks EBCDIC fixed-format. System B speaks JSON. System C speaks XML. System D speaks CSV. Without a canonical model, you need A→B, A→C, A→D, B→A, B→C, B→D... transformations. That's N(N-1) transformations.

A canonical data model (CDM) defines a single common format. Every system translates to and from the canonical format. Now you need only 2N transformations — each system needs one "to canonical" and one "from canonical" transformer.

CNB's canonical model for account data uses a JSON schema:

{
  "canonical_version": "3.2",
  "entity_type": "ACCOUNT",
  "business_date": "2026-03-15",
  "account": {
    "account_id": "CNB-00847291035",
    "account_type": "CHECKING",
    "status": "ACTIVE",
    "currency": "USD",
    "balance": {
      "current": "15423.87",
      "available": "14923.87",
      "pending_holds": "500.00"
    },
    "customer_id": "CUST-2847103",
    "branch_id": "BR-0042",
    "opened_date": "2019-06-15",
    "last_activity_date": "2026-03-15T14:23:07Z"
  }
}

The COBOL-side transformer converts from the internal VSAM record layout to this canonical JSON. Every downstream system has its own transformer that converts from canonical to its native format. When CNB adds a new consumer, they write one new "from canonical" transformer. When the COBOL record layout changes, they update one "to canonical" transformer.
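The shape of a "to canonical" transformer is worth seeing concretely. A sketch in Python (the internal field names, code tables, and integer-cents convention are illustrative; in production this logic runs in the COBOL transformer or the middleware):

```python
import json

# Hypothetical decoded VSAM record: packed fields already unpacked,
# money held as integer cents, dates as YYYYMMDD strings
internal = {
    "acct_no": "00847291035",
    "acct_type": "CH",
    "status": "A",
    "cur_bal_cents": 1542387,
    "avail_bal_cents": 1492387,
    "busn_date": "20260315",
}

TYPE_MAP = {"CH": "CHECKING", "SV": "SAVINGS"}
STATUS_MAP = {"A": "ACTIVE", "C": "CLOSED"}

def to_canonical(rec: dict) -> str:
    """One 'to canonical' transformer per producer, no matter
    how many consumers subscribe downstream."""
    d = rec["busn_date"]
    doc = {
        "canonical_version": "3.2",
        "entity_type": "ACCOUNT",
        "business_date": f"{d[0:4]}-{d[4:6]}-{d[6:8]}",
        "account": {
            "account_id": f"CNB-{rec['acct_no']}",
            "account_type": TYPE_MAP[rec["acct_type"]],
            "status": STATUS_MAP[rec["status"]],
            "currency": "USD",
            "balance": {
                # Money travels as strings so no consumer ever
                # round-trips amounts through binary floating point
                "current": f"{rec['cur_bal_cents'] / 100:.2f}",
                "available": f"{rec['avail_bal_cents'] / 100:.2f}",
            },
        },
    }
    return json.dumps(doc)

print(to_canonical(internal))
```

Notice the string-typed amounts, matching the canonical schema above: a deliberate choice that keeps packed-decimal precision intact across every hop.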

Lisa Park spent three months building CNB's canonical model. She says it was the single highest-ROI effort of her career. Every new integration that uses the canonical model takes days instead of weeks to implement.

Message Reliability Patterns

Idempotent Consumers

Network failures, queue manager restarts, and application errors can all cause message redelivery. Your consumer must handle receiving the same message twice without corrupting data. This means:

  • Assign a unique message ID at the producer
  • Track processed message IDs at the consumer
  • Check before processing; skip duplicates
       CHECK-DUPLICATE.
           EXEC SQL
               SELECT COUNT(*) INTO :WS-DUP-COUNT
               FROM MSG_TRACKING
               WHERE MSG_ID = :WS-INCOMING-MSG-ID
           END-EXEC

           IF WS-DUP-COUNT > 0
               ADD 1 TO WS-DUPLICATE-COUNT
               MOVE 'Y' TO WS-SKIP-FLAG
           ELSE
               EXEC SQL
                   INSERT INTO MSG_TRACKING
                   (MSG_ID, RECEIVED_TS, STATUS)
                   VALUES
                   (:WS-INCOMING-MSG-ID,
                    CURRENT TIMESTAMP,
                    'PROCESSING')
               END-EXEC
               MOVE 'N' TO WS-SKIP-FLAG
           END-IF
           .

Poison Message Handling

A poison message is one that repeatedly causes consumer failure. Without protection, the queue manager redelivers it, the consumer fails, the queue manager redelivers it, and you have an infinite loop that blocks all processing.

MQ tracks the backout count for each message. Your program should check it:

       CHECK-BACKOUT.
           IF WS-BACKOUT-COUNT > WS-BACKOUT-THRESHOLD
               PERFORM MOVE-TO-POISON-QUEUE
           ELSE
               PERFORM PROCESS-MESSAGE
           END-IF
           .

        MOVE-TO-POISON-QUEUE.
      *     The queue name goes in the MQPUT1 object descriptor
            MOVE 'Q.POISON.REVIEW' TO MQOD-OBJECTNAME
                                      OF WS-POISON-OD
            CALL 'MQPUT1' USING WS-HCONN
                                WS-POISON-OD
                                WS-MSG-DESC
                                WS-PUT-OPTIONS
                                WS-MSG-LENGTH
                                WS-MSG-BUFFER
                                WS-COMPCODE
                                WS-REASON
           ADD 1 TO WS-POISON-COUNT
           .

At Pinnacle Health, Diane Morrison mandates a backout threshold of 3 for all MQ consumers. On the fourth delivery attempt, the message goes to a poison queue and an alert fires. Ahmad Patel reviews poison messages daily — they often reveal data quality issues in upstream systems.

Message Aggregation and Splitting

Two additional MQ patterns deserve attention for mainframe integration.

Message aggregation combines multiple small messages into a single larger message. If your COBOL program produces 10,000 individual transaction messages per minute, and the downstream system processes them in batches anyway, aggregation reduces MQ overhead. A COBOL aggregator program reads messages from an input queue, accumulates them until it reaches a count threshold (e.g., 100 messages) or a time threshold (e.g., 5 seconds), packages them into a single compound message, and puts the compound message on the output queue.

        AGGREGATE-MESSAGES.
            PERFORM UNTIL WS-AGG-COUNT >= WS-AGG-THRESHOLD
                OR WS-AGG-ELAPSED >= WS-TIME-THRESHOLD

      *         Use MQGMO-WAIT with a short wait interval so an
      *         empty queue does not spin this loop
                CALL 'MQGET' USING WS-HCONN
                                    WS-HOBJ-INPUT
                                    WS-MSG-DESC
                                    WS-GET-OPTIONS
                                    WS-MSG-LENGTH
                                    WS-MSG-BUFFER
                                    WS-COMPCODE
                                    WS-REASON

                IF WS-COMPCODE = MQCC-OK
                    PERFORM ADD-TO-AGGREGATE
                    ADD 1 TO WS-AGG-COUNT
                ELSE
      *             Wait expired with no message - flush early
                    MOVE WS-TIME-THRESHOLD TO WS-AGG-ELAPSED
                END-IF

      *         Refresh WS-AGG-ELAPSED so the time threshold fires
                PERFORM UPDATE-AGG-ELAPSED
            END-PERFORM

           IF WS-AGG-COUNT > 0
               PERFORM PUBLISH-AGGREGATE
           END-IF
           .

Message splitting does the reverse — breaks a large message into smaller ones. This is useful when a batch COBOL program produces a single large output (e.g., a complete account extract in one message) and the consumer needs individual records.
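For fixed-length records, a splitter is nearly trivial: the compound payload is just records laid end to end. A sketch (Python for illustration; the 13-byte record layout is invented):

```python
def split_compound(message: bytes, lrecl: int) -> list:
    """Split a compound message of fixed-length records into one
    message per record -- the inverse of the aggregator pattern.
    The payload must be an exact multiple of the record length."""
    if len(message) % lrecl != 0:
        raise ValueError("compound length is not a multiple of LRECL")
    return [message[i:i + lrecl] for i in range(0, len(message), lrecl)]

compound = b"POL0001 RENEWPOL0002 RENEW"   # two 13-byte records
for msg in split_compound(compound, lrecl=13):
    print(msg)   # each becomes one MQPUT to the notification queue
```

The length check matters: a truncated compound message should fail loudly here, not produce a garbled final record downstream.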

At SecureFirst Insurance, Yuki Tanaka uses both patterns. Claim status updates are aggregated (20 updates per compound message to reduce MQ overhead) before sending to the web portal. Policy renewal batch output is split into individual policy messages for the mobile notification system.

MQ Performance Considerations for COBOL Programs

MQ performance in COBOL programs depends on several factors that are easy to get wrong:

Syncpoint frequency. Every MQCMIT (syncpoint) is a synchronous write to the MQ log. If you commit after every message, you're limited by log I/O speed — typically 3,000-5,000 commits per second. Batch programs should commit every N messages (50-100 is typical) to amortize the log write cost. But don't batch too aggressively — a failure after 10,000 uncommitted messages means replaying all 10,000.
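The commit-batching trade-off is plain arithmetic, and it is worth running the numbers before picking a batch size (the rates below are the illustrative figures from the paragraph above):

```python
def max_msg_rate(commits_per_sec: int, batch_size: int) -> int:
    """Upper bound on throughput when each syncpoint is a
    synchronous log write: rate = commits/sec * messages/commit.
    The flip side: batch_size is also the worst-case replay
    after a failure."""
    return commits_per_sec * batch_size

print(max_msg_rate(4_000, 1))    # 4000: log-bound, one commit per message
print(max_msg_rate(4_000, 50))   # 200000: committing every 50 messages
```

Fifty-fold throughput for fifty messages of replay exposure is usually the right trade; ten thousand is usually not.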

Message persistence. Persistent messages survive queue manager restart but require log writes. Non-persistent messages are faster but are lost on restart. For integration data that can be regenerated (e.g., extracted from VSAM again), non-persistent may be acceptable. For transaction events that can't be reproduced, always use persistent.

Buffer sizing. The MQGET buffer must be large enough for the largest expected message. If it's too small, you get MQRC_TRUNCATED_MSG_FAILED (reason code 2080). If it's too large, you waste working storage. Size your buffers based on your message format specifications, not guesswork.


22.4 API-Based Integration

APIs are the newest integration pattern for mainframes, and they're also the most misunderstood. When architects say "we need to expose the mainframe via APIs," they often mean wildly different things. Let's be precise.

What API-Based Integration Actually Means on z/OS

API-based integration means exposing mainframe functionality as RESTful (or occasionally SOAP) web services that distributed systems can call synchronously over HTTP/HTTPS. The caller sends a request, waits, and receives a response. Real-time. Synchronous. Tightly coupled in time.

This is fundamentally different from file and message patterns. There's no intermediary store-and-forward mechanism. If the mainframe is down, the API call fails. If the mainframe is slow, the API call is slow. The caller's user experience is directly coupled to mainframe performance.

z/OS Connect Enterprise Edition

z/OS Connect EE is IBM's strategic product for exposing mainframe assets as RESTful APIs. It runs as a Liberty server on z/OS and provides:

  • Service discovery. Automatically generate API definitions from CICS programs, IMS transactions, VSAM files, and DB2 stored procedures.
  • Data transformation. Convert between COBOL copybook layouts and JSON automatically. This is the killer feature. You point z/OS Connect at a COBOL copybook, and it generates the JSON schema and the transformation logic.
  • API management. Rate limiting, authentication, logging, and monitoring built in.
  • Swagger/OpenAPI. Auto-generates OpenAPI 3.0 specifications that distributed developers can consume directly.

Here's what this looks like in practice. Yuki Tanaka at SecureFirst Insurance has a CICS program that retrieves policy details:

       IDENTIFICATION DIVISION.
       PROGRAM-ID. POLQRY01.

       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-POLICY-REQUEST.
           05  WS-POLICY-NUMBER     PIC X(12).
           05  WS-REQUEST-TYPE      PIC X(01).

       01  WS-POLICY-RESPONSE.
           05  WS-RETURN-CODE       PIC S9(04) COMP.
           05  WS-POLICY-DATA.
               10  WS-POL-STATUS    PIC X(02).
               10  WS-POL-TYPE      PIC X(04).
               10  WS-POL-EFF-DATE  PIC 9(08).
               10  WS-POL-EXP-DATE  PIC 9(08).
               10  WS-POL-PREMIUM   PIC S9(07)V99 COMP-3.
               10  WS-POL-HOLDER    PIC X(40).
               10  WS-POL-AGENT     PIC X(08).

       LINKAGE SECTION.
       01  DFHCOMMAREA.
           05  LS-COMM-REQUEST      PIC X(13).
           05  LS-COMM-RESPONSE     PIC X(75).

       PROCEDURE DIVISION.
           MOVE LS-COMM-REQUEST TO WS-POLICY-REQUEST
           PERFORM RETRIEVE-POLICY
           MOVE WS-POLICY-RESPONSE TO LS-COMM-RESPONSE
           EXEC CICS RETURN END-EXEC
           .

z/OS Connect wraps this CICS program as a REST API. A distributed developer calls:

GET /api/v1/policies/POL-00847291

And receives:

{
  "policyNumber": "POL-00847291",
  "status": "ACTIVE",
  "type": "AUTO",
  "effectiveDate": "2025-06-15",
  "expirationDate": "026-06-15",
  "premium": 1247.50,
  "policyHolder": "MARTINEZ, ELENA R",
  "agentId": "AGT-0042"
}

The distributed developer never knows — and doesn't need to know — that the data came from a COBOL program reading a VSAM file on z/OS. The transformation between COBOL copybook layout and JSON is handled entirely by z/OS Connect.

Carlos Rivera configured SecureFirst's z/OS Connect instance to handle 500 API calls per second with sub-200ms response times. That required careful tuning of the Liberty thread pool, CICS maxTask settings, and VSAM buffer allocation, but the result is an API that competes with any cloud-native service on response time.

API Design Considerations for Mainframe Data

Designing APIs that front mainframe data requires thinking differently than designing APIs for cloud-native systems:

Granularity matters more. Every API call hits the mainframe. If your API returns a customer summary and the caller always needs the address too, combine them. Two API calls means two CICS transactions, two VSAM reads, and double the CPU consumption. Mainframe CPU is metered — literally — and unnecessary API calls cost real money.

Pagination is essential. A query that returns 50,000 records as a single API response will blow out the Liberty heap, monopolize a CICS task, and probably time out. Design paginated APIs from the start.

GET /api/v1/accounts?customer_id=CUST-2847103&page=1&size=20

Caching changes the economics. If the same policy is queried 100 times per hour and the data changes once per day, cache it. Put a cache layer (Redis, API gateway cache) in front of z/OS Connect and reduce mainframe CPU consumption by 99%. Kwame's team at CNB reduced their API-driven MIPS consumption by 67% by adding a 60-second cache for account summary queries.
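A minimal cache-aside sketch of that economics, with a 60-second TTL. `fetch_from_mainframe` is a hypothetical stand-in for the real z/OS Connect call; the structure is the point.

```python
import time

class TTLCache:
    def __init__(self, ttl_seconds=60.0):
        self.ttl = ttl_seconds
        self.store = {}  # key -> (value, expiry timestamp)

    def get_or_fetch(self, key, fetch):
        hit = self.store.get(key)
        now = time.monotonic()
        if hit and hit[1] > now:
            return hit[0]        # fresh: no mainframe call, no MIPS
        value = fetch(key)       # miss or stale: one mainframe call
        self.store[key] = (value, now + self.ttl)
        return value

calls = 0
def fetch_from_mainframe(key):   # hypothetical backend call
    global calls
    calls += 1
    return {"account": key, "balance": 1247.50}

cache = TTLCache(ttl_seconds=60.0)
for _ in range(100):             # 100 queries inside the TTL window...
    cache.get_or_fetch("CUST-2847103", fetch_from_mainframe)
print(calls)                     # ...cost exactly 1 mainframe call
```

In production this layer lives in Redis or the API gateway rather than application memory, but the hit-ratio arithmetic is identical.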

Versioning prevents disasters. When you change the COBOL copybook, the JSON schema changes. If you don't version your APIs, every consumer breaks simultaneously. Use URL versioning (/api/v1/, /api/v2/) and maintain backward compatibility for at least two versions.

Error responses must be meaningful. A CICS RESP code of 13 means nothing to a Java developer. Your API must translate mainframe error conditions into standard HTTP status codes with descriptive error messages:

{
  "status": 404,
  "error": "POLICY_NOT_FOUND",
  "message": "No policy found with ID POL-99999999",
  "timestamp": "2026-03-15T14:23:07Z",
  "correlationId": "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
}
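A hedged Python sketch of that translation layer. The RESP values follow the common DFHRESP codes (13 = NOTFND, 19 = NOTOPEN) — verify against your CICS release — and the mapping table itself is illustrative, not complete.

```python
import uuid
from datetime import datetime, timezone

# Illustrative mapping of CICS RESP codes to HTTP status + error name.
RESP_TO_HTTP = {
    13: (404, "POLICY_NOT_FOUND"),     # DFHRESP(NOTFND)
    19: (503, "BACKEND_UNAVAILABLE"),  # DFHRESP(NOTOPEN)
}

def to_http_error(resp_code, detail):
    """Translate a CICS RESP code into the JSON error shape
    consumers expect, with a correlation ID for tracing."""
    status, error = RESP_TO_HTTP.get(resp_code, (500, "INTERNAL_ERROR"))
    return {
        "status": status,
        "error": error,
        "message": detail,
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "correlationId": str(uuid.uuid4()),
    }

body = to_http_error(13, "No policy found with ID POL-99999999")
print(body["status"], body["error"])  # 404 POLICY_NOT_FOUND
```

Unmapped codes deliberately fall through to a generic 500 — leaking raw mainframe reason codes to callers is exactly what this layer exists to prevent.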

CICS as API Provider vs. Batch as API Provider

Most z/OS Connect implementations front CICS programs because CICS provides the natural transaction execution environment — fast, stateless, and already optimized for high-throughput short-running requests. But batch programs can also serve as API backends through z/OS Connect's support for batch containers and IMS Connect.

The distinction matters for design. CICS-backed APIs should complete in milliseconds. If your CICS program needs to read a VSAM file, look up a customer, and return the result, that's a CICS sweet spot. If your program needs to scan 14 million records to find all accounts matching a complex criterion, that's a batch workload and should not be exposed as a synchronous API.

Yuki Tanaka at SecureFirst initially exposed a claim search function as a CICS-backed API. The function scanned a VSAM file sequentially to find claims matching a date range and status filter. For narrow searches (specific policy, recent date range), it returned in 200ms. For broad searches (all pending claims, six-month date range), it took 45 seconds and tied up a CICS task the entire time. Carlos Rivera redesigned it as a two-step process: the API submits a search request (immediate return), and a batch job processes the search asynchronously, writing results to a temporary dataset that a second API call retrieves. The user experience changed from "wait 45 seconds" to "submit search, check back in 2 minutes," which was actually preferred by the call center agents who could work other tasks while waiting.

When NOT to Use APIs

APIs are not the answer for everything. Do not use APIs when:

  • Volume is high. If you need to transfer 14 million records, use a file. An API processing 500 records per second takes 7.8 hours. A file transfer completes in minutes.
  • The consumer doesn't need real-time data. If yesterday's data is fine, a batch file is simpler, cheaper, and more reliable.
  • The mainframe must be the caller. A COBOL program can call outbound REST APIs (via z/OS Connect or CICS web services), but doing so introduces timeout management, retry logic, and error-handling complexity that message-based integration handles more gracefully.
  • Transaction boundaries span systems. If you need atomic commit across the mainframe and a distributed system, APIs don't give you two-phase commit. Use MQ with coordinated transaction management instead.

22.5 Data Format Challenges

If I had a dollar for every integration bug caused by character encoding, I could retire. Data format translation is the single most common source of integration failures, and COBOL systems are especially prone because they use data representations that the rest of the computing world abandoned decades ago.

EBCDIC, ASCII, and UTF-8

z/OS uses EBCDIC (Extended Binary Coded Decimal Interchange Code). Everything else uses ASCII or UTF-8. The same byte value means completely different things:

| Hex value | EBCDIC (CP037) | ASCII/Latin-1 |
|-----------|----------------|---------------|
| 0xC1      | A              | Á             |
| 0xF0      | 0              | ð             |
| 0x5B      | $              | [             |
| 0x4B      | .              | K             |

A COBOL PIC X(10) field containing "SMITH" is stored as hex E2 D4 C9 E3 C8 40 40 40 40 40 in EBCDIC. In ASCII, the same field is 53 4D 49 54 48 20 20 20 20 20. Transfer those EBCDIC bytes without conversion and the downstream system sees garbage where "SMITH" should be.
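The round trip is easy to demonstrate off-host: Python ships a `cp037` (US EBCDIC) codec, so the byte values can be checked directly.

```python
field = "SMITH".ljust(10)  # PIC X(10): space-padded to length 10

ebcdic = field.encode("cp037")   # EBCDIC bytes, space = 0x40
ascii_ = field.encode("ascii")   # ASCII bytes, space = 0x20

print(ebcdic.hex())  # e2d4c9e3c84040404040
print(ascii_.hex())  # 534d4954482020202020

# Reinterpreting the EBCDIC bytes as Latin-1 yields garbage:
print(ebcdic.decode("latin-1"))  # âÔÉãÈ@@@@@ — not SMITH
# Correct conversion decodes with the source code page first:
print(ebcdic.decode("cp037"))    # SMITH
```

The garbage line is exactly what lands in a downstream database when a file is transferred in binary mode and nobody runs the code page conversion.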

For simple alphabetic and numeric data, code page conversion tables handle the translation. But complications arise with:

Special characters. The cent sign, logical-not, and vertical bar map differently between EBCDIC code pages. EBCDIC code page 037 (US English) and code page 500 (International) disagree on the position of brackets, braces, and the exclamation point.

Packed decimal. PIC S9(7)V99 COMP-3 stores the value 12345.67 as hex 00 12 34 56 7C — five bytes, two digits per byte with a trailing sign nibble. There is no ASCII equivalent. You must convert to a string representation ("12345.67") or a binary numeric format before transfer.

Binary fields. PIC S9(9) COMP stores values in big-endian binary. x86 systems use little-endian. A binary value of 1000 is stored as 00 00 03 E8 on z/OS and E8 03 00 00 on x86. Transfer without conversion and your 1,000 becomes 3,892,510,720.
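Both hazards can be sketched in a few lines of Python. The COMP-3 decoder assumes the standard packed layout — digit nibbles followed by a sign nibble (0xD = negative) in the last byte.

```python
def unpack_comp3(data: bytes, scale: int = 0):
    """Decode COMP-3 packed decimal: two digit nibbles per byte,
    sign in the low nibble of the last byte (0xD = negative)."""
    nibbles = []
    for b in data:
        nibbles.append(b >> 4)
        nibbles.append(b & 0xF)
    sign_nibble = nibbles.pop()  # last nibble is the sign
    value = int("".join(map(str, nibbles)))
    if sign_nibble == 0xD:
        value = -value
    return value / 10 ** scale if scale else value

# PIC S9(7)V99 COMP-3 holding 12345.67 -> 00 12 34 56 7C
print(unpack_comp3(bytes.fromhex("001234567C"), scale=2))  # 12345.67

# PIC S9(9) COMP holding 1000 -> 00 00 03 E8, big-endian
raw = bytes.fromhex("000003E8")
print(int.from_bytes(raw, "big"))     # 1000 — correct interpretation
print(int.from_bytes(raw, "little"))  # 3892510720 — the endian mix-up
```

The last line is the "your 1,000 becomes 3,892,510,720" failure mode: same four bytes, wrong byte order.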

Date Format Translation

COBOL dates are a special circle of integration hell. Common formats you'll encounter in production systems:

  • PIC 9(8) — YYYYMMDD (e.g., 20260315)
  • PIC 9(7) — YYYYDDD (Julian date, e.g., 2026074)
  • PIC 9(6) — YYMMDD (Y2K-era code that somehow still exists)
  • PIC S9(7) COMP-3 — Packed date in various formats
  • PIC 9(5) COMP — Days since some epoch (every shop picks a different epoch)

Downstream systems typically want ISO 8601: 2026-03-15 or 2026-03-15T14:23:07Z. Your transformation code must handle all the formats your source programs produce, including edge cases:

  • Zeros in date fields (what does 00000000 mean — null? Beginning of time? A bug?)
  • Invalid dates (20260230 — February 30 doesn't exist, but it's in your database)
  • Future dates that might be placeholders (99991231 is commonly used as "no end date")
       CONVERT-DATE-TO-ISO.
           EVALUATE TRUE
               WHEN WS-INPUT-DATE = ZEROS
                   MOVE SPACES TO WS-ISO-DATE
               WHEN WS-INPUT-DATE = 99991231
                   MOVE '9999-12-31' TO WS-ISO-DATE
               WHEN OTHER
                   MOVE WS-INPUT-DATE(1:4) TO WS-ISO-YEAR
                   MOVE '-'                 TO WS-ISO-SEP1
                   MOVE WS-INPUT-DATE(5:2) TO WS-ISO-MONTH
                   MOVE '-'                 TO WS-ISO-SEP2
                   MOVE WS-INPUT-DATE(7:2) TO WS-ISO-DAY
                   STRING WS-ISO-YEAR WS-ISO-SEP1
                          WS-ISO-MONTH WS-ISO-SEP2
                          WS-ISO-DAY
                       DELIMITED BY SIZE
                       INTO WS-ISO-DATE
                   END-STRING
           END-EVALUATE
           .
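A cross-language sketch of the same normalization, extended to the Julian (YYYYDDD) format. The sentinel conventions (zeros = empty, 99991231 = open-ended) follow the text; the function names are hypothetical, and everything else is validated rather than passed through.

```python
from datetime import datetime

def yyyymmdd_to_iso(raw: str) -> str:
    if raw == "00000000":
        return ""            # sentinel: no date present
    if raw == "99991231":
        return "9999-12-31"  # common "no end date" placeholder
    try:
        return datetime.strptime(raw, "%Y%m%d").date().isoformat()
    except ValueError:
        # e.g. 20260230 — February 30 doesn't exist, but it's in the data
        raise ValueError(f"invalid YYYYMMDD date: {raw}")

def julian_to_iso(raw: str) -> str:
    # YYYYDDD: %j is day-of-year 001-366
    return datetime.strptime(raw, "%Y%j").date().isoformat()

print(yyyymmdd_to_iso("20260315"))  # 2026-03-15
print(julian_to_iso("2026074"))     # 2026-03-15 (day 74 of 2026)
```

Unlike the STRING-based COBOL version, this rejects impossible dates instead of emitting them — whether that's an error or a quarantine record is a design decision per feed.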

The Canonical Transformation Program

Every mainframe shop needs a standard transformation utility. At Federal Benefits, Marcus Johnson built XFORMUTIL — a reusable COBOL program that handles the twelve most common data format conversions:

  1. EBCDIC to ASCII
  2. ASCII to EBCDIC
  3. Packed decimal to display numeric string
  4. Display numeric to packed decimal
  5. Binary (COMP) to display numeric string
  6. COBOL date (YYYYMMDD) to ISO 8601
  7. Julian date (YYYYDDD) to ISO 8601
  8. Zoned decimal to display numeric string
  9. Fixed-length record to delimited (CSV)
  10. Delimited to fixed-length record
  11. EBCDIC to UTF-8 (with extended character handling)
  12. UTF-8 to EBCDIC (with unmappable character substitution)

XFORMUTIL is called as a subprogram with parameters specifying the conversion type, input buffer, output buffer, and field definitions. It processes one field at a time, which is slower than a bulk approach but infinitely more maintainable. Marcus estimates XFORMUTIL is called 200 million times per day across Federal Benefits' integration feeds.

The Signed Field Problem

Signed numeric fields in COBOL deserve special attention because they're a constant source of integration bugs. In EBCDIC, a signed PIC S9(5) display field stores the sign in the zone nibble of the last byte. The value +12345 is stored as F1 F2 F3 F4 C5 — notice the C in the last byte's zone, indicating positive. The value -12345 is stored as F1 F2 F3 F4 D5 — the D zone indicates negative.

If you transfer this field to an ASCII system without proper conversion, the last byte maps to a letter, not a digit. The ASCII system sees "1234E" (positive) or "1234N" (negative). Many integration failures trace to this exact problem — a batch of financial records where every amount field ends with a letter instead of a digit.

The fix is explicit conversion before transfer:

       CONVERT-SIGNED-TO-DISPLAY.
           IF WS-SIGNED-AMOUNT < 0
               COMPUTE WS-ABS-AMOUNT =
                   WS-SIGNED-AMOUNT * -1
               MOVE '-' TO WS-DISPLAY-SIGN
           ELSE
               MOVE WS-SIGNED-AMOUNT TO WS-ABS-AMOUNT
               MOVE '+' TO WS-DISPLAY-SIGN
           END-IF

           MOVE WS-ABS-AMOUNT TO WS-DISPLAY-AMOUNT
           STRING WS-DISPLAY-SIGN
                  WS-DISPLAY-AMOUNT
               DELIMITED BY SIZE
               INTO WS-OUTPUT-FIELD
           END-STRING
           .
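A receiving system can also repair a zoned field if it knows the convention: digit in the low nibble, sign in the zone nibble of the last byte (C or F positive, D negative). A decoding sketch in Python:

```python
def decode_zoned(data: bytes) -> int:
    """Decode an EBCDIC zoned-decimal field: low nibbles are digits,
    the zone nibble of the last byte carries the sign (0xD = negative)."""
    digits = "".join(str(b & 0x0F) for b in data)
    sign = -1 if (data[-1] >> 4) == 0xD else 1
    return sign * int(digits)

print(decode_zoned(bytes.fromhex("F1F2F3F4C5")))  # 12345
print(decode_zoned(bytes.fromhex("F1F2F3F4D5")))  # -12345
```

This is a fallback, not a substitute for the explicit sender-side conversion above — the receiver shouldn't have to know EBCDIC internals.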

Code Page Complications

Even within EBCDIC, there are dozens of code pages. Code page 037 (US/Canada EBCDIC) and code page 500 (International EBCDIC) disagree on the positions of brackets, braces, exclamation points, and several other special characters. If your integration file contains JSON (which uses brackets and braces extensively) and the sending system uses CP-037 while the receiving system expects CP-500, every bracket and brace is wrong.

The solution is to always specify and document the code page in your integration specifications. CNB standardized on CP-037 for all mainframe-originated files and includes the code page identifier in the file header control record. When a new consumer connects, there's no ambiguity about which code page to expect.

For z/OS Connect and API-based integration, this is handled automatically — z/OS Connect converts to UTF-8 for all outbound JSON responses. But for file-based and message-based integration, code page management is your responsibility.


22.6 Change Data Capture and Replication

File-based feeds send complete snapshots — every record, every time. For a 14-million-record account file, that's wasteful when only 200,000 accounts changed today. Change Data Capture (CDC) solves this by identifying and transmitting only the changes.

Spaced Review: Event-Driven Patterns (Chapter 20)

CDC is conceptually an event-driven pattern. Each data change generates an event — a record was inserted, updated, or deleted. Chapter 20 discussed event-driven architecture broadly. CDC applies those concepts specifically to database and file changes. The distinction between event notification (telling consumers something changed) and event-carried state transfer (sending the changed data itself) is particularly relevant here. Most CDC implementations use event-carried state transfer because consumers need the actual new values, not just a notification.

CDC Approaches for z/OS

Log-Based CDC

The most reliable CDC approach reads the database log to identify changes. For DB2 on z/OS, this means reading the DB2 recovery log (active log and archive logs). IBM InfoSphere Data Replication (formerly CDC Replication) and similar products tap the DB2 log stream, identify INSERT, UPDATE, and DELETE operations, and forward them to target systems.

Log-based CDC advantages:

  • No application changes. The COBOL programs are untouched. CDC operates at the database level.
  • No performance impact. Reading the log doesn't add overhead to the application itself.
  • Complete capture. Every change is captured, including changes made by utilities, ad hoc SQL, and non-COBOL programs.

Log-based CDC challenges:

  • Product licensing. Enterprise CDC products are expensive.
  • Log volume. High-change-volume tables generate massive log streams.
  • Schema awareness. The CDC tool must understand the DB2 table schema to interpret log records.

Trigger-Based CDC

DB2 triggers fire on INSERT, UPDATE, and DELETE operations and can write change records to a CDC tracking table:

CREATE TRIGGER ACCT_CDC_INSERT
  AFTER INSERT ON ACCOUNT
  REFERENCING NEW AS N
  FOR EACH ROW
  INSERT INTO ACCOUNT_CDC
    (CDC_SEQ, CDC_TIMESTAMP, CDC_OPERATION,
     ACCT_NUMBER, ACCT_STATUS, ACCT_BALANCE,
     ACCT_LAST_ACTIVITY)
  VALUES
    (NEXT VALUE FOR CDC_SEQ_GEN,
     CURRENT TIMESTAMP, 'I',
     N.ACCT_NUMBER, N.ACCT_STATUS, N.ACCT_BALANCE,
     N.ACCT_LAST_ACTIVITY);

Trigger-based CDC is simpler and requires no additional products, but it adds overhead to every write operation and requires triggers on every monitored table. For high-volume tables with thousands of updates per second, the trigger overhead is unacceptable.

Application-Level CDC

The COBOL program itself writes change records as part of its processing. This is the most common approach in shops that can't justify CDC product licensing:

       WRITE-CDC-RECORD.
           MOVE WS-ACCOUNT-NUMBER  TO CDC-ACCT-NUMBER
           MOVE WS-OPERATION-TYPE  TO CDC-OPERATION
           MOVE WS-CURRENT-TS      TO CDC-TIMESTAMP
           MOVE WS-BEFORE-IMAGE    TO CDC-BEFORE-DATA
           MOVE WS-AFTER-IMAGE     TO CDC-AFTER-DATA
           WRITE CDC-RECORD
           ADD 1 TO WS-CDC-COUNT
           .

Application-level CDC requires modifying every program that updates the source data, which is labor-intensive and error-prone. Miss one program and you have an incomplete change stream. But for VSAM-based systems where log-based CDC isn't an option, it may be the only practical approach.
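The shape of the change record matters as much as the capture point. A Python sketch of the event an application-level CDC writer emits — operation, timestamp-free for brevity, before/after images, and a monotonically increasing sequence so consumers can apply changes in order. Field names are illustrative.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Optional

_seq = count(1)  # monotonic sequence shared by all writers in this process

@dataclass
class CdcEvent:
    acct_number: str
    operation: str                # 'I' insert, 'U' update, 'D' delete
    before_image: Optional[dict]  # None for inserts
    after_image: Optional[dict]   # None for deletes
    seq: int = field(default_factory=lambda: next(_seq))

e1 = CdcEvent("12345", "U", {"balance": 1000}, {"balance": 1500})
e2 = CdcEvent("12345", "U", {"balance": 1500}, {"balance": 2000})
print(e2.seq - e1.seq)  # 1 — strictly increasing, so order is recoverable
```

In the COBOL version the sequence comes from a shared counter dataset or a DB2 sequence; a per-program counter is not enough when multiple programs update the same file.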

Replication Topologies

CDC captured changes need to flow somewhere. Common topologies:

Unidirectional replication. Mainframe → target. The mainframe is the master; the target is a read-only replica. This is the most common pattern for data warehouse feeds and read replicas.

Bidirectional replication. Changes flow in both directions. This is complex and dangerous — conflict resolution is required when both sides change the same record. Avoid this unless you have a clear conflict resolution strategy and the tooling to enforce it.

Fan-out replication. One source, many targets. CDC captures changes once and distributes to N consumers. This is the ideal pattern for the N-system problem — capture once, deliver many times.

At CNB, Kwame's team uses log-based CDC for DB2 tables and application-level CDC for VSAM files. The CDC changes flow through MQ topics (pub/sub) to five downstream systems. Adding a sixth consumer last quarter required zero changes to the CDC capture — they simply subscribed a new consumer to the existing topics.

CDC and the Data Warehouse: Full Load vs. Incremental

The classic data warehouse design question is: should you do full loads or incremental loads? CDC enables incremental, but the answer is rarely "one or the other" — it's usually both.

Full loads are simpler, more reliable, and self-correcting. If yesterday's incremental load missed three records, today's full load catches up automatically. But full loads are expensive — 14 million records every night when only 200,000 changed.

Incremental loads are efficient but fragile. A missed CDC event, a late-arriving transaction, or a direct database update that bypasses the CDC mechanism creates a silent discrepancy that accumulates over time.

The industry best practice — and what CNB, Federal Benefits, Pinnacle Health, and SecureFirst all use — is incremental with periodic full reconciliation. Daily CDC events keep the downstream system current in near-real-time. A weekly (or nightly) full extract feeds a reconciliation process that detects and corrects any drift. The incremental feed handles 99.9% of the data freshness requirement; the reconciliation handles the 0.1% that slips through.

Sandra Chen at Federal Benefits quantifies this: her CDC feed captures 99.97% of all changes. The remaining 0.03% — typically changes made by utility programs and ad hoc SQL that bypass the application CDC layer — are caught by the nightly reconciliation. Without reconciliation, those missed changes would accumulate at a rate of approximately 50 records per day, reaching potentially thousands of discrepancies per quarter.

CDC Ordering and Consistency

CDC events must be applied in order. If account 12345's balance changes from $1,000 to $1,500 and then from $1,500 to $2,000, applying these changes out of order produces incorrect intermediate states. For DB2 log-based CDC, ordering is guaranteed by the log sequence number. For application-level CDC, your programs must write a monotonically increasing sequence number with each change record.

Cross-table consistency is harder. If a transfer debits account A and credits account B, the CDC stream contains two events. If the consumer processes the debit before the credit, the intermediate state shows money that has left account A but hasn't arrived at account B. For data warehouse purposes, this temporary inconsistency is usually acceptable (the next event resolves it). For a real-time balance display, it's not — users will see their money "disappear" briefly.

The solution for real-time consistency is transaction-level CDC: grouping all changes within a single transaction into a single CDC event or a linked set of events. DB2 log-based CDC provides this naturally (the log records include transaction boundaries). Application-level CDC requires explicit logic to group related changes.
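A sketch of that grouping: buffer events until the transaction's commit marker, then deliver the whole transaction as one unit, so a consumer never sees the debit without its matching credit. The event shape is illustrative.

```python
def group_by_transaction(events):
    """events: (txn_id, payload) pairs in log order; a 'COMMIT'
    payload closes a transaction. Yields one list of payloads per
    committed transaction; uncommitted work is never delivered."""
    open_txns = {}
    for txn_id, payload in events:
        if payload == "COMMIT":
            yield open_txns.pop(txn_id, [])
        else:
            open_txns.setdefault(txn_id, []).append(payload)

log = [
    ("T1", "DEBIT A -500"),
    ("T1", "CREDIT B +500"),
    ("T1", "COMMIT"),
]
print(list(group_by_transaction(log)))
# [['DEBIT A -500', 'CREDIT B +500']] — both legs delivered together
```

DB2 log-based CDC gets this for free from log transaction boundaries; application-level CDC has to write the equivalent of the COMMIT marker itself.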


22.7 Integration Pattern Selection Framework

After twenty-five years of building integrations, I can tell you that pattern selection is where most teams go wrong. They pick their favorite pattern (usually APIs, because that's what the architects learned at the last conference) and apply it everywhere. Here's a decision framework that actually works.

The Five Questions

Before selecting an integration pattern, answer these questions:

1. What is the latency requirement?
  • Real-time (sub-second): API
  • Near-real-time (seconds to minutes): Message
  • Batch (hours): File

2. What is the volume?
  • Low (< 1,000 records/hour): API or Message
  • Medium (1,000 - 100,000 records/hour): Message
  • High (> 100,000 records/hour): File or CDC + Message

3. What happens when the consumer is down?
  • Must not lose data: File or Message (persistent)
  • Can retry: API with retry logic
  • Can tolerate loss: Any pattern (but question why you're integrating)

4. Who initiates the exchange?
  • Consumer pulls when ready: API or File
  • Producer pushes when available: Message or File
  • Changes drive the exchange: CDC

5. Does the exchange require a response?
  • Synchronous request-reply: API
  • Fire-and-forget: File or Message
  • Asynchronous request-reply: Message (correlation ID)
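The first two questions can be encoded as a first-pass selector. `suggest_pattern` is a hypothetical helper and the thresholds simply mirror the lists above — real selection still needs the remaining three questions and human judgment.

```python
def suggest_pattern(latency: str, records_per_hour: int) -> str:
    """First-pass pattern suggestion from latency need and volume."""
    if latency == "real-time":
        return "API"                        # sub-second, synchronous
    if records_per_hour > 100_000:
        return "File (or CDC + Message)"    # high volume favors bulk
    if latency == "near-real-time":
        return "Message"                    # seconds-to-minutes, decoupled
    return "File"                           # batch timing, simplest option

print(suggest_pattern("real-time", 100))         # API
print(suggest_pattern("batch", 14_000_000))      # File (or CDC + Message)
print(suggest_pattern("near-real-time", 5_000))  # Message
```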

The Decision Matrix

| Scenario | Pattern | Rationale |
|----------|---------|-----------|
| Nightly data warehouse load | File (Connect:Direct + GDG) | High volume, batch timing, no real-time need |
| Account balance inquiry from web portal | API (z/OS Connect) | Low latency, small payload, synchronous need |
| Transaction notification to fraud system | Message (MQ pub/sub) | Near-real-time, decoupled, multiple consumers |
| Regulatory reporting extract | File (GDG + SFTP) | Scheduled, high volume, audit trail needed |
| Customer address update from mobile app | API (z/OS Connect) | Real-time, small update, needs immediate confirmation |
| Account changes to five downstream systems | CDC + Message (MQ topics) | Incremental changes, multiple consumers, near-real-time |
| Large batch of payment instructions | File (Connect:Direct) | High volume, needs checkpoint/restart |
| Real-time policy lookup from call center | API (z/OS Connect) | Sub-second, synchronous, small payload |

Hybrid Patterns

Real-world integrations often combine patterns. The most common hybrid is the "near-real-time with batch reconciliation" pattern:

  1. Primary: CDC captures changes and publishes via MQ in near-real-time
  2. Reconciliation: A nightly batch file contains the complete current state
  3. Comparison: The downstream system compares its accumulated real-time changes against the batch file and resolves any discrepancies

This hybrid provides the low latency of CDC/messaging with the reliability and auditability of file-based integration. CNB uses this pattern for their web banking platform. Real-time account updates flow via MQ throughout the day, and a nightly batch reconciliation ensures the web banking database matches the mainframe's VSAM files exactly.

Sandra Chen at Federal Benefits uses a different hybrid: API for real-time eligibility inquiries during the day, and file-based batch feeds for bulk reporting at night. The same data, two different access patterns, two different integration mechanisms.

Anti-Patterns to Avoid

The "API Everything" anti-pattern. Some architects want to expose every mainframe function as an API. This leads to excessive mainframe CPU consumption (remember, MIPs cost money), tight coupling between all systems, and cascading failures when the mainframe has a slow day.

The "Giant File" anti-pattern. Combining unrelated data into a single massive extract file because "it's easier to send one file." This creates a brittle integration where a problem with one data domain blocks all domains.

The "Synchronous Chain" anti-pattern. System A calls System B's API, which calls System C's API, which calls the mainframe API. Latency compounds. Failure probability compounds. Debug difficulty compounds exponentially.

The "No Schema" anti-pattern. Sending data without a documented, versioned schema. "Just parse the pipe-delimited file" works until someone adds a field and every consumer breaks.

The "Ignore the Mainframe Calendar" anti-pattern. Scheduling API-heavy integrations during the batch window. The mainframe's batch cycle is sacred. Don't compete with it for CPU.

The "Fire and Forget Without Verification" anti-pattern. Sending data without confirming receipt. You put the file on the server, the downstream team says they'll "pick it up when they're ready," and nobody verifies that processing actually happened. Weeks later, you discover the file was never loaded. Always implement acknowledgment mechanisms — whether that's the dual-GDG pattern, an MQ acknowledgment message, or an API callback.

Pattern Evolution Over Time

Integration patterns evolve as requirements change. A common progression at CNB:

  1. Year 1: New downstream system. Simple file extract, nightly batch, FTP. Quick to implement, meets the immediate need.
  2. Year 2: Business wants faster data. Add CDC via MQ for near-real-time updates. Keep the nightly file for reconciliation.
  3. Year 3: External partner needs real-time inquiry. Add API via z/OS Connect for the specific functions they need.
  4. Year 5: Multiple consumers need the same data. Refactor from point-to-point to canonical model with pub/sub distribution.

This evolution is normal and healthy. Don't try to build the final-state architecture on day one. Start with the simplest pattern that meets current requirements, and evolve as needs change. The critical discipline is ensuring each evolution preserves the reliability of existing integrations.

The Integration Inventory

Every mainframe shop should maintain an integration inventory — a documented catalog of every data flow into and out of the z/OS environment. For each integration, document:

  • Source system and target system
  • Data content (what entities and fields are exchanged)
  • Integration pattern (file, message, API)
  • Schedule (when does the exchange occur)
  • Volume (how many records/messages/calls per cycle)
  • SLA (when must data arrive, what's the maximum acceptable latency)
  • Owner (who is responsible for each side)
  • Monitoring (how do you know it's working, who gets alerted on failure)
  • Recovery procedure (what to do when it breaks)

Lisa Park maintains CNB's integration inventory as a DB2 table with a CICS inquiry transaction for real-time lookup. When the on-call team gets a 3 AM alert about a failed integration, they query the inventory to find the owner, the recovery procedure, and the escalation path. Before the inventory existed, night-shift operators spent an average of 45 minutes figuring out who to call. Now it takes 2 minutes.


22.8 HA Banking System: Data Integration Layer

Time to apply everything we've discussed to the progressive project. Your HA Banking Transaction Processing System needs a data integration layer that feeds five downstream systems while maintaining the mainframe as the system of record.

Integration Requirements

The HA Banking System must integrate with:

  1. Online Banking Portal — Real-time account inquiry and transaction history (API)
  2. Mobile Banking App — Push notifications for transactions, real-time balance (API + Message)
  3. Data Warehouse — Daily full extract of all accounts and transactions (File)
  4. Fraud Detection Engine — Real-time transaction feed for scoring (Message/CDC)
  5. Regulatory Reporting — Monthly and quarterly extracts in prescribed formats (File)

Architecture Design

                    ┌──────────────┐
                    │  Online      │◄─── REST API (z/OS Connect)
                    │  Banking     │
                    └──────────────┘
                    ┌──────────────┐
                    │  Mobile      │◄─── REST API + MQ Pub/Sub
                    │  Banking     │
                    └──────────────┘
┌─────────┐        ┌──────────────┐
│  COBOL  │───────►│  Data        │◄─── Connect:Direct (GDG files)
│  System │  CDC   │  Warehouse   │
│  of     │───┐    └──────────────┘
│  Record │   │    ┌──────────────┐
│         │   └───►│  Fraud       │◄─── MQ Topics (CDC events)
└─────────┘        │  Detection   │
                    └──────────────┘
                    ┌──────────────┐
                    │  Regulatory  │◄─── Batch File (GDG + SFTP)
                    │  Reporting   │
                    └──────────────┘

Project Checkpoint: What to Build

For your HA Banking System, implement:

  1. A GDG-based extract process — JCL that creates a daily account extract with control records, record counts, and hash totals. Use GDG with LIMIT(30) for 30 days of history.

  2. An MQ-based CDC publisher — COBOL program that reads the transaction CDC log and publishes change events to an MQ topic (HABANK/TRANSACTIONS/CHANGES). Include message format transformation from COBOL copybook layout to the canonical JSON model.

  3. An API service definition — Design the z/OS Connect service definition for account inquiry, including request/response copybooks and the JSON mapping.

  4. A format conversion module — Reusable COBOL subprogram that converts between internal COBOL data formats and the canonical model. Handle packed decimal, dates, and EBCDIC/ASCII conversion.

  5. A reconciliation batch job — JCL and COBOL program that compares the current VSAM file state against the data warehouse's last received extract and reports discrepancies.

This checkpoint builds the integration foundation. Subsequent chapters will add monitoring, error handling, and failover capabilities to this integration layer.

Design Decisions for Your Implementation

Document your choices for the following questions:

  • Will you use a canonical data model or point-to-point transformations? (Hint: with five consumers, canonical pays for itself immediately.)
  • What is your CDC strategy for VSAM files? (Application-level CDC is your most practical option unless you have a CDC product.)
  • How will you handle API caching to minimize MIPS? (Consider what data changes frequently versus what's relatively static.)
  • What is your reconciliation window? (Daily is standard; how will you handle discrepancies discovered during reconciliation?)
  • How will you version your canonical model? (The schema will evolve. Plan for it now.)

Chapter Summary

Data integration is where mainframe COBOL systems meet the rest of the enterprise. The three fundamental patterns — file-based, message-based, and API-based — each serve different needs, and most real-world architectures use all three.

File-based integration remains the workhorse for high-volume, batch-oriented data movement. Connect:Direct with GDG-based handoff patterns provides the reliability, recoverability, and auditability that enterprise systems demand. Don't let anyone tell you files are "legacy" — they're fit for purpose.

Message-based integration fills the gap between batch files and real-time APIs. MQ pub/sub, content-based routing, and message transformation enable scalable, decoupled integration that handles the N-system problem gracefully. Idempotent consumers and poison message handling are not optional — they're essential.
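The two "not optional" behaviors can be sketched in a few lines. This is a language-neutral illustration in Python, not MQ API code: in a real MQ consumer the message ID and backout count come from the message descriptor, and the threshold would be the queue's backout threshold, but the logic is the same:

```python
# Hedged sketch of the idempotent-consumer and poison-message patterns.
# The in-memory dedupe store and the retry threshold of 3 are illustrative
# stand-ins for what MQ provides (message ID, backout count, backout queue).
POISON_THRESHOLD = 3

class Consumer:
    def __init__(self):
        self.seen = set()        # processed message IDs (dedupe store)
        self.retries = {}        # message ID -> failed delivery count
        self.poison = []         # messages parked on the "backout queue"
        self.applied = []        # business effects actually applied

    def handle(self, msg_id, payload, apply_fn):
        if msg_id in self.seen:              # redelivery: already applied
            return "duplicate"
        try:
            apply_fn(payload)
        except Exception:
            n = self.retries.get(msg_id, 0) + 1
            self.retries[msg_id] = n
            if n >= POISON_THRESHOLD:        # stop retrying, park it
                self.poison.append(msg_id)
                return "poison"
            return "retry"
        self.seen.add(msg_id)                # record only after success
        self.applied.append(payload)
        return "applied"

c = Consumer()
def ok(p): pass
def bad(p): raise ValueError("malformed payload")

print(c.handle("M1", 100, ok))       # applied
print(c.handle("M1", 100, ok))       # duplicate (redelivery is a no-op)
for _ in range(3):
    print(c.handle("M2", 200, bad))  # retry, retry, poison
```

Note the ordering: the message ID is recorded only *after* the business effect succeeds, so a crash between the two produces a harmless duplicate rather than a lost update.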

API-based integration provides real-time access to mainframe data and functions. z/OS Connect transforms COBOL copybook layouts into JSON APIs that distributed developers consume without knowing the mainframe exists. But APIs are not appropriate for high-volume, batch-oriented workloads — know when to use them and when to use files.

Data format challenges — EBCDIC/ASCII conversion, packed decimal translation, date format normalization — cause more integration bugs than any other single factor. A canonical data model and standardized transformation utilities pay for themselves quickly.
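To see why these conversions bite, here is a distributed-side sketch in Python of the two most common ones. The `cp037` codec is Python's US EBCDIC code page; the COMP-3 field layout (`PIC S9(5)V99`) is an assumption chosen for the example:

```python
# Illustrative sketch of the two conversions named above: EBCDIC-to-ASCII
# text via the cp037 codec, and COMP-3 packed-decimal decoding (two digits
# per byte, sign in the low nibble of the last byte: 0xD negative,
# 0xC or 0xF positive).

def unpack_comp3(raw: bytes, scale: int = 2):
    """Decode a COMP-3 field into a signed number with `scale` decimals."""
    digits = ""
    for b in raw[:-1]:
        digits += f"{b >> 4}{b & 0x0F}"      # two packed digits per byte
    last = raw[-1]
    digits += str(last >> 4)                 # final digit shares the sign byte
    sign = -1 if (last & 0x0F) == 0x0D else 1
    return sign * int(digits) / (10 ** scale)

# EBCDIC text field -> ASCII (cp037 is US EBCDIC)
name = bytes([0xC1, 0xC3, 0xC3, 0xE3]).decode("cp037")
print(name)                                  # ACCT

# Assumed PIC S9(5)V99 COMP-3 value -1,234.56 packed as X'0123456D'
print(unpack_comp3(b"\x01\x23\x45\x6D"))     # -1234.56
```

The classic bug is applying the codec conversion to the *whole* record: it correctly translates the text fields and destroys every packed field in the same pass, which is exactly why field-level transformation utilities matter.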

Change Data Capture enables incremental data movement, solving the efficiency problem of full-file extracts. Whether log-based, trigger-based, or application-level, CDC transforms the economics of keeping downstream systems current.
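For application-level CDC, the updating program itself emits a change event alongside the business update. A minimal Python sketch of the event-building step; the event shape is hypothetical, and the topic name reuses the checkpoint's `HABANK/TRANSACTIONS/CHANGES`:

```python
# Sketch of application-level CDC: build a canonical-model change event
# from before/after images of a record. The JSON shape, field names, and
# fixed timestamp are illustrative assumptions.
import json

def make_change_event(op, key, before, after):
    """For UPDATEs, carry only the fields that actually changed."""
    changed = None
    if op == "UPDATE":
        changed = {f: after[f] for f in after if before.get(f) != after[f]}
    return json.dumps({
        "topic": "HABANK/TRANSACTIONS/CHANGES",
        "operation": op,
        "key": key,
        "changes": changed if op == "UPDATE" else after,
        "capturedAt": "2024-01-15T02:00:00Z",   # fixed for the example
    }, sort_keys=True)

before = {"balance": 1000.00, "status": "OPEN"}
after  = {"balance": 1250.00, "status": "OPEN"}
evt = make_change_event("UPDATE", "ACCT000001", before, after)
print(evt)
```

Carrying only changed fields keeps event volume proportional to actual change, which is the whole economic argument for CDC over full-file extracts.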

The pattern selection framework — latency, volume, failure behavior, initiation, and response needs — guides you to the right pattern for each integration. Most architectures are hybrids, combining real-time and batch patterns with reconciliation to ensure consistency.
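As a toy encoding of part of that framework, the first two questions plus the response-needed question already separate the three patterns in most cases. The thresholds here are illustrative defaults, not rules from the text, and they deliberately ignore the failure-behavior and initiation questions:

```python
# Toy decision helper for the five-question framework; covers only latency,
# volume, and response needs. Thresholds (1 second, 1 hour, 1M records/day)
# are invented defaults for illustration.
def select_pattern(latency_s, records_per_day, needs_response):
    if needs_response or latency_s < 1:
        return "API"        # caller waits for an answer in real time
    if latency_s < 3600 and records_per_day < 1_000_000:
        return "Message"    # near-real-time events, decoupled consumers
    return "File"           # bulk movement inside a batch window

print(select_pattern(0.2, 10_000, True))         # API
print(select_pattern(60, 50_000, False))         # Message
print(select_pattern(86_400, 5_000_000, False))  # File
```

Treat the output as a starting point for the design conversation, not a verdict; the remaining two questions often override it.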

Your HA Banking System's integration layer combines all three patterns: APIs for real-time inquiry, MQ for near-real-time event distribution, and files for bulk extracts and regulatory reporting. This is not unusual — it's how every major mainframe installation operates.

The skill that separates a competent mainframe developer from an excellent one is knowing which pattern to reach for in each situation. The five-question framework gives you a starting point, but experience refines your judgment. After building fifty integrations, you'll develop an intuition for pattern selection that no framework can fully capture. Until then, use the framework, document your decisions, and learn from every failure — because in integration, failures are your best teacher.


Next Chapter Preview: Chapter 23 takes us into the monitoring and observability side of integration — how you detect, diagnose, and recover from integration failures before they become business incidents. We'll build the monitoring layer for the integration architecture designed in this chapter, including proactive alerting, SLA dashboards, and the automated recovery procedures that keep your integrations running through the night without paging the on-call engineer.


"The best integration is the one nobody notices. You only hear about the ones that break." — Lisa Park, Senior Systems Programmer, Central National Bank