

Chapter 19: IBM MQ for COBOL — Queue Management, Message Patterns, and Transactional Messaging

🚪 GATEWAY CHAPTER — Part IV: Messaging and Integration This chapter opens the messaging section. Everything that follows — event-driven architectures, API gateways, cloud integration — builds on the MQ fundamentals you learn here.


19.1 Why Messaging — The Architecture Pattern That Changed Enterprise Computing

Kwame Mensah remembers the week Continental National Bank almost drowned in its own success.

It was Q4 2019. CNB had just acquired a regional bank with 1.2 million consumer accounts. The migration plan called for a 72-hour cutover weekend — take both cores offline, merge the databases, bring up the unified system. Standard playbook. Kwame had done three acquisitions before.

Except this time, the acquiring bank's real-time fraud detection system couldn't tolerate 72 hours of downtime. Neither could the new mobile banking platform that marketing had launched six months early "to capture millennial market share." And neither could the Federal Reserve's FedNow interface, which had gone live three months prior and expected CNB to settle payments within seconds, not hours.

"We had 47 systems connected to the core banking platform," Kwame says. "And every single one of them was point-to-point. System A called System B through a CICS transaction. System C read System D's VSAM files through a shared DASD path. System E sent System F a flat file through Connect:Direct at 2 AM. When we mapped it out on a whiteboard, it looked like someone had thrown spaghetti at the wall."

The acquisition forced a reckoning. CNB couldn't keep adding point-to-point connections. Every new system multiplied the complexity quadratically: with 47 systems, the theoretical maximum is 47 × 46 / 2 = 1,081 unique point-to-point paths. They were running about 340.

Kwame's team rebuilt CNB's integration architecture around IBM MQ. Three years later, those 47 systems still talk to each other — but through a messaging backbone instead of spaghetti. Adding a new system means defining queues and message formats, not writing custom integration code for every existing system it needs to talk to.

This chapter is about how that works.

The Problem with Point-to-Point

Every enterprise architect has seen the spaghetti diagram. But let's be precise about what's actually wrong with point-to-point integration, because understanding the specific failure modes tells you exactly why messaging exists.

Temporal coupling. When System A calls System B synchronously, both systems must be running at the same time. If B is down for maintenance, A fails. In a 47-system environment, you can't take anything down without checking every upstream dependency first. Rob Calloway at CNB maintained a spreadsheet — literally a spreadsheet — tracking which batch windows were safe for maintenance on which systems. It had 2,300 rows.

Spatial coupling. System A has to know where System B lives — its hostname, port, transaction ID, or dataset name. Move System B to a different LPAR, change a CICS region name, or migrate to a new subsystem, and every system that calls B needs to be updated. At CNB, changing a single CICS region name required updating 23 different programs.

Format coupling. When A sends data to B directly, they have to agree on a format. Add a field to B's input, and every system that talks to B must change. At 47 systems, a single format change cascades across the entire environment.

Capacity coupling. If A can produce work faster than B can consume it, the only options are: A slows down (wasting capacity), B scales up (expensive), or something breaks. In peak season, CNB's wire transfer system produced transactions three times faster than the compliance screening system could process them. The "solution" was an artificial delay loop in the wire transfer program. An actual PERFORM VARYING WS-WAIT FROM 1 BY 1 UNTIL WS-WAIT > 50000 doing nothing, burning CPU cycles, just to slow things down.

Messaging solves all four problems. And the key insight — the threshold concept for this chapter — is understanding that messaging doesn't just decouple systems across space. It decouples them across time.

🔍 THRESHOLD CONCEPT: Messaging Decouples Time, Not Just Space

Most people first understand MQ as "sending messages between systems" — spatial decoupling. System A puts a message on a queue; System B picks it up. They don't need to know about each other.

But the deeper insight is temporal decoupling. System A puts a message on a queue at 2:00 PM. System B might pick it up at 2:00:00.003 PM. Or at 2:15 PM when it finishes its current batch. Or at 6:00 AM tomorrow when the overnight cycle starts. The message persists on the queue until someone retrieves it. The sender doesn't wait. The receiver doesn't hurry.

Once you internalize this, everything about MQ design falls into place: why messages must be persistent, why transactional coordination matters, why dead letter queues exist, why backout thresholds are critical. Time decoupling means the queue is a contract between the present and the future. And contracts require guarantees.

What IBM MQ Actually Is

IBM MQ (formerly WebSphere MQ, formerly MQSeries) is a message-oriented middleware (MOM) product. It provides:

  • Guaranteed delivery: once MQ accepts your message, it will deliver it or tell you why it couldn't. Messages survive queue manager restarts, system crashes, and network outages.
  • Transactional integrity: MQ participates in two-phase commit with DB2 and CICS. You can put a message and update a database row as a single atomic unit of work.
  • Location transparency: senders put messages on queues by name. MQ figures out where the queue lives and how to get the message there.
  • Protocol independence: MQ handles the network protocol. Your COBOL program doesn't care whether the target system is on the same LPAR, across the Sysplex, or on a Linux server in AWS.

On z/OS, MQ runs as its own subsystem — a started task with its own address spaces, log datasets, and configuration. It's at the same architectural level as DB2 or CICS. This matters because it means MQ has its own recovery mechanisms, its own security model, and its own operations team. At CNB, the MQ team is five people. They manage 12 queue managers across four LPARs processing 180 million messages per day.

MQ's Position in the z/OS Ecosystem

Understanding where MQ sits relative to other z/OS subsystems is critical for architecture decisions. Consider the subsystem interaction model:

+-------------------+
|   Your COBOL      |
|   Application     |
+---+-----+-----+---+
    |     |     |
    v     v     v
 +----+ +----+ +----+
 |DB2 | |CICS| | MQ |
 +----+ +----+ +----+
    \     |     /
     \    |    /
      v   v   v
  +--------------+
  |    z/OS RRS  |
  | (Transaction |
  | Coordinator) |
  +--------------+

Your COBOL program can interact with all three subsystems within a single unit of work. RRS (Resource Recovery Services) sits underneath, coordinating the two-phase commit protocol across all participating resource managers. This is not theoretical — it's how CNB's wire transfer system works: one DB2 update, two MQ puts, all committed atomically.

The key implication: MQ is not a bolt-on utility. It's a first-class z/OS citizen with the same recovery guarantees as DB2. When Kwame says "MQ will deliver your message or tell you why it couldn't," that guarantee is backed by the same infrastructure that guarantees your DB2 commits survive a system crash.

💡 MQ VERSION AWARENESS As of this writing, IBM MQ 9.3 and 9.4 are the active versions on z/OS. Key capabilities added in recent versions include: streaming queues (9.3.4), enhanced pub/sub performance, native REST API access to queues, and improved container support for hybrid deployments. Your shop may be on an older version — always check DIS QMGR VERSION to know what you're working with. The MQI API itself has been remarkably stable; COBOL programs written for MQ V5 still compile and run on V9 with minimal changes.


19.2 MQ Fundamentals — Queue Managers, Queues, and Channels

Queue Managers

A queue manager is the core MQ server process. It owns queues, manages connections, handles message persistence, coordinates transactions, and controls security. Every MQ object belongs to exactly one queue manager.

On z/OS, a queue manager runs as a set of started tasks:

Component           STC Name (typical)   Purpose
Queue manager       CSQ1MSTR             Main address space, queue management
Channel initiator   CSQ1CHIN             Network communication
Utility             CSQ1UTIL             Admin commands

CNB runs four production queue managers — one per LPAR:

  • CNBPQM01 — Production LPAR A (primary banking)
  • CNBPQM02 — Production LPAR B (payments and wire)
  • CNBPQM03 — Production LPAR C (batch processing)
  • CNBPQM04 — Production LPAR D (interfaces and gateways)

💡 NAMING CONVENTION CNB uses a standard: {company}{env}{type}{nn}. So CNBPQM01 = Continental National Bank, Production, Queue Manager, 01. Your shop will have its own convention, but have one. When you're troubleshooting at 3 AM with 12 queue managers in play, naming matters.

Queue Types

MQ defines several queue types. You need to understand all of them, because you'll use all of them.

Local queues are the workhorses. A local queue lives on the queue manager where it's defined. Programs connected to that queue manager can put messages to it and get messages from it. This is where messages physically reside.

DEFINE QLOCAL(CNB.WIRE.REQUEST)  +
       DESCR('Inbound wire transfer requests')  +
       PUT(ENABLED) GET(ENABLED)  +
       DEFPSIST(YES)  +
       MAXDEPTH(500000)  +
       MAXMSGL(100000)  +
       BOTHRESH(3)  +
       BOQNAME(CNB.WIRE.REQUEST.BACKOUT)  +
       TRIGTYPE(EVERY)  +
       TRIGGER  +
       PROCESS(CNB.WIRE.PROC)  +
       INITQ(SYSTEM.CICS.INITIATION.QUEUE)

Let's unpack the important attributes:

  • DEFPSIST(YES): Messages are persistent by default — they survive queue manager restarts. For financial transactions, always YES.
  • MAXDEPTH(500000): Maximum messages before the queue is full. Size this for your peak scenario plus headroom.
  • BOTHRESH(3): Backout threshold. If a message is rolled back 3 times, move it to the backout queue. This prevents poison messages from looping forever.
  • BOQNAME(...): Where poison messages go.
  • TRIGTYPE(EVERY): Trigger a process for every message arrival. This is how MQ starts CICS transactions automatically.

Remote queue definitions are pointers, not storage. A remote queue definition on Queue Manager A says "when someone puts a message to this queue name, actually route it to a queue on Queue Manager B." The sending program doesn't know or care that the queue is remote.

DEFINE QREMOTE(CNB.FRAUD.CHECK)  +
       DESCR('Route to fraud QM')  +
       RNAME(FRAUD.INBOUND.REQUEST)  +
       RQMNAME(CNBPQM02)  +
       XMITQ(CNBPQM01.TO.CNBPQM02)

Transmission queues are special local queues that hold messages in transit to another queue manager. When you put a message to a remote queue, MQ actually puts it on the transmission queue. The channel initiator picks it up and sends it across the network.

DEFINE QLOCAL(CNBPQM01.TO.CNBPQM02)  +
       DESCR('Xmit queue to QM02')  +
       USAGE(XMITQ)  +
       DEFPSIST(YES)  +
       MAXDEPTH(999999999)

Dead letter queues (DLQ) catch messages that can't be delivered. Every queue manager should have one. If MQ can't put a message on its target queue (queue full, queue doesn't exist, authorization failure), the message goes to the DLQ instead of being lost.

DEFINE QLOCAL(CNB.DEAD.LETTER.QUEUE)  +
       DESCR('Dead letter queue for CNBPQM01')  +
       PUT(ENABLED) GET(ENABLED)  +
       DEFPSIST(YES)  +
       MAXDEPTH(999999999)

⚠️ PRODUCTION REALITY "The DLQ is not a trash can," Kwame tells every new developer. "It's an ICU. Every message in the DLQ is a business transaction that failed to process. Someone needs to look at it, diagnose it, and either fix it and resubmit it or escalate it." CNB runs a monitoring job every 15 minutes that checks DLQ depth. If it's above zero, someone gets paged.
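
That 15-minute depth check comes down to a one-line MQSC command. A sketch of what the monitoring job issues (a z/OS shop would typically wrap this in CSQUTIL or a monitoring product):

DISPLAY QLOCAL(CNB.DEAD.LETTER.QUEUE) CURDEPTH

If CURDEPTH comes back above zero, something upstream failed to deliver, and a human gets involved.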

Model queues are templates. When a program creates a dynamic queue (common for reply-to queues in request/reply patterns), MQ uses a model queue as the template. The dynamic queue inherits the model's attributes.
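
A sketch of such a model queue definition in the CNB naming style (the name matches the reply-queue example later in this chapter; the attributes are illustrative):

DEFINE QMODEL(CNB.REPLY.MODEL)  +
       DESCR('Template for dynamic reply queues')  +
       DEFTYPE(TEMPDYN)  +
       DEFPSIST(NO)

DEFTYPE(TEMPDYN) makes the dynamic queues temporary: they are deleted when the creating program closes them, which is usually what you want for reply queues. Temporary dynamic queues cannot hold persistent messages, hence DEFPSIST(NO).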

Alias queues provide an alternative name for another queue. Useful for migration — point the alias at the old queue, then switch it to the new queue without changing any programs.
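
A minimal sketch of the migration trick (names here are illustrative; older queue managers spell the TARGET attribute TARGQ, so check your version's syntax):

DEFINE QALIAS(CNB.WIRE.INBOUND)  +
       DESCR('Stable public name for wire intake')  +
       TARGET(CNB.WIRE.REQUEST)

Programs open CNB.WIRE.INBOUND. Cutover to a replacement queue is then an ALTER of the alias, with no program changes.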

Channels

Channels are the communication links between queue managers. They come in pairs:

  • Sender channel (SDR) on Queue Manager A pushes messages from a transmission queue to Queue Manager B
  • Receiver channel (RCVR) on Queue Manager B accepts incoming messages

There are also:

  • Server-connection channels (SVRCONN) for client applications connecting to the queue manager
  • Cluster-sender and cluster-receiver channels for MQ clustering

-- On CNBPQM01:
DEFINE CHANNEL(CNBPQM01.TO.CNBPQM02) +
       CHLTYPE(SDR)  +
       CONNAME('10.1.2.3(1414)')  +
       XMITQ(CNBPQM01.TO.CNBPQM02)  +
       SSLCIPH(TLS_RSA_WITH_AES_256_CBC_SHA256)

-- On CNBPQM02:
DEFINE CHANNEL(CNBPQM01.TO.CNBPQM02) +
       CHLTYPE(RCVR)  +
       SSLCIPH(TLS_RSA_WITH_AES_256_CBC_SHA256)

💡 THE COMPLETE FLOW Program puts message to CNB.FRAUD.CHECK (remote queue def) → MQ resolves to transmission queue CNBPQM01.TO.CNBPQM02 → Channel initiator picks up message via sender channel → Sends to CNBPQM02 via receiver channel → Message arrives on FRAUD.INBOUND.REQUEST (local queue on QM02). The sending program's code is identical whether the target queue is local or remote.


19.3 COBOL and MQ — The MQI API in Practice

The Message Queue Interface (MQI) is MQ's API. On z/OS, COBOL programs call MQI through standard CALL statements. The MQI is remarkably small — about a dozen verbs cover everything:

Verb               Purpose
MQCONN / MQCONNX   Connect to queue manager
MQDISC             Disconnect from queue manager
MQOPEN             Open a queue
MQCLOSE            Close a queue
MQPUT              Put a message on an open queue
MQPUT1             Open, put, close in one call (convenience)
MQGET              Get a message from a queue
MQINQ              Inquire about queue attributes
MQSET              Set queue attributes
MQSUB              Subscribe to a topic (pub/sub)
MQCMIT             Commit unit of work (batch only)
MQBACK             Back out unit of work (batch only)

COBOL Copybooks

MQ provides COBOL copybooks for all its data structures. You need these in your WORKING-STORAGE:

       WORKING-STORAGE SECTION.
      *-------------------------------------------------------*
      * MQ API Copybooks                                       *
      *-------------------------------------------------------*
           COPY CMQV.
      *    MQ Constants (completion codes, reason codes, etc.)

           COPY CMQODV.
      *    Object Descriptor (MQOD) - identifies the queue

           COPY CMQMDV.
      *    Message Descriptor (MQMD) - message metadata

           COPY CMQPMOV.
      *    Put Message Options (MQPMO)

           COPY CMQGMOV.
      *    Get Message Options (MQGMO)

These copybooks define structures with default values. CMQMDV, for instance, gives you an MQMD structure pre-filled with IBM's defaults. You override what you need.

Connecting to the Queue Manager

In a CICS environment, you typically don't call MQCONN — the CICS-MQ adapter handles the connection. In batch, you connect explicitly:

      *-------------------------------------------------------*
      * Connect to the queue manager                           *
      *-------------------------------------------------------*
           MOVE 'CNBPQM01' TO WS-QMGR-NAME

           CALL 'MQCONN' USING WS-QMGR-NAME
                                WS-HCONN
                                WS-COMPCODE
                                WS-REASON

           IF WS-COMPCODE NOT = MQCC-OK
               DISPLAY 'MQCONN failed. Reason: ' WS-REASON
               PERFORM 9000-ABEND-ROUTINE
           END-IF

WS-HCONN is the connection handle. You pass it to every subsequent MQI call. Think of it like a DB2 thread — it represents your program's session with the queue manager.
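
When a batch program finishes, it should release the connection explicitly. A minimal sketch, reusing the working-storage names from the connect example:

      *-------------------------------------------------------*
      * Disconnect from the queue manager                      *
      *-------------------------------------------------------*
           CALL 'MQDISC' USING WS-HCONN
                                WS-COMPCODE
                                WS-REASON

           IF WS-COMPCODE NOT = MQCC-OK
               DISPLAY 'MQDISC failed. Reason: ' WS-REASON
           END-IF

Be deliberate about MQCMIT or MQBACK before disconnecting rather than relying on end-of-job defaults for any in-flight unit of work.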

Opening a Queue

Before you can put or get messages, you open the queue:

      *-------------------------------------------------------*
      * Open queue for output (putting messages)               *
      *-------------------------------------------------------*
           MOVE MQOT-Q           TO MQOD-OBJECTTYPE
           MOVE 'CNB.WIRE.REQUEST'
                                 TO MQOD-OBJECTNAME
           MOVE SPACES           TO MQOD-OBJECTQMGRNAME

           COMPUTE WS-OPTIONS = MQOO-OUTPUT
                               + MQOO-FAIL-IF-QUIESCING

           CALL 'MQOPEN' USING WS-HCONN
                                MQOD
                                WS-OPTIONS
                                WS-HOBJ
                                WS-COMPCODE
                                WS-REASON

           IF WS-COMPCODE NOT = MQCC-OK
               DISPLAY 'MQOPEN failed. Reason: ' WS-REASON
               PERFORM 9000-ABEND-ROUTINE
           END-IF

Key points:

  • MQOO-OUTPUT opens for putting. Use MQOO-INPUT-SHARED or MQOO-INPUT-EXCLUSIVE for getting.
  • MQOO-FAIL-IF-QUIESCING means the call fails if the queue manager is shutting down. Always include this — you don't want your program hanging during planned maintenance.
  • WS-HOBJ is the object handle. You use it in MQPUT and MQGET calls.
  • Leave MQOD-OBJECTQMGRNAME blank to use the connected queue manager.

Putting a Message

Here's a complete MQPUT with proper MQMD setup:

      *-------------------------------------------------------*
      * Build the message descriptor                           *
      *-------------------------------------------------------*
      * Reset MQMD to defaults
           PERFORM INITIALIZE-MQMD

           MOVE MQMT-DATAGRAM    TO MQMD-MSGTYPE
           MOVE MQPER-PERSISTENT  TO MQMD-PERSISTENCE
           MOVE MQFMT-STRING     TO MQMD-FORMAT
           MOVE MQEI-UNLIMITED   TO MQMD-EXPIRY
           MOVE 5                TO MQMD-PRIORITY

      *-------------------------------------------------------*
      * Build the put message options                          *
      *-------------------------------------------------------*
           MOVE MQPMO-SYNCPOINT  TO MQPMO-OPTIONS
           ADD  MQPMO-NEW-MSG-ID TO MQPMO-OPTIONS
           ADD  MQPMO-NEW-CORREL-ID
                                 TO MQPMO-OPTIONS
           ADD  MQPMO-FAIL-IF-QUIESCING
                                 TO MQPMO-OPTIONS

      *-------------------------------------------------------*
      * Build the message body                                 *
      *-------------------------------------------------------*
           MOVE WS-WIRE-TRANSFER-REC
                                 TO WS-MSG-BUFFER
           MOVE LENGTH OF WS-WIRE-TRANSFER-REC
                                 TO WS-MSG-LENGTH

      *-------------------------------------------------------*
      * Put the message                                        *
      *-------------------------------------------------------*
           CALL 'MQPUT' USING WS-HCONN
                               WS-HOBJ
                               MQMD
                               MQPMO
                               WS-MSG-LENGTH
                               WS-MSG-BUFFER
                               WS-COMPCODE
                               WS-REASON

           EVALUATE WS-COMPCODE
               WHEN MQCC-OK
                   ADD 1 TO WS-PUT-COUNT
               WHEN MQCC-WARNING
                   DISPLAY 'MQPUT warning. Reason: '
                           WS-REASON
                   ADD 1 TO WS-WARN-COUNT
               WHEN MQCC-FAILED
                   DISPLAY 'MQPUT failed. Reason: '
                           WS-REASON
                   PERFORM 8000-HANDLE-MQ-ERROR
           END-EVALUATE

Let me unpack the MQMD fields because they matter:

  • MQMD-MSGTYPE: MQMT-DATAGRAM for fire-and-forget, MQMT-REQUEST for request/reply, MQMT-REPLY for the reply.
  • MQMD-PERSISTENCE: MQPER-PERSISTENT means the message is written to the MQ log. It survives queue manager restarts. For financial transactions, always persistent. The cost is I/O — persistent messages are synchronously logged.
  • MQMD-FORMAT: Tells the receiver how to interpret the body. MQFMT-STRING for character data. Use a custom format name for structured binary data.
  • MQMD-EXPIRY: How long the message lives (in tenths of a second). MQEI-UNLIMITED means forever. For request/reply, set an expiry so stale replies don't accumulate.
  • MQMD-PRIORITY: 0-9. Higher priority messages are delivered first. CNB uses priority 7 for wire transfers, 5 for standard transactions, 3 for batch feeds.

And the MQPMO options:

  • MQPMO-SYNCPOINT: The put is part of a unit of work. It's not committed until you call MQCMIT (batch) or the CICS task completes (CICS). This is critical for transactional integrity.
  • MQPMO-NEW-MSG-ID: MQ generates a unique message ID. You'll need this for correlation.
  • MQPMO-FAIL-IF-QUIESCING: Same reason as MQOPEN — respect shutdown signals.
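
Because the put runs under MQPMO-SYNCPOINT, a batch program must commit before the message becomes visible to getters. A minimal sketch (batch only; under CICS the task's syncpoint does this for you):

      *    Commit the unit of work containing the put
           CALL 'MQCMIT' USING WS-HCONN
                                WS-COMPCODE
                                WS-REASON

           IF WS-COMPCODE NOT = MQCC-OK
               DISPLAY 'MQCMIT failed. Reason: ' WS-REASON
               PERFORM 8000-HANDLE-MQ-ERROR
           END-IF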

Getting a Message

Getting is slightly more complex because you have options about which message to get:

      *-------------------------------------------------------*
      * Set up get message options                             *
      *-------------------------------------------------------*
           MOVE MQGMO-SYNCPOINT  TO MQGMO-OPTIONS
           ADD  MQGMO-WAIT       TO MQGMO-OPTIONS
           ADD  MQGMO-FAIL-IF-QUIESCING
                                 TO MQGMO-OPTIONS
           MOVE 30000            TO MQGMO-WAITINTERVAL
      *    Wait up to 30 seconds for a message

      *    Clear MQMD fields to get next available message
           MOVE MQMI-NONE        TO MQMD-MSGID
           MOVE MQCI-NONE        TO MQMD-CORRELID

      *-------------------------------------------------------*
      * Get the message                                        *
      *-------------------------------------------------------*
           MOVE LENGTH OF WS-MSG-BUFFER
                                 TO WS-BUFFER-LENGTH

           CALL 'MQGET' USING WS-HCONN
                               WS-HOBJ
                               MQMD
                               MQGMO
                               WS-BUFFER-LENGTH
                               WS-MSG-BUFFER
                               WS-MSG-LENGTH
                               WS-COMPCODE
                               WS-REASON

           EVALUATE TRUE
               WHEN WS-COMPCODE = MQCC-OK
                   PERFORM 5000-PROCESS-MESSAGE
               WHEN WS-REASON = MQRC-NO-MSG-AVAILABLE
                   SET WS-NO-MORE-MESSAGES TO TRUE
               WHEN WS-REASON = MQRC-TRUNCATED-MSG-FAILED
                   DISPLAY 'Message too large. Length: '
                           WS-MSG-LENGTH
                   PERFORM 8100-HANDLE-TRUNCATION
               WHEN OTHER
                   DISPLAY 'MQGET failed. CC: '
                           WS-COMPCODE
                           ' Reason: ' WS-REASON
                   PERFORM 8000-HANDLE-MQ-ERROR
           END-EVALUATE

Key points:

  • MQGMO-WAIT with MQGMO-WAITINTERVAL of 30000 means "wait up to 30 seconds for a message." Without this, MQGET returns immediately if no message is available.
  • Setting MQMD-MSGID and MQMD-CORRELID to MQMI-NONE/MQCI-NONE means "get the next available message." If you set these to specific values, MQ retrieves only a message matching those IDs. This is how you implement request/reply — you get the reply by correlation ID.
  • MQRC-NO-MSG-AVAILABLE (reason code 2033): Not an error. It means the wait interval expired with no message. Your program should handle this gracefully.
  • MQRC-TRUNCATED-MSG-FAILED (reason code 2080): The message was larger than your buffer. WS-MSG-LENGTH tells you the actual size.

⚠️ THE BUFFER SIZE TRAP Here's a mistake Kwame sees constantly from developers new to MQ: they define a 4K buffer because "our messages are always 2K." Then six months later, someone adds fields to the message format, it grows to 5K, and MQGET starts failing with reason code 2080 on every message. Size your buffers for the maximum message length the queue allows (MAXMSGL), or at minimum, handle truncation explicitly and have a recovery path.
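
One defensive pattern is to ask the queue for its MAXMSGL at startup and check it against your buffer. A sketch — the WS- names here are illustrative, and the queue must have been opened with MQOO-INQUIRE for MQINQ to succeed:

      *-------------------------------------------------------*
      * Verify buffer can hold the queue's largest message     *
      *-------------------------------------------------------*
           MOVE 1                TO WS-SELECTOR-COUNT
           MOVE MQIA-MAX-MSG-LENGTH
                                 TO WS-SELECTOR (1)
           MOVE 1                TO WS-INT-ATTR-COUNT
           MOVE 0                TO WS-CHAR-ATTR-LEN

           CALL 'MQINQ' USING WS-HCONN
                               WS-HOBJ
                               WS-SELECTOR-COUNT
                               WS-SELECTOR-TABLE
                               WS-INT-ATTR-COUNT
                               WS-INT-ATTR-TABLE
                               WS-CHAR-ATTR-LEN
                               WS-CHAR-ATTRS
                               WS-COMPCODE
                               WS-REASON

           IF WS-COMPCODE = MQCC-OK
              AND WS-INT-ATTR (1) > LENGTH OF WS-MSG-BUFFER
               DISPLAY 'Buffer smaller than MAXMSGL: '
                       WS-INT-ATTR (1)
               PERFORM 9000-ABEND-ROUTINE
           END-IF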

Error Handling — The Reason Codes That Matter

MQ returns hundreds of reason codes. In practice, you'll encounter about 20 regularly. Here are the ones your error handling must cover:

Reason Code                 Value  Meaning                 Response
MQRC-NONE                   0      Success                 Continue
MQRC-NOT-AUTHORIZED         2035   Security failure        Log, alert, abend
MQRC-Q-FULL                 2053   Queue at MAXDEPTH       Retry with backoff, alert
MQRC-NO-MSG-AVAILABLE       2033   No message (with wait)  Normal — exit loop
MQRC-CONNECTION-BROKEN      2009   Lost QM connection      Reconnect or abend
MQRC-Q-MGR-QUIESCING        2161   QM shutting down        Clean exit
MQRC-TRUNCATED-MSG-FAILED   2080   Buffer too small        Handle or abend
MQRC-BACKED-OUT             2003   UOW rolled back         Retry or escalate
MQRC-PUT-INHIBITED          2051   Queue blocked for PUT   Retry later, alert
MQRC-GET-INHIBITED          2016   Queue blocked for GET   Retry later, alert

A central error handler covers these:

       8000-HANDLE-MQ-ERROR.
      *-------------------------------------------------------*
      * Central MQ error handler                               *
      *-------------------------------------------------------*
           EVALUATE WS-REASON
               WHEN MQRC-NOT-AUTHORIZED
                   MOVE 'SECURITY VIOLATION'
                                     TO WS-ERROR-TYPE
                   PERFORM 8500-LOG-AND-ABEND

               WHEN MQRC-Q-FULL
                   ADD 1 TO WS-RETRY-COUNT
                   IF WS-RETRY-COUNT > WS-MAX-RETRIES
                       MOVE 'QUEUE FULL - RETRIES EXHAUSTED'
                                     TO WS-ERROR-TYPE
                       PERFORM 8500-LOG-AND-ABEND
                   ELSE
                       CALL 'CEE3DLY' USING WS-WAIT-SECS
                                             WS-FC
                       GO TO 4000-PUT-MESSAGE
                   END-IF

               WHEN MQRC-CONNECTION-BROKEN
                   MOVE 'CONNECTION LOST'
                                     TO WS-ERROR-TYPE
                   PERFORM 8500-LOG-AND-ABEND

               WHEN MQRC-Q-MGR-QUIESCING
                   DISPLAY 'Queue manager quiescing. '
                           'Clean shutdown.'
                   PERFORM 9500-CLEAN-SHUTDOWN

               WHEN OTHER
                   MOVE 'UNEXPECTED MQ ERROR'
                                     TO WS-ERROR-TYPE
                   DISPLAY 'MQ Error - CC: ' WS-COMPCODE
                           ' RC: ' WS-REASON
                   PERFORM 8500-LOG-AND-ABEND
           END-EVALUATE.

🔄 SPACED REVIEW — Chapter 1 Recall from Chapter 1 how z/OS subsystems (DB2, CICS, MQ) each have their own address spaces and recovery mechanisms. MQ's connection handle (HCONN) is analogous to a DB2 thread — it's your program's session with the subsystem. A broken connection handle is as serious as a lost DB2 thread.


19.4 Message Patterns — Request/Reply, Fire-and-Forget, and Pub/Sub

Patterns are not academic exercises. The pattern you choose determines your error handling, your timeout strategy, your monitoring requirements, and your capacity planning. Choose wrong and you'll be redesigning under pressure during the next peak season.

Fire-and-Forget (Datagram)

The simplest pattern. The sender puts a message and moves on. No response expected.

Sender → MQPUT → [Queue] → MQGET → Receiver

When to use it:

  • Audit logging
  • Event notifications
  • Data feeds where the sender doesn't need confirmation
  • Any case where the sender's processing shouldn't block on the receiver

COBOL implementation: Set MQMD-MSGTYPE to MQMT-DATAGRAM. The sending program doesn't need a reply-to queue. The MQPUT example in Section 19.3 is a fire-and-forget pattern.

At CNB, fire-and-forget handles 70% of all messages. Every core banking transaction puts an audit message to CNB.AUDIT.FEED. The audit system processes them independently. If the audit system falls behind, the queue just gets deeper — the core banking system doesn't slow down.

The risk: If the receiver fails permanently, messages accumulate until the queue fills up. You need monitoring on queue depth.

Request/Reply

The sender puts a request and waits for a response. This is the MQ equivalent of a synchronous call, but with a crucial difference — the sender and receiver are still temporally decoupled.

Sender → MQPUT → [Request Queue] → MQGET → Receiver
                                               |
Sender ← MQGET ← [Reply Queue]   ← MQPUT ←---+

When to use it:

  • Real-time inquiries (account balance check)
  • Validation (fraud screening before authorization)
  • Any case where the sender needs a response before proceeding

COBOL implementation — the sender side:

      *-------------------------------------------------------*
      * Request/Reply - Sender                                 *
      *-------------------------------------------------------*
      * Step 1: Create a temporary reply queue
           MOVE 'CNB.REPLY.MODEL' TO MQOD-OBJECTNAME
      *    Pattern: 'CNB.REPLY.*' generates unique names
           MOVE 'CNB.REPLY.*'    TO MQOD-DYNAMICQNAME

           COMPUTE WS-OPTIONS = MQOO-INPUT-EXCLUSIVE

           CALL 'MQOPEN' USING WS-HCONN
                                MQOD
                                WS-OPTIONS
                                WS-HOBJ-REPLY
                                WS-COMPCODE
                                WS-REASON

      * Save the generated reply queue name
           MOVE MQOD-OBJECTNAME  TO WS-REPLY-QNAME

      * Step 2: Put the request message
           MOVE MQMT-REQUEST     TO MQMD-MSGTYPE
           MOVE WS-REPLY-QNAME   TO MQMD-REPLYTOQ
           MOVE SPACES           TO MQMD-REPLYTOQMGR
           MOVE 3000             TO MQMD-EXPIRY
      *    Expires in 300 seconds (3000 tenths of a second)

           PERFORM 4000-PUT-MESSAGE

      * Save the message ID for correlation
           MOVE MQMD-MSGID       TO WS-SAVED-MSGID

      * Step 3: Wait for the reply
           MOVE WS-SAVED-MSGID   TO MQMD-CORRELID
           MOVE MQMI-NONE        TO MQMD-MSGID

           MOVE MQGMO-WAIT       TO MQGMO-OPTIONS
           ADD  MQGMO-SYNCPOINT  TO MQGMO-OPTIONS
           ADD  MQGMO-FAIL-IF-QUIESCING
                                 TO MQGMO-OPTIONS
           MOVE 30000            TO MQGMO-WAITINTERVAL

           CALL 'MQGET' USING WS-HCONN
                               WS-HOBJ-REPLY
                               MQMD
                               MQGMO
                               WS-BUFFER-LENGTH
                               WS-MSG-BUFFER
                               WS-MSG-LENGTH
                               WS-COMPCODE
                               WS-REASON

           IF WS-REASON = MQRC-NO-MSG-AVAILABLE
               MOVE 'REPLY TIMEOUT' TO WS-ERROR-TYPE
               PERFORM 8200-HANDLE-TIMEOUT
           END-IF

The critical detail: the correlation ID. The sender saves the message ID of its request. The receiver copies that message ID into the correlation ID of its reply. The sender then gets the reply by matching on correlation ID. Without this, in a high-volume system with hundreds of concurrent request/reply pairs, you'd get someone else's reply.

COBOL implementation — the receiver side:

      *-------------------------------------------------------*
      * Request/Reply - Receiver (service program)             *
      *-------------------------------------------------------*
      * Get the request
           PERFORM 5000-GET-MESSAGE

      * Save request's MQMD fields for the reply
           MOVE MQMD-MSGID       TO WS-REQUEST-MSGID
           MOVE MQMD-REPLYTOQ    TO WS-REPLY-QNAME
           MOVE MQMD-REPLYTOQMGR TO WS-REPLY-QMGR

      * Process the request
           PERFORM 6000-PROCESS-REQUEST

      * Build the reply
           PERFORM INITIALIZE-MQMD
           MOVE MQMT-REPLY       TO MQMD-MSGTYPE
           MOVE WS-REQUEST-MSGID TO MQMD-CORRELID
           MOVE MQPER-PERSISTENT  TO MQMD-PERSISTENCE

      * Open the reply queue
           MOVE WS-REPLY-QNAME   TO MQOD-OBJECTNAME
           MOVE WS-REPLY-QMGR   TO MQOD-OBJECTQMGRNAME

           COMPUTE WS-OPTIONS = MQOO-OUTPUT
                               + MQOO-FAIL-IF-QUIESCING

           CALL 'MQOPEN' USING WS-HCONN
                                MQOD
                                WS-OPTIONS
                                WS-HOBJ-REPLY
                                WS-COMPCODE
                                WS-REASON

      * Put the reply
           PERFORM 4100-PUT-REPLY

🧩 PATTERN INTERACTION At CNB, the wire transfer system uses request/reply for fraud screening. The wire transfer program puts a request to CNB.FRAUD.CHECK, which routes (via remote queue definition) to CNBPQM02 where the fraud engine runs. The fraud engine processes the request and puts the reply back to the dynamic reply queue on CNBPQM01. The wire transfer program waits up to 30 seconds. If no reply comes, it routes the transaction for manual review. SLA: 95% of fraud checks return within 2 seconds.

Publish/Subscribe

In pub/sub, publishers send messages to topics, and MQ distributes copies to all subscribers. Publishers don't know about subscribers. Subscribers don't know about publishers. MQ handles the fan-out.

Publisher → MQPUT → [Topic: CNB/RATES/FX] → MQ distributes
                                              ↓       ↓       ↓
                                         [Sub 1]  [Sub 2]  [Sub 3]
                                         Trading  Pricing  Reporting

When to use it:

  • Reference data distribution (FX rates, interest rates, product catalog updates)
  • Event broadcasting (system status changes, market data)
  • Any case where multiple consumers need the same information

COBOL publisher:

      *-------------------------------------------------------*
      * Publish FX rate update                                 *
      *-------------------------------------------------------*
           MOVE MQOT-TOPIC       TO MQOD-OBJECTTYPE
           MOVE 'CNB/RATES/FX'   TO MQOD-OBJECTSTRING
           MOVE 12               TO MQOD-OBJECTSTRINGLENGTH
           MOVE SPACES           TO MQOD-OBJECTNAME

           COMPUTE WS-OPTIONS = MQOO-OUTPUT
                               + MQOO-FAIL-IF-QUIESCING

           CALL 'MQOPEN' USING WS-HCONN
                                MQOD
                                WS-OPTIONS
                                WS-HOBJ-TOPIC
                                WS-COMPCODE
                                WS-REASON

           IF WS-COMPCODE = MQCC-OK
               PERFORM 4200-PUT-FX-RATE
           END-IF

COBOL subscriber (durable subscription):

      *-------------------------------------------------------*
      * Subscribe to FX rate updates                           *
      *-------------------------------------------------------*
           MOVE 'FX.RATE.SUB.TRADING'
                                 TO MQSD-SUBNAME
           MOVE 'CNB/RATES/FX'   TO MQSD-OBJECTSTRING
           MOVE 12               TO MQSD-OBJECTSTRINGLENGTH
           COMPUTE MQSD-OPTIONS = MQSO-CREATE + MQSO-RESUME
                                + MQSO-DURABLE
                                + MQSO-FAIL-IF-QUIESCING

           CALL 'MQSUB' USING WS-HCONN
                               MQSD
                               WS-HOBJ-SUB
                               WS-HSUB
                               WS-COMPCODE
                               WS-REASON

Durable subscriptions (MQSO-DURABLE) persist across queue manager restarts. If the subscriber disconnects, MQ keeps collecting messages for it. When the subscriber reconnects and resumes (MQSO-RESUME), it gets all the messages it missed. For financial data, always durable.
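Durable subscriptions are also administrable objects, and that matters operationally: an orphaned durable subscription keeps accumulating messages forever. A sketch of the MQSC commands an administrator might use to inspect and clean one up (the subscription name matches the COBOL example above; exact attribute output varies by MQ version):

* List the subscription and check its status
DISPLAY SUB(FX.RATE.SUB.TRADING) ALL
DISPLAY SBSTATUS(FX.RATE.SUB.TRADING)

* Remove a durable subscription that is no longer needed,
* so the queue manager stops retaining messages for it
DELETE SUB(FX.RATE.SUB.TRADING)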

💡 CHOOSING THE RIGHT PATTERN

| Requirement | Pattern | Why |
|---|---|---|
| "Send and forget" | Datagram | No response needed |
| "Need an answer" | Request/Reply | Synchronous-style with decoupling |
| "Everyone needs to know" | Pub/Sub | One-to-many distribution |
| "Exactly one consumer" | Datagram with competing consumers | Load balancing across instances |
| "Must process in order" | Datagram with exclusive get | Serialization guarantee |

Competing Consumers — The Pattern You'll Use Most

There's a pattern that doesn't get its own category in textbooks but dominates production: competing consumers. Multiple instances of the same program read from a single queue. MQ ensures each message is delivered to exactly one consumer.

At CNB, the transaction processing queue (CNB.TXN.INBOUND) has six CICS regions reading from it simultaneously. Each MQGET with MQOO-INPUT-SHARED retrieves the next available message. MQ handles the locking — no two consumers get the same message.

                 +--→ [Consumer 1 - CICS Region A1]
                 |
[Queue] →→→→→→→→+--→ [Consumer 2 - CICS Region A2]
                 |
                 +--→ [Consumer 3 - CICS Region B1]

This gives you horizontal scaling. If your consumers can't keep up, add another one. If a consumer crashes, the others continue processing. Messages that were in-flight on the crashed consumer (under syncpoint) are rolled back and picked up by another consumer.

The constraint: message ordering is not guaranteed across competing consumers. If messages A, B, and C arrive in that order, Consumer 1 might process A and C while Consumer 2 processes B. If your business logic requires strict ordering, you can't use competing consumers — you need a single consumer with MQOO-INPUT-EXCLUSIVE, which serializes all processing through one reader.

CNB's solution: they use competing consumers for transaction processing (order doesn't matter — each transaction is independent) but a single exclusive consumer for the general ledger feed (GL entries must post in sequence).
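In code, the difference between the two is a single open option. A minimal sketch, reusing the working-storage names from the earlier examples (WS-OPTIONS, WS-HOBJ, and the MQOD are assumed to be set up as before):

      *-------------------------------------------------------*
      * Competing consumers vs. exclusive input                *
      *-------------------------------------------------------*
      * Shared input: any number of consumers may read
           COMPUTE WS-OPTIONS = MQOO-INPUT-SHARED
                               + MQOO-FAIL-IF-QUIESCING

      * Exclusive input (e.g. the GL feed): use this instead.
      * A second opener gets MQRC-OBJECT-IN-USE.
      *    COMPUTE WS-OPTIONS = MQOO-INPUT-EXCLUSIVE
      *                        + MQOO-FAIL-IF-QUIESCING

           CALL 'MQOPEN' USING WS-HCONN
                                MQOD
                                WS-OPTIONS
                                WS-HOBJ
                                WS-COMPCODE
                                WS-REASON

A third choice, MQOO-INPUT-AS-Q-DEF, defers the decision to the queue's DEFSOPT attribute. That works, but it hides an architectural decision inside a queue definition; the explicit options above are easier to audit.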

⚠️ THE COMPETING CONSUMERS TRAP Diane Okoye at Pinnacle Health learned this the hard way. Their claims adjudication queue had three competing consumers. Claims are supposed to be independent — but a batch of related claims (a patient with multiple procedures on the same visit) had inter-claim dependencies. Consumer 1 processed the primary claim and approved it. Consumer 3 processed a dependent claim, checked for the primary's approval, didn't find it (Consumer 1 hadn't committed yet), and rejected it. The fix: batch-related claims into a single message group (covered in Chapter 20) so one consumer processes the entire group.

Pattern Anti-Patterns

A few patterns that look reasonable but cause production problems:

Request/reply for high-volume fire-and-forget. If you don't need the response, don't ask for one. Every request/reply creates a dynamic reply queue, waits for a response, and doubles the message count. At CNB's peak of 22 million messages per hour, converting the audit feed from datagram to request/reply would have added 22 million reply messages, 22 million dynamic queues, and a 50% increase in MQ log volume. "Just because you can doesn't mean you should," Kwame says.

Pub/sub for point-to-point. If only one system needs the message, use a datagram to a specific queue. Pub/sub adds the overhead of topic tree management and subscription matching. Use it when you genuinely have one-to-many distribution.

Synchronous request/reply as a replacement for all CICS LINK calls. MQ request/reply adds latency (serialization + network + deserialization + queue processing) compared to an intra-region CICS LINK. If two programs are in the same CICS region and always will be, LINK is faster. Use MQ when you need the decoupling benefits — cross-system, cross-LPAR, or temporal independence.


19.5 Transactional Messaging — Coordinating MQ with DB2 and CICS

This is where MQ earns its keep in enterprise systems. The ability to coordinate a message put or get with a database update as a single atomic transaction is the foundation of reliable enterprise messaging.

The Problem

Consider this scenario at CNB: A wire transfer program debits Account A in DB2 and puts a message to the correspondent bank's queue. What happens if the DB2 update succeeds but the MQPUT fails? Account A is debited, but the correspondent bank never gets the message. Money disappears.

Or the reverse: MQPUT succeeds, but DB2 update fails. The correspondent bank receives a message for a debit that never happened.

Both outcomes are catastrophic. You need atomicity — either both succeed or both fail.

Syncpoint and Unit of Work

MQ participates in z/OS Resource Recovery Services (RRS), the platform's distributed transaction coordinator. When you specify MQPMO-SYNCPOINT on an MQPUT or MQGMO-SYNCPOINT on an MQGET, the MQ operation becomes part of the current unit of work. It's committed or rolled back along with everything else in that unit of work — DB2 updates, VSAM changes, other MQ operations.

In CICS, the unit of work is managed by CICS. An EXEC CICS SYNCPOINT commits everything — DB2 updates, MQ messages, and any other recoverable resources.

      *-------------------------------------------------------*
      * Wire Transfer - Transactional Pattern in CICS          *
      *-------------------------------------------------------*
      * Step 1: Debit the source account (DB2)
           EXEC SQL
               UPDATE ACCOUNTS
               SET BALANCE = BALANCE - :WS-AMOUNT
               WHERE ACCOUNT_NUM = :WS-SOURCE-ACCT
           END-EXEC

           IF SQLCODE NOT = 0
               EXEC CICS SYNCPOINT ROLLBACK END-EXEC
               PERFORM 8000-HANDLE-DB2-ERROR
               GO TO 9000-RETURN
           END-IF

      * Step 2: Put message to correspondent bank (MQ)
           MOVE MQPMO-SYNCPOINT  TO MQPMO-OPTIONS
           ADD  MQPMO-FAIL-IF-QUIESCING
                                 TO MQPMO-OPTIONS

           CALL 'MQPUT' USING WS-HCONN
                               WS-HOBJ
                               MQMD
                               MQPMO
                               WS-MSG-LENGTH
                               WS-MSG-BUFFER
                               WS-COMPCODE
                               WS-REASON

           IF WS-COMPCODE NOT = MQCC-OK
               EXEC CICS SYNCPOINT ROLLBACK END-EXEC
               PERFORM 8000-HANDLE-MQ-ERROR
               GO TO 9000-RETURN
           END-IF

      * Step 3: Commit both operations atomically
           EXEC CICS SYNCPOINT END-EXEC

If the MQPUT fails, the SYNCPOINT ROLLBACK undoes the DB2 debit. If the SYNCPOINT itself fails (system crash), RRS coordinates recovery on restart — either both commit or both roll back. Two-phase commit guarantees it.

In batch, you use MQCMIT and MQBACK:

      *-------------------------------------------------------*
      * Batch commit                                           *
      *-------------------------------------------------------*
           CALL 'MQCMIT' USING WS-HCONN
                                WS-COMPCODE
                                WS-REASON

           IF WS-COMPCODE NOT = MQCC-OK
               DISPLAY 'MQCMIT failed. Reason: '
                       WS-REASON
               CALL 'MQBACK' USING WS-HCONN
                                    WS-COMPCODE
                                    WS-REASON
               PERFORM 8000-HANDLE-COMMIT-FAILURE
           END-IF

⚠️ CRITICAL: THE TWO-PHASE COMMIT WINDOW Between the MQPUT and the SYNCPOINT, there's a window where the message is "in doubt." If the CICS region fails in that window, RRS resolves it on restart. But here's what catches people: the message is invisible to consumers during this window. An MQGET won't see it until it's committed. This is correct behavior — it prevents consumers from processing messages that might be rolled back. But it means you can't test your MQ programs by putting a message and immediately checking the queue depth. The depth only changes after commit.

The Backout Counter

What happens when a consumer gets a message, tries to process it, and fails? With MQGMO-SYNCPOINT, the MQGET is rolled back, and the message reappears on the queue. The consumer gets it again. Tries again. Fails again. Without intervention, this loops forever.

MQ tracks a backout counter in the message header (MQMD-BACKOUTCOUNT). Each rollback increments it. The queue's BOTHRESH attribute defines the threshold. When the counter hits the threshold, your program should route the message to the backout queue instead of processing it.
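The threshold and the backout queue name live on the queue definition, so well-behaved consumers read them at startup rather than hard-coding them. A sketch using MQINQ — it assumes the queue was opened with MQOO-INQUIRE added to the open options, and that the WS-* items are declared with the obvious pictures:

      *-------------------------------------------------------*
      * Inquire BOTHRESH and BOQNAME from the queue definition *
      *-------------------------------------------------------*
           MOVE 2                TO WS-SELECTOR-COUNT
           MOVE MQIA-BACKOUT-THRESHOLD
                                 TO WS-SELECTORS(1)
           MOVE MQCA-BACKOUT-REQ-Q-NAME
                                 TO WS-SELECTORS(2)
           MOVE 1                TO WS-INT-ATTR-COUNT
           MOVE MQ-Q-NAME-LENGTH TO WS-CHAR-ATTR-LENGTH

           CALL 'MQINQ' USING WS-HCONN
                               WS-HOBJ
                               WS-SELECTOR-COUNT
                               WS-SELECTORS
                               WS-INT-ATTR-COUNT
                               WS-INT-ATTRS
                               WS-CHAR-ATTR-LENGTH
                               WS-CHAR-ATTRS
                               WS-COMPCODE
                               WS-REASON

           IF WS-COMPCODE = MQCC-OK
               MOVE WS-INT-ATTRS(1)  TO WS-BACKOUT-THRESHOLD
               MOVE WS-CHAR-ATTRS    TO WS-BACKOUT-QNAME
           END-IF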

      *-------------------------------------------------------*
      * Check backout count before processing                  *
      *-------------------------------------------------------*
           IF MQMD-BACKOUTCOUNT >= WS-BACKOUT-THRESHOLD
               PERFORM 7000-ROUTE-TO-BACKOUT-QUEUE
           ELSE
               PERFORM 6000-PROCESS-MESSAGE
           END-IF.

        7000-ROUTE-TO-BACKOUT-QUEUE.
      *-------------------------------------------------------*
      * Move poison message to backout queue                   *
      *-------------------------------------------------------*
           MOVE 'CNB.WIRE.REQUEST.BACKOUT'
                                 TO MQOD-OBJECTNAME

           COMPUTE WS-OPTIONS = MQOO-OUTPUT
                               + MQOO-FAIL-IF-QUIESCING

           CALL 'MQOPEN' USING WS-HCONN
                                MQOD
                                WS-OPTIONS
                                WS-HOBJ-BACKOUT
                                WS-COMPCODE
                                WS-REASON

           IF WS-COMPCODE = MQCC-OK
               MOVE MQPMO-SYNCPOINT
                                 TO MQPMO-OPTIONS
               CALL 'MQPUT' USING WS-HCONN
                                   WS-HOBJ-BACKOUT
                                   MQMD
                                   MQPMO
                                   WS-MSG-LENGTH
                                   WS-MSG-BUFFER
                                   WS-COMPCODE
                                   WS-REASON
           END-IF

           CALL 'MQCLOSE' USING WS-HCONN
                                 WS-HOBJ-BACKOUT
                                 MQCO-NONE
                                 WS-COMPCODE
                                 WS-REASON.

🔍 WHY THIS MATTERS Lisa Tran at CNB calls the backout queue "the canary in the coal mine." A sudden spike in backout queue depth usually means a deployment broke the message format, a DB2 table is in a bad state, or a resource is unavailable. CNB's monitoring triggers a Severity 2 incident if any backout queue depth exceeds 100 messages.


19.6 MQ in CICS — The EXEC CICS Interface vs. MQI

CICS applications have two ways to interact with MQ:

  1. Direct MQI calls (CALL 'MQPUT', CALL 'MQGET', etc.) through the CICS-MQ adapter
  2. The CICS-MQ bridge, which runs existing CICS programs in response to MQ messages

In practice, most CICS-MQ programs use the direct MQI with the CICS-MQ adapter handling the connection. You don't call MQCONN in CICS — the adapter provides the connection handle through the CICS task.

How the CICS-MQ Adapter Works

The adapter is configured in CICS with a CICS MQ connection definition:

DEFINE MQCONN(MQ01)
       GROUP(MQGROUP)
       MQNAME(CNBPQM01)
       RESYNCMEMBER(YES)

When your CICS program makes its first MQI call, the adapter connects to the queue manager on your behalf. All MQI calls within the same CICS task share this connection. The connection is released when the task ends.

The critical benefit: transaction coordination. The CICS-MQ adapter participates in CICS's unit of work. When you issue EXEC CICS SYNCPOINT, MQ operations under MQPMO-SYNCPOINT and MQGMO-SYNCPOINT are committed along with DB2 and VSAM updates. This is the two-phase commit integration we discussed in Section 19.5.

MQ Triggering in CICS

One of MQ's most powerful features in CICS is automatic triggering. You can configure a queue so that when a message arrives, MQ automatically starts a CICS transaction to process it.

The setup:

DEFINE QLOCAL(CNB.WIRE.REQUEST)  +
       TRIGGER  +
       TRIGTYPE(EVERY)  +
       PROCESS(CNB.WIRE.PROC)  +
       INITQ(SYSTEM.CICS.INITIATION.QUEUE)

DEFINE PROCESS(CNB.WIRE.PROC)  +
       APPLICID('WTRN')  +
       APPLTYPE(CICS)

When a message arrives on CNB.WIRE.REQUEST, MQ puts a trigger message on the initiation queue. The CICS MQ trigger monitor (transaction CKTI) picks up the trigger message and starts transaction WTRN. WTRN reads the message from CNB.WIRE.REQUEST and processes it.

Trigger types:

  • EVERY: Trigger on every message arrival. Use for low-to-medium volume queues where immediate processing is needed.
  • FIRST: Trigger only when the queue transitions from empty to non-empty. Use when you want one instance of the processing program to drain the queue.
  • DEPTH: Trigger when queue depth reaches a threshold. Use for batch-style processing — wait until enough messages accumulate.
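For comparison with the EVERY definition above, a depth-triggered queue might be defined like this (the statement queue and process names are invented for illustration). One operational note: after a DEPTH trigger fires, MQ switches the queue to NOTRIGGER, and the application must turn triggering back on (via MQSET) once it has drained the batch:

DEFINE QLOCAL(CNB.STMT.BATCH)  +
       TRIGGER  +
       TRIGTYPE(DEPTH)  +
       TRIGDPTH(500)  +
       PROCESS(CNB.STMT.PROC)  +
       INITQ(SYSTEM.CICS.INITIATION.QUEUE)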

The triggered CICS program follows a specific pattern. Here's the skeleton that CNB uses for all triggered MQ consumers:

       0000-MAIN.
      *    CKTI passes the trigger message in a
      *    CICS Transient Data area. We don't need it —
      *    we know which queue to read.
           MOVE MQHC-DEF-HCONN   TO WS-HCONN
           PERFORM 1000-OPEN-QUEUE
           PERFORM 2000-PROCESS-LOOP
           PERFORM 3000-CLOSE-QUEUE
           EXEC CICS RETURN END-EXEC.

       2000-PROCESS-LOOP.
           PERFORM UNTIL WS-NO-MORE-MESSAGES
               PERFORM 2100-GET-MESSAGE
               IF NOT WS-NO-MORE-MESSAGES
                   PERFORM 2200-CHECK-BACKOUT
                   IF NOT WS-IS-POISON
                       PERFORM 2300-PROCESS-MESSAGE
                   END-IF
                   PERFORM 2400-COMMIT
               END-IF
           END-PERFORM.

Note the loop structure: a triggered program doesn't process just one message and exit. It drains the queue (or processes until a configurable batch limit). This is especially important with TRIGTYPE(FIRST) — one trigger starts one program instance, and that instance is responsible for all currently queued messages. If you process only one message and return, the remaining messages sit on the queue until a new message arrives and triggers the program again.

⚠️ TRIGGERED PROGRAM LIFETIME Marcus Whitfield at Federal Benefits Administration learned a painful lesson about triggered programs. Their enrollment processor was triggered with TRIGTYPE(FIRST). The program opened the queue, processed one message, committed, and returned. During peak enrollment periods with 50,000 messages queued, the program was triggered, processed one message, exited, waited for the next trigger (which only fires on empty-to-non-empty transition), and the other 49,999 messages sat there. The fix was simple but the 3-hour processing delay affected 12,000 benefit enrollments. Always drain the queue in a loop.

🔄 SPACED REVIEW — Chapter 13 In Chapter 13, we discussed CICS Multi-Region Operation (MRO) for connecting CICS regions. MQ offers an alternative approach. With MRO, the regions must be on the same LPAR and one region directly calls a program in another. With MQ, the regions can be anywhere — same LPAR, different LPARs, different sysplexes, different continents. The tradeoff: MRO is faster (no serialization/deserialization), MQ is more flexible. CNB uses MRO within the core banking TOR/AOR/FOR structure and MQ for inter-system integration. Know which tool fits which job.

EXEC CICS WRITEQ TS vs. MQ

Don't confuse CICS Temporary Storage (TS) queues with MQ queues. They solve different problems:

| Feature | CICS TS Queue | MQ Queue |
|---|---|---|
| Scope | Single CICS region | Cross-system, cross-platform |
| Persistence | Optional (AUXILIARY) | Yes (default for production) |
| Transaction | CICS UOW | RRS-coordinated |
| Triggering | No | Yes |
| Routing | No | Yes (channels, clusters) |
| Use case | Scratch pad within a transaction | Inter-system messaging |

If your data stays within one CICS region and one transaction's lifetime, use TS queues. If it crosses any boundary — system, time, or reliability requirement — use MQ.


19.7 MQ Clustering and High Availability

For production banking systems, a single queue manager is a single point of failure. MQ provides two mechanisms for high availability on z/OS: queue manager clusters and shared queues.

Queue Manager Clusters

An MQ cluster is a group of queue managers that can automatically route messages to each other without explicit channel and remote queue definitions.

-- Nominate a full repository queue manager
-- (normally two members hold the cluster repository;
--  the others join via cluster channel definitions)
ALTER QMGR REPOS(CNB.CLUSTER)

-- Define a cluster queue (available to all members)
DEFINE QLOCAL(CNB.PAYMENT.INBOUND)  +
       CLUSTER(CNB.CLUSTER)  +
       DEFBIND(NOTFIXED)  +
       CLWLPRTY(5)

With DEFBIND(NOTFIXED), MQ can route each message to a different instance of the queue across the cluster. This provides:

  • Workload balancing: Messages distribute across multiple queue managers
  • Availability: If one queue manager is down, messages route to others
  • Simplified administration: No need for explicit remote queue definitions and channels between every pair

CNB's cluster has four members (one per LPAR). The CNB.PAYMENT.INBOUND queue exists on all four. When any system puts a message to that queue, MQ routes it to the least-loaded instance.
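"Simplified administration" does not mean no channels at all: each member still defines one cluster-receiver channel advertising how the cluster reaches it, plus one manual cluster-sender pointing at a full repository, and MQ builds the remaining connections automatically. A sketch for one member (hostnames and ports are illustrative):

* On CNBPQM01: advertise this QM's own listener to the cluster
DEFINE CHANNEL(CNB.CLUSTER.CNBPQM01)  +
       CHLTYPE(CLUSRCVR)  +
       TRPTYPE(TCP)  +
       CONNAME('lpara.cnb.example(1414)')  +
       CLUSTER(CNB.CLUSTER)

* On CNBPQM01: manual sender to a full repository member
DEFINE CHANNEL(CNB.CLUSTER.CNBPQM02)  +
       CHLTYPE(CLUSSDR)  +
       TRPTYPE(TCP)  +
       CONNAME('lparb.cnb.example(1414)')  +
       CLUSTER(CNB.CLUSTER)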

Shared Queues in Parallel Sysplex

For z/OS sites running a Parallel Sysplex, MQ supports shared queues stored in the Coupling Facility (CF). This is the highest level of MQ availability.

DEFINE QLOCAL(CNB.CRITICAL.PAYMENTS)  +
       QSGDISP(SHARED)  +
       CFSTRUCT(CNB_PAYMENT_CF)  +
       DEFPSIST(YES)

With shared queues:

  • Any queue manager in the Queue Sharing Group can access the queue
  • If a queue manager fails, other queue managers immediately access the messages
  • No message loss, no failover delay
  • The Coupling Facility provides the shared storage

            +-------------------+
            | Coupling Facility |
            | CNB_PAYMENT_CF    |
            | [Shared Queue]    |
            +---+-------+---+--+
                |       |   |
        +-------+  +----+  +-------+
        |          |                |
   +----+----+ +---+-----+ +-------+--+
   |CNBPQM01 | |CNBPQM02 | |CNBPQM03  |
   |LPAR A   | |LPAR B   | |LPAR C    |
   +---------+ +---------+ +----------+

💡 SHARED QUEUES vs. CLUSTERING

| Feature | Cluster | Shared Queues |
|---|---|---|
| Scope | Any connected QMs | Queue Sharing Group (Sysplex) |
| Failover | Message re-routing | Instant (CF-based) |
| Message ordering | Not guaranteed across instances | Preserved per queue |
| Infrastructure | Network channels | Coupling Facility |
| Cost | Lower | Higher (CF hardware) |

CNB uses shared queues for critical payment processing (zero message loss tolerance) and clustering for lower-priority workloads like batch feeds and audit messages.

Workload Balancing

In a cluster, MQ balances messages across instances of a queue using several algorithms:

  • Round-robin (default): Messages alternate between instances
  • Priority-based (CLWLPRTY): Higher-priority instances get messages first
  • Rank-based (CLWLRANK): Used for primary/backup — rank 0 gets no messages unless all higher ranks are unavailable

For CNB's payment processing, Kwame configured the cluster with priority-based balancing. The primary LPAR gets priority 5; the DR LPAR gets priority 1. Under normal operations, all payments flow through the primary. If the primary LPAR fails, payments automatically route to DR.
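Under stated assumptions about CNB's definitions, that arrangement amounts to the same queue name defined on both queue managers, differing only in priority — a sketch:

* On the primary LPAR's queue manager
DEFINE QLOCAL(CNB.PAYMENT.INBOUND)  +
       CLUSTER(CNB.CLUSTER)  +
       DEFBIND(NOTFIXED)  +
       CLWLPRTY(5)

* On the DR LPAR's queue manager
DEFINE QLOCAL(CNB.PAYMENT.INBOUND)  +
       CLUSTER(CNB.CLUSTER)  +
       DEFBIND(NOTFIXED)  +
       CLWLPRTY(1)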


19.8 Dead Letter Queues, Poison Messages, and Error Handling

This section is about what happens when things go wrong. In a production MQ environment processing 180 million messages per day, things go wrong daily.

Dead Letter Queue Processing

Every queue manager must have a DLQ. Messages end up there when:

  • The target queue is full
  • The target queue doesn't exist
  • The sender isn't authorized to put to the target queue
  • The message exceeds the target queue's maximum message length
  • The queue is PUT-inhibited

MQ adds a Dead Letter Header (MQDLH) to the front of the message. The MQDLH tells you why the message was dead-lettered and where it was originally going.

       COPY CMQDLHV.
      *    Dead Letter Header structure

       5100-PROCESS-DLQ-MESSAGE.
      *-------------------------------------------------------*
      * Process a dead letter queue message                    *
      *-------------------------------------------------------*
      * The DLH is prepended to the original message
           MOVE WS-MSG-BUFFER(1:LENGTH OF MQDLH)
                                 TO MQDLH

      * Extract diagnostic information
           MOVE MQDLH-DESTQNAME  TO WS-ORIG-QUEUE
           MOVE MQDLH-DESTQMGRNAME
                                 TO WS-ORIG-QMGR
           MOVE MQDLH-REASON     TO WS-DLQ-REASON
           MOVE MQDLH-PUTDATE    TO WS-DLQ-DATE
           MOVE MQDLH-PUTTIME    TO WS-DLQ-TIME

      * Log for investigation
           DISPLAY 'DLQ Message:'
           DISPLAY '  Original queue: ' WS-ORIG-QUEUE
           DISPLAY '  Original QMgr:  ' WS-ORIG-QMGR
           DISPLAY '  DLQ Reason:     ' WS-DLQ-REASON
           DISPLAY '  Put date/time:  ' WS-DLQ-DATE
                   '/' WS-DLQ-TIME

      * Attempt reprocessing based on reason
           EVALUATE WS-DLQ-REASON
               WHEN MQRC-Q-FULL
                   PERFORM 5110-RETRY-DELIVERY
               WHEN MQRC-NOT-AUTHORIZED
                   PERFORM 5120-SECURITY-ALERT
               WHEN OTHER
                   PERFORM 5130-ESCALATE
           END-EVALUATE.

⚠️ PRODUCTION WAR STORY In 2022, CNB had an incident where a DLQ processor had a bug — it read the dead letter header incorrectly, submitted messages for reprocessing to the wrong queue, which generated more dead letters, which got reprocessed to the wrong queue again. In 45 minutes, the DLQ went from 200 messages to 2.3 million. The fix: the DLQ processor now writes a reprocessing audit trail and has a circuit breaker — if DLQ depth increases during reprocessing, it stops and alerts. "Never trust your error handler until your error handler has been through an incident," Kwame says.

Poison Message Handling

A poison message is one that causes the consuming application to fail every time it tries to process it. Maybe the message has an invalid format. Maybe it references data that doesn't exist in DB2. Maybe the processing logic has a bug that only triggers for certain message content.

Without handling, a poison message creates a rollback loop: MQGET → process → fail → rollback → MQGET (same message) → process → fail → rollback → forever.

The defense is the backout count mechanism we covered in Section 19.5, combined with an explicit routing strategy:

       5200-CHECK-POISON-MESSAGE.
      *-------------------------------------------------------*
      * Poison message detection and routing                   *
      *-------------------------------------------------------*
           IF MQMD-BACKOUTCOUNT > 0
               IF MQMD-BACKOUTCOUNT >= WS-BACKOUT-THRESHOLD
      *            This message has failed too many times
                   PERFORM 5210-LOG-POISON-MESSAGE
                   PERFORM 5220-ROUTE-TO-EXCEPTION
                   PERFORM 5230-COMMIT-AND-CONTINUE
               ELSE
      *            Retry - but log the retry
                   ADD 1 TO WS-RETRY-TOTAL
                   DISPLAY 'Retry #'
                           MQMD-BACKOUTCOUNT
                           ' for MsgId: '
                           MQMD-MSGID
               END-IF
           END-IF.

       5210-LOG-POISON-MESSAGE.
           DISPLAY '*** POISON MESSAGE DETECTED ***'
           DISPLAY '  MsgId:     ' MQMD-MSGID
           DISPLAY '  CorrelId:  ' MQMD-CORRELID
           DISPLAY '  Backouts:  ' MQMD-BACKOUTCOUNT
           DISPLAY '  Queue:     ' WS-QUEUE-NAME
           DISPLAY '  PutDate:   ' MQMD-PUTDATE
           DISPLAY '  PutTime:   ' MQMD-PUTTIME
      *    Write to audit table for investigation
           PERFORM 5215-WRITE-AUDIT-RECORD.

A Complete Error Handling Strategy

Production MQ applications need a layered error handling strategy:

Layer 1: Operation-level. Check completion code and reason code after every MQI call. Handle expected conditions (queue full, no message available, quiescing) inline.

Layer 2: Message-level. Check backout count before processing. Route poison messages to the exception queue. Log every retry and every routing decision.

Layer 3: Program-level. If the program encounters an unrecoverable error (connection broken, security violation), clean up resources, log diagnostics, and abend with a meaningful code.

Layer 4: Infrastructure-level. Monitor queue depths, DLQ depths, channel statuses, and backout queue depths. Alert operations when thresholds are exceeded.

+----------------------------------------------+
| Layer 4: Infrastructure Monitoring            |
|   Queue depth alerts, channel monitoring,     |
|   DLQ depth, backout queue depth              |
+----------------------------------------------+
| Layer 3: Program-Level                        |
|   Unrecoverable errors → cleanup → abend     |
+----------------------------------------------+
| Layer 2: Message-Level                        |
|   Backout detection → poison routing          |
+----------------------------------------------+
| Layer 1: Operation-Level                      |
|   CC/RC checking after every MQI call         |
+----------------------------------------------+

At CNB, every MQ program must implement all four layers. It's a code review gate — no program goes to production without verified MQ error handling at every layer.


19.9 Project Checkpoint — MQ Queue Architecture for HA Banking System

🧩 PROGRESSIVE PROJECT: HA Banking Transaction Processing System

For the HA Banking System project, you'll now design the MQ messaging layer that connects the system's components. Here's what you need:

Requirements

  1. Inter-LPAR communication: The HA system spans two LPARs (active-active). Transactions initiated on either LPAR must be visible on both.
  2. External interfaces: The system connects to FedNow (real-time payments), SWIFT (international wire), and the bank's mobile platform.
  3. Audit trail: Every transaction must produce an audit message for the compliance system.
  4. Fraud screening: Every debit over $10,000 must pass through fraud screening before authorization (request/reply, 5-second SLA).
  5. Notification: Account holders receive real-time notifications via the mobile platform.

Design Tasks

Task 1: Queue Manager Topology Define the queue managers. Consider: how many? Where? What's the HA strategy — clustering, shared queues, or both?

Task 2: Queue Definitions For each message flow, define the queues needed. Consider: local, remote, transmission, dead letter, backout. Name them consistently.

Task 3: Message Patterns For each flow, choose the pattern (datagram, request/reply, pub/sub) and justify the choice.

Task 4: Transactional Boundaries For the fraud screening flow, write the pseudocode showing DB2 update, MQPUT, and SYNCPOINT coordination. What happens if fraud screening times out?

Task 5: Error Handling Design the DLQ processing strategy. What happens to messages that can't be delivered? Who monitors? What's the escalation path?

Here's a starter topology diagram for your design:

LPAR-A                              LPAR-B
+------------------+                +------------------+
| HAQM01           |  Shared Queue  | HAQM02           |
|                  |<----- CF ----->|                  |
| Core Banking A   |                | Core Banking B   |
| Fraud Engine     |                | Fraud Engine     |
| Notification Pub |                | Notification Pub |
+--------+---------+                +--------+---------+
         |                                   |
         +------------+--- MQ Cluster -------+
                      |
              +-------+-------+
              |               |
         +----+----+    +-----+----+
         | Gateway |    | Gateway  |
         | QM03    |    | QM04     |
         | FedNow  |    | SWIFT    |
         +---------+    | Mobile   |
                        +----------+

CHECKPOINT DELIVERABLE Submit your queue architecture document with: queue manager definitions, queue definitions (MQSC format), channel definitions, message flow diagrams, and error handling strategy. This design becomes the foundation for the implementation exercises in Chapters 20–22.


Production Considerations

Monitoring. At minimum, monitor: queue depth (current and trend), DLQ depth, channel status, queue manager status, and message age (oldest message on each queue). CNB uses IBM MQ Explorer for real-time monitoring and Tivoli for alerting.

Security. MQ security has three layers:

  1. Connection security: who can connect to the queue manager (RACF MQCONN class)
  2. Queue security: who can open, put, get, or inquire on each queue (RACF MQQUEUE class)
  3. Channel security: TLS encryption and authentication on channels

Every MQ program at CNB runs under a specific RACF user ID with precisely scoped permissions. No program gets blanket access to all queues. "Least privilege isn't just a good idea," Lisa Tran says. "It's how you sleep at night."
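
As a sketch of what "precisely scoped" looks like in practice, here is one way to grant a single batch user ID access to a single queue under RACF. The queue manager, queue, and user ID names are hypothetical, and exact profile formats vary by installation:

       RDEFINE MQQUEUE CNBQM01.CNB.WIRE.INBOUND UACC(NONE)
       PERMIT CNBQM01.CNB.WIRE.INBOUND CLASS(MQQUEUE) ID(WIREBAT1) ACCESS(UPDATE)
       SETROPTS RACLIST(MQQUEUE) REFRESH

UACC(NONE) denies everyone by default; UPDATE access lets only WIREBAT1 open the queue for put and get. That is least privilege in two commands.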

Performance tuning. Key parameters:

  • MAXDEPTH: Size for peak volume plus 50% headroom. If a queue consistently runs deep, your consumers are too slow.
  • MAXMSGL: Keep as small as possible. Larger messages consume more buffer pool storage and log space.
  • Buffer pool size: The MQ equivalent of DB2's buffer pools. Size for your working set of messages.
  • Log configuration: Dual logging for production (always). Archive logs for recovery.
  • Persistent vs. non-persistent: Persistent messages hit the log (I/O cost). Use non-persistent only when you can tolerate message loss — which in banking means almost never.
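
In MQSC, several of these parameters come together in the queue definition itself. A sketch — the queue names and numbers here are illustrative, not CNB's actual values:

       DEFINE QLOCAL(CNB.WIRE.INBOUND) +
              DESCR('Inbound wire transfers') +
              MAXDEPTH(500000) +
              MAXMSGL(4096) +
              DEFPSIST(YES) +
              BOTHRESH(3) +
              BOQNAME(CNB.WIRE.BACKOUT)

DEFPSIST(YES) makes messages persistent by default; BOTHRESH and BOQNAME wire the backout threshold and backout queue into the queue definition itself, so poison-message handling is declared alongside capacity.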

Message design. Define your message formats in COBOL copybooks. Version them. Include a header with format version, timestamp, source system, and message type. When the format changes, increment the version. Consumers check the version and handle old and new formats during transition periods.

       01  WS-MSG-HEADER.
           05  MSG-FORMAT-ID       PIC X(8)  VALUE 'CNBWIRE1'.
           05  MSG-FORMAT-VERSION  PIC 9(4)  VALUE 0003.
           05  MSG-TIMESTAMP       PIC X(26).
           05  MSG-SOURCE-SYSTEM   PIC X(8).
           05  MSG-TYPE            PIC X(4).
           05  MSG-BODY-LENGTH     PIC 9(8)  COMP.
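
On the consuming side, the transition-period version check against that header might look like this. The parse and backout paragraph names are illustrative, not code from this chapter:

           EVALUATE MSG-FORMAT-VERSION
               WHEN 2
                   PERFORM 3100-PARSE-V2-BODY
               WHEN 3
                   PERFORM 3200-PARSE-V3-BODY
               WHEN OTHER
                   PERFORM 9100-ROUTE-TO-BACKOUT
           END-EVALUATE

During the transition, both WHEN branches stay live. Once every producer is on version 3, the version-2 branch can be retired.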

Capacity planning. Know your numbers:

  • Messages per second (average and peak)
  • Average message size
  • Persistent vs. non-persistent ratio
  • Queue depth profiles (how long messages sit before processing)
  • Log volume (persistent messages * average size * 2 for dual logging)

CNB processes 180 million messages per day. Peak hour is 7x average. During the acquisition migration, they hit 340 million in one day. The infrastructure was sized for 500 million. "Size for the disaster you haven't had yet," Rob Calloway says.
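
To make the log-volume formula concrete, plug in CNB's daily volume with an assumed persistence ratio and message size — the 60% and 2 KB figures below are illustrative, not CNB's actual profile:

       180,000,000 msgs/day * 0.60 persistent * 2,048 bytes * 2 (dual logging)
           = approximately 442 GB of log data per day

Numbers like these drive log dataset sizing and the archive schedule, which is why the persistent ratio belongs on your capacity worksheet.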

Testing MQ programs. Testing MQ-based applications is fundamentally different from testing batch file-processing programs. You need:

  1. A test queue manager. Never test against production MQ. CNB has dedicated test queue managers (CNBTQM01 through CNBTQM04) that mirror the production topology but with smaller queue depths and no external connections.

  2. Message injection tools. You need a way to put test messages on queues. IBM provides amqsput (the sample put utility) and the MQ Explorer for manual message injection. For automated testing, CNB has a COBOL utility program that reads a test data file and puts each record as a message — essentially the same as Example 1 in this chapter's code samples.

  3. Message verification. After your program processes messages, you need to verify the results: Were the right messages put to the output queues? Were error cases routed to the backout queue? Were DLQ messages handled? CNB's test framework includes a verification program that reads output queues and compares messages against expected results.

  4. Syncpoint testing. This is the trickiest part. You need to verify that rollback works correctly — that a failed processing step actually rolls back both the DB2 changes and the MQ messages. CNB has a test harness that deliberately injects failures at each step of the processing pipeline and verifies that the unit of work is properly backed out.

🔍 TESTING GOTCHA Here's a trap that catches every new MQ developer at least once: you put a message under syncpoint, then immediately use MQ Explorer to browse the queue and wonder why the message isn't there. It's there — but it's uncommitted. Until you commit (MQCMIT in batch, SYNCPOINT in CICS), the message is invisible to other programs. This is correct behavior, but it makes testing confusing until you understand the syncpoint lifecycle.
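
When you suspect you're staring at an uncommitted message, MQSC can confirm it: DISPLAY QSTATUS reports whether uncommitted activity is pending against a queue (the queue name here is hypothetical):

       DIS QSTATUS(CNB.WIRE.INBOUND) CURDEPTH UNCOM

If UNCOM indicates pending changes, some unit of work has put or gotten messages on the queue and has not yet committed — which is usually your own test program waiting on MQCMIT.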

Operational runbook entries. Every MQ application should have runbook entries covering:

  • How to check queue depths and channel status (DIS Q(*) CURDEPTH and DIS CHS(*))
  • How to restart a stopped channel (START CHANNEL(name))
  • How to drain a queue in an emergency (CLEAR QLOCAL(name) — destructive, use with extreme caution)
  • How to inhibit puts or gets on a queue (ALTER QLOCAL(name) PUT(DISABLED))
  • How to display and resolve in-doubt units of work (DIS CONN(*) TYPE(CONN) WHERE(UOWSTATE EQ UNRESOLVED))
  • The escalation path for each queue's DLQ and backout queue messages

At CNB, the MQ runbook is 340 pages. It took two years to build. "Every page was written because something went wrong," Lisa Tran says. "The runbook is our institutional memory of incidents."


Summary

IBM MQ is the nervous system of enterprise mainframe integration. This chapter covered:

  1. Why messaging exists: Temporal and spatial decoupling solve the four fundamental problems of point-to-point integration (temporal coupling, spatial coupling, format coupling, capacity coupling).

  2. MQ architecture: Queue managers own queues. Channels connect queue managers. Messages flow through local queues, remote queue definitions, and transmission queues. Dead letter queues catch failures.

  3. The MQI API in COBOL: MQCONN, MQOPEN, MQPUT, MQGET, MQCLOSE, MQDISC. Every call returns a completion code and reason code. Check both. Always.

  4. Message patterns: Fire-and-forget (datagram) for decoupled flows, request/reply for synchronous-style interactions, pub/sub for one-to-many distribution. The pattern determines your error handling and capacity planning.

  5. Transactional messaging: MQPMO-SYNCPOINT and MQGMO-SYNCPOINT make MQ operations part of the CICS/DB2 unit of work. Two-phase commit via RRS ensures atomicity across subsystems.

  6. CICS integration: The CICS-MQ adapter handles connections. MQ triggering starts CICS transactions automatically. Know when to use MQ vs. MRO vs. TS queues.

  7. High availability: Queue manager clusters for workload balancing and routing resilience. Shared queues in the Coupling Facility for zero-downtime failover.

  8. Error handling: Four layers — operation, message, program, infrastructure. Backout counters prevent poison message loops. DLQs are ICUs, not trash cans.

The threshold concept: messaging decouples time, not just space. This is why messages must be persistent, why transactional coordination matters, why backout queues exist, and why the DLQ needs a processing strategy. When you put a message on a queue, you're making a promise to the future. MQ is how you keep that promise.


Spaced Review

From Chapter 1: z/OS Subsystems

MQ is a z/OS subsystem like DB2 and CICS — its own address spaces, its own recovery, its own security. The connection handle (HCONN) is your session with the subsystem, just as a DB2 plan is your session with DB2. Understanding the subsystem model helps you reason about MQ failures, recovery, and security.

From Chapter 13: CICS Regions — MRO vs. MQ

MRO connects CICS regions on the same LPAR with direct program calls. MQ connects anything to anything with asynchronous messages. MRO is faster (direct cross-memory calls, no message serialization or queuing). MQ is more flexible (cross-LPAR, cross-platform, temporally decoupled). Production systems use both: MRO within the CICS region topology, MQ for inter-system integration.

🔍 LOOKING AHEAD Chapter 20 builds on this foundation with advanced MQ patterns — message sequencing, message grouping, and complex request/reply choreographies. Chapter 21 introduces MQ as the bridge to REST APIs and cloud services. Everything in Part IV starts here.