In This Chapter
- 19.1 Why Messaging — The Architecture Pattern That Changed Enterprise Computing
- 19.2 MQ Fundamentals — Queue Managers, Queues, and Channels
- 19.3 COBOL and MQ — The MQI API in Practice
- 19.4 Message Patterns — Request/Reply, Fire-and-Forget, and Pub/Sub
- 19.5 Transactional Messaging — Coordinating MQ with DB2 and CICS
- 19.6 MQ in CICS — The EXEC CICS Interface vs. MQI
- 19.7 MQ Clustering and High Availability
- 19.8 Dead Letter Queues, Poison Messages, and Error Handling
- 19.9 Project Checkpoint — MQ Queue Architecture for HA Banking System
- Production Considerations
- Summary
- Spaced Review
Chapter 19: IBM MQ for COBOL — Queue Management, Message Patterns, and Transactional Messaging
🚪 GATEWAY CHAPTER — Part IV: Messaging and Integration. This chapter opens the messaging section. Everything that follows — event-driven architectures, API gateways, cloud integration — builds on the MQ fundamentals you learn here.
19.1 Why Messaging — The Architecture Pattern That Changed Enterprise Computing
Kwame Mensah remembers the week Continental National Bank almost drowned in its own success.
It was Q4 2019. CNB had just acquired a regional bank with 1.2 million consumer accounts. The migration plan called for a 72-hour cutover weekend — take both cores offline, merge the databases, bring up the unified system. Standard playbook. Kwame had done three acquisitions before.
Except this time, the acquiring bank's real-time fraud detection system couldn't tolerate 72 hours of downtime. Neither could the new mobile banking platform that marketing had launched six months early "to capture millennial market share." And neither could the Federal Reserve's FedNow interface, which had gone live three months prior and expected CNB to settle payments within seconds, not hours.
"We had 47 systems connected to the core banking platform," Kwame says. "And every single one of them was point-to-point. System A called System B through a CICS transaction. System C read System D's VSAM files through a shared DASD path. System E sent System F a flat file through Connect:Direct at 2 AM. When we mapped it out on a whiteboard, it looked like someone had thrown spaghetti at the wall."
The acquisition forced a reckoning. CNB couldn't keep adding point-to-point connections. Every new system multiplied the complexity: with 47 systems, the theoretical maximum was 1,081 unique point-to-point paths (47 × 46 / 2). They were running about 340.
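The arithmetic behind that explosion is worth seeing once. With n systems, the number of distinct point-to-point paths is n(n-1)/2, a quadratic blowup. A quick sketch:

```python
def max_paths(n: int) -> int:
    """Distinct point-to-point connections possible among n systems."""
    return n * (n - 1) // 2

# CNB's 47 systems before the MQ migration
print(max_paths(47))
```

Each system you add creates up to n-1 new paths, which is why the 48th system would have been harder to integrate than the 10th.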
Kwame's team rebuilt CNB's integration architecture around IBM MQ. Three years later, those 47 systems still talk to each other — but through a messaging backbone instead of spaghetti. Adding a new system means defining queues and message formats, not writing custom integration code for every existing system it needs to talk to.
This chapter is about how that works.
The Problem with Point-to-Point
Every enterprise architect has seen the spaghetti diagram. But let's be precise about what's actually wrong with point-to-point integration, because understanding the specific failure modes tells you exactly why messaging exists.
Temporal coupling. When System A calls System B synchronously, both systems must be running at the same time. If B is down for maintenance, A fails. In a 47-system environment, you can't take anything down without checking every upstream dependency first. Rob Calloway at CNB maintained a spreadsheet — literally a spreadsheet — tracking which batch windows were safe for maintenance on which systems. It had 2,300 rows.
Spatial coupling. System A has to know where System B lives — its hostname, port, transaction ID, or dataset name. Move System B to a different LPAR, change a CICS region name, or migrate to a new subsystem, and every system that calls B needs to be updated. At CNB, changing a single CICS region name required updating 23 different programs.
Format coupling. When A sends data to B directly, they have to agree on a format. Add a field to B's input, and every system that talks to B must change. At 47 systems, a single format change cascades across the entire environment.
Capacity coupling. If A can produce work faster than B can consume it, the only options are: A slows down (wasting capacity), B scales up (expensive), or something breaks. In peak season, CNB's wire transfer system produced transactions three times faster than the compliance screening system could process them. The "solution" was an artificial delay loop in the wire transfer program. An actual PERFORM VARYING WS-WAIT FROM 1 BY 1 UNTIL WS-WAIT > 50000 doing nothing, burning CPU cycles, just to slow things down.
Messaging solves all four problems. And the key insight — the threshold concept for this chapter — is understanding that messaging doesn't just decouple systems across space. It decouples them across time.
🔍 THRESHOLD CONCEPT: Messaging Decouples Time, Not Just Space
Most people first understand MQ as "sending messages between systems" — spatial decoupling. System A puts a message on a queue; System B picks it up. They don't need to know about each other.
But the deeper insight is temporal decoupling. System A puts a message on a queue at 2:00 PM. System B might pick it up at 2:00:00.003 PM. Or at 2:15 PM when it finishes its current batch. Or at 6:00 AM tomorrow when the overnight cycle starts. The message persists on the queue until someone retrieves it. The sender doesn't wait. The receiver doesn't hurry.
Once you internalize this, everything about MQ design falls into place: why messages must be persistent, why transactional coordination matters, why dead letter queues exist, why backout thresholds are critical. Time decoupling means the queue is a contract between the present and the future. And contracts require guarantees.
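The time-decoupling idea can be sketched in a few lines (a toy in-memory queue in Python; real MQ persists messages to its recovery log, which is what makes the contract survive restarts):

```python
from collections import deque

queue = deque()          # stands in for a persistent MQ queue

def put(msg):
    """Sender appends and returns immediately; it never waits on a receiver."""
    queue.append(msg)

def get():
    """Receiver drains whenever it runs: milliseconds or hours later."""
    return queue.popleft() if queue else None

put("wire-transfer-001")        # 2:00 PM: sender puts and moves on
put("wire-transfer-002")
# ... any amount of time may pass; the messages simply wait on the queue ...
assert get() == "wire-transfer-001"   # receiver picks up later, in order
```

The sender and receiver never interact directly; the queue is the only shared contract between them.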
What IBM MQ Actually Is
IBM MQ (formerly WebSphere MQ, formerly MQSeries) is a message-oriented middleware (MOM) product. It provides:
- Guaranteed delivery: once MQ accepts your message, it will deliver it or tell you why it couldn't. Messages survive queue manager restarts, system crashes, and network outages.
- Transactional integrity: MQ participates in two-phase commit with DB2 and CICS. You can put a message and update a database row as a single atomic unit of work.
- Location transparency: senders put messages on queues by name. MQ figures out where the queue lives and how to get the message there.
- Protocol independence: MQ handles the network protocol. Your COBOL program doesn't care whether the target system is on the same LPAR, across the Sysplex, or on a Linux server in AWS.
On z/OS, MQ runs as its own subsystem — a started task with its own address spaces, log datasets, and configuration. It's at the same architectural level as DB2 or CICS. This matters because it means MQ has its own recovery mechanisms, its own security model, and its own operations team. At CNB, the MQ team is five people. They manage 12 queue managers across four LPARs processing 180 million messages per day.
MQ's Position in the z/OS Ecosystem
Understanding where MQ sits relative to other z/OS subsystems is critical for architecture decisions. Consider the subsystem interaction model:
+-------------------+
| Your COBOL |
| Application |
+---+-----+-----+--+
| | |
v v v
+----+ +----+ +----+
|DB2 | |CICS| | MQ |
+----+ +----+ +----+
\ | /
\ | /
v v v
+--------------+
| z/OS RRS |
| (Transaction |
| Coordinator) |
+--------------+
Your COBOL program can interact with all three subsystems within a single unit of work. RRS (Resource Recovery Services) sits underneath, coordinating the two-phase commit protocol across all participating resource managers. This is not theoretical — it's how CNB's wire transfer system works: one DB2 update, two MQ puts, all committed atomically.
The key implication: MQ is not a bolt-on utility. It's a first-class z/OS citizen with the same recovery guarantees as DB2. When Kwame says "MQ will deliver your message or tell you why it couldn't," that guarantee is backed by the same infrastructure that guarantees your DB2 commits survive a system crash.
💡 MQ VERSION AWARENESS As of this writing, IBM MQ 9.3 and 9.4 are the active versions on z/OS. Key capabilities added in recent versions include: streaming queues (9.3.4), enhanced pub/sub performance, native REST API access to queues, and improved container support for hybrid deployments. Your shop may be on an older version — always check DIS QMGR VERSION to know what you're working with. The MQI API itself has been remarkably stable; COBOL programs written for MQ V5 still compile and run on V9 with minimal changes.
19.2 MQ Fundamentals — Queue Managers, Queues, and Channels
Queue Managers
A queue manager is the core MQ server process. It owns queues, manages connections, handles message persistence, coordinates transactions, and controls security. Every MQ object belongs to exactly one queue manager.
On z/OS, a queue manager runs as a set of started tasks:
| Component | STC Name (typical) | Purpose |
|---|---|---|
| Queue manager | CSQ1MSTR | Main address space, queue management |
| Channel initiator | CSQ1CHIN | Network communication |
| Utility | CSQ1UTIL | Admin commands |
CNB runs four queue managers — one per LPAR:
- CNBPQM01 — Production LPAR A (primary banking)
- CNBPQM02 — Production LPAR B (payments and wire)
- CNBPQM03 — Production LPAR C (batch processing)
- CNBPQM04 — Production LPAR D (interfaces and gateways)
💡 NAMING CONVENTION CNB uses a standard: {company}{env}{type}{nn}. So CNBPQM01 = Continental National Bank, Production, Queue Manager, 01. Your shop will have its own convention, but have one. When you're troubleshooting at 3 AM with 12 queue managers in play, naming matters.
Queue Types
MQ defines several queue types. You need to understand all of them, because you'll use all of them.
Local queues are the workhorses. A local queue lives on the queue manager where it's defined. Programs connected to that queue manager can put messages to it and get messages from it. This is where messages physically reside.
DEFINE QLOCAL(CNB.WIRE.REQUEST) +
DESCR('Inbound wire transfer requests') +
PUT(ENABLED) GET(ENABLED) +
DEFPSIST(YES) +
MAXDEPTH(500000) +
MAXMSGL(100000) +
BOTHRESH(3) +
BOQNAME(CNB.WIRE.REQUEST.BACKOUT) +
TRIGTYPE(EVERY) +
TRIGGER +
PROCESS(CNB.WIRE.PROC) +
INITQ(SYSTEM.CICS.INITIATION.QUEUE)
Let's unpack the important attributes:
- DEFPSIST(YES): Messages are persistent by default — they survive queue manager restarts. For financial transactions, always YES.
- MAXDEPTH(500000): Maximum messages before the queue is full. Size this for your peak scenario plus headroom.
- BOTHRESH(3): Backout threshold. If a message is rolled back 3 times, move it to the backout queue. This prevents poison messages from looping forever.
- BOQNAME(...): Where poison messages go.
- TRIGTYPE(EVERY): Trigger a process for every message arrival. This is how MQ starts CICS transactions automatically.
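The BOTHRESH/BOQNAME mechanics can be sketched as a routing decision (a toy model in Python; in real MQ the backout count lives in the message's MQMD and the application or adapter does the requeue):

```python
def route_after_backout(backout_count: int, bothresh: int = 3) -> str:
    """Mirror BOTHRESH(3)/BOQNAME: sideline a message once it has been
    rolled back bothresh times, otherwise leave it eligible for redelivery."""
    if backout_count >= bothresh:
        return "CNB.WIRE.REQUEST.BACKOUT"   # poison message: sidelined
    return "CNB.WIRE.REQUEST"               # retry on the main queue

assert route_after_backout(0) == "CNB.WIRE.REQUEST"
assert route_after_backout(3) == "CNB.WIRE.REQUEST.BACKOUT"
```

Without this threshold, a message that abends its consumer would be rolled back, redelivered, and abend the consumer again, forever.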
Remote queue definitions are pointers, not storage. A remote queue definition on Queue Manager A says "when someone puts a message to this queue name, actually route it to a queue on Queue Manager B." The sending program doesn't know or care that the queue is remote.
DEFINE QREMOTE(CNB.FRAUD.CHECK) +
DESCR('Route to fraud QM') +
RNAME(FRAUD.INBOUND.REQUEST) +
RQMNAME(CNBPQM02) +
XMITQ(CNBPQM01.TO.CNBPQM02)
Transmission queues are special local queues that hold messages in transit to another queue manager. When you put a message to a remote queue, MQ actually puts it on the transmission queue. The channel initiator picks it up and sends it across the network.
DEFINE QLOCAL(CNBPQM01.TO.CNBPQM02) +
DESCR('Xmit queue to QM02') +
USAGE(XMITQ) +
DEFPSIST(YES) +
MAXDEPTH(999999999)
Dead letter queues (DLQ) catch messages that can't be delivered. Every queue manager should have one. If MQ can't put a message on its target queue (queue full, queue doesn't exist, authorization failure), the message goes to the DLQ instead of being lost.
DEFINE QLOCAL(CNB.DEAD.LETTER.QUEUE) +
DESCR('Dead letter queue for CNBPQM01') +
PUT(ENABLED) GET(ENABLED) +
DEFPSIST(YES) +
MAXDEPTH(999999999)
⚠️ PRODUCTION REALITY "The DLQ is not a trash can," Kwame tells every new developer. "It's an ICU. Every message in the DLQ is a business transaction that failed to process. Someone needs to look at it, diagnose it, and either fix it and resubmit it or escalate it." CNB runs a monitoring job every 15 minutes that checks DLQ depth. If it's above zero, someone gets paged.
Model queues are templates. When a program creates a dynamic queue (common for reply-to queues in request/reply patterns), MQ uses a model queue as the template. The dynamic queue inherits the model's attributes.
Alias queues provide an alternative name for another queue. Useful for migration — point the alias at the old queue, then switch it to the new queue without changing any programs.
Channels
Channels are the communication links between queue managers. They come in pairs:
- Sender channel (SDR) on Queue Manager A pushes messages from a transmission queue to Queue Manager B
- Receiver channel (RCVR) on Queue Manager B accepts incoming messages
There are also:
- Server-connection channels (SVRCONN) for client applications connecting to the queue manager
- Cluster-sender and cluster-receiver channels for MQ clustering
-- On CNBPQM01:
DEFINE CHANNEL(CNBPQM01.TO.CNBPQM02) +
CHLTYPE(SDR) +
CONNAME('10.1.2.3(1414)') +
XMITQ(CNBPQM01.TO.CNBPQM02) +
SSLCIPH(TLS_RSA_WITH_AES_256_CBC_SHA256)
-- On CNBPQM02:
DEFINE CHANNEL(CNBPQM01.TO.CNBPQM02) +
CHLTYPE(RCVR) +
SSLCIPH(TLS_RSA_WITH_AES_256_CBC_SHA256)
💡 THE COMPLETE FLOW Program puts message to CNB.FRAUD.CHECK (remote queue def) → MQ resolves to transmission queue CNBPQM01.TO.CNBPQM02 → Channel initiator picks up message via sender channel → Sends to CNBPQM02 via receiver channel → Message arrives on FRAUD.INBOUND.REQUEST (local queue on QM02). The sending program's code is identical whether the target queue is local or remote.
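Conceptually, the put-time name resolution is a lookup chain. A toy model (illustrative only, not MQ internals):

```python
# Toy model of name resolution on CNBPQM01, using this chapter's definitions
remote_defs = {
    "CNB.FRAUD.CHECK": {
        "rname": "FRAUD.INBOUND.REQUEST",   # target queue on the remote QM
        "rqmname": "CNBPQM02",              # target queue manager
        "xmitq": "CNBPQM01.TO.CNBPQM02",    # where the message physically lands
    }
}

def resolve_put_target(qname: str) -> str:
    """A remote queue def routes the put to its transmission queue;
    a local queue stores the message directly."""
    d = remote_defs.get(qname)
    return d["xmitq"] if d else qname

assert resolve_put_target("CNB.FRAUD.CHECK") == "CNBPQM01.TO.CNBPQM02"
assert resolve_put_target("CNB.WIRE.REQUEST") == "CNB.WIRE.REQUEST"
```

The program only ever names CNB.FRAUD.CHECK; everything after the lookup is MQ's job.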
19.3 COBOL and MQ — The MQI API in Practice
The Message Queue Interface (MQI) is MQ's API. On z/OS, COBOL programs call MQI through standard CALL statements. The MQI is remarkably small — about a dozen verbs cover everything:
| Verb | Purpose |
|---|---|
| MQCONN / MQCONNX | Connect to queue manager |
| MQDISC | Disconnect from queue manager |
| MQOPEN | Open a queue |
| MQCLOSE | Close a queue |
| MQPUT | Put a message on an open queue |
| MQPUT1 | Open, put, close in one call (convenience) |
| MQGET | Get a message from a queue |
| MQINQ | Inquire about queue attributes |
| MQSET | Set queue attributes |
| MQSUB | Subscribe to a topic (pub/sub) |
| MQCMIT | Commit unit of work (batch only) |
| MQBACK | Back out unit of work (batch only) |
COBOL Copybooks
MQ provides COBOL copybooks for all its data structures. You need these in your WORKING-STORAGE:
WORKING-STORAGE SECTION.
*-------------------------------------------------------*
* MQ API Copybooks *
*-------------------------------------------------------*
COPY CMQV.
* MQ Constants (completion codes, reason codes, etc.)
COPY CMQODV.
* Object Descriptor (MQOD) - identifies the queue
COPY CMQMDV.
* Message Descriptor (MQMD) - message metadata
COPY CMQPMOV.
* Put Message Options (MQPMO)
COPY CMQGMOV.
* Get Message Options (MQGMO)
These copybooks define structures with default values. CMQMDV, for instance, gives you an MQMD structure pre-filled with IBM's defaults. You override what you need.
Connecting to the Queue Manager
In a CICS environment, you typically don't call MQCONN — the CICS-MQ adapter handles the connection. In batch, you connect explicitly:
*-------------------------------------------------------*
* Connect to the queue manager *
*-------------------------------------------------------*
MOVE 'CNBPQM01' TO WS-QMGR-NAME
CALL 'MQCONN' USING WS-QMGR-NAME
WS-HCONN
WS-COMPCODE
WS-REASON
IF WS-COMPCODE NOT = MQCC-OK
DISPLAY 'MQCONN failed. Reason: ' WS-REASON
PERFORM 9000-ABEND-ROUTINE
END-IF
WS-HCONN is the connection handle. You pass it to every subsequent MQI call. Think of it like a DB2 thread — it represents your session with the queue manager.
Opening a Queue
Before you can put or get messages, you open the queue:
*-------------------------------------------------------*
* Open queue for output (putting messages) *
*-------------------------------------------------------*
MOVE MQOT-Q TO MQOD-OBJECTTYPE
MOVE 'CNB.WIRE.REQUEST'
TO MQOD-OBJECTNAME
MOVE SPACES TO MQOD-OBJECTQMGRNAME
COMPUTE WS-OPTIONS = MQOO-OUTPUT
+ MQOO-FAIL-IF-QUIESCING
CALL 'MQOPEN' USING WS-HCONN
MQOD
WS-OPTIONS
WS-HOBJ
WS-COMPCODE
WS-REASON
IF WS-COMPCODE NOT = MQCC-OK
DISPLAY 'MQOPEN failed. Reason: ' WS-REASON
PERFORM 9000-ABEND-ROUTINE
END-IF
Key points:
- MQOO-OUTPUT opens for putting. Use MQOO-INPUT-SHARED or MQOO-INPUT-EXCLUSIVE for getting.
- MQOO-FAIL-IF-QUIESCING means the call fails if the queue manager is shutting down. Always include this — you don't want your program hanging during planned maintenance.
- WS-HOBJ is the object handle. You use it in MQPUT and MQGET calls.
- Leave MQOD-OBJECTQMGRNAME blank to use the connected queue manager.
Putting a Message
Here's a complete MQPUT with proper MQMD setup:
*-------------------------------------------------------*
* Build the message descriptor *
*-------------------------------------------------------*
* Reset MQMD to defaults
PERFORM INITIALIZE-MQMD
MOVE MQMT-DATAGRAM TO MQMD-MSGTYPE
MOVE MQPER-PERSISTENT TO MQMD-PERSISTENCE
MOVE MQFMT-STRING TO MQMD-FORMAT
MOVE MQEI-UNLIMITED TO MQMD-EXPIRY
MOVE 5 TO MQMD-PRIORITY
*-------------------------------------------------------*
* Build the put message options *
*-------------------------------------------------------*
MOVE MQPMO-SYNCPOINT TO MQPMO-OPTIONS
ADD MQPMO-NEW-MSG-ID TO MQPMO-OPTIONS
ADD MQPMO-NEW-CORREL-ID
TO MQPMO-OPTIONS
ADD MQPMO-FAIL-IF-QUIESCING
TO MQPMO-OPTIONS
*-------------------------------------------------------*
* Build the message body *
*-------------------------------------------------------*
MOVE WS-WIRE-TRANSFER-REC
TO WS-MSG-BUFFER
MOVE LENGTH OF WS-WIRE-TRANSFER-REC
TO WS-MSG-LENGTH
*-------------------------------------------------------*
* Put the message *
*-------------------------------------------------------*
CALL 'MQPUT' USING WS-HCONN
WS-HOBJ
MQMD
MQPMO
WS-MSG-LENGTH
WS-MSG-BUFFER
WS-COMPCODE
WS-REASON
EVALUATE WS-COMPCODE
WHEN MQCC-OK
ADD 1 TO WS-PUT-COUNT
WHEN MQCC-WARNING
DISPLAY 'MQPUT warning. Reason: '
WS-REASON
ADD 1 TO WS-WARN-COUNT
WHEN MQCC-FAILED
DISPLAY 'MQPUT failed. Reason: '
WS-REASON
PERFORM 8000-HANDLE-MQ-ERROR
END-EVALUATE
Let me unpack the MQMD fields because they matter:
- MQMD-MSGTYPE: MQMT-DATAGRAM for fire-and-forget, MQMT-REQUEST for request/reply, MQMT-REPLY for the reply.
- MQMD-PERSISTENCE: MQPER-PERSISTENT means the message is written to the MQ log. It survives queue manager restarts. For financial transactions, always persistent. The cost is I/O — persistent messages are synchronously logged.
- MQMD-FORMAT: Tells the receiver how to interpret the body. MQFMT-STRING for character data. Use a custom format name for structured binary data.
- MQMD-EXPIRY: How long the message lives (in tenths of a second). MQEI-UNLIMITED means forever. For request/reply, set an expiry so stale replies don't accumulate.
- MQMD-PRIORITY: 0-9. Higher priority messages are delivered first. CNB uses priority 7 for wire transfers, 5 for standard transactions, 3 for batch feeds.
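Because MQMD-EXPIRY is expressed in tenths of a second, off-by-ten mistakes are common. A tiny conversion helper (a hypothetical utility, not part of the MQI) makes the unit explicit:

```python
def mqmd_expiry_from_seconds(seconds: float) -> int:
    """Convert a lifetime in seconds to the tenths-of-a-second
    units that MQMD-EXPIRY expects."""
    return int(seconds * 10)

assert mqmd_expiry_from_seconds(300) == 3000   # a 5-minute expiry
assert mqmd_expiry_from_seconds(30) == 300     # 30 seconds
```

Setting 300 when you meant 300 seconds gives you a 30-second expiry, which is exactly the kind of bug that only shows up under load.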
And the MQPMO options:
- MQPMO-SYNCPOINT: The put is part of a unit of work. It's not committed until you call MQCMIT (batch) or the CICS task completes (CICS). This is critical for transactional integrity.
- MQPMO-NEW-MSG-ID: MQ generates a unique message ID. You'll need this for correlation.
- MQPMO-FAIL-IF-QUIESCING: Same reason as MQOPEN — respect shutdown signals.
Getting a Message
Getting is slightly more complex because you have options about which message to get:
*-------------------------------------------------------*
* Set up get message options *
*-------------------------------------------------------*
MOVE MQGMO-SYNCPOINT TO MQGMO-OPTIONS
ADD MQGMO-WAIT TO MQGMO-OPTIONS
ADD MQGMO-FAIL-IF-QUIESCING
TO MQGMO-OPTIONS
MOVE 30000 TO MQGMO-WAITINTERVAL
* Wait up to 30 seconds for a message
* Clear MQMD fields to get next available message
MOVE MQMI-NONE TO MQMD-MSGID
MOVE MQCI-NONE TO MQMD-CORRELID
*-------------------------------------------------------*
* Get the message *
*-------------------------------------------------------*
MOVE LENGTH OF WS-MSG-BUFFER
TO WS-BUFFER-LENGTH
CALL 'MQGET' USING WS-HCONN
WS-HOBJ
MQMD
MQGMO
WS-BUFFER-LENGTH
WS-MSG-BUFFER
WS-MSG-LENGTH
WS-COMPCODE
WS-REASON
EVALUATE TRUE
WHEN WS-COMPCODE = MQCC-OK
PERFORM 5000-PROCESS-MESSAGE
WHEN WS-REASON = MQRC-NO-MSG-AVAILABLE
SET WS-NO-MORE-MESSAGES TO TRUE
WHEN WS-REASON = MQRC-TRUNCATED-MSG-FAILED
DISPLAY 'Message too large. Length: '
WS-MSG-LENGTH
PERFORM 8100-HANDLE-TRUNCATION
WHEN OTHER
DISPLAY 'MQGET failed. CC: '
WS-COMPCODE
' Reason: ' WS-REASON
PERFORM 8000-HANDLE-MQ-ERROR
END-EVALUATE
Key points:
- MQGMO-WAIT with MQGMO-WAITINTERVAL of 30000 means "wait up to 30 seconds for a message." Without this, MQGET returns immediately if no message is available.
- Setting MQMD-MSGID and MQMD-CORRELID to MQMI-NONE/MQCI-NONE: This means "get the next available message." If you set these to specific values, MQ retrieves only a message matching those IDs. This is how you implement request/reply — you get the reply by correlation ID.
- MQRC-NO-MSG-AVAILABLE (reason code 2033): Not an error. It means the wait interval expired with no message. Your program should handle this gracefully.
- MQRC-TRUNCATED-MSG-FAILED (reason code 2080): The message was larger than your buffer. WS-MSG-LENGTH tells you the actual size.
⚠️ THE BUFFER SIZE TRAP Here's a mistake Kwame sees constantly from developers new to MQ: they define a 4K buffer because "our messages are always 2K." Then six months later, someone adds fields to the message format, it grows to 5K, and MQGET starts failing with reason code 2080 on every message. Size your buffers for the maximum message length the queue allows (MAXMSGL), or at minimum, handle truncation explicitly and have a recovery path.
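One workable recovery path for reason code 2080: MQGET reports the message's actual length, so you can acquire a big-enough buffer and reissue the get (under syncpoint the message is still on the queue). Sketched in Python pseudologic with a hypothetical mq_get stand-in:

```python
MQRC_TRUNCATED_MSG_FAILED = 2080

def mq_get(buffer_len: int, actual_msg_len: int = 5120):
    """Stand-in for MQGET: fails with 2080 and reports the real
    length when the caller's buffer is too small."""
    if buffer_len < actual_msg_len:
        return MQRC_TRUNCATED_MSG_FAILED, actual_msg_len, None
    return 0, actual_msg_len, b"x" * actual_msg_len

def get_with_resize(initial_len: int = 4096):
    """Try once; on truncation, retry with the length MQ reported."""
    reason, needed, body = mq_get(initial_len)
    if reason == MQRC_TRUNCATED_MSG_FAILED:
        reason, needed, body = mq_get(needed)
    return reason, body

reason, body = get_with_resize()
assert reason == 0 and len(body) == 5120
```

In COBOL the equivalent is harder (WORKING-STORAGE buffers are fixed at compile time), which is why sizing to MAXMSGL up front is the usual advice on z/OS.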
Error Handling — The Reason Codes That Matter
MQ returns hundreds of reason codes. In practice, you'll encounter about 20 regularly. Here are the ones your error handling must cover:
| Reason Code | Value | Meaning | Response |
|---|---|---|---|
| MQRC-NONE | 0 | Success | Continue |
| MQRC-NOT-AUTHORIZED | 2035 | Security failure | Log, alert, abend |
| MQRC-Q-FULL | 2053 | Queue at MAXDEPTH | Retry with backoff, alert |
| MQRC-NO-MSG-AVAILABLE | 2033 | No message (with wait) | Normal — exit loop |
| MQRC-CONNECTION-BROKEN | 2009 | Lost QM connection | Reconnect or abend |
| MQRC-Q-MGR-QUIESCING | 2161 | QM shutting down | Clean exit |
| MQRC-TRUNCATED-MSG-FAILED | 2080 | Buffer too small | Handle or abend |
| MQRC-BACKED-OUT | 2003 | UOW rolled back | Retry or escalate |
| MQRC-PUT-INHIBITED | 2051 | Queue blocked for PUT | Retry later, alert |
| MQRC-GET-INHIBITED | 2016 | Queue blocked for GET | Retry later, alert |
8000-HANDLE-MQ-ERROR.
*-------------------------------------------------------*
* Central MQ error handler *
*-------------------------------------------------------*
EVALUATE WS-REASON
WHEN MQRC-NOT-AUTHORIZED
MOVE 'SECURITY VIOLATION'
TO WS-ERROR-TYPE
PERFORM 8500-LOG-AND-ABEND
WHEN MQRC-Q-FULL
ADD 1 TO WS-RETRY-COUNT
IF WS-RETRY-COUNT > WS-MAX-RETRIES
MOVE 'QUEUE FULL - RETRIES EXHAUSTED'
TO WS-ERROR-TYPE
PERFORM 8500-LOG-AND-ABEND
ELSE
* Pause before retrying (LE delay service)
CALL 'CEE3DLY' USING WS-WAIT-SECS
WS-FC
GO TO 4000-PUT-MESSAGE
END-IF
WHEN MQRC-CONNECTION-BROKEN
MOVE 'CONNECTION LOST'
TO WS-ERROR-TYPE
PERFORM 8500-LOG-AND-ABEND
WHEN MQRC-Q-MGR-QUIESCING
DISPLAY 'Queue manager quiescing. '
'Clean shutdown.'
PERFORM 9500-CLEAN-SHUTDOWN
WHEN OTHER
MOVE 'UNEXPECTED MQ ERROR'
TO WS-ERROR-TYPE
DISPLAY 'MQ Error - CC: ' WS-COMPCODE
' RC: ' WS-REASON
PERFORM 8500-LOG-AND-ABEND
END-EVALUATE.
🔄 SPACED REVIEW — Chapter 1 Recall from Chapter 1 how z/OS subsystems (DB2, CICS, MQ) each have their own address spaces and recovery mechanisms. MQ's connection handle (HCONN) is analogous to a DB2 thread — it's your program's session with the subsystem. A broken connection handle is as serious as a lost DB2 thread.
19.4 Message Patterns — Request/Reply, Fire-and-Forget, and Pub/Sub
Patterns are not academic exercises. The pattern you choose determines your error handling, your timeout strategy, your monitoring requirements, and your capacity planning. Choose wrong and you'll be redesigning under pressure during the next peak season.
Fire-and-Forget (Datagram)
The simplest pattern. The sender puts a message and moves on. No response expected.
Sender → MQPUT → [Queue] → MQGET → Receiver
When to use it:
- Audit logging
- Event notifications
- Data feeds where the sender doesn't need confirmation
- Any case where the sender's processing shouldn't block on the receiver
COBOL implementation: Set MQMD-MSGTYPE to MQMT-DATAGRAM. The sending program doesn't need a reply-to queue. The MQPUT example in Section 19.3 is a fire-and-forget pattern.
At CNB, fire-and-forget handles 70% of all messages. Every core banking transaction puts an audit message to CNB.AUDIT.FEED. The audit system processes them independently. If the audit system falls behind, the queue just gets deeper — the core banking system doesn't slow down.
The risk: If the receiver fails permanently, messages accumulate until the queue fills up. You need monitoring on queue depth.
Request/Reply
The sender puts a request and waits for a response. This is the MQ equivalent of a synchronous call, but with a crucial difference — the sender and receiver are still temporally decoupled.
Sender → MQPUT → [Request Queue] → MQGET → Receiver
|
Sender ← MQGET ← [Reply Queue] ← MQPUT ←---+
When to use it:
- Real-time inquiries (account balance check)
- Validation (fraud screening before authorization)
- Any case where the sender needs a response before proceeding
COBOL implementation — the sender side:
*-------------------------------------------------------*
* Request/Reply - Sender *
*-------------------------------------------------------*
* Step 1: Create a temporary reply queue
MOVE 'CNB.REPLY.MODEL' TO MQOD-OBJECTNAME
* Opening a model queue creates a dynamic queue
* Pattern: 'CNB.REPLY.*' generates unique names
MOVE 'CNB.REPLY.*' TO MQOD-DYNAMICQNAME
COMPUTE WS-OPTIONS = MQOO-INPUT-EXCLUSIVE
CALL 'MQOPEN' USING WS-HCONN
MQOD
WS-OPTIONS
WS-HOBJ-REPLY
WS-COMPCODE
WS-REASON
* Save the generated reply queue name
MOVE MQOD-OBJECTNAME TO WS-REPLY-QNAME
* Step 2: Put the request message
MOVE MQMT-REQUEST TO MQMD-MSGTYPE
MOVE WS-REPLY-QNAME TO MQMD-REPLYTOQ
MOVE SPACES TO MQMD-REPLYTOQMGR
MOVE 3000 TO MQMD-EXPIRY
* Expires in 300 seconds (3000 tenths of a second)
PERFORM 4000-PUT-MESSAGE
* Save the message ID for correlation
MOVE MQMD-MSGID TO WS-SAVED-MSGID
* Step 3: Wait for the reply, matching on
* the correlation ID of our request
MOVE WS-SAVED-MSGID TO MQMD-CORRELID
MOVE MQMI-NONE TO MQMD-MSGID
MOVE MQGMO-WAIT TO MQGMO-OPTIONS
ADD MQGMO-SYNCPOINT TO MQGMO-OPTIONS
ADD MQGMO-FAIL-IF-QUIESCING
TO MQGMO-OPTIONS
MOVE 30000 TO MQGMO-WAITINTERVAL
CALL 'MQGET' USING WS-HCONN
WS-HOBJ-REPLY
MQMD
MQGMO
WS-BUFFER-LENGTH
WS-MSG-BUFFER
WS-MSG-LENGTH
WS-COMPCODE
WS-REASON
IF WS-REASON = MQRC-NO-MSG-AVAILABLE
MOVE 'REPLY TIMEOUT' TO WS-ERROR-TYPE
PERFORM 8200-HANDLE-TIMEOUT
END-IF
The critical detail: the correlation ID. The sender saves the message ID of its request. The receiver copies that message ID into the correlation ID of its reply. The sender then gets the reply by matching on correlation ID. Without this, in a high-volume system with hundreds of concurrent request/reply pairs, you'd get someone else's reply.
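The correlation discipline (the reply's CORRELID equals the request's MSGID) can be sketched as a matching rule. A toy model in Python, with hypothetical names:

```python
import uuid

pending = {}   # request message ID -> business context awaiting a reply

def send_request(context: dict) -> str:
    """MQPMO-NEW-MSG-ID analogue: generate a unique message ID per request."""
    msgid = uuid.uuid4().hex
    pending[msgid] = context
    return msgid

def on_reply(correlid: str, payload: str):
    """The service copied the request's MSGID into the reply's CORRELID,
    so the correlid picks out exactly one pending request."""
    return pending.pop(correlid, None), payload

req_id = send_request({"txn": "WIRE-123"})
ctx, payload = on_reply(req_id, "APPROVED")
assert ctx == {"txn": "WIRE-123"} and payload == "APPROVED"
```

In MQ the matching happens inside MQGET itself: setting MQMD-CORRELID before the get tells the queue manager to return only a message whose correlation ID matches.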
COBOL implementation — the receiver side:
*-------------------------------------------------------*
* Request/Reply - Receiver (service program) *
*-------------------------------------------------------*
* Get the request
PERFORM 5000-GET-MESSAGE
* Save request's MQMD fields for the reply
MOVE MQMD-MSGID TO WS-REQUEST-MSGID
MOVE MQMD-REPLYTOQ TO WS-REPLY-QNAME
MOVE MQMD-REPLYTOQMGR TO WS-REPLY-QMGR
* Process the request
PERFORM 6000-PROCESS-REQUEST
* Build the reply
PERFORM INITIALIZE-MQMD
MOVE MQMT-REPLY TO MQMD-MSGTYPE
MOVE WS-REQUEST-MSGID TO MQMD-CORRELID
MOVE MQPER-PERSISTENT TO MQMD-PERSISTENCE
* Open the reply queue
MOVE WS-REPLY-QNAME TO MQOD-OBJECTNAME
MOVE WS-REPLY-QMGR TO MQOD-OBJECTQMGRNAME
COMPUTE WS-OPTIONS = MQOO-OUTPUT
+ MQOO-FAIL-IF-QUIESCING
CALL 'MQOPEN' USING WS-HCONN
MQOD
WS-OPTIONS
WS-HOBJ-REPLY
WS-COMPCODE
WS-REASON
* Put the reply
PERFORM 4100-PUT-REPLY
🧩 PATTERN INTERACTION At CNB, the wire transfer system uses request/reply for fraud screening. The wire transfer program puts a request to CNB.FRAUD.CHECK, which routes (via remote queue definition) to CNBPQM02 where the fraud engine runs. The fraud engine processes the request and puts the reply back to the dynamic reply queue on CNBPQM01. The wire transfer program waits up to 30 seconds. If no reply comes, it routes the transaction for manual review. SLA: 95% of fraud checks return within 2 seconds.
Publish/Subscribe
In pub/sub, publishers send messages to topics, and MQ distributes copies to all subscribers. Publishers don't know about subscribers. Subscribers don't know about publishers. MQ handles the fan-out.
Publisher → MQPUT → [Topic: CNB/RATES/FX] → MQ distributes
↓ ↓ ↓
[Sub 1] [Sub 2] [Sub 3]
Trading Pricing Reporting
When to use it:
- Reference data distribution (FX rates, interest rates, product catalog updates)
- Event broadcasting (system status changes, market data)
- Any case where multiple consumers need the same information
COBOL publisher:
*-------------------------------------------------------*
* Publish FX rate update *
*-------------------------------------------------------*
MOVE MQOT-TOPIC TO MQOD-OBJECTTYPE
MOVE 'CNB/RATES/FX' TO MQOD-OBJECTSTRING
MOVE 12 TO MQOD-OBJECTSTRINGLENGTH
MOVE SPACES TO MQOD-OBJECTNAME
COMPUTE WS-OPTIONS = MQOO-OUTPUT
+ MQOO-FAIL-IF-QUIESCING
CALL 'MQOPEN' USING WS-HCONN
MQOD
WS-OPTIONS
WS-HOBJ-TOPIC
WS-COMPCODE
WS-REASON
IF WS-COMPCODE = MQCC-OK
PERFORM 4200-PUT-FX-RATE
END-IF
COBOL subscriber (durable subscription):
*-------------------------------------------------------*
* Subscribe to FX rate updates *
*-------------------------------------------------------*
MOVE 'FX.RATE.SUB.TRADING'
TO MQSD-SUBNAME
MOVE 'CNB/RATES/FX' TO MQSD-OBJECTSTRING
MOVE 12 TO MQSD-OBJECTSTRINGLENGTH
COMPUTE MQSD-OPTIONS = MQSO-CREATE
+ MQSO-RESUME
+ MQSO-DURABLE
+ MQSO-FAIL-IF-QUIESCING
CALL 'MQSUB' USING WS-HCONN
MQSD
WS-HOBJ-SUB
WS-HSUB
WS-COMPCODE
WS-REASON
Durable subscriptions (MQSO-DURABLE) persist across queue manager restarts. If the subscriber disconnects, MQ keeps collecting messages for it. When the subscriber reconnects and resumes (MQSO-RESUME), it gets all the messages it missed. For financial data, always durable.
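The durable-subscription semantics can be sketched as a toy fan-out (in-memory Python model; real MQ backs each subscription with persistent storage):

```python
class Topic:
    """Toy pub/sub fan-out: every subscriber gets its own copy, and a
    durable subscriber keeps accumulating messages while disconnected."""
    def __init__(self):
        self.subs = {}   # subscription name -> [connected, buffered msgs]

    def subscribe(self, subname):          # MQSO-CREATE + MQSO-RESUME analogue
        self.subs.setdefault(subname, [True, []])
        self.subs[subname][0] = True

    def disconnect(self, subname):         # durable: the entry survives
        self.subs[subname][0] = False

    def publish(self, msg):                # one copy per subscription
        for state in self.subs.values():
            state[1].append(msg)

    def drain(self, subname):              # subscriber reads its backlog
        msgs, self.subs[subname][1] = self.subs[subname][1], []
        return msgs

t = Topic()
t.subscribe("FX.RATE.SUB.TRADING")
t.disconnect("FX.RATE.SUB.TRADING")    # trading system goes down
t.publish("EUR/USD 1.0842")            # MQ keeps collecting for it
t.subscribe("FX.RATE.SUB.TRADING")     # resume the durable subscription
assert t.drain("FX.RATE.SUB.TRADING") == ["EUR/USD 1.0842"]
```

A non-durable subscription would delete its entry on disconnect, and the rate published while the subscriber was down would simply never be seen.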
💡 CHOOSING THE RIGHT PATTERN
| Requirement | Pattern | Why |
|---|---|---|
| "Send and forget" | Datagram | No response needed |
| "Need an answer" | Request/Reply | Synchronous-style with decoupling |
| "Everyone needs to know" | Pub/Sub | One-to-many distribution |
| "Exactly one consumer" | Datagram with competing consumers | Load balancing across instances |
| "Must process in order" | Datagram with exclusive get | Serialization guarantee |
Competing Consumers — The Pattern You'll Use Most
There's a pattern that doesn't get its own category in textbooks but dominates production: competing consumers. Multiple instances of the same program read from a single queue. MQ ensures each message is delivered to exactly one consumer.
At CNB, the transaction processing queue (CNB.TXN.INBOUND) has six CICS regions reading from it simultaneously. Each MQGET with MQOO-INPUT-SHARED retrieves the next available message. MQ handles the locking — no two consumers get the same message.
+--→ [Consumer 1 - CICS Region A1]
|
[Queue] →→→→→→→→+--→ [Consumer 2 - CICS Region A2]
|
+--→ [Consumer 3 - CICS Region B1]
This gives you horizontal scaling. If your consumers can't keep up, add another one. If a consumer crashes, the others continue processing. Messages that were in-flight on the crashed consumer (under syncpoint) are rolled back and picked up by another consumer.
The constraint: message ordering is not guaranteed across competing consumers. If messages A, B, and C arrive in that order, Consumer 1 might process A and C while Consumer 2 processes B. If your business logic requires strict ordering, you can't use competing consumers — you need a single consumer with MQOO-INPUT-EXCLUSIVE, which serializes all processing through one reader.
CNB's solution: they use competing consumers for transaction processing (order doesn't matter — each transaction is independent) but a single exclusive consumer for the general ledger feed (GL entries must post in sequence).
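The mechanics of competing consumers are easy to demonstrate outside MQ. This Python sketch (illustrative only, not an MQ client) uses a thread-safe queue to show the two properties described above: each message is delivered to exactly one consumer, and no ordering is guaranteed across consumers.

```python
import queue
import threading

work = queue.Queue()
for i in range(100):
    work.put(f"TXN-{i:04d}")

results = {}            # message -> which consumer processed it
lock = threading.Lock()

def consumer(name):
    # Each get delivers the next available message to exactly one
    # consumer -- the queue does the locking, much as MQ does for
    # MQGET against a queue opened with MQOO-INPUT-SHARED.
    while True:
        try:
            msg = work.get_nowait()
        except queue.Empty:
            return
        with lock:
            results[msg] = name

threads = [threading.Thread(target=consumer, args=(f"C{n}",))
           for n in range(3)]
for t in threads: t.start()
for t in threads: t.join()

# Every message processed exactly once, by some consumer.
assert len(results) == 100
```

Which consumer gets which message varies from run to run; only the exactly-once delivery is guaranteed, which is precisely why this pattern cannot be used when strict ordering matters.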
⚠️ THE COMPETING CONSUMERS TRAP Diane Okoye at Pinnacle Health learned this the hard way. Their claims adjudication queue had three competing consumers. Claims are supposed to be independent — but a batch of related claims (a patient with multiple procedures on the same visit) had inter-claim dependencies. Consumer 1 processed the primary claim and approved it. Consumer 3 processed a dependent claim, checked for the primary's approval, didn't find it (Consumer 1 hadn't committed yet), and rejected it. The fix: batch-related claims into a single message group (covered in Chapter 20) so one consumer processes the entire group.
Pattern Anti-Patterns
A few patterns that look reasonable but cause production problems:
Request/reply for high-volume fire-and-forget. If you don't need the response, don't ask for one. Every request/reply creates a dynamic reply queue, waits for a response, and doubles the message count. At CNB's peak of 22 million messages per hour, converting the audit feed from datagram to request/reply would have added 22 million reply messages, 22 million dynamic queues, and a 50% increase in MQ log volume. "Just because you can doesn't mean you should," Kwame says.
Pub/sub for point-to-point. If only one system needs the message, use a datagram to a specific queue. Pub/sub adds the overhead of topic tree management and subscription matching. Use it when you genuinely have one-to-many distribution.
Synchronous request/reply as a replacement for all CICS LINK calls. MQ request/reply adds latency (serialization + network + deserialization + queue processing) compared to an intra-region CICS LINK. If two programs are in the same CICS region and always will be, LINK is faster. Use MQ when you need the decoupling benefits — cross-system, cross-LPAR, or temporal independence.
19.5 Transactional Messaging — Coordinating MQ with DB2 and CICS
This is where MQ earns its keep in enterprise systems. The ability to coordinate a message put or get with a database update as a single atomic transaction is the foundation of reliable enterprise messaging.
The Problem
Consider this scenario at CNB: A wire transfer program debits Account A in DB2 and puts a message to the correspondent bank's queue. What happens if the DB2 update succeeds but the MQPUT fails? Account A is debited, but the correspondent bank never gets the message. Money disappears.
Or the reverse: MQPUT succeeds, but DB2 update fails. The correspondent bank receives a message for a debit that never happened.
Both outcomes are catastrophic. You need atomicity — either both succeed or both fail.
Syncpoint and Unit of Work
MQ participates in z/OS Resource Recovery Services (RRS), the platform's distributed transaction coordinator. When you specify MQPMO-SYNCPOINT on an MQPUT or MQGMO-SYNCPOINT on an MQGET, the MQ operation becomes part of the current unit of work. It's committed or rolled back along with everything else in that unit of work — DB2 updates, VSAM changes, other MQ operations.
In CICS, the unit of work is managed by CICS. An EXEC CICS SYNCPOINT commits everything — DB2 updates, MQ messages, and any other recoverable resources.
*-------------------------------------------------------*
* Wire Transfer - Transactional Pattern in CICS *
*-------------------------------------------------------*
* Step 1: Debit the source account (DB2)
EXEC SQL
UPDATE ACCOUNTS
SET BALANCE = BALANCE - :WS-AMOUNT
WHERE ACCOUNT_NUM = :WS-SOURCE-ACCT
END-EXEC
IF SQLCODE NOT = 0
EXEC CICS SYNCPOINT ROLLBACK END-EXEC
PERFORM 8000-HANDLE-DB2-ERROR
GO TO 9000-RETURN
END-IF
* Step 2: Put message to correspondent bank (MQ)
MOVE MQPMO-SYNCPOINT TO MQPMO-OPTIONS
ADD MQPMO-FAIL-IF-QUIESCING
TO MQPMO-OPTIONS
CALL 'MQPUT' USING WS-HCONN
WS-HOBJ
MQMD
MQPMO
WS-MSG-LENGTH
WS-MSG-BUFFER
WS-COMPCODE
WS-REASON
IF WS-COMPCODE NOT = MQCC-OK
EXEC CICS SYNCPOINT ROLLBACK END-EXEC
PERFORM 8000-HANDLE-MQ-ERROR
GO TO 9000-RETURN
END-IF
* Step 3: Commit both operations atomically
EXEC CICS SYNCPOINT END-EXEC
If the MQPUT fails, the SYNCPOINT ROLLBACK undoes the DB2 debit. If the SYNCPOINT itself fails (system crash), RRS coordinates recovery on restart — either both commit or both roll back. Two-phase commit guarantees it.
In batch, you use MQCMIT and MQBACK:
*-------------------------------------------------------*
* Batch commit *
*-------------------------------------------------------*
CALL 'MQCMIT' USING WS-HCONN
WS-COMPCODE
WS-REASON
IF WS-COMPCODE NOT = MQCC-OK
DISPLAY 'MQCMIT failed. Reason: '
WS-REASON
CALL 'MQBACK' USING WS-HCONN
WS-COMPCODE
WS-REASON
PERFORM 8000-HANDLE-COMMIT-FAILURE
END-IF
⚠️ CRITICAL: THE TWO-PHASE COMMIT WINDOW Between the MQPUT and the SYNCPOINT, there's a window where the message is "in doubt." If the CICS region fails in that window, RRS resolves it on restart. But here's what catches people: the message is invisible to consumers during this window. An MQGET won't see it until it's committed. This is correct behavior — it prevents consumers from processing messages that might be rolled back. But it means you can't test your MQ programs by putting a message and immediately checking the queue depth. The depth only changes after commit.
The Backout Counter
What happens when a consumer gets a message, tries to process it, and fails? With MQGMO-SYNCPOINT, the MQGET is rolled back, and the message reappears on the queue. The consumer gets it again. Tries again. Fails again. Without intervention, this loops forever.
MQ tracks a backout count in the message descriptor (MQMD-BACKOUTCOUNT). Each rollback increments it. The queue's BOTHRESH attribute defines the threshold, and its BOQNAME attribute names the backout queue. When the count reaches the threshold, your program should route the message to the backout queue instead of processing it.
*-------------------------------------------------------*
* Check backout count before processing *
*-------------------------------------------------------*
IF MQMD-BACKOUTCOUNT >= WS-BACKOUT-THRESHOLD
PERFORM 7000-ROUTE-TO-BACKOUT-QUEUE
ELSE
PERFORM 6000-PROCESS-MESSAGE
END-IF
7000-ROUTE-TO-BACKOUT-QUEUE.
*-------------------------------------------------------*
* Move poison message to backout queue *
*-------------------------------------------------------*
MOVE 'CNB.WIRE.REQUEST.BACKOUT'
TO MQOD-OBJECTNAME
COMPUTE WS-OPTIONS = MQOO-OUTPUT
+ MQOO-FAIL-IF-QUIESCING
CALL 'MQOPEN' USING WS-HCONN
MQOD
WS-OPTIONS
WS-HOBJ-BACKOUT
WS-COMPCODE
WS-REASON
IF WS-COMPCODE = MQCC-OK
MOVE MQPMO-SYNCPOINT
TO MQPMO-OPTIONS
CALL 'MQPUT' USING WS-HCONN
WS-HOBJ-BACKOUT
MQMD
MQPMO
WS-MSG-LENGTH
WS-MSG-BUFFER
WS-COMPCODE
WS-REASON
END-IF
CALL 'MQCLOSE' USING WS-HCONN
WS-HOBJ-BACKOUT
MQCO-NONE
WS-COMPCODE
WS-REASON.
🔍 WHY THIS MATTERS Lisa Tran at CNB calls the backout queue "the canary in the coal mine." A sudden spike in backout queue depth usually means a deployment broke the message format, a DB2 table is in a bad state, or a resource is unavailable. CNB's monitoring triggers a Severity 2 incident if any backout queue depth exceeds 100 messages.
19.6 MQ in CICS — The EXEC CICS Interface vs. MQI
CICS applications have two ways to interact with MQ:
- Direct MQI calls (CALL 'MQPUT', CALL 'MQGET', etc.)
- The CICS-MQ bridge, which drives existing CICS programs from MQ messages without MQ-specific application code
In practice, most CICS-MQ programs use the direct MQI with the CICS-MQ adapter handling the connection. You don't call MQCONN in CICS — the adapter provides the connection handle through the CICS task.
How the CICS-MQ Adapter Works
The adapter is configured in CICS with a CICS MQ connection definition:
DEFINE MQCONN(MQ01)
GROUP(MQGROUP)
MQNAME(CNBPQM01)
RESYNCMEMBER(YES)
When your CICS program makes its first MQI call, the adapter connects to the queue manager on your behalf. All MQI calls within the same CICS task share this connection. The connection is released when the task ends.
The critical benefit: transaction coordination. The CICS-MQ adapter participates in CICS's unit of work. When you issue EXEC CICS SYNCPOINT, MQ operations under MQPMO-SYNCPOINT and MQGMO-SYNCPOINT are committed along with DB2 and VSAM updates. This is the two-phase commit integration we discussed in Section 19.5.
MQ Triggering in CICS
One of MQ's most powerful features in CICS is automatic triggering. You can configure a queue so that when a message arrives, MQ automatically starts a CICS transaction to process it.
The setup:
DEFINE QLOCAL(CNB.WIRE.REQUEST) +
TRIGGER +
TRIGTYPE(EVERY) +
PROCESS(CNB.WIRE.PROC) +
INITQ(SYSTEM.CICS.INITIATION.QUEUE)
DEFINE PROCESS(CNB.WIRE.PROC) +
APPLICID('WTRN') +
APPLTYPE(CICS)
When a message arrives on CNB.WIRE.REQUEST, MQ puts a trigger message on the initiation queue. The CICS MQ trigger monitor (transaction CKTI) picks up the trigger message and starts transaction WTRN. WTRN reads the message from CNB.WIRE.REQUEST and processes it.
Trigger types:

- EVERY: Trigger on every message arrival. Use for low-to-medium volume queues where immediate processing is needed.
- FIRST: Trigger only when the queue transitions from empty to non-empty. Use when you want one instance of the processing program to drain the queue.
- DEPTH: Trigger when queue depth reaches a threshold. Use for batch-style processing — wait until enough messages accumulate.
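In MQSC, the alternative trigger types might be configured as follows (the queue names and depth value here are illustrative). Note that with TRIGTYPE(DEPTH), the queue manager sets the queue to NOTRIGGER after the trigger event fires, so the application or an operator must re-enable triggering:

```text
* Illustrative: fire one trigger on the empty-to-non-empty transition
ALTER QLOCAL(CNB.WIRE.REQUEST) TRIGTYPE(FIRST)

* Illustrative: fire when 500 messages have accumulated
ALTER QLOCAL(CNB.BATCH.FEED) TRIGTYPE(DEPTH) TRIGDPTH(500)
```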
The triggered CICS program follows a specific pattern. Here's the skeleton that CNB uses for all triggered MQ consumers:
0000-MAIN.
* CKTI passes the trigger message in a
* CICS Transient Data area. We don't need it —
* we know which queue to read.
MOVE MQHC-DEF-HCONN TO WS-HCONN
PERFORM 1000-OPEN-QUEUE
PERFORM 2000-PROCESS-LOOP
PERFORM 3000-CLOSE-QUEUE
EXEC CICS RETURN END-EXEC.
2000-PROCESS-LOOP.
PERFORM UNTIL WS-NO-MORE-MESSAGES
PERFORM 2100-GET-MESSAGE
IF NOT WS-NO-MORE-MESSAGES
PERFORM 2200-CHECK-BACKOUT
IF NOT WS-IS-POISON
PERFORM 2300-PROCESS-MESSAGE
END-IF
PERFORM 2400-COMMIT
END-IF
END-PERFORM.
Note the loop structure: a triggered program doesn't process just one message and exit. It drains the queue (or processes until a configurable batch limit). This is especially important with TRIGTYPE(FIRST) — one trigger starts one program instance, and that instance is responsible for all currently queued messages. If you process only one message and return, the remaining messages sit on the queue until a new message arrives and triggers the program again.
⚠️ TRIGGERED PROGRAM LIFETIME Marcus Whitfield at Federal Benefits Administration learned a painful lesson about triggered programs. Their enrollment processor was triggered with TRIGTYPE(FIRST). The program opened the queue, processed one message, committed, and returned. During peak enrollment periods with 50,000 messages queued, the program was triggered, processed one message, exited, waited for the next trigger (which only fires on empty-to-non-empty transition), and the other 49,999 messages sat there. The fix was simple but the 3-hour processing delay affected 12,000 benefit enrollments. Always drain the queue in a loop.
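The failure mode in that warning is easy to see in a toy model. This Python sketch (illustrative, with a deque standing in for the queue) contrasts a process-one-and-exit consumer with a drain loop, given a trigger that fires only on the empty-to-non-empty transition:

```python
from collections import deque

def run_trigger_first(consumer, n_messages):
    """Toy model of TRIGTYPE(FIRST): the trigger fires exactly once,
    when the queue goes from empty to non-empty. Returns
    (messages processed, messages stranded on the queue)."""
    q = deque(range(n_messages))
    processed = consumer(q)      # the single trigger event
    return processed, len(q)

def one_and_exit(q):
    # Broken pattern: handle one message, then return.
    q.popleft()
    return 1

def drain(q):
    # Correct pattern: keep getting until no message is available
    # (MQRC-NO-MSG-AVAILABLE, modeled here as an empty deque).
    n = 0
    while q:
        q.popleft()
        n += 1
    return n

assert run_trigger_first(one_and_exit, 50_000) == (1, 49_999)
assert run_trigger_first(drain, 50_000) == (50_000, 0)
```

The one-and-exit consumer strands 49,999 messages until a new arrival re-triggers it, which is exactly the backlog Marcus hit.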
🔄 SPACED REVIEW — Chapter 13 In Chapter 13, we discussed CICS Multi-Region Operation (MRO) for connecting CICS regions. MQ offers an alternative approach. With MRO, the regions must be on the same LPAR and one region directly calls a program in another. With MQ, the regions can be anywhere — same LPAR, different LPARs, different sysplexes, different continents. The tradeoff: MRO is faster (no serialization/deserialization), MQ is more flexible. CNB uses MRO within the core banking TOR/AOR/FOR structure and MQ for inter-system integration. Know which tool fits which job.
EXEC CICS WRITEQ TS vs. MQ
Don't confuse CICS Temporary Storage (TS) queues with MQ queues. They solve different problems:
| Feature | CICS TS Queue | MQ Queue |
|---|---|---|
| Scope | Single CICS region | Cross-system, cross-platform |
| Persistence | Optional (AUXILIARY) | Yes (default for production) |
| Transaction | CICS UOW | RRS-coordinated |
| Triggering | No | Yes |
| Routing | No | Yes (channels, clusters) |
| Use case | Scratch pad within a transaction | Inter-system messaging |
If your data stays within one CICS region and one transaction's lifetime, use TS queues. If it crosses any boundary — system, time, or reliability requirement — use MQ.
19.7 MQ Clustering and High Availability
For production banking systems, a single queue manager is a single point of failure. MQ provides two mechanisms for high availability on z/OS: queue manager clusters and shared queues.
Queue Manager Clusters
An MQ cluster is a group of queue managers that can automatically route messages to each other without explicit channel and remote queue definitions.
-- Nominate this queue manager as a full repository
-- (typically two full repositories per cluster)
ALTER QMGR REPOS(CNB.CLUSTER)
-- Define a cluster queue (available to all members)
DEFINE QLOCAL(CNB.PAYMENT.INBOUND) +
CLUSTER(CNB.CLUSTER) +
DEFBIND(NOTFIXED) +
CLWLPRTY(5)
With DEFBIND(NOTFIXED), MQ can route each message to a different instance of the queue across the cluster. This provides:
- Workload balancing: Messages distribute across multiple queue managers
- Availability: If one queue manager is down, messages route to others
- Simplified administration: No need for explicit remote queue definitions and channels between every pair
CNB's cluster has four members (one per LPAR). The CNB.PAYMENT.INBOUND queue exists on all four. When any system puts a message to that queue, the cluster workload algorithm selects which instance receives it — round-robin by default, adjustable through the CLWL* attributes.
Shared Queues in Parallel Sysplex
For z/OS sites running a Parallel Sysplex, MQ supports shared queues stored in the Coupling Facility (CF). This is the highest level of MQ availability.
DEFINE QLOCAL(CNB.CRITICAL.PAYMENTS) +
QSGDISP(SHARED) +
CFSTRUCT(CNB_PAYMENT_CF) +
DEFPSIST(YES)
With shared queues:

- Any queue manager in the Queue Sharing Group can access the queue
- If a queue manager fails, the other queue managers immediately access the messages
- No message loss, no failover delay
- The Coupling Facility provides the shared storage
+-------------------+
| Coupling Facility |
| CNB_PAYMENT_CF |
| [Shared Queue] |
+---+-------+---+--+
| | |
+-------+ +----+ +-------+
| | |
+----+----+ +---+-----+ +-------+--+
|CNBPQM01 | |CNBPQM02 | |CNBPQM03 |
|LPAR A | |LPAR B | |LPAR C |
+---------+ +---------+ +----------+
💡 SHARED QUEUES vs. CLUSTERING
| Feature | Cluster | Shared Queues |
|---|---|---|
| Scope | Any connected QMs | Queue Sharing Group (Sysplex) |
| Failover | Message re-routing | Instant (CF-based) |
| Message ordering | Not guaranteed across instances | Preserved per queue |
| Infrastructure | Network channels | Coupling Facility |
| Cost | Lower | Higher (CF hardware) |

CNB uses shared queues for critical payment processing (zero message loss tolerance) and clustering for lower-priority workloads like batch feeds and audit messages.
Workload Balancing
In a cluster, MQ balances messages across instances of a queue using several algorithms:
- Round-robin (default): Messages alternate between instances
- Priority-based (CLWLPRTY): Higher-priority instances get messages first
- Rank-based (CLWLRANK): Used for primary/backup — rank 0 gets no messages unless all higher ranks are unavailable
For CNB's payment processing, Kwame configured the cluster with priority-based balancing. The primary LPAR gets priority 5; the DR LPAR gets priority 1. Under normal operations, all payments flow through the primary. If the primary LPAR fails, payments automatically route to DR.
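Under stated assumptions (the queue and cluster names are CNB's from the text; everything else is illustrative), that primary/DR setup could be expressed in MQSC as:

```text
* On the primary LPAR's queue manager
DEFINE QLOCAL(CNB.PAYMENT.INBOUND) +
       CLUSTER(CNB.CLUSTER) +
       DEFBIND(NOTFIXED) +
       CLWLPRTY(5)

* On the DR LPAR's queue manager
DEFINE QLOCAL(CNB.PAYMENT.INBOUND) +
       CLUSTER(CNB.CLUSTER) +
       DEFBIND(NOTFIXED) +
       CLWLPRTY(1)
```

While both instances are reachable, the workload algorithm sends everything to the priority-5 instance; if that queue manager becomes unavailable, messages flow to the priority-1 DR instance.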
19.8 Dead Letter Queues, Poison Messages, and Error Handling
This section is about what happens when things go wrong. In a production MQ environment processing 180 million messages per day, things go wrong daily.
Dead Letter Queue Processing
Every queue manager must have a DLQ. Messages end up there when:
- The target queue is full
- The target queue doesn't exist
- The sender isn't authorized to put to the target queue
- The message exceeds the target queue's maximum message length
- The queue is PUT-inhibited
MQ adds a Dead Letter Header (MQDLH) to the front of the message. The MQDLH tells you why the message was dead-lettered and where it was originally going.
COPY CMQDLHV.
* Dead Letter Header structure
5100-PROCESS-DLQ-MESSAGE.
*-------------------------------------------------------*
* Process a dead letter queue message *
*-------------------------------------------------------*
* The DLH is prepended to the original message
MOVE WS-MSG-BUFFER(1:LENGTH OF MQDLH)
TO MQDLH
* Extract diagnostic information
MOVE MQDLH-DESTQNAME TO WS-ORIG-QUEUE
MOVE MQDLH-DESTQMGRNAME
TO WS-ORIG-QMGR
MOVE MQDLH-REASON TO WS-DLQ-REASON
MOVE MQDLH-PUTDATE TO WS-DLQ-DATE
MOVE MQDLH-PUTTIME TO WS-DLQ-TIME
* Log for investigation
DISPLAY 'DLQ Message:'
DISPLAY ' Original queue: ' WS-ORIG-QUEUE
DISPLAY ' Original QMgr: ' WS-ORIG-QMGR
DISPLAY ' DLQ Reason: ' WS-DLQ-REASON
DISPLAY ' Put date/time: ' WS-DLQ-DATE
'/' WS-DLQ-TIME
* Attempt reprocessing based on reason
EVALUATE WS-DLQ-REASON
WHEN MQRC-Q-FULL
PERFORM 5110-RETRY-DELIVERY
WHEN MQRC-NOT-AUTHORIZED
PERFORM 5120-SECURITY-ALERT
WHEN OTHER
PERFORM 5130-ESCALATE
END-EVALUATE.
⚠️ PRODUCTION WAR STORY In 2022, CNB had an incident where a DLQ processor had a bug — it read the dead letter header incorrectly, submitted messages for reprocessing to the wrong queue, which generated more dead letters, which got reprocessed to the wrong queue again. In 45 minutes, the DLQ went from 200 messages to 2.3 million. The fix: the DLQ processor now writes a reprocessing audit trail and has a circuit breaker — if DLQ depth increases during reprocessing, it stops and alerts. "Never trust your error handler until your error handler has been through an incident," Kwame says.
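The circuit-breaker fix reduces to a simple invariant: during a reprocessing pass, the DLQ must be shrinking. A sketch of that check (illustrative Python; `get_depth` and `reprocess_one` are stand-ins for whatever queries CURDEPTH and resubmits a message, not real MQ calls):

```python
def reprocess_dlq(get_depth, reprocess_one, limit=1000):
    """Resubmit DLQ messages, tripping a circuit breaker if the DLQ
    is not shrinking -- the signature of a reprocessing loop that is
    itself generating fresh dead letters."""
    handled = 0
    while handled < limit:
        before = get_depth()
        if before == 0:
            break
        reprocess_one()
        handled += 1
        if get_depth() >= before:      # breaker trips: stop and alert
            raise RuntimeError(
                f"DLQ depth did not fall after message {handled}; "
                "halting reprocessing and alerting operations")
    return handled
```

In the healthy case each resubmission removes one message from the DLQ. In the 2022 incident, each resubmission produced a new dead letter; a check like this would have tripped on the first message instead of after 2.3 million.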
Poison Message Handling
A poison message is one that causes the consuming application to fail every time it tries to process it. Maybe the message has an invalid format. Maybe it references data that doesn't exist in DB2. Maybe the processing logic has a bug that only triggers for certain message content.
Without handling, a poison message creates a rollback loop: MQGET → process → fail → rollback → MQGET (same message) → process → fail → rollback → forever.
The defense is the backout count mechanism we covered in Section 19.5, combined with an explicit routing strategy:
5200-CHECK-POISON-MESSAGE.
*-------------------------------------------------------*
* Poison message detection and routing *
*-------------------------------------------------------*
IF MQMD-BACKOUTCOUNT > 0
IF MQMD-BACKOUTCOUNT >= WS-BACKOUT-THRESHOLD
* This message has failed too many times
PERFORM 5210-LOG-POISON-MESSAGE
PERFORM 5220-ROUTE-TO-EXCEPTION
PERFORM 5230-COMMIT-AND-CONTINUE
ELSE
* Retry - but log the retry
ADD 1 TO WS-RETRY-TOTAL
DISPLAY 'Retry #'
MQMD-BACKOUTCOUNT
' for MsgId: '
MQMD-MSGID
END-IF
END-IF.
5210-LOG-POISON-MESSAGE.
DISPLAY '*** POISON MESSAGE DETECTED ***'
DISPLAY ' MsgId: ' MQMD-MSGID
DISPLAY ' CorrelId: ' MQMD-CORRELID
DISPLAY ' Backouts: ' MQMD-BACKOUTCOUNT
DISPLAY ' Queue: ' WS-QUEUE-NAME
DISPLAY ' PutDate: ' MQMD-PUTDATE
DISPLAY ' PutTime: ' MQMD-PUTTIME
* Write to audit table for investigation
PERFORM 5215-WRITE-AUDIT-RECORD.
A Complete Error Handling Strategy
Production MQ applications need a layered error handling strategy:
Layer 1: Operation-level. Check completion code and reason code after every MQI call. Handle expected conditions (queue full, no message available, quiescing) inline.
Layer 2: Message-level. Check backout count before processing. Route poison messages to the exception queue. Log every retry and every routing decision.
Layer 3: Program-level. If the program encounters an unrecoverable error (connection broken, security violation), clean up resources, log diagnostics, and abend with a meaningful code.
Layer 4: Infrastructure-level. Monitor queue depths, DLQ depths, channel statuses, and backout queue depths. Alert operations when thresholds are exceeded.
+----------------------------------------------+
| Layer 4: Infrastructure Monitoring |
| Queue depth alerts, channel monitoring, |
| DLQ depth, backout queue depth |
+----------------------------------------------+
| Layer 3: Program-Level |
| Unrecoverable errors → cleanup → abend |
+----------------------------------------------+
| Layer 2: Message-Level |
| Backout detection → poison routing |
+----------------------------------------------+
| Layer 1: Operation-Level |
| CC/RC checking after every MQI call |
+----------------------------------------------+
At CNB, every MQ program must implement all four layers. It's a code review gate — no program goes to production without verified MQ error handling at every layer.
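Layers 1 and 2 live inside every consumer program; layers 3 and 4 sit outside it. The in-process part of the strategy can be sketched language-agnostically (Python here; the five callables are stand-ins for the real MQI and database operations, not a client API):

```python
BACKOUT_THRESHOLD = 3   # would come from the queue's BOTHRESH

def consume_one(get_message, process, route_to_exception,
                commit, rollback):
    """One pass of a layered MQ consumer.
    get_message returns (ok, msg, backout_count), modeling MQGET
    plus its completion/reason codes."""
    ok, msg, backout_count = get_message()
    if not ok:
        return "no-message"          # layer 1: CC/RC checked on every call
    if backout_count >= BACKOUT_THRESHOLD:
        route_to_exception(msg)      # layer 2: poison message routed away
        commit()
        return "poisoned"
    try:
        process(msg)
        commit()                     # message and DB updates commit together
        return "processed"
    except Exception:
        rollback()                   # message reappears; backout count grows
        return "rolled-back"
```

Layer 3 corresponds to letting unrecoverable failures escape this loop to a cleanup-and-abend handler; layer 4 is the external monitoring of the queues this loop feeds.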
19.9 Project Checkpoint — MQ Queue Architecture for HA Banking System
🧩 PROGRESSIVE PROJECT: HA Banking Transaction Processing System
For the HA Banking System project, you'll now design the MQ messaging layer that connects the system's components. Here's what you need:
Requirements
- Inter-LPAR communication: The HA system spans two LPARs (active-active). Transactions initiated on either LPAR must be visible on both.
- External interfaces: The system connects to FedNow (real-time payments), SWIFT (international wire), and the bank's mobile platform.
- Audit trail: Every transaction must produce an audit message for the compliance system.
- Fraud screening: Every debit over $10,000 must pass through fraud screening before authorization (request/reply, 5-second SLA).
- Notification: Account holders receive real-time notifications via the mobile platform.
Design Tasks
Task 1: Queue Manager Topology Define the queue managers. Consider: how many? Where? What's the HA strategy — clustering, shared queues, or both?
Task 2: Queue Definitions For each message flow, define the queues needed. Consider: local, remote, transmission, dead letter, backout. Name them consistently.
Task 3: Message Patterns For each flow, choose the pattern (datagram, request/reply, pub/sub) and justify the choice.
Task 4: Transactional Boundaries For the fraud screening flow, write the pseudocode showing DB2 update, MQPUT, and SYNCPOINT coordination. What happens if fraud screening times out?
Task 5: Error Handling Design the DLQ processing strategy. What happens to messages that can't be delivered? Who monitors? What's the escalation path?
Here's a starter topology diagram for your design:
LPAR-A LPAR-B
+------------------+ +------------------+
| HAQM01 | Shared Queue | HAQM02 |
| |<--- CF ---> | |
| Core Banking A | | Core Banking B |
| Fraud Engine | | Fraud Engine |
| Notification Pub | | Notification Pub |
+--------+---------+ +--------+---------+
| |
+------- MQ Cluster --------+-------+
|
+-------+-------+
| |
+----+----+ +-----+----+
| Gateway | | Gateway |
| QM03 | | QM04 |
| FedNow | | SWIFT |
+---------+ | Mobile |
+----------+
✅ CHECKPOINT DELIVERABLE Submit your queue architecture document with: queue manager definitions, queue definitions (MQSC format), channel definitions, message flow diagrams, and error handling strategy. This design becomes the foundation for the implementation exercises in Chapters 20–22.
Production Considerations
Monitoring. At minimum, monitor: queue depth (current and trend), DLQ depth, channel status, queue manager status, and message age (oldest message on each queue). CNB uses IBM MQ Explorer for real-time monitoring and Tivoli for alerting.
Security. MQ security has three layers:

1. Connection security: Who can connect to the queue manager (RACF MQCONN class)
2. Queue security: Who can open, put, get, or inquire on each queue (RACF MQQUEUE class)
3. Channel security: TLS encryption and authentication on channels
Every MQ program at CNB runs under a specific RACF user ID with precisely scoped permissions. No program gets blanket access to all queues. "Least privilege isn't just a good idea," Lisa Tran says. "It's how you sleep at night."
Performance tuning. Key parameters:

- MAXDEPTH: Size for peak volume plus 50% headroom. If a queue consistently runs deep, your consumers are too slow.
- MAXMSGL: Keep as small as possible. Larger messages consume more buffer pool storage and log space.
- Buffer pool size: The MQ equivalent of DB2's buffer pools. Size for your working set of messages.
- Log configuration: Dual logging for production (always). Archive logs for recovery.
- Persistent vs. non-persistent: Persistent messages hit the log (I/O cost). Use non-persistent only when you can tolerate message loss — which in banking means almost never.
Message design. Define your message formats in COBOL copybooks. Version them. Include a header with format version, timestamp, source system, and message type. When the format changes, increment the version. Consumers check the version and handle old and new formats during transition periods.
01 WS-MSG-HEADER.
05 MSG-FORMAT-ID PIC X(8) VALUE 'CNBWIRE1'.
05 MSG-FORMAT-VERSION PIC 9(4) VALUE 0003.
05 MSG-TIMESTAMP PIC X(26).
05 MSG-SOURCE-SYSTEM PIC X(8).
05 MSG-TYPE PIC X(4).
05 MSG-BODY-LENGTH PIC 9(8) COMP.
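A consumer-side version check over that layout might look like the following Python sketch. The offsets mirror the copybook's picture clauses; the handler names in the dispatch table are hypothetical.

```python
def parse_header(buf: bytes) -> dict:
    """Parse the display fields of the WS-MSG-HEADER layout:
    X(8) format id, 9(4) version, X(26) timestamp, X(8) source
    system, X(4) message type. (MSG-BODY-LENGTH is binary COMP
    and is skipped in this sketch.)"""
    text = buf[:50].decode("ascii")
    return {
        "format_id": text[0:8].rstrip(),
        "version":   int(text[8:12]),
        "timestamp": text[12:38],
        "source":    text[38:46].rstrip(),
        "msg_type":  text[46:50].rstrip(),
    }

hdr = parse_header(
    b"CNBWIRE1" + b"0003" +
    b"2024-01-15-09.30.00.000000" +   # illustrative DB2-style timestamp
    b"MOBILE  " + b"DBIT")
assert hdr["format_id"] == "CNBWIRE1" and hdr["version"] == 3

# During a format transition, the consumer supports both versions
# and routes unknown ones to the exception path:
handlers = {2: "parse_v2_body", 3: "parse_v3_body"}  # hypothetical names
assert handlers[hdr["version"]] == "parse_v3_body"
```

The same pattern applies in COBOL: redefine the body area per version and EVALUATE on MSG-FORMAT-VERSION before touching the body fields.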
Capacity planning. Know your numbers:

- Messages per second (average and peak)
- Average message size
- Persistent vs. non-persistent ratio
- Queue depth profiles (how long messages sit before processing)
- Log volume (persistent messages * average size * 2 for dual logging)
CNB processes 180 million messages per day. Peak hour is 7x average. During the acquisition migration, they hit 340 million in one day. The infrastructure was sized for 500 million. "Size for the disaster you haven't had yet," Rob Calloway says.
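The log-volume formula is worth making concrete. A back-of-envelope sizing sketch using CNB's published figures, with the message size and persistence ratio as clearly labeled assumptions:

```python
MSGS_PER_DAY     = 180_000_000   # CNB daily average (from the text)
PEAK_FACTOR      = 7             # peak hour vs. average hour (from the text)
AVG_MSG_BYTES    = 2_000         # illustrative assumption
PERSISTENT_RATIO = 0.9           # illustrative assumption

peak_msgs_per_sec = MSGS_PER_DAY / 24 * PEAK_FACTOR / 3600

# Log volume = persistent messages * average size * 2 (dual logging)
daily_log_gb = MSGS_PER_DAY * PERSISTENT_RATIO * AVG_MSG_BYTES * 2 / 1e9

print(f"peak msgs/sec: {peak_msgs_per_sec:,.0f}")   # ~14,583
print(f"daily log GB:  {daily_log_gb:,.0f}")        # 648
```

Even with these modest assumptions, the arithmetic shows why log configuration is a first-order capacity concern: hundreds of gigabytes of log I/O per day at average volume, before the 7x peak hour.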
Testing MQ programs. Testing MQ-based applications is fundamentally different from testing batch file-processing programs. You need:
- A test queue manager. Never test against production MQ. CNB has dedicated test queue managers (CNBTQM01 through CNBTQM04) that mirror the production topology but with smaller queue depths and no external connections.
- Message injection tools. You need a way to put test messages on queues. IBM provides amqsput (the sample put utility) and MQ Explorer for manual message injection. For automated testing, CNB has a COBOL utility program that reads a test data file and puts each record as a message — essentially the same as Example 1 in this chapter's code samples.
- Message verification. After your program processes messages, you need to verify the results: Were the right messages put to the output queues? Were error cases routed to the backout queue? Were DLQ messages handled? CNB's test framework includes a verification program that reads output queues and compares messages against expected results.
- Syncpoint testing. This is the trickiest part. You need to verify that rollback works correctly — that a failed processing step actually rolls back both the DB2 changes and the MQ messages. CNB has a test harness that deliberately injects failures at each step of the processing pipeline and verifies that the unit of work is properly backed out.
🔍 TESTING GOTCHA Here's a trap that catches every new MQ developer at least once: you put a message under syncpoint, then immediately use MQ Explorer to browse the queue and wonder why the message isn't there. It's there — but it's uncommitted. Until you commit (MQCMIT in batch, SYNCPOINT in CICS), the message is invisible to other programs. This is correct behavior, but it makes testing confusing until you understand the syncpoint lifecycle.
Operational runbook entries. Every MQ application should have runbook entries covering:
- How to check queue depths and channel status (DIS Q(*) CURDEPTH and DIS CHS(*))
- How to restart a stopped channel (START CHANNEL(name))
- How to drain a queue in an emergency (CLEAR QLOCAL(name) — destructive, use with extreme caution)
- How to inhibit puts or gets on a queue (ALTER QLOCAL(name) PUT(DISABLED))
- How to display and resolve in-doubt units of work (DIS CONN(*) TYPE(CONN) WHERE(UOWSTATE EQ UNRESOLVED))
- The escalation path for each queue's DLQ and backout queue messages
At CNB, the MQ runbook is 340 pages. It took two years to build. "Every page was written because something went wrong," Lisa Tran says. "The runbook is our institutional memory of incidents."
Summary
IBM MQ is the nervous system of enterprise mainframe integration. This chapter covered:
-
Why messaging exists: Temporal and spatial decoupling solve the four fundamental problems of point-to-point integration (temporal coupling, spatial coupling, format coupling, capacity coupling).
-
MQ architecture: Queue managers own queues. Channels connect queue managers. Messages flow through local queues, remote queue definitions, and transmission queues. Dead letter queues catch failures.
-
The MQI API in COBOL: MQCONN, MQOPEN, MQPUT, MQGET, MQCLOSE, MQDISC. Every call returns a completion code and reason code. Check both. Always.
-
Message patterns: Fire-and-forget (datagram) for decoupled flows, request/reply for synchronous-style interactions, pub/sub for one-to-many distribution. The pattern determines your error handling and capacity planning.
-
Transactional messaging: MQPMO-SYNCPOINT and MQGMO-SYNCPOINT make MQ operations part of the CICS/DB2 unit of work. Two-phase commit via RRS ensures atomicity across subsystems.
-
CICS integration: The CICS-MQ adapter handles connections. MQ triggering starts CICS transactions automatically. Know when to use MQ vs. MRO vs. TS queues.
-
High availability: Queue manager clusters for workload balancing and routing resilience. Shared queues in the Coupling Facility for zero-downtime failover.
-
Error handling: Four layers — operation, message, program, infrastructure. Backout counters prevent poison message loops. DLQs are ICUs, not trash cans.
The threshold concept: messaging decouples time, not just space. This is why messages must be persistent, why transactional coordination matters, why backout queues exist, and why the DLQ needs a processing strategy. When you put a message on a queue, you're making a promise to the future. MQ is how you keep that promise.
Spaced Review
From Chapter 1: z/OS Subsystems
MQ is a z/OS subsystem like DB2 and CICS — its own address spaces, its own recovery, its own security. The connection handle (HCONN) is your session with the subsystem, just as a DB2 plan is your session with DB2. Understanding the subsystem model helps you reason about MQ failures, recovery, and security.
From Chapter 13: CICS Regions — MRO vs. MQ
MRO connects CICS regions on the same LPAR with direct program calls. MQ connects anything to anything with asynchronous messages. MRO is faster (no serialization). MQ is more flexible (cross-LPAR, cross-platform, temporally decoupled). Production systems use both: MRO within the CICS region topology, MQ for inter-system integration.
🔍 LOOKING AHEAD Chapter 20 builds on this foundation with advanced MQ patterns — message sequencing, message grouping, and complex request/reply choreographies. Chapter 21 introduces MQ as the bridge to REST APIs and cloud services. Everything in Part IV starts here.