Case Study 1: CNB's Real-Time Fraud Detection via MQ Triggers and Pub/Sub
Background
Continental National Bank processes $4.7 billion in wire transfers daily. Before the event-driven architecture initiative, wire transfer fraud detection was a sequential request/reply flow: the wire transfer program submitted each transaction to the fraud screening system, waited for a risk score, then submitted to compliance review, then to sanctions checking, and finally to AML (anti-money-laundering) scoring, waiting for each reply in turn. Four systems, four round trips, an average latency of 3.7 seconds and an observed worst case of 22 seconds.
In November 2023, a fraudulent wire transfer for $2.3 million cleared because the compliance review system was processing a backlog from the morning's ACH batch. The fraud system detected the anomaly in 200 milliseconds. The compliance system took 13.8 seconds to respond. By the time all four systems had approved, the wire had settled through the Federal Reserve's Fedwire Funds Service and the funds were irrecoverable.
Kwame Mensah's post-mortem concluded with a single architectural directive: "No single downstream system's latency should ever gate the fraud detection path. These systems must process in parallel, not in sequence."
The Pre-Event Architecture
Message Flow (Request/Reply)
WIREXFR1 (Wire Transfer Program)
|
|--[MQPUT]-> FRAUD.SCREEN.REQUEST
| [wait...]
|<-[MQGET]-- FRAUD.SCREEN.REPLY (avg 200ms, max 3s)
|
|--[MQPUT]-> COMPL.REVIEW.REQUEST
| [wait...]
|<-[MQGET]-- COMPL.REVIEW.REPLY (avg 1.2s, max 14s)
|
|--[MQPUT]-> SANCTIONS.CHECK.REQUEST
| [wait...]
|<-[MQGET]-- SANCTIONS.CHECK.REPLY (avg 800ms, max 5s)
|
|--[MQPUT]-> AML.SCORE.REQUEST
| [wait...]
|<-[MQGET]-- AML.SCORE.REPLY (avg 1.5s, max 8s)
|
[Proceed with wire transfer]
Total latency: Average 3.7 seconds, maximum 30 seconds (sum of worst cases).
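These totals are just sums over the four hops; a quick sketch using the per-system averages and maxima from the flow above:

```python
# Sequential request/reply: every hop's wait adds to the total.
# (avg_seconds, max_seconds) per downstream system, from the flow above.
hops = {
    "fraud":      (0.2, 3.0),
    "compliance": (1.2, 14.0),
    "sanctions":  (0.8, 5.0),
    "aml":        (1.5, 8.0),
}
avg_total = sum(avg for avg, _ in hops.values())   # 3.7 s average
max_total = sum(mx for _, mx in hops.values())     # 30.0 s worst case

assert round(avg_total, 1) == 3.7
assert max_total == 30.0
```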
Problems Identified
- Sequential dependency. Each system waited for the previous system's response before submitting to the next. A slow compliance system blocked fraud detection, even though the two systems are functionally independent.
- CICS task holding. The wire transfer transaction held a CICS task slot, a DB2 thread, and task-related storage for the entire duration of the four request/reply exchanges. During peak hours, 40–60 wire transfers were in flight simultaneously, consuming task resources for seconds each.
- Fragile timeout logic. The wire transfer program had four different timeout handlers, each with different retry and fallback behavior. The timeout for compliance review was set to 15 seconds (increased from 10 after false timeouts during batch processing). The timeout for fraud screening was 5 seconds. Each timeout path had its own error handling paragraph, its own logging format, and its own escalation procedure. The program had grown to 12,000 lines, largely due to timeout handling.
- Operational coupling. Maintenance on any of the four downstream systems required coordinating with the wire transfer operations team because taking a system down meant the wire transfer program would time out and reject transfers.
The Event-Driven Architecture
Design Principles
Lisa Tran and Kwame designed the new architecture around three principles:
- The wire transfer program publishes, never waits. It announces the wire transfer event and moves on.
- Downstream systems subscribe independently. Each system receives the event, processes it at its own pace, and publishes its own result event.
- A separate decision aggregator collects results. Rather than the wire transfer program coordinating the sequential flow, a new "Fraud Decision Aggregator" program collects results from all downstream systems and makes the hold/release decision.
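The first two principles reduce to fan-out: one published event is copied onto every subscriber's queue, and the publisher never waits on any consumer. A minimal sketch, with a dict of deques standing in for the topic and its subscription queues (queue names mirror this design):

```python
from collections import deque

# Stand-in for subscriptions on topic CNB/EVENTS/WIRE/SUBMITTED.
subscriptions = {name: deque() for name in
                 ("FRAUD.WIRE.INBOUND", "COMPL.WIRE.INBOUND",
                  "SANCT.WIRE.INBOUND", "AML.WIRE.INBOUND",
                  "AUDIT.WIRE.INBOUND")}

def publish(event):
    """Publish-and-return: copy the event to every subscription, no replies."""
    for q in subscriptions.values():
        q.append(dict(event))   # each subscriber gets its own copy

publish({"correl_id": "WIRE0001", "amount": 250000.00})
assert all(len(q) == 1 for q in subscriptions.values())
```

The publisher's cost is one put per subscription regardless of how slow any consumer is; that is what decouples the wire transfer program from downstream latency.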
New Message Flow (Event-Driven)
WIREXFR1 (Wire Transfer Program)
|
|--[MQPUT to topic]--> CNB/EVENTS/WIRE/SUBMITTED
| (returns immediately — program is done)
|
+---> FRAUD.WIRE.INBOUND (subscription queue)
| |
| v
| FRAUDSCR (Fraud Screening) — MQ triggered, TRIGTYPE(FIRST)
| |--[publish]--> CNB/EVENTS/WIRE/FRAUD_SCORED
|
+---> COMPL.WIRE.INBOUND (subscription queue)
| |
| v
| COMPLRVW (Compliance Review) — MQ triggered, TRIGTYPE(FIRST)
| |--[publish]--> CNB/EVENTS/WIRE/COMPL_REVIEWED
|
+---> SANCT.WIRE.INBOUND (subscription queue)
| |
| v
| SANCTCHK (Sanctions Check) — MQ triggered, TRIGTYPE(FIRST)
| |--[publish]--> CNB/EVENTS/WIRE/SANCTIONS_CHECKED
|
+---> AML.WIRE.INBOUND (subscription queue)
| |
| v
| AMLSCOR (AML Scoring) — MQ triggered, TRIGTYPE(FIRST)
| |--[publish]--> CNB/EVENTS/WIRE/AML_SCORED
|
+---> AUDIT.WIRE.INBOUND (subscription queue)
|
v
AUDITLOG (Audit Logger) — writes to DB2 audit table
The Fraud Decision Aggregator (FRDAGGR) subscribes to all four result topics:
CNB/EVENTS/WIRE/FRAUD_SCORED ---> FRDAGGR.FRAUD.INBOUND
CNB/EVENTS/WIRE/COMPL_REVIEWED ---> FRDAGGR.COMPL.INBOUND
CNB/EVENTS/WIRE/SANCTIONS_CHECKED -> FRDAGGR.SANCT.INBOUND
CNB/EVENTS/WIRE/AML_SCORED ---> FRDAGGR.AML.INBOUND
FRDAGGR is a CICS transaction that maintains aggregation state in a DB2 table. For each wire transfer (identified by correlation ID), it tracks which result events have been received. When all four results are in (or when a configurable timeout expires), it makes the hold/release decision and publishes the final event:
CNB/EVENTS/WIRE/DECISION (HOLD or RELEASE)
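The aggregator's collect-then-decide loop can be sketched in a few lines; here an in-memory dict keyed by correlation ID stands in for the DB2 row, and a simplified any-flag rule stands in for the full decision logic shown later:

```python
# Collect results per wire; decide only once all four systems have answered.
REQUIRED = {"FRAUD", "COMPL", "SANCTIONS", "AML"}
state = {}   # correlation ID -> {system: result}

def on_result(correl_id, system, result):
    row = state.setdefault(correl_id, {})
    row[system] = result
    if REQUIRED <= row.keys():          # all four results received
        return "HOLD" if "FLAGGED" in row.values() else "RELEASE"
    return "PENDING"                    # still waiting (timeout handled separately)

assert on_result("WIRE0001", "FRAUD", "OK") == "PENDING"
```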
CICS Event Processing Configuration
Rather than modifying the wire transfer program (WIREXFR1), the team used CICS EP to capture events. The event binding captures the VSAM WRITE to the WIRETRAN file:
<event-binding name="WireTransferSubmitted">
<capture-specification>
<event-capture name="WireCapture"
component="WIREXFR1"
capture-point="FILE_WRITE"
current-context="WIRETRAN">
<capture-data>
<data-item name="TransactionID"
source="CONTAINER"
container-name="WIRE-DATA"
offset="0"
length="20"/>
<data-item name="SourceAccount"
source="CONTAINER"
container-name="WIRE-DATA"
offset="20"
length="16"/>
<data-item name="DestAccount"
source="CONTAINER"
container-name="WIRE-DATA"
offset="36"
length="16"/>
<data-item name="Amount"
source="CONTAINER"
container-name="WIRE-DATA"
offset="52"
length="8"/>
<data-item name="Currency"
source="CONTAINER"
container-name="WIRE-DATA"
offset="60"
length="3"/>
<data-item name="BeneficiaryName"
source="CONTAINER"
container-name="WIRE-DATA"
offset="63"
length="40"/>
<data-item name="BeneficiaryBank"
source="CONTAINER"
container-name="WIRE-DATA"
offset="103"
length="11"/>
<data-item name="OriginatorChannel"
source="CONTAINER"
container-name="WIRE-DATA"
offset="114"
length="10"/>
<data-item name="OriginatorIP"
source="CONTAINER"
container-name="WIRE-DATA"
offset="124"
length="45"/>
</capture-data>
</event-capture>
</capture-specification>
<event-emission>
<adapter-name>MQAdapter</adapter-name>
<adapter-properties>
<property name="queue-manager" value="HAQM01"/>
<property name="topic-string"
value="CNB/EVENTS/WIRE/SUBMITTED"/>
<property name="format" value="JSON"/>
<property name="persistence" value="PERSISTENT"/>
</adapter-properties>
</event-emission>
</event-binding>
This event binding was deployed in 3 hours, including testing. Zero code changes to WIREXFR1.
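The capture-data items above amount to a fixed-offset map over the WIRE-DATA record. A sketch of the same layout as (offset, length) slices, which also verifies the fields are contiguous:

```python
# Field layout taken from the capture-data items in the event binding.
LAYOUT = {
    "TransactionID":     (0, 20),
    "SourceAccount":     (20, 16),
    "DestAccount":       (36, 16),
    "Amount":            (52, 8),
    "Currency":          (60, 3),
    "BeneficiaryName":   (63, 40),
    "BeneficiaryBank":   (103, 11),
    "OriginatorChannel": (114, 10),
    "OriginatorIP":      (124, 45),
}

def extract(record: bytes) -> dict:
    """Slice a WIRE-DATA record into named, trimmed fields."""
    return {name: record[off:off + ln].decode("ascii").rstrip()
            for name, (off, ln) in LAYOUT.items()}

# Each field starts exactly where the previous one ends (record is 169 bytes).
pairs = sorted(LAYOUT.values())
assert all(off + ln == nxt for (off, ln), (nxt, _) in zip(pairs, pairs[1:]))
assert set(extract(b" " * 169)) == set(LAYOUT)
```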
MQ Trigger Configuration
Each downstream system's subscription queue is configured with TRIGTYPE(FIRST) to start the processing program when events arrive:
-- Fraud screening trigger
DEFINE QLOCAL(FRAUD.WIRE.INBOUND) +
DESCR('Wire events for fraud screening') +
PUT(ENABLED) GET(ENABLED) +
DEFPSIST(YES) +
MAXDEPTH(100000) +
TRIGGER +
TRIGTYPE(FIRST) +
INITQ(SYSTEM.CICS.INITIATION.QUEUE) +
PROCESS(FRAUD.WIRE.PROCESS) +
BOTHRESH(3) +
BOQNAME(FRAUD.WIRE.BACKOUT)
DEFINE PROCESS(FRAUD.WIRE.PROCESS) +
APPLICTYPE(CICS) +
APPLICID(FRDS) +
USERDATA('FRAUD.WIRE.INBOUND')
Similar definitions exist for compliance, sanctions, and AML queues.
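TRIGTYPE(FIRST) fires one trigger message when queue depth goes from 0 to 1, so the started transaction must drain the queue until it is empty before ending; the next 0-to-1 transition fires a fresh trigger. A sketch of that drain loop, with a deque standing in for the MQGET loop:

```python
from collections import deque

def triggered_program(queue, process_event):
    """Started by the trigger monitor on the 0 -> 1 depth transition."""
    drained = 0
    while queue:                       # MQGET until no message is available
        process_event(queue.popleft())
        drained += 1
    return drained                     # then end; next 0 -> 1 retriggers

inbound = deque(["evt1", "evt2", "evt3"])
assert triggered_program(inbound, lambda e: None) == 3
assert not inbound
```

This is why the backout threshold (BOTHRESH) and backout queue matter: a poison message must be shunted aside or the drain loop would repeatedly fail on it.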
Implementation Timeline
| Week | Activity |
|---|---|
| 1–2 | Event schema design, topic hierarchy, subscription matrix |
| 3 | CICS event binding creation and unit testing |
| 4 | Fraud Decision Aggregator program development |
| 5–6 | MQ trigger configuration, subscription setup, integration testing |
| 7 | Performance testing (500,000 simulated wire transfers) |
| 8 | Parallel run (old and new paths, results compared) |
| 9 | Cutover — old request/reply path disabled |
| 10 | Stabilization and monitoring tuning |
Total: 10 weeks from design to production. The previous attempt to add sanctions screening to the request/reply path had taken 5 weeks for a single integration point.
Production Results
Latency Improvement
| Metric | Before (Request/Reply) | After (Event-Driven) |
|---|---|---|
| Mean detection time (fraud flag) | 3.7 seconds | 340 milliseconds |
| P99 detection time | 22 seconds | 1.8 seconds |
| Wire transfer program latency | 3.7 seconds (waiting) | 12 milliseconds (publish and return) |
| CICS task hold time (wire txn) | 3.7 seconds | 12 milliseconds |
The fraud screening system now processes in parallel with compliance, sanctions, and AML. The fraud system's 200ms response time is no longer hidden behind the compliance system's 1.2-second average.
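In other words, fan-out changes the completion math from a sum to a max; using the old per-system averages:

```python
# Average response time per downstream system (seconds), from the old flow.
avg = {"fraud": 0.2, "compliance": 1.2, "sanctions": 0.8, "aml": 1.5}

fraud_flag    = avg["fraud"]        # 0.2 s: no longer queued behind others
full_decision = max(avg.values())   # 1.5 s: slowest subscriber, not the sum
sequential    = sum(avg.values())   # 3.7 s: the old request/reply chain

assert fraud_flag == 0.2
assert full_decision == 1.5
assert round(sequential, 1) == 3.7
```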
Resource Improvement
| Metric | Before | After |
|---|---|---|
| Concurrent CICS tasks (wire transfer) | 40–60 (holding for replies) | 5–8 (publish and return) |
| DB2 thread hold time (wire transfer) | 3.7s average | 12ms average |
| Peak CICS task-related storage (wire transfer AOR) | 340 MB | 48 MB |
The reduction in CICS task holding freed capacity equivalent to one full AOR region — $180K/year in MSU licensing savings.
Operational Improvement
- Adding a new subscriber (e.g., the regulatory reporting system added in Q2 2024) requires one MQSC subscription definition. No code changes. No testing of the wire transfer program. Deployment in 45 minutes including change control.
- Maintenance on downstream systems no longer affects wire transfer processing. The compliance system's monthly maintenance window no longer requires coordination with the wire transfer operations team.
Fraud Decision Aggregator — Design Details
The aggregator is the most complex new component. It maintains state in a DB2 table:
CREATE TABLE CNB_FRAUD_AGGREGATION (
WIRE_CORRELATION_ID CHAR(24) NOT NULL,
WIRE_AMOUNT DECIMAL(15,2) NOT NULL,
WIRE_TIMESTAMP TIMESTAMP NOT NULL,
FRAUD_SCORE SMALLINT DEFAULT -1,
FRAUD_RECEIVED_TS TIMESTAMP,
COMPL_RESULT CHAR(10) DEFAULT 'PENDING',
COMPL_RECEIVED_TS TIMESTAMP,
SANCTIONS_RESULT CHAR(10) DEFAULT 'PENDING',
SANCTIONS_RECEIVED_TS TIMESTAMP,
AML_SCORE SMALLINT DEFAULT -1,
AML_RECEIVED_TS TIMESTAMP,
DECISION CHAR(10) DEFAULT 'PENDING',
DECISION_TS TIMESTAMP,
DECISION_REASON VARCHAR(200),
PRIMARY KEY (WIRE_CORRELATION_ID)
);
-- DB2 has no filtered indexes; a plain index on DECISION serves the
-- frequent WHERE DECISION = 'PENDING' lookup
CREATE INDEX IX_FRAUD_AGG_PENDING
  ON CNB_FRAUD_AGGREGATION (DECISION);
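The row lifecycle (insert on the first result, then one column update per result event) can be sketched with sqlite3 standing in for DB2; types are simplified but the column roles match the DDL above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE CNB_FRAUD_AGGREGATION (
        WIRE_CORRELATION_ID TEXT PRIMARY KEY,
        FRAUD_SCORE      INTEGER DEFAULT -1,
        COMPL_RESULT     TEXT DEFAULT 'PENDING',
        SANCTIONS_RESULT TEXT DEFAULT 'PENDING',
        AML_SCORE        INTEGER DEFAULT -1,
        DECISION         TEXT DEFAULT 'PENDING')""")

def record_fraud_score(correl_id, score):
    # Create the row if this is the first result for the wire, then set
    # the one column this event carries (sqlite upsert idiom).
    conn.execute("INSERT OR IGNORE INTO CNB_FRAUD_AGGREGATION "
                 "(WIRE_CORRELATION_ID) VALUES (?)", (correl_id,))
    conn.execute("UPDATE CNB_FRAUD_AGGREGATION SET FRAUD_SCORE = ? "
                 "WHERE WIRE_CORRELATION_ID = ?", (score, correl_id))

record_fraud_score("WIRE0001", 42)
row = conn.execute("SELECT FRAUD_SCORE, DECISION FROM CNB_FRAUD_AGGREGATION "
                   "WHERE WIRE_CORRELATION_ID = 'WIRE0001'").fetchone()
assert row == (42, "PENDING")
```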
The aggregator processes each result event by updating the corresponding column. After each update, it checks whether all four results are in:
* Check if all results received
IF FRAUD-SCORE NOT = -1
AND COMPL-RESULT NOT = 'PENDING'
AND SANCTIONS-RESULT NOT = 'PENDING'
AND AML-SCORE NOT = -1
* All results in — make decision
PERFORM 5000-MAKE-FRAUD-DECISION
ELSE
* Still waiting — check for timeout
PERFORM 5100-CHECK-TIMEOUT
END-IF
The decision logic evaluates the aggregate results:
5000-MAKE-FRAUD-DECISION.
EVALUATE TRUE
WHEN SANCTIONS-RESULT = 'BLOCKED'
* Sanctions hit — mandatory hold
MOVE 'HOLD' TO WS-DECISION
MOVE 'SANCTIONS BLOCK' TO WS-REASON
WHEN FRAUD-SCORE > 85
* High fraud score — hold for review
MOVE 'HOLD' TO WS-DECISION
MOVE 'HIGH FRAUD SCORE' TO WS-REASON
WHEN COMPL-RESULT = 'ESCALATE'
* Compliance escalation — hold
MOVE 'HOLD' TO WS-DECISION
MOVE 'COMPLIANCE ESCALATION' TO WS-REASON
WHEN AML-SCORE > 70
* AML flag — hold for SAR review
MOVE 'HOLD' TO WS-DECISION
MOVE 'AML FLAG' TO WS-REASON
WHEN OTHER
* All clear — release
MOVE 'RELEASE' TO WS-DECISION
MOVE 'ALL CHECKS PASSED' TO WS-REASON
END-EVALUATE.
A timeout mechanism handles cases where one or more downstream systems fail to respond within 60 seconds. The behavior is configurable per system as fail-safe or fail-open: a fraud screening timeout, for example, results in HOLD (fail-safe), while a timeout on a lower-risk check can be configured to RELEASE (fail-open).
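A sketch of that per-system fallback; the 60-second window is from the design, while which systems fail safe versus open is configuration, shown here with illustrative values:

```python
TIMEOUT_SECONDS = 60
FALLBACK = {                 # result assumed when a system never answers
    "FRAUD":     "HOLD",     # fail-safe: a missing fraud score holds the wire
    "COMPL":     "HOLD",
    "SANCTIONS": "HOLD",
    "AML":       "RELEASE",  # illustrative fail-open choice
}

def on_timeout(pending):
    """Decide for a wire whose window expired with results still pending."""
    return "HOLD" if any(FALLBACK[s] == "HOLD" for s in pending) else "RELEASE"

assert on_timeout({"FRAUD", "AML"}) == "HOLD"
assert on_timeout({"AML"}) == "RELEASE"
```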
Lessons Learned
What Worked
- CICS EP eliminated the highest-risk code change. Not having to modify WIREXFR1, a 12,000-line program processing $4.7B daily, removed three weeks of development and two weeks of regression testing from the timeline.
- Pub/sub made the architecture extensible. Adding the fifth subscriber (regulatory reporting) took 45 minutes. Adding a sixth (real-time dashboard) took 30 minutes. The marginal cost of new consumers approaches zero.
- MQ triggers eliminated resource waste. The four downstream programs now start only when events arrive, rather than running continuously in GET loops. During overnight quiet hours (midnight to 5 AM), when wire transfer volume drops to near zero, no triggered programs run.
What Didn't Work (Initially)
- The first version of the aggregator had a race condition. Two result events arriving simultaneously for the same wire transfer caused a DB2 deadlock on the aggregation table. The fix: row-level locking with ISOLATION(CS) and a deadlock retry loop. The aggregator now handles 200 concurrent updates without deadlocks.
- CICS EP phantom events caused false fraud alerts during the first week. The EP binding captured wire transfer WRITE events even for transactions that later rolled back (Section 20.3). The fraud system scored and alerted on phantom transfers. The fix: the fraud scoring program now checks the WIRETRAN VSAM file to confirm the transaction exists before scoring. Phantom events are logged and discarded.
- Monitoring gaps. The team didn't initially monitor the individual subscription queue depths. After a compliance system maintenance window, 12,000 events accumulated on COMPL.WIRE.INBOUND. Nobody noticed until a compliance analyst asked why reviews were delayed. The fix: queue depth monitoring with alerts at 1,000 messages on every subscription queue.
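The deadlock fix above reduces to a bounded retry around the row update. A sketch, with DeadlockError standing in for DB2's SQLCODE -911 (unit of work rolled back as a deadlock victim) and illustrative backoff values:

```python
import random
import time

class DeadlockError(Exception):
    """Stand-in for DB2 SQLCODE -911 (rolled back as deadlock victim)."""

def update_with_retry(do_update, max_retries=5):
    """Run do_update, retrying with jittered backoff on deadlock."""
    for attempt in range(max_retries):
        try:
            return do_update()
        except DeadlockError:
            if attempt == max_retries - 1:
                raise               # give up; surface the deadlock
            time.sleep(random.uniform(0, 0.01 * (attempt + 1)))
```

Randomized (jittered) backoff matters here: two transactions that deadlocked once will otherwise retry in lockstep and deadlock again.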
Current State (2025)
CNB's event-driven fraud detection system has been in production for 14 months. Key metrics:
- 0 fraudulent wires cleared due to downstream system latency since deployment (compared to 3 in the 14 months before)
- $11.2 million in prevented fraud attributed to the faster detection pipeline
- 97.3% of wire transfers receive all four results within 2 seconds
- 8 subscriber systems now consume wire transfer events (up from the original 4)
- Average 1.2 million events published daily to the wire transfer topic hierarchy
The architecture has been adopted as the reference pattern for CNB's next three event-driven initiatives: real-time payment screening (FedNow), card transaction monitoring, and ACH fraud detection.