Case Study 7.1: From Alert Chaos to Priority Queue — Meridian Capital's AML Transformation
The Situation
Organization: Meridian Capital (fictional) — a US broker-dealer and investment adviser with $4.2 billion in assets under management
Rafael's role: VP Compliance Technology
Timeline: 2022–2023
Challenge: Inheriting a broken AML transaction monitoring program generating 340 alerts per week against a team review capacity of 120 per week
The Inheritance
When Rafael Torres joined Meridian Capital as VP Compliance Technology in March 2022, he was handed a queue.
Not just any queue: a 550-alert backlog of unreviewed transaction monitoring alerts, some dating back eight weeks. The team of three AML analysts was reviewing approximately 120 alerts per week — well below the 340 new alerts being generated each week. The queue was growing by 220 alerts per week.
The previous compliance technology director had escalated the backlog problem to senior management twice and been told the team would be expanded "when budget allowed." The budget had not yet allowed.
Rafael's first two weeks were spent not redesigning the system but understanding it. He documented:
- 47 active monitoring scenarios — several variations of the same typology, accumulated over years as new scenarios had been added without retiring old ones
- Alert composition: reviewing a sample of 80 backlogged alerts, he estimated 3 genuine suspicious activity indicators. The rest were legitimate transactions tripping thresholds: a law firm's client-fund wire transfers repeatedly triggering rapid-movement scenarios; an import/export business generating consistent foreign exchange patterns that looked like layering to the rules engine
- Analyst workflow: no priority scheme — alerts were reviewed in the order they were received. The oldest alert in the queue was 58 days old
His memo to the Chief Compliance Officer was brief: "The current system generates approximately 340 alerts per week, of which approximately 3–5% are genuinely suspicious. Our analysts are reviewing 120 alerts per week. The remaining 95–97% of alerts, which reflect legitimate activity, are effectively burying the genuine ones."
Phase 1: Emergency Triage (Months 1–3)
Before redesigning the system, Rafael needed to address the immediate backlog problem.
Decision 1: Prioritize by scenario risk weight
Rafael assigned each of the 47 scenarios a risk weight based on his analysis of historical alert outcomes. Scenarios with historically high true positive rates (structuring-adjacent scenarios, OFAC-country wire scenarios) were weighted higher. Scenarios with near-zero historical true positive rates were flagged for elimination.
The analyst team was instructed to work the highest-weighted scenarios first, regardless of alert age.
This did not reduce the backlog. But it ensured that if genuine suspicious activity was sitting in the queue, it was more likely to be discovered quickly.
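This triage ordering can be sketched in a few lines. The scenario names and risk weights below are hypothetical stand-ins (the case does not publish Rafael's actual weights); the point is the sort key: scenario weight first, alert age as tiebreaker.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical risk weights per scenario (higher = reviewed first).
# Rafael derived his from historical true positive rates per scenario.
SCENARIO_WEIGHT = {
    "structuring_adjacent": 0.90,
    "ofac_country_wire": 0.85,
    "rapid_movement": 0.30,
    "fx_pattern": 0.10,
}

@dataclass
class Alert:
    alert_id: str
    scenario: str
    created: date

def triage_order(alerts):
    """Work highest-weighted scenarios first; break ties oldest-first."""
    return sorted(
        alerts,
        key=lambda a: (-SCENARIO_WEIGHT.get(a.scenario, 0.0), a.created),
    )

queue = [
    Alert("A1", "fx_pattern", date(2022, 3, 1)),
    Alert("A2", "ofac_country_wire", date(2022, 4, 20)),
    Alert("A3", "structuring_adjacent", date(2022, 4, 25)),
]
print([a.alert_id for a in triage_order(queue)])  # → ['A3', 'A2', 'A1']
```

Note that the newest alert (A3) jumps the queue because its scenario weight dominates, exactly the trade-off Rafael accepted: age no longer drives review order.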
Decision 2: Bulk review for low-weight scenarios
For the four scenarios with the lowest historical true positive rates, Rafael implemented a "bulk review" protocol: analysts reviewed five alerts from the scenario simultaneously, looking for whether any of the five showed characteristics beyond the triggering condition. If none did, all five were closed with a documented group rationale. This was a deliberate regulatory risk — he documented his justification and obtained CCO approval.
This reduced the effective review time for low-risk alerts from 20–25 minutes each to approximately 8 minutes each (in groups of five), freeing capacity for higher-risk alerts.
Decision 3: Customer-level exception documentation
For the law firm and the import/export business — both identified as chronic false positive generators — Rafael documented detailed customer-level exception rationales. Future alerts from these customers in the triggering scenarios were reviewed against the documented exception, reducing review time from 20 minutes to approximately 5 minutes per alert.
By the end of month 3: backlog reduced from 550 to 280. Not solved, but no longer growing.
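The backlog arithmetic behind these numbers is a simple queue model. The throughput figure used below (~362 alerts per week) is an assumption chosen to roughly reproduce the reported 550 → 280 trajectory over about 12 weeks, not a number from Meridian's records.

```python
def project_backlog(start, weeks, arrivals_per_week, reviews_per_week):
    """Simple queue model: backlog changes by arrivals minus completed reviews."""
    backlog = start
    for _ in range(weeks):
        backlog = max(0, backlog + arrivals_per_week - reviews_per_week)
    return backlog

# Before triage: 340 in, 120 out, so the backlog grows by 220 per week.
print(project_backlog(550, 1, 340, 120))   # → 770

# With bulk review and exception documentation freeing capacity, an assumed
# effective throughput of ~362 alerts/week over 12 weeks roughly reproduces
# the reported drop from 550 to 280.
print(project_backlog(550, 12, 340, 362))  # → 286
```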
Phase 2: Scenario Rationalization (Months 4–6)
With the immediate crisis stabilized, Rafael turned to the system design.
The scenario audit
Rafael conducted a formal audit of all 47 scenarios, documenting for each:
- Historical alert volume (weekly average)
- Historical true positive rate (estimated from sample reviews)
- Coverage overlap with other scenarios (were multiple scenarios effectively catching the same patterns?)
- Regulatory basis (was there a regulatory requirement or guidance document requiring this typology to be monitored?)
The findings were stark. Eight scenarios had not generated a single confirmed suspicious activity alert in the past 12 months. Twelve scenarios were partial duplicates of other scenarios. Five scenarios had been added in response to individual regulatory examination feedback but had never been calibrated against actual transaction data.
The rationalization decision
After discussion with the CCO and outside AML counsel, Rafael retired 19 scenarios outright (the eight zero-output scenarios plus 11 duplicates), modified 8 scenarios (adjusting thresholds based on historical population data), and retained 20 scenarios unchanged.
From 47 scenarios to 28: a 40% reduction. Projected impact: alert volume reduction from 340 to approximately 220 per week.
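The audit criteria can be expressed as a simple disposition rule. The field names and thresholds below are illustrative assumptions, not Meridian's actual policy; the structure mirrors the audit: regulatory basis first, then zero-output and duplicate checks, then recalibration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ScenarioAudit:
    name: str
    weekly_volume: float
    true_positive_rate: float       # estimated from sample reviews
    duplicate_of: Optional[str]     # overlapping scenario, if any
    regulatory_required: bool       # explicit regulatory basis?
    confirmed_hits_12mo: int        # confirmed suspicious alerts, past year

def disposition(s: ScenarioAudit) -> str:
    """Retire / modify / retain, mirroring the audit criteria (thresholds assumed)."""
    if s.regulatory_required:
        # Required typologies are never retired, but may need recalibration.
        return "modify" if s.true_positive_rate < 0.01 else "retain"
    if s.confirmed_hits_12mo == 0 or s.duplicate_of is not None:
        return "retire"
    if s.true_positive_rate < 0.02:
        return "modify"  # recalibrate thresholds against population data
    return "retain"

print(disposition(ScenarioAudit("fx_pattern", 40, 0.0, None, False, 0)))  # → retire
```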
He documented the rationale for every retirement and modification in a formal "Scenario Rationalization Memorandum" maintained in the AML program file — a critical paper trail for the next regulatory examination.
Phase 3: ML Alert Prioritization (Months 7–12)
The scenario rationalization had not solved the core problem: even at 220 alerts per week, the team was reviewing 120. The fundamental capacity gap remained.
Rafael evaluated three options:
1. Hire additional analysts — budget constrained; approved for one additional hire (not enough)
2. Implement a full ML replacement system — capital-intensive, long implementation, significant regulatory documentation burden
3. Deploy an ML triage layer — a scoring model that prioritizes the existing alert queue without replacing the underlying rules engine
He chose option 3.
The ML triage model
Working with an external RegTech vendor, Rafael deployed a gradient boosting model that scored each incoming alert (0.0–1.0) based on 45 features, including:
- Customer risk rating at time of alert
- Customer account age and relationship history
- Transaction counterparty characteristics (known counterparty, first-time counterparty, jurisdiction)
- Pattern context (was this alert triggered in isolation or alongside other unusual patterns?)
- Historical alert outcomes for this customer
- Velocity changes in the customer's transaction behavior (30-day, 90-day windows)
- Time-of-day and day-of-week features
- Deviation from the customer's established behavioral baseline
The model was trained on Meridian's historical SAR cases (54 cases over four years) and negative cases (confirmed closed alerts). The dataset was small: 54 confirmed positives is a very limited training set. To compensate, the vendor supplemented the training data with synthetic data generation and used transfer learning from a model pre-trained on a larger industry dataset.
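The vendor's exact pipeline is not described, but the core idea (a gradient boosting classifier producing a 0.0–1.0 score, with the rare positive class upweighted) can be sketched with scikit-learn on synthetic stand-in data. Everything here is an assumption for illustration: the feature values are random, and the vendor's actual mitigations (synthetic data generation, transfer learning) are beyond this sketch.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for Meridian's data: 45 features, a tiny positive class
# (54 SAR-linked alerts) against a large pool of confirmed-closed alerts.
n_neg, n_pos, n_features = 2000, 54, 45
X = rng.normal(size=(n_neg + n_pos, n_features))
X[n_neg:] += 0.8  # shift positives so there is signal to learn
y = np.concatenate([np.zeros(n_neg), np.ones(n_pos)])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Upweight the rare positive class via sample weights, and keep trees
# shallow to limit overfitting on so few positives.
w = np.where(y_tr == 1, n_neg / n_pos, 1.0)
model = GradientBoostingClassifier(max_depth=2, n_estimators=100, random_state=0)
model.fit(X_tr, y_tr, sample_weight=w)

scores = model.predict_proba(X_te)[:, 1]  # one 0.0–1.0 triage score per alert
```

With a positive class this small, the validation burden (holdout stability, calibration checks, independent model risk review) matters far more than the choice of learner, which is exactly where the later examination focused.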
Integration with the alert queue
The ML score did not generate or close alerts. It reordered the queue. An alert from a high-risk customer with a first-time offshore counterparty, appearing alongside a velocity change alert on the same account, received a high score and moved to the top of the analyst queue. An alert from a long-established customer with a stable counterparty relationship and no behavioral changes received a low score and moved to the bottom.
Analysts still reviewed every alert. The model just changed the order.
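A minimal sketch of that reordering, with assumed field names: nothing is created or closed, scored alerts rank highest-score-first, and alerts with no score (for example, when the model is unavailable) fall back to oldest-first ordering.

```python
from datetime import datetime

def reorder_queue(alerts, scores):
    """Reorder only: every alert stays in the queue for analyst review.
    Scored alerts come first, highest score on top; unscored alerts
    fall back to time-based (oldest-first) ordering."""
    def key(alert):
        s = scores.get(alert["id"])
        if s is None:
            return (1, 0.0, alert["received"])   # fallback bucket
        return (0, -s, alert["received"])        # scored bucket, high first
    return sorted(alerts, key=key)

queue = [
    {"id": "A1", "received": datetime(2023, 1, 3)},
    {"id": "A2", "received": datetime(2023, 1, 1)},
    {"id": "A3", "received": datetime(2023, 1, 2)},
]
print([a["id"] for a in reorder_queue(queue, {"A1": 0.82, "A3": 0.11})])
# → ['A1', 'A3', 'A2']
```

The same function degrades gracefully to pure time-based ordering when the score map is empty, which matches the fallback behavior the examiners later reviewed.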
Explainability implementation
Each alert in the analyst interface displayed a "top factors" section generated by SHAP values: the three features that contributed most to the alert's elevated score. An analyst reviewing an alert with a score of 0.82 would see: "Key risk factors: (1) first-time counterparty jurisdiction [high-risk], (2) 340% velocity increase vs. 90-day baseline, (3) customer risk rating elevated (High) for 6 months."
This reduced average alert review time by approximately 20%: analysts spent less time reconstructing context and more time making decisions.
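The "top factors" display reduces to ranking per-feature contribution values by magnitude. This sketch assumes the SHAP values have already been computed upstream and arrive as a feature-to-contribution mapping; the feature names are illustrative.

```python
def top_factors(contributions, n=3):
    """Return the n features contributing most to the score,
    ranked by magnitude of contribution."""
    ranked = sorted(contributions.items(), key=lambda kv: -abs(kv[1]))
    return [name for name, _ in ranked[:n]]

# Hypothetical SHAP-style contributions for one alert scoring 0.82:
shap_like = {
    "counterparty_jurisdiction_risk": 0.31,
    "velocity_vs_90d_baseline": 0.24,
    "customer_risk_rating": 0.12,
    "account_age": -0.03,
}
print(top_factors(shap_like))
# → ['counterparty_jurisdiction_risk', 'velocity_vs_90d_baseline', 'customer_risk_rating']
```

Ranking by absolute value matters: a strongly negative contribution (a factor pulling the score down) can be as useful to an analyst as a positive one.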
Results: 12 Months Post-Implementation
| Metric | Baseline (Mar 2022) | 12 Months Post (Mar 2023) |
|---|---|---|
| Weekly alert volume | 340 | 190 |
| Weekly review capacity | 120 | 145 (one additional hire) |
| Queue backlog | 550 (peak) | 38 |
| False positive rate | ~96% | ~78% |
| True positive detection rate | Baseline | Maintained at baseline |
| Average review time per alert | ~22 min | ~18 min |
| SAR filing rate per analyst-hour | Baseline | +35% |
| Scenario count | 47 | 28 |
The combination of scenario rationalization (alert volume down from 340 to 190 per week by month 12), one additional hire (capacity 120 to 145 per week), and shorter average review times brought the workload under control: the backlog fell from its 550-alert peak to 38. ML prioritization ensured genuine suspicious activity was reviewed before lower-risk alerts. Over the 12 months post-implementation, the team filed approximately 35% more SARs per analyst-hour than in the prior 12-month period.
Regulatory Examination: The Documentation Test
Six months after implementation, Meridian's primary banking regulator conducted a targeted AML examination focused on the transaction monitoring program.
The examination included:
- Review of the Scenario Rationalization Memorandum
- Sample review of 25 alert dispositions (including 5 closed alerts, 5 escalated, 2 SAR filings, and 13 alerts where the ML model had assigned high scores)
- Questions about the ML model's training, validation, and governance
The ML governance documentation was the most intensive part of the examination. Examiners reviewed:
- Training dataset description and limitations (including the acknowledgment of the small positive class)
- Model validation report (conducted by an independent internal model risk team)
- Ongoing monitoring protocols (model performance reviewed quarterly)
- Fallback procedures (if the ML model is unavailable, the queue defaults to time-based ordering)
The examination concluded with no findings related to the monitoring program. The examiner's written feedback noted the institution's "documented rationale for scenario rationalization" and "appropriate governance framework for the supplemental ML prioritization tool."
Rafael's note in the compliance program file after the examination: "Document everything. The examiner didn't care that we had an ML model. They cared that we knew what it was doing and why."
Discussion Questions
1. Rafael's bulk review protocol for low-risk scenarios required CCO approval and explicit documentation. What are the regulatory risks of this approach? What documentation would reduce those risks?
2. The ML triage model was trained on only 54 confirmed SAR cases — a very small positive class. What specific model validation steps would you implement to assess whether a model trained on such limited data is reliable enough for production use?
3. Rafael's ML model reordered the alert queue but did not auto-close any alerts. If he had instead implemented auto-closure of alerts below a certain ML score threshold (e.g., score < 0.2), what additional governance and documentation would be required?
4. The regulatory examiner focused significantly on ML model governance. Based on the examination described, design a one-page "ML Model Governance Summary" that a compliance officer could maintain in the AML program file to address likely examiner questions.
5. Meridian Capital's program achieved a false positive rate reduction from 96% to 78% — a meaningful improvement but still a very high rate of false positives. What are the diminishing returns in false positive reduction, and at what false positive rate does the investment in additional improvement (additional ML sophistication, additional tuning) stop generating meaningful compliance value?