In This Chapter
- Opening: Rafael's Alert Problem
- 7.1 The AML Framework: From FATF to Local Implementation
- 7.2 Transaction Monitoring: How It Works
- 7.3 Rules-Based Systems: Tuning, Thresholds, and Typologies
- 7.4 Machine Learning in Transaction Monitoring: What Changes
- 7.5 Managing Alert Volume: The False Positive Problem
- 7.6 Hybrid Approaches: Rules + AI in Production
- 7.7 Alert Review Workflows and Productivity Metrics
- Chapter Summary
Chapter 7: AML Transaction Monitoring: Rules-Based vs. AI-Driven Approaches
Opening: Rafael's Alert Problem
When Rafael Torres inherited responsibility for Meridian Capital's AML transaction monitoring program, the system was generating 340 alerts per week. His team of three analysts could review approximately 120 per week. The alert queue was growing.
The harder problem was not the volume, though. The harder problem was what the queue contained. Rafael ran an internal analysis on a three-month sample of the alert backlog: 550 alerts that had not yet been reviewed. Of those, he estimated that perhaps 12 to 15 represented genuine suspicious activity warranting further investigation. The other 535 or so were legitimate transactions that had tripped a threshold: a large cash deposit by a law firm holding client funds, a series of international wires for a legitimate import/export business, a cluster of deposits from a contractor who invoiced monthly and received payment from multiple clients.
In the weeks those 535 false positive alerts sat unreviewed in the queue, the 12–15 genuine alerts were also sitting there. Unreviewed. Unacted upon.
This is the central paradox of rules-based AML transaction monitoring: it generates alerts so prolifically that it obscures the very suspicious activity it is supposed to surface.
7.1 The AML Framework: From FATF to Local Implementation
Anti-money laundering (AML) is the body of law, regulation, and compliance practice designed to prevent and detect the use of the financial system for laundering the proceeds of crime.
The Three Stages of Money Laundering
Money laundering typically proceeds through three stages:
Placement: Introducing the proceeds of crime into the financial system. This is typically the highest-risk moment for the launderer — physical cash entering a bank in large amounts is conspicuous. Common placement techniques include: cash deposits (structured to avoid reporting thresholds), casino transactions, purchase of high-value goods (jewelry, art, real estate), and use of money service businesses.
Layering: Moving money through a series of transactions to obscure its origins. Typical techniques include wire transfers to multiple jurisdictions, conversion between currencies, purchase and sale of assets, and routing through shell companies. The goal is to break the audit trail between the placed funds and their criminal origin.
Integration: Reintroducing the laundered funds into the legitimate economy — investment in real estate, business operations, or financial assets — in a form that appears legitimate.
Transaction monitoring systems are most effective at detecting placement and layering, which involve suspicious transaction patterns. Integration is harder to detect through transaction monitoring alone.
The FATF Framework
The Financial Action Task Force's Forty Recommendations establish the international standard for AML/CFT (anti-money laundering and countering the financing of terrorism). Recommendation 20 provides that a financial institution that suspects, or has reasonable grounds to suspect, that funds are the proceeds of criminal activity or are related to terrorist financing should be required by law to report its suspicions promptly to the financial intelligence unit (FIU).
This reporting obligation — filing a Suspicious Activity Report (SAR) in the US, a Suspicious Transaction Report (STR) in many other jurisdictions — is the central output of an AML transaction monitoring program. The entire monitoring system exists to generate qualified referrals for SAR filing.
7.2 Transaction Monitoring: How It Works
A transaction monitoring system analyzes financial transactions — deposits, withdrawals, wire transfers, loan payments, trades — against a set of detection criteria, producing alerts for cases that warrant human investigation.
The Basic Architecture
        Financial Transactions
         (real-time or batch)
                   │
                   ▼
┌─────────────────────────────────────┐
│           DATA ENRICHMENT           │
│  Customer risk rating, account      │
│  history, counterparty information  │
└─────────────────────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────┐
│          DETECTION ENGINE           │
│  Rules / ML models / Hybrid         │
│  (produces alert score per tx)      │
└─────────────────────────────────────┘
                   │
             ┌─────┴──────┐
  Score > threshold   Score < threshold
             │            │
             ▼            ▼
        ALERT QUEUE    No action
             │
             ▼
┌─────────────────────────────────────┐
│            ALERT REVIEW             │
│  Analyst: investigate, close, or    │
│  escalate to SAR filing             │
└─────────────────────────────────────┘
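The flow in the diagram can be sketched in a few lines of Python. This is a minimal illustration rather than a production design: the `Transaction` fields, the `CUSTOMER_RISK` lookup, the toy `score` function, and the 0.5 threshold are all assumptions introduced here for the example.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    tx_id: str
    account_id: str
    amount: float


# Hypothetical enrichment data: customer risk rating per account.
CUSTOMER_RISK = {"A1": "high", "A2": "low"}

ALERT_THRESHOLD = 0.5  # assumed cut-off; real systems calibrate this


def enrich(tx: Transaction) -> dict:
    """Data enrichment: attach customer context to the raw transaction."""
    return {"tx": tx, "risk_rating": CUSTOMER_RISK.get(tx.account_id, "medium")}


def score(enriched: dict) -> float:
    """Detection engine stand-in: a toy score in [0, 1].

    A real engine would evaluate rules, an ML model, or both.
    """
    base = min(enriched["tx"].amount / 100_000, 1.0)
    uplift = {"low": 0.0, "medium": 0.1, "high": 0.3}[enriched["risk_rating"]]
    return min(base + uplift, 1.0)


def monitor(transactions):
    """Enrich and score each transaction; return alerts, highest score first."""
    alerts = []
    for tx in transactions:
        s = score(enrich(tx))
        if s > ALERT_THRESHOLD:
            alerts.append({"tx_id": tx.tx_id, "score": s})
    return sorted(alerts, key=lambda a: a["score"], reverse=True)


queue = monitor([
    Transaction("T1", "A1", 40_000),  # high-risk customer: uplift pushes it over
    Transaction("T2", "A2", 40_000),  # same amount, low-risk customer: no action
    Transaction("T3", "A3", 95_000),  # unknown customer defaults to medium risk
])
print([a["tx_id"] for a in queue])
```

The same amount produces an alert for one customer and not for the other, which is the point of the enrichment step: the detection engine scores the transaction in the context of who is transacting.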
The Alert Review Process
When an alert is generated, an analyst reviews it following a structured process:
- Alert review: What triggered the alert? What transaction(s) are involved?
- Customer review: What is the customer's profile? What is their risk rating? Is this consistent with their stated business purpose?
- Transaction context: Does the transaction make sense given the customer's profile? Are there explanations consistent with legitimate activity?
- Pattern analysis: Does this transaction form part of a pattern with other transactions?
- Decision: Close the alert (no suspicious activity identified), escalate to enhanced review, or file a SAR.
All steps must be documented. The audit trail of alert review decisions is the primary evidence of a functioning AML program.
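One way to make the documentation requirement concrete is a review record that refuses to accept a decision until every preceding step has a note attached. The sketch below is illustrative only: the `AlertReview` class, its field names, and the step identifiers are hypothetical and not taken from any particular case management product.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class Decision(Enum):
    CLOSE = "close"
    ESCALATE = "escalate"
    FILE_SAR = "file_sar"


# The four investigative steps that precede a decision in the list above.
REQUIRED_STEPS = ("alert_review", "customer_review",
                  "transaction_context", "pattern_analysis")


@dataclass
class AlertReview:
    alert_id: str
    analyst: str
    notes: dict = field(default_factory=dict)        # step name -> analyst note
    audit_trail: list = field(default_factory=list)  # (timestamp, analyst, event)
    decision: Optional[Decision] = None

    def document(self, step: str, note: str) -> None:
        if step not in REQUIRED_STEPS:
            raise ValueError(f"unknown step: {step}")
        self.notes[step] = note
        self._log(f"documented {step}")

    def decide(self, decision: Decision, rationale: str) -> None:
        # Refuse a decision while any step is still undocumented.
        missing = [s for s in REQUIRED_STEPS if s not in self.notes]
        if missing:
            raise ValueError(f"undocumented steps: {missing}")
        self.decision = decision
        self._log(f"decision={decision.value}: {rationale}")

    def _log(self, event: str) -> None:
        stamp = datetime.now(timezone.utc).isoformat()
        self.audit_trail.append((stamp, self.analyst, event))


review = AlertReview(alert_id="AL-001", analyst="analyst_1")
for step in REQUIRED_STEPS:
    review.document(step, "reviewed; see case file")
review.decide(Decision.CLOSE, "activity consistent with stated business")
print(len(review.audit_trail), review.decision)
```

Every call appends a timestamped entry to `audit_trail`, so the record of how a decision was reached is produced as a side effect of doing the review rather than reconstructed afterwards.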
7.3 Rules-Based Systems: Tuning, Thresholds, and Typologies
Rule-based transaction monitoring applies predefined detection criteria — "scenarios" — to transaction data. A scenario is a combination of rules that together define a suspicious pattern.
Typical AML Scenarios
Scenario 1: Currency Transaction Report (CTR) avoidance (structuring)
    Customer conducts multiple cash transactions in a single day
    AND total cash amount exceeds $10,000
    AND no individual transaction exceeds $10,000
    → Flag for structuring review

Scenario 2: Rapid in-and-out funds movement
    Account receives inbound wire(s) totaling > $X within Y days
    AND outbound wire(s) debit > 90% of inbound amount within Z days
    AND account balance returns to near-zero
    → Flag for rapid movement review

Scenario 3: Round-dollar international transfers
    International wire transfer where amount is a round number
    AND amount > $5,000
    AND destination country is in enhanced-scrutiny list
    → Flag for review
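Scenario 1 translates almost directly into code. The following is a minimal sketch, assuming cash transactions arrive as `(customer_id, day, amount)` tuples; the function name `flag_structuring` and the sample data are invented for illustration.

```python
from collections import defaultdict
from datetime import date

CTR_THRESHOLD = 10_000  # the $10,000 threshold used in Scenario 1


def flag_structuring(cash_txns):
    """Return sorted (customer_id, day) pairs that match Scenario 1.

    cash_txns: iterable of (customer_id, day, amount) tuples, cash only.
    """
    by_customer_day = defaultdict(list)
    for customer_id, day, amount in cash_txns:
        by_customer_day[(customer_id, day)].append(amount)

    flagged = []
    for key, amounts in by_customer_day.items():
        multiple = len(amounts) > 1                      # multiple transactions
        total_over = sum(amounts) > CTR_THRESHOLD        # total exceeds $10,000
        none_over = all(a <= CTR_THRESHOLD for a in amounts)  # no single one does
        if multiple and total_over and none_over:
            flagged.append(key)
    return sorted(flagged)


sample = [
    ("C1", date(2024, 3, 1), 4_000),
    ("C1", date(2024, 3, 1), 3_500),
    ("C1", date(2024, 3, 1), 3_000),   # C1: 10,500 across three deposits
    ("C2", date(2024, 3, 1), 12_000),  # single large deposit, not structuring
    ("C3", date(2024, 3, 1), 2_000),
    ("C3", date(2024, 3, 2), 9_000),   # split across two days
]
flagged = flag_structuring(sample)
print(flagged)
```

Customer C3 is not flagged because the scenario as written only looks within a single day. That is a limitation of the rule's definition, not of the code, and it illustrates why time windows are one of the parameters that need tuning.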
The Tuning Challenge
Each scenario has parameters — thresholds, time windows, percentage ratios — that must be calibrated. Too sensitive: excessive false positives. Too lenient: suspicious activity missed.
Calibration is typically done through a process of:
- Historical analysis: looking at past confirmed suspicious activity to understand what parameters would have flagged it
- Sampling: reviewing a random sample of transactions that were flagged and not flagged to estimate false positive and true positive rates
- Peer benchmarking: comparing scenario performance metrics to industry averages
The tuning challenge is compounded by the fact that customer and transaction populations change over time. A threshold calibrated for a bank's 2020 customer base may be wrong for its 2024 customer base after significant growth or product expansion.
"""
AML Scenario Tuning Analysis
This example shows how to analyze the impact of different
threshold settings on alert volume and detection rate.
"""
import pandas as pd


def analyze_rapid_movement_scenario(
    transactions_df: pd.DataFrame,
    known_suspicious_ids: list,
    inbound_threshold: float = 50_000,
    outflow_percentage: float = 0.90,
    time_window_days: int = 3
) -> dict:
    """
    Analyze the Rapid In-and-Out scenario with given parameters.
    Expects columns: account_id, transaction_date (datetime), direction
    ('CREDIT' or 'DEBIT'), and amount.
    Returns: alert statistics including estimated false positive rate.
    Note: the "balance returns to near-zero" condition is not checked here.
    """
    known_suspicious = set(known_suspicious_ids)  # set gives fast membership tests
    flagged_accounts = set()
    # Group transactions by account
    for account_id, txns in transactions_df.groupby('account_id'):
        # Check each possible time window, starting at each transaction in date order
        txns = txns.sort_values('transaction_date')
        for _, row in txns.iterrows():
            window_end = row['transaction_date'] + pd.Timedelta(days=time_window_days)
            window_txns = txns[
                (txns['transaction_date'] >= row['transaction_date']) &
                (txns['transaction_date'] <= window_end)
            ]
            inbound = window_txns[window_txns['direction'] == 'CREDIT']['amount'].sum()
            outbound = window_txns[window_txns['direction'] == 'DEBIT']['amount'].sum()
            # inbound > 0 guards the division if inbound_threshold is set to 0
            if (inbound > 0 and inbound >= inbound_threshold
                    and outbound / inbound >= outflow_percentage):
                flagged_accounts.add(account_id)
                break  # one qualifying window is enough to flag the account
    true_positives = flagged_accounts & known_suspicious
    false_positives = flagged_accounts - known_suspicious
    # What did we miss?
    missed = known_suspicious - flagged_accounts
    return {
        'parameters': {
            'inbound_threshold': inbound_threshold,
            'outflow_percentage': outflow_percentage,
            'time_window_days': time_window_days,
        },
        'total_flagged': len(flagged_accounts),
        'true_positives': len(true_positives),
        'false_positives': len(false_positives),
        'missed_suspicious': len(missed),
        # Share of flagged accounts that are not known-suspicious (1 - precision)
        'false_positive_rate': len(false_positives) / len(flagged_accounts)
        if flagged_accounts else 0,
        'recall': len(true_positives) / len(known_suspicious)
        if known_suspicious else 0,
    }


def find_optimal_threshold(transactions_df: pd.DataFrame,
                           known_suspicious_ids: list) -> pd.DataFrame:
    """
    Test multiple threshold combinations to find the optimal setting
    that balances recall (detection rate) against false positive volume.
    """
    results = []
    thresholds = [25_000, 50_000, 75_000, 100_000]
    percentages = [0.80, 0.85, 0.90, 0.95]
    windows = [3, 5, 7]
    for threshold in thresholds:
        for pct in percentages:
            for window in windows:
                result = analyze_rapid_movement_scenario(
                    transactions_df,
                    known_suspicious_ids,
                    inbound_threshold=threshold,
                    outflow_percentage=pct,
                    time_window_days=window
                )
                results.append(result)
    results_df = pd.DataFrame(results)
    # Best candidates first: highest recall, then lowest false positive rate
    results_df = results_df.sort_values(['recall', 'false_positive_rate'],
                                        ascending=[False, True])
    return results_df[['parameters', 'total_flagged', 'true_positives',
                       'false_positives', 'false_positive_rate', 'recall']]
7.4 Machine Learning in Transaction Monitoring: What Changes
When ML replaces or augments rule-based detection, the fundamental architecture changes:
Rules-based approach: Predefined criteria applied to each transaction → binary flag (alert/no alert).
ML approach: Features extracted from each transaction and its context → probability score (0.0–1.0) → threshold applied to score to generate alert.
The critical differences:
What ML adds: The ability to detect complex, non-linear patterns that rules cannot express. A rules-based system can check whether a transaction amount is above a threshold. An ML system can learn that the combination of: amount in a specific range + unusual hour of day + third-party counterparty risk level + recent velocity change + customer account age creates a specific risk signature, even if no single factor exceeds any threshold.
What rules add: Determinism and explainability. A rules-based alert can be explained exactly: "This transaction was flagged because the account received $48,000 in four transactions over two days, then sent $46,500 in three transactions to an offshore account." An ML alert can say: "The model assigned a score of 0.87 based on features including [list of top features]." The explanation is less precise for ML, though explainability techniques (SHAP, Chapter 26) can improve this significantly.
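The scoring pipeline described above (features, then a probability score, then a threshold) can be illustrated with a hand-weighted logistic score. This is a deliberately simple stand-in for a trained model: the feature names, weights, bias, and threshold below are invented for the example, and a linear score like this one cannot capture the non-linear interactions a real model might learn. The per-feature contributions are also not SHAP values; they are simply weight times feature value, which is enough to show the form an ML explanation takes.

```python
import math

# Hypothetical weights and bias; a real model would learn these from labelled cases.
WEIGHTS = {
    "amount_zscore": 0.9,
    "unusual_hour": 1.2,
    "counterparty_risk": 1.5,
    "velocity_change": 1.1,
    "new_account": 0.8,
}
BIAS = -4.0
ALERT_THRESHOLD = 0.5  # assumed cut-off on the probability score


def contributions(features):
    """Per-feature contribution to the log-odds (weight times value)."""
    return {name: WEIGHTS[name] * features.get(name, 0.0) for name in WEIGHTS}


def risk_score(features):
    """Map features to a score in (0, 1) with the logistic function."""
    z = BIAS + sum(contributions(features).values())
    return 1.0 / (1.0 + math.exp(-z))


def explain(features, top_n=3):
    """Return the top_n features by contribution, largest first."""
    ranked = sorted(contributions(features).items(),
                    key=lambda kv: kv[1], reverse=True)
    return ranked[:top_n]


# Every signal mildly elevated at once.
combined = {name: 1.0 for name in WEIGHTS}
# Only one signal elevated.
single = {"counterparty_risk": 1.0}

print(round(risk_score(combined), 3), risk_score(combined) > ALERT_THRESHOLD)
print(round(risk_score(single), 3), risk_score(single) > ALERT_THRESHOLD)
print([name for name, _ in explain(combined)])
```

With these assumed weights, no single elevated feature is enough to cross the threshold, but the combination is. The output of `explain` is the kind of "top features" list that accompanies an ML alert.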
The Hybrid Approach
In practice, most sophisticated AML monitoring programs use a hybrid approach:
- Rules-based layer: Catches known typologies and regulatory-required scenarios (CTR structuring, OFAC-adjacent patterns)
- ML layer: Identifies unusual patterns not captured by rules
- Priority-weighted alert queue: Combines rule-based alerts (with known typology labels) and ML-based alerts (with risk scores) into a single queue prioritized by risk
This hybrid architecture captures the transparency of rules (important for regulatory examination) while benefiting from ML's ability to detect novel patterns.
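A priority-weighted queue of this kind might be assembled as follows. The sketch assumes rule alerts arrive as `(account_id, typology)` pairs and ML output as a score per account; the base priorities in `TYPOLOGY_PRIORITY`, the `ml_threshold` default, and the merge policy (take the higher of the two signals) are all illustrative choices, not a recommended calibration.

```python
from dataclasses import dataclass

# Assumed base priorities per rule typology (illustrative values only).
TYPOLOGY_PRIORITY = {
    "structuring": 0.8,
    "rapid_movement": 0.7,
    "round_dollar_wire": 0.4,
}


@dataclass
class QueueItem:
    account_id: str
    priority: float
    sources: list


def build_queue(rule_alerts, ml_scores, ml_threshold=0.6):
    """Merge rule alerts and ML scores into one queue sorted by priority.

    rule_alerts: list of (account_id, typology) pairs
    ml_scores:   dict of account_id -> model score in [0, 1]
    """
    items = {}
    for account_id, typology in rule_alerts:
        item = items.setdefault(account_id, QueueItem(account_id, 0.0, []))
        item.priority = max(item.priority, TYPOLOGY_PRIORITY.get(typology, 0.5))
        item.sources.append(f"rule:{typology}")
    for account_id, s in ml_scores.items():
        if account_id in items:
            # A rule alert already exists: the ML score can only raise its priority.
            items[account_id].priority = max(items[account_id].priority, s)
            items[account_id].sources.append(f"ml:{s:.2f}")
        elif s >= ml_threshold:
            # ML-only alert: created only when the score clears the threshold.
            items[account_id] = QueueItem(account_id, s, [f"ml:{s:.2f}"])
    return sorted(items.values(), key=lambda i: i.priority, reverse=True)


queue = build_queue(
    rule_alerts=[("A1", "round_dollar_wire"), ("A2", "structuring")],
    ml_scores={"A1": 0.91, "A3": 0.75, "A4": 0.20},
)
for item in queue:
    print(item.account_id, item.priority, item.sources)
```

Keeping the `sources` list on each item preserves the typology label from the rules layer alongside the ML score, so an analyst (or an examiner) can see why the alert exists as well as where it ranks.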
7.5 Managing Alert Volume: The False Positive Problem
The false positive problem is not just an operational inconvenience. It is a compliance risk in its own right, because analysts overwhelmed with false positive alerts inevitably spend less time on each review — and may miss genuine suspicious activity.
Why False Positive Rates Are High
Threshold sensitivity: Rules must be calibrated conservatively to ensure high recall. This conservative calibration inevitably catches many legitimate transactions.
Population mismatch: Scenarios calibrated for a broad customer population may be far too sensitive for specific customer segments (e.g., a large cash business scenario is calibrated for average customers, not for a restaurant that handles large legitimate cash volumes).
Stale scenarios: As customer profiles and transaction patterns evolve, scenarios calibrated on historical data become less accurate.
Lack of context: Rules-based alerts evaluate a transaction, or a short window of transactions, in isolation, without the context of the customer's broader activity that might explain it.
Strategies for False Positive Reduction
Customer segmentation: Apply different scenario configurations for different customer segments. A restaurant should not trigger the same cash-monitoring scenarios as a retail customer.
Threshold tuning: Regular review and adjustment of scenario thresholds based on analysis of alert outcomes.
ML-enhanced alert triage: Even where rules generate the initial alerts, ML can be used to prioritize the alert queue — scoring alerts by likelihood of genuine suspicion based on contextual features.
Negative news integration: Pre-filtering alerts from customers with no adverse media, no sanctions matches, and stable account histories — where the base rate of genuine suspicion is very low.
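Customer segmentation, the first strategy above, often comes down to a configuration lookup: the same scenario logic runs with different parameters per segment. The sketch below assumes a single daily cash monitoring threshold per segment. The segment names and dollar values are hypothetical and would in practice come out of the tuning analysis, not be hard-coded.

```python
# Hypothetical per-segment daily cash monitoring thresholds (illustrative values).
SEGMENT_CASH_THRESHOLDS = {
    "retail": 10_000,
    "restaurant": 40_000,
    "law_firm_client_account": 100_000,
}
DEFAULT_THRESHOLD = 10_000  # unknown segments fall back to the strictest setting


def cash_alert(segment: str, daily_cash_total: float) -> bool:
    """Return True if the day's cash total exceeds the segment's threshold."""
    threshold = SEGMENT_CASH_THRESHOLDS.get(segment, DEFAULT_THRESHOLD)
    return daily_cash_total > threshold


deposits = [
    ("retail", 15_000),
    ("restaurant", 15_000),
    ("restaurant", 55_000),
    ("unclassified", 15_000),
]
alerts = [cash_alert(segment, total) for segment, total in deposits]
print(alerts)
```

The restaurant's 15,000 day no longer alerts, while its 55,000 day still does. Falling back to the strictest threshold for unclassified customers is a conservative design choice: a gap in segmentation data should produce more review, not less.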
7.6 Hybrid Approaches: Rules + AI in Production
Rafael's eventual solution for Meridian Capital was a hybrid system:
The existing rules-based scenarios were retained but rationalised — he reduced from 47 scenarios to 28, eliminating scenarios with historically near-zero true positive rates. The remaining 28 were recalibrated based on the historical analysis.
An ML model was added that scored all transactions using 45 features. The model had been trained on Meridian's historical SAR cases (enriched with FinCEN SAR feedback where available). The model produced a risk score for each transaction, used to prioritize the alert queue.
The combined result: a reduction in weekly alerts from 340 to 190 (from scenario rationalization and threshold tuning), with the ML priority scoring ensuring that analysts reviewed the highest-risk alerts first.
Post-implementation metrics:
- False positive rate: 78% (vs. prior 96%)
- True positive detection rate: maintained at baseline
- Time to review per alert: reduced by 20% (analysts were better prepared by the ML-generated risk narrative)
- SAR filing rate: increased by 35% per analyst hour (more genuine cases reached investigation)
7.7 Alert Review Workflows and Productivity Metrics
The quality of an AML program cannot be assessed solely by the technology. The human review workflow — how analysts receive, investigate, and dispose of alerts — is equally important.
Key Workflow Components
Alert queue management: How are alerts assigned to analysts? What is the priority scheme? How are backlogs managed?
Case investigation tools: What information does an analyst have access to? Can they see the full transaction history? Can they see related accounts? Can they access external databases (corporate registries, press archives)?
Documentation requirements: What must the analyst document for each alert reviewed? Minimum requirements typically include: alert reviewed (Y/N), customer transaction history reviewed, sanctions/PEP check confirmed, reason for closure or escalation.
Quality assurance: What QA process reviews analyst decisions? Peer review? Manager review? Automated consistency checking?
Productivity Metrics That Matter
| Metric | What It Measures | Target Range |
|---|---|---|
| Alerts reviewed per analyst per day | Throughput | 25–50 (varies with alert complexity) |
| Average time to review per alert | Efficiency | 15–45 minutes |
| False positive rate (confirmed through QA) | Alert quality | <90% (industry: ~95%) |
| SAR filing rate as % of alerts reviewed | Conversion rate | 0.5–3% |
| Queue age (oldest unreviewed alert) | Backlog management | <30 days |
| Rework rate (QA-flagged decisions) | Quality | <5% |
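Several of these metrics can be computed directly from a log of reviewed alerts. The following is a minimal sketch with a made-up record layout: each reviewed alert is a `(minutes_spent, outcome, qa_flagged)` tuple, and every alert closed without a SAR is treated as a false positive, which is a simplifying assumption.

```python
from datetime import date


def workflow_metrics(reviewed, open_alert_dates, today):
    """Compute review metrics from simple records.

    reviewed:         list of (minutes_spent, outcome, qa_flagged) tuples,
                      where outcome is "closed" or "sar_filed"
    open_alert_dates: creation dates of alerts still awaiting review
    today:            the date on which queue age is measured
    """
    n = len(reviewed)
    if n == 0:
        raise ValueError("no reviewed alerts to measure")
    closed = sum(1 for _, outcome, _ in reviewed if outcome == "closed")
    sars = sum(1 for _, outcome, _ in reviewed if outcome == "sar_filed")
    reworked = sum(1 for _, _, qa_flagged in reviewed if qa_flagged)
    return {
        "avg_minutes_per_alert": sum(m for m, _, _ in reviewed) / n,
        "false_positive_rate": closed / n,   # assumes closed == false positive
        "sar_filing_rate": sars / n,
        "rework_rate": reworked / n,
        # Age in days of the oldest unreviewed alert; 0 if the queue is empty.
        "queue_age_days": max(((today - d).days for d in open_alert_dates),
                              default=0),
    }


reviewed = [
    (20, "closed", False),
    (35, "closed", False),
    (45, "sar_filed", False),
    (15, "closed", True),   # QA sent this decision back for rework
]
metrics = workflow_metrics(
    reviewed,
    open_alert_dates=[date(2024, 5, 1), date(2024, 5, 20)],
    today=date(2024, 5, 25),
)
print(metrics)
```

A four-alert sample is far too small to compare against the target ranges in the table. The point is only that each metric has a precise definition that can be computed from the review log, which makes it auditable.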
Chapter Summary
AML transaction monitoring is the mechanism by which financial institutions identify suspicious activity that warrants SAR filing. Its central operational challenge is the false positive problem: most alerts require human review but represent legitimate activity.
The AML framework centers on the obligation to file SARs when suspicious activity is identified — driven by FATF standards and implemented through domestic AML law.
Money laundering stages — placement, layering, integration — provide the conceptual framework for understanding what monitoring systems are looking for.
Rules-based monitoring is transparent and auditable but generates high false positive rates and cannot detect patterns its rules don't define.
ML-based monitoring can detect complex patterns and reduce false positive rates, but requires training data, governance, and explainability attention.
Hybrid approaches combine the auditability of rules with the sophistication of ML — the practical solution for most institutions.
Alert workflow management — how alerts are reviewed, documented, and dispositioned — is as important to AML program quality as the detection technology.