Case Study 31-2: Priya's Cross-Border GFIN Application
Background
Priya Nair had been advising RegTech firms for long enough to recognize the moment when a client's problem was more interesting than it first appeared. The call that began this engagement started simply enough: a Singapore-based RegTech firm called Triage Analytics had built an AI-driven AML alert triage system and wanted to enter the UK market. Could Priya advise on FCA authorization requirements?
The more she heard, the more she understood why the authorization question was the easy part.
Triage Analytics had developed a machine learning system that addressed one of the most persistent problems in AML compliance: alert fatigue. A typical bank's transaction monitoring system generates tens of thousands of alerts per month; the vast majority — industry estimates range from 90% to 99% — are false positives. AML investigators spend most of their time confirming that innocent transactions are innocent, which means they spend very little time investigating the genuinely suspicious ones. Alert fatigue was not just an efficiency problem; it was a compliance problem. When investigators were overwhelmed by volume, genuinely suspicious alerts got lost in the queue.
Triage Analytics' system sat between the transaction monitoring system and the human investigation team. It analyzed flagged transactions against a broad set of typological indicators — payment network structure, counterparty relationships, transaction timing patterns, behavioral baselines — and assigned each alert a triage priority score. Alerts scored above a defined threshold went immediately to human investigators. Alerts scored below the threshold were automatically closed with a detailed documentation trail. The system did not make final SAR (Suspicious Activity Report) decisions — that judgment was always human — but it compressed the effective alert volume, routing genuine signals to human attention and closing clear false positives automatically.
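The routing logic described above can be sketched in a few lines. This is a minimal illustration under assumed names (`Alert`, `TriageRouter`, the score scale), not Triage Analytics' actual implementation; the essential points are that the threshold decides routing and that every automated closure carries a documentation trail.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class Alert:
    alert_id: str
    priority_score: float  # produced by the ML model; assume 0.0-1.0
    audit_trail: list = field(default_factory=list)

class TriageRouter:
    def __init__(self, close_threshold: float):
        self.close_threshold = close_threshold
        self.human_queue = []   # alerts routed to investigators
        self.auto_closed = []   # alerts closed automatically

    def route(self, alert: Alert) -> str:
        if alert.priority_score >= self.close_threshold:
            self.human_queue.append(alert)
            return "human_review"
        # Auto-close with a documentation trail. No SAR decision is
        # made here -- that judgment always stays with a human.
        alert.audit_trail.append({
            "action": "auto_closed",
            "score": alert.priority_score,
            "threshold": self.close_threshold,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        })
        self.auto_closed.append(alert)
        return "auto_closed"
```

Note that the threshold comparison is inclusive on the human-review side: a borderline alert goes to an investigator, never to automated closure.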
The technology had been developed and piloted in Singapore, where Triage's two anchor clients were MAS-supervised banks. MAS had been watching the development with interest. The volume of AML alerts generated by Singapore's banks was a known supervisory concern, and a technology that demonstrably reduced false positives without increasing missed suspicious activity would address a problem MAS had been trying to solve through supervisory guidance for several years.
The UK ambition was logical: the same alert fatigue problem existed in every major financial center. FCA-supervised banks generated alert volumes that overwhelmed their AML teams just as Singapore's did. If Triage could demonstrate that its system worked across regulatory jurisdictions — with different transaction monitoring systems, different AML typology databases, different suspicious activity reporting frameworks — it would be genuinely marketable to any bank in any major jurisdiction.
The catch was the phrase "if Triage could demonstrate." Nobody had run a live cross-border AML triage AI system under dual-regulator oversight before. MAS and the FCA both had legitimate concerns about an AI system that was making automated decisions about which AML alerts would and would not receive human review. The concerns were different in detail but shared a common structure: how do you ensure that the system's triage decisions are accurate, consistent, explainable, and auditable — and how do you establish this to two regulators' satisfaction simultaneously?
Priya's recommendation, after a month of preliminary analysis, was to use the GFIN cross-border testing framework.
Structuring the GFIN Application
The GFIN cross-border testing process requires simultaneous applications to the member regulators who will participate in the test. Triage Analytics would apply to both the MAS (its home regulator) and the FCA (its target market regulator), with the applications coordinated so that both regulators received the same core technology description and the same proposed test parameters, adapted for their specific regulatory frameworks.
Priya managed the FCA application directly. Triage's Singapore counsel managed the MAS application. The two teams held weekly coordination calls to ensure that the applications were genuinely consistent — not just formally concurrent — and that neither application made commitments to one regulator that would be impossible to honor alongside the commitments made to the other.
The core applications addressed the same five eligibility criteria from each regulator's perspective:
Genuine innovation. An AI triage system that automatically closed AML alerts below a defined threshold, with a complete documentation trail for each automated closure, was a genuine first for both UK and Singapore regulated markets. Neither the FCA nor MAS had published guidance on automated AML alert closure — the published frameworks assumed human review of all alerts above the transaction monitoring threshold.
Consumer benefit. This criterion required careful framing in both applications. The direct beneficiaries of AML triage were not consumers in the conventional sense — they were the banks' AML investigators and, ultimately, the financial system's ability to identify genuine financial crime. The consumer benefit case was indirect: by reducing the false positive burden on human investigators, the system would make the investigators more effective at detecting genuine suspicious activity, which served the public interest in preventing financial crime. Both MAS and the FCA accepted this framing, though the MAS application was able to draw on two years of pilot data from MAS-supervised banks to quantify the benefit more precisely than the FCA application could.
Need for sandbox. Both regulators' AML frameworks assumed human review of flagged transactions. Automated closure of alerts — even clearly benign ones, even with full documentation — was an activity that neither regulator had a framework for authorizing through standard processes. The sandbox was necessary.
Jurisdictional nexus. MAS: Triage was Singapore-incorporated, and the test clients would be MAS-supervised banks. FCA: the test clients would be FCA-supervised banks conducting AML activities in the UK under FCA and HMRC supervision.
Ready to test. Triage had two years of operational data from the Singapore pilot. The technology was validated.
The Divergent Sandbox Parameters
The regulatory parameters that emerged from the two sandbox assessments were where the cross-border complexity became real.
On automated closure scope:
The MAS sandbox permitted automated closure of alerts that scored below the 30th percentile of the triage model's priority distribution, subject to a complete documentation trail and a minimum daily random sample audit (10% of all auto-closed alerts reviewed by a human investigator). MAS was comfortable with this threshold because the two-year Singapore pilot had demonstrated empirically that alerts in this range had a verified false positive rate exceeding 99.5%.
The FCA was more conservative. The FCA sandbox permitted automated closure only for alerts in the bottom 15th percentile. The FCA's position was that it was approving an activity that had no precedent in its regulatory framework, and that a more conservative threshold was appropriate for the first test in its jurisdiction, even recognizing that the MAS experience provided supporting evidence. Additionally, the FCA required that the random sample audit cover 20% of auto-closed alerts rather than 10%.
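The two jurisdictions' parameters can be expressed compactly. The percentile cutoffs (30th for MAS, 15th for FCA) and audit sample rates (10% versus 20%) come from the sandbox terms above; the implementation itself is a hypothetical sketch using Python's standard library.

```python
import random
import statistics

# Sandbox parameters from the two regulators' terms.
PARAMS = {
    "MAS": {"close_percentile": 30, "audit_sample_rate": 0.10},
    "FCA": {"close_percentile": 15, "audit_sample_rate": 0.20},
}

def score_cutoff(scores, percentile):
    """Score below which alerts are eligible for automated closure.

    statistics.quantiles(n=100) returns the 1st..99th percentile
    boundaries of the score distribution.
    """
    return statistics.quantiles(scores, n=100)[percentile - 1]

def daily_audit_sample(auto_closed_ids, jurisdiction, seed=None):
    """Random sample of auto-closed alerts for human re-review."""
    rate = PARAMS[jurisdiction]["audit_sample_rate"]
    k = max(1, round(len(auto_closed_ids) * rate))
    return random.Random(seed).sample(auto_closed_ids, k)
```

A real deployment would need to decide how the percentile is estimated (trailing window versus fixed reference distribution); the sketch leaves that choice open.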
This created a genuine operational challenge. Triage's system was calibrated to a single threshold — the 30th percentile, which had been established as the appropriate cut-off through the Singapore pilot. Running the same system at different thresholds in different jurisdictions was technically possible, but it meant that the UK test and the Singapore test were not testing the same model. The cross-border test was, in effect, two parallel single-jurisdiction tests with different parameters.
Priya worked with the MAS and FCA case officers on a resolution: Triage would run the UK test at the FCA's 15th percentile threshold, with a parallel shadow analysis that tracked what the outcomes would have been at the 30th percentile threshold, for review by both regulators at the end of the test. This allowed the FCA test to be conducted within FCA-approved parameters while generating the data that would allow both regulators to assess whether the threshold should be aligned in any future authorization.
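The shadow-analysis arrangement can be sketched as a per-alert comparison: the live decision uses the FCA's 15th percentile cutoff, while a parallel log records what the 30th percentile cutoff would have done. Function and field names here are illustrative assumptions.

```python
def shadow_classify(alert_score, fca_cutoff, mas_cutoff):
    """Return the live (FCA-threshold) decision and the shadow
    (MAS-threshold) decision for one alert."""
    live = "auto_closed" if alert_score < fca_cutoff else "human_review"
    shadow = "auto_closed" if alert_score < mas_cutoff else "human_review"
    return {
        "live_decision": live,        # what actually happens in the UK test
        "shadow_decision": shadow,    # what the 30th percentile would do
        "divergent": live != shadow,  # alerts in the 15th-30th band
    }
```

The divergent alerts (those between the two cutoffs) are exactly the population the end-of-test review would examine to decide whether the thresholds should be aligned.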
On explainability requirements:
MAS required that Triage's system provide a structured explanation for each triage decision — a summary of the factors that drove the priority score, expressed in terms that an AML investigator could evaluate and, if necessary, override. MAS specified that the explanation must identify the three highest-weighted features in the model's score for each alert.
The FCA's explainability requirement was broader. The FCA required that the system's explanations be comprehensible not just to AML investigators but to a compliance officer reviewing the audit trail — a person who might have limited technical background. The FCA required that explanation text avoid model-specific jargon and express the triage logic in terms of observable transaction characteristics.
These two requirements were compatible in principle but created design tension in practice. An explanation optimized for a technically sophisticated investigator (MAS) was not the same as an explanation optimized for a compliance officer without technical background (FCA). Triage designed a two-tier explanation format — a technical summary for investigators and a plain-English summary for compliance review — which satisfied both requirements at the cost of additional design work.
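A two-tier explanation of this kind might look like the following. MAS's requirement (the three highest-weighted features) and the FCA's requirement (plain English, observable transaction characteristics) are from the sandbox terms; the feature names and the plain-language mapping are invented for illustration.

```python
# Hypothetical feature-to-plain-language mapping for the FCA tier.
PLAIN_LANGUAGE = {
    "counterparty_degree": "number of distinct counterparties involved",
    "velocity_zscore": "how unusual the transaction speed is for this customer",
    "round_amount_ratio": "how often amounts are suspiciously round numbers",
}

def explain(feature_weights: dict) -> dict:
    """Build the technical (MAS) and plain-English (FCA) explanations
    from the model's per-alert feature weights."""
    # MAS tier: the three highest-weighted features, by magnitude.
    top3 = sorted(feature_weights.items(), key=lambda kv: -abs(kv[1]))[:3]
    technical = [{"feature": f, "weight": w} for f, w in top3]
    # FCA tier: the same factors, without model-specific jargon.
    plain = "This alert was prioritised mainly because of: " + "; ".join(
        PLAIN_LANGUAGE.get(f, f) for f, _ in top3
    ) + "."
    return {"technical_summary": technical, "plain_english_summary": plain}
```

The design point is that both tiers are generated from the same underlying factors, so the audit trail stays internally consistent between the two audiences.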
On SAR filing impact:
The MAS sandbox required Triage to monitor and report the rate at which alerts that had been auto-closed by the system were subsequently re-raised by investigators reviewing the documentation trail or by other intelligence sources. If an alert that was auto-closed was subsequently the subject of a SAR filing, that was a model error event requiring escalation.
The FCA had the same requirement, but also added a requirement to report any instance where an auto-closed alert was related to a transaction subsequently investigated by law enforcement. This second requirement was harder to monitor in real time — FCA-supervised banks do not always know that a customer has become the subject of law enforcement interest until well after the relevant transactions — and required a retrospective review mechanism rather than a real-time monitoring mechanism.
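The two error-monitoring mechanisms reduce to the same set operation run on different schedules. The MAS check (auto-closed alert later subject to a SAR) can run in near real time; the FCA check runs as a periodic retrospective sweep, since banks learn of law enforcement interest only later. All names below are illustrative.

```python
def realtime_error_events(auto_closed_ids, sar_filed_ids):
    """MAS-style check: auto-closed alerts later subject to a SAR.
    Each hit is a model error event requiring escalation."""
    return sorted(set(auto_closed_ids) & set(sar_filed_ids))

def retrospective_review(auto_closed_ids, law_enforcement_ids):
    """FCA-style periodic sweep: auto-closed alerts related to
    transactions later investigated by law enforcement."""
    return sorted(set(auto_closed_ids) & set(law_enforcement_ids))
```

The code is deliberately identical in shape: what differs in practice is when the second ID set becomes available, which is why the FCA requirement forced a retrospective mechanism.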
What Cross-Border Testing Revealed
The twelve-month test ran in both jurisdictions simultaneously, with both regulators receiving quarterly progress reports and with a joint regulator call at the six-month mark — a mechanism facilitated by GFIN's coordination infrastructure.
The most important finding from cross-border testing was one that neither a Singapore-only test nor a UK-only test could have generated: the model's triage performance was materially different across the two jurisdictions, and the source of the difference was identifiable.
The model's alert priority scores were calibrated on Singapore transaction patterns — the typological indicators, network structures, and behavioral baselines that characterized suspicious activity in Singapore's financial system. The Singapore pilot data on which the model had been trained reflected MAS-supervised banks' customer bases, transaction types, and financial crime typologies.
When the model was applied to FCA-supervised bank transaction data, it encountered a different distribution of typological indicators. UK financial crime typologies — particularly in the areas of fraud-enabled money laundering and trade-based money laundering — had structural patterns that differed from their Singapore equivalents. The model's triage scores in the UK were less well calibrated than in Singapore: the bottom 15th percentile in the UK included some alerts that, in the joint regulator review, both sets of case officers agreed should not have been auto-closed.
This was precisely the finding that single-jurisdiction testing would not have produced. In Singapore, the model performed well because it had been trained on Singapore data. A Singapore-only test would have confirmed the model's Singapore performance and provided no information about its transferability. The cross-border test revealed that the model needed jurisdiction-specific calibration before it could be deployed in new markets — a finding with significant implications for any AML AI vendor seeking to sell to banks in multiple jurisdictions.
Triage restructured its product architecture in response: the core triage engine remained global, but the typological indicator weights were calibrated per-jurisdiction using local training data, with a minimum six-month calibration period on local bank data required before any new jurisdiction deployment.
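The restructured architecture might be expressed as a per-jurisdiction calibration record gated on a minimum calibration window. The six-month minimum and the global-engine/local-weights split are from the case; the class and field names are assumptions.

```python
from datetime import date, timedelta

MIN_CALIBRATION_DAYS = 183  # roughly the six-month minimum on local data

class JurisdictionCalibration:
    """Per-jurisdiction typological indicator weights layered on top
    of the global triage engine."""

    def __init__(self, jurisdiction: str, calibration_start: date):
        self.jurisdiction = jurisdiction
        self.calibration_start = calibration_start
        self.indicator_weights = {}  # learned from local bank data

    def ready_for_deployment(self, today: date) -> bool:
        # Deployment is gated on both the elapsed calibration window
        # and the presence of locally learned weights.
        elapsed = today - self.calibration_start
        return (elapsed >= timedelta(days=MIN_CALIBRATION_DAYS)
                and bool(self.indicator_weights))
```

The gate makes the GFIN finding operational: no jurisdiction goes live on the global weights alone, however well the model performed at home.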
Post-Test Outcomes
Both MAS and the FCA issued positive sandbox exit assessments. MAS authorized Triage Analytics as a technology service provider to MAS-supervised banks for AML alert triage services, subject to ongoing audit requirements. The FCA determined that FCA-supervised banks could use Triage's system under existing regulatory frameworks, with the automated closure scope and audit requirements established in the sandbox terms incorporated as baseline compliance expectations.
GFIN published a joint learning note from the MAS and FCA teams — the first cross-border AML AI sandbox to generate such a publication — noting the jurisdiction-calibration finding and recommending that other jurisdictions assessing similar technologies require jurisdiction-specific training data validation before authorizing deployment.
Priya's final note to the Triage Analytics board captured the cross-border dimension precisely: "A Singapore test would have told you your system works in Singapore. The GFIN test told you how to make it work everywhere — and what would have happened if you had tried to deploy it in London without the calibration. The cross-border test is harder, more expensive, and slower. It is also the only test worth doing."
Discussion Questions
- The MAS and FCA sandbox parameters diverged significantly on automated closure threshold (30th vs. 15th percentile) and audit sampling rates (10% vs. 20%). What does this divergence reveal about the relationship between regulatory conservatism and the accumulation of evidence over time? Under what conditions might a regulator appropriately accept a more permissive threshold — and what type of evidence would be required?
- The model's performance was significantly different in the UK than in Singapore due to differences in financial crime typologies. What does this imply about the regulatory oversight of AI systems that have been validated in one jurisdiction and are then deployed in another? Should there be a general requirement for jurisdiction-specific validation before deployment?
- GFIN's cross-border testing framework requires coordinating applications to two or more regulators simultaneously, managing divergent requirements, and conducting a test across two regulatory perimeters. Given this complexity, for what types of RegTech innovation is cross-border GFIN testing worth the additional burden — and for what types is it not?
- The FCA case officer and MAS case officer held a joint call at the six-month mark of the test. What governance challenges arise when two regulators with different supervisory frameworks jointly oversee a cross-border sandbox test? Who has primary responsibility for consumer protection if something goes wrong mid-test?
- Triage Analytics emerged from the GFIN test with a finding that reshaped its product architecture: the model required jurisdiction-specific calibration. This was a commercial setback (higher deployment costs and timelines) but also a regulatory compliance improvement. How should RegTech firms evaluate the trade-off between product uniformity (lower cost, faster deployment) and regulatory calibration (higher cost, better local performance)?