Case Study 25.2: Priya's Fraud System Assessment — When the False Positive Rate Is the Product

The Situation

Organization: Clearwater Payments (fictional mid-market payment processor)
Priya's engagement: Independent assessment of Clearwater's card fraud detection system for a prospective acquirer
Timeline: Q3 2024
Regulatory backdrop: FCA authorisation as a payment institution; Consumer Duty obligations; GDPR


Background

Clearwater Payments processes card transactions for approximately 8,000 UK small and medium-sized merchants — primarily retail, hospitality, and e-commerce. Transaction volume: approximately 2.4 million transactions per month. Clearwater's fraud detection system was built in-house three years ago, based on a rules-based engine with a gradient-boosted model layer added eighteen months ago.

Priya Nair was engaged by a private equity firm considering acquiring Clearwater. The engagement scope: assess the fraud detection system's technical quality, regulatory compliance, and any material gaps that would affect the acquisition valuation.

Before beginning her technical review, Priya asked Clearwater's fraud team for their key performance metrics. The response: "We have an industry-leading false positive rate. Our merchants love us — they get almost no declined legitimate transactions."

Priya wrote in her engagement notes: "Whenever someone leads with the false positive rate as their headline metric, I want to know what the fraud loss rate looks like."


What Priya Found

Metric 1: False Positive Rate. Clearwater's false positive rate was genuinely low: 0.8% of all legitimate transactions were incorrectly flagged. Industry average for comparable payment processors: 1.2–1.5%. Clearwater's merchants did indeed have fewer complaints about declined legitimate transactions.

Metric 2: Fraud Loss Rate. Clearwater's confirmed fraud loss rate was 0.52% of processed volume. Industry average: 0.18–0.25%. Clearwater was experiencing more than twice the industry-average fraud loss rate.

The explanation was immediate: Clearwater had optimized for low false positives at the expense of recall. The model threshold had been set, deliberately, at a point where very few legitimate transactions were blocked. The consequence: the model was also missing a substantial proportion of fraudulent transactions.

Priya asked to see the model's precision-recall curve and the threshold choice rationale. The documentation showed: the threshold had been set by Clearwater's commercial team, not the fraud team, in response to merchant complaints about declined transactions eighteen months earlier. The fraud risk team had objected; the commercial team had overruled them.
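
The trade-off behind a threshold choice can be illustrated with a minimal sketch. The scores and labels below are synthetic and hand-picked for illustration only; they are not Clearwater data, and `metrics_at` is a hypothetical helper name. The sketch shows how raising the flagging threshold lowers the false positive rate while also lowering recall.

```python
def metrics_at(threshold, scores, labels):
    """Return (precision, recall, false_positive_rate) when scores >= threshold are flagged."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y == 1)  # fraud correctly flagged
    fp = sum(1 for s, y in pairs if s >= threshold and y == 0)  # legitimate wrongly flagged
    fn = sum(1 for s, y in pairs if s < threshold and y == 1)   # fraud missed
    tn = sum(1 for s, y in pairs if s < threshold and y == 0)   # legitimate passed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return precision, recall, fpr

# Synthetic model scores (illustrative only): 1 = fraud, 0 = legitimate.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.55, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0]

for threshold in (0.85, 0.50):
    p, r, fpr = metrics_at(threshold, scores, labels)
    print(f"threshold={threshold:.2f} precision={p:.2f} recall={r:.2f} fpr={fpr:.2f}")
```

On this toy data the stricter threshold (0.85) flags no legitimate transactions but catches only half of the fraud, while the looser threshold (0.50) catches all of the fraud at the cost of flagging a third of legitimate transactions. Who gets to pick the point on that curve is the governance question at the heart of this case.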

Metric 3: Fraud Loss Allocation. Under Clearwater's merchant agreements, fraud losses below a certain threshold were borne by Clearwater (as the payment processor). Above the threshold, losses were passed to the merchant. Clearwater's commercial team had set the fraud model threshold knowing that the fraud losses from missed detections were largely borne by merchants, not by Clearwater.

Priya wrote in her notes: "This is not a bad model. This is a business model choice disguised as a model calibration choice. The system is deliberately tuned to minimize merchant friction — and the fraud losses are being passed to the merchants who pay Clearwater for fraud protection."


The Regulatory Analysis

Priya's regulatory assessment identified three concerns.

Consumer Duty (FCA). Although Clearwater's direct customers are merchants (not end consumers), the Consumer Duty applies to payment processors in the chain of consumer financial services. End consumers whose card details are fraudulently used on Clearwater-processed transactions experience harm: unauthorized charges, account disruption, time spent disputing. The FCA's Consumer Duty requires firms to take responsibility for their role in customer harm across the chain. A system deliberately tuned to minimize merchant friction while passing fraud harm downstream is in tension with Consumer Duty principles.

Payment Institution Authorisation. Clearwater is authorised as a payment institution under the Payment Services Regulations 2017. The PSR 2017 and PSD2 require payment service providers to have robust risk management, including fraud controls. Clearwater's fraud control framework documentation described the threshold as "calibrated to industry best practice" — which was not accurate. The threshold had been set by commercial decision, not fraud risk analysis. The documentation misstated the calibration rationale.

Merchant Agreement Representation. Clearwater's merchant agreements represented that Clearwater "maintains sophisticated fraud detection technology designed to protect merchant accounts." This representation, combined with the deliberate under-detection that passed losses to merchants, raised a potential misrepresentation concern under UK consumer contract law — applicable to the merchants as Clearwater's customers.


The Threshold Calibration Issue

Priya met with Clearwater's model development lead, Sasha, to understand the technical picture in full.

The model itself, Sasha explained, was well-constructed. On a holdout test set, at the optimal F1 threshold, the model achieved precision 58%, recall 84%, and an F1 of approximately 0.69, which is competitive with industry benchmarks. The problem was the deployed threshold. Clearwater was operating at a threshold that achieved precision 82% and recall 43%, meaning the model was catching fewer than half of fraudulent transactions.
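
As a quick check on these figures, F1 is the harmonic mean of precision and recall. The sketch below recomputes it for the two operating points quoted above; the function and variable names are illustrative, not taken from Clearwater's code.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

f1_optimal = f1_score(0.58, 0.84)   # F1-optimal operating point on the holdout set
f1_deployed = f1_score(0.82, 0.43)  # operating point actually deployed

print(f"F1 at optimal threshold:  {f1_optimal:.3f}")
print(f"F1 at deployed threshold: {f1_deployed:.3f}")
```

This gives about 0.686 at the F1-optimal threshold and about 0.564 at the deployed threshold. The deployed point has higher precision, but the drop in recall more than offsets it.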

Priya asked: what would happen if the threshold were moved to the F1-optimal point?

Sasha had already modeled this. At the F1-optimal threshold:

- False positive rate would increase from 0.8% to 1.4% (still within the industry-average range of 1.2–1.5%)
- Recall would increase from 43% to 84%
- Estimated fraud detection improvement: approximately £2.1M additional fraud caught per month (based on current monthly fraud volumes)
- Merchant-borne losses: reduced by ~£1.8M/month (net of the fraud Clearwater would now bear)
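
Sasha's figures can be put in context with some back-of-envelope arithmetic. The sketch below rests on two simplifying assumptions that are not stated in the case: that nearly all of the 2.4 million monthly transactions are legitimate, so the total count approximates the legitimate count, and that recall applies to fraud value rather than only to fraud count. The derived numbers are illustrative estimates, not Clearwater figures.

```python
MONTHLY_TRANSACTIONS = 2_400_000          # from the case background
FPR_DEPLOYED, FPR_F1_OPTIMAL = 0.008, 0.014
RECALL_DEPLOYED, RECALL_F1_OPTIMAL = 0.43, 0.84
ADDITIONAL_FRAUD_CAUGHT_GBP = 2_100_000   # Sasha's monthly estimate

# Assumption: total transactions approximate legitimate transactions.
flagged_deployed = MONTHLY_TRANSACTIONS * FPR_DEPLOYED
flagged_optimal = MONTHLY_TRANSACTIONS * FPR_F1_OPTIMAL
extra_flags = flagged_optimal - flagged_deployed

# Assumption: recall is value-weighted, so the recall gain scales total fraud value.
recall_gain = RECALL_F1_OPTIMAL - RECALL_DEPLOYED
implied_monthly_fraud_gbp = ADDITIONAL_FRAUD_CAUGHT_GBP / recall_gain

print(f"Legitimate transactions flagged per month, deployed:   {flagged_deployed:,.0f}")
print(f"Legitimate transactions flagged per month, F1-optimal: {flagged_optimal:,.0f}")
print(f"Additional flags per month: {extra_flags:,.0f}")
print(f"Implied monthly fraud attempted: £{implied_monthly_fraud_gbp:,.0f}")
```

Under these assumptions, the recalibration would flag roughly 14,400 more legitimate transactions per month (about 33,600 instead of about 19,200), and the £2.1M estimate implies a little over £5.1M in attempted fraud per month. That is the scale of merchant friction being traded against the fraud losses.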

The numbers told the story: the threshold calibration was costing merchants approximately £1.8M per month in avoidable fraud losses. Clearwater had accepted a fraud-enabling model calibration to protect its commercial relationships — with the cost borne by its customers.


The Acquirer's Decision

Priya's assessment report to the private equity acquirer noted:

"The fraud detection system's technical quality is adequate. The model itself performs competitively at appropriate calibration. The issue is not model quality but model deployment — a threshold calibration decision made for commercial rather than risk management reasons, which has resulted in a materially elevated fraud loss rate borne by merchants.

"Three risks arise for an acquirer:

"1. FCA enforcement risk: if Clearwater's Consumer Duty compliance is reviewed in the context of its threshold calibration decision, the documentation that describes the calibration as 'industry best practice' creates a misrepresentation risk that compounds regulatory exposure.

"2. Merchant churn and litigation risk: merchants experiencing elevated fraud losses are potential claimants in contract disputes, particularly if they become aware that the losses were avoidable.

"3. Remediation cost: recalibrating the threshold to the F1-optimal point will increase false positives, generating merchant complaints. A merchant communication program will be required. Implementation cost is estimated at 2–3 months' revenue in customer retention investment.

"The fraud system is not a valuation-determining risk. It is a manageable operational remediation. The calibration decision should be reversed within 90 days of acquisition. The compliance documentation should be corrected before FCA review is triggered."

The private equity firm proceeded with the acquisition at a reduced valuation that reflected the remediation cost and regulatory risk. Clearwater's threshold was adjusted within the first quarter of new ownership. Merchant complaints about false positives increased by the expected amount and subsided within 90 days as merchants adapted to the new calibration.

The fraud loss rate fell from 0.52% to 0.21% of processed volume within six months — converging with industry benchmarks.


Discussion Questions

1. Clearwater's commercial team overruled the fraud risk team's threshold recommendation, setting the threshold for business reasons rather than fraud risk reasons. What governance controls should prevent this type of decision from being made unilaterally by a commercial function? Who should have final authority over model threshold calibration in a payment processing firm?

2. Fraud losses from missed detections were borne largely by merchants, not by Clearwater. This misalignment of incentives directly influenced the threshold calibration decision. How should payment processing agreements be structured to ensure that the processor's incentives are aligned with effective fraud protection for merchants? What regulatory provisions (PSD2, FCA rules) address this?

3. Clearwater's fraud documentation described the threshold calibration as "industry best practice" when it had been set by commercial decision. This misrepresentation was identified during due diligence. From a regulatory perspective, what is the risk of inaccurate model governance documentation — and why might regulators treat documentation inaccuracies as seriously as the underlying calibration failure itself?

4. Priya found that the model itself was technically sound — the calibration, not the model, was the problem. This distinction matters: a bad calibration is operationally remediable; a fundamentally flawed model may require rebuilding. How should due diligence on AI/ML systems distinguish between model quality and model deployment quality? What questions should an acquirer ask in technical due diligence?

5. After the acquisition and threshold recalibration, false positives increased, generating merchant complaints. From a change management perspective, how should the new owner communicate the threshold change to merchants? Is it possible to be transparent about why the change was made (previous calibration was suboptimal) without damaging commercial relationships?