Case Study 02: Priya's Vendor Assessment — Demanding Disaggregated Performance
Background
Priya Chandrasekaran is a senior regulatory technology consultant at a London-based advisory firm. She has spent seven years advising financial services clients on model risk management, AI governance, and regulatory compliance, with a particular focus on algorithmic systems in credit and onboarding. Her current engagement is with Meridian Financial, a consumer lending platform that is in the process of procuring a new credit decisioning system. Meridian processes approximately 180,000 loan applications per year, predominantly for personal loans in the £2,000–£25,000 range.
Meridian's existing credit decisioning model is eight years old and uses a traditional scorecard architecture. The procurement exercise has identified three vendors offering modern machine learning-based decisioning platforms. All three have passed initial technical and security due diligence. Meridian's procurement committee has requested Priya's assessment of the model risk and regulatory compliance aspects of each vendor.
The Vendor Presentation
The leading candidate vendor — VantageDecision, a fintech that has been operating in the UK market for five years — presents to Meridian's procurement committee. The presentation is polished and technically impressive. VantageDecision's platform uses a gradient-boosted model trained on 4.2 million UK consumer credit decisions. The model outputs a probability of default alongside a credit decision recommendation.
The performance metrics are strong. AUC of 0.83. F1 score of 0.74. Precision of 0.81. Recall of 0.68. Gini coefficient of 0.66. Validation was conducted on a holdout dataset of 420,000 applications. The validation report is thorough and professionally presented. VantageDecision's presenters are confident: this is a well-validated, high-performance model that should meaningfully improve Meridian's approval rate while maintaining risk management standards.
Priya asks one question during the presentation.
"Can you show me these metrics disaggregated by demographic group?"
There is a brief pause. The lead presenter says that VantageDecision does not routinely prepare disaggregated performance analysis, but they are happy to discuss the validation methodology. Priya follows up: which demographic attributes were included in the training data? Not as features, she clarifies, but in the dataset — does VantageDecision have data on the demographic composition of the applicants whose outcomes were used to train the model? A second pause. VantageDecision's technical lead says they can look into this.
Priya's note to the committee following the presentation is short: "Do not proceed with VantageDecision until disaggregated demographic performance data is provided. This is a procurement condition, not a request."
The Negotiation
VantageDecision's initial response is diplomatic resistance. Their legal team argues that disaggregated demographic performance data constitutes commercially sensitive intellectual property that they cannot share with a potential client. Priya's response, communicated through Meridian's legal team, is direct: Meridian cannot procure a credit decisioning system without understanding whether it produces disparate impacts on protected groups under the Equality Act 2010 and ECOA. If VantageDecision cannot provide demographic performance data under an appropriate non-disclosure agreement, Meridian will need to consider other vendors who can. The request is not unreasonable — it is a basic requirement of responsible model procurement.
After three weeks, VantageDecision provides the disaggregated data under a non-disclosure agreement. The NDA covers the specific figures, but the structure of the findings may be communicated.
The results show that VantageDecision's model has an AUC of 0.83 across the full population — the headline number from the initial presentation. But when the validation dataset is segmented, the model's approval rate ratio for two demographic groups falls below 0.80. For applicants categorised as South Asian, the approval rate is 0.76 of the approval rate for white British applicants. For applicants categorised as Black African or Black Caribbean, the approval rate is 0.69 of the approval rate for white British applicants. Both figures represent four-fifths rule violations.
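The four-fifths check described above is a simple ratio test. The sketch below shows how it might be computed; the reference approval rate and the group approval rates are hypothetical numbers chosen to reproduce the ratios reported in the case, not VantageDecision's actual figures.

```python
# Four-fifths (80%) rule check on approval rates.
# All rates below are illustrative; only the resulting ratios (0.76, 0.69)
# come from the case study.

def approval_rate_ratio(group_rate: float, reference_rate: float) -> float:
    """Ratio of a group's approval rate to the reference group's rate."""
    return group_rate / reference_rate

def four_fifths_violation(ratio: float, threshold: float = 0.80) -> bool:
    """Flag a disparate-impact concern when the ratio falls below 0.80."""
    return ratio < threshold

reference_rate = 0.50  # hypothetical approval rate for the reference group
group_rates = {
    "South Asian": 0.38,                       # 0.38 / 0.50 = 0.76
    "Black African / Black Caribbean": 0.345,  # 0.345 / 0.50 = 0.69
}

for group, rate in group_rates.items():
    ratio = approval_rate_ratio(rate, reference_rate)
    print(f"{group}: ratio {ratio:.2f}, "
          f"four-fifths violation: {four_fifths_violation(ratio)}")
```

Note that the test compares outcomes (approvals), not scores: it can be run on any deployed decisioning system without access to the model internals, which is why Priya can demand it as a procurement condition.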
VantageDecision's technical team offers an explanation: the disparities reflect genuine differences in credit risk profiles across the demographic groups in their training data. The model is simply learning what the data shows. Priya reads this explanation carefully. She has two responses.
First: calibration analysis. Are the model's probability-of-default scores equally accurate across groups? VantageDecision provides this analysis. For the white British group, a score of 0.20 (20% predicted probability of default) corresponds to an observed 19.8% default rate in the holdout data — excellent calibration. For the Black African/Black Caribbean group, a score of 0.20 corresponds to an observed 23.1% default rate — worse calibration, but still roughly in range. The model is not wildly miscalibrated.
Second: the historical question. VantageDecision's training data is composed of credit decisions made over a ten-year period by a range of UK lenders. Priya asks a question that VantageDecision finds difficult to answer: were the lenders whose data was used in training themselves subject to any regulatory findings regarding discriminatory lending practices? The answer is that VantageDecision does not know — they assembled the dataset from multiple sources and did not audit the historical lending practices of those sources. This is the classic mechanism of historical bias: if any of those lenders applied stricter standards to certain demographic groups over the training period, those standards are now encoded in VantageDecision's model.
Priya's Advice
Priya presents her findings to Meridian's procurement committee in a formal written assessment. Her conclusions are structured around three questions.
The first question is legal: does the VantageDecision model create a disparate impact exposure for Meridian? The answer is yes. Two demographic groups show four-fifths rule violations. This does not establish that the model constitutes unlawful indirect discrimination — VantageDecision and Meridian might argue that the disparities reflect genuine risk differences (the "justification" defence under the Equality Act). But the disparities create regulatory exposure that must be managed. Meridian's ability to defend a discrimination claim will depend on whether it can demonstrate: (a) that the disparate impact is justified by genuine business necessity (differential credit risk), (b) that the risk model is well-calibrated (this appears to hold, though imperfectly), and (c) that there is no less discriminatory alternative available. Meridian has not yet assessed (c).
The second question is regulatory: what will the FCA's Consumer Duty require? The Duty requires Meridian to monitor outcomes across customer segments and investigate where those outcomes are not consistently good. Approval rate disparities of the magnitude observed — 69% of the majority group's approval rate for one demographic group — are precisely the kind of outcome the Consumer Duty is designed to surface. Meridian will need to run ongoing demographic outcome monitoring on any credit decisioning system it deploys. If it deploys VantageDecision without first addressing the identified disparities, it is starting from a position of known non-compliance.
The third question is commercial: can VantageDecision remediate the disparities, and on what timeline? Priya's recommendation is that Meridian make remediation a binding procurement condition rather than a good-faith aspiration. Specifically: VantageDecision should provide a fairness remediation plan within sixty days of contract signature, including analysis of whether the disparities reflect genuine risk differences or training data biases, proposed threshold adjustments or model modifications, and a commitment to provide quarterly disaggregated performance reports during the contract term. If VantageDecision cannot commit to this programme, Priya recommends considering the second-ranked vendor in the procurement process.
Her final observation is the one that stays with the procurement committee: "You cannot manage what you cannot measure. VantageDecision did not routinely prepare this data. That is itself a finding. Any vendor deploying a credit model in the UK in 2024 should know their demographic performance metrics the way they know their AUC score. If they don't, they haven't been looking."
Outcome
VantageDecision agrees to the remediation conditions. The contract includes a fairness schedule with quarterly disaggregated reporting obligations, a four-fifths rule compliance commitment with a twelve-month remediation timeline for the identified violations, and a right for Meridian to exit the contract at annual renewal if fairness obligations are not met. Meridian's model risk management framework is updated to include demographic performance monitoring as a first-class metric alongside accuracy and stability indicators.
Six months into deployment, the Black African/Black Caribbean approval rate ratio has improved to 0.77 — still below the four-fifths threshold, but on an improving trajectory. VantageDecision attributes the improvement to threshold recalibration and additional training data sourced from UK lenders with more recent, post-ECOA-guidance data practices. The South Asian approval rate ratio has improved to 0.84, above the four-fifths threshold. Both groups remain on the monitoring programme.
Priya's engagement summary to Meridian closes with a note on industry practice: "This procurement process demonstrates what responsible model governance looks like. Every firm procuring an algorithmic decisioning system should ask the disaggregated performance question at the vendor presentation stage. The question takes fifteen seconds to ask. The answer tells you more about a model's regulatory fitness than any aggregate AUC score."
Discussion Questions
1. VantageDecision's initial resistance to providing disaggregated performance data was framed as an intellectual property concern. Evaluate this argument. What legitimate business interests might a vendor have in protecting disaggregated performance data, and how can those interests be balanced against the firm's need for fairness information to discharge its regulatory obligations?
2. VantageDecision argues that the approval rate disparities reflect genuine differences in credit risk across demographic groups — that the model is simply learning what the data shows. Priya's response involves examining both calibration and training data provenance. Explain why calibration analysis alone is insufficient to determine whether observed approval rate disparities are "justified" for Equality Act and Consumer Duty purposes.
3. Priya identifies three questions to structure her assessment: legal exposure, Consumer Duty compliance, and commercial remediation conditions. How do these three dimensions interact? In what circumstances might a model be legally defensible (genuine business justification for disparate impact) while still requiring remediation under the Consumer Duty?
4. The contract fairness schedule that Priya negotiates includes quarterly disaggregated reporting, a four-fifths rule compliance commitment with a remediation timeline, and an exit right at annual renewal. Extend this framework: what additional clauses would you recommend for a comprehensive vendor fairness schedule in a credit decisioning context?
5. Priya's concluding observation is that a vendor not routinely preparing disaggregated performance data has "not been looking." What does this imply about the current state of industry practice? What governance, regulatory, or market mechanisms could be used to make demographic-disaggregated performance reporting a standard practice across the UK financial services vendor ecosystem?
Teaching Notes
This case study is designed to illustrate the practical mechanics of vendor fairness assessment in a procurement context, and to develop students' understanding of the distinction between statistical and legal/regulatory analysis of algorithmic disparities.
The calibration point is important and often misunderstood. A well-calibrated model — one whose probability scores accurately predict outcomes across all groups — can still produce disparate impact. Calibration addresses whether the model's scores mean the same thing across groups, not whether the outcomes produced by applying a fixed decision threshold to those scores are equitable. A model can be perfectly calibrated and still have a 0.69 approval rate ratio for a demographic group if the score distribution of that group is shifted relative to the majority group.
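The teaching point above can be demonstrated with a toy example. In the sketch below the scores are, by construction, the true default probabilities, so calibration is perfect; a shifted score distribution for one group nonetheless produces a four-fifths violation under a fixed threshold. All numbers are hypothetical.

```python
# A perfectly calibrated score can still fail the four-fifths rule.
# By construction, each score IS the applicant's true probability of
# default (perfect calibration); the approval threshold is fixed.
# Both score distributions are invented for illustration.

THRESHOLD = 0.20  # approve if predicted probability of default < 0.20

group_a = [0.05, 0.10, 0.12, 0.15, 0.18, 0.22, 0.25, 0.30, 0.35, 0.40]
group_b = [0.10, 0.15, 0.18, 0.21, 0.24, 0.28, 0.32, 0.36, 0.40, 0.45]

def approval_rate(scores, threshold):
    """Fraction of applicants whose predicted PD falls below the threshold."""
    return sum(s < threshold for s in scores) / len(scores)

rate_a = approval_rate(group_a, THRESHOLD)  # 5 of 10 approved -> 0.50
rate_b = approval_rate(group_b, THRESHOLD)  # 3 of 10 approved -> 0.30
ratio = rate_b / rate_a                     # 0.60, below the 0.80 threshold
print(f"approval rate ratio: {ratio:.2f}")
```

No score is "wrong" here in the calibration sense; the disparity arises entirely from the distribution shift plus the fixed threshold, which is why calibration analysis alone cannot settle the justification question.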
The historical bias analysis — Priya's question about whether the lenders in VantageDecision's training dataset had been subject to discriminatory lending findings — represents good practice in training data provenance review. Students should understand that training data provenance is a model governance requirement, not merely an academic concern. The EU AI Act's Article 10 data governance requirements make this an explicit regulatory obligation for high-risk AI systems.
The three-question assessment framework (legal, regulatory, commercial) provides a useful structure for any algorithmic fairness evaluation. Students should appreciate that these are not always aligned: a legally defensible model may still require Consumer Duty remediation; a model with commercially acceptable remediation timelines may still create short-term regulatory exposure that requires active management.