Case Study 2: The Fairness Audit That Changed the Credit Model
Background
Cornerstone Bank is a mid-sized UK retail bank with approximately 2.1 million personal banking customers and a lending portfolio concentrated in personal loans and credit cards. Its credit scoring model — CreditEdge v5 — has been in production since 2021 and has a validated AUC of 0.83, which the model risk team considers strong for a logistic regression model on this product class.
CreditEdge v5 uses twenty-two input features. Among them is the applicant's postcode at the two-letter-plus-digit level — a geographic feature that captures local economic conditions including area unemployment rates, average local property values, and consumer spending patterns. The model development team's justification for including postcode was straightforward: postcode is predictive of default rates in the historical training data, where applications from certain postal districts had materially higher default rates than others. Excluding it reduced the model's AUC by 3.2 points — from 0.83 to 0.798 — a meaningful degradation in discrimination ability.
CreditEdge v5's overall approval rate across all applications in the twelve months preceding the audit was 74.3%.
The fairness audit was commissioned by Cornerstone's Chief Risk Officer, Beatrice Omolade, in response to a thematic review letter from the FCA asking firms in the retail lending sector to demonstrate that their credit decision processes produced fair outcomes across consumer groups. The letter did not allege any specific violation. It was the kind of supervisory communication that signals regulatory attention to an emerging concern — and Beatrice Omolade had been in financial services long enough to know that the firms that respond proactively to this kind of letter are not the ones who end up in enforcement proceedings.
The Findings
The external consultancy conducting the fairness audit, DataEquity Partners, approached the analysis in three stages. First, they analysed approval and denial rates across the postcodes in Cornerstone's application data, using Office for National Statistics data to classify postcodes by the demographic composition of their local authority areas. Second, they computed the disparate impact ratio — the approval rate for the lower-approval group as a fraction of the approval rate for the higher-approval group — using the standard four-fifths (80%) rule. Third, they used SHAP analysis to decompose the contribution of the postcode feature to individual application decisions.
The findings were significant.
Applications from postcodes in the bottom quartile of the postcode feature distribution had an approval rate of 58.1%. Applications from postcodes in the top quartile had an approval rate of 84.7%. The disparate impact ratio was 58.1 / 84.7 = 0.686 — well below the 0.80 threshold that defines disparate impact under the four-fifths rule.
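The four-fifths check DataEquity ran is simple to reproduce. A minimal sketch using the approval rates reported above; the function name is illustrative, not from the audit's actual tooling:

```python
# Four-fifths (80%) rule: the approval rate of the lower-approval group
# divided by that of the higher-approval group. A ratio below 0.80 is
# the conventional threshold for flagging disparate impact.
def disparate_impact_ratio(rate_low: float, rate_high: float) -> float:
    """Lower group's approval rate as a fraction of the higher group's."""
    return rate_low / rate_high

bottom_quartile_rate = 0.581  # bottom quartile of the postcode feature
top_quartile_rate = 0.847     # top quartile

ratio = disparate_impact_ratio(bottom_quartile_rate, top_quartile_rate)
print(f"Disparate impact ratio: {ratio:.3f}")  # 0.686
print("flag: below four-fifths threshold" if ratio < 0.80 else "passes")
```

Note the ratio is a group-level screening heuristic only: it says nothing about which feature drives the gap, which is why the audit's third stage used SHAP decomposition on individual decisions.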
The geographic pattern of the low-approval postcodes, when mapped, showed a concentration in urban areas with historically high proportions of residents from Black, Asian, and mixed-ethnicity backgrounds. Cornerstone's model did not include race or ethnicity as an input feature. It did include postcode, and postcode correlated with race and ethnicity because of decades of residential segregation that the model's training data reflected without context or qualification.
DataEquity's report was careful in its framing. The correlation between postcode-based approval rates and the demographic composition of local areas did not prove that CreditEdge v5 was discriminatory. It could not prove that without individual-level demographic data, which Cornerstone did not collect and could not legally use. What it demonstrated was a pattern consistent with disparate impact — a pattern that, under the US framework of the Equal Credit Opportunity Act (ECOA) and Regulation B, would trigger a requirement to justify the practice on business necessity grounds. Under the UK's regulatory framework, the FCA's Consumer Duty and the Equality Act 2010 created analogous obligations, though the specific procedural requirements differed.
The Options
Beatrice Omolade convened a working group with the head of credit risk, the model risk lead, the legal team, and the chief data officer. The working group identified three primary options.
Option 1: Remove postcode from the model. Removing postcode would eliminate the feature most directly responsible for the disparate impact pattern. It would reduce the model's AUC from 0.83 to 0.798 — a 3.2-point reduction. In practical terms, a lower AUC means worse discrimination between good and bad credit risks: the model approves more applicants who go on to default, and declines more applicants who would have repaid. The working group estimated that a 3.2-point AUC reduction would increase the expected annualised default rate on new lending by approximately 0.6 percentage points. On Cornerstone's lending volume, that represented an expected additional annual loss of approximately £4.2 million against a modest capital buffer.
Option 2: Retain postcode but document a business necessity justification. The business necessity defence holds that a practice that produces disparate impact may be legally defensible if it serves a legitimate, significant business objective and there is no less discriminatory alternative that achieves the same objective. The working group concluded that postcode's predictive value was empirically demonstrated and that the feature captured legitimate economic signals beyond demographic proxies. However, the legal team noted that the business necessity justification was stronger in some jurisdictions than others, and that retaining the feature while documenting the justification would not eliminate regulatory risk — it would manage it. If the FCA concluded that a less discriminatory alternative existed, the business necessity defence would fail.
Option 3: Redesign the model with fairness constraints. A fairness-constrained model would include additional variables that captured the legitimate economic signals associated with postcode — local unemployment rate, area index of multiple deprivation, average property price — without using postcode itself as a categorical variable. This approach would attempt to preserve the predictive information in postcode's economic signals while removing the categorical postcode variable whose primary effect might be to proxy demographic characteristics. A feasibility analysis suggested that a redesigned model with these area-level economic indicators as continuous features could recover approximately 1.8 AUC points of the 3.2-point loss, reaching approximately 0.816 AUC, while substantially reducing the disparate impact ratio.
The Decision
The working group spent three weeks modelling the alternatives, consulting with external legal counsel, and engaging with a specialist in algorithmic fairness. After two sessions with Beatrice Omolade, the group's recommendation was Option 3, with a validation requirement comparing the new model's disparate impact ratio against both the existing model and the postcode-removal baseline.
The redesign took four months. The rebuilt model — CreditEdge v6 — used postcode area-level economic indicators as continuous features rather than postcode itself as a categorical feature. It also incorporated two additional variables not in the original model: the applicant's stated purpose for the loan (which was correlated with default propensity independently of geography) and a tenancy status indicator. On the validation dataset, CreditEdge v6 achieved an AUC of 0.816 — a 1.4-point reduction from v5's 0.83, smaller than the 3.2-point reduction the postcode-removal scenario had projected.
The disparate impact ratio for CreditEdge v6, measured on the same geographic groupings as the original analysis, improved to 0.831 — above the 0.80 four-fifths threshold. The improvement was not complete: there remained a geographic approval rate differential, reflecting the genuine differences in credit risk profiles across areas that the economic indicators captured. But the gap had narrowed substantially, and the residual differential was more clearly attributable to economic factors than to demographic proxies.
Cornerstone's response to the FCA included a full account of the fairness audit, the options considered, the redesign rationale, and the v6 validation results. The FCA acknowledged the response and noted that the firm had taken a proactive and structured approach. No further supervisory action resulted.
Beatrice Omolade made one further decision. She commissioned an annual fairness monitoring programme for CreditEdge v6: an annual analysis of approval rates and SHAP contributions by geographic segment, with automatic escalation to the model risk committee if the disparate impact ratio fell below 0.80. The programme was designed to detect drift — the possibility that as the model's training data aged and the population's geographic distribution shifted, disparate impact patterns could re-emerge even in a model that had passed its initial fairness validation.
She told her team: "This is not a box we've ticked. It's a programme we're running."
Discussion Questions
1. The four-fifths rule is a commonly used heuristic for identifying disparate impact, but it has significant limitations as a standalone standard. What are the limitations of the four-fifths rule, and what additional analysis should a firm conduct alongside it to build a complete picture of its model's fairness properties?
2. Option 1 (remove postcode) and Option 3 (redesign with economic indicators) both reduce disparate impact relative to the status quo, but by different mechanisms and with different costs. How should a firm weigh the trade-off between model performance (AUC) and fairness outcomes? Is there a principled basis for deciding how much AUC reduction is acceptable in exchange for fairness improvement, or is this inherently a value judgment?
3. Cornerstone did not collect individual-level demographic data on its applicants, which meant DataEquity could demonstrate only a geographic pattern rather than individual-level disparate impact. Some firms argue that not collecting demographic data protects them from discrimination risk. Others argue that not collecting it prevents the firm from detecting and remediating discrimination. Evaluate both positions. What does the regulatory framework in the UK and US suggest about the appropriate approach to collecting sensitive demographic data for fairness monitoring?
4. The feasibility analysis showed that replacing postcode with area-level economic indicators recovered approximately 1.8 of the 3.2 AUC points that postcode removal cost. This means the redesigned model still underperforms the original by 1.4 AUC points. How should the remaining 1.4-point gap be framed to internal stakeholders (the board, shareholders) and to the regulator? What does a 1.4-point AUC reduction mean in practical credit risk management terms?
5. Beatrice Omolade established an annual fairness monitoring programme that will escalate to the model risk committee if the disparate impact ratio falls below 0.80. Design the elements of a comprehensive fairness monitoring framework for a retail credit scoring model: what metrics would you track, at what frequency, with what escalation thresholds, and using what data? What are the resource implications of this framework for a mid-sized bank?