Case Study 9.2: Fairness Metrics in Loan Approval — The HMDA Data and Algorithmic Lending

Chapter 9 | Supporting Case Study


Introduction

American mortgage lending has a documented history of racial discrimination spanning a century. From the explicitly discriminatory redlining maps of the 1930s — in which federal surveyors color-coded neighborhoods by racial composition and recommended against lending in minority neighborhoods — to the discriminatory lending practices of the 2000s subprime boom that systematically sold Black and Latino homeowners worse mortgage products than their credit profiles warranted, racial disparities in mortgage access have been persistent, documented, and consequential.

Today, the underwriting decisions that determine who gets a mortgage at what rate are increasingly automated. Algorithms have replaced or supplemented human loan officers in assessing creditworthiness. These algorithms are marketed as objective, consistent, and free of the racial animus that marked earlier generations of human discrimination. Fairness metric analysis of the data those algorithms produce tells a more complicated story.


1. The HMDA Data: What It Is and What It Shows

The Home Mortgage Disclosure Act (HMDA), passed by Congress in 1975, requires most mortgage lenders to collect and publicly disclose data about their lending activity. The data includes information about each mortgage application: the loan amount, property location, purpose of the loan, the applicant's income, race, ethnicity, gender, and the disposition of the application — approved, denied, or withdrawn.

HMDA data is the most comprehensive public dataset on mortgage lending in the United States. It covers tens of millions of applications annually. It is the primary tool used by regulators, researchers, and civil rights organizations to identify patterns of disparate treatment and disparate impact in mortgage lending.

What HMDA data consistently shows, year after year, is substantial racial disparities in mortgage denial rates. In recent years:

  • Black applicants are denied conventional mortgages at roughly twice the rate of white applicants with comparable incomes.
  • Latino applicants face denial rates between those of Black and white applicants.
  • These disparities persist after controlling for income — though they narrow somewhat when more detailed credit characteristics are controlled for.

The persistence of these disparities despite decades of fair lending law enforcement reflects both ongoing discrimination and the structural legacy of past exclusion: historical redlining concentrated Black wealth in neighborhoods with lower property values, limiting home equity and creditworthiness for subsequent generations.

What HMDA data cannot definitively establish is the cause of disparities. Because HMDA captures income but not credit scores, debt-to-income ratios, or the full range of underwriting variables, it is impossible to determine from HMDA data alone how much of the disparity is attributable to credit differences versus discrimination. This limitation is important — but it is also convenient for lenders who wish to attribute all observed disparities to legitimate underwriting factors.


2. How Algorithmic Lending Works

Modern mortgage underwriting typically involves two layers of algorithmic decision-making. The first layer — which has been in place since the 1990s — involves government-sponsored enterprise (GSE) automated underwriting systems: Fannie Mae's Desktop Underwriter and Freddie Mac's Loan Product Advisor. These systems assess whether a loan meets the criteria for purchase by the GSEs, which is the primary mechanism through which lenders manage their mortgage portfolio risk. A loan that receives an "approve" recommendation from an automated underwriting system can typically be originated with confidence that the GSEs will purchase it, moving much of the credit risk off the lender's books.

The second, more recent layer involves proprietary machine learning models developed by individual lenders or fintech companies. These models may incorporate a wider range of data — rental payment history, utility payments, bank account transaction patterns, and in some cases alternative data sources — to supplement the traditional credit assessment factors.

Traditional mortgage underwriting relies primarily on the "three Cs": credit history (measured by FICO scores and credit report data), capacity (debt-to-income ratio, employment history, income stability), and collateral (the loan-to-value ratio for the property). Algorithmic underwriting systems incorporate these factors and may add additional variables.
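
The capacity and collateral factors among the "three Cs" reduce to simple ratios. The following sketch computes them for a hypothetical applicant; the threshold values are illustrative screens, not any lender's actual cutoffs:

```python
def debt_to_income(monthly_debt: float, monthly_income: float) -> float:
    """Capacity: share of gross monthly income consumed by debt payments."""
    return monthly_debt / monthly_income

def loan_to_value(loan_amount: float, property_value: float) -> float:
    """Collateral: loan size relative to the appraised property value."""
    return loan_amount / property_value

# Hypothetical applicant: $2,150 in monthly obligations on $6,000 gross
# monthly income, seeking $270,000 against a $300,000 property.
dti = debt_to_income(2_150, 6_000)     # ~0.358
ltv = loan_to_value(270_000, 300_000)  # 0.90

# Illustrative screen only: many conforming loans target DTI <= 43%.
passes_capacity = dti <= 0.43
passes_collateral = ltv <= 0.97
```

Algorithmic systems layer scoring models on top of checks like these rather than replacing them.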

The fundamental fairness question for algorithmic mortgage underwriting is: when the model incorporates variables that correlate with race without directly measuring race, does it produce racially disparate outcomes? The answer, as we will see, is yes — by design, in some cases.


3. What Fairness Metric Analysis Reveals

Researchers who have applied fairness metric analysis to mortgage lending data have found consistent patterns. A landmark 2018 analysis of HMDA data by Reveal from the Center for Investigative Reporting found that in 61 metropolitan areas across the United States, people of color were significantly more likely to be denied home loans than white applicants with similar financial characteristics.

When fairness metrics are applied to algorithmic lending data:

Demographic parity is substantially violated in most markets. Black and Latino applicants receive mortgage approvals at lower rates than white applicants with comparable incomes. The disparity is largest in metropolitan areas with histories of severe residential segregation.

Calibration analysis is more difficult to perform on mortgage data because default outcomes take years to materialize, but studies of actual mortgage performance by demographic group suggest that at any given credit score, Black and Latino borrowers default at lower rates than white borrowers — a finding consistent with over-restriction of credit to minority borrowers, the opposite of credit overextension.

Equalized odds analysis is complicated by the lack of full access to lenders' underwriting variables, but academic analyses using available data suggest that denial rates for Black and Latino applicants with credit profiles that would lead to approval for white applicants are substantially higher than denial rates for white applicants with equivalent profiles.

The four-fifths (80%) rule under disparate impact doctrine provides a regulatory benchmark: if a protected group's approval rate is less than 80% of the most-favored group's approval rate, that constitutes a prima facie case of adverse impact. Many major lenders' racial disparities in approval rates fall well outside this threshold.
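
The metrics above can be made concrete in a few lines. The sketch below computes group approval rates (demographic parity), an equal-opportunity gap (the true-positive-rate component of equalized odds), and the four-fifths ratio; the records, group labels, and repayment flags are invented for illustration and are not HMDA data:

```python
from collections import defaultdict

def approval_rates(records):
    """Approval rate per group from (group, approved, would_repay) records."""
    totals, approved = defaultdict(int), defaultdict(int)
    for group, ok, _ in records:
        totals[group] += 1
        approved[group] += ok
    return {g: approved[g] / totals[g] for g in totals}

def true_positive_rates(records):
    """Among applicants who would repay, approval rate per group
    (the equal-opportunity check)."""
    totals, approved = defaultdict(int), defaultdict(int)
    for group, ok, repaid in records:
        if repaid:
            totals[group] += 1
            approved[group] += ok
    return {g: approved[g] / totals[g] for g in totals}

def four_fifths_ratio(rates):
    """Minimum approval rate over maximum; below 0.8 indicates a
    prima facie case of adverse impact."""
    return min(rates.values()) / max(rates.values())

# Synthetic records: (group, approved?, would_repay?) -- illustrative only.
records = [
    ("A", 1, 1), ("A", 1, 1), ("A", 1, 0), ("A", 0, 1), ("A", 1, 1),
    ("B", 1, 1), ("B", 0, 1), ("B", 0, 1), ("B", 0, 0), ("B", 1, 1),
]
rates = approval_rates(records)    # A: 0.8, B: 0.4 -> parity violated
tprs = true_positive_rates(records)  # A: 0.75, B: 0.5 -> unequal opportunity
ratio = four_fifths_ratio(rates)   # 0.5 -> fails the four-fifths test
```

Real analyses face exactly the complication the text describes: the `would_repay` column is unobserved for denied applicants, which is why calibration and equalized odds are hard to evaluate from lending data alone.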


4. The Zip Code Proxy: How Neighborhood Data Encodes Race

Perhaps the most significant fairness issue in algorithmic mortgage lending is what researchers call the "zip code proxy" problem. Many algorithmic underwriting models incorporate property location — zip code, census tract, or neighborhood characteristics — as a variable. In principle, this is legitimate: property values, market liquidity, and neighborhood economic trends are relevant to assessing the collateral value of a home loan. In practice, because of the history of racial residential segregation, neighborhood location is a powerful proxy for race.

American neighborhoods remain highly racially segregated. A ZIP code in a major metropolitan area is often an extremely reliable predictor of the racial composition of its residents — more reliable, in many markets, than explicit racial data. A model that uses neighborhood characteristics as underwriting variables will therefore produce racially correlated outputs, even if race is explicitly excluded.
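
The proxy effect can be demonstrated with a toy simulation. The ZIP codes, racial mixes, and neighborhood scores below are entirely hypothetical, chosen only to mimic residential segregation; note that race never enters the approval rule, yet approval rates diverge sharply by race:

```python
import random
random.seed(0)

# Hypothetical segregated city: two ZIP codes with different racial mixes.
ZIP_RACE_MIX = {"10001": {"white": 0.9, "black": 0.1},
                "10002": {"white": 0.15, "black": 0.85}}
# Invented neighborhood "collateral" score per ZIP code.
ZIP_VALUE_SCORE = {"10001": 0.8, "10002": 0.4}

def draw_applicant(zip_code):
    races, weights = zip(*ZIP_RACE_MIX[zip_code].items())
    return random.choices(races, weights)[0], zip_code

applicants = [draw_applicant(z) for z in ZIP_RACE_MIX for _ in range(5000)]

# Race-blind model: approve whenever the ZIP-level score clears a cutoff.
def approve(zip_code):
    return ZIP_VALUE_SCORE[zip_code] >= 0.6

by_race = {}
for race, z in applicants:
    by_race.setdefault(race, []).append(approve(z))
rates = {r: sum(v) / len(v) for r, v in by_race.items()}
# White approval rate lands far above the Black approval rate, despite
# the model never seeing race.
```

Because most white applicants live in the high-score ZIP and most Black applicants in the low-score ZIP, the neighborhood variable reproduces the racial disparity on its own.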

The legal framework for this problem comes from the Fair Housing Act, which prohibits discriminatory effects regardless of discriminatory intent. A lending policy with a racially disparate impact is unlawful unless the lender can demonstrate that it is justified by business necessity and there is no less discriminatory alternative. The use of neighborhood-level variables in automated underwriting has not been systematically litigated under this standard — but the regulatory landscape is shifting.

The 2015 Supreme Court decision in Texas Department of Housing and Community Affairs v. Inclusive Communities Project confirmed that disparate impact claims are cognizable under the Fair Housing Act. This legal foundation supports challenges to algorithmic underwriting models that use neighborhood proxies, even if those models do not explicitly incorporate race.


5. The Upstart Case: An Alternative Credit Scoring Model

Upstart is a fintech company that offers personal loans using an alternative credit scoring model. Rather than relying primarily on FICO scores, Upstart's model incorporates variables including educational attainment, field of study, employment history, and income trajectory. The company argues that these variables are better predictors of creditworthiness than traditional FICO scores, particularly for people who are "credit invisible" — thin-file borrowers without extensive credit histories, who are disproportionately young, recent immigrants, and members of minority groups.

In 2017, the Consumer Financial Protection Bureau (CFPB) issued a no-action letter to Upstart, agreeing not to pursue enforcement actions under the Equal Credit Opportunity Act (ECOA) against Upstart's model while the company tested its alternative approach. The CFPB framed this as an opportunity to determine whether alternative data could expand credit access for underserved populations. In a 2019 update on the no-action letter, Upstart reported that its model approved 27% more borrowers than a traditional model at the same loss rate, with applicants receiving interest rates 16% lower on average.

The fairness implications of Upstart's model are contested. Including educational attainment as a variable could expand credit access for college graduates who are in the early stages of their careers — a demographic that includes many young professionals of all races. But it could also disadvantage applicants without college degrees, who are disproportionately Black and Latino, and introduce new forms of credential-based discrimination.

The specific variables Upstart uses are proprietary, and the company has not published detailed fairness metric analyses disaggregated by race for all demographic groups across all credit score ranges. The CFPB's no-action letter has been criticized by fair lending advocates for allowing an unproven model to operate without requiring the demonstration of demographic parity or equalized odds.


6. CFPB Scrutiny of Algorithmic Underwriting

The Consumer Financial Protection Bureau has increasingly focused on algorithmic credit decisions as a priority area. The Equal Credit Opportunity Act (ECOA) prohibits discrimination in credit decisions based on race, color, religion, national origin, sex, marital status, age, or receipt of public assistance. ECOA applies to all credit decisions, including those made by algorithms.

In 2022, the CFPB published guidance clarifying that the adverse action notice requirements of ECOA apply to algorithmic credit decisions — meaning that when a credit application is denied, the applicant must be given specific reasons for the denial, even if those reasons are generated by a machine learning model. This requirement has significant fairness implications: it forces lenders to maintain interpretability in their models sufficient to generate meaningful explanations.
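
The interpretability requirement can be illustrated with a minimal sketch. The weights, feature names, and baseline values below are hypothetical, not any lender's actual model; the point is only that a model must be transparent enough to rank the specific factors that pulled an application's score down:

```python
# Hypothetical linear scoring model; all weights and features invented.
WEIGHTS = {
    "credit_utilization": -2.0,   # higher utilization lowers the score
    "years_of_history":    0.5,
    "recent_delinquency": -3.0,
    "income_stability":    1.0,
}
# Invented reference applicant against which contributions are measured.
BASELINE = {"credit_utilization": 0.3, "years_of_history": 8,
            "recent_delinquency": 0, "income_stability": 1.0}

def denial_reasons(applicant, top_n=2):
    """Rank features by how much they pull the score below the baseline
    applicant -- one simple way to generate the specific reasons an
    adverse action notice must state."""
    contributions = {
        f: WEIGHTS[f] * (applicant[f] - BASELINE[f]) for f in WEIGHTS
    }
    negative = sorted((c, f) for f, c in contributions.items() if c < 0)
    return [f for _, f in negative[:top_n]]

applicant = {"credit_utilization": 0.9, "years_of_history": 2,
             "recent_delinquency": 1, "income_stability": 1.0}
reasons = denial_reasons(applicant)
# -> ['recent_delinquency', 'years_of_history']
```

For opaque machine learning models the same obligation is typically met with post hoc attribution methods, but the output must be equally specific.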

The CFPB has also signaled interest in examining whether algorithmic credit models constitute "digital redlining" — systematic denial of credit to minority communities through algorithmic means, even when intent to discriminate is absent. In 2023, the CFPB brought an enforcement action against a lender whose algorithmic model produced substantially higher denial rates for applicants in majority-Black and majority-Latino neighborhoods, even controlling for available credit variables.

The regulatory trajectory is toward greater scrutiny, greater transparency requirements, and a willingness to apply disparate impact standards to algorithmic credit models. Lenders who have not conducted rigorous fairness metric analysis of their models face meaningful regulatory and litigation risk.


7. The Community Reinvestment Act and Its Limits

The Community Reinvestment Act (CRA), passed in 1977, requires federally regulated depository institutions to meet the credit needs of all communities in which they operate, including low- and moderate-income communities. CRA compliance is evaluated through periodic examinations, and CRA ratings affect banks' ability to engage in mergers, acquisitions, and branch expansions.

The CRA was designed as a remedy for redlining — the systematic denial of credit to minority communities — and it has had demonstrable positive effects on lending in underserved communities. However, the CRA has significant limitations in the algorithmic lending era:

  • CRA applies only to federally regulated depository institutions. Fintech lenders, mortgage companies not affiliated with banks, and online lenders — which collectively originate a large and growing share of mortgages — are not subject to CRA.
  • CRA compliance examinations focus on community reinvestment activities broadly defined, not specifically on fairness metrics in algorithmic credit decisions.
  • CRA examinations have historically given high passing rates to institutions with documented lending disparities, suggesting that CRA compliance and fair lending compliance are evaluated separately and imperfectly.

The regulatory modernization of the CRA, finalized by the federal banking agencies in 2023, expanded the definition of assessment areas and updated evaluation criteria — but did not fundamentally address the algorithmic fairness challenge. A lender can be CRA compliant while operating an algorithmic underwriting model that produces substantial racial disparities.


8. The Algorithmic Redlining Litigation Landscape

Civil rights litigation against algorithmic lending discrimination is a growing area of law. Several significant cases illustrate the landscape:

National Fair Housing Alliance v. Facebook (2019): The National Fair Housing Alliance and other civil rights organizations sued Facebook, alleging that its advertising targeting system allowed housing advertisers to exclude Black, Latino, and other minority groups from seeing housing advertisements. Facebook settled, agreeing to overhaul its ad targeting system for housing ads. The case is significant because it established that algorithmic targeting of advertising — not just credit underwriting — can constitute fair housing violations.

U.S. v. Trident Mortgage (2022): The Department of Justice brought a redlining case against Trident Mortgage Company, alleging that the lender had systematically avoided lending in majority-Black and majority-Latino neighborhoods in Philadelphia. The case included algorithmic evidence: analysis of the lender's application patterns, marketing activities, and staff deployment showed systematic avoidance of minority neighborhoods. Trident settled for $22 million.

Class actions against algorithmic underwriting: Multiple putative class actions have been filed against lenders and fintech companies, alleging that their automated underwriting systems produce racially disparate denial rates. These cases face significant litigation challenges — establishing liability requires showing that the algorithm's disparate impact is not justified by business necessity and that a less discriminatory alternative exists.

The litigation landscape is creating real financial and reputational risk for lenders whose algorithms produce disparate impact. This risk is a meaningful driver of corporate interest in fairness metric analysis — though cynics might note that managing litigation risk is not the same as being committed to fair lending.


9. What Fair Algorithmic Lending Would Require

Genuinely fair algorithmic lending — in contrast to fair lending compliance as a risk management exercise — would require:

Comprehensive fairness metric analysis as a condition of deployment. Before deploying a credit underwriting model, lenders should be required to demonstrate that it meets specified fairness criteria — at minimum, demographic parity and equalized odds evaluated across racial and ethnic groups. Where these criteria conflict (as the impossibility theorem predicts they will when base rates differ), lenders should document the trade-off made and the reasoning.

Transparent underwriting standards. Applicants should be able to understand, in general terms, what factors the model uses and how they are weighted. Full algorithmic transparency may be impractical, but meaningful explanation of denial reasons — as ECOA already requires — is a minimum standard.

Regular auditing with disaggregated results. Lenders should conduct and publish regular audits of their models' fairness metrics, disaggregated by race, ethnicity, gender, and their intersections. The results should be publicly accessible, not merely reported to regulators.

Elimination or validation of neighborhood proxies. Any variable that proxies for race — including zip code, census tract, neighborhood-level demographic variables — should be either eliminated from underwriting models or validated to demonstrate that its inclusion is necessary for accurate risk assessment and that its effect on racial disparities has been minimized.
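
One way to operationalize the validation step is a simple proxy screen. The sketch below, using invented records, scores how well each candidate variable predicts race relative to a majority-class baseline; a variable scoring high would need either removal or a documented business-necessity justification:

```python
from collections import Counter, defaultdict

def proxy_strength(records, feature_idx, race_idx=0):
    """How well a candidate variable predicts race: accuracy of guessing
    the majority race within each feature value, minus the baseline
    accuracy of always guessing the overall majority race.
    Near zero means a weak proxy."""
    overall = Counter(r[race_idx] for r in records)
    baseline = max(overall.values()) / len(records)
    by_value = defaultdict(Counter)
    for r in records:
        by_value[r[feature_idx]][r[race_idx]] += 1
    correct = sum(max(c.values()) for c in by_value.values())
    return correct / len(records) - baseline

# Invented records for illustration: (race, zip_code, dti_bucket)
records = [
    ("white", "10001", "low"), ("white", "10001", "high"),
    ("white", "10001", "low"), ("black", "10002", "low"),
    ("black", "10002", "high"), ("black", "10002", "low"),
    ("white", "10002", "low"), ("black", "10001", "high"),
]
zip_strength = proxy_strength(records, 1)  # 0.25: strong proxy
dti_strength = proxy_strength(records, 2)  # 0.125: weaker proxy
```

A production screen would use a larger sample and a standard association measure, but the governance question is the same: how much racial information does the variable carry, and is that carriage justified?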

Community participation in model governance. Affected communities — historically redlined neighborhoods, community development organizations, fair lending advocacy groups — should have formal roles in the governance of algorithmic underwriting systems that affect their access to credit.

None of these requirements is technically impossible. All of them are politically difficult, because they impose costs on lenders and may modestly reduce profitability by expanding credit to borrowers who are currently denied. The history of fair lending enforcement suggests that genuine progress requires both regulatory mandate and litigation risk — voluntary adoption has consistently proven insufficient.


Discussion Questions

  1. HMDA data shows persistent racial disparities in mortgage denial rates, but HMDA does not capture all underwriting variables. Lenders argue that unexplained disparities are attributable to credit differences not captured in HMDA. Fair lending advocates argue that in the absence of full data, disparities should be presumed to reflect discrimination. How should regulators handle this evidentiary ambiguity? What additional data collection would help resolve it?

  2. Upstart argues that its alternative credit scoring model expands credit access for underserved borrowers by using non-traditional variables. Fair lending advocates express concern that variables like educational attainment introduce new forms of discrimination. What fairness metric analysis would you require Upstart to conduct before receiving regulatory approval to expand its lending model? What would the results need to show?

  3. The impossibility theorem tells us that demographic parity and calibration cannot simultaneously hold when base rates differ across racial groups. In mortgage lending, which criterion should take priority — equal approval rates across groups (demographic parity) or equal predictive accuracy across groups (calibration)? How does your answer reflect your values about the purpose of mortgage lending and the role of government in addressing historical inequity?

  4. The Community Reinvestment Act was designed to prevent geographic redlining by requiring banks to serve all communities in their assessment areas. How adequate is this framework for addressing algorithmic redlining, where the discrimination operates through underwriting algorithms rather than geographic avoidance? What legislative or regulatory reforms would better address algorithmic discrimination in lending?


See also: Chapter 30 (AI and Financial Services) and Chapter 20 (Accountability Structures) for broader treatment of algorithmic discrimination in finance and regulatory responses.