
Chapter 30: AI in Criminal Justice Systems

Opening: The Algorithm That Sentenced Eric Loomis

In February 2016, Eric Loomis appeared before a Wisconsin circuit court judge for sentencing on charges of eluding an officer and operating a vehicle without the owner's consent. It was not a complex case. But what happened next set in motion one of the most consequential legal battles over algorithmic criminal justice in American history.

The judge had before him a risk assessment from COMPAS — Correctional Offender Management Profiling for Alternative Sanctions — a proprietary algorithm developed by a company called Northpointe (later renamed Equivant). COMPAS had scored Loomis as "high risk" across multiple categories: risk to community, recidivism risk, and pre-trial release. The judge explicitly referenced this score in imposing a six-year prison sentence. Loomis challenged the use of the secret algorithm, arguing that his due process rights under the US Constitution had been violated: he could not examine the formula that helped determine his sentence, could not question it, and could not meaningfully contest it.

The Wisconsin Supreme Court ruled against Loomis in 2016. It found that COMPAS had been used appropriately as one factor among many, and that the due process concerns, while acknowledged, did not rise to constitutional violation. The US Supreme Court declined to hear the case in 2017. Northpointe's algorithm remained in use. The formula remained secret.

The Loomis case is not an isolated incident. It is a window into a system-wide transformation. AI is now present at virtually every stage of the American criminal justice process: predicting where crimes will occur before they happen, assessing whether defendants should be released before trial, shaping sentencing recommendations, and informing decisions about when imprisoned people are released. At each stage, the same fundamental questions arise: Is the algorithm accurate? Is it fair? Is it transparent? And can its outputs be challenged by those whose freedom is at stake?

This chapter examines these questions with the rigor they demand. Criminal justice AI is not an abstraction; it operates on real people, in systems already marked by profound racial and socioeconomic inequality, with consequences measured in years of human freedom.


Section 1: AI Across the Criminal Justice Pipeline

The Full Scope of Algorithmic Criminal Justice

To understand AI in criminal justice, it helps to trace the full pipeline from the moment a potential crime is identified through the eventual end of justice system involvement.

Pre-crime and patrol: Predictive policing systems analyze historical crime data, socioeconomic indicators, and geographic patterns to predict where crimes are likely to occur and who is likely to commit them. Police resources are directed accordingly. A person going about their lawful business in a predicted hot spot may be stopped, questioned, or surveilled at higher rates than someone in a differently predicted area. The AI has already begun shaping who encounters law enforcement before any crime has been alleged.

Arrest and booking: Facial recognition systems are used by an increasing number of police departments to identify suspects from photographs — witness photos, surveillance footage, social media images. Automated license plate readers record the movements of every vehicle passing through high-coverage areas. Social media monitoring tools scan public posts for intelligence relevant to criminal investigations.

Bail and pretrial release: Risk assessment instruments produce scores predicting the likelihood that a defendant will fail to appear for trial or be rearrested during the pretrial period. These scores influence judicial decisions about whether defendants are detained before trial — a decision with enormous consequences, as pretrial detention is associated with worse trial outcomes, job loss, housing loss, and family disruption regardless of eventual guilt determination.

Charging: Prosecutors in some jurisdictions use analytics tools to assess the strength of cases and the predicted outcomes of prosecutorial strategies. AI assistance in charging decisions is less publicly documented than other pipeline stages but is growing.

Sentencing: Risk assessment instruments, of which COMPAS is the most prominent, produce scores used in sentencing recommendations. In some jurisdictions, these are formally integrated into sentencing guidelines; in others, they are provided to judges as supplementary information.

Incarceration: Risk classification systems determine housing assignments, privilege levels, and programming assignments within correctional facilities. These systems affect the daily conditions of incarceration.

Parole and release: Risk assessment instruments are used in parole decisions — determining when incarcerated individuals are released and under what supervision conditions. A high algorithmic risk score can mean the difference between release and continued imprisonment.

Bias Evidence at Each Stage

The racial bias evidence across these pipeline stages is pervasive and consistent. Predictive policing directs enforcement to areas disproportionately populated by communities of color, creating higher surveillance of those communities regardless of actual differential offending rates. Facial recognition systems have documented higher error rates for darker-skinned individuals, particularly Black women. Bail risk assessments have been shown in multiple studies to produce higher risk scores for Black defendants than for similarly situated white defendants. Sentencing risk assessments exhibit similar patterns. Each stage where bias appears compounds the bias of previous stages: a person incorrectly identified as high risk for pretrial release is more likely to be detained, more likely to accept a plea deal, more likely to receive a sentence that generates another risk assessment, and so on through the pipeline.

The concept of "feedback loops" is critical here: if predictive policing directs more police to Black neighborhoods, more crimes will be detected in those neighborhoods (not because they occur more frequently, but because they are policed more intensively), which feeds historical crime data that trains the next iteration of the predictive policing model to direct even more police there. The historical data that trains AI criminal justice systems reflects historical policing practices, which were themselves discriminatory. AI systems trained on that data will encode and perpetuate — and through feedback loops, intensify — those discriminatory patterns.


Section 2: Predictive Policing

How Predictive Policing Systems Work

Predictive policing tools use machine learning to analyze historical crime data, geographic and demographic variables, and sometimes social network data to produce predictions about where crimes are likely to occur ("place-based" prediction) or who is likely to commit crimes ("person-based" prediction). These predictions are used to direct police patrol resources, deploy investigative attention, or — in the most aggressive applications — justify targeted contact with specific individuals before any crime has been alleged.

The most widely deployed commercial predictive policing product was PredPol (later renamed Geolitica), which used a place-based model adapted from earthquake aftershock prediction to identify small geographic areas ("hot spots") with elevated crime probability. PredPol's output was daily maps of hot spots where officers were directed to spend patrol time. The company operated in dozens of US cities from its founding in 2011 until it ceased operations in 2023, following sustained criticism of its methodology and the loss of major municipal contracts.

Chicago's Strategic Subject List (SSL) — sometimes called the "heat list" — was a person-based predictive system that scored approximately 400,000 Chicago residents for risk of being involved in a shooting, either as victim or offender. The system assigned risk scores that were shared with police and used to guide outreach and enforcement contact. Evaluations of the program, including a 2016 RAND Corporation study and a 2020 review by Chicago's Office of Inspector General, found high error rates, people listed as high risk who had never been arrested, a disproportionate focus on Black and brown communities, and uses of the list that had not been disclosed to the public. The Chicago Police Department decommissioned the SSL in 2019.

The Feedback Loop Problem

The fundamental methodological problem with predictive policing is the feedback loop: the historical crime data used to train predictions reflects where police previously deployed attention and where crime was therefore previously detected — not an objective picture of where crime occurs. If police patrol heavily in neighborhood A and minimally in neighborhood B, more crime will be detected in neighborhood A even if similar quantities of crime occur in both places, because detection is a function of enforcement intensity. Using detection data as a proxy for occurrence data will consistently predict more crime where police already patrol, directing police to those areas, generating more detections, and confirming the model's predictions in a self-fulfilling cycle.
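The self-fulfilling cycle described above can be made concrete with a deliberately stark toy model. All numbers here are illustrative inventions, not drawn from any real deployment: two areas generate identical true crime, patrols go wherever the historical record shows more detections, and only patrolled crime is observed.

```python
def simulate(days=100):
    """Toy feedback loop: two areas with IDENTICAL true crime rates."""
    true_crimes = [20, 20]   # actual daily incidents, equal in both areas
    detected = [11, 10]      # historical record: one extra report in area 0

    for _ in range(days):
        # Patrols concentrate on the predicted "hot spot": the area with
        # more historical detections.
        hot = 0 if detected[0] >= detected[1] else 1
        # Only crime in the patrolled area is observed; half of its daily
        # incidents are detected and added to the historical record.
        detected[hot] += true_crimes[hot] // 2

    return detected

print(simulate())  # → [1011, 10]
```

A single extra historical report in area 0 is enough to capture all patrol attention forever: after 100 days the record shows roughly a hundredfold difference in "crime" between two areas that are, by construction, identical. Real allocation rules are softer than this winner-take-all sketch, but the direction of the distortion is the same.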

Research by Rashida Richardson, Jason Schultz, and Kate Crawford (2019) documented "dirty data" in predictive policing: cities including New Orleans, Chicago, and others had trained predictive policing systems on crime data that was itself a product of documented police misconduct — falsified police reports, manufactured evidence, unconstitutional stops, corrupt narcotics units. The AI trained on this data learned from corrupted records, but its outputs carry the authority of algorithmic objectivity.

Moratoria and the Reform Movement

The civil society and political response to predictive policing has been substantial. Santa Cruz, California, became the first US city to ban predictive policing in 2020. Los Angeles terminated its PredPol contract in 2020 following a sustained campaign by the Stop LAPD Spying Coalition and other advocacy groups. Portland, Oregon, and New York City implemented moratoria on certain predictive policing applications. In California, bills such as AB 13 have sought to establish statewide requirements for government use of automated decision systems, including predictive policing tools.

The evidence on predictive policing's effectiveness is contested and limited. Studies have found modest crime reduction effects in some deployments; critics argue these effects are achieved through civil liberties violations that would not be tolerated if applied universally rather than concentrated in communities of color. The RAND Corporation's evaluation of predictive policing programs found mixed and often underwhelming effectiveness evidence, with significant methodological challenges in isolating the program's effect from other simultaneous interventions.


Section 3: Risk Assessment Tools

The Landscape of Criminal Justice Risk Assessment

Risk assessment instruments in criminal justice are structured tools — combining interview items, criminal history records, and demographic data to produce numerical scores representing predicted probability of future criminal behavior. They are used in bail decisions, sentencing, parole determinations, and risk classification in corrections facilities.

The major instruments in use include:

COMPAS (Correctional Offender Management Profiling for Alternative Sanctions): Developed by Northpointe/Equivant, COMPAS produces multiple risk scores including general recidivism risk, violent recidivism risk, and risk to community. It uses approximately 137 questions covering criminal history, drug use, residential stability, education, vocation, criminal attitudes, family criminality, and social isolation. General recidivism risk scores range from 1 to 10. It has been deployed in Wisconsin, California, New York, and multiple other jurisdictions. Its proprietary formula was a central issue in the Loomis case.

PSA (Public Safety Assessment): Developed by the Arnold Foundation (now Arnold Ventures), the PSA is specifically designed for pretrial decisions. It uses nine factors derived entirely from criminal history records — no interview required — to produce risk scores for failure to appear and new criminal activity. The Arnold Foundation explicitly designed the PSA to be transparent and publicly available, publishing its methodology. It has been deployed in New Jersey, Kentucky, and other jurisdictions.

ORAS (Ohio Risk Assessment System): A suite of instruments used in Ohio across pretrial, pre-sentence, community supervision, and institutional settings, developed by the University of Cincinnati Corrections Institute. Like PSA, ORAS has publicly available methodology documentation.

LSI-R (Level of Service Inventory — Revised): Widely used in community supervision and corrections, LSI-R covers 54 items across criminal history, education/employment, financial, family/marital, accommodation, leisure/recreation, companions, alcohol/drug problems, emotional/personal, and attitudes/orientation. It is used for risk classification and supervision intensity decisions.

What These Tools Claim to Measure

Risk assessment instruments claim to measure the probability of specific future behaviors — usually rearrest, reconviction, or failure to appear for trial — within a specified time window, most commonly two years. This is an empirical, statistical claim: the score represents where this individual's profile places them within the distribution of outcomes observed for people with similar profiles in historical data.

A critical point for understanding risk assessment: these tools do not predict individual behavior with precision. A "high risk" score does not mean an individual will reoffend; it means that individuals with similar profiles have reoffended at higher rates in historical data. The score is a probabilistic population-level statement about similar people, applied to an individual whose future is not determined by group membership. This application of population statistics to individual destiny is itself ethically contested.
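The population-level nature of these scores can be seen in a small sketch. The records below are entirely hypothetical (the scores, counts, and reoffense rates are made up, not COMPAS data); the point is that a score corresponds to an observed group rate, never to an individual prediction.

```python
from collections import defaultdict

# Hypothetical historical records: (risk_score, reoffended_within_2_years)
history = [(3, 0)] * 80 + [(3, 1)] * 20 + [(8, 0)] * 45 + [(8, 1)] * 55

by_score = defaultdict(lambda: [0, 0])  # score -> [count, reoffenses]
for score, outcome in history:
    by_score[score][0] += 1
    by_score[score][1] += outcome

for score in sorted(by_score):
    n, reoffended = by_score[score]
    print(f"score {score}: {reoffended}/{n} reoffended ({reoffended / n:.0%})")
```

In this toy data a "high risk" score of 8 corresponds to a 55% historical group rate: nearly half of the people who received that score did not reoffend, which is why treating the score as a verdict on any one person is a category error.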

The Bias Evidence

The most influential bias evidence for criminal justice risk assessment tools came from ProPublica's 2016 investigation of COMPAS — discussed in full in the next section. The broader research literature on risk assessment bias is extensive and consistent in finding racial disparities.

A 2016 study by Jennifer Skeem and Christopher Lowenkamp, examining the federal Post Conviction Risk Assessment (PCRA), found that Black defendants received higher average risk scores than white defendants, but that the gap was largely attributable to differences in criminal history and that the instrument predicted rearrest with similar accuracy and calibration across racial groups. On this reading, the score gap reflects genuine differential risk, itself rooted in structural factors such as differential policing and socioeconomic disadvantage, rather than measurement bias per se. This distinction — between an instrument that measures accurately and an instrument that produces a socially just outcome — is central to the bias debate.

A 2018 study by Julia Dressel and Hany Farid found that COMPAS achieved only modest predictive accuracy — roughly equivalent to the predictions of untrained people recruited online and given brief case descriptions — casting doubt on whether algorithmic sophistication adds value over simpler approaches.


Section 4: The COMPAS Investigation in Full

The ProPublica Investigation

On May 23, 2016, ProPublica published "Machine Bias" — one of the most consequential pieces of data journalism in the history of AI accountability. Reporters Julia Angwin, Jeff Larson, Surya Mattu, and Lauren Kirchner had obtained COMPAS risk scores for every defendant scored in Broward County, Florida, over a two-year period, matched those scores to criminal justice outcomes, and conducted a detailed statistical analysis of the instrument's accuracy and racial equity.

The finding that became a cultural reference point: among defendants who did not reoffend within two years, Black defendants were nearly twice as likely as white defendants to have been falsely flagged as high risk (44.9% vs. 23.5%). Among defendants who did reoffend, white defendants were more likely than Black defendants to have been falsely assessed as low risk (47.7% vs. 28.0%). In other words, COMPAS made systematically different types of errors for Black and white defendants: it over-predicted recidivism for Black defendants and under-predicted it for white defendants.

The investigation also found that COMPAS's overall predictive accuracy was modest: it correctly predicted recidivism in about 65% of cases — only slightly better than a simple two-variable model based on age and number of prior offenses. The complex, proprietary, 137-question instrument was not dramatically outperforming simple approaches.
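The two error-rate metrics at the heart of ProPublica's analysis are straightforward to compute from a confusion matrix. The sketch below uses illustrative per-group counts chosen to mirror the rates ProPublica reported; they are not the actual Broward County data.

```python
def error_rates(tp, fp, tn, fn):
    """Group-conditional error rates for a binary high/low risk label.

    fpr: among people who did NOT reoffend, the share flagged high risk.
    fnr: among people who DID reoffend, the share rated low risk.
    """
    fpr = fp / (fp + tn)
    fnr = fn / (fn + tp)
    return fpr, fnr

# Illustrative counts per 200 defendants in each group (hypothetical):
black_fpr, black_fnr = error_rates(tp=72, fp=45, tn=55, fn=28)
white_fpr, white_fnr = error_rates(tp=52, fp=23, tn=77, fn=48)

print(f"Black defendants: FPR {black_fpr:.1%}, FNR {black_fnr:.1%}")
print(f"White defendants: FPR {white_fpr:.1%}, FNR {white_fnr:.1%}")
```

Conditioning on the actual outcome, as this calculation does, is what distinguishes ProPublica's fairness criterion from Northpointe's calibration criterion, which conditions on the score instead.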

ProPublica illustrated its findings with individual case studies: Brisha Borden, a Black 18-year-old charged with petty theft of a bicycle, received a high recidivism risk score and committed no subsequent offense; Vernon Prater, a white 41-year-old with a more serious prior record, received a low risk score and subsequently committed additional crimes. The individual cases put human faces on the statistical pattern; the quantitative analysis, not the anecdotes, carried the evidentiary weight.

Northpointe's Rebuttal

Northpointe responded to the ProPublica analysis with detailed technical critiques. The company argued that ProPublica's analysis was methodologically flawed in a specific and important way: ProPublica had examined error rates conditioned on actual outcome (what percentage of those who didn't reoffend were nonetheless flagged high risk, by race), while the appropriate fairness criterion, Northpointe argued, was calibration — whether the score meant the same thing for Black and white defendants (whether a score of 7 predicted the same actual recidivism rate for Black and white defendants).

Northpointe's analysis showed that COMPAS was well-calibrated across racial groups: a score of 7 predicted roughly the same recidivism rate regardless of the defendant's race. On this criterion, the instrument was fair.

Both claims were technically correct — and, as it soon became clear, no instrument could satisfy both fairness criteria at once given the groups' different base rates. This tension was the spark for one of the most significant mathematical analyses in the AI fairness literature.

The Chouldechova Impossibility Result

In 2017, Carnegie Mellon researcher Alexandra Chouldechova published a mathematical proof showing that when two groups have different base rates of the outcome being predicted (different rates of actual recidivism, in this case), no risk assessment instrument can simultaneously satisfy all of the following fairness criteria: (1) equal false positive rates across groups, (2) equal false negative rates across groups, and (3) calibration (scores mean the same thing across groups). You can achieve any two of these criteria, but not all three simultaneously when base rates differ across groups.

Applied to COMPAS: Black defendants in Broward County had higher actual recidivism rates than white defendants (a disparity itself reflecting structural inequalities in policing, prosecution, and socioeconomic conditions). Given this base rate difference, no calibrated instrument can simultaneously achieve equal false positive rates and equal false negative rates. If the instrument is calibrated (Northpointe's criterion), it will necessarily produce different error rate patterns for the two groups (ProPublica's finding). If it achieves equal error rates, it will necessarily be miscalibrated.

This is not a failure of any specific algorithm — it is a mathematical impossibility result that applies to any classification system when the outcome base rates differ between groups. The Chouldechova result should fundamentally reframe how we think about risk assessment: the question is not whether to violate a fairness criterion (some violation is mathematically unavoidable when base rates differ), but which criterion to prioritize — a question that is ultimately a value judgment about whose interests to protect from which types of error.
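A small numeric sketch makes the impossibility concrete. Assume a perfectly calibrated two-bin instrument — a "low" score means a 20% observed reoffense rate and a "high" score means 60%, in both groups — and let the groups differ only in base rate via the share of people assigned the high score. The bin probabilities and group shares are illustrative inventions, not estimates from any real instrument.

```python
def group_stats(high_share, p_high=0.6, p_low=0.2):
    """Base rate and false positive rate for one group under a calibrated
    two-bin score: a fraction high_share of the group receives the 'high'
    score (true reoffense probability p_high); the rest receive 'low'
    (true probability p_low). Calibration holds by construction."""
    low_share = 1 - high_share
    base_rate = high_share * p_high + low_share * p_low
    # FPR: among those who do NOT reoffend, the fraction flagged high risk.
    fpr = (high_share * (1 - p_high)) / (
        high_share * (1 - p_high) + low_share * (1 - p_low))
    return base_rate, fpr

base_a, fpr_a = group_stats(high_share=0.3)  # lower-base-rate group
base_b, fpr_b = group_stats(high_share=0.6)  # higher-base-rate group

print(f"group A: base rate {base_a:.2f}, FPR {fpr_a:.2f}")  # 0.32, 0.18
print(f"group B: base rate {base_b:.2f}, FPR {fpr_b:.2f}")  # 0.44, 0.43
```

Calibration holds exactly in both groups — a given score means the same observed rate for everyone — yet the higher-base-rate group faces more than twice the false positive rate. This is the ProPublica-versus-Northpointe disagreement in miniature.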

The immediate institutional response was more modest than the public attention suggested. The Wisconsin Supreme Court ruled against Loomis in July 2016, weeks after the ProPublica investigation appeared, and the Chouldechova result created no legal obligation to stop using COMPAS. Several jurisdictions reviewed their use of risk assessment tools; some made modifications. New Jersey's implementation of the PSA for pretrial decisions, which began in 2017, was monitored for disparate impact and subjected to independent evaluation. California enacted requirements for a state agency to evaluate risk assessment tools used in criminal justice.

The longer-term policy trajectory has been mixed. The New Jersey criminal justice reform, which used the PSA as a centerpiece, showed reductions in pretrial detention without significant increases in crime — suggesting that well-implemented risk assessment can reduce mass incarceration without compromising public safety. At the same time, racial disparities in the system persisted: Black defendants were still detained at higher rates than white defendants with similar scores, suggesting that racial bias in judicial decision-making compounded whatever bias existed in the tool itself.

The Arnold Foundation (which developed the PSA) took the transparency route that Northpointe did not: publishing its methodology fully, commissioning independent evaluations, and making the tool free to jurisdictions. This transparency did not eliminate bias concerns — the PSA's inputs include criminal history data that reflects biased historical enforcement — but it enabled independent scrutiny that proprietary tools foreclosed.


Section 5: Bail Algorithms

The Pretrial Detention Crisis

The American pretrial detention system is both enormous in scale and severe in consequence. Approximately 500,000 people are detained in jails on any given day awaiting trial — that is, awaiting a proceeding that has not yet determined their guilt. Many are detained because they cannot afford bail, not because anyone has determined that they pose a flight risk or a threat to public safety. The consequences of pretrial detention are severe and well-documented: defendants who are detained before trial are more likely to plead guilty (even when innocent, to secure release), receive longer sentences if convicted, lose employment and housing during detention, and suffer family separation.

Bail reform — replacing cash bail with risk-based release decisions — was embraced by many reformers as a response to the pretrial detention crisis. If defendants could be released or detained based on flight risk and public safety risk rather than wealth, poor defendants would no longer be disadvantaged relative to wealthy defendants charged with identical offenses.

New Jersey's Reform and Its Outcomes

New Jersey implemented criminal justice reform in 2017 that effectively eliminated cash bail, using the PSA to guide pretrial release decisions. The reform was significant: pretrial detention rates fell substantially, the jail population declined, and importantly, rates of failure to appear and new criminal activity during the pretrial period did not increase significantly — the reform achieved its stated goals.

But racial disparities did not disappear. Analysis of New Jersey's post-reform data found that Black defendants continued to be detained at higher rates than white defendants with similar PSA scores — suggesting that judicial decision-making added racial bias on top of whatever was in the tool's outputs. The reform also faced political backlash: high-profile crimes committed by defendants released under the new system generated intense criticism that led to modification of the reform in subsequent years.

The New Jersey experience illustrates a recurring pattern: algorithmic criminal justice tools can be deployed in ways that reduce mass incarceration, but they do not automatically overcome the racial bias in human decision-making that surrounds them, and they create political dynamics in which individual high-profile failures are used to roll back systemic reforms that, on aggregate, improved outcomes.


Section 6: AI in Policing — Surveillance Technologies

The Surveillance Infrastructure

Beyond predictive policing, AI is now woven through the full surveillance infrastructure of American and international policing. This infrastructure includes:

Automated License Plate Readers (ALPRs): Systems mounted on patrol cars, bridges, and fixed infrastructure that photograph and record the license plate of every vehicle they encounter, automatically running plates against warrants, stolen vehicle databases, and potentially other lists. Major cities have built dense networks of ALPR readers that provide near-comprehensive records of vehicle movement throughout urban areas. This data is retained by local police departments and often shared with national networks. ACLU analysis of ALPR data has found that the vast majority of plates read are not associated with any criminal activity — the system generates a permanent database of innocent people's movements.

ShotSpotter (now SoundThinking): Acoustic gunshot detection systems that use a network of sensors to detect, locate, and alert police to possible gunshots in real time. Deployed in approximately 120 US cities, ShotSpotter activates police response to algorithmically identified events. The system's accuracy and legal implications are the subject of the case study accompanying this chapter.

Social Media Monitoring: Police departments use tools — including commercial platforms purpose-built for law enforcement — to monitor public social media posts, identify individuals, and compile profiles. The Brennan Center for Justice has documented the use of social media monitoring against political protesters, religious communities, and advocacy groups in ways that raise First and Fourth Amendment concerns.

Gang Databases: Many jurisdictions maintain databases of individuals categorized as gang members or gang associates, based on criteria that often rely on subjective assessments by individual officers. These databases have been found to be inaccurate, to contain individuals added without notice or process, and to disproportionately include people of color. AI tools that incorporate gang database status as an input variable inherit these data quality problems.


Section 7: Facial Recognition in Law Enforcement

Documented Wrongful Arrests

Chapter 26 of this textbook addresses facial recognition bias in detail. The specific context of facial recognition in law enforcement is characterized by several documented wrongful arrests that illustrate the human cost of algorithmic error compounded by institutional failure.

Robert Williams was arrested in Detroit in January 2020 after facial recognition software misidentified him as a shoplifting suspect. Williams, a Black man, was arrested in front of his family, detained for 30 hours, and charged — before investigators eventually compared the suspect photograph to Williams directly and determined they had the wrong person. The Detroit Police Department acknowledged the misidentification. The ACLU, representing Williams, brought a complaint against the department, and Detroit subsequently restricted facial recognition use.

Michael Oliver, also Black, was wrongly arrested by Detroit police in 2019 based on facial recognition; charges were eventually dismissed. Porcha Woodruff, a pregnant Black woman, was wrongly arrested in Detroit in 2023 based on a facial recognition match — one of the most recent high-profile cases. The pattern of wrongful arrests in Detroit — all involving Black individuals — reflects the documented higher error rates of facial recognition systems for darker-skinned subjects.

Nijeer Parks in New Jersey, Randal Reid in Georgia, and Alonzo Sawyer in Maryland represent additional documented cases of wrongful arrests attributed to facial recognition misidentification, all involving Black men.

Departmental Policies and Reform

The response to documented facial recognition failures has been a patchwork of municipal and state-level policies. San Francisco banned police facial recognition use in 2019 — the first major US city to do so. Boston, Portland, Oregon, and several other cities followed with restrictions or bans. Detroit, despite its documented wrongful arrests, adopted a policy permitting facial recognition use but requiring supervisory review and prohibiting arrests based on facial recognition alone — a "human in the loop" requirement that acknowledges the technology's fallibility.

Several states enacted legislation addressing facial recognition in law enforcement: Washington, Massachusetts, Maine, and others established requirements for warrants, disclosures, or limitations on use. Comprehensive federal legislation on law enforcement facial recognition use had not been enacted as of 2024.


Section 8: The Due Process Problem

Opacity and the Right to Challenge

The due process problem with criminal justice AI has constitutional and ethical dimensions that cannot be fully separated. The constitutional dimension concerns what the Due Process Clause of the US Constitution requires when AI systems influence criminal proceedings. The ethical dimension concerns what fair process demands even when constitutional minimums are technically met.

In Loomis v. Wisconsin, the due process argument was specific: Eric Loomis could not examine the COMPAS algorithm's formula, could not question an expert witness on how his specific inputs drove his specific output, and could not meaningfully challenge the assessment's application to his case. The Wisconsin Supreme Court held that this did not constitute a due process violation because: the defendant was provided with his risk scores and the factors they were based on; COMPAS was used as one factor among many; and the judge's sentence was independently supported by other evidence.

The court's reasoning is legally defensible but ethically troubling. Providing a defendant with their score and the general categories of factors does not enable meaningful challenge if the defendant cannot understand how the factors were weighted, cannot examine whether their specific inputs were correctly recorded, and cannot independently verify the model's accuracy or detect errors specific to their case. The ability to challenge evidence used against you is a cornerstone of fair adjudication — and algorithmic opacity forecloses meaningful challenge.

The Trade Secrecy Defense

Northpointe's defense of COMPAS's secrecy has consistently invoked trade secrecy: the formula is a proprietary business asset whose disclosure would destroy its commercial value. This is a remarkable assertion in the criminal justice context. Trade secrecy — a principle governing commercial competitive advantage — is being used to defeat the rights of defendants facing criminal punishment by the state.

Several jurisdictions have responded by requiring that algorithmic tools used in criminal proceedings be disclosed to the defense. Some courts have ordered that defendants receive access to source code and validation studies of forensic software used against them, and New York City has established reporting requirements for automated decision systems used by city agencies. These disclosure requirements create some accountability but face legitimate security concerns (disclosed scoring logic can be gamed) and technical challenges (source code disclosure does not by itself enable meaningful audit without supporting data and expertise).


Section 9: International Comparison

UK Predictive Policing

The United Kingdom has piloted predictive policing in several jurisdictions. Kent Police's multi-year deployment of the PredPol location-prediction system and Durham Constabulary's HART (Harm Assessment Risk Tool), used to inform custody decisions, have been the most prominently analyzed. HART's use of a random forest model to classify defendants as "high," "moderate," or "low" risk of reoffending drew criticism for opacity — the model's decision logic was not readily interpretable even by the researchers who developed it, let alone by defendants whose custody decisions it influenced.

A 2019 Big Brother Watch investigation and a 2021 Science and Technology Committee inquiry in Parliament found that UK predictive policing pilots lacked adequate evidence of effectiveness and adequate accountability mechanisms. Scrutiny of HART also revealed that its input variables included postcode-linked data — effectively using geography as a proxy for race and class in ways that risked reproducing demographic profiling.

EU AI Act's Prohibitions

The EU AI Act, adopted in 2024, takes a substantially more restrictive approach to criminal justice AI than US frameworks have produced. The Act prohibits certain AI applications categorically, including:

  • AI systems that assess or predict the risk that an individual will commit a criminal offence based solely on profiling or on assessment of personality traits
  • Real-time remote biometric identification in publicly accessible spaces by law enforcement, except for narrowly defined purposes such as searching for victims of serious crimes, preventing specific terrorist threats, and locating suspects of certain serious offences
  • AI-based social scoring that produces detrimental treatment of people in contexts unrelated to the data originally collected

For criminal justice AI applications that are permitted but classified as "high risk" — including risk assessment, bail, sentencing, and supervision — the Act requires conformity assessments, registration in an EU database, technical documentation, human oversight requirements, and transparency to affected individuals.

France's Prohibition on Judicial Scoring

France's 2019 justice reform law made it an offense to reuse the identity data of judges or court clerks for the purpose of evaluating, analyzing, comparing, or predicting their actual or supposed professional practices, effectively prohibiting AI tools that score or profile individual judges. This reflected concern about algorithmic optimization of judicial outcomes — the risk that AI systems would be used to predict likely judicial decisions and to optimize prosecutorial or defense strategies accordingly, in ways that might constrain judicial independence or enable gaming of justice outcomes.

The Dutch SyRI Lessons

The Netherlands' SyRI system — discussed in Chapter 29 — has direct relevance to criminal justice AI. In 2020, the Hague District Court found that SyRI's use in identifying welfare fraud suspects violated Article 8 of the European Convention on Human Rights because its algorithm was insufficiently transparent to allow meaningful challenge by those it profiled. The court held that the lack of meaningful legal protection against algorithmic profiling violated fundamental rights — a ruling with broader implications for risk assessment tools in criminal justice contexts.


Section 10: Reimagining Criminal Justice AI

Conditions for Ethical Use — If Any

The question of whether any use of AI in criminal justice can be genuinely ethical depends on what we think the criminal justice system is supposed to accomplish and what constraints ethical criminal justice requires. This is a contested question with serious positions on multiple sides.

The reformist position holds that algorithmic tools can be used ethically in criminal justice if specific conditions are met: the tools are validated with adequate evidence of accuracy; they are transparent and their methodology is publicly available; their disparate impact across demographic groups is measured, disclosed, and minimized; they are used only as advisory inputs to human decision-makers rather than as determinative; defendants have meaningful opportunity to challenge their application; and they are subject to ongoing independent evaluation with automatic review when evidence of harm emerges.

The abolitionist position — represented by scholars including Bernard Harcourt and advocates including the Electronic Frontier Foundation — holds that the fundamental problem with criminal justice AI is not technical but systemic: because the criminal justice system is itself marked by profound racial and socioeconomic inequality, algorithmic tools trained on criminal justice data will inevitably encode and perpetuate that inequality regardless of technical safeguards. This position calls for abolishing not just flawed algorithms but the entire carceral system that deploys them. On this view, improving COMPAS is like optimizing the design of a tool used for an unjust purpose — the ethical response is to refuse the purpose, not to improve the tool.

Between these positions lies a range of views on partial reforms: prohibiting specific high-risk applications (facial recognition for law enforcement, person-based predictive policing) while permitting lower-risk uses (validated pretrial risk assessment with full transparency and disclosure mechanisms); requiring human decision-maker accountability such that algorithmic outputs cannot substitute for judicial judgment; and building robust challenge mechanisms including court-appointed technical experts for defendants whose freedom is affected by algorithmic assessments.

Transparency Requirements

The minimum requirements for any defensible use of AI in criminal justice include: full public disclosure of the tool's methodology, including training data, input variables, and weighting; ongoing validation studies examining accuracy and disparate impact; disclosure to defendants of their specific inputs and the factors that drove their score; access for defense counsel to documentation sufficient for meaningful challenge; independent evaluation by bodies not funded by the tool's developer; and mandatory reassessment when evidence of systematic error or bias emerges.

What Genuine Accountability Requires

Accountability for criminal justice AI requires more than disclosure. It requires that someone be legally and institutionally responsible when algorithmic criminal justice causes harm — when a wrongful arrest occurs based on a facial recognition misidentification, when a person is detained longer than necessary based on a flawed risk assessment, when a person is paroled or denied parole based on an instrument that is later found to be invalid. The current accountability structure — a mix of qualified immunity for law enforcement officers, narrow state liability, and virtually no liability for algorithm developers — creates an accountability gap where significant harm is nobody's specifically actionable responsibility.

Closing this accountability gap would require: vendor liability for tools that cause harm through documented inaccuracy or discriminatory impact; personal accountability for officials who implement AI tools without adequate due diligence; and civil rights enforcement mechanisms sufficient to deter discriminatory AI deployment. None of these exist adequately in current US law.


Section 11: The Vendor Ecosystem and Market Dynamics

Who Builds and Sells Criminal Justice AI

The AI tools used across the criminal justice pipeline are not built by government agencies; they are built by private companies and sold to governments as commercial products. This vendor ecosystem has its own dynamics — competitive, financial, and political — that shape what tools get built, how they are marketed, what their capabilities actually are, and how accountability for their performance is structured.

The criminal justice AI vendor market includes: established analytics companies that pivoted into this space (Palantir, SAS, IBM); dedicated criminal justice AI startups (Equivant/COMPAS, PredPol/Geolitica, SoundThinking/ShotSpotter, Axon, Motorola Solutions); and general AI companies whose products are adapted for criminal justice contexts (Microsoft, Amazon, Google selling facial recognition and cloud AI services to law enforcement). Each segment has different accountability dynamics.

Dedicated criminal justice AI vendors depend entirely on government contracts for revenue. Their business model requires maintaining those contracts, winning new ones, and defending their products against criticism — creating strong financial incentives to market effectiveness claims aggressively, to resist independent evaluation that might produce unfavorable results, and to leverage political relationships with police leadership and elected officials who champion their tools. The ShotSpotter/SoundThinking case illustrates this dynamic: a vendor whose primary revenue comes from police department contracts responded to critical research with legal challenges and political advocacy rather than genuine engagement with evidence.

The Due Diligence Gap in Procurement

Government procurement of criminal justice AI tools has historically been characterized by inadequate due diligence. Police chiefs and correctional administrators making procurement decisions typically lack the technical expertise to evaluate algorithmic tools' accuracy claims, the research expertise to assess validation studies' methodological quality, or the statistical expertise to understand disparate impact analyses. Vendor demonstrations are optimized to be compelling; validation studies provided by vendors are designed to support their products; and peer references from other police departments who have adopted the tool provide social proof rather than independent evidence.

The result is a systematic information asymmetry: vendors know far more about their tools' actual performance than the agencies purchasing them, and procurement processes do not effectively close this gap. Third-party evaluation — by academic researchers, independent auditors, or government technical agencies — could close the gap, but is rarely required as a condition of procurement and is actively resisted by some vendors who fear unfavorable results.

Several jurisdictions have moved to address this gap. New York City's Automated Decision Systems Law (Local Law 49 of 2018) created a task force to examine the city's use of automated decision systems and recommend procedures for their disclosure and oversight; a subsequent mayoral executive order established an algorithms management and policy officer to oversee agency AI use. California legislation requires independent validation of risk assessment tools used in criminal proceedings. These requirements represent progress, but comprehensive due diligence standards for criminal justice AI procurement remain the exception rather than the rule.


Section 12: Race, Criminal Justice AI, and the History of Discrimination

Understanding Why Bias Is Structural, Not Incidental

The racial disparities documented in predictive policing, risk assessment, facial recognition, and other criminal justice AI applications are not random errors that better algorithms will eliminate. They are structural features of systems trained on data produced by racially unequal historical practices. Understanding why this is so — not merely as an observation but as a causal claim — is essential for anyone assessing the ethics of criminal justice AI.

The criminal justice system in the United States has deep roots in racial hierarchy. The criminal codes and their enforcement in the antebellum South were explicitly racial instruments. Post-Reconstruction vagrancy laws, Black Codes, and convict leasing were mechanisms for racial control using criminal justice processes. Through the 20th century, racially discriminatory policing, prosecution, and sentencing were extensively documented and partially addressed through civil rights litigation and legislation — but only partially. The mass incarceration era beginning in the 1970s and accelerating through the 1980s and 1990s disproportionately affected Black communities, producing incarceration rates so high that they have been characterized by scholars as constituting a new racial caste system.

The criminal records and criminal justice contact data that trains AI criminal justice systems is the accumulated output of this history. A model trained on this data is trained on the results of decades of discriminatory policing, prosecutorial discretion exercised in racially disparate ways, and sentencing patterns that disadvantaged people of color. The algorithm did not create this inequality; the algorithm learns from and therefore perpetuates it.

This structural understanding has a practical implication: attempts to debias criminal justice AI by adjusting the algorithm — holding race constant, reweighting variables, optimizing for different fairness metrics — operate on the model without changing the data it learns from. So long as the training data reflects historical discrimination, the model will encode historical discrimination in ways that are not fully correctable through post-hoc algorithmic adjustment. The Chouldechova impossibility result makes this explicit in the domain of risk assessment: when actual recidivism rates differ across racial groups (as they do, due to structural inequalities in the criminal justice system that produced those rates), no calibrated algorithm can achieve equal error rates across those groups.
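The arithmetic behind the impossibility result can be made concrete. For a binary risk tool, Bayes' rule ties the false positive rate to the positive predictive value and the group's base rate: FPR = p/(1-p) · (1-PPV)/PPV · (1-FNR). If a tool is calibrated (equal PPV across groups) and misses offenders at the same rate (equal FNR), then groups with different base rates must have different false positive rates. A minimal sketch, using illustrative numbers rather than COMPAS's actual figures:

```python
def implied_fpr(base_rate: float, ppv: float, fnr: float) -> float:
    """False positive rate forced by calibration.

    Derived from Bayes' rule for a binary predictor:
        FPR = p/(1-p) * (1-PPV)/PPV * (1-FNR)
    where p is the group's base rate of actual recidivism.
    """
    return (base_rate / (1 - base_rate)) * ((1 - ppv) / ppv) * (1 - fnr)

# Two groups scored by the same calibrated tool: equal PPV, equal FNR.
# Only the base rates differ (hypothetical values for illustration).
fpr_a = implied_fpr(base_rate=0.5, ppv=0.6, fnr=0.35)  # ~0.433
fpr_b = implied_fpr(base_rate=0.3, ppv=0.6, fnr=0.35)  # ~0.186

# The higher-base-rate group is wrongly flagged "high risk"
# more than twice as often -- with no bias in the model itself.
print(f"FPR group A: {fpr_a:.3f}, FPR group B: {fpr_b:.3f}")
```

This is the shape of the ProPublica/Northpointe dispute: Northpointe showed calibration, ProPublica showed unequal false positive rates, and the identity above shows both findings can be simultaneously true.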

The Cumulative Disadvantage Dynamic

Criminal justice AI does not merely encode discrimination; it can intensify discrimination through a cumulative disadvantage dynamic. Consider: a Black teenager in a neighborhood subject to heavy predictive policing is more likely to be stopped and questioned than a white teenager in a less-policed neighborhood engaging in identical behavior. If that stop produces an arrest record, that record feeds into risk assessment instruments as a predictor of future recidivism — regardless of whether the underlying behavior warranted arrest. The arrest record increases the risk score. The higher risk score increases the likelihood of detention before trial. Pretrial detention is associated with worse trial outcomes and greater likelihood of conviction. A conviction produces a longer criminal record. A longer criminal record increases future risk scores. At each stage, the algorithmic system processes the output of previous discriminatory decisions and produces the next discriminatory decision.

This is not a failure of any individual algorithm; it is a system property. Criminal justice AI that processes the outputs of a discriminatory system will produce more discriminatory outputs as inputs to the next stage — producing a feedback loop that intensifies rather than merely replicates historical inequality.
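The feedback loop described above can be sketched as a toy simulation. Every parameter here is an illustrative assumption, not an estimate from any real jurisdiction; the point is only the qualitative shape — identical underlying behavior plus unequal policing intensity yields a record gap that grows faster than the policing gap itself:

```python
def expected_record(policing_intensity: float, periods: int = 5) -> float:
    """Expected accumulated arrest record over several periods.

    Illustrative assumptions: each period, the chance of a stop equals
    the neighborhood's policing intensity; half of stops produce an
    arrest; and a longer record raises scrutiny, and with it the arrest
    probability, by 20% per prior expected arrest.
    """
    record = 0.0
    for _ in range(periods):
        arrest_prob = policing_intensity * 0.5 * (1 + 0.2 * record)
        record += arrest_prob
    return record

heavy = expected_record(0.6)  # heavily policed neighborhood
light = expected_record(0.2)  # lightly policed neighborhood

# Policing intensity differs 3x, but the accumulated record gap
# exceeds 3x: the record itself feeds back into future arrests.
print(f"heavy: {heavy:.2f}, light: {light:.2f}, ratio: {heavy / light:.2f}")
```

Under these assumed parameters the record ratio comes out above the 3x intensity ratio — a small numerical illustration of why the loop amplifies rather than merely replicates the initial disparity.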

What This Means for Reform Strategy

Understanding the structural basis of criminal justice AI bias has important implications for reform strategy. Reforms that target individual algorithms — replacing COMPAS with a more accurate tool, banning PredPol while allowing other predictive policing systems, requiring facial recognition accuracy standards — address symptoms while leaving the structural problem intact. A more accurate recidivism prediction tool trained on racially biased criminal history data will produce more precisely encoded racial bias, not less racial bias.

This structural analysis does not necessarily lead to the abolitionist conclusion that all criminal justice AI is irredeemable; it leads to the conclusion that the path to genuinely fair criminal justice AI runs through structural reform of the criminal justice system itself — not just technical improvement of the algorithms. Reforms that reduce racially differential policing, prosecution, and sentencing will produce training data with less encoded bias, which will produce AI tools with less encoded bias. In the absence of structural reform, technical AI improvement is likely to produce more sophisticated encoding of structural inequality rather than its elimination.

For business professionals considering criminal justice AI deployment or investment, this structural analysis suggests several concrete implications. First, vendor claims of bias mitigation that focus on algorithmic adjustment without addressing training data quality should be treated with appropriate skepticism: a well-calibrated model trained on biased data is a more precisely biased model. Second, independent validation of AI tools in the specific jurisdiction where they will be deployed — not just in the training jurisdiction — is essential, as the structural biases in criminal justice data vary significantly across jurisdictions. Third, ongoing monitoring with public reporting of disparate impact in deployment is non-negotiable: what a tool does in the field may differ substantially from what it does on validation data. Fourth, genuine accountability mechanisms — not just contractual representations by vendors but legal liability for demonstrable harm — are necessary conditions for responsible criminal justice AI deployment.
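The monitoring implication above can be made concrete with a minimal sketch. Given per-defendant deployment records — group label, whether the tool flagged the person high risk, whether they in fact reoffended — group-wise false positive and false negative rates are straightforward to compute and publish. The record format and field names here are assumptions for illustration, not any vendor's actual schema:

```python
from collections import defaultdict

def error_rates_by_group(records):
    """Compute per-group FPR and FNR from deployment outcomes.

    records: iterable of (group, flagged_high_risk, reoffended) tuples.
    Returns {group: {"fpr": ..., "fnr": ...}}; a rate is None when its
    denominator is empty (e.g. no non-reoffenders observed in a group).
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0, "tn": 0})
    for group, flagged, reoffended in records:
        if flagged and reoffended:
            counts[group]["tp"] += 1
        elif flagged:
            counts[group]["fp"] += 1
        elif reoffended:
            counts[group]["fn"] += 1
        else:
            counts[group]["tn"] += 1
    rates = {}
    for group, c in counts.items():
        neg, pos = c["fp"] + c["tn"], c["fn"] + c["tp"]
        rates[group] = {
            "fpr": c["fp"] / neg if neg else None,  # flagged, did not reoffend
            "fnr": c["fn"] / pos if pos else None,  # not flagged, reoffended
        }
    return rates

# Tiny synthetic example -- not real data.
sample = [
    ("A", True, False), ("A", True, False), ("A", False, True), ("A", False, False),
    ("B", True, False), ("B", False, False), ("B", False, False), ("B", True, True),
]
print(error_rates_by_group(sample))
```

A real monitoring regime would add confidence intervals and follow-up windows, but even this level of routine disaggregated reporting exceeds what most deployed criminal justice AI tools currently publish.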

The history of criminal justice AI has been characterized by adoption significantly faster than evidence accumulation, and by accountability significantly weaker than the consequences imposed on those affected by algorithmic errors warrant. Changing this pattern requires regulatory requirements, institutional design, and leadership commitment of a different order from what has characterized the field to date.


Summary

AI is now present at every stage of the American criminal justice system — from predicting where crimes will occur, through determining who is detained before trial, through influencing sentences, through managing incarceration, through governing release. At each stage, the same patterns appear: accuracy claims that often exceed actual performance; racial disparities that reflect the historical inequities of the data on which systems are trained; opacity that forecloses meaningful challenge; and accountability gaps that leave the people most harmed with the least recourse.

The mathematical impossibility results that emerged from the COMPAS controversy — Chouldechova's proof that multiple fairness criteria cannot be simultaneously satisfied when group base rates differ — reframe the criminal justice AI debate: the question is not whether to violate a fairness criterion, but which criterion to prioritize. This is a political and ethical choice, not a technical one, and it should be made explicitly and democratically rather than embedded invisibly in algorithmic design.

The criminal justice system is among the most consequential domains for AI deployment — it operates on human freedom, it is already marked by profound inequality, and its outputs can mean years of imprisonment or wrongful conviction. These stakes demand the highest standards of evidence, transparency, accountability, and challenge — standards that current AI criminal justice practice does not consistently meet.


Next: Chapter 31 examines AI's environmental footprint — the energy, carbon, and water costs of building and running AI systems, and who bears those costs.