> "The law, in its majestic equality, forbids rich and poor alike to sleep under bridges, to beg in the streets, and to steal loaves of bread."
Learning Objectives
- Analyze the use of AI in criminal justice, including risk assessment, predictive policing, and surveillance
- Evaluate the constitutional and civil rights implications of algorithmic justice systems
- Assess accountability frameworks for AI-caused harm in justice contexts
- Compare approaches to algorithmic justice across jurisdictions
- Develop evidence-based policy recommendations for equitable justice AI
In This Chapter
- Chapter Overview
- 17.1 Predictive Policing: Data, Bias, and the Feedback Loop
- 17.2 Risk Assessment Tools: Sentencing by Algorithm
- 17.3 Constitutional Questions: Due Process and Equal Protection
- 17.4 Who's Accountable When AI Gets It Wrong?
- 17.5 Alternatives and Reforms: Reimagining Justice AI
- 17.6 Chapter Summary
- Spaced Review
- 🎯 Project Checkpoint: AI Audit Report — Step 17
- What's Next
"The law, in its majestic equality, forbids rich and poor alike to sleep under bridges, to beg in the streets, and to steal loaves of bread." — Anatole France, The Red Lily (1894)
Chapter Overview
In May 2016, a reporter named Julia Angwin and her team at the investigative news organization ProPublica published a story that changed the conversation about artificial intelligence in America. The story was about a piece of software that most people had never heard of — a risk assessment tool called COMPAS — and it made a simple, devastating claim: the algorithm was biased against Black defendants.
COMPAS (Correctional Offender Management Profiling for Alternative Sanctions) was used in courtrooms across the United States. Judges consulted its output when deciding whether to set bail, what sentence to impose, and whether someone was likely to reoffend. The algorithm assigned each defendant a "risk score" from 1 to 10. A high score meant the system predicted you were likely to commit another crime. A low score meant it predicted you probably would not.
ProPublica analyzed over 7,000 defendants in Broward County, Florida, and found a pattern: Black defendants who did not go on to reoffend were almost twice as likely as white defendants to be falsely flagged as high risk. White defendants who did go on to reoffend were almost twice as likely as Black defendants to be falsely labeled low risk. The algorithm, ProPublica concluded, was systematically biased.
The story provoked an immediate and fierce debate — one that, years later, remains unresolved. Northpointe (the company that built COMPAS, now called Equivant) fired back: their algorithm was equally accurate for Black and white defendants in terms of overall prediction rates. Both sides were technically correct. They were measuring fairness differently, and — as you learned in Chapter 9 — it is mathematically impossible to satisfy all reasonable definitions of fairness simultaneously when the base rates of the thing you are predicting differ between groups.
This chapter is about what happens when that mathematical impossibility collides with the real world of courtrooms, jail cells, police patrols, and constitutional rights. We are going to examine how AI systems are being used throughout the criminal justice system — from the moment police decide where to patrol, to the moment a judge decides how long someone spends behind bars. We are going to ask hard questions about due process, equal protection, and accountability. And we are going to confront a challenge that does not have easy answers: when an algorithm helps make a decision that ruins someone's life, who is responsible?
If you have been following our anchor example CityScope Predict, this is the chapter where that story comes to a head. And if you have been tracking the recurring theme of "who benefits, who is harmed" — this is where the stakes are highest.
In this chapter you will learn to:
- Analyze the use of AI in criminal justice — risk assessment, predictive policing, and surveillance
- Evaluate constitutional and civil rights implications of algorithmic justice
- Assess accountability frameworks for AI-caused harm
- Compare approaches to algorithmic justice across different jurisdictions
- Develop evidence-based policy recommendations for equitable justice AI
Learning Paths
Fast Track (60 minutes): Read sections 17.1, 17.3, 17.4, and 17.6. Complete the Check Your Understanding prompts and the Project Checkpoint.
Deep Dive (3–4 hours): Read all sections, complete the Check Your Understanding prompts, work through the Argument Map, read both case studies, and write up your accountability analysis for the AI Audit Report.
17.1 Predictive Policing: Data, Bias, and the Feedback Loop
Imagine you are a police chief in a mid-size American city. Your department has limited resources — never enough officers, never enough hours in the day. A technology company approaches you with a pitch: their software can analyze years of historical crime data and produce detailed maps showing exactly where crimes are most likely to occur next week. Instead of sending patrols out based on intuition, habit, or tradition, you could allocate them based on data. More efficient. More objective. More effective.
This is the promise of predictive policing — the use of algorithms to forecast where and when crimes will occur, and sometimes to predict who will commit them. The concept gained traction in the early 2010s, with companies like PredPol (now Geolitica) and Palantir offering tools to police departments across the United States and beyond.
The pitch sounds reasonable. But the moment you look beneath the surface, you find one of the most vivid examples of how AI can encode and amplify human bias rather than eliminate it.
The Data Problem
Predictive policing systems are trained on historical crime data. That data comes from police records — arrests, incident reports, 911 calls. And here is the fundamental problem: historical crime data does not measure crime. It measures policing.
Think about that distinction carefully. If a neighborhood has been heavily policed for decades — more patrols, more stop-and-frisk encounters, more drug sweeps — then the data will show more crime in that neighborhood. Not because more crime happens there, but because more crime is detected there. Meanwhile, identical activity in a lightly policed neighborhood goes unrecorded.
This matters enormously because the pattern of heavy policing in the United States has not been random. Decades of research document that Black and Latino neighborhoods have been subject to substantially more aggressive policing than white neighborhoods with comparable or even higher rates of actual criminal activity. The data reflects not just crime, but the history of who got policed and who did not.
💡 Intuition: Imagine you are looking for lost keys using only a flashlight. You search the kitchen thoroughly and find three keys there. You briefly scan the living room and find none. Does that mean the kitchen has more keys? No — it means you looked harder in the kitchen. Predictive policing does something similar: it assumes the flashlight beam (police attention) was evenly distributed, when in fact it was concentrated on specific neighborhoods.
The Feedback Loop
Now introduce the algorithm. It ingests historical data showing high crime rates in certain neighborhoods. It produces a heat map recommending more police presence in those neighborhoods. Officers are deployed there. They make more arrests. The new arrest data is fed back into the algorithm. The algorithm now has even more evidence that those neighborhoods are "high crime" areas. It recommends even more policing.
This is a runaway feedback loop — a cycle in which an AI system's outputs become its own future inputs, reinforcing and amplifying the pattern that existed in the original data. In the policing context, the loop works like this:
1. Historical data reflects biased policing patterns
2. Algorithm identifies certain areas as "high risk" based on that data
3. Police are sent to those areas in greater numbers
4. More arrests occur in those areas (because more officers are present)
5. New data confirms the algorithm's prediction
6. Algorithm doubles down on those areas
7. Return to step 3
The result is that the algorithm does not discover where crime is occurring. It amplifies where policing has already been concentrated. As the legal scholar Andrew Ferguson has written, predictive policing risks creating "a digital version of the same biased policing that generated the biased data."
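To make the loop concrete, here is a minimal Python sketch with invented numbers: two neighborhoods with identical true rates of criminal activity inherit a biased 70/30 patrol allocation. Because the algorithm only ever sees detected crime, the initial bias never corrects itself; with stronger reinforcement dynamics, it can grow.

```python
def simulate_feedback_loop(n_rounds=10):
    """Toy model of the loop above: two neighborhoods with IDENTICAL
    true rates of criminal activity, but a biased 70/30 historical
    patrol allocation. All numbers are invented for illustration."""
    true_rate_a = true_rate_b = 0.3     # actual activity is the same
    patrols_a, patrols_b = 70.0, 30.0   # historical (biased) allocation
    detected_a = detected_b = 0.0       # cumulative recorded "crime"
    for _ in range(n_rounds):
        # Recorded crime depends on actual activity AND police presence.
        detected_a += true_rate_a * patrols_a
        detected_b += true_rate_b * patrols_b
        # The algorithm reallocates 100 patrols proportional to records.
        total = detected_a + detected_b
        patrols_a = 100 * detected_a / total
        patrols_b = 100 * detected_b / total
    return round(patrols_a), round(patrols_b)

print(simulate_feedback_loop())  # (70, 30): the data "confirms" the bias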
📊 Research Spotlight: A 2016 study published in the statistics magazine Significance by researchers at the Human Rights Data Analysis Group tested the feedback loop hypothesis directly. Using drug crime data from Oakland, California, the researchers showed that a predictive policing algorithm trained on historical drug arrest data would direct police disproportionately to neighborhoods with high concentrations of Black and Latino residents — even though surveys consistently show that drug use is roughly equal across racial groups. The algorithm was not detecting crime; it was detecting policing patterns.
CityScope Predict: The Story So Far
You first met CityScope Predict in Chapter 1 — a predictive policing system being considered by the fictional city of Millhaven. The city council is divided. Police Chief Rodriguez argues that the system will reduce bias by replacing "gut feelings" with data. Council member Aisha Thompson, who represents a historically over-policed district, warns that the data itself is biased.
In Chapter 7, you analyzed how CityScope Predict makes its predictions — processing inputs (historical crime data, time of day, weather, events) and producing outputs (heat maps, patrol recommendations). In Chapter 9, you applied different fairness metrics to CityScope Predict and discovered the impossibility of satisfying all of them simultaneously. In Chapter 13, you examined what governance structures might oversee its deployment.
Now, in this chapter, we confront the justice questions head-on: Does CityScope Predict violate anyone's constitutional rights? If it directs police to a neighborhood and an innocent person is stopped, who is accountable? And is there a way to build such a system that is genuinely equitable?
🔄 Check Your Understanding: Explain the difference between "crime data" and "policing data." Why does this distinction matter for predictive policing systems? Connect this to the concept of biased training data from Chapter 4.
What Happened in the Real World
Several real-world cities have grappled with these exact questions — and reached different conclusions.
Los Angeles adopted PredPol in 2011, making it one of the first major cities to use predictive policing. By 2019, an audit by the city's inspector general found that the system was deployed disproportionately in neighborhoods with higher concentrations of Black residents. In 2020, following public pressure and the recommendations of a community oversight board, the LAPD discontinued the program.
Chicago developed a "Strategic Subject List" (also called the "heat list") — an algorithm that attempted to predict not just where crimes would occur, but who would be involved in gun violence, either as perpetrator or victim. A 2016 RAND Corporation evaluation found the list ineffective at reducing gun violence, and subsequent analyses showed it was disproportionately composed of Black men from certain neighborhoods. The city eventually discontinued the program.
New Orleans used Palantir's predictive system from 2012 to 2018 without publicly disclosing it. The program was revealed through investigative reporting, raising questions about transparency and democratic accountability. The city council had not voted to approve the system, and most residents did not know it existed.
These are not hypothetical concerns. These are documented outcomes from real cities, affecting real people.
17.2 Risk Assessment Tools: Sentencing by Algorithm
Predictive policing determines where police go. Risk assessment tools determine what happens to the people they encounter. These tools — technically called risk assessment instruments (RAIs) — are used at multiple points in the criminal justice process:
- Pretrial: Should a defendant be released on bail, held in jail, or released with conditions while awaiting trial?
- Sentencing: How long should the sentence be? Should it include alternatives to incarceration?
- Parole: Should an incarcerated person be released early?
- Supervision: What level of monitoring does a person on probation or parole need?
The logic is seductive: instead of relying on a single judge's intuition — which might be influenced by the judge's mood, implicit biases, or how recently they had lunch (yes, this has been studied) — why not use a statistical model that weighs dozens of factors systematically?
How Risk Assessment Tools Work
Most RAIs work by scoring defendants on a set of factors statistically associated with the likelihood of rearrest. These factors typically include:
- Criminal history (prior arrests, convictions, failures to appear in court)
- Age (younger people are statistically more likely to be rearrested)
- Employment and housing status
- Substance use history
- Social environment factors (peer criminal involvement, neighborhood characteristics)
The tool produces a score — often on a scale from 1 to 10 — that represents the predicted probability of a specific outcome, usually rearrest within a defined time period. Judges then use this score alongside other information when making their decisions.
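A hypothetical sketch of the scoring mechanics may help. The factors and weights below are invented for illustration; real instruments derive their weights from statistical models fit to historical rearrest data.

```python
def risk_score(prior_convictions, failures_to_appear, age):
    """Hypothetical decile-style risk score. The weights here are
    invented for illustration, not taken from any real instrument."""
    raw = (0.6 * prior_convictions          # criminal history
           + 0.4 * failures_to_appear       # court non-appearance
           + 0.3 * max(0, 30 - age) / 10)   # youth raises the score
    return min(10, max(1, round(raw) + 1))  # clamp to the 1-10 scale

print(risk_score(prior_convictions=3, failures_to_appear=1, age=22))  # 3
```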
⚠️ Critical Distinction: Risk assessment tools predict rearrest, not reoffending. These are very different things. Rearrest depends not only on a person's behavior but on police presence, surveillance intensity, and prosecutorial decisions. A person in a heavily policed neighborhood is more likely to be rearrested for the same behavior than a person in a lightly policed one. This means the tool is partially predicting policing patterns, not just individual behavior — the same problem we saw with predictive policing.
The COMPAS Controversy
The ProPublica investigation of COMPAS, described in this chapter's opening, remains the most widely discussed case study in algorithmic justice. Let us examine it more carefully, because it reveals something profound about the nature of fairness.
ProPublica's key finding was about false positive rates: among defendants who did not go on to reoffend, Black defendants were far more likely than white defendants to have been incorrectly classified as high risk. In other words, if you were Black and were not going to commit another crime, the algorithm was much more likely to mistakenly label you as dangerous.
Northpointe (COMPAS's developer) responded with a different metric: predictive parity. They argued that among defendants scored as high risk, the percentage who actually went on to reoffend was roughly the same for Black and white defendants. In other words, when the algorithm said "high risk," it was equally accurate regardless of race.
Both claims were true. Both are legitimate ways to measure fairness. And — as a team of researchers led by Jon Kleinberg at Cornell demonstrated in 2016 — it is mathematically impossible to satisfy both simultaneously when the base rates differ between groups (that is, when the actual recidivism rate is different for the two groups being compared). This is not a software bug. It is a fundamental constraint.
💡 Connecting Back to Chapter 9: This is the "fairness impossibility theorem" in action. In Chapter 9, you learned that different fairness metrics can conflict with one another. The COMPAS case is the most consequential real-world illustration of that principle. The algorithm cannot simultaneously equalize false positive rates across racial groups and maintain equal predictive accuracy across groups — not because the algorithm is poorly designed, but because the underlying rates are different.
The question, then, is not "Is the algorithm fair?" but rather "Which kind of fairness matters most in this context — and who gets to decide?"
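The disagreement becomes concrete when you compute both metrics. Here is a minimal sketch, assuming illustrative inputs: a list of 1–10 scores and a 0/1 rearrest outcome for each defendant in a group.

```python
def compas_metrics(scores, reoffended, threshold=7):
    """Compute, for one group of defendants, the two fairness metrics
    at the heart of the COMPAS debate. Inputs are illustrative: scores
    on a 1-10 scale and a 0/1 rearrest outcome per defendant."""
    flagged = [s >= threshold for s in scores]
    fp = sum(f and not y for f, y in zip(flagged, reoffended))
    tn = sum(not f and not y for f, y in zip(flagged, reoffended))
    tp = sum(f and y for f, y in zip(flagged, reoffended))
    # ProPublica's metric: how often non-reoffenders are wrongly flagged.
    fpr = fp / (fp + tn)
    # Northpointe's metric: how often a "high risk" label proves correct.
    precision = tp / (tp + fp)
    return fpr, precision
```

Run separately for each group, equalizing `fpr` across groups is ProPublica's standard; equalizing `precision` is Northpointe's. When the groups' base rates differ, the arithmetic forces at least one of the two to differ.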
The Human Judgment Baseline
A crucial question often missing from the debate: compared to what? Risk assessment tools do not replace a fair and unbiased process. They replace human judges, who bring their own set of biases.
Research on judicial decision-making reveals troubling patterns. Studies have found that:
- Judges set higher bail for Black defendants than for white defendants charged with similar offenses
- Sentencing severity can vary based on the judge assigned to a case, not just the facts of the case
- Decisions can be influenced by factors that have nothing to do with justice, including time of day and cognitive fatigue
This does not mean risk assessment tools are automatically better. But it means the comparison should not be "algorithm vs. perfect justice." It should be "algorithm vs. the status quo" — and the status quo is far from perfect.
⚖️ Ethical Analysis: The Baseline Problem
When evaluating an AI system in the justice context, always ask: "Compared to what?" This is not a defense of flawed algorithms — it is an insistence on honest comparison.
Consider three scenarios for pretrial detention decisions:
1. Pure judicial discretion — judges decide alone, bringing their expertise and their biases
2. Algorithm replaces judge — the risk score determines the outcome automatically
3. Algorithm informs judge — the judge sees the risk score as one input among many
Most current deployments use Scenario 3. But even in Scenario 3, research shows that judges may anchor on the algorithmic score and fail to adequately weigh other information. The tool influences the decision even when it is not supposed to be the decision.
Your position: Which scenario do you think is most defensible? What safeguards would you add to your preferred approach?
🔄 Check Your Understanding: A friend argues that "we should just remove race from the algorithm, and then it will be fair." Using what you know from Chapters 9 and 17, explain why this solution is more complicated than it sounds. (Hint: think about proxy variables.)
17.3 Constitutional Questions: Due Process and Equal Protection
The United States Constitution does not mention algorithms, data science, or artificial intelligence. The original document was written in 1787, and the Fourteenth Amendment was added in 1868. But its principles — particularly that amendment's guarantees of due process and equal protection — are directly relevant to AI in the justice system. And similar constitutional and human rights principles apply in democracies worldwide.
Due Process: Can You Challenge a Machine?
The Fourteenth Amendment states that no person shall be deprived of "life, liberty, or property, without due process of law." Due process has two components:
Procedural due process means you have a right to a fair process — notice of the charges against you, an opportunity to be heard, the right to confront evidence, and access to an impartial decision-maker.
Substantive due process means the government cannot deprive you of fundamental rights for arbitrary or irrational reasons, regardless of the procedures it follows.
Algorithmic decision-making challenges both forms.
The opacity problem. Many risk assessment tools are proprietary. The company that built the algorithm considers it a trade secret. Defendants and their attorneys cannot inspect the code, the training data, or the specific factors that produced a particular score. How do you challenge evidence you cannot see? How do you confront a witness that is a black box?
In the landmark case Loomis v. Wisconsin (2016), Eric Loomis was sentenced to six years in prison. The judge explicitly referenced a COMPAS risk score in the sentencing decision. Loomis argued that using a proprietary algorithm he could not inspect violated his due process rights. The Wisconsin Supreme Court ruled against him — finding that the tool was acceptable as long as it was not the sole basis for the sentencing decision. But the court also acknowledged that the algorithm's proprietary nature was "a legitimate concern" and instructed judges to use caution.
The case reached the U.S. Supreme Court, which declined to hear it — leaving the constitutional question unresolved.
🗺️ Argument Map: Due Process and Algorithmic Sentencing
| Position | Key Arguments | Strongest Challenge |
| --- | --- | --- |
| Algorithms violate due process | Defendants cannot inspect proprietary code; cannot meaningfully challenge opaque evidence; right to confront evidence requires transparency | Judges already consider many factors defendants cannot fully challenge (psychiatric evaluations, presentence reports) |
| Algorithms are compatible with due process | They are advisory, not determinative; judges retain discretion; outputs can be contested even without seeing the code | If a judge relies on the score, the advisory distinction may be meaningless in practice; anchoring bias is well-documented |
| Due process requires transparency, not prohibition | Mandate open-source algorithms or government-built tools; require disclosure of factors and weights; allow expert testimony to challenge scores | Transparency may not help if the system is too complex for judges, attorneys, or juries to understand; "explainability" is itself a difficult technical problem |
The right to explanation. A growing movement argues that defendants should have a legally enforceable right to explanation — the right to receive a meaningful, human-understandable account of how an algorithmic decision was reached. The European Union's General Data Protection Regulation (GDPR) includes a version of this right for automated decisions. The United States does not currently have a federal equivalent, though some states and cities have begun legislating in this direction.
Equal Protection: Algorithmic Discrimination
The Fourteenth Amendment also guarantees "equal protection of the laws" — the principle that the government cannot treat similarly situated people differently based on protected characteristics like race, gender, or religion.
Disparate treatment occurs when a system explicitly uses a protected characteristic (like race) as a factor in a decision. Most modern risk assessment tools do not include race as an input variable. Problem solved? Not quite.
Disparate impact occurs when a facially neutral system produces systematically different outcomes for different groups — even if race is never explicitly considered. This is where proxy variables become critical. Factors like zip code, employment status, education level, and prior arrest history are all correlated with race in the United States, because they reflect the legacy of segregation, redlining, and discriminatory policing. An algorithm that uses these factors — even without knowing a defendant's race — can reproduce and reinforce racial disparities.
💡 Intuition: Imagine a rule that says "we will only hire people who went to a school within 5 miles of our office." The rule does not mention race. But if the office is in a historically segregated white neighborhood, the rule has a racially disparate impact. Algorithms work the same way — neutral-sounding inputs can produce discriminatory outputs when those inputs are shaped by historical discrimination.
The legal standard for disparate impact in the United States requires showing that a policy has a disproportionate adverse effect and that there are less discriminatory alternatives available. Applying this framework to algorithmic tools is an active area of legal scholarship, and courts have not yet established clear precedent.
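A small simulation makes the proxy mechanism vivid. Everything below is synthetic and the rates are invented; the point is only that a rule which never sees race can still flag one group at roughly twice the rate of another.

```python
import random

random.seed(0)

# Synthetic world: zip code correlates with race because of historical
# segregation, and arrest records reflect policing intensity by zip code.
people = []
for _ in range(10_000):
    race = random.choice(["A", "B"])
    # Group B is concentrated in zip 1, the historically over-policed area.
    zip_code = 1 if random.random() < (0.8 if race == "B" else 0.2) else 0
    # Prior-arrest probability depends on the zip code, not on race.
    prior_arrest = random.random() < (0.5 if zip_code == 1 else 0.1)
    people.append((race, zip_code, prior_arrest))

# A "race-blind" rule: flag anyone with a prior arrest as high risk.
for race in ("A", "B"):
    group = [p for p in people if p[0] == race]
    share_flagged = sum(p[2] for p in group) / len(group)
    print(race, round(share_flagged, 2))
# Prints roughly: A 0.18, B 0.42 -- a 2x disparity, with race never used.
```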
A Global Perspective
Different countries and legal traditions are approaching these questions in markedly different ways:
- The European Union has taken the most aggressive regulatory stance. The EU AI Act (2024) classifies AI systems used in law enforcement and criminal justice as "high risk," requiring mandatory conformity assessments, human oversight, transparency requirements, and documentation of training data and performance metrics. The Act prohibits certain uses of AI in justice, including real-time facial recognition in public spaces for law enforcement purposes (with limited exceptions).
- The United Kingdom takes a sector-specific approach, with guidance from bodies like the Centre for Data Ethics and Innovation. UK courts have heard challenges to algorithmic policing tools, and some police forces have voluntarily discontinued predictive systems.
- China has deployed AI extensively in its justice system, including a "smart court" system that reportedly assists with sentencing recommendations and a social credit system with implications for justice-related decisions. The transparency and accountability mechanisms differ fundamentally from Western democratic models.
- Canada conducted an Algorithmic Impact Assessment review of federal government systems and found that many were being deployed without adequate oversight or equity analysis.
🔄 Check Your Understanding: Explain the difference between disparate treatment and disparate impact. Give an example of an AI system in the justice context that could produce disparate impact even without using race as an input variable.
17.4 Who's Accountable When AI Gets It Wrong?
On a cold morning in Detroit in January 2020, Robert Julian-Borchak Williams was arrested in his driveway in front of his wife and young daughters. He was taken to a detention center and held for thirty hours. His alleged crime: shoplifting several watches from a luxury goods store. The evidence: a facial recognition match between a blurry surveillance image and his driver's license photo.
Williams had never been to the store. The facial recognition system had produced a false match. He was the first known case of a wrongful arrest caused by facial recognition technology in the United States — but not the last. In the following years, at least two additional wrongful arrests based on faulty facial recognition matches were publicly documented in the Detroit area, all involving Black men. Research has consistently shown that facial recognition systems have significantly higher error rates for darker-skinned faces, particularly darker-skinned women.
When something goes wrong in a traditional criminal justice context — when a witness misidentifies someone, or an officer fabricates evidence — there are (imperfect but existing) accountability mechanisms. The witness can be cross-examined. The officer can be investigated. The evidence can be challenged.
But when an algorithm gets it wrong, who do you hold accountable?
The Accountability Gap
The accountability gap is the space between an AI system's harmful output and any individual or institution that can be held responsible for it. In the justice context, this gap is particularly dangerous because the consequences — arrest, detention, conviction, incarceration — are among the most severe a person can experience.
Consider the chain of actors involved when a risk assessment tool contributes to an unjust outcome:
- The developers who built the algorithm and chose its training data
- The company that sold it to the jurisdiction
- The government agency that procured and deployed it
- The judge who consulted the score (but was told not to rely on it exclusively)
- The legislature that authorized (or failed to regulate) its use
Each actor can point to someone else. The developer says "we just built a tool — judges make the decisions." The judge says "I was told the tool was validated — I relied on expert assurance." The government agency says "we followed procurement best practices." The company says "we provided documentation and training." The legislature says "we did not anticipate this specific use."
This is not unique to AI — accountability gaps exist in many complex institutional settings. But AI systems exacerbate the problem because of their opacity (it is hard to see how the error occurred), their scale (the same error can affect thousands of people simultaneously), and their appearance of objectivity (people treat algorithmic outputs as more neutral than human judgments, even when they are not).
⚖️ Ethical Analysis: Accountability Mapping
For any AI system deployed in a justice context, map the following:
- Who designed it? (developer, institution)
- Who validated it? (was it independently tested? by whom?)
- Who decided to deploy it? (elected officials, appointed administrators, individual officers)
- Who uses it on a daily basis? (are they trained? do they understand its limitations?)
- Who is affected by its outputs? (defendants, communities, victims)
- Who can challenge its outputs? (defense attorneys, advocacy organizations, oversight bodies)
- Who has the power to shut it down? (elected officials, courts, contract terms)
If any of these questions cannot be answered clearly, you have identified an accountability gap.
Existing and Emerging Accountability Frameworks
Several approaches to closing the accountability gap have been proposed or implemented:
Algorithmic Impact Assessments (AIAs) are structured evaluations conducted before an AI system is deployed. Modeled on environmental impact assessments, they require the deploying agency to document the system's purpose, training data, expected effects on different populations, potential harms, and mitigation strategies. New York City's Local Law 144 (2021) is one example, requiring bias audits for AI-based hiring tools — though its scope is limited and its enforcement has been contested. Canada's federal government has implemented a mandatory Algorithmic Impact Assessment tool for government systems.
Mandatory transparency and disclosure requires that jurisdictions publicly disclose when they are using algorithmic tools, what data those tools use, and how they have been validated. Idaho, for example, passed a 2019 law requiring that pretrial risk assessment tools be transparent and open to public inspection, and a growing number of cities and states have restricted law enforcement use of facial recognition or required its disclosure.
Independent auditing involves having a third party — not the developer or the deployer — evaluate the system's performance, accuracy, and equity. The challenge is that meaningful auditing requires access to the algorithm and its training data, which companies often resist providing.
Individual liability frameworks would hold specific actors — developers, deploying officials, or both — legally accountable for algorithmic harms. This approach is largely theoretical in the United States, where courts have not yet established clear standards. In the EU, the AI Act includes provisions for penalties on providers and deployers who fail to comply with requirements.
👁️ Perspective-Taking: Three Viewpoints on Accountability
The defendant's attorney: "My client was held in jail for three days because a proprietary algorithm said they were high risk. I cannot examine the algorithm. I cannot cross-examine it. I cannot even see what factors it weighed. This is fundamentally incompatible with the Sixth Amendment right to confront evidence."
The county administrator: "We adopted this tool to reduce bias, not increase it. We followed the vendor's guidelines, we trained our judges, and we told them to use the score as one factor among many. We did our due diligence. If the algorithm has flaws, that is between the vendor and the researchers who need to improve it."
The algorithm developer: "We have been transparent about what the tool can and cannot do. Our validation studies show it performs as well as or better than unaided human judgment. We cannot control how individual judges use the output, or whether jurisdictions deploy it in ways we did not intend."
Notice how each actor is telling a reasonable story — and how the combined effect is that no one is responsible.
ContentGuard: Moderation as Justice
Though ContentGuard is a content moderation system rather than a criminal justice tool, the accountability parallels are striking. When ContentGuard removes a post, bans a user, or flags content as violating community standards, it is making a judgment about what speech is acceptable. That judgment can have real consequences: lost income for creators, silenced political speech, and — in countries where online expression is monitored by the state — physical danger.
Who is accountable when ContentGuard wrongly removes content? The AI system? The platform that deployed it? The policy team that wrote the rules it enforces? The human reviewer who upheld the AI's decision on appeal?
This connection — between content moderation and criminal justice — may seem like a stretch, but legal scholars increasingly argue that platform governance operates as a private justice system, with its own rules, its own enforcement mechanisms, and its own punishments. The accountability gaps are remarkably similar.
Priya's Semester: Academic Accountability
Consider Priya's situation from a different angle. Her university has adopted an AI plagiarism detection system — a tool that scans student submissions and flags potential academic integrity violations. In one case, the system flags a paragraph in Priya's essay as "substantially AI-generated." The university's academic integrity board convenes a hearing.
Priya did not use AI to write the paragraph. She wrote it herself. But the detection tool's confidence score is 87%. The burden has effectively shifted: Priya must prove she did not use AI, a task that is nearly impossible. How do you prove a negative?
This is accountability in reverse — an AI system making an accusation, with the accused bearing the burden of disproof. It mirrors dynamics in the criminal justice system, where risk assessment scores can shift the burden from the state (which must prove dangerousness) to the defendant (who must overcome a high-risk label).
🔄 Check Your Understanding: Explain the concept of the "accountability gap." Identify at least three factors that make accountability harder for AI systems than for individual human decision-makers.
17.5 Alternatives and Reforms: Reimagining Justice AI
If the previous sections left you feeling uneasy, good. That unease is appropriate. But it should not lead to paralysis. The question is not simply "Should we use AI in the justice system?" — it is "Under what conditions, with what safeguards, and with whose consent?"
This section examines concrete reforms that have been proposed, piloted, or implemented. Not all of them are AI-based — some argue the best reform is reducing reliance on algorithmic tools altogether.
Approach 1: Fix the Data
Some advocates argue the problem is not the algorithms but the data they are trained on. If historical policing data reflects biased practices, then train on better data — or adjust the data to correct for known biases.
This approach has real merit. Researchers have developed techniques for debiasing training data — identifying and correcting patterns that reflect discriminatory practices rather than underlying differences in behavior. But debiasing is technically difficult, conceptually contested (what does "unbiased" crime data even look like?), and risks creating a false sense of security ("we fixed the data, so the algorithm must be fair now").
Approach 2: Fix the Algorithm
Others argue for building fairer algorithms from the ground up — tools explicitly designed to satisfy particular fairness constraints. For example, an algorithm could be constrained to produce equal false positive rates across racial groups, accepting that this may reduce overall predictive accuracy.
This approach forces a policy choice into the open: which kind of fairness do we want, and how much accuracy are we willing to sacrifice for it? That is a legitimate and important debate — but it is a political debate, not a technical one. The algorithm cannot resolve it. People must.
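As a sketch of what such a constraint can look like in practice, the hypothetical helper below picks, for each group separately, the lowest score threshold whose false positive rate stays under a chosen target.

```python
def threshold_for_fpr(scores, reoffended, target_fpr):
    """Lowest threshold on a 1-10 scale whose false positive rate stays
    at or below target_fpr. One crude way to impose an equal-FPR
    constraint, applied separately per group. Illustrative only."""
    negatives = [s for s, y in zip(scores, reoffended) if not y]
    for t in range(1, 11):
        fpr = sum(s >= t for s in negatives) / len(negatives)
        if fpr <= target_fpr:
            return t
    return 11  # flag no one in this group
```

Note that applying different thresholds to different groups equalizes false positive rates at the cost of predictive parity and some overall accuracy, and it arguably constitutes explicit disparate treatment. That is precisely the point: the tradeoff cannot be engineered away, only chosen.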
Approach 3: Fix the Process
A third approach focuses not on the algorithm itself but on the institutional processes surrounding its use:
- Mandatory training for judges and officers on algorithmic limitations, including specific instruction on feedback loops, proxy variables, and the difference between prediction and explanation
- Structured override protocols that require judges to document their reasoning when they deviate from an algorithmic recommendation — and, equally important, when they follow one
- Community review boards with genuine authority to evaluate, modify, or discontinue algorithmic tools
- Sunset clauses that require re-authorization after a defined period, preventing tools from becoming entrenched without ongoing scrutiny
Approach 4: Reduce Reliance on Prediction
The most ambitious reform challenges the entire premise. Do we need to predict who will commit crimes? Or does the prediction paradigm itself lead to unjust outcomes — treating people not for what they have done but for what a statistical model says they might do?
Advocates of this approach point to alternatives that focus on need rather than risk: connecting people to housing, employment, mental health services, and substance abuse treatment rather than calculating their probability of rearrest. Some jurisdictions — including Washington, D.C., and New Jersey — have reformed their pretrial systems to release far more defendants without money bail, finding that the vast majority appear for their court dates without incident.
📊 Comparison: Reform Approaches
| Approach | Focus | Strength | Limitation |
| --- | --- | --- | --- |
| Fix the data | Training data quality | Addresses root cause | "Unbiased" data may be impossible |
| Fix the algorithm | Fairness constraints | Forces explicit value choices | Technical fixes can obscure political problems |
| Fix the process | Institutional safeguards | Preserves human judgment | Depends on institutional will and funding |
| Reduce reliance | Questioning the paradigm | Avoids prediction pitfalls entirely | Politically difficult; requires alternative infrastructure |
The Role of Affected Communities
One principle unites almost all reform proposals: the people most affected by algorithmic justice systems should have a meaningful role in decisions about whether and how those systems are deployed.
This is not just an ethical principle — it is a practical one. Community members often have knowledge that developers and administrators lack. They know which neighborhoods are over-policed. They know which factors in a risk assessment tool reflect systemic disadvantage rather than individual behavior. They know what "public safety" actually means in their daily lives.
Meaningful community engagement is not the same as a public comment period or a town hall meeting held after the decision has already been made. It means:
- Early involvement — before procurement, not after deployment
- Genuine authority — the power to reject or modify proposals, not just provide input
- Accessible information — technical documentation translated into plain language
- Diverse representation — including formerly incarcerated individuals, defense attorneys, and community organizations, not just law enforcement voices
🧪 New Technique: Algorithmic Impact Assessment (AIA)
An Algorithmic Impact Assessment is a structured evaluation conducted before deploying an AI system in a high-stakes context. It requires answering the following questions:
- Purpose: What specific problem is this system designed to solve?
- Alternatives: What non-algorithmic approaches were considered?
- Data: What data will the system use? What are its known limitations and biases?
- Equity analysis: How will the system affect different demographic groups?
- Accuracy: What are the error rates? How do they vary by group?
- Accountability: Who is responsible for monitoring, auditing, and correcting the system?
- Community input: How were affected communities consulted?
- Sunset provision: When and how will the system be reevaluated?
Try applying this framework to your AI Audit Report system. Even if your system is not in the criminal justice domain, the AIA structure applies to any high-stakes AI deployment.
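If it helps to keep your answers organized, here is one minimal way to represent an AIA as structured data. The field names are illustrative, not drawn from any official instrument.

```python
from dataclasses import dataclass

@dataclass
class AlgorithmicImpactAssessment:
    """A minimal template mirroring the questions above. Field names
    are illustrative, not taken from any official AIA instrument."""
    purpose: str                  # problem the system is meant to solve
    alternatives: list            # non-algorithmic options considered
    data_sources: list            # data used, with known limitations
    equity_analysis: str          # expected effects by demographic group
    error_rates_by_group: dict    # accuracy, broken out by group
    accountable_parties: dict     # role -> named office or person
    community_consultation: str   # how affected communities were heard
    sunset_date: str              # when reevaluation is required
```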
🔄 Check Your Understanding: A city is considering adopting a predictive policing system. Draft three specific conditions you would require before approving the deployment. For each condition, explain what harm it is designed to prevent.
17.6 Chapter Summary
This chapter has taken you through some of the most consequential applications of AI in contemporary society. Let us consolidate what we have learned.
Predictive policing systems are trained on data that reflects policing patterns, not crime patterns. When historical data encodes decades of racially biased enforcement, algorithms trained on that data risk amplifying rather than correcting those biases. Runaway feedback loops — in which the algorithm's outputs become its own future inputs — can entrench and intensify discriminatory patterns.
Risk assessment tools force a confrontation with the impossibility of universal fairness. The COMPAS controversy demonstrated that different legitimate definitions of fairness can produce contradictory conclusions about the same system. The choice of which fairness metric to prioritize is a political and ethical decision, not a technical one.
Constitutional principles — due process and equal protection — apply to algorithmic decision-making, but courts have not yet fully worked out how. The right to inspect and challenge evidence is complicated by proprietary algorithms. Disparate impact doctrine is complicated by proxy variables that reproduce racial disparities without explicitly considering race.
Accountability gaps are a structural problem, not an individual failing. When multiple actors — developers, vendors, agencies, judges — each bear partial responsibility, the result is often that no one bears meaningful responsibility. Closing these gaps requires intentional institutional design: impact assessments, mandatory transparency, independent auditing, and community oversight.
Reform is possible, but it requires political will, not just better technology. Fixing the data, fixing the algorithm, fixing the process, and reducing reliance on prediction are all viable strategies — and they are not mutually exclusive. The most promising reforms combine technical improvements with institutional safeguards and genuine community participation.
📋 Key Concepts Introduced in This Chapter
| Concept | Definition |
| --- | --- |
| Predictive policing | Using algorithms to forecast where crimes will occur, directing police resources accordingly |
| Risk assessment instrument (RAI) | A tool that scores individuals on their predicted likelihood of rearrest, used in bail, sentencing, and parole decisions |
| Runaway feedback loop | A cycle in which an AI system's outputs become future inputs, amplifying existing patterns |
| Algorithmic accountability | The principle that institutions deploying AI should be responsible for its outcomes |
| Due process (procedural/substantive) | Constitutional guarantee of fair legal procedures and protection against arbitrary government action |
| Equal protection | Constitutional principle that similarly situated people must be treated equally by the law |
| Disparate impact | When a facially neutral system produces systematically different outcomes for different groups |
| Right to explanation | The principle that individuals affected by algorithmic decisions should receive meaningful accounts of how those decisions were reached |
| Algorithmic Impact Assessment | A structured evaluation conducted before deploying a high-stakes AI system |
| Accountability gap | The space between an AI system's harmful output and any person or institution that can be held responsible |
Spaced Review
These questions revisit concepts from earlier chapters. Try answering them before checking.
From Chapter 7 (AI Decision-Making): What is the difference between a prediction and a decision? Why does this distinction matter when an AI system "recommends" a sentence for a defendant?
Review
A prediction is a probability estimate — "there is a 72% chance this person will be rearrested." A decision is an action — "this person should be denied bail." AI systems make predictions; humans (should) make decisions. But when judges anchor on algorithmic predictions, the line between prediction and decision blurs. The system's prediction becomes the decision in practice, even if the judge retains formal authority.
From Chapter 9 (Bias and Fairness): Name two different fairness metrics that were applied to COMPAS. Why can they not both be satisfied simultaneously?
Review
False positive rate parity (equal rates of incorrectly labeling non-reoffenders as high risk across racial groups) and predictive parity (equal accuracy among those labeled high risk across groups). When base rates of the predicted outcome differ between groups — as recidivism rates do — satisfying one metric necessarily violates the other. This is the fairness impossibility theorem in action.
From Chapter 13 (Governing AI): What governance structure would you recommend for a city deploying a predictive policing system? How does it differ from governance for a lower-stakes application?
Review
High-stakes applications require more rigorous governance: mandatory impact assessments, independent auditing, community oversight boards with genuine authority, mandatory transparency about the system's design and performance, sunset clauses requiring periodic reauthorization, and clearly defined accountability chains. Lower-stakes applications may use lighter-touch governance while still requiring transparency and basic accountability.
🎯 Project Checkpoint: AI Audit Report — Step 17
Your task: Analyze the accountability structures for the AI system you have been studying throughout the course.
For this chapter, complete the following:
1. Accountability mapping. Identify every actor in the chain of responsibility for your AI system's outputs: the developer, the deployer, the user, the regulator (if any), and the people affected. For each actor, describe what responsibility they bear and what power they have to change the system.
2. Due process analysis. If your system's output affects individuals, can those individuals inspect, understand, and challenge the system's decisions? If not, what mechanisms could be created to enable this?
3. Disparate impact assessment. Based on your research (including the bias audit in Chapter 9), does your system produce different outcomes for different demographic groups? If so, are there less discriminatory alternatives?
4. Accountability gap identification. Where in the accountability chain is responsibility unclear or absent? Propose a specific mechanism to close each gap you identify.
5. Policy recommendation. Draft one specific, actionable policy recommendation for improving accountability for your AI system. Be concrete: who would implement it, what would it require, and how would compliance be verified?
Deliverable: 2–3 pages. Add to your AI Audit portfolio.
What's Next
In Chapter 18: AI and the Environment — Climate, Resources, and Sustainability, we shift our focus from justice to the planet. You will learn that AI systems have a significant environmental footprint — from the energy consumed by massive data centers to the water used to cool them, to the minerals mined to build the hardware they run on. But you will also learn that AI is being used to fight climate change, optimize energy systems, and monitor deforestation. The chapter asks you to weigh both sides of the ledger and evaluate whether AI's environmental benefits can outweigh its costs — and for whom.