Chapter 30: Quiz — AI in Criminal Justice Systems
20 questions. Mix of multiple choice, true/false, and short answer.
Multiple Choice
1. The ProPublica "Machine Bias" investigation of COMPAS found that among defendants who did NOT reoffend within two years, what was the false positive rate for Black defendants compared to white defendants?
A) Black defendants: 28%; White defendants: 44% B) Black defendants: 44.9%; White defendants: 23.5% C) Black defendants: 50%; White defendants: 30% D) The false positive rates were essentially equal across racial groups
Answer: B — ProPublica found Black defendants were nearly twice as likely as white defendants to be falsely classified as high risk among true non-recidivists (44.9% vs. 23.5%).
2. Northpointe's rebuttal to the ProPublica analysis argued that COMPAS was fair because it demonstrated:
A) Equal false positive rates across racial groups B) Equal false negative rates across racial groups C) Calibration — that a given score predicted the same actual recidivism rate regardless of race D) No statistically significant difference in overall accuracy across racial groups
Answer: C — Northpointe argued calibration: that a COMPAS score of 7 predicted approximately the same actual two-year recidivism rate for Black and white defendants, making the scores mean the same thing regardless of race.
3. The Chouldechova impossibility result (2017) proved that:
A) Algorithmic risk assessment can never be accurate for minority populations B) When two groups have different base rates of the predicted outcome, no classifier can simultaneously satisfy equal false positive rates, equal false negative rates, and calibration C) COMPAS specifically was incapable of fairness due to its training data D) Human judges are inherently more fair than algorithmic systems
Answer: B — Chouldechova's mathematical proof shows this is a constraint on any classification system when group base rates differ, not a property of COMPAS specifically.
4. In Loomis v. Wisconsin (2016), the Wisconsin Supreme Court held that:
A) Using COMPAS in sentencing unconstitutionally violated due process B) Proprietary algorithmic risk assessment scores cannot be used in criminal sentencing C) The use of COMPAS as one factor among many in sentencing did not violate due process, despite the formula's secrecy D) Risk assessment instruments must be open source to be used in criminal proceedings
Answer: C — The court rejected Loomis's due process challenge, holding that providing the defendant with his scores and the general factor categories satisfied due process requirements when COMPAS was used as one input among many.
5. The MacArthur Justice Center's 2021 investigation of ShotSpotter in Chicago found that what percentage of ShotSpotter alerts dispatched to police led to no evidence of a gun crime?
A) 45% B) 67% C) 89% D) 95%
Answer: C — The MacArthur Justice Center found that 89% of ShotSpotter alerts dispatched to Chicago police resulted in no evidence of a gun crime when officers responded.
6. The "feedback loop problem" in predictive policing refers to:
A) Police officers giving feedback to improve the algorithm's accuracy over time B) The self-reinforcing cycle where enforcement in algorithmically identified areas generates more crime detection data, which directs more enforcement to those areas C) AI systems learning from their own mistakes in real time D) Community feedback mechanisms that improve police accountability
Answer: B — The feedback loop: more policing of X generates more crime detection in X, which validates the model's prediction of X as a high-crime area, directing even more police to X.
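The dynamic can be made concrete with a minimal deterministic sketch. All numbers below are hypothetical: two areas have identical true incident rates, but the historical detection data carries a slight skew toward area 0. Because detection data is only generated where patrols actually go, the model's estimate for the neglected area never updates, and the initial skew locks in.

```python
# Hypothetical two-area sketch of the predictive-policing feedback loop.
# Both areas have the SAME true incident rate; only the historical data differ.

true_rate = [0.5, 0.5]          # identical underlying incident rate per patrol
detections = [3.0, 2.0]         # slight historical skew toward area 0
patrols = [5.0, 5.0]            # past patrol counts (denominator for estimates)

for day in range(200):
    # "Predictive" step: estimate each area's rate from its own detection data.
    est = [detections[i] / patrols[i] for i in range(2)]
    target = 0 if est[0] >= est[1] else 1   # send today's patrol to the top area
    patrols[target] += 1
    # Detections accrue only where the patrol went (expected value, so the
    # sketch is deterministic).
    detections[target] += true_rate[target]

print(patrols)   # area 0 absorbs every new patrol despite identical true rates
```

Area 1's estimate stays frozen at its initial value because no patrols are ever sent there to generate new data, so the model's prediction is "validated" by data the model itself caused to be collected.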
7. The Arnold Foundation's Public Safety Assessment (PSA) differs from COMPAS most significantly in that:
A) The PSA uses more input variables and is therefore more accurate B) The PSA is publicly available with its full methodology published, enabling independent scrutiny C) The PSA is used for parole decisions rather than bail decisions D) The PSA includes interview-based inputs rather than relying solely on criminal records
Answer: B — The Arnold Foundation explicitly published the PSA's full methodology, making it transparent and independently evaluable — the opposite of COMPAS's proprietary approach.
8. Which of the following cities announced the termination of its ShotSpotter contract in 2024?
A) New York City B) Los Angeles C) Chicago D) Philadelphia
Answer: C — Chicago Mayor Brandon Johnson announced in February 2024 that the city would not renew its ShotSpotter contract, which expired in September 2024.
9. The EU AI Act (2024) treats which of the following criminal justice AI applications as categorically prohibited?
A) Risk assessment tools used in parole decisions, if adequately validated B) AI used by public authorities to assess criminality risk on the basis of individual profiling C) Facial recognition used in airports for border security D) Automated license plate readers
Answer: B — The EU AI Act (Article 5) categorically prohibits AI systems used to assess or predict an individual's risk of committing a criminal offence based solely on profiling or assessment of personality traits.
10. Robert Williams's 2020 wrongful arrest in Detroit was attributed to:
A) A false eyewitness identification B) A fabricated police report C) A facial recognition system misidentifying him as a shoplifting suspect D) A COMPAS score that incorrectly predicted violent behavior
Answer: C — Williams was arrested after Detroit Police used facial recognition software that misidentified him from a surveillance image; he was detained for about 30 hours before investigators compared the suspect photo directly and determined it was a misidentification.
True/False
11. COMPAS uses approximately 10 input variables in its risk assessment.
Answer: False — COMPAS uses approximately 137 questions covering criminal history, drug use, residential stability, education, vocation, criminal attitudes, family criminality, and social isolation — a substantially more complex instrument than 10 variables.
12. The Chouldechova impossibility result applies only to COMPAS specifically due to its flawed training data.
Answer: False — The impossibility result is a mathematical property of any classification system applied to groups with different base rates. It applies regardless of the specific tool, its training data, or its design choices.
13. Chicago's Strategic Subject List (SSL) was a place-based predictive policing system that identified geographic hot spots.
Answer: False — The SSL was a person-based system that assigned risk scores to approximately 400,000 individual Chicago residents, predicting their likelihood of involvement in a shooting. PredPol is an example of a place-based system.
14. The MacArthur Justice Center found that ShotSpotter alerts were concentrated in predominantly white neighborhoods in Chicago.
Answer: False — ShotSpotter was deployed in predominantly Black and Latino neighborhoods on Chicago's South and West sides, creating an asymmetric surveillance burden concentrated in minority communities.
15. The US Supreme Court definitively resolved the constitutional question of algorithmic sentencing by ruling in favor of defendants in Loomis v. Wisconsin.
Answer: False — The US Supreme Court declined to hear Loomis v. Wisconsin in 2017, leaving the Wisconsin Supreme Court's ruling (upholding COMPAS use) as the authoritative treatment without federal constitutional resolution. A denial of certiorari does not constitute Supreme Court endorsement of the lower court's ruling.
Short Answer
16. Explain why ProPublica's finding of higher false positive rates for Black defendants and Northpointe's finding of calibration across races are both technically correct, despite appearing contradictory.
Model Answer: The two findings measure different properties of the same system. ProPublica's analysis measured error rates conditioned on actual outcome: of those who actually did not reoffend, what proportion were incorrectly classified as high risk, separately by race? This gives false positive rates. Northpointe's analysis measured calibration: does a given score predict the same actual recidivism rate regardless of race? Both analyses used the same data but asked different questions and measured different properties. The Chouldechova impossibility result explains why both can simultaneously be true: when Black and white defendants in Broward County had different actual recidivism rates (base rate difference), any calibrated classifier will necessarily produce different false positive and false negative rate patterns across groups. Choosing calibration means accepting different error rates; achieving equal error rates requires abandoning calibration. Both claims are accurate descriptions of the same system measured differently.
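The arithmetic of the trade-off can be sketched with a toy model. The numbers below are hypothetical (not the actual Broward County figures): a two-bin score is calibrated identically in both groups (flagged individuals reoffend at the same rate in either group, as do unflagged individuals), yet when base rates differ the false positive and false negative rates necessarily diverge.

```python
# Toy illustration of the Chouldechova impossibility result.
# p_high / p_low are the recidivism rates among flagged / unflagged individuals,
# identical across groups — i.e., the score is calibrated. Numbers are
# hypothetical, chosen only to make the arithmetic transparent.

def fpr_fnr(base_rate, p_high=0.6, p_low=0.2):
    """Return (false positive rate, false negative rate) for a group with the
    given base rate, under a calibrated two-bin score."""
    # The flagged fraction f must satisfy: f*p_high + (1-f)*p_low = base_rate
    f = (base_rate - p_low) / (p_high - p_low)
    fpr = f * (1 - p_high) / (1 - base_rate)   # P(flagged | did not reoffend)
    fnr = (1 - f) * p_low / base_rate          # P(not flagged | reoffended)
    return fpr, fnr

for name, base in [("Group A (base rate 0.5)", 0.5),
                   ("Group B (base rate 0.3)", 0.3)]:
    fpr, fnr = fpr_fnr(base)
    print(f"{name}: FPR = {fpr:.3f}, FNR = {fnr:.3f}")
```

Both groups see the same score meanings (a flag predicts 60% recidivism for everyone), yet the higher-base-rate group faces a far higher false positive rate. Equalizing the error rates would require abandoning calibration, which is the impossibility result in miniature.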
17. What is the "dirty data" problem in predictive policing, and why does it undermine the objectivity claims often made for AI criminal justice systems?
Model Answer: The "dirty data" problem refers to predictive policing systems trained on crime data that is itself a product of documented police misconduct, discriminatory enforcement, or systematic error — rather than a neutral record of where crime occurs. Richardson, Schultz, and Crawford (2019) documented that cities including New Orleans, Chicago, and others trained predictive policing systems on data sets that included records from corrupt narcotics units manufacturing evidence, documented unconstitutional stops, and racially biased enforcement patterns. The AI trained on this data learns the patterns in the data — including the discriminatory and corrupted patterns — and produces predictions that reflect those patterns. But because the output carries the appearance of algorithmic objectivity, the discriminatory history embedded in the training data is obscured. The claim of "data-driven" objectivity actually refers to the encoding of historical discrimination into an algorithmic format that makes it harder to challenge than overt human bias.
18. What constitutional argument did Eric Loomis make against the use of COMPAS in his sentencing, and how did the Wisconsin Supreme Court respond?
Model Answer: Loomis made two core constitutional arguments. First, that the use of a proprietary algorithm whose formula could not be disclosed or examined violated his due process right to challenge evidence used against him — he received a score but could not question the formula generating it. Second, that using a group-based statistical assessment to impose an individualized sentence violated his right to an individualized sentencing determination — COMPAS scored him based on population-level statistics for people with similar profiles, not on his specific individual circumstances. The Wisconsin Supreme Court rejected both arguments. On the formula secrecy issue, the court held that Loomis had received sufficient information — his scores and the general factor categories — to satisfy due process, noting that COMPAS was used as one factor among many and that the judge's sentence was supported by independent evidence. On the group-based individualization issue, the court held that courts routinely use group-based actuarial information (criminal history records being one example) in sentencing determinations, and that COMPAS was within that tradition. The court did note that judges should not give COMPAS "exclusive or determinative weight."
19. Describe the "vendor accountability problem" illustrated by the ShotSpotter case. Why is it difficult to hold AI vendors accountable for public safety tool performance?
Model Answer: The vendor accountability problem has several interlocking dimensions. First, contractual insulation: vendor contracts with government entities are structured around service delivery (the system activates and sends alerts) rather than outcome accountability (the system actually reduces gun violence). Vendors can demonstrate technical performance (alert generation) while the public safety benefit remains undemonstrated and contractually unrequired. Second, evidence barriers: demonstrating that a deployed technology does not achieve its stated public safety purpose requires rigorous independent evaluation — controlled comparisons, adequate statistical power, and methodological sophistication. Government procurement rarely requires this as a condition of contract; government agencies rarely have the capacity to commission it independently. Third, information asymmetry: the vendor has the most complete knowledge of the system's performance and the least incentive to generate or disclose evidence of limitation. SoundThinking's responses to critical research included legal challenges and methodological criticism while relying on self-commissioned or vendor-favored studies. Fourth, political economy: once deployed, technology contracts develop political constituencies (police departments that have integrated the tool, vendors with community relations investments) that make termination politically difficult despite evidence of ineffectiveness. The cumulative effect is that AI public safety tools can persist at public expense and public harm without rigorous accountability.
20. What are the minimum transparency requirements that defenders of algorithmic criminal justice tools should have to satisfy in order for those tools to be ethically defensible in criminal proceedings?
Model Answer: Minimum transparency requirements for ethical criminal justice AI include: full public disclosure of the tool's methodology, including training data sources, input variables, weighting or model architecture, and validation approach; ongoing independent validation studies examining accuracy and disparate impact, conducted by researchers without financial ties to the vendor and with methodology and data access sufficient for peer review; disclosure to defendants of their specific inputs — the actual data values entered for their case — so they can challenge factual accuracy; access for defense counsel to documentation sufficient for meaningful challenge, including validation studies showing accuracy across relevant demographic groups; an accessible mechanism (court-appointed technical expert, discovery rights) for defendants who lack technical expertise to obtain expert analysis of how their specific inputs drove their specific output; mandatory disclosure when any retrospective revision of system outputs is used as evidence; and ongoing public reporting on system performance including error rates, disparate impact metrics, and validation evidence, with automatic review when evidence of systematic error or disproportionate harm emerges.