Case Study: Robert Williams' Wrongful Arrest by Facial Recognition
"I held my driver's license photo next to the surveillance image and asked them, 'Do you think all Black people look alike?' The detective looked at the two photos, looked at me, and said: 'I guess the computer got it wrong.'" — Robert Williams, recounting his interrogation, ACLU testimony, 2020
Overview
On January 9, 2020, Robert Williams was at work when he received a call from the Detroit Police Department telling him there was a warrant for his arrest. When Williams arrived home, two officers were waiting in his driveway. They handcuffed him in front of his wife and two young daughters — ages 2 and 5 — and transported him to a detention center, where he was held for 30 hours, photographed, and fingerprinted, and where a DNA sample was taken. The charge was larceny — the theft of watches from a Shinola store in Detroit.
Williams had not committed the crime. He had been identified by a facial recognition system that matched a grainy surveillance camera image to his driver's license photo. The match was wrong. This case study examines what happened, why the technology failed, what it reveals about the intersection of algorithmic bias and law enforcement practice, and why the Robert Williams case became a turning point in the national debate over facial recognition.
Skills Applied:

- Analyzing accuracy disparities in biometric identification systems
- Evaluating law enforcement governance of automated decision-making tools
- Connecting algorithmic bias to civil rights and due process
- Applying the concepts of biometric privacy, consent, and accountability from Chapter 12
The Incident
The Theft
In October 2018, five watches valued at approximately $3,800 were stolen from a Shinola store in the Midtown neighborhood of Detroit. The store's surveillance cameras captured footage of a Black man inside the store around the time of the theft. The images were low quality — grainy, captured from an elevated angle, and partially obscured.
The Facial Recognition Match
The Detroit Police Department's video analytics unit submitted a still image from the surveillance footage to its facial recognition system. The system — provided by DataWorks Plus, a commercial vendor — compared the image against a database of driver's license photos maintained by the Michigan Secretary of State, containing approximately 49 million photographs.
The system returned a list of potential matches, ranked by similarity score. Robert Williams' driver's license photo appeared on the list.
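How such a one-to-many search works is easiest to see in miniature. The sketch below is a generic illustration of the technique, not the DataWorks Plus system: faces are reduced to embedding vectors, every gallery vector is scored against the probe, and the top-scoring candidates are returned. All function and variable names here are hypothetical.

```python
import numpy as np

def top_k_matches(probe_vec, gallery_vecs, gallery_ids, k=10):
    """Rank gallery entries by cosine similarity to a probe embedding.

    probe_vec:    (d,) embedding of the surveillance still
    gallery_vecs: (n, d) embeddings of the gallery photos
    gallery_ids:  n identifiers parallel to gallery_vecs
    Returns the k highest-scoring (id, score) pairs.
    """
    # Normalize so that a dot product equals cosine similarity.
    probe = probe_vec / np.linalg.norm(probe_vec)
    gallery = gallery_vecs / np.linalg.norm(gallery_vecs, axis=1, keepdims=True)
    scores = gallery @ probe                    # (n,) similarity scores
    top = np.argsort(scores)[::-1][:k]          # indices of the best matches
    return [(gallery_ids[i], float(scores[i])) for i in top]
```

The important point is what the function does not do: it returns the k most similar gallery entries no matter how poor the probe image is, and the scores measure similarity, not identity. Williams' photo appearing on such a list meant only that his license photo was among the most similar vectors in the database.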
The Investigation (or Lack Thereof)
What happened next is where the governance failure becomes most apparent. An investigator included Williams' photo in a six-person photo lineup and showed it to the Shinola loss prevention contractor who had reported the theft. The contractor, a private security employee who had not witnessed the theft himself, identified Williams from the lineup.
Based on the facial recognition match and the photo lineup identification, a warrant was issued for Williams' arrest. There was no additional corroborating evidence — no fingerprints, no credit card records, no witness testimony placing Williams at the scene. The facial recognition output and the subsequent photo lineup were the entirety of the case against him.
The Arrest and Detention
Williams was arrested in front of his family, transported to a detention center, and held overnight. During interrogation the following day, detectives presented him with the surveillance image. Williams, looking at the grainy, low-resolution photo, immediately recognized that the person in the image was not him. He held his driver's license photo next to the surveillance image and asked the detectives to compare them.
According to Williams' account, one detective acknowledged that "the computer" got it wrong. Williams was released approximately 30 hours after his arrest. The charges were eventually dropped, but his arrest record remained.
The Aftermath for Williams
The experience left lasting effects beyond the legal case. Williams was humiliated in front of his wife and daughters. His older daughter, then 5, told her school friends that "Daddy got arrested." His employer was notified of the arrest. Williams suffered anxiety and loss of trust in law enforcement. He had to return to the police station to complete paperwork related to the dismissed charges, requiring him to miss work.
In collaboration with the ACLU of Michigan, Williams filed a complaint and later a lawsuit against the Detroit Police Department and the City of Detroit. In 2021, he testified before a U.S. House subcommittee, bringing national attention to the risks of facial recognition in policing.
The Technology: Accuracy Disparities
The Gender Shades Study
Williams' wrongful arrest did not occur in a vacuum. It occurred against a backdrop of well-documented accuracy disparities in facial recognition systems. In 2018, MIT researcher Joy Buolamwini and Microsoft Research scientist Timnit Gebru published the Gender Shades study, which evaluated commercial facial analysis systems from IBM, Microsoft, and Face++ (a Chinese company) on their ability to classify faces by gender.
The results were stark:
| Demographic Group | IBM Error Rate | Microsoft Error Rate | Face++ Error Rate |
|---|---|---|---|
| Lighter-skinned males | 0.3% | 0.0% | 0.8% |
| Lighter-skinned females | 7.1% | 1.7% | 6.0% |
| Darker-skinned males | 12.0% | 6.0% | 0.7% |
| Darker-skinned females | 34.7% | 20.8% | 34.5% |
The pattern held across all three systems: error rates were highest for darker-skinned women and lowest, or nearly lowest, for lighter-skinned men. The gap was not marginal — IBM's error rate for darker-skinned women was 34.4 percentage points higher than for lighter-skinned men.
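Methodologically, the study's core move was simple: evaluate accuracy separately for each demographic subgroup rather than reporting a single aggregate number. A minimal sketch of that disaggregated evaluation, assuming labeled evaluation records in an invented format, might look like this:

```python
from collections import defaultdict

def error_rates_by_group(results):
    """results: iterable of (group_label, was_correct) pairs.

    Returns {group_label: error_rate}. A single aggregate accuracy
    number would hide exactly the disparities this breakdown exposes.
    """
    totals, errors = defaultdict(int), defaultdict(int)
    for group, correct in results:
        totals[group] += 1
        if not correct:
            errors[group] += 1
    return {g: errors[g] / totals[g] for g in totals}
```

An aggregate accuracy of, say, 90% computed over pooled data can coexist with a 34.7% error rate for one subgroup; only the per-group breakdown reveals it.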
Why the Disparities Exist
The accuracy disparities stem from multiple, reinforcing factors:
Training data bias. Facial recognition systems learn from large datasets of labeled face images. These datasets have historically overrepresented lighter-skinned faces, particularly lighter-skinned male faces. When a system is trained predominantly on images of one demographic group, it develops stronger internal representations for that group and weaker ones for underrepresented groups.
Imaging physics. Camera sensors and image processing algorithms are optimized for lighter skin tones. In low-light conditions — precisely the conditions in many surveillance scenarios — darker skin is more difficult to resolve with sufficient detail for accurate matching. This is not an inherent limitation of the technology but a reflection of design choices made during camera development.
Benchmark design. Until the Gender Shades study, widely used facial recognition benchmarks (the datasets used to evaluate system accuracy) were themselves unbalanced. Systems that performed well on these benchmarks were not actually performing well for all populations — they were performing well for the populations overrepresented in the benchmarks.
The NIST Studies
The National Institute of Standards and Technology (NIST) conducted its own evaluation of facial recognition accuracy across demographics. The 2019 NIST Face Recognition Vendor Test (FRVT) examined 189 algorithms from 99 developers and found:
- False positive rates for African American and Asian faces were 10 to 100 times higher than for Caucasian faces in many algorithms.
- The disparity was most pronounced in one-to-many matching (searching a face against a large database) — precisely the application used in the Williams case. (A worked example of how false matches scale with database size follows this list.)
- Some algorithms showed minimal demographic differences, demonstrating that disparity is not inherent to the technology but is a function of how individual systems are designed and trained.
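It is worth making the one-to-many scale problem concrete. Under the idealized assumption that each gallery comparison carries an independent chance of a false match, the expected number of false matches per search grows linearly with gallery size. The rates below are illustrative, not drawn from the FRVT report:

```python
# Expected false matches in a single one-to-many search, under the
# simplifying assumption of independent comparisons:
#     E[false matches] = gallery_size * FMR
gallery_size = 49_000_000              # roughly the Michigan photo database
for fmr in (1e-6, 1e-4, 1e-3):         # per-comparison false match rates
    print(f"FMR {fmr:.0e}: ~{gallery_size * fmr:,.0f} expected false matches")
```

Even a one-in-a-million false match rate yields dozens of false candidates per search at this scale, and a demographic group with a tenfold higher false match rate absorbs roughly ten times its share of them. This arithmetic underlies discussion question 1 below.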
The Governance Failure
What Should Have Happened
The Detroit Police Department had an internal policy stating that facial recognition results should be used as an "investigative lead only" and should not serve as the sole basis for arrest. This policy, however, was inadequately implemented and apparently not followed in the Williams case.
A rigorous investigative process would have included:
- Treating the facial recognition output as a lead, not an identification. The system's match should have prompted further investigation — reviewing additional surveillance footage, examining credit card records, checking alibi evidence, seeking witnesses — not a warrant application.
- Assessing image quality. The surveillance image was low quality: grainy, from an oblique angle, and partially obscured. A competent operator would have recognized that a reliable match from such an image was unlikely, regardless of the algorithm's confidence score.
- Conducting a properly administered photo lineup. The lineup shown to the Shinola contractor was administered after the investigator had already been influenced by the facial recognition match — a form of confirmation bias. Moreover, the contractor had not witnessed the theft and was identifying from surveillance footage he had reviewed, not from personal observation. The identification was weak and should not have been treated as corroborating evidence.
- Independent review. No supervisor reviewed the case before the warrant application, and no human expert evaluated the quality of the facial recognition match. The system's output flowed through the investigative process with minimal scrutiny.
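Some of these safeguards can be enforced technically rather than left to policy memoranda. The sketch below is a hypothetical hard gate in case-management software, not a description of Detroit's or any deployed system: it refuses to let a warrant request proceed on a facial recognition lead alone. All names are invented for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class CaseFile:
    """Hypothetical investigative case record (illustrative only)."""
    has_fr_lead: bool = False                      # a facial recognition match exists
    corroborating_evidence: list = field(default_factory=list)
    supervisor_reviewed: bool = False

def warrant_request_allowed(case: CaseFile) -> bool:
    """Encode 'investigative lead only' as a hard gate: a facial
    recognition match with no independent evidence never clears it."""
    if case.has_fr_lead and not case.corroborating_evidence:
        return False
    return case.supervisor_reviewed
```

In the Williams case such a gate would have failed twice over: the only "corroboration" was a lineup identification that was itself downstream of the facial recognition match, and no supervisor reviewed the file. A real control would also need to distinguish independent evidence from evidence derived from the match itself.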
What Actually Happened
The facial recognition match was treated as an identification rather than a lead. The photo lineup provided thin corroboration from a non-witness. The warrant was issued based on these two pieces of evidence alone. The investigative process essentially automated the identification decision and then constructed a minimal evidentiary facade around it.
This pattern — where an automated system's output becomes the de facto decision, with human review serving as a rubber stamp rather than a genuine check — is a recurring theme in algorithmic decision-making. It appears in contexts far beyond law enforcement: in credit scoring, hiring, content moderation, and healthcare triage. The Williams case illustrates the consequences when this pattern operates in a domain where individual liberty is at stake.
The Broader Pattern
Williams Was Not Alone
Williams' case was the first publicly documented wrongful arrest attributed to facial recognition in the United States, but it was not the last. In the months and years following, additional cases emerged:
- Michael Oliver (2019, Detroit): A Black man wrongfully accused of a felony based on a facial recognition match. Charges were dropped after an investigation revealed the match was incorrect.
- Nijeer Parks (2019, New Jersey): A Black man arrested for shoplifting and assault based on a facial recognition match. Parks spent 10 days in jail before charges were dismissed. He had never been to the town where the crime occurred.
- Porcha Woodruff (2023, Detroit): A Black woman, eight months pregnant, wrongfully arrested based on a facial recognition match and held for 11 hours. She was charged with robbery and carjacking and experienced contractions and dehydration during detention.
All publicly documented cases of facial recognition-related wrongful arrests in the United States have involved Black individuals. This pattern is consistent with the accuracy disparities documented by the Gender Shades study and NIST evaluations — higher false positive rates for darker-skinned faces translate directly into higher wrongful identification rates for Black Americans.
The Civil Rights Dimension
Eli, who followed the Williams case closely, frames the issue as a civil rights concern in his class discussion: "This isn't a technology problem that happens to affect Black people. This is a civil rights problem that uses technology as its mechanism. Facial recognition didn't create racial bias in policing — it automated it."
The argument has historical resonance. Law enforcement has a documented history of racially biased practices — from racial profiling to disproportionate stop-and-frisk targeting to the over-policing of Black neighborhoods. When a facial recognition system with higher error rates for Black faces is deployed in a policing context with pre-existing racial disparities, the technology amplifies the bias rather than correcting it.
This is not merely an abstract concern. The communities where facial recognition surveillance is most extensively deployed — urban, predominantly Black neighborhoods — are the communities where the technology is least accurate. The people most surveilled are the people most likely to be wrongly identified.
Legal and Policy Responses
The ACLU's Role
The ACLU of Michigan represented Williams in his complaint and lawsuit against the Detroit Police Department. The organization has been a leading voice calling for bans on government use of facial recognition, arguing that the technology cannot be made sufficiently accurate and that even if it could, the infrastructure of ubiquitous face surveillance is incompatible with civil liberties.
Municipal Bans
Williams' case strengthened the movement for municipal bans on government facial recognition. As of 2024, cities including San Francisco, Oakland, Boston, Portland (Oregon), Minneapolis, and New Orleans have enacted restrictions on government use of facial recognition, ranging from outright bans to moratoriums to requirements for city council approval before deployment.
Federal Proposals
Several federal bills have been introduced to regulate or restrict facial recognition, including the Facial Recognition and Biometric Technology Moratorium Act, which would prohibit federal use of facial recognition and condition federal funding on state and local agencies adopting similar restrictions. As of this writing, no comprehensive federal facial recognition law has been enacted.
Detroit's Response
Following the Williams case and subsequent wrongful arrests, the Detroit Police Department revised its facial recognition policy. The updated policy requires that:

- Facial recognition may be used only in violent crime investigations.
- Results must be treated as investigative leads, not identifications.
- A detective must conduct independent investigation before seeking an arrest warrant.
- Photo lineups based on facial recognition must be administered by someone unaware of which photo was the system's match.
Whether these procedural reforms are sufficient — given that the previous policy also stated that facial recognition should be an "investigative lead only" but was not followed — remains contested.
Discussion Questions
1. The accuracy question. Some argue that facial recognition should be banned because it is inaccurate for certain populations. Others argue it should be improved rather than banned. But consider: even if facial recognition achieved 99.9% accuracy across all demographics, a search against a database of 49 million faces could still be expected to produce tens of thousands of false matches (if 99.9% accuracy means a 0.1% false match rate per comparison; see the worked example in the NIST section). Is the problem accuracy, or the scale of deployment? Would a perfectly accurate system still raise civil liberties concerns?
2. The governance gap. Detroit's policy stated that facial recognition should be an "investigative lead only." The policy was not followed. What governance mechanisms could ensure compliance with such policies? Is policy alone sufficient, or are technical controls (e.g., systems that prevent warrant applications based solely on facial recognition, like the hypothetical gate sketched in the governance section) needed?

3. The civil rights framing. Eli argues that facial recognition in policing is a civil rights issue, not merely a technology issue. Do you agree? What historical parallels exist between facial recognition and other technologies that disproportionately affected communities of color (e.g., polygraph tests, predictive policing, stop-and-frisk databases)?

4. The community perspective. The communities where facial recognition is most deployed are also communities with high crime rates, where residents may want more effective policing. How should the concerns of crime victims in these communities be weighed against the civil liberties concerns raised by false matches? Is this a genuine tension, or a false dichotomy?
Your Turn: Mini-Project
Option A: Accuracy Audit. Read the NIST Face Recognition Vendor Test (FRVT) report (available online). Select three algorithms and compare their false positive rates across demographic groups. Present your findings in a table and write a one-page analysis of the demographic patterns you observe. Are some algorithms significantly more equitable than others? What design choices might explain the differences?
Option B: Policy Comparison. Research the facial recognition policies of three cities — one that has banned government use, one that has restricted it, and one that has no restrictions. Write a two-page comparative analysis covering: (a) what each city permits and prohibits, (b) the arguments made for each approach, (c) any enforcement mechanisms, and (d) your recommendation for which approach best balances safety and civil liberties.
Option C: The Williams Testimony. Read Robert Williams' testimony to the U.S. House Judiciary Committee's Subcommittee on Crime, Terrorism, and Homeland Security (available online). Write a 1,000-word analysis connecting his testimony to three specific concepts from Chapter 12: biometric privacy, the irreversibility of biometric data, and the accountability gap in automated decision-making. How does Williams' experience illustrate the theoretical concerns discussed in the chapter?
References
- Hill, Kashmir. "Wrongfully Accused by an Algorithm." The New York Times, June 24, 2020.
- Buolamwini, Joy, and Timnit Gebru. "Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification." In Proceedings of the Conference on Fairness, Accountability, and Transparency (FAT*), 77-91. PMLR, 2018.
- Grother, Patrick, Mei Ngan, and Kayee Hanaoka. "Face Recognition Vendor Test (FRVT) Part 3: Demographic Effects." NIST Interagency Report 8280, December 2019.
- American Civil Liberties Union. "Robert Williams: The First Person Arrested Based on a False Facial Recognition Match." ACLU, 2020.
- Williams, Robert. Testimony before the U.S. House Judiciary Committee, Subcommittee on Crime, Terrorism, and Homeland Security. July 13, 2021.
- Garvie, Clare, Alvaro Bedoya, and Jonathan Frankle. "The Perpetual Line-Up: Unregulated Police Face Recognition in America." Georgetown Law Center on Privacy and Technology, 2016.
- Raji, Inioluwa Deborah, and Joy Buolamwini. "Actionable Auditing: Investigating the Impact of Publicly Naming Biased Performance Results of Commercial AI Products." In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, 429-435. ACM, 2019.
- Detroit Police Department. "Directive 307.5: Use of Facial Recognition Technology." Revised January 2021.
- Ferguson, Andrew Guthrie. The Rise of Big Data Policing: Surveillance, Race, and the Future of Law Enforcement. New York: New York University Press, 2017.