Appendix B: Key Studies in Data Governance and AI Ethics
This appendix provides detailed summaries of 30 landmark studies referenced throughout this textbook. Each entry includes the study's citation, research question, method, key findings, significance for data governance, and limitations. These studies form the empirical and theoretical foundation of the field. Reading them in full -- where accessible -- will deepen your understanding far beyond what any textbook summary can provide.
Studies are organized chronologically to show the development of the field over time.
Study 1: Warren and Brandeis — "The Right to Privacy" (1890)
Citation: Warren, S.D. and Brandeis, L.D. (1890). "The Right to Privacy." Harvard Law Review, 4(5), 193-220.
Research question: Does existing law protect individuals from the publication of private information, and if not, should it?
Method: Legal analysis and normative argument. Warren and Brandeis analyzed existing precedents in tort law, intellectual property, and contract law to argue for a new legal right.
Key findings: Existing legal categories (property, contract) were insufficient to protect privacy. The authors proposed a new right -- "the right to be let alone" -- grounded in the principle of an "inviolate personality." They argued that technological change (specifically, instantaneous photography and newspaper enterprise) had created new threats to privacy that required new legal protections.
Significance: This article is widely credited as the founding document of American privacy law. It established privacy as a legal concept distinct from property rights and introduced the idea that technological change can create new privacy harms requiring new legal responses -- a logic that remains central to data governance today (Chapter 7).
Limitations: The article's conception of privacy as "the right to be let alone" is passive and individualistic, poorly suited to contexts where privacy requires active data governance across networks of interconnected actors.
Study 2: Foucault — Discipline and Punish: Panopticism (1975)
Citation: Foucault, M. (1975/1977). Discipline and Punish: The Birth of the Prison. Trans. Alan Sheridan. Pantheon Books. Chapter 3: "Panopticism."
Research question: How do modern institutions exercise power over individuals, and what role does surveillance play in this process?
Method: Historical and philosophical analysis of the evolution of punishment and social control from the spectacle of public execution to the disciplinary mechanisms of modern institutions.
Key findings: Foucault theorized that modern power operates not through overt violence but through discipline -- the internalization of surveillance. Using Bentham's panopticon as a model, he argued that the key mechanism is visibility: when individuals know they could be observed at any time, they regulate their own behavior. Power becomes self-enforcing.
Significance: Foucault's panopticon model is the most widely cited theoretical framework for understanding digital surveillance (Chapter 8). His concepts of power/knowledge and disciplinary power underpin analyses of how data collection shapes behavior, even when no specific act of surveillance produces direct harm (Chapter 5).
Limitations: Foucault's model assumes a centralized, institutional observer. Contemporary surveillance is often decentralized (surveillance by multiple competing platforms) and participatory (social media users surveilling each other), requiring adaptations to the original framework.
Study 3: de Montjoye et al. — Unique in the Crowd (2013)
Citation: de Montjoye, Y.-A., Hidalgo, C.A., Verleysen, M., and Blondel, V.D. (2013). "Unique in the Crowd: The privacy bounds of human mobility." Scientific Reports, 3, 1376.
Research question: How uniquely identifiable are individuals from their mobility traces, and how much data is needed to re-identify them?
Method: Analysis of 15 months of mobility data from 1.5 million anonymized mobile phone users. The researchers measured how many spatiotemporal points (locations at specific times) were needed to uniquely identify an individual.
Key findings: Four spatiotemporal points were sufficient to uniquely identify 95% of individuals. Even coarse data (cell tower-level, hourly resolution) allowed re-identification of over 50% of users. The study demonstrated that human mobility patterns are highly unique and that "anonymization" of location data provides far less protection than commonly assumed.
Significance: This study is a foundational reference for discussions of re-identification risk (Chapter 10), the limits of anonymization, and the argument that metadata can be as revealing as content (Chapter 1). It provides empirical grounding for the claim that location data should be treated as personal data regardless of whether names are attached.
Limitations: The study measured theoretical uniqueness (how many points are sufficient for identification) rather than practical re-identification (whether an adversary could actually perform the identification with available resources). Real-world re-identification may be harder than the theoretical bounds suggest.
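The study's core measurement -- how many spatiotemporal points suffice to single out a user -- can be sketched in a few lines. The traces below are invented toy data, not the study's dataset, and the sampling procedure is a simplification of the paper's method:

```python
import random

# Hypothetical toy traces: user -> set of (cell_tower, hour) points.
# The real study used 15 months of records for 1.5 million users.
traces = {
    "u1": {("t1", 9), ("t2", 12), ("t3", 18), ("t4", 22)},
    "u2": {("t1", 9), ("t5", 12), ("t3", 18), ("t6", 22)},
    "u3": {("t7", 8), ("t2", 12), ("t8", 17), ("t4", 22)},
}

def is_unique(user, points, traces):
    """True if no *other* user's trace contains all the given points."""
    return all(not points <= trace
               for other, trace in traces.items() if other != user)

def fraction_unique(traces, k, samples=100, seed=0):
    """Estimate, by random sampling, the fraction of k-point subsets
    that uniquely identify their user."""
    rng = random.Random(seed)
    hits = total = 0
    for user, trace in traces.items():
        for _ in range(samples):
            points = set(rng.sample(sorted(trace), k))
            hits += is_unique(user, points, traces)
            total += 1
    return hits / total
```

With four points every toy user is unique; with two, collisions appear -- the same qualitative pattern the study found at scale.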
Study 4: Sweeney — Re-identification of Massachusetts Hospital Data (2000/2002)
Citation: Sweeney, L. (2000). "Simple Demographics Often Identify People Uniquely." Carnegie Mellon University Data Privacy Working Paper 3. Also: Sweeney, L. (2002). "k-anonymity: A model for protecting privacy." International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 10(5), 557-570.
Research question: Can individuals be re-identified from "anonymized" datasets using publicly available information?
Method: Sweeney obtained an anonymized hospital discharge dataset from the Massachusetts Group Insurance Commission and linked it to publicly available voter registration records using zip code, date of birth, and gender. She successfully identified the medical records of then-Governor William Weld.
Key findings: 87% of the U.S. population could be uniquely identified by the combination of zip code, date of birth, and gender -- three attributes commonly retained in "de-identified" datasets. The study demonstrated that simple quasi-identifier combinations provide far less anonymity than assumed.
Significance: This study motivated the development of k-anonymity (Chapter 10) and fundamentally changed how the field understands anonymization. It demonstrated that removing names and Social Security numbers is not sufficient to prevent re-identification, a finding with profound implications for health data, census data, and any dataset released for research purposes.
Limitations: The specific statistic (87%) has been debated by subsequent researchers who note it depends on the resolution of zip code data. However, the core finding -- that quasi-identifiers enable re-identification -- has been repeatedly confirmed.
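Sweeney's later k-anonymity formalization asks, for a chosen set of quasi-identifiers, how small the smallest equivalence class is. A minimal sketch, using made-up records:

```python
from collections import Counter

# Made-up "de-identified" records: direct identifiers removed,
# quasi-identifiers (zip, dob, sex) retained.
records = [
    {"zip": "02138", "dob": "1945-07-31", "sex": "M", "diagnosis": "d1"},
    {"zip": "02138", "dob": "1945-07-31", "sex": "M", "diagnosis": "d2"},
    {"zip": "02139", "dob": "1962-01-15", "sex": "F", "diagnosis": "d3"},
]

def k_anonymity(records, quasi_identifiers):
    """A dataset is k-anonymous when every quasi-identifier combination
    is shared by at least k records; return that smallest group size."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())
```

Here `k_anonymity(records, ["zip", "dob", "sex"])` is 1: the third record is unique on its quasi-identifiers, which is exactly the situation that let Sweeney link hospital and voter data.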
Study 5: ProPublica — COMPAS Recidivism Algorithm Analysis (2016)
Citation: Angwin, J., Larson, J., Mattu, S., and Kirchner, L. (2016). "Machine Bias." ProPublica, May 23, 2016. Also: Larson, J., Mattu, S., Kirchner, L., and Angwin, J. (2016). "How We Analyzed the COMPAS Recidivism Algorithm." ProPublica.
Research question: Does the COMPAS recidivism prediction algorithm treat Black and white defendants fairly?
Method: Analysis of COMPAS risk scores and outcomes for over 7,000 defendants in Broward County, Florida. ProPublica compared false positive rates (incorrectly predicted to reoffend) and false negative rates (incorrectly predicted not to reoffend) across racial groups.
Key findings: Black defendants were nearly twice as likely as white defendants to be falsely labeled as high-risk (false positive rate: ~45% vs. ~24%). White defendants were more likely to be falsely labeled as low-risk (false negative rate: ~48% vs. ~28%). The system was calibrated (risk scores meant roughly the same thing across races) but produced racially disparate error rates.
Significance: This study launched the public debate on algorithmic fairness and is the central case study in Chapters 14 and 15. It demonstrated that an algorithm can be "fair" by one definition (calibration) and "unfair" by another (equalized odds), making the choice of fairness metric a political, not merely technical, decision. It also catalyzed the development of the impossibility theorems discussed in Chapter 15.
Limitations: ProPublica's analysis was contested by Northpointe (now Equivant), which argued the system was calibrated and therefore fair. The debate illustrated that different stakeholders can reach different conclusions from the same data depending on their chosen fairness metric. Additionally, the two-year follow-up period may have been insufficient to capture all recidivism.
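ProPublica's comparison reduces to computing error rates disaggregated by group. A sketch with toy labels (the variable names and data are illustrative, not ProPublica's code):

```python
def error_rates(y_true, y_pred, group):
    """False positive and false negative rates, disaggregated by group.
    Assumes each group contains both outcomes."""
    rates = {}
    for g in set(group):
        idx = [i for i, gi in enumerate(group) if gi == g]
        neg = [i for i in idx if y_true[i] == 0]
        pos = [i for i in idx if y_true[i] == 1]
        fp = sum(y_pred[i] == 1 for i in neg)
        fn = sum(y_pred[i] == 0 for i in pos)
        rates[g] = {"FPR": fp / len(neg), "FNR": fn / len(pos)}
    return rates

# Toy illustration: groups "a" and "b" see different error rates.
rates = error_rates([0, 0, 1, 1, 0, 1],
                    [1, 0, 1, 0, 1, 1],
                    ["a", "a", "a", "b", "b", "b"])
```

A system can be calibrated overall while these per-group FPR and FNR values diverge, which is the crux of the COMPAS dispute.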
Study 6: Buolamwini and Gebru — Gender Shades (2018)
Citation: Buolamwini, J. and Gebru, T. (2018). "Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification." Proceedings of Machine Learning Research, 81, 1-15.
Research question: Do commercial facial analysis systems perform equally well across gender and skin-type subgroups?
Method: The researchers created a new benchmark dataset (Pilot Parliaments Benchmark, or PPB) with balanced representation across gender and skin type. They evaluated the gender classification feature of three commercial facial analysis systems (Microsoft, IBM, Face++) on this benchmark, disaggregating accuracy by intersectional subgroups (lighter-skinned males, lighter-skinned females, darker-skinned males, darker-skinned females).
Key findings: All three systems performed worst on darker-skinned females, with error rates up to 34.7% compared to 0.8% for lighter-skinned males. The error rate gap was far larger at the intersection of gender and skin type than for either dimension alone.
Significance: This study is a landmark in algorithmic fairness research and a paradigmatic example of intersectional analysis (Chapters 14, 15, 17). It demonstrated that (1) existing benchmarks masked disparities by reporting only aggregate accuracy, (2) intersectional subgroups experienced harms invisible to single-axis analysis, and (3) companies' own testing procedures were inadequate. Following the study, Microsoft, IBM, and other companies improved their systems and committed to more rigorous testing.
Limitations: The PPB benchmark, while more diverse than predecessors, used photos of parliamentarians and may not be representative of all real-world use cases. The study evaluated a binary gender classification task, which does not address the rights of non-binary or gender-nonconforming individuals.
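The study's methodological point -- report accuracy per intersectional subgroup rather than in aggregate -- is straightforward to implement. A sketch with invented labels:

```python
from collections import defaultdict

def disaggregated_accuracy(y_true, y_pred, subgroups):
    """Accuracy per subgroup; subgroups can be intersectional tuples
    such as ("darker", "female")."""
    correct, total = defaultdict(int), defaultdict(int)
    for t, p, g in zip(y_true, y_pred, subgroups):
        total[g] += 1
        correct[g] += int(t == p)
    return {g: correct[g] / total[g] for g in total}

# Invented toy data: the 67% aggregate accuracy hides a subgroup
# on which the classifier fails completely.
groups = [("lighter", "male")] * 4 + [("darker", "female")] * 2
acc = disaggregated_accuracy([1, 1, 0, 0, 1, 1],
                             [1, 1, 0, 0, 0, 0],
                             groups)
```

Reporting only the aggregate would have masked exactly the kind of disparity Gender Shades exposed.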
Study 7: Obermeyer et al. — Racial Bias in Health Algorithms (2019)
Citation: Obermeyer, Z., Powers, B., Vogeli, C., and Mullainathan, S. (2019). "Dissecting racial bias in an algorithm used to manage the health of populations." Science, 366(6464), 447-453.
Research question: Does a widely used commercial algorithm for identifying patients who would benefit from care coordination programs exhibit racial bias?
Method: The researchers analyzed the algorithm's predictions for approximately 50,000 patients, comparing predicted health needs to actual health needs across racial groups. They identified the mechanism of bias by examining the algorithm's choice of prediction target.
Key findings: The algorithm used healthcare spending as a proxy for health need. Because Black patients have historically had less access to healthcare (and therefore lower spending), the algorithm systematically underestimated their health needs. At a given risk score, Black patients were significantly sicker than white patients with the same score. The researchers estimated that eliminating this bias would increase the proportion of Black patients receiving additional care from 17.7% to 46.5%.
Significance: This study is the central case for discussions of proxy variable bias in Chapter 14 and is referenced in the VitraMed thread. It demonstrated that bias can arise not from malicious intent but from the choice of optimization target -- a decision made early in the development process and rarely revisited. It also showed that "accurate" predictions (the algorithm accurately predicted spending) can be profoundly unfair when the prediction target is itself biased.
Limitations: The study analyzed one specific algorithm from one vendor. While the mechanism of bias (using spending as a proxy) is likely common across similar systems, the specific magnitude of bias may vary.
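The mechanism -- an accurate predictor of a biased target -- can be reproduced in a toy simulation. Everything below (group sizes, access factors, the enrollment threshold) is invented to illustrate the dynamic, not drawn from the study's data:

```python
import random

random.seed(0)
patients = []
for group, access in [("A", 1.0), ("B", 0.6)]:  # B has lower access to care
    for _ in range(1000):
        need = random.gauss(50, 10)   # true health need, equal across groups
        spending = need * access      # observed spending reflects access
        patients.append((group, need, spending))

# Enroll the top 20% ranked by *spending* -- the proxy target.
patients.sort(key=lambda p: p[2], reverse=True)
enrolled = patients[:400]
share_b = sum(g == "B" for g, _, _ in enrolled) / len(enrolled)
# Group B is half the population but nearly absent from the enrolled
# set, even though its true need distribution is identical to A's.
```

The predictor here is perfectly "accurate" about spending; the unfairness enters entirely through the choice of target, as Obermeyer et al. argued.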
Study 8: Vosoughi, Roy, and Aral — The Spread of False News (2018)
Citation: Vosoughi, S., Roy, D., and Aral, S. (2018). "The spread of true and false news online." Science, 359(6380), 1146-1152.
Research question: How does the spread of true and false news stories differ on social media?
Method: Analysis of approximately 126,000 stories tweeted by 3 million people on Twitter between 2006 and 2017. Stories were verified as true or false by six independent fact-checking organizations. The researchers compared the speed, depth, and breadth of diffusion for true and false stories.
Key findings: False news stories spread farther, faster, and more broadly than true stories across all categories of information. False political news was the most viral category. The effect was not attributable to bots: false news was spread primarily by humans. The researchers attributed the differential spread to the greater novelty and emotional intensity of false stories.
Significance: This is the most comprehensive empirical study of misinformation dynamics and is the foundational reference in Chapter 31. It established that the architecture of social media platforms -- specifically, algorithmic amplification of novel and engaging content -- creates a structural advantage for false information. This finding has profound implications for platform governance and content moderation.
Limitations: The study measured spread, not impact (exposure does not equal belief). It relied on fact-checking organization classifications, which may not capture all forms of misinformation. The data came exclusively from Twitter, which may not be representative of other platforms.
Study 9: Strubell, Ganesh, and McCallum — Energy and Policy in NLP (2019)
Citation: Strubell, E., Ganesh, A., and McCallum, A. (2019). "Energy and Policy Considerations for Deep Learning in NLP." Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 3645-3650.
Research question: What are the energy consumption and carbon emissions associated with training large neural network models for natural language processing?
Method: The researchers measured the energy consumption of training several common NLP model architectures and estimated carbon emissions based on the energy mix of the training location. They compared the emissions to everyday equivalents (car lifetimes, airline flights).
Key findings: Training a single large Transformer model with neural architecture search produced approximately 626,155 pounds of CO2 -- roughly five times the lifetime emissions of an average American car. The paper also cited estimates that the compute used in the largest AI training runs had been doubling every 3.4 months.
Significance: This study catalyzed the Green AI movement (Chapter 34) and established the environmental cost of AI as a legitimate governance concern. It introduced the practice of reporting carbon emissions alongside model performance metrics and influenced subsequent work on efficient model architectures and carbon-aware computing.
Limitations: The study's specific estimates have been debated; some researchers argue the analysis overestimated emissions by assuming worst-case hardware and energy mixes. However, the core finding -- that large model training has significant and often unreported environmental costs -- has been validated by subsequent research.
Study 10: Awad et al. — The Moral Machine Experiment (2018)
Citation: Awad, E., Dsouza, S., Kim, R., Schulz, J., Henrich, J., Shariff, A., Bonnefon, J.-F., and Rahwan, I. (2018). "The Moral Machine experiment." Nature, 563, 59-64.
Research question: What moral preferences do people express when faced with ethical dilemmas involving autonomous vehicles, and how do these preferences vary across cultures?
Method: An online experimental platform presented participants from 233 countries and territories with trolley-problem-style dilemmas involving autonomous vehicles (e.g., save passengers or pedestrians, save more lives or fewer, save young or old). Roughly 40 million decisions were collected.
Key findings: Three dominant preferences were cross-culturally shared: saving more lives, saving younger lives, and saving humans over animals. However, significant cultural variation emerged on other dimensions: individualist cultures showed stronger preferences for saving younger people; collectivist cultures showed less age bias; and countries with stronger institutions showed more consistent responses. The study identified three cultural "clusters" with distinct moral profiles.
Significance: This study is the primary reference for the cultural dimension of autonomous systems ethics in Chapter 19. It demonstrated that "universal" moral rules for autonomous systems do not exist and that the process by which moral parameters are set is as important as the parameters themselves. It also raised methodological questions about whether trolley-problem scenarios capture the actual ethical challenges of autonomous vehicles.
Limitations: The study relied on hypothetical scenarios that may not reflect how people would behave in real situations. The sample was self-selected (online participants who chose to engage with the platform) and skewed toward younger, more educated, and more internet-connected populations. The binary-choice format forced respondents into artificial either/or decisions.
Study 11: Acquisti, Brandimarte, and Loewenstein — Privacy and Human Behavior (2015)
Citation: Acquisti, A., Brandimarte, L., and Loewenstein, G. (2015). "Privacy and Human Behavior in the Age of Information." Science, 347(6221), 509-514.
Research question: What factors influence individuals' privacy-related decisions, and how well do economic models of rational choice explain privacy behavior?
Method: Review and synthesis of experimental and observational studies on privacy decision-making, including the authors' own experiments on the privacy paradox.
Key findings: People's privacy behavior is driven not by stable, coherent preferences but by contextual factors, framing effects, and cognitive biases. The "privacy paradox" -- the gap between stated concern and actual behavior -- is explained by: bounded rationality, present bias, the complexity of privacy trade-offs, and the influence of defaults and framing. People are not irrational about privacy; they face genuinely complex decisions with inadequate information and limited cognitive resources.
Significance: This study is the primary reference for the economics of privacy discussion in Chapter 11 and the consent critique in Chapter 9. It provides empirical evidence that the notice-and-consent model fails not because people do not care about privacy but because privacy decisions are systematically harder than the model assumes.
Limitations: Most experimental findings come from Western, educated, industrialized populations. Privacy behavior may be significantly different in other cultural contexts.
Study 12: Nissenbaum — Privacy as Contextual Integrity (2004/2009)
Citation: Nissenbaum, H. (2004). "Privacy as Contextual Integrity." Washington Law Review, 79(1), 119-158. Also: Nissenbaum, H. (2009). Privacy in Context: Technology, Policy, and the Integrity of Social Life. Stanford University Press.
Research question: What constitutes a privacy violation, and how can we distinguish legitimate from illegitimate information flows?
Method: Philosophical analysis and framework development. Nissenbaum developed contextual integrity as an alternative to both the consent model and the public/private dichotomy.
Key findings: Privacy is violated when information flows breach the norms appropriate to a specific social context. Each context (healthcare, education, friendship, commerce) has its own informational norms specifying appropriate actors, attributes, and transmission principles. A privacy violation occurs when information flows in ways that do not conform to the governing norms -- even if the individual "consented" through a terms-of-service agreement.
Significance: Contextual integrity is the most widely adopted theoretical framework for privacy analysis in information ethics (Chapter 7). It provides a principled basis for evaluating whether specific data practices are appropriate, moving beyond the binary of "consented/did not consent" to ask whether the information flow respects the norms of the relevant social context.
Limitations: Contextual integrity can be conservative -- it evaluates new practices against existing norms, which may themselves be unjust or outdated. It also requires identifying the relevant context, which can be ambiguous when data flows cross multiple contexts.
Study 13: Zuboff — Surveillance Capitalism (2019)
Citation: Zuboff, S. (2019). The Age of Surveillance Capitalism: The Fight for a Human Future at the New Frontier of Power. PublicAffairs.
Research question: How has the business model of major technology companies transformed the relationship between capitalism, surveillance, and human autonomy?
Method: Historical and sociological analysis combining corporate documents, patent filings, executive statements, and economic analysis.
Key findings: Zuboff coined the term "surveillance capitalism" to describe a new economic logic in which human experience is claimed as free raw material for prediction products. She identified "behavioral surplus" -- data collected beyond what is needed to improve services -- as the foundation of this logic and argued that surveillance capitalism represents a fundamentally new form of power that threatens human autonomy.
Significance: Zuboff's framework is widely referenced in Chapters 4, 5, and 8. The concept of behavioral surplus is central to the textbook's analysis of the attention economy and the power asymmetry between platforms and users.
Limitations: The framework has been criticized for presenting surveillance capitalism as more novel than it is (advertising-based models predate digital platforms) and for underemphasizing the role of the state in enabling and participating in surveillance.
Study 14: Eubanks — Automating Inequality (2018)
Citation: Eubanks, V. (2018). Automating Inequality: How High-Tech Tools Profile, Police, and Punish the Poor. St. Martin's Press.
Research question: How do automated decision-making systems affect poor and working-class Americans?
Method: Case studies of three algorithmic systems: Indiana's automated welfare eligibility system, a coordinated entry system for homeless services in Los Angeles, and a child abuse predictive model in Allegheny County, Pennsylvania.
Key findings: Automated systems do not simply replicate existing inequality -- they intensify it. They create a "digital poorhouse" in which surveillance and algorithmic control are concentrated on the most vulnerable populations. The systems' errors disproportionately harm people who lack the resources, time, and institutional knowledge to appeal or correct them.
Significance: This study grounds the digital divide and data justice discussions in Chapter 32 and the accountability analysis in Chapter 17. It demonstrates that algorithmic systems operate differently on different populations, with the most vulnerable bearing the greatest burden.
Limitations: The three case studies are all set in the United States. The dynamics of automated inequality may differ in countries with different welfare systems and administrative structures.
Study 15: Whittaker et al. — AI Now Report on Disability, Bias, and AI (2019)
Citation: Whittaker, M., Alper, M., Bennett, C.L., Hendren, S., Kaziunas, L., Mills, M., Morris, M.R., Rankin, J., Rogers, E., Salas, M., and West, S.M. (2019). "Disability, Bias, and AI." AI Now Institute, New York University.
Research question: How do AI systems create barriers and biases for people with disabilities?
Method: Literature review, case analysis, and policy analysis examining how disability intersects with AI deployment across sectors including hiring, healthcare, insurance, and public services.
Key findings: AI systems frequently disadvantage people with disabilities: hiring algorithms penalize non-standard speech patterns; emotion recognition systems misinterpret facial expressions associated with certain conditions; and predictive models treat disability as a risk factor rather than a characteristic to be accommodated. Existing bias auditing frameworks rarely include disability as a protected category.
Significance: This report broadens the algorithmic fairness discussion (Chapters 14-15) beyond race and gender to include disability. It demonstrates that fairness frameworks built around "protected categories" may still exclude groups not well-represented in training data or testing benchmarks.
Limitations: The report is primarily a policy document rather than an empirical study; it synthesizes existing findings rather than presenting new data.
Study 16: Gebru et al. — Datasheets for Datasets (2021)
Citation: Gebru, T., Morgenstern, J., Vecchione, B., Vaughan, J.W., Wallach, H., Daumé III, H., and Crawford, K. (2021). "Datasheets for Datasets." Communications of the ACM, 64(12), 86-92.
Research question: How can the documentation of datasets be standardized to improve transparency and accountability in machine learning?
Method: Framework development. The authors proposed a standardized documentation format for datasets, modeled on datasheets used in the electronics industry.
Key findings: The authors proposed that every dataset be accompanied by a "datasheet" documenting its motivation, composition, collection process, preprocessing, uses, distribution, and maintenance. The datasheet answers questions like: Who collected the data? What was the collection mechanism? What are the known biases? Who was excluded?
Significance: Datasheets for datasets are a core component of the responsible AI development pipeline discussed in Chapter 29. They address the "data black box" problem -- the fact that machine learning models are typically evaluated on their outputs without scrutiny of their inputs.
Limitations: Datasheets require effort to create and maintain, and there is currently no enforcement mechanism to ensure their quality or completeness. Voluntary adoption has been uneven.
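In code, a datasheet is little more than structured documentation that can be checked for completeness. The section names below follow the paper; the completeness check is an illustrative addition, not part of the proposal:

```python
# Section names from Gebru et al. (2021); the helper function is
# an illustrative addition, not part of the original proposal.
DATASHEET_SECTIONS = (
    "motivation", "composition", "collection_process",
    "preprocessing", "uses", "distribution", "maintenance",
)

def missing_sections(datasheet: dict) -> list:
    """Sections that are absent or left empty in a draft datasheet."""
    return [s for s in DATASHEET_SECTIONS if not datasheet.get(s)]

# A hypothetical half-finished draft.
draft = {
    "motivation": "Benchmark for task X; collected to fill gap Y.",
    "uses": "Intended for research on Z only.",
}
```

A check like this catches omissions mechanically, but -- as the limitations note -- nothing enforces that a filled-in section is actually truthful or thorough.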
Study 17: Mitchell et al. — Model Cards for Model Reporting (2019)
Citation: Mitchell, M., Wu, S., Zaldivar, A., Barnes, P., Vasserman, L., Hutchinson, B., Spitzer, E., Raji, I.D., and Gebru, T. (2019). "Model Cards for Model Reporting." Proceedings of the Conference on Fairness, Accountability, and Transparency, 220-229.
Research question: How can documentation of machine learning models be standardized to improve transparency about model capabilities, limitations, and potential for harm?
Method: Framework development. The authors proposed "model cards" as standardized documentation that accompanies trained machine learning models.
Key findings: Model cards should include: model details, intended use, out-of-scope uses, relevant factors (demographic and environmental), metrics (including disaggregated performance), evaluation data, training data, ethical considerations, and caveats. The key innovation was requiring disaggregated performance metrics -- reporting accuracy not just overall but for specific demographic subgroups.
Significance: Model cards are the primary documentation framework discussed in Chapter 29 and are implemented through the ModelCard dataclass. They have been widely adopted in the machine learning community and are required by some AI governance frameworks.
Limitations: Model cards can become compliance artifacts rather than genuine transparency tools if they are completed as checkboxes rather than as thoughtful evaluations. Their effectiveness depends on who reads them and whether the information influences decisions.
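The textbook's ModelCard dataclass is introduced in Chapter 29; as a standalone sketch (field names follow Mitchell et al.'s proposed sections, but this is not the textbook's actual implementation), it might look like:

```python
from dataclasses import dataclass, field

@dataclass
class ModelCard:
    """Sketch of a model card; fields follow Mitchell et al.'s sections."""
    model_details: str
    intended_use: str
    out_of_scope_uses: str
    # The key innovation: metrics disaggregated by subgroup.
    metrics: dict = field(default_factory=dict)  # subgroup -> {metric: value}
    ethical_considerations: str = ""
    caveats: str = ""

    def worst_subgroup(self, metric: str = "accuracy"):
        """Subgroup with the lowest reported value for `metric`."""
        return min(self.metrics, key=lambda g: self.metrics[g][metric])

card = ModelCard(
    model_details="toy classifier v0",
    intended_use="illustration only",
    out_of_scope_uses="any deployment",
    metrics={"lighter_male": {"accuracy": 0.99},
             "darker_female": {"accuracy": 0.65}},
)
```

Making the disaggregated metrics a first-class field is what distinguishes a model card from an ordinary README: the worst-performing subgroup is surfaced rather than averaged away.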
Study 18: Benjamin — Race After Technology (2019)
Citation: Benjamin, R. (2019). Race After Technology: Abolitionist Tools for the New Jim Code. Polity Press.
Research question: How do digital technologies reproduce and deepen racial inequalities, even when designed with ostensibly neutral or beneficial intent?
Method: Cultural analysis and critical theory, drawing on case studies from healthcare, criminal justice, marketing, and urban planning.
Key findings: Benjamin coined the term "the New Jim Code" to describe the intersection of technology and racial inequality: coded inequity (algorithmic bias), default discrimination (design choices that disadvantage), and engineered inequity (systems built on racialized assumptions). She argued that seemingly race-neutral technologies can function as "racist robots" when deployed in a society structured by racial inequality.
Significance: Benjamin's framework informs the textbook's analysis of structural bias (Chapter 14), digital redlining (Chapter 32), and the relationship between technology and social power (Chapter 5).
Limitations: The analysis focuses primarily on the United States; the mechanisms of techno-racial inequality may operate differently in other national and cultural contexts.
Study 19: Kosinski, Stillwell, and Graepel — Digital Footprints and Personality (2013)
Citation: Kosinski, M., Stillwell, D., and Graepel, T. (2013). "Private traits and attributes are predictable from digital records of human behavior." Proceedings of the National Academy of Sciences, 110(15), 5802-5805.
Research question: Can personal attributes be predicted from digital records of behavior, specifically Facebook "likes"?
Method: Analysis of Facebook profiles of 58,466 users who voluntarily provided personality questionnaires and demographic information. Machine learning models predicted personal attributes from Facebook Likes alone.
Key findings: Models could accurately predict sexual orientation (88% for men, 75% for women), ethnicity (95%), political affiliation (85%), religious views (82%), personality traits, intelligence, and substance use from Likes alone. The study demonstrated that even innocuous-seeming behavioral data could reveal sensitive attributes.
Significance: This study is a foundational reference for discussions of inference, profiling, and the limits of consent in Chapters 7 and 9. It demonstrated that the traditional model of protecting "sensitive data" by restricting specific categories fails when sensitive attributes can be inferred from non-sensitive data.
Limitations: Prediction accuracy was measured at the group level, not the individual level; individual predictions may be significantly less accurate. The study also predated many privacy changes to Facebook's API.
Study 20: Narayanan and Shmatikov — De-anonymization of Netflix Data (2008)
Citation: Narayanan, A. and Shmatikov, V. (2008). "Robust De-anonymization of Large Sparse Datasets." IEEE Symposium on Security and Privacy, 111-125.
Research question: Can anonymized datasets be re-identified by linking them to other publicly available data?
Method: The researchers linked the anonymized Netflix Prize dataset (movie ratings with names removed) to publicly available Internet Movie Database (IMDb) reviews. They used statistical matching techniques to identify individual Netflix users.
Key findings: The researchers successfully identified Netflix users from the anonymized dataset by matching rating patterns to public IMDb profiles. The attack required as few as 8 movie ratings to uniquely identify a user with high confidence.
Significance: This study, along with Sweeney's work, is a cornerstone of the re-identification literature (Chapter 10). It demonstrated that anonymization of behavioral data is fundamentally fragile when auxiliary datasets are available for linkage.
Limitations: The specific vulnerability was tied to the structure of the Netflix Prize dataset. Modern anonymization techniques such as differential privacy provide formally stronger guarantees, though they come with accuracy trade-offs.
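The linkage logic can be illustrated with a minimal sketch. This is not the paper's actual algorithm, which uses a weighted statistical scoring function robust to noise in ratings and dates; it is a toy exact-match version over hypothetical data, showing why a handful of ratings suffices in a sparse dataset:

```python
# Toy illustration of linkage re-identification (all users and ratings are
# hypothetical). Each anonymized record is a set of (movie, rating) pairs.
anonymized = {
    "user_001": {("MovieA", 5), ("MovieB", 3), ("MovieC", 4), ("MovieD", 1)},
    "user_002": {("MovieA", 5), ("MovieB", 2), ("MovieE", 4)},
    "user_003": {("MovieB", 3), ("MovieC", 4), ("MovieF", 5)},
}

def candidates(auxiliary, dataset):
    """Return users whose records contain every (item, rating) pair
    observed in the auxiliary (public) data."""
    return [user for user, ratings in dataset.items() if auxiliary <= ratings]

# An attacker who saw two public IMDb-style reviews by the same person:
aux = {("MovieA", 5), ("MovieB", 3)}
print(candidates(aux, anonymized))  # only user_001 matches both pairs
```

Because each user rates only a tiny fraction of the catalog, even a short combination of (movie, rating) pairs is rarely shared by two users, which is what makes sparse behavioral data so identifying.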
Study 21: Chouldechova — Fair Prediction with Disparate Impact (2017)
Citation: Chouldechova, A. (2017). "Fair Prediction with Disparate Impact: A Study of Bias in Recidivism Prediction Instruments." Big Data, 5(2), 153-163.
Research question: Can a prediction instrument simultaneously satisfy multiple fairness criteria?
Method: Mathematical proof supported by empirical analysis of COMPAS data.
Key findings: When base rates differ between groups, it is mathematically impossible for a prediction instrument to simultaneously achieve calibration (equal positive predictive value, or PPV, across groups) and error rate balance (equal false positive and false negative rates, FPR and FNR, across groups). This is a formal impossibility result, not a limitation of any particular algorithm.
Significance: Along with Kleinberg, Mullainathan, and Raghavan (2016), this study established the impossibility theorem that is central to Chapter 15. It proved that the ProPublica-Northpointe disagreement about COMPAS was not a resolvable empirical dispute but a reflection of a fundamental mathematical constraint.
Limitations: The impossibility result applies only when base rates differ. It does not tell us which fairness criterion to prioritize; that remains a normative question.
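The arithmetic behind the result can be made concrete. From the definitions of the error rates, a classifier's false positive rate is pinned down by the base rate p and the other two quantities: FPR = p/(1-p) · (1-PPV)/PPV · (1-FNR). The short check below (hypothetical numbers, not COMPAS data) holds PPV and FNR equal across two groups and shows their FPRs are forced apart when base rates differ:

```python
# Hypothetical numbers, not COMPAS data. The identity
#   FPR = p/(1-p) * (1-PPV)/PPV * (1-FNR)
# follows from the definitions of PPV, FPR, and FNR, where p is the base rate.
# Holding PPV (calibration) and FNR fixed, groups with different base rates
# must end up with different false positive rates.

def implied_fpr(p, ppv, fnr):
    """False positive rate forced by base rate p, positive predictive
    value ppv, and false negative rate fnr."""
    return (p / (1 - p)) * ((1 - ppv) / ppv) * (1 - fnr)

ppv, fnr = 0.7, 0.2                    # held equal for both groups
fpr_a = implied_fpr(0.5, ppv, fnr)     # group A: base rate 50%
fpr_b = implied_fpr(0.3, ppv, fnr)     # group B: base rate 30%
print(round(fpr_a, 3), round(fpr_b, 3))  # 0.343 0.147 -- equality is impossible
```

Equalizing the FPRs instead would force the PPVs (or FNRs) apart, which is exactly the choice at the heart of the ProPublica-Northpointe dispute.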
Study 22: Noble — Algorithms of Oppression (2018)
Citation: Noble, S.U. (2018). Algorithms of Oppression: How Search Engines Reinforce Racism. NYU Press.
Research question: How do search engine algorithms reproduce and reinforce racial and gender stereotypes?
Method: Critical analysis of Google search results, combining content analysis of search outputs with historical and sociological analysis of the advertising and information economy.
Key findings: Google searches for terms related to Black women, girls, and other marginalized groups returned results dominated by pornography, stereotypes, and commercial exploitation. Noble argued that these results were not neutral reflections of the internet but products of algorithmic design, advertising economics, and the historical devaluation of Black women in American culture.
Significance: Noble's work informs the textbook's analysis of algorithmic gatekeeping (Chapter 13) and the relationship between search algorithms and social power (Chapter 5). It demonstrates that information retrieval systems are not neutral and that their outputs have material consequences for the communities they represent.
Limitations: Search results change rapidly, and some of the specific results Noble documented have been altered since publication. The analysis focuses on Google, which may not be representative of all search engines.
Study 23: Amnesty International — Surveillance Giants (2019)
Citation: Amnesty International (2019). "Surveillance Giants: How the Business Model of Google and Facebook Threatens Human Rights." Amnesty International.
Research question: Are the business models of major technology platforms compatible with the right to privacy and other fundamental human rights?
Method: Human rights assessment applying the UN Guiding Principles on Business and Human Rights and international human rights law to the business models of Google and Facebook.
Key findings: The surveillance-based business models of Google and Facebook are inherently incompatible with the right to privacy. The report concluded that these companies' core business operations -- mass data collection, profiling, and targeted advertising -- constitute a form of surveillance that undermines privacy, freedom of expression, and freedom of opinion. The report called for a fundamental rethinking of the advertising-based business model.
Significance: This study bridges the human rights and data governance literatures and informs Chapter 4 (the attention economy), Chapter 8 (surveillance), and Chapter 36 (national security).
Limitations: The report takes a strong normative position that may be contested by those who argue that advertising-based business models provide genuine value in exchange for data.
Study 24: Raji et al. — Actionable Auditing (2020)
Citation: Raji, I.D., Smart, A., White, R.N., Mitchell, M., Gebru, T., Hutchinson, B., Smith-Loud, J., Theron, D., and Barnes, P. (2020). "Closing the AI Accountability Gap: Defining an End-to-End Framework for Internal Algorithmic Auditing." Proceedings of the Conference on Fairness, Accountability, and Transparency, 33-44.
Research question: How can organizations implement effective internal algorithmic auditing processes?
Method: Framework development based on the authors' experience at Google, combining audit methodology with organizational change theory.
Key findings: Effective algorithmic auditing requires an end-to-end framework spanning the entire machine learning lifecycle: scoping (defining what to audit), mapping (identifying stakeholders and potential harms), artifact collection, testing, reflection, and mitigation. The authors argued that audits must be embedded in organizational processes, not conducted as one-off exercises.
Significance: This framework informs the accountability and audit discussion in Chapter 17 and the responsible AI development pipeline in Chapter 29.
Limitations: The framework was developed primarily for large technology companies with significant internal resources; adaptation for smaller organizations requires attention to resource constraints.
Study 25: Crawford and Joler — Anatomy of an AI System (2018)
Citation: Crawford, K. and Joler, V. (2018). "Anatomy of an AI System: The Amazon Echo as an Anatomical Map of Human Labor, Data, and Planetary Resources." AI Now Institute and Share Lab.
Research question: What are the full material, labor, and data costs of a consumer AI device?
Method: Detailed mapping of the supply chain, data flows, and labor involved in producing and operating an Amazon Echo device, from mineral extraction through manufacturing, data collection, cloud processing, and disposal.
Key findings: The anatomy map revealed that a single AI device depends on rare earth mineral extraction (often involving exploitative labor conditions), global manufacturing supply chains, massive data center infrastructure, and underpaid data labeling and content moderation labor, and that at the end of its life it becomes e-waste with significant environmental consequences. The total cost of an AI device is vastly larger than its purchase price.
Significance: This study is referenced in Chapter 33 (labor), Chapter 34 (environment), and Chapter 37 (Global South perspectives). It demonstrates that the "cloud" is not immaterial but depends on physical infrastructure, human labor, and natural resources with unevenly distributed costs.
Limitations: The analysis focuses on one specific device and company; the anatomy of other AI systems may differ in detail.
Study 26: Couldry and Mejias — The Costs of Connection (2019)
Citation: Couldry, N. and Mejias, U.A. (2019). The Costs of Connection: How Data Is Colonizing Human Life and Appropriating It for Capitalism. Stanford University Press.
Research question: How does the contemporary data economy reproduce dynamics of historical colonialism?
Method: Theoretical analysis combining political economy, postcolonial theory, and critical data studies.
Key findings: The authors coined "data colonialism" to describe the systematic appropriation of human life through data extraction, arguing that just as historical colonialism appropriated land and resources, data colonialism appropriates human experience as raw material. They identified data relations (the social arrangements that normalize data extraction) as the mechanism through which colonialism is reproduced.
Significance: The data colonialism framework is central to Chapter 37 (Global South perspectives) and informs the power analysis throughout the textbook.
Limitations: The colonial analogy, while illuminating, has been critiqued for potentially minimizing the violence and physical dispossession of historical colonialism.
Study 27: Calo — Digital Market Manipulation (2014)
Citation: Calo, R. (2014). "Digital Market Manipulation." George Washington Law Review, 82(4), 995-1051.
Research question: Does the ability of online firms to exploit individual cognitive biases through personalized interfaces constitute a new form of market manipulation?
Method: Legal and economic analysis.
Key findings: Online firms possess two capabilities unavailable to traditional firms: the ability to identify individual consumers' cognitive vulnerabilities through data, and the ability to target each consumer with personalized persuasion designed to exploit those specific vulnerabilities. This combination constitutes "digital market manipulation" -- a practice that existing consumer protection law is poorly equipped to address.
Significance: This study informs the dark patterns analysis in Chapter 4 and the consent critique in Chapter 9.
Limitations: The article is a legal analysis, not an empirical study; it identifies the theoretical possibility of digital market manipulation but does not quantify its prevalence.
Study 28: Madden et al. — Privacy, Poverty, and Big Data (2017)
Citation: Madden, M., Gilman, M., Levy, K., and Marwick, A. (2017). "Privacy, Poverty, and Big Data: A Matrix of Vulnerabilities for Poor Americans." Washington University Law Review, 95(1), 53-125.
Research question: How does poverty shape vulnerability to privacy invasions in a data-driven society?
Method: Literature review and theoretical analysis combining privacy law, poverty law, and surveillance studies.
Key findings: Low-income Americans face a "matrix of vulnerabilities" in which poverty increases exposure to surveillance (through welfare systems, public housing, and social services) while simultaneously reducing the resources available to resist surveillance (legal representation, digital literacy, alternative services). Privacy is not equally distributed; it is stratified along the same dimensions as other forms of social inequality.
Significance: This study informs Chapter 32 (digital divide and data justice) and the textbook's recurring theme that data governance challenges disproportionately affect vulnerable populations.
Limitations: The analysis is US-focused; the intersection of poverty and privacy may operate differently under other social welfare systems.
Study 29: Jobin, Ienca, and Vayena — Global Landscape of AI Ethics Guidelines (2019)
Citation: Jobin, A., Ienca, M., and Vayena, E. (2019). "The global landscape of AI ethics guidelines." Nature Machine Intelligence, 1, 389-399.
Research question: What principles do AI ethics guidelines around the world converge on, and where do they diverge?
Method: Systematic review of 84 AI ethics guidelines from public, private, and civil society organizations across the world.
Key findings: Five principles appeared in more than half of all guidelines: transparency, justice/fairness, non-maleficence, responsibility, and privacy. However, guidelines diverged significantly on implementation mechanisms, enforcement, and the relative weight given to different principles. Most guidelines were aspirational rather than enforceable, and few included mechanisms for accountability.
Significance: This study informs the responsible AI frameworks discussion in Chapter 29 and the regulatory landscape analysis in Chapter 20.
Limitations: The study analyzed the content of guidelines, not their implementation or effectiveness. The gap between written principles and actual practice is a persistent challenge.
Study 30: Manyika et al. — AI, Automation, and the Future of Work (2017)
Citation: Manyika, J., Chui, M., Miremadi, M., Bughin, J., George, K., Willmott, P., and Dewhurst, M. (2017). "A Future That Works: Automation, Employment, and Productivity." McKinsey Global Institute.
Research question: How many and what types of jobs are susceptible to automation, and what are the implications for labor markets?
Method: Analysis of the automation potential of over 2,000 work activities across 800 occupations, combined with economic modeling of transition scenarios.
Key findings: About half of all work activities could be automated using currently demonstrated technology, but fewer than 5% of occupations could be fully automated. The impact would vary significantly by sector, geography, and demographic group. Low-wage workers would be disproportionately affected, and the transition period could see significant disruption even if new jobs emerge over time.
Significance: This study informs the labor and automation analysis in Chapter 33. It demonstrates that automation's impact is not uniform and that governance responses must address distributional questions, not just aggregate economic effects.
Limitations: The study's projections have been critiqued for both overestimating (by focusing on technical feasibility rather than economic viability) and underestimating (by not accounting for generative AI capabilities that emerged after publication) the pace of automation.
How to Use This Appendix
For each study, we recommend:
- Read the original. Summaries inevitably lose nuance. Many of these studies are freely available online.
- Evaluate the methodology. Use the criteria in Appendix A to assess each study's validity, reliability, and limitations.
- Connect to the textbook. Each study is referenced in specific chapters; re-reading the relevant chapter sections after reading the original study will deepen your understanding.
- Consider the study's context. When was it published? What was happening in the field at that time? Has subsequent research confirmed, qualified, or contradicted its findings?
- Identify gaps. What questions does the study leave unanswered? What would a follow-up study need to investigate?