Case Study 2.2: The Hidden Workers — Labor and Exploitation in the AI Pipeline
Chapter 2 | AI Ethics for Business Professionals
Overview
Every AI system that seems to function autonomously depends, at some point in its development, on the labor of human beings who are typically invisible in the final product. They label photographs, transcribe audio recordings, categorize text, moderate content, and perform hundreds of other tasks that create the training data and safety guardrails that AI systems require. This workforce — sometimes called "ghost workers," "micro-taskers," or "data annotators" — is large, globally distributed, poorly compensated, and routinely exposed to psychologically harmful content without adequate support.
Understanding the hidden labor that powers AI is not a peripheral concern for business professionals who use or deploy AI systems. It is central to the ethical evaluation of those systems: the supply chain of AI includes human labor as surely as a garment supply chain includes fabric manufacturing, and the conditions in that supply chain are an ethical responsibility of every organization that purchases and deploys the product.
1. The Myth of Fully Automated AI
The popular narrative of AI presents it as a technology of increasing automation: machines that learn from data and then operate independently, reducing and eventually eliminating the need for human involvement. The narrative is not false — AI systems do automate tasks that previously required human judgment — but it is profoundly incomplete. It obscures the massive and ongoing human labor that AI systems require at every stage of their development.
The myth of full automation serves organizational purposes. It simplifies the product narrative — "our AI does this" is cleaner than "our AI, trained by thousands of human workers in Uganda and the Philippines, does this." It distances AI companies from labor relations questions that would otherwise arise — questions about wages, conditions, psychological support, and accountability that are easier to ignore when workers are invisible. And it reinforces intellectual property claims — the value of an AI system lies in the model and the algorithms, not in the human work of labeling that trained it, so the human contribution can be treated as a commodity input rather than as creative or skilled labor.
Kate Crawford and Trevor Paglen's 2019 investigation "Excavating AI" documented the human curation decisions embedded in the ImageNet dataset — one of the most widely used AI training datasets in the world. ImageNet contained 14 million images organized into over 21,000 categories, many of which had been labeled by human workers on Amazon's Mechanical Turk platform. Crawford and Paglen examined those labels and found that many images of people had been categorized with crude, offensive, and dehumanizing labels: categories like "slut," "failure," and offensive racial terms. These were not accidental inclusions; they reflected the cultural assumptions and sometimes the cruelty of the humans who had categorized them, translated directly into a dataset used to train AI systems deployed commercially by major organizations.
The ImageNet example is disturbing in itself. More disturbing, as Crawford and Paglen argued, is what it reveals about the entire infrastructure of AI training data: it is produced by human beings making human judgments, those judgments reflect human values and biases, and the resulting training data encodes those values and biases into systems that are then deployed as if their outputs were objective.
2. What Data Annotation Actually Is and Who Does It
Data annotation is the process of adding labels, tags, or metadata to raw data to make it usable for machine learning. The specific tasks involved vary by application: for computer vision, annotators might draw bounding boxes around objects in images, label the emotions on faces, or classify whether an image is safe for work. For natural language processing, annotators might classify the sentiment of text, identify named entities, evaluate whether an AI-generated response is helpful and accurate, or rank competing responses to the same question. For autonomous vehicle systems, annotators might spend hours labeling each pixel in video frames — sky, road, pedestrian, vehicle — so that the system can learn to navigate real-world environments.
Annotation work varies considerably in skill requirements. Some tasks — determining whether a photograph contains a cat — can be completed accurately by almost any adult with minimal training. Others — evaluating whether a medical AI's diagnosis is clinically reasonable, or assessing whether a legal AI's document review is accurate — require substantial domain expertise. The wage and organizational structures attached to annotation work tend not to reflect these differences in skill; expert-level annotation is often outsourced through the same platforms and at similar rates to unskilled annotation.
The people who do annotation work are largely invisible to the AI companies that ultimately benefit from it. They work through intermediary platforms and vendors — Amazon Mechanical Turk, Scale AI, Sama, iMerit, Appen — that recruit and manage workforces globally. These platforms serve as both intermediaries and labor markets: workers compete for available tasks, are paid per piece rather than per hour, and have limited or no employment relationship with the end customer. The resulting structure obscures responsibility: the AI company says its systems are built on human feedback without acknowledging the conditions of that work; the annotation vendor says it manages worker welfare without providing full transparency to customers; and the workers have limited recourse if conditions are inadequate.
Mary Gray and Siddharth Suri's book Ghost Work: How to Stop Silicon Valley from Building a New Global Underclass (2019) documents this workforce in rich ethnographic detail. Gray and Suri spent five years interviewing and observing on-demand platform workers in the United States, India, and elsewhere, and their analysis is important for business professionals because it challenges the clean distinction between the "platform economy" and traditional employment. Ghost workers, they argue, are not free agents making independent choices about flexible work; they are workers in a system with significant power asymmetries, limited labor protections, and real costs to their lives and wellbeing.
3. Wage Structures and Working Conditions
The wages paid to data annotators vary by geography, by the complexity of the task, and by the contracting arrangement. In general, they are low. Amazon Mechanical Turk — the best-studied of the major annotation platforms — has been the subject of multiple academic studies examining worker wages. A 2018 analysis by Kotaro Hara and colleagues found that median earnings on Mechanical Turk were approximately $2.00 per hour once the unpaid time spent searching for and qualifying for tasks was included. Only 4% of workers earned more than $7.25 per hour (the US federal minimum wage at the time). International workers, who constitute a substantial portion of the Mechanical Turk workforce, typically earned less.
The piece-rate structure of annotation work creates specific economic pressures. Workers are paid per completed task, which incentivizes speed over quality. Tasks that require careful judgment — evaluating whether an AI-generated response accurately represents a complex topic, or assessing the severity of content policy violations — may be completed in ways that sacrifice accuracy for throughput. This creates a quality problem that feedback loops can partially address, but that is structurally rooted in the wage model.
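The gap between nominal piece rates and actual hourly earnings can be made concrete with a short calculation. The figures below are purely illustrative assumptions, not data from any study: a hypothetical worker paid a fixed amount per task, with additional unpaid time spent finding and qualifying for tasks.

```python
# Illustrative sketch: effective hourly wage under piece-rate annotation.
# All figures are hypothetical assumptions, not measurements from any study.

def effective_hourly_wage(pay_per_task, tasks_completed,
                          paid_minutes, unpaid_minutes):
    """Total earnings divided by total time worked, where total time
    includes unpaid search and qualification time."""
    total_hours = (paid_minutes + unpaid_minutes) / 60
    return (pay_per_task * tasks_completed) / total_hours

# A hypothetical worker completes 60 tasks at $0.10 each: 45 minutes of
# paid task time, plus 15 minutes searching for and qualifying for tasks.
nominal = effective_hourly_wage(0.10, 60, 45, 0)    # ignores unpaid time
actual = effective_hourly_wage(0.10, 60, 45, 15)    # counts all time worked

print(f"nominal: ${nominal:.2f}/hour")  # $8.00/hour
print(f"actual:  ${actual:.2f}/hour")   # $6.00/hour
```

The difference between the two figures is the point: per-task rates advertised by platforms imply a higher wage than workers actually earn over the total time they spend working, which is why studies that measure total time (like the Mechanical Turk analysis above) report much lower effective wages.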
Working conditions on annotation platforms also tend to lack the protections associated with standard employment. Workers are typically classified as independent contractors, which means they do not receive minimum wage guarantees, overtime pay, employer health insurance contributions, workers' compensation, or the right to organize collectively in many jurisdictions. The flexibility that contractors have — to choose when and how much to work — is real, but it is accompanied by economic insecurity that makes it genuinely constraining rather than freely chosen.
The geographic distribution of annotation work reflects and reproduces global economic inequality. The combination of low wages by Global North standards and moderate wages by Global South standards means that annotation work concentrates in countries including the Philippines, India, Kenya, Uganda, Venezuela, and Mexico. Workers in these countries do the labor that enables AI systems predominantly used by and for the benefit of consumers and organizations in wealthier countries. The value created flows upward and outward; the costs of the work — including the psychological costs — stay local.
4. Psychological Harm of Content Moderation
Among the most ethically urgent issues in the hidden AI labor supply chain is the psychological harm caused by content moderation work. Content moderators — human workers who review user-generated content to determine whether it violates platform policies — are routinely exposed to graphic violence, child sexual abuse material, animal abuse, torture, suicide, self-harm, terrorism, and the full range of material that humans are willing to document and upload to the internet.
The psychological consequences of this work are severe and well-documented. Studies of content moderators have found high rates of post-traumatic stress disorder, depression, anxiety, and substance abuse. Former Facebook content moderators who reached a $52 million settlement with the company in 2020 described exposure to tens of thousands of graphic images and videos, including footage of beheadings, child abuse, and mass shootings, with inadequate psychological support and pressure to meet processing quotas.
The work of content moderation for AI systems is distinct from but related to content moderation for social platforms. AI systems — including large language models — require human feedback on their outputs to improve their safety and reduce harmful outputs. This human feedback often involves reviewing and rating the AI's responses to harmful prompts — responses that may themselves be disturbing — and generating alternative, better-calibrated responses. Workers providing this feedback are exposed to the same category of content as social media content moderators.
Billy Perrigo's January 2023 investigation in TIME of content moderation work performed for OpenAI by Sama in Nairobi, Kenya is one of the most important accounts of this issue. The investigation found that Sama workers were paid approximately $1.32 to $2.00 per hour to read and label textual descriptions of child sexual abuse, torture, suicide, violence, and other disturbing content in order to train the safety filters later used in ChatGPT. Workers told TIME that the work was psychologically traumatizing, that they had not been adequately prepared for the content they would encounter, that mental health support was insufficient, and that they felt unable to speak up about conditions because of economic vulnerability.
Sama had in fact terminated its contract with OpenAI early, in February 2022, months before it was scheduled to end, and in January 2023 it announced that it would exit content moderation and other sensitive work entirely. In response to the reporting, OpenAI said that it takes the wellbeing of annotation workers seriously. Whether conditions in its annotation supply chain have substantively changed, and what specifically changed for the workers involved, is not fully documented in available public reporting.
The content moderation case illustrates a broader principle about hidden labor and AI ethics: the psychological costs of AI safety are borne disproportionately by people in economically vulnerable positions, in countries with limited labor protections, who are least able to refuse the work or speak up about its conditions. The organizations that benefit from the work — and the users of the AI systems that the work makes safer — are largely insulated from these costs.
5. Geographic and Racial Dimensions
The geographic concentration of annotation and content moderation work in the Global South is not accidental. It is the result of deliberate economic calculation by companies seeking to minimize labor costs. It is also the result of the same colonial patterns of economic organization that have characterized extractive relationships between wealthy and poorer countries across many domains.
The racial and national character of the workforce is itself an ethical issue, independent of wage levels. When the painful, tedious, and psychologically harmful work of building AI systems is concentrated in countries that are predominantly Black and Brown, while the benefits of those systems accrue predominantly to companies and users in predominantly white wealthy countries, the arrangement replicates a colonial economic structure: extraction of labor from one population for the benefit of another, with limited accountability and limited political power for the extractees.
Reporting on Facebook's content moderation practices has documented a different geographic dimension: the use of content moderation workers in countries where Facebook's market position made speaking up about conditions particularly difficult. Workers in countries where Facebook was a dominant employer in the tech-adjacent sector had limited alternatives, making the power asymmetry between employer and worker particularly pronounced.
The geographic and racial dimensions of annotation labor also shape the cultural assumptions embedded in AI systems. When training data is labeled by workers in specific cultural contexts, those cultural contexts influence how labels are applied. What counts as a "neutral" facial expression, a "normal" family structure, or a "professional" appearance reflects cultural assumptions that vary across societies. Workers labeling images from a Kenyan or Filipino cultural context will apply somewhat different judgments than workers from a US or European context. The resulting training data will reflect a complex and partially invisible mix of cultural assumptions that are difficult to audit and often not disclosed to AI customers.
6. Corporate Concealment
The invisibility of annotation and content moderation labor is not a natural feature of AI development; it is a choice that organizations make, enforced through a combination of non-disclosure agreements, contractual structures, and public relations norms.
Workers on annotation platforms typically sign non-disclosure agreements that prohibit them from discussing their work publicly, including discussing what content they reviewed, what AI systems they worked on, and what conditions they experienced. These agreements serve legitimate purposes — protecting confidential product information — but they also function to prevent the kinds of worker accounts that would create public accountability for working conditions.
The contractual structure that interposes annotation vendors between AI companies and workers serves a similar function. When OpenAI contracts with Sama, which then employs workers in Nairobi, OpenAI can characterize its ethical responsibilities as limited to its direct contractor — Sama — and can claim that conditions experienced by workers are Sama's responsibility. This is a legally coherent position but an ethically inadequate one. An organization that benefits from labor has ethical responsibilities for the conditions of that labor throughout the supply chain, regardless of the number of intermediary vendors interposed.
The public communications of major AI companies have generally not disclosed the scale, location, or conditions of annotation and content moderation work. Product announcements, research papers, and technical documentation frequently describe AI systems as trained on "human feedback" or "labeled data" without specifying who provided the feedback, under what conditions, at what cost. This omission is not accidental; it is a communication choice that protects the organizations from accountability questions that disclosure would invite.
7. Ethical Implications for AI Buyers and Deployers
For organizations that purchase and deploy AI systems — which increasingly means any organization of significant size — the hidden labor supply chain raises concrete ethical responsibilities. These responsibilities do not disappear because the labor is invisible in the final product or because it was performed by workers several contractual removes from the purchasing organization.
Several practical implications follow from this analysis.
Supply chain transparency as due diligence. An organization that purchases an AI product should ask its vendor: Where was the training data labeled? By whom? Under what conditions? At what wages? What psychological support was provided to workers who reviewed harmful content? These questions are analogous to the supply chain due diligence questions that responsible organizations ask about physical goods manufacturing. The fact that AI vendors are not currently required to disclose this information does not relieve purchasing organizations of the ethical obligation to ask.
Cost-pressure caution. The low wages of annotation workers are one reason AI products can be priced competitively. Organizations that negotiate AI vendor prices downward are, in effect, exerting pressure that is ultimately transmitted to annotation workforces. This is a supply chain ethics issue with direct analogs in the sourcing of manufactured goods.
Vendor qualification on labor standards. Responsible purchasing organizations should evaluate AI vendors on their disclosed labor standards for annotation and content moderation workers, as part of vendor qualification and ongoing contract management. Vendors that refuse to disclose this information or that provide only vague assurances should be treated as higher-risk.
Advocacy for industry standards. Individual organizational purchasing choices are insufficient to change the incentive structures that produce exploitative annotation labor. Industry-wide standards — developed by industry associations, mandated by regulation, or established through consumer pressure — are necessary for systemic change. Organizations that claim commitment to AI ethics should actively support such standards, including participating in industry initiatives and regulatory processes that address annotation labor conditions.
Internal annotation ethics. Organizations that build their own AI systems using internally managed or contracted annotation workforces have the most direct responsibility for those workers' conditions. Workers in internal annotation programs should be covered by the same labor standards as the organization's other workers — minimum wages, safe working conditions, psychological support for content moderation work, and protection against retaliation for raising concerns.
8. Discussion Questions
- The myth of fully automated AI — the idea that AI systems operate independently of human labor — serves specific organizational purposes, including distancing companies from labor relations questions. As an AI product buyer, what information would you want vendors to disclose about the human labor involved in their product's development? How would you incorporate this information into procurement decisions?
- Content moderation workers are routinely exposed to psychologically harmful material as part of their work maintaining AI safety. What obligations do AI companies have to these workers? How do those obligations compare to the obligations companies have to workers in other psychologically demanding professions — such as emergency responders, trauma therapists, or forensic investigators?
- The geographic concentration of annotation labor in the Global South replicates patterns of economic extraction that characterize colonial economic relationships. Does this characterization change how you think about the ethical responsibilities of organizations that use AI systems? If so, how?
- Non-disclosure agreements and contractual intermediaries function to make annotation and content moderation labor invisible, reducing the accountability pressure on AI companies for conditions in their supply chains. What regulatory or industry-led mechanisms might effectively address this concealment? What would you argue for if you were advising a government body considering AI supply chain transparency regulation?