Chapter 2: Key Takeaways, Vocabulary, and Core Tensions
Chapter 2 | AI Ethics for Business Professionals
Key Takeaways
- The same failure modes recur across AI's history. Inadequate attention to who will interact with AI systems and how, training data that reflects historical inequity, homogeneous development teams, opacity as a business strategy, and ethics infrastructure without authority — these failures appeared in the 1950s and they are appearing now. Historical literacy is a practical tool for recognizing these patterns before harm is done.
- The AI winters taught a lesson the field repeatedly fails to apply. Overpromising on AI capabilities creates governance gaps: decisions are made based on capabilities that do not exist, and when those capabilities fail to materialize, the accountability mechanisms that should have been in place are absent. Hype is not a harmless communications strategy; it shapes resource allocation, policy, and public trust in ways with real consequences.
- Statistical learning systems inherited and often amplified the inequities embedded in their training data. The shift from symbolic AI to machine learning did not reduce bias; it relocated it from explicit rules to implicit patterns in data. Historical data reflecting historical inequity — in lending, hiring, criminal justice — produces systems that perpetuate that inequity when used to train AI that makes similar decisions.
- The great bias reckoning of 2015–2018 was not a discovery — it was a documentation. The bias that Joy Buolamwini, ProPublica, and investigative journalists documented in deployed AI systems was not new. It was predicted by researchers who understood how statistical learning works on inequitable data. The "discovery" was organizational, not technical: it became undeniable at sufficient scale to force public acknowledgment.
- Ethics washing is a specific organizational phenomenon, distinct from genuine ethics commitment. The proliferation of AI ethics principles documents after 2016 produced statements of intent that, in many cases, were deployed to reduce regulatory and reputational pressure without substantively changing development and deployment practices. Genuine ethics commitment requires enforcement mechanisms, organizational authority for ethics functions, and willingness to absorb real costs in the form of delayed or foregone deployments.
- The hidden labor supply chain of AI is an ethical responsibility, not an operational footnote. Annotation workers, content moderators, and other human laborers who make AI systems function are often invisible in final products, poorly compensated, and inadequately protected from psychological harm. Every organization that deploys AI systems has ethical responsibilities for the conditions in that supply chain, regardless of the number of contractual intermediaries between them and the workers.
- Governance consistently lags capability, and the gap is not inevitable. The pattern in which regulation arrives years after documented harm is not a natural feature of technological change; it reflects specific political and economic dynamics — industry capture of regulatory agencies, jurisdiction gaps, the complexity of technical standards — that can be addressed if there is political will to do so. Organizations that wait for external regulation to enforce ethical practice will consistently be too late.
- Generative AI has intensified and widened the scope of potential harms without resolving the organizational dynamics that produce harm. The capability shift represented by generative AI is real and significant. But the organizational failures that produced earlier AI ethics harms — speed over caution, homogeneous teams, inadequate pre-deployment testing, ethics infrastructure without authority — persist. More powerful capabilities deployed through unchanged organizational processes produce more serious harms.
- Civil society, journalism, and independent research have done more to hold AI systems accountable than internal ethics infrastructure at deploying organizations. This is a structural observation, not an accusation. It implies that robust AI governance requires external accountability mechanisms and that internal ethics functions, to be effective, must be designed with genuine independence and authority.
- The "this time is different" argument requires evidence, not assertion. Every phase of AI development has been accompanied by claims that the lessons of previous phases do not apply. Sometimes these claims identify genuine novelty. More often, they are used to avoid the hard organizational work of applying historical lessons to current deployments. The burden of proof lies with those who claim exemption from history.
- Power is the organizing question of AI ethics history. Who holds power in AI development? Who bears the costs of AI failures? Who has the political and economic capacity to demand accountability? The answers — development power concentrated in a small number of organizations, costs borne disproportionately by vulnerable populations, accountability demanded by external advocates rather than built into deploying organizations — are consistent across the history traced in this chapter.
- AI ethics failures are organizational failures, not merely technical ones. The bias in COMPAS was not a mathematical accident; it was the result of organizational decisions about what data to use, what fairness criteria to apply, and who would be affected by deployment. The failure of Tay was not a programming error; it was the result of organizational decisions about how to test, what features to include, and how to weight deployment speed against deployment safety. Technical improvements are necessary but insufficient; organizational change is required.
Essential Vocabulary
Algorithmic accountability: The principle that AI systems and the organizations that deploy them should be answerable for the outputs of those systems, including their errors and harms. Accountability in this sense requires both transparency (the ability to inspect how a system works) and liability (consequences for harmful outcomes). Algorithmic accountability emerged as a field of academic study and policy advocacy in roughly 2016–2020.
Disparate impact: A legal and ethical concept describing situations in which a facially neutral policy or practice produces outcomes that are significantly less favorable to members of a protected class (defined by race, gender, disability, etc.) than to other groups, without adequate justification. In AI ethics, disparate impact refers to AI systems that produce inequitable outcomes across demographic groups even when they do not explicitly consider group membership. Distinguished from disparate treatment, which involves explicit consideration of protected characteristics.
Ethics washing: The practice of deploying ethics language — principles documents, ethics boards, ethics teams, ethics commitments — as a reputational and regulatory strategy, without substantive accompanying change in organizational practices or willingness to absorb real costs for ethical compliance. Ethics washing is distinguished from genuine ethics commitment by the absence of enforcement mechanisms, organizational authority for ethics functions, and willingness to sacrifice profitable use cases to ethical constraints.
Filter bubble: A concept introduced by Eli Pariser (2011) to describe the personalized information environment created by algorithmic curation, in which users are shown content consistent with their prior behavior and expressed preferences, reducing exposure to divergent perspectives. Filter bubbles are an unintended consequence of engagement optimization: algorithms that maximize user engagement tend to show users what they already agree with, because that generates more clicks and time on site than content that challenges them.
Ghost work: A term coined by Mary Gray and Siddharth Suri (2019) to describe the on-demand, platform-mediated labor that powers AI and other technology systems, performed by workers who are invisible in the final product and lack the employment protections associated with conventional employment. Ghost workers include data annotators, content moderators, click workers, and other micro-taskers who provide the human judgment that AI systems require.
Training data: The dataset used to train a machine learning model — the examples from which the model learns the patterns it will later apply to new inputs. Training data quality, size, and composition are primary determinants of model behavior, including its biases. Because training data typically reflects historical human decisions made in unequal social contexts, it commonly encodes the inequities of those contexts into the models trained on it.
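The mechanism described in this entry can be illustrated with a toy sketch (all data hypothetical, not from the chapter's cases): a "model" that learns only per-zip-code approval rates from historical decisions reproduces a group disparity it never observes directly, because zip code acts as a proxy for group membership.

```python
from collections import defaultdict

# Hypothetical history of (zip_code, group, approved) decisions made in
# an unequal context: zip "A" is mostly group 1, zip "B" mostly group 2,
# and historical approval rates differ sharply between the two zips.
history = ([("A", 1, True)] * 80 + [("A", 1, False)] * 20
           + [("B", 2, True)] * 40 + [("B", 2, False)] * 60)

# "Train": learn the approval rate per zip code. Group membership is
# never used, yet the learned rates encode the historical disparity.
counts = defaultdict(lambda: [0, 0])  # zip -> [approvals, total]
for zip_code, _group, approved in history:
    counts[zip_code][0] += approved
    counts[zip_code][1] += 1
rate = {z: ok / n for z, (ok, n) in counts.items()}
print(rate)  # {'A': 0.8, 'B': 0.4} -> the inequity is now in the model
```

A system that approves applicants based on these learned rates would disadvantage group 2 without ever "seeing" group membership, which is exactly the relocation of bias from explicit rules to data patterns described above.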
Disparate treatment vs. disparate impact: Two distinct theories of discrimination relevant to AI ethics. Disparate treatment occurs when a system explicitly considers a protected characteristic (race, gender, etc.) in a way that disadvantages members of that group. Disparate impact occurs when a facially neutral system produces outcomes that disproportionately disadvantage members of a protected group. Many AI bias cases involve disparate impact rather than disparate treatment: the systems do not explicitly consider race or gender but use proxy variables that correlate with them.
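One widely used operationalization of disparate impact is the US EEOC's "four-fifths rule": a selection rate for any group below 80% of the highest group's rate is treated as evidence of adverse impact. A minimal sketch, using hypothetical hiring numbers:

```python
def selection_rate(selected: int, applicants: int) -> float:
    return selected / applicants

def adverse_impact_ratio(rates: dict) -> float:
    # Ratio of the lowest group selection rate to the highest.
    return min(rates.values()) / max(rates.values())

# Hypothetical outcomes of a facially neutral screening system.
rates = {
    "group_x": selection_rate(48, 100),  # 0.48
    "group_y": selection_rate(30, 100),  # 0.30
}
ratio = adverse_impact_ratio(rates)
print(round(ratio, 3), ratio < 0.8)  # 0.625 True -> flags potential disparate impact
```

The rule is a screening heuristic, not a legal conclusion: a flagged ratio shifts attention to whether the practice has adequate justification.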
Recidivism prediction: The use of statistical models to predict the likelihood that a person convicted of a crime will commit additional crimes in the future. Recidivism prediction tools — of which COMPAS is the most studied — are used in sentencing, bail, and parole decisions in many US jurisdictions. ProPublica's 2016 investigation found that COMPAS produced racially disparate predictions, wrongly classifying Black defendants as high risk at nearly twice the rate it wrongly classified white defendants. Subsequent academic analysis demonstrated that the disparities were mathematically related to different base rates of measured recidivism across racial groups — a finding that generated important theoretical work on the impossibility of simultaneously satisfying multiple intuitive fairness criteria.
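The impossibility result mentioned in the recidivism entry can be made concrete. Chouldechova's identity relates a classifier's false positive rate (FPR) to its base rate r, positive predictive value (PPV), and true positive rate (TPR): FPR = (r / (1 - r)) * ((1 - PPV) / PPV) * TPR. If two groups share the same PPV (a calibration-style parity) and the same TPR, but have different base rates, their false positive rates must differ. A minimal sketch with hypothetical numbers:

```python
def fpr(base_rate: float, ppv: float, tpr: float) -> float:
    # Chouldechova's identity: FPR as a function of base rate, PPV, TPR.
    return base_rate / (1 - base_rate) * (1 - ppv) / ppv * tpr

# Hypothetical groups with equal PPV and TPR but different measured
# base rates of the predicted outcome.
ppv, tpr = 0.7, 0.6
fpr_a = fpr(0.5, ppv, tpr)  # group A, base rate 50%
fpr_b = fpr(0.3, ppv, tpr)  # group B, base rate 30%
print(fpr_a, fpr_b)  # unequal: error-rate parity is arithmetically impossible here
```

This is why the text insists the conflict is a values question rather than an optimization problem: no amount of model tuning can equalize all of these criteria at once when base rates differ.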
Core Tensions
Speed vs. safety. The competitive and cultural dynamics of the technology industry reward rapid deployment. Comprehensive pre-deployment safety testing takes time and resources. These objectives are in genuine tension, and the history of AI ethics failures suggests that speed has consistently been weighted more heavily than safety. The tension does not resolve itself; it requires explicit organizational decisions about which to prioritize and under what conditions.
Transparency vs. intellectual property. Meaningful accountability for AI systems requires inspectability — the ability for affected parties, regulators, and independent researchers to understand how a system works. Inspectability conflicts with intellectual property protections that organizations use to protect competitive advantages. The resolution of this tension requires regulatory frameworks that can mandate sufficient transparency for accountability without compromising legitimate competitive interests.
Individual fairness vs. group fairness. Intuitive definitions of fairness at the individual level (treating similar individuals similarly) and at the group level (producing similar error rates across demographic groups) are mathematically incompatible in many realistic situations. The incompatibility is not a technical problem to be solved; it is a values question about whose interests to prioritize when they conflict. This tension requires explicit ethical reasoning, not technical optimization.
Innovation vs. harm prevention. New AI capabilities create new possibilities for beneficial applications and new possibilities for harm. The appropriate pace of deployment — how thoroughly to test before deploying, how much caution to exercise in the face of uncertain but potentially serious harms — is a genuine ethical judgment that cannot be resolved by technical analysis alone. The history traced in this chapter suggests that the technology industry has consistently resolved this tension in favor of innovation, often at the expense of populations with less power to resist harm.
Questions to Carry Forward
- What makes an ethics commitment genuine rather than performative? What organizational structures and cultural norms are necessary conditions for genuine ethics commitment?
- How should organizations allocate responsibility for AI ethics failures when those failures emerge from decisions made at multiple levels — technical design choices, deployment decisions, commercial contracts, and broader industry norms?
- What would adequate labor standards for AI annotation and content moderation work look like, and what combination of market pressure, regulation, and industry self-governance would be required to implement them?
- How can organizations maintain meaningful accountability for AI systems whose internal workings are opaque even to technical experts?
- What specific historical lessons from the cases in this chapter should be legible to an AI product manager or an organizational AI ethics committee, and what processes would make those lessons actionable?