Glossary

This glossary provides definitions of key terms used throughout Data, Society, and Responsibility. Terms are organized alphabetically. Each entry indicates the chapter(s) where the term is most prominently discussed, though many concepts appear across multiple chapters. Cross-references ("See also") help you trace connections between related ideas.


A

Accountability (Chapters 17, 29, 40)
The principle that individuals, organizations, and institutions must answer for the outcomes of the data systems they create, deploy, or operate. Accountability requires identifiable responsible parties, transparent decision processes, and mechanisms for redress when harm occurs. See also: accountability gap, algorithmic audit, liability.
Accountability gap (Chapters 17, 25, 40)
The situation that arises when no clearly identifiable party bears responsibility for harms caused by data-driven systems, often because responsibility is diffused across designers, deployers, operators, and regulators. A recurring theme throughout the text. See also: accountability, liability, distributed responsibility.
Adequacy decision (Chapter 23)
A formal determination by a regulatory authority (most commonly the European Commission under GDPR) that a non-EU country provides an essentially equivalent level of data protection, thereby permitting the free flow of personal data to that country without additional safeguards. See also: GDPR, standard contractual clauses, data localization.
Age-appropriate design (Chapter 35)
A regulatory and design philosophy requiring that digital products and services likely to be accessed by children incorporate protections suited to the developmental needs and vulnerabilities of young users. The UK's Age Appropriate Design Code (Children's Code) is a leading example. See also: COPPA, children's data.
Age verification (Chapter 35)
Technical mechanisms used to confirm that a user meets a minimum age requirement before accessing certain digital services or content. Methods range from self-declaration to document verification and biometric estimation, each carrying distinct privacy and accuracy trade-offs. See also: COPPA, age-appropriate design.
Algorithm (Chapters 13, 14, 15)
A step-by-step set of instructions or rules that a computer follows to perform a task, make a decision, or solve a problem. In the context of this text, the term typically refers to automated decision-making systems that sort, rank, recommend, or classify information and people. See also: algorithmic bias, recommendation system, machine learning.
Algorithmic audit (Chapter 17)
A systematic examination of an algorithmic system to assess its performance, fairness, transparency, and compliance with legal or ethical standards. Audits may be internal, external, or regulatory, and can focus on inputs, processes, outputs, or impacts. See also: algorithmic impact assessment, accountability, fairness.
Algorithmic bias (Chapters 14, 15, 17)
Systematic and repeatable errors in a computer system's outputs that create unfair outcomes, such as privileging one group over another. Algorithmic bias can originate from biased training data, flawed model design, or the broader social context in which a system operates. See also: historical bias, representation bias, feedback loop, disparate impact.
Algorithmic impact assessment (AIA) (Chapters 17, 28)
A structured evaluation process, often conducted before deployment, that examines the potential social, ethical, and legal consequences of an algorithmic system. AIAs typically include stakeholder consultation, risk analysis, and mitigation planning. See also: privacy impact assessment, DPIA, algorithmic audit.
Algorithmic management (Chapter 33)
The use of automated systems to direct, evaluate, and discipline workers, particularly prevalent in gig economy platforms. Algorithmic management encompasses task assignment, performance monitoring, pay determination, and termination decisions with minimal or no human oversight. See also: gig economy, worker surveillance, automation.
Anonymization (Chapter 10)
The process of removing or altering personal identifiers from a dataset so that individual data subjects can no longer be identified, even by the data holder. True anonymization is irreversible and, if achieved, typically exempts data from privacy regulations. In practice, re-identification risks often persist. See also: pseudonymization, k-anonymity, differential privacy, de-identification.
Anticipatory governance (Chapter 38)
An approach to technology governance that seeks to anticipate and prepare for the social implications of emerging technologies before they become widespread, rather than reacting after harms materialize. Involves foresight methods, scenario planning, and flexible regulatory frameworks. See also: Collingridge dilemma, precautionary principle, regulatory sandbox.
Attention economy (Chapter 4)
An economic framework in which human attention is treated as a scarce resource and platforms compete to capture, retain, and monetize user engagement. The concept highlights how design choices are shaped by the imperative to maximize time-on-platform and advertising revenue. See also: engagement optimization, behavioral surplus, dark patterns, persuasive design.
Automation (Chapters 19, 33)
The use of technology to perform tasks previously carried out by humans, with varying degrees of human oversight. In data ethics contexts, automation raises questions about labor displacement, accountability for automated decisions, and the boundaries of machine autonomy. See also: autonomous system, algorithmic management, human-in-the-loop.
Autonomous system (Chapter 19)
A system capable of performing tasks or making decisions in the physical or digital world without continuous human direction. Examples include self-driving vehicles, autonomous weapons systems, and automated trading algorithms. The degree of autonomy varies along a spectrum from fully supervised to fully independent. See also: human-in-the-loop, moral agency, automation.

B

Behavioral surplus (Chapter 4)
A term coined by Shoshana Zuboff to describe the data about user behavior that exceeds what is needed to improve a service and is instead redirected toward predictive products sold to advertisers and other third parties. Behavioral surplus is the raw material of surveillance capitalism. See also: surveillance capitalism, attention economy, data exhaust.
Big Data (Chapters 2, 5)
A term describing datasets characterized by extreme volume, velocity, variety, and (sometimes) veracity that exceed the capacity of traditional data processing tools. Beyond its technical meaning, "Big Data" also signifies a cultural and institutional shift toward data-driven decision-making in commerce, governance, and research. See also: datafication, data lifecycle, data exhaust.
Biometric data (Chapter 12)
Data derived from the measurement and analysis of unique physical or behavioral characteristics of an individual, such as fingerprints, facial geometry, iris patterns, voiceprints, or gait. Biometric data is considered especially sensitive because it is generally immutable and uniquely identifying. See also: facial recognition, genetic data, HIPAA.
Biopower (Chapter 5)
A concept from Michel Foucault describing the regulation and management of populations through knowledge of life processes, including birth rates, health statistics, and demographic data. Biopower operates through norms, standards, and statistical governance rather than through direct coercion. See also: disciplinary power, power/knowledge, panopticism.
Black box (Chapter 16)
A system whose internal workings are opaque to those affected by its outputs. In machine learning, a "black box" model produces decisions or predictions without offering an interpretable explanation of how inputs relate to outputs. The term also applies to institutional decision-making processes that lack transparency. See also: explainability, transparency, LIME, SHAP.
Brain-computer interface (BCI) (Chapter 38)
A technology that establishes a direct communication pathway between the brain and an external device, raising profound questions about neural data privacy, cognitive liberty, and the boundaries of the self. See also: anticipatory governance, emerging technologies.
Breach notification (Chapter 30)
A legal requirement, established under regulations such as GDPR and various US state laws, that organizations must inform affected individuals and/or regulators within a specified timeframe when a data breach compromises personal data. See also: data breach, incident response, GDPR.

C

Calibration (Chapter 15)
A fairness property requiring that predicted probabilities correspond to actual outcome frequencies across groups. For example, if an algorithm assigns a 70% risk score to individuals, approximately 70% of those individuals should in fact experience the predicted outcome, and this should hold true regardless of group membership. See also: equalized odds, demographic parity, impossibility theorem.
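A minimal sketch of such a check, assuming a pandas DataFrame with illustrative column names score (predicted probability), outcome (0/1), and group, might bin the scores and compare predicted and observed rates within each bin for each group:

    # Illustrative calibration check by group (assumed columns: score, outcome, group).
    import pandas as pd

    def calibration_table(df: pd.DataFrame, n_bins: int = 10) -> pd.DataFrame:
        """Compare mean predicted score with observed outcome rate within score bins, per group."""
        df = df.copy()
        df["bin"] = pd.cut(df["score"], bins=n_bins)  # equal-width score bins
        summary = (
            df.groupby(["group", "bin"], observed=True)
              .agg(mean_score=("score", "mean"),
                   observed_rate=("outcome", "mean"),
                   n=("outcome", "size"))
              .reset_index()
        )
        # A well-calibrated model has mean_score close to observed_rate in every bin, for every group.
        summary["gap"] = summary["mean_score"] - summary["observed_rate"]
        return summary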
Carbon footprint (of AI) (Chapter 34)
The total greenhouse gas emissions associated with the lifecycle of an artificial intelligence system, including the energy consumed during training, inference, and data storage, as well as the embodied emissions in hardware manufacturing. See also: Green AI, environmental data ethics.
Care ethics (Chapter 6)
An ethical framework that emphasizes relationships, interdependence, and responsibilities of care as central to moral reasoning. Originating in the work of Carol Gilligan and Nel Noddings, care ethics attends to power differentials, vulnerability, and the needs of those who depend on others. When applied to data, it foregrounds the relational impacts of data practices and asks who is vulnerable to harm. See also: virtue ethics, justice theory, moral pluralism.
Categorical imperative (Chapter 6)
The central principle of Immanuel Kant's deontological ethics, requiring that moral agents act only according to rules they could will to be universal laws. In data ethics, the categorical imperative challenges practices that treat data subjects merely as means to an end rather than as ends in themselves. See also: deontology, utilitarianism, moral pluralism.
CCPA (California Consumer Privacy Act) (Chapters 20, 25)
A 2018 California state law granting residents rights over their personal information, including the rights to know what data is collected, to delete it, to opt out of its sale, and to be free from discrimination for exercising those rights. Amended and expanded by the California Privacy Rights Act (CPRA) in 2020. See also: GDPR, right to privacy, data broker.
Census (Chapter 2)
A systematic, usually government-conducted count and characterization of a population. Historically, the census exemplifies how data collection serves state power, resource allocation, and political representation, while also raising concerns about surveillance, categorization, and the politics of classification. See also: statistical state, data subject, biopower.
Chief data officer (CDO) (Chapter 27)
A senior executive responsible for an organization's data governance, data quality, data strategy, and increasingly, data ethics. The CDO role has evolved from a primarily technical function toward a strategic and ethical leadership position. See also: data stewardship, data governance, data catalog.
Children's data (Chapter 35)
Personal data belonging to individuals below the age of legal majority, subject to heightened protections under laws such as COPPA, GDPR (which allows member states to set the age of digital consent between 13 and 16), and the UK Age Appropriate Design Code. See also: COPPA, age-appropriate design, age verification.
Citizen assembly (Chapter 39)
A deliberative democratic body composed of randomly selected members of the public, convened to consider a specific policy question and produce recommendations. In data governance contexts, citizen assemblies have been used to deliberate on issues such as facial recognition, AI regulation, and data sharing. See also: participatory design, data cooperative, deliberative democracy.
CLOUD Act (Clarifying Lawful Overseas Use of Data Act) (Chapter 23)
A 2018 US federal law enabling US law enforcement agencies to compel US-based technology companies to provide data stored on servers regardless of their physical location, and establishing a framework for bilateral data access agreements with foreign governments. See also: data localization, digital sovereignty, cross-border data flows.
Collingridge dilemma (Chapter 38)
The observation that the social consequences of a technology are difficult to predict until the technology is widely adopted, but by that point, control and change become expensive, difficult, or impossible. Named after David Collingridge, the dilemma underscores the challenge of governing emerging technologies. See also: anticipatory governance, precautionary principle.
Compliance (Chapters 25, 26)
The act of adhering to legal requirements, industry standards, and organizational policies governing data practices. While necessary, compliance alone is insufficient for ethical data practice, as laws may lag behind technological development and may not address all ethical concerns. See also: enforcement, regulatory capture, data ethics program.
COMPAS (Correctional Offender Management Profiling for Alternative Sanctions) (Chapter 14)
A proprietary risk assessment algorithm used in the US criminal justice system to predict the likelihood of recidivism. The 2016 ProPublica investigation revealing racial disparities in COMPAS scores became a landmark case in algorithmic bias research. See also: algorithmic bias, disparate impact, fairness.
Conformity assessment (Chapter 21)
A process under the EU AI Act by which providers of high-risk AI systems demonstrate compliance with regulatory requirements before placing the system on the market. Depending on the risk category, conformity assessment may be self-conducted or require third-party evaluation. See also: EU AI Act, risk-based regulation.
Consent (Chapters 9, 10, 25)
The agreement by a data subject to the collection, processing, or sharing of their personal data. Under frameworks like GDPR, valid consent must be freely given, specific, informed, and unambiguous. The meaning and limits of consent in data contexts remain vigorously debated. See also: informed consent, consent fatigue, consent fiction, dark patterns.
Consent fatigue (Chapter 9)
The phenomenon in which individuals, overwhelmed by the volume and complexity of consent requests in digital environments, stop meaningfully engaging with privacy notices and simply click "agree" by default. Consent fatigue undermines the legitimacy of consent as a legal and ethical basis for data processing. See also: informed consent, consent, privacy paradox, dark patterns.
Content moderation (Chapters 13, 31)
The practice of monitoring, reviewing, and removing user-generated content on digital platforms based on community guidelines, terms of service, or legal requirements. Content moderation involves a mix of automated systems and human reviewers and raises tensions between free expression, safety, and platform responsibility. See also: platform governance, Section 230, DSA, misinformation.
Contextual integrity (Chapter 7)
A theory of privacy developed by Helen Nissenbaum holding that privacy is maintained when the flow of personal information conforms to the norms appropriate to a given social context. A privacy violation occurs when information flows in ways that breach the informational norms governing a particular context, even if the information is technically "public." See also: informational norms, informational privacy, right to privacy.
COPPA (Children's Online Privacy Protection Act) (Chapters 24, 35)
A 1998 US federal law regulating the collection of personal information from children under 13 by operators of websites and online services. COPPA requires verifiable parental consent and imposes data minimization requirements for children's data. See also: children's data, age-appropriate design, FERPA.
Cross-border data flows (Chapter 23)
The transfer of personal or other regulated data across national jurisdictions. Cross-border data flows are governed by a patchwork of mechanisms including adequacy decisions, standard contractual clauses, binding corporate rules, and data localization requirements. See also: adequacy decision, data localization, digital sovereignty, GDPR.

D

DAMA-DMBOK (Data Management Body of Knowledge) (Chapter 22)
A comprehensive framework published by the Data Management Association International (DAMA) that defines the functions, terminology, and best practices of data management. DAMA-DMBOK covers data governance, data quality, metadata management, data architecture, and related disciplines. See also: data governance, data quality, metadata.
Dark patterns (Chapters 4, 9)
User interface design choices that manipulate, deceive, or coerce users into making decisions they would not otherwise make, such as sharing more data, subscribing to services, or consenting to terms. Dark patterns exploit cognitive biases and undermine autonomous choice. See also: persuasive design, consent fatigue, attention economy.
Data (Chapter 1)
Facts, observations, or measurements recorded in a form suitable for storage, processing, and communication. Data can be quantitative or qualitative, structured or unstructured, personal or aggregate. Throughout this text, data is understood not as a neutral raw material but as a social artifact shaped by the choices of those who collect, categorize, and interpret it. See also: metadata, datafication, structured data, unstructured data.
Data breach (Chapter 30)
A security incident in which personal, confidential, or protected data is accessed, disclosed, or acquired by an unauthorized party. Data breaches may result from cyberattacks, insider threats, negligence, or system vulnerabilities. See also: breach notification, incident response, data security.
Data broker (Chapter 11)
A company that collects, aggregates, and sells personal data about individuals, typically without a direct relationship with the data subjects. Data brokers compile profiles from public records, commercial transactions, social media, and other sources. See also: CCPA, behavioral surplus, data exhaust.
Data catalog (Chapter 27)
An organized inventory of an organization's data assets that includes metadata describing each dataset's content, source, owner, format, quality, and permissible uses. Data catalogs support discoverability, governance, and responsible data use. See also: metadata, data lineage, data stewardship, chief data officer.
Data collection (Chapters 1, 9)
The process of gathering data about individuals, populations, or phenomena, whether through direct observation, surveys, sensor networks, web tracking, transaction records, or other methods. How data is collected shapes what questions can be asked, whose experiences are represented, and what harms may follow. See also: consent, data minimization, datafication.
Data colonialism (Chapters 5, 32, 37)
A concept describing the appropriation of human life through data, drawing parallels with historical colonialism. Data colonialism involves the extraction of data from individuals and communities, particularly in the Global South, for the benefit of powerful corporations and states, perpetuating global inequalities. See also: digital extractivism, data sovereignty, data justice.
Data controller (Chapters 1, 20)
Under GDPR and similar frameworks, the entity that determines the purposes and means of processing personal data. The data controller bears primary legal responsibility for compliance with data protection requirements. See also: data subject, data processor, GDPR.
Data cooperative (Chapter 39)
An organizational structure in which individuals pool their data collectively and govern its use through democratic decision-making. Data cooperatives seek to restore collective bargaining power over data and to ensure that the value generated from data benefits the contributing community. See also: data trust, participatory design, data justice.
Data ethics program (Chapter 26)
A structured organizational initiative that embeds ethical considerations into data collection, analysis, and use. A data ethics program typically includes governance structures (such as ethics committees), training, guidelines, review processes, and mechanisms for raising concerns. See also: responsible AI, data stewardship, compliance.
Data exhaust (Chapters 1, 4)
The trail of data generated as a byproduct of individuals' online and offline activities, such as browsing histories, location data, and transaction logs. Although often collected passively, data exhaust can be aggregated and analyzed to reveal detailed profiles of individuals. See also: behavioral surplus, datafication, metadata.
Data feminism (Chapter 32)
An approach to data science and data ethics informed by intersectional feminist theory, as articulated by Catherine D'Ignazio and Lauren Klein. Data feminism examines how power operates in data systems, challenges the myth of objectivity, advocates for pluralism in data practices, and centers the perspectives of marginalized communities. See also: data justice, epistemic injustice, intersectionality.
Data governance (Chapters 22, 27)
The collection of policies, processes, roles, standards, and metrics that ensure the effective and efficient use of data within an organization or across a jurisdiction, encompassing data quality, security, privacy, and ethical considerations. See also: DAMA-DMBOK, data stewardship, chief data officer, data quality.
Data justice (Chapter 32)
A framework for evaluating the fairness of data practices by examining how data systems distribute power, resources, and opportunities across social groups. Data justice attends to structural inequalities and asks who benefits, who is harmed, and who has voice in data governance decisions. See also: data feminism, digital divide, digital redlining, equity.
Data lifecycle (Chapter 1)
The sequence of stages through which data passes, from creation or collection through storage, processing, analysis, sharing, and eventual archiving or deletion. Understanding the data lifecycle is essential for identifying where ethical issues arise and where governance interventions can be most effective. See also: data collection, data minimization, data governance.
Data lineage (Chapter 27)
The documentation of data's origins, transformations, and movements throughout its lifecycle. Data lineage tracking enables organizations to trace how data was created, how it has been modified, and where it has traveled, supporting transparency, quality assurance, and regulatory compliance. See also: data catalog, metadata, data governance.
Data localization (Chapter 23)
Laws or policies requiring that data collected within a jurisdiction be stored and/or processed within that jurisdiction's borders. Data localization reflects concerns about sovereignty, law enforcement access, and economic development, but can also fragment the global internet. See also: digital sovereignty, cross-border data flows, adequacy decision.
Data minimization (Chapters 10, 28)
The principle that organizations should collect, process, and retain only the minimum amount of personal data necessary for a specified purpose. Data minimization is a core tenet of privacy by design and is legally required under GDPR and similar frameworks. See also: privacy by design, purpose limitation, anonymization.
Data processor (Chapter 20)
Under GDPR and similar frameworks, an entity that processes personal data on behalf of the data controller. While the data controller determines the purposes and means of processing, the processor carries out the processing operations and has certain independent obligations. See also: data controller, data subject, GDPR.
Data quality (Chapters 14, 22)
The degree to which data is accurate, complete, consistent, timely, and fit for its intended purpose. Poor data quality can introduce or amplify bias in algorithmic systems, undermine decision-making, and compromise research integrity. See also: data governance, DAMA-DMBOK, measurement bias.
Data sovereignty (Chapters 23, 37)
The principle that data is subject to the laws and governance structures of the jurisdiction in which it is collected or to which the data subject belongs. In the Global South context, data sovereignty is often framed as resistance to the extractive data practices of multinational corporations and former colonial powers. See also: digital sovereignty, data localization, data colonialism, indigenous data sovereignty.
Data stewardship (Chapter 27)
The responsible management of data assets on behalf of data subjects, organizations, or communities. Data stewards are accountable for ensuring data quality, security, privacy, and ethical use. The concept frames data management as a fiduciary obligation rather than an ownership right. See also: chief data officer, data governance, data catalog.
Data subject (Chapters 1, 20)
An identified or identifiable natural person to whom personal data relates. Under GDPR, data subjects have enumerated rights including access, rectification, erasure, and objection. The concept foregrounds the human being behind the data point. See also: data controller, data processor, GDPR.
Data trust (Chapter 39)
A legal and governance structure in which an independent trustee manages data on behalf of a defined group of beneficiaries, according to the terms of a trust agreement. Data trusts aim to provide fiduciary oversight of data use and to balance individual rights with collective benefits. See also: data cooperative, data stewardship, data governance.
Database (Chapter 2)
An organized collection of data stored and accessed electronically. Databases underpin modern data practices, and the design of a database, including its categories, fields, and relationships, embeds assumptions about what matters and who counts. See also: data, structured data, metadata.
Datafication (Chapter 1)
The process by which aspects of human life, behavior, and social interaction that were previously unquantified are converted into data that can be tracked, analyzed, and monetized. Datafication extends data collection into domains such as friendship, emotion, movement, and attention. See also: quantified self, data exhaust, Big Data, behavioral surplus.
Dataveillance (Chapter 8)
The systematic monitoring of people through their data, including metadata, transaction records, communication patterns, and location data. Coined by Roger Clarke, dataveillance extends surveillance beyond physical observation to the digital traces of everyday life. See also: surveillance, panopticon, sousveillance, mass surveillance.
Datasheet for datasets (Chapter 29)
A documentation practice, proposed by Timnit Gebru and colleagues, in which the creators of a dataset provide a standardized set of information about its motivation, composition, collection process, intended use, and known limitations. Datasheets are analogous to material safety data sheets in manufacturing. See also: model card, responsible AI, data quality.
De-identification (Chapter 10)
The process of removing or obscuring personal identifiers from a dataset to reduce the risk of identifying individual data subjects. De-identification is a broader term than anonymization and may include techniques that are reversible (pseudonymization) or that leave residual re-identification risk. See also: anonymization, pseudonymization, k-anonymity.
Deepfake (Chapter 18)
Synthetic media, most commonly video or audio, created using deep learning techniques to convincingly depict individuals saying or doing things they never said or did. Deepfakes raise concerns about misinformation, consent, identity theft, and the erosion of trust in authentic media. See also: generative AI, synthetic media, hallucination, watermarking.
Deliberative democracy (Chapter 39)
A democratic theory emphasizing informed public deliberation and reasoned argument as the basis for legitimate collective decision-making. In data governance, deliberative approaches include citizen assemblies, public consultations, and participatory technology assessment. See also: citizen assembly, participatory design.
Demographic parity (Chapter 15)
A fairness criterion requiring that a decision system's positive outcome rate is the same across all relevant demographic groups. While intuitive, demographic parity can conflict with other fairness metrics and may not be appropriate in all contexts. See also: equalized odds, calibration, impossibility theorem, disparate impact.
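As a rough illustration, the criterion can be inspected by computing positive-decision rates per group; the data and group labels below are hypothetical:

    # Illustrative demographic-parity check: positive decision rate per group.
    from collections import defaultdict

    def positive_rates(decisions, groups):
        """decisions: iterable of 0/1 outcomes; groups: parallel iterable of group labels."""
        totals, positives = defaultdict(int), defaultdict(int)
        for d, g in zip(decisions, groups):
            totals[g] += 1
            positives[g] += d
        return {g: positives[g] / totals[g] for g in totals}

    rates = positive_rates([1, 0, 1, 1, 0, 0, 1, 0], ["a", "a", "a", "a", "b", "b", "b", "b"])
    # Demographic parity holds (approximately) when the rates are equal across groups,
    # i.e. max(rates.values()) - min(rates.values()) is close to zero.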
Deontology (Chapter 6)
An ethical tradition holding that the rightness or wrongness of an action depends on whether it conforms to moral rules or duties, rather than on its consequences. In data ethics, deontological reasoning emphasizes rights, consent, and respect for persons. Immanuel Kant's categorical imperative is the most influential deontological principle. See also: categorical imperative, utilitarianism, virtue ethics, moral pluralism.
Differential privacy (Chapter 10)
A mathematical framework for quantifying and limiting the privacy risk of individual data subjects when their data is included in a dataset used for statistical analysis. Differential privacy adds carefully calibrated noise to data or query results, providing provable privacy guarantees. See also: k-anonymity, anonymization, privacy-enhancing technologies.
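A minimal sketch of the standard Laplace mechanism for a counting query, with an illustrative epsilon value and predicate, shows how the calibrated noise enters:

    # Illustrative Laplace mechanism: answer a counting query with epsilon-differential privacy.
    import numpy as np

    def dp_count(values, predicate, epsilon: float = 0.1) -> float:
        """Return a noisy count of items satisfying `predicate`.

        A counting query changes by at most 1 when any one person's record is added
        or removed (sensitivity = 1), so Laplace noise with scale 1/epsilon suffices.
        """
        true_count = sum(1 for v in values if predicate(v))
        return true_count + np.random.laplace(loc=0.0, scale=1.0 / epsilon)

    # Smaller epsilon means more noise and a stronger privacy guarantee.
    noisy = dp_count(range(100), lambda v: v % 2 == 0, epsilon=0.5)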
Digital divide (Chapter 32)
The gap between individuals, households, communities, and countries with regard to access to and use of information and communication technologies, including internet connectivity, digital devices, and digital literacy. The digital divide intersects with socioeconomic, racial, geographic, and generational inequalities. See also: digital redlining, data justice, equity.
Digital extractivism (Chapter 37)
The systematic extraction of data from individuals and communities in the Global South by corporations and governments primarily based in the Global North, mirroring historical patterns of resource extraction during colonialism. See also: data colonialism, data sovereignty, Global South perspectives.
Digital redlining (Chapter 32)
The use of digital technologies and data-driven systems to perpetuate or create discriminatory practices that restrict access to services, opportunities, or information for marginalized communities, analogous to the historic practice of redlining in housing. See also: algorithmic bias, data justice, digital divide, disparate impact.
Digital sovereignty (Chapter 23)
The capacity of a state, community, or organization to exercise control over the data generated within its borders and the digital infrastructure that processes it. Digital sovereignty reflects concerns about economic dependence, national security, and cultural autonomy in the digital age. See also: data sovereignty, data localization, cross-border data flows.
Digital twin (Chapter 38)
A virtual replica of a physical system, process, or entity that is continuously updated with real-time data. Digital twins raise governance questions about ownership of the virtual model, privacy implications of detailed simulation, and the use of digital twins for predictive decision-making. See also: anticipatory governance, IoT, emerging technologies.
Disciplinary power (Chapter 5)
A concept from Michel Foucault describing the mechanisms through which institutions regulate individual behavior through surveillance, normalization, and examination. Disciplinary power operates through the internalization of norms, making individuals self-regulating subjects. See also: panopticism, biopower, power/knowledge.
Disparate impact (Chapters 14, 15)
A legal and analytical concept describing a situation in which a facially neutral policy, practice, or algorithm produces disproportionately adverse outcomes for members of a protected group, even without discriminatory intent. See also: algorithmic bias, demographic parity, fairness.
Distributed responsibility (Chapter 17)
A condition in which responsibility for a system's behavior is spread across multiple actors, such as data providers, algorithm designers, deployers, and end users, making it difficult to assign clear accountability for harms. See also: accountability gap, liability.
DPIA (Data Protection Impact Assessment) (Chapter 28)
A structured assessment, required under GDPR for processing activities likely to result in high risks to individuals' rights and freedoms. A DPIA identifies risks, evaluates their severity and likelihood, and documents the measures taken to mitigate them. See also: privacy impact assessment, algorithmic impact assessment, GDPR.
DSA (Digital Services Act) (Chapter 31)
A 2022 European Union regulation establishing a comprehensive framework for the responsibilities of digital intermediaries, including platforms, regarding illegal content, transparent advertising, and algorithmic accountability. The DSA introduces tiered obligations based on platform size. See also: platform governance, Section 230, content moderation.

E

Emerging technologies (Chapter 38)
Technologies that are in early stages of development or adoption and whose social implications are not yet fully understood. In this text, emerging technologies include quantum computing, brain-computer interfaces, extended reality, advanced IoT, and synthetic biology, among others. See also: anticipatory governance, Collingridge dilemma.
Encryption (Chapter 36)
The process of converting data into a coded form that can be read only by authorized parties possessing the appropriate decryption key. End-to-end encryption is a central issue in debates between privacy advocates and law enforcement agencies regarding access to communications. See also: privacy-enhancing technologies, national security, FISA.
Enforcement (Chapter 25)
The mechanisms by which data protection and AI regulations are implemented and violations are sanctioned. Enforcement may be carried out by data protection authorities (DPAs), courts, sector-specific regulators, or through private rights of action. Effective enforcement requires adequate resources, independence, and political will. See also: compliance, regulatory capture, data protection authority.
Engagement optimization (Chapter 4)
The systematic design of digital platforms and content to maximize user interaction metrics such as time spent, clicks, shares, and return visits. Engagement optimization often relies on behavioral data and psychological techniques that may conflict with user wellbeing. See also: attention economy, dark patterns, behavioral surplus, persuasive design.
Environmental data ethics (Chapter 34)
The field examining the ethical implications of environmental data collection, environmental monitoring technologies, and the environmental costs of data-intensive technologies such as AI. Encompasses questions of environmental justice, access to climate data, and the carbon footprint of computation. See also: Green AI, carbon footprint.
Epistemic injustice (Chapters 5, 32)
A concept from philosopher Miranda Fricker describing the wrong done to someone in their capacity as a knower. In data contexts, epistemic injustice occurs when certain groups' knowledge, experiences, or testimony are systematically devalued or excluded from datasets, algorithms, and governance processes. See also: data justice, data feminism, representation bias.
Equalized odds (Chapter 15)
A fairness criterion requiring that an algorithm's true positive rate and false positive rate are equal across all relevant demographic groups. Equalized odds addresses the concern that errors should not fall disproportionately on any one group. See also: demographic parity, calibration, impossibility theorem, fairness.
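A sketch of how the two error rates might be computed per group, using hypothetical binary labels and predictions:

    # Illustrative equalized-odds check: true positive rate and false positive rate per group.
    def error_rates_by_group(y_true, y_pred, groups):
        """Return {group: (true_positive_rate, false_positive_rate)} for binary labels and predictions."""
        stats = {}
        for g in set(groups):
            idx = [i for i, gi in enumerate(groups) if gi == g]
            tp = sum(1 for i in idx if y_true[i] == 1 and y_pred[i] == 1)
            fn = sum(1 for i in idx if y_true[i] == 1 and y_pred[i] == 0)
            fp = sum(1 for i in idx if y_true[i] == 0 and y_pred[i] == 1)
            tn = sum(1 for i in idx if y_true[i] == 0 and y_pred[i] == 0)
            tpr = tp / (tp + fn) if (tp + fn) else float("nan")
            fpr = fp / (fp + tn) if (fp + tn) else float("nan")
            stats[g] = (tpr, fpr)
        # Equalized odds holds when both rates are (approximately) equal across groups.
        return stats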
Equity (Chapter 32)
The principle of fairness that recognizes different individuals and groups may need different resources and treatment to achieve comparable outcomes. In data contexts, equity goes beyond formal equality to address structural disadvantages that data systems may perpetuate or exacerbate. See also: data justice, fairness, digital divide.
EU AI Act (Chapter 21)
A European Union regulation, adopted in 2024, establishing a comprehensive legal framework for artificial intelligence based on a risk classification system. The EU AI Act prohibits certain AI practices, imposes strict requirements on high-risk systems, and mandates transparency for certain AI applications. See also: risk-based regulation, conformity assessment, GDPR.
Explainability (Chapter 16)
The capacity of an AI or algorithmic system to provide understandable reasons for its outputs, decisions, or recommendations. Explainability exists on a spectrum and its appropriate form depends on the audience and the stakes of the decision. See also: transparency, LIME, SHAP, black box, interpretability.

F

Facial recognition (Chapters 8, 12)
A biometric technology that identifies or verifies individuals by analyzing the geometric features of their faces in images or video. Facial recognition raises acute concerns about mass surveillance, bias (particularly against people of color and women), consent, and chilling effects on public assembly. See also: biometric data, surveillance, dataveillance.
Fairness (Chapters 14, 15, 17)
In the context of algorithmic systems, the quality of producing equitable outcomes and avoiding unjustified discrimination. Fairness has multiple formal definitions (demographic parity, equalized odds, calibration, individual fairness) that can conflict with each other, making trade-offs unavoidable. See also: algorithmic bias, disparate impact, impossibility theorem, equity.
Feedback loop (Chapter 14)
A dynamic in which the output of an algorithmic system influences the inputs it receives in subsequent iterations, often amplifying existing biases. For example, predictive policing systems that direct officers to certain neighborhoods generate more arrests in those areas, which reinforces the system's prediction. See also: algorithmic bias, historical bias, self-fulfilling prophecy.
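A toy simulation with entirely hypothetical numbers can make the reinforcing dynamic concrete: two areas with identical underlying incident rates, but with patrols allocated in proportion to previously recorded incidents:

    # Toy feedback loop: recorded incidents drive patrol allocation, and patrols
    # drive what gets recorded, so an initial skew in the data persists.
    true_rate = {"area_a": 0.5, "area_b": 0.5}   # identical underlying incident rates
    recorded = {"area_a": 60, "area_b": 40}      # historical records start out skewed

    for year in range(5):
        total = sum(recorded.values())
        patrols = {a: 100 * recorded[a] / total for a in recorded}   # allocate by past records
        for a in recorded:
            recorded[a] += patrols[a] * true_rate[a]                 # more patrols, more records
        share_a = recorded["area_a"] / sum(recorded.values())
        print(f"year {year}: share of records from area_a = {share_a:.2f}")
    # Despite equal true rates, area_a keeps an inflated share of recorded incidents,
    # and the allocation never corrects itself.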
FERPA (Family Educational Rights and Privacy Act) (Chapter 24)
A US federal law protecting the privacy of student education records and giving parents and eligible students rights to access and control those records. FERPA governs schools and educational institutions receiving federal funding. See also: COPPA, sector-specific governance, children's data.
FISA (Foreign Intelligence Surveillance Act) (Chapter 36)
A 1978 US federal law establishing procedures for the physical and electronic surveillance and collection of foreign intelligence information. FISA created the Foreign Intelligence Surveillance Court (FISC) and has been repeatedly amended, including through Section 702, which authorizes warrantless surveillance of non-US persons abroad. See also: mass surveillance, national security, encryption, Five Eyes.
Five Eyes (Chapter 36)
An intelligence-sharing alliance among the United States, United Kingdom, Canada, Australia, and New Zealand. The Five Eyes arrangement enables extensive signals intelligence cooperation and has been a focal point of debates about mass surveillance, as revealed by the Edward Snowden disclosures. See also: FISA, mass surveillance, national security.

G

GDPR (General Data Protection Regulation) (Chapters 20, 23, 25, 28)
The European Union's comprehensive data protection law, effective since May 2018. GDPR establishes rights for data subjects, obligations for data controllers and processors, principles governing data processing (including lawfulness, fairness, transparency, purpose limitation, data minimization, accuracy, storage limitation, and accountability), and enforcement mechanisms including fines of up to 4% of global annual revenue or €20 million, whichever is higher. See also: data subject, data controller, DPIA, adequacy decision, CCPA.
Generative AI (Chapter 18)
Artificial intelligence systems capable of producing new content, including text, images, audio, video, and code, based on patterns learned from training data. Generative AI raises ethical questions about authorship, copyright, labor displacement, misinformation, and the provenance of synthetic content. See also: deepfake, hallucination, large language model, synthetic media.
Genetic data (Chapter 12)
Data relating to an individual's inherited or acquired genetic characteristics, obtained through genetic testing or analysis of biological samples. Genetic data is considered a special category of sensitive data under GDPR and raises concerns about discrimination, familial privacy, and the irreversible nature of genetic information. See also: biometric data, HIPAA, GINA.
Gig economy (Chapter 33)
An economic arrangement characterized by short-term, task-based work facilitated by digital platforms, such as ride-hailing, food delivery, and freelance marketplaces. Gig workers often face intensive algorithmic management with limited transparency, bargaining power, or employment protections. See also: algorithmic management, worker surveillance, automation.
GINA (Genetic Information Nondiscrimination Act) (Chapter 12)
A 2008 US federal law prohibiting discrimination in employment and health insurance on the basis of genetic information. GINA addresses concerns about genetic determinism and the misuse of genetic test results. See also: genetic data, HIPAA.
Green AI (Chapter 34)
A research and development paradigm that prioritizes the energy efficiency and environmental sustainability of artificial intelligence systems, advocating for evaluation metrics that account for computational cost alongside model performance. See also: carbon footprint, environmental data ethics.

H

Hallucination (AI) (Chapter 18)
The generation of plausible-sounding but factually incorrect or fabricated content by a large language model or other generative AI system. AI hallucinations pose risks in high-stakes applications where incorrect information can cause harm and erode trust in AI-generated outputs. See also: generative AI, deepfake.
HIPAA (Health Insurance Portability and Accountability Act) (Chapters 12, 24)
A 1996 US federal law that establishes national standards for the protection of individually identifiable health information (protected health information, or PHI). HIPAA's Privacy Rule and Security Rule govern how healthcare providers, insurers, and their business associates handle health data. See also: biometric data, genetic data, sector-specific governance.
Historical bias (Chapter 14)
Bias that exists in the real world and is reflected in training data, leading algorithms to perpetuate or amplify pre-existing societal inequities. Because machine learning systems learn from historical patterns, they risk encoding the discrimination embedded in those patterns. See also: representation bias, measurement bias, algorithmic bias, feedback loop.
Human-in-the-loop (Chapter 19)
A system design principle requiring that a human decision-maker retains meaningful oversight of and authority over an automated system's critical decisions. Human-in-the-loop designs aim to prevent purely algorithmic decisions in high-stakes contexts, though their effectiveness depends on whether the human has the time, information, and authority to understand and override the system's outputs. See also: autonomous system, accountability, automation.

I

Impossibility theorem (of fairness) (Chapter 15)
The mathematical demonstration that certain desirable fairness criteria, such as demographic parity, equalized odds, and calibration, cannot all be simultaneously satisfied except under unrealistic conditions (such as equal base rates across groups). The impossibility theorem forces explicit normative choices about which aspects of fairness to prioritize. See also: demographic parity, equalized odds, calibration, fairness.
Incident response (Chapter 30)
A structured process for detecting, containing, investigating, and recovering from a data breach or other security incident. An effective incident response plan includes defined roles, communication protocols, forensic procedures, and post-incident review. See also: data breach, breach notification, crisis communication.
Indigenous data sovereignty (Chapters 3, 32, 37)
The right of indigenous peoples to govern the collection, ownership, and application of data about their communities, lands, and resources. Indigenous data sovereignty asserts that indigenous communities are the primary stakeholders in data about themselves and challenges external appropriation of indigenous knowledge. See also: data sovereignty, data colonialism, CARE Principles.
Information asymmetry (Chapters 3, 5, 11)
A condition in which one party to a relationship possesses significantly more information than the other, creating an imbalance of power. In data contexts, information asymmetry describes the gap between what organizations know about individuals and what individuals know about how their data is collected, used, and shared. See also: power/knowledge, transparency, data subject.
Informational norms (Chapter 7)
In Helen Nissenbaum's theory of contextual integrity, the socially shared expectations governing the flow of personal information within a given context, including who is entitled to share information, about whom, with whom, and under what conditions. See also: contextual integrity, informational privacy.
Informational privacy (Chapter 7)
The right or interest of individuals in controlling how information about them is collected, used, and disseminated. Informational privacy is distinguished from spatial privacy (control over physical spaces) and decisional privacy (freedom from interference in personal choices). See also: right to privacy, contextual integrity, informational self-determination.
Informational self-determination (Chapter 7)
A concept originating in German constitutional law (the 1983 Census Decision) establishing the right of individuals to determine the disclosure and use of their personal data. Informational self-determination has influenced European data protection frameworks. See also: informational privacy, GDPR, data subject.
Informed consent (Chapters 6, 9)
A standard requiring that individuals receive clear, complete, and comprehensible information about data practices before agreeing to them, and that their agreement is given freely and without coercion. Informed consent is both an ethical principle (derived from medical ethics and research ethics) and a legal requirement under many data protection laws. See also: consent, consent fatigue, dark patterns.
Interpretability (Chapter 16)
The degree to which a human can understand the cause of a model's decision or prediction. Interpretability is closely related to, but distinct from, explainability: an interpretable model is inherently understandable (e.g., a decision tree), whereas explainability may be achieved through post-hoc methods applied to opaque models. See also: explainability, black box, transparency.
Intersectionality (Chapter 32)
A concept developed by legal scholar Kimberlé Crenshaw describing how multiple dimensions of identity, such as race, gender, class, and disability, interact to produce distinct forms of disadvantage and privilege. Intersectional analysis is essential for understanding how data systems can compound inequalities affecting people at the intersection of multiple marginalized identities. See also: data feminism, data justice, equity.
IoT (Internet of Things) (Chapters 1, 38)
A network of interconnected physical devices, sensors, and objects that collect and exchange data. IoT devices range from smart home appliances to industrial sensors, and their proliferation generates vast quantities of data, raising privacy, security, and governance challenges. See also: datafication, data exhaust, digital twin, emerging technologies.

J

Justice theory (Chapter 6)
A body of ethical thought concerned with the fair distribution of benefits, burdens, opportunities, and power in society. In data ethics, justice theory, particularly John Rawls's theory of justice as fairness, informs analyses of how data systems distribute advantages and disadvantages. See also: veil of ignorance, equity, data justice, fairness.
Just transition (Chapter 33)
A framework for ensuring that the shift toward automated and data-driven economic systems does not disproportionately harm workers and communities. Originally developed in environmental policy, the concept has been applied to technology-driven labor displacement to advocate for retraining, social protection, and worker participation in transition planning. See also: automation, gig economy, algorithmic management.

K

k-anonymity (Chapter 10)
A privacy property requiring that every record in a published dataset is indistinguishable from at least k-1 other records with respect to certain identifying attributes (quasi-identifiers). k-anonymity is a foundational but limited de-identification technique, as it is vulnerable to certain attacks (e.g., homogeneity and background knowledge attacks). See also: differential privacy, anonymization, de-identification, privacy-enhancing technologies.
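A minimal sketch of checking the property, assuming a pandas DataFrame and an illustrative choice of quasi-identifier columns:

    # Illustrative k-anonymity check: every quasi-identifier combination must
    # appear in at least k records.
    import pandas as pd

    def is_k_anonymous(df: pd.DataFrame, quasi_identifiers: list, k: int) -> bool:
        return bool(df.groupby(quasi_identifiers).size().min() >= k)

    df = pd.DataFrame({
        "zip": ["02138", "02138", "02139", "02139"],
        "age_band": ["30-40", "30-40", "30-40", "30-40"],
        "diagnosis": ["flu", "cold", "flu", "flu"],   # sensitive attribute, not a quasi-identifier
    })
    print(is_k_anonymous(df, ["zip", "age_band"], k=2))   # True: each combination appears twice
    # Note: the "02139" group is 2-anonymous yet both of its records share the same
    # diagnosis, which is the homogeneity attack mentioned above.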

L

Large language model (LLM) (Chapter 18)
A type of generative AI system trained on massive text corpora that can generate, summarize, translate, and reason about natural language text. LLMs raise ethical issues related to training data provenance, copyright, environmental cost, bias, hallucination, and labor displacement. See also: generative AI, hallucination, deepfake.
Liability (Chapters 17, 19, 25)
Legal responsibility for harms caused by a product, service, or action. In the context of AI and data systems, liability questions arise around who (developer, deployer, operator, or end user) bears legal responsibility when an automated system causes injury, discrimination, or financial loss. See also: accountability, accountability gap, distributed responsibility.
LIME (Local Interpretable Model-Agnostic Explanations) (Chapter 16)
A technique for explaining the predictions of any machine learning model by approximating it locally with a simpler, interpretable model. LIME generates feature-importance explanations for individual predictions. See also: SHAP, explainability, black box, transparency.
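A compressed sketch of the local-surrogate idea that LIME is built on (not the lime package's actual API), under assumed inputs: sample points near an instance, weight them by proximity, and fit a simple weighted linear model to the black-box predictions.

    # Sketch of the local-surrogate idea behind LIME (not the lime library's API).
    import numpy as np
    from sklearn.linear_model import Ridge

    def local_explanation(predict_fn, x, n_samples=500, scale=0.1, seed=0):
        """Return per-feature weights of a linear surrogate fitted around instance x.

        predict_fn maps an (n, d) array to a 1-D array of scores (e.g., probabilities);
        x is a 1-D feature vector for the instance being explained.
        """
        rng = np.random.default_rng(seed)
        perturbed = x + rng.normal(0.0, scale, size=(n_samples, x.shape[0]))  # local neighbourhood
        preds = predict_fn(perturbed)                                         # black-box outputs
        distances = np.linalg.norm(perturbed - x, axis=1)
        weights = np.exp(-(distances ** 2) / (2 * scale ** 2))               # nearer samples count more
        surrogate = Ridge(alpha=1.0).fit(perturbed, preds, sample_weight=weights)
        return surrogate.coef_                                                # feature-importance explanation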

M

Machine learning (Chapters 13, 14)
A branch of artificial intelligence in which systems improve their performance on a task through exposure to data, without being explicitly programmed with rules. Machine learning encompasses supervised learning, unsupervised learning, and reinforcement learning, and is the foundation of most contemporary algorithmic decision-making systems. See also: algorithm, algorithmic bias, training data.
Mass surveillance (Chapter 36)
The large-scale monitoring of entire populations or significant segments of a population, typically by governments, through communications interception, metadata collection, CCTV networks, or other means. The Snowden disclosures of 2013 revealed the extent of mass surveillance programs conducted by the NSA and its Five Eyes partners. See also: FISA, Five Eyes, dataveillance, panopticon.
Measurement bias (Chapter 14)
Bias that arises when the variables chosen to measure a concept of interest are flawed proxies, systematically misrepresenting the phenomenon for certain groups. For example, using arrest rates as a proxy for crime rates introduces measurement bias because policing patterns are themselves biased. See also: historical bias, representation bias, algorithmic bias.
Metadata (Chapter 1)
Data about data: information describing the characteristics, context, and structure of a dataset, such as its creation date, author, format, schema, and provenance. Metadata is also used to describe the properties of individual data items (e.g., the time and location of a photograph). Although often perceived as innocuous, metadata can reveal sensitive information about individuals' behaviors and associations. See also: data, data catalog, data lineage.
Misinformation (Chapter 31)
False or inaccurate information spread without the intent to deceive, often through carelessness, misunderstanding, or algorithmic amplification. Misinformation is distinguished from disinformation, which involves deliberate deception. See also: disinformation, platform governance, content moderation.
Model card (Chapter 29)
A structured documentation framework, proposed by Margaret Mitchell and colleagues, that accompanies a machine learning model and reports its intended use, performance characteristics, evaluation metrics, ethical considerations, and known limitations. Model cards promote transparency and informed deployment decisions. See also: datasheet for datasets, responsible AI, explainability.
Model drift (Chapter 29)
The degradation of a machine learning model's performance over time as the statistical properties of the real-world data it encounters diverge from the data on which it was trained. Model drift can introduce or amplify biases and reduce accuracy, making ongoing monitoring essential. See also: responsible AI, data quality, algorithmic audit.
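One common monitoring pattern, sketched here with assumed inputs, compares a feature's training-time distribution with its current distribution using a two-sample Kolmogorov-Smirnov test:

    # Illustrative drift check: compare a feature's training distribution with live data.
    from scipy.stats import ks_2samp

    def drifted(train_values, live_values, alpha: float = 0.01) -> bool:
        """Flag drift when a two-sample KS test rejects 'same distribution' at level alpha."""
        result = ks_2samp(train_values, live_values)
        return result.pvalue < alpha
    # In practice a check like this would run on a schedule for each monitored feature,
    # with a positive flag triggering review or retraining rather than automatic change.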
Moral agency (Chapter 19)
The capacity to make moral judgments and be held morally responsible for one's actions. Whether AI systems possess or can possess moral agency is a contested question in AI ethics, with implications for the assignment of responsibility when autonomous systems cause harm. See also: autonomous system, accountability, human-in-the-loop.
Moral pluralism (Chapter 6)
The view that there are multiple valid ethical frameworks and that no single theory can resolve all moral questions. In data ethics, moral pluralism encourages practitioners to draw on utilitarianism, deontology, virtue ethics, care ethics, and justice theory as complementary rather than competing lenses. See also: utilitarianism, deontology, virtue ethics, care ethics, justice theory.

N

National security (Chapter 36)
The protection of a nation's sovereignty, territorial integrity, and citizens from threats, which in the digital age increasingly involves surveillance, signals intelligence, cybersecurity, and the regulation of encryption. Tensions between national security imperatives and civil liberties are a central theme in data governance. See also: mass surveillance, FISA, Five Eyes, encryption.
Notice and consent (Chapter 9)
The prevailing model for privacy governance in which organizations inform individuals about data practices (notice) and obtain their agreement (consent) before proceeding. Critics argue that notice and consent places an unreasonable burden on individuals and provides inadequate protection in practice. See also: consent, informed consent, consent fatigue, privacy by design.

O

Opt-in / Opt-out (Chapters 9, 20)
Two models for obtaining consent for data collection. Under opt-in, data is not collected unless the individual affirmatively agrees. Under opt-out, data is collected by default, and the individual must take action to prevent collection. GDPR generally requires opt-in consent, while the US approach has historically favored opt-out. See also: consent, GDPR, CCPA.

P

Panopticon (Chapters 5, 8)
An architectural design by Jeremy Bentham for an institutional building (originally a prison) in which a central observer can see all inmates without them knowing whether they are being watched. Michel Foucault used the panopticon as a metaphor for modern disciplinary power, arguing that the internalization of potential surveillance shapes behavior. See also: panopticism, surveillance, dataveillance, disciplinary power.
Panopticism (Chapters 5, 8)
The social dynamic, described by Michel Foucault, in which the possibility of being observed, even without certainty, leads individuals to regulate their own behavior. In data ethics, panopticism describes how pervasive data collection and monitoring create a chilling effect on expression and action. See also: panopticon, disciplinary power, surveillance, dataveillance.
Participatory design (Chapter 39)
A design methodology that actively involves the people who will be affected by a system in its design, development, and evaluation. In data governance, participatory design seeks to democratize decision-making about data systems by including data subjects, communities, and other stakeholders. See also: citizen assembly, data cooperative, speculative design.
Persuasive design (Chapter 4)
The intentional use of design techniques to influence user behavior, attitudes, or decisions. In digital platforms, persuasive design encompasses features such as infinite scroll, variable-ratio reinforcement schedules (notifications), and social proof mechanisms. When used manipulatively, it overlaps with dark patterns. See also: dark patterns, engagement optimization, attention economy.
Phronesis (Chapter 6)
Practical wisdom; the Aristotelian virtue of discerning the right course of action in particular circumstances through experience, judgment, and moral perception. In data ethics, phronesis is invoked as the capacity that enables practitioners to navigate ethical dilemmas that cannot be resolved by rules alone. See also: virtue ethics, moral pluralism.
Platform governance (Chapters 31, 35)
The rules, norms, architectures, and enforcement mechanisms through which digital platforms regulate the conduct of their users and the distribution of content. Platform governance encompasses content moderation policies, algorithmic ranking decisions, advertising standards, and API access policies. See also: content moderation, Section 230, DSA, misinformation.
Power/knowledge (Chapter 5)
A concept from Michel Foucault describing the inseparable relationship between power and knowledge: those who control the production of knowledge shape what is considered true, and those who define truth exercise power. In data ethics, power/knowledge illuminates how data collection, classification, and analysis are exercises of power. See also: biopower, disciplinary power, panopticism, information asymmetry.
Practitioner's oath (Chapter 40)
A formal commitment, analogous to the Hippocratic oath in medicine, articulating the ethical principles and responsibilities that data professionals pledge to uphold. The concept appears in the capstone chapter as a tool for integrating the book's themes into professional identity. See also: data ethics program, responsible AI.
Prebunking (Chapter 31)
A proactive strategy for combating misinformation by inoculating individuals against manipulation techniques before they encounter false content. Prebunking is informed by inoculation theory from social psychology and typically involves brief educational interventions. See also: misinformation, disinformation, platform governance.
Precautionary principle (Chapter 38)
The principle that when an activity raises threats to the environment or human health or welfare, precautionary measures should be taken even if some cause-and-effect relationships are not fully established scientifically. Applied to technology governance, the precautionary principle argues for proactive regulation of potentially harmful technologies. See also: anticipatory governance, Collingridge dilemma, risk-based regulation.
Predictive policing (Chapters 8, 14)
The use of data analysis and algorithmic models to predict where crimes will occur or who is likely to commit crimes. Predictive policing systems have been widely criticized for perpetuating racial bias, as they often rely on historically biased arrest and incident data. See also: feedback loop, algorithmic bias, disparate impact, COMPAS.
Privacy (Chapters 7, 8, 9, 10)
A multifaceted concept encompassing the right to control access to one's person, spaces, and information. This text treats privacy as both an individual right and a social good, drawing on multiple theoretical traditions including Warren and Brandeis's "right to be let alone," Westin's informational self-determination, and Nissenbaum's contextual integrity. See also: informational privacy, right to privacy, contextual integrity.
Privacy by design (Chapter 10)
A framework, developed by Ann Cavoukian, advocating for the integration of privacy protections into the design, operation, and management of data systems from the outset, rather than adding them as an afterthought. Privacy by design encompasses seven foundational principles. See also: data minimization, privacy-enhancing technologies, GDPR.
Privacy-enhancing technologies (PETs) (Chapter 10)
Technical tools and methods that protect personal privacy in data processing, including encryption, differential privacy, homomorphic encryption, secure multiparty computation, and federated learning. PETs enable data analysis while minimizing the exposure of individual-level data. See also: differential privacy, k-anonymity, anonymization, encryption.
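As a simplified illustration of one widely cited PET, the sketch below applies the Laplace mechanism from differential privacy to a count query; the count, epsilon value, and function name are invented for the example.

    import numpy as np

    def noisy_count(true_count: int, epsilon: float, sensitivity: float = 1.0) -> float:
        # Adding or removing one person changes a count by at most 1 (the sensitivity),
        # so Laplace noise with scale sensitivity/epsilon masks any single individual's presence.
        rng = np.random.default_rng()
        return true_count + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

    # Example: release the number of patients with a given diagnosis.
    print(noisy_count(true_count=42, epsilon=0.5))  # e.g. 44.7; smaller epsilon means more noise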
Privacy impact assessment (PIA) (Chapter 28)
A systematic evaluation of how a proposed project, system, or policy may affect the privacy of individuals. PIAs identify privacy risks and recommend measures to mitigate them. While related to DPIAs under GDPR, PIAs are a broader concept not limited to any single jurisdiction. See also: DPIA, algorithmic impact assessment, privacy by design.
Privacy paradox (Chapter 11)
The observed discrepancy between individuals' stated privacy concerns and their actual privacy-related behavior. Although people express high levels of privacy concern in surveys, they frequently share personal information with minimal precaution. Explanations include rational choice under uncertainty, bounded rationality, and resignation. See also: consent fatigue, behavioral surplus, information asymmetry.
Pseudonymization (Chapter 10)
The processing of personal data so that it can no longer be attributed to a specific data subject without the use of additional information, which is kept separately and subject to technical and organizational safeguards. Unlike anonymization, pseudonymization is reversible. Under GDPR, pseudonymized data remains personal data. See also: anonymization, de-identification, data minimization.
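One common implementation, sketched below as an illustration rather than a prescribed method, replaces direct identifiers with a keyed hash; the key is the "additional information" that must be held separately.

    import hmac, hashlib

    # The key must be stored apart from the pseudonymized data, under its own safeguards.
    SECRET_KEY = b"stored-elsewhere-under-separate-controls"  # placeholder value

    def pseudonymize(identifier: str) -> str:
        # Derive a stable pseudonym from a direct identifier using HMAC-SHA256.
        return hmac.new(SECRET_KEY, identifier.encode("utf-8"), hashlib.sha256).hexdigest()

    # The same person always maps to the same pseudonym, so records remain linkable;
    # the controller, holding the key (or a key-protected lookup table), can re-link
    # records to known identifiers, while outsiders without the key cannot.
    print(pseudonymize("jane.doe@example.com"))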
Purpose limitation (Chapters 10, 20)
A data protection principle requiring that personal data be collected for specified, explicit, and legitimate purposes and not further processed in ways incompatible with those purposes. Purpose limitation is a foundational principle of GDPR. See also: data minimization, GDPR, consent.

Q

Quantified self (Chapter 1)
A movement and set of practices in which individuals use technology, such as fitness trackers, health monitors, and productivity apps, to systematically track and analyze data about their own lives. The quantified self illustrates the voluntary dimension of datafication. See also: datafication, data exhaust, IoT.
Quantum computing (Chapter 38)
A computing paradigm that uses quantum-mechanical phenomena (such as superposition and entanglement) to perform certain computations far faster than classical computers. Quantum computing raises anticipatory governance concerns, particularly regarding its potential to break current encryption standards. See also: encryption, anticipatory governance, emerging technologies.

R

Reasonable expectation of privacy (Chapter 7)
A legal standard, particularly influential in US Fourth Amendment jurisprudence, for determining whether a government intrusion constitutes a "search" requiring a warrant. The test, established in Katz v. United States (1967), asks whether an individual had a subjective expectation of privacy that society recognizes as objectively reasonable. See also: right to privacy, third-party doctrine, informational privacy.
Recommendation system (Chapter 13)
An algorithmic system that predicts and suggests items, content, or actions that a user is likely to be interested in, based on user data, item characteristics, and behavioral patterns. Recommendation systems shape information access and consumption on platforms ranging from streaming services to news aggregators. See also: algorithm, content moderation, engagement optimization, filter bubble.
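A minimal sketch of one classic approach, item-based collaborative filtering over an invented user-item rating matrix (production systems are far more elaborate):

    import numpy as np

    # Rows = users, columns = items; 0 means "not yet rated" (toy data).
    ratings = np.array([
        [5, 4, 0, 0],
        [4, 5, 1, 0],
        [1, 0, 5, 4],
        [0, 1, 4, 5],
    ], dtype=float)

    def cosine_sim(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

    def recommend(user_idx: int) -> int:
        # Score each unrated item by its similarity to items the user already liked.
        user = ratings[user_idx]
        scores = {}
        for item in range(ratings.shape[1]):
            if user[item] == 0:
                scores[item] = sum(cosine_sim(ratings[:, item], ratings[:, j]) * user[j]
                                   for j in range(ratings.shape[1]) if user[j] > 0)
        return max(scores, key=scores.get)

    print(recommend(0))  # user 0 liked items 0 and 1; the unrated items are ranked by column similarity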
Red-teaming (Chapter 29)
A practice in which a team deliberately assumes an adversarial posture to identify vulnerabilities, failure modes, and potential harms in a system before deployment. In responsible AI development, red-teaming involves probing a model for biased outputs, security vulnerabilities, and harmful capabilities. See also: responsible AI, algorithmic audit, model card.
Regulatory capture (Chapter 25)
A phenomenon in which a regulatory agency, created to act in the public interest, instead advances the commercial or political concerns of the industry it is supposed to regulate. Regulatory capture can occur through revolving-door hiring, lobbying, information asymmetry, and financial dependence. See also: enforcement, compliance, accountability gap.
Regulatory sandbox (Chapter 38)
A controlled environment in which innovative technologies, products, or business models can be tested under regulatory oversight with relaxed requirements, allowing regulators and innovators to learn about risks and benefits before full-scale regulation is applied. See also: anticipatory governance, precautionary principle, risk-based regulation.
Representation bias (Chapter 14)
Bias that occurs when training data fails to adequately represent certain populations, leading to algorithmic systems that perform poorly for underrepresented groups. Representation bias can result from sampling methods, data collection practices, or the historical exclusion of marginalized communities from datasets. See also: historical bias, measurement bias, algorithmic bias.
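A toy simulation (all numbers invented) of one mechanism behind this: when a group contributes far fewer examples, estimates for that group are noisier, so error for the underrepresented group is systematically higher even when the two groups are otherwise identical.

    import numpy as np

    rng = np.random.default_rng(0)
    n_majority, n_minority = 2000, 40   # same underlying distribution, very different sample sizes
    err_majority, err_minority = [], []

    for _ in range(500):
        sample_a = rng.normal(10.0, 5.0, n_majority)   # both groups share the true mean 10.0
        sample_b = rng.normal(10.0, 5.0, n_minority)
        # A "model" that simply predicts each group's sample mean.
        err_majority.append(abs(sample_a.mean() - 10.0))
        err_minority.append(abs(sample_b.mean() - 10.0))

    print(f"mean error, well-represented group:  {np.mean(err_majority):.3f}")
    print(f"mean error, underrepresented group: {np.mean(err_minority):.3f}")  # noticeably larger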
Responsible AI (Chapter 29)
An umbrella term for the principles, practices, and governance structures aimed at ensuring that artificial intelligence systems are developed and deployed in ways that are ethical, transparent, accountable, fair, and aligned with human values. See also: model card, datasheet for datasets, red-teaming, fairness, data ethics program.
Right to explanation (Chapters 16, 20)
The principle, partially codified under GDPR's Articles 13, 14, and 22, that individuals subject to automated decision-making have the right to meaningful information about the logic involved. The scope and enforceability of the right to explanation remain debated. See also: explainability, transparency, GDPR, black box.
Right to privacy (Chapter 7)
The claim that individuals possess a fundamental right to control access to their personal information, physical spaces, and decisional autonomy. The right to privacy has constitutional, statutory, and human rights dimensions and has evolved significantly in response to digital technologies. See also: informational privacy, contextual integrity, reasonable expectation of privacy.
Risk-based regulation (Chapter 21)
A regulatory approach that classifies technologies, activities, or applications by their potential for harm and applies graduated levels of oversight accordingly. The EU AI Act is the paradigmatic example, distinguishing among unacceptable, high, limited, and minimal risk AI applications. See also: EU AI Act, conformity assessment, precautionary principle.

S

Schrems I and II (Chapter 23)
Two landmark decisions by the Court of Justice of the European Union (CJEU) that invalidated, respectively, the US-EU Safe Harbor agreement (2015, Schrems I) and the Privacy Shield framework (2020, Schrems II) on the grounds that US surveillance laws did not provide adequate protection for EU citizens' data. Named after Austrian privacy activist Max Schrems. See also: adequacy decision, standard contractual clauses, cross-border data flows.
Section 230 (Chapter 31)
Section 230 of the US Communications Decency Act (1996), which provides legal immunity to online platforms for third-party content posted by users and protects platforms' good-faith content moderation decisions. Section 230 is widely regarded as foundational to the development of the modern internet and is subject to ongoing reform debates. See also: platform governance, content moderation, DSA.
Self-fulfilling prophecy (Chapter 14)
A dynamic in which a prediction or belief, once acted upon, causes the predicted outcome to occur. In algorithmic contexts, a model's predictions can shape institutional actions (e.g., increased policing) that generate data confirming the original prediction, regardless of its initial accuracy. See also: feedback loop, predictive policing.
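A toy simulation of that dynamic (all parameters invented): two districts have identical true crime rates, but patrols follow the "hot spot" predicted from recorded incidents, and more patrol presence means more incidents get recorded, so a small quirk in the initial data confirms itself year after year.

    import numpy as np

    rng = np.random.default_rng(1)
    true_rate = np.array([100.0, 100.0])   # districts A and B have identical true crime
    recorded = np.array([52.0, 48.0])      # a small initial imbalance in the recorded data

    for year in range(10):
        hot_spot = int(np.argmax(recorded))           # the model "predicts" the hot spot
        patrols = np.array([5.0, 5.0])
        patrols[hot_spot] += 10.0                     # extra patrols follow the prediction
        detection = np.clip(0.04 * patrols, 0.0, 1.0)
        recorded = rng.poisson(true_rate * detection).astype(float)  # more patrols, more recorded incidents

    print(recorded)  # district A's recorded count is now roughly triple B's, despite equal true rates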
SHAP (SHapley Additive exPlanations) (Chapter 16)
An explainability technique based on Shapley values from cooperative game theory that assigns each feature a contribution value for a particular prediction, providing consistent and locally accurate explanations. See also: LIME, explainability, black box, transparency.
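A worked toy example of the underlying idea (a hand-rolled sketch, not the shap library, with an invented two-feature model): average each feature's marginal contribution over all orderings, treating "absent" features as set to baseline values; the contributions then sum exactly to the prediction minus the baseline prediction.

    from itertools import permutations

    def model(income, debt):
        # Invented toy scoring model.
        return 0.3 * income - 0.5 * debt

    baseline = {"income": 50.0, "debt": 20.0}   # e.g. dataset averages (invented)
    instance = {"income": 80.0, "debt": 10.0}   # the prediction being explained

    def value(present):
        # Model output with "absent" features replaced by their baseline values.
        x = {f: (instance[f] if f in present else baseline[f]) for f in baseline}
        return model(**x)

    contributions = {f: 0.0 for f in instance}
    orderings = list(permutations(instance))
    for order in orderings:
        present = set()
        for feature in order:
            before = value(present)
            present.add(feature)
            contributions[feature] += (value(present) - before) / len(orderings)

    print(contributions)                        # {'income': 9.0, 'debt': 5.0}
    print(value(set(instance)) - value(set()))  # 14.0: contributions sum to prediction minus baseline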
Smart city (Chapters 1, 8, 38)
An urban area that integrates digital technologies, IoT sensors, and data analytics into the management of city infrastructure, services, and governance. Smart city initiatives raise questions about surveillance, consent, democratic accountability, and the equitable distribution of technological benefits. See also: IoT, dataveillance, digital sovereignty, Eli's Detroit thread.
Sousveillance (Chapter 5)
The practice of surveillance from below, in which individuals or communities use recording technologies to monitor those in positions of power, such as police officers or government officials. Sousveillance inverts the panoptic gaze and has been theorized as a tool of resistance and accountability. See also: surveillance, dataveillance, panopticon.
Speculative design (Chapter 39)
A design practice that creates scenarios, artifacts, and experiences depicting possible futures in order to provoke critical reflection and public deliberation about the social implications of emerging technologies. In data governance, speculative design helps stakeholders imagine and evaluate alternative governance configurations. See also: anticipatory governance, participatory design, citizen assembly.
Splinternet (Chapter 23)
A term describing the potential fragmentation of the global internet into separate, nationally or regionally controlled networks with different rules, content, and access policies. The splinternet reflects tensions between digital sovereignty and the vision of a unified, open internet. See also: data localization, digital sovereignty, cross-border data flows.
Standard contractual clauses (SCCs) (Chapter 23)
Standardized contractual terms, pre-approved by the European Commission, that organizations incorporate into data-transfer agreements to ensure adequate data protection safeguards when transferring personal data outside the EU. SCCs gained increased importance following the invalidation of Privacy Shield in Schrems II. See also: adequacy decision, GDPR, cross-border data flows.
Statistical state (Chapter 2)
A concept describing the modern state's reliance on statistics, measurement, and data collection as tools of governance, policy-making, and population management. The statistical state emerged alongside the development of census-taking, public health records, and national statistical agencies. See also: census, biopower, power/knowledge.
Structured data (Chapter 1)
Data that is organized in a predefined format, such as rows and columns in a relational database or fields in a spreadsheet. Structured data is easily searchable and analyzable using standard computational tools. See also: unstructured data, data, database.
Surveillance (Chapters 8, 36)
The systematic monitoring and observation of individuals, groups, or populations by authorities, organizations, or other parties. In the digital age, surveillance extends beyond physical observation to encompass the collection and analysis of digital data, communications metadata, and behavioral patterns. See also: dataveillance, panopticon, mass surveillance, sousveillance.
Surveillance capitalism (Chapters 4, 8)
A term coined by Shoshana Zuboff describing an economic logic in which human experience is unilaterally claimed as raw material for translation into behavioral data, which is then used to create prediction products traded in behavioral futures markets. Surveillance capitalism represents a new form of economic power that profits from the prediction and modification of human behavior. See also: behavioral surplus, attention economy, data exhaust.
Synthetic media (Chapter 18)
Media content, including images, video, audio, and text, generated or significantly altered by artificial intelligence. Synthetic media encompasses deepfakes, AI-generated art, and machine-written text, and raises questions about authenticity, consent, and trust. See also: deepfake, generative AI, watermarking.

T

Third-party doctrine (Chapter 7)
A legal principle in US jurisprudence holding that individuals have no reasonable expectation of privacy in information voluntarily disclosed to third parties, such as banks (financial records) or telecommunications providers (call metadata). The third-party doctrine has been partially narrowed by the Supreme Court's Carpenter v. United States (2018) decision. See also: reasonable expectation of privacy, metadata, dataveillance.
Training data (Chapters 14, 18)
The dataset used to train a machine learning model. The composition, quality, and representativeness of training data fundamentally shape the model's behavior, including its biases, capabilities, and limitations. See also: historical bias, representation bias, data quality, machine learning.
Transparency (Chapters 16, 17, 22)
The quality of openness and visibility in decision-making processes, algorithmic systems, and organizational practices. Transparency in data ethics encompasses making data practices understandable to affected parties, disclosing how algorithmic decisions are made, and enabling external scrutiny. Meaningful transparency goes beyond disclosure to ensure comprehensibility. See also: explainability, black box, accountability, algorithmic audit.
Transparency theater (Chapter 16)
The performance of transparency without its substance, such as publishing lengthy, jargon-filled disclosures that formally satisfy a disclosure requirement but are not genuinely comprehensible to the audiences that need them. See also: transparency, explainability, accountability.

U

Unstructured data (Chapter 1)
Data that does not conform to a predefined data model or organizational schema, such as free-text documents, images, audio files, social media posts, and video. Unstructured data constitutes the majority of data generated globally and poses distinct challenges for analysis, governance, and privacy protection. See also: structured data, data, Big Data.
Utilitarianism (Chapter 6)
An ethical framework holding that the morally right action is the one that produces the greatest overall well-being (or "utility") for the greatest number of people. In data ethics, utilitarian reasoning weighs the aggregate benefits of data practices against their harms. Critics note that utilitarianism can justify significant harm to minorities if the majority benefits sufficiently. See also: deontology, virtue ethics, care ethics, moral pluralism.

V

Veil of ignorance (Chapter 6)
A thought experiment devised by John Rawls in which decision-makers choose principles of justice from behind a "veil" that prevents them from knowing their own social position, abilities, or group membership. The veil of ignorance is used in data ethics to test whether data governance arrangements would be acceptable to someone who does not know whether they would benefit or be disadvantaged by them. See also: justice theory, fairness, equity.
Virtue ethics (Chapter 6)
An ethical tradition, rooted in Aristotle, that focuses on the character and moral virtues of the agent rather than on rules or consequences. Virtue ethics asks not "What should I do?" but "What kind of person should I be?" In data ethics, virtue ethics emphasizes cultivating professional dispositions such as honesty, courage, justice, and practical wisdom (phronesis). See also: phronesis, deontology, utilitarianism, care ethics, moral pluralism.

W

Watermarking (digital) (Chapter 18)
A technique for embedding imperceptible identifying information into digital content, including AI-generated media, to enable provenance tracking, authentication, and detection of synthetic content. Watermarking is one of several approaches to maintaining trust in the age of generative AI. See also: deepfake, synthetic media, generative AI.
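A toy illustration of the general idea, hiding a short bit pattern in the least significant bits of an image array; production watermarking schemes for AI-generated media are far more robust and are typically applied at generation time, so this sketch only conveys the principle.

    import numpy as np

    def embed(pixels: np.ndarray, bits: np.ndarray) -> np.ndarray:
        # Overwrite the least significant bit of the first len(bits) pixels.
        marked = pixels.copy().ravel()
        marked[:bits.size] = (marked[:bits.size] & 0xFE) | bits
        return marked.reshape(pixels.shape)

    def extract(pixels: np.ndarray, n_bits: int) -> np.ndarray:
        return pixels.ravel()[:n_bits] & 1

    rng = np.random.default_rng(0)
    image = rng.integers(0, 256, size=(8, 8), dtype=np.uint8)   # stand-in for a real image
    watermark = np.array([1, 0, 1, 1, 0, 0, 1, 0], dtype=np.uint8)

    marked = embed(image, watermark)
    assert np.array_equal(extract(marked, watermark.size), watermark)   # recoverable by a verifier
    print(np.abs(marked.astype(int) - image.astype(int)).max())         # each pixel changes by at most 1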
Worker surveillance (Chapter 33)
The monitoring of employees' activities, productivity, communications, location, and even biometric data by employers, often facilitated by digital technologies. Worker surveillance has intensified with remote work and platform-based labor, raising concerns about dignity, autonomy, and the boundary between work and private life. See also: algorithmic management, gig economy, dataveillance.

X

XAI (Explainable AI) (Chapter 16)
A subfield of artificial intelligence focused on developing techniques and methods that make AI systems' decisions understandable to humans. XAI encompasses interpretable model design, post-hoc explanation methods (such as LIME and SHAP), and evaluation of explanation quality. See also: explainability, LIME, SHAP, transparency, black box.

This glossary is intended as a reference companion to the main text. For fuller treatment of any term, consult the chapter(s) indicated. For the relationships among key concepts, see the "See also" cross-references and the thematic diagrams in the relevant chapter introductions.