Further Reading: The Data All Around Us

The sources below provide deeper engagement with the themes introduced in Chapter 1. They are organized by topic and include a mix of foundational texts, empirical research, accessible popular works, and policy reports. Annotations describe what each source covers and why it is relevant to the chapter's core questions.

Foundational Texts on Data and Society

Kitchin, Rob. The Data Revolution: Big Data, Open Data, Data Infrastructures and Their Consequences. London: SAGE Publications, 2014. One of the most comprehensive introductions to the social dimensions of data. Kitchin examines what data is, how it is generated, and the political and economic structures that shape its use. Essential for understanding the conceptual vocabulary introduced in this chapter, including the distinction between data types and the significance of data infrastructures.

Gitelman, Lisa (ed.). "Raw Data" Is an Oxymoron. Cambridge, MA: MIT Press, 2013. This edited collection challenges the widespread assumption that data is raw, objective, or pre-interpretive. Contributors from history, media studies, and information science demonstrate that data is always shaped by the instruments, categories, and intentions behind its collection. A powerful corrective for anyone who treats data as simply "given."

Floridi, Luciano. The Ethics of Information. Oxford: Oxford University Press, 2013. A philosophical treatment of what information is, how it relates to knowledge, and what ethical obligations arise from its creation and circulation. Floridi's framework for informational privacy and the moral status of data entities provides conceptual grounding for the governance questions raised throughout this textbook.

Datafication and the Quantified World

Mayer-Schönberger, Viktor, and Kenneth Cukier. Big Data: A Revolution That Will Transform How We Live, Work, and Think. Boston: Houghton Mifflin Harcourt, 2013. An accessible and widely cited introduction to the big data phenomenon. The authors argue that the shift from sampling to comprehensive data collection changes not just the scale but the nature of knowledge production. Useful for understanding the promises and pressures of datafication, though readers should note the book's generally optimistic framing.

Van Dijck, José. "Datafication, Dataism, and Dataveillance: Big Data between Scientific Paradigm and Ideology." Surveillance & Society 12, no. 2 (2014): 197–208. Van Dijck introduces the concept of "dataism," a widespread belief in the objective quantification of all human behavior, and critiques its ideological foundations. This article is essential for understanding how datafication is not just a technical process but a cultural and political project with significant implications for power and governance.

Lupton, Deborah. The Quantified Self. Cambridge: Polity Press, 2016. A sociological analysis of self-tracking practices, from fitness wearables to mood-logging apps. Lupton explores how quantified self technologies reshape understandings of the body, health, and personal responsibility. Directly relevant to the chapter's discussion of how individuals generate — and sometimes surrender control over — their own data.

Cheney-Lippold, John. We Are Data: Algorithms and the Making of Our Digital Selves. New York: New York University Press, 2017. Cheney-Lippold examines how algorithmic systems construct digital identities from behavioral data, often in ways that diverge from how individuals understand themselves. The book is particularly strong on the concept of data exhaust and the gap between user awareness and institutional data practices.

Metadata, Privacy, and Identification

Mayer, Jonathan, Patrick Mutchler, and John C. Mitchell. "Evaluating the Privacy Properties of Telephone Metadata." Proceedings of the National Academy of Sciences 113, no. 20 (2016): 5536–5541. This empirical study demonstrates that telephone metadata alone — without access to call content — can be used to infer sensitive personal information, including medical conditions, firearm ownership, and religious affiliation. A landmark paper for understanding why the chapter's claim that "metadata can be more revealing than data itself" is not hyperbole.

Narayanan, Arvind, and Vitaly Shmatikov. "Robust De-anonymization of Large Sparse Datasets." IEEE Symposium on Security and Privacy (2008): 111–125. The paper that demonstrated how supposedly anonymized movie-rating records from the Netflix Prize dataset could be re-identified by cross-referencing them with publicly available IMDb ratings. Narayanan and Shmatikov's work is foundational to the chapter's discussion of the instability of the personal/non-personal data boundary and the limits of anonymization as a privacy strategy.

Solove, Daniel J. Understanding Privacy. Cambridge, MA: Harvard University Press, 2008. Solove argues that privacy is not a single concept but a family of related concerns — information collection, processing, dissemination, and invasion. His taxonomy is invaluable for unpacking why different data types and different lifecycle stages raise different kinds of privacy risks, a theme that runs through the entire chapter.

Ohm, Paul. "Broken Promises of Privacy: Responding to the Surprising Failure of Anonymization." UCLA Law Review 57 (2010): 1701–1777. A legal analysis of why anonymization techniques consistently fail to protect individuals from re-identification. Ohm argues that the legal and regulatory frameworks built on the assumption of effective anonymization are fundamentally flawed. Directly relevant to the Key Debates section on whether anonymization can ever be permanent.

Data Governance: Introductions and Frameworks

Zuboff, Shoshana. The Age of Surveillance Capitalism: The Fight for a Human Future at the New Frontier of Power. New York: PublicAffairs, 2019. Zuboff's influential and ambitious book argues that major technology companies have created a new economic logic — surveillance capitalism — that claims human experience as free raw material for commercial extraction. While some scholars contest the scope of her claims, the book is indispensable for understanding why data governance has become one of the defining political questions of the twenty-first century.

Véliz, Carissa. Privacy Is Power: Why and How You Should Take Back Control of Your Data. London: Bantam Press, 2020. A philosopher's argument that privacy is not a personal preference but a form of political power, and that its erosion undermines democracy, equality, and autonomy. Véliz writes with clarity and urgency, making complex governance debates accessible to a broad audience. An excellent companion to the chapter's closing argument that data governance is not optional.

European Parliament and Council of the European Union. "Regulation (EU) 2016/679 (General Data Protection Regulation)." Official Journal of the European Union L 119 (2016): 1–88. The full text of the GDPR is arguably the single most important reference document for understanding modern data governance. Its definitions of personal data, special (sensitive) categories of personal data, data processing principles, and individual rights (access, rectification, erasure, portability) directly correspond to the concepts introduced in this chapter. The full text is freely available online through EUR-Lex.

World Economic Forum. "Personal Data: The Emergence of a New Asset Class." Geneva: World Economic Forum, 2011. An early and influential policy report framing personal data as an economic asset comparable to oil or gold. While subsequent scholarship has complicated this metaphor — data is non-rivalrous, context-dependent, and difficult to price — the report remains useful for understanding the economic reasoning that drives both data collection and resistance to governance.

These readings are starting points, not endpoints. As subsequent chapters introduce new themes — historical power, algorithmic bias, consent, surveillance, and collective governance — the further reading sections will build on and extend the foundations laid here.