Chapter 2: Key Takeaways

A Brief History of Data and Society


Summary Card

  1. Data collection is as old as governance itself. From Sumerian clay tablets (c. 3800 BCE) to the Roman census to the Domesday Book, the impulse to count and classify populations has always been bound up with taxation, military power, and social control. Counting has never been a neutral act.

  2. Classification constructs what it claims to describe. The British colonial census in India didn't record pre-existing castes — it hardened fluid social identities into rigid administrative categories that determined access to resources and political power. Modern algorithmic classifications carry the same risk.

  3. Statistical methods carry the fingerprints of their origins. Francis Galton developed correlation and regression in service of eugenics. This doesn't invalidate the tools, but it demands awareness of how seemingly neutral mathematics can encode discriminatory assumptions — a lesson directly applicable to machine learning today.

  4. Data technology enables systematic harm at scale. IBM's punch card systems did not cause the Holocaust, but they enabled its systematic, industrial character. The distinction between causation and enablement is crucial: technology amplifies human choices rather than replacing them.

  5. Governance consistently lags behind technology. The internet commercialized in the 1990s; the GDPR took effect in 2018. The National Data Center debate of the 1960s produced the Privacy Act a decade later. This governance lag is not accidental — it is structural, and shortening it is one of the central challenges of data ethics.

  6. The surveillance business model was a choice, not an inevitability. The internet was not destined to be funded through advertising-driven data extraction. That outcome resulted from specific decisions by specific companies. Recognizing it as a choice is the first step toward imagining alternatives.

  7. Predictive analytics shift power from description to preemption. The move from analyzing what happened to predicting what will happen — in policing, insurance, hiring, healthcare — raises fundamental questions about taking action against people who have not yet done anything. Feedback loops can make predictions self-fulfilling.
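The feedback-loop point can be made concrete with a small simulation. This is a minimal sketch with invented numbers — the districts, rates, and patrol split below are hypothetical illustrations, not figures from the chapter. Two districts have identical underlying incident rates, but patrols follow past records and records follow patrols, so a small initial disparity compounds:

```python
# Hypothetical illustration: both districts share the SAME true incident rate,
# so any growing gap comes from the allocation rule, not from behavior.
TRUE_RATE = 0.1
recorded = {"A": 11.0, "B": 10.0}  # slight initial disparity in the historical data

for year in range(10):
    # The district with more recorded incidents gets the larger patrol share...
    leader = max(recorded, key=recorded.get)
    patrols = {d: (70 if d == leader else 30) for d in recorded}
    # ...and only incidents that patrols observe enter the data,
    # so recorded incidents scale with patrol presence.
    for d in recorded:
        recorded[d] += patrols[d] * TRUE_RATE

share_A = recorded["A"] / sum(recorded.values())
print(f"District A's share of recorded incidents: {share_A:.2f}")
# → District A's share of recorded incidents: 0.67
```

Although nothing about the districts actually differs, the district that starts with one extra record ends up with two-thirds of all recorded incidents: the prediction validates itself through the data it generates.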

  8. The AI era concentrates data power and introduces a provenance crisis. Machine learning's appetite for training data advantages companies that already hold the largest datasets, while generative AI makes it increasingly difficult to distinguish human-created from machine-generated content.

  9. The costs of data systems fall disproportionately on the least powerful. From colonial subjects to communities targeted by predictive policing, marginalized groups consistently bear the greatest risks while the benefits accrue to states, corporations, and wealthy nations.

  10. Every era of unchecked data power has eventually produced a governance response. Fair Information Practice Principles, the Privacy Act, data protection authorities, the GDPR — all emerged from crises. History offers both warning and hope: the harm is real, but so is the capacity for governance innovation.


Key Concepts

Census: From Latin censere (to assess/estimate); a systematic count and classification of a population, historically tied to taxation and military power. (First introduced: Section 2.1)

Statistical state: A state that uses statistical methods to make its population legible and governable. (Section 2.2.2)

Punch card: A physical medium for encoding data as patterns of holes, enabling mechanical sorting and tabulation; the foundational technology for modern data processing. (Section 2.3.1)

Database: An organized collection of data stored electronically, enabling storage, search, retrieval, and linking of records. (Section 2.4.1)

Information asymmetry: An imbalance in which one party in a relationship holds significantly more data than the other, creating a power differential. (Section 2.5.2)

Surveillance business model: An economic model in which personal data is collected through free services and monetized through targeted advertising. (Section 2.5.2)

Big Data: Data characterized by high volume, velocity, and variety, often exceeding the capacity of traditional processing tools. (Section 2.6.1)

Predictive analytics: Statistical and machine learning techniques used to forecast future events or behaviors based on historical data. (Section 2.6.2)

Digital enclosure: The process by which everyday activities generate data captured by platforms and corporations, creating a comprehensive record of behavior. (Section 2.5)

Platform capitalism: An economic system in which digital platforms extract value by mediating interactions and collecting data from participants. (Section 2.5.2)

Data provenance: The documented history of a piece of data, including its origin, transformations, and chain of custody. (Section 2.7.2)

Techno-determinism: The belief that technology drives social change independently of human choices, politics, and institutions. (Section 2.5.2)

Key Debates

Is technological determinism useful or misleading? The chapter argues against strong technological determinism — the view that technology's effects are independent of human choices. The surveillance business model was a choice; the punch card didn't cause the Holocaust. But a weaker version of the argument — that certain technologies make certain outcomes more likely — has some validity. Where you land on this debate shapes how you approach governance: determinists focus on controlling technology itself; voluntarists focus on shaping the human decisions around it.

Are historical analogies illuminating or dangerous? Comparing surveillance capitalism to colonialism, or algorithmic classification to the colonial census, can reveal structural patterns. But such analogies can also mislead by obscuring the differences in scale, intent, and context. The chapter uses analogies as analytical tools while acknowledging their limits.

Is the governance lag solvable? If governance always trails technology, is anticipatory governance possible — or will we always be writing regulations for the last crisis? This question connects directly to Chapter 38 (Emerging Technologies and Anticipatory Governance).


Applied Framework: The Historical Precedent Test

When encountering a new data governance debate, apply this five-step analysis:

  1. Identify the historical precedent. What earlier situation does this resemble?
  2. Analyze the power dynamics. Who benefited from the precedent, and who bore the costs?
  3. Trace the governance response. What governance mechanisms eventually emerged?
  4. Compare and contrast. How is the current situation similar to and different from the precedent?
  5. Project forward. What does the precedent suggest about likely outcomes if governance is not developed?

This framework is not a formula — historical analogies always have limits. But it provides a disciplined starting point for analysis, grounding contemporary debates in evidence rather than speculation.


Looking Ahead

Chapter 2 established that data has always been entangled with power. Chapter 3 — Who Owns Your Data? — moves from history to one of the most contested questions in the contemporary data landscape: the question of ownership. Who has rights over the data generated by your body, your behavior, your creative work, and your digital life? The answers depend on legal tradition, data type, and theory of ownership — and they carry profound practical consequences for everything from health records to social media posts to AI training data.

The historical patterns from Chapter 2 will remain relevant: ownership debates are shaped by the same power asymmetries, governance lags, and dual-use tensions we traced across millennia of data history.