Chapter 2 Quiz: A Brief History of Data and Society
Test your understanding of the historical arc traced in this chapter. Answers are hidden — attempt each question before revealing them.
Section 1: Multiple Choice
Q1. The word "census" derives from the Latin *censere*, meaning:
a) To count
b) To assess or estimate
c) To govern
d) To classify
Answer
**b) To assess or estimate.** (Section 2.1) The etymology is significant because it reveals that censuses were never simply about counting — they were about *evaluating* populations for the purposes of taxation, military conscription, and resource allocation.

Q2. What was the primary governance significance of the Domesday Book (1086)?
a) It was the first census to include women
b) It established the principle of data protection
c) It created a centralized, standardized dataset for crown taxation and resource control
d) It introduced statistical sampling methods to Europe
Answer
**c) It created a centralized, standardized dataset for crown taxation and resource control.** (Section 2.1.2) The Domesday Book was a data governance innovation that allowed William the Conqueror to know precisely what he owned, could tax, and could mobilize — and it functioned as an instrument of conquest, compiled by Norman administrators cataloguing Anglo-Saxon property.

Q3. According to the chapter, the British colonial census in India beginning in 1871:
a) Accurately recorded pre-existing caste categories
b) Hardened fluid social identities into rigid administrative classifications
c) Abolished the caste system by replacing it with economic categories
d) Was rejected by the Indian population and never completed
Answer
**b) Hardened fluid social identities into rigid administrative classifications.** (Section 2.2.1) The chapter draws on historian Nicholas Dirks to argue that the colonial census "created" caste as a pan-Indian system. Census categories determined access to education, employment, and political representation, meaning the data was *constructing* society rather than merely describing it.

Q4. Herman Hollerith's punch card system was significant because it:
a) Replaced the census entirely
b) Was the first computer
c) Enabled cross-tabulation of multiple variables at scale
d) Was developed specifically for military use
Answer
**c) Enabled cross-tabulation of multiple variables at scale.** (Section 2.3.1) Hollerith's punch card system tabulated the 1890 U.S. Census in one year (versus eight years for the 1880 Census by hand) and, critically, allowed governments to cross-tabulate occupation by ethnicity by geography by age — transforming not just the speed of counting but what was *possible* to count.

Q5. The 1965 National Data Center proposal was significant because:
a) It was successfully built and remains in operation today
b) It was the first government database
c) The debate it generated led to early data protection legislation
d) It was the precursor to the internet
Answer
**c) The debate it generated led to early data protection legislation.** (Section 2.4.2) Although the National Data Center was never built, the Congressional backlash led to the Fair Credit Reporting Act (1970) and the Privacy Act (1974) — among the first modern data protection laws in the United States.

Q6. According to Shoshana Zuboff, Google's foundational innovation was:
a) Creating the first search engine
b) Developing the PageRank algorithm
c) Converting "behavioral surplus" from user activity into prediction products sold to advertisers
d) Building the first cloud computing platform
Answer
**c) Converting "behavioral surplus" from user activity into prediction products sold to advertisers.** (Section 2.5.2) Zuboff argues that Google pioneered the monetization of data exhaust — the information generated by user behavior beyond what was needed to improve the search service itself. This model was then replicated across the internet economy.

Q7. The chapter identifies four principles illustrated by credit scoring. Which of the following is NOT one of them?
a) Reduction
b) Transparency
c) Consequentiality
d) Disparate impact
Answer
**b) Transparency.** (Section 2.4.3) The four principles are Reduction (complex life collapsed to a single number), **Opacity** (not Transparency — most consumers had no idea how the score was calculated), Consequentiality (the score determined life-altering outcomes), and Disparate Impact (historical discrimination encoded in scoring models). The fact that "Opacity" rather than "Transparency" characterizes credit scoring is central to the chapter's argument.

Q8. What does the chapter mean by the "data provenance crisis" in the AI era?
a) AI systems are running out of training data
b) We can no longer always determine whether data was produced by humans or machines
c) Data is becoming too expensive to store
d) Governments have lost the ability to regulate data flows
Answer
**b) We can no longer always determine whether data was produced by humans or machines.** (Section 2.7.2) Generative AI produces synthetic content at scale — text, images, audio, video — that is increasingly difficult to distinguish from human-created content. Additionally, future AI training data increasingly includes output from previous AI systems, compounding the provenance problem.

Q9. The "Ratchet Effect" in data collection refers to:
a) Data becoming less accurate over time
b) Data collection capabilities expanding but rarely contracting
c) The tendency for data breaches to escalate
d) The acceleration of computing power over time
Answer
**b) Data collection capabilities expanding but rarely contracting.** (Section 2.8.1) Each new capability — census expansion, punch card cross-tabulation, database linking, real-time internet surveillance — becomes the baseline for the next expansion. Governments and corporations rarely *un-collect* data.

Q10. According to the chapter's analysis, who has historically borne the disproportionate costs of data collection systems?
a) Technology companies
b) Government regulators
c) Marginalized groups, including colonial subjects, racial minorities, low-income communities, and the Global South
d) Academic researchers
Answer
**c) Marginalized groups, including colonial subjects, racial minorities, low-income communities, and the Global South.** (Section 2.8.1, "The Burden Falls Downward") This is identified as one of the four recurring dynamics across the entire history of data and society, from colonial censuses to predictive policing to AI's concentration of power.

Section 2: True or False (with Justification)
For each statement, indicate whether it is True or False, then provide a one- to two-sentence justification referencing the relevant section.
Q11. The chapter argues that technology alone caused the Holocaust.
Answer
**False.** (Section 2.3.2) The chapter explicitly states: "The punch card didn't cause the genocide — that required ideology, political power, and human cruelty. But it enabled the *systematic* nature of the killing." The distinction between causation and enablement is central to the chapter's argument against technological determinism.

Q12. The internet was always destined to become a surveillance-based advertising platform.
Answer
**False.** (Section 2.5.2) The chapter states that "the internet was not destined to become a surveillance infrastructure" and notes that early visions imagined a decentralized, empowering technology. The surveillance business model was a *human decision* made by specific companies and investors — the internet could have been funded through subscriptions, public investment, or micropayments. The "Common Pitfall" box explicitly warns against blaming "technology" rather than the business model.

Q13. The FICO credit score was introduced in 1989 and is calculated on a scale of 300 to 850.
Answer
**True.** (Section 2.4.3) The Fair Isaac Corporation (now FICO) introduced the first widely used credit score in 1989. It collapses a person's financial history into a single three-digit number between 300 and 850 that determines access to credit, housing, and often employment.

Q14. The chapter argues that historical analogies between colonial data practices and modern tech platforms are always misleading.
Answer
**False.** (Section 2.9, Key Debates) Whether historical analogies such as comparing surveillance capitalism to colonialism are "illuminating or misleading" is explicitly listed as one of the chapter's key debates — an open question, not a settled conclusion. The chapter itself draws extensive parallels between colonial classification and modern algorithmic categorization while acknowledging the limits of such comparisons.

Q15. According to the chapter, governance mechanisms have historically kept pace with data technology developments.
Answer
**False.** (Section 2.8.1) The "Governance Lag" is identified as one of four recurring dynamics: "Governance consistently lags behind technological capability, often by decades." The internet commercialized in the 1990s; the GDPR didn't take effect until 2018. The National Data Center debate of the 1960s produced the Privacy Act of 1974, a decade later.

Section 3: Short Answer
Q16. In two to three sentences, explain how Web 2.0 changed the nature of the data being collected about internet users compared to the earlier web. (Reference Section 2.5.3.)
Answer
Web 2.0 shifted the web from read-only to read-write, meaning users actively created content (thoughts, photos, locations, relationships, life events) rather than merely browsing. This meant the data was no longer just "exhaust" from searches and purchases — it was the product of people's creative and social labor. Users became simultaneously the product (data sold to advertisers), the content creators (posts attracting other users), and the product of the product (engagement data refining targeting algorithms).

Q17. Explain what Eli means when he says the ShotSpotter sensors were "measuring the sensors, not the neighborhood." How does this illustrate the concept of a feedback loop in predictive analytics?
Answer
Eli's point is that the ShotSpotter system detected not only actual gunshots but also car backfires, fireworks, and slamming dumpsters — producing data that *appeared* to confirm the neighborhood was dangerous but actually reflected the sensitivity of the sensors rather than the reality of the neighborhood. This illustrates a feedback loop because the sensors generated data that justified increased police presence, which led to more arrests, which generated more data reinforcing the original prediction, which justified more sensors and more policing — a self-reinforcing cycle that targeted already-targeted communities regardless of actual crime levels. (Section 2.6.2)

Q18. The chapter notes that Francis Galton developed foundational statistical tools (correlation, regression) in service of eugenics. Dr. Adeyemi argues this history "demands that we approach quantitative methods with awareness of their origins." In three to four sentences, explain what "awareness of origins" means in practical terms. What should a data practitioner do differently because of this history?
Answer
"Awareness of origins" means recognizing that statistical tools were developed within specific power structures and for specific purposes — purposes that included racial hierarchy and the quantification of human "worth." In practical terms, a data practitioner should scrutinize the categories they use (asking whether classification schemes encode assumptions about hierarchy or difference), question whose interests are served by the questions being asked of data, and examine whether "objective" measurements might carry embedded biases from the contexts in which methods were developed. This doesn't mean abandoning statistical tools, but it means never treating them as neutral instruments divorced from the social contexts of their creation and application. (Section 2.2.2)Q19. In two to three sentences, explain what the chapter means by "Big Data's three Vs" and why some frameworks add a fourth and fifth V.
Answer
The three Vs are Volume (the sheer amount of data generated), Velocity (the speed at which data is generated and must be processed), and Variety (the diversity of data types, from structured to unstructured). Some frameworks add Veracity (the accuracy and reliability of data) because not all data is trustworthy, and Value (what the data is worth once analyzed) because raw data requires processing and context to become useful. (Section 2.6.1)

Section 4: Applied Scenario
Q20. A city government proposes consolidating its police records, public health data, school enrollment data, social services records, and utility billing information into a single "Integrated City Data Platform" to improve service delivery and identify at-risk residents who need support.
Using at least three of the four recurring dynamics from Section 2.8.1, analyze this proposal. Your answer should:
- Identify which historical precedent(s) are most relevant
- Explain how at least three of the four dynamics (Ratchet Effect, Dual Use, Governance Lag, Burden Falls Downward) apply to this scenario
- Propose at least two governance safeguards that might mitigate the risks you identify
- Reference Eli's experience with Smart City surveillance and the National Data Center debate as points of comparison
(Aim for 300-500 words.)
Answer
**Strong answers will include the following elements:**

**Historical precedents:** The most directly relevant precedent is the 1965 National Data Center proposal (Section 2.4.2), which sought to consolidate federal statistical data and was defeated on privacy grounds. The colonial census (Section 2.2.1) is also relevant — it shows how centralizing population data enables classification and control.

**Ratchet Effect:** Once data from multiple city systems is consolidated, it is extremely unlikely to be de-consolidated. The platform will expand over time to include additional data sources (traffic cameras, smart meter data, social media monitoring), and each expansion becomes the new baseline. The original "service delivery" justification will gradually expand to include law enforcement, code enforcement, and other uses.

**Dual Use:** The same platform that identifies a family in need of social services could also flag them for child protective investigation, immigration enforcement, or predictive policing. The same health data used to allocate public health resources could be used to deny housing or insurance. The "at-risk" identification that sounds supportive can easily become the "at-risk" identification that triggers punitive intervention.

**Burden Falls Downward:** The residents most likely to appear across multiple city databases — police records, social services, public health — are disproportionately low-income residents and residents of color. They bear the greatest surveillance burden and face the greatest risk from data consolidation, while wealthier residents who use fewer public services are largely invisible to the system. This directly parallels Eli's experience: his neighborhood received smart city sensors while wealthier neighborhoods did not (Section 2.6.2).

**Governance Lag:** The technology to build such a platform exists today; the governance frameworks to regulate it do not. Most cities lack data governance ordinances, algorithmic impact assessment requirements, or independent oversight bodies for data consolidation.

**Governance safeguards might include:** (1) A community data governance board with veto power over new data integrations and use cases, modeled on the community input Eli's Detroit context lacked; (2) Purpose limitation requirements with legal force — the data can only be used for the specific purposes approved, with criminal penalties for mission creep; (3) Mandatory algorithmic impact assessments before any predictive model is deployed using the consolidated data; (4) Automatic data expiration — records are deleted after a defined period unless specific legal authorization requires retention; (5) An independent data ombudsperson with authority to investigate complaints and audit the system.

Total: 20 questions (10 multiple choice, 5 true/false with justification, 4 short answer, 1 applied scenario)