Exercises: The Age of Political Data

Tier 1: Foundational (Comprehension and Recall)

Exercise 1.1: Defining Political Data List five distinct types of political data mentioned in this chapter. For each type, identify (a) who produces it, (b) who uses it, and (c) one limitation or potential bias. Present your answers in a table.

Exercise 1.2: Three Worlds In your own words, describe the three "worlds" of political data (campaign, media/research, and civic). For each world, identify the primary goal, one example of how data is used, and one ethical concern that arises from that use.

Exercise 1.3: Analytics vs. Punditry A cable news commentator says: "Garza is going to lose because she can't connect with rural voters." Rewrite this statement in the language of analytics, incorporating quantified uncertainty. Explain why the analytical version is more useful, even though it sounds less decisive.

Exercise 1.4: The Garza-Whitfield Race Create a one-page "analytical brief" for the Garza-Whitfield Senate race. Include: (a) a summary of the demographic composition of the state, (b) each candidate's key strengths and vulnerabilities, (c) three data questions you would want answered before making a prediction, and (d) one reason why past election data might not be a reliable guide to this race.

Exercise 1.5: Vocabulary Check Define the following terms in your own words, with an example for each: voter file, microtargeting, data ecosystem, margin of error, persuasion modeling, response rate.

Tier 2: Analytical (Application and Analysis)

Exercise 1.6: Data Audit Choose a real political data source (e.g., the FEC's campaign finance database, your state's voter file, the U.S. Census Bureau's demographic data). Spend 15 minutes exploring it. Then write a one-page "data audit" that addresses: (a) What information is available? (b) What is missing? (c) Who might be systematically underrepresented or excluded? (d) What decisions did the data producers make about categories and definitions, and what are the consequences of those decisions?

Exercise 1.7: Stakeholder Analysis Consider the following scenario: Meridian Research Group publishes a poll showing Maria Garza leading Tom Whitfield by 4 points among likely voters. Analyze how each of the following stakeholders would interpret, use, or react to this information: (a) the Garza campaign, (b) the Whitfield campaign, (c) a news reporter covering the race, (d) a potential campaign donor, (e) a voter deciding whether to volunteer, (f) OpenDemocracy Analytics.
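
A note on reading the 4-point lead: the scenario does not give the poll's sample size or each candidate's share, so the numbers below (800 likely voters, 48% to 44%) are hypothetical placeholders, not figures from the chapter. Under those assumptions, and treating the poll as a simple random sample, this Python sketch estimates the margin of error on the lead itself, which is wider than the margin usually reported for a single candidate's share.

```python
import math

def lead_margin_of_error(p_a, p_b, n, z=1.96):
    """Approximate 95% margin of error for the lead (p_a - p_b).

    Both shares come from the same sample of n respondents, so the
    variance of the difference is (p_a + p_b - (p_a - p_b)**2) / n.
    Assumes a simple random sample with no weighting or design effect.
    """
    variance = (p_a + p_b - (p_a - p_b) ** 2) / n
    return z * math.sqrt(variance)

# Hypothetical inputs: the exercise states only the 4-point lead.
garza, whitfield, sample_size = 0.48, 0.44, 800

lead = garza - whitfield
moe = lead_margin_of_error(garza, whitfield, sample_size)
print(f"Lead: {lead:.1%} +/- {moe:.1%}")
```

With these placeholder inputs the margin on the lead comes out to roughly 6.6 points, so a 4-point lead would sit inside the sampling noise. Weighted samples typically widen the margin further, which is worth keeping in mind when considering how each stakeholder might over-read the result.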

Exercise 1.8: Measurement Shapes Reality The chapter argues that "measurement shapes reality." Choose one of the following and explain how the act of measuring or categorizing creates political consequences:
- The Census Bureau's categories for race and ethnicity
- The definition of "likely voter" in public polls
- The threshold for disclosing campaign contributions ($200 for federal races)
- The decision to include or exclude third-party candidates in public polls

Exercise 1.9: Comparing Campaigns Based on the character descriptions of Nadia Osei and Jake Rourke, compare their approaches to campaign decision-making. Create a two-column table contrasting their likely approaches to each of the following decisions: (a) where to open a new field office, (b) what message to use in a television ad, (c) whether to invest more in turnout or persuasion, (d) how to respond to an opponent's attack. What are the strengths and weaknesses of each approach?

Exercise 1.10: The Data You Cannot See The chapter notes the importance of being aware of data you lack. For the Garza-Whitfield race, identify five types of data that would be valuable but are either (a) proprietary and inaccessible, (b) not collected at all, or (c) collected but unreliable. For each type, explain why it is difficult to obtain and how its absence might bias analysis.

Tier 3: Advanced (Synthesis and Evaluation)

Exercise 1.11: Tool or Weapon? An Essay Write a 1,000-word essay arguing either (a) that the explosion of political data has been, on balance, beneficial for democratic politics, or (b) that it has been, on balance, harmful. Use specific examples from the chapter and from your own knowledge of recent elections. Address at least one strong counterargument to your position.

Exercise 1.12: Designing a Data Operation Imagine you are Nadia Osei at the start of the Garza campaign. You have a budget of $500,000 for your analytics operation. Outline how you would allocate that budget across the following categories: (a) voter file acquisition and data infrastructure, (b) polling, (c) digital analytics and social media monitoring, (d) personnel, (e) other. Justify each allocation decision, explain what trade-offs you are making, and identify the biggest risk in your plan.

Exercise 1.13: Research Design You are a political science graduate student who wants to study whether data-driven campaign tactics increase voter turnout among underrepresented communities. Sketch a research design that would allow you to address this question. Include: (a) a clear research question, (b) the data you would need, (c) the comparison or control group, (d) at least two potential confounds or challenges, and (e) a brief discussion of the ethical considerations involved in studying campaign tactics.

Exercise 1.14: Cross-World Tensions The chapter describes three "worlds" of political data with different goals. Describe a realistic scenario in which the interests of the campaign world conflict directly with the interests of the civic world. How would each side justify its position? Is there a resolution that serves both interests, or is the conflict irreconcilable?

Exercise 1.15: The Prediction Problem The chapter describes a tension between prediction and explanation. Consider the following two models of the Garza-Whitfield race:
- Model A: Uses 50 variables (demographics, past voting, consumer data, social media activity) and predicts the winner correctly in backtests 90% of the time, but the analysts cannot explain why any individual variable matters.
- Model B: Uses 5 variables (presidential approval, GDP growth, incumbency, state partisanship, candidate quality) and predicts the winner correctly in backtests 75% of the time, but the analysts can clearly explain the role of each variable.

Which model would you choose if you were (a) a campaign manager, (b) a journalist, (c) an academic researcher? Explain your reasoning for each role.
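
One thing to weigh when comparing the two backtest records is how many past races they rest on. The exercise does not say, so the sketch below assumes a hypothetical 20 backtested races (18 correct for Model A, 15 for Model B) and computes a 95% Wilson score interval for each accuracy rate. It is an illustration under those assumptions, not a recommendation of either model.

```python
import math

def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a proportion (default: about 95%)."""
    p_hat = successes / trials
    denom = 1 + z**2 / trials
    center = (p_hat + z**2 / (2 * trials)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2))
        / denom
    )
    return center - half_width, center + half_width

# Hypothetical: the exercise gives accuracy rates but not the
# number of backtested races, so 20 is an assumed value.
BACKTESTED_RACES = 20
model_a = wilson_interval(18, BACKTESTED_RACES)  # 90% correct
model_b = wilson_interval(15, BACKTESTED_RACES)  # 75% correct

print(f"Model A: {model_a[0]:.2f} to {model_a[1]:.2f}")
print(f"Model B: {model_b[0]:.2f} to {model_b[1]:.2f}")
```

With only 20 races the two intervals overlap substantially, so the gap between 90% and 75% is less conclusive than it first appears. Rerunning the sketch with a larger assumed number of races shows how the intervals narrow.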

Exercise 1.16: Historical Comparison Research one of the following historical episodes in political data: (a) the 1936 Literary Digest poll, (b) the 2016 election forecasting failures, (c) the Cambridge Analytica scandal, (d) the controversy over the 2020 Census citizenship question. Write a 500-word analysis connecting the episode to at least two themes from this chapter (Measurement Shapes Reality, Who Gets Counted, Prediction vs. Explanation, Data in Democracy, The Map vs. the Territory).

Exercise 1.17: Advising ODA Adaeze Nwosu at OpenDemocracy Analytics asks you for advice: "How can we make our tools useful to people who are not already politically engaged?" Write a one-page memo proposing three specific strategies, each grounded in the concepts from this chapter. For each strategy, identify a potential obstacle and how you would address it.