Chapter 5 Further Reading: Your First Political Dataset

Python and pandas for Data Analysis

McKinney, Wes. Python for Data Analysis: Data Wrangling with pandas, NumPy, and Jupyter. 3rd ed. O'Reilly, 2022. The definitive reference for pandas, written by its creator. McKinney's Chapters 5 (Getting Started with pandas), 7 (Data Cleaning and Preparation), 8 (Data Wrangling: Join, Combine, and Reshape), and 9 (Plotting and Visualization) are the most directly relevant to the skills covered in this chapter. This is the book to own if you plan to do serious data analysis in Python.

VanderPlas, Jake. Python Data Science Handbook: Essential Tools for Working with Data. 2nd ed. O'Reilly, 2022. A broader introduction to the Python data science ecosystem, covering NumPy, pandas, matplotlib, and scikit-learn. The chapters on pandas and matplotlib are excellent complements to McKinney's reference. The full text is available free at jakevdp.github.io/PythonDataScienceHandbook.

pandas Documentation. User Guide. pandas.pydata.org/docs/user_guide/index.html. The official pandas documentation is unusually readable and comprehensive. Its sections "Indexing and selecting data," "Working with missing data," "Group by: split-apply-combine," and "Time series / date functionality" cover the operations used in this chapter directly.
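To connect those four user-guide sections to this chapter's work, here is a minimal pandas sketch that runs one operation from each section on a tiny voter table. The data are invented. The column names (voter_id, county, party, support_score, registration_date) are hypothetical stand-ins and do not reproduce the actual oda_voters.csv schema.

```python
import numpy as np
import pandas as pd

# A tiny, made-up voter table; column names are hypothetical, not the ODA schema.
voters = pd.DataFrame({
    "voter_id": [101, 102, 103, 104, 105],
    "county": ["Adams", "Adams", "Baker", "Baker", "Baker"],
    "party": ["D", "R", "D", None, "R"],
    "support_score": [0.82, 0.15, np.nan, 0.55, 0.30],
    "registration_date": ["2016-03-01", "2020-10-12", "2018-07-04",
                          "2022-01-15", "2019-05-20"],
})

# Indexing and selecting data: a boolean mask plus a column list via .loc.
high_support = voters.loc[voters["support_score"] > 0.5, ["voter_id", "county"]]

# Working with missing data: count gaps per column, then fill one explicitly.
missing_counts = voters.isna().sum()
voters["party"] = voters["party"].fillna("Unknown")

# Group by: split-apply-combine; "count" and "mean" both skip NaN scores.
county_summary = voters.groupby("county")["support_score"].agg(["count", "mean"])

# Time series / date functionality: parse strings to datetimes, extract the year.
voters["registration_date"] = pd.to_datetime(voters["registration_date"])
voters["reg_year"] = voters["registration_date"].dt.year

print(high_support)
print(missing_counts)
print(county_summary)
```

Notice that the missing support score for voter 103 never raises an error. pandas silently leaves it out of both the comparison and the county averages. That default is convenient, but it is also exactly why missingness deserves deliberate attention.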

Political Data Infrastructure

Nickerson, David W., and Todd Rogers. "Political Campaigns and Big Data." Journal of Economic Perspectives 28.2 (2014): 51–74. An excellent academic overview of how campaigns use voter file data, targeting models, and field experiments. Covers the construction of voter files, the development of support and turnout scores, and the evidence base for data-driven campaign strategies. Accessible to non-specialists and directly relevant to the ODA voter table structure.

Hersh, Eitan. Hacking the Electorate: How Campaigns Perceive Voters. Cambridge University Press, 2015. A political scientist's examination of how voter data shapes campaign strategy. Because campaigns perceive voters through that data, it also influences which communities are targeted and which are ignored. Particularly relevant to the "Who Gets Counted" themes in Section 5.12. Hersh argues that campaigns operating on voter file data systematically underinvest in communities underrepresented in those files.

Issenberg, Sasha. The Victory Lab: The Secret Science of Winning Campaigns. Crown, 2012. A journalistic account of the development of data-driven campaigning in the United States, from the early voter file era through the Obama 2008 campaign's targeting revolution. Accessible and narrative-driven; provides essential historical context for understanding why the ODA Dataset structure looks the way it does.

Survey and Polling Data

Roper Center for Public Opinion Research. iPoll Database. ropercenter.cornell.edu. The most comprehensive archive of US public opinion survey data, with tens of thousands of polls from the 1930s to the present. Useful for accessing historical polling data and understanding the range of survey methodologies used in practice. Many university libraries provide access.

FiveThirtyEight Pollster Ratings. fivethirtyeight.com/interactives/elections/pollster-ratings/. A publicly available database of pollster performance ratings, tracking each firm's historical accuracy and methodological characteristics. Essential context for evaluating which pollsters' data should be weighted more or less heavily, and directly relevant to the house effects discussion and the methodology column in oda_polls.csv.
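A house effect, in its simplest form, is how far a pollster's results sit from the average of all polls. The sketch below uses that naive definition on invented polls. The column names (pollster, margin) are hypothetical. Real poll-averaging models adjust for much more, such as when each poll was fielded and whom it sampled, and this sketch ignores all of that.

```python
import pandas as pd

# Invented polls; "margin" is candidate A's lead in percentage points.
polls = pd.DataFrame({
    "pollster": ["Alpha", "Alpha", "Beta", "Beta", "Gamma", "Gamma"],
    "margin":   [4.0, 6.0, -1.0, 1.0, 2.0, 2.0],
})

# Simplest consensus: the unweighted mean margin across every poll.
consensus = polls["margin"].mean()

# Naive house effect: each pollster's average deviation from that consensus.
house_effects = (
    polls.assign(deviation=polls["margin"] - consensus)
         .groupby("pollster")["deviation"]
         .mean()
)
print(f"consensus margin: {consensus:+.2f}")
print(house_effects)

# Adjusted margins: subtract each pollster's estimated lean from its own polls.
polls["adjusted_margin"] = polls["margin"] - polls["pollster"].map(house_effects)
```

Here Alpha leans toward candidate A by about 2.7 points and Beta leans the other way by about 2.3 points. Because every poll counts equally in the consensus, a pollster that publishes many polls pulls the benchmark toward itself. Possible refinements include weighting by pollster or excluding a pollster from its own benchmark.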

American National Election Studies (ANES). electionstudies.org. The gold-standard time-series survey of American electoral behavior, conducted since 1948. The ANES provides the richest individual-level data on political attitudes, vote choice, and campaign engagement. The variables in oda_voters.csv (support scores, party identification, vote history) reflect the conceptual framework developed through decades of ANES research.

Campaign Finance Data

Federal Election Commission. Campaign Finance Data. fec.gov/data/. The official public source for federal campaign finance records. Contributions from individuals whose total giving to a committee exceeds $200 (per election cycle, for candidate committees) are itemized. They are publicly disclosed with the donor's name, address (including zip code), employer, occupation, amount, date, and recipient. This is the real-world equivalent of oda_donations.csv. Learning to navigate the FEC's bulk data downloads is an essential skill for political data journalists and analysts.

OpenSecrets (formerly the Center for Responsive Politics). opensecrets.org. A nonprofit that aggregates and presents FEC data in user-friendly formats. Its industry and sector breakdowns of campaign contributions are particularly useful for understanding the economic interests behind political giving. Its "money in politics" data is among the most cited in political journalism.

Missing Data

Sterne, Jonathan A.C., et al. "Multiple Imputation for Missing Data in Epidemiological and Clinical Research: Potential and Pitfalls." BMJ 338 (2009): b2393. Although it comes from epidemiology, this paper gives one of the clearest accessible explanations of the distinction between MCAR, MAR, and MNAR missing-data mechanisms, the conceptual framework applied in Case Study 5.1. The political applications are direct. Vote history missingness, undercoverage in phone surveys, and non-disclosure of donor occupations can all plausibly follow the MNAR pattern, in which the chance that a value is missing depends on the unobserved value itself.
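The practical difference between these mechanisms is easiest to see in a small simulation. The sketch below is a toy illustration written for this reading list, not a method taken from Sterne et al. It uses NumPy and an invented score between 0 and 1, loosely in the spirit of a turnout propensity. Values are deleted in one of two ways:

* completely at random (MCAR), or
* with a probability that rises as the value itself falls (MNAR).

The mean of what remains is then compared with the true mean.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# Invented "true" scores between 0 and 1 (Beta(2, 2) has mean 0.5).
true_values = rng.beta(2, 2, size=n)

# MCAR: every record has the same 30% chance of going missing.
mcar_missing = rng.random(n) < 0.30

# MNAR: the chance of going missing depends on the value itself;
# low-score records are much more likely to disappear.
mnar_missing = rng.random(n) < 0.6 * (1 - true_values)

true_mean = true_values.mean()
mcar_mean = true_values[~mcar_missing].mean()
mnar_mean = true_values[~mnar_missing].mean()

print(f"true mean:           {true_mean:.3f}")
print(f"observed under MCAR: {mcar_mean:.3f}")
print(f"observed under MNAR: {mnar_mean:.3f}")
```

Under MCAR the observed mean stays close to the truth. Under MNAR it drifts upward, to about 0.54 in expectation against a true mean of 0.5, because low values vanish disproportionately. Nothing in the observed rows alone flags this problem. That is why MNAR is usually handled through explicit assumptions or sensitivity analyses rather than detected from the data.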

Data Visualization

Wilke, Claus O. Fundamentals of Data Visualization. O'Reilly, 2019. A principles-based guide to visualization that covers why certain chart types work for certain data types and how to make effective choices. The chapters on visualizing distributions, amounts, and proportions are directly relevant to the voter demographic charts in this chapter. Available free at clauswilke.com/dataviz.

Schwabish, Jonathan. Better Data Visualizations: A Guide for Scholars, Researchers, and Wonks. Columbia University Press, 2021. Targeted specifically at researchers and policy analysts — the audience most likely to be working with data like the ODA Dataset. Schwabish's advice on chart selection, annotation, and storytelling with data is practical and immediately applicable.

Cairo, Alberto. The Truthful Art: Data, Charts, and Maps for Communication. New Riders, 2016. A comprehensive treatment of data visualization from a journalism perspective, covering both the technical and ethical dimensions of visual representation. Cairo's discussion of the difference between exploration and communication charts directly maps to Section 5.7, and his treatment of choropleth maps is essential background for the density-vs.-count map discussion in Case Study 5.2.
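The density-versus-count issue comes down to normalization. Shading areas by raw counts largely shows where big areas or big populations are. Dividing by land area (or by population) shows intensity instead. The sketch below uses three invented precincts with hypothetical column names to show that the two measures can rank the same places in opposite orders.

```python
import pandas as pd

# Invented precincts; column names are hypothetical.
precincts = pd.DataFrame({
    "precinct": ["Downtown", "Suburb", "Rural"],
    "registered_voters": [4_000, 6_000, 9_000],
    "area_sq_mi": [0.5, 5.0, 120.0],
})

# Density normalizes the raw count by land area (voters per square mile).
precincts["voters_per_sq_mi"] = precincts["registered_voters"] / precincts["area_sq_mi"]

by_count = precincts.sort_values("registered_voters", ascending=False)["precinct"].tolist()
by_density = precincts.sort_values("voters_per_sq_mi", ascending=False)["precinct"].tolist()

print("ranked by count:  ", by_count)
print("ranked by density:", by_density)
```

The rural precinct has the most voters but by far the lowest density. A count-shaded map would therefore make it the darkest area, while a density-shaded map would make it the lightest.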

Data Ethics and "Who Gets Counted"

D'Ignazio, Catherine, and Lauren F. Klein. Data Feminism. MIT Press, 2020. A rigorous and accessible argument that data practices embed power relations that shape whose experiences are counted, whose are invisible, and who benefits from data-driven decisions. The book's "Principle 1: Examine Power" and "Principle 7: Make Labor Visible" are directly relevant to the "Who Gets Counted" framework in Section 5.12. Available free at datafeminism.io.

Benjamin, Ruha. Race After Technology: Abolitionist Tools for the New Jim Code. Polity, 2019. Benjamin's analysis of how algorithmic systems replicate and amplify racial inequality applies directly to voter targeting models built on historical turnout data. Communities that faced structural barriers to voting in the past have lower historical turnout. Lower turnout produces lower turnout propensity scores, and those scores in turn justify lower campaign investment. This is a self-reinforcing cycle that the ODA voter file's vote_history_* columns quietly embed.