Appendix A: Research Methods Primer
This primer exists for a simple reason: the methods used throughout this textbook are only as useful as your ability to interpret them honestly. You do not need to be a mathematician to read this book. You do need to understand what researchers mean when they say a finding is "statistically significant," what a confidence interval actually tells you, and why correlation between two variables does not prove that one caused the other. These concepts appear in every chapter, and misunderstanding them produces the kind of confident wrongness that afflicts cable news panels and social media arguments alike.
Work through this appendix before beginning Part I, and return to specific sections whenever a concept in the main text feels unclear.
A.1 Types of Research
Political scientists use several broad categories of research design, each suited to different questions and each carrying different strengths and limitations.
Experimental research assigns subjects randomly to a treatment condition and a control condition, then measures outcomes. Random assignment is the key. When people are randomly assigned to groups, any difference between groups at the end is likely caused by the treatment rather than by pre-existing differences between participants. In political science, field experiments are particularly valuable. Gerber and Green's landmark study (see Appendix D) randomly assigned registered voters to receive door-to-door canvassing, phone calls, or no contact, then compared turnout. Because assignment was random, differences in turnout could be attributed to the contact, not to preexisting differences in civic motivation.
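The logic of random assignment can be sketched in a few lines of Python. This is a hypothetical simulation, not Gerber and Green's data: we invent a pre-existing trait ("civic motivation") and show that random assignment balances it across groups, so it cannot explain an outcome difference between them.

```python
import random

random.seed(3)

# Hypothetical sketch of why random assignment works: a pre-existing trait
# ("civic motivation," scored 0-1, invented here) ends up nearly balanced
# across groups, so it cannot explain an outcome gap between them.
motivation = [random.uniform(0, 1) for _ in range(10_000)]
random.shuffle(motivation)                       # random assignment
treatment, control = motivation[:5_000], motivation[5_000:]

def avg(xs):
    return sum(xs) / len(xs)

print(f"avg motivation, treatment: {avg(treatment):.3f}")
print(f"avg motivation, control:   {avg(control):.3f}")
```

The two group averages differ only by sampling noise, which is exactly what lets an experimenter attribute an outcome gap to the treatment.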
The challenge with experiments in political science is that many things cannot be ethically or practically randomized. You cannot randomly assign countries to authoritarian governments, randomly give half the electorate misinformation, or randomly expose half a city to a recession. This pushes most political research toward observational designs.
Observational research examines the world as it is, without the researcher manipulating anything. A study comparing voter turnout in states with strict versus lenient ID laws is observational: the researcher did not assign those laws, and states with strict laws differ from lenient-law states in many ways beyond the ID policy. Observational research can identify patterns and associations with great precision; establishing causation is harder and requires additional assumptions.
Descriptive research aims to accurately characterize what exists without necessarily explaining why. A careful demographic breakdown of who voted in the 2024 election is descriptive research, as is the exit poll data behind it. The finding that women voted for Democratic candidates at higher rates than men did is a descriptive fact. Descriptive research is not "mere description" — accurate description of a complicated social reality is genuinely difficult and genuinely valuable.
Inferential research uses data from a sample to draw conclusions about a larger population. A poll of 1,000 registered voters is inferential: the goal is to say something true about the roughly 160 million registered voters those 1,000 represent. Almost all polling and survey research is inferential. The quality of the inference depends heavily on how the sample was drawn and whether it is truly representative of the target population.
A.2 Variables: Independent, Dependent, Control, and Confounding
A variable is any characteristic that can take on different values. Age is a variable. Party identification is a variable. Vote choice is a variable. GDP growth rate is a variable.
The dependent variable (DV) is what you are trying to explain — the outcome of interest. In a study of voter turnout, turnout is the dependent variable. In a study of how economic conditions affect presidential approval, approval is the dependent variable.
The independent variable (IV) is what you think explains or predicts the dependent variable — the presumed cause or predictor. In the turnout study, the independent variable might be whether the voter received a mobilization mailer. In the approval study, it is the change in unemployment rate.
Control variables are additional variables the researcher holds constant or statistically adjusts for because they might otherwise muddy the relationship between IV and DV. If you want to know whether income predicts Republican vote share and you know that rural residents are both more likely to vote Republican and more likely to have lower incomes, you might control for urban/rural status. This lets you isolate the income-partisanship relationship more cleanly.
Confounding variables (confounders) are variables that are correlated with both the independent and dependent variable and can produce a spurious association between them. Ice cream sales and drowning rates are both higher in summer: ice cream does not cause drowning, but a third variable — hot weather and outdoor water activity — explains both. In political research, education and income are frequent confounders. A finding that homeowners vote at higher rates than renters might partly reflect the fact that homeowners are older and older citizens vote more reliably. Age is a confounder.
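A small simulation, with all numbers invented, shows how age alone can manufacture the homeowner-turnout gap described above, and how comparing within a narrow age band (a crude form of controlling for age) shrinks it:

```python
import random

random.seed(1)

# Hypothetical simulation: age drives BOTH homeownership and turnout,
# producing an owner-renter turnout gap with no direct causal link
# between owning a home and voting.
n = 10_000
rows = []
for _ in range(n):
    age = random.uniform(18, 80)
    owns_home = random.random() < 0.8 * (age - 18) / 62    # older -> likelier owner
    votes = random.random() < 0.3 + 0.5 * (age - 18) / 62  # older -> likelier voter
    rows.append((age, owns_home, votes))

def turnout(group):
    return sum(v for _, _, v in group) / len(group)

owners = [r for r in rows if r[1]]
renters = [r for r in rows if not r[1]]
print(f"owner turnout:  {turnout(owners):.2f}")   # inflated by the age confounder
print(f"renter turnout: {turnout(renters):.2f}")

# Crude "control" for age: compare owners and renters within a narrow band
band = [r for r in rows if 40 <= r[0] < 45]
band_owners = [r for r in band if r[1]]
band_renters = [r for r in band if not r[1]]
gap = turnout(band_owners) - turnout(band_renters)
print(f"owner-renter gap within ages 40-44: {gap:+.2f}")
```

The raw comparison shows a large gap; within a narrow age band, the gap collapses toward zero, which is what regression-based controls accomplish more systematically.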
Identifying and accounting for confounders is one of the core challenges of political science research. A study that fails to account for obvious confounders produces unreliable conclusions.
A.3 Levels of Measurement
Variables differ in what kind of information they carry. Statisticians distinguish four levels:
Nominal variables have categories with no inherent order or numerical meaning. Party identification (Democrat, Republican, Independent, Other) is nominal. Gender identity is nominal. State of residence is nominal. You can count how many people fall into each category, but you cannot say that one category is "more" than another in any meaningful numeric sense. The average party identification has no meaning.
Ordinal variables have categories with a meaningful order, but the gaps between categories are not necessarily equal. Survey responses on a 5-point Likert scale ("strongly agree" to "strongly disagree") are ordinal. The "agree" category is between "strongly agree" and "neutral," but we cannot assume the psychological distance between strongly agree and agree is the same as between agree and neutral. Ideology measured on a "very liberal / liberal / moderate / conservative / very conservative" scale is ordinal.
Interval variables have meaningful order and equal gaps between values, but no true zero point. Temperature in Celsius is the classic example. In political science, index scores constructed from multiple survey items (like a civil liberties score running from 0 to 100) are often treated as interval even though the underlying scale is technically ordinal — this is a common and usually acceptable simplification.
Ratio variables have all the properties of interval variables plus a true zero. Vote percentage is ratio: 0% means literally no votes. Campaign spending in dollars is ratio. Turnout rate is ratio. The zero is meaningful, and ratios between values are interpretable (a candidate who won 60% got twice the share of one who won 30%).
Why this matters: the level of measurement determines what statistics are appropriate. You can compute a mean for ratio and interval data. Computing the mean of a nominal variable (the "average" party) produces nonsense. Applying correlation to nominal variables requires special techniques. Throughout this textbook, attention to measurement level prevents methodological errors.
A.4 Descriptive Statistics
Descriptive statistics summarize large amounts of data into comprehensible numbers.
Mean is the arithmetic average: sum all values and divide by the number of observations. The mean presidential approval rating across all Gallup polls in 2024 captures the central tendency of approval over that period. The mean is sensitive to extreme values (outliers), and it can also hide them. A district where 99 precincts report 50% turnout and one precinct reports an anomalous 100% will have a mean of roughly 50.5%, a figure that looks unremarkable and conceals the outlier precinct entirely. In skewed distributions, the mean can actively mislead.
Median is the middle value when observations are sorted. Half the values fall above the median, half below. The median is resistant to outliers. If you want to describe the "typical" congressional district in terms of campaign spending, the median is more useful than the mean because a few megadistricts with multimillion-dollar races would inflate the mean dramatically.
Mode is the most frequent value. In a distribution of party identification among U.S. adults, the mode is typically "Independent" (or whatever label captures the plurality), though this depends on how the question is asked. The mode is the only central tendency measure that makes sense for nominal data.
Variance measures how spread out values are around the mean. It is the average of the squared deviations from the mean. Large variance means values are dispersed; small variance means they cluster tightly.
Standard deviation is the square root of variance, expressed in the same units as the original variable. A state where average county-level Republican vote share is 55% with a standard deviation of 3 percentage points is politically more uniform than one with the same average but a standard deviation of 15 points. Standard deviation is the workhorse measure of spread in political data.
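These measures can all be computed with Python's standard library. The county vote shares below are invented to illustrate the outlier sensitivity discussed above: one lopsided county pulls the mean above the median.

```python
import statistics

# Hypothetical county-level Republican vote shares (percent) in one state,
# with one lopsided outlier county.
shares = [48, 52, 55, 53, 57, 51, 54, 90]

mean = statistics.mean(shares)      # 57.5: pulled up by the 90% county
median = statistics.median(shares)  # 53.5: stays with the typical county
sd = statistics.pstdev(shares)      # population standard deviation

print(f"mean {mean:.1f}, median {median:.1f}, sd {sd:.1f}")

# Mode is the only sensible central-tendency measure for nominal data:
parties = ["D", "R", "I", "I", "D", "I", "R", "I"]
print(statistics.mode(parties))  # "I" is the most frequent category
```

Note the gap between mean and median: a quick diagnostic for skew or outliers in any political dataset.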
A.5 Distributions: Normal, Skewed, and What They Mean
A distribution shows how values of a variable are spread across their possible range. When you see a histogram of presidential approval ratings across all survey respondents, you are visualizing a distribution.
The normal distribution — the famous bell curve — is symmetric, with most values clustered around the mean and fewer values as you move away from the mean in either direction. Many naturally occurring quantities approximate normality. The normal distribution is foundational to classical statistics because it has predictable mathematical properties: in a normal distribution, 68% of observations fall within one standard deviation of the mean, 95% within two standard deviations, and 99.7% within three.
Political variables are often not normally distributed. Vote margins in congressional elections are frequently right-skewed (a long tail to the right): most incumbents win by comfortable margins, but a few win by enormous landslides. Campaign spending is dramatically right-skewed: most candidates spend modest amounts, but a few spend tens of millions. In right-skewed distributions, the mean is higher than the median because the extreme high values pull the mean upward.
Left-skewed distributions have a long tail to the left, meaning most values are high but a few are very low. Presidential approval ratings for popular presidents during crises can be left-skewed.
When data are skewed, researchers often apply transformations (like taking the natural log of spending) to make the distribution more symmetrical and statistical assumptions more defensible. Chapters 6 and 11 use log transformation for campaign expenditure data for exactly this reason.
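The effect of a log transformation can be seen with invented spending figures that mimic the right skew described above: the raw mean sits far above the median, and logging pulls them much closer together.

```python
import math
import statistics

# Hypothetical campaign spending in dollars: heavily right-skewed,
# with one multimillion-dollar race.
spending = [40_000, 55_000, 60_000, 75_000, 90_000, 120_000, 5_000_000]

# In a right-skewed distribution the mean sits far above the median
print(statistics.mean(spending), statistics.median(spending))

# Log-transforming compresses the long right tail
logged = [math.log(x) for x in spending]
print(round(statistics.mean(logged) - statistics.median(logged), 2))
```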
Bimodal distributions have two peaks. Public opinion on highly polarized issues often looks bimodal: many people hold strong opinions on one side or the other, with fewer moderates in the middle. A bimodal distribution of ideology in the American electorate would be consistent with polarization arguments explored in Part IV.
A.6 Probability Basics: p-Values, Confidence Intervals, and Margins of Error
These three concepts are the most commonly misunderstood in political analysis and the most consequential to get right.
Probability is a number between 0 and 1 expressing how likely an event is to occur under a specified set of conditions. A probability of 0 means the event cannot happen; 1 means it is certain. A fair coin has a 0.5 probability of landing heads on any given flip.
The p-value appears constantly in political science research. It is formally defined as: the probability of observing a result at least as extreme as the one found, assuming the null hypothesis is true. The null hypothesis is typically the boring claim that there is no real effect — no difference between groups, no relationship between variables.
When a study of voter contact reports a p-value of 0.03 for the effect of canvassing on turnout, it means: if canvassing had no real effect on turnout, we would see a result this large (or larger) in only 3% of studies due to random sampling variation alone. Because 3% is low, we doubt the null hypothesis and provisionally accept that canvassing has an effect.
The conventional threshold is p < 0.05, a historical convention set by statistician Ronald Fisher that has no deep theoretical basis. A finding with p = 0.04 is statistically significant by this standard; p = 0.06 is not — but the underlying evidence is nearly identical. The p-value does NOT tell you the probability that the null hypothesis is true. It does NOT tell you the size of the effect. It does NOT tell you that your finding will replicate. These misinterpretations are extremely common in political journalism.
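The definition of a p-value can be made concrete by simulation. The sketch below (hypothetical numbers: 1,000 voters per group, an observed 4-point gap) asks how often pure chance produces a gap at least that large when the null hypothesis is true:

```python
import random

random.seed(7)

# Simulating the null hypothesis: canvassing has NO effect and everyone
# votes with probability 0.5. How often does chance alone produce a
# turnout gap at least as large as a hypothetical observed 4 points?
n_per_group, observed_gap, trials = 1_000, 0.04, 2_000
extreme = 0
for _ in range(trials):
    treat = sum(random.random() < 0.5 for _ in range(n_per_group)) / n_per_group
    control = sum(random.random() < 0.5 for _ in range(n_per_group)) / n_per_group
    if abs(treat - control) >= observed_gap:
        extreme += 1

p_value = extreme / trials
print(f"simulated two-sided p-value: {p_value:.3f}")
```

With these particular numbers the simulated p-value lands above the conventional 0.05 line, so a 4-point gap in samples this size would not count as statistically significant, despite being substantively meaningful in a close race.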
Confidence intervals express the uncertainty around an estimate. If a poll finds that 47% of likely voters support Candidate A and reports a 95% confidence interval of [44%, 50%], the interval was constructed using a procedure that, if repeated across many samples, would capture the true population value 95% of the time. A common shorthand: we are "95% confident" the true value lies between 44% and 50%.
This does NOT mean there is a 95% probability the true value is in that specific interval — the true value either is in there or it isn't. The 95% refers to the long-run performance of the method. This distinction matters philosophically but is a fine point for most practical purposes.
Margin of error in polling is typically the half-width of the 95% confidence interval for a proportion. A poll with a 3-point margin of error means the confidence interval extends 3 points above and below the reported percentage. A candidate reported at 47% with a 3-point margin of error could plausibly be anywhere from 44% to 50%. A race where the two candidates are within the margin of error is not "statistically tied"; the poll still carries information about who is more likely ahead, but the race is genuinely too close to call with confidence.
The margin of error reported by polling firms usually addresses only sampling error — the randomness introduced by surveying a sample rather than the whole population. It does not account for nonresponse bias, question wording effects, or weighting errors. Chapter 8 covers these additional sources of polling error in detail.
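The standard formula behind a poll's reported margin of error is a one-liner. The function below computes the sampling-error-only half-width discussed above, using the running 47%-of-1,000-voters example:

```python
import math

def margin_of_error(p, n, z=1.96):
    """Half-width of the 95% confidence interval for a proportion.

    Captures sampling error only, as the text notes.
    """
    return z * math.sqrt(p * (1 - p) / n)

# The running example: 47% support in a poll of 1,000 likely voters
moe = margin_of_error(0.47, 1000)
print(f"margin of error: {moe * 100:.1f} points")  # about 3.1 points
```

Note how the formula depends on n: quadrupling the sample only halves the margin of error, which is why precision in polling is expensive.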
A.7 Correlation and Regression
Correlation measures the strength and direction of the linear relationship between two variables. The Pearson correlation coefficient (r) ranges from -1 to +1. A correlation of +1 means the two variables move in perfect lockstep in the same direction. A correlation of -1 means they move in perfect lockstep in opposite directions. A correlation of 0 means no linear relationship exists.
In political science, a correlation of 0.70 between a county's college-education rate and its Democratic vote share would be considered strong. Correlations of 0.30–0.40 between variables measured with error across hundreds of observations might be theoretically important. There is no universal cutoff for "strong" correlation — context matters.
Regression goes further: it estimates the equation of the line that best fits the data and allows you to predict the dependent variable from one or more independent variables while controlling for others. In simple linear regression, you estimate: DV = a + b(IV) + error. The coefficient b tells you how much the DV is expected to change for each one-unit increase in the IV. In multiple regression, you include several IVs simultaneously, and each coefficient reflects the variable's relationship with the DV holding all other variables constant.
Regression is the workhorse method in Chapters 6, 11, 14, and 19. When a regression analysis says that each percentage-point increase in unemployment is associated with a 1.8-point decrease in presidential approval, that coefficient comes from a regression model that almost certainly controls for several other variables thought to affect approval.
What correlation and regression are NOT: they are not proof of causation. A high correlation between ice cream consumption and violent crime across cities does not mean ice cream causes violence (both rise in hot weather). A regression coefficient is not a causal estimate unless the research design supports causal inference — which requires either random assignment, a natural experiment, or strong theoretical and statistical arguments about why confounders are not driving the result.
A.8 Causation vs. Correlation: Four Political Examples
Recognizing spurious correlations is a core skill for political analysts. Four examples drawn from actual debates in political science illustrate the challenge.
Example 1: Storks and babies. European nations with higher stork populations have higher birth rates. Neither causes the other: both correlate with rural land area and population density. The political analogue: states with more gun stores per capita have higher gun death rates. A naive analysis might conclude gun stores cause deaths. But both variables correlate with rurality, poverty, and population density. This doesn't mean no relationship exists — it means the relationship is more complicated than the raw correlation suggests.
Example 2: Education and Democratic vote share. Since 2016, college education and Democratic voting have been strongly correlated at the county level. Does education cause Democratic voting? The relationship is partly causal — education affects values and social networks — but also partly a sorting process in which Democrats increasingly live in high-education areas and Republicans in lower-education areas. Ecological correlations (measured at the county level) should not be read as individual-level causal effects.
Example 3: Economic conditions and presidential approval. Approval falls when unemployment rises. This correlation is well-documented and plausibly causal: voters punish presidents for bad economies. But is the mechanism retrospective accountability (voters judge past performance) or prospective calculation (voters anticipate future conditions)? The correlation does not resolve this. And presidents inherit economic conditions from their predecessors — causation runs partly backward.
Example 4: Social media use and polarization. Cross-sectional surveys show heavy social media users are more politically polarized. Does social media cause polarization? Possibly — but highly polarized people may be more drawn to political content on social media. The causal direction is ambiguous, and longitudinal experiments show smaller effects than cross-sectional correlations suggest (see Chapter 22 and the Guess et al. 2019 discussion in Appendix D).
A.9 Statistical Significance: What It Means and the Common Misinterpretation
A finding is statistically significant if the p-value falls below the chosen threshold (conventionally 0.05). It means: this result is unlikely to be due to sampling chance alone, given our assumptions.
Statistical significance does NOT mean:
- The finding is large or important
- The finding will replicate
- The hypothesis being tested is true
- The result is "real" in any deep sense
A study of 2 million voters can find statistically significant differences so tiny they are practically meaningless — large samples detect even trivially small effects. A study of 50 participants might find practically large differences that fail to reach significance because the sample is too small to detect them reliably. Statistical significance is partly a function of sample size. This is why Part VI of this textbook consistently reports effect sizes alongside significance levels.
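The sample-size dependence can be demonstrated directly. The sketch below applies a standard two-proportion z statistic to the same tiny 1-point turnout effect at two different sample sizes (both invented):

```python
import math

def two_prop_z(p1, p2, n_per_group):
    """z statistic for a difference of two proportions (equal group sizes)."""
    pooled = (p1 + p2) / 2
    se = math.sqrt(2 * pooled * (1 - pooled) / n_per_group)
    return (p1 - p2) / se

# The same 1-point effect (51% vs 50% turnout) at two sample sizes
print(f"n = 500:       z = {two_prop_z(0.51, 0.50, 500):.2f}")
print(f"n = 1,000,000: z = {two_prop_z(0.51, 0.50, 1_000_000):.2f}")
# Only the huge sample clears the |z| > 1.96 significance bar,
# though the effect size is identical in both cases.
```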
The American Statistical Association's 2016 statement on p-values explicitly warned against using "p < 0.05" as a binary bright line for scientific conclusions. Political analysts who report only significance and ignore effect sizes, confidence intervals, and replication evidence are doing their readers a disservice.
A.10 Effect Size
Effect size is a standardized measure of how large an effect is, independent of sample size. The most common effect size measure for comparing two groups is Cohen's d — the difference between group means divided by the pooled standard deviation.
Cohen's rough guidelines: d = 0.2 is small, d = 0.5 is medium, d = 0.8 is large. For proportions, the difference between two percentages is itself an effect size (a treatment that raises turnout from 50% to 52% has a 2-percentage-point effect). For regression, the R-squared statistic (proportion of variance explained) is a common effect size measure.
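Cohen's d is simple to compute by hand. The sketch below uses invented political-knowledge scores for treated and control respondents; only the formula itself (mean difference over pooled standard deviation) comes from the text:

```python
import statistics

def cohens_d(group_a, group_b):
    """Difference in means divided by the pooled standard deviation."""
    na, nb = len(group_a), len(group_b)
    va = statistics.variance(group_a)   # sample variances
    vb = statistics.variance(group_b)
    pooled_sd = (((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)) ** 0.5
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd

# Hypothetical political-knowledge scores, treated vs. control respondents
treated = [62, 70, 68, 75, 71, 66, 73, 69]
control = [60, 64, 58, 67, 63, 61, 66, 62]
print(f"Cohen's d = {cohens_d(treated, control):.2f}")
```

The result here is well above 0.8, a "large" effect by Cohen's guidelines, though with invented data that magnitude carries no substantive meaning.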
In political contexts, even small effects can be practically important when elections are decided by margins of thousands of votes. A door-to-door canvassing effort that raises turnout by 1.5 percentage points among 100,000 targeted voters produces 1,500 additional votes. In a close congressional race, that is decisive. Effect size matters in absolute terms, not just relative ones.
A.11 Survey Methodology Overview
Surveys are the primary data collection tool for measuring public opinion, voter behavior, and political attitudes. Chapters 7 through 9 cover survey methodology in depth; this section provides orientation.
A survey is a structured set of questions administered to a sample of respondents. The goal is to produce estimates that generalize to the population of interest. Key concepts:
Sampling frame: the list from which respondents are drawn. Registered voter lists, random digit dialing pools, and online panel rosters are common frames. If the frame excludes part of the population (cell-phone-only households were missing from early 21st-century landline surveys), estimates can be biased.
Response rate: the proportion of contacted individuals who complete the survey. Falling response rates — from 70–80% in the 1970s to under 10% for many phone surveys today — are a major challenge. Low response rates can introduce bias if nonrespondents differ systematically from respondents.
Weighting: statistical adjustment that makes the sample resemble the target population on key demographics. A sample with too many college graduates relative to the population will be weighted down on education. Weighting corrects known imbalances but cannot correct for unknown ones.
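The arithmetic of weighting can be sketched in miniature. All numbers below are invented: a six-person sample over-represents college graduates, and dividing the population share by the sample share yields each group's weight:

```python
# Minimal post-stratification sketch with invented numbers: reweight a
# sample that over-represents college graduates to match known population
# shares. Each record is (education group, supports policy: 1/0).
sample = [("college", 1), ("college", 1), ("college", 0), ("college", 1),
          ("no_college", 0), ("no_college", 1)]

pop_share = {"college": 0.40, "no_college": 0.60}     # hypothetical population
samp_share = {"college": 4 / 6, "no_college": 2 / 6}  # shares in this sample
weights = {g: pop_share[g] / samp_share[g] for g in pop_share}

raw = sum(y for _, y in sample) / len(sample)
weighted = (sum(weights[g] * y for g, y in sample)
            / sum(weights[g] for g, _ in sample))
print(f"raw support: {raw:.2f}, weighted support: {weighted:.2f}")
# Down-weighting the over-sampled (and more supportive) graduates
# lowers the estimate.
```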
Question wording effects: the exact phrasing of a question shapes responses. "Should the government do more to regulate firearms?" produces different results than "Should the government infringe on citizens' Second Amendment rights?" Comparing responses across differently worded surveys is problematic.
A.12 Validity and Reliability
Reliability refers to consistency. A measure is reliable if it produces the same result when applied repeatedly to the same thing under the same conditions. A bathroom scale that gives you a different weight each time you step on it within a minute is unreliable. A party identification question that gives stable answers across multiple administrations to the same respondents is reliable.
Validity refers to whether a measure actually captures what it claims to measure. A question that asks "Do you consider yourself a strong Democrat, weak Democrat, Independent leaning Democrat, Independent, Independent leaning Republican, weak Republican, or strong Republican?" has high face validity as a measure of party identification. Using the number of yard signs in a neighborhood to measure partisan commitment is a more questionable measure — it might be picking up something real but also wealth, HOA rules, and social pressure.
A measure can be reliable without being valid (a broken thermometer consistently reads 72 degrees — reliable, but not valid). A valid measure will typically also be reliable, though measurement error can reduce reliability.
In political research, construct validity — whether the operationalization captures the theoretical concept — is the most debated form of validity. Does "political ideology" measured by self-placement on a 7-point liberal-conservative scale capture the multidimensional ideological space that political theorists describe? Probably not fully. This tension between measurable proxies and unmeasurable constructs runs throughout quantitative political science.
A.13 Common Logical Fallacies in Political Data Analysis
The Ecological Fallacy occurs when conclusions about individuals are drawn from group-level data. If states with higher median income voted more Democratic in 2020, you cannot conclude that wealthy individuals voted more Democratic. In fact, throughout most of the 20th century, higher-income individuals were more likely to vote Republican even while higher-income states were more competitive. The relationship at the state level reflected geographic sorting, not individual-level income effects. Political journalists frequently commit the ecological fallacy when analyzing county-level election results.
Base Rate Neglect occurs when highly salient specific information overwhelms statistically more informative base rate information. If you read that a political operative who worked on a failed campaign is being considered for a senior White House role, you might be skeptical. But the relevant comparison is the base rate: what proportion of political operatives who worked on failed campaigns go on to senior government positions? If that base rate is 40%, your skepticism may be misplaced. In election forecasting, base rate neglect leads analysts to overweight dramatic recent events relative to the structural fundamentals (economic conditions, incumbency) that historically predict outcomes with high accuracy.
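Bayes' rule shows why the base rate dominates. The probabilities below are invented for illustration: even if the "red flag" (a failed campaign) is twice as common among operatives who later flop, a 40% base rate of success keeps the posterior well away from zero.

```python
# Bayes' rule applied to the operative example, with invented numbers.
base_rate = 0.40                 # P(success): the hypothetical base rate
p_flag_given_success = 0.30      # P(red flag | success): assumed
p_flag_given_failure = 0.60      # P(red flag | failure): assumed

p_flag = (p_flag_given_success * base_rate
          + p_flag_given_failure * (1 - base_rate))
posterior = p_flag_given_success * base_rate / p_flag
print(f"P(success | red flag) = {posterior:.2f}")   # 0.25, not near zero
```

Ignoring the base rate and reasoning only from the vivid red flag would suggest near-certain failure; the full calculation says one in four such operatives still succeeds under these assumptions.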
Selection Bias occurs when the sample is not representative of the population you want to make inferences about. Analyzing Twitter discourse to understand public political opinion selects heavily toward younger, more educated, more politically engaged, and more ideologically extreme individuals. Conclusions from that analysis do not generalize to the electorate. Similarly, studying political violence by analyzing documented cases will oversample dramatic, visible incidents and undersample lower-level intimidation and coercion. Congressional hearings on politically sensitive topics attract witnesses who are unrepresentative of the broader expert or stakeholder community.
Confirmation Bias is not strictly a statistical fallacy but a cognitive error that shapes which data get collected, how they are analyzed, and how results are interpreted. Political analysts — and their funders — are not immune. A campaign's internal polling firm may, consciously or not, use question wordings and weighting schemes that favor their client. Being aware of who commissioned a study and what incentives shaped its design is essential critical thinking for political data consumers.
A.14 A Note on Learning Statistics Through Political Examples
The methods in this primer are not inherently political. Regression, significance testing, and survey weighting are general-purpose tools used in medicine, economics, and psychology. They come into this textbook carrying no ideological valence. A well-designed study finding that minimum wage increases have small employment effects is not "liberal." A well-designed study finding the opposite is not "conservative." What distinguishes good political analysis from propaganda is not the conclusion but the transparent, honest application of sound methods.
The goal of this textbook is to make you a better reader of political evidence — able to identify strong studies from weak ones, to understand what a finding does and does not demonstrate, and to build your own analyses on solid methodological ground. This appendix is the foundation. Everything else builds on it.
For deeper study of research methods in political science, see: King, Keohane, and Verba, Designing Social Inquiry (1994); Imai, Quantitative Social Science (2022); and the APSA Data Access and Research Transparency guidelines at www.dartstatement.org.