Appendix G: Glossary of Key Terms in Political Analytics
How to Use This Glossary: Terms are arranged alphabetically. Cross-references to related terms appear in italics. Terms that are defined by formal AAPOR or statistical standards are noted with [AAPOR] or [STAT]. Chapter references indicate where each concept is developed in depth.
A
AAPOR — American Association for Public Opinion Research. The primary professional organization for survey researchers in the United States. AAPOR publishes standards for reporting polling methodology (the Transparency Initiative), definitions of response rates, and a code of professional ethics. Its post-election evaluations (1948, 2016, 2020) are the field's primary self-assessment mechanisms. See also: response rate, transparency.
Acquiescence bias — The tendency of survey respondents to agree with any proposition presented to them, regardless of its content. Respondents may agree with "The government should do more to reduce crime" and also agree with "The government should stay out of crime-fighting and leave it to local communities." Acquiescence bias is a form of response set that inflates support for whichever option is presented as the "agree" choice. It is minimized by using forced-choice formats or by balancing agree/disagree framings. See also: response set, social desirability bias.
Affective polarization — The tendency of partisans to view members of their own party favorably and members of the opposing party with hostility, disgust, or contempt — independent of policy disagreements. Measured directly via feeling thermometers and in-group/out-group difference scores. Distinguished from ideological polarization (divergence in policy positions): the U.S. has experienced far greater growth in affective polarization than in ideological distance between the median Republican and Democrat. See also: negative partisanship, ideological sorting, polarization. (Ch. 3, 33)
Aggregate bias — A systematic error that is consistent in direction across many polls or measurements, causing the aggregate of polls to be biased rather than merely variable. Distinguished from random error, which cancels out in averages. Aggregate bias persists even in large samples if the underlying methodology is systematically flawed. The Literary Digest error of 1936 and the 2020 polling errors are canonical examples of aggregate bias. See also: house effect, partisan nonresponse, coverage bias.
Anchoring effect — A cognitive bias in which respondents' answers are influenced by a reference point (the "anchor") presented in the question or preceding questions. In political surveys, a question presenting a specific policy amount (e.g., "Should the minimum wage be raised to $15?") anchors subsequent judgments about wage policy. See also: framing effect, question order effect.
Area probability sampling — A form of probability sampling in which geographic areas are sampled at the first stage, then households within selected areas, then individuals within selected households. Used primarily in face-to-face surveys. Avoids the need for a complete sampling frame of individuals. See also: probability sample, cluster sampling, stratified sampling.
B
Ballot test — A survey question that simulates the act of voting by presenting respondents with the names and offices of candidates and asking how they would vote if the election were held today. The most common ballot test format is the "head-to-head" ballot test between two candidates. Also called the "horse race question." The wording and order of candidates listed can influence results significantly.
Base rate — The unconditional probability of an event occurring, before any case-specific information is applied. In political analytics, the base rate for an incumbent seeking reelection, or for the party holding the presidency in a midterm, provides the prior probability in a Bayesian analysis before polls are incorporated. Ignoring base rates produces the "base rate fallacy" — overweighting specific case information relative to the underlying rate. See also: Bayesian updating, prior probability, fundamentals model. (Ch. 17)
Bayesian updating — The process of revising probability estimates in light of new evidence according to Bayes' theorem. A prior probability (based on historical base rates or structural fundamentals) is updated using the likelihood of observing the new evidence under each possible state of the world, producing a posterior probability. Election forecasters use Bayesian updating to combine structural models with polls as they accumulate during the campaign. See also: base rate, prior probability, fundamentals model. (Ch. 17, 21)
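The updating rule itself is short enough to sketch in code. The prior and likelihoods below are illustrative, not drawn from any actual forecast:

```python
# Bayes' theorem for a binary hypothesis (e.g., "the incumbent wins").
def posterior(prior, p_evidence_if_true, p_evidence_if_false):
    # P(H | E) = P(E | H) P(H) / [P(E | H) P(H) + P(E | not-H) P(not-H)]
    evidence = prior * p_evidence_if_true + (1 - prior) * p_evidence_if_false
    return prior * p_evidence_if_true / evidence

# Prior from a fundamentals model: incumbent wins with probability 0.60.
# A new poll shows the incumbent ahead; suppose such a poll appears 80% of
# the time when the incumbent is truly winning and 30% of the time otherwise.
updated = posterior(0.60, 0.80, 0.30)
print(round(updated, 3))  # 0.8
```

Forecasters apply this repeatedly as polls accumulate: each posterior becomes the prior for the next round of evidence.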
Battleground state — A state in which neither major party has a reliable advantage in presidential elections, making it competitive and therefore a primary target of campaign resources. Also called "swing state." The definition of battleground states shifts over time as partisan composition and demographics change. States like Wisconsin, Pennsylvania, Michigan, Arizona, Georgia, and Nevada have been battlegrounds in recent elections.
Benchmark poll — A comprehensive baseline survey conducted at the outset of a campaign, before significant public communication has occurred. Benchmark polls measure candidate name recognition, initial vote share, attribute ratings, issue salience, and opponent vulnerabilities. They establish the baseline against which subsequent polls are measured. See also: tracking poll.
Blocking (experimental design) — A procedure in which experimental subjects are divided into homogeneous groups (blocks) before random assignment, ensuring that treatment and control groups are balanced on known characteristics. In political experiments, common blocking variables include party affiliation, prior vote history, geography, and demographics. See also: randomized controlled trial, stratified sampling.
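Blocked random assignment can be sketched in a few lines. The subjects and blocking variable below are invented for illustration:

```python
import random

def blocked_assignment(subjects, block_key, seed=0):
    """Shuffle within each block, then split each block evenly between
    treatment and control, so the arms are balanced on the blocking variable."""
    rng = random.Random(seed)
    blocks = {}
    for s in subjects:
        blocks.setdefault(block_key(s), []).append(s)
    assignment = {}
    for members in blocks.values():
        rng.shuffle(members)
        half = len(members) // 2
        for i, s in enumerate(members):
            assignment[s["id"]] = "treatment" if i < half else "control"
    return assignment

# Hypothetical voters blocked on party affiliation
voters = [{"id": i, "party": "D" if i % 2 else "R"} for i in range(20)]
assign = blocked_assignment(voters, lambda s: s["party"])
# exactly 5 Democrats and 5 Republicans land in each arm, by construction
```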
Boomerang effect — The tendency for persuasive messages to produce attitude change in the opposite direction from what was intended, particularly when the message is perceived as heavy-handed or the recipient has strong prior commitments. Campaign negative ads can produce boomerang effects when they are perceived as unfair or excessive. See also: reactance, backfire effect.
C
Call-back — The practice of attempting to contact a selected household at multiple times of day and on multiple days before giving up. Call-back protocols are critical for telephone surveys: households that are hard to reach (because residents work long hours, travel frequently, or are otherwise busy) tend to have different characteristics from households reached on the first attempt. See also: response rate, non-response bias.
Causal inference — The process of determining whether a relationship between two variables is causal (X causes Y) rather than merely correlational. In political analytics, causal inference requires either random assignment (as in randomized controlled trials) or quasi-experimental designs that approximate random assignment (difference-in-differences, regression discontinuity, instrumental variables). See also: randomized controlled trial, confound. (Ch. 5)
Cell phone problem — The methodological challenge created by the shift of American households from landline to cell phone usage. Telephone Consumer Protection Act provisions prohibit automated dialing to cell phones, requiring human interviewers (higher cost). Cell phone numbers cannot be localized geographically. Cell-only households are disproportionately young and mobile. Pollsters who rely exclusively on landline samples systematically undersample these populations. See also: coverage bias, sampling frame. (Ch. 10)
Cluster sampling — A sampling procedure in which the population is divided into naturally occurring clusters (e.g., counties, precincts, households), a random sample of clusters is selected, and all or a random subset of units within each selected cluster are interviewed. Cluster sampling is operationally efficient but produces estimates with higher sampling error than a simple random sample of equivalent size; the inflation is quantified by the design effect, which grows with the intraclass correlation (ICC). See also: intraclass correlation, stratified sampling, area probability sampling. [STAT]
Confidence interval — A range of values computed from sample data within which the true population parameter is estimated to fall with a specified probability (usually 90%, 95%, or 99%). A 95% confidence interval means that if the sampling procedure were repeated many times, 95% of the resulting intervals would contain the true population value. In polls, the margin of error is approximately the half-width of a 95% confidence interval. See also: margin of error, sampling error. [STAT] (Ch. 8)
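For a sample proportion, the standard normal-approximation interval is easy to compute directly; the poll figures below are illustrative:

```python
import math

def proportion_ci(p_hat, n, z=1.96):
    """Normal-approximation confidence interval for a sample proportion
    (z = 1.96 gives the conventional 95% level)."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se

# A poll of 1,000 respondents showing a candidate at 52%:
lo, hi = proportion_ci(0.52, 1000)
# half-width is about 0.031 -- the familiar "plus or minus 3 points"
```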
Confound (also confounding variable, confounder) — A variable that is associated with both the independent variable (treatment) and the dependent variable (outcome), creating a spurious apparent relationship between them. In non-experimental research, confounds make causal inference difficult. For example, income is a confound in the apparent relationship between education and Democratic vote share: education and income are correlated, and both are related to vote choice. See also: causal inference, randomized controlled trial. [STAT]
Coverage bias — Error arising when members of the target population have no chance of being selected because they are not included in the sampling frame. Examples include online polls that exclude people without internet access, telephone polls that exclude people without phones, and polls using listed telephone directories that exclude unlisted numbers. See also: sampling frame, cell phone problem, non-probability sample. (Ch. 10)
Cross-tabulation (also crosstab) — A table that displays the joint distribution of two or more variables, showing how the distribution of one variable differs across categories of another. For example, a crosstab of candidate preference by gender shows what percentage of men and women favor each candidate. The workhorse of descriptive survey analysis. Statistical tests (chi-square) assess whether observed differences across groups are likely to be due to chance. (Ch. 9)
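The chi-square test on a 2×2 crosstab can be sketched without any statistics library. The counts below are invented; the statistic is compared against 3.841, the 5% critical value for one degree of freedom:

```python
def chi_square_2x2(table):
    """Pearson chi-square statistic for a 2x2 crosstab given as
    [[a, b], [c, d]] (rows = groups, columns = responses)."""
    row_totals = [sum(r) for r in table]
    col_totals = [sum(c) for c in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    return stat

# Hypothetical crosstab: candidate preference by gender
table = [[260, 240],   # men:   candidate A, candidate B
         [300, 200]]   # women: candidate A, candidate B
stat = chi_square_2x2(table)
# stat is about 6.49, above the 3.841 cutoff, so this gender gap is
# unlikely to be due to sampling chance alone
```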
Cue-taking — The process by which citizens rely on information shortcuts (cues) from trusted sources — parties, interest groups, candidates, or peers — to make political judgments without processing detailed policy information. Party identification is the most powerful political cue in American politics. Elite cues can rapidly shift mass opinion on issues where citizens have weak prior views. See also: party identification, heuristic.
D
Data merge — The process of combining records from two or more datasets using a common identifier (often a voter file record ID, or a combination of name, date of birth, and address). Merging commercial consumer data, voter file data, and survey data is the foundation of voter targeting. The quality of the merge — what fraction of records match successfully — significantly affects the quality of targeting models. See also: voter file, microtargeting.
Demographic determinism — The error of assuming that an individual's political preferences can be reliably predicted from their demographic characteristics (race, gender, education, age) alone. While demographics are powerful predictors at the aggregate level, individual-level prediction is far more uncertain. Demographic determinism in targeting — assuming all members of a demographic group share political views — leads to wasted resources and missed opportunities. See also: ecological fallacy, microtargeting. (Ch. 13)
Differential response (also differential nonresponse, partisan nonresponse) — The phenomenon in which the probability of responding to a survey differs systematically between partisan groups, creating bias in polling estimates. If, during a given period, Republicans are less likely to respond to surveys than Democrats (or vice versa), polls will systematically over- or under-estimate support for each party even when demographic weighting is applied. Identified as a primary cause of the 2020 polling errors. See also: non-response bias, partisan nonresponse, response rate. (Ch. 10)
Divergence framing — A news framing strategy that emphasizes conflict, disagreement, and partisan division in coverage of political events. Divergence framing can amplify the appearance of polarization beyond what actually exists in public opinion. Contrasted with consensus framing (emphasizing agreement) or complication framing (emphasizing nuance). See also: framing effect, emphasis framing. (Ch. 24)
Down-ballot — Referring to candidates or measures appearing below the top-of-ticket (presidential or gubernatorial) races on the ballot. Down-ballot races (state legislature, local office, ballot measures) typically receive less media attention, lower voter familiarity, and substantial coattail effects from the top of the ticket. Analytics methods for down-ballot races often rely more heavily on ecological inference due to limited polling.
E
Ecological fallacy — The error of inferring individual-level relationships from aggregate-level data. For example, if precincts with more college-educated voters tend to be more Democratic, it does not necessarily follow that college-educated individuals within those precincts are more Democratic — there could be other explanatory variables operating at the individual level. See also: ecological inference, demographic determinism. (Ch. 14)
Ecological inference — Statistical methods for estimating individual-level relationships from aggregate data. Used extensively in political science where individual-level data (surveys) are unavailable or unreliable, particularly for historical elections or in international contexts. Associated with Gary King's EI method. See also: ecological fallacy, aggregate bias. (Ch. 14)
Emphasis framing — A type of framing effect in which a particular attribute, consideration, or value is made more salient in the presentation of an issue, affecting how audiences evaluate it, without changing the underlying facts. "Emphasizing" the economic cost of immigration policy versus its cultural effects, or the security benefits versus the civil liberties costs of surveillance, produces different attitude outcomes. See also: framing effect, priming. (Ch. 23)
Equivalence framing — Presenting logically equivalent information in different surface forms to demonstrate that framing shapes judgment. The classic example: "95% survival rate" versus "5% death rate" for a medical treatment produce different evaluations despite being identical statements. In politics: "97 out of 100 scientists agree" versus "3% of scientists disagree." See also: framing effect.
Exit poll — A survey conducted at polling places on Election Day in which voters who have just cast their ballots are asked to complete a questionnaire about their vote choices, demographic characteristics, and issue priorities. In the United States, conducted by Edison Research for the National Election Pool (ABC, CBS, CNN, and NBC; Fox News and the AP left the consortium after 2016 in favor of AP VoteCast). Used for Election Night projections and post-election analysis of voter demographics. Subject to biases including differential participation by partisan group, early voting exclusion, and differential rates of in-person versus mail voting. See also: election night projection, differential response. (App. E)
F
Feeling thermometer — A survey question that asks respondents to rate their warmth or coolness toward a person, group, or institution on a scale from 0° (very cold/negative) to 100° (very warm/positive). Used extensively in the American National Election Studies (ANES). Feeling thermometers capture affective (emotional) evaluations as distinct from cognitive (attribute-based) assessments. Affective polarization is commonly measured as the gap between in-group and out-group thermometer ratings. See also: affective polarization, ANES. (Ch. 3)
Focus group — A small-group qualitative research method in which a moderator facilitates discussion among 6–12 participants on a defined topic (a candidate, a message, an advertisement, a policy position). Focus groups generate hypotheses and reveal the language people use to think about issues; they are not representative of any population. Results cannot and should not be generalized beyond the participants. See also: qualitative research, message testing. (Ch. 6)
Framing effect — The phenomenon in which the way an issue is presented — the words used, the aspects highlighted, the comparisons invoked — influences the attitudes and judgments respondents express, independent of the underlying facts. Framing effects are among the most robust findings in political communication research. See also: emphasis framing, equivalence framing, priming, divergence framing. (Ch. 23)
Fundamentals model — An election forecasting model that predicts electoral outcomes using pre-campaign structural variables — economic conditions (GDP growth, unemployment, income, presidential approval) and structural factors (incumbency, time-in-office) — without incorporating any polling data. Fundamentals models are based on historical relationships between these variables and electoral outcomes and provide the prior probability against which polling data is evaluated. See also: structural model, Bayesian updating, economic voting. (Ch. 17)
G
Generic ballot — A survey question asking respondents whether they prefer to vote for the "Republican Party candidate" or the "Democratic Party candidate" for Congress, without naming specific candidates. The generic ballot is a widely used indicator of the national political environment and a key input into congressional forecasting models. Because the geographic distribution of partisan voters has favored Republicans in recent cycles, Democrats have typically needed a generic ballot advantage of roughly 3–5 points to win a House majority. See also: fundamentals model, structural model. (Ch. 18)
Geographic Information System (GIS) — Software systems that store, analyze, and visualize spatial data. In political analytics, GIS tools are used for mapping partisan composition by precinct, planning canvassing routes, analyzing redistricting proposals, and visualizing demographic data. ArcGIS and QGIS are the most common platforms. See also: redistricting, microtargeting. (Ch. 29)
GOTV (Get Out the Vote) — Campaign activities designed to increase turnout among voters who already support the campaign. GOTV operations typically focus on high-support, low-propensity voters — those with support scores above a threshold but turnout scores below a threshold — and use phone banking, door knocking, ride-to-polls programs, and absentee ballot assistance. Contrasted with persuasion targeting. See also: support score, turnout model, universe (targeting). (Ch. 29, 30)
H
Herding (polls) — The tendency of polling firms to publish results that cluster around the consensus estimate or around results from prestigious pollsters, suppressing genuine variation in polling results. Herding occurs when pollsters who obtain outlier results adjust their methodology or withhold publication rather than risk being wrong publicly. Produces an artificially narrow range of published results and can cause the polling average to "converge" on an incorrect value. Identified as a contributor to both 1948 and 2016 polling failures. See also: house effect, aggregate bias. (Ch. 10)
House effect — The systematic tendency of a particular polling firm to produce results that favor one party or candidate more than the average of other polls. House effects can arise from methodological choices (likely voter screen, weighting variables, mode of interviewing) or from deliberate decisions. A firm with a consistent 2-point Republican house effect will show the Republican 2 points better than the underlying polling average. Adjusting for house effects improves accuracy of polling averages. See also: herding, aggregate bias. (Ch. 10, 19)
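A first-pass estimate of house effects compares each firm's average margin to the average of all polls. The firms and margins below are invented; production polling averages estimate house effects jointly with a time trend, so this difference-of-means version conveys only the intuition:

```python
from statistics import mean

def house_effects(polls):
    """polls: list of (firm, margin) pairs, where margin = Dem% - Rep%.
    House effect = firm's mean margin minus the overall mean margin."""
    overall = mean(m for _, m in polls)
    by_firm = {}
    for firm, margin in polls:
        by_firm.setdefault(firm, []).append(margin)
    return {firm: mean(ms) - overall for firm, ms in by_firm.items()}

polls = [("A", 4), ("A", 6), ("B", 1), ("B", 3), ("C", 5), ("C", 5)]
effects = house_effects(polls)
# firm B runs about 2 points more Republican than the polling average
```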
I
Ideological sorting — The process by which the two major political parties have become more internally homogeneous ideologically and more distinct from each other. "Sorted" parties mean that liberals are concentrated in the Democratic Party and conservatives in the Republican Party — a development that was much less true in the mid-20th century era of Southern conservative Democrats and Northern liberal Republicans. Sorting is distinct from polarization (the overall spread of opinion) though the two are related. See also: affective polarization, polarization. (Ch. 3)
Index (composite) — A single summary measure created by combining multiple related survey questions or variables. For example, an economic anxiety index might combine questions about job security, financial situation, and economic optimism. Composite indexes reduce the noise in individual questions and capture latent constructs that no single question fully measures. Construction requires decisions about which items to include, how to scale them, and how to weight them. See also: reliability, factor analysis. (Ch. 9)
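One common construction is to z-score each item (so items on different scales contribute equally) and average. The questions and responses below are invented for illustration:

```python
from statistics import mean, pstdev

def composite_index(items):
    """Z-score each item across respondents, then average the z-scores
    for each respondent into a single index value."""
    z_items = []
    for values in items:
        m, sd = mean(values), pstdev(values)
        z_items.append([(v - m) / sd for v in values])
    return [mean(vals) for vals in zip(*z_items)]

# Three hypothetical economic-anxiety questions for five respondents
job_security = [1, 2, 3, 4, 5]
finances     = [2, 2, 3, 4, 4]
optimism     = [1, 3, 3, 3, 5]
index = composite_index([job_security, finances, optimism])
# respondent 0 scores well below average; respondent 2 sits exactly at 0
```

Equal weighting after z-scoring is only one choice; factor-analytic weights are the main alternative.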
Inferential statistics — Statistical methods that draw conclusions about a population from a sample. In survey research, inferential statistics include confidence intervals, margins of error, hypothesis tests (t-tests, chi-square), and regression analysis. Distinguished from descriptive statistics, which summarize the sample itself. Inferential statistics assume that the sample was drawn via a probability-based method; applying them to non-probability samples requires caution and alternative justification. See also: confidence interval, margin of error, probability sample. [STAT]
Information environment — The totality of political information available to citizens through all channels — news media, social media, interpersonal communication, campaign communication, entertainment media. The information environment shapes the salience of issues, the attributes of candidates that citizens use to evaluate them, and the level of political knowledge citizens hold. See also: priming, selective exposure, echo chamber. (Ch. 22)
Instrumental variable — A variable used in observational research to approximate the conditions of a randomized experiment. An instrumental variable is correlated with the treatment (independent variable) but affects the outcome only through the treatment, not through any other path. Valid instruments are rare but powerful tools for causal inference. See also: causal inference, quasi-experimental design. [STAT]
Intraclass correlation (ICC) — A measure of the degree to which units within a cluster (e.g., households within a precinct, precincts within a county) are more similar to each other than to units in other clusters. In cluster sampling, a high ICC means that clusters are very homogeneous, and sampling more clusters rather than more units per cluster will increase precision. High ICC in political data (precincts are politically homogeneous) means that geographically clustered samples are less efficient than simple random samples. See also: cluster sampling. [STAT]
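The efficiency cost shows up as the Kish design effect, deff = 1 + (m − 1) × ICC, where m is the average cluster size. A quick calculation with illustrative numbers:

```python
def design_effect(icc, cluster_size):
    """Kish design effect for cluster sampling: deff = 1 + (m - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc

# Politically homogeneous precincts (illustrative ICC = 0.15),
# 20 interviews per sampled precinct:
deff = design_effect(0.15, 20)      # 3.85
effective_n = 1000 / deff           # ~260: a clustered sample of 1,000 is
                                    # only as precise as an SRS of about 260
```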
L
Likely voter (LV) — A survey respondent classified as likely to cast a ballot in the upcoming election. Likely voter screens attempt to improve election predictions by restricting the sample to people who will actually vote. Common screening criteria include stated intention to vote, past voting history, interest in the election, and knowledge of where to vote. Different LV models produce significantly different results, particularly in primaries and low-salience elections. Contrasted with registered voter (RV) and all adults (AA) samples. See also: turnout model. (Ch. 11)
List experiment (also item count technique) — A survey technique for measuring attitudes on sensitive topics (support for racist policies, illegal behavior, socially stigmatized views) that respondents might not report honestly in direct questions. Respondents are randomly assigned to see either a list of non-sensitive items or the same list plus the sensitive item, and asked to report the total count rather than the specific items. Differences in means between conditions estimate the prevalence of the sensitive attitude. See also: social desirability bias. (Ch. 7)
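The estimator is simply the difference in mean item counts between the two conditions. The responses below are invented:

```python
from statistics import mean

def list_experiment_estimate(control_counts, treatment_counts):
    """The treatment list adds one sensitive item to the control list,
    so the gap in mean counts estimates the share of respondents
    holding the sensitive attitude."""
    return mean(treatment_counts) - mean(control_counts)

# Control list: 4 non-sensitive items; treatment list: the same 4 plus the
# sensitive item. Each number is one respondent's reported total count.
control = [2, 1, 3, 2, 2, 1, 2, 3]
treatment = [2, 3, 3, 2, 3, 2, 2, 3]
estimate = list_experiment_estimate(control, treatment)   # 0.5
```

Because no one reports the sensitive item directly, the technique trades statistical efficiency (larger samples are needed) for honesty.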
Longitudinal survey — A survey design in which the same respondents are interviewed at multiple points in time. Panel surveys are the most common form, following the same individuals across survey waves. Allows measurement of individual-level change (vote switching, opinion change) rather than just aggregate-level trends. Subject to panel attrition (dropouts) and panel conditioning (respondents changing behavior because of being surveyed). Contrasted with cross-sectional surveys, which interview different samples at each time point. See also: panel conditioning, tracking poll. (Ch. 4)
M
Margin of error (MoE) — The range above and below a poll estimate within which the true population value is estimated to fall with a specified probability (usually 95%). For a simple random sample, MoE ≈ 1/√N for a 95% confidence interval, where N is the sample size. A poll of 1,000 respondents has an MoE of approximately ±3 percentage points. The MoE applies to each individual proportion, but the MoE for a difference between two proportions is larger (approximately √2 times larger). Commonly misunderstood as a statement about the maximum possible error rather than the expected range of sampling variability. See also: confidence interval, sampling error. [STAT] (Ch. 8)
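Both quantities in this entry — the single-proportion MoE and the roughly √2-larger MoE for a lead — can be computed directly (illustrative n = 1,000):

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """95% MoE for one proportion; p = 0.5 is the conservative worst case,
    under which the formula reduces to roughly 1 / sqrt(n)."""
    return z * math.sqrt(p * (1 - p) / n)

single = margin_of_error(1000)        # ~0.031, i.e. +/- 3 points
lead = single * math.sqrt(2)          # ~0.044 for a lead (A% - B%)
```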
Mean reversion — The tendency of extreme values to move toward the long-term average over time. In election forecasting, mean reversion is a key concept: polls showing very large leads or deficits typically overstate the eventual margin because some of the observed gap reflects sampling variation and temporary factors that diminish as Election Day approaches. See also: regression to the mean, fundamentals model.
Media market — A geographic area defined by which television stations' signals reach it, used by the Nielsen company to measure television audiences. Media markets are the primary unit for television advertising planning in political campaigns. The United States has approximately 210 media markets, ranging from New York (the largest) to Glendive, Montana (the smallest). Key in campaign resource allocation analysis. (Ch. 28)
Microtargeting — The use of individual-level data to identify specific voters who are likely to respond favorably to specific messages or mobilization efforts, and to target those voters with differentiated communication. Modern microtargeting combines voter file data, commercial consumer data, and predictive models to produce individual-level support scores and persuadability scores. First used at scale by the Bush 2004 campaign. See also: support score, voter file, targeting. (Ch. 29)
Mode effect — The difference in survey results produced by different methods of data collection (telephone vs. online vs. face-to-face vs. text). Mode effects arise because the medium of interviewing affects question interpretation, social desirability pressures, respondent attention, and the populations reached. For example, social desirability bias is stronger in interviewer-administered surveys than in self-administered ones. Understanding mode effects is essential when comparing polls conducted by different methods. See also: social desirability bias, online panel. (Ch. 10)
Model accuracy — A measure of how well a predictive model's outputs match actual outcomes. In political analytics, model accuracy is assessed by comparing predicted probabilities (support scores, turnout probabilities) to actual behavior. Key metrics include Brier score (for probabilistic predictions), AUC-ROC (discrimination), and calibration (are events predicted at 70% probability happening about 70% of the time?). See also: support score, turnout model. (Ch. 16)
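The Brier score is the simplest of these metrics to compute; the predictions and outcomes below are invented:

```python
def brier_score(predictions, outcomes):
    """Mean squared error of probabilistic predictions against 0/1 outcomes.
    Lower is better; always guessing 0.5 scores 0.25."""
    return sum((p - y) ** 2
               for p, y in zip(predictions, outcomes)) / len(predictions)

# Hypothetical turnout probabilities and observed behavior (1 = voted)
probs = [0.9, 0.7, 0.8, 0.3, 0.6]
voted = [1, 1, 1, 0, 0]
score = brier_score(probs, voted)   # 0.118
```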
MRP (Multilevel Regression and Poststratification) — A statistical method for estimating opinion in small geographic areas using data from national surveys. The multilevel regression step models individual opinion as a function of individual-level demographics (age, sex, race, education) and geographic-level characteristics (state partisanship, urbanicity). The poststratification step applies these model estimates to the actual demographic composition of each geographic unit. MRP allows estimation of state or district-level opinion from a national survey and is increasingly used as a polling alternative for small areas. See also: poststratification weighting, regression, small-area estimation. (Ch. 20)
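The poststratification step, taken on its own, is just a population-weighted average of cell-level model estimates. The cells, estimates, and Census counts below are invented:

```python
def poststratify(cell_estimates, cell_populations):
    """Weight model estimates for each demographic cell by that cell's
    share of the area's population."""
    total = sum(cell_populations.values())
    return sum(cell_estimates[cell] * count / total
               for cell, count in cell_populations.items())

# Hypothetical model estimates of support within two education cells
estimates = {"college": 0.62, "non_college": 0.44}
# District A's actual composition (e.g., from Census counts)
district_a = {"college": 30_000, "non_college": 70_000}
support = poststratify(estimates, district_a)   # 0.494
```

Real MRP uses many more cells (age × sex × race × education, crossed with geography) and gets the cell estimates from the multilevel regression; the weighting logic is unchanged.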
N
Negative partisanship — The phenomenon in which partisan loyalty is driven primarily by dislike of the opposing party rather than enthusiasm for one's own party. Voters with negative partisanship vote reliably for their party not because they enthusiastically support its candidates but because they find the opposing party deeply objectionable. Negative partisanship has grown substantially in the U.S. electorate since the 1990s. See also: affective polarization, party identification. (Ch. 3, 33)
Non-probability sample — A sample in which the probability of selection for each member of the population is unknown. Online opt-in panels, self-selected web polls, and convenience samples are non-probability samples. Traditional inferential statistics (margin of error, confidence intervals) do not apply directly to non-probability samples. Alternative methods — propensity score weighting, matched sampling, model-based inference — are used to improve the representativeness of non-probability samples. See also: probability sample, online panel, coverage bias. (Ch. 10)
Non-response bias — Bias arising when people who do not respond to a survey differ systematically from those who do, in ways that affect the measured variable. If politically engaged voters are more likely to respond to political surveys, and political engagement correlates with partisan preference, the resulting bias will affect measured candidate support. Non-response bias cannot be corrected by simply weighting for demographics if the unresponding group differs not just demographically but attitudinally from the responding group. See also: differential response, partisan nonresponse, response rate. (Ch. 10)
O
Online panel — A pool of pre-recruited volunteers who have agreed to complete surveys in exchange for points, cash, or other incentives. Online panels can be surveyed quickly and cheaply, making them attractive for political polling. Because panelists self-select into the panel, they are not probability samples; representativeness requires weighting and modeling. Major online panel providers include Lucid, Dynata, Qualtrics Panels, and YouGov. See also: non-probability sample, matched sampling. (Ch. 10)
Open-ended question — A survey question that allows respondents to answer in their own words rather than selecting from predetermined response options. Open-ended questions capture the range and texture of opinion and reveal the language respondents use naturally. More expensive to analyze (require coding or NLP analysis) than closed-ended questions. Often used in qualitative research and as follow-up to quantitative measures. See also: coding, NLP, qualitative research.
Opposition research (oppo research) — The systematic collection and analysis of information about political opponents, including voting records, public statements, financial disclosures, court records, and past positions. Opposition research forms the evidentiary basis for negative campaign advertising and earned media strategy. In the analytics context, opposition research is increasingly combined with survey testing to identify which vulnerabilities are most persuasive with target audiences. (Ch. 28)
Oversample — The practice of including a larger proportion of a minority group in a survey than exists in the general population, ensuring enough cases for statistically meaningful analysis of that group. Oversamples are common for racial minorities, residents of specific geographic areas, or specific age groups. Results must be weighted back to population proportions before reporting overall estimates. See also: stratified sampling, weighting. (Ch. 8)
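Weighting an oversample back to population proportions is the ratio of population share to sample share; the group sizes below are invented:

```python
def oversample_weights(sample_counts, population_shares):
    """Weight = population share / sample share, so an oversampled group
    is weighted back down in overall estimates."""
    n = sum(sample_counts.values())
    return {group: population_shares[group] / (count / n)
            for group, count in sample_counts.items()}

# 30% of interviews come from a group that is 12% of the population
weights = oversample_weights({"group": 300, "rest": 700},
                             {"group": 0.12, "rest": 0.88})
# weights["group"] is 0.4; weights["rest"] is about 1.257
```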
P
Panel attrition — The dropout of respondents between waves of a longitudinal panel survey. Attrition is typically non-random: lower-engagement respondents, younger respondents, and mobile respondents are more likely to drop out. This creates bias over time as the panel becomes increasingly composed of engaged, stable respondents. See also: longitudinal survey, panel conditioning.
Panel conditioning — The effect of prior survey participation on respondents' subsequent attitudes and behavior. Respondents who have been asked about their voting intentions may become more likely to vote; those asked about issue positions may develop more crystallized views. Panel conditioning is a concern in studies that use the same sample repeatedly. See also: longitudinal survey, panel attrition. (Ch. 4)
Partisan nonresponse — A specific form of differential response in which partisans of one party are less likely to respond to surveys during periods when their political enthusiasm or hostility is activated. When Republicans are politically engaged and resistant to outreach (as some analysts argue was the case in 2020), Republican representation in survey samples drops, producing Democratic-inflated estimates. Correcting for partisan nonresponse is methodologically challenging because partisanship may not be stable enough to use as a weighting variable. See also: differential response, non-response bias. (Ch. 10)
Party identification — A long-standing psychological attachment to a political party that shapes how individuals perceive and evaluate political information. The concept was introduced by Angus Campbell, Philip Converse, Warren Miller, and Donald Stokes in The American Voter (1960). Party identification is typically measured on a 7-point scale from "strong Democrat" through "Independent" to "strong Republican." It is both the strongest single predictor of vote choice and an object of substantial academic debate about its stability, causality, and contemporary meaning. See also: negative partisanship, ideological sorting. (Ch. 2, 3)
Persuadability score — An individual-level model estimate of how responsive a voter is to campaign communication. High persuadability scores indicate voters whose opinions are weakly held, who have considered voting for both parties, or whose demographic and attitudinal profile suggests openness to persuasion. Persuadability scores are used to target persuasion contact in campaigns. See also: support score, microtargeting, universe (targeting). (Ch. 29)
Point estimate — A single value calculated from sample data as the best estimate of a population parameter. In polling, the percentage of respondents who prefer a candidate is the point estimate of that candidate's support in the population. Distinguished from an interval estimate (confidence interval), which expresses uncertainty around the point estimate. See also: confidence interval, margin of error. [STAT]
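The relationship between a point estimate and its interval estimate can be made concrete in a few lines. A minimal sketch, using invented figures (480 supporters out of 1,000 respondents):

```python
import math

def point_and_interval(successes, n, z=1.96):
    """Point estimate of a proportion plus its 95% margin of error (sampling error only)."""
    p_hat = successes / n                         # the point estimate
    moe = z * math.sqrt(p_hat * (1 - p_hat) / n)  # half-width of the interval estimate
    return p_hat, moe

# Illustrative: 480 of 1,000 respondents prefer the candidate
p, moe = point_and_interval(480, 1000)
print(f"point estimate {p:.1%}, interval {p - moe:.1%} to {p + moe:.1%}")
```

The point estimate (48.0%) is the single best guess; the roughly ±3-point interval around it expresses the uncertainty.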
Polarization — The divergence of political attitudes, party coalitions, or governing behavior toward opposite extremes. Political scientists distinguish: (1) mass polarization (whether ordinary citizens' views have become more extreme); (2) elite polarization (documented increase in ideological distance between Republican and Democratic elected officials); (3) affective polarization (partisan hostility regardless of policy distance). Evidence is strong for elite and affective polarization, more contested for mass polarization. See also: affective polarization, ideological sorting, negative partisanship. (Ch. 3, 33)
Polling average — A combined estimate of candidate support calculated by averaging across multiple polls, often with weights based on poll quality, sample size, recency, and pollster track record. Polling averages reduce the influence of any single poll's idiosyncratic error and provide a more reliable signal than individual polls. FiveThirtyEight (founded by Nate Silver), RealClearPolitics, and other aggregators publish polling averages. See also: house effect, herding. (Ch. 19)
Populism — A political ideology or rhetorical style that posits a virtuous, unified "people" in opposition to a corrupt, self-serving "elite." In its thin-centered form, populism is compatible with many different thick ideological programs (left-wing or right-wing), and its specific content depends on who the "people" and the "elite" are defined to be. In political analytics, populism is measured through survey instruments that assess anti-elite sentiment, demand for direct democracy, and belief in popular sovereignty. See also: thin ideology, authoritarian attitudes. (Ch. 33, 34)
Post-stratification weighting — A statistical adjustment applied to survey data to make the sample match known population characteristics. If women are 52% of the adult population but 55% of respondents, women's responses are downweighted to 52% and men's upweighted to 48%. Standard weighting variables include age, sex, race/ethnicity, education, and geographic region. More complex forms include raking (iterative proportional fitting) and MRP (model-based poststratification). See also: raking, MRP, weighting. (Ch. 9, 20)
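The 52%/55% example in the entry can be sketched directly. A minimal illustration; the support percentages below are invented for the example:

```python
# Sample is 55% women but the population is 52% women (the figures from the entry)
sample_share = {"women": 0.55, "men": 0.45}
pop_share    = {"women": 0.52, "men": 0.48}

# Post-stratification weight = population share / sample share
weights = {g: pop_share[g] / sample_share[g] for g in sample_share}

# Invented supports for illustration: 60% of women, 50% of men back the candidate
support = {"women": 0.60, "men": 0.50}
unweighted = sum(sample_share[g] * support[g] for g in sample_share)
weighted   = sum(sample_share[g] * weights[g] * support[g] for g in sample_share)
print(round(unweighted, 3), round(weighted, 3))  # weighted estimate uses the population mix
```

Women are downweighted (0.52/0.55 ≈ 0.945) and men upweighted (0.48/0.45 ≈ 1.067), pulling the estimate from 55.5% to the population-correct 55.2%.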
Priming — The process by which media coverage or campaign communication raises the salience of particular issues, causing citizens to weight those issues more heavily in their political evaluations. If media devote extensive coverage to immigration, voters may evaluate the president more heavily on immigration performance. Priming is related to but distinct from agenda-setting (which issues receive attention) and framing (how issues are characterized). See also: framing effect, emphasis framing, agenda-setting. (Ch. 23)
Probabilistic forecasting — An approach to election prediction that expresses outcomes as probabilities rather than point predictions. Instead of "Candidate X will win," a probabilistic forecast states "Candidate X has a 72% probability of winning." Probabilistic forecasts are more honest about uncertainty, allow readers to reason about scenarios, and can be formally evaluated for calibration after elections. See also: Bayesian updating, calibration, fundamentals model. (Ch. 17, 21)
Probability sample — A sample in which every member of the target population has a known, non-zero probability of being selected. Probability samples are the foundation of classical inferential statistics; their theoretical guarantees (unbiasedness, calculable variance) underpin the margin of error and confidence interval. Simple random samples, stratified random samples, cluster samples, and systematic samples are all forms of probability samples. See also: non-probability sample, stratified sampling, cluster sampling. [AAPOR] (Ch. 8)
Push poll — A type of campaign tactic disguised as a poll in which respondents are not genuinely sampled to measure opinion but are contacted to be exposed to negative (often false or misleading) information about an opponent. "If you knew that Candidate X had done Y, would you still vote for them?" Push polls are not polls — they are phone-based negative advertising. They are condemned by AAPOR and other professional organizations. Distinguished from message testing (which genuinely measures reaction to messages) by their non-representative sample and their intent to influence rather than measure. See also: message testing. (Ch. 10)
Q
Quota sampling — A sampling method in which interviewers are instructed to recruit a specified number of respondents in each demographic category (e.g., 50% women, 30% age 18–34, 40% non-college). Used by Gallup, Roper, and Crossley through 1948. Quota sampling does not require a random mechanism for selection within categories; interviewers fill their quotas however they find convenient. This introduces interviewer selection bias — the systematic tendency to recruit more accessible, cooperative, higher-status respondents within each cell — which led to the failure of the 1948 polls. See also: probability sample, stratified sampling. (App. E)
R
Raking (also iterative proportional fitting) — A mathematical procedure for adjusting survey weights so that the weighted sample simultaneously matches population marginals on multiple demographic variables. Unlike simple cell-weighting (which requires a known joint distribution of all weighting variables), raking only requires known marginal distributions (e.g., the age distribution and the race distribution separately, not the joint age × race distribution). Raking iterates through variables until convergence. Standard practice in commercial polling. See also: post-stratification weighting, weighting. (Ch. 9)
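The iteration is easy to show in miniature. A sketch with a hypothetical 2×2 cross-tab of sample counts and invented population marginals (only the separate age and sex margins are assumed known, as the entry describes):

```python
import numpy as np

def rake(counts, row_targets, col_targets, iters=100):
    """Iterative proportional fitting: rescale a sample cross-tab until its
    row and column margins match known population margins."""
    table = counts.astype(float)
    for _ in range(iters):
        table *= (row_targets / table.sum(axis=1))[:, None]  # fit row margins
        table *= (col_targets / table.sum(axis=0))[None, :]  # fit column margins
    return table

# Hypothetical sample counts: rows = age (18-49, 50+), columns = sex (women, men)
sample = np.array([[200, 150],
                   [350, 300]])
raked = rake(sample, np.array([520.0, 480.0]), np.array([510.0, 490.0]))
print(raked.round(1))  # margins of this table now match both sets of targets
```

Each pass scales rows to the age targets, then columns to the sex targets; the two adjustments converge to a table consistent with both marginals simultaneously.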
Randomized controlled trial (RCT) — An experiment in which participants are randomly assigned to treatment and control conditions, enabling causal inference. In political analytics, RCTs are used to test the effects of canvassing, mailers, advertising, and messaging on vote choice and turnout. The gold standard for causal claims about what works in campaigns. See also: causal inference, A/B testing, GOTV. (Ch. 5, 30)
RAS model — The Receive-Accept-Sample model of attitude formation developed by John Zaller. Individuals receive political messages (Receive), accept messages consistent with their predispositions (Accept), and sample from their considerations when asked for an opinion (Sample). The RAS model explains why educated citizens can hold contradictory views (they receive more messages) and why opinion is malleable (sampling variation produces instability). See also: top of head sampling, considerations. (Ch. 2)
Recall bias — Error arising when respondents inaccurately remember past events or behavior. In political surveys, respondents systematically over-report past voting (a social desirability effect), over-report having voted for the winner (a bandwagon effect), and misremember their own past policy positions to align with current views. Vote recall questions are notoriously unreliable more than one or two elections back. See also: social desirability bias. (Ch. 7)
Redistricting — The process of redrawing legislative district boundaries, typically following each decennial census. Redistricting determines the partisan composition of congressional and state legislative districts and has profound effects on electoral competitiveness. Analytics play a central role in redistricting: GIS mapping, demographic modeling, and partisan performance analysis are all used by legislatures, courts, and advocacy groups. See also: geographic information system, gerrymandering. (Ch. 14)
Regression discontinuity — A quasi-experimental research design that exploits a discontinuity (a threshold or cutoff) in treatment assignment to estimate causal effects. In political science: comparing candidates who barely won an election (received just over 50% of the vote) to those who barely lost. Candidates on either side of the threshold are assumed to be similar in all respects except incumbency, enabling causal inference about incumbency effects. See also: causal inference, quasi-experimental design. [STAT]
Regression to the mean — The statistical phenomenon whereby extreme observations in a sample tend to be followed by less extreme observations on subsequent measurement. In polling, a poll showing an unusually large lead is more likely to be followed by a more moderate result because the initial result contains sampling error. In forecasting, regression to the mean argues against projecting observed extreme margins forward. See also: mean reversion, sampling error. [STAT]
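A quick simulation makes the point; the numbers are invented (a truly tied race, 800-person polls):

```python
import random

random.seed(7)

def poll(true_support=0.50, n=800):
    """One simulated poll: share of n respondents backing a candidate whose true support is 50%."""
    return sum(random.random() < true_support for _ in range(n)) / n

results = [poll() for _ in range(2000)]
# Polls showing an unusually large "lead" (53%+ when the truth is a tie)...
extreme = [i for i in range(len(results) - 1) if results[i] >= 0.53]
extreme_mean  = sum(results[i] for i in extreme) / len(extreme)
followup_mean = sum(results[i + 1] for i in extreme) / len(extreme)
# ...are followed by ordinary polls near the true value: the extremity was sampling error
print(round(extreme_mean, 3), round(followup_mean, 3))
```

The follow-up polls average close to 50% even though the selected polls all showed 53% or more: nothing about opinion changed, only the sampling error failed to repeat.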
Reliability — The degree to which a survey measure produces consistent results when administered under similar conditions (test-retest reliability) or when multiple indicators of the same construct are used (internal consistency reliability). A reliable measure of partisanship should produce similar results if the same respondent is asked the same questions a week later. Reliability is a necessary but not sufficient condition for validity. See also: validity, index. [STAT] (Ch. 7)
Response rate — The proportion of selected sample units from which completed interviews are obtained. AAPOR defines multiple response rate formulas of varying stringency (RR1 through RR6). Declining response rates in telephone polling (from ~36% in 1997 to ~6% in 2018) have intensified concerns about non-response bias. Low response rates do not automatically produce biased results if respondents are similar to non-respondents on the measured variables, but this assumption becomes increasingly difficult to justify as response rates fall. See also: non-response bias, differential response. [AAPOR] (Ch. 10)
Retrospective voting — The theory that voters evaluate incumbents or governing parties based on past performance (particularly economic performance) rather than future promises. The key empirical prediction: when the economy is strong, the incumbent party gains votes; when it is weak, the incumbent loses votes. The dominant interpretation of Kramer's seminal work, and the foundation of economic voting theory and fundamentals forecasting models. See also: fundamentals model, economic voting. (Ch. 17)
S
Sampling error — The random variation in sample statistics that arises because a sample, rather than the entire population, was measured. Sampling error decreases as sample size increases (proportional to 1/√N for simple random samples). Sampling error is the only form of survey error quantified by the margin of error; systematic errors (coverage bias, non-response bias, measurement error) are not captured by the MoE. See also: margin of error, confidence interval, non-sampling error. [STAT] (Ch. 8)
Sampling frame — The list or rule from which a sample is drawn. The quality of a sampling frame determines the extent of coverage bias in the resulting sample. Telephone directories excluded unlisted numbers; landline RDD sampling excluded cell-only households; online panels exclude people without internet access. A perfect sampling frame would include every member of the target population exactly once. See also: coverage bias, probability sample. (Ch. 8)
Selective exposure — The tendency of individuals to seek out and consume information that confirms their existing beliefs and avoid information that challenges them. Selective exposure to partisan media limits the corrective function of new information and contributes to affective polarization. While the empirical evidence for strong selective exposure effects is more limited than popular narratives suggest, it is a documented phenomenon in politically engaged populations. See also: echo chamber, affective polarization. (Ch. 22)
Sleeper effect — The phenomenon in which the persuasive impact of a low-credibility message increases over time, because recipients remember the message but forget the source. Implies that negative ads from disreputable sources may be more effective than initially appears. An area of ongoing debate in political communication research. See also: source credibility.
Social desirability bias — The tendency of survey respondents to report attitudes and behaviors that they believe are socially expected or approved, rather than their true views. Particularly significant for questions about race, immigration, prejudice, voter turnout (over-reporting), candidate support (when one candidate is perceived as the more socially acceptable choice), and sensitive personal behavior. List experiments and endorsement experiments are designed to reduce social desirability bias. See also: list experiment, recall bias, mode effect. (Ch. 7)
Spiral of silence — Elisabeth Noelle-Neumann's theory that individuals who believe their opinion is in the minority will be less likely to express it publicly, out of fear of social isolation. This silencing then further reduces the perceived prevalence of the minority view, creating a feedback loop. In political analytics, the spiral of silence is invoked to explain why polling might underestimate support for stigmatized candidates or positions — a version of the social desirability argument applied at the social/environmental level. See also: social desirability bias, bandwagon effect. (Ch. 7, 25)
Split sample — An experimental design embedded in a survey in which different randomly assigned subgroups of respondents receive different versions of a question or item set. Split samples allow researchers to test the effect of question wording, framing, or information on attitude expression while controlling for all other variables. The workhorse tool for survey experiments. See also: survey experiment, framing effect. (Ch. 5, 7)
Stratified sampling — A sampling procedure in which the population is divided into non-overlapping groups (strata), and separate samples are drawn from each stratum. Stratified sampling increases precision when strata are internally homogeneous and different from each other. In political polling, common strata include geographic regions, racial groups, and age cohorts. Oversamples of small strata (minority groups) are standard in stratified samples. See also: cluster sampling, probability sample, oversample. [STAT] (Ch. 8)
Structural model — An election forecasting model based on stable, measurable features of the political environment (presidential approval, economic conditions, seat exposure, incumbency) that predict electoral outcomes based on historical patterns. Distinguished from poll-only models by the inclusion of non-poll predictors. See also: fundamentals model, Bayesian updating. (Ch. 17)
Support score — An individual-level model estimate of a voter's probability of supporting a specific candidate or party, expressed on a 0–100 scale. Support scores are generated by applying predictive models (trained on surveys and past voting data) to individual voter records. The primary targeting tool in modern campaigns: voters with support scores above a threshold are identified as base supporters, below a threshold as opponents, and in the middle as persuasion targets. See also: persuadability score, microtargeting, voter file. (Ch. 29)
Survey experiment — A research design that uses random assignment of question wording, frames, information, or conditions to measure causal effects of communication on attitudes. Allows causal inference within survey settings. See also: split sample, RCT. (Ch. 5)
Swing voter — A voter who does not have a strong attachment to either major party and whose vote choice varies from election to election. In campaigns, swing voters are the primary target of persuasion efforts. The definition and size of the "true" swing voter population is contested: some analysts argue it has shrunk dramatically as affective polarization has increased, while others maintain that a meaningful pool of true persuadables exists in most competitive districts. See also: persuadability score, affective polarization. (Ch. 29)
Systematic sampling — A probability sampling procedure in which every kth unit is selected from a list after a random starting point. For example, to select 100 households from a list of 10,000, select a random number between 1 and 100, then select every 100th household on the list. Systematic sampling approximates simple random sampling when the list is not sorted by the variable of interest. See also: probability sample, stratified sampling. [STAT]
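The every-kth procedure from the example can be sketched directly (10,000 households, a sample of 100, so k = 100):

```python
import random

def systematic_sample(population, n):
    """Select every kth unit after a random start, where k = population size / sample size."""
    k = len(population) // n
    start = random.randrange(k)            # random starting point in [0, k)
    return population[start::k][:n]

random.seed(3)
households = list(range(1, 10_001))        # the 10,000-household list from the example
sample = systematic_sample(households, 100)
print(len(sample), sample[1] - sample[0])  # 100 units, spaced k = 100 apart
```

Only the starting point is random; every subsequent selection is determined by the fixed interval, which is why a list sorted on the variable of interest can bias the result.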
T
Thin ideology — A concept developed by political theorist Michael Freeden to describe ideologies (like populism) that occupy only part of the ideological spectrum and require combination with a "thick" or full ideology (socialism, nationalism, liberalism) to specify complete policy positions. Thin ideologies are flexible and can be attached to movements across the ideological spectrum. See also: populism. (Ch. 33)
Tracking poll — A poll conducted continuously (daily or every few days) using a rolling sample that is averaged over a defined window (typically 3–7 days). Each day, a new set of interviews is added and the oldest set is dropped, providing a rolling estimate of public opinion. Used by campaigns to monitor opinion trends in near-real time. See also: benchmark poll, longitudinal survey. (Ch. 4)
Turnout model — A predictive model that estimates the probability that an individual registered voter will cast a ballot in a specific upcoming election. Inputs typically include past voting history (the single strongest predictor), party registration, age, geographic location, and consumer data indicators. Turnout models are used in conjunction with support scores to identify the GOTV and persuasion universes in campaigns. See also: support score, GOTV, voter file. (Ch. 30)
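The logistic form such models typically take can be sketched in a few lines. The coefficients below are invented for illustration and not drawn from any real campaign model:

```python
import math

def turnout_probability(voted_last_two, age, registered_partisan):
    """Toy logistic turnout score. Coefficients are invented for illustration only."""
    z = (-1.5
         + 1.8 * voted_last_two       # past vote history: the strongest predictor
         + 0.02 * age
         + 0.4 * registered_partisan)
    return 1 / (1 + math.exp(-z))     # squash the linear score into a probability

# A 64-year-old partisan with vote history vs. a 22-year-old new registrant
print(round(turnout_probability(1, 64, 1), 2),
      round(turnout_probability(0, 22, 0), 2))
```

The voter with vote history scores far higher, reflecting the entry's point that past participation dominates the prediction.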
U
Universe (targeting) — In campaign parlance, a defined list of voters who meet specified criteria and are the target of a particular campaign activity (GOTV, persuasion, fundraising, opposition targeting). Defining universes — who is included, what criteria are used, how large the list is — is the central operational task of voter targeting analytics. See also: GOTV, microtargeting, support score. (Ch. 29)
V
Validity — The degree to which a survey measure actually captures the construct it is intended to measure. A measure can be reliable (consistent) but not valid (not measuring the intended thing). Forms of validity include: face validity (the measure looks like what it's supposed to measure), content validity (it covers the full domain of the construct), criterion validity (it predicts an external criterion), and construct validity (it relates to other measures as theory predicts). See also: reliability. [STAT] (Ch. 7)
Voter file — A database of registered voters maintained by state election authorities, containing each voter's name, address, date of birth, party registration (in states with party registration), and voting history in past elections. State voter files are publicly available (at varying costs and under varying terms). Campaign analytics operations acquire voter files and append additional data (consumer records, survey data, volunteer activity) to build targeting models. See also: microtargeting, support score, data merge. (Ch. 29)
Voter persuasion — Campaign activities designed to change the preferences of persuadable voters who have not yet committed to supporting the campaign's candidate. Persuasion targets are identified via persuadability scores and contacted via direct mail, digital advertising, canvassing, or phone banking with messages designed to appeal to their specific concerns. Contrasted with GOTV (mobilizing committed supporters). See also: persuadability score, GOTV, swing voter. (Ch. 29)
W
Weighting — The statistical adjustment of survey data to correct for imbalances between the sample and the target population. Standard demographic weights correct for overrepresentation of some groups (e.g., college-educated respondents) and underrepresentation of others (e.g., younger respondents). Advanced weighting also adjusts for behavioral characteristics (past vote, turnout) and geographic composition. Over-weighting (applying extreme weights to very few respondents) increases sampling variance. See also: post-stratification weighting, raking, MRP. (Ch. 9)
Word embedding — A computational technique in natural language processing (NLP) in which words or phrases are represented as dense numerical vectors in a high-dimensional space, such that words with similar meanings are close together in the space. Word embeddings (Word2Vec, GloVe, fastText) enable machine learning models to understand semantic relationships between words. Used in political analytics for classifying social media content, analyzing political speech, and measuring ideological positioning from text. See also: NLP, sentiment analysis. (Ch. 16)
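The "close together in the space" idea is usually measured with cosine similarity. A toy sketch with invented 4-dimensional vectors (real embeddings such as Word2Vec or GloVe use hundreds of dimensions):

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity: near 1.0 for vectors pointing in the same direction."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Invented toy embeddings for illustration
vec = {
    "senator":  np.array([0.9, 0.8, 0.1, 0.0]),
    "governor": np.array([0.8, 0.9, 0.2, 0.1]),
    "pizza":    np.array([0.0, 0.1, 0.9, 0.8]),
}
# Politically related words sit closer together than unrelated ones
print(round(cosine(vec["senator"], vec["governor"]), 2),
      round(cosine(vec["senator"], vec["pizza"]), 2))
```

The same comparison, applied to embeddings learned from real text, is what lets a classifier recognize that a tweet about "the senator" and one about "the governor" concern similar subject matter.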
Additional Terms
A/B testing — A randomized experiment comparing two (or more) versions of a communication (email subject lines, digital ads, mailer headlines) to determine which produces better outcomes (open rates, click-throughs, donations, volunteer signups). The campaign equivalent of a clinical trial. See also: RCT. (Ch. 5, 30)
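Evaluating an A/B test typically comes down to a two-proportion z-test. A minimal sketch with invented open-rate numbers:

```python
import math

def two_prop_z(conv_a, n_a, conv_b, n_b):
    """z-statistic comparing two rates; |z| > 1.96 is significant at the 5% level."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)              # pooled rate under the null
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# Invented numbers: subject line A opened 500/2000 times, subject line B 560/2000
z = two_prop_z(500, 2000, 560, 2000)
print(round(z, 2))  # 2.15: B's higher open rate is unlikely to be chance alone
```

With z above 1.96, a campaign would deploy subject line B to the full list.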
Agenda-setting — The theory that media influence public opinion not by telling people what to think, but by telling them what to think about. Issues that receive extensive media coverage become more salient to citizens, who then weight those issues more heavily in political evaluations. See also: priming, framing effect. (Ch. 22, 23)
Authoritarianism — A personality orientation or value system characterized by preference for social conformity, deference to authority, hostility to out-groups, and desire for strong leadership. Measured in surveys via items about child-rearing values or directly via "social dominance orientation" and "right-wing authoritarianism" scales. An important predictor of support for populist authoritarian movements. See also: populism, authoritarian attitudes. (Ch. 33, 34)
Ballot measure — A policy question placed directly before voters, through citizen initiative, legislative referral, or referendum. Analytics for ballot measures require different approaches than candidate races: there is no voter history for the specific measure, framing and messaging research is especially critical, and opposition targeting is less well-defined.
Bellwether — A county, precinct, state, or other geographic unit whose electoral results consistently predict the national outcome. Bellwethers are useful for tracking national results on election night, but their predictive reliability tends to degrade over time as demographic change alters their composition.
Calibration — A property of probabilistic forecasts: a forecast is well-calibrated if events predicted at 70% probability actually occur approximately 70% of the time, events predicted at 90% occur about 90% of the time, etc. Calibration is assessed across many predictions of similar probability. See also: probabilistic forecasting, Brier score. (Ch. 21)
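Assessing calibration amounts to binning forecasts by stated probability and checking the observed frequency in each bin. A sketch with an invented ten-race track record:

```python
def observed_rate(forecasts, outcomes, lo, hi):
    """Observed frequency of events among forecasts with probability in [lo, hi)."""
    picked = [y for p, y in zip(forecasts, outcomes) if lo <= p < hi]
    return sum(picked) / len(picked) if picked else None

# Invented track record: ten races forecast at roughly 70%, seven of which occurred
probs    = [0.70, 0.72, 0.68, 0.70, 0.71, 0.69, 0.70, 0.73, 0.70, 0.70]
occurred = [1,    1,    0,    1,    1,    0,    1,    1,    0,    1]
print(observed_rate(probs, occurred, 0.6, 0.8))  # 0.7: well-calibrated in this bin
```

A forecaster whose ~70% calls come true about 70% of the time is well-calibrated in that bin; real evaluations repeat this check across all probability bins.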
Canvassing — Door-to-door voter contact in which campaign volunteers or paid staff visit homes of targeted voters to deliver a scripted message and record contact information. The most effective GOTV and persuasion tactic per contact, but also the most expensive. Effectiveness is well-documented through randomized experiments. See also: GOTV, RCT. (Ch. 30)
Contingent valuation — A survey method that asks respondents to express the monetary value they place on a good or outcome. Used in policy research to measure the value of public goods. Applied in political analytics to estimate willingness to pay for policy outcomes or to prioritize among policy options.
Economic voting — The tendency of voters to reward or punish incumbent parties for economic conditions, especially income growth and unemployment. The central mechanism of the fundamentals model. Classic studies by Kramer (1971), Fair (1978), and Hibbs (1987) established the empirical basis; ongoing debate concerns which economic metrics matter most (GDP vs. income, national vs. local, absolute vs. change). See also: retrospective voting, fundamentals model. (Ch. 17)
Electoral College — The system by which the U.S. president is elected, in which each state receives a number of electors equal to its congressional delegation (House seats + 2 senators). In 48 states, the winner of the statewide popular vote receives all of that state's electoral votes (winner-take-all); Maine and Nebraska allocate some electors by congressional district. This structure means presidential elections are effectively 51 separate contests (50 states plus the District of Columbia), and a candidate can win the presidency while losing the national popular vote (2000, 2016). See also: battleground state, structural model. (Ch. 18)
Endorsement experiment — A survey experiment that measures support for a policy by varying whether it is attributed to a particular leader, group, or country. If the same policy is more supported when attributed to a popular leader, the difference estimates the endorsement effect. Used to study cue-taking and to measure political sensitivity around policies that respondents might not endorse if they knew the true source. See also: list experiment, survey experiment. (Ch. 7)
Gender gap — The difference in political preferences between men and women. In the United States, women have voted more Democratic than men since at least 1980 (the "gender gap"), and this gap has widened substantially since 2016. The gender gap is distinct from the marriage gap, the education gap, and other demographic cleavages. See also: cross-tabulation. (Ch. 13)
Ground game — The field operations component of a campaign, including voter registration, canvassing, phone banking, early vote programs, and Election Day turnout operations. The ground game is the operational application of analytics: models define the universes, and field staff execute the contacts. See also: GOTV, canvassing. (Ch. 30)
Heuristic — A mental shortcut or rule of thumb used to simplify complex decisions. In voting, heuristics include party identification, candidate attributes (likability, competence), endorsements, and "Who does this candidate look like?" Low-information voters rely more heavily on heuristics; political knowledge increases capacity to use policy information as a decision criterion. See also: cue-taking, party identification. (Ch. 2)
In-group favoritism / out-group derogation — The tendency to evaluate members of one's own social group (in-group) more favorably than members of other groups (out-groups). A core mechanism of affective polarization: partisans rate their own party warmly and the opposing party coldly. See also: affective polarization, feeling thermometer. (Ch. 3)
Issue ownership — The phenomenon in which parties or candidates are perceived as more competent and trustworthy on certain issues (Republicans on defense and crime; Democrats on healthcare and education). Issue ownership affects which issues are activated in campaigns and how frames resonate with different audiences. See also: priming, agenda-setting. (Ch. 23)
Persuasion rate — In campaign field experiments, the estimated percentage of contacted voters who changed their vote intention as a result of the contact. Typical persuasion rates for canvassing and phone banking are very small (1–5%) on a per-contact basis, but aggregate to meaningful numbers in large-scale operations. See also: RCT, canvassing. (Ch. 30)
Political knowledge — Respondents' demonstrated recall of factual information about politics (names of officials, positions of parties, constitutional provisions). Political knowledge is associated with more consistent political attitudes, greater resistance to persuasion attempts, and greater use of policy information in evaluations. The most reliably measured psychological variable in political research. See also: heuristic, cue-taking. (Ch. 2)
Quota — In targeting, the number of contacts or conversations a volunteer or field organizer is expected to complete in a given time period. Quotas are set based on model predictions of voter density in walking lists or phone lists, and are used to assess field program performance.
Regularization — A technique in machine learning and regression analysis that penalizes model complexity to prevent overfitting — the tendency of models trained on sample data to fit noise as well as signal, performing well in-sample but poorly out-of-sample. Common regularization methods include ridge regression (L2), LASSO (L1), and elastic net. Standard practice in political analytics modeling. See also: model accuracy, overfitting. (Ch. 16)
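Ridge regression has a convenient closed form, which makes the shrinkage easy to demonstrate. A sketch on synthetic data (one real predictor, four pure-noise predictors):

```python
import numpy as np

def ridge(X, y, lam):
    """Closed-form ridge regression: solve (X'X + lam*I) beta = X'y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
beta_true = np.array([1.0, 0.0, 0.0, 0.0, 0.0])   # one real signal, four null predictors
y = X @ beta_true + rng.normal(scale=0.5, size=50)

b_ols   = ridge(X, y, lam=0.0)    # no penalty: the null coefficients absorb noise
b_ridge = ridge(X, y, lam=10.0)   # L2 penalty shrinks every coefficient toward zero
print(np.linalg.norm(b_ridge) < np.linalg.norm(b_ols))  # True: shrinkage in action
```

The penalized fit trades a little in-sample accuracy for coefficients that generalize better out-of-sample, which is the overfitting protection the entry describes.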
Super PAC — A type of political action committee that may raise unlimited funds from corporations, unions, and individuals, but may not contribute to or coordinate directly with candidates or political parties. Super PACs conduct "independent expenditures" — advertising and organizing activities conducted independently of campaigns. Because of the coordination ban, the analytics and targeting operations of super PACs must be kept separate from campaign analytics. See also: opposition research.
Topline — The summary document of a poll's results, presenting all questions and response distributions for the full sample and key subgroups. Professional polling organizations publish topline documents alongside their press releases. Analysis of a poll without access to the topline — relying only on press coverage — is often misleading. See also: crosstab. (Ch. 10)
Turnout gap — Differences in turnout rates between demographic groups. The most politically consequential turnout gaps in American elections are between younger and older voters (older voters turn out at substantially higher rates), between high- and low-income voters, and (in recent elections) between college-educated and non-college voters. See also: GOTV, turnout model. (Ch. 11, 30)
Undecided voter — A survey respondent who does not express a preference between candidates when asked the ballot test question. Depending on the poll, undecideds may be pushed toward a preference ("If you had to choose..."), left undecided, or further probed for leanings. Undecideds are not equivalent to swing voters: many have weak preferences they are not willing to commit to; others are genuinely undecided. See also: swing voter, ballot test.