Appendix H: Answers to Selected Exercises

How to Use This Appendix: Solutions are organized by chapter. For each exercise, we provide: (1) a brief restatement of the problem so this appendix can be consulted independently; (2) a complete worked solution; and (3) the key insight the exercise is meant to teach. For conceptual exercises, multiple valid answers may exist — the provided answer represents a strong model response. For Python exercises, representative correct code is provided along with a description of expected output. For quantitative exercises, full calculation steps are shown.


Chapter 1: What Is Political Analytics?

Exercise 1.3 — Distinguish Analytics from Punditry

Problem restatement: Compare a political pundit's claim that "enthusiasm among suburban women is clearly shifting" to an analyst's statement that "our tracker shows a 6-point improvement in favorable ratings among suburban women 35–54 since the first debate, n=412, MoE ±4.8." Identify two differences in epistemological claims and one methodological assumption embedded in the analyst's statement.

Worked solution:

Two epistemological differences:

First, the pundit's claim is qualitative and impressionistic — "clearly shifting" implies subjective confidence without specifying what evidence supports the claim or what magnitude of shift is being described. The analyst's claim is quantitative and falsifiable: it specifies the direction (favorable ratings improving), the magnitude (6 points), the population (suburban women 35–54), the time frame (since the first debate), the sample size (412), and the uncertainty (±4.8 points).

Second, the pundit's claim cannot be evaluated for accuracy after the fact — what would falsify the claim that "enthusiasm is shifting"? The analyst's claim can be evaluated: if subsequent polls show no improvement among this group, the claim is contradicted. Falsifiability is a core property of scientific claims.

One methodological assumption in the analyst's statement:

The MoE of ±4.8 assumes that the 412 suburban women in the tracker were drawn via simple random sampling from the population of suburban women 35–54. In practice, they were almost certainly a subset of a larger poll that used complex sampling (weighting, geographic stratification, likely voter screening). The design effect from weighting may increase the effective MoE beyond the stated ±4.8. The analyst should report the effective sample size, not just the raw count, if weighting is applied.
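
For reference, the Kish approximation of effective sample size under weighting is n_eff = (Σw)²/Σw². A minimal sketch, using hypothetical weights for illustration (the lognormal weight distribution is an assumption, not the tracker's actual weights):

import numpy as np

def kish_effective_n(weights):
    """Kish approximation: effective sample size under weighting."""
    w = np.asarray(weights, dtype=float)
    return w.sum() ** 2 / (w ** 2).sum()

# Hypothetical weights for the 412 suburban women in the subsample
rng = np.random.default_rng(0)
weights = rng.lognormal(mean=0.0, sigma=0.5, size=412)

n_eff = kish_effective_n(weights)
moe = 1.96 * np.sqrt(0.25 / n_eff)
print(f"Raw n: 412, effective n: {n_eff:.0f}")
print(f"Effective MoE at p = 0.5: ±{moe*100:.1f} pp")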

Key insight: Quantitative claims are not automatically more valid than qualitative claims, but they are more falsifiable and more precisely communicable. The discipline of political analytics consists largely of replacing impressionistic qualitative judgments with falsifiable quantitative ones — and then being honest about the assumptions embedded in those quantities.


Exercise 1.7 — Historical Precedents

Problem restatement: The 1936 Literary Digest poll had the largest sample in polling history (approximately 2.3 million respondents) and produced the most spectacular failure in polling history. In two paragraphs, explain why large sample size cannot substitute for representativeness, using the formula for sampling error versus coverage bias.

Worked solution:

The standard formula for sampling error (the random variation inherent in any sample) is approximately SE = √(p(1-p)/n), where p is the true population proportion and n is the sample size. For a proportion near 0.50, this gives SE ≈ 0.50/√n. With n = 2.3 million, sampling error is approximately 0.50/√2,300,000 ≈ 0.0003, or 0.03 percentage points. The Literary Digest's sampling error was trivially small. Yet its prediction error — Roosevelt would win 43% of the popular vote when he actually won 62% — was approximately 19 percentage points, roughly 600 times larger than the sampling error formula would predict. Sampling error cannot explain this.
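
The contrast is easy to verify numerically (a minimal sketch of the arithmetic above):

import numpy as np

def sampling_error(p, n):
    """Standard error of a sample proportion under simple random sampling."""
    return np.sqrt(p * (1 - p) / n)

# Literary Digest: enormous n, tiny sampling error
se_digest = sampling_error(0.50, 2_300_000)
print(f"SE with n = 2.3 million: {se_digest:.5f} ({se_digest*100:.3f} pp)")

# A small but truly random sample
se_small = sampling_error(0.50, 100)
print(f"SE with n = 100:         {se_small:.5f} ({se_small*100:.1f} pp)")

# The Digest's actual error dwarfs its sampling error
actual_error = 0.62 - 0.43
print(f"Actual error: {actual_error*100:.0f} pp, about {actual_error/se_digest:,.0f}x the sampling error")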

What the formula makes clear is that sampling error is only one source of error in polls. The formula applies only to probability samples in which every member of the population has a known probability of selection. The Digest's mailing list excluded anyone without a telephone or automobile registration, excluding approximately two-thirds of American households at the time — disproportionately lower-income households that voted heavily for Roosevelt. This coverage bias is a systematic error that does not diminish with sample size. Adding more Digest respondents would have produced the same biased estimate, just more precisely. A sample of 100 truly randomly selected voters from the full population would have outperformed the Digest's 2.3 million biased responses.

Key insight: There are two fundamentally different types of survey error: variable error (sampling error, which decreases with sample size) and systematic error (bias, which does not decrease with sample size and is not captured by the margin of error). No amount of respondents can fix a biased sampling frame. Size is no substitute for representativeness.


Exercise 1.9 — Define the Field's Core Tension

Problem restatement: In political analytics, there is a persistent tension between the "scientific" goal of accurate measurement and prediction, and the "applied" goal of winning elections. Identify one way this tension is productive and one way it is problematic.

Worked solution:

Productive tension: The demand for electoral accuracy creates a strong empirical incentive for methodological improvement. Campaigns that rely on flawed polling lose elections; forecasters who make overconfident predictions are publicly discredited. This competitive pressure for accuracy has driven genuine methodological innovation — from the shift to probability sampling after 1948, to the development of micro-targeting after 2004, to the adoption of randomized field experiments in the 2000s. The accountability of electoral outcomes functions as a truth-forcing mechanism that academic research alone lacks.

Problematic tension: Applied analytics for campaigns creates systematic pressure to generate the results clients want to hear. A pollster hired by a campaign may — consciously or not — make methodological choices that produce rosier results for the client. More fundamentally, the goal of winning elections can license analytical practices — deceptive messaging, suppression targeting, push polling — that are effective campaign tools but violate the norms of honest public communication. An analyst whose professional identity is defined by "winning" may rationalize crossing ethical lines that a scientist committed to truth-seeking would not cross. Managing this tension requires explicit ethical commitments and professional standards that are enforced by institutions, not just individual character.

Key insight: Political analytics sits at the intersection of empirical social science and political advocacy. Understanding this position is essential both for practitioners (who need to maintain intellectual integrity within a goal-directed organizational context) and for observers (who must evaluate claims made by analysts who have stakes in the outcomes they describe).


Chapter 2: Foundations of Public Opinion Theory

Exercise 2.2 — The RAS Model Applied

Problem restatement: Apply Zaller's Receive-Accept-Sample (RAS) model to explain why highly politically aware citizens are sometimes more susceptible to partisan information than low-awareness citizens, while low-awareness citizens are sometimes harder to persuade.

Worked solution:

The RAS model predicts that political awareness (which Zaller measures as political knowledge) affects both reception (the likelihood of receiving a political message) and resistance (the likelihood of rejecting messages inconsistent with one's predispositions). The key insight is that awareness and resistance interact in non-linear ways that produce different predictions for different types of persuasion attempts.

For persuasion by a credible in-party source: High-awareness partisans receive the message (high reception) and accept it because it aligns with their predispositions (high acceptance). Low-awareness partisans may not encounter the message at all (low reception), so the persuasion attempt fails not because of resistance but because of non-reception.

For persuasion by a counter-attitudinal source: High-awareness partisans receive the message but resist it because their strong predispositions lead them to reject counter-attitudinal arguments (high awareness = high resistance). Low-awareness partisans may receive the message if it reaches them, but may also accept it because they lack the partisan schema to evaluate it critically. This is the "two-sided" model of persuasion: high-awareness individuals are hard to persuade against their predispositions but easy to persuade with them; low-awareness individuals are hard to reach but may be movable when they are reached.

Key insight: Awareness and persuasibility are not monotonically related. The RAS model predicts a "two-edged" relationship where high-awareness partisans are both more likely to receive campaign messages (making them easier to mobilize) and more likely to resist counter-attitudinal arguments (making them harder to convert). Effective campaigns distinguish between reinforcement and conversion goals, because the audiences for each are different.


Exercise 2.5 — Belief Systems and Constraint

Problem restatement: Philip Converse found in 1964 that most Americans held "non-attitudes" — unstable, unconstrained opinion expressions that changed between survey waves without reflecting genuine underlying beliefs. Design a study that would distinguish between (a) genuine opinion instability and (b) question ambiguity (people hold stable opinions but interpret the question differently each time).

Worked solution:

The key design challenge is that observed instability on survey items between panel waves could result from either genuine opinion change (the person's views changed) or measurement error (the question is interpreted differently on each occasion, producing different answers from a stable underlying attitude). To distinguish between these:

Approach 1: Multiple-indicator latent variable analysis. If question instability reflects true opinion instability, then all questions measuring the same construct should change together across waves. If instability reflects measurement error, then errors should be uncorrelated across different items measuring the same construct. By administering multiple distinct questions about the same attitude object (e.g., foreign policy conservatism) and modeling their co-variation across waves as a structural equation model with a latent true opinion variable and uncorrelated measurement errors, researchers can estimate the true reliability of each item. Zaller himself (with Feldman) applied this approach and found that even relatively unstable individual items captured genuine underlying orientations when modeled correctly.

Approach 2: Split-sample with varying question wording. Randomly assign respondents across survey waves to receive different phrasings of the same question. If instability is primarily due to question ambiguity, instability should be higher between waves where the wording changed than between waves where it was identical. If instability reflects genuine change, it should be similar regardless of wording variation.

Approach 3: Experimental crystallization. At wave 1, provide some respondents with a crystallizing treatment (a clear, high-quality explanation of the policy question before they are asked their opinion) and leave others in the standard condition. If instability reflects ambiguity, the crystallized group should show more stability across waves. If it reflects genuine opinion fluidity, crystallization should have no effect on stability.

Key insight: The distinction between measurement error and genuine attitude instability is one of the deepest methodological challenges in survey research. Converse's finding of non-attitudes likely combined both genuine instability (many Americans genuinely do not have stable positions on many policy questions) and measurement error (survey items are ambiguous, and respondents interpret them differently on different occasions). Contemporary researchers have generally concluded that Converse overstated the instability by underestimating measurement error, but genuine instability — especially for low-salience issues — remains well-documented.


Exercise 2.8 — Party Identification Measurement

Problem restatement: The standard ANES party identification question is a 7-point scale from "Strong Democrat" to "Strong Republican." Identify two limitations of this measurement approach and suggest a modification for each.

Worked solution:

Limitation 1: Dimensionality. The 7-point scale assumes that party identification is a single dimension on which everyone can be located. But some respondents may feel strongly attached to both parties (cross-pressured partisans), neither (genuine independents), or may think about parties along multiple dimensions (economic vs. cultural). A single dimension may mask important heterogeneity in the nature of partisan attachment.

Modification: Measure party attachment separately for each party, using two independent 1–4 scales: "How closely do you feel attached to the Democratic Party?" (1 = not at all, 4 = very closely) and the same for the Republican Party. This allows identification of true independents (low on both), pure partisans (high on one, low on other), and cross-pressured partisans (moderate or high on both). Research by Weisberg (1983) and Green, Palmquist, and Schickler (2002) has explored these alternatives.

Limitation 2: Conflation of identity and behavior. The standard question conflates psychological identification (feeling attached to a party) with behavioral tendency (voting for the party). Some respondents who say "Democrat" primarily mean "I vote Democratic" rather than "I feel psychologically affiliated with the Democratic Party." This is particularly problematic for scholars trying to understand whether party ID causes vote choice (if it is partly defined by vote choice, the causal claim is circular).

Modification: Separate the measurement into a purely affective/identity component ("How much do you feel like a Democrat / Republican / Independent?") and a behavioral component ("How often have you voted for Democratic / Republican candidates?"). The identity item captures the psychological attachment that Angus Campbell's original concept emphasized, while the behavioral item captures the habitual component.

Key insight: The measurement of party identification has remained largely unchanged since 1952, even as the theoretical understanding of what party identification means has evolved substantially. Measurement limitations can silently constrain theoretical progress, and periodically revisiting foundational measurements is valuable even for concepts that seem settled.


Chapter 3: Political Polarization

Exercise 3.1 — Affective vs. Ideological Polarization

Problem restatement: Using the distinction between affective polarization and ideological polarization, explain why a voter might simultaneously score very high on affective polarization (strong dislike of the opposing party) but very low on ideological polarization (their actual policy positions are moderately centrist).

Worked solution:

A voter can be highly affectively polarized without holding extreme policy positions because affective polarization is driven primarily by social identity processes — the psychological tendency to favor one's own group and derogate out-groups — rather than by policy disagreements. A voter who identifies as a Democrat can feel intense hostility toward Republicans as a social group (seeing them as closed-minded, unpatriotic, selfish) while simultaneously holding moderate positions on immigration, healthcare, or taxes. The identity and the policy positions are not required to cohere.

Several mechanisms produce this pattern. First, partisan media and social networks increasingly define partisan identity through cultural and social symbols rather than specific policy content, so a person can develop a strong partisan identity and associated out-group hostility without ever forming detailed policy views. Second, the expansion of partisan identity into everyday social identity (attending the same churches, living in the same neighborhoods, consuming the same media) means that partisanship becomes a package identity tied to social belonging rather than a policy coalition. Third, "negative partisanship" — voting against the out-party more than for the in-party — can coexist with moderate actual views: a moderate voter who finds the opposing party's leaders especially distasteful may vote reliably for their own party while holding centrist positions.

Key insight: Campaigns that treat polarization as primarily ideological (and therefore try to distinguish themselves from opponents primarily on policy) may be misunderstanding the nature of contemporary partisan motivation. Voters who are affectively polarized are moved more by social identity appeals ("people like us / people like them") than by policy comparisons. This has direct implications for message strategy.


Exercise 3.4 — Measuring Polarization

Problem restatement: You have data from two ANES surveys (1980 and 2020) including feeling thermometer ratings of the Democratic Party and Republican Party. Describe how you would use this data to measure change in affective polarization, and what confounds might limit your conclusions.

Worked solution:

Measuring change:

For each respondent in each survey year, calculate the in-party/out-party thermometer difference: for Democrats, subtract the Republican Party thermometer rating from the Democratic Party rating; for Republicans, subtract the Democratic Party rating from the Republican Party rating. This gives a "partisan thermometer gap" for each respondent. Average these gaps within each year and compare: if the average gap in 2020 is larger than in 1980, affective polarization has increased.

Additionally, examine the distribution, not just the mean: has the proportion of respondents with very large gaps (e.g., ≥ 50 points) increased? Distributional change reveals whether polarization reflects a shift in the mean or growth in an extreme tail.
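
As a concrete sketch of this computation (toy data; in practice, the ANES thermometer and party identification variables would be loaded for each year):

import pandas as pd

def partisan_gap(df):
    """In-party minus out-party thermometer rating for each partisan respondent."""
    gap = pd.Series(index=df.index, dtype=float)
    dem = df['pid'] == 'Democrat'
    rep = df['pid'] == 'Republican'
    gap[dem] = df.loc[dem, 'therm_dem'] - df.loc[dem, 'therm_rep']
    gap[rep] = df.loc[rep, 'therm_rep'] - df.loc[rep, 'therm_dem']
    return gap.dropna()  # pure independents drop out

# Toy stand-in for one survey year; repeat for 1980 and 2020 and compare
df = pd.DataFrame({
    'pid':       ['Democrat', 'Democrat', 'Republican', 'Independent'],
    'therm_dem': [85, 70, 20, 50],
    'therm_rep': [15, 40, 90, 55],
})
gaps = partisan_gap(df)
print(f"Mean partisan gap: {gaps.mean():.1f}")
print(f"Share with gap >= 50 points: {(gaps >= 50).mean():.1%}")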

Confounds:

First, scale use differences over time: Americans may use rating scales differently across decades. If respondents in 2020 are more likely to give extreme ratings on all thermometers (a scale-use change), the increase in the partisan gap might partly reflect changed rating behavior rather than changed affect. This can be partially addressed by using standardized within-person gap scores rather than absolute ratings.

Second, partisan composition change: The proportion of "leaners" versus "strong" partisans has changed since 1980, and strong partisans show higher thermometer gaps. If the 2020 sample contains more strong partisans, the average gap increases even if the within-group pattern is unchanged.

Third, event-specific effects: If the 2020 survey was conducted during a particularly polarizing moment (e.g., during COVID), measured affect may reflect heightened situational hostility rather than stable underlying polarization. Measurement timing matters.

Key insight: Measuring political change over time requires careful attention to whether observed differences reflect genuine change in the phenomenon of interest or artifacts of measurement: different question contexts, different population compositions, or different scale-use norms. "Polarization increased" is a more complex empirical claim than it appears.


Chapter 4: Survey Research Design

Exercise 4.2 — Panel vs. Cross-Sectional Design

Problem restatement: A researcher wants to study how campaign advertising affects voter preferences. She has two options: (A) a cross-sectional survey conducted at the end of the campaign that asks retrospective questions about advertising exposure and current vote preference; (B) a panel survey with waves before, during, and after the campaign. Identify two advantages of design B and one disadvantage.

Worked solution:

Advantage 1 — Measurement of change: A panel design allows the researcher to observe actual change in preferences at the individual level, comparing each respondent's pre-campaign and post-campaign vote intention. A cross-sectional design can only observe the end-state; any "before" information must come from retrospective recall, which is subject to severe recall bias and post-hoc rationalization. Individual-level change scores are far more powerful evidence of advertising effects than cross-sectional associations.

Advantage 2 — Causal ordering: A panel design establishes temporal precedence — baseline vote preference (measured in wave 1) is recorded before advertising exposure (measured in wave 2), which in turn precedes the post-campaign preference measure. This temporal ordering is a necessary (though not sufficient) condition for causal inference. A cross-sectional design measures advertising exposure and vote preference simultaneously, making it impossible to distinguish whether advertising shifted preferences or whether pre-existing preferences led voters to notice and recall advertising for their preferred candidate (confirmation bias in advertising recall).

Disadvantage — Panel conditioning: Respondents who have been surveyed before the campaign about their vote intention, issue priorities, and candidate evaluations may pay more attention to campaign advertising than they would otherwise, because they are thinking about the campaign more deliberately. This makes the treated panel a non-representative sample for studying how advertising affects ordinary voters, who are not being tracked by researchers. The panel's behavior under study may differ from the population's behavior.

Key insight: The choice of research design is not merely technical — it determines what questions can and cannot be answered. A cross-sectional design can identify correlations between advertising exposure and preferences; a panel design can identify change; an experiment (random assignment to advertising conditions) can establish causation. Each design answers a different question, and matching design to question is the first task of good research.


Exercise 4.5 — Question Order Effects

Problem restatement: In an experiment testing question order effects on presidential approval, half of respondents (Group A) were asked about presidential approval first, then economic conditions. The other half (Group B) were asked about economic conditions first, then presidential approval. Group A showed 52% approval; Group B showed 44% approval. Explain this result and describe its implications for survey design.

Worked solution:

The 8-point gap between conditions likely reflects an assimilation effect from question order. When respondents in Group A are asked about presidential approval first (before thinking about the economy), their evaluation draws on all the considerations salient to them at that moment — foreign policy, candidate personality, cultural issues, as well as economics. But when respondents in Group B are first asked about economic conditions, this primes the economy as a salient consideration, making it more available and more heavily weighted in the subsequent approval question. If current economic evaluations are negative, the primed economy lowers approval for Group B relative to Group A.

This result has two major implications for survey design. First, the order in which questions appear in a survey is not arbitrary — it alters what is measured. A researcher who reports "52% approval" is reporting a finding that depends on prior questions in the survey instrument, and that result would likely change if the preceding questions were different. This is measurement context dependence: the "true" level of approval is not a context-free quantity.

Second, in practice, the "right" question order depends on the researcher's goals. If the goal is to measure overall approval uninflated by priming, asking approval first is preferable. If the goal is to understand how economic evaluations inform approval, putting economic evaluations first is appropriate. Researchers should make this choice deliberately and report it explicitly.

Key insight: Survey questions do not exist in a vacuum — the context created by prior questions shapes what respondents are thinking about when they answer each question. Question order is a design decision that affects results and should be reported in any credible poll disclosure. Analysts evaluating polls should always ask: what preceded this question?


Chapter 5: Causal Inference and Experiments

Exercise 5.1 — Python: Analyzing a Randomized Field Experiment

Problem restatement: A GOTV canvassing experiment randomly assigned 2,400 registered voters to a treatment group (n=1,200, received a canvassing visit) or a control group (n=1,200, did not receive a visit). Turnout was: treatment = 38.2%, control = 33.8%. Write Python code to calculate the treatment effect, its standard error, and the 95% confidence interval, and conduct a two-proportion z-test.

Key code:

import numpy as np
from scipy import stats

# Experimental results
n_treatment = 1200
n_control = 1200
p_treatment = 0.382  # 38.2% turnout in treatment group
p_control = 0.338    # 33.8% turnout in control group

# Observed counts
x_treatment = round(p_treatment * n_treatment)  # 458
x_control = round(p_control * n_control)        # 406

# Average treatment effect (ATE)
ate = p_treatment - p_control
print(f"Average Treatment Effect (ATE): {ate:.4f} ({ate*100:.2f} percentage points)")

# Standard error of the difference in proportions
se = np.sqrt((p_treatment * (1 - p_treatment) / n_treatment) +
             (p_control * (1 - p_control) / n_control))
print(f"Standard Error: {se:.4f}")

# 95% confidence interval
z_critical = 1.96
ci_lower = ate - z_critical * se
ci_upper = ate + z_critical * se
print(f"95% CI: [{ci_lower:.4f}, {ci_upper:.4f}]")
print(f"95% CI: [{ci_lower*100:.2f}pp, {ci_upper*100:.2f}pp]")

# Two-proportion z-test
# Pooled proportion under null hypothesis (no effect)
p_pooled = (x_treatment + x_control) / (n_treatment + n_control)
se_null = np.sqrt(p_pooled * (1 - p_pooled) * (1/n_treatment + 1/n_control))
z_stat = (p_treatment - p_control) / se_null
p_value = 2 * (1 - stats.norm.cdf(abs(z_stat)))

print(f"\nTwo-proportion z-test:")
print(f"Z-statistic: {z_stat:.4f}")
print(f"P-value: {p_value:.4f}")
print(f"Statistically significant (p < 0.05): {p_value < 0.05}")

Expected output:

Average Treatment Effect (ATE): 0.0440 (4.40 percentage points)
Standard Error: 0.0196
95% CI: [0.0056, 0.0824]
95% CI: [0.56pp, 8.24pp]

Two-proportion z-test:
Z-statistic: 2.2454
P-value: 0.0247
Statistically significant (p < 0.05): True

Interpretation: The canvassing treatment increased turnout by 4.4 percentage points (95% CI: 0.6pp to 8.2pp). This effect is statistically significant (p ≈ 0.025), indicating it is unlikely to have arisen by chance if the true effect is zero. The wide confidence interval (0.6pp to 8.2pp) reflects the genuine uncertainty around this estimate given the sample size. Campaigns should interpret this as consistent with a real positive effect but not as a precise estimate of effect size.

Key insight: The value of random assignment is that it makes the control group a valid counterfactual for the treatment group. Without randomization, we could not rule out that people who received canvassers were already more likely to vote (selection bias). With randomization, the only expected difference between groups at baseline is chance variation, which the z-test accounts for.


Exercise 5.4 — Designing a Survey Experiment

Problem restatement: Design a survey experiment to test whether describing immigration policy as "border security" versus "immigration restriction" affects support for a proposal to reduce legal immigration by 20%. Your design should include a control condition, two treatment conditions, and a manipulation check.

Worked solution:

Design:

This is a three-condition between-subjects experiment. Respondents are randomly assigned to one of three conditions before the policy question is asked.

Condition A (Control): "A proposal has been introduced in Congress to reduce the number of legal immigrants admitted to the United States by 20%. Do you support or oppose this proposal? (Strongly support / Somewhat support / Somewhat oppose / Strongly oppose / Don't know)"

Condition B (Security frame): "A proposal has been introduced in Congress to strengthen border security by reducing the number of legal immigrants admitted to the United States by 20%. Do you support or oppose this proposal? [same response options]"

Condition C (Restriction frame): "A proposal has been introduced in Congress to restrict immigration by reducing the number of legal immigrants admitted to the United States by 20%. Do you support or oppose this proposal? [same response options]"

Manipulation check: After the support/oppose question, ask: "As you understand it, what is the primary goal of the proposal described above?" with options: (a) National security and safety, (b) Reducing immigration levels, (c) Economic protection, (d) Something else. Respondents in Condition B should be more likely to select (a); respondents in Condition C should be more likely to select (b). If there are no differences in the manipulation check across conditions, the frame was not successfully communicated.

Analysis: Compare mean support scores across conditions using an ANOVA or OLS regression with condition dummy variables. Examine whether the effect of framing differs by respondent characteristics (party identification, prior immigration attitudes, authoritarianism) through interaction terms.
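
A minimal analysis sketch along these lines (simulated data; the condition effects and scale are hypothetical assumptions for illustration):

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated responses: support on a 1-4 scale under three conditions
rng = np.random.default_rng(1)
n = 900
condition = rng.choice(['control', 'security', 'restriction'], size=n)
effects = {'control': 0.0, 'security': 0.25, 'restriction': -0.15}  # hypothetical
support = 2.5 + np.array([effects[c] for c in condition]) + rng.normal(0, 0.8, n)
df = pd.DataFrame({'condition': condition, 'support': support})

# OLS with the control condition as the reference category
model = smf.ols("support ~ C(condition, Treatment(reference='control'))", data=df).fit()
print(model.summary().tables[1])  # each coefficient = frame effect vs. control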

Key insight: The manipulation check is as important as the outcome measure. A null result in a framing experiment (no difference in support across conditions) is ambiguous: it could mean the frame had no effect on evaluations, or it could mean the frame was not successfully processed by respondents. Confirming that the frame was received as intended before testing its persuasive effect allows researchers to distinguish between these interpretations.


Chapter 6: Qualitative Methods in Political Analytics

Exercise 6.2 — Focus Group Interpretation

Problem restatement: A political consultant conducts a focus group of 8 "soft Republican" suburban voters in Maricopa County, Arizona. Six of the eight participants express concern about "extreme positions" by the Republican candidate. The consultant reports to the campaign that "suburban voters are deeply concerned about extremism." Identify two methodological problems with this conclusion.

Worked solution:

Problem 1 — Sample size and representativeness: Eight participants in a single focus group is an insufficient basis for any conclusion about "suburban voters" as a class. There are hundreds of thousands of soft Republican suburban voters in Maricopa County. A focus group is designed for hypothesis generation — identifying themes and language — not for estimating the prevalence of opinions. The statement that suburban voters "are deeply concerned" implies a prevalence claim that the focus group cannot support. What the consultant can legitimately say is: "Some soft Republican suburban voters in our focus group expressed concern about extreme positions, and this theme merits investigation with a larger quantitative survey."

Problem 2 — Group dynamics and interviewer effects: Focus groups are susceptible to social influence within the group. If one or two articulate or high-status participants express concern about extremism, others may agree (or express more concern than they privately feel) due to conformity pressures. Conversely, if the topic of extremism was introduced by the moderator's question framing, participants may have been expressing agreement with a frame provided to them rather than independently volunteering this concern. A valid interpretation requires knowing how the topic arose, whether it was volunteered or prompted, and whether participants would have expressed similar concern in private.

Key insight: Focus groups are powerful tools for hearing how voters talk about issues in their own language and for identifying which arguments resonate emotionally. They are not tools for measuring how many voters hold a given view. Conflating qualitative richness with quantitative representativeness is one of the most common errors in political research communication.


Exercise 6.5 — Integrating Qualitative and Quantitative Data

Problem restatement: A campaign has conducted both a 600-respondent survey showing that 38% of independent voters "strongly" care about "economic security" and a focus group in which independent voters describe economic anxiety in vivid, specific terms (job loss, healthcare costs, retirement fears). How should the campaign's analyst integrate these two data streams?

Worked solution:

The quantitative and qualitative data play complementary but distinct roles. The survey provides the prevalence estimate: 38% of independent voters strongly prioritize economic security. This number allows comparison to other issue priorities, subgroup analysis (which independent voters are most economically anxious?), and tracking over time. It does not tell the analyst what economic security means to voters, which specific anxieties are most activated, or what language resonates.

The focus groups provide the content and texture that the survey cannot: the specific fears (not just "economic anxiety" as an abstraction but "I'm afraid I'll lose my health insurance if I lose my job"), the language voters use ("making ends meet" rather than "economic security"), and the narrative logic connecting economic concerns to candidate evaluations ("candidates like him don't understand what we're going through"). This qualitative content is essential for message development.

The integration goes in both directions. First, focus group insights should be used to sharpen survey measurement: if focus group participants talk about healthcare costs, job stability, and retirement security as distinct concerns, the next survey should include separate items for each, rather than a single "economic security" item. Second, survey prevalence helps the analyst weight the qualitative themes: if focus groups surface both healthcare fears and retirement fears with equal intensity, survey data can reveal whether one of these is substantially more prevalent in the broader population.

Key insight: The greatest value of qualitative research is not in the conclusions it generates but in the questions it refines. A well-conducted focus group makes the subsequent quantitative survey more valid by ensuring that the survey's questions capture what voters actually care about in the terms they actually use.


Chapter 7: Survey Question Design

Exercise 7.3 — Identifying Question Flaws

Problem restatement: Evaluate the following question: "Do you agree that Congress has failed to address the serious problems facing ordinary Americans and that real change requires new leadership?" Identify all design flaws and rewrite the question.

Worked solution:

Design flaws:

  1. Double-barreled: The question asks simultaneously (a) whether Congress has failed to address problems and (b) whether new leadership is needed. A respondent might agree with (a) but not (b), or vice versa. The question requires one answer to two distinct propositions.

  2. Leading language: "Serious problems" presupposes that there are serious problems of a particular character. "Real change" is an advocacy phrase associated with reform movements. "New leadership" implies the existing leadership is inadequate. Each phrase pushes respondents toward a particular evaluation.

  3. False premise embedded: "Congress has failed" is a contested normative judgment presented as a fact in the question stem. A question that asks respondents whether they agree with a premise they have not been asked to evaluate independently is a leading question.

  4. Acquiescence bias: The question is structured as a single proposition to which respondents can "agree" or disagree. Respondents with acquiescence bias — who tend to agree with propositions presented to them — will systematically over-agree.

Rewrite:

Split the question into two separate items:

Item 1: "How good a job do you think Congress is doing addressing the problems facing ordinary Americans? (Very good job / Somewhat good job / Somewhat poor job / Very poor job / No opinion)"

Item 2: "Do you think Congress needs mostly new leaders, or do you think the current congressional leaders are doing an adequate job? (Mostly needs new leaders / Current leaders are adequate / No opinion)"

Key insight: The original question is a push poll item — a statement that advances an argument while appearing to measure opinion. The rewrite separates the two concepts, uses neutral language, and offers balanced response options that allow genuine disagreement. The difference in design philosophy: push poll questions are designed to produce a particular answer; survey questions are designed to measure whatever answer respondents actually hold.


Exercise 7.6 — Designing a List Experiment

Problem restatement: A researcher believes that many voters hold anti-immigration views they would not report directly because of social desirability concerns. Design a list experiment to estimate the true prevalence of support for "ending all legal immigration for five years." Explain the logic of the design and how to analyze results.

Worked solution:

Design:

Randomly assign respondents to two conditions.

Control condition (n/2 respondents): "I'm going to read you a list of political proposals. I don't want to know which ones you support — just tell me how many of the proposals on this list you support. Here are the proposals: [1] Increase funding for rural infrastructure; [2] Expand Medicare to cover dental care; [3] Create a national paid parental leave program; [4] Reform campaign finance laws. How many of these proposals do you support? (0 / 1 / 2 / 3 / 4)"

Treatment condition (n/2 respondents): Same instructions, but the list contains a fifth item: "[5] End all legal immigration to the United States for five years." Same response scale: (0 / 1 / 2 / 3 / 4 / 5)

Logic: Because respondents report only a count (not which items they support), the presence of the sensitive item is protected — no individual respondent's support for the sensitive item is identifiable. Respondents should therefore answer more honestly.

Analysis: The estimated prevalence of support for ending legal immigration = Mean(treatment count) − Mean(control count). If control group mean = 1.8 and treatment group mean = 2.3, the estimated prevalence is 0.5, or 50%. Statistical significance is tested by comparing the means (t-test or regression with a treatment indicator).
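
A minimal sketch of this analysis (simulated responses, with a true sensitive-item support rate of 50% built in):

import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
# Control respondents count support among 4 items; treatment respondents see 5
control = rng.binomial(4, 0.45, size=600)                               # mean near 1.8
treatment = rng.binomial(4, 0.45, size=600) + rng.binomial(1, 0.5, size=600)

prevalence = treatment.mean() - control.mean()
t_stat, p_val = stats.ttest_ind(treatment, control)
print(f"Estimated prevalence of support: {prevalence:.1%}")
print(f"t = {t_stat:.2f}, p = {p_val:.4f}")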

Assumptions: The list experiment assumes that: (1) adding the sensitive item does not change responses to the other items (no "floor" or "ceiling" effects — if a respondent supports 0 or all of the control items, there is no room to differentiate); (2) respondents understand the task; (3) the sensitive item is actually sensitive (produces social desirability pressure in direct questions). Researchers typically screen out ceiling and floor effects in analysis.

Key insight: The list experiment trades off individual-level measurement (you cannot know which specific respondents support the sensitive item) for population-level accuracy (the aggregate estimate is protected from social desirability bias). This tradeoff is appropriate when the primary research interest is prevalence estimation rather than individual targeting.


Chapter 8: Sampling Theory and Practice

Exercise 8.2 — Calculating Margins of Error

Problem restatement: A poll of 847 likely voters finds Candidate A at 48%, Candidate B at 44%, and 8% undecided. (a) Calculate the 95% margin of error for each candidate's support percentage. (b) Calculate the 95% confidence interval for the difference between the candidates. (c) Can you conclude with 95% confidence that Candidate A is ahead?

Worked solution:

(a) Margin of error for each candidate:

For a proportion p in a simple random sample of size n, the MoE at 95% confidence is:

MoE = 1.96 × √(p(1-p)/n)

For Candidate A (p = 0.48): MoE = 1.96 × √(0.48 × 0.52 / 847) = 1.96 × √(0.000295) = 1.96 × 0.01717 = 0.0337, approximately ±3.4 percentage points.

For Candidate B (p = 0.44): MoE = 1.96 × √(0.44 × 0.56 / 847) = 1.96 × √(0.000291) = 1.96 × 0.01705 = 0.0334, approximately ±3.3 percentage points.

(b) Confidence interval for the difference:

The difference D = 0.48 − 0.44 = 0.04 (4 percentage points).

Because both proportions come from the same poll, a respondent who supports A cannot also support B, so p_A and p_B are negatively correlated and the independent-samples formula understates the standard error. The within-poll formula adds a covariance term:

SE(D) = √[(p_A(1−p_A) + p_B(1−p_B) + 2·p_A·p_B)/n] = √[(0.2496 + 0.2464 + 0.4224)/847] = √(0.9184/847) = √(0.001084) = 0.03293

95% CI for D: 0.04 ± 1.96 × 0.03293 = 0.04 ± 0.0645

95% CI: −0.0245 to +0.1045, or approximately −2.5pp to +10.5pp.

(c) Can we conclude A is ahead?

No, we cannot conclude with 95% confidence that Candidate A is ahead. The 95% confidence interval for the difference runs from −2.5 percentage points to +10.5 percentage points. Because this interval includes zero (and includes values where B is ahead), we cannot reject the null hypothesis that the candidates are tied at the 95% confidence level. The race should be described as "within the margin of error" or "a statistical tie" — not as Candidate A leading.
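
The part (b) arithmetic can be verified with a short script (a minimal sketch):

import numpy as np

n, p_a, p_b = 847, 0.48, 0.44

# Within-poll SE of the lead: both variances plus the covariance term
se_diff = np.sqrt((p_a * (1 - p_a) + p_b * (1 - p_b) + 2 * p_a * p_b) / n)
diff = p_a - p_b
lo, hi = diff - 1.96 * se_diff, diff + 1.96 * se_diff
print(f"Lead: {diff*100:.1f}pp, SE: {se_diff*100:.2f}pp")
print(f"95% CI for the lead: [{lo*100:.1f}pp, {hi*100:.1f}pp]")
print(f"Lead statistically significant: {lo > 0}")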

Key insight: A critically important and frequently misunderstood point: the margin of error for any individual candidate's support level (±3.4pp) is NOT the same as the margin of error for the difference between candidates. Because the two proportions in the same poll are negatively correlated, the confidence interval for the difference is nearly twice as wide as the MoE for a single proportion (here, ±6.5pp versus ±3.4pp). Reporters and campaigns frequently make the error of applying the single-candidate MoE to assess whether two candidates' numbers are distinguishably different. The correct calculation uses the standard error of the difference, including the covariance term.


Exercise 8.5 — Stratified Sampling Design

Problem restatement: You are designing a survey of 1,200 voters for a U.S. Senate race. The state has four regions (Metro: 45% of registered voters; Suburban: 30%; Small city: 15%; Rural: 10%). Design a proportionate stratified sample and an optimally allocated stratified sample, assuming the rural stratum has twice the variance in Senate candidate preference as the other strata.

Worked solution:

Proportionate stratified sample:

Allocate sample proportional to stratum population size:

  • Metro: 1200 × 0.45 = 540 respondents
  • Suburban: 1200 × 0.30 = 360 respondents
  • Small city: 1200 × 0.15 = 180 respondents
  • Rural: 1200 × 0.10 = 120 respondents

Optimal allocation (Neyman allocation):

Optimal allocation assigns more respondents to strata that are larger and/or have higher variance. The formula is:

n_h = n × (N_h × S_h) / Σ(N_h × S_h)

where N_h is the population share and S_h is the standard deviation in stratum h.

Assume standard deviations: Metro = 1.0 (arbitrary unit), Suburban = 1.0, Small city = 1.0, Rural = √2 ≈ 1.414 (twice the variance = √2 times the SD).

Compute N_h × S_h products:

  • Metro: 0.45 × 1.0 = 0.450
  • Suburban: 0.30 × 1.0 = 0.300
  • Small city: 0.15 × 1.0 = 0.150
  • Rural: 0.10 × 1.414 = 0.141

Sum = 0.450 + 0.300 + 0.150 + 0.141 = 1.041

Optimal allocation:

  • Metro: 1200 × (0.450/1.041) ≈ 519 respondents
  • Suburban: 1200 × (0.300/1.041) ≈ 346 respondents
  • Small city: 1200 × (0.150/1.041) ≈ 173 respondents
  • Rural: 1200 × (0.141/1.041) ≈ 163 respondents

The optimal allocation oversamples the rural stratum (163 vs. 120 in proportionate) to compensate for its higher variance, and slightly undersamples the other strata accordingly.
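
The allocation arithmetic, as a minimal sketch:

import numpy as np

n_total = 1200
shares = np.array([0.45, 0.30, 0.15, 0.10])   # Metro, Suburban, Small city, Rural
sds = np.array([1.0, 1.0, 1.0, np.sqrt(2)])   # rural has twice the variance

# Proportionate allocation: sample in proportion to population share
prop = n_total * shares

# Neyman allocation: n_h proportional to N_h * S_h
weights = shares * sds
neyman = n_total * weights / weights.sum()

for name, p, o in zip(['Metro', 'Suburban', 'Small city', 'Rural'], prop, neyman):
    print(f"{name:<11} proportionate: {p:4.0f}   optimal: {o:4.0f}")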

Key insight: Optimal allocation improves precision by concentrating sample in the strata where variance is highest. The intuitively appealing proportionate allocation is not always the most efficient design. However, optimal allocation requires knowing (or estimating) stratum variances in advance — often from prior surveys. When prior variance estimates are unavailable, proportionate allocation is a safe default.


Chapter 9: Data Analysis and Weighting

Exercise 9.3 — Raking a Survey Sample

Problem restatement: A survey of 500 respondents has the following demographic profile vs. the true population: Men: 43% (vs. 49% in population); Women: 57% (vs. 51%); Age 18–34: 18% (vs. 26%); Age 35–54: 35% (vs. 33%); Age 55+: 47% (vs. 41%). Describe one iteration of raking and explain when raking converges.

Worked solution:

One iteration of raking:

Raking adjusts each demographic variable in turn, resetting the overall sample weight each time.

Step 1 — Adjust for gender: Men are currently 43% but should be 49%; multiply men's weights by 49/43 = 1.140. Women are 57% but should be 51%; multiply women's weights by 51/57 = 0.895.

After Step 1, the gender margin matches the target (49% men, 51% women), but the age margin may have shifted slightly because the gender adjustment changed effective weights.

Step 2 — Adjust for age: After Step 1, recompute the weighted age distribution. If it now shows 18–34 at 19% (vs. target 26%), adjust 18–34 weights by 26/19 = 1.368; 35–54 at 34% (vs. 33%), multiply by 33/34 = 0.971; 55+ at 47% (vs. 41%), multiply by 41/47 = 0.872.

After Step 2, the age margin matches targets, but the gender margin may have drifted slightly from the Step 1 adjustment.

Convergence:

Raking iterates through all weighting variables (one full iteration = adjusting for gender, then age, then any additional variables) until the weighted sample margins are within a defined tolerance of all targets simultaneously (e.g., all within 0.1 percentage points). In practice, convergence typically occurs within 5–15 iterations. Raking converges reliably when the weighting variables are not too strongly correlated and when the sample is not too extreme in its imbalance on any single variable.
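
A minimal sketch of the raking loop for these two variables (simulated respondents drawn to match the unweighted sample profile above):

import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
df = pd.DataFrame({
    'gender': rng.choice(['M', 'F'], size=500, p=[0.43, 0.57]),
    'age':    rng.choice(['18-34', '35-54', '55+'], size=500, p=[0.18, 0.35, 0.47]),
})
targets = {'gender': {'M': 0.49, 'F': 0.51},
           'age':    {'18-34': 0.26, '35-54': 0.33, '55+': 0.41}}

df['weight'] = 1.0
for iteration in range(25):
    max_gap = 0.0
    for var, target in targets.items():
        # Weighted margin for this variable before adjustment
        margin = df.groupby(var)['weight'].sum() / df['weight'].sum()
        max_gap = max(max_gap, (margin - pd.Series(target)).abs().max())
        # Multiply each category's weights by target/current
        df['weight'] *= df[var].map(lambda c: target[c] / margin[c])
    if max_gap < 0.001:  # all margins within 0.1pp before this pass
        break

print(f"Converged after {iteration + 1} iterations")
for var in targets:
    print((df.groupby(var)['weight'].sum() / df['weight'].sum()).round(3))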

Key insight: Raking is a computationally simple but theoretically sound method for simultaneously satisfying multiple marginal constraints without requiring knowledge of the joint distribution. Its limitation is that it corrects for known demographic imbalances but cannot correct for imbalances in unobserved characteristics correlated with political opinion — the fundamental challenge of non-response bias.


Chapter 10: Evaluating Poll Quality

Exercise 10.2 — Python: Detecting House Effects

Problem restatement: You have a dataset of Senate race polls from 10 different polling firms over 3 months. Using Python and pandas, describe how you would estimate each firm's house effect (systematic bias relative to the polling average) and test whether it is statistically significant.

Key code:

import pandas as pd
import numpy as np
from scipy import stats

# Simulate example poll dataset
# In practice, load from CSV: pd.read_csv('senate_polls.csv')
np.random.seed(42)

firms = ['Firm_A', 'Firm_B', 'Firm_C', 'Firm_D', 'Firm_E']
# Simulate each firm having a true house effect
true_house_effects = {'Firm_A': +2.0, 'Firm_B': -1.5, 'Firm_C': 0.0,
                      'Firm_D': +0.5, 'Firm_E': -3.0}

polls = []
for firm in firms:
    n_polls = 8  # 8 polls per firm
    for _ in range(n_polls):
        # True D margin around -1.0, with firm-specific bias and random error
        dem_margin = -1.0 + true_house_effects[firm] + np.random.normal(0, 3.5)
        polls.append({'firm': firm, 'dem_margin': dem_margin, 'n': 800})

df = pd.DataFrame(polls)

# Step 1: Calculate the overall polling average
overall_avg = df['dem_margin'].mean()
print(f"Overall polling average (D margin): {overall_avg:.2f}pp")

# Step 2: Calculate each firm's mean and deviation from the overall average
firm_stats = df.groupby('firm')['dem_margin'].agg(['mean', 'std', 'count']).reset_index()
firm_stats.columns = ['firm', 'firm_mean', 'firm_std', 'n_polls']
firm_stats['house_effect'] = firm_stats['firm_mean'] - overall_avg

# Step 3: T-test for each firm (is firm mean significantly different from overall avg?)
for _, row in firm_stats.iterrows():
    firm_data = df[df['firm'] == row['firm']]['dem_margin']
    t_stat, p_val = stats.ttest_1samp(firm_data, overall_avg)
    firm_stats.loc[firm_stats['firm'] == row['firm'], 't_stat'] = t_stat
    firm_stats.loc[firm_stats['firm'] == row['firm'], 'p_value'] = p_val

print("\nHouse Effects by Firm:")
print(firm_stats[['firm', 'firm_mean', 'house_effect', 't_stat', 'p_value']].to_string(index=False))

# Step 4: Flag firms with significant house effects
significant = firm_stats[firm_stats['p_value'] < 0.05]
if len(significant) > 0:
    print(f"\nFirms with statistically significant house effects (p<0.05):")
    print(significant[['firm', 'house_effect', 'p_value']].to_string(index=False))
else:
    print("\nNo firms show statistically significant house effects.")

# Step 5: House-effect-adjusted margins
# Note: because house effects are defined relative to the overall average,
# subtracting them leaves the full-sample mean unchanged by construction.
# The adjustment matters when averaging an unbalanced subset of polls.
df_adj = df.merge(firm_stats[['firm', 'house_effect']], on='firm')
df_adj['adjusted_margin'] = df_adj['dem_margin'] - df_adj['house_effect']
recent = df_adj.tail(10)  # stand-in for the most recent polls
print(f"\nRecent-poll average, unadjusted: {recent['dem_margin'].mean():.2f}pp")
print(f"Recent-poll average, adjusted:   {recent['adjusted_margin'].mean():.2f}pp")

Expected output pattern:

The code produces a table showing each firm's mean, house effect (deviation from the overall average), t-statistic, and p-value. Firms with large, consistent directional deviations (Firm_A and Firm_E in the simulation) will show significant p-values (< 0.05). Note that because house effects are defined as deviations from the overall average, subtracting them cannot change the full-sample mean; the adjustment pays off when aggregating an unbalanced subset of polls — for example, when the most recent polls come disproportionately from firms with large house effects.

Key insight: House effects are estimated relative to the polling average, which is itself not the ground truth — the election result is. This means house effect estimates conflate two things: true firm bias and the aggregate polling average's own bias. More sophisticated methods estimate house effects as deviations from the election result across multiple elections, requiring a larger historical dataset. The t-test approach here is appropriate for detecting within-election-cycle house effects but should not be over-interpreted.


Chapter 11: Likely Voter Models

Exercise 11.3 — Constructing a Likely Voter Index

Problem restatement: You have a survey of 1,500 registered voters with the following questions: (1) Intent to vote (1–5 scale); (2) Past midterm voting (0/1); (3) Past presidential voting (0/1); (4) Interest in the current election (1–4 scale). Describe how to construct a likely voter index from these four variables, weight the index, and determine a cutoff threshold.

Worked solution:

Step 1 — Standardize variables:

The four variables are on different scales. Either recode each to a 0–1 scale or standardize using Z-scores. For simplicity, recode:

  • Intent to vote: already a 1–5 scale; recode to 0–1: (score − 1)/4
  • Past midterm voting: 0/1, already binary
  • Past presidential voting: 0/1, already binary
  • Interest: recode 1–4 to 0–1: (score − 1)/3

Step 2 — Assign weights:

Past voting history is the strongest predictor of future turnout, so it should receive higher weight. Intent to vote is stated, not behavioral, so it receives moderate weight. Interest is related to both but somewhat weaker. One reasonable weighting scheme:

  • Intent to vote: weight = 2 (behavioral intention, directly relevant)
  • Past midterm voting: weight = 3 (strongest predictor for midterm)
  • Past presidential voting: weight = 2 (relevant but less specific)
  • Interest in current election: weight = 1 (weakest predictor)

Index = (2 × intent_01 + 3 × midterm_01 + 2 × presidential_01 + 1 × interest_01) / 8

Maximum possible index = 8/8 = 1.0; minimum = 0/8 = 0.

Step 3 — Determine threshold:

The threshold should be set to retain approximately the expected turnout proportion as "likely voters." If expected turnout in this election is 45% of registered voters, the cutoff should be set so that approximately 45% of the weighted sample exceeds it. This is done empirically: rank all respondents by index score, set the cutoff at the 55th percentile (excluding the bottom 55%, retaining the top 45%).
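
A minimal sketch of Steps 1–3 (simulated survey data; column names are hypothetical):

import numpy as np
import pandas as pd

rng = np.random.default_rng(11)
df = pd.DataFrame({
    'intent':     rng.integers(1, 6, size=1500),  # 1-5 scale
    'voted_mid':  rng.integers(0, 2, size=1500),  # 0/1
    'voted_pres': rng.integers(0, 2, size=1500),  # 0/1
    'interest':   rng.integers(1, 5, size=1500),  # 1-4 scale
})

# Step 1: rescale everything to 0-1
intent_01 = (df['intent'] - 1) / 4
interest_01 = (df['interest'] - 1) / 3

# Step 2: weighted index (weights from the scheme above)
df['lv_index'] = (2 * intent_01 + 3 * df['voted_mid']
                  + 2 * df['voted_pres'] + 1 * interest_01) / 8

# Step 3: cutoff at the 55th percentile to retain the top 45% as likely voters
cutoff = df['lv_index'].quantile(0.55)
df['likely_voter'] = df['lv_index'] >= cutoff
print(f"Cutoff: {cutoff:.3f}, share classified likely: {df['likely_voter'].mean():.1%}")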

Step 4 — Validation:

If prior election data is available, the likely voter index can be validated by checking whether respondents classified as likely voters in past surveys actually turned out at higher rates than non-classified respondents. Ideal validation uses administrative voter file data matched to survey records.

Key insight: Every likely voter model involves a threshold judgment that is both methodological and political. Setting the threshold too high excludes too many actual voters, artificially inflating the winning candidate's lead. Setting it too low includes non-voters, moving results toward registered voter distributions. The "right" threshold depends on the election context — and in primary elections with very low and hard-to-predict turnout, likely voter models are especially uncertain.


Chapters 12–20: Additional Selected Exercises

Chapter 12, Exercise 12.4 — Ecological Inference

Problem restatement: County-level data shows a strong negative correlation between the percentage of Hispanic residents in a county and the Republican presidential vote share. Can you conclude from this that Hispanic voters are less likely to support the Republican candidate? What is the ecological fallacy, and what evidence would you need to rule it out?

Worked solution:

No. This is precisely the ecological fallacy: inferring individual-level behavior from aggregate-level correlations. Counties with higher Hispanic populations tend to vote less Republican, but this pattern could be driven entirely by non-Hispanic voters in those counties — perhaps because counties with high Hispanic populations tend to be urban, and urban voters of all ethnicities vote more Democratic. The aggregate correlation conflates the effect of individual Hispanic identity with the effect of county-level context.

To rule out the ecological fallacy, individual-level evidence is needed: surveys of Hispanic voters that directly measure their vote choice, or voter file analyses that estimate Hispanic individuals' vote probability based on their turnout history and district-level results.

Key insight: Ecological correlations are among the most seductive and most treacherous forms of evidence in political analysis. They are easy to obtain (aggregate data is widely available) and look like strong evidence. But the ecological fallacy is a persistent danger whenever individual inference is drawn from aggregate patterns.


Chapter 13, Exercise 13.2 — Gender Gap Analysis

Problem restatement: A poll shows that Candidate X leads among women 55%–38% but trails among men 41%–53%. With a sample of 600 women and 580 men (MoE for each group approximately ±4pp), are these gender gaps statistically distinguishable? What is the gender gap in this poll, conventionally defined?

Worked solution:

Gender gap definition: The gender gap is defined here as the difference between women's and men's margins of support for the Democratic candidate. Women's D margin = 55% − 38% = +17pp; Men's D margin = 41% − 53% = −12pp. Gender gap = 17 − (−12) = 29 percentage points. This is a large gender gap by historical standards.

Statistical significance: The ±4pp MoE applies to each individual proportion. As Exercise 8.2 showed, each group's margin (D% − R%) has a larger standard error because the two proportions within a group are negatively correlated:

SE(margin) = √[(p_D(1−p_D) + p_R(1−p_R) + 2·p_D·p_R)/n]

For women (n = 600): SE = √[(0.55×0.45 + 0.38×0.62 + 2×0.55×0.38)/600] = √(0.9011/600) ≈ 3.9pp. For men (n = 580): SE = √[(0.41×0.59 + 0.53×0.47 + 2×0.41×0.53)/580] = √(0.9256/580) ≈ 4.0pp.

Because women and men are independent subsamples, the SE of the gap (a difference in differences) is:

SE(gap) = √[SE(women)² + SE(men)²] = √[3.9² + 4.0²] = √31.2 ≈ 5.6pp

95% CI for gap: 29 ± 1.96 × 5.6 = 29 ± 11.0pp, i.e., approximately 18pp to 40pp.

The confidence interval does not include zero, confirming the gender gap is statistically distinguishable from zero at high confidence.

Key insight: The gender gap in American politics has grown substantially since the 1980 election. Understanding its composition — how much reflects college/non-college sorting, urban/rural sorting, age differences, and genuine gender-based political differences — requires multivariate analysis controlling for confounding demographic variables.


Chapter 14, Exercise 14.5 — Redistricting and Partisan Efficiency

Problem restatement: A state has three districts. District 1: D wins 72%–28%. District 2: R wins 58%–42%. District 3: R wins 54%–46%. Calculate the "wasted votes" for each party in each district and the overall efficiency gap.

Worked solution:

Wasted votes are defined as: for the losing party, all their votes; for the winning party, votes in excess of 50% + 1 vote (the minimum needed to win). Assume equal district size of 100 votes for simplicity.

District            D Votes   R Votes   Winner   D Wasted        R Wasted
1 (D wins 72–28)       72        28       D      72 − 51 = 21    28
2 (R wins 58–42)       42        58       R      42              58 − 51 = 7
3 (R wins 54–46)       46        54       R      46              54 − 51 = 3
Total                 160       140              109             38

Efficiency gap = (D wasted − R wasted) / Total votes = (109 − 38) / 300 = 71/300 = 0.237, or approximately 23.7 percentage points favoring Republicans.

An efficiency gap of 23.7% is very large and consistent with substantial partisan gerrymandering favoring Republicans. The D votes are being "packed" into District 1 (where many D votes are wasted), while R votes are efficiently distributed to win Districts 2 and 3 with minimal waste.
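A minimal sketch of the computation in Python (the function name and the (d_votes, r_votes) tuple representation are ours, not from the chapter):

    def efficiency_gap(districts):
        # districts: list of (d_votes, r_votes) pairs.
        # Wasted votes: all of the loser's votes, plus the winner's
        # votes beyond the minimum needed to win (half the total + 1).
        d_wasted = r_wasted = total = 0
        for d, r in districts:
            need = (d + r) // 2 + 1
            if d > r:
                d_wasted += d - need
                r_wasted += r
            else:
                d_wasted += d
                r_wasted += r - need
            total += d + r
        return (d_wasted - r_wasted) / total

    print(round(efficiency_gap([(72, 28), (42, 58), (46, 54)]), 3))  # 0.237

Here a positive result means more D votes than R votes were wasted, i.e., a map that advantages Republicans under this sign convention.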

Key insight: The efficiency gap is one of several proposed mathematical measures of partisan gerrymandering. Its value is that it provides a single number quantifying the degree of partisan advantage embedded in a district map. Its limitation is that it does not distinguish between partisan intent and natural geographic clustering of partisan voters.


Chapter 15, Exercise 15.3 — Media Framing Analysis

Problem restatement: Describe a systematic method for coding whether news coverage of immigration emphasizes "economic threat," "cultural threat," "humanitarian concern," or "legal/administrative" frames. How would you assess inter-coder reliability, and what threshold would you require?

Worked solution:

Coding method:

Develop a codebook that operationally defines each frame with specific indicators:

- Economic threat frame: presence of language about wages, jobs, fiscal costs, economic competition, burden on public services
- Cultural threat frame: language about cultural change, national identity, language, social cohesion, out-group incompatibility
- Humanitarian frame: language about refugee conditions, family separation, safety, rights, suffering, asylum
- Legal/administrative frame: language about law enforcement, process, documentation, policy mechanics, border security procedures

For each article, the primary coder assigns one dominant frame (the frame most prominent in terms of headline emphasis, lead paragraph, and proportion of text). If multiple frames are present at roughly equal weight, a "mixed frame" category is available. Subframe coding is also possible (coding secondary frames).

Inter-coder reliability:

Assign a random 20% of articles to be independently coded by a second coder. Calculate Cohen's kappa (κ) for the frame assignments. Kappa corrects for agreement that would occur by chance.

κ = (P_o − P_e) / (1 − P_e)

where P_o is observed agreement proportion and P_e is expected agreement by chance.

The generally accepted threshold for "substantial agreement" is κ ≥ 0.60. For content analysis in published research, κ ≥ 0.70 is typically required. A kappa below 0.60 indicates that the coding categories are insufficiently operationalized and that coders should be retrained or the definitions revised.
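A self-contained Python computation of κ (the coder labels below are hypothetical, chosen only to illustrate the arithmetic):

    from collections import Counter

    def cohens_kappa(codes_a, codes_b):
        # Observed agreement minus chance agreement, scaled by the
        # maximum possible improvement over chance.
        n = len(codes_a)
        p_o = sum(a == b for a, b in zip(codes_a, codes_b)) / n
        freq_a, freq_b = Counter(codes_a), Counter(codes_b)
        p_e = sum((freq_a[c] / n) * (freq_b[c] / n)
                  for c in set(codes_a) | set(codes_b))
        return (p_o - p_e) / (1 - p_e)

    coder1 = ["econ", "cult", "human", "legal", "econ",
              "human", "legal", "econ", "cult", "legal"]
    coder2 = ["econ", "human", "human", "legal", "econ",
              "human", "legal", "cult", "cult", "legal"]
    print(round(cohens_kappa(coder1, coder2), 2))  # 0.73 -> substantial

With 8 of 10 agreements and an expected chance agreement of 0.25 given these marginal frequencies, κ = (0.80 − 0.25) / 0.75 ≈ 0.73.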

Key insight: Content analysis is only as reliable as its coding scheme. A well-specified codebook that provides concrete examples of each frame category is essential for inter-coder agreement. Reliability assessment is not optional — it is the primary validity check for content analysis. Research that reports coding results without inter-coder reliability statistics should be treated with caution.


Chapter 17, Exercise 17.1 — Fundamentals Models

Problem restatement: In 2024, a simplified economic fundamentals model predicts the incumbent party candidate will receive 48% of the two-party vote based on GDP growth and presidential approval. Polling shows the candidate at 52%. Describe how to combine these two inputs using weighted averaging, and explain how you would weight them as a function of days until the election.

Worked solution:

Weighted average formula:

Combined estimate = (w_f × Fundamentals) + (w_p × Polls)

where w_f + w_p = 1.

At any given point in time, the relative weight of polls versus fundamentals reflects the information content of polls — how much signal is contained in current polls about the eventual vote share — versus the reliability of structural fundamentals.

Weighting as a function of time:

Early in the campaign (180+ days out), polls are highly volatile and contain substantial "house effect" noise. Fundamentals models have more predictive power at this stage. A reasonable weighting: w_f = 0.70, w_p = 0.30.

As Election Day approaches, polls become progressively more predictive as voter uncertainty resolves and the election environment stabilizes. Nate Silver's models and the academic forecasting literature both suggest that the weight on polls should rise steadily relative to fundamentals from roughly 60 days out.

Final combined estimate at 30 days out (example weights: w_f = 0.30, w_p = 0.70):

Combined = 0.30 × 48% + 0.70 × 52% = 14.4 + 36.4 = 50.8%

Interpretation: The combined estimate is between the two inputs, weighted toward the polls given their higher reliability at 30 days out. The incumbent candidate is narrowly ahead in this estimate. Uncertainty around this estimate is substantial because both inputs have error.
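A sketch of this scheme in Python. The linear weight ramp and its anchor points (w_f = 0.70 at 180 days, 0.30 at 30 days) simply mirror the example weights above; a production model would calibrate the schedule against historical forecast errors rather than assume linearity:

    def w_fundamentals(days_out, early=(180, 0.70), late=(30, 0.30)):
        # Linear ramp between two (days_out, weight) anchor points,
        # clipped so the weight is constant outside the anchor window.
        (d0, w0), (d1, w1) = early, late
        d = min(max(days_out, d1), d0)
        frac = (d0 - d) / (d0 - d1)
        return w0 + (w1 - w0) * frac

    def combine(fundamentals, polls, days_out):
        w_f = w_fundamentals(days_out)
        return w_f * fundamentals + (1 - w_f) * polls

    print(combine(48.0, 52.0, days_out=30))   # 50.8, matching the text
    print(combine(48.0, 52.0, days_out=180))  # 49.2, fundamentals dominate

Note that the same two inputs yield 50.8% at 30 days out but 49.2% at 180 days out, i.e., the weighting choice alone moves the estimate across the 50% line.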

Key insight: The art of combining fundamentals and polls is not just calculating a weighted average — it is correctly specifying how the weights should change as a function of elapsed time and available information. Overweighting polls early in the campaign imports volatile, low-information signal into the forecast; overweighting fundamentals late in the campaign ignores genuinely informative polling data.


Chapter 20, Exercise 20.3 — MRP Step by Step

Problem restatement: Explain in plain language how MRP (Multilevel Regression and Poststratification) allows a researcher to estimate support for a ballot measure in a specific state using only a national survey. What are the key assumptions, and when might MRP fail?

Worked solution:

How MRP works:

MRP proceeds in two stages. In the first stage (multilevel regression), the researcher fits a statistical model using the national survey data, predicting individual respondents' support for the ballot measure from their individual characteristics (age, sex, education, race, party identification) and the characteristics of their state (state partisanship, income, region). This model captures how support varies across demographic groups and geographic contexts.

In the second stage (poststratification), the researcher uses Census data or voter file data to determine the actual demographic composition of the target state — what proportion of the state's adult population is female, college-educated, Black, Democratic, etc. The model's predictions for each demographic cell are then averaged using the actual state composition as weights.

The result is an estimate of ballot measure support in the target state that reflects (a) what the survey says about how different types of people support the measure, and (b) how many of each type of person actually live in that state.
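The second stage can be made concrete with a toy example. All numbers below are hypothetical: cell_support stands in for the first-stage model's predictions, and state_cells for the target state's Census-derived composition:

    # Hypothetical first-stage predictions: P(support | demographic cell).
    cell_support = {
        ("female", "college"): 0.61,
        ("female", "non-college"): 0.52,
        ("male", "college"): 0.55,
        ("male", "non-college"): 0.44,
    }

    # Hypothetical target-state composition (cell shares summing to 1),
    # which in a real application comes from Census or voter file counts.
    state_cells = {
        ("female", "college"): 0.18,
        ("female", "non-college"): 0.33,
        ("male", "college"): 0.16,
        ("male", "non-college"): 0.33,
    }

    # Poststratification: average the cell predictions, weighted by each
    # cell's share of the state's population.
    estimate = sum(cell_support[c] * w for c, w in state_cells.items())
    print(f"Estimated state-level support: {estimate:.1%}")  # 51.5%

A real application would use hundreds or thousands of cells (age × sex × education × race × state) and take the cell predictions from the fitted multilevel model rather than hard-coded numbers.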

Key assumptions:

First, the multilevel model must correctly specify how demographics and state characteristics predict support. If the relationship between, say, college education and ballot measure support differs substantially between the target state and the rest of the country, the model will be misspecified.

Second, the demographic data used for poststratification must accurately represent the target electorate. Census population data does not equal the likely voter electorate; applying Census demographics without voter-turnout adjustment can produce inaccurate estimates.

When MRP fails:

MRP performs poorly when the target state has unique political characteristics not captured by the model's geographic variables — when the state is genuinely exceptional in ways not reflected in its demographics or partisan composition. It also fails when small national samples contain very few respondents in the target state, providing insufficient direct data to constrain state-level model estimates.

Key insight: MRP is not magic — it is a formal method for extrapolating from national data to local contexts using demographics as the connective tissue. Its validity depends on the assumption that demographic predictors in the model capture the relevant variation in political opinion. When that assumption holds (as it often does for salient national issues), MRP is remarkably effective. When the local political context is driven by idiosyncratic factors the model doesn't capture, MRP can produce misleading estimates.


Additional exercise solutions for Chapters 21–44 are available in the instructor's companion materials.