Chapter 22 Exercises: Down-Ballot and Global Forecasting

DataField.Dev

Conceptual Review

Exercise 22.1: Senate vs. House Forecasting

Compare Senate and House forecasting across the following dimensions. For each, explain which race type is harder to forecast and why.

a) Data availability per competitive race
b) Number of simultaneous competitive races
c) Importance of candidate quality vs. national environment
d) Vulnerability to systematic polling error
e) Importance of geographic distribution of votes

Exercise 22.2: The Generic Ballot Translation Problem

Democrats lead the generic ballot by 4 points heading into the midterms.

a) Does a 4-point generic ballot lead guarantee a Democratic House majority? Why or why not?
b) Describe two scenarios where a 4-point generic ballot advantage produces very different seat outcomes. What structural factors explain the difference?
c) A forecaster publishes "Democrats will gain 15 seats" as their House projection. Why is this a less informative forecast than "Democrats will gain 8–22 seats with a median projection of 15"?
d) How does the distribution of competitive seats across geographic regions affect the translation from generic ballot to seats?

Exercise 22.3: MRP Architecture

Explain in plain language (without equations) how MRP produces district-level estimates from a national survey.

a) What problem does MRP solve that ordinary quota-sampling does not?
b) Why is having 15 survey respondents in a congressional district insufficient for direct estimation?
c) What does "borrowing strength" from other geographic units mean in the MRP context?
d) What is the "poststratification" step, and why is Census data necessary for it?
e) In what type of electoral environment would MRP be most and least accurate? Explain your reasoning.
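For part (b), a rough calculation helps show the scale of the problem. The sketch below uses the textbook normal-approximation margin of error for a proportion under simple random sampling. The function name `margin_of_error` and the comparison sample sizes are illustrative assumptions, not taken from the chapter, and the approximation is itself crude at n = 15.

```python
import math


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a proportion.

    Assumes simple random sampling and the normal approximation,
    both of which are optimistic for a 15-person district subsample.
    """
    return z * math.sqrt(p * (1 - p) / n)


# Compare a 15-respondent district subsample with larger samples
# (500 and 5,000 are illustrative sizes chosen for contrast).
for n in (15, 500, 5000):
    moe = margin_of_error(0.5, n)
    print(f"n={n:>5}: +/- {100 * moe:.1f} points")
```

With 15 respondents and support near 50%, the interval is roughly plus or minus 25 points, far too wide to separate a safe seat from a toss-up. This is the gap that pooling information across districts is meant to close.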

Applied Problems

Exercise 22.4: Seats-Votes Curve Construction

Using the following hypothetical historical data on a legislative chamber with 200 seats, construct a simplified seats-votes curve.

Year	Party A vote %	Party A seats
2000	45%	82
2004	48%	93
2008	52%	108
2012	55%	124
2016	50%	100
2020	47%	88

a) Plot vote share (x-axis) vs. seat share (y-axis), where seat share is Party A's seats out of 200.
b) Fit a linear regression. What is the estimated "swing ratio" (seats gained per 1 percentage point gain in vote share)?
c) Using your fitted relationship, project how many seats Party A would win if their national vote share is 51%.
d) Why might the historical relationship underestimate or overestimate the current cycle's seat outcome?
e) A forecaster argues that the swing ratio has decreased in recent years because geographic sorting has reduced the number of competitive seats. What evidence would you look for to evaluate this claim?
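One possible starting point for parts (b) and (c) is an ordinary least-squares fit of seats on vote percentage, written out by hand so that each step is visible. This is a minimal sketch rather than a prescribed solution: the helper name `fit_line` is hypothetical, and the model regresses seat counts (not seat shares) on vote share to match the swing-ratio definition given in part (b).

```python
# Data from the exercise table: Party A vote share (%) and seats won (of 200).
vote_pct = [45, 48, 52, 55, 50, 47]
seats = [82, 93, 108, 124, 100, 88]


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of x, and cross-deviations of x and y.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(vote_pct, seats)

# The slope is the swing ratio in seats per percentage point of vote share.
print(f"Swing ratio: {slope:.2f} seats per point")

# Part (c): projected seats at a 51% national vote share.
projected = intercept + slope * 51
print(f"Projected seats at 51%: {projected:.1f}")
```

Dividing the slope by 200 and multiplying by 100 converts it to seat-share points per vote-share point, which is the form more often quoted in the seats-votes literature. With only six data points, the fit says little about whether the relationship is stable, which is the concern raised in parts (d) and (e).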

Exercise 22.5: International System Comparison

Compare the forecasting challenges in four electoral systems:

System	Country Example	Electoral Formula
First-Past-the-Post	UK	Plurality in single-member districts
Proportional Representation	Netherlands	Party list PR
Mixed-Member Proportional	Germany	FPTP + PR correction
Two-Round Presidential	France	Top-2 runoff

For each system:

a) What is the primary object of the forecast (vote share? seat total? government formation? runoff result)?
b) How does the electoral formula affect the translation from vote shares to seats or government?
c) What is the single largest source of forecasting uncertainty specific to that system?
d) Would poll aggregation be sufficient for a high-quality forecast, or would additional inputs be required?

Exercise 22.6: MRP Poststratification Exercise

A national survey of 5,000 respondents estimates the following vote preference by demographic group (simplified to three groups for this exercise):

Group	Sample n	Democratic %	Population share in District A	Population share in District B
College graduates	2,000	62%	40%	20%
Non-college whites	1,500	38%	35%	55%
Non-white voters	1,500	78%	25%	25%

a) Calculate the MRP estimate for District A's Democratic vote share.
b) Calculate the MRP estimate for District B's Democratic vote share.
c) District A's actual Democratic vote share is 61%. District B's actual is 53%. Which estimate is more accurate?
d) What might explain the discrepancy between the MRP estimate and the actual result in the less-accurate district?
e) The MRP estimate ignores local candidate effects and district-specific economic conditions. How would you incorporate these into an extended MRP framework?
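The poststratification step in parts (a) and (b) is a population-weighted average of the group-level estimates, and can be sketched as follows. The function name `poststratify` and the dictionary layout are assumptions made for illustration. Note also that this covers only the poststratification half of MRP: a full model would first use multilevel regression to let group-level support vary by district, whereas here every district reuses the same national group estimates from the table.

```python
# Group-level Democratic support from the national survey (exercise table).
dem_support = {
    "College graduates": 0.62,
    "Non-college whites": 0.38,
    "Non-white voters": 0.78,
}

# Population share of each group within each district (exercise table).
district_shares = {
    "District A": {
        "College graduates": 0.40,
        "Non-college whites": 0.35,
        "Non-white voters": 0.25,
    },
    "District B": {
        "College graduates": 0.20,
        "Non-college whites": 0.55,
        "Non-white voters": 0.25,
    },
}


def poststratify(support, shares):
    """Weight each group's estimated support by its population share."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("population shares must sum to 1")
    return sum(support[group] * shares[group] for group in support)


estimates = {
    district: poststratify(dem_support, shares)
    for district, shares in district_shares.items()
}

for district, estimate in estimates.items():
    print(f"{district}: {100 * estimate:.1f}% Democratic")
```

Comparing each printed estimate with the actual results in part (c) gives the error for each district. Because both districts use identical group estimates, any gap reflects either within-group differences between districts or factors outside the three demographic cells, which is the subject of parts (d) and (e).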

International Applications

Exercise 22.7: Brazilian Polling Challenge

Brazilian pollsters faced systematic Bolsonaro underestimation in 2022.

a) Identify three structural reasons why Bolsonaro supporters might have been underrepresented in Brazilian telephone polls. How do these compare to the mechanisms identified for Trump supporter underrepresentation in American polls?
b) The standard correction in American polling is education weighting. Describe why this correction might work differently (or less well) in a Brazilian context.
c) A Brazilian forecasting firm is considering weighting on recalled vote choice from the 2018 presidential election. What are the advantages and disadvantages of this approach in the Brazilian context specifically?
d) If you were designing a polling methodology from scratch to accurately represent the Brazilian electorate, what approaches would you prioritize? Consider geographic, technological, and linguistic diversity.

Exercise 22.8: UK MRP Application

YouGov's UK constituency MRP uses Brexit referendum vote choice as a key predictor in addition to standard demographics (age, education, class).

a) Why would Brexit referendum vote choice improve MRP accuracy in the 2017–2019 period specifically?
b) What would happen to the model's accuracy if Brexit referendum vote choice were dropped from the regression? Would the effect be symmetric across constituencies?
c) By 2024, would you expect Brexit referendum vote choice to be as useful a predictor as it was in 2017–2019? Explain your reasoning.
d) The Brexit referendum was in 2016. How do you handle voters who have entered the electorate since 2016 (who were too young to vote in the referendum)?

Exercise 22.9: Parliamentary vs. Presidential Forecasting Complexity

A French political scientist argues: "Presidential forecasting is easier than parliamentary forecasting because you only need to predict one binary outcome rather than the composition of a 577-member legislature and its resulting government." An American political scientist replies: "But parliamentary vote shares translate more reliably to seats than American House votes, because geographic concentration is less of a problem under PR."

Write a 300-word essay evaluating both claims. Consider:

- What makes the forecasting problem in each system hard in different ways
- Whether the 2022 French presidential election or the 2021 German federal election was more or less accurately forecast than the 2022 American House elections
- What "difficulty" means in this context (is it harder to be accurate, or harder to quantify your uncertainty?)

Data Quality and Ethics

Exercise 22.10: The Data Desert Problem

You are asked to forecast a national election in a sub-Saharan African country where:

- Only 3 national polls have been conducted in the past year
- Two pollsters are known to have political ties to major parties
- Polling is conducted primarily by telephone in a country with 65% mobile penetration
- The country has held only 3 elections since democratization in 1995

a) Describe a forecasting approach that acknowledges these data limitations honestly.
b) What is the minimum information you would need to produce any probabilistic forecast at all?
c) How would you communicate forecast uncertainty to a client (a democracy assistance organization) in a way that is honest about what the forecast can and cannot say?
d) Should you produce a forecast at all? Are there circumstances where the responsible answer is "the data does not support a credible probabilistic forecast"?

Exercise 22.11: Who Gets Counted: Global Application

The chapter argues that the populations hardest to survey are often the most consequential in close elections.

a) Identify two international elections where rural or economically marginal voters were systematically underrepresented in polling, and where this underrepresentation contributed to a polling miss.
b) What structural features of polling methodology (telephone, online, in-person) tend to underrepresent economically marginal populations? Do these vary by country?
c) Some analysts argue that the growing accuracy of prediction markets (where participants bet on outcomes) might partially substitute for biased polling. Evaluate this claim: who participates in prediction markets, and how does that selection bias affect their predictive value for elections where underrepresented populations are decisive?
d) In your view, is systematic underrepresentation of certain populations in election polling a technical problem, a political problem, or an ethical problem? How does your answer affect what you think should be done about it?