Chapter 19 Exercises: Probabilistic Forecasting and Uncertainty

Conceptual Exercises

Exercise 19.1 — Probability Basics

For each of the following scenarios, identify whether the described reaction to a probabilistic forecast is appropriate or represents a misinterpretation, and explain why:

a) A forecaster gives Candidate A a 75% win probability. Candidate A wins. A journalist writes: "The model was right."

b) A forecaster gives Candidate A a 75% win probability. Candidate B wins. A journalist writes: "The model failed."

c) A forecaster gives Candidate A a 60% win probability. A campaign manager says: "60% isn't good enough — we need to assume we're winning and plan accordingly."

d) A forecaster gives Candidate A a 55% win probability. An observer says: "That's basically a coin flip — the forecaster has no idea who's going to win."

e) A forecaster consistently gives 70% probability to events that happen 68% of the time across a large sample of races. A critic says: "The forecaster is biased — they're systematically wrong." Evaluate this claim.

Exercise 19.2 — Monte Carlo Basics

The polling average in a competitive Senate race shows Candidate A up 3 points. The historical standard deviation of final polling averages as predictors of election results is 3.5 points.

Assuming a normal distribution of election-day outcomes:

a) What is the probability that Candidate A wins (i.e., the election-day margin is positive)? (Hint: You need to find the probability that a normal distribution with mean 3 and SD 3.5 is greater than 0. Use z-score tables or calculate: z = 3/3.5 = 0.857; look up or recall P(Z > -0.857) ≈ 80.4%.)

b) What is the 80% confidence interval on the election-day margin? (i.e., what range contains 80% of the simulated outcomes?)

c) If you double the standard deviation to 7.0 points (representing higher uncertainty), how does the win probability change?

d) What does this calculation illustrate about the relationship between uncertainty and win probability?
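To check parts (a) and (c), here is a minimal Python sketch using the standard normal CDF via the error function (no external libraries; `win_probability` is a helper name introduced here, not from the chapter):

```python
from math import erf, sqrt

def win_probability(lead: float, sd: float) -> float:
    """P(margin > 0) when election-day margins are Normal(lead, sd)."""
    z = lead / sd
    # Standard normal CDF: Phi(z) = (1 + erf(z / sqrt(2))) / 2
    return 0.5 * (1 + erf(z / sqrt(2)))

print(f"(a) SD 3.5: {win_probability(3, 3.5):.1%}")  # ~80.4%
print(f"(c) SD 7.0: {win_probability(3, 7.0):.1%}")  # ~66.6%
```

Note how doubling the uncertainty pulls the win probability toward 50% even though the lead itself is unchanged, which is the point of part (d).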

Exercise 19.3 — Correlated Errors

Consider a presidential election with three swing states: State A, State B, and State C. The polling average shows the Democrat up 2 points in each.

Scenario 1 (Independent errors): Assume errors across states are independent; each state has a polling error standard deviation of 3.5 points. What is the probability that the Democrat loses all three states simultaneously?

Scenario 2 (Correlated errors): Assume errors are highly correlated — if the polls are wrong by 3 points in one state, they're wrong by approximately the same in the others. How does this change your assessment of the probability of losing all three states simultaneously?

a) Work through the logic of each scenario conceptually, even if you can't compute exact numbers.

b) Which scenario is more realistic? Why?

c) What were the implications of ignoring correlated errors in the 2016 presidential forecast?
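To make the gap between the two scenarios concrete, here is a small simulation sketch using the prompt's parameters (a 2-point lead and a 3.5-point error SD in each state); Scenario 2 is idealized here as perfectly correlated errors:

```python
import random

random.seed(0)
N = 100_000
lead, sd = 2.0, 3.5

# Scenario 1: each state's polling error is drawn independently
indep = sum(
    all(random.gauss(lead, sd) < 0 for _ in range(3))
    for _ in range(N)
) / N

# Scenario 2: perfectly correlated errors -- one shared draw decides all three
corr = sum(random.gauss(lead, sd) < 0 for _ in range(N)) / N

print(f"P(lose all three) independent: {indep:.1%}  correlated: {corr:.1%}")
```

Under independence the joint loss probability is roughly (28%)^3, about 2%; under perfect correlation it collapses to the single-state figure of about 28%. Real correlations fall between these extremes, but much closer to Scenario 2.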

Exercise 19.4 — Calibration

Here are the outcomes of 50 elections where a forecaster assigned specific win probabilities:

Predicted Probability | Number of Elections | Candidate Won
55-65%                | 15                  | 10
65-75%                | 12                  | 9
75-85%                | 10                  | 8
85-95%                | 8                   | 8
95%+                  | 5                   | 4

a) For each probability bin, compute the actual win rate.

b) Compare the actual win rates to what a well-calibrated forecaster would predict (use the midpoint of each bin as the expected probability).

c) Is this forecaster well-calibrated? Where are they over- or under-confident?

d) What additional data would you want to see to make a definitive calibration assessment?
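As a starting point for parts (a) and (b), this sketch tabulates each bin's actual win rate against its midpoint (bin data copied from the table above; treating the open-ended 95%+ bin's midpoint as 97.5% is an assumption):

```python
# (bin label, midpoint, number of elections, number won)
bins = [
    ("55-65%", 0.60, 15, 10),
    ("65-75%", 0.70, 12, 9),
    ("75-85%", 0.80, 10, 8),
    ("85-95%", 0.90, 8, 8),
    ("95%+",  0.975, 5, 4),
]

for label, mid, n, wins in bins:
    actual = wins / n
    print(f"{label:>7}  expected {mid:.1%}  actual {actual:.1%}  gap {actual - mid:+.1%}")
```

Positive gaps suggest underconfidence in that bin and negative gaps overconfidence, though with only 5-15 races per bin the sampling noise is large, which bears on part (d).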

Analytical Exercises

Exercise 19.5 — The 2016 Forecast Revisited

Research the probabilistic forecasts published by major forecasters in the final week of the 2016 presidential election:

- FiveThirtyEight (polls-only model)
- FiveThirtyEight (polls-plus model)
- The Huffington Post Pollster
- Princeton Election Consortium
- The Upshot (NYT)

For each:

a) What probability did they assign to a Clinton win?

b) What probability did they assign to a Trump win?

c) What methodological choices drove differences between forecasters?

d) Which forecaster was most "accurate" in terms of giving a meaningful probability to the actual outcome?

e) What lessons about correlated errors and calibration does this comparison illustrate?

Exercise 19.6 — Scenario Analysis Design

For a competitive Senate race of your choice (real or hypothetical), design a four-scenario analysis:

a) Define four scenarios with descriptive names and specific conditions for each.

b) Assign probabilities to each scenario (they must sum to 100%).

c) For each scenario, describe: what observable signals would indicate we're in this scenario? What campaign actions would each scenario call for?

d) How does your scenario analysis translate the probabilistic forecast into an actionable strategic document?

Exercise 19.7 — Vivian Park Method Application

You are a pollster briefing a campaign on the state of their Senate race. Your model shows the candidate at 62% win probability, with the polling average showing them up 2.5 points and a confidence interval of -1 to +6 points.

Write a 400-500 word briefing document that applies the Vivian Park method: lead with the range, use accessible analogies, make the uncertainty actionable, and be transparent about what drives the uncertainty. Avoid jargon. Write for a smart non-statistician.

Applied Exercises

Exercise 19.8 — Simple Monte Carlo

In Python or a spreadsheet, build a simple Monte Carlo model for a Senate race:

  1. Set polling average = 3 points (Democrat lead)
  2. Set standard deviation = 3.5 points
  3. Draw 10,000 random election margins from a normal distribution with these parameters
  4. Count the fraction where the Democrat wins (margin > 0)
  5. Compute the 80% and 95% confidence intervals

Then modify the model:

a) What happens to win probability if the polling average shifts to 5 points?

b) What happens if the standard deviation increases to 5 points?

c) What happens if you add a correlated error term: in 30% of simulations, add an additional draw from N(0, 2) to represent the possibility of a national wave?
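One possible implementation of steps 1-5 plus part (c), sketched in pure Python (`interval` is a helper name introduced here; with a different seed the numbers will wobble slightly):

```python
import random

random.seed(42)
N = 10_000
polling_average, sd = 3.0, 3.5

# Steps 3-5: simulate margins, count wins, compute central intervals
margins = sorted(random.gauss(polling_average, sd) for _ in range(N))
win_prob = sum(m > 0 for m in margins) / N

def interval(sorted_draws, level):
    """Central interval covering `level` of the simulated outcomes."""
    n = len(sorted_draws)
    lo = sorted_draws[int(n * (1 - level) / 2)]
    hi = sorted_draws[int(n * (1 + level) / 2) - 1]
    return lo, hi

print(f"win probability: {win_prob:.1%}")
print("80% CI: {:+.1f} to {:+.1f}".format(*interval(margins, 0.80)))
print("95% CI: {:+.1f} to {:+.1f}".format(*interval(margins, 0.95)))

# Part (c): add a national-wave term in 30% of simulations
wave_margins = [
    random.gauss(polling_average, sd)
    + (random.gauss(0, 2) if random.random() < 0.30 else 0)
    for _ in range(N)
]
wave_win_prob = sum(m > 0 for m in wave_margins) / N
print(f"with wave term: {wave_win_prob:.1%}")
```

A spreadsheet version is equally valid: `NORM.INV(RAND(), 3, 3.5)` down 10,000 rows, then count the positive margins.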

Exercise 19.9 — Forecaster Comparison

For a recent election cycle, identify at least three forecasters who published probabilistic win estimates for congressional races (FiveThirtyEight, The Economist, Decision Desk HQ, etc.). Select ten competitive Senate races from that cycle.

a) For each race, record the win probability each forecaster assigned.

b) Record the actual outcome.

c) Compare calibration across forecasters using your data.

d) Did any forecaster show systematically higher or lower probabilities for one party?

e) What does this comparison tell you about model differences?
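For part (c), one common single-number summary is the Brier score, the mean squared error between the forecast probability and the 0/1 outcome (lower is better). The records below are placeholders to show the shape of the data you would collect, not real forecasts:

```python
# Hypothetical data: (probability assigned to Party X's candidate, outcome),
# where outcome is 1 if that candidate won and 0 otherwise.
records = {
    "Forecaster A": [(0.72, 1), (0.55, 0), (0.81, 1), (0.40, 0)],
    "Forecaster B": [(0.65, 1), (0.48, 0), (0.90, 1), (0.35, 0)],
}

def brier(pairs):
    """Mean squared error between forecast probability and 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in pairs) / len(pairs)

for name, pairs in records.items():
    print(f"{name}: Brier score {brier(pairs):.3f}")
```

With only ten races per forecaster, score differences will be noisy, so pair the Brier comparison with the qualitative calibration check in parts (a) and (b) rather than relying on it alone.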

Discussion Questions

Discussion 19.1

After the 2016 election, some argued that probabilistic forecasting should be banned or heavily regulated because it suppresses voter turnout among supporters of the leading candidate. Others argued the opposite: that probabilistic forecasting is more honest and democratic than confident point predictions. Evaluate both positions. What evidence would you want to see to assess the empirical claim about turnout effects?

Discussion 19.2

Vivian Park argues that expressing honest uncertainty is a competitive advantage for a polling firm. But in competitive markets, clients may simply hire a different firm that expresses more confidence — even if that confidence is false. Under what market conditions would honest uncertainty communication actually be rewarded? What would need to change about how clients evaluate and hire polling firms to create stronger incentives for honesty?

Discussion 19.3

A political candidate with a 35% win probability in a closely watched race tells their supporters "the polls show we're behind but we believe we can win." Is this statement problematic? Is it accurate? When, if ever, is it appropriate for candidates or campaigns to publicly downplay probability estimates? Who are the relevant stakeholders and what do each of them need from election forecasters?

Discussion 19.4

The chapter argues that "high uncertainty is the honest answer" in genuinely close races. But journalists and media organizations have strong incentives to say who will win rather than who might win. What structural changes in how media covers elections might create better incentives for probabilistic, uncertainty-acknowledging coverage? Is this a problem of media culture, economic incentives, audience psychology, or all three?