Chapter 21 Exercises: Building a Simple Election Model

DataField.Dev

Conceptual Review

Exercise 21.1 — The Three-Layer Architecture In your own words, explain the three-layer architecture of the election model built in this chapter.

a) What does each layer contribute that the others do not?
b) Why would a model with only Layer 1 (poll aggregation) be insufficient?
c) Why would a model with only Layer 3 (uncertainty simulation), lacking Layers 1 and 2, be meaningless?
d) In what order should the layers be built, and why does order matter?

Exercise 21.2 — Recency Weighting Logic A forecaster considers two polls: Poll A conducted 5 days before the election with 600 respondents, and Poll B conducted 25 days before the election with 1,200 respondents.

a) Using a decay rate of λ = 0.05, calculate the recency weight for each poll.
b) Calculate the sample-size weight (using √n) for each poll.
c) Calculate the composite weight for each poll (before normalization).
d) Normalize the weights so they sum to 1. What fraction of the weighted average does each poll contribute?
e) If Poll A shows D+3.0 and Poll B shows D+1.5, what is the weighted average?
f) Now repeat d) and e) with λ = 0.12, recomputing the recency and composite weights first. How does the result change, and why?
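
The arithmetic in this exercise can be checked with a few lines of Python once the hand calculation is done. The sketch below assumes the recency weight takes the exponential form exp(-λ × days before the election), which the exercise does not state explicitly, and that the composite weight is the product of the recency weight and √n. The function name weighted_poll_average and the tuple layout are illustrative and are not taken from the chapter's code.

```python
import math

def weighted_poll_average(polls, decay_rate):
    """Return (normalized_weights, weighted_average) for a list of polls.

    Each poll is a tuple (days_before_election, sample_size, d_margin).
    Assumptions of this sketch: recency weight = exp(-decay_rate * days)
    and sample-size weight = sqrt(n), multiplied together.
    """
    raw = [math.exp(-decay_rate * days) * math.sqrt(n) for days, n, _ in polls]
    total = sum(raw)
    weights = [w / total for w in raw]          # normalized so they sum to 1
    average = sum(w * margin for w, (_, _, margin) in zip(weights, polls))
    return weights, average

# Poll A and Poll B as described in the exercise
polls = [(5, 600, 3.0), (25, 1200, 1.5)]
for lam in (0.05, 0.12):
    weights, avg = weighted_poll_average(polls, lam)
    shares = [round(w, 3) for w in weights]
    print(f"lambda={lam}: weights={shares}, average margin={avg:+.2f}")
```

Running the loop for both decay rates makes it easy to see how much of the average shifts toward the more recent poll as λ grows.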

Exercise 21.3 — Fundamentals Prior Construction The fundamentals model uses three inputs: state partisan lean, presidential approval, and economic conditions.

a) A state has a Republican lean of 3.5 points. What is the baseline D margin before any adjustments?
b) The Democratic presidential incumbent has an approval rating of 46 percent nationally. Using an effect of 0.3 points per percentage point deviation from 50, what is the approval adjustment?
c) State unemployment is 4.8 percent; national unemployment is 4.0 percent. Using an effect of -0.8 points per percentage point that the state rate exceeds the national rate, what is the unemployment adjustment?
d) Calculate the total fundamentals prior for a Democratic Senate candidate in this environment.
e) A polling average shows D+1.5. Using a blending weight of 0.70 on the polls, what is the blended point estimate?
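
A short script can confirm the hand calculation for parts a) through e). This is a minimal sketch, assuming the three components simply add, that the adjustments apply directly to the Democratic margin because the president is a Democrat, and that the blending weight is the share given to the polls. The function names fundamentals_prior and blend are hypothetical and may not match the chapter's code.

```python
def fundamentals_prior(state_r_lean, approval, state_unemp, national_unemp,
                       approval_effect=0.3, unemp_effect=-0.8):
    """Democratic margin implied by fundamentals, in points.

    Assumes an additive model and a Democratic president, so low approval
    and high relative unemployment both reduce the D margin.
    """
    baseline = -state_r_lean                      # a Republican lean is a negative D margin
    approval_adj = approval_effect * (approval - 50.0)
    unemp_adj = unemp_effect * (state_unemp - national_unemp)
    return baseline + approval_adj + unemp_adj

def blend(poll_average, prior, poll_weight):
    """Weighted mix of polls and fundamentals; poll_weight lies in [0, 1]."""
    return poll_weight * poll_average + (1.0 - poll_weight) * prior

# Inputs from the exercise
prior = fundamentals_prior(3.5, 46.0, 4.8, 4.0)
print(f"prior={prior:+.2f}, blended={blend(1.5, prior, 0.70):+.2f}")
```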

Python Lab Exercises

Exercise 21.4 — Modify the Decay Rate Open code/example-01-poll-aggregation.py. Change the DECAY_RATE parameter from 0.05 to each of the following values: 0.01, 0.03, 0.08, and 0.15.

a) Record how the weighted polling average changes for each decay rate.
b) Plot the polling average vs. decay rate. What shape does the relationship take?
c) At what decay rate does a poll from 30 days ago receive less than 25 percent of the weight of a same-day poll?
d) Argue for or against the proposition that a higher decay rate always produces a more accurate forecast.

Exercise 21.5 — Custom Weighting Scheme The current model weights polls by recency × √sample_size × quality_multiplier. Propose and implement an alternative weighting scheme that also incorporates:

Methodology bonus: phone polls receive a 1.2 multiplier; IVR polls receive 0.8
Pollster recency penalty: pollsters who have not published a poll in this cycle in the past 30 days receive a 0.7 multiplier

a) Implement this in Python by modifying the calculate_weights function.
b) Calculate the weighted average under your new scheme.
c) Compare it to the original scheme and explain any difference.
d) What assumption about pollster quality does the "recency penalty" embody? Is it a reasonable assumption?
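
One possible starting point for part a) is sketched below. The signature of the chapter's calculate_weights function is not shown here, so this sketch uses its own hypothetical Poll record and a standalone extended_weight function. It also assumes an exponential recency weight, and it treats survey modes other than phone and IVR as neutral (multiplier 1.0).

```python
import math
from dataclasses import dataclass

@dataclass
class Poll:
    """Hypothetical poll record; field names are illustrative."""
    days_old: int                  # days since the poll was conducted
    sample_size: int
    quality: float                 # the model's existing quality multiplier
    mode: str                      # "phone", "ivr", or another mode
    pollster_active_30d: bool      # pollster released another poll in the past 30 days

METHOD_MULTIPLIER = {"phone": 1.2, "ivr": 0.8}   # other modes fall back to 1.0
INACTIVE_PENALTY = 0.7

def extended_weight(poll, decay_rate=0.05):
    """Unnormalized weight: recency x sqrt(n) x quality x method x activity."""
    recency = math.exp(-decay_rate * poll.days_old)   # assumed exponential decay
    method = METHOD_MULTIPLIER.get(poll.mode, 1.0)
    activity = 1.0 if poll.pollster_active_30d else INACTIVE_PENALTY
    return recency * math.sqrt(poll.sample_size) * poll.quality * method * activity

def normalize(weights):
    """Scale weights so they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]

# Two made-up polls that differ only in mode and pollster activity
sample = [Poll(5, 800, 1.0, "phone", True), Poll(5, 800, 1.0, "ivr", False)]
print(normalize([extended_weight(p) for p in sample]))
```

Keeping each multiplier as a separate named factor makes it straightforward to switch one off and compare against the original scheme in part c).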

Exercise 21.6 — Monte Carlo Parameter Exploration Open code/example-02-monte-carlo-simulation.py. Using the run_monte_carlo function:

a) Set the systematic error SD to 0. How does the win probability change? What does this mean?
b) Set the systematic error SD to 5.0 (a very large systematic error). What is the win probability? What does the distribution look like?
c) Keep the systematic error SD at 2.0, but change the point estimate to -1.0 (Whitfield leading by 1 point). What is the Garza win probability?
d) Find the point estimate at which the Garza win probability is exactly 50 percent under the base uncertainty parameters. (Hint: it is not 0.0. Why?)
e) Explain in plain language why the 50-percent win probability point is not exactly 0.
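
For readers who want to see the mechanics before opening the example file, here is a stripped-down simulation. It is only a sketch: the function simulate_win_probability is hypothetical and is not the chapter's run_monte_carlo, the sampling SD of 1.5 is an illustrative value, and both error terms are assumed to be zero-mean normal draws. Because of that symmetry, this simplified version places its 50 percent point at a margin of 0, so it does not reproduce whatever feature of the chapter's model the hint in part d) refers to.

```python
import random

def simulate_win_probability(point_estimate, systematic_sd=2.0, sampling_sd=1.5,
                             n_sims=20_000, seed=42):
    """Share of simulated elections in which the margin is positive.

    Each simulation adds a shared systematic polling error and an
    independent sampling error to the point estimate (both assumed
    normal with mean zero).
    """
    rng = random.Random(seed)          # fixed seed so runs are reproducible
    wins = 0
    for _ in range(n_sims):
        systematic = rng.gauss(0.0, systematic_sd)
        sampling = rng.gauss(0.0, sampling_sd)
        if point_estimate + systematic + sampling > 0:
            wins += 1
    return wins / n_sims

# Compare a narrow and a wide systematic error at the same point estimate
for sd in (0.0, 2.0, 5.0):
    print(f"systematic_sd={sd}: win probability={simulate_win_probability(1.0, sd):.3f}")
```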

Exercise 21.7 — Add a Fourth Fundamentals Factor The fundamentals model currently uses state lean, presidential approval, and unemployment. Propose and implement a fourth factor: incumbent candidate advantage (the incumbent senator has a personal vote advantage of approximately 2 points relative to a generic candidate).

a) Write code to incorporate this factor. It should apply when the incumbent is running for re-election but not in open-seat races.
b) How does adding this factor change the fundamentals prior for the Garza-Whitfield race? (Assume Whitfield is the incumbent.)
c) How does it change the blended point estimate?
d) Is this addition theoretically justified? Cite evidence for or against the existence of a Senate incumbent advantage.
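
As a hint for part a), the adjustment can be expressed as a signed term added to the existing prior. The sketch below is one way to do it under stated assumptions: the function name incumbency_adjustment and the incumbent_side encoding are hypothetical, and the margin is measured from the perspective of the candidate being modeled, so an incumbent opponent produces a negative adjustment.

```python
def incumbency_adjustment(incumbent_side, advantage=2.0):
    """Points to add to the modeled candidate's margin.

    incumbent_side: +1 if the modeled candidate is the incumbent,
                    -1 if the opponent is the incumbent,
                     0 for an open-seat race (no adjustment).
    """
    if incumbent_side not in (-1, 0, 1):
        raise ValueError("incumbent_side must be -1, 0, or +1")
    return advantage * incumbent_side

def prior_with_incumbency(base_prior, incumbent_side, advantage=2.0):
    """Existing fundamentals prior plus the incumbency term."""
    return base_prior + incumbency_adjustment(incumbent_side, advantage)

# Illustrative base prior of 0.0 with the opponent as the incumbent
print(prior_with_incumbency(0.0, incumbent_side=-1))
```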

Applied Analysis

Exercise 21.8 — Full Model Run and Interpretation Run the complete code/example-03-election-model.py with the default settings.

a) Record the following outputs: polling average, fundamentals prior, blended point estimate, Garza win probability, 10th percentile, 90th percentile.
b) In plain language suitable for a non-technical campaign staffer, write a 3-sentence interpretation of the model output that communicates the central finding and the key uncertainty.
c) A campaign staffer says: "61 percent, so we've probably won." Write a brief correction that explains why this interpretation is wrong.
d) The campaign is deciding whether to increase spending in this state by $500,000. Based solely on the model output, is this state a reasonable target for additional spending? What additional information would you want before making this recommendation?

Exercise 21.9 — Sensitivity Analysis Design Design a sensitivity analysis for a different input: the fundamentals blending weight (poll_weight parameter, which ranges from 0 to 1).

a) Write code that runs the full model at poll_weight = 0.1, 0.25, 0.5, 0.75, 0.9, and 1.0 and records the win probability at each value.
b) Plot win probability vs. poll_weight.
c) At what blending weight is the model most sensitive to a 3-point systematic polling error? Show this mathematically.
d) The chapter recommends increasing poll_weight as Election Day approaches. At what time before the election should poll_weight exceed 0.9? What criterion would you use to set this threshold?
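
The sweep in part a) can be prototyped without the full model. The sketch below is a simplification under stated assumptions: it replaces the simulation with a closed-form normal approximation, it holds the total uncertainty fixed at a single SD, and the polling average, prior, and SD values are made up for illustration. The function names are hypothetical.

```python
import math

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def blended_estimate(poll_average, prior, poll_weight):
    """Linear blend of polls and fundamentals."""
    return poll_weight * poll_average + (1.0 - poll_weight) * prior

def win_probability(poll_average, prior, poll_weight, total_sd):
    """P(margin > 0) if the margin is normal around the blended estimate."""
    return normal_cdf(blended_estimate(poll_average, prior, poll_weight) / total_sd)

# Illustrative inputs only; substitute the values from your own model run
POLL_AVERAGE, PRIOR, TOTAL_SD = 2.0, -1.0, 4.0
for w in (0.1, 0.25, 0.5, 0.75, 0.9, 1.0):
    base = blended_estimate(POLL_AVERAGE, PRIOR, w)
    # Same blend if the polls overstated the margin by 3 points
    shifted = blended_estimate(POLL_AVERAGE - 3.0, PRIOR, w)
    p = win_probability(POLL_AVERAGE, PRIOR, w, TOTAL_SD)
    print(f"poll_weight={w:.2f}  blended={base:+.2f}  win_prob={p:.3f}  "
          f"shift_from_poll_error={shifted - base:+.2f}")
```

The last column shows how a fixed polling error propagates into the blended estimate at each weight, which is a useful numerical check on the algebra requested in part c).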

Exercise 21.10 — Model Comparison Build three versions of the model:

Model A: Uses only polls (poll_weight = 1.0, no fundamentals)
Model B: Uses only fundamentals (poll_weight = 0.0, no polls)
Model C: Blended model with poll_weight = 0.75 (the default from the chapter)

a) Run all three models and record their win probabilities.
b) Which model is most and least confident (narrowest and widest confidence interval)?
c) If the polls have a systematic 3-point error toward Democrats, which model produces the smallest final-point-estimate error?
d) If the fundamentals model has a structural break (because a major policy event changed the political environment), which model is most robust?
e) Based on your analysis, when in the electoral cycle would you recommend each model? Why?

Extension and Critical Thinking

Exercise 21.11 — The Herding Problem in Practice Herding (Chapter 20) occurs when pollsters adjust their results toward the consensus. Suppose you are building a poll aggregator and you suspect some polls have been adjusted toward the consensus.

a) What statistical test could you apply to detect if the distribution of poll results is "too tight" to be consistent with independent sampling? (Hint: compare the observed variance of poll results to the expected variance under independent sampling.)
b) Write Python code that computes the expected variance of poll results if they were independent measurements with the sample sizes given, and compares this to the actual observed variance.
c) If you detect herding, how should you adjust your aggregation? (Should you widen your uncertainty estimates, down-weight consensus-adjacent polls, or something else?)
d) Is there a way to detect which specific polls have been herded toward the consensus vs. which represent genuine outlier measurements? What approach would you use?
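
The variance comparison in the hint can be sketched as follows. This is a rough illustration rather than a formal test. It assumes every poll measures the same true margin, that the race is close to 50-50 with few undecided voters (so the sampling variance of a margin in points is about 10,000 / n), and that design effects are ignored. The function name herding_ratio and the poll values are hypothetical.

```python
import statistics

def herding_ratio(polls):
    """Compare observed and expected variance of poll margins.

    polls: list of (d_margin_in_points, sample_size) tuples.
    Returns (observed_variance, expected_variance, ratio). A ratio far
    below 1 means the polls agree more closely than independent random
    samples of these sizes normally would.
    """
    margins = [margin for margin, _ in polls]
    observed = statistics.variance(margins)          # sample variance across polls
    # Approximate sampling variance of each poll's margin, in points squared
    per_poll = [10_000 / n for _, n in polls]
    expected = statistics.mean(per_poll)
    return observed, expected, observed / expected

# Made-up polls that cluster very tightly around D+2
example = [(2.0, 800), (2.1, 1000), (1.9, 600), (2.0, 900)]
observed, expected, ratio = herding_ratio(example)
print(f"observed={observed:.3f}, expected={expected:.2f}, ratio={ratio:.4f}")
```

With only a handful of polls the ratio is noisy, so a low value is a prompt for closer inspection rather than proof of herding.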

Exercise 21.12 — Nadia's Presentation: Communication Choices Nadia presents the model to the Garza campaign leadership. The finance director asks: "If we're at 61 percent, what does the other 39 percent look like? How does Whitfield win?"

Write a 200-word answer in Nadia's voice that:

Explains the primary scenarios under which Whitfield wins (citing specific uncertainty sources)
Distinguishes between outcomes that are likely vs. unlikely within the 39 percent
Does not use jargon but does communicate the probabilistic thinking accurately
Closes with a concrete recommendation for what information would most change the probability estimate