Case Study 22.2: Forecasting Without Polls — The 2018 Brazilian Municipal Elections
Background
This case study examines the challenge of electoral forecasting in a data-scarce subnational environment, using Brazil's 2018 municipal and state elections as the context. Brazil is a middle-income country with substantial polling infrastructure for national and major state-level elections, but dramatically thinner coverage at the municipal and regional levels. The 2018 cycle also coincided with the presidential election in which Jair Bolsonaro unexpectedly won, creating a national-level covariate that substantially affected down-ballot races in unpredictable ways.
Brazil has 26 states, one federal district, and 5,570 municipalities. Gubernatorial polling was conducted with meaningful frequency in approximately half the states; in the remainder, analysts were working with 0–2 polls from the entire campaign. Municipal-level races, except in São Paulo and Rio de Janeiro, received essentially no professional polling.
The Data Environment
Available data:
- 2014 presidential and 2016 municipal election results by municipality (from the Tribunal Superior Eleitoral — Brazil's electoral authority)
- Party and candidate filing data
- Economic indicators: state-level unemployment and GDP per capita
- Social indicators: Bolsa Família (conditional cash transfer) recipient density by municipality
- Polling: available for the federal presidential race and approximately 14 of 26 state races
Not available:
- Any individual municipality-level polling outside major cities
- Reliable voter party-affiliation data (Brazil does not have party registration in the American sense)
- Exit polling in most states
The Methodology: Structural Forecasting Under Data Scarcity
A Brazilian electoral analytics firm, faced with forecasting gubernatorial races in the 12 states with no polling data, developed the following approach:
Step 1: Presidential Vote as State Partisan Lean Proxy
The 2014 presidential election between Dilma Rousseff (PT) and Aécio Neves (PSDB) was used as the baseline partisan lean measure for each state. States where PT performed strongly relative to the national average in 2014 were coded as PT-leaning; states where PSDB performed strongly were coded as PSDB-leaning.
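The Step 1 coding rule can be sketched as follows. Only the compare-to-national-average logic comes from the text; the function names, the threshold at zero, and the state vote shares are illustrative assumptions.

```python
# Sketch of the Step 1 partisan-lean proxy. The state figures below are
# hypothetical; only the relative-to-national comparison comes from the text.

def partisan_lean(state_pt_share: float, national_pt_share: float) -> float:
    """PT lean in percentage points: state PT share minus national PT share."""
    return state_pt_share - national_pt_share

def code_state(lean: float) -> str:
    """Positive lean -> PT-leaning; otherwise PSDB-leaning."""
    return "PT-leaning" if lean > 0 else "PSDB-leaning"

national_pt_2014 = 51.6  # Rousseff's approximate national second-round share

states = {"State A": 64.0, "State B": 35.7}  # illustrative, not real results
for name, share in states.items():
    lean = partisan_lean(share, national_pt_2014)
    print(f"{name}: {lean:+.1f} pts -> {code_state(lean)}")
```

In practice the threshold would need more care than a simple sign test, which is exactly the fragility the next paragraph describes.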
This approach has a significant limitation: Brazilian parties are weakly programmatic compared to American parties. The PT-PSDB alignment of 2014 was already fraying by 2018, and Bolsonaro's candidacy under the PSL fragmented the right-wing vote in ways that made 2014 presidential patterns a poor guide to 2018 state races.
Step 2: Economic Conditions Model
State unemployment rates and per-capita income growth since 2014 were used to predict incumbent support. States with stronger economic conditions were expected to show higher incumbent approval; states with deteriorating conditions were expected to be more vulnerable to challengers.
The model used a coefficient estimated from the pooled 2010 and 2014 state-level election data: approximately 0.6 percentage points of incumbent support per percentage point difference in state unemployment from the national average. This coefficient was uncertain — it was estimated from only two election cycles with limited data — but represented the best available empirical prior.
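A minimal sketch of the Step 2 adjustment, assuming the natural sign convention (above-average unemployment costs the incumbent support); the baseline value and function names are hypothetical.

```python
# Step 2 sketch: ~0.6 pp of incumbent support per pp of unemployment gap
# versus the national average. The negative sign (higher unemployment hurts
# the incumbent) is an assumed convention; the baseline below is hypothetical.

UNEMPLOYMENT_COEF = -0.6  # pp of support per pp of excess unemployment

def economic_adjustment(state_unemployment: float, national_unemployment: float) -> float:
    return UNEMPLOYMENT_COEF * (state_unemployment - national_unemployment)

baseline = 45.0  # hypothetical structural baseline for an incumbent, in pp
adjusted = baseline + economic_adjustment(14.0, 12.0)
print(round(adjusted, 1))  # 43.8: two extra points of unemployment cost 1.2 pp
```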
Step 3: Bolsonaro Covariate
The pivotal development in the final weeks of the campaign was the dramatic rise in Bolsonaro's support, which culminated in a first-round vote share of approximately 46% nationally. Historical data from previous high-volatility elections in Brazil suggested that when a major national candidate performs dramatically better or worse than expected, the coattail effect for state-level candidates of the same party or ideological alignment is substantial.
The firm estimated that in states where Bolsonaro's vote share exceeded 50% in the first round, governors affiliated with right-leaning parties (PSL, MDB, DEM) would outperform their historical baseline by approximately 3–4 points. This estimate was based on analogies from the 2002 Lula wave and the 2010 Dilma election rather than from 2018-specific data.
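As a sketch, the Step 3 adjustment reduces to a threshold rule. The party list comes from the text; the flat 3.5-point bump, the midpoint of the stated 3–4 point range, is an illustrative choice rather than the firm's actual number.

```python
# Step 3 sketch: coattail bump for right-leaning gubernatorial candidates in
# states where Bolsonaro's first-round share exceeded 50%. The flat 3.5 pt
# value is the midpoint of the 3-4 pt range stated in the case study.

RIGHT_LEANING = {"PSL", "MDB", "DEM"}

def coattail_bump(bolsonaro_first_round_share: float, party: str) -> float:
    if bolsonaro_first_round_share > 50.0 and party in RIGHT_LEANING:
        return 3.5
    return 0.0

print(coattail_bump(58.0, "PSL"))  # 3.5
print(coattail_bump(42.0, "PSL"))  # 0.0
print(coattail_bump(58.0, "PT"))   # 0.0
```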
Step 4: Explicit Uncertainty Communication
The firm's final deliverable for each of the 12 unpolled states was not a single probability estimate but a range: a 5th-to-95th percentile interval for the leading gubernatorial candidate's final vote share, with explicit notation of the primary sources of uncertainty:
- Model uncertainty (±2.5 points from structural model)
- Bolsonaro covariate uncertainty (±3.0 points, reflecting genuine unpredictability of coattail effects)
- Candidate quality uncertainty (qualitative assessment, not quantified)
For 8 of the 12 states, the range was approximately 20 percentage points wide — wide enough to encompass multiple possible outcomes. The firm communicated explicitly that for these states, the forecast was "indicative" rather than "predictive."
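One way to see why the quantified components alone cannot account for 20-point ranges: combining the two stated error terms in quadrature, under an assumed independent-normal model (the case study does not describe the firm's actual aggregation method), yields an interval roughly 13 points wide. Much of the stated width must therefore reflect the unquantified candidate-quality term.

```python
import math

# Combine the two quantified error sources in quadrature. Treating each
# +/- value as one standard deviation of an independent normal error is an
# assumption; the case study does not specify the aggregation method.

model_sd = 2.5     # structural-model uncertainty, pp
coattail_sd = 3.0  # Bolsonaro covariate uncertainty, pp

combined_sd = math.sqrt(model_sd**2 + coattail_sd**2)
half_width = 1.645 * combined_sd  # z-score for a 5th-95th percentile interval

print(round(combined_sd, 2))     # 3.91
print(round(2 * half_width, 1))  # 12.8, well short of the ~20 pt ranges
```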
Results
Of the 12 states with no polling data:
- 7 gubernatorial outcomes fell within the firm's stated 5th-to-95th percentile range
- 5 outcomes fell outside this range, primarily because the Bolsonaro coattail effect was larger in some states than the model anticipated
- In 2 states, the eventual winner was a candidate the model did not identify as a leading contender, because of last-minute candidate consolidation dynamics that no model could have captured
The accuracy rate was substantially lower than in the 14 states where polling was available (where 12 of 14 outcomes fell within the stated ranges). But the firm's explicit uncertainty communication meant that the 5 misses were not presented to clients as surprises — the stated uncertainty ranges had been wide enough to communicate genuine ignorance.
Lessons from the Brazilian Case
Lesson 1: Structural models have large irreducible uncertainty in multi-party systems. Brazilian politics in 2018 was characterized by fragmented parties, new candidates, and a national shock (Bolsonaro's rise) that had no precedent. The structural model's limited accuracy in the unpolled states reflects genuine unpredictability, not merely methodological inadequacy.
Lesson 2: Wide confidence intervals honestly communicated are more valuable than narrow intervals that prove wrong. The firm's decision to communicate 20-percentage-point ranges was initially poorly received by clients who wanted specific predictions. But when the misses occurred, the credibility of the firm's overall analytical work was intact because the uncertainty had been clearly stated in advance.
Lesson 3: Coattail effects are among the hardest phenomena to model. The Bolsonaro coattail effect was real but highly variable across states — stronger in some regions than the model suggested, weaker in others. Any model that attempted to precisely quantify coattail effects from two previous elections was overconfident.
Lesson 4: The responsible threshold for claiming forecasting capability. After the election, the firm's leadership had an internal debate about whether they should have produced state-level forecasts at all for the 12 unpolled states, given the genuine limitations. The conclusion was that probabilistic scenario analysis — with extremely explicit uncertainty quantification — was more valuable than silence, because clients needed some framework for thinking about these races. But the communication discipline was essential: the firm was providing structured uncertainty, not prediction.
Discussion Questions
1. The firm used 2014 presidential vote share as a proxy for state partisan lean in 2018. By 2018, the PT-PSDB alignment had fragmented substantially. How would you assess whether a historical partisan lean proxy remains valid in a changing political environment? What alternative proxy variables might have worked better for 2018?
2. The firm's structural model had large uncertainty in part because it was estimated from only two election cycles. What are the minimum data requirements for estimating a state-level electoral model that can be applied with confidence? How would your answer change if you could pool across Brazilian states vs. needing to estimate state-specific models?
3. The firm debated whether to produce forecasts at all for the 12 unpolled states. Construct the best argument for "produce a forecast" and the best argument for "refuse to forecast." Which do you find more convincing, and under what conditions might your answer change?
4. Compare Brazil 2018 to a scenario where an American firm is asked to forecast state legislative races in 50 states simultaneously. What data sources and methodological strategies would the American firm have that the Brazilian firm lacked? What challenges would be structurally similar?
5. The chapter discusses parallel vote tabulation as an alternative to pre-election polling in data-scarce environments. Could PVT have helped in the Brazilian municipal context? What are its limitations when applied to subnational races where ballot design, candidate names, and party affiliations vary significantly across thousands of municipalities?