Case Study 17-1: The 2022 Senate Map and Aggregator Divergence
Background
The 2022 midterm elections offered a particularly rich testing ground for poll aggregation models. Going into the fall campaign, the fundamental political environment suggested significant Republican gains: historically, the party opposing a first-term president tends to perform strongly in midterms, and high inflation and low presidential approval ratings typically amplify that effect. But candidate quality in several key Senate races complicated that picture, and the expected "red wave" never materialized.
For poll aggregators, 2022 presented a specific methodological challenge: a proliferation of lower-quality polls, many from partisan-affiliated outfits, drove significant divergence between aggregators that included these polls at face value and those that downweighted or excluded them. The race in Pennsylvania between Democrat John Fetterman and Republican Mehmet Oz became the central case study.
The Pennsylvania Senate Race
For much of September and October 2022, three major aggregators told meaningfully different stories about the Pennsylvania Senate race:
- RealClearPolitics Average (mid-October): Fetterman +1.3
- FiveThirtyEight Average (mid-October): Fetterman +4.6
- Decision Desk HQ Average (mid-October): Fetterman +3.1
A divergence of more than three percentage points between RCP and 538 on the same race, at roughly the same time, is striking. It is not explained by random variation: both averages drew on the same pool of published polls, so the gap is a methodological artifact of which polls were included and how they were weighted.
What Explained the Divergence?
The primary driver was poll selection. RCP's simple average included every available poll in the timeframe, including several polls from organizations with poor methodological track records and notable Republican-leaning house effects. FiveThirtyEight's quality-weighted average assigned these lower-rated pollsters much less weight, and also adjusted for their detected house effects.
Specifically, several of the polls that pushed RCP's average toward a tighter race came from organizations that:
- Were affiliated with Republican-leaning media outlets
- Had detected house effects favoring Republicans of 2-4 percentage points
- Used online opt-in panels without probability-based sampling
- Had limited track records for accuracy in comparable races
FiveThirtyEight's model downweighted these polls significantly. RCP included them at face value. The result was a three-point difference in the average — a difference that had real consequences for how the race was being covered.
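To make the mechanics concrete, here is a minimal sketch of the two aggregation styles in Python. Every poll margin, quality weight, and house effect below is hypothetical, and the weighting scheme is a toy version of what FiveThirtyEight actually does; the real model also adjusts for factors such as recency and sample population.

```python
# A minimal sketch of the two aggregation styles, with entirely
# hypothetical polls, quality weights, and house effects. This is
# not FiveThirtyEight's actual model, which is far more elaborate.

# (pollster, fetterman_margin, quality_weight, house_effect)
# house_effect is the pollster's estimated lean in Fetterman-margin
# terms, so a 3-point Republican lean is -3.0.
polls = [
    ("Pollster A", +5.0, 1.0, 0.0),
    ("Pollster B", +4.0, 0.9, 0.0),
    ("Pollster C", -1.0, 0.2, -3.0),
    ("Pollster D", +1.0, 0.2, -2.5),
]

# RCP-style simple average: every poll counts equally, at face value.
simple_avg = sum(m for _, m, _, _ in polls) / len(polls)

# 538-style average (simplified): remove each pollster's estimated
# house effect, then weight the adjusted margin by a quality score.
num = sum((m - h) * w for _, m, w, h in polls)
den = sum(w for _, _, w, _ in polls)
weighted_avg = num / den

print(f"Simple average:   Fetterman {simple_avg:+.1f}")    # +2.2
print(f"Weighted average: Fetterman {weighted_avg:+.1f}")  # +4.2
```

Even in this toy setup, two low-quality polls with Republican house effects pull the simple average about two points toward Oz while moving the weighted average only a few tenths.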
The Media Amplification Effect
The RCP average was widely cited by right-leaning media as evidence that the Pennsylvania race was a toss-up. The 538 average was cited by center-left outlets as evidence of a Fetterman lead. Both citations were accurate representations of what those aggregators showed. But the underlying data was the same; the difference was entirely methodological.
This created a perverse situation: consumers of different media outlets were receiving genuinely different information about the same race, derived from the same underlying polls, because of aggregator methodology choices.
The Election Outcome
John Fetterman won Pennsylvania by 4.9 percentage points. The final polling averages showed:
| Aggregator | Final Average | Actual Result | Error |
|---|---|---|---|
| RCP | Fetterman +1.3 | Fetterman +4.9 | 3.6 pts |
| FiveThirtyEight | Fetterman +4.6 | Fetterman +4.9 | 0.3 pts |
| DDHQ | Fetterman +3.1 | Fetterman +4.9 | 1.8 pts |
The quality-weighted, house-effect-adjusted aggregator was dramatically more accurate than the simple average. The divergence between aggregators did not bracket an ambiguous truth from both sides; in this race, one methodology was simply better.
Discussion Questions
1. RCP's methodology is often defended on the grounds of transparency — anyone can see every poll included and compute the average themselves. Given the Pennsylvania case, is transparency alone sufficient justification for a methodology? What else matters?
2. Some argued that Republican-affiliated polls were "gaming" the RCP average by flooding the zone, commissioning many polls to push the average in their direction. Does the RCP methodology create this incentive? How would you design a system that preserves transparency while resisting this kind of gaming? (One possible starting point is sketched after these questions.)
3. Consumers of different media outlets received genuinely different information about the Pennsylvania race because of aggregator methodology choices. Who bears responsibility for this fragmented information environment — the aggregators, the media outlets, or someone else?
4. With hindsight, it's clear that FiveThirtyEight's methodology was more accurate in this race. Does this mean you should always prefer 538's approach over RCP's? What would you need to see to be confident in that conclusion?
5. How does the Pennsylvania 2022 case relate to the "measurement shapes reality" theme from Chapter 17? Did the divergent aggregator readings affect the campaign in any way you can identify from published reporting?
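As a starting point for question 2, one common design idea is to cap each pollster's influence, for example by averaging within each pollster before averaging across pollsters. The sketch below uses hypothetical numbers and a made-up "Flood Polling" firm to show how this neutralizes zone-flooding while every poll remains publicly listed.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical recent polls of the race: (pollster, fetterman_margin).
# "Flood Polling" has released four polls; every other firm has one.
polls = [
    ("Pollster A", +5.0),
    ("Pollster B", +4.0),
    ("Flood Polling", -1.0),
    ("Flood Polling", -2.0),
    ("Flood Polling", 0.0),
    ("Flood Polling", -1.5),
]

# Naive average: each release counts once, so volume buys influence.
naive = mean(m for _, m in polls)

# Flood-resistant average: collapse to one entry per pollster first,
# so releasing more polls cannot increase a firm's total weight.
by_pollster = defaultdict(list)
for name, margin in polls:
    by_pollster[name].append(margin)
resistant = mean(mean(ms) for ms in by_pollster.values())

print(f"Naive average:          Fetterman {naive:+.1f}")      # +0.8
print(f"One entry per pollster: Fetterman {resistant:+.1f}")  # +2.6
```

Variants of this idea, such as per-pollster weight caps, time decay, or minimum rating thresholds, trade off differently against transparency.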
Quantitative Extension
Using publicly available polling data (from 538's archive or similar sources), replicate the aggregation exercise for a different 2022 Senate race (suggestions: Georgia, Wisconsin, Nevada). Compute your own quality-weighted average and compare it to the simple RCP average. How much do they differ? How close was each to the final election result?
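As a starting point for the exercise, here is a sketch that pulls archived Senate polls and computes a simple late-campaign average for one race. The URL and the column names (state, pollster, party, pct, end_date, poll_id) are assumptions based on one vintage of 538's poll archive; verify them against the file you actually download, since the layout has changed over time.

```python
import pandas as pd

# Starter sketch for the replication exercise. The URL and column
# names are assumptions based on one vintage of 538's poll archive;
# check the actual file before relying on them.
URL = "https://projects.fivethirtyeight.com/polls/data/senate_polls_historical.csv"

df = pd.read_csv(URL)
race = df[(df["cycle"] == 2022) & (df["state"] == "Georgia")].copy()
race["end_date"] = pd.to_datetime(race["end_date"])

# One row per poll with the Dem-minus-Rep margin.
wide = race.pivot_table(index=["poll_id", "pollster", "end_date"],
                        columns="party", values="pct", aggfunc="first")
wide["margin"] = wide["DEM"] - wide["REP"]

# RCP-style simple average over the closing weeks of the campaign.
late = wide.reset_index()
late = late[late["end_date"] >= "2022-10-15"]
print("Simple average margin (DEM - REP):", round(late["margin"].mean(), 1))

# Next steps: attach pollster quality ratings and house-effect
# estimates, then reuse the weighted-average logic sketched earlier.
```

Comparing this simple average, your quality-weighted variant, and the certified election result reproduces the aggregator-divergence analysis above for a new race.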