Case Study 8.1: Weighting the Sun Belt — Meridian's Sampling Strategy for the Garza-Whitfield Race

The Problem

When Meridian Research Group was contracted to conduct rolling tracking polls for the Garza-Whitfield Senate race, Trish McGovern's first call was not to a technology vendor or a phone bank. It was to the state's voter registration office.

"Before I can design the sample," she told Carlos Mendez on his first day on the project, "I need to know who's on the list, how complete it is, and how stale it might be."

The voter file she received contained 3.2 million registered voters, with names, addresses, and past voting history. Phone numbers were present for approximately 71% of records — a mix of landline and cell numbers, many matched from commercial list vendors rather than self-reported to the registrar.

Trish spent two days with the file before drafting the sampling plan. What she found shaped every subsequent methodological decision.

The Sampling Frame Analysis

The voter file had several significant limitations.

Geographic concentration of missing numbers. The 29% of records without phone numbers were not distributed randomly across the state. They were disproportionately concentrated in the southern metro counties — the most Latino-heavy areas of the state, and precisely where Garza's margin would be determined. In those counties, phone number match rates ran as low as 58%, compared to 82% in northern suburban areas.

"We're going in already blind to a significant part of our target population," Trish told Vivian at the Monday morning meeting. "And it's not a random part — it's the part we need most."

The reasons for low match rates in southern metro counties were structural: higher residential mobility (reducing the time any address-phone pairing is valid), more cell-phone-only households, more frequent use of prepaid or temporary phones that don't show up in commercial directories, and a higher share of recent registrants whose records had not yet been matched.

Language. Of the 3.2 million registered voters, approximately 380,000 had Spanish-language preference flags in the voter file — either from self-reported data or from precinct-level proxy estimates. For these voters, an English-only telephone survey would likely achieve much lower cooperation rates, and respondents who completed an English interview despite Spanish-language preference might not be representative of their demographic peers.

Likely voter uncertainty. The state had seen significant new registrant activity in the eighteen months before the election, driven by a national voter registration drive. These new registrants — roughly 180,000 people — had no prior voting history in the file, making standard propensity-score-based likely voter screens unreliable for them.

The Sampling Design

Trish's solution was a disproportionate stratified sample with enhanced southern metro coverage, conducted in two modes (English and Spanish) with a voter propensity adjustment rather than a binary likely voter cut.

Stratification: The state was divided into five strata based on geographic region and Latino population concentration:

Stratum                        Share of LV population   Sample n (allocated)   Sampling rate (vs. proportionate)
Southern Metro (high Latino)   22%                      350                    2.0x
Northern cities                31%                      310                    1.0x
Inner suburbs                  18%                      180                    1.0x
Outer suburbs                  14%                      210                    1.5x (growing area, high uncertainty)
Rural                          15%                      150                    1.0x
Total                          100%                     1,200

The southern metro oversample at 2x the proportionate rate was designed to produce approximately 350 respondents from that stratum, ensuring enough Latino respondents for subgroup analysis even after applying likely voter weights.

Likely voter approach: Rather than a binary likely voter cut, Trish assigned each respondent a vote propensity weight based on their voter file history (number of elections voted in out of the last four general elections). Respondents with no history (new registrants) received a moderate propensity weight based on stated vote intent, calibrated to national data on new registrant turnout patterns.
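The propensity-weight assignment can be sketched as follows. The turnout probabilities, the new-registrant base rate, and the `stated_intent` adjustment below are illustrative assumptions, not Meridian's actual calibration:

```python
# Illustrative turnout probabilities by participation in the last four
# general elections (0-4). These values are assumed, not from the case.
TURNOUT_BY_HISTORY = {0: 0.35, 1: 0.45, 2: 0.60, 3: 0.75, 4: 0.88}

def propensity_weight(elections_voted, is_new_registrant, stated_intent=None):
    """Turnout-propensity weight for one respondent.

    New registrants have no file history, so they get a moderate base
    rate adjusted by stated vote intent (both values hypothetical).
    """
    if is_new_registrant:
        base = 0.50  # assumed new-registrant turnout rate
        return base * (1.2 if stated_intent == "definitely" else 0.8)
    return TURNOUT_BY_HISTORY[elections_voted]
```

Unlike a binary screen, every respondent stays in the sample; low-propensity respondents simply count for less.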

Mode: Bilingual call centers, with a Spanish-language version of the questionnaire available at any point. All numbers classified as Spanish-language preference in the voter file received a Spanish-first protocol. All other numbers received English-first with a Spanish transfer option.

The Weighting Procedure

After data collection (3,847 contacts yielded 1,214 completes; the response rate, calculated over all sampled numbers by AAPOR Method 3, was approximately 5.4%), Trish ran the weighting protocol.

Stage 1: Design weight. Correct for the unequal selection probabilities introduced by the disproportionate stratification. Southern metro respondents, sampled at the highest relative rate, received the largest downweights; outer suburban respondents received slighter ones. This brought the sample back to proportionate regional representation.
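The Stage 1 design weight is the ratio of each stratum's population share to its sample share. A minimal sketch using the shares and allocations from the stratification table:

```python
# Stage 1 design weights: population share / sample share per stratum.
# Shares and allocated n's are taken from the stratification table.
strata = {
    # stratum: (share of LV population, allocated sample n)
    "southern_metro": (0.22, 350),
    "northern_cities": (0.31, 310),
    "inner_suburbs": (0.18, 180),
    "outer_suburbs": (0.14, 210),
    "rural": (0.15, 150),
}

total_n = sum(n for _, n in strata.values())  # 1,200

design_weights = {
    name: pop_share / (n / total_n)
    for name, (pop_share, n) in strata.items()
}
# Oversampled strata come out below 1.0 (downweighted):
# southern_metro ~0.75 and outer_suburbs ~0.80; the rest land above 1.0.
```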

Stage 2: Vote propensity weight. Apply the voter propensity weight, which slightly downweights very high propensity respondents (who are overrepresented among survey completers) and upweights lower-propensity respondents (who are harder to reach and underrepresented).

Stage 3: Demographic rake. Rake on age, gender, race/ethnicity, and education to voter file and Census targets, accounting for the interaction between region and race (so that the Latino demographic correction is applied correctly within regions rather than globally).
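Stage 3's rake can be sketched as plain iterative proportional fitting. The function below is generic; the region-by-race interaction is handled by raking on a combined region×race category rather than on a global race margin:

```python
def rake(rows, weights, targets, n_iter=50):
    """Iterative proportional fitting over categorical margins.

    rows:    list of dicts of respondent attributes
    targets: {dimension: {category: target population proportion}}
    """
    w = list(weights)
    for _ in range(n_iter):
        for dim, target in targets.items():
            # Weighted total in each category of this dimension.
            totals = {}
            for row, wi in zip(rows, w):
                totals[row[dim]] = totals.get(row[dim], 0.0) + wi
            grand = sum(totals.values())
            # Scale respondents so this margin matches its target.
            for i, row in enumerate(rows):
                w[i] *= target[row[dim]] / (totals[row[dim]] / grand)
    return w

# To apply the race correction within regions, rake on a combined cell
# (hypothetical key) such as row["region_race"] = "southern_metro:latino"
# rather than on a single statewide race margin.
```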

Final effective n: After weighting, the effective sample size (accounting for design effects) was approximately 870. The headline MOE was ±3.5 percentage points.
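The effective-n and MOE arithmetic follows the standard Kish approximation; a sketch (the weight vector would be whatever Stages 1-3 produced, not shown here):

```python
import math

def effective_n(weights):
    """Kish effective sample size: (sum w)^2 / sum w^2."""
    return sum(weights) ** 2 / sum(wi * wi for wi in weights)

def moe(n_eff, p=0.5, z=1.96):
    """95% CI half-width for a proportion, in percentage points."""
    return 100 * z * math.sqrt(p * (1 - p) / n_eff)

# With equal weights, effective n equals the raw n; unequal weights
# shrink it. For n_eff = 870 this textbook half-width is about 3.3
# points; published headline MOEs are often rounded conservatively.
```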

What the Weighting Did to the Numbers

The raw unweighted data showed Garza at 46% and Whitfield at 45%. After full weighting, the estimate was Garza 49%, Whitfield 44%.

Carlos walked through the attribution analysis:

  • Design weight (southern metro correction): +1.2 points for Garza
  • Vote propensity weight: +0.8 points for Garza
  • Demographic rake (primarily Latino upweight): +1.1 points for Garza

All three corrections moved in the same direction — toward Garza — because all three corrections increased the representation of groups that disproportionately favored her.
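Carlos's stage-by-stage attribution can be reproduced by re-estimating the candidate's share after cumulatively applying each stage's weights. The helper below is a sketch (the respondent-level data is not available here); note that such a decomposition is order-dependent, since the stages interact:

```python
def weighted_share(support, weights):
    """Weighted proportion supporting the candidate (support is 0/1)."""
    return sum(s * w for s, w in zip(support, weights)) / sum(weights)

def attribution(support, stage_weights):
    """Per-stage shift in the weighted estimate.

    stage_weights: one per-respondent weight vector per stage, applied
    cumulatively in order (design, propensity, rake).
    """
    cumulative = [1.0] * len(support)
    estimates = [weighted_share(support, cumulative)]  # unweighted
    for stage in stage_weights:
        cumulative = [c * w for c, w in zip(cumulative, stage)]
        estimates.append(weighted_share(support, cumulative))
    # Shift contributed by each stage, in order of application.
    return [b - a for a, b in zip(estimates, estimates[1:])]
```

Because the stages interact, the per-stage shifts depend on the order applied; the reported +1.2, +0.8, and +1.1 sum to roughly the full three-point move in Garza's share.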

"If we hadn't weighted at all," Carlos noted, "we'd have called this a toss-up. We're calling it Garza +5. That's a big difference."

"And we'd have been wrong not to weight," Trish said. "The unweighted sample was systematically underrepresenting her coalition. Weighting is what makes the number mean something."

The Disclosure Challenge

When Meridian prepared the methodology statement for public release, they faced the question that every pollster with a complex weighting scheme faces: how much to disclose?

Their final methodology statement included:

  • Sampling frame (state voter registration file, commercial phone match)
  • Mode (bilingual telephone, live interviewer)
  • Sample design (disproportionate stratified with five strata)
  • Weighting variables and population targets
  • Effective sample size and headline MOE
  • Field dates and response rate calculation (AAPOR Method 3)

Vivian argued for maximum disclosure. "Anyone who wants to critique our methodology should be able to. If we're confident in what we did, we should be willing to show it."

"Some clients won't want the full weighting details out there," Trish noted. "If Whitfield's team sees we're weighting up Latino voters heavily, they'll use that to attack the poll."

"Then we explain why we're weighting up Latino voters," Vivian said. "Because the voter file shows they're underrepresented in phone samples in this region, and we're correcting for that. That's not a political judgment — it's a methodological one."

The full methodology statement was published alongside the poll results.

Discussion Questions

  1. Trish chose a vote propensity weight rather than a binary likely voter screen. What are the advantages of this approach? What are the risks?

  2. All three weighting corrections (design weight, propensity weight, demographic rake) moved the estimate in the same direction — toward Garza. What does this convergence tell us about the systematic biases in unweighted telephone samples? Is this coincidence, or is there a structural reason these corrections tend to move together?

  3. Meridian used a Spanish-first protocol for phone numbers flagged as Spanish-language preference. What are the potential benefits of this approach? What challenges might it introduce in terms of data comparability — are Spanish-language and English-language interviews measuring the same thing?

  4. The effective n after weighting was 870, compared to 1,214 completes. What accounts for this reduction in effective sample size? Under what circumstances would the design effect be larger or smaller?

  5. Vivian insists on full methodology disclosure. Trish notes that opponents could use that disclosure to attack the poll. Who do you think is right? What professional and ethical considerations are in tension here?