Case Study 40.1: Reading a Published Meta-Analysis — What the Aggregate Tells Us (and Hides)
Background
In 2022, a team of researchers published a meta-analysis in Psychological Bulletin titled (fictitiously for this case study): "Responsive Listening and Romantic Attraction: A Meta-Analytic Review of 43 Studies." The meta-analysis aggregated findings from 43 studies across 18 years, including laboratory experiments, field studies, and survey-based correlational designs. The pooled sample was approximately 8,400 participants. The reported pooled effect was r = .31 (medium by conventional benchmarks), with a 95% CI of [.26, .36].
The abstract concluded: "Responsive listening consistently predicts attraction across study designs and contexts, with a medium-sized effect. These findings suggest that responsiveness is a robust determinant of romantic interest."
Unpacking the Claim
What the aggregate tells us:
The pooled r = .31 is a genuine signal. Averaged across 43 studies, a correlation of this magnitude is unlikely to be a statistical artifact. The confidence interval is reasonably tight, indicating good precision. The effect is large enough to be practically meaningful in population-level research: if responsiveness accounts for about 10% of the variance in attraction ratings (r² ≈ .096), that is a modest but real contribution.
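The variance-explained arithmetic above can be checked directly: squaring the pooled correlation gives the proportion of shared variance.

```python
# Pooled correlation reported by the (fictional) meta-analysis
r = 0.31

# Proportion of variance in attraction ratings shared with responsiveness
r_squared = r ** 2
print(f"r^2 = {r_squared:.3f}")  # 0.096, i.e. roughly 10% of the variance
```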
What the aggregate hides:
The I² problem: The meta-analysis reports I² = 68%. This estimates that 68% of the observed variation in effect sizes across the 43 studies reflects true differences between studies rather than sampling error. The pooled r = .31 is an average of effects that range from about r = .05 (small, near-zero) to r = .56 (large) across different studies. The "medium effect" conceals substantial heterogeneity. Some populations, settings, and definitions of "responsive listening" show large effects; others show almost none. The abstract's language ("consistently predicts," "robust determinant") obscures this.
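The I² statistic is computed from Cochran's Q and its degrees of freedom. A minimal sketch, in which the Q value is hypothetical (chosen only to reproduce the reported I² = 68% for 43 studies; the case study does not report Q):

```python
# Illustrative I^2 computation (Higgins & Thompson formula).
# Q below is hypothetical, picked to match the reported I^2 = 68%.
k = 43            # number of studies
df = k - 1        # degrees of freedom for Cochran's Q
Q = 131.25        # hypothetical heterogeneity statistic

# Proportion of total variability attributable to true heterogeneity
i_squared = max(0.0, (Q - df) / Q) * 100
print(f"I^2 = {i_squared:.0f}%")  # -> 68%
```

Note that I² is a proportion, not an absolute amount of heterogeneity: the same I² can correspond to very different spreads of true effects depending on study precision.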
The WEIRD composition of the sample: Thirty-one of the 43 studies used American or Western European samples. Only four were conducted outside WEIRD contexts. The four non-WEIRD studies had notably smaller effects on average (r ≈ .18) than the WEIRD studies (r ≈ .34). If the meta-analysis had been confined to non-WEIRD samples, the conclusion about universality would be much weaker. The Okafor-Reyes data, which this fictional meta-analysis did not include, suggested that the role of explicit verbal responsiveness varies by cultural context — in cultures where emotional expressiveness is normatively suppressed, responsiveness is communicated and read through different channels.
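A rough sketch of why the pooled estimate tracks the dominant subgroup: pooling is done on Fisher-z transformed correlations, and with 31 WEIRD studies against 4 non-WEIRD ones, the weaker non-WEIRD effects barely move the average. The subgroup means and counts come from the case study; for simplicity each study gets equal weight here (a real analysis weights each study by roughly n − 3).

```python
import math

def fisher_z(r):
    """Fisher's r-to-z transform, used for averaging correlations."""
    return 0.5 * math.log((1 + r) / (1 - r))

# (mean r, number of studies) for the two subgroups described above
groups = [(0.34, 31), (0.18, 4)]

z_bar = sum(fisher_z(r) * k for r, k in groups) / sum(k for _, k in groups)
pooled_r = math.tanh(z_bar)  # back-transform z to r
print(f"pooled r across subgroups = {pooled_r:.2f}")  # ~.32, near the reported .31
```

The small non-WEIRD subgroup shifts the pooled value by only a few hundredths, which is why an aggregate can look "consistent" while masking a real cultural moderator.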
The funnel plot asymmetry: The meta-analysis includes a funnel plot in its supplementary materials. The funnel is visibly asymmetrical: there are very few studies in the lower-left quadrant (small studies with small or negative effects). Egger's test is significant (B = 1.87, SE = 0.61, p = .003), suggesting substantial publication bias. If the file drawer contains small studies that found near-zero or negative effects, the true pooled effect may be closer to r = .20–.25 than the reported .31.
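Egger's test is an ordinary regression of the standardized effect (effect / SE) on precision (1 / SE); a nonzero intercept signals funnel plot asymmetry. A self-contained sketch with hypothetical data, constructed so that smaller studies (larger SE) show inflated effects, the classic small-study pattern:

```python
# Hypothetical Fisher-z effects and standard errors; larger SE = smaller study.
# Note the smaller studies report larger effects (small-study bias).
effects = [0.36, 0.33, 0.30, 0.42, 0.48, 0.55]
ses     = [0.05, 0.06, 0.08, 0.12, 0.18, 0.25]

x = [1 / se for se in ses]                    # precision
y = [e / se for e, se in zip(effects, ses)]   # standardized effect

# Simple least-squares fit of y on x; the intercept is Egger's test statistic
n = len(x)
mx, my = sum(x) / n, sum(y) / n
slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
intercept = my - slope * mx
print(f"Egger intercept = {intercept:.2f}")  # positive: small studies inflated
```

In an unbiased literature the intercept hovers near zero; a significantly positive intercept, as reported here, is what motivates discounting the pooled r toward the .20–.25 range.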
The construct heterogeneity: "Responsive listening" was operationalized in at least seven different ways across the 43 studies: observer ratings of behavioral response in laboratory interactions; self-report measures of how responsive a partner seems; partner-rated responsiveness after a first conversation; coded verbal reflection in taped interactions; self-reported tendency to listen closely; a neural measure of sustained auditory attention; and a composite scale including verbal and nonverbal responsiveness. These are related but distinct constructs. The meta-analysis assumes they are measuring the same underlying thing; the substantial heterogeneity may partly reflect that they are not.
The Honest Assessment
The meta-analysis provides genuine evidence that something in the family of "responsive listening" constructs is associated with attraction, with a medium-sized positive effect that is unlikely to be a sampling artifact. But the appropriate confidence is more measured than the abstract suggests. The effect is probably smaller than r = .31 when corrected for publication bias, probably varies meaningfully across cultural contexts, and may represent an average over a heterogeneous family of related but distinct effects rather than a single universal mechanism.
The lesson is not that this meta-analysis is bad science. It is careful, comprehensive work. The lesson is that even good meta-analyses require careful reading — that the headline number is a starting point for interpretation, not an ending point.
Discussion Questions
- The abstract uses the phrase "robust determinant." Given the I² value and funnel plot asymmetry, is this characterization warranted? What language would be more accurate?
- Should researchers conducting meta-analyses be required to report I² and funnel plot results in the abstract rather than just in supplementary materials? What would be gained and lost?
- If you were designing a follow-up study based on this meta-analysis, what would be your top priority: addressing the WEIRD sample composition, clarifying the construct heterogeneity, or investigating the publication bias? Why?