Case Study 3.2: When Replication Fails — The Power Pose Controversy and What It Teaches Us
Background
In 2010, social psychologist Amy Cuddy and her colleagues published a paper claiming that adopting "power poses" — expansive, open body postures, like standing with feet wide and hands on hips — for two minutes before a stressful social encounter caused measurable hormonal changes (increased testosterone, decreased cortisol) and led participants to feel more powerful and take more financial risks. The paper was published in Psychological Science, a top-tier journal, and Cuddy's subsequent TED Talk, "Your Body Language May Shape Who You Are," became one of the most viewed TED Talks in history, with over 60 million views.
The claims were directly relevant to attraction research: if posture affects hormones and confidence, and confidence affects how attractive people are perceived to be, then a simple two-minute behavioral intervention could meaningfully alter social desirability outcomes. The research was widely cited in self-help literature, workplace training programs, and coaching circles.
The Replication Attempt
In 2015, Eva Ranehill and colleagues published a large, pre-registered replication study in Psychological Science — the same journal that published the original. Using a much larger sample (N = 200, compared to the original N = 42) and closely following the original protocol, they found that power poses did increase self-reported feelings of power. But the hormonal changes — the testosterone and cortisol effects that had been central to the original paper's theoretical argument — did not replicate at all.
This was followed by a remarkable internal critique: Dana Carney, one of the original co-authors, published an extensive statement in 2016 in which she said she no longer believed the power pose effect was real, described the many researcher degrees of freedom in the original analysis, and recommended that the finding not be cited as established.
What Made This a Particularly Instructive Failure
Several features of this case make it especially useful for understanding how the replication crisis works.
First, the original study had a very small sample size (N = 42) for detecting hormonal changes, which are high-variance biological outcomes. With samples this small, effect size estimates are extremely noisy — the original d values were almost certainly inflated by chance. The large replication, with nearly five times the participants, provided a much more precise estimate of the true effect.
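The inflation of significant effect sizes in small samples — sometimes called the "winner's curse" — can be demonstrated with a short simulation. This is a generic sketch, not a reanalysis of the original data: the true effect of d = 0.2 and the per-group sizes are illustrative assumptions chosen to mirror the N = 42 versus N = 200 contrast.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def mean_significant_d(n_per_group, true_d=0.2, n_sims=20_000, alpha=0.05):
    """Average observed Cohen's d among simulated two-group experiments
    that happened to reach p < alpha (two-sample t-test)."""
    sig_ds = []
    for _ in range(n_sims):
        a = rng.normal(0.0, 1.0, n_per_group)     # control group
        b = rng.normal(true_d, 1.0, n_per_group)  # treatment, true d = 0.2
        t, p = stats.ttest_ind(b, a)
        if p < alpha and t > 0:
            pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
            sig_ds.append((b.mean() - a.mean()) / pooled_sd)
    return float(np.mean(sig_ds))

# With 21 per group (N = 42 total), the studies that "worked" report a
# d far above the true 0.2; with 100 per group the bias shrinks.
print(mean_significant_d(21))
print(mean_significant_d(100))
```

The mechanism is simple: with N = 42, only very large observed effects clear the significance threshold, so conditioning on significance guarantees overestimation.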
Second, the paper had multiple outcomes — self-reported power, risk-taking behavior, testosterone, cortisol — and the pattern of significant results was not entirely consistent. In a small sample, the probability of at least some of these reaching significance even under the null hypothesis is not negligible. Reporting the ones that did reach significance, without adjusting for the number of tests conducted, is a form of the "garden of forking paths" problem in data analysis.
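The "not negligible" probability above is easy to quantify. As a back-of-the-envelope sketch, assume the four outcomes are independent tests at α = .05 (in the actual study the outcomes were likely correlated, so this is only an illustration of the familywise error rate):

```python
# Chance of at least one "significant" result among k independent
# tests when every null hypothesis is true (alpha = .05):
# P(any) = 1 - (1 - alpha)^k
alpha = 0.05
for k in [1, 2, 4, 8]:
    p_any = 1 - (1 - alpha) ** k
    print(f"{k} tests: P(at least one p < .05) = {p_any:.2f}")
```

With four outcomes, the chance of at least one false positive is already close to one in five — which is why unplanned multiple testing without correction is so corrosive to the published literature.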
Third, the social demand characteristics were severe: participants told to adopt "powerful" poses may well have reported feeling more powerful because the manipulation cued them to, not because of any hormonal mechanism. The behavioral and hormonal claims required separating these channels, which the original design could not cleanly do.
The Broader Lesson for Attraction Research
Power posing is not primarily an attraction research finding, but its failure implicates a broad class of studies that attraction researchers cite regularly: embodied cognition effects (does how you sit affect how you think about yourself?), hormonal influences on confidence and attractiveness, and body language as a signal of mate quality.
Many of these claims rest on similarly small samples, similarly high-variance hormonal outcome measures, and similarly unaddressed demand characteristic problems. When someone cites a study linking testosterone levels to attractiveness ratings, or linking expansive posture to perceived dominance, the appropriate response is the same methodological checklist this chapter has developed: What's the sample size? Has it replicated? What's the effect size? Was the study pre-registered?
The power pose case also illustrates that replication failures are not scandalous exceptions — they are the normal, necessary process by which science self-corrects. Cuddy's research program generated testable predictions; those predictions were tested with better methods; the evidence now favors a more modest conclusion. This is science working as it should, even though the path was painful for everyone involved.
Discussion Questions
- The power pose replication failure involved a lead author publicly walking away from the finding. How should students and practitioners respond when original authors repudiate their own research? What does this suggest about citing repudiated or non-replicated findings?
- The self-report component of power posing effects (people reporting feeling more powerful) replicated even when the hormonal effects did not. How should researchers interpret a situation in which subjective experience and physiological outcomes disagree?
- Confidence and perceived attractiveness are often claimed to be related. Given what you know from this case study about the methodological problems in embodied cognition research, how much weight would you place on studies claiming to link physical posture to attractiveness ratings?