Case Study 9.1: The Sweaty T-Shirt Study — Replication and Controversy
Background: A Simple Design with Large Implications
When Swiss zoologist Claus Wedekind and his colleagues published "MHC-Dependent Mate Preferences in Humans" in Proceedings of the Royal Society B in 1995, the study design was almost disarmingly modest. Forty-four men wore a plain cotton t-shirt for two consecutive nights. They followed a scent-control protocol: no scented products, no sex, no garlic or spiced foods. The shirts were then stored in sealed boxes. Forty-nine women were brought in to smell each box and rate the odors on pleasantness and sexiness, as well as to indicate which smell they would prefer on a (hypothetical) partner.
The result was what has since become one of the most replicated-and-debated findings in the biology of attraction: women tended to give higher pleasantness and sexiness ratings to t-shirts from men whose MHC (major histocompatibility complex) genotype was most different from their own — and women on oral contraceptives showed a reversed preference pattern.
The study attracted international attention not just from researchers, but from popular press and the fragrance industry. Here was an apparent answer to one of attraction's most enduring mysteries: beneath the noise of looks and personality and cultural expectation, do bodies quietly assess biological compatibility through smell?
Thirty years of follow-up research have given a much more complicated answer.
What the Original Study Showed — and What It Did Not
Before examining the replication record, it is worth being precise about what Wedekind's 1995 study actually demonstrated.
What it showed: Women in the sample, on average, rated the odors of MHC-dissimilar men as more pleasant and sexier than the odors of MHC-similar men. The oral contraceptive group showed a pattern consistent with a reversal of this preference.
What it did not show:
- That women would choose these men as actual partners in real life
- That the detected chemical cues were specific MHC-encoded peptides (the mechanism remained hypothetical)
- That the effect was large or clinically meaningful (effect sizes were modest)
- That the finding would generalize beyond a Western European university sample
- That the effect was robust across different operationalizations of MHC (dis)similarity
The study used a small, homogeneous sample (Swiss university students), lacked pre-registration, and reported what was partly a secondary finding (the oral contraceptive effect) as a key result — a practice that raises concerns about post-hoc analysis. None of this necessarily invalidates the findings, but it sets the context for understanding the replication difficulties that followed.
The Replication Record: Mixed to Troubled
Over the following two decades, numerous labs attempted to replicate or extend Wedekind's findings. The results were heterogeneous enough to constitute a genuine scientific puzzle.
Supportive replications and extensions: Several studies found patterns consistent with the original MHC-dissimilarity preference in women. Thornhill and colleagues (2003) found that the body odors women rated as more attractive came from men with greater MHC heterozygosity (carrying multiple distinct MHC alleles) rather than from MHC-dissimilar men per se, a related but subtly different prediction. Roberts and colleagues (2008) replicated the oral contraceptive reversal effect specifically, though the main dissimilarity effect was weaker.
Non-replications and null results: Other studies found no significant MHC odor preference effect. Crucially, some of the null results occurred in larger samples with more rigorous MHC typing methods, raising the possibility that earlier findings reflected type I errors (false positives) in underpowered studies.
The oral contraceptive complication: The finding that oral contraceptive users showed reversed preferences attracted particular attention because of a provocative implication: if hormonal contraceptives shift MHC odor preferences toward similar (kin-like) odors, could they affect mate choice in real life? A 2008 study by Roberts and colleagues in the Proceedings of the Royal Society B examined this directly, finding that women who began or stopped using hormonal contraceptives during a relationship reported changes in satisfaction with their partner's scent. This study has methodological problems — it is cross-sectional and retrospective — but it raised questions that generated both follow-up research and popular panic about oral contraceptives "destroying relationships." The evidence for the latter claim is weak.
Where Does the MHC Hypothesis Stand?
Meta-analyses of the available literature suggest a modest MHC odor preference effect. A 2015 meta-analysis by Winternitz and colleagues estimated a pooled effect size of approximately d = 0.41 for women's preferences, which falls between Cohen's conventional benchmarks for a small (d = 0.2) and a medium (d = 0.5) effect. However, funnel plot analysis suggested possible publication bias, so the true effect may be smaller.
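The pooling step behind a meta-analytic estimate can be sketched in a few lines. The example below is a minimal illustration, assuming a simple fixed-effect (inverse-variance) model, which is not necessarily the model used in the meta-analysis discussed here. The study values are hypothetical and chosen only to show how a few small studies with large effects can pull an unweighted average well above the precision-weighted estimate.

```python
import math

def d_variance(d, n1, n2):
    """Approximate sampling variance of Cohen's d for two independent groups."""
    return (n1 + n2) / (n1 * n2) + d * d / (2 * (n1 + n2))

def pooled_effect(effects, variances):
    """Fixed-effect (inverse-variance) pooled estimate and its standard error."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    estimate = sum(w * d for w, d in zip(weights, effects)) / total
    return estimate, math.sqrt(1.0 / total)

# Hypothetical studies, NOT real data: (observed d, n in group 1, n in group 2).
# Small studies show large effects and large studies show small ones,
# the asymmetric pattern a funnel plot is meant to expose.
studies = [(0.80, 20, 20), (0.60, 25, 25), (0.45, 40, 40),
           (0.15, 150, 150), (0.10, 200, 200)]

effects = [d for d, _, _ in studies]
variances = [d_variance(d, n1, n2) for d, n1, n2 in studies]

unweighted = sum(effects) / len(effects)
pooled, se = pooled_effect(effects, variances)

print(f"unweighted mean d = {unweighted:.2f}")
print(f"pooled d = {pooled:.2f} (SE {se:.2f})")
```

Because each study is weighted by the inverse of its variance, the large studies dominate the pooled estimate. When small studies report systematically larger effects, as in this made-up set, that gap is one informal signal of possible publication bias.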
The field has also grappled with measurement heterogeneity. Different studies operationalize MHC (dis)similarity differently — some use the number of shared alleles at specific loci, others use more comprehensive similarity metrics. The correlation across operationalizations is not always strong, which means studies may not be testing exactly the same hypothesis.
Perhaps most importantly, it has not been consistently demonstrated that MHC odor cues influence actual human pair-bonding at the population level, that is, that real couples are more MHC-dissimilar than expected by chance. Some studies report this pattern; others do not. The evidence linking laboratory odor preference ratings to real-world partner selection remains indirect.
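One way to make "more MHC-dissimilar than expected by chance" concrete is a permutation test: compare how many alleles actual couples share with how many they would share if partners were re-paired at random within the same sample. The sketch below is only an illustration of that logic. The genotypes are hypothetical labels rather than real HLA alleles, allele sharing is just one of several possible similarity measures, and the function names are invented for this example.

```python
import random

def shared_alleles(a, b):
    """Number of distinct alleles two genotypes have in common."""
    return len(set(a) & set(b))

def permutation_p_value(women, men, n_perm=2000, seed=0):
    """Observed mean allele sharing among couples, and a one-sided p-value:
    the fraction of random re-pairings whose mean sharing is <= observed."""
    rng = random.Random(seed)

    def mean_sharing(partners):
        total = sum(shared_alleles(w, m) for w, m in zip(women, partners))
        return total / len(women)

    observed = mean_sharing(men)
    shuffled = list(men)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)          # break the real pairing
        if mean_sharing(shuffled) <= observed:
            hits += 1
    # Add-one correction so the p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical genotypes, NOT real data: women[i] is partnered with men[i].
women = [("a", "b"), ("c", "d"), ("e", "f"), ("a", "c"), ("b", "e"), ("d", "f")]
men = [("c", "e"), ("a", "f"), ("b", "d"), ("e", "f"), ("a", "d"), ("b", "c")]

observed, p = permutation_p_value(women, men)
print(f"mean shared alleles among couples = {observed:.2f}, one-sided p = {p:.3f}")
```

A small p-value would indicate that real couples share fewer alleles than random pairings from the same pool. Shuffling within the sample holds the population's allele frequencies fixed, which matters because couples drawn from a genetically structured population can look similar or dissimilar for reasons unrelated to odor.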
Lessons for Evidence Evaluation
The sweaty t-shirt study's replication history is a case study in several recurring patterns in psychological and biological research:
The file-drawer problem: Null results are less likely to be published. If a dozen labs attempt to replicate a finding and half succeed while half do not, the published literature may show mostly successes, creating a misleadingly strong impression of the effect's robustness.
Operationalization matters: "MHC dissimilarity" is not a single, agreed-upon variable. Studies that appear to test the same hypothesis may actually be testing related but distinct predictions.
The gap between preference and choice: Even if women reliably prefer the odor of MHC-dissimilar men in laboratory t-shirt tasks, this tells us little about whether MHC dissimilarity influences the complex, multi-factorial decision of actual partner selection. The ecological validity gap is large.
Media amplification: The popular science treatment of the t-shirt studies consistently overstated the certainty and magnitude of the findings, creating public beliefs about "genetic compatibility" that the scientific record does not support.
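The file-drawer problem described above can be illustrated with a small simulation. This is a sketch under stated assumptions, not a model of the actual MHC literature: the true effect (d = 0.1), the sample size (25 per group), the number of studies, and the rule that only significant positive results get published are all hypothetical choices, and significance is approximated with a simple z-test on d.

```python
import random
import statistics

def simulate_study(true_d, n, rng):
    """Observed Cohen's d from one simulated two-group study, n per group."""
    treated = [rng.gauss(true_d, 1.0) for _ in range(n)]
    control = [rng.gauss(0.0, 1.0) for _ in range(n)]
    pooled_sd = ((statistics.variance(treated) + statistics.variance(control)) / 2) ** 0.5
    return (statistics.mean(treated) - statistics.mean(control)) / pooled_sd

def file_drawer(true_d=0.1, n=25, n_studies=2000, seed=1):
    """Mean effect across all studies, mean across 'published' ones, and share published."""
    rng = random.Random(seed)
    se = (2 / n) ** 0.5  # rough standard error of d when the true effect is small
    all_d = [simulate_study(true_d, n, rng) for _ in range(n_studies)]
    # Assumed publication rule: only significant positive results leave the file drawer.
    published = [d for d in all_d if d / se > 1.96]
    return statistics.mean(all_d), statistics.mean(published), len(published) / n_studies

mean_all, mean_published, share = file_drawer()
print(f"mean d, all studies:       {mean_all:.2f}")
print(f"mean d, published studies: {mean_published:.2f}")
print(f"share of studies published: {share:.1%}")
```

Under this publication rule, every published study must report an effect well above the true value simply to clear the significance threshold, so the published average overstates the effect even though no individual study was run dishonestly.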
The sweaty t-shirt study remains scientifically interesting. It is not junk science. But it is also not the clean proof of an olfactory mate-choice mechanism that popular accounts have made it appear to be. It is, instead, an excellent example of a promising hypothesis in an ongoing state of empirical resolution — which is where science often actually lives.
Discussion Questions:
1. What would a definitive replication of the MHC hypothesis require? What study design would satisfy you that the effect is real and meaningful?
2. The oral contraceptive reversal finding received significant popular press coverage. What are the ethical implications of reporting a weakly supported finding that could affect people's decisions about contraception?
3. How would you distinguish, in a research design, between "odor preferences in a laboratory" and "odor-influenced partner selection in real life"?