Appendix D: The Replication Crisis Timeline
Key events in the discovery, investigation, and reform of psychology's methodological problems.
| Year | Event | Significance |
|---|---|---|
| 1979 | Rosenthal coins "file drawer problem" | Names the problem of unpublished null results and proposes the fail-safe N for gauging its severity |
| 1998 | Baumeister publishes ego depletion study | Launches a 200+ study program that later fails to replicate |
| 2005 | Ioannidis: "Why Most Published Research Findings Are False" | Meta-science paper (not psychology-specific) raising alarm about false positive rates |
| 2008 | Pashler et al. review learning styles | Concludes no adequate evidence for the meshing hypothesis |
| 2010 | Bem's precognition paper accepted at JPSP | A flagship journal accepts "evidence" for psychic powers obtained with standard methods |
| 2011 | Bem publishes "Feeling the Future" | The paper that catalyzed the crisis — standard methods producing impossible results |
| 2011 | Simmons, Nelson, & Simonsohn: "False-Positive Psychology" | Demonstrates how flexible analysis can produce evidence for anything, e.g., that listening to a Beatles song makes you younger |
| 2011 | Diederik Stapel fraud revealed | Prominent social psychologist fabricated data across dozens of papers |
| 2012 | Doyen et al. fail to replicate elderly priming | First high-profile replication failure of a landmark social psychology finding |
| 2013 | Nosek and Spies found the Center for Open Science | Institutional home for pre-registration and open science infrastructure |
| 2013 | Open Science Framework (OSF) launches publicly | Platform for pre-registration and open data |
| 2013 | Shanks et al.: 9 experiments fail to replicate professor priming | Systematic failure to reproduce a flagship social priming effect |
| 2014 | Carter & McCullough: ego depletion publication bias | Meta-analysis showing the ego depletion literature was inflated by bias |
| 2014 | Many Labs 1 published | Multi-lab replication of 13 classic findings; 10 replicate, but the priming effects tested do not |
| 2015 | Ranehill et al. fail to replicate power posing | 200-participant study finds no hormonal or behavioral effects |
| 2015 | Open Science Collaboration: "Estimating the Reproducibility of Psychological Science" | Only 36% of 100 replication attempts yield significant results; the landmark paper |
| 2016 | Hagger et al.: ego depletion RRR | 23 labs, 2,141 participants, d = 0.04 — the flagship finding collapses |
| 2016 | Carney publicly retracts support for power posing | First author of original paper says she no longer believes the effect is real |
| 2016 | FTC fines Lumosity $2 million | Federal action against brain training claims |
| 2018 | Watts, Duncan, & Quan: marshmallow test replication | Effect substantially reduced after controlling for family background and early cognitive ability |
| 2018 | Sisk et al.: growth mindset meta-analyses | Small effects (correlation r = 0.10; intervention d = 0.08) |
| 2018 | Many Labs 2 published | 28 findings tested in large multi-lab samples; roughly half replicate |
| 2019 | Yeager et al.: national growth mindset study | d = 0.03 overall — published in Nature |
| 2020 | Many Labs 5 published | Revised, expert-vetted protocols fail to rescue findings that did not replicate in the 2015 project |
| 2020s | Pre-registration becomes increasingly standard | Hypotheses and analysis plans are fixed before data collection, curbing flexible analysis |
| 2022 | Moncrieff et al.: serotonin hypothesis umbrella review | No consistent evidence for the serotonin theory of depression |
| 2023+ | Registered Reports adopted by 300+ journals | The reform movement reaches critical mass |
What the Timeline Shows
Phase 1 (1979–2010): The Problem Accumulates. Publication bias, small samples, and flexible analysis produce a literature that overrepresents positive findings.
Phase 2 (2011–2015): The Crisis Erupts. Bem's paper, the Stapel fraud, and the OSC replication project reveal the scale of the problem.
Phase 3 (2016–present): Reform and Recovery. Pre-registration, Registered Reports, open data, and large-scale replications become standard. The field is producing more reliable new knowledge while reckoning with the unreliability of the legacy literature.
The crisis is not over. The legacy literature remains biased. But the direction is clearly toward better methods and more reliable science.