Chapter 3: Key Takeaways
Core Concepts
- The replication crisis was catalyzed by Daryl Bem's 2011 precognition paper, which demonstrated that standard psychological methods could produce evidence for impossible phenomena. The subsequent Open Science Collaboration (2015) found that only 36% of 100 published findings replicated successfully.
- Four questionable research practices drove the crisis: p-hacking (analyzing data flexibly until significance appears), HARKing (hypothesizing after the results are known), publication bias (journals favoring significant results), and small samples with low statistical power.
- Landmark findings that didn't hold up include ego depletion (willpower as a limited resource), social priming effects, the Stanford Prison Experiment (methodologically compromised), the marshmallow test (its effect largely explained by socioeconomic status), and some power posing effects.
- The field is reforming: pre-registration, Registered Reports, open data, larger samples, and multi-lab replications have made new research more reliable. Psychology is stronger for having confronted its problems.
- The replication crisis means psychology is *more* trustworthy now, not less. A science that checks and corrects its work should be trusted more than one that doesn't. The crisis revealed problems that had always existed; the reforms are addressing them.
- Most popular psychology claims are based on pre-crisis research, published before pre-registration was standard, before large-scale replications were common, and before publication bias was widely recognized. This doesn't mean all pre-crisis findings are wrong, but it means their reliability should be treated as uncertain until confirmed by modern methods.
- "Has this been replicated?" is now one of the most important questions you can ask about any psychology claim. Before 2011, this question barely existed in the field's culture.
Evidence Ratings in This Chapter
| Claim | Rating | Summary |
|---|---|---|
| "Published findings in top journals are reliable" | ❌ DEBUNKED | OSC 2015: only 36% replicated |
| "Ego depletion (willpower as a muscle) is real" | ❌ DEBUNKED | Multi-lab RRR found no effect (d = 0.04) |
| "The Stanford Prison Experiment proved situations override character" | ❌ DEBUNKED | Methodologically compromised: demand effects, self-selection, coaching |
| "Psychology is less trustworthy after the crisis" | ❌ DEBUNKED | Reforms make the field more reliable than ever |
| "The replication crisis means psychology is self-correcting" | ✅ SUPPORTED | Pre-registration, open data, and Registered Reports are genuine improvements |
Key Terms Introduced
- Replication crisis: The discovery that a large proportion of published findings in psychology (and other sciences) cannot be reproduced
- P-hacking: Analyzing data flexibly until a significant result is found, then reporting only that result
- HARKing: Hypothesizing After the Results are Known — writing papers as if data-driven hypotheses were pre-specified
- Publication bias (file drawer problem): The systematic tendency of journals to publish significant results and reject null findings
- Statistical power: The probability that a study detects a real effect of a given size; low power misses true effects, raises the proportion of false positives among published significant results, and inflates the effect sizes that do get reported
- Winner's curse: The tendency for significant results from low-powered studies to overestimate the true effect
- Pre-registration: Publicly committing to hypotheses and analysis plans before collecting data
- Registered Reports: Journal format where papers are peer-reviewed and provisionally accepted before data collection
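The winner's curse can also be demonstrated with a quick simulation. This is a sketch under assumed values (the true effect size d = 0.3 and the sample size of 20 per group are illustrative, not drawn from any real study): when power is low, the studies that happen to reach significance systematically overestimate the true effect.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

TRUE_D = 0.3   # modest true effect (Cohen's d), assumed for illustration
N = 20         # per-group sample size: badly underpowered for d = 0.3

sig_effects = []
for _ in range(4000):
    a = rng.normal(0.0, 1.0, N)       # control group
    b = rng.normal(TRUE_D, 1.0, N)    # treatment group with the true effect
    p = stats.ttest_ind(b, a).pvalue
    # observed effect size: Cohen's d with pooled standard deviation
    pooled = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    d_obs = (b.mean() - a.mean()) / pooled
    if p < 0.05:                      # only 'publishable' results survive
        sig_effects.append(d_obs)

power = len(sig_effects) / 4000
mean_published_d = float(np.mean(sig_effects))
print(f"power: {power:.0%}, true d: {TRUE_D}, "
      f"mean published d: {mean_published_d:.2f}")
```

In this setup only a small fraction of studies reach significance, and those that do report an average effect far above the true d = 0.3, because with 20 participants per group only unusually large sample differences clear the significance threshold. This is why pre-crisis effect sizes from small studies tend to shrink in large replications.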
One Sentence to Remember
The replication crisis revealed that many of psychology's most famous findings were built on shaky methods — but the field's willingness to confront this honestly is why you should trust it more now, not less.