> "The first principle is that you must not fool yourself — and you are the easiest person to fool."
Learning Objectives
- Trace psychology's full paradigm arc from phrenology through the replication crisis
- Identify which failure modes produced the replication crisis and which enabled the response
- Evaluate the Open Science reforms as genuine correction, cosmetic correction, or potential overcorrection
- Distinguish between psychology's robust subfields and its fragile ones
- Extract lessons from psychology's crisis that apply to any field
In This Chapter
- Chapter Overview
- 25.1 The Arc: Five Paradigm Shifts in 150 Years
- 25.2 The Crisis: What Went Wrong
- 25.3 What It Looked Like From Inside
- 25.4 The Response: Open Science and Its Achievements
- 25.5 The Limits and Risks of the Correction
- 25.6 Applying the Correction Speed Model
- 25.7 What Psychology's Crisis Teaches Every Field
- 📐 Project Checkpoint
- 25.8 Chapter Summary
- Spaced Review
- What's Next
- Chapter 25 Exercises → exercises.md
- Chapter 25 Quiz → quiz.md
- Case Study: The Fall of Social Priming → case-study-01.md
- Case Study: The Open Science Movement — A Correction in Progress → case-study-02.md
Chapter 25: Field Autopsy: Psychology
"The first principle is that you must not fool yourself — and you are the easiest person to fool." — Richard Feynman
Chapter Overview
In 2011, the social psychologist Daryl Bem published a paper in the Journal of Personality and Social Psychology — one of the field's most prestigious journals — claiming to demonstrate precognition. Nine experiments, he argued, showed that people's responses were influenced by events that hadn't happened yet. The future was reaching backward to shape the present.
The paper passed peer review. It was published. And its publication triggered a crisis — not because anyone believed in precognition, but because the methods Bem used were identical to the methods used in hundreds of other social psychology papers. If Bem's methods could "prove" precognition, what did that say about everything else those methods had "proven"?
The answer, it turned out, was devastating.
Within a few years, the field would discover that many of its most celebrated findings — power posing, ego depletion, social priming, stereotype threat effects, the Stanford Prison Experiment — could not be replicated. The statistical practices that produced these findings were revealed to be systematically biased: p-hacking, HARKing, small samples, flexible analysis, and the publication of only positive results had created an evidence base that was substantially unreliable.
But here is what makes psychology uniquely important for this book: psychology responded. Where other fields absorbed their crises with cosmetic reform (economics after 2008, NASA after Challenger), psychology undertook a genuine reconstruction of its methods, institutions, and culture. The Open Science movement — pre-registration, registered reports, open data, large-scale replications — represents the most ambitious attempt at institutional self-correction in modern social science.
Whether that reconstruction is complete, whether it has overcorrected, and what it teaches every other field — that is the subject of this chapter.
In this chapter, you will learn to:
- Trace psychology's full paradigm arc and identify the failure modes at each stage
- Analyze the replication crisis as a case study in accumulated institutional failure
- Evaluate the Open Science reforms using the correction frameworks from Part III
- Assess whether psychology's response is a model for other fields or a cautionary tale about overcorrection
🏃 Fast Track: If you're familiar with the replication crisis narrative, skim sections 25.1–25.3 and focus on 25.4–25.7, which evaluate the response and its limitations.
🔬 Deep Dive: After this chapter, read Stuart Ritchie's Science Fictions (2020) for the most accessible overview of the crisis, and Simine Vazire's work on credibility in psychological science for the most thoughtful insider analysis.
25.1 The Arc: Five Paradigm Shifts in 150 Years
Psychology has reinvented itself more dramatically and more frequently than any other science. This instability is itself a diagnostic sign — a field that has undergone five paradigm shifts in 150 years may have structural features that make it chronically vulnerable to error.
Introspection (1870s–1910s)
Wilhelm Wundt's founding of the first psychology laboratory in Leipzig in 1879 established psychology as an empirical science — but its primary method, introspection (trained self-observation), was inherently unreliable. Different laboratories trained their observers differently and produced contradictory results. The method was unfalsifiable (Chapter 3): if two observers reported different experiences, there was no way to determine who was correct.
Psychoanalysis (1890s–1960s)
Freud's psychoanalytic framework dominated clinical psychology for over half a century. Psychoanalysis was, as Karl Popper famously argued, a textbook case of unfalsifiability: any behavior could be explained after the fact (post-hoc rationalization), and any resistance to the theory was interpreted as evidence for the theory (the patient's "resistance" confirmed repression). The framework generated compelling narratives (the plausible story problem, Chapter 6) without generating testable predictions.
Behaviorism (1920s–1960s)
John Watson and B.F. Skinner's behaviorism was a deliberate correction to the unfalsifiability of introspection and psychoanalysis. By studying only observable behavior (not internal mental states), behaviorism imposed a disciplinary rigor that the earlier paradigms lacked. But behaviorism overcorrected (Chapter 21): by ruling out the study of mental processes entirely, it couldn't explain language acquisition, complex decision-making, or memory — phenomena that clearly involved internal processing.
The Cognitive Revolution (1950s–1980s)
The cognitive revolution — associated with Noam Chomsky's critique of behaviorism, George Miller's work on memory, and the development of information processing models — restored the study of mental processes. This was a genuine correction to behaviorism's overcorrection: it acknowledged that internal mental states exist and can be studied scientifically through their effects on behavior.
But the cognitive revolution introduced its own failure mode: the computer metaphor. The brain was modeled as an information processor — input, processing, output — and the metaphor shaped research for decades. As we discussed in Chapter 8 (imported error), borrowed metaphors gain unearned credibility through their source field's prestige. The computer metaphor worked well for some phenomena (attention, short-term memory) and poorly for others (emotion, motivation, social cognition). Where it worked poorly, the mismatch between metaphor and reality was absorbed rather than corrected.
Social Psychology's "Golden Age" (1980s–2010)
The period from the mid-1980s to approximately 2010 was social psychology's most productive and publicly visible era. Researchers produced a stream of findings that were dramatic, counterintuitive, and media-friendly: priming effects (exposure to words related to old age makes you walk slower), ego depletion (willpower is a limited resource that gets used up), power posing (standing in a powerful posture changes your hormone levels and behavior), implicit bias (unconscious prejudices shape behavior even in people who hold egalitarian beliefs), and dozens of similar findings.
These findings were published in top journals, covered extensively by media, featured in TED talks, and integrated into corporate training programs, educational interventions, and public policy. They made social psychology the most publicly visible subfield of the discipline.
They also, in many cases, turned out to be wrong.
Why Social Psychology Was Uniquely Vulnerable
Social psychology was the epicenter of the replication crisis not by accident but by structural design. The subfield had several features that made it maximally vulnerable to the failure modes documented in this book:
Small effects in noisy domains. Social psychology studies subtle influences on complex behavior — exactly the domain where effects are small, context-dependent, and easily confounded. Unlike perception (where the effects are large and directly measurable), social psychology operates in the zone where statistical noise and researcher decisions can easily masquerade as real effects.
Narrative appeal. Social psychology findings were stories — compelling, counterintuitive narratives about human behavior that translated perfectly into media coverage, TED talks, and popular books. The plausible story problem (Chapter 6) operated at maximum intensity: a finding that "power poses change your hormones" is a better story than "there's a small, inconsistent effect of posture on some self-report measures, maybe." The media-friendly findings were preferentially published, cited, and replicated — not because they were more likely to be true, but because they were better stories.
Low methodological barriers. Running a social psychology study requires a hypothesis, a sample of undergraduates, a computer, and a statistical software package. The low cost of conducting studies meant that researchers could run many studies, select the ones that "worked," and discard the rest — the file drawer problem at industrial scale. By contrast, a neuroscience study requiring fMRI or a clinical trial requiring IRB approval and patient recruitment is more expensive and less susceptible to mass-production of underpowered studies.
Celebrity culture. Social psychology developed a celebrity culture in which prominent researchers such as Amy Cuddy, Dan Ariely, and Philip Zimbardo became public figures (with popularizers such as Malcolm Gladwell amplifying their findings), and their careers depended on maintaining the dramatic results that had made them famous. This created a specific variant of sunk cost: not just career investment but personal brand investment. Admitting that power posing or the Stanford Prison Experiment had fundamental problems meant not just revising a finding but dismantling a public identity.
25.2 The Crisis: What Went Wrong
The replication crisis didn't arrive in a single moment — it accumulated through a series of revelations between 2011 and 2015:
The Catalysts
Daryl Bem's precognition paper (2011): Published in JPSP using standard methods. The absurdity of the conclusion forced the field to confront what its methods could "prove."
Diederik Stapel's fraud exposure (2011): A prominent Dutch social psychologist was found to have fabricated data across dozens of papers. The fraud was detected not by peer review but by suspicious graduate students.
The Reproducibility Project (2015): Led by Brian Nosek at the Center for Open Science, this project attempted to replicate 100 published psychology studies. The result: only 36% of the replications produced statistically significant results in the original direction, and the average replication effect size was roughly half the original. Even with more generous criteria, fewer than half the studies could be considered replicated.
The Many Labs projects (2014–2018): Multiple labs simultaneously attempted to replicate classic findings. Results were mixed: some effects replicated robustly across all labs (anchoring, the Stroop effect), while others failed completely (ego depletion, some priming effects).
The Specific Failures
Ego depletion: Roy Baumeister's theory that willpower is a limited resource — one of the most cited findings in social psychology — failed to replicate in a registered replication report with over 2,000 participants across 23 labs.
Power posing: Amy Cuddy's famous TED talk (the second-most-viewed TED talk ever) claimed that adopting "power poses" increased testosterone and decreased cortisol. A 2015 replication by Eva Ranehill and colleagues, with a much larger sample, found no effects on hormones or risk-taking, though an effect on self-reported feelings of power remains debated.
Social priming: The entire field of social priming — the idea that subtle environmental cues can dramatically influence complex behavior — has been severely undermined. John Bargh's famous "elderly priming" study (words related to old age make you walk slower) has failed to replicate in multiple attempts.
The Stanford Prison Experiment: Philip Zimbardo's 1971 study — perhaps the most famous psychology experiment in history — has been revealed through examination of archival recordings to have involved extensive coaching of participants by the experimenters, fundamentally undermining its claim to demonstrate the power of situational forces.
The Root Causes
The failures were not random. They were produced by a specific set of questionable research practices (QRPs) that were endemic to the field:
- P-hacking: Running multiple statistical tests and reporting only the ones that reached significance (p < .05); simulated in the sketch after this list.
- HARKing (Hypothesizing After Results are Known): Formulating hypotheses after seeing the data, then presenting them as if they were predicted in advance.
- Small samples: Running studies with 20–40 participants, producing effect-size estimates too noisy to be reliable.
- Flexible analysis: Making decisions about data exclusion, variable selection, and statistical tests after seeing the data — the "garden of forking paths."
- Publication bias: Journals publishing only positive results, creating a literature that overrepresented effects and underrepresented null findings.
- The file drawer problem: Negative results (failed replications, null findings) never published, remaining in researchers' file drawers.
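To see how these practices compound, consider a minimal simulation sketch in the spirit of Simmons, Nelson, and Simonsohn's (2011) false-positive demonstrations. The data contain no true effect; the only ingredients are two QRPs from the list above: optional stopping and multiple dependent variables. All parameters are illustrative assumptions, not figures from any study discussed in this chapter.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

def one_qrp_study(n_start=20, n_max=40, step=10, n_dvs=3, alpha=0.05):
    """Simulate one null 'study' run with QRPs; True if any test hits p < alpha."""
    # Pre-draw the maximum sample per group for each DV; there is NO true effect.
    a = rng.normal(size=(n_dvs, n_max))
    b = rng.normal(size=(n_dvs, n_max))
    for n in range(n_start, n_max + 1, step):       # QRP: optional stopping
        for dv in range(n_dvs):                     # QRP: multiple DVs
            if stats.ttest_ind(a[dv, :n], b[dv, :n]).pvalue < alpha:
                return True                         # report the "hit"
    return False                                    # ...into the file drawer

n_sims = 5_000
fp_rate = sum(one_qrp_study() for _ in range(n_sims)) / n_sims
print(f"nominal alpha = 0.05, realized false-positive rate = {fp_rate:.2f}")
# Typically prints a rate several times the nominal 5%.
```

Even these two researcher degrees of freedom alone typically push the realized false-positive rate to several times the nominal 5%, before publication bias has filtered the results at all.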
These practices were not secrets. Many researchers knew about them. Some openly discussed them at conferences. But the incentive structure (Chapter 11) rewarded them: novel positive findings led to publication in top journals, which led to grants, tenure, and prestige. Methodological rigor — running large samples, pre-registering analyses, publishing null results — was not rewarded and was sometimes punished (longer timelines, fewer publications, less media attention).
🔄 Check Your Understanding (try to answer without scrolling up)
- What was the Reproducibility Project's main finding?
- Name three questionable research practices (QRPs) that contributed to the crisis.
Verify
1. Only 36% of the 100 tested studies replicated with statistically significant results, and replication effect sizes averaged about half the originals.
2. Any three of: p-hacking, HARKing, small samples, flexible analysis, publication bias, the file drawer problem, selective reporting.
25.3 What It Looked Like From Inside
Consider the position of a social psychology professor in 2013. You have spent twenty years studying priming effects. Your career — tenure, grants, publications, invited talks, media appearances — is built on findings that are now being questioned. You know, at some level, that your studies used small samples and that you made some analytical decisions after seeing the data. But everyone did that. It was standard practice. Your mentors did it. Your reviewers accepted it. Your field encouraged it.
Now a graduate student at another university is running a high-powered replication of your key finding — and it's failing.
What do you do? The options mirror those we described for macroeconomists after 2008:
- Accept that your life's work may be substantially unreliable (cost: existential)
- Argue that the replication was conducted poorly or in a different context (cost: minimal)
- Attack the replication movement as misguided (cost: professional conflict)
Many senior researchers chose options 2 and 3. Some argued that "hidden moderators" (contextual factors that differed between the original study and the replication) explained the failures. Others argued that the replication movement was stifling creativity and punishing legitimate science. A few acknowledged the problem and began reforming their own practices.
The field split — not along lines of intelligence or integrity, but along lines of career investment. Researchers with more invested in the old practices resisted more strongly. Researchers with less invested — particularly junior researchers and methodologists — drove the reform.
This is Planck's principle (Chapter 17) in action: the correction was driven primarily by a new generation that had less invested in the old paradigm.
25.4 The Response: Open Science and Its Achievements
Psychology's response to the replication crisis has been, by the standards of institutional self-correction, remarkably vigorous:
Pre-registration
Researchers publicly register their hypotheses, sample sizes, and analysis plans before collecting data. This blocks p-hacking and HARKing in the registered analyses by committing the researcher to a specific analysis before the results are known. Hundreds of psychology journals now support or require pre-registration, and it is increasingly expected for confirmatory studies.
Registered Reports
A journal format in which peer review occurs before data collection. The study design and analysis plan are reviewed; if accepted, the paper is published regardless of whether the results are positive or negative. This eliminates publication bias by design. The journal Cortex pioneered this format, and it has now been adopted by over 300 journals across multiple disciplines.
Open Data and Open Materials
Requirements to share raw data and experimental materials alongside published papers, enabling independent verification and replication. Major journals and funders (including the NIH) increasingly require data sharing.
Large-Scale Collaborative Replications
The Psychological Science Accelerator (a global network of labs), the Many Labs projects, and other collaborative efforts conduct large-scale replications that provide definitive evidence about whether effects are real. These efforts address the small-sample problem by testing effects across thousands of participants in dozens of labs.
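The statistical logic behind these collaborations is simple enough to show in a few lines: pooling per-lab estimates with inverse-variance weights shrinks the standard error far below what any single lab can achieve. The sketch below uses hypothetical numbers and a fixed-effect model purely for illustration (real projects such as Many Labs typically fit random-effects models):

```python
import numpy as np

# Hypothetical per-lab effect estimates (Cohen's d) and standard errors.
lab_d  = np.array([0.12, 0.05, -0.02, 0.10, 0.07])
lab_se = np.array([0.20, 0.18, 0.22, 0.19, 0.21])

w = 1.0 / lab_se**2                          # inverse-variance weights
pooled_d  = np.sum(w * lab_d) / np.sum(w)    # fixed-effect pooled estimate
pooled_se = np.sqrt(1.0 / np.sum(w))         # far smaller than any single lab's SE
print(f"pooled d = {pooled_d:.3f}, SE = {pooled_se:.3f} "
      f"(best single-lab SE = {lab_se.min():.2f})")
```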
Statistical Reform
A growing movement away from null-hypothesis significance testing (NHST) toward alternative approaches: Bayesian statistics, estimation-based approaches, and an emphasis on effect sizes and confidence intervals rather than binary significant/not-significant decisions.
The NHST problem is worth understanding in detail, because it is not limited to psychology. The null-hypothesis framework asks: "If the effect is truly zero, what is the probability of getting data this extreme?" If that probability (the p-value) is below .05, the result is declared "statistically significant." But this framework has several structural problems:
- The .05 threshold is arbitrary. There is nothing magical about p < .05. It was proposed by Ronald Fisher as a rough guideline and has become, through convention rather than logic, the line between "publishable" and "not publishable."
- P-values don't measure what researchers think they measure. The p-value is not the probability that the hypothesis is true. It is the probability of observing data at least as extreme as those obtained, given that the null hypothesis is true — a subtly but critically different statement that most researchers misinterpret.
- Binary significance testing discards information. A finding that just misses significance (p = .06) may be substantively identical to one that just reaches it (p = .04), but the two receive radically different treatment in the publication system.
- Multiple testing inflates false positives. When researchers test many hypotheses (as flexible analysis enables), the probability that at least one will reach p < .05 by chance increases dramatically (see the worked example after this list).
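How dramatically? Under the simplifying assumption that the k tests are independent, the chance of at least one false positive is 1 − (1 − α)^k, a one-line computation:

```python
# P(at least one false positive among k independent tests at alpha = .05)
alpha = 0.05
for k in (1, 5, 10, 20):
    print(f"{k:2d} tests -> {1 - (1 - alpha) ** k:.2f}")
# 1 -> 0.05, 5 -> 0.23, 10 -> 0.40, 20 -> 0.64
```

Correlated tests inflate less than this, but the direction is the same: every additional fork in the analysis buys another draw from the false-positive lottery.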
The statistical reform movement addresses these problems by shifting toward estimation (how big is the effect?) rather than testing (is the effect non-zero?), and toward Bayesian approaches that incorporate prior evidence. This reform is genuine and important — but it faces the same resistance any correction faces: the existing system (NHST) is what researchers are trained in, what journals accept, and what statistical software defaults to. The switching cost is real, even for something as apparently neutral as statistical methodology.
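In practice, estimation means reporting the effect size with an interval instead of a binary verdict. Here is a minimal sketch: the data are simulated, and the normal-approximation interval for Cohen's d is a deliberate simplification (exact methods use the noncentral t distribution):

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(loc=0.3, size=50)   # hypothetical treatment group (simulated)
b = rng.normal(loc=0.0, size=50)   # hypothetical control group (simulated)

n1, n2 = len(a), len(b)
pooled_sd = np.sqrt(((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1))
                    / (n1 + n2 - 2))
d = (a.mean() - b.mean()) / pooled_sd                          # Cohen's d
se = np.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))   # approx. SE of d
print(f"d = {d:.2f}, 95% CI [{d - 1.96*se:.2f}, {d + 1.96*se:.2f}]")
```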
The Scorecard
By the criteria of genuine correction (Chapter 19), psychology scores well:
| Marker of Genuine Correction | Psychology's Response |
|---|---|
| New training curricula | Yes — graduate programs increasingly teach Open Science methods |
| New hiring criteria | Partially — some departments value methodological rigor in hiring |
| Changes persist after crisis fades | Yes — pre-registration and registered reports continue to grow |
| Former defenders acknowledge failure | Some — several senior researchers have publicly acknowledged QRPs |
| Correction extends to adjacent areas | Yes — Open Science spreading to medicine, political science, economics |
25.5 The Limits and Risks of the Correction
Psychology's response is impressive — but the analysis from Chapter 21 (overcorrection) and Chapter 20 (revision myth) suggests caution.
The Overcorrection Risk
As we discussed in Chapter 21, several concerns have emerged:
- Chilling effect on exploratory research. Pre-registration is excellent for confirmatory research but may constrain genuinely exploratory investigation. Some researchers report reluctance to pursue unexpected findings.
- Methodological conservatism. The post-crisis culture values large samples and pre-registered designs, which favors certain types of research (simple, scalable paradigms) over others (complex, context-sensitive investigations).
- Rebound orthodoxy. Open Science practices are becoming institutional requirements — embedded in journal policies, grant criteria, and hiring standards. This is appropriate if the practices are correct, but concerning if they become the kind of unquestionable orthodoxy that produced the original crisis.
The Revision Myth Risk
The narrative of the replication crisis is already being sanitized: "Psychology discovered its problems and self-corrected — proof that science works." This narrative erases the decades during which the problems were known and tolerated, the researchers who raised methodological concerns and were marginalized, and the degree to which the reforms were driven by external humiliation (media coverage of failed replications) rather than internal self-awareness.
The Robust Subfields
It is important to note that the replication crisis was not uniform across psychology. Some subfields were hit hard (social psychology, parts of cognitive psychology); others emerged relatively unscathed.
Perception and psychophysics — which study basic sensory processing — have high replication rates. The reason: the phenomena are directly measurable, the methods are well-standardized, and the effects are large.
Learning and conditioning — the legacy of behaviorism — also replicate well. The effects are robust, the paradigms are simple, and decades of animal research established strong baselines.
Cognitive psychology (core areas) — attention, memory, language processing — generally replicates well, though specific findings (particularly in the more "social" areas of cognition) have been challenged.
The pattern is clear: subfields with high evidence clarity (Chapter 22), well-standardized methods, and large, robust effects replicate well. Subfields with low evidence clarity, flexible methods, and small, context-dependent effects do not. The replication crisis was not a crisis of psychology as a whole — it was a crisis of specific subfields with specific methodological vulnerabilities.
This distinction matters for the field autopsy: the replication crisis does not invalidate psychology. It invalidates a specific way of doing psychology — small-sample, flexible-analysis, media-friendly research in social and personality psychology. The parts of psychology that were built on more rigorous methodological foundations remain sound. The crisis, properly understood, is not "psychology is broken" but "some parts of psychology were built on methods that produce unreliable results, and we now know which parts."
The WEIRD Problem
Compounding the replication crisis is the WEIRD problem: the vast majority of psychological research has been conducted on participants from Western, Educated, Industrialized, Rich, and Democratic societies — primarily American college students. Joseph Henrich and colleagues' 2010 paper demonstrated that WEIRD populations are often psychological outliers, not representative of human psychology in general.
This means that even findings that replicate perfectly within WEIRD samples may not generalize to the majority of humanity. The field has been building a theory of "human psychology" on samples drawn overwhelmingly from societies containing roughly 12% of the world's population. This is survivorship bias (Chapter 5) at population scale: the evidence that survived into the literature came overwhelmingly from one type of human, and the psychology of all other humans was systematically invisible.
The WEIRD problem is harder to correct than the QRP problem, because it requires not just methodological reform (which can be implemented within existing institutional structures) but a geographic and cultural diversification of the research enterprise — which requires funding, infrastructure, and institutional relationships that take decades to build.
The Incentive Structure That Produced the Crisis
The QRPs that produced the crisis were not aberrations or individual failures. They were the predictable output of an incentive structure that can be mapped precisely:
The publication pressure chain:
1. Academic careers depend on publications in top journals.
2. Top journals publish primarily novel, positive, counterintuitive findings.
3. Novel positive findings are easier to produce with small samples and flexible analysis.
4. Methodological rigor (large samples, pre-registration) is slower, more expensive, and less likely to produce "exciting" results.
5. Therefore: the incentive structure systematically selects for flashy-but-unreliable over rigorous-but-boring.
This is Goodhart's Law (Chapter 4) applied to science: publication count became the measure of scientific quality, and when the measure became the target, it ceased to measure quality. Researchers optimized for the metric (publications) rather than the outcome (truth), because the metric was what the system rewarded.
🔗 Connection: The incentive structure that produced psychology's replication crisis is structurally identical to the incentive structure that produced the 2008 financial crisis. In both cases: a system designed to reward a specific output (publications / profit) created systematic bias toward a specific type of error (unreliable findings / excessive risk). The correction in both cases required changing the incentive structure, not just the individuals operating within it. Psychology has done more to change its incentive structure than economics has — which is why psychology's correction is deeper.
25.6 Applying the Correction Speed Model
| Variable | Score | Assessment |
|---|---|---|
| Evidence clarity | HIGH | Failed replications are unambiguous; QRPs are documented and measurable |
| Switching cost | MEDIUM | Career investments in specific findings, but no external industry investment |
| Defender power | LOW-MEDIUM | Senior researchers are influential within academia but lack external power base |
| Outsider access | MEDIUM-HIGH | Open publication norms; junior researchers drove reform |
| Alternative availability | HIGH | Open Science methods are specific, implementable, and clearly superior |
| Crisis probability | MEDIUM | No single crisis; cumulative exposure through media and replication projects |
| Correction mode | Mixed | Crisis-driven + generational replacement + genuine persuasion |
| Revision resistance | MEDIUM | Some messy history preserved; but sanitization already underway |
Prediction: Medium-fast correction (10–20 years from crisis onset to widespread adoption). Assessment: Psychology is approximately 12 years into this correction (2011–present) and has made significant progress. The model fits.
Why psychology corrected faster and deeper than economics: Lower switching cost (no trillion-dollar industry dependent on the old methods), lower defender power (no external institutional leverage), higher alternative availability (Open Science methods were ready), and higher outsider access (junior researchers could publish criticism and gain professional standing). The structural comparison is sharp: the same crisis (empirical failure) produced deeper reform in the field with fewer structural barriers.
25.7 What Psychology's Crisis Teaches Every Field
Psychology's experience yields five generalizable lessons:
Lesson 1: Incentive structures produce the errors they reward. The QRPs that produced the replication crisis were not aberrations — they were the rational response to an incentive structure that rewarded novel positive findings and punished methodological rigor. Any field with a similar incentive structure is vulnerable to the same crisis.
Lesson 2: The quality of the correction depends on the availability of alternatives. Psychology reformed quickly because Open Science methods were concrete, implementable, and clearly better than the status quo. Fields without ready alternatives (macroeconomics) reform slowly even when the crisis is more severe.
Lesson 3: Junior researchers are the correction's vanguard. In psychology, as in most fields, the correction was driven disproportionately by people with less invested in the old paradigm. Institutional structures that empower junior voices accelerate correction.
Lesson 4: Genuine correction can coexist with overcorrection risk. The Open Science reforms are genuine and they carry the risk of becoming a new orthodoxy. Both things are true simultaneously. The lesson is not to resist reform but to build self-assessment mechanisms into the reform itself.
Lesson 5: The revision myth starts immediately. Even as the reforms are being implemented, the story is being sanitized. Preserving the messy version — including the decades of denial, the researchers who were punished for raising concerns, and the degree to which external pressure drove the change — is essential for maintaining the institutional vigilance that prevents the next crisis.
The Meta-Lesson
The deepest lesson from psychology's crisis is not about psychology at all. It is this: if the field that studies human bias was itself subject to those same biases at institutional scale — if the researchers who literally wrote the papers on confirmation bias, motivated reasoning, and overconfidence exhibited all of those biases in their own research practices — then no field is immune. The structural forces documented in this book operate on everyone, including the people who study them.
This is Theme 7 of the book in its most powerful formulation: you are currently wrong about something, and the feeling of being wrong is indistinguishable from the feeling of being right. Psychology's crisis proves this not as an abstract philosophical point but as an empirical fact: the people who understood cognitive bias best were the most confident in findings that turned out to be unreliable. Understanding bias does not protect against it. Only structural safeguards — pre-registration, open data, independent replication — provide protection. And even those safeguards are imperfect, as the overcorrection analysis (Chapter 21) warns.
📐 Project Checkpoint
Epistemic Audit — Chapter 25 Addition: The Psychology Comparison
25A. QRP Assessment. Does your field have equivalents of psychology's questionable research practices? Consider: selective reporting, flexible analysis, publication bias toward positive results, small samples, post-hoc hypothesis generation.
25B. Replication Culture. Has anyone attempted to replicate your field's foundational findings? If not, what does that absence tell you? If so, what were the results?
25C. Open Science Comparison. Does your field have equivalents of pre-registration, registered reports, open data, and collaborative replication? If not, what would it take to implement them?
25D. Robust vs. Fragile Assessment. Within your field, which subfields or findings are most likely to be robust (high evidence clarity, standardized methods, large effects) and which are most likely to be fragile (low evidence clarity, flexible methods, small effects)?
25.8 Chapter Summary
Key Concepts
- The replication crisis: ~64% of 100 tested psychology studies failed to replicate; driven by QRPs (p-hacking, HARKing, small samples, publication bias)
- Open Science reforms: Pre-registration, registered reports, open data, collaborative replications, statistical reform — the most ambitious institutional self-correction in modern social science
- Robust vs. fragile subfields: Perception, learning, and core cognitive psychology replicate well; social psychology and context-dependent effects do not
- Overcorrection risk: The reforms are genuine but carry risks of chilling exploratory research, methodological conservatism, and rebound orthodoxy
Key Arguments
- The replication crisis was produced by incentive structures, not by individual dishonesty — any field with similar incentives is vulnerable
- Psychology corrected faster and deeper than economics because of lower switching costs, lower defender power, higher alternative availability, and higher outsider access
- Genuine correction can coexist with overcorrection risk — the lesson is ongoing vigilance, not resistance to reform
- The revision myth is already operating on psychology's correction narrative
Spaced Review
Revisiting earlier material to strengthen retention.
- (From Chapter 10 — The Replication Problem) This chapter described psychology's replication crisis in detail. How does the psychology case compare to the broader replication crisis described in Chapter 10? Are the root causes the same across fields?
- (From Chapter 14 — Consensus Enforcement) How did consensus enforcement contribute to the pre-crisis culture in psychology? What happened to researchers who raised methodological concerns before the crisis?
- (From Chapter 20 — The Revision Myth) The chapter notes that the replication crisis narrative is already being sanitized. Write a "messy version" of the crisis in three sentences, then write the "clean version" that the revision myth would produce.
- (From Chapter 21 — Overcorrection) Is psychology at risk of overcorrection? Identify one specific reform that might be overcorrecting and one that seems well-calibrated.
Answers
1. The root causes are largely the same across fields: publication bias, incentive structures rewarding novelty over replication, small samples, and flexible analysis. Psychology's crisis was more dramatic because the field was more publicly visible and the failed replications were more extensively documented. But the same QRPs operate in medicine (clinical research), economics (experimental), and other fields.
2. Before the crisis, methodological critics (e.g., Jacob Cohen on statistical power, Paul Meehl on theory testing) were acknowledged but not acted upon. Researchers who challenged specific findings faced professional consequences: difficulty publishing, strained relationships, accusations of being "replication police." The consensus enforcement was cultural rather than institutional — no formal mechanism prevented criticism, but the informal costs were high enough to suppress it.
3. Messy version: "For decades, psychology's incentive structure rewarded flashy findings produced by questionable methods. Researchers who raised concerns were marginalized. The crisis was triggered by external embarrassment (Bem's precognition paper, media coverage of failed replications), and the reforms were driven disproportionately by junior researchers with less invested in the old system." Clean version: "Psychology discovered its methodological problems and self-corrected through the Open Science movement, demonstrating that science works."
4. Pre-registration may be overcorrecting against HARKing at the cost of constraining genuinely exploratory research — this is the clearest overcorrection risk. Large-scale collaborative replications (Many Labs, PSA) seem well-calibrated — they address the small-sample problem without constraining the types of questions that can be asked.
What's Next
In Chapter 26: Field Autopsy: Nutrition Science, we will examine the field that has done the most damage to public trust in science — a field where contradictory findings, industry funding, and methodological weakness have created the perception that "scientists can't agree on anything." Nutrition is, in many ways, the field where every failure mode documented in this book operates simultaneously with maximum intensity.
Before moving on, complete the exercises and quiz to solidify your understanding.
Chapter 25 Exercises → exercises.md
Chapter 25 Quiz → quiz.md
Case Study: The Fall of Social Priming → case-study-01.md
Case Study: The Open Science Movement — A Correction in Progress → case-study-02.md
Related Reading
Explore this topic in other books:
- How Humans Get Stuck → The Replication Problem
- Intro to Data Science → Reproducibility and Collaboration
- Media Literacy → Scientific Misinformation