Learning Objectives
- Trace the full arc of medical error from humoral theory to the present, identifying which failure modes operated at each stage
- Analyze medicine's correction mechanisms (RCTs, Cochrane, clinical guidelines) using the Correction Speed Model
- Evaluate why medicine has built the most sophisticated correction infrastructure yet still fails to correct some errors
- Identify which failure modes are currently active in medicine and estimate correction timelines
- Apply the field autopsy methodology to assess any other field's complete error history
In This Chapter
- Chapter Overview
- 23.1 The Long Wrong: 2,000 Years of Confident Error
- 23.2 The First Great Correction: Germ Theory
- 23.3 A Century of Error and Correction (1900–2000)
- 23.4 Medicine's Correction Toolkit: What Works
- 23.5 Medical Reversal: The Current Frontier
- 23.6 Applying the Correction Speed Model to Medicine
- 23.7 What It Looked Like From Inside: The Physician's Dilemma
- 23.8 Active Right Now: Where Medicine Is Currently Stuck
- 📐 Project Checkpoint
- 23.9 Chapter Summary
- Spaced Review
- What's Next
- Chapter 23 Exercises → exercises.md
- Chapter 23 Quiz → quiz.md
- Case Study: The Opioid Crisis — Every Failure Mode at Once → case-study-01.md
- Case Study: Medical Reversal — When Standard of Care Is Wrong → case-study-02.md
Chapter 23: Field Autopsy: Medicine
"The desire to take medicine is perhaps the greatest feature which distinguishes man from animals." — William Osler
Chapter Overview
For approximately 2,000 years, the most educated, well-trained, well-intentioned medical practitioners in the Western world were killing their patients.
They were doing it systematically, confidently, and with the full support of the most prestigious medical authorities of their era. They were draining blood from people who needed it, administering mercury and arsenic to people who were already sick, purging and blistering and starving patients whose bodies needed rest and nutrition, and performing surgeries without washing their hands.
They were not evil. They were not stupid. They were operating within a paradigm — humoral medicine — that told them these treatments were correct, and within an institutional structure that rewarded adherence to the paradigm and punished deviation from it. Every failure mode documented in Parts I and II of this book was active simultaneously: authority cascades propagated the wrong treatments from prestigious physicians to everyone else. Unfalsifiable theory (the four humors) explained every outcome. Survivorship bias ensured that recoveries were attributed to treatment while deaths were attributed to disease severity. Sunk costs of training and reputation made switching inconceivable. And consensus enforcement destroyed the careers of anyone who suggested that the treatments might be harmful.
Medicine is the ideal subject for the first field autopsy because it has the longest documented history of error, the highest stakes (lives), and — crucially — the most sophisticated correction mechanisms of any field. Medicine invented the randomized controlled trial. Medicine built the Cochrane Collaboration. Medicine developed evidence-based practice guidelines. And yet medicine still fails to correct some errors for decades, still has a replication crisis in clinical research, and still sees the same structural failure modes operating that operated in the era of bloodletting.
This chapter examines medicine's complete arc — from confident wrongness through hard-won correction to the present state of imperfect reform — using the full framework from Parts I–III.
In this chapter, you will learn to:
- Apply the book's full diagnostic framework to a single field's 2,500-year history
- Identify which failure modes have been corrected by medicine's institutional reforms and which persist
- Evaluate medicine's correction mechanisms and their limitations
- Assess your own field against the medical benchmark
🏃 Fast Track: If you're familiar with the history of medical error, skim sections 23.1–23.3 and focus on sections 23.4–23.7, which apply the analytical framework and assess the current state.
🔬 Deep Dive: After this chapter, read David Wootton's Bad Medicine: Doctors Doing Harm Since Hippocrates (2006) for the most comprehensive account of medical error's history, and Vinayak Prasad and Adam Cifu's Ending Medical Reversal (2015) for the most rigorous analysis of why medicine still gets things wrong.
23.1 The Long Wrong: 2,000 Years of Confident Error
The Humoral Paradigm (c. 400 BCE – c. 1850 CE)
Western medicine was dominated for over two millennia by the humoral theory — the idea, attributed to Hippocrates and systematized by Galen, that health depends on the balance of four humors: blood, phlegm, yellow bile, and black bile. Disease was caused by an imbalance of these humors, and treatment consisted of restoring the balance — through bloodletting (removing excess blood), purging (removing excess bile), and the administration of various substances designed to rebalance the system.
Apply the failure mode framework:
- Authority cascade (Ch.2): Galen's authority was so absolute that his anatomical descriptions — many derived from animal dissections rather than human anatomy — were taught as fact for over 1,000 years. When Andreas Vesalius published corrected human anatomy in 1543, his work was attacked by Galenists who insisted that if the human body disagreed with Galen, the body must be abnormal.
- Unfalsifiability (Ch.3): Humoral theory was unfalsifiable in practice. If a patient recovered after bloodletting, the treatment worked. If a patient died after bloodletting, the disease was too severe, or the bloodletting was insufficient, or the humoral imbalance was too deep. No outcome could disconfirm the theory.
- Sunk cost (Ch.9): By the time counter-evidence began accumulating in the 17th and 18th centuries, two thousand years of medical training, institutional infrastructure, and professional identity were built on humoral theory.
- Consensus enforcement (Ch.14): Physicians who questioned bloodletting or other humoral treatments faced professional ostracism. The institutional structures of medicine — licensing, training, professional societies — reinforced adherence to the paradigm.
Heroic Medicine (1780s–1850s)
The era of "heroic medicine" in the late 18th and early 19th centuries represents the humoral paradigm's most aggressive — and most harmful — period. Heroic medicine involved extreme interventions: massive bloodletting (patients sometimes drained of 80% of their blood volume), administration of mercury chloride (calomel) as a "universal remedy," blistering agents applied to the skin to draw out disease, and powerful purges designed to empty the bowels.
Benjamin Rush, the most prominent American physician of the late 18th century and a signer of the Declaration of Independence, was the era's most influential advocate of heroic medicine. Rush treated yellow fever patients with bloodletting so aggressive that patients died from blood loss rather than from the disease. His treatments almost certainly killed more patients than they saved.
What It Looked Like From Inside:
Consider Rush's perspective. He was a brilliant man — a genuine intellectual, a patriot, and a committed physician. The theoretical framework told him that yellow fever was caused by excess blood and bile. His treatment — removing blood and purging bile — was logically consistent with the theory. When patients died, the theory explained the death (the disease was too severe, the treatment was applied too late). When patients survived, the theory explained the recovery (the treatment restored humoral balance). Rush had no way to test whether his patients would have done better without treatment, because the concept of a control group had not yet been developed.
The absence of controlled comparison is the structural feature that sustained 2,000 years of harmful medicine. Without a comparison group, every recovery is evidence that the treatment works, and every death is evidence that the disease was severe. This is not a failure of individual reasoning — it is a failure of methodology, and it persisted because no one had yet invented the methodology that would reveal the failure.
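The illusion can be made concrete with a short simulation (a hypothetical sketch with made-up numbers, not historical data): give a useless "treatment" to every patient with a disease from which most people recover naturally, and record what the physician observes.

```python
import random

def apparent_success_rate(n_patients, natural_recovery_rate, seed=0):
    """A physician who treats everyone and has no control group.

    The 'treatment' does nothing: each patient recovers (or not) according
    to the natural course of the disease alone. All numbers hypothetical.
    """
    rng = random.Random(seed)
    recovered = sum(rng.random() < natural_recovery_rate
                    for _ in range(n_patients))
    # From the inside, every recovery counts as a treatment success and
    # every death is blamed on disease severity, so this ratio is what
    # the physician perceives as the treatment's success rate.
    return recovered / n_patients

rate = apparent_success_rate(10_000, natural_recovery_rate=0.60)
```

With a 60% natural recovery rate, the inert treatment "works" about 60% of the time — indistinguishable, without a comparison group, from a genuinely effective drug.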
The Scope of the Damage
Historian David Wootton has argued that medicine did not save more lives than it took until approximately 1865 — the year Joseph Lister introduced antiseptic surgery. Before that date, the net effect of medical intervention on the human population was negative. Physicians, on average, killed more patients than they cured. For over two thousand years.
This is not a fringe claim. Wootton marshals extensive evidence: bloodletting weakened patients fighting infections. Mercury treatments poisoned them. Purging dehydrated them. Surgery without asepsis infected them. And the treatments that did work — wound care, bonesetting, some herbal remedies — would have been equally effective in the hands of folk practitioners without any theoretical framework.
The damage was not limited to individual patients. Medical authority shaped public health policy for centuries. Quarantine practices were inconsistently applied because miasma theory provided no clear rationale for person-to-person transmission. Sanitation reforms were delayed because the medical establishment resisted the idea that water contamination (rather than bad air) caused cholera. The plausible story of miasma — you could smell disease — trumped the correct but less intuitive story of waterborne bacteria.
Pierre Louis and the First Crack
The first systematic challenge to bloodletting came from Pierre Charles Alexandre Louis, a French physician who in the 1830s applied what he called the "numerical method" — a forerunner of clinical epidemiology — to compare outcomes between pneumonia patients who were bled and those who were not. His data showed that bloodletting did not improve outcomes and might worsen them.
Louis's findings were acknowledged but did not change practice. The institutional response is illuminating: physicians praised his methodology while continuing to bleed their patients. The evidence was "interesting" but "insufficient to overturn established practice." The argument against was not that Louis was wrong, but that clinical judgment — the experienced physician's trained eye — was a more reliable guide than mere numbers.
This response is the ancestor of every subsequent resistance to evidence-based reform in medicine. The claim that individual clinical experience outweighs systematic evidence is the medical profession's most durable defense against correction. It has been articulated in every era, against every evidence-based reform, from Louis's numerical method to modern RCTs.
🔄 Check Your Understanding (try to answer without scrolling up)
- What made humoral theory unfalsifiable in practice?
- Why couldn't individual physicians detect that their treatments were harmful?
Verify
1. Any outcome could be explained within the theory: recovery confirmed the treatment worked; death confirmed the disease was too severe or the treatment was insufficient. No observation could disconfirm the theory.
2. Because without controlled comparison (treating some patients and leaving others untreated), there was no way to distinguish the effect of the treatment from the natural course of the disease. Every recovery seemed to confirm the treatment; every death seemed to confirm the disease's severity.
23.2 The First Great Correction: Germ Theory
The germ theory revolution of the mid-19th century represents one of the most dramatic paradigm shifts in the history of any field. Within approximately 40 years (1840s–1880s), the entire framework of medical causation shifted from humoral imbalance to microbial infection.
The Key Figures and Their Fates
Ignaz Semmelweis (1818–1865): Demonstrated in 1847 that hand-washing with chlorine solution dramatically reduced childbed fever mortality in maternity wards. His evidence was compelling: wards where doctors washed their hands had mortality rates of 1-2%, compared to 10-15% in wards where they didn't. The medical establishment's response: rejection, ridicule, and professional destruction. Semmelweis was dismissed from his position, unable to secure a comparable one elsewhere, and died in a mental institution at age 47. His hand-washing protocol was abandoned after his departure.
John Snow (1813–1858): Mapped cholera cases in London's 1854 Broad Street outbreak and identified the contaminated water pump as the source — before germ theory was established. His epidemiological approach challenged the prevailing miasma theory (the idea that disease was caused by "bad air"). The response was mixed: Snow's evidence was taken seriously by some but rejected by the medical establishment at large, which continued to advocate miasma theory for another two decades.
Louis Pasteur (1822–1895) and Robert Koch (1843–1910): Provided the definitive experimental evidence for germ theory in the 1860s–1880s. Pasteur demonstrated that microorganisms caused fermentation and disease; Koch identified specific bacteria causing tuberculosis, cholera, and anthrax, establishing Koch's postulates as the gold standard for proving microbial causation. Their evidence was eventually accepted — but the acceptance was neither immediate nor universal.
Joseph Lister (1827–1912): Applied Pasteur's findings to surgery, introducing antiseptic technique in the 1860s. The response from the surgical establishment was resistance: surgeons considered their bloody frock coats a badge of experience, and many dismissed Lister's methods as unnecessary inconvenience. Antiseptic technique took over twenty years to become standard practice.
The resistance to Lister is particularly instructive because his intervention was simple, cheap, and demonstrably effective. He was not asking surgeons to abandon a theory — only to wash their hands and instruments. The resistance was not intellectual but cultural: the suggestion that surgeons were carrying infection to their patients was an insult to professional identity. Accepting Lister meant accepting that surgeons had been killing patients through negligence — an emotionally intolerable conclusion, regardless of the evidence.
This is the sunk cost of professional identity — not just careers and textbooks, but the self-concept of practitioners who have dedicated their lives to healing. The cost of admitting that your treatments have been harmful is not just professional but psychological. It requires a revision of identity that most people — even intelligent, well-intentioned people — resist until the evidence becomes absolutely inescapable.
Applying the Correction Speed Model
| Variable | Score |
|---|---|
| Evidence clarity | HIGH (Koch's postulates, controlled comparisons) |
| Switching cost | MEDIUM-HIGH (entire theory of disease causation) |
| Defender power | MEDIUM (prestigious physicians, but no external power base) |
| Outsider access | LOW (hierarchical medical profession) |
| Alternative availability | HIGH (germ theory was a clear replacement framework) |
| Crisis probability | MEDIUM-HIGH (visible epidemics, surgical mortality) |
| Correction mode | Mixed (persuasion + circumvention + crisis) |
| Revision resistance | LOW (the germ theory revolution is heavily sanitized in medical history) |
Result: Medium-speed correction (~40 years from Semmelweis to standard antiseptic practice). The high alternative availability and high evidence clarity pulled toward fast; the low outsider access and medium-high switching cost pulled toward slow.
🔗 Connection: Semmelweis's story is the purest example of the outsider problem (Chapter 18) in medical history. He had clear evidence, a reproducible demonstration, and a simple intervention that would save lives. He was destroyed by the institution he was trying to fix. The structural lesson: the quality of the evidence does not determine how the evidence is received. The institutional position of the person presenting the evidence determines that.
23.3 A Century of Error and Correction (1900–2000)
The 20th century saw medicine achieve unprecedented therapeutic power — and commit unprecedented therapeutic errors. The pattern repeated: confident wrong answer, resistance to correction, eventually crisis or generational change.
Lobotomy (1935–1970s)
Egas Moniz received the Nobel Prize in Physiology or Medicine in 1949 for developing the prefrontal lobotomy. Over the following two decades, an estimated 40,000–50,000 patients in the United States alone underwent the procedure — many without meaningful consent, many with outcomes ranging from severe personality changes to permanent disability to death.
Failure modes active: Authority cascade (the Nobel Prize provided maximum prestige amplification). Sunk cost (psychiatric institutions had invested in the procedure as their primary intervention for severe mental illness). Precision without accuracy (Ch.12: the procedure was performed with surgical precision on a theory of neuroanatomy that was fundamentally wrong). Low crisis probability (the victims were institutionalized psychiatric patients with minimal political voice).
Correction mechanism: The introduction of chlorpromazine (the first antipsychotic medication) in the 1950s provided an alternative that made lobotomy unnecessary. The procedure declined not because the evidence against it became overwhelming, but because a better option appeared. This is the alternative availability variable in action.
Thalidomide (1957–1962) and Its Aftermath
We examined thalidomide in Chapter 21 as an example of crisis-driven overcorrection. In the field autopsy context, it illustrates a different point: the development of the modern drug regulatory framework was itself a correction — from no systematic safety testing to mandatory clinical trials. The correction was genuine but, as Chapter 21 showed, it overcorrected in specific dimensions.
The Replication Crisis in Clinical Research
Medicine has its own replication crisis, though it receives less attention than psychology's. In 2011, researchers at Bayer Healthcare reported that they could replicate the findings of only 20-25% of published preclinical studies they attempted to validate. In 2012, researchers at Amgen reported that they could reproduce only 6 of 53 (11%) "landmark" preclinical cancer studies.
The implications are staggering: the preclinical research base on which clinical trials are built — and on which treatment decisions depend — may be substantially unreliable. Billions of dollars in clinical development have been invested in pursuing drug targets identified by preclinical studies that don't replicate.
Failure modes active: Publication bias (Ch.5: positive results published, negative results filed away). Incentive structures (Ch.11: researchers rewarded for novel findings, not replications). Precision without accuracy (Ch.12: p-values treated as gospel despite inadequate sample sizes and researcher degrees of freedom). Consensus enforcement (Ch.14: challenging established findings is professionally risky).
The correction is underway — pre-registration, registered reports, and replication initiatives are gaining traction — but it is proceeding slowly against the same structural resistance that psychology's Open Science movement faced.
The Opioid Crisis (1990s–present)
The opioid crisis illustrates that medicine's correction mechanisms, for all their sophistication, can still fail when the incentive structures (Chapter 11) are sufficiently powerful.
In the mid-1990s, pharmaceutical companies — most notoriously Purdue Pharma with OxyContin — marketed opioid painkillers with claims about low addiction potential that were not supported by adequate evidence. The American Pain Society promoted the concept of "pain as the fifth vital sign," encouraging aggressive pain treatment. Prescribing rates skyrocketed.
Failure modes active:
- Incentive structures (Ch.11): Pharmaceutical companies spent billions marketing opioids to physicians. Sales representatives provided misleading information about addiction risk. "Key opinion leaders" — physicians paid by pharmaceutical companies — published favorable articles and gave lectures promoting opioid prescribing.
- Authority cascade (Ch.2): The endorsement by professional societies (American Pain Society, Joint Commission) gave opioid prescribing institutional legitimacy that individual physicians relied on.
- Precision without accuracy (Ch.12): Pain rating scales (0-10) gave the illusion of measurement while capturing none of the complexity of pain experience, addiction risk, or treatment outcomes.
- Consensus enforcement (Ch.14): Physicians who resisted the trend toward aggressive opioid prescribing were accused of under-treating pain — a charge with legal and professional consequences.
The cost: Over 500,000 Americans dead from opioid overdoses between 1999 and 2020. The crisis continues.
The correction: Ongoing and incomplete. Prescribing guidelines have been revised. Some states have implemented prescription monitoring programs. Purdue Pharma pleaded guilty to federal charges. But the structural features that produced the crisis — pharmaceutical industry influence on prescribing, inadequate post-market surveillance, the incentive structure that rewards prescribing over non-pharmacological pain management — remain largely intact.
What the opioid crisis teaches about medicine's correction infrastructure: The RCT, Cochrane, and clinical guidelines — medicine's crown jewels of evidence-based practice — did not prevent the opioid crisis. The reason is structural: these tools are designed to evaluate whether a treatment works, not whether the institutions promoting it are trustworthy. The opioid crisis was not a failure of evidence evaluation. It was a failure of incentive alignment — the evidence was generated by interested parties, amplified by paid advocates, and embedded in guidelines by conflicted experts. Medicine's correction tools assume good-faith evidence production. When that assumption fails — when the evidence itself is corrupted by financial interests — the tools are helpless.
This is the deepest lesson of the medical field autopsy: correction infrastructure that addresses only evidence quality, without addressing evidence production incentives, is structurally incomplete. RCTs can be designed to favor the sponsor's product. Systematic reviews can only synthesize the studies that exist. Guidelines can be written by experts with conflicts of interest. The tools are powerful but they operate within the incentive structure rather than reforming it.
🔗 Connection: The opioid crisis is the medical equivalent of the 2008 financial crisis (Chapter 19, case study 2): in both cases, the correction infrastructure (financial regulation / medical evidence-based practice) was real and sophisticated, but the incentive structures that produced the error (financial industry lobbying / pharmaceutical marketing) were more powerful than the correction mechanisms. In both cases, the crisis produced regulatory reform but left the underlying incentive structures largely intact.
🔄 Check Your Understanding (try to answer without scrolling up)
- What was the primary alternative that ended the lobotomy era?
- Which failure modes from Parts I and II were active in the opioid crisis?
Verify
1. Chlorpromazine (the first antipsychotic medication) — demonstrating the "alternative availability" variable: lobotomy declined because a better option appeared, not because the evidence against it became overwhelming.
2. Incentive structures (pharmaceutical marketing), authority cascade (professional society endorsement), precision without accuracy (pain rating scales), and consensus enforcement (accusations of under-treating pain against resistant physicians).
23.4 Medicine's Correction Toolkit: What Works
Medicine has built the most sophisticated correction infrastructure of any field. Understanding what works — and why some errors persist despite this infrastructure — is the central analytical task of this autopsy.
The Randomized Controlled Trial (RCT)
The RCT, developed in its modern form in the late 1940s (the 1948 streptomycin trial is generally cited as the first modern RCT), addresses the fundamental problem that sustained 2,000 years of harmful medicine: the absence of controlled comparison. By randomly assigning patients to treatment and control groups, the RCT allows the effect of the treatment to be separated from the natural course of the disease.
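The logic of randomization can be sketched in a few lines (a hypothetical toy model, not a description of any real trial): because assignment to the two arms is random, the natural course of the disease affects both arms equally, and the difference in mean outcomes isolates the treatment's contribution.

```python
import random
import statistics

def rct_estimate(n_patients, true_effect, seed=0):
    """Minimal sketch of a two-arm randomized trial (hypothetical numbers).

    Each patient's outcome mixes the natural course of the disease (noise)
    with the treatment effect; randomization makes the arms comparable,
    so the difference in means estimates the treatment effect alone."""
    rng = random.Random(seed)
    assignments = [True] * (n_patients // 2) + [False] * (n_patients // 2)
    rng.shuffle(assignments)                       # the randomization step
    treated, control = [], []
    for is_treated in assignments:
        outcome = rng.gauss(0.0, 1.0)              # natural course + noise
        if is_treated:
            treated.append(outcome + true_effect)  # treatment adds its effect
        else:
            control.append(outcome)
    return statistics.mean(treated) - statistics.mean(control)

estimate = rct_estimate(20_000, true_effect=0.5)
```

No individual outcome reveals the treatment effect — every patient's result mixes treatment with natural variation — yet the between-arm difference recovers it. This is exactly the comparison Benjamin Rush had no way to make.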
What it corrects: The survivorship bias, post-hoc rationalization, and confirmation bias that made pre-RCT medicine unfalsifiable.
What it doesn't correct: The incentive structures that bias which RCTs are conducted (studies of profitable drugs are funded; studies of unprofitable alternatives are not), how they are designed (choice of comparators, endpoints, and follow-up duration can be optimized for favorable results), and which results are published (publication bias toward positive results persists).
The Cochrane Collaboration
Founded in 1993, Cochrane conducts systematic reviews — rigorous syntheses of all available evidence on specific medical questions. Cochrane reviews are considered the gold standard of medical evidence synthesis.
What it corrects: The plausible story problem (Chapter 6) — by systematically reviewing all evidence rather than selectively citing studies that support a preferred narrative.
What it doesn't correct: The problem of evidence that was never generated. Cochrane can only review studies that were conducted. If the studies that should have been done (testing the alternatives, replicating key findings, examining long-term outcomes) were never funded, Cochrane's reviews will reflect the biased evidence base rather than the truth.
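This limitation can be illustrated with a toy simulation (hypothetical numbers throughout): run many small trials of a drug with no true effect, "publish" only the positive-looking ones, and ask what a reviewer who can only see the published studies would conclude.

```python
import random
import statistics

def published_mean_effect(n_trials, n_per_arm, true_effect,
                          publish_threshold, seed=1):
    """Simulate many small two-arm trials, then 'publish' only those whose
    observed effect exceeds a threshold. All numbers are hypothetical."""
    rng = random.Random(seed)
    published = []
    for _ in range(n_trials):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_arm)]
        treated = [rng.gauss(true_effect, 1.0) for _ in range(n_per_arm)]
        observed = statistics.mean(treated) - statistics.mean(control)
        if observed > publish_threshold:   # only "positive" results appear
            published.append(observed)
    # What a systematic review of the published literature would see:
    return statistics.mean(published) if published else 0.0

# A drug with NO true effect still shows a benefit in the published record.
effect = published_mean_effect(n_trials=500, n_per_arm=20,
                               true_effect=0.0, publish_threshold=0.3)
```

The synthesis is performed perfectly, yet the conclusion is wrong, because the bias was introduced before any review began — in which studies reached the literature at all.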
Clinical Practice Guidelines
Professional societies publish guidelines that synthesize evidence into specific clinical recommendations. Guidelines are the mechanism by which research evidence reaches practicing physicians.
What they correct: The gap between evidence and practice — ensuring that individual physicians' decisions are informed by the best available evidence rather than by personal experience or training that may be outdated.
What they don't correct: The authority cascade. Guidelines are written by experts who may have conflicts of interest, may be invested in the current paradigm, and may systematically underweight evidence that challenges their positions. Guidelines can — and do — encode wrong answers with institutional authority.
The Eminence-Based Medicine Problem
Despite the EBM revolution, much of medical practice remains what critics call "eminence-based medicine" — driven by the authority of prominent physicians rather than the systematic evaluation of evidence. Senior clinicians, department chairs, and key opinion leaders wield enormous influence over practice through informal channels: grand rounds presentations, mentorship, corridor conversations, and the "how we do it here" culture that shapes every training program.
Research by John Ioannidis and others has documented the persistence of eminence-based practice: a substantial proportion of medical decisions are made based on expert opinion rather than on evidence from rigorous trials. This is not because physicians are unaware of EBM principles — they are trained in them. It is because the institutional culture of medicine still privileges experience and seniority, and because the volume of medical literature is so vast that no individual physician can evaluate the evidence for every clinical decision. In practice, physicians rely on trusted authorities — which is an authority cascade operating within the correction infrastructure designed to prevent it.
This is Theme 9 in action: the correction mechanism (EBM) has become a new landscape in which the old failure mode (authority cascade) operates through new channels. Guidelines themselves become the authoritative pronouncements that physicians defer to — and when guidelines are wrong (as they sometimes are), the deference that EBM was designed to replace reasserts itself in a new form.
23.5 Medical Reversal: The Current Frontier
Despite medicine's correction infrastructure, the field continues to discover that established practices are wrong. Researchers Vinayak Prasad and Adam Cifu coined the term medical reversal to describe the phenomenon: a medical practice established based on inadequate evidence that is later contradicted by higher-quality evidence.
Examples of medical reversals include:
- Hormone replacement therapy (HRT) for postmenopausal women: Widely recommended for decades based on observational data suggesting cardiovascular benefits. The Women's Health Initiative RCT (2002) found that HRT actually increased cardiovascular risk. Millions of women had been prescribed a treatment that was harming rather than helping them.
- Routine stenting for stable angina: Percutaneous coronary intervention (stenting) for stable heart disease was standard practice for years. The COURAGE trial (2007) and the ORBITA trial (2017) showed that for stable angina, stenting was no better than optimal medical therapy alone. Hundreds of thousands of unnecessary procedures had been performed.
- Arthroscopic knee surgery for osteoarthritis: Widely performed for decades. Multiple RCTs showed it was no more effective than sham surgery (placebo). The procedure persists in some settings despite the evidence.
Prasad and Cifu estimate that approximately 40% of established medical practices that have been tested in rigorous trials have been reversed — found to be no better than alternatives or actually harmful.
The Scale of the Problem
The reversal rate is not evenly distributed across medicine. Some areas — emergency medicine, surgical procedures adopted without RCTs, screening programs — have higher reversal rates. Others — infectious disease (where the evidence base is often stronger and the causal chains more direct) — have lower rates.
What makes the reversal problem structurally concerning is the pathway by which practices become standard. Many medical practices follow a predictable trajectory:
- Small study or observational data suggests a treatment works
- Mechanistic reasoning ("it makes biological sense") provides theoretical support
- Expert endorsement (authority cascade) amplifies the finding
- Practice guidelines encode the treatment as standard of care
- Training programs teach the treatment to new physicians
- Institutional infrastructure (equipment, staffing, billing codes) is built around it
- Only then — sometimes decades later, sometimes never — is a rigorous RCT conducted
This inverted pipeline means that by the time rigorous evidence arrives, the switching cost is enormous. The treatment is embedded in guidelines, training, infrastructure, and professional identity. The RCT that reveals the treatment is ineffective is fighting not just the inertia of habit but the full weight of institutional investment.
Why Reversals Persist
Medical reversal illustrates the full failure mode stack operating simultaneously:
- Initial adoption based on inadequate evidence: Many practices are adopted based on observational studies, mechanistic reasoning, or small trials — then never rigorously tested because they become "standard of care."
- Authority cascade: Once a practice is endorsed by guidelines and professional societies, questioning it requires challenging institutional authority.
- Sunk cost: Physicians who have performed thousands of stent procedures, surgeons who have built careers on arthroscopic knee surgery, hospitals that have invested in cardiac catheterization labs — all have enormous investments in the continued use of these procedures.
- Incentive structures: Many reversed practices are procedures that are financially lucrative. The incentive structure rewards performing procedures, not testing whether they work.
- Therapeutic inertia: Physicians continue prescribing treatments they were trained to use, even when evidence suggests they should change. Changing practice requires actively unlearning, which is cognitively costly and psychologically uncomfortable.
🧩 Productive Struggle
Before reading the next section, consider: Medicine has the most sophisticated correction infrastructure of any field — RCTs, Cochrane reviews, clinical guidelines, regulatory oversight. Yet ~40% of tested practices are reversed. What does this tell us about the limits of correction infrastructure? Is the problem that the infrastructure is inadequate, or that the failure modes are more powerful than any infrastructure can overcome?
Spend 3–5 minutes, then read on.
23.6 Applying the Correction Speed Model to Medicine
Let's score medicine as a field on the eight Correction Speed Model variables:
| Variable | Score | Assessment |
|---|---|---|
| Evidence clarity | MEDIUM-HIGH | RCTs provide high clarity when conducted; but many practices are never rigorously tested |
| Switching cost | HIGH | Training, equipment, institutional infrastructure, patient expectations |
| Defender power | MEDIUM-HIGH | Professional societies, procedure-dependent specialists, pharmaceutical industry |
| Outsider access | MEDIUM | EBM has opened some channels; but medical hierarchy remains strong |
| Alternative availability | VARIABLE | High for some conditions (antibiotics replacing surgery); low for others (no replacement for many procedures) |
| Crisis probability | MEDIUM | Visible iatrogenic harm exists but is often diffuse (opioid crisis) or individual (malpractice) |
| Correction mode | MIXED | Persuasion through guidelines + circumvention through new physician training + occasional crisis |
| Revision resistance | LOW-MEDIUM | Some institutional memory (Cochrane) but medical history is heavily sanitized |
Overall assessment: Medicine corrects faster than most fields (10–30 years for specific practices) because its evidence clarity is relatively high and its correction infrastructure (RCTs, Cochrane) is genuinely powerful. But it corrects slower than it should because its failure modes — particularly incentive structures and authority cascades — are also powerful, and because many practices are adopted without adequate testing and then never rigorously evaluated.
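The table's qualitative scores can be turned into a rough numeric sketch. This is purely illustrative: the 0–1 mappings, the equal weighting, and the split between "speeding" and "slowing" variables are assumptions introduced here for demonstration, not part of the Correction Speed Model as stated in the text.

```python
# Illustrative sketch of combining the chapter's qualitative Correction
# Speed Model scores for medicine into a single index. The numeric
# mappings and equal weights are assumptions, not from the chapter.

LEVELS = {"LOW": 0.2, "LOW-MEDIUM": 0.35, "MEDIUM": 0.5,
          "MEDIUM-HIGH": 0.65, "HIGH": 0.8, "VARIABLE": 0.5}

# Variables where a HIGH score plausibly *speeds* correction...
SPEEDING = {"evidence_clarity", "outsider_access",
            "alternative_availability", "crisis_probability"}
# ...and variables where a HIGH score *slows* it (inverted below).
# "Correction mode" is categorical (MIXED) and is left out of the index.

medicine = {
    "evidence_clarity": "MEDIUM-HIGH",
    "switching_cost": "HIGH",
    "defender_power": "MEDIUM-HIGH",
    "outsider_access": "MEDIUM",
    "alternative_availability": "VARIABLE",
    "crisis_probability": "MEDIUM",
    "revision_resistance": "LOW-MEDIUM",
}

def correction_speed_index(scores: dict) -> float:
    """Average the speeding variables with the inverted slowing
    variables into a 0-1 index (higher = faster expected correction)."""
    vals = [LEVELS[level] if var in SPEEDING else 1.0 - LEVELS[level]
            for var, level in scores.items()]
    return sum(vals) / len(vals)

print(f"Medicine correction-speed index: {correction_speed_index(medicine):.2f}")
```

Under these assumed mappings, medicine lands near the middle of the scale, which matches the prose assessment: strong correction tools pulling the index up, strong failure modes pulling it down.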
The fundamental tension: Medicine has the best correction tools and some of the strongest failure modes. The result is a field that is simultaneously the most self-correcting and the most in need of correction — because its interventions have the highest stakes (human lives) and the gap between current practice and optimal practice, while narrower than in most fields, is measured in preventable deaths.
23.7 What It Looked Like From Inside: The Physician's Dilemma
Let's reconstruct the institutional position of a practicing physician confronting the failure modes described in this chapter.
You are an orthopedic surgeon in 2018. You have performed arthroscopic knee surgery for osteoarthritis hundreds of times over your career. Your patients generally report improvement. Your training, your mentors, and your professional society's guidelines endorse the procedure. Your hospital's revenue model includes arthroscopic procedures as a significant income stream. Your reputation is partly built on your skill as a surgeon.
Then the sham-controlled trials are published: randomized trials of arthroscopic knee surgery in your own field, and ORBITA-style trials of stenting in cardiology, showing that for some conditions these procedures are no better than sham surgery or optimal medical therapy.
What do you do?
The evidence says change. Your training says continue. Your patients expect surgery. Your hospital expects procedures. Your income depends on operating. Your colleagues are still performing the procedure. The guidelines haven't changed yet. And your own experience — hundreds of patients who got better — conflicts with the trial data (even though you know, intellectually, that your experience is subject to placebo effects, regression to the mean, and the confirmation bias of following up more carefully with surgical patients).
This is the sunk cost problem (Chapter 9), the Einstellung effect (Chapter 13), and the incentive structure problem (Chapter 11) all operating simultaneously — in a physician who is intelligent, well-trained, and genuinely committed to patient welfare. The failure is not in the physician. The failure is in the system that makes changing practice so difficult even when the evidence supports change.
Research on "therapeutic inertia" — the phenomenon of physicians continuing established treatments despite evidence that they should change — estimates that the average time from the publication of definitive evidence to widespread adoption of correct practice is 17 years. During those 17 years, patients receive treatments that evidence has shown to be inferior, unnecessary, or harmful.
Seventeen years. In medicine. The field with the best correction infrastructure.
This number should haunt every other field that has less correction infrastructure and assumes it is doing better.
23.8 Active Right Now: Where Medicine Is Currently Stuck
Several areas of medicine currently exhibit the patterns described in this book:
Overdiagnosis and overtreatment. Many screening programs (PSA for prostate cancer, mammography schedules, thyroid screening) detect conditions that would never cause symptoms or death if left undetected. The treatment of these "incidentalomas" exposes patients to risk without benefit. The evidence against aggressive screening is growing, but the authority cascade (professional society guidelines), the incentive structure (screening generates revenue), and the sunk cost (screening programs are enormous institutional investments) all resist correction.
Low-value care. The Choosing Wisely initiative, launched in 2012, has identified hundreds of commonly performed medical practices that provide little or no benefit to patients. Adoption of the initiative's recommendations has been slow — because physicians face no penalty for providing low-value care and may face liability for withholding it (the same political asymmetry that drives overcorrection in Chapter 21).
Mental health treatment. The field's dominant paradigm — the neurochemical model of mental illness (the "chemical imbalance" theory popularized in the 1990s) — has come under increasing challenge. A 2022 umbrella review in Molecular Psychiatry found no consistent evidence supporting the serotonin hypothesis of depression — the theory that had been the primary justification for SSRI antidepressants for three decades. The theory was always an oversimplification (Chapter 7's anchoring of first explanations), and the evidence base for many psychiatric medications is weaker than commonly assumed. The correction is ongoing but faces enormous switching costs (pharmaceutical industry investment, clinical training, patient expectations) and low alternative availability (no clear replacement paradigm for many conditions).
Surgical innovation without evidence. New surgical techniques are routinely adopted without the rigorous testing that would be required for a new drug. The regulatory framework treats devices and procedures differently from pharmaceuticals — a structural asymmetry that allows surgical innovations to enter practice based on small case series and expert opinion rather than RCTs. When procedures are eventually tested rigorously, the reversal rate is high. The structural problem: surgeons have strong professional incentives to innovate (career advancement, institutional prestige, patient demand) and weak incentives to rigorously test whether innovations actually work (testing is expensive, takes years, and risks showing that the innovation doesn't work).
Algorithmic medicine and new failure modes. Machine learning models are increasingly used in clinical decision-making — predicting disease risk, recommending treatments, triaging patients. These systems introduce new failure modes that medicine's existing correction infrastructure is not designed to address: training data bias (models trained on historically biased data reproduce the bias), opacity (black-box models that cannot explain their recommendations), and the authority cascade of algorithmic authority (physicians defer to algorithmic recommendations even when clinical judgment suggests otherwise). The same institutional dynamics that produced deference to Galen are beginning to produce deference to algorithms — with the additional complication that the algorithm's reasoning is literally invisible.
📐 Project Checkpoint
Epistemic Audit — Chapter 23 Addition: The Medical Benchmark
Apply this chapter's analysis to your own field:
23A. Era Mapping. If your field's history were mapped onto medicine's arc, which era would it be in?
- Pre-germ-theory (before ~1850): No reliable correction mechanism; confident wrongness sustained by unfalsifiable theory
- Early correction (1850–1950): Some correction mechanisms emerging, but institutional resistance still dominant
- RCT era (1950–1990): Correction infrastructure in place for some questions; major errors still common
- EBM era (1990–present): Sophisticated correction tools, but failure modes still powerful enough to sustain wrong practices for decades
23B. Correction Infrastructure Comparison. What is your field's equivalent of the RCT? What is your field's equivalent of Cochrane reviews? What is your field's equivalent of clinical practice guidelines? If these equivalents don't exist, your field has less correction infrastructure than medicine — and medicine still gets ~40% of tested practices wrong.
23C. Medical Reversal Analog. Are there practices in your field that were adopted based on inadequate evidence and have never been rigorously tested? What would your field's "medical reversal rate" be if someone systematically tested established practices?
23.9 Chapter Summary
Key Concepts
- Humoral medicine as paradigm case: 2,000 years of confident, systematic, well-intentioned wrong treatment — sustained by unfalsifiable theory, authority cascade, and absence of controlled comparison
- Germ theory revolution: A 40-year correction driven by high evidence clarity, high alternative availability, and the outsider contributions of Semmelweis, Snow, Pasteur, Koch, and Lister
- Medical reversal: ~40% of rigorously tested medical practices are reversed — found to be no better than alternatives or actually harmful
- Medicine's correction toolkit: RCTs (address the comparison problem), Cochrane (address the synthesis problem), guidelines (address the implementation problem) — each powerful but each with specific blind spots
- The fundamental tension: Medicine has both the best correction tools and some of the strongest failure modes of any field
Key Arguments
- Medicine's history demonstrates every failure mode in Parts I and II operating simultaneously for over 2,000 years
- The absence of controlled comparison (not the stupidity of physicians) sustained 2,000 years of harmful treatment
- Correction infrastructure is necessary but not sufficient — incentive structures, authority cascades, and sunk costs can sustain wrong practices even in the presence of RCTs and systematic reviews
- The opioid crisis demonstrates that modern medicine is not immune to the failure modes that produced bloodletting and lobotomy
- Medicine is the benchmark: if your field has less correction infrastructure than medicine, and medicine still gets ~40% of tested practices wrong, your field's error rate is likely higher
Spaced Review
Revisiting earlier material to strengthen retention.
- (From Chapter 1 — The Archaeology of Error) The lifecycle of a wrong idea has seven stages. Map bloodletting onto all seven stages. At which stage did bloodletting spend the longest time? What finally moved it to Stage 6 (crisis)?
- (From Chapter 2 — The Authority Cascade) Galen's authority shaped medicine for over 1,000 years. How does the Galen cascade compare to modern authority cascades in medicine (professional society guidelines, key opinion leaders)? Has the mechanism changed, or only the speed?
- (From Chapter 9 — The Sunk Cost of Consensus) The opioid crisis involved enormous sunk costs — pharmaceutical investment, clinical training, institutional protocols. How did sunk cost interact with incentive structures to sustain prescribing practices that were killing patients?
- (From Chapter 18 — The Outsider Problem) Semmelweis, Snow, and Marshall & Warren were all outsiders who brought correct evidence to medicine. Compare their experiences. Did the outsider problem operate similarly in all three cases, or did structural differences produce different outcomes?
Answers
1. Stage 1 (Introduction): Humoral theory introduced by Hippocrates/Galen. Stage 2 (Adoption): Became the dominant framework. Stage 3 (Entrenchment): Embedded in training, licensing, and institutional practice for over 1,500 years — this is where bloodletting spent the longest. Stage 4 (Counter-evidence): Began accumulating in the 17th–18th centuries (Pierre Louis's statistical analysis). Stage 5 (Resistance): Heroic medicine intensified despite counter-evidence. Stage 6 (Crisis): No single crisis — instead, the germ theory revolution provided both counter-evidence and an alternative framework simultaneously. Stage 7 (Revision): Medical history now presents the transition as smooth progress.
2. The mechanism is structurally identical: a prestigious source's claims are propagated through citation, training, and institutional deference without independent verification. The speed has changed — Galen's cascade took centuries; modern cascades through guidelines and key opinion leaders take years to decades. But the structure is the same: deference to authority substituting for independent evaluation of evidence.
3. Pharmaceutical companies invested billions in opioid marketing and manufacturing (sunk cost). Physicians who had built prescribing patterns around opioids had training-based sunk cost. Hospitals that had built pain management programs around opioids had institutional sunk cost. The incentive structures (pharmaceutical marketing, pressure to treat pain aggressively, financial incentives for prescribing) aligned with the sunk cost to create a self-reinforcing system: the more that was invested, the stronger the incentive to continue, and the stronger the resistance to evidence that the investment was producing harm.
4. Semmelweis (1847): Outsider in terms of institutional position (junior physician), correct evidence dismissed, career destroyed — the outsider problem in its most extreme form. Snow (1854): Outsider in terms of methodology (epidemiological rather than clinical), evidence taken partly seriously but not sufficient to change the paradigm alone. Marshall & Warren (1982): Outsiders in terms of specialty (not gastroenterologists), correct evidence dismissed for ~15 years, but survived to be vindicated (Nobel Prize 2005). The outsider problem operated in all three cases but with different outcomes — largely determined by the structural buffers available (Chapter 18): Semmelweis had none; Snow had methodological credibility; Marshall had dramatic evidence (self-experiment) and eventually alternative availability (antibiotics).

What's Next
In Chapter 24: Field Autopsy: Economics, we will apply the same analytical framework to "the dismal science" — a field that claims scientific status while failing to predict its most consequential outcomes, that mathematicized its theories to the point of unfalsifiability, and that responded to the 2008 crisis with remarkably little theoretical change.
Before moving on, complete the exercises and quiz to solidify your understanding.