Case Study 2: Preclinical Cancer Research — When 89% of "Landmark" Studies Don't Replicate
The Amgen Study
In 2012, C. Glenn Begley and Lee Ellis published a brief but devastating report in Nature. Working at Amgen, one of the world's largest biotechnology companies, they had attempted to replicate 53 "landmark" preclinical cancer studies — papers published in top journals, highly cited, and influential in shaping research directions and drug development priorities.
They could replicate only 6 of 53 — an 11% success rate.
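Because the sample is small (53 papers), the 11% point estimate carries real uncertainty. A quick sketch using the 6-of-53 numbers from the text and a standard Wilson score interval (my choice of method, not from the paper) shows how wide the plausible range is:

```python
import math

# Replication outcome reported by Begley & Ellis (2012): 6 of 53 studies.
successes, n = 6, 53
rate = successes / n

# Wilson score interval: a standard 95% confidence interval for a proportion.
z = 1.96
center = (rate + z**2 / (2 * n)) / (1 + z**2 / n)
half = (z / (1 + z**2 / n)) * math.sqrt(rate * (1 - rate) / n + z**2 / (4 * n**2))
lower, upper = center - half, center + half

print(f"Replication rate: {rate:.1%} (95% CI: {lower:.1%} to {upper:.1%})")
```

Even the optimistic end of the interval (roughly 23%) still leaves the large majority of these landmark findings unreplicated.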
Why This Matters More Than Psychology
The psychology replication crisis is concerning. The preclinical cancer research replication crisis is potentially lethal.
Preclinical research is the foundation of drug development. When a laboratory study identifies a molecular target, a mechanism, or a promising compound, that finding enters the drug development pipeline. Clinical trials costing hundreds of millions of dollars are designed based on the preclinical evidence. Patients are enrolled in trials — sometimes forgoing other treatments — based on the expectation that the preclinical science is reliable.
If 89% of the foundational findings are unreliable, the pipeline is built on sand. Clinical trials that test unreliable preclinical findings will predictably fail — wasting money, time, and (most importantly) the health and hope of patients who enrolled expecting a genuine chance of benefit.
The Structural Problem
The structural incentives in preclinical cancer research mirror those in psychology, with additional factors:
- Academic incentives: Preclinical researchers face the same publish-or-perish pressures as other academics. Novel, dramatic findings (a new molecular target, a promising compound) are publishable; failed replications are not.
- Industry incentives: Pharmaceutical companies want to identify promising drug targets. They have incentives to pursue exciting leads and disincentives to question them — because questioning might eliminate a potential product from the pipeline.
- Complexity: Biological systems are extraordinarily complex. Small variations in cell lines, reagents, protocols, and environmental conditions can produce different results. This genuine variability makes both initial positive results and failed replications harder to interpret.
- Low transparency: Many preclinical studies do not share their raw data, full protocols, or negative results. The garden of forking paths is wide open, and the paths taken are often not documented.
A Specific Example
Begley described one case in detail. A "landmark" study had identified a protein that, when inhibited, stopped cancer cell growth in laboratory dishes. The finding generated enormous excitement — it suggested a new drug target. Multiple research groups built on the finding, publishing follow-up studies and initiating drug development programs.
When Begley's team tried to replicate the original result, they failed. They contacted the original author, who admitted that the published result had emerged after multiple failed attempts — and that the author's own lab had been unable to reproduce the finding consistently.
The published finding was, essentially, a best-case result selected from multiple attempts. The failures were in the file drawer. And on this unreliable foundation, an entire research direction had been built.
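The "best-case result selected from multiple attempts" mechanism can be made concrete with a small simulation. Under the null hypothesis (no real effect), each experiment's p-value is uniformly distributed on (0, 1); a lab that runs the experiment several times and publishes only the best-looking attempt effectively reports the minimum p-value. The number of attempts below is a hypothetical illustration, not a figure from the case:

```python
import random

random.seed(0)
TRIALS = 100_000   # simulated "findings"
ATTEMPTS = 6       # hypothetical repeats per finding, only the best is published

# Under the null, p-values are Uniform(0, 1). Publishing min(p_1, ..., p_k)
# inflates the chance of reporting "significance" far above the nominal 5%.
false_positives = sum(
    min(random.random() for _ in range(ATTEMPTS)) < 0.05
    for _ in range(TRIALS)
)
fp_rate = false_positives / TRIALS
print(f"Effective false-positive rate: {fp_rate:.1%}")
```

Analytically the rate is 1 − 0.95^6 ≈ 26.5%: selecting the best of six null attempts makes a spurious "significant" result more likely than not to appear in at least a quarter of cases, even with a nominal 5% threshold.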
The Scale of the Problem
The Begley study was small (53 papers), but its implications scale across the entire field. If similar replication rates apply to the thousands of preclinical cancer studies published each year, the volume of unreliable findings entering the drug development pipeline is staggering. Combined with the known failure rate of clinical trials (roughly 90% of drugs that enter trials never reach approval), the picture that emerges is of a system in which unreliable preclinical evidence feeds into expensive clinical trials that predictably fail — a massive, structural waste of resources and human hope.
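The connection between the two failure rates can be sketched with a toy pipeline model. The assumptions here are mine, for illustration only: trials built on unreliable preclinical findings fail, and trials built on reliable findings still fail at some hypothetical baseline rate (set to 50% below) for reasons unrelated to replication, such as safety or translation from cell models to humans:

```python
# Toy pipeline model — illustrative assumptions, not figures from the sources.
reliable = 6 / 53        # fraction of preclinical findings that replicate
baseline_success = 0.50  # hypothetical success rate for trials on solid findings

# Overall trial success: a trial succeeds only if its foundation is reliable
# AND it clears the baseline hurdles.
overall_success = reliable * baseline_success

# What share of trial failures trace back to unreliable preclinical evidence?
share_from_bad_foundations = (1 - reliable) / (1 - overall_success)

print(f"Overall trial success rate: {overall_success:.1%}")
print(f"Failures traceable to unreliable foundations: {share_from_bad_foundations:.0%}")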
Discussion Questions
- Should pharmaceutical companies be required to replicate preclinical findings before investing in clinical trials? What would this cost? What would it save?
- Compare the structural incentives in preclinical research with those in psychology. What similarities and differences do you see?
- The original author admitted they couldn't replicate their own finding. Is this admission admirable (honest) or concerning (why publish it, then?)? What structural forces drove the publication?
- How should the scientific community handle the finding that 89% of "landmark" studies don't replicate? What reforms would you prioritize?
References
- Begley, C. G. & Ellis, L. M. (2012). "Drug development: Raise standards for preclinical cancer research." Nature, 483, 531–533. (Tier 1)
- Prinz, F., Schlange, T., & Asadullah, K. (2011). "Believe it or not: how much can we rely on published data on potential drug targets?" Nature Reviews Drug Discovery, 10, 712. (Tier 1 — the Bayer replication study)
- Freedman, L. P., Cockburn, I. M., & Simcoe, T. S. (2015). "The Economics of Reproducibility in Preclinical Research." PLoS Biology, 13(6), e1002165. (Tier 1 — estimated economic cost of irreproducible preclinical research at $28 billion annually in the US alone)