Appendix A: The Evidence Evaluation Framework

The 9-Step Fact-Checker's Toolkit — a standalone reference card from Chapter 4.


Quick Reference

Step Question What to Look For
1 What is the specific claim? Pin it down. Vague claims can't be evaluated. Turn "X affects Y" into a testable statement.
2 What is the original source? A peer-reviewed study? A book? A theory? Folk wisdom with no identifiable source?
3 Single study or meta-analysis? Meta-analyses > large replications > single studies > anecdotes. See the hierarchy of evidence below.
4 What was the sample? Size (N < 50 = cautious; N > 200 = more reliable). Demographics (WEIRD: Western, Educated, Industrialized, Rich, Democratic?). Generalizability.
5 Has it been replicated? Replicated = stronger. Failed = weaker. Never tested = unknown. The most important question since the replication crisis.
6 What is the effect size? Significant ≠ large. Cohen's d: 0.2 = small, 0.5 = medium, 0.8 = large. Correlation r: 0.1 = small, 0.3 = medium, 0.5 = large. See the sketch after this table.
7 What do other experts say? Consensus or controversy? Is this one researcher's claim or the field's position?
8 Who benefits? Financial, professional, or ideological incentives. Profit ≠ falsehood, but warrants scrutiny.
9 Too good to be true? Universal claims, single-cause explanations, dramatic promises, no caveats = be suspicious.
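
If you want to check Step 6 yourself, the benchmarks in the table are easy to turn into a quick calculation. The sketch below is illustrative only: it assumes Python, the function names are invented for this example, and Cohen's cutoffs are rules of thumb rather than hard boundaries.

  import math

  def cohens_d(mean1, mean2, sd1, sd2, n1, n2):
      """Standardized mean difference between two groups (Cohen's d)."""
      pooled_sd = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
      return (mean1 - mean2) / pooled_sd

  def label_effect(d=None, r=None):
      """Apply the conventional benchmarks from Step 6 (rules of thumb, not hard cutoffs)."""
      value, cuts = (abs(d), (0.2, 0.5, 0.8)) if d is not None else (abs(r), (0.1, 0.3, 0.5))
      if value < cuts[0]:
          return "negligible"
      if value < cuts[1]:
          return "small"
      if value < cuts[2]:
          return "medium"
      return "large"

  # A "statistically significant" result can still be a small effect:
  print(label_effect(d=cohens_d(102.0, 100.0, 10.0, 10.0, 500, 500)))  # d = 0.2, prints "small"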

The Hierarchy of Evidence

From strongest to weakest:

  1. Meta-analyses and systematic reviews (with publication bias corrections)
  2. Large pre-registered replications (e.g., Many Labs projects)
  3. Multiple independent replications (different labs, different samples)
  4. A single large, well-designed study (pre-registered, adequate sample)
  5. A single small study (the typical published study before the replication crisis)
  6. No empirical evidence (theory, clinical observation, or folk wisdom)
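
For readers who keep fact-checking notes in code, the hierarchy can be encoded as an ordered scale. A minimal sketch, assuming Python; the level names paraphrase the list above rather than quote it.

  from enum import IntEnum

  class EvidenceLevel(IntEnum):
      # Higher value = stronger evidence, mirroring the hierarchy above.
      NO_EMPIRICAL_EVIDENCE = 1      # theory, clinical observation, folk wisdom
      SINGLE_SMALL_STUDY = 2         # the typical published study before the replication crisis
      SINGLE_LARGE_STUDY = 3         # pre-registered, adequate sample
      INDEPENDENT_REPLICATIONS = 4   # different labs, different samples
      PREREGISTERED_REPLICATION = 5  # e.g., Many Labs style projects
      META_ANALYSIS = 6              # with publication bias corrections

  # The ordering lets levels be compared directly:
  assert EvidenceLevel.META_ANALYSIS > EvidenceLevel.SINGLE_SMALL_STUDY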

The Four Evidence Ratings

Rating Meaning When to Apply
✅ SUPPORTED Research backs this up, with caveats Meta-analyses support the claim; effect has been replicated; experts largely agree
⚠️ OVERSIMPLIFIED Kernel of truth, distorted in popular version The underlying phenomenon is real but the pop version has been exaggerated, stripped of context, or applied beyond its evidence base
❌ DEBUNKED Research doesn't support this despite popularity Failed replications, no supporting evidence, or contradicted by the available data
🔬 UNRESOLVED Science is genuinely uncertain Honest scholars disagree; evidence is mixed; the question hasn't been adequately tested

Common Pitfalls

Pitfall Description How to Avoid
Appealing to popularity "Millions of people believe this" Popularity ≠ validity. Apply the toolkit regardless of how many people endorse the claim.
Appealing to authority "A famous psychologist said this" One researcher's claim ≠ scientific consensus. Check Steps 5 and 7.
Confusing correlation with causation "X and Y are correlated, therefore X causes Y" Correlation establishes association, not causation. Consider reverse causation and confounders.
Base rate neglect "This description fits me perfectly" Most Barnum-style descriptions fit most people. Ask: how many people would this apply to? See the worked numbers after this table.
The file drawer problem "There are hundreds of studies supporting this" Published literature overrepresents positive findings. Check for publication bias corrections.
Neuroscience window dressing "This activates your dopamine/amygdala/prefrontal cortex" Brain language adds apparent credibility without adding explanation. Apply the "replace with brain stuff" test.
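
The base rate point is easiest to see with numbers. The short calculation below uses made-up figures, chosen only to show the arithmetic: if a description fits nearly everyone, the fact that it fits you is weak evidence that it captures anything specific about you.

  # Hypothetical numbers, for illustration only (not figures from the book).
  # Suppose 5% of people have some specific trait, a "personality reading"
  # fits 95% of people who have it, and it also fits 90% of people who don't.
  p_trait = 0.05
  p_fits_given_trait = 0.95
  p_fits_given_no_trait = 0.90

  p_fits = p_fits_given_trait * p_trait + p_fits_given_no_trait * (1 - p_trait)
  p_trait_given_fits = p_fits_given_trait * p_trait / p_fits

  print(f"{p_trait_given_fits:.2f}")  # about 0.05: a good "fit" barely moves you off the base rate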

The "Replace With Brain Stuff" Test

If you encounter a claim that invokes neuroscience language, replace the specific brain term with "brain stuff":

  • "Your phone is addictive because of dopamine" → "Your phone is addictive because of brain stuff"
  • "Meditation changes your amygdala" → "Meditation changes your brain stuff"

If the explanatory power is the same with "brain stuff" as with the specific term, the neuroscience language is adding credibility, not explanation.
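
The test is mechanical enough to automate. A minimal sketch, assuming Python; the list of brain terms is a small illustrative sample, not an exhaustive catalogue.

  import re

  # An illustrative (not exhaustive) sample of neuroscience terms that often
  # appear as window dressing in pop-psychology claims.
  BRAIN_TERMS = [
      "dopamine", "serotonin", "oxytocin", "cortisol",
      "amygdala", "prefrontal cortex", "hippocampus", "neuroplasticity",
  ]

  def replace_with_brain_stuff(claim: str) -> str:
      """Apply the 'replace with brain stuff' test to a claim."""
      pattern = re.compile("|".join(re.escape(t) for t in BRAIN_TERMS), re.IGNORECASE)
      return pattern.sub("brain stuff", claim)

  print(replace_with_brain_stuff("Your phone is addictive because of dopamine"))
  # Prints: "Your phone is addictive because of brain stuff"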


Applying the Toolkit: A Worked Example Template

Claim: [State the claim specifically]

Step Finding
1. Specific claim [Restate precisely]
2. Original source [Study / book / folk wisdom]
3. Study type [Meta-analysis / single study / no study]
4. Sample [Size, demographics, WEIRD?]
5. Replicated? [Yes / No / Never tested]
6. Effect size [d = ? / r = ? / plain language]
7. Expert consensus [Consensus / Controversy / Unknown]
8. Who benefits? [Financial / professional / ideological]
9. Too good to be true? [Universal? Single-cause? No caveats?]

Rating: [✅ / ⚠️ / ❌ / 🔬]

Justification: [2–3 sentences]
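
If you use the template often, it can live in code as easily as on paper. A minimal sketch, assuming Python; the field names simply mirror the nine steps and four ratings above and are not an interface defined in the book.

  from dataclasses import dataclass

  RATINGS = ("SUPPORTED", "OVERSIMPLIFIED", "DEBUNKED", "UNRESOLVED")

  @dataclass
  class ClaimEvaluation:
      # One free-text field per step of the toolkit, matching the template above.
      claim: str                      # 1. the specific, testable claim
      original_source: str = ""       # 2. study / book / folk wisdom
      study_type: str = ""            # 3. meta-analysis / single study / no study
      sample: str = ""                # 4. size, demographics, WEIRD?
      replicated: str = ""            # 5. yes / no / never tested
      effect_size: str = ""           # 6. d = ? / r = ? / plain language
      expert_consensus: str = ""      # 7. consensus / controversy / unknown
      who_benefits: str = ""          # 8. financial / professional / ideological
      too_good_to_be_true: str = ""   # 9. universal? single-cause? no caveats?
      rating: str = ""                # one of RATINGS
      justification: str = ""         # 2-3 sentences

      def __post_init__(self):
          if self.rating and self.rating not in RATINGS:
              raise ValueError(f"rating must be one of {RATINGS}")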


This appendix is designed to be photocopied, bookmarked, or saved as a permanent reference. The toolkit works on any psychology claim you encounter — in books, on social media, in corporate training, or in conversation. It improves with practice.