Chapter 4: How to Evaluate a Psychology Claim — The Fact-Checker's Toolkit
In This Chapter
This is the most important chapter in the book.
Everything that came before — the Barnum effect, the mutation pipeline, the replication crisis — was context. Everything that comes after — 36 chapters of specific claims evaluated against the evidence — uses the tool this chapter teaches you.
The tool is a 9-step framework for evaluating any psychology claim you encounter. It works on social media posts, self-help books, corporate training programs, parenting advice, therapy trends, and dinner party assertions. It works on the claims in this book too — nothing we say is exempt from scrutiny.
The framework is not a formula that spits out a yes or no answer. It is a set of questions that, when asked consistently, dramatically improve your ability to distinguish well-supported claims from oversimplified, debunked, or unresolved ones. With practice, these questions become automatic — a reflex that activates every time someone says "studies show" or "psychologists have proven."
Let's build the toolkit.
Before You Read: Confidence Check
Rate your confidence (1–10) that each statement is true.
- "If a claim cites a study, it's probably well-supported." ___
- "Meta-analyses are more reliable than individual studies." ___
- "A statistically significant result means the effect is large and important." ___
- "Western research findings apply to people everywhere." ___
- "I could evaluate a psychology claim myself if I had the right framework." ___
The 9-Step Fact-Checker's Toolkit
Step 1: What Is the Specific Claim?
The first and most important step is pinning down what is actually being claimed. Popular psychology claims are often maddeningly vague, and vagueness is the enemy of evaluation.
"Attachment styles affect relationships" — what does this mean? That people with secure attachment have better relationships? That knowing your attachment style improves your relationship? That your attachment style is fixed? That online quizzes accurately measure your style? These are five different claims with five different answers.
The move: Restate the claim as specifically as possible. Turn "X affects Y" into "doing/having X causes/correlates with a specific change in Y of a specific magnitude, as measured by Z." If the claim can't be restated specifically, it may be unfalsifiable — and unfalsifiable claims are not scientific claims.
Example: "Growth mindset helps students" → "Explicitly teaching students that intelligence is malleable (rather than fixed) causes measurable improvements in academic performance as measured by standardized test scores."
The specific version is testable. The vague version is not.
Step 2: What Is the Original Source?
Many psychology claims circulate without any identifiable source. "Studies show that..." — which studies? "Research has found that..." — whose research? "Psychologists agree that..." — which psychologists?
The move: Trace the claim to its origin. Is there an actual study? Is there a specific researcher? Or is the claim "common knowledge" with no identifiable source?
Three outcomes are possible:
- There is a specific study. Good — you can evaluate it. Proceed to Steps 3–9.
- The claim comes from a book or a theory, not a study. Note this. Books by psychologists (or about psychology) are not the same as published research. A claim in a self-help book may not have been tested empirically.
- There is no identifiable source. The claim is folk psychology — widely believed but never empirically tested. This doesn't mean it's wrong, but it means there's no evidence to evaluate.
Example: "We only use 10% of our brains" has no original source — it's a myth with no identifiable study behind it. "Growth mindset improves learning" traces to Carol Dweck's research program at Stanford. These are very different starting points.
Step 3: Was It a Single Study or a Meta-Analysis?
A single study is one experiment or observational study. A meta-analysis is a statistical synthesis of many studies on the same question. They carry very different evidentiary weight.
A single study, no matter how well-designed, can produce a false positive: when no real effect exists, roughly 5% of studies will still clear the p < .05 threshold by chance alone — and more than that if p-hacking occurred. A meta-analysis that combines 50 studies on the same question is far more reliable because random error averages out across studies.
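To see why, here is a minimal simulation — a sketch assuming numpy and scipy are installed, with simple averaging standing in for a formal meta-analysis — of 1,000 small studies of an effect that does not exist:

```python
# 1,000 hypothetical two-group studies where the true effect is zero.
# Individually, ~5% come out "significant" by chance; pooled, the
# estimated effect sits right around zero.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_studies, n_per_group = 1000, 30

false_positives = 0
effects = []
for _ in range(n_studies):
    control = rng.normal(0, 1, n_per_group)
    treatment = rng.normal(0, 1, n_per_group)  # same distribution: no real effect
    _, p = stats.ttest_ind(treatment, control)
    false_positives += p < 0.05
    effects.append(treatment.mean() - control.mean())

print(f"Single studies 'significant' by chance: {false_positives / n_studies:.1%}")
print(f"Mean effect pooled across all studies:  {np.mean(effects):+.3f}")
```

Any one of those "significant" studies could become a headline; the pooled estimate quietly says there is nothing there.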
The move: If the claim is supported by a single study, treat it as preliminary. If it's supported by a meta-analysis, check whether the meta-analysis accounts for publication bias (many don't). If it's supported by multiple independent replications, your confidence can be higher.
The hierarchy of evidence (from strongest to weakest):
1. Meta-analyses and systematic reviews (with publication bias corrections)
2. Large pre-registered replications
3. Multiple independent replications
4. A single large, well-designed study
5. A single small study
6. No empirical evidence (theory or common knowledge only)
Step 4: What Was the Sample?
The most common sampling problem in psychology is the WEIRD problem: the vast majority of research participants are from Western, Educated, Industrialized, Rich, and Democratic societies. Henrich, Heine, and Norenzayan (2010) found that 96% of psychology participants came from only 12% of the world's population.
The move: Ask three questions about the sample:
- How large was it? Smaller samples produce less reliable results (see the winner's curse in Chapter 3). As a rough guide: N < 50 per condition = be cautious; N > 200 per condition = more reliable.
- Who was in it? College students at a Western university? Clinical patients? Children? Online participants? The sample determines who the results apply to.
- Is the population being generalized to much broader than the sample studied? A finding from 50 American college students may not apply to 40-year-olds in Kenya.
Example: The marshmallow test's original sample was fewer than 50 children of Stanford faculty. The claim that was made from it — "self-control predicts success for all children" — was a dramatic overgeneralization.
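A small simulation makes the sample-size point concrete. This sketch (assuming numpy is available, with a hypothetical population correlation of r = .30) shows how much more individual study estimates bounce around at small N:

```python
# Repeatedly "run a study" estimating a true correlation of r = .30,
# at a small and a larger sample size, and compare the spread of results.
import numpy as np

rng = np.random.default_rng(0)
true_r = 0.30  # hypothetical population correlation

def sample_r(n: int) -> float:
    """Estimate r from one simulated study of n participants."""
    cov = [[1.0, true_r], [true_r, 1.0]]
    x, y = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
    return float(np.corrcoef(x, y)[0, 1])

for n in (25, 200):
    estimates = [sample_r(n) for _ in range(2000)]
    lo, hi = np.percentile(estimates, [5, 95])
    print(f"N={n:>3}: 90% of study estimates fall between {lo:+.2f} and {hi:+.2f}")
```

At N = 25, a "study" of the same underlying effect can land anywhere from roughly zero to a large correlation; at N = 200, the estimates cluster much more tightly around the truth.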
Step 5: Has It Been Replicated?
After the replication crisis (Chapter 3), this is perhaps the single most important question you can ask.
The move: Search for the finding plus "replication" or "meta-analysis" on Google Scholar. Three outcomes:
- Replicated successfully (especially in large, pre-registered studies): Confidence can be high.
- Failed to replicate: Confidence should be low, regardless of how famous the original finding is.
- Never tested for replication: This is common for many claims. Treat the finding as preliminary.
Example: Ego depletion was "supported" by hundreds of studies but failed a large pre-registered replication. Power posing's hormonal effects failed multiple replications. The Big Five personality traits have been replicated extensively across cultures and decades. Replication status separates the solid from the shaky.
Step 6: What Is the Effect Size?
Statistical significance (p < .05) tells you that a result is unlikely to be due to chance. It does not tell you that the result is large, important, or practically meaningful.
A study with 10,000 participants can find a statistically significant difference between two groups even if that difference is trivially small — a fraction of a percent. Statistical significance is about detection, not magnitude.
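A toy demonstration, assuming numpy and scipy and an invented effect of one-twentieth of a standard deviation, shows detection and magnitude coming apart at large samples:

```python
# With 10,000 people per group, even a trivially small true effect
# (d = 0.05) usually crosses the p < .05 threshold.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n = 10_000  # per group
control = rng.normal(0.00, 1, n)
treatment = rng.normal(0.05, 1, n)  # true effect: 1/20 of a standard deviation

t_stat, p = stats.ttest_ind(treatment, control)
pooled_sd = np.sqrt((control.var(ddof=1) + treatment.var(ddof=1)) / 2)
d = (treatment.mean() - control.mean()) / pooled_sd

print(f"p = {p:.4f}  (usually < .05 at this sample size)")
print(f"Cohen's d = {d:.2f}  (far below even the 'small' benchmark of 0.2)")
```

The p-value says "this difference is probably not chance"; only the effect size says whether anyone should care.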
The move: Look for the effect size — the magnitude of the finding. In psychology, common effect size measures include:
- Cohen's d: The difference between groups in standard deviation units. By convention: d = 0.2 is small, d = 0.5 is medium, d = 0.8 is large. Most psychology effects are small to medium.
- Correlation (r): The strength of association between variables. r = 0.1 is small, r = 0.3 is medium, r = 0.5 is large.
- Percentage or odds ratio: Expressed in plain language: "people who did X were 12% more likely to do Y."
For this book, we translate effect sizes into plain language: "people in the treatment group scored about 5% higher" or "the correlation between X and Y is modest — knowing X tells you a little about Y, but not a lot."
Example: Many popular psychology claims are based on real but small effects. The correlation between social media use and depression in teens is approximately r = 0.10–0.15 — real, but about the same as the correlation between wearing glasses and depression. Knowing this changes how you interpret "social media causes depression."
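One quick calibration for correlations like these: squaring r gives the approximate share of variance the two variables share. A few lines of plain Python (illustrative arithmetic only, not tied to any particular dataset) make the point:

```python
# Squaring a correlation gives the share of variance one variable
# "explains" in the other — a quick sanity check on small effects.
for r in (0.10, 0.15, 0.30, 0.50):
    print(f"r = {r:.2f} -> about {r * r:.1%} of variance explained")
```

An r of 0.10 corresponds to about 1% of variance explained — real, but modest, exactly as Step 6 warns.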
Step 7: What Do Other Experts Say?
One study (or one researcher) is a data point, not a consensus. Science works through the accumulation of evidence from many independent sources.
The move: Look for:
- Expert commentary or response articles published alongside or after the original study
- Review articles that synthesize a body of research
- Whether the finding is controversial within the field — not all findings generate consensus
Be especially cautious when a single researcher makes a big claim and the rest of the field is skeptical. Amy Cuddy's power posing, Philip Zimbardo's Stanford Prison Experiment, and some of Carol Dweck's growth mindset claims all involve situations where the original researcher's public messaging outpaced the field's consensus.
Example: Jonathan Haidt's claim that social media is the primary cause of a teen mental health crisis is hotly contested by other researchers (notably Amy Orben and Andrew Przybylski), who argue the evidence is much weaker than Haidt suggests. The scientific debate is genuine — both sides have data. The popular version usually presents Haidt's position as established fact.
Step 8: Who Benefits from This Claim Being True?
This is not a cynicism step — it's a context step. Understanding the incentive structure around a claim helps you evaluate it more clearly.
The move: Ask: who profits (financially, professionally, or ideologically) from this claim being believed?
- The self-help industry benefits from claims that personal transformation is simple and achievable (growth mindset, manifesting, habit formation).
- The corporate training industry benefits from claims that personality tests predict performance and that resilience can be trained.
- Therapy app companies benefit from claims that therapy is essential for everyone and that mental health conditions are epidemic.
- Researchers benefit from claims that generate citations, media coverage, and grant funding.
- Media outlets benefit from claims that are dramatic, surprising, and shareable.
This doesn't mean every profitable claim is false. Exercise is enormously profitable for the fitness industry AND genuinely supported by evidence. But when a claim is profitable, apply extra scrutiny.
Example: Myers-Briggs generates $2–4 billion annually. When an industry worth billions depends on a framework being perceived as valid, the incentive to defend it is powerful — regardless of the evidence.
Step 9: Does It Survive the "Too Good to Be True" Test?
The most viral psychology claims tend to be simple, dramatic, and promise more than the research delivers. Real findings are usually messier, more conditional, and less tweetable.
The move: Apply the TGTBT test. If a claim sounds like it would make a perfect headline, be suspicious. Real psychology findings usually include phrases like "in some conditions," "for some people," "with a small to moderate effect," and "more research is needed."
Red flags:
- Universal claims: "Everyone is..." "Humans always..." "All children who..."
- Single-cause explanations: "X causes Y" (most outcomes have multiple causes)
- Permanent, fixed categories: "You are a [type] and always will be"
- Dramatic promises: "This one trick..." "The secret to success is..."
- No caveats or limitations mentioned
Example: "Your attachment style, determined in infancy, shapes every relationship you'll ever have" is too clean. The real story is: attachment patterns exist, they're partially stable, they're also context-dependent, they can change, and they explain some (not all) variance in relationship outcomes. The nuanced version is more accurate and, honestly, more interesting.
The Toolkit in Practice: A Worked Example
Let's apply all nine steps to a claim you've probably encountered:
Claim: "People are either left-brained (logical) or right-brained (creative)."
| Step | Question | Answer |
|---|---|---|
| 1 | What is the specific claim? | That people have a dominant brain hemisphere that determines whether they are logical or creative |
| 2 | Original source? | Loose misinterpretation of Roger Sperry's 1960s split-brain research (later awarded a Nobel Prize) — but Sperry never claimed hemispheric dominance creates personality types |
| 3 | Single study or meta-analysis? | No study has ever supported the claim that individuals are "left-brained" or "right-brained." A 2013 fMRI study (Nielsen et al.) with 1,011 participants found no evidence of hemispheric dominance |
| 4 | What was the sample? | Nielsen et al.: large, diverse sample using brain imaging — strong evidence |
| 5 | Replicated? | The absence of hemispheric dominance has been consistently found across brain imaging studies |
| 6 | Effect size? | No meaningful difference in hemispheric lateralization between "logical" and "creative" people |
| 7 | Expert consensus? | Neuroscientists overwhelmingly reject the left-brain/right-brain model as applied to personality |
| 8 | Who benefits? | The learning styles industry, personality quiz industry, and self-help authors who use brain-type frameworks |
| 9 | Too good to be true? | A clean binary (logical vs. creative) that explains personality? Fits the TGTBT pattern perfectly |
Verdict: ❌ DEBUNKED. The left-brain/right-brain model of personality has no empirical support.
When the Toolkit Says "It's Complicated"
Not every claim will resolve into a clean verdict. Some claims are genuinely contested — the evidence is mixed, experts disagree, and the honest answer is "we don't know yet." This is the 🔬 UNRESOLVED rating, and it's just as important as the other three.
The toolkit helps you distinguish between three types of uncertainty:
- The evidence is weak but trending in one direction — lean cautiously toward that direction
- The evidence is genuinely mixed — honest scholars disagree, and both sides have data
- The evidence doesn't exist — the claim has never been properly tested
Popular psychology hates uncertainty. Social media posts don't say "the evidence is mixed." Self-help books don't say "we're not sure." But science is uncertain about many things, and pretending otherwise is its own form of distortion.
The ability to sit with uncertainty — to say "I don't know yet" — is one of the most important skills this book can teach you. It is more honest and more useful than premature certainty in either direction.
The Toolkit Applied to This Book
This book itself is subject to the toolkit. We are simplifying research for a popular audience, which means we are participating in the mutation pipeline. We try to do so responsibly — citing sources, acknowledging uncertainty, presenting caveats — but we are not exempt from the pressures that make popular science oversimplified.
Here is what we commit to:
- Every evidence rating in this book is supported by citations to peer-reviewed research
- When the evidence is uncertain, we say so
- When we simplify, we note the simplification
- We encourage you to check our sources using the same toolkit we're teaching you
If this book is doing its job, you should finish it with the ability to evaluate not just popular psychology claims but also the claims in this book. That's the goal.
Verdict: "I could evaluate a psychology claim myself if I had the right framework" ✅ SUPPORTED — Research on critical thinking education consistently shows that teaching explicit evaluation frameworks improves people's ability to assess evidence-based claims. The 9-step framework in this chapter is designed to be learnable, memorable, and applicable to any psychology claim. The Fact-Check Portfolio project provides structured practice. The skill is genuine and developable. Evidence base: Critical thinking instruction meta-analyses (Abrami et al., 2015) show moderate positive effects on reasoning quality. The framework itself draws on established science communication and evidence evaluation methods.
Quick Reference: The 9-Step Toolkit
For easy reference, here is the complete toolkit on one page. You may want to bookmark this or copy it somewhere accessible — you'll use it for the rest of the book and, more importantly, for the rest of your life.
| Step | Question | What to Look For |
|---|---|---|
| 1 | What is the specific claim? | Pin it down. Vague claims can't be evaluated. |
| 2 | What is the original source? | A study? A book? A theory? Folk wisdom? |
| 3 | Single study or meta-analysis? | Meta-analyses > replications > single studies |
| 4 | What was the sample? | Size, demographics, WEIRD problem |
| 5 | Has it been replicated? | Replicated = stronger. Failed = weaker. Never tested = unknown. |
| 6 | What is the effect size? | Significant ≠ large. Small effects can be real but modest. |
| 7 | What do other experts say? | Is there consensus or controversy? |
| 8 | Who benefits? | Financial, professional, or ideological incentives |
| 9 | Too good to be true? | Simple, dramatic, universal claims → be suspicious |
Fact-Check Portfolio: Chapter 4
Choose two of your 10 selected claims. Apply the full 9-step toolkit to each one. Write up your analysis — it doesn't need to be long, but it should address each step.
You probably won't be able to answer every step fully (you might not find the original study, or you might not know what other experts think). That's fine — note what you can find and what you can't. The gaps in your analysis are themselves informative.
You will apply the toolkit to additional claims in subsequent chapters as you encounter relevant evidence. By Chapter 40, all 10 claims should have a full analysis.
After Reading: Confidence Revisited
Revisit your confidence ratings from the start of this chapter.
- "If a claim cites a study, it's probably well-supported." — Does citing a study guarantee the study is replicated, well-powered, or representative?
- "Meta-analyses are more reliable than individual studies." — Yes, but with what caveat about publication bias?
- "A statistically significant result means the effect is large and important." — What is the difference between significance and effect size?
- "Western research findings apply to people everywhere." — What is the WEIRD problem?
- "I could evaluate a psychology claim myself if I had the right framework." — Has your confidence in this changed? You now have the framework.