Case Study: Scoring a Current Controversy — What the Scorecard Says About Claims You're Evaluating Right Now

The Purpose

The previous case study applied the scorecard retrospectively — to a claim that has already been resolved. That's useful for calibration but doesn't demonstrate the tool's practical value. The scorecard's real purpose is to evaluate claims before you know whether they're right or wrong.

This case study provides a template for applying the scorecard to current controversies. Rather than scoring a single claim (which would inevitably become dated as the book ages), it provides a structured methodology for applying the scorecard to whatever claim is currently under debate in your field.

The Method

Step 1: Select the Claim

Choose a claim that is currently accepted as consensus in your field but that you have reason to question — or a claim that is currently controversial, with intelligent people on both sides.

Good candidates include:

- A practice that "everyone knows works" but that you haven't seen rigorously validated
- A claim where the supporting evidence is older than you expected when you looked it up
- A recommendation that seems to benefit the people making it more than the people receiving it
- A consensus position that outsiders from adjacent fields dispute

Step 2: Score the 15 Questions

For each question, assign a score (🟢🟡🔴) and document your reasoning. Be honest — the tool only works if you score accurately, not if you score to confirm your existing suspicion.

Common scoring pitfalls:

- Confirmation bias in scoring. If you already suspect the claim is wrong, you'll tend to score everything red. If you believe the claim is right, you'll tend to score everything green. Counteract this by asking: "What would I score this if I had no opinion about the claim?"
- Insufficient investigation. Many questions require research (Who funded this? Has it been replicated? What happened to dissenters?). Don't guess — look it up. The scorecard's value depends on accurate scoring.
- Conflating "I don't know" with a red flag. Uncertainty (🟡) is not the same as a red flag (🔴). If you can't determine the funding source, that's yellow (unknown), not red (known conflict).

Step 3: Interpret the Pattern

Count the red flags and consult the interpretation guide. But also look at the pattern — which clusters of questions score red?

Cluster 1 (Q1, Q4, Q9): The Incentive Cluster. If these three questions all score red — the claim is funded by interested parties, benefits powerful actors, and the effect size is small or unreported — the structural conditions for incentive-driven error are strong.

Cluster 2 (Q2, Q8, Q10): The Evidence Cluster. If these three score red — not independently replicated, evidence from a single source, not tested outside the lab — the evidence base is structurally weak.

Cluster 3 (Q3, Q12): The Falsifiability Cluster. If both score red — no falsification criteria, no prediction track record — the claim may be structured to be immune to disconfirmation.

Cluster 4 (Q7, Q14): The Suppression Cluster. If both score red — dissenters are punished, outsiders are dismissed — the consensus is being maintained by institutional power rather than by evidence.

Cluster 5 (Q5, Q13): The History Cluster. If both score red — old evidence, sanitized history — the claim may be sustained by institutional inertia rather than ongoing evaluation.

Step 4: Decide What to Do

The scorecard does not tell you whether the claim is right or wrong. It tells you where to look next.

  • High-scoring claims (6+ red flags): Warrant deep investigation. Read the original evidence. Look for independent replications. Talk to people who disagree. Investigate the funding and incentive structure. This is not paranoia — it is the structural due diligence that prevents you from being trapped by the same failure modes that have trapped every field in this book.

  • Medium-scoring claims (3-5 red flags): Warrant specific investigation of the red-flagged dimensions. You don't need to redo the entire evidence base — just check the weak points.

  • Low-scoring claims (0-2 red flags): Probably safe to accept provisionally, with the understanding that the scorecard is a screening tool and not a guarantee.
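The counting and cluster logic of Steps 3 and 4 is mechanical enough to sketch in code. The following is an illustrative helper, not part of the scorecard itself: the question numbers and cluster groupings come from the text above, the thresholds (6+, 3-5, 0-2) from the interpretation guide, and all function and variable names are invented for this sketch.

```python
# Sketch of the scorecard's flag-counting and cluster analysis.
# Each of the 15 questions is scored "red", "yellow", or "green".

# Cluster definitions from Step 3 of the text.
CLUSTERS = {
    "Incentive":      [1, 4, 9],
    "Evidence":       [2, 8, 10],
    "Falsifiability": [3, 12],
    "Suppression":    [7, 14],
    "History":        [5, 13],
}

def interpret(scores: dict[int, str]) -> dict:
    """Count flags, apply the interpretation thresholds, and
    report which clusters are fully red."""
    reds = sum(1 for s in scores.values() if s == "red")
    yellows = sum(1 for s in scores.values() if s == "yellow")
    greens = sum(1 for s in scores.values() if s == "green")

    if reds >= 6:
        verdict = "high: warrants deep investigation"
    elif reds >= 3:
        verdict = "medium: investigate the red-flagged dimensions"
    else:
        verdict = "low: provisionally acceptable (screening only)"

    # A cluster fires only when every question in it scores red.
    red_clusters = [
        name for name, questions in CLUSTERS.items()
        if all(scores.get(q) == "red" for q in questions)
    ]
    return {"red": reds, "yellow": yellows, "green": greens,
            "verdict": verdict, "red_clusters": red_clusters}
```

For example, a claim scored red on Q1, Q4, and Q9 and green elsewhere lands in the medium band with the Incentive cluster fully red — exactly the pattern Step 3 flags as structurally incentive-driven.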

Worked Template

Copy and complete the following for any claim you want to evaluate:

CLAIM: [State the claim precisely]
FIELD: [The field or domain]
DATE OF ASSESSMENT: [Today's date]

| # | Question | Score | Evidence/Reasoning |
|---|---|---|---|
| 1 | Who funded this? | | |
| 2 | Independently replicated? | | |
| 3 | What would disprove this? | | |
| 4 | Who benefits? | | |
| 5 | How old is the core evidence? | | |
| 6 | Precision or accuracy? | | |
| 7 | What happens to dissenters? | | |
| 8 | Independent sources? | | |
| 9 | Effect size meaningful? | | |
| 10 | Works outside the lab? | | |
| 11 | Simpler explanation? | | |
| 12 | Prediction track record? | | |
| 13 | Field history? | | |
| 14 | What do outsiders say? | | |
| 15 | How would we know if wrong? | | |

RED FLAGS: [count]
YELLOW FLAGS: [count]
GREEN FLAGS: [count]

CLUSTER ANALYSIS:
- Incentive cluster (Q1, Q4, Q9): [pattern]
- Evidence cluster (Q2, Q8, Q10): [pattern]
- Falsifiability cluster (Q3, Q12): [pattern]
- Suppression cluster (Q7, Q14): [pattern]
- History cluster (Q5, Q13): [pattern]

ASSESSMENT: [Your overall interpretation]
NEXT STEPS: [What you need to investigate further]

Analysis Questions

1. Complete the template above for a claim in your field. Share and discuss with colleagues or classmates. Did they score the same questions differently? If so, what does that tell you about the subjectivity of the tool?

2. The scorecard inevitably reflects the user's prior knowledge and biases. How would you design a process for scoring that minimizes this? Could the scorecard be applied by a team rather than an individual? What would that process look like?

3. Identify a claim that you believe is correct but that might score several red flags (e.g., a claim in a field with weak research infrastructure, or a claim made by a small research community). What does this tell you about the limitations of structural screening? How should you handle a high red flag score on a claim you believe is sound?