Quiz: Red Flags
Q1. The Red Flag Scorecard is designed to:
(a) Prove that claims are wrong (b) Assess structural conditions that increase the probability of error — not prove wrongness, but identify where skepticism is warranted (c) Replace scientific evidence (d) Evaluate individual researchers' competence
Answer
**(b)** The scorecard detects structural risk, not truth value. No scorecard can prove a claim wrong; only evidence can do that. The scorecard identifies the conditions that sustain wrong consensuses, helping the reader focus their skepticism.

Q2. Question 3 ("What would disprove this?") detects which failure mode?
(a) Survivorship bias (b) Unfalsifiability (Chapter 3) (c) Sunk cost (d) The streetlight effect
Answer
**(b)** Unfalsifiability. A claim whose proponents cannot specify what evidence would change their mind is structurally protected from correction.

Q3. Question 7 ("What happens to people who disagree?") is important because:
(a) Disagreement is always correct (b) The treatment of dissenters signals whether the consensus is defended by evidence (engagement) or by institutional power (suppression) (c) All dissenters are heroes (d) Consensus is always wrong
Answer
**(b)** In a healthy field, dissent is engaged on its merits. In a field protecting a wrong consensus, dissent is punished through career penalties, funding denial, or ridicule. The treatment of dissenters reveals the quality of the consensus.

Q4. The dietary fat hypothesis (circa 1990) scored approximately 8 red flags in the worked example. This score falls in the:
(a) "Probably sound" range (0-2 red flags) (b) "Caution warranted" range (3-5 red flags) (c) "Significant concern" range (6-9 red flags) (d) "Deep skepticism warranted" range (10+ red flags)
Answer
**(c)** Eight red flags place the claim well into "significant concern" territory. The claim turned out to be substantially wrong; the scorecard would have correctly identified it as structurally vulnerable.
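The interpretation bands can be sketched as a tiny lookup. This is a minimal illustration, not anything defined in the chapter: the function name and code structure are my own framing, while the band labels and ranges come from the scorecard.

```python
def interpret_red_flags(count: int) -> str:
    """Map a red-flag count to the scorecard's interpretation bands."""
    if count < 0:
        raise ValueError("red-flag count cannot be negative")
    if count <= 2:
        return "Probably sound"
    if count <= 5:
        return "Caution warranted"
    if count <= 9:
        return "Significant concern"
    return "Deep skepticism warranted"

# The dietary fat hypothesis (circa 1990) scored roughly 8 flags:
print(interpret_red_flags(8))  # -> Significant concern
```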
Q5. Question 9 ("Is the effect size meaningful?") addresses the problem of:
(a) Statistical significance without practical importance: effects that are real but too small to matter (b) Studies being too expensive (c) Researchers being biased (d) Sample sizes being too large
Answer
**(a)** A finding can be statistically significant (unlikely to be due to chance) while having an effect size too small to matter in practice. Many education and psychology findings have this profile: real effects that are practically meaningless.
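The significance-versus-effect-size distinction is easy to demonstrate numerically. A minimal sketch with invented numbers (assuming numpy and scipy are available; none of the figures come from any study in the chapter): a trivially small true effect becomes "statistically significant" once the sample is large enough, while the standardized effect size stays negligible.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 100_000                           # a very large sample per group
control = rng.normal(100.0, 15.0, n)
treated = rng.normal(100.3, 15.0, n)  # true effect: 0.3 points, with SD 15

t, p = stats.ttest_ind(treated, control)
pooled_sd = np.sqrt((control.var(ddof=1) + treated.var(ddof=1)) / 2)
cohens_d = (treated.mean() - control.mean()) / pooled_sd

print(f"p = {p:.1e}")         # far below 0.05: "statistically significant"
print(f"d = {cohens_d:.3f}")  # ~0.02: an order of magnitude below even a "small" effect (0.2)
```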
Q6. A claim scores 🟢 on Questions 1-4 (funding, replication, falsification, beneficiaries) but 🔴 on Question 15 ("How would we know if it's wrong?"). What does this pattern suggest?
(a) The claim is definitely correct (b) The claim may be correct, but the field lacks the capacity to detect if it's wrong; the error could persist undetected (c) The claim is definitely wrong (d) The scorecard is broken
Answer
**(b)** Green scores on the structural integrity questions combined with a red score on error detection suggest a claim that may have been produced honestly but exists in a field that cannot catch its own mistakes. This is the profile of fields like education and criminal justice, where errors can persist for decades before becoming visible.
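The pattern is easier to see if the scorecard is treated as a per-question flag map. A toy sketch (the groupings, names, and green/red encoding below are my own illustration, not a structure defined in the chapter):

```python
GREEN, RED = "green", "red"

# Hypothetical profile: clean on structural integrity, red on error detection.
flags = {1: GREEN, 2: GREEN, 3: GREEN, 4: GREEN, 15: RED}

structural_integrity = [1, 2, 3, 4]  # funding, replication, falsification, beneficiaries
error_detection = [15]               # "How would we know if it's wrong?"

honest_but_uncheckable = (
    all(flags.get(q) == GREEN for q in structural_integrity)
    and any(flags.get(q) == RED for q in error_detection)
)
print(honest_but_uncheckable)  # True: possibly honest, but errors could persist undetected
```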
Q7. Question 8 ("Does the evidence come from one source or many independent sources?") detects:
(a) The plausible story problem (b) Authority cascade and citation amplification: evidence that looks extensive but traces back to a single source (c) The replication problem (d) Incentive structures
Answer
**(b)** An evidence base that appears broad but traces back to a single study, research group, or methodological tradition may reflect authority cascade and citation amplification rather than genuinely independent confirmation. The dietary fat evidence tracing to Keys and the neural network critique tracing to Minsky are paradigmatic examples.
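Tracing "many sources" back to their origins can be pictured as a walk over a citation graph. A toy sketch (every study name here, including keys_1958, is a hypothetical placeholder echoing the chapter's Keys example):

```python
# Each entry lists the sources a claim's evidence rests on;
# an empty list marks a root that cites no further evidence.
evidence_sources = {
    "review_2020": ["study_a", "study_b", "study_c"],
    "study_a": ["keys_1958"],
    "study_b": ["keys_1958"],
    "study_c": ["keys_1958"],
    "keys_1958": [],
}

def roots(claim: str) -> set[str]:
    """Trace a claim back to its independent evidential roots."""
    sources = evidence_sources.get(claim, [])
    if not sources:
        return {claim}
    return set().union(*(roots(s) for s in sources))

# Three apparent confirmations, one evidential root:
print(roots("review_2020"))  # {'keys_1958'}
```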
Q8. Question 13 ("How does the field tell its own history?") detects:
(a) Revision myth (Chapter 20): the tendency to sanitize history and present current positions as the inevitable outcome of rational progress (b) The outsider problem (c) Survivorship bias (d) The sunk cost of consensus
Answer
**(a)** A field that rewrites its history to erase the wrong turns, suppressed ideas, and costly corrections cannot learn from its mistakes, because it has erased the evidence that mistakes were made.

Q9. The scorecard's interpretation guidelines suggest that a score of 3-5 red flags indicates:
(a) The claim is definitely wrong (b) No concern (c) "Caution warranted" — the claim may be correct but there are structural features that increase the probability of error (d) The researcher should give up
Answer
**(c)** A score of 3-5 red flags doesn't prove wrongness, but it indicates structural vulnerability. The recommendation is to investigate the red-flagged dimensions specifically rather than to accept or reject the claim based on the score alone.

Q10. The chapter argues that the 15 questions work because:
(a) They were designed by experts (b) The failure modes are structural and predictable — the same conditions that sustained past errors can be detected around current claims (c) They are statistically validated (d) They replace the need for evidence
Answer
**(b)** The questions work because the failure modes documented in Parts I-III are structural, not accidental. The conditions that sustained the dietary fat hypothesis, neural network suppression, forensic science errors, and military doctrinal lock-in are detectable in current claims through the same diagnostic questions.

Scoring Guide
- 9-10 correct: Excellent. You understand both the tool and its limitations.
- 7-8 correct: Good. Review the distinction between structural risk assessment and truth value.
- 5-6 correct: Fair. Revisit the connection between each question and its underlying failure mode.
- Below 5: Re-read the chapter focusing on why each question is included — what failure mode does it detect?