Quiz: Adversarial Collaboration and Other Tools
Q1. Adversarial collaboration differs from normal scientific debate because:
(a) It eliminates disagreement (b) Disagreeing researchers jointly design a study, agree on methodology and success criteria in advance, and publish together regardless of the outcome, ensuring the design is maximally rigorous from both perspectives (c) It only works in psychology (d) One side always wins
Answer
**(b)** The key innovation is joint design and pre-commitment to publish. This prevents either side from rigging the methodology or dismissing the results, and produces higher-quality evidence than either side would produce alone.

Q2. Pre-registration primarily addresses:
(a) Funding bias (b) Researcher degrees of freedom (p-hacking), by requiring researchers to commit to their hypothesis and analysis plan before seeing the data (c) Peer review quality (d) Publication speed
Answer
**(b)** Pre-registration eliminates the flexibility that allows researchers to adjust their analysis until they find a "significant" result. Effect sizes from pre-registered analyses are smaller (and more accurate) than those from non-pre-registered studies.
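
A small simulation makes the mechanism concrete. The sketch below is illustrative, not from the chapter: the sample sizes, the four analysis variants, and the outlier-trimming rule are all arbitrary assumptions. Even with a true effect of exactly zero, letting the analyst keep the best of several post-hoc analyses pushes the false-positive rate well above the nominal 5%.

```python
# Illustrative sketch (not the chapter's): researcher degrees of freedom
# inflate false positives even when the true effect is exactly zero.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
trials = 10_000
nominal_hits = 0    # single pre-committed test
flexible_hits = 0   # best p-value across several post-hoc analysis choices

for _ in range(trials):
    a = rng.normal(size=40)   # control group, true effect = 0
    b = rng.normal(size=40)   # "treatment" group, true effect = 0
    p_main = stats.ttest_ind(a, b).pvalue
    # Post-hoc flexibility: a second outcome measure, an outlier-trimmed
    # reanalysis, and an early peek at n=20 -- four shots at p < .05.
    p_alt = stats.ttest_ind(rng.normal(size=40), rng.normal(size=40)).pvalue
    p_trim = stats.ttest_ind(a[np.abs(a) < 2], b[np.abs(b) < 2]).pvalue
    p_early = stats.ttest_ind(a[:20], b[:20]).pvalue
    nominal_hits += p_main < 0.05
    flexible_hits += min(p_main, p_alt, p_trim, p_early) < 0.05

print(f"pre-committed false-positive rate: {nominal_hits / trials:.3f}")   # ~0.05
print(f"flexible-analysis rate:            {flexible_hits / trials:.3f}")  # well above 0.05
```
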
Q3. Registered reports produce approximately 55-60% null results compared to ~5-10% in traditional publications. This difference exists because:
(a) Registered reports are lower quality (b) Traditional publications are massively filtered by publication bias: null results aren't submitted or accepted. Registered reports reveal the actual distribution by removing this filter (c) Registered report researchers are less skilled (d) Random chance
Answer
**(b)** The difference reveals the scale of publication bias in traditional science. The true rate of null results is probably around 50-60%; traditional publishing suppresses most of them, creating a distorted picture of reality.
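
The scale of that filter follows from simple arithmetic. A back-of-envelope sketch; the acceptance rates below are assumptions chosen to reproduce the chapter's figures, not numbers from the chapter:

```python
# Back-of-envelope model of the publication filter (rates are assumptions).
true_null_rate = 0.55        # ~55% of well-run studies come up null
p_publish_positive = 0.90    # assumed acceptance rate for positive results
p_publish_null = 0.06        # assumed acceptance rate for null results

published_nulls = true_null_rate * p_publish_null                 # 0.033
published_positives = (1 - true_null_rate) * p_publish_positive   # 0.405
observed_null_share = published_nulls / (published_nulls + published_positives)

print(f"null share in the published record: {observed_null_share:.1%}")  # ~7.5%
```

Under these assumed rates, a journal system that accepts nulls only ~6% of the time shrinks a 55% true null rate down to the single-digit share seen in traditional publications.
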
Q4. The chapter identifies the "overcorrection risk" for each tool because:
(a) Tools always fail (b) Every correction mechanism can itself become a source of error (Theme 9): mandatory pre-registration could suppress exploration, excessive replication demands could paralyze research, and rigid red teams could create destructive criticism cultures (c) Overcorrection never happens (d) The chapter is pessimistic about reform
Answer
**(b)** This is Theme 9 of the book applied to the solutions themselves. The goal is calibrated correction (Chapter 21): arriving at the right answer rather than swinging to the opposite wrong answer.

Q5. Red teams are most effective when:
(a) They have no authority (b) The institutional culture genuinely values dissent (high Dissent Tolerance score) and leadership acts on red team findings; they are least effective when they become performative theater (c) They are staffed by the weakest personnel (d) They agree with the consensus
Answer
**(b)** Red team effectiveness is entirely determined by institutional culture. In organizations that treat red teams as a checkbox, the exercise is meaningless. The military's experience (Chapter 28) shows that even extensive red teaming doesn't prevent failure when structural incentives override the findings.

Q6. The chapter argues that registered reports are "the single most effective institutional innovation" because:
(a) They are the cheapest to implement (b) They eliminate publication bias at its source: by making the publication decision before data exist, they remove the incentive to produce positive results and the filter against negative ones (c) They are universally adopted (d) They work in every field
Answer
**(b)** Registered reports address the root cause of publication bias (the publication decision depends on results) rather than a symptom. Other tools address researcher behavior; registered reports change the institutional structure.

Q7. Independent replication funding is described as "essential and underfunded" because:
(a) It costs too much (b) Replication is the most direct mechanism for catching errors, but current incentive structures disincentivize it: researchers, journals, and funding agencies all reward novelty over verification (c) No one can do replications (d) Replications always confirm the original
Answer
**(b)** The structural problem is clear: no one has an incentive to check other people's work. Dedicated replication funding creates that incentive. Where it has been tried (NWO in the Netherlands, EEF in the UK), it has caught errors and calibrated effect sizes.
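
The "calibrated effect sizes" point has a mechanical explanation: originals enter the literature only if they cleared the significance filter, which selects for overestimates (the winner's curse), while a funded replication reports whatever it finds. A minimal simulation of that selection effect; the true effect, sample sizes, and threshold are illustrative assumptions:

```python
# Illustrative sketch: originals are published only if p < .05 (the journal
# filter), so their effect estimates are selected for being too large.
# Replications report unconditionally and land near the true effect.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
true_effect, n = 0.2, 30          # assumed small effect, modest samples
originals, replications = [], []

for _ in range(20_000):
    a = rng.normal(0.0, 1.0, n)
    b = rng.normal(true_effect, 1.0, n)
    if stats.ttest_ind(b, a).pvalue < 0.05:      # only "wins" get published
        originals.append(b.mean() - a.mean())
        a2 = rng.normal(0.0, 1.0, n)             # replication: no filter
        b2 = rng.normal(true_effect, 1.0, n)
        replications.append(b2.mean() - a2.mean())

print(f"true effect:             {true_effect:.2f}")
print(f"mean published original: {np.mean(originals):.2f}")     # inflated well above 0.2
print(f"mean replication:        {np.mean(replications):.2f}")  # close to 0.2
```
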
Q8. Prediction markets for scientific claims address consensus enforcement because:
(a) They are expensive (b) They provide an anonymous mechanism for expressing doubt: researchers can bet against a claim without publicly challenging the consensus, and the market doesn't weight bets by the bettor's prestige (c) They always predict correctly (d) They replace peer review
Answer
**(b)** The anonymity and prestige-independence of prediction markets address two key features of consensus enforcement: the career risk of public dissent and the authority cascade that weights opinions by status.
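
One concrete way to see the prestige-independence: in an automated market maker such as Hanson's logarithmic market scoring rule (my choice for illustration; the chapter doesn't specify a market design), the price is a pure function of the net shares bought on each side. Who placed the bet never enters the computation.

```python
# Minimal LMSR market maker for a binary claim ("the effect replicates").
# The price depends on net shares only; trader identity and prestige never
# appear. The liquidity parameter and trade size are illustrative.
import math

class LMSRMarket:
    def __init__(self, liquidity: float = 100.0):
        self.b = liquidity    # higher b = deeper market, slower price moves
        self.q_yes = 0.0      # net YES shares sold by the market maker
        self.q_no = 0.0       # net NO shares sold

    def _cost(self) -> float:
        return self.b * math.log(math.exp(self.q_yes / self.b)
                                 + math.exp(self.q_no / self.b))

    def price_yes(self) -> float:
        """Current implied probability that the claim is true."""
        e_yes = math.exp(self.q_yes / self.b)
        return e_yes / (e_yes + math.exp(self.q_no / self.b))

    def buy(self, outcome: str, shares: float) -> float:
        """Buy shares anonymously; returns the cost of the trade."""
        before = self._cost()
        if outcome == "yes":
            self.q_yes += shares
        else:
            self.q_no += shares
        return self._cost() - before

m = LMSRMarket()
print(f"opening price: {m.price_yes():.2f}")     # 0.50
cost = m.buy("no", 80)                           # an anonymous skeptic bets against
print(f"after the bet: {m.price_yes():.2f} (cost {cost:.1f})")
```

A skeptical postdoc and a Nobel laureate buying the same number of shares move the price identically; the only way to move it further is to risk more money.
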
Q9. The Tool-Failure Mode Matrix serves what practical purpose?
(a) It replaces the Epistemic Health Checklist (b) It allows a field to identify which specific tools would address its specific vulnerabilities, matching solutions to diagnosed problems rather than adopting tools at random (c) It ranks tools from best to worst (d) It eliminates the need for further research
Answer
**(b)** The matrix connects diagnosis (which failure modes are active?) to treatment (which tools address those failure modes?). Combined with the Epistemic Health Checklist, it produces targeted recommendations rather than generic advice.
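
In code, the matrix is just a lookup from diagnosed failure modes to candidate tools. A toy sketch; the mode-to-tool pairings below paraphrase the chapter's discussion and are not the exact published matrix:

```python
# Toy encoding of the matrix idea: diagnose failure modes first, then look
# up which tools address them. Pairings are an illustrative paraphrase.
TOOL_MATRIX = {
    "publication bias":         ["registered reports", "replication funding"],
    "p-hacking":                ["pre-registration", "registered reports"],
    "consensus enforcement":    ["prediction markets", "adversarial collaboration"],
    "unchallenged assumptions": ["red teams", "adversarial collaboration"],
}

def recommend(diagnosed_modes):
    """Rank tools by how many of the diagnosed failure modes they address."""
    counts = {}
    for mode in diagnosed_modes:
        for tool in TOOL_MATRIX.get(mode, []):
            counts[tool] = counts.get(tool, 0) + 1
    return sorted(counts, key=counts.get, reverse=True)

# A field whose Epistemic Health Checklist flags two vulnerabilities:
print(recommend(["publication bias", "p-hacking"]))
# -> ['registered reports', 'replication funding', 'pre-registration']
```
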
Q10. The chapter concludes that:
(a) One tool can fix everything (b) No tool is worth implementing (c) Fields need a portfolio of correction mechanisms tailored to their specific vulnerability profile, implemented with awareness of overcorrection risk (d) Individual behavior change is sufficient
Answer
**(c)** No single tool addresses all failure modes. The most effective approach is to diagnose the field's specific vulnerabilities (Checklist), identify which tools address those vulnerabilities (Matrix), and implement them with awareness of the pendulum dynamic (overcorrection warning).

Scoring Guide
- 9-10 correct: Excellent. You can match tools to failure modes and evaluate trade-offs.
- 7-8 correct: Good. Review the Tool-Failure Mode Matrix and the overcorrection warnings.
- 5-6 correct: Fair. Revisit the distinction between pre-registration and registered reports, and the red team effectiveness conditions.
- Below 5: Re-read the chapter focusing on what problem each tool solves and what its limitations are.