Case Study: Designing a Self-Correcting Research Institute From Scratch
The Design Challenge
Imagine you have the resources to build a new research institute — 100 researchers, adequate funding, freedom to design any structure you want. Your mandate: produce the most reliable knowledge possible in your field. You know everything in this book.
What do you build?
The Blueprint
Hiring: Select for Calibration, Not Just Brilliance
Traditional research institutes hire for domain expertise, publication record, and prestige. These criteria select for the very traits that produce failure modes — confidence without calibration, career investment in specific positions, and prestige-weighted evaluation.
Design choice: Hiring criteria include a calibration assessment (Chapter 35) — not as a filter (everyone is overconfident) but as a baseline. Candidates are also evaluated on: (a) history of productive updating (have they changed their mind about something important?), (b) collaborative inclination (are they willing to design studies with people who disagree?), and (c) methodological breadth (do they use multiple approaches?).
Which principle: Incentive Alignment (P2) — hiring criteria signal what the institution values, and values shape behavior.
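As a sketch of what the calibration assessment could measure, the Brier score is one standard scoring rule for probabilistic forecasts. The question set, confidences, and interpretation cutoffs below are illustrative assumptions, not part of the blueprint:

```python
def brier_score(forecasts):
    """Mean squared gap between stated confidence and what happened.

    forecasts: list of (confidence_that_true, outcome_was_true) pairs.
    0.0 is perfect; 0.25 is what always answering "50%" earns; higher is worse.
    """
    return sum((p - (1.0 if outcome else 0.0)) ** 2
               for p, outcome in forecasts) / len(forecasts)

# Hypothetical candidate answers to four binary forecasting questions.
candidate = [(0.90, True), (0.80, False), (0.60, True), (0.95, True)]
print(f"Brier score: {brier_score(candidate):.3f}")  # lower is better
```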
Funding: Separate Discovery from Verification
The institute allocates its research budget in three streams:
- 60% Discovery: Traditional research — hypothesis-driven, exploratory, novel
- 25% Verification: Independent replication of the institute's own findings and important external findings
- 15% Correction: Studies specifically designed to challenge the institute's most successful prior results
Design choice: Verification and correction funding is administered by a separate committee whose evaluation criteria explicitly exclude "did the original result hold up?" — they evaluate rigor and importance, not outcome.
Which principles: Replication Norms (P4), Incentive Alignment (P2).
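The split itself is trivial arithmetic; the design point is treating it as a hard invariant rather than a soft target. A toy sketch (the dollar figure is made up):

```python
# Fixed stream shares from the blueprint; must sum to the whole budget.
STREAMS = {"discovery": 0.60, "verification": 0.25, "correction": 0.15}
assert abs(sum(STREAMS.values()) - 1.0) < 1e-9  # shares must be exhaustive

def allocate(total_budget):
    """Split the annual budget into the three fixed streams."""
    return {name: share * total_budget for name, share in STREAMS.items()}

print(allocate(10_000_000))  # hypothetical $10M budget
# {'discovery': 6000000.0, 'verification': 2500000.0, 'correction': 1500000.0}
```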
Publication: Registered Reports as Default
The institute's internal publication policy uses registered reports (Chapter 34) as the default for confirmatory research. Exploratory research is published separately with clear labeling.
Design choice: All studies accepted at Stage 1 are published regardless of results. The institute maintains a public record of null results alongside positive results — creating a complete evidence base.
Which principles: Fast Feedback Loops (P1), Uncertainty Quantification (P7).
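In data terms, the resulting evidence base is a registry keyed on Stage-1 acceptance, with outcomes recorded as they arrive. A minimal sketch of such a record follows; the field names and enum values are assumptions, not a schema from the chapter:

```python
from dataclasses import dataclass
from enum import Enum

class Outcome(Enum):
    PENDING = "pending"    # Stage 1 accepted; data not yet collected
    POSITIVE = "positive"  # preregistered hypothesis supported
    NULL = "null"          # preregistered hypothesis not supported

@dataclass
class RegisteredStudy:
    study_id: str
    hypothesis: str
    stage1_accepted: bool   # peer review happened before results existed
    confirmatory: bool      # exploratory work is labeled and filed separately
    outcome: Outcome = Outcome.PENDING

    def publishable(self) -> bool:
        # Stage-1 acceptance guarantees publication regardless of outcome,
        # so null results enter the public record alongside positive ones.
        return self.stage1_accepted
```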
Error Culture: The Monthly Revision Meeting
Once a month, the entire institute meets for a "Revision Meeting" — modeled on medical morbidity and mortality (M&M) conferences and on blameless postmortems in tech. Researchers present their most important corrections — findings they previously believed that turned out to be wrong, methods they changed, conclusions they revised.
Design choice: Presenting a significant correction is a positive career event — it is tracked in the institute's annual review and treated as evidence of intellectual honesty and methodological rigor.
Which principles: Correction Celebration (P6), Fast Feedback Loops (P1).
External Challenge: The Outsider Fellowship
The institute maintains a permanent "Outsider Fellowship" — three positions for researchers from adjacent fields who are invited specifically to challenge the institute's assumptions. Fellows have access to all data, attend all meetings, and are evaluated solely on the quality of their critiques.
Design choice: Outsider Fellows serve two-year terms and are replaced by new outsiders, preventing assimilation into the institutional consensus.
Which principles: Structural Outsider Access (P3), Measurement Validity Audits (P5).
Metrics: The Annual Goodhart Audit
Every year, the institute reviews its own metrics — publication counts, citation rates, funding success, student placements — and asks: "Has any of these metrics become a target? Is gaming occurring? Are we optimizing for the metric at the expense of the goal the metric was designed to measure?"
Design choice: The audit is conducted by the Outsider Fellows, who have the structural independence to identify metric corruption that insiders might not see (or might not want to see).
Which principle: Measurement Validity Audits (P5).
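One way to make the audit question "is gaming occurring?" operational is to test whether a tracked metric has decoupled from an independent quality proxy. A hedged sketch, assuming yearly series for one metric (say, publication count) and one proxy (say, external replication rate); the decoupling rule and the numbers are illustrative:

```python
from statistics import correlation  # Python 3.10+

def goodhart_flag(metric_series, quality_series, threshold=0.0):
    """Flag a metric whose trend has decoupled from an independent
    quality proxy: a rising metric with non-positive correlation to
    quality suggests the measure, not the goal, is being optimized.
    """
    rising = metric_series[-1] > metric_series[0]
    r = correlation(metric_series, quality_series)
    return rising and r <= threshold

# Hypothetical 5-year series: publications climb, replication rate sags.
pubs = [40, 48, 55, 63, 72]
replication_rate = [0.62, 0.60, 0.55, 0.51, 0.46]
print(goodhart_flag(pubs, replication_rate))  # True -> audit this metric
```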
The Hard Trade-offs
This blueprint is not cost-free. The trade-offs include:
- Speed vs. reliability. 25% of the budget goes to verification and 15% to correction — leaving less for discovery. The institute will produce fewer novel findings than a traditional institute with the same resources.
- Comfort vs. calibration. Monthly revision meetings, outsider fellows, and Goodhart audits create an environment of continuous scrutiny. This is intellectually healthy but psychologically demanding. Some excellent researchers will prefer institutions that don't require this level of self-examination.
- Prestige vs. honesty. Publishing null results and corrections reduces the institute's appearance of "success" by traditional metrics. External evaluators who count positive publications will rate the institute lower than traditional institutes — even though its evidence base is more reliable.
- Independence vs. funding. Separating discovery from verification funding requires either a large endowment or a funding model that explicitly values reliability. Most current funding models do not.
Analysis Questions
1. Score this blueprint against the Epistemic Health Checklist (Chapter 32). What would each of the 10 dimensions score? Where are the remaining vulnerabilities?
2. Identify the three design choices that would face the most institutional resistance. For each, apply the dissent strategy framework (Chapter 33) to design an implementation approach that maximizes adoption.
3. The blueprint trades discovery speed for reliability. Is this trade-off worth it? Under what conditions would a traditional institute (maximizing discovery, minimizing verification) produce better outcomes than this self-correcting institute? Under what conditions would the self-correcting institute produce better outcomes?