Case Study 2: History and Conspiracy Thinking -- Narrative Overfitting at Two Scales

"History is merely a list of surprises. It can only prepare us to be surprised yet again." -- Kurt Vonnegut, Slapstick


Two Pattern-Finders, One Error

The professional historian and the conspiracy theorist appear to be opposites. One submits to peer review, demands evidence, and hedges conclusions with caveats. The other operates outside institutional constraints, selects evidence to fit a predetermined narrative, and treats uncertainty as confirmation of hidden machinations. One is credentialed; the other is marginal. One is respectable; the other is not.

And yet they are both doing the same thing: constructing causal narratives from large, complex, noisy datasets. They differ not in the cognitive process they employ but in the regularization they apply to the results. The historian has institutional constraints -- methodological standards, peer review, the demand for parsimony, the obligation to consider counterevidence. The conspiracy theorist has none. Understanding this structural parallel illuminates both the vulnerabilities of historical reasoning and the cognitive mechanisms of conspiracy thinking.


Part I: The Historian's Trap

Why History Is Vulnerable

Historical reasoning faces an overfitting problem that is, in some respects, more severe than the one faced by machine learning or medicine. Machine learning can generate new data by collecting more examples. Medicine can run new trials with new patients. History cannot. The past is a fixed dataset. You cannot rerun World War I with different initial conditions to see whether the alliance system was genuinely causal or merely coincidental. You cannot replicate the French Revolution.

This means that every historical explanation is, in a precise sense, fit to a single observation. The Roman Empire fell once. The Industrial Revolution happened once. The Cold War ended once. Historical explanations are models trained on a sample size of one.

A sample size of one does not make explanation impossible, but it makes overfitting almost guaranteed -- unless the historian is extraordinarily disciplined about constraining the degrees of freedom. And the degrees of freedom available to a historian are enormous.

Consider the question: Why did World War I begin? The historian can invoke:

  • The alliance system (the web of mutual defense treaties that turned a regional conflict into a continental war)
  • Imperial competition (rivalry for colonies and global influence)
  • The arms race (particularly the Anglo-German naval competition)
  • Nationalism (ethnic and cultural movements destabilizing multi-ethnic empires)
  • The assassination of Archduke Franz Ferdinand (the proximate trigger)
  • The Schlieffen Plan (Germany's rigid war plan that required attacking France through Belgium)
  • Economic competition (industrial rivalry between Britain and Germany)
  • The Balkan crises of 1908-1913 (a series of escalating confrontations)
  • The failure of diplomacy (breakdown in communication between heads of state)
  • Social Darwinism (the belief that war was a natural and healthy competition between nations)
  • Railroad timetables (the logistical constraints that made mobilization irreversible once started)
  • The personality of Kaiser Wilhelm II

Each factor is supported by evidence. Each is a real feature of the historical landscape of 1914. A historian invoking all twelve factors has twelve degrees of freedom -- twelve knobs that can be turned, twelve variables whose relative weight can be adjusted. With twelve degrees of freedom and a sample size of one, the historian can construct a narrative that explains the outbreak of war with perfect precision. The narrative will be compelling, evidence-based, and potentially meaningless.

This is the historian's trap: the data is rich enough to support many explanations, and there is no independent dataset to test which explanation is correct.
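The arithmetic behind this trap can be sketched in a few lines. Everything in the sketch is an illustrative assumption -- the coding of "factors" as ones, the outcome as 1.0 -- but it makes the formal point: with twelve free weights and one observation, radically different narratives reproduce the data equally well.

```python
import numpy as np

# One observed outcome ("war happened": 1.0) and twelve candidate causal
# factors, each coded as "present" in 1914. Purely illustrative numbers.
factors = np.ones((1, 12))    # sample size of one, twelve degrees of freedom
outcome = np.array([1.0])

# Two very different weightings of the same twelve factors:
w_single = np.zeros(12)
w_single[0] = 1.0             # "it was the alliance system, full stop"
w_spread = np.full(12, 1/12)  # "all twelve factors contributed equally"

# Both reproduce the lone observation exactly -- the data cannot
# distinguish between the two narratives.
print(factors @ w_single)     # [1.]
print(factors @ w_spread)     # [1.]
```

With n = 1 and p = 12 the system is underdetermined: an entire eleven-dimensional family of weightings fits perfectly, which is the formal sense in which the narrative explains the outbreak of war "with perfect precision" while remaining potentially meaningless.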

How Historians Regularize

Professional historians are not unaware of this problem. Over centuries, the discipline has developed regularization techniques -- though it does not call them that.

Source criticism is a form of data quality control. Historians evaluate the reliability, bias, and provenance of their sources, discarding or down-weighting those that are unreliable. This is analogous to data cleaning in machine learning: removing noisy or corrupted data points before fitting the model. A source written decades after an event, by someone with an axe to grind, is noise that could drive overfitting. Discarding it reduces the degrees of freedom available to the interpreter.

Peer review functions in history much as it does in science: other historians evaluate whether the interpretation is supported by the evidence, whether alternative interpretations have been adequately considered, and whether the narrative is parsimonious. Peer reviewers are overfitting detectors. They ask: "Does this interpretation hold up, or is it overfit to a selective reading of the evidence?"

Comparative history is the historian's version of cross-validation. Rather than explaining a single event in isolation, the comparative historian asks whether the explanation holds across multiple cases. If you argue that empires fall because of military overextension, you should check whether this pattern holds for the Roman, Ottoman, British, and Soviet empires. If it fits some cases but not others, your model may be overfit to the cases it fits.
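The comparative check can be sketched as a leave-one-out loop, the simplest form of cross-validation. Everything in this toy example -- the case names, the "overextension" scores, the candidate thresholds -- is invented for illustration, not historical measurement:

```python
# Toy leave-one-out check of a one-factor rule: "empires fall when
# military overextension passes a threshold."
cases = [
    ("Roman",   0.9, True),
    ("Ottoman", 0.7, True),
    ("British", 0.4, True),
    ("Soviet",  0.8, True),
    ("Swiss",   0.2, False),  # a polity that did not collapse
]

def fit_threshold(train):
    """Pick the cutoff that best separates fell/survived on the training cases."""
    best_t, best_correct = 0.0, -1
    for t in (0.1, 0.3, 0.5, 0.6, 0.75, 0.85):
        correct = sum((score > t) == fell for _, score, fell in train)
        if correct > best_correct:
            best_t, best_correct = t, correct
    return best_t

errors = 0
for i, (name, score, fell) in enumerate(cases):
    held_in = cases[:i] + cases[i + 1:]   # fit the rule on the other cases...
    t = fit_threshold(held_in)
    if (score > t) != fell:               # ...then test it on the held-out one
        errors += 1
        print(f"{name}: the rule tuned on the other cases fails here")

print(f"leave-one-out errors: {errors}/{len(cases)}")  # 1/5
```

Trained without the counter-case, the threshold drifts to accommodate only collapses; the held-out counter-case then exposes the overfit rule. That is the diagnostic comparative historians run when a thesis "fits some cases but not others."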

Counterfactual reasoning is the historian's version of out-of-sample testing. When a historian asks "What if?" -- What if the Archduke had not been assassinated? What if Germany had not invaded Belgium? -- they are testing whether their causal model would produce different predictions under different conditions. A model that explains the war only under the exact conditions that actually obtained is more likely to be overfit than one that identifies factors robust enough to produce war even under slightly different circumstances.

Occam's razor operates in history as a stylistic and methodological norm: historians are trained to prefer simpler explanations when they are adequate. An explanation that invokes two causal factors is preferred to one that invokes twelve, provided both account for the key evidence. This is a penalty for complexity -- a bias toward parsimony that reduces the risk of narrative overfitting.
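The bias toward parsimony can be written down as a penalized score, in the spirit of model-selection criteria such as AIC. Every number below -- the fit qualities, the per-factor penalty -- is a hypothetical illustration, not a measured quantity:

```python
# Occam's razor as a complexity penalty: credit for fit, a cost per
# causal factor invoked. All values are hypothetical illustrations.
def penalized_score(fit_quality, n_factors, penalty_per_factor=0.05):
    return fit_quality - penalty_per_factor * n_factors

two_factor = penalized_score(0.95, 2)      # 0.85
twelve_factor = penalized_score(0.99, 12)  # 0.39

# The twelve-factor story fits slightly better, but the marginal gain
# in fit does not pay for the extra flexibility it buys.
print(two_factor > twelve_factor)  # True
```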

When Historians Overfit Anyway

Despite these safeguards, historians overfit regularly. The most common form is the retrospective narrative: a story that, knowing the outcome, selects and arranges evidence to make the outcome appear inevitable. The historian who argues that the Industrial Revolution was bound to happen in Britain -- because of its coal deposits, its maritime tradition, its patent system, its political stability -- is constructing a narrative that fits the outcome perfectly. But a historian of the early eighteenth century, armed with the same data, would have had no basis for this prediction. Many of the same factors existed in other countries that did not industrialize first.

This is the historical equivalent of fitting a model to the training data and then claiming that the model's predictions were inevitable. The narrative is overfit to the known outcome. Change the outcome -- imagine that the Netherlands or France had industrialized first -- and a different historian could construct an equally compelling narrative explaining why that outcome was inevitable, using different selections from the same evidence.

The historian E.P. Thompson warned against this tendency in The Making of the English Working Class, cautioning against judging past actors in the light of subsequent evolution. Smuggling in hindsight is narrative overfitting: the construction of a model that needs to explain only one outcome, with full knowledge of what that outcome was.



Part II: The Conspiracy Theorist's Method

Apophenia at Scale

If the historian's trap is having too many degrees of freedom for a sample size of one, the conspiracy theorist's method is having infinite degrees of freedom for a sample size of one -- and no regularization at all.

Consider the conspiracy theories surrounding the September 11, 2001, attacks. The data available to the conspiracy theorist includes: thousands of hours of video footage, millions of pages of documents, hundreds of witness accounts, the physical evidence of the collapsed buildings, the histories of the hijackers, the intelligence failures that preceded the attacks, the political decisions that followed, and the vast web of geopolitical relationships that formed the context.

In a dataset this large, coincidences are not just possible but mathematically certain. Some witnesses will contradict each other. Some footage will be ambiguous. Some decisions made before the attacks will, in hindsight, appear suspicious. Some people who benefited from the attacks will have had foreknowledge of unrelated events. These are noise -- the inevitable statistical artifacts of any large, complex dataset. They are the equivalent of the spurious correlations that appear when you test enough hypotheses on enough data.
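The "mathematically certain" claim is just the arithmetic of multiple comparisons. The numbers below -- a thousand unrelated facts, each with a one-percent chance of looking suspicious on its own -- are assumptions chosen for illustration:

```python
# If each of n unrelated facts has a small chance p of looking
# "suspicious" by pure coincidence, the chance that NONE of them does
# is (1 - p)^n, which collapses toward zero as n grows.
n_facts = 1_000
p_suspicious = 0.01

p_no_anomaly = (1 - p_suspicious) ** n_facts
p_at_least_one = 1 - p_no_anomaly
expected_anomalies = n_facts * p_suspicious

print(f"P(at least one anomaly) = {p_at_least_one:.5f}")  # 0.99996
print(f"expected anomalies      = {expected_anomalies:.0f}")  # 10
```

Under these assumptions an investigator combing the record is all but guaranteed to find around ten "anomalies" that mean nothing at all -- raw material for a narrative that mistakes noise for signal.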

The conspiracy theorist treats these coincidences as evidence. A contradiction between two witnesses is not an instance of the normal unreliability of human memory; it is evidence of a cover-up. An ambiguity in the footage is not the result of camera angles and dust; it is evidence of controlled demolition. A politician's foreknowledge of an unrelated event is not a coincidence; it is evidence of complicity.

The conspiracy theorist's model has, in effect, unlimited degrees of freedom. Any fact can be interpreted as evidence for the conspiracy. Any fact that contradicts the conspiracy can be reinterpreted as evidence of the cover-up's thoroughness. Any absence of evidence is evidence of suppression. The model cannot be falsified, because it can accommodate any data. This is the hallmark of an overfit model: it can explain everything and predict nothing.

The Degrees of Freedom of Interpretation

The critical difference between the historian and the conspiracy theorist is not in the data they examine but in the constraints on interpretation.

The historian is constrained by:

  • Methodological norms (source criticism, evidentiary standards)
  • Peer review (other experts evaluate the interpretation)
  • Parsimony (simpler explanations are preferred)
  • Falsifiability (the interpretation should be testable, at least in principle)
  • Professional reputation (career consequences for making unfounded claims)

The conspiracy theorist is constrained by none of these. The absence of institutional constraints means that every interpretive choice is free -- every fact can be read in whatever way best supports the narrative. This is like building a machine learning model with no regularization: the model will fit the training data perfectly, but the fit is meaningless because it has enough flexibility to fit any data.

This is why conspiracy theories are so internally coherent. An overfit model, by definition, explains its training data well. A high-degree polynomial passes through every data point. A conspiracy theory accounts for every known fact. The coherence is not evidence of truth; it is evidence of excess flexibility.
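The polynomial analogy is easy to make concrete. In the synthetic sketch below (a straight-line "truth" plus noise, with seed and noise level chosen arbitrarily), a degree-9 polynomial passes through all ten observations exactly, yet swerves away from the truth between them:

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(42)

# "Reality": a simple line, observed ten times with noise.
x = np.linspace(0, 1, 10)
y = 2 * x + rng.normal(0, 0.1, size=10)

# A degree-9 polynomial has enough flexibility to pass through all ten
# points -- it "accounts for every known fact".
p = Polynomial.fit(x, y, deg=9)
train_error = float(np.max(np.abs(p(x) - y)))

# Between the observations, the flexible model strays from the true
# line by far more than its (near-zero) error on the data itself.
x_between = np.linspace(0.03, 0.97, 200)
off_sample_error = float(np.max(np.abs(p(x_between) - 2 * x_between)))

print(train_error < 1e-6)              # near-perfect fit to the known facts
print(off_sample_error > train_error)  # coherence is not generalization
```

Perfect coherence on the known facts is exactly what excess flexibility buys; it says nothing about how the model behaves anywhere the facts run out.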

Why Conspiracy Theories Are Seductive

The psychological appeal of conspiracy theories derives from the same cognitive machinery that produces scientific insight: the human desire to find patterns that explain the world. In the language of the chapter, conspiracy theories satisfy the brain's pattern-recognition system. They take noisy, confusing, frightening events and impose a narrative structure that makes them comprehensible.

The key insight is that this narrative satisfaction is independent of truth. A model that fits the data feels right, regardless of whether it is overfit. The student who memorizes every practice problem feels prepared, regardless of whether the memorization will generalize to the exam. The conspiracy theorist who has an explanation for every anomaly feels certain, regardless of whether the explanation would hold up under scrutiny.

Several psychological factors amplify this susceptibility:

Proportionality bias: The tendency to believe that large events must have large causes. The assassination of a president by a lone gunman feels disproportionate -- the effect (the death of the most powerful person in the world) seems to demand a cause of similar magnitude (a vast conspiracy). This bias drives the search for complex explanations when simple ones are adequate, which is a direct driver of overfitting.

Need for cognitive closure: The desire for definitive answers to ambiguous questions. Complex events are inherently ambiguous -- the evidence is incomplete, the causes are multiple, and uncertainty is irreducible. The conspiracy theory eliminates ambiguity by providing a complete, closed narrative. This is emotionally satisfying but epistemically dangerous, because the elimination of ambiguity requires fitting noise as well as signal.

Distrust of institutions: When people distrust the institutions that provide official explanations (governments, media, scientific establishments), they lose access to the regularization those institutions provide. Institutional trust is, in this framework, trust in the regularization process. Without it, individuals must regularize their own thinking -- and most people lack the training and incentive to do so.


The Shared Structure: A Diagnostic Table

Feature                            | Professional Historian                                                             | Conspiracy Theorist
Data source                        | Historical record (documents, artifacts, accounts)                                 | Same historical record, plus additional coincidences and anomalies
Model                              | Causal narrative explaining events                                                 | Causal narrative explaining events
Degrees of freedom                 | Constrained by methodology, peer review, parsimony                                 | Unconstrained; any fact can be reinterpreted
Regularization                     | Source criticism, peer review, Occam's razor, comparative history, counterfactuals | None
Falsifiability                     | Seeks it (at least in principle)                                                   | Resists it (unfalsifiable by design)
Response to contradictory evidence | Revises the model                                                                  | Incorporates it as evidence of the cover-up
Risk                               | Moderate overfitting (narrative smoothing, hindsight bias)                         | Extreme overfitting (noise read as signal at every level)
Self-correction mechanism          | Peer review, scholarly debate, new evidence                                        | None; the theory becomes more elaborate as evidence accumulates

The table reveals a spectrum rather than a binary. The historian is not immune to overfitting -- all narrative construction involves some degree of fitting interpretation to data. But the historian operates within institutional constraints that function as regularization, reducing the risk. The conspiracy theorist operates without constraints, and the result is a model so flexible that it can accommodate any evidence, including evidence that directly contradicts it.


Implications for Critical Thinking

This analysis has practical implications for how we evaluate historical claims, political narratives, and explanations of complex events.

Ask about degrees of freedom. When someone presents an explanation for a complex event, ask: how many causal factors are invoked? How many interpretive choices were made? The more factors and choices, the more likely the explanation is overfit to the specific data rather than capturing a genuine pattern.

Ask about regularization. What constraints were applied to the explanation? Was it peer-reviewed? Does it prefer simpler explanations? Has it been tested against alternative interpretations or different historical cases?

Ask about falsifiability. What evidence would cause the explainer to abandon or significantly modify the explanation? If no evidence could do this -- if the explanation can accommodate any fact -- it is almost certainly overfit.

Ask about proportionality. Is the complexity of the explanation proportional to the complexity of the evidence? A twelve-factor explanation of a well-documented event may be appropriate. A twelve-factor explanation of a single ambiguous data point is probably overfit.

Check your own pattern recognition. The same cognitive machinery that makes conspiracy theories seductive also makes overfit historical narratives compelling. When an explanation feels satisfying -- when everything clicks into place, when the narrative is clean and complete -- pause and ask whether the satisfaction comes from genuine insight or from the emotional reward of a well-fit model.


Questions for Discussion

  1. The case study argues that the difference between a historian and a conspiracy theorist is primarily one of regularization, not intelligence or access to data. Do you agree? If not, what other factors distinguish the two?

  2. The concept of "hindsight bias" -- reading the past through the lens of known outcomes -- is described as a form of narrative overfitting. Can you identify a specific historical narrative that you have encountered (in a textbook, documentary, or popular account) that suffers from this problem? How would the narrative change if the outcome had been different?

  3. The case study suggests that institutional trust functions as trust in regularization. What happens to a society's collective reasoning when institutional trust declines? Does the case study's framework predict the rise of conspiracy thinking in low-trust environments?

  4. Comparative history is described as the historian's version of cross-validation. But historical events are never truly identical -- each case is unique in important ways. Does this limitation invalidate comparative history as a regularization technique, or does it merely weaken it? What would a more rigorous form of historical cross-validation look like?

  5. The case study argues that conspiracy theories are unfalsifiable by design. Can you think of a conspiracy theory that was actually proven true? If so, what distinguished it from the unfalsifiable theories discussed here? What kind of evidence or institutional process was required to establish the truth?