Research Methods Primer

This primer is for readers who want to go deeper — whether evaluating the claims in this book, reading the primary sources in the bibliography, or conducting their own cross-domain investigations. It covers the essential skills of critical reading, statistical reasoning, and methodological awareness that underpin serious pattern recognition.


1. How to Read a Scientific Paper

Scientific papers are not written to be read the way you read a novel. They are structured documents with conventions that, once understood, allow you to extract what you need efficiently.

The Standard Structure

Most empirical papers follow the IMRaD format:

  • Introduction: Why this study was conducted. What gap in knowledge does it address? What hypothesis does it test?
  • Methods: How the study was done. What data were collected? What procedures were followed? This is the section that determines whether the results are trustworthy.
  • Results: What the data showed. Tables, figures, and statistical tests. Ideally, this section reports findings without interpreting them.
  • Discussion: What the results mean, according to the authors. How they relate to previous work. Limitations acknowledged by the authors.

A Reading Strategy

Do not read papers front to back. Instead:

  1. Read the abstract to decide if the paper is relevant. The abstract should tell you the question, the method, the key result, and the conclusion.

  2. Look at the figures and tables. In a well-written paper, the core findings are visible in the data displays. Can you understand the main result from the figures alone?

  3. Read the introduction's last paragraph. This is where authors state their specific hypothesis or research question. It tells you exactly what they set out to show.

  4. Read the discussion's first paragraph. This is where authors state their main conclusion. Compare this to the hypothesis — did they find what they expected?

  5. Now read the methods section carefully. This is where most problems hide. Ask:
     • How large was the sample? (Small samples produce unreliable results.)
     • How were participants or subjects selected? (Non-random selection introduces bias.)
     • What was the control condition? (Without a proper control, you cannot attribute effects to the treatment.)
     • Could the researchers have influenced the outcome? (Blinding prevents this.)
     • Were the measurements valid? (Did they actually measure what they claimed to measure?)

  6. Finally, read the full results. Pay attention to effect sizes (how large was the difference?), not just statistical significance (could the difference be due to chance?). A statistically significant effect can be trivially small.
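
To make step 6 concrete, here is a minimal simulation sketch, assuming Python with NumPy and SciPy; the sample size and the 0.03-standard-deviation effect are invented for illustration. With enough participants, even a negligible true difference reliably clears p < 0.05.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    n = 50_000  # hypothetical very large groups

    # The true difference is 0.03 standard deviations: real but trivial.
    control = rng.normal(0.00, 1.0, n)
    treated = rng.normal(0.03, 1.0, n)

    _, p = stats.ttest_ind(treated, control)
    pooled_sd = np.sqrt((treated.var(ddof=1) + control.var(ddof=1)) / 2)
    d = (treated.mean() - control.mean()) / pooled_sd

    print(f"p = {p:.2e}")          # typically highly "significant"
    print(f"Cohen's d = {d:.3f}")  # yet the effect is negligible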

Red Flags

Be skeptical when you encounter:

  • Claims of enormous effects from simple interventions
  • Failure to report effect sizes alongside p-values
  • Post-hoc hypotheses presented as if they were pre-registered predictions
  • Dismissal of contradictory evidence from other studies
  • Conflicts of interest (funding from parties who benefit from the results)
  • Absence of a pre-registration or replication attempt


2. Understanding Statistical Significance

Statistical significance is the most misunderstood concept in science. Getting it right is essential for evaluating evidence.

What a p-value Actually Means

A p-value answers this question: "If there were truly no effect (the null hypothesis), how likely would we be to observe data at least this extreme?"

  • A p-value of 0.05 means: "If there is no real effect, there is a 5% chance of getting results at least this extreme by random chance."
  • A p-value does not mean: "There is a 95% chance the effect is real." This is a common and consequential misinterpretation.
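
A minimal simulation makes the definition concrete. This sketch assumes Python with NumPy and SciPy; the group sizes are arbitrary. Both groups are drawn from the same distribution, so the null hypothesis is true by construction, and roughly 5% of experiments come out "significant" anyway, which is exactly what p < 0.05 promises, and all it promises.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n_experiments, n_per_group = 10_000, 30
    false_positives = 0

    for _ in range(n_experiments):
        # Both groups come from the SAME distribution: the null is true.
        a = rng.normal(0.0, 1.0, n_per_group)
        b = rng.normal(0.0, 1.0, n_per_group)
        _, p = stats.ttest_ind(a, b)
        if p < 0.05:
            false_positives += 1

    # Expect ~0.05: the rate of false alarms when there is nothing to find.
    print(f"Fraction 'significant' under a true null: {false_positives / n_experiments:.3f}")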

Why p < 0.05 Is Not Enough

The conventional threshold of p < 0.05 has several problems:

  1. Multiple testing. If you test 20 hypotheses and none of them are true, you expect one to be "significant" at p < 0.05 by chance alone. Researchers who test many variables and report only the significant ones (p-hacking) produce false positives at alarming rates. The sketch after this list puts numbers on this problem and the next.

  2. Base rate neglect. If you are testing a hypothesis that has only a 1% prior probability of being true, then even a p < 0.05 result is more likely to be a false positive than a true positive. This connects directly to Bayesian reasoning (Chapter 10) — the p-value ignores the prior probability.

  3. Effect size matters more. A study with 10,000 participants can detect statistically significant effects that are so small they have no practical importance. "Statistically significant" does not mean "practically significant" or "large."

  4. Confidence intervals are more informative. Instead of a binary yes/no (significant or not), a confidence interval tells you the range of plausible effect sizes. A 95% confidence interval of [0.01, 0.03] tells you the data are consistent only with a tiny positive effect. A 95% confidence interval of [-0.5, 3.0] tells you there is too much uncertainty to draw conclusions.
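
Problems 1 and 2 reduce to a few lines of arithmetic, sketched below in plain Python. The 1% prior and 80% power are illustrative assumptions, not estimates from any study.

    # Problem 1: multiple testing.
    # Chance of at least one "significant" result among 20 true nulls:
    alpha, n_tests = 0.05, 20
    p_any = 1 - (1 - alpha) ** n_tests
    print(f"P(at least one false positive in {n_tests} tests) = {p_any:.2f}")  # ~0.64

    # Problem 2: base rate neglect.
    # Illustrative assumptions: a 1% prior that the hypothesis is true,
    # and 80% power to detect the effect when it is real.
    prior, power = 0.01, 0.80
    true_pos = prior * power          # true effect, correctly detected
    false_pos = (1 - prior) * alpha   # no effect, "significant" by chance
    ppv = true_pos / (true_pos + false_pos)
    print(f"P(effect is real | p < 0.05) = {ppv:.2f}")  # ~0.14, not 0.95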

What to Look For Instead

  • Effect size: How large is the observed difference, in meaningful units? Cohen's d, odds ratios, and correlation coefficients are common measures (the sketch after this list computes one alongside a confidence interval).
  • Confidence intervals: What range of effect sizes is consistent with the data?
  • Pre-registration: Was the hypothesis specified before the data were collected? Pre-registration prevents p-hacking.
  • Replication: Has the finding been reproduced by independent researchers? A single study, no matter how significant, is preliminary evidence.
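
As one concrete pattern for the first two items, the sketch below, assuming Python with NumPy and SciPy and using made-up data with a true 0.4-standard-deviation effect, reports an effect size with a confidence interval instead of a bare verdict.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    # Hypothetical data: a real, moderate effect (0.4 SD), 100 per group.
    control = rng.normal(0.0, 1.0, 100)
    treated = rng.normal(0.4, 1.0, 100)

    diff = treated.mean() - control.mean()
    pooled_sd = np.sqrt((treated.var(ddof=1) + control.var(ddof=1)) / 2)
    cohens_d = diff / pooled_sd

    # 95% CI for the mean difference (pooled-variance t interval).
    n1, n2 = len(treated), len(control)
    se = pooled_sd * np.sqrt(1 / n1 + 1 / n2)
    t_crit = stats.t.ppf(0.975, n1 + n2 - 2)

    print(f"Cohen's d = {cohens_d:.2f}")
    print(f"95% CI for the difference: [{diff - t_crit * se:.2f}, {diff + t_crit * se:.2f}]")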


3. The Replication Crisis

Beginning around 2011, several large-scale efforts to reproduce published findings revealed that many could not be replicated. This "replication crisis" has profound implications for how we evaluate evidence.

Key Findings

  • The Open Science Collaboration (2015) attempted to replicate 100 psychology studies. Only 36% of replications produced statistically significant results in the same direction as the originals.
  • The Reproducibility Project: Cancer Biology found that many landmark cancer biology findings were difficult or impossible to reproduce.
  • John Ioannidis's influential 2005 paper argued, on mathematical grounds, that most published research findings are false, particularly in fields with small sample sizes, small effect sizes, and large numbers of tested hypotheses.

Root Causes

  1. Publication bias. Journals prefer to publish positive results (effects found) over negative results (no effect found). This creates a literature that systematically overrepresents effects.

  2. P-hacking and researcher degrees of freedom. Researchers have many choices in how to analyze data (which variables to include, which outliers to exclude, which subgroups to examine). Each choice is an opportunity to find a "significant" result by chance.

  3. Small samples. Underpowered studies (too few participants to reliably detect real effects) produce noisy results. The significant results from small studies tend to overestimate effect sizes, and many will be false positives. The sketch after this list simulates this "winner's curse."

  4. Career incentives. Researchers are rewarded for publishing novel, surprising, statistically significant findings. This creates a system-level pressure toward exactly the practices that produce unreliable results. This is a Goodhart's Law problem (Chapter 15): when "published significant findings" becomes the target, it ceases to be a reliable measure of scientific truth.
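
The "winner's curse" in cause 3 is easy to demonstrate by simulation. The sketch below assumes Python with NumPy and SciPy; the true effect of 0.2 SD and the sample size of 20 per group are illustrative. When only the significant runs are "published," the literature's average estimate lands far above the truth.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    true_d, n = 0.2, 20          # small real effect, underpowered design
    published = []

    for _ in range(20_000):
        control = rng.normal(0.0, 1.0, n)
        treated = rng.normal(true_d, 1.0, n)
        _, p = stats.ttest_ind(treated, control)
        if p < 0.05:
            # Keep effect estimates from "significant" studies only,
            # mimicking a literature shaped by publication bias.
            pooled = np.sqrt((treated.var(ddof=1) + control.var(ddof=1)) / 2)
            published.append((treated.mean() - control.mean()) / pooled)

    print(f"True effect:             d = {true_d}")
    print(f"Mean published estimate: d = {np.mean(published):.2f}")  # far above 0.2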

What Has Changed

The replication crisis has prompted significant reforms:

  • Pre-registration of hypotheses and analysis plans (e.g., on the Open Science Framework)
  • Registered Reports, where journals accept papers based on the research question and methods before data are collected
  • Greater emphasis on effect sizes and confidence intervals rather than p-values alone
  • Increased publication of replication studies and null results
  • Adoption of open data and open materials practices

Implications for Cross-Domain Thinking

The replication crisis is itself a cross-domain pattern. The incentive structures that produce unreliable science (rewarding novelty over reliability, publication bias, researcher degrees of freedom) are instances of Goodhart's Law, moral hazard, and cobra effects — patterns that appear throughout this book. Understanding why science can go wrong makes you a better consumer of scientific claims in every domain.


4. Evaluating Evidence Quality

Not all evidence is created equal. A hierarchy of evidence quality helps you assess how much weight to give different claims.

The Evidence Hierarchy (from strongest to weakest)

  1. Systematic reviews and meta-analyses of multiple randomized controlled trials. These synthesize all available evidence on a question, weighting studies by quality and sample size.

  2. Randomized controlled trials (RCTs). Participants are randomly assigned to treatment or control groups, minimizing confounding variables. The gold standard for causal claims.

  3. Cohort studies. Researchers follow a group over time, comparing those exposed to a factor with those not exposed. Can establish association but not definitive causation (confounders may exist).

  4. Case-control studies. Researchers compare people with an outcome to people without it, looking for differences in exposure history. Subject to recall bias and confounding.

  5. Cross-sectional surveys. A snapshot of a population at one time point. Can identify correlations but cannot establish temporal ordering or causation.

  6. Case reports and expert opinion. Individual observations and professional judgment. Valuable for generating hypotheses but not for testing them.

Key Questions for Evaluating Any Claim

Regardless of where it falls in the hierarchy, ask:

  • Is the sample representative? Results from college students (the WEIRD problem — Western, Educated, Industrialized, Rich, Democratic) may not generalize to other populations.
  • Is there a plausible mechanism? A correlation without a plausible causal mechanism should be treated as preliminary. Correlation does not imply causation — but a correlation with a plausible mechanism is stronger evidence.
  • What is the effect size? Statistical significance with a tiny effect size suggests the finding may be real but unimportant.
  • Has it been replicated? Independent replication is the strongest evidence that a finding is robust.
  • Who funded the research? Industry-funded research systematically finds more favorable results for the funder's products. This is not always fraud — it can result from subtle choices in study design.
  • What would change your mind? If the authors do not specify what evidence would falsify their claims, be cautious. Unfalsifiable claims are not scientific.

Special Considerations for Cross-Domain Claims

This book makes claims about patterns that appear across domains. Evaluating such claims requires additional scrutiny:

  • Is the cross-domain pattern genuinely structural, or merely superficial? Two phenomena can look similar without sharing underlying mechanisms. (Chapter 1 addresses this distinction.)
  • Has the pattern been tested in the target domain, or only imported from the source domain by analogy? An analogy is a hypothesis, not a proof.
  • Are the boundary conditions the same? A pattern that holds in physics may not hold in social systems because social agents can learn, adapt, and respond to the pattern itself (reflexivity).


5. Designing Your Own Cross-Domain Investigations

One of this book's goals is to train you to spot patterns yourself. Here is a methodology for doing so rigorously.

Step 1: Notice the Pattern

The process begins with a hunch — a feeling that something in one domain resembles something in another. At this stage, do not censor yourself. Record the observation:

  • What is the source domain (where you first noticed the pattern)?
  • What is the target domain (where you think it might also appear)?
  • What specific structural feature do you think is shared?

Step 2: Specify the Analogy Precisely

Vague analogies are useless. Map the correspondence explicitly, as in the template sketched after this paragraph:

  • What corresponds to what? (e.g., "genes correspond to memes," "temperature corresponds to social volatility")
  • What relationships are preserved? (e.g., "higher mutation rate corresponds to higher innovation rate")
  • What does not correspond? (e.g., "genes are passively selected; memes are actively chosen")

Identifying the limits of the analogy is as important as identifying the correspondences.
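
One way to enforce this discipline is to write the map down as a structured object. The sketch below is plain Python; the entries simply echo the genes/memes example above, and the field names are invented. It is a template, not a tool.

    # A template for an explicit analogy map (entries echo the example above;
    # field names are invented for illustration).
    analogy = {
        "source_domain": "evolutionary biology",
        "target_domain": "cultural transmission",
        "correspondences": {
            "genes": "memes",
            "mutation rate": "innovation rate",
        },
        "preserved_relationships": [
            "higher mutation rate corresponds to higher innovation rate",
        ],
        "known_breakdowns": [
            "genes are passively selected; memes are actively chosen",
        ],
    }

    # An empty 'known_breakdowns' list is itself a red flag: every analogy
    # breaks down somewhere (Step 4), and not knowing where means it has
    # not yet been tested.
    assert analogy["known_breakdowns"], "specify where the analogy fails"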

Step 3: Search for the Pattern in the Target Domain

Now test whether the pattern actually appears in the target domain:

  • Look for independent evidence. Has anyone in the target domain noticed this pattern without reference to the source domain? Independent discovery of the same pattern is strong evidence.
  • Look for quantitative signatures. If the source domain pattern produces a power-law distribution, does the target domain data also follow a power law? If the source pattern produces oscillations with a specific period, do you see similar oscillations? (The sketch after this list shows one crude signature check.)
  • Look for the mechanism. Is there a plausible process in the target domain that would generate the pattern? Pattern similarity without mechanistic similarity is weaker evidence.
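
For one concrete signature check, the sketch below, assuming Python with NumPy and using synthetic data in place of your target-domain measurements, fits a line to the log-log complementary CDF. A straight line is suggestive of a power-law tail, not proof: lognormal and other heavy-tailed distributions can mimic power laws over limited ranges, and rigorous fitting uses maximum likelihood rather than least squares.

    import numpy as np

    def tail_exponent_sketch(values):
        """Crude power-law check: slope of the log-log complementary CDF.
        Suggestive only; rigorous methods use maximum-likelihood fits."""
        x = np.sort(np.asarray(values, dtype=float))
        ccdf = 1.0 - np.arange(len(x)) / len(x)   # empirical P(X >= x)
        slope, _ = np.polyfit(np.log(x), np.log(ccdf), 1)
        return -slope  # estimated tail exponent, if the fit looks linear

    # Synthetic demo: a Pareto sample with true exponent 2 (illustrative).
    rng = np.random.default_rng(4)
    demo = rng.pareto(2.0, 10_000) + 1.0
    print(f"Estimated tail exponent: {tail_exponent_sketch(demo):.2f}")  # ~2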

Step 4: Test the Analogy's Limits

Every analogy breaks down somewhere. The valuable question is: where?

  • Under what conditions does the pattern appear in the source domain? Do those conditions hold in the target domain?
  • Are there cases in the target domain where the pattern should appear but does not? These negative cases are highly informative.
  • Does the target domain have features (e.g., reflexivity, intentionality, regulation) that might modify or suppress the pattern?

Step 5: Seek Falsification, Not Confirmation

The greatest danger in cross-domain pattern recognition is confirmation bias — you will find what you are looking for if you look hard enough. Counteract this by:

  • Actively searching for cases where the pattern fails
  • Asking others to critique the analogy
  • Formulating a specific prediction that the analogy implies, then checking whether it holds
  • Being willing to abandon the analogy if evidence warrants

Step 6: Communicate with Epistemic Humility

When presenting a cross-domain pattern, be clear about:

  • The level of evidence (anecdotal observation, quantitative analysis, formal model)
  • The limits of the analogy (where it breaks down)
  • The alternative explanations (could the similarity be coincidental?)
  • What would falsify the claim


6. Avoiding Common Methodological Pitfalls

Confirmation Bias

You will see the patterns you expect to see. The antidote is deliberately seeking disconfirmation. For every example that supports the pattern, search for a counterexample. If you cannot find one, you may not be looking hard enough.

Survivorship Bias (Chapter 37)

You may identify a pattern only among successful cases, missing the many failures that exhibited the same pattern but with different outcomes. Always ask: "What am I not seeing? What cases have been filtered out?"

The Streetlight Effect (Chapter 35)

You may look for patterns only in domains where data are easily available, missing the most important instances in data-poor domains. Be aware of what data are missing, not just what data are present.

Narrative Capture (Chapter 36)

Once you have a compelling story about a pattern, you may unconsciously distort new evidence to fit the narrative. Write down your hypothesis before examining evidence, and note when evidence surprises you.

Overfitting (Chapter 14)

With enough ingenuity, you can find a "pattern" connecting any two phenomena. The more parameters you adjust and the more examples you cherry-pick, the more likely you are to overfit. Prefer simple, parsimonious patterns over elaborate, multi-step ones.

Confusing Correlation with Causation

Two phenomena may exhibit the same pattern because of a shared underlying cause, not because one causes the other or because there is a deep structural connection. Consider: power laws appear in earthquake sizes, word frequencies, and city sizes. This does not mean earthquakes cause cities. The shared pattern may reflect a common mathematical mechanism (multiplicative processes, preferential attachment) without implying a causal or structural link between the domains.
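
A minimal simulation, assuming Python with NumPy and invented growth rates, shows how little machinery such a mechanism needs. Independent multiplicative growth alone produces an extremely skewed, heavy-tailed size distribution (strictly a lognormal; with additional ingredients such as a lower barrier it becomes a true power law), with no causal link between the units at all.

    import numpy as np

    rng = np.random.default_rng(5)
    n_units, n_steps = 10_000, 200

    # Every unit starts equal and grows by an independent random factor
    # each step: x_new = x_old * factor. No unit influences any other.
    sizes = np.ones(n_units)
    for _ in range(n_steps):
        sizes *= rng.lognormal(0.0, 0.1, n_units)

    # The result is enormously skewed: a few giants, many tiny units.
    print(f"Median size: {np.median(sizes):.2f}")
    print(f"Max size:    {sizes.max():.2f}")
    top_1pct = np.sort(sizes)[-n_units // 100:]
    print(f"Top 1% share of total: {top_1pct.sum() / sizes.sum():.1%}")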

The Fallacy of Misplaced Precision

Cross-domain analogies are inherently approximate. Demanding exact quantitative correspondence between domains is usually inappropriate. The value of cross-domain thinking is in qualitative structural insights (e.g., "this system has a positive feedback loop that will produce exponential growth until a constraint intervenes"), not in precise numerical predictions transplanted from one domain to another.

Reflexivity in Social Systems

Patterns discovered in physical and biological systems operate on substrates that do not know they are being studied. Social systems are different: when people learn about a pattern (e.g., the cobra effect), they can consciously avoid it or, worse, exploit it. This means social-domain patterns are inherently less stable than natural-science patterns and must be treated as tendencies, not laws.


Summary: A Cross-Domain Research Checklist

Before accepting or asserting a cross-domain pattern, work through these questions:

  1. Pattern identification: What specific structural feature appears in both domains?
  2. Mechanism: Is there a plausible process that would generate the pattern in each domain independently?
  3. Precision: Have you specified the analogy precisely enough to be tested and potentially falsified?
  4. Evidence quality: What kind of evidence supports the pattern in each domain (anecdote, correlation, experiment, mathematical proof)?
  5. Boundary conditions: Under what conditions does the pattern hold, and under what conditions does it fail?
  6. Alternative explanations: Could the apparent similarity be coincidental, or explained by a simpler mechanism?
  7. Falsification: What evidence would convince you the pattern is not real?
  8. Limits: Where does the analogy break down, and what does the breakdown tell you?
  9. Reflexivity: If the pattern involves human systems, how does awareness of the pattern change the pattern?
  10. Humility: Have you communicated the pattern with appropriate caveats about confidence level and limitations?

This checklist will not make you infallible. But it will make you calibrated — aware of how much confidence your evidence warrants, and transparent about what you know and what you are guessing.


For a deeper treatment of research methodology, see the Further Reading sections of Chapters 10, 14, and 35, and the Tier 1 sources in the Bibliography.