Appendix B: Research Methods Primer

How to Read Psychology and Social Science Studies

Throughout this textbook, we cite studies — experiments, surveys, longitudinal analyses, simulations. We claim these studies support particular conclusions about luck, probability, and human behavior. But how much should you trust any given study? How do you know when a "study shows..." headline is worth updating your beliefs, and when it is oversimplified, cherry-picked, or outright misleading?

This appendix gives you the tools to evaluate research claims yourself. You do not need to become a statistician to read studies critically. You need a working understanding of how evidence is gathered, how claims can be distorted, and what questions to ask.

These skills are directly relevant to luck science because claims about lucky behaviors, lucky personality traits, and luck-increasing strategies are often based on research that is misreported in popular media. The difference between a study that supports a conclusion and a news article that claims a study supports a conclusion can be enormous.


Part 1: The Hierarchy of Evidence

Not all evidence is created equal. Research methods exist on a spectrum from "very weak" to "very strong" depending on how well they can establish that one thing causes another, control for alternative explanations, and generalize to the real world.

Level 1: Anecdote and Personal Testimony

What it is: One person's story about what happened to them. "My friend became lucky after starting a gratitude journal."

What it can do: Generate hypotheses, illustrate concepts, make abstract ideas concrete.

What it cannot do: Establish causation, control for any other explanations, generalize to other people. The friend might have changed jobs, started exercising, stopped isolating — the gratitude journal may have had nothing to do with the change.

In luck science: Anecdotes appear throughout this textbook as illustration, not evidence. When we say "Priya found that building weak ties changed her job search," we are using a fictional example to make a concept vivid — not to prove the concept. The actual evidence is in the studies cited.

The Survivorship Bias Warning: Most motivational anecdotes about luck are survivorship-biased. You hear the story of the person who journaled and got lucky; you don't hear the story of the ten people who journaled and got... roughly the same luck they would have had anyway.


Level 2: Case Study

What it is: In-depth examination of one or a small number of cases — a company, a community, a person — typically combining interviews, observation, and documentary analysis.

What it can do: Provide rich contextual understanding, generate detailed hypotheses, reveal mechanisms and process, document unusual phenomena.

What it cannot do: Establish generalizability (n=1 cannot tell you about the population), or establish causation (no comparison group, no random assignment).

Example in luck science: James Austin's analysis of scientific serendipity (Chapter 29) draws heavily on case studies of individual discoveries. These cases are instructive and hypothesis-generating, but we cannot claim from them that all scientists who tried the same behaviors would have the same serendipitous discoveries.

When case studies are strong: Case studies become more persuasive when (a) many independent cases show the same pattern, (b) the mechanism is clearly described, and (c) the case is not atypical in obvious ways.


Level 3: Survey/Correlational Study

What it is: Measuring two or more variables across a group of people (a sample) and examining whether they are statistically related (correlated).

What it can do: Establish that two variables are related, estimate the size of that relationship, and identify patterns across large populations.

What it cannot do: Establish causation. Correlation is not causation. When we find that people who keep luck journals report more lucky events, we cannot conclude from a survey alone that journaling caused the increased luck perception. Perhaps people who naturally notice more opportunities are also more likely to keep journals.

The confound problem: A confounding variable is a third variable that influences both the predictor and the outcome, creating an apparent relationship between them. Coffee consumption correlates with heart disease — but coffee drinkers also smoke more, and smoking is the confounder.

In luck science: Wiseman's findings about "lucky personality traits" (Chapter 12) are largely correlational — self-identified lucky people tend to have open body language, large networks, and positive expectations. This correlation does not prove that open body language causes luck; it's possible that some third factor (temperament, early social experience) causes both.

How to evaluate: Ask: "What else could explain this relationship?" If you can easily identify confounders that the study didn't control for, treat the finding with appropriate skepticism.


Level 4: Quasi-Experiment

What it is: A study that exploits a naturally occurring assignment that approximates random assignment. Classic examples include natural experiments (a policy change affects some regions but not others) and difference-in-differences designs (comparing groups before and after an intervention).

What it can do: Substantially reduce confounding because the "assignment" to conditions is determined by external factors rather than self-selection.

Example: Goldin and Rouse's blind audition study (Appendix A, Study 27) is a quasi-experiment: orchestras adopted screens at different times, creating a natural experiment in which the "treatment" (blind audition) was not selected by the individuals being evaluated.

Limitations: The external factor causing assignment may itself be related to outcomes in unknown ways. Quasi-experiments are stronger than correlational studies but weaker than true experiments.

In luck science: The Corak/Chetty mobility studies use natural variation in geographic location as a quasi-experimental variable. Children in different zip codes have "different luck" due to where their parents happened to live, and the outcomes associated with different locations can be estimated.


Level 5: Randomized Controlled Experiment

What it is: Participants are randomly assigned to different conditions (treatment vs. control), and outcomes are compared. Random assignment means that, on average, the groups are identical on all measured and unmeasured characteristics before the intervention — so any difference afterward can be attributed to the treatment.

What it can do: Establish causation. The gold standard for demonstrating that an intervention works.

Limitations: Many important questions cannot be studied experimentally for ethical or practical reasons (you cannot randomly assign people to grow up in poverty). Laboratory experiments may not generalize to real-world conditions. Short-term effects may not predict long-term outcomes.

Examples in luck science: The Salganik, Dodds, and Watts music lab experiment (Appendix A, Study 3) is a true experiment — participants were randomly assigned to the social influence or independent conditions, allowing causal inference about social influence effects. Emmons and McCullough's gratitude studies (cited in Chapter 16) are randomized experiments demonstrating causal effects of gratitude practices on well-being.


Level 6: Meta-Analysis and Systematic Review

What it is: A statistical synthesis of all available studies on a topic, combining results to produce more reliable overall estimates of effect sizes and to identify moderating conditions.

What it can do: Provide the most reliable estimates of effect sizes; identify inconsistencies across studies that point toward moderating factors; detect and partially correct for publication bias through funnel-plot analysis and trim-and-fill methods.

Limitations: Meta-analyses are only as good as the studies they include. If all available studies are methodologically weak, a meta-analysis of them is still weak (GIGO: garbage in, garbage out). Publication bias is a serious concern.

In luck science: Meta-analyses of mindset interventions (relevant to Chapter 13), gratitude practices (Chapter 16), and social capital returns (Chapter 21) exist and are cited. They generally find smaller effect sizes than the original studies and more conditionality — the interventions work, but not uniformly and not as dramatically as popular accounts suggest.

Reading a meta-analysis: Look for the confidence intervals on effect sizes (wide intervals indicate uncertainty), the I² statistic (heterogeneity across studies — high I² means effects vary widely by context and condition), and funnel plot asymmetry (suggests publication bias).
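
For readers who want to see where I² comes from, here is a minimal sketch in Python of the standard Higgins computation, using made-up study-level effect sizes and variances rather than any real meta-analytic data set:

    # Fabricated study-level effect sizes (Cohen's d) and their sampling variances.
    effects   = [0.42, 0.15, 0.30, 0.55, 0.08]
    variances = [0.02, 0.04, 0.03, 0.05, 0.02]

    # Fixed-effect weights are inverse variances; the pooled estimate is a weighted mean.
    weights = [1 / v for v in variances]
    pooled  = sum(w * d for w, d in zip(weights, effects)) / sum(weights)

    # Cochran's Q: weighted squared deviations of each study from the pooled estimate.
    q  = sum(w * (d - pooled) ** 2 for w, d in zip(weights, effects))
    df = len(effects) - 1

    # Higgins I^2: the share of total variation reflecting real between-study
    # differences rather than sampling error (0% = homogeneous, high = heterogeneous).
    i_squared = max(0.0, (q - df) / q) * 100
    print(f"pooled d = {pooled:.2f}, Q = {q:.2f}, I^2 = {i_squared:.0f}%")

High I² in a published meta-analysis is a cue to look for moderators — the effect may be real under some conditions and absent under others.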


Part 2: What p-Values Actually Mean

The p-value is one of the most misunderstood concepts in science. Let's get it right.

The Technical Definition

The p-value is the probability of observing results at least as extreme as those obtained, if the null hypothesis were true.

The null hypothesis is typically "there is no effect" or "there is no relationship."

Example: Wiseman tests whether lucky people notice more opportunities than unlucky people. The null hypothesis is: "Lucky and unlucky people notice opportunities at the same rate." He finds a difference, and the p-value is 0.03.

This means: if luck had no effect on opportunity noticing, the probability of observing a difference as large as the one observed (or larger) is 3%.
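
To make this concrete, here is a minimal sketch of a permutation test in Python. The scores and group sizes are invented for illustration — they are not Wiseman's data — but the logic is the point: pretend the null hypothesis is true by shuffling the group labels, and count how often the shuffled data produce a difference at least as large as the one actually observed.

    import random

    # Hypothetical opportunity-noticing scores (illustrative numbers only).
    lucky   = [7, 9, 6, 8, 10, 7, 9, 8, 6, 9]
    unlucky = [6, 7, 5, 8, 6, 7, 5, 6, 8, 7]

    observed_diff = sum(lucky) / len(lucky) - sum(unlucky) / len(unlucky)

    # Under the null hypothesis the labels are meaningless, so reshuffle them
    # many times and see how often chance alone produces a difference this big.
    pooled = lucky + unlucky
    n_lucky = len(lucky)
    n_sims = 100_000
    extreme = 0
    for _ in range(n_sims):
        random.shuffle(pooled)
        diff = (sum(pooled[:n_lucky]) / n_lucky
                - sum(pooled[n_lucky:]) / (len(pooled) - n_lucky))
        if diff >= observed_diff:        # "at least as extreme" (one-sided here)
            extreme += 1

    p_value = extreme / n_sims
    print(f"observed difference: {observed_diff:.2f}, one-sided p ≈ {p_value:.3f}")

The printed value answers exactly one question: in a world where the luck labels carry no information, how often would a gap this large appear by chance? It says nothing, by itself, about how likely the hypothesis is to be true or how large the effect is.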

What p-Values Do NOT Mean

  • A p-value of 0.03 does NOT mean there is a 97% chance the effect is real. It only describes the probability of the data given the null hypothesis, not the probability of the hypothesis given the data. These are very different things.

  • A p-value does NOT measure the size of the effect. A tiny, practically meaningless effect can produce a very small p-value with a large enough sample size.

  • "Not significant" does not mean no effect exists. It means the data did not provide strong enough evidence to reject the null hypothesis, which depends on sample size among other factors.

  • p < 0.05 is not a magic threshold. The 0.05 convention is arbitrary (set by Ronald Fisher in the 1920s), and many researchers argue for more stringent thresholds (0.01, 0.005) or the abandonment of threshold-based inference entirely.

The Conventional Threshold

Most psychology research uses p < 0.05 as the threshold for "statistical significance." This means: if the null hypothesis were true, results this extreme would occur less than 5% of the time. This is the evidence level that typically earns publication.

The problem: when researchers test many hypotheses, about 5% of the tests in which the null hypothesis is actually true will come out "significant" by chance alone. With enough research labs testing enough hypotheses, false positives accumulate — this is one root cause of the replication crisis.
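
A short simulation makes the arithmetic vivid. The sketch below (illustrative Python, not modeled on any particular study) runs many experiments in which the null hypothesis is true by construction and counts how many nonetheless cross the p < 0.05 line:

    import random
    from statistics import NormalDist, mean, stdev

    random.seed(1)
    normal = NormalDist()
    n, alpha, n_experiments = 100, 0.05, 2000
    false_positives = 0

    for _ in range(n_experiments):
        # Both groups are drawn from the SAME distribution, so the null is true.
        a = [random.gauss(0, 1) for _ in range(n)]
        b = [random.gauss(0, 1) for _ in range(n)]
        se = (stdev(a) ** 2 / n + stdev(b) ** 2 / n) ** 0.5
        z = (mean(a) - mean(b)) / se
        p = 2 * (1 - normal.cdf(abs(z)))       # two-sided z-test (adequate at n = 100)
        if p < alpha:
            false_positives += 1

    print(f"{false_positives} of {n_experiments} true-null experiments came out "
          f"'significant' ({false_positives / n_experiments:.1%})")

Expect a number near 5% — roughly 100 of the 2,000 experiments — which is exactly why a literature built only from "significant" results is misleading.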


Part 3: Statistical Significance vs. Practical Significance

A finding can be statistically significant (reliably detectable) without being practically significant (large enough to matter in real life).

Example from luck science: Suppose a study finds that daily luck journaling is associated with a statistically significant increase in self-reported positive events (p = 0.02). The effect size is d = 0.12.

  • Statistical significance: Yes. The result is unlikely to be chance.
  • Practical significance: An effect size of d = 0.12 is very small by conventional standards (small = 0.2, medium = 0.5, large = 0.8). It means the journaling group scores 0.12 standard deviations higher than the control group — which might not be noticeable in any individual's experience.

Why this matters for luck science: Many pop-psychology claims about luck-increasing behaviors are based on small effect sizes that are statistically reliable but individually inconsequential. Individually, any one technique probably produces a small effect. The textbook's argument is that combining multiple strategies produces a compound effect — and that the habits themselves (curiosity, network-building, resilience) are intrinsically valuable beyond their luck effects.


Part 4: Effect Sizes and Why They Matter

Effect sizes translate statistical findings into meaningful units that can be compared across studies.

Cohen's d

Used for comparing two group means. Calculated as the difference between group means divided by the pooled standard deviation; the short sketch after the benchmark list below shows what these d values mean in percentile terms.

  • d = 0.2: Small effect. The average treated person scores above 58% of the control group.
  • d = 0.5: Medium effect. The average treated person scores above 69% of the control group.
  • d = 0.8: Large effect. The average treated person scores above 79% of the control group.
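
Those percentile figures come straight from the normal distribution: if both groups are roughly normal with similar spread, the share of the control group that the average treated person exceeds is Φ(d), the standard normal cumulative distribution evaluated at d. A quick check in Python, including the d = 0.12 journaling example from Part 3:

    from statistics import NormalDist

    phi = NormalDist().cdf   # standard normal cumulative distribution function

    for d in (0.12, 0.2, 0.5, 0.8):
        # Share of the control group that the average treated person scores above,
        # assuming normal distributions with equal spread in both groups.
        print(f"d = {d:.2f}: average treated person exceeds {phi(d):.0%} of controls")

For d = 0.12 the answer is about 55% — barely better than the coin-flip 50% you would get with no effect at all.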

Pearson's r (Correlation Coefficient)

Used for relationships between continuous variables.

  • r = 0.1: Small correlation (1% of variance explained)
  • r = 0.3: Medium correlation (9% of variance explained)
  • r = 0.5: Large correlation (25% of variance explained)

The variance explained issue: Even a correlation of r = 0.5 leaves 75% of outcome variance unexplained. When a researcher says "weak tie density correlates r = 0.35 with job-finding success," they are saying this variable explains about 12% of the variance. Important! But 88% of outcome variance is still unexplained.
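
The squaring is the whole trick, and it is easy to check. A minimal sketch using the values in this section:

    # Variance explained is the square of the correlation coefficient.
    for r in (0.1, 0.3, 0.35, 0.5):
        print(f"r = {r:.2f}  ->  r^2 = {r ** 2:.1%} of outcome variance explained")

Doubling a correlation quadruples the variance it explains, which is why modest-looking differences between correlations matter more than they appear to.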

Odds Ratios

Used in studies with binary outcomes (yes/no). The odds of an outcome are p / (1 − p). An odds ratio of 1.5 means the odds of the outcome are 1.5 times as high in the treatment group as in the control group; when the outcome is rare, this is approximately the same as saying the outcome is 1.5 times as likely.

The Bertrand-Mullainathan (2004) audit study found an odds ratio of approximately 1.5 for callback rates: White-sounding names were about 50% more likely to receive callbacks than African American-sounding names.
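
Because odds and probabilities diverge as an outcome becomes common, it is worth seeing the computation once. The callback rates below are rounded, illustrative numbers chosen to give a ratio of about 1.5 — not the exact figures from the study:

    # Illustrative callback probabilities (not the study's exact numbers).
    p_white = 0.096   # callback rate, white-sounding names
    p_black = 0.064   # callback rate, African American-sounding names

    risk_ratio = p_white / p_black                                       # ratio of probabilities
    odds_ratio = (p_white / (1 - p_white)) / (p_black / (1 - p_black))   # ratio of odds

    print(f"risk ratio = {risk_ratio:.2f}  ({risk_ratio - 1:.0%} more likely to be called back)")
    print(f"odds ratio = {odds_ratio:.2f}  (close to the risk ratio because callbacks are rare)")

For a common outcome — say a 60% vs. 40% success rate — the odds ratio would be 2.25 while the risk ratio is only 1.5, so always check which one a paper is reporting.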


Part 5: The Replication Crisis and What It Means for Luck Research

What Happened

Beginning around 2008–2011, psychologists began systematically attempting to replicate landmark published findings. The results were alarming.

Open Science Collaboration (2015): Attempted to replicate 100 psychology studies from major journals. Only 39 of 100 replications found statistically significant effects at the same level as the original. Effect sizes were on average half those of the original studies.

Social priming research: A large cluster of studies showing that exposure to words related to old age made people walk more slowly, or that holding warm objects made people feel warmer toward strangers, has largely failed to replicate reliably.

Ego depletion: The finding that self-control is a limited resource that depletes with use — influential for years and central to many applied recommendations — has shown inconsistent replication.

Why It Happened

  1. Publication bias: Journals preferentially publish statistically significant results. Researchers who find nothing don't submit; reviewers reject non-significant findings. This creates a database of published findings that overrepresents lucky positive results.

  2. p-Hacking (researcher degrees of freedom): Researchers make many choices during analysis — which participants to exclude, which variables to control for, when to stop collecting data. Each choice can push results toward significance. When researchers make these choices guided by whether results look significant, the published p-value no longer has the meaning it claims.

  3. Small samples: Many classic studies used samples of 20–50 participants. Small samples produce highly variable effect estimates. The same study run twice on small samples will often produce different results — not because the effect isn't real, but because sampling variance is high. The sketch after this list shows how wide those swings can be.

  4. File drawer effect: Studies that fail to find significant effects are published less often. The literature is a biased sample of all conducted research.
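
To see point 3 in action, here is a small simulation (illustrative Python; the "true" effect of d = 0.3 is fixed by construction) that reruns the same study many times at two sample sizes and records how much the estimated effect bounces around:

    import random
    from statistics import mean, stdev

    random.seed(7)
    TRUE_D = 0.3   # the real effect size, by construction

    def estimated_d(n):
        """Run one study with n participants per group; return the estimated Cohen's d."""
        control = [random.gauss(0, 1) for _ in range(n)]
        treated = [random.gauss(TRUE_D, 1) for _ in range(n)]
        pooled_sd = ((stdev(control) ** 2 + stdev(treated) ** 2) / 2) ** 0.5
        return (mean(treated) - mean(control)) / pooled_sd

    for n in (25, 400):
        estimates = [estimated_d(n) for _ in range(1000)]
        print(f"n = {n:>3} per group: estimates span roughly "
              f"{min(estimates):+.2f} to {max(estimates):+.2f}, "
              f"sd of estimates = {stdev(estimates):.2f}")

At 25 participants per group, individual studies routinely miss the true effect by a factor of two in either direction — and sometimes get the sign wrong — while at 400 per group the estimates cluster tightly around 0.3.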

What This Means for Luck Research

Be especially cautious about:

  • Single studies with small samples (n < 100)
  • Studies that have not been independently replicated
  • Large, surprising effect sizes from a single lab
  • Findings that align too neatly with popular narratives

Be more confident about:

  • Findings that replicate across multiple labs and cultures
  • Findings supported by meta-analyses using rigorous inclusion criteria
  • Findings with theoretically coherent mechanisms
  • Findings with large effect sizes in well-powered studies

Specific to this textbook: The foundational findings this textbook relies upon most heavily — the structure of weak ties (Granovetter), the small-world network model (Watts & Strogatz), prospect theory (Kahneman & Tversky), survivorship bias mechanics, and the relative age effect — are among the most robustly replicated findings in social science. More speculative findings about specific luck-increasing interventions are presented with appropriate hedging in the main text.


Part 6: How to Evaluate a Study

When you encounter a study claim — in a news article, social media post, or even this textbook — ask these questions in order.

Step 1: What is the actual claim?

Popular science writers routinely overstate what a study shows. "Scientists discover that lucky people smile more" might be based on a study that found a correlation of r = 0.15 between self-reported smiling frequency and self-reported lucky events, in a sample of 87 university students. That is not "discovery." That is a weak, preliminary correlation.

Find the original paper (see Part 8 on finding primary sources). Read the abstract and conclusion, not just the news coverage.

Step 2: What was the design?

  • Experiment, survey, case study, simulation, or meta-analysis?
  • Was there a control group?
  • Were participants randomly assigned?
  • Was the study pre-registered (researchers publicly declared their hypothesis and analysis plan before collecting data, preventing post-hoc hypothesis generation)?

Step 3: Who was the sample?

  • How many participants?
  • Where were they recruited (WEIRD populations — Western, Educated, Industrialized, Rich, Democratic — are overrepresented in psychology research)?
  • Are they representative of the people the conclusion is claimed to apply to?

Step 4: What was actually measured?

Is the measurement valid — does it capture what it claims to capture? Self-report measures of "luck" are very different from behavioral measures or objective outcome records. "Lucky events" counted by participants are filtered through perception and memory, which are subject to the biases the textbook describes.

Step 5: How large was the effect?

Look for Cohen's d, Pearson's r, odds ratios, or explained variance. Is the effect size reported? Is it practically meaningful?

Step 6: Was it replicated?

Has the finding been independently replicated? Has it survived meta-analysis? What does the replication record look like?

Step 7: What are the alternative explanations?

For correlational studies: what confounders haven't been controlled? For experimental studies: were there demand characteristics (participants guessing the hypothesis and behaving accordingly)? Were there experimenter effects?


Part 7: Red Flags in Science Reporting

The following phrases and patterns in news articles should trigger skepticism:

"Scientists have discovered..." — Science rarely discovers anything in a single stroke. It accumulates evidence, refines estimates, and occasionally overturns previous conclusions.

"A new study shows..." — A single study, especially one that is unreplicated or not yet peer-reviewed, provides very weak evidence. Wait for replication.

"People who X are more likely to Y" — This is almost always a correlational claim presented without the confounders, effect sizes, or sample information that would allow you to assess it.

"All you need to do is..." — Interventions in complex systems like human psychology rarely produce guaranteed outcomes.

Percentage claims without base rates: "X doubles your risk of Y!" What is the base rate of Y? If Y happens to 1 in 10,000 people, doubling the rate means 2 in 10,000 — still very rare.

No sample size mentioned: Small samples produce unreliable estimates.

No effect size mentioned: Statistical significance without effect size is almost meaningless.

Only one study: Find out if it's been replicated.

The study was done on college students: Many psychology findings from WEIRD university student samples do not generalize to broader populations.


Part 8: How to Find Primary Sources

Google Scholar

scholar.google.com — Search by author name, paper title, or keywords. Often finds free full-text versions through institutional repositories or author websites. Look at "Cited by" count for a rough indication of influence (though not quality).

PubMed

pubmed.ncbi.nlm.nih.gov — Biomedical and psychological research. Free to search; many papers available as free full text.

PsycINFO

Database of psychology research available through most university library systems. More comprehensive than Google Scholar for psychology.

JSTOR

jstor.org — Access to many academic journals. Free access to some content; subscription required for full access through libraries.

Unpaywall

unpaywall.org — Browser extension that automatically finds legally free full-text versions of academic papers. An excellent tool for students without institutional access.

SSRN and PsyArXiv

ssrn.com and psyarxiv.com — Preprint servers where researchers post papers before or during peer review. These papers are not yet peer-reviewed, so evaluate them accordingly.

Research Tips

  • When you find one good paper, look at its reference list for older foundational work.
  • Use the "Cited by" function to find newer papers that build on a finding.
  • Look for the lead researcher's lab or personal website — they often post free full-text papers.
  • If a news article cites a study, the article usually includes a link or the journal name. Track it down.


Part 9: A Worked Example

The claim (from a hypothetical news article): "Researchers found that people who make eye contact with strangers get 73% more unexpected opportunities than those who avoid eye contact. Lucky people naturally make more eye contact."

Let's evaluate this claim using the framework above.

Step 1: Find the original study. Search Google Scholar for "eye contact opportunities luck" and related terms. Suppose we find: Davis, M., & Chen, L. (2021). Eye gaze and opportunity sensitivity: A correlational study. Journal of Personal Development, 12(3), 44–58.

Step 2: Read the abstract. The study measured self-reported eye contact frequency (via survey) and self-reported unexpected positive opportunities (via weekly diary over 4 weeks) in 94 participants recruited via social media.

Step 3: Evaluate the design. This is a correlational study. No control group. No random assignment. Both key variables (eye contact and opportunities) are self-reported. Self-report measures are subject to response bias — people who are generally optimistic and socially engaged may both report more eye contact AND more opportunities, with neither causing the other.

Step 4: Check the sample. n = 94 participants, recruited via social media. This is a small, self-selected, probably young, tech-using, likely WEIRD sample. The study can make claims about this group; generalization to "people" broadly is unsupported.

Step 5: Find the effect size. Suppose the paper reports r = 0.29 between eye contact frequency and opportunity count. This is a small-to-medium correlation. It means eye contact frequency explains 8.4% of variance in reported opportunities. The news article's "73% more" is almost certainly a misstatement of the finding — perhaps comparing the highest vs. lowest eye contact quintiles, which is not the same as the average effect.

Step 6: Was it replicated? This is a 2021 paper. Google Scholar shows 7 citations, none of which are direct replications. The study has not been independently replicated.

Step 7: Alternative explanations. People who naturally make more eye contact may be more extroverted, which would cause them to have more social encounters generally, and to reinterpret interactions as opportunities more readily. The eye contact itself might not be the causal mechanism.

Verdict: The headline is dramatically overstated. The study finds a small-to-medium correlation between self-reported eye contact and self-reported opportunities in a small, self-selected sample, with no replication, no causal design, and substantial confounders. This is hypothesis-generating evidence — it suggests eye contact might be worth studying more carefully — but it does not support the headline claim.

The appropriate response: Keep the behavioral recommendation (eye contact does seem to help social interactions, based on a broader literature), but recognize that it rests on a converging pattern of evidence, not on this single study.


Conclusion: What Good Luck Research Looks Like

The studies in this textbook that deserve highest confidence share these properties:

  1. Multiple independent replications across different labs, cultures, and time periods.
  2. Pre-registration or at least publication prior to the replication crisis.
  3. Plausible mechanisms that connect to established principles in psychology, economics, or sociology.
  4. Moderate to large effect sizes in well-powered samples.
  5. Consistency with what we observe in the world — the finding makes sense given everything else we know.

No study meets all these criteria. But the more criteria a finding meets, the more you should update your beliefs based on it. And the fewer criteria it meets, the more you should treat it as a hypothesis worth keeping in mind while waiting for more evidence.

This is, appropriately, how a scientist thinks about luck: with calibrated uncertainty, proportional to the evidence.


Recommended further reading:

  • Spiegelhalter, D. (2019). The Art of Statistics.
  • Kahneman, D. (2011). Thinking, Fast and Slow.
  • Gigerenzer, G. (2002). Calculated Risks.