Appendix D: Research Methods Primer — How Psychology Studies Work


Applied psychology readers encounter research constantly: in chapters, further reading sections, news articles, and popular books. The ability to read research critically — to know what a study can and cannot tell you — is as practical a skill as any of the frameworks in the book itself.

This primer covers the core concepts you need to evaluate psychological research. It is not a statistics textbook; it is a conceptual toolkit.


The Central Question: What Can This Study Tell Me?

Every piece of psychological research produces some kind of claim. The first question to ask is: What kind of claim is this, and what kind of evidence would justify it?

The most important distinction is between correlation and causation.

Correlation vs. Causation

A correlation means that two variables tend to co-vary — when one goes up, the other tends to go up (positive correlation) or down (negative correlation). Correlation does not establish that one variable caused the other.

Example: Studies find that people who use social media more tend to report lower wellbeing. This is a correlation. It is consistent with several possible explanations:

  1. Social media use causes reduced wellbeing (the popular interpretation)
  2. Lower wellbeing causes more social media use (the reverse causal direction)
  3. Some third variable causes both (e.g., depression drives both more social media use and lower reported wellbeing)
  4. All three operate simultaneously

A single correlational study cannot distinguish between these explanations. Only experimental evidence (or very sophisticated longitudinal designs) can begin to establish direction.

This is not a reason to dismiss correlational research. Many important findings are correlational and remain practically useful. But it is a reason to be precise about what the evidence supports.
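A short simulation makes the third-variable problem concrete. The numbers below are invented for illustration (a hypothetical "depression" confounder with arbitrary weights of ±0.5); the point is only that two variables with no direct causal link can still correlate strongly.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000

# Hypothetical confounder: "depression" drives both variables.
# Social media use has NO direct effect on wellbeing in this model.
depression = rng.normal(0, 1, n)
social_media_use = 0.5 * depression + rng.normal(0, 1, n)
wellbeing = -0.5 * depression + rng.normal(0, 1, n)

r = np.corrcoef(social_media_use, wellbeing)[0, 1]
print(f"r = {r:.2f}")  # a clearly negative correlation despite no causal link
```

A cross-sectional survey of this simulated population would find the familiar negative correlation, even though intervening on social media use here would change nothing about wellbeing.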


Study Designs: What Each Can Tell You

Cross-Sectional Studies

Measure multiple variables in a single sample at a single point in time.

What they can tell you: Whether variables are associated (correlated).

What they cannot tell you: Which direction causation runs; whether the association holds over time; whether association in one sample generalizes to other populations.

Example: Survey 500 adults about social media use and wellbeing. Find that higher use correlates with lower wellbeing scores.

Limitation: The sample was selected at one moment; causal direction unknown; many possible confounds.


Longitudinal Studies

Follow the same people over time, measuring the variables of interest at multiple points.

What they can tell you: Whether changes in Variable A precede changes in Variable B (temporal precedence, a necessary but not sufficient condition for causality); whether associations are stable across development; patterns of change over time.

What they cannot tell you: With certainty whether Variable A caused Variable B (unmeasured variables could still produce both).

Example: Follow 1,000 adolescents from age 12 to 18, measuring social media use and wellbeing annually. Find that increases in social media use in one year predict decreases in wellbeing in the following year (even controlling for initial wellbeing).

Improvement over cross-sectional: The temporal direction is established. The limitation remains that unmeasured third variables could explain the relationship.


Randomized Controlled Trials (RCTs)

Randomly assign participants to conditions (e.g., treatment vs. control). The randomization means that, on average, participants in both conditions are equivalent on all variables — measured and unmeasured — before the treatment begins.

What they can tell you: Whether the treatment caused a difference in the outcome.

What they cannot tell you: Whether the effect generalizes beyond the sample studied; what the long-term effects are (if the trial is short); what the mechanism is (why the treatment works).

Example: Randomly assign 200 participants to either a social media abstinence condition or a control condition for four weeks. Measure wellbeing before and after. Find that the abstinence group shows significantly greater wellbeing improvement.

Strength: The randomization controls for confounds. The difference in outcomes can be attributed to the treatment.

Limitation: Social media abstinence trials recruit specific samples; participants who volunteer for an abstinence study may be more motivated to change; the effect may not generalize to all users.
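Why randomization works can also be shown by simulation. The sketch below uses made-up numbers: an unmeasured "motivation" trait that would confound an observational comparison is, on average, balanced across groups by the coin flip alone.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Hypothetical baseline trait the researcher never measures
motivation = rng.normal(0, 1, n)

# Random assignment: each participant effectively flips a fair coin
treatment = rng.integers(0, 2, n)

# On average, the two groups are equivalent on the unmeasured trait,
# so a later outcome difference can be attributed to the treatment
diff = motivation[treatment == 1].mean() - motivation[treatment == 0].mean()
print(f"baseline difference on unmeasured trait: {diff:.2f}")
```

Any single randomization leaves small chance imbalances (which is what the statistical test accounts for), but no systematic ones, and this holds for every unmeasured variable at once.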


Meta-Analyses

Statistically combine the results from multiple studies on the same question.

What they can tell you: The average effect size across all studies; whether the effect is consistent or varies across study designs, populations, or contexts.

What they cannot tell you: Anything the underlying studies couldn't answer. If all underlying studies are cross-sectional, a meta-analysis of them is still cross-sectional.

Important concept: Publication bias. Studies finding significant effects are more likely to be published than studies finding null results. Meta-analyses that don't correct for publication bias may overestimate effect sizes.
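Publication bias can be simulated directly. The sketch below (invented parameters: true d = 0.2, 30 participants per group, 2,000 hypothetical studies) "publishes" only studies that reach p < .05 in the expected direction, and the published literature ends up reporting an average effect several times larger than the truth.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
true_d, n_per_group, n_studies = 0.2, 30, 2000

published = []
for _ in range(n_studies):
    a = rng.normal(true_d, 1, n_per_group)  # treatment group
    b = rng.normal(0.0, 1, n_per_group)     # control group
    t, p = stats.ttest_ind(a, b)
    d = (a.mean() - b.mean()) / np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    if p < 0.05 and d > 0:  # only "significant" positive results get published
        published.append(d)

print(f"true d = {true_d}, mean published d = {np.mean(published):.2f}")
```

Small, underpowered studies can only reach significance when sampling error happens to inflate the effect, so a file drawer of null results leaves behind a systematically exaggerated literature.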


Key Concepts for Evaluating Research

Effect Size

Effect sizes tell you how large the relationship or difference is — not just whether it is statistically significant.

Common effect size measures:

- Cohen's d (comparing two means): d = 0.2 small, 0.5 medium, 0.8 large
- r (correlation): r = 0.1 small, 0.3 medium, 0.5 large

Why effect size matters: A study can find a statistically significant relationship that is practically meaningless in magnitude. Orben and Przybylski's (2019) finding that screen time explained approximately 0.4% of the variance in adolescent wellbeing is a good example: statistically significant (given the large sample), practically trivial.
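Cohen's d is simple to compute by hand: the difference between two group means, divided by their pooled standard deviation. The wellbeing scores below are made-up toy data for illustration.

```python
import numpy as np

def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled standard deviation."""
    a, b = np.asarray(group_a, float), np.asarray(group_b, float)
    pooled_var = (
        (len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1)
    ) / (len(a) + len(b) - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

# Two hypothetical samples of wellbeing scores
treated = [14, 16, 15, 18, 17, 15, 16, 19]
control = [13, 14, 15, 14, 12, 15, 13, 16]

print(f"d = {cohens_d(treated, control):.2f}")  # → d = 1.50 in this toy data
```

Because d is in standard-deviation units, it can be compared across studies that used different measurement scales, which is what makes it the workhorse of meta-analysis.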


Statistical Significance vs. Practical Significance

p < .05 means: if the null hypothesis were true (no relationship), this result would occur by chance less than 5% of the time. It does NOT mean:

- The effect is large
- The finding will replicate
- The finding is practically important
- There is a 95% chance the hypothesis is correct

With large enough samples, trivially small effects become statistically significant. With small samples, moderately large effects may not reach significance.
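The sample-size point is easy to demonstrate. The sketch below (simulated data, with an arbitrarily chosen true correlation of r = .05) tests the same trivially small effect at two sample sizes: at n = 100 it usually fails to reach significance, while at n = 100,000 it is overwhelmingly "significant" despite being just as trivial.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

def simulate(n, r=0.05):
    """Draw two variables with a trivially small true correlation r, then test it."""
    x = rng.normal(0, 1, n)
    y = r * x + np.sqrt(1 - r**2) * rng.normal(0, 1, n)
    return stats.pearsonr(x, y)

for n in (100, 100_000):
    r_obs, p = simulate(n)
    print(f"n = {n:>7}: r = {r_obs:+.3f}, p = {p:.4g}")
```

The observed effect size barely changes across the two runs; only the p-value does. This is why effect size and significance must be read together.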


Replication

Psychology has experienced a major replication crisis since roughly 2011, when a series of high-profile findings failed to replicate in systematic attempts. The Reproducibility Project (Open Science Collaboration, 2015) found that only about 39% of the psychology findings it tested replicated.

What to look for:

- Has the finding been replicated by independent research groups?
- Was the replication study of comparable quality and sample size to the original?
- Have meta-analyses synthesized the literature, or is the finding based on a single influential study?

Red flags:

- Single study, never replicated
- Original study had a very small sample size (n < 50)
- Finding contradicts well-established adjacent findings
- Based primarily on self-report with no behavioral validation
- Priming studies (some of the most famous psychological priming effects have failed to replicate)


Sample and Generalizability

A study's conclusions are, strictly speaking, only about the sample studied. The question of generalizability — whether the findings extend to other populations — requires evidence.

WEIRD critique (Chapter 38): Most psychology research has been conducted on Western, Educated, Industrialized, Rich, Democratic samples. Henrich et al. (2010) demonstrated that WEIRD populations are often outliers on many psychological dimensions. Findings from WEIRD samples may not generalize to most of the world's population.

Convenience samples: College students have historically been psychology's most convenient sample. Findings from college student samples may not generalize to working adults, older adults, clinical populations, or non-Western populations.


Demand Characteristics and Experimenter Bias

Demand characteristics: Participants may respond in ways they believe the researcher expects or desires. Self-report measures are particularly vulnerable. When participants know the hypothesis, they may consciously or unconsciously respond to confirm it.

Experimenter bias: Researchers' expectations can subtly influence how they collect, analyze, or report data. Blind or double-blind designs reduce this risk.


Common Research Design Limitations in Practice

Self-report: Most psychological research relies on self-report questionnaires. Self-report is subject to: social desirability bias, memory distortion, introspective inaccuracy (people often don't know why they do what they do), and framing effects.

Causal inference from observational data: Sophisticated statistical techniques (structural equation modeling, propensity score matching, instrumental variables) can partially address confounding, but cannot fully substitute for randomization.

Ecological validity: Laboratory findings may not generalize to real-world conditions. Social psychology experiments often use artificial situations; it is unclear how the findings translate to naturalistic settings.


How to Read a Research Summary Critically

When encountering a research claim (in news, books, or this book), ask:

  1. What type of study is this? (RCT, longitudinal, cross-sectional, meta-analysis)
  2. What causal claim is being made, and does the study design support it?
  3. What is the effect size? (Small/medium/large; % variance explained)
  4. Has this finding been replicated? By whom?
  5. What is the sample, and does it match the population to which the claim is being applied?
  6. Who funded the study? (Industry-funded research has documented conflicts of interest in some fields)
  7. Is the popular report consistent with the study itself? (Science journalism often overstates findings; check the abstract when possible)

Effect Sizes in Practice

To calibrate what different effect sizes mean in the real world:

  r value   d value   Size     Example
  .10       .20       Small    Two people with the same height differ in weight by a few pounds
  .30       .50       Medium   Height difference between 15- and 16-year-old boys
  .50       .80       Large    Height difference between 13- and 18-year-old boys

Most psychological interventions produce effect sizes in the small-to-medium range (d = 0.2–0.5). Therapy meta-analyses typically show d ≈ 0.80 — which is genuinely large for a complex psychological outcome.

The Orben-Przybylski screen time finding (β ≈ .05, explaining < 1% of variance) is very small — smaller than the effect of wearing glasses or eating potatoes on wellbeing. The Haidt/Twenge counterargument is that the correlation is consistent, population-wide, and temporally aligned with smartphone adoption — even if the effect size per person is small, the aggregate effect at scale may be significant.

Both perspectives are honest engagements with the evidence. That is what good science literacy looks like.


A Note on Clinical vs. Statistical Significance

In clinical research, a finding can be statistically significant without being clinically meaningful. A therapy that reduces depression scores by 2 points on a 60-point scale might be statistically significant (in a large enough sample) without producing any clinically meaningful improvement in the patient's life.

Clinicians use the concept of minimal clinically important difference (MCID) — the threshold below which a change is not practically meaningful. Readers of research should apply the same standard: Is this effect large enough to matter in practice?


This primer covers the most practically relevant concepts. For deeper statistical understanding, the reading list includes introductory methods texts. The goal here is not statistical expertise but conceptual literacy — the ability to hold research claims with appropriate confidence and appropriate skepticism.

Recommended introductory reading:

- Leek, J. T., & Peng, R. D. (2015). What is the question? Science, 347(6228), 1314–1315. (A 2-page guide to research question types)
- Gelman, A., & Loken, E. (2014). The statistical crisis in science. American Scientist, 102(6), 460–465.
- Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251). (The replication crisis paper)