Appendix A: Research Methods Primer
This appendix is your companion reference for the methodological discussions woven throughout the book. Whether you are encountering terms like "effect size" or "quasi-experimental design" for the first time, or need a quick refresher before evaluating a study in Chapter 25, you can return here as often as you like. Research methods are not dry plumbing — they are the epistemological backbone of everything we claim to know about attraction, desire, and relationships. Understanding how knowledge is made is inseparable from understanding the knowledge itself.
Part 1: Research Design Types
Different research questions call for different tools. The following eight design types represent the core methodological vocabulary of attraction science.
1.1 Experimental Design
Definition. A true experiment randomly assigns participants to conditions, manipulates one or more independent variables, and measures the effect on a dependent variable. Random assignment is the key feature: it distributes known and unknown confounders roughly equally across groups, giving the researcher grounds to infer causation rather than mere association.
Example in attraction research. Dutton and Aron's (1974) classic bridge study randomly assigned male participants to be approached by an attractive female confederate on either a high-arousal suspension bridge or a low-arousal solid bridge. The independent variable was bridge type (high-arousal vs. low-arousal); the dependent variable was whether participants later called the confederate. The experiment allowed the researchers to infer that arousal — not some pre-existing trait — caused the difference in attraction.
Strengths. High internal validity (confidence that X caused Y). Replicable with standardized protocols. Allows causal inference.
Limitations. Often conducted in artificial laboratory settings (low external validity or ecological validity). Some variables of greatest interest — race, gender, attachment history — cannot be randomly assigned for ethical or practical reasons. Demand characteristics: participants may behave differently when they know they are being studied.
🧪 Methodology Note: The gold standard of the "true experiment" is often impractical for the most interesting questions in attraction science. You cannot randomly assign someone an anxious attachment style or a particular socioeconomic background. This is why attraction research relies heavily on the quasi-experimental and correlational designs below.
1.2 Quasi-Experimental Design
Definition. A quasi-experiment resembles a true experiment — it has a comparison condition and often a pre/post structure — but lacks full random assignment. Participants may be assigned to groups based on pre-existing characteristics (sex, age, nationality) or natural circumstances (a policy change, a pandemic).
Example. Before-and-after studies examining how the introduction of Tinder in a market changed self-reported loneliness or hookup frequency. Researchers compare regions before and after app availability, but they cannot randomly assign regions to "gets Tinder" vs. "doesn't."
Strengths. More feasible and ethical than true experiments in many contexts. Can approximate experimental logic when groups are well-matched.
Limitations. Selection bias — groups may differ in important ways before the study begins. Cannot rule out confounders as decisively as a randomized experiment.
1.3 Correlational Research
Definition. Correlational studies measure two or more variables and assess whether they vary together — do people with higher self-esteem report more dating satisfaction? — without manipulating anything. The result is a correlation coefficient (see Part 2 below).
Example. Measuring body symmetry (via caliper or photogrammetry) and self-reported attractiveness ratings across a sample. If more symmetric people tend to get rated as more attractive, the variables are positively correlated.
Strengths. Feasible for large samples. Can examine variables that cannot be experimentally manipulated. Captures real-world variation.
Limitations. The phrase "correlation does not imply causation" exists for a reason. A correlation between two variables A and B could mean A causes B, B causes A, a third variable C causes both, or the relationship is spurious (coincidental). Attraction researchers must be particularly careful not to reverse-engineer causal stories from correlational data.
1.4 Observational / Naturalistic Methods
Definition. Researchers observe behavior as it naturally occurs, without intervention. This may be overt (participants know they are being observed) or covert (they do not). Settings range from bars and speed-dating events to online platforms.
Example. Coding flirtation behaviors — gaze duration, smile frequency, body orientation, touch initiation — during real speed-dating events. The Okafor-Reyes Global Attraction Project's behavioral observation component is an example of structured naturalistic observation: trained coders use a standardized coding scheme in real social contexts.
Strengths. High ecological validity — behavior is genuine rather than lab-induced. Captures nuance and context.
Limitations. Observer effects (the Hawthorne effect): people change behavior when observed. Covert observation raises ethical questions around consent. Intercoder reliability must be established. Time and labor intensive.
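Cohen's kappa is one common way to quantify intercoder reliability: it corrects raw percentage agreement for the agreement two coders would reach by chance alone. A minimal Python sketch (the two coders' flirtation codes below are invented for illustration):

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders labeling the same items."""
    assert len(coder_a) == len(coder_b)
    n = len(coder_a)
    # Observed agreement: proportion of items both coders labeled identically.
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement expected from each coder's marginal label frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_e = sum(freq_a[k] * freq_b[k] for k in freq_a) / n ** 2
    return (p_o - p_e) / (1 - p_e)

# Hypothetical coding of ten interactions: 1 = "flirting", 0 = "not flirting"
a = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
b = [1, 0, 0, 1, 0, 1, 1, 1, 0, 1]
print(round(cohens_kappa(a, b), 2))  # → 0.58
```

Here the coders agree on 8 of 10 items (80%), but kappa is only .58 because much of that agreement could have arisen by chance; values above roughly .60 are conventionally treated as acceptable.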
1.5 Survey / Self-Report Methods
Definition. Participants complete questionnaires measuring attitudes, beliefs, behaviors, or traits. Scales may be Likert-type (1 = strongly disagree, 7 = strongly agree), forced-choice, or open-ended.
Example. The Experiences in Close Relationships — Revised (ECR-R; Fraley et al., 2000) measures anxious and avoidant attachment dimensions on two 18-item scales. The Sociosexual Orientation Inventory — Revised (SOI-R; Penke & Asendorpf, 2008) measures openness to casual sex.
Strengths. Efficient — large samples at low cost. Can measure internal states (attitudes, feelings) inaccessible by observation. Standardized scales allow cross-study comparison.
Limitations. Social desirability bias: respondents give answers they think look good, not what is true. Memory distortion: retrospective reports of past behavior are unreliable. Acquiescence bias: some people tend to agree with whatever is stated. Self-insight limitations: people often do not know why they behave as they do.
⚠️ Critical Caveat: Much of what we "know" about attraction comes from self-report surveys administered to convenience samples of undergraduate students. Keep this in mind every time you read a sentence beginning "people prefer…"
1.6 Qualitative Methods
Qualitative approaches prioritize depth, meaning, and context over numerical summarization. The three most common methods in relationship science are:
Interviews. Semi-structured or in-depth conversations that follow an interview guide but allow for exploration. A researcher interviewing queer individuals about navigating disclosure in dating contexts would use an interview to capture the nuance of individual experiences rather than collapsing them into a scale score.
Focus Groups. Group conversations in which several participants discuss a topic together, generating data through interaction. Useful for understanding shared cultural scripts — what does "leading someone on" mean in a particular community? — but can suppress minority viewpoints due to group dynamics.
Discourse / Thematic Analysis. Systematic analysis of language in texts, conversations, or media. Thematic analysis (Braun & Clarke, 2006) involves iterative coding to identify recurring themes. Discourse analysis examines how language constructs social reality — for instance, how the phrase "the friend zone" positions women as gatekeepers of access.
Strengths. Captures lived experience, cultural context, and meaning. Generates hypotheses for quantitative testing. Essential for studying populations and phenomena that do not fit standardized scales.
Limitations. Not easily generalizable to broader populations. Findings are interpretive — different researchers may code the same data differently. Time-intensive. Can be dismissed (unfairly) by quantitative researchers as "merely anecdotal."
1.7 Neuroimaging Methods
fMRI (functional Magnetic Resonance Imaging). Measures blood-oxygen-level-dependent (BOLD) signal as a proxy for neural activity. Studies have scanned participants while viewing photos of romantic partners, attractive strangers, or completing tasks related to social rejection. Key regions implicated in attraction include the ventral tegmental area (VTA), caudate nucleus, and nucleus accumbens — all components of the dopaminergic reward system — as well as the amygdala and prefrontal cortex.
EEG (Electroencephalography). Measures electrical activity at the scalp surface, millisecond by millisecond. High temporal resolution (it can capture fast, automatic responses) but poor spatial resolution compared to fMRI. Used to study the "N170" component linked to face processing, or frontal asymmetry associated with approach motivation.
Strengths. Objective, biological measurement that does not rely on self-report. Provides mechanistic insight into the neural architecture of desire.
Limitations. Small samples (scanner time is expensive). Reverse inference problem: the brain regions activated by "romantic love" overlap enormously with those activated by drug reward, gambling, and maternal love — concluding "this is romantic attraction" from a particular activation pattern requires caution. fMRI spatial resolution is still limited (one voxel contains ~1 million neurons). Many early neuroimaging findings in social psychology have not replicated.
🧪 Methodology Note: "Neuroessentialism" — the tendency to treat brain scans as more real or authoritative than behavioral or self-report data — is a cognitive bias, not a methodological principle. Brain data does not trump psychological data; it supplements it.
1.8 Meta-Analysis
Definition. A meta-analysis statistically aggregates results across multiple independent studies addressing the same research question. Rather than relying on any single study, it calculates a weighted average effect size across the literature, yielding a more stable estimate than any individual study alone.
Key concepts. The forest plot (visualized in Chapter 40) displays each study's effect size and 95% confidence interval. The diamond at the bottom represents the pooled estimate. Heterogeneity statistics (I²) indicate how much the effect sizes vary across studies — high heterogeneity may signal that a moderating variable is at work.
Example. Feingold's (1992) meta-analytic review of physical attractiveness and social outcomes aggregated dozens of studies, finding moderate positive effects of attractiveness on a range of life outcomes.
Strengths. Larger effective sample sizes. Reduces reliance on any single potentially flawed study. Can identify moderators of effects.
Limitations. Garbage in, garbage out — a meta-analysis of poorly designed studies is a high-powered analysis of bad data. Publication bias (see below) means the studies available to meta-analyze may not represent the full picture. Combining studies that measured slightly different constructs (apples and oranges problem) is a persistent criticism.
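The mechanics of pooling can be sketched in a few lines. The example below uses fixed-effect inverse-variance weighting with invented effect sizes and variances; real meta-analyses typically use dedicated software and often random-effects models:

```python
import math

# Hypothetical per-study effect sizes (Cohen's d) and their variances.
studies = [
    (0.60, 0.01),
    (0.20, 0.02),
    (0.45, 0.01),
    (0.05, 0.04),
]

# Fixed-effect pooled estimate: inverse-variance weighted average,
# so more precise studies (smaller variance) count for more.
weights = [1 / v for _, v in studies]
pooled = sum(w * d for (d, _), w in zip(studies, weights)) / sum(weights)

# 95% CI for the pooled effect.
se_pooled = math.sqrt(1 / sum(weights))
lo, hi = pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled

# Heterogeneity: Cochran's Q, then I² = (Q - df) / Q, floored at zero.
q = sum(w * (d - pooled) ** 2 for (d, _), w in zip(studies, weights))
df = len(studies) - 1
i_squared = max(0.0, (q - df) / q) if q > 0 else 0.0

print(f"pooled d = {pooled:.2f}, 95% CI [{lo:.2f}, {hi:.2f}], I² = {i_squared:.0%}")
```

With these invented inputs the pooled d is about 0.42, but I² is around 67%, signaling substantial heterogeneity: the studies may not be estimating one common effect, and a moderator analysis would be warranted.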
Part 2: Key Statistical Concepts
You do not need to be a statistician to be a critical consumer of research. These are the concepts that appear most frequently in attraction science papers.
2.1 Mean and Standard Deviation
The mean (M or x̄) is the arithmetic average of a set of scores. The standard deviation (SD) measures how spread out scores are around the mean. A small SD means scores cluster tightly; a large SD means scores are widely dispersed. In attraction research: if the mean attractiveness rating is 5.2 (SD = 1.1) on a 1–7 scale, roughly two-thirds of ratings fall between 4.1 and 6.3 (M ± 1 SD), assuming an approximately normal distribution.
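Both statistics are one-liners in most languages. A quick Python illustration with invented ratings:

```python
import statistics

# Hypothetical attractiveness ratings on a 1-7 scale.
ratings = [4, 5, 6, 5, 7, 4, 6, 5, 5, 6, 3, 6]

m = statistics.mean(ratings)
sd = statistics.stdev(ratings)  # sample SD (n - 1 denominator)

print(f"M = {m:.2f}, SD = {sd:.2f}")
# Roughly two-thirds of scores in an approximately normal
# distribution fall within one SD of the mean:
print(f"M ± 1 SD: [{m - sd:.2f}, {m + sd:.2f}]")
```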
2.2 Correlation Coefficient (r)
Pearson's r ranges from -1.0 to +1.0. An r of +1.0 is a perfect positive relationship (as X increases, Y increases proportionally). An r of 0 is no linear relationship. An r of -1.0 is a perfect negative relationship. Conventions from Cohen (1988): r = .10 is small, r = .30 is medium, r = .50 is large. Most correlations in social psychology are in the .20–.40 range.
Important: r² tells you the proportion of variance in Y explained by X. An r of .30 means X accounts for only 9% of the variance in Y — the other 91% is left unexplained, attributable to other factors and to measurement error.
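Pearson's r is the covariance of two variables scaled by both of their standard deviations. A small Python sketch with invented self-esteem and dating-satisfaction scores (toy data, so the correlation comes out far stronger than the .20–.40 typical of real studies):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: covariance scaled by both SDs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical data: self-esteem vs. dating-satisfaction scores.
self_esteem = [3.1, 4.0, 2.5, 3.8, 4.5, 3.3, 2.9, 4.2]
satisfaction = [4.0, 3.9, 3.2, 5.0, 5.5, 3.1, 4.6, 4.8]

r = pearson_r(self_esteem, satisfaction)
print(f"r = {r:.2f}, r² (variance explained) = {r**2:.2f}")
```

Even this deliberately strong toy correlation (r ≈ .68) leaves more than half the variance in satisfaction unexplained, which is the point of the r² caveat above.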
2.3 Statistical Significance and the p-Value
A p-value is the probability of obtaining your result (or a more extreme one) if the null hypothesis (no effect) were true. By convention, p < .05 is called "statistically significant": a result this extreme would occur less than 5% of the time if there were truly no effect.
What p-values do NOT tell you. They do not tell you the probability that your hypothesis is true. They do not tell you the size or importance of the effect. A p-value of .001 in a study with 50,000 participants does not mean the effect is large or meaningful. Statistical significance is heavily influenced by sample size: with a large enough N, even trivial effects become "significant."
⚠️ Critical Caveat: Much of the "seduction science" literature conflates statistical significance with practical importance. A study finding that men rated women in red as more attractive at p = .03 tells you almost nothing about whether wearing red will change your dating outcomes in any meaningful way.
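The sample-size sensitivity of p-values can be shown analytically. For a two-sample t-test with two equal groups of size n, the test statistic is approximately d·√(n/2). The sketch below holds a trivial effect (d = 0.10, an assumed value) fixed and varies only n, using a normal approximation to the two-sided p-value (stdlib only):

```python
import math

def two_sided_p_from_z(z):
    """Two-sided p-value from a standard-normal approximation."""
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

d = 0.10  # a tiny standardized mean difference, held constant

for n_per_group in (50, 500, 50_000):
    # For two equal groups, the t statistic is roughly d * sqrt(n/2).
    z = d * math.sqrt(n_per_group / 2)
    print(f"n = {n_per_group:>6} per group: z = {z:6.2f}, "
          f"p ≈ {two_sided_p_from_z(z):.4f}")
```

The same trivial effect goes from p ≈ .62 at n = 50 per group to p far below .001 at n = 50,000: significance changed, but the effect never did.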
2.4 Effect Size
Effect size quantifies the magnitude of a relationship or difference, independent of sample size.
Cohen's d is used for mean differences between two groups. d = (M₁ - M₂) / SD_pooled. Cohen's benchmarks: small = 0.2, medium = 0.5, large = 0.8. An effect of d = 0.2 means the groups differ by one-fifth of a standard deviation — a small but potentially detectable effect.
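The formula translates directly into code. A Python sketch with invented rating data for two conditions (toy data, so the resulting d is unrealistically large):

```python
import math
import statistics

def cohens_d(group1, group2):
    """Cohen's d: mean difference scaled by the pooled SD."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = statistics.mean(group1), statistics.mean(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)
    # Pooled SD weights each group's variance by its degrees of freedom.
    sd_pooled = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (m1 - m2) / sd_pooled

# Hypothetical ratings of the same profiles under two conditions.
condition_a = [5.1, 4.8, 5.6, 5.0, 4.9, 5.4]
condition_b = [4.6, 4.9, 4.4, 5.0, 4.3, 4.7]

print(round(cohens_d(condition_a, condition_b), 2))
```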
r as effect size can also index effect magnitude: small = .10, medium = .30, large = .50.
Why this matters. Chapter 3 demonstrates that many headline-grabbing findings in attraction science have effect sizes in the small-to-medium range. This does not mean the findings are false — small effects in large populations can have substantial aggregate consequences — but it should calibrate your expectations.
2.5 Confidence Intervals
A confidence interval (CI) provides a range of values within which the true population parameter likely falls. A 95% CI means: if you repeated this study 100 times, approximately 95 of those intervals would contain the true value. Wide CIs indicate uncertainty (small samples, noisy measurement). Narrow CIs indicate precision.
Key insight: if a 95% CI for a difference includes zero, the result is not statistically significant at p < .05. CIs are also a more informative way to express uncertainty than a bare p-value, because they show the plausible range of effect magnitudes.
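A 95% CI for a mean can be sketched as M ± 1.96 × SE, where SE = SD/√N. An illustration with invented ratings (for a sample this small, a t critical value slightly above 1.96 would be more precise):

```python
import math
import statistics

# Hypothetical attractiveness ratings on a 1-7 scale.
sample = [5, 6, 4, 5, 7, 5, 6, 4, 5, 6, 5, 4, 6, 5, 7, 5]

m = statistics.mean(sample)
se = statistics.stdev(sample) / math.sqrt(len(sample))  # standard error

# 95% CI using the normal critical value 1.96.
lower, upper = m - 1.96 * se, m + 1.96 * se
print(f"M = {m:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")
```

Quadrupling the sample size would halve the SE and thus halve the width of the interval, which is why large studies yield more precise estimates.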
2.6 Regression
Simple regression estimates the linear relationship between one predictor variable (X) and one outcome variable (Y), yielding a slope (β or b) that tells you how much Y changes, on average, for each one-unit increase in X.
Multiple regression includes several predictor variables simultaneously, allowing researchers to control for confounders. For example: "After controlling for age and income, profile photo quality significantly predicted match rate (β = .31, p < .001)."
Caution. Regression results still require careful causal interpretation. Including the wrong control variables can introduce bias (collider bias, mediation confounding). Coefficients depend on the specific sample and variables included.
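For simple regression, the least-squares slope is the covariance of X and Y divided by the variance of X, and the intercept follows from the means. A Python sketch with invented photo-quality and match-count data (the variable names are hypothetical):

```python
import statistics

def simple_regression(xs, ys):
    """Least-squares slope and intercept for y = a + b*x."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    # Slope: covariance of x and y over the variance of x.
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    # Intercept: the line must pass through the point of means.
    a = my - b * mx
    return a, b

# Hypothetical data: profile photo quality (1-10) vs. weekly match count.
quality = [3, 5, 6, 4, 8, 7, 9, 2]
matches = [1, 3, 4, 2, 6, 4, 7, 1]

a, b = simple_regression(quality, matches)
print(f"matches ≈ {a:.2f} + {b:.2f} × quality")
```

The slope here describes the toy sample only; as the Caution above notes, it licenses no causal claim about what improving photo quality would do.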
Part 3: The WEIRD Problem
One of the most important critiques of social psychology, and attraction science in particular, is the WEIRD problem, named in a landmark paper by Henrich, Heine, and Norenzayan (2010).
WEIRD stands for: Western, Educated, Industrialized, Rich, Democratic. The vast majority of psychology studies — perhaps 80–90% — have sampled exclusively from these populations, yet most textbooks (including older attraction textbooks) present their findings as if they describe universal human behavior.
Why WEIRD matters for attraction research
Physical attractiveness standards are more culturally variable than early research suggested. Body-mass preferences, skin tone preferences, and facial feature ideals differ substantially across cultures and have shifted historically within cultures. Studies conducted only on American undergraduates cannot establish "universal" standards.
Relationship norms are profoundly culturally variable. Individualist cultures (US, Western Europe) value personal choice in partner selection; collectivist cultures (much of South and East Asia, parts of Africa and the Middle East) embed partner selection in family and community decision-making. Self-report measures developed in individualist contexts may not translate meaningfully.
Sexual attitudes and behaviors show wide cross-cultural variation in what is normative, permissible, or even conceivable. Research on casual sex conducted with US undergraduates cannot be generalized to populations where premarital sex is heavily stigmatized or legally regulated.
The Okafor-Reyes Global Attraction Project, introduced in Chapter 1, is partly a response to the WEIRD problem: its 12-country design specifically oversamples non-WEIRD populations.
💡 Key Insight: When you read a study claiming "people prefer X," always ask: which people? When? Where? Under what conditions? These are not pedantic quibbles — they are the difference between a finding and a claim.
Part 4: Pre-Registration and Open Science
The replication crisis in psychology (discussed in Chapter 3) has prompted a reform movement called open science, aimed at making research more transparent, reproducible, and resistant to methodological manipulation.
Pre-Registration
Pre-registration means publicly declaring your hypotheses, study design, sample size, and analysis plan before collecting data, typically through a registry like OSF (Open Science Framework) or AsPredicted.
Why it matters. Without pre-registration, researchers can engage (consciously or unconsciously) in a range of practices that inflate false positive rates:
- HARKing (Hypothesizing After Results are Known): presenting post-hoc observations as if they were predicted in advance
- p-hacking: trying multiple analyses and reporting only the one that reaches p < .05
- Selective reporting: running five dependent variables and publishing only the two that were significant
Pre-registered studies have a higher bar: deviations from the plan must be transparently reported. Exploratory analyses are permitted but must be labeled as such.
Other Open Science Practices
Open data and materials. Sharing raw data and study materials allows other researchers to verify analyses and run replications. The Okafor-Reyes project plans to release its full dataset at completion (Chapter 40).
Registered Reports. A journal format in which peer review happens before data collection, based on the quality of the research question and design rather than the results. Papers are accepted for publication regardless of outcome — eliminating publication bias for those studies.
Multi-site replication. Studies conducted simultaneously across multiple labs in multiple countries, which substantially increases power and generalizability. The Many Labs projects are prominent examples.
📊 Research Spotlight: The Open Science Collaboration (2015) attempted to replicate 100 psychology studies. Only about 36–39% produced statistically significant results, and replication effect sizes were on average roughly half the size of the originals. Social psychology fared worse than cognitive psychology. This sobering finding — not a condemnation of psychology, but a call for better practices — is the backdrop against which all attraction research in this book should be read.
Part 5: Research Quality Checklist
Use this checklist when evaluating any study you encounter — in academic papers, news coverage, or social media claims. This checklist connects directly to the evidence evaluation discussions in Chapter 3 and Chapter 40.
Checklist: Evaluating an Attraction Study
Study Design
- [ ] What type of study is this? (experimental, correlational, observational, survey, qualitative, neuroimaging)
- [ ] Does the design allow causal inference, or only association?
- [ ] Was the study pre-registered?

Sample
- [ ] Who were the participants? (age, gender, nationality, education, sexuality)
- [ ] How were they recruited? (convenience sample? paid participants? MTurk workers? undergraduates?)
- [ ] How large was the sample? (N < 50 warrants skepticism; N > 500 increases confidence)
- [ ] Is the sample WEIRD? Does the paper acknowledge this limitation?

Measurement
- [ ] How were the key variables measured? (validated scales? behavioral coding? self-report?)
- [ ] Are the measures face-valid — do they actually measure what they claim to measure?
- [ ] Were interrater reliability statistics reported for behavioral coding?

Analysis
- [ ] What was the effect size? (not just "significant" — how big?)
- [ ] Are confidence intervals reported?
- [ ] Are there alternative explanations the authors did not adequately rule out?
- [ ] Did the authors engage in obvious p-hacking (many variables, only a few results reported)?

Replication and Context
- [ ] Has this finding been independently replicated?
- [ ] Is this a single study or part of a converging body of evidence?
- [ ] Does the finding appear in a peer-reviewed journal? (Be especially critical of conference abstracts, press releases, and TED Talk claims)

Claims vs. Findings
- [ ] Does the headline/abstract overstate the findings?
- [ ] Is the researcher claiming causation from correlational data?
- [ ] Are the findings being applied to populations far outside the study sample?
⚖️ Debate Point: Some critics argue that this level of methodological scrutiny selectively debunks findings that challenge folk wisdom while letting intuitively plausible findings pass unchallenged. This "skeptical asymmetry" is itself a bias worth watching for in your own reading — and in the book's own analysis.
Quick Reference Glossary
| Term | Definition |
|---|---|
| Correlation coefficient (r) | Measure of linear association between two variables, ranging from -1 to +1 |
| Cohen's d | Standardized measure of difference between two group means |
| Confidence interval (CI) | Range of values likely to contain the true population parameter |
| Effect size | Magnitude of a relationship or difference, independent of sample size |
| External validity | Degree to which findings generalize beyond the study's specific sample and setting |
| Internal validity | Confidence that the independent variable caused the observed effect |
| p-value | Probability of observing the result if the null hypothesis were true |
| Pre-registration | Publicly declaring hypotheses and analysis plan before data collection |
| Quasi-experiment | Design resembling an experiment but without full random assignment |
| Random assignment | Allocating participants to conditions by chance to equalize confounders |
| Reliability | Consistency of a measure — same result under same conditions |
| Replication | Repeating a study to test whether the original finding holds |
| WEIRD bias | Over-reliance on Western, Educated, Industrialized, Rich, Democratic samples |
| Validity | Whether a measure captures what it claims to capture |
Further Reading on Methods
- Schroeder, J., & Epley, N. (2021). "Research Methodology and Statistical Reporting in Social-Personality Psychology." Annual Review of Psychology. Excellent overview of current best practices.
- Henrich, J., Heine, S. J., & Norenzayan, A. (2010). "The weirdest people in the world?" Behavioral and Brain Sciences, 33, 61–135. The foundational WEIRD critique.
- Open Science Collaboration. (2015). "Estimating the reproducibility of psychological science." Science, 349, aac4716. The landmark replication study.
- Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). "False-positive psychology." Psychological Science, 22, 1359–1366. A bracing demonstration of how flexibility in analysis produces false positives.
- Braun, V., & Clarke, V. (2006). "Using thematic analysis in psychology." Qualitative Research in Psychology, 3, 77–101. The standard reference for thematic analysis.