Key Takeaways: Designing Studies — Sampling and Experiments

One-Sentence Summary

How data is collected determines what it can tell you — observational studies reveal associations, experiments reveal causes, and bias in sampling can make even the largest dataset misleading.

Core Concepts at a Glance

| Concept | Definition | Why It Matters |
| --- | --- | --- |
| Observational study | A study that measures without intervening | Can show association, but not causation — confounders may lurk |
| Experiment | A study that imposes a treatment and measures the response | With randomization, can establish cause-and-effect |
| Confounding variable | A variable related to both the explanatory and response variables | The reason "correlation does not imply causation" — confounders create false signals |
| Bias | A systematic tendency to produce results that are wrong in a particular direction | Doesn't shrink with larger samples; must be prevented by design |
| Randomization | Using chance to select samples or assign treatments | Protects against known and unknown biases — the single most powerful tool in statistics |

Decision Flowchart: What Type of Study Is This?

Does the researcher impose a treatment?
│
├── YES → It's an EXPERIMENT
│         │
│         ├── Was there random assignment to groups?
│         │   ├── YES → Can support CAUSAL claims ✓
│         │   └── NO  → Confounding may explain results ✗
│         │
│         ├── Was there a control group?
│         │   ├── YES → Has a baseline for comparison ✓
│         │   └── NO  → No way to separate treatment from time/other factors ✗
│         │
│         └── Was blinding used?
│             ├── Double-blind → Minimal bias ✓✓
│             ├── Single-blind → Some protection ✓
│             └── None → Placebo effect and researcher bias possible ✗
│
└── NO → It's an OBSERVATIONAL STUDY
          │
          ├── Shows ASSOCIATION only — not causation
          ├── Confounding variables are always a concern
          └── Can provide strong evidence when:
              • Large sample
              • Replicated across studies
              • Dose-response relationship
              • Biologically/theoretically plausible
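The "random assignment" branch above can be sketched in a few lines of Python. This is an illustrative helper with made-up subject names, not a clinical-trials library: chance, not the researcher, decides who gets the treatment, so known and unknown confounders tend to balance out across groups.

```python
import random

def random_assignment(subjects, seed=None):
    """Randomly split subjects into a treatment and a control group."""
    rng = random.Random(seed)
    shuffled = subjects[:]        # copy so the original list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]   # (treatment, control)

subjects = [f"subject_{i}" for i in range(1, 21)]
treatment, control = random_assignment(subjects, seed=42)
print(len(treatment), len(control))  # 10 10
```

Passing a seed makes the split reproducible for grading or auditing; in a real experiment the assignment would be generated once and locked in before any outcomes are measured.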

Decision Flowchart: Which Sampling Method?

Do you need your results to generalize to a population?
│
├── YES → You need a probability sample (random element)
│         │
│         ├── Do you need specific subgroups represented?
│         │   ├── YES → STRATIFIED sampling
│         │   │         (divide into strata, randomly sample within each)
│         │   └── NO  → ↓
│         │
│         ├── Is the population geographically spread out?
│         │   ├── YES → CLUSTER sampling
│         │   │         (randomly select entire groups)
│         │   └── NO  → ↓
│         │
│         ├── Do you have a complete list of the population?
│         │   ├── YES → SIMPLE RANDOM sampling
│         │   │         (every member has equal chance)
│         │   └── NO  → SYSTEMATIC sampling
│         │             (every kth member from available list)
│         │
│         └── Always check for bias regardless of method
│
└── NO → CONVENIENCE sampling is acceptable
          (but clearly state limitations — results may not generalize)

Sampling Methods Quick Reference

| Method | The Idea | Strength | Weakness |
| --- | --- | --- | --- |
| Simple Random | Everyone has an equal chance | Unbiased | Needs a complete list |
| Stratified | Random within defined subgroups | Guarantees representation | Must know subgroups in advance |
| Cluster | Select entire groups randomly | Cost-effective for spread-out populations | Less precise per observation |
| Convenience | Whoever is easiest to reach | Cheap and fast | Probably biased — use with extreme caution |
| Systematic | Every kth member from a list | Simple to do | Risky if the list has a periodic pattern |
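The probability-based methods in the table reduce to short procedures. The sketch below uses Python's standard `random` module and made-up populations purely for illustration:

```python
import random

def simple_random_sample(population, n, seed=None):
    """Every member has an equal chance; requires a complete list."""
    return random.Random(seed).sample(population, n)

def stratified_sample(strata, n_per_stratum, seed=None):
    """Draw a simple random sample within each predefined subgroup."""
    rng = random.Random(seed)
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole groups and keep every member of each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [m for name in chosen for m in clusters[name]]

def systematic_sample(population, k, start=0):
    """Take every kth member; risky if the list has a periodic pattern."""
    return population[start::k]

people = list(range(100))
print(len(simple_random_sample(people, 10, seed=1)))  # 10
print(systematic_sample(people, 25))                  # [0, 25, 50, 75]
```

Note that convenience sampling needs no code at all, which is exactly why it is tempting and exactly why it is dangerous.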

Bias Cheat Sheet

| Type of Bias | What It Is | How to Spot It | Example |
| --- | --- | --- | --- |
| Selection bias | Sample systematically excludes groups | Ask: "Who is missing from this data?" | Phone surveys miss people without phones |
| Response bias | Questions or context influence answers | Look for leading questions, social desirability | "Don't you agree that..." |
| Nonresponse bias | Non-respondents differ from respondents | Check response rates; ask who didn't reply | Only 24% return a survey — who are the 76%? |
| Survivorship bias | Only "survivors" are visible in the data | Ask: "What am I not seeing?" | Studying bullet holes on planes that returned |
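A quick simulation makes the phone-survey row concrete, and shows why bias does not shrink as the sample grows. All numbers here are invented for illustration: 80% of a hypothetical population owns phones, and owners happen to support a policy at a different rate than non-owners.

```python
import random

rng = random.Random(0)

# Hypothetical population of 100,000 people. 80% own phones.
# Phone owners support a policy at 60%; non-owners at 30%.
population = []
for _ in range(100_000):
    has_phone = rng.random() < 0.8
    support_rate = 0.6 if has_phone else 0.3
    population.append((has_phone, rng.random() < support_rate))

true_rate = sum(s for _, s in population) / len(population)  # about 0.54

# A "phone survey" can only reach phone owners: selection bias.
reachable = [s for has_phone, s in population if has_phone]

for n in (100, 1_000, 10_000):
    estimate = sum(rng.sample(reachable, n)) / n
    print(f"n={n:>6}: estimate={estimate:.3f}  (truth={true_rate:.3f})")
```

Growing n from 100 to 10,000 makes the estimate more *precise*, but it keeps converging on the phone owners' rate near 0.60, not the population's rate near 0.54. The error is baked into who can be reached, not into the sample size.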

The Causal Claims Checklist

When someone says "X causes Y," ask:

  • [ ] Was this an experiment or observational study?
  • [ ] Was there random assignment?
  • [ ] Was there a control group?
  • [ ] Was blinding used?
  • [ ] Can I think of a confounding variable?
  • [ ] How was the sample selected?
  • [ ] How large was the sample?
  • [ ] Has this been replicated?

Rule of thumb: if fewer than five of these eight boxes check out favorably, be skeptical of the causal claim.

Key Terms

| Term | Definition |
| --- | --- |
| Observational study | Observes without intervening |
| Experiment | Imposes a treatment to observe responses |
| Random sample | Every member has an equal selection chance |
| Stratified sampling | Random sampling within defined subgroups |
| Cluster sampling | Randomly selecting entire groups |
| Convenience sample | Sampling whoever is easiest to reach |
| Systematic sampling | Selecting every kth member |
| Bias | Systematic tendency toward wrong results in one direction |
| Confounding variable | Related to both explanatory and response variables |
| Randomization | Using chance for selection or assignment |
| Control group | Doesn't receive the treatment |
| Treatment group | Receives the treatment |
| Placebo | Inactive treatment that looks real |
| Blinding | Hiding group assignments from participants and/or researchers |
| Double-blind | Neither participants nor researchers know group assignments |

The One Thing to Remember

If you forget everything else from this chapter, remember this:

The design of a study determines what it can prove. Observational studies show association. Experiments with randomization show causation. And a biased sample — no matter how large — gives you a precise wrong answer.