Key Takeaways: Designing Studies — Sampling and Experiments

One-Sentence Summary

How data is collected determines what it can tell you — observational studies reveal associations, experiments reveal causes, and bias in sampling can make even the largest dataset misleading.

Core Concepts at a Glance

| Concept | Definition | Why It Matters |
| --- | --- | --- |
| Observational study | A study that measures without intervening | Can show association, but not causation — confounders may lurk |
| Experiment | A study that imposes a treatment and measures the response | With randomization, can establish cause-and-effect |
| Confounding variable | A variable related to both the explanatory and response variables | The reason "correlation does not imply causation" — confounders create false signals |
| Bias | A systematic tendency to produce results that are wrong in a particular direction | Doesn't shrink with larger samples; must be prevented by design |
| Randomization | Using chance to select samples or assign treatments | Protects against known and unknown biases — the single most powerful tool in statistics |

Decision Flowchart: What Type of Study Is This?

Does the researcher impose a treatment?
│
├── YES → It's an EXPERIMENT
│         │
│         ├── Was there random assignment to groups?
│         │   ├── YES → Can support CAUSAL claims ✓
│         │   └── NO  → Confounding may explain results ✗
│         │
│         ├── Was there a control group?
│         │   ├── YES → Has a baseline for comparison ✓
│         │   └── NO  → No way to separate treatment from time/other factors ✗
│         │
│         └── Was blinding used?
│             ├── Double-blind → Minimal bias ✓✓
│             ├── Single-blind → Some protection ✓
│             └── None → Placebo effect and researcher bias possible ✗
│
└── NO → It's an OBSERVATIONAL STUDY
          │
          ├── Shows ASSOCIATION only — not causation
          ├── Confounding variables are always a concern
          └── Can provide strong evidence when:
              • Large sample
              • Replicated across studies
              • Dose-response relationship
              • Biologically/theoretically plausible
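The "random assignment" branch above can be sketched in a few lines of Python. This is an illustrative helper with made-up subject names, not a clinical-trials library: chance, not the researcher, decides who gets the treatment, so known and unknown confounders tend to balance out across groups.

```python
import random

def random_assignment(subjects, seed=None):
    """Randomly split subjects into a treatment and a control group."""
    rng = random.Random(seed)
    shuffled = subjects[:]        # copy so the original list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]   # (treatment, control)

subjects = [f"subject_{i}" for i in range(1, 21)]
treatment, control = random_assignment(subjects, seed=42)
print(len(treatment), len(control))  # 10 10
```

Passing a seed makes the split reproducible for grading or auditing; in a real experiment the assignment would be generated once and locked in before any outcomes are measured.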

Decision Flowchart: Which Sampling Method?

Do you need your results to generalize to a population?
│
├── YES → You need a probability sample (random element)
│         │
│         ├── Do you need specific subgroups represented?
│         │   ├── YES → STRATIFIED sampling
│         │   │         (divide into strata, randomly sample within each)
│         │   └── NO  → ↓
│         │
│         ├── Is the population geographically spread out?
│         │   ├── YES → CLUSTER sampling
│         │   │         (randomly select entire groups)
│         │   └── NO  → ↓
│         │
│         ├── Do you have a complete list of the population?
│         │   ├── YES → SIMPLE RANDOM sampling
│         │   │         (every member has equal chance)
│         │   └── NO  → SYSTEMATIC sampling
│         │             (every kth member from available list)
│         │
│         └── Always check for bias regardless of method
│
└── NO → CONVENIENCE sampling is acceptable
          (but clearly state limitations — results may not generalize)

Sampling Methods Quick Reference

| Method | The Idea | Strength | Weakness |
| --- | --- | --- | --- |
| Simple Random | Everyone has an equal chance | Unbiased | Needs a complete list |
| Stratified | Random within defined subgroups | Guarantees representation | Must know subgroups in advance |
| Cluster | Select entire groups randomly | Cost-effective for spread-out populations | Less precise per observation |
| Convenience | Whoever is easiest to reach | Cheap and fast | Probably biased — use with extreme caution |
| Systematic | Every kth member from a list | Simple to do | Risky if the list has a periodic pattern |
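The probability-based methods in the table reduce to short procedures. The sketch below uses Python's standard `random` module and made-up populations purely for illustration:

```python
import random

def simple_random_sample(population, n, seed=None):
    """Every member has an equal chance; requires a complete list."""
    return random.Random(seed).sample(population, n)

def stratified_sample(strata, n_per_stratum, seed=None):
    """Draw a simple random sample within each predefined subgroup."""
    rng = random.Random(seed)
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole groups and keep every member of each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [m for name in chosen for m in clusters[name]]

def systematic_sample(population, k, start=0):
    """Take every kth member; risky if the list has a periodic pattern."""
    return population[start::k]

people = list(range(100))
print(len(simple_random_sample(people, 10, seed=1)))  # 10
print(systematic_sample(people, 25))                  # [0, 25, 50, 75]
```

Note that convenience sampling needs no code at all, which is exactly why it is tempting and exactly why it is dangerous.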

Bias Cheat Sheet

| Type of Bias | What It Is | How to Spot It | Example |
| --- | --- | --- | --- |
| Selection bias | Sample systematically excludes groups | Ask: "Who is missing from this data?" | Phone surveys miss people without phones |
| Response bias | Questions or context influence answers | Look for leading questions, social desirability | "Don't you agree that..." |
| Nonresponse bias | Non-respondents differ from respondents | Check response rates; ask who didn't reply | Only 24% return a survey — who are the 76%? |
| Survivorship bias | Only "survivors" are visible in the data | Ask: "What am I not seeing?" | Studying bullet holes on planes that returned |
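A quick simulation makes the phone-survey row concrete, and shows why bias does not shrink as the sample grows. All numbers here are invented for illustration: 80% of a hypothetical population owns phones, and owners happen to support a policy at a different rate than non-owners.

```python
import random

rng = random.Random(0)

# Hypothetical population of 100,000 people. 80% own phones.
# Phone owners support a policy at 60%; non-owners at 30%.
population = []
for _ in range(100_000):
    has_phone = rng.random() < 0.8
    support_rate = 0.6 if has_phone else 0.3
    population.append((has_phone, rng.random() < support_rate))

true_rate = sum(s for _, s in population) / len(population)  # about 0.54

# A "phone survey" can only reach phone owners: selection bias.
reachable = [s for has_phone, s in population if has_phone]

for n in (100, 1_000, 10_000):
    estimate = sum(rng.sample(reachable, n)) / n
    print(f"n={n:>6}: estimate={estimate:.3f}  (truth={true_rate:.3f})")
```

Growing n from 100 to 10,000 makes the estimate more *precise*, but it keeps converging on the phone owners' rate near 0.60, not the population's rate near 0.54. The error is baked into who can be reached, not into the sample size.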

The Causal Claims Checklist

When someone says "X causes Y," ask:

  • [ ] Was this an experiment or observational study?
  • [ ] Was there random assignment?
  • [ ] Was there a control group?
  • [ ] Was blinding used?
  • [ ] Can I think of a confounding variable?
  • [ ] How was the sample selected?
  • [ ] How large was the sample?
  • [ ] Has this been replicated?

Rule of thumb: if fewer than five of these eight boxes check out favorably, be skeptical of the causal claim.

Key Terms

| Term | Definition |
| --- | --- |
| Observational study | Observes without intervening |
| Experiment | Imposes a treatment to observe responses |
| Random sample | Every member has an equal selection chance |
| Stratified sampling | Random sampling within defined subgroups |
| Cluster sampling | Randomly selecting entire groups |
| Convenience sample | Sampling whoever is easiest to reach |
| Systematic sampling | Selecting every kth member |
| Bias | Systematic tendency toward wrong results in one direction |
| Confounding variable | Related to both explanatory and response variables |
| Randomization | Using chance for selection or assignment |
| Control group | Doesn't receive the treatment |
| Treatment group | Receives the treatment |
| Placebo | Inactive treatment that looks real |
| Blinding | Hiding group assignments from participants and/or researchers |
| Double-blind | Neither participants nor researchers know group assignments |

The One Thing to Remember

If you forget everything else from this chapter, remember this:

The design of a study determines what it can prove. Observational studies show association. Experiments with randomization show causation. And a biased sample — no matter how large — gives you a precise wrong answer.