Key Takeaways: Comparing Two Groups
One-Sentence Summary
Comparing two groups — the most common analysis in applied statistics — requires choosing among the two-sample t-test (independent means), the paired t-test (dependent means), and the two-proportion z-test (independent proportions); all three follow the same fundamental logic: measure the difference, compute the standard error of that difference, and determine whether the difference is too large to be explained by chance.
Core Concepts at a Glance
| Concept | Definition | Why It Matters |
|---|---|---|
| Independent samples | Two samples where individuals in one group are unrelated to individuals in the other | Determines whether to use the two-sample t-test (independent) or paired t-test (dependent) |
| Dependent (paired) samples | Data where each observation in one group has a natural partner in the other group | Paired tests eliminate between-subject variability, often dramatically increasing power |
| Difference in proportions | $\hat{p}_1 - \hat{p}_2$ estimates the true difference $p_1 - p_2$ between two population proportions | Essential for comparing rates, percentages, and binary outcomes across two groups |
| CI for a difference | Interval that estimates how large the difference between two groups plausibly is | Contains zero ↔ fail to reject $H_0$; entirely positive or negative → significant difference |
The Three Tests
1. Two-Sample t-Test (Welch's)
$$\boxed{t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}}$$
- When: Comparing means from two independent groups
- df: Welch-Satterthwaite approximation (let software compute it)
- Default choice: Welch's (does not assume equal variances)
- Example: Alex's A/B test — old algorithm vs. new algorithm
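A minimal sketch of Welch's test in Python, using made-up session-length data in place of Alex's actual A/B-test numbers (the group names and values here are illustrative assumptions, not from the chapter):

```python
import numpy as np
from scipy import stats

# Hypothetical A/B-test data: session minutes under each algorithm
rng = np.random.default_rng(42)
old_algo = rng.normal(loc=5.0, scale=1.2, size=40)
new_algo = rng.normal(loc=5.6, scale=1.5, size=45)

# Welch's test: equal_var=False means the two variances need not match;
# SciPy computes the Welch-Satterthwaite df internally
t, p = stats.ttest_ind(new_algo, old_algo, equal_var=False)
print(f"t = {t:.3f}, p = {p:.4f}")
```

Note that `equal_var=False` must be set explicitly: SciPy's default (`equal_var=True`) is the pooled-variance Student's t-test, not Welch's.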
2. Paired t-Test
$$\boxed{t = \frac{\bar{d}}{s_d / \sqrt{n}}, \quad df = n - 1}$$
- When: Data come in natural pairs (before/after, matched subjects)
- Key insight: This IS a one-sample t-test on the within-pair differences
- Why it's powerful: Eliminates between-subject variability
- Example: Sam's game-by-game comparison of Daria's scoring
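The "this IS a one-sample t-test" insight can be verified directly. The before/after scores below are invented for illustration (standing in for Sam's game-by-game data); the two calls give identical results:

```python
import numpy as np
from scipy import stats

# Hypothetical before/after scores for the same 12 subjects
before = np.array([14, 18, 11, 20, 15, 17, 13, 19, 16, 12, 18, 15], dtype=float)
after  = np.array([16, 19, 13, 22, 15, 20, 14, 21, 18, 13, 20, 17], dtype=float)

# Paired t-test ...
t_rel, p_rel = stats.ttest_rel(after, before)

# ... is exactly a one-sample t-test on the within-pair differences
diffs = after - before
t_one, p_one = stats.ttest_1samp(diffs, popmean=0)

print(f"paired: t = {t_rel:.6f}   one-sample on diffs: t = {t_one:.6f}")
```

The two statistics and p-values agree to floating-point precision, which is why the chapter treats the paired test as a special case of the one-sample test.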
3. Two-Proportion z-Test
$$\boxed{z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}_{\text{pooled}}(1 - \hat{p}_{\text{pooled}})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}}$$
- When: Comparing proportions from two independent groups
- Pooled proportion: $\hat{p}_{\text{pooled}} = (X_1 + X_2)/(n_1 + n_2)$
- Note: Test uses pooled SE; CI uses unpooled SE
- Example: James's recidivism rate comparison
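A sketch of the pooled-SE calculation by hand, checked against `statsmodels` (the counts are hypothetical, not James's actual data):

```python
import numpy as np
from scipy import stats
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical counts: successes and sample sizes per group
x1, n1 = 48, 200
x2, n2 = 30, 180

# Pooled proportion under H0: p1 = p2
p_pool = (x1 + x2) / (n1 + n2)
se_pooled = np.sqrt(p_pool * (1 - p_pool) * (1/n1 + 1/n2))
z_manual = (x1/n1 - x2/n2) / se_pooled
p_manual = 2 * (1 - stats.norm.cdf(abs(z_manual)))

# statsmodels uses the same pooled SE by default
z_lib, p_lib = proportions_ztest(np.array([x1, x2]), np.array([n1, n2]))
print(f"manual z = {z_manual:.4f}, library z = {z_lib:.4f}")
```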
The Decision Flowchart
```
Comparing two groups?
          │
 What type of variable?
        ╱    ╲
 Numerical    Categorical
  (means)     (proportions)
     │             │
  Paired?    Two-proportion
   ╱   ╲        z-test
 Yes    No
  │      │
Paired  Two-sample
t-test  t-test
        (Welch's)
```
Confidence Intervals for Differences
| Scenario | CI Formula |
|---|---|
| Independent means | $(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{s_1^2/n_1 + s_2^2/n_2}$ |
| Paired means | $\bar{d} \pm t^*_{n-1} \cdot s_d/\sqrt{n}$ |
| Independent proportions | $(\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{\hat{p}_1(1-\hat{p}_1)/n_1 + \hat{p}_2(1-\hat{p}_2)/n_2}$ |
Reading the CI:
- Contains zero → no significant difference
- Entirely positive → Group 1 plausibly higher
- Entirely negative → Group 1 plausibly lower
- Width → precision of the estimate
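The independent-means CI formula can be computed by hand, including the Welch-Satterthwaite df. The two samples below are invented for illustration:

```python
import numpy as np
from scipy import stats

# Hypothetical independent samples
g1 = np.array([5.1, 4.8, 5.6, 5.3, 4.9, 5.7, 5.2, 5.0, 5.4, 5.5])
g2 = np.array([4.6, 4.9, 4.4, 4.8, 4.5, 4.7, 4.3, 5.0, 4.6])

diff = g1.mean() - g2.mean()

# Unpooled SE of the difference: sqrt(s1^2/n1 + s2^2/n2)
v1 = g1.var(ddof=1) / len(g1)
v2 = g2.var(ddof=1) / len(g2)
se = np.sqrt(v1 + v2)

# Welch-Satterthwaite degrees of freedom
df = (v1 + v2)**2 / (v1**2 / (len(g1) - 1) + v2**2 / (len(g2) - 1))

t_star = stats.t.ppf(0.975, df)          # critical value for a 95% CI
lo, hi = diff - t_star * se, diff + t_star * se
print(f"95% CI for mu1 - mu2: ({lo:.3f}, {hi:.3f})")
```

Here the interval lies entirely above zero, so — per the reading guide above — Group 1 is plausibly higher.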
Conditions Summary
| Condition | Two-Sample t | Paired t | Two-Proportion z |
|---|---|---|---|
| Independence between groups | ✓ | N/A (same subjects) | ✓ |
| Independence within groups | ✓ | Differences independent | ✓ |
| Normality/Success-failure | Both groups: $n \geq 30$ or normal | Differences: $n \geq 30$ or normal | $n\hat{p}_{\text{pooled}} \geq 10$ and $n(1-\hat{p}_{\text{pooled}}) \geq 10$ per group |
| Random sampling/assignment | ✓ | ✓ | ✓ |
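The success-failure condition in the table is easy to automate. A minimal sketch, using the same hypothetical counts as above (48/200 vs. 30/180):

```python
# Success-failure check for the two-proportion z-test (hypothetical counts)
x1, n1 = 48, 200
x2, n2 = 30, 180
p_pool = (x1 + x2) / (n1 + n2)

# Need at least 10 expected successes AND 10 expected failures per group
for n in (n1, n2):
    assert n * p_pool >= 10 and n * (1 - p_pool) >= 10, \
        "success-failure condition fails; z-test not appropriate"
print("success-failure condition satisfied in both groups")
```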
Python Quick Reference
```python
from scipy import stats
from statsmodels.stats.proportion import proportions_ztest
import numpy as np

# --- Two-sample t-test (Welch's, default) ---
t, p = stats.ttest_ind(group1, group2, equal_var=False)

# One-tailed:
t, p = stats.ttest_ind(group1, group2, equal_var=False,
                       alternative='greater')

# --- Paired t-test ---
t, p = stats.ttest_rel(after, before)

# Equivalent:
diffs = after - before
t, p = stats.ttest_1samp(diffs, popmean=0)

# --- Two-proportion z-test ---
count = np.array([successes1, successes2])
nobs = np.array([n1, n2])
z, p = proportions_ztest(count, nobs)
```
Excel Quick Reference
| Test | Formula |
|---|---|
| Two-sample t (Welch's) | =T.TEST(range1, range2, tails, 3) |
| Paired t | =T.TEST(range1, range2, tails, 1) |
| Two-proportion z | Compute z manually, then =2*(1-NORM.S.DIST(ABS(z),TRUE)) |
Common Misconceptions
| Misconception | Reality |
|---|---|
| "Same test score = paired data" | Pairing is about the study design (same subjects, matched pairs), not the test itself |
| "Overlapping CIs mean no difference" | Two CIs can overlap and the difference can still be significant; compute the CI for the difference |
| "Significant difference = causal effect" | Only in randomized experiments; observational studies show associations |
| "Equal variances test first, then choose t-test" | Just use Welch's by default — it works either way |
| "More degrees of freedom always means more power" | Using the wrong test (independent instead of paired) gives more df but less power |
| "A small p-value means a large difference" | P-values depend on sample size; the CI tells you how large the difference is |
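The overlapping-CIs misconception from the table can be demonstrated numerically. The summary statistics below are contrived so that the two individual 95% CIs overlap while the CI for the difference excludes zero:

```python
import numpy as np
from scipy import stats

# Contrived summary statistics: two group means with equal standard errors
m1, se1 = 10.0, 0.3
m2, se2 = 11.0, 0.3
z_star = stats.norm.ppf(0.975)           # ≈ 1.96 for 95% confidence

# Individual 95% CIs for each group mean
ci1 = (m1 - z_star * se1, m1 + z_star * se1)
ci2 = (m2 - z_star * se2, m2 + z_star * se2)
overlap = ci1[1] > ci2[0]                # the two intervals overlap

# 95% CI for the DIFFERENCE uses the SE of the difference
se_diff = np.sqrt(se1**2 + se2**2)
ci_diff = ((m2 - m1) - z_star * se_diff, (m2 - m1) + z_star * se_diff)
significant = ci_diff[0] > 0             # yet the difference CI excludes zero

print(overlap, significant)              # True True
```

The reason: the SE of the difference is $\sqrt{SE_1^2 + SE_2^2}$, which is smaller than $SE_1 + SE_2$, the gap that "no overlap" implicitly requires. This is why the table says to compute the CI for the difference.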
How This Chapter Connects
| This Chapter | Builds On | Leads To |
|---|---|---|
| Two-sample t-test | One-sample t-test (Ch.15), SE (Ch.11) | ANOVA for 3+ groups (Ch.20) |
| Paired t-test | One-sample t-test on differences (Ch.15) | Repeated measures designs (Ch.20) |
| Two-proportion z-test | One-sample z-test for proportions (Ch.14) | Chi-square tests (Ch.19) |
| Study design interpretation | Experimental vs. observational (Ch.4) | Regression controls for confounders (Ch.22-23) |
| Choosing the right test | Hypothesis testing framework (Ch.13) | Power analysis informs sample size (Ch.17) |
The Key Themes
Theme 2: Comparing groups is where bias becomes visible. Every fairness audit, pay equity analysis, and health disparity study uses two-group comparisons. James's false positive rate comparison revealed algorithmic bias that the overall comparison masked. The methods in this chapter are tools for justice as much as tools for science.
Theme 5: Correlation vs. causation in group comparisons. Finding a significant difference between two groups does not automatically mean one group's condition caused the difference. Alex's randomized A/B test supports a causal claim. Maya's observational comparison reveals an association that demands further investigation. The statistical test tells you the difference is real. The study design determines what kind of "real" it is.
The One Thing to Remember
If you forget everything else from this chapter, remember this:
When comparing two groups, ask two questions before choosing a test: (1) Am I comparing means or proportions? (2) Are the data independent or paired? The answers determine whether you use a two-sample t-test, a paired t-test, or a two-proportion z-test. All three follow the same logic: observed difference divided by standard error of the difference. Use Welch's t-test by default for independent means (it doesn't assume equal variances). For paired data, compute within-pair differences and run a one-sample t-test. Always report the confidence interval for the difference alongside the p-value — the CI tells you HOW BIG the difference is, not just WHETHER it exists. And remember: the test tells you the difference is real; the study design tells you whether you can call it causal.
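The two-question decision rule can be captured in a few lines. This `choose_test` helper is a hypothetical illustration of the chapter's logic, not a function from any library:

```python
def choose_test(variable: str, paired: bool = False) -> str:
    """Apply the chapter's two-question decision rule (illustrative sketch).

    variable: 'numerical' (comparing means) or 'categorical' (comparing proportions)
    paired:   True if each observation in one group has a natural partner
    """
    if variable == "numerical":
        return "paired t-test" if paired else "two-sample t-test (Welch's)"
    if variable == "categorical":
        return "two-proportion z-test"
    raise ValueError("variable must be 'numerical' or 'categorical'")


print(choose_test("numerical", paired=True))   # paired t-test
print(choose_test("categorical"))              # two-proportion z-test
```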
Key Terms
| Term | Definition |
|---|---|
| Two-sample t-test | A hypothesis test comparing the means of two independent groups; uses Welch's formula by default: $t = (\bar{x}_1 - \bar{x}_2) / \sqrt{s_1^2/n_1 + s_2^2/n_2}$ |
| Independent samples | Two samples where individuals in one group are completely unrelated to individuals in the other; knowing a value in Group 1 tells you nothing about Group 2 |
| Paired t-test | A hypothesis test for paired data; computes within-pair differences and tests whether the mean difference is zero: $t = \bar{d}/(s_d/\sqrt{n})$ with $df = n-1$ |
| Dependent samples | Paired or matched observations where each data point in one group has a natural partner in the other (same person, same location, matched pair) |
| Pooled standard error | The standard error computed under $H_0$ by combining both samples into one estimate; for two proportions: $SE = \sqrt{\hat{p}_{\text{pooled}}(1-\hat{p}_{\text{pooled}})(1/n_1 + 1/n_2)}$. Contrast with the unpooled SE for independent means, $\sqrt{s_1^2/n_1 + s_2^2/n_2}$, which keeps each group's variability separate |
| Two-proportion z-test | A hypothesis test comparing proportions from two independent groups using the standard normal distribution; uses the pooled proportion under $H_0$ |
| Matched pairs | A study design where each observation in one condition is linked to a specific observation in the other condition, creating natural one-to-one correspondence |
| Difference in means | $\bar{x}_1 - \bar{x}_2$; the point estimate for $\mu_1 - \mu_2$; the primary quantity of interest in two-sample mean comparisons |
| Difference in proportions | $\hat{p}_1 - \hat{p}_2$; the point estimate for $p_1 - p_2$; the primary quantity of interest in two-sample proportion comparisons |