Key Takeaways: Comparing Two Groups

One-Sentence Summary

Comparing two groups — the most common analysis in applied statistics — requires choosing among the two-sample t-test (independent means), the paired t-test (dependent means), and the two-proportion z-test (independent proportions), each following the same fundamental logic: measure the difference, compute the standard error of that difference, and determine whether the difference is too large to be explained by chance.

Core Concepts at a Glance

| Concept | Definition | Why It Matters |
|---|---|---|
| Independent samples | Two samples where individuals in one group are unrelated to individuals in the other | Determines whether to use the two-sample t-test (independent) or paired t-test (dependent) |
| Dependent (paired) samples | Data where each observation in one group has a natural partner in the other group | Paired tests eliminate between-subject variability, often dramatically increasing power |
| Difference in proportions | $\hat{p}_1 - \hat{p}_2$ estimates the true difference $p_1 - p_2$ between two population proportions | Essential for comparing rates, percentages, and binary outcomes across two groups |
| CI for a difference | Interval that estimates how large the difference between two groups plausibly is | Contains zero ↔ fail to reject $H_0$; entirely positive or negative → significant difference |

The Three Tests

1. Two-Sample t-Test (Welch's)

$$\boxed{t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}}$$

  • When: Comparing means from two independent groups
  • df: Welch-Satterthwaite approximation (let software compute it)
  • Default choice: Welch's (does not assume equal variances)
  • Example: Alex's A/B test — old algorithm vs. new algorithm
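The formula above can be checked by hand against scipy; the groups here are made-up illustrative numbers, not the chapter's A/B-test data:

```python
# A minimal sketch: Welch's t computed from the formula, then checked
# against scipy. The data are simulated placeholders.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
group1 = rng.normal(100, 15, size=40)   # stand-in for the old algorithm
group2 = rng.normal(105, 20, size=35)   # stand-in for the new algorithm

# Observed difference divided by the unpooled SE of the difference
se = np.sqrt(group1.var(ddof=1) / group1.size +
             group2.var(ddof=1) / group2.size)
t_manual = (group1.mean() - group2.mean()) / se

t_scipy, p = stats.ttest_ind(group1, group2, equal_var=False)
print(t_manual, t_scipy)  # the two t statistics agree
```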

2. Paired t-Test

$$\boxed{t = \frac{\bar{d}}{s_d / \sqrt{n}}, \quad df = n - 1}$$

  • When: Data come in natural pairs (before/after, matched subjects)
  • Key insight: This IS a one-sample t-test on the within-pair differences
  • Why it's powerful: Eliminates between-subject variability
  • Example: Sam's game-by-game comparison of Daria's scoring
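A small simulation (with hypothetical numbers) shows why pairing pays off: a large subject-to-subject spread drowns the independent test, but cancels out of the within-pair differences:

```python
# A minimal sketch: same data analyzed paired vs. independent. The
# subject effect is large relative to the true improvement of +3.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 30
subject = rng.normal(50, 10, size=n)             # big between-subject spread
before = subject + rng.normal(0, 2, size=n)
after = subject + 3 + rng.normal(0, 2, size=n)   # true improvement of +3

_, p_paired = stats.ttest_rel(after, before)
_, p_indep = stats.ttest_ind(after, before, equal_var=False)
print(p_paired, p_indep)  # the paired p-value is far smaller
```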

3. Two-Proportion z-Test

$$\boxed{z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}_{\text{pooled}}(1 - \hat{p}_{\text{pooled}})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}}$$

  • When: Comparing proportions from two independent groups
  • Pooled proportion: $\hat{p}_{\text{pooled}} = (X_1 + X_2)/(n_1 + n_2)$
  • Note: Test uses pooled SE; CI uses unpooled SE
  • Example: James's recidivism rate comparison
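The pooled z statistic can likewise be computed by hand and checked against statsmodels; the counts are made-up illustrative numbers:

```python
# A minimal sketch of the pooled two-proportion z statistic.
import numpy as np
from statsmodels.stats.proportion import proportions_ztest

x1, n1 = 120, 400    # successes and trials, group 1
x2, n2 = 90, 380     # successes and trials, group 2

p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)        # pooled proportion under H0
se = np.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z_manual = (p1 - p2) / se

z_sm, p = proportions_ztest([x1, x2], [n1, n2])
print(z_manual, z_sm)  # the two z statistics agree
```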

The Decision Flowchart

     Comparing two groups?
            │
      What type of variable?
      ╱                 ╲
  Numerical          Categorical
   (means)          (proportions)
      │                  │
   Paired?         Two-proportion
   ╱     ╲            z-test
  Yes     No
   │       │
Paired  Two-sample
t-test   t-test
         (Welch's)
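The flowchart can be encoded as a tiny helper function (hypothetical, for illustration only):

```python
# A sketch mapping the flowchart's two questions to a test name.
def choose_test(variable_type: str, paired: bool = False) -> str:
    """variable_type is 'numerical' (means) or 'categorical' (proportions)."""
    if variable_type == "categorical":
        return "two-proportion z-test"
    return "paired t-test" if paired else "two-sample t-test (Welch's)"

print(choose_test("numerical", paired=True))   # paired t-test
print(choose_test("numerical"))                # two-sample t-test (Welch's)
print(choose_test("categorical"))              # two-proportion z-test
```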

Confidence Intervals for Differences

| Scenario | CI Formula |
|---|---|
| Independent means | $(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{s_1^2/n_1 + s_2^2/n_2}$ |
| Paired means | $\bar{d} \pm t^*_{n-1} \cdot s_d/\sqrt{n}$ |
| Independent proportions | $(\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{\hat{p}_1(1-\hat{p}_1)/n_1 + \hat{p}_2(1-\hat{p}_2)/n_2}$ |

Reading the CI:

  • Contains zero → no significant difference
  • Entirely positive → Group 1 plausibly higher
  • Entirely negative → Group 1 plausibly lower
  • Width → precision of the estimate
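As a sketch, here is the proportions CI computed directly from made-up counts, read off the same way:

```python
# A minimal sketch: 95% CI for a difference in proportions, using the
# unpooled SE (CIs, unlike the test, do not pool). Counts are made up.
import numpy as np
from scipy import stats

x1, n1, x2, n2 = 120, 400, 90, 380
p1, p2 = x1 / n1, x2 / n2

se = np.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)  # unpooled SE
z_star = stats.norm.ppf(0.975)                          # ~1.96 for 95%
lo, hi = (p1 - p2) - z_star * se, (p1 - p2) + z_star * se
print(f"95% CI for p1 - p2: ({lo:.3f}, {hi:.3f})")
# An interval that excludes zero indicates a significant difference at 5%.
```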

Conditions Summary

| Condition | Two-Sample t | Paired t | Two-Proportion z |
|---|---|---|---|
| Independence between groups | ✓ | N/A (same subjects) | ✓ |
| Independence within groups | ✓ | Differences independent | ✓ |
| Normality/Success-failure | Both groups: $n \geq 30$ or normal | Differences: $n \geq 30$ or normal | $n\hat{p}_{\text{pooled}} \geq 10$ and $n(1-\hat{p}_{\text{pooled}}) \geq 10$ per group |
| Random sampling/assignment | ✓ | ✓ | ✓ |

Python Quick Reference

```python
from scipy import stats
from statsmodels.stats.proportion import proportions_ztest
import numpy as np

# --- Two-sample t-test (Welch's, default) ---
# group1, group2: arrays of numerical observations
t, p = stats.ttest_ind(group1, group2, equal_var=False)
# One-tailed:
t, p = stats.ttest_ind(group1, group2, equal_var=False,
                       alternative='greater')

# --- Paired t-test ---
# after, before: arrays aligned pair by pair
t, p = stats.ttest_rel(after, before)
# Equivalent: a one-sample t-test on the within-pair differences
diffs = after - before
t, p = stats.ttest_1samp(diffs, popmean=0)

# --- Two-proportion z-test ---
count = np.array([successes1, successes2])   # successes in each group
nobs = np.array([n1, n2])                    # sample sizes
z, p = proportions_ztest(count, nobs)
```

Excel Quick Reference

| Test | Formula |
|---|---|
| Two-sample t (Welch's) | `=T.TEST(range1, range2, tails, 3)` |
| Paired t | `=T.TEST(range1, range2, tails, 1)` |
| Two-proportion z | Compute z manually, then `=2*(1-NORM.S.DIST(ABS(z),TRUE))` |

Common Misconceptions

| Misconception | Reality |
|---|---|
| "Same test score = paired data" | Pairing is about the study design (same subjects, matched pairs), not the test itself |
| "Overlapping CIs mean no difference" | Two CIs can overlap and the difference can still be significant; compute the CI for the difference |
| "Significant difference = causal effect" | Only in randomized experiments; observational studies show associations |
| "Equal variances test first, then choose t-test" | Just use Welch's by default — it works either way |
| "More degrees of freedom always means more power" | Using the wrong test (independent instead of paired) gives more df but less power |
| "A small p-value means a large difference" | P-values depend on sample size; the CI tells you how large the difference is |
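The overlapping-CIs misconception can be checked with a deterministic example (made-up summary statistics): the two individual 95% CIs overlap, yet the CI for the difference excludes zero:

```python
# A sketch from summary statistics (no random data): individual CIs
# overlap while the CI for the difference is entirely positive.
import numpy as np
from scipy import stats

m1, s1, n1 = 10.0, 3.0, 200
m2, s2, n2 = 10.8, 3.0, 200
z = stats.norm.ppf(0.975)

ci1 = (m1 - z * s1 / np.sqrt(n1), m1 + z * s1 / np.sqrt(n1))
ci2 = (m2 - z * s2 / np.sqrt(n2), m2 + z * s2 / np.sqrt(n2))
se_diff = np.sqrt(s1**2 / n1 + s2**2 / n2)
ci_diff = (m2 - m1 - z * se_diff, m2 - m1 + z * se_diff)

print(ci1[1] > ci2[0])   # True: the individual CIs overlap
print(ci_diff[0] > 0)    # True: the difference is still significant
```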

How This Chapter Connects

| This Chapter | Builds On | Leads To |
|---|---|---|
| Two-sample t-test | One-sample t-test (Ch.15), SE (Ch.11) | ANOVA for 3+ groups (Ch.20) |
| Paired t-test | One-sample t-test on differences (Ch.15) | Repeated measures designs (Ch.20) |
| Two-proportion z-test | One-sample z-test for proportions (Ch.14) | Chi-square tests (Ch.19) |
| Study design interpretation | Experimental vs. observational (Ch.4) | Regression controls for confounders (Ch.22-23) |
| Choosing the right test | Hypothesis testing framework (Ch.13) | Power analysis informs sample size (Ch.17) |

The Key Themes

Theme 2: Comparing groups is where bias becomes visible. Every fairness audit, pay equity analysis, and health disparity study uses two-group comparisons. James's false positive rate comparison revealed algorithmic bias that the overall comparison masked. The methods in this chapter are tools for justice as much as tools for science.

Theme 5: Correlation vs. causation in group comparisons. Finding a significant difference between two groups does not automatically mean one group's condition caused the difference. Alex's randomized A/B test supports a causal claim. Maya's observational comparison reveals an association that demands further investigation. The statistical test tells you the difference is real. The study design determines what kind of "real" it is.

The One Thing to Remember

If you forget everything else from this chapter, remember this:

When comparing two groups, ask two questions before choosing a test: (1) Am I comparing means or proportions? (2) Are the data independent or paired? The answers determine whether you use a two-sample t-test, a paired t-test, or a two-proportion z-test. All three follow the same logic: observed difference divided by standard error of the difference. Use Welch's t-test by default for independent means (it doesn't assume equal variances). For paired data, compute within-pair differences and run a one-sample t-test. Always report the confidence interval for the difference alongside the p-value — the CI tells you HOW BIG the difference is, not just WHETHER it exists. And remember: the test tells you the difference is real; the study design tells you whether you can call it causal.

Key Terms

| Term | Definition |
|---|---|
| Two-sample t-test | A hypothesis test comparing the means of two independent groups; uses Welch's formula by default: $t = (\bar{x}_1 - \bar{x}_2) / \sqrt{s_1^2/n_1 + s_2^2/n_2}$ |
| Independent samples | Two samples where individuals in one group are completely unrelated to individuals in the other; knowing a value in Group 1 tells you nothing about Group 2 |
| Paired t-test | A hypothesis test for paired data; computes within-pair differences and tests whether the mean difference is zero: $t = \bar{d}/(s_d/\sqrt{n})$ with $df = n-1$ |
| Dependent samples | Paired or matched observations where each data point in one group has a natural partner in the other (same person, same location, matched pair) |
| Pooled standard error | The standard error computed under $H_0$ by combining both samples into a single estimate; for the two-proportion z-test: $SE = \sqrt{\hat{p}_{\text{pooled}}(1-\hat{p}_{\text{pooled}})(1/n_1 + 1/n_2)}$ (the independent-means SE, $\sqrt{s_1^2/n_1 + s_2^2/n_2}$, is unpooled) |
| Two-proportion z-test | A hypothesis test comparing proportions from two independent groups using the standard normal distribution; uses the pooled proportion under $H_0$ |
| Matched pairs | A study design where each observation in one condition is linked to a specific observation in the other condition, creating natural one-to-one correspondence |
| Difference in means | $\bar{x}_1 - \bar{x}_2$; the point estimate for $\mu_1 - \mu_2$; the primary quantity of interest in two-sample mean comparisons |
| Difference in proportions | $\hat{p}_1 - \hat{p}_2$; the point estimate for $p_1 - p_2$; the primary quantity of interest in two-sample proportion comparisons |