Key Takeaways: Analysis of Variance (ANOVA)
One-Sentence Summary
Analysis of variance (ANOVA) compares means across three or more groups by decomposing total data variability into between-group (explained) and within-group (unexplained) components, using the F-statistic ratio to determine whether group differences exceed what random variation alone would produce — followed by Tukey's HSD post-hoc tests to identify which specific groups differ while controlling the family-wise error rate.
Core Concepts at a Glance
| Concept | Definition | Why It Matters |
|---|---|---|
| Multiple comparisons problem | Running many tests inflates the probability of at least one false positive far beyond $\alpha$ | Explains why multiple t-tests are dangerous and ANOVA is necessary |
| Between-group variability | Variability in the data explained by group membership ($SS_B$) | The "signal" — differences among group means |
| Within-group variability | Variability in the data unexplained by groups ($SS_W$) — natural noise within each group | The "noise" — individual differences within groups |
| F-statistic | Ratio $MS_B / MS_W$ — signal divided by noise | Large F means group differences are unlikely due to chance alone |
| Decomposing variability | $SS_T = SS_B + SS_W$ — total variation splits exactly into explained and unexplained parts | The threshold concept; foundation for regression $R^2$ and all statistical modeling |
The ANOVA Procedure
Step by Step
1. State hypotheses:
   - $H_0: \mu_1 = \mu_2 = \cdots = \mu_k$ (all group means equal)
   - $H_a$: not all $\mu_i$ are equal (at least one group differs)
2. Check assumptions:
   - Independence (study design)
   - Normality within each group (histograms, QQ-plots, Shapiro-Wilk)
   - Equal variances (Levene's test, SD ratio $< 2$)
3. Compute the ANOVA table:
| Source | SS | df | MS | F |
|---|---|---|---|---|
| Between | $\sum n_i(\bar{x}_i - \bar{x})^2$ | $k - 1$ | $SS_B / (k-1)$ | $MS_B / MS_W$ |
| Within | $\sum\sum(x_{ij} - \bar{x}_i)^2$ | $N - k$ | $SS_W / (N-k)$ | |
| Total | $\sum\sum(x_{ij} - \bar{x})^2$ | $N - 1$ | | |
4. Find the p-value from the F-distribution with $df_1 = k-1$ and $df_2 = N-k$.
5. Compute the effect size: $\eta^2 = SS_B / SS_T$.
6. If the omnibus test is significant, run Tukey's HSD for pairwise comparisons.
7. Interpret in context, reporting descriptive statistics, the F-statistic, p-value, effect size, and post-hoc results.
Key Python Code
```python
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd
import numpy as np

# group1, group2, group3: 1-D arrays of observations, one array per group

# One-way ANOVA
F_stat, p_value = stats.f_oneway(group1, group2, group3)

# Check equal variances
stat, p_levene = stats.levene(group1, group2, group3)

# Effect size (eta-squared), computed manually
all_data = np.concatenate([group1, group2, group3])
grand_mean = np.mean(all_data)
ss_between = sum(len(g) * (np.mean(g) - grand_mean)**2
                 for g in [group1, group2, group3])
ss_total = np.sum((all_data - grand_mean)**2)
eta_squared = ss_between / ss_total

# Post-hoc: Tukey's HSD (reuses all_data from above)
labels = ['G1']*len(group1) + ['G2']*len(group2) + ['G3']*len(group3)
tukey = pairwise_tukeyhsd(endog=all_data, groups=labels, alpha=0.05)
print(tukey)
```
Excel: Data Analysis ToolPak
- Data tab → Data Analysis → Anova: Single Factor
- Set Input Range to all data columns
- Grouped By: Columns
- Output includes ANOVA table with SS, df, MS, F, p-value, and $F_{\text{critical}}$
The Threshold Concept: Decomposing Variability
Total variation = Explained variation + Unexplained variation
$$SS_T = SS_B + SS_W$$
This is not just an ANOVA formula. It's a universal principle:
| Context | Total | Explained | Unexplained |
|---|---|---|---|
| ANOVA | $SS_T$ | $SS_B$ (group differences) | $SS_W$ (within-group noise) |
| Regression (Ch.22) | $SS_T$ | $SS_{\text{Reg}}$ (predictor) | $SS_{\text{Res}}$ (residuals) |
| Effect size | 100% | $\eta^2$ or $R^2$ (% explained) | $1 - \eta^2$ (% unexplained) |
Getting this concept — really getting it — prepares you for regression, multiple regression, and the $R^2$ interpretation in Chapters 22-23.
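The identity can be verified numerically; the three small groups below are made up for illustration:

```python
import numpy as np

# Illustrative data: three small groups (values are made up)
groups = [np.array([4.0, 5.0, 6.0]),
          np.array([7.0, 8.0, 9.0]),
          np.array([5.0, 6.0, 7.0])]

all_data = np.concatenate(groups)
grand_mean = all_data.mean()

ss_total = np.sum((all_data - grand_mean) ** 2)
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(np.sum((g - g.mean()) ** 2) for g in groups)

# The decomposition holds exactly: SS_T = SS_B + SS_W
assert np.isclose(ss_total, ss_between + ss_within)
print(ss_total, ss_between, ss_within)  # prints 20.0 14.0 6.0
```

Here 70% of the total variation ($14/20$) is explained by group membership, which is exactly $\eta^2$.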
The Multiple Comparisons Problem
| Groups ($k$) | Pairwise Tests | $P(\geq 1 \text{ false positive})$ |
|---|---|---|
| 2 | 1 | 5.0% |
| 3 | 3 | 14.3% |
| 5 | 10 | 40.1% |
| 10 | 45 | 90.1% |
ANOVA avoids this inflation by testing all group means in a single omnibus test with a single p-value, keeping the Type I error rate at $\alpha$.
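The probabilities in the table come from $1 - (1-\alpha)^m$ with $m = \binom{k}{2}$ independent tests; a minimal sketch:

```python
from math import comb

alpha = 0.05
for k in [2, 3, 5, 10]:
    m = comb(k, 2)                   # number of pairwise tests
    fwer = 1 - (1 - alpha) ** m      # P(at least one false positive)
    print(f"k={k:2d}  tests={m:2d}  FWER={fwer:.1%}")
```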
Key Formulas
| Formula | Description |
|---|---|
| $SS_T = \sum\sum(x_{ij} - \bar{x})^2$ | Total sum of squares |
| $SS_B = \sum n_i(\bar{x}_i - \bar{x})^2$ | Between-group sum of squares |
| $SS_W = \sum\sum(x_{ij} - \bar{x}_i)^2$ | Within-group sum of squares |
| $MS_B = SS_B / (k-1)$ | Mean square between |
| $MS_W = SS_W / (N-k)$ | Mean square within |
| $F = MS_B / MS_W$ | F-statistic |
| $\eta^2 = SS_B / SS_T$ | Eta-squared (proportion of variance explained) |
| $\binom{k}{2} = k(k-1)/2$ | Number of pairwise comparisons |
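As a sketch on made-up data, the formulas above reproduce `scipy.stats.f_oneway` exactly:

```python
import numpy as np
from scipy import stats

# Illustrative groups (values are made up)
groups = [np.array([23., 25., 27., 24.]),
          np.array([30., 31., 29., 32.]),
          np.array([26., 28., 27., 25.])]

all_data = np.concatenate(groups)
N, k = len(all_data), len(groups)
grand_mean = all_data.mean()

ss_b = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_w = sum(np.sum((g - g.mean()) ** 2) for g in groups)
ms_b, ms_w = ss_b / (k - 1), ss_w / (N - k)
F = ms_b / ms_w
p = stats.f.sf(F, k - 1, N - k)   # upper tail of F(k-1, N-k)

F_ref, p_ref = stats.f_oneway(*groups)
assert np.isclose(F, F_ref) and np.isclose(p, p_ref)
```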
Effect Size Benchmarks (Cohen, 1988)
| $\eta^2$ | Cohen's $f$ | Interpretation |
|---|---|---|
| 0.01 | 0.10 | Small |
| 0.06 | 0.25 | Medium |
| 0.14 | 0.40 | Large |
Always interpret effect sizes in the context of your field — these benchmarks are starting points, not absolute standards.
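The two columns of the benchmark table are related by $f = \sqrt{\eta^2/(1-\eta^2)}$; a quick check of the benchmark rows:

```python
import math

def cohens_f(eta_squared):
    """Convert eta-squared to Cohen's f."""
    return math.sqrt(eta_squared / (1 - eta_squared))

for eta2 in (0.01, 0.06, 0.14):
    print(f"eta^2={eta2:.2f}  f={cohens_f(eta2):.2f}")
```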
Post-Hoc Tests: When and How
| Test | When to Use |
|---|---|
| Tukey's HSD | Default for ANOVA follow-up; all pairwise comparisons; less conservative than Bonferroni |
| Bonferroni | When you want to test only a few pre-planned comparisons; simpler but more conservative |
| Neither | When ANOVA is not significant — do not fish for pairwise differences |
Assumptions Checklist
| Assumption | How to Check | If Violated |
|---|---|---|
| Independence | Study design (random sampling, random assignment) | Use repeated-measures ANOVA or mixed models |
| Normality | Shapiro-Wilk, QQ-plots, histograms per group | Robust if $n \geq 15$-$20$ per group and balanced design; otherwise Kruskal-Wallis (Ch.21) |
| Equal variances | Levene's test, SD ratio $< 2$ | Welch's ANOVA; robust if group sizes are equal |
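A minimal sketch of the normality and equal-variance checks; the simulated groups stand in for real data:

```python
import numpy as np
from scipy import stats

# Simulated stand-in data (replace with your own groups)
rng = np.random.default_rng(0)
groups = [rng.normal(10, 2, 20), rng.normal(12, 2, 20), rng.normal(11, 2, 20)]

# Normality within each group (Shapiro-Wilk; large p = no evidence against normality)
for i, g in enumerate(groups, 1):
    w, p = stats.shapiro(g)
    print(f"group {i}: Shapiro-Wilk p = {p:.3f}")

# Equal variances: Levene's test plus the SD-ratio rule of thumb
stat, p_levene = stats.levene(*groups)
sds = [g.std(ddof=1) for g in groups]
print(f"Levene p = {p_levene:.3f}, SD ratio = {max(sds) / min(sds):.2f}")
```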
Common Mistakes
| Mistake | Correction |
|---|---|
| Running multiple t-tests instead of ANOVA | Use one-way ANOVA to test all groups simultaneously |
| "ANOVA is significant, so all groups differ" | ANOVA only tells you at least one group differs; use Tukey's HSD for specifics |
| Running post-hoc tests after non-significant ANOVA | Only run post-hoc tests after a significant omnibus test |
| Ignoring effect size | Always report $\eta^2$ alongside $F$ and $p$ |
| Reporting $F$ without both degrees of freedom | Correct format: $F(df_B, df_W) = \text{value}$, $p = \text{value}$, $\eta^2 = \text{value}$ |
Reporting Template (APA Style)
"A one-way ANOVA revealed a statistically significant difference in [outcome variable] across the [k] [grouping variable] groups, $F(df_B, df_W) = [F\text{-value}]$, $p [= \text{or} <] [p\text{-value}]$, $\eta^2 = [value]$. Tukey's HSD post-hoc comparisons indicated that [specific group differences with means and adjusted p-values]."
Connections
| Connection | Details |
|---|---|
| Ch.6 (Variance) | ANOVA literally analyzes variance; $MS_W$ is a pooled version of the sample variance |
| Ch.16 (Two-sample t-test) | ANOVA generalizes to $k \geq 2$ groups; when $k = 2$, $F = t^2$ |
| Ch.17 (Effect sizes, multiple comparisons) | $\eta^2$ parallels Cohen's $d$; FWER and Bonferroni correction applied to ANOVA context |
| Ch.21 (Nonparametric methods) | Kruskal-Wallis test is the nonparametric alternative when ANOVA assumptions fail |
| Ch.22 (Regression) | $R^2$ is the regression analogue of $\eta^2$; same decomposition of variability |
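The Ch.16 connection ($F = t^2$ when $k = 2$) can be verified directly; the two groups below are made up:

```python
import numpy as np
from scipy import stats

a = np.array([5.1, 4.9, 6.0, 5.5, 5.8])
b = np.array([6.2, 6.8, 6.5, 7.0, 6.1])

t_stat, p_t = stats.ttest_ind(a, b)     # pooled-variance two-sample t-test
F_stat, p_F = stats.f_oneway(a, b)      # one-way ANOVA on the same two groups

assert np.isclose(F_stat, t_stat ** 2)  # F = t^2 for two groups
assert np.isclose(p_t, p_F)             # identical p-values
```

Note that the equivalence requires the pooled-variance t-test (SciPy's default `equal_var=True`), matching ANOVA's equal-variance assumption.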