# Key Takeaways: Inference for Means

## One-Sentence Summary
The one-sample t-test — the workhorse of statistical inference — tests claims about a population mean using the t-distribution (which honestly accounts for our uncertainty about $\sigma$), is robust to non-normality for moderate-to-large samples, and should be your default whenever $\sigma$ is unknown (which is almost always).
## Core Concepts at a Glance
| Concept | Definition | Why It Matters |
|---|---|---|
| One-sample t-test | Tests whether a population mean $\mu$ equals a specific value $\mu_0$, using the t-distribution | The most commonly used statistical test in practice; applies whenever you have quantitative data and a reference value |
| Robustness | A procedure's ability to give approximately correct results even when assumptions aren't perfectly met | The t-test is remarkably robust to non-normality for $n \geq 30$, making it practical for real-world data |
| Paired data (preview) | Data that come in natural pairs (before/after, matched subjects); analyzed by computing differences and running a one-sample t-test on those differences | Eliminates person-to-person variability, often dramatically increasing power |
## The One-Sample t-Test Formula
$$\boxed{t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}}$$
where:

- $\bar{x}$ = sample mean
- $\mu_0$ = hypothesized population mean (from $H_0$)
- $s$ = sample standard deviation
- $n$ = sample size
- $df = n - 1$ (degrees of freedom)
In plain English: The t-statistic measures how many standard errors the sample mean is from the hypothesized value. Large values (positive or negative) mean the data are far from $H_0$; values near zero mean the data are consistent with $H_0$.
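A quick numerical illustration of the formula, using hypothetical summary statistics (a sample of $n = 25$ with mean 2.8 and standard deviation 0.6, testing $H_0: \mu = 3.0$):

```python
import numpy as np
from scipy import stats

# Hypothetical numbers for illustration only
x_bar, mu_0, s, n = 2.8, 3.0, 0.6, 25

se = s / np.sqrt(n)                        # standard error = 0.6 / 5 = 0.12
t = (x_bar - mu_0) / se                    # (2.8 - 3.0) / 0.12 ≈ -1.667
p_two = 2 * stats.t.sf(abs(t), df=n - 1)   # two-tailed p-value ≈ 0.11

print(f"t = {t:.3f}, p = {p_two:.3f}")
```

The sample mean sits about 1.67 standard errors below the hypothesized value; with $df = 24$, that is not far enough into the tails to reject at $\alpha = 0.05$.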
## The Five-Step Procedure
| Step | Action | Key Question |
|---|---|---|
| 1 | State $H_0$ and $H_a$ | What's the claim? One- or two-tailed? |
| 2 | Check conditions | Random? Independent? Normal enough? |
| 3 | Compute $t = (\bar{x} - \mu_0) / (s/\sqrt{n})$ | How far from $H_0$, in SE units? |
| 4 | Find p-value from $t_{n-1}$ distribution | How surprising are these data if $H_0$ is true? |
| 5 | Conclude in context | Reject or fail to reject — and what does it mean? |
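The five steps above can be sketched end-to-end in Python. The data here are simulated fill weights (a hypothetical scenario, not from the chapter), testing $H_0: \mu = 500$ against a two-sided alternative:

```python
import numpy as np
from scipy import stats

# Step 1: H0: mu = 500 vs Ha: mu != 500 (two-tailed)
# Step 2: conditions assumed met for this simulated example (random, independent, n = 40)
rng = np.random.default_rng(0)
data = rng.normal(loc=501.2, scale=4.0, size=40)  # hypothetical fill weights (grams)

# Steps 3-4: t-statistic and two-tailed p-value from the t distribution, df = n - 1
res = stats.ttest_1samp(data, popmean=500)
print(f"t = {res.statistic:.3f}, p = {res.pvalue:.4f}, df = {len(data) - 1}")

# Step 5: conclude in context at alpha = 0.05
alpha = 0.05
print("reject H0" if res.pvalue < alpha else "fail to reject H0")
```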
## Three Conditions for the t-Test
| Condition | What to Check | What Happens If Violated |
|---|---|---|
| 1. Randomness | Data from a random sample or random assignment | Results cannot be generalized; no statistical fix |
| 2. Independence | Observations don't influence each other; 10% condition for sampling without replacement | Standard error is wrong; p-values unreliable |
| 3. Normality | Sampling distribution of $\bar{x}$ is approximately normal | P-values may be inaccurate, especially for small $n$ |
## The Normality Condition: Quick Guide
| Sample Size | Requirement | Rationale |
|---|---|---|
| $n < 15$ | Population must be approximately normal; no outliers or skewness | CLT can't compensate; t-test relies on normality directly |
| $15 \leq n < 30$ | Tolerate moderate skewness; check for extreme outliers | CLT partially compensates; extreme outliers still distort |
| $n \geq 30$ | CLT handles most population shapes; only extreme outliers are a concern | CLT nearly guarantees normality of $\bar{x}$ |
## Robustness Summary
| The t-test IS robust to... | The t-test is NOT robust to... |
|---|---|
| Moderate skewness (especially $n \geq 30$) | Extreme outliers (any sample size) |
| Light-tailed distributions | Heavy-tailed distributions with small $n$ |
| Bimodal distributions (moderate $n$) | Strong skewness with small $n$ |
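Robustness can be checked by simulation. The sketch below (assumptions: an exponential population as the skewed example, 2,000 replications) tests a *true* null hypothesis repeatedly; if the t-test is robust, the rejection rate at $\alpha = 0.05$ should stay near 5% even though the population is strongly skewed:

```python
import numpy as np
from scipy import stats

# Draw samples from a right-skewed exponential population (true mean = 1)
# and test the TRUE null H0: mu = 1. A robust test rejects about 5% of the time.
rng = np.random.default_rng(42)
alpha, reps = 0.05, 2000

rates = {}
for n in (10, 50):
    rejections = 0
    for _ in range(reps):
        sample = rng.exponential(scale=1.0, size=n)  # skewed, true mean 1
        rejections += stats.ttest_1samp(sample, popmean=1.0).pvalue < alpha
    rates[n] = rejections / reps
    print(f"n = {n:3d}: empirical type I error = {rates[n]:.3f}")
```

Typically the small-sample rate drifts away from 5% while the $n = 50$ rate lands close to it, matching the table above.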
## z-Test vs. t-Test
| Feature | z-Test | t-Test |
|---|---|---|
| Uses | $\sigma$ (known) | $s$ (estimated from data) |
| Distribution | Standard normal | t with $df = n - 1$ |
| When to use | Almost never | Almost always |
| Effect of small $n$ | No extra penalty | Heavier tails → more conservative |
| Bottom line | Training wheels | The real deal |
**Rule of thumb:** Default to the t-test. You'll be right 99% of the time.
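The "heavier tails" row of the table is easy to see numerically: the 95% two-sided critical value $t^*$ shrinks toward $z^* \approx 1.96$ as $n$ grows.

```python
from scipy import stats

z_star = stats.norm.ppf(0.975)  # ≈ 1.960
for n in (5, 15, 30, 100):
    t_star = stats.t.ppf(0.975, df=n - 1)
    print(f"n = {n:3d}: t* = {t_star:.3f} vs z* = {z_star:.3f}")
```

At $n = 5$ the t critical value is roughly 2.78, a substantial penalty for estimating $\sigma$ from only four degrees of freedom; by $n = 100$ it is nearly indistinguishable from $z^*$.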
## Confidence Interval for a Mean
$$\boxed{\bar{x} \pm t^*_{n-1} \cdot \frac{s}{\sqrt{n}}}$$
CI-Test Duality:
$$\text{Reject } H_0: \mu = \mu_0 \text{ at } \alpha = 0.05 \iff \mu_0 \text{ is NOT in the 95\% CI}$$
Always report the CI alongside the test — the CI tells you how large the effect might be, not just whether it exists.
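The duality can be verified directly in code. This sketch uses simulated data (a hypothetical example) and checks that the two-sided test at $\alpha = 0.05$ rejects $\mu_0$ exactly when $\mu_0$ falls outside the 95% CI:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
data = rng.normal(loc=10.5, scale=2.0, size=30)  # hypothetical measurements
n, x_bar, s = len(data), data.mean(), data.std(ddof=1)

# 95% CI for mu
t_star = stats.t.ppf(0.975, df=n - 1)
lo, hi = x_bar - t_star * s / np.sqrt(n), x_bar + t_star * s / np.sqrt(n)
print(f"95% CI: ({lo:.2f}, {hi:.2f})")

# Duality check across several candidate values of mu_0
for mu_0 in (9.0, 10.0, 11.0):
    p = stats.ttest_1samp(data, popmean=mu_0).pvalue
    assert (p < 0.05) == (mu_0 < lo or mu_0 > hi)
    print(f"mu_0 = {mu_0}: p = {p:.3f}, in CI = {lo <= mu_0 <= hi}")
```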
## P-Value Calculation
| Alternative Hypothesis | P-Value |
|---|---|
| $H_a: \mu > \mu_0$ (right-tailed) | $P(T_{df} \geq t)$ |
| $H_a: \mu < \mu_0$ (left-tailed) | $P(T_{df} \leq t)$ |
| $H_a: \mu \neq \mu_0$ (two-tailed) | $2 \times P(T_{df} \geq |t|)$ |
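All three p-values in the table come from the same t-statistic. A quick check with a hypothetical $t = 2.1$ and $df = 24$:

```python
from scipy import stats

t, df = 2.1, 24  # hypothetical t-statistic and degrees of freedom

p_right = stats.t.sf(t, df)             # Ha: mu > mu_0
p_left = stats.t.cdf(t, df)             # Ha: mu < mu_0
p_two = 2 * stats.t.sf(abs(t), df)      # Ha: mu != mu_0

# Sanity checks: the one-tailed p-values sum to 1, and the two-tailed
# p-value is twice the smaller tail.
assert abs(p_right + p_left - 1) < 1e-12
assert abs(p_two - 2 * min(p_right, p_left)) < 1e-12
print(f"right: {p_right:.4f}, left: {p_left:.4f}, two: {p_two:.4f}")
```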
## Python Quick Reference

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# --- One-sample t-test (raw data) ---
data = np.array([...])  # your data
result = stats.ttest_1samp(data, popmean=mu_0)
# result.statistic = t-value
# result.pvalue    = two-tailed p-value

# For one-tailed tests (SciPy ≥ 1.7):
result = stats.ttest_1samp(data, popmean=mu_0, alternative='greater')
result = stats.ttest_1samp(data, popmean=mu_0, alternative='less')

# --- t-test from summary statistics (x_bar, s, n, mu_0 already defined) ---
t_stat = (x_bar - mu_0) / (s / np.sqrt(n))
p_two = 2 * stats.t.sf(abs(t_stat), df=n - 1)   # two-tailed
p_right = stats.t.sf(t_stat, df=n - 1)          # right-tailed
p_left = stats.t.cdf(t_stat, df=n - 1)          # left-tailed

# --- Confidence interval ---
t_star = stats.t.ppf(0.975, df=n - 1)  # critical value for a 95% CI
margin = t_star * s / np.sqrt(n)
ci = (x_bar - margin, x_bar + margin)

# --- Normality check ---
stat, p = stats.shapiro(data)                  # Shapiro-Wilk test
stats.probplot(data, dist="norm", plot=plt)    # QQ-plot
```
## Excel Quick Reference

| Task | Formula |
|---|---|
| t-statistic | `=(AVERAGE(range) - mu_0) / (STDEV.S(range) / SQRT(COUNT(range)))` |
| p-value (two-tailed) | `=T.DIST.2T(ABS(t), df)` |
| p-value (right-tailed) | `=T.DIST.RT(t, df)` |
| p-value (left-tailed) | `=T.DIST(t, df, TRUE)` |
| Critical value (95% CI) | `=T.INV.2T(0.05, df)` |
| Margin of error | `=CONFIDENCE.T(0.05, STDEV.S(range), COUNT(range))` |
## Common Misconceptions
| Misconception | Reality |
|---|---|
| "Use z-test for large $n$, t-test for small $n$" | Use z when $\sigma$ is known (rare), t when $\sigma$ is estimated (usual) |
| "The t-test requires normally distributed data" | It requires a normal sampling distribution of $\bar{x}$ — which the CLT provides for $n \geq 30$ |
| "The t-distribution's wider tails are a problem" | They're a feature — honest acknowledgment of uncertainty from estimating $\sigma$ |
| "A small p-value means a large effect" | P-values measure evidence, not effect size; report the CI for magnitude |
| "Fail to reject = the null is true" | It means insufficient evidence to reject; the effect might exist but be undetectable with your sample size |
| "Borderline p-values ($\approx$ 0.05) are meaningless" | They indicate suggestive but inconclusive evidence; context matters |
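The "small p-value means a large effect" misconception is worth seeing in numbers. In this simulated sketch (hypothetical data, not from the chapter), a huge sample makes a trivially small effect highly "significant":

```python
import numpy as np
from scipy import stats

# True mean is 100.2 — barely above the hypothesized 100 — but n is enormous.
rng = np.random.default_rng(7)
data = rng.normal(loc=100.2, scale=5.0, size=100_000)

res = stats.ttest_1samp(data, popmean=100)
print(f"p = {res.pvalue:.2e}")  # tiny p-value despite a tiny effect

# The CI reveals the effect's actual magnitude: about 0.2 units.
se = data.std(ddof=1) / np.sqrt(len(data))
t_star = stats.t.ppf(0.975, df=len(data) - 1)
ci_lo, ci_hi = data.mean() - t_star * se, data.mean() + t_star * se
print(f"95% CI: ({ci_lo:.2f}, {ci_hi:.2f})")
```

The p-value says the effect is almost certainly real; only the CI says it is small.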
## How This Chapter Connects
| This Chapter | Builds On | Leads To |
|---|---|---|
| t-test formula | z-test (Ch.13), SE (Ch.11), $\bar{x}$ and $s$ (Ch.6) | Two-sample t-test, paired t-test (Ch.16) |
| t-distribution | Introduced in Ch.12 for CIs | Deepened here for hypothesis testing; used through Ch.24 |
| Conditions | Normal model (Ch.10), CLT (Ch.11) | Same conditions apply to all future t-based procedures |
| Robustness | Distribution thinking (Ch.5), normality assessment (Ch.10) | Nonparametric alternatives (Ch.21), bootstrap (Ch.18) |
| CI-test duality | Established in Ch.13 | Applied in every inference chapter going forward |
| Paired data preview | One-sample t-test on differences | Full treatment in Ch.16 |
## The Key Theme
The t-distribution embodies a fundamental statistical virtue: honesty about uncertainty. When $\sigma$ is unknown and must be estimated from data, the t-distribution widens its tails to say: "We're less certain than we'd be if we knew $\sigma$." As $n$ grows and our estimate improves, the t-distribution relaxes toward the normal. This isn't a weakness — it's intellectual integrity. The t-distribution doesn't pretend to know more than it does.
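The "relaxes toward the normal" claim is easy to quantify: compare the tail probability $P(|T| > 2)$ across degrees of freedom with the normal's $P(|Z| > 2) \approx 0.0455$.

```python
from scipy import stats

# Tail probability beyond |2| shrinks toward the normal's as df grows
print(f"normal:    {2 * stats.norm.sf(2):.4f}")   # ≈ 0.0455
for df in (2, 5, 30, 1000):
    print(f"df = {df:4d}: {2 * stats.t.sf(2, df):.4f}")
```

With only 2 degrees of freedom the t-distribution puts roughly four times as much probability past $\pm 2$ as the normal does; by $df = 1000$ the difference is negligible.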
## The One Thing to Remember
If you forget everything else from this chapter, remember this:
When you want to test whether a population mean equals a specific value, use the one-sample t-test: $t = (\bar{x} - \mu_0) / (s / \sqrt{n})$, with $df = n - 1$. Use it whenever $\sigma$ is unknown, which is almost always. Check three conditions (random, independent, normal enough), and remember that the t-test is robust to non-normality for $n \geq 30$. Always pair the hypothesis test with a confidence interval — the test tells you WHETHER an effect exists; the CI tells you HOW LARGE it might be. The t-distribution's wider tails aren't a limitation — they're the honest price of admitting we don't know $\sigma$.
## Key Terms
| Term | Definition |
|---|---|
| One-sample t-test | A hypothesis test for a population mean that uses the t-distribution because $\sigma$ is estimated by $s$; test statistic: $t = (\bar{x} - \mu_0)/(s/\sqrt{n})$ |
| t-distribution (deepened) | A symmetric, bell-shaped distribution with heavier tails than the normal; indexed by degrees of freedom; used when $\sigma$ is estimated; converges to normal as $df \to \infty$ |
| Degrees of freedom (deepened) | For a one-sample t-test: $df = n - 1$; connects to $n - 1$ in the sample variance formula; determines which t-distribution to use; smaller df = heavier tails = more uncertainty |
| Robustness | A statistical procedure's ability to give approximately correct results even when its assumptions are not perfectly satisfied |
| Normality assumption | The condition that the sampling distribution of $\bar{x}$ is approximately normal; satisfied by population normality (small $n$) or by the CLT ($n \geq 30$) |
| Paired data (preview) | Observations that come in natural pairs; analyzed by computing within-pair differences and applying a one-sample t-test to the differences |