# Key Takeaways: Inference for Means

## One-Sentence Summary
The one-sample t-test — the workhorse of statistical inference — tests claims about a population mean using the t-distribution (which honestly accounts for our uncertainty about $\sigma$), is robust to non-normality for moderate-to-large samples, and should be your default whenever $\sigma$ is unknown (which is almost always).
## Core Concepts at a Glance
| Concept | Definition | Why It Matters |
|---|---|---|
| One-sample t-test | Tests whether a population mean $\mu$ equals a specific value $\mu_0$, using the t-distribution | The most commonly used statistical test in practice; applies whenever you have quantitative data and a reference value |
| Robustness | A procedure's ability to give approximately correct results even when assumptions aren't perfectly met | The t-test is remarkably robust to non-normality for $n \geq 30$, making it practical for real-world data |
| Paired data (preview) | Data that come in natural pairs (before/after, matched subjects); analyzed by computing differences and running a one-sample t-test on those differences | Eliminates person-to-person variability, often dramatically increasing power |
## The One-Sample t-Test Formula
$$\boxed{t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}}$$
where:

- $\bar{x}$ = sample mean
- $\mu_0$ = hypothesized population mean (from $H_0$)
- $s$ = sample standard deviation
- $n$ = sample size
- $df = n - 1$ (degrees of freedom)
In plain English: The t-statistic measures how many standard errors the sample mean is from the hypothesized value. Large values (positive or negative) mean the data are far from $H_0$; values near zero mean the data are consistent with $H_0$.
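A quick numerical illustration of the formula, using hypothetical summary statistics (a sample of $n = 25$ with mean 2.8 and standard deviation 0.6, testing $H_0: \mu = 3.0$):

```python
import numpy as np
from scipy import stats

# Hypothetical numbers for illustration only
x_bar, mu_0, s, n = 2.8, 3.0, 0.6, 25

se = s / np.sqrt(n)                        # standard error = 0.6 / 5 = 0.12
t = (x_bar - mu_0) / se                    # (2.8 - 3.0) / 0.12 ≈ -1.667
p_two = 2 * stats.t.sf(abs(t), df=n - 1)   # two-tailed p-value ≈ 0.11

print(f"t = {t:.3f}, p = {p_two:.3f}")
```

The sample mean sits about 1.67 standard errors below the hypothesized value; with $df = 24$, that is not far enough into the tails to reject at $\alpha = 0.05$.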
## The Five-Step Procedure
| Step | Action | Key Question |
|---|---|---|
| 1 | State $H_0$ and $H_a$ | What's the claim? One- or two-tailed? |
| 2 | Check conditions | Random? Independent? Normal enough? |
| 3 | Compute $t = (\bar{x} - \mu_0) / (s/\sqrt{n})$ | How far from $H_0$, in SE units? |
| 4 | Find p-value from $t_{n-1}$ distribution | How surprising are these data if $H_0$ is true? |
| 5 | Conclude in context | Reject or fail to reject — and what does it mean? |
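The five steps above can be sketched end-to-end in Python. The data here are simulated fill weights (a hypothetical scenario, not from the chapter), testing $H_0: \mu = 500$ against a two-sided alternative:

```python
import numpy as np
from scipy import stats

# Step 1: H0: mu = 500 vs Ha: mu != 500 (two-tailed)
# Step 2: conditions assumed met for this simulated example (random, independent, n = 40)
rng = np.random.default_rng(0)
data = rng.normal(loc=501.2, scale=4.0, size=40)  # hypothetical fill weights (grams)

# Steps 3-4: t-statistic and two-tailed p-value from the t distribution, df = n - 1
res = stats.ttest_1samp(data, popmean=500)
print(f"t = {res.statistic:.3f}, p = {res.pvalue:.4f}, df = {len(data) - 1}")

# Step 5: conclude in context at alpha = 0.05
alpha = 0.05
print("reject H0" if res.pvalue < alpha else "fail to reject H0")
```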
## Three Conditions for the t-Test
| Condition | What to Check | What Happens If Violated |
|---|---|---|
| 1. Randomness | Data from a random sample or random assignment | Results cannot be generalized; no statistical fix |
| 2. Independence | Observations don't influence each other; 10% condition for sampling without replacement | Standard error is wrong; p-values unreliable |
| 3. Normality | Sampling distribution of $\bar{x}$ is approximately normal | P-values may be inaccurate, especially for small $n$ |
## The Normality Condition: Quick Guide
| Sample Size | Requirement | Rationale |
|---|---|---|
| $n < 15$ | Population must be approximately normal; no outliers or skewness | CLT can't compensate; t-test relies on normality directly |
| $15 \leq n < 30$ | Tolerate moderate skewness; check for extreme outliers | CLT partially compensates; extreme outliers still distort |
| $n \geq 30$ | CLT handles most population shapes; only extreme outliers are a concern | CLT nearly guarantees normality of $\bar{x}$ |
## Robustness Summary
| The t-test IS robust to... | The t-test is NOT robust to... |
|---|---|
| Moderate skewness (especially $n \geq 30$) | Extreme outliers (any sample size) |
| Light-tailed distributions | Heavy-tailed distributions with small $n$ |
| Bimodal distributions (moderate $n$) | Strong skewness with small $n$ |
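Robustness can be checked by simulation. The sketch below (assumptions: an exponential population as the skewed example, 2,000 replications) tests a *true* null hypothesis repeatedly; if the t-test is robust, the rejection rate at $\alpha = 0.05$ should stay near 5% even though the population is strongly skewed:

```python
import numpy as np
from scipy import stats

# Draw samples from a right-skewed exponential population (true mean = 1)
# and test the TRUE null H0: mu = 1. A robust test rejects about 5% of the time.
rng = np.random.default_rng(42)
alpha, reps = 0.05, 2000

rates = {}
for n in (10, 50):
    rejections = 0
    for _ in range(reps):
        sample = rng.exponential(scale=1.0, size=n)  # skewed, true mean 1
        rejections += stats.ttest_1samp(sample, popmean=1.0).pvalue < alpha
    rates[n] = rejections / reps
    print(f"n = {n:3d}: empirical type I error = {rates[n]:.3f}")
```

Typically the small-sample rate drifts away from 5% while the $n = 50$ rate lands close to it, matching the table above.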
## z-Test vs. t-Test
| Feature | z-Test | t-Test |
|---|---|---|
| Uses | $\sigma$ (known) | $s$ (estimated from data) |
| Distribution | Standard normal | t with $df = n - 1$ |
| When to use | Almost never | Almost always |
| Effect of small $n$ | No extra penalty | Heavier tails → more conservative |
| Bottom line | Training wheels | The real deal |
**Rule of thumb:** Default to the t-test. You'll be right 99% of the time.
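The "heavier tails" row of the table is easy to see numerically: the 95% two-sided critical value $t^*$ shrinks toward $z^* \approx 1.96$ as $n$ grows.

```python
from scipy import stats

z_star = stats.norm.ppf(0.975)  # ≈ 1.960
for n in (5, 15, 30, 100):
    t_star = stats.t.ppf(0.975, df=n - 1)
    print(f"n = {n:3d}: t* = {t_star:.3f} vs z* = {z_star:.3f}")
```

At $n = 5$ the t critical value is roughly 2.78, a substantial penalty for estimating $\sigma$ from only four degrees of freedom; by $n = 100$ it is nearly indistinguishable from $z^*$.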
## Confidence Interval for a Mean
$$\boxed{\bar{x} \pm t^*_{n-1} \cdot \frac{s}{\sqrt{n}}}$$
CI-Test Duality:
$$\text{Reject } H_0: \mu = \mu_0 \text{ at } \alpha = 0.05 \iff \mu_0 \text{ is NOT in the 95\% CI}$$
Always report the CI alongside the test — the CI tells you how large the effect might be, not just whether it exists.
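The duality can be verified directly in code. This sketch uses simulated data (a hypothetical example) and checks that the two-sided test at $\alpha = 0.05$ rejects $\mu_0$ exactly when $\mu_0$ falls outside the 95% CI:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
data = rng.normal(loc=10.5, scale=2.0, size=30)  # hypothetical measurements
n, x_bar, s = len(data), data.mean(), data.std(ddof=1)

# 95% CI for mu
t_star = stats.t.ppf(0.975, df=n - 1)
lo, hi = x_bar - t_star * s / np.sqrt(n), x_bar + t_star * s / np.sqrt(n)
print(f"95% CI: ({lo:.2f}, {hi:.2f})")

# Duality check across several candidate values of mu_0
for mu_0 in (9.0, 10.0, 11.0):
    p = stats.ttest_1samp(data, popmean=mu_0).pvalue
    assert (p < 0.05) == (mu_0 < lo or mu_0 > hi)
    print(f"mu_0 = {mu_0}: p = {p:.3f}, in CI = {lo <= mu_0 <= hi}")
```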
## P-Value Calculation
| Alternative Hypothesis | P-Value |
|---|---|
| $H_a: \mu > \mu_0$ (right-tailed) | $P(T_{df} \geq t)$ |
| $H_a: \mu < \mu_0$ (left-tailed) | $P(T_{df} \leq t)$ |
| $H_a: \mu \neq \mu_0$ (two-tailed) | $2 \times P(T_{df} \geq |t|)$ |
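All three p-values in the table come from the same t-statistic. A quick check with a hypothetical $t = 2.1$ and $df = 24$:

```python
from scipy import stats

t, df = 2.1, 24  # hypothetical t-statistic and degrees of freedom

p_right = stats.t.sf(t, df)             # Ha: mu > mu_0
p_left = stats.t.cdf(t, df)             # Ha: mu < mu_0
p_two = 2 * stats.t.sf(abs(t), df)      # Ha: mu != mu_0

# Sanity checks: the one-tailed p-values sum to 1, and the two-tailed
# p-value is twice the smaller tail.
assert abs(p_right + p_left - 1) < 1e-12
assert abs(p_two - 2 * min(p_right, p_left)) < 1e-12
print(f"right: {p_right:.4f}, left: {p_left:.4f}, two: {p_two:.4f}")
```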
## Python Quick Reference

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# --- One-sample t-test (raw data) ---
data = np.array([...])  # your data
result = stats.ttest_1samp(data, popmean=mu_0)
# result.statistic = t-value
# result.pvalue    = two-tailed p-value

# For one-tailed tests (SciPy ≥ 1.7):
result = stats.ttest_1samp(data, popmean=mu_0, alternative='greater')
result = stats.ttest_1samp(data, popmean=mu_0, alternative='less')

# --- t-test from summary statistics (x_bar, s, n, mu_0 already defined) ---
t_stat = (x_bar - mu_0) / (s / np.sqrt(n))
p_two = 2 * stats.t.sf(abs(t_stat), df=n - 1)   # two-tailed
p_right = stats.t.sf(t_stat, df=n - 1)          # right-tailed
p_left = stats.t.cdf(t_stat, df=n - 1)          # left-tailed

# --- Confidence interval ---
t_star = stats.t.ppf(0.975, df=n - 1)  # critical value for a 95% CI
margin = t_star * s / np.sqrt(n)
ci = (x_bar - margin, x_bar + margin)

# --- Normality check ---
stat, p = stats.shapiro(data)                  # Shapiro-Wilk test
stats.probplot(data, dist="norm", plot=plt)    # QQ-plot
```
## Excel Quick Reference

| Task | Formula |
|---|---|
| t-statistic | `=(AVERAGE(range) - mu_0) / (STDEV.S(range) / SQRT(COUNT(range)))` |
| p-value (two-tailed) | `=T.DIST.2T(ABS(t), df)` |
| p-value (right-tailed) | `=T.DIST.RT(t, df)` |
| p-value (left-tailed) | `=T.DIST(t, df, TRUE)` |
| Critical value (95% CI) | `=T.INV.2T(0.05, df)` |
| Margin of error | `=CONFIDENCE.T(0.05, STDEV.S(range), COUNT(range))` |
## Common Misconceptions
| Misconception | Reality |
|---|---|
| "Use z-test for large $n$, t-test for small $n$" | Use z when $\sigma$ is known (rare), t when $\sigma$ is estimated (usual) |
| "The t-test requires normally distributed data" | It requires a normal sampling distribution of $\bar{x}$ — which the CLT provides for $n \geq 30$ |
| "The t-distribution's wider tails are a problem" | They're a feature — honest acknowledgment of uncertainty from estimating $\sigma$ |
| "A small p-value means a large effect" | P-values measure evidence, not effect size; report the CI for magnitude |
| "Fail to reject = the null is true" | It means insufficient evidence to reject; the effect might exist but be undetectable with your sample size |
| "Borderline p-values ($\approx$ 0.05) are meaningless" | They indicate suggestive but inconclusive evidence; context matters |
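The "small p-value means a large effect" misconception is worth seeing in numbers. In this simulated sketch (hypothetical data, not from the chapter), a huge sample makes a trivially small effect highly "significant":

```python
import numpy as np
from scipy import stats

# True mean is 100.2 — barely above the hypothesized 100 — but n is enormous.
rng = np.random.default_rng(7)
data = rng.normal(loc=100.2, scale=5.0, size=100_000)

res = stats.ttest_1samp(data, popmean=100)
print(f"p = {res.pvalue:.2e}")  # tiny p-value despite a tiny effect

# The CI reveals the effect's actual magnitude: about 0.2 units.
se = data.std(ddof=1) / np.sqrt(len(data))
t_star = stats.t.ppf(0.975, df=len(data) - 1)
ci_lo, ci_hi = data.mean() - t_star * se, data.mean() + t_star * se
print(f"95% CI: ({ci_lo:.2f}, {ci_hi:.2f})")
```

The p-value says the effect is almost certainly real; only the CI says it is small.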
## How This Chapter Connects
| This Chapter | Builds On | Leads To |
|---|---|---|
| t-test formula | z-test (Ch.13), SE (Ch.11), $\bar{x}$ and $s$ (Ch.6) | Two-sample t-test, paired t-test (Ch.16) |
| t-distribution | Introduced in Ch.12 for CIs | Deepened here for hypothesis testing; used through Ch.24 |
| Conditions | Normal model (Ch.10), CLT (Ch.11) | Same conditions apply to all future t-based procedures |
| Robustness | Distribution thinking (Ch.5), normality assessment (Ch.10) | Nonparametric alternatives (Ch.21), bootstrap (Ch.18) |
| CI-test duality | Established in Ch.13 | Applied in every inference chapter going forward |
| Paired data preview | One-sample t-test on differences | Full treatment in Ch.16 |
## The Key Theme
The t-distribution embodies a fundamental statistical virtue: honesty about uncertainty. When $\sigma$ is unknown and must be estimated from data, the t-distribution widens its tails to say: "We're less certain than we'd be if we knew $\sigma$." As $n$ grows and our estimate improves, the t-distribution relaxes toward the normal. This isn't a weakness — it's intellectual integrity. The t-distribution doesn't pretend to know more than it does.
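The "relaxes toward the normal" claim is easy to quantify: compare the tail probability $P(|T| > 2)$ across degrees of freedom with the normal's $P(|Z| > 2) \approx 0.0455$.

```python
from scipy import stats

# Tail probability beyond |2| shrinks toward the normal's as df grows
print(f"normal:    {2 * stats.norm.sf(2):.4f}")   # ≈ 0.0455
for df in (2, 5, 30, 1000):
    print(f"df = {df:4d}: {2 * stats.t.sf(2, df):.4f}")
```

With only 2 degrees of freedom the t-distribution puts roughly four times as much probability past $\pm 2$ as the normal does; by $df = 1000$ the difference is negligible.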
## The One Thing to Remember
If you forget everything else from this chapter, remember this:
When you want to test whether a population mean equals a specific value, use the one-sample t-test: $t = (\bar{x} - \mu_0) / (s / \sqrt{n})$, with $df = n - 1$. Use it whenever $\sigma$ is unknown, which is almost always. Check three conditions (random, independent, normal enough), and remember that the t-test is robust to non-normality for $n \geq 30$. Always pair the hypothesis test with a confidence interval — the test tells you WHETHER an effect exists; the CI tells you HOW LARGE it might be. The t-distribution's wider tails aren't a limitation — they're the honest price of admitting we don't know $\sigma$.
## Key Terms
| Term | Definition |
|---|---|
| One-sample t-test | A hypothesis test for a population mean that uses the t-distribution because $\sigma$ is estimated by $s$; test statistic: $t = (\bar{x} - \mu_0)/(s/\sqrt{n})$ |
| t-distribution (deepened) | A symmetric, bell-shaped distribution with heavier tails than the normal; indexed by degrees of freedom; used when $\sigma$ is estimated; converges to normal as $df \to \infty$ |
| Degrees of freedom (deepened) | For a one-sample t-test: $df = n - 1$; connects to $n - 1$ in the sample variance formula; determines which t-distribution to use; smaller df = heavier tails = more uncertainty |
| Robustness | A statistical procedure's ability to give approximately correct results even when its assumptions are not perfectly satisfied |
| Normality assumption | The condition that the sampling distribution of $\bar{x}$ is approximately normal; satisfied by population normality (small $n$) or by the CLT ($n \geq 30$) |
| Paired data (preview) | Observations that come in natural pairs; analyzed by computing within-pair differences and applying a one-sample t-test to the differences |