Key Takeaways: Inference for Proportions
One-Sentence Summary
The one-sample z-test for proportions uses the sampling distribution of $\hat{p}$ (grounded in the CLT from Chapter 11) to test claims about population proportions — with the success-failure condition as the gatekeeper, $p_0$ in the test's standard error, and the Wilson interval as an improvement over the basic Wald CI when samples are small or proportions are extreme.
Core Concepts at a Glance
| Concept | Definition | Why It Matters |
| --- | --- | --- |
| Sample proportion ($\hat{p}$) | $X/n$; the proportion of "successes" in the sample | The point estimate for the population proportion $p$ |
| Population proportion ($p$) | The true proportion in the entire population | What we're trying to estimate or test |
| Success-failure condition | $np_0 \geq 10$ and $n(1-p_0) \geq 10$ (for tests) | Ensures the normal approximation is valid |
| Normal approximation | The CLT-based assumption that $\hat{p}$ is approximately normally distributed | The theoretical foundation for the z-test and Wald CI |
| Wilson interval | An improved CI formula with better coverage properties | Fixes the Wald interval's problems for small $n$ or extreme $\hat{p}$ |
| Plus-four method | Add 2 successes and 2 failures, then use the Wald formula | A simple improvement for small samples |
| Margin of error for proportions | $z^* \sqrt{\hat{p}(1-\hat{p})/n}$; half-width of the CI | What "±3 points" means in polls |
The One-Sample Z-Test for a Proportion
The Five Steps
| Step | Action | Key Detail |
| --- | --- | --- |
| 1 | State $H_0: p = p_0$ and $H_a$ | Choose one- or two-tailed based on the research question |
| 2 | Choose $\alpha$ (typically 0.05) | Consider the consequences of Type I and Type II errors |
| 3 | Check the three conditions | Random, independent (10% condition), success-failure |
| 4 | Compute $z$ and the p-value | $z = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}}$ |
| 5 | Conclude in context | Not just "reject" — explain what the result means |
The Test Statistic
$$\boxed{z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}}$$
The SE uses $p_0$ rather than $\hat{p}$ because the test computes probabilities under the assumption that $H_0$ is true.
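As a sketch, the statistic can be computed by hand for a hypothetical example (208 successes in 400 trials against $H_0: p = 0.47$, the same numbers as the Python quick reference):

```python
from math import sqrt
from scipy import stats

# Hypothetical example: 208 successes out of n = 400, testing H0: p = 0.47
x, n, p0 = 208, 400, 0.47
p_hat = x / n                    # 0.52

# SE uses p0, not p_hat, because we compute under H0
se = sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / se            # about 2.00

# One-sided (upper-tail) p-value from the standard normal
p_value = stats.norm.sf(z)       # about 0.023
```

With $z \approx 2.00$ and $p \approx 0.023$, this sample would reject $H_0$ at $\alpha = 0.05$ for the one-sided alternative.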
The Conditions
| Condition | What It Checks | What Happens If It Fails |
| --- | --- | --- |
| Random sample | Representative data | Results not generalizable (bias) |
| Independence (10% condition) | Observations are independent | SE formula needs adjustment |
| Success-failure ($np_0 \geq 10$, $n(1-p_0) \geq 10$) | Normal approximation is valid | Use the exact binomial test instead |
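The checkable parts of the table can be automated. A minimal sketch (the helper name and structure are my own, not from the chapter):

```python
def check_test_conditions(n, p0, population_size=None):
    """Hypothetical helper: check quantifiable z-test conditions."""
    checks = {
        # Success-failure: at least 10 expected successes AND failures under H0
        "success_failure": n * p0 >= 10 and n * (1 - p0) >= 10,
    }
    # 10% condition: sample is at most 10% of the population (if known)
    if population_size is not None:
        checks["ten_percent"] = n <= 0.10 * population_size
    return checks

# n = 400, p0 = 0.47: 188 expected successes, 212 failures -> passes
print(check_test_conditions(400, 0.47, population_size=250_000))
# n = 20, p0 = 0.05: only 1 expected success -> fails; use exact binomial
print(check_test_conditions(20, 0.05))
```

Randomness of the sample cannot be verified from the numbers alone; it has to come from the study design.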
Confidence Intervals: Three Methods
| Method | Formula | When to Use |
| --- | --- | --- |
| Wald | $\hat{p} \pm z^* \sqrt{\hat{p}(1-\hat{p})/n}$ | Large $n$, $\hat{p}$ not extreme |
| Wilson | Complex formula (use software) | Any situation; best coverage |
| Plus-four | $\tilde{p} \pm z^* \sqrt{\tilde{p}(1-\tilde{p})/(n+4)}$, where $\tilde{p} = (X+2)/(n+4)$ | Simple improvement for small $n$ |
Key difference from hypothesis tests: CIs use $\hat{p}$ in the SE (not $p_0$) because there's no hypothesized value.
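To see why the table steers small samples away from Wald, consider a sketch with 1 success in 20 trials (Wald computed by hand, Wilson via statsmodels):

```python
import numpy as np
from statsmodels.stats.proportion import proportion_confint

x, n = 1, 20                      # small n, extreme p_hat = 0.05
p_hat = x / n

# Wald interval by hand: its lower bound dips below 0, an impossible proportion
se = np.sqrt(p_hat * (1 - p_hat) / n)
wald = (p_hat - 1.96 * se, p_hat + 1.96 * se)

# Wilson interval: always stays inside [0, 1], with better coverage
wilson = proportion_confint(x, n, alpha=0.05, method='wilson')

print(wald)    # lower bound is negative
print(wilson)  # lower bound is positive
```

The Wald interval claims the proportion might be negative; Wilson gives a sensible interval from the same data.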
Margin of Error in Polling
$$E = z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
| Sample Size | Approx. Margin of Error ($p \approx 0.5$, 95% CI) |
| --- | --- |
| 400 | ±4.9% |
| 1,000 | ±3.1% |
| 1,500 | ±2.5% |
| 4,000 | ±1.5% |
Critical caveat: The margin of error measures random sampling error only — not bias (nonresponse, coverage, question wording). A biased poll can have a small margin of error and still be very wrong.
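The table values can be reproduced directly from the formula, using the worst case $p = 0.5$ and $z^* = 1.96$:

```python
import numpy as np

z_star, p = 1.96, 0.5   # 95% CI, worst-case proportion
moes = {n: z_star * np.sqrt(p * (1 - p) / n) for n in (400, 1_000, 1_500, 4_000)}
for n, moe in moes.items():
    print(f"n = {n:>5}: ±{100 * moe:.1f}%")
```

Note the diminishing returns: quadrupling $n$ only halves the margin of error, because $E \propto 1/\sqrt{n}$.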
The $p_0$ vs. $\hat{p}$ Rule
| Context | Use in SE | Why |
| --- | --- | --- |
| Hypothesis test | $p_0$ | Computing probabilities under $H_0$ |
| Confidence interval | $\hat{p}$ | No hypothesized value to assume |
| Success-failure check (test) | $p_0$ | Checking conditions under $H_0$ |
| Success-failure check (CI) | $\hat{p}$ | No $p_0$ available |
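The rule in one snippet, for the same hypothetical data (208 of 400, $p_0 = 0.47$) — the two standard errors happen to be close here, but they answer different questions:

```python
from math import sqrt

x, n, p0 = 208, 400, 0.47
p_hat = x / n

se_test = sqrt(p0 * (1 - p0) / n)        # SE under H0, for the z statistic
se_ci = sqrt(p_hat * (1 - p_hat) / n)    # SE from the data, for the CI

print(se_test, se_ci)                    # similar values, different logic
```

When $\hat{p}$ and $p_0$ are far apart, the two SEs can differ noticeably, which is why the test and the CI can occasionally disagree near the significance boundary.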
Common Misconceptions
| Misconception | Reality |
| --- | --- |
| "Failing to reject means $p = p_0$" | It means there's not enough evidence to conclude otherwise |
| "The margin of error covers all sources of error" | It covers only random sampling error, not bias |
| "A significant result means a large difference" | Significance depends on $n$; large samples can detect tiny differences |
| "The Wald interval always works" | It fails for small $n$ and extreme $\hat{p}$; use Wilson or plus-four |
| "If the success-failure condition fails, no inference is possible" | Use the exact binomial test or the Wilson interval |
Python Quick Reference
```python
import numpy as np
from scipy import stats
from statsmodels.stats.proportion import proportions_ztest, proportion_confint

# --- One-sample z-test for a proportion ---
# prop_var=0.47 puts p0 (not p-hat) in the SE, matching the chapter's z formula;
# the statsmodels default uses the sample proportion instead
z_stat, p_value = proportions_ztest(count=208, nobs=400, value=0.47,
                                    alternative='larger', prop_var=0.47)

# --- Confidence intervals (three methods) ---
ci_wald = proportion_confint(208, 400, alpha=0.05, method='normal')
ci_wilson = proportion_confint(208, 400, alpha=0.05, method='wilson')

# --- Plus-four (manual) ---
p_tilde = (208 + 2) / (400 + 4)
se_tilde = np.sqrt(p_tilde * (1 - p_tilde) / (400 + 4))
ci_pf = (p_tilde - 1.96 * se_tilde, p_tilde + 1.96 * se_tilde)

# --- Exact binomial test (when success-failure fails) ---
result = stats.binomtest(k=3, n=20, p=0.05, alternative='greater')  # result.pvalue
```
How This Chapter Connects
| This Chapter | Builds On | Leads To |
| --- | --- | --- |
| Z-test for proportions | CLT for proportions (Ch.11), hypothesis testing (Ch.13) | Two-sample proportion test (Ch.16) |
| CI for proportions (Wald/Wilson) | CI framework (Ch.12) | Bootstrap CI (Ch.18) |
| Success-failure condition | Binomial distribution (Ch.10), CLT conditions (Ch.11) | Chi-square conditions (Ch.19) |
| Margin of error in polls | Sampling methods (Ch.4), standard error (Ch.11) | Power analysis (Ch.17) |
| PPV and base rates | Bayes' theorem (Ch.9) | Logistic regression PPV (Ch.24) |
| Contextual interpretation | Hypothesis testing logic (Ch.13) | Effect sizes (Ch.17) |
Threads Updated
- Sam and Daria: Formal proportion test conducted ($z = 1.30$, $p = 0.097$). Still fail to reject at $\alpha = 0.05$ with only 65 shots. Need more data — power analysis in Ch.17.
- Maya: Tested county hypertension prevalence above national rate — found significant evidence ($z = 2.00$, $p = 0.023$). Used proportion inference to estimate disease prevalence for Bayesian screening analysis.
- Polling and elections: Margin of error demystified. Random error vs. bias distinction. The 2016/2020 lessons.
The One Thing to Remember
If you forget everything else from this chapter, remember this:
In a hypothesis test, the standard error uses $p_0$ (because you're computing under the null). In a confidence interval, the standard error uses $\hat{p}$ (because there's no null to assume). The success-failure condition is your checkpoint for whether the normal approximation is valid. And the margin of error in a poll only measures random sampling error — not bias, not nonresponse, not question wording. The uncertainty in any proportion estimate is always at least as large as the margin of error, and usually larger.
Key Terms
| Term | Definition |
| --- | --- |
| Sample proportion ($\hat{p}$) | The proportion of successes in the sample: $\hat{p} = X/n$ |
| Population proportion ($p$) | The true proportion of successes in the entire population |
| One-sample z-test for proportions | A hypothesis test that uses the standard normal distribution to test a claim about a population proportion |
| Success-failure condition | The requirement that $np_0 \geq 10$ and $n(1-p_0) \geq 10$ (for tests) or $n\hat{p} \geq 10$ and $n(1-\hat{p}) \geq 10$ (for CIs); ensures the normal approximation is valid |
| Normal approximation | The use of the normal distribution to approximate the sampling distribution of $\hat{p}$, justified by the CLT for proportions |
| Margin of error for proportions | $z^* \sqrt{\hat{p}(1-\hat{p})/n}$; the maximum likely distance between $\hat{p}$ and $p$ at a given confidence level |
| Wilson interval | An improved confidence interval for proportions with better coverage properties than the Wald interval, especially for small samples and extreme proportions |
| Plus-four method | A simple improvement to the Wald interval: add 2 successes and 2 failures before computing the CI ($\tilde{p} = (X+2)/(n+4)$) |