Key Takeaways: Inference for Proportions

One-Sentence Summary

The one-sample z-test for proportions uses the sampling distribution of $\hat{p}$ (grounded in the CLT from Chapter 11) to test claims about population proportions: the success-failure condition is the gatekeeper, $p_0$ goes in the test's standard error, and the Wilson interval improves on the basic Wald CI when $n$ is small or $\hat{p}$ is extreme.

Core Concepts at a Glance

| Concept | Definition | Why It Matters |
|---|---|---|
| Sample proportion ($\hat{p}$) | $X/n$; the proportion of "successes" in the sample | The point estimate for the population proportion $p$ |
| Population proportion ($p$) | The true proportion in the entire population | What we're trying to estimate or test |
| Success-failure condition | $np_0 \geq 10$ and $n(1-p_0) \geq 10$ (for tests) | Ensures the normal approximation is valid |
| Normal approximation | The CLT-based assumption that $\hat{p}$ is approximately normally distributed | The theoretical foundation for the z-test and Wald CI |
| Wilson interval | An improved CI formula with better coverage properties | Fixes the Wald interval's problems for small $n$ or extreme $\hat{p}$ |
| Plus-four method | Add 2 successes and 2 failures, then use the Wald formula | Simple improvement for small samples |
| Margin of error for proportions | $z^* \sqrt{\hat{p}(1-\hat{p})/n}$; half-width of the CI | What "±3 points" means in polls |

The One-Sample Z-Test for a Proportion

The Five Steps

| Step | Action | Key Detail |
|---|---|---|
| 1 | State $H_0: p = p_0$ and $H_a$ | Choose one- or two-tailed based on the research question |
| 2 | Choose $\alpha$ (typically 0.05) | Consider consequences of Type I and Type II errors |
| 3 | Check three conditions | Random, independent (10%), success-failure |
| 4 | Compute $z$ and p-value | $z = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}}$ |
| 5 | Conclude in context | Not just "reject" — explain what it means |

The Test Statistic

$$\boxed{z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}}$$

The SE uses $p_0$ (not $\hat{p}$) because the test statistic is computed under the assumption that $H_0$ is true.
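As a quick numeric check, here is the statistic computed by hand, using the same counts as the Python Quick Reference later in this chapter (208 successes out of $n = 400$, testing $H_0: p = 0.47$ against $H_a: p > 0.47$):

```python
from math import sqrt
from scipy import stats

# Same numbers as the Python Quick Reference: 208 successes out of 400,
# testing H0: p = 0.47 against Ha: p > 0.47.
p_hat, p0, n = 208 / 400, 0.47, 400

se = sqrt(p0 * (1 - p0) / n)   # SE uses p0, not p_hat
z = (p_hat - p0) / se
p_value = stats.norm.sf(z)     # one-tailed: P(Z > z)

print(round(z, 2), round(p_value, 3))  # 2.0 0.023
```

Since $p$-value $\approx 0.023 < 0.05$, this sample would reject $H_0$ at the usual $\alpha = 0.05$.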

The Conditions

| Condition | What It Checks | What Happens If It Fails |
|---|---|---|
| Random sample | Representative data | Results not generalizable (bias) |
| Independence (10% condition) | Observations are independent | SE formula needs adjustment |
| Success-failure ($np_0 \geq 10$, $n(1-p_0) \geq 10$) | Normal approximation is valid | Use exact binomial test instead |
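The success-failure check is mechanical enough to automate. A minimal sketch (the helper name `success_failure_ok` is our own, not from any library):

```python
def success_failure_ok(n, p, threshold=10):
    """True if both expected counts, n*p and n*(1-p), meet the threshold."""
    return n * p >= threshold and n * (1 - p) >= threshold

# For a test, plug in p0; for a CI, plug in p_hat.
print(success_failure_ok(400, 0.47))  # True  (188 and 212 expected)
print(success_failure_ok(20, 0.05))   # False (only 1 expected success -> use an exact binomial test)
```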

Confidence Intervals: Three Methods

| Method | Formula | When to Use |
|---|---|---|
| Wald | $\hat{p} \pm z^* \sqrt{\hat{p}(1-\hat{p})/n}$ | Large $n$, $\hat{p}$ not extreme |
| Wilson | Complex formula (use software) | Any situation; best coverage |
| Plus-four | $\tilde{p} \pm z^* \sqrt{\tilde{p}(1-\tilde{p})/(n+4)}$, where $\tilde{p} = (X+2)/(n+4)$ | Simple improvement for small $n$ |

Key difference from hypothesis tests: CIs use $\hat{p}$ in the SE (not $p_0$) because there's no hypothesized value.
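For the curious, the "complex formula" behind the Wilson interval fits in a few lines. A sketch (in practice `proportion_confint(..., method='wilson')` does this for you; the function name `wilson_ci` and the example counts 208/400 are ours):

```python
from math import sqrt

def wilson_ci(x, n, z=1.96):
    """Approximate 95% Wilson score interval for x successes in n trials."""
    p_hat = x / n
    denom = 1 + z**2 / n
    # The Wilson center is pulled slightly toward 0.5 relative to p_hat.
    center = (p_hat + z**2 / (2 * n)) / denom
    half = (z / denom) * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return center - half, center + half

print(wilson_ci(208, 400))  # ~(0.4711, 0.5685); close to Wald here, since n is large
```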

Margin of Error in Polling

$$E = z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

| Sample Size | Approx. Margin of Error ($p \approx 0.5$, 95% CI) |
|---|---|
| 400 | ±4.9% |
| 1,000 | ±3.1% |
| 1,500 | ±2.5% |
| 4,000 | ±1.5% |

Critical caveat: The margin of error measures random sampling error only — not bias (nonresponse, coverage, question wording). A biased poll can have a small margin of error and still be very wrong.
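The table above is easy to reproduce: with $p \approx 0.5$ (the worst case) and $z^* = 1.96$, the margin of error is just $1.96\sqrt{0.25/n}$:

```python
from math import sqrt

# Worst-case margin of error (p = 0.5) at 95% confidence for each poll size.
for n in (400, 1000, 1500, 4000):
    e = 1.96 * sqrt(0.5 * 0.5 / n)
    print(f"n = {n:>4}: ±{100 * e:.1f}%")
# n =  400: ±4.9%
# n = 1000: ±3.1%
# n = 1500: ±2.5%
# n = 4000: ±1.5%
```

Note the diminishing returns: quadrupling the sample only halves the margin of error.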

The $p_0$ vs. $\hat{p}$ Rule

| Context | Use in SE | Why |
|---|---|---|
| Hypothesis test | $p_0$ | Computing probabilities under $H_0$ |
| Confidence interval | $\hat{p}$ | No hypothesized value to assume |
| Success-failure check (test) | $p_0$ | Checking conditions under $H_0$ |
| Success-failure check (CI) | $\hat{p}$ | No $p_0$ available |
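The rule is easiest to see with numbers where $\hat{p}$ and $p_0$ differ substantially (hypothetical values, chosen only to make the gap visible):

```python
from math import sqrt

# Hypothetical: testing H0: p = 0.5, but observing p_hat = 0.9 with n = 100.
p0, p_hat, n = 0.5, 0.9, 100

se_test = sqrt(p0 * (1 - p0) / n)      # z-test SE, built from p0
se_ci = sqrt(p_hat * (1 - p_hat) / n)  # Wald-CI SE, built from p_hat

print(round(se_test, 4), round(se_ci, 4))  # 0.05 0.03
```

The two standard errors differ by almost a factor of two here, so which one you use genuinely matters.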

Common Misconceptions

| Misconception | Reality |
|---|---|
| "Failing to reject means $p = p_0$" | It means there's not enough evidence to conclude otherwise |
| "The margin of error covers all sources of error" | It covers only random sampling error, not bias |
| "A significant result means a large difference" | Significance depends on $n$; large samples can detect tiny differences |
| "The Wald interval always works" | It fails for small $n$ and extreme $\hat{p}$; use Wilson or plus-four |
| "If the success-failure condition fails, no inference is possible" | Use exact binomial test or Wilson interval |

Python Quick Reference

```python
import numpy as np
from scipy import stats
from statsmodels.stats.proportion import proportions_ztest, proportion_confint

# --- One-sample z-test for a proportion ---
# prop_var=0.47 pins the SE to p0, matching this chapter's formula;
# by default statsmodels builds the SE from the sample proportion instead.
z_stat, p_value = proportions_ztest(count=208, nobs=400,
                                    value=0.47,
                                    prop_var=0.47,
                                    alternative='larger')

# --- Confidence intervals (three methods) ---
ci_wald = proportion_confint(208, 400, alpha=0.05, method='normal')
ci_wilson = proportion_confint(208, 400, alpha=0.05, method='wilson')

# --- Plus-four (manual) ---
p_tilde = (208 + 2) / (400 + 4)
se_tilde = np.sqrt(p_tilde * (1 - p_tilde) / (400 + 4))
ci_pf = (p_tilde - 1.96 * se_tilde, p_tilde + 1.96 * se_tilde)

# --- Exact binomial test (when success-failure fails) ---
result = stats.binomtest(k=3, n=20, p=0.05, alternative='greater')
```

How This Chapter Connects

| This Chapter | Builds On | Leads To |
|---|---|---|
| Z-test for proportions | CLT for proportions (Ch.11), hypothesis testing (Ch.13) | Two-sample proportion test (Ch.16) |
| CI for proportions (Wald/Wilson) | CI framework (Ch.12) | Bootstrap CI (Ch.18) |
| Success-failure condition | Binomial distribution (Ch.10), CLT conditions (Ch.11) | Chi-square conditions (Ch.19) |
| Margin of error in polls | Sampling methods (Ch.4), standard error (Ch.11) | Power analysis (Ch.17) |
| PPV and base rates | Bayes' theorem (Ch.9) | Logistic regression PPV (Ch.24) |
| Contextual interpretation | Hypothesis testing logic (Ch.13) | Effect sizes (Ch.17) |

Threads Updated

  • Sam and Daria: Formal proportion test conducted ($z = 1.30$, $p = 0.097$). Still fail to reject at $\alpha = 0.05$ with only 65 shots. Need more data — power analysis in Ch.17.
  • Maya: Tested county hypertension prevalence above national rate — found significant evidence ($z = 2.00$, $p = 0.023$). Used proportion inference to estimate disease prevalence for Bayesian screening analysis.
  • Polling and elections: Margin of error demystified. Random error vs. bias distinction. The 2016/2020 lessons.

The One Thing to Remember

If you forget everything else from this chapter, remember this:

In a hypothesis test, the standard error uses $p_0$ (because you're computing under the null). In a confidence interval, the standard error uses $\hat{p}$ (because there's no null to assume). The success-failure condition is your checkpoint for whether the normal approximation is valid. And the margin of error in a poll only measures random sampling error — not bias, not nonresponse, not question wording. The uncertainty in any proportion estimate is always at least as large as the margin of error, and usually larger.

Key Terms

| Term | Definition |
|---|---|
| Sample proportion ($\hat{p}$) | The proportion of successes in the sample: $\hat{p} = X/n$ |
| Population proportion ($p$) | The true proportion of successes in the entire population |
| One-sample z-test for proportions | A hypothesis test that uses the standard normal distribution to test a claim about a population proportion |
| Success-failure condition | The requirement that $np_0 \geq 10$ and $n(1-p_0) \geq 10$ (for tests) or $n\hat{p} \geq 10$ and $n(1-\hat{p}) \geq 10$ (for CIs); ensures the normal approximation is valid |
| Normal approximation | The use of the normal distribution to approximate the sampling distribution of $\hat{p}$, justified by the CLT for proportions |
| Margin of error for proportions | $z^* \sqrt{\hat{p}(1-\hat{p})/n}$; the maximum likely distance between $\hat{p}$ and $p$ at a given confidence level |
| Wilson interval | An improved confidence interval for proportions with better coverage properties than the Wald interval, especially for small samples and extreme proportions |
| Plus-four method | A simple improvement to the Wald interval: add 2 successes and 2 failures before computing the CI ($\tilde{p} = (X+2)/(n+4)$) |