Key Takeaways: Inference for Proportions

One-Sentence Summary

The one-sample z-test for proportions uses the sampling distribution of $\hat{p}$ (grounded in the CLT from Chapter 11) to test claims about population proportions: the success-failure condition is the gatekeeper, $p_0$ goes in the test's standard error, and the Wilson interval improves on the basic Wald CI when $n$ is small or $\hat{p}$ is extreme.

Core Concepts at a Glance

| Concept | Definition | Why It Matters |
|---|---|---|
| Sample proportion ($\hat{p}$) | $X/n$; the proportion of "successes" in the sample | The point estimate for the population proportion $p$ |
| Population proportion ($p$) | The true proportion in the entire population | What we're trying to estimate or test |
| Success-failure condition | $np_0 \geq 10$ and $n(1-p_0) \geq 10$ (for tests) | Ensures the normal approximation is valid |
| Normal approximation | The CLT-based assumption that $\hat{p}$ is approximately normally distributed | The theoretical foundation for the z-test and Wald CI |
| Wilson interval | An improved CI formula with better coverage properties | Fixes the Wald interval's problems for small $n$ or extreme $\hat{p}$ |
| Plus-four method | Add 2 successes and 2 failures, then use the Wald formula | Simple improvement for small samples |
| Margin of error for proportions | $z^* \sqrt{\hat{p}(1-\hat{p})/n}$; half-width of the CI | What "±3 points" means in polls |

The One-Sample Z-Test for a Proportion

The Five Steps

| Step | Action | Key Detail |
|---|---|---|
| 1 | State $H_0: p = p_0$ and $H_a$ | Choose one- or two-tailed based on the research question |
| 2 | Choose $\alpha$ (typically 0.05) | Consider consequences of Type I and Type II errors |
| 3 | Check three conditions | Random, independent (10%), success-failure |
| 4 | Compute $z$ and p-value | $z = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}}$ |
| 5 | Conclude in context | Not just "reject" — explain what it means |

The Test Statistic

$$\boxed{z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}}$$

The SE uses $p_0$ (not $\hat{p}$) because the test statistic is computed under the assumption that $H_0$ is true.
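As a quick numeric check, here is the statistic computed by hand, using the same counts as the Python Quick Reference later in this chapter (208 successes out of $n = 400$, testing $H_0: p = 0.47$ against $H_a: p > 0.47$):

```python
from math import sqrt
from scipy import stats

# Same numbers as the Python Quick Reference: 208 successes out of 400,
# testing H0: p = 0.47 against Ha: p > 0.47.
p_hat, p0, n = 208 / 400, 0.47, 400

se = sqrt(p0 * (1 - p0) / n)   # SE uses p0, not p_hat
z = (p_hat - p0) / se
p_value = stats.norm.sf(z)     # one-tailed: P(Z > z)

print(round(z, 2), round(p_value, 3))  # 2.0 0.023
```

Since $p$-value $\approx 0.023 < 0.05$, this sample would reject $H_0$ at the usual $\alpha = 0.05$.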

The Conditions

| Condition | What It Checks | What Happens If It Fails |
|---|---|---|
| Random sample | Representative data | Results not generalizable (bias) |
| Independence (10% condition) | Observations are independent | SE formula needs adjustment |
| Success-failure ($np_0 \geq 10$, $n(1-p_0) \geq 10$) | Normal approximation is valid | Use exact binomial test instead |
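The success-failure check is mechanical enough to automate. A minimal sketch (the helper name `success_failure_ok` is our own, not from any library):

```python
def success_failure_ok(n, p, threshold=10):
    """True if both expected counts, n*p and n*(1-p), meet the threshold."""
    return n * p >= threshold and n * (1 - p) >= threshold

# For a test, plug in p0; for a CI, plug in p_hat.
print(success_failure_ok(400, 0.47))  # True  (188 and 212 expected)
print(success_failure_ok(20, 0.05))   # False (only 1 expected success -> use an exact binomial test)
```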

Confidence Intervals: Three Methods

| Method | Formula | When to Use |
|---|---|---|
| Wald | $\hat{p} \pm z^* \sqrt{\hat{p}(1-\hat{p})/n}$ | Large $n$, $\hat{p}$ not extreme |
| Wilson | Complex formula (use software) | Any situation; best coverage |
| Plus-four | $\tilde{p} \pm z^* \sqrt{\tilde{p}(1-\tilde{p})/(n+4)}$, where $\tilde{p} = (X+2)/(n+4)$ | Simple improvement for small $n$ |

Key difference from hypothesis tests: CIs use $\hat{p}$ in the SE (not $p_0$) because there's no hypothesized value.
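For the curious, the "complex formula" behind the Wilson interval fits in a few lines. A sketch (in practice `proportion_confint(..., method='wilson')` does this for you; the function name `wilson_ci` and the example counts 208/400 are ours):

```python
from math import sqrt

def wilson_ci(x, n, z=1.96):
    """Approximate 95% Wilson score interval for x successes in n trials."""
    p_hat = x / n
    denom = 1 + z**2 / n
    # The Wilson center is pulled slightly toward 0.5 relative to p_hat.
    center = (p_hat + z**2 / (2 * n)) / denom
    half = (z / denom) * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return center - half, center + half

print(wilson_ci(208, 400))  # ~(0.4711, 0.5685); close to Wald here, since n is large
```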

Margin of Error in Polling

$$E = z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

| Sample Size | Approx. Margin of Error ($p \approx 0.5$, 95% CI) |
|---|---|
| 400 | ±4.9% |
| 1,000 | ±3.1% |
| 1,500 | ±2.5% |
| 4,000 | ±1.5% |

Critical caveat: The margin of error measures random sampling error only — not bias (nonresponse, coverage, question wording). A biased poll can have a small margin of error and still be very wrong.
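The table above is easy to reproduce: with $p \approx 0.5$ (the worst case) and $z^* = 1.96$, the margin of error is just $1.96\sqrt{0.25/n}$:

```python
from math import sqrt

# Worst-case margin of error (p = 0.5) at 95% confidence for each poll size.
for n in (400, 1000, 1500, 4000):
    e = 1.96 * sqrt(0.5 * 0.5 / n)
    print(f"n = {n:>4}: ±{100 * e:.1f}%")
# n =  400: ±4.9%
# n = 1000: ±3.1%
# n = 1500: ±2.5%
# n = 4000: ±1.5%
```

Note the diminishing returns: quadrupling the sample only halves the margin of error.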

The $p_0$ vs. $\hat{p}$ Rule

| Context | Use in SE | Why |
|---|---|---|
| Hypothesis test | $p_0$ | Computing probabilities under $H_0$ |
| Confidence interval | $\hat{p}$ | No hypothesized value to assume |
| Success-failure check (test) | $p_0$ | Checking conditions under $H_0$ |
| Success-failure check (CI) | $\hat{p}$ | No $p_0$ available |
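The rule is easiest to see with numbers where $\hat{p}$ and $p_0$ differ substantially (hypothetical values, chosen only to make the gap visible):

```python
from math import sqrt

# Hypothetical: testing H0: p = 0.5, but observing p_hat = 0.9 with n = 100.
p0, p_hat, n = 0.5, 0.9, 100

se_test = sqrt(p0 * (1 - p0) / n)      # z-test SE, built from p0
se_ci = sqrt(p_hat * (1 - p_hat) / n)  # Wald-CI SE, built from p_hat

print(round(se_test, 4), round(se_ci, 4))  # 0.05 0.03
```

The two standard errors differ by almost a factor of two here, so which one you use genuinely matters.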

Common Misconceptions

| Misconception | Reality |
|---|---|
| "Failing to reject means $p = p_0$" | It means there's not enough evidence to conclude otherwise |
| "The margin of error covers all sources of error" | It covers only random sampling error, not bias |
| "A significant result means a large difference" | Significance depends on $n$; large samples can detect tiny differences |
| "The Wald interval always works" | It fails for small $n$ and extreme $\hat{p}$; use Wilson or plus-four |
| "If the success-failure condition fails, no inference is possible" | Use exact binomial test or Wilson interval |

Python Quick Reference

```python
import numpy as np
from scipy import stats
from statsmodels.stats.proportion import proportions_ztest, proportion_confint

# --- One-sample z-test for a proportion ---
# prop_var=0.47 pins the SE to p0, matching this chapter's formula;
# by default statsmodels builds the SE from the sample proportion instead.
z_stat, p_value = proportions_ztest(count=208, nobs=400,
                                    value=0.47,
                                    prop_var=0.47,
                                    alternative='larger')

# --- Confidence intervals (three methods) ---
ci_wald = proportion_confint(208, 400, alpha=0.05, method='normal')
ci_wilson = proportion_confint(208, 400, alpha=0.05, method='wilson')

# --- Plus-four (manual) ---
p_tilde = (208 + 2) / (400 + 4)
se_tilde = np.sqrt(p_tilde * (1 - p_tilde) / (400 + 4))
ci_pf = (p_tilde - 1.96 * se_tilde, p_tilde + 1.96 * se_tilde)

# --- Exact binomial test (when success-failure fails) ---
result = stats.binomtest(k=3, n=20, p=0.05, alternative='greater')
```

How This Chapter Connects

| This Chapter | Builds On | Leads To |
|---|---|---|
| Z-test for proportions | CLT for proportions (Ch.11), hypothesis testing (Ch.13) | Two-sample proportion test (Ch.16) |
| CI for proportions (Wald/Wilson) | CI framework (Ch.12) | Bootstrap CI (Ch.18) |
| Success-failure condition | Binomial distribution (Ch.10), CLT conditions (Ch.11) | Chi-square conditions (Ch.19) |
| Margin of error in polls | Sampling methods (Ch.4), standard error (Ch.11) | Power analysis (Ch.17) |
| PPV and base rates | Bayes' theorem (Ch.9) | Logistic regression PPV (Ch.24) |
| Contextual interpretation | Hypothesis testing logic (Ch.13) | Effect sizes (Ch.17) |

Threads Updated

  • Sam and Daria: Formal proportion test conducted ($z = 1.30$, $p = 0.097$). Still fail to reject at $\alpha = 0.05$ with only 65 shots. Need more data — power analysis in Ch.17.
  • Maya: Tested county hypertension prevalence above national rate — found significant evidence ($z = 2.00$, $p = 0.023$). Used proportion inference to estimate disease prevalence for Bayesian screening analysis.
  • Polling and elections: Margin of error demystified. Random error vs. bias distinction. The 2016/2020 lessons.

The One Thing to Remember

If you forget everything else from this chapter, remember this:

In a hypothesis test, the standard error uses $p_0$ (because you're computing under the null). In a confidence interval, the standard error uses $\hat{p}$ (because there's no null to assume). The success-failure condition is your checkpoint for whether the normal approximation is valid. And the margin of error in a poll only measures random sampling error — not bias, not nonresponse, not question wording. The uncertainty in any proportion estimate is always at least as large as the margin of error, and usually larger.

Key Terms

| Term | Definition |
|---|---|
| Sample proportion ($\hat{p}$) | The proportion of successes in the sample: $\hat{p} = X/n$ |
| Population proportion ($p$) | The true proportion of successes in the entire population |
| One-sample z-test for proportions | A hypothesis test that uses the standard normal distribution to test a claim about a population proportion |
| Success-failure condition | The requirement that $np_0 \geq 10$ and $n(1-p_0) \geq 10$ (for tests) or $n\hat{p} \geq 10$ and $n(1-\hat{p}) \geq 10$ (for CIs); ensures the normal approximation is valid |
| Normal approximation | The use of the normal distribution to approximate the sampling distribution of $\hat{p}$, justified by the CLT for proportions |
| Margin of error for proportions | $z^* \sqrt{\hat{p}(1-\hat{p})/n}$; the maximum likely distance between $\hat{p}$ and $p$ at a given confidence level |
| Wilson interval | An improved confidence interval for proportions with better coverage properties than the Wald interval, especially for small samples and extreme proportions |
| Plus-four method | A simple improvement to the Wald interval: add 2 successes and 2 failures before computing the CI ($\tilde{p} = (X+2)/(n+4)$) |