Quiz: Inference for Proportions
Test your understanding of the one-sample z-test for proportions, confidence intervals, conditions, and applied interpretation. Try to answer each question before revealing the answer.
1. The standard error in the z-test for a proportion uses $p_0$ (the null hypothesis value) rather than $\hat{p}$ (the sample proportion). Why?
(a) Because $p_0$ is always larger than $\hat{p}$ (b) Because we assume $H_0$ is true when computing the test statistic and p-value (c) Because $\hat{p}$ is not a valid estimate of $p$ (d) Because the formula is simpler with $p_0$
Answer
**(b) Because we assume $H_0$ is true when computing the test statistic and p-value.** In hypothesis testing, we calculate everything *under the assumption that $H_0$ is true.* If $H_0: p = p_0$ is true, then the standard error of $\hat{p}$ is $\sqrt{p_0(1-p_0)/n}$, not $\sqrt{\hat{p}(1-\hat{p})/n}$. We ask: "If the true proportion really were $p_0$, how likely is the data we observed?" That requires using $p_0$ in the standard error. In confidence intervals, there's no hypothesized value, so we use $\hat{p}$ as our best estimate.

2. The success-failure condition for a one-sample z-test for proportions requires:
(a) $n\hat{p} \geq 10$ and $n(1-\hat{p}) \geq 10$ (b) $np_0 \geq 10$ and $n(1-p_0) \geq 10$ (c) $n \geq 30$ (d) $np_0 \geq 5$ and $n(1-p_0) \geq 5$
Answer
**(b) $np_0 \geq 10$ and $n(1-p_0) \geq 10$.** For hypothesis tests, the success-failure condition uses $p_0$ because we're checking whether the normal approximation is valid *under the null hypothesis*. Option (a) is the version used for confidence intervals (where $\hat{p}$ replaces $p_0$). Option (c) is a condition for the CLT for means, not proportions. Option (d) uses a threshold of 5, which is less conservative (some older textbooks use 5, but 10 is the modern standard).

3. A quality control inspector tests 500 items and finds 18 defective. She tests $H_0: p = 0.03$ vs. $H_a: p > 0.03$. What is the test statistic?
(a) $z = \frac{0.036 - 0.03}{\sqrt{0.036 \times 0.964 / 500}} = 0.72$ (b) $z = \frac{0.036 - 0.03}{\sqrt{0.03 \times 0.97 / 500}} = 0.79$ (c) $z = \frac{18 - 15}{\sqrt{500 \times 0.03 \times 0.97}} = 0.79$ (d) $z = \frac{0.036 - 0.03}{0.03 / \sqrt{500}} = 4.47$
Answer
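The arithmetic can be checked with a short Python sketch (standard library only; values from the question):

```python
from math import sqrt

# One-sample z-test for a proportion: H0: p = 0.03 vs Ha: p > 0.03
n, x, p0 = 500, 18, 0.03
p_hat = x / n                        # sample proportion = 0.036
se = sqrt(p0 * (1 - p0) / n)         # SE uses p0, the null value
z = (p_hat - p0) / se
print(round(se, 5), round(z, 2))     # 0.00763 0.79
```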
**(b) $z = \frac{0.036 - 0.03}{\sqrt{0.03 \times 0.97 / 500}} = 0.79$.** The test statistic uses $p_0 = 0.03$ in the standard error (not $\hat{p} = 0.036$). Option (a) incorrectly uses $\hat{p}$ in the SE. Option (c) is the count form of the same test; it yields the same value of $z$, but (b) is the standard proportion form. Option (d) uses the wrong standard error formula entirely. The calculation: $\hat{p} = 18/500 = 0.036$, $SE = \sqrt{0.03 \times 0.97/500} = \sqrt{0.0000582} = 0.00763$, $z = 0.006/0.00763 = 0.79$.

4. If the 95% confidence interval for a population proportion is (0.42, 0.58), which of the following null hypothesis values would be rejected at $\alpha = 0.05$ in a two-sided test?
(a) $p_0 = 0.50$ (b) $p_0 = 0.45$ (c) $p_0 = 0.40$ (d) $p_0 = 0.55$
Answer
**(c) $p_0 = 0.40$.** A 95% CI contains all values that would *not* be rejected at $\alpha = 0.05$ in a two-sided test (the CI-test duality from Chapter 13). Since 0.40 is outside the interval (0.42, 0.58), it would be rejected. Values 0.45, 0.50, and 0.55 are all inside the interval and would not be rejected.

5. A poll of 1,200 adults finds that 54% support a new policy, with a margin of error of ±2.8%. A commentator says: "54% support the policy — that's a clear majority." Is this conclusion statistically justified?
(a) Yes — 54% is greater than 50%, so it's a majority (b) Yes — the entire confidence interval (51.2%, 56.8%) is above 50% (c) No — 50% is within the margin of error of 54% (d) No — the margin of error is too large for any conclusion
Answer
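A quick numeric check (Python, standard library; the 1.96 critical value for 95% confidence is assumed):

```python
from math import sqrt

# Does the entire 95% CI sit above 50%?
n, p_hat = 1200, 0.54
moe = 1.96 * sqrt(p_hat * (1 - p_hat) / n)        # margin of error
lo, hi = p_hat - moe, p_hat + moe
print(round(moe, 3), round(lo, 3), round(hi, 3))  # 0.028 0.512 0.568
```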
**(b) Yes — the entire confidence interval (51.2%, 56.8%) is above 50%.** The 95% CI is $54\% \pm 2.8\% = (51.2\%, 56.8\%)$. Since the entire interval is above 50%, we can be confident that a majority supports the policy. This is equivalent to rejecting $H_0: p = 0.50$ in favor of $H_a: p > 0.50$ at $\alpha = 0.05$. Option (a) is technically right about the point estimate but doesn't account for uncertainty. Option (c) is wrong — 50% is outside the CI, so it *is* distinguishable from 54%.

6. Which of the following would cause the margin of error for a proportion to decrease? (Select all that apply.)
(a) Increasing the sample size from 400 to 1,600 (b) Decreasing the confidence level from 99% to 95% (c) Observing a sample proportion closer to 0.5 (d) Observing a sample proportion closer to 0 or 1
Answer
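Each option can be checked directly against the margin-of-error formula (a Python sketch):

```python
from math import sqrt

def moe(z_star, p_hat, n):
    """Margin of error for a one-sample proportion CI."""
    return z_star * sqrt(p_hat * (1 - p_hat) / n)

ratio = moe(1.96, 0.5, 1600) / moe(1.96, 0.5, 400)
print(round(ratio, 3))                              # (a) quadrupling n halves E: 0.5
print(moe(1.960, 0.5, 400) < moe(2.576, 0.5, 400))  # (b) 95% beats 99%: True
print(moe(1.96, 0.9, 400) < moe(1.96, 0.5, 400))    # (d) p_hat near 1 shrinks E: True
```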
**(a), (b), and (d).** The margin of error is $E = z^* \sqrt{\hat{p}(1-\hat{p})/n}$.

- (a) Increasing $n$ decreases $E$. Quadrupling $n$ from 400 to 1,600 halves the margin of error (the $\sqrt{n}$ effect).
- (b) Decreasing the confidence level reduces $z^*$ (from 2.576 to 1.960), which decreases $E$.
- (c) INCREASES the margin of error — $\hat{p}(1-\hat{p})$ is maximized at $\hat{p} = 0.5$.
- (d) DECREASES the margin of error — $\hat{p}(1-\hat{p})$ is smaller when $\hat{p}$ is near 0 or 1.

7. A researcher surveys 30 patients and finds that 27 experience symptom relief ($\hat{p} = 0.90$). She computes a 95% Wald confidence interval and gets (0.79, 1.01). What is wrong with this interval?
(a) The sample size is too small for any confidence interval (b) The upper bound exceeds 1.0, which is impossible for a proportion (c) The interval is too wide to be useful (d) She should have used a t-distribution instead of z
Answer
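The offending interval is easy to reproduce (a sketch of the Wald formula in Python):

```python
from math import sqrt

# Wald 95% CI with p_hat near 1: the upper bound overshoots 1
n, x = 30, 27
p_hat = x / n                                   # 0.90
moe = 1.96 * sqrt(p_hat * (1 - p_hat) / n)
lo, hi = p_hat - moe, p_hat + moe
print(round(lo, 2), round(hi, 2))               # 0.79 1.01 -- impossible upper bound
```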
**(b) The upper bound exceeds 1.0, which is impossible for a proportion.** A proportion is bounded between 0 and 1 by definition. An upper bound of 1.01 is nonsensical — you can't have 101% of patients experience relief. This is a known weakness of the Wald interval when $\hat{p}$ is close to 0 or 1. The Wilson interval or plus-four method would produce a more reasonable interval that stays within [0, 1]. Also note: the success-failure condition for the CI fails here: $n(1-\hat{p}) = 30 \times 0.10 = 3 < 10$, which is another reason the Wald interval is unreliable.

8. The plus-four method computes $\tilde{p} = (X+2)/(n+4)$ and then uses the Wald formula with $\tilde{p}$ and $n+4$. If $X = 0$ and $n = 20$, what is $\tilde{p}$?
(a) 0 (b) 0.083 (c) 0.10 (d) 0.50
Answer
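A Python sketch of the adjustment (truncating the lower bound at 0 is a common convention, not stated in the question):

```python
from math import sqrt

# Plus-four: add 2 successes and 2 failures, then apply the Wald formula
x, n = 0, 20
p_tilde = (x + 2) / (n + 4)
moe = 1.96 * sqrt(p_tilde * (1 - p_tilde) / (n + 4))
lo = max(0.0, p_tilde - moe)     # truncate at 0; a proportion can't be negative
print(round(p_tilde, 3), round(lo, 3), round(p_tilde + moe, 3))  # 0.083 0.0 0.194
```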
**(b) 0.083.** $\tilde{p} = (0 + 2)/(20 + 4) = 2/24 = 0.0833$. The plus-four method converts the impossible $\hat{p} = 0/20 = 0$ to a more reasonable starting point. Without this adjustment, the Wald interval would be degenerate: $0 \pm 0 = (0, 0)$, which is clearly wrong — getting 0 successes in 20 trials doesn't prove the proportion is exactly zero.

9. Maya surveys 400 adults and finds that 208 have hypertension. She tests $H_0: p = 0.47$ vs. $H_a: p > 0.47$ and gets $z = 2.00$, $p = 0.023$. Which interpretation is correct?
(a) There is a 2.3% probability that the county's hypertension rate is 47% (b) There is a 2.3% probability that the results are due to chance (c) If the county's true hypertension rate were 47%, there's a 2.3% probability of observing a sample proportion as high or higher than 52% (d) The county's hypertension rate is definitely higher than 47%
Answer
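The p-value can be reproduced with the standard normal CDF written via `math.erf` (a sketch; no external libraries):

```python
from math import sqrt, erf

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + erf(z / sqrt(2)))

n, x, p0 = 400, 208, 0.47
p_hat = x / n                                    # 0.52
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
p_value = 1 - phi(z)          # upper tail: P(p_hat >= 0.52 | p = 0.47)
print(round(z, 2), round(p_value, 3))            # 2.0 0.023
```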
**(c) If the county's true hypertension rate were 47%, there's a 2.3% probability of observing a sample proportion as high or higher than 52%.** This is the correct conditional probability interpretation of the p-value: $P(\hat{p} \geq 0.52 \mid p = 0.47) = 0.023$. Option (a) is the classic p-value misconception — it inverts the conditional. Option (b) is vague and incorrect. Option (d) is too strong — we can never "definitely" prove a population parameter value.

10. A poll has a margin of error of ±3 percentage points. This margin of error accounts for:
(a) Both random sampling error and bias (b) Random sampling error only (c) Bias only (d) The total error in the poll's estimate
Answer
**(b) Random sampling error only.** The margin of error measures the variability due to *random* sampling — the fact that different random samples would give different results. It does NOT account for bias (systematic errors from nonresponse, question wording, sampling frame problems, etc.). This is why a poll can have a small margin of error and still be quite wrong — if the sample is biased, the margin of error understates the true uncertainty.

11. Two polls are conducted on the same question:
- Poll A: $n = 400$, $\hat{p} = 0.60$, margin of error = ±4.8%
- Poll B: $n = 1{,}600$, $\hat{p} = 0.58$, margin of error = ±2.4%
Which statement is most accurate?
(a) Poll A is more accurate because it found a higher proportion (b) Poll B is more precise because its margin of error is smaller (c) The polls contradict each other because they found different proportions (d) We can combine the polls for a total sample of 2,000
Answer
**(b) Poll B is more precise because its margin of error is smaller.** A smaller margin of error means the estimate is more precise (narrower CI). The different point estimates (60% vs. 58%) are not contradictory — they're both within each other's margins of error. Option (a) confuses a higher estimate with accuracy. Option (d) is tempting but requires careful methodology (you can't simply add the samples unless they come from the same population and sampling frame).

12. Sam tests $H_0: p = 0.31$ vs. $H_a: p > 0.31$ using Daria's shooting data ($n = 65$, $\hat{p} = 0.385$) and gets $p = 0.097$. He fails to reject at $\alpha = 0.05$. Which conclusion is correct?
(a) Daria has not improved her shooting (b) Daria's true shooting percentage is 31% (c) There is insufficient evidence to conclude Daria has improved, based on 65 shots (d) The test proves that any observed improvement is due to chance
Answer
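A numeric check (Python, standard library; this assumes $\hat{p} = 0.385$ corresponds to 25 makes in 65 shots, which is not stated explicitly in the question):

```python
from math import sqrt, erf

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + erf(z / sqrt(2)))

n, x, p0 = 65, 25, 0.31          # 25/65 = 0.3846, reported as 0.385
p_hat = x / n
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
p_value = 1 - phi(z)             # one-tailed p-value
print(round(z, 2), round(p_value, 3))   # 1.3 0.097
```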
**(c) There is insufficient evidence to conclude Daria has improved, based on 65 shots.** Failing to reject $H_0$ does NOT mean $H_0$ is true — it means we don't have enough evidence to reject it. Options (a) and (b) state that $H_0$ is true, which we cannot conclude. Option (d) says the test "proves" the improvement is chance, which is too strong. The correct conclusion acknowledges the limitation: with only 65 shots, sampling variability is too large to detect a difference of 7.5 percentage points with confidence.

13. For a confidence interval for a proportion, the success-failure condition uses $\hat{p}$ rather than $p_0$. Why?
(a) Because $\hat{p}$ is always more accurate than $p_0$ (b) Because there is no hypothesized value in a confidence interval — we're estimating, not testing (c) Because the condition is less strict for confidence intervals (d) Because the Wald interval requires $\hat{p}$
Answer
**(b) Because there is no hypothesized value in a confidence interval — we're estimating, not testing.** In a confidence interval, there's no $H_0$ and therefore no $p_0$. We're trying to estimate the unknown $p$, and the best estimate we have is $\hat{p}$. So we use $\hat{p}$ to check conditions and compute the standard error. In contrast, hypothesis tests assume $H_0$ is true and use $p_0$.

14. A sample of 50 products from a factory has 1 defective item ($\hat{p} = 0.02$). A quality manager wants to test whether the defect rate exceeds 1%. What is the main concern with using a z-test here?
(a) The sample size is too small (b) The success-failure condition fails: $np_0 = 50 \times 0.01 = 0.5 < 10$ (c) The sample is not random (d) The proportion is too close to zero for any test
Answer
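The exact binomial test mentioned in this answer is small enough to write directly (a Python sketch using `math.comb`):

```python
from math import comb

# Exact upper-tail p-value: P(X >= 1) when X ~ Binomial(n = 50, p = 0.01)
n, x, p0 = 50, 1, 0.01
p_value = sum(comb(n, k) * p0**k * (1 - p0)**(n - k) for k in range(x, n + 1))
print(round(n * p0, 1), round(p_value, 3))   # 0.5 expected successes; p = 0.395
```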
**(b) The success-failure condition fails: $np_0 = 50 \times 0.01 = 0.5 < 10$.** With $p_0 = 0.01$ and $n = 50$, we expect only 0.5 "successes" (defective items) under $H_0$. The normal approximation is unreliable for such rare events. The appropriate alternative is an exact binomial test. Note that (a) is not exactly right — the issue isn't the sample size per se, but the combination of small $n$ and extreme $p_0$.

15. The Wilson interval is generally preferred over the Wald interval because:
(a) It always produces narrower intervals (b) It has better coverage properties, especially for small samples and extreme proportions (c) It doesn't require the success-failure condition (d) Both (b) and (c)
Answer
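The Wilson interval is short to implement; a sketch, applied here to the 27-of-30 data from question 7 (where the Wald interval overshot 1):

```python
from math import sqrt

def wilson_ci(x, n, z=1.96):
    """95% Wilson score interval for a proportion."""
    p_hat = x / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

lo, hi = wilson_ci(27, 30)
print(round(lo, 3), round(hi, 3))   # stays inside [0, 1], unlike the Wald interval
```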
**(d) Both (b) and (c).** The Wilson interval has genuinely better coverage — its actual confidence level is closer to the nominal level across all values of $p$ and $n$. And unlike the Wald interval, it doesn't require the success-failure condition to perform well. It does NOT always produce narrower intervals (sometimes it's wider, sometimes narrower), so (a) is wrong.

16. A researcher finds $\hat{p} = 0.52$ in a sample of $n = 2{,}500$. She tests $H_0: p = 0.50$ vs. $H_a: p \neq 0.50$ and gets $z = 2.00$, $p = 0.046$. She concludes "the proportion is significantly different from 50%." What important caveat should she add?
(a) The result might be a Type II error (b) The difference, while statistically significant, is only 2 percentage points — which may not be practically meaningful (c) She should have used a one-tailed test (d) The sample size is too large for a z-test
Answer
**(b) The difference, while statistically significant, is only 2 percentage points — which may not be practically meaningful.** With $n = 2{,}500$, even tiny deviations from 50% can be "statistically significant" because the standard error is very small. But "52% differs from 50%" may not matter in any practical sense. This is the statistical significance vs. practical significance distinction (preview of Chapter 17). Always ask: is the difference *big enough to matter*, not just *unlikely enough to be non-random*?

17. In a two-tailed test with $\alpha = 0.05$, the critical values for the z-test are $z^* = \pm 1.96$. A test statistic of $z = -2.15$ would lead to:
(a) Fail to reject $H_0$ because $z$ is negative (b) Reject $H_0$ because $|z| = 2.15 > 1.96$ (c) Reject $H_0$ only if $H_a$ specifies the negative direction (d) An inconclusive result
Answer
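The decision and p-value, numerically (a Python sketch using `math.erf` for the normal CDF):

```python
from math import sqrt, erf

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + erf(z / sqrt(2)))

z, z_crit = -2.15, 1.96
p_value = 2 * phi(-abs(z))        # two-tailed: double the tail area
reject = abs(z) > z_crit
print(round(p_value, 4), reject)  # 0.0316 True
```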
**(b) Reject $H_0$ because $|z| = 2.15 > 1.96$.** In a two-tailed test, we reject $H_0$ when $|z|$ exceeds $z^*$, regardless of the direction. A test statistic of $z = -2.15$ means the sample proportion is 2.15 standard errors *below* $p_0$, which is evidence that $p \neq p_0$. The p-value is $2 \times P(Z \leq -2.15) = 2 \times 0.0158 = 0.0316 < 0.05$.

18. A polling organization wants to achieve a margin of error of ±2% at 95% confidence. Using the conservative estimate $\hat{p} = 0.5$, the required sample size is approximately:
(a) 625 (b) 1,000 (c) 2,401 (d) 9,604
Answer
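The sample-size calculation in Python (a sketch; rounding up, since sample sizes must be whole people):

```python
from math import ceil

# n needed for margin of error E at 95% confidence, conservative p = 0.5
z_star, E, p = 1.96, 0.02, 0.5
n_raw = (z_star / E) ** 2 * p * (1 - p)
n_needed = ceil(round(n_raw, 6))   # round() first guards against float noise
print(n_needed)                    # 2401
```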
**(c) 2,401.** $n = \left(\frac{z^*}{E}\right)^2 \hat{p}(1-\hat{p}) = \left(\frac{1.96}{0.02}\right)^2 \times 0.25 = 98^2 \times 0.25 = 9{,}604 \times 0.25 = 2{,}401$. A margin of error of ±2% requires about 2,400 people — much more than the ~1,000 needed for ±3%. This is the $\sqrt{n}$ effect: cutting the margin of error by one-third (from 3% to 2%) requires more than doubling the sample.

19. A medical researcher tests whether the proportion of patients who respond to a treatment exceeds 60%. She surveys $n = 80$ patients and finds $\hat{p} = 0.70$. She computes $z = 1.83$ and $p = 0.034$. At $\alpha = 0.05$, she rejects $H_0$. She then constructs a 95% CI and gets (0.60, 0.80). She's confused: "My CI includes 0.60 — how can I reject $H_0: p = 0.60$?"
What is the explanation?
(a) She made a computational error in either the test or the CI (b) The CI uses $\hat{p}$ in the SE while the test uses $p_0$, so they can give slightly different results at the boundary (c) The test is one-tailed but the CI is two-tailed, so they don't need to exactly agree (d) Both (b) and (c)
Answer
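Running both computations side by side makes the discrepancy concrete (a Python sketch, standard library only):

```python
from math import sqrt, erf

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + erf(z / sqrt(2)))

n, p_hat, p0 = 80, 0.70, 0.60
# One-tailed test: standard error uses p0
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
p_one_tailed = 1 - phi(z)
# 95% CI: standard error uses p_hat
moe = 1.96 * sqrt(p_hat * (1 - p_hat) / n)
print(round(z, 2), round(p_one_tailed, 3))           # 1.83 0.034 -> reject at 0.05
print(round(p_hat - moe, 2), round(p_hat + moe, 2))  # 0.6 0.8 -> CI includes 0.60
```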
**(d) Both (b) and (c).** There are two reasons for the apparent discrepancy. First, the hypothesis test uses $p_0 = 0.60$ in the standard error, while the CI uses $\hat{p} = 0.70$. These give slightly different standard errors, which can lead to different conclusions right at the boundary. Second, and more importantly, the CI is inherently two-tailed (it corresponds to a two-sided test), while her hypothesis test is one-tailed ($H_a: p > 0.60$). The CI-test duality is exact only when comparing a two-sided test to a two-tailed CI. A one-tailed test at $\alpha = 0.05$ can reject $H_0$ even when the 95% CI just barely includes $p_0$.

20. Which of the following best describes what is "new" in Chapter 14 compared to Chapters 12 and 13?
(a) The formulas for proportion inference are completely different from those for means (b) This chapter introduces the z-test for the first time (c) This chapter applies the frameworks from Chapters 12 and 13 specifically to proportions, with rigorous conditions checking and improved CI methods (d) This chapter adds the Wilson interval as a replacement for all previous methods