Quiz: Inference for Means

Test your understanding of the one-sample t-test, conditions, robustness, and the z-test vs. t-test distinction. Try to answer each question before revealing the answer.


1. The one-sample t-test is used instead of the z-test when:

(a) The sample size is small (b) The population standard deviation $\sigma$ is unknown (c) The data are not normally distributed (d) The significance level is less than 0.05

Answer **(b) The population standard deviation $\sigma$ is unknown.** The defining difference between the z-test and t-test is whether $\sigma$ is known or must be estimated by $s$. Sample size, normality, and significance level are separate considerations. When $\sigma$ is unknown (the usual case), we use $s$ in the standard error, which introduces extra variability — and the t-distribution accounts for that extra uncertainty with heavier tails.

2. The degrees of freedom for a one-sample t-test with $n = 25$ observations is:

(a) 25 (b) 24 (c) 26 (d) It depends on the significance level

Answer **(b) 24.** For a one-sample t-test, $df = n - 1 = 25 - 1 = 24$. The $n - 1$ arises because one degree of freedom is "used up" by estimating the mean $\bar{x}$ from the same data used to compute $s$. This is the same $n - 1$ that appears in the denominator of the sample variance formula from Chapter 6.

3. As degrees of freedom increase, the t-distribution:

(a) Becomes more spread out with heavier tails (b) Stays the same regardless of degrees of freedom (c) Approaches the standard normal distribution (d) Becomes skewed to the right

Answer **(c) Approaches the standard normal distribution.** As $df \to \infty$, $s$ becomes an increasingly reliable estimate of $\sigma$, and the extra uncertainty disappears. By $df = 100$, the t-distribution is nearly indistinguishable from the standard normal. At $df = \infty$, they are identical. This makes intuitive sense: with a very large sample, $s \approx \sigma$, so there's no need for the "extra caution" that the heavier tails provide.
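
You can watch this convergence numerically. A quick sketch, assuming SciPy is available, comparing the two-sided 95% critical value $t^*$ across degrees of freedom with the normal's $z^*$:

```python
from scipy import stats

# 97.5th percentile = two-sided 95% critical value, as df grows
z_star = stats.norm.ppf(0.975)  # about 1.960
for df in (2, 10, 30, 100, 1000):
    print(f"df={df:5d}: t* = {stats.t.ppf(0.975, df):.3f}")
print(f"normal: z* = {z_star:.3f}")
```

The critical value shrinks monotonically toward 1.960 as the heavier tails thin out.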

4. A researcher tests $H_0: \mu = 50$ vs. $H_a: \mu > 50$ using a sample of 20 observations. She calculates $t = 1.85$ with $df = 19$. The p-value is:

(a) The area to the left of 1.85 under the t-distribution with 19 df (b) The area to the right of 1.85 under the t-distribution with 19 df (c) Twice the area to the right of 1.85 under the t-distribution with 19 df (d) The area between -1.85 and 1.85 under the t-distribution with 19 df

Answer **(b) The area to the right of 1.85 under the t-distribution with 19 df.** Since the alternative hypothesis is $H_a: \mu > 50$ (right-tailed), the p-value is the probability of getting a t-statistic as large as or larger than the observed value, assuming $H_0$ is true. That's the right-tail area: $P(T_{19} \geq 1.85) \approx 0.040$.
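
A one-line check of this computation, assuming SciPy is available (the survival function `sf` gives the right-tail area directly):

```python
from scipy import stats

# Right-tailed p-value: area to the right of 1.85 under t with 19 df
t_obs, df = 1.85, 19
p_value = stats.t.sf(t_obs, df)  # survival function = right-tail area
print(round(p_value, 3))  # about 0.040, matching the answer above
```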

5. Which of the following is NOT a condition for the one-sample t-test?

(a) The data come from a random sample or random assignment (b) Observations are independent of each other (c) The population standard deviation is known (d) The sampling distribution of $\bar{x}$ is approximately normal

Answer **(c) The population standard deviation is known.** In fact, the opposite is true: we use the t-test precisely *because* $\sigma$ is unknown. The three conditions are randomness, independence, and normality of the sampling distribution (which can be satisfied by approximate normality of the population for small samples or by the CLT for larger samples).

6. A sample of $n = 8$ observations has a histogram that shows strong right skewness. A t-test would be:

(a) Perfectly appropriate — the t-test works for any distribution (b) Inappropriate — the sample is too small for the CLT to compensate for the skewness (c) Appropriate only if $\alpha = 0.01$ instead of 0.05 (d) Appropriate as long as there are no outliers

Answer **(b) Inappropriate — the sample is too small for the CLT to compensate for the skewness.** With $n < 15$, the t-test requires the population to be approximately normal. Strong skewness with $n = 8$ means the sampling distribution of $\bar{x}$ may not be approximately normal, so the p-value from the t-test could be misleading. A nonparametric test (like the Wilcoxon signed-rank test from Chapter 21) or a transformation would be more appropriate.
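
As a sketch of the nonparametric alternative, here is a Wilcoxon signed-rank test on a hypothetical right-skewed sample of $n = 8$ (the data values and hypothesized center are illustrative, not from the quiz), assuming SciPy is available:

```python
import numpy as np
from scipy import stats

# Hypothetical strongly right-skewed sample of n = 8 (illustrative values)
x = np.array([2.1, 2.4, 2.7, 3.1, 3.4, 4.1, 6.8, 12.5])
mu0 = 3.0  # hypothesized center

# Wilcoxon signed-rank test on the deviations from the hypothesized center
stat, p = stats.wilcoxon(x - mu0, alternative="two-sided")
print(stat, p)
```

The test ranks the absolute deviations rather than using their raw values, so it doesn't rely on normality of the population.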

7. A researcher has $n = 100$ observations from a moderately skewed population. The t-test is:

(a) Invalid because the population is not normal (b) Valid because $n \geq 30$ and the CLT ensures approximate normality of $\bar{x}$ (c) Valid only if a Shapiro-Wilk test fails to reject normality (d) Invalid because moderate skewness always invalidates the t-test

Answer **(b) Valid because $n \geq 30$ and the CLT ensures approximate normality of $\bar{x}$.** The t-test is robust to moderate departures from normality when the sample size is large. With $n = 100$, the CLT guarantees that $\bar{x}$ has an approximately normal sampling distribution regardless of the population shape (as long as the population has a finite variance). The Shapiro-Wilk test (option c) tests whether the *data* are normal, which is a stricter requirement than what the t-test actually needs.
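
A small simulation makes the CLT claim concrete. Using an exponential population (skewness 2.0) as a stand-in for "moderately skewed," and assuming NumPy and SciPy are available:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Sampling distribution of x-bar from a skewed (exponential) population, n = 100
n, reps = 100, 10_000
samples = rng.exponential(scale=2.0, size=(reps, n))
xbars = samples.mean(axis=1)

# Population: mean 2.0, sd 2.0, skewness 2.0. The CLT predicts the sample means
# are centered at 2.0 with SE = 2/sqrt(100) = 0.2 and nearly symmetric.
print(xbars.mean(), xbars.std())  # close to 2.0 and 0.2
print(stats.skew(xbars))          # small -- far below the population's 2.0
```

Even though every individual observation comes from a strongly skewed distribution, the distribution of $\bar{x}$ is nearly normal at $n = 100$.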

8. The t-test statistic formula is $t = (\bar{x} - \mu_0) / (s / \sqrt{n})$. In this formula, $s / \sqrt{n}$ represents:

(a) The population standard deviation (b) The standard error of the sample mean (c) The margin of error (d) The degrees of freedom

Answer **(b) The standard error of the sample mean.** The standard error $SE = s / \sqrt{n}$ estimates how much the sample mean $\bar{x}$ typically varies from the population mean $\mu$ due to random sampling. The t-statistic measures how many standard errors the sample mean is from the hypothesized value $\mu_0$.

9. Maya tests $H_0: \mu = 240$ vs. $H_a: \mu > 240$, obtains a sample mean of $\bar{x} = 258$, and gets $p = 0.011$. At $\alpha = 0.05$:

(a) She fails to reject $H_0$ because the p-value is small (b) She rejects $H_0$ and concludes the population mean is exactly 258 (c) She rejects $H_0$ and concludes there is sufficient evidence that $\mu > 240$ (d) She accepts $H_a$ and proves $\mu > 240$

Answer **(c) She rejects $H_0$ and concludes there is sufficient evidence that $\mu > 240$.** Since $p = 0.011 < 0.05 = \alpha$, she rejects $H_0$. But she doesn't know the exact value of $\mu$ (option b), and we never "prove" a hypothesis or "accept" $H_a$ (option d). The correct conclusion is that the data provide sufficient evidence at the 0.05 level to support the claim that the true mean exceeds 240.

10. Alex's 95% confidence interval for average watch time is (42.76, 48.44) minutes. The industry benchmark is 45 minutes. Without computing a test statistic, we know that testing $H_0: \mu = 45$ at $\alpha = 0.05$ would:

(a) Reject $H_0$ (b) Fail to reject $H_0$ (c) Be inconclusive (d) Require additional information

Answer **(b) Fail to reject $H_0$.** By the CI-test duality, a two-sided hypothesis test at $\alpha = 0.05$ rejects $H_0$ if and only if $\mu_0$ falls *outside* the 95% CI. Since 45 is inside the interval (42.76, 48.44), we fail to reject. This is the connection between confidence intervals and hypothesis tests from Chapter 13: the CI contains all values that would not be rejected.
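
The duality can be demonstrated directly. A sketch with hypothetical watch-time data (the values are simulated, not Alex's actual sample), assuming NumPy and SciPy are available:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(loc=45.6, scale=7.0, size=25)  # hypothetical watch-time data

n, xbar, s = len(x), x.mean(), x.std(ddof=1)
t_star = stats.t.ppf(0.975, df=n - 1)
me = t_star * s / np.sqrt(n)
ci = (xbar - me, xbar + me)

# A mu0 inside the 95% CI is not rejected at alpha = 0.05; one outside is
for mu0 in ((ci[0] + ci[1]) / 2, ci[1] + 1.0):
    t_stat = (xbar - mu0) / (s / np.sqrt(n))
    p = 2 * stats.t.sf(abs(t_stat), df=n - 1)
    print(f"mu0={mu0:.2f}: inside CI={ci[0] <= mu0 <= ci[1]}, reject={p < 0.05}")
```

The first candidate (the CI's center) is never rejected; the second (just beyond the upper bound) always is.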

11. A t-statistic of $t = 0.437$ with $df = 24$ produces a two-tailed p-value of approximately 0.67. This means:

(a) There is a 67% chance that $H_0$ is true (b) The data are highly inconsistent with $H_0$ (c) If $H_0$ is true, there is a 67% probability of seeing data at least as extreme as what was observed (d) The effect is statistically significant at the 0.05 level

Answer **(c) If $H_0$ is true, there is a 67% probability of seeing data at least as extreme as what was observed.** A large p-value (0.67) means the data are entirely consistent with $H_0$. The observed sample mean is well within the range of normal sampling variation under $H_0$. Remember: the p-value is NOT the probability that $H_0$ is true (option a) — it's a conditional probability: $P(\text{data this extreme} \mid H_0)$.
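
The conditional-probability interpretation can be checked by simulation: generate many samples with $H_0$ true and count how often $|T|$ is at least the observed 0.437. A sketch assuming NumPy and SciPy are available:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# If H0 is true, how often is |T| at least the observed 0.437? (df = 24, so n = 25)
n, reps, t_obs = 25, 20_000, 0.437
samples = rng.normal(loc=0.0, scale=1.0, size=(reps, n))  # H0 true: mu = 0
t_stats = samples.mean(axis=1) / (samples.std(axis=1, ddof=1) / np.sqrt(n))

frac = (np.abs(t_stats) >= t_obs).mean()
exact = 2 * stats.t.sf(t_obs, df=24)
print(frac, exact)  # both close to 0.67
```

The simulated fraction and the exact t-distribution tail area agree: under $H_0$, data this extreme appear about two-thirds of the time.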

12. Which of the following best describes "robustness" of the t-test?

(a) The t-test always gives the correct answer regardless of assumptions (b) The t-test produces approximately correct results even when the normality assumption is not perfectly met (c) The t-test is robust only when $n > 1000$ (d) The t-test is robust to violations of the randomness condition

Answer **(b) The t-test produces approximately correct results even when the normality assumption is not perfectly met.** Robustness means the procedure performs well under assumption violations. The t-test is robust to non-normality (especially for larger $n$), but it is NOT robust to violations of the randomness condition (biased samples cannot be fixed by any test) or to extreme outliers. And it doesn't always give the "correct" answer — it gives approximately correct p-values and coverage probabilities.

13. Sam gets $p = 0.051$ when testing whether team scoring has changed after a coaching change. Which interpretation is most appropriate?

(a) The result is not significant, so the coaching change had no effect (b) The result just barely misses significance; the evidence is suggestive but not conclusive at $\alpha = 0.05$ (c) The result is essentially significant and should be treated as such (d) Sam should lower $\alpha$ to 0.10 to make the result significant

Answer **(b) The result just barely misses significance; the evidence is suggestive but not conclusive at $\alpha = 0.05$.** A p-value of 0.051 represents nearly the same evidence as 0.049 — the difference is trivial. The responsible interpretation acknowledges this borderline result: the data are suggestive of an effect but don't meet the conventional threshold. Changing $\alpha$ after seeing the data (option d) is p-hacking. Declaring no effect at all (option a) is too strong — absence of evidence at a specific threshold isn't evidence of absence.

14. If you increase the sample size from $n = 25$ to $n = 100$ while everything else stays the same ($\bar{x}$, $s$, $\mu_0$), the t-statistic will:

(a) Stay the same (b) Decrease (c) Increase (approximately double) (d) Increase by a factor of exactly 4

Answer **(c) Increase (approximately double).** The t-statistic is $t = (\bar{x} - \mu_0) / (s / \sqrt{n})$. The numerator doesn't change, but the denominator decreases by a factor of $\sqrt{100}/\sqrt{25} = 10/5 = 2$. So the t-statistic doubles. This illustrates why larger samples produce more significant results: the standard error shrinks, making the same difference more detectable. (The increase is approximate because the t-distribution also changes from $df = 24$ to $df = 99$.)
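
A quick numeric check with hypothetical summary statistics (the values 52, 50, and 10 are illustrative):

```python
import numpy as np

xbar, mu0, s = 52.0, 50.0, 10.0  # hypothetical summary statistics

t25 = (xbar - mu0) / (s / np.sqrt(25))
t100 = (xbar - mu0) / (s / np.sqrt(100))
print(t25, t100)  # 1.0 and 2.0: quadrupling n doubles t
```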

15. A 95% confidence interval for a mean is (18.2, 22.8) and a 99% confidence interval is (17.4, 23.6). Which statement is correct?

(a) The 99% CI is narrower because it is more confident (b) The 99% CI is wider because it needs to be more likely to capture $\mu$ (c) The two intervals must have the same width (d) The 95% CI is more likely to contain $\mu$ than the 99% CI

Answer **(b) The 99% CI is wider because it needs to be more likely to capture $\mu$.** To increase the probability that the interval captures the true parameter (from 95% to 99%), you need a wider net. The 99% CI uses a larger critical value ($t^*$), which increases the margin of error. This is the tradeoff triangle from Chapter 12: higher confidence requires wider intervals (for fixed $n$ and $s$).
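
The width difference comes entirely from the critical value $t^*$. A sketch with hypothetical summary statistics (not the numbers behind the intervals above), assuming SciPy is available:

```python
import numpy as np
from scipy import stats

xbar, s, n = 20.5, 5.2, 22  # hypothetical summary statistics
se = s / np.sqrt(n)

widths = {}
for conf in (0.95, 0.99):
    t_star = stats.t.ppf(1 - (1 - conf) / 2, df=n - 1)
    me = t_star * se
    widths[conf] = 2 * me
    print(f"{conf:.0%} CI: ({xbar - me:.2f}, {xbar + me:.2f}), width {2 * me:.2f}")
```

Same data, same $n$, same $s$; only the confidence level (and hence $t^*$) changes, and the 99% interval comes out wider.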

16. In the t-test formula $t = (\bar{x} - \mu_0)/(s/\sqrt{n})$, which change would produce the LARGEST t-statistic?

(a) Doubling $n$ (from 25 to 50) (b) Doubling $(\bar{x} - \mu_0)$ (from 5 to 10) (c) Halving $s$ (from 20 to 10) (d) All three changes would produce the same increase in $t$

Answer **(b) Doubling $(\bar{x} - \mu_0)$** and **(c) Halving $s$** both double $t$ — they have the same effect. **Doubling $n$** only increases $t$ by a factor of $\sqrt{2} \approx 1.41$. So options (b) and (c) are tied, and both produce a larger increase than option (a). The t-statistic is proportional to $(\bar{x} - \mu_0)$, inversely proportional to $s$, and proportional to $\sqrt{n}$. Changes in the numerator have a linear effect; changes in $n$ have a square-root effect.
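
The three factor effects can be verified directly from the question's numbers:

```python
import numpy as np

def t_stat(diff, s, n):
    return diff / (s / np.sqrt(n))

base = t_stat(5, 20, 25)              # the question's baseline
r_n    = t_stat(5, 20, 50) / base     # (a) doubling n
r_diff = t_stat(10, 20, 25) / base    # (b) doubling (xbar - mu0)
r_s    = t_stat(5, 10, 25) / base     # (c) halving s
print(r_n, r_diff, r_s)  # about 1.41, 2.0, 2.0
```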

17. A factory tests $H_0: \mu = 500$ grams for its products. Which scenario is a Type II error?

(a) The test rejects $H_0$ when the true mean is actually 500 grams (b) The test fails to reject $H_0$ when the true mean is actually 485 grams (c) The test rejects $H_0$ when the true mean is actually 485 grams (d) The test fails to reject $H_0$ when the true mean is actually 500 grams

Answer **(b) The test fails to reject $H_0$ when the true mean is actually 485 grams.** A Type II error is a "missed detection" — failing to reject $H_0$ when it's actually false. In this case, the products are genuinely underweight (485 < 500), but the test doesn't detect it. Option (a) is a Type I error (false alarm). Options (c) and (d) are correct decisions.

18. The t-distribution acknowledges our uncertainty about $\sigma$ by:

(a) Having a larger mean than the normal distribution (b) Being skewed to the right (c) Having heavier tails, which produce wider CIs and larger p-values (d) Requiring a larger sample size

Answer **(c) Having heavier tails, which produce wider CIs and larger p-values.** The t-distribution is symmetric (not skewed) and centered at 0 (same mean as the standard normal). Its distinctive feature is heavier tails — extreme values are more probable under the t-distribution than the normal. This translates to wider confidence intervals and larger p-values, which is exactly the right response to the extra uncertainty from estimating $\sigma$ with $s$.
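
The heavier tails are easy to quantify: compare the probability of landing beyond $\pm 2$ standard units under each distribution (a quick SciPy sketch):

```python
from scipy import stats

# Two-tailed probability of landing beyond +/-2 standard units
normal_tail = 2 * stats.norm.sf(2)     # about 0.046
t24_tail = 2 * stats.t.sf(2, df=24)
t5_tail = 2 * stats.t.sf(2, df=5)
print(normal_tail, t24_tail, t5_tail)  # tails get heavier as df shrinks
```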

19. For a paired data scenario (e.g., before/after measurements), the correct approach is:

(a) Conduct two separate one-sample t-tests, one on the "before" data and one on the "after" data (b) Compute the differences within each pair and conduct a one-sample t-test on the differences (c) Use the larger of the two groups as the single sample for a t-test (d) The t-test cannot be used for paired data

Answer **(b) Compute the differences within each pair and conduct a one-sample t-test on the differences.** Paired data violate the independence assumption if analyzed as two separate samples. The key insight is that each pair provides one "difference" observation, and these differences are independent of each other. Testing $H_0: \mu_d = 0$ on the differences is a valid one-sample t-test that accounts for the pairing. This approach is typically more powerful because it eliminates person-to-person variability.
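
The equivalence between "one-sample t-test on the differences" and SciPy's built-in paired test can be shown directly. The before/after values below are hypothetical:

```python
import numpy as np
from scipy import stats

# Hypothetical before/after measurements on the same 10 subjects
before = np.array([72, 68, 75, 80, 66, 71, 78, 69, 74, 77])
after = np.array([70, 65, 74, 76, 64, 70, 75, 68, 71, 74])

# One-sample t-test on the pairwise differences...
t1, p1 = stats.ttest_1samp(after - before, popmean=0)
# ...is identical to the built-in paired t-test
t2, p2 = stats.ttest_rel(after, before)
print(t1, p1)
print(t2, p2)
```

The two calls return the same statistic and p-value: the paired test *is* a one-sample t-test on the differences.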

20. You are deciding between a z-test and a t-test for your analysis. Which statement is the best general advice?

(a) Use the z-test when $n > 30$ and the t-test when $n \leq 30$ (b) Always use the z-test because it's simpler (c) Always default to the t-test; it's valid whether or not $\sigma$ is known (d) Use the z-test for means and the t-test for proportions

Answer **(c) Always default to the t-test; it's valid whether or not $\sigma$ is known.** The t-test is the safe default. When $\sigma$ is known, the t-test gives results virtually identical to the z-test (especially for moderate or large $n$). When $\sigma$ is unknown (the usual case), the t-test correctly accounts for the extra uncertainty while the z-test does not. Option (a) is a common misconception — the choice isn't about sample size but about whether $\sigma$ is known. Option (d) has it backwards — proportions use the z-test.