Quiz: Power, Effect Sizes, and What "Significant" Really Means
Instructions: Choose the best answer for each question. Some questions have multiple parts or require brief explanations. Answers to selected questions appear in the answer key at the end.
1. Which of the following best describes the relationship between statistical significance and practical significance?
(a) They are the same thing — a statistically significant result is always practically significant.
(b) Statistical significance tells you whether an effect exists; practical significance tells you whether the effect is large enough to matter.
(c) Practical significance is more important than statistical significance, so p-values are useless.
(d) A result must be both statistically significant and practically significant to be published.
2. A study with 1,000,000 participants finds that people who eat dark chocolate score 0.02 points higher on a 100-point happiness scale compared to a control group ($p < 0.001$, $d = 0.003$). Which statement best describes this finding?
(a) The effect is both statistically and practically significant.
(b) The effect is statistically significant but not practically significant.
(c) The effect is practically significant but not statistically significant.
(d) The effect is neither statistically nor practically significant.
3. Cohen's d is defined as:
(a) The difference between two group means divided by the sample size
(b) The p-value divided by the sample size
(c) The difference between two group means divided by the pooled standard deviation
(d) The t-statistic divided by the degrees of freedom
4. A Cohen's d of 0.50 means:
(a) The two group means differ by 50 points
(b) The two group means differ by half a standard deviation
(c) 50% of the variance is explained by group membership
(d) The probability of rejecting $H_0$ is 50%
5. Which of the following is NOT one of the four factors that determine statistical power?
(a) Sample size ($n$)
(b) Effect size
(c) Significance level ($\alpha$)
(d) The number of researchers on the study team
(e) Population variability ($\sigma$)
6. Statistical power is:
(a) $P(\text{reject } H_0 \mid H_0 \text{ is true})$
(b) $P(\text{reject } H_0 \mid H_0 \text{ is false})$
(c) $P(\text{fail to reject } H_0 \mid H_0 \text{ is false})$
(d) $P(\text{fail to reject } H_0 \mid H_0 \text{ is true})$
7. A researcher designs a study with 80% power. This means:
(a) There is an 80% chance the null hypothesis is false.
(b) If the effect is real, the study has an 80% chance of detecting it.
(c) 80% of the sample will show the effect.
(d) The p-value will be less than 0.80.
8. Which change will DECREASE statistical power?
(a) Increasing the sample size
(b) Increasing the true effect size
(c) Decreasing the significance level from 0.05 to 0.01
(d) Using a paired design instead of an independent-groups design
9. Two groups have $\bar{x}_1 = 50$, $s_1 = 10$, $n_1 = 100$ and $\bar{x}_2 = 52$, $s_2 = 10$, $n_2 = 100$. What is Cohen's d?
(a) 0.02
(b) 0.10
(c) 0.20
(d) 2.00
10. A t-test produces $t = 2.0$ with $df = 96$. What is $r^2$?
(a) 0.02
(b) 0.04
(c) 0.20
(d) 0.96
11. A "large" effect ($d = 0.8$) explains approximately what proportion of the variance?
(a) 80%
(b) 64%
(c) 14%
(d) 8%
12. An underpowered study is problematic because (select ALL that apply):
(a) It has a high probability of missing real effects (Type II error).
(b) When it does find significance, the effect size is likely to be overestimated.
(c) It inflates the Type I error rate above $\alpha$.
(d) It contributes to the replication crisis when combined with publication bias.
13. The "file drawer problem" refers to:
(a) The tendency for researchers to lose their data files
(b) The tendency for non-significant results to remain unpublished
(c) The tendency for researchers to file their pre-registration documents late
(d) The tendency for p-values to be filed in the wrong category
14. A researcher conducts 20 independent t-tests, each at $\alpha = 0.05$, when $H_0$ is true for all 20. The probability of finding at least one "significant" result is approximately:
(a) 0.05
(b) 0.20
(c) 0.64
(d) 1.00
15. Which of the following is the best way to report the results of a hypothesis test?
(a) "The result was significant ($p = 0.03$)."
(b) "The result was significant ($p = 0.03$, $d = 0.45$)."
(c) "The treatment group scored 5.3 points higher (95% CI: 1.2 to 9.4, $d = 0.45$, $p = 0.03$)."
(d) "We reject the null hypothesis."
16. The ASA's 2016 statement on p-values includes which of the following principles?
(a) P-values should be replaced by Bayes factors.
(b) A p-value does not measure the size of an effect or the importance of a result.
(c) Any study with $p > 0.05$ should not be published.
(d) P-values are the most reliable measure of evidence.
17. A study finds $p = 0.049$. A replication study with 3x the sample size finds $p = 0.051$. Which interpretation is most appropriate?
(a) The first study found an effect; the second did not. The results are contradictory.
(b) Both studies found similar levels of evidence. The difference between $p = 0.049$ and $p = 0.051$ is trivial.
(c) The first study must have been p-hacked, since the replication failed.
(d) The replication study was underpowered.
18. You want to detect a small effect ($d = 0.2$) with 80% power at $\alpha = 0.05$ using a two-sample t-test. Approximately how many participants do you need per group?
(a) 26
(b) 64
(c) 200
(d) 394
19. P-hacking inflates the false positive rate because:
(a) It increases $\alpha$ above 0.05.
(b) It allows researchers to test many hypotheses and report only the significant ones.
(c) It reduces the sample size.
(d) It eliminates the need for randomization.
20. A confidence interval is preferred over a p-value alone because:
(a) Confidence intervals are always narrower than p-values.
(b) A confidence interval simultaneously conveys the direction, magnitude, and precision of the effect.
(c) Confidence intervals are easier to compute.
(d) Confidence intervals never require assumptions about the population distribution.
Answer Key (Selected)
1. (b) — Statistical significance tells you whether the observed data would be unlikely if $H_0$ were true; practical significance tells you whether the effect is large enough to matter in the real world. They address different questions and can give different answers.
3. (c) — Cohen's d = $(\bar{x}_1 - \bar{x}_2) / s_p$, where $s_p$ is the pooled standard deviation.
6. (b) — Power = $P(\text{reject } H_0 \mid H_0 \text{ is false}) = 1 - \beta$. It is the probability of correctly detecting a real effect.
9. (c) — $d = (52 - 50) / 10 = 0.20$ (small effect). Note that sample size does not appear in the calculation.
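The arithmetic can be checked with a short Python sketch (the helper name `cohens_d` is just illustrative):

```python
# Cohen's d from summary statistics.
def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    # Pooled SD weights each group's variance by its degrees of freedom.
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    return (mean2 - mean1) / pooled_var**0.5

print(round(cohens_d(50, 10, 100, 52, 10, 100), 2))  # 0.2
```

Note that `n1` and `n2` enter only through the pooled SD, and with equal group SDs they cancel out entirely, which is why d is insensitive to sample size.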
10. (b) — $r^2 = t^2 / (t^2 + df) = 4 / (4 + 96) = 4/100 = 0.04$. The group variable explains 4% of the variance.
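The same conversion in Python:

```python
# Variance explained (r^2) from a t-statistic and its degrees of freedom.
t, df = 2.0, 96
r_squared = t**2 / (t**2 + df)
print(r_squared)  # 0.04
```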
11. (c) — $r^2 = d^2 / (d^2 + 4) = 0.64 / (0.64 + 4) = 0.64 / 4.64 \approx 0.14$. Even a "large" effect explains only about 14% of the variance.
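As a quick check:

```python
# Variance explained (r^2) from Cohen's d, using r^2 = d^2 / (d^2 + 4).
d = 0.8
r_squared = d**2 / (d**2 + 4)
print(round(r_squared, 2))  # 0.14
```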
14. (c) — $P(\text{at least one}) = 1 - (1 - 0.05)^{20} = 1 - 0.95^{20} \approx 0.64$.
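The family-wise error calculation in Python:

```python
# Probability of at least one false positive across m independent tests.
alpha, m = 0.05, 20
p_any = 1 - (1 - alpha)**m
print(round(p_any, 2))  # 0.64
```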
17. (b) — The difference between 0.049 and 0.051 is trivially small. Both represent similar (weak) evidence against $H_0$. Treating one as "significant" and the other as "not significant" gives a false sense of a clear distinction.
18. (d) — Approximately 394 per group (788 total). Detecting small effects requires large samples.
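This figure can be approximated with the standard normal-approximation formula $n \approx 2(z_{1-\alpha/2} + z_{\text{power}})^2 / d^2$, sketched here with only the Python standard library (dedicated power-analysis tools give the exact t-based answer):

```python
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    # Normal-approximation sample size per group for a two-sample t-test.
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # critical value, two-sided
    z_b = NormalDist().inv_cdf(power)          # quantile for desired power
    return 2 * (z_a + z_b)**2 / d**2

print(round(n_per_group(0.2)))  # 392
```

The approximation gives about 392 per group; the exact calculation based on the t distribution is slightly larger (≈394), since the t distribution has heavier tails than the normal.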
20. (b) — A CI gives you the estimated effect size (the center), the direction (positive or negative), and the precision (width). A p-value alone tells you none of these directly.