Quiz: Why Statistics Matters (and Why You Might Actually Enjoy This)

Test your understanding before moving on. Target: 70% or higher to proceed confidently.

Section 1: Multiple Choice (1 point each)

1. Which of the following best describes statistics?

A) The study of numbers and calculations
B) The science of collecting, organizing, analyzing, and interpreting data to make decisions under uncertainty
C) A branch of mathematics focused on probability theory
D) The process of creating graphs and charts from data

Answer

**B)** The science of collecting, organizing, analyzing, and interpreting data to make decisions under uncertainty. *Why B:* This captures the full scope — not just calculation, but the entire process of turning data into decisions. *Why not A:* Statistics involves numbers, but reducing it to "the study of numbers" misses the decision-making purpose. *Why not C:* Probability is part of statistics, but statistics is much broader. *Why not D:* Graphs are one tool within statistics, not the whole discipline. *Reference:* Section 1.1

2. A hospital calculates the average wait time in its emergency room last month: 47 minutes. This is an example of:

A) Inferential statistics
B) Descriptive statistics
C) Probability
D) Experimental design

Answer

**B)** Descriptive statistics. *Why B:* The hospital is summarizing actual data it collected — the wait times of patients who actually visited last month. No generalization beyond the data. *Why not A:* There's no claim about a larger population or future months. *Why not C:* No probability calculations are involved in computing an average. *Why not D:* This is analysis of existing data, not design of a study. *Reference:* Section 1.1

3. A polling company surveys 2,000 adults and reports that "58% of Americans support universal background checks for gun purchases." This is:

A) Descriptive statistics (it describes the survey results)
B) Inferential statistics (it generalizes from a sample to a population)
C) Neither — this is just data collection
D) Both descriptive and inferential

Answer

**B)** Inferential statistics. *Why B:* The claim is about "Americans" (the population), not just "the 2,000 people we surveyed" (the sample). The poll uses sample data to infer something about the larger population. *Why not A:* If the report said "58% of our respondents support..." it would be descriptive. But it says "Americans." *Why not C:* Data has been analyzed and a conclusion drawn. *Why not D:* While the 58% is descriptive of the sample, the claim about "Americans" makes the primary use inferential. *Reference:* Section 1.1
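To make the sample-to-population step concrete, here is a minimal sketch of the usual margin-of-error calculation for a sample proportion. It assumes a simple random sample and a 95% confidence level (z = 1.96); the question states neither, and the function name `margin_of_error` is hypothetical.

```python
import math

def margin_of_error(p_hat: float, n: int, z: float = 1.96) -> float:
    """Approximate margin of error for a sample proportion (normal approximation)."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

p_hat = 0.58   # share of the 2,000 respondents who support the policy
n = 2000       # sample size from the question

moe = margin_of_error(p_hat, n)   # about 0.022, i.e. roughly 2.2 percentage points
print(f"Sample estimate: {p_hat:.0%}, margin of error: +/- {moe:.1%}")
print(f"Plausible range for the population: {p_hat - moe:.1%} to {p_hat + moe:.1%}")
```

The 58% is a fact about the sample; the range around it is the inferential part, a statement about all Americans that carries uncertainty.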

4. Which of the following is NOT one of the four pillars of a statistical investigation?

A) Ask a good question
B) Collect or find data
C) Memorize formulas
D) Interpret and communicate results

Answer

**C)** Memorize formulas. *Why C:* The four pillars are: (1) Ask a question, (2) Collect data, (3) Analyze data, (4) Interpret and communicate. Formula memorization is not a pillar — understanding and applying formulas happens within Pillar 3. *Reference:* Section 1.3

5. A Netflix recommendation saying "Because you watched The Office, you might like Parks and Recreation" is an example of:

A) Descriptive statistics — it describes your viewing history
B) Inferential statistics — it predicts what you'll enjoy based on patterns in data
C) Neither — this is just a computer program, not statistics
D) Probability — it calculates the odds you'll like the show

Answer

**B)** Inferential statistics. *Why B:* The algorithm uses patterns in data (your history + millions of other users' histories) to predict something it doesn't know for certain (whether you'll enjoy a show). This is inference — using known data to make predictions about unknowns. *Why not A:* It goes beyond describing your history to making a prediction. *Why not C:* The computer program IS doing statistics — regression and probability models power recommendations. *Why not D:* While probability is involved, the broader activity is inference. *Reference:* Section 1.4

6. Sam notices that Daria's three-point percentage improved from 31% to 38%. Before concluding she truly improved, Sam should consider:

A) Whether the sample size (number of attempts) is large enough for the difference to be meaningful
B) Whether Daria changed her shooting technique
C) Only option A — sample size is all that matters statistically
D) Neither — a 7-percentage-point improvement is clearly significant

Answer

**A)** Whether the sample size is large enough for the difference to be meaningful. *Why A:* With only 65 attempts, random variation alone could produce this improvement. Sample size directly affects how reliable a percentage is. *Why not B:* This is important context, but it concerns the mechanism behind a change, not whether the change is distinguishable from chance. *Why not C:* Sample size is the key statistical consideration here, but it is not "all that matters"; the absolute claim makes C wrong. *Why not D:* A 7-point improvement on 65 attempts is NOT clearly significant; this is exactly the kind of claim that requires statistical testing. *Reference:* Section 1.5
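One way to see why 65 attempts is thin evidence is to ask how often a shooter whose true rate never changed would still post 38% or better. The sketch below is an illustration under stated assumptions, not the textbook's method: it treats the 65 attempts as the later stretch, assumes each attempt is independent, and holds the true rate at the earlier 31%. The helper `prob_at_least` is hypothetical.

```python
import math

def prob_at_least(k: int, n: int, p: float) -> float:
    """Exact binomial probability of k or more successes in n independent trials."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

attempts = 65                               # attempt count mentioned in the answer
baseline = 0.31                             # earlier three-point percentage
makes_needed = math.ceil(0.38 * attempts)   # 25 makes is the smallest count at or above 38%

# Chance of shooting 38% or better purely by luck, with no real improvement
p_chance = prob_at_least(makes_needed, attempts, baseline)
print(f"P(at least {makes_needed} of {attempts} by chance) = {p_chance:.3f}")
```

Under these assumptions the probability comes out well above the conventional 0.05 cutoff, so luck alone remains a plausible explanation for the jump.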

7. The textbook describes statistical thinking as a "threshold concept." This means:

A) It's too advanced for introductory students
B) Once you understand it, it permanently changes how you see the world
C) You need to pass a test before you can proceed
D) It's the easiest concept in the course

Answer

**B)** Once you understand it, it permanently changes how you see the world. *Why B:* Threshold concepts are transformative (they change your perspective), irreversible (you can't un-see it), and often troublesome (they take time to click). *Why not A:* Threshold concepts are challenging but appropriate for any level. *Why not C:* "Threshold" here refers to a conceptual transformation, not a test gate. *Why not D:* Threshold concepts are often the most challenging, not the easiest. *Reference:* Section 1.4

8. Which study strategy does the textbook recommend as MOST effective for learning statistics?

A) Re-reading the textbook chapters multiple times
B) Highlighting key terms and formulas
C) Retrieval practice — closing the book and trying to recall from memory
D) Watching YouTube videos about each topic

Answer

**C)** Retrieval practice — closing the book and trying to recall from memory. *Why C:* Research by Roediger and McDaniel consistently shows that effortful retrieval produces stronger, more durable learning than passive review. *Why not A:* Re-reading creates an "illusion of fluency" — you feel like you know it because it looks familiar, but you can't actually use it. *Why not B:* Highlighting is one of the least effective study strategies according to learning science research. *Why not D:* Videos can supplement learning but don't replace active practice. *Reference:* Section 1.8

Section 2: True/False with Justification (1 point each)

9. "If you have enough data, you can eliminate uncertainty entirely."

Answer

**False.** *Explanation:* More data reduces uncertainty but never eliminates it. Uncertainty is inherent in the real world — measurement has error, samples aren't perfectly representative, and future outcomes are never guaranteed. A core message of this course is that quantifying uncertainty is a feature, not a bug.
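A small sketch of the "reduces but never eliminates" point, using the standard error of a sample proportion as one simple measure of uncertainty. The choice of p = 0.5 and the sample sizes are illustrative assumptions, and the function name `standard_error` is hypothetical.

```python
import math

def standard_error(p: float, n: int) -> float:
    """Standard error of a sample proportion under simple random sampling."""
    return math.sqrt(p * (1 - p) / n)

# Each 100-fold increase in data shrinks the standard error only 10-fold,
# and it stays above zero for every finite sample size.
for n in (100, 10_000, 1_000_000):
    print(f"n = {n:>9,}: standard error = {standard_error(0.5, n):.4f}")
```

This covers sampling variability only. Measurement error and unrepresentative samples, both named in the explanation above, do not shrink just because n grows.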

10. "Descriptive statistics can be misleading even though it only summarizes existing data."

Answer

**True.** *Explanation:* Choosing which statistics to report (mean vs. median), which data to include or exclude, and how to visualize results can all create misleading impressions while technically only "describing" data. An average income can be dragged up by a few billionaires, hiding the typical experience. Descriptive statistics is honest but not automatically objective.
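The income example can be checked directly. The figures below are made up for illustration (five ordinary incomes plus one billionaire); only Python's standard `statistics` module is used.

```python
from statistics import mean, median

# Hypothetical annual incomes: five typical earners and one billionaire
incomes = [32_000, 41_000, 45_000, 52_000, 58_000, 2_000_000_000]

print(f"Mean income:   ${mean(incomes):,.0f}")    # pulled far upward by the single outlier
print(f"Median income: ${median(incomes):,.0f}")  # 48,500, close to the typical earner
```

Both numbers are accurate descriptions of the same data. Reporting only the mean would suggest a typical income in the hundreds of millions, which describes nobody in the list.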

11. "AI systems are fundamentally different from statistics — they don't use statistical methods."

Answer

**False.** *Explanation:* Most AI systems — including recommendation engines, spam filters, and predictive models — are built on statistical methods like regression, probability, and classification. AI applies these methods at greater scale and speed, but the underlying logic is statistical.

12. "Understanding statistics is only important for people who work with data professionally."

Answer

**False.** *Explanation:* Statistical claims appear in news, healthcare decisions, advertising, politics, and everyday life. Being a critical consumer of these claims requires statistical literacy regardless of profession. As the textbook argues, statistical thinking is a "survival skill" in an information-saturated world.

Section 3: Short Answer (2 points each)

13. Explain the difference between a population and a sample. Why is this distinction important in statistics?

Sample Answer

A **population** is the entire group you want to study or draw conclusions about. A **sample** is a subset of that population that you actually observe or measure. This distinction is crucial because we almost never have access to the entire population. We rely on samples to make inferences about populations. The quality of those inferences depends on how well the sample represents the population, which is why sampling methods (Chapter 4) matter so much.

*Rubric (full credit requires):*
- Clear definitions of both terms
- Explanation of why the distinction matters for inference

14. The textbook mentions a healthcare algorithm that was biased against Black patients. In 2-3 sentences, explain what went wrong statistically (not ethically — we'll cover that later).

Sample Answer

The algorithm used healthcare spending as a statistical proxy for health needs. Because Black patients historically had less access to healthcare and therefore lower spending, the model interpreted their lower spending as lower need, even when they were equally or more sick. The statistical error was choosing a variable (spending) that was confounded with race, making it an unreliable proxy for the outcome they actually wanted to predict (health needs).

*Rubric (full credit requires):*
- Identification of the proxy variable problem
- Explanation of why spending was a flawed measure
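The proxy problem can be reproduced with a toy simulation. Everything here is synthetic and hypothetical: the two groups, the 0.7 access factor, and the "flag the top 20% by spending" rule are assumptions chosen for illustration, not a description of the real algorithm or its data.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

def simulate_patients(group: str, access: float, n: int = 1000) -> list[dict]:
    """Generate patients whose spending equals true need scaled by access to care."""
    patients = []
    for _ in range(n):
        need = random.random()      # true health need, same distribution in both groups
        spending = need * access    # observed spending, lowered by limited access
        patients.append({"group": group, "need": need, "spending": spending})
    return patients

patients = simulate_patients("full_access", 1.0) + simulate_patients("reduced_access", 0.7)

# Flag the top 20% by spending, the proxy, rather than by true need
patients.sort(key=lambda p: p["spending"], reverse=True)
flagged = patients[: len(patients) // 5]

share_reduced = sum(p["group"] == "reduced_access" for p in flagged) / len(flagged)
print(f"Reduced-access share of all patients:     50%")
print(f"Reduced-access share of flagged patients: {share_reduced:.0%}")
```

Both groups have the same distribution of need by construction, yet the reduced-access group makes up far less than half of the flagged patients, because the ranking variable measures access as much as it measures need.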

15. Name two of the four anchor examples from this chapter and briefly describe the statistical question each one faces.

Sample Answer

**Dr. Maya Chen:** Is the higher rate of asthma-related ER visits in low-income zip codes caused by poverty, lack of preventive care access, or environmental factors? She needs to distinguish correlation from causation.

**Sam Okafor:** Has Daria Kowalczyk genuinely improved her three-point shooting (from 31% to 38%), or is the improvement just random variation due to a small sample size? He needs to determine whether the observed difference is statistically meaningful.

*Rubric (full credit requires):*
- Two correctly identified examples
- Accurate description of each person's statistical question

Section 4: Applied Scenario (3 points)

16. A local news station reports: "Crime in our city dropped 12% this year, proving that the mayor's new policing strategy is working."

Using concepts from this chapter, evaluate this claim. Your response should address:

a) Is this descriptive or inferential? (1 point)
b) What are at least two alternative explanations for the 12% drop? (1 point)
c) What additional information would you need to evaluate whether the mayor's strategy actually caused the decline? (1 point)

Sample Answer

**a)** This is inferential statistics: the news station is drawing a causal conclusion ("proving the strategy is working") that goes beyond simply reporting the 12% decline. The 12% drop itself is descriptive; the causal claim is inferential.

**b)** Alternative explanations include:
- Crime could have declined for other reasons (economic improvement, demographic shifts, changes in reporting practices)
- 12% might be within normal year-to-year fluctuation: some years crime goes up, some years it goes down, regardless of policy
- The definition of "crime" might have changed (reclassifying certain offenses, changes in reporting)
- A nationwide crime trend might explain the local decline

**c)** To evaluate causation, you'd want:
- Crime trends from comparable cities that didn't implement the policy (a control group)
- Crime data from several years before and after the policy (not just one year)
- Analysis of whether the specific types of crime targeted by the policy declined more than other types
- Understanding of what else changed in the city during this period

*Rubric:*

| Criterion | 0 pts | 1 pt | 2 pts | 3 pts |
|-----------|-------|------|-------|-------|
| Classification | No answer | Partially correct | Correctly identifies inferential but without nuance | Correctly distinguishes the descriptive fact from the inferential claim |
| Alternatives | No alternatives given | One plausible alternative | Two plausible alternatives | Two+ alternatives with clear reasoning |
| Additional info | No answer | Vague response | One specific information need | Two+ specific, relevant information needs |

Section 5: Reflection (1 point)

17. What is one thing from this chapter that surprised you or changed how you think about statistics?

Sample Answer

There's no single correct answer here. Credit is given for a thoughtful, specific reflection that demonstrates engagement with the material. Examples: surprise at how AI uses statistics, the realization that statistical thinking applies to everyday decisions, the healthcare algorithm bias example, etc.

18. On a scale of 1-10, how would you rate your current comfort level with statistics? What specific aspect are you most anxious or curious about?

Sample Answer

No correct answer — this is a metacognitive prompt to help you gauge your starting point. Be honest with yourself; you'll revisit this at the end of the course.

Scoring & Next Steps

| Score | Assessment | Recommended Action |
|-------|------------|--------------------|
| < 50% | Needs review | Re-read Sections 1.1-1.3, redo Part A exercises |
| 50% to under 70% | Partial | Review weak areas, focus on the descriptive vs. inferential distinction |
| 70-85% | Solid | Ready to proceed; revisit any missed topics |
| > 85% | Strong | Proceed; consider the Deep Dive case study (case-study-02.md) |
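If you want to look up your band programmatically, the table can be sketched as a small function. The function name `assess` is hypothetical, and the handling of the exact boundaries is an assumption: a score of exactly 70% is treated as "Solid", matching the 70% target stated at the top of the quiz, and exactly 85% is also treated as "Solid".

```python
def assess(score_pct: float) -> str:
    """Map a quiz percentage to the assessment band from the scoring table."""
    if not 0 <= score_pct <= 100:
        raise ValueError("score_pct must be between 0 and 100")
    if score_pct < 50:
        return "Needs review"
    if score_pct < 70:
        return "Partial"
    if score_pct <= 85:
        return "Solid"
    return "Strong"

print(assess(72.5))  # Solid
```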