Chapter 28 Quiz: Probabilistic Thinking and Uncertainty

Instructions: Answer all questions. Show calculations where required. For multiple choice, select the best answer. Answers are hidden in collapsible sections.


Question 1 A disease has a prevalence of 1% in a population. A test for this disease has sensitivity of 95% and specificity of 90%. You test positive. Using Bayes' Theorem, what is the approximate probability you actually have the disease?

A) 95% (the test sensitivity)
B) 90% (the test specificity)
C) About 8.7%
D) About 50%

Show Answer and Calculation **Answer: C — approximately 8.7%**

P(disease) = 0.01; P(no disease) = 0.99
P(positive | disease) = 0.95
P(positive | no disease) = 1 - 0.90 = 0.10
P(positive) = 0.01 × 0.95 + 0.99 × 0.10 = 0.0095 + 0.0990 = 0.1085
P(disease | positive) = (0.01 × 0.95) / 0.1085 = 0.0095 / 0.1085 ≈ **0.0875 = 8.75%**

The key insight: even a 95%/90% test produces mostly false positives when disease prevalence is only 1%. The large "denominator" of people who don't have the disease but test positive (10% of 99% = 9.9% of the population) overwhelms the true positives (0.95% of the population).
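The calculation above can be sketched in a few lines of Python (an illustrative helper, not from the chapter):

```python
def posterior_given_positive(prevalence, sensitivity, specificity):
    """P(disease | positive test) via Bayes' Theorem."""
    true_pos = prevalence * sensitivity                # P(disease and positive)
    false_pos = (1 - prevalence) * (1 - specificity)   # P(no disease and positive)
    return true_pos / (true_pos + false_pos)

p = posterior_given_positive(prevalence=0.01, sensitivity=0.95, specificity=0.90)
print(round(p, 4))  # → 0.0876
```

Changing `prevalence` to 0.10 shows how quickly the posterior climbs as the base rate rises, which is the heart of the question.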

Question 2 Bayes' Theorem states: P(A|B) = P(A) × P(B|A) / P(B). In the context of updating beliefs about a news story's accuracy:

  • P(A) = prior probability the story is true
  • P(B|A) = probability of seeing corroborating evidence IF the story is true
  • P(B) = total probability of seeing this corroborating evidence

What does the posterior probability P(A|B) represent?

Show Answer **Answer:** P(A|B) is the **updated (posterior) probability that the story is true after observing the corroborating evidence**. It tells you how likely the story is to be accurate, given both your prior belief and the specific evidence you have now seen. If the corroborating evidence is diagnostic — much more likely to appear if the story is true than if it's false — the posterior will be substantially higher than the prior. If the evidence is non-diagnostic (equally likely whether the story is true or false), the posterior equals the prior.

Question 3 The prosecutor's fallacy involves confusing:

A) The probability of guilt with the probability of innocence
B) P(evidence | innocent) with P(innocent | evidence)
C) Relative risk with absolute risk
D) Sensitivity with specificity

Show Answer **Answer: B** The prosecutor's fallacy confuses P(evidence|innocent) — how likely is this evidence to appear if the person is innocent — with P(innocent|evidence) — how likely is the person to be innocent given that this evidence exists. Just because evidence has a very low probability under innocence does not mean guilt is highly probable; you must also consider the base rate of guilt (the prior probability that the defendant is the perpetrator). Using Bayes' theorem, you need to account for both the likelihood ratio of the evidence AND the prior probability.

Question 4 Which of the following best explains the conjunction fallacy demonstrated in the Linda problem?

A) People have incorrect knowledge of set theory
B) The representativeness heuristic makes the specific conjunction feel more plausible because it matches the description better
C) People don't understand what "probability" means
D) The problem is misleading and both answers are actually acceptable

Show Answer **Answer: B** The conjunction fallacy arises because the representativeness heuristic causes people to judge probability by similarity: "bank teller AND feminist" matches the description of Linda better than "bank teller" alone. People effectively substitute "how well does this fit the description?" for "how probable is this?" These two questions have very different answers. The error is not ignorance of set theory — even when the mathematical impossibility is explained, many people stick with their intuitive judgment.

Question 5 A researcher is 90% confident in the findings of a study. What does good calibration require this to mean?

Show Answer **Answer:** A well-calibrated researcher who regularly expresses 90% confidence should be correct approximately 90% of the time across all the predictions/findings they rate at that level. This means that for every 10 claims they assert with 90% confidence, approximately one should turn out to be wrong. Research shows that most people claiming 90% confidence are wrong far more often — perhaps 25-30% of the time — revealing systematic overconfidence. Good calibration requires that stated confidence levels track actual accuracy rates across a large sample of predictions.

Question 6 According to Tetlock's research on the Good Judgment Project, which characteristic most strongly distinguishes superforecasters from average forecasters?

A) Higher domain expertise in the topics being forecast
B) Access to more information and better data sources
C) Probabilistic thinking, calibration focus, and active open-mindedness
D) More years of experience in forecasting

Show Answer **Answer: C** Tetlock's research found that superforecasters are distinguished primarily by their cognitive style rather than domain expertise or information access. Key characteristics: they think naturally in probabilities, they actively seek disconfirming evidence (active open-mindedness), they care deeply about calibration accuracy, they update readily when new evidence arrives, and they maintain granular probability estimates. Domain expertise was less important than these general epistemic habits — generalists with good probabilistic thinking regularly outperformed narrow experts.

Question 7 Calculate the expected value of the following decision:

A public health intervention costs $10 million to implement. It has a 60% probability of preventing a disease outbreak that would cost $20 million in medical expenses and economic losses, and a 40% probability of failing to prevent the outbreak (which still costs $20 million). Assume that without the intervention, the outbreak is certain to occur.

A) EV(implement) = -$18M; EV(not implement) = -$20M; implement
B) EV(implement) = -$10M; EV(not implement) = -$20M; implement
C) EV(implement) = -$18M; EV(not implement) = -$12M; do not implement
D) Both options have the same expected value

Show Answer **Answer: A**

EV(implement) = -$10M (certain intervention cost) + 0.60 × $0 (outbreak prevented) + 0.40 × (-$20M) (outbreak happens despite the intervention) = -$10M - $8M = **-$18M**

EV(not implement) = -$20M (the outbreak is certain without the intervention)

Implementing is better: -$18M versus -$20M. The intervention does not change the cost of an outbreak; it changes the probability that the outbreak occurs at all. This demonstrates that even costly interventions can have higher expected value than inaction.
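Under the stated assumption that the outbreak is certain without the intervention, the comparison can be sketched in Python (illustrative values in millions of dollars):

```python
INTERVENTION_COST = 10.0   # certain cost of implementing
OUTBREAK_COST = 20.0       # cost if the outbreak occurs
P_PREVENTED = 0.60         # probability the intervention prevents the outbreak

# Implement: pay the intervention cost, and still face the outbreak 40% of the time.
ev_implement = -INTERVENTION_COST + (1 - P_PREVENTED) * -OUTBREAK_COST

# Do nothing: the outbreak is assumed certain without the intervention.
ev_do_nothing = -OUTBREAK_COST

print(ev_implement)   # → -18.0
print(ev_do_nothing)  # → -20.0
```

Varying `P_PREVENTED` shows the break-even point: the intervention is worthwhile whenever the probability of prevention exceeds 0.5 (i.e., whenever the expected outbreak cost avoided exceeds the $10M price).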

Question 8 The IPCC uses the term "likely" to describe a probability range of:

A) > 50% (more probable than not)
B) > 66% (roughly two-thirds probability)
C) > 75% (three-quarters probability)
D) > 80% (four-fifths probability)

Show Answer **Answer: B** In the IPCC uncertainty framework, "likely" corresponds to a probability greater than 66% (often expressed as the range 66-100%, complemented by "unlikely" at < 33%). This is an important distinction: in everyday English, "likely" often connotes a simple majority (>50%). In IPCC usage, it is a stronger claim. When the IPCC says a warming threshold is "likely" to be exceeded, it means they assess this with more than 66% confidence — a significantly higher threshold than the everyday meaning might suggest.

Question 9 A news article reports: "People who drink red wine daily are 30% less likely to have a heart attack than non-drinkers." What additional information is MOST important for interpreting this claim accurately?

A) The study's sample size and statistical power
B) The base rate of heart attacks in the non-drinking population, and whether the reduction is relative or absolute risk
C) Whether the study was funded by the wine industry
D) The definition of "daily" drinking used in the study

Show Answer **Answer: B** While all four pieces of information are relevant, the base rate and absolute vs. relative risk distinction is most critical for accurate interpretation. A "30% reduction" in relative terms means very different things depending on the baseline: if the annual heart attack rate in non-drinkers is 1%, a 30% reduction means 0.7% in drinkers — a 0.3 percentage point reduction. If the annual rate is 0.1%, the reduction is only 0.03 percentage points — tiny in absolute terms. The distinction between relative risk reduction (30%) and absolute risk reduction (depends on base rate) is precisely the kind of base rate information that prevents misleading health claims.

Question 10 In Bayesian epistemology, what is the "prior" probability?

Show Answer **Answer:** The **prior probability** is the probability assigned to a hypothesis **before** new evidence is considered. It represents what you believed about the hypothesis based on background knowledge, general base rates, and previously accumulated evidence — before the specific new piece of evidence you are now evaluating. The prior is then updated using Bayes' Theorem when new evidence arrives to produce the **posterior** probability. Bayesian reasoning requires explicit specification of priors, which forces clarity about what background knowledge you are bringing to the evaluation of new evidence.

Question 11 "We can't be completely certain that smoking causes lung cancer because observational studies have confounders and randomized controlled trials on smoking would be unethical to conduct." This argument represents which misinformation strategy?

A) False symmetry
B) Demanding certainty to paralyze action
C) Uncertainty laundering
D) Base rate neglect

Show Answer **Answer: B** This is a "demanding certainty" strategy. The argument is technically correct that a perfectly clean causal demonstration is difficult, but it uses this epistemological limitation to imply that no action is warranted — even though the evidence for smoking causing lung cancer (from multiple independent types of studies including longitudinal cohorts, mechanistic studies, natural experiments, and dose-response evidence) is overwhelming by the standards of empirical science. The tobacco industry used exactly this argument for decades. "We can't be 100% certain" is weaponized to set an impossibly high bar for evidence, preventing action even when the probabilistic evidence is extremely strong.

Question 12 The Brier score for a forecast is computed as (probability - outcome)². What Brier score does a forecaster receive who assigns 80% probability to an event that does NOT occur?

Show Answer **Answer:** BS = (0.80 - 0)² = 0.64

The forecaster said 80% probability (0.80) and the outcome was 0 (event did not occur). This is a poor Brier score — 0.64 out of a maximum of 1.0. For comparison:

  • A 50% forecast for an event that doesn't occur: BS = (0.5 - 0)² = 0.25
  • A 10% forecast for an event that doesn't occur: BS = (0.1 - 0)² = 0.01

This illustrates why being confidently wrong is penalized much more than being appropriately uncertain. The Brier score creates incentives for calibration: you are rewarded for being confident when you're right and heavily penalized for being confident when you're wrong.
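The scores above can be reproduced with a one-line Python helper (a minimal sketch):

```python
def brier(probability, occurred):
    """Brier score: squared error between a forecast and a binary outcome."""
    outcome = 1.0 if occurred else 0.0
    return (probability - outcome) ** 2

# Confidently wrong vs. appropriately uncertain (rounded for display):
print(round(brier(0.80, False), 2))  # → 0.64
print(round(brier(0.50, False), 2))  # → 0.25
print(round(brier(0.10, False), 2))  # → 0.01
```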

Question 13 Why do Gigerenzer and colleagues recommend presenting probability information as natural frequencies (e.g., "10 out of 10,000") rather than conditional probabilities (e.g., "0.1%")?

Show Answer **Answer:** Gigerenzer's research shows that humans reason more accurately with natural frequencies than conditional probabilities because natural frequencies: (1) preserve information about the sizes of the groups being compared — the denominator is explicit and concrete; (2) match the way our evolutionary ancestors encountered information (in terms of observed counts rather than abstract percentages); (3) make base rates visible rather than hidden within calculations. When told "100 out of 10,000 women who have a mammogram will have breast cancer; 80 of those 100 will test positive, while 950 of the 9,900 healthy women will also test positive," the problem becomes transparent. The 80 true positives versus 950 false positives immediately reveals that most positive tests are false. The same information expressed as conditional probabilities is much harder to process correctly.
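The natural-frequency translation can be sketched in Python; the population size and rates below are the illustrative values from the answer, not universal mammography statistics:

```python
POPULATION = 10_000
PREVALENCE = 0.01              # 100 of 10,000 women have breast cancer
SENSITIVITY = 0.80             # 80 of those 100 test positive
FALSE_POS_RATE = 950 / 9_900   # ~9.6%: 950 of the 9,900 healthy women test positive

with_disease = round(POPULATION * PREVALENCE)                          # 100
true_positives = round(with_disease * SENSITIVITY)                     # 80
false_positives = round((POPULATION - with_disease) * FALSE_POS_RATE)  # 950

print(f"{true_positives} true positives vs {false_positives} false positives")
print(round(true_positives / (true_positives + false_positives), 3))  # → 0.078
```

The final number is P(cancer | positive test): under 8%, which the count form makes obvious at a glance.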

Question 14 A vaccine is 92% effective at preventing severe illness. A vaccinated person develops severe illness. A friend says: "See, the vaccine didn't work!" This reasoning commits which error?

A) The conjunction fallacy
B) Treating population-level statistics as individual-level guarantees
C) The prosecutor's fallacy
D) Manufactured uncertainty

Show Answer **Answer: B** "92% effective" is a population-level probability statement: among vaccinated people, 92% fewer will develop severe illness compared to unvaccinated. This means 8% of vaccinated people may still develop severe illness in the relevant comparison — the vaccine reduces risk dramatically but does not eliminate it. No vaccine is 100% effective for every individual. Expecting a population-level statistic to guarantee an individual outcome is a fundamental probabilistic error. The correct response is: "This is consistent with the vaccine being 92% effective. You are among the approximately 8% of vaccinated people for whom this vaccine did not prevent severe illness. Without the vaccine, your risk would have been substantially higher."

Question 15 What does "base rate neglect" mean, and provide an example outside of medicine?

Show Answer **Answer:** Base rate neglect is the failure to incorporate the prior probability (base rate) of an event when evaluating new evidence or making probability judgments. People focus on specific, vivid case-specific information while ignoring the underlying frequency of the event in the relevant population. **Example outside medicine:** Airport security screening. A passenger sets off the metal detector, and a security officer thinks "this person is very likely hiding a weapon." But the base rate of actual weapon carriers among air travelers is extremely low — estimated below 0.01% in most contexts. At that base rate, among 100,000 travelers there are at most ~10 genuine weapon carriers; if the detector also alarms for just 1% of travelers carrying no weapon (keys, belts, loose change), it produces ~1,000 false alarms. Any given alert is therefore far more likely to be a false alarm than a genuine threat. Security officers often neglect this base rate and treat each alert as high-probability confirmation of a threat, leading to inefficient resource allocation and sometimes civil liberties violations.

Question 16 Tetlock's concept of the "fox vs. hedgehog" distinction holds that foxes are better forecasters than hedgehogs. What distinguishes the fox-style thinker from the hedgehog-style thinker?

Show Answer **Answer:** The distinction originates from Isaiah Berlin's essay on Tolstoy, borrowing a Greek proverb: "The fox knows many things; the hedgehog knows one big thing." **Hedgehogs** organize their thinking around one central idea, principle, or theoretical framework and interpret new information through that lens. They give confident, sweeping predictions. They are engaging media commentators because of their certainty and consistency. **Foxes** draw on many different analytical frameworks, are comfortable holding multiple partially competing models simultaneously, acknowledge more uncertainty, and update more readily. They are less exciting as commentators but dramatically more accurate as forecasters. Tetlock found that hedgehog-style experts performed barely better than chance in long-range political forecasting, while fox-style thinkers performed substantially better. The hedgehog's commitment to one framework produces systematic blind spots; the fox's eclecticism catches errors that any single model misses.

Question 17 Two news outlets report on the same scientific study. Outlet A says: "Scientists prove coffee causes cancer." Outlet B says: "New study suggests coffee may slightly increase cancer risk in heavy smokers, but researchers note the effect was small and the study was observational." Which coverage better reflects calibrated uncertainty, and why?

Show Answer **Answer:** Outlet B reflects significantly better calibrated uncertainty. Its specific advantages: 1. **"May"** rather than "proves" — preserves appropriate uncertainty about causal direction from observational data. 2. **"Slightly"** — conveys effect size, preventing misinterpretation of small effects as large. 3. **"Heavy smokers"** — specifies the population in which the effect was found, preventing overgeneralization. 4. **"Observational"** — correctly signals the study design and its limitations for causal inference. 5. **"Researchers note"** — attributes the qualifications to the scientists themselves rather than just editorial hedging. Outlet A's headline inverts all of these: it converts a probabilistic association into definitive proof, conceals effect size and population specificity, and implies causation from correlation. This is characteristic of the systematic amplification and distortion that occurs when science journalism prioritizes engagement over accuracy.

Question 18 What is Pascal's Mugging, and what does it reveal about the limitations of pure expected value reasoning?

Show Answer **Answer:** Pascal's Mugging is a thought experiment in which someone demands a small payment (e.g., $5) and threatens enormous harm to a vast number of people (e.g., a trillion people in a simulation) if refused. The probability of this threat being genuine may be astronomically small — say, 10^-20. However, the threatened harm is so large (10^12 people × some large harm value) that the expected value calculation might suggest compliance: even a tiny probability multiplied by an enormous utility can produce a large expected cost of non-compliance. The thought experiment reveals that pure expected value reasoning can produce intuitively absurd conclusions when very small probabilities are paired with astronomically large claimed outcomes. Rational responses include: (1) applying much lower priors to extreme claims — if someone threatens to harm a trillion simulation-inhabitants, the actual probability of this being true should be set extremely low based on all available evidence; (2) recognizing that stated utilities must also be subject to prior distributions — claiming infinite harm doesn't make the expected value calculation infinite; (3) applying skeptical prior probabilities proportional to the extraordinary nature of the claim ("extraordinary claims require extraordinary evidence").

Question 19 The tobacco industry spent decades arguing that "the science isn't settled" on whether smoking causes cancer, despite internal documents showing executives knew about the causal connection. What probability concept does this strategy exploit?

Show Answer **Answer:** This strategy exploits the **asymmetry between establishing a positive claim and introducing doubt**. In empirical science, establishing a causal claim requires accumulating multiple independent lines of evidence that together meet a high evidential threshold. Introducing doubt requires only producing apparently reasonable alternative explanations or pointing to gaps in evidence — a much lower bar. The tobacco industry exploited this asymmetry through **manufactured uncertainty**: funding researchers to produce studies finding no effect, commissioning literature reviews that selectively highlighted inconsistencies, and cultivating a small number of credentialed scientists who would publicly question the consensus. Each new "counter-study" required the scientific community to respond — consuming resources and perpetuating the appearance of genuine debate. The strategy also exploited the **media norm of balance** (see Chapter 15): presenting two scientists with opposing views as if both represented comparable scientific standing. A consensus of 99% of researchers was represented as a 50/50 debate. The technique is sometimes called the "Merchants of Doubt" strategy, documented by historians Naomi Oreskes and Erik Conway, who showed the same tactics were subsequently used for climate science, ozone depletion, and other areas where industry interests conflicted with scientific findings.

Question 20 Explain why having a very low prior probability on an extraordinary claim is not the same as being closed-minded, even from a Bayesian perspective.

Show Answer **Answer:** A very low prior probability on an extraordinary claim reflects the accumulated weight of background knowledge — everything we already know about how the world works, accumulated over centuries of scientific investigation. This is not closed-mindedness; it is appropriate respect for the existing evidential base. From a Bayesian perspective, a low prior is compatible with full openness to evidence. Bayes' Theorem shows that sufficiently strong evidence (a very high likelihood ratio) can update even a very low prior to a high posterior. The key is that the evidence required to overcome a low prior must be proportionally strong. Carl Sagan's maxim "extraordinary claims require extraordinary evidence" is a Bayesian principle: extraordinary claims have very low priors, and updating them requires very high likelihood ratios — meaning evidence that is much more likely to exist if the extraordinary claim is true than if it isn't. A truly closed-minded person would ignore strong evidence. A Bayesian reasoner with a very low prior simply requires stronger evidence proportional to how extraordinary the claim is — but remains genuinely open to updating if that evidence materializes. The difference is operationalized: a closed-minded person sets the likelihood ratio of evidence to zero (no evidence counts); a Bayesian sets an appropriate low prior but processes evidence correctly.
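The odds form of Bayes' Theorem makes this concrete: posterior odds = prior odds × likelihood ratio. A minimal Python sketch (the prior and likelihood ratios below are illustrative, not from the chapter):

```python
def update(prior, likelihood_ratio):
    """Bayesian update in odds form; returns the posterior probability."""
    prior_odds = prior / (1 - prior)
    post_odds = prior_odds * likelihood_ratio
    return post_odds / (1 + post_odds)

prior = 0.001  # extraordinary claim: very low prior

# Weak evidence (LR = 10) barely moves a low prior:
print(round(update(prior, 10), 4))      # → 0.0099

# Extraordinary evidence (LR = 10,000) can still overcome it:
print(round(update(prior, 10_000), 4))  # → 0.9092
```

A closed-minded reasoner, by contrast, would behave as if the likelihood ratio were always 1 (or the prior were exactly 0), so no evidence could ever move the posterior.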

Question 21 A relative risk reduction of 50% sounds dramatic, but an absolute risk reduction might be tiny. Give an example of a health claim where this distinction matters, and explain why the base rate is essential information.

Show Answer **Answer:** Consider a drug that reduces the risk of a particular type of stroke by 50% (relative risk reduction). If the annual baseline risk of this stroke type in the target population is 0.1% (1 in 1,000), a 50% relative reduction means the risk drops to 0.05% (1 in 2,000). The absolute risk reduction is 0.05 percentage points — meaning you would need to treat 2,000 patients for one year to prevent one stroke. This absolute risk reduction must then be weighed against the side effect profile and cost of the drug. If the drug costs $5,000/year, preventing one stroke costs $10 million in drug costs alone. If the drug has a 0.1% rate of serious side effects, treating 2,000 patients for one year would also produce approximately 2 serious adverse events — potentially more than the number of strokes prevented. Without the base rate (0.1% annual stroke risk in this population), the "50% reduction" sounds very impressive and clearly worth accepting. With the base rate, the decision becomes much more nuanced. This is why pharmaceutical companies, and health journalists, often prefer to report relative risk reductions rather than absolute risk reductions — the relative number is always more impressive.
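The arithmetic behind the stroke-drug example can be sketched in Python (illustrative numbers from the answer above):

```python
baseline_risk = 0.001       # 0.1% annual risk (1 in 1,000) in the target population
relative_reduction = 0.50   # the advertised "50% relative risk reduction"

treated_risk = baseline_risk * (1 - relative_reduction)  # risk on the drug
absolute_reduction = baseline_risk - treated_risk        # absolute risk reduction
nnt = 1 / absolute_reduction  # number needed to treat for one year to prevent one stroke

print(treated_risk)  # → 0.0005
print(round(nnt))    # → 2000
```

The same 50% relative reduction applied to a 10% baseline risk would give an NNT of 20, which is why the base rate, not the relative figure, carries the decision-relevant information.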

Question 22 Describe what a reliability diagram (calibration curve) looks like for a well-calibrated forecaster vs. an overconfident forecaster, and explain what each pattern means.

Show Answer **Answer:** A reliability diagram plots **stated confidence (probability)** on the x-axis against **observed accuracy (fraction correct)** on the y-axis, across many predictions bucketed by confidence level. **Well-calibrated forecaster:** The curve lies approximately on the **diagonal line** (y = x). When this forecaster says 70%, they are correct ~70% of the time; when they say 90%, they are correct ~90% of the time. The stated confidence accurately tracks actual accuracy. **Overconfident forecaster:** The curve lies **below the diagonal** — specifically, it bows toward the lower-right. When this forecaster says 90%, they are actually correct only 70-75% of the time. Their stated confidence systematically exceeds their actual accuracy. The curve typically shows little deviation at low confidence levels (around 50-60%) but increasingly large deviations at high confidence levels (80-99%), because overconfidence inflates high-confidence assessments the most. **Underconfident forecaster** (less common): The curve lies **above the diagonal**, meaning their stated probabilities are consistently lower than their actual accuracy rate. When they say 60%, they're right 80% of the time. This person systematically hedges more than the evidence warrants. The ideal curve touches the diagonal at the 50% point (random accuracy for 50% confidence) and follows it closely across the full range. The area between a forecaster's curve and the diagonal is a measure of miscalibration.
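A reliability diagram is built from simple bucketed data. A minimal Python sketch (the 10%-wide bucketing scheme and the example forecasts are illustrative choices, not a standard from the chapter):

```python
from collections import defaultdict

def reliability_points(forecasts):
    """forecasts: list of (stated_probability, outcome_0_or_1) pairs.
    Returns (mean confidence, hit rate) per confidence bucket."""
    buckets = defaultdict(list)
    for prob, outcome in forecasts:
        buckets[round(prob, 1)].append((prob, outcome))  # 10%-wide buckets
    points = []
    for key in sorted(buckets):
        pairs = buckets[key]
        mean_conf = round(sum(p for p, _ in pairs) / len(pairs), 3)
        hit_rate = round(sum(o for _, o in pairs) / len(pairs), 3)
        points.append((mean_conf, hit_rate))  # well calibrated when these match
    return points

# An overconfident forecaster: says 90% but is right only half the time.
print(reliability_points([(0.9, 1), (0.9, 0), (0.9, 1), (0.9, 0)]))
# → [(0.9, 0.5)]
```

Plotting these points against the diagonal y = x gives the curve described above: points below the diagonal indicate overconfidence, points above it indicate underconfidence.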

Question 23 (Short Essay) Explain why understanding Bayes' Theorem is specifically relevant to evaluating misinformation. Use at least two specific examples from the chapter (or from your own knowledge) to illustrate how Bayesian reasoning either helps prevent belief in false claims or explains why false beliefs persist.

Show Answer **Answer:** Bayes' Theorem is the normative model for rational belief update: it specifies how new evidence should change our credences in propositions. Understanding it helps in two directions: it prevents overreaction to weak evidence, and it explains the persistence of false beliefs despite correction. **Example 1: Preventing overreaction to weak evidence.** Suppose a single study finds that vaccine X is associated with a slight increase in a particular autoimmune condition. Without Bayesian thinking, this might prompt dramatic belief change: "The vaccine causes autoimmune disease." With Bayesian reasoning, you ask: What is the prior probability that this vaccine (which has been in use for several years across millions of doses with extensive post-market surveillance) causes this condition? This prior is very low. The likelihood ratio for a single study — especially one that could reflect sampling variation, residual confounding, or publication bias — is modest at best. Therefore, the posterior probability of a genuine effect remains low even after the single study. The correct response is not dismissal, but proportional updating: the study provides weak evidence worth monitoring, but not enough to revise established beliefs substantially. **Example 2: Explaining the persistence of false beliefs despite correction.** A person with a strong prior belief that vaccines cause autism (developed through social exposure and emotional experience with a diagnosed child) encounters a correction: large epidemiological studies consistently show no link. Why doesn't the correction immediately convince them? Bayesian analysis shows that if a person's prior is very strong (say, 95% certainty), even substantial evidence requires a very high likelihood ratio to overcome. 
If the person discounts the diagnostic quality of the studies (viewing them as potentially corrupted by pharmaceutical industry funding), they effectively lower the likelihood ratio attributed to the studies. The result is a posterior that barely moves from the prior. This is not irrational given the person's beliefs about the evidence quality — it is what Bayesian reasoning predicts. The implication for correction strategies is important: corrections must either (a) be extremely diagnostic (high LR) or (b) target the prior directly — addressing why the person's background beliefs are miscalibrated, rather than just providing counter-evidence. Bayes' Theorem thus illuminates why misinformation is not simply a deficit of information: it is a belief state that can only be moved by strong, trusted, diagnostic evidence, and that is resistant to weak corrections precisely because Bayesian reasoning (applied with distorted priors and skeptical likelihood assessments) predicts resistance.

Question 24 What is the difference between genuine scientific uncertainty and manufactured uncertainty? Give one example of each.

Show Answer **Answer:** **Genuine scientific uncertainty** exists when the evidence base is genuinely insufficient to support a clear conclusion. This can occur because of: inadequate sample sizes, absence of replication, methodological limitations, genuinely conflicting results from well-designed studies, or a phenomenon that has not yet been sufficiently studied. Genuine uncertainty is acknowledged by the majority of researchers in a field and is reflected in cautious language, wide confidence intervals, and calls for more research. *Example of genuine uncertainty:* The optimal sleep duration for maximizing long-term health is genuinely uncertain. Multiple large studies find different optimal values (7 hours, 7-9 hours, 8 hours) and the causal direction (does sleeping less cause health problems, or do health problems cause people to sleep less?) remains partially unresolved. Expert guidance reflects this by offering ranges rather than specific targets. **Manufactured uncertainty** is strategically created by interested parties to maintain the appearance of scientific debate where genuine consensus exists. It typically involves: funding studies designed to find null results, amplifying minor disagreements while suppressing the broader consensus, securing media coverage that treats fringe views as equivalent to consensus views, and challenging regulatory decisions by claiming scientific uncertainty that does not actually characterize the expert community. *Example of manufactured uncertainty:* The tobacco industry manufactured uncertainty about the link between smoking and lung cancer from the 1950s through the 1990s. Internal documents revealed that executives knew about the causal link. 
Externally, they funded the Tobacco Industry Research Committee, created the appearance of active scientific debate, and cultivated a small number of scientists willing to publicly question the consensus — despite the fact that the scientific literature overwhelmingly supported causation. The "uncertainty" was manufactured, not genuine. The distinction matters because genuine uncertainty is a reason to gather more evidence; manufactured uncertainty is a tactical deployment of apparent doubt to prevent action that is otherwise clearly warranted.

Question 25 (Calculation) Calculate the Brier scores for these five forecasts and identify which forecaster performed best overall:

Forecaster A:
  • 90% confident, event occurred
  • 90% confident, event occurred
  • 90% confident, event did NOT occur
  • 60% confident, event occurred
  • 60% confident, event did NOT occur

Forecaster B:
  • 70% confident, event occurred
  • 70% confident, event occurred
  • 30% confident, event did NOT occur
  • 65% confident, event occurred
  • 35% confident, event did NOT occur

Show Answer **Answer:**

**Forecaster A:**
  • (0.9 - 1)² = 0.01
  • (0.9 - 1)² = 0.01
  • (0.9 - 0)² = 0.81
  • (0.6 - 1)² = 0.16
  • (0.6 - 0)² = 0.36
Sum = 1.35; Average Brier score = **0.27**

**Forecaster B:**
  • (0.7 - 1)² = 0.09
  • (0.7 - 1)² = 0.09
  • (0.3 - 0)² = 0.09
  • (0.65 - 1)² = 0.1225
  • (0.35 - 0)² = 0.1225
Sum = 0.515; Average Brier score = **0.103**

**Forecaster B performed substantially better** (lower Brier score = better). The key reason: Forecaster A's third prediction — 90% confident in an event that did NOT occur — earned a devastating score of 0.81, dragging the average up significantly. Forecaster B, by contrast, was well calibrated: saying 70% for events that occurred, 30% for non-events, and consistently using moderate confidence levels that matched the actual uncertainty in outcomes. The lesson: being overconfident when wrong is the fastest way to accumulate a high (bad) Brier score. It is better to say 70% and be wrong occasionally than to say 90% and be wrong at the same rate.
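The comparison can be reproduced in Python (a minimal sketch using the forecasts listed in the question):

```python
def mean_brier(forecasts):
    """forecasts: list of (stated_probability, outcome_0_or_1) pairs."""
    return sum((p - o) ** 2 for p, o in forecasts) / len(forecasts)

forecaster_a = [(0.90, 1), (0.90, 1), (0.90, 0), (0.60, 1), (0.60, 0)]
forecaster_b = [(0.70, 1), (0.70, 1), (0.30, 0), (0.65, 1), (0.35, 0)]

print(round(mean_brier(forecaster_a), 3))  # → 0.27
print(round(mean_brier(forecaster_b), 3))  # → 0.103
```

Removing Forecaster A's third forecast (the confident miss) and recomputing shows how much that single overconfident error dominates the average.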