Chapter 26: Exercises — Scientific Thinking and Evidence Evaluation

Instructions

These exercises develop skills in evaluating scientific evidence across multiple dimensions: study design quality, statistical interpretation, causal reasoning, and media translation. Progress from fundamental identification tasks to synthesis and application.


Part A: Study Design and Evidence Quality (Exercises 1–8)

Exercise 1: Classifying Study Designs

For each description, identify the study design and place it in the evidence hierarchy. Explain what causal claims the study design can and cannot support.

(a) Researchers recruited 300 adults who drink coffee daily and 300 who do not, then reviewed their medical records for the past 10 years to compare rates of Parkinson's disease diagnosis.

(b) 5,000 participants were randomly assigned to receive either aspirin or placebo daily; researchers followed them for 5 years and recorded cardiovascular events.

(c) A researcher collected 12 published and 18 unpublished studies on the effect of mindfulness on anxiety, calculated pooled effect sizes, and tested for publication bias.

(d) A physician reported on three patients with a rare autoimmune condition who had all traveled to Southeast Asia within the previous six months.

(e) Researchers surveyed 50,000 adults about their dietary habits in 2010, then followed them through 2020, comparing cancer incidence between quartiles of processed meat consumption.

(f) A lab exposed mouse neurons to a chemical found in cosmetics and found increased markers of oxidative stress.


Exercise 2: Identifying Study Limitations

For each study, identify the primary methodological limitation(s).

(a) A survey asks participants whether they ate fast food "frequently" and correlates this with self-reported happiness scores.

(b) An RCT tests a new antidepressant against placebo but is funded entirely by the drug manufacturer, and the company's statisticians conduct all analyses.

(c) A study reports that a herbal supplement reduces blood pressure. Sample size: n = 18 participants; duration: 2 weeks.

(d) Researchers find a significant association between organic food consumption and lower cancer rates in a large cohort study, after controlling for age, sex, and smoking.

(e) A meta-analysis of 40 studies on vitamin D supplementation finds a significant effect on all-cause mortality. However, all 40 studies used different doses, different populations, different follow-up periods, and different outcome measures.


Exercise 3: Evaluating Causal Claims

For each claim, explain whether the evidence supports a causal conclusion. Apply the Bradford Hill criteria where relevant.

(a) A correlation study finds that children with more books in the home have higher academic achievement scores.

(b) A 30-year cohort study of 200,000 adults finds that those who exercise regularly have a 30% lower risk of cardiovascular disease than sedentary adults, controlling for age, BMI, diet quality, smoking, and socioeconomic status. A consistent dose-response relationship is observed.

(c) Ice cream consumption is positively correlated with drowning deaths across months of the year.

(d) A natural experiment shows that in regions where water fluoridation was adopted, tooth decay rates fell in the following decade, while matched non-fluoridated regions showed no comparable decline.


Exercise 4: The Evidence Pyramid in Practice

Rank the following pieces of evidence from strongest to weakest for supporting the claim "Regular resistance training reduces the risk of type 2 diabetes":

(a) A systematic review of 20 RCTs involving 2,500 participants, finding consistent risk reduction across trials.

(b) A doctor's clinical observation that their patients who lift weights seem healthier.

(c) A single RCT with 60 participants over 6 months showing a significant reduction in HbA1c.

(d) A prospective cohort study of 50,000 adults finding an inverse correlation between self-reported strength training and diabetes incidence.

(e) A case report of a previously diabetic patient whose diabetes went into remission after starting a strength training program.

(f) A meta-analysis of 30 cohort studies involving 500,000 participants, finding consistent dose-response relationships.


Exercise 5: Confounding Analysis

For each association, propose at least two plausible confounding variables that might explain the relationship. For each, explain the mechanism by which the confounder could produce the association.

(a) People who eat breakfast regularly are less likely to be overweight.

(b) Children who watch more television have lower reading scores.

(c) Adults who take vitamin supplements have lower rates of heart disease.

(d) Neighborhoods with more trees have lower rates of mental illness.
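To see how a confounder can manufacture an association, here is a minimal simulation for (a). It assumes a single hypothetical confounder z ("health-consciousness") and made-up probabilities; by construction, eating breakfast has no causal effect on weight in this model. The crude comparison still shows breakfast eaters as less often overweight, while the comparison within each level of z shows essentially no difference.

```python
import random

def simulate(n=100_000, seed=0):
    """Simulate people under a hypothetical model in which z confounds x and y.

    z: health-conscious (True) or not
    x: eats breakfast regularly; more likely when z is True
    y: overweight; less likely when z is True
    y depends only on z, so breakfast has no causal effect on weight here.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.random() < 0.5
        x = rng.random() < (0.8 if z else 0.3)
        y = rng.random() < (0.2 if z else 0.5)
        rows.append((z, x, y))
    return rows

def share_overweight(rows, x_value, z_value=None):
    """Proportion overweight among people with x == x_value, optionally within one z stratum."""
    ys = [y for z, x, y in rows if x == x_value and (z_value is None or z == z_value)]
    return sum(ys) / len(ys)

rows = simulate()
crude = share_overweight(rows, True) - share_overweight(rows, False)
within = {z: share_overweight(rows, True, z) - share_overweight(rows, False, z) for z in (True, False)}
print(f"crude difference (breakfast minus no breakfast): {crude:+.3f}")
for z, diff in within.items():
    print(f"difference within z = {z}: {diff:+.3f}")
```

Stratifying on the confounder is the simplest form of "controlling for" it; it only works when the confounder has actually been measured.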


Exercise 6: Bradford Hill Criteria Application

Apply the Bradford Hill criteria to evaluate the following causal claim. For each criterion, assess whether the available evidence meets it.

Claim: Heavy alcohol consumption causes liver cirrhosis.

Available evidence:

  • Alcoholics develop cirrhosis at rates 5-10 times higher than non-drinkers (relative risk).
  • Cirrhosis rates are strongly correlated with per capita alcohol consumption across countries.
  • Biological mechanisms (acetaldehyde toxicity, oxidative stress) are well-characterized.
  • Reduced alcohol consumption slows cirrhosis progression.
  • The dose-response relationship is well documented.
  • Cirrhosis in alcoholics is consistent across different populations and methodologies.
  • Temporal sequence is established: heavy drinking precedes cirrhosis.


Exercise 7: Simpson's Paradox

The following data comes from a hypothetical kidney stone treatment study:

Stone size      Treatment A       Treatment B
Small stones    81/87 = 93%       234/270 = 87%
Large stones    192/263 = 73%     55/80 = 69%
Overall         273/350 = 78%     289/350 = 83%

Each cell shows successful treatments / patients treated = success rate.

(a) Which treatment appears better in the overall data?

(b) Which treatment is actually better for both small and large stones?

(c) Explain why the overall and stratified results differ.

(d) Which treatment should a patient prefer? Why?

(e) What does this example teach us about aggregate statistics?
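The rates in the table can be recomputed from the raw counts with a few lines of Python. This is a minimal sketch; the dictionary layout and function names are illustrative. Besides the stratified and pooled success rates, it prints the share of each treatment's patients who had large stones.

```python
# (successes, patients) for each treatment and stone size, from the table above
table = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}

def success_rate(successes, patients):
    return successes / patients

def overall(strata):
    """Pool the strata by summing successes and patients across stone sizes."""
    successes = sum(s for s, _ in strata.values())
    patients = sum(n for _, n in strata.values())
    return successes, patients

for treatment, strata in table.items():
    for size, (s, n) in strata.items():
        print(f"{treatment} {size}: {success_rate(s, n):.1%}")
    s, n = overall(strata)
    large_share = strata["large"][1] / n
    print(f"{treatment} overall: {success_rate(s, n):.1%} (share of large stones: {large_share:.0%})")
```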


Exercise 8: Replication Crisis Mechanisms

For each scenario, identify which mechanism of the replication crisis it illustrates (publication bias, p-hacking, HARKing, underpowered study, motivated reasoning).

(a) A researcher tests 20 different outcome measures in their dataset, finds 3 significant results, and writes a paper presenting only those 3 as the pre-specified outcomes.

(b) A small-sample study (n = 30) finds a dramatic effect of a new therapy (p = 0.03). Six independent labs attempt to replicate with larger samples (n = 150-300 each) and find effect sizes near zero.

(c) A researcher analyzing data tries removing outliers, adding different covariates, and using different subgroup definitions until the p-value drops below 0.05.

(d) A journal receives 100 studies on a new drug: 40 find significant effects, 60 find null results. The journal publishes 35 of the 40 positive results and 5 of the 60 null results.

(e) A research team is highly invested in demonstrating that their intervention works. They interpret borderline results favorably, exclude inconvenient participants with post-hoc justifications, and stop data collection when the result first becomes significant.
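A quick simulation shows why stopping "when the result first becomes significant," as in (e), inflates false positives. This is a sketch under simplifying assumptions: normally distributed data with a known standard deviation, a z-test, and an interim look every 10 participants up to 100. The function and parameter names are illustrative. Because the true effect is zero, every significant result counts as a false positive.

```python
import math
import random

def false_positive_rate(peek, n_sims=2000, max_n=100, look_every=10, seed=1):
    """Share of simulated null studies that end up 'significant' at the 5% level.

    Observations are N(0, 1) with known SD, so the true effect is zero and every
    significant result is a false positive. With peek=True the analyst runs a
    z-test after every `look_every` observations and stops at the first
    |z| > 1.96; with peek=False there is a single test after max_n observations.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        total = 0.0
        for i in range(1, max_n + 1):
            total += rng.gauss(0.0, 1.0)
            is_look = (i % look_every == 0) if peek else (i == max_n)
            # z = sample mean / (1 / sqrt(i)) = total / sqrt(i)
            if is_look and abs(total / math.sqrt(i)) > 1.96:
                hits += 1
                break
    return hits / n_sims

print(f"single planned test:              {false_positive_rate(peek=False):.3f}")
print(f"test every 10, stop when p < .05: {false_positive_rate(peek=True):.3f}")
```

The single planned test stays near the nominal 5%, while repeated peeking pushes the false-positive rate well above it, even though no individual test is computed incorrectly.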


Part B: Statistical Interpretation (Exercises 9–16)

Exercise 9: P-Value Interpretation

For each statement about p-values, identify whether it is correct or incorrect. For incorrect statements, provide the correct interpretation.

(a) "The p-value of 0.03 means there is a 3% probability that this result occurred by chance."

(b) "A p-value of 0.001 means the effect is large and practically meaningful."

(c) "A p-value of 0.04 means we can be 96% confident the null hypothesis is false."

(d) "The p-value is the probability of observing data at least this extreme if the null hypothesis were true."

(e) "Because p = 0.049, there is strong evidence against the null hypothesis. Because p = 0.051, there is no evidence against the null hypothesis."


Exercise 10: Confidence Interval Analysis

Interpret each confidence interval and draw appropriate conclusions.

(a) A study finds that a new drug reduces cholesterol by 12 mg/dL (95% CI: -2 to +26 mg/dL).

(b) A large RCT finds that a vaccine reduces infection probability by 92% (95% CI: 88% to 95%).

(c) An observational study finds that coffee drinkers have a 15% lower risk of depression (95% CI: 2% to 26%).

(d) A drug company reports that their new painkiller produces 20% greater pain relief (95% CI: 19.5% to 20.5%) compared to the existing drug.
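A 95% confidence interval and a two-sided p-value carry overlapping information. The sketch below uses a standard back-calculation: it assumes the estimate is approximately normal (on the log scale for ratio measures such as relative risk), so the interval is the estimate ± 1.96 standard errors. The function name is hypothetical, and the results are approximations, not the values the original studies would report.

```python
import math

Z95 = 1.959964  # two-sided 95% critical value of the standard normal

def p_from_ci(estimate, lower, upper, ratio=False):
    """Approximate two-sided p-value recovered from a 95% CI.

    Assumes estimate +/- 1.96 * SE describes the interval; for ratio measures
    (ratio=True) the calculation is done on the log scale.
    """
    if ratio:
        estimate, lower, upper = map(math.log, (estimate, lower, upper))
    se = (upper - lower) / (2 * Z95)
    z = abs(estimate) / se
    return math.erfc(z / math.sqrt(2))  # P(|Z| > z) for a standard normal

# (a) cholesterol reduction of 12 mg/dL, 95% CI -2 to +26
p_a = p_from_ci(12, -2, 26)
# (c) 15% lower risk, 95% CI 2% to 26% lower, i.e. RR 0.85 (0.74 to 0.98)
p_c = p_from_ci(0.85, 0.74, 0.98, ratio=True)
print(f"(a) p ~ {p_a:.3f}   (c) p ~ {p_c:.3f}")
```

An interval that includes the null value (0 for a difference, 1 for a ratio) corresponds to p > 0.05, and vice versa.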


Exercise 11: Absolute vs. Relative Risk

Convert each relative risk claim to an absolute risk claim and comment on the practical significance.

(a) "This drug reduces heart attack risk by 50%." Baseline heart attack risk in the studied population: 2% per year.

(b) "People who smoke have 15 times the lung cancer risk of non-smokers." Lifetime lung cancer risk for non-smokers: approximately 0.5%.

(c) "Children eating processed meat daily have 18% higher risk of childhood leukemia." Baseline childhood leukemia risk: approximately 5 per 100,000 per year.

(d) "The drug reduces cancer mortality by 30% (relative risk reduction)." Absolute cancer mortality rate in the control group: 12%.
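The conversion is a single multiplication: risk in the exposed (or treated) group = baseline risk × relative risk. The sketch below applies it to the four claims, reporting the number needed to treat (NNT) when risk falls and the number needed to harm (NNH) when it rises. The function name is illustrative, and each result inherits the time frame of its baseline (per year for (a) and (c), lifetime for (b), the trial period for (d)).

```python
def absolute_change(baseline, relative_risk):
    """Return (risk in exposed/treated group, absolute change) for a baseline risk and a relative risk."""
    new_risk = baseline * relative_risk
    return new_risk, new_risk - baseline

cases = {
    "(a) drug, RR 0.50": (0.02, 0.50),
    "(b) smoking, RR 15": (0.005, 15),
    "(c) processed meat, RR 1.18": (5 / 100_000, 1.18),
    "(d) cancer drug, RR 0.70": (0.12, 0.70),
}

for label, (baseline, rr) in cases.items():
    new_risk, change = absolute_change(baseline, rr)
    line = f"{label}: {baseline:.4%} -> {new_risk:.4%} (absolute change {change:+.4%})"
    if change < 0:
        line += f", NNT ~ {1 / -change:,.0f}"  # people treated per event prevented
    elif change > 0:
        line += f", NNH ~ {1 / change:,.0f}"   # people exposed per extra event
    print(line)
```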


Exercise 12: Effect Size Evaluation

For each scenario, assess whether the effect size justifies the claim being made.

(a) A study of 100,000 students finds that school start time is significantly associated with GPA (p = 0.001, Cohen's d = 0.04).

(b) A clinical trial finds that a new antidepressant produces significantly greater improvement than placebo (p = 0.04, Cohen's d = 0.2) in a sample of 80 patients.

(c) A mindfulness app claims its program is "scientifically proven to reduce stress" based on a study finding p = 0.02, d = 0.15 in 50 participants.

(d) An RCT finds that a vaccine prevents 90% of infections (p < 0.001, NNT ≈ 1.4) in a population where the natural infection rate is 80% per season.


Exercise 13: Multiple Comparisons

A researcher conducts a study and tests 20 separate outcome measures (blood pressure, cholesterol, BMI, resting heart rate, grip strength, etc.). Using alpha = 0.05, the researcher reports 2 significant findings.

(a) With 20 independent tests at alpha = 0.05, what is the probability of finding at least one false positive by chance?

(b) Does finding 2 significant results in 20 tests provide strong evidence? Explain.

(c) What statistical correction could address this problem?

(d) The researcher reports only the 2 significant outcomes and claims they were the "primary endpoints." What methodological problem does this illustrate?
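The arithmetic behind (a) to (c) can be checked with the binomial distribution, assuming the 20 tests are independent and every null hypothesis is true. This is a minimal sketch; the helper name is illustrative. With correlated outcomes (as real biomarkers usually are), the exact figures would differ.

```python
from math import comb

def prob_at_least(k, m, alpha=0.05):
    """P(at least k false positives among m independent tests when every null is true)."""
    return sum(comb(m, j) * alpha**j * (1 - alpha) ** (m - j) for j in range(k, m + 1))

m, alpha = 20, 0.05
print(f"P(>=1 false positive):  {prob_at_least(1, m, alpha):.3f}")  # equals 1 - 0.95**20
print(f"P(>=2 false positives): {prob_at_least(2, m, alpha):.3f}")
print(f"Expected false positives: {m * alpha:.1f}")
print(f"Bonferroni per-test threshold: {alpha / m}")
```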


Exercise 14: Statistical vs. Practical Significance

Give the correct interpretation of each scenario:

(a) A study of 2 million people finds that green tea drinkers have a 0.001% lower risk of hangnails, a difference that is statistically significant (p = 0.00001).

(b) A study of 20 patients finds a 40% reduction in symptom severity (measured on a 10-point scale), but p = 0.08.

(c) A factory air purifier reduces fine particulate matter by 95% (p < 0.0001) in a test chamber whose initial PM2.5 level was 2 μg/m³ (WHO annual guideline: 5 μg/m³).


Exercise 15: Publication Bias Detection

Describe the funnel plot method for detecting publication bias. Answer the following:

(a) What does a symmetrical funnel plot look like, and what does it indicate?

(b) What does an asymmetrical funnel plot look like, and what might it indicate?

(c) If a meta-analysis finds a significant effect but the funnel plot is asymmetrical, what does this suggest about the true effect size?

(d) What is Egger's test, and what does a significant Egger's test result suggest?
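Egger's test can be sketched as an ordinary least-squares regression of each study's standardized effect (effect / SE) on its precision (1 / SE). Without small-study effects the line passes near the origin; an intercept well away from zero signals funnel-plot asymmetry. The function below is a minimal illustration, not a replacement for a meta-analysis package: it returns the intercept, slope, and the intercept's t statistic, and turning t into a p-value requires the t distribution with n − 2 degrees of freedom (for example via SciPy's `scipy.stats.t`). The study data in the usage lines are hypothetical.

```python
import math

def egger_test(effects, std_errors):
    """Egger's regression: OLS of effect/SE on 1/SE.

    Returns (intercept, slope, t statistic for the intercept, with n - 2 df).
    The intercept measures asymmetry; the slope estimates the underlying effect.
    """
    y = [e / s for e, s in zip(effects, std_errors)]  # standardized effects
    x = [1 / s for s in std_errors]                    # precisions
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(r * r for r in resid) / (n - 2)           # residual variance
    se_intercept = math.sqrt(s2 * (1 / n + x_bar**2 / sxx))
    # t is undefined (nan) only if the points lie exactly on a line
    t = intercept / se_intercept if se_intercept > 0 else float("nan")
    return intercept, slope, t

# Hypothetical studies in which smaller studies (larger SE) report larger effects
ses = [0.05, 0.10, 0.15, 0.20, 0.30, 0.40]
effects = [0.22, 0.31, 0.35, 0.46, 0.50, 0.68]
print(egger_test(effects, ses))
```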


Exercise 16: Interpreting a Research Abstract

Read the following abstract and answer all questions below.

"Background: We investigated whether consumption of flavonoid-rich foods is associated with cognitive decline in older adults. Methods: A prospective cohort study following 6,835 participants aged 60+ over 12 years. Dietary assessment by food frequency questionnaire at baseline. Cognitive function assessed by standardized neuropsychological battery at years 4, 8, and 12. Results: Higher flavonoid consumption was significantly associated with lower rates of cognitive decline after adjustment for age, sex, education, BMI, and physical activity (β = -0.18, 95% CI: -0.29 to -0.07, p = 0.001). Effect was strongest in the highest vs. lowest consumption quintile. Conclusions: Flavonoid consumption may protect against cognitive decline in older adults."

Questions:

(a) What study design is this? What evidence level?

(b) What does the beta coefficient of -0.18 indicate?

(c) What confounders were controlled? What important confounders might remain?

(d) Can this study establish that flavonoids cause reduced cognitive decline?

(e) How might publication bias affect this finding?

(f) What next steps are needed before clinical recommendations could be made?


Part C: Media Analysis (Exercises 17–24)

Exercise 17: Science Headline Decoding

For each headline, identify the implied claim, identify likely translation errors or distortions, and write a more accurate headline.

(a) "Scientists PROVE coffee prevents Alzheimer's disease"

(b) "New vaccine DOUBLES cancer survival rates"

(c) "Children who watch TV are 40% more likely to develop attention problems"

(d) "Vitamin D supplements linked to 25% reduced all-cause mortality — should everyone be supplementing?"

(e) "Groundbreaking study: sitting is the new smoking"


Exercise 18: Press Release vs. Paper

Below is a press release excerpt and the corresponding abstract. Identify all the distortions introduced by the press release.

Press release: "Researchers at [University] have discovered that eating blueberries every day dramatically reduces the risk of heart disease. The study found that daily blueberry eaters were significantly less likely to develop heart problems, confirming what health practitioners have long suspected about the power of antioxidant-rich superfoods."

Actual abstract excerpt: "In this 8-week randomized crossover trial (n = 40, mean age 68, all participants had metabolic syndrome), daily consumption of 200g of blueberries vs. placebo powder significantly improved endothelial function (FMD: +1.53%, 95% CI: 0.28–2.78%, p = 0.02) and reduced blood pressure (-5.1 mmHg systolic, 95% CI: -9.3 to -0.9, p = 0.02). No significant effects were observed on total cholesterol, LDL, or fasting glucose. Longer-term studies are needed to determine whether these improvements translate to reduced cardiovascular event rates."


Exercise 19: Identifying Manufactured Controversy

Distinguish between genuine scientific controversy and manufactured controversy in the following examples:

(a) News reports featuring a debate between "experts for" and "experts against" the claim that vaccines cause autism.

(b) A scientific debate about how the Pacific Decadal Oscillation will respond to rising greenhouse gas concentrations.

(c) Media coverage presenting "two sides" on whether evolution occurred.

(d) Ongoing debate among researchers about the optimal protein intake for muscle hypertrophy.


Exercise 20: Relative Risk Media Exercise

Find a health news story in a newspaper or online news source (or use an assigned article). Analyze it for:

(a) What is the implied causal claim?

(b) What is the study design underlying the story?

(c) Are the risks presented as relative or absolute? If relative, calculate the absolute risk change.

(d) What are the limitations of the underlying study?

(e) What additional information would be needed to evaluate the claim?


Exercise 21: Animal Model Translation

A news story reports: "Scientists cure Alzheimer's disease in mice using new gene therapy. Human trials could begin within two years."

(a) What are the key limitations in translating findings from mouse models to humans?

(b) Why do promising mouse-model results frequently fail in human trials?

(c) How should this story be more accurately headlined?

(d) What evidence would need to exist before a human trial could be considered ethically justified?


Exercise 22: Consensus Identification

For each claim, determine whether it represents (A) established scientific consensus, (B) active scientific debate, or (C) fringe/unsupported claim, and provide reasoning.

(a) Human activity is the primary driver of current climate change.

(b) A low-carbohydrate diet is superior to a low-fat diet for weight loss.

(c) The universe is approximately 6,000 years old.

(d) There is an optimal dose of microplastics that is safe for human consumption.

(e) Vaccination against measles is effective and the MMR vaccine does not cause autism.

(f) Moderate red wine consumption reduces cardiovascular disease risk.

(g) General anesthesia poses risks that require careful informed consent.


Exercise 23: Pre-Registration Analysis

Read the following description of a study and evaluate whether pre-registration would have strengthened or weakened confidence in the results.

"Researchers enrolled 200 participants in a dietary intervention trial. After 6 months, they analyzed outcomes across 15 different biomarkers. They found significant improvements in 3 of the 15 biomarkers (p = 0.03, 0.04, and 0.02 respectively). They reported these three outcomes prominently while noting the other 12 showed no significant change. The study was not pre-registered."

(a) What is the probability of finding at least 3 significant results among 15 tests at alpha = 0.05 if there were no true effects?

(b) How would pre-registration have changed the interpretation of these results?

(c) What would make these results more convincing even without pre-registration?
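For (a), the count of significant results under the null follows a binomial distribution, provided the 15 biomarker tests are independent (an assumption; real biomarkers are often correlated, which changes the figure). A minimal sketch, with an illustrative helper name, that also checks the three reported p-values against a Bonferroni threshold of 0.05 / 15:

```python
from math import comb

def prob_at_least(k, m, alpha=0.05):
    """P(at least k of m independent null tests reach p < alpha), via the binomial complement."""
    return 1 - sum(comb(m, j) * alpha**j * (1 - alpha) ** (m - j) for j in range(k))

print(f"P(>=3 of 15 significant with no true effects): {prob_at_least(3, 15):.4f}")

observed = (0.03, 0.04, 0.02)
threshold = 0.05 / 15  # Bonferroni-adjusted per-test alpha
print([p < threshold for p in observed])
```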


Exercise 24: Evaluating a Systematic Review

A systematic review on a dietary supplement reports:

  • 25 studies included
  • Significant overall effect: d = 0.42 (95% CI: 0.28-0.56)
  • Funnel plot shows asymmetry (small studies show larger effects)
  • Egger's test: p = 0.02 (significant)
  • 80% of included studies were funded by the supplement manufacturer
  • 15 of the 25 studies used outcome measures designed by the manufacturer

Evaluate this systematic review. What is the likely true effect size? What confidence would you place in the reported result?


Part D: Applied Scientific Thinking (Exercises 25–32)

Exercise 25: Bayesian Updating

Apply Bayesian thinking to each scenario. Before seeing any data, what is your prior probability for the claim? After seeing the evidence described, how should this update?

(a) Prior: A homeopathic remedy diluted to 10^-23 (no molecules of active ingredient remain) cures the common cold. Evidence: A single small RCT (n = 30) finds p = 0.04.

(b) Prior: A drug with known anti-inflammatory mechanisms reduces markers of inflammation in arthritis patients. Evidence: A well-designed RCT (n = 400) finds significant reduction in CRP and ESR (p < 0.001), with consistent results across subgroups.

(c) Prior: Coffee consumption affects cognitive performance. Evidence: 50 independent studies show variable results; a meta-analysis finds a small but significant positive effect (d = 0.2, 95% CI: 0.1-0.3).
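Bayesian updating is easiest in odds form: posterior odds = prior odds × Bayes factor. The sketch below uses a known calibration (Sellke, Bayarri and Berger) that bounds how much evidence a p-value can supply: for p < 1/e, the Bayes factor against the null is at most 1 / (−e · p · ln p). The priors of 0.001 (for a claim like the homeopathy scenario) and 0.5 are illustrative assumptions, not values from the exercise.

```python
import math

def posterior(prior, bayes_factor):
    """Update a prior probability with a Bayes factor using the odds form of Bayes' theorem."""
    prior_odds = prior / (1 - prior)
    post_odds = prior_odds * bayes_factor
    return post_odds / (1 + post_odds)

def max_bayes_factor(p):
    """Upper bound on the Bayes factor against the null implied by p (valid for p < 1/e)."""
    return 1 / (-math.e * p * math.log(p))

bf = max_bayes_factor(0.04)  # the small RCT in (a)
for prior in (0.001, 0.5):
    print(f"prior {prior:.3f} -> posterior at most {posterior(prior, bf):.3f} (BF <= {bf:.2f})")
```

Even under the most favorable reading of p = 0.04, a very low prior stays very low; the same p-value moves a 50:50 prior much further.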


Exercise 26: Designing Better Studies

The following study designs have flaws. Propose improvements.

(a) To test whether herbal supplements reduce anxiety, researchers ask patients at a naturopath's office to rate their anxiety before and after taking supplements for one month.

(b) To test whether a new teaching method improves test scores, a school allows teachers to volunteer to use the new method; volunteer-teacher classrooms are compared to non-volunteer-teacher classrooms.

(c) To test whether a high-fat diet causes cardiovascular disease, researchers ask participants to recall their dietary habits over the past 20 years and compare those who report high fat intake with those who don't.


Exercise 27: Applying Bradford Hill Criteria

Apply the Bradford Hill criteria to evaluate the causal claim: "Cigarette smoking causes lung cancer." List specific evidence for each criterion and assess the overall causal case.


Exercise 28: The Replication Crisis and Public Trust

Write a 400-word response to the following prompt:

"The replication crisis proves that science cannot be trusted to tell us the truth about health and medicine. We should be as skeptical of peer-reviewed research as we are of any other claim."

Your response should acknowledge the genuine problems, distinguish what those problems do and do not show, and explain what the replication crisis actually implies for evidence evaluation.


Exercise 29: Finding the Paper

For each type of science news claim, explain what you would need to look for if you searched for the underlying research to evaluate the claim properly.

(a) "Scientists link social media use to depression in teenagers."

(b) "New study shows intermittent fasting reverses aging at the cellular level."

(c) "Experts warn: common household chemical poses cancer risk."


Exercise 30: Cost-Benefit Thinking Under Uncertainty

A new blood test is proposed to screen for a rare cancer affecting 1 in 10,000 people per year. The test has 90% sensitivity (detects 90% of true cases) and 95% specificity (correctly identifies 95% of non-cases as negative).

(a) In a population of 100,000, how many true positives and false positives would be expected?

(b) What is the positive predictive value (PPV) of the test — the probability that a positive result indicates actual cancer?

(c) What does this imply about the practical utility of screening this population?

(d) How would your analysis change if the cancer prevalence were 1 in 100 instead of 1 in 10,000?
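The expected counts follow directly from prevalence, sensitivity, and specificity. This minimal sketch (function name illustrative) computes true positives, false positives, and the positive predictive value for both prevalence scenarios in a population of 100,000.

```python
def screening(population, prevalence, sensitivity, specificity):
    """Expected true positives, false positives and PPV for a screening test."""
    cases = population * prevalence
    non_cases = population - cases
    true_pos = cases * sensitivity
    false_pos = non_cases * (1 - specificity)
    ppv = true_pos / (true_pos + false_pos)
    return true_pos, false_pos, ppv

for prevalence in (1 / 10_000, 1 / 100):
    tp, fp, ppv = screening(100_000, prevalence, sensitivity=0.90, specificity=0.95)
    print(f"prevalence {prevalence:.2%}: TP = {tp:.0f}, FP = {fp:.0f}, PPV = {ppv:.2%}")
```

Sensitivity and specificity are unchanged between the two runs; only the base rate moves, and it dominates the PPV.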


Exercise 31: Research Ethics and Replication

Pre-registration and registered reports are proposed solutions to the replication crisis. However, some researchers argue these requirements would impede exploratory research and scientific creativity.

Construct arguments for both sides of this debate, then identify what you think the strongest position is.


Exercise 32: Synthesis — Evaluating a Full Study Report

Below is a summary of a realistic study. Apply all relevant tools from this chapter to provide a comprehensive evaluation.

"A prospective cohort study followed 12,000 adults for 10 years. At baseline, dietary habits were assessed using a food frequency questionnaire. Researchers found that those in the highest quartile of ultra-processed food consumption had 28% higher mortality during follow-up (HR = 1.28, 95% CI: 1.12-1.47, p < 0.001) compared to the lowest quartile, after adjustment for age, sex, smoking, physical activity, education, and BMI. A statistically significant dose-response relationship was observed. The study was observational and funded by a governmental public health institute. It has been replicated in three independent cohort studies from different countries."

Evaluate: study design, evidence level, statistical interpretation, causal inference, limitations, and overall conclusion.