Chapter 21 Key Takeaways: Data Journalism and Statistical Literacy

Core Concepts

1. Data Journalism Has Transformed Accountability Reporting

Organizations like FiveThirtyEight, The Upshot, ProPublica, and the Guardian Data Desk have demonstrated that systematic quantitative analysis can reveal systemic injustice, produce calibrated predictions, and hold institutions accountable in ways that anecdotal narrative reporting cannot. The best data journalism combines journalistic values (public interest, verification, accountability) with statistical rigor and information design skill. However, data journalism has its own failure modes: false precision, editorial bias in metric selection, and the fact that data tools are as accessible to bad actors as to honest analysts.

2. Mean vs. Median — Always Ask Which Is Being Reported

In skewed distributions (income, wealth, home prices, social media follower counts), the mean is pulled upward by extreme values while the median reflects typical experience. When politicians and media report economic statistics using means in high-inequality environments, they systematically misrepresent what most people experience. The alert reader always asks: which measure of central tendency is this, and would the other measure tell a different story?
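A minimal sketch, using a small made-up income sample, shows how a single outlier separates the two measures:

```python
import statistics

# Hypothetical incomes (illustrative only): eight typical earners and one outlier.
incomes = [32_000, 38_000, 41_000, 45_000, 48_000, 52_000, 58_000, 65_000, 2_500_000]

mean_income = statistics.mean(incomes)      # pulled upward by the outlier
median_income = statistics.median(incomes)  # reflects the typical earner

print(f"mean:   ${mean_income:,.0f}")    # mean:   $319,889
print(f"median: ${median_income:,.0f}")  # median: $48,000
```

Reporting the mean here would suggest a prosperous group; the median tells the story of the typical member.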

3. Relative Risk Without Absolute Context Is Almost Always Misleading

A "50% reduction in risk" is meaningless without the baseline risk. A 50% reduction in a 0.1% risk and a 50% reduction in a 20% risk are completely different in their practical implications. The Number Needed to Treat (NNT = 1/Absolute Risk Reduction) translates clinical trial results into a format patients can readily interpret: how many people must be treated to prevent one adverse event? Pharmaceutical marketing systematically favors relative risk framing to make modest benefits appear dramatic.
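The arithmetic is simple enough to sketch directly (the trial numbers below are hypothetical):

```python
# Hypothetical trial: 2% event rate untreated, 1% treated.
control_risk = 0.02
treated_risk = 0.01

rrr = (control_risk - treated_risk) / control_risk  # 0.50: "cuts risk in half!"
arr = control_risk - treated_risk                   # 0.01: one percentage point
nnt = 1 / arr                                       # 100 patients treated to prevent one event

print(f"RRR: {rrr:.0%}  ARR: {arr:.1%}  NNT: {nnt:.0f}")
```

The marketing copy leads with the 50%; the patient-relevant fact is that 100 people must be treated for one of them to benefit.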

4. Base Rates Are the Invisible Context That Changes Everything

The probability that a positive test result means you actually have the disease (positive predictive value) depends critically on how common the disease is in the population being tested. In low-prevalence populations, even highly accurate tests produce predominantly false positives. This base rate principle applies to security screening, medical diagnosis, social policy, and virtually any classification system. When you see a reported false positive or true positive rate for any detection system, the first question is: what is the base rate of the thing being detected?

5. Cherry-Picked Timeframes Are the Most Common Political Statistical Manipulation

Almost any economic, crime, or social trend can be made to appear positive or negative by choosing the right starting point. The antidote is context: What does the full data series show? Is the chosen timeframe typical or selected? What trend predates the policy or the politician being credited?
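A toy example (made-up annual counts) shows how the start point alone flips the story:

```python
# Made-up annual figures: a long-run decline with a recent uptick.
counts = {2010: 980, 2012: 900, 2014: 820, 2016: 760, 2018: 700, 2020: 640, 2022: 690}

def pct_change(series, start, end):
    return (series[end] - series[start]) / series[start]

print(f"2010-2022: {pct_change(counts, 2010, 2022):+.0%}")  # -30%: steady long-run decline
print(f"2020-2022: {pct_change(counts, 2020, 2022):+.0%}")  # +8%: "crime is surging"
```

Both numbers are arithmetically correct; only the full series reveals which framing is honest.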

6. Truncated Y-Axes on Bar Charts Are a Reliable Deception Indicator

Bar charts whose y-axis starts at a value other than zero visually misrepresent the proportional difference between values. For bar charts, the zero baseline is not arbitrary — it reflects the visual expectation that bar height encodes the full quantity. Check this on every bar chart you encounter. For line charts of continuous variables, truncated axes are sometimes defensible; for bar charts, they are almost always misleading.
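The distortion can be quantified. With hypothetical values 50 and 52 (a 4% difference) and an axis starting at 49:

```python
a, b = 50, 52       # the true values differ by 4%
baseline = 49       # a truncated y-axis starting just below the smaller bar

true_ratio = b / a                                # 1.04: bars drawn from zero look nearly equal
apparent_ratio = (b - baseline) / (a - baseline)  # 3.0: the second bar is drawn three times taller

print(f"true ratio: {true_ratio:.2f}, apparent ratio: {apparent_ratio:.1f}")
```

The closer the baseline creeps to the smaller value, the larger the visual exaggeration, without a single number being false.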

7. Correlation Does Not Establish Causation — and Knowing This Is Not Enough

Most educated readers know the correlation/causation cliché. Fewer can identify the specific alternative causal structures that could explain a given correlation (reverse causation, common cause, chance) or describe what evidence would distinguish them. The key question for any claimed association: what confounders might explain this? What design — ideally randomized — would more convincingly establish causality?

8. P-Values Are Widely Misunderstood and Frequently Misused

A p-value is not the probability that the null hypothesis is true. It is not the probability that the result was due to chance. It is the probability of observing results at least as extreme as those obtained, if the null hypothesis were true. Statistical significance at p < 0.05 does not mean the finding is true, important, or replicable. In large samples, tiny effects routinely produce small p-values; such results are statistically significant yet often represent trivially small, practically unimportant relationships.
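A hand-computed two-sample z-test (with illustrative numbers) shows the large-sample trap: a 0.1-point difference on a scale with a 15-point standard deviation is negligible, yet the p-value is minuscule:

```python
import math

n = 1_000_000                              # participants per group (hypothetical)
mean_a, mean_b, sd = 100.0, 100.1, 15.0    # a 0.1-point difference on a 15-SD scale

se = sd * math.sqrt(2 / n)                 # standard error of the difference in means
z = (mean_b - mean_a) / se
p = math.erfc(z / math.sqrt(2))            # two-sided p-value for the z statistic
d = (mean_b - mean_a) / sd                 # Cohen's d, about 0.007: a negligible effect

print(f"z = {z:.1f}, p = {p:.1e}, d = {d:.4f}")  # p lands on the order of 1e-6
```

A headline reading "highly significant difference found (p < 0.0001)" would be technically true and substantively empty.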

9. The Replication Crisis Matters for Misinformation Research

A substantial fraction of published social psychology findings — including some cited in research on misinformation, persuasion, and cognitive bias — has failed rigorous replication. Findings like the "backfire effect" (corrections making people believe misinformation more strongly) have been substantially weakened by subsequent evidence. This does not invalidate the field but requires calibrated skepticism: prefer replicated findings over single studies, pre-registered over exploratory research, large samples over small, and findings with plausible mechanisms over counterintuitive surprises.

10. Polls Measure More Than They Appear To — And Less

The stated margin of error on a poll captures only sampling variability — the randomness of drawing a sample rather than surveying everyone. It says nothing about non-response bias, question wording effects, question order effects, social desirability bias, coverage gaps, or weighting errors. All of these can produce systematic poll errors much larger than the stated margin of error. Opt-in internet polls with large stated sample sizes (e.g., n = 10,000) have essentially meaningless stated margins of error because they are not probability samples.
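The stated margin of error is just the standard sampling-variability formula, and it is worth seeing how little that formula contains (a sketch of the usual 95% calculation):

```python
import math

def margin_of_error(p, n, z=1.96):
    """95% margin of error for a proportion: sampling variability only.
    Says nothing about non-response, wording, coverage, or weighting errors."""
    return z * math.sqrt(p * (1 - p) / n)

# A candidate at 50% in a probability sample of 1,000 respondents:
print(f"\u00b1{margin_of_error(0.5, 1000):.1%}")  # ±3.1%
```

The only inputs are the proportion and the sample size; every non-sampling error source the paragraph lists is invisible to it.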

11. Choropleth Maps Mislead Because Geography Is Not Population

Election results, COVID-19 case rates, income levels, and other population statistics mapped by geographic area systematically misrepresent the data when population density varies across the map. Sparsely populated large areas dominate the visual display. Always ask: does this map encode the variable I care about (votes, people, cases) or does it encode the geographic area of the political units used to display the data?

12. Effect Size Matters More Than Statistical Significance

For any reported finding — especially in medicine, psychology, and social science — the effect size is more informative than the p-value. Cohen's d, r, odds ratios, and NNT all quantify how large an effect is. A highly statistically significant but tiny effect (common in large-dataset studies) may be unimportant practically. A moderately significant but large effect may be highly important. Both significance and magnitude must be evaluated.
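Cohen's d is the difference in means scaled by the pooled standard deviation. A sketch with hypothetical groups shows why the same raw difference can be a modest or a very large effect:

```python
import math

def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var)

# The same 5-point raw difference against different amounts of spread:
print(f"{cohens_d(105, 15, 50, 100, 15, 50):.2f}")  # 0.33: small-to-medium
print(f"{cohens_d(105, 3, 50, 100, 3, 50):.2f}")    # 1.67: very large
```

A p-value alone cannot distinguish these two findings; the effect size does so immediately.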

13. The GDP Does Not Measure What Most People Think It Measures

GDP measures total economic production, not typical living standards, wellbeing, sustainability, leisure time, household work, inequality, or health. GDP per capita divides total production by population, concealing distribution. A growing GDP accompanied by rising inequality can leave median household income stagnant. Alternative measures (median household income, Gini coefficient, Human Development Index, Genuine Progress Indicator) reveal aspects of economic reality that GDP conceals.

14. Official Unemployment Rate Excludes Millions of Workers

The U-3 unemployment rate counts only people without jobs who are actively searching. It excludes discouraged workers who have given up searching (captured in U-4), marginally attached workers not currently searching (U-5), and involuntary part-time workers (U-6). During economic downturns, the official unemployment rate can actually fall as discouraged workers leave the measured labor force. The labor force participation rate provides essential context.
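Stylized, entirely made-up labor force figures (in thousands) show how the official rate and the broader U-6 measure diverge:

```python
# Hypothetical figures, in thousands; definitions follow the BLS U-3/U-6 structure.
employed            = 150_000
unemployed          = 6_000   # jobless AND actively searching (the U-3 numerator)
marginally_attached = 1_500   # want work but not searching (includes discouraged workers)
part_time_economic  = 4_000   # part-time only because full-time work is unavailable

labor_force = employed + unemployed
u3 = unemployed / labor_force
u6 = (unemployed + marginally_attached + part_time_economic) / (labor_force + marginally_attached)

print(f"U-3: {u3:.1%}")  # 3.8%
print(f"U-6: {u6:.1%}")  # 7.3%: nearly double the headline rate
```

Note that discouraged and marginally attached workers sit outside the U-3 denominator entirely, which is why their departure from the labor force can lower the headline rate.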

15. The Data Literacy Checklist Is a Universal Tool

Every quantitative claim in media can be evaluated by asking: What is being measured? What is the comparison? What is the baseline? What timeframe was chosen? What sample was this drawn from? Who collected this data and why? Is this mean or median? Is this relative or absolute? Has this been replicated? Is this correlation or causation? Were confounders addressed? What would the visualization look like with a zero-baseline?

These twelve questions, consistently applied, will not make you immune to statistical deception — no checklist can do that. But they will dramatically improve your ability to recognize when you are being misled by numbers, to ask the right follow-up questions, and to withhold the credibility that statistical presentation does not automatically deserve.


The Central Insight

Statistics are constructed artifacts, not neutral descriptions of reality. They are built from choices about what to measure, how to measure it, which denominator to use, which timeframe to select, how to visualize the result, and how to frame the conclusion. Every one of those choices can be made honestly or dishonestly, carefully or carelessly, in the service of truth or in the service of a predetermined conclusion.

Statistical literacy is the capacity to see those choices — to look through the apparent objectivity of a number to the human decisions that produced it. This does not require cynicism about statistics as a tool. It requires precisely the opposite: enough respect for what statistics can genuinely reveal to insist that they be used rigorously, and enough understanding of how they can mislead to resist being fooled.

In a media environment where quantitative claims are used to support every conceivable position, where data visualizations spread virally on social media, and where the authority of numbers is routinely exploited to lend false credibility to misleading claims, this is not an optional skill. It is a requirement of responsible citizenship.