Case Study 2: How a Misleading Axis Fooled Millions — Deconstructing Bad Charts

Contributors to Introduction to Data Science

Tier 2 — Attributed Real-World Examples: This case study examines real examples of misleading charts that appeared in news broadcasts, political campaigns, and corporate presentations. The specific examples described are based on widely documented and discussed instances catalogued by visualization researchers. Sources are attributed to the extent possible. Some contextual details are reconstructed for pedagogical clarity.

The Setting

It's a Tuesday evening in November, and Jordan — the university student investigating grading patterns at their school — is doing homework in front of the TV. A cable news segment catches their attention. The anchor is discussing a recent change in a government program, and a bar chart fills the screen. The chart shows two bars: one representing the program's enrollment last year, and one representing enrollment this year.

"As you can see," the anchor says, "enrollment has absolutely skyrocketed."

Jordan glances at the chart. The bar on the right is roughly four times taller than the bar on the left. That does look dramatic. But something nags at them. They pull out their phone, find the original data, and discover that enrollment went from 6.0 million to 6.3 million — a 5% increase.

Five percent. Not "skyrocketing." Not four times taller. Five percent.

Jordan looks at the chart again and sees it: the y-axis starts at 5.9 million, not at zero. With the baseline there, the left bar is drawn 0.1 million tall and the right bar 0.4 million tall, so a 0.3 million increase looks like a fourfold jump. The bar chart's visual language (bars encode values as lengths, so a bar four times taller should mean a value four times larger) has been violated. The chart tells a visual lie, even though every number on it is technically correct.

Welcome to the world of misleading charts. In this case study, we'll examine real-world examples of charts that deceive — some through incompetence, some through deliberate manipulation — and build a systematic framework for spotting them.

Example 1: The Truncated Y-Axis

The scenario Jordan noticed is not hypothetical. Truncated y-axes in bar charts are one of the most common and well-documented forms of visual deception, appearing regularly on cable news, in political advertising, and in corporate presentations.

How It Works

A bar chart encodes values as lengths. When you see a bar that reaches twice as high as another bar, your brain registers "twice the value" — automatically, pre-attentively, before you even check the axis labels. This is Cleveland and McGill's visual encoding at work: length is one of the most accurately perceived visual properties.

When the y-axis starts at zero, the encoding is honest. A bar at 60 is twice as tall as a bar at 30. But when the y-axis is truncated — starting at 50, say — the bar at 60 appears to be ten units tall, and the bar at 30... isn't visible at all. In practice, truncation usually narrows the range to make modest differences look dramatic.
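The arithmetic behind truncation can be checked directly. The sketch below is illustrative only: the helper names `drawn_height` and `apparent_ratio` and the second bar's value of 55 are assumptions introduced here, not part of any charting library. It compares how tall two bars look relative to each other under a zero baseline and under a baseline of 50.

```python
def drawn_height(value, baseline):
    """Height of a bar as drawn when the y-axis starts at `baseline`."""
    # A bar whose value falls below the baseline is clipped to zero height.
    return max(value - baseline, 0.0)


def apparent_ratio(a, b, baseline):
    """Ratio of the drawn heights of bar a to bar b for a given baseline."""
    height_b = drawn_height(b, baseline)
    if height_b == 0:
        raise ValueError("bar b is not visible at this baseline")
    return drawn_height(a, baseline) / height_b


# Hypothetical values: a bar at 60 next to a bar at 55.
true_ratio = apparent_ratio(60, 55, baseline=0)
truncated_ratio = apparent_ratio(60, 55, baseline=50)
print(f"zero baseline: {true_ratio:.2f}x, baseline at 50: {truncated_ratio:.2f}x")
```

With a zero baseline the taller bar is about 9% taller, which matches the data. With the baseline at 50 it is drawn twice as tall, although neither value has changed.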

A Well-Known Example

One of the most widely cited examples of y-axis truncation appeared on a major U.S. cable news network during a segment about changes to the Affordable Care Act (ACA). The chart showed enrollment figures over several months, with bars representing enrollment counts. The y-axis started at approximately 6 million rather than zero, making fluctuations of a few hundred thousand enrollees — on a base of more than 6 million — look like dramatic swings. The chart was later highlighted by numerous media critics and data visualization experts as a textbook example of misleading axis truncation.

Alberto Cairo, a professor of visual journalism at the University of Miami and author of How Charts Lie: Getting Smarter about Visual Information (W. W. Norton, 2019), catalogued this and similar examples. Cairo's central argument is that charts can lie not by showing false numbers but by encoding truthful numbers in visually misleading ways.

The Fix

For bar charts, always start the y-axis at zero. This is not a stylistic preference — it's a requirement imposed by the visual encoding. If the differences you want to show are too small to be visible with a zero-baseline bar chart, that itself is meaningful information. Perhaps the differences really are small. Or perhaps a bar chart is the wrong chart type — a dot plot or line chart, where the encoding is position rather than length, can meaningfully zoom in on a narrow range without lying.

The Nuance: When Is a Non-Zero Axis OK?

Remember from Chapter 14's main text: line charts are different from bar charts. Line charts encode values as positions, and trends as slopes. A line chart of stock prices from $150 to $155 with a y-axis from $148 to $157 is perfectly fine — you're showing the shape of the change, not the absolute level. The y-axis doesn't need to start at zero because the line's height is not being compared to a baseline.

This distinction — lengths must start at zero, positions don't have to — is one of the most important principles in honest chart design.
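One way to make this rule hard to forget is to encode it in a small helper that picks axis limits from the chart's encoding. The following is a minimal sketch under stated assumptions: the function name `y_limits`, the `encoding` labels, and the 10% padding are all choices made for illustration, and the stock prices are hypothetical.

```python
def y_limits(values, encoding, pad_fraction=0.1):
    """Suggest (low, high) y-axis limits for a chart.

    encoding: "length" for bar charts, "position" for line or dot plots.
    """
    low, high = min(values), max(values)
    # Pad by a fraction of the data range; fall back to 1.0 for flat data.
    pad = (high - low) * pad_fraction or 1.0
    if encoding == "length":
        # Bars must include zero so drawn length stays proportional to value.
        return (min(0.0, low), max(0.0, high + pad))
    if encoding == "position":
        # Lines and dots may zoom in on the range the data actually covers.
        return (low - pad, high + pad)
    raise ValueError(f"unknown encoding: {encoding!r}")


prices = [150.0, 152.5, 151.0, 155.0]  # hypothetical stock prices
line_limits = y_limits(prices, "position")
bar_limits = y_limits(prices, "length")
print("line chart limits:", line_limits)
print("bar chart limits:", bar_limits)
```

The resulting pair could then be passed to whatever axis-limit setting your charting tool provides. The point of the sketch is only that the decision depends on the encoding, not on taste.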

Example 2: The Cherry-Picked Time Range

In 2015, a U.S. Congressional hearing featured a chart that was immediately flagged by statisticians and data visualization experts. The chart showed two trend lines: one for cancer screening services at an organization and one for abortion services, plotted over time. The lines crossed dramatically, suggesting that the organization had shifted its focus from health screening to abortions.

The problems were numerous and quickly identified by media fact-checkers and statisticians. The two lines were plotted on different implicit scales (the actual numbers for the two services were vastly different in magnitude, but the lines were presented as if they were on the same scale). The time range was chosen to maximize the visual drama of the crossing pattern. And the chart lacked axis labels that would have revealed the scale discrepancy.

This chart was discussed extensively in the media and became a widely used teaching example in data visualization courses. PolitiFact and other fact-checking organizations analyzed it in detail.

How Cherry-Picking Works

Any time series that fluctuates can be made to tell almost any story by choosing the right start and end dates:

- Want to show growth? Start at a local minimum.
- Want to show decline? Start at a local maximum.
- Want to show stability? Start and end at similar values, even if there was wild fluctuation in between.
- Want to hide a recovery? End the chart before the recovery begins.

This is not always deliberate deception — sometimes analysts genuinely don't realize that their choice of time range is loading the argument. But the effect is the same: the viewer gets a distorted picture.
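A few lines of code show how much the headline number depends on the starting point. The series below is hypothetical, and `percent_change` is a helper written for this illustration. The same data yields growth, decline, or a strong recovery depending on where the chart begins.

```python
def percent_change(series, start, end):
    """Percent change between two index positions of a series."""
    return (series[end] - series[start]) / series[start] * 100


# Hypothetical yearly values with a peak, a dip, and a partial recovery.
years = [2015, 2016, 2017, 2018, 2019, 2020, 2021]
values = [100, 120, 90, 80, 95, 110, 105]

last = len(values) - 1
peak = values.index(max(values))      # position of the local maximum
trough = values.index(min(values))    # position of the local minimum

full = percent_change(values, 0, last)
from_peak = percent_change(values, peak, last)
from_trough = percent_change(values, trough, last)

print(f"{years[0]} to {years[last]}: {full:+.2f}%")
print(f"{years[peak]} to {years[last]}: {from_peak:+.2f}%")
print(f"{years[trough]} to {years[last]}: {from_trough:+.2f}%")
```

All three figures are arithmetically correct. Only the choice of window differs, which is why the time range deserves as much scrutiny as the axis.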

The Fix

Show as much time context as reasonably available. If you must focus on a specific period, show the focused period in context — perhaps as a highlighted section within a longer time range. And always ask yourself: "Would the story change if I extended the chart by two years in either direction?" If the answer is yes, you probably need a wider range.

Example 3: The Dual Y-Axis Trap

Tyler Vigen, a law student at Harvard, created a website called "Spurious Correlations" that pairs unrelated time series datasets on dual-axis charts. The results are both hilarious and instructive:

U.S. spending on science, space, and technology vs. suicides by hanging, strangulation, and suffocation (correlation: 0.998)
Per capita cheese consumption vs. number of people who died by becoming tangled in their bedsheets (correlation: 0.947)
Nicolas Cage films released per year vs. number of people who drowned by falling into a swimming pool (correlation: 0.666)

Vigen's examples are deliberately absurd — no one seriously believes Nicolas Cage movies cause drownings. But they demonstrate a real danger: when two variables are plotted on independently scaled dual y-axes, any two upward trends will appear visually correlated, regardless of whether there's any actual relationship.

How It Works

The creator of a dual-axis chart gets to choose the scale of each y-axis independently. By adjusting the range of each axis, they can make any two lines appear to move in lockstep, diverge, or cross at any desired point. The visual impression of "correlation" is entirely controlled by the scale choices, not by the data.
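To see why the scale choices dominate, it helps to compute where each point lands on screen. The sketch below assumes a simplified model of a chart, in which each axis linearly maps its own range onto the same vertical space. The function `screen_position` and both series are hypothetical.

```python
def screen_position(values, axis_min, axis_max):
    """Map data values to a 0-1 vertical position for a given axis range."""
    span = axis_max - axis_min
    return [(v - axis_min) / span for v in values]


# Hypothetical series in unrelated units.
series_a = [1000, 1010, 1020, 1030]  # grows 3% overall
series_b = [2, 4, 6, 8]              # grows 300% overall

# Fit each axis tightly to its own series (an assumption standing in for
# the independent scaling that a dual-axis chart allows).
pos_a = screen_position(series_a, min(series_a), max(series_a))
pos_b = screen_position(series_b, min(series_b), max(series_b))
lines_overlap = all(abs(a - b) < 1e-9 for a, b in zip(pos_a, pos_b))
print("lines overlap:", lines_overlap)

# Widening one axis makes the same data appear to lag far behind.
pos_b_wide = screen_position(series_b, 0, 40)
print("series_b on a 0-40 axis:", pos_b_wide)
```

A 3% rise and a 300% rise trace the same line once each axis is fitted to its own data, and a different axis range for the second series breaks the apparent match without touching a single value.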

This technique has been used in political arguments ("Crime rose at the same time immigration rose — look, the lines match!"), marketing ("Our social media engagement tracked perfectly with sales!"), and pseudoscience ("5G tower installations correlate with COVID cases!").

The Fix

Avoid dual y-axes entirely whenever possible. If you need to compare two variables:

- Normalize them to a common scale (e.g., percentage change from baseline) and plot on a single axis.
- Use two separate panels (facets) stacked vertically, each with its own clearly labeled axis.
- Compute the actual correlation and report it as a number, rather than relying on visual impression.

If you must use dual axes (some industries and audiences expect them), clearly label both axes, use distinct visual styles (e.g., bars for one variable, a line for the other), and include a note acknowledging that visual alignment does not imply a causal relationship.
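Two of the alternatives listed above, normalizing to a common scale and reporting the correlation as a number, can be sketched in plain Python. The helper names `index_to_baseline` and `pearson_r` and the two series are assumptions made for this illustration.

```python
from math import sqrt


def index_to_baseline(values):
    """Express each value as percent change from the first value."""
    base = values[0]
    return [(v - base) / base * 100 for v in values]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical series in unrelated units.
series_a = [1000, 1010, 1020, 1030]
series_b = [2, 4, 6, 8]

change_a = index_to_baseline(series_a)
change_b = index_to_baseline(series_b)
r = pearson_r(series_a, series_b)

print("series_a, % change from baseline:", change_a)
print("series_b, % change from baseline:", change_b)
print(f"Pearson r: {r:.3f}")
```

On a shared percent-change axis the two series no longer look alike, because one barely moves while the other quadruples. Note that the correlation coefficient is still close to 1 here: any two steadily rising series will correlate strongly, so the number should be reported alongside the caveat that it says nothing about causation.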

Example 4: Area and 3D Distortion

In the early days of USA Today's data graphics (1980s-1990s), the newspaper popularized colorful infographics that used three-dimensional icons to represent data. Oil barrels of different heights for petroleum production. Money bags of different sizes for budget figures. Airplanes of different scales for passenger counts.

The problem is geometric. When you scale an icon proportionally in all dimensions to represent a larger number:

- Doubling the height also doubles the width, making the area 4x larger.
- If it's a 3D icon, doubling the height, width, and depth makes the volume 8x larger.

The eye perceives area and volume, not linear dimensions. So a "twice as tall" icon looks four times as large in 2D or eight times as large in 3D. Even a 50% increase in a value gets displayed as an icon that appears more than twice as large in 2D (1.5 × 1.5 = 2.25) and over three times as large in 3D (1.5 × 1.5 × 1.5 ≈ 3.4).
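The geometry is a one-line power law, shown here as a small sketch. The function name `perceived_scale` is an assumption, and "perceived" is used loosely: the code computes the geometric area or volume ratio, not a model of human perception.

```python
def perceived_scale(value_ratio, dimensions):
    """Size ratio of an icon scaled uniformly along `dimensions` axes."""
    return value_ratio ** dimensions


# Compare an honest bar (1 dimension) with 2D and 3D icons.
for ratio in (1.5, 2.0):
    print(
        f"value x{ratio}: "
        f"bar x{perceived_scale(ratio, 1):.2f}, "
        f"2D icon x{perceived_scale(ratio, 2):.2f}, "
        f"3D icon x{perceived_scale(ratio, 3):.2f}"
    )
```

Only the one-dimensional case keeps the drawn size equal to the value ratio, which is why a plain bar remains the safe choice.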

Edward Tufte coined the term "lie factor" for this distortion:

Lie factor = (size of effect shown in the graphic) / (size of effect in the data)

A lie factor of 1.0 means the graphic is proportional to the data. A lie factor of 4.0 means the graphic makes the effect look four times larger than it actually is. Tufte documented lie factors of 5, 10, and even higher in published graphics.
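The formula can be applied to the scaled-icon case described earlier. This sketch takes "size of effect" to mean relative change, (second − first) / first, which follows Tufte's usage. The function names and the sample values are assumptions for illustration.

```python
def effect_size(first, second):
    """Relative change from `first` to `second`."""
    return (second - first) / first


def lie_factor(shown_first, shown_second, data_first, data_second):
    """Effect shown in the graphic divided by the effect in the data."""
    return effect_size(shown_first, shown_second) / effect_size(data_first, data_second)


# Hypothetical data rising 50%, from 100 to 150.
# A 2D icon scaled in both dimensions grows in area from 1.0 to 1.5 ** 2.
icon_factor = lie_factor(1.0, 1.5 ** 2, 100, 150)

# An honest bar grows in length exactly as the data does.
bar_factor = lie_factor(100, 150, 100, 150)

print(f"2D icon lie factor: {icon_factor:.2f}")
print(f"bar lie factor: {bar_factor:.2f}")
```

The icon shows a 125% increase in area for a 50% increase in the data, giving a lie factor of 2.5, while the bar stays at 1.0.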

The Fix

Don't use scaled icons to represent quantitative values. Use bar charts. If you must use icons for aesthetic reasons (in an infographic, for example), scale them in only one dimension (height) and keep the other dimension constant — essentially turning the icon into a fancy bar.

Example 5: The Missing Baseline or Context

In 2020, during the early days of the COVID-19 pandemic, charts of case counts and death counts proliferated across news media. Many of these charts were well-designed and informative. But some omitted crucial context:

Charts showing cumulative case counts (which can only go up, by definition) without noting that cumulative charts always increase — the relevant question is whether the rate of increase is accelerating or decelerating.
Charts comparing absolute case counts across countries with vastly different populations, without normalizing per capita.
Charts showing case counts without noting changes in testing capacity — more testing mechanically produces more detected cases, even if the underlying infection rate is stable.

None of these charts contained false numbers. But by omitting context, they led viewers to conclusions that the data didn't actually support.

The Fix

Always provide context. For comparison charts, normalize appropriately (per capita, per test, per hospital bed). For trend charts, consider whether cumulative or daily counts better answer the question. For any chart, ask: "What additional information does the viewer need to correctly interpret this data?"
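Two of these normalizations are simple enough to sketch. All figures below are hypothetical, and the helper names `per_100k` and `daily_from_cumulative` were chosen for this illustration.

```python
def per_100k(count, population):
    """Convert an absolute count to a rate per 100,000 people."""
    return count / population * 100_000


def daily_from_cumulative(cumulative):
    """Recover per-period increases from a cumulative series."""
    return [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]


# Hypothetical regions: (case count, population).
regions = {
    "Region A": (50_000, 10_000_000),
    "Region B": (20_000, 1_000_000),
}
rates = {name: per_100k(cases, pop) for name, (cases, pop) in regions.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.0f} cases per 100,000")

# Hypothetical cumulative counts: always rising, by definition.
cumulative = [100, 250, 450, 600, 700]
daily = daily_from_cumulative(cumulative)
print("new cases per period:", daily)
```

Region A has more cases in absolute terms but a quarter of Region B's per capita rate. The cumulative series climbs throughout, yet the per-period increases peak and then fall, which is the question a cumulative chart tends to hide.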

A Framework for Chart Literacy

Based on these examples, here's a systematic framework for evaluating any chart you encounter. Think of it as the visualization equivalent of the critical reading skills you developed in Chapter 1:

The Five-Question Audit

1. What is the chart claiming? State the message in one sentence. "This chart claims that enrollment skyrocketed" or "This chart claims that two trends are related."

2. What are the axes? Check the y-axis: does it start at zero? Is the scale linear or logarithmic? Are there dual axes? Check the x-axis: what time range is shown? Is it complete or cherry-picked?

3. What is the visual encoding? Are values encoded as lengths (bar chart — must start at zero), positions (line/scatter — non-zero axis may be fine), areas (watch for distortion), or angles (pie chart — hard to read precisely)?

4. What is NOT shown? Is context missing? Benchmarks? Confidence intervals? A longer time range? Per capita normalization? The omission of relevant information is itself a form of distortion.

5. Who made this, and what is their incentive? A pharmaceutical company charting its drug's effectiveness has a different incentive than an independent researcher. A political campaign has a different incentive than a nonpartisan fact-checker. The incentive doesn't automatically mean the chart is wrong, but it tells you where to look more carefully.

Jordan's Application

Back in their dorm room, Jordan decides to apply this framework to their own project. They're investigating whether grading patterns at their university show bias across departments or demographics. They realize that their own charts could mislead if they're not careful:

If they make a bar chart of average GPAs by department, they need the y-axis to start at zero — otherwise a difference between 3.1 and 3.3 could look enormous.
If they compare grades across demographic groups, they need to control for department and course level — a raw comparison might just reflect that certain groups tend to major in disciplines with different grading norms.
If they show a trend over time, they need enough years of data to see the full picture — not just two cherry-picked years.
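The second point, controlling for department, is easiest to appreciate with a toy dataset. Everything below is hypothetical: the records, the group labels, and the helper `mean_by` are invented for illustration and say nothing about Jordan's actual data.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (group, department, GPA).
records = [
    ("X", "Humanities", 3.6), ("X", "Humanities", 3.6), ("X", "Humanities", 3.6),
    ("X", "Engineering", 3.0),
    ("Y", "Humanities", 3.6),
    ("Y", "Engineering", 3.0), ("Y", "Engineering", 3.0), ("Y", "Engineering", 3.0),
]


def mean_by(rows, key):
    """Average GPA for each bucket produced by key(group, department)."""
    buckets = defaultdict(list)
    for group, department, gpa in rows:
        buckets[key(group, department)].append(gpa)
    return {bucket: round(mean(gpas), 2) for bucket, gpas in buckets.items()}


# Raw comparison: ignores which departments each group enrolls in.
raw = mean_by(records, lambda group, department: group)

# Controlled comparison: compare groups within the same department.
within = mean_by(records, lambda group, department: (department, group))

print("raw means by group:", raw)
print("means within department:", within)
```

In this constructed example the raw averages differ by 0.3 grade points, yet within each department the two groups have identical averages. The gap comes entirely from which departments the groups are concentrated in.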

"Being honest with charts," Jordan writes in their notebook, "isn't just about not lying. It's about not accidentally lying either."

That sentence captures the heart of visualization ethics better than any textbook definition could.

The Responsibility of the Chart Maker

Here's the uncomfortable truth: you will have the power to mislead people with charts. After Chapters 15-17, you'll be able to create polished, professional-looking visualizations that can make small effects look enormous, hide inconvenient trends, and imply causation where none exists.

The techniques in this case study are not secret knowledge — they're standard features of every charting tool. Truncating an axis takes one line of code. Cherry-picking a date range is a simple filter. Dual axes are a matplotlib option. The tool doesn't care about your intentions.

The responsibility falls on you. Every time you make a chart, you're making ethical choices:

- Where should the axis start?
- What time range should I show?
- What context am I providing, or omitting?
- Does the visual impression match the actual magnitude of the effect?
- Would a skeptical, informed viewer agree that my chart is fair?

These aren't just design questions. They're ethical questions. And they matter more than most people realize, because a single chart can shape opinions, drive policies, and influence millions of decisions.

As Alberto Cairo writes in How Charts Lie: "A chart can be simultaneously factually accurate and visually misleading. The numbers don't lie, but the chart does."

Your job, as a data scientist, is to make charts that don't.

Discussion Questions

Find a chart in a recent news article or social media post. Apply the Five-Question Audit from this case study. What did you find? Is the chart honest, misleading, or somewhere in between?
The cable news enrollment chart and the Congressional hearing chart both involve charts where the numbers are technically correct but the visual impression is misleading. Is there a meaningful difference between showing false numbers and showing true numbers in a misleading way? Where do you draw the ethical line?
Tyler Vigen's "Spurious Correlations" are obviously absurd. But what about cases where two variables do have a plausible-sounding connection (e.g., ice cream sales and crime rates)? How do dual-axis charts make it harder to distinguish genuine relationships from coincidental ones?
Think about your own progressive project. Identify one specific chart you plan to make and describe at least two ways you could make that chart misleading (even unintentionally). Then describe the design choices you'll make to keep it honest.

Sources

Cairo, Alberto. How Charts Lie: Getting Smarter about Visual Information. New York: W. W. Norton, 2019.
Tufte, Edward R. The Visual Display of Quantitative Information. 2nd edition. Cheshire, CT: Graphics Press, 2001.
Cleveland, William S., and Robert McGill. "Graphical Perception: Theory, Experimentation, and Application to the Development of Graphical Methods." Journal of the American Statistical Association 79, no. 387 (1984): 531-554.
Vigen, Tyler. Spurious Correlations. New York: Hachette Books, 2015. (Based on the website tylervigen.com.)
Monmonier, Mark. How to Lie with Maps. 3rd edition. Chicago: University of Chicago Press, 2018.
Huff, Darrell. How to Lie with Statistics. New York: W. W. Norton, 1954. (A classic popular treatment of statistical and visual deception.)
