Case Study 2: The Gallery of Misleading Charts — A Forensic Analysis

Contributors to Introduction to Data Science

Tier 3 — Illustrative/Composite Example: This case study follows Professor Okafor's data visualization class as students analyze a collection of misleading charts. All charts described are fictional composites inspired by common patterns found in media, corporate reports, and political advertising. No specific publication, company, or political figure is referenced. All data values, company names, and contexts are fabricated for pedagogical purposes. The misleading techniques demonstrated are real and widely documented in visualization literature.

The Setting

Professor Okafor opens class with a challenge: "Today I am going to show you five charts. All of them contain accurate data. Not a single number has been fabricated. And all of them are lying to you. Your job is to figure out how."

She projects the first chart.

Exhibit A: The Incredible Shrinking Tax Rate

The Chart

A bar chart from a fictional political advertisement titled "WE CUT YOUR TAXES!" The chart shows the average tax rate over four years:

Year	Tax Rate
2021	24.6%
2022	24.2%
2023	23.9%
2024	23.5%

The y-axis starts at 23.0% and ends at 25.0%.

What the Students See

The bar for 2024 is roughly one-third the height of the bar for 2021 (0.5 points above the 23.0% baseline versus 1.6 points). It looks like taxes were slashed by about 70%. The bars shrink dramatically from left to right, supporting the "WE CUT YOUR TAXES!" headline.

The Forensic Analysis

Student Maria speaks first: "The y-axis starts at 23%, not zero. The actual decrease is 1.1 percentage points across the four years shown, about 4.5% of the original rate. But visually, it looks like a reduction of roughly 70%."

Professor Okafor nods. "What makes this particularly effective as propaganda?"

Student James: "The all-caps title with an exclamation point primes the reader to see a dramatic change. The truncated axis delivers the visual 'proof.' Even if you notice the axis, the emotional impression has already been formed."
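The arithmetic behind the truncated axis can be checked directly. The sketch below uses only the 2021 and 2024 rates and the 23.0% baseline from the chart; the helper name `apparent_height_ratio` is introduced here for illustration and is not part of the case study.

```python
def apparent_height_ratio(first, last, baseline):
    """Ratio of the last bar's drawn height to the first bar's, for a given axis baseline."""
    return (last - baseline) / (first - baseline)


rate_2021, rate_2024 = 24.6, 23.5

# Actual change: in percentage points, and relative to the 2021 rate
point_drop = rate_2021 - rate_2024
relative_drop = point_drop / rate_2021

# Drawn bar heights when the axis starts at 23.0 versus at zero
truncated_ratio = apparent_height_ratio(rate_2021, rate_2024, baseline=23.0)
honest_ratio = apparent_height_ratio(rate_2021, rate_2024, baseline=0.0)

print(f"Actual drop: {point_drop:.1f} points ({relative_drop:.1%} of the 2021 rate)")
print(f"Axis from 23.0: 2024 bar is {truncated_ratio:.0%} as tall as the 2021 bar")
print(f"Axis from zero: 2024 bar is {honest_ratio:.0%} as tall as the 2021 bar")
```

With the axis starting at 23.0 the last bar is drawn at a little under a third of the first bar's height; with a zero baseline it is drawn at about 96%, which matches the true change of under 5%.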

The Honest Redesign

Professor Okafor shows two redesigns:

Redesign A: Zero-based bar chart

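import matplotlib.pyplot as plt

# Values from the chart in the ad
years = ["2021", "2022", "2023", "2024"]
rates = [24.6, 24.2, 23.9, 23.5]

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(years, rates, color="#4d96ff", width=0.5)
ax.set_ylim(0, 30)  # Bars encode value as length, so start at zero
ax.set_ylabel("Average Tax Rate (%)")
ax.set_title("Average Tax Rate, 2021-2024\n"
             "Declined 1.1 percentage points "
             "(from 24.6% to 23.5%)")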

With the y-axis starting at zero, the bars are nearly identical in height. The 1.1-point decrease is visible but proportionally small.

Redesign B: Dot plot with relevant range

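import matplotlib.pyplot as plt

years = ["2021", "2022", "2023", "2024"]
rates = [24.6, 24.2, 23.9, 23.5]

fig, ax = plt.subplots(figsize=(6, 3))
ax.plot(years, rates, "o-", color="#4d96ff",
        linewidth=2, markersize=8)
ax.set_ylim(22, 26)  # Position, not length, encodes the value
ax.set_ylabel("Average Tax Rate (%)")
ax.set_title("Tax Rate Declined Slightly\n"
             "From 24.6% to 23.5% over four years")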

The dot plot allows a non-zero baseline because position (not length) encodes the value. The trend is visible without exaggeration.

Key lesson: The original chart used accurate data and a valid chart type but manipulated the axis range to create a false visual impression. The deception is in the design, not the data.

Exhibit B: The Cherry-Picked Stock

The Chart

A line chart from a fictional investment newsletter titled "Our Fund Has Outperformed the Market!" The chart shows the fund's performance from March 2023 to September 2023, a seven-month window during which the fund indeed rose 18% while the S&P 500 rose only 8%.

What the Students See

A steep upward line (the fund) dramatically outperforming a gentle upward line (the market). The visual impression is overwhelming outperformance.

The Forensic Analysis

Student Kenji has been doing his own research: "I looked up this fund's full history. From January 2022 to December 2024 — the full three-year period — the fund returned 12% total while the S&P returned 31%. They cherry-picked the only seven-month window where they beat the market."

Professor Okafor: "How would you visualize the full picture?"

The Honest Redesign

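from datetime import datetime

import matplotlib.pyplot as plt

# Assumed inputs: full_dates holds datetime values for
# Jan 2022 - Dec 2024; fund_returns and sp500_returns hold
# cumulative returns (%) on those dates.
fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(full_dates, fund_returns,
        label="Fund", linewidth=2)
ax.plot(full_dates, sp500_returns,
        label="S&P 500", linewidth=2,
        linestyle="--")

# Highlight the cherry-picked window. Pass datetime values
# (not strings) so the span lands correctly on the date axis.
ax.axvspan(datetime(2023, 3, 1), datetime(2023, 9, 30),
           alpha=0.15, color="yellow",
           label="Window shown in ad")

ax.legend(fontsize=10)
ax.set_title("Fund Performance vs. S&P 500\n"
             "Full 3-Year View (ad showed only "
             "highlighted window)")
ax.set_ylabel("Cumulative Return (%)")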

The full picture shows the fund underperforming the market for most of the three-year period. The cherry-picked window is highlighted in yellow, making the selection visible and the deception transparent.

Key lesson: Cherry-picking does not fabricate data — it hides context. The antidote is always the full picture.
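One way to test a performance claim for cherry-picking is to compare every window of the same length, not only the one shown. The sketch below is a minimal illustration: the index levels are hypothetical placeholders (not the fund's data from this case study), and the function names are introduced here for the example.

```python
def period_return(index_levels, start, end):
    """Percent return between two positions of an index-level series."""
    return (index_levels[end] / index_levels[start] - 1) * 100


def windows_where_fund_wins(fund, market, length):
    """Start positions of every window of `length` steps in which the fund beat the market."""
    wins = []
    for start in range(len(fund) - length):
        end = start + length
        if period_return(fund, start, end) > period_return(market, start, end):
            wins.append(start)
    return wins


# Hypothetical month-end index levels, made up for this sketch only
fund_levels = [100, 98, 97, 104, 110, 108, 106, 105]
market_levels = [100, 103, 106, 108, 110, 114, 118, 121]

full_fund = period_return(fund_levels, 0, len(fund_levels) - 1)
full_market = period_return(market_levels, 0, len(market_levels) - 1)
winning_starts = windows_where_fund_wins(fund_levels, market_levels, length=2)

print(f"Full period: fund {full_fund:.1f}%, market {full_market:.1f}%")
print(f"Two-step windows where the fund wins: {winning_starts} "
      f"out of {len(fund_levels) - 2} possible")
```

In this made-up series the fund trails the market over the full period yet wins in two of the six short windows, which is exactly the gap an advertiser can exploit by choosing the start and end dates.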

Exhibit C: The Magical Dual Axis

The Chart

A dual-axis chart from a fictional company's annual report titled "As We Invest in R&D, Customer Satisfaction Soars." The left y-axis shows R&D spending in millions (range: $10M-$50M). The right y-axis shows customer satisfaction score (range: 72-78 out of 100). Both lines go up and to the right, tracking each other closely.

What the Students See

Two lines moving together, implying that R&D spending causes customer satisfaction to rise. The chart is visually compelling — the parallel movement seems too consistent to be coincidence.

The Forensic Analysis

Student Priya: "I measured the actual changes. R&D spending increased by 400% — from $10M to $50M. Customer satisfaction increased by 8.3% — from 72 to 78. A 400% change and an 8.3% change are being shown as visually identical movements."

Student Omar: "And by adjusting the two y-axis ranges, you can make any two upward trends look parallel. Watch."

Omar sketches on the whiteboard: "If I change the right axis to 0-100 instead of 72-78, the satisfaction line becomes nearly flat. If I change the left axis to $0-$500M instead of $10M-$50M, the spending line becomes nearly flat. The perceived relationship depends entirely on the axis ranges, which the chart creator controls."
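Omar's point can be put in numbers: the steepness a reader sees depends on how much of the axis the change occupies. The sketch below computes that share for the axis ranges he mentions; `fraction_of_axis_used` is a name introduced here for illustration.

```python
def fraction_of_axis_used(start, end, axis_min, axis_max):
    """Share of the axis height occupied by the change from start to end."""
    return abs(end - start) / (axis_max - axis_min)


# Satisfaction rises from 72 to 78
sat_tight = fraction_of_axis_used(72, 78, axis_min=72, axis_max=78)   # axis in the report
sat_full = fraction_of_axis_used(72, 78, axis_min=0, axis_max=100)    # full 0-100 scale

# R&D spending rises from $10M to $50M
rd_tight = fraction_of_axis_used(10, 50, axis_min=10, axis_max=50)    # axis in the report
rd_wide = fraction_of_axis_used(10, 50, axis_min=0, axis_max=500)     # Omar's $0-$500M axis

print(f"Satisfaction: {sat_tight:.0%} of the axis as drawn, {sat_full:.0%} on a 0-100 axis")
print(f"R&D spending: {rd_tight:.0%} of the axis as drawn, {rd_wide:.0%} on a $0-$500M axis")
```

On the report's axes both lines span the full plot height, so they look alike. On the alternative axes the same data occupy 6% and 8% of the height, so either line can be made to look nearly flat.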

The Honest Redesign

Option A: Separate panels

import matplotlib.pyplot as plt

# Assumed inputs: years (2019-2024), rd_spending ($M), and
# satisfaction (0-100 score) are equal-length sequences.
fig, (ax1, ax2) = plt.subplots(2, 1,
                                figsize=(8, 6),
                                sharex=True)

ax1.plot(years, rd_spending, "o-",
         color="#4d96ff", linewidth=2)
ax1.set_ylabel("R&D Spending ($M)")
ax1.set_title("R&D Spending and Customer "
              "Satisfaction (2019-2024)")

ax2.plot(years, satisfaction, "o-",
         color="#E69F00", linewidth=2)
ax2.set_ylabel("Satisfaction Score (0-100)")
ax2.set_ylim(0, 100)  # Full scale

plt.tight_layout()

Option B: Normalized comparison

import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots(figsize=(8, 4))

# Normalize both to % change from baseline.
# Convert to arrays first so the division is element-wise
# even if the inputs are plain Python lists.
rd = np.asarray(rd_spending, dtype=float)
sat = np.asarray(satisfaction, dtype=float)
rd_norm = (rd / rd[0] - 1) * 100
sat_norm = (sat / sat[0] - 1) * 100

ax.plot(years, rd_norm, "o-",
        label="R&D Spending", linewidth=2)
ax.plot(years, sat_norm, "s--",
        label="Satisfaction", linewidth=2)
ax.set_ylabel("% Change from 2019 Baseline")
ax.legend()
ax.set_title("R&D Spending Grew 400% While "
             "Satisfaction Grew 8%")

The normalized version reveals the truth: R&D spending skyrocketed while satisfaction barely budged. The original dual-axis chart hid a nearly 50:1 difference in relative growth (400% versus 8.3%).

Key lesson: Dual-axis charts allow the creator to manufacture visual correlation between any two trends. Separate panels and normalization reveal the true relationship.

Exhibit D: The Inflated Bubble

The Chart

An infographic from a fictional industry report showing market share for three companies. Company A has 40% market share, represented by a circle. Company B has 20%, also a circle. Company C has 10%, also a circle.

The circles have diameters proportional to the values: Company A's circle is 4 times the diameter of Company C's.

What the Students See

Company A's circle appears to have roughly 16 times the area of Company C's circle. The visual impression is total market domination, not a 4:1 ratio.

The Forensic Analysis

Student Lena works through the math: "If the diameter is proportional to value, then area is proportional to value squared. A diameter ratio of 4:1 means an area ratio of 16:1. But the actual data ratio is 4:1. The chart inflates Company A's visual dominance by a factor of four."

Professor Okafor: "This is one of the most common errors in infographic design. And it is an error, not always intentional deception — many designers do not realize that human perception responds to area, not diameter."
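Lena's calculation generalizes to any pair of values. The sketch below compares the two scaling rules for the market shares in this exhibit; the function names are introduced here for illustration.

```python
import math


def area_ratio_if_diameter_scaled(value_ratio):
    """Visual area ratio when diameter is proportional to value (the misleading rule)."""
    return value_ratio ** 2


def diameter_ratio_for_area_scaling(value_ratio):
    """Diameter ratio needed so that area is proportional to value (the honest rule)."""
    return math.sqrt(value_ratio)


shares = {"A": 40, "B": 20, "C": 10}
smallest = min(shares.values())

for name, share in shares.items():
    value_ratio = share / smallest
    inflated = area_ratio_if_diameter_scaled(value_ratio)
    honest_diameter = diameter_ratio_for_area_scaling(value_ratio)
    print(f"Company {name}: data ratio {value_ratio:.0f}:1, "
          f"diameter-scaled area {inflated:.0f}:1, "
          f"honest diameter {honest_diameter:.2f}:1")
```

For Company A the diameter-scaled circle has 16 times the area of Company C's, while an area-proportional circle needs only twice the diameter.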

The Honest Redesign

import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(6, 4))
companies = ["A", "B", "C"]
shares = [40, 20, 10]

ax.barh(companies, shares, color="#4d96ff",
        height=0.5)
# Label each bar with its value, just past the bar's end
for i, v in enumerate(shares):
    ax.text(v + 1, i, f"{v}%", va="center",
            fontsize=11)

ax.invert_yaxis()  # List Company A at the top
ax.set_xlabel("Market Share (%)")
ax.set_title("Company A Leads with 40% "
             "Market Share")
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
ax.set_xlim(0, 50)

Or, if circles are desired for aesthetic reasons:

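import numpy as np
import matplotlib.pyplot as plt

shares = [40, 20, 10]
fig, ax = plt.subplots(figsize=(6, 3))  # New figure for the circles
# Area proportional to value (correct)
sizes = np.array(shares) * 50  # Scale factor
ax.scatter([1, 2, 3], [1, 1, 1], s=sizes,
           color="#4d96ff", alpha=0.7)
ax.set_xlim(0, 4)  # Leave room so the circles are not clipped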

Note: matplotlib's Axes.scatter (and plt.scatter) interprets s as marker area in points squared, so passing scaled data values directly produces correct area-proportional circles.

Key lesson: When encoding data as area, area (not diameter, not radius) must be proportional to the data value. Better yet, use bar charts where the encoding (length) is unambiguous.

Exhibit E: The Missing Context

The Chart

A press release chart titled "Customer Complaints Down 40%!" showing complaints declining from 500 in Q1 to 300 in Q4.

What the Students See

An impressive, steep decline. The company appears to have dramatically improved its service.

The Forensic Analysis

Student Elena asks the key question: "Down 40% from what? And why?"

She discovers (from a footnote in tiny print): the company lost 50% of its customers during this period due to a service outage. The number of complaints per customer actually increased from 0.005 to 0.006, a 20% worsening. (At those rates, 500 complaints correspond to 100,000 customers and 300 complaints to 50,000.)

Professor Okafor: "The chart is technically accurate. Complaints did decline from 500 to 300. But without the context of customer count, the chart tells the opposite of the true story."
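The relationship between raw counts, rates, and the hidden denominator can be checked with a few lines of arithmetic. The sketch below back-solves the customer counts implied by the complaint totals and per-customer rates quoted in this exhibit; the helper name `implied_customers` is introduced here for illustration.

```python
def implied_customers(complaints, complaints_per_customer):
    """Customer count implied by a complaint total and a per-customer rate."""
    return complaints / complaints_per_customer


q1_complaints, q4_complaints = 500, 300
q1_rate, q4_rate = 0.005, 0.006

q1_customers = implied_customers(q1_complaints, q1_rate)
q4_customers = implied_customers(q4_complaints, q4_rate)

# Relative changes from Q1 to Q4
complaint_change = q4_complaints / q1_complaints - 1
rate_change = q4_rate / q1_rate - 1
customer_change = q4_customers / q1_customers - 1

print(f"Raw complaints:          {complaint_change:+.0%}")
print(f"Complaints per customer: {rate_change:+.0%}")
print(f"Implied customer count:  {customer_change:+.0%}")
```

The raw count falls while the per-customer rate rises, which is only possible when the customer base shrinks faster than the complaints do. Running the same check on any headline count is a quick way to ask "relative to what?"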

The Honest Redesign

import matplotlib.pyplot as plt

# Assumed inputs: quarters (labels Q1-Q4), complaints
# (raw counts), and complaints_per_customer (complaints
# divided by that quarter's customer count).
fig, (ax1, ax2) = plt.subplots(1, 2,
                                figsize=(12, 4))

# Left: raw complaints (the story they told)
ax1.bar(quarters, complaints, color="#4d96ff")
ax1.set_title("Total Complaints (Raw Count)")
ax1.set_ylabel("Number of Complaints")

# Right: complaints per customer (the real story)
ax2.bar(quarters, complaints_per_customer,
        color="#E69F00")
ax2.set_title("Complaints Per Customer\n"
              "(The Real Story)")
ax2.set_ylabel("Complaints per Customer")

fig.suptitle("Context Changes the Narrative:\n"
             "Total Complaints Fell Because "
             "Customers Left",
             fontsize=13, y=1.05)
plt.tight_layout()

Key lesson: Always ask "relative to what?" Raw counts without denominators can reverse the true story. Provide rates, percentages, or per-capita figures alongside raw counts.

The Class Conclusion

Professor Okafor closes with a synthesis: "Every chart you saw today used real data. Every number was accurate. And every chart lied. The lies were in the design choices: axis ranges, time windows, scale manipulation, area distortion, and missing context."

"As data scientists, you will create charts that hundreds or thousands of people may see. You have the power to inform or to mislead. The techniques you learned today are not just things to avoid — they are things to detect when others use them. Critical chart literacy is as important as critical reading literacy."

"The next time you see a chart in the news, a report, or a social media post, run through the forensic checklist:"

Does the y-axis start at zero (for bar charts)?
What time range is shown, and what would the full range look like?
Are there two y-axes, and who chose the scales?
Is area proportional to value, or to diameter/radius?
What context is missing — baselines, denominators, comparisons?

"If you can answer these five questions about any chart you encounter, you will never be fooled. And if you apply them to every chart you create, you will never fool others."

Pedagogical Reflection

This case study works differently from the others in this textbook. Instead of following one analyst through a workflow, it presents a classroom critique session that models the critical thinking process:

Accurate data can produce dishonest charts. The distinction between data accuracy and visual honesty is the core lesson.
Each technique exploits a specific perceptual weakness. Truncated axes exploit the "bar length = value" perception. Dual axes exploit the "parallel movement = correlation" perception. Area distortion exploits the "bigger = more" perception.
Honest redesigns are always possible. For every misleading chart, there is a straightforward honest alternative that uses the same data.
Detection requires practice. The forensic checklist at the end gives students a concrete tool for evaluating any chart they encounter in the wild.
Ethics is a design choice, not a data choice. The data is never the problem. The problem is always in the visualization decisions — which are made by humans, not algorithms.
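The first checklist question (does a bar chart's y-axis include zero?) can also be checked in code when the chart is a matplotlib figure. The sketch below is one possible approach, not a method from the case study: `bar_axis_is_truncated` is a hypothetical helper, it only handles vertical bars, and it relies on matplotlib's `BarContainer` objects, which `Axes.bar` adds to `ax.containers`.

```python
import matplotlib
matplotlib.use("Agg")  # Headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.container import BarContainer


def bar_axis_is_truncated(ax):
    """Return True if ax holds bars and its y-axis range does not include zero."""
    has_bars = any(isinstance(c, BarContainer) for c in ax.containers)
    y_low, y_high = sorted(ax.get_ylim())
    return has_bars and not (y_low <= 0 <= y_high)


years = ["2021", "2022", "2023", "2024"]
rates = [24.6, 24.2, 23.9, 23.5]

fig, ax = plt.subplots()
ax.bar(years, rates)

ax.set_ylim(23.0, 25.0)              # Axis range from Exhibit A
truncated_flag = bar_axis_is_truncated(ax)

ax.set_ylim(0, 30)                   # Zero-based redesign
zero_based_flag = bar_axis_is_truncated(ax)

print(truncated_flag, zero_based_flag)
```

A check like this covers only one item on the checklist. Time windows, dual axes, area encoding, and missing denominators still need a human reader asking the remaining questions.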