Exercises: Lies, Distortions, and Honest Charts

DataField.Dev


These exercises do not require code. They require quantitative reasoning, critical evaluation, and ethical judgment. Many have no single correct answer — the quality of your reasoning matters more than the conclusion you reach.

Part A: Lie Factor Calculations (6 problems)

A.1 ★☆☆ | Apply

A bar chart shows Company A's market share at 22% and Company B's at 26%. The y-axis starts at 20%. Company A's bar is 10mm tall and Company B's bar is 30mm tall. Calculate the lie factor.

Answer

Data change: (26 - 22) / 22 = 18.2% increase. Visual change: (30 - 10) / 10 = 200% increase. Lie factor: 200 / 18.2 = 11.0. The chart exaggerates the difference by a factor of 11. Starting the axis at 20% transforms a moderate market share gap into a visual gulf. If the axis started at zero, Company A's bar would be approximately 85% the height of Company B's — a much more honest comparison.
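Although these exercises do not require code, the arithmetic above can be checked with a short Python sketch. The function name `lie_factor` and its parameter names are illustrative, not taken from the chapter; the function follows the definition used in this answer (relative visual change divided by relative data change).

```python
def lie_factor(data_start, data_end, visual_start, visual_end):
    """Relative change shown in the graphic divided by relative change in the data."""
    data_change = (data_end - data_start) / data_start
    visual_change = (visual_end - visual_start) / visual_start
    return visual_change / data_change

# A.1: market share 22% -> 26%, bar heights 10 mm -> 30 mm
a1 = lie_factor(22, 26, 10, 30)
print(round(a1, 1))  # 11.0

# With a zero baseline, bar height is proportional to the value itself
zero_based = lie_factor(22, 26, 22, 26)
print(round(zero_based, 1))  # 1.0
```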

A.2 ★★☆ | Apply

A news infographic shows two circles representing city populations. City X has 3 million residents and City Y has 6 million. The circle for City Y has twice the diameter of City X's circle. Calculate the lie factor.

Answer

Data change: (6 - 3) / 3 = 100% increase. Visual change: Area scales with the square of the diameter. If the diameter doubles, the area quadruples: (4 - 1) / 1 = 300% increase. Lie factor: 300 / 100 = 3.0. To represent a doubling of population honestly, the diameter of City Y's circle should be the square root of 2 (approximately 1.41) times the diameter of City X's circle, so that the areas are in a 2:1 ratio.
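The square-root correction can be sketched in a few lines of Python. The helper names and the 10 mm starting diameter are assumptions made for illustration; only the populations come from the exercise.

```python
import math

def honest_diameter(base_diameter, base_value, value):
    """Diameter that makes circle AREA proportional to the data value."""
    return base_diameter * math.sqrt(value / base_value)

def circle_area(diameter):
    return math.pi * (diameter / 2) ** 2

d_x = 10.0  # assumed diameter for City X, in mm
d_y = honest_diameter(d_x, 3_000_000, 6_000_000)

print(round(d_y / d_x, 2))                            # 1.41
print(round(circle_area(d_y) / circle_area(d_x), 2))  # 2.0
```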

A.3 ★★☆ | Apply

A pictograph shows oil production using barrel icons. Country A produces 5 million barrels/day (icon: 20mm wide, 30mm tall). Country B produces 10 million barrels/day (icon: 40mm wide, 60mm tall — both dimensions doubled). Calculate the lie factor based on visual area.

Answer

Data change: (10 - 5) / 5 = 100% increase. Visual area of A: 20 x 30 = 600 sq mm. Visual area of B: 40 x 60 = 2400 sq mm. Visual change: (2400 - 600) / 600 = 300% increase. Lie factor: 300 / 100 = 3.0. This is the standard pictograph distortion. Doubling both width and height quadruples the area. If the pictograph were rendered in 3D (doubling depth as well), the visual volume would increase eightfold, producing a lie factor of 7.0.
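The same pattern generalizes: when every dimension of an icon is multiplied by the same factor, the visual ratio is that factor raised to the number of dimensions. A minimal sketch, with a hypothetical helper name `scaled_lie_factor`:

```python
def scaled_lie_factor(data_ratio, linear_scale, dimensions):
    """Lie factor when each of `dimensions` axes of an icon is multiplied by `linear_scale`."""
    visual_ratio = linear_scale ** dimensions
    return (visual_ratio - 1) / (data_ratio - 1)

# Data doubles (5 -> 10 million barrels/day); the icon is doubled along each axis
print(scaled_lie_factor(2, 2, 1))  # 1.0  (height only: honest)
print(scaled_lie_factor(2, 2, 2))  # 3.0  (width and height: area)
print(scaled_lie_factor(2, 2, 3))  # 7.0  (width, height, depth: volume)
```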

A.4 ★★☆ | Apply

A line chart shows a stock price moving from $148 to $156 over one quarter. The y-axis runs from $140 to $160. Measured on the chart, the starting point is 40mm from the bottom of the plot area and the ending point is 80mm from the bottom. The plot area height is 100mm. Calculate the lie factor.

Answer

Data change: (156 - 148) / 148 = 5.4% increase. Visual change: (80 - 40) / 40 = 100% increase. Lie factor: 100 / 5.4 = 18.5. Note, however, that this is a line chart, not a bar chart. The viewer reads the slope and position, not the absolute distance from a baseline. The lie factor formula is most meaningful for encodings where the viewer compares lengths or areas to a reference point (zero). For line charts, the truncated axis is often appropriate — the question is whether the slope is being read as "dramatic" when the underlying change is small. The lie factor flags the issue; the chart maker must then judge whether the focused axis serves the viewer or misleads them.
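To see where the 40 mm and 80 mm measurements come from, the sketch below maps a data value to its position on a linear axis. The function name `position_mm` is an assumption; the zero-based comparison axis (0 to $160) is added for contrast and is not part of the exercise.

```python
def position_mm(value, axis_min, axis_max, plot_height_mm):
    """Distance from the bottom of the plot area for a value on a linear axis."""
    return (value - axis_min) * plot_height_mm / (axis_max - axis_min)

# Axis from $140 to $160, as in the exercise
start = position_mm(148, 140, 160, 100)
end = position_mm(156, 140, 160, 100)
print(start, end)    # 40.0 80.0

# Same data on an assumed zero-based axis ($0 to $160)
start0 = position_mm(148, 0, 160, 100)
end0 = position_mm(156, 0, 160, 100)
print(start0, end0)  # 92.5 97.5
```

On the focused axis the line rises 40 mm; on the zero-based axis it rises 5 mm. Which of the two serves the viewer is the judgment call the answer describes.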

A.5 ★★★ | Analyze

You encounter a chart with a lie factor of exactly 1.0. Does this guarantee the chart is honest? Identify three specific ways a chart with a perfect lie factor could still mislead.

Guidance

Think beyond visual proportions. A chart with a lie factor of 1.0 can still mislead through cherry-picked date ranges (the data shown is accurate but the window is chosen to support a narrative), context omission (missing denominators, comparisons, or uncertainty), Simpson's paradox (the aggregation hides subgroup reversals), misleading annotations or titles that frame the viewer's interpretation, or color choices that encode value judgments (red for one category, green for another) without data-driven justification.

A.6 ★★★ | Analyze

A government report shows two charts of the same healthcare spending data. Chart A uses a bar chart with a zero-based y-axis, showing spending of $3.8 trillion in 2020 and $4.1 trillion in 2024. Chart B uses a line chart with a y-axis from $3.5T to $4.5T, showing monthly spending over the same period with a clear upward trend and seasonal variation. Both have lie factors near 1.0. Which chart is more appropriate for a general public audience? Which for a health policy analyst audience? Justify your answers using concepts from this chapter and Chapter 2.

Guidance

Consider the visual encoding used (length vs. position), the audience's ability to interpret truncated axes, the information each chart reveals (magnitude vs. trend and pattern), and the risk of misinterpretation. The bar chart's zero baseline prevents overestimation of the growth magnitude for a general audience. The line chart's focused axis and monthly granularity reveal the trend and seasonality that a policy analyst needs. Both are honest; the question is which framing serves which audience.

Part B: Distortion Identification (6 problems)

B.1 ★☆☆ | Remember

List the six most common visualization distortions covered in this chapter and give a one-sentence description of each.

Answer

1. **Truncated axis:** A non-zero baseline on a bar or area chart that exaggerates the apparent magnitude of differences.
2. **Dual axis chart:** Two variables plotted with independent y-axes, allowing the chart maker to manufacture or destroy apparent correlations by adjusting scale.
3. **Area distortion:** Scaling proportional symbols by radius or diameter instead of area, causing visual size to increase faster than the data.
4. **Cherry-picking:** Selecting a date range, subset, or comparison that supports a conclusion while omitting data that would complicate or contradict it.
5. **Misleading scale:** Using non-linear scales (e.g., logarithmic) without clear labeling, or using inconsistent intervals that distort comparisons.
6. **3D effects:** Adding perspective, depth, or tilt to charts, which introduces foreshortening and occlusion that distort every visual comparison.

B.2 ★★☆ | Understand

A colleague shows you a dual axis chart of Meridian Corp's monthly revenue (left axis, $380M-$420M) and employee headcount (right axis, 9,800-10,200). Both lines slope upward and appear to track each other closely. Your colleague says, "See? Hiring drives revenue." Explain why this conclusion is not supported by the chart, and propose an alternative visualization that would legitimately test the relationship.

Guidance

The apparent visual correlation is an artifact of the axis scaling. By adjusting either axis range, you could make the lines appear uncorrelated or inversely correlated. The spatial alignment exploits the Gestalt principle of common fate ([Chapter 2](../chapter-02-how-the-eye-sees/index.md)). A scatter plot of revenue vs. headcount, with each point representing a month, would honestly show whether the relationship exists. The viewer could assess the correlation from the scatter pattern rather than from a manufactured spatial alignment.
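One way to see why axis scaling cannot be trusted as evidence: a correlation coefficient is unchanged by linear rescaling of either variable, whereas the visual alignment on a dual axis chart depends entirely on it. The sketch below uses monthly values invented purely for illustration (they are not Meridian Corp data) and a hand-written Pearson function.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical monthly values, invented for illustration only
revenue_musd = [382, 390, 387, 401, 398, 415]
headcount = [9820, 9870, 9950, 9990, 10080, 10150]

r = pearson(revenue_musd, headcount)

# Changing units or offsets is the numeric analogue of adjusting an axis range
r_rescaled = pearson([v * 1000 - 5 for v in revenue_musd],
                     [h / 100 for h in headcount])

print(abs(r - r_rescaled) < 1e-9)  # True
```

Even a strong correlation would not establish that hiring drives revenue; it only tells you whether the two series move together.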

B.3 ★★☆ | Understand

You see a pie chart rendered in 3D, with the largest slice at the front of the tilted disk. Why does this specific placement of the largest slice matter? Use concepts from Chapter 2 (Gestalt principles, pre-attentive processing) to explain the perceptual mechanism.

Guidance

In a 3D pie chart, slices at the front of the tilted disk appear larger than slices of the same angular size at the back, because the viewer perceives both the surface area and the visible depth (the "thickness" of the wedge). Placing the largest slice at the front compounds this distortion — it appears even larger relative to slices at the back. Pre-attentive area comparison processes this size difference automatically, before the viewer can mentally correct for the perspective. The distortion is processed before critical evaluation begins.

B.4 ★★☆ | Apply

A social media chart shows "Crime Rate Soars 40% in Five Years" with a line chart of absolute crime counts rising from 10,000 to 14,000 incidents. What context is missing? Identify at least three specific pieces of information that could change the viewer's interpretation.

Guidance

Consider: (1) Population change — if the population grew 50% in the same period, the per-capita crime rate actually fell. (2) Changes in reporting practices — if police departments changed what they classify as reportable incidents, the increase may reflect reclassification rather than actual crime increases. (3) Comparison to regional or national trends — if crime rose 60% nationally, this city actually outperformed the average. (4) The crime categories included — "crime" could mean anything from parking violations to homicides, and the composition of the increase matters. (5) Whether the y-axis starts at zero, and over what exact time period the 40% was measured.
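Point (1) is easy to verify numerically. In this sketch the starting population of 500,000 is an assumption; only the incident counts and the 50% growth figure come from the exercise and guidance.

```python
def rate_per_100k(incidents, population):
    """Incidents per 100,000 residents."""
    return incidents * 100_000 / population

pop_start = 500_000               # assumed starting population
pop_end = int(pop_start * 1.5)    # 50% growth, as in point (1)

before = rate_per_100k(10_000, pop_start)
after = rate_per_100k(14_000, pop_end)
change = (after - before) / before

print(round(before), round(after))  # 2000 1867
print(f"{change:.1%}")              # -6.7%
```

The count rose 40%, yet the per-capita rate fell by about 6.7%, regardless of the starting population chosen.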

B.5 ★★★ | Analyze

You are reviewing a dashboard that Meridian Corp will publish in its annual report. One chart shows customer satisfaction scores across four product lines. The bars are colored green (scores above 80), yellow (70-80), and red (below 70). The actual scores are 82, 79, 76, and 71. Identify the ethical concern with this color-coding scheme. How does it interact with the concepts of annotation bias and framing effects?

Guidance

The traffic-light color scheme imposes value judgments that may not align with the data. The difference between 79 (yellow) and 82 (green) is only 3 points, but the color change from yellow to green creates a qualitative boundary ("warning" vs. "good") that the 3-point difference does not support. Meanwhile, 71, 76, and 79 all share the same yellow even though they span 8 points, and 71 sits just above the red threshold; under the stated thresholds, no product line is actually red. The color acts as an implicit annotation, telling the viewer how to feel about each score before they evaluate the numbers. This is annotation bias: the design choice frames the interpretation. If the thresholds are arbitrary or set by corporate policy rather than evidence, the color scheme manufactures significance.

B.6 ★★★ | Analyze

An environmental organization publishes a chart of deforestation rates in the Amazon. The chart starts in 2004 (the peak year of deforestation) and shows a downward trend to 2012. A logging industry group publishes a chart of the same data starting in 2012 and ending in 2022, showing an upward trend. Both charts are technically accurate. Using the ethical framework from Section 4.7, evaluate both charts. Is either of them honest? Is one more dishonest than the other?

Guidance

Apply the five ethical questions to each chart. Both have acceptable lie factors (assuming proper axes). But both fail Question 2 (alternative framing would change the message), Question 3 (critical context is missing — the full time series), and Question 4 (neither would survive scrutiny from the opposing side). Both are cherry-picking. Whether one is "more dishonest" depends on intent and context — but in terms of impact on the viewer, both create equally misleading impressions. The honest chart shows the full time series, making both the decline and the subsequent increase visible.

Part C: Ethical Reasoning (7 problems)

C.1 ★★☆ | Apply

Classify each of the following scenarios into Zone 1 (transparently honest), Zone 2 (carelessly misleading), Zone 3 (negligently misleading), or Zone 4 (deliberately manipulative):

(a) A data analyst accepts Excel's default axis range, which starts the y-axis at 50 for a bar chart of test scores ranging from 55 to 70.

(b) A journalist selects a date range for a stock price chart that makes a political figure's economic record look poor, even though the journalist is aware that extending the range by six months would show a recovery.

(c) A public health official displays vaccination rates with clear axes, uncertainty bands, and per-capita denominators, along with a note explaining the data source and collection methodology.

(d) A marketing team creates a bubble chart where the CEO's preferred product is represented by a bubble scaled by diameter rather than area, making it appear four times larger than its actual market share warrants.

Answer

(a) Zone 2 (carelessly misleading). The analyst did not intend to distort, but the software default created a truncated bar chart that exaggerates differences. The remedy is education and a pre-publication checklist.

(b) Zone 4 (deliberately manipulative). The journalist has knowledge of the fuller context and consciously withholds it to support a narrative. This is cherry-picking with intent.

(c) Zone 1 (transparently honest). The chart provides context, acknowledges uncertainty, and makes its editorial choices visible.

(d) This could be Zone 2, Zone 3, or Zone 4, depending on whether the marketing team knows that area scaling is the correct approach. If they genuinely do not know that doubling the diameter quadruples the area, it is Zone 2 (careless). If they know and proceed anyway to please the CEO, it is Zone 3 (negligent) or Zone 4 (deliberate).

C.2 ★★☆ | Apply

You are asked to create a chart for Meridian Corp's board meeting showing that customer churn decreased from 8.2% to 7.8% over the last quarter. The VP asks you to "make it look impressive." Using the ethical framework, describe how you would create an honest chart that still communicates the improvement effectively. What would you refuse to do?

Guidance

Honest approaches: Use a clear title that states the improvement ("Customer Churn Declined 0.4 Percentage Points"). Show the trend over multiple quarters so the improvement is in context. Add the industry benchmark so the viewer can see whether 7.8% is good relative to competitors. Use annotation to highlight the change. These approaches are legitimate emphasis. What to refuse: Truncating the y-axis on a bar chart to make the 0.4 percentage point change look like a dramatic drop. Omitting quarters where churn was lower than 7.8%. Using a chart type (like a filled area chart with a non-zero baseline) that exaggerates the magnitude. The distinction: emphasis through context and annotation is honest. Emphasis through distortion is not.

C.3 ★★☆ | Apply

Apply the five ethical questions from Section 4.7 to the following chart: A line chart shows global average temperature anomaly from 1998 to 2012, with a flat or slightly declining trend line. The title reads "Global Warming Paused." Walk through each question.

Guidance

(1) Lie factor: Likely near 1.0 — the visual matches the data within the selected range. (2) Alternative framing: Extending the date range to 1970-2020 shows a clear warming trend. Starting in 1998 — an unusually warm El Nino year — is a classic cherry-pick that creates a flat trend by starting at an outlier peak. (3) What is missing: The full time series, context about why 1998 was anomalously warm, longer-term trend information, and uncertainty bands. (4) Adversary test: A climate scientist would immediately identify the 1998 start date as cherry-picking. The chart does not survive scrutiny. (5) Visible editorial choices: The date range choice should be explained and justified if the chart is to be honest. Without justification, it appears to be motivated selection.

C.4 ★★★ | Analyze

A hospital reports that its surgical mortality rate is 4.2%, compared to a national average of 3.1%. The hospital's chief medical officer asks you to create a chart that "puts our numbers in context." You discover that the hospital handles a disproportionate number of high-risk emergency cases. When you stratify by case severity, the hospital's mortality rate is below the national average in every severity category. This is Simpson's paradox.

Describe what chart or set of charts you would create. What would you show the chief medical officer? What would you recommend for the public-facing report? Address the tension between the aggregate number (unfavorable) and the disaggregated numbers (favorable).

Guidance

This is a genuine ethical dilemma. The aggregate number is real but misleading. The disaggregated numbers tell the true story but require more viewer effort to interpret. Options: (1) A faceted bar chart showing mortality by severity category, with hospital vs. national average in each facet — this honestly shows that the hospital outperforms in every category. (2) An accompanying chart showing the case mix — the distribution of severity categories — explaining why the aggregate is higher despite per-category superiority. (3) For the public report, both the aggregate and the disaggregated view, with a clear explanation. Showing only the disaggregated view without the aggregate would itself be cherry-picking (in reverse). The honest approach is to show both and explain the paradox.
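A small numeric sketch makes the paradox concrete. The case and death counts below are hypothetical, chosen so that the aggregates match the exercise (4.2% vs. 3.1%); the two severity strata are a simplification of "every severity category."

```python
# Hypothetical (cases, deaths) per stratum; only the aggregate rates come from the exercise
hospital = {"low risk": (3_600, 36), "high risk": (6_400, 384)}
national = {"low risk": (80_000, 1_600), "high risk": (20_000, 1_500)}

def rate(cases, deaths):
    return deaths / cases

def aggregate(groups):
    """Overall mortality rate across all strata."""
    total_cases = sum(cases for cases, _ in groups.values())
    total_deaths = sum(deaths for _, deaths in groups.values())
    return total_deaths / total_cases

for stratum in hospital:
    h = rate(*hospital[stratum])
    n = rate(*national[stratum])
    print(f"{stratum}: hospital {h:.1%} vs national {n:.1%}")

print(f"aggregate: hospital {aggregate(hospital):.1%} vs national {aggregate(national):.1%}")
# low risk: hospital 1.0% vs national 2.0%
# high risk: hospital 6.0% vs national 7.5%
# aggregate: hospital 4.2% vs national 3.1%
```

The hospital is better in each stratum but worse overall because 64% of its (hypothetical) cases are high risk, compared with 20% nationally. That case mix is what the second chart in the guidance would need to show.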

C.5 ★★★ | Analyze

During a public health crisis, a government dashboard displays daily case counts without adjusting for reporting delays (which cause artificial dips on weekends and spikes on Mondays). A seven-day rolling average would smooth these artifacts but introduces a 3-4 day lag. Analyze the tradeoffs. Which display is more honest? Which is more useful? Can both be shown simultaneously, and if so, how?

Guidance

Daily counts are "honest" in the sense that they show real reported data, but they are misleading because viewers interpret weekend dips as genuine declines. The rolling average is more honest about the *trend* but less honest about the *timing* (it introduces lag). The best practice — used by many dashboards during COVID-19 — is to show both: the daily counts as light bars or dots in the background, and the 7-day average as a bold line overlaid. This allows the viewer to see both the raw data and the smoothed trend, and to understand the relationship between them. Annotations explaining the weekend reporting artifact add further context.
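The smoothing itself is simple to sketch. The function below computes a trailing 7-day mean; the daily counts are hypothetical, built so that the true level is a constant 100 cases per day with some weekend reports delayed to Monday.

```python
def trailing_average(values, window=7):
    """Mean of the current value and the previous window-1 values; None until enough data."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            chunk = values[i + 1 - window : i + 1]
            out.append(sum(chunk) / window)
    return out

# Hypothetical counts, Monday to Sunday for two weeks:
# Monday spike (160), weekday level (100), weekend dip (70)
daily = [160, 100, 100, 100, 100, 70, 70] * 2
smoothed = trailing_average(daily)

print(smoothed[:6])       # [None, None, None, None, None, None]
print(set(smoothed[6:]))  # {100.0}
```

The raw series swings between 70 and 160 while the smoothed series sits at 100. Because each smoothed value summarizes the preceding week, it responds to a real change only after a few days, which is the lag the exercise mentions.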

C.6 ★★★ | Analyze

A social media platform publishes a chart showing that "hate speech on our platform declined 30% in 2024." The chart shows the number of posts flagged by the platform's automated detection system. Identify at least four reasons why this chart might not mean what it appears to mean, even if every number is accurate.

Guidance

(1) The denominator: The chart reports a count, not a rate. If total posts also fell (for example, because users left the platform), the flagged-post *rate* fell by less than 30%, and if total posts fell by more than 30%, the rate actually increased even as the count decreased. (2) Detection changes: If the automated system's sensitivity was reduced (e.g., to decrease false positives), fewer posts would be flagged regardless of actual hate speech levels. (3) Definition changes: If the platform narrowed its definition of "hate speech," fewer posts would qualify. (4) Migration effects: If hate speech moved to private groups, encrypted messages, or coded language that the system does not detect, it has not declined; it has become invisible. (5) Selection bias: The platform controls which metric to report and which to omit. A "30% decline in flagged posts" is not the same as "30% less hate speech experienced by users." The chart shows a platform metric, not a user experience metric.

C.7 ★★★ | Analyze

You discover that a chart in your company's investor presentation has a lie factor of 2.8 — the growth in a key metric is visually exaggerated by nearly three times. The chart was created by a senior executive. You are a junior analyst. What do you do? Frame your answer using the ethical framework from this chapter, and acknowledge the professional constraints.

Guidance

This is where ethics meets organizational reality. The framework provides clear guidance: a lie factor of 2.8 fails Question 1, and the chart likely fails Questions 4 and 5 as well. The professional challenge is how to raise the issue without insubordination. Approaches: (1) Frame it as a risk management issue — "If analysts or journalists recalculate the visual proportions, it could create a credibility problem for the company." (2) Offer a corrected version as an alternative — "I created a version with a zero-based axis that I think is also visually effective." (3) Cite professional standards or regulatory guidelines if applicable (SEC guidance on non-GAAP metrics, for example). (4) Document your concern in writing. The ethical framework does not guarantee that raising the issue will be comfortable or successful. It does guarantee that staying silent makes you complicit in the distortion.

Part D: Applied Design (6 problems)

D.1 ★★☆ | Apply

You have the following data for Meridian Corp's annual revenue:

- 2020: $15.2B
- 2021: $15.8B
- 2022: $16.1B
- 2023: $16.9B
- 2024: $17.4B

Describe (do not draw) two versions of this chart: one that makes the growth look dramatic and one that gives an honest representation. Specify the chart type, y-axis range, and any annotations for each version. Calculate the approximate lie factor for both.

Guidance

Dramatic version: Bar chart, y-axis from $15.0B to $17.5B. The bars grow from near the bottom to near the top of the chart. Lie factor: the visual change from the shortest to tallest bar would be very large relative to the actual 14.5% total growth. Honest version: Bar chart, y-axis from $0 to $20B. The bars are all visually similar in height, with a modest upward progression. The visual change matches the actual change. Lie factor approximately 1.0. Add the percentage growth rate as a text annotation to highlight the trend that the honest bar chart makes visually subtle. Key insight: if the honest bar chart makes the growth "look small," that is because the growth IS small relative to total revenue. The chart is telling you the truth.
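The approximate lie factors for the two versions can be computed directly, comparing the 2020 and 2024 bars. This is a sketch under the assumption that bar height is measured from the axis baseline; the function name `bar_lie_factor` is illustrative.

```python
revenue_b = {2020: 15.2, 2021: 15.8, 2022: 16.1, 2023: 16.9, 2024: 17.4}

def bar_lie_factor(first, last, baseline):
    """Lie factor for two bars drawn on a linear axis that starts at `baseline`."""
    data_change = (last - first) / first
    visual_change = ((last - baseline) - (first - baseline)) / (first - baseline)
    return visual_change / data_change

first, last = revenue_b[2020], revenue_b[2024]

print(f"{(last - first) / first:.1%}")              # 14.5%
print(round(bar_lie_factor(first, last, 15.0), 1))  # 76.0
print(round(bar_lie_factor(first, last, 0.0), 1))   # 1.0
```

With the axis starting at $15.0B, the 2020 bar is only 0.2 units tall against 2.4 units for 2024, so a 14.5% increase is drawn as a twelvefold difference in height.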

D.2 ★★☆ | Apply

Redesign the following problematic chart (described verbally): A 3D pie chart showing Meridian Corp's revenue by region — North America (52%), Europe (28%), Asia-Pacific (15%), Other (5%). The chart is tilted at 45 degrees with the North America slice at the front. Describe your redesigned version, explain each design choice, and reference at least two principles from this chapter.

Guidance

Replace the 3D pie chart with a horizontal bar chart sorted by magnitude (North America at top, Other at bottom). A bar chart uses length encoding, the most accurate channel per [Chapter 2](../chapter-02-how-the-eye-sees/index.md)'s hierarchy, instead of angle encoding (the pie chart) or area with perspective distortion (the 3D pie). Removing 3D effects eliminates the foreshortening distortion (Section 4.3) and improves the data-ink ratio (Tufte). Label each bar with the exact percentage for precision. Start the axis at zero to prevent lie factor distortion (Section 4.2). A single neutral color avoids the false categorical emphasis of multi-color pies.

D.3 ★★★ | Analyze

You are building a dashboard for a public health department during a disease outbreak. The dashboard needs to show daily case counts, cumulative cases, testing rates, and positivity rates. For each metric, identify one specific way the chart could accidentally mislead and describe how you would prevent it.

Guidance

Daily cases: Weekend reporting dips could be mistaken for real declines. Prevention: overlay a 7-day rolling average. Cumulative cases: Cumulative curves always go up, which can alarm viewers even when daily cases are declining. Prevention: annotate or pair with a daily-case chart so the viewer sees the rate of growth, not just the total. Testing rates: If testing increases, case counts rise even with constant disease prevalence. Prevention: show both testing volume and positivity rate so the viewer can distinguish between "more cases found" and "more cases occurring." Positivity rate: On days with very few tests, the positivity rate swings wildly. Prevention: show the rolling average and/or annotate days with low test counts so the viewer does not overinterpret noisy data points.

D.4 ★★★ | Analyze

A journalist asks you to help design a chart comparing CEO compensation at five major companies. The compensation packages range from $12 million to $48 million. The journalist wants to use proportional dollar-sign icons. Describe the specific distortion risk, calculate the lie factor if the icons are naively scaled, and propose a better design.

Guidance

If the $48M icon is scaled to be 4x the height and 4x the width of the $12M icon (to represent 4x the compensation), its visual area is 16x larger. Using the same formula as in Part A, the data change is (48 - 12) / 12 = 300% and the visual change is (16 - 1) / 1 = 1500%, giving a lie factor of 1500 / 300 = 5.0. If rendered in a pseudo-3D style (scaling depth by 4x as well), the visual volume is 64x larger: a 6300% visual change and a lie factor of 21.0. Better design: A horizontal bar chart with the exact compensation labeled. If pictographic icons are required for editorial style, use a unit-based approach: each icon represents a fixed amount (e.g., $5M), and the chart shows the appropriate number of icons per CEO. This scales linearly and avoids the area distortion entirely.
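The unit-based approach amounts to a single division. In this sketch the $5M icon value comes from the guidance, while the function name `icon_count` and the choice to allow fractional icons are assumptions.

```python
ICON_VALUE_M = 5  # each icon stands for $5M, per the example in the guidance

def icon_count(compensation_m, icon_value_m=ICON_VALUE_M):
    """Number of identically sized icons (possibly fractional) for one CEO."""
    return compensation_m / icon_value_m

low, high = icon_count(12), icon_count(48)

print(low, high)   # 2.4 9.6
print(high / low)  # 4.0
```

Because every icon is the same size, the total ink for each CEO grows in direct proportion to the compensation: 4x the pay gets 4x the icons.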

D.5 ★★★ | Analyze

You have climate temperature anomaly data from 1880 to 2025. You need to create one chart for a scientific journal audience and one chart for a newspaper front page. Describe how each chart would differ in: (a) y-axis range, (b) time range, (c) baseline period, (d) uncertainty representation, and (e) annotations. Justify each choice using the ethical framework.

Guidance

Scientific journal: (a) Y-axis focused on anomaly range (-0.5 to +1.5 °C), because the audience expects this convention and reads position, not length. (b) Full time range 1880-2025 for completeness. (c) Standard baseline (1951-1980 or pre-industrial), stated explicitly. (d) Full uncertainty envelope showing wider bands for early data and narrower bands for recent data. (e) Minimal annotations, because the audience will read the methods section.

Newspaper: (a) Same y-axis range; this is a line chart, so truncation is appropriate. (b) Possibly shorter (1960-2025) for clarity, or full range with an annotation explaining the longer context. (c) Pre-industrial baseline, because the policy-relevant question is "how much warming since before industrialization." (d) Simplified uncertainty, perhaps a shaded band rather than individual confidence intervals, to avoid visual clutter. (e) More annotations: key events (Kyoto Protocol, Paris Agreement), the 1.5 °C threshold, and a clear explanation of what "anomaly" means.

In both cases, the five questions should be satisfied. The charts differ not in honesty but in emphasis, context, and assumed viewer expertise.

D.6 ★★★ | Analyze

You discover that a standard bar chart of Meridian Corp's four product lines shows virtually no visible difference (revenues of $4.1B, $4.2B, $4.3B, and $4.5B on a zero-based axis to $5B). The VP says the chart is "useless" because "you can't see the differences." You know that truncating the axis would exaggerate those differences. Propose a design solution that is both honest and informative — one that reveals the differences without distorting them.

Guidance

Several honest approaches: (1) Keep the zero-based bar chart and add the exact values as labels on each bar — the bars show magnitude honestly, and the labels provide the precision the viewer needs for comparison. (2) Create a second chart — a dot plot or bar chart of the *differences* from the lowest value, making the inter-product variation the explicit subject while clearly labeling what the chart shows. (3) Use a table for the absolute values and a chart for the growth rates or differences. (4) If a focused-axis chart is truly needed, switch to a dot plot (which does not rely on length from zero) and label the axis clearly. The key insight: if the differences are small relative to the total, the zero-based bar chart is telling the truth. "You can't see the differences" may be the honest message. The question is whether the *differences* or the *magnitudes* are the story — and you can design different charts for each.
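Approach (2) needs only the differences from the lowest value. A minimal sketch follows; the product line names are placeholders (the exercise does not name them), and revenues are expressed in $M so the arithmetic stays exact.

```python
# Revenues from the exercise in $M; the line names are placeholders
revenue_m = {"Line 1": 4100, "Line 2": 4200, "Line 3": 4300, "Line 4": 4500}

lowest = min(revenue_m.values())
diffs = {name: value - lowest for name, value in revenue_m.items()}

for name, diff in diffs.items():
    print(f"{name}: +${diff}M ({diff / lowest:.1%} above lowest)")
# Line 1: +$0M (0.0% above lowest)
# Line 2: +$100M (2.4% above lowest)
# Line 3: +$200M (4.9% above lowest)
# Line 4: +$400M (9.8% above lowest)
```

A chart of these values starts legitimately at zero because zero now means "no difference from the lowest line." The title and axis label must say so explicitly.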