Quiz: Lies, Distortions, and Honest Charts

20 questions. Aim for mastery (18+). If you score below 14, revisit the relevant sections before moving to Chapter 5.

Multiple Choice (10 questions)

1. Tufte's lie factor is calculated as:

(a) The ratio of data ink to total ink on the chart
(b) The ratio of the visual effect size to the actual data effect size
(c) The percentage of the chart area occupied by data versus decoration
(d) The number of distortions present in a chart divided by the total number of design choices

Answer

**(b)** The ratio of the visual effect size to the actual data effect size. The formula is: Lie Factor = Size of effect shown in graphic / Size of effect in data. A lie factor of 1.0 means the visual representation faithfully matches the data. Values above 1.0 indicate exaggeration; values below 1.0 indicate minimization.
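
The formula can be sketched as a small helper. This is an illustration rather than code from the chapter: the name `lie_factor` and its signature are hypothetical, and it assumes both effect sizes are measured the same way (for example, both as percent changes).

```python
def lie_factor(effect_in_graphic: float, effect_in_data: float) -> float:
    """Ratio of the effect shown in the graphic to the effect in the data.

    Hypothetical helper: both arguments must use the same measure
    (for example, both expressed as percent changes).
    """
    if effect_in_data == 0:
        raise ValueError("effect_in_data must be non-zero")
    return effect_in_graphic / effect_in_data


# A graphic that shows a 50% change for data that changed by 50% is faithful.
print(lie_factor(0.50, 0.50))  # 1.0
# A graphic that shows a 100% change for the same data exaggerates.
print(lie_factor(1.00, 0.50))  # 2.0
# A graphic that shows a 25% change for the same data minimizes.
print(lie_factor(0.25, 0.50))  # 0.5
```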

2. A bar chart showing corporate revenue has a y-axis that starts at $40 billion instead of zero. The revenues plotted are $42B and $46B. This truncation is problematic primarily because:

(a) It makes the chart harder to read
(b) It violates international charting standards
(c) Bar charts encode data through length, and the truncated baseline distorts the length ratios
(d) Viewers will not notice the axis labels

Answer

**(c)** Bar charts encode data through length, and the truncated baseline distorts the length ratios. This is the key concept from Chapter 2: bar charts use length as their primary visual encoding. When the baseline is not zero, the visible bar lengths no longer correspond to the data magnitudes. With the axis starting at $40B, the $46B bar is drawn three times as tall as the $42B bar (6 units versus 2), even though the actual difference is less than 10%. The viewer's pre-attentive processing compares the bar lengths, not the axis labels.
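
The arithmetic behind this answer can be checked with a short sketch. The helper name `visible_bar_ratio` is an assumption made for illustration; the figures are the ones given in the question.

```python
def visible_bar_ratio(small: float, large: float, baseline: float) -> float:
    """Ratio of drawn bar lengths when the axis starts at `baseline`."""
    if small <= baseline:
        raise ValueError("the smaller value must sit above the baseline")
    return (large - baseline) / (small - baseline)


data_ratio = 46 / 42                                 # under 10% apart
truncated = visible_bar_ratio(42, 46, baseline=40)   # 6 / 2
zero_based = visible_bar_ratio(42, 46, baseline=0)   # matches the data ratio

print(f"data ratio:      {data_ratio:.3f}")
print(f"truncated axis:  {truncated:.3f}")
print(f"zero-based axis: {zero_based:.3f}")
```

With a zero baseline the drawn ratio equals the data ratio; with the $40B baseline it jumps to 3.0.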

3. Which of the following chart types is the MOST appropriate exception to the "start the y-axis at zero" rule?

(a) A bar chart comparing quarterly earnings
(b) A line chart showing monthly temperature variation
(c) A stacked area chart showing market share
(d) A histogram showing the distribution of income

Answer

**(b)** A line chart showing monthly temperature variation. Line charts encode data through position and slope, not through length from a baseline. For temperature data — which varies around a central value rather than growing from zero — a zero-based y-axis would waste the majority of the visual field on empty space and compress the meaningful variation into an unreadable sliver. A focused axis that shows the range of meaningful variation is the more honest choice for a line chart.

4. A dual axis chart shows Company X's revenue on the left y-axis and customer satisfaction on the right y-axis. Both lines appear to rise in parallel. This apparent correlation is unreliable because:

(a) Customer satisfaction and revenue are never correlated in practice
(b) The chart maker can manufacture or destroy the apparent correlation by adjusting either axis scale
(c) Dual axis charts always reverse the actual relationship
(d) The left and right axes use different units, making comparison impossible

Answer

**(b)** The chart maker can manufacture or destroy the apparent correlation by adjusting either axis scale. The fundamental problem with dual axis charts is that the visual alignment of two lines is controlled by the axis ranges, not by the data relationship. By widening or narrowing either y-axis range, the chart maker can make the same two series appear tightly coupled, loosely related, or unrelated, and inverting one axis can even make a positive relationship look negative. The viewer perceives spatial alignment as indicating a relationship (the Gestalt principle of common fate), but the alignment is an artifact of scaling, not evidence of association.
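
A minimal sketch of the scaling effect, using invented numbers: the helper `axis_position` and both data series are hypothetical, and the code only computes where each point would land as a fraction of the plot height.

```python
def axis_position(value: float, axis_min: float, axis_max: float) -> float:
    """Vertical position of a value as a fraction of the plot height (0 to 1)."""
    return (value - axis_min) / (axis_max - axis_min)


def rise(positions):
    """How far a line climbs from first to last point, as a fraction of the height."""
    return positions[-1] - positions[0]


# Hypothetical data, invented for illustration only.
revenue = [10, 12, 14, 16]        # left axis, drawn on a 10-16 range
satisfaction = [70, 71, 72, 73]   # right axis, range varies below

left = [axis_position(v, 10, 16) for v in revenue]
tight_right = [axis_position(v, 70, 73) for v in satisfaction]
wide_right = [axis_position(v, 0, 100) for v in satisfaction]

print(f"revenue line rises:        {rise(left):.2f} of the plot height")
print(f"satisfaction, 70-73 axis:  {rise(tight_right):.2f}")
print(f"satisfaction, 0-100 axis:  {rise(wide_right):.2f}")
```

On the 70-73 axis the satisfaction line climbs in lockstep with revenue; on the 0-100 axis the same values look flat.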

5. In a bubble chart, a company with $2B in revenue is represented by a circle with a radius of 10mm. A company with $8B in revenue should have a circle with a radius of approximately:

(a) 40mm (4 times the radius)
(b) 20mm (2 times the radius)
(c) 28mm (approximately 2.83 times the radius)
(d) 14mm (approximately 1.41 times the radius)

Answer

**(b)** 20mm (2 times the radius). The revenue is 4 times larger ($8B / $2B = 4). Area scales with the square of the radius. To make the area 4 times larger, the radius must increase by a factor of the square root of 4, which is 2. So the correct radius is 10mm × 2 = 20mm. A common error is to scale the radius by 4 (answer a), which would make the area 16 times larger: a 16-fold visual ratio for a 4-fold data ratio, or a lie factor of 4.0.
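
The square-root rule can be sketched as follows. The function name `scaled_radius` is hypothetical; the numbers come from the question.

```python
import math


def scaled_radius(base_radius: float, base_value: float, value: float) -> float:
    """Radius that makes circle area proportional to `value`."""
    return base_radius * math.sqrt(value / base_value)


correct = scaled_radius(10.0, 2.0, 8.0)   # 10 * sqrt(4)
naive = 10.0 * (8.0 / 2.0)                # radius scaled linearly with the data

# Area ratio relative to the 10 mm circle (pi cancels out).
print(correct, (correct / 10.0) ** 2)  # 20.0 4.0
print(naive, (naive / 10.0) ** 2)      # 40.0 16.0
```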

6. Simpson's paradox occurs when:

(a) A chart uses too many colors, confusing the viewer
(b) A trend visible in aggregated data reverses when the data is disaggregated into subgroups
(c) Two charts of the same data appear to tell different stories due to different axis scales
(d) A chart omits error bars, making uncertain data appear precise

Answer

**(b)** A trend visible in aggregated data reverses when the data is disaggregated into subgroups. Simpson's paradox is a statistical phenomenon, not a design flaw. The classic example is the UC Berkeley graduate admissions case, where the aggregate data suggested bias against women, but department-level analysis showed no such pattern and, in several departments, a small advantage for women. The visualization implication: aggregation is an editorial choice, and disaggregating data is an essential check on whether the aggregate tells the true story.
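
The reversal is easy to reproduce with a toy table. The counts below are invented for illustration and are not the Berkeley figures; the helper names are hypothetical.

```python
# Hypothetical applicant counts, invented for illustration only.
# (department, group) -> (admitted, applied)
counts = {
    ("Dept A", "women"): (70, 100),
    ("Dept A", "men"): (480, 800),
    ("Dept B", "women"): (270, 900),
    ("Dept B", "men"): (40, 200),
}


def rate(admitted: int, applied: int) -> float:
    """Admission rate for one (department, group) cell."""
    return admitted / applied


def aggregate_rate(group: str) -> float:
    """Admission rate for a group after pooling all departments."""
    admitted = sum(a for (_, g), (a, _n) in counts.items() if g == group)
    applied = sum(n for (_, g), (_a, n) in counts.items() if g == group)
    return admitted / applied


for dept in ("Dept A", "Dept B"):
    w = rate(*counts[(dept, "women")])
    m = rate(*counts[(dept, "men")])
    print(f"{dept}: women {w:.0%}, men {m:.0%}")

print(f"Pooled: women {aggregate_rate('women'):.0%}, men {aggregate_rate('men'):.0%}")
```

Women have the higher rate inside each department, yet the lower rate once the departments are pooled, because most women in this toy table applied to the department with the lower admission rate.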

7. Base rate neglect in visualization refers to:

(a) Forgetting to label the x-axis with the starting date
(b) Failing to show the underlying prevalence of a phenomenon, leading viewers to misinterpret specific rates
(c) Using a logarithmic scale without telling the viewer
(d) Omitting the baseline period from a temperature anomaly chart

Answer

**(b)** Failing to show the underlying prevalence of a phenomenon, leading viewers to misinterpret specific rates. Base rate neglect is a cognitive bias. A chart showing that a test is "99% accurate" leads viewers to overestimate the probability that a positive result is correct — unless the base rate (prevalence) is also shown. If only 0.1% of the population has the condition, most positive results will be false positives. Visualization can correct this bias through designs like icon arrays that make the base rate visually salient.
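
The claim that most positives are false can be verified with Bayes' rule. This sketch assumes that "99% accurate" means both sensitivity and specificity are 0.99, which the text does not state explicitly; the function name is hypothetical.

```python
def positive_predictive_value(prevalence: float, sensitivity: float,
                              specificity: float) -> float:
    """Probability that a positive result is a true positive (Bayes' rule)."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    return true_pos / (true_pos + false_pos)


# Assumption: "99% accurate" is read as sensitivity = specificity = 0.99.
ppv = positive_predictive_value(prevalence=0.001, sensitivity=0.99,
                                specificity=0.99)
print(f"Chance a positive result is correct: {ppv:.1%}")  # roughly 9%
```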

8. The "data-ink ratio" principle, as defined by Tufte, argues that:

(a) Charts should use as many colors as possible to maximize information density
(b) The proportion of ink encoding actual data should be maximized, and non-data ink minimized
(c) Every chart should include at least one decorative element for visual interest
(d) Digital charts should use higher resolution than printed charts

Answer

**(b)** The proportion of ink encoding actual data should be maximized, and non-data ink minimized. The data-ink ratio measures the fraction of a chart's total ink that represents data. Its counterpart is chartjunk, which Tufte defines as visual elements that do not encode data: decorative fills, 3D effects, ornamental illustrations, excessive gridlines. The principle is that every element in a chart should serve the viewer's understanding. Elements that only decorate reduce the ratio and can introduce distortion (as with 3D effects).

9. A chart with a lie factor of exactly 1.0 is:

(a) Guaranteed to be an honest, ethical representation of the data
(b) Free from visual distortion but may still mislead through cherry-picking, context omission, or aggregation effects
(c) The only acceptable chart for publication in a professional context
(d) Impossible to achieve in practice, since all charts have some distortion

Answer

**(b)** Free from visual distortion but may still mislead through cherry-picking, context omission, or aggregation effects. The lie factor measures visual proportionality — whether the graphic representation matches the data magnitude. But a chart can have a perfect lie factor and still mislead through cherry-picked time ranges, missing context (denominators, comparisons, uncertainty), Simpson's paradox, biased annotations, or other non-visual distortions. The lie factor is necessary but not sufficient for honesty.

10. The chapter's ethical framework identifies four "zones" on the spectrum from honest to manipulative. Zone 3 (negligently misleading) is distinguished from Zone 2 (carelessly misleading) primarily by:

(a) The number of distortions in the chart
(b) Whether the chart appears in a professional or personal context
(c) Whether the chart maker has the knowledge to recognize the distortion but fails to apply it
(d) Whether the chart uses software defaults or custom settings

Answer

**(c)** Whether the chart maker has the knowledge to recognize the distortion but fails to apply it. Zone 2 describes charts made without malice by people who lack training: they do not know that truncated bar charts distort, or that bubbles must be scaled by area rather than by radius. Zone 3 describes charts made by people who possess this knowledge but proceed with the distortion anyway, because of time pressure, stakeholder requests, or convenience. The distinction is between ignorance and negligence. After reading this chapter, a reader who produces a distorted chart has moved from Zone 2 to Zone 3.

True/False (5 questions)

11. True or False: A line chart of stock prices should always start the y-axis at zero to be considered honest.

Answer

**False.** Line charts encode data through position and slope, not through length from a zero baseline. For stock prices, which vary around a non-zero value, starting the y-axis at zero would compress the meaningful variation into an unreadable band and waste the majority of the chart on empty space. A focused axis showing the range of price variation is typically more honest for a line chart, provided the axis is clearly labeled.
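
To put a number on the compression, the sketch below compares how much of the vertical axis the price variation occupies under each choice. The prices, the axis limits, and the helper `share_of_plot_height` are all hypothetical.

```python
def share_of_plot_height(values, axis_min: float, axis_max: float) -> float:
    """Fraction of the vertical axis spanned by the data's own range."""
    return (max(values) - min(values)) / (axis_max - axis_min)


# Hypothetical closing prices, invented for illustration only.
prices = [148.0, 151.5, 150.2, 156.0, 153.4]

zero_based = share_of_plot_height(prices, 0.0, 160.0)   # 8 / 160
focused = share_of_plot_height(prices, 145.0, 160.0)    # 8 / 15

print(f"zero-based axis: {zero_based:.0%} of the height shows variation")
print(f"focused axis:    {focused:.0%} of the height shows variation")
```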

12. True or False: If a chart shows only accurate data points, with no fabricated numbers, it cannot be considered misleading.

Answer

**False.** A chart can mislead with entirely accurate data through truncated axes, area distortions, cherry-picked date ranges, context omission, biased aggregation (Simpson's paradox), misleading annotations, or dual-axis correlations. The chapter's central argument is that the most effective misleading charts show true data falsely — the distortion is in the design, not in the numbers.

13. True or False: 3D effects in charts can improve comprehension by making the data appear more tangible and concrete.

Answer

**False.** 3D effects introduce perspective foreshortening, occlusion, and depth ambiguity that distort every quantitative comparison the viewer attempts. Research consistently shows that 3D charts are decoded less accurately than their 2D equivalents. 3D effects decrease the data-ink ratio by adding non-data visual elements while simultaneously reducing encoding accuracy. There is no legitimate use case for 3D effects in quantitative data visualization.

14. True or False: Cherry-picking a date range is always a deliberate act of manipulation.

Answer

**False.** A misleading date range can result from deliberate selection (Zone 4) but also from carelessness or default settings (Zone 2). Genuinely limited data is a different case: an analyst who shows the most recent 12 months because that is all they have is not cherry-picking, but an analyst who starts a trend at an anomalous peak to make a decline look worse is. The ethical question is whether the chart maker considered alternative date ranges and whether extending the range would change the story.
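
One way to run the "would extending the range change the story?" check is to compute the same summary from more than one start point. The series and the helper `percent_change` are hypothetical.

```python
def percent_change(series, start: int) -> float:
    """Percent change from series[start] to the final value."""
    return (series[-1] - series[start]) / series[start] * 100


# Hypothetical monthly values with one anomalous peak at index 3.
monthly = [100, 102, 101, 130, 104, 105, 106]

from_peak = percent_change(monthly, start=3)    # window opens on the spike
full_range = percent_change(monthly, start=0)   # window covers everything

print(f"starting at the peak: {from_peak:+.1f}%")
print(f"full range:           {full_range:+.1f}%")
```

If the sign or size of the change depends this heavily on the start point, the chosen window needs a stated justification.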

15. True or False: Icon arrays (grids of small figures) are effective at correcting base rate neglect because they make proportions visually salient.

Answer

**True.** Icon arrays translate abstract probabilities into spatial proportions that the viewer can perceive directly. A grid of 1,000 icons with 1 colored for "true positive" and 10 colored for "false positive" makes it visually obvious that most positive results are false positives — information that a "99% accuracy" label alone does not convey. This leverages pre-attentive processing (Chapter 2) to counteract a known cognitive bias.
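
A text-only sketch of such a grid, using the counts from the answer above. The function `icon_array`, the symbols, and the 50-column layout are arbitrary choices made for illustration.

```python
def icon_array(total: int, true_pos: int, false_pos: int, cols: int = 50) -> str:
    """Text grid: 'T' = true positive, 'F' = false positive, '.' = everyone else."""
    others = total - true_pos - false_pos
    icons = "T" * true_pos + "F" * false_pos + "." * others
    rows = [icons[i:i + cols] for i in range(0, total, cols)]
    return "\n".join(rows)


grid = icon_array(total=1000, true_pos=1, false_pos=10)
print(grid)
```

In the printed grid, the single `T` is visibly outnumbered by the ten `F` icons beside it.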

Short Answer (5 questions)

16. Explain in two to three sentences why the threshold concept of this chapter — "every chart is an editorial" — follows logically from Chapter 1's concept that "every chart is an argument."

Answer

Chapter 1 established that charts make claims, present evidence, and use design choices as rhetoric. This chapter extends that idea: if every design choice (axis range, baseline, date window, aggregation) is a form of rhetoric, then every chart reflects a point of view about what matters and what the viewer should take away. There is no "neutral" arrangement of these choices — each combination emphasizes some aspects of the data and de-emphasizes others. The chart is therefore an editorial: a curated, framed perspective on the data, not a transparent window onto it.

17. A bubble chart correctly scales circles by area (not by diameter or radius). One circle represents 100 units and another represents 400 units. The larger circle has 4 times the area of the smaller. Is the lie factor 1.0? Why might the viewer still misperceive the comparison, even when the encoding is mathematically correct?

Answer

The lie factor is 1.0 — the visual areas are proportional to the data values. However, viewers systematically underestimate area differences. Research by Cleveland and McGill (referenced in Chapter 2) showed that area is a less accurately perceived encoding than position or length. Viewers tend to judge the larger circle as less than 4 times the size of the smaller one. This is a perceptual limitation, not a design error. The chart maker can mitigate it by adding data labels, but the inherent inaccuracy of area perception means that bubble charts always sacrifice some precision compared to bar charts or dot plots.

18. You are shown two charts of the same COVID-19 data. Chart A shows daily cases on a linear scale. Chart B shows daily cases on a logarithmic scale. Neither chart is mislabeled. Are both charts honest? Under what circumstances would one be more appropriate than the other?

Answer

Both charts can be honest if the axes are clearly labeled and the scale is identified. The linear scale shows the *magnitude* of case counts — the viewer sees how many people are affected each day. The logarithmic scale shows the *rate of change* — the viewer sees whether growth is exponential, sub-exponential, or declining. During the early exponential phase of an outbreak, the log scale reveals the growth rate clearly while the linear scale compresses early data into a flat line near zero. During a later phase with high counts and gradual decline, the linear scale is more informative. The choice depends on which question — "how many?" or "how fast?" — is most important for the audience. Failing to label the scale as logarithmic would be deceptive; choosing the scale that best answers the audience's question is not.
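
The difference between the two scales shows up in the step sizes. This sketch uses invented case counts that double every day; it is not real COVID-19 data.

```python
import math

# Hypothetical daily case counts that double every day.
cases = [10 * 2 ** day for day in range(8)]

# Day-over-day steps as they would appear on each scale.
linear_steps = [b - a for a, b in zip(cases, cases[1:])]
log_steps = [math.log10(b) - math.log10(a) for a, b in zip(cases, cases[1:])]

print(linear_steps)                       # steps keep growing
print([round(s, 3) for s in log_steps])   # every step is about 0.301
```

Constant steps on the log scale are what make exponential growth appear as a straight line there, while on the linear scale the early days are dwarfed by the later ones.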

19. Describe a scenario where context omission is not a form of dishonesty but a legitimate design choice. What distinguishes this from dishonest context omission?

Answer

A chart designed for expert viewers in a familiar domain may legitimately omit context that a general audience would need. For example, a chart of monthly unemployment rates for labor economists does not need to explain what the unemployment rate measures, what "seasonally adjusted" means, or why it matters — the audience already knows. Omitting this context is a legitimate design choice that reduces clutter and focuses on the data. Dishonest context omission, by contrast, removes information that would change the viewer's interpretation. The distinction: legitimate omission removes what the viewer already knows; dishonest omission removes what the viewer needs but does not know. The test is whether an informed viewer would reach a different conclusion if the omitted information were restored.

20. The chapter argues that "impact does not require intent." Explain this principle in the context of a specific example: an analyst who creates a bar chart with a truncated axis because the software default auto-scaled the y-axis. The chart is presented to a board of directors who then make a strategic decision based on the exaggerated visual impression. Who is responsible?

Answer

The principle means that the viewer's misperception is the same regardless of whether the chart maker intended the distortion. The board members see bars that suggest dramatic growth, and their decision is influenced by that visual impression — whether the truncation was deliberate or accidental. Responsibility is distributed. The analyst bears responsibility for not checking the chart against the ethical checklist — even if the distortion was unintentional, professional standards require reviewing axes and encodings before presentation. The software bears some design responsibility for defaults that produce misleading charts. The board bears responsibility for making decisions based on visual impressions without examining the underlying data. But the primary ethical obligation falls on the analyst as the chart maker — the person with the most direct control over the design and the most relevant expertise. After completing a chapter like this one, "the software did it" is not a sufficient defense. The analyst's role is to catch what the software's defaults got wrong.