Chapter 18 Quiz: Visualization Design — Principles, Accessibility, Ethics, and Common Mistakes

Contributors to Introduction to Data Science

Instructions: This quiz tests your understanding of Chapter 18. Answer all questions before checking the solutions. For multiple choice, select the best answer — some options may be partially correct. Total points: 100.

Section 1: Multiple Choice (8 questions, 5 points each)

Question 1. According to Cleveland and McGill's hierarchy, which visual encoding allows the most accurate comparison of quantities?

(A) Color saturation
(B) Angle
(C) Position along a common scale
(D) Area

Answer

**Correct: (C)** The hierarchy from most to least accurate is: position along a common scale, length, angle, area, color saturation. This is why scatter plots and dot plots (position) are more precise than pie charts (angle) or bubble charts (area) for comparing values. This is not a matter of personal preference — it reflects empirical research on human perceptual accuracy.

Question 2. What is the data-ink ratio?

(A) The percentage of the chart that uses dark-colored ink
(B) The ratio of ink used to represent data to total ink used in the graphic
(C) The number of data points divided by the chart area
(D) The ratio of text annotations to visual elements

Answer

**Correct: (B)** Introduced by Edward Tufte, the data-ink ratio measures the proportion of a chart's visual elements that actually represent data versus total visual elements. The principle is to maximize this ratio — remove decorative gridlines, 3D effects, background images, and other elements that do not encode data. A chart with only data-representing elements and essential labels has a high data-ink ratio.

Question 3. Why is the "rainbow" (jet) colormap problematic for data visualization?

(A) It uses too few colors
(B) It is perceptually non-uniform, inaccessible to colorblind viewers, and creates false boundaries
(C) It is too bright for print
(D) It only works with categorical data

Answer

**Correct: (B)** The rainbow/jet colormap has three problems: (1) **Perceptual non-uniformity**: equal data differences produce unequal perceived color differences, so some data variations appear larger than they are. (2) **Colorblind inaccessibility**: its red-green transitions are hard or impossible to distinguish for viewers with red-green color vision deficiency (which affects roughly 8% of men). (3) **False boundaries**: sharp hue transitions create perceived category boundaries that do not exist in the data. Better alternatives include `"viridis"`, `"plasma"`, and `"cividis"`.
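One rough way to see the difference between the two kinds of colormap is to check whether luminance changes in a single direction along the map. The sketch below assumes matplotlib and NumPy are installed; the helper names (`relative_luminance`, `luminance_profile`, `is_monotonic`) are introduced here for illustration. Monotonic luminance is only a proxy, not a full test of perceptual uniformity.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so no display is needed
import matplotlib.pyplot as plt
import numpy as np


def relative_luminance(rgb):
    """Relative luminance (WCAG definition) of sRGB colors with channels in [0, 1]."""
    rgb = np.asarray(rgb, dtype=float)
    linear = np.where(rgb <= 0.03928, rgb / 12.92, ((rgb + 0.055) / 1.055) ** 2.4)
    return linear @ np.array([0.2126, 0.7152, 0.0722])


def luminance_profile(cmap_name, n=32):
    """Luminance at n evenly spaced points along a named matplotlib colormap."""
    cmap = plt.get_cmap(cmap_name)
    rgba = cmap(np.linspace(0.0, 1.0, n))
    return relative_luminance(rgba[:, :3])


def is_monotonic(values):
    """True if the sequence never reverses direction."""
    steps = np.diff(values)
    return bool(np.all(steps >= 0) or np.all(steps <= 0))


for name in ("viridis", "jet"):
    print(name, "has monotonic luminance:", is_monotonic(luminance_profile(name)))
```

A colormap whose luminance rises and then falls gives two different data values the same brightness, which is one reason such a map degrades badly in grayscale.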

Question 4. What is the most important rule about y-axis ranges for bar charts?

(A) Always show the full data range with 10% padding
(B) Start the y-axis at zero
(C) Use a logarithmic scale
(D) Match the y-axis range to the data range for maximum detail

Answer

**Correct: (B)** Bar charts encode values as bar *length*. Viewers read bar length as proportional to the value: a bar twice as tall represents twice the value. If the y-axis starts at 85 instead of 0, a bar at 90 appears only half the height of a bar at 95 (5 vs. 10 units of height), when the actual ratio is 90/95 ≈ 0.95 (nearly equal). Starting at zero ensures that bar length honestly represents the value. Note: this rule applies to bar charts specifically. Line charts and scatter plots can use a range that fits the data because they encode values by position, which does not imply that the drawn length is proportional to the value.
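The arithmetic behind this answer can be checked in a few lines of Python. This is an illustrative sketch only; the helper name `apparent_ratio` is an assumption introduced here, not something defined in the chapter.

```python
def apparent_ratio(value_a, value_b, baseline=0.0):
    """Ratio of the drawn bar lengths when the value axis starts at `baseline`."""
    return (value_a - baseline) / (value_b - baseline)


# Bars at 90 and 95, as in the example above.
truncated = apparent_ratio(90, 95, baseline=85)  # axis starts at 85
honest = apparent_ratio(90, 95, baseline=0)      # axis starts at 0

print(f"Axis from 85: bar A is drawn at {truncated:.2f}x the height of bar B")
print(f"Axis from 0:  bar A is drawn at {honest:.2f}x the height of bar B")
```

The further the baseline moves toward the smaller value, the more the drawn ratio departs from the true ratio of the two values.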

Question 5. Which of the following is the best approach for making a scatter plot accessible to viewers with color vision deficiency?

(A) Use very saturated, bright colors
(B) Use color alone but with the "colorblind" palette
(C) Use color combined with a second encoding like marker shape
(D) Convert the chart to grayscale

Answer

**Correct: (C)** While using a colorblind-safe palette (B) helps, it does not guarantee accessibility for all forms of color vision deficiency. The gold standard is **redundant encoding**: use color AND shape (or size, or pattern) so that even if a viewer cannot distinguish the colors, they can distinguish groups by the second channel. Converting to grayscale (D) removes color information entirely rather than using it accessibly. Very saturated colors (A) do not help with color confusion.
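A minimal sketch of redundant encoding in matplotlib follows. The data and group names are hypothetical, invented for illustration; the three hex colors are taken from the Okabe-Ito colorblind-safe palette, which is a choice made here rather than one named in the chapter.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so no display is needed
import matplotlib.pyplot as plt

# Hypothetical data: three groups of (x, y) points.
groups = {
    "Group A": ([1.0, 2.0, 3.0], [2.1, 2.9, 4.2]),
    "Group B": ([1.5, 2.5, 3.5], [3.8, 3.1, 2.2]),
    "Group C": ([1.2, 2.2, 3.2], [1.0, 1.4, 1.9]),
}

# Pair each group with a distinct color AND a distinct marker shape.
colors = ["#0072B2", "#E69F00", "#009E73"]  # blue, orange, bluish green
markers = ["o", "s", "^"]                   # circle, square, triangle
encoding = {name: (c, m) for name, c, m in zip(groups, colors, markers)}

fig, ax = plt.subplots()
for name, (xs, ys) in groups.items():
    color, marker = encoding[name]
    ax.scatter(xs, ys, color=color, marker=marker, label=name)
ax.set_xlabel("x (hypothetical units)")
ax.set_ylabel("y (hypothetical units)")
ax.legend(title="Group")
```

Because each group has its own marker shape, the legend and the points remain distinguishable even if the chart is printed in grayscale.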

Question 6. What is cherry-picking in the context of data visualization?

(A) Selecting the most visually appealing colors for a chart
(B) Selecting a subset of data (time range, categories, or observations) that supports a predetermined conclusion
(C) Choosing the chart type that makes the data look best
(D) Picking the best-performing data points for display

Answer

**Correct: (B)** Cherry-picking means selecting only the data that supports your argument while omitting data that contradicts it. A common example: showing stock performance only during an upward period while hiding a subsequent decline. The data shown is accurate — no numbers are fabricated — but the incomplete selection creates a false impression. The fix: show the full available data, or show the subset clearly within the context of the full dataset.

Question 7. Which Gestalt principle explains why scatter plot points of the same color are perceived as belonging to the same group?

(A) Proximity
(B) Similarity
(C) Continuity
(D) Enclosure

Answer

**Correct: (B)** The Gestalt principle of **similarity** states that elements sharing visual properties (color, shape, size) are perceived as belonging to the same group, even when spatially separated. This is why `hue` encoding works — red dots scattered across a scatter plot are perceived as "the red group" and blue dots as "the blue group." Proximity (A) would group spatially close points regardless of color. Continuity (C) governs line following. Enclosure (D) requires a visual boundary.

Question 8. What should good alt text for a data visualization include?

(A) Only the chart title
(B) The chart type, what it shows, and the key finding
(C) A detailed description of every data point
(D) The Python code used to create the chart

Answer

**Correct: (B)** Good alt text describes: (1) the **chart type** ("Bar chart"), (2) **what it shows** ("of vaccination coverage by WHO region"), and (3) the **key finding** ("EURO leads at 93%, AFRO trails at 72%"). It should be concise but informative — a screen reader user should understand the main message. Describing every data point (C) is too verbose. Just the title (A) provides no data content. The code (D) is not helpful to someone who cannot see the chart.
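The three-part structure can be captured in a small helper. This is a sketch under stated assumptions: `build_alt_text` and the file name `coverage.png` are hypothetical, and the example values are the ones quoted in the answer above.

```python
import html


def build_alt_text(chart_type, shows, key_finding):
    """Join chart type, what it shows, and the key finding into one string."""
    return f"{chart_type} {shows}. {key_finding}."


alt = build_alt_text(
    chart_type="Bar chart",
    shows="of vaccination coverage by WHO region",
    key_finding="EURO leads at 93%, AFRO trails at 72%",
)

# Escape the text before placing it in an HTML attribute.
tag = f'<img src="coverage.png" alt="{html.escape(alt, quote=True)}">'
print(tag)
```

Keeping the three parts as separate arguments makes it harder to forget the key finding, which is the part most often left out.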

Section 2: True or False (3 questions, 5 points each)

Question 9. True or false: A truncated y-axis (not starting at zero) is always misleading, regardless of the chart type.

Answer

**False.** A truncated y-axis is misleading for **bar charts** because bar length is read as proportional to the value, measured from a zero baseline. However, for **line charts** and **scatter plots**, using a relevant range (not starting at zero) is often appropriate and even preferable. If data ranges from 88 to 92, showing the full 0-100 range compresses all variation into an unreadable band at the top. The key distinction is whether the encoding (length vs. position) implies a baseline of zero.

Question 10. True or false: The "viridis" colormap is both perceptually uniform and accessible to viewers with the most common forms of color vision deficiency.

Answer

**True.** `"viridis"` was specifically designed (by Stéfan van der Walt and Nathaniel Smith; added in matplotlib 1.5 and made the default colormap in matplotlib 2.0) to be perceptually uniform (equal data steps produce equal perceived color changes), accessible to colorblind viewers, and readable when printed in grayscale. It achieves this by varying both hue and luminance systematically, rather than relying on hue alone like the rainbow colormap.

Question 11. True or false: Dual y-axis charts are never appropriate and should always be replaced with separate panels.

Answer

**False.** While dual y-axis charts are frequently misused and should be approached with caution, there are legitimate use cases — for example, showing temperature and precipitation on the same time axis when the relationship between them is the point of the chart and both variables are clearly labeled. The key is transparency: both axes must be prominently labeled, the scales should not be manipulated to create a false impression, and the viewer must be able to clearly distinguish which data corresponds to which axis. In most cases, separate panels are safer, but "never" is too strong.

Section 3: Short Answer (4 questions, 5 points each)

Question 12. Explain why pie charts are generally less effective than bar charts for comparing quantities. Reference Cleveland and McGill's perceptual hierarchy in your answer.

Answer

Pie charts encode values as **angles**, which rank below **length**, the encoding used by bar charts, in Cleveland and McGill's accuracy hierarchy (3rd versus 2nd in the ordering position, length, angle, area, color saturation). Humans are significantly worse at comparing angles than lengths. This means that a viewer can more accurately determine whether Bar A is 15% or 20% longer than Bar B than whether Slice A is 15% or 20% larger in angle than Slice B. The problem worsens with more slices: with 7-8 categories, adjacent slices of similar size (e.g., 12% vs. 14%) are nearly impossible to compare using angle alone.

Question 13. Name three specific things you can do to make a visualization more accessible to someone using a screen reader.

Answer

(1) Write descriptive **alt text** that states the chart type, what it shows, and the key finding. (2) Provide a **data table** alongside the chart so the screen reader can access exact values. (3) Use **semantic HTML headings** (H2 for chart title, H3 for sections) that allow the screen reader user to navigate the document structure. Other valid answers: provide a text summary of key patterns, use ARIA labels on interactive chart elements (in web contexts), ensure that interactive charts have keyboard navigation support.

Question 14. What is the difference between a sequential, diverging, and qualitative color palette? Give one example dataset that is best served by each type.

Answer

**Sequential**: A gradient that runs from light to dark (in a single hue or a smooth multi-hue ramp such as `"viridis"`), for ordered data going in one direction. Example: population density by county (higher = darker). **Diverging**: Two hues radiating from a neutral center, for data with a meaningful midpoint. Example: temperature anomaly from historical average (above average = red, below = blue, average = white). **Qualitative**: Distinct, unrelated hues for unordered categories. Example: product categories in a sales chart (electronics = blue, clothing = green, food = orange; no inherent order or direction).
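As a sketch of how the three palette types map onto built-in matplotlib colormaps, the example below picks one name per type and centers a diverging map on zero with `TwoSlopeNorm`. The anomaly values are hypothetical, and the particular colormap names are choices made here for illustration, not recommendations from the chapter.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so no display is needed
import matplotlib.pyplot as plt
from matplotlib.colors import TwoSlopeNorm

# One built-in matplotlib colormap per palette type.
palettes = {
    "sequential": "viridis",  # ordered data, low to high
    "diverging": "RdBu_r",    # blue below the midpoint, red above
    "qualitative": "tab10",   # unordered categories
}

# Hypothetical temperature anomalies (degrees C relative to a baseline).
anomalies = [-1.5, -0.5, 0.0, 0.8, 3.0]

# TwoSlopeNorm pins the neutral center of the colormap to an anomaly of 0,
# even though the data extend further above zero than below it.
norm = TwoSlopeNorm(vmin=min(anomalies), vcenter=0.0, vmax=max(anomalies))
cmap = plt.get_cmap(palettes["diverging"])

for a in anomalies:
    r, g, b, _ = cmap(float(norm(a)))
    print(f"anomaly {a:+.1f} -> RGB ({r:.2f}, {g:.2f}, {b:.2f})")
```

Without an explicit center, an asymmetric data range would shift the neutral color away from zero and the "above versus below average" reading would no longer hold.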

Question 15. Explain the "squint test" for chart readability and what to do if your chart fails it.

Answer

The squint test: squint at your chart (or view it from across the room) until the details blur. If you can still identify the main message — the key trend, the biggest bar, the dominant pattern — the chart communicates effectively. If the chart becomes an undifferentiated blur, it is too complex. **Fixes when it fails:** (1) Reduce the number of groups or data points. (2) Increase contrast between the most important element and everything else. (3) Facet into multiple simpler panels. (4) Remove chartjunk. (5) Add annotations that highlight the main finding. (6) Simplify the chart type (e.g., replace a complex multi-line chart with a focused comparison of the two most important lines).

Section 4: Applied Scenarios (3 questions, 5 points each)

Question 16. A colleague creates a bar chart comparing quarterly revenue: Q1 = $4.2M, Q2 = $4.5M, Q3 = $4.3M, Q4 = $4.8M. The y-axis starts at $4.0M. The chart title says "Revenue Soared in Q4!" Identify all design problems and suggest fixes.

Answer

**Problems:** (1) **Truncated y-axis** ($4.0M start) exaggerates the differences: Q4's bar is drawn 4x as tall as Q1's ($0.8M vs. $0.2M above the axis start), but the actual difference is about 14%. (2) **Misleading title**: "soared" is editorializing; the increase from Q3 to Q4 is about 12%. (3) **No context**: is this growth normal? How does it compare to the same quarter last year? **Fixes:** (1) Start the y-axis at $0 or use a dot plot with the relevant range. (2) Use a factual title: "Q4 Revenue Reached $4.8M, Up 12% from Q3." (3) Add comparison context: include prior year's Q4 or industry benchmark. (4) Consider showing revenue for multiple years to provide trend context.
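The figures in this answer can be reproduced from the four revenue values given in the question. The sketch below is illustrative; the helper names `pct_change` and `drawn_height_ratio` are assumptions introduced here.

```python
revenue = {"Q1": 4.2, "Q2": 4.5, "Q3": 4.3, "Q4": 4.8}  # $M, from the question


def pct_change(old, new):
    """Percentage change from `old` to `new`."""
    return (new - old) / old * 100


def drawn_height_ratio(a, b, axis_start):
    """How many times taller bar `a` is drawn than bar `b`."""
    return (a - axis_start) / (b - axis_start)


q1_to_q4 = pct_change(revenue["Q1"], revenue["Q4"])
q3_to_q4 = pct_change(revenue["Q3"], revenue["Q4"])
ratio_truncated = drawn_height_ratio(revenue["Q4"], revenue["Q1"], axis_start=4.0)
ratio_zero = drawn_height_ratio(revenue["Q4"], revenue["Q1"], axis_start=0.0)

# Build the factual title from the data instead of typing the number by hand.
title = f"Q4 Revenue Reached ${revenue['Q4']:.1f}M, Up {q3_to_q4:.0f}% from Q3"

print(title)
print(f"Q1 to Q4 change: {q1_to_q4:.1f}%")
print(f"Q4 vs Q1 bar height, axis from $4.0M: {ratio_truncated:.1f}x")
print(f"Q4 vs Q1 bar height, axis from $0:    {ratio_zero:.2f}x")
```

Deriving the percentage in the title from the data keeps the headline and the chart in agreement if the figures are later revised.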

Question 17. You are designing a dashboard for a nonprofit that serves a diverse audience, including people with visual impairments. Name five specific accessibility features you would include.

Answer

(1) **Colorblind-safe palette** (e.g., `"colorblind"` or `"viridis"`) with redundant encoding (shape + color). (2) **Alt text** for every chart describing the type, content, and key finding. (3) **High-contrast text** (minimum 4.5:1 contrast ratio per WCAG AA). (4) **Data tables** alongside every chart so screen reader users can access exact values. (5) **Large, readable fonts** (minimum 12pt for body text, 14pt+ for chart labels). Other valid answers: keyboard-navigable interactive elements, ARIA labels, text summaries of key findings, avoiding reliance on color alone for any information, providing a "text-only" view option.
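The 4.5:1 threshold can be checked programmatically. The sketch below follows the relative-luminance and contrast-ratio formulas as defined in WCAG 2.x; the function names and the example colors are assumptions chosen for illustration.

```python
def _linearize(channel_8bit):
    """Convert one 8-bit sRGB channel to linear light (WCAG 2.x definition)."""
    c = channel_8bit / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Relative luminance of an (R, G, B) tuple with 8-bit channels."""
    r, g, b = (_linearize(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground, background):
    """Contrast ratio between two colors, from 1.0 (identical) to 21.0."""
    l1 = relative_luminance(foreground)
    l2 = relative_luminance(background)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


def passes_wcag_aa(foreground, background, threshold=4.5):
    """True if the pair meets the AA threshold for normal-size text."""
    return contrast_ratio(foreground, background) >= threshold


white = (255, 255, 255)
for label, color in [("black", (0, 0, 0)), ("mid gray", (118, 118, 118)),
                     ("light gray", (170, 170, 170))]:
    ratio = contrast_ratio(color, white)
    print(f"{label} on white: {ratio:.2f}:1, passes AA: {passes_wcag_aa(color, white)}")
```

Running a check like this over every text and background pair in a dashboard theme catches low-contrast labels before they reach users.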

Question 18. You need to show how vaccination coverage has changed over 20 years for 6 WHO regions. A colleague suggests an animated bar chart race. Evaluate this choice using the principles from this chapter and suggest alternatives if appropriate.

Answer

**Evaluation:** A bar chart race (animated horizontal bars reordering each year) is engaging and attention-grabbing, but has design limitations: (1) It is hard to track individual regions across frames because positions change. (2) The animation makes precise comparison difficult — you cannot pause and compare two specific years side by side. (3) It prioritizes entertainment over analysis. (4) It requires a screen/device capable of playing animation (not accessible in print). **Alternatives:** (1) A **line chart** with one line per region — shows all 20 years simultaneously, readers can trace each region's trajectory. (2) A **small multiples** layout — one panel per region, shared y-axis, readers compare by scanning panels. (3) A **slope chart** showing start year vs. end year — highlights the magnitude of change clearly. The animated bar chart race is fine for social media engagement but poor for analytical communication.

Section 5: Code Analysis (2 questions, 5 points each)

Question 19. Identify all design problems in the following chart code:

fig, ax = plt.subplots()
colors = ["red", "green", "blue", "orange",
          "purple", "cyan"]
ax.bar(regions, values, color=colors,
       edgecolor="black", linewidth=2)
ax.set_ylim(70, 100)
ax.grid(True, linewidth=2)
ax.set_facecolor("#e0e0e0")

Answer

Design problems:

1. **Red-green color pair**: hard to distinguish for viewers with red-green color vision deficiency. Red and green are assigned to the first two bars, so the most prominent pair of regions is the one most likely to be confused.
2. **Six different colors for a single variable**: the bars represent different regions but the same metric. Using different colors implies that color encodes something, but it does not (this is not a grouped bar chart). One color should be used for all bars.
3. **Truncated y-axis** (70-100): for a bar chart, this exaggerates differences between regions.
4. **Heavy gridlines** (linewidth=2): distracting chartjunk.
5. **Gray background**: reduces the data-ink ratio; the background competes with the data.
6. **Thick black bar edges** (linewidth=2): visual noise that adds no information.
7. **No title**: the reader does not know what the chart shows.
8. **No axis labels**: no units, no variable names.
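One possible revision that addresses the problems listed above is sketched below. The question does not define `regions` or `values`, so the data, the title, and the axis labels here are hypothetical placeholders, and the single bar color is an arbitrary colorblind-safe blue.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so no display is needed
import matplotlib.pyplot as plt

# Hypothetical data, invented for illustration only.
regions = ["Region A", "Region B", "Region C", "Region D", "Region E", "Region F"]
values = [93, 88, 85, 81, 76, 72]

fig, ax = plt.subplots()
ax.bar(regions, values, color="#0072B2")       # one color for one variable
ax.set_ylim(0, 100)                            # bar charts start at zero
ax.yaxis.grid(True, linewidth=0.5, alpha=0.4)  # light horizontal guides only
ax.set_axisbelow(True)                         # draw gridlines behind the bars
ax.set_title("Coverage by region (hypothetical data)")
ax.set_xlabel("Region")
ax.set_ylabel("Coverage (%)")
```

The default white background and the absence of bar edge lines are deliberate: both remove ink that carried no data in the original version.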

Question 20. Review this alt text for a scatter plot and identify what is wrong. Then write improved alt text.

Original alt text: "A scatter plot. Blue dots and red dots. Some dots are higher than others."

Answer

**Problems:** (1) Does not state what the scatter plot shows (which variables). (2) Does not explain what the colors represent. (3) Does not describe the pattern or key finding. (4) "Some dots are higher than others" conveys no useful information. (5) No mention of data source or context. **Improved alt text:** "Scatter plot of GDP per capita (x-axis, in USD) versus vaccination coverage (y-axis, as a percentage) for 180 countries. Blue dots represent European and American countries; red dots represent African and Asian countries. There is a positive relationship: higher GDP is associated with higher coverage, but the relationship levels off above approximately $30,000 GDP per capita. Source: WHO/World Bank, 2023."