Chapter 18 Exercises: Visualization Design — Principles, Accessibility, Ethics, and Common Mistakes

Contributors to Introduction to Data Science

How to use these exercises: This chapter is more conceptual than Chapters 15-17, so the exercises emphasize critique, redesign, and written analysis alongside code. Parts A-D focus on Chapter 18 material. Part M mixes in earlier skills. Some exercises ask you to write paragraphs, not just code — design is a thinking skill, not just a coding skill.

Difficulty key: 1-star: Foundational | 2-star: Intermediate | 3-star: Advanced | 4-star: Extension

Part A: Conceptual Understanding (1-star)

Exercise 18.1 — Pre-attentive features

Name four pre-attentive visual features. For each, give a concrete example of how you would use it to draw attention to a specific data point in a scatter plot.

Guidance

Color hue (make the point red while the others are gray), size (make it larger), shape (use a star while the others are circles), and position (a point set apart from the main cluster stands out on its own; an annotation arrow helps when it does not). Other valid answers: color intensity (darker than the surrounding points), orientation (a tilted marker), and motion (an animation that highlights the point). The key insight is that pre-attentive features are processed in under 250 ms, before conscious attention, so highlighted elements "pop" instantly.
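
As a rough illustration of the guidance above, the sketch below combines three pre-attentive features (hue, size, shape) on a single point. The data, the target coordinates, and the title are hypothetical, and matplotlib is used here only because earlier chapters rely on it.

```python
import random

import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs anywhere; omit in a notebook
import matplotlib.pyplot as plt

random.seed(0)

# Hypothetical data: 50 context points plus one point of interest.
xs = [random.gauss(0, 1) for _ in range(50)]
ys = [random.gauss(0, 1) for _ in range(50)]
target_x, target_y = 1.5, 1.8

fig, ax = plt.subplots(figsize=(5, 4))

# Context points: muted gray, small, circular.
context = ax.scatter(xs, ys, color="lightgray", s=30, marker="o")

# Point of interest: differs in hue, size, and shape at once.
highlight = ax.scatter([target_x], [target_y], color="red", s=200, marker="*")

ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("One red star among gray circles")
```

Muting the context points matters as much as styling the target: a red star among multicolored markers would not pop.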

Exercise 18.2 — Gestalt principles

For each Gestalt principle below, describe how it applies to a common chart type:

Proximity
Similarity
Continuity
Enclosure
Connection

Guidance

1. Proximity: In a grouped bar chart, bars within a group sit close together, so they are perceived as belonging to the same category.
2. Similarity: In a scatter plot with `hue`, dots of the same color are perceived as a group even when scattered.
3. Continuity: In a line chart, the eye follows the line and perceives the trend.
4. Enclosure: FacetGrid panels have borders, so the viewer treats each panel as a separate unit.
5. Connection: Lines between points in a line chart signal that the points belong to one series; removing the line (leaving a dot plot) weakens that perceived link.

Exercise 18.3 — Cleveland and McGill hierarchy

According to Cleveland and McGill, rank these visual encodings from most to least accurate for comparing quantities: area, angle, position on a common scale, color saturation, length.

For each encoding, name a chart type that uses it.

Guidance

1. Position on a common scale (scatter plot, dot plot).
2. Length (bar chart).
3. Angle (pie chart).
4. Area (bubble chart).
5. Color saturation (heatmap, choropleth).

This hierarchy explains why bar charts are preferred over pie charts and why scatter plots are more precise than bubble charts for comparing quantities.

Exercise 18.4 — Data-ink ratio

In your own words, explain Tufte's data-ink ratio principle. Then list five examples of chartjunk that violate this principle. For each, explain why it is chartjunk (what it adds visually but not informationally).

Guidance

The data-ink ratio is the proportion of ink used to represent data versus total ink used in the chart. Maximize it by removing non-data elements. Chartjunk examples: (1) 3D effects on 2D charts (adds depth that distorts, encodes nothing). (2) Background images or patterns (compete for attention, encode nothing). (3) Gradient fills on bars (visual complexity, no meaning). (4) Excessive gridlines (visual clutter beyond what helps reading values). (5) Decorative borders or shadows (aesthetic only, no data purpose).

Exercise 18.5 — Color palette types

Match each dataset to the most appropriate color palette type (sequential, diverging, or qualitative), and explain why:

Average temperature across US states (higher is warmer)
Profit/loss by department (positive is profit, negative is loss)
Market share by company (no inherent order)
Customer satisfaction score change from last year (positive = improvement, negative = decline)
Population density by county (higher is denser)

Guidance

1. Sequential: temperature has one direction (low to high), no meaningful center.
2. Diverging: zero is a meaningful center (break-even); positive and negative are equally important.
3. Qualitative: companies are unordered categories.
4. Diverging: zero change is the meaningful center.
5. Sequential: density has one direction.

The key distinction: use diverging when there is a meaningful midpoint; use sequential when values go in one direction; use qualitative when categories have no order.
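
The decision rule in the guidance can be written down as a tiny function. This is only a sketch: the function name `palette_type` and its two boolean flags are hypothetical, and deciding whether a dataset has a meaningful midpoint remains a judgment call that code cannot make for you.

```python
def palette_type(ordered: bool, has_midpoint: bool) -> str:
    """Pick a palette family from two properties of the variable being colored."""
    if not ordered:
        return "qualitative"   # categories with no inherent order
    if has_midpoint:
        return "diverging"     # ordered values around a meaningful center
    return "sequential"        # ordered values running in one direction


# The five datasets from the exercise, described as (ordered, has_midpoint).
datasets = {
    "average temperature by state": (True, False),
    "profit/loss by department": (True, True),
    "market share by company": (False, False),
    "satisfaction score change": (True, True),
    "population density by county": (True, False),
}

choices = {name: palette_type(*flags) for name, flags in datasets.items()}
for name, choice in choices.items():
    print(f"{name}: {choice}")
```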

Exercise 18.6 — Alt text practice

Write appropriate alt text for each of the following hypothetical charts:

A bar chart showing vaccination coverage for six WHO regions.
A line chart showing GDP growth over 20 years for five countries.
A scatter plot of height vs. weight for 500 people, colored by gender.

Guidance

1. "Bar chart of mean vaccination coverage by WHO region in 2023. EURO leads at 93%, WPRO at 91%, AMRO at 85%, EMRO at 82%, SEARO at 80%, and AFRO at 72%. Source: WHO."
2. "Line chart of GDP growth (%) from 2004 to 2023 for USA, China, India, Germany, and Brazil. China and India show the steepest growth trajectories, with China averaging 6-7% and India 5-6%. The USA, Germany, and Brazil show more moderate growth of 2-3%."
3. "Scatter plot of height (cm) vs. weight (kg) for 500 adults, colored by gender. Both genders show a positive correlation between height and weight. Male data points cluster higher and to the right, with mean height around 175 cm and mean weight around 80 kg. Female data points cluster lower and to the left, with mean height around 163 cm and mean weight around 65 kg."
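
Alt text for a simple bar chart follows a predictable pattern (chart type, what is measured, the values, the source), so a first draft can be generated from the data. The helper below is a hypothetical sketch, not a library function; the coverage figures are the illustrative ones from the first sample answer, and a generated draft still needs a human edit to state the main finding.

```python
def bar_chart_alt_text(title, values, unit="%", source=None):
    """Draft alt text for a bar chart: type, subject, ranked values, source."""
    ranked = sorted(values.items(), key=lambda kv: kv[1], reverse=True)
    parts = [f"{name} at {value}{unit}" for name, value in ranked]
    text = f"Bar chart of {title}. Highest to lowest: " + ", ".join(parts) + "."
    if source:
        text += f" Source: {source}."
    return text


# Illustrative values taken from the sample answer above.
coverage = {"AFRO": 72, "AMRO": 85, "EMRO": 82, "EURO": 93, "SEARO": 80, "WPRO": 91}

alt = bar_chart_alt_text(
    "mean vaccination coverage by WHO region in 2023", coverage, source="WHO"
)
print(alt)
```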

Exercise 18.7 — Identifying misleading techniques

For each misleading technique below, explain (a) how it works, (b) why it is deceptive, and (c) how to fix it:

Truncated y-axis on a bar chart
Cherry-picked time range
Dual y-axes
Area distortion (doubling radius instead of area)

Guidance

1. Truncated y-axis: (a) Starting the y-axis at a value other than zero. (b) Readers compare bar lengths as ratios, so truncation exaggerates small differences. (c) Start at zero, or use a dot plot instead of bars.
2. Cherry-picked time range: (a) Showing only the dates that support your argument. (b) Hides the full context, which might show a different pattern. (c) Show the full available time range, or show the subset within the full range.
3. Dual y-axes: (a) Two different y-scales on the same chart. (b) The creator controls the apparent relationship by adjusting each scale independently. (c) Use separate panels, normalize to a common scale, or compute a meaningful ratio.
4. Area distortion: (a) Encoding value as radius or diameter instead of area. (b) Area grows with the square of the radius, so a 2x data increase produces a 4x visual increase. (c) Set area proportional to value (plotly and seaborn do this correctly with `size`).
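
The arithmetic behind area distortion is short enough to check directly. In this sketch the function `marker_radius` and the reference values are hypothetical; the point is only that scaling the radius by the data ratio squares the visual effect, while scaling it by the square root keeps area proportional to value.

```python
import math


def marker_radius(value, reference_value, reference_radius, encode="area"):
    """Radius for a circle representing `value`, given a reference circle."""
    ratio = value / reference_value
    if encode == "area":
        return reference_radius * math.sqrt(ratio)  # area scales with value
    if encode == "radius":
        return reference_radius * ratio             # area scales with value squared
    raise ValueError("encode must be 'area' or 'radius'")


def circle_area(radius):
    return math.pi * radius ** 2


# A value of 200 drawn next to a reference value of 100 with radius 10.
honest = marker_radius(200, 100, 10, encode="area")
distorted = marker_radius(200, 100, 10, encode="radius")

honest_ratio = circle_area(honest) / circle_area(10)        # about 2: matches the data
distorted_ratio = circle_area(distorted) / circle_area(10)  # about 4: doubles the effect

print(f"area-encoded: {honest_ratio:.1f}x, radius-encoded: {distorted_ratio:.1f}x")
```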

Part B: Applied Skills (2-star)

Exercise 18.8 — Colorblind-safe redesign

Create a scatter plot with three groups using the default matplotlib color cycle. Then recreate it using:
1. The seaborn "colorblind" palette
2. Color + shape encoding (each group gets a different color AND marker shape)
3. Direct labels instead of a legend

Which version is most accessible? Which is most practical for 10+ groups?

Guidance

For version 1, use `sns.set_palette("colorblind")`. For version 2, use `hue` and `style` together. For version 3, place a text label next to each group's cluster (in a line chart, at the last data point of each series). The color+shape version is most accessible because it encodes group membership redundantly in two independent channels. Direct labels are most practical for many groups because legends with 10+ entries are hard to match to data. Color alone is never sufficient.
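
One possible shape for versions 2 and 3 combined, sketched in plain matplotlib. Note the substitution: instead of the seaborn "colorblind" palette, the sketch hard-codes three colors from the Okabe-Ito colorblind-safe set, and the group names, cluster centers, and data are hypothetical.

```python
import random

import matplotlib
matplotlib.use("Agg")  # headless backend; omit in a notebook
import matplotlib.pyplot as plt

random.seed(1)

# Each group gets its own color AND its own marker shape.
# Colors are from the Okabe-Ito colorblind-safe set.
styles = {
    "Group A": {"color": "#0072B2", "marker": "o", "center": (0, 0)},
    "Group B": {"color": "#E69F00", "marker": "s", "center": (3, 1)},
    "Group C": {"color": "#009E73", "marker": "^", "center": (1, 3)},
}

fig, ax = plt.subplots(figsize=(5, 4))
for name, style in styles.items():
    cx, cy = style["center"]
    xs = [random.gauss(cx, 0.5) for _ in range(30)]
    ys = [random.gauss(cy, 0.5) for _ in range(30)]
    ax.scatter(xs, ys, color=style["color"], marker=style["marker"])
    # Direct label above the cluster replaces a legend entry.
    ax.annotate(name, (cx, cy), xytext=(0, 30), textcoords="offset points",
                ha="center", fontweight="bold")

ax.set_xlabel("x")
ax.set_ylabel("y")
```

With seaborn, passing the same column to both `hue` and `style` in `sns.scatterplot` should give the equivalent color-plus-shape encoding.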

Exercise 18.9 — Before/after: Removing chartjunk

Create a deliberately cluttered bar chart with:
- A gray background
- Heavy gridlines
- 3D-looking bars (use edgecolor and hatch patterns)
- Different colors for each bar (single variable, no meaning)
- A gradient title
- A redundant legend

Then create a clean redesign following data-ink ratio principles. Compare the two side by side. Count the number of non-data visual elements in each version.

Guidance

The cluttered version should be ugly on purpose. The clean version: one color for all bars, no background, minimal or no gridlines, clean spines (remove top and right), simple title, no legend (since all bars represent the same variable). Count non-data elements: the cluttered version might have 8-10 (background, grid, edges, hatch, multi-color, legend, borders, etc.); the clean version might have 2-3 (axis labels, title, thin axis lines).
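
A minimal sketch of the clean half of the comparison, assuming hypothetical product names and values. It applies the choices listed above: one color, no background fill, light horizontal gridlines only, top and right spines removed, no legend.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit in a notebook
import matplotlib.pyplot as plt

# Hypothetical data: one variable measured for five products.
categories = ["A", "B", "C", "D", "E"]
values = [23, 45, 12, 37, 30]

fig, ax = plt.subplots(figsize=(6, 4))

# One color for all bars: color would otherwise imply a second variable.
bars = ax.bar(categories, values, color="#4C72B0")

# Remove the spines that frame the plot without carrying data.
for side in ("top", "right"):
    ax.spines[side].set_visible(False)

# Keep only faint horizontal gridlines, drawn behind the bars.
ax.yaxis.grid(True, color="0.9", linewidth=0.8)
ax.set_axisbelow(True)

ax.set_xlabel("Product")
ax.set_ylabel("Units sold")
ax.set_title("Product B sold the most units", loc="left")
```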

Exercise 18.10 — Truncated axis demonstration

Using the same data (e.g., test scores of 88, 90, 91, 89, 92 for five students), create:
1. A bar chart with y-axis starting at 0
2. A bar chart with y-axis starting at 85
3. A dot plot with y-axis showing the relevant range (85-95)

Write a paragraph explaining which is appropriate for which situation. Why is option 2 misleading for bars but option 3 is acceptable for dots?

Guidance

Option 1 is honest but makes the differences look tiny. Option 2 exaggerates differences because readers compare bar heights as ratios ("a bar twice as tall means twice the value"): with the axis starting at 85, the bar for 92 is more than twice as tall as the bar for 88, although the scores differ by under 5%. Option 3 is acceptable because a dot encodes its value by position on the scale, not by length from a baseline, so it does not invite that ratio reading. The dot plot says "these values are close together in the 88-92 range"; the truncated bar chart says "these values are dramatically different."
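
The exaggeration can be quantified by comparing drawn bar heights under each baseline. The student labels and the helper name `apparent_ratio` below are hypothetical; the scores are the ones from the exercise.

```python
scores = {"Student 1": 88, "Student 2": 90, "Student 3": 91,
          "Student 4": 89, "Student 5": 92}


def apparent_ratio(low, high, baseline):
    """Ratio of drawn bar heights when the axis starts at `baseline`."""
    return (high - baseline) / (low - baseline)


lowest, highest = min(scores.values()), max(scores.values())

ratio_from_zero = apparent_ratio(lowest, highest, baseline=0)    # 92 / 88
ratio_truncated = apparent_ratio(lowest, highest, baseline=85)   # 7 / 3

print(f"axis from 0:  tallest bar is {ratio_from_zero:.2f}x the shortest")   # 1.05x
print(f"axis from 85: tallest bar is {ratio_truncated:.2f}x the shortest")   # 2.33x
```

The data ratio is about 1.05, so the truncated chart shows an effect more than twice as large as the one in the data.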

Exercise 18.11 — Pie chart replacement

You receive a pie chart showing market share for 8 products. The three smallest slices are nearly indistinguishable (7%, 6%, 5%). Redesign this as:
1. A horizontal bar chart, sorted by value
2. A treemap (using plotly, or matplotlib with a helper package such as squarify)

Which redesign is better for precise comparison? Which is better for showing part-to-whole relationship?

Guidance

The horizontal bar chart is better for precise comparison (position/length on a common scale). The treemap preserves the part-to-whole relationship (areas sum to the total). Both are superior to the pie chart with 8 slices. The bar chart is the safest choice for most audiences because bar length is the second-most-accurately-perceived visual encoding.
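
A sketch of the sorted horizontal bar redesign. Only the three smallest shares (7%, 6%, 5%) come from the exercise; the product names and the other five shares are hypothetical values chosen so the total is 100%.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit in a notebook
import matplotlib.pyplot as plt

# Hypothetical market shares (%); the three smallest match the exercise.
shares = {"Alpha": 28, "Beta": 20, "Gamma": 15, "Delta": 11,
          "Epsilon": 8, "Zeta": 7, "Eta": 6, "Theta": 5}

# Sort ascending: barh draws the first item at the bottom,
# so the largest share ends up at the top.
ordered = sorted(shares.items(), key=lambda kv: kv[1])
names = [name for name, _ in ordered]
values = [value for _, value in ordered]

fig, ax = plt.subplots(figsize=(6, 4))
bars = ax.barh(names, values, color="#4C72B0")

# Label each bar with its value so small differences are readable.
for bar, value in zip(bars, values):
    ax.text(value + 0.5, bar.get_y() + bar.get_height() / 2,
            f"{value}%", va="center")

for side in ("top", "right"):
    ax.spines[side].set_visible(False)
ax.set_xlabel("Market share (%)")
ax.set_xlim(0, max(values) + 5)
```

On a common baseline the 7%, 6%, and 5% bars are visibly different lengths, which the pie slices were not.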

Exercise 18.12 — Aspect ratio exploration

Take a time series with a gradual upward trend. Create three versions:
1. Aspect ratio 3:1 (very wide)
2. Aspect ratio 1:1 (square)
3. Aspect ratio 1:3 (very tall)

How does the perceived steepness of the trend change? Which aspect ratio follows Cleveland's "banking to 45 degrees" principle?

Guidance

The wide version makes the trend look nearly flat. The tall version makes it look dramatically steep. The square version will often look the most balanced, but the ratio that follows Cleveland's principle depends on the data: choose the aspect ratio at which the line segments have an average orientation of roughly 45 degrees. Slopes near 45 degrees are the easiest to compare, so this maximizes the viewer's ability to judge both upward and downward movements. The student should observe how the same data tells different stories depending on aspect ratio alone.
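
One simple way to estimate a banked aspect ratio is to make the median absolute segment slope appear at 45 degrees. This is a simplified variant, not necessarily the exact procedure the chapter has in mind; the function name and the sample series are hypothetical. A segment with data slope s is drawn with physical slope s × (height / width) × (x-range / y-range), and setting that to 1 gives the formula in the code.

```python
from statistics import median


def banked_aspect_ratio(xs, ys):
    """Height/width at which the median absolute segment slope plots at 45 degrees."""
    x_range = max(xs) - min(xs)
    y_range = max(ys) - min(ys)
    slopes = [
        abs((y1 - y0) / (x1 - x0))
        for x0, y0, x1, y1 in zip(xs, ys, xs[1:], ys[1:])
        if x1 != x0
    ]
    # Flat segments have no orientation to bank, so leave them out.
    slopes = [s for s in slopes if s > 0]
    return y_range / (x_range * median(slopes))


# Hypothetical series: a gradual upward trend with a small zigzag on top.
xs = list(range(21))
ys = [10 + 0.5 * i + (1 if i % 2 else -1) for i in range(21)]

ratio = banked_aspect_ratio(xs, ys)
width = 8  # inches, an arbitrary choice
print(f"height/width = {ratio:.3f}, so an {width}-inch-wide figure "
      f"would be about {width * ratio:.1f} inches tall")
```

For a zigzag series like this one the banked ratio comes out well below 1, which is one reason noisy time series are usually drawn wide.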

Exercise 18.13 — Dual axis critique

Create a dual-axis chart that appears to show a strong correlation between two unrelated variables (e.g., ice cream sales and shark attacks by month). Then create a redesign using two separate panels with a shared x-axis. Write a paragraph explaining why the dual-axis version is misleading and the panel version is honest.

Guidance

The dual-axis version is misleading because the creator controls both y-axis scales independently. By choosing scales that align the two lines, any two seasonal variables appear correlated. The two-panel version puts each variable on its own honestly-scaled y-axis, making the viewer judge each trend independently. The student should learn that visual proximity on a chart implies comparison, and dual axes exploit this by creating false comparisons through scale manipulation.
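
The two-panel redesign might look like the sketch below. The monthly figures are invented for illustration (both series simply peak in summer), and each panel starts its y-axis at zero so neither scale is tuned to match the other.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit in a notebook
import matplotlib.pyplot as plt

months = list(range(1, 13))
# Hypothetical monthly values; both happen to be seasonal.
ice_cream_sales = [20, 22, 30, 45, 60, 85, 95, 90, 65, 40, 25, 21]  # thousands of units
shark_attacks = [0, 0, 1, 1, 2, 4, 5, 5, 3, 1, 0, 0]                # reported incidents

# Two stacked panels that share only the x-axis (month).
fig, (ax_top, ax_bottom) = plt.subplots(2, 1, sharex=True, figsize=(6, 5))

ax_top.plot(months, ice_cream_sales, marker="o", color="#0072B2")
ax_top.set_ylabel("Ice cream sales (thousands)")
ax_top.set_ylim(bottom=0)

ax_bottom.plot(months, shark_attacks, marker="s", color="#D55E00")
ax_bottom.set_ylabel("Shark attacks")
ax_bottom.set_ylim(bottom=0)
ax_bottom.set_xlabel("Month")
ax_bottom.set_xticks(months)

fig.suptitle("Both series peak in summer; neither is plotted against the other")
```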

Exercise 18.14 — The design checklist in practice

Take a visualization you created in Chapters 15-17. Apply the full design checklist from Section 18.10. For each item, record whether it passes or fails. Fix all failures and present the before/after.

Guidance

Most student-created charts will fail at least 3-4 checklist items: missing alt text, default colors that may not be colorblind-safe, no source citation, decorative encodings that carry no information (for example, a different color for every bar of a single variable), or insufficient labeling. The exercise forces the student to critically evaluate their own work, which is harder (and more valuable) than critiquing someone else's.

Part C: Real-World Applications (2-3 star)

Exercise 18.15 — Redesign a government chart (3-star)

Find a chart from a government report, news article, or company annual report (or use one provided by your instructor). Apply the redesign workflow from Section 18.8:
1. Identify the message
2. Audit the encodings
3. Check accessibility
4. Check honesty
5. Remove chartjunk
6. Add annotations

Recreate the chart in Python with your improvements. Write a one-page analysis comparing the original and your redesign.

Guidance

This is an open-ended exercise. Common issues students find: pie charts with too many slices, 3D effects, truncated axes, missing labels, poor color choices, and no source citation. The analysis should be specific: "The original used a rainbow colormap, which is perceptually non-uniform and hard to read for the roughly 8% of men with some form of color vision deficiency. I replaced it with 'viridis,' which is perceptually uniform and colorblind-safe."

Exercise 18.16 — Accessibility audit (2-star)

Take any three charts you created in earlier chapters. For each:
1. Simulate color vision deficiency (use an online simulator or convert to grayscale).
2. Check if all groups are still distinguishable.
3. Write alt text.
4. Check contrast ratios for text elements.
5. List specific changes needed to make each chart fully accessible.

Guidance

Converting to grayscale is a quick, conservative proxy for colorblind testing: if groups are distinguishable in grayscale, they will likely work for most forms of color vision deficiency. A chart that fails the grayscale test may still be readable with a color vision deficiency, so confirm with a simulator before redesigning. Common fixes: add shape encoding alongside color, switch to a colorblind palette, increase font size, add direct labels. The student should discover that most of their earlier charts relied on color alone for group distinction.
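
For the contrast check in step 4, the ratio can be computed by hand instead of read off a web tool. The sketch below follows the relative-luminance and contrast-ratio definitions from the WCAG 2.x guidelines, which set 4.5:1 as the AA minimum for normal-size text; the function names and the example colors are hypothetical.

```python
def _linearize(channel_8bit):
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.x definition)."""
    c = channel_8bit / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color):
    """Relative luminance of a color given as '#RRGGBB'."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(foreground, background):
    """Contrast ratio between two colors, from 1 (identical) to 21 (black on white)."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


# Example: axis labels in light gray vs. dark gray on a white background.
for text_color in ("#AAAAAA", "#333333"):
    ratio = contrast_ratio(text_color, "#FFFFFF")
    verdict = "passes" if ratio >= 4.5 else "fails"
    print(f"{text_color} on white: {ratio:.2f}:1 ({verdict} the 4.5:1 AA threshold)")
```

Light gray tick labels, a common default in "minimal" themes, are a frequent failure here.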

Exercise 18.17 — The misleading chart gallery (3-star)

Create three deliberately misleading charts using real (or realistic) data:
1. A truncated-axis bar chart that exaggerates a small difference
2. A cherry-picked time range that reverses the apparent trend
3. A dual-axis chart that makes two unrelated variables appear correlated

For each, also create the honest version. Write a paragraph for each pair explaining how the misleading version deceives and how the honest version corrects it.

Guidance

This exercise teaches through construction. By creating misleading charts intentionally, students learn to recognize the techniques when others use them. The key learning is that all three techniques use real data — not a single number is fabricated — yet the visual impression is false. The dishonesty is in the design choices, not the data.
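
A small numeric sketch of the cherry-picking case: every number is used as given, yet the chosen window flips the sign of the change. The yearly values and the helper name `change` are hypothetical.

```python
# Hypothetical yearly metric: a long rise with a short dip from 2019 to 2022.
values = {2014: 50, 2015: 54, 2016: 58, 2017: 63, 2018: 67,
          2019: 72, 2020: 70, 2021: 68, 2022: 66, 2023: 69}


def change(series, start, end):
    """Difference between the value at `end` and the value at `start`."""
    return series[end] - series[start]


full_range = change(values, 2014, 2023)     # whole record
cherry_picked = change(values, 2019, 2022)  # window chosen to show a decline

print(f"2014-2023: {full_range:+d}")     # +19
print(f"2019-2022: {cherry_picked:+d}")  # -6
```

The honest version plots 2014-2023 and, if the dip matters, shades or annotates the 2019-2022 window inside it.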

Exercise 18.18 — Visualization for different audiences (2-star)

Take a single dataset (e.g., vaccination coverage by region and year). Create three versions of the same information, each designed for a different audience:
1. A technical audience (fellow data scientists)
2. A general audience (newspaper readers)
3. An executive audience (busy decision-makers who will glance for 10 seconds)

Explain how the chart type, annotation level, color scheme, and complexity differ across audiences.

Guidance

Technical: more detail, possibly a pair plot or faceted regression with confidence bands, minimal annotation (the audience knows how to read it). General: simpler chart type (bar or line), more annotation, clearer title stating the finding, larger text. Executive: the simplest possible chart (single bar chart or KPI number), the main finding in the title, one key comparison, actionable implication in a subtitle. The student should learn that the same data serves different purposes for different audiences.

Exercise 18.19 — Annotation practice (2-star)

Take a line chart showing vaccination coverage over time for three regions. Create two versions:
1. Bare: no annotations beyond axis labels
2. Annotated: add a title stating the main finding, direct labels on each line, a reference line at 90% (herd immunity threshold), and a callout arrow pointing to the year when the gap between highest and lowest regions was largest

Compare the two versions. How does annotation change the reader's experience?

Guidance

The bare version requires the reader to study the chart, identify the patterns, and draw their own conclusions. The annotated version guides the reader to the key findings immediately. Annotations do not change the data — they change the narrative. The annotated version is better for communication; the bare version is better for exploration (it does not bias the viewer toward a specific interpretation). Most shared charts should be annotated.
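
A sketch of the annotated version, assuming made-up coverage figures for three placeholder regions. The year with the widest gap is computed from the data so the callout cannot drift out of sync with the lines.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit in a notebook
import matplotlib.pyplot as plt

years = list(range(2015, 2024))
# Hypothetical coverage (%) for three placeholder regions.
coverage = {
    "Region A": [88, 89, 90, 91, 92, 90, 91, 92, 93],
    "Region B": [80, 81, 83, 84, 85, 80, 82, 84, 86],
    "Region C": [70, 72, 73, 75, 76, 66, 70, 73, 75],
}

# Gap between the highest and lowest region in each year.
gaps = [max(column) - min(column) for column in zip(*coverage.values())]
widest = gaps.index(max(gaps))
widest_year = years[widest]
lowest_value = min(series[widest] for series in coverage.values())

fig, ax = plt.subplots(figsize=(7, 4))
for name, series in coverage.items():
    ax.plot(years, series)
    # Direct label at the end of each line instead of a legend.
    ax.text(years[-1] + 0.1, series[-1], name, va="center")

# Reference line at the 90% threshold named in the exercise.
ax.axhline(90, color="gray", linestyle="--", linewidth=1)
ax.text(years[0], 90.5, "90% threshold", color="gray")

# Callout arrow pointing at the year with the widest gap.
ax.annotate(f"Widest gap: {max(gaps)} points",
            xy=(widest_year, lowest_value),
            xytext=(widest_year - 3, lowest_value - 4),
            arrowprops=dict(arrowstyle="->"))

ax.set_xlim(years[0], years[-1] + 1.5)  # leave room for the direct labels
ax.set_ylim(55, 100)
ax.set_xlabel("Year")
ax.set_ylabel("Coverage (%)")
ax.set_title(f"Regional coverage gap peaked at {max(gaps)} points in {widest_year}",
             loc="left")
```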

Part D: Synthesis and Extension (3-4 star)

Exercise 18.20 — Complete redesign portfolio (4-star)

Collect five flawed visualizations (from the internet, textbooks, news articles, or your own earlier work). For each:
1. Screenshot the original
2. List all design problems (using the categories from this chapter)
3. Redesign in Python
4. Write alt text for the redesign
5. Explain each design decision

Present as a portfolio document with before/after pairs.

Guidance

This is a capstone exercise for Part III visualization. The portfolio demonstrates that the student can not only create charts but critically evaluate and improve them. Grading should assess: correctness of problem identification, quality of redesigns, thoroughness of alt text, and clarity of written explanations. Bonus for finding subtle problems (area distortion, colorblind inaccessibility) beyond obvious ones (chartjunk, truncated axes).

Exercise 18.21 — Design principles debate (3-star)

For each pair of statements below, argue both sides and then state your position with justification:

"All bar charts must start at zero" vs. "Sometimes showing a relevant range is more informative"
"Pie charts should never be used" vs. "Pie charts are effective for showing parts of a whole when there are 2-3 categories"
"Annotations bias the reader and should be minimized" vs. "Unannotated charts are irresponsible because readers may misinterpret them"

Guidance

These are genuine debates in the visualization community. For (1), the nuanced position is that bar charts should start at zero (because bar length implies proportion) but dot plots and line charts can use meaningful ranges. For (2), pie charts are defensible for 2-3 categories where the part-to-whole relationship is the point (e.g., "60% vs 40%"). For (3), the balance depends on context: exploration (minimal annotation) vs. communication (more annotation). The student should learn that design principles are guidelines, not absolute rules.

Exercise 18.22 — Ethics scenario analysis (3-star)

You are a data analyst at a company. Your manager asks you to create a chart for the quarterly report showing that customer satisfaction "increased significantly" from 78.2 to 79.1 (out of 100) over the past quarter. She suggests starting the y-axis at 77 to "make the improvement visible."

Is the suggested chart misleading? Why or why not?
What are three honest alternatives that still communicate the improvement?
Draft an email to your manager explaining your design recommendation.

Guidance

1. Yes, if the chart is a bar chart. With the axis starting at 77, the bars are 1.2 and 2.1 units tall, so the second bar looks 75% taller even though the score rose by only 0.9 points (about 1.2%).
2. Honest alternatives: (a) A dot plot with a relevant range and explicit annotation "0.9-point improvement." (b) A bar chart starting at zero with annotation calling out the change. (c) A text-based KPI display: "78.2 -> 79.1 (+0.9 points, +1.2%)" with a trend arrow.
3. The email should be respectful but clear: "I'd recommend a format that highlights the improvement without exaggerating the scale. Here are three options that honestly show the 0.9-point gain..."
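
The text-based KPI in alternative (c) can be generated from the two scores so that the point change and percent change always agree. The function name `kpi_text` is hypothetical.

```python
def kpi_text(before, after):
    """Format a before/after KPI with absolute and relative change."""
    delta = after - before
    percent = delta / before * 100
    return f"{before:.1f} -> {after:.1f} ({delta:+.1f} points, {percent:+.1f}%)"


summary = kpi_text(78.2, 79.1)
print(summary)  # 78.2 -> 79.1 (+0.9 points, +1.2%)
```

Stating both numbers lets the reader judge the size of the change without any axis at all.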

Part M: Mixed Review (integrating earlier chapters)

Exercise 18.23 — Tool selection and design (2-star)

For each visualization task below, choose the best tool (matplotlib, seaborn, or plotly) AND the best design approach (chart type, color palette, annotation strategy). Justify both choices.

A printed report showing correlation between 8 health metrics
A website allowing donors to explore vaccination data by country
A slide deck showing the distribution of income inequality across regions
A notebook exploration of whether education level predicts voting behavior

Guidance

1. seaborn heatmap, `"coolwarm"` diverging palette, mask upper triangle, `annot=True` (static for print).
2. plotly choropleth with hover, `"YlGnBu"` sequential palette (interactive for web).
3. seaborn violin or box plot, `"colorblind"` palette, large `"talk"` context (static for slides).
4. seaborn lmplot or relplot with faceting, any accessible palette (static for notebook).

The design choice matters as much as the tool choice.
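
For the first task, the "mask upper triangle" step is the least obvious part, so here it is in isolation using NumPy. The random data stands in for the 8 health metrics and is purely hypothetical; the seaborn call in the final comment shows how the mask would typically be passed and is not executed here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for 200 observations of 8 health metrics.
data = rng.normal(size=(200, 8))

# Correlation matrix between the 8 columns.
corr = np.corrcoef(data, rowvar=False)

# True marks cells to hide: the diagonal and everything above it,
# since the matrix is symmetric and the diagonal is always 1.
mask = np.triu(np.ones_like(corr, dtype=bool))

visible_cells = int((~mask).sum())
print(f"{visible_cells} of {corr.size} cells remain visible")

# With seaborn installed, the mask would be passed along these lines:
# sns.heatmap(corr, mask=mask, cmap="coolwarm", vmin=-1, vmax=1,
#             center=0, annot=True, fmt=".2f")
```

Fixing `vmin`, `vmax`, and `center` keeps the diverging palette's neutral color at zero correlation.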

Exercise 18.24 — End-to-end visualization workflow (3-star)

Take the vaccination dataset through the complete visualization workflow:
1. Clean the data (Chapter 8 skills)
2. Explore with pair plot and distributions (Chapter 16)
3. Create an interactive map (Chapter 17)
4. Apply the design checklist (Chapter 18)
5. Redesign for accessibility
6. Write alt text for each final chart
7. Export static versions for a report and interactive versions for the web

Document each step and the design decisions you made.

Guidance

This integrates all of Part II (wrangling) and Part III (visualization). The documentation is as important as the code — the student should justify: why they chose each chart type, which palette and why, what they removed as chartjunk, how they ensured accessibility, and what story the visualizations tell. This is a near-professional workflow.

Exercise 18.25 — Teaching visualization principles (3-star)

Create a "cheat sheet" (one page, front and back) summarizing the most important visualization design principles from this chapter. It should include:
1. The pre-attentive features list
2. A "which chart type" decision guide
3. The accessibility checklist
4. The three most common misleading techniques and how to spot them
5. At least one before/after example

Design the cheat sheet itself as a well-designed visualization of information — practice what you preach.

Guidance

This meta-exercise asks students to apply design principles to the design of a reference document about design principles. The cheat sheet should be clean, well-organized, and scannable — not a wall of text. Use tables, short bullet points, and consistent formatting. The before/after example should be small but impactful (e.g., a truncated axis correction). Creating teaching materials reinforces learning by requiring the student to prioritize and synthesize.