Answers to Selected Exercises

This appendix provides brief answers to selected Part A (conceptual) exercises from each chapter. For coding exercises, consult the companion repository for complete, runnable solutions.

Chapter 1: Why Visualization Matters

Exercise 1.1. A table of 500 rows of daily sales figures hides trends, seasonality, and outliers in raw numbers. A line chart reveals the upward trend instantly, while a summary statistic (mean) would conceal the seasonal dips visible in the chart. Visualization leverages the pre-attentive visual system, processing spatial patterns in milliseconds where reading numbers takes seconds per row.

Exercise 1.2. Anscombe's Quartet demonstrates that four datasets with nearly identical summary statistics (mean, variance, correlation, regression line) look completely different when plotted. The takeaway: always plot your data before relying on summary statistics alone.
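The claim is easy to check numerically. The sketch below uses the published quartet values and only the standard library; the helper name pearson_r is illustrative and does not come from the companion repository.

```python
from math import sqrt
from statistics import mean

# Anscombe's quartet. Datasets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# (mean of x, mean of y, correlation) for each dataset
summary = {
    name: (mean(xs), mean(ys), pearson_r(xs, ys))
    for name, (xs, ys) in QUARTET.items()
}

for name, (mx, my, r) in summary.items():
    print(f"{name:>3}: mean x = {mx:.2f}, mean y = {my:.2f}, r = {r:.3f}")
```

The four rows agree to about two decimal places, which is why only a plot reveals how different the datasets are.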

Exercise 1.3. The Challenger O-ring example shows that plotting temperature against O-ring failure incidents would have made the risk at low launch temperatures visually obvious. The engineers presented tables instead of charts, and the dangerous trend went unrecognized.

Chapter 2: How the Eye Sees

Exercise 2.1. Position along a common scale is the most accurately perceived encoding (Cleveland and McGill ranking). Length and angle are moderately accurate. Area and color saturation are least accurate. You should encode the most important variable with position and relegate secondary variables to less precise channels like color or size.

Exercise 2.2. Proximity causes nearby dots to be perceived as a group. Similarity (shared color or shape) further reinforces grouping. Enclosure (drawing a boundary) is usually stronger still and can override both. These Gestalt principles explain why legend entries should match the visual order in the chart and why whitespace separates logical sections.

Chapter 3: Color

Exercise 3.1. A rainbow (jet) colormap is problematic because: (1) it is not perceptually uniform -- equal data steps do not produce equal perceived color steps; (2) it creates false boundaries where perceptual bands meet; (3) its red and green regions are hard to tell apart for viewers with the most common forms of color vision deficiency. Replace it with viridis, plasma, or another perceptually uniform alternative.

Exercise 3.2. For a diverging dataset (departure from average), use a diverging palette like RdBu or PiYG centered at zero. For a sequential dataset (population density), use a sequential palette like viridis or YlOrRd. For categorical data (five product lines), use a qualitative palette like Set2 or the Wong palette.

Exercise 3.3. Test with a simulator (Color Oracle or Coblis). Use redundant encodings (pair color with pattern, shape, or direct labels). Stick to palettes known to be colorblind-safe (viridis family, Wong palette). Verify the chart still communicates when printed in grayscale.

Chapter 4: Lies, Distortions, and Honest Charts

Exercise 4.1. A truncated y-axis on a bar chart exaggerates differences because bars encode value through length from a baseline. A small absolute difference appears enormous when the baseline is moved from zero to near the minimum value. Line charts are more forgiving of truncation because they encode change through slope, not absolute length.
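The size of the exaggeration can be computed directly. This is a minimal sketch with hypothetical values (52 versus 50 units); the function name apparent_ratio is an illustration, not part of any library.

```python
def apparent_ratio(a, b, baseline=0.0):
    """Ratio of the drawn bar lengths when the value axis starts at `baseline`."""
    if min(a, b) <= baseline:
        raise ValueError("baseline must lie below both values")
    return (a - baseline) / (b - baseline)


# Hypothetical values: two bars of 52 and 50 units.
honest = apparent_ratio(52, 50)                   # axis starts at zero
truncated = apparent_ratio(52, 50, baseline=48)   # axis starts at 48

print(f"zero baseline: {honest:.2f}x, truncated baseline: {truncated:.2f}x")
```

With a zero baseline the taller bar is 4% longer, matching the data. With the axis starting at 48 it is drawn twice as long, although the data have not changed.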

Exercise 4.2. The dual-axis chart is misleading because the two y-axis scales can be independently chosen to suggest a correlation that may not exist. Stretching or compressing either axis changes the apparent relationship. A better alternative is two aligned panels (one variable per panel, shared x-axis) or normalizing both series to a common scale.

Chapter 5: Choosing the Right Chart

Exercise 5.1. (a) Comparing quarterly revenue across five regions: horizontal bar chart, sorted by value. (b) Showing how market share changed over four years: stacked area chart or multi-line chart. (c) Exploring the relationship between advertising spend and sales: scatter plot. (d) Showing budget allocation across departments: horizontal bar chart (preferred) or pie chart (only if five or fewer departments).

Exercise 5.2. A pie chart with 12 slices is difficult to read because the human eye is poor at comparing angles, especially when slices are similar in size. Replace it with a horizontal bar chart sorted from largest to smallest. The bar chart allows precise comparison using position along a common scale.

Chapter 6: Data-Ink Ratio

Exercise 6.1. Elements to remove: heavy gridlines, background fill color, 3D bevel effects on bars, redundant legend when bars are already labeled, box border around the chart. Each of these consumes ink (pixels) without conveying data, lowering the data-ink ratio.

Exercise 6.2. Sparklines embed trend information directly in a table, replacing columns of numbers with a tiny line chart. They maximize data-ink ratio by eliminating axes, labels, and gridlines -- context is provided by the table row. Best suited for showing trend direction and relative magnitude, not precise values.

Chapter 7: Typography and Annotation

Exercise 7.1. An actionable title states the insight: "Sales grew 23% in Q4, driven by the Northeast region." A descriptive title merely labels the chart: "Quarterly Sales by Region." The actionable title tells the reader what to see; the descriptive title makes them search for it.

Exercise 7.2. Direct labeling places the series name next to the line itself, eliminating the need for a legend and the eye movement between chart and legend. It is preferred when there are 2 to 5 series and the lines are sufficiently separated. A legend is preferable when lines overlap heavily or when there are many series.

Chapter 8: Layout, Composition, and Small Multiples

Exercise 8.1. Small multiples work because they keep the same axes, scales, and encoding across panels, so the viewer only learns the encoding once and compares by scanning across panels. The shared framework eliminates confounding variables (different scales, different axis ranges) that would arise from comparing unrelated charts.

Exercise 8.2. Use a single column layout (panels stacked vertically) when the x-axis is time and the reader should compare time-aligned values across panels. Use a grid layout (rows and columns) when the categorical variable has many levels and vertical stacking would make each panel too short.

Chapter 9: Storytelling with Data

Exercise 9.1. The three-act structure: (1) Setup -- establish context, introduce the data, state the question. (2) Conflict -- present the surprising finding, the trend, or the problem revealed by the data. (3) Resolution -- deliver the insight, recommend an action, or present the conclusion. This structure mirrors how audiences naturally process information.

Exercise 9.2. Ordering charts in a presentation: lead with the big picture (the overall trend or the key metric), then drill into supporting details, then close with the recommendation. Do not lead with methodology. The audience needs to care about the question before they care about how you answered it.

Chapter 10: matplotlib Architecture

Exercise 10.1. Figure is the top-level container (the canvas). Axes is the plotting area within the Figure (contains the x-axis, y-axis, title, and data). Artist is any visual element that gets drawn: lines, text, patches, and the Axes itself. A Figure can contain multiple Axes; each Axes contains multiple Artists.

Exercise 10.2. The pyplot state machine (plt.plot()) implicitly tracks the "current" figure and axes. The object-oriented interface (fig, ax = plt.subplots(); ax.plot()) explicitly references the target Axes. The OO interface is recommended because it avoids ambiguity with multiple subplots and makes code self-documenting.
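A short sketch of the object-oriented style with two panels. The data and titles are placeholders, and the Agg backend is selected only so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]

fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(8, 3))

# Each call names its target Axes, so there is no "current axes" to track.
ax_left.plot(x, [1, 4, 9, 16])
ax_left.set_title("Quadratic")

ax_right.plot(x, [1, 2, 3, 4])
ax_right.set_title("Linear")
```

Reordering the two plotting blocks does not change which panel receives which line, which is not true of the equivalent plt.plot() sequence.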

Chapter 11: Essential Chart Types

Exercise 11.1. Bin width controls the resolution of a histogram. Too few (wide) bins hide important features such as bimodality and gaps. Too many (narrow) bins create noisy spikes that obscure the overall shape. Start with Sturges' rule or 30 bins, then adjust until the structure is visible without the noise.
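For reference, Sturges' rule suggests k = ceil(log2(n)) + 1 bins for n observations. A minimal sketch, with the function name sturges_bins chosen for illustration:

```python
import math


def sturges_bins(n):
    """Number of histogram bins suggested by Sturges' rule for n observations."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return math.ceil(math.log2(n)) + 1


for n in (50, 500, 5000):
    print(f"n = {n:>4}: {sturges_bins(n)} bins")
```

The bin count grows only logarithmically with n, so the rule tends to oversmooth large samples; treat it as a starting point rather than a final choice.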

Exercise 11.2. A scatter plot reveals the relationship between two continuous variables: direction (positive/negative), form (linear/nonlinear), strength (tight/dispersed), and outliers. A histogram of either variable alone cannot show any of these bivariate features.

Chapter 12: Customization Mastery

Exercise 12.1. Remove top and right spines, set a light y-axis grid, increase the title font size, use a descriptive title that states the insight, and apply a consistent color palette. These five changes immediately move a default matplotlib chart toward publication quality.

Exercise 12.2. rcParams sets global defaults applied to every subsequent figure in the session. Per-figure customization (calling ax.set_title(), etc.) overrides globals for that specific chart. Use rcParams for organization-wide style consistency; use per-figure calls for chart-specific adjustments.
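The sketch below shows both levels at once: a global default set through rcParams and a per-chart override. The specific sizes are arbitrary choices for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs
import matplotlib.pyplot as plt

# Global defaults: every Axes created after this point inherits them.
plt.rcParams.update({
    "axes.titlesize": 14,
    "axes.spines.top": False,
    "axes.spines.right": False,
})

fig, (ax_default, ax_custom) = plt.subplots(1, 2)

ax_default.set_title("Uses the global default")                # 14 pt from rcParams
ax_custom.set_title("Overridden for this chart", fontsize=20)  # per-chart override

print(ax_default.title.get_fontsize(), ax_custom.title.get_fontsize())
```

Both panels lose their top and right spines because that setting was global; only the title size differs.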

Chapter 13: Subplots, GridSpec, and Multi-Panel Figures

Exercise 13.1. Use sharex=True or sharey=True in plt.subplots() to link axis ranges across panels. Shared axes ensure that spatial position encodes the same value in every panel, enabling valid visual comparison. Without shared axes, identical positions in different panels could represent different values.
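A small check of the effect, using two made-up series on very different scales:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs
import matplotlib.pyplot as plt

small = [1, 2, 3]     # hypothetical series on a small scale
large = [10, 20, 30]  # hypothetical series on a larger scale

fig, (ax_a, ax_b) = plt.subplots(1, 2, sharey=True)
ax_a.plot(small)
ax_b.plot(large)

# With sharey=True both panels report the same y-limits, wide enough for both series.
print(ax_a.get_ylim(), ax_b.get_ylim())
```

Without sharey=True each panel would autoscale to its own data, and the two lines would look identical even though one is ten times larger.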

Exercise 13.2. GridSpec is preferred over plt.subplots() when panels need unequal sizes (e.g., a wide panel on top and two narrow panels below). Use GridSpec(2, 2) and fig.add_subplot(gs[0, :]) to span the top row, then fig.add_subplot(gs[1, 0]) and fig.add_subplot(gs[1, 1]) for the bottom row.
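Written out in full, the layout described above might look like this. The figure size and titles are placeholders.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec

fig = plt.figure(figsize=(8, 5))
gs = GridSpec(2, 2, figure=fig)

ax_top = fig.add_subplot(gs[0, :])    # spans both columns of the first row
ax_left = fig.add_subplot(gs[1, 0])   # bottom-left cell
ax_right = fig.add_subplot(gs[1, 1])  # bottom-right cell

ax_top.set_title("Overview")
ax_left.set_title("Detail A")
ax_right.set_title("Detail B")
```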

Chapter 14: Specialized matplotlib Charts

Exercise 14.1. A bubble chart encodes a third variable as the area of each point. Map area (not radius) to data values, because the eye perceives area, not radius. In matplotlib, the s parameter in scatter() controls area in points squared, so pass values proportional to the data directly.
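To make the area-versus-radius point concrete, the following sketch scales hypothetical values to marker areas and reports the radii they imply. The helper bubble_areas and the max_area value are illustrative assumptions.

```python
import math


def bubble_areas(values, max_area=600.0):
    """Scale values linearly to marker areas (points squared); the largest maps to max_area."""
    top = max(values)
    return [max_area * v / top for v in values]


values = [10, 40, 90]         # hypothetical data
areas = bubble_areas(values)  # suitable for the s argument of scatter()

# Radius grows with the square root of area, so it must not be scaled linearly.
radius_ratios = [math.sqrt(a / areas[0]) for a in areas]

print([round(a, 1) for a in areas])
print([round(r, 2) for r in radius_ratios])
```

A value 9 times larger gets 9 times the area but only 3 times the radius. Scaling the radius linearly instead would make that bubble look 81 times larger.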

Exercise 14.2. A radar chart is appropriate for comparing a small number of entities (2 to 4) across 5 to 8 dimensions. It becomes unreadable with more than 8 axes or when entities overlap heavily. An alternative is a parallel coordinates plot, which handles more dimensions and avoids the distortion inherent in polar coordinates.

Chapter 15: Animation and Interactivity

Exercise 15.1. FuncAnimation calls a user-defined update function for each frame, modifying existing Artists in place (e.g., line.set_ydata(new_data)). ArtistAnimation takes a pre-built list of Artist collections, one per frame. FuncAnimation is more memory-efficient for long animations; ArtistAnimation is simpler when you can pre-compute all frames.
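A minimal FuncAnimation sketch follows. The travelling sine wave, frame count, and interval are arbitrary illustrative choices, and the save call is left commented out because it requires an extra writer.

```python
import math

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

xs = [i * 0.1 for i in range(63)]

fig, ax = plt.subplots()
(line,) = ax.plot(xs, [math.sin(x) for x in xs])


def update(frame):
    # Reuse the existing Line2D: only its y-data changes from frame to frame.
    line.set_ydata([math.sin(x + 0.1 * frame) for x in xs])
    return (line,)


anim = FuncAnimation(fig, update, frames=60, interval=50, blit=True)
# anim.save("wave.gif", writer="pillow")  # optional; requires Pillow
```

Only one Line2D ever exists, no matter how many frames are drawn, which is where the memory advantage over a pre-built list of Artists comes from.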

Chapter 16: Seaborn Philosophy

Exercise 16.1. Seaborn's figure-level functions (relplot, displot, catplot) build their own figure and return a FacetGrid object that manages it. Axes-level functions (scatterplot, histplot, boxplot) draw on a single matplotlib Axes. Use figure-level functions when you need faceting; use axes-level functions when you need to place the plot in a specific subplot of a custom layout.

Chapter 17: Distributional Visualization

Exercise 17.1. A violin plot shows the full distribution shape (via KDE) plus summary statistics (median, IQR). A box plot shows only the five-number summary and outliers. The violin is preferred when the distribution is multimodal (multiple peaks), because the box plot hides that shape. The box plot is preferred when you have many groups (20+) and need a compact representation.

Chapter 18: Relational and Categorical Visualization

Exercise 18.1. Jitter adds small random noise to discrete values so that overlapping points become visible. Without jitter, 100 points at the same (x, y) position appear as a single dot. Apply jitter only to the categorical axis (or both if both variables are discrete), keeping the random displacement small enough that it does not distort the data distribution.
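A standard-library sketch of jitter on a categorical axis. The width of 0.3 and the fixed seed are illustrative choices, and the function name jitter is not taken from any plotting library.

```python
import random


def jitter(values, width=0.3, seed=0):
    """Return values shifted by uniform noise in [-width/2, +width/2]."""
    rng = random.Random(seed)  # fixed seed so the chart is reproducible
    return [v + rng.uniform(-width / 2, width / 2) for v in values]


# Ten observations in two categories, encoded as positions 0 and 1.
category_positions = [0] * 5 + [1] * 5
jittered = jitter(category_positions)

print([round(p, 3) for p in jittered])
```

Keeping the width well below the spacing between categories (1.0 here) ensures that no point can be mistaken for a member of the neighbouring category.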

Chapter 19: Multi-Variable Exploration

Exercise 19.1. A pair plot is most useful with 3 to 6 variables. Below 3, individual scatter plots suffice. Above 6, the matrix becomes too large to read (6 variables already produce 36 panels, and 10 would produce 100). For higher dimensions, use parallel coordinates, dimensionality reduction (PCA/t-SNE), or targeted scatter plots of the most interesting pairs identified through correlation analysis.

Chapter 20: Plotly Express

Exercise 20.1. Plotly Express returns a Figure object that can be modified using .update_layout(), .update_traces(), and .update_xaxes()/.update_yaxes(). This means you can start with a one-liner and progressively customize without rewriting the chart from scratch in Graph Objects.

Chapter 21: Plotly Graph Objects

Exercise 21.1. Use make_subplots() to create a multi-panel Plotly figure with shared axes. Pass shared_xaxes=True and specify rows and cols. Add traces using fig.add_trace(trace, row=r, col=c). This is the Plotly equivalent of matplotlib's plt.subplots() with sharex=True.

Chapter 22: Altair

Exercise 22.1. In Altair, encoding channels map data fields to visual properties: x, y, color, size, shape, opacity, row, column. Altair infers scale and axis details from the data type (quantitative, nominal, ordinal, temporal). This declarative approach means you describe the mapping and Altair handles the rendering -- the opposite of matplotlib's imperative draw-this-line approach.

Chapter 23: Geospatial Visualization

Exercise 23.1. A choropleth encodes data as the fill color of geographic regions. It works well when regions are of similar size. It misleads when large, sparsely populated regions dominate the visual field (e.g., comparing Alaska to Rhode Island by population). Alternatives: cartograms (resize regions by data) or graduated symbol maps (circles placed at centroids).

Chapter 24: Network and Graph Visualization

Exercise 24.1. The spring (force-directed) layout positions densely connected nodes close together. It works well for small-to-medium networks (under 1,000 nodes) with community structure. It fails for very large or very dense networks, where it produces a hairball. For large networks, consider adjacency matrices, filtered subgraphs, or hierarchical layouts.

Chapter 25: Time-Series Visualization

Exercise 25.1. A rolling average smooths short-term fluctuations to reveal the underlying trend. The window size controls the trade-off: a short window (7 days) preserves detail but retains noise; a long window (365 days) reveals only the macro trend but hides seasonal patterns. Always plot the raw data alongside the rolling average so the reader can assess both.
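The window trade-off can be seen on a toy series. This sketch computes a trailing rolling mean in plain Python; the data are made up, and in practice a library routine such as a pandas rolling window would normally be used instead.

```python
def rolling_mean(values, window):
    """Trailing rolling mean; returns len(values) - window + 1 averages."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    running = sum(values[:window])
    out = [running / window]
    for i in range(window, len(values)):
        # Slide the window: add the newest value, drop the oldest.
        running += values[i] - values[i - window]
        out.append(running / window)
    return out


daily = [3, 5, 4, 8, 6, 9, 7, 12, 10, 11]  # hypothetical daily values
short_window = rolling_mean(daily, 3)
long_window = rolling_mean(daily, 7)

print([round(v, 2) for v in short_window])
print([round(v, 2) for v in long_window])
```

The 3-point average still wiggles with the data, while the 7-point average is smoother but yields fewer points, because the first full window needs seven observations.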

Chapter 26: Text and NLP Visualization

Exercise 26.1. Word clouds are useful only for quick, exploratory impressions of term frequency. They are unsuitable for precise comparison because word size is hard to judge accurately, word placement is arbitrary, and long words appear visually larger regardless of frequency. For any analytical purpose, a horizontal bar chart of term frequencies is more accurate and readable.

Chapter 27: Statistical and Scientific Visualization

Exercise 27.1. Error bars should always be labeled: do they represent standard deviation, standard error, or a 95% confidence interval? Each conveys different information. Standard deviation describes data spread; standard error describes estimate precision; confidence intervals indicate the range likely to contain the true parameter.
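The three quantities differ substantially even for the same sample. The sketch below assumes a normal approximation (multiplier 1.96) for the 95% interval; for small samples a t-based multiplier would be more appropriate. The sample values are made up.

```python
import math
import statistics


def error_bar_sizes(sample):
    """Half-widths of three common error bars for one sample."""
    n = len(sample)
    sd = statistics.stdev(sample)  # sample standard deviation: spread of the data
    se = sd / math.sqrt(n)         # standard error: precision of the mean
    ci95 = 1.96 * se               # 95% CI half-width, normal approximation (assumption)
    return {"sd": sd, "se": se, "ci95": ci95}


sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements
sizes = error_bar_sizes(sample)

for name, half_width in sizes.items():
    print(f"{name}: +/- {half_width:.3f}")
```

An unlabeled bar could therefore be almost three times longer or shorter than the reader assumes.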

Chapter 28: Big Data Visualization

Exercise 28.1. Datashader rasterizes data points into a fixed-resolution pixel grid, computing aggregates (count, mean, etc.) per pixel. This avoids overplotting because overlapping points contribute to the aggregate rather than occluding each other. The approach scales to billions of points because the size of the rendered image depends on the pixel resolution, not on the number of points; only the aggregation pass grows with data size.
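The idea can be illustrated without the library. The following is a plain-Python sketch of per-pixel count aggregation; it is not Datashader's API, and the function name, grid size, and random data are all assumptions made for illustration.

```python
import random


def rasterize_counts(points, x_range, y_range, width, height):
    """Count points per pixel on a fixed width x height grid."""
    (x0, x1), (y0, y1) = x_range, y_range
    grid = [[0] * width for _ in range(height)]
    for x, y in points:
        if not (x0 <= x <= x1 and y0 <= y <= y1):
            continue  # outside the viewport
        # Map data coordinates to pixel indices; clamp the upper edge into the last pixel.
        col = min(int((x - x0) / (x1 - x0) * width), width - 1)
        row = min(int((y - y0) / (y1 - y0) * height), height - 1)
        grid[row][col] += 1
    return grid


rng = random.Random(42)
points = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50_000)]
grid = rasterize_counts(points, (-4, 4), (-4, 4), 64, 64)

print(len(grid), "x", len(grid[0]), "grid, max count per pixel:", max(map(max, grid)))
```

The output is always a 64 x 64 grid, whether the input holds fifty thousand points or fifty million. Each extra point adds to a count instead of drawing another marker on top of the others.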

Chapter 29: Dashboards with Streamlit

Exercise 29.1. Streamlit reruns the entire script from top to bottom on every widget interaction. This means expensive computations (loading data, training models) should be wrapped in @st.cache_data or @st.cache_resource to avoid redundant work. State that must persist across reruns should be stored in st.session_state.

Chapter 30: Dashboards with Dash

Exercise 30.1. A Dash callback takes Input (triggers the callback when changed), Output (the component property to update), and optionally State (read a component's value without triggering the callback). This separation allows multiple inputs to trigger the same callback and prevents circular dependencies.

Chapter 31: Automated Reporting

Exercise 31.1. fpdf2 generates PDFs programmatically: add pages, insert text, embed matplotlib figures saved as PNG. python-pptx creates PowerPoint files: add slides, insert chart images, set titles and bullet text. Both enable scheduled, repeatable reporting pipelines where the same script produces an updated report each week with fresh data.

Chapter 32: Theming, Branding, and Style Guides

Exercise 32.1. A matplotlib style sheet is a plain-text file of rcParams key-value pairs. Save it in matplotlib's style directory or load it with plt.style.use("path/to/style.mplstyle"). This enforces consistent colors, fonts, figure sizes, and spine visibility across every chart produced by the team without manual per-chart configuration.
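As an illustration, the sketch below writes a small style sheet to a temporary file and applies it. The file name team.mplstyle and the chosen settings are hypothetical. plt.style.context() is used so the style applies only inside the with block; plt.style.use() would apply it for the rest of the session.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs
import matplotlib.pyplot as plt

# Contents of a hypothetical team style sheet: one rcParams key-value pair per line.
STYLE = """\
axes.spines.top: False
axes.spines.right: False
axes.grid: True
font.size: 11
figure.figsize: 8, 4.5
"""

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "team.mplstyle")
    with open(path, "w") as fh:
        fh.write(STYLE)

    # Apply the style only within this block.
    with plt.style.context(path):
        fig, ax = plt.subplots()
        applied_size = fig.get_size_inches().tolist()
        top_visible = ax.spines["top"].get_visible()

print(applied_size, top_visible)
```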

Chapter 33: The Visualization Workflow

Exercise 33.1. The workflow is: (1) Define the question. (2) Acquire and clean the data. (3) Explore with quick, disposable charts. (4) Choose the appropriate chart type. (5) Build and refine the visualization. (6) Add context (titles, annotations, source). (7) Export in the appropriate format (vector for print, interactive for web). Skipping step 1 is the most common mistake -- without a clear question, you produce charts that show data without saying anything.

Chapter 34: Capstone

Exercise 34.1. The capstone integrates the full workflow: loading the climate dataset, exploring distributions and trends, building static matplotlib charts, converting to interactive Plotly versions, assembling a Streamlit dashboard, and generating an automated PDF report. The exercise tests whether you can select the right chart for each question, apply design principles, and deliver a coherent data story from raw data to polished output.

Chapter 35: Visualization Gallery

Exercise 35.1. An anti-pattern is a chart that technically "works" but misleads or confuses: truncated bar charts, rainbow colormaps on sequential data, pie charts with 15 slices, spaghetti line charts with 20 series, or 3D effects on 2D data. For each anti-pattern, identify the perceptual or design principle it violates, then apply the remedy from the relevant chapter.