Exercises: Statistical and Scientific Visualization
These exercises assume `import matplotlib.pyplot as plt`, `import matplotlib as mpl`, `import numpy as np`, and `import pandas as pd`, plus the optional `import seaborn as sns` and `from statannotations.Annotator import Annotator`.
Part A: Conceptual (6 problems)
A.1 ★☆☆ | Recall
What are typical single-column and double-column figure widths for Nature?
Guidance
Single-column: 89 mm ≈ 3.5 inches. Double-column: 183 mm ≈ 7.2 inches. Most journals have similar conventions with slight variations. Science uses 55/120/180 mm; PLOS uses 789/1651 pixels at 300 DPI.
A.2 ★☆☆ | Recall
What is the difference between Type 42 and Type 3 fonts in PDF output?
Guidance
**Type 42** is TrueType font format and is widely compatible. **Type 3** is an older PostScript format and is still matplotlib's default (`pdf.fonttype` and `ps.fonttype` both default to 3); many journals reject it because glyphs get embedded as character outlines rather than proper font definitions. Set `mpl.rcParams["pdf.fonttype"] = 42` and `mpl.rcParams["ps.fonttype"] = 42` to force Type 42.
A.3 ★★☆ | Understand
Why must scientific figures show uncertainty (error bars, confidence intervals, bands)?
Guidance
A point estimate without uncertainty is incomplete. The reader cannot tell whether a difference between two means is meaningful or within noise. Uncertainty visualization lets the reader judge whether the data supports the claims. Peer reviewers routinely reject figures that report point estimates without uncertainty.
A.4 ★★☆ | Understand
Describe the Wong colorblind-safe palette and when to use it.
Guidance
Bang Wong's 2011 palette has 8 colors (#000000, #E69F00, #56B4E9, #009E73, #F0E442, #0072B2, #D55E00, #CC79A7) designed to be distinguishable by all major forms of color blindness. Use for any publication figure with categorical color encoding, especially when the journal expects grayscale-printable output.
A.5 ★★☆ | Analyze
Explain why a bar chart with error bars (dynamite plot) is considered weak for modern publication.
Guidance
Dynamite plots (Chapter 18) show only the mean and error, hiding the full distribution. They cannot reveal outliers, skewness, or bimodality. Modern reproducibility-focused guidance recommends showing individual data points (strip + box + summary) so the reader can see the underlying distribution. Some journals (e.g., *Nature Methods*) explicitly discourage dynamite plots.
A.6 ★★★ | Evaluate
A colleague has prepared a 2-panel figure with 6 pt axis labels, red and green lines that become indistinguishable in a black-and-white printout, and no panel labels. List the problems and fixes.
Guidance
(1) 6 pt is below the typical 7-9 pt minimum: increase to 7 or 8 pt. (2) Red-green is not colorblind-safe and fails in grayscale: switch to the Wong palette or use redundant encoding (line style + marker). (3) Panel labels are missing: add "a", "b" in bold at the top-left of each panel using `ax.text` with `transform=ax.transAxes`. Also check font embedding (Type 42) and figure width (match single- or double-column).
Part B: Applied (10 problems)
B.1 ★☆☆ | Apply
Set up matplotlib rcParams for a Nature single-column figure.
Guidance
import matplotlib as mpl
mpl.rcParams.update({
    "pdf.fonttype": 42,
    "ps.fonttype": 42,
    "font.family": "sans-serif",
    "font.sans-serif": ["Arial", "Helvetica", "DejaVu Sans"],
    "font.size": 7,
    "axes.titlesize": 8,
    "axes.labelsize": 7,
    "xtick.labelsize": 6,
    "ytick.labelsize": 6,
    "legend.fontsize": 6,
    "figure.figsize": (3.5, 2.5),
    "savefig.dpi": 300,
})
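The 3.5-inch width above corresponds to Nature's 89 mm single column. As a minimal sketch, the conversion can be made explicit and the style applied only temporarily with `mpl.rc_context`, so that other figures in the same session are unaffected. The names `mm_to_inches`, `nature_figsize`, and `NATURE_STYLE`, and the default aspect ratio of 0.7, are hypothetical choices for illustration, not part of the chapter.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib as mpl
import matplotlib.pyplot as plt

MM_PER_INCH = 25.4


def mm_to_inches(mm):
    """Convert a length in millimetres to inches."""
    return mm / MM_PER_INCH


def nature_figsize(width_mm=89, aspect=0.7):
    """Return (width, height) in inches; aspect is height / width (assumed default)."""
    width = mm_to_inches(width_mm)
    return (width, width * aspect)


# Hypothetical style dictionary mirroring a subset of the settings above.
NATURE_STYLE = {
    "pdf.fonttype": 42,
    "ps.fonttype": 42,
    "font.size": 7,
    "figure.figsize": nature_figsize(),
}

# rc_context applies the settings only inside the with-block.
with mpl.rc_context(NATURE_STYLE):
    fig, ax = plt.subplots()
    size_inside = tuple(fig.get_size_inches())
    fonttype_inside = mpl.rcParams["pdf.fonttype"]
plt.close(fig)
```

Calling `nature_figsize(183)` would give the double-column equivalent under the same assumed aspect ratio.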
B.2 ★☆☆ | Apply
Create a scatter plot with error bars and an axis label using LaTeX math notation.
Guidance
x = np.array([1, 2, 3, 4, 5])
y = np.array([1, 4, 9, 16, 25])
yerr = np.array([0.5, 0.8, 1.2, 1.5, 2.0])
fig, ax = plt.subplots(figsize=(3.5, 2.5))
ax.errorbar(x, y, yerr=yerr, fmt="o", capsize=3, color="black")
ax.set_xlabel(r"$x$ (unit)")
ax.set_ylabel(r"$y = x^2$")
ax.set_title(r"$y$ vs. $x$")
B.3 ★★☆ | Apply
Build a 2×2 panel figure with panel labels "a", "b", "c", "d" in bold.
Guidance
fig, axes = plt.subplots(2, 2, figsize=(7.2, 5.4))
for ax, letter in zip(axes.flat, "abcd"):
    ax.text(-0.15, 1.05, letter, transform=ax.transAxes,
            fontsize=12, fontweight="bold", va="top")
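The labelling loop above can be wrapped in a small helper so that every figure script places labels the same way. This is only a sketch: the function name `add_panel_label`, its default offsets, and the added `ha="left"` are assumptions for illustration, and the offsets may need adjusting for a given layout.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt


def add_panel_label(ax, letter, x=-0.15, y=1.05, fontsize=12):
    """Place a bold panel label in axes-fraction coordinates and return the Text."""
    return ax.text(x, y, letter, transform=ax.transAxes,
                   fontsize=fontsize, fontweight="bold", va="top", ha="left")


fig, axes = plt.subplots(2, 2, figsize=(7.2, 5.4), constrained_layout=True)
# One label per panel, in row-major order.
labels = [add_panel_label(ax, letter) for ax, letter in zip(axes.flat, "abcd")]
```

Returning the `Text` object lets a caller restyle or reposition an individual label afterwards.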
B.4 ★★☆ | Apply
Plot a regression line with a 95% confidence band using fill_between.
Guidance
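# Assumes x, y, y_pred, y_lower, and y_upper are precomputed arrays of equal length,
# with x sorted in increasing order so the line and band are drawn left to right.
fig, ax = plt.subplots(figsize=(4, 3))
ax.scatter(x, y, alpha=0.5, s=10)
ax.plot(x, y_pred, color="steelblue")
ax.fill_between(x, y_lower, y_upper, color="steelblue", alpha=0.2, label="95% CI")
ax.legend()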
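The plotting code above takes `y_pred`, `y_lower`, and `y_upper` as given. One way to obtain them, sketched here for a simple linear fit on synthetic data, is the textbook confidence interval for the mean response: ŷ ± t · s · sqrt(1/n + (x − x̄)² / Sxx). The data, the seed, and the choice of ordinary least squares are assumptions for illustration; the chapter's dataset and model may differ.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)            # synthetic data, for illustration only
x = np.sort(rng.uniform(0, 10, 40))       # sorted so the band draws left to right
y = 1.5 * x + 2.0 + rng.normal(0, 2.0, x.size)

n = x.size
slope, intercept = np.polyfit(x, y, 1)    # ordinary least-squares line
y_pred = slope * x + intercept

residuals = y - y_pred
s = np.sqrt(np.sum(residuals**2) / (n - 2))   # residual standard error
sxx = np.sum((x - x.mean())**2)
# Standard error of the fitted mean at each x.
se_mean = s * np.sqrt(1.0 / n + (x - x.mean())**2 / sxx)
t_crit = stats.t.ppf(0.975, df=n - 2)         # two-sided 95% critical value

y_lower = y_pred - t_crit * se_mean
y_upper = y_pred + t_crit * se_mean
```

The band is narrowest near the mean of `x` and widens toward the ends. Note that this is a confidence band for the fitted line, not a prediction band for new observations, which would be wider.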
B.5 ★★☆ | Apply
Create a QQ plot of residuals using scipy.stats.probplot.
Guidance
import scipy.stats as stats
residuals = np.random.randn(100)
fig, ax = plt.subplots(figsize=(3.5, 3.5))
stats.probplot(residuals, dist="norm", plot=ax)
ax.set_title("Normal QQ Plot")
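Besides drawing the plot, `scipy.stats.probplot` also returns the plotted quantiles and a least-squares fit through them, which can be handy for reporting how close the points are to a straight line. The sketch below uses synthetic residuals with an arbitrary seed; the variable names follow the SciPy documentation's convention (`osm` for theoretical quantiles, `osr` for ordered sample values).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
residuals = rng.normal(0.0, 1.0, 200)     # synthetic residuals, for illustration

# Without plot=..., probplot only computes the quantiles and the line fit.
(osm, osr), (slope, intercept, r) = stats.probplot(residuals, dist="norm")

# r is the correlation between theoretical and ordered sample quantiles;
# values close to 1 indicate points lying close to the reference line.
```

For roughly normal residuals, `slope` approximates the sample standard deviation and `intercept` the sample mean.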
B.6 ★★☆ | Apply
Use statannotations to add significance brackets to a seaborn boxplot.
Guidance
import seaborn as sns
from statannotations.Annotator import Annotator
df = pd.DataFrame({"group": ["A", "A", "B", "B", "C", "C"] * 10,
                   "value": np.random.randn(60)})
fig, ax = plt.subplots(figsize=(4, 3))
sns.boxplot(data=df, x="group", y="value", ax=ax)
annotator = Annotator(ax, [("A", "B"), ("A", "C"), ("B", "C")], data=df, x="group", y="value")
annotator.configure(test="t-test_ind", text_format="star", loc="outside")
annotator.apply_and_annotate()
B.7 ★★★ | Apply
Build a forest plot showing 4 study effects plus a pooled estimate.
Guidance
studies = ["A", "B", "C", "D", "Pooled"]
effects = [0.1, 0.2, -0.05, 0.15, 0.1]
lowers = [0.0, 0.1, -0.15, 0.05, 0.04]
uppers = [0.2, 0.3, 0.05, 0.25, 0.16]
fig, ax = plt.subplots(figsize=(4, 3))
y_pos = np.arange(len(studies))[::-1]
for i, (s, e, lo, hi) in enumerate(zip(studies, effects, lowers, uppers)):
    y = y_pos[i]
    m = "D" if s == "Pooled" else "s"
    ax.plot([lo, hi], [y, y], color="black")
    ax.scatter(e, y, marker=m, s=60 if m == "D" else 40, color="black")
ax.axvline(0, color="gray", linestyle="--")
ax.set_yticks(y_pos)
ax.set_yticklabels(studies)
ax.set_xlabel("Effect size (95% CI)")
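The pooled row above is entered by hand. As a rough sketch of where such a number can come from, the code below computes a fixed-effect (inverse-variance) pooled estimate from the four study intervals, backing out each standard error from its 95% CI under a normal approximation. The choice of a fixed-effect model is an assumption made here; the illustrative pooled interval in the guidance need not come from this method.

```python
import numpy as np

effects = np.array([0.1, 0.2, -0.05, 0.15])   # study estimates from the example
lowers = np.array([0.0, 0.1, -0.15, 0.05])    # 95% CI lower bounds
uppers = np.array([0.2, 0.3, 0.05, 0.25])     # 95% CI upper bounds

z = 1.96                                       # normal quantile for a 95% interval
se = (uppers - lowers) / (2 * z)               # standard errors implied by the CIs
weights = 1.0 / se**2                          # inverse-variance weights

pooled = np.sum(weights * effects) / np.sum(weights)
pooled_se = np.sqrt(1.0 / np.sum(weights))
pooled_lo = pooled - z * pooled_se
pooled_hi = pooled + z * pooled_se
```

Because all four intervals have the same width, the weights are equal and the pooled estimate is the plain mean, 0.1, with an interval of about 0.05 to 0.15. That is slightly narrower than the 0.04 to 0.16 used in the guidance, as would be expected if those values came from a model that allows between-study variation.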
B.8 ★★★ | Apply
Create a volcano plot from a dataset of effect sizes and p-values.
Guidance
# Assumes effects (log2 fold changes) and p_values are NumPy arrays of equal length.
fig, ax = plt.subplots(figsize=(4, 4))
significant = (p_values < 0.05) & (np.abs(effects) > 1)
ax.scatter(effects, -np.log10(p_values),
           c=["red" if s else "gray" for s in significant], s=10, alpha=0.6)
ax.axhline(-np.log10(0.05), color="black", linestyle="--", linewidth=0.5)
ax.axvline(1, color="black", linestyle="--", linewidth=0.5)
ax.axvline(-1, color="black", linestyle="--", linewidth=0.5)
ax.set_xlabel("log2 fold change")
ax.set_ylabel("-log10 p-value")
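The volcano code above needs `effects` and `p_values` arrays. For trying it out without a real dataset, a minimal sketch with synthetic values follows. The distributions, the seed, and the rule that makes large effects look significant are arbitrary assumptions for illustration and do not model any real experiment.

```python
import numpy as np

rng = np.random.default_rng(1)
n_features = 2000

effects = rng.normal(0.0, 1.2, n_features)        # stand-in for log2 fold changes
p_values = rng.uniform(1e-6, 1.0, n_features)     # mostly null-like p-values

# Give features with a large |effect| small p-values, for illustration only.
strong = np.abs(effects) > 2.0
p_values[strong] = rng.uniform(1e-6, 1e-3, strong.sum())

# Same thresholds as in the guidance: p < 0.05 and |log2 fold change| > 1.
significant = (p_values < 0.05) & (np.abs(effects) > 1)
colors = np.where(significant, "red", "gray")
```

The `colors` array can be passed as `c=colors` to `ax.scatter` in place of the list comprehension. Keeping the p-values strictly positive avoids infinite values from `-np.log10(p_values)`.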
B.9 ★★☆ | Apply
Save a figure as both PDF (vector) and TIFF (raster at 300 DPI).
Guidance
fig.savefig("figure.pdf", bbox_inches="tight")
fig.savefig("figure.tif", dpi=300, bbox_inches="tight")
B.10 ★★★ | Create
Build a complete 4-panel publication figure for the climate dataset: (a) time series with rolling mean, (b) regression scatter, (c) calendar heatmap, (d) regional bar chart. Use a journal-style rcParams, panel labels, and export as PDF.
Guidance
Follow the Section 27.12 structure. Apply the Nature style module, create a 2×2 subplot figure at `figsize=(7.2, 6)`, build each panel separately, add panel labels with `ax.text`, use `tight_layout` or `constrained_layout` for spacing, and save as PDF. The full code is 50-80 lines.
Part C: Synthesis (4 problems)
C.1 ★★★ | Analyze
Take a figure from a recent paper you have read and evaluate it against the Section 27.11 checklist. Which items pass, which fail?
Guidance
Most published figures pass most items, but many fail at least one or two. Common failures: missing panel labels, insufficient font size, non-colorblind-safe palettes, missing n values in caption, incomplete error-bar definitions. The exercise is subjective but develops a critical eye for production quality.
C.2 ★★★ | Evaluate
A reviewer comments: "Figure 2 uses Type 3 fonts. Please resubmit with Type 42." What happened, and how do you fix it?
Guidance
Matplotlib writes Type 3 fonts to PDF output unless `pdf.fonttype` is changed from its default, and some journal PDF processors cannot handle them. Fix: set `mpl.rcParams["pdf.fonttype"] = 42` and `mpl.rcParams["ps.fonttype"] = 42` before creating the figure, then re-run the figure code. No content changes; just re-export.
C.3 ★★★ | Create
Create a reusable Python module that applies your preferred journal style and provides convenience functions for sized figures and panel labels. Write a docstring explaining how to use it.
Guidance
See Section 27.13 for the basic template. Add docstrings, a main function that applies the style, and helpers like `figsize_single_col(aspect)` and `add_panel_label(ax, letter)`. Save as a module you can import into every figure script.
C.4 ★★★ | Evaluate
The chapter argues that modern publication figures should emphasize effect sizes and reproducibility over p-value thresholds. Do you agree? What are the costs of this shift?
Guidance
The shift is well-motivated: effect sizes communicate practical significance better than p-values, and reproducibility catches errors that peer review misses. The costs: some older readers are not fluent in effect-size interpretation, and the shift requires learning new visualization conventions (estimation plots, forest plots). The transition is gradual but real, and the direction is clear.
Chapter 28 covers big data visualization strategies for datasets too large for standard tools.