Learning Objectives
- Configure matplotlib for journal submission: figure sizes, DPI, font embedding
- Apply journal-required font specifications and handle Type 42 vs. Type 3 fonts
- Create multi-panel figures with panel labels (a, b, c, d) in journal style
- Visualize statistical results: confidence intervals, p-value annotations, significance brackets
- Create QQ plots, residual plots, and other diagnostic figures
- Export figures in PDF, EPS, TIFF, and SVG formats with correct settings
- Apply color-blind and grayscale-safe palettes for print journals
- Render LaTeX mathematical notation in matplotlib
In This Chapter
- 27.1 What Publication-Ready Actually Means
- 27.2 Journal Requirements: A Survey
- 27.3 Figure Sizing in Matplotlib
- 27.4 Font Settings and Embedding
- 27.5 Panel Labels
- 27.6 Confidence Intervals and Error Bars
- 27.7 Significance Brackets and P-Value Annotations
- 27.8 Diagnostic Plots: QQ Plots and Residuals
- 27.9 Color-Blind and Grayscale Safety
- 27.10 Export Formats
- 27.11 A Journal Submission Checklist
- 27.12 Progressive Project: Four-Panel Climate Figure
- 27.13 Reusable Journal Style Modules
- 27.14 Scale Bars for Images
- 27.15 Inset Axes for Zoomed Views
- 27.16 Annotation with Arrows and Callouts
- 27.17 Figure Captions: The Essential Companion
- 27.18 Multiple Comparisons Corrections
- 27.19 Volcano Plots and Manhattan Plots
- 27.20 Heatmaps for Clustered Data
- 27.21 Forest Plots for Meta-Analysis
- 27.22 Funnel Plots for Publication Bias
- 27.23 Effect Size Visualization
- 27.24 The Reproducibility Crisis and Figure Transparency
- 27.25 Check Your Understanding
- 27.26 Chapter Summary
- 27.27 Spaced Review
Chapter 27: Statistical and Scientific Visualization — Publication-Ready Figures
"Make every picture tell a story — but also make it pass peer review." — adapted from a submission guideline for Nature
27.1 What Publication-Ready Actually Means
For most of this book, "looks good" and "communicates effectively" have been the criteria for a successful chart. In scientific and technical publishing, there is an additional criterion: the figure must meet the explicit standards of the publication venue. These standards are written down, enforced by copy editors, and specific about things that general-audience charts never need to worry about: exact figure sizes in inches or centimeters, minimum font sizes, font embedding requirements, color-blind-safe palettes, panel label styles, and more.
Meeting these standards takes work. A chart that is perfectly readable in a Jupyter notebook may be rejected by a journal for having 8-point axis labels when the journal requires 9-point minimum. A multi-panel figure that looks organized on a laptop may violate the journal's panel labeling conventions. A color scheme that looks great on an RGB screen may be illegible when printed in black and white. None of these are content problems — the chart is still showing the same data — but they are production problems, and journals reject production-noncompliant figures as routinely as they reject content-noncompliant ones.
This chapter covers the specific requirements of scientific publication and the matplotlib techniques that meet them. It is a practical, applied chapter — there is no new conceptual threshold, just a long list of specific standards and the code to comply with them. Students who are preparing figures for their first journal submission will find most of the material directly useful. Students who are not will find it useful whenever they need to produce a chart that meets an external specification — a style guide, a brand standard, a government report format, or similar.
Unlike most chapters in this book, this one introduces no threshold concept and no new theory. What is new is the discipline of checking each chart against external requirements before publishing it. That discipline separates scientific chart makers from casual ones, and it is worth developing even if you never submit to a journal.
27.2 Journal Requirements: A Survey
Different journals have different requirements. Some are strict; some are flexible. Some specify exact pixel dimensions; others give ranges. The only way to know what a specific journal requires is to read its author guidelines. That said, there are common patterns, and understanding them will prepare you for most publications.
Nature (and its family: Nature, Nature Communications, Nature Biotechnology, etc.):
- Figure width: 89 mm (single-column) or 183 mm (double-column).
- Height: up to 247 mm.
- Minimum font size: 5 pt (but 7 pt for most text).
- Font family: Arial or Helvetica.
- Color: encouraged but must be distinguishable in grayscale.
- Format: PDF or EPS (vector preferred), TIFF (raster at 300+ DPI for photographs).
- Panel labels: lowercase letters in bold (a, b, c, d), positioned at the top-left of each panel.
Science (and sub-journals):
- Figure width: 1 column = 55 mm, 1.5 column = 120 mm, 2 column = 180 mm.
- Minimum font size: 7 pt.
- Font family: Helvetica or Arial.
- Format: PDF or EPS, with fonts embedded.
- Panel labels: uppercase letters (A, B, C, D), bold.
PLOS (all PLOS journals):
- Figure width: 789 pixels (single-column, about 2.6 inches at 300 DPI) or 1651 pixels (double-column, about 5.5 inches at 300 DPI).
- Minimum DPI: 300.
- Format: TIFF or EPS.
- Fonts: Arial, Times New Roman, or Symbol, embedded.
IEEE (engineering journals):
- Figure width: 3.5 inches (single-column) or 7 inches (double-column).
- Minimum font size: 8 pt.
- Format: EPS or high-resolution TIFF.
- Color: allowed but grayscale should work.
APA (psychology journals):
- Figure width: 2.5 inches to 7 inches.
- Minimum font size: 8 pt.
- Format: Various, with preference for vector.
- Panel labels: "Figure 1a", "Figure 1b" in captions rather than on the figure.
The common threads: single-column widths around 3.5 inches or 89 mm, double-column widths around 7 inches or 180 mm, minimum font sizes in the 7-9 pt range, font embedding required, vector formats preferred (PDF/EPS) with high-DPI raster (300+) as fallback. Knowing these common values lets you design figures that work for most journals and tweak only the specifics for each submission.
27.3 Figure Sizing in Matplotlib
Matplotlib's figure size is specified in inches via figsize. For publication figures, you typically want to set the width to match the journal's single-column or double-column specification and let the height be driven by aspect-ratio considerations.
For a single-column Nature figure (89 mm ≈ 3.5 inches):
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(3.5, 2.5)) # 3.5 inch width, 2.5 inch height
For a double-column Nature figure (183 mm ≈ 7.2 inches):
fig, ax = plt.subplots(figsize=(7.2, 4.5))
The height is a design choice. Aspect ratios near 4:3, 3:2, or 16:9 are common. For line charts, apply the banking-to-45-degrees heuristic from Chapter 8. For scatter plots, a more square aspect ratio is often appropriate. For multi-panel figures, the height depends on the panel arrangement.
Matplotlib itself takes figure sizes only in inches, so a small user-defined helper such as mm_to_inches handles metric specifications:
def mm_to_inches(mm):
    return mm / 25.4
fig, ax = plt.subplots(figsize=(mm_to_inches(89), mm_to_inches(65)))
For multi-panel figures with plt.subplots(nrows, ncols), the figure size applies to the whole figure, not individual panels. For a 2×2 grid of double-column-width Nature panels, use figsize=(7.2, 5.4) and let the four panels divide the space.
DPI (dots per inch) matters for raster output. Journals typically require 300 DPI or higher for print-quality images. Set this on the figure creation:
fig, ax = plt.subplots(figsize=(3.5, 2.5), dpi=300)
Or on save:
fig.savefig("figure.png", dpi=300, bbox_inches="tight")
For vector output (PDF, EPS, SVG), DPI is mostly irrelevant — vector formats scale losslessly. Specify DPI only for raster formats.
27.4 Font Settings and Embedding
Font choices matter for publication figures because they affect readability, file size, and compliance with journal requirements. The key settings:
Font family: most journals specify Arial or Helvetica for sans-serif, Times or Times New Roman for serif. Set via rcParams:
import matplotlib as mpl
mpl.rcParams["font.family"] = "sans-serif"
mpl.rcParams["font.sans-serif"] = ["Arial", "Helvetica", "DejaVu Sans"]
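The list order is a fallback: try Arial first, then Helvetica, then DejaVu Sans if neither is available. Matplotlib ships with DejaVu Sans, so it is a safe last resort, but it is noticeably wider than Arial. If the fallback is used, recheck the layout and confirm that the journal accepts the substitute.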
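Because the fallback happens silently, it can help to check which font from the list is actually installed before building a figure. The sketch below is one way to do that. The helper name first_available_font is hypothetical (it is not part of matplotlib), and the approach assumes that font_manager.findfont raises ValueError when fallback_to_default=False and the requested family cannot be found.

```python
from matplotlib import font_manager


def first_available_font(candidates):
    """Return the first family in `candidates` that matplotlib can locate, or None."""
    for name in candidates:
        prop = font_manager.FontProperties(family=name)
        try:
            # With fallback disabled, a missing family raises instead of
            # silently resolving to the default font.
            font_manager.findfont(prop, fallback_to_default=False)
        except ValueError:
            continue
        return name
    return None


# The result depends on the fonts installed on this machine.
print(first_available_font(["Arial", "Helvetica", "DejaVu Sans"]))
```

If the printed name is not the one the journal asks for, install the font or raise the issue before the figure reaches the editorial desk.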
Font size: minimum 7-9 pt for most journals. Set a base size and let specific elements scale from it:
mpl.rcParams["font.size"] = 8
mpl.rcParams["axes.titlesize"] = 9
mpl.rcParams["axes.labelsize"] = 8
mpl.rcParams["xtick.labelsize"] = 7
mpl.rcParams["ytick.labelsize"] = 7
mpl.rcParams["legend.fontsize"] = 7
Font embedding: journals require that fonts be embedded in PDF/EPS files so the document renders correctly on any machine. By default, matplotlib writes Type 3 fonts to PDF and PostScript output (the default value of both pdf.fonttype and ps.fonttype is 3). Some journals reject Type 3 fonts, and text stored that way is often hard to edit in vector-graphics software. Type 42 (TrueType) fonts are widely supported and are the usual remedy.
To force Type 42 font output:
mpl.rcParams["pdf.fonttype"] = 42
mpl.rcParams["ps.fonttype"] = 42
These settings ensure that saved PDF and EPS files have embedded TrueType fonts. Without them, you may get a rejection letter citing "fonts not embedded" or "Type 3 fonts not allowed."
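A quick way to see what is in effect is to compare the library defaults with the current values and then save a small test PDF. This is a minimal sketch: the figure content is a placeholder, and the Agg backend is selected only so that the snippet runs without a display.

```python
import io

import matplotlib as mpl

mpl.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

# Library defaults, independent of any local matplotlibrc or style sheet
default_pdf = mpl.rcParamsDefault["pdf.fonttype"]
default_ps = mpl.rcParamsDefault["ps.fonttype"]
print("library defaults:", default_pdf, default_ps)

# Override for this session; the value is read when the file is written
mpl.rcParams["pdf.fonttype"] = 42
mpl.rcParams["ps.fonttype"] = 42

fig, ax = plt.subplots(figsize=(3.5, 2.5))
ax.plot([0, 1, 2], [0, 1, 4])
ax.set_xlabel("x (mm)")

# Save to memory here; in practice, pass a file name instead
buffer = io.BytesIO()
fig.savefig(buffer, format="pdf")
pdf_bytes = buffer.getvalue()
```

To confirm the result in a saved file, many PDF viewers list the embedded fonts and their types under the document properties, and command-line tools such as pdffonts (from Poppler) report the same information.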
Math text: mathematical notation in labels and titles can be written with matplotlib's math text (a simplified LaTeX-like syntax):
ax.set_xlabel(r"$x$ (mm)")
ax.set_ylabel(r"$y = x^2$")
ax.set_title(r"$\alpha$ vs. $\beta$")
The r"..." raw string prefix prevents Python from interpreting backslashes. The $...$ delimiters mark math mode. Most standard symbols work: \alpha, \beta, \sum, \int, ^2, _i, and more.
For full LaTeX rendering (with access to the full LaTeX typesetting engine), enable usetex:
mpl.rcParams["text.usetex"] = True
This requires a working LaTeX installation on the system. With usetex enabled, all text in the figure is rendered by LaTeX, which produces more consistent math output and access to the full LaTeX package ecosystem. The cost is slower rendering and a dependency on external software.
27.5 Panel Labels
Multi-panel figures require panel labels — typically "a", "b", "c", "d" in lowercase bold, positioned at the top-left of each panel. Different journals use different conventions (uppercase, parentheses, brackets), so check the specifics.
To add panel labels in matplotlib:
fig, axes = plt.subplots(2, 2, figsize=(7.2, 5.4))
for ax, label in zip(axes.flat, "abcd"):
    ax.text(-0.15, 1.05, label, transform=ax.transAxes,
            fontsize=12, fontweight="bold", va="top")
The text() call places a text object at position (-0.15, 1.05) in axes coordinates (where (0, 0) is the bottom-left of the axes and (1, 1) is the top-right). Negative x and y > 1 position the text outside the axes, to the upper-left. The transform=ax.transAxes tells matplotlib the coordinates are in axes coordinate space, not data coordinates.
For journals that use uppercase labels or parentheses:
# "(A)", "(B)", "(C)", "(D)" — uppercase with parentheses
for ax, letter in zip(axes.flat, "ABCD"):
    ax.text(-0.15, 1.05, f"({letter})", transform=ax.transAxes,
            fontsize=12, fontweight="bold", va="top")
The exact positioning varies with the figure layout. You may need to adjust the x and y coordinates to avoid overlapping the axes. A helper function that takes an axes and a letter and adds a label consistently is useful for any project that produces many multi-panel figures.
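One possible shape for such a helper is sketched below. The name add_panel_label and its parameters are illustrative choices, not a matplotlib API. The default position and font size simply mirror the snippets above and will usually need tuning per layout and per journal.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt


def add_panel_label(ax, letter, x=-0.15, y=1.05, uppercase=False,
                    parentheses=False, fontsize=12, **text_kwargs):
    """Place a bold panel label just outside the top-left corner of `ax`."""
    text = letter.upper() if uppercase else letter.lower()
    if parentheses:
        text = f"({text})"
    return ax.text(x, y, text, transform=ax.transAxes, fontsize=fontsize,
                   fontweight="bold", va="top", ha="left", **text_kwargs)


fig, axes = plt.subplots(2, 2, figsize=(7.2, 5.4))

# Lowercase bold letters: a, b, c, d
lower = [add_panel_label(ax, letter) for ax, letter in zip(axes.flat, "abcd")]
```

Switching to a convention such as "(A)" then means changing two keyword arguments in one place (uppercase=True, parentheses=True) rather than editing every figure script.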
27.6 Confidence Intervals and Error Bars
Scientific figures typically show uncertainty alongside point estimates. The main visual encodings:
Error bars for discrete estimates (group means, regression coefficients, categorical summaries):
ax.errorbar(x, means, yerr=std_errors, fmt="o", capsize=3, color="black")
The fmt="o" draws circles at each point. capsize=3 adds small perpendicular caps at the error bar ends. yerr can be a single value or a length-N sequence (symmetric errors), or a 2×N array [lower_errors, upper_errors] for asymmetric ones. The error can represent standard error, standard deviation, 95% confidence interval, or other measures, so always disclose which in the caption.
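The choice of error measure changes the length of the bars, which is why the caption must name it. As a sketch, the snippet below computes both the standard error and the half-width of a t-based 95% confidence interval for three groups and plots the latter. The data are synthetic placeholders, and the t-interval assumes approximately normal samples.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

# Synthetic example data (illustrative only)
rng = np.random.default_rng(1)
groups = {
    "A": rng.normal(10.0, 2.0, 12),
    "B": rng.normal(12.0, 2.0, 15),
    "C": rng.normal(11.0, 2.0, 10),
}

labels = list(groups)
means = np.array([values.mean() for values in groups.values()])
sems = np.array([stats.sem(values) for values in groups.values()])
dof = np.array([values.size - 1 for values in groups.values()])

# Half-width of the two-sided 95% interval: t critical value times the SEM
half_widths = stats.t.ppf(0.975, dof) * sems

positions = np.arange(len(labels))
fig, ax = plt.subplots(figsize=(3.5, 2.5))
ax.errorbar(positions, means, yerr=half_widths, fmt="o", capsize=3, color="black")
ax.set_xticks(positions)
ax.set_xticklabels(labels)
ax.set_ylabel("Value (mean, 95% CI)")
```

With samples this small, the 95% interval is more than twice as wide as the standard error, so a reader who mistakes one for the other would misjudge the uncertainty considerably.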
Shaded confidence bands for continuous estimates (regression lines, time series models):
ax.plot(x, y_pred, color="steelblue", label="Fit")
ax.fill_between(x, y_lower, y_upper, color="steelblue", alpha=0.2, label="95% CI")
The fill_between draws a translucent band between the lower and upper bounds of the interval. Setting alpha=0.2 lets the reader see through the band to the background, so other elements (data points, axes) remain visible.
Boxplots with notches show the median and confidence interval around the median:
ax.boxplot(data, notch=True)
The notch is a V-shaped indentation around the median that represents approximately a 95% confidence interval. Overlapping notches between two boxplots suggest no significant difference; non-overlapping notches suggest significance. This is an older convention, less common in modern papers than in the 1980s.
Confidence ellipses for 2D estimates (e.g., a mean in 2D space):
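import numpy as np
from matplotlib.patches import Ellipse
def confidence_ellipse(mean, cov, ax, n_std=2.0, **kwargs):
    # eigh returns eigenvalues in ascending order, so column 1 is the major axis
    vals, vecs = np.linalg.eigh(cov)
    angle = np.degrees(np.arctan2(*vecs[:, 1][::-1]))
    # width runs along the rotated x-axis (the major axis), so it takes the larger eigenvalue
    height, width = 2 * n_std * np.sqrt(vals)
    # Note: n_std=2 encloses about 86% of a bivariate normal, not 95%
    ellipse = Ellipse(mean, width, height, angle=angle, **kwargs)
    ax.add_patch(ellipse)
    return ellipse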
Use for 2D confidence regions around a scatter plot mean or a multivariate estimate.
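When the ellipse should correspond to a stated coverage level rather than a number of standard deviations, the axis lengths can be scaled with a chi-squared quantile with two degrees of freedom. The sketch below is one such variant. The function name confidence_ellipse_level is hypothetical, and the scaling assumes bivariate normal data with a known (or well-estimated) covariance matrix.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.patches import Ellipse
from scipy.stats import chi2


def confidence_ellipse_level(mean, cov, ax, level=0.95, **kwargs):
    """Add an ellipse expected to contain `level` of a bivariate normal."""
    vals, vecs = np.linalg.eigh(cov)            # eigenvalues in ascending order
    scale = np.sqrt(chi2.ppf(level, df=2))      # radius in units of standard deviation
    major = vecs[:, 1]                          # eigenvector of the largest eigenvalue
    angle = np.degrees(np.arctan2(major[1], major[0]))
    width = 2 * scale * np.sqrt(vals[1])        # full length along the major axis
    height = 2 * scale * np.sqrt(vals[0])       # full length along the minor axis
    ellipse = Ellipse(mean, width, height, angle=angle, **kwargs)
    ax.add_patch(ellipse)
    return ellipse


# Synthetic data (illustrative only)
rng = np.random.default_rng(0)
cov = np.array([[4.0, 0.0], [0.0, 1.0]])
points = rng.multivariate_normal([0.0, 0.0], cov, size=300)

fig, ax = plt.subplots(figsize=(3.5, 3.5))
ax.scatter(points[:, 0], points[:, 1], s=4, color="gray")
ellipse = confidence_ellipse_level([0.0, 0.0], cov, ax, level=0.95,
                                   facecolor="none", edgecolor="black")
```

This ellipse describes where individual observations fall. A confidence region for the mean itself is smaller and requires dividing the covariance by the sample size, so the caption should say which of the two is drawn.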
The general rule: always show uncertainty. A point estimate without a confidence interval is incomplete and sometimes misleading. Publication figures that omit uncertainty are likely to be challenged by peer reviewers.
27.7 Significance Brackets and P-Value Annotations
Scientific figures often annotate pairwise comparisons with significance brackets and p-values. The traditional convention:
- A horizontal bracket spans two groups being compared.
- Above the bracket is a label: "n.s." (not significant), "*" (p < 0.05), "**" (p < 0.01), "***" (p < 0.001), or an explicit p-value.
In matplotlib, drawing these manually is tedious. The statannotations library (pip install statannotations) automates it for seaborn-style categorical plots:
import matplotlib.pyplot as plt
import seaborn as sns
from statannotations.Annotator import Annotator
# df is a long-format DataFrame with a "group" column (A, B, C) and a numeric "value" column
fig, ax = plt.subplots(figsize=(4, 3))
sns.boxplot(data=df, x="group", y="value", ax=ax)
# Every pair listed here gets its own test and bracket
pairs = [("A", "B"), ("A", "C"), ("B", "C")]
annotator = Annotator(ax, pairs, data=df, x="group", y="value")
annotator.configure(test="t-test_ind", text_format="star", loc="outside")
annotator.apply_and_annotate()
The library runs the specified statistical test for each pair, computes the p-value, and draws the bracket with a significance label. It supports t-tests, Mann-Whitney U, Wilcoxon, Kruskal-Wallis, and more. For text_format="star", it uses stars; for "full", it writes "p=0.023" or similar.
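When an extra dependency is not an option, a single bracket can be drawn by hand with plain matplotlib. The sketch below shows the idea. The helpers p_to_stars and add_bracket are hypothetical names, the data are synthetic, and the bracket position is set manually in data coordinates.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats


def p_to_stars(p):
    """Map a p-value to the traditional star notation."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return "n.s."


def add_bracket(ax, x1, x2, y, text, height):
    """Draw a bracket from x1 to x2 whose feet start at y (data coordinates)."""
    ax.plot([x1, x1, x2, x2], [y, y + height, y + height, y],
            color="black", linewidth=0.8)
    ax.text((x1 + x2) / 2, y + height, text, ha="center", va="bottom", fontsize=7)


# Synthetic example data (illustrative only)
rng = np.random.default_rng(0)
group_a = rng.normal(10.0, 1.0, 30)
group_b = rng.normal(11.5, 1.0, 30)
p_value = stats.ttest_ind(group_a, group_b).pvalue

fig, ax = plt.subplots(figsize=(3.5, 2.5))
ax.boxplot([group_a, group_b])          # boxes are drawn at x = 1 and x = 2
ax.set_xticklabels(["A", "B"])

top = max(group_a.max(), group_b.max())
add_bracket(ax, 1, 2, top + 0.3, p_to_stars(p_value), height=0.2)
ax.set_ylim(top=top + 1.2)              # leave room for the label
```

For more than a couple of comparisons the manual offsets become hard to manage, which is the case the library handles automatically.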
A caveat: significance stars are a convention from older statistical practice and are increasingly criticized. Modern statistical advice emphasizes effect sizes and confidence intervals over p-value thresholds. Some journals (like The American Statistician) have editorialized against p-value stars explicitly. When submitting, check the journal's current stance.
27.8 Diagnostic Plots: QQ Plots and Residuals
Statistical papers often include diagnostic plots that check model assumptions. Two common ones are QQ plots and residual plots.
QQ plots compare the distribution of a sample to a theoretical distribution (usually normal). Points on the diagonal mean the sample matches the theoretical distribution; deviations indicate non-normality.
import scipy.stats as stats
import matplotlib.pyplot as plt
residuals = model.resid # from a fitted statsmodels regression
fig, ax = plt.subplots(figsize=(3.5, 3.5))
stats.probplot(residuals, dist="norm", plot=ax)
ax.set_title("Normal QQ Plot of Residuals")
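scipy.stats.probplot takes the sample and a distribution name. When given plot=ax, it draws the ordered sample values against the theoretical quantiles, together with a least-squares reference line. The points should fall roughly on that line for normally distributed residuals.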
Residual plots show residuals vs. fitted values to check for heteroscedasticity and non-linearity:
fig, ax = plt.subplots(figsize=(4, 3))
ax.scatter(model.fittedvalues, model.resid, alpha=0.5)
ax.axhline(0, color="red", linestyle="--")
ax.set_xlabel("Fitted values")
ax.set_ylabel("Residuals")
ax.set_title("Residuals vs. Fitted")
A good residual plot shows points randomly scattered around zero with constant variance. Patterns (fan-shaped, curved) indicate model problems.
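The two diagnostics are often shown side by side. The sketch below builds such a pair without statsmodels, fitting a straight line with np.polyfit to synthetic data. The data, the figure size, and the layout are all illustrative assumptions.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

# Synthetic linear data with normal noise (illustrative only)
rng = np.random.default_rng(42)
x = np.linspace(0, 10, 80)
y = 1.5 * x + 2.0 + rng.normal(0.0, 1.0, x.size)

# Ordinary least-squares line; polyfit returns the highest-degree coefficient first
slope, intercept = np.polyfit(x, y, 1)
fitted = slope * x + intercept
residuals = y - fitted

fig, (ax_qq, ax_res) = plt.subplots(1, 2, figsize=(7.2, 3.0),
                                    constrained_layout=True)

# Left panel: normal QQ plot of the residuals
stats.probplot(residuals, dist="norm", plot=ax_qq)
ax_qq.set_title("Normal QQ Plot of Residuals")

# Right panel: residuals against fitted values
ax_res.scatter(fitted, residuals, alpha=0.5, s=10)
ax_res.axhline(0, color="red", linestyle="--")
ax_res.set_xlabel("Fitted values")
ax_res.set_ylabel("Residuals")
ax_res.set_title("Residuals vs. Fitted")
```

Because the synthetic noise is normal with constant variance, both panels should look unremarkable here. Replacing the noise with something skewed or heteroscedastic is a useful way to learn what a failing diagnostic looks like.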
Other diagnostic plots include leverage plots, Cook's distance plots, and partial residual plots. The statsmodels library has sm.graphics.plot_regress_exog and related functions that produce standard diagnostic sets.
27.9 Color-Blind and Grayscale Safety
Scientific papers are often printed in black and white, and even color-capable journals reach readers with various forms of color blindness. Figures should work in both cases.
Color-blind-safe palettes: several research-backed palettes exist. The Wong palette (from Bang Wong, 2011) has 8 colors designed to be distinguishable by all major forms of color blindness:
wong_colors = ["#000000", "#E69F00", "#56B4E9", "#009E73",
"#F0E442", "#0072B2", "#D55E00", "#CC79A7"]
mpl.rcParams["axes.prop_cycle"] = plt.cycler("color", wong_colors)
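Other safe palettes: viridis (a sequential colormap for continuous data) and seaborn's built-in colorblind palette. The Wong palette above uses the same eight colors as the Okabe-Ito palette, so the two names are often used interchangeably. Matplotlib's tableau-colorblind10 style sheet and seaborn's colorblind palette are good defaults.
Grayscale safety: even with safe colors, the chart should be distinguishable in grayscale. Test by converting the rendered figure to grayscale (for example, desaturate a screenshot, or convert the saved image to luminance in an image editor) and checking whether different categories are still distinguishable. Note that passing cmap="gray" to imshow does not do this for a color image, because imshow ignores cmap when the input is an RGB array. If categories merge, add redundant encoding: different line styles, different markers, explicit labels.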
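A rough numerical version of the same test is to estimate the gray level of each palette color and flag pairs that land close together. The sketch below does this for the Wong palette. The function names are hypothetical, the gray level is approximated with Rec. 709 weights applied directly to sRGB values (no gamma correction), and the 0.05 threshold is an arbitrary assumption.

```python
from itertools import combinations

from matplotlib.colors import to_rgb

wong_colors = ["#000000", "#E69F00", "#56B4E9", "#009E73",
               "#F0E442", "#0072B2", "#D55E00", "#CC79A7"]


def approx_gray(color):
    """Approximate gray level in [0, 1] using Rec. 709 luma weights."""
    r, g, b = to_rgb(color)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def close_in_gray(colors, min_gap=0.05):
    """Return color pairs whose approximate gray levels differ by less than min_gap."""
    return [(a, b) for a, b in combinations(colors, 2)
            if abs(approx_gray(a) - approx_gray(b)) < min_gap]


close_pairs = close_in_gray(wong_colors)
for a, b in close_pairs:
    print(f"{a} ({approx_gray(a):.2f}) vs {b} ({approx_gray(b):.2f})")
```

Under this approximation, the orange and sky-blue entries come out at almost the same gray level, which illustrates why a color-blind-safe palette alone does not guarantee grayscale safety.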
Redundant encoding is the surest way to ensure grayscale readability. Use color AND line style AND marker shape for each category:
for i, (group, data) in enumerate(df.groupby("category")):
    ax.plot(data.x, data.y, color=wong_colors[i],
            linestyle=["-", "--", ":", "-."][i],
            marker=["o", "s", "^", "D"][i],
            label=group)
Even if the reader cannot distinguish colors, the line styles and markers make the categories clear. This is extra work but essential for accessibility.
27.10 Export Formats
Different journals accept different formats. The main options:
PDF: vector format, widely supported, high quality, small file sizes for simple figures. Matplotlib's default for vector output. Use for almost any modern journal.
fig.savefig("figure.pdf", bbox_inches="tight")
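EPS (Encapsulated PostScript): older vector format, still required by some journals. Produced by matplotlib with the .eps extension. It handles fewer features than PDF. In particular, PostScript has no transparency, so matplotlib's PostScript backend draws partially transparent artists (such as alpha-blended confidence bands) as opaque. It remains widely compatible with publisher workflows.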
fig.savefig("figure.eps", bbox_inches="tight")
TIFF: raster format preferred by biomedical journals (PLOS, some Nature journals). Supports high DPI and multiple layers. Use 300 DPI minimum, 600 for complex figures.
fig.savefig("figure.tif", dpi=300, bbox_inches="tight")
SVG: vector web format. Useful for web publishing but not commonly required by print journals. Produced with .svg extension.
fig.savefig("figure.svg", bbox_inches="tight")
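The bbox_inches="tight" argument crops the saved figure to the actual content, removing whitespace margins. This is convenient, but the crop changes the final dimensions, so the saved file may no longer match the width set in figsize. When a journal requires an exact width, let a layout manager (constrained layout or tight_layout) remove the excess whitespace and save without bbox_inches="tight".
The facecolor argument sets the background color of the saved figure. By default the figure's own face color is used, which is white unless changed, and that is usually correct. For transparent backgrounds, pass transparent=True.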
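The effect of the save options on the final dimensions is easy to measure. The sketch below saves the same figure to memory with and without cropping and reads back the pixel size. It assumes Pillow is installed (matplotlib depends on it), and saved_pixel_size is a hypothetical helper name.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
from PIL import Image


def saved_pixel_size(fig, dpi=300, **savefig_kwargs):
    """Save `fig` as PNG in memory and return (width, height) in pixels."""
    buffer = io.BytesIO()
    fig.savefig(buffer, format="png", dpi=dpi, **savefig_kwargs)
    buffer.seek(0)
    with Image.open(buffer) as image:
        return image.size


# Single-column figure; constrained layout handles margins inside the canvas
fig, ax = plt.subplots(figsize=(3.5, 2.5), constrained_layout=True)
ax.plot([0, 1, 2], [0, 1, 4])
ax.set_xlabel("x (mm)")
ax.set_ylabel("y (mm)")

exact = saved_pixel_size(fig)                       # canvas size is preserved
cropped = saved_pixel_size(fig, bbox_inches="tight")  # size follows the content

print("without cropping:", exact)
print("with bbox_inches='tight':", cropped)
```

Without cropping, the width is the figsize width times the DPI (3.5 in × 300 = 1050 pixels). The cropped size depends on the content and padding, which is why it is worth checking before submission.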
27.11 A Journal Submission Checklist
Before submitting a figure to a journal, run through this checklist:
- Width: does it match the journal's single-column or double-column specification?
- Height: is the aspect ratio appropriate (banking, panel arrangement)?
- Font family: Arial/Helvetica for sans-serif, Times for serif?
- Font sizes: all text at or above the journal's minimum (usually 7-9 pt)?
- Font embedding: Type 42 (TrueType) set in rcParams?
- Panel labels: correct style (lowercase/uppercase, bold, parentheses/not), positioned consistently?
- Error bars: present for every point estimate, clearly defined in caption?
- Color-blind safety: distinguishable in grayscale, or uses redundant encoding?
- Format: correct format (PDF/EPS/TIFF) per journal requirements?
- DPI: 300+ for raster formats?
- Panel labeling style: matches journal convention?
- Caption: self-contained, explains every element (axes, error bars, colors, significance)?
- Statistical tests: disclosed in caption, multiple-comparison corrections noted if applicable?
- Sample sizes: n values stated for each group?
- Scale bars: present for images?
- Abbreviations: defined in caption or figure legend?
- Reproducibility: code archived or available, data accessible?
- Accessibility: alt-text for supplementary materials?
- File size: within journal's upload limits?
- Final visual check: print out the figure at intended publication size and verify readability.
Running through this checklist for every figure becomes second nature after a few submissions. Skipping it produces figures that get bounced back from the editorial desk before reaching peer reviewers — wasted time for everyone.
27.12 Progressive Project: Four-Panel Climate Figure
The climate project in this chapter produces a 4-panel publication-quality figure suitable for a climate science paper. The panels:
Panel (a): Temperature anomaly time series with 10-year rolling mean and 95% confidence band.
Panel (b): Scatter plot of CO2 vs. temperature with fitted regression line and 95% prediction interval.
Panel (c): Monthly heatmap of temperature anomalies by year and month (2D heatmap from Chapter 14).
Panel (d): Bar chart of temperature trends by region with error bars.
The complete figure uses:
- figsize=(7.2, 6) for double-column Nature-style.
- rcParams set for 8 pt Arial, Type 42 fonts.
- Panel labels "a", "b", "c", "d" in bold at top-left.
- Wong colorblind-safe palette.
- 95% confidence intervals on all estimates.
- A comprehensive caption describing every element.
- Export as both PDF (vector) and TIFF (raster) at 300 DPI.
The code is verbose — perhaps 50-80 lines — but it is the kind of code that becomes reusable across similar projects. Building the figure once and saving the style settings as a configuration file means the next figure takes much less effort.
27.13 Reusable Journal Style Modules
Once you have identified the settings a journal requires, the right approach is to put them all in a Python module and import it at the top of every submission figure script. This ensures consistency across figures within the same paper and across papers to the same journal.
A minimal journal style module:
# nature_style.py
import matplotlib as mpl
def apply_nature_style():
    mpl.rcParams.update({
        "figure.dpi": 100,  # screen preview
        "savefig.dpi": 300,  # saved output
        "pdf.fonttype": 42,  # Type 42 TrueType
        "ps.fonttype": 42,
        "font.family": "sans-serif",
        "font.sans-serif": ["Arial", "Helvetica", "DejaVu Sans"],
        "font.size": 7,
        "axes.titlesize": 8,
        "axes.labelsize": 7,
        "axes.linewidth": 0.8,
        "xtick.labelsize": 6,
        "ytick.labelsize": 6,
        "xtick.major.width": 0.8,
        "ytick.major.width": 0.8,
        "legend.fontsize": 6,
        "legend.frameon": False,
        "lines.linewidth": 1.0,
        "lines.markersize": 4,
        "figure.figsize": (3.5, 2.5),  # single-column default
    })
# Sizes in millimeters converted to inches
NATURE_SINGLE_COL = 89 / 25.4  # 3.5 inches
NATURE_DOUBLE_COL = 183 / 25.4  # 7.2 inches
NATURE_1P5_COL = 120 / 25.4  # 4.7 inches
def nature_figsize(width_type="single", aspect=0.75):
    width = {"single": NATURE_SINGLE_COL, "1.5": NATURE_1P5_COL, "double": NATURE_DOUBLE_COL}[width_type]
    return (width, width * aspect)
Usage in a figure script:
from nature_style import apply_nature_style, nature_figsize
import matplotlib.pyplot as plt
apply_nature_style()
fig, ax = plt.subplots(figsize=nature_figsize("single", aspect=0.7))
# ... build chart ...
fig.savefig("figure_1.pdf", bbox_inches="tight")
Separate modules for different journals (plos_style.py, science_style.py, ieee_style.py) let you switch between submission targets by changing a single import. Colleagues can share these modules as a lab standard, and new lab members inherit the conventions automatically.
This pattern applies beyond journal styles. A module can codify a lab's preferred color palette, its logo watermarking function, its standard panel-labeling helper, and any other conventions the lab uses. The result is that every figure from the lab looks like it came from the same lab, without individual researchers having to remember the settings.
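A variation on the same idea is to apply a style only for the duration of one figure, so that the settings do not leak into the rest of a session or notebook. The sketch below uses matplotlib's rc_context for this. The dictionary JOURNAL_STYLES, the helper journal_style, and the specific values are illustrative placeholders that should be checked against each journal's current author guidelines.

```python
import matplotlib as mpl

mpl.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

# Settings shared by every target (placeholder values)
_COMMON = {
    "pdf.fonttype": 42,
    "ps.fonttype": 42,
    "savefig.dpi": 300,
    "font.family": "sans-serif",
}

# One entry per submission target (placeholder values)
JOURNAL_STYLES = {
    "nature": {**_COMMON, "font.size": 7, "figure.figsize": (89 / 25.4, 2.5)},
    "ieee": {**_COMMON, "font.size": 8, "figure.figsize": (3.5, 2.5)},
}


def journal_style(name):
    """Return a context manager that applies the named style temporarily."""
    return mpl.rc_context(JOURNAL_STYLES[name])


size_before = mpl.rcParams["font.size"]

with journal_style("ieee"):
    # Build AND save inside the context: text sizes are fixed when artists
    # are created, and the font type is read when the file is written.
    size_inside = mpl.rcParams["font.size"]
    fig, ax = plt.subplots()
    ax.plot([0, 1], [0, 1])

size_after = mpl.rcParams["font.size"]  # restored on exit
```

Switching the submission target then means changing the string passed to journal_style, and the previous settings come back automatically when the block ends.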
27.14 Scale Bars for Images
When publication figures include photographs, microscopy, or other imagery, they typically need scale bars — small reference lines indicating a specific physical length. A micrograph without a scale bar is nearly useless for publication.
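Matplotlib's main plotting API does not have a scale bar function (the bundled mpl_toolkits package offers a lower-level AnchoredSizeBar artist), but the matplotlib-scalebar library (pip install matplotlib-scalebar) adds a convenient one: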
from matplotlib_scalebar.scalebar import ScaleBar
fig, ax = plt.subplots()
ax.imshow(microscopy_image)
scalebar = ScaleBar(dx=0.5, units="um", location="lower right")
ax.add_artist(scalebar)
The dx=0.5 parameter says "each pixel represents 0.5 units" (here, micrometers). The library automatically picks a nice round length (5 μm, 10 μm, etc.) and draws the bar with a label.
For manual scale bars without the library, draw a line and annotate:
ax.plot([100, 200], [600, 600], color="white", linewidth=3)  # 100-pixel bar; equals 10 μm only if one pixel is 0.1 μm
ax.text(150, 590, "10 μm", color="white", ha="center", fontsize=7)  # label must match the pixel calibration
Scale bars are required for any image where physical size matters: histology, cell biology, material science, astronomy, nanotechnology. Always include them, place them in a corner where they do not obscure data, and use white or black text that contrasts with the image background.
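A middle path between the dedicated library and a hand-drawn line is the AnchoredSizeBar artist that ships with matplotlib in mpl_toolkits. The sketch below derives the bar length in pixels from an assumed calibration. The image is random noise standing in for a micrograph, and the 0.5 µm per pixel value is a placeholder that must come from the instrument metadata in real use.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits.axes_grid1.anchored_artists import AnchoredSizeBar

# Synthetic stand-in for a micrograph (illustrative only), kept dark so
# that a white bar is readable
rng = np.random.default_rng(0)
image = 0.4 * rng.random((200, 300))

um_per_pixel = 0.5   # assumed calibration
bar_um = 20          # physical length the bar should represent
bar_px = bar_um / um_per_pixel  # bar length in data (pixel) units

fig, ax = plt.subplots(figsize=(3.5, 2.5))
ax.imshow(image, cmap="gray", vmin=0, vmax=1)
ax.set_axis_off()

bar = AnchoredSizeBar(ax.transData, bar_px, f"{bar_um} µm", loc="lower right",
                      pad=0.3, color="white", frameon=False, size_vertical=2)
ax.add_artist(bar)
```

Computing the bar length from the calibration, rather than typing both numbers by hand, keeps the label and the drawn length from drifting apart when the image is swapped.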
27.15 Inset Axes for Zoomed Views
Scientific figures often need to show a big picture with a zoomed detail. The matplotlib pattern is inset axes — a smaller axes embedded within a larger one.
from mpl_toolkits.axes_grid1.inset_locator import inset_axes, mark_inset
fig, ax = plt.subplots(figsize=(5, 4))
ax.plot(x, y)
ax.set_xlim(0, 100)
ax.set_ylim(0, 10)
# Inset axes in the top-right
axins = inset_axes(ax, width="30%", height="30%", loc="upper right")
axins.plot(x, y)
axins.set_xlim(40, 60) # zoom region
axins.set_ylim(4, 6)
# Mark the zoom region on the main axes
mark_inset(ax, axins, loc1=2, loc2=4, fc="none", ec="gray")
inset_axes creates an axes at a specified position within the main axes. mark_inset draws connector lines between the zoom region on the main chart and the inset. The result clearly shows what is being zoomed and where.
For multi-panel figures, consider whether an inset is simpler than a separate subplot. An inset keeps the comparison in a single visual unit; a separate subplot gives each view more space but requires the reader to look between panels.
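Recent matplotlib versions also provide the same pattern directly on the Axes object, through ax.inset_axes and ax.indicate_inset_zoom, which avoids the extra import. The sketch below is a minimal version with a synthetic signal. The inset position (given in axes fractions) and the zoom limits are arbitrary choices.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

# Synthetic signal with fine structure worth zooming into (illustrative only)
x = np.linspace(0, 100, 2000)
y = 5 + 2 * np.sin(x / 8) + 0.3 * np.sin(3 * x)

fig, ax = plt.subplots(figsize=(5, 4))
ax.plot(x, y)
ax.set_xlim(0, 100)
ax.set_ylim(0, 12)

# Inset occupying the upper-right region: [x0, y0, width, height] in axes fractions
axins = ax.inset_axes([0.6, 0.62, 0.37, 0.35])
axins.plot(x, y)
axins.set_xlim(40, 60)   # zoom region
axins.set_ylim(4, 6)

# Draw the zoom rectangle on the main axes and connect it to the inset
ax.indicate_inset_zoom(axins, edgecolor="gray")
```

The rectangle drawn by indicate_inset_zoom follows the inset's axis limits, so the limits should be set before the call, as they are here.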
27.16 Annotation with Arrows and Callouts
Scientific figures often call out specific data points or regions with annotations. The ax.annotate method is the primary tool:
ax.annotate(
    "Peak at\nt = 45 s",
    xy=(45, 0.95),  # point being annotated
    xytext=(60, 0.7),  # text position
    arrowprops=dict(
        arrowstyle="->",
        color="black",
        connectionstyle="arc3,rad=-0.2",
    ),
    fontsize=8,
    ha="left",
)
The xy parameter specifies the point being annotated; xytext specifies where the text appears. The arrow connects them automatically. The connectionstyle="arc3,rad=-0.2" makes the arrow curve slightly, which looks more natural than a straight line.
For callouts that do not have arrows, use ax.text directly with offset coordinates:
ax.text(45, 0.95, "Peak", ha="center", va="bottom", fontsize=8,
        bbox=dict(boxstyle="round,pad=0.3", facecolor="white", edgecolor="black"))
The bbox argument adds a white rounded rectangle behind the text, making it readable against any background. Useful for labels inside scatter plots or heatmaps where the text might otherwise be obscured.
27.17 Figure Captions: The Essential Companion
A scientific figure is incomplete without its caption. Journals require that every figure include a caption that explains what the figure shows, how it was produced, and what the reader should conclude from it. A good caption is a small essay — detailed, self-contained, and carefully written. A bad caption is a sentence fragment that leaves the reader guessing.
The components of a good caption:
Figure identifier: "Figure 1." or similar, bolded and followed by the caption text.
Title or headline: a short description of the finding, analogous to an action title (Chapter 7). For example: "CO2 and temperature are strongly correlated in the industrial era." Not: "CO2 vs. temperature scatter."
Method description: a brief summary of how the data was collected or produced. "Daily temperature anomalies were computed from NOAA/GHCN station records using the reference period 1951-1980."
Axis descriptions: what the x and y axes represent, including units. "X axis: atmospheric CO2 concentration (ppm). Y axis: global temperature anomaly (°C)."
Panel descriptions: for multi-panel figures, describe each panel. "(a) Raw time series with 10-year rolling mean. (b) CO2-temperature scatter with regression line. (c) Monthly heatmap of anomalies."
Symbol key: colors, markers, and error bars explained. "Blue markers: pre-1950 observations. Red markers: post-1950 observations. Error bars: 95% confidence intervals."
Statistical details: tests, n values, p-values. "Spearman correlation ρ = 0.94 (n = 1728, p < 0.001)."
Source attribution: where the data came from. "Data from NOAA National Centers for Environmental Information."
A well-written caption allows the reader to understand the figure without reading the rest of the paper. This matters because readers often skim papers by looking at figures first — the figure and caption together must tell the story.
In LaTeX scientific papers, captions are written as \caption{...} inside the figure environment. In markdown or Word documents, they appear below the figure. The exact style depends on the journal, but the content is similar across all of them.
27.18 Multiple Comparisons Corrections
When a figure reports many statistical tests at once — pairwise comparisons across groups, multiple tests of the same hypothesis, many voxels in a brain image — the naive p-value threshold (usually 0.05) becomes misleading. With enough tests, some will be "significant" by chance alone. This is the multiple comparisons problem, and scientific papers must address it.
The standard corrections:
Bonferroni correction: divide the significance threshold by the number of tests. If you run 20 tests and want a family-wise α of 0.05, the per-test threshold becomes 0.05/20 = 0.0025. Very conservative; often over-corrects for large numbers of tests.
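Holm-Bonferroni: a sequential version of Bonferroni that is less conservative. Order the m p-values from smallest to largest, compare the smallest to α/m, the next to α/(m − 1), and so on, and stop rejecting at the first p-value that exceeds its threshold.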
False Discovery Rate (Benjamini-Hochberg): controls the expected proportion of false positives among rejected tests rather than the family-wise error rate. More permissive for large-scale testing. Widely used in genomics and neuroimaging.
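The three corrections are short enough to write out, which makes the differences concrete. The sketch below computes adjusted p-values with NumPy. The function names are illustrative, and in practice a tested library routine (for example, multipletests in statsmodels) is the safer choice.

```python
import numpy as np


def bonferroni(p_values):
    """Multiply every p-value by the number of tests, capped at 1."""
    p = np.asarray(p_values, dtype=float)
    return np.minimum(p * p.size, 1.0)


def holm(p_values):
    """Holm step-down adjustment (controls the family-wise error rate)."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    # The k-th smallest p-value (k = 0, 1, ...) is multiplied by (m - k)
    scaled = p[order] * (m - np.arange(m))
    # Running maximum keeps adjusted values non-decreasing with rank
    adjusted_sorted = np.minimum(np.maximum.accumulate(scaled), 1.0)
    adjusted = np.empty(m)
    adjusted[order] = adjusted_sorted
    return adjusted


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg step-up adjustment (controls the false discovery rate)."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    # The p-value with rank i (1-based) is multiplied by m / i
    scaled = p[order] * m / np.arange(1, m + 1)
    # Running minimum from the largest rank downward
    adjusted_sorted = np.minimum(np.minimum.accumulate(scaled[::-1])[::-1], 1.0)
    adjusted = np.empty(m)
    adjusted[order] = adjusted_sorted
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]
print("Bonferroni:", bonferroni(raw))          # 0.04, 0.16, 0.12, 0.02
print("Holm:      ", holm(raw))                # 0.03, 0.06, 0.06, 0.02
print("BH (FDR):  ", benjamini_hochberg(raw))  # 0.02, 0.04, 0.04, 0.02
```

For these four raw p-values, all below 0.05, Bonferroni leaves two results under 0.05 after adjustment, Holm also leaves two, and Benjamini-Hochberg leaves all four, which reflects the ordering from most to least conservative described above.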
In visualization, corrections affect which results are shown as "significant." A figure that annotates "p < 0.01" without mentioning a correction may be misleading if many tests were run. Best practice:
- State the correction method in the caption: "P-values are Benjamini-Hochberg corrected (FDR)."
- Report both raw and corrected p-values when space allows.
- For figures with many tests (heatmaps of significance, volcano plots), clearly mark the correction.
The statannotations library supports multiple-comparison corrections via its comparisons_correction parameter. Set to "bonferroni", "holm", "BH", or similar to apply the correction automatically.
27.19 Volcano Plots and Manhattan Plots
Two specialized plot types appear frequently in genomics, proteomics, and other high-throughput scientific contexts.
Volcano plots display the results of many tests simultaneously, with effect size on the x-axis and significance (−log10 of p-value) on the y-axis. Points in the top-left and top-right corners (large effect + small p-value) are the most interesting. A horizontal line marks the significance threshold; vertical lines mark effect-size thresholds.
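import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# gene_names, effect_sizes, and p_values are assumed to be equal-length
# arrays produced by an upstream differential-expression analysis.
results = pd.DataFrame({
    "gene": gene_names,
    "log2_fold_change": effect_sizes,
    "p_value": p_values,
})
results["neg_log10_p"] = -np.log10(results["p_value"])
# Significant = below the p-value threshold AND at least a two-fold change.
# Raw p-values are used here; apply a correction (Section 27.18) in real analyses.
results["significant"] = (results["p_value"] < 0.05) & (results["log2_fold_change"].abs() > 1)
fig, ax = plt.subplots(figsize=(5, 4))
ax.scatter(results["log2_fold_change"], results["neg_log10_p"],
           c=results["significant"].map({True: "red", False: "gray"}),
           s=10, alpha=0.6)
# Dashed guides: significance threshold (horizontal), effect-size thresholds (vertical).
ax.axhline(-np.log10(0.05), color="black", linestyle="--", linewidth=0.8)
ax.axvline(1, color="black", linestyle="--", linewidth=0.8)
ax.axvline(-1, color="black", linestyle="--", linewidth=0.8)
ax.set_xlabel("log2 fold change")
ax.set_ylabel("-log10 p-value")
ax.set_title("Volcano Plot")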
The scatter as a whole resembles an erupting volcano, hence the name: most points pile up near the bottom center (small effect, large p-value), while the interesting points (significant and large effect) spray out toward the top-left and top-right corners. Genes or features in those corners are the candidates for follow-up investigation.
Manhattan plots are specific to genome-wide association studies (GWAS). The x-axis is genomic position (chromosome and base-pair location), and the y-axis is −log10 of the p-value for each SNP. Significant associations appear as tall peaks — the visual metaphor is the New York City skyline (hence "Manhattan"). A genome-wide significance threshold is marked as a horizontal line.
Volcano and Manhattan plots share an underlying idea: display the results of many tests at once, with significance on the y-axis, and let the viewer spot the extreme cases visually. They differ on the x-axis: a volcano plot shows effect size, while a Manhattan plot shows genomic position. The format is highly standardized in genomics; nearly every GWAS paper includes a Manhattan plot, and the conventions (colors alternating by chromosome, threshold line at 5e-8) are followed almost everywhere in the field.
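The Manhattan layout described above can be sketched with synthetic data. Everything here is made up for illustration: five hypothetical chromosomes, uniformly random p-values, and one planted association. The helper `cumulative_positions` is a hypothetical name, not a library function. The key step is converting per-chromosome positions into one cumulative x-coordinate so chromosomes sit side by side.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Synthetic stand-in for GWAS results: 5 chromosomes, 200 SNPs each.
n_chrom, snps_per_chrom = 5, 200
chrom = np.repeat(np.arange(1, n_chrom + 1), snps_per_chrom)
pos = np.concatenate([
    np.sort(rng.integers(1, 1_000_000, snps_per_chrom)) for _ in range(n_chrom)
])
p_values = rng.uniform(1e-6, 1.0, chrom.size)
p_values[450] = 1e-9   # one planted "association" on chromosome 3

def cumulative_positions(chrom, pos):
    """Shift each chromosome's positions so chromosomes line up end to end."""
    x = np.empty(pos.size, dtype=float)
    centers = {}
    offset = 0.0
    for c in np.unique(chrom):
        mask = chrom == c
        x[mask] = pos[mask] + offset
        centers[int(c)] = offset + pos[mask].max() / 2   # tick location
        offset += pos[mask].max()
    return x, centers

x, centers = cumulative_positions(chrom, pos)

fig, ax = plt.subplots(figsize=(7, 3))
# Alternate two colors by chromosome, following the field convention.
colors = ["steelblue" if c % 2 else "gray" for c in chrom]
ax.scatter(x, -np.log10(p_values), c=colors, s=6)
# Conventional genome-wide significance threshold.
ax.axhline(-np.log10(5e-8), color="black", linestyle="--", linewidth=0.8)
ax.set_xticks(list(centers.values()))
ax.set_xticklabels([str(c) for c in centers])
ax.set_xlabel("Chromosome")
ax.set_ylabel("-log10 p-value")
```

With real data, the chromosome, position, and p-value columns would come from the association results file rather than a random generator, and chromosome lengths would normally come from a reference genome rather than the largest observed position.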
27.20 Heatmaps for Clustered Data
Scientific papers often show clustered heatmaps — heatmaps where rows and columns have been reordered by hierarchical clustering to reveal block-diagonal structure. This is the same technique as Chapter 19's clustermap, applied with additional publication-specific styling.
For a gene expression heatmap meeting journal requirements:
import seaborn as sns
g = sns.clustermap(
    expression_matrix,
    cmap="RdBu_r",
    center=0,
    z_score=0,                    # z-score normalize rows
    figsize=(5, 7),
    col_cluster=False,            # columns = conditions, preserve order
    row_cluster=True,             # rows = genes, cluster
    xticklabels=condition_labels,
    yticklabels=False,            # too many rows to label
    cbar_kws={"label": "z-score"},
)
g.fig.suptitle("Gene Expression Heatmap", y=1.02)
The z_score=0 argument normalizes each row (gene) to z-scores, which makes the heatmap reveal relative expression patterns rather than absolute magnitudes. The col_cluster=False preserves the biological order of conditions; row_cluster=True groups similar genes together.
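To see why row-wise z-scoring reveals relative patterns, here is a small NumPy sketch of the normalization itself. It assumes the sample standard deviation (ddof=1, the pandas default); the helper name `zscore_rows` and the two-gene matrix are hypothetical.

```python
import numpy as np

def zscore_rows(matrix):
    """Center each row at 0 and scale it to unit sample standard deviation."""
    m = np.asarray(matrix, dtype=float)
    mean = m.mean(axis=1, keepdims=True)
    std = m.std(axis=1, ddof=1, keepdims=True)
    return (m - mean) / std

# Two made-up genes with the same pattern at very different magnitudes.
expression = np.array([
    [1.0, 2.0, 3.0],      # low-abundance gene
    [10.0, 20.0, 30.0],   # high-abundance gene
])
z = zscore_rows(expression)
```

Both rows become -1, 0, 1: the tenfold difference in absolute magnitude disappears and only the shared rising pattern remains, which is what lets the clustering group the two genes together.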
This kind of heatmap is a staple of gene expression papers and follows the Eisen 1998 conventions covered in Chapter 19's Case Study 2. The publication version mainly differs from the exploratory version in having consistent fonts, a clean legend, and appropriate axis labels.
27.21 Forest Plots for Meta-Analysis
Meta-analyses combine results from multiple studies into a single estimate, and they are typically visualized with forest plots. A forest plot shows each study's effect estimate and confidence interval on a horizontal line, with all studies arranged vertically. A diamond at the bottom represents the pooled estimate.
import matplotlib.pyplot as plt
import numpy as np
studies = ["Smith 2018", "Jones 2019", "Patel 2020", "Lee 2021", "Pooled"]
effects = [0.12, -0.05, 0.20, 0.15, 0.11]
lowers = [0.02, -0.15, 0.10, 0.08, 0.05]
uppers = [0.22, 0.05, 0.30, 0.22, 0.17]
fig, ax = plt.subplots(figsize=(5, 4))
y_positions = np.arange(len(studies))[::-1]
for i, (study, effect, lo, hi) in enumerate(zip(studies, effects, lowers, uppers)):
    y = y_positions[i]
    # The pooled row gets a larger black diamond; individual studies get squares.
    marker = "D" if study == "Pooled" else "s"
    color = "black" if study == "Pooled" else "steelblue"
    size = 80 if study == "Pooled" else 40
    ax.plot([lo, hi], [y, y], color=color, linewidth=1)   # confidence interval
    ax.scatter(effect, y, s=size, marker=marker, color=color, zorder=5)
ax.axvline(0, color="gray", linestyle="--", linewidth=0.5)
ax.set_yticks(y_positions)
ax.set_yticklabels(studies)
ax.set_xlabel("Effect size (95% CI)")
ax.set_title("Forest Plot")
Each row is one study, with its point estimate (square) and confidence interval (horizontal line). The pooled estimate is the diamond. The vertical line at zero marks "no effect." Studies whose confidence intervals cross zero are not significant individually; the pooled estimate may or may not be significant depending on the combined evidence.
Forest plots are standard in systematic reviews and meta-analyses across medicine, psychology, and the social sciences. The Cochrane collaboration (which produces medical meta-analyses) has a specific forest plot template that most medical journals follow. The key elements — square size proportional to study weight, horizontal confidence intervals, pooled diamond — are universal.
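The forest plot code above draws every square at the same size and takes the pooled row as given. As a hedged sketch of the "square size proportional to study weight" convention, the version below recovers each study's standard error from its 95% interval, uses inverse-variance weights, and computes a fixed-effect pooled estimate. The fixed-effect model is an assumption made for illustration; a real meta-analysis often uses a random-effects model, so the result need not match the pooled row shown earlier.

```python
import numpy as np
import matplotlib.pyplot as plt

studies = ["Smith 2018", "Jones 2019", "Patel 2020", "Lee 2021"]
effects = np.array([0.12, -0.05, 0.20, 0.15])
lowers = np.array([0.02, -0.15, 0.10, 0.08])
uppers = np.array([0.22, 0.05, 0.30, 0.22])

# Recover standard errors from the 95% CIs, then inverse-variance weights.
se = (uppers - lowers) / (2 * 1.96)
weights = 1.0 / se**2

# Fixed-effect pooled estimate and its 95% CI (an illustrative assumption).
pooled = np.sum(weights * effects) / np.sum(weights)
pooled_se = np.sqrt(1.0 / np.sum(weights))
pooled_lo = pooled - 1.96 * pooled_se
pooled_hi = pooled + 1.96 * pooled_se

fig, ax = plt.subplots(figsize=(5, 4))
y = np.arange(len(studies), 0, -1)          # studies at y = 4..1, pooled at y = 0
ax.hlines(y, lowers, uppers, color="steelblue", linewidth=1)
# Marker area scales with study weight.
ax.scatter(effects, y, s=150 * weights / weights.max(),
           marker="s", color="steelblue", zorder=5)
# Pooled diamond: its horizontal extent is the pooled confidence interval.
ax.fill([pooled_lo, pooled, pooled_hi, pooled], [0, 0.25, 0, -0.25], color="black")
ax.axvline(0, color="gray", linestyle="--", linewidth=0.5)
ax.set_yticks(np.append(y, 0))
ax.set_yticklabels(studies + ["Pooled (fixed effect)"])
ax.set_xlabel("Effect size (95% CI)")
```

Here the study with the narrowest interval (Lee 2021) gets the largest square, and the pooled estimate comes out near 0.11, pulled toward that most precise study.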
27.22 Funnel Plots for Publication Bias
A funnel plot is a scatter plot of effect sizes (x-axis) against a measure of study precision (y-axis, usually the standard error, plotted on an inverted axis so that larger, more precise studies sit at the top). In the absence of publication bias, the plot should be symmetric: small studies scattered widely around the true effect, large studies clustered tightly. Asymmetry suggests publication bias: small studies with null results are missing, pulling the visible pattern toward positive effects.
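# effects and standard_errors are assumed to hold one value per study.
# Converting to arrays makes the weight arithmetic below work for plain lists too.
effects = np.asarray(effects, dtype=float)
standard_errors = np.asarray(standard_errors, dtype=float)
fig, ax = plt.subplots(figsize=(4, 5))
ax.scatter(effects, standard_errors, alpha=0.6)
ax.set_xlabel("Effect size")
ax.set_ylabel("Standard error")
# Add funnel lines (95% CI contour) around the inverse-variance weighted mean
mean_effect = np.average(effects, weights=1/standard_errors**2)
se_range = np.linspace(0, standard_errors.max(), 100)
ax.plot(mean_effect - 1.96*se_range, se_range, color="gray", linestyle="--")
ax.plot(mean_effect + 1.96*se_range, se_range, color="gray", linestyle="--")
ax.axvline(mean_effect, color="red", linestyle=":")
ax.invert_yaxis()  # larger SE at bottom, most precise studies at top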
The dashed funnel lines mark the region in which about 95% of studies are expected to fall if there is no bias and no between-study heterogeneity. Studies outside these lines are unexpected given their precision, and asymmetry in the cloud suggests missing studies.
Funnel plots are a specific diagnostic tool for meta-analyses, not a general visualization. They appear in Cochrane reviews and systematic review papers. If you are producing one, follow the standard Egger-test approach for quantifying asymmetry and report both the plot and the test in the caption.
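As a rough sketch of the Egger test mentioned above: the classic formulation regresses each study's standardized effect (effect divided by standard error) on its precision (one over standard error) and tests whether the intercept differs from zero. The helper name `egger_test` and the six-study dataset are hypothetical, and the sketch assumes a SciPy version whose `linregress` result exposes `intercept_stderr`.

```python
import numpy as np
from scipy import stats

def egger_test(effects, standard_errors):
    """Return (intercept, two-sided p-value) for Egger's regression."""
    effects = np.asarray(effects, dtype=float)
    se = np.asarray(standard_errors, dtype=float)
    precision = 1.0 / se
    standardized = effects / se
    fit = stats.linregress(precision, standardized)
    # t-test of the intercept against zero, with n - 2 degrees of freedom.
    t_stat = fit.intercept / fit.intercept_stderr
    p_value = 2 * stats.t.sf(abs(t_stat), df=effects.size - 2)
    return fit.intercept, p_value

# Made-up studies in which smaller studies (larger SE) report larger effects,
# which is the pattern publication bias tends to produce.
example_se = np.array([0.05, 0.10, 0.15, 0.20, 0.25, 0.30])
wobble = np.array([0.01, -0.01, 0.01, -0.01, 0.01, -0.01])
example_effects = 0.2 + (1.5 + wobble) * example_se

intercept, p_value = egger_test(example_effects, example_se)
```

In this constructed example the intercept is close to 1.5 with a very small p-value, flagging the asymmetry that was built into the data. A symmetric funnel would give an intercept near zero. With only a handful of studies the test has little power, so it complements the plot rather than replacing it.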
27.23 Effect Size Visualization
Modern statistical guidance emphasizes effect sizes over p-value thresholds. An effect size (Cohen's d, odds ratio, correlation coefficient, standardized mean difference) is a measure of the magnitude of an effect, independent of sample size. Visualizing effect sizes requires showing the effect value, its confidence interval, and a reference point (zero or "no effect").
Effect-size forest plots combine the forest plot layout with effect size labels for clarity. These are especially useful for subgroup analyses.
Estimation plots (Gardner-Altman plots) combine the raw data with the effect-size estimate in a single figure. The left panel shows the raw group data; the right panel shows the mean difference with its confidence interval aligned to the reference group. The DABEST library (pip install dabest) produces these plots automatically:
import dabest
analysis = dabest.load(df, idx=("control", "treatment"))
analysis.mean_diff.plot(fig_size=(5, 4))
DABEST is gaining popularity as an alternative to t-test-and-bar-chart reporting. The visual emphasizes effect size and uncertainty rather than a binary "significant/not significant" determination. Some journals (eLife, Nature Methods) have explicitly endorsed this approach.
The broader lesson: effect sizes should be visualized, not just reported in text. A figure that shows effect sizes with their confidence intervals conveys uncertainty in a way that a table of p-values does not. Publications increasingly expect this.
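A minimal sketch of the three ingredients named in this section (effect value, confidence interval, reference point) using only NumPy and matplotlib. The helper names, the synthetic data, and the choice of a percentile bootstrap are all illustrative assumptions; this is not how DABEST is implemented internally.

```python
import numpy as np
import matplotlib.pyplot as plt

def cohens_d(treatment, control):
    """Standardized mean difference using the pooled sample standard deviation."""
    t = np.asarray(treatment, dtype=float)
    c = np.asarray(control, dtype=float)
    pooled_var = ((t.size - 1) * t.var(ddof=1) + (c.size - 1) * c.var(ddof=1)) \
        / (t.size + c.size - 2)
    return (t.mean() - c.mean()) / np.sqrt(pooled_var)

def bootstrap_mean_diff_ci(treatment, control, n_boot=5000, seed=0):
    """Percentile-bootstrap 95% CI for the difference in means."""
    rng = np.random.default_rng(seed)
    t = np.asarray(treatment, dtype=float)
    c = np.asarray(control, dtype=float)
    # Resample each group with replacement, n_boot times.
    t_means = rng.choice(t, size=(n_boot, t.size)).mean(axis=1)
    c_means = rng.choice(c, size=(n_boot, c.size)).mean(axis=1)
    lo, hi = np.percentile(t_means - c_means, [2.5, 97.5])
    return lo, hi

# Synthetic groups standing in for real measurements.
rng = np.random.default_rng(42)
control = rng.normal(10.0, 2.0, 30)
treatment = rng.normal(11.5, 2.0, 30)

diff = treatment.mean() - control.mean()
lo, hi = bootstrap_mean_diff_ci(treatment, control)
d = cohens_d(treatment, control)

fig, ax = plt.subplots(figsize=(4, 1.5))
ax.plot([lo, hi], [0, 0], color="black", linewidth=1)      # confidence interval
ax.scatter(diff, 0, color="black", zorder=5)               # point estimate
ax.axvline(0, color="gray", linestyle="--", linewidth=0.5) # "no effect" reference
ax.set_yticks([])
ax.set_xlabel(f"Mean difference (95% bootstrap CI); Cohen's d = {d:.2f}")
```

The reader can see at a glance whether the interval excludes zero and how wide the uncertainty is, which a bare p-value does not convey.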
27.24 The Reproducibility Crisis and Figure Transparency
Scientific publishing has been dealing with a "reproducibility crisis" — the finding that many published results cannot be replicated by independent researchers. Visualization is part of the picture. Figures that look convincing can mislead if the underlying data or methods are not transparent, and opaque figures make it hard for other researchers to identify problems.
Modern best practices for transparent figures:
Publish the data and code. The single most effective way to ensure reproducibility is to publish the data and the code that produced the figure, not just the figure image. Journals increasingly require this (Nature, Science, and most PLOS journals). Archives like GitHub, Zenodo, OSF, and Dryad make it practical.
Use figures that show the full data distribution, not just summaries. A bar chart with error bars shows mean ± SEM, which tells the reader almost nothing about the distribution. A strip plot + box plot (Chapter 18) shows every individual observation alongside the summary, making outliers and skewness visible. The modern alternative to dynamite plots exists partly because of reproducibility concerns — showing individual points makes it harder to hide problematic data.
Disclose the sample size. Every figure should mark or caption the n values for each group. "N = 12 mice per group" is a single line in a caption that adds enormous interpretability.
Report effect sizes, not just p-values. As Section 27.23 discussed, p-values alone can mislead — a very small effect with a very large sample can produce p < 0.001 and be practically meaningless. Effect sizes plus confidence intervals give a more honest picture.
Avoid cherry-picked examples. Representative examples in figures should be, well, representative. Selecting the cleanest-looking example out of a hundred and presenting it as "typical" is common and problematic. When you need example data, show multiple examples or use a statistical summary across all data.
Use version control for figure code. Figure code changes over time as analyses evolve. Keeping it in git with meaningful commit messages lets you track which figure corresponds to which analysis state and revert if needed.
Provide raw and processed data. Some journals require authors to submit both the raw data file and the processed data file that was plotted. This lets reviewers and future researchers check the intermediate steps.
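Two of the practices above, showing every observation and disclosing the sample size, are easy to build into the plotting code itself. The sketch below uses synthetic data and a hypothetical helper, `label_with_n`, to put n directly in the tick labels so it cannot be lost if the caption is edited.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
# Synthetic measurements; group names and sizes are made up.
groups = {
    "control": rng.normal(10, 2, 12),
    "treatment": rng.normal(12, 2, 9),
}

def label_with_n(name, values):
    """Tick label that carries the group's sample size."""
    return f"{name}\n(n = {len(values)})"

fig, ax = plt.subplots(figsize=(3.5, 3))
positions = np.arange(1, len(groups) + 1)
# Box plot for the summary; fliers are hidden because every point is drawn below.
ax.boxplot(list(groups.values()), positions=positions, widths=0.5, showfliers=False)
for x, values in zip(positions, groups.values()):
    jitter = rng.uniform(-0.12, 0.12, len(values))   # spread points horizontally
    ax.scatter(x + jitter, values, s=12, color="black", alpha=0.6, zorder=3)
ax.set_xticks(positions)
ax.set_xticklabels([label_with_n(k, v) for k, v in groups.items()])
ax.set_ylabel("Measurement (arbitrary units)")
```

Because the labels are computed from the data, they stay correct when observations are added or excluded during revision.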
The broader principle is that a figure is not just the final image; it is also the data and code that produced it. Publishing the full chain makes the figure verifiable, and verifiable figures are more trustworthy. The best scientific visualizations are the ones where any interested reader can reproduce them from first principles.
A concrete practice: when you submit a figure to a journal, also create a GitHub repository (or similar) containing the data, the Python script, and a README explaining how to reproduce the figure. Link to the repository in the paper's supplementary materials or methods section. This small investment of effort dramatically increases the figure's long-term credibility. Future readers who are skeptical of the claim can check the data themselves; future researchers who want to extend the analysis can start from a working baseline; and you yourself will benefit a year later when you need to regenerate a figure with updated data and no longer remember which notebook produced the original.
Reproducibility is a gift to others and especially to your future self. It also makes your results more likely to survive scrutiny, to be cited correctly, and to contribute meaningfully to the cumulative knowledge in your field. The alternative, figures whose underlying data and methods are not available, is not sustainable in a scientific culture that increasingly expects transparency, and journals that require it will continue to multiply in the coming years.
27.25 Check Your Understanding
Before continuing to Chapter 28 (Big Data Visualization), make sure you can answer:
- What are the typical single-column and double-column figure widths for Nature, Science, and PLOS?
- What is the difference between Type 42 and Type 3 fonts, and why does it matter?
- How do you add panel labels (a, b, c) to a multi-panel figure in matplotlib?
- What is the Wong colorblind-safe palette, and when should you use it?
- What does fig.savefig(..., bbox_inches="tight") do?
- How do you embed LaTeX math notation in a matplotlib label?
- What is the statannotations library, and what does it automate?
- Name five items from the journal submission checklist.
If any of these are unclear, re-read the relevant section. Chapter 28 addresses big data visualization — when you have more points than pixels and standard charts break down.
27.26 Chapter Summary
This chapter covered the specific requirements and techniques for publication-quality scientific figures:
- Journal requirements specify figure widths, font sizes, font embedding, panel label styles, and export formats. Common patterns: ~3.5-inch single-column, ~7-inch double-column, 7-9 pt minimum font, Arial/Helvetica, PDF/EPS/TIFF.
- Figure sizing uses matplotlib's figsize (inches) with mm-to-inches conversion for metric specifications.
- Font settings include family, size, and embedding (Type 42 for TrueType, required by most modern journals). Set via rcParams.
- Panel labels use ax.text with transform=ax.transAxes to place labels consistently at the top-left of each panel.
- Confidence intervals and error bars encode uncertainty via errorbar, fill_between, and boxplot notches.
- Significance brackets via the statannotations library automate pairwise comparison annotations.
- Diagnostic plots (QQ plots, residual plots) check model assumptions with scipy.stats and matplotlib.
- Color-blind and grayscale safety requires palette selection (Wong, Okabe-Ito, viridis) and redundant encoding (color + line style + marker).
- Export formats include PDF (vector default), EPS (legacy vector), TIFF (raster for biomedical), and SVG (web).
- The submission checklist ensures every figure meets the journal's requirements before submission.
No new threshold concept — this chapter is applied rather than conceptual. The payoff is the ability to produce figures that meet real publication standards without being rejected on production grounds. The checklist in Section 27.11 is a practical tool for any scientific submission.
Chapter 28, the final chapter of Part VI, turns to big data visualization, where the challenge is rendering datasets too large for standard tools.
27.27 Spaced Review
- From Chapter 12 (Customization Mastery): Journal-specific styles are an application of the rcParams and style sheet material from Chapter 12. How does a "journal style" file compare to the reusable style functions discussed there?
- From Chapter 13 (Subplots & GridSpec): Multi-panel figures for journals require precise layout. When is GridSpec better than plain subplots for this?
- From Chapter 14 (Specialized Charts): Error bars and confidence bands were introduced in Chapter 14. How do the journal-specific requirements modify the Chapter 14 patterns?
- From Chapter 3 (Color): Colorblind-safe palettes were introduced in Chapter 3. How does the scientific-publication context differ from the general-audience context?
- From Chapter 6 (Data-Ink Ratio): Publication figures must be dense but readable. How does the data-ink ratio principle apply in a context with strict space constraints?
Chapter 27 is applied and detail-heavy. There is no clever trick to learn, just a long list of specific requirements and the code to meet them. Students who are preparing their first journal figure will find this chapter directly useful. Those who are not will still benefit from the discipline of checking figures against external specifications. Chapter 28 closes Part VI with big data visualization strategies.