Quiz: Statistical and Scientific Visualization


Part I: Multiple Choice (10 questions)

Q1. Nature's typical single-column figure width is:

A) 3.5 inches (89 mm) B) 7 inches (178 mm) C) 5 inches (127 mm) D) 8.5 inches (216 mm)

Answer: **A.** 89 mm ≈ 3.5 inches is Nature's single-column width. Double-column is 183 mm ≈ 7.2 inches.
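Because matplotlib's `figsize` is given in inches, the millimetre widths above need converting. A minimal sketch follows; the helper name `mm_to_inches` and the 0.75 height ratio are illustrative choices, not journal requirements.

```python
import matplotlib as mpl
mpl.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

MM_PER_INCH = 25.4


def mm_to_inches(mm):
    """Convert millimetres to inches, the unit matplotlib uses for figsize."""
    return mm / MM_PER_INCH


single_col = mm_to_inches(89)    # about 3.50 in
double_col = mm_to_inches(183)   # about 7.20 in

# The height is an arbitrary illustrative choice (three quarters of the width).
fig, ax = plt.subplots(figsize=(single_col, single_col * 0.75))
```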

Q2. Which rcParams setting ensures TrueType (Type 42) fonts in PDF output?

A) `mpl.rcParams["font.family"] = "truetype"` B) `mpl.rcParams["pdf.fonttype"] = 42` C) `mpl.rcParams["pdf.embed_fonts"] = True` D) `mpl.rcParams["font.encoding"] = "truetype"`

Answer: **B.** `pdf.fonttype = 42` forces Type 42 TrueType output. Set `ps.fonttype = 42` similarly for PostScript output.
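As a rough sketch of how these two settings are typically applied, the example below sets both before saving a small figure. The plotted data and the in-memory buffer are placeholders chosen so the example runs without writing a file.

```python
import io

import matplotlib as mpl
mpl.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Set both font types before any figure is saved.
mpl.rcParams["pdf.fonttype"] = 42
mpl.rcParams["ps.fonttype"] = 42

fig, ax = plt.subplots(figsize=(3.5, 2.5))
ax.plot([0, 1, 2], [0, 1, 4])  # placeholder data
ax.set_xlabel("x")
ax.set_ylabel("y")

# Save to an in-memory buffer; pass a filename instead for real output.
buffer = io.BytesIO()
fig.savefig(buffer, format="pdf")
pdf_bytes = buffer.getvalue()
```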

Q3. The typical minimum font size for journal figures is:

A) 5 pt B) 7 pt C) 10 pt D) 12 pt

Answer: **B.** Most journals set a minimum of 7-9 pt, with 7 pt the usual floor. Some (e.g., Nature's supplementary material) allow 5 pt for very dense figures.

Q4. Which library automates pairwise significance brackets on seaborn plots?

A) statsmodels B) statannotations C) scipy.stats D) seaborn itself

Answer: **B.** `statannotations` provides the `Annotator` class for adding brackets with p-values to seaborn categorical plots.

Q5. The Wong palette is designed for:

A) High contrast B) Colorblind accessibility C) Grayscale printing D) RGB displays

Answer: **B.** Bang Wong's 2011 palette has 8 colors designed to be distinguishable under all major forms of color blindness. It also works in grayscale, with care.
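For reference, the eight colors can be written as hex codes and installed as matplotlib's default color cycle. This is one possible setup; the constant name `WONG_PALETTE` is an illustrative choice.

```python
import matplotlib as mpl
from cycler import cycler

# The eight colors of the Wong (2011) palette as hex codes.
WONG_PALETTE = [
    "#000000",  # black
    "#E69F00",  # orange
    "#56B4E9",  # sky blue
    "#009E73",  # bluish green
    "#F0E442",  # yellow
    "#0072B2",  # blue
    "#D55E00",  # vermillion
    "#CC79A7",  # reddish purple
]

# Make the palette the default color cycle for all subsequent plots.
mpl.rcParams["axes.prop_cycle"] = cycler(color=WONG_PALETTE)
```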

Q6. Which format is vector-based and preferred by most modern journals?

A) PNG B) JPEG C) PDF D) BMP

Answer: **C.** PDF is the default vector format and is widely accepted. EPS is older but also vector. TIFF (raster) is required by some biomedical journals, which then expect a high DPI.

Q7. Which function adds a horizontal reference line in matplotlib?

A) `ax.hline` B) `ax.axhline` C) `ax.draw_line` D) `ax.horizontal`

Answer: **B.** `ax.axhline(y=0)` adds a horizontal line across the whole axes. `ax.axvline` is the vertical version.
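A short sketch of both calls on placeholder data; the line styles and positions are arbitrary.

```python
import matplotlib as mpl
mpl.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(3.5, 2.5))
ax.plot([0, 1, 2, 3], [-1.0, 0.5, -0.2, 1.2], marker="o")  # placeholder data

# Reference lines span the full axes regardless of the data limits.
hline = ax.axhline(y=0, color="0.5", linestyle="--", linewidth=0.8)
vline = ax.axvline(x=1.5, color="0.5", linestyle=":", linewidth=0.8)
```

The y position of `axhline` is in data coordinates, while its horizontal extent is in axes fractions (0 to 1), which is why the line keeps spanning the axes when the x limits change.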

Q8. In a volcano plot, what do the axes represent?

A) Time and magnitude B) Effect size (x) and -log10 p-value (y) C) Mean and variance D) Group and value

Answer: **B.** Effect size on x, -log10 of p-value on y. Points with large effect and small p-value appear in the top corners, forming the "volcano" shape.
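The axis construction can be sketched with synthetic values. Everything here is made up for illustration: the data are random, and the thresholds (|log2 fold change| > 1, p < 0.05) are common conventions rather than requirements. Because the synthetic effect sizes and p-values are independent, the result shows the axes and thresholds but not the characteristic volcano shape that real data produce.

```python
import numpy as np
import matplotlib as mpl
mpl.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
log2_fc = rng.normal(0.0, 1.5, size=500)     # synthetic effect sizes
p_values = rng.uniform(1e-6, 1.0, size=500)  # synthetic p-values
neg_log10_p = -np.log10(p_values)            # small p-values map to large y

# Illustrative thresholds for highlighting points.
significant = (np.abs(log2_fc) > 1.0) & (p_values < 0.05)

fig, ax = plt.subplots(figsize=(3.5, 3.0))
ax.scatter(log2_fc[~significant], neg_log10_p[~significant], s=4, color="0.7")
ax.scatter(log2_fc[significant], neg_log10_p[significant], s=4, color="#D55E00")
ax.axhline(-np.log10(0.05), color="0.5", linestyle="--", linewidth=0.8)
ax.axvline(-1.0, color="0.5", linestyle="--", linewidth=0.8)
ax.axvline(1.0, color="0.5", linestyle="--", linewidth=0.8)
ax.set_xlabel("log2 fold change")
ax.set_ylabel("-log10(p)")
```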

Q9. A forest plot is typically used for:

A) Tree structures B) Meta-analyses showing effect estimates per study C) Categorical comparisons D) Time series

Answer: **B.** Forest plots display each study's effect estimate with its confidence interval, plus a pooled estimate. They are standard in meta-analyses.
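A minimal forest plot can be drawn with `ax.errorbar`. In this sketch the study names and numbers are hypothetical, and the pooled estimate uses simple inverse-variance (fixed-effect) weighting as one possible pooling method; a real meta-analysis may call for a random-effects model instead.

```python
import numpy as np
import matplotlib as mpl
mpl.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical per-study effect estimates with 95% confidence limits.
studies = ["Study A", "Study B", "Study C", "Study D"]
estimates = np.array([0.30, 0.10, 0.45, 0.22])
ci_low = np.array([0.05, -0.20, 0.10, 0.02])
ci_high = np.array([0.55, 0.40, 0.80, 0.42])

# Recover standard errors from the 95% limits, then pool with
# inverse-variance (fixed-effect) weights.
se = (ci_high - ci_low) / (2 * 1.96)
weights = 1.0 / se**2
pooled = np.sum(weights * estimates) / np.sum(weights)
pooled_se = np.sqrt(1.0 / np.sum(weights))

rows = np.arange(len(studies))
fig, ax = plt.subplots(figsize=(3.5, 2.5))
ax.errorbar(estimates, rows, xerr=[estimates - ci_low, ci_high - estimates],
            fmt="s", color="black", capsize=2)
ax.errorbar(pooled, len(studies), xerr=1.96 * pooled_se,
            fmt="D", color="#0072B2", capsize=2)
ax.axvline(0, color="0.5", linestyle="--", linewidth=0.8)  # line of no effect
ax.set_yticks(list(rows) + [len(studies)])
ax.set_yticklabels(studies + ["Pooled"])
ax.invert_yaxis()  # first study at the top, pooled estimate at the bottom
ax.set_xlabel("Effect estimate (95% CI)")
```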

Q10. What does `ax.text(-0.15, 1.05, "a", transform=ax.transAxes)` do?

A) Places "a" at data coordinates (-0.15, 1.05) B) Places "a" at axes-fraction coordinates (-0.15, 1.05), which is just outside the top-left C) Creates a new axis labeled "a" D) Errors because coordinates are invalid

Answer: **B.** `transform=ax.transAxes` means the coordinates are axes fractions (0 to 1 spans the axes). A negative x and y > 1 place the text outside the axes, above and to the left, which is a common panel label position.
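Applied to a two-panel figure, the same call can be looped over the axes. The font size, weight, and alignment below are illustrative; journals differ on label style.

```python
import matplotlib as mpl
mpl.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 2, figsize=(7.2, 3.0))

labels = []
for ax, letter in zip(axes, "ab"):
    ax.plot([0, 1], [0, 1])  # placeholder data
    # Axes-fraction coordinates: x < 0 is left of the axes, y > 1 is above it.
    label = ax.text(-0.15, 1.05, letter, transform=ax.transAxes,
                    fontsize=10, fontweight="bold", va="bottom", ha="left")
    labels.append(label)
```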

Part II: Short Answer (10 questions)

Q11. Write rcParams to set font family to Arial and font size to 8 pt.

Answer:
import matplotlib as mpl
mpl.rcParams["font.family"] = "sans-serif"
mpl.rcParams["font.sans-serif"] = ["Arial", "Helvetica", "DejaVu Sans"]
mpl.rcParams["font.size"] = 8

Q12. Describe the purpose of panel labels in multi-panel figures.

Answer: Panel labels (a, b, c, d, typically in bold lowercase or uppercase) let the caption and text refer to specific panels unambiguously. Without labels, the reader cannot tell which panel is "the third one." Labels also establish a reading order (a before b before c) that guides the reader through the figure.

Q13. Write matplotlib code to save a figure as a 300 DPI TIFF with tight bounding box.

Answer:
fig.savefig("figure.tif", dpi=300, bbox_inches="tight")

Q14. Explain why confidence intervals matter more than p-values for modern scientific visualization.

Answer: A p-value tells you whether an effect is "statistically significant" at some threshold. A confidence interval tells you the range of plausible effect values. The interval is more informative: a tiny effect with a huge sample can have p < 0.001 and still be practically meaningless, while a moderate effect with a small sample can have p > 0.05 and still be worth discussing. Effect size + confidence interval together communicate both significance and magnitude, and modern guidance emphasizes them over p-value thresholds.
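The two scenarios can be made concrete with a small calculation. This sketch assumes two equal-sized groups with a common standard deviation and uses a normal approximation (z = 1.96), which is rough for the small-sample case where a t-based interval would be somewhat wider. The helper name and all numbers are hypothetical.

```python
import math


def mean_difference_ci(diff, sd, n_per_group, z=1.96):
    """Approximate 95% CI for a difference in means between two equal-sized
    groups with a common standard deviation (normal approximation)."""
    se = sd * math.sqrt(2.0 / n_per_group)
    return diff - z * se, diff + z * se


# Tiny effect, huge sample: the interval excludes zero but sits near it.
tiny_effect = mean_difference_ci(diff=0.02, sd=1.0, n_per_group=1_000_000)

# Moderate effect, small sample: the interval includes zero but is wide.
moderate_effect = mean_difference_ci(diff=0.5, sd=1.0, n_per_group=10)

print(f"tiny effect:     ({tiny_effect[0]:.4f}, {tiny_effect[1]:.4f})")
print(f"moderate effect: ({moderate_effect[0]:.3f}, {moderate_effect[1]:.3f})")
```

The first interval is "significant" yet confined to effects of about 0.02 standard deviations; the second is "non-significant" yet compatible with effects larger than one standard deviation.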

Q15. What are Type 42 and Type 3 fonts, and which is preferred?

Answer: Both describe how fonts are embedded in PDF and PostScript output. Type 42 wraps the TrueType font itself, so text typically stays selectable and editable in downstream tools. Type 3 stores each glyph as a set of drawing procedures, which many journal PDF processors cannot handle. Type 42 is strongly preferred. Matplotlib's default for `pdf.fonttype` is 3, so set it explicitly with `mpl.rcParams["pdf.fonttype"] = 42`.

Q16. Describe a QQ plot and what question it answers.

Answer: A QQ plot (quantile-quantile plot) plots the quantiles of a sample against the quantiles of a theoretical distribution (usually normal). If the sample matches the distribution, the points fall on the diagonal. If they deviate (curved, S-shaped, or with heavy tails), the sample departs from that distribution. QQ plots answer: "does my data follow the distribution I assumed?" They are a standard diagnostic for regression residuals and for choosing statistical tests.
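A normal QQ plot can be built by hand to show what is being compared. This sketch uses a synthetic sample and the plotting positions (i - 0.5) / n, which is one common convention among several; `scipy.stats.probplot` offers a ready-made alternative.

```python
from statistics import NormalDist

import numpy as np
import matplotlib as mpl
mpl.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
sample = rng.normal(loc=10.0, scale=2.0, size=200)  # synthetic data

# Theoretical standard-normal quantiles at plotting positions (i - 0.5) / n.
n = sample.size
probs = (np.arange(1, n + 1) - 0.5) / n
theoretical = np.array([NormalDist().inv_cdf(float(p)) for p in probs])
ordered = np.sort(sample)  # empirical quantiles

fig, ax = plt.subplots(figsize=(3.5, 3.5))
ax.scatter(theoretical, ordered, s=6)
# Reference line: where the points would fall for a perfectly normal sample
# with this sample's mean and standard deviation.
ax.plot(theoretical, sample.mean() + sample.std(ddof=1) * theoretical,
        color="0.5", linewidth=0.8)
ax.set_xlabel("Theoretical quantiles")
ax.set_ylabel("Sample quantiles")
```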

Q17. How do you ensure a figure is grayscale-readable?

Answer: Use a colorblind-safe palette (Wong, Okabe-Ito) that also works in grayscale, or use redundant encoding: color + line style + marker. Convert the figure to grayscale before submission and verify that the categories are still distinguishable. If they are not, adjust the colors or add redundant encoding before finalizing.
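The grayscale check can be partly automated by estimating how light each color will print. The sketch below uses the Rec. 601 luma weights as a rough approximation of perceived lightness and flags color pairs whose values are closer than an arbitrary gap of 0.05. The helper names, the threshold, and the four-color palette are illustrative assumptions.

```python
from itertools import combinations

from matplotlib.colors import to_rgb


def luminance(color):
    """Approximate grayscale value (0 = black, 1 = white) via Rec. 601 weights."""
    r, g, b = to_rgb(color)
    return 0.299 * r + 0.587 * g + 0.114 * b


def close_in_grayscale(colors, min_gap=0.05):
    """Return the color pairs whose grayscale values differ by less than min_gap."""
    return [(a, b) for a, b in combinations(colors, 2)
            if abs(luminance(a) - luminance(b)) < min_gap]


# Orange, sky blue, blue, and yellow from a colorblind-safe palette.
palette = ["#E69F00", "#56B4E9", "#0072B2", "#F0E442"]
clashes = close_in_grayscale(palette)
print(clashes)
```

Here orange and sky blue come out nearly identical in grayscale even though they are easy to tell apart in color, which is the kind of pair that needs a distinct line style or marker.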

Q18. Write a statannotations call to add significance brackets for two specific pairs on a seaborn boxplot.

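Answer:
from statannotations.Annotator import Annotator

# ax is assumed to be the Axes returned by sns.boxplot(data=df, x="group", y="value"),
# where df has a categorical "group" column and a numeric "value" column.
pairs = [("A", "B"), ("B", "C")]
annotator = Annotator(ax, pairs, data=df, x="group", y="value")
annotator.configure(test="t-test_ind", text_format="star")
annotator.apply_and_annotate()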

Q19. Name three reproducibility best practices for scientific figures.

Answer: Any three of the following: (1) Publish the data and code alongside the paper (GitHub, Zenodo, OSF). (2) Show the full distribution of data (strip plots, box plots) rather than just mean + error. (3) Report sample sizes in the caption. (4) Report effect sizes with confidence intervals, not just p-values. (5) Use version control for figure code. (6) Provide both raw and processed data when journals request them.

Q20. Explain the journal submission checklist and why it is useful.

Answer: The checklist is a set of 20 items to verify before submitting a figure: width, height, fonts, embedding, panel labels, error bars, color safety, format, DPI, caption, sample sizes, and others. It is useful because journals reject figures for production reasons (wrong font size, missing panel labels) as routinely as for content reasons, and running through the checklist catches these problems before submission. Over time the checklist becomes internalized, but early in a career it prevents avoidable resubmissions.
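A few checklist items lend themselves to automated checks. The function below is a hypothetical sketch covering only three of them (figure width, font embedding type, minimum font size); the name `check_figure` and the default limits are assumptions, and the remaining items still need a manual pass.

```python
import matplotlib as mpl
mpl.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.text import Text


def check_figure(fig, max_width_in=7.2, min_font_pt=7.0):
    """Return a list of problems; an empty list means these checks passed."""
    problems = []
    width_in = fig.get_size_inches()[0]
    if width_in > max_width_in:
        problems.append(f"width {width_in:.2f} in exceeds {max_width_in} in")
    if mpl.rcParams["pdf.fonttype"] != 42:
        problems.append("pdf.fonttype is not 42")
    # Inspect every non-empty text object in the figure.
    for text in fig.findobj(Text):
        if text.get_text() and text.get_fontsize() < min_font_pt:
            problems.append(
                f"font size {text.get_fontsize():.0f} pt in {text.get_text()!r}"
            )
    return problems


# A deliberately non-compliant figure: too wide, with a 5 pt axis label.
fig, ax = plt.subplots(figsize=(8.0, 3.0))
ax.set_xlabel("time (s)", fontsize=5)
problems = check_figure(fig)
```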

Scoring Rubric

| Score | Level | Meaning |
|-------|-------|---------|
| 18–20 | Mastery | You can produce publication-quality figures that meet journal standards. |
| 14–17 | Proficient | You know the main requirements; review font embedding and statistical annotations. |
| 10–13 | Developing | You grasp the basics; re-read Sections 27.2-27.6 and work all Part B exercises. |
| < 10 | Review | Re-read the full chapter. |

After this quiz, move on to Chapter 28 (Big Data Visualization).