Quiz: Essential Chart Types in matplotlib

DataField.Dev

20 questions. Aim for mastery (18+). If you score below 14, revisit the relevant sections before moving to Chapter 12.

Multiple Choice (10 questions)

1. The matplotlib method for creating a line chart is:

(a) `ax.line()` (b) `ax.plot()` (c) `ax.drawline()` (d) `ax.linechart()`

Answer

**(b)** `ax.plot()`. This is the canonical line chart method. It creates a Line2D Artist and adds it to the Axes. The name is a historical legacy from matplotlib's MATLAB-compatible origins (MATLAB's equivalent is `plot`).

2. To create a horizontal bar chart for a ranking of five product lines with long labels, which method should you use?

(a) `ax.bar()` (b) `ax.barh()` (c) `ax.horizontalbar()` (d) `ax.hbar()`

Answer

**(b)** `ax.barh()`. `ax.barh(categories, values)` produces horizontal bars with the categories on the y-axis and the values on the x-axis. This is the preferred orientation when the category labels are long (because horizontal labels are more legible than rotated vertical ones) or when there are many categories to compare.
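A minimal sketch of the horizontal-bar pattern described in this answer. The product names and values are hypothetical, invented only to show how long labels stay readable on the y-axis; sorting in ascending order is a choice of this sketch so that the largest bar ends up at the top.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical categories with long labels and made-up values.
categories = [
    "Enterprise Analytics Suite",
    "Professional Services Retainer",
    "Starter Subscription (Monthly)",
    "Growth Plan Add-ons",
    "Legacy On-Premise Licenses",
]
values = [42, 35, 18, 27, 9]

# barh draws the first item at the bottom, so sort ascending
# to place the largest value at the top of the chart.
pairs = sorted(zip(values, categories))
sorted_values = [v for v, _ in pairs]
sorted_categories = [c for _, c in pairs]

fig, ax = plt.subplots(figsize=(8, 4))
bars = ax.barh(sorted_categories, sorted_values, color="steelblue")
ax.set_xlabel("Value (hypothetical units)")
fig.tight_layout()  # leave room for the long y-axis labels
```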

3. The `alpha` parameter in `ax.scatter(x, y, alpha=0.5)` controls:

(a) The marker shape (b) The marker size (c) The transparency of the markers (d) The color of the markers

Answer

**(c)** The transparency of the markers. `alpha` ranges from 0 (fully transparent, invisible) to 1 (fully opaque, default). A value of 0.5 makes each marker 50% transparent, which is the standard technique for managing overplotting in dense scatter plots — overlap regions appear darker because multiple transparent points accumulate.

4. To encode a third variable as the size of scatter markers, you use the `s` parameter:

(a) `ax.scatter(x, y, s=values)` where `values` is an array (b) `ax.scatter(x, y, size=values)` (c) `ax.scatter(x, y).set_size(values)` (d) `ax.scatter(x, y, markersize=values)`

Answer

**(a)** `ax.scatter(x, y, s=values)` where values is an array. The `s` parameter takes either a single number (all markers the same size) or an array of numbers (each marker sized individually). The units of `s` are points squared, so the visual area of each marker is proportional to the value — which is what you want for an honest bubble chart. Passing `values` as the `s` array produces a bubble chart.
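As a rough illustration, the sketch below passes an array to `s` and combines it with `alpha`. The data is randomly generated, and the scale factor of 20 is an arbitrary assumption chosen only so the markers are visible; because `s` is an area in points squared, multiplying by a constant keeps marker area proportional to the value.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = rng.normal(size=50)
values = rng.uniform(1, 10, size=50)  # hypothetical third variable

fig, ax = plt.subplots()
# One size per point; alpha keeps overlapping bubbles readable.
points = ax.scatter(x, y, s=values * 20, alpha=0.5)
ax.set_xlabel("x")
ax.set_ylabel("y")
```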

5. When choosing a bin count for a histogram, which of the following is a reasonable default for a dataset of 500 points?

(a) 3 bins (b) 500 bins (c) 20-50 bins (d) 1 bin

Answer

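**(c)** 20-50 bins. For most datasets, 20-50 bins is a reasonable starting range: enough bins to show the shape of the distribution, not so many that the histogram becomes noisy. Statistical rules of thumb (Sturges, Scott, Freedman-Diaconis) give specific recommendations that depend on the sample size and, for Scott and Freedman-Diaconis, on the spread of the data. For 500 well-behaved points they tend to suggest somewhat fewer bins (Sturges gives about 10), so treat them as a cross-check rather than a replacement for looking at the result. Too few bins hide structure; too many bins show noise instead of signal.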
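To see what the named rules of thumb recommend for a particular sample, one option is NumPy's `np.histogram_bin_edges`, which accepts the rule names as strings. The sketch below uses 500 randomly generated normal values as a stand-in dataset; real data with skew or outliers will produce different counts.

```python
import numpy as np

rng = np.random.default_rng(42)
data = rng.normal(loc=0.0, scale=1.0, size=500)  # stand-in dataset

bin_counts = {}
for rule in ("sturges", "scott", "fd"):  # "fd" is Freedman-Diaconis
    edges = np.histogram_bin_edges(data, bins=rule)
    bin_counts[rule] = len(edges) - 1  # n edges define n - 1 bins

print(bin_counts)
```

The same strings can be passed directly as `bins=` to `ax.hist()`, which makes it easy to compare a rule-based histogram against a hand-picked bin count.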

6. A box plot's "box" represents:

(a) The full range from minimum to maximum (b) The interquartile range (25th to 75th percentile) (c) One standard deviation from the mean (d) The 5th to 95th percentile

Answer

**(b)** The interquartile range (25th to 75th percentile). The box in a box plot covers the middle 50% of the data, from the first quartile (25th percentile) to the third quartile (75th percentile). The line inside the box is the median (50th percentile). By default, each whisker extends to the most extreme data point that lies within 1.5 × IQR of the box edge, and points beyond the whiskers are drawn individually as outliers.
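The quantities behind a box plot can be computed by hand and compared with `matplotlib.cbook.boxplot_stats`, the helper matplotlib uses to derive box plot statistics. This is a sketch on randomly generated data with two planted extreme values, not a description of any dataset from the chapter.

```python
import numpy as np
from matplotlib import cbook

rng = np.random.default_rng(1)
# 200 ordinary values plus two planted extremes (hypothetical data).
data = np.concatenate([rng.normal(50, 10, size=200), [120.0, 130.0]])

q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1
upper_fence = q3 + 1.5 * iqr
lower_fence = q1 - 1.5 * iqr

# Whiskers stop at the most extreme data points inside the fences.
whisker_high = data[data <= upper_fence].max()
whisker_low = data[data >= lower_fence].min()
outliers = data[(data > upper_fence) | (data < lower_fence)]

stats = cbook.boxplot_stats(data)[0]  # one dict per dataset
print(stats["q1"], stats["med"], stats["q3"], stats["whislo"], stats["whishi"])
```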

7. According to Chapter 4's rules (referenced in this chapter), bar charts must:

(a) Have a y-axis that starts at zero (b) Use only blue bars (c) Never show more than three bars (d) Always be horizontal

Answer

**(a)** Have a y-axis that starts at zero. [Chapter 4](../../part-01-seeing-data/chapter-04-lies-distortions-honest-charts/index.md) established this rule: bar charts encode data through bar length, and a non-zero baseline distorts the length comparison. In matplotlib, `ax.bar()` draws bars from a zero baseline by default, but a later limit change (a manual `ax.set_ylim()` call or a shared axis) can truncate it, so set the limits explicitly with `ax.set_ylim(0, max_value * 1.1)`. This is one of the most important checks for publication-quality bar charts.
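A small sketch of the explicit zero baseline. The categories and values are made up, and deliberately close together, because that is the situation where a truncated axis would exaggerate small differences the most.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

categories = ["A", "B", "C", "D"]  # hypothetical categories
values = [96, 98, 101, 103]        # made-up, nearly equal values

fig, ax = plt.subplots()
ax.bar(categories, values, color="steelblue")

# Pin the baseline at zero and leave 10% headroom above the tallest bar.
max_value = max(values)
ax.set_ylim(0, max_value * 1.1)
ax.set_ylabel("Value (hypothetical units)")
```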

8. To overlay two histograms for comparing distributions, which is the recommended approach?

(a) Use two separate figures (b) Plot both on the same Axes with `alpha=0.5` or similar for transparency (c) Use a bar chart with the counts (d) Subtract one histogram from the other

Answer

**(b)** Plot both on the same Axes with `alpha=0.5` or similar for transparency.

```python
ax.hist(group_a, bins=30, alpha=0.6, label="A", color="blue")
ax.hist(group_b, bins=30, alpha=0.6, label="B", color="orange")
ax.legend()
```

The transparency lets the overlap between the two distributions remain visible, so the reader can see where the distributions differ and where they are similar. Alternative: use `histtype="step"` to show only the outlines, which works well for up to three groups.
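The `histtype="step"` alternative might look like the sketch below. The two groups are randomly generated stand-ins. Computing one set of bin edges over the pooled data is an assumption of this sketch rather than something the chapter prescribes; it ensures both outlines use identical bins and are directly comparable.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(7)
group_a = rng.normal(0.0, 1.0, size=400)  # stand-in data
group_b = rng.normal(1.0, 1.2, size=400)

# Shared bin edges computed once over both groups.
bins = np.histogram_bin_edges(np.concatenate([group_a, group_b]), bins=30)

fig, ax = plt.subplots()
counts_a, _, _ = ax.hist(group_a, bins=bins, histtype="step", label="A", color="tab:blue")
counts_b, _, _ = ax.hist(group_b, bins=bins, histtype="step", label="B", color="tab:orange")
ax.set_xlabel("Value")
ax.set_ylabel("Count")
ax.legend()
```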

9. The `ax.fill_between(x, lower, upper, alpha=0.2)` method is used for:

(a) Coloring the entire chart background (b) Filling a region between two y-value curves, typically for confidence bands (c) Drawing a histogram (d) Creating a pie chart

Answer

**(b)** Filling a region between two y-value curves, typically for confidence bands. `ax.fill_between(x, lower, upper)` fills the vertical region between the `lower` and `upper` arrays at each x-value. Combined with a central line chart, this produces the standard "line with shaded confidence band" pattern used for time-series forecasts, climate reconstructions, and any data with continuous uncertainty. The `alpha=0.2` makes the fill transparent so the central line is still visible.
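A minimal sketch of the "line with shaded band" pattern. The central curve and the band width are invented for illustration and do not come from a fitted model or a real forecast.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
center = np.sin(x)              # hypothetical central estimate
half_width = 0.1 + 0.05 * x     # made-up uncertainty that grows with x
lower = center - half_width
upper = center + half_width

fig, ax = plt.subplots()
# Draw the band first, then the line on top of it.
band = ax.fill_between(x, lower, upper, alpha=0.2, color="tab:blue", label="Uncertainty band")
(line,) = ax.plot(x, center, color="tab:blue", linewidth=2, label="Estimate")
ax.legend()
```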

10. The chapter's threshold concept is that:

(a) Every matplotlib method should be memorized (b) Every parameter in every plot method is a design decision from Parts I and II (c) Pyplot is always preferable to the OO API (d) You should always use the default settings

Answer

**(b)** Every parameter in every plot method is a design decision from Parts I and II. `color` implements [Chapter 3](../../part-01-seeing-data/chapter-03-color/index.md) palette choices. `linewidth` implements [Chapter 6](../../part-02-design-principles/chapter-06-data-ink-ratio/index.md) visual weight. `alpha` manages pre-attentive processing from Chapter 2. `cmap` implements perceptual uniformity from Chapter 3. Every parameter connects to a principle. Learning matplotlib is learning how to translate design principles into method calls, which is why Parts I and II are prerequisites for Part III.

True / False (5 questions)

11. "A single Axes can contain multiple Line2D artists from multiple `ax.plot()` calls."

Answer

**True.** Each call to `ax.plot()` adds another Line2D artist to the Axes's Artist tree. To plot multiple series on the same chart, you simply call `ax.plot()` multiple times with different data, each call adding another line. This is how multi-series line charts are built.
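This can be checked directly by inspecting `ax.lines` after two calls. The two series below are made-up numbers used only to have something to plot.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical data: two series over the same x values.
months = [1, 2, 3, 4, 5, 6]
series_a = [10, 12, 15, 14, 18, 21]
series_b = [8, 9, 11, 13, 12, 16]

fig, ax = plt.subplots()
lines_a = ax.plot(months, series_a, label="Series A")  # returns a list of Line2D
lines_b = ax.plot(months, series_b, label="Series B")
ax.legend()

# Each call appended one Line2D artist to the same Axes.
print(len(ax.lines), type(lines_a[0]).__name__)
```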

12. "A histogram and a bar chart are the same thing."

Answer

**False.** They look similar (both use rectangular bars) but answer different questions. A bar chart compares values across discrete categories; a histogram shows the distribution of a single continuous variable by binning it. The x-axis of a bar chart is categorical; the x-axis of a histogram is continuous. [Chapter 5](../../part-01-seeing-data/chapter-05-choosing-the-right-chart/index.md)'s chart selection framework treats them as distinct chart types for distinct question types.

13. "The default matplotlib color cycle should be accepted for publication-quality charts because it is designed to be colorblind-safe."

Answer

**False.** matplotlib's default color cycle (the "tab10" colors since matplotlib 2.0) is a reasonable general-purpose palette, but it was not designed to be colorblind-safe, and accepting a default is not a deliberate design choice. For publication-quality charts, specify colors explicitly based on the principles from [Chapter 3](../../part-01-seeing-data/chapter-03-color/index.md): sequential palettes for ordered data, diverging palettes for data centered on a meaningful midpoint, qualitative palettes for categories, and always with colorblind safety verified. The default cycle is acceptable for quick exploration but is not a substitute for a thoughtful palette choice.
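One way to replace the default cycle on a single Axes is `ax.set_prop_cycle`. The palette below is a subset of the Okabe-Ito colors, a widely used colorblind-friendly set; choosing it is an assumption of this sketch, not a recommendation taken from the chapter, and any palette should still be checked with a colorblindness simulator.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from cycler import cycler

# Inspect whatever cycle is currently configured.
default_colors = plt.rcParams["axes.prop_cycle"].by_key()["color"]
print(default_colors)

# Assumed palette: four Okabe-Ito colors (blue, orange, green, vermillion).
palette = ["#0072B2", "#E69F00", "#009E73", "#D55E00"]

fig, ax = plt.subplots()
ax.set_prop_cycle(cycler(color=palette))  # applies to this Axes only
lines = [ax.plot([0, 1], [i, i + 1], label=f"Series {i}")[0] for i in range(4)]
ax.legend()
```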

14. "Stacked bar charts are good at showing comparisons of middle and top segments across categories."

Answer

**False.** Stacked bar charts are good at showing totals and at comparing the bottom (first) segment across categories. Middle and top segments do not share a common baseline (each starts where the previous segment ended), so comparing them across categories is visually difficult. For comparing specific segments, use grouped bars or small multiples instead of stacked bars.
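The shifting baseline is easy to see in code. In this sketch (hypothetical regions and values), the second segment is identical in every category, yet each bar starts at a different height because `bottom` is set to the first segment.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

categories = ["North", "South", "East"]  # hypothetical categories
segment_1 = np.array([30, 45, 20])
segment_2 = np.array([25, 25, 25])       # identical values on purpose

fig, ax = plt.subplots()
ax.bar(categories, segment_1, label="Segment 1")
# Stacking: the second segment starts where the first one ended.
top = ax.bar(categories, segment_2, bottom=segment_1, label="Segment 2")
ax.legend()

baselines = [float(bar.get_y()) for bar in top]
heights = [float(bar.get_height()) for bar in top]
print(baselines, heights)
```

Equal heights sitting on unequal baselines are what make the upper segments hard to compare by eye.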

15. "The `ax.scatter()` method supports per-point color and size, while `ax.plot()` applies a single color and size to all points."

Answer

**True.** This is the main reason to use `ax.scatter()` instead of `ax.plot(marker="o", linestyle="None")`. The scatter method's `c` and `s` parameters can take arrays, letting you encode a third variable as color and a fourth variable as size (bubble chart). The plot method does not support this — it applies one color and one size to all points in a single call.

Short Answer (3 questions)

16. In three to four sentences, explain the difference between `ax.plot()` and `ax.scatter()` when producing a simple dot plot, and state when you would prefer each.

Answer

`ax.plot(x, y, marker="o", linestyle="None")` draws a marker at each data point with no connecting line (the string `"None"` turns the line off). `ax.scatter(x, y)` also draws markers, but its `c` and `s` parameters accept arrays, so each point can have its own color and size and encode additional variables. For simple dot plots with uniform color and size, either method works, and `ax.plot()` is slightly more efficient. For bubble charts or color-mapped scatter plots where points need different visual properties, `ax.scatter()` is required because `ax.plot()` cannot vary color and size per point.
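The two approaches side by side, as a sketch on randomly generated data. The names `third` and `fourth` are hypothetical extra variables introduced here to drive color and size.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(5)
x = rng.uniform(0, 10, size=40)
y = rng.uniform(0, 10, size=40)
third = x + y                          # hypothetical variable mapped to color
fourth = rng.uniform(20, 200, size=40) # hypothetical variable mapped to size

fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(10, 4))

# Uniform dots: one Line2D with markers and no connecting line.
(dots,) = ax_left.plot(x, y, marker="o", linestyle="None", color="tab:blue")
ax_left.set_title("ax.plot: uniform markers")

# Per-point color and size: a PathCollection.
points = ax_right.scatter(x, y, c=third, s=fourth, cmap="viridis", alpha=0.7)
fig.colorbar(points, ax=ax_right, label="Third variable")
ax_right.set_title("ax.scatter: color and size encode data")
```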

17. Describe three techniques for managing overplotting in dense scatter plots. For each, state when it is most appropriate.

Answer

**(1) Transparency (alpha).** Set `alpha=0.3` or similar so individual points are pale but overlapping points appear darker. Appropriate for a few hundred to a few thousand points where individual identity still matters and the density pattern is the signal.

**(2) Smaller markers.** Set `s=5` or smaller so points take less visual space. Appropriate when the main structure is the overall distribution rather than individual points.

**(3) Hexagonal binning or 2D histograms.** Replace scatter with `ax.hexbin(x, y, gridsize=30)` or `ax.hist2d(x, y, bins=50)` to show density directly. Appropriate for very large datasets (tens of thousands of points or more) where individual points cannot be distinguished anyway and density is the only meaningful signal.
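A sketch of the third technique on 50,000 randomly generated points, a size at which individual markers would merge into a blob. The correlation between `x` and `y` is invented so that the density has some visible structure.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(3)
n = 50_000
x = rng.normal(size=n)
y = 0.6 * x + rng.normal(scale=0.8, size=n)  # made-up correlated data

fig, ax = plt.subplots()
# Each hexagon is colored by the number of points that fall inside it.
hb = ax.hexbin(x, y, gridsize=30, cmap="viridis")
fig.colorbar(hb, ax=ax, label="Points per hexagon")

counts = hb.get_array()  # per-hexagon counts
print(counts.max())
```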

18. The chapter includes a section on error bars and shaded confidence bands. Explain why these are not optional for charts of real measurement data, and describe when to prefer each form.

Answer

Real measurement data always has noise, sampling error, or estimation uncertainty. A chart that shows only point estimates implies precision the data does not support, which is a form of visualization dishonesty ([Chapter 4](../../part-01-seeing-data/chapter-04-lies-distortions-honest-charts/index.md)). Error bars or confidence bands are the standard way to acknowledge this uncertainty visually. **Error bars** (`ax.errorbar()`) are best for discrete data points where each has its own separate uncertainty — for example, a bar chart of group means with standard errors, or a scatter plot of experimental measurements. **Shaded confidence bands** (`ax.fill_between()`) are best for continuous functions like time series, where the uncertainty is meaningful at every point and the visual continuity of a band is easier to read than a forest of error bars.
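For the discrete case, a sketch of group means with standard-error bars using `ax.errorbar()`. The group names and samples are randomly generated stand-ins, and the standard error of the mean is just one possible choice of uncertainty measure.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(11)
groups = ["Control", "Treatment A", "Treatment B"]  # hypothetical groups
samples = [rng.normal(mu, 2.0, size=30) for mu in (10, 12, 11)]

means = np.array([s.mean() for s in samples])
# Standard error of the mean: sample std / sqrt(n).
sems = np.array([s.std(ddof=1) / np.sqrt(len(s)) for s in samples])

positions = np.arange(len(groups))
fig, ax = plt.subplots()
container = ax.errorbar(positions, means, yerr=sems, fmt="o", capsize=4, color="black")
ax.set_xticks(positions)
ax.set_xticklabels(groups)
ax.set_ylabel("Measured value (hypothetical units)")
```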

Applied Scenarios (2 questions)

19. You are preparing a quarterly business review for Meridian Corp. You have data on five product lines with quarterly revenue for the past three years (12 quarters × 5 products = 60 data points). Your audience is the executive team, and you have 5 seconds of attention per chart. Choose a chart type for each of the following questions and write the matplotlib code skeleton:

(a) "How has total revenue changed over the past 12 quarters?" (b) "Which product line had the highest revenue in Q4 2024?" (c) "How does Q4 2024 revenue compare to Q4 2023 across the five product lines?"

Answer

**(a) Total revenue over 12 quarters:** Line chart showing total revenue (sum across products) over time. Change-over-time question.

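```python
import matplotlib.pyplot as plt

# Assumes `quarters` (12 labels) and `total_revenue` (sum across products) already exist.
fig, ax = plt.subplots(figsize=(12, 4))
ax.plot(quarters, total_revenue, linewidth=2, color="#1f77b4")
ax.set_title("Total Revenue: Q1 2022 - Q4 2024")
ax.set_ylabel("Revenue (USD millions)")
```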

**(b) Highest revenue in Q4 2024:** Horizontal bar chart sorted by value. Comparison question with five categories.

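```python
# Assumes `df` is a DataFrame with "quarter", "product", and "revenue" columns.
# Ascending sort puts the largest bar at the top of a horizontal bar chart.
q4_2024 = df[df["quarter"] == "Q4 2024"].sort_values("revenue")
fig, ax = plt.subplots(figsize=(8, 5))
ax.barh(q4_2024["product"], q4_2024["revenue"], color="steelblue")
ax.set_title("Revenue by Product Line, Q4 2024")
ax.set_xlabel("Revenue (USD millions)")
```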

**(c) Q4 2024 vs Q4 2023 comparison:** Grouped bar chart with two bars per product line. Comparison with two time points.

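```python
import matplotlib.pyplot as plt
import numpy as np

products = ["Enterprise", "Professional", "Starter", "Growth", "Legacy"]
rev_2023 = [...]  # placeholder: Q4 2023 revenue per product, in the order of `products`
rev_2024 = [...]  # placeholder: Q4 2024 revenue per product, same order
x = np.arange(len(products))  # one group position per product
width = 0.35                  # two bars of this width fit side by side in each group

fig, ax = plt.subplots(figsize=(10, 5))
ax.bar(x - width/2, rev_2023, width, label="Q4 2023", color="#1f77b4")
ax.bar(x + width/2, rev_2024, width, label="Q4 2024", color="#ff7f0e")
ax.set_xticks(x)
ax.set_xticklabels(products)
ax.set_ylabel("Revenue (USD millions)")
ax.legend()
```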

For the two bar charts, you would also set an explicit zero baseline to enforce [Chapter 4](../../part-01-seeing-data/chapter-04-lies-distortions-honest-charts/index.md)'s rule: `ax.set_xlim(0, ...)` for the horizontal bars in (b) and `ax.set_ylim(0, ...)` for the grouped bars in (c). For all three charts, you would replace the descriptive titles with action titles (covered in [Chapter 12](../chapter-12-customization-mastery/index.md)) and apply the decluttering discipline from Chapter 6.

20. A colleague sends you the following code to produce a climate chart. Diagnose three problems with it and propose fixes. (Assume the imports are correct.)

```python
fig, ax = plt.subplots()
ax.bar(climate["year"], climate["anomaly"])
ax.set_title("Climate Data")
plt.title("Temperature Trends Over Time")
fig.savefig("climate.png")
```

Answer

**Problem 1: Wrong chart type.** A bar chart is the wrong choice for a 140-year time series: with 140+ bars, the result is a dense wall of bars that is hard to read, and time-series questions are better answered by line charts. Also, bar charts must start their y-axis at zero ([Chapter 4](../../part-01-seeing-data/chapter-04-lies-distortions-honest-charts/index.md)), but anomaly data includes negative values, which does not fit that convention. **Fix:** use `ax.plot(climate["year"], climate["anomaly"])` for a line chart.

**Problem 2: Double title using both OO and pyplot.** The code calls `ax.set_title("Climate Data")` and then `plt.title("Temperature Trends Over Time")`. The pyplot call operates on the "current Axes," which is `ax`, so it overwrites the first title and the `set_title` call is wasted code. This is exactly the kind of bug the OO API is meant to prevent. **Fix:** use only `ax.set_title("...")` and remove the `plt.title()` call entirely. Do not mix pyplot and OO calls in the same code.

**Problem 3: Missing essential details.** No `figsize` (the default size is not well suited to a long time series), no `dpi` in `savefig`, no `bbox_inches`, no axis labels, and no y-axis units. The saved file will be low-resolution and missing context. **Fix:**

```python
fig, ax = plt.subplots(figsize=(12, 4))
ax.plot(climate["year"], climate["anomaly"], color="#d62728", linewidth=1.5)
ax.axhline(0, color="gray", linewidth=0.8, linestyle="--")
ax.set_title("Global Temperature Anomaly")  # should be an action title in real code
ax.set_xlabel("Year")
ax.set_ylabel("Temperature Anomaly (°C)")
fig.savefig("climate.png", dpi=300, bbox_inches="tight")
```
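The title overwrite described in Problem 2 can be confirmed with a short check. This sketch reproduces only the two title calls from the colleague's code on an empty Axes; no climate data is needed.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.set_title("Climate Data")
plt.title("Temperature Trends Over Time")  # acts on the current Axes, which is `ax`

# Only the second title survives.
final_title = ax.get_title()
print(final_title)
```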

Review your results against the mastery thresholds at the top. If you scored below 14, revisit Sections 11.1 through 11.5 — each covers one of the five essential chart types and the key parameters you need to know. Chapter 12 assumes you are comfortable producing all five chart types and moves on to customizing their appearance with colors, typography, annotations, and style sheets.