Exercises: matplotlib Architecture


These exercises include hands-on code for the first time in this book. You will need a Python environment with matplotlib and pandas installed. Jupyter is recommended but not required. Every exercise in Part B should be run and verified; do not just read them.

Part A: Conceptual (7 problems)

A.1 ★☆☆ | Recall

Name the three layers of matplotlib's architecture and describe what each layer does in one sentence.

Guidance

The three layers are Backend (renders the chart to pixels or vectors — Agg, PDF, SVG, Qt, etc.), Artist (the tree of Python objects that represent every visible element of the chart), and Scripting / pyplot (a convenience wrapper that provides one-liner chart creation). The Backend is about output, the Artist layer is where your charts live, and the Scripting layer is a thin layer of convenience functions.
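The three layers can be observed from a short script. This is only an illustrative sketch: it forces the `Agg` backend (an assumption, so that it runs without a display), and the variable names are not from the chapter.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: file-only backend so no window is needed

import matplotlib.pyplot as plt          # Scripting layer (pyplot)
from matplotlib.artist import Artist     # base class of the Artist layer

fig, ax = plt.subplots()
(line,) = ax.plot([1, 2, 3], [4, 5, 6])

# Backend layer: the renderer that will turn Artists into pixels or vectors
backend_name = matplotlib.get_backend()

# Artist layer: the Figure, the Axes, and the plotted line are all Artists
artist_flags = [isinstance(obj, Artist) for obj in (fig, ax, line)]

print(backend_name)
print(artist_flags)
plt.close(fig)
```

In an interactive session you would normally omit the `matplotlib.use("Agg")` line and let matplotlib choose a backend for you.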

A.2 ★☆☆ | Recall

Distinguish between Figure, Axes, and Axis (singular) in your own words. How many of each does a simple single-chart figure have?

Guidance

A **Figure** is the whole image — the entire PNG or PDF page. An **Axes** is a single plotting area within a Figure — what most people call "a chart." An **Axis** (singular, no 's') is one of the two numerical axes within an Axes — the x-axis or the y-axis. A simple single-chart figure has 1 Figure, 1 Axes, and 2 Axis objects (the x-axis and the y-axis of that Axes).
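The counts can be checked directly on a fresh figure. A minimal sketch, assuming the `Agg` backend so it runs without a display; `n_axes` and `axis_objects` are illustrative names.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: file-only backend
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

n_axes = len(fig.axes)                # Axes objects contained in this Figure
axis_objects = [ax.xaxis, ax.yaxis]   # the two Axis objects of this Axes

print(n_axes)
print([type(a).__name__ for a in axis_objects])
plt.close(fig)
```

The x-axis and y-axis live on the Axes as `ax.xaxis` and `ax.yaxis`, which is why the plural-looking word "Axes" names a single plotting area.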

A.3 ★☆☆ | Understand

The chapter recommends the object-oriented API over pyplot. Why? Describe a specific scenario where pyplot's state-machine behavior would cause a bug that the OO API would avoid.

Guidance

Pyplot tracks a "current" Figure and a "current" Axes, and every pyplot call operates on whichever is currently active. A specific failure mode: you make a figure, make a second figure, then call `plt.title("My Chart")` intending it for the first figure — but pyplot sets the title on the second figure because that is now the current one. The OO API avoids this because `ax1.set_title("My Chart")` is explicit about which Axes gets the title.
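The failure mode described above can be reproduced in a few lines. This is a sketch under the assumption of the `Agg` backend; the variable names are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: file-only backend
import matplotlib.pyplot as plt

fig1, ax1 = plt.subplots()
fig2, ax2 = plt.subplots()    # creating fig2 makes it the "current" figure

plt.title("My Chart")         # intended for fig1, but applies to the current Axes
title_on_ax1 = ax1.get_title()
title_on_ax2 = ax2.get_title()
print(repr(title_on_ax1), repr(title_on_ax2))

ax1.set_title("My Chart")     # OO call: the target Axes is named explicitly
title_after_fix = ax1.get_title()

plt.close(fig1)
plt.close(fig2)
```

After the `plt.title` call, the first Axes still has an empty title and the second one carries the text; the explicit `ax1.set_title` call is what finally puts the title where it was intended.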

A.4 ★★☆ | Understand

Explain the canonical fig, ax = plt.subplots() pattern. What does the call return? What happens when you call ax.plot(...) on the returned ax?

Guidance

`plt.subplots()` with no arguments creates a Figure containing one Axes, and returns both objects as a tuple. We unpack them into variables named `fig` and `ax`. The Figure is the top-level container; the Axes is the plotting area within it. When you call `ax.plot(x, y)`, matplotlib creates a new Line2D Artist with the specified data and adds it to the Axes's Artist tree. Nothing is rendered yet — the rendering happens when you call `fig.savefig()` or display the figure.
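A short sketch makes the return values concrete. It assumes the `Agg` backend so it runs without a display; `result` and `lines` are illustrative names.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: file-only backend
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D

result = plt.subplots()        # a 2-tuple: (Figure, Axes)
fig, ax = result

lines = ax.plot([1, 2, 3], [4, 5, 6])   # returns a list of the Line2D objects it created
new_line = lines[0]

# The same object is now stored in the Axes's Artist tree
is_in_tree = ax.get_lines()[0] is new_line

print(type(result).__name__, len(result))
print(isinstance(new_line, Line2D), is_in_tree)
plt.close(fig)
```

Note that `ax.plot` returns a list even for a single line, which is why code often unpacks it as `(line,) = ax.plot(...)`.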

A.5 ★★☆ | Analyze

Explain the difference between the dpi parameter in plt.subplots(dpi=...) and the dpi parameter in fig.savefig(..., dpi=...). Why would you use different values for each?

Guidance

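`plt.subplots(dpi=...)` passes the value through to the Figure and sets the Figure's own DPI; this governs on-screen rendering and, with matplotlib's default settings, is also what `savefig` uses when you give it no `dpi`. `fig.savefig(..., dpi=...)` overrides the resolution for that one output file without changing the Figure. You would use different values because you want fast interactive display (a lower DPI, around 100) and high-quality output (a higher DPI, typically 300 for print). A common pattern is to leave the Figure DPI at the default and specify `dpi=300` only at save time.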
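The effect of the two parameters can be measured by saving the same figure twice and reading back the pixel dimensions. A minimal sketch, assuming the `Agg` backend and matplotlib's default `savefig.dpi` setting of `"figure"`; `saved_pixel_size` is a hypothetical helper, not a matplotlib function.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: file-only backend
import matplotlib.pyplot as plt


def saved_pixel_size(fig, **savefig_kwargs):
    """Save fig as PNG into memory and return (width, height) in pixels."""
    buf = io.BytesIO()
    fig.savefig(buf, format="png", **savefig_kwargs)
    buf.seek(0)
    height, width = plt.imread(buf, format="png").shape[:2]
    return width, height


# 4 x 3 inches at 100 dots per inch
fig, ax = plt.subplots(figsize=(4, 3), dpi=100)
ax.plot([1, 2, 3], [4, 5, 6])

default_size = saved_pixel_size(fig)           # no dpi given: falls back to fig.dpi
print_size = saved_pixel_size(fig, dpi=300)    # save-time override

print(default_size, print_size)
plt.close(fig)
```

The pixel size is simply inches times DPI, so the same 4 x 3 inch figure yields a small image at 100 DPI and an image three times larger in each direction at 300 DPI.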

A.6 ★★☆ | Understand

The chapter says that when you call ax.plot([1,2,3], [4,5,6]), "a new Line2D Artist is added to the Axes's Artist tree." Explain what this means in the context of the chapter's threshold concept (that everything is an object). Why does this framing matter?

Guidance

The method call is not "drawing a line." It is creating a Python object (a Line2D) that represents the line, and adding it to a tree of other objects (the Axes). No pixels have been drawn yet. The drawing happens later, when matplotlib walks the tree during rendering. The framing matters because it explains why you can modify the line after creating it: you can find the Line2D in the tree and change its properties (color, linestyle, etc.) before rendering. If plot() were actually drawing, you could not modify the result.
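Because the line is an ordinary Python object, it can be restyled after `plot()` returns. A small sketch, assuming the `Agg` backend; the styling values are arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: file-only backend
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
(line,) = ax.plot([1, 2, 3], [4, 5, 6])   # creates a Line2D; nothing is drawn yet

# Find the same object again through the Axes, then change its properties
found = ax.get_lines()[0]
same_object = found is line

found.set_color("crimson")
found.set_linestyle("--")
found.set_linewidth(3)

print(same_object, line.get_color(), line.get_linestyle(), line.get_linewidth())
plt.close(fig)
```

The changes only become visible when the figure is rendered, for example by a later `fig.savefig(...)` call.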

A.7 ★★★ | Evaluate

The chapter warns that plt.show() is "actively harmful" in scripts but "fine" in interactive sessions. Explain the distinction. Under what circumstances would you use plt.show() in production code?

Guidance

In scripts, `plt.show()` opens a window and blocks the script until the user closes the window — which defeats the purpose of a script that is supposed to run without interaction. In interactive sessions (a Python REPL or a Jupyter notebook with a non-inline backend), `plt.show()` is used to display a chart on screen. Production code almost never uses `plt.show()`; it uses `fig.savefig()` to write the chart to a file. The exception is interactive debugging tools where the user is meant to inspect the chart visually.
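One way a non-interactive script might look is sketched below. The function name `render_chart_to_file` and the temporary output location are assumptions for illustration; the point is that the script saves and closes the figure and never calls `plt.show()`.

```python
import tempfile
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # file-only backend: no window, nothing to block on
import matplotlib.pyplot as plt


def render_chart_to_file(path):
    """Hypothetical batch-job step: draw, save, and release the figure."""
    fig, ax = plt.subplots()
    ax.plot([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
    ax.set_title("Linear Relationship")
    fig.savefig(path, dpi=150, bbox_inches="tight")
    plt.close(fig)  # free the figure instead of showing it
    return Path(path)


with tempfile.TemporaryDirectory() as tmp:
    out = render_chart_to_file(Path(tmp) / "linear.png")
    file_was_written = out.exists() and out.stat().st_size > 0

print(file_was_written)
```

Closing the figure matters in long-running jobs, because pyplot keeps a reference to every figure it creates until it is closed.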

Part B: Applied — Your First matplotlib Code (10 problems)

For these exercises, you need a working Python environment with matplotlib and pandas. Run each exercise; do not just read them.

B.1 ★☆☆ | Apply

Write the minimal working fig/ax pattern that plots the lists [1, 2, 3, 4, 5] and [2, 4, 6, 8, 10] as a line chart with the title "Linear Relationship." Save the result to a PNG file.

Guidance

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
ax.set_title("Linear Relationship")
fig.savefig("linear.png")

Verify that the PNG file was created and opens correctly. This is the minimal template you will use for every matplotlib exercise.

B.2 ★☆☆ | Apply

Modify your solution to B.1 to also set the x-axis label to "x" and the y-axis label to "y (= 2x)". Save to a PNG with dpi=300 and bbox_inches="tight".

Guidance

fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
ax.set_title("Linear Relationship")
ax.set_xlabel("x")
ax.set_ylabel("y (= 2x)")
fig.savefig("linear.png", dpi=300, bbox_inches="tight")

Neither `dpi=300` nor `bbox_inches="tight"` is matplotlib's default; they are arguments worth passing habitually when you need publication-quality output.

B.3 ★★☆ | Apply

Create a figure with figsize=(12, 4) (wide aspect ratio for time series). Plot the years 2000 through 2020 on the x-axis and any monotonically increasing values (e.g., a list like [10, 12, 14, 17, 21, 26, 32, 39, 47, 56, 66, 77, 89, 102, 116, 131, 147, 164, 182, 201, 221]) on the y-axis. Title the chart "Cumulative Growth, 2000-2020."

Guidance

import matplotlib.pyplot as plt

years = list(range(2000, 2021))
values = [10, 12, 14, 17, 21, 26, 32, 39, 47, 56, 66, 77, 89, 102, 116, 131, 147, 164, 182, 201, 221]

fig, ax = plt.subplots(figsize=(12, 4))
ax.plot(years, values)
ax.set_title("Cumulative Growth, 2000-2020")
ax.set_xlabel("Year")
ax.set_ylabel("Value")
fig.savefig("growth.png", dpi=300, bbox_inches="tight")

Notice the figsize — wide aspect ratio for a time-series chart, as [Chapter 8](../../part-02-design-principles/chapter-08-layout-composition-small-multiples/index.md) recommended.

B.4 ★★☆ | Apply

Create a figure with plt.subplots(1, 2) (one row, two columns). On the left panel, plot a line chart. On the right panel, plot a scatter of the same data. Give each panel its own title. Use the OO API — do not use pyplot after the initial subplots() call.

Guidance

import matplotlib.pyplot as plt

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

ax1.plot(x, y)
ax1.set_title("As a Line")

ax2.scatter(x, y)
ax2.set_title("As Points")

fig.savefig("line_vs_scatter.png", dpi=300, bbox_inches="tight")

Notice that we unpack the tuple `(ax1, ax2)` to get explicit references to the two Axes. Every method call is on `ax1` or `ax2`, never on `plt`.

B.5 ★★☆ | Apply

Create a figure with a 2×2 grid of subplots. Plot four different simple functions (e.g., y=x, y=x^2, y=x^3, y=x^4) over the range x=0 to x=5. Give each panel a descriptive title. Use fig, axes = plt.subplots(2, 2) and access individual panels via axes[0, 0], axes[0, 1], axes[1, 0], axes[1, 1].

Guidance

import matplotlib.pyplot as plt
import numpy as np

fig, axes = plt.subplots(2, 2, figsize=(10, 8))

x = np.linspace(0, 5, 100)

axes[0, 0].plot(x, x)
axes[0, 0].set_title("y = x")

axes[0, 1].plot(x, x**2)
axes[0, 1].set_title("y = x^2")

axes[1, 0].plot(x, x**3)
axes[1, 0].set_title("y = x^3")

axes[1, 1].plot(x, x**4)
axes[1, 1].set_title("y = x^4")

fig.savefig("powers.png", dpi=300, bbox_inches="tight")

With 2x2, `axes` is a 2D numpy array. Index it as `axes[row, col]`. For 1D grids (`plt.subplots(1, 3)`), you can index as `axes[0]`, `axes[1]`, `axes[2]`.
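The array shapes can be inspected directly, and `axes.flat` offers a way to loop over every panel without writing each index by hand. A sketch assuming the `Agg` backend; the loop is an alternative to the explicit indexing above, not a replacement the chapter prescribes.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: file-only backend
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 5, 100)

fig, axes = plt.subplots(2, 2, figsize=(10, 8))
grid_shape = axes.shape                  # 2D array, indexed as axes[row, col]

# axes.flat visits the panels row by row: [0,0], [0,1], [1,0], [1,1]
for power, ax in enumerate(axes.flat, start=1):
    ax.plot(x, x**power)
    ax.set_title(f"y = x^{power}")

fig_row, row_axes = plt.subplots(1, 3)
row_shape = row_axes.shape               # 1D array, indexed as row_axes[i]

print(grid_shape, row_shape)
plt.close(fig)
plt.close(fig_row)
```

With a single panel (`plt.subplots()` or `plt.subplots(1, 1)`), the second return value is a lone Axes object rather than an array, so it is not indexed at all.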

B.6 ★★☆ | Apply

Modify B.5 to use sharex=True, sharey=True in plt.subplots(). What changes visually? Why does matplotlib need you to opt in to shared axes rather than making it the default?

Guidance

fig, axes = plt.subplots(2, 2, figsize=(10, 8), sharex=True, sharey=True)

With shared axes, all four panels use the same x-limits and the same y-limits. This enables comparison across panels (since values at the same position are directly comparable) but can hide the variation in panels with smaller ranges. matplotlib makes it opt-in because sometimes you want independent scales (e.g., for small multiples with free axes, as discussed in [Chapter 8](../../part-02-design-principles/chapter-08-layout-composition-small-multiples/index.md)).
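The difference shows up in the y-limits of each panel. A sketch assuming the `Agg` backend, reusing the four power functions from B.5; the variable names are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: file-only backend
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 5, 100)

fig_free, axes_free = plt.subplots(2, 2)
fig_shared, axes_shared = plt.subplots(2, 2, sharex=True, sharey=True)

# Draw y = x, x^2, x^3, x^4 into both grids
for axes in (axes_free, axes_shared):
    for power, ax in enumerate(axes.flat, start=1):
        ax.plot(x, x**power)

free_ylims = [ax.get_ylim() for ax in axes_free.flat]
shared_ylims = [ax.get_ylim() for ax in axes_shared.flat]

print(len(set(free_ylims)), "distinct y-ranges without sharing")
print(len(set(shared_ylims)), "distinct y-range with sharing")
plt.close(fig_free)
plt.close(fig_shared)
```

With sharing, every panel is stretched to fit the largest curve (y = x^4), so the y = x panel looks almost flat; that is the "hidden variation" trade-off described above.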

B.7 ★★☆ | Apply

Create a line chart and save it in three different formats: PNG, SVG, and PDF. What are the sizes of the three files? Which format would you use for (a) a web page, (b) a printed journal article, (c) editing in Illustrator?

Guidance

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
ax.set_title("Multi-Format Save")

fig.savefig("chart.png", dpi=300, bbox_inches="tight")
fig.savefig("chart.svg", bbox_inches="tight")
fig.savefig("chart.pdf", bbox_inches="tight")

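Exact sizes depend on the chart and on your matplotlib version, so measure them yourself. For a simple line chart like this one, the two vector files (SVG and PDF) are typically much smaller than the 300-dpi PNG, which has to store every pixel. Use PNG for a web page, PDF for a printed journal article, and SVG for editing in a vector tool such as Illustrator.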
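To answer the file-size question with numbers from your own machine, something like the following could be used. It is a sketch: it writes into a temporary directory (an assumption, to avoid cluttering the working folder) and forces the `Agg` backend.

```python
import tempfile
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # assumption: file-only backend
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
ax.set_title("Multi-Format Save")

sizes = {}
with tempfile.TemporaryDirectory() as tmp:
    for ext in ("png", "svg", "pdf"):
        path = Path(tmp) / f"chart.{ext}"
        # dpi only matters for the raster format
        extra = {"dpi": 300} if ext == "png" else {}
        fig.savefig(path, bbox_inches="tight", **extra)
        sizes[ext] = path.stat().st_size   # size in bytes

for ext, n_bytes in sizes.items():
    print(f"{ext}: {n_bytes / 1024:.1f} KiB")
plt.close(fig)
```

matplotlib picks the output format from the file extension, which is why the same `savefig` call can produce all three files.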

B.8 ★★☆ | Apply

Load the climate data (or fabricate a simple version if you do not have it) and produce the "ugly climate plot" from Section 10.9. The chart should have a default-style title, missing units on the y-axis, default spines, and no annotations. Save it as climate_ugly.png. You will use this file as the "before picture" for exercises in Chapters 11 and 12.

Guidance

If you have real climate data (e.g., from NOAA or NASA GISS), load it with pandas:

import pandas as pd
climate = pd.read_csv("climate_data.csv")

If you do not have the real data, fabricate something similar:

import numpy as np
import pandas as pd

years = list(range(1880, 2025))
# Fabricated trend that looks roughly like real temperature anomalies
anomalies = [-0.2 + 0.008 * (y - 1880) + 0.1 * np.sin(0.3 * (y - 1880)) for y in years]
climate = pd.DataFrame({"year": years, "anomaly": anomalies})

Then produce the chart:

import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(climate["year"], climate["anomaly"])
ax.set_title("Temperature Anomaly")
ax.set_xlabel("Year")
ax.set_ylabel("Anomaly")
fig.savefig("climate_ugly.png", dpi=150, bbox_inches="tight")

Verify that the output has all the problems listed in Section 10.9: descriptive title, missing units, default spines, no annotations. This is the starting point for everything in Chapters 11 and 12.

B.9 ★★★ | Apply

Produce the same climate chart as B.8, but this time explicitly set the figure size to match the Chapter 8 aspect-ratio recommendation for a 150-year time series. Justify your choice.

Guidance

For a 150-year time series, [Chapter 8](../../part-02-design-principles/chapter-08-layout-composition-small-multiples/index.md) recommended a wide aspect ratio, perhaps 3:1 or 4:1. A reasonable choice:

fig, ax = plt.subplots(figsize=(15, 4))

or

fig, ax = plt.subplots(figsize=(12, 4))

The wide aspect ratio gives the time dimension enough horizontal extent and matches Cleveland's "banking to 45 degrees" heuristic for slope perception. A square chart (figsize=(8, 8)) would cram the temporal variation into too narrow a horizontal range.

B.10 ★★★ | Create

Write a small function make_time_series_chart(years, values, title, ylabel) that takes the data and labels as arguments, creates the figure, plots the line, sets the title and labels, and returns the (fig, ax) tuple without saving. The caller is responsible for saving. Demonstrate the function by calling it twice with different datasets and saving each result to a separate file.

Guidance

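import matplotlib.pyplot as plt

def make_time_series_chart(years, values, title, ylabel):
    """Build a wide line chart and return (fig, ax); the caller saves it."""
    fig, ax = plt.subplots(figsize=(12, 4))
    ax.plot(years, values)
    ax.set_title(title)
    ax.set_xlabel("Year")
    ax.set_ylabel(ylabel)
    return fig, ax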

# Use it:
years = list(range(2000, 2021))
revenue = [100 + i * 15 for i in range(21)]
temperature = [14.5 + i * 0.01 for i in range(21)]

fig1, ax1 = make_time_series_chart(years, revenue, "Revenue Growth", "USD millions")
fig1.savefig("revenue.png", dpi=300, bbox_inches="tight")

fig2, ax2 = make_time_series_chart(years, temperature, "Temperature Rising", "°C")
fig2.savefig("temperature.png", dpi=300, bbox_inches="tight")

This is the beginning of building reusable matplotlib utilities. [Chapter 12](../chapter-12-customization-mastery/index.md) will extend this pattern with styling functions.

Part C: Synthesis and Design Judgment (5 problems)

C.1 ★★★ | Analyze

Take the code from B.8 (the ugly climate plot) and identify every line where a design decision from Parts I and II is being made (or implicitly deferred to a default). For each identified line, state what the decision is and which chapter discussed it.

Guidance

- `plt.subplots(figsize=(10, 6))`: aspect ratio decision ([Chapter 8](../../part-02-design-principles/chapter-08-layout-composition-small-multiples/index.md)). This roughly 5:3 shape is not optimized for a long time series.
- `ax.plot(...)`: chart type selection ([Chapter 5](../../part-01-seeing-data/chapter-05-choosing-the-right-chart/index.md)). Line chart is appropriate for change over time.
- `ax.set_title("Temperature Anomaly")`: title choice ([Chapter 7](../../part-02-design-principles/chapter-07-typography-annotation/index.md)). Descriptive title, not an action title.
- `ax.set_xlabel("Year")`: axis labeling ([Chapter 7](../../part-02-design-principles/chapter-07-typography-annotation/index.md)). Minimal label.
- `ax.set_ylabel("Anomaly")`: axis labeling ([Chapter 7](../../part-02-design-principles/chapter-07-typography-annotation/index.md)). Missing units; it should say "Temperature Anomaly (°C)".
- `fig.savefig(..., dpi=150, ...)`: output resolution. Lower than print quality.
- Implicit: default color (not chosen, [Chapter 3](../../part-01-seeing-data/chapter-03-color/index.md)), default spines (should be removed, [Chapter 6](../../part-02-design-principles/chapter-06-data-ink-ratio/index.md)), no annotations ([Chapter 7](../../part-02-design-principles/chapter-07-typography-annotation/index.md)), no source attribution ([Chapter 7](../../part-02-design-principles/chapter-07-typography-annotation/index.md)).

Every deferred default is a future decision that you can override in Chapters 11-12.

C.2 ★★★ | Create

Write a "matplotlib starter template" file (.py or .ipynb cell) that you will use as the starting point for every new chart. The template should include: the standard imports, a fig, ax = plt.subplots() call with a specific figsize, a stub title, axis labels, and a savefig call. Include comments indicating where to customize for different chart types.

Guidance

"""matplotlib starter template."""

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Load data (customize this)
# df = pd.read_csv(...)

# Create figure (customize figsize for chart type: wide for time series, square for scatter, etc.)
fig, ax = plt.subplots(figsize=(10, 6))

# Plot data (customize: ax.plot, ax.bar, ax.scatter, ax.hist, ax.boxplot)
# ax.plot(df["x"], df["y"])

# Title and labels (customize: use an action title that states the finding)
ax.set_title("TITLE GOES HERE")
ax.set_xlabel("X LABEL (with units in parens)")
ax.set_ylabel("Y LABEL (with units in parens)")

# Coming in Chapter 6 declutter — remove top/right spines
# Coming in Chapter 7 typography — action title, annotations
# Coming in Chapter 12 — colors, fonts, styling

# Save
fig.savefig("output.png", dpi=300, bbox_inches="tight")

Save this as `matplotlib_starter.py` or paste it into the top cell of new notebooks. Over time, you will customize it as you develop your own style conventions.

C.3 ★★★ | Evaluate

The chapter argues that the object-oriented API is preferable to pyplot. However, nearly all Stack Overflow answers and many online tutorials use pyplot. Given this reality, how should a beginner approach learning matplotlib? Should they learn pyplot first because it is more common, or OO first because it is better?

Guidance

This book argues for OO first because it scales better. The cost is that Stack Overflow answers look slightly different from OO code and need mental translation. The benefit is that once you understand the fig/ax pattern, everything else clicks into place. Pyplot's state machine causes bugs that take time to understand if you learn it first. A reasonable approach: learn OO as the canonical pattern, but be able to read pyplot and translate it in your head when you encounter it on Stack Overflow.
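The "mental translation" is mostly mechanical: many `plt.something(...)` calls have an `ax.set_something(...)` or `ax.something(...)` counterpart. The sketch below writes the same chart both ways; the data and labels are made up for illustration, and the `Agg` backend is assumed.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: file-only backend
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

# pyplot style, as commonly seen in online answers
plt.figure(figsize=(6, 4))
plt.plot(x, y)
plt.title("Demo")
plt.xlabel("x")
plt.ylabel("y")
plt.xlim(0, 6)
pyplot_ax = plt.gca()          # grab the implicit "current" Axes for comparison

# the same chart in OO style
fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(x, y)                  # plt.plot   -> ax.plot
ax.set_title("Demo")           # plt.title  -> ax.set_title
ax.set_xlabel("x")             # plt.xlabel -> ax.set_xlabel
ax.set_ylabel("y")             # plt.ylabel -> ax.set_ylabel
ax.set_xlim(0, 6)              # plt.xlim   -> ax.set_xlim

print(pyplot_ax.get_title() == ax.get_title())
plt.close("all")
```

Plotting methods such as `plot`, `scatter`, and `bar` keep their names, while the labelling and limit functions shown here gain a `set_` prefix.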

C.4 ★★★ | Evaluate

Most matplotlib tutorials start with import matplotlib.pyplot as plt and then use only pyplot functions. This book uses the same import but then uses fig, ax = plt.subplots() and avoids other pyplot calls. Is the pyplot import still necessary in OO code? Explain.

Guidance

Yes, in the conventional pattern the pyplot import is still needed because `plt.subplots()` is a pyplot function; it is the standard way to create a Figure and Axes. After that line, all subsequent method calls are on the explicit `fig` and `ax` objects, but you still use pyplot to create the initial objects. It is possible to skip pyplot entirely by importing `Figure` from `matplotlib.figure` and creating figures directly, but this is rare in everyday practice. The conventional pattern is "pyplot for creation, OO for everything else."
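For completeness, the pyplot-free route looks roughly like this. It is a sketch of the less common pattern, not the style this book uses; the PNG is written to an in-memory buffer (an assumption, to keep the example self-contained).

```python
import io

from matplotlib.figure import Figure   # note: no pyplot import anywhere

fig = Figure(figsize=(6, 4))
ax = fig.subplots()                    # Figure has its own subplots() method
ax.plot([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
ax.set_title("Created without pyplot")

buf = io.BytesIO()
fig.savefig(buf, format="png")         # saving works without any GUI backend
png_bytes = buf.getvalue()

print(len(png_bytes) > 0)
```

A figure made this way is not registered with pyplot, so there is no "current figure" state to manage and nothing to close; the trade-off is that `plt.show()` and other pyplot helpers do not know about it.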

C.5 ★★★ | Create

Read the matplotlib gallery at matplotlib.org/stable/gallery/ and find an example that uses a chart type you have not yet used. Copy the code into a notebook, run it, and modify at least three things (color, title, data, figsize, or similar). Save the result.

Guidance

The point is to practice navigating the gallery and adapting examples. The specific chart type does not matter. What matters is that you: (1) find the gallery, (2) pick an example, (3) understand enough of the code to modify it, (4) verify your modifications produce the expected changes. This is how experienced matplotlib users work — they rarely write charts from scratch; they adapt gallery examples to their needs.

These exercises are hands-on for the first time in this book. Every exercise in Part B should be run in a Python environment, not just read. Matplotlib is a library you learn by typing code, making mistakes, and fixing them. If you only read the exercises, you will not retain the syntax. If you type and run them, the fig/ax pattern will become automatic within a few dozen examples.