

Learning Objectives

  • Explain the matplotlib architecture: the three-layer model (Backend, Artist, Scripting) and why the OO API is preferred over pyplot for anything beyond quick exploration
  • Distinguish between Figure, Axes, and Axis objects and state the role of each
  • Create a figure using the canonical OO pattern: `fig, ax = plt.subplots()` and configure properties using method calls on `ax`
  • Explain why `plt.plot()` (pyplot / state-machine API) is convenient for one-liners but dangerous for production code
  • Configure figure size, DPI, and save figures to multiple formats (PNG, SVG, PDF) with appropriate resolution settings
  • Navigate the matplotlib documentation: find methods on Axes, look up parameters, use the gallery
  • Explain the rendering pipeline: how matplotlib converts Python objects into pixels or vector output
  • Identify the most common beginner traps and know how to fix each one

Chapter 10: matplotlib Architecture: Figures, Axes, and the Object-Oriented API

"Matplotlib is the plotting library everyone loves to hate and hates to love." — John D. Hunter, matplotlib's creator, approximately


Welcome to Part III.

For the first nine chapters of this book, we have been building the conceptual foundations of data visualization without writing a single line of code. You now know why visualization matters (Chapter 1), how the eye processes charts (Chapter 2), what color is for (Chapter 3), how charts lie (Chapter 4), which chart to pick for which question (Chapter 5), and how to declutter (Chapter 6), typographically polish (Chapter 7), compose (Chapter 8), and sequence (Chapter 9) your charts into a complete data story. You can look at a chart — any chart — and evaluate whether it is working. You know what an action title is. You know why 3D pie charts are indefensible. You know what the grayed-out strategy does. All of that was deliberate. We wanted you to develop the critical eye and the design instincts before you learned the tool, because the tool is not the skill. The tool just implements the skill.

But now it is time to implement. Part III of this book is about matplotlib, the Python library that draws every chart you have been imagining. Matplotlib is the oldest and most important Python visualization library. It underpins seaborn, pandas plotting, geopandas, and many other tools. It is the default plotting library for most scientific Python workflows, and it is installed in nearly every Python data analysis environment. If you learn one visualization library in Python, it has to be matplotlib. That is not because matplotlib is the best, but because so much else is built on top of it and will require you to understand matplotlib when the abstractions leak.

This chapter is your introduction to matplotlib's architecture. It is not a tutorial on how to produce a specific chart type — those chapters come next (Chapter 11 covers line, bar, scatter, histogram, and box plots; Chapter 12 covers customization and styling). This chapter is about the structure of matplotlib: the Figure, the Axes, the Artist tree, the backends, the rendering pipeline. Understanding the structure matters because without it, the rest of matplotlib feels like a random collection of function names that you have to memorize. With it, the function names organize themselves around a mental model of what matplotlib is doing, and you can figure out how to do new things by reasoning from the model rather than by searching Stack Overflow.

The threshold concept of the chapter — the single mental shift that makes matplotlib make sense — is that matplotlib is an object library, not a drawing library. You do not draw on a canvas; you build a tree of Python objects, and matplotlib renders the tree. Every visible element on your chart — every line, every dot, every letter, every bar — is a Python object with methods and properties. Your job as a matplotlib user is to configure those objects. The rendering is automatic once the objects are configured correctly.

Some warnings before we start. Matplotlib has two coexisting APIs. One is called pyplot (accessed as import matplotlib.pyplot as plt and then calls like plt.plot(), plt.bar(), plt.title()). The other is called the object-oriented API (where you create Figure and Axes objects explicitly and call methods on them: fig, ax = plt.subplots() followed by ax.plot(), ax.bar(), ax.set_title()). Most tutorials teach the pyplot API first because it produces one-liner code that looks simple. Most books, including this one, strongly recommend the object-oriented API for anything beyond a five-second exploratory chart. The reason is that pyplot maintains hidden state ("which Axes am I currently drawing on?"), and the hidden state becomes a source of bugs the moment you have more than one Axes in your figure. The OO API makes the state explicit, which is harder to type but easier to reason about. Throughout Part III, we use the OO API almost exclusively.

There is no code in this chapter's opening section, but there will be plenty starting in Section 10.1. By the end of the chapter, you will understand what matplotlib is doing under the hood, and you will have written and saved your first climate chart.


10.1 The Three-Layer Architecture

Matplotlib is built as a three-layer architecture. Understanding the three layers gives you a mental model for everything that follows.

Layer 1: Backend

The backend is the part of matplotlib that actually draws pixels (or vectors) on an output surface. When you call savefig("chart.png"), the backend is the thing that takes the tree of Python objects in your figure and produces the PNG file. When you call plt.show() in an interactive Python session, the backend is the thing that pops up the window on your screen.

Matplotlib has several backends, and the right one depends on your use case:

  • Agg (Anti-Grain Geometry): the default for non-interactive use. Produces raster output (PNG, JPG, raw pixel arrays). Fast, high-quality, and used for nearly all batch chart generation. When you run a script that saves a chart to a PNG, matplotlib is using Agg under the hood.
  • PDF, PS (PostScript), SVG: vector output backends. Produce resolution-independent output suitable for publication (PDF for print, SVG for editing in tools like Illustrator or Figma, PS for LaTeX). Vector formats let you zoom in without blurring because the output is a description of shapes, not a grid of pixels.
  • Cairo: an alternative backend built on the Cairo graphics library. It can produce both raster and vector output, and some users prefer its font rendering.
  • QtAgg, TkAgg, WxAgg, GTK3Agg/GTK4Agg, MacOSX: interactive backends that pop up a window on your screen for live plot manipulation. Useful for interactive debugging but not for reproducible chart generation.
  • Inline (in Jupyter notebooks): displays charts inline in notebook cells. This is what you are probably using if you work in Jupyter.
  • Widget (%matplotlib widget in Jupyter, which requires the ipympl package): interactive inline plots with zoom, pan, and hover.

Most of the time, you do not need to think about the backend. Matplotlib chooses a reasonable default: Agg when you save to a file, the inline backend when you are in a Jupyter notebook, and an interactive backend when you run a script from a GUI-enabled Python session. You switch backends explicitly only when you need something the default does not provide (for example, SVG output for a publication, or the widget backend for interactive notebook plots).
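For the rare cases where you do want to choose, the following is a minimal sketch of inspecting and selecting a backend explicitly. It assumes a script context where no chart needs to appear on screen, which is why it picks Agg.

```python
import matplotlib

# Select the non-interactive raster backend explicitly.
# Doing this before importing pyplot is the safest ordering.
matplotlib.use("Agg")

import matplotlib.pyplot as plt

# Ask matplotlib which backend is active right now.
active_backend = matplotlib.get_backend()
print(active_backend)
```

In a Jupyter notebook you would normally leave the backend alone, or switch it with a magic such as %matplotlib widget rather than with matplotlib.use().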

The backend is important to know about because it affects two specific things: output format (what kind of file you can save) and font rendering (some backends handle fonts better than others). But in day-to-day work, the backend is an implementation detail that matplotlib manages for you.

Layer 2: Artist

The Artist layer is where your charts actually live. Every visible element on a matplotlib chart — every line, every dot, every bar, every label, every tick mark, every annotation — is an instance of a class called Artist. The Figure itself is an Artist. The Axes is an Artist. The Line2D that represents a line in your chart is an Artist. The Text object that represents the title is an Artist. Everything. Artists.

Artists exist in a tree structure. The top of the tree is a Figure. A Figure contains one or more Axes objects (note the plural — "Axes" is a matplotlib-specific term for a single plotting area, and it is always plural even when there is only one). Each Axes contains plot artists (Line2D, PathCollection, Rectangle, etc.) and annotation artists (Text, Legend, etc.). The tree structure lets matplotlib manage the rendering: when you call fig.canvas.draw(), matplotlib walks the tree, asks each Artist to draw itself, and assembles the result.

Understanding the tree matters because most of what you do in matplotlib is configuring individual Artists. When you call ax.plot(x, y), matplotlib creates a new Line2D Artist and adds it to the Axes. When you call ax.set_title("My Chart"), matplotlib modifies the existing Text Artist that holds the title. When you call ax.legend(), matplotlib creates a new Legend Artist and adds it to the Axes. Every method call on an Axes is, at some level, a manipulation of the Artist tree.

The threshold concept of this chapter is rooted here: you are not drawing; you are configuring a tree of Artists. The rendering happens automatically when the tree is ready. Your job is to get the tree right.
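To see the tree for yourself, here is a small sketch that builds a chart and then inspects the objects it created. The variable names are illustrative; the point is only that each piece is an Artist and that each one sits inside its parent.

```python
import matplotlib.pyplot as plt
from matplotlib.artist import Artist
from matplotlib.lines import Line2D
from matplotlib.text import Text

fig, ax = plt.subplots()
(line,) = ax.plot([1, 2, 3], [4, 5, 6])  # ax.plot returns a list of Line2D objects
title = ax.set_title("My Chart")         # set_title returns the title's Text object

# Every one of these objects is an Artist.
for obj in (fig, ax, line, title):
    print(type(obj).__name__, isinstance(obj, Artist))

# The tree: the Axes is a child of the Figure, the line is a child of the Axes.
axes_in_figure = ax in fig.get_children()
line_in_axes = line in ax.get_children()
print(axes_in_figure, line_in_axes)
```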

Layer 3: Scripting (pyplot)

The scripting layer is pyplot. It is a thin convenience wrapper around the Artist layer, designed to make simple charts easy to produce with one-line commands. When you write:

import matplotlib.pyplot as plt

plt.plot([1, 2, 3], [4, 5, 6])
plt.title("Hello, matplotlib")
plt.show()

...you are using pyplot. Under the hood, pyplot is doing something like:

  1. Creating a Figure (if none exists).
  2. Creating an Axes inside the Figure (if none exists).
  3. Calling .plot() on the current Axes.
  4. Calling .set_title() on the current Axes.
  5. Calling the backend's rendering routine.

The key word in that list is "current": pyplot tracks a current Figure and a current Axes, and every pyplot call operates on whichever Figure and Axes are currently active. This is called the state-machine API because the API behavior depends on state that is not explicit in the function calls.

Pyplot is fine for throwaway exploratory charts: plt.hist(data) and you are done. It starts to cause problems when you have multiple Figures or multiple Axes, because keeping track of "which one is current" becomes a source of bugs. The classic pyplot failure mode: you make a figure, make another figure, then call plt.title("Old Title") intending to title the first figure, but pyplot titles the second one because that is now the current figure.
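The failure mode just described can be reproduced in a few lines. This is a deliberately broken sketch: the title is meant for the first figure, but pyplot sends it to whichever figure was created last.

```python
import matplotlib.pyplot as plt

fig1, ax1 = plt.subplots()   # fig1 is now the current Figure
fig2, ax2 = plt.subplots()   # fig2 replaces it as the current Figure

# Intended for the first figure, but pyplot only knows about "current".
plt.title("Old Title")

title_on_first = ax1.get_title()    # still empty
title_on_second = ax2.get_title()   # received the title instead
print(repr(title_on_first), repr(title_on_second))
```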

The object-oriented API avoids this by making the state explicit. You hold references to the Figure and Axes objects you care about, and you call methods directly on them:

import matplotlib.pyplot as plt

fig, ax = plt.subplots()  # create a Figure and one Axes
ax.plot([1, 2, 3], [4, 5, 6])  # plot on THIS specific Axes
ax.set_title("Hello, matplotlib")  # title THIS specific Axes
fig.savefig("chart.png")  # save THIS specific Figure

Notice that pyplot (plt) is still used for one thing — creating the Figure and Axes — but everything after that is method calls on the explicit objects. This is the canonical fig/ax pattern, and it is the pattern you should use for nearly everything in Part III. We will explore it in detail in the next section.


10.2 Figure, Axes, and Axis: The Core Trinity

Three matplotlib classes do most of the work in any chart: Figure, Axes, and Axis. The names are similar and easy to confuse, so this section walks through them carefully.

Figure

A Figure is the top-level container in matplotlib. It represents the entire image you are creating — the whole PNG, the whole PDF page, the whole Jupyter cell output. A Figure has a size (in inches), a resolution (in DPI), a background color, and a set of child Axes.

You create a Figure in several ways. The most common is through plt.subplots(), which creates a Figure and a specified number of Axes in a single call:

fig, ax = plt.subplots()  # one Figure with one Axes
fig, (ax1, ax2) = plt.subplots(1, 2)  # one Figure with two Axes side by side
fig, axes = plt.subplots(2, 3)  # one Figure with a 2x3 grid of Axes

You can also create a Figure explicitly with plt.figure() and add Axes later:

fig = plt.figure(figsize=(10, 6))  # create an empty Figure
ax = fig.add_subplot(1, 1, 1)  # add one Axes covering the whole Figure

The two approaches produce equivalent results. plt.subplots() is the canonical modern way because it handles the common case in one line. Use plt.figure() + add_subplot() when you need more fine-grained control over the subplot layout.
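As one illustration of that finer control, the sketch below uses plt.figure() with a grid specification to build a layout that a plain plt.subplots(2, 2) call cannot express: one wide panel on top and two narrower panels below. The panel names are illustrative, not a convention.

```python
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(10, 6))
grid = fig.add_gridspec(2, 2)             # a 2x2 grid to place Axes on

ax_top = fig.add_subplot(grid[0, :])      # top row, spanning both columns
ax_left = fig.add_subplot(grid[1, 0])     # bottom-left cell
ax_right = fig.add_subplot(grid[1, 1])    # bottom-right cell

print(len(fig.axes))                      # number of Axes in the Figure
```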

Key Figure properties:

  • figsize: the size of the Figure in inches as a (width, height) tuple. Default is (6.4, 4.8). This is one of the most important parameters in all of matplotlib, and one of the most frequently adjusted. For a time-series chart that should be wide (Chapter 8's aspect ratio rules), you might use figsize=(12, 4). For a square scatter plot, figsize=(6, 6). For a publication-ready figure, the size is often dictated by the journal's column width.
  • dpi: dots per inch. The default is 100. Matplotlib has no separate print default, so for print you typically pass 300 explicitly. You can set this at Figure creation (plt.subplots(dpi=150)) or at save time (fig.savefig("chart.png", dpi=300)).
  • facecolor: background color of the Figure. Default is white.
  • tight_layout() / constrained_layout=True: the Figure's layout manager, which we will cover in detail in Chapter 13.

Axes

An Axes is a single plotting area — what most people think of as "a chart." It has a rectangular region on the Figure where data is drawn, an x-axis and a y-axis, a title, and all the plot elements (lines, dots, bars, labels).

This is the object you will call the most methods on in all of matplotlib. Understanding Axes methods is 80% of understanding matplotlib. A few important groups:

  • Plot methods: ax.plot(), ax.scatter(), ax.bar(), ax.hist(), ax.boxplot(), ax.fill_between(), ax.imshow(), and many others. These are the methods that add data to the Axes. Each one creates new Artists and adds them to the Axes's Artist tree.
  • Label methods: ax.set_title(), ax.set_xlabel(), ax.set_ylabel(). These set the text elements. They correspond to the typography discipline from Chapter 7.
  • Limit methods: ax.set_xlim(), ax.set_ylim(). These control the displayed data range. Important for the truncated-axis discussion from Chapter 4 — you choose the axis range explicitly here.
  • Scale methods: ax.set_xscale("log"), ax.set_yscale("symlog"). These control the axis scaling. The FT pandemic chart from Chapter 5's Case Study 2 used set_yscale("log").
  • Tick methods: ax.set_xticks(), ax.set_yticks(), ax.tick_params(). These control the tick positions and formatting. Chapter 7's axis formatting discussion is implemented here.
  • Grid/spine methods: ax.grid(), ax.spines["top"].set_visible(False). These implement the declutter procedure from Chapter 6.
  • Legend method: ax.legend(). Creates a legend for the Axes.
  • Annotation methods: ax.annotate(), ax.text(). Add text annotations to the chart.

When you see ax.something() anywhere in this book or in the matplotlib documentation, what is happening is: the Axes object ax has a method called something(), and calling it either adds new Artists to the Axes's tree or modifies existing ones.

Axis (Singular)

An Axis (note: singular, no 's') is a single numerical axis — the x-axis or the y-axis of a plotting area. An Axes has two Axis objects: ax.xaxis and ax.yaxis. The Axis is where tick marks live, where tick labels live, and where the axis label lives.

Most of the time, you do not interact with Axis objects directly. You use convenience methods on the Axes that delegate to the Axis (for example, ax.set_xlabel("Year") delegates to ax.xaxis.set_label_text("Year")). But you will occasionally need to manipulate an Axis directly for advanced formatting, and it is useful to know that it exists.
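A short sketch of both routes follows: the Axes-level convenience call, and direct manipulation of the Axis for tick placement and tick formatting. The data values are made up purely for illustration.

```python
import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator, StrMethodFormatter

fig, ax = plt.subplots()
ax.plot([2000, 2005, 2010, 2015, 2020], [1.0, 1.5, 1.2, 1.9, 2.4])  # made-up data

# Axes-level convenience method; the label ends up on the Axis object.
ax.set_xlabel("Year")
label_on_axis = ax.xaxis.get_label_text()
print(label_on_axis)

# Direct Axis manipulation for formatting the convenience methods do not cover.
ax.xaxis.set_major_locator(MultipleLocator(5))               # a tick every 5 units
ax.yaxis.set_major_formatter(StrMethodFormatter("{x:.1f}"))  # one decimal place
```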

Putting It Together

To summarize:

  • Figure = the whole image (the PNG, the PDF page).
  • Axes = a single plotting area within a Figure (what most people call "a chart").
  • Axis (singular) = one of the two numerical axes (x-axis or y-axis) within an Axes.

A Figure can contain one Axes (a simple single-chart figure) or many Axes (a small-multiple figure from Chapter 8). In a standard 2D chart, each Axes contains exactly two Axis objects (the x-axis and the y-axis) plus all the plot elements.

The naming is confusing because "Axes" and "Axis" are so similar. The rule: if you are working with a whole chart (with labels, title, data, etc.), you want Axes. If you are working with just the x or y axis (tick marks, tick labels, scale), you want Axis.

Check Your Understanding — You have a figure with two side-by-side plots (a time series on the left and a bar chart on the right). How many Figures are there? How many Axes? How many Axis objects?


10.3 The Canonical fig/ax Pattern

This section introduces the single most important code pattern in all of matplotlib. Every example in the rest of Part III is built on it. Memorize it.

The Pattern

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4, 5], [1, 4, 9, 16, 25])
ax.set_title("A Simple Squared-Values Chart")
ax.set_xlabel("x")
ax.set_ylabel("x squared")
fig.savefig("squared.png", dpi=300, bbox_inches="tight")

Seven lines of code. Read them carefully.

Line 1: import matplotlib.pyplot as plt. The standard matplotlib import. plt is the conventional alias; every matplotlib tutorial uses it; do not change it.

Line 2: fig, ax = plt.subplots(). This is the line that creates everything. plt.subplots() with no arguments creates a Figure containing a single Axes, and returns both objects. We unpack them into variables named fig (the Figure) and ax (the Axes). From this point on, we have explicit references to both objects.

Line 3: ax.plot(x, y). This calls the plot method on our specific Axes. It adds a line Artist to the Axes's Artist tree. The first argument is the x-values, the second is the y-values. Both can be Python lists, numpy arrays, pandas Series, or anything that can be iterated and converted to numbers.

Line 4: ax.set_title("..."). Sets the title text Artist on the Axes. (Under the hood: the Axes already has a Text Artist for the title; this call modifies its text content.)

Lines 5-6: ax.set_xlabel(...) and ax.set_ylabel(...). Set the axis label text Artists.

Line 7: fig.savefig("squared.png", dpi=300, bbox_inches="tight"). Tells the Figure to render itself to a PNG file. The dpi=300 argument sets the resolution (300 is publication-ready). The bbox_inches="tight" argument tells savefig to crop tight around the actual content, which eliminates excess whitespace at the edges. It is not matplotlib's default behavior, so it is worth adding to your save calls as a habit.

Seven lines, one chart. This is the baseline pattern. Every subsequent complication (multiple Axes, styled titles, custom colors, annotations, legends) is an extension of this pattern. The key insight is that everything after line 2 is method calls on the explicit fig and ax objects, not on the plt module. We are using the object-oriented API.

Why Not pyplot?

The pyplot equivalent of the same chart looks almost as clean:

import matplotlib.pyplot as plt

plt.plot([1, 2, 3, 4, 5], [1, 4, 9, 16, 25])
plt.title("A Simple Squared-Values Chart")
plt.xlabel("x")
plt.ylabel("x squared")
plt.savefig("squared.png", dpi=300, bbox_inches="tight")

For a one-chart figure, the pyplot version is almost indistinguishable from the OO version. The difference shows up when you add a second Axes:

# Object-oriented version
x = [1, 2, 3, 4, 5]       # example data, reused by the pyplot version below
y = [1, 4, 9, 16, 25]
fig, (ax1, ax2) = plt.subplots(1, 2)
ax1.plot(x, y)
ax1.set_title("Left panel")  # explicit: set title on ax1
ax2.scatter(x, y)
ax2.set_title("Right panel")  # explicit: set title on ax2

vs.

# pyplot version
fig = plt.figure()
plt.subplot(1, 2, 1)
plt.plot(x, y)
plt.title("Left panel")  # Which Axes does this title go on?
plt.subplot(1, 2, 2)
plt.scatter(x, y)
plt.title("Right panel")  # Which Axes does this title go on?

In the pyplot version, plt.title() depends on the "current" Axes, which is set by the most recent plt.subplot() call. It works here, but the moment you start doing anything more complex (a helper function that takes an Axes as an argument, a loop over subplots, a multi-figure script), the "current Axes" state becomes hard to track. Bugs in pyplot code often come from the wrong Axes being current when a method is called.

The OO API avoids this problem entirely. You have explicit references to each Axes. You know which Axes you are operating on because you call a method directly on it. The state is in the variable names, not hidden in a global state machine. This is why the matplotlib documentation itself recommends the explicit, object-oriented interface for anything beyond quick interactive work, and why Python data-science references such as McKinney's pandas book and VanderPlas's Data Science Handbook rely on it for multi-panel figures.
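The helper-function and loop cases mentioned above are where explicit Axes references pay off most. The sketch below uses a hypothetical helper, draw_panel, that draws on whichever Axes it is handed, so there is never a question about which panel is "current".

```python
import matplotlib.pyplot as plt


def draw_panel(ax, x, y, title):
    """Hypothetical helper: plot x against y on the given Axes and title it."""
    ax.plot(x, y)
    ax.set_title(title)
    return ax


x = [1, 2, 3, 4, 5]
fig, axes = plt.subplots(1, 3, figsize=(12, 4))  # axes is a 1D array of three Axes

# Each iteration passes one specific Axes to the helper.
for power, ax in enumerate(axes, start=1):
    y = [value ** power for value in x]
    draw_panel(ax, x, y, f"x to the power {power}")

panel_titles = [ax.get_title() for ax in axes]
print(panel_titles)
```

Passing the Axes as the first argument is a common design choice because the same helper then works unchanged for single charts, grids, and figures created elsewhere.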

Common Variations of the Pattern

The fig/ax pattern has a few common variations. Knowing them will save you time.

Multi-panel figures:

# 1 row, 2 columns
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))

# 2 rows, 2 columns
fig, axes = plt.subplots(2, 2, figsize=(10, 8))
# axes is a 2D numpy array; access as axes[0, 0], axes[0, 1], axes[1, 0], axes[1, 1]

# 2 rows, 3 columns with shared axes
fig, axes = plt.subplots(2, 3, figsize=(12, 6), sharex=True, sharey=True)
# all panels share the same x and y limits

Single Axes with explicit figsize:

fig, ax = plt.subplots(figsize=(10, 6))  # wider than default

Getting the current Figure and Axes in pyplot-style code:

# Sometimes you are in a context where pyplot has created objects
# and you want to get explicit references to them:
fig = plt.gcf()  # get current figure
ax = plt.gca()   # get current axes

# Useful for converting pyplot code to OO code, but avoid in new code.

Creating a figure and an axes separately:

fig = plt.figure(figsize=(10, 6))
ax = fig.add_subplot(1, 1, 1)
# equivalent to plt.subplots() but slightly more flexible

For the rest of Part III, we use plt.subplots() as the default pattern. Learn it, use it, trust it.

Check Your Understanding — Write the fig/ax pattern for a single chart that plots the list [10, 20, 30, 40] as a line with the title "Simple Line." Do not use pyplot methods beyond subplots().


10.4 Configuring the Figure: figsize, dpi, and Why They Matter

Figure Size (figsize)

The figsize argument to plt.subplots() controls the size of the Figure in inches. This is the single most important Figure-level parameter and one of the most commonly adjusted.

fig, ax = plt.subplots(figsize=(10, 6))  # 10 inches wide, 6 inches tall

Why inches? Because matplotlib was designed for publication output, and publication sizes are typically specified in inches. If a journal specifies millimeters, convert before passing the value: a plain (width, height) tuple is interpreted as inches, so divide each millimeter length by 25.4. The default figsize=(6.4, 4.8) is a slightly-wider-than-tall rectangle that works for many single-chart figures but is usually not optimal for any specific chart.
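When a size is specified in millimeters, a small conversion helper keeps the arithmetic out of the plotting code. The helper name and the 90 mm by 60 mm size below are hypothetical, chosen only to illustrate the conversion.

```python
import matplotlib.pyplot as plt

MM_PER_INCH = 25.4


def mm_to_inches(*lengths_mm):
    """Hypothetical helper: convert millimeter lengths to a tuple of inches."""
    return tuple(length / MM_PER_INCH for length in lengths_mm)


# A hypothetical 90 mm x 60 mm figure, converted to inches for figsize.
fig, ax = plt.subplots(figsize=mm_to_inches(90, 60))

width_in, height_in = fig.get_size_inches()
print(round(width_in, 2), round(height_in, 2))
```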

Choosing figsize well means applying the aspect-ratio principles from Chapter 8:

  • Time series (wide): figsize=(12, 4) or figsize=(10, 4) for long time ranges. Cleveland's "banking to 45 degrees" rule applies here.
  • Scatter plot (square): figsize=(6, 6) or figsize=(8, 8) for two equally-important continuous variables.
  • Horizontal bar chart (tall): figsize=(8, 12) if you have many categories.
  • Small multiple grid: depends on the grid shape. A 2x3 small multiple might be figsize=(12, 8) — two rows of three, each panel roughly 4x4 inches.
  • Dashboard tile: smaller, maybe figsize=(5, 3), sized to match dashboard grid cells.
  • Publication figure: dictated by the journal. For a single-column figure in a typical academic journal, figsize=(3.5, 2.5) might be correct. For a two-column figure, figsize=(7, 5).

The figsize you choose affects how the chart will look at its final destination. A chart with figsize=(12, 4) that is displayed at half that size (a thumbnail, a slide embedding) will look cramped, with text too small to read; a chart with figsize=(4, 3) that is displayed at twice that size will look oversized, with text and line widths scaled up along with it. Choose figsize to match the final display size rather than picking arbitrary values.

DPI and Resolution

DPI stands for "dots per inch" — the number of pixels per inch of the output image. Higher DPI means more pixels, which means better visual quality, especially when the chart is printed or displayed on high-resolution screens.

Matplotlib has two different DPI settings that are frequently confused:

  • fig.dpi or plt.subplots(dpi=...) — the display DPI. This controls how the chart is rendered during interactive display and what the default "size on screen" is.
  • fig.savefig(..., dpi=...) — the save DPI. This controls the resolution of the output file when you save it.

These two can be different. A chart with fig.dpi = 100 and fig.savefig(..., dpi=300) is rendered at 100 DPI for display but saved at 300 DPI for printing. This is usually what you want: a fast interactive display and a high-quality print output.

Common DPI values:

  • 72 DPI: historical default for computer displays. Still used for some web graphics.
  • 100 DPI: matplotlib's default for display. A reasonable choice for interactive notebook use.
  • 150 DPI: higher quality display, good for retina-class screens.
  • 300 DPI: standard print resolution. Most journals require this.
  • 600 DPI or higher: very high quality print. Rarely needed but occasionally required.

The figsize and dpi together determine the pixel dimensions of the output. A chart with figsize=(10, 6) and dpi=300 produces an image of 3000 × 1800 pixels. A chart with figsize=(10, 6) and dpi=100 produces an image of 1000 × 600 pixels.
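The arithmetic can be checked directly. This sketch creates the 10 by 6 inch figure from the example, reads the on-screen pixel size at the figure's own DPI, and then saves at 300 DPI into an in-memory buffer to read back the saved size. It assumes Pillow is importable (matplotlib installs it as a dependency) and deliberately omits bbox_inches="tight", because cropping would change the dimensions.

```python
import io

import matplotlib.pyplot as plt
from PIL import Image  # Pillow, assumed available alongside matplotlib

fig, ax = plt.subplots(figsize=(10, 6), dpi=100)
ax.plot([1, 2, 3], [4, 5, 6])

# Pixel size at the Figure's own DPI: figsize * fig.dpi.
display_px = fig.canvas.get_width_height()   # (1000, 600)

# Pixel size of the saved file: figsize * savefig dpi.
buffer = io.BytesIO()
fig.savefig(buffer, format="png", dpi=300)
buffer.seek(0)
saved_px = Image.open(buffer).size           # (3000, 1800)

print(display_px, saved_px)
```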

For save operations, dpi=300 is the value you should use for anything that might be printed. For web or slide use, dpi=150 is usually sufficient. For fast exploratory charts, the default dpi=100 is fine.

Other Figure Properties

A few other Figure properties occasionally matter:

  • facecolor: the background color of the Figure. Default is white. facecolor="none" makes it transparent (useful for overlaying on other images). facecolor="#f5f5f5" makes it light gray (matches some corporate backgrounds).
  • edgecolor: the color of the Figure's border (if any). Usually not needed.
  • frameon: whether to draw the Figure's background patch. Default is True; setting it to False leaves the background undrawn, which is rarely needed.

In practice, the only Figure properties you will configure frequently are figsize and dpi. For everything else, the defaults are usually correct.


10.5 Saving Figures: savefig and Output Formats

Producing a chart is only half the job. Saving it to a file is the other half. This section covers the common patterns.

The Basic savefig Call

fig.savefig("chart.png")

This saves the Figure to a PNG file in the current working directory. Simple and usually what you want for a quick check. For anything beyond a quick check, you almost always want to add a few arguments:

fig.savefig("chart.png", dpi=300, bbox_inches="tight", facecolor="white")

  • dpi=300: print-quality resolution.
  • bbox_inches="tight": crops the saved image to the actual content, removing excess whitespace. Without this, matplotlib sometimes saves more blank space than you want.
  • facecolor="white": ensures a white background even if the Figure was created with a different default. Useful when matplotlib defaults to a gray or transparent background in some backends.

Output Formats

matplotlib can save to many formats. The common ones:

PNG (.png) — raster (pixel-based), best for web display, slides, and embedded images. The "default" format for most Python visualization workflows. Supports transparency via facecolor="none" or transparent=True. Use dpi=300 for printing.

SVG (.svg) — vector (shape-based), best for editing. Scalable without loss of quality. Open in Illustrator, Inkscape, or Figma for post-processing. Ideal for figures that need manual tweaking after matplotlib is done.

PDF (.pdf) — vector, best for print publications. Most academic journals accept PDF figures with embedded fonts. Use this for anything that will end up in a printed paper or book.

JPG (.jpg) — raster, lossy compression. Smaller file sizes than PNG but not as high quality. Not recommended for charts because the compression can degrade text and thin lines.

EPS (.eps) — legacy vector format, sometimes required by older journals. Rare in modern workflows.

To save in a specific format, just change the file extension:

fig.savefig("chart.png")  # raster
fig.savefig("chart.svg")  # vector, editable
fig.savefig("chart.pdf")  # vector, print-ready

Matplotlib detects the format from the extension automatically.
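A common pattern built on this is to save the same Figure in several formats in one loop. The sketch below writes into a temporary directory so it leaves nothing behind in your project; the file name chart is arbitrary.

```python
import tempfile
from pathlib import Path

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [4, 5, 6])
ax.set_title("One figure, three formats")

out_dir = Path(tempfile.mkdtemp())  # throwaway output folder for this sketch
saved_paths = []
for extension in ("png", "svg", "pdf"):
    path = out_dir / f"chart.{extension}"
    # The format is inferred from the extension; dpi only matters for the PNG.
    fig.savefig(path, dpi=300, bbox_inches="tight")
    saved_paths.append(path)

print([path.name for path in saved_paths])
```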

savefig Arguments Worth Knowing

fig.savefig(
    "chart.png",
    dpi=300,                # resolution for raster formats
    bbox_inches="tight",    # crop to content
    pad_inches=0.1,         # padding around the tight crop
    facecolor="white",      # background color
    edgecolor="none",       # border color
    transparent=False,      # if True, transparent background (overrides facecolor)
    format="png",           # explicit format (usually inferred from extension)
    metadata={"Author": "Your Name"},  # metadata embedded in the file
)

Most of the time you only need dpi and bbox_inches. Remember those two as the defaults for publication-quality output.

A Warning About Font Embedding

For PDF output in particular, matplotlib has to decide how to include fonts. By default, it uses Type 3 fonts, which are not supported by some journal publishing systems. If a journal complains about your PDF figure, set the following before creating the figure:

import matplotlib
matplotlib.rcParams["pdf.fonttype"] = 42  # TrueType
matplotlib.rcParams["ps.fonttype"] = 42

This uses Type 42 fonts, which are TrueType fonts embedded in the PDF. Journal submission systems that reject Type 3 fonts generally accept Type 42. The setting is one of those obscure matplotlib defaults that you only learn about when something goes wrong.
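If you would rather not change the setting globally, one option is to scope it with a context manager so that it applies only while a particular figure is created and saved. The following is a sketch of that approach using matplotlib.rc_context; it saves into an in-memory buffer, where a real script would pass a file name.

```python
import io

import matplotlib
import matplotlib.pyplot as plt

setting_before = matplotlib.rcParams["pdf.fonttype"]

# The overrides apply only inside the with-block.
with matplotlib.rc_context({"pdf.fonttype": 42, "ps.fonttype": 42}):
    fig, ax = plt.subplots()
    ax.set_title("Text saved with embedded TrueType fonts")
    buffer = io.BytesIO()
    fig.savefig(buffer, format="pdf")

setting_after = matplotlib.rcParams["pdf.fonttype"]  # restored on exit
print(setting_before, setting_after)
```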


10.6 The Rendering Pipeline: From Code to Pixels

This section is a brief tour of what happens inside matplotlib when you call fig.savefig(). You do not need to know this to use matplotlib effectively, but knowing it makes several confusing behaviors make sense.

Step 1: Build the Artist Tree

When you write:

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [4, 5, 6])
ax.set_title("My Chart")

...you are building a tree of Artist objects in memory. The Figure has an Axes. The Axes has a Line2D (the line you plotted) and a Text (the title). Nothing has been drawn yet — these are just Python objects with their properties set.

Step 2: Layout Computation

Before matplotlib can render anything, it needs to know where each element goes. The layout engine computes the positions of the Axes within the Figure, the positions of tick labels, the space required for titles and axis labels, and so on. This is where tight_layout() and constrained_layout do their work, figuring out how to arrange everything without overlap.

Step 3: Canvas Setup

Matplotlib creates a canvas (implemented by the active backend — Agg for raster, PDF for PDF, etc.) at the requested size and resolution.

Step 4: Artist Rendering

Matplotlib walks the Artist tree and asks each Artist to render itself onto the canvas. The Figure renders first (drawing the background). Then each Axes renders its spines, gridlines, and ticks. Then the plot Artists render (the line, the dots, the bars). Then the text Artists render (title, labels, annotations). The order matters because later Artists appear on top of earlier ones — which is what zorder controls if you need to override the default.
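The stacking order can be inspected and changed through each Artist's zorder. The sketch below reads the default zorder of a bar, a line, and a text label, then raises the bars above the line. It does not hard-code the default numbers; it only relies on bars defaulting lower than lines, and lines lower than text, which holds in current matplotlib releases.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
bars = ax.bar([1, 2, 3], [3, 5, 4])          # Rectangle patches
(line,) = ax.plot([1, 2, 3], [4, 4, 4])      # a Line2D
label = ax.text(2, 4.2, "target")            # a Text

# Lower zorder draws first, so higher zorder ends up on top.
default_order = (bars[0].get_zorder(), line.get_zorder(), label.get_zorder())
print(default_order)

# Override the default: draw the bars on top of the line.
for bar in bars:
    bar.set_zorder(line.get_zorder() + 1)
```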

Step 5: Output

The backend writes the canvas to the output file in the requested format. For Agg, this means encoding pixels as PNG. For PDF, this means writing PDF drawing commands. For SVG, this means writing SVG XML.
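The same Artist tree can be sent through several backends in a row. The following sketch, assuming a headless Agg backend and in-memory buffers instead of files, saves one figure three ways and prints the first bytes of each result.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: headless run
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [4, 5, 6])

# One tree, three output backends.
outputs = {}
for fmt in ("png", "pdf", "svg"):
    buffer = io.BytesIO()
    fig.savefig(buffer, format=fmt)
    outputs[fmt] = buffer.getvalue()
plt.close(fig)

for fmt, data in outputs.items():
    print(fmt, len(data), data[:8])
```

When you pass a filename instead of a buffer, the format argument is unnecessary because matplotlib infers the format from the extension.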

The whole pipeline takes a fraction of a second for a simple chart and a few seconds for a complex multi-panel figure with thousands of data points. Most of the time, you do not think about the pipeline at all — you just call savefig() and trust that it works. But knowing the steps helps you diagnose problems when they occur:

  • "My chart is blank": probably nothing was added to the Axes, or the axis limits cut off the data.
  • "The title is cut off": the layout engine did not allocate enough space; try bbox_inches="tight" or constrained_layout=True.
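  • "The tick labels overlap": if the labels run into a neighboring Axes, the cause and the fix are the same as above; if they run into each other, rotate them or use fewer ticks (see Trap 4).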
  • "The output is blurry": you saved a raster format at too low a DPI.
  • "The output is huge": you saved a raster format at too high a DPI, or the file is a PNG where an SVG would be smaller.
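The first symptom in the list, a blank chart, can be checked from code rather than by eye. This is a minimal sketch, assuming a headless Agg backend; the deliberately wrong axis limits are there only to reproduce the problem.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
empty_before = ax.has_data()  # nothing has been plotted yet

ax.plot([1, 2, 3], [4, 5, 6])
filled_after = ax.has_data()

# Reproduce the second cause: limits that exclude every data point.
ax.set_xlim(10, 20)
x_low, x_high = ax.get_xlim()
x_data = ax.lines[0].get_xdata()
data_visible = any(x_low <= x <= x_high for x in x_data)

print(empty_before, filled_after, data_visible)
```

If has_data() is True but nothing falls inside the limits, the Artists exist and the limits are the problem.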

10.7 Common Beginner Traps and How to Fix Them

This section catalogs the mistakes that trip up most new matplotlib users, with explicit fixes.

Trap 1: Mixing pyplot and the OO API

# Confusing:
fig, ax = plt.subplots()
ax.plot(x, y)
plt.title("Hello")  # this goes to the current Axes, which is ax — but it is confusing

Fix: Use ax.set_title("Hello") instead. Keep the pyplot calls to plt.subplots() and avoid mixing plt.something() with ax.something() in the same code.
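The confusion turns into an actual bug as soon as a figure has more than one Axes. The sketch below, assuming a headless Agg backend and made-up variable names, shows the pyplot call landing on a different panel from the one that was plotted on.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run
import matplotlib.pyplot as plt

fig, (ax_left, ax_right) = plt.subplots(1, 2)
ax_left.plot([1, 2, 3], [4, 5, 6])

# pyplot targets the "current" Axes, which here is the last one created.
plt.title("Hello")
titles_after_pyplot = (ax_left.get_title(), ax_right.get_title())
print(titles_after_pyplot)  # the title went to the right-hand panel

# The explicit call leaves no doubt about the target.
ax_right.set_title("")
ax_left.set_title("Hello")
titles_after_fix = (ax_left.get_title(), ax_right.get_title())
print(titles_after_fix)
```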

Trap 2: plt.show() in Scripts

Calling plt.show() in a script pops up a window and blocks the script until the window is closed. In a Jupyter notebook, it is unnecessary (the inline backend displays automatically). In a script that generates files, it is actively harmful because it blocks the script.

Fix: Use fig.savefig(...) instead of plt.show() in scripts. Use plt.show() only in interactive sessions where you want to see the chart on screen.
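A file-generating script might look like the following sketch. It assumes the non-interactive Agg backend, and it writes to a temporary directory with invented series names purely for illustration.

```python
import tempfile
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: no window is ever opened
import matplotlib.pyplot as plt

output_dir = Path(tempfile.mkdtemp())  # placeholder for your real output folder
series = {"series_a": [4, 5, 6], "series_b": [6, 5, 4]}  # made-up data

saved_paths = []
for name, values in series.items():
    fig, ax = plt.subplots()
    ax.plot([1, 2, 3], values)
    ax.set_title(name)
    path = output_dir / f"{name}.png"
    fig.savefig(path, dpi=150, bbox_inches="tight")
    plt.close(fig)  # free the figure's memory before the next iteration
    saved_paths.append(path)

print(saved_paths)
```

Closing each figure after saving matters in loops, because pyplot otherwise keeps every figure alive until the script ends.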

Trap 3: The Figure Not Rendering in Jupyter

fig, ax = plt.subplots()
ax.plot(x, y)
# Why is nothing showing?

If nothing shows in a Jupyter cell, one of several things might be wrong:

  • You forgot %matplotlib inline (older Jupyter) or the inline backend is not the default.
  • The cell does not end with an expression that matplotlib's auto-display can pick up.
  • You are running the code as a script rather than in a notebook.

Fix: In Jupyter, make sure the last line of the cell is fig or that you call plt.show() explicitly. In scripts, you do not display charts; you save them.

Trap 4: Tick Labels Overlapping or Cut Off

fig, ax = plt.subplots()
ax.plot(dates, values)
# The dates on the x-axis overlap or are cut off

This happens when the tick labels are wider than the available space. matplotlib is not smart enough by default to adjust for this.

Fix: Use fig.autofmt_xdate() for date axes, or ax.tick_params(axis="x", labelrotation=45) to rotate the labels, or fig.tight_layout() to have matplotlib recompute the layout, or pass constrained_layout=True when creating the figure.
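As a concrete case, here is a sketch of the date-axis fix. It assumes a headless Agg backend, and the daily series is invented for illustration rather than taken from any real dataset.

```python
import datetime as dt

import matplotlib
matplotlib.use("Agg")  # assumption: headless run
import matplotlib.pyplot as plt

# Hypothetical daily series, made up for illustration.
start = dt.date(2024, 1, 1)
dates = [start + dt.timedelta(days=i) for i in range(60)]
values = [i % 7 for i in range(60)]

fig, ax = plt.subplots(figsize=(5, 3))
ax.plot(dates, values)

# Rotate and right-align the date labels (30 degrees by default).
fig.autofmt_xdate()
# A non-date alternative would be: ax.tick_params(axis="x", labelrotation=45)

rotation = ax.get_xticklabels()[0].get_rotation()
print(rotation)
```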

Trap 5: The Chart Looks Different in Save vs. Display

You see the chart in a notebook, it looks fine; you save it, and the saved version is slightly different — tick labels in different positions, aspect ratio slightly off, legend placed differently.

Cause: The display DPI and the save DPI are different, and matplotlib is re-running layout computation for the save. Some elements can shift.

Fix: Use bbox_inches="tight" when saving, and prefer constrained_layout=True over tight_layout() for more stable layout behavior.

Trap 6: Colors That Look Different in Save vs. Display

The colors look right in your notebook but look slightly off in the saved PNG.

Cause: Usually a backend issue or a color profile mismatch. Most matplotlib output uses the sRGB color space, but some displays use P3 or Adobe RGB.

Fix: Not much you can do in matplotlib itself. Save the chart and view it in multiple programs to see which version is accurate. For publication, use sRGB-aware tools.

Trap 7: Global State from Previous Cells

You make a chart in cell A. You go to cell B and make a different chart. Some setting from cell A is still active — the title has the wrong font, or the colors are from the previous style.

Cause: matplotlib's global state (rcParams) persists across cells until you explicitly reset it.

Fix: Use plt.rcdefaults() to reset at the start of each cell, or use a context manager (with plt.rc_context({...}):) to apply settings only within a specific block.
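The context-manager form is the easier of the two to reason about, because the setting cannot outlive the block. A minimal sketch, assuming a headless Agg backend and using line width as an arbitrary example setting:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run
import matplotlib.pyplot as plt

original_width = plt.rcParams["lines.linewidth"]

# The override applies only inside the with block.
with plt.rc_context({"lines.linewidth": 4}):
    fig_a, ax_a = plt.subplots()
    (thick_line,) = ax_a.plot([1, 2, 3], [4, 5, 6])
    inside_width = plt.rcParams["lines.linewidth"]

# Outside the block, the previous value is back.
restored_width = plt.rcParams["lines.linewidth"]
fig_b, ax_b = plt.subplots()
(normal_line,) = ax_b.plot([1, 2, 3], [4, 5, 6])

print(inside_width, restored_width)
print(thick_line.get_linewidth(), normal_line.get_linewidth())
```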


10.8 Navigating the Matplotlib Documentation

matplotlib has extensive documentation, and knowing how to navigate it is a core skill. This section gives you the map.

The Gallery

The matplotlib gallery at matplotlib.org/stable/gallery/ is the single most useful resource. It has hundreds of example charts with full source code. Browse by chart type, find one that is close to what you want, copy the code, and modify it. The gallery is so useful that many practitioners (including experienced ones) start every new chart by opening the gallery and finding a reference example.

The API Reference

The API reference at matplotlib.org/stable/api/ lists every class and method in matplotlib. Most of the time, you want the matplotlib.axes.Axes page, which lists every method on the Axes object — plot, scatter, bar, set_title, and so on. The API reference is dense but comprehensive; use it as a lookup when you need to know the exact arguments for a method.

The Tutorials

The tutorials at matplotlib.org/stable/tutorials/ cover specific topics in depth: pyplot, the OO API, text, images, colors, animation, and so on. Worth reading selectively when you want to go deep on a topic.

Stack Overflow

Stack Overflow has tens of thousands of answered matplotlib questions. For most specific problems, a Google search will find a Stack Overflow answer. The catch: many older answers use the pyplot state-machine API, which we are avoiding. When reading Stack Overflow answers, mentally translate pyplot calls (plt.something()) to OO calls (ax.something()) before applying them.
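The translation is usually mechanical. The sketch below lists a few common pairs as comments and then runs the OO versions; it assumes a headless Agg backend, and the data and label text are placeholders.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 5, 10], [1, 3, 2], label="Series A")

# pyplot call in an older answer   ->  OO equivalent
# plt.title("Demo")                ->  ax.set_title("Demo")
# plt.xlabel("x")                  ->  ax.set_xlabel("x")
# plt.xlim(0, 10)                  ->  ax.set_xlim(0, 10)
# plt.xticks([0, 5, 10])           ->  ax.set_xticks([0, 5, 10])
# plt.legend()                     ->  ax.legend()
ax.set_title("Demo")
ax.set_xlabel("x")
ax.set_xlim(0, 10)
ax.set_xticks([0, 5, 10])
ax.legend()

print(ax.get_title(), ax.get_xlim(), list(ax.get_xticks()))
```

Most setters follow the same rule: add set_ to the pyplot name. Methods that add things to the chart, such as legend, keep their name.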

Reading Method Signatures

When you look at a method in the matplotlib documentation, you will see a signature like:

Axes.plot(*args, scalex=True, scaley=True, data=None, **kwargs)

This looks intimidating but decodes easily:

  • *args means the function takes a flexible number of positional arguments. For plot, you can pass ax.plot(y), ax.plot(x, y), or ax.plot(x, y, fmt) where fmt is a format string.
  • scalex=True, scaley=True are keyword arguments with defaults. You can override them.
  • data=None is another keyword argument.
  • **kwargs means the function accepts additional keyword arguments that will be passed through to the Line2D constructor. Most of the "styling" arguments like color, linewidth, linestyle, marker, label are kwargs.

So you can call plot in many ways:

ax.plot(y)                              # just y-values, x defaults to 0,1,2,...
ax.plot(x, y)                           # explicit x and y
ax.plot(x, y, color="red")              # with a color kwarg
ax.plot(x, y, linewidth=2, linestyle="--", label="Series A")  # multiple kwargs

The kwargs are where most of the customization happens. Chapter 11 will show you the important ones for each chart type; Chapter 12 will show you the full set for polishing.
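One way to convince yourself that the kwargs really end up on the line is to read them back from the returned Line2D object. A small sketch, assuming a headless Agg backend and placeholder data:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
x = [1, 2, 3]
y = [4, 5, 6]

# Only y given: x defaults to 0, 1, 2, ...
(line_a,) = ax.plot(y)

# Styling kwargs are stored as properties of the returned Line2D.
(line_b,) = ax.plot(x, y, color="red", linewidth=2, linestyle="--", label="Series A")

# A format string is shorthand for the same kind of properties.
(line_c,) = ax.plot(x, y, "g:")

print(list(line_a.get_xdata()))
print(line_b.get_color(), line_b.get_linewidth(), line_b.get_linestyle(), line_b.get_label())
print(line_c.get_linestyle())
```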


10.9 The Ugly Climate Plot: Our First Real Code

Time to put everything together. This is the code for the "ugly" climate chart — the default-style time series of global temperature anomalies that has been the progressive project throughout this book. It is deliberately ugly because Chapters 11 and 12 will make it beautiful.

import matplotlib.pyplot as plt
import pandas as pd

# Load the climate data
# (In a real application, you would load from NASA GISS or a local file.
# For this example, assume we have a DataFrame with columns 'year' and 'anomaly')
climate = pd.read_csv("climate_data.csv")

# The canonical fig/ax pattern
fig, ax = plt.subplots(figsize=(10, 6))

# Plot the line
ax.plot(climate["year"], climate["anomaly"])

# Add the title and labels
ax.set_title("Temperature Anomaly")
ax.set_xlabel("Year")
ax.set_ylabel("Anomaly")

# Save the figure
fig.savefig("climate_ugly.png", dpi=150, bbox_inches="tight")

That is the whole thing: nine lines of code, including the imports (the comments are extra). The chart it produces is correct but ugly:

  • The title is "Temperature Anomaly," a descriptive title that does not state the finding (Chapter 7 violation).
  • The x-axis label is "Year," which is reasonable but could be omitted since year values on an axis are self-explanatory.
  • The y-axis label is "Anomaly," which is missing units (Chapter 7 violation — should say "Temperature Anomaly (°C)").
  • The top and right spines are still there (Chapter 6 declutter violation).
  • The line is in matplotlib's default blue — not chosen, just accepted.
  • There is no annotation on the 2016 or 2023 record years (Chapter 7 violation).
  • There is no source attribution at the bottom (Chapter 7 violation).
  • The default font is DejaVu Sans, which is acceptable but not deliberate.

Every one of these issues will be fixed in subsequent chapters. Chapter 11 will cover the specific line chart options (color, linewidth, linestyle, marker). Chapter 12 will cover the styling (titles, labels, annotations, spines, fonts, colors). Chapter 13 will introduce the multi-panel layout.

But the short script above is the foundation. It uses the canonical fig/ax pattern. It creates an explicit Figure and Axes. It calls methods on the Axes to add data and labels. It saves the result to a file at a specified DPI with a tight bounding box. This pattern will appear in every subsequent chapter, in every chart you make, for the rest of your matplotlib career.
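If you want to run the script before you have the real dataset, you need some file named climate_data.csv with year and anomaly columns. The sketch below writes a clearly synthetic stand-in; the helper name write_placeholder_climate_csv and every value it writes are invented for illustration and are not real temperature measurements.

```python
import csv
import tempfile
from pathlib import Path


def write_placeholder_climate_csv(path):
    """Write a SYNTHETIC year/anomaly table. The values are made up."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["year", "anomaly"])
        for year in range(1950, 2001):
            # A straight line from 0.0 to 0.5: placeholder values only.
            writer.writerow([year, round(0.01 * (year - 1950), 2)])
    return path


# Demo in a temporary directory. To feed the script in this section,
# call write_placeholder_climate_csv("climate_data.csv") instead.
demo_path = Path(tempfile.mkdtemp()) / "climate_data.csv"
write_placeholder_climate_csv(demo_path)

with open(demo_path, newline="") as handle:
    rows = list(csv.reader(handle))
print(rows[0], rows[1], rows[-1])
```

Replace the placeholder file with the real data before drawing any conclusions from the chart.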


10.10 Bringing It Together: The Mental Model

This chapter has introduced a lot of terminology and several mental models. Let us pull them together.

The big picture: matplotlib is a Python library for building charts by configuring a tree of Artist objects and rendering the tree through a backend to an output file. You do not draw on a canvas; you build a tree.

The three layers:

  • Backend draws pixels or vectors (Agg, PDF, SVG, Cairo, Qt, inline).
  • Artist is where your chart lives: every visible element is an Artist in a tree.
  • Scripting (pyplot) is a convenience wrapper that manages state for you. Use it only for plt.subplots() and avoid the rest.

The core trinity:

  • Figure is the whole image.
  • Axes is a single plotting area (what most people call "a chart").
  • Axis (singular) is one of the two numerical axes within an Axes.

The canonical pattern:

fig, ax = plt.subplots(figsize=(W, H))
ax.plot(x, y)  # or ax.scatter, ax.bar, ax.hist, etc.
ax.set_title("...")
ax.set_xlabel("...")
ax.set_ylabel("...")
fig.savefig("chart.png", dpi=300, bbox_inches="tight")

The threshold concept: Everything is an object. Every method call configures an object. Your job is to configure the tree; matplotlib's job is to render it.

The rest of Part III is built on this foundation. Chapter 11 will teach you the specific plot methods for line, bar, scatter, histogram, and box plots. Chapter 12 will teach you how to style those charts using the principles from Part II. Chapter 13 will teach you multi-panel layouts with GridSpec. Chapter 14 will teach you specialized chart types. Chapter 15 will teach you animation and interactivity.

By the end of Part III, you will be producing publication-quality matplotlib charts that meet every discipline from Parts I and II. You will have built your own style sheet, your own reusable utility functions, your own matplotlib mental model. And you will understand that the tool is not the skill — the skill is what you learned in Parts I and II, and matplotlib is just the instrument that lets you express it.


Chapter Summary

This chapter introduced matplotlib's architecture: the three-layer model (Backend, Artist, Scripting), the core trinity (Figure, Axes, Axis), and the canonical fig, ax = plt.subplots() pattern that will appear in every subsequent example in Part III.

The threshold concept is that everything in matplotlib is an object. You do not draw on a canvas; you configure a tree of Artists, and matplotlib renders the tree through a backend to produce the final output. Every method call configures some Artist in the tree.

The object-oriented API (creating explicit Figure and Axes objects, then calling methods on them) is preferred over the pyplot state-machine API (which relies on hidden "current" Figure and Axes) for everything beyond the simplest exploratory chart. The OO API makes the state explicit, which is slightly more verbose but dramatically easier to reason about.

The most important Figure parameters are figsize (the size of the Figure in inches, chosen to match the aspect-ratio principles from Chapter 8) and dpi (dots per inch, typically 100 for display and 300 for print). The most important Axes methods are the plot methods (plot, scatter, bar, hist, boxplot) and the labeling methods (set_title, set_xlabel, set_ylabel).

Saving figures with fig.savefig("chart.png", dpi=300, bbox_inches="tight") is the canonical output pattern. The output format is determined by the file extension; PNG for raster, SVG for editing, PDF for print publications.

The matplotlib documentation (especially the gallery and the API reference), supplemented by Stack Overflow, is how experienced practitioners find the specific syntax for specific charts. Knowing how to navigate the docs is a core matplotlib skill.

The "ugly climate plot" produced in Section 10.9 is the starting point for all subsequent Part III chapters. It is deliberately unpolished — descriptive title, no annotations, default spines, default colors, missing units. Chapters 11 and 12 will transform it step by step into a publication-quality figure that meets every standard from Parts I and II.

Next in Chapter 11: the essential chart types — line, bar, scatter, histogram, and box plot — implemented in matplotlib. Each section introduces the core method for that chart type, the key parameters you will use most often, and the common pitfalls. By the end of Chapter 11, you will have produced at least one of every chart type from the Chapter 5 chart selection matrix.


Spaced Review: Concepts from Chapters 1-9

These questions reinforce ideas from earlier chapters. If any feel unfamiliar, revisit the relevant chapter before proceeding.

  1. Chapter 1: The "visualization as argument" framework says every explanatory chart makes a claim. As you start writing matplotlib code, where in the code does the claim get expressed? Is it in the ax.plot() call, the ax.set_title() call, or somewhere else?

  2. Chapter 2: Pre-attentive processing is the perceptual mechanism that handles visual features in under 250 ms. Which matplotlib parameters (in ax.plot() or ax.scatter()) affect pre-attentive processing — color, size, shape, position? Which ones affect later, conscious reading?

  3. Chapter 5: The chart selection matrix tells you which chart type to use for which question. Which matplotlib Axes methods correspond to which chart types in the matrix? ax.plot() is for... ax.bar() is for... ax.scatter() is for...

  4. Chapter 6: The declutter procedure says "remove, lighten, simplify." Which matplotlib parameters and methods implement each step? (Hint: spine visibility, gridline alpha, tick_params.)

  5. Chapter 7: Action titles state the finding. The ax.set_title() method takes a string — what should that string say for a climate chart, a business chart, a public health chart?

  6. Chapter 8: Small multiples are the best way to show many groups. Which plt.subplots() call creates the skeleton for a 3×4 small multiple? How do you access individual panels?

  7. Chapter 9: Data stories are sequences of charts. If you are producing a five-chart climate story, would you use five separate Figure objects or one Figure with five Axes objects? Why?