Chapter 15: matplotlib Foundations — Building Charts from the Ground Up

Contributors to Introduction to Data Science



Learning Objectives

Create line plots, bar charts, scatter plots, and histograms using matplotlib's object-oriented interface
Customize charts with titles, axis labels, legends, colors, and gridlines using Axes methods
Construct multi-panel figures using subplots to compare related views of the same data
Annotate charts with text labels, arrows, and reference lines to highlight key findings
Save publication-quality figures to files in multiple formats (PNG, SVG, PDF) with appropriate resolution

In This Chapter

Chapter Overview
15.1 The Two Interfaces: pyplot vs. Object-Oriented
15.2 Your First Plot: A Line Chart
15.3 Bar Charts: Comparing Categories
15.4 Scatter Plots: Exploring Relationships
15.5 Histograms: Understanding Distributions
15.6 Customization: Making Charts Your Own
15.7 Multi-Panel Figures with Subplots
15.8 Annotations: Highlighting What Matters
15.9 Working with pandas DataFrames
15.10 Saving Figures: Publication-Quality Output
15.11 Putting It All Together: A Complete Workflow
15.12 Common Mistakes and How to Fix Them
15.13 Quick Reference: The matplotlib Cheat Sheet
15.14 Chapter Summary
Key Vocabulary Summary



"matplotlib tries to make easy things easy and hard things possible." — John D. Hunter, creator of matplotlib

Chapter Overview

In Chapter 14, you learned to think about charts — the grammar of graphics, chart selection, sketching on paper, Tufte's principles, and the difference between exploratory and explanatory visualization. You designed chart plans on paper without writing a line of code.

Now the code begins.

matplotlib is Python's most widely used plotting library. It was created by John D. Hunter in 2003, inspired by MATLAB's plotting interface, and has since become the foundation for much of Python's visualization ecosystem. seaborn (Chapter 16) is built on top of matplotlib, and pandas' built-in plotting calls matplotlib under the hood. plotly (Chapter 17) is a separate library with its own rendering engine, but the ideas you learn here (figures, axes, and mapping data to visual marks) carry over to it.

Learning matplotlib is like learning to drive a manual transmission. It's more work than an automatic: you have to understand the clutch, the gears, the coordination. But once you know how it works, you understand what's happening, and you can handle any situation. seaborn and plotly are the automatic transmissions that make common tasks easier, and matplotlib is the engine that seaborn is built on.

This chapter teaches matplotlib's object-oriented interface — the Figure and Axes approach — rather than the simpler but less flexible pyplot shortcut. The object-oriented interface gives you full control over every element of your chart, and it's what you'll need for professional-quality work.

In this chapter, you will learn to:

Create line plots, bar charts, scatter plots, and histograms using matplotlib's object-oriented interface (all paths)
Customize charts with titles, axis labels, legends, colors, and gridlines using Axes methods (all paths)
Construct multi-panel figures using subplots to compare related views of the same data (all paths)
Annotate charts with text labels, arrows, and reference lines to highlight key findings (standard + deep dive paths)
Save publication-quality figures to files in multiple formats (PNG, SVG, PDF) with appropriate resolution (all paths)

Note — Learning path annotations: Objectives marked (all paths) are essential for every reader. Those marked (standard + deep dive) can be skimmed on the Fast Track but are important for deeper understanding. See "How to Use This Book" for full path descriptions.

15.1 The Two Interfaces: pyplot vs. Object-Oriented

Before we write any code, we need to clear up a common source of confusion. matplotlib has two interfaces for creating charts, and you'll encounter both in tutorials, Stack Overflow answers, and other people's code.

The pyplot Interface (Quick but Limited)

The pyplot interface uses functions from matplotlib.pyplot (conventionally imported as plt) that operate on a "current figure" and "current axes" behind the scenes:

import matplotlib.pyplot as plt

plt.plot([1, 2, 3, 4], [10, 20, 25, 30])
plt.title("Sales Over Time")
plt.xlabel("Quarter")
plt.ylabel("Revenue ($K)")
plt.show()

This works, and for a quick exploratory chart, it's fine. But notice that you never explicitly created a figure or an axes. matplotlib did it for you, invisibly. That implicit state management becomes a problem when you want multiple panels, when you want to modify specific parts of the chart, or when you're working in a script rather than a notebook.
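To see that hidden state for yourself, you can ask pyplot for the objects it created on your behalf: plt.gcf() ("get current figure") and plt.gca() ("get current axes") return them. This sketch selects the non-interactive Agg backend, an assumption made only so it runs without a display window.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no window needed
import matplotlib.pyplot as plt

plt.plot([1, 2, 3, 4], [10, 20, 25, 30])
plt.title("Sales Over Time")

# pyplot created a Figure and an Axes behind the scenes; gcf()/gca() return them
fig = plt.gcf()
ax = plt.gca()

print(len(fig.axes))    # 1
print(ax.get_title())   # Sales Over Time  (plt.title() called set_title() on this Axes)
```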

The Object-Oriented Interface (Explicit and Powerful)

The object-oriented (OO) interface makes everything explicit. You create a Figure (the canvas) and one or more Axes (the coordinate systems where data is plotted), then call methods on those objects:

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4], [10, 20, 25, 30])
ax.set_title("Sales Over Time")
ax.set_xlabel("Quarter")
ax.set_ylabel("Revenue ($K)")
plt.show()

The output is identical, but now you have explicit references to the fig (figure) and ax (axes) objects. You can pass ax to functions. You can have multiple axes on one figure. You always know exactly what you're modifying.
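Because ax is an ordinary object, you can write reusable helpers that draw into whichever Axes you hand them. The function plot_trend below is a hypothetical helper written for illustration; it is not part of matplotlib.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt


def plot_trend(ax, x, y, title):
    """Hypothetical helper: draw a titled line chart on the Axes it receives."""
    line, = ax.plot(x, y, marker="o")
    ax.set_title(title)
    return line


fig, (left, right) = plt.subplots(1, 2, figsize=(10, 4))
plot_trend(left, [1, 2, 3, 4], [10, 20, 25, 30], "Product A")
plot_trend(right, [1, 2, 3, 4], [30, 22, 18, 15], "Product B")
```

Each call touches only the Axes passed to it, so the result never depends on which Axes pyplot considers "current."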

For this entire book, we will use the object-oriented interface. It takes a few more characters of code, but it gives you full control and scales to complex visualizations. When you see matplotlib code online using plt.plot() without an explicit fig, ax, know that it's using the pyplot shortcut — it works, but it's not what we'll teach here.

The mental model: Think of Figure as a blank piece of paper and Axes as a rectangular region on that paper where a chart is drawn. One piece of paper can hold one chart (one Axes) or many charts (multiple Axes — i.e., subplots). The Figure holds everything; the Axes is where the data lives.

15.2 Your First Plot: A Line Chart

Let's build a line chart step by step. We'll start with Elena's vaccination data — a simple time series of vaccination rates over several years for a single country.

import matplotlib.pyplot as plt

years = [2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023]
rates = [72, 74, 76, 78, 79, 71, 75, 80, 83]

fig, ax = plt.subplots()
ax.plot(years, rates)
plt.show()

That's your first chart: a line connecting nine points, with years on the x-axis and vaccination rates on the y-axis. matplotlib chose a default color (blue), default axis ranges, and default tick marks. For exploratory work, this is perfectly adequate. You can already see that rates generally increased, dipped in 2020, and recovered.

Now let's improve it — applying what we learned in Chapter 14 about explanatory visualization.

Adding Labels and Title

fig, ax = plt.subplots(figsize=(8, 5))
ax.plot(years, rates)
ax.set_title("Vaccination Rate Dipped in 2020 but Recovered by 2023")
ax.set_xlabel("Year")
ax.set_ylabel("Vaccination Rate (%)")
plt.show()

Notice three things:

figsize=(8, 5) sets the figure size in inches (width, height). The default is usually (6.4, 4.8), which is often too small for readability.
set_title() receives a descriptive title that states the finding, not just the topic — following Tufte's advice from Chapter 14.
set_xlabel() and set_ylabel() add axis labels with units. Always include units.
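To check what figsize did, you can read the size back from the Figure. The sketch also prints the default that pyplot uses when figsize is omitted, taken from plt.rcParams; your value may differ if a style sheet or matplotlibrc file changes it.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

# Size used when figsize is omitted (matplotlib ships with [6.4, 4.8])
print(plt.rcParams["figure.figsize"])

fig, ax = plt.subplots(figsize=(8, 5))
width, height = fig.get_size_inches()
print(width, height)  # 8.0 5.0
```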

Customizing the Line

The plot() method accepts keyword arguments to control the line's appearance:

fig, ax = plt.subplots(figsize=(8, 5))
ax.plot(years, rates, color="steelblue", linewidth=2,
        marker="o", markersize=6)
ax.set_title("Vaccination Rate Dipped in 2020 but Recovered by 2023")
ax.set_xlabel("Year")
ax.set_ylabel("Vaccination Rate (%)")
plt.show()

color="steelblue": A named color. matplotlib supports hundreds of named colors, hex codes ("#4682B4"), and RGB tuples.
linewidth=2: Thicker line for visibility.
marker="o": Circles at each data point. Other options: "s" (square), "^" (triangle), "D" (diamond), "x" (x-mark).
markersize=6: Size of the marker circles.
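ax.plot() returns a list of Line2D objects, one per line drawn. Keeping a reference lets you inspect or restyle a line after it has been created, instead of redrawing it. A minimal sketch:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

years = [2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023]
rates = [72, 74, 76, 78, 79, 71, 75, 80, 83]

fig, ax = plt.subplots(figsize=(8, 5))
# The trailing comma unpacks the one-element list that plot() returns
line, = ax.plot(years, rates, color="steelblue", linewidth=2,
                marker="o", markersize=6)

print(line.get_marker(), line.get_linewidth())  # o 2.0

# Restyle the existing line in place
line.set_linewidth(3)
line.set_marker("s")
```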

Setting Axis Ranges

matplotlib auto-scales axes to fit your data, but sometimes you want explicit control:

fig, ax = plt.subplots(figsize=(8, 5))
ax.plot(years, rates, color="steelblue", linewidth=2, marker="o")
ax.set_xlim(2014, 2024)
ax.set_ylim(0, 100)
ax.set_title("Vaccination Rate Dipped in 2020 but Recovered by 2023")
ax.set_xlabel("Year")
ax.set_ylabel("Vaccination Rate (%)")
plt.show()

Remember from Chapter 14: for a line chart, starting the y-axis at zero is optional (lines encode position, not length). But if you want to show the rates in context of the 0-100% range, setting set_ylim(0, 100) makes sense. For a closer look at the trend, you might use set_ylim(65, 90) instead.
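For the closer view, one simple approach is to pad the data's own minimum and maximum. The 5-point padding below is an arbitrary choice for this sketch, not a matplotlib rule.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

years = [2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023]
rates = [72, 74, 76, 78, 79, 71, 75, 80, 83]

pad = 5  # arbitrary headroom, in percentage points

fig, ax = plt.subplots(figsize=(8, 5))
ax.plot(years, rates, color="steelblue", linewidth=2, marker="o")
ax.set_ylim(min(rates) - pad, max(rates) + pad)  # 66 to 88
ax.set_xlabel("Year")
ax.set_ylabel("Vaccination Rate (%)")
```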

15.3 Bar Charts: Comparing Categories

Bar charts are the workhorse of categorical comparison. Let's compare vaccination rates across WHO regions.

import matplotlib.pyplot as plt

regions = ["Africa", "Americas", "South-East\nAsia",
           "Europe", "Eastern\nMed", "Western\nPacific"]
rates = [48, 79, 82, 91, 73, 88]

fig, ax = plt.subplots(figsize=(9, 5))
ax.bar(regions, rates, color="steelblue")
ax.set_title("Africa Trails Other WHO Regions in Vaccination Coverage")
ax.set_xlabel("WHO Region")
ax.set_ylabel("Vaccination Rate (%)")
ax.set_ylim(0, 100)
plt.show()

Key points:

ax.bar(categories, values) creates a vertical bar chart.
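set_ylim(0, 100) pins the axis to the full 0-100% range. Bars encode values as lengths, so a bar chart's axis must start at zero. This is not optional. matplotlib's autoscaling already includes zero for bars with positive values, so the real danger is a manual set_ylim() call with a lower bound above zero.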
\n in category labels: Newlines in strings break long labels across two lines, preventing them from overlapping.

Horizontal Bar Charts

When category labels are long, horizontal bars are more readable:

fig, ax = plt.subplots(figsize=(8, 5))
ax.barh(regions, rates, color="steelblue")
ax.set_title("Vaccination Rates by WHO Region, 2023")
ax.set_xlabel("Vaccination Rate (%)")
ax.set_xlim(0, 100)
plt.show()

ax.barh() swaps the axes — categories on y, values on x. Labels read naturally from left to right.
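One detail to watch: barh() draws the first category in the list at the bottom of the chart. If you want the bars to read top-down in list order, invert the y-axis. This sketch spells out the full region names (an illustrative choice), since horizontal bars leave room for long labels.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

regions = ["Africa", "Americas", "South-East Asia",
           "Europe", "Eastern Mediterranean", "Western Pacific"]
rates = [48, 79, 82, 91, 73, 88]

fig, ax = plt.subplots(figsize=(8, 5))
ax.barh(regions, rates, color="steelblue")
ax.invert_yaxis()  # the first category ("Africa") now sits at the top
ax.set_xlabel("Vaccination Rate (%)")
ax.set_xlim(0, 100)

bottom, top = ax.get_ylim()  # inverted axis: bottom value is larger than top
```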

Highlighting a Specific Bar

To draw attention to one bar (say, Africa), use a different color:

colors = ["darkorange" if r == "Africa" else "steelblue"
          for r in regions]

fig, ax = plt.subplots(figsize=(9, 5))
ax.bar(regions, rates, color=colors)
ax.set_title("Africa's Vaccination Rate Is More Than 40 Points Below Europe's")
ax.set_xlabel("WHO Region")
ax.set_ylabel("Vaccination Rate (%)")
ax.set_ylim(0, 100)
plt.show()

This uses a list comprehension to assign "darkorange" to Africa and "steelblue" to everyone else. The orange bar pops out through pre-attentive color processing — exactly the kind of deliberate design choice we discussed in Chapter 14.

Adding Value Labels on Bars

For explanatory charts, labeling bar values directly is often clearer than relying on the y-axis alone:

fig, ax = plt.subplots(figsize=(9, 5))
bars = ax.bar(regions, rates, color="steelblue")

for bar, rate in zip(bars, rates):
    ax.text(bar.get_x() + bar.get_width() / 2, bar.get_height() + 1,
            f"{rate}%", ha="center", va="bottom", fontsize=10)

ax.set_title("Vaccination Rates by WHO Region, 2023")
ax.set_xlabel("WHO Region")
ax.set_ylabel("Vaccination Rate (%)")
ax.set_ylim(0, 105)
plt.show()

The loop places a text label above each bar. ha="center" centers the text horizontally over the bar; va="bottom" places it just above the specified y coordinate.
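matplotlib 3.4 and later also provide ax.bar_label(), which does the positioning arithmetic for you. A sketch, assuming a recent matplotlib; fmt is applied to each bar's value, and "%%" produces a literal percent sign.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

regions = ["Africa", "Americas", "South-East\nAsia",
           "Europe", "Eastern\nMed", "Western\nPacific"]
rates = [48, 79, 82, 91, 73, 88]

fig, ax = plt.subplots(figsize=(9, 5))
bars = ax.bar(regions, rates, color="steelblue")
# padding is the gap, in points, between the bar end and its label
labels = ax.bar_label(bars, fmt="%d%%", padding=3)
ax.set_title("Vaccination Rates by WHO Region, 2023")
ax.set_ylabel("Vaccination Rate (%)")
ax.set_ylim(0, 105)
```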

15.4 Scatter Plots: Exploring Relationships

Scatter plots reveal relationships between two continuous variables. Let's plot GDP per capita against vaccination rate for a set of countries.

import matplotlib.pyplot as plt

# Sample data: 12 countries
gdp = [1200, 3500, 8200, 12000, 15000, 22000,
       28000, 35000, 42000, 48000, 55000, 62000]
vacc = [52, 61, 68, 74, 78, 83, 86, 89, 91, 90, 93, 95]

fig, ax = plt.subplots(figsize=(8, 6))
ax.scatter(gdp, vacc, color="steelblue", s=60, alpha=0.7)
ax.set_title("Higher GDP Countries Tend to Have Higher Vaccination Rates")
ax.set_xlabel("GDP per Capita (USD)")
ax.set_ylabel("Vaccination Rate (%)")
plt.show()

ax.scatter(x, y): Places one point per (x, y) pair.
s=60: Marker area, in points squared (not the diameter). Larger values make bigger dots, but doubling s does not double the dot's width.
alpha=0.7: Transparency. When points overlap, transparency lets you see the density. Alpha ranges from 0 (invisible) to 1 (fully opaque).

Color-Coding by a Third Variable

You can encode a third variable using color:

import matplotlib.pyplot as plt

gdp = [1200, 3500, 8200, 12000, 15000, 22000,
       28000, 35000, 42000, 48000, 55000, 62000]
vacc = [52, 61, 68, 74, 78, 83, 86, 89, 91, 90, 93, 95]
population = [45, 120, 30, 85, 15, 60, 25, 10, 35, 50, 8, 20]

fig, ax = plt.subplots(figsize=(8, 6))
scatter = ax.scatter(gdp, vacc, c=population, s=80,
                     cmap="YlOrRd", alpha=0.8, edgecolors="gray")
fig.colorbar(scatter, ax=ax, label="Population (millions)")
ax.set_title("GDP, Vaccination Rates, and Population")
ax.set_xlabel("GDP per Capita (USD)")
ax.set_ylabel("Vaccination Rate (%)")
plt.show()

c=population: Maps the population values to colors.
cmap="YlOrRd": A colormap — a gradient palette from yellow through orange to red. matplotlib has many colormaps: "viridis" (default, perceptually uniform), "Blues", "coolwarm" (diverging), etc.
fig.colorbar(scatter, ...): Adds a color legend showing what the colors mean.
edgecolors="gray": Adds a thin gray border around each point for visual separation.
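Behind the scenes, c= and cmap= work in two steps: the values are rescaled to the 0-1 range, then the colormap turns each fraction into an RGBA color. With its default linear normalization, scatter() rescales between the data's minimum and maximum; this sketch reproduces that step explicitly with matplotlib.colors.Normalize.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize

population = [45, 120, 30, 85, 15, 60, 25, 10, 35, 50, 8, 20]

# Step 1: linear rescaling, (value - vmin) / (vmax - vmin), here from 8 to 120
norm = Normalize(vmin=min(population), vmax=max(population))
# Step 2: look up each fraction in the colormap
cmap = plt.get_cmap("YlOrRd")

for p in (8, 64, 120):
    # fraction in [0, 1], then an (r, g, b, a) tuple
    print(p, float(norm(p)), cmap(norm(p)))
```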

Encoding Size (Bubble Chart)

You can also vary point size to encode a fourth variable:

fig, ax = plt.subplots(figsize=(8, 6))
sizes = [p * 5 for p in population]  # Scale for visibility
ax.scatter(gdp, vacc, s=sizes, color="steelblue",
           alpha=0.6, edgecolors="navy")
ax.set_title("Larger Bubbles = Larger Population")
ax.set_xlabel("GDP per Capita (USD)")
ax.set_ylabel("Vaccination Rate (%)")
plt.show()

Remember Cleveland and McGill's hierarchy from Chapter 14: area is less accurately perceived than position or length. Bubble charts are useful for getting a general sense of a third variable, but readers shouldn't rely on precise size comparisons.
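Part of the difficulty is that s sets a marker's area, in points squared, not its width. Because sizes = p * 5 makes area proportional to population, a country with four times the population gets a bubble only about twice as wide. The worked arithmetic, using two illustrative population values:

```python
import math

population = [45, 120, 30, 85, 15, 60, 25, 10, 35, 50, 8, 20]
sizes = [p * 5 for p in population]  # marker areas in points^2, as above

# Width grows with the square root of area
small, large = 10, 40               # populations in millions, 4x apart
d_small = math.sqrt(small * 5)      # about 7.1 points across
d_large = math.sqrt(large * 5)      # about 14.1 points across
print(round(d_large / d_small, 2))  # 2.0
```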

15.5 Histograms: Understanding Distributions

Histograms show the distribution of a single continuous variable — one of the most common exploratory visualizations.

import random

import matplotlib.pyplot as plt

# Simulated vaccination rates for 150 countries
random.seed(42)
vacc_rates = [random.gauss(75, 15) for _ in range(150)]
vacc_rates = [max(0, min(100, r)) for r in vacc_rates]  # clamp to the valid 0-100% range

fig, ax = plt.subplots(figsize=(8, 5))
ax.hist(vacc_rates, bins=20, color="steelblue",
        edgecolor="white", alpha=0.8)
ax.set_title("Global Vaccination Rates Are Roughly Normally Distributed")
ax.set_xlabel("Vaccination Rate (%)")
ax.set_ylabel("Number of Countries")
plt.show()

bins=20: Divides the data range into 20 equal-width intervals. More bins show more detail but can be noisy; fewer bins smooth the picture but may hide structure. A good starting range is 15-30 for most datasets.
edgecolor="white": Adds white borders between bars so you can distinguish them.
alpha=0.8: Slight transparency.

Choosing the Number of Bins

There's no single "correct" number of bins. Here's a practical approach: try several values and see how the shape changes.

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

for ax, n_bins in zip(axes, [10, 20, 40]):
    ax.hist(vacc_rates, bins=n_bins, color="steelblue",
            edgecolor="white")
    ax.set_title(f"{n_bins} bins")
    ax.set_xlabel("Vaccination Rate (%)")
    ax.set_ylabel("Count")

fig.suptitle("Effect of Bin Count on Histogram Appearance", fontsize=14)
fig.tight_layout()
plt.show()

This creates three side-by-side histograms with different bin counts — a small multiples comparison. We'll explore subplots() in detail in Section 15.7.
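Besides an integer, bins accepts the name of a binning rule, which ax.hist() passes through to NumPy. For example, "auto" takes the smaller bin width suggested by the Sturges and Freedman-Diaconis rules. np.histogram_bin_edges() lets you see what a rule chooses before plotting:

```python
import random

import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
import numpy as np

# Same simulated data as above
random.seed(42)
vacc_rates = [max(0, min(100, random.gauss(75, 15))) for _ in range(150)]

for rule in ("sturges", "fd", "auto"):
    edges = np.histogram_bin_edges(vacc_rates, bins=rule)
    print(rule, len(edges) - 1, "bins")

fig, ax = plt.subplots(figsize=(8, 5))
counts, edges, patches = ax.hist(vacc_rates, bins="auto",
                                 color="steelblue", edgecolor="white")
```

Treat the rule's choice as a starting point; the side-by-side comparison above is still the best way to see what different bin counts reveal.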

15.6 Customization: Making Charts Your Own

Now that you know the four basic chart types, let's learn the customization tools that transform rough exploratory charts into polished explanatory ones.

Colors

matplotlib accepts colors in many formats:

import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [10, 20, 25, 30]
fig, ax = plt.subplots()

# Named colors
ax.plot(x, y, color="steelblue")
ax.plot(x, y, color="tomato")

# Hex codes
ax.plot(x, y, color="#2C3E50")

# RGB tuples (0-1 range)
ax.plot(x, y, color=(0.2, 0.4, 0.6))

For a list of all named colors, search "matplotlib named colors" in the documentation. For exploratory work, names like "steelblue", "tomato", "seagreen", "slategray", and "goldenrod" are readable and visually distinct.
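All three spellings resolve to the same underlying RGBA value, which you can check with the helpers in matplotlib.colors. The same module holds the CSS4 named colors as a dictionary mapping names to hex strings.

```python
import matplotlib.colors as mcolors

print(mcolors.to_hex("steelblue"))                                 # #4682b4
print(mcolors.to_rgba("#4682B4") == mcolors.to_rgba("steelblue"))  # True
print(mcolors.to_rgba((0.2, 0.4, 0.6)))                            # (0.2, 0.4, 0.6, 1.0)

# Browse the available names programmatically
css_names = sorted(mcolors.CSS4_COLORS)
print(len(css_names), css_names[:3])
```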

Gridlines

Gridlines help readers estimate values but should be subtle:

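# The 2015-2023 series from Section 15.2 (Section 15.3 reused `rates` for regions)
yearly_rates = [72, 74, 76, 78, 79, 71, 75, 80, 83]

fig, ax = plt.subplots(figsize=(8, 5))
ax.plot(years, yearly_rates, color="steelblue", linewidth=2, marker="o")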
ax.grid(True, alpha=0.3, linestyle="--")
ax.set_title("Vaccination Rate Over Time")
ax.set_xlabel("Year")
ax.set_ylabel("Vaccination Rate (%)")
plt.show()

ax.grid(True): Turns gridlines on.
alpha=0.3: Makes them very faint — following Tufte's principle that gridlines should support, not compete with, the data.
linestyle="--": Dashed lines are less visually heavy than solid lines.

Legends

When multiple data series appear on the same chart, a legend identifies them:

fig, ax = plt.subplots(figsize=(8, 5))
ax.plot(years, [72, 74, 76, 78, 79, 71, 75, 80, 83],
        label="Country A", color="steelblue", linewidth=2)
ax.plot(years, [85, 86, 87, 88, 88, 84, 86, 89, 91],
        label="Country B", color="tomato", linewidth=2)
ax.legend(frameon=False, fontsize=11)
ax.set_title("Country A Is Closing the Gap with Country B")
ax.set_xlabel("Year")
ax.set_ylabel("Vaccination Rate (%)")
plt.show()

label="Country A": The text that appears in the legend for this line.
ax.legend(): Displays the legend. matplotlib automatically places it where it overlaps the data least.
frameon=False: Removes the legend box border — less clutter, following Tufte.
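ax.legend() collects every artist that was given a label. Artists without a label, or whose label starts with an underscore, are left out, which is handy for reference lines you don't want listed. A minimal sketch:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

years = list(range(2015, 2024))

fig, ax = plt.subplots(figsize=(8, 5))
ax.plot(years, [72, 74, 76, 78, 79, 71, 75, 80, 83],
        label="Country A", color="steelblue", linewidth=2)
ax.plot(years, [85, 86, 87, 88, 88, 84, 86, 89, 91],
        label="Country B", color="tomato", linewidth=2)
ax.axhline(80, color="gray", linestyle=":")  # no label, so not in the legend

legend = ax.legend(frameon=False, fontsize=11)
entries = [t.get_text() for t in legend.get_texts()]
print(entries)  # ['Country A', 'Country B']
```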

Removing Spines

Spines are the borders around the plot area. Removing the top and right spines makes charts cleaner:

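# The 2015-2023 series from Section 15.2 (Section 15.3 reused `rates` for regions)
yearly_rates = [72, 74, 76, 78, 79, 71, 75, 80, 83]

fig, ax = plt.subplots(figsize=(8, 5))
ax.plot(years, yearly_rates, color="steelblue", linewidth=2, marker="o")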
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
ax.set_title("A Cleaner Look: Top and Right Spines Removed")
ax.set_xlabel("Year")
ax.set_ylabel("Vaccination Rate (%)")
plt.show()

This small change makes a noticeable difference in visual cleanliness. Many professional data visualization styles (FiveThirtyEight, The Economist) remove unnecessary spines.
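If you want this look on every chart, you don't have to repeat the two spine lines each time: the rcParams "axes.spines.top" and "axes.spines.right" set the default for newly created Axes. This sketch uses plt.rc_context() so the change applies only inside the with block.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

# Temporarily change the defaults; they revert when the with block ends
with plt.rc_context({"axes.spines.top": False, "axes.spines.right": False}):
    fig, ax = plt.subplots(figsize=(8, 5))
    ax.plot([2015, 2016, 2017], [72, 74, 76], color="steelblue")

print(ax.spines["top"].get_visible(), ax.spines["left"].get_visible())  # False True
```

To make the change for a whole session instead, plt.rcParams.update() accepts the same dictionary.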

15.7 Multi-Panel Figures with Subplots

One of matplotlib's most powerful features is the ability to create multi-panel figures — what Chapter 14 called faceting or small multiples.

Basic Subplots

fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# Left panel: bar chart
axes[0].bar(regions, rates, color="steelblue")
axes[0].set_title("Vaccination Rates by Region")
axes[0].set_ylabel("Rate (%)")
axes[0].set_ylim(0, 100)

# Right panel: histogram
axes[1].hist(vacc_rates, bins=20, color="seagreen", edgecolor="white")
axes[1].set_title("Distribution of Country-Level Rates")
axes[1].set_xlabel("Vaccination Rate (%)")
axes[1].set_ylabel("Count")

fig.suptitle("Two Views of Global Vaccination Data", fontsize=14)
fig.tight_layout()
plt.show()

plt.subplots(1, 2) creates a figure with 1 row and 2 columns of axes.
axes[0] and axes[1] are the left and right panels.
fig.suptitle() adds a title for the entire figure (above all panels).
fig.tight_layout() automatically adjusts spacing so titles and labels don't overlap. Always call this.
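What plt.subplots() returns depends on the grid you ask for: a single Axes for one panel, a 1-D NumPy array for a single row or column, and a 2-D array for a grid. Passing squeeze=False always returns a 2-D array, which can simplify code that must handle any layout.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
from matplotlib.axes import Axes

fig1, ax = plt.subplots()                            # a single Axes, not an array
fig2, row = plt.subplots(1, 2)                       # 1-D array, shape (2,)
fig3, grid = plt.subplots(2, 2)                      # 2-D array, shape (2, 2)
fig4, always_2d = plt.subplots(1, 2, squeeze=False)  # 2-D even for one row

print(row.shape, grid.shape, always_2d.shape)  # (2,) (2, 2) (1, 2)
```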

Grid Layouts

For more panels, use a grid:

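# The 2015-2023 series from Section 15.2 (`rates` now holds the six regional values)
yearly_rates = [72, 74, 76, 78, 79, 71, 75, 80, 83]

fig, axes = plt.subplots(2, 2, figsize=(12, 10))

# Top-left
axes[0, 0].plot(years, yearly_rates, color="steelblue", marker="o")
axes[0, 0].set_title("Trend Over Time")

# Top-right: the first four regions in list order
axes[0, 1].bar(regions[:4], rates[:4], color="tomato")
axes[0, 1].set_title("First Four Regions")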

# Bottom-left
axes[1, 0].scatter(gdp, vacc, color="seagreen", s=60)
axes[1, 0].set_title("GDP vs. Vaccination")

# Bottom-right
axes[1, 1].hist(vacc_rates, bins=20, color="goldenrod", edgecolor="white")
axes[1, 1].set_title("Rate Distribution")

fig.suptitle("Four Views of Global Health Data", fontsize=14, y=1.01)
fig.tight_layout()
plt.show()

With a 2D grid, axes[row, col] indexes the panels. Row 0 is top, row 1 is bottom. Column 0 is left, column 1 is right.
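When every panel gets the same treatment, you don't need row and column indices: axes.flat walks through the 2-D array in reading order (left to right, then top to bottom). A sketch with placeholder titles:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

fig, axes = plt.subplots(2, 2, figsize=(12, 10))
titles = ["Top-left", "Top-right", "Bottom-left", "Bottom-right"]

for ax, title in zip(axes.flat, titles):
    ax.set_title(title)
    ax.spines["top"].set_visible(False)
    ax.spines["right"].set_visible(False)

fig.tight_layout()
```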

When comparing the same variable across panels, sharing axis scales ensures honest comparison:

fig, axes = plt.subplots(1, 3, figsize=(15, 4), sharey=True)

country_data = {
    "Country A": [72, 74, 76, 78, 79, 71, 75, 80, 83],
    "Country B": [85, 86, 87, 88, 88, 84, 86, 89, 91],
    "Country C": [40, 42, 45, 48, 50, 47, 52, 55, 60],
}

for ax, (name, data) in zip(axes, country_data.items()):
    ax.plot(years, data, color="steelblue", linewidth=2, marker="o")
    ax.set_title(name)
    ax.set_xlabel("Year")

axes[0].set_ylabel("Vaccination Rate (%)")
fig.suptitle("Three Countries, Three Trajectories", fontsize=14)
fig.tight_layout()
plt.show()

sharey=True: All three panels share the same y-axis range. This is critical for honest comparison — without shared axes, panels auto-scale independently and a modest change could look just as dramatic as a large one.
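You can verify the difference by comparing each panel's y-limits with and without sharey. The sketch reuses the three series above.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

years = list(range(2015, 2024))
country_data = {
    "Country A": [72, 74, 76, 78, 79, 71, 75, 80, 83],
    "Country B": [85, 86, 87, 88, 88, 84, 86, 89, 91],
    "Country C": [40, 42, 45, 48, 50, 47, 52, 55, 60],
}

def draw(share):
    """Plot the three countries and return each panel's (bottom, top) y-limits."""
    fig, axes = plt.subplots(1, 3, figsize=(15, 4), sharey=share)
    for ax, (name, data) in zip(axes, country_data.items()):
        ax.plot(years, data, color="steelblue", linewidth=2, marker="o")
        ax.set_title(name)
    return [ax.get_ylim() for ax in axes]

shared = draw(True)        # one common range covering 40 to 91
independent = draw(False)  # each panel zooms to its own data
print(shared)
print(independent)
```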

15.8 Annotations: Highlighting What Matters

Annotations turn a chart from "here's some data" into "here's what you should notice." They're the difference between exploratory and explanatory visualization.

Text Annotations

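# The 2015-2023 series from Section 15.2 (Section 15.3 reused `rates` for regions)
yearly_rates = [72, 74, 76, 78, 79, 71, 75, 80, 83]

fig, ax = plt.subplots(figsize=(8, 5))
ax.plot(years, yearly_rates, color="steelblue", linewidth=2, marker="o")
ax.set_ylim(60, 90)  # leave room below the dip for the annotation text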

# Annotate the 2020 dip
ax.annotate("COVID-19\npandemic dip",
            xy=(2020, 71), xytext=(2017.5, 65),
            fontsize=10, color="tomato",
            arrowprops=dict(arrowstyle="->", color="tomato"),
            ha="center")

ax.set_title("Vaccination Rate Dipped in 2020 but Recovered by 2023")
ax.set_xlabel("Year")
ax.set_ylabel("Vaccination Rate (%)")
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
plt.show()

ax.annotate(text, xy=..., xytext=...): Places annotation text at xytext with an arrow pointing to xy (the data point).
arrowprops=dict(arrowstyle="->", color="tomato"): Draws a red arrow from the text to the point.
ha="center": Horizontally centers the text.
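By default, xytext is in data coordinates. The textcoords parameter changes that: with "offset points", xytext becomes a distance in typographic points from xy, so the label stays beside its point even if you later change the axis limits. The offsets below are arbitrary choices for this sketch.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

years = [2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023]
rates = [72, 74, 76, 78, 79, 71, 75, 80, 83]

fig, ax = plt.subplots(figsize=(8, 5))
ax.plot(years, rates, color="steelblue", linewidth=2, marker="o")
ax.set_ylim(60, 90)  # leave room below the dip for the label

# xytext is now an offset from xy: 60 points left, 30 points down
ann = ax.annotate("COVID-19\npandemic dip",
                  xy=(2020, 71), xytext=(-60, -30),
                  textcoords="offset points",
                  fontsize=10, color="tomato", ha="center",
                  arrowprops=dict(arrowstyle="->", color="tomato"))
```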

Reference Lines

Horizontal or vertical reference lines provide context:

fig, ax = plt.subplots(figsize=(9, 5))
ax.bar(regions, rates, color="steelblue")

# Reference line at the unweighted mean of the six regional rates
# (not weighted by population, so it is not a true global vaccination rate)
global_avg = sum(rates) / len(rates)
ax.axhline(y=global_avg, color="tomato", linestyle="--",
           linewidth=1.5, label=f"Average of Regions ({global_avg:.0f}%)")

ax.legend(frameon=False)
ax.set_title("Two Regions Fall Below the Average of All Six Regions")
ax.set_xlabel("WHO Region")
ax.set_ylabel("Vaccination Rate (%)")
ax.set_ylim(0, 100)
plt.show()

ax.axhline(y=...): Draws a horizontal line across the entire chart at the specified y value.
ax.axvline(x=...): Draws a vertical line (useful for marking specific dates on a time series).
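A title that states a finding should be checked against the data before you publish it. This sketch recomputes the unweighted mean of the six regional rates used for the reference line and lists the regions that fall below it.

```python
regions = ["Africa", "Americas", "South-East\nAsia",
           "Europe", "Eastern\nMed", "Western\nPacific"]
rates = [48, 79, 82, 91, 73, 88]

global_avg = sum(rates) / len(rates)  # unweighted mean of the regional rates
below = [region for region, rate in zip(regions, rates) if rate < global_avg]

print(round(global_avg, 1))  # 76.8
print(below)                 # ['Africa', 'Eastern\nMed']
```

Only two of the six regions fall below this average; the Americas (79%) sit just above it.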

Combining Annotations

A polished explanatory chart often combines several annotation techniques:

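# The 2015-2023 series from Section 15.2 (Section 15.3 reused `rates` for regions)
yearly_rates = [72, 74, 76, 78, 79, 71, 75, 80, 83]

fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(years, yearly_rates, color="steelblue", linewidth=2.5, marker="o",
        markersize=7, zorder=3)
ax.set_ylim(60, 92)  # make room for annotation text above and below the line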

# Reference line at 80%
ax.axhline(y=80, color="gray", linestyle=":", alpha=0.5)
ax.text(2023.3, 80.5, "80% target", fontsize=9, color="gray")

# Annotate the dip
ax.annotate("2020: pandemic\ndisrupts vaccination",
            xy=(2020, 71), xytext=(2017, 63),
            fontsize=10, color="tomato",
            arrowprops=dict(arrowstyle="->", color="tomato"))

# Annotate the recovery
ax.annotate("2023: full recovery\nand new high",
            xy=(2023, 83), xytext=(2021, 88),
            fontsize=10, color="seagreen",
            arrowprops=dict(arrowstyle="->", color="seagreen"))

ax.set_title("Vaccination Rates Recovered from Pandemic Dip to Reach New Highs",
             fontsize=13, fontweight="bold")
ax.set_xlabel("Year")
ax.set_ylabel("Vaccination Rate (%)")
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
ax.grid(True, alpha=0.2)
fig.tight_layout()
plt.show()

This chart tells a story: there was a dip, here's why, and here's the recovery. Every annotation earns its place by contributing to the narrative.

15.9 Working with pandas DataFrames

In practice, your data lives in pandas DataFrames, not in Python lists. matplotlib works seamlessly with pandas:

import pandas as pd
import matplotlib.pyplot as plt

# Create a small DataFrame
df = pd.DataFrame({
    "region": ["Africa", "Americas", "SE Asia",
               "Europe", "E Med", "W Pacific"],
    "rate_2020": [42, 73, 78, 87, 68, 84],
    "rate_2023": [48, 79, 82, 91, 73, 88],
})

fig, ax = plt.subplots(figsize=(9, 5))
ax.bar(df["region"], df["rate_2023"], color="steelblue")
ax.set_title("Vaccination Rates by Region, 2023")
ax.set_ylabel("Vaccination Rate (%)")
ax.set_ylim(0, 100)
fig.tight_layout()
plt.show()

You can pass DataFrame columns directly to ax.bar(), ax.scatter(), ax.plot(), and ax.hist(). They're just arrays of values, and matplotlib doesn't care whether they come from a list, a NumPy array, or a pandas Series.
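Many Axes plotting methods also accept a data= argument. When it is given, string arguments are looked up as keys in that object (here, DataFrame column names), which keeps calls short. A sketch using the DataFrame above:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "region": ["Africa", "Americas", "SE Asia",
               "Europe", "E Med", "W Pacific"],
    "rate_2020": [42, 73, 78, 87, 68, 84],
    "rate_2023": [48, 79, 82, 91, 73, 88],
})

fig, ax = plt.subplots(figsize=(6, 6))
# "rate_2020" and "rate_2023" are resolved as columns of df
points = ax.scatter("rate_2020", "rate_2023", data=df, color="steelblue")
ax.set_xlabel("Vaccination Rate, 2020 (%)")
ax.set_ylabel("Vaccination Rate, 2023 (%)")
```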

Grouped Bar Charts from DataFrames

Showing two years side-by-side requires a bit of positioning:

import numpy as np

x = np.arange(len(df["region"]))
width = 0.35

fig, ax = plt.subplots(figsize=(10, 5))
ax.bar(x - width/2, df["rate_2020"], width,
       label="2020", color="lightcoral")
ax.bar(x + width/2, df["rate_2023"], width,
       label="2023", color="steelblue")

ax.set_xticks(x)
ax.set_xticklabels(df["region"])
ax.set_ylabel("Vaccination Rate (%)")
ax.set_ylim(0, 100)
ax.set_title("All Regions Improved from 2020 to 2023")
ax.legend(frameon=False)
fig.tight_layout()
plt.show()

np.arange(len(...)): Creates numerical positions [0, 1, 2, 3, 4, 5] for the bar groups.
Offset by width/2: Places 2020 bars slightly left and 2023 bars slightly right of each position.
set_xticks() and set_xticklabels(): Replaces numeric positions with region names.
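pandas can also do this positioning for you. DataFrame.plot.bar() draws one bar per selected column side by side, and with ax= it draws into an Axes you created, so the usual Axes methods still apply afterwards. Renaming the columns first is an illustrative choice that makes the legend read "2020" and "2023".

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "region": ["Africa", "Americas", "SE Asia",
               "Europe", "E Med", "W Pacific"],
    "rate_2020": [42, 73, 78, 87, 68, 84],
    "rate_2023": [48, 79, 82, 91, 73, 88],
})

by_year = df.rename(columns={"rate_2020": "2020", "rate_2023": "2023"})

fig, ax = plt.subplots(figsize=(10, 5))
by_year.plot.bar(x="region", y=["2020", "2023"], ax=ax,
                 color=["lightcoral", "steelblue"], rot=0)
ax.set_xlabel("")  # pandas labels the x-axis with the column name by default
ax.set_ylabel("Vaccination Rate (%)")
ax.set_ylim(0, 100)
ax.legend(frameon=False)
```

The manual version above is still worth knowing: it shows exactly what the offsets and tick labels are doing.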

15.10 Saving Figures: Publication-Quality Output

Once you've built a chart worth keeping, you need to save it. The savefig() method exports your figure to a file.

Basic Save

fig, ax = plt.subplots(figsize=(8, 5))
ax.bar(regions, rates, color="steelblue")
ax.set_title("Vaccination Rates by Region")
ax.set_ylabel("Rate (%)")
ax.set_ylim(0, 100)
fig.tight_layout()

fig.savefig("vaccination_by_region.png", dpi=150, bbox_inches="tight")

dpi=150: Dots per inch. 72 is screen quality, 150 is good for documents, and 300 is publication quality. If you omit dpi, matplotlib uses the figure's own resolution (100 by default). Higher DPI means a larger file but sharper output.
bbox_inches="tight": Trims whitespace around the figure. Without this, saved figures often have excessive margins.
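For raster formats, the pixel dimensions are figsize times dpi: an 8 x 5 inch figure saved at dpi=150 becomes 1200 x 750 pixels. This sketch checks that by reading the width and height stored in the PNG header. It deliberately omits bbox_inches="tight", which would trim the canvas and change the numbers.

```python
import io
import struct

import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(8, 5))
ax.bar(["A", "B", "C"], [48, 79, 91], color="steelblue")

buffer = io.BytesIO()                       # save to memory instead of a file
fig.savefig(buffer, format="png", dpi=150)  # no bbox_inches, so the size is exact
png = buffer.getvalue()

# PNG stores width and height as big-endian 32-bit integers at bytes 16-23
width, height = struct.unpack(">II", png[16:24])
print(width, height)  # 1200 750
```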

Format Options

matplotlib supports multiple output formats:

fig.savefig("chart.png", dpi=300, bbox_inches="tight")   # Raster
fig.savefig("chart.svg", bbox_inches="tight")              # Vector
fig.savefig("chart.pdf", bbox_inches="tight")              # Vector

PNG: Raster format. Good for web, presentations, and notebooks. Use dpi=300 for print quality.
SVG: Scalable vector format. Lines and text remain sharp at any zoom level. Ideal for web and for figures that need to scale.
PDF: Vector format. Ideal for inclusion in academic papers and reports. Text remains selectable and searchable.

For most data science work, save both a PNG (for embedding in notebooks and slides) and an SVG or PDF (for publications and reports).
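Since savefig() infers the format from the file extension, saving several formats is a short loop. The scratch directory below comes from tempfile only so the sketch leaves nothing behind in your working folder; in practice you would use your own output path.

```python
import tempfile
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(8, 5))
ax.plot([2015, 2019, 2023], [72, 79, 83], color="steelblue", marker="o")
fig.tight_layout()

out_dir = Path(tempfile.mkdtemp())  # scratch folder for this sketch
for ext in ("png", "svg", "pdf"):
    # Format is chosen from the extension; dpi only affects raster content
    fig.savefig(out_dir / f"trend.{ext}", dpi=300, bbox_inches="tight")

saved = sorted(p.name for p in out_dir.iterdir())
print(saved)  # ['trend.pdf', 'trend.png', 'trend.svg']
```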

Saving from a Jupyter Notebook

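In Jupyter, figures display inline automatically. To save one, call savefig() in the same cell that builds the figure, before plt.show(). The inline backend closes each figure when its cell finishes, so a plt.savefig() call in a later cell would save a new, blank figure: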

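# The 2015-2023 series from Section 15.2 (Section 15.3 reused `rates` for regions)
yearly_rates = [72, 74, 76, 78, 79, 71, 75, 80, 83]

fig, ax = plt.subplots(figsize=(8, 5))
ax.plot(years, yearly_rates, color="steelblue", linewidth=2, marker="o")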
ax.set_title("Vaccination Trend")
ax.set_xlabel("Year")
ax.set_ylabel("Rate (%)")
fig.tight_layout()
fig.savefig("vaccination_trend.png", dpi=150, bbox_inches="tight")
plt.show()

15.11 Putting It All Together: A Complete Workflow

Let's build a polished, publication-ready figure from start to finish, following the chart plan approach from Chapter 14.

Chart Plan:
- Question: How have vaccination rates changed over time for three countries with different trajectories?
- Chart type: Multi-panel line chart (3 panels)
- Data: Time series for three countries
- Audience: Explanatory, for a policy brief

import matplotlib.pyplot as plt

# Data
years = list(range(2015, 2024))
countries = {
    "Rwanda": [93, 95, 97, 97, 98, 93, 96, 98, 99],
    "Brazil": [84, 79, 76, 72, 73, 68, 70, 77, 80],
    "Afghanistan": [60, 58, 55, 52, 50, 45, 42, 48, 50],
}
colors = {"Rwanda": "seagreen", "Brazil": "steelblue",
          "Afghanistan": "tomato"}

# Build the figure
fig, axes = plt.subplots(1, 3, figsize=(15, 5), sharey=True)

for ax, (name, data) in zip(axes, countries.items()):
    ax.plot(years, data, color=colors[name], linewidth=2.5,
            marker="o", markersize=5)
    ax.set_title(name, fontsize=13, fontweight="bold")
    ax.set_xlabel("Year")
    ax.set_ylim(30, 105)
    ax.axhline(y=80, color="gray", linestyle=":", alpha=0.4)
    ax.spines["top"].set_visible(False)
    ax.spines["right"].set_visible(False)
    ax.grid(True, alpha=0.15)

axes[0].set_ylabel("Vaccination Rate (%)")

fig.suptitle("Three Countries, Three Vaccination Trajectories",
             fontsize=15, fontweight="bold", y=1.02)
fig.tight_layout()
fig.savefig("three_trajectories.png", dpi=300, bbox_inches="tight")
plt.show()

This figure:
- Uses shared y-axes for honest comparison
- Adds a reference line at 80% (a common coverage target)
- Removes top/right spines for a clean look
- Uses distinct colors for each country
- Has a finding-based title at the figure level and country names at the panel level
- Is saved at 300 DPI for print quality

15.12 Common Mistakes and How to Fix Them

matplotlib has a famously large API and some confusing conventions. Here are the mistakes you'll encounter most often, and how to fix them.

Mistake 1: Mixing pyplot and OO Interfaces

# WRONG: mixing styles
fig, ax = plt.subplots()
plt.title("My Chart")       # pyplot style on an OO figure
ax.plot(x, y)

# RIGHT: consistent OO style
fig, ax = plt.subplots()
ax.set_title("My Chart")    # OO style
ax.plot(x, y)

The plt.title() function operates on the "current axes," which might not be the axes you think. Use ax.set_title() to be explicit.
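Here is the ambiguity in action. After plt.subplots(1, 2), the current axes is the last one created (the right panel), and calling OO methods on axes[0] does not change that. So a plt.title() meant for the left panel lands on the right one:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].plot([1, 2, 3], [4, 5, 6])

plt.title("Left Panel")  # intended for axes[0], but targets the *current* axes

left_title, right_title = axes[0].get_title(), axes[1].get_title()
print(repr(left_title), repr(right_title))  # '' 'Left Panel'

# The explicit OO call leaves no room for doubt
axes[1].set_title("")
axes[0].set_title("Left Panel")
```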

Mistake 2: Forgetting tight_layout()

Without tight_layout(), titles and labels frequently get cut off or overlap. Always call it:

fig.tight_layout()  # Call this before savefig and show

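Mistake 3: Truncating the Y-Axis of a Bar Chart

matplotlib's autoscaling keeps the zero baseline for bars with positive values, so this mistake usually comes from manually "zooming in" with set_ylim():

# WRONG: a lower limit above zero cuts off the bars
ax.bar(categories, values)
ax.set_ylim(50, max(values) * 1.1)  # bar lengths no longer proportional to values

# RIGHT: explicit zero baseline
ax.bar(categories, values)
ax.set_ylim(0, max(values) * 1.1)  # start at 0, 10% headroom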
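You can confirm that autoscaling keeps the baseline even when every value is far from zero; it's the manual set_ylim() call that breaks it. The values below are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.bar(["A", "B", "C"], [55, 58, 60], color="steelblue")

auto_bottom, auto_top = ax.get_ylim()  # limits chosen by autoscaling
print(auto_bottom)                     # 0.0: the bar baseline stays in view

ax.set_ylim(50, 62)                    # a manual "zoom" that truncates the bars
```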

Mistake 4: Overlapping X-Axis Labels

Long category labels overlap when there are many bars:

# FIX: rotate the existing tick labels and right-align them with their ticks
plt.setp(ax.get_xticklabels(), rotation=45, ha="right", rotation_mode="anchor")

Or use horizontal bars (ax.barh()) where labels read naturally.

Mistake 5: Too Many Colors

Using a different color for each bar in a single-variable bar chart adds visual noise without information:

# WRONG: rainbow bars for no reason
ax.bar(categories, values, color=["red", "blue", "green",
       "orange", "purple", "pink"])

# RIGHT: one color (or highlight specific bars)
ax.bar(categories, values, color="steelblue")

Use color to encode a variable, not to decorate.

15.13 Quick Reference: The matplotlib Cheat Sheet

Here's a condensed reference for the most common operations. Tear this page out (metaphorically) and keep it next to your keyboard.

Creating Figures

fig, ax = plt.subplots()                    # One panel
fig, ax = plt.subplots(figsize=(10, 6))     # Custom size
fig, axes = plt.subplots(1, 3)              # 1 row, 3 columns
fig, axes = plt.subplots(2, 2)              # 2x2 grid
fig, axes = plt.subplots(2, 2, sharey=True) # Shared y-axes
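These calls return differently shaped results, which trips up many beginners. This sketch (Agg backend assumed, placeholder data) shows that one panel gives a single Axes object, one row gives a 1-D array, and a grid gives a 2-D array that you index as [row, col] or iterate with .flat.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt

fig1, ax = plt.subplots()                     # a single Axes object
fig2, row = plt.subplots(1, 3)                # 1-D array of 3 Axes
fig3, grid = plt.subplots(2, 2, sharey=True)  # 2-D array of 4 Axes

print(row.shape)   # (3,)
print(grid.shape)  # (2, 2)

grid[0, 1].set_title("Top right")       # index a 2-D grid by [row, col]
for i, panel in enumerate(grid.flat):   # .flat walks the grid row by row
    panel.plot([0, 1], [0, i + 1])      # placeholder data

plt.close("all")
```

Because of sharey=True, all four panels end up with the same y-limits even though each one plots a different range.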

Plot Types

ax.plot(x, y)                    # Line chart
ax.bar(categories, values)       # Vertical bar chart
ax.barh(categories, values)      # Horizontal bar chart
ax.scatter(x, y)                 # Scatter plot
ax.hist(data, bins=20)           # Histogram
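To see all five plot types at once, here is a sketch on a 2x3 grid using randomly generated, hypothetical data (NumPy's default_rng with a fixed seed so the result is reproducible; Agg backend assumed). Note that hist() also returns the bin counts and edges, which is handy for checking how the data were binned.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(seed=0)  # reproducible hypothetical data
x = np.arange(10)
y = 2 * x + rng.normal(0, 2, size=10)
categories = ["A", "B", "C", "D"]
values = [4, 7, 3, 6]
data = rng.normal(50, 10, size=200)

fig, axes = plt.subplots(2, 3, figsize=(12, 6))
axes[0, 0].plot(x, y)                # line: trend over an ordered x
axes[0, 1].bar(categories, values)   # vertical bars: compare categories
axes[0, 2].barh(categories, values)  # horizontal bars: long labels
axes[1, 0].scatter(x, y)             # scatter: relationship between x and y
counts, edges, patches = axes[1, 1].hist(data, bins=20)  # histogram
axes[1, 2].axis("off")               # sixth panel is unused

# zip stops after five names, so the hidden panel gets no title
for ax, name in zip(axes.flat, ["plot", "bar", "barh", "scatter", "hist"]):
    ax.set_title(f"ax.{name}()")

fig.tight_layout()
plt.close(fig)
```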

Customization

ax.set_title("Title")
ax.set_xlabel("X Label")
ax.set_ylabel("Y Label")
ax.set_xlim(xmin, xmax)
ax.set_ylim(ymin, ymax)
ax.legend(frameon=False)
ax.grid(True, alpha=0.3)
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
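These customization calls are often applied together, so one option is to wrap them in a small function. The style_axes() helper below is hypothetical, written for this example and not part of matplotlib; the revenue figures are also made up, and the Agg backend is assumed.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt


def style_axes(ax, title, xlabel, ylabel):
    """Hypothetical helper bundling the customization calls above."""
    ax.set_title(title)
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)
    ax.grid(True, alpha=0.3)             # faint, supportive gridlines
    ax.spines["top"].set_visible(False)  # remove non-data ink
    ax.spines["right"].set_visible(False)
    # Only draw a legend if at least one series has a label
    if ax.get_legend_handles_labels()[1]:
        ax.legend(frameon=False)
    return ax


years = [2019, 2020, 2021, 2022]
fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(years, [3.1, 3.4, 3.9, 4.2], label="Region A")
ax.plot(years, [2.8, 2.9, 3.5, 3.6], label="Region B")
style_axes(ax, "Both regions grew every year", "Year", "Revenue (millions USD)")
plt.close(fig)
```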

Line/Marker Options

ax.plot(x, y, color="steelblue", linewidth=2,
        linestyle="--", marker="o", markersize=6,
        label="Series A")
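A sketch showing these options on two series so the legend can tell them apart (hypothetical values, Agg backend assumed). plot() returns a list of Line2D objects, which is unpacked here so the styles can be inspected afterward.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
fig, ax = plt.subplots()

# Emphasized series: thicker dashed line with circle markers
(line_a,) = ax.plot(x, [2, 4, 3, 5, 6], color="steelblue", linewidth=2,
                    linestyle="--", marker="o", markersize=6,
                    label="Series A")
# Context series: thin solid gray line with small square markers
(line_b,) = ax.plot(x, [1, 2, 2, 3, 4], color="gray", linewidth=1,
                    linestyle="-", marker="s", markersize=4,
                    label="Series B")
ax.legend(frameon=False)

print(line_a.get_linestyle(), line_a.get_marker())  # -- o
plt.close(fig)
```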

Annotations

ax.annotate("Label", xy=(x, y), xytext=(tx, ty),
            arrowprops=dict(arrowstyle="->"))
ax.axhline(y=val, color="gray", linestyle=":")
ax.axvline(x=val, color="gray", linestyle=":")
ax.text(x, y, "Text", fontsize=10, ha="center")
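The annotation calls combined on one chart, as a sketch with hypothetical monthly sales (Agg backend assumed): an arrow points at the peak, a dotted reference line marks the average, and a text label names that line.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt

months = list(range(1, 13))
sales = [12, 14, 13, 15, 18, 22, 30, 27, 21, 17, 15, 14]  # hypothetical

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(months, sales, color="steelblue", linewidth=2)

peak_y = max(sales)
peak_x = months[sales.index(peak_y)]
average = sum(sales) / len(sales)

# Arrow from the label position (xytext) to the data point (xy)
ax.annotate(f"Peak: {peak_y}", xy=(peak_x, peak_y),
            xytext=(peak_x + 1.5, peak_y - 2),
            arrowprops=dict(arrowstyle="->"))
# Horizontal reference line at the average, plus a label for it
avg_line = ax.axhline(y=average, color="gray", linestyle=":")
ax.text(1, average + 0.5, f"Average: {average:.1f}", fontsize=10, ha="left")
plt.close(fig)
```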

Saving

fig.tight_layout()
fig.savefig("name.png", dpi=300, bbox_inches="tight")
fig.savefig("name.svg", bbox_inches="tight")
fig.savefig("name.pdf", bbox_inches="tight")
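A sketch of the save step, assuming a temporary directory created with tempfile.mkdtemp() as the destination and the Agg backend. It writes the three formats and, as a check on how dpi works, saves one extra PNG without bbox_inches="tight" and reads its pixel size back from the PNG header. That extra copy omits bbox_inches="tight" because cropping changes the final size.

```python
import struct
import tempfile
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(4, 3))
ax.plot([0, 1, 2], [0, 1, 4])
ax.set_title("Saved three ways")
fig.tight_layout()

out_dir = Path(tempfile.mkdtemp())
fig.savefig(out_dir / "chart.png", dpi=300, bbox_inches="tight")  # raster
fig.savefig(out_dir / "chart.svg", bbox_inches="tight")           # vector
fig.savefig(out_dir / "chart.pdf", bbox_inches="tight")           # vector

# Without bbox_inches="tight", pixel size is exactly figsize x dpi
fig.savefig(out_dir / "chart_exact.png", dpi=150)
plt.close(fig)

# PNG stores width and height as big-endian integers in the IHDR chunk
header = (out_dir / "chart_exact.png").read_bytes()[:24]
width, height = struct.unpack(">II", header[16:24])
print(width, height)  # 600 450
```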

15.14 Chapter Summary

You came into this chapter knowing how to think about charts. Now you know how to build them.

We learned that matplotlib has two interfaces — the implicit pyplot style and the explicit object-oriented style — and chose the OO interface for its clarity and power. We built the four fundamental chart types: line charts for trends, bar charts for comparisons, scatter plots for relationships, and histograms for distributions.

We customized charts with titles (that state findings, not topics), axis labels (with units), legends (without borders), gridlines (faint and supportive), and cleaned-up spines. We built multi-panel figures with subplots() and learned to share axes for honest comparison. We annotated charts with text, arrows, and reference lines to guide the reader's eye to the key story.

We worked with pandas DataFrames directly, built grouped bar charts, and saved figures in multiple formats at publication quality. And we covered the five most common mistakes and how to avoid them.

matplotlib's API is large and sometimes inconsistent — John Hunter himself acknowledged that its MATLAB heritage created some awkward conventions. You don't need to memorize every option. You need to understand the Figure/Axes mental model, know the basic chart types, and keep the cheat sheet in Section 15.13 within reach. The rest you can look up when you need it.

In the next chapter, seaborn will make many of these operations simpler and more statistically sophisticated. But every seaborn chart is a matplotlib chart under the hood, and knowing how the engine works will make you a more effective, more confident data visualizer.

Key Vocabulary Summary

Term	Definition
Figure	The top-level container in matplotlib — the "canvas" that holds one or more Axes
Axes	A single chart panel within a Figure, containing the coordinate system, data plots, labels, and annotations
subplot	One of multiple Axes arranged in a grid within a single Figure
plot()	Axes method for creating line charts by connecting (x, y) points
bar() / barh()	Axes methods for creating vertical / horizontal bar charts
scatter()	Axes method for creating scatter plots from (x, y) point pairs
hist()	Axes method for creating histograms from a single data array
set_xlabel() / set_ylabel()	Axes methods for setting axis label text
set_title()	Axes method for setting the chart title
legend()	Axes method for displaying a legend identifying multiple data series
colormap	A mapping from numerical values to colors, used in scatter plots and heatmaps (e.g., "viridis", "Blues")
savefig()	Figure method for saving the figure to a file (PNG, SVG, PDF)
tight_layout()	Figure method that auto-adjusts spacing to prevent label overlap
annotation	Text, arrows, or reference lines added to a chart to highlight specific findings
object-oriented interface	matplotlib's explicit approach using Figure and Axes objects, preferred over the implicit pyplot style

Next up: Chapter 16 — Statistical Visualization with seaborn. Same data, more statistical power, less code.
