> "matplotlib tries to make easy things easy and hard things possible."
Learning Objectives
- Create line plots, bar charts, scatter plots, and histograms using matplotlib's object-oriented interface
- Customize charts with titles, axis labels, legends, colors, and gridlines using Axes methods
- Construct multi-panel figures using subplots to compare related views of the same data
- Annotate charts with text labels, arrows, and reference lines to highlight key findings
- Save publication-quality figures to files in multiple formats (PNG, SVG, PDF) with appropriate resolution
In This Chapter
- Chapter Overview
- 15.1 The Two Interfaces: pyplot vs. Object-Oriented
- 15.2 Your First Plot: A Line Chart
- 15.3 Bar Charts: Comparing Categories
- 15.4 Scatter Plots: Exploring Relationships
- 15.5 Histograms: Understanding Distributions
- 15.6 Customization: Making Charts Your Own
- 15.7 Multi-Panel Figures with Subplots
- 15.8 Annotations: Highlighting What Matters
- 15.9 Working with pandas DataFrames
- 15.10 Saving Figures: Publication-Quality Output
- 15.11 Putting It All Together: A Complete Workflow
- 15.12 Common Mistakes and How to Fix Them
- 15.13 Quick Reference: The matplotlib Cheat Sheet
- 15.14 Chapter Summary
- Key Vocabulary Summary
Chapter 15: matplotlib Foundations — Building Charts from the Ground Up
"matplotlib tries to make easy things easy and hard things possible." — John D. Hunter, creator of matplotlib
Chapter Overview
In Chapter 14, you learned to think about charts — the grammar of graphics, chart selection, sketching on paper, Tufte's principles, and the difference between exploratory and explanatory visualization. You designed chart plans on paper without writing a line of code.
Now the code begins.
matplotlib is Python's most widely used plotting library. It was created by John D. Hunter in 2003, inspired by MATLAB's plotting interface, and has since become the foundation on which much of Python's visualization ecosystem is built. seaborn (Chapter 16) is built on top of matplotlib, and pandas has built-in plotting that calls matplotlib under the hood. plotly (Chapter 17) is a separate library with its own rendering engine, but the charting concepts you learn here carry over to it.
Learning matplotlib is like learning to drive a manual transmission. It's more work than an automatic. You have to understand the clutch, the gears, the coordination. But once you know how it works, you understand what's happening, and you can handle any situation. seaborn and plotly are the automatic transmissions that make common tasks easier; seaborn in particular runs on the matplotlib engine you'll learn here.
This chapter teaches matplotlib's object-oriented interface — the Figure and Axes approach — rather than the simpler but less flexible pyplot shortcut. The object-oriented interface gives you full control over every element of your chart, and it's what you'll need for professional-quality work.
In this chapter, you will learn to:
- Create line plots, bar charts, scatter plots, and histograms using matplotlib's object-oriented interface (all paths)
- Customize charts with titles, axis labels, legends, colors, and gridlines using Axes methods (all paths)
- Construct multi-panel figures using subplots to compare related views of the same data (all paths)
- Annotate charts with text labels, arrows, and reference lines to highlight key findings (standard + deep dive paths)
- Save publication-quality figures to files in multiple formats (PNG, SVG, PDF) with appropriate resolution (all paths)
Note — Learning path annotations: Objectives marked (all paths) are essential for every reader. Those marked (standard + deep dive) can be skimmed on the Fast Track but are important for deeper understanding. See "How to Use This Book" for full path descriptions.
15.1 The Two Interfaces: pyplot vs. Object-Oriented
Before we write any code, we need to clear up a common source of confusion. matplotlib has two interfaces for creating charts, and you'll encounter both in tutorials, Stack Overflow answers, and other people's code.
The pyplot Interface (Quick but Limited)
The pyplot interface uses functions from matplotlib.pyplot (conventionally imported as plt) that operate on a "current figure" and "current axes" behind the scenes:
import matplotlib.pyplot as plt
plt.plot([1, 2, 3, 4], [10, 20, 25, 30])
plt.title("Sales Over Time")
plt.xlabel("Quarter")
plt.ylabel("Revenue ($K)")
plt.show()
This works, and for a quick exploratory chart, it's fine. But notice that you never explicitly created a figure or an axes. matplotlib did it for you, invisibly. That implicit state management becomes a problem when you want multiple panels, when you want to modify specific parts of the chart, or when you're working in a script rather than a notebook.
The Object-Oriented Interface (Explicit and Powerful)
The object-oriented (OO) interface makes everything explicit. You create a Figure (the canvas) and one or more Axes (the coordinate systems where data is plotted), then call methods on those objects:
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4], [10, 20, 25, 30])
ax.set_title("Sales Over Time")
ax.set_xlabel("Quarter")
ax.set_ylabel("Revenue ($K)")
plt.show()
The output is identical, but now you have explicit references to the fig (figure) and ax (axes) objects. You can pass ax to functions. You can have multiple axes on one figure. You always know exactly what you're modifying.
For this entire book, we will use the object-oriented interface. It takes a few more characters of code, but it gives you full control and scales to complex visualizations. When you see matplotlib code online using plt.plot() without an explicit fig, ax, know that it's using the pyplot shortcut — it works, but it's not what we'll teach here.
The mental model: Think of Figure as a blank piece of paper and Axes as a rectangular region on that paper where a chart is drawn. One piece of paper can hold one chart (one Axes) or many charts (multiple Axes, i.e., subplots). The Figure holds everything; the Axes is where the data lives.
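The containment relationship in this mental model can be checked directly. The following is a small sketch that relies only on plt.subplots() and on standard attributes of the objects it returns; the variable names fig3 and axes3 are ours.

```python
import matplotlib.pyplot as plt

# One sheet of paper holding one chart.
fig, ax = plt.subplots()
print(fig.axes == [ax])   # the Figure keeps a list of the Axes it holds
print(ax.figure is fig)   # each Axes points back to its Figure

# One sheet of paper holding three charts side by side.
fig3, axes3 = plt.subplots(1, 3)
print(len(fig3.axes))     # number of Axes on the second Figure
```

Because the relationship runs both ways, a function that receives only an Axes can still reach the Figure it lives on when it needs to.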
15.2 Your First Plot: A Line Chart
Let's build a line chart step by step. We'll start with Elena's vaccination data — a simple time series of vaccination rates over several years for a single country.
import matplotlib.pyplot as plt
years = [2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023]
rates = [72, 74, 76, 78, 79, 71, 75, 80, 83]
fig, ax = plt.subplots()
ax.plot(years, rates)
plt.show()
That's your first chart. A line connecting nine points, with years on the x-axis and vaccination rates on the y-axis. matplotlib chose default colors (blue), default axis ranges, and default tick marks. For exploratory work, this is perfectly adequate — you can see that rates generally increased, dipped around 2020, and recovered.
Now let's improve it — applying what we learned in Chapter 14 about explanatory visualization.
Adding Labels and Title
fig, ax = plt.subplots(figsize=(8, 5))
ax.plot(years, rates)
ax.set_title("Vaccination Rate Dipped in 2020 but Recovered by 2023")
ax.set_xlabel("Year")
ax.set_ylabel("Vaccination Rate (%)")
plt.show()
Notice three things:
- figsize=(8, 5) sets the figure size in inches (width, height). The default is usually (6.4, 4.8), which is often too small for readability.
- set_title() receives a descriptive title that states the finding, not just the topic, following Tufte's advice from Chapter 14.
- set_xlabel() and set_ylabel() add axis labels with units. Always include units.
Customizing the Line
The plot() method accepts keyword arguments to control the line's appearance:
fig, ax = plt.subplots(figsize=(8, 5))
ax.plot(years, rates, color="steelblue", linewidth=2,
marker="o", markersize=6)
ax.set_title("Vaccination Rate Dipped in 2020 but Recovered by 2023")
ax.set_xlabel("Year")
ax.set_ylabel("Vaccination Rate (%)")
plt.show()
- color="steelblue": A named color. matplotlib supports hundreds of named colors, hex codes ("#4682B4"), and RGB tuples.
- linewidth=2: Thicker line for visibility.
- marker="o": Circles at each data point. Other options: "s" (square), "^" (triangle), "D" (diamond), "x" (x-mark).
- markersize=6: Size of the marker circles.
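These keyword arguments are not one-shot settings. ax.plot() returns the line objects it draws, so the same properties can be read back or changed later. A minimal sketch, reusing the vaccination data from this section:

```python
import matplotlib.pyplot as plt

years = [2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023]
rates = [72, 74, 76, 78, 79, 71, 75, 80, 83]

fig, ax = plt.subplots(figsize=(8, 5))

# ax.plot() returns a list with one Line2D object per line drawn.
(line,) = ax.plot(years, rates, color="steelblue", linewidth=2,
                  marker="o", markersize=6)

# Read the styling back from the line object.
print(line.get_color(), line.get_linewidth(), line.get_marker())

# Restyle the existing line instead of plotting it again.
line.set_linestyle("--")
line.set_color("tomato")
```

Keeping a reference to the line is handy when you want to emphasize or mute one series after seeing the first draft of a chart. Call plt.show() to display the result.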
Setting Axis Ranges
matplotlib auto-scales axes to fit your data, but sometimes you want explicit control:
fig, ax = plt.subplots(figsize=(8, 5))
ax.plot(years, rates, color="steelblue", linewidth=2, marker="o")
ax.set_xlim(2014, 2024)
ax.set_ylim(0, 100)
ax.set_title("Vaccination Rate Dipped in 2020 but Recovered by 2023")
ax.set_xlabel("Year")
ax.set_ylabel("Vaccination Rate (%)")
plt.show()
Remember from Chapter 14: for a line chart, starting the y-axis at zero is optional (lines encode position, not length). But if you want to show the rates in context of the 0-100% range, setting set_ylim(0, 100) makes sense. For a closer look at the trend, you might use set_ylim(65, 90) instead.
15.3 Bar Charts: Comparing Categories
Bar charts are the workhorse of categorical comparison. Let's compare vaccination rates across WHO regions.
import matplotlib.pyplot as plt
regions = ["Africa", "Americas", "South-East\nAsia",
"Europe", "Eastern\nMed", "Western\nPacific"]
rates = [48, 79, 82, 91, 73, 88]
fig, ax = plt.subplots(figsize=(9, 5))
ax.bar(regions, rates, color="steelblue")
ax.set_title("Africa Trails Other Regions in Vaccination Coverage")
ax.set_xlabel("WHO Region")
ax.set_ylabel("Vaccination Rate (%)")
ax.set_ylim(0, 100)
plt.show()
Key points:
- ax.bar(categories, values) creates a vertical bar chart.
- set_ylim(0, 100) is essential for bar charts: bars encode values as lengths, so the axis must start at zero. This is not optional.
- \n in category labels: Newlines in strings break long labels across two lines, preventing them from overlapping.
Horizontal Bar Charts
When category labels are long, horizontal bars are more readable:
fig, ax = plt.subplots(figsize=(8, 5))
ax.barh(regions, rates, color="steelblue")
ax.set_title("Vaccination Rates by WHO Region, 2023")
ax.set_xlabel("Vaccination Rate (%)")
ax.set_xlim(0, 100)
plt.show()
ax.barh() swaps the axes — categories on y, values on x. Labels read naturally from left to right.
Highlighting a Specific Bar
To draw attention to one bar (say, Africa), use a different color:
colors = ["darkorange" if r == "Africa" else "steelblue"
for r in regions]
fig, ax = plt.subplots(figsize=(9, 5))
ax.bar(regions, rates, color=colors)
ax.set_title("Africa's Vaccination Rate is 30+ Points Below Europe's")
ax.set_xlabel("WHO Region")
ax.set_ylabel("Vaccination Rate (%)")
ax.set_ylim(0, 100)
plt.show()
This uses a list comprehension to assign "darkorange" to Africa and "steelblue" to everyone else. The orange bar pops out through pre-attentive color processing — exactly the kind of deliberate design choice we discussed in Chapter 14.
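The same idea works when you would rather pick the highlighted bar from the data than by name. The sketch below is one possible variation that colors whichever region has the lowest rate; the region labels are written without line breaks for brevity.

```python
import matplotlib.pyplot as plt

regions = ["Africa", "Americas", "South-East Asia",
           "Europe", "Eastern Med", "Western Pacific"]
rates = [48, 79, 82, 91, 73, 88]

# Highlight the bar whose value equals the minimum, whatever its name.
lowest = min(rates)
colors = ["darkorange" if r == lowest else "steelblue" for r in rates]

fig, ax = plt.subplots(figsize=(9, 5))
ax.bar(regions, rates, color=colors)
ax.set_ylabel("Vaccination Rate (%)")
ax.set_ylim(0, 100)
```

If the data changes and a different region drops to the bottom, the highlight moves with it and only the title needs a second look.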
Adding Value Labels on Bars
For explanatory charts, labeling bar values directly is often clearer than relying on the y-axis alone:
fig, ax = plt.subplots(figsize=(9, 5))
bars = ax.bar(regions, rates, color="steelblue")
for bar, rate in zip(bars, rates):
    ax.text(bar.get_x() + bar.get_width() / 2, bar.get_height() + 1,
            f"{rate}%", ha="center", va="bottom", fontsize=10)
ax.set_title("Vaccination Rates by WHO Region, 2023")
ax.set_xlabel("WHO Region")
ax.set_ylabel("Vaccination Rate (%)")
ax.set_ylim(0, 105)
plt.show()
The loop places a text label above each bar. ha="center" centers the text horizontally over the bar; va="bottom" places it just above the specified y coordinate.
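Newer matplotlib releases also offer a shortcut for this pattern. The sketch below assumes matplotlib 3.4 or later, where Axes.bar_label() is available; on older versions the manual loop is the way to go.

```python
import matplotlib.pyplot as plt

regions = ["Africa", "Americas", "South-East\nAsia",
           "Europe", "Eastern\nMed", "Western\nPacific"]
rates = [48, 79, 82, 91, 73, 88]

fig, ax = plt.subplots(figsize=(9, 5))
bars = ax.bar(regions, rates, color="steelblue")

# Assumption: matplotlib >= 3.4. bar_label() places one label per bar;
# padding is the gap in points between the bar and its label.
labels = ax.bar_label(bars, labels=[f"{r}%" for r in rates],
                      padding=3, fontsize=10)

ax.set_ylabel("Vaccination Rate (%)")
ax.set_ylim(0, 105)
```

The manual loop remains useful when labels need custom positions, for example inside the bars or offset to one side.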
15.4 Scatter Plots: Exploring Relationships
Scatter plots reveal relationships between two continuous variables. Let's plot GDP per capita against vaccination rate for a set of countries.
import matplotlib.pyplot as plt
# Sample data: 12 countries
gdp = [1200, 3500, 8200, 12000, 15000, 22000,
28000, 35000, 42000, 48000, 55000, 62000]
vacc = [52, 61, 68, 74, 78, 83, 86, 89, 91, 90, 93, 95]
fig, ax = plt.subplots(figsize=(8, 6))
ax.scatter(gdp, vacc, color="steelblue", s=60, alpha=0.7)
ax.set_title("Higher GDP Countries Tend to Have Higher Vaccination Rates")
ax.set_xlabel("GDP per Capita (USD)")
ax.set_ylabel("Vaccination Rate (%)")
plt.show()
- ax.scatter(x, y): Places one point per (x, y) pair.
- s=60: Marker size. Larger values make bigger dots.
- alpha=0.7: Transparency. When points overlap, transparency lets you see the density. Alpha ranges from 0 (invisible) to 1 (fully opaque).
Color-Coding by a Third Variable
You can encode a third variable using color:
import matplotlib.pyplot as plt
gdp = [1200, 3500, 8200, 12000, 15000, 22000,
28000, 35000, 42000, 48000, 55000, 62000]
vacc = [52, 61, 68, 74, 78, 83, 86, 89, 91, 90, 93, 95]
population = [45, 120, 30, 85, 15, 60, 25, 10, 35, 50, 8, 20]
fig, ax = plt.subplots(figsize=(8, 6))
scatter = ax.scatter(gdp, vacc, c=population, s=80,
cmap="YlOrRd", alpha=0.8, edgecolors="gray")
fig.colorbar(scatter, ax=ax, label="Population (millions)")
ax.set_title("GDP, Vaccination Rates, and Population")
ax.set_xlabel("GDP per Capita (USD)")
ax.set_ylabel("Vaccination Rate (%)")
plt.show()
- c=population: Maps the population values to colors.
- cmap="YlOrRd": A colormap, here a gradient palette from yellow through orange to red. matplotlib has many colormaps: "viridis" (default, perceptually uniform), "Blues", "coolwarm" (diverging), etc.
- fig.colorbar(scatter, ...): Adds a color legend showing what the colors mean.
- edgecolors="gray": Adds a thin gray border around each point for visual separation.
Encoding Size (Bubble Chart)
You can also vary point size to encode an additional variable, here population:
fig, ax = plt.subplots(figsize=(8, 6))
sizes = [p * 5 for p in population] # Scale for visibility
ax.scatter(gdp, vacc, s=sizes, color="steelblue",
alpha=0.6, edgecolors="navy")
ax.set_title("Larger Bubbles = Larger Population")
ax.set_xlabel("GDP per Capita (USD)")
ax.set_ylabel("Vaccination Rate (%)")
plt.show()
Remember Cleveland and McGill's hierarchy from Chapter 14: area is less accurately perceived than position or length. Bubble charts are useful for getting a general sense of a third variable, but readers shouldn't rely on precise size comparisons.
15.5 Histograms: Understanding Distributions
Histograms show the distribution of a single continuous variable — one of the most common exploratory visualizations.
import matplotlib.pyplot as plt
# Simulated vaccination rates for 150 countries
import random
random.seed(42)
vacc_rates = [random.gauss(75, 15) for _ in range(150)]
vacc_rates = [max(0, min(100, r)) for r in vacc_rates]
fig, ax = plt.subplots(figsize=(8, 5))
ax.hist(vacc_rates, bins=20, color="steelblue",
edgecolor="white", alpha=0.8)
ax.set_title("Global Vaccination Rates Are Roughly Normally Distributed")
ax.set_xlabel("Vaccination Rate (%)")
ax.set_ylabel("Number of Countries")
plt.show()
- bins=20: Divides the data range into 20 equal-width intervals. More bins show more detail but can be noisy; fewer bins smooth the picture but may hide structure. A good starting range is 15-30 for most datasets.
- edgecolor="white": Adds white borders between bars so you can distinguish them.
- alpha=0.8: Slight transparency.
Choosing the Number of Bins
There's no single "correct" number of bins. Here's a practical approach: try several values and see how the shape changes.
fig, axes = plt.subplots(1, 3, figsize=(15, 4))
for ax, n_bins in zip(axes, [10, 20, 40]):
    ax.hist(vacc_rates, bins=n_bins, color="steelblue",
            edgecolor="white")
    ax.set_title(f"{n_bins} bins")
    ax.set_xlabel("Vaccination Rate (%)")
    ax.set_ylabel("Count")
fig.suptitle("Effect of Bin Count on Histogram Appearance", fontsize=14)
fig.tight_layout()
plt.show()
This creates three side-by-side histograms with different bin counts — a small multiples comparison. We'll explore subplots() in detail in Section 15.7.
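If you want a starting point before experimenting by eye, NumPy can suggest a bin count from the data. The sketch below uses np.histogram_bin_edges() with three of its named rules on the same simulated rates; treat the suggestions as a first guess, not a verdict.

```python
import random

import numpy as np

# Same simulated data as above: 150 rates clipped to the 0-100 range.
random.seed(42)
vacc_rates = [max(0, min(100, random.gauss(75, 15))) for _ in range(150)]

# Ask NumPy how many bins each rule of thumb would use.
bin_counts = {}
for rule in ["sturges", "fd", "auto"]:
    edges = np.histogram_bin_edges(vacc_rates, bins=rule)
    bin_counts[rule] = len(edges) - 1  # n edges delimit n - 1 bins
    print(f"{rule}: {bin_counts[rule]} bins")
```

ax.hist() accepts the same rule names, so ax.hist(vacc_rates, bins="auto") lets NumPy choose for you.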
15.6 Customization: Making Charts Your Own
Now that you know the four basic chart types, let's learn the customization tools that transform rough exploratory charts into polished explanatory ones.
Colors
matplotlib accepts colors in many formats:
# Named colors
ax.plot(x, y, color="steelblue")
ax.plot(x, y, color="tomato")
# Hex codes
ax.plot(x, y, color="#2C3E50")
# RGB tuples (0-1 range)
ax.plot(x, y, color=(0.2, 0.4, 0.6))
For a list of all named colors, search "matplotlib named colors" in the documentation. For exploratory work, names like "steelblue", "tomato", "seagreen", "slategray", and "goldenrod" are readable and visually distinct.
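The three formats are interchangeable, and the matplotlib.colors module can convert between them. A short sketch, useful when a style guide gives you hex codes but your code uses names or tuples:

```python
import matplotlib.colors as mcolors

# Named color -> hex code
steelblue_hex = mcolors.to_hex("steelblue")
print(steelblue_hex)

# Hex code -> RGB tuple with components in the 0-1 range
dark_rgb = mcolors.to_rgb("#2C3E50")
print(dark_rgb)

# RGB tuple -> hex code
tuple_hex = mcolors.to_hex((0.2, 0.4, 0.6))
print(tuple_hex)

# Check a color specification before passing it to a plotting call.
print(mcolors.is_color_like("steelblue"))
print(mcolors.is_color_like("steel-blue"))
```

is_color_like() is a cheap guard when color names come from user input or a configuration file.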
Gridlines
Gridlines help readers estimate values but should be subtle:
fig, ax = plt.subplots(figsize=(8, 5))
ax.plot(years, rates, color="steelblue", linewidth=2, marker="o")
ax.grid(True, alpha=0.3, linestyle="--")
ax.set_title("Vaccination Rate Over Time")
ax.set_xlabel("Year")
ax.set_ylabel("Vaccination Rate (%)")
plt.show()
- ax.grid(True): Turns gridlines on.
- alpha=0.3: Makes them very faint, following Tufte's principle that gridlines should support, not compete with, the data.
- linestyle="--": Dashed lines are less visually heavy than solid lines.
Legends
When multiple data series appear on the same chart, a legend identifies them:
fig, ax = plt.subplots(figsize=(8, 5))
ax.plot(years, [72, 74, 76, 78, 79, 71, 75, 80, 83],
label="Country A", color="steelblue", linewidth=2)
ax.plot(years, [85, 86, 87, 88, 88, 84, 86, 89, 91],
label="Country B", color="tomato", linewidth=2)
ax.legend(frameon=False, fontsize=11)
ax.set_title("Country A Is Closing the Gap with Country B")
ax.set_xlabel("Year")
ax.set_ylabel("Vaccination Rate (%)")
plt.show()
- label="Country A": The text that appears in the legend for this line.
- ax.legend(): Displays the legend. By default, matplotlib places it where it overlaps the data least.
- frameon=False: Removes the legend box border, which reduces clutter, following Tufte.
Removing Spines
Spines are the borders around the plot area. Removing the top and right spines makes charts cleaner:
fig, ax = plt.subplots(figsize=(8, 5))
ax.plot(years, rates, color="steelblue", linewidth=2, marker="o")
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
ax.set_title("A Cleaner Look: Top and Right Spines Removed")
ax.set_xlabel("Year")
ax.set_ylabel("Vaccination Rate (%)")
plt.show()
This small change makes a noticeable difference in visual cleanliness. Many professional data visualization styles (FiveThirtyEight, The Economist) remove unnecessary spines.
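Since the OO interface lets you pass an Axes to a function, repeated styling like this can live in one place. The helper below is a hypothetical example (clean_axes is our name, not a matplotlib function) that bundles the spine and gridline settings from this section.

```python
import matplotlib.pyplot as plt


def clean_axes(ax):
    """Hypothetical helper: hide top/right spines and add a faint grid."""
    ax.spines["top"].set_visible(False)
    ax.spines["right"].set_visible(False)
    ax.grid(True, alpha=0.3, linestyle="--")
    return ax


years = [2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023]
rates = [72, 74, 76, 78, 79, 71, 75, 80, 83]

fig, ax = plt.subplots(figsize=(8, 5))
ax.plot(years, rates, color="steelblue", linewidth=2, marker="o")
clean_axes(ax)  # apply the shared styling to this panel
```

The same helper can be applied to every panel of a multi-panel figure, which keeps the styling consistent across charts.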
15.7 Multi-Panel Figures with Subplots
One of matplotlib's most powerful features is the ability to create multi-panel figures — what Chapter 14 called faceting or small multiples.
Basic Subplots
fig, axes = plt.subplots(1, 2, figsize=(12, 5))
# Left panel: bar chart
axes[0].bar(regions, rates, color="steelblue")
axes[0].set_title("Vaccination Rates by Region")
axes[0].set_ylabel("Rate (%)")
axes[0].set_ylim(0, 100)
# Right panel: histogram
axes[1].hist(vacc_rates, bins=20, color="seagreen", edgecolor="white")
axes[1].set_title("Distribution of Country-Level Rates")
axes[1].set_xlabel("Vaccination Rate (%)")
axes[1].set_ylabel("Count")
fig.suptitle("Two Views of Global Vaccination Data", fontsize=14)
fig.tight_layout()
plt.show()
- plt.subplots(1, 2) creates a figure with 1 row and 2 columns of axes.
- axes[0] and axes[1] are the left and right panels.
- fig.suptitle() adds a title for the entire figure (above all panels).
- fig.tight_layout() automatically adjusts spacing so titles and labels don't overlap. Always call this.
Grid Layouts
For more panels, use a grid:
# Earlier sections reused the name rates for both the yearly and the
# regional values, so give each series its own name for this figure
yearly_rates = [72, 74, 76, 78, 79, 71, 75, 80, 83]
region_rates = [48, 79, 82, 91, 73, 88]
fig, axes = plt.subplots(2, 2, figsize=(12, 10))
# Top-left
axes[0, 0].plot(years, yearly_rates, color="steelblue", marker="o")
axes[0, 0].set_title("Trend Over Time")
# Top-right
axes[0, 1].bar(regions[:4], region_rates[:4], color="tomato")
axes[0, 1].set_title("First Four Regions")
# Bottom-left
axes[1, 0].scatter(gdp, vacc, color="seagreen", s=60)
axes[1, 0].set_title("GDP vs. Vaccination")
# Bottom-right
axes[1, 1].hist(vacc_rates, bins=20, color="goldenrod", edgecolor="white")
axes[1, 1].set_title("Rate Distribution")
fig.suptitle("Four Views of Global Health Data", fontsize=14, y=1.01)
fig.tight_layout()
plt.show()
With a 2D grid, axes[row, col] indexes the panels. Row 0 is top, row 1 is bottom. Column 0 is left, column 1 is right.
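The axes object returned for a grid is a NumPy array, which explains the [row, col] indexing. A small sketch that inspects it and loops over every panel in reading order:

```python
import matplotlib.pyplot as plt

fig, axes = plt.subplots(2, 2, figsize=(12, 10))

# A 2x2 grid comes back as a 2-D NumPy array of Axes objects.
print(type(axes).__name__, axes.shape)

# axes.flat visits the panels left to right, top to bottom.
for i, ax in enumerate(axes.flat):
    row, col = divmod(i, 2)
    ax.set_title(f"row {row}, col {col}")

fig.tight_layout()
```

Looping over axes.flat is convenient when every panel gets the same treatment, while axes[row, col] is clearer when each panel holds a different chart.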
Sharing Axes
When comparing the same variable across panels, sharing axis scales ensures honest comparison:
fig, axes = plt.subplots(1, 3, figsize=(15, 4), sharey=True)
country_data = {
"Country A": [72, 74, 76, 78, 79, 71, 75, 80, 83],
"Country B": [85, 86, 87, 88, 88, 84, 86, 89, 91],
"Country C": [40, 42, 45, 48, 50, 47, 52, 55, 60],
}
for ax, (name, data) in zip(axes, country_data.items()):
    ax.plot(years, data, color="steelblue", linewidth=2, marker="o")
    ax.set_title(name)
    ax.set_xlabel("Year")
axes[0].set_ylabel("Vaccination Rate (%)")
fig.suptitle("Three Countries, Three Trajectories", fontsize=14)
fig.tight_layout()
plt.show()
sharey=True: All three panels share the same y-axis range. This is critical for honest comparison — without shared axes, panels auto-scale independently and a modest change could look just as dramatic as a large one.
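The effect of sharing can be verified by reading the limits back from each panel. The sketch below plots Country A and Country C twice, once with shared and once with independent y-axes, and compares the resulting ranges.

```python
import matplotlib.pyplot as plt

years = [2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023]
country_a = [72, 74, 76, 78, 79, 71, 75, 80, 83]
country_c = [40, 42, 45, 48, 50, 47, 52, 55, 60]

fig_shared, shared = plt.subplots(1, 2, sharey=True)
fig_free, free = plt.subplots(1, 2)

# Draw the same two series on both figures.
for panels in (shared, free):
    panels[0].plot(years, country_a)
    panels[1].plot(years, country_c)

# Shared panels report one common range; independent panels do not.
print(shared[0].get_ylim() == shared[1].get_ylim())
print(free[0].get_ylim() == free[1].get_ylim())
```

In the independent version each line fills its own panel, so the 20-point climb of Country C and the 11-point climb of Country A look about equally steep.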
15.8 Annotations: Highlighting What Matters
Annotations turn a chart from "here's some data" into "here's what you should notice." They're the difference between exploratory and explanatory visualization.
Text Annotations
fig, ax = plt.subplots(figsize=(8, 5))
ax.plot(years, rates, color="steelblue", linewidth=2, marker="o")
# Annotate the 2020 dip
ax.annotate("COVID-19\npandemic dip",
xy=(2020, 71), xytext=(2017.5, 65),
fontsize=10, color="tomato",
arrowprops=dict(arrowstyle="->", color="tomato"),
ha="center")
ax.set_title("Vaccination Rate Dipped in 2020 but Recovered by 2023")
ax.set_xlabel("Year")
ax.set_ylabel("Vaccination Rate (%)")
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
plt.show()
- ax.annotate(text, xy=..., xytext=...): Places annotation text at xytext with an arrow pointing to xy (the data point).
- arrowprops=dict(arrowstyle="->", color="tomato"): Draws a red arrow from the text to the point.
- ha="center": Horizontally centers the text.
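The coordinates passed to xy do not have to be typed by hand. The sketch below is one way to locate the lowest point in the data and annotate it; the text offsets (2.5 years left, 6 points down) are arbitrary choices for this dataset.

```python
import matplotlib.pyplot as plt

years = [2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023]
rates = [72, 74, 76, 78, 79, 71, 75, 80, 83]

# Locate the lowest point instead of hard-coding its coordinates.
dip_index = rates.index(min(rates))
dip_year, dip_rate = years[dip_index], rates[dip_index]

fig, ax = plt.subplots(figsize=(8, 5))
ax.plot(years, rates, color="steelblue", linewidth=2, marker="o")
ax.set_ylim(60, 90)  # leave room below the line for the label

note = ax.annotate(
    f"Lowest point:\n{dip_rate}% in {dip_year}",
    xy=(dip_year, dip_rate),                # where the arrow points
    xytext=(dip_year - 2.5, dip_rate - 6),  # where the text sits
    color="tomato", ha="center",
    arrowprops=dict(arrowstyle="->", color="tomato"),
)
```

Annotation text does not affect autoscaling, so setting the y-limits explicitly keeps the label inside the plot area.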
Reference Lines
Horizontal or vertical reference lines provide context:
fig, ax = plt.subplots(figsize=(9, 5))
ax.bar(regions, rates, color="steelblue")
# Add global average reference line
global_avg = sum(rates) / len(rates)
ax.axhline(y=global_avg, color="tomato", linestyle="--",
linewidth=1.5, label=f"Global Average ({global_avg:.0f}%)")
ax.legend(frameon=False)
ax.set_title("Two Regions Fall Below the Global Average")
ax.set_xlabel("WHO Region")
ax.set_ylabel("Vaccination Rate (%)")
ax.set_ylim(0, 100)
plt.show()
- ax.axhline(y=...): Draws a horizontal line across the entire chart at the specified y value.
- ax.axvline(x=...): Draws a vertical line (useful for marking specific dates on a time series).
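When a title makes a claim relative to a reference line, it is worth computing that claim before writing it. The sketch below works out the average used above and lists the regions under it; note that this is the unweighted mean of the six regional values, so a population-weighted global figure would differ.

```python
regions = ["Africa", "Americas", "South-East Asia",
           "Europe", "Eastern Med", "Western Pacific"]
rates = [48, 79, 82, 91, 73, 88]

# Unweighted mean of the six regional rates.
global_avg = sum(rates) / len(rates)

# Regions whose bar ends below the reference line.
below = [name for name, rate in zip(regions, rates) if rate < global_avg]

print(f"Average: {global_avg:.1f}%")
print("Below average:", below)
```

Deriving the count from the data keeps the title honest if the numbers are updated later.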
Combining Annotations
A polished explanatory chart often combines several annotation techniques:
fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(years, rates, color="steelblue", linewidth=2.5, marker="o",
markersize=7, zorder=3)
# Reference line at 80%
ax.axhline(y=80, color="gray", linestyle=":", alpha=0.5)
ax.text(2023.3, 80.5, "80% target", fontsize=9, color="gray")
# Annotate the dip
ax.annotate("2020: pandemic\ndisrupts vaccination",
xy=(2020, 71), xytext=(2017, 63),
fontsize=10, color="tomato",
arrowprops=dict(arrowstyle="->", color="tomato"))
# Annotate the recovery
ax.annotate("2023: full recovery\nand new high",
xy=(2023, 83), xytext=(2021, 88),
fontsize=10, color="seagreen",
arrowprops=dict(arrowstyle="->", color="seagreen"))
ax.set_title("Vaccination Rates Recovered from Pandemic Dip to Reach New Highs",
fontsize=13, fontweight="bold")
ax.set_xlabel("Year")
ax.set_ylabel("Vaccination Rate (%)")
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
ax.grid(True, alpha=0.2)
fig.tight_layout()
plt.show()
This chart tells a story: there was a dip, here's why, and here's the recovery. Every annotation earns its place by contributing to the narrative.
15.9 Working with pandas DataFrames
In practice, your data lives in pandas DataFrames, not in Python lists. matplotlib works seamlessly with pandas:
import pandas as pd
import matplotlib.pyplot as plt
# Create a small DataFrame
df = pd.DataFrame({
"region": ["Africa", "Americas", "SE Asia",
"Europe", "E Med", "W Pacific"],
"rate_2020": [42, 73, 78, 87, 68, 84],
"rate_2023": [48, 79, 82, 91, 73, 88],
})
fig, ax = plt.subplots(figsize=(9, 5))
ax.bar(df["region"], df["rate_2023"], color="steelblue")
ax.set_title("Vaccination Rates by Region, 2023")
ax.set_ylabel("Vaccination Rate (%)")
ax.set_ylim(0, 100)
fig.tight_layout()
plt.show()
You can pass DataFrame columns directly to ax.bar(), ax.scatter(), ax.plot(), and ax.hist(). They're just arrays of values, and matplotlib doesn't care whether they come from a list, a NumPy array, or a pandas Series.
Grouped Bar Charts from DataFrames
Showing two years side-by-side requires a bit of positioning:
import numpy as np
x = np.arange(len(df["region"]))
width = 0.35
fig, ax = plt.subplots(figsize=(10, 5))
ax.bar(x - width/2, df["rate_2020"], width,
label="2020", color="lightcoral")
ax.bar(x + width/2, df["rate_2023"], width,
label="2023", color="steelblue")
ax.set_xticks(x)
ax.set_xticklabels(df["region"])
ax.set_ylabel("Vaccination Rate (%)")
ax.set_ylim(0, 100)
ax.set_title("All Regions Improved from 2020 to 2023")
ax.legend(frameon=False)
fig.tight_layout()
plt.show()
- np.arange(len(...)): Creates numerical positions [0, 1, 2, 3, 4, 5] for the bar groups.
- Offset by width/2: Places 2020 bars slightly left and 2023 bars slightly right of each position.
- set_xticks() and set_xticklabels(): Replaces numeric positions with region names.
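The plus and minus width/2 offsets are a special case of a more general rule for centering any number of bars on a tick. The helper below is a hypothetical sketch (group_offsets is our name) of that arithmetic.

```python
import numpy as np


def group_offsets(n_series, width):
    """Hypothetical helper: x-offsets that center n_series bars on a tick."""
    # Bar i sits (i - (n - 1) / 2) bar-widths away from the tick position.
    return (np.arange(n_series) - (n_series - 1) / 2) * width


two = group_offsets(2, 0.35)    # two series, as in the example above
three = group_offsets(3, 0.25)  # three series would add a centered bar
print(two)
print(three)
```

With three or more series, keep n_series * width below 1 so that neighboring groups do not touch.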
15.10 Saving Figures: Publication-Quality Output
Once you've built a chart worth keeping, you need to save it. The savefig() method exports your figure to a file.
Basic Save
fig, ax = plt.subplots(figsize=(8, 5))
ax.bar(regions, rates, color="steelblue")
ax.set_title("Vaccination Rates by Region")
ax.set_ylabel("Rate (%)")
ax.set_ylim(0, 100)
fig.tight_layout()
fig.savefig("vaccination_by_region.png", dpi=150, bbox_inches="tight")
- dpi=150: Dots per inch. 72 is screen quality. 150 is good for documents. 300 is publication quality. Higher DPI means larger file size but sharper output.
- bbox_inches="tight": Trims whitespace around the figure. Without this, saved figures often have excessive margins.
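For raster output, the pixel dimensions follow from the figure size in inches multiplied by the DPI. A quick sketch of the arithmetic for an 8 x 5 inch figure:

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(8, 5))
width_in, height_in = fig.get_size_inches()

# Pixel size of the saved image = inches * dots per inch.
pixels = {}
for dpi in (72, 150, 300):
    pixels[dpi] = (round(width_in * dpi), round(height_in * dpi))
    print(f"{dpi} dpi -> {pixels[dpi][0]} x {pixels[dpi][1]} px")
```

These numbers apply to the full figure. With bbox_inches="tight" the saved image is cropped or padded to fit the content, so the final dimensions will differ somewhat.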
Format Options
matplotlib supports multiple output formats:
fig.savefig("chart.png", dpi=300, bbox_inches="tight") # Raster
fig.savefig("chart.svg", bbox_inches="tight") # Vector
fig.savefig("chart.pdf", bbox_inches="tight") # Vector
- PNG: Raster format. Good for web, presentations, and notebooks. Use dpi=300 for print quality.
- SVG: Scalable vector format. Lines and text remain sharp at any zoom level. Ideal for web and for figures that need to scale.
- PDF: Vector format. Ideal for inclusion in academic papers and reports. Text remains selectable and searchable.
For most data science work, save both a PNG (for embedding in notebooks and slides) and an SVG or PDF (for publications and reports).
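Saving several formats at once is easy to script. The sketch below writes the same figure as PNG, SVG, and PDF; the temporary directory and the file name are placeholders for your own output folder and naming scheme.

```python
import tempfile
from pathlib import Path

import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(8, 5))
ax.plot([2021, 2022, 2023], [75, 80, 83], color="steelblue", marker="o")
ax.set_xlabel("Year")
ax.set_ylabel("Vaccination Rate (%)")
fig.tight_layout()

# Placeholder output folder; point this at a real directory in practice.
out_dir = Path(tempfile.mkdtemp())

saved = []
for ext in ("png", "svg", "pdf"):
    path = out_dir / f"vaccination_trend.{ext}"
    # matplotlib picks the output format from the file extension.
    fig.savefig(path, dpi=300, bbox_inches="tight")
    saved.append(path)

print([p.name for p in saved])
```

The dpi setting mainly matters for the PNG; the SVG and PDF files store lines and text as vectors.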
Saving from a Jupyter Notebook
In Jupyter, figures display inline automatically. To save them, call savefig() before plt.show() or in the same cell:
fig, ax = plt.subplots(figsize=(8, 5))
ax.plot(years, rates, color="steelblue", linewidth=2, marker="o")
ax.set_title("Vaccination Trend")
ax.set_xlabel("Year")
ax.set_ylabel("Rate (%)")
fig.tight_layout()
fig.savefig("vaccination_trend.png", dpi=150, bbox_inches="tight")
plt.show()
15.11 Putting It All Together: A Complete Workflow
Let's build a polished, publication-ready figure from start to finish, following the chart plan approach from Chapter 14.
Chart Plan:
- Question: How have vaccination rates changed over time for three countries with different trajectories?
- Chart type: Multi-panel line chart (3 panels)
- Data: Time series for three countries
- Audience: Explanatory, for a policy brief
import matplotlib.pyplot as plt
# Data
years = list(range(2015, 2024))
countries = {
"Rwanda": [93, 95, 97, 97, 98, 93, 96, 98, 99],
"Brazil": [84, 79, 76, 72, 73, 68, 70, 77, 80],
"Afghanistan": [60, 58, 55, 52, 50, 45, 42, 48, 50],
}
colors = {"Rwanda": "seagreen", "Brazil": "steelblue",
"Afghanistan": "tomato"}
# Build the figure
fig, axes = plt.subplots(1, 3, figsize=(15, 5), sharey=True)
for ax, (name, data) in zip(axes, countries.items()):
    ax.plot(years, data, color=colors[name], linewidth=2.5,
            marker="o", markersize=5)
    ax.set_title(name, fontsize=13, fontweight="bold")
    ax.set_xlabel("Year")
    ax.set_ylim(30, 105)
    ax.axhline(y=80, color="gray", linestyle=":", alpha=0.4)
    ax.spines["top"].set_visible(False)
    ax.spines["right"].set_visible(False)
    ax.grid(True, alpha=0.15)
axes[0].set_ylabel("Vaccination Rate (%)")
fig.suptitle("Three Countries, Three Vaccination Trajectories",
fontsize=15, fontweight="bold", y=1.02)
fig.tight_layout()
fig.savefig("three_trajectories.png", dpi=300, bbox_inches="tight")
plt.show()
This figure:
- Uses shared y-axes for honest comparison
- Adds a reference line at 80% (a common coverage target)
- Removes top/right spines for a clean look
- Uses distinct colors for each country
- Has a finding-based title at the figure level and country names at the panel level
- Is saved at 300 DPI for print quality
15.12 Common Mistakes and How to Fix Them
matplotlib has a famously large API and some confusing conventions. Here are the mistakes you'll encounter most often, and how to fix them.
Mistake 1: Mixing pyplot and OO Interfaces
# WRONG: mixing styles
fig, ax = plt.subplots()
plt.title("My Chart") # pyplot style on an OO figure
ax.plot(x, y)
# RIGHT: consistent OO style
fig, ax = plt.subplots()
ax.set_title("My Chart") # OO style
ax.plot(x, y)
The plt.title() function operates on the "current axes," which might not be the axes you think. Use ax.set_title() to be explicit.
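The surprise is easiest to see on a two-panel figure. In the sketch below we draw on the left panel and then call plt.title(); in the matplotlib versions we are aware of, the most recently created Axes is the current one, so the title lands on the right panel.

```python
import matplotlib.pyplot as plt

fig, (left, right) = plt.subplots(1, 2, figsize=(10, 4))

# Drawing on the left panel does not make it the "current axes".
left.plot([1, 2, 3], [10, 20, 25])

# pyplot call: applies to whichever Axes pyplot considers current.
plt.title("My Chart")

print("left title: ", repr(left.get_title()))
print("right title:", repr(right.get_title()))
```

Calling left.set_title("My Chart") removes the ambiguity, because the method names the panel it modifies.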
Mistake 2: Forgetting tight_layout()
Without tight_layout(), titles and labels frequently get cut off or overlap. Always call it:
fig.tight_layout() # Call this before savefig and show
Mistake 3: Not Setting Y-Axis to Zero for Bar Charts
# WRONG: truncated baseline exaggerates the differences
ax.bar(categories, values)
ax.set_ylim(50, 100)  # bars no longer start at zero
# RIGHT: explicit zero baseline
ax.bar(categories, values)
ax.set_ylim(0, max(values) * 1.1)  # start at 0, 10% headroom
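The baseline an Axes ends up with can be read back with get_ylim(), which makes it easy to check a bar chart before publishing. A small sketch with made-up values; for all-positive bars, recent matplotlib versions keep the default lower limit at zero, so a truncated baseline usually comes from an explicit set_ylim() call or from another plotting tool.

```python
import matplotlib.pyplot as plt

# Made-up values for illustration.
categories = ["A", "B", "C"]
values = [62, 75, 88]

fig, ax = plt.subplots()
ax.bar(categories, values)
default_bottom = ax.get_ylim()[0]     # baseline matplotlib chose

ax.set_ylim(50, 100)                  # a truncated axis, for comparison
truncated_bottom = ax.get_ylim()[0]

ax.set_ylim(0, max(values) * 1.1)     # restore the honest baseline
final_bottom = ax.get_ylim()[0]

print(default_bottom, truncated_bottom, final_bottom)
```

A one-line check such as ax.get_ylim()[0] == 0 is a cheap safeguard in scripts that generate many bar charts.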
Mistake 4: Overlapping X-Axis Labels
Long category labels overlap when there are many bars:
# FIX: rotate labels (pin the tick positions first so labels stay aligned)
ax.set_xticks(range(len(labels)))
ax.set_xticklabels(labels, rotation=45, ha="right")
Or use horizontal bars (ax.barh()) where labels read naturally.
Mistake 5: Too Many Colors
Using a different color for each bar in a single-variable bar chart adds visual noise without information:
# WRONG: rainbow bars for no reason
ax.bar(categories, values, color=["red", "blue", "green",
"orange", "purple", "pink"])
# RIGHT: one color (or highlight specific bars)
ax.bar(categories, values, color="steelblue")
Use color to encode a variable, not to decorate.
15.13 Quick Reference: The matplotlib Cheat Sheet
Here's a condensed reference for the most common operations. Tear this page out (metaphorically) and keep it next to your keyboard.
Creating Figures
fig, ax = plt.subplots() # One panel
fig, ax = plt.subplots(figsize=(10, 6)) # Custom size
fig, axes = plt.subplots(1, 3) # 1 row, 3 columns
fig, axes = plt.subplots(2, 2) # 2x2 grid
fig, axes = plt.subplots(2, 2, sharey=True) # Shared y-axes
Plot Types
ax.plot(x, y) # Line chart
ax.bar(categories, values) # Vertical bar chart
ax.barh(categories, values) # Horizontal bar chart
ax.scatter(x, y) # Scatter plot
ax.hist(data, bins=20) # Histogram
Customization
ax.set_title("Title")
ax.set_xlabel("X Label")
ax.set_ylabel("Y Label")
ax.set_xlim(min, max)
ax.set_ylim(min, max)
ax.legend(frameon=False)
ax.grid(True, alpha=0.3)
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
Line/Marker Options
ax.plot(x, y, color="steelblue", linewidth=2,
linestyle="--", marker="o", markersize=6,
label="Series A")
Annotations
ax.annotate("Label", xy=(x, y), xytext=(tx, ty),
arrowprops=dict(arrowstyle="->"))
ax.axhline(y=val, color="gray", linestyle=":")
ax.axvline(x=val, color="gray", linestyle=":")
ax.text(x, y, "Text", fontsize=10, ha="center")
Saving
fig.tight_layout()
fig.savefig("name.png", dpi=300, bbox_inches="tight")
fig.savefig("name.svg", bbox_inches="tight")
fig.savefig("name.pdf", bbox_inches="tight")
15.14 Chapter Summary
You came into this chapter knowing how to think about charts. Now you know how to build them.
We learned that matplotlib has two interfaces — the implicit pyplot style and the explicit object-oriented style — and chose the OO interface for its clarity and power. We built the four fundamental chart types: line charts for trends, bar charts for comparisons, scatter plots for relationships, and histograms for distributions.
We customized charts with titles (that state findings, not topics), axis labels (with units), legends (without borders), gridlines (faint and supportive), and cleaned-up spines. We built multi-panel figures with subplots() and learned to share axes for honest comparison. We annotated charts with text, arrows, and reference lines to guide the reader's eye to the key story.
We worked with pandas DataFrames directly, built grouped bar charts, and saved figures in multiple formats at publication quality. And we covered the five most common mistakes and how to avoid them.
matplotlib's API is large and sometimes inconsistent — John Hunter himself acknowledged that its MATLAB heritage created some awkward conventions. You don't need to memorize every option. You need to understand the Figure/Axes mental model, know the basic chart types, and keep the cheat sheet in Section 15.13 within reach. The rest you can look up when you need it.
In the next chapter, seaborn will make many of these operations simpler and more statistically sophisticated. But every seaborn chart is a matplotlib chart under the hood, and knowing how the engine works will make you a more effective, more confident data visualizer.
Key Vocabulary Summary
| Term | Definition |
|---|---|
| Figure | The top-level container in matplotlib — the "canvas" that holds one or more Axes |
| Axes | A single chart panel within a Figure, containing the coordinate system, data plots, labels, and annotations |
| subplot | One of multiple Axes arranged in a grid within a single Figure |
| plot() | Axes method for creating line charts by connecting (x, y) points |
| bar() / barh() | Axes methods for creating vertical / horizontal bar charts |
| scatter() | Axes method for creating scatter plots from (x, y) point pairs |
| hist() | Axes method for creating histograms from a single data array |
| set_xlabel() / set_ylabel() | Axes methods for setting axis label text |
| set_title() | Axes method for setting the chart title |
| legend() | Axes method for displaying a legend identifying multiple data series |
| colormap | A mapping from numerical values to colors, used in scatter plots and heatmaps (e.g., "viridis", "Blues") |
| savefig() | Figure method for saving the figure to a file (PNG, SVG, PDF) |
| tight_layout() | Figure method that auto-adjusts spacing to prevent label overlap |
| annotation | Text, arrows, or reference lines added to a chart to highlight specific findings |
| object-oriented interface | matplotlib's explicit approach using Figure and Axes objects, preferred over the implicit pyplot style |
Next up: Chapter 16 — Statistical Visualization with seaborn. Same data, more statistical power, less code.