Key Takeaways: matplotlib Foundations

This is your reference card and cheat sheet for Chapter 15. Keep it open while building charts. The first section covers concepts; the second section is a quick-reference code guide you can copy from directly.


Key Concepts

  • matplotlib has two interfaces — use the OO one. The pyplot interface (plt.plot(...)) is quick but implicit. The object-oriented interface (fig, ax = plt.subplots() followed by ax.plot(...)) is explicit, composable, and essential for multi-panel figures. We use OO throughout this book.

  • Figure is the canvas; Axes is the chart. A Figure holds one or more Axes. Each Axes is a complete coordinate system with its own data, title, labels, and legend. Subplots are just multiple Axes on one Figure.

  • The four workhorses: line, bar, scatter, histogram. ax.plot() for trends over time. ax.bar() / ax.barh() for categorical comparison. ax.scatter() for relationships between continuous variables. ax.hist() for distributions. These four cover the vast majority of data science visualization needs.

  • Customization turns exploratory into explanatory. A default chart is fine for exploration. For communication, add: a descriptive title (finding, not topic), axis labels with units, appropriate axis ranges (zero baseline for bars), clean design (faint gridlines, removed spines), and annotations highlighting key findings.

  • Multi-panel figures enable comparison. plt.subplots(rows, cols) creates a grid of panels. Use sharey=True or sharex=True to ensure honest comparison across panels. Always call fig.tight_layout() to prevent overlap.

  • Save at the right quality. fig.savefig("chart.png", dpi=300, bbox_inches="tight") for print-quality PNG. Use SVG or PDF for vector formats that scale perfectly. Always use bbox_inches="tight" to trim excess whitespace.
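
As a minimal sketch of how these concepts fit together, the block below builds one explanatory line chart with the OO interface. The data values, the title text, and the in-memory buffer used in place of a file are illustrative assumptions, not figures from the chapter.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical data, invented for illustration
years = [2019, 2020, 2021, 2022, 2023]
rates = [62, 64, 69, 73, 78]

fig, ax = plt.subplots(figsize=(8, 5))  # Figure is the canvas, Axes is the chart
ax.plot(years, rates, color="steelblue", linewidth=2, marker="o")

# Explanatory touches: the finding as title, axis labels with units
ax.set_title("Coverage Rose 16 Points in Four Years", fontweight="bold")
ax.set_xlabel("Year")
ax.set_ylabel("Coverage (%)")
ax.set_xticks(years)

# Clean design: faint grid, no top or right spine
ax.grid(True, alpha=0.3)
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)

fig.tight_layout()

# Saved to a buffer here; pass a filename such as "chart.png" to write a file
buffer = io.BytesIO()
fig.savefig(buffer, format="png", dpi=150, bbox_inches="tight")
png_bytes = buffer.getvalue()
plt.close(fig)
```

Every call goes through fig or ax, so the same code can be dropped into a multi-panel figure by swapping in a different Axes.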


matplotlib Cheat Sheet

Setup

import matplotlib.pyplot as plt
import numpy as np  # often needed for positioning

Creating Figures and Axes

# Single panel
fig, ax = plt.subplots()
fig, ax = plt.subplots(figsize=(10, 6))

# Multiple panels
fig, axes = plt.subplots(1, 3)              # 1 row, 3 cols
fig, axes = plt.subplots(2, 2)              # 2x2 grid
fig, axes = plt.subplots(1, 3, sharey=True) # shared y-axis
fig, axes = plt.subplots(2, 1, sharex=True) # shared x-axis

Accessing Panels

# 1D array (single row or single column)
axes[0], axes[1], axes[2]

# 2D array (grid)
axes[0, 0]  # top-left
axes[0, 1]  # top-right
axes[1, 0]  # bottom-left
axes[1, 1]  # bottom-right
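
When the grid shape changes, indexing panels by hand gets error-prone. A hedged sketch of two common patterns: looping with axes.flat, and passing squeeze=False so that axes is always a 2D array. The panel titles are placeholders.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# A 2x2 grid returns a 2D NumPy array of Axes
fig, axes = plt.subplots(2, 2, figsize=(8, 6))

# axes.flat walks the grid row by row:
# top-left, top-right, bottom-left, bottom-right
for i, ax in enumerate(axes.flat):
    ax.set_title(f"Panel {i + 1}")

# squeeze=False keeps the array 2D even for a single row,
# so axes2d[row, col] indexing works for any grid size
fig2, axes2d = plt.subplots(1, 3, squeeze=False)
axes2d[0, 2].set_title("Last panel")

plt.close(fig)
plt.close(fig2)
```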

Line Chart

ax.plot(x, y)
ax.plot(x, y, color="steelblue", linewidth=2,
        marker="o", markersize=6, label="Label",
        linestyle="--")

Common markers: "o" circle, "s" square, "^" triangle, "D" diamond, "x" x-mark

Common linestyles: "-" solid, "--" dashed, ":" dotted, "-." dash-dot

Bar Chart

# Vertical bars
ax.bar(categories, values, color="steelblue")

# Horizontal bars
ax.barh(categories, values, color="steelblue")

# Highlighting specific bars
colors = ["tomato" if c == "Target" else "steelblue"
          for c in categories]
ax.bar(categories, values, color=colors)

Scatter Plot

ax.scatter(x, y, color="steelblue", s=60, alpha=0.7)

# Color-coded by third variable
scatter = ax.scatter(x, y, c=third_var, cmap="viridis",
                     s=60, alpha=0.8, edgecolors="gray")
fig.colorbar(scatter, ax=ax, label="Third Variable")

Common colormaps:

  • Sequential: "viridis", "Blues", "YlOrRd", "Greens"
  • Diverging: "coolwarm", "RdBu", "PiYG"
  • Categorical: use explicit color lists

Histogram

ax.hist(data, bins=20, color="steelblue",
        edgecolor="white", alpha=0.8)

# Fixed range (important for comparison)
ax.hist(data, bins=20, range=(0, 100),
        color="steelblue", edgecolor="white")
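
To see why a fixed range matters for comparison, the sketch below draws two histograms side by side with identical bin edges and a shared y-axis. Both samples are randomly generated placeholders, not real data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical samples, clipped so every value falls inside the 0-100 range
rng = np.random.default_rng(seed=0)
group_a = rng.normal(loc=55, scale=10, size=500).clip(0, 100)
group_b = rng.normal(loc=70, scale=8, size=500).clip(0, 100)

fig, axes = plt.subplots(1, 2, sharey=True, figsize=(10, 4))

# Same bins and range in both panels, so the bars line up one-to-one
counts_a, edges_a, _ = axes[0].hist(group_a, bins=20, range=(0, 100),
                                    color="steelblue", edgecolor="white")
counts_b, edges_b, _ = axes[1].hist(group_b, bins=20, range=(0, 100),
                                    color="tomato", edgecolor="white")

axes[0].set_title("Group A")
axes[1].set_title("Group B")
axes[0].set_ylabel("Count")

fig.tight_layout()
plt.close(fig)
```

Without range=(0, 100), each call would pick bin edges from its own data's minimum and maximum, and the two panels would no longer be directly comparable.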

Titles and Labels

ax.set_title("Descriptive Finding Title", fontsize=13,
             fontweight="bold")
ax.set_xlabel("X Label (units)")
ax.set_ylabel("Y Label (units)")
fig.suptitle("Figure-Level Title", fontsize=14, y=1.02)

Axis Control

ax.set_xlim(min_val, max_val)
ax.set_ylim(0, 100)           # Zero baseline for bar charts!
ax.set_xticks([2015, 2018, 2021, 2024])
ax.set_xticklabels(labels, rotation=45, ha="right")  # after set_xticks; one label per tick

Legend

# Labels set during plotting
ax.plot(x, y, label="Series A")
ax.plot(x, y2, label="Series B")
ax.legend()                              # automatic placement
ax.legend(frameon=False)                 # no border
ax.legend(loc="upper left")             # specific location
ax.legend(frameon=False, fontsize=11)    # clean + readable

Gridlines

ax.grid(True, alpha=0.3)                # faint
ax.grid(True, alpha=0.2, linestyle="--")  # dashed, very faint
ax.grid(True, alpha=0.2, axis="y")      # y-axis only

Spines (Borders)

ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
# Keep bottom and left for axes

Annotations

# Text with arrow
ax.annotate("Important point",
            xy=(x_data, y_data),         # arrow points here
            xytext=(x_text, y_text),     # text placed here
            fontsize=10, color="tomato",
            arrowprops=dict(arrowstyle="->", color="tomato"))

# Horizontal reference line
ax.axhline(y=value, color="gray", linestyle="--", alpha=0.5,
           label="Reference")

# Vertical reference line
ax.axvline(x=value, color="gray", linestyle=":", alpha=0.5)

# Plain text (no arrow)
ax.text(x, y, "Text here", fontsize=10, ha="center",
        va="bottom", color="gray")
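
The annotation calls above use placeholder coordinates. A small sketch of one way to derive them from the data, here by locating the maximum of a series and pointing an arrow at it. The series values and the text offsets are invented for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical series
x = np.array([2018, 2019, 2020, 2021, 2022, 2023])
y = np.array([40, 44, 61, 52, 50, 49])

fig, ax = plt.subplots()
ax.plot(x, y, color="steelblue", marker="o", label="Series A")

# Find the highest point and annotate it
peak = int(np.argmax(y))
note = ax.annotate(f"Peak: {y[peak]}",
                   xy=(x[peak], y[peak]),              # arrow points here
                   xytext=(x[peak] + 1, y[peak] + 3),  # text placed here
                   fontsize=10, color="tomato",
                   arrowprops=dict(arrowstyle="->", color="tomato"))

# Reference line at the series mean
mean_line = ax.axhline(y=y.mean(), color="gray", linestyle="--",
                       alpha=0.5, label="Mean")
ax.legend(frameon=False)
plt.close(fig)
```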

Value Labels on Bars

bars = ax.bar(categories, values, color="steelblue")
for bar, val in zip(bars, values):
    ax.text(bar.get_x() + bar.get_width() / 2,
            bar.get_height() + 1,
            f"{val}%", ha="center", va="bottom", fontsize=10)
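
On matplotlib 3.4 and later, ax.bar_label() can place the same labels without manual coordinate arithmetic. A minimal sketch with invented categories and values; on older versions the loop above remains the way to go.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical data
categories = ["North", "South", "East", "West"]
values = [72, 65, 81, 58]

fig, ax = plt.subplots()
bars = ax.bar(categories, values, color="steelblue")

# One label per bar; padding is the gap above the bar, in points
texts = ax.bar_label(bars, labels=[f"{v}%" for v in values],
                     padding=3, fontsize=10)

ax.set_ylim(0, 100)  # zero baseline, with headroom for the labels
plt.close(fig)
```

Because padding is measured in points rather than data units, the gap stays the same when the y-axis range changes, unlike the fixed + 1 offset in the manual version.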

Grouped Bar Chart

import numpy as np

x = np.arange(len(categories))
width = 0.35

ax.bar(x - width/2, values_a, width, label="Group A",
       color="steelblue")
ax.bar(x + width/2, values_b, width, label="Group B",
       color="tomato")

ax.set_xticks(x)
ax.set_xticklabels(categories)
ax.legend(frameon=False)

Saving

fig.tight_layout()                      # ALWAYS call this first

# Raster (for screens, slides, notebooks)
fig.savefig("chart.png", dpi=150, bbox_inches="tight")

# High-res raster (for print)
fig.savefig("chart_print.png", dpi=300, bbox_inches="tight")

# Vector (for publications, web scaling)
fig.savefig("chart.svg", bbox_inches="tight")
fig.savefig("chart.pdf", bbox_inches="tight")

Display

plt.show()  # In scripts, shows the figure window
# In Jupyter, figures display inline automatically
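
One caveat for scripts that generate many charts: figures created through pyplot stay in memory until they are closed. A small sketch, assuming a loop over hypothetical series names, that closes each figure once it is finished.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical series, one chart each
series = {"a": [1, 3, 2], "b": [2, 2, 4], "c": [5, 1, 3]}

open_before = len(plt.get_fignums())

for name, values in series.items():
    fig, ax = plt.subplots()
    ax.plot(values)
    ax.set_title(name)
    # fig.savefig(f"{name}.png", dpi=150, bbox_inches="tight")  # save here if needed
    plt.close(fig)  # release the figure once it is no longer needed

open_after = len(plt.get_fignums())  # same count as before the loop
```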

Common Mistakes Checklist

  • [ ] Using pyplot (plt.title()) when you should use OO (ax.set_title())
  • [ ] Forgetting fig.tight_layout() before save/show -- labels get cut off
  • [ ] Bar chart y-axis not starting at zero -- visually misleading
  • [ ] Overlapping x-axis labels -- fix with rotation or horizontal bars
  • [ ] Rainbow colors on bars that represent the same variable -- use one color
  • [ ] Title says the topic, not the finding -- "Bar Chart" tells the reader nothing
  • [ ] Missing axis labels or units -- "Rate" could mean anything; "Vaccination Rate (%)" is clear

Design Principles (from Chapter 14, applied here)

  1. Title states the finding: Not "Vaccination Rates" but "Africa Trails the Global Average by 27 Points"
  2. Y-axis at zero for bar charts. Always. No exceptions.
  3. Faint gridlines: alpha=0.2 or 0.3. Grid should support, not compete.
  4. Remove unnecessary spines: Top and right spines are almost never needed.
  5. Legend without border: frameon=False. The border is non-data ink.
  6. Use color intentionally: Color should encode a variable or highlight a data point, not decorate.
  7. Annotate the story: Reference lines, text labels, and arrows guide the reader to your finding.

What You Should Be Able to Do Now

  • [ ] Create line charts, bar charts, scatter plots, and histograms using fig, ax = plt.subplots()
  • [ ] Customize with titles, labels, colors, gridlines, and spine removal
  • [ ] Build multi-panel figures with plt.subplots(rows, cols) and shared axes
  • [ ] Annotate charts with annotate(), axhline(), axvline(), and text()
  • [ ] Work with pandas DataFrames directly in matplotlib
  • [ ] Save figures as PNG (at 300 DPI), SVG, or PDF with savefig()
  • [ ] Avoid the common mistakes in the checklist above

If you can do all of this, you have a solid matplotlib foundation. In Chapter 16, seaborn will make many of these operations simpler and add statistical visualization capabilities. But you'll always be able to drop down to matplotlib when you need full control — and you'll understand what seaborn is doing under the hood.


Next: Chapter 16 — Statistical Visualization with seaborn