Chapter 14: Introduction to Data Visualization with matplotlib

Learning Objectives

By the end of this chapter, you will be able to:

  • Understand the matplotlib architecture: Figure, Axes, and the pyplot interface
  • Apply the "good chart" checklist to every visualization you create
  • Build line charts for trends, bar charts for comparisons, histograms for distributions, and scatter plots for relationships
  • Create horizontal bar charts for long category labels
  • Know when to use — and when to avoid — pie charts
  • Save charts to files using .savefig() with appropriate DPI and format settings
  • Apply formatting: colors, line styles, markers, fonts, and grid lines
  • Compose multiple charts into a single figure using plt.subplots()
  • Generate charts quickly from pandas DataFrames using the .plot() accessor
  • Choose the right chart type for each business analysis scenario

14.1 Why Visualization Matters in Business

Numbers tell part of the story. Charts tell the rest.

When Priya Okonkwo emails Sandra Chen a table of twelve monthly revenue figures, Sandra has to read all twelve numbers, mentally compute the trend, and form a judgment about whether performance is improving. When Priya emails a line chart of those same twelve numbers with a clear upward trend line, Sandra sees the story in three seconds.

Data visualization is not decoration. It is the most efficient mechanism humans have for recognizing patterns, comparing magnitudes, and identifying anomalies in quantitative data. A well-designed chart communicates in seconds what a table communicates in minutes.

At the same time, a poorly designed chart — missing labels, misleading scales, inappropriate chart type — actively distorts understanding. This chapter teaches you not just the mechanics of matplotlib, but also the principles that separate charts that illuminate from charts that confuse.


14.2 The matplotlib Architecture

Before writing a single line of chart code, it helps to understand how matplotlib is organized. There are three levels to the architecture:

14.2.1 The Figure

A Figure is the entire canvas — the outermost container for everything you draw. Think of it as a blank piece of paper. A Figure can contain one or many charts.

14.2.2 The Axes

An Axes is a single chart within a Figure. Despite the plural-looking name, Axes is the class for one chart; do not confuse it with an Axis, which is just the x-axis or y-axis object inside it. Each Axes has its own x-axis, y-axis, tick marks, labels, title, and plot area. Most of your chart-building work happens at the Axes level.

14.2.3 The pyplot Interface

matplotlib.pyplot (conventionally imported as plt) provides a state-based convenience interface that manages the current Figure and current Axes for you. For simple, single-chart scripts, pyplot is the fastest way to work. For complex multi-panel figures, the object-oriented interface (working directly with fig and ax objects) is more reliable.

import matplotlib.pyplot as plt
import numpy as np

# ── pyplot interface (simple, implicit) ──────────────────────────────────────
plt.figure(figsize=(8, 5))
plt.plot([1, 2, 3, 4], [10, 25, 20, 35])
plt.title("Simple Line Chart")
plt.xlabel("Period")
plt.ylabel("Value")
plt.show()

# ── Object-oriented interface (explicit, recommended for complex figures) ────
fig, ax = plt.subplots(figsize=(8, 5))
ax.plot([1, 2, 3, 4], [10, 25, 20, 35])
ax.set_title("Simple Line Chart")
ax.set_xlabel("Period")
ax.set_ylabel("Value")
plt.tight_layout()
plt.show()

Both produce the same result. Throughout this chapter, we will use the object-oriented interface (fig, ax = plt.subplots(...)) because it scales cleanly to multi-panel figures and avoids the state-management quirks of the pyplot interface.
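The "state" in state-based is easy to see in code. In this sketch, pyplot functions such as plt.title() act on whichever Axes is current. Right after plt.subplots() that is the ax you just created, but creating another figure silently changes the target. It selects the non-interactive Agg backend (an assumption) so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # Non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(8, 5))

# pyplot tracks a "current" Figure and Axes; right after subplots() they are fig and ax
assert plt.gcf() is fig
assert plt.gca() is ax

# So a pyplot call modifies the same object as the equivalent ax method
plt.title("Set through pyplot")
print(ax.get_title())   # Set through pyplot

# Creating a second figure changes what "current" means
fig2, ax2 = plt.subplots()
plt.title("Lands on the new Axes")
print(ax.get_title())   # Set through pyplot (unchanged)
print(ax2.get_title())  # Lands on the new Axes
```

With explicit ax.set_title() calls, there is never any doubt about which chart is being modified. That is why the object-oriented style scales better to multi-panel figures.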

14.2.4 The Anatomy of a Chart

Every well-formed matplotlib chart has these components:

┌─────────────────────────────────────────────────────────┐
│                       Figure Title                      │
│                                                         │
│  Y-axis   ┌──────────────────────────────────────────┐  │
│  label    │                                          │  │
│           │              Plot Area (Axes)            │  │
│     ↑     │                                          │  │
│     │     │         (data is drawn here)             │  │
│           │                                          │  │
│           └──────────────────────────────────────────┘  │
│                         X-axis label                    │
│                                                         │
│  Legend (optional)    Source / Note (optional)          │
└─────────────────────────────────────────────────────────┘
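Each labeled region in the diagram corresponds to a specific call. The sketch below sets every element so you can match code to diagram. The data and text are placeholders. Placing the source note with fig.text() in figure coordinates, and reserving a bottom strip with tight_layout(rect=...), are illustrative choices rather than a fixed convention.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(8, 5))                       # Figure + one Axes
ax.plot([1, 2, 3, 4], [10, 25, 20, 35], label="Actual")      # Data in the plot area
ax.plot([1, 2, 3, 4], [12, 20, 24, 30], label="Target")

fig.suptitle("Figure Title", fontweight="bold")              # Title for the whole Figure
ax.set_title("Axes title")                                    # Title for this one chart
ax.set_xlabel("X-axis label")
ax.set_ylabel("Y-axis label")
ax.legend(loc="upper left")                                   # Legend (optional)

# Source / note (optional): (0.01, 0.01) is the lower-left corner in figure coordinates
fig.text(0.01, 0.01, "Source: illustrative data", fontsize=8)

# Leave the bottom 4% of the figure free so the note does not collide with the x label
plt.tight_layout(rect=(0, 0.04, 1, 1))
```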

14.3 The Good Chart Checklist

Every chart you produce for a business audience should pass this checklist before it leaves your desk:

Item                    Description
Title                   Clear, descriptive, specific to the data shown
Axis labels             Both axes labeled with the measure name and units
Appropriate scale       Y-axis starts at zero for bar charts; uses a meaningful range for line charts
Legend                  Present when multiple series are shown; absent when it would be redundant
Data labels             Considered for bar charts; used when exact values matter
Color                   Purposeful, not decorative; accessible to colorblind readers
Grid lines              Light lines along the value axis only: horizontal for most charts, vertical for horizontal bar charts
Whitespace              plt.tight_layout() to prevent label clipping
Saved at adequate DPI   150 DPI minimum for screen; 300 DPI for print
Right chart type        The chart type matches the analytical question

Priya runs through this checklist mentally for every chart she sends to Sandra. It has saved her from several embarrassing moments — missing axis labels, truncated y-axes that exaggerate small changes, and legend entries that say "Series 1" instead of "North Region."
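Some checklist items are mechanical enough to check in code before a chart leaves your desk. The function check_chart below is a hypothetical helper written for illustration, not part of matplotlib. It covers only title, axis labels, legend for multiple series, and a zero baseline for vertical bars. It cannot judge chart type, color, or whether a title is actually descriptive.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

def check_chart(ax):
    """Hypothetical helper: return a list of checklist problems found on an Axes."""
    problems = []
    if not ax.get_title():
        problems.append("missing title")
    if not ax.get_xlabel() or not ax.get_ylabel():
        problems.append("missing axis label")
    # matplotlib gives unlabeled lines internal labels that start with "_"
    labeled = [ln for ln in ax.get_lines() if not ln.get_label().startswith("_")]
    if len(labeled) > 1 and ax.get_legend() is None:
        problems.append("multiple series but no legend")
    # Vertical bars are Rectangle patches; their value axis should start at zero
    if ax.patches and ax.get_ylim()[0] > 0:
        problems.append("bar chart y-axis does not start at zero")
    return problems

# Usage: a bar chart that breaks several checklist items
fig, ax = plt.subplots()
ax.bar(["North", "South"], [328000, 196000])
ax.set_ylim(150000, 350000)   # Truncated axis exaggerates the gap
print(check_chart(ax))
# ['missing title', 'missing axis label', 'bar chart y-axis does not start at zero']
```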


14.4 Line Charts: Showing Trends

Line charts are the correct choice when you are showing how a continuous variable changes over an ordered sequence, which is almost always time.

14.4.1 Basic Line Chart

import matplotlib.pyplot as plt
import pandas as pd

monthly = pd.DataFrame({
    "month": pd.date_range("2024-01-01", periods=12, freq="MS"),
    "revenue": [
        42000, 38000, 51000, 55000, 47000, 60000,
        63000, 58000, 67000, 71000, 65000, 79000,
    ],
})

fig, ax = plt.subplots(figsize=(10, 5))

ax.plot(
    monthly["month"],
    monthly["revenue"],
    color="#2563EB",    # Blue
    linewidth=2,
    marker="o",
    markersize=6,
    label="Monthly Revenue",
)

ax.set_title("Acme Corp — Monthly Revenue 2024", fontsize=14, fontweight="bold", pad=12)
ax.set_xlabel("Month", fontsize=11)
ax.set_ylabel("Revenue (USD)", fontsize=11)
ax.yaxis.set_major_formatter(plt.FuncFormatter(lambda x, _: f"${x:,.0f}"))
ax.grid(axis="y", linestyle="--", alpha=0.5)
ax.legend()

plt.tight_layout()
plt.savefig("monthly_revenue_line.png", dpi=150, bbox_inches="tight")
plt.show()

Key formatting decisions in this chart:

  • marker="o" adds a dot at each data point, making it easy to read individual months.
  • ax.yaxis.set_major_formatter(...) applies one formatting function to every y-axis tick, so labels read $42,000 instead of 42000 without setting each tick label by hand.
  • grid(axis="y") adds horizontal grid lines only — vertical grid lines on a line chart are rarely useful.
  • bbox_inches="tight" in savefig prevents axis labels from being clipped at the image boundary.
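The lambda wrapped in plt.FuncFormatter is one way to build currency labels. matplotlib.ticker also provides StrMethodFormatter, which takes a str.format-style template whose value field must be named x. This sketch shows that both produce the same label for a given tick value. Formatters are callable with (value, tick_position).

```python
from matplotlib.ticker import FuncFormatter, StrMethodFormatter

# Two ways to turn a raw tick value into a currency label
func_fmt = FuncFormatter(lambda x, _: f"${x:,.0f}")
str_fmt = StrMethodFormatter("${x:,.0f}")   # The field must be called x

for value in [0, 42000, 1250000]:
    print(func_fmt(value, 0), str_fmt(value, 0))
# $0 $0
# $42,000 $42,000
# $1,250,000 $1,250,000
```

In matplotlib 3.3 and later, set_major_formatter() also accepts the template string directly, for example ax.yaxis.set_major_formatter("${x:,.0f}"), and wraps it in a StrMethodFormatter for you.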

14.4.2 Multiple Lines on One Chart

regions = pd.DataFrame({
    "month": pd.date_range("2024-01-01", periods=6, freq="MS"),
    "North": [18000, 16000, 22000, 24000, 20000, 27000],
    "South": [12000, 11000, 14000, 15000, 13000, 18000],
    "West":  [8000,  9000,  11000, 12000, 10000, 14000],
    "East":  [4000,  4500,  5000,  4500,  4000,  5500],
})

COLORS = {"North": "#2563EB", "South": "#16A34A", "West": "#D97706", "East": "#DC2626"}

fig, ax = plt.subplots(figsize=(10, 5))

for region in ["North", "South", "West", "East"]:
    ax.plot(
        regions["month"],
        regions[region],
        color=COLORS[region],
        linewidth=2,
        marker="o",
        markersize=5,
        label=region,
    )

ax.set_title("Acme Corp — Revenue by Region, Jan–Jun 2024", fontsize=14, fontweight="bold")
ax.set_xlabel("Month", fontsize=11)
ax.set_ylabel("Revenue (USD)", fontsize=11)
ax.yaxis.set_major_formatter(plt.FuncFormatter(lambda x, _: f"${x:,.0f}"))
ax.legend(title="Region", loc="upper left")
ax.grid(axis="y", linestyle="--", alpha=0.5)

plt.tight_layout()
plt.show()

When comparing multiple series on one line chart, assign each line a distinct, accessible color and always include a legend. Four to five lines is typically the maximum before the chart becomes difficult to read.


14.5 Bar Charts: Comparing Categories

Bar charts are for comparing a quantitative value across discrete categories. The height of each bar encodes the value — a direct, natural visual encoding that human perception processes accurately.

14.5.1 Vertical Bar Chart

regional_revenue = pd.DataFrame({
    "region": ["North", "South", "East", "West"],
    "revenue": [328000, 196000, 142000, 244000],
})

# Sort for cleaner presentation
regional_revenue = regional_revenue.sort_values("revenue", ascending=False)

fig, ax = plt.subplots(figsize=(8, 5))

bars = ax.bar(
    regional_revenue["region"],
    regional_revenue["revenue"],
    color=["#2563EB", "#3B82F6", "#60A5FA", "#93C5FD"],  # Same hue, varying lightness
    edgecolor="white",
    linewidth=0.8,
    width=0.6,
)

# Add data labels on top of each bar
for bar in bars:
    height = bar.get_height()
    ax.text(
        bar.get_x() + bar.get_width() / 2.0,
        height + 3000,
        f"${height:,.0f}",
        ha="center",
        va="bottom",
        fontsize=9,
        fontweight="bold",
    )

ax.set_title("Acme Corp — Annual Revenue by Region", fontsize=14, fontweight="bold")
ax.set_xlabel("Region", fontsize=11)
ax.set_ylabel("Revenue (USD)", fontsize=11)
ax.yaxis.set_major_formatter(plt.FuncFormatter(lambda x, _: f"${x:,.0f}"))
ax.set_ylim(0, regional_revenue["revenue"].max() * 1.15)  # Leave room for data labels
ax.grid(axis="y", linestyle="--", alpha=0.4)
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)

plt.tight_layout()
plt.show()
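The manual ax.text() loop works in any matplotlib version. On matplotlib 3.4 or later (check matplotlib.__version__), ax.bar_label() places one label per bar. Its padding argument is measured in points rather than data units, so the hardcoded + 3000 offset is no longer needed. A sketch with the same regional data:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import pandas as pd

regional_revenue = pd.DataFrame({
    "region": ["North", "South", "East", "West"],
    "revenue": [328000, 196000, 142000, 244000],
}).sort_values("revenue", ascending=False)

fig, ax = plt.subplots(figsize=(8, 5))
bars = ax.bar(regional_revenue["region"], regional_revenue["revenue"],
              color="#2563EB", width=0.6)

# One pre-formatted string per bar, in bar order; padding is in points
labels = ax.bar_label(
    bars,
    labels=[f"${v:,.0f}" for v in regional_revenue["revenue"]],
    padding=3,
    fontsize=9,
    fontweight="bold",
)
ax.set_ylim(0, regional_revenue["revenue"].max() * 1.15)  # Room for the labels
```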

Bar chart rules:

  • Always start the y-axis at zero. A truncated y-axis makes small differences look enormous and is one of the most common ways charts mislead.
  • Sort bars by value (descending) unless category order carries inherent meaning (e.g., time periods, ordinal scales).
  • Remove the top and right spines (ax.spines["top"].set_visible(False)) for a cleaner look.
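The distortion from a truncated axis can be quantified. A bar's drawn length is proportional to its value minus the axis floor, so two bars appear in the ratio (a - floor) / (b - floor) rather than a / b. The function visual_ratio is a hypothetical helper that applies this formula to the North and East totals from the example above.

```python
def visual_ratio(a, b, axis_floor=0):
    """Hypothetical helper: how many times longer bar a looks than bar b
    when the value axis starts at axis_floor."""
    return (a - axis_floor) / (b - axis_floor)

north, east = 328000, 142000
print(f"Axis starting at 0:     {visual_ratio(north, east):.1f}x")          # 2.3x
print(f"Axis starting at 130K:  {visual_ratio(north, east, 130000):.1f}x")  # 16.5x
```

A real difference of about 2.3x looks like a 16.5x difference once the axis starts at $130,000. That is the reason for the zero-baseline rule.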

14.5.2 Grouped Bar Chart

quarters = ["Q1", "Q2", "Q3", "Q4"]
north_rev = [78000, 85000, 90000, 95000]
south_rev = [45000, 52000, 48000, 55000]

x = range(len(quarters))
width = 0.35

fig, ax = plt.subplots(figsize=(9, 5))

bars_n = ax.bar([i - width/2 for i in x], north_rev, width=width,
                label="North", color="#2563EB")
bars_s = ax.bar([i + width/2 for i in x], south_rev, width=width,
                label="South", color="#16A34A")

ax.set_title("Acme Corp — Quarterly Revenue: North vs. South", fontsize=14, fontweight="bold")
ax.set_xlabel("Quarter", fontsize=11)
ax.set_ylabel("Revenue (USD)", fontsize=11)
ax.set_xticks(list(x))
ax.set_xticklabels(quarters)
ax.yaxis.set_major_formatter(plt.FuncFormatter(lambda x, _: f"${x:,.0f}"))
ax.legend()
ax.grid(axis="y", linestyle="--", alpha=0.4)
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)

plt.tight_layout()
plt.show()
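The ±width/2 offsets above are hard-coded for two series. For any number of series, one common approach is to split a fixed group width evenly and center the bars on each tick. This is sketched below with a hypothetical helper, bar_offsets. The West figures are made up purely to show a third series.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

def bar_offsets(n_series, group_width=0.8):
    """Hypothetical helper: per-series x-offsets and bar width for a grouped bar chart."""
    width = group_width / n_series
    # Center the group on 0: 2 series -> [-0.2, 0.2]; 3 series -> about [-0.27, 0, 0.27]
    offsets = (np.arange(n_series) - (n_series - 1) / 2) * width
    return offsets, width

quarters = ["Q1", "Q2", "Q3", "Q4"]
series = {
    "North": [78000, 85000, 90000, 95000],
    "South": [45000, 52000, 48000, 55000],
    "West": [30000, 34000, 39000, 41000],   # Illustrative values, not chapter data
}
x = np.arange(len(quarters))
offsets, width = bar_offsets(len(series))

fig, ax = plt.subplots(figsize=(9, 5))
for (name, values), offset in zip(series.items(), offsets):
    ax.bar(x + offset, values, width=width, label=name)
ax.set_xticks(x)
ax.set_xticklabels(quarters)
ax.legend()
```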

14.5.3 Stacked Bar Chart

Stacked bars show both total magnitude and part-to-whole composition in a single chart:

categories = ["Software", "Hardware", "Services"]
north = [180000, 85000, 63000]
south = [95000, 62000, 39000]

fig, ax = plt.subplots(figsize=(8, 5))

ax.bar(categories, north, label="North", color="#2563EB")
ax.bar(categories, south, bottom=north, label="South", color="#93C5FD")

ax.set_title("Acme Corp — Revenue by Product Category and Region", fontsize=14, fontweight="bold")
ax.set_xlabel("Product Category", fontsize=11)
ax.set_ylabel("Revenue (USD)", fontsize=11)
ax.yaxis.set_major_formatter(plt.FuncFormatter(lambda x, _: f"${x:,.0f}"))
ax.legend()
ax.grid(axis="y", linestyle="--", alpha=0.4)
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)

plt.tight_layout()
plt.show()
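With more than two stacked series, each layer's bottom is the running total of every layer beneath it. One way to sketch this keeps the running total in a NumPy array. The West values here are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

categories = ["Software", "Hardware", "Services"]
layers = {
    "North": [180000, 85000, 63000],
    "South": [95000, 62000, 39000],
    "West": [70000, 41000, 28000],   # Illustrative values, not chapter data
}

fig, ax = plt.subplots(figsize=(8, 5))
bottom = np.zeros(len(categories))   # Running total: where the next layer starts
for name, values in layers.items():
    ax.bar(categories, values, bottom=bottom, label=name)
    bottom = bottom + np.array(values)   # New array, so earlier bars are unaffected
ax.legend()
```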

14.6 Horizontal Bar Charts: Long Category Names

When category labels are long (product names, customer names, job titles), a horizontal bar chart is almost always easier to read than a vertical one. Labels appear along the y-axis with plenty of room, and the reader's eye naturally follows the bars from left to right.

products = pd.DataFrame({
    "product": [
        "Enterprise Cloud Suite",
        "Professional Services Bundle",
        "Data Analytics Platform",
        "Legacy Support Contract",
        "Hardware Refresh Package",
        "Security Compliance Module",
    ],
    "revenue": [285000, 198000, 176000, 143000, 121000, 98000],
})

products = products.sort_values("revenue")  # Sort ascending — largest at top visually

fig, ax = plt.subplots(figsize=(9, 5))

bars = ax.barh(
    products["product"],
    products["revenue"],
    color="#2563EB",
    edgecolor="white",
    height=0.6,
)

# Add data labels at end of each bar
for bar in bars:
    width = bar.get_width()
    ax.text(
        width + 3000,
        bar.get_y() + bar.get_height() / 2.0,
        f"${width:,.0f}",
        va="center",
        fontsize=9,
    )

ax.set_title("Acme Corp — Revenue by Product Line", fontsize=14, fontweight="bold")
ax.set_xlabel("Revenue (USD)", fontsize=11)
ax.set_ylabel("")  # Category names are self-explanatory
ax.xaxis.set_major_formatter(plt.FuncFormatter(lambda x, _: f"${x:,.0f}"))
ax.set_xlim(0, products["revenue"].max() * 1.18)
ax.grid(axis="x", linestyle="--", alpha=0.4)
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)

plt.tight_layout()
plt.show()

Sort with sort_values(ascending=True) before plotting. barh() draws the first row of data at the bottom of the chart, so ascending order puts the highest value at the top, the most prominent position for a horizontal bar.
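If you prefer to keep the data in descending order, the way it would appear in a table, ax.invert_yaxis() produces the same top-to-bottom ranking. A sketch with four of the product lines:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import pandas as pd

products = pd.DataFrame({
    "product": ["Enterprise Cloud Suite", "Professional Services Bundle",
                "Data Analytics Platform", "Legacy Support Contract"],
    "revenue": [285000, 198000, 176000, 143000],
}).sort_values("revenue", ascending=False)   # Largest first, as in a table

fig, ax = plt.subplots(figsize=(9, 4))
ax.barh(products["product"], products["revenue"], color="#2563EB", height=0.6)
ax.invert_yaxis()   # The first row (the largest) is now drawn at the top
```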


14.7 Histograms: Showing Distributions

A histogram answers the question "how are my values distributed?" It divides the range of values into equal-width bins and shows how many observations fall into each bin. This is different from a bar chart, which compares named categories.

14.7.1 Basic Histogram

import numpy as np

# Simulate order values for Acme Corp
np.random.seed(42)
order_values = np.concatenate([
    np.random.normal(loc=3500, scale=800, size=150),   # Mid-range orders
    np.random.normal(loc=8500, scale=1200, size=50),   # Large orders
])
order_values = order_values[order_values > 0]

fig, ax = plt.subplots(figsize=(9, 5))

n, bins, patches = ax.hist(
    order_values,
    bins=25,
    color="#2563EB",
    edgecolor="white",
    linewidth=0.5,
    alpha=0.85,
)

# Add vertical lines for mean and median
mean_val = np.mean(order_values)
median_val = np.median(order_values)

ax.axvline(mean_val, color="#DC2626", linewidth=2, linestyle="--", label=f"Mean: ${mean_val:,.0f}")
ax.axvline(median_val, color="#16A34A", linewidth=2, linestyle="-.", label=f"Median: ${median_val:,.0f}")

ax.set_title("Acme Corp — Order Value Distribution", fontsize=14, fontweight="bold")
ax.set_xlabel("Order Value (USD)", fontsize=11)
ax.set_ylabel("Number of Orders", fontsize=11)
ax.xaxis.set_major_formatter(plt.FuncFormatter(lambda x, _: f"${x:,.0f}"))
ax.legend()
ax.grid(axis="y", linestyle="--", alpha=0.4)
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)

plt.tight_layout()
plt.show()

When the mean and median diverge, the distribution is skewed; here the smaller cluster of large orders pulls the mean above the median. The bimodal shape (two humps) suggests that Acme has two distinct customer segments: mid-range buyers and enterprise buyers.

14.7.2 Choosing the Number of Bins

The number of bins (bins= parameter) significantly affects how the histogram looks:

  • Too few bins (e.g., 5): oversimplifies, obscures the shape
  • Too many bins (e.g., 100): too noisy, hard to read
  • A good rule of thumb for business data: bins = int(len(data)**0.5) or simply experiment with 15–30 bins
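Rather than guessing, you can compare the bin counts that standard rules produce. NumPy's np.histogram_bin_edges() implements several named rules: "sqrt", "sturges", "fd" (Freedman-Diaconis), and "auto". ax.hist() accepts the same names in its bins argument. This sketch regenerates the simulated order values from 14.7.1 and prints each rule's choice.

```python
import numpy as np

np.random.seed(42)
order_values = np.concatenate([
    np.random.normal(loc=3500, scale=800, size=150),
    np.random.normal(loc=8500, scale=1200, size=50),
])
order_values = order_values[order_values > 0]

# Number of bins each rule would choose for this data set
for rule in ["sqrt", "sturges", "fd", "auto"]:
    edges = np.histogram_bin_edges(order_values, bins=rule)
    print(f"{rule:>8}: {len(edges) - 1} bins")

# The chapter's rule of thumb, for comparison
print(f"int(n**0.5): {int(len(order_values) ** 0.5)} bins")
```

Passing one of these names straight to ax.hist(order_values, bins="fd") lets NumPy pick the bin count from the data's spread. This is often a better starting point than a fixed number.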

14.8 Scatter Plots: Showing Relationships

Scatter plots reveal whether two continuous variables are related. Each point represents one observation, with its x-position encoding one variable and y-position encoding another.

14.8.1 Basic Scatter Plot

marketing_data = pd.DataFrame({
    "marketing_spend": [5000, 8000, 12000, 15000, 6000, 9500, 18000, 11000, 7500, 14000],
    "revenue": [42000, 58000, 78000, 95000, 48000, 62000, 112000, 74000, 52000, 88000],
    "region": ["North","South","North","West","East","South","North","West","East","North"],
})

REGION_COLORS = {
    "North": "#2563EB",
    "South": "#16A34A",
    "West":  "#D97706",
    "East":  "#DC2626",
}

fig, ax = plt.subplots(figsize=(9, 6))

for region, group in marketing_data.groupby("region"):
    ax.scatter(
        group["marketing_spend"],
        group["revenue"],
        c=REGION_COLORS[region],
        label=region,
        s=80,          # Marker size
        alpha=0.85,
        edgecolors="white",
        linewidths=0.5,
    )

ax.set_title("Marketing Spend vs. Revenue by Region", fontsize=14, fontweight="bold")
ax.set_xlabel("Marketing Spend (USD)", fontsize=11)
ax.set_ylabel("Revenue (USD)", fontsize=11)
ax.xaxis.set_major_formatter(plt.FuncFormatter(lambda x, _: f"${x:,.0f}"))
ax.yaxis.set_major_formatter(plt.FuncFormatter(lambda x, _: f"${x:,.0f}"))
ax.legend(title="Region")
ax.grid(linestyle="--", alpha=0.3)
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)

plt.tight_layout()
plt.show()

14.8.2 Adding a Trend Line

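Add these lines to the 14.8.1 example after the scatter loop and before ax.legend(...), so the trend line is drawn on the same Axes and appears in the legend:

import numpy as np

x = marketing_data["marketing_spend"].values
y = marketing_data["revenue"].values

# Fit a first-degree polynomial (a straight line): z = [slope, intercept]
z = np.polyfit(x, y, 1)
p = np.poly1d(z)                              # Callable: p(x) = slope * x + intercept
x_line = np.linspace(x.min(), x.max(), 100)   # 100 evenly spaced points across the data

ax.plot(x_line, p(x_line), color="gray", linewidth=1.5,
        linestyle="--", label="Trend", zorder=0)  # zorder=0 keeps the line behind the points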
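The fitted coefficients are worth reading, not just drawing. np.polyfit(x, y, 1) returns the slope first and the intercept second. np.corrcoef() gives the Pearson correlation coefficient r, a number between -1 and 1 that summarizes how tightly the points follow a straight line. A sketch using the 14.8.1 data:

```python
import numpy as np

spend = np.array([5000, 8000, 12000, 15000, 6000, 9500, 18000, 11000, 7500, 14000])
revenue = np.array([42000, 58000, 78000, 95000, 48000, 62000, 112000, 74000, 52000, 88000])

slope, intercept = np.polyfit(spend, revenue, 1)   # Highest power first
r = np.corrcoef(spend, revenue)[0, 1]               # Off-diagonal entry of the 2x2 matrix

print(f"Each extra $1 of marketing spend is associated with ${slope:,.2f} of revenue")
print(f"Intercept: ${intercept:,.0f}   Pearson r: {r:.3f}")
```

With only ten observations, treat the slope as descriptive. A strong correlation does not show that marketing spend causes the revenue.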

14.8.3 When NOT to Use a Scatter Plot

Scatter plots require continuous data on both axes. They are not appropriate for:

  • Comparing categories (use bar charts)
  • Showing how one variable changes over time when the time axis is non-continuous
  • Data with fewer than 10 points (consider a table instead)


14.9 Pie Charts: Sparingly and Appropriately

Pie charts show parts-of-a-whole. They work best when:

  • There are 5 or fewer categories
  • The differences between segments are meaningful (not all similar sizes)
  • You want to emphasize one segment's dominance

They are frequently overused. If you want to compare values, a bar chart is almost always clearer — human perception of angles and arc lengths is less accurate than perception of bar heights.

tier_revenue = pd.DataFrame({
    "tier": ["Gold", "Silver", "Bronze"],
    "revenue": [485000, 298000, 127000],
})

fig, ax = plt.subplots(figsize=(7, 7))

wedges, texts, autotexts = ax.pie(
    tier_revenue["revenue"],
    labels=tier_revenue["tier"],
    autopct="%1.1f%%",
    colors=["#F59E0B", "#94A3B8", "#92400E"],
    startangle=90,
    wedgeprops={"edgecolor": "white", "linewidth": 2},
    explode=[0.04, 0, 0],   # Slightly separate the Gold slice
)

for autotext in autotexts:
    autotext.set_fontsize(11)
    autotext.set_fontweight("bold")

ax.set_title("Revenue by Customer Tier — 2024", fontsize=14, fontweight="bold")

plt.tight_layout()
plt.show()

When to use a bar chart instead: If you need to compare three tiers and the exact percentages matter, a horizontal bar chart with data labels is usually clearer. The pie chart is better suited to "Gold is approximately half of all revenue" type messages.
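Here is one way to sketch that alternative. The tier data is converted to percentage shares with pandas and drawn as labeled horizontal bars. It uses ax.bar_label(), which assumes matplotlib 3.4 or later; the styling follows the chapter's conventions but is otherwise an illustrative choice.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import pandas as pd

tier_revenue = pd.DataFrame({
    "tier": ["Gold", "Silver", "Bronze"],
    "revenue": [485000, 298000, 127000],
})
# Share of total revenue, in percent
tier_revenue["share"] = tier_revenue["revenue"] / tier_revenue["revenue"].sum() * 100
tier_revenue = tier_revenue.sort_values("share")   # Ascending: largest bar on top

fig, ax = plt.subplots(figsize=(8, 3.5))
bars = ax.barh(tier_revenue["tier"], tier_revenue["share"], color="#2563EB", height=0.6)
ax.bar_label(bars, labels=[f"{s:.1f}%" for s in tier_revenue["share"]], padding=3)
ax.set_xlim(0, 100)
ax.set_xlabel("Share of 2024 Revenue (%)")
ax.set_title("Revenue by Customer Tier, 2024", fontweight="bold")
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
plt.tight_layout()
```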


14.10 Saving Figures with .savefig()

A chart that lives only in a Jupyter notebook or a temporary plt.show() window does not help anyone. Use .savefig() to produce a file that can be shared, embedded in reports, or versioned.

fig.savefig(
    "acme_revenue_chart.png",
    dpi=150,              # Dots per inch: 72 for draft, 150 for screen, 300 for print
    bbox_inches="tight",  # Trim whitespace; prevents label clipping
    facecolor="white",    # Ensures white background (not transparent)
    format="png",         # Explicit format: "png", "svg", "pdf", "jpg"
)

Format guidance:

Format   Use case
PNG      Presentations, emails, web embeds (most common)
SVG      Scalable vector; ideal for web and print; editable in Illustrator
PDF      Reports and documents; perfect quality at any size
JPG      Photos; avoid for charts, because compression artifacts blur text and lines

DPI guidance:

  • 72 DPI: Screen resolution minimum (draft use only)
  • 150 DPI: Good quality for digital documents and presentations
  • 300 DPI: Print-ready quality
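Pixel dimensions follow directly from figure size times DPI. An 8 × 5 inch figure saved at 150 DPI is 1,200 × 750 pixels; at 300 DPI it is 2,400 × 1,500. This sketch saves one figure in three formats and checks the PNG's size. It deliberately omits bbox_inches="tight", which trims the image and so changes the final dimensions. It also writes to a temporary directory, an assumption made so the example leaves no files behind.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(8, 5))          # 8 x 5 inches
ax.plot([1, 2, 3, 4], [10, 25, 20, 35], color="#2563EB")

out_dir = tempfile.mkdtemp()
paths = {}
for fmt in ["png", "svg", "pdf"]:
    path = os.path.join(out_dir, f"chart.{fmt}")
    # dpi sets the pixel size of PNG output; SVG and PDF keep lines and text as vectors
    fig.savefig(path, dpi=150, facecolor="white")
    paths[fmt] = path

img = plt.imread(paths["png"])   # Array of shape (height, width, channels)
print(img.shape[:2])             # (750, 1200): 5 in x 150 DPI by 8 in x 150 DPI
```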

14.11 Formatting: Colors, Styles, Fonts, and Grids

14.11.1 Colors

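matplotlib accepts colors in multiple formats:

  • Named colors: "blue", "red", "forestgreen"
  • Hex codes: "#2563EB", "#DC2626"
  • RGB tuples with components from 0 to 1: (0.145, 0.388, 0.922) is the same blue as "#2563EB"
  • CSS color names: matplotlib's named colors include the full CSS4 set, so they work the same way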

For business charts, choose a consistent palette and stick to it across all charts in the same report. Acme Corp uses blue as its primary brand color, so Priya uses #2563EB (a medium blue) throughout her dashboards.

Colorblind-accessible palettes:

About 8% of men and 0.5% of women have color vision deficiency. Avoid encoding meaning through red-green distinction alone. The IBM Design Color Palette and the ColorBrewer palettes (available via the seaborn library) are designed for accessibility.
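matplotlib also ships a built-in style sheet named "tableau-colorblind10", whose default color cycle is designed with color vision deficiency in mind. This sketch applies it only inside a with block, so global settings are restored afterwards. It also pairs each color with a different marker, so series stay distinguishable without relying on color alone. The values are the first three months of the regional data from 14.4.2.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar"]
series = {
    "North": [18000, 16000, 22000],
    "South": [12000, 11000, 14000],
    "West": [8000, 9000, 11000],
    "East": [4000, 4500, 5000],
}
markers = ["o", "s", "^", "D"]   # Redundant encoding: shape as well as color

# The style applies only inside this block
with plt.style.context("tableau-colorblind10"):
    palette = plt.rcParams["axes.prop_cycle"].by_key()["color"]
    fig, ax = plt.subplots(figsize=(8, 5))
    for (region, values), marker in zip(series.items(), markers):
        ax.plot(months, values, marker=marker, linewidth=2, label=region)
    ax.legend(title="Region")
```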

14.11.2 Line Styles

ax.plot(x, y1, linestyle="-",   label="Solid")
ax.plot(x, y2, linestyle="--",  label="Dashed")
ax.plot(x, y3, linestyle="-.",  label="Dash-dot")
ax.plot(x, y4, linestyle=":",   label="Dotted")
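A common business use of line style is separating actuals from projections: solid for what happened, dashed for what is expected. This sketch uses the first six revenue figures from 14.4.1. The three forecast values are made up purely for illustration.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug"]
actual = [42000, 38000, 51000, 55000, 47000, 60000]
forecast = [60000, 64000, 68000]   # Illustrative projection starting at the last actual

fig, ax = plt.subplots(figsize=(9, 5))
ax.plot(months[:6], actual, color="#2563EB", linestyle="-", marker="o", label="Actual")
# Start the dashed segment at June so the two lines connect
ax.plot(months[5:], forecast, color="#2563EB", linestyle="--", marker="o",
        markerfacecolor="white", label="Forecast")
ax.set_ylabel("Revenue (USD)")
ax.legend()
```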

14.11.3 Markers

ax.plot(x, y, marker="o",   label="Circle")   # Most common
ax.plot(x, y, marker="s",   label="Square")
ax.plot(x, y, marker="^",   label="Triangle up")
ax.plot(x, y, marker="D",   label="Diamond")
ax.plot(x, y, marker="x",   label="X")

14.11.4 Fonts

# Set font size for a specific element
ax.set_title("Chart Title", fontsize=14, fontweight="bold")
ax.set_xlabel("X Label", fontsize=11)
ax.tick_params(axis="both", labelsize=9)

# Set font size globally for the entire figure
plt.rcParams.update({
    "font.size": 10,
    "axes.titlesize": 14,
    "axes.labelsize": 11,
})
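plt.rcParams.update() changes settings for every chart created afterwards in the session. When you want larger fonts for one figure only, for a slide for example, plt.rc_context() applies them temporarily and restores the previous values when the with block ends. A sketch; the font sizes are illustrative.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

before = plt.rcParams["axes.titlesize"]

with plt.rc_context({"font.size": 14, "axes.titlesize": 20, "axes.labelsize": 16}):
    fig, ax = plt.subplots(figsize=(10, 5.6))
    ax.plot([1, 2, 3, 4], [10, 25, 20, 35])
    ax.set_title("Slide version")      # Picks up the 20-point title size
    title_size = ax.title.get_fontsize()

after = plt.rcParams["axes.titlesize"]
print(title_size, before == after)   # 20.0 True
```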

14.11.5 Grid Lines

ax.grid(True)                           # Both axes, default style
ax.grid(axis="y")                       # Horizontal only (for bar/line charts)
ax.grid(axis="x")                       # Vertical only (for horizontal bar charts)
ax.grid(linestyle="--", alpha=0.5)     # Dashed, semi-transparent
ax.grid(linestyle="--", color="gray", alpha=0.3)  # Custom color

For most business charts, light horizontal grid lines (axis="y", alpha=0.3–0.5) improve readability without visual clutter.
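With default settings, matplotlib draws grid lines below lines but above patches. On a bar chart, that means the grid can cross over the bars. ax.set_axisbelow(True) moves grid lines and ticks beneath all data, and the rcParam "axes.axisbelow" sets the same behavior globally. A short sketch:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(8, 5))
ax.bar(["North", "South", "East", "West"], [328000, 196000, 142000, 244000],
       color="#2563EB", width=0.6)
ax.grid(axis="y", linestyle="--", alpha=0.4)
ax.set_axisbelow(True)   # Grid lines and ticks are drawn behind the bars
```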

14.11.6 Cleaning Up Spines

The four "spines" are the borders of the plot area. Removing the top and right spines gives charts a cleaner, more modern look:

ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)

14.12 Multiple Subplots: plt.subplots(nrows, ncols)

A dashboard is a set of related charts on a single figure. Use plt.subplots() to create a grid of Axes objects.

14.12.1 Creating a 2×2 Dashboard

fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(14, 10))
fig.suptitle("Acme Corp — Sales Dashboard Q1 2024", fontsize=16, fontweight="bold", y=1.02)

# axes is a 2D NumPy array: axes[row][col]
ax_line   = axes[0][0]  # Top-left
ax_bar    = axes[0][1]  # Top-right
ax_hist   = axes[1][0]  # Bottom-left
ax_hbar   = axes[1][1]  # Bottom-right

# ── Panel 1: Monthly Revenue Line ─────────────────────────────────────────
months = ["Jan", "Feb", "Mar"]
revenue = [132000, 148000, 168000]

ax_line.plot(months, revenue, color="#2563EB", linewidth=2, marker="o", markersize=8)
ax_line.set_title("Monthly Revenue Trend", fontsize=12, fontweight="bold")
ax_line.set_ylabel("Revenue (USD)")
ax_line.yaxis.set_major_formatter(plt.FuncFormatter(lambda x, _: f"${x/1000:.0f}K"))
ax_line.grid(axis="y", linestyle="--", alpha=0.4)
ax_line.spines["top"].set_visible(False)
ax_line.spines["right"].set_visible(False)

# ── Panel 2: Revenue by Region Bar ────────────────────────────────────────
regions = ["North", "South", "East", "West"]
rev_by_region = [168000, 110000, 88000, 82000]

ax_bar.bar(regions, rev_by_region, color="#2563EB", edgecolor="white", width=0.6)
ax_bar.set_title("Revenue by Region", fontsize=12, fontweight="bold")
ax_bar.set_ylabel("Revenue (USD)")
ax_bar.yaxis.set_major_formatter(plt.FuncFormatter(lambda x, _: f"${x/1000:.0f}K"))
ax_bar.grid(axis="y", linestyle="--", alpha=0.4)
ax_bar.spines["top"].set_visible(False)
ax_bar.spines["right"].set_visible(False)

# ── Panel 3: Order Value Distribution ─────────────────────────────────────
import numpy as np
np.random.seed(7)
order_vals = np.concatenate([
    np.random.normal(3500, 700, 120),
    np.random.normal(8000, 1100, 40),
])
order_vals = order_vals[order_vals > 0]

ax_hist.hist(order_vals, bins=20, color="#2563EB", edgecolor="white", alpha=0.85)
ax_hist.set_title("Order Value Distribution", fontsize=12, fontweight="bold")
ax_hist.set_xlabel("Order Value (USD)")
ax_hist.set_ylabel("Number of Orders")
ax_hist.xaxis.set_major_formatter(plt.FuncFormatter(lambda x, _: f"${x/1000:.0f}K"))
ax_hist.grid(axis="y", linestyle="--", alpha=0.4)
ax_hist.spines["top"].set_visible(False)
ax_hist.spines["right"].set_visible(False)

# ── Panel 4: Product Revenue Horizontal Bar ────────────────────────────────
products = ["Enterprise Cloud\nSuite", "Professional\nServices", "Data Analytics\nPlatform",
            "Security Module", "Legacy Support"]
prod_rev = [198000, 145000, 128000, 88000, 72000]

# Sort ascending so highest value is at top
order = sorted(range(len(prod_rev)), key=lambda i: prod_rev[i])
sorted_products = [products[i] for i in order]
sorted_rev = [prod_rev[i] for i in order]

ax_hbar.barh(sorted_products, sorted_rev, color="#2563EB", edgecolor="white", height=0.6)
ax_hbar.set_title("Revenue by Product Line", fontsize=12, fontweight="bold")
ax_hbar.set_xlabel("Revenue (USD)")
ax_hbar.xaxis.set_major_formatter(plt.FuncFormatter(lambda x, _: f"${x/1000:.0f}K"))
ax_hbar.grid(axis="x", linestyle="--", alpha=0.4)
ax_hbar.spines["top"].set_visible(False)
ax_hbar.spines["right"].set_visible(False)

plt.tight_layout()
plt.savefig("acme_dashboard.png", dpi=150, bbox_inches="tight", facecolor="white")
plt.show()
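Every panel in the dashboard repeats the same grid and spine calls. Because axes is a NumPy array, axes.flat iterates over all panels in row order, so shared styling can be applied in one loop. The helper style_panel below is hypothetical; it bundles exactly the calls the dashboard repeats. The grid direction is chosen per panel, since the horizontal bar panel uses vertical grid lines.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

def style_panel(ax, grid_axis="y"):
    """Hypothetical helper: light grid on the value axis, top/right spines removed."""
    ax.grid(axis=grid_axis, linestyle="--", alpha=0.4)
    ax.spines["top"].set_visible(False)
    ax.spines["right"].set_visible(False)

fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(14, 10))

# Order of axes.flat: top-left, top-right, bottom-left, bottom-right
grid_axes = ["y", "y", "y", "x"]   # Bottom-right is the horizontal bar panel
for ax, grid_axis in zip(axes.flat, grid_axes):
    style_panel(ax, grid_axis)
```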

14.12.2 Uneven Subplot Layouts

For figures where one panel should be wider or taller than the others, use gridspec:

from matplotlib.gridspec import GridSpec

fig = plt.figure(figsize=(14, 8))
gs = GridSpec(2, 2, figure=fig, hspace=0.4, wspace=0.3)

ax_top = fig.add_subplot(gs[0, :])  # Full-width top panel
ax_bl  = fig.add_subplot(gs[1, 0])  # Bottom-left
ax_br  = fig.add_subplot(gs[1, 1])  # Bottom-right
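GridSpec can also make columns or rows unequal in size through its width_ratios and height_ratios arguments. A sketch with a main chart twice as wide as a side panel; the 2:1 ratio is an illustrative choice, and the numbers are the Q1 figures from 14.12.1.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec

fig = plt.figure(figsize=(14, 5))
gs = GridSpec(1, 2, figure=fig, width_ratios=[2, 1], wspace=0.3)

ax_main = fig.add_subplot(gs[0, 0])   # Wide panel: the trend
ax_side = fig.add_subplot(gs[0, 1])   # Narrow panel: a breakdown

ax_main.plot(["Jan", "Feb", "Mar"], [132000, 148000, 168000], marker="o", color="#2563EB")
ax_main.set_title("Monthly Revenue Trend")
ax_side.bar(["North", "South", "East", "West"], [168000, 110000, 88000, 82000], color="#2563EB")
ax_side.set_title("Revenue by Region")
```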

14.13 Integrating with pandas: df.plot()

pandas has a built-in .plot() method that calls matplotlib under the hood. For quick exploratory charts, .plot() is faster to write; for publication-quality charts, the full matplotlib API gives you more control.

import pandas as pd
import matplotlib.pyplot as plt

monthly_summary = pd.DataFrame({
    "month": ["Jan", "Feb", "Mar", "Apr", "May", "Jun"],
    "revenue": [132000, 148000, 168000, 155000, 172000, 195000],
    "cost":    [52000,  58000,  65000,  60000,  66000,  73000],
})
monthly_summary = monthly_summary.set_index("month")

# Line chart using df.plot()
ax = monthly_summary[["revenue", "cost"]].plot(
    kind="line",
    figsize=(10, 5),
    color={"revenue": "#2563EB", "cost": "#DC2626"},
    linewidth=2,
    marker="o",
    title="Acme Corp — Revenue vs. Cost (Jan–Jun 2024)",
)
ax.set_ylabel("Amount (USD)")
ax.yaxis.set_major_formatter(plt.FuncFormatter(lambda x, _: f"${x/1000:.0f}K"))
ax.grid(axis="y", linestyle="--", alpha=0.4)
plt.tight_layout()
plt.show()

# Bar chart using df.plot()
ax2 = monthly_summary["revenue"].plot(
    kind="bar",
    figsize=(9, 5),
    color="#2563EB",
    title="Monthly Revenue",
    rot=0,    # Don't rotate x-tick labels
)
ax2.set_ylabel("Revenue (USD)")
ax2.yaxis.set_major_formatter(plt.FuncFormatter(lambda x, _: f"${x/1000:.0f}K"))
plt.tight_layout()
plt.show()
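Unlike the line and bar examples, kind="scatter" needs explicit x= and y= column names, because pandas cannot infer which columns go on which axis. A sketch with the marketing data from 14.8.1:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import pandas as pd

marketing_data = pd.DataFrame({
    "marketing_spend": [5000, 8000, 12000, 15000, 6000, 9500, 18000, 11000, 7500, 14000],
    "revenue": [42000, 58000, 78000, 95000, 48000, 62000, 112000, 74000, 52000, 88000],
})

ax = marketing_data.plot(
    kind="scatter",
    x="marketing_spend",     # Column for the horizontal axis
    y="revenue",             # Column for the vertical axis
    s=60,                    # Marker size, passed through to ax.scatter()
    color="#2563EB",
    figsize=(8, 5),
    title="Marketing Spend vs. Revenue",
)
ax.set_xlabel("Marketing Spend (USD)")
ax.set_ylabel("Revenue (USD)")
```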

Available kind= values in df.plot():

kind        Chart type
"line"      Line chart (default)
"bar"       Vertical bar chart
"barh"      Horizontal bar chart
"hist"      Histogram
"scatter"   Scatter plot (requires x= and y= kwargs)
"pie"       Pie chart
"area"      Stacked area chart
"box"       Box-and-whisker plot
"kde"       Kernel density estimate

14.14 Choosing the Right Chart Type

Analysis Question                                             Best Chart Type
How has revenue changed over time?                            Line chart
Which region has the most revenue?                            Bar chart (vertical)
How does revenue compare across products with long names?     Horizontal bar chart
What is the distribution of order sizes?                      Histogram
Is there a relationship between marketing spend and revenue?  Scatter plot
What proportion of revenue comes from each tier?              Bar chart or pie (if ≤5 categories)
How does revenue by region compare across quarters?           Grouped bar chart
How does each category contribute to the total over time?     Stacked bar chart or area chart
How are two groups distributed compared to each other?        Overlapping histograms or box plots

14.15 Chapter Summary

matplotlib is the foundation of Python's visualization ecosystem. You started with its three-level architecture (Figure, Axes, pyplot) and the object-oriented interface that scales from single charts to complex dashboards. You applied the "good chart" checklist — titles, labels, appropriate scale, legends, grid lines — to every chart you built.

The core chart types you now know when to use:

  • Line charts for trends over time
  • Vertical bar charts for comparing named categories
  • Horizontal bar charts when category labels are long
  • Histograms for understanding distributions
  • Scatter plots for relationships between continuous variables
  • Pie charts sparingly, for 3–5 category part-to-whole stories

You learned to save publication-quality figures with .savefig(), apply consistent formatting with colors and fonts, and compose multi-panel dashboards with plt.subplots(). Finally, you saw how to generate charts quickly from pandas DataFrames using .plot().

These tools cover the standard chart types that appear in most business reporting. In Chapter 15, you will extend this foundation with seaborn, a higher-level visualization library that handles statistical charts and styling with less boilerplate.


Key Terms

Figure — The top-level container in matplotlib, representing the entire canvas.

Axes — A single chart within a Figure; contains the x-axis, y-axis, plot area, and all decorators (title, labels, legend).

pyplot — The matplotlib.pyplot module; provides a state-based interface that tracks the current Figure and Axes automatically.

DPI — Dots Per Inch; controls image resolution when saving. 150 DPI for digital use; 300 DPI for print.

Spine — One of the four borders of the Axes plot area (top, bottom, left, right).

Tick formatter — A function or format string that controls how axis tick labels are displayed (e.g., $42,000 instead of 42000).

Subplot — One Axes within a Figure that contains multiple Axes arranged in a grid.

Legend — A box identifying which line style or color corresponds to which data series.

Bin — A fixed-width interval used to group continuous values in a histogram.