Chapter 14 Key Takeaways: Introduction to Data Visualization with matplotlib

The Big Idea

A well-designed chart communicates in seconds what a table communicates in minutes. matplotlib gives you complete control over every element of a business visualization — from the chart type and colors to the tick formatter and the DPI of the saved file. Mastering it means you can produce publication-quality charts that accurately represent your data and support your audience's decisions.


Core Concepts

1. The matplotlib Architecture

matplotlib organizes visualizations around two nested objects, plus a convenience layer on top of them:

  • Figure — the overall canvas, created with plt.figure() or plt.subplots()
  • Axes — a single chart within the Figure; the object on which you call all drawing and formatting methods
  • pyplot (plt) — a convenience interface that tracks the "current" Figure and Axes; useful for simple scripts

Always prefer the object-oriented interface (fig, ax = plt.subplots()) for business work. It is explicit, readable, and scales cleanly to multi-panel dashboards.
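
A minimal sketch of the object-oriented pattern, assuming hypothetical revenue figures and a non-interactive backend so that it runs without a display. The point is only to show which calls belong to the Axes and which to the Figure:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt

# Hypothetical monthly revenue figures, for illustration only
months = ["Jan", "Feb", "Mar", "Apr"]
revenue = [42_000, 45_500, 44_100, 48_900]

# plt.subplots() returns the Figure (canvas) and one Axes (the chart)
fig, ax = plt.subplots(figsize=(8, 4))

# Drawing and formatting calls go on the Axes object
ax.plot(months, revenue, marker="o")
ax.set_title("Monthly Revenue, Jan-Apr")
ax.set_xlabel("Month")
ax.set_ylabel("Revenue (USD)")

# Figure-level operations go on the Figure object
fig.tight_layout()
```

Because every call names the object it acts on, the same code keeps working unchanged when the figure later grows to several panels.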

2. The Good Chart Checklist

Every chart you produce for a business audience should have:

  • A clear, specific title
  • Axis labels with units for both axes
  • An appropriate scale: bar charts must start at zero; line charts can use a meaningful range
  • A legend when more than one series is shown (omit it when it would be redundant)
  • Light horizontal grid lines to support quantitative comparisons
  • Spine cleanup: remove the top and right spines for a cleaner look
  • plt.tight_layout() to prevent label clipping
  • A sufficient DPI (150 for digital; 300 for print)

Run this checklist mentally while building every chart, not as a final review.
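
One way to make the checklist habitual is to wrap the repetitive items in a small helper. The sketch below is an illustration, not a standard API: apply_chart_checklist is a hypothetical function name, and the regions and sales values are invented.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt


def apply_chart_checklist(ax, title, xlabel, ylabel):
    """Hypothetical helper: apply the formatting items of the checklist to one Axes."""
    ax.set_title(title)
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)
    ax.grid(axis="y", alpha=0.3)   # light horizontal grid lines
    ax.set_axisbelow(True)         # keep the grid behind the data
    for side in ("top", "right"):  # spine cleanup
        ax.spines[side].set_visible(False)
    # Only add a legend when more than one labeled series is present
    handles, labels = ax.get_legend_handles_labels()
    if len(labels) > 1:
        ax.legend(frameon=False)


# Hypothetical data, for illustration only
regions = ["North", "South", "East", "West"]
sales = [120, 95, 143, 88]

fig, ax = plt.subplots(figsize=(8, 4))
ax.bar(regions, sales, color="steelblue")
ax.set_ylim(bottom=0)  # bar charts start at zero
apply_chart_checklist(ax, "Q1 Sales by Region", "Region", "Sales (USD thousands)")
fig.tight_layout()
```

The scale and the DPI stay outside the helper on purpose: the right y-axis range depends on the chart type, and the DPI is chosen at save time.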

3. Choosing the Right Chart Type

Question                                              Chart
How has revenue changed over time?                    Line chart
Which category is largest?                            Bar chart (vertical)
Which product has the most revenue (long names)?      Horizontal bar chart
How are my order values distributed?                  Histogram
Is there a relationship between spend and revenue?    Scatter plot
What fraction of revenue comes from each tier?        Bar chart or pie (≤5 categories)
How do two periods compare across categories?         Grouped bar chart

One of the most common charting mistakes in business is using a pie chart when a bar chart would communicate the comparison more accurately.

4. Line Charts

Line charts imply continuity between points, making them ideal for time series. Key practices:

  • Use marker="o" and a markersize of 5 to 8 so individual data points are visible
  • Add a rolling average line (color="gray", linestyle="--") to show the trend through noise
  • Format the y-axis as currency using matplotlib.ticker.FuncFormatter
  • Use ax.fill_between() for a light shaded area below the line (optional but polished)
  • Annotate key points such as peaks and troughs with ax.annotate()
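
The practices above can be combined in one short sketch. The revenue series is hypothetical, the 3-month window is an arbitrary choice, and the Agg backend is assumed so the code runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.ticker import FuncFormatter

# Hypothetical monthly revenue, for illustration only
revenue = pd.Series(
    [41_000, 43_500, 39_800, 46_200, 48_100, 45_300,
     51_700, 49_900, 53_400, 52_100, 57_800, 55_600],
    index=pd.date_range("2024-01-01", periods=12, freq="MS"),
)
rolling = revenue.rolling(window=3).mean()  # first two values are NaN

fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(revenue.index, revenue.values, marker="o", markersize=6,
        color="tab:blue", label="Monthly revenue")
ax.plot(rolling.index, rolling.values, color="gray", linestyle="--",
        label="3-month rolling average")
ax.fill_between(revenue.index, revenue.values, color="tab:blue", alpha=0.1)

# Currency formatting for the y-axis tick labels
ax.yaxis.set_major_formatter(FuncFormatter(lambda value, pos: f"${value:,.0f}"))

# Annotate the peak month
peak_date = revenue.idxmax()
peak_value = revenue.max()
ax.annotate(
    f"Peak: ${peak_value:,.0f}",
    xy=(peak_date, peak_value),
    xytext=(0, 15),
    textcoords="offset points",
    ha="center",
    arrowprops={"arrowstyle": "->"},
)

ax.set_title("Monthly Revenue, 2024")
ax.set_xlabel("Month")
ax.set_ylabel("Revenue (USD)")
ax.legend(frameon=False)
fig.tight_layout()
```

Note that fill_between() shades down to zero by default, which also pulls the y-axis down to zero; pass a second y argument if a tighter range is wanted.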

5. Bar Charts

Bar charts encode magnitude as bar height measured from a shared baseline, one of the most accurate visual encodings for comparison. Rules:

  • Always start the y-axis at zero. A truncated y-axis distorts relative magnitudes.
  • Sort bars by value (descending) unless category order has inherent meaning.
  • Add data labels above each bar for precision.
  • Use ax.spines[["top", "right"]].set_visible(False) to clean up the frame.
  • For stacked bars, pass the cumulative base via the bottom parameter.
  • For grouped bars with two series, manually offset the x-positions by ±width/2.
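
A sketch of a two-series grouped bar chart that follows these rules. The categories and values are hypothetical, and sorting by the most recent quarter is one reasonable choice among several:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical revenue by category (USD thousands) as (Q1, Q2) pairs
data = {
    "Hardware": (310, 355),
    "Software": (420, 470),
    "Services": (150, 140),
    "Training": (60, 85),
}

# Sort categories by the most recent quarter, descending
categories = sorted(data, key=lambda name: data[name][1], reverse=True)
q1 = [data[name][0] for name in categories]
q2 = [data[name][1] for name in categories]

x = np.arange(len(categories))
width = 0.38

fig, ax = plt.subplots(figsize=(9, 5))
# Grouped bars: shift each series by half a bar width around the tick position
bars_q1 = ax.bar(x - width / 2, q1, width, label="Q1", color="lightsteelblue")
bars_q2 = ax.bar(x + width / 2, q2, width, label="Q2", color="steelblue")

# Data labels just above each bar
for bar in list(bars_q1) + list(bars_q2):
    height = bar.get_height()
    ax.text(bar.get_x() + bar.get_width() / 2, height + 5,
            f"{height:,.0f}", ha="center", va="bottom")

ax.set_xticks(x)
ax.set_xticklabels(categories)
ax.set_ylim(0, max(q1 + q2) * 1.15)  # zero baseline, headroom for the labels
ax.set_title("Revenue by Category, Q1 vs Q2")
ax.set_ylabel("Revenue (USD thousands)")
ax.legend(frameon=False)
for side in ("top", "right"):
    ax.spines[side].set_visible(False)
fig.tight_layout()
```

For a stacked version of the same data, the second call would instead be ax.bar(x, q2, width, bottom=q1), with both series drawn at the same x-positions.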

6. Horizontal Bar Charts

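Use horizontal bars when category names are long. Sort the data in ascending order before plotting (for example, with sort_values(ascending=True)): barh() draws the first row at the bottom, so the highest value ends up at the top, the most visually prominent position.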
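
A short sketch of the sort-then-plot step, using hypothetical product names and revenue figures:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical revenue by product (USD thousands), for illustration only
products = pd.Series({
    "Enterprise Analytics Suite": 480,
    "Cloud Backup Standard Plan": 215,
    "On-Site Training Package": 130,
    "Premium Support Subscription": 340,
})

# Ascending sort: the largest value is plotted last, so it lands at the top
ordered = products.sort_values(ascending=True)

fig, ax = plt.subplots(figsize=(9, 4))
ax.barh(ordered.index, ordered.values, color="steelblue")
ax.set_title("Revenue by Product")
ax.set_xlabel("Revenue (USD thousands)")
for side in ("top", "right"):
    ax.spines[side].set_visible(False)
fig.tight_layout()
```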

7. Histograms

Histograms reveal the distribution of a continuous variable. Key choices:

  • A bins value of 15 to 30 is typically right for business data; adjust by eye
  • Add ax.axvline() markers for the mean and median to give viewers a reference
  • The gap between mean and median indicates skewness: a mean well above the median points to a long right tail
  • Do not confuse histograms (continuous distributions) with bar charts (discrete categories)
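
The sketch below illustrates these choices on synthetic data. The order values are drawn from a log-normal distribution purely as an assumption, to produce the kind of right-skewed shape that order data often has:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt
import numpy as np

# Synthetic, right-skewed order values (an assumption for illustration)
rng = np.random.default_rng(42)
order_values = rng.lognormal(mean=4.0, sigma=0.6, size=500)

mean_value = order_values.mean()
median_value = np.median(order_values)

fig, ax = plt.subplots(figsize=(9, 5))
counts, edges, _ = ax.hist(order_values, bins=20, color="steelblue",
                           edgecolor="white")

# Reference lines for mean and median
ax.axvline(mean_value, color="firebrick", linestyle="--",
           label=f"Mean: {mean_value:,.0f}")
ax.axvline(median_value, color="black", linestyle=":",
           label=f"Median: {median_value:,.0f}")

ax.set_title("Distribution of Order Values")
ax.set_xlabel("Order value (USD)")
ax.set_ylabel("Number of orders")
ax.legend(frameon=False)
fig.tight_layout()
```

With a long right tail like this one, the mean line sits to the right of the median line, which is the visual cue for positive skew.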

8. Scatter Plots

Scatter plots reveal relationships between two continuous variables. Best practices:

  • Add a trend line using numpy.polyfit and ax.plot()
  • Display the R² value as a text annotation for context
  • Use the s parameter to encode a third variable as bubble size
  • Color-code points by category for a fourth variable
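
A sketch of the trend line and R² steps on synthetic data. The spend, revenue, and order-count values are generated for illustration, and R² is computed here by hand as 1 minus the ratio of residual to total sum of squares:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data (an assumption for illustration): revenue rises with ad spend
rng = np.random.default_rng(7)
ad_spend = rng.uniform(5, 50, size=40)                       # USD thousands
revenue = 3.2 * ad_spend + 20 + rng.normal(0, 12, size=40)   # USD thousands
order_count = rng.integers(10, 200, size=40)                 # third variable

# Straight-line fit: polyfit returns coefficients from highest degree down
slope, intercept = np.polyfit(ad_spend, revenue, deg=1)
predicted = slope * ad_spend + intercept

# R² = 1 - SS_res / SS_tot
ss_res = np.sum((revenue - predicted) ** 2)
ss_tot = np.sum((revenue - revenue.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

fig, ax = plt.subplots(figsize=(8, 6))
ax.scatter(ad_spend, revenue, s=order_count, alpha=0.6, color="steelblue")

x_line = np.linspace(ad_spend.min(), ad_spend.max(), 100)
ax.plot(x_line, slope * x_line + intercept, color="gray", linestyle="--")

# Place the annotation in axes coordinates so it stays in the corner
ax.text(0.05, 0.92, f"R² = {r_squared:.2f}", transform=ax.transAxes)

ax.set_title("Ad Spend vs Revenue")
ax.set_xlabel("Ad spend (USD thousands)")
ax.set_ylabel("Revenue (USD thousands)")
fig.tight_layout()
```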

9. Saving Figures

Always save charts with:

fig.savefig("filename.png", dpi=150, bbox_inches="tight", facecolor="white")
  • dpi=150 for digital delivery; dpi=300 for print
  • bbox_inches="tight" prevents axis labels from being clipped
  • facecolor="white" guarantees a white background, even when the active style or settings would make it colored or transparent
  • Call savefig() before plt.show() in scripts

10. Multi-Panel Dashboards

Use plt.subplots(nrows, ncols) to create grids of charts:

fig, axes = plt.subplots(2, 2, figsize=(14, 9))
ax_tl = axes[0][0]  # top-left
ax_tr = axes[0][1]  # top-right
ax_bl = axes[1][0]  # bottom-left
ax_br = axes[1][1]  # bottom-right
  • Name your axes (ax_line, ax_bar) rather than indexing them
  • Use fig.suptitle() for the figure-level title
  • Call plt.tight_layout() after building all panels; when the figure has a suptitle, use plt.tight_layout(rect=[0, 0, 1, 0.96]) to reserve space for it at the top
  • Use gridspec for uneven layouts where one panel needs more space
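
For the uneven case, a minimal sketch using Figure.add_gridspec() might look like the following. The panel names and the 2:1 height ratio are illustrative assumptions, and the panels are left empty to keep the layout logic visible:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(14, 9))

# 2x2 grid in which the top row is twice as tall as the bottom row
grid = fig.add_gridspec(2, 2, height_ratios=[2, 1])

ax_line = fig.add_subplot(grid[0, :])  # top row, spanning both columns
ax_bar = fig.add_subplot(grid[1, 0])   # bottom-left
ax_hist = fig.add_subplot(grid[1, 1])  # bottom-right

ax_line.set_title("Revenue Trend")
ax_bar.set_title("Revenue by Region")
ax_hist.set_title("Order Value Distribution")

fig.suptitle("Sales Dashboard", fontsize=16)
# Reserve the top 4% of the figure for the suptitle
fig.tight_layout(rect=[0, 0, 1, 0.96])
```

Slicing the grid (grid[0, :]) is what lets one panel span several cells; plt.subplots() alone always produces equally sized panels.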

11. pandas .plot() Integration

For quick exploratory charts, df.plot() is faster to write:

df.set_index("month")["revenue"].plot(kind="line", figsize=(10, 5), color="blue")

For publication-quality work, use the full matplotlib API for precision control over every element.
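
The two approaches also combine: the pandas call returns a matplotlib Axes, so a quick chart can be refined afterwards with the same methods used elsewhere in this chapter. A small sketch with a hypothetical DataFrame:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data, for illustration only
df = pd.DataFrame({
    "month": ["Jan", "Feb", "Mar", "Apr", "May", "Jun"],
    "revenue": [41_000, 43_500, 39_800, 46_200, 48_100, 45_300],
})

# The pandas call returns the matplotlib Axes it drew on
ax = df.set_index("month")["revenue"].plot(kind="line", figsize=(10, 5), color="blue")

# From here on, the regular Axes methods apply
ax.set_title("Monthly Revenue")
ax.set_xlabel("Month")
ax.set_ylabel("Revenue (USD)")
for side in ("top", "right"):
    ax.spines[side].set_visible(False)

fig = ax.get_figure()
fig.tight_layout()
```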


Common Mistakes to Avoid

Truncating the y-axis on bar charts. Even a 5% difference looks huge if the y-axis starts at 95% of the minimum value. Always start at zero.
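
The size of the distortion is simple arithmetic: a bar's drawn height is its value minus the axis baseline. The sketch below uses a hypothetical helper, apparent_ratio, and two invented values that differ by 5%:

```python
def apparent_ratio(smaller, larger, baseline):
    """Hypothetical helper: ratio of drawn bar heights for a given y-axis baseline."""
    return (larger - baseline) / (smaller - baseline)


# Two values that differ by 5%
smaller, larger = 100, 105

honest = apparent_ratio(smaller, larger, baseline=0)      # axis starts at zero
truncated = apparent_ratio(smaller, larger, baseline=95)  # axis starts at 95

print(f"Zero baseline: larger bar is {honest:.2f}x the height")
print(f"Baseline at 95: larger bar is {truncated:.2f}x the height")
```

With the axis starting at 95, the drawn heights are 5 and 10, so a 5% difference is shown as one bar twice as tall as the other.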

Too many lines on one chart. Four to five lines is the practical maximum. More than five makes colors indistinguishable and the legend overwhelming.

Forgetting plt.tight_layout(). Without it, long axis labels and titles can be clipped at the figure boundary or overlap neighboring panels.

Not calling plt.close() in loops. Figures created through pyplot stay in memory until they are closed. If you generate many charts in a loop, call plt.close(fig) after saving each one to free that memory.
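
A sketch of the save-then-close pattern in a loop. The regional data is hypothetical, and the charts are written to a temporary directory only so the example leaves nothing behind in the working folder:

```python
import tempfile
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt

# Hypothetical quarterly sales per region (USD thousands)
regional_sales = {
    "North": [120, 132, 128, 141],
    "South": [95, 99, 104, 110],
    "East": [143, 139, 150, 158],
}

output_dir = Path(tempfile.mkdtemp())  # assumption: temporary output location
saved_paths = []

for region, sales in regional_sales.items():
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.plot(range(1, len(sales) + 1), sales, marker="o")
    ax.set_title(f"Quarterly Sales: {region}")
    ax.set_xlabel("Quarter")
    ax.set_ylabel("Sales (USD thousands)")

    path = output_dir / f"sales_{region.lower()}.png"
    fig.savefig(path, dpi=150, bbox_inches="tight", facecolor="white")
    saved_paths.append(path)

    plt.close(fig)  # release this figure before creating the next one
```

Passing the figure explicitly to plt.close() avoids any ambiguity about which figure is "current" at that point in the loop.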

Using default colors for all series. matplotlib's default blue is fine for a single-series chart, but if you have four regions, assign consistent, meaningful colors across your entire report.

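Saving without facecolor="white". matplotlib's default figure background is white, but a style sheet, an rcParams setting, or transparent=True can change it or make it transparent. In email clients and some PDF viewers, a transparent background renders as black, so set the facecolor explicitly.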


The Visualization → Decision Loop

The ultimate test of a business chart is whether it supports a decision. Ask yourself before sending any chart:

  1. What question does this chart answer?
  2. Can a busy executive read the answer in 5 seconds?
  3. Is any element of this chart potentially misleading?
  4. Have I applied the good chart checklist?

If you can give a clear, one-sentence answer to question 1, answer "yes" to questions 2 and 4, and answer "no" to question 3, the chart is ready.


What Comes Next

Chapter 15 introduces seaborn, a higher-level visualization library built on top of matplotlib. seaborn handles many routine formatting tasks with less code and excels at statistical charts (distribution plots, correlation heatmaps, categorical plots). With matplotlib's foundations from this chapter in place, seaborn is a natural and efficient extension.