Chapter 14 Key Takeaways: Introduction to Data Visualization with matplotlib
The Big Idea
A well-designed chart communicates in seconds what a table communicates in minutes. matplotlib gives you complete control over every element of a business visualization — from the chart type and colors to the tick formatter and the DPI of the saved file. Mastering it means you can produce publication-quality charts that accurately represent your data and support your audience's decisions.
Core Concepts
1. The matplotlib Architecture
matplotlib organizes visualizations around three pieces:
- Figure: the overall canvas, created with plt.figure() or plt.subplots()
- Axes: a single chart within the Figure; the object on which you call all drawing and formatting methods
- pyplot (plt): a convenience interface that tracks the "current" Figure and Axes; useful for simple scripts
Always prefer the object-oriented interface (fig, ax = plt.subplots()) for business work. It is explicit, readable, and scales cleanly to multi-panel dashboards.
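As a minimal sketch of the object-oriented style, the snippet below builds one chart from explicit Figure and Axes references. The revenue figures are invented for illustration, and the Agg backend line is an assumption that lets the script run without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption); drop this line in a notebook
import matplotlib.pyplot as plt

# Hypothetical monthly revenue figures, for illustration only
months = ["Jan", "Feb", "Mar", "Apr"]
revenue = [120_000, 135_000, 128_000, 150_000]

# Object-oriented style: keep explicit references to the Figure and the Axes
fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(months, revenue, marker="o")
ax.set_title("Monthly Revenue")
ax.set_xlabel("Month")
ax.set_ylabel("Revenue (USD)")

# pyplot's "current" Figure and Axes point at the same pair we just created
same_axes = plt.gca() is ax
same_figure = plt.gcf() is fig
```

Because every call goes through ax or fig, the same code keeps working when the chart later becomes one panel of a larger figure.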
2. The Good Chart Checklist
Every chart you produce for a business audience should have:
- A clear, specific title
- Axis labels with units for both axes
- Appropriate scale — bar charts must start at zero; line charts can use a meaningful range
- A legend when more than one series is shown (omit it when it would be redundant)
- Light horizontal grid lines to support quantitative comparisons
- Spine cleanup — remove the top and right spines for a cleaner look
- plt.tight_layout() to prevent label clipping
- A sufficient DPI (150 for digital; 300 for print)
Run this checklist mentally while building every chart, not as a final review.
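One way to make the checklist habitual is to wrap the repeatable items in a small helper. The function name polish_axes and the sales data below are hypothetical; this is a sketch of the idea, not a prescribed utility.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption); drop this line in a notebook
import matplotlib.pyplot as plt


def polish_axes(ax, title, xlabel, ylabel):
    """Apply the repeatable checklist items to one Axes (hypothetical helper)."""
    ax.set_title(title)
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)
    ax.yaxis.grid(True, alpha=0.3)      # light horizontal grid lines only
    ax.set_axisbelow(True)              # draw the grid behind the bars
    ax.spines["top"].set_visible(False)
    ax.spines["right"].set_visible(False)


# Hypothetical regional sales, in thousands of dollars
regions = ["North", "South", "East", "West"]
sales = [310, 275, 420, 190]

fig, ax = plt.subplots(figsize=(8, 4))
ax.bar(regions, sales)
ax.set_ylim(bottom=0)                   # bar charts start at zero
polish_axes(ax, "Q1 Sales by Region", "Region", "Sales ($K)")
fig.tight_layout()                      # prevent label clipping
```

A legend is omitted here because there is only one series; DPI is set when the figure is saved.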
3. Choosing the Right Chart Type
| Question | Chart |
|---|---|
| How has revenue changed over time? | Line chart |
| Which category is largest? | Bar chart (vertical) |
| Which product has the most revenue (long names)? | Horizontal bar chart |
| How are my order values distributed? | Histogram |
| Is there a relationship between spend and revenue? | Scatter plot |
| What fraction of revenue comes from each tier? | Bar chart or pie (≤5 categories) |
| How do two periods compare across categories? | Grouped bar chart |
The single most common charting mistake in business is using a pie chart when a bar chart would communicate the comparison more accurately.
4. Line Charts
Line charts imply continuity between points, making them ideal for time series. Key practices:
- Use marker="o" and markersize=5–8 so individual data points are visible
- Add a rolling average line (color="gray", linestyle="--") to show the trend through noise
- Format the y-axis as currency using matplotlib.ticker.FuncFormatter
- Use ax.fill_between() for a light shaded area below the line (optional but polished)
- Annotate key points with ax.annotate() for peak/trough highlights
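The practices above can be combined roughly as follows. The revenue series is invented, the 3-month window is an arbitrary choice, and the rolling average is computed with NumPy rather than pandas to keep the sketch self-contained.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption); drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import FuncFormatter

# Hypothetical monthly revenue for one year
months = np.arange(1, 13)
revenue = np.array([110, 125, 118, 140, 135, 160,
                    155, 170, 150, 180, 195, 210]) * 1000.0

# 3-month rolling average; mode="valid" yields len(revenue) - window + 1 points
window = 3
rolling = np.convolve(revenue, np.ones(window) / window, mode="valid")

fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(months, revenue, marker="o", markersize=6, label="Monthly revenue")
ax.plot(months[window - 1:], rolling, color="gray", linestyle="--",
        label="3-month average")
ax.fill_between(months, revenue, alpha=0.1)  # light shading below the line

# Currency formatting on the y-axis
ax.yaxis.set_major_formatter(FuncFormatter(lambda value, pos: f"${value:,.0f}"))

# Annotate the peak month
peak = int(np.argmax(revenue))
ax.annotate("Peak", xy=(months[peak], revenue[peak]),
            xytext=(months[peak] - 2.5, revenue[peak]),
            arrowprops={"arrowstyle": "->"})

ax.set_title("Monthly Revenue")
ax.set_xlabel("Month")
ax.set_ylabel("Revenue (USD)")
ax.legend()
fig.tight_layout()
```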
5. Bar Charts
Bar charts encode magnitude as bar height, which readers compare accurately against a shared zero baseline. Rules:
- Always start the y-axis at zero. A truncated y-axis distorts relative magnitudes.
- Sort bars by value (descending) unless category order has inherent meaning.
- Add data labels above each bar for precision.
- Use ax.spines[["top", "right"]].set_visible(False) (matplotlib 3.4+) to clean up the frame; on older versions, hide each spine separately.
- For stacked bars, pass the cumulative base via the bottom parameter.
- For grouped bars, manually offset x-positions by ±width/2.
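A sketch of two of these rules in practice: a sorted bar chart with data labels, and a grouped bar chart built by offsetting x-positions. The category figures are invented, and bar_label assumes matplotlib 3.4 or newer.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption); drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical revenue by category, in thousands of dollars
revenue_by_category = {"Hardware": 420, "Software": 610,
                       "Services": 350, "Support": 275}

# Sort descending so the largest bar comes first
ordered = sorted(revenue_by_category.items(), key=lambda item: item[1], reverse=True)
names = [name for name, _ in ordered]
values = [value for _, value in ordered]

fig, (ax_sorted, ax_grouped) = plt.subplots(1, 2, figsize=(12, 4.5))

# Left panel: sorted bars with data labels and a zero baseline
bars = ax_sorted.bar(names, values)
ax_sorted.bar_label(bars, fmt="%d", padding=3)
ax_sorted.set_ylim(bottom=0)
ax_sorted.set_ylabel("Revenue ($K)")
ax_sorted.set_title("Q2 Revenue by Category")

# Right panel: grouped bars, each series shifted by half a bar width
x = np.arange(len(names))
width = 0.38
q1_values = [580, 400, 330, 260]  # hypothetical prior-quarter figures
ax_grouped.bar(x - width / 2, q1_values, width, label="Q1")
ax_grouped.bar(x + width / 2, values, width, label="Q2")
ax_grouped.set_xticks(x)
ax_grouped.set_xticklabels(names)
ax_grouped.set_ylim(bottom=0)
ax_grouped.set_ylabel("Revenue ($K)")
ax_grouped.set_title("Q1 vs. Q2 Revenue")
ax_grouped.legend()

for ax in (ax_sorted, ax_grouped):
    ax.spines["top"].set_visible(False)
    ax.spines["right"].set_visible(False)
fig.tight_layout()
```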
6. Horizontal Bar Charts
Use horizontal bars when category names are long. Sort the data in ascending order (ascending=True) before plotting: barh draws the first row at the bottom, so the highest value ends up at the top, the most visually prominent position.
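The sort direction is easy to get backwards, so here is a small sketch with invented product names. It uses plain Python sorting; with a pandas DataFrame the equivalent step would be sort_values(ascending=True).

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption); drop this line in a notebook
import matplotlib.pyplot as plt

# Hypothetical product revenue, in thousands of dollars
product_revenue = {
    "Enterprise Analytics Suite": 540,
    "Small Business Starter Pack": 210,
    "Professional Services Retainer": 375,
    "Premium Support Subscription": 120,
}

# Ascending sort: the smallest value is drawn first (bottom), the largest last (top)
ordered = sorted(product_revenue.items(), key=lambda item: item[1])
names = [name for name, _ in ordered]
values = [value for _, value in ordered]

fig, ax = plt.subplots(figsize=(9, 4))
bars = ax.barh(names, values)
ax.set_xlim(left=0)
ax.set_xlabel("Revenue ($K)")
ax.set_title("Revenue by Product")
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
fig.tight_layout()
```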
7. Histograms
Histograms reveal the distribution of a continuous variable. Key choices:
- bins=15–30 is typically right for business data; adjust by eye
- Add axvline markers for mean and median to give viewers a reference
- The gap between mean and median indicates skewness: a mean well above the median suggests a right-skewed distribution with a tail of large values
- Do not confuse histograms (continuous distributions) with bar charts (discrete categories)
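A sketch of a histogram with mean and median markers. The order values are synthetic and deliberately right-skewed (generated deterministically rather than drawn from real data), and bins=20 is simply one choice within the range suggested above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption); drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Synthetic, deterministic, right-skewed order values (illustration only)
order_values = np.exp(np.linspace(3, 6, 200))

mean_value = order_values.mean()
median_value = np.median(order_values)

fig, ax = plt.subplots(figsize=(9, 4.5))
counts, bin_edges, _ = ax.hist(order_values, bins=20, edgecolor="white")

# Reference lines for mean and median
ax.axvline(mean_value, color="red", linestyle="--",
           label=f"Mean: ${mean_value:,.0f}")
ax.axvline(median_value, color="black", linestyle=":",
           label=f"Median: ${median_value:,.0f}")

ax.set_title("Distribution of Order Values")
ax.set_xlabel("Order value (USD)")
ax.set_ylabel("Number of orders")
ax.legend()
fig.tight_layout()
```

In this synthetic data the mean sits to the right of the median, which is the pattern a right-skewed distribution produces.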
8. Scatter Plots
Scatter plots reveal relationships between two continuous variables. Best practices:
- Add a trend line using numpy.polyfit and ax.plot()
- Display the R² value as a text annotation for context
- Use the s parameter to encode a third variable as bubble size
- Color-code points by category for a fourth variable
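The following sketch combines the four practices. All data is invented, the channel names and colors are arbitrary, and R² is computed directly from the residuals of the fitted line.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption); drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical campaign data
ad_spend = np.array([10, 15, 20, 25, 30, 35, 40, 45], dtype=float)    # $K
revenue = np.array([52, 61, 78, 83, 101, 108, 125, 131], dtype=float)  # $K
orders = np.array([40, 55, 80, 90, 120, 130, 160, 170])                # third variable
channel = ["Search", "Social", "Search", "Social",
           "Search", "Social", "Search", "Social"]                     # fourth variable

# Linear trend line: polyfit returns coefficients from highest degree down
slope, intercept = np.polyfit(ad_spend, revenue, deg=1)
predicted = slope * ad_spend + intercept

# R² = 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum((revenue - predicted) ** 2)
ss_tot = np.sum((revenue - revenue.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

channel_colors = {"Search": "tab:blue", "Social": "tab:orange"}
point_colors = [channel_colors[name] for name in channel]

fig, ax = plt.subplots(figsize=(8, 5))
ax.scatter(ad_spend, revenue, s=orders, c=point_colors, alpha=0.7)
ax.plot(ad_spend, predicted, color="gray", linestyle="--")
ax.text(0.05, 0.92, f"R² = {r_squared:.2f}", transform=ax.transAxes)
ax.set_title("Ad Spend vs. Revenue")
ax.set_xlabel("Ad spend ($K)")
ax.set_ylabel("Revenue ($K)")
fig.tight_layout()
```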
9. Saving Figures
Always save charts with:
fig.savefig("filename.png", dpi=150, bbox_inches="tight", facecolor="white")
- dpi=150 for digital delivery; dpi=300 for print
- bbox_inches="tight" prevents axis labels from being clipped
- facecolor="white" ensures a white background (not transparent)
- Call savefig() before plt.show() in scripts
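A minimal end-to-end sketch of saving a chart with these settings. The data, the file name, and the use of a temporary directory are assumptions made so the example runs anywhere; in a real report you would write to your own output folder.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend (assumption); drop this line in a notebook
import matplotlib.pyplot as plt

# Hypothetical quarterly revenue, in thousands of dollars
quarters = ["Q1", "Q2", "Q3", "Q4"]
revenue = [410, 455, 430, 510]

fig, ax = plt.subplots(figsize=(8, 4))
ax.bar(quarters, revenue)
ax.set_ylim(bottom=0)
ax.set_title("Quarterly Revenue")
ax.set_ylabel("Revenue ($K)")

# Save first; in an interactive script, plt.show() would come after this call
out_path = os.path.join(tempfile.mkdtemp(), "quarterly_revenue.png")
fig.savefig(out_path, dpi=150, bbox_inches="tight", facecolor="white")
plt.close(fig)  # release the figure once it is on disk
```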
10. Multi-Panel Dashboards
Use plt.subplots(nrows, ncols) to create grids of charts:
fig, axes = plt.subplots(2, 2, figsize=(14, 9))
ax_tl = axes[0][0] # top-left
ax_tr = axes[0][1] # top-right
ax_bl = axes[1][0] # bottom-left
ax_br = axes[1][1] # bottom-right
- Name your axes (ax_line, ax_bar) rather than indexing them
- Use fig.suptitle() for the figure-level title
- Call plt.tight_layout() or plt.tight_layout(rect=[0, 0, 1, 0.96]) after building all panels
- Use gridspec for uneven layouts where one panel needs more space
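For the uneven-layout case, a sketch using a GridSpec created from the figure: one wide panel on top and two smaller panels below. The panel contents are placeholder data, and the 2:1 height ratio is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption); drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

fig = plt.figure(figsize=(14, 9))

# 2x2 grid where the top row is twice as tall as the bottom row
grid = fig.add_gridspec(2, 2, height_ratios=[2, 1])
ax_line = fig.add_subplot(grid[0, :])   # top panel spans both columns
ax_bar = fig.add_subplot(grid[1, 0])    # bottom-left
ax_hist = fig.add_subplot(grid[1, 1])   # bottom-right

# Placeholder data, for illustration only
months = np.arange(1, 13)
ax_line.plot(months, 100 + 8 * months, marker="o")
ax_line.set_title("Revenue Trend")

ax_bar.bar(["North", "South", "East", "West"], [310, 275, 420, 190])
ax_bar.set_title("Sales by Region")

ax_hist.hist(np.exp(np.linspace(3, 6, 200)), bins=20, edgecolor="white")
ax_hist.set_title("Order Value Distribution")

fig.suptitle("Sales Dashboard", fontsize=16)
# Reserve the top 4% of the figure so the suptitle does not overlap the panels
fig.tight_layout(rect=[0, 0, 1, 0.96])
```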
11. pandas .plot() Integration
For quick exploratory charts, df.plot() takes less code:
df.set_index("month")["revenue"].plot(kind="line", figsize=(10, 5), color="blue")
For publication-quality work, use the full matplotlib API for precision control over every element.
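The two approaches also combine: the pandas plot call returns a matplotlib Axes, so a quick chart can be refined with the usual Axes methods. A sketch, assuming pandas is installed and using an invented DataFrame:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption); drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical monthly revenue
df = pd.DataFrame({
    "month": ["Jan", "Feb", "Mar", "Apr", "May", "Jun"],
    "revenue": [120_000, 135_000, 128_000, 150_000, 162_000, 158_000],
})

# Quick exploratory chart; .plot() returns the matplotlib Axes it drew on
ax = df.set_index("month")["revenue"].plot(kind="line", figsize=(10, 5), color="blue")

# Refine with the regular matplotlib API
ax.set_title("Monthly Revenue")
ax.set_xlabel("Month")
ax.set_ylabel("Revenue (USD)")
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
ax.figure.tight_layout()
```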
Common Mistakes to Avoid
Truncating the y-axis on bar charts. Even a 5% difference looks huge if the y-axis starts at 95% of the minimum value. Always start at zero.
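The distortion can be quantified with a few lines of arithmetic. In this sketch (the function name and the two values are invented for illustration), the second bar is 5% larger than the first, and the y-axis starts at 95% of the smaller value:

```python
def visual_ratio(smaller, larger, baseline):
    """Ratio of drawn bar heights when the axis starts at `baseline`."""
    return (larger - baseline) / (smaller - baseline)


smaller, larger = 100.0, 105.0            # a true difference of 5%

honest = visual_ratio(smaller, larger, baseline=0.0)      # axis starts at zero
truncated = visual_ratio(smaller, larger, baseline=95.0)  # axis starts at 95

print(f"Zero baseline: second bar drawn {honest:.2f}x as tall")
print(f"Truncated axis: second bar drawn {truncated:.2f}x as tall")
```

With the truncated axis the drawn heights are 5 and 10 units, so a 5% difference is shown as a bar twice as tall.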
Too many lines on one chart. Four to five lines is the practical maximum. More than five makes colors indistinguishable and the legend overwhelming.
Forgetting plt.tight_layout(). Without it, axis labels and titles can be clipped at the figure boundary or overlap neighboring panels.
Not calling plt.close() in loops. If you generate many charts in a loop, always call plt.close() after saving each one to free memory.
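A sketch of the save-then-close pattern in a loop. The regional data and file names are hypothetical, and a temporary directory stands in for a real output folder.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend (assumption); drop this line in a notebook
import matplotlib.pyplot as plt

# Hypothetical quarterly sales per region, in thousands of dollars
regional_sales = {
    "North": [310, 330, 345, 360],
    "South": [275, 260, 290, 305],
    "East": [420, 435, 410, 450],
}
quarters = ["Q1", "Q2", "Q3", "Q4"]

out_dir = tempfile.mkdtemp()
open_before = len(plt.get_fignums())
saved_paths = []

for region, sales in regional_sales.items():
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.bar(quarters, sales)
    ax.set_ylim(bottom=0)
    ax.set_title(f"{region} Region Sales")
    ax.set_ylabel("Sales ($K)")

    path = os.path.join(out_dir, f"sales_{region.lower()}.png")
    fig.savefig(path, dpi=150, bbox_inches="tight", facecolor="white")
    plt.close(fig)  # free the figure before building the next one
    saved_paths.append(path)

open_after = len(plt.get_fignums())  # no figures left open by the loop
```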
Using default colors for all series. matplotlib's default blue is fine for a single-series chart, but if you have four regions, assign consistent, meaningful colors across your entire report.
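One simple way to keep colors consistent is a single mapping that every chart in the report reads from. The region names and hex values below are arbitrary choices for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption); drop this line in a notebook
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgba

# One shared mapping, defined once and reused by every chart (hypothetical values)
REGION_COLORS = {
    "North": "#1f77b4",
    "South": "#ff7f0e",
    "East": "#2ca02c",
    "West": "#d62728",
}

# Hypothetical data: the two charts list regions in different orders
q1_sales = {"North": 310, "South": 275, "East": 420, "West": 190}
q2_sales = {"East": 435, "West": 220, "North": 330, "South": 260}

fig, (ax_q1, ax_q2) = plt.subplots(1, 2, figsize=(12, 4))
for ax, sales, title in ((ax_q1, q1_sales, "Q1 Sales"), (ax_q2, q2_sales, "Q2 Sales")):
    regions = list(sales)
    # Look colors up by region name, not by position in the chart
    ax.bar(regions, list(sales.values()), color=[REGION_COLORS[r] for r in regions])
    ax.set_ylim(bottom=0)
    ax.set_title(title)
    ax.set_ylabel("Sales ($K)")
fig.tight_layout()
```

Because colors are keyed by region name, "East" is the same green in both panels even though it sits in a different position.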
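Saving without facecolor="white". matplotlib's stock default is a white figure background, but notebook backends, custom style sheets, or transparent=True can produce a transparent one, and in email clients and some PDF viewers transparent renders as black. Passing facecolor="white" explicitly makes the result predictable.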
The Visualization → Decision Loop
The ultimate test of a business chart is whether it supports a decision. Ask yourself before sending any chart:
- What question does this chart answer?
- Can a busy executive read the answer in 5 seconds?
- Is any element of this chart potentially misleading?
- Have I applied the good chart checklist?
If you can state the question for item 1, answer "yes" to items 2 and 4, and answer "no" to item 3, the chart is ready.
What Comes Next
Chapter 15 introduces seaborn, a higher-level visualization library built on top of matplotlib. seaborn excels at statistical charts (distribution plots, correlation heatmaps, categorical plots) and handles many routine formatting tasks with less code. With matplotlib's foundations from this chapter in place, it is a natural and efficient extension.