Key Takeaways — Chapter 11: Essential Chart Types

DataField.Dev

1. Five Methods Cover Most of the Work

ax.plot() for line charts (change over time, continuous relationships). ax.bar() / ax.barh() for bar charts (comparison across categories). ax.scatter() for scatter plots (relationships between two continuous variables). ax.hist() for histograms (distributions of one continuous variable). ax.boxplot() for box plots (distribution summaries across groups). These five methods handle the vast majority of real-world visualization work, and each corresponds directly to a question type in the Chapter 5 chart selection matrix.
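As a quick reference, the five calls can be sketched side by side. This is a minimal illustration on synthetic data (the random values, category names, and figure size are assumptions, not the chapter's dataset):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # synthetic data, for illustration only
x = np.arange(20)
y = x + rng.normal(0, 2, size=20)
groups = [rng.normal(loc, 1, size=50) for loc in (0, 1, 2)]

fig, axes = plt.subplots(1, 5, figsize=(15, 3))
axes[0].plot(x, y)                        # line chart: change over an ordered x
axes[1].bar(["A", "B", "C"], [3, 7, 5])   # bar chart: comparison across categories
axes[2].scatter(x, y)                     # scatter plot: relationship between two variables
axes[3].hist(y, bins=10)                  # histogram: distribution of one variable
axes[4].boxplot(groups)                   # box plot: distribution summaries across groups
fig.tight_layout()
```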

2. Every Parameter Is a Design Decision

The threshold concept: every parameter in every plot method (color, linewidth, marker, alpha, edgecolor, cmap, s, label) is a design decision from Parts I and II. color implements Chapter 3 palette choices. linewidth implements Chapter 6 visual weight. alpha uses transparency to control what stands out pre-attentively (Chapter 2). cmap determines whether a color scale is perceptually uniform. Learning matplotlib means learning to translate design principles into method calls, not memorizing syntax.
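One way to read this in code is to annotate each keyword with the design intent behind it. The sketch below uses made-up series (a "focus" line and a "context" line) and illustrative parameter values; none of them come from the chapter:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)  # synthetic data, for illustration only
x = np.linspace(0, 10, 50)
focus = np.sin(x)
context = np.cos(x)
values = rng.uniform(0, 1, size=50)  # hypothetical third variable for color

fig, ax = plt.subplots()
# Muted gray and a thin stroke push the context series into the background.
ctx_line, = ax.plot(x, context, color="0.6", linewidth=1, label="context")
# A saturated hue and a heavier stroke give the focus series visual weight.
focus_line, = ax.plot(x, focus, color="tab:blue", linewidth=2.5, label="focus")
# cmap picks a perceptually uniform scale; alpha and edgecolors keep overlaps legible.
pts = ax.scatter(x, focus, c=values, cmap="viridis", s=30, alpha=0.6,
                 edgecolors="white")
ax.legend()  # label= on each artist supplies the legend text
```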

3. `ax.plot()` Is Both Line and Dot Plot (But `ax.scatter()` Is Better for Dots)

ax.plot(x, y) creates a line chart. ax.plot(x, y, marker="o", linestyle="None") creates a dot plot. Both work for simple cases. However, ax.scatter() is preferred for dot plots because it supports per-point color (c) and size (s) encodings through arrays, enabling bubble charts and color-mapped scatter plots that ax.plot() cannot produce efficiently.
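The difference can be sketched as follows. The data, the third variable `z`, and the `sizes` array are hypothetical and chosen only to show the per-point encodings:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(2)  # synthetic data, for illustration only
x = rng.uniform(0, 10, size=40)
y = rng.uniform(0, 10, size=40)
z = x + y                              # hypothetical variable mapped to color
sizes = rng.uniform(20, 200, size=40)  # hypothetical variable mapped to marker area

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 4))

# Dot plot via ax.plot(): one color and one marker size for every point.
ax1.plot(x, y, marker="o", linestyle="None")

# ax.scatter(): c and s accept arrays, giving per-point color and size.
sc = ax2.scatter(x, y, c=z, s=sizes, cmap="viridis", alpha=0.7)
fig.colorbar(sc, ax=ax2, label="z (hypothetical)")
```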

4. Bar Charts Must Enforce the Zero Baseline

Chapter 4's rule that bar charts must start at zero (because the bar length is the encoding) is enforced in matplotlib with ax.set_ylim(0, max_value * 1.1). ax.bar() draws bars from zero by default, so autoscaling normally includes the baseline, but an explicit set_ylim() call with a non-zero lower bound or a non-zero bottom argument truncates the bars and distorts the length comparison. For publication-quality bar charts, always set the y-limit explicitly so the baseline is stated in the code rather than left implicit. Horizontal bar charts (ax.barh) are preferred for long category labels and many categories.
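A minimal sketch of the explicit baseline, using invented category names and values. For the horizontal variant the value axis is x, so the same rule is applied with set_xlim:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

categories = ["North", "South", "West"]  # hypothetical categories
values = [102, 105, 110]                 # hypothetical values

fig, (ax_v, ax_h) = plt.subplots(1, 2, figsize=(9, 3.5))

# Vertical bars: pin the value axis to zero and leave 10% headroom.
ax_v.bar(categories, values)
ax_v.set_ylim(0, max(values) * 1.1)

# Horizontal bars: values run along x, so the zero baseline goes on the x-axis.
ax_h.barh(categories, values)
ax_h.set_xlim(0, max(values) * 1.1)
```

With values this close together, a truncated axis would make a gap of a few percent look like a large difference; the zero baseline keeps the bar lengths proportional to the data.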

5. Managing Overplotting Is a Core Scatter-Plot Skill

Dense scatter plots become unreadable when points overlap. The techniques: transparency (alpha=0.3-0.6) for a few hundred to a few thousand points, so overlap shows as darker regions; smaller markers (s=5) for medium density; hexagonal binning (ax.hexbin) or 2D histograms for very large datasets where individual points cannot be distinguished. Choose the technique based on how many points you have and what you want the reader to see.
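The three techniques can be compared on one synthetic dataset. The point counts, the gridsize, and the exact alpha and s values below are illustrative assumptions, not recommendations from the chapter:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(3)  # synthetic data, for illustration only
n = 20000
x = rng.normal(size=n)
y = 0.5 * x + rng.normal(size=n)

fig, (ax_a, ax_b, ax_c) = plt.subplots(1, 3, figsize=(13, 4))

# Transparency: with a couple of thousand points, overlap reads as darker regions.
ax_a.scatter(x[:2000], y[:2000], alpha=0.4)

# Smaller markers: each point covers less area, so fewer points collide.
ax_b.scatter(x, y, s=5)

# Hexagonal binning: color encodes how many points fall in each hexagon.
hb = ax_c.hexbin(x, y, gridsize=40, cmap="viridis")
fig.colorbar(hb, ax=ax_c, label="points per hexagon")
```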

6. Bin Count Matters for Histograms

A histogram's visual shape depends strongly on bin count. Too few bins (fewer than ~10) oversimplify the distribution. Too many bins (more than ~100 for most datasets) add noise. 20-50 bins is the typical sweet spot. Statistical rules (Sturges, Scott, Freedman-Diaconis) provide principled choices, but trial and error is also acceptable. To compare the distributions of samples with different sizes, use density=True so that each histogram integrates to 1 instead of showing raw counts.
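A short sketch of these options on synthetic samples follows. The sample sizes and the bin count of 30 are arbitrary; the string "fd" asks matplotlib (through NumPy) to apply the Freedman-Diaconis rule:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(4)  # synthetic data, for illustration only
small = rng.normal(0.0, 1.0, size=200)
large = rng.normal(0.5, 1.0, size=5000)

fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(13, 4))

# Explicit bin count chosen by eye.
counts, edges, _ = ax1.hist(large, bins=30)

# Rule-based bin count: "fd" is Freedman-Diaconis ("sturges" and "scott" also work).
_, fd_edges, _ = ax2.hist(large, bins="fd")

# density=True rescales each histogram so its total area is 1,
# which makes samples of different sizes comparable.
d_small, e_small, _ = ax3.hist(small, bins=30, density=True, alpha=0.5, label="n=200")
d_large, e_large, _ = ax3.hist(large, bins=30, density=True, alpha=0.5, label="n=5000")
ax3.legend()
```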

7. Box Plots Are for Summary Comparison Across Groups

Box plots compress a distribution into a handful of summary values (the median, the two quartiles, and the two whisker ends), with outliers beyond the whiskers drawn as individual points. They are most useful for comparing distributions across groups in a small space. They hide bimodality and fine structure, so pair them with histograms or violin plots when those details matter. matplotlib's box plot API is verbose; seaborn's sns.boxplot() is simpler for complex customization (we will cover seaborn in Part IV).
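To see what a box plot can hide, the sketch below draws a unimodal and a bimodal sample both ways. The two samples are synthetic and constructed specifically to make the point:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(5)  # synthetic data, for illustration only
unimodal = rng.normal(0, 1, size=1000)
bimodal = np.concatenate([rng.normal(-1, 0.3, size=500),
                          rng.normal(1, 0.3, size=500)])

fig, (ax_box, ax_hist) = plt.subplots(1, 2, figsize=(9, 4))

# Box plot: both groups reduce to a median line, a box, and whiskers.
bp = ax_box.boxplot([unimodal, bimodal])
ax_box.set_xticklabels(["unimodal", "bimodal"])

# Histogram: the two peaks of the bimodal sample become visible.
ax_hist.hist(unimodal, bins=40, alpha=0.5, label="unimodal")
ax_hist.hist(bimodal, bins=40, alpha=0.5, label="bimodal")
ax_hist.legend()
```

The dictionary returned by ax.boxplot() (here `bp`) holds the drawn artists under keys such as "boxes", "medians", and "whiskers", which is where most of the styling verbosity comes from.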

8. Uncertainty Should Always Be Visible

Chapter 4 established that hiding uncertainty is a form of visualization dishonesty. matplotlib's ax.errorbar() method adds error bars to line and scatter charts. ax.fill_between() creates shaded confidence bands for continuous time series. Bar charts accept yerr for error bars on each bar. Whenever your data is a measurement or an estimate (which is nearly always), include some indication of uncertainty — it is the difference between a chart that respects the reader and one that misleads them.
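The three approaches can be sketched together. The measurements and error values are invented for illustration, and the errors are treated as symmetric for simplicity:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(10)
y = np.array([1.0, 1.4, 1.3, 1.9, 2.2, 2.1, 2.8, 3.0, 3.3, 3.9])  # hypothetical estimates
err = np.full_like(y, 0.3)                                         # hypothetical uncertainty

fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(13, 4))

# Error bars on a line-with-markers chart.
container = ax1.errorbar(x, y, yerr=err, fmt="o-", capsize=3)

# Shaded band around a continuous series.
ax2.plot(x, y)
ax2.fill_between(x, y - err, y + err, alpha=0.3)

# Error bars on each bar via the yerr argument.
ax3.bar(["A", "B", "C"], [4.0, 5.5, 3.2], yerr=[0.4, 0.6, 0.3], capsize=4)
ax3.set_ylim(0, 7)
```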

9. Use Pandas Plot Methods for Exploration, OO for Production

df.plot.line(), df.plot.bar(), df.plot.scatter(), df.hist(), and df.boxplot() are convenient shortcuts that call matplotlib internally, so they produce ordinary matplotlib Figure and Axes objects. They are fine for quick exploratory charts in a Jupyter notebook. For production-quality code, prefer the explicit fig, ax = plt.subplots() pattern because it gives you more control, makes the Figure and Axes references visible, and works better in reusable functions and multi-panel figures.
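The two styles might look like this in practice. The DataFrame contents, the column names, and the helper `plot_series` are hypothetical and exist only for this sketch:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data, for illustration only.
df = pd.DataFrame({"year": [2019, 2020, 2021, 2022],
                   "value": [1.2, 1.5, 1.4, 1.9]})

# Exploration: one line, and pandas creates the Figure and Axes for you.
ax_quick = df.plot.line(x="year", y="value")

# Production: a reusable function that draws on an Axes passed in by the caller.
def plot_series(ax, data):
    """Draw the value series on the given Axes (hypothetical helper)."""
    ax.plot(data["year"], data["value"], marker="o", linewidth=2)
    ax.set_xlabel("Year")
    ax.set_ylabel("Value")
    return ax

fig, ax = plt.subplots()
plot_series(ax, df)
```

Because `plot_series` receives its Axes as an argument, the same function can fill any panel of a multi-panel figure, which the pandas shortcut makes less direct.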

10. One Dataset, Five Chart Types, Five Different Answers

Section 11.9's climate example demonstrates the Chapter 5 principle in matplotlib code: the same dataset produces a line chart (change over time), a bar chart (decade averages — comparison), a scatter plot (CO2 vs. temperature — relationship), a histogram (distribution of annual anomalies), and a box plot (variation by decade — distribution by group). Each chart answers a different question. The choice of chart type is a design decision that precedes the code; the matplotlib API is the instrument that translates the decision into a rendered figure.
