Key Takeaways: The Grammar of Graphics
This is your reference card for Chapter 14. It's the conceptual foundation for everything in Part III — the next four chapters will add tools, but the thinking starts here. Keep this handy whenever you're designing a chart.
Key Concepts
- Charts are encodings, not pictures. Every visual element in a well-designed chart represents data. A bar's height encodes a number. A point's position encodes two numbers. Color encodes a category or a value. Once you see charts as data encodings, you can read any chart and build any chart.
- The grammar of graphics gives charts a structure. Every chart is built from six components: data, aesthetic mappings, geometric objects, scales, coordinate systems, and facets. Understanding these components lets you construct any chart from modular pieces rather than memorizing a catalog of chart types.
- Aesthetic mappings are the heart of every chart. The connection between a data variable and a visual property (position, color, size, shape) is what makes a chart mean something. When evaluating any chart, the first question is always: what maps to what?
- There is no single "best" chart type, but there are wrong ones. The right chart depends on your question, your data, and your audience. The wrong chart is one whose visual encoding doesn't match what you're trying to show (e.g., a line chart for categorical data, or a pie chart with 12 slices).
- Exploratory and explanatory visualization are different activities. Exploratory charts are quick, rough, and for your own understanding. Explanatory charts are polished, focused, and for an audience. Don't spend time polishing exploratory charts, and don't rush explanatory ones.
- Charts can lie without showing false numbers. Truncated axes, cherry-picked time ranges, dual y-axes, and area distortion can make truthful data tell a misleading visual story. Being chart-literate means knowing how to spot these techniques.
The Grammar of Graphics Components
| Component | What It Determines | Example |
|---|---|---|
| Data | What are we plotting? | vaccination_df, filtered to 2023 |
| Aesthetic mapping | Which variables connect to which visual properties? | x = region, y = rate, color = income group |
| Geometric object | What mark represents the data? | Bar, point, line, area, box |
| Scale | How do data values translate to visual values? | y-axis 0-100, linear; color palette |
| Coordinate system | What canvas are marks drawn on? | Cartesian, polar, geographic |
| Faceting | Is the data split into subgroups in separate panels? | One panel per WHO region |
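To make the table concrete, here is a minimal sketch in plain Python of how four of the components (data, aesthetic mapping, scale, geometric object) fit together. It is an illustration, not how any plotting library is implemented: the `linear_scale` helper, the two rows of data, and the 200-pixel canvas height are all made up for this example.

```python
from typing import Any, Callable, Dict, List, Tuple


def linear_scale(domain: Tuple[float, float],
                 out_range: Tuple[float, float]) -> Callable[[float], float]:
    """Build a scale: a function from data values to visual values."""
    d0, d1 = domain
    r0, r1 = out_range
    if d0 == d1:
        raise ValueError("domain must span more than one value")

    def scale(value: float) -> float:
        # Linear interpolation from the data domain to the visual range.
        return r0 + (value - d0) / (d1 - d0) * (r1 - r0)

    return scale


# Data: two made-up rows standing in for a filtered DataFrame.
rows: List[Dict[str, Any]] = [
    {"region": "Region A", "rate": 50.0},
    {"region": "Region B", "rate": 80.0},
]

# Aesthetic mapping: which variable drives which visual property.
mapping = {"x": "region", "y": "rate"}

# Scale: a linear y-axis from 0 to 100, drawn on a 200-pixel-tall canvas.
y_scale = linear_scale((0.0, 100.0), (0.0, 200.0))

# Geometric object: one bar per row, its height set by the scaled y value.
bars = [
    {"geom": "bar", "x": row[mapping["x"]], "height_px": y_scale(row[mapping["y"]])}
    for row in rows
]

for bar in bars:
    print(bar)
```

Swapping the mapping, the scale, or the geom changes the chart without touching the data, which is the modularity the grammar describes.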
Chart Selection Quick Reference
| Your Question Is About... | Use This Chart | Why |
|---|---|---|
| Comparing categories | Bar chart | Length encoding is accurate and intuitive |
| Relationship between two numbers | Scatter plot | Position on two axes reveals patterns |
| Change over time | Line chart | Connecting lines show continuity and trends |
| Shape of a distribution | Histogram | Binned frequencies reveal the data's structure |
| Comparing distributions | Box plot | Compact summary of center, spread, and outliers |
| Parts of a whole | Stacked bar (or pie for 2-3 parts) | Shows proportional composition |
| Patterns in a grid | Heatmap | Color intensity in a matrix reveals structure |
Cleveland & McGill's Encoding Accuracy (Most to Least)
1. Position along a common scale (scatter plots, dot plots)
2. Position along non-aligned scales (faceted panels)
3. Length (bar charts)
4. Direction / Angle (pie charts)
5. Area (bubble charts, treemaps)
6. Volume (3D charts — avoid)
7. Color saturation / Shading (heatmaps)
Takeaway: Use position and length encodings whenever possible. Avoid area and volume encodings. Use color intentionally.
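If an area encoding is unavoidable (a bubble chart, say), the arithmetic below shows one reason it is easy to get wrong. This is a sketch with made-up values; `radius_for_area_encoding` is a hypothetical helper, not a function from any library. Scaling a circle's radius in proportion to the value inflates its area by the square of the ratio, while scaling the radius by the square root keeps area proportional to the value.

```python
import math


def radius_for_area_encoding(value: float, max_value: float, max_radius: float) -> float:
    """Radius that makes the circle's AREA proportional to `value`."""
    if value < 0 or max_value <= 0:
        raise ValueError("value must be >= 0 and max_value must be > 0")
    return max_radius * math.sqrt(value / max_value)


def circle_area(radius: float) -> float:
    return math.pi * radius * radius


# Made-up data: the second value is 4 times the first.
small, large = 10.0, 40.0
max_radius = 20.0

# Naive sizing: radius proportional to the value.
naive_small = max_radius * small / large
naive_large = max_radius
naive_area_ratio = circle_area(naive_large) / circle_area(naive_small)

# Area-true sizing: radius proportional to the square root of the value.
true_small = radius_for_area_encoding(small, large, max_radius)
true_large = radius_for_area_encoding(large, large, max_radius)
true_area_ratio = circle_area(true_large) / circle_area(true_small)

print(f"data ratio:            {large / small:.1f}")
print(f"naive area ratio:      {naive_area_ratio:.1f}")
print(f"area-true area ratio:  {true_area_ratio:.1f}")
```

Even with correct sizing, readers judge area less accurately than position or length, so the ranking above still applies.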
Tufte's Core Principles
- Maximize the data-ink ratio. Remove visual elements that don't represent data: heavy gridlines, borders, backgrounds, decorative effects.
- Eliminate chartjunk. 3D effects, gradient fills, decorative illustrations, and moiré patterns add complexity without information.
- Small multiples are powerful. Faceted panels with shared axes are one of the most effective ways to compare groups.
- Titles should state findings, not topics. Not "Vaccination Rates by Region" but "Sub-Saharan Africa Trails Other Regions by 30 Points."
The Two-Stage Visualization Workflow
STAGE 1: EXPLORE (for yourself)
- Many charts, quickly
- Default settings, no polish
- Goal: discover patterns, check assumptions
- Most charts will be thrown away
STAGE 2: EXPLAIN (for your audience)
- One clear message per chart
- Polished: title, labels, annotations
- High data-ink ratio
- Designed for the specific audience
Five Ways Charts Mislead (and How to Spot Them)
| Technique | How It Misleads | How to Spot It |
|---|---|---|
| Truncated y-axis | Makes small differences look enormous in bar charts | Check if the y-axis starts at zero |
| Cherry-picked time range | Controls which trend the viewer sees | Ask "why does it start/end here?" |
| Dual y-axes | Makes unrelated variables appear correlated | Check for two different y-axis scales |
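| Area distortion | Scaling an icon's width and height (or all three dimensions) by the value exaggerates differences, because area grows with the square and volume with the cube | Compare actual numbers to visual impression |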
| Missing context | Omits benchmarks, baselines, or normalization | Ask "what else would I need to know?" |
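The truncated y-axis row can be checked with simple arithmetic. The sketch below compares the ratio of two values with the ratio of the bar lengths a viewer actually sees once the axis starts above zero. The numbers and the `apparent_ratio` helper are invented for illustration.

```python
def apparent_ratio(a: float, b: float, axis_start: float = 0.0) -> float:
    """Ratio of drawn bar lengths when the y-axis begins at `axis_start`."""
    if axis_start >= min(a, b):
        raise ValueError("axis_start must be below both values")
    return (a - axis_start) / (b - axis_start)


# Made-up values: a 4% difference.
a, b = 52.0, 50.0

honest = apparent_ratio(a, b)                      # axis starts at zero
truncated = apparent_ratio(a, b, axis_start=48.0)  # axis starts at 48

print(f"ratio with zero baseline:  {honest:.2f}")
print(f"ratio with axis from 48:   {truncated:.2f}")
```

With the axis starting at 48, one bar is drawn twice as long as the other even though the underlying values differ by only 4%.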
The Chart Plan Template
Use this before writing any plotting code:
CHART PLAN
==========
Question: What am I trying to answer or show?
Chart type: Bar / Scatter / Line / Histogram / Other
Data source: Which DataFrame? Any filters or aggregations?
x-axis: Variable name + axis label
y-axis: Variable name + axis label
Color: Variable name (or single color if no variable)
Facets: Split by what variable? (or none)
Title: Finding-based title (not just topic)
Annotations: Callouts, reference lines, labels?
Audience: Exploratory (for me) or Explanatory (for whom)?
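If you prefer to keep chart plans next to your analysis code, the template could be captured as a small data structure. The sketch below is one possible way to do that, assuming Python 3.7+ for `dataclasses`. The `ChartPlan` class and its field names are hypothetical, and the filled-in values reuse the vaccination example from this card purely as placeholders.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class ChartPlan:
    """One field per line of the chart plan template (hypothetical structure)."""
    question: str
    chart_type: str
    data_source: str
    x_axis: str
    y_axis: str
    title: str
    audience: str
    color: str = "single color"
    facets: str = "none"
    annotations: str = "none"

    def missing_fields(self) -> List[str]:
        """Names of fields left blank, so gaps show up before any coding starts."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def render(self) -> str:
        """Format the plan as plain text, one line per field."""
        lines = ["CHART PLAN", "=========="]
        for f in fields(self):
            label = f.name.replace("_", " ").capitalize()
            lines.append(f"{label}: {getattr(self, f.name)}")
        return "\n".join(lines)


# Placeholder values based on the examples used elsewhere in this card.
plan = ChartPlan(
    question="Which regions have the lowest vaccination rates?",
    chart_type="Bar",
    data_source="vaccination_df, filtered to 2023",
    x_axis="region (label: Region)",
    y_axis="rate (label: Vaccination rate, %)",
    title="Sub-Saharan Africa Trails Other Regions by 30 Points",
    audience="Explanatory",
)

print(plan.render())
print("Missing:", plan.missing_fields())
```

A paper sketch works just as well. The point is that every design decision is written down before any plotting code exists.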
Terms to Remember
| Term | Definition |
|---|---|
| Grammar of graphics | Framework describing any chart as a combination of data, aesthetics, geoms, scales, coordinates, and facets |
| Aesthetic mapping | Connection between a data variable and a visual property (position, color, size, shape) |
| Geometric object | The visual mark (point, line, bar) representing data on a chart |
| Scale | Rule translating data values into visual values (axis range, color gradient, etc.) |
| Coordinate system | The canvas: Cartesian, polar, or geographic |
| Faceting | Splitting data into subgroups with one mini-chart per group |
| Bar chart | Bars anchored to a baseline representing categorical comparisons |
| Scatter plot | Points on two axes showing relationships between continuous variables |
| Line chart | Connected points showing trends over a sequential dimension |
| Histogram | Binned bars showing the distribution of a continuous variable |
| Exploratory visualization | Quick, rough charts for discovering patterns (for yourself) |
| Explanatory visualization | Polished charts for communicating findings (for an audience) |
| Data-ink ratio | Proportion of a chart's visual content that represents data (higher is better) |
| Chartjunk | Non-data visual elements that clutter without informing |
What You Should Be Able to Do Now
- [ ] Decompose any chart into its grammar of graphics components (data, aesthetics, geom, scale, coordinates, facets)
- [ ] Select an appropriate chart type for a given question and dataset
- [ ] Sketch a chart on paper using the chart plan template, specifying all key design decisions before coding
- [ ] Distinguish exploratory from explanatory visualization and adjust your workflow accordingly
- [ ] Identify at least three common misleading chart techniques (e.g., truncated axes, cherry-picked time ranges, dual y-axes)
- [ ] Apply Tufte's principles to critique a chart's data-ink ratio and identify chartjunk
- [ ] Create chart plans for the progressive project, ready to implement in Chapter 15
If you checked every box, you have the conceptual toolkit that makes Part III possible. Next stop: matplotlib. Time to turn these plans into code.
Next: Chapter 15 — matplotlib Foundations: Building Charts from the Ground Up