Key Takeaways: The Grammar of Graphics

This is your reference card for Chapter 14. The chapter is the conceptual foundation for everything in Part III: the next four chapters add tools, but the thinking starts here. Keep this card handy whenever you design a chart.


Key Concepts

  • Charts are encodings, not pictures. Every visual element in a well-designed chart represents data. A bar's height encodes a number. A point's position encodes two numbers. Color encodes a category or a value. Once you see charts as data encodings, you can read any chart and build any chart.

  • The grammar of graphics gives charts a structure. Every chart is built from six components: data, aesthetic mappings, geometric objects, scales, coordinate systems, and facets. Understanding these components lets you construct any chart from modular pieces rather than memorizing a catalog of chart types.

  • Aesthetic mappings are the heart of every chart. The connection between a data variable and a visual property (position, color, size, shape) is what makes a chart mean something. When evaluating any chart, the first question is always: what maps to what?

  • There is no single "best" chart type, but there are wrong ones. The right chart depends on your question, your data, and your audience. The wrong chart is one whose visual encoding doesn't match what you're trying to show (e.g., a line chart connecting unordered categories, which implies a trend that isn't there, or a pie chart with 12 slices, which is too many angles to compare).

  • Exploratory and explanatory visualization are different activities. Exploratory charts are quick, rough, and for your own understanding. Explanatory charts are polished, focused, and for an audience. Don't spend time polishing exploratory charts, and don't rush explanatory ones.

  • Charts can lie without showing false numbers. Truncated axes, cherry-picked time ranges, dual y-axes, and area distortion can make truthful data tell a misleading visual story. Being chart-literate means knowing how to spot these techniques.


The Grammar of Graphics Components

| Component | What It Determines | Example |
|---|---|---|
| Data | What are we plotting? | vaccination_df, filtered to 2023 |
| Aesthetic mapping | Which variables connect to which visual properties? | x = region, y = rate, color = income group |
| Geometric object | What mark represents the data? | Bar, point, line, area, box |
| Scale | How do data values translate to visual values? | y-axis 0-100, linear; color palette |
| Coordinate system | What canvas are marks drawn on? | Cartesian, polar, geographic |
| Faceting | Is the data split into subgroups in separate panels? | One panel per WHO region |
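
To make the components concrete, here is a minimal sketch in plain Python of how an aesthetic mapping and a scale turn one data row into the visual attributes of a mark. The row values, the pixel range, the palette, and the helper names (`linear_scale`, `encode_row`) are illustrative assumptions, not part of any plotting library.

```python
# A minimal sketch of "data -> aesthetic mapping -> scale -> mark".
# All names and values are hypothetical, chosen only for illustration.

def linear_scale(value, domain, visual_range):
    """Map a data value from `domain` onto `visual_range` linearly."""
    d_min, d_max = domain
    r_min, r_max = visual_range
    fraction = (value - d_min) / (d_max - d_min)
    return r_min + fraction * (r_max - r_min)

# Data: one row per region (made-up numbers).
rows = [
    {"region": "A", "rate": 50, "income_group": "low"},
    {"region": "B", "rate": 80, "income_group": "high"},
]

# Aesthetic mapping: which variable drives which visual property.
aesthetics = {"x": "region", "y": "rate", "color": "income_group"}

# Scales: y maps 0-100 onto 0-400 pixels; color maps categories to a palette.
y_domain, y_pixels = (0, 100), (0, 400)
palette = {"low": "orange", "high": "blue"}

def encode_row(row):
    """Return the visual attributes of the bar (the geom) for one row."""
    return {
        "x_label": row[aesthetics["x"]],
        "bar_height_px": linear_scale(row[aesthetics["y"]], y_domain, y_pixels),
        "fill": palette[row[aesthetics["color"]]],
    }

marks = [encode_row(r) for r in rows]
for mark in marks:
    print(mark)
```

Changing only the `aesthetics` dictionary or the scale ranges changes the chart without touching the data, which is the modularity the grammar is meant to provide.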

Chart Selection Quick Reference

| Your Question Is About... | Use This Chart | Why |
|---|---|---|
| Comparing categories | Bar chart | Length encoding is accurate and intuitive |
| Relationship between two numbers | Scatter plot | Position on two axes reveals patterns |
| Change over time | Line chart | Connecting lines show continuity and trends |
| Shape of a distribution | Histogram | Binned frequencies reveal the data's structure |
| Comparing distributions | Box plot | Compact summary of center, spread, and outliers |
| Parts of a whole | Stacked bar (or pie for 2-3 parts) | Shows proportional composition |
| Patterns in a grid | Heatmap | Color intensity in a matrix reveals structure |
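
The quick reference can also be kept as a small lookup while exploring. The sketch below is one possible way to do that; the dictionary and the `suggest_chart` helper are hypothetical names, and the result is a starting point rather than a rule.

```python
# Hypothetical lookup mirroring the quick-reference table.
CHART_FOR_QUESTION = {
    "comparing categories": "bar chart",
    "relationship between two numbers": "scatter plot",
    "change over time": "line chart",
    "shape of a distribution": "histogram",
    "comparing distributions": "box plot",
    "parts of a whole": "stacked bar",
    "patterns in a grid": "heatmap",
}

def suggest_chart(question_type):
    """Return a starting-point chart type, or None if the question is not listed."""
    return CHART_FOR_QUESTION.get(question_type.strip().lower())

print(suggest_chart("Change over time"))  # line chart
print(suggest_chart("ranking of teams"))  # None
```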

Cleveland & McGill's Encoding Accuracy (Most to Least)

1. Position along a common scale    (scatter plots, dot plots)
2. Position along non-aligned scales (faceted panels)
3. Length                            (bar charts)
4. Direction / Angle                 (pie charts)
5. Area                              (bubble charts, treemaps)
6. Volume                            (3D charts — avoid)
7. Color saturation / Shading        (heatmaps)

Takeaway: Use position and length encodings whenever possible. Avoid area and volume encodings. Use color intentionally.
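
As a rough illustration of how the ranking can guide a design choice, the sketch below picks the most accurate encoding from a set of candidates. The channel names and the `most_accurate` helper are assumptions made for this example.

```python
# Ranking copied from the list above: a lower number means the encoding
# is judged more accurately by viewers.
ENCODING_RANK = {
    "position_common_scale": 1,
    "position_nonaligned_scales": 2,
    "length": 3,
    "angle": 4,
    "area": 5,
    "volume": 6,
    "color_saturation": 7,
}

def most_accurate(candidates):
    """Pick the candidate encoding that ranks highest for accuracy."""
    return min(candidates, key=ENCODING_RANK.__getitem__)

print(most_accurate(["area", "length", "color_saturation"]))  # length
```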


Tufte's Core Principles

  • Maximize the data-ink ratio. Remove visual elements that don't represent data: heavy gridlines, borders, backgrounds, decorative effects.
  • Eliminate chartjunk. 3D effects, gradient fills, decorative illustrations, and moiré patterns add complexity without information.
  • Small multiples are powerful. Faceted panels with shared axes are one of the most effective ways to compare groups.
  • Titles should state findings, not topics. Not "Vaccination Rates by Region" but "Sub-Saharan Africa Trails Other Regions by 30 Points."

The Two-Stage Visualization Workflow

STAGE 1: EXPLORE (for yourself)
  - Many charts, quickly
  - Default settings, no polish
  - Goal: discover patterns, check assumptions
  - Most charts will be thrown away

STAGE 2: EXPLAIN (for your audience)
  - One clear message per chart
  - Polished: title, labels, annotations
  - High data-ink ratio
  - Designed for the specific audience

Five Ways Charts Mislead (and How to Spot Them)

| Technique | How It Misleads | How to Spot It |
|---|---|---|
| Truncated y-axis | Makes small differences look enormous in bar charts | Check if the y-axis starts at zero |
| Cherry-picked time range | Controls which trend the viewer sees | Ask "why does it start/end here?" |
| Dual y-axes | Makes unrelated variables appear correlated | Check for two different y-axis scales |
| Area distortion | Icons scaled in 2D/3D exaggerate differences | Compare actual numbers to visual impression |
| Missing context | Omits benchmarks, baselines, or normalization | Ask "what else would I need to know?" |
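
A little arithmetic shows how strong the truncated-axis and area-distortion effects can be. The sketch below uses made-up values (50 vs. 55, a baseline of 45, and a data ratio of 3); `apparent_ratio` is a hypothetical helper, not a library function.

```python
def apparent_ratio(a, b, baseline=0.0):
    """Ratio of drawn bar heights for values a and b when the axis starts at `baseline`."""
    return (b - baseline) / (a - baseline)

# Hypothetical values: 50 vs 55 is a 10% difference in the data.
true_ratio = apparent_ratio(50, 55)                    # axis starts at 0
truncated_ratio = apparent_ratio(50, 55, baseline=45)  # axis starts at 45

print(true_ratio)       # 1.1
print(truncated_ratio)  # 2.0

# Area distortion: scaling an icon's width AND height by the data ratio
# makes its area grow by the square of that ratio.
data_ratio = 3.0
icon_area_ratio = data_ratio ** 2
print(icon_area_ratio)  # 9.0
```

With the axis starting at 45, the second bar is drawn twice as tall even though the value is only 10% larger; an icon scaled in both dimensions shows a threefold difference as a ninefold one.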

The Chart Plan Template

Use this before writing any plotting code:

CHART PLAN
==========
Question:     What am I trying to answer or show?
Chart type:   Bar / Scatter / Line / Histogram / Other
Data source:  Which DataFrame? Any filters or aggregations?
x-axis:       Variable name + axis label
y-axis:       Variable name + axis label
Color:        Variable name (or single color if no variable)
Facets:       Split by what variable? (or none)
Title:        Finding-based title (not just topic)
Annotations:  Callouts, reference lines, labels?
Audience:     Exploratory (for me) or Explanatory (for whom)?
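
If you prefer to keep plans alongside your code, the template can be captured in a small data structure. This is only a sketch: the `ChartPlan` class, its field names, and the example values are assumptions made for illustration.

```python
from dataclasses import dataclass, fields
from typing import Optional

@dataclass
class ChartPlan:
    """One field per line of the chart plan template; None means 'not decided yet'."""
    question: Optional[str] = None
    chart_type: Optional[str] = None
    data_source: Optional[str] = None
    x_axis: Optional[str] = None
    y_axis: Optional[str] = None
    color: Optional[str] = None
    facets: Optional[str] = None
    title: Optional[str] = None
    annotations: Optional[str] = None
    audience: Optional[str] = None

    def missing(self):
        """Names of template lines that are still blank."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]

# Hypothetical, partially filled plan.
plan = ChartPlan(
    question="Which regions have the lowest vaccination rates?",
    chart_type="Bar",
    data_source="vaccination_df, filtered to 2023",
    x_axis="region (label: 'WHO region')",
    y_axis="rate (label: 'Vaccination rate (%)')",
    color="single color",
    facets="none",
    audience="Explanatory",
)

print(plan.missing())  # ['title', 'annotations']
```

Checking `missing()` before writing plotting code is a quick way to notice design decisions that have not been made yet.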

Terms to Remember

| Term | Definition |
|---|---|
| Grammar of graphics | Framework describing any chart as a combination of data, aesthetics, geoms, scales, coordinates, and facets |
| Aesthetic mapping | Connection between a data variable and a visual property (position, color, size, shape) |
| Geometric object | The visual mark (point, line, bar) representing data on a chart |
| Scale | Rule translating data values into visual values (axis range, color gradient, etc.) |
| Coordinate system | The canvas: Cartesian, polar, or geographic |
| Faceting | Splitting data into subgroups with one mini-chart per group |
| Bar chart | Bars anchored to a common baseline, with lengths that compare values across categories |
| Scatter plot | Points on two axes showing relationships between continuous variables |
| Line chart | Connected points showing trends over a sequential dimension |
| Histogram | Binned bars showing the distribution of a continuous variable |
| Exploratory visualization | Quick, rough charts for discovering patterns (for yourself) |
| Explanatory visualization | Polished charts for communicating findings (for an audience) |
| Data-ink ratio | Proportion of a chart's visual content that represents data (higher is better) |
| Chartjunk | Non-data visual elements that clutter without informing |

What You Should Be Able to Do Now

  • [ ] Decompose any chart into its grammar of graphics components (data, aesthetics, geom, scale, coordinates, facets)
  • [ ] Select an appropriate chart type for a given question and dataset
  • [ ] Sketch a chart on paper using the chart plan template, specifying all key design decisions before coding
  • [ ] Distinguish exploratory from explanatory visualization and adjust your workflow accordingly
  • [ ] Identify at least three common misleading chart techniques (truncated axis, cherry-picking, dual axes)
  • [ ] Apply Tufte's principles to critique a chart's data-ink ratio and identify chartjunk
  • [ ] Create chart plans for the progressive project, ready to implement in Chapter 15

If you checked every box, you have the conceptual toolkit that makes Part III possible. Next stop: matplotlib. Time to turn these plans into code.


Next: Chapter 15 — matplotlib Foundations: Building Charts from the Ground Up