Chapter 16 Key Takeaways

Core Concepts

Visualizations as Models

Every visualization encodes assumptions. A county-level choropleth weights geographic area; a bubble map weights population; a cartogram weights votes. None is objectively correct; each answers a different question. The analyst's responsibility is to choose the visualization that best answers the question at hand and to state explicitly what the design choices do and do not show.

Chart Type Matching

  • Choropleth maps: geographic variation
  • Bar charts: category comparison; stacked for composition, grouped for direct comparison
  • Line charts: temporal trends; only for sequential data
  • Scatter plots: bivariate relationships between continuous variables
  • Heatmaps: two-dimensional crosstabulation
  • Interactive (Plotly): exploration by non-technical audiences

Color Scale Principles

  • Sequential scales (one hue, varying lightness): for data with one-directional range
  • Diverging scales (two contrasting hues meeting at a neutral center): for data with a natural midpoint
  • Always use colorblind-accessible palettes (ColorBrewer, viridis family)
  • Maintain the conventional red (Republican) / blue (Democrat) encoding in American political visualization
  • Never truncate color scales without clear labeling
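A diverging scale only reads correctly when its neutral color sits on the natural midpoint, which means the limits must be equidistant from that midpoint. The sketch below shows one way to compute such limits; the helper name symmetric_limits and the margin values are invented for illustration.

```python
def symmetric_limits(values, center=0.0):
    """Return (vmin, vmax) equidistant from `center` and covering all values."""
    values = list(values)
    if not values:
        raise ValueError("values must not be empty")
    half_range = max(abs(v - center) for v in values)
    return center - half_range, center + half_range


# Hypothetical county margins in percentage points (positive = Democratic lead).
margins = [-12.0, -3.5, 4.0, 31.0]
vmin, vmax = symmetric_limits(margins)
print(vmin, vmax)  # -31.0 31.0
```

The resulting pair can be passed as vmin and vmax to gdf.plot() together with cmap='RdBu', which maps low values to red and high values to blue.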

The Choropleth Workflow

  1. Aggregate voter-level data to geographic units
  2. Load shapefile/GeoJSON with geopandas.read_file()
  3. Merge data to geometry with careful name-matching validation
  4. Call .plot() with appropriate cmap, vmin, vmax, and legend parameters
  5. Add titles, remove axes, save at appropriate DPI
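Steps 1 and 3 are where silent errors most often creep in, so a small check is worth sketching. The example below uses plain pandas, with an ordinary DataFrame standing in for the attribute table of a GeoDataFrame; all column names and county names are invented for illustration.

```python
import pandas as pd

# Hypothetical voter-level records; column names are illustrative.
voters = pd.DataFrame({
    "county": ["Adams", "Adams", "Baker", "St. Clair"],
    "support_score": [40.0, 60.0, 55.0, 70.0],
})

# Step 1: aggregate to one row per county.
county_stats = voters.groupby("county", as_index=False).agg(
    mean_support=("support_score", "mean"),
    n_voters=("support_score", "count"),
)

# Stand-in for the attribute table of a GeoDataFrame (geometry omitted here).
shapes = pd.DataFrame({"NAME": ["Adams", "Baker", "Saint Clair", "Dale"]})

# Step 3: an outer merge with indicator=True labels every row by its origin,
# so names that failed to match on either side become visible.
merged = shapes.merge(
    county_stats, left_on="NAME", right_on="county", how="outer", indicator=True
)
no_data = sorted(merged.loc[merged["_merge"] == "left_only", "NAME"])
no_shape = sorted(merged.loc[merged["_merge"] == "right_only", "county"])
print("Shapes without data:", no_data)
print("Data without shapes:", no_shape)
```

Here "St. Clair" and "Saint Clair" fail to match, which is the kind of spelling mismatch the validation step is meant to catch. For the plot itself, a left merge from the GeoDataFrame is the more usual choice, since it keeps every shape even when data is missing.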

The Ecological Fallacy

A county-level correlation (for example, between percent Hispanic and mean support score) does not imply the same relationship at the individual level (that Hispanic voters have higher support scores). Analysts must always specify the level of analysis and be explicit about what individual-level inferences, if any, are warranted by aggregate data.
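A small constructed example makes the gap between levels concrete. In the invented data below, counties with a higher Hispanic share have higher mean support, yet within every county Hispanic voters score lower than their non-Hispanic neighbors. The numbers are made up solely to illustrate the logic and are not drawn from any real dataset.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Invented data: county -> (Hispanic voters' scores, non-Hispanic voters' scores).
counties = {
    "A": ([30] * 1, [40] * 9),
    "B": ([50] * 5, [60] * 5),
    "C": ([70] * 9, [80] * 1),
}

# Aggregate view: one (share, mean score) point per county.
pct_hispanic = [len(h) / (len(h) + len(n)) for h, n in counties.values()]
mean_support = [mean(h + n) for h, n in counties.values()]
county_r = pearson(pct_hispanic, mean_support)

# Individual view: compare the two groups inside each county.
gaps = {name: mean(h) - mean(n) for name, (h, n) in counties.items()}

print(round(county_r, 3))  # 1.0
print(gaps)
```

The county-level correlation is perfect and positive, while the within-county gap is negative in all three counties. The aggregate pattern here is driven by differences between counties, not by differences between the voters inside them.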

Interactive Visualization (Plotly)

px.choropleth() and px.scatter() produce interactive figures with hover tooltips and zoom capability, which can be saved as standalone HTML. The featureidkey parameter names the GeoJSON property (for example, properties.name) whose values must match the values in the data column passed as locations. Interactive visualizations are better suited to exploration and to presentation for non-technical audiences; static matplotlib figures are better suited to publication and archiving.
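Because a mismatch between featureidkey values and location values tends to produce unshaded regions and no error, it can help to check the match before plotting. The sketch below does this with the standard library only; the helper feature_ids and the tiny GeoJSON object are invented for illustration and are not part of Plotly.

```python
def feature_ids(geojson, featureidkey="id"):
    """Collect the value at the dotted `featureidkey` path from every feature."""
    ids = []
    for feature in geojson["features"]:
        value = feature
        for part in featureidkey.split("."):
            value = value[part]
        ids.append(value)
    return ids


# Minimal invented GeoJSON; geometry is omitted because only properties matter here.
geo = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"name": "Adams"}, "geometry": None},
        {"type": "Feature", "properties": {"name": "Baker"}, "geometry": None},
    ],
}
locations = ["Adams", "baker"]  # values from the column passed as `locations`

known = set(feature_ids(geo, "properties.name"))
unmatched = [loc for loc in locations if loc not in known]
print(unmatched)  # ['baker']
```

The match is exact and case-sensitive in this sketch, so "baker" is reported as unmatched; cleaning such values before calling px.choropleth() avoids blank counties in the output.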

Misleading Visualization Patterns to Avoid

  • Truncated y-axes on bar charts (makes small differences look large)
  • Cherry-picked time windows (shows only the favorable portion of a trend)
  • Asymmetric color scales (distorts visual balance in diverging displays)
  • Non-comparable denominators (mixing vote share with absolute votes)
  • Non-perceptually-uniform color scales (jet, rainbow) that distort visual impression of magnitude
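The effect of a truncated y-axis can be quantified by comparing the ratio of the drawn bar heights with the ratio of the underlying values. The following sketch uses invented vote shares, and the helper name visual_ratio is hypothetical.

```python
def visual_ratio(a, b, baseline=0.0):
    """Ratio of drawn bar heights for values a and b when the axis starts at `baseline`."""
    if min(a, b) <= baseline:
        raise ValueError("baseline must lie below both values")
    return (b - baseline) / (a - baseline)


# Invented vote shares for two candidates, in percent.
a, b = 48.0, 52.0
honest = visual_ratio(a, b)                     # axis starts at 0
truncated = visual_ratio(a, b, baseline=45.0)   # axis starts at 45
print(round(honest, 2), round(truncated, 2))    # 1.08 2.33
```

With the axis starting at 45, a four-point difference is drawn as if one bar were more than twice the height of the other.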

Key Python Functions and Methods

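# Imports assumed by the aliases below
import geopandas
import matplotlib.pyplot as plt
import plotly.express as px
import seaborn as sns

# Geographic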
geopandas.read_file('file.geojson')      # Load geographic data
gdf.merge(df, left_on=..., right_on=...) # Join data to geometry
gdf.plot(column='var', cmap='RdBu', ...)  # Choropleth
gdf.geometry.centroid                     # Extract centroids for bubbles

# Aggregation
df.groupby('county').agg(...)             # County-level summaries
df.pivot_table(values=..., index=..., columns=..., aggfunc=...)  # Crosstab for heatmap

# Visualization
plt.subplots(nrows, ncols, figsize=...)   # Multi-panel layout
ax.bar_label(container, fmt='%.1f', ...)  # Add data labels to bars
sns.heatmap(data, cmap=..., annot=True)   # Heatmap

# Interactive
px.choropleth(df, geojson=geo, locations='county', featureidkey='properties.name', color='var')
px.scatter(df, x='col1', y='col2', color='col3', size='col4', hover_name='county')
fig.write_html('output.html')             # Save interactive visualization

Key Terms

  • Choropleth map: A thematic map where geographic units are shaded by a data variable
  • Cartogram: A map where geographic units are distorted in size proportional to a data variable (e.g., population)
  • Ecological fallacy: The error of inferring individual-level patterns from aggregate data
  • Sequential color scale: A color scale varying in lightness/saturation along a single hue, appropriate for one-directional data
  • Diverging color scale: A color scale with two contrasting hues meeting at a neutral midpoint, appropriate for data with a natural reference point
  • Perceptual uniformity: The property of a color scale where equal data differences produce equal visual color differences
  • GeoDataFrame: A pandas DataFrame extended with a geometry column enabling spatial operations, provided by GeoPandas
  • CRS (Coordinate Reference System): The spatial reference system defining how geographic coordinates map to Earth's surface; must be consistent across datasets for layers to align and for spatial joins to work correctly
  • featureidkey: In Plotly's choropleth functions, the parameter naming the GeoJSON property that identifies each geographic feature for matching to data

The Central Lesson: Visualization Is Analysis

Visualization is not decoration applied after analysis is complete. It is analysis made visual: a different representation of the same underlying inferential work. Choices about what to visualize, how to aggregate, what scale to use, and what to highlight are analytical decisions that shape the conclusions drawn. The best political data analysts are as deliberate about their visualization choices as about their statistical modeling choices, for the same reason: both are consequential for what the data is understood to mean.