
Chapter 16: Visualizing the Electorate (Python Lab)

The campaign's third-floor conference room had been converted into what Jake Rourke called "the war room" and Nadia Osei privately called "the spreadsheet graveyard." Rows of monitors showed real-time voter contact tracking. A wall map — an actual printed paper map, at Jake's insistence — showed county boundaries in red and blue, the approximate partisan lean visible at a glance but none of the granularity that mattered.

"Can you put this on screen?" Jake asked, gesturing at the wall map. "Something the field directors can actually work with? Zoom into counties, see where we're under-performing?"

"I built it last night," Nadia said. She pulled up her laptop, connected it to the projector, and opened a Python script. Within seconds, the wall map's crude red-blue picture was replaced by something qualitatively different: a county-level choropleth in which each county was shaded on a gradient from deep red to deep blue, with dots scaled to population overlaid at county centroids, and a second panel showing turnout propensity distribution across demographic groups by urban-rural category.

"That's our state," Jake said after a moment. It was not a question; it was recognition. The map showed something the paper map hadn't — the structure of the electorate, not just its outcome. The dark blue urban cores, the light blue ring suburbs that were trending Democratic, the pale red exurban counties that had been trending redder, the deeper red rural territories. And crucially, overlaid on the partisanship shading, the turnout story: which counties were below their modeled propensity, which were running ahead.

This chapter teaches you how to build that map — and the other visualizations that make voter data legible to campaign teams, journalists, researchers, and the public. Good political visualization is not decoration. It is analysis made visual: a model that reveals structure in data that would be invisible in a table.


16.1 Principles of Political Data Visualization

Before writing a single line of Python, we need to understand what political visualizations are for and how they can succeed or fail.

16.1.1 The Map Is Not the Territory

Every visualization is a model — a simplification that reveals some aspects of reality while hiding others. A county-level choropleth map of presidential vote share looks like a "sea of red" in most of the continental United States because rural counties, which lean Republican, are geographically large while urban counties, which lean Democratic, are small. The famous "bubble maps" that scale counties by population rather than geography tell a different story.

Neither map is "correct." Each emphasizes different aspects of electoral reality. The political analyst's job is to choose the visual model that best answers the question at hand and to communicate clearly what the visualization does and doesn't show.

💡 Intuition Check: The 2016 Map Problem Standard county-level maps of the 2016 presidential election show Donald Trump winning almost all the land area of the United States. Hillary Clinton won the large urban cores, which are tiny on a standard map. A cartogram that scales county size by population produces a strikingly different visual picture. Neither is misleading if labeled properly; both are misleading if the reader doesn't understand what they're looking at. The first lesson of political visualization is that the map encodes assumptions, and those assumptions should be explicit.

16.1.2 Choosing the Right Chart for the Question

Different visualization types answer different questions:

Choropleth maps answer: How does a variable vary across geographic units? They are best for showing geographic patterns and are subject to area-size bias (geographic area dominates the visual impression regardless of population).

Bar charts answer: How do quantities compare across categories? They are ideal for demographic breakdowns and group comparisons. Grouped bar charts show multiple variables across the same categories simultaneously.

Time series line charts answer: How does a variable change over time? They are essential for showing trends in vote share, polling, and turnout across election cycles.

Scatter plots answer: What is the relationship between two continuous variables? Scatter plots with political data frequently show demographic-to-vote correlations, e.g., percent college-educated by county vs. Democratic vote share.

Heatmaps answer: How does a variable vary across two categorical dimensions simultaneously? They are ideal for crosstabulation visualization — e.g., how support scores vary by demographic group and urban-rural category.

Interactive visualizations allow users to explore data themselves — zooming into geographic areas, hovering for exact values, filtering by group. They are powerful for exploration and presentation but require more investment in design.

Best Practice: Match Chart Type to Data Type The most common visualization mistake in political analytics is using the wrong chart type for the data at hand. Pie charts are almost always inferior to bar charts for comparing quantities. Line charts should only be used for data with inherent sequence (time, ordered categories). Never use a three-dimensional bar chart — the added dimension provides no information and distorts visual comparison. When in doubt, start with the simplest chart that answers the question.

16.1.3 Color Choices in Political Visualization

Color is the primary visual encoding in most political visualizations, and color choices carry enormous meaning. Several principles apply:

Sequential vs. diverging color scales: A sequential scale (light to dark) works well when data has a natural one-directional range (e.g., turnout rate from 0% to 100%). A diverging scale (two contrasting colors meeting at a neutral midpoint) works better when data has a natural reference point (e.g., vote margin where 50% is the tipping point).

Avoid red-green: The most common form of color vision deficiency (affecting approximately 8% of men) makes red and green indistinguishable. Many political visualizations use red for Republican and green for independent or third-party, which is inaccessible. Use the ColorBrewer palettes, which are designed with colorblind accessibility in mind.

The conventional Republican red / Democrat blue: This color convention is so entrenched in American political visualization that departing from it creates confusion. Stick to it unless you have a specific reason to deviate. International political visualizations use different conventions (e.g., red for left in most countries).

Perceptual linearity: Some color scales appear perceptually linear (equal data differences produce equal visual color differences) while others do not. The viridis, plasma, and cividis palettes from matplotlib are perceptually uniform and excellent for choropleth maps.

⚠️ Common Pitfall: Misleading Color Scales Truncating a color scale — setting the minimum at something other than zero or the natural data minimum — can make small differences appear large. A county choropleth where the scale runs from 48% to 52% (rather than 0% to 100%) will dramatically amplify what are in fact small partisan differences. Always show your scale and think carefully about what range is visually and analytically appropriate.
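The amplification effect of a truncated scale is easy to quantify. The sketch below (with illustrative vote shares) computes where two counties land on the color range under a full 0–100 scale versus the truncated 48–52 scale from the pitfall above:

```python
import numpy as np

def normalize(values, vmin, vmax):
    """Map data values onto the [0, 1] range a color scale actually spans."""
    values = np.asarray(values, dtype=float)
    return np.clip((values - vmin) / (vmax - vmin), 0.0, 1.0)

# Two counties separated by 2 points of Democratic vote share
shares = [49.0, 51.0]

full = normalize(shares, vmin=0, vmax=100)    # full 0-100 scale
trunc = normalize(shares, vmin=48, vmax=52)   # truncated 48-52 scale

# On the full scale, the two counties sit 2% of the color range apart;
# on the truncated scale, they sit 50% of the color range apart.
print(full[1] - full[0])    # 0.02
print(trunc[1] - trunc[0])  # 0.5
```

The same two counties look nearly identical on one map and starkly different on the other, which is exactly why the scale's range belongs in the legend.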


16.1.4 The Grammar of Graphics: Thinking in Layers

Before writing any Python, it is useful to internalize a conceptual framework for thinking about what visualizations are made of. The "grammar of graphics," developed by Leland Wilkinson and implemented in the R package ggplot2 and the Python package plotnine, decomposes every visualization into a small number of separable components:

Data: The underlying dataset — in our case, the ODA voter file or county-level aggregates derived from it.

Aesthetic mappings: The assignment of data variables to visual properties. In a choropleth map, the aesthetic mapping assigns the support score variable to the fill color property. In a scatter plot, it assigns one variable to the x position and another to the y position.

Geometric objects (geoms): The visual marks that represent data — polygons for choropleths, bars for bar charts, points for scatter plots, lines for time series.

Statistical transformations: Operations applied to the data before rendering — means, counts, kernel density estimates. A bar chart showing average support scores has undergone a mean transformation; a scatter plot of raw data has undergone no transformation.

Scales: The functions that map data values to aesthetic values. A color scale maps numeric support scores (0–100) to colors (red to blue). An axis scale maps numeric vote counts to pixel positions.

Coordinate system: Cartesian for most charts; geographic projections for maps.

Facets: The small multiples logic — how data is split across panels.

You don't need to use the grammar of graphics framework explicitly in matplotlib (it uses a more procedural API), but thinking in these terms clarifies the analytical choices embedded in every visualization. When you change a color scale, you are changing the mapping from data to visual property. When you decide to show mean support rather than raw individual scores, you are adding a statistical transformation. Making these choices explicit is what separates analytical visualization from graphic decoration.

💡 Intuition Check: Decomposing a Campaign Map Take Nadia's county choropleth and decompose it into grammar-of-graphics components: Data = county-level support score aggregates. Aesthetic mapping = support score → fill color; total voters → point size (for the bubble overlay). Geom = polygon (for county shapes); point (for centroids). Statistical transformation = mean aggregation from voter-level to county-level. Scale = RdBu diverging color scale, 30–70 range. Coordinate system = geographic projection (Albers or Mercator). Once you can decompose a visualization this way, you can systematically vary any component — change the statistical transformation to median, change the aesthetic mapping to add a second variable, change the scale to highlight different ranges — and understand what analytical claim the changed visualization makes.


16.2 Setting Up the Python Environment

All code in this chapter requires the following packages. Install them with pip if needed:

pip install geopandas matplotlib pandas numpy seaborn plotly

For geographic data, you'll also need:

pip install shapely fiona pyproj

The examples in this chapter use a simulated dataset based on the ODA voter file (oda_voters.csv) and a county-level GeoJSON file. In a real campaign environment, you would use actual county shapefiles (available from the Census Bureau's TIGER/Line shapefile repository) and your campaign's voter file.
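If you don't have oda_voters.csv on hand, a small simulated stand-in with the same column layout is easy to generate. The distributions below are arbitrary placeholders for following along, not the book's actual simulation:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 5_000

df = pd.DataFrame({
    'voter_id': np.arange(1, n + 1),
    'state': 'XX',
    'county': rng.choice([f'County {c}' for c in 'ABCDEFGH'], size=n),
    'age': rng.integers(18, 95, size=n),
    'gender': rng.choice(['M', 'F', 'Other'], size=n, p=[0.48, 0.50, 0.02]),
    'race_ethnicity': rng.choice(
        ['White NH', 'Hispanic/Latino', 'Black NH', 'Asian/other'],
        size=n, p=[0.60, 0.20, 0.12, 0.08]),
    'education': rng.choice(
        ['Less than HS', 'HS diploma', 'Some college',
         'College degree', 'Graduate degree'], size=n),
    'income_bracket': rng.choice(
        ['<$30k', '$30k-$60k', '$60k-$100k', '$100k+'], size=n),
    'party_reg': rng.choice(['D', 'R', 'Other', 'Unaffiliated'],
                            size=n, p=[0.35, 0.35, 0.05, 0.25]),
    'vote_history_2018': rng.integers(0, 2, size=n),
    'vote_history_2020': rng.integers(0, 2, size=n),
    'vote_history_2022': rng.integers(0, 2, size=n),
    'urban_rural': rng.choice(['Urban', 'Suburban', 'Exurban', 'Rural'], size=n),
    'support_score': rng.uniform(0, 100, size=n).round(1),
    'persuadability_score': rng.uniform(0, 100, size=n).round(1),
})

df.to_csv('oda_voters.csv', index=False)
```

Every later code block in the chapter will run against this file, though of course the numbers it produces will be uniform noise rather than a realistic electorate.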

16.2.1 Loading the ODA Dataset

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
import seaborn as sns
from pathlib import Path

# Load the voter-level dataset
df = pd.read_csv('oda_voters.csv')

# Preview the structure
print(df.head())
print(df.dtypes)
print(df.describe())

The ODA dataset contains the following columns:

- voter_id: unique voter identifier
- state: state code
- county: county name
- age: voter age in years
- gender: M/F/Other
- race_ethnicity: White NH, Hispanic/Latino, Black NH, Asian/other
- education: Less than HS / HS diploma / Some college / College degree / Graduate degree
- income_bracket: <$30k / $30k-$60k / $60k-$100k / $100k+
- party_reg: D / R / Other / Unaffiliated
- vote_history_2018, vote_history_2020, vote_history_2022: 0/1 indicator for each election
- urban_rural: Urban / Suburban / Exurban / Rural
- support_score: 0–100 estimated probability of supporting Garza
- persuadability_score: 0–100 estimated openness to persuasion


16.2.2 Working with Geographic Data: A Conceptual Overview

Geographic visualization requires understanding three concepts that non-GIS analysts often encounter for the first time in political analytics:

Shapefiles and GeoJSON. Vector geographic data is stored in formats that describe the boundaries of geographic units as sequences of coordinate pairs. The shapefile format (developed by ESRI) and the GeoJSON format (an open standard) are the most common. The Census Bureau's TIGER/Line shapefiles provide county, congressional district, and census tract boundaries for the entire United States and are freely available at census.gov/geographies/mapping-files.

Coordinate Reference Systems (CRS). Geographic coordinates can be expressed in different reference systems. Geographic coordinate systems use latitude and longitude in degrees. Projected coordinate systems (like Albers Equal Area Conic, commonly used for U.S. national maps) transform the curved earth surface to a flat plane, choosing between different distortions of area, shape, and distance. For choropleth maps, equal-area projections are preferable because they don't inflate the apparent size of high-latitude regions. GeoPandas exposes a layer's CRS through the .crs attribute and reprojects with the to_crs() method.
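The area distortion that equal-area projections correct for is easy to quantify: on a sphere, the east-west span of a degree of longitude shrinks with the cosine of latitude, so unprojected lat/lon "squares" exaggerate high-latitude area. A back-of-envelope check (spherical-earth approximation):

```python
import numpy as np

def cell_area_km2(lat_deg, size_deg=1.0, radius_km=6371.0):
    """Approximate area of a size_deg x size_deg lat/lon cell centered at lat_deg."""
    lat = np.radians(lat_deg)
    side = np.radians(size_deg) * radius_km                  # north-south extent
    width = np.radians(size_deg) * radius_km * np.cos(lat)   # east-west extent
    return side * width

# A 1-degree cell near the equator vs. near the US-Canada border
equator = cell_area_km2(0.0)
northern = cell_area_km2(49.0)
print(f"{equator:,.0f} km^2 vs {northern:,.0f} km^2")
# The northern cell covers roughly a third less ground, yet plots at the
# same size in an unprojected lat/lon map.
```

An equal-area projection removes exactly this bias, which is why it is the default recommendation for choropleths.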

The merge operation. A choropleth map is fundamentally a join operation: you have geographic boundary data (polygon geometries labeled by county FIPS code or county name) and analytical data (support scores, turnout rates, demographic percentages labeled by county). You need to join these two datasets on their common identifier to attach the analytical data to the geometry. The most common failure in this workflow is name mismatching — "St. Clair County" in the shapefile vs. "Saint Clair County" in the voter file — which produces unmatched records that appear as gray "no data" counties on your map.


16.3 Building County-Level Aggregates

Before building maps, we need to aggregate voter-level data to the county level. This is the fundamental join operation that connects individual voter records to geographic units.

# Compute county-level aggregates
county_stats = df.groupby('county').agg(
    total_voters=('voter_id', 'count'),
    pct_dem=('party_reg', lambda x: (x == 'D').mean() * 100),
    pct_rep=('party_reg', lambda x: (x == 'R').mean() * 100),
    mean_support=('support_score', 'mean'),
    mean_persuadability=('persuadability_score', 'mean'),
    pct_voted_2020=('vote_history_2020', lambda x: x.mean() * 100),
    pct_voted_2022=('vote_history_2022', lambda x: x.mean() * 100),
    pct_hispanic=('race_ethnicity', lambda x: (x == 'Hispanic/Latino').mean() * 100),
    pct_black=('race_ethnicity', lambda x: (x == 'Black NH').mean() * 100),
    pct_college=('education', lambda x:
                 x.isin(['College degree', 'Graduate degree']).mean() * 100),
    pct_urban=('urban_rural', lambda x: (x == 'Urban').mean() * 100),
).reset_index()

# Compute partisan registration margin (positive = more Democratic)
county_stats['reg_margin'] = county_stats['pct_dem'] - county_stats['pct_rep']

print(county_stats.head(10))
print(f"\nTotal counties: {len(county_stats)}")

📊 Real-World Application: The Aggregation Choice Notice that we're computing the mean support score rather than, say, the median or the sum. For a campaign resource-allocation dashboard, the mean is usually the right aggregation: it tells you the average persuadability/support level of voters in the county, which is what you need to compare counties. But for identifying which voters within a county to contact, you want the individual-level scores, not the aggregated mean.


16.3.1 Why Aggregation Choices Matter

When we aggregate individual voter records to the county level, we are making analytical choices that determine what questions the visualization can and cannot answer. Consider the simple calculation of mean_support per county. This mean is the right aggregation if we want to know the average probability that a randomly selected voter in the county supports Garza. But it is the wrong aggregation for at least two other important questions:

For estimating total Garza votes: We need total_voters × mean_support, not just mean_support. A county with 10,000 voters and mean support of 65 contributes 6,500 expected Garza votes; a county with 100,000 voters and mean support of 52 contributes 52,000 expected Garza votes. The second county is far more important despite its lower average support.

For GOTV prioritization: We need the distribution of support scores among low-propensity voters, not the mean across all voters. A county with mean support of 60 but where that support is concentrated among high-propensity voters who will vote without GOTV contact is less valuable for mobilization than a county with mean support of 55 where the support is concentrated among low-propensity voters who need mobilization encouragement.

For persuasion targeting: We need the number of genuinely persuadable voters (persuadability score above a threshold) with moderate support scores (neither strongly for nor strongly against Garza). A county mean doesn't reveal this.

These distinctions motivate building multiple county-level metrics rather than just one. Nadia's county dashboard computes: mean support (for geographic overview), total expected Garza votes (for importance ranking), low-propensity high-support voter count (for GOTV ranking), and high-persuadability moderate-support voter count (for persuasion ranking). Each answers a different question; none is redundant.
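Nadia's four metrics can be sketched as one grouped aggregation. The thresholds and the tiny inline frame below are purely illustrative (a real campaign would tune cutoffs to its models and run this against the full voter file):

```python
import pandas as pd

# Tiny stand-in for the voter-level frame, columns as in the ODA data dictionary
# (county names and scores are hypothetical)
df = pd.DataFrame({
    'county': ['Ashford', 'Ashford', 'Brinley', 'Brinley'],
    'support_score': [70.0, 40.0, 68.0, 80.0],
    'persuadability_score': [30.0, 75.0, 20.0, 20.0],
    'vote_history_2022': [1, 0, 0, 1],
})

# Illustrative definitions of each targeting universe
low_propensity = df['vote_history_2022'] == 0            # skipped the last midterm
high_support = df['support_score'] >= 65
persuadable = df['persuadability_score'] >= 60
moderate_support = df['support_score'].between(35, 65)

metrics = df.assign(
    expected_garza_votes=df['support_score'] / 100,
    gotv_target=(low_propensity & high_support).astype(int),
    persuasion_target=(persuadable & moderate_support).astype(int),
).groupby('county').agg(
    mean_support=('support_score', 'mean'),
    expected_garza_votes=('expected_garza_votes', 'sum'),
    gotv_targets=('gotv_target', 'sum'),
    persuasion_targets=('persuasion_target', 'sum'),
).reset_index()

# Each column supports a different ranking of counties
print(metrics.sort_values('expected_garza_votes', ascending=False))
```

Note that the rankings genuinely diverge: in this toy frame, Brinley leads on expected votes and GOTV targets while Ashford leads on persuasion targets, which is the whole point of computing more than one metric.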


16.4 Choropleth Maps: Electoral Geography

A choropleth map shades geographic units (counties, states, precincts) by a variable of interest. Building one in Python requires two things: geographic boundary data (shapefiles or GeoJSON) and the data to map onto those boundaries.

16.4.1 Loading Geographic Data with GeoPandas

import geopandas as gpd

# Load county shapefile
# In production, use the actual state's county shapefile from Census TIGER
# Here we use a GeoJSON with county boundaries
gdf = gpd.read_file('state_counties.geojson')

print(gdf.head())
print(f"CRS: {gdf.crs}")  # Coordinate reference system

# Merge with our county statistics
gdf_merged = gdf.merge(county_stats, left_on='county_name', right_on='county', how='left')

# Check for unmatched counties
unmatched = gdf_merged[gdf_merged['total_voters'].isna()]
if len(unmatched) > 0:
    print(f"Warning: {len(unmatched)} counties did not merge:")
    print(unmatched['county_name'].values)

⚠️ Common Pitfall: Name Matching in GeoJSON Merges County names in shapefiles and voter files are often inconsistently formatted. "St. Clair County" in the shapefile may appear as "Saint Clair County" or "St Clair County" (without period) in the voter file. Always check your merge quality and build a name-cleaning step before joining. Fuzzy matching libraries (fuzzywuzzy, rapidfuzz) can help with systematic mismatches.
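A minimal name-cleaning pass plus pandas' merge indicator catches most of these mismatches before they silently become gray counties. The substitution rules below are examples, not a complete list, and the two tiny frames are stand-ins for the shapefile and voter-file tables:

```python
import pandas as pd

def clean_county_name(s: pd.Series) -> pd.Series:
    """Normalize common county-name variants before joining."""
    return (s.str.strip()
             .str.replace(r'\bSt\.?\s', 'Saint ', regex=True)  # St./St -> Saint
             .str.replace(r'\s+County$', '', regex=True)        # drop suffix
             .str.lower())

shapes = pd.DataFrame({'county_name': ['St. Clair County', 'Madison County']})
voters = pd.DataFrame({'county': ['Saint Clair', 'madison'],
                       'total_voters': [41000, 98000]})

shapes['key'] = clean_county_name(shapes['county_name'])
voters['key'] = clean_county_name(voters['county'])

# indicator=True adds a _merge column: 'both' = matched, 'left_only' = gray county
merged = shapes.merge(voters, on='key', how='left', indicator=True)
print(merged[['county_name', 'total_voters', '_merge']])
```

Without the cleaning step, both rows here would fail to match; with it, the merge indicator confirms that every county found its data.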

16.4.2 Your First Choropleth: Partisan Registration

fig, axes = plt.subplots(1, 2, figsize=(16, 8))

# Panel 1: Partisan registration margin
ax1 = axes[0]
gdf_merged.plot(
    column='reg_margin',
    cmap='RdBu',           # Red = Republican, Blue = Democrat
    ax=ax1,
    legend=True,
    legend_kwds={
        'label': 'Reg. Margin (D - R)',
        'orientation': 'horizontal',
        'shrink': 0.7,
        'pad': 0.02,
    },
    missing_kwds={'color': 'lightgrey'},
    vmin=-40,              # Symmetric around 0
    vmax=40,
    edgecolor='white',
    linewidth=0.5,
)
ax1.set_title('Partisan Registration Margin\n(Blue = More Democratic, Red = More Republican)',
              fontsize=13, fontweight='bold')
ax1.axis('off')

# Panel 2: Mean Garza support score
ax2 = axes[1]
gdf_merged.plot(
    column='mean_support',
    cmap='Blues',
    ax=ax2,
    legend=True,
    legend_kwds={
        'label': 'Mean Garza Support Score (0-100)',
        'orientation': 'horizontal',
        'shrink': 0.7,
        'pad': 0.02,
    },
    missing_kwds={'color': 'lightgrey'},
    vmin=0,
    vmax=100,
    edgecolor='white',
    linewidth=0.5,
)
ax2.set_title('Mean Garza Support Score by County\n(Darker = Stronger Support)',
              fontsize=13, fontweight='bold')
ax2.axis('off')

plt.suptitle('Electoral Geography: Garza-Whitfield Senate Race',
             fontsize=16, fontweight='bold', y=1.01)
plt.tight_layout()
plt.savefig('choropleth_partisan.png', dpi=150, bbox_inches='tight')
plt.show()

16.4.3 Adding Population Bubbles

The area-bias problem with choropleth maps can be partly addressed by overlaying circles scaled to population, making it visually clear that large rural counties have fewer voters than small urban ones.

fig, ax = plt.subplots(1, 1, figsize=(14, 10))

# Base choropleth: support score
gdf_merged.plot(
    column='mean_support',
    cmap='RdBu',
    ax=ax,
    legend=False,
    vmin=30,
    vmax=70,
    edgecolor='white',
    linewidth=0.4,
    alpha=0.85,
)

# Add population bubbles at county centroids
# (compute centroids in an equal-area projection -- EPSG:5070, CONUS Albers --
# to avoid the geographic-CRS centroid warning, then transform back)
centroids = gdf_merged.copy()
centroids['geometry'] = (
    centroids.geometry.to_crs(epsg=5070).centroid.to_crs(gdf_merged.crs)
)

# Scale bubble size by total_voters
max_voters = centroids['total_voters'].max()
min_size = 20
max_size = 600
centroids['bubble_size'] = (
    (centroids['total_voters'] / max_voters) * (max_size - min_size) + min_size
)

centroids.plot(
    ax=ax,
    markersize=centroids['bubble_size'],
    color='black',
    alpha=0.25,
    marker='o',
)

# Colorbar for support score (the base plot used legend=False, so build
# a ScalarMappable by hand to match its colormap and range)
sm = plt.cm.ScalarMappable(cmap='RdBu', norm=plt.Normalize(vmin=30, vmax=70))
sm.set_array([])
cbar = plt.colorbar(sm, ax=ax, orientation='horizontal', fraction=0.03, pad=0.02,
                    shrink=0.5)
cbar.set_label('Mean Garza Support Score (30-70 range shown)', fontsize=10)

ax.set_title(
    'Garza Support Score by County\n(Bubble size = registered voters)',
    fontsize=14, fontweight='bold'
)
ax.axis('off')

plt.tight_layout()
plt.savefig('choropleth_with_bubbles.png', dpi=150, bbox_inches='tight')
plt.show()

💡 Intuition Check: Why Bubble Size Matters Compare how the choropleth looks with and without the population bubbles. Several rural counties in the interior of the state are deep red (strong Whitfield territory) and geographically large, dominating the visual impression. But the small bubble sizes reveal that each of these counties has 15,000–25,000 registered voters. A single urban county with a tiny geographic footprint but a large bubble has 180,000 registered voters. The map with bubbles tells a different — and more accurate — story about where votes actually come from.


16.4.4 Reading a Choropleth Analytically

When Nadia presents the bubble choropleth to the field directors, she doesn't just say "here's a map." She guides their attention through a specific analytical reading:

"Look at the deep blue urban core counties — those are our strongest support areas, but notice the small bubbles. Each has 60,000 to 100,000 registered voters. Now look at the suburban ring counties that are lighter blue — those are our persuasion and mobilization frontier, and they have the biggest bubbles. The exurban counties where we're at roughly 48 are where the race is being fought, and they have significant voter populations. The deep red rural interior has large geographic footprints but small bubbles — each of those counties has 15,000 to 30,000 voters, and we're not going to win them, but we need to minimize our losses."

This analytical narration is what distinguishes a well-used visualization from a well-made one. The map communicates the structure of the data; the analyst communicates what that structure means for strategy. The best political visualizations are designed to support this kind of guided analytical reading — with color scales, annotations, and design choices that make the strategically important features visible without requiring the analyst to explain every element.

A specific design lesson from this narration: the bubble overlay is not just about correcting for geographic area bias. It is about making the strategic importance of each county visible at a glance. When field directors can see both the partisan lean (color) and the vote potential (bubble size) in a single image, they can make intuitive priority judgments that would take much longer to derive from a ranked table.


16.5 Demographic Bar Charts: Who Makes Up the Electorate?

Bar charts are the workhorses of demographic visualization. Well-designed bar charts can show composition, comparison, and change simultaneously.

16.5.1 Stacked Bar Chart: Demographic Composition by Urban-Rural

fig, axes = plt.subplots(1, 2, figsize=(15, 7))

# ---- Panel 1: Race/ethnicity composition by urban-rural ----
urban_race = df.groupby(['urban_rural', 'race_ethnicity']).size().unstack(fill_value=0)
urban_race_pct = urban_race.div(urban_race.sum(axis=1), axis=0) * 100

# Define order
urban_order = ['Urban', 'Suburban', 'Exurban', 'Rural']
race_order = ['White NH', 'Hispanic/Latino', 'Black NH', 'Asian/other']
race_colors = ['#6baed6', '#fd8d3c', '#74c476', '#9e9ac8']

urban_race_pct = urban_race_pct.reindex(urban_order)[race_order]

ax1 = axes[0]
urban_race_pct.plot(
    kind='bar',
    stacked=True,
    ax=ax1,
    color=race_colors,
    edgecolor='white',
    linewidth=0.5,
)
ax1.set_title('Racial/Ethnic Composition\nby Urban-Rural Category',
              fontsize=13, fontweight='bold')
ax1.set_xlabel('')
ax1.set_ylabel('Percentage of Registered Voters', fontsize=11)
ax1.set_xticklabels(urban_order, rotation=0, fontsize=11)
ax1.legend(title='Race/Ethnicity', bbox_to_anchor=(1.05, 1), loc='upper left')
ax1.set_ylim(0, 100)

# Add percentage labels, hiding segments too small (< 8%) to hold text
for container in ax1.containers:
    labels = [f'{v:.0f}%' if v >= 8 else '' for v in container.datavalues]
    ax1.bar_label(container, labels=labels, label_type='center', fontsize=8)

# ---- Panel 2: Party registration by education ----
edu_party = df.groupby(['education', 'party_reg']).size().unstack(fill_value=0)
edu_party_pct = edu_party.div(edu_party.sum(axis=1), axis=0) * 100

edu_order = ['Less than HS', 'HS diploma', 'Some college',
             'College degree', 'Graduate degree']
party_order = ['D', 'R', 'Other', 'Unaffiliated']
party_colors = ['#2166ac', '#d6604d', '#78c679', '#969696']

edu_party_pct = edu_party_pct.reindex(edu_order)[party_order]

ax2 = axes[1]
edu_party_pct.plot(
    kind='bar',
    stacked=True,
    ax=ax2,
    color=party_colors,
    edgecolor='white',
    linewidth=0.5,
)
ax2.set_title('Party Registration by Education Level',
              fontsize=13, fontweight='bold')
ax2.set_xlabel('')
ax2.set_ylabel('Percentage of Registered Voters', fontsize=11)
ax2.set_xticklabels(edu_order, rotation=20, ha='right', fontsize=10)
handles, _ = ax2.get_legend_handles_labels()
ax2.legend(handles, ['Democrat', 'Republican', 'Other', 'Unaffiliated'],
           title='Party', bbox_to_anchor=(1.05, 1), loc='upper left')
ax2.set_ylim(0, 100)

plt.suptitle('Electorate Composition: Garza-Whitfield State',
             fontsize=15, fontweight='bold')
plt.tight_layout()
plt.savefig('demographic_composition.png', dpi=150, bbox_inches='tight')
plt.show()

16.5.2 Grouped Bar Chart: Support Scores by Demographic Group

Grouped (side-by-side) bar charts show multiple measures across the same categories, enabling direct comparison.

# Mean support and persuadability by race/ethnicity
group_scores = df.groupby('race_ethnicity').agg(
    mean_support=('support_score', 'mean'),
    mean_persuadability=('persuadability_score', 'mean'),
    count=('voter_id', 'count')
).reset_index()

race_order = ['White NH', 'Hispanic/Latino', 'Black NH', 'Asian/other']
group_scores = group_scores.set_index('race_ethnicity').reindex(race_order)

x = np.arange(len(race_order))
width = 0.35

fig, ax = plt.subplots(figsize=(11, 7))

bars1 = ax.bar(x - width/2, group_scores['mean_support'], width,
               label='Mean Support Score', color='#2166ac', alpha=0.85,
               edgecolor='white')
bars2 = ax.bar(x + width/2, group_scores['mean_persuadability'], width,
               label='Mean Persuadability Score', color='#d95f02', alpha=0.85,
               edgecolor='white')

# Add value labels
ax.bar_label(bars1, fmt='%.1f', padding=3, fontsize=10)
ax.bar_label(bars2, fmt='%.1f', padding=3, fontsize=10)

# Add voter count annotations below bars
for i, (idx, row) in enumerate(group_scores.iterrows()):
    n = int(row['count'])
    ax.text(i, -4, f'n={n:,}', ha='center', va='top', fontsize=8.5,
            color='#555555')

ax.set_title('Mean Support and Persuadability Scores\nby Racial/Ethnic Group',
             fontsize=14, fontweight='bold')
ax.set_ylabel('Score (0-100 scale)', fontsize=12)
ax.set_xticks(x)
ax.set_xticklabels(race_order, fontsize=12)
ax.set_ylim(0, 100)
ax.axhline(50, color='grey', linestyle='--', linewidth=0.8, alpha=0.6,
           label='50 = toss-up')
ax.legend(fontsize=11)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)

plt.tight_layout()
plt.savefig('grouped_bars_support.png', dpi=150, bbox_inches='tight')
plt.show()

Best Practice: Always Show Sample Size Every chart showing means, percentages, or other statistics derived from grouped data should show the group size. A mean support score of 71 from 847 respondents means something very different from a mean of 71 from 15 respondents. Adding n labels prevents readers from drawing conclusions from noisy estimates.
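The reason n matters can be made concrete with the standard error of a group mean. With a within-group standard deviation of around 20 points (a plausible figure for 0–100 support scores, used here purely for illustration), uncertainty shrinks with the square root of group size:

```python
import math

def mean_standard_error(sd: float, n: int) -> float:
    """Standard error of a sample mean: sd / sqrt(n)."""
    return sd / math.sqrt(n)

sd = 20.0  # illustrative spread of support scores within a group
for n in (15, 100, 847):
    se = mean_standard_error(sd, n)
    print(f"n={n:>4}: a mean of 71 carries roughly +/- {1.96 * se:.1f} (95% CI)")
```

The n=15 mean comes with an uncertainty of about ten points on either side; the n=847 mean is pinned down to about one point. Printing the n label lets the reader make that adjustment themselves.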


16.5.3 Reading Bar Charts: The Three Questions

Every bar chart in political data visualization should be readable as the answer to one of three types of questions:

Comparison: How do different groups compare on a single measure? A simple bar chart sorted from highest to lowest answers this. Key design rule: sort by the variable being shown, not alphabetically, unless alphabetical order is analytically meaningful.

Composition: What is the relative size of each component of a whole? A stacked bar chart answers this, showing how the total is divided among sub-categories. Key design rule: put the most important or most variable category at the bottom (anchored to the baseline) where comparison is easiest.

Change: How does the same measure change across conditions (time, policy, treatment)? A grouped bar chart shows multiple measures per category, enabling direct comparison. Key design rule: use consistent colors to mark the "same" variable across groups, so the reader can track each measure across the groups.

Nadia's stacked bar of racial composition by urban-rural category is answering the composition question: what fraction of each urban-rural category belongs to each racial group? Her grouped bar of support scores by race is answering the comparison question: how do support and persuadability scores compare across racial groups? Choosing the wrong chart type for the question — say, using a stacked bar to show support scores across racial groups — would produce a confusing visualization that answers a question nobody was asking.
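The sorting rule from the comparison question is a single line in pandas: sort the aggregated values before handing them to the plotting call. The group means below are illustrative stand-ins:

```python
import pandas as pd

# Hypothetical mean support scores by urban-rural category
mean_support = pd.Series(
    {'Suburban': 52.4, 'Urban': 61.8, 'Rural': 38.9, 'Exurban': 45.2},
    name='mean_support',
)

# Sorted for a comparison chart: highest bar first, not alphabetical order
for_plotting = mean_support.sort_values(ascending=False)
print(for_plotting.index.tolist())
# ['Urban', 'Suburban', 'Exurban', 'Rural']
```

Feeding for_plotting (rather than the unsorted Series) to a bar-plot call is what turns an arbitrary category ordering into a readable ranking.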


16.6 Time Series: Partisan Change Across Election Cycles

Time series charts show how variables change over time. For electoral geography, the most important time series is typically turnout or vote share across multiple election cycles.

# Simulate multi-cycle turnout data from vote history columns
# In the ODA dataset we have 2018, 2020, 2022 history

# County-level turnout rates by cycle
county_turnout = df.groupby('county').agg(
    turnout_2018=('vote_history_2018', 'mean'),
    turnout_2020=('vote_history_2020', 'mean'),
    turnout_2022=('vote_history_2022', 'mean'),
    urban_rural=('urban_rural', lambda x: x.mode()[0]),
    total_voters=('voter_id', 'count'),
).reset_index()

# Reshape to long format for time series plotting
turnout_long = county_turnout.melt(
    id_vars=['county', 'urban_rural', 'total_voters'],
    value_vars=['turnout_2018', 'turnout_2020', 'turnout_2022'],
    var_name='cycle',
    value_name='turnout_rate'
)
# expand=False returns a Series (not a one-column DataFrame), so the
# result can be assigned directly to a column
turnout_long['year'] = turnout_long['cycle'].str.extract(r'(\d+)', expand=False).astype(int)
turnout_long['turnout_rate'] *= 100  # Convert to percentage

# Aggregate to urban-rural category for cleaner visualization
urban_turnout = turnout_long.groupby(['urban_rural', 'year']).agg(
    mean_turnout=('turnout_rate', 'mean'),
    weighted_turnout=('turnout_rate', lambda x: np.average(
        x, weights=turnout_long.loc[x.index, 'total_voters']
    )),
).reset_index()

# Plot
fig, axes = plt.subplots(1, 2, figsize=(15, 7))

colors = {'Urban': '#1b7837', 'Suburban': '#762a83',
          'Exurban': '#d95f02', 'Rural': '#d73027'}
urban_order = ['Urban', 'Suburban', 'Exurban', 'Rural']

# Left panel: Simple mean turnout by urban-rural
ax1 = axes[0]
for urban_cat in urban_order:
    data = urban_turnout[urban_turnout['urban_rural'] == urban_cat]
    ax1.plot(data['year'], data['mean_turnout'],
             color=colors[urban_cat], marker='o', linewidth=2.5,
             markersize=8, label=urban_cat)

ax1.set_title('Turnout Rate by Election Cycle\n(Urban-Rural Category)',
              fontsize=13, fontweight='bold')
ax1.set_xlabel('Election Year', fontsize=12)
ax1.set_ylabel('Turnout Rate (%)', fontsize=12)
ax1.set_xticks([2018, 2020, 2022])
ax1.legend(title='Urban-Rural', fontsize=10)
ax1.grid(True, alpha=0.3)
ax1.spines['top'].set_visible(False)
ax1.spines['right'].set_visible(False)
ax1.set_ylim(0, 100)

# Annotate surge-and-decline pattern
ax1.annotate('2020 surge\n(presidential)',
             xy=(2020, urban_turnout[urban_turnout['urban_rural']=='Urban']['mean_turnout'].iloc[1]),
             xytext=(2019.6, 85),
             arrowprops=dict(arrowstyle='->', color='grey'),
             fontsize=9, color='grey')

# Right panel: Change from 2018 to 2022 (midterm-to-midterm)
ax2 = axes[1]
change_data = urban_turnout[urban_turnout['year'].isin([2018, 2022])].pivot(
    index='urban_rural', columns='year', values='mean_turnout'
).reset_index()
change_data['change'] = change_data[2022] - change_data[2018]
change_data = change_data.set_index('urban_rural').reindex(urban_order)

bar_colors = ['#2166ac' if c > 0 else '#d6604d' for c in change_data['change']]
bars = ax2.barh(change_data.index, change_data['change'],
                color=bar_colors, edgecolor='white', height=0.5)
ax2.bar_label(bars, fmt='%.1f pp', padding=5, fontsize=10)
ax2.axvline(0, color='black', linewidth=0.8)
ax2.set_title('Change in Turnout: 2018 to 2022\n(Midterm-to-Midterm Comparison)',
              fontsize=13, fontweight='bold')
ax2.set_xlabel('Percentage Point Change', fontsize=12)
ax2.spines['top'].set_visible(False)
ax2.spines['right'].set_visible(False)

plt.suptitle('Turnout Trends Across Election Cycles',
             fontsize=15, fontweight='bold')
plt.tight_layout()
plt.savefig('turnout_time_series.png', dpi=150, bbox_inches='tight')
plt.show()

🔗 Connection to Chapter 14 The surge-and-decline pattern you can visualize here corresponds directly to the academic literature discussed in Chapter 14. The 2020 spike in all urban-rural categories reflects the mobilized presidential electorate; the 2022 decline reflects the smaller midterm electorate. The differential between urban and rural change is analytically important: if urban areas show larger declines from 2020 to 2022 than rural areas, the midterm electorate is systematically more Republican, consistent with the surge-and-decline theory.


16.7 Scatter Plots: Demographic Correlates of Support

Scatter plots show relationships between two continuous variables. For political data, the canonical scatter plot shows a demographic attribute (college education rate, median income, percent Hispanic) on the x-axis and vote share or support score on the y-axis.

from matplotlib.colors import Normalize
from matplotlib.cm import ScalarMappable

# County-level scatter: % college educated vs. mean support
fig, axes = plt.subplots(1, 2, figsize=(16, 7))

# Panel 1: College education vs. support score
ax1 = axes[0]
scatter = ax1.scatter(
    county_stats['pct_college'],
    county_stats['mean_support'],
    c=county_stats['pct_urban'],
    cmap='YlOrRd',
    s=county_stats['total_voters'] / 100,  # Size = voters / 100
    alpha=0.7,
    edgecolors='grey',
    linewidths=0.4,
)
plt.colorbar(scatter, ax=ax1, label='% Urban Voters')

# Regression line -- drop rows where either variable is missing so the
# x and y arrays stay aligned (calling .dropna() on each series separately
# can silently misalign them)
valid = county_stats[['pct_college', 'mean_support']].dropna()
coeffs = np.polyfit(valid['pct_college'], valid['mean_support'], 1)
x_line = np.linspace(county_stats['pct_college'].min(),
                     county_stats['pct_college'].max(), 100)
y_line = np.polyval(coeffs, x_line)
ax1.plot(x_line, y_line, color='navy', linewidth=2, linestyle='--',
         label=f'Trend (slope: {coeffs[0]:.2f})')

# Correlation on the same aligned rows
r = np.corrcoef(valid['pct_college'], valid['mean_support'])[0, 1]
ax1.text(0.05, 0.95, f'r = {r:.3f}', transform=ax1.transAxes,
         fontsize=11, verticalalignment='top',
         bbox=dict(boxstyle='round', facecolor='white', alpha=0.7))

ax1.set_xlabel('% College-Educated Voters', fontsize=12)
ax1.set_ylabel('Mean Garza Support Score', fontsize=12)
ax1.set_title('Education and Support Score\n(Bubble size = registered voters)',
              fontsize=13, fontweight='bold')
ax1.axhline(50, color='grey', linestyle=':', alpha=0.6)
ax1.legend(fontsize=10)
ax1.spines['top'].set_visible(False)
ax1.spines['right'].set_visible(False)

# Panel 2: % Hispanic vs. support score
ax2 = axes[1]
scatter2 = ax2.scatter(
    county_stats['pct_hispanic'],
    county_stats['mean_support'],
    c=county_stats['reg_margin'],
    cmap='RdBu',
    vmin=-40, vmax=40,
    s=county_stats['total_voters'] / 100,
    alpha=0.7,
    edgecolors='grey',
    linewidths=0.4,
)
plt.colorbar(scatter2, ax=ax2, label='Reg. Margin (D-R)')

# Same aligned-rows pattern as Panel 1
valid2 = county_stats[['pct_hispanic', 'mean_support']].dropna()
coeffs2 = np.polyfit(valid2['pct_hispanic'], valid2['mean_support'], 1)
x_line2 = np.linspace(county_stats['pct_hispanic'].min(),
                      county_stats['pct_hispanic'].max(), 100)
y_line2 = np.polyval(coeffs2, x_line2)
ax2.plot(x_line2, y_line2, color='navy', linewidth=2, linestyle='--')

r2 = np.corrcoef(valid2['pct_hispanic'], valid2['mean_support'])[0, 1]
ax2.text(0.05, 0.95, f'r = {r2:.3f}', transform=ax2.transAxes,
         fontsize=11, verticalalignment='top',
         bbox=dict(boxstyle='round', facecolor='white', alpha=0.7))

ax2.set_xlabel('% Hispanic/Latino Voters', fontsize=12)
ax2.set_ylabel('Mean Garza Support Score', fontsize=12)
ax2.set_title('Hispanic Population and Support Score\n(Bubble size = registered voters)',
              fontsize=13, fontweight='bold')
ax2.axhline(50, color='grey', linestyle=':', alpha=0.6)
ax2.spines['top'].set_visible(False)
ax2.spines['right'].set_visible(False)

plt.suptitle('Demographic Correlates of Garza Support\nby County',
             fontsize=15, fontweight='bold')
plt.tight_layout()
plt.savefig('scatter_demographics.png', dpi=150, bbox_inches='tight')
plt.show()

⚠️ Common Pitfall: Ecological Correlation When you compute county-level correlations between demographics and support scores, you are working with aggregate data — not individual voters. A strong positive correlation between % Hispanic and mean support score does NOT mean that Hispanic voters in these counties are more supportive than non-Hispanic voters. It might mean that counties with more Hispanic voters tend to be urban counties, and urban counties tend to have more Democratic voters regardless of race. This is the "ecological fallacy": drawing individual-level conclusions from aggregate data. Always be explicit about the level of analysis and what it does and does not tell you.
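The aggregation trap in the pitfall above can be demonstrated in a few lines. The data below is synthetic and constructed specifically to exhibit the fallacy (`df_sim` and its columns are illustrative, not part of the chapter's dataset): within every simulated county, Hispanic and non-Hispanic voters have identical mean support, yet the county-level correlation between % Hispanic and mean support is strongly positive because a shared "urbanness" factor drives both.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

rows = []
for county in range(20):
    urbanness = county / 19                       # 0 = rural, 1 = urban
    pct_hispanic = 0.1 + 0.5 * urbanness          # urban counties more Hispanic
    base_support = 40 + 25 * urbanness            # urban counties more supportive
    n = 500
    hispanic = rng.random(n) < pct_hispanic
    support = base_support + rng.normal(0, 5, n)  # NO individual-level race effect
    rows.append(pd.DataFrame({'county': county, 'hispanic': hispanic,
                              'support': support}))
df_sim = pd.concat(rows, ignore_index=True)

# County-level (ecological) correlation: strongly positive ...
county = df_sim.groupby('county').agg(pct_hisp=('hispanic', 'mean'),
                                      mean_support=('support', 'mean'))
r_ecological = np.corrcoef(county['pct_hisp'], county['mean_support'])[0, 1]

# ... yet the within-county Hispanic vs. non-Hispanic gap is ~0 by construction
within_gap = (
    df_sim.groupby('county')
    .apply(lambda g: g.loc[g['hispanic'], 'support'].mean()
                     - g.loc[~g['hispanic'], 'support'].mean())
    .mean()
)
```

Running this yields an ecological correlation near 1 alongside an individual-level gap near zero: exactly the divergence the ecological fallacy warns about.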


16.8 Heatmaps for Crosstabulation Visualization

Heatmaps display the values in a two-dimensional table using color intensity, making patterns in crosstabulations visible at a glance.

# Crosstab: mean support score by race/ethnicity and urban-rural category
pivot_support = df.pivot_table(
    values='support_score',
    index='race_ethnicity',
    columns='urban_rural',
    aggfunc='mean'
)

race_order = ['White NH', 'Hispanic/Latino', 'Black NH', 'Asian/other']
urban_order = ['Urban', 'Suburban', 'Exurban', 'Rural']
pivot_support = pivot_support.reindex(race_order)[urban_order]

# Crosstab: mean persuadability score (second panel)
pivot_persuad = df.pivot_table(
    values='persuadability_score',
    index='race_ethnicity',
    columns='urban_rural',
    aggfunc='mean'
).reindex(race_order)[urban_order]

# Count crosstab for annotation
pivot_count = df.groupby(['race_ethnicity', 'urban_rural']).size().unstack(fill_value=0)
pivot_count = pivot_count.reindex(race_order)[urban_order]

fig, axes = plt.subplots(1, 2, figsize=(16, 6))

# Helper function for annotated heatmap
def annotated_heatmap(data, count_data, ax, cmap, title, vmin, vmax, fmt='.1f'):
    sns.heatmap(
        data,
        ax=ax,
        cmap=cmap,
        vmin=vmin, vmax=vmax,
        annot=False,  # We'll add custom annotations
        linewidths=0.5,
        linecolor='white',
        cbar_kws={'label': 'Score (0-100)'},
    )
    # Custom annotations showing score + count
    for i in range(data.shape[0]):
        for j in range(data.shape[1]):
            val = data.iloc[i, j]
            n = count_data.iloc[i, j]
            if not np.isnan(val):
                ax.text(j + 0.5, i + 0.5, f'{val:{fmt}}\n(n={n:,})',
                        ha='center', va='center', fontsize=9,
                        color='black' if val < 65 else 'white')
    ax.set_title(title, fontsize=13, fontweight='bold')
    ax.set_xlabel('Urban-Rural Category', fontsize=11)
    ax.set_ylabel('Race/Ethnicity', fontsize=11)
    ax.set_xticklabels(urban_order, rotation=15, fontsize=10)
    ax.set_yticklabels(race_order, rotation=0, fontsize=10)

annotated_heatmap(
    pivot_support, pivot_count, axes[0],
    cmap='Blues',
    title='Mean Garza Support Score\nby Race/Ethnicity and Urban-Rural',
    vmin=20, vmax=90
)

annotated_heatmap(
    pivot_persuad, pivot_count, axes[1],
    cmap='Oranges',
    title='Mean Persuadability Score\nby Race/Ethnicity and Urban-Rural',
    vmin=20, vmax=80
)

plt.suptitle('Support and Persuadability Heatmaps: Garza-Whitfield Race',
             fontsize=14, fontweight='bold', y=1.01)
plt.tight_layout()
plt.savefig('heatmaps_crosstab.png', dpi=150, bbox_inches='tight')
plt.show()

📊 Real-World Application: What Nadia Reads in the Heatmap When Nadia presents this heatmap to the campaign team, she focuses on two cells: suburban Hispanic voters (moderately high support, moderate persuadability) and rural Hispanic voters (lower support, higher persuadability). The persuadability story is actionable: rural Hispanic communities are genuinely in play, but they have lower support than their urban counterparts, suggesting they may be more responsive to Whitfield's economic populism. This means the message to rural Hispanic voters needs to be different from the message to urban Hispanic voters — a finding that the summary statistics in the county-level aggregates would have missed.


16.8.1 Beyond the Two-Dimensional Grid: Multi-Dimensional Crosstabulation

The two-dimensional heatmap shown in section 16.8 covers race/ethnicity and urban-rural. But political data frequently has more than two dimensions of analytical interest — age, income, and education all interact with race and geography in ways that a two-dimensional heatmap cannot capture.

Several approaches extend the heatmap concept to higher dimensionality:

Faceted heatmaps: A small-multiples grid of heatmaps, each showing the same two-dimensional crosstab for a different value of a third variable. A 2×3 grid of heatmaps (race × urban-rural, faceted by income bracket) shows how the basic pattern changes across income groups. This requires more screen real estate but preserves the intuitive heatmap format.

3D surface plots: Rarely used in political analytics because they are perceptually difficult to read and almost always inferior to a well-designed 2D equivalent. Avoid them.

Parallel coordinates: A technique that shows each data point as a line connecting its values on multiple parallel axes, enabling visualization of multivariate patterns. Works well for 5–10 variables across 50–500 data points. Can reveal cluster structure that scatter plots miss. Rarely used in campaign analytics but valuable for exploratory analysis of voter file segments.

Dimensionality reduction visualization (UMAP, t-SNE): For very high-dimensional voter data (50+ attributes per voter), dimensionality reduction techniques can project the data into 2D space while approximately preserving the neighborhood structure. The resulting scatter plot — where voters near each other in 2D space have similar attribute profiles — can reveal segment structure that traditional crosstabulation misses. This is a more advanced technique introduced in later chapters.

For Nadia's immediate campaign needs, the 2D heatmap covering the most analytically important dimensions (race × urban-rural, and the income extension in the exercises) provides sufficient granularity for the mobilization and persuasion targeting decisions she faces. The more advanced multi-dimensional approaches are better suited for the exploratory phase of voter file analysis at the beginning of a campaign cycle, when the goal is to discover segment structure rather than track known variables.
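The faceted-heatmap approach can be sketched with plain matplotlib (seaborn's heatmap works the same way). Everything below is synthetic — the income brackets and random crosstab values exist only to show the mechanics, in particular the shared color scale across facets:

```python
import matplotlib
matplotlib.use('Agg')  # headless backend
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)

race_order = ['White NH', 'Hispanic/Latino', 'Black NH', 'Asian/other']
urban_order = ['Urban', 'Suburban', 'Exurban', 'Rural']
income_brackets = ['<$50k', '$50k-100k', '>$100k']  # hypothetical facet variable

# One hypothetical race x urban-rural support crosstab per income bracket
panels = {b: 40 + 30 * rng.random((4, 4)) for b in income_brackets}

# One shared color scale across ALL facets -- the key small-multiples rule
vmin = min(p.min() for p in panels.values())
vmax = max(p.max() for p in panels.values())

fig, axes = plt.subplots(1, 3, figsize=(15, 4), sharey=True)
for ax, bracket in zip(axes, income_brackets):
    im = ax.imshow(panels[bracket], cmap='Blues', vmin=vmin, vmax=vmax)
    ax.set_title(bracket)
    ax.set_xticks(range(4), urban_order, rotation=15, fontsize=8)
    ax.set_yticks(range(4), race_order, fontsize=8)
fig.colorbar(im, ax=list(axes), label='Mean Support Score', shrink=0.8)
plt.close(fig)
```

Because every panel uses the same vmin/vmax, the same shade of blue means the same support score in every facet — the property that makes cross-panel comparison valid.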


16.9 Interactive Visualizations with Plotly

Interactive visualizations allow users to explore data themselves: hover for values, zoom into geographic areas, filter by group. Plotly Express is the fastest path to interactive political visualizations in Python.

16.9.1 Interactive Choropleth

import plotly.express as px
import plotly.graph_objects as go
import json

# Load GeoJSON for plotly
with open('state_counties.geojson', 'r') as f:
    counties_geo = json.load(f)

# Interactive choropleth: support score
fig_choropleth = px.choropleth(
    county_stats,
    geojson=counties_geo,
    locations='county',
    featureidkey='properties.county_name',
    color='mean_support',
    color_continuous_scale='RdBu',
    range_color=[30, 70],
    hover_name='county',
    hover_data={
        'mean_support': ':.1f',
        'total_voters': ':,',
        'pct_dem': ':.1f',
        'pct_rep': ':.1f',
        'pct_hispanic': ':.1f',
        'mean_persuadability': ':.1f',
    },
    title='Garza Support Score by County (Interactive)',
    labels={
        'mean_support': 'Support Score',
        'total_voters': 'Total Voters',
        'pct_dem': '% Democrat',
        'pct_rep': '% Republican',
        'pct_hispanic': '% Hispanic',
        'mean_persuadability': 'Persuadability',
    }
)

fig_choropleth.update_geos(
    fitbounds='locations',
    visible=False,
)
fig_choropleth.update_layout(
    height=600,
    coloraxis_colorbar=dict(
        title='Support Score<br>(30=Strong R, 70=Strong D)',
        tickvals=[30, 40, 50, 60, 70],
        ticktext=['30', '40', '50<br>(Toss-up)', '60', '70'],
    )
)
fig_choropleth.write_html('interactive_choropleth.html')
fig_choropleth.show()

16.9.2 Interactive Scatter with Dropdown

# Interactive scatter: choose which demographic on x-axis
fig_scatter = go.Figure()

demographics = {
    'pct_college': '% College Educated',
    'pct_hispanic': '% Hispanic/Latino',
    'pct_black': '% Black',
    'pct_urban': '% Urban Voters',
}

# Add a trace for each demographic
for i, (col, label) in enumerate(demographics.items()):
    visible = (i == 0)  # Only first trace visible by default
    fig_scatter.add_trace(go.Scatter(
        x=county_stats[col],
        y=county_stats['mean_support'],
        mode='markers',
        name=label,
        visible=visible,
        marker=dict(
            size=np.sqrt(county_stats['total_voters'] / 100),
            color=county_stats['reg_margin'],
            colorscale='RdBu',
            cmin=-40, cmax=40,
            colorbar=dict(title='Reg. Margin<br>(D-R)'),
            showscale=True,
            opacity=0.7,
            line=dict(width=0.5, color='grey'),
        ),
        text=county_stats['county'],
        hovertemplate=(
            '<b>%{text}</b><br>'
            f'{label}: %{{x:.1f}}%<br>'
            'Support Score: %{y:.1f}<br>'
            'Total Voters: %{customdata[0]:,}<br>'
            '<extra></extra>'
        ),
        customdata=county_stats[['total_voters']].values,
    ))

# Dropdown menu
buttons = []
for i, (col, label) in enumerate(demographics.items()):
    visibility = [j == i for j in range(len(demographics))]
    buttons.append(dict(
        label=label,
        method='update',
        args=[
            {'visible': visibility},
            # Dotted key updates only the axis title, preserving other axis settings
            {'xaxis.title.text': label + ' (%)'},
        ]
    ))

fig_scatter.update_layout(
    updatemenus=[dict(
        active=0,
        buttons=buttons,
        direction='down',
        showactive=True,
        x=0.02, y=0.98,
        xanchor='left', yanchor='top',
    )],
    title='Demographic Correlates of Garza Support (Interactive)',
    xaxis_title=list(demographics.values())[0] + ' (%)',
    yaxis_title='Mean Garza Support Score',
    height=600,
)

# Add 50% reference line
fig_scatter.add_hline(y=50, line_dash='dash', line_color='grey',
                       annotation_text='Toss-up (50)')

fig_scatter.write_html('interactive_scatter.html')
fig_scatter.show()

🧪 Try This: Extend the Interactive Visualization Add a second dropdown that controls which variable is shown on the y-axis (support score, persuadability score, 2022 turnout rate, or 2020 turnout rate). Plotly's update_layout accepts a list of updatemenus, so a second, independent dropdown can sit alongside the first. The result is a four-variable exploratory tool that lets campaign staff ask their own questions of the data without needing to write code.


16.9.3 When to Use Interactive vs. Static Visualization

Interactive visualizations are powerful, but they are not always the right choice. The decision should be driven by the audience and the context:

Use interactive visualizations when:

- The audience will explore the data themselves (not just receive a pre-constructed argument)
- Multiple questions need to be answered from the same dataset and you can't anticipate which ones
- The data has multiple dimensions and you want to let users slice by any of them
- The visualization will be distributed digitally (in a web browser or campaign dashboard)
- Geographic exploration (zooming into specific counties) is a core use case

Use static visualizations when:

- The audience is receiving a presentation and won't be able to interact
- The key message is specific and should not be obscured by exploratory complexity
- The visualization will be printed, included in a PDF, or embedded in a slide deck
- The design needs to be precisely controlled and reproducible

In the campaign context, Nadia uses static matplotlib visualizations for weekly campaign briefings (printed handouts, projected slides where interaction isn't possible) and interactive Plotly dashboards for the digital war room that field directors access on tablets and laptops throughout the day. The two formats serve different communicative purposes and should be maintained separately rather than trying to make one format do both jobs.


16.10 Putting It Together: Nadia's Campaign Dashboard

By Election Day minus 21, Nadia has assembled the four visualizations into a two-page campaign dashboard that field directors can actually use:

Page 1: Where We Are

- County choropleth: Garza support score (shaded) with bubble size = registered voters
- Sorted horizontal bar chart: counties ranked by expected net votes (total voters × (support score / 100 - Whitfield share estimate)) — the "opportunity counties"

Page 2: Where to Focus

- Heatmap: turnout propensity × support score (2×2 grid showing which cell is the best GOTV target: high support + low propensity)
- Time series: week-by-week tracking of modeled support in key counties, updated daily
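The "expected net votes" ranking behind Page 1 is a one-line pandas computation. A minimal sketch with hypothetical county numbers (the counties, the `whitfield_share` column, and all values below are illustrative, not from the chapter's dataset):

```python
import pandas as pd

# Hypothetical county aggregates (illustrative values)
county_stats_sim = pd.DataFrame({
    'county': ['Adams', 'Baker', 'Clark'],
    'total_voters': [40_000, 150_000, 80_000],
    'mean_support': [58.0, 51.0, 44.0],     # Garza support score, 0-100
    'whitfield_share': [0.45, 0.49, 0.53],  # estimated opponent share (assumed column)
})

# Expected net votes = total voters x (support/100 - opponent share)
county_stats_sim['expected_net_votes'] = (
    county_stats_sim['total_voters']
    * (county_stats_sim['mean_support'] / 100
       - county_stats_sim['whitfield_share'])
)

# "Opportunity counties": ranked highest to lowest
opportunity = county_stats_sim.sort_values('expected_net_votes', ascending=False)
```

Note how the small, high-support county (Adams) outranks the much larger near-even county (Baker) — the formula rewards margin and size together, which is exactly what a percentage-only view hides.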

When Jake looks at the finished dashboard, he says what he hadn't said when looking at her raw numbers: "Now I can see it."

That is the point of visualization. The data was always there. The structure was always in it. The map makes it visible.

Best Practice: Design for Your Audience Nadia's campaign dashboard is designed for field directors, not for statisticians. It avoids confidence intervals (which field staff find confusing), uses simple color conventions (blue = good for us), and shows actionable outputs (ranked county list) rather than analytical outputs (model coefficients). Different audiences need different visualizations of the same underlying analysis. Always ask: who is reading this, and what decision does it need to support?


16.11 Avoiding Misleading Visualizations

Political data visualization is particularly susceptible to manipulation — intentional or accidental. Common misleading patterns:

Truncated axes: Bar charts that start at something other than zero make small differences look large. Always start bar charts at zero (or clearly label non-zero baselines). Line charts can legitimately show a narrower range, but label it clearly.

Cherry-picked time windows: A time series that starts just after an opponent's high point and ends just after your candidate's high point will always show a favorable trend. Show the full relevant time window.

Misleading map projections: Standard Mercator projections inflate the apparent size of high-latitude areas (making rural western/northern areas appear larger than they are). Consider equal-area projections (Albers, Lambert) for choropleth maps.

Non-comparable denominators: Vote share (% of votes cast) and vote margin (absolute votes) tell different stories. A county where your candidate won 70% of 10,000 votes contributed 4,000 net votes; a county where they won 51% of 200,000 votes contributed the same 4,000 net votes. Showing only the percentage obscures the importance of turnout in the second county.

Misleading color scales: Non-linear color scales, where the relationship between data value and color is not proportional, can dramatically distort visual impressions. Use perceptually uniform sequential scales (viridis, cividis) for quantitative data, and perceptually balanced diverging scales (such as RdBu) only when the data has a meaningful midpoint.
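The truncated-axis pattern at the top of this list is easy to demonstrate. The sketch below plots the same hypothetical 51-49 race twice and computes the apparent bar-height ratio each axis implies (candidate names and shares are illustrative):

```python
import matplotlib
matplotlib.use('Agg')  # headless backend
import matplotlib.pyplot as plt

# Two candidates separated by 2 points (hypothetical values)
candidates = ['Garza', 'Whitfield']
shares = [51.0, 49.0]

fig, (ax_bad, ax_good) = plt.subplots(1, 2, figsize=(10, 4))

# Misleading: a baseline at 48 makes a 2-point gap look like a 3:1 blowout
ax_bad.bar(candidates, shares, color=['#2166ac', '#d6604d'])
ax_bad.set_ylim(48, 52)
ax_bad.set_title('Truncated axis (misleading)')

# Honest: a zero baseline shows the gap in proportion
ax_good.bar(candidates, shares, color=['#2166ac', '#d6604d'])
ax_good.set_ylim(0, 60)
ax_good.set_title('Zero baseline (honest)')

# Apparent bar-height ratio under each axis
ratio_bad = (shares[0] - 48) / (shares[1] - 48)   # visible height above baseline
ratio_good = shares[0] / shares[1]
plt.close(fig)
```

Under the truncated axis the leading bar appears three times taller than the trailing one; under the zero baseline the visual ratio matches the actual 51:49 split.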

⚖️ Ethical Analysis: Visualization and Democratic Communication Political data visualization is not politically neutral. Maps that show "a sea of red" covering most of the geographic United States (while Democrats win most votes) can influence public perceptions of political legitimacy. Pollsters who cherry-pick favorable visualizations of their clients' data may be misleading journalists and donors. The same underlying data can tell radically different stories depending on visualization choices. Analysts have an obligation to represent data honestly, to show uncertainty, and to choose visualization parameters that serve understanding rather than advocacy. This is especially important in political contexts where visualizations can affect public confidence in electoral outcomes.


16.12 Design Principles in Depth: Making Political Data Legible

16.12.1 The Data-Ink Ratio

Edward Tufte's concept of the "data-ink ratio" captures an essential principle: the proportion of a graphic's ink that is devoted to actual data, as opposed to non-data decoration. A choropleth map that fills most of its area with shaded geographic regions has a high data-ink ratio; a three-dimensional bar chart with drop shadows, gradient fills, and decorative grid lines has a low ratio.

In political data visualization, the temptation to decorate is strong because the outputs are often shown to non-technical audiences in presentations and media contexts where visual sophistication is valued aesthetically. Campaign consultants sometimes push for visualizations that look impressive rather than communicate precisely. Analysts should resist: the most impressive visualization is the one that transmits the maximum analytical information with the minimum visual noise.

Practical applications for political choropleth maps:

- Remove the background frame (set ax.axis('off')) for maps that don't need lat/lon reference
- Remove all gridlines — no analytical information is lost and visual clutter is reduced
- Use thin, light borders between geographic units (white at 0.4–0.5 linewidth) rather than thick black borders that dominate the visual
- Position colorbars horizontally at the bottom rather than vertically to the side when vertical space is available — horizontal placement is perceptually easier to read for sequential scales
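These cleanup steps can be sketched in a few lines. The panel below uses random data as a stand-in for a map (with geopandas, the imshow call would be replaced by something like `county_gdf.plot(ax=ax, edgecolor='white', linewidth=0.4)` — `county_gdf` is a hypothetical GeoDataFrame, not defined in this chapter's code):

```python
import matplotlib
matplotlib.use('Agg')  # headless backend
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots(figsize=(6, 5))

# Stand-in "map" panel built from random data (illustrative only)
data = np.random.default_rng(2).random((10, 10))
im = ax.imshow(data, cmap='Blues')

# Data-ink cleanup: drop the frame, ticks, and gridlines in one call
ax.axis('off')

# Horizontal colorbar below the panel -- easier to read for sequential scales
cbar = fig.colorbar(im, ax=ax, orientation='horizontal',
                    fraction=0.046, pad=0.06, label='Support Score')
plt.close(fig)
```

The result devotes nearly all of the figure's ink to the shaded regions and the scale that decodes them — a high data-ink ratio in Tufte's sense.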

16.12.2 Small Multiples and Faceting

Small multiples — a grid of small identical charts, each showing a different sub-group or time period — are one of the most powerful techniques in the political data visualization toolkit. They enable viewers to compare patterns across groups by holding the visual format constant and varying only the data.

For political data, the most common small multiples applications are:

- A grid of county-level choropleths, one per election cycle, showing partisan change over time
- A grid of demographic bar charts, one per state or region, showing how the same demographic pattern varies geographically
- A grid of scatter plots, one per urban-rural category, showing whether a demographic correlation holds within each geographic type

In Python, matplotlib's subplots() function handles small multiples directly; Plotly's make_subplots() does the same for interactive versions; and seaborn's FacetGrid provides a high-level API for faceted statistical plots.

The key discipline in small multiples is keeping the scale consistent across all panels. If each panel in a choropleth grid has its own color scale, viewers will compare colors across panels and draw false conclusions — a dark blue in Panel 1 (2012) may represent 55% Democratic while a dark blue in Panel 3 (2020) represents 65% Democratic, but they look the same. Always fix vmin and vmax across all panels in a small-multiples display.

Best Practice: Fixed Scales in Small Multiples When building small multiples of choropleths or bar charts across time or groups, always set the axis scale to the range of the entire dataset, not each panel individually. This is the single most common error in comparative political visualization. geopandas.plot() does not fix scales across subplots by default — you must explicitly set vmin and vmax for each panel.

16.12.3 Annotation as Analysis

The best political data visualizations annotate key data points, trends, or reference lines that carry analytical meaning. Annotation transforms a visualization from a display of data into an argument about what the data means.

Effective annotation in political charts:

- Reference lines at analytically meaningful thresholds (50% = toss-up in a support score chart; 0 = party parity in a registration margin chart)
- Labels on outlier counties or precincts that illustrate specific analytical points (the county with the largest turnout improvement, the district that flipped parties)
- Text annotations at inflection points in time series ("2020 presidential surge"; "Dobbs decision, June 2022")
- Shaded regions indicating context ("GOP wave year" shading over time series bars)

Annotation requires judgment: too little and the chart is a Rorschach test that readers interpret freely; too much and it becomes didactic, preventing the reader from making their own observations. The goal is to highlight what is analytically important without pre-empting all reader engagement.
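All four annotation techniques fit in one small chart. The sketch below uses a hypothetical weekly support-tracking series; the event label ("Debate, week 4") and the shaded window are invented for illustration:

```python
import matplotlib
matplotlib.use('Agg')  # headless backend
import matplotlib.pyplot as plt

# Hypothetical weekly modeled-support series (illustrative values)
weeks = list(range(1, 9))
support = [47.5, 48.0, 47.8, 49.5, 51.2, 50.8, 51.5, 52.0]

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(weeks, support, marker='o', color='#2166ac')

# Reference line at the analytically meaningful threshold
ax.axhline(50, color='grey', linestyle=':', alpha=0.6)
ax.text(weeks[-1], 50.1, 'Toss-up (50)', color='grey', fontsize=9,
        ha='right', va='bottom')

# Text annotation at the inflection point (hypothetical event)
ax.annotate('Debate, week 4', xy=(4, 49.5), xytext=(2.5, 51.5),
            arrowprops=dict(arrowstyle='->', color='grey'), fontsize=9)

# Shaded region indicating context (hypothetical early-voting window)
ax.axvspan(6, 8, color='gold', alpha=0.15)

ax.set_xlabel('Week')
ax.set_ylabel('Modeled Support')
plt.close(fig)
```

Each element points the reader at one analytically meaningful feature without drowning the data itself.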

16.12.4 Uncertainty Visualization in Political Data

Political analysts frequently present estimates — modeled support scores, predicted turnout probabilities, projected vote shares — that carry statistical uncertainty. Best practice is to make this uncertainty visible rather than hiding it behind point estimates.

Several techniques are available:

- Error bars: Simple ± error bars on bar charts, showing confidence intervals. Effective for small-sample subgroups where uncertainty is high.
- Shaded confidence bands: For time series, a shaded region around the trend line showing the 80% or 95% confidence interval. Used in most sophisticated polling averages.
- Multiple scenarios: Showing three scenarios (pessimistic/central/optimistic) rather than a single forecast. This is how many election forecasting models present their outputs.
- Transparent markers: Using lower opacity markers in scatter plots for data points with higher uncertainty (small samples, low-quality data) compared to higher-confidence points.

The challenge in campaign contexts is that decision-makers often find uncertainty displays confusing or anxiety-inducing. A field director who sees a confidence band that includes both victory and defeat may feel less actionable information than one who sees a single point estimate. The analyst's task is to present uncertainty in ways that inform rather than paralyze, which often means pairing uncertainty displays with explicit guidance on what decisions would change under different scenarios.
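The shaded-band technique takes a single fill_between call. The series and per-week standard errors below are hypothetical, chosen only to show the mechanics of a 95% confidence band around a trend line:

```python
import matplotlib
matplotlib.use('Agg')  # headless backend
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical polling-average series with a standard error per week
weeks = np.arange(1, 11)
estimate = 48 + 0.4 * weeks          # central trend (illustrative)
se = np.full_like(estimate, 2.0)     # assumed +/- 2-point standard error

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(weeks, estimate, color='#2166ac', linewidth=2, label='Modeled support')

# 95% confidence band: estimate +/- 1.96 * SE
lower = estimate - 1.96 * se
upper = estimate + 1.96 * se
ax.fill_between(weeks, lower, upper, color='#2166ac', alpha=0.15,
                label='95% CI')

ax.axhline(50, color='grey', linestyle=':', alpha=0.6)
ax.set_xlabel('Week')
ax.set_ylabel('Modeled Support')
ax.legend()
plt.close(fig)
```

Because the band crosses the 50-point reference line for most of the series, the chart communicates honestly that the race is within the model's margin of error even while the central trend moves up.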

⚠️ Common Pitfall: False Precision in Political Visualization Reporting modeled support scores to one decimal place (Garza support: 52.3) implies a precision that the underlying model almost certainly doesn't support. A model with standard error of ±3–5 points on individual county estimates should report county scores as "approximately 50–55" rather than "52.3." Political visualizations that display false precision — whether from rounding to many decimal places or from using a color scale that makes 0.5-point differences look dramatic — mislead stakeholders about the reliability of the underlying analysis.


16.13 The Intersection of Visualization and Data Justice

16.13.1 Who Is Made Visible and Who Is Erased

Data visualization is not a neutral act. The choice of what to visualize — which groups to disaggregate, which geographic units to display, which variables to include — determines who is made visible in the analysis and who is erased or aggregated away.

In electoral analytics, several common practices can erase important distinctions:

- Aggregating all "Hispanic/Latino" voters into a single category misses enormous heterogeneity — between Cuban Americans and Mexican Americans, between U.S.-born and recently naturalized citizens, between urban and rural communities. A single aggregate score for "Hispanic support" may obscure that Garza is running 65 points among urban Mexican American communities and 48 points among rural Cuban-heritage communities in the state's southern region.
- County-level maps of a state with large Native American reservation populations may show blank or "grey" areas for counties with small registered voter populations, making those communities visually absent from the electoral analysis.
- Turnout analyses that focus on registered voters omit the large populations of eligible but unregistered voters — people who may be systematically excluded from the analysis because they face structural barriers to registration.

The data justice perspective, developed by scholars like Catherine D'Ignazio and Lauren Klein in their book Data Feminism, asks who is represented in data, who is harmed by data analysis, and whose interests are served by particular analytical choices. For political analysts, this is not merely a philosophical question — it has direct practical implications for whom campaigns target, whom they ignore, and whose participation they treat as the baseline assumption.

16.13.2 Visualization and the Construction of the "Normal Electorate"

A subtle but important point about political visualization is that the data we visualize describes who voted in the past, not who could vote in the future. When Nadia builds a turnout model based on historical voting patterns and visualizes county-level turnout rates, she is visualizing a political equilibrium that was partly produced by which communities had access to the ballot, which had effective GOTV infrastructure, and which faced structural barriers that depressed participation.

Treating this historical participation pattern as the "baseline" or the "normal electorate" embeds past exclusions into future expectations. A county with a 35% voter registration rate among its Hispanic voting-age population is not a county where Hispanic people "just don't vote" — it may be a county where past organizational neglect, registration barriers, and community distrust have produced a suppressed baseline that a well-resourced mobilization program could change substantially.

The visualization implication: political data visualizations should distinguish between historical patterns and expected futures, and should be cautious about using historical participation rates as a forecast without examining the factors — structural, organizational, political — that produced them. Nadia's visualization of turnout propensity scores is most useful when she accompanies it with explicit commentary on which counties' low turnout reflects genuine low preference for participation versus structural suppression of a motivated-but-obstructed community.
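One way to operationalize this distinction in code is to flag counties where the registered population votes at high rates but the registration rate itself is low: high propensity among registrants combined with low registration is more consistent with structural barriers than with low preference. A minimal sketch, assuming hypothetical county names and columns ("reg_rate" as registered share of the eligible population, "propensity" as modeled turnout propensity among registrants) and illustrative thresholds:

```python
import pandas as pd

# Hypothetical county-level data; names, columns, and thresholds are
# illustrative assumptions, not real campaign data.
counties = pd.DataFrame({
    "county": ["Alder", "Birch", "Cedar", "Dunmore"],
    "reg_rate": [0.35, 0.82, 0.40, 0.78],
    "propensity": [0.70, 0.45, 0.30, 0.72],
})

# High propensity among those who DID register, paired with a low
# registration rate, suggests a motivated-but-obstructed community.
counties["flag"] = "baseline"
barrier = (counties["reg_rate"] < 0.5) & (counties["propensity"] > 0.6)
counties.loc[barrier, "flag"] = "possible structural barrier / mobilization opportunity"
print(counties[["county", "flag"]])
```

A flag like this is a prompt for the qualitative follow-up the text calls for, not a conclusion: it tells Jake's field team which counties to investigate, not what they will find.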

⚖️ Ethical Analysis: The Visualization of Suppression

If a county shows low turnout among Hispanic voters partly because of aggressive roll purges and limited Spanish-language materials at polling places, a visualization that simply displays "low turnout" as a static fact is making a political choice. It is treating suppression as natural rather than constructed. The most ethically complete political visualization of turnout data would distinguish between communities where low turnout reflects structural barriers (and therefore represents a mobilization opportunity) and communities where it reflects genuine low preference for participation. Making this distinction requires qualitative organizational knowledge that quantitative data alone cannot provide — which is why the Nadia-Jake partnership is analytically superior to either working alone.


16.14 Summary: The Analyst as Cartographer

Data visualization is, at its core, an act of cartography: you are creating maps of a complex terrain for people who need to navigate it. Like all maps, visualizations simplify. They highlight some features and obscure others. They encode assumptions in their construction that shape what readers see.

The political analyst's responsibility is to make those assumptions explicit, to match visualization to question, to design for the audience that needs to act on the information, and to represent uncertainty honestly rather than papering over it.

For Nadia, the choropleth map she showed Jake was not just a prettier version of the wall map. It was a different model — one that encoded population into bubble size, used a gradient rather than binary colors, and showed propensity alongside partisanship. Each of those encoding choices was an analytical decision as much as a design decision. Learning to make those decisions thoughtfully is the mark of a mature political data analyst.


Chapter Summary

  • Visualizations are models: every design choice encodes assumptions that shape what readers see. The map is not the territory.
  • Match chart type to the question: choropleths for geography, bar charts for comparison, scatter plots for relationships, heatmaps for crosstabulation, time series for trends.
  • Color choices matter: use diverging scales for data with a natural midpoint, sequential scales for one-directional data, and always consider colorblind accessibility.
  • Advertising effects decay rapidly (half-life ~1-2 weeks), implying spending should be concentrated close to Election Day.
  • Choropleth maps require careful geographic merging and attention to name-matching; always validate merge quality before generating maps.
  • The ecological fallacy: correlations at the county level do not imply the same correlations at the individual level. Always be explicit about level of analysis.
  • Interactive visualizations (Plotly) allow exploration by non-technical audiences and reduce the need for the analyst to anticipate every question in advance.
  • Misleading visualizations — truncated axes, cherry-picked windows, non-comparable denominators — are common in political contexts; analysts have an ethical obligation to represent data honestly.
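The merge-validation point in the summary deserves one concrete recipe. Passing `indicator=True` to `pandas.merge` adds a `_merge` column that exposes rows present on only one side, which is exactly how name-matching failures between a results table and a geometry table surface. The county names below are hypothetical, constructed to show a typical spelling mismatch:

```python
import pandas as pd

# Hypothetical results and geometry tables; "De Soto" vs. "DeSoto" is a
# typical name-matching failure between two county data sources.
results = pd.DataFrame({
    "county": ["Alder", "Birch", "De Soto"],
    "dem_share": [0.61, 0.44, 0.52],
})
geo = pd.DataFrame({
    "county": ["Alder", "Birch", "DeSoto"],
    "geometry_id": [1, 2, 3],
})

# An outer merge with indicator=True keeps unmatched rows from both sides
merged = results.merge(geo, on="county", how="outer", indicator=True)
unmatched = merged[merged["_merge"] != "both"]
print(unmatched[["county", "_merge"]])
```

Running this check before generating a choropleth turns a silent blank county on the map into an explicit list of names to reconcile.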

A Final Note on Visualization as Political Communication

Political data visualization does not exist in a neutral space. Maps of election results are used by campaigns, journalists, advocacy organizations, and foreign interference actors alike, sometimes for purposes far removed from honest communication of information. The same techniques that Nadia uses to help the Garza campaign understand its electoral geography are used by bad actors to create misleading impressions of political reality — maps designed to make election outcomes look illegitimate, visualizations crafted to suppress voter enthusiasm, graphics that misrepresent demographic trends to support ethnonationalist narratives.

This does not mean political data visualization is inherently corrupting. It means that the analyst's choices carry political weight that goes beyond the immediate campaign context. A visualization shared on social media has a life beyond the analyst's control. A choropleth published by a reputable news organization shapes how millions of people understand the political geography of their country. The same skills that make a good campaign data analyst — the ability to encode information into visual form, to make complex patterns legible, to create graphics that people trust and share — are skills with consequences that extend beyond any single election cycle.

The responsibility this creates is not to refuse to visualize political data, but to do so with full awareness of how visualizations function as political communication. Label your axes. Show your scale. Acknowledge what your aggregation choices do and don't capture. Be explicit about uncertainty. Don't truncate. These are not just best practices for technical correctness — they are ethical commitments to honest representation of data that bears on the health of democratic institutions. The map is not the territory, but the map shapes what people think the territory is. Build maps accordingly.


Next: Chapter 17 examines poll aggregation — how to combine multiple polls into more reliable estimates than any single poll can provide.