Learning Objectives
- Explain the fundamentals of geospatial data: projections, coordinate reference systems, GeoJSON, shapefiles
- Create choropleths with Plotly Express and with geopandas + matplotlib
- Create scatter maps (dot maps) with px.scatter_geo and px.scatter_mapbox
- Create static maps with geopandas and matplotlib for publication output
- Apply appropriate color scales to geographic data and avoid the population-proxy pitfall
- Select the appropriate map type: choropleth, dot map, hex bin, or cartogram
- Use Folium for interactive Leaflet maps embedded in notebooks and HTML
In This Chapter
- 23.1 Why Maps Are Different
- 23.2 Geospatial Fundamentals: Projections and CRS
- 23.3 GeoJSON and Shapefiles: The Data Formats
- 23.4 Choropleths with Plotly Express
- 23.5 Choropleths with geopandas and matplotlib
- 23.6 Dot Maps with Plotly
- 23.7 Altair Geospatial
- 23.8 Folium: Interactive Leaflet in Python
- 23.9 The Population Proxy Pitfall
- 23.10 Map Type Selection
- 23.11 Progressive Project: Climate in Space
- 23.12 Ethical Cartography
- 23.13 Performance Considerations
- 23.14 Installing geopandas and the Geospatial Stack
- 23.15 Comparing the Libraries
- 23.16 Common Mistakes and Their Fixes
- 23.17 Check Your Understanding
- 23.18 Chapter Summary
- 23.19 Spaced Review
Chapter 23: Geospatial Visualization — Maps, Choropleths, and Location Data
"The map is not the territory." — Alfred Korzybski, Science and Sanity (1933)
23.1 Why Maps Are Different
Every chart you have made so far has been a mapping from data to visual space — the x-axis represents one variable, the y-axis represents another, and the reader decodes the positions to understand the data. Maps are different. On a map, the x and y positions are not abstract quantities; they are actual physical locations on the surface of the earth. The reader brings enormous prior knowledge to a map: they know where countries are, they recognize coastlines, they have intuitions about distances and sizes. This prior knowledge is simultaneously a gift and a trap.
The gift is that maps are immediately legible. A viewer who has never seen your data can look at a map of the United States colored by some metric and instantly grasp the geographic pattern: this region is dark, that region is light, the coastal areas differ from the interior. No training is required, because most readers already know what the United States looks like. The same 50 state values shown as a bar chart might take half a minute to scan for a regional pattern; the map shows it in a couple of seconds.
The trap is that maps lie in predictable ways. The world is a sphere, and flattening it onto a two-dimensional page distorts something: areas, shapes, distances, or directions. The Mercator projection, the most common map projection on the internet, makes areas near the poles look enormous relative to equatorial areas: Greenland appears about as large as Africa, when Africa is roughly 14 times bigger. A reader who trusts the visual area of the map gets misled. Even an "honest" projection distorts something, because there is no way to flatten a sphere without distortion. The choice of projection is a choice about which distortions to accept.
Color choices on maps are similarly fraught. A map of US counties colored by total population looks impressive — dense urban counties light up, rural counties are dim — but the map is almost useless as an analytical tool because it just shows where people live. If you are studying anything that varies with population (crime counts, COVID cases, pizza restaurants per county), your map will mostly reflect the population, not the variable of interest. The fix is normalization: divide by population to get rates rather than counts. But the unnormalized map looks more compelling at first glance, and authors sometimes publish it anyway. The chapter's threshold concept — maps are arguments about space — means that every choice you make about projection, color, and normalization is a rhetorical choice, and your readers will interpret the map based on those choices whether you meant them to or not.
This chapter covers the main Python libraries for geospatial visualization and the design principles that should guide their use. The libraries are Plotly (for interactive maps), geopandas and matplotlib (for static, publication-quality maps), Altair (for declarative geospatial visualization), and Folium (for interactive Leaflet maps in the browser). Each has strengths and weaknesses, and most real geospatial projects use several of them together.
23.2 Geospatial Fundamentals: Projections and CRS
Before you can draw a map, you have to understand projections — the mathematical transformations that convert latitude and longitude (positions on a sphere) to x and y (positions on a flat page). Every projection has trade-offs, and the choice of projection is the most consequential design decision in any geospatial visualization.
The main categories of projection:
Cylindrical projections wrap the earth onto a cylinder and then unroll it. Mercator (1569) is the most famous. It preserves angles, so a course of constant compass bearing (a rhumb line) plots as a straight line, which is why navigators used it, but it dramatically distorts area away from the equator. A variant, Web Mercator, is what Google Maps, Bing Maps, and most web maps use, because it matches the way tile-based web mapping works. The flip side is that Mercator makes Greenland look the size of Africa (it is not), and it makes Russia look enormous (it is, but not that enormous).
Conic projections wrap the earth onto a cone that touches the sphere at one or two reference latitudes. Lambert conformal conic is common for mid-latitude regions (US, Europe) because it minimizes distortion in the latitude band of interest. This is what many national atlases use for country-level maps.
Azimuthal projections project the sphere onto a plane that touches it at a single point; directions from that center point are shown correctly. The orthographic projection looks like a photograph of the earth from space and is good for showing hemispheres or polar regions.
Equal-area projections (Mollweide, Eckert IV, Goode homolosine) preserve areas at the cost of shape. These are the projections you want for any map where the reader might be tempted to compare sizes: a world map of GDP, population, or emissions should use an equal-area projection so that comparisons are honest. Compromise projections such as Robinson and Winkel Tripel are not equal-area; they accept modest distortion of both area and shape in exchange for a pleasing overall appearance, and Robinson in particular keeps area distortion moderate enough to be a popular default for world maps.
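Coordinate Reference Systems (CRS) are the formal specifications that tie coordinates to places on the earth: a datum (the model of the earth's shape) plus, for projected systems, the projection, its units, and its origin. A CRS is usually identified by an EPSG code, an integer identifier from a registry originally compiled by the European Petroleum Survey Group and now maintained by IOGP (the International Association of Oil & Gas Producers), which lets you refer to a specific CRS unambiguously. Common codes: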
- EPSG:4326 — WGS84, the "raw" latitude/longitude coordinate system used by GPS.
- EPSG:3857 — Web Mercator, the projection used by Google Maps and most web mapping tiles.
- EPSG:3035 — ETRS89 Lambert azimuthal equal-area for Europe.
- EPSG:5070 — NAD83 Albers equal-area conic for the contiguous US.
Most geospatial libraries let you reproject between CRS with a single call. The general rule: start in EPSG:4326 (lat/lon), transform to an appropriate projected CRS for visualization, transform back if you need to join with other lat/lon data. Skipping reprojection and drawing lat/lon directly produces a "plate carrée" projection (equirectangular) that is usable for quick plots but not cartographically rigorous.
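To make the EPSG:4326 to EPSG:3857 step concrete, here is a minimal sketch of the spherical Web Mercator formulas in plain Python. It is illustrative only (the function names are made up for this example); in real work you would let geopandas's to_crs() or pyproj do the reprojection. The second function computes Mercator's area inflation, 1/cos²(latitude), which is where the oversized Greenland comes from.

```python
import math

# Web Mercator (EPSG:3857) treats the earth as a sphere with the WGS84
# semi-major axis as its radius, in meters.
EARTH_RADIUS_M = 6378137.0


def to_web_mercator(lon_deg: float, lat_deg: float) -> tuple[float, float]:
    """Project a WGS84 lon/lat pair (EPSG:4326) to Web Mercator x/y in meters."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


def mercator_area_inflation(lat_deg: float) -> float:
    """How much Mercator enlarges areas at a latitude, relative to the equator.

    The linear scale factor is 1/cos(lat); areas scale by its square.
    """
    return 1.0 / math.cos(math.radians(lat_deg)) ** 2


print(to_web_mercator(-122.33, 47.60))        # Seattle, in meters
print(round(mercator_area_inflation(0), 2))   # 1.0 at the equator
print(round(mercator_area_inflation(60), 2))  # 4.0 at 60 degrees north
print(round(mercator_area_inflation(72), 1))  # 10.5, roughly central Greenland
```

At 72°N every square kilometer is drawn about ten times larger than at the equator, which is why Greenland competes visually with Africa.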
23.3 GeoJSON and Shapefiles: The Data Formats
Geographic data has its own file formats, and you will encounter them regardless of which library you use. The two main formats:
GeoJSON is a JSON-based format that represents geographic features as a nested dictionary. A GeoJSON file has a type field ("FeatureCollection"), a features list, and each feature has a geometry (point, polygon, multipolygon, line, etc.) and properties (key-value metadata). GeoJSON is human-readable, versionable, and compact for small datasets. It is the lingua franca of web mapping — Plotly, Folium, Leaflet, and D3 all accept GeoJSON directly.
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "geometry": {"type": "Point", "coordinates": [-122.33, 47.60]},
      "properties": {"city": "Seattle", "population": 750000}
    }
  ]
}
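The FeatureCollection above is plain JSON, so Python's standard json module can read it. One detail trips people up: GeoJSON (RFC 7946) orders coordinates as [longitude, latitude], the reverse of the lat/lon order people usually say aloud. A minimal sketch, with the document embedded as a string for self-containment:

```python
import json

# The FeatureCollection from the text; with a file you would use json.load(f).
geojson_text = """
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "geometry": {"type": "Point", "coordinates": [-122.33, 47.60]},
      "properties": {"city": "Seattle", "population": 750000}
    }
  ]
}
"""

collection = json.loads(geojson_text)
for feature in collection["features"]:
    # GeoJSON coordinates are [longitude, latitude], not lat/lon.
    lon, lat = feature["geometry"]["coordinates"]
    props = feature["properties"]
    print(f"{props['city']}: lat={lat}, lon={lon}, population={props['population']}")
```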
Shapefiles are an older format invented by Esri (the GIS company behind ArcGIS) in the 1990s. A shapefile is actually a set of files that travel together: a .shp with the geometry, a .shx with the index, a .dbf with the attribute table, and optionally .prj with the projection and .cpg with the character encoding. Shapefiles are the standard format for traditional GIS, and most government geographic data is distributed as shapefiles. They are binary, bulkier than GeoJSON, and annoying to version (you have to track multiple files), but they are universally supported.
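Because a shapefile is really several files, a common failure is a download or copy that lost one of them. The helper below is hypothetical (it is not part of geopandas); it uses only pathlib to report which companion files sit next to a .shp file, treating .shp, .shx, and .dbf as required and .prj and .cpg as optional, as described above.

```python
import tempfile
from pathlib import Path

REQUIRED = (".shp", ".shx", ".dbf")  # geometry, index, attribute table
OPTIONAL = (".prj", ".cpg")          # projection, character encoding


def shapefile_parts(shp_path: str) -> dict[str, list[str]]:
    """Report which companion files exist next to a .shp file."""
    shp = Path(shp_path)
    # with_name(stem + ext) keeps multi-dot names like "cb_2018.us.shp" intact.
    present = [ext for ext in REQUIRED + OPTIONAL
               if shp.with_name(shp.stem + ext).exists()]
    missing = [ext for ext in REQUIRED if ext not in present]
    return {"present": present, "missing": missing}


# Usage: simulate a download that arrived without its .shx index file.
with tempfile.TemporaryDirectory() as tmp:
    for ext in (".shp", ".dbf", ".prj"):
        Path(tmp, "counties" + ext).touch()
    report = shapefile_parts(str(Path(tmp, "counties.shp")))

print(report)  # {'present': ['.shp', '.dbf', '.prj'], 'missing': ['.shx']}
```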
Python reads both formats through the same libraries. geopandas (discussed in Section 23.5) reads shapefiles, GeoJSON, GeoPackages, KML, and several other formats with a single gpd.read_file() call. Plotly's geospatial functions accept GeoJSON dicts or URLs directly. Altair accepts GeoJSON or TopoJSON (a compact extension of GeoJSON that stores shared boundaries only once).
For practical purposes, you will typically download geographic data from one of these sources:
- Natural Earth (naturalearthdata.com) — free public-domain geographic data for countries, states, rivers, lakes, physical features. Available at three scales (1:10m, 1:50m, 1:110m) for different zoom levels. This is the gold standard for country and state boundaries in any map that does not need extreme precision.
- US Census TIGER — detailed US geography: states, counties, census tracts, blocks, roads. Free and authoritative for anything US-related.
- OpenStreetMap — the largest crowdsourced geographic database. Excellent for streets, buildings, points of interest, and anything requiring street-level detail.
- GADM — global administrative boundaries at several levels (country, state/province, county/district). Useful for non-US multi-country analyses.
- Plotly's built-in data: px.data.gapminder() (with ISO country codes), px.data.election() with its companion px.data.election_geojson(), and others ship with pre-packaged data for common tutorials.
23.4 Choropleths with Plotly Express
The simplest choropleth in Python is px.choropleth. You pass a DataFrame with a location column (ISO 3-letter country codes by default, US state codes with locationmode="USA-states", or feature IDs from a GeoJSON you supply) and a color column, and Plotly does the rest.
import plotly.express as px
gapminder = px.data.gapminder().query("year == 2007")
fig = px.choropleth(
gapminder,
locations="iso_alpha", # ISO 3-letter country codes
color="lifeExp",
hover_name="country",
color_continuous_scale="viridis",
projection="natural earth",
title="Life Expectancy, 2007",
)
fig.show()
This produces an interactive world map colored by life expectancy. Hovering over a country shows its name and value; clicking and dragging pans; scrolling zooms. The projection argument picks the projection: "natural earth" is the Natural Earth projection, a compromise pseudocylindrical projection similar in appearance to Robinson and a good default for world maps. Other options include "mercator", "equirectangular", "orthographic", "robinson", "eckert4", "azimuthal equal area", and many more.
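Plotly's px.choropleth has built-in boundaries for countries and US states. For custom geographies (counties, districts, neighborhoods), supply your own GeoJSON through the geojson argument, either to px.choropleth itself or to px.choropleth_mapbox, which draws the regions over a tile base map: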
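import json
import plotly.express as px

# Hypothetical inputs: us_counties.geojson must give each feature an "id" equal
# to the county FIPS code, and covid_county_df must have a matching "fips"
# column (5-character strings such as "06037") plus a "cases_per_100k" column.
with open("us_counties.geojson") as f:
    counties = json.load(f)

fig = px.choropleth_mapbox(
    covid_county_df,
    geojson=counties,
    locations="fips",  # matched against each feature's "id" by default
    color="cases_per_100k",
    color_continuous_scale="reds",
    mapbox_style="carto-positron",
    zoom=3,
    center={"lat": 37.8, "lon": -96},
    opacity=0.6,
    title="COVID cases per 100k, by county",
)
fig.show()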
The mapbox_style argument picks the underlying base map (a tile layer). "carto-positron" is a clean light-gray style that lets the choropleth colors stand out. Other options that need no token include "carto-darkmatter", "open-street-map", "stamen-terrain", and "white-bg". Mapbox's own styles (such as "basic", "streets", "outdoors", "light", "dark", and "satellite") require a free Mapbox access token, which you can get at mapbox.com.
A choropleth map is defined by three design choices: (1) the geography (what boundaries to draw), (2) the variable (what to color by), and (3) the color scale (how to map values to colors). Each choice is consequential, and skipping any of them leads to sloppy maps.
For the color scale, the same rules from Chapter 3 apply:
- Use sequential palettes for data that goes from low to high (population, income, temperature).
- Use diverging palettes for data that has a meaningful midpoint (deviation from a baseline, change over time, election margins).
- Use categorical palettes for unordered classes (zoning categories, soil types).
- Avoid rainbow/jet for quantitative data. Viridis and its variants are the safe default.
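For diverging data, the palette's neutral middle color should sit on the meaningful midpoint, which means making the color range symmetric around it. The small helper below is hypothetical (not a library function) and just shows the arithmetic. Plotly Express offers the same behavior through its color_continuous_midpoint argument (or explicit limits via range_color), and matplotlib through matplotlib.colors.CenteredNorm.

```python
def diverging_range(values, midpoint=0.0):
    """(low, high) color limits placed symmetrically around `midpoint`.

    With symmetric limits, the palette's neutral middle color lands exactly on
    the midpoint, and equal deviations on either side get equally strong colors.
    """
    largest_deviation = max(abs(v - midpoint) for v in values)
    return midpoint - largest_deviation, midpoint + largest_deviation


# Made-up election margins in percentage points (positive = party A ahead).
margins = [-12.5, -3.0, 1.5, 4.0, 20.0]
print(diverging_range(margins))  # (-20.0, 20.0)
```

Without this, a plain min-to-max range (-12.5 to 20) would put the neutral color at +3.75, so narrow party-A leads would be drawn as if they were ties.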
23.5 Choropleths with geopandas and matplotlib
For static publication-quality maps, the canonical Python stack is geopandas + matplotlib. geopandas extends pandas with geospatial types — a GeoDataFrame is a DataFrame with a geometry column — and provides read/write for all the common formats, reprojection, geometric operations (union, intersection, buffer), and basic plotting.
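import geopandas as gpd
import matplotlib.pyplot as plt

# gpd.datasets.get_path() was deprecated in geopandas 0.14 and removed in 1.0.
# On newer versions, download Natural Earth's 1:110m "Admin 0 - Countries"
# file from naturalearthdata.com, read it with gpd.read_file(), and adjust
# the column names used below to match that file.
world = gpd.read_file(gpd.datasets.get_path("naturalearth_lowres"))

fig, ax = plt.subplots(figsize=(15, 8))
world.plot(
    column="gdp_md_est",  # estimated GDP, millions of USD
    cmap="viridis",
    legend=True,  # numeric column, so the legend is a colorbar
    ax=ax,
)
ax.set_title("World GDP (millions USD)", fontsize=16)
ax.set_axis_off()
plt.show()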
This loads a built-in Natural Earth world map, plots each country colored by GDP, and adds a colorbar. The result is a static image suitable for inclusion in a LaTeX document or a printed report. The styling follows matplotlib conventions — ax.set_axis_off() removes the axis box (which is not meaningful for a map), legend=True adds a colorbar, cmap picks the palette.
For custom styling and layout, you have full matplotlib access: annotations, multiple axes, custom colorbars, titles, spines. GeoDataFrame.plot() works like DataFrame.plot(): it draws onto a matplotlib Axes that you can keep customizing afterward. A more customized example:
# Continues from the previous example (reuses world and plt).
fig, ax = plt.subplots(figsize=(15, 8), dpi=150)

# Background: all countries in light gray
world.plot(ax=ax, color="#eeeeee", edgecolor="#cccccc", linewidth=0.5)

# Foreground: countries with valid GDP data colored
valid = world[world["gdp_md_est"] > 0]
valid.plot(
    column="gdp_md_est",
    cmap="viridis",
    legend=True,
    ax=ax,
    edgecolor="white",
    linewidth=0.3,
    legend_kwds={
        "label": "GDP (millions USD)",
        "orientation": "horizontal",
        "shrink": 0.5,
    },
)

# Crop the view. The data is still in lat/lon degrees (plate carrée),
# so these limits only trim the map; they do not change the projection.
ax.set_xlim(-180, 180)
ax.set_ylim(-60, 85)
ax.set_title("Global GDP (Natural Earth data, unprojected lat/lon)", fontsize=14, pad=20)
ax.set_axis_off()
plt.tight_layout()
plt.savefig("gdp_world.png", dpi=200, bbox_inches="tight")
The axis limits above only crop the map; they do not change its projection. To draw a true Robinson projection, reproject with to_crs() before plotting:
world_robinson = world.to_crs("ESRI:54030") # Robinson projection
world_robinson.plot(...)
The ESRI:54030 code is Robinson. After reprojection, the coordinates are in meters, not degrees, so degree-based limits such as set_xlim(-180, 180) no longer apply; any axis limits must be given in meters (millions of meters, typically).
geopandas + matplotlib is the most powerful option for static maps. It gives you full control over every visual element, it integrates with the scientific Python stack, and it produces print-quality output. The trade-off is that it is verbose compared to Plotly Express — a custom static map can be 30–50 lines of Python vs. Plotly's 5–10. For interactive or quick-turnaround work, Plotly is faster; for publication-quality print, geopandas + matplotlib is better.
23.6 Dot Maps with Plotly
A dot map shows individual points on a map rather than aggregated regions. Each point represents one observation: a city, a hospital, a weather station, a crime incident. Dot maps are best for location data where the individual positions are the story — "where are our customers?" or "where did earthquakes occur last year?" or "where are the climate monitoring stations?"
Plotly has two dot map functions:
px.scatter_geo uses the same geographic axes as px.choropleth:
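# airports: a hypothetical DataFrame with latitude, longitude,
# airport_name, passengers, and country columns
fig = px.scatter_geo(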
airports,
lat="latitude",
lon="longitude",
hover_name="airport_name",
size="passengers",
color="country",
projection="natural earth",
)
px.scatter_mapbox uses Mapbox tiles for the base map and requires lat/lon coordinates:
fig = px.scatter_mapbox(
airports,
lat="latitude",
lon="longitude",
hover_name="airport_name",
size="passengers",
color="country",
zoom=3,
mapbox_style="carto-positron",
)
The distinction matters: scatter_geo uses a static geographic projection (good for world maps), while scatter_mapbox uses dynamic web tiles (good for zoomable local or regional maps). For a map of "global airports," use scatter_geo with projection="robinson" or similar. For a map of "Seattle coffee shops," use scatter_mapbox with a high zoom level.
Dot map design decisions:
- Size encoding. Bubble size for numeric magnitude (population, revenue). Be careful with size ranges — very small dots disappear, very large dots obscure neighbors.
- Color encoding. Categorical colors for categories; sequential colormap for continuous values.
- Overlap. Dot maps with many points suffer from overplotting the same way scatter plots do. Options: lower alpha (opacity=0.3), use scatter_mapbox with clustering (some libraries support cluster markers), or aggregate into hex bins or choropleths.
- Projection for dot maps. Equal-area projections are better for world-scale dot maps because they do not inflate the visual importance of high-latitude points.
23.7 Altair Geospatial
Altair supports geospatial visualization through the mark_geoshape() mark and topojson data loading. The pattern is:
import altair as alt
from vega_datasets import data
import pandas as pd
states = alt.topo_feature(data.us_10m.url, "states")
unemp = pd.read_csv("https://vega.github.io/vega-datasets/data/unemployment.tsv", sep="\t")
chart = alt.Chart(states).mark_geoshape().encode(
color=alt.Color("rate:Q", scale=alt.Scale(scheme="blues")),
tooltip=["id:N", "rate:Q"],
).transform_lookup(
lookup="id",
from_=alt.LookupData(unemp, "id", ["rate"]),
).project("albersUsa").properties(width=700, height=400)
chart
The alt.topo_feature(url, feature_name) loads a TopoJSON file and extracts a specific feature collection. The mark_geoshape() draws the boundaries. The transform_lookup joins the geometric data with the unemployment data by id. The project("albersUsa") uses the Albers USA projection, which shows Alaska and Hawaii as insets (a common US convention).
Altair's geospatial support is less complete than Plotly's or geopandas's — there are fewer built-in projections, fewer map customization options, and no native Mapbox integration. But for the cases Altair supports, the grammar-of-graphics approach is elegant: a choropleth is just a mark_geoshape with a color encoding and a data lookup. No loops, no boilerplate, no state management.
Altair is also well-suited to linked-view geospatial exploration. A brushed scatter plot on one panel can filter a choropleth on another panel through the same transform_filter(selection) pattern you saw in Chapter 22. This kind of linked geographic exploration is difficult in other libraries and easy in Altair.
23.8 Folium: Interactive Leaflet in Python
Folium is a Python wrapper around the Leaflet JavaScript library. Leaflet is to interactive web maps what D3 is to general web visualization — the foundational, most widely-used library in its category. Folium lets you build Leaflet maps in Python and render them in Jupyter notebooks or save them as HTML files.
import folium
m = folium.Map(location=[37.7749, -122.4194], zoom_start=12, tiles="OpenStreetMap")
folium.Marker(
location=[37.7749, -122.4194],
popup="San Francisco",
icon=folium.Icon(color="red", icon="info-sign"),
).add_to(m)
folium.CircleMarker(
location=[37.8, -122.3],
radius=20,  # screen pixels; folium.Circle takes a radius in meters instead
popup="Berkeley area",
color="blue",
fill=True,
fill_opacity=0.3,
).add_to(m)
m
This creates a map centered on San Francisco, adds a pin marker and a circle marker, and displays it inline. The map is fully interactive: zoom, pan, click markers for popups. You can save it with m.save("map.html") for sharing.
Folium supports choropleth layers, heatmaps (via the folium.plugins.HeatMap plugin), marker clusters, tile layer switching, and GeoJSON overlays. For general-purpose interactive maps — especially for exploratory analysis or embedding in static HTML reports — Folium is often the fastest tool.
The Folium choropleth pattern:
m = folium.Map(location=[39.8, -96], zoom_start=4)
folium.Choropleth(
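geo_data="us_states.geojson",  # hypothetical file; each feature needs properties.state
data=df,  # hypothetical DataFrame with "state" and "metric" columns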
columns=["state", "metric"],
key_on="feature.properties.state",
fill_color="YlOrRd",
fill_opacity=0.7,
line_opacity=0.2,
legend_name="Metric name",
).add_to(m)
m
Folium's choropleth ties together a GeoJSON file and a DataFrame by a common key (key_on for the GeoJSON property and columns for the DataFrame). The color scale is specified as a ColorBrewer name string.
Folium's advantage over Plotly is that the base map is Leaflet, which has a very mature ecosystem of tile providers, plugins, and customization options. Its disadvantage is that the Python-to-JavaScript bridge is less polished than Plotly's — some Leaflet features are exposed inconsistently, and debugging means reading both Python and generated JavaScript. For production-grade interactive maps, Folium is usually fine; for complex custom interactions, you may need to drop down to raw Leaflet JavaScript.
23.9 The Population Proxy Pitfall
The single most common mistake in choropleth mapping is the population proxy problem. Consider a map of the US colored by "number of COVID cases" at the county level. The resulting map has dark counties in New York, Los Angeles, Chicago, Houston — the big cities. It looks like a dramatic pattern. But is it?
No. It is mostly a map of where people live. Los Angeles County, the most populous county in the country, stands out because it has the most residents, and the counties containing New York, Chicago, and Houston stand out for the same reason. The pattern in the map is the pattern of population, not the pattern of COVID. Any variable that scales with population (crime count, restaurant count, income total, school count) will produce much the same map, because population is the dominant signal and whatever else varies is buried under it.
The fix is normalization. Instead of "cases," use "cases per 100,000 residents." Instead of "crimes," use "crime rate." Instead of "schools," use "schools per capita." The normalized version removes the population effect and shows what is actually varying. The map changes dramatically: the dark spots move, new patterns emerge, and the map becomes analytically useful.
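Normalization is one line of pandas: rate = cases / population × 100,000. The sketch below uses three made-up counties (names and numbers invented for illustration) to show how completely the ranking can flip once counts become rates.

```python
import pandas as pd

# Hypothetical counties with invented numbers, purely for illustration.
counties = pd.DataFrame({
    "county": ["Big Metro", "Mid City", "Small Rural"],
    "cases": [50_000, 6_000, 900],
    "population": [5_000_000, 400_000, 30_000],
})

# Rate per 100,000 residents: divide by population, then rescale.
counties["cases_per_100k"] = counties["cases"] / counties["population"] * 100_000

by_count = counties.sort_values("cases", ascending=False)["county"].tolist()
by_rate = counties.sort_values("cases_per_100k", ascending=False)["county"].tolist()
print("Ranked by count:", by_count)  # ['Big Metro', 'Mid City', 'Small Rural']
print("Ranked by rate: ", by_rate)   # ['Small Rural', 'Mid City', 'Big Metro']
```

Colored by count, Big Metro would be the darkest county; colored by rate (1,000 vs. 1,500 vs. 3,000 per 100,000), it is the lightest.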
The unnormalized map is not always wrong. If the question is "where are the most people affected?", the count map is appropriate. If the question is "where is the risk highest?", the rate map is appropriate. Different questions call for different maps. But the unnormalized map is almost always unhelpful for analytical purposes, and presenting it without disclosing the population effect is a form of visual lie. Always ask yourself: "is my map just a population map in disguise?" If yes, normalize.
Related pitfalls:
Area distortion. A county is a county regardless of its geographic area. Los Angeles County covers over 4,000 square miles; Arlington County, Virginia, covers 26. On a choropleth map, they get one color cell each, but the visual weight is entirely different — LA County fills a huge area and dominates visually, while Arlington is nearly invisible. Readers looking at a choropleth tend to attribute importance to visual area, which misrepresents data that is not actually area-weighted. For this reason, cartograms (where area is distorted to reflect the data) are sometimes preferable for election coverage and other count-based analyses.
Small-area distortion. Small-population areas often have the most extreme rates purely by chance, because a small denominator makes a rate noisy: a single case in a county of 1,000 people is already a rate of 100 per 100,000. Large-population areas have more statistically stable rates. Neither fact shows on the map: each area's visual weight depends on its size on the page, not on how reliable its rate is.
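The small-denominator effect is easy to see in a simulation. In this sketch every hypothetical county has exactly the same true rate (500 per 100,000), so any difference between counties is pure chance; only the population size changes. The county counts, populations, and seeds are arbitrary choices for illustration.

```python
import random


def simulated_rates(n_counties, population, true_rate, seed):
    """Observed rates per 100,000 for counties that share one true rate.

    Each resident independently becomes a case with probability true_rate,
    so all spread between counties is random noise.
    """
    rng = random.Random(seed)
    rates = []
    for _ in range(n_counties):
        cases = sum(rng.random() < true_rate for _ in range(population))
        rates.append(cases / population * 100_000)
    return rates


small = simulated_rates(n_counties=30, population=1_000, true_rate=0.005, seed=1)
large = simulated_rates(n_counties=30, population=50_000, true_rate=0.005, seed=2)
print(f"small counties: {min(small):.0f} to {max(small):.0f} per 100k")
print(f"large counties: {min(large):.0f} to {max(large):.0f} per 100k")
```

The small counties scatter far more widely around 500 than the large ones, so on a rate map the darkest and lightest colors tend to belong to the smallest populations even when nothing real distinguishes them.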
Color scale extremes. A linear color scale from 0 to max can be dominated by a few outlier values. Log scales, quantile scales, and custom breaks can produce better-distributed colors. The classification or binning approach — dividing the data into equal-count buckets — is often better than linear for highly skewed distributions.
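Here is a standard-library sketch comparing equal-interval breaks with quantile (equal-count) breaks on a skewed, made-up set of ten values. In practice geopandas can classify for you with GeoDataFrame.plot(column=..., scheme="quantiles", k=5), which relies on the optional mapclassify package.

```python
import bisect
import statistics

# Made-up, highly skewed values for ten regions.
values = [1, 1, 2, 2, 3, 3, 4, 50, 200, 1000]
k = 4  # number of color classes


def classify(values, breaks):
    """Class index (0 .. len(breaks)) for each value; a value equal to a break goes up."""
    return [bisect.bisect_right(breaks, v) for v in values]


def class_counts(classes, k):
    """How many values fall in each of the k classes."""
    return [classes.count(c) for c in range(k)]


# Equal-interval breaks: split the range [min, max] into k equal widths.
lo, hi = min(values), max(values)
width = (hi - lo) / k
equal_breaks = [lo + width * i for i in range(1, k)]

# Quantile breaks: k classes holding (roughly) equal numbers of values.
quantile_breaks = statistics.quantiles(values, n=k)

print("equal-interval:", equal_breaks, class_counts(classify(values, equal_breaks), k))
print("quantile:", quantile_breaks, class_counts(classify(values, quantile_breaks), k))
# equal-interval counts: [9, 0, 0, 1]; quantile counts: [2, 2, 4, 2]
```

With equal intervals, nine of the ten regions share a single color and the map shows almost nothing; the quantile scheme spreads them across all four classes.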
23.10 Map Type Selection
Different geographic questions call for different map types. The main options:
Choropleth: regions colored by a rate or density. Best for: comparing rates across administrative boundaries (states, counties, countries). Worst for: counts (use per-capita), or when the administrative boundaries are irrelevant to the question.
Dot map (point map): individual points placed at their geographic locations. Best for: showing where things are (earthquakes, coffee shops, customers). Worst for: aggregated comparisons across regions (use choropleth or summary).
Proportional symbol map: points with sizes proportional to a value. Best for: showing magnitude at specific locations (city populations, store revenues). Worst for: dense data (overlapping symbols become unreadable).
Hex bin map / density map: space binned into a regular grid and colored by density. Best for: very large point datasets where individual points overplot. Examples: Uber trip origins, social media checkins, sensor readings.
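Binning is aggregation before plotting: assign each point to a cell, count per cell, then color the cells. The sketch below uses square cells measured in degrees because the cell lookup is a single floor() per axis; hexagonal binning follows the same count-per-cell logic with a hexagon lookup instead. The trip coordinates are invented. Degree-based cells shrink in ground area toward the poles, so for large extents you would bin in a projected, equal-area CRS.

```python
import math
from collections import Counter


def grid_bin(points, cell_deg):
    """Count (lon, lat) points per square grid cell of cell_deg degrees.

    Returns a Counter keyed by the (column, row) index of each cell.
    """
    counts = Counter()
    for lon, lat in points:
        cell = (math.floor(lon / cell_deg), math.floor(lat / cell_deg))
        counts[cell] += 1
    return counts


def cell_center(cell, cell_deg):
    """Center (lon, lat) of a grid cell, for placing its colored square."""
    col, row = cell
    return ((col + 0.5) * cell_deg, (row + 0.5) * cell_deg)


# Hypothetical trip origins: three close together, one farther away.
trips = [(-122.43, 37.71), (-122.44, 37.74), (-122.45, 37.76), (-122.27, 37.83)]
bins = grid_bin(trips, cell_deg=0.1)
for cell, n in bins.most_common():
    print(cell_center(cell, 0.1), n)
```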
Cartogram: administrative boundaries distorted so that area represents a variable (usually population). Best for: election coverage where you want states to be weighted by votes rather than geographic area. Example: election maps that resize each state in proportion to its electoral votes. Python support for cartograms is limited to a few specialized tools.
Flow map: lines or arrows showing movement between locations. Best for: migration, trade, transportation. Can be produced with Plotly (go.Scattergeo with line mode) or matplotlib (with geographic projection and FancyArrow patches).
Heatmap (in geographic sense): colored density on top of a base map. Best for: continuous spatial phenomena (pollution levels, rainfall). Different from the general statistical heatmap (which is a 2D matrix).
The decision tree: What is the unit of observation? If it is a region, use a choropleth (with per-capita normalization if the variable scales with population). If it is a point, use a dot map or proportional symbol. If it is a continuous field, use a heatmap. If it is flow between locations, use a flow map. If geographic area itself would mislead, because the variable is about people rather than land (votes, population), use a cartogram.
23.11 Progressive Project: Climate in Space
We return to the climate dataset for its geographic treatment. Up to now, the climate data has been treated as a time series — temperature over years, CO2 over years, sea level over years. But climate data has a spatial dimension too: temperature varies by latitude, sea level rise affects coastal areas more than interior regions, and the monitoring stations that produce the raw data are located at specific places on earth.
The exercise for this chapter builds several climate maps:
Map 1: Station locations. A dot map showing the locations of a few major climate monitoring stations worldwide. Plotly's px.scatter_geo with latitude and longitude columns produces this in a few lines:
import pandas as pd
import plotly.express as px

# The "..." entries stand for further stations; replace them with real rows
# before running, since Plotly cannot plot placeholder values.
stations = pd.DataFrame({
    "station": ["Mauna Loa", "South Pole", "Barrow", "Cape Grim", ...],
    "lat": [19.5, -89.9, 71.3, -40.7, ...],
    "lon": [-155.6, 0.0, -156.8, 144.7, ...],
    "country": ["USA", "Antarctica", "USA", "Australia", ...],
})
fig = px.scatter_geo(stations, lat="lat", lon="lon", hover_name="station",
                     color="country", projection="natural earth",
                     title="Global Climate Monitoring Stations")
fig.show()
Map 2: Temperature anomaly by region. A choropleth showing regional temperature anomalies — regions in the Arctic warming faster than tropical regions, for example. Requires a region-level dataset and px.choropleth with appropriate locations or px.choropleth_mapbox with custom GeoJSON.
Map 3: Static publication map. The same choropleth rebuilt in geopandas + matplotlib with a Robinson projection, custom colormap, and publication-quality typography. Suitable for inclusion in a scientific paper or report.
Map 4: Folium interactive. An interactive Folium map with markers for stations and a choropleth overlay of regional anomalies, with hover popups showing detailed statistics. For interactive sharing.
The lesson: no single library handles every map use case. Plotly is fastest for interactive sharing. geopandas + matplotlib is best for publication. Folium is best for quick interactive prototypes. Altair is best for linked-view exploration with other non-map charts. A professional geospatial project uses two or three of these libraries together, picking the right tool for each delivery context.
23.12 Ethical Cartography
Maps are one of the oldest forms of data visualization, and they have been misused for as long as they have existed. The ethical considerations in Chapter 4 apply to maps with special force, because the reader's prior knowledge of geography makes them trusting of maps in a way they are not of abstract charts.
Choosing the projection. If your map is going to invite area comparisons ("is Africa larger than Greenland?"), use an equal-area projection (Mollweide, Eckert IV, Goode homolosine). Web Mercator is familiar but wrong for this purpose. The Mercator projection's area distortion has long been criticized for fostering misconceptions about the relative sizes of continents, and critics have argued that it shapes public perceptions of development and international power. If you have a choice, make it deliberately.
Choosing the scale. A map of the "United States" that does not include Alaska, Hawaii, and Puerto Rico is implicitly saying those places are not part of the story. Sometimes that is justified (a map of elections in the contiguous states, where Alaska and Hawaii would be visual clutter). Sometimes it is not (a map of "US health outcomes" that silently omits Puerto Rico's health data). The scale and cropping of a map is an editorial decision.
Choosing the categories. Choropleths often show data at one administrative level (state, county, census tract). The choice matters: a map at the state level tells a different story than a map at the county level. States average over internal variation; counties reveal it. Neither is wrong, but each emphasizes different things. The choice of administrative level is a framing decision.
Choosing the colors. A map of "unemployment rate" colored red-to-green implies value judgments ("red is bad, green is good") that the raw data does not carry. Using a neutral diverging palette (blue-to-orange, say) avoids the implied evaluation. In politically sensitive maps — election results, refugee flows, conflict zones — color choices can make the difference between neutral reporting and partisan advocacy. The chart maker is always editorializing; the question is whether the editorializing is transparent.
Disclosing the limits. No map can show everything. What your map omits is as important as what it includes. A world map of "GDP per capita" shows wealth but not distribution of wealth within countries. A county-level COVID map shows incidence but not testing capacity. Good cartography acknowledges these limitations in the caption, the title, or the accompanying text. Silence about limitations lets readers assume the map is more comprehensive than it is.
The ethical checklist for a map:
- Is the projection appropriate for the comparisons the reader will make?
- Is the color scale appropriate (sequential vs. diverging) for the data's structure?
- Are the values normalized if they scale with population or area?
- Are the boundaries at the right administrative level for the question?
- Does the caption disclose the data source, the projection, and any major limitations?
- Would a different defensible choice of any of the above change the story?
The last question is the most important. If reasonable alternative choices would change the story, the original choice is doing argumentative work that needs to be justified. Silent choices in geospatial visualization are not neutral; they are just unexamined.
23.13 Performance Considerations
Geospatial visualizations can be performance-intensive because geographic data is often large. A detailed shapefile of US counties has more than 3,000 polygons, and a high-resolution world dataset can contain hundreds of thousands of vertices. At that size, rendering performance and file size become real concerns.
Simplification. The most detailed Natural Earth data is at 1:10 million scale, which represents every bend in every coastline. For a world map displayed at screen resolution, this is overkill. The 1:50 million and 1:110 million versions contain the same shapes, simplified, producing much smaller files with visually indistinguishable output at world and continental zoom levels. Always use the simplest geometry that is adequate for the display resolution. For custom simplification, geopandas provides a .simplify(tolerance) method, where the tolerance is expressed in the coordinate units of the layer's CRS.
TopoJSON. Vega-Lite (and therefore Altair) examples typically use TopoJSON rather than GeoJSON because TopoJSON deduplicates shared boundaries: two countries that share a border store the border line once instead of twice. For large multi-country datasets, TopoJSON can be 30–50% smaller than equivalent GeoJSON. Convert with the geo2topo command-line tool (from the topojson-server package) or similar tools.
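A minimal sketch of .simplify(), assuming geopandas 1.x with shapely 2.x. The "coastline" is a stand-in: a circle approximated by more than a thousand vertices. The tolerance is in the geometry's coordinate units, so a value that suits degrees is far too small for metres.

```python
import geopandas as gpd
from shapely.geometry import Point

# Stand-in for a detailed coastline: a circle built from 4 * 256 segments.
# The second positional buffer argument is the number of segments per quarter circle.
detailed = gpd.GeoSeries([Point(0, 0).buffer(1.0, 256)])

# Tolerance is in coordinate units; larger values remove more vertices.
simplified = detailed.simplify(tolerance=0.01, preserve_topology=True)

def vertex_count(polygon):
    """Number of coordinates in a polygon's outer ring."""
    return len(polygon.exterior.coords)

before = vertex_count(detailed.iloc[0])
after = vertex_count(simplified.iloc[0])
area_change = abs(simplified.area.iloc[0] - detailed.area.iloc[0]) / detailed.area.iloc[0]
print(f"{before} vertices -> {after} vertices, area changed by {area_change:.1%}")
```

Most of the vertices disappear while the area barely moves, which is the trade-off that makes simplified layers attractive for screen display.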
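The deduplication idea can be illustrated without any GIS library. The sketch below is a toy: it counts boundary segments for two made-up adjacent squares, stored GeoJSON-style (each polygon keeps its own ring) and stored once per segment. Real TopoJSON encoders work with longer arcs and also quantize coordinates, so this shows only the principle.

```python
# Toy illustration of shared-border deduplication (not a real TopoJSON encoder).
left = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]   # square A
right = [(1, 0), (2, 0), (2, 1), (1, 1), (1, 0)]  # square B shares the edge x = 1

def edges(ring):
    """Undirected boundary segments of a closed ring."""
    return [tuple(sorted(pair)) for pair in zip(ring, ring[1:])]

geojson_style = edges(left) + edges(right)  # each polygon stores its full boundary
topojson_style = set(geojson_style)         # each segment stored once, shared by both

print(len(geojson_style), "segments vs", len(topojson_style))
```

With two squares the saving is one segment out of eight; with thousands of countries or counties, almost every interior border is shared, which is where the size reduction comes from.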
Precomputing aggregations. If your map is a choropleth of rates aggregated from point data, compute the aggregation in pandas before plotting. Do not make the visualization library aggregate millions of raw points — the library's aggregation tools (where they exist) are usually slower than a direct pandas groupby.
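A minimal pandas sketch of aggregating before plotting, assuming each raw point has already been tagged with a region ID (for example by an earlier spatial join). The region IDs, case records, and populations below are hypothetical.

```python
import pandas as pd

# Hypothetical raw data: one row per reported case, already tagged with a region ID.
cases = pd.DataFrame({"region": ["C1", "C1", "C2", "C2", "C2", "C3"]})
population = pd.DataFrame({
    "region": ["C1", "C2", "C3", "C4"],
    "pop": [58_000, 230_000, 25_000, 40_000],
})

# Aggregate once in pandas: one row per region instead of one row per point.
counts = cases.groupby("region").size().rename("cases").reset_index()

# A right join keeps regions with no cases; fill their count with 0.
rates = counts.merge(population, on="region", how="right").fillna({"cases": 0})
rates["per_100k"] = rates["cases"] / rates["pop"] * 100_000
print(rates)
```

The plotting library then receives four rows with a ready-made rate column instead of every individual point.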
Tile-based base maps. For zoomable maps, the base map (streets, terrain, labels) is typically delivered as pre-rendered tile images from a tile server. This offloads the rendering work from the browser to a CDN. Plotly Mapbox and Folium both use this approach. For offline use, the tile server is a limitation — your map needs internet access to load the base tiles.
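WebGL for large point sets. Plotly's scatter_mapbox renders its points with WebGL (through the Mapbox GL map engine), so it copes with large point datasets, from thousands up to hundreds of thousands of points. A dot map of a million points is possible with the Mapbox variant, though interaction may become sluggish; the SVG-based scatter_geo variant will struggle long before that. In Plotly 5.24 and later, px.scatter_map (built on MapLibre) is the recommended successor to px.scatter_mapbox.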
These performance considerations become important once you scale past about 10,000 features. For smaller datasets, any of the libraries work fine without tuning.
23.14 Installing geopandas and the Geospatial Stack
Unlike most Python packages, geopandas has historically been hard to install. The difficulty comes from its dependencies: it relies on several C libraries (GDAL, GEOS, PROJ) that must be available before the Python wrappers can work. On Linux and macOS, this is usually straightforward. On Windows, it has been a recurring source of frustration for years.
Recent versions (geopandas 0.14+). Things have improved. geopandas itself is pure Python, and its core dependencies (shapely, pyproj, and pyogrio or fiona) now publish pre-built wheels on PyPI that bundle the C libraries for most platforms, so pip install geopandas usually works. If it does not:
Option 1: conda-forge. The conda-forge channel provides pre-built binaries for all major platforms:
```shell
conda install -c conda-forge geopandas
```
This is the recommended approach on Windows. The conda packages include the C libraries and handle the dependency chain correctly.
Option 2: pyogrio backend. Modern geopandas can use pyogrio instead of fiona for reading geographic files, and since geopandas 1.0 pyogrio is the default engine. Pyogrio is faster and has simpler installation requirements:
```shell
pip install pyogrio
```
Then in Python:
```python
import geopandas as gpd

# Read a shapefile through the pyogrio engine (the default since geopandas 1.0)
df = gpd.read_file("path.shp", engine="pyogrio")
```
Option 3: OSGeo4W (Windows only). If you need GDAL/GEOS/PROJ for reasons beyond geopandas (e.g., using the GDAL command-line tools), install the full OSGeo4W stack first, then install geopandas into the same Python environment. This is overkill for most use cases but necessary for serious GIS work.
Associated libraries. geopandas is one piece of a larger stack. Related libraries you may need:
- shapely — the core geometry library. geopandas depends on it.
- fiona — the file I/O library. Alternative to pyogrio.
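- pyproj — projection transformations. Used by geopandas for .to_crs().
- rtree — spatial indexing for fast geographic joins. Older geopandas versions relied on it; with shapely 2, geopandas builds its spatial index with shapely's STRtree instead.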
- rasterio — for raster (image-like) geographic data, complementing geopandas's vector focus.
- xarray — for gridded scientific data like climate model output.
- cartopy — an alternative to geopandas + matplotlib for map visualization, with better projection support.
cartopy deserves a specific mention. It is a separate library (not part of the geopandas stack) that specializes in publication-quality scientific maps with proper projections. If you are making maps for a physics or climate science paper, cartopy is often the right tool instead of geopandas + matplotlib. The API is different — cartopy uses matplotlib axes with a projection argument — but the output quality is excellent for scientific cartography.
23.15 Comparing the Libraries
The table below summarizes the strengths and weaknesses of each library for common geospatial visualization tasks:
| Task | Plotly Express | geopandas + mpl | Altair | Folium |
|---|---|---|---|---|
| Quick world choropleth | Best | Good | Good | Good |
| Custom GeoJSON choropleth | Good (mapbox) | Best | Good | Good |
| Static print-quality map | Okay | Best | Okay | Poor |
| Interactive zoom/pan | Best | None | Good | Best |
| Dot map with thousands of points | Good (WebGL via Mapbox) | Good | Poor (row limit) | Good |
| Custom projection | Good (many options) | Best (any EPSG) | Limited | Limited |
| Linked views (brush + filter) | Limited | None | Best | None |
| Learning curve | Gentle | Moderate | Moderate | Gentle |
| Large dataset performance | Good (Mapbox) | Good | Poor | Moderate |
The pattern: Plotly is the fastest entry point, geopandas + matplotlib is the best for static publication, Altair is the best for linked-view exploration with other charts, and Folium is the best for quick interactive prototypes and embedding in HTML reports. Most real projects use two or three libraries together, picking the right tool for each delivery context.
An additional note on cartopy (not in the table): cartopy is the scientific community's preferred tool for publication maps with proper projections. It is a layer on top of matplotlib, not a wrapper around geopandas, and it has better projection support than geopandas's native matplotlib integration. If you are doing serious scientific cartography (climate maps, oceanographic plots, atmospheric flow maps), learn cartopy. For general data-science choropleths and dot maps, geopandas + matplotlib is usually enough.
23.16 Common Mistakes and Their Fixes
A short catalog of mistakes that appear in geospatial visualizations, with fixes.
Mistake: Unnormalized choropleth. A map of "total population over 65" looks dramatic, with dark clusters in big cities. The reader concludes that aging is an urban problem. In fact, the map just shows where people live — older-skewing rural counties might have 30% seniors while urban counties show 12%. Fix: divide by total population to show the percentage, not the count.
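The fix is a single division. This pandas sketch uses three hypothetical counties whose numbers mirror the example above; the resulting pct_65_plus column, not the raw count, is what the choropleth should color.

```python
import pandas as pd

# Hypothetical counties mirroring the example in the text.
counties = pd.DataFrame({
    "county": ["Urban", "Rural A", "Rural B"],
    "pop_total": [1_000_000, 20_000, 15_000],
    "pop_65_plus": [120_000, 6_000, 4_200],
})

# Normalize: share of residents aged 65+, in percent.
counties["pct_65_plus"] = counties["pop_65_plus"] / counties["pop_total"] * 100
print(counties.sort_values("pct_65_plus", ascending=False))
```

The urban county has by far the largest count but the smallest share, which is exactly the reversal the unnormalized map hides.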
Mistake: Mercator for global comparison. Any world map in Web Mercator inflates areas toward the poles, so high-latitude countries look far larger than they are. A map comparing country areas or country-level metrics should use Robinson, Mollweide, or Eckert IV instead. Fix: in Plotly Express, pass projection="robinson"; in geopandas, use df.to_crs("ESRI:54030") for Robinson.
Mistake: Too many color classes. A choropleth with 10 or more color bins is hard to read because adjacent colors blur together. Stick to 5–7 bins unless the audience is technical and needs fine resolution. Fix: use a quantile or natural-breaks classification with fewer bins (in geopandas, plot(column=..., scheme="quantiles", k=5) or scheme="natural_breaks", which relies on the mapclassify package).
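Quantile classification is easy to sketch with pandas alone: pd.qcut assigns each value to one of k classes holding roughly equal numbers of regions. The rates here are synthetic, drawn from a skewed distribution to resemble typical choropleth data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
# Synthetic, right-skewed rates for 300 regions.
rates = pd.Series(rng.lognormal(mean=2.0, sigma=0.5, size=300))

# Five quantile classes (0 to 4), each holding about a fifth of the regions.
classes = pd.qcut(rates, q=5, labels=False)
print(classes.value_counts().sort_index())
```

Natural breaks instead places class boundaries at gaps in the data; it needs an optimization step, which is one reason to let mapclassify handle it rather than writing it by hand.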
Mistake: Overplotting dots. A dot map of 100,000 customer locations produces a solid black blob. The reader learns nothing. Fix: switch to a hex bin map, a density heatmap, or a clustered marker display. Alternatively, use transparency (alpha = 0.1) so individual points are visible in density.
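The sketch below contrasts the two fixes on 100,000 synthetic points: a scatter with heavy transparency and a matplotlib hexbin that counts points per hexagon. For simplicity it treats longitude and latitude as plain x and y, so the hexagons are uniform in degrees rather than in ground distance; for a real map you would usually reproject first.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
lon = rng.normal(-87.6, 0.15, n)  # synthetic "customer" locations
lat = rng.normal(41.9, 0.10, n)

fig, (ax_dots, ax_hex) = plt.subplots(1, 2, figsize=(10, 4), sharex=True, sharey=True)
ax_dots.scatter(lon, lat, s=1, alpha=0.05, color="black")           # transparent dots
hb = ax_hex.hexbin(lon, lat, gridsize=40, mincnt=1, cmap="viridis")  # counts per hexagon
fig.colorbar(hb, ax=ax_hex, label="points per cell")
```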
Mistake: Missing administrative level. A US county-level map is made, but the county boundaries are not shown — only the filled colors. Readers cannot tell where one county ends and another begins. Fix: draw the boundaries with a thin line (edgecolor="white", linewidth=0.3) so the administrative structure is visible.
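A minimal geopandas sketch of the fix, assuming geopandas and matplotlib are installed. The 4-by-4 grid of squares is a hypothetical stand-in for county polygons.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
import geopandas as gpd
from shapely.geometry import box

# Hypothetical 4 x 4 grid of square "counties" with made-up values.
cells = [box(x, y, x + 1, y + 1) for x in range(4) for y in range(4)]
gdf = gpd.GeoDataFrame({"value": list(range(16))}, geometry=cells)

fig, ax = plt.subplots(figsize=(4, 4))
# Thin white edges keep each unit's boundary visible without dominating the fill.
gdf.plot(column="value", cmap="viridis", edgecolor="white", linewidth=0.3, ax=ax)
ax.set_axis_off()
```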
Mistake: Tiny labels. Place labels on a choropleth are usually too small to read at map zoom levels. Fix: drop place labels entirely and use a tooltip (Plotly, Folium) or rely on the base map tiles to provide place context.
Mistake: Ignoring projection units. After reprojection from lat/lon to a projected CRS, the coordinates are in meters (not degrees). Setting axis limits in degrees produces an empty plot. Fix: match the axis limit units to the projection, or use ax.set_axis_off() and let the data extent drive the view.
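A short geopandas check makes the unit change visible. The two city coordinates are approximate, and EPSG:3857 is used only because its metre units are easy to recognize.

```python
import geopandas as gpd
from shapely.geometry import Point

cities = gpd.GeoDataFrame(
    {"city": ["Chicago", "Denver"]},
    geometry=[Point(-87.63, 41.88), Point(-104.99, 39.74)],  # (lon, lat) in degrees
    crs="EPSG:4326",
)
projected = cities.to_crs("EPSG:3857")  # Web Mercator, units are metres

print(cities.total_bounds)     # minx, miny, maxx, maxy in degrees
print(projected.total_bounds)  # the same extent in metres (millions)

# Limits for a projected plot should come from the projected data, not from degrees.
minx, miny, maxx, maxy = projected.total_bounds
```

Passing minx and maxx to ax.set_xlim (and the y values to ax.set_ylim) keeps the view in the same units as the data.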
Mistake: Stale or mismatched administrative boundaries. US county boundaries change rarely, but they do change (starting in 2022, for example, the Census Bureau replaced Connecticut's eight counties with nine planning regions as county equivalents). International boundaries change more often. A choropleth that joins 2020 data to 2010 boundaries can end up with unmatched or misassigned regions. Fix: use the boundary file for the same year as the data.
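A quick way to catch mismatches is an outer join with indicator=True before plotting. This pandas sketch uses hypothetical region codes; the same merge works when the left side is a GeoDataFrame of boundaries.

```python
import pandas as pd

# Hypothetical: the data file uses newer region codes than the boundary file.
data = pd.DataFrame({"geo_id": ["R1", "R2", "R4"], "rate": [5.1, 7.3, 6.0]})
boundaries = pd.DataFrame({"geo_id": ["R1", "R2", "R3"]})  # stand-in for a boundary layer's attributes

check = boundaries.merge(data, on="geo_id", how="outer", indicator=True)
mismatches = check[check["_merge"] != "both"]
print(mismatches[["geo_id", "_merge"]])  # left_only = boundary without data, right_only = data without boundary
```

An empty mismatches table is a good precondition for drawing the map; any rows in it point to a year or code mismatch worth resolving first.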
Each of these mistakes is common even in published work. Checking for them in your own maps is part of the quality-assurance routine. Build a personal checklist and run through it before publishing any map: projection appropriate for the comparisons readers will make, normalization applied where the variable scales with population or area, color scale matches data structure (sequential vs. diverging), administrative level matches the question being asked, labels are legible or removed, and boundaries are visible and drawn from the correct year. A two-minute checklist catches most errors before the reader ever sees them.
23.17 Check Your Understanding
Before continuing to Chapter 24 (Networks), make sure you can answer:
- Why does every map projection distort something, and what are the main categories of distortion (area, angle, distance, shape)?
- What is the difference between EPSG:4326 and EPSG:3857?
- What are the two main file formats for geographic data, and what are the trade-offs?
- What is the population proxy pitfall, and how do you fix it?
- When should you use px.choropleth vs. px.choropleth_mapbox?
- When should you prefer geopandas + matplotlib over Plotly for a map?
- What is Folium, and what library does it wrap?
- Name three map types other than choropleth and dot map, and describe when each is appropriate.
If any of these are unclear, re-read the relevant section. Chapter 24 leaves geographic space behind and introduces network visualization — charts of relationships rather than places.
23.18 Chapter Summary
This chapter introduced geospatial visualization in Python:
- Projections flatten the sphere of the earth onto a 2D page, distorting some combination of area, shape, angle, or distance. The choice of projection is a rhetorical choice about what to emphasize.
- CRS (coordinate reference systems) are formal specifications for projections, identified by EPSG codes. EPSG:4326 (lat/lon) is the raw form; EPSG:3857 (Web Mercator) is the web tile standard.
- GeoJSON and shapefiles are the two common file formats. GeoJSON is JSON-based and web-friendly; shapefiles are the traditional GIS format.
- Plotly Express provides px.choropleth, px.choropleth_mapbox, px.scatter_geo, and px.scatter_mapbox for interactive maps.
- geopandas + matplotlib is the canonical Python stack for static publication-quality maps. GeoDataFrames extend pandas with geometry columns.
- Altair supports geospatial visualization via mark_geoshape() and TopoJSON loading.
- Folium wraps Leaflet for interactive browser-based maps in Python.
- The population proxy pitfall is the most common choropleth mistake: uncorrected counts produce maps that just show where people live. Always normalize to per-capita rates.
- Map type selection depends on the question: choropleth for regional rates, dot map for locations, hex bin for dense points, cartogram for emphasis on values over area, flow map for movement.
The chapter's threshold concept — maps are arguments about space — argues that no map is neutral. Every projection, color scale, and normalization choice is a rhetorical decision. The reader will interpret the map based on these choices, so the chart maker must choose deliberately.
Chapter 24 introduces network visualization: charts where the data is not geographic but relational. Nodes represent entities, edges represent connections, and the layout is determined by algorithms rather than latitude and longitude.
23.19 Spaced Review
Questions that reach back to earlier chapters:
- From Chapter 3 (Color): What color palette choices from Chapter 3 apply to choropleths, and what additional constraints does geographic visualization impose?
- From Chapter 4 (Honest Charts): The population proxy pitfall is a specific form of the lie-factor problem. How is it similar to, and different from, the dual-y-axis concerns from Chapter 21?
- From Chapter 20 (Plotly Express): Plotly Express's px.scatter has color, size, and hover_name parameters. How do the same parameters work in px.scatter_geo and px.scatter_mapbox?
- From Chapter 22 (Altair): Altair's mark_geoshape() is one of many marks in the Altair mark library. How does the grammar of graphics handle maps as a special case?
- From Chapter 14 (Specialized Charts): matplotlib supports basic map-like features (heatmaps with geographic extent, contour plots). When is matplotlib alone enough, and when do you need geopandas?
Geospatial visualization is a deep specialty with its own libraries, conventions, and pitfalls. This chapter is a survey of the main tools and the main design principles. For serious GIS work, look at Chapter 23's further reading and at dedicated GIS books. For most practical data science applications — choropleths, dot maps, quick interactive maps — the tools in this chapter are sufficient, and the main risk is not the technology but the design decisions. Respect the projection. Respect the normalization. Respect the reader's assumptions. Chapter 24 leaves geography entirely for the abstract space of networks and graphs.