Further Reading: Chapter 27

Working with Geospatial Data: Maps, Spatial Joins, and Location-Based Analysis


Foundational References

1. "Geographic Information Analysis" --- David O'Sullivan and David Unwin (2nd edition, Wiley, 2010) The standard textbook on spatial analysis concepts: point pattern analysis, spatial autocorrelation, geostatistics, and spatial regression. O'Sullivan and Unwin write for geographers, not programmers, so the emphasis is on why spatial data is different from non-spatial data and what assumptions standard statistical methods violate when applied to geographic data. Chapters 1-3 cover the conceptual foundations (spatial is special, coordinate systems, map projections) that this chapter compressed into Part 2. Read this if you want the intellectual grounding behind the practical tools. The treatment of Tobler's First Law of Geography --- "everything is related to everything else, but near things are more related than distant things" --- is essential context for spatial feature engineering.

2. "Python for Geospatial Data Analysis" --- Bonny P. McClain (O'Reilly, 2023) A practitioner-oriented book that covers the same ecosystem used in this chapter: geopandas, shapely, folium, contextily, and rasterio. McClain writes for data scientists who need geospatial tools, not for GIS professionals who already have them. Chapters 3-5 (vector data, spatial operations, visualization) directly extend this chapter's material with more examples and edge cases. Chapter 8 covers raster data, which we did not cover but which you will encounter if you work with satellite imagery, elevation data, or climate models. The book also covers GeoPandas' integration with PostGIS for geospatial databases, which is important for production pipelines.

3. "Geocomputation with Python" --- Anita Graser, Michael Dorman, Jakub Nowosad, and Robin Lovelace (2024) The Python companion to the influential "Geocomputation with R" textbook. Freely available online at py.geocompx.org. This is the most comprehensive open-source reference for geospatial Python: vector operations, raster operations, map-making, reprojection, spatial joins, spatial statistics, and geostatistics. The chapter on coordinate reference systems (Chapter 7) is more thorough than our treatment and includes worked examples of projection errors and their consequences. The online format means code examples are tested and reproducible.


Geopandas and Shapely

4. Geopandas Official Documentation The documentation at geopandas.org is well-written and includes a user guide, API reference, and gallery of examples. The "Introduction to GeoPandas" tutorial walks through GeoDataFrame creation, file I/O, spatial operations, and plotting. The "Spatial Joins" page covers sjoin and sjoin_nearest with clear examples of each predicate type. Read the documentation's section on CRS handling --- it explains how geopandas detects and converts CRS and what happens when CRS metadata is missing.

5. Shapely Documentation and User Manual Shapely (shapely.readthedocs.io) is the Python library for geometric operations: point-in-polygon, intersection, union, buffer, simplification, distance. Geopandas delegates all geometry operations to Shapely. The user manual's "Geometric Operations" section covers the operations used in this chapter (contains, within, intersects, distance, buffer) with clear diagrams. The section on "Prepared Geometry" explains how Shapely accelerates repeated operations using spatial indexing --- useful for understanding why spatial joins are fast.

6. "R-trees: A Dynamic Index Structure for Spatial Searching" --- Antonin Guttman (1984) The original paper introducing the R-tree data structure, the kind of spatial index geopandas relies on (via PyGEOS/Shapely 2.0) to accelerate spatial queries. Guttman proposed organizing spatial objects by their bounding boxes in a balanced tree structure, so that a typical spatial search takes close to O(log n) time (the worst case, with heavily overlapping boxes, is still linear). Published in Proceedings of ACM SIGMOD. You do not need to read this paper to use geopandas, but understanding that spatial joins are fast because of R-trees (not brute force) helps you reason about performance. For a more accessible explanation, the geopandas documentation on the sindex attribute provides the practical details.
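
The bounding-box idea can be illustrated without building the tree itself. The sketch below is a simplified illustration, not the geopandas or Shapely implementation: it rejects candidates with a cheap bounding-box test and only then runs an exact point-in-polygon check (ray casting). The function names and the two toy regions are hypothetical.

```python
def bbox(polygon):
    """Return (min_x, min_y, max_x, max_y) for a list of (x, y) vertices."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)


def in_bbox(point, box):
    """Cheap test: is the point inside the axis-aligned bounding box?"""
    x, y = point
    min_x, min_y, max_x, max_y = box
    return min_x <= x <= max_x and min_y <= y <= max_y


def point_in_polygon(point, polygon):
    """Exact test via ray casting: count edges crossed by a ray going right."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_region(point, regions, boxes):
    """Return the name of the first region containing the point, else None."""
    for name, polygon in regions.items():
        if not in_bbox(point, boxes[name]):
            continue  # filtered out cheaply; the exact test is skipped
        if point_in_polygon(point, polygon):
            return name
    return None


# Hypothetical toy regions in planar coordinates
regions = {
    "west": [(0, 0), (4, 0), (4, 4), (0, 4)],
    "east_triangle": [(5, 0), (9, 0), (5, 4)],
}
boxes = {name: bbox(poly) for name, poly in regions.items()}

points = [(1, 1), (6, 1), (8, 3), (20, 20)]
labels = [assign_region(p, regions, boxes) for p in points]
print(labels)
```

The point (8, 3) passes the bounding-box filter for the triangle but fails the exact test, which is why both steps are needed. This sketch still scans every box; an R-tree goes further by nesting boxes inside larger boxes so that whole groups of candidates can be skipped at once.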


Folium and Map Visualization

7. Folium Documentation The documentation at python-visualization.github.io/folium/ covers all the map types used in this chapter: Map, Marker, CircleMarker, Choropleth, and MarkerCluster. The "Quickstart" guide shows how to create a map in three lines of code. The "Advanced Guide" covers custom JavaScript, layer controls, and integration with Leaflet plugins. For data scientists, the choropleth and marker cluster examples are the most immediately useful.

8. "The Design of Everyday Things" applied to maps --- a principle, not a book recommendation When creating maps for stakeholders, the same design principles apply as for any data visualization: minimize visual clutter, use a single clear color scale, include a legend with actual values, and choose bins that communicate the pattern (quantiles for skewed data, equal intervals for roughly uniform data). The folium Choropleth function handles the mechanics, but the design decisions (which variable to map, how many bins, which color scale) determine whether the map communicates or confuses.
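
To see why the binning choice matters for skewed data, the following sketch compares equal-interval and quantile bin edges using only the standard library. The region values are made up for illustration, and the helper names are hypothetical.

```python
import statistics


def equal_interval_edges(values, k):
    """k bins of equal width between the minimum and maximum."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    edges = [lo + i * width for i in range(k)]
    edges.append(hi)  # pin the last edge to the max to avoid float drift
    return edges


def quantile_edges(values, k):
    """k bins holding roughly equal numbers of observations."""
    # statistics.quantiles returns the k-1 interior cut points
    cuts = statistics.quantiles(values, n=k, method="inclusive")
    return [min(values)] + cuts + [max(values)]


def bin_counts(values, edges):
    """Count values per bin; a value goes in the first bin whose upper edge is >= it."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            if v <= edges[i + 1]:
                counts[i] += 1
                break
    return counts


# Hypothetical, strongly right-skewed values (one region dominates)
values = [1, 2, 2, 3, 3, 4, 5, 6, 8, 10, 15, 100]

equal_counts = bin_counts(values, equal_interval_edges(values, 4))
quantile_counts = bin_counts(values, quantile_edges(values, 4))

print("equal interval:", equal_counts)
print("quantile:      ", quantile_counts)
```

With equal intervals, eleven of the twelve regions land in the same color class and the map shows almost no pattern; with quantiles, each class holds three regions. Edges computed this way can be supplied to a choropleth through its bins argument, although the exact behavior should be checked against the folium documentation.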


Coordinate Reference Systems and Projections

9. "Map Projections: A Working Manual" --- John P. Snyder (USGS Professional Paper 1395, 1987) The definitive reference on map projections, published by the US Geological Survey. Snyder covers every major projection (Mercator, Transverse Mercator, Albers Equal Area, Lambert Conformal Conic, UTM) with mathematical formulas, distortion analysis, and use cases. For a data scientist, the most useful sections are the overviews of UTM (used for regional analysis) and Albers Equal Area (used for area-preserving national maps). The full text is available free from the USGS. Read the introductory chapter for a clear explanation of why no projection preserves all four properties (area, shape, distance, direction) simultaneously.
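
A small worked example makes the trade-off concrete. The spherical Mercator projection preserves local shape, and it pays for that with a linear scale factor of 1/cos(latitude), so areas are inflated by the square of that factor. The sketch below simply evaluates that formula; the function names are hypothetical.

```python
import math


def mercator_scale_factor(lat_deg):
    """Linear scale factor of the spherical Mercator projection at a latitude."""
    return 1.0 / math.cos(math.radians(lat_deg))


def mercator_area_inflation(lat_deg):
    """Factor by which small areas are exaggerated at a latitude."""
    return mercator_scale_factor(lat_deg) ** 2


for lat in (0, 30, 45, 60):
    print(
        f"lat {lat:>2}: lengths x{mercator_scale_factor(lat):.2f}, "
        f"areas x{mercator_area_inflation(lat):.2f}"
    )
```

At 60 degrees latitude, lengths are doubled and areas quadrupled relative to the equator, which is why area or distance calculations belong in an equal-area or local projected CRS rather than in a Mercator-based one.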

10. EPSG.io --- Coordinate Reference System Lookup The website epsg.io provides a searchable database of every EPSG code: type it in, see the projection parameters, coverage area, and units. When a colleague says "use EPSG:5070," you can look it up and see that it is the NAD83 / Conus Albers projection, covering the contiguous US, with units in meters. Bookmark this site. You will use it every time you work with a new CRS.


Feature Engineering and Spatial Analysis

11. "Location, Location, Location: The 3L Approach to Geospatial Feature Engineering" --- Cate Huston (2019, blog post) A concise blog post (available on Medium) that introduces three categories of spatial features: (1) Location itself (coordinates, region labels), (2) Linkage (spatial joins, nearest-neighbor lookups), and (3) Landscape (density, clustering, spatial context). This three-part taxonomy lines up closely with the four feature types in Part 5 of this chapter. The post includes practical Python examples and is written for data scientists, not GIS professionals.

12. "Geospatial Analysis: A Comprehensive Guide" --- Michael de Smith, Michael Goodchild, and Paul Longley (6th edition, free online) A comprehensive online reference at spatialanalysisonline.com covering point pattern analysis, surface analysis, network analysis, and geostatistics. The treatment of spatial autocorrelation (Moran's I, Geary's C) is relevant for understanding whether geographic patterns in your data are statistically significant or could arise from random variation. The section on location-allocation models (Chapter 7) directly relates to the ShopSmart FC optimization problem --- it covers methods for optimally placing facilities to serve demand.
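
As a minimal sketch of what Moran's I measures, the code below implements the standard formula I = (n / S0) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2, where S0 is the sum of all weights. The four regions arranged in a line and their values are hypothetical, and the sketch computes only the statistic, not its significance.

```python
def morans_i(values, weights):
    """Moran's I for a list of values and a dict {(i, j): w_ij} of spatial weights."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(weights.values())  # sum of all weights
    cross = sum(w * dev[i] * dev[j] for (i, j), w in weights.items())
    variance_sum = sum(d * d for d in dev)
    return (n / s0) * (cross / variance_sum)


def chain_weights(n):
    """Binary adjacency for n regions in a row: each region neighbors the next."""
    weights = {}
    for i in range(n - 1):
        weights[(i, i + 1)] = 1.0
        weights[(i + 1, i)] = 1.0
    return weights


w = chain_weights(4)

clustered = morans_i([1, 2, 3, 4], w)    # similar values sit next to each other
alternating = morans_i([1, 4, 1, 4], w)  # neighbors are as different as possible

print(f"clustered:   {clustered:.3f}")
print(f"alternating: {alternating:.3f}")
```

A positive value indicates that neighbors tend to be similar, a negative value that they tend to differ. Judging whether an observed value could arise by chance requires a reference distribution, typically built by permuting the values across locations, which dedicated libraries such as PySAL provide.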


Geocoding

13. Geopy Documentation Geopy (geopy.readthedocs.io) is the Python library for geocoding and reverse geocoding. It provides a unified interface to many geocoding services, including Nominatim (OpenStreetMap, free), Google Maps, Mapbox, HERE, and ArcGIS. The documentation covers rate limiting, error handling, and batch geocoding. The RateLimiter utility is essential for staying within the usage policies of free services such as Nominatim and avoiding a ban. For US addresses, the US Census geocoder (a separate service, described in the next entry) is free and has no published rate limit.
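
The idea behind rate limiting can be sketched in a few lines of standard-library Python. This is not geopy's implementation: geopy ships its own RateLimiter (in geopy.extra.rate_limiter, with a min_delay_seconds parameter) that also handles retries. The class below, its name, and the fake_geocode stand-in are hypothetical and exist only to show the mechanism.

```python
import time


class MinDelayLimiter:
    """Wrap a callable so successive calls are at least min_delay_seconds apart."""

    def __init__(self, func, min_delay_seconds=1.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.func = func
        self.min_delay_seconds = min_delay_seconds
        self._clock = clock    # injectable so the logic can be tested without waiting
        self._sleep = sleep
        self._last_call = None

    def __call__(self, *args, **kwargs):
        if self._last_call is not None:
            elapsed = self._clock() - self._last_call
            wait = self.min_delay_seconds - elapsed
            if wait > 0:
                self._sleep(wait)  # pause only for the remaining time
        self._last_call = self._clock()
        return self.func(*args, **kwargs)


def fake_geocode(address):
    """Hypothetical stand-in for a real geocoding call; returns a dummy coordinate."""
    return (0.0, 0.0)


geocode = MinDelayLimiter(fake_geocode, min_delay_seconds=1.0)
first = geocode("100 Example St, Springfield")  # the first call never waits
print(first)
```

In a real pipeline the wrapped function would be a geocoder's geocode method, and the wrapper would be applied to a column of addresses so that every request respects the service's minimum delay.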

14. US Census Geocoder The Census Bureau provides a free batch geocoding service at geocoding.geo.census.gov. Upload a CSV of up to 10,000 addresses and receive latitude, longitude, and census geography (state, county, tract, block) for each. No API key required, no rate limit. For US-based data science projects, this is the best free geocoding option. The geographic enrichment (county and census tract assignment) is a free spatial join that would otherwise require separate boundary files.
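
Because the batch service accepts at most 10,000 addresses per file, larger datasets have to be split before upload. The sketch below shows one way to do that with the standard library. The column order (unique ID, street, city, state, ZIP, with no header row) is an assumption to verify against the Census Bureau's batch documentation, and the addresses are synthetic placeholders.

```python
import csv
import io

MAX_BATCH_ROWS = 10_000  # per-file limit of the batch service


def chunk(records, size=MAX_BATCH_ROWS):
    """Yield consecutive slices of at most `size` records."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def to_batch_csv(records):
    """Serialize records as CSV text (assumed layout: id, street, city, state, zip)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for r in records:
        writer.writerow([r["id"], r["street"], r["city"], r["state"], r["zip"]])
    return buffer.getvalue()


# Synthetic placeholder addresses, not real data
records = [
    {"id": i, "street": f"{i} Example St", "city": "Springfield",
     "state": "IL", "zip": "62701"}
    for i in range(1, 25_001)
]

batches = [to_batch_csv(part) for part in chunk(records)]
print(len(batches), [len(b.splitlines()) for b in batches])
```

Keeping a unique ID in every row makes it straightforward to join the returned coordinates and census geography back onto the original table after all batches have been processed.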


Spatial Data Science at Scale

15. "Spatial Data Science: With Applications in R" --- Edzer Pebesma and Roger Bivand (2023) Written for R users, but the concepts are language-agnostic. Pebesma and Bivand cover the theory behind spatial operations: geometric predicates, spatial weights, spatial regression, and geostatistical interpolation. The chapter on spatial weights matrices (defining which observations are "neighbors") is important context for spatial feature engineering: the leave-one-out state churn rate in the StreamFlow case study is one approach, but spatial weights matrices offer a more principled framework. Available free online at r-spatial.org/book/.
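
A spatial weights matrix can be as simple as "each observation's k nearest neighbors, equally weighted." The sketch below builds such weights and uses them to compute a spatially lagged feature, that is, the average of a variable over each observation's neighbors, excluding the observation itself. It assumes coordinates are already in a projected (planar) CRS so that Euclidean distance is meaningful; the points, values, and function names are hypothetical.

```python
import math


def knn_weights(coords, k):
    """Row-standardized k-nearest-neighbor weights: {i: {j: 1/k, ...}}."""
    weights = {}
    for i, origin in enumerate(coords):
        others = [j for j in range(len(coords)) if j != i]
        # Sort the other points by planar distance and keep the k closest
        nearest = sorted(others, key=lambda j: math.dist(origin, coords[j]))[:k]
        weights[i] = {j: 1.0 / k for j in nearest}
    return weights


def spatial_lag(values, weights):
    """Weighted average of neighbors' values for each observation."""
    return [
        sum(w * values[j] for j, w in weights[i].items())
        for i in range(len(values))
    ]


# Hypothetical projected coordinates and a per-location rate
coords = [(0, 0), (1, 0), (0, 1), (10, 10)]
rates = [0.10, 0.20, 0.30, 0.90]

w = knn_weights(coords, k=2)
lagged = spatial_lag(rates, w)

for point, rate, lag in zip(coords, rates, lagged):
    print(point, f"own rate {rate:.2f}", f"neighbor mean {lag:.2f}")
```

Because each observation is excluded from its own neighbor set, the lagged value does not leak the observation's own target into the feature, which is the same concern the leave-one-out construction addresses. Other weighting schemes (contiguity, distance bands, inverse distance) plug into the same spatial_lag step.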


Return to the chapter for full context.