Further Reading: Multi-Variable Exploration
Tier 1: Essential Reading
The seaborn Multi-Panel Tutorial. seaborn.pydata.org/tutorial/axis_grids.html
The official seaborn tutorial covering FacetGrid, PairGrid, and JointGrid. It is the most direct companion to the grid-based APIs used in this chapter.
Anscombe, Francis J. "Graphs in Statistical Analysis." The American Statistician 27, no. 1 (1973): 17-21. The original paper introducing Anscombe's quartet. Short (five pages), freely available through academic libraries, and a foundational text for the argument that summary statistics are not enough. Read alongside Case Study 1.
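A quick numerical check makes the paper's argument concrete. The sketch below hard-codes Anscombe's four published datasets and computes, using only the Python standard library, the statistics they share: means, sample variances, Pearson correlation, and the least-squares line. The helper name `describe` is illustrative.

```python
import math

# Anscombe's quartet (Anscombe 1973). Sets I-III share the same x values.
X_123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X_123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def describe(xs, ys):
    """Means, sample variances, Pearson r, and least-squares line y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {
        "mean_x": mx,
        "mean_y": my,
        "var_x": sxx / (n - 1),
        "var_y": syy / (n - 1),
        "r": sxy / math.sqrt(sxx * syy),
        "slope": slope,
        "intercept": my - slope * mx,
    }


stats = {name: describe(xs, ys) for name, (xs, ys) in QUARTET.items()}
for name, s in stats.items():
    print(name, {key: round(value, 3) for key, value in s.items()})
```

All four sets give essentially the same numbers: mean x of 9, mean y of about 7.5, x variance of 11, y variance of about 4.12 to 4.13, correlation of about 0.816, and a fitted line of about y = 3.00 + 0.500x. Only a plot shows that set I is roughly linear, set II is curved, set III is a tight line with one outlier, and set IV is a vertical stack of points plus one high-leverage point.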
Matejka, Justin, and George Fitzmaurice. "Same Stats, Different Graphs: Generating Datasets with Varied Appearance and Identical Statistics through Simulated Annealing." Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems (2017): 1290-1294. The Datasaurus Dozen paper. Introduces a simulated-annealing method for generating datasets that share the same summary statistics but have very different shapes. Freely available from Autodesk Research.
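The core idea of the paper's method can be sketched in a few dozen lines. This is a simplified sketch, not the authors' code. It assumes a single target shape (a circle), moves one point per iteration, treats the statistics as unchanged when they agree to two decimal places, and uses a linear cooling schedule. The function names, step size, temperature values, and starting data are all illustrative choices.

```python
import math
import random


def summary_stats(xs, ys):
    """Means, sample standard deviations, and Pearson r, rounded to 2 decimals."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / (n - 1))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / (n - 1))
    r = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / ((n - 1) * sx * sy)
    return tuple(round(v, 2) for v in (mx, my, sx, sy, r))


def circle_error(xs, ys, cx, cy, radius):
    """Mean distance of the points from the target circle (lower = better fit)."""
    return sum(abs(math.hypot(x - cx, y - cy) - radius) for x, y in zip(xs, ys)) / len(xs)


def anneal_toward_circle(xs, ys, cx, cy, radius, iterations=20000, step=0.05, seed=0):
    """Nudge points toward a circle while keeping summary_stats unchanged."""
    rng = random.Random(seed)
    xs, ys = list(xs), list(ys)
    target = summary_stats(xs, ys)
    for it in range(iterations):
        # Assumed linear cooling: temperature falls from 0.4 to 0.01.
        temperature = 0.4 - 0.39 * it / iterations
        i = rng.randrange(len(xs))
        old_x, old_y = xs[i], ys[i]
        new_x, new_y = old_x + rng.gauss(0, step), old_y + rng.gauss(0, step)
        old_d = abs(math.hypot(old_x - cx, old_y - cy) - radius)
        new_d = abs(math.hypot(new_x - cx, new_y - cy) - radius)
        # Take moves that improve the fit, and some worse moves while still "hot".
        if new_d < old_d or rng.random() < temperature:
            xs[i], ys[i] = new_x, new_y
            # Undo the move if any statistic changed at two-decimal precision.
            if summary_stats(xs, ys) != target:
                xs[i], ys[i] = old_x, old_y
    return xs, ys


# Illustrative starting data: an uncorrelated Gaussian cloud of 40 points.
data_rng = random.Random(42)
start_x = [data_rng.gauss(0, 1) for _ in range(40)]
start_y = [data_rng.gauss(0, 1) for _ in range(40)]
cx, cy = sum(start_x) / 40, sum(start_y) / 40
# Radius chosen to match the cloud's spread (RMS distance from the mean).
radius = math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in zip(start_x, start_y)) / 40)

end_x, end_y = anneal_toward_circle(start_x, start_y, cx, cy, radius)
print("same stats:", summary_stats(start_x, start_y) == summary_stats(end_x, end_y))
print("circle error before/after:",
      round(circle_error(start_x, start_y, cx, cy, radius), 3),
      round(circle_error(end_x, end_y, cx, cy, radius), 3))
```

The statistics check runs after every accepted move, so the printed stats are identical by construction. The temperature term lets early iterations accept some moves that worsen the fit, which is what distinguishes simulated annealing from a purely greedy search.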
Tier 2: Recommended Specialized Sources
Eisen, Michael B., Paul T. Spellman, Patrick O. Brown, and David Botstein. "Cluster Analysis and Display of Genome-Wide Expression Patterns." Proceedings of the National Academy of Sciences 95, no. 25 (1998): 14863-14868. The founding paper for gene expression heatmaps. Establishes the visual conventions (dendrograms, red-green colormap, matrix layout) that are still standard in molecular biology. Read alongside Case Study 2.
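The row reordering behind this kind of display can be sketched with SciPy's hierarchical clustering module. This sketch assumes Pearson correlation distance (1 minus r) and average linkage as stand-ins; the paper defines its own correlation-based similarity measure. The matrix below is hypothetical toy data, not real expression values.

```python
import numpy as np
from scipy.cluster.hierarchy import leaves_list, linkage

# Hypothetical toy expression matrix: 6 genes (rows) x 5 conditions (columns).
expression = np.array([
    [1.0, 2.0, 3.0, 4.0, 5.0],  # gene 0: rising
    [5.0, 4.0, 3.0, 2.0, 1.0],  # gene 1: falling
    [2.0, 3.0, 4.0, 5.0, 6.5],  # gene 2: rising
    [6.0, 5.0, 4.0, 3.0, 2.2],  # gene 3: falling
    [0.0, 1.1, 2.0, 3.0, 4.0],  # gene 4: rising
    [4.0, 3.0, 2.1, 1.0, 0.0],  # gene 5: falling
])

# Average-linkage clustering of rows, with distance = 1 - Pearson correlation.
Z = linkage(expression, method="average", metric="correlation")

# Leaf order of the dendrogram: genes in the same subtree end up adjacent.
row_order = leaves_list(Z)
reordered = expression[row_order]

print("row order:", row_order.tolist())
print(reordered)
```

In seaborn, `sns.clustermap(expression, method="average", metric="correlation")` performs this kind of reordering (on columns as well as rows, by default) and draws the dendrograms alongside the heatmap.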
Wickham, Hadley, and Dianne Cook. "Graphical Exploration of Multivariate Data." Chapter in Handbook of Computational Statistics, 2012. A review of multivariate visualization techniques with emphasis on the grand tour, parallel coordinate plots, and scatterplot matrices. Freely available online.
Tufte, Edward R. Envisioning Information. Graphics Press, 1990. Tufte's second book devotes a chapter to "small multiples," the principle underlying the pair plot. Its chapter "Escaping Flatland" is a theoretical foundation for multi-variable visualization.
Unwin, Antony. Graphical Data Analysis with R. CRC Press, 2015. A practical guide to exploratory data analysis with R that covers pair plots, heatmaps, and cluster maps with extensive examples. The principles transfer directly to seaborn.
Few, Stephen. "Heatmaps: More Than Meets the Eyes." Perceptual Edge Visual Business Intelligence Newsletter, 2014. A critical look at heatmaps and when they work versus when they mislead. Useful for understanding the perceptual pitfalls of color-based displays.
Tier 3: Tools and Online Resources
| Resource | URL / Source | Description |
|---|---|---|
| sns.pairplot documentation | seaborn.pydata.org/generated/seaborn.pairplot.html | Official API reference with all parameter details. |
| sns.jointplot documentation | seaborn.pydata.org/generated/seaborn.jointplot.html | Official API reference for joint plots. |
| sns.heatmap documentation | seaborn.pydata.org/generated/seaborn.heatmap.html | Official API reference for heatmaps. |
| sns.clustermap documentation | seaborn.pydata.org/generated/seaborn.clustermap.html | Official API reference for cluster maps including all clustering parameters. |
| scipy.cluster.hierarchy | docs.scipy.org/doc/scipy/reference/cluster.hierarchy.html | SciPy's hierarchical clustering module, which seaborn's clustermap uses internally. |
| The Datasaurus Dozen | autodesk.com/research/publications/same-stats-different-graphs | The original datasets and paper, freely downloadable. |
| Cluster/TreeView | rana.lbl.gov/EisenSoftware.htm | Michael Eisen's original gene expression clustering tool, still downloadable and runnable. |
| ComplexHeatmap (R) | bioconductor.org/packages/ComplexHeatmap | A more advanced heatmap package for R with features beyond seaborn's clustermap. |
| Datasaurus Dozen Python package | pypi.org/project/datasaurus-dozen | Python package that loads the Datasaurus Dozen datasets for easy plotting. |
A note on reading order: If you read only one additional source, make it Anscombe's 1973 paper. It is five pages long, freely available, and directly relevant to the central warning of this chapter: that correlation coefficients can mask wildly different relationships. For historical context on cluster maps, follow up with Eisen et al.'s 1998 paper. For ongoing practice, the seaborn documentation's tutorial pages are the best reference; bookmark them and return to them when you need to look up a specific parameter.