Further Reading: Multi-Variable Exploration


Tier 1: Essential Reading

The seaborn Multi-Panel Tutorial (seaborn.pydata.org/tutorial/axis_grids.html). The official seaborn tutorial covering FacetGrid, PairGrid, and JointGrid. Essential as a working reference for the grid-based APIs used in this chapter.

Anscombe, Francis J. "Graphs in Statistical Analysis." The American Statistician 27, no. 1 (1973): 17-21. The original paper introducing Anscombe's quartet. Short (five pages), freely available through academic libraries, and a foundational text for the argument that summary statistics are not enough. Read alongside Case Study 1.

Matejka, Justin, and George Fitzmaurice. "Same Stats, Different Graphs: Generating Datasets with Varied Appearance and Identical Statistics through Simulated Annealing." Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems (2017): 1290-1294. The Datasaurus Dozen paper. Introduces the algorithmic method for generating datasets with identical statistics but different shapes. Freely available from Autodesk Research.


Tier 2: Recommended Reading

Eisen, Michael B., Paul T. Spellman, Patrick O. Brown, and David Botstein. "Cluster Analysis and Display of Genome-Wide Expression Patterns." Proceedings of the National Academy of Sciences 95, no. 25 (1998): 14863-14868. The founding paper for gene expression heatmaps. Establishes the visual conventions (dendrograms, red-green colormap, matrix layout) that are still standard in molecular biology. Read alongside Case Study 2.

Wickham, Hadley, and Dianne Cook. "Graphical Exploration of Multivariate Data." Chapter in Handbook of Computational Statistics, 2012. A review of multivariate visualization techniques with emphasis on the grand tour, parallel coordinate plots, and scatterplot matrices. Freely available online.

Tufte, Edward R. Envisioning Information. Graphics Press, 1990. Tufte's second book covers "small multiples" extensively — the principle underlying the pair plot. The chapter on escaping flatland is a theoretical foundation for multi-variable visualization.

Unwin, Antony. Graphical Data Analysis with R. CRC Press, 2015. A practical guide to exploratory data analysis with R that covers pair plots, heatmaps, and cluster maps with extensive examples. The principles transfer directly to seaborn.

Few, Stephen. "Heatmaps: More Than Meets the Eyes." Perceptual Edge Visual Business Intelligence Newsletter, 2014. A critical look at heatmaps and when they work versus when they mislead. Useful for understanding the perceptual pitfalls of color-based displays.


Tier 3: Tools and Online Resources

| Resource | URL / Source | Description |
|---|---|---|
| sns.pairplot documentation | seaborn.pydata.org/generated/seaborn.pairplot.html | Official API reference with all parameter details. |
| sns.jointplot documentation | seaborn.pydata.org/generated/seaborn.jointplot.html | Official API reference for joint plots. |
| sns.heatmap documentation | seaborn.pydata.org/generated/seaborn.heatmap.html | Official API reference for heatmaps. |
| sns.clustermap documentation | seaborn.pydata.org/generated/seaborn.clustermap.html | Official API reference for cluster maps including all clustering parameters. |
| scipy.cluster.hierarchy | docs.scipy.org/doc/scipy/reference/cluster.hierarchy.html | Scientific Python's hierarchical clustering module, used internally by seaborn's clustermap. |
| The Datasaurus Dozen | autodesk.com/research/publications/same-stats-different-graphs | The original datasets and paper, freely downloadable. |
| Cluster/TreeView | rana.lbl.gov/EisenSoftware.htm | Michael Eisen's original gene expression clustering tool, still downloadable and runnable. |
| ComplexHeatmap (R) | bioconductor.org/packages/ComplexHeatmap | A more advanced heatmap package for R with features beyond seaborn's clustermap. |
| Datasaurus Dozen Python package | pypi.org/project/datasaurus-dozen | Python package that loads the Datasaurus Dozen datasets for easy plotting. |

A note on reading order: If you want one additional source, read Anscombe's 1973 paper. It is five pages long, available through academic libraries, and directly relevant to the central warning of this chapter: that correlation coefficients can mask wildly different relationships. For historical context on cluster maps, follow up with Eisen et al.'s 1998 paper. For ongoing practice, the seaborn documentation's tutorial pages are the best reference. Bookmark them and return when you need to look up a specific parameter.
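To make the warning about correlation coefficients concrete before you open the paper, here is a minimal sketch that checks the quartet's summary statistics using only the Python standard library. The data values are the ones published in Anscombe's paper. The helper name `pearson_r` and the `summary` dictionary are illustrative choices made for this sketch, not part of seaborn or any other library.

```python
from math import sqrt
from statistics import mean

# Anscombe's quartet, values as published in the 1973 paper.
x_shared = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x_shared,
          [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x_shared,
           [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x_shared,
            [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson_r(xs, ys):
    """Pearson correlation coefficient (hypothetical helper for this sketch)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Mean of y and correlation for each dataset, rounded to two decimals.
summary = {
    name: (round(mean(ys), 2), round(pearson_r(xs, ys), 2))
    for name, (xs, ys) in quartet.items()
}

for name, (y_mean, r) in summary.items():
    print(f"Dataset {name}: mean(y) = {y_mean}, r = {r}")
```

All four datasets report the same mean and the same correlation at two decimal places, yet a scatter plot of each looks entirely different, which is why the chapter insists on plotting before trusting a coefficient.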