Learning Objectives
- Create pair plots with sns.pairplot() to explore all pairwise relationships simultaneously
- Customize the diagonal and off-diagonal panels of a pair plot for specific analytical questions
- Create joint plots that combine a bivariate view with marginal distributions
- Build correlation heatmaps with annotation, masking, and diverging colormaps
- Create cluster maps with hierarchical clustering and interpret the resulting dendrogram
- Use the PairGrid and JointGrid lower-level APIs for custom multi-variable layouts
- Interpret the visual patterns in multi-variable displays: clusters, outliers, correlations, non-linear relationships
In This Chapter
- 19.1 The Multi-Variable Problem
- 19.2 Shneiderman's Mantra, Revisited
- 19.3 sns.pairplot: The All-Pairs Overview
- 19.4 PairGrid: The Lower-Level API
- 19.5 sns.jointplot: The Bivariate Plus Marginals
- 19.6 JointGrid: The Lower-Level API
- 19.7 sns.heatmap: The Correlation Matrix
- 19.8 Masking the Upper Triangle
- 19.9 Colormaps for Heatmaps
- 19.10 sns.clustermap: Hierarchical Clustering with a Heatmap
- 19.11 Reading a Cluster Map
- 19.12 Multi-Variable Pitfalls
- 19.13 Progressive Project: Climate Multi-Variable Exploration
- 19.14 A Note on Variable Selection
- 19.15 Check Your Understanding
- 19.16 Chapter Summary
- 19.17 Spaced Review
Chapter 19: Multi-Variable Exploration — PairPlot, JointPlot, Heatmaps, and Cluster Maps
"The greatest value of a picture is when it forces us to notice what we never expected to see." — John Tukey, Exploratory Data Analysis (1977)
19.1 The Multi-Variable Problem
Every chapter up to this point has handled one or two variables at a time. Chapter 17 examined the distribution of a single variable. Chapter 18 examined the relationship between two variables or a variable and a category. These are the building blocks of exploratory analysis, and they are sufficient for many tasks. But real datasets rarely contain only two columns.
A climate dataset has year, temperature anomaly, CO2 concentration, sea level, precipitation, ice extent, solar irradiance, ENSO index, aerosol loading, and a dozen other variables. A clinical trial has patient age, sex, dosage, baseline biomarkers, treatment outcomes, adverse events, and quality-of-life scores. A customer dataset has demographics, purchase history, engagement metrics, and satisfaction scores. When the data has ten, twenty, or fifty variables, the scatter plot no longer suffices. You cannot examine each pair of variables with a separate plot and keep track of the patterns. The combinatorics are against you: ten variables means forty-five pairwise relationships, and fifty variables means over a thousand.
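The arithmetic behind those counts is just "n choose 2"; a quick check:

```python
from math import comb

# Number of distinct variable pairs: n choose 2 = n * (n - 1) / 2.
for n in (10, 20, 50):
    print(n, "variables ->", comb(n, 2), "pairwise relationships")
```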
The multi-variable problem is not just that there are too many charts to make. The problem is that the patterns that matter are often not visible in any single pairwise chart. Clusters of variables move together; outliers affect multiple dimensions simultaneously; non-linear relationships become visible only when you look at the right combination. To see these patterns, you need tools that display many variables at once in a compact, scannable form.
This chapter covers the four main seaborn tools for multi-variable exploration. The pair plot displays all pairwise relationships in a dataset as a matrix of scatter plots, giving you an overview of the entire structure at a glance. The joint plot zooms in on one pair of variables and adds marginal distributions, letting you see both the relationship and the individual variable shapes. The correlation heatmap abstracts away the scatter plots and displays only the pairwise correlations as a colored matrix, allowing you to scan dozens of variables at once. The cluster map goes one step further by reordering the rows and columns through hierarchical clustering, grouping variables that behave similarly.
These four tools correspond to four levels of abstraction. The pair plot is the most detailed — you see the raw data. The joint plot is the most focused — you see two variables in depth. The heatmap is the most compact — you see a summary statistic for each pair. The cluster map is the most structured — you see the correlations and the groupings they imply. A typical exploratory analysis uses several of these tools in sequence. You start with a heatmap to spot which variables correlate strongly, follow up with a pair plot of the interesting subset, zoom in with a joint plot on the most important pair, and produce a cluster map to formalize the grouping structure.
This chapter does not introduce a new threshold concept. Instead, it applies and extends the techniques from Chapters 17 and 18 to a new context. The distributional tools from Chapter 17 (histograms, KDEs) reappear on the diagonal of the pair plot and in the marginal panels of the joint plot. The relational tools from Chapter 18 (scatter plots, regression overlays) reappear in the off-diagonal panels. What is new is the arrangement — the way seaborn packs many charts into one compact display.
19.2 Shneiderman's Mantra, Revisited
In Chapter 9 we discussed Ben Shneiderman's information-seeking mantra: Overview first, zoom and filter, details on demand. The mantra describes a general approach to exploring any information space, but it applies particularly well to multi-variable data.
Overview first. Before you investigate individual variables, get a sense of the whole dataset. How many variables are there? How do they relate? Which pairs correlate strongly, and which are independent? This is the job of the heatmap and the pair plot. A heatmap shows all correlations in a single image; a pair plot shows all pairwise scatter plots. Either gives you a big-picture view before you commit to examining any particular pair.
Zoom and filter. Once you know which variables are interesting, focus on them. If the heatmap shows a strong correlation between CO2 and temperature, the next step is to examine that pair in detail — not every pair. Filter the variables down to the ones that matter, and zoom in on the relationships that are worth understanding. The joint plot is the natural tool for this zoom step: it takes a single pair and examines it with the full distributional and relational machinery.
Details on demand. When a specific observation is suspicious — an outlier, a surprising cluster — investigate the underlying data. In a pair plot, you might notice that a few points are off to one side of every panel; those are multivariate outliers that deserve attention. The details-on-demand step usually happens outside the chart — you go back to the DataFrame and filter on the suspicious rows. But the chart is what told you to look.
The tools in this chapter map onto Shneiderman's three levels. The heatmap and cluster map are overview tools. The joint plot is a zoom tool. And the pair plot sits in between — it gives an overview through its structure (a grid of small multiples) but retains enough detail (the raw points) to serve as a zoom tool for moderate-sized datasets. Knowing which tool fits which step of your workflow is part of becoming fluent with seaborn.
A common mistake is to start with details-on-demand and never back up. A novice analyst opens a dataset and immediately plots the first two columns against each other, then the next two, and so on. This process is exhausting and rarely productive — you spend time on uninteresting pairs and miss the important ones. The mantra suggests a better sequence: start with the heatmap, identify what matters, and then drill down.
19.3 sns.pairplot: The All-Pairs Overview
The simplest multi-variable tool in seaborn is sns.pairplot. It takes a DataFrame and produces a grid of plots showing every pairwise combination of numeric columns. The diagonal shows the distribution of each individual variable; the off-diagonal shows the pairwise scatter plots.
The basic call is a single line:
import seaborn as sns
penguins = sns.load_dataset("penguins")
sns.pairplot(data=penguins)
This produces a 4×4 grid of panels (the penguins dataset has four numeric variables: bill_length_mm, bill_depth_mm, flipper_length_mm, body_mass_g). The diagonal panels are histograms of each variable; the off-diagonal panels are scatter plots of each variable against each other. In the time it would take to call scatterplot once, seaborn produces sixteen plots arranged into a scannable matrix.
The value of the pair plot is that it lets you see everything at once. You can spot a strong correlation (bill length vs. flipper length — the cloud is elongated along a diagonal), a weak correlation (bill depth vs. body mass — the cloud is rounder), and an unusual distribution (bill depth looks bimodal on the diagonal). You can also spot interesting structures that a single scatter would miss, like the way one variable separates into two clusters that a second variable distinguishes clearly.
The hue parameter maps a categorical variable to color, the same way it does in relplot and catplot:
sns.pairplot(data=penguins, hue="species")
Adding hue="species" transforms the display dramatically. The scatter plots now color each point by species, revealing that what looked like unstructured clouds are actually three well-separated groups. The diagonal panels switch from histograms to layered KDEs — one per species — showing which variables best discriminate between species. (Bill depth turns out to be nearly identical across Adélie and Chinstrap but different for Gentoo; flipper length separates Gentoo from the other two; and so on.)
The vars parameter restricts the display to a subset of variables:
sns.pairplot(data=penguins, vars=["bill_length_mm", "bill_depth_mm", "body_mass_g"], hue="species")
This produces a 3×3 grid instead of 4×4, which is useful when the full pair plot has too many panels to be readable. A rule of thumb is that pair plots with more than about eight variables become illegible — each panel is too small to reveal anything. If your dataset has many variables, the vars parameter lets you focus on the ones that matter, and the heatmap (Section 19.7) provides a more compact alternative.
The diag_kind parameter controls the diagonal panels. The default is "auto", which chooses histograms when there is no hue and KDEs when there is. You can force a particular choice:
sns.pairplot(data=penguins, hue="species", diag_kind="hist")
sns.pairplot(data=penguins, hue="species", diag_kind="kde")
The kind parameter controls the off-diagonal panels. The default is "scatter", but other options include "kde" (bivariate kernel density), "hist" (bivariate histogram, useful for large datasets where a scatter would overplot), and "reg" (scatter plus regression line).
sns.pairplot(data=penguins, hue="species", kind="reg")
The regression version adds a fitted line and confidence band to every off-diagonal panel, which is helpful when you want to see trend strength at a glance.
Finally, plot_kws and diag_kws let you pass keyword arguments down to the underlying plotting functions:
sns.pairplot(
data=penguins,
hue="species",
plot_kws={"alpha": 0.6, "s": 20},
diag_kws={"alpha": 0.5},
)
This reduces the marker size and adds transparency, which helps when the panels are crowded. The plot_kws argument passes through to scatterplot; the diag_kws argument passes through to histplot or kdeplot depending on the diagonal kind.
19.4 PairGrid: The Lower-Level API
sns.pairplot is a convenient wrapper around a lower-level class called PairGrid. The wrapper is fine for most uses, but sometimes you want more control — for example, a different plot type on the upper triangle than on the lower triangle, or a custom function in one of the panels. PairGrid exposes this flexibility.
The basic idea is that you construct a PairGrid object and then call one or more map_* methods to populate the panels. The most common methods are:
- map_diag(func, **kwargs) — apply a function to the diagonal panels
- map_offdiag(func, **kwargs) — apply a function to the off-diagonal panels
- map_upper(func, **kwargs) — apply a function to panels above the diagonal
- map_lower(func, **kwargs) — apply a function to panels below the diagonal
For example, to produce a pair plot with scatter plots on the lower triangle, KDE contours on the upper triangle, and histograms on the diagonal:
import seaborn as sns
g = sns.PairGrid(data=penguins, hue="species")
g.map_diag(sns.histplot)
g.map_lower(sns.scatterplot)
g.map_upper(sns.kdeplot)
g.add_legend()
Each panel now has a different visual encoding appropriate for its position. The lower triangle shows raw data; the upper triangle shows smoothed density contours; the diagonal shows marginal histograms. This layout is sometimes called a "matrix plot" or "correlation scatter matrix" and is a staple of exploratory analysis in R's GGally package.
The PairGrid API also supports the same vars, x_vars, and y_vars arguments as pairplot. If you set x_vars and y_vars to different subsets, you get a rectangular grid rather than a square one — useful for exploring how a target variable relates to a set of predictors:
g = sns.PairGrid(
data=penguins,
x_vars=["bill_length_mm", "bill_depth_mm", "flipper_length_mm"],
y_vars=["body_mass_g"],
hue="species",
)
g.map(sns.scatterplot)
g.add_legend()
This produces a 1×3 grid showing body mass against each of the three bill/flipper measurements. The output is essentially a specialized kind of small multiple: one chart, three facets, one dependent variable.
The verbose nature of PairGrid is a feature, not a bug. When pairplot does what you want, use pairplot. When you need asymmetric panels or custom functions, drop down to PairGrid. Both produce the same matplotlib.figure.Figure underneath, so you can save, embed, or customize them identically.
19.5 sns.jointplot: The Bivariate Plus Marginals
Where the pair plot zooms out, the joint plot zooms in. A joint plot takes a single pair of variables and displays them together with their marginal distributions. The central panel shows the relationship; the top and right panels show the individual distributions.
The basic call is:
sns.jointplot(data=penguins, x="bill_length_mm", y="bill_depth_mm")
The output is a compact three-panel display. The main (lower-left) panel is a scatter plot of bill_length_mm vs. bill_depth_mm. The top panel is a histogram of bill_length_mm. The right panel is a histogram of bill_depth_mm rotated 90 degrees. Together, the three panels give you a complete picture of the bivariate distribution: the relationship and both marginals.
This matters because the marginals contain information that the bivariate scatter does not always reveal. If bill_length_mm is bimodal, the scatter plot might not make this obvious — you would see two clusters along the x-axis but not know whether they reflect a bimodal x-distribution, a structural break in the y-variable, or both. The marginal histogram on top answers the question directly: yes, bill length is bimodal. The joint plot has told you something the bivariate scatter alone would not have.
Like jointplot's cousin pairplot, the joint plot is highly customizable. The kind parameter controls the main panel's plot type:
- "scatter" (default) — scatter plot in the center
- "kde" — bivariate KDE contours in the center
- "hex" — hexagonal bin plot, useful for large datasets
- "reg" — scatter plus regression line (and the marginals become KDE-smoothed histograms)
- "hist" — bivariate histogram
sns.jointplot(data=penguins, x="bill_length_mm", y="bill_depth_mm", kind="hex")
The hex version is particularly valuable for datasets with thousands of points. A scatter plot would suffer from overplotting (each point on top of the next), losing the density information. A hex plot bins the points into hexagonal cells and colors each cell by count, revealing density structure that the raw scatter would hide. The hex version is a two-dimensional analog of the histogram — each bin counts the points that fall inside it, and the color encodes the count.
The hue parameter works in joint plots too, splitting the center panel and marginals by a categorical variable:
sns.jointplot(data=penguins, x="bill_length_mm", y="bill_depth_mm", hue="species")
With hue, the center panel becomes a colored scatter and the marginals become layered KDEs. Each species gets its own color, its own contour, its own marginal curve. The joint plot has become a three-way comparison in a single compact display.
A word about the kind="reg" variant: it adds a fitted regression line to the main panel and replaces the marginal histograms with KDE-smoothed versions. The marginals show where the data is dense along each axis, which tells you where the fitted line is well supported and where it would be extrapolating. This variant is particularly useful when you want to combine the regression analysis of Chapter 18 with the marginal inspection of this chapter.
19.6 JointGrid: The Lower-Level API
Parallel to PairGrid, seaborn provides JointGrid as a lower-level API for joint plots. You construct a JointGrid and then call plot_joint and plot_marginals to populate the panels:
g = sns.JointGrid(data=penguins, x="bill_length_mm", y="bill_depth_mm", hue="species")
g.plot_joint(sns.scatterplot)
g.plot_marginals(sns.histplot, multiple="layer", alpha=0.5)
This produces the same layout as jointplot but gives you independent control over the two parts. For example, you might want a scatter plot in the main panel but rug plots in the marginals, rather than the defaults:
g = sns.JointGrid(data=penguins, x="bill_length_mm", y="bill_depth_mm")
g.plot_joint(sns.scatterplot, alpha=0.6)
g.plot_marginals(sns.rugplot, height=0.1)
Or you might want to mix plot types between the two marginals, which requires calling the axis-level functions directly on g.ax_marg_x and g.ax_marg_y:
import seaborn as sns
import matplotlib.pyplot as plt
g = sns.JointGrid(data=penguins, x="bill_length_mm", y="bill_depth_mm", hue="species")
g.plot_joint(sns.scatterplot)
sns.boxplot(data=penguins, x="bill_length_mm", y="species", ax=g.ax_marg_x)
sns.boxplot(data=penguins, y="bill_depth_mm", x="species", ax=g.ax_marg_y)
This replaces the marginal histograms with horizontal and vertical box plots, grouping by species. The result is a specialized display that no built-in seaborn function produces directly — you have composed it from pieces.
The JointGrid API is rarely necessary. Most joint plots are fine with the defaults. But when you need a custom marginal or a non-standard main panel, JointGrid lets you build exactly what you need without reinventing the layout.
19.7 sns.heatmap: The Correlation Matrix
A pair plot of thirty variables has nine hundred panels. Even at a tiny size, nine hundred panels do not fit on a readable page. The pair plot is an overview tool, but it breaks down when the number of variables gets large.
The correlation heatmap is the remedy. Instead of showing a scatter plot for each pair of variables, it shows a single number — the correlation coefficient — colored on a diverging scale. A correlation near +1 (strong positive) is deeply saturated in one direction, a correlation near -1 (strong negative) is deeply saturated in the other, and a correlation near zero is pale. The result is a compact, scannable image that can display dozens of variables at once.
To build a correlation heatmap, you first compute the correlation matrix with pandas and then pass it to sns.heatmap:
import pandas as pd
import seaborn as sns
numeric_cols = ["bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g"]
corr = penguins[numeric_cols].corr()
sns.heatmap(corr)
This produces a 4×4 square where each cell is colored by the correlation between the row variable and the column variable. The diagonal is the brightest color in the map (every variable has correlation 1 with itself), and the off-diagonal cells reflect the actual pairwise correlations.
The default colormap (rocket) is not ideal for correlation heatmaps. Correlations range from -1 to +1 with zero as a meaningful midpoint, so you want a diverging colormap — one that uses two different hues for the two signs of correlation and a neutral color at zero. The standard choices are coolwarm, RdBu_r, and seismic. You also want to set vmin=-1 and vmax=1 so the colormap's range matches the correlation's range, and center=0 so zero lands on the neutral color:
sns.heatmap(corr, cmap="coolwarm", vmin=-1, vmax=1, center=0)
Now the heatmap is interpretable at a glance. Red means positive correlation, blue means negative correlation, and white means near-zero. You can scan the whole matrix in a second or two and identify the strongest pairs.
Annotation makes the heatmap even more useful. The annot=True parameter writes the numeric correlation value into each cell, and the fmt parameter controls the number format:
sns.heatmap(corr, cmap="coolwarm", vmin=-1, vmax=1, center=0, annot=True, fmt=".2f")
With annotation, the heatmap serves double duty: it is a visual overview (the colors) and a precise numeric table (the numbers). Readers who want the rough picture can skim the colors; readers who want the exact values can read the numbers. The annotation format ".2f" means two decimal places; other common choices are ".1f" (one decimal) and ".0%" (percentage, no decimals).
For larger matrices, annotation becomes impractical — the numbers get too small to read. A rule of thumb is that annotation works well up to about ten variables and becomes cluttered beyond that. For large matrices, omit annot=True and let the colors carry the information.
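When the matrix is too big to annotate, one option is to pull the strongest pairs out as a sorted table instead; a sketch with synthetic data (the column names and the co-moving structure are made up for illustration):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
x = rng.normal(size=200)
df = pd.DataFrame({
    "a": x,
    "b": x + rng.normal(scale=0.1, size=200),  # tightly tied to "a"
    "c": rng.normal(size=200),                 # independent noise
})
corr = df.corr()

# Keep the upper triangle only (k=1 drops the diagonal), then rank by |r|.
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
pairs = upper.stack().abs().sort_values(ascending=False)
print(pairs.head())  # the ("a", "b") pair ranks first
```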
19.8 Masking the Upper Triangle
A correlation matrix is symmetric: the correlation between A and B equals the correlation between B and A. This means that half of the heatmap is redundant. The lower triangle shows the same information as the upper triangle, just transposed. Displaying both halves is not wrong, but it can clutter the display and make the diagonal harder to see.
The solution is masking. You hide one of the triangles (conventionally the upper one) so the heatmap shows only the unique correlations. To build a mask, use NumPy's triu function:
import numpy as np
mask = np.triu(np.ones_like(corr, dtype=bool))
sns.heatmap(corr, cmap="coolwarm", vmin=-1, vmax=1, center=0, annot=True, fmt=".2f", mask=mask)
np.ones_like(corr, dtype=bool) produces a matrix of True values with the same shape as corr, and np.triu keeps the upper triangle (including the diagonal), setting the lower triangle to False. The mask parameter then tells heatmap to skip the cells where the mask is True. The result is a lower-triangular heatmap with the upper half blanked out.
If you want to show the diagonal too (which is always 1.0 and adds a visual anchor), use np.triu(ones, k=1) instead, which excludes the diagonal from the mask:
mask = np.triu(np.ones_like(corr, dtype=bool), k=1)
The k=1 argument shifts the triangle up one row, so the diagonal is no longer masked. This is a matter of taste — some analysts prefer to see the diagonal as a visual reference, others prefer the cleaner look without it.
For a different convention (showing only the upper triangle instead of the lower), reverse the mask with np.tril:
mask = np.tril(np.ones_like(corr, dtype=bool), k=-1)
This masks the lower triangle, leaving the upper triangle visible. Either convention works; pick one and be consistent across your reports.
19.9 Colormaps for Heatmaps
The choice of colormap for a heatmap is not just an aesthetic decision — it is a perceptual one. A badly chosen colormap can mislead readers about the data.
For correlation heatmaps, the rules are:
- Use a diverging colormap, because correlation has a meaningful midpoint (zero).
- Center the colormap on zero, so the neutral color lands at "no correlation."
- Set vmin=-1 and vmax=1, so the full range of correlation is represented.
Diverging colormaps in seaborn (which come from matplotlib) include:
- "coolwarm" — blue to red through white. Very common for correlation.
- "RdBu_r" — the reversed RdBu palette: blue to red through white, with red positive and blue negative.
- "seismic" — a brighter, higher-contrast blue-white-red map.
- "vlag" — seaborn's own diverging palette, designed for perceptual uniformity.
For sequential data (where the value goes from low to high without a midpoint — for example, a confusion matrix or a pivot table of counts), use a sequential colormap instead:
- "viridis" — perceptually uniform, widely recommended
- "mako" — seaborn's dark-to-light palette
- "rocket" — seaborn's default heatmap palette
- "flare" — seaborn's light-to-dark orange palette
The critical distinction is the presence of a midpoint. Correlation has one (zero). Counts do not. Using a diverging colormap for counts would create a false midpoint — readers would assume that values near the center of the colormap are "baseline" and values at the extremes are "anomalous," which is not what the data says. Using a sequential colormap for correlation would hide the sign — positive and negative correlations would both look "large" if they had the same magnitude, and readers could not distinguish them.
A common mistake is to use seaborn's default rocket colormap for correlation. This is a sequential palette, so it does not distinguish positive from negative correlation — it maps low values to dark purple and high values to bright yellow, which obscures the sign. Always check that you have specified a diverging colormap for correlation data.
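For the sequential case, the heatmap call is the same and only the colormap family changes; a sketch with a synthetic count table (the row and column labels are placeholders):

```python
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)
counts = pd.DataFrame(rng.poisson(lam=20, size=(4, 5)),
                      index=list("abcd"), columns=list("vwxyz"))

# Sequential colormap: counts run low-to-high with no meaningful midpoint.
ax = sns.heatmap(counts, cmap="viridis", annot=True, fmt="d")
```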
19.10 sns.clustermap: Hierarchical Clustering with a Heatmap
A correlation heatmap shows which variables correlate with which others, but the order of the rows and columns is arbitrary — usually just the order they appear in the DataFrame. This arbitrariness means that variables which behave similarly might be far apart in the matrix, and you have to scan around to find them.
The cluster map solves this problem by reordering the rows and columns through hierarchical clustering. Variables that correlate strongly are placed near each other, so the strongest correlations cluster along the diagonal. The reordering is visualized through a dendrogram — a tree diagram on the top and/or left of the heatmap that shows the clustering structure.
The call is:
sns.clustermap(corr, cmap="coolwarm", vmin=-1, vmax=1, center=0, annot=True, fmt=".2f")
At first glance, a clustermap looks like a heatmap with trees attached. But the trees encode important information. Each leaf of the dendrogram is a variable. Leaves that are joined at a low height are similar to each other (they cluster early in the hierarchical process); leaves joined at a high height are dissimilar (they only merge at the top of the hierarchy). The height at which two branches meet is a measure of their similarity.
By looking at the dendrogram, you can identify groups of variables that behave together. If the dendrogram shows two long branches meeting at the top, with many variables on each branch, you have two distinct groups — two "factors" in a loose sense. If the dendrogram is more uniform, the variables are more independent.
The method parameter controls the hierarchical clustering algorithm:
- "average" (default) — uses the average distance between clusters when merging
- "single" — uses the nearest-neighbor distance (prone to chaining)
- "complete" — uses the farthest-neighbor distance
- "ward" — minimizes variance (often produces the most interpretable clusters)
sns.clustermap(corr, method="ward", cmap="coolwarm", center=0)
Different methods can produce different cluster orderings, so experiment to see which gives the most informative layout for your data. Ward's method is a solid default for correlation matrices.
The metric parameter controls the distance measure used between variables:
- "euclidean" (default) — straight-line distance between variable vectors
- "correlation" — distance based on correlation (1 minus the correlation)
- "cityblock" — Manhattan distance
For correlation heatmaps, metric="correlation" is often the most appropriate choice because you are already thinking about the data in terms of correlation. For other kinds of heatmaps (e.g., expression data in biology), Euclidean distance is more common.
19.11 Reading a Cluster Map
A cluster map has three parts: the heatmap in the center, the row dendrogram on the left, and the column dendrogram on top. (Some cluster maps show only one dendrogram if you have set row_cluster=False or col_cluster=False.) Each part encodes different information, and reading a cluster map well means understanding all three.
The heatmap is what you already know from Section 19.7. Each cell shows the correlation (or other statistic) between a row variable and a column variable. The colors are interpreted the same way as in a plain heatmap: red for positive, blue for negative, white for zero.
The dendrograms encode the clustering structure. Each dendrogram is a binary tree rooted at the top (or left). The leaves are the individual variables, and the internal nodes are cluster merges. The height of an internal node shows the distance (or dissimilarity) at which the two child clusters were merged. Short branches mean the two clusters were already similar when they merged; long branches mean the merge happened at a large distance.
To find the variables that behave together, look for groups of leaves that share a low internal node — that is, leaves that merge at a low height. These are the tight clusters. If you see three such groups in the dendrogram, your dataset likely has three "factors" or "themes" hidden in it.
The reordering. The cluster map does not change the correlation values — those come from the correlation matrix. What it changes is the order of rows and columns, placing similar variables adjacent to each other so the strong correlations cluster visually. The block-diagonal pattern that often emerges is a signal that clustering found real structure in the data: several groups of variables, each internally correlated, with weaker correlations between groups.
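The reordered positions are exposed on the ClusterGrid object that clustermap returns; a sketch with synthetic data in which two non-adjacent columns are built to co-move:

```python
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)
base = rng.normal(size=300)
df = pd.DataFrame({
    "a": base + rng.normal(scale=0.2, size=300),
    "c": rng.normal(size=300),
    "b": base + rng.normal(scale=0.2, size=300),  # co-moves with "a"
    "d": rng.normal(size=300),
})
corr = df.corr()

g = sns.clustermap(corr, cmap="coolwarm", vmin=-1, vmax=1, center=0)
# reordered_ind gives the leaf order chosen by the clustering.
order = [corr.index[i] for i in g.dendrogram_row.reordered_ind]
print(order)  # "a" and "b" end up adjacent after reordering
```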
Warning: dendrograms are not always meaningful. Hierarchical clustering always produces a tree, even if the data has no real cluster structure. A dendrogram that looks informative might be an artifact of the algorithm rather than a discovery. The test is whether the heatmap shows a corresponding pattern: if clustering reorders the rows and the result is a clean block-diagonal, you have real structure. If clustering reorders the rows and the result still looks random, the dendrogram is misleading and the data probably has no clear cluster structure.
A related pitfall is the label-shuffle illusion. A human reader will naturally look at the dendrogram and assume the clusters are "real" groups. This assumption is often wrong. Clustering finds the best hierarchical grouping given the algorithm and the distance metric, but "best" is defined by the algorithm, not by the data. A different metric or method could produce a completely different tree. Always cross-check cluster map findings with domain knowledge and additional analyses before drawing strong conclusions.
19.12 Multi-Variable Pitfalls
Every chapter has its pitfalls, and multi-variable visualization has several that are worth noting explicitly.
Correlation is not causation. This is the most famous pitfall in statistics, and it applies doubly to correlation heatmaps. A heatmap that shows two variables with correlation 0.9 tells you they move together. It does not tell you that one causes the other, that a third variable drives both, or that the relationship is stable. A heatmap is a first step in understanding a dataset; it is not the final word. When you see a strong correlation, the next step is to investigate it — look at the scatter plot, think about the causal mechanism, consider confounders. Skipping this step and assuming causation from correlation is the most common misuse of heatmaps.
Overplotting in pair plots. With large datasets (tens of thousands of points), a pair plot's off-diagonal scatter plots can become solid black clouds that reveal nothing. The solution is either to reduce the alpha (plot_kws={"alpha": 0.1}), switch to bivariate histogram panels (kind="hist"), or sample the data before plotting. A pair plot of a million points is useless even with alpha tricks — sample down to a few thousand points first.
Misleading cluster boundaries. As noted in Section 19.11, hierarchical clustering always produces a tree. The tree looks meaningful even when the underlying data has no real structure. Always validate clustering findings with another method (k-means, DBSCAN) or with domain knowledge. If your clustermap suggests three clusters but a k-means analysis with k=3 produces different groupings, the clustermap's tree is probably not capturing real structure.
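One way to run that cross-check without leaving scipy is to cut the hierarchical tree into a flat partition with fcluster and compare it against a k-means partition at the same k. The sketch below uses deliberately well-separated synthetic blobs, a best-case construction chosen so the two methods should agree:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.cluster.vq import kmeans2

# Synthetic data with three genuine blobs (a best-case construction for the demo)
rng = np.random.default_rng(1)
data = np.vstack([rng.normal(loc=c, scale=0.3, size=(30, 2)) for c in (0, 3, 6)])

# Cut the hierarchical tree into three flat clusters...
tree_labels = fcluster(linkage(data, method="ward"), t=3, criterion="maxclust")
# ...and compare against an independent k-means partition at k=3
_, km_labels = kmeans2(data, 3, minit="++")

# If the two partitions agree (up to relabeling), the structure is probably real
print(sorted(set(tree_labels)))  # [1, 2, 3]
```

On real data, disagreement between the two partitions is the signal to distrust the dendrogram.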
Correlation confounded by missing data. If two variables have mostly missing values, their pairwise correlation is computed from the small subset of rows where both are present. This subset is often unrepresentative, and the correlation can be very different from what you would see if all data were available. Always check the sample size behind each correlation — pandas' .corr() method computes pairwise correlations by default, which means each cell can be based on a different sample. Use .dropna() before computing the correlation matrix to ensure a consistent sample.
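The pairwise-deletion behavior is easy to see with a small construction. Here the missingness patterns of two columns barely overlap, so each cell of the correlation matrix would rest on a different number of rows:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(100, 3)), columns=["a", "b", "c"])
df.loc[:59, "a"] = np.nan   # 'a' observed only in the last 40 rows
df.loc[70:, "b"] = np.nan   # 'b' observed only in the first 70 rows

# How many rows back each cell of df.corr()? A different count per pair:
counts = {pair: int((df[pair[0]].notna() & df[pair[1]].notna()).sum())
          for pair in [("a", "b"), ("a", "c"), ("b", "c")]}
print(counts)  # {('a', 'b'): 10, ('a', 'c'): 40, ('b', 'c'): 70}

# Listwise deletion gives every cell the same (smaller) sample:
consistent = df.dropna()
print(len(consistent))  # 10
```

The a-b correlation would be computed from just 10 rows while b-c uses 70, and nothing in the heatmap itself warns you about the difference.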
Non-linear relationships hiding in linear correlations. The correlation coefficient measures linear relationships. Two variables can be perfectly related — for example, y = x² — and have zero correlation if the relationship is symmetric around the mean. A heatmap will report zero, suggesting no relationship, and you will miss the story. Always follow up strong correlations and suspicious zeros with pair plots or joint plots to confirm that the relationship shape matches what the coefficient suggests.
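The y = x² case takes three lines to verify. With x symmetric around zero, y is a perfect function of x, yet the correlation coefficient is numerically zero:

```python
import numpy as np

x = np.linspace(-3, 3, 101)  # symmetric around zero
y = x ** 2                   # perfect relationship, just not a linear one

r = np.corrcoef(x, y)[0, 1]
print(abs(r) < 1e-8)  # True: the linear coefficient reports "no relationship"
```

A scatter plot of these two variables shows an unmistakable parabola, which is exactly why the text recommends following the heatmap with pair plots or joint plots.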
Axis label collisions. With many variables, the row and column labels on a heatmap can overlap and become illegible. Use plt.xticks(rotation=45, ha="right") to tilt the labels, or shorten the variable names before passing them to the heatmap. seaborn does its best to produce readable labels, but cramped space defeats even the best layout code.
Diagonal dominance. In a correlation heatmap, the diagonal is always 1.0 — every variable correlates perfectly with itself. This can distort the perception of the colormap, because the diagonal is always maximally saturated. Some analysts mask the diagonal to avoid this:
import numpy as np
import seaborn as sns
mask = np.eye(corr.shape[0], dtype=bool)
sns.heatmap(corr, mask=mask, cmap="coolwarm", center=0, annot=True, fmt=".2f")
This blanks out the diagonal, letting the off-diagonal cells use the full colormap range without being visually dominated by the ones on the diagonal.
19.13 Progressive Project: Climate Multi-Variable Exploration
We return to the climate dataset — now an old friend — for the multi-variable treatment. Until now we have been plotting one or two variables at a time: temperature anomaly in Chapter 3, the full temperature time series with annotations in Chapter 9, CO2 vs. temperature with a regression in Chapter 18. This chapter is the first time we look at all the climate variables at once.
The dataset now has five columns: year, temperature_anomaly, co2_ppm, sea_level_mm, and era (a categorical label for pre-industrial, industrial, and modern periods). The question is: how do these variables relate to each other, and what does a multi-variable view reveal that a pairwise view does not?
Step 1 is the pair plot:
import seaborn as sns
sns.pairplot(data=climate, vars=["temperature_anomaly", "co2_ppm", "sea_level_mm", "year"], hue="era")
The output is a 4×4 grid of panels. The diagonal shows how each variable is distributed within each era. Temperature anomaly in pre-industrial times clusters near zero; in modern times it spreads from roughly +0.5 to +1.2. CO2 in pre-industrial times is flat around 285 ppm; in modern times it ranges from about 340 to 420. Sea level is near zero in pre-industrial times and ramps up sharply in the modern era. The year variable is just a uniform spread within each era by definition.
The off-diagonal panels show every pairwise relationship. Temperature vs. CO2 produces a near-linear cloud with all three eras overlaid — you can see the industrial era bridging the gap between pre-industrial and modern. Sea level vs. CO2 is also near-linear, which should not surprise anyone familiar with climate science. Year vs. each climate variable shows a monotonic rise in the modern era. The pair plot tells you, in one glance, that these four variables move together and that the eras form three distinct clusters in multi-dimensional space.
Step 2 is a joint plot to zoom in on the CO2-temperature relationship:
sns.jointplot(data=climate, x="co2_ppm", y="temperature_anomaly", kind="reg")
The center panel shows the regression line with confidence band. The marginals show the univariate distributions. This is the relationship that drives the climate conversation, and seeing it zoomed in — with the regression fit and the marginal shapes — reinforces what the pair plot hinted at. The marginal on top shows CO2 distribution (heavily right-skewed because of the industrial surge); the marginal on the right shows temperature anomaly distribution (also right-skewed for the same reason).
Step 3 is the correlation heatmap:
import numpy as np
numeric_cols = ["temperature_anomaly", "co2_ppm", "sea_level_mm", "year"]
corr = climate[numeric_cols].corr()
mask = np.triu(np.ones_like(corr, dtype=bool), k=1)
sns.heatmap(corr, mask=mask, cmap="coolwarm", vmin=-1, vmax=1, center=0, annot=True, fmt=".2f")
The heatmap is small (4 variables, 6 unique pairwise correlations) but already informative. Temperature-CO2 correlation is about 0.95; CO2-sea level is about 0.93; sea level-year is about 0.88. Every pair is strongly positively correlated. This is the signature of a multi-variable system that is driven by a single underlying process (in this case, the accumulation of human-emitted CO2 and its downstream consequences).
Step 4 is the cluster map:
sns.clustermap(corr, cmap="coolwarm", vmin=-1, vmax=1, center=0, annot=True, fmt=".2f", method="ward")
With only four variables, the dendrogram is simple — the four leaves merge into one big cluster because everything correlates with everything else. The clustermap is more informative when there are many variables to group, but even here it gives you a visual confirmation that all four variables behave as a single tight cluster.
The exercises for this chapter extend this project with a larger dataset that includes precipitation, ice extent, and solar irradiance — seven variables total. That version of the analysis produces a clustermap with two clear sub-clusters, separating the variables that track the anthropogenic signal (temperature, CO2, sea level) from natural drivers such as solar irradiance. Seeing this separation in the clustermap is a moment of clarity: the data itself, through pure hierarchical clustering on the correlation matrix, reveals the distinction between natural and human-driven climate factors.
19.14 A Note on Variable Selection
Every tool in this chapter scales poorly with the number of variables. A pair plot of four variables is readable. A pair plot of ten variables is cramped. A pair plot of twenty is illegible. The same scaling problem applies, with different constants, to heatmaps and cluster maps — correlations in a 50×50 matrix are hard to see without zooming in, and the annotations become too small to read at any reasonable figure size.
This scaling problem means that variable selection is part of multi-variable exploration, not a preprocessing step that happens before it. You cannot dump a hundred-column DataFrame into sns.pairplot and expect useful output. You have to decide which variables matter.
The usual approach is a two-stage workflow. Stage 1: run a correlation heatmap on all numeric variables to identify which pairs correlate strongly. This is fast, compact, and handles dozens of variables without strain. Stage 2: produce a pair plot of the small subset (five to eight variables) that the heatmap flagged as interesting. The pair plot now has few enough panels to be readable, and each panel is large enough to reveal the shape of the relationship.
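Stage 1 can be automated. A sketch of the filtering step, using a hypothetical 10-column frame in which three variables (v0, v1, v2 here) are strongly related and the rest are noise:

```python
import numpy as np
import pandas as pd

# Hypothetical wide frame standing in for "all numeric variables"
rng = np.random.default_rng(0)
base = rng.normal(size=200)
df = pd.DataFrame({f"v{i}": rng.normal(size=200) for i in range(10)})
df["v0"] = base
df["v1"] = base + rng.normal(scale=0.2, size=200)   # strongly tied to v0
df["v2"] = -base + rng.normal(scale=0.3, size=200)  # strongly tied, negative

# Stage 1: keep variables involved in at least one strong correlation
corr = df.corr().abs()
off_diag = corr.where(~np.eye(len(corr), dtype=bool))  # blank the trivial diagonal
interesting = corr.columns[(off_diag > 0.7).any()].tolist()
print(interesting)  # e.g. ['v0', 'v1', 'v2']

# Stage 2 (not run here): sns.pairplot(df[interesting]) on the short list
```

The 0.7 threshold is an arbitrary choice for the sketch; in practice you tune it to whatever leaves five to eight variables for the stage-2 pair plot.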
A complementary approach is domain-driven selection. Instead of letting correlations drive the choice, you pick variables because they are theoretically important — the key drivers, the outcomes, the confounders you care about. This approach avoids the trap of chasing spurious correlations that happen to be large in your sample, and it aligns the exploration with the question you are actually trying to answer.
Both approaches are legitimate, and they can be combined. Use the heatmap to surface unexpected correlations, then add in your theoretically important variables, then produce a pair plot of the combined set. The pair plot is the payoff — it shows you the raw shape of the relationships after the heatmap has done the initial filtering.
A final note: variable selection is about legibility, not about discarding information. The variables you leave out of a pair plot are not forgotten — they are explored separately, in other plots, or aggregated into derived features. Multi-variable exploration is iterative, and each chart answers one question. Do not try to answer every question in a single image.
19.15 Check Your Understanding
Before continuing to Chapter 20 (Plotly Express), make sure you can answer these questions:
- What does sns.pairplot display on the diagonal, and what does it display off the diagonal?
- When would you use PairGrid instead of sns.pairplot?
- What are the three panels of a joint plot, and what does each one show?
- When should you use a diverging colormap in a heatmap, and when should you use a sequential colormap?
- What is the purpose of masking the upper (or lower) triangle of a correlation heatmap?
- What does the dendrogram in a cluster map represent? How do you read it?
- What are three multi-variable pitfalls from Section 19.12, and how do you mitigate each one?
- Why does the correlation coefficient fail to detect non-linear relationships, and what visual tool should you use to verify relationship shape?
If any of these are unclear, re-read the relevant section. The next chapter moves from static multi-variable exploration to interactive visualization, and the tools you have learned here will reappear in a different form (Plotly's splom is a direct analog of the pair plot, for example).
19.16 Chapter Summary
This chapter introduced the four main seaborn tools for multi-variable exploration:
- sns.pairplot and PairGrid for all-pairs overviews
- sns.jointplot and JointGrid for bivariate-plus-marginal zoom-ins
- sns.heatmap for compact correlation displays
- sns.clustermap for heatmaps with hierarchical clustering and dendrograms
Each tool corresponds to a different level of abstraction. The pair plot shows raw data for every pair. The joint plot shows raw data plus marginals for one pair. The heatmap shows a summary statistic (usually correlation) for every pair. The cluster map shows that statistic plus the grouping structure learned by clustering.
A typical multi-variable exploration uses these tools in sequence: heatmap to spot interesting pairs, pair plot to see the raw data, joint plot to zoom in on the most important relationship, cluster map to formalize the grouping structure. Shneiderman's mantra — overview first, zoom and filter, details on demand — is the workflow that ties them together.
The chapter also discussed several pitfalls: correlation is not causation; large pair plots overplot; dendrograms can be misleading; missing data distorts correlations; non-linear relationships hide in linear coefficients; labels collide in crowded heatmaps. Awareness of these pitfalls is a prerequisite for using the tools responsibly.
Chapter 20 leaves the world of static visualization entirely and introduces Plotly Express, a library that produces interactive, web-native charts with a syntax similar to seaborn's. Many of the same chart types will reappear — pair plots, joint plots, heatmaps — but now with pan, zoom, hover, and brushing capabilities. The multi-variable exploration workflow of this chapter will be even more powerful when the charts respond to the reader's cursor.
19.17 Spaced Review
These questions reach back to earlier chapters to reinforce connections:
- From Chapter 9: What is Shneiderman's information-seeking mantra, and how does it map onto the tools in this chapter?
- From Chapter 17: What does a KDE on the diagonal of a pair plot show, and how does its bandwidth affect interpretation?
- From Chapter 18: How does a regression overlay in the off-diagonal of a pair plot compare to the regression line in a joint plot with kind="reg"?
- From Chapter 12: Why would you specify vmin and vmax when calling sns.heatmap, and what is the matplotlib concept underlying this parameter?
- From Chapter 2: A cluster map reorders rows and columns through hierarchical clustering. How does this reordering exploit the pre-attentive perception of proximity?
You have now completed the seaborn portion of this textbook (Part IV, Chapters 16–19). Seaborn gave you a declarative, DataFrame-first interface to statistical visualization, with the three function families (relational, distributional, categorical) and the multi-variable tools of this chapter covering the vast majority of practical plotting tasks. Part V (Chapters 20–24) leaves static visualization behind and introduces interactive tools: Plotly, Altair, and the Vega-Lite grammar. The multi-variable mindset you developed here will carry over directly — you will reproduce pair plots, heatmaps, and joint plots in Plotly, and then go beyond them with interactive features that static images cannot match.