Chapter 16: Statistical Visualization with seaborn

"The greatest value of a picture is when it forces us to notice what we never expected to see." — John Tukey


Chapter Overview

In Chapter 15, you built visualizations from scratch with matplotlib. You controlled every pixel: tick marks, colors, axis limits, font sizes. You learned that a plot is a hierarchy of Figure, Axes, and Artists, and you assembled each piece by hand. That power was important to develop. And you probably noticed that building a polished, multi-panel statistical graphic required... a lot of code.

Here is a question that took you perhaps 15 lines in matplotlib: "How does vaccination coverage vary across WHO regions, and what does the distribution look like within each region?"

In seaborn, that question becomes this:

import seaborn as sns

sns.catplot(data=df, x="region", y="coverage_pct",
            kind="violin", height=5, aspect=1.5)

Three lines of code, counting the import. You get a violin plot with kernel density estimation, an inner box marking the quartiles, and one violin per region: labeled, styled, and statistically meaningful. No manual binning. No loop over subgroups. No axis formatting boilerplate.

seaborn does not replace matplotlib. It sits on top of it. Every seaborn plot is a matplotlib Figure and Axes underneath, and you can always reach in and customize with the matplotlib tools you already know. What seaborn adds is a layer of statistical intelligence: it understands DataFrames, it knows how to compute distributions and regressions, and it encodes best practices about how to show those statistical relationships visually.

In this chapter, you will learn to:

  1. Create distribution plots (histograms, KDEs, rug plots) using displot and histplot to explore single-variable distributions
  2. Build categorical comparison charts (box plots, violin plots, swarm plots, bar plots) using catplot
  3. Construct relational plots (scatter, line) with hue, size, and style encodings using relplot
  4. Generate heatmaps and pair plots for multivariate exploration
  5. Customize seaborn output with palettes, themes, and FacetGrid for multi-panel layouts

16.1 seaborn's Philosophy: Statistics Meets Aesthetics

Why Another Plotting Library?

You might wonder: if matplotlib can do anything, why do we need seaborn?

The answer is the same reason you use pandas instead of writing loops over CSV rows. matplotlib is a general-purpose drawing library. It knows nothing about statistics, DataFrames, or common analytical patterns. seaborn is a statistical visualization library built specifically for data analysis. It knows that when you hand it a DataFrame column, you probably want a distribution. When you hand it two columns, you probably want a scatter plot with a regression line. When you hand it a categorical column and a numeric column, you probably want a comparison across groups.

Let us make this concrete. Suppose you want to create a scatter plot of GDP per capita versus vaccination coverage, with each WHO region shown in a different color, and a linear regression line per region. In matplotlib, that might look something like this:

import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots(figsize=(10, 6))

regions = df["region"].unique()
colors = plt.cm.Set2(np.linspace(0, 1, len(regions)))

for i, region in enumerate(regions):
    subset = df[df["region"] == region]
    ax.scatter(subset["gdp_per_capita"],
               subset["coverage_pct"],
               color=colors[i], label=region,
               alpha=0.6, s=30)
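    # Fit regression line on rows where both values are present
    # (dropping NaNs per column could misalign x and y)
    clean = subset[["gdp_per_capita", "coverage_pct"]].dropna()
    z = np.polyfit(clean["gdp_per_capita"],
                   clean["coverage_pct"], 1)
    p = np.poly1d(z)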
    x_range = np.linspace(subset["gdp_per_capita"].min(),
                          subset["gdp_per_capita"].max(), 100)
    ax.plot(x_range, p(x_range), color=colors[i])

ax.set_xlabel("GDP per Capita (USD)")
ax.set_ylabel("Vaccination Coverage (%)")
ax.set_title("GDP vs. Vaccination Coverage by Region")
ax.legend(title="Region")
plt.tight_layout()

That is about 20 lines, and it does not even include confidence bands around the regression lines. In seaborn:

sns.lmplot(data=df, x="gdp_per_capita",
           y="coverage_pct", hue="region",
           height=6, aspect=1.5)

Three lines. And the seaborn version includes confidence bands computed via bootstrapping. You get more information with less code.

This is not a trivial difference. When you are exploring a new dataset — trying 10 or 20 different views to understand the data — the difference between 3 lines and 20 lines per chart is the difference between exploring freely and getting bogged down in boilerplate.

What seaborn Knows That matplotlib Does Not

seaborn's intelligence shows up in several ways. First, it understands DataFrames natively. You pass data=df and then reference column names as strings, where plain matplotlib code typically extracts arrays first. Second, seaborn handles missing values gracefully. If the plotted columns contain NaN values, seaborn's statistical functions generally drop the incomplete rows before computing anything, whereas matplotlib's behavior varies by function and can leave gaps or distort a computed result. Third, seaborn computes statistical summaries automatically. It bins histograms, estimates KDE bandwidth, computes regression lines, and calculates confidence intervals, all without you writing a single line of statistical code.

This intelligence means seaborn makes smart defaults:

  • It computes bin widths for histograms automatically.
  • It estimates kernel density curves without you specifying bandwidth.
  • It adds confidence intervals to bar plots and regression lines.
  • It splits data by groups when you specify a hue variable.
  • It handles NaN values gracefully, generally dropping incomplete rows before plotting instead of crashing.

The Three Figure-Level Functions

seaborn organizes its plot types around three "figure-level" functions that each create a new Figure:

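Function   | Purpose                          | Plot Kinds
-----------|----------------------------------|---------------------------------------------
displot()  | Distribution of data             | histogram, KDE, ECDF (rug as an optional add-on)
relplot()  | Relationships between variables  | scatter, line
catplot()  | Categorical comparisons          | strip, swarm, box, violin, bar, count, point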

Each of these accepts a kind parameter that switches between subtypes. This means you only need to learn three function signatures and then swap kind to get entirely different visualizations.

There are also "axes-level" equivalents (like histplot, scatterplot, boxplot) that draw onto existing matplotlib Axes. We will use both, but the figure-level functions are the stars of this chapter because they seamlessly create multi-panel layouts with col and row parameters.

Setting Up

Let us establish our imports and load the vaccination dataset we have been using throughout the book:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_theme(style="whitegrid", palette="muted")

df = pd.read_csv("who_vaccination_data.csv")
print(df.shape)
df.head()

The set_theme() call applies seaborn's styling globally, even to plain matplotlib plots. The style parameter controls the background grid; palette sets the default color cycle. We will explore these options later in the chapter.
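If you do not have who_vaccination_data.csv at hand, a synthetic stand-in lets you run the examples anyway. The sketch below is an assumption-laden substitute, not the real dataset: the column names are taken from the examples in this chapter, but every value, region label, income-group label, and country code is made up.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)  # fixed seed so reruns match

# Hypothetical labels, invented for illustration.
regions = ["Africa", "Americas", "Europe", "South-East Asia"]
income_groups = ["Low", "Lower-middle", "Upper-middle", "High"]
years = list(range(2015, 2020))

rows = []
for i in range(40):  # 40 made-up countries
    region = regions[i % len(regions)]
    income = income_groups[rng.integers(len(income_groups))]
    gdp = float(rng.lognormal(mean=8.5, sigma=1.0))
    # Coverage loosely rises with log GDP, plus country-level noise.
    base = float(np.clip(40 + 5 * np.log(gdp) + rng.normal(0, 8), 5, 99))
    for year in years:
        rows.append({
            "country": f"C{i:02d}",
            "region": region,
            "income_group": income,
            "year": year,
            "gdp_per_capita": gdp,
            "health_expenditure": gdp * float(rng.uniform(0.03, 0.10)),
            "population": int(rng.integers(1_000_000, 200_000_000)),
            "literacy_rate": float(np.clip(rng.normal(80, 12), 20, 100)),
            "coverage_pct": float(np.clip(base + rng.normal(0, 3), 0, 100)),
        })

df = pd.DataFrame(rows)
print(df.shape)  # (200, 9)
```

Because the numbers are random, any pattern you see in plots of this stand-in says nothing about real vaccination coverage. It exists only so the code runs.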

A note about the import convention: import seaborn as sns is the universal standard, just as import pandas as pd is for pandas. The alias "sns" has a whimsical origin — it is a reference to Samuel Norman Seaborn, a character from the TV show The West Wing. You will see sns in every tutorial, book, and StackOverflow answer, so adopt it without question.

The Mental Model: seaborn on Top of matplotlib

Before we dive into specific plot types, let us be crystal clear about the architectural relationship between these libraries:

Your Code
    |
    v
seaborn (high-level: displot, catplot, relplot)
    |
    v
matplotlib (low-level: Figure, Axes, Artists)
    |
    v
Rendered Image (PNG, PDF, screen)

Every seaborn plot is a matplotlib Figure and Axes underneath. When you call sns.catplot(), seaborn creates a matplotlib Figure, adds Axes to it, draws on those Axes using matplotlib primitives, and returns a wrapper object (FacetGrid) that gives you convenient methods for customization. You can always reach through the wrapper to access the matplotlib objects directly.

This means two things. First, everything you learned in Chapter 15 still works. You can use plt.savefig(), ax.set_xlim(), fig.suptitle(), and every other matplotlib method on seaborn-generated plots. Second, when seaborn does not have a built-in option for something you want (a custom annotation, a specific axis transformation, a particular legend placement), you drop down to matplotlib. The transition is seamless because they share the same underlying objects.
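Here is a small sketch of "dropping down" to matplotlib from a seaborn plot. The data, the 90% reference line, and the title are all hypothetical. The point is only that the object seaborn returns exposes ordinary matplotlib Axes and Figure objects.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import pandas as pd
import seaborn as sns

# Hypothetical toy data, invented for illustration only.
toy = pd.DataFrame({
    "region": ["A", "A", "A", "B", "B", "B"],
    "coverage_pct": [60.0, 72.0, 81.0, 85.0, 90.0, 97.0],
})

g = sns.catplot(data=toy, x="region", y="coverage_pct", kind="box")

ax = g.ax          # the single matplotlib Axes inside the FacetGrid
fig = ax.figure    # the matplotlib Figure that owns that Axes

# From here on, everything is plain matplotlib from Chapter 15.
ax.set_ylim(0, 100)
ax.axhline(90, linestyle="--", color="gray")  # hypothetical target line
ax.set_ylabel("Vaccination Coverage (%)")
fig.suptitle("Coverage by region (toy data)")

# Save to an in-memory buffer here; pass a filename to write a file instead.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```

When a figure-level plot has several panels, g.axes holds the full array of Axes, and you can loop over it in the same way.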


16.2 Distribution Plots with displot

The first question you ask about any numeric variable is: What does the distribution look like? seaborn gives you displot() as the one-stop shop for univariate (and bivariate) distributions.

Histograms

sns.displot(data=df, x="coverage_pct", bins=30)

That single line creates a histogram with 30 bins, axis labels pulled from the column name, and seaborn's clean styling. Compare this to the matplotlib version from Chapter 15, where you specified ax.hist(), set labels, adjusted tick formatting, and tweaked the figure size separately.

You can layer a KDE curve on top:

sns.displot(data=df, x="coverage_pct",
            bins=30, kde=True)

The kde=True parameter overlays a kernel density estimate — a smooth curve that approximates the shape of the distribution. This is useful when the exact bin boundaries of a histogram can be misleading.

Kernel Density Estimation (KDE)

Sometimes you want only the smooth curve, without the blocky histogram bins:

sns.displot(data=df, x="coverage_pct", kind="kde")

A KDE plot estimates the probability density function of a continuous variable. Think of it as placing a tiny bell curve (a "kernel") at every data point, then summing them all together. The result is a smooth, continuous representation of where the data is concentrated.
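The "one bell curve per data point" idea can be written out directly in NumPy. This is a teaching sketch, not seaborn's implementation: the function name is made up, the data is synthetic, and the bandwidth uses Scott's rule of thumb as an assumption.

```python
import numpy as np

def gaussian_kde_sketch(samples, grid, bandwidth):
    """Average one Gaussian bump per sample, evaluated at each grid point."""
    samples = np.asarray(samples, dtype=float)
    # z has shape (len(grid), len(samples)): the distance from every grid
    # point to every sample, measured in bandwidth units.
    z = (grid[:, None] - samples[None, :]) / bandwidth
    bumps = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2 * np.pi))
    # Averaging (summing, then dividing by n) keeps the total area at 1.
    return bumps.mean(axis=1)

rng = np.random.default_rng(seed=1)
# Synthetic bimodal sample: two clusters of made-up coverage values.
samples = np.concatenate([rng.normal(60, 5, 150), rng.normal(90, 3, 150)])

# Scott's rule of thumb for the bandwidth (an assumption for this sketch).
bandwidth = samples.std(ddof=1) * len(samples) ** (-1 / 5)

grid = np.linspace(samples.min() - 4 * bandwidth,
                   samples.max() + 4 * bandwidth, 1000)
density = gaussian_kde_sketch(samples, grid, bandwidth)

# Riemann-sum check that the curve is a proper density.
area = float(np.sum(density) * (grid[1] - grid[0]))
print(f"area under the curve: {area:.3f}")
```

A larger bandwidth widens each bump and smooths the curve; a smaller one makes it wigglier. That trade-off mirrors the bin-width choice for histograms.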

When to use KDE vs. histograms:

  • Use histograms when you want to see the exact count in each bin and when your audience is less technical.
  • Use KDE when you want to compare distributions across groups (overlapping KDEs are easier to read than overlapping histograms).
  • Use both together when exploring data for yourself.

Splitting by Groups with hue

Here is where seaborn starts to shine. Suppose you want to see how coverage distributions differ by WHO region:

sns.displot(data=df, x="coverage_pct", hue="region",
            kind="kde", fill=True, alpha=0.4)

The hue parameter splits the data by the region column and draws a separate KDE for each group, automatically assigning distinct colors and adding a legend. In matplotlib, this would have required a loop over unique regions, manual color assignment, and legend construction.

The fill=True parameter fills the area under each curve, and alpha=0.4 makes them semi-transparent so you can see overlapping regions.

ECDF Plots

The empirical cumulative distribution function (ECDF) is another way to visualize distributions that avoids the binning problem entirely:

sns.displot(data=df, x="coverage_pct",
            hue="region", kind="ecdf")

An ECDF plot shows, for each value on the x-axis, what proportion of data points fall at or below that value. With the default settings the y-axis runs from 0 to 1. ECDF plots are excellent for comparing distributions because each group is a single rising line, so groups do not hide one another the way overlapping histograms or filled KDEs can.
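The ECDF is simple enough to compute by hand, which helps when reading the plot. The helper below is a hypothetical illustration in NumPy, not a seaborn function.

```python
import numpy as np

def ecdf(values):
    """Return sorted values and the proportion of data at or below each one."""
    x = np.sort(np.asarray(values, dtype=float))
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y

x, y = ecdf([3, 1, 2, 2])
print(x.tolist())  # [1.0, 2.0, 2.0, 3.0]
print(y.tolist())  # [0.25, 0.5, 0.75, 1.0]

# Reading a value off the curve: what share of the data is at or below 2?
share_at_or_below_2 = float(np.searchsorted(x, 2, side="right") / len(x))
print(share_at_or_below_2)  # 0.75
```

Note that no bin width or bandwidth appears anywhere in the computation, which is why the ECDF does not depend on those choices.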

The Rug Plot Add-On

You can add a rug plot — tiny tick marks along the axis showing each individual data point — to any distribution plot:

sns.displot(data=df, x="coverage_pct",
            kind="kde", rug=True)

Rug plots are especially useful when you have a moderate number of data points (under a few hundred) and want to see where the actual observations lie beneath the smoothed distribution.

Choosing Bin Width: Why It Matters

One of the subtlest decisions in histogram construction is choosing the number of bins. Too few bins and you lose detail — a bimodal distribution can appear unimodal if the two peaks fall within the same bin. Too many bins and you see noise — random variation creates jagged spikes that suggest patterns that do not exist.

seaborn's histplot and displot(kind="hist") default to bins="auto", a reference rule that NumPy resolves by taking whichever of the Sturges and Freedman-Diaconis estimates gives more bins. But defaults are not always optimal for your specific data. Compare:

fig, axes = plt.subplots(1, 3, figsize=(15, 4))
for i, bins in enumerate([10, 30, 100]):
    sns.histplot(df["coverage_pct"], bins=bins,
                 ax=axes[i])
    axes[i].set_title(f"{bins} bins")
plt.tight_layout()

With 10 bins, you see the broad shape but lose detail about where values cluster within each bin. With 30 bins, you see meaningful peaks and valleys. With 100 bins, the histogram becomes spiky and noisy, reflecting the specific sample rather than the underlying distribution. For most datasets, 20-40 bins is a reasonable starting point, but always experiment.
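To see what the rule-of-thumb bin selectors would pick before you plot, you can ask NumPy directly. The sketch below uses synthetic data and np.histogram_bin_edges, which accepts the same rule names ("sturges", "fd", "auto") that you can pass as bins in seaborn.

```python
import numpy as np

rng = np.random.default_rng(seed=2)
# Hypothetical coverage-like values, invented for illustration.
values = rng.normal(loc=80, scale=10, size=500)

bin_counts = {}
for rule in ["sturges", "fd", "auto"]:
    edges = np.histogram_bin_edges(values, bins=rule)
    bin_counts[rule] = len(edges) - 1  # n edges define n - 1 bins
    width = edges[1] - edges[0]
    print(f"{rule:>8}: {bin_counts[rule]} bins, width {width:.2f}")
```

Treat the result as a starting point. If the suggested count hides a feature you care about, or invents one, override it with an explicit bins or binwidth value.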

Combining Distribution Elements

You can combine histogram, KDE, and rug in a single plot for maximum information density:

sns.displot(data=df, x="coverage_pct",
            bins=25, kde=True, rug=True,
            height=5, aspect=1.5)

The histogram shows exact counts, the KDE shows the smooth shape, and the rug shows individual observations. This combination is particularly useful during initial exploration — you get three views of the same distribution in one chart. For presentations, you would typically choose just one (KDE for shape comparison, histogram for count emphasis).

Bivariate Distributions

displot() handles two-variable distributions too. Pass both x and y:

sns.displot(data=df, x="gdp_per_capita",
            y="coverage_pct", kind="kde")

This creates a contour plot showing where the joint distribution is densest. The innermost contours enclose the regions of highest density: clusters of countries with similar GDP and vaccination coverage.

You can also create a 2D histogram with displot:

sns.displot(data=df, x="gdp_per_capita",
            y="coverage_pct", kind="hist",
            bins=30, cbar=True)

This divides the space into a grid of rectangular bins and colors each bin by the count of observations within it. The cbar=True parameter adds a color bar showing the count scale. Bivariate histograms are particularly useful for large datasets where individual scatter plot points would overlap and create an unreadable blob.

When to Use Which Distribution Plot

A common source of confusion is having too many options. Here is a practical guide:

Use a histogram when:

  • You want to see exact counts per bin
  • Your audience is not statistically sophisticated (histograms are universally understood)
  • You are presenting a single distribution

Use KDE when:

  • You want to compare multiple distributions on the same axes (overlapping KDEs are clearer than overlapping histograms)
  • You want to emphasize the smooth shape of the distribution
  • You have enough data points (at least 50-100) for a reliable density estimate

Use ECDF when:

  • You want to compare distributions without the visual confusion of overlapping curves
  • You need to read off percentiles directly (the y-axis gives cumulative probability)
  • You want a representation that does not depend on bin width or bandwidth choices

Use rug when:

  • You want to show individual observations alongside another distribution plot
  • Your dataset is small to moderate (under a few hundred points)
  • You want to verify that the KDE or histogram is not masking gaps or clusters


16.3 Categorical Plots with catplot

When one of your variables is categorical (regions, income groups, vaccine types), you need specialized charts that show how a numeric variable behaves within each category.

The catplot Family

catplot() is the figure-level function for all categorical plots. The kind parameter selects the specific chart type:

kind      | Description                                     | Best For
----------|-------------------------------------------------|--------------------------------------------------
"strip"   | Jittered individual points                      | Small datasets, seeing every observation
"swarm"   | Non-overlapping individual points               | Small-to-medium datasets, exact positions
"box"     | Box-and-whisker plot                            | Quartiles, outliers, quick comparisons
"violin"  | KDE on each side (like a mirrored distribution) | Comparing distribution shapes
"bar"     | Height = mean, with confidence interval         | Simple mean comparisons
"count"   | Height = count of observations                  | Category frequencies
"point"   | Mean with CI, connected by lines                | Comparing across groups when interactions matter

Box Plots

The box plot is one of the most information-dense charts in statistics:

sns.catplot(data=df, x="region", y="coverage_pct",
            kind="box", height=5, aspect=1.8)

Each box shows:

  • The median (the line inside the box)
  • The interquartile range or IQR (the box itself, from the 25th to 75th percentile)
  • The whiskers (extending to the most extreme data points that lie within 1.5 times the IQR of the box edges)
  • Outliers (individual points beyond the whiskers)
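The pieces of a box plot are just a handful of summary statistics, and computing them once by hand makes the picture easier to read. The sketch below uses ten made-up values and the common 1.5 x IQR whisker convention.

```python
import numpy as np

# Made-up values with one obvious outlier.
values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 30], dtype=float)

q1, median, q3 = (float(v) for v in np.percentile(values, [25, 50, 75]))
iqr = q3 - q1

# "Fences" mark how far a whisker is allowed to reach.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers stop at the most extreme data points inside the fences.
inside = values[(values >= lower_fence) & (values <= upper_fence)]
whisker_low, whisker_high = float(inside.min()), float(inside.max())

# Anything outside the fences is drawn as an individual outlier point.
outliers = values[(values < lower_fence) | (values > upper_fence)]

print(q1, median, q3)             # 3.25 5.5 7.75
print(whisker_low, whisker_high)  # 1.0 9.0
print(outliers.tolist())          # [30.0]
```

Notice that the upper whisker ends at 9, the largest observation inside the fence, not at the fence value of 14.5 itself.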

This single chart tells you the center, spread, skewness, and outlier behavior for every region simultaneously.

Violin Plots

Violin plots combine box plots with KDE to show the full distribution shape:

sns.catplot(data=df, x="region", y="coverage_pct",
            kind="violin", inner="quartile",
            height=5, aspect=1.8)

The inner="quartile" parameter draws dashed lines at the 25th, 50th, and 75th percentiles inside each violin. You can also use inner="stick" to show individual observations, or inner="box" for a miniature box plot.

Violin plots are superior to box plots when distributions are bimodal (have two peaks). A box plot hides bimodality completely; a violin plot reveals it in the shape of the curves.

Swarm Plots

When your dataset is small enough (roughly under 500 observations per category), swarm plots show every individual data point without overlap:

sns.catplot(data=df, x="income_group",
            y="coverage_pct", kind="swarm",
            height=5, aspect=1.5)

Each point is offset horizontally just enough to avoid overlapping with its neighbors. The result is a cloud of dots that reveals both the distribution and the actual data. This is the "show me the data" philosophy in action.

Warning: Swarm plots become unreadable with large datasets. If you have thousands of points per category, use violin or box plots instead.

Combining Categorical Plots

You can layer an axes-level categorical plot onto another. A common pattern is violin + strip:

fig, ax = plt.subplots(figsize=(10, 5))
sns.violinplot(data=df, x="region",
               y="coverage_pct", ax=ax,
               inner=None, alpha=0.3)
sns.stripplot(data=df, x="region",
              y="coverage_pct", ax=ax,
              size=2, alpha=0.5, color="black")

The violin provides the distribution shape; the strip plot shows you every data point. This combination gives the best of both worlds.

Bar Plots with Confidence Intervals

seaborn's barplot computes the mean of each category and adds a 95% confidence interval by default:

sns.catplot(data=df, x="region", y="coverage_pct",
            kind="bar", height=5, aspect=1.5)

The error bars represent the bootstrap confidence interval around the mean, not the standard deviation. This is a subtle but important distinction: the CI tells you how uncertain you are about the mean, not how spread out the data is. If you want standard deviation, pass errorbar="sd" (available in seaborn 0.12 and later; older releases used ci="sd").
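The difference between "uncertainty about the mean" and "spread of the data" is easiest to see with numbers. Here is a minimal percentile-bootstrap sketch in NumPy. The sample is synthetic and the 1,000 resamples are a choice made for this illustration; this is the general technique, not a copy of seaborn's internals.

```python
import numpy as np

rng = np.random.default_rng(seed=3)
# Hypothetical sample of 200 coverage-like values.
values = rng.normal(loc=80, scale=10, size=200)

# Bootstrap: resample with replacement, record the mean each time.
n_boot = 1000
boot_means = np.array([
    rng.choice(values, size=len(values), replace=True).mean()
    for _ in range(n_boot)
])

# The middle 95% of the resampled means is the confidence interval.
ci_low, ci_high = (float(v) for v in np.percentile(boot_means, [2.5, 97.5]))
mean = float(values.mean())
sd = float(values.std(ddof=1))

print(f"mean                = {mean:.2f}")
print(f"95% CI for the mean = ({ci_low:.2f}, {ci_high:.2f})")
print(f"mean +/- 1 sd       = ({mean - sd:.2f}, {mean + sd:.2f})")
```

The confidence interval comes out much narrower than the one-standard-deviation band, and it shrinks further as the sample grows, while the standard deviation does not.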

Adding hue to Categorical Plots

Just like displot, catplot accepts a hue parameter to split each category further:

sns.catplot(data=df, x="region", y="coverage_pct",
            hue="income_group", kind="box",
            height=6, aspect=2)

This creates side-by-side box plots within each region, colored by income group. Now you can see not just how regions differ, but how income groups within each region differ. The complexity of the question increased dramatically, but the code barely changed.

Count Plots

Sometimes you just want to know how many observations fall into each category:

sns.catplot(data=df, x="region", kind="count",
            height=5, aspect=1.5,
            order=df["region"].value_counts().index)

The kind="count" option creates a bar chart where the height is the number of observations per category — no y variable needed. The order parameter sorts bars by frequency (most to least). This is equivalent to df["region"].value_counts().plot.bar() but with seaborn's nicer styling and the ability to add hue for sub-grouping.

Point Plots and Interaction Effects

Point plots show the mean of each category as a dot, with a confidence interval as a vertical line, and connect the dots with lines. When you add hue, each hue group gets its own connected line:

sns.catplot(data=df, x="region", y="coverage_pct",
            hue="income_group", kind="point",
            height=5, aspect=1.8)

Point plots are especially powerful for detecting interaction effects — when the relationship between the x-variable and y-variable differs depending on the hue variable. If the lines for different income groups are parallel, the income group effect is consistent across regions. If the lines cross, there is an interaction — the effect of income group on coverage differs by region.

For example, you might see that in European regions, all income groups have similarly high coverage (lines converge), but in African regions, the gap between income groups is large (lines diverge). This kind of insight is hard to spot in box plots but jumps out in point plots.

Ordering Categories

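By default, seaborn orders categories by the order in which they first appear in the data (or, for a pandas Categorical column, by the defined category order). This is rarely what you want. You can specify a custom order: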

region_order = (df.groupby("region")["coverage_pct"]
                .median()
                .sort_values()
                .index)

sns.catplot(data=df, x="region", y="coverage_pct",
            kind="box", order=region_order,
            height=5, aspect=1.5)

This orders regions from lowest to highest median coverage, making the comparison more intuitive. Always consider whether a meaningful order exists for your categories: chronological, geographic, by magnitude, or by some other logical criterion.
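When a category has a natural order that you want in every plot, an alternative to passing order each time is to store the order in the column itself with a pandas Categorical. seaborn follows the category order of Categorical columns. The income-group labels below are hypothetical; substitute the labels in your own data.

```python
import pandas as pd

# Hypothetical labels and values, invented for illustration.
toy = pd.DataFrame({
    "income_group": ["High", "Low", "Upper-middle", "Lower-middle", "Low"],
    "coverage_pct": [95.0, 62.0, 88.0, 75.0, 58.0],
})

income_order = ["Low", "Lower-middle", "Upper-middle", "High"]
toy["income_group"] = pd.Categorical(
    toy["income_group"], categories=income_order, ordered=True
)

# pandas now groups and sorts in the declared order, not alphabetically.
means = toy.groupby("income_group", observed=True)["coverage_pct"].mean()
print(means.index.tolist())  # ['Low', 'Lower-middle', 'Upper-middle', 'High']
print(means.tolist())        # [60.0, 75.0, 88.0, 95.0]
```

The design trade-off: order= is local to one plot, while a Categorical column changes the ordering everywhere that DataFrame is used, including groupby and sort_values.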


16.4 Relational Plots with relplot

When both variables are numeric and you want to explore their relationship, relplot() is your tool.

Scatter Plots

sns.relplot(data=df, x="gdp_per_capita",
            y="coverage_pct", height=5, aspect=1.3)

This creates a scatter plot showing each country as a dot. seaborn handles axis labels automatically. But the real power of relplot is in its encoding parameters.

Encoding Additional Variables

You can encode up to three additional variables in a single scatter plot:

sns.relplot(data=df, x="gdp_per_capita",
            y="coverage_pct",
            hue="region",
            size="population",
            style="income_group",
            height=6, aspect=1.5)

  • hue maps to color (categorical or continuous)
  • size maps to marker area
  • style maps to marker shape (circle, triangle, square, etc.)

This is a five-dimensional visualization: x-position, y-position, color, size, and shape, each representing a different variable. Be cautious — encoding more than three or four variables makes a plot hard to read. But for exploratory work, this is tremendously powerful.

Line Plots

When your x-axis is ordered (like time), switch to a line plot:

sns.relplot(data=df, x="year",
            y="coverage_pct", hue="region",
            kind="line", height=5, aspect=1.5)

seaborn's lineplot does something clever: when multiple observations exist for the same x-value (say, multiple countries in the same year), it automatically computes the mean and draws a 95% confidence band around it. This makes it easy to see trends and uncertainty without writing any aggregation code yourself.

If you want to plot individual lines (one per country, say), add units and estimator=None:

sns.relplot(data=df, x="year", y="coverage_pct",
            hue="region", units="country",
            estimator=None, kind="line",
            alpha=0.3, height=5, aspect=1.5)

Regression Plots

For exploring linear relationships, seaborn offers regplot (axes-level) and lmplot (figure-level):

sns.lmplot(data=df, x="gdp_per_capita",
           y="coverage_pct", height=5, aspect=1.3)

This overlays a linear regression line with a 95% confidence band on the scatter plot. The confidence band is computed via bootstrapping, showing the uncertainty in the regression estimate.

You can fit polynomial regressions by specifying order:

sns.lmplot(data=df, x="gdp_per_capita",
           y="coverage_pct", order=2,
           height=5, aspect=1.3)

Or use a locally weighted regression (LOWESS) for nonlinear relationships:

sns.lmplot(data=df, x="gdp_per_capita",
           y="coverage_pct", lowess=True,
           height=5, aspect=1.3)

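LOWESS fits a flexible curve that follows the data's shape without imposing a specific functional form. It is useful during exploration when you do not yet know what the relationship looks like. Two practical notes: lowess=True relies on the statsmodels package being installed, and seaborn does not draw a confidence band for LOWESS fits.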

Residual Plots

After fitting a regression, you should always check whether the relationship is truly linear. seaborn's residplot helps:

sns.residplot(data=df, x="gdp_per_capita",
              y="coverage_pct", lowess=True)

This plots the residuals (actual y minus predicted y) against x. If the relationship is truly linear, the residuals should scatter randomly around zero with no pattern. If you see a curve in the residuals, the linear model is missing nonlinear structure — and you should try order=2 or lowess=True in your regression plot.

We will return to residual analysis in the statistics chapters (Part IV), but developing the habit of checking residuals now will serve you well later.
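To see what a "pattern in the residuals" means numerically, the sketch below fits a straight line to data that is truly quadratic and then inspects the residuals. The data is synthetic and the comparison of "ends versus middle" is an ad hoc check chosen for this illustration, not a formal test.

```python
import numpy as np

rng = np.random.default_rng(seed=4)
x = np.linspace(0, 10, 200)
# Synthetic data with a truly quadratic relationship plus noise.
y = 2.0 + 0.5 * x**2 + rng.normal(0, 1.0, size=x.size)

# Fit a straight line anyway and compute actual minus predicted.
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (slope * x + intercept)

# Least-squares residuals always average to zero, so the overall mean
# cannot reveal the problem. The pattern across x can.
overall = float(residuals.mean())
ends = float(residuals[(x < 2) | (x > 8)].mean())
middle = float(residuals[(x >= 4) & (x <= 6)].mean())

print(f"overall mean residual: {overall:.6f}")
print(f"mean residual at the ends:   {ends:.2f}")
print(f"mean residual in the middle: {middle:.2f}")
```

Positive residuals at both ends and negative residuals in the middle form the U-shape you would see in the residual plot, which is the signal that a straight line is the wrong model.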

Continuous hue Encoding

So far, we have used hue with categorical variables (regions, income groups). You can also use hue with continuous variables:

sns.relplot(data=df, x="gdp_per_capita",
            y="coverage_pct",
            hue="literacy_rate",
            palette="viridis",
            height=5, aspect=1.3)

This maps literacy rate to a continuous color gradient — low literacy appears as dark purple, high literacy as bright yellow. The palette="viridis" parameter specifies a perceptually uniform colormap that works well for continuous data (we will discuss why in Chapter 18).

Continuous hue encoding is effective when you want to see whether a third variable explains the scatter in a two-variable relationship. If the colors sort themselves — say, all the dark points cluster in the bottom-left and all the bright points cluster in the top-right — the third variable has explanatory power.


16.5 Heatmaps: Visualizing Matrices

Heatmaps display a matrix of numbers as a grid of colored cells, where color intensity encodes the value. They are indispensable for correlation matrices, pivot table summaries, and any data that is naturally grid-shaped.

Correlation Heatmap

The most common use case: visualize how every numeric variable correlates with every other:

corr = df[["coverage_pct", "gdp_per_capita",
           "health_expenditure", "population",
           "literacy_rate"]].corr()

sns.heatmap(corr, annot=True, fmt=".2f",
            cmap="coolwarm", center=0,
            square=True, linewidths=0.5)

Let us unpack the parameters:

  • annot=True — prints the correlation coefficient in each cell.
  • fmt=".2f" — formats the annotations to two decimal places.
  • cmap="coolwarm" — uses a diverging colormap where negative correlations are blue and positive are red.
  • center=0 — centers the colormap at zero, so zero correlation appears as white.
  • square=True — makes each cell square.
  • linewidths=0.5 — adds thin grid lines between cells.

Pivot Table Heatmap

You can also create a heatmap from a pivot table — for example, mean vaccination coverage by region and year:

pivot = df.pivot_table(values="coverage_pct",
                       index="region",
                       columns="year",
                       aggfunc="mean")

sns.heatmap(pivot, annot=True, fmt=".0f",
            cmap="YlGnBu", linewidths=0.5)

This shows you, at a glance, which region-year combinations have high or low coverage. Patterns pop out immediately: you can see whether coverage is improving over time within each region, and which regions lag behind.

Masking Half the Matrix

Correlation matrices are symmetric — the upper and lower triangles contain the same information. You can mask one half for a cleaner look:

import numpy as np

mask = np.triu(np.ones_like(corr, dtype=bool))

sns.heatmap(corr, mask=mask, annot=True,
            fmt=".2f", cmap="coolwarm",
            center=0, square=True)

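np.ones_like() builds a matrix of True values with the same shape as corr, np.triu() keeps only the upper triangle (including the diagonal), and seaborn hides every cell where the mask is True.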
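It can help to print the mask itself before handing it to heatmap. This sketch uses a random three-column DataFrame (invented for illustration) and also shows the k=1 variant of np.triu, which leaves the diagonal visible.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=5)
# Random stand-in data; only the shape of the correlation matrix matters.
toy = pd.DataFrame(rng.normal(size=(50, 3)), columns=["a", "b", "c"])
corr = toy.corr()

# True means "hide this cell".
mask_with_diag = np.triu(np.ones_like(corr, dtype=bool))       # hides diagonal too
mask_keep_diag = np.triu(np.ones_like(corr, dtype=bool), k=1)  # diagonal stays visible

print(mask_with_diag.astype(int))
print(mask_keep_diag.astype(int))
```

Hiding the diagonal loses nothing, since every variable correlates perfectly with itself. Keeping it is purely a matter of taste.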

Interpreting Correlation Heatmaps

A few common patterns you will encounter in correlation heatmaps:

Strong positive correlation (r > 0.7): The two variables move together. GDP and health expenditure often show this — wealthier countries spend more on health. Strong positive correlations appear as deep red in "coolwarm".

Strong negative correlation (r < -0.7): The two variables move in opposite directions. Infant mortality and vaccination coverage might show this — higher coverage correlates with lower infant mortality. Strong negative correlations appear as deep blue.

Near-zero correlation (|r| < 0.2): The two variables have no linear relationship. This does not mean they are unrelated — they could have a nonlinear relationship that correlation does not capture. Near-zero appears as white in a centered diverging colormap.

Suspiciously perfect correlation (r = 1.0 or very close): Outside the diagonal (where every variable correlates perfectly with itself), a correlation near 1.0 suggests that two columns are measuring essentially the same thing, possibly with different units. This is worth investigating — do you have a redundant variable?
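The warning about near-zero correlation deserves a concrete case. In the sketch below, y is completely determined by x, yet the Pearson correlation is essentially zero because the relationship is a symmetric curve rather than a line. The data is constructed purely for illustration.

```python
import numpy as np

x = np.linspace(-1, 1, 201)  # symmetric around zero
y = x**2                     # y depends on x exactly, but not linearly

r = float(np.corrcoef(x, y)[0, 1])
print(f"Pearson r = {abs(r):.3f}")
```

A pale cell in a correlation heatmap is therefore a prompt to look at the scatter plot, not proof that two variables are unrelated.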

Clustered Heatmaps

For large correlation matrices, seaborn offers clustermap, which reorders rows and columns to group correlated variables together:

sns.clustermap(corr, annot=True, fmt=".2f",
               cmap="coolwarm", center=0,
               linewidths=0.5, figsize=(8, 8))

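The dendrogram (tree diagram) on the sides shows the hierarchical clustering of variables. Variables that join low in the tree have similar patterns of correlation with the other variables, which usually means they are strongly correlated with each other as well. This is particularly useful when you have 15-20 numeric variables and want to find natural groupings without staring at a 20x20 matrix trying to spot patterns manually. Note that clustermap needs SciPy installed to perform the clustering.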


16.6 Pair Plots: The Multivariate Overview

When you have several numeric variables and want to see all pairwise relationships at once, pairplot is the tool:

sns.pairplot(df[["coverage_pct", "gdp_per_capita",
                 "health_expenditure", "literacy_rate",
                 "region"]],
             hue="region", diag_kind="kde")

This creates a grid of plots:

  • The diagonal shows the distribution of each variable (as a histogram or KDE, depending on diag_kind).
  • The off-diagonal cells show scatter plots of every pair of variables.
  • The hue parameter colors points by region across all panels.

The example above has four numeric variables, so it produces a 4-by-4 grid of 16 panels; a fifth variable would make it 5-by-5 with 25 panels. That is a lot of information at a glance. Use pair plots during early exploration to identify which pairs of variables have interesting relationships worth investigating further.

Controlling Pair Plot Details

You can customize which plot types appear in different positions:

g = sns.pairplot(df[["coverage_pct",
                     "gdp_per_capita",
                     "literacy_rate", "region"]],
                 hue="region", diag_kind="kde",
                 plot_kws={"alpha": 0.5, "s": 20})

The plot_kws parameter passes keyword arguments to the off-diagonal scatter plots — here, setting transparency and marker size. For more control, you can use PairGrid directly.

PairGrid for Full Control

PairGrid is to pairplot what FacetGrid is to catplot — the low-level version that gives you complete control over what appears in each position:

g = sns.PairGrid(df[["coverage_pct",
                      "gdp_per_capita",
                      "literacy_rate", "region"]],
                 hue="region")
g.map_upper(sns.scatterplot, alpha=0.5, s=15)
g.map_lower(sns.kdeplot)
g.map_diag(sns.histplot, kde=True)
g.add_legend()

This creates a pair plot where the upper triangle shows scatter plots, the lower triangle shows 2D KDE contours, and the diagonal shows histograms with KDE overlays. The asymmetric design avoids redundancy (the upper and lower triangles show different information) while maximizing the insight per panel.

When Pair Plots Are (and Are Not) Useful

Pair plots shine during the early exploration phase when you have 3-6 numeric variables and want to quickly identify which pairs have interesting relationships. They answer the question "Where should I look next?" rather than providing deep analysis.

However, pair plots become unwieldy with many variables. A pair plot of 10 numeric variables produces a 10x10 grid of 100 panels — far too many to study meaningfully. If you have more than 6-7 numeric variables, start with a correlation heatmap to identify the strongest relationships, then create a pair plot of only the most interesting subset.
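One way to pick that subset is to rank column pairs by the absolute value of their correlation. The helper below is a hypothetical sketch (strongest_pairs is not a seaborn or pandas function), run on synthetic data in which v2 is a noisy copy of v1:

```python
from itertools import combinations

import numpy as np
import pandas as pd

def strongest_pairs(frame, k=3):
    """Return the k column pairs with the largest absolute Pearson r."""
    corr = frame.select_dtypes("number").corr()
    pairs = [(a, b, corr.loc[a, b])
             for a, b in combinations(corr.columns, 2)]
    pairs.sort(key=lambda item: abs(item[2]), reverse=True)
    return pairs[:k]

# Synthetic data for illustration only.
rng = np.random.default_rng(1)
base = rng.normal(size=200)
toy = pd.DataFrame({
    "v1": base,
    "v2": base + rng.normal(scale=0.1, size=200),   # near-copy of v1
    "v3": rng.normal(size=200),                     # unrelated noise
    "v4": -base + rng.normal(scale=1.0, size=200),  # weaker, negative link
})

top = strongest_pairs(toy, k=2)
subset = sorted({name for a, b, _ in top for name in (a, b)})

for a, b, r in top:
    print(f"{a} vs {b}: r = {r:+.2f}")
print("columns to pair-plot:", subset)
```

Passing toy[subset] to sns.pairplot then draws only the panels that the ranking flagged. Sorting on the absolute value keeps strong negative relationships in the list.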


16.7 FacetGrid: Multi-Panel Layouts

seaborn's figure-level functions (displot, relplot, catplot) all use FacetGrid under the hood. You can also use FacetGrid directly for maximum flexibility.

The col and row Parameters

The simplest way to create faceted plots is with the col and row parameters in figure-level functions:

sns.relplot(data=df, x="gdp_per_capita",
            y="coverage_pct",
            col="income_group", col_wrap=2,
            height=4, aspect=1.2)

This creates a separate scatter plot for each income group, arranged in a 2-column grid. The col_wrap=2 parameter tells seaborn to wrap to a new row after every two columns.

You can use both col and row simultaneously:

sns.displot(data=df, x="coverage_pct",
            col="region", row="income_group",
            kind="kde", height=3, aspect=1.2)

This creates a matrix of KDE plots where each row is an income group and each column is a region. The panels share axes by default, making comparisons easy.

Using FacetGrid Directly

For plot types that are not built into the figure-level functions, use FacetGrid with .map():

g = sns.FacetGrid(df, col="region", col_wrap=3,
                  height=3, aspect=1.2)
g.map(sns.histplot, "coverage_pct", bins=20)
g.set_titles("{col_name}")
g.set_axis_labels("Coverage (%)", "Count")

The .map() method applies the specified plotting function to each subset of data. .set_titles() customizes the panel titles using template strings like {col_name}.

You can also map custom functions:

def annotate_median(data, **kwargs):
    median = data.median()
    plt.axvline(median, color="red",
                linestyle="--", linewidth=1)

g = sns.FacetGrid(df, col="region", col_wrap=3,
                  height=3, aspect=1.2)
g.map(sns.histplot, "coverage_pct", bins=20)
g.map(annotate_median, "coverage_pct")

This adds a red dashed line at the median of each panel.

Shared vs. Independent Axes

By default, faceted plots share axes — all panels use the same x and y range. This is essential for comparison (you can compare heights across panels because the scale is consistent). But sometimes you want independent axes, for example when groups have very different ranges:

g = sns.FacetGrid(df, col="income_group",
                  col_wrap=2, height=3,
                  sharey=False)
g.map(sns.histplot, "gdp_per_capita", bins=20)

The sharey=False parameter lets each panel use its own y-axis range. Use this sparingly — shared axes make comparison easy, and independent axes can mislead viewers who assume the scales match. When you use independent axes, make sure the axis labels are visible on every panel.

Practical FacetGrid Patterns

Here are several patterns that come up frequently in practice:

Adding a summary statistic to each panel:

def add_mean_text(data, **kwargs):
    mean = data.mean()
    plt.text(0.95, 0.95, f"Mean: {mean:.1f}",
             transform=plt.gca().transAxes,
             ha="right", va="top", fontsize=9,
             bbox=dict(boxstyle="round", alpha=0.2))

g = sns.FacetGrid(df, col="region", col_wrap=3,
                  height=3)
g.map(sns.histplot, "coverage_pct", bins=20)
g.map(add_mean_text, "coverage_pct")

This places a text box showing the mean value in the top-right corner of each panel. The transform=plt.gca().transAxes argument tells matplotlib to interpret the (0.95, 0.95) coordinates as fractions of the Axes (0 to 1 in each direction) rather than as data values, so the label always appears in the same position regardless of the data range.

Combining figure-level faceting with axes-level detail:

Sometimes the easiest approach is to use a figure-level function for the basic structure and then iterate over the axes for customization:

g = sns.catplot(data=df, x="income_group",
                y="coverage_pct", col="region",
                col_wrap=3, kind="box",
                height=3, aspect=1.2)

for ax in g.axes.flatten():
    ax.axhline(90, color="red", linestyle="--",
               linewidth=1, alpha=0.5)

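This adds a horizontal reference line at 90% coverage (a common public health target) to every panel. The g.axes.flatten() call returns a 1D array of Axes for easy iteration. Because col_wrap is set here, g.axes is already one-dimensional and the call changes nothing; without col_wrap, g.axes is a 2D (rows by columns) array, and flattening saves you a nested loop.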
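The shape of g.axes depends on whether col_wrap is used, which is worth knowing before indexing into it. A sketch on synthetic data (column names are placeholders):

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so no display is needed
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data for illustration only.
rng = np.random.default_rng(7)
n = 300
toy = pd.DataFrame({
    "value": rng.normal(loc=80, scale=8, size=n),
    "region": rng.choice(["North", "South", "East"], size=n),
    "income": rng.choice(["Low", "High"], size=n),
})

g_wrapped = sns.catplot(data=toy, x="income", y="value",
                        col="region", col_wrap=2,
                        kind="box", height=2.5)
g_grid = sns.catplot(data=toy, x="income", y="value",
                     col="region", kind="box", height=2.5)

print(g_wrapped.axes.shape)  # one entry per panel
print(g_grid.axes.shape)     # rows by columns

# .flat iterates over every panel in either case.
reference_lines = [ax.axhline(90, color="red", linestyle="--")
                   for ax in g_wrapped.axes.flat]
```

Iterating over g.axes.flat (or g.axes.flatten()) works for both layouts, so the same customization loop can be reused whether or not the grid is wrapped.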


16.8 Themes, Palettes, and Customization

Built-In Themes

seaborn ships with five built-in themes that control the overall look:

| Theme | Description |
|---|---|
| "darkgrid" | Gray background with white grid lines (default) |
| "whitegrid" | White background with gray grid lines |
| "dark" | Gray background, no grid |
| "white" | White background, no grid |
| "ticks" | White background with tick marks on axes |

Set the theme at the start of your notebook:

sns.set_theme(style="whitegrid")

For publication-ready plots, "ticks" or "white" tend to look cleanest. For data exploration, "whitegrid" provides helpful reference lines.
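Each theme is a dictionary of matplotlib settings, which sns.axes_style() returns without changing anything globally. The sketch below reads the grid setting from each theme and then uses axes_style as a context manager to style a single figure; the plotted numbers are arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so no display is needed
import matplotlib.pyplot as plt
import seaborn as sns

styles = ["darkgrid", "whitegrid", "dark", "white", "ticks"]

# axes_style(name) returns the rc parameters that the theme would set.
grid_on = {name: bool(sns.axes_style(name)["axes.grid"])
           for name in styles}
print(grid_on)

# As a context manager, the style applies only to figures created
# inside the with-block; the notebook-wide default is untouched.
with sns.axes_style("ticks"):
    fig, ax = plt.subplots(figsize=(4, 3))
    ax.plot([0, 1, 2], [1, 3, 2])
```

The context-manager form is handy when one figure in a notebook needs a different look from the rest.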

Color Palettes

seaborn provides several palette families:

Qualitative palettes for categorical data (distinct, unordered colors):

sns.color_palette("muted")       # Soft, distinctive
sns.color_palette("Set2")        # From ColorBrewer
sns.color_palette("tab10")       # matplotlib default

Sequential palettes for ordered data (light-to-dark gradient):

sns.color_palette("Blues")
sns.color_palette("YlOrRd")      # Yellow to orange to red

Diverging palettes for data with a meaningful center point:

sns.color_palette("coolwarm")    # Blue-white-red
sns.color_palette("RdBu")        # Red-white-blue

You can set the default palette for all subsequent plots:

sns.set_palette("colorblind")

The "colorblind" palette is designed to be distinguishable by people with the most common forms of color vision deficiency. We will discuss accessibility in depth in Chapter 18, but using "colorblind" as your default is a simple, high-impact choice you can make right now.
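A palette is simply a list of RGB tuples, so it can be inspected before it is applied. A brief sketch:

```python
import seaborn as sns

# Qualitative palette: a fixed list of distinct colors.
palette = sns.color_palette("colorblind")
print(len(palette))
print(palette.as_hex()[:3])   # hex strings, handy for reuse elsewhere

# Sequential palette: ask for as many steps as you need.
blues = sns.color_palette("Blues", n_colors=5)
print(len(blues))
```

In a notebook, evaluating a palette on the last line of a cell displays it as a row of color swatches, which is a quick way to judge whether the colors are distinct enough for your number of groups.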

Fine-Tuning with matplotlib

Since every seaborn plot is a matplotlib figure underneath, you can always customize further:

g = sns.catplot(data=df, x="region",
                y="coverage_pct", kind="violin",
                height=5, aspect=1.8)

g.set_axis_labels("WHO Region", "Coverage (%)")
g.set_xticklabels(rotation=45)
g.figure.suptitle("Vaccination Coverage by Region",
                  y=1.02, fontsize=14)
plt.tight_layout()

The g object returned by figure-level functions has methods like set_axis_labels(), set_titles(), and set_xticklabels(). For anything beyond that, access g.figure (the Figure; older seaborn releases call it g.fig) and g.axes (the Axes array) directly.

The Context Parameter

seaborn's set_theme() accepts a context parameter that scales all text and line elements proportionally:

sns.set_theme(context="talk")  # Larger for slides
sns.set_theme(context="poster")  # Even larger
sns.set_theme(context="paper")  # Smaller for print
sns.set_theme(context="notebook")  # Default

This is enormously useful. Instead of manually adjusting every font size when you move a plot from a notebook to a slide deck, you change one parameter.
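The scaling can be inspected with sns.plotting_context(), which returns the parameters for a named context without applying them. A short sketch:

```python
import seaborn as sns

contexts = ["paper", "notebook", "talk", "poster"]

# Look up the base font size and line width for each context.
font_sizes = {}
for name in contexts:
    params = sns.plotting_context(name)
    font_sizes[name] = params["font.size"]
    print(f"{name:9s} font.size={params['font.size']:.1f}  "
          f"lines.linewidth={params['lines.linewidth']:.2f}")
```

Keep in mind that every sns.set_theme() call also resets the style and palette to their defaults unless you pass them again. To change only the scaling, sns.set_context("talk") leaves the other settings alone.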

Creating Custom Palettes

Beyond the built-in palettes, you can create your own:

# From a list of hex colors
custom = ["#264653", "#2a9d8f", "#e9c46a",
          "#f4a261", "#e76f51"]
sns.set_palette(custom)

# From a seaborn function
custom = sns.color_palette("husl", 8)  # 8 evenly-spaced hues
sns.set_palette(custom)

# Light/dark variants of a single color
custom = sns.light_palette("seagreen", n_colors=5)

Custom palettes are useful when you need to match your organization's branding (specific hex colors) or when you need a specific number of distinguishable colors. The "husl" palette produces evenly-spaced hues around the color wheel, which is useful when you need many distinct colors.

Saving and Reusing Settings

For consistency across a project, set your theme at the top of every notebook or script:

# My standard settings
sns.set_theme(
    style="whitegrid",
    palette="colorblind",
    context="notebook",
    font_scale=1.1,
    rc={"figure.figsize": (8, 5)}
)

The font_scale parameter multiplies all font sizes by a factor. The rc parameter accepts a dictionary of matplotlib rcParams for fine-grained control. Setting these once ensures that all your plots have a consistent look without repeating configuration code.
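Because set_theme() works by updating matplotlib's rcParams, you can confirm that the settings took effect by reading them back. A sketch using the same settings as above:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so no display is needed
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_theme(
    style="whitegrid",
    palette="colorblind",
    context="notebook",
    font_scale=1.1,
    rc={"figure.figsize": (8, 5)},
)

# Unscaled font size of the "notebook" context, for comparison.
base_size = sns.plotting_context("notebook")["font.size"]

print(list(plt.rcParams["figure.figsize"]))
print(plt.rcParams["font.size"], base_size * 1.1)
print(plt.rcParams["axes.grid"])
```

Note that figure.figsize affects axes-level functions and plain matplotlib figures; figure-level functions such as catplot size themselves from height and aspect instead.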


16.9 Choosing the Right seaborn Plot

With so many options, how do you choose? Here is a decision guide:

By Question Type

| Your Question | Recommended Plot | Function |
|---|---|---|
| "What does the distribution of X look like?" | Histogram or KDE | displot(kind="hist") or displot(kind="kde") |
| "How do distributions differ across groups?" | Overlapping KDEs or faceted histograms | displot(hue=...) or displot(col=...) |
| "How does Y vary across categories?" | Box plot, violin plot, or swarm plot | catplot(kind="box"/"violin"/"swarm") |
| "What is the mean of Y per category?" | Bar plot with CI | catplot(kind="bar") |
| "How are X and Y related?" | Scatter plot | relplot() |
| "What is the linear relationship?" | Regression plot | lmplot() |
| "How do all variables correlate?" | Correlation heatmap | heatmap() on .corr() |
| "Show me all pairwise relationships" | Pair plot | pairplot() |

By Dataset Size

| Observations | Good Choices | Avoid |
|---|---|---|
| < 100 | Strip, swarm, scatter with rug | KDE (too few points for smooth estimate) |
| 100 - 1,000 | Box, violin, scatter, KDE | Swarm (can get crowded) |
| 1,000 - 10,000 | Violin, KDE, heatmap, scatter with alpha | Swarm, strip without jitter |
| > 10,000 | KDE, hexbin, 2D histogram, aggregated heatmap | Individual point plots (overplotting) |

The "hue First" Heuristic

When you want to add a grouping variable, try hue first. If the plot gets too crowded with hue (too many groups, overlapping colors), switch to col (faceting into separate panels). If you need both a grouping and a faceting variable, use hue for the smaller number of groups and col for the larger.


16.10 Putting It Together: GDP vs. Vaccination Coverage

Let us build a polished, multi-layered visualization that answers a real question: How does national wealth relate to vaccination coverage, and does the relationship differ across WHO regions?

Step 1: Initial Scatter Plot

sns.relplot(data=df, x="gdp_per_capita",
            y="coverage_pct",
            height=5, aspect=1.3)

We see a general upward trend — wealthier countries tend to have higher coverage — but with significant scatter.

Step 2: Add Region as Color

sns.relplot(data=df, x="gdp_per_capita",
            y="coverage_pct", hue="region",
            height=5, aspect=1.3)

Now patterns emerge. Some regions cluster in the lower-left (low GDP, lower coverage) while others spread across the right side.

Step 3: Add Regression Lines by Region

sns.lmplot(data=df, x="gdp_per_capita",
           y="coverage_pct", hue="region",
           height=5, aspect=1.5, scatter_kws={"s": 20})

Each region gets its own regression line with a confidence band. Some regions show steep positive slopes (coverage improves dramatically with GDP); others show flat slopes (coverage is high regardless of GDP, or GDP does not help).

Step 4: Faceted View for Clarity

sns.lmplot(data=df, x="gdp_per_capita",
           y="coverage_pct", col="region",
           col_wrap=3, height=3, aspect=1.2,
           scatter_kws={"s": 15, "alpha": 0.5})

With each region in its own panel, the relationship within each region is much clearer. You can see the slope, the scatter, and the outliers without the visual noise of overlapping colors.

Step 5: Multivariate Context with a Pair Plot

sns.pairplot(df[["coverage_pct", "gdp_per_capita",
                 "health_expenditure", "region"]],
             hue="region", diag_kind="kde",
             plot_kws={"alpha": 0.4, "s": 15})

This gives you the full picture: not just GDP vs. coverage, but GDP vs. health expenditure, health expenditure vs. coverage, and the marginal distributions of each variable.

Notice how each step builds on the previous one, adding complexity gradually. This is the workflow seaborn is designed for — start simple, add layers, and stop when the visualization answers your question.

Step 6: Comparing Coverage Distributions Across Regions

Let us add a distributional view to complement the scatter plots. How does coverage itself vary across regions?

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Left: KDE by region
for region in df["region"].unique():
    subset = df[df["region"] == region]
    sns.kdeplot(subset["coverage_pct"],
                label=region, ax=axes[0],
                fill=True, alpha=0.2)
axes[0].set_title("Coverage Distribution by Region")
axes[0].set_xlabel("Coverage (%)")
axes[0].legend(fontsize=8)

# Right: violin plot
sns.violinplot(data=df, x="region",
               y="coverage_pct",
               inner="quartile", ax=axes[1])
axes[1].set_title("Coverage by Region (Violin)")
axes[1].tick_params(axis="x", rotation=45)

plt.tight_layout()

The KDE view and the violin view show the same information in different ways. The overlapping KDEs let you compare the shapes directly — you can see which regions have tight, peaked distributions (high consistency) versus broad, flat distributions (high variability). The violin view gives you the quartile lines and aligns each distribution with its label.

Step 7: The Correlation Structure

Finally, a heatmap reveals which variables are most strongly related:

numeric_cols = ["coverage_pct", "gdp_per_capita",
                "health_expenditure", "population",
                "literacy_rate"]
corr = df[numeric_cols].corr()

mask = np.triu(np.ones_like(corr, dtype=bool))

fig, ax = plt.subplots(figsize=(7, 6))
sns.heatmap(corr, mask=mask, annot=True,
            fmt=".2f", cmap="coolwarm",
            center=0, square=True, ax=ax)
ax.set_title("Correlation Matrix")
plt.tight_layout()

The heatmap answers "which variables are worth investigating together?" at a glance. If GDP and health expenditure are highly correlated (r = 0.85, say), you might not need both in a regression model. If literacy rate and coverage have a moderate positive correlation (r = 0.6), that relationship is worth exploring with a scatter plot.

This seven-step workflow — scatter, color, regression, facet, pair plot, distribution, heatmap — is the standard seaborn exploration cycle. You will not always need all seven steps. Sometimes the first scatter plot answers your question immediately. Other times you will iterate through multiple rounds. The key principle is: start simple, add complexity only when it reveals new information, and stop when you have enough insight to answer your question or write your report.


16.11 Common Mistakes and How to Fix Them

Mistake 1: Overplotting

When you have thousands of data points, scatter plots become a solid blob:

# Problem: too many overlapping points
sns.relplot(data=large_df, x="x", y="y")

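Fixes:

  • Reduce alpha by passing it straight to relplot, which forwards it to scatterplot: sns.relplot(data=large_df, x="x", y="y", alpha=0.1). (The scatter_kws dictionary belongs to lmplot and regplot, not relplot.)
  • Use a 2D histogram or hexbin: plt.hexbin(large_df["x"], large_df["y"], gridsize=30)
  • Use KDE: sns.kdeplot(data=large_df, x="x", y="y")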
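The first two fixes can be sketched on synthetic data as follows. Here large_df is generated on the spot, and the alpha and gridsize values are starting points rather than recommendations:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so no display is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for a large dataset.
rng = np.random.default_rng(3)
large_df = pd.DataFrame({
    "x": rng.normal(size=5000),
    "y": rng.normal(size=5000),
})

# Fix 1: relplot forwards extra keyword arguments to scatterplot,
# so alpha is passed directly.
g = sns.relplot(data=large_df, x="x", y="y", alpha=0.1, height=4)
points = g.ax.collections[0]
print(points.get_alpha())

# Fix 2: bin the points into hexagons and color by count.
fig, ax = plt.subplots(figsize=(5, 4))
hb = ax.hexbin(large_df["x"], large_df["y"], gridsize=30, cmap="Blues")
fig.colorbar(hb, ax=ax, label="count")
total_binned = int(hb.get_array().sum())
print(total_binned)
```

Transparency keeps individual points visible in sparse regions, while hexbin trades the points for counts and scales to much larger datasets.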

Mistake 2: Too Many hue Categories

If your hue variable has dozens of categories, you get dozens of colors, most of which are indistinguishable:

# Problem: too many colors to tell apart
sns.relplot(data=df, x="x", y="y",
            hue="country")  # 50+ countries!

Fix: Group into fewer categories, or use col for faceting instead of hue.

Mistake 3: Swarm Plot on Large Data

# Problem: swarm takes forever and overlaps anyway
sns.catplot(data=big_df, x="group", y="value",
            kind="swarm")  # 5000 points per group

Fix: Switch to violin or box plot for large datasets.

Mistake 4: Ignoring Figure Size

seaborn's default sizes work well in notebooks but may look odd in papers or slides:

# Fix: always specify height and aspect
sns.catplot(data=df, x="region", y="value",
            kind="box", height=5, aspect=1.5)

The height parameter controls the height of each facet in inches. The aspect parameter controls the ratio of width to height. Together, they give you precise control over the figure's dimensions.
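In other words, each facet is height × aspect inches wide, and the figure grows with the number of facet rows and columns. The sketch below checks this on synthetic data for plots without a legend (a legend placed outside the panels can add extra width):

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so no display is needed
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data for illustration only.
rng = np.random.default_rng(8)
toy = pd.DataFrame({
    "region": rng.choice(["North", "South", "East"], size=150),
    "value": rng.normal(loc=80, scale=8, size=150),
})

# Single panel: width = height * aspect.
g1 = sns.catplot(data=toy, x="region", y="value",
                 kind="box", height=5, aspect=1.5)
single_size = tuple(g1.figure.get_size_inches())

# Three panels side by side: width = 3 * height * aspect.
g3 = sns.catplot(data=toy, y="value", col="region",
                 kind="box", height=3, aspect=1.2)
faceted_size = tuple(g3.figure.get_size_inches())

print(single_size, faceted_size)
```

When a figure has to fit a fixed slide or column width, work backward from the target width: divide by the number of facet columns and by aspect to get height.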

Mistake 5: Forgetting that seaborn Returns Objects

Figure-level functions return a FacetGrid object, not a matplotlib Axes. Do not try to use matplotlib Axes methods directly on them:

# Wrong
g = sns.catplot(data=df, x="region", y="value",
                kind="box")
g.set_xlabel("Region")  # AttributeError!

# Right
g.set_axis_labels("Region", "Value")

For axes-level functions (like sns.boxplot), the return value is a matplotlib Axes, and matplotlib methods work directly.
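A quick way to tell the two apart is to check the type of the returned object. A sketch on synthetic data:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so no display is needed
import matplotlib.pyplot as plt
from matplotlib.axes import Axes
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data for illustration only.
rng = np.random.default_rng(9)
toy = pd.DataFrame({
    "region": rng.choice(["North", "South"], size=100),
    "value": rng.normal(size=100),
})

# Axes-level: returns a matplotlib Axes, so Axes methods work.
fig, ax0 = plt.subplots(figsize=(4, 3))
ax = sns.boxplot(data=toy, x="region", y="value", ax=ax0)
ax.set_xlabel("Region")

# Figure-level: returns a FacetGrid with its own helper methods.
g = sns.catplot(data=toy, x="region", y="value", kind="box", height=3)
g.set_axis_labels("Region", "Value")

print(type(ax).__name__, type(g).__name__)
```

For a single-panel grid, g.ax gives you the underlying Axes when a matplotlib method is needed after all.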

Mistake 6: Not Handling Categorical Order

For a column of plain strings, seaborn draws categories in the order they first appear in the data (for a pandas Categorical column, it uses the category order). That order is rarely the one you want when a natural ordering exists:

# Problem: income groups appear in whatever order the
# data lists them, e.g. "High", "Low", "Lower-Middle",
# "Upper-Middle" if the rows are sorted by name
sns.catplot(data=df, x="income_group",
            y="coverage_pct", kind="bar")

# Fix: specify a meaningful order
sns.catplot(data=df, x="income_group",
            y="coverage_pct", kind="bar",
            order=["Low", "Lower-Middle",
                   "Upper-Middle", "High"])

Without order, the bars follow the row order of df. If the rows happen to be sorted alphabetically, "High" appears first, then "Low", then "Lower-Middle", then "Upper-Middle", which makes no logical sense. Always specify order for ordinal categories.

Mistake 7: Misinterpreting Bar Plot Error Bars

Students frequently mistake seaborn's bar plot error bars for standard deviations. They are not. By default, barplot shows 95% bootstrap confidence intervals around the mean:

# These error bars show CI, not SD
sns.barplot(data=df, x="region", y="coverage_pct")

The CI tells you about the uncertainty in the mean estimate: if you repeated the whole study many times and built an interval each time, about 95% of those intervals would contain the true mean. The standard deviation tells you about the spread of individual observations. A narrow CI does not mean all observations are similar; it can mean you have many observations (which reduces uncertainty in the mean) even if they are widely spread.

To show standard deviation instead:

sns.barplot(data=df, x="region", y="coverage_pct",
            errorbar="sd")

To show no error bar at all (rare, but sometimes appropriate):

sns.barplot(data=df, x="region", y="coverage_pct",
            errorbar=None)
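To see how far apart the two quantities can be, the sketch below computes both by hand for 2,000 synthetic values. It uses a simple percentile bootstrap; seaborn's own procedure may differ in details, so treat this as an illustration of the idea rather than a reimplementation.

```python
import numpy as np

# Synthetic, coverage-like values for illustration only.
rng = np.random.default_rng(4)
values = rng.normal(loc=80, scale=15, size=2000)

# Spread of individual observations.
sd = values.std(ddof=1)

# Percentile bootstrap for the mean: resample with replacement,
# recompute the mean each time, take the middle 95% of those means.
boot_means = np.array([
    rng.choice(values, size=values.size, replace=True).mean()
    for _ in range(1000)
])
ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])

print(f"mean   = {values.mean():.2f}")
print(f"SD     = {sd:.2f}")
print(f"95% CI = ({ci_low:.2f}, {ci_high:.2f}), "
      f"width {ci_high - ci_low:.2f}")
```

The observations are spread widely, yet the interval around the mean is narrow because the sample is large. Drawing the same bars with errorbar="sd" would produce much longer error bars from identical data.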

Mistake 8: Ignoring Outlier Impact on Axis Scaling

A single extreme outlier can compress your entire visualization:

# If one country has GDP of 200,000 while most are
# under 50,000, the scatter is compressed to the left
sns.relplot(data=df, x="gdp_per_capita",
            y="coverage_pct")

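Fixes:

  • Filter or clip outliers before plotting: df[df["gdp_per_capita"] < 100000]
  • Use a log scale: add plt.xscale("log") after the plot call. This changes only the current Axes; for a multi-panel grid, g.set(xscale="log") applies it to every panel.
  • Use robust=True in regression plots (regplot, lmplot), which reduces the influence of outliers on the regression line. This option requires the statsmodels package.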
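The first two fixes look like this on synthetic, right-skewed data. The gdp and coverage columns are generated for the example, and the 99th-percentile cutoff is an arbitrary choice:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so no display is needed
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data: lognormal values have a long right tail.
rng = np.random.default_rng(10)
toy = pd.DataFrame({
    "gdp": rng.lognormal(mean=9, sigma=1, size=300),
    "coverage": rng.uniform(40, 99, size=300),
})

# Fix 1: drop the most extreme values before plotting.
cutoff = toy["gdp"].quantile(0.99)
trimmed = toy[toy["gdp"] < cutoff]

# Fix 2: keep every point but put x on a log scale.
g = sns.relplot(data=toy, x="gdp", y="coverage", height=4)
g.set(xscale="log")

print(len(toy), len(trimmed), g.ax.get_xscale())
```

Filtering removes information, so say so in the caption when you use it; a log scale keeps every country on the plot and is usually the better first choice for income-like variables.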


16.12 seaborn and the Visualization Pipeline

Let us step back and see where seaborn fits in the broader visualization workflow you have been building.

In Chapter 14, you learned the Grammar of Graphics — the idea that every visualization is a composition of data, aesthetic mappings, and geometric objects. In Chapter 15, you learned matplotlib, which gives you low-level control over those components. Now, with seaborn, you have a high-level interface that makes the most common Grammar-of-Graphics patterns easy to express.

Here is the mapping:

| Grammar of Graphics Concept | seaborn Implementation |
|---|---|
| Data | data=df parameter |
| Aesthetic mapping | x, y, hue, size, style parameters |
| Geometric object | kind parameter ("scatter", "box", "violin", etc.) |
| Faceting | col, row parameters |
| Statistical transformation | Built-in (KDE, regression, confidence intervals) |
| Scale | palette, sizes parameters |
| Theme | set_theme() with style and context |

The Grammar of Graphics is the theory. matplotlib is the engine. seaborn is the expressway — it takes you to the most common destinations quickly, and you can always exit to matplotlib for custom detours.


16.13 Chapter Summary

You started this chapter knowing how to build any visualization from scratch with matplotlib. Now you know how to build most statistical visualizations in a fraction of the code with seaborn. The key ideas:

  • displot() handles univariate and bivariate distributions — histograms, KDEs, ECDFs, and rug plots.
  • catplot() handles categorical comparisons — box, violin, swarm, bar, strip, count, and point plots.
  • relplot() handles relationships between numeric variables — scatter and line plots with automatic aggregation and confidence bands.
  • heatmap() visualizes matrices, especially correlation matrices and pivot tables.
  • pairplot() creates a matrix of scatter plots and distributions for multivariate overview.
  • The hue, col, and row parameters let you encode additional variables through color and multi-panel layouts.
  • FacetGrid powers all figure-level functions and can be used directly for custom multi-panel plots.
  • seaborn sits on top of matplotlib, so you can always reach in and customize with the tools you already know.

In Chapter 17, you will take your visualizations beyond static images and into the interactive world with plotly, where tooltips, zoom, pan, and animation bring your data to life. And in Chapter 18, you will step back to think critically about visualization design — how to make your charts not just correct, but honest, accessible, and effective.


Summary of Key Functions

| Function | Level | Purpose |
|---|---|---|
| sns.displot() | Figure | Distribution plots (hist, kde, ecdf; optional rug via rug=True) |
| sns.relplot() | Figure | Relational plots (scatter, line) |
| sns.catplot() | Figure | Categorical plots (box, violin, swarm, bar, strip, count, point) |
| sns.lmplot() | Figure | Regression plots |
| sns.pairplot() | Figure | Pairwise scatter matrix |
| sns.heatmap() | Axes | Matrix visualization |
| sns.histplot() | Axes | Histogram (single axes) |
| sns.kdeplot() | Axes | KDE (single axes) |
| sns.scatterplot() | Axes | Scatter (single axes) |
| sns.boxplot() | Axes | Box plot (single axes) |
| sns.violinplot() | Axes | Violin plot (single axes) |
| sns.stripplot() | Axes | Jittered points (single axes) |
| sns.swarmplot() | Axes | Non-overlapping points (single axes) |
| sns.barplot() | Axes | Mean bar plot (single axes) |
| sns.regplot() | Axes | Regression plot (single axes) |
| sns.set_theme() | Global | Set style, palette, context |