Learning Objectives

  • Create histograms with sns.histplot() including stacked, layered, dodged, and filled variants
  • Create kernel density estimates with sns.kdeplot() including bandwidth selection, 1D and 2D variants
  • Create rug plots with sns.rugplot() as supplementary marginal marks
  • Create empirical cumulative distribution function plots with sns.ecdfplot()
  • Create violin plots with sns.violinplot() including split violins and inner representations
  • Create ridge plots (joy plots) using FacetGrid + kdeplot for comparing many distributions
  • Select the optimal distributional visualization given the data characteristics
  • Explain the statistical meaning of KDE bandwidth and its visual effect on smoothness

Chapter 17: Distributional Visualization

"Summary statistics are a compressed description of a distribution. The distribution contains more information than the summary. Visualization is how you see what the summary throws away." — The central argument of Chapter 1, applied here in seaborn.


Chapter 16 introduced seaborn as a statistical visualization library with three function families. This chapter covers the distributional family in depth: histplot, kdeplot, ecdfplot, rugplot, violinplot, and the ridge plot pattern. These are the tools for visualizing how values of a single variable (or two variables jointly) are distributed.

Distribution matters because summary statistics lie. The mean of a dataset does not tell you whether the data is symmetric or skewed. The standard deviation does not tell you whether the tails are fat or thin. The median does not tell you whether the distribution is unimodal or bimodal. A single "average" hides everything that makes the data interesting — the extreme values, the clusters, the gaps, the shoulders, the truncations. When you want to understand your data, the distribution is where you look.

This chapter walks through each distributional chart type with examples using seaborn's built-in datasets and the climate project. For each chart type, we cover the basic API, the key parameters, the common variations, the pitfalls, and the specific questions it answers. By the end, you should be able to look at a data set and know which distributional chart will reveal the feature you care about.

The threshold concept of the chapter is that distribution shape is information. It is not a vague aesthetic thing; it is quantitative information that determines whether your statistical model is appropriate, whether your summary statistics are meaningful, and whether two groups are actually comparable. Learning to see distribution shapes is a skill you develop by looking at many distributions, which means producing many distributional charts and reading them carefully. This chapter gives you the tools.


17.1 Why Distribution Shape Matters

Before learning the seaborn API, it helps to be clear about why distribution shape is worth visualizing.

Summary Statistics Hide Structure

Consider two datasets. Dataset A has mean 50 and standard deviation 10. Dataset B has mean 50 and standard deviation 10. By the summary statistics, they are identical. But:

  • Dataset A is roughly normal: a bell-shaped curve centered at 50.
  • Dataset B is bimodal: two tight clusters, one near 41 and one near 59.

The mean is the same. The standard deviation is the same. The underlying structure is completely different. A model that assumes normality will describe Dataset A well; the same model applied to Dataset B will give confident answers that are systematically wrong, because the "average" value it centers on describes almost no actual observation: the data is a mix of two very different groups.

This is the lesson of Anscombe's Quartet (Chapter 1) applied to distributions. Summary statistics are lossy compressions; the distribution is the full signal. Visualization is how you recover the signal.
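To make this concrete, the following sketch builds two synthetic samples that agree on mean and standard deviation yet differ in shape. It uses NumPy only. The cluster centers (41 and 59), the within-cluster spread, and the helper name rescale are illustrative assumptions, chosen so that an overall standard deviation of 10 is attainable for an equal mix of two clusters.

```python
import numpy as np

rng = np.random.default_rng(0)

def rescale(x, mean=50.0, sd=10.0):
    """Shift and scale a sample so its mean and standard deviation match exactly."""
    return (x - x.mean()) / x.std() * sd + mean

# Dataset A: one bell-shaped cluster.
a = rescale(rng.normal(size=2000))

# Dataset B: an equal mix of two clusters (assumed centers: 41 and 59).
b = rescale(np.concatenate([rng.normal(41, 4.4, 1000), rng.normal(59, 4.4, 1000)]))

print(f"A: mean={a.mean():.2f}, sd={a.std():.2f}")
print(f"B: mean={b.mean():.2f}, sd={b.std():.2f}")

# Same summaries, different shapes: compare counts in 5-unit bins.
edges = np.arange(10, 95, 5)
counts_a, _ = np.histogram(a, bins=edges)
counts_b, _ = np.histogram(b, bins=edges)
for left, ca, cb in zip(edges[:-1], counts_a, counts_b):
    print(f"[{int(left):2d}, {int(left) + 5:2d})  A={int(ca):4d}  B={int(cb):4d}")
```

The two summary lines print identical values, while the bin counts show A piling up around 50 and B leaving a dip there.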

What Distributions Can Look Like

Common distribution shapes worth recognizing:

Unimodal and symmetric: a single peak, roughly equal tails (the normal distribution is the canonical example). Most basic statistics assume this shape.

Unimodal and skewed: a single peak with one tail longer than the other. Income distributions are right-skewed (a few people make a lot). Age-at-retirement distributions are left-skewed (most retire near the typical age, a few retire early).

Bimodal: two peaks. Often indicates two mixed populations (male/female body weights, day-of-week shoppers, high vs. low engagement users).

Multimodal: more than two peaks. Less common but meaningful when present.

Heavy-tailed: most values are in a narrow range, but occasional extreme values are much larger (or smaller) than any normal distribution would predict. Stock returns are classically heavy-tailed.

Truncated or censored: values cannot be observed below (or above) a certain threshold. Examples include test scores capped at 100, or survival times cut off for subjects still alive at the end of a study.

Uniform: roughly equal probability across a range. Rare in real data but appears in uniform random sampling.

Each shape implies different statistical approaches, different modeling assumptions, and different communication strategies. Recognizing the shape is the first step; visualizing it is the tool.


17.2 Histograms with sns.histplot

The histogram is the most familiar distributional chart. seaborn's histplot function is more capable than matplotlib's ax.hist, supporting automatic binning, multi-group comparisons, KDE overlays, and normalization.

Basic Usage

import seaborn as sns
sns.set_theme(style="whitegrid")

penguins = sns.load_dataset("penguins")

sns.histplot(data=penguins, x="body_mass_g")

This produces a histogram of penguin body mass with default binning. By default seaborn uses bins="auto", which delegates to NumPy's automatic bin-edge rule (based on the sample size and spread of the data) and usually produces reasonable output.

Bin Specification

sns.histplot(data=penguins, x="body_mass_g", bins=30)
sns.histplot(data=penguins, x="body_mass_g", bins="auto")
sns.histplot(data=penguins, x="body_mass_g", binwidth=200)
sns.histplot(data=penguins, x="body_mass_g", binrange=(2500, 6500))

  • bins=N: use exactly N bins.
  • bins="auto" (the default) or another rule name such as "fd", "doane", or "sturges": let a rule choose the bin count.
  • binwidth=W: specify the width of each bin directly.
  • binrange=(min, max): restrict the binning to a specific range.

For most data, the default works. When you want finer control — for example, to match a specific target bin width — specify binwidth explicitly.

Comparing Groups with hue

The hue parameter produces multi-group histograms. The multiple parameter controls how the groups are arranged:

# Layered (overlaid with transparency)
sns.histplot(data=penguins, x="body_mass_g", hue="species", multiple="layer")

# Stacked
sns.histplot(data=penguins, x="body_mass_g", hue="species", multiple="stack")

# Side-by-side (dodged)
sns.histplot(data=penguins, x="body_mass_g", hue="species", multiple="dodge")

# Filled (each bin is normalized so the groups' shares sum to 1)
sns.histplot(data=penguins, x="body_mass_g", hue="species", multiple="fill")

Each arrangement answers a slightly different question:

  • layer: good for comparing the shapes of the distributions directly (see where they overlap and differ).
  • stack: good for seeing totals and per-group contributions to each bin.
  • dodge: good for seeing individual group counts without overlap confusion.
  • fill: good for seeing proportional composition at each bin (ignores total count).
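The arithmetic behind these arrangements can be sketched with NumPy alone. The example below uses two synthetic groups (the group names and parameters are made up for illustration) and computes what "stack" and "fill" draw from the same per-group bin counts. It is a sketch of the idea, not seaborn's own implementation.

```python
import numpy as np

rng = np.random.default_rng(2)

# Two synthetic groups of different sizes (values loosely resemble body mass in grams).
groups = {
    "small": rng.normal(3700, 450, size=150),
    "large": rng.normal(5000, 500, size=120),
}

# Shared bin edges, so that bars from different groups line up.
edges = np.histogram_bin_edges(np.concatenate(list(groups.values())), bins=15)
counts = np.array([np.histogram(v, bins=edges)[0] for v in groups.values()])

# "layer" and "dodge" draw these per-group counts directly.
# "stack" draws each group on top of the previous one: cumulative sums over groups.
stack_tops = counts.cumsum(axis=0)

# "fill" rescales every bin so that the groups' shares sum to 1 in that bin.
totals = counts.sum(axis=0)
fill = np.divide(counts, totals, out=np.zeros(counts.shape), where=totals > 0)

print(counts)
print(stack_tops[-1])
print(fill.round(2))
```

Note that "fill" discards the total count per bin: a bin with 2 observations and a bin with 60 both reach the top of the axis.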

Normalization with stat

By default, histogram y-axis shows counts. You can change this with the stat parameter:

sns.histplot(data=penguins, x="body_mass_g", stat="count")      # default: count
sns.histplot(data=penguins, x="body_mass_g", stat="frequency")  # count / bin width
sns.histplot(data=penguins, x="body_mass_g", stat="density")    # density: area sums to 1
sns.histplot(data=penguins, x="body_mass_g", stat="probability")  # probability: bars sum to 1
sns.histplot(data=penguins, x="body_mass_g", stat="percent")    # percent: bars sum to 100

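For comparing groups with different sample sizes, use stat="density" or stat="probability" together with common_norm=False, so that each group is normalized on its own. With the default common_norm=True the normalization is applied across all groups combined, and larger groups still visually dominate smaller ones regardless of the actual distribution shape.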
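The stat options are simple rescalings of the same bin counts. A minimal NumPy sketch of that arithmetic, using a synthetic sample in place of a real column (seaborn's implementation may differ in details):

```python
import numpy as np

rng = np.random.default_rng(1)
values = rng.normal(4200, 800, size=300)  # synthetic stand-in for a numeric column

counts, edges = np.histogram(values, bins=20)   # stat="count"
widths = np.diff(edges)

frequency = counts / widths                     # stat="frequency"
probability = counts / counts.sum()             # stat="probability"
percent = 100 * probability                     # stat="percent"
density = counts / (counts.sum() * widths)      # stat="density"

print(probability.sum())          # bar heights add up to 1 (within rounding)
print((density * widths).sum())   # bar areas add up to 1 (within rounding)
```

The distinction between "probability" and "density" matters when bin widths differ between charts: heights are comparable under "probability" only if the bins match, while areas are comparable under "density" regardless.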

KDE Overlay

sns.histplot(data=penguins, x="body_mass_g", kde=True)

The kde=True parameter overlays a kernel density estimate on the histogram. This gives you both the binned counts (histogram) and the smooth estimate (KDE) in a single chart, useful when you want to show both the raw data and the inferred shape.


17.3 Kernel Density Estimates with sns.kdeplot

Kernel density estimation (KDE) produces a smooth estimate of the distribution by placing a small "kernel" (usually a Gaussian) at each data point and summing them. The result is a continuous curve that approximates the true underlying distribution.

Basic KDE

sns.kdeplot(data=penguins, x="body_mass_g")

This produces a smooth density curve. The y-axis is density: the area under the curve integrates to 1.

Bandwidth Parameter

The bandwidth controls the smoothness of the KDE. A small bandwidth produces a jagged curve that follows individual data points; a large bandwidth produces a smooth curve that may hide features.

sns.kdeplot(data=penguins, x="body_mass_g", bw_adjust=0.3)  # less smooth
sns.kdeplot(data=penguins, x="body_mass_g", bw_adjust=1.0)  # default
sns.kdeplot(data=penguins, x="body_mass_g", bw_adjust=3.0)  # much smoother

bw_adjust is a multiplier on seaborn's automatically chosen bandwidth. Values less than 1 produce a less-smooth curve; values greater than 1 produce a smoother curve. The default (1.0) leaves the data-driven bandwidth unchanged; that bandwidth comes from bw_method, which defaults to Scott's rule ("silverman" is the built-in alternative).
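To see what the bandwidth does numerically, here is a hand-rolled Gaussian KDE in NumPy. It is a sketch under stated assumptions: the functions gaussian_kde_1d and count_peaks are hypothetical helpers, the data is synthetic, and the bandwidth uses the one-variable form of Scott's rule scaled by a bw_adjust multiplier. seaborn's own computation may differ in details.

```python
import numpy as np

def gaussian_kde_1d(sample, grid, bw_adjust=1.0):
    """Average of Gaussian kernels centered on each observation."""
    n = sample.size
    # Scott's rule for one variable, scaled by the bw_adjust multiplier.
    bandwidth = bw_adjust * sample.std(ddof=1) * n ** (-1 / 5)
    z = (grid[:, None] - sample[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2 * np.pi))
    return kernels.mean(axis=1)

def count_peaks(curve):
    """Number of strict interior local maxima of a sampled curve."""
    inner = curve[1:-1]
    return int(np.sum((inner > curve[:-2]) & (inner > curve[2:])))

rng = np.random.default_rng(3)
# Synthetic bimodal sample: two clusters of 200 points each.
sample = np.concatenate([rng.normal(3700, 300, 200), rng.normal(5000, 300, 200)])
grid = np.linspace(1000, 8000, 1401)
step = grid[1] - grid[0]

for adjust in (0.3, 1.0, 3.0):
    density = gaussian_kde_1d(sample, grid, bw_adjust=adjust)
    # Multiplier, number of peaks found, and approximate area under the curve.
    print(adjust, count_peaks(density), round(density.sum() * step, 3))
```

The area stays close to 1 at every setting; what changes is how many bumps survive. A small multiplier tends to report extra peaks that are noise, and a large one can blur the two clusters together.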

Multi-Group KDE

sns.kdeplot(data=penguins, x="body_mass_g", hue="species", fill=True, alpha=0.5)

The fill=True parameter fills the area under each curve. With hue, this produces multiple filled densities — one per species — that overlap with transparency. The result is a visual comparison of the distribution shapes across groups.

2D KDE (Bivariate)

sns.kdeplot(data=penguins, x="bill_length_mm", y="bill_depth_mm")
sns.kdeplot(data=penguins, x="bill_length_mm", y="bill_depth_mm", fill=True, cmap="Blues")

Passing both x and y produces a 2D kernel density estimate — a contour plot of the joint density of the two variables. With fill=True, the contours are filled. With cmap, you control the colormap.

2D KDE is useful for showing where clusters or modes live in a 2D space, especially when a scatter plot would be too dense to read.

KDE Pitfalls

1. Boundary effects. KDE assumes the data can take any value, so the kernels near a hard limit spill past it. For strictly positive data (like income or age), this can produce a tail that goes below zero, which is misleading. The clip=(lower, upper) parameter stops the curve from being evaluated outside a specific range; it hides the impossible tail but does not redistribute the density that leaked past the boundary.

2. Over-smoothing. A large bandwidth can hide bimodality or other important features. Check multiple bandwidth values to see if the shape is robust.

3. Under-smoothing. A small bandwidth produces a jagged curve that shows noise rather than signal. The default bandwidth tends to err on the smooth side for multimodal data, so halving it (bw_adjust=0.5) is a reasonable exploratory check, but going much lower mostly adds noise.

4. Small sample sizes. KDE is unreliable for small samples (fewer than ~30 points). For small datasets, use histograms or strip plots instead.
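The boundary effect in pitfall 1 can be quantified. The sketch below assumes a Gaussian kernel with a Scott's-rule bandwidth and uses synthetic, strictly positive data; it computes how much of the estimated probability mass lands below zero even though no observation does.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(4)
waits = rng.exponential(scale=2.0, size=400)  # synthetic, strictly positive

# Scott's rule bandwidth for one variable (assumed here for illustration).
bandwidth = waits.std(ddof=1) * waits.size ** (-1 / 5)

def normal_cdf(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Each Gaussian kernel centered at x puts normal_cdf((0 - x) / bandwidth) of its
# mass below zero; the KDE's mass below zero is the average over observations.
mass_below_zero = sum(normal_cdf(-x / bandwidth) for x in waits) / waits.size

print(f"smallest observation: {waits.min():.3f}")
print(f"estimated probability below zero: {mass_below_zero:.1%}")
```

Clipping the drawn curve at zero removes this tail from view, but the leaked mass is still missing from the visible part of the curve near the boundary.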


17.4 Rug Plots: Marginal Marks for Individual Observations

A rug plot draws a small tick mark for each data point along the bottom (or side) of the chart, showing the exact location of every observation.

sns.rugplot(data=penguins, x="body_mass_g")

Rug plots are rarely used alone — they are supplementary marks that combine with other distributional charts:

# Histogram with rug
sns.histplot(data=penguins, x="body_mass_g", kde=True)
sns.rugplot(data=penguins, x="body_mass_g")

# KDE with rug
sns.kdeplot(data=penguins, x="body_mass_g")
sns.rugplot(data=penguins, x="body_mass_g")

The rug plot adds the raw observations as context on top of the smoothed or binned distribution. This is particularly useful for:

  • Showing actual data points alongside a density estimate.
  • Identifying gaps in the distribution (where no data lives).
  • Seeing clustering that binning might hide.

The height parameter controls how tall the rug marks are as a fraction of the axes height. Default is small (0.025). For denser plots, a shorter rug is less cluttered.


17.5 ECDFs with sns.ecdfplot

The empirical cumulative distribution function (ECDF) is an often-underused distributional chart that has specific advantages over histograms and KDE.

What an ECDF Is

The ECDF at value x is the fraction of the data that is less than or equal to x. It is a step function that starts at 0 on the left, increases by 1/n at each data point (where n is the sample size), and ends at 1 on the right.

sns.ecdfplot(data=penguins, x="body_mass_g")

This produces a step curve from 0 to 1 showing the cumulative distribution.
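The definition translates directly into a few lines of NumPy. In this sketch the function name ecdf and the eight sample values are illustrative:

```python
import numpy as np

def ecdf(sample):
    """Return sorted values and the fraction of observations <= each value."""
    x = np.sort(np.asarray(sample, dtype=float))
    y = np.arange(1, x.size + 1) / x.size
    return x, y

x, y = ecdf([3200, 4100, 3650, 5000, 3900, 4500, 3400, 4750])

# Smallest observed value whose cumulative fraction reaches 0.5.
median_like = x[np.searchsorted(y, 0.5)]
print(list(zip(x, y)))
print(median_like)
```

Each of the eight observations raises the curve by 1/8, and reading across from y = 0.5 lands on the fourth-smallest value.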

Why ECDFs Are Good

1. No binning decisions. ECDFs show every data point without the binning judgment calls that histograms require. There is no "right" number of bins to choose.

2. Easy quantile reading. At any y-value, you can read the corresponding quantile: at y=0.5, the x-value is the median. At y=0.25, it is the 25th percentile. No calculation needed.

3. Exact comparison of groups. Overlapping ECDFs let you see exactly where two distributions differ. A gap between two ECDFs at a specific x-value shows the difference in cumulative probability at that x.

4. No smoothing choices. Unlike KDE, ECDFs do not require a bandwidth. The curve is the data, exactly.

5. Good for small samples. ECDFs work at any sample size because they show every point directly.

Multi-Group ECDFs

sns.ecdfplot(data=penguins, x="body_mass_g", hue="species")

With hue, you get one ECDF per group, and you can see exactly how the distributions differ across the range. The two-sample Kolmogorov-Smirnov test, a standard non-parametric test, uses the maximum vertical distance between two ECDFs as its test statistic. Visualizing the ECDFs shows you that statistic directly.
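The maximum vertical distance between two ECDFs is short to compute by hand. The function name ks_distance and the two synthetic samples below are illustrative; SciPy's scipy.stats.ks_2samp reports the same statistic together with a p-value.

```python
import numpy as np

def ks_distance(a, b):
    """Largest vertical gap between the ECDFs of two samples."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])  # the gap can only change at observed values
    ecdf_a = np.searchsorted(a, grid, side="right") / a.size
    ecdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.abs(ecdf_a - ecdf_b).max())

rng = np.random.default_rng(5)
light = rng.normal(3700, 450, size=150)
heavy = rng.normal(5000, 500, size=120)

print(ks_distance(light, light))   # identical samples: no gap
print(ks_distance(light, heavy))   # well-separated samples: large gap
```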

Why ECDFs Are Underused

ECDFs are less familiar than histograms, so first-time viewers take longer to read them. They do not show "peaks" the way histograms do — the cumulative nature means the eye reads the slope, not the height. For audiences who are not used to them, a brief explanation helps.

Despite the unfamiliarity, ECDFs are probably the most honest distributional visualization available. They make no assumptions, require no decisions, and show every data point. For exploratory data analysis and precise comparison of groups, ECDFs should be in your toolbox.


17.6 Violin Plots with sns.violinplot

A violin plot is a box plot with a kernel density estimate draped over it. It shows the summary statistics (box plot) and the distribution shape (KDE) in a single chart, useful for comparing distributions across categories.

Basic Violin Plot

sns.violinplot(data=penguins, x="species", y="body_mass_g")

This produces one "violin" per species, each showing the distribution of body mass within that group. The violin's width at any y-value is proportional to the density at that value; the thicker sections are where more data lives.

Inner Representations

sns.violinplot(data=penguins, x="species", y="body_mass_g", inner="quartile")
sns.violinplot(data=penguins, x="species", y="body_mass_g", inner="box")
sns.violinplot(data=penguins, x="species", y="body_mass_g", inner="point")
sns.violinplot(data=penguins, x="species", y="body_mass_g", inner="stick")
sns.violinplot(data=penguins, x="species", y="body_mass_g", inner=None)

The inner parameter controls what is drawn inside each violin:

  • "quartile": draws lines at the 25th, 50th, and 75th percentiles.
  • "box": draws a small box plot inside the violin. Shows both the summary and the shape. Default.
  • "point": draws each observation as a small mark.
  • "stick": draws each observation as a short line across the violin.
  • None: draws nothing inside, just the violin outline.

For most purposes, "quartile" or "box" is the right choice. Use "point" or "stick" when sample sizes are small and you want to show individual observations.

Split Violins

For two-level categorical variables, split=True puts half of one group on each side of the violin, enabling direct comparison:

sns.violinplot(data=penguins, x="species", y="body_mass_g", hue="sex", split=True, inner="quartile")

This produces three violins (one per species), each split in half: male on one side, female on the other. The split is particularly useful for before/after comparisons or male/female comparisons where you want to see both halves as one shape.

Violin Plot Pitfalls

1. Misleading tails. By default, KDE extends beyond the actual data range, producing "tails" that suggest data exists where it does not. Use the cut parameter to control this: cut=0 truncates the violin at the actual data range; higher values (default is cut=2) extend it by that many bandwidths past the most extreme observations.

2. Symmetry confusion. A violin plot is symmetric around its central axis — both sides are the same shape. The symmetry is a stylistic choice; it does not mean the data is symmetric. Readers unfamiliar with violins may misread them.

3. Small sample sizes. Like KDE, violin plots are unreliable for small samples. Use strip or swarm plots for groups with fewer than ~30 observations.

4. Too many groups. Violin plots get cluttered when there are more than ~10 groups. For many-group comparisons, consider ridge plots (next section) instead.
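The effect of cut in pitfall 1 is easy to work out numerically. This sketch assumes a Scott's-rule bandwidth and synthetic test scores capped at 100; the helper drawn_range is hypothetical and only illustrates the "data extremes plus cut bandwidths" idea.

```python
import numpy as np

rng = np.random.default_rng(6)
scores = np.clip(rng.normal(78, 12, size=200), 0, 100)  # synthetic scores, capped at 100

# Scott's rule bandwidth for one variable (assumed here for illustration).
bandwidth = scores.std(ddof=1) * scores.size ** (-1 / 5)

def drawn_range(sample, bandwidth, cut):
    """Range over which the density is drawn: data extremes padded by cut * bandwidth."""
    return sample.min() - cut * bandwidth, sample.max() + cut * bandwidth

print(drawn_range(scores, bandwidth, cut=2))  # extends past the highest score
print(drawn_range(scores, bandwidth, cut=0))  # stops at the observed extremes
```

With cut=2 the upper end of the violin passes 100, a score nobody can achieve; with cut=0 it stops at the highest observed score.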


17.7 Ridge Plots (Joy Plots)

Ridge plots (also called joy plots, after Joy Division's album cover) stack multiple KDE curves vertically with slight overlap, creating a striking visual that can compare many groups at once.

seaborn does not have a built-in ridge plot function, but the pattern is straightforward with FacetGrid:

import seaborn as sns
import matplotlib.pyplot as plt

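# The tips dataset has a day column with 4 values (Thur, Fri, Sat, Sun)
tips = sns.load_dataset("tips")

# Transparent axes background, so overlapping rows do not hide each other
sns.set_theme(style="white", rc={"axes.facecolor": (0, 0, 0, 0)})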

# Use FacetGrid with row-based faceting
g = sns.FacetGrid(
    tips,
    row="day",
    hue="day",
    aspect=6,
    height=1,
    palette="viridis",
)

# Draw densities
g.map(sns.kdeplot, "total_bill", clip_on=False, fill=True, alpha=0.7, linewidth=1.5)
g.map(sns.kdeplot, "total_bill", clip_on=False, color="w", linewidth=2)

# Overlap the rows
g.figure.subplots_adjust(hspace=-0.6)

# Remove axes
g.set_titles("")
g.set(yticks=[], ylabel="")
g.despine(bottom=True, left=True)

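# Add labels inline (g.row_names follows the row order of the grid;
# tips["day"].unique() returns order of appearance, which can differ)
for ax, day in zip(g.axes.flat, g.row_names):
    ax.text(0, 0.1, day, fontweight="bold", fontsize=12, transform=ax.transAxes)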

The key elements:

  • FacetGrid with row="day": one row per group.
  • aspect=6, height=1: each row is 1 inch tall and 6 inches wide (a wide strip).
  • g.map(sns.kdeplot, ...): draw a filled KDE on each row.
  • hspace=-0.6: negative vertical spacing causes the rows to overlap, creating the "ridge" effect.
  • Hide axes and use inline labels: ridge plots typically have no axis labels; each row is labeled inline.

The result is an aesthetically striking chart that compares distributions across many groups. It is less precise than side-by-side violins but more visually compelling.

When Ridge Plots Work

Ridge plots are best for:

  • Comparing distributions across many groups (5-20 groups).
  • Showing temporal evolution of a distribution (one row per time period).
  • Producing an eye-catching visualization for presentations or articles.

They are not good for:

  • Precise quantile comparison (use ECDFs).
  • Very small groups (KDE is unreliable).
  • Cases where precise values matter (the visual impact is the point, not exact reading).

17.8 Choosing Between Distributional Chart Types

With six distributional chart types available (histogram, KDE, ECDF, violin, ridge, and the rarely-used rug), how do you choose?

Here is a decision matrix:

| Question | Chart type |
|---|---|
| "What does the distribution look like?" | Histogram, KDE |
| "Is it bimodal?" | Histogram, KDE (check bandwidth) |
| "What are the exact quantiles?" | ECDF |
| "How do two groups compare?" | Overlaid histograms, KDE, or ECDF |
| "How do many groups compare?" | Ridge plot, violin (up to ~10 groups) |
| "What is the group median and spread?" | Violin or box plot |
| "What do the individual observations look like?" | Strip plot with histogram/KDE overlay |
| "Is there a cluster structure in 2D?" | 2D KDE or scatter with density contour |

No single chart type is always right. The choice depends on the question and the audience:

  • For exploratory analysis: start with histograms (fastest), add KDE if you want smooth shapes, use ECDFs for precise comparison.
  • For presentation to general audiences: histograms and violins are most familiar.
  • For presentation to statistical audiences: ECDFs and violin plots are both appropriate.
  • For aesthetic impact: ridge plots are the most striking.
  • For small samples: histograms or strip plots; avoid KDE and violin plots.

17.9 When the Data Is Small

All the chart types in this chapter work well for moderate to large sample sizes (say, 50 or more observations). For smaller samples, they can be misleading. This section explains what to do with small data.

The Problem with Small Samples

A KDE on 5 data points produces a curve that looks just as smooth as a KDE on 500 points. The visual difference between "robust estimate" and "wild guess" is not apparent. A violin plot on 5 points produces a smooth shape that has no statistical basis. A histogram with 30 bins on 10 points has mostly empty bins. All of these are technically possible but misleading in practice.

Rules of Thumb

  • Fewer than 5 points per group: show individual observations. Use sns.stripplot or sns.swarmplot or just sns.scatterplot. Do not try to estimate a distribution.
  • 5 to 30 points per group: show individual observations, optionally with a simple summary (mean or median). Use strip/swarm plot with a pointplot overlay.
  • 30 to 100 points per group: box plot or violin plot with inner="points" to show individual observations as well as the summary. Avoid KDE without checking against a histogram.
  • 100+ points per group: any distributional chart type is reasonable. KDE and violin become reliable; histograms have enough counts per bin to show structure.
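These thresholds are simple enough to encode as a helper for use in an exploratory script. The function name suggest_chart and its return strings are hypothetical, and the cutoffs are rules of thumb rather than hard limits:

```python
def suggest_chart(n_per_group):
    """Map a per-group sample size to the rule-of-thumb chart family above."""
    if n_per_group < 5:
        return "strip or swarm plot only"
    if n_per_group < 30:
        return "strip or swarm plot with a mean or median marker"
    if n_per_group < 100:
        return "box or violin plot with individual observations shown"
    return "any distributional chart (histogram, KDE, ECDF, violin)"

for n in (3, 12, 60, 500):
    print(n, "->", suggest_chart(n))
```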

The Strip+Summary Pattern

For small samples, the most honest chart is a strip plot (showing every observation) with a summary overlay:

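# small_df: a DataFrame with a categorical "group" column and a numeric "value" column
ax = sns.stripplot(data=small_df, x="group", y="value", alpha=0.6, size=8)
# linestyle="none" (seaborn 0.13+; older versions use join=False) keeps the
# group means from being connected by a line
sns.pointplot(data=small_df, x="group", y="value", color="black", markers="_",
              estimator="mean", errorbar=None, linestyle="none", ax=ax)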

The strip plot shows every point; the point plot overlays the mean as a horizontal tick. The reader sees both the individual observations and the group summary without pretending the data is denser than it is.

Why This Matters

Showing a smooth KDE for a 5-point dataset is a form of false precision. The curve looks as confident as a KDE for 5,000 points, but the confidence is not warranted. Readers who do not look at the sample size will absorb the smooth shape as reliable. For ethical data visualization (Chapter 4's territory), small-sample distributions should be shown honestly — as individual points, with clear indication of the sample size.

17.10 Quick Reference: Key Parameters for Each Function

As you work through the distributional functions, the parameter lists can feel overwhelming. This section is a quick reference for the most-used parameters of each function.

sns.histplot

sns.histplot(
    data=df,
    x="column",                      # variable to plot
    y=None,                          # for 2D histograms
    hue="group",                     # categorical grouping
    bins=30,                         # number of bins or rule
    binwidth=None,                   # alternative: bin width
    stat="count",                    # count, density, probability, percent, frequency
    multiple="layer",                # layer, stack, dodge, fill (when hue is used)
    kde=False,                       # overlay KDE curve
    cumulative=False,                # cumulative histogram
    element="bars",                  # bars, step, poly
    shrink=1.0,                      # fraction of bar width (for dodged histograms)
)

sns.kdeplot

sns.kdeplot(
    data=df,
    x="column",
    y=None,                          # for 2D KDE
    hue="group",
    fill=False,                      # fill area under the curve
    alpha=None,                      # transparency
    bw_method="scott",               # bandwidth method (scott, silverman, or number)
    bw_adjust=1.0,                   # multiplier on automatic bandwidth
    cut=3,                           # extend curve past data (in bandwidths)
    clip=None,                       # clip curve to range
    multiple="layer",                # layer, stack, fill (when hue is used)
    common_norm=True,                # normalize across groups
)

sns.ecdfplot

sns.ecdfplot(
    data=df,
    x="column",
    hue="group",
    stat="proportion",               # proportion, percent, or count
    complementary=False,             # 1 - ECDF instead of ECDF
)

sns.violinplot

sns.violinplot(
    data=df,
    x="category",
    y="value",
    hue="subgroup",
    split=False,                     # split violins for 2-level hue
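    inner="quartile",                # box (default), quartile, point, stick, None
    cut=2,                           # extend beyond data range (in bandwidths)
    bw_method="scott",               # bandwidth method (scott, silverman, or number)
    density_norm="area",             # area, count, width (seaborn 0.13+)
    # scale=... is the deprecated pre-0.13 name for density_norm; do not pass both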
)

These quick references cover the parameters you will use 90% of the time. For the full parameter lists, see the seaborn documentation.

17.11 Real-World Distribution Examples

To make the chart-type choices concrete, here are several real-world scenarios and the distributional chart that best fits each.

Example 1: Tracking a Product's Response Time

Scenario: You are analyzing the response time of a web service. You have 100,000 response time measurements in milliseconds. You want to understand the distribution and identify the tail.

Shape considerations: Response times are usually right-skewed with a heavy tail (most requests are fast, a few are very slow). The tail is the interesting part.

Recommended chart: Histogram with log x-axis (or ECDF with log axis). The log scale compresses the wide range, and the tail becomes visible. A linear-axis histogram would either squish the median into a single bin or make the tail invisible.

Code:

sns.histplot(data=requests, x="response_time_ms", log_scale=True, bins=50)
# Or
sns.ecdfplot(data=requests, x="response_time_ms", log_scale=True)
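The reason for the log axis can be checked numerically. This sketch uses synthetic lognormal response times (an assumption standing in for real measurements) and compares 50 equal-width bins against 50 log-spaced bins:

```python
import numpy as np

rng = np.random.default_rng(7)
# Synthetic right-skewed response times in milliseconds (median around 100 ms).
times = rng.lognormal(mean=np.log(100), sigma=1.0, size=100_000)

linear_edges = np.linspace(times.min(), times.max(), 51)
log_edges = np.geomspace(times.min(), times.max(), 51)

linear_share = np.histogram(times, bins=linear_edges)[0] / times.size
log_share = np.histogram(times, bins=log_edges)[0] / times.size

print(f"largest linear bin holds {linear_share.max():.0%} of requests")
print(f"largest log bin holds {log_share.max():.0%} of requests")
```

On the linear scale a single bin swallows most of the data, which is the "squished" histogram described above; log-spaced bins spread the same observations across many bars.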

Example 2: Comparing Test Scores Across Classes

Scenario: You have test scores for 500 students in 10 classes. You want to compare how the distributions differ across classes.

Shape considerations: Test scores are often truncated (0 to 100 is the theoretical range), approximately normal in the middle, and have discrete possible values (integer scores).

Recommended chart: Violin plot by class, with inner="quartile" to show the median and IQR. For 10 classes, violins are still readable; with substantially more, consider a ridge plot.

Code:

sns.violinplot(data=scores, x="class", y="score", inner="quartile", cut=0)
# cut=0 stops each violin at the observed minimum and maximum, so it cannot extend below 0 or above 100

Example 3: Customer Purchase Amounts

Scenario: You have purchase amounts for 50,000 customers. You want to compare the distributions for "regular" customers vs. "premium" customers.

Shape considerations: Purchase amounts are right-skewed (most purchases are small, a few are large). You have two groups with potentially different sample sizes.

Recommended chart: ECDF with hue="segment" (for precise comparison) or KDE with fill and alpha (for visual comparison). If you use a histogram or KDE, normalize each group separately (stat="density" with common_norm=False on a histogram, common_norm=False on a KDE) to avoid the larger group dominating.

Code:

# ECDF for precise comparison
sns.ecdfplot(data=purchases, x="amount", hue="segment")

# KDE for visual comparison (common_norm=False normalizes each segment separately)
sns.kdeplot(data=purchases, x="amount", hue="segment", fill=True, alpha=0.4, log_scale=True, common_norm=False)
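Why per-group normalization matters when segment sizes differ can be shown with plain bin counts. The sketch below uses synthetic purchase amounts with a 9:1 size ratio between segments (both the data and the ratio are assumptions) and compares the area each segment receives under joint versus separate normalization:

```python
import numpy as np

rng = np.random.default_rng(8)
# Synthetic purchase amounts: a large "regular" segment and a small "premium" one.
segments = {
    "regular": rng.lognormal(mean=3.0, sigma=0.6, size=9000),
    "premium": rng.lognormal(mean=4.0, sigma=0.6, size=1000),
}
total = sum(v.size for v in segments.values())
edges = np.histogram_bin_edges(np.concatenate(list(segments.values())), bins=40)
widths = np.diff(edges)

for name, values in segments.items():
    counts = np.histogram(values, bins=edges)[0]
    # Joint normalization: all segments together have area 1.
    common_area = (counts / (total * widths) * widths).sum()
    # Separate normalization: each segment has area 1 on its own.
    separate_area = (counts / (values.size * widths) * widths).sum()
    print(f"{name}: common={common_area:.2f}, separate={separate_area:.2f}")
```

Under joint normalization the premium segment gets only a tenth of the total area, so its shape is hard to read next to the regular segment; under separate normalization both shapes are drawn at the same scale.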

Example 4: Clinical Trial Outcomes

Scenario: You have blood pressure measurements for 200 patients, split by treatment group (control vs. treatment).

Shape considerations: Blood pressure is approximately normal. The sample size per group is moderate. You want to see individual observations and the group differences.

Recommended chart: Violin with inner="points" to show individual observations, or box plot + strip plot overlay for classic scientific figure style.

Code:

# Violin with points
sns.violinplot(data=trial, x="treatment", y="blood_pressure", inner="points")

# Box + strip combo
ax = sns.boxplot(data=trial, x="treatment", y="blood_pressure", showfliers=False)
sns.stripplot(data=trial, x="treatment", y="blood_pressure", alpha=0.3, color="black", ax=ax)

Example 5: Age Distribution by Country

Scenario: You have age data for citizens of 15 countries. You want to compare the age distributions to show demographic differences.

Shape considerations: Age distributions differ across countries (younger in developing countries, older in developed). 15 groups is too many for side-by-side violins.

Recommended chart: Ridge plot with one row per country. The aesthetic appeal compensates for the reduced precision.

Code:

# FacetGrid-based ridge plot, see Section 17.7 for the full recipe

These examples illustrate the principle: the right distributional chart depends on the shape of the data, the sample size, the number of groups, and the audience. No single chart type is always right.

17.12 Bivariate Distributions

Beyond 1D distributions, seaborn supports visualizing the joint distribution of two variables. This section covers the options.

2D Histogram with histplot

sns.histplot(data=penguins, x="bill_length_mm", y="bill_depth_mm", bins=30)

Passing both x and y to histplot produces a 2D histogram where each cell is colored by the count of points in that bin. This is comparable to matplotlib's hist2d, with seaborn's theming and support for hue.

The cbar=True parameter adds a colorbar showing the count scale.
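The numbers behind a 2D histogram can be inspected with NumPy's histogram2d. The two synthetic variables below are illustrative stand-ins for a pair of correlated measurements:

```python
import numpy as np

rng = np.random.default_rng(9)
# Synthetic stand-ins for two correlated measurements (names are illustrative).
length = rng.normal(44, 5, size=340)
depth = 25 - 0.18 * length + rng.normal(0, 1.5, size=340)

counts, x_edges, y_edges = np.histogram2d(length, depth, bins=30)

print(counts.shape)          # one cell per (x bin, y bin) pair
print(int(counts.sum()))     # every observation lands in exactly one cell
print(int(counts.max()))     # the count behind the most intense cell

# Summing over the y bins recovers the ordinary 1D histogram of x.
x_marginal = counts.sum(axis=1)
print(np.allclose(x_marginal, np.histogram(length, bins=x_edges)[0]))
```

The last check is the relationship between a joint distribution and its marginals: collapsing one axis of the 2D counts gives the 1D histogram of the other variable.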

2D KDE with kdeplot

sns.kdeplot(data=penguins, x="bill_length_mm", y="bill_depth_mm")
sns.kdeplot(data=penguins, x="bill_length_mm", y="bill_depth_mm", fill=True, cmap="Blues")

A 2D KDE estimates the joint density using kernel smoothing. The result is a contour plot (unfilled) or a filled contour (with fill=True). Both variants show where the density is highest in the 2D plane.

For two populations, add hue:

sns.kdeplot(data=penguins, x="bill_length_mm", y="bill_depth_mm", hue="species", fill=True, alpha=0.4)

This produces overlapping filled contours, one per species. The intersections show where the species distributions overlap in the bill-length / bill-depth space.

Joint Plot: Bivariate + Marginals

The convenience function sns.jointplot produces a bivariate plot plus marginal distributions:

sns.jointplot(data=penguins, x="bill_length_mm", y="bill_depth_mm", kind="scatter")
sns.jointplot(data=penguins, x="bill_length_mm", y="bill_depth_mm", kind="kde")
sns.jointplot(data=penguins, x="bill_length_mm", y="bill_depth_mm", kind="hex")

The central panel shows the bivariate distribution (scatter, KDE, or hexbin). The top and right panels show the marginal (1D) distributions of each axis. This is a quick way to get a comprehensive view of two variables and their relationship.

We cover jointplot in more detail in Chapter 19 (multi-variable exploration).

When to Use 2D Distributional Charts

  • 2D histogram: when you have many points and want to see density without overplotting. Good for large datasets.
  • 2D KDE contours: when you want smooth density estimates and visual comparison of cluster shapes across groups.
  • Hexbin: an alternative 2D histogram using hexagonal bins, often more visually pleasing than rectangular. Access via sns.jointplot(kind="hex") or matplotlib's ax.hexbin.
  • 2D scatter with rug marks: when the individual points matter and overplotting is not a problem. The rug marks on the axes show the marginal distributions.

For most bivariate distributional questions, a joint plot is a good starting point because it shows both the bivariate relationship and the two marginals together.

17.10 Distributional Pitfalls and How to Avoid Them

Distributional charts are easy to produce but easy to misuse. This section catalogs the common failure modes.

Pitfall 1: Wrong Bin Count in Histograms

Symptom: The histogram looks either lumpy (too few bins) or noisy (too many bins).

Cause: The default bin count may not match your specific data. Too few hides structure; too many shows noise.

Fix: Try several bin counts (bins=10, bins=30, bins=100) and pick the one where the shape is clearly visible without noise. For most datasets, 20-50 bins is reasonable. For small datasets, fewer; for large datasets, more.
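Before trying bin counts by hand, it can help to see what a few standard reference rules suggest. The sketch below uses numpy.histogram_bin_edges on a synthetic sample. To our understanding, histplot forwards its bins argument to the same NumPy function, so a rule name such as bins="fd" should also work there, but treat that as an assumption to check against your seaborn version.

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(size=1000)  # synthetic sample, for illustration only

# Ask NumPy for bin edges under a few standard rules, then count the bins.
bin_counts = {
    rule: len(np.histogram_bin_edges(data, bins=rule)) - 1
    for rule in ("sturges", "fd", "auto")
}
print(bin_counts)
```

The rules usually disagree, which is the point: use them as starting values for the visual comparison described above, not as a final answer.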

Pitfall 2: KDE Over-Smoothing

Symptom: The KDE curve is flat or single-peaked when you suspect the data has more structure.

Cause: The default bandwidth (Scott's rule) assumes a normal distribution. For bimodal or multimodal data, it over-smooths.

Fix: Set bw_adjust to a value less than 1 (try 0.5 or 0.3) to reduce the smoothing. Compare with a histogram of the same data to verify the bimodality is real.

Pitfall 3: KDE Tails Below Zero for Strictly Positive Data

Symptom: A KDE of income, age, or other strictly positive data extends below zero.

Cause: Each kernel is symmetric around its data point, so observations near zero spread part of their density into negative values. The estimator has no knowledge that the variable is bounded.

Fix: Use clip=(0, None) to truncate the KDE at zero. For ECDFs (which handle this correctly by default), this is not a problem.
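To get a feel for how much density ends up on the wrong side of the boundary, the sketch below computes the share of a Gaussian KDE's mass that falls below zero for a synthetic, strictly positive sample. The bandwidth is a simple rule-of-thumb value chosen for illustration, and seaborn's exact bandwidth may differ.

```python
import math
import numpy as np

rng = np.random.default_rng(2)
income = rng.exponential(scale=1.0, size=400)  # strictly positive, synthetic

# Rule-of-thumb bandwidth (an assumption; seaborn's exact value may differ).
h = income.std(ddof=1) * len(income) ** (-1 / 5)

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# A Gaussian kernel centered at x_i puts normal_cdf((0 - x_i) / h) of its
# mass below zero; averaging over points gives the KDE's total mass below zero.
mass_below_zero = float(np.mean([normal_cdf(-xi / h) for xi in income]))
print(f"bandwidth: {h:.3f}, mass below zero: {mass_below_zero:.1%}")
```

As far as we can tell, clip only stops the curve from being drawn outside the limits; it does not move the leaked mass back inside. The clipped curve therefore slightly understates the density near zero, which is another reason to prefer an ECDF when the boundary region matters.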

Pitfall 4: Stacked Histogram Misread as Comparison

Symptom: A stacked histogram is read as showing the distribution shape for each group, but the reader cannot tell which group has which shape because the colors are stacked on top of each other.

Cause: multiple="stack" answers "what is the total and how does each group contribute" rather than "what does each group's distribution look like."

Fix: For comparing shapes across groups, use multiple="layer" (overlaid with transparency) or multiple="dodge" (side-by-side) instead of stacked.

Pitfall 5: Violin Plot Tails Extending Beyond Data

Symptom: A violin plot has smooth tapering ends that suggest data exists where it does not.

Cause: The default cut=2 extends the KDE two bandwidths beyond the actual data range.

Fix: Use cut=0 to truncate the violin at the data range. This is usually more honest, though it looks slightly less smooth.

Pitfall 6: Comparing Distributions of Different Sample Sizes

Symptom: A histogram with hue shows groups with very different counts, and the larger group visually dominates regardless of the actual distribution shapes.

Cause: The y-axis shows raw counts, which depend on sample size.

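Fix: Use stat="density" with common_norm=False (normalizes each group's area to 1), stat="probability" with common_norm=False (normalizes each group's bars to sum to 1), or multiple="fill" (normalizes the total at each bin to 1). With the default common_norm=True, density and probability are normalized over all groups combined, so the larger group still dominates. Each option tells a slightly different story; pick based on the question.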
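The difference between normalizing all groups together and normalizing each group on its own is easy to check numerically. The sketch below uses two synthetic groups of very different size and plain NumPy histograms. In seaborn the corresponding switch is, to our understanding, common_norm (True by default).

```python
import numpy as np

rng = np.random.default_rng(3)
big = rng.normal(0.0, 1.0, size=2000)    # large group (synthetic)
small = rng.normal(1.0, 1.0, size=200)   # small group (synthetic)

edges = np.linspace(-6, 7, 53)           # shared bins for both groups
widths = np.diff(edges)
big_counts, _ = np.histogram(big, bins=edges)
small_counts, _ = np.histogram(small, bins=edges)

def area_under_bars(counts, denominator):
    """Area of a density-scaled histogram: sum of (bar height * bin width)."""
    density = counts / (denominator * widths)
    return float((density * widths).sum())

total = big_counts.sum() + small_counts.sum()
common = area_under_bars(small_counts, total)                  # shared normalization
separate = area_under_bars(small_counts, small_counts.sum())   # per-group normalization
print(f"small group area, common: {common:.3f}, separate: {separate:.3f}")
```

Under shared normalization the small group's area is just its share of the observations, so its bars stay small no matter what shape it has. Per-group normalization gives every group an area of 1, which is what you want when the question is about shape.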

Pitfall 7: Small Samples and Unreliable KDE

Symptom: A KDE on a group with 5 data points produces a smooth curve that looks as trustworthy as a KDE on a group with 500 data points.

Cause: The KDE algorithm runs on any sample size without warning, but the estimate is unreliable for small samples.

Fix: Avoid KDE (and violin plots) for groups with fewer than ~30 observations. Use strip plots or rug plots instead, which show the individual observations honestly without implying a smooth shape the data cannot support.
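A quick way to apply this rule is to count observations per group before choosing the chart type. The sketch below does that with pandas on a hypothetical long-form DataFrame. The column names and the MIN_N threshold are illustrative assumptions, with the threshold taken from the rough guideline above.

```python
import pandas as pd

MIN_N = 30  # the rough threshold from the text, not a hard rule

# Hypothetical long-form data: one row per observation.
df = pd.DataFrame({
    "group": ["a"] * 120 + ["b"] * 45 + ["c"] * 6,
    "value": range(171),
})

sizes = df.groupby("group").size()
smooth_ok = sizes[sizes >= MIN_N].index.tolist()    # candidates for KDE or violin
show_points = sizes[sizes < MIN_N].index.tolist()   # better served by strip or rug plots

print(smooth_ok, show_points)
```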

Pitfall 8: Over-Interpreting Minor KDE Features

Symptom: A KDE has small bumps or shoulders that you want to interpret as bimodality or clustering.

Cause: KDE is noisy at small scales. Small bumps may be sampling variation rather than real features.

Fix: Check the histogram of the same data. Real features usually show up clearly in histograms with reasonable bin counts. If the feature only appears in the KDE, it may be an artifact of the bandwidth.

17.11 Using displot: The Figure-Level Distributional Function

Most of this chapter has used axes-level functions (histplot, kdeplot, ecdfplot, violinplot). For faceted distributional displays, use the figure-level sns.displot.

Basic displot

sns.displot(data=penguins, x="body_mass_g")

This creates a new figure with a single histogram. By default, displot uses kind="hist", but you can change it:

sns.displot(data=penguins, x="body_mass_g", kind="hist")   # histogram (default)
sns.displot(data=penguins, x="body_mass_g", kind="kde")    # KDE
sns.displot(data=penguins, x="body_mass_g", kind="ecdf")   # ECDF

The kind parameter selects which axes-level function to call underneath. All the parameters for each axes-level function are available via displot.

Faceting with displot

The main reason to use displot (over the axes-level functions) is automatic faceting:

sns.displot(
    data=penguins,
    x="body_mass_g",
    col="species",
    kind="hist",
    bins=20,
    height=3,
)

This produces one histogram per species, arranged in a row. Adding row creates a 2D grid. col_wrap wraps the layout for many groups.

When to Use Axes-Level vs. Figure-Level

For single distributional charts, the axes-level functions (histplot, kdeplot, etc.) are cleaner because they integrate with manual matplotlib layouts. For faceted comparisons across multiple groups, displot handles the layout for you.

A reasonable workflow: start with the axes-level function on a single Axes to tune the parameters (bins, bandwidth, colors). Once you have the chart the way you want, switch to the figure-level function with the same parameters plus col= or row= to produce the faceted version.

17.12 The KDE Mathematics: Bandwidth, Bias, and Variance

KDE is the most mathematically interesting distributional chart type, and understanding a little of the math helps you use it correctly. This section is optional for the Fast Track path but recommended for Standard and Deep Dive.

The KDE Formula

A kernel density estimator at a point x is:

$$\hat{f}_h(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right)$$

Where:

  • n is the number of data points.
  • h is the bandwidth.
  • K is the kernel function (usually a Gaussian).
  • x_i are the data points.

In plain language: at each point x, you sum the contribution of a small bump (the kernel) centered at each data point, scaled by the bandwidth h. The sum is divided by n*h so the total area is 1. The result is a smooth function that approximates the true density.
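The formula translates almost line by line into NumPy. The sketch below is a minimal implementation that uses a Gaussian bump for K and an arbitrary bandwidth of 0.4 on synthetic data. It is meant to make the sum concrete, not to replace seaborn's or SciPy's estimator.

```python
import numpy as np

def gaussian_kde_1d(grid, data, h):
    """Evaluate (1 / (n h)) * sum_i K((x - x_i) / h) with a Gaussian K."""
    u = (grid[:, None] - data[None, :]) / h          # shape (len(grid), n)
    kernel = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
    return kernel.sum(axis=1) / (len(data) * h)

rng = np.random.default_rng(5)
data = rng.normal(size=200)                # synthetic sample
grid = np.linspace(-8, 8, 1601)
density = gaussian_kde_1d(grid, data, h=0.4)

# Riemann-sum check that the estimate integrates to roughly 1.
area = float(density.sum() * (grid[1] - grid[0]))
print(round(area, 4))
```

The area check confirms the role of the 1/(nh) factor: each of the n bumps contributes area 1/n, so the total is 1 regardless of the bandwidth.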

The Kernel Function

The most common kernel is the Gaussian:

$$K(u) = \frac{1}{\sqrt{2\pi}} e^{-u^2/2}$$

Other kernels include the Epanechnikov (parabolic) and the triangular. The choice of kernel matters less than you might think: most kernels produce similar results at reasonable bandwidths. seaborn's kdeplot uses a Gaussian kernel and, in current versions, does not offer a choice of kernel. In practice this is rarely a limitation.

Bandwidth Selection

The bandwidth h is the most important parameter. It controls the trade-off between two types of error:

Bias: a smoothed estimate misses fine features of the true distribution. Large bandwidth → high bias.

Variance: a wiggly estimate follows individual data points too closely, producing noise. Small bandwidth → high variance.

The optimal bandwidth balances these two. Data-driven rules (Scott's rule, Silverman's rule) estimate a good default based on the sample size and standard deviation, but the optimal depends on the underlying distribution, which you do not know. In practice, try several bandwidths and pick the one that shows the features you care about without adding visible noise.
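One way to make the trade-off tangible is to count the peaks of a KDE at several bandwidths. The sketch below builds a synthetic sample with two true modes and evaluates a hand-written Gaussian KDE at a very small, a moderate, and a very large bandwidth. The three values are arbitrary choices for illustration.

```python
import numpy as np

def kde(grid, data, h):
    """Gaussian KDE evaluated on grid with bandwidth h."""
    u = (grid[:, None] - data[None, :]) / h
    return np.exp(-0.5 * u**2).sum(axis=1) / (len(data) * h * np.sqrt(2 * np.pi))

def count_peaks(y):
    """Number of strict local maxima in a sampled curve."""
    return int(np.sum((y[1:-1] > y[:-2]) & (y[1:-1] > y[2:])))

rng = np.random.default_rng(6)
# Synthetic bimodal sample: two well-separated normal clusters.
data = np.concatenate([rng.normal(-2, 0.6, 300), rng.normal(2, 0.6, 300)])
grid = np.linspace(-7, 7, 1401)

peaks = {h: count_peaks(kde(grid, data, h)) for h in (0.03, 0.6, 4.0)}
print(peaks)
```

The smallest bandwidth produces many spurious peaks (high variance), the largest merges the two clusters into a single hump (high bias), and the moderate value recovers the two modes that are actually in the data.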

The Default: Scott's Rule

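seaborn's default bandwidth method is Scott's rule (bw_method="scott"), which it delegates to SciPy's gaussian_kde. For 1D data this gives:

$$h = \hat{\sigma} \cdot n^{-1/5}$$

Where σ̂ is the sample standard deviation and n is the sample size. The closely related normal-reference rule, often quoted as h ≈ 1.06 · σ̂ · n^(-1/5), differs only in the constant. Both rules assume the data is roughly normal. For bimodal or heavy-tailed data, they can over-smooth and hide important features. Adjust with bw_adjust < 1 to see more detail.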
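As a worked example of how a rule-of-thumb bandwidth scales, the sketch below computes constant · σ̂ · n^(-1/5) for synthetic samples of increasing size. The leading constant varies between references and implementations (1.0 in SciPy's "scott" factor to our understanding, about 1.06 in the normal-reference form), so it is left as a parameter of the hypothetical helper rule_of_thumb_bandwidth.

```python
import numpy as np

def rule_of_thumb_bandwidth(data, constant=1.0):
    """Return constant * sample_std * n**(-1/5) for a 1D sample."""
    data = np.asarray(data, dtype=float)
    return constant * data.std(ddof=1) * len(data) ** (-1 / 5)

rng = np.random.default_rng(7)
for n in (50, 500, 5000):
    sample = rng.normal(size=n)  # synthetic data with sigma close to 1
    h = rule_of_thumb_bandwidth(sample)
    # Second value: the effective bandwidth after bw_adjust=0.5.
    print(n, round(h, 3), round(0.5 * h, 3))
```

Because of the n^(-1/5) exponent, the bandwidth shrinks slowly: a hundredfold increase in sample size reduces it by a factor of only about 2.5. bw_adjust simply multiplies whatever the rule returns.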

Adaptive Bandwidths

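Some KDE implementations use adaptive bandwidths that vary across the data range: smaller near the peaks, where data is dense, and larger in the tails. seaborn does not implement adaptive KDE. It does fit a separate KDE to each hue group, so each group gets its own rule-of-thumb bandwidth, and the common_norm=False and common_grid=False parameters on displot and kdeplot additionally normalize and evaluate each group's curve independently. This addresses some of the same issues when groups differ in spread, but the bandwidth is still constant within each group.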

The Practical Advice

For most distributions:

  1. Start with the default bandwidth.
  2. Look at the resulting KDE. Does it show the features you expect?
  3. If the KDE looks too smooth (bimodality hidden, flat shoulders), try bw_adjust=0.5 for a less-smooth version.
  4. If the KDE looks too jagged (every data point creates a bump), try bw_adjust=1.5 or larger.
  5. Compare the KDE to a histogram of the same data. They should roughly agree. If they disagree dramatically, the KDE bandwidth is wrong.

The goal is not to find the "right" bandwidth — there is no single right answer — but to pick a bandwidth that reveals the shape of the distribution without adding visible artifacts. This is a judgment call that gets easier with practice.

17.13 The Climate Distribution Six Ways

For the progressive project, here are six distributional views of the same climate data.

import seaborn as sns
import matplotlib.pyplot as plt

sns.set_theme(style="whitegrid", context="notebook")

# Assume climate has columns year and anomaly
# Add a decade column for grouping
climate["decade"] = (climate["year"] // 10 * 10).astype(int)

# Each chart starts a new figure; otherwise every call draws onto the same Axes.

# Chart 1: Histogram of annual anomalies
plt.figure()
sns.histplot(data=climate, x="anomaly", bins=30, kde=True)
plt.title("Distribution of Annual Temperature Anomalies")

# Chart 2: KDE by decade
plt.figure()
sns.kdeplot(data=climate, x="anomaly", hue="decade", multiple="layer", fill=True, alpha=0.3)
plt.title("Anomaly Density by Decade")

# Chart 3: ECDF by decade
plt.figure()
sns.ecdfplot(data=climate, x="anomaly", hue="decade")
plt.title("Cumulative Distribution by Decade")

# Chart 4: Violin by decade
plt.figure()
sns.violinplot(data=climate, x="decade", y="anomaly", inner="quartile")
plt.xticks(rotation=45)
plt.title("Anomaly by Decade")

# Chart 5: Ridge plot by decade (using FacetGrid pattern from Section 17.7)
# (See Section 17.7 code)

# Chart 6: Box plot by decade (for comparison to the Chapter 11 matplotlib version)
plt.figure()
sns.boxplot(data=climate, x="decade", y="anomaly")
plt.xticks(rotation=45)
plt.title("Anomaly by Decade (Box)")

plt.show()

Each chart answers a slightly different question about the same data:

  • Histogram: What does the overall distribution look like?
  • KDE by decade: How does the shape of the distribution change across decades?
  • ECDF by decade: What is the precise percentile shift between decades?
  • Violin by decade: What are the median, spread, and shape for each decade side by side?
  • Ridge plot: How do the shapes of many decades compare at a glance, in a visually striking form?
  • Box plot: Quick summary of medians and quartiles.

Together, these six charts tell the full distributional story of climate warming: annual anomalies have shifted from a centered-at-zero distribution (early decades) to a rightward-shifted distribution (recent decades), with the shift visible in every chart type and the specific quantiles measurable via the ECDF.


Chapter Summary

This chapter covered seaborn's distributional visualization family: histplot, kdeplot, rugplot, ecdfplot, violinplot, and the ridge plot pattern. Each chart type reveals different aspects of distribution shape, and the choice depends on the specific question and audience.

Histograms (sns.histplot) show binned counts with options for multi-group comparison (multiple="layer", "stack", "dodge", "fill") and normalization (stat). Good for overview and for audiences familiar with histograms.

Kernel density estimates (sns.kdeplot) show smooth density curves. The bw_adjust parameter controls smoothness; too small produces jagged curves, too large hides features. 1D and 2D KDE are both supported.

Rug plots (sns.rugplot) draw marginal marks for individual observations. Used as supplementary context on top of histograms or KDE plots.

ECDFs (sns.ecdfplot) show the empirical cumulative distribution as a step function. Underused but powerful: no binning decisions, exact quantile reading, precise group comparison.

Violin plots (sns.violinplot) combine a box plot with a KDE, showing both summary statistics and distribution shape. Supports split violins for pairwise comparison. Watch out for misleading tails (control with cut).

Ridge plots (built manually with FacetGrid + kdeplot) stack multiple KDE curves vertically with overlap. Aesthetically striking; good for comparing many groups.

The threshold concept is that distribution shape is information. Summary statistics lose this information; visualization recovers it. Choose the chart type based on the specific aspect of the shape you want to reveal.

Next in Chapter 18: relational and categorical visualization, including scatter plots with regression overlays, grouped comparisons via strip/swarm/box/violin, and the canonical critique of "dynamite plots" (bar charts with error bars hiding the raw data).


Spaced Review: Concepts from Chapters 1-16

  1. Chapter 1: Anscombe's Quartet shows that four datasets can have identical statistics but different shapes. How does this chapter's argument extend Chapter 1's threshold concept?

  2. Chapter 5: The chart selection matrix maps questions to chart types. Which chart types in this chapter answer "what is the distribution?" and which answer "how do groups compare?"

  3. Chapter 11: matplotlib's histogram and box plot are matched by seaborn's histplot and boxplot. What does seaborn add that matplotlib does not?

  4. Chapter 16: The three seaborn function families are relational, distributional, and categorical. Which family does this chapter cover, and which functions belong to it?

  5. Chapter 16: Figure-level functions support faceting via col and row. Which figure-level function would you use to create a faceted distribution comparison?

  6. Chapter 16: seaborn's automatic statistics handling is a theme. How does kdeplot hide the bandwidth decision from the user, and when is that a problem?