Learning Objectives

  • Explain seaborn's design philosophy: statistical visualization, tidy data assumption, sensible defaults, matplotlib integration
  • Distinguish between figure-level functions (relplot, displot, catplot) and axes-level functions (scatterplot, histplot, barplot)
  • Apply the seaborn function taxonomy: relational, distributional, and categorical function families
  • Use hue, style, size, col, and row parameters to encode multiple variables without writing loops
  • Explain why seaborn requires tidy (long-form) data and convert wide-form data using pandas melt
  • Customize seaborn figures using set_theme(), set_style(), set_context(), and direct matplotlib Axes access
  • Recognize when seaborn is the right tool vs. when raw matplotlib is better

Chapter 16: seaborn Philosophy and the Grammar of Statistical Graphics

"seaborn is to matplotlib what pandas is to numpy: a higher-level interface that handles the common cases gracefully and delegates to the lower level when you need full control." — The working consensus of the Python data visualization community.


Welcome to Part IV.

Part III taught you matplotlib — the foundation of Python visualization, powerful and flexible but often verbose. You can produce any chart matplotlib supports, customize every element, and arrange multi-panel layouts with precision. You have built publication-quality climate charts by writing specific ax.plot(), ax.bar(), and ax.set_title() calls, and you have learned the craft of systematic matplotlib styling through rcParams and reusable functions.

And yet, some matplotlib code feels like more work than it should be. Grouping data by category and plotting each group in a different color requires a loop. Adding confidence bands around a time series requires manual fill_between calls. Computing statistical summaries and plotting them requires pandas operations before the matplotlib call. For statistical visualization — the kind of work that data analysts and scientists do every day — matplotlib's verbosity adds up.

This is the gap that seaborn fills. Created by Michael Waskom in 2012 and now maintained by a small team of volunteers, seaborn is a statistical visualization library built on top of matplotlib. It is not a replacement for matplotlib; it is a higher-level interface to the same rendering system. Every seaborn chart is, underneath, a matplotlib figure. The difference is that seaborn handles the common statistical operations automatically: grouping by category, computing confidence intervals, fitting regression lines, arranging small multiples, choosing reasonable default colors for categorical data. Code that takes 30 lines in matplotlib often takes 3 in seaborn, and the seaborn version is often more correct because the library handles edge cases that hand-written matplotlib code misses.

Part IV covers seaborn in four chapters:

  • Chapter 16 (this one): seaborn philosophy, the figure-level vs. axes-level distinction, tidy data, and the three function families.
  • Chapter 17: distributional visualization — histograms, KDE, ECDF, violin plots, and ridge plots.
  • Chapter 18: relational and categorical visualization — scatter plots, line plots, regression overlays, and categorical charts.
  • Chapter 19: multi-variable exploration — pair plots, joint plots, heatmaps, and cluster maps.

By the end of Part IV, you will be able to produce statistical visualizations in seaborn that would be tedious or error-prone in raw matplotlib, and you will know when to reach for each library depending on the task.

The threshold concept of this chapter is a conceptual shift from imperative matplotlib code ("for each group, plot a line in a different color") to declarative seaborn code ("map the group column to the hue channel, and let the library handle the iteration"). This shift is not immediately obvious if you have internalized the matplotlib pattern, but once it clicks, seaborn's power becomes evident. We will practice the shift throughout the chapter.

A note on seaborn versions. The current version of seaborn (as of this textbook) is 0.13.x. The modern API arrived in two steps: version 0.9 (2018) introduced relplot and renamed factorplot to catplot, and version 0.11 (2020) introduced displot and histplot while deprecating distplot. If you find tutorials on the web that use the old names, the concepts usually transfer, but the function names have changed. This chapter uses the modern API throughout.


16.1 What seaborn Is (and Is Not)

Before diving into the API, it helps to be specific about what seaborn adds to the matplotlib ecosystem and what it does not.

What seaborn Is

seaborn is a statistical visualization library built on top of matplotlib. Every seaborn function internally creates or uses matplotlib Figure and Axes objects. When you call sns.scatterplot(...), seaborn builds a matplotlib scatter under the hood and returns the Axes. You can take that Axes and continue to customize it with any matplotlib method you have learned in Part III.

seaborn makes statistical operations first-class. Computing confidence intervals, fitting regression lines, aggregating by category, and building small multiples are built into the library's primary functions. You do not have to compute a statistic in pandas and then plot it in matplotlib; you declare the operation as a parameter to the seaborn function.

seaborn has opinionated defaults. The library ships with polished themes, carefully chosen color palettes, sensible tick formats, and automatic legend handling. If you just call a seaborn function on a DataFrame, the result is usually more polished than the equivalent matplotlib default; the full seaborn theme is applied once you call sns.set_theme() (Section 16.7).

seaborn integrates tightly with pandas. Nearly every function accepts a DataFrame as the data parameter and column names as string parameters for x, y, hue, etc. This removes the need to extract columns manually, which reduces boilerplate code.

What seaborn Is Not

seaborn is not a replacement for matplotlib. It sits on top of matplotlib, and advanced customization still requires matplotlib code. For full typographic control, custom layouts, and non-standard chart types, you drop down to the matplotlib layer.

seaborn is not an interactive visualization library. seaborn produces static matplotlib figures. For interactive web-native charts (hover tooltips, zoom, pan), use Plotly or Bokeh — covered in Part V.

seaborn is not a dashboard framework. It is designed for individual statistical figures, not for building live-updating dashboards. For dashboards, use Streamlit, Dash, or similar — covered in Part VII.

seaborn is not the only statistical visualization library for Python. Plotnine (a ggplot2 port), Altair (a grammar-of-graphics library), and HoloViews all offer similar functionality with different APIs. seaborn is the most widely used and the best documented, which makes it the default choice for most practitioners, but it is not the only option.

The Position in the Ecosystem

Think of the Python visualization stack as layers:

  • matplotlib: the foundation. Verbose but powerful, supports any chart type, handles output to any format.
  • seaborn: the statistical layer. Built on matplotlib, adds concise APIs for statistical operations, handles common cases gracefully.
  • pandas plotting: a thin convenience layer on matplotlib for quick exploratory charts from DataFrames.
  • Plotly / Bokeh / Altair: alternative libraries for interactive web-native visualization.

For most statistical visualization work, seaborn is the right starting point. It is fast to write, produces high-quality defaults, and delegates to matplotlib when you need custom control. Part IV is about using seaborn well; the rest of the Python visualization stack is covered in other parts.


16.2 Tidy Data: Why seaborn Wants Long-Form DataFrames

Before you can use seaborn effectively, you need to understand the data format it expects. seaborn assumes tidy data — a specific pandas DataFrame format that makes statistical operations natural. This section explains what tidy data is, why seaborn wants it, and how to convert wide-form data to tidy.

The Tidy Data Principle

Tidy data, as defined by Hadley Wickham in his 2014 paper of the same name, has three properties:

  1. Each variable is a column.
  2. Each observation is a row.
  3. Each type of observational unit is a table.

For a climate dataset, tidy data means:

year  variable     value
1880  temperature  -0.23
1880  co2          290.5
1880  sea_level    -158.0
1881  temperature  -0.20
1881  co2          290.7
...   ...          ...

Each row is one observation (one year × one variable). Each column is one variable (year, variable name, value). Every combination of year and variable has exactly one row.

Wide-Form Data (What You Probably Have)

Most data as it comes from spreadsheets or databases is in wide form:

year  temperature  co2    sea_level
1880  -0.23        290.5  -158.0
1881  -0.20        290.7  -157.8
...   ...          ...    ...

Each row is one year with three separate columns for the three variables. This is natural for spreadsheets and SQL queries but is not the format seaborn expects.

Converting Wide to Tidy with pandas melt

The canonical tool for converting wide-form data to tidy is pandas's melt:

import pandas as pd

# Wide-form data
wide = pd.DataFrame({
    "year": [1880, 1881, 1882],
    "temperature": [-0.23, -0.20, -0.18],
    "co2": [290.5, 290.7, 290.8],
    "sea_level": [-158.0, -157.8, -157.5],
})

# Convert to tidy (long) form
tidy = wide.melt(id_vars="year", var_name="variable", value_name="value")

The melt call takes:

  • id_vars: the columns that identify each row (year, in this case).
  • var_name: the name of the new column that will hold the original column names (variable).
  • value_name: the name of the new column that will hold the values (value).

The result is the tidy format shown above, with one row per (year, variable) combination.
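As a quick check, the sketch below rebuilds the small wide table, melts it, and inspects the result. The sort_values and pivot calls are additions for illustration; they are not required by melt.

```python
import pandas as pd

wide = pd.DataFrame({
    "year": [1880, 1881, 1882],
    "temperature": [-0.23, -0.20, -0.18],
    "co2": [290.5, 290.7, 290.8],
    "sea_level": [-158.0, -157.8, -157.5],
})

tidy = wide.melt(id_vars="year", var_name="variable", value_name="value")

# 3 years x 3 measured variables gives 9 rows and 3 columns
print(tidy.shape)

# Sorting by year makes the one-row-per-(year, variable) structure easy to see
print(tidy.sort_values(["year", "variable"]).head(6))

# pivot reverses the operation and recovers the wide layout
round_trip = tidy.pivot(index="year", columns="variable", values="value").reset_index()
print(sorted(round_trip.columns))
```

The round trip is a useful sanity check: if pivot cannot rebuild the wide table, the tidy table probably has duplicate (year, variable) rows.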

Why seaborn Wants Tidy Data

With tidy data, seaborn can use column names directly in its function calls:

import seaborn as sns

# Plot the tidy data, grouping by variable
sns.lineplot(data=tidy, x="year", y="value", hue="variable")

This single call plots three lines — one for each variable — with a legend showing the variable names. The hue="variable" parameter tells seaborn to group by the variable column and use different colors for each group. Internally, seaborn iterates over the groups, plots each one, and builds the legend. No manual loop required.

With wide-form data, you would have to write a loop:

# Wide form: manual loop required
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
for col in ["temperature", "co2", "sea_level"]:
    ax.plot(wide["year"], wide[col], label=col)  # one line per column
ax.legend()

One line of seaborn replaces the loop and the explicit legend call. For more complex visualizations (faceting, confidence intervals, per-group regression lines), the difference grows dramatically.

When Wide-Form Is Fine

Not every dataset needs to be in tidy form. Some seaborn functions (particularly the older ones like sns.heatmap) expect wide-form data. Correlation matrices are naturally wide. Time series with aligned observations are often easier to work with in wide form.

The rule: for most seaborn plotting functions that use hue, style, col, or row parameters, use tidy data. For sns.heatmap and similar matrix-based functions, use wide data. If you are not sure, start with tidy data — it is the more flexible default.
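A small sketch of going the other direction: a tidy table is pivoted into the wide matrix that sns.heatmap expects. The data is synthetic, and the column names (year, month, passengers) are illustrative assumptions that only mimic the shape of the flights dataset.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic tidy data: one row per (year, month) pair
rng = np.random.default_rng(0)
tidy = pd.DataFrame(
    [(year, month, int(rng.integers(100, 600)))
     for year in range(1949, 1953)
     for month in range(1, 13)],
    columns=["year", "month", "passengers"],
)

# Pivot to wide form: rows are months, columns are years, cells are values
matrix = tidy.pivot(index="month", columns="year", values="passengers")

# heatmap consumes the wide matrix directly and returns the Axes it drew on
ax = sns.heatmap(matrix, cmap="viridis")
ax.set_title("Synthetic passengers by month and year")
```

Keeping the data tidy and pivoting only at the point of plotting means the same table can feed both hue-based functions and matrix-based ones.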

Check Your Understanding — Take a wide-form DataFrame with columns ["date", "revenue_2023", "revenue_2024"]. Write the pandas melt call that converts it to tidy form with columns ["date", "year", "revenue"].


16.3 The Three Function Families

seaborn's plotting functions are organized into three families, each corresponding to a type of statistical question:

  1. Relational: how do two variables relate to each other?
  2. Distributional: how is a single variable distributed?
  3. Categorical: how do values compare across categories?

Each family has a figure-level function (relplot, displot, catplot) that handles faceting automatically, and a set of axes-level functions that target a specific Axes. We will cover the figure-level vs. axes-level distinction in Section 16.4. For now, here is the taxonomy.

Relational Functions

Relational functions visualize the relationship between two continuous variables. The figure-level function is sns.relplot; the axes-level functions are sns.scatterplot and sns.lineplot.

Typical use cases:

  • Scatter plot of two variables to see correlation or clusters.
  • Line chart of a variable over time, optionally with a confidence band.
  • Multi-line chart with different colors for different categories.

Example:

# Scatter plot with four variables encoded (x, y, color, marker size)
sns.scatterplot(data=tips, x="total_bill", y="tip", hue="time", size="size")

Distributional Functions

Distributional functions visualize the distribution of a single variable (or the joint distribution of two variables). The figure-level function is sns.displot; the axes-level functions are sns.histplot, sns.kdeplot, sns.ecdfplot, and sns.rugplot.

Typical use cases:

  • Histogram of a single variable to see its shape.
  • Overlaid histograms for comparing groups.
  • Kernel density estimate for smoother distribution visualization.
  • ECDF (empirical cumulative distribution function) for precise quantile reading.

Example:

# Histogram by category
sns.histplot(data=penguins, x="body_mass_g", hue="species", multiple="stack")

Chapter 17 covers the distributional family in depth.

Categorical Functions

Categorical functions visualize comparisons across categorical variables. The figure-level function is sns.catplot; the axes-level functions include sns.stripplot, sns.swarmplot, sns.boxplot, sns.violinplot, sns.barplot, sns.pointplot, and sns.countplot.

Typical use cases:

  • Strip plot showing individual observations for each category.
  • Box plot or violin plot showing the distribution within each category.
  • Bar plot showing the mean or median for each category.
  • Count plot showing the frequency of each category.

Example:

# Box plot by category
sns.boxplot(data=tips, x="day", y="total_bill", hue="smoker")

Chapter 18 covers the categorical family in depth.

Multi-Variable Functions

Beyond the three main families, seaborn has specialized multi-variable functions: sns.pairplot (all pairwise relationships), sns.jointplot (bivariate with marginals), sns.heatmap, and sns.clustermap. These are covered in Chapter 19.


16.4 Figure-Level vs. Axes-Level Functions

This section covers the most important conceptual distinction in the seaborn API. Understanding it is essential for using seaborn effectively.

Axes-Level Functions

Axes-level functions target a specific matplotlib Axes. They accept an ax parameter (optional, defaulting to plt.gca()) and return the Axes they plotted on. Example:

import seaborn as sns
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(10, 5))
sns.scatterplot(data=tips, x="total_bill", y="tip", ax=ax)
ax.set_title("Tip vs. Total Bill")

The sns.scatterplot call plots on the specific Axes ax. You can then continue to customize that Axes with any matplotlib method, because seaborn returns control to you after the plot is drawn.

Axes-level functions include: scatterplot, lineplot, histplot, kdeplot, ecdfplot, rugplot, stripplot, swarmplot, boxplot, violinplot, barplot, pointplot, countplot, regplot, heatmap.

Use axes-level functions when:

  • You are building a multi-panel figure with matplotlib's plt.subplots or GridSpec and want to place a seaborn chart on a specific Axes.
  • You want full control over the matplotlib figure and use seaborn only for the plot itself.
  • You are customizing extensively beyond what seaborn's parameters allow.

Figure-Level Functions

Figure-level functions create their own figure and return a FacetGrid (or similar) object. They support faceting across multiple panels with the col and row parameters. Example:

g = sns.relplot(
    data=tips,
    x="total_bill",
    y="tip",
    hue="smoker",
    col="day",
    kind="scatter",
    height=4,
    aspect=1,
)

sns.relplot creates a figure with one panel per day (four panels total because there are four days in the tips dataset), each showing a scatter plot of tip vs. total bill colored by smoker. The returned object g is a FacetGrid with attributes like g.fig (the underlying matplotlib Figure) and g.axes (the array of Axes).

Figure-level functions include: relplot, displot, catplot, pairplot, jointplot, lmplot, clustermap.

Use figure-level functions when:

  • You want seaborn to handle faceting for you.
  • You do not need fine-grained control over the figure layout.
  • You are building a single coherent visualization (not a multi-chart-type dashboard).

The Relationship Between Them

The three family-level functions (relplot, displot, catplot) each have a kind parameter that selects the axes-level function used internally. For example:

  • sns.relplot(kind="scatter") uses sns.scatterplot under the hood.
  • sns.relplot(kind="line") uses sns.lineplot.
  • sns.displot(kind="hist") uses sns.histplot.
  • sns.displot(kind="kde") uses sns.kdeplot.
  • sns.catplot(kind="box") uses sns.boxplot.
  • sns.catplot(kind="strip") uses sns.stripplot.

This means you can start with the axes-level function, verify that it produces the output you want on a single Axes, and then switch to the figure-level function when you need faceting.
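A minimal sketch of that workflow, using synthetic data whose column names are assumptions modeled on the tips dataset: the encoding is checked on one Axes first, and the same arguments are then passed to the figure-level function with col added.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(2)
n = 150
bills = rng.uniform(5, 50, size=n)
df = pd.DataFrame({
    "total_bill": bills,
    "tip": bills * 0.15 + rng.normal(0, 1, size=n),
    "smoker": np.tile(["Yes", "No"], n // 2),
    "day": np.tile(["Thur", "Fri", "Sat"], n // 3),
})

# Step 1: check the encoding on a single Axes
fig, ax = plt.subplots()
sns.scatterplot(data=df, x="total_bill", y="tip", hue="smoker", ax=ax)

# Step 2: same arguments, plus col= for faceting, via the figure-level function
g = sns.relplot(
    data=df, x="total_bill", y="tip", hue="smoker",
    col="day", kind="scatter", height=3,
)
```

Note that the figure-level call does not take an ax argument; it always builds its own figure.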

Choosing Between Figure-Level and Axes-Level

A practical rule:

Use axes-level when:

  • You need to place the seaborn chart in a specific matplotlib Axes (e.g., part of a larger GridSpec layout).
  • You want to add extensive custom matplotlib elements to the chart.
  • You do not need faceting.

Use figure-level when:

  • You want faceting via col or row parameters.
  • The seaborn chart is the whole figure.
  • You accept the default faceting layout.

For most standalone charts, the axes-level function is simpler and integrates better with manual matplotlib control. For small-multiple visualizations, the figure-level function is worth the slight loss of control.


16.5 Encoding Multiple Variables with hue, style, and size

One of seaborn's most powerful features is the ability to encode additional variables as visual channels using the hue, style, and size parameters. This is where seaborn's declarative style pays off most clearly.

hue: Color Encoding

sns.scatterplot(data=tips, x="total_bill", y="tip", hue="smoker")

The hue="smoker" parameter tells seaborn to color the points based on the value of the smoker column. seaborn picks a palette automatically (different for categorical and continuous hues), iterates over the groups, plots each in a different color, and builds a legend.

The equivalent matplotlib code would require:

  1. Looping over the unique values of smoker.
  2. Filtering the DataFrame for each value.
  3. Calling ax.scatter with an appropriate color from a palette.
  4. Building the legend manually with ax.legend(handles, labels).

In seaborn, it is one parameter.
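To make the comparison concrete, here is a sketch of the manual loop next to the seaborn call, drawn on two panels of the same figure. The data is synthetic with assumed column names, and the manual version borrows seaborn's "deep" palette only so that the colors are comparable.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(3)
n = 120
bills = rng.uniform(5, 50, size=n)
df = pd.DataFrame({
    "total_bill": bills,
    "tip": bills * 0.15 + rng.normal(0, 1, size=n),
    "smoker": np.tile(["Yes", "No"], n // 2),
})

fig, (ax_manual, ax_sns) = plt.subplots(1, 2, figsize=(10, 4), sharey=True)

# Imperative version: loop over groups, filter, pick a color, label each group
palette = sns.color_palette("deep")
for color, (level, group) in zip(palette, df.groupby("smoker")):
    ax_manual.scatter(group["total_bill"], group["tip"], color=color, label=level)
ax_manual.legend(title="smoker")

# Declarative version: map the column to the hue channel
sns.scatterplot(data=df, x="total_bill", y="tip", hue="smoker", ax=ax_sns)
```

The manual panel ends up with one scatter collection per group, which is exactly the bookkeeping that hue hides.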

style: Marker Shape Encoding

sns.scatterplot(data=tips, x="total_bill", y="tip", hue="smoker", style="time")

The style="time" parameter adds a second encoding: each point's marker shape is determined by the time column. Combined with hue, this encodes four variables at once (x, y, and two categories).

size: Size Encoding

sns.scatterplot(data=tips, x="total_bill", y="tip", hue="smoker", size="size")

The size="size" parameter makes each point's marker size proportional to the size column (the table size in the tips dataset). This is a bubble chart.

Combining hue, style, and size

You can use all three simultaneously:

sns.scatterplot(
    data=tips,
    x="total_bill",
    y="tip",
    hue="smoker",
    style="time",
    size="size",
)

This encodes five variables: x, y, color (smoker), marker shape (time), and marker size (table size). This is a lot of information on one chart. Whether it is too much depends on the specific data — sometimes the reader can absorb five dimensions, sometimes they cannot. Use multi-variable encoding when the additional variables matter; do not use it to show off.


16.6 Faceting with col and row

The col and row parameters on figure-level functions create small multiples automatically. This is one of the clearest demonstrations of seaborn's value.

Basic Faceting

sns.relplot(
    data=tips,
    x="total_bill",
    y="tip",
    col="day",
    kind="scatter",
    height=3,
    aspect=1,
)

With four days in the tips dataset, this creates four panels (one per day), each showing the scatter plot for that day. No loop, no GridSpec, no manual subplot arrangement — just col="day" and seaborn handles the rest.
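For contrast, the following sketch builds the same kind of small-multiple layout by hand and then with relplot. The DataFrame is synthetic and its column names are assumptions; col_order is passed only to pin the panel order for this example.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

days = ["Thur", "Fri", "Sat", "Sun"]
rng = np.random.default_rng(4)
n = 120
bills = rng.uniform(5, 50, size=n)
df = pd.DataFrame({
    "total_bill": bills,
    "tip": bills * 0.15 + rng.normal(0, 1, size=n),
    "day": np.tile(days, n // len(days)),
})

# Manual version: create the grid, then filter and plot once per panel
fig, axes = plt.subplots(1, len(days), figsize=(12, 3), sharex=True, sharey=True)
for ax, day in zip(axes, days):
    subset = df[df["day"] == day]
    ax.scatter(subset["total_bill"], subset["tip"], s=10)
    ax.set_title(f"day = {day}")
    ax.set_xlabel("total_bill")
axes[0].set_ylabel("tip")

# seaborn version: the col parameter replaces the loop
g = sns.relplot(
    data=df, x="total_bill", y="tip",
    col="day", col_order=days, kind="scatter", height=3, aspect=1,
)
```

The manual loop is still short here, but it grows quickly once a row variable, a hue, or a shared legend is added.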

Faceting Across Rows and Columns

sns.relplot(
    data=tips,
    x="total_bill",
    y="tip",
    col="day",
    row="smoker",
    kind="scatter",
    height=3,
)

This creates a grid with 4 columns (days) × 2 rows (smoker/non-smoker) = 8 panels. The axes are shared across panels by default, and when a hue, size, or style variable is present the legend is placed outside the grid. The result is a small-multiple comparison that would take dozens of lines of raw matplotlib code.

col_wrap for Many Categories

When you have many categories and a single row would be too wide, use col_wrap:

sns.relplot(
    data=data,
    x="x",
    y="y",
    col="category",
    col_wrap=4,  # wrap after 4 columns
    kind="scatter",
    height=3,
)

With 12 categories and col_wrap=4, seaborn creates a 3×4 grid instead of a 1×12 strip.

Height and Aspect

The height parameter (in inches) sets the height of each individual panel. The aspect parameter is the width-to-height ratio of each panel. So height=4, aspect=1.5 produces panels that are 4 inches tall and 6 inches wide. The total figure size is computed from the number of panels and these per-panel dimensions.

This is different from matplotlib's figsize, which specifies the whole figure. In seaborn, you specify the panel and seaborn computes the figure size. For a 3×4 grid (3 rows, 4 columns) with height=4, aspect=1.5, the figure is about 24 inches wide × 12 inches tall: 4 columns × 6 inches by 3 rows × 4 inches, before any space added for an outside legend.
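The per-panel arithmetic can be checked directly. This sketch uses synthetic data with 12 hypothetical categories and omits hue on purpose, since an outside legend may widen the figure beyond the base size.

```python
import math

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(5)
categories = [f"cat_{i:02d}" for i in range(12)]  # hypothetical category labels
df = pd.DataFrame({
    "x": rng.normal(size=120),
    "y": rng.normal(size=120),
    "category": np.repeat(categories, 10),
})

height, aspect, col_wrap = 4, 1.5, 4
g = sns.relplot(
    data=df, x="x", y="y",
    col="category", col_wrap=col_wrap,
    kind="scatter", height=height, aspect=aspect,
)

# Base figure size: columns x panel width by rows x panel height
n_cols = col_wrap
n_rows = math.ceil(len(categories) / col_wrap)
expected = (n_cols * height * aspect, n_rows * height)
actual = tuple(g.figure.get_size_inches())
print(expected, actual)
```

If the resulting figure is too large for your page, reduce height rather than passing figsize, which figure-level functions do not accept.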


16.7 Themes: set_theme, set_style, set_context

seaborn ships with several built-in themes that change the appearance of every subsequent plot. These are higher-level than matplotlib's rcParams but built on top of them.

set_theme: The Main Function

sns.set_theme(style="whitegrid", context="notebook", palette="deep", font="sans-serif", font_scale=1.0)

sns.set_theme sets several related options at once. You can call it with specific arguments to override individual aspects:

  • style: the visual style, one of "darkgrid", "whitegrid", "dark", "white", or "ticks". The set_theme default is "darkgrid". "whitegrid" is a clean, publication-ready choice.
  • context: the font scaling context — "paper", "notebook", "talk", "poster". "notebook" is default; "paper" is smaller; "talk" and "poster" are larger for presentations.
  • palette: the default color palette for categorical encodings — "deep", "muted", "pastel", "bright", "dark", "colorblind", or any matplotlib colormap name.
  • font: the font family — usually "sans-serif".
  • font_scale: a multiplier for font sizes (e.g., 1.5 for larger text).

set_style, set_context, set_palette

For finer control, the individual setters work independently:

sns.set_style("whitegrid")
sns.set_context("talk")
sns.set_palette("colorblind")

set_theme calls these setters under the hood; calling one directly changes only that aspect and leaves the rest of the theme untouched.

Resetting to Defaults

To undo seaborn's styling and restore matplotlib's default rcParams:

sns.reset_defaults()

This is useful when you want to produce a pure matplotlib figure without seaborn's styling affecting it.
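Since the theme functions work by changing matplotlib's rcParams, their effect can be inspected directly. The sketch below looks at two representative keys, axes.grid and font.size; which keys a given style or context touches is best confirmed against your installed seaborn version.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib as mpl
import seaborn as sns

# Apply a theme and read back what it changed in matplotlib's rcParams
sns.set_theme(style="whitegrid", context="talk")
grid_on = mpl.rcParams["axes.grid"]
talk_font = mpl.rcParams["font.size"]

# plotting_context returns the values a context would set, without applying them
notebook_font = sns.plotting_context("notebook")["font.size"]

# reset_defaults restores matplotlib's own default rcParams
sns.reset_defaults()
grid_after_reset = mpl.rcParams["axes.grid"]

print(grid_on, talk_font, notebook_font, grid_after_reset)
```

For a temporary style change, sns.axes_style and sns.plotting_context can also be used as context managers in a with block, which avoids the need to reset afterwards.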

Choosing a Theme

For publication-quality charts: set_theme(style="whitegrid", context="notebook").

For slide deck presentations: set_theme(style="whitegrid", context="talk") or "poster" for very large fonts.

For colorblind-friendly defaults: set_theme(palette="colorblind").

For print or black-and-white output: consider set_theme(style="ticks", palette="gray").


16.8 Accessing the Underlying matplotlib

Because seaborn is built on matplotlib, the underlying matplotlib objects are always accessible for further customization.

From an Axes-Level Function

Axes-level functions return the Axes they plotted on:

ax = sns.scatterplot(data=tips, x="total_bill", y="tip")
ax.set_title("Action Title Stating the Finding")
ax.set_xlabel("Total bill (USD)")
ax.spines["top"].set_visible(False)

You can call any matplotlib Axes method on the returned ax. This is how you combine seaborn's statistical shortcuts with matplotlib's typographic and styling control.

From a Figure-Level Function

Figure-level functions return a FacetGrid (or similar) object:

g = sns.relplot(data=tips, x="total_bill", y="tip", col="day", kind="scatter")

# Access the whole figure
g.fig.suptitle("Tips by Day", y=1.02)

# Access individual Axes
for ax in g.axes.flat:
    ax.set_xlabel("Bill (USD)")

# Access the first Axes specifically
g.axes[0, 0].set_title("Custom Title")

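g.fig is the matplotlib Figure (recent releases also expose it as g.figure, the name the seaborn documentation now prefers); g.axes is a 2D numpy array of Axes with the same shape as the facet grid. When col_wrap is used, g.axes is a flat 1D array instead, so g.axes[0, 0] fails and g.axes.flat is the safer way to iterate. You can iterate over the Axes and apply matplotlib customizations to each.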

The ability to drop down to matplotlib is essential for publication-quality seaborn charts. seaborn's defaults are good but not perfect; for serious work, you will almost always add matplotlib customization on top of the seaborn base.


16.9 Installing and Importing seaborn

A brief practical section before the theoretical content continues.

seaborn is installed via pip or conda:

pip install seaborn
# or
conda install -c conda-forge seaborn

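It installs its required dependencies automatically: numpy, pandas, and matplotlib. Since version 0.12, scipy and statsmodels are optional and are needed only for some statistical features; pip install seaborn[stats] pulls them in. On most Python scientific distributions (Anaconda, for example), seaborn is already installed.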

The conventional import is:

import seaborn as sns

The sns alias is a nod to the fictional character Samuel Norman Seaborn (from the TV show The West Wing), after whom the library is named. Every seaborn tutorial uses sns; do not change it.

Version Compatibility

Different seaborn versions have different APIs. Major changes:

  • seaborn 0.9 (2018): introduced relplot, scatterplot, and lineplot; renamed factorplot to catplot.
  • seaborn 0.11 (2020): major API overhaul. Introduced displot, histplot, ecdfplot, kdeplot with new signatures. Deprecated distplot.
  • seaborn 0.12 (2022): introduced the experimental seaborn.objects interface, a grammar-of-graphics style API.
  • seaborn 0.13 (2023): overhauled the categorical plotting functions and continued the cleanup of deprecated behavior.

This book uses the modern (0.13+) API. If you find online tutorials using distplot (deprecated), they are for older seaborn versions. The concepts transfer, but the function names have changed. Check your seaborn version:

import seaborn as sns
print(sns.__version__)

If it is below 0.12, upgrade with pip install --upgrade seaborn.

The seaborn.objects Experimental API

seaborn 0.12 introduced a new API called seaborn.objects that is closer to the grammar of graphics (as used by R's ggplot2). It uses a layered approach:

import seaborn.objects as so

p = so.Plot(tips, x="total_bill", y="tip").add(so.Dot()).facet("day")
p.show()

The objects API is more composable than the classic API but is still marked experimental. For this book, we use the classic API because it is stable, well-documented, and what you will encounter in most existing seaborn code. The experimental API is worth knowing about but not worth learning before the classic API.

16.10 seaborn's Approach to Statistical Operations

One of seaborn's main values is that it performs statistical operations (aggregation, confidence intervals, regression fits) as part of the plotting call. This section explains how that works and what it means for your code.

Automatic Aggregation in lineplot

Consider a DataFrame with multiple observations per x-value — for example, daily stock prices across multiple stocks:

# Synthetic: 100 days, 20 stocks, noisy prices
import pandas as pd
import numpy as np

np.random.seed(42)
data = []
for stock in range(20):
    for day in range(100):
        data.append({"day": day, "stock": stock, "price": 100 + day * 0.5 + np.random.randn() * 5})
df = pd.DataFrame(data)

To plot the "typical" price over time with seaborn:

sns.lineplot(data=df, x="day", y="price")

This produces a line that shows the mean price for each day, surrounded by a shaded 95% confidence interval across stocks. seaborn automatically:

  1. Groups the data by the x variable (day).
  2. Computes the mean for each group.
  3. Bootstraps a 95% confidence interval around each mean.
  4. Draws a central line for the means and a shaded band for the confidence intervals.

The equivalent matplotlib code would require manual pandas groupby, bootstrap computation, and two matplotlib calls (plot for the mean and fill_between for the band). seaborn does this in one line.
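A sketch of that manual version, using only pandas, numpy, and matplotlib. The bootstrap_ci helper is hypothetical and written for this example; seaborn's internal bootstrap may differ in details such as the number of resamples and how the random state is handled.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Synthetic data in the same shape as above: 20 stocks x 100 days
days = np.tile(np.arange(100), 20)
df = pd.DataFrame({
    "day": days,
    "stock": np.repeat(np.arange(20), 100),
    "price": 100 + days * 0.5 + rng.normal(0, 5, size=days.size),
})

def bootstrap_ci(values, n_boot=1000, level=95):
    """Percentile bootstrap interval for the mean of a 1D sample."""
    values = np.asarray(values)
    # Resample with replacement: each row of idx is one bootstrap sample
    idx = rng.integers(0, len(values), size=(n_boot, len(values)))
    boot_means = values[idx].mean(axis=1)
    tail = (100 - level) / 2
    low, high = np.percentile(boot_means, [tail, 100 - tail])
    return low, high

# Steps 1-3: group by day, compute the mean, bootstrap an interval
rows = []
for day, prices in df.groupby("day")["price"]:
    low, high = bootstrap_ci(prices)
    rows.append({"day": day, "mean": prices.mean(), "low": low, "high": high})
summary = pd.DataFrame(rows)

# Step 4: one call for the central line, one for the shaded band
fig, ax = plt.subplots()
ax.plot(summary["day"], summary["mean"])
ax.fill_between(summary["day"], summary["low"], summary["high"], alpha=0.3)
ax.set_xlabel("day")
ax.set_ylabel("price")
```

Writing it out once is worthwhile even if you never do it again: it makes clear that the band describes uncertainty in the mean, not the spread of individual prices.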

Customizing the Estimator

The default aggregator is the mean. You can change it with the estimator parameter:

sns.lineplot(data=df, x="day", y="price", estimator=np.median)

For grouped comparisons:

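df["stock_category"] = np.where(df["stock"] < 10, "A", "B")  # hypothetical grouping column, added for illustration
sns.lineplot(data=df, x="day", y="price", hue="stock_category", estimator=np.median, errorbar=("ci", 90))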

The errorbar=("ci", 90) parameter says to compute a 90% confidence interval via bootstrap. Other options include ("se", 1) for standard error × 1, ("sd", 2) for standard deviation × 2, or None to disable the band entirely.
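To see what the ("se", 1) and ("sd", 2) options mean numerically, the sketch below computes both bands by hand for a small made-up sample. It assumes seaborn follows pandas in using the sample standard deviation (ddof=1); treat that as an assumption to verify against the seaborn documentation.

```python
import math

import pandas as pd

# Made-up prices observed at a single x value
prices = pd.Series([98.0, 101.5, 103.0, 99.5, 104.0, 100.0])

center = prices.mean()
sd = prices.std()   # sample standard deviation (ddof=1)
se = prices.sem()   # standard error of the mean: sd / sqrt(n)

# Band half-widths are the statistic times the multiplier in the tuple
se_band = (center - 1 * se, center + 1 * se)   # corresponds to ("se", 1)
sd_band = (center - 2 * sd, center + 2 * sd)   # corresponds to ("sd", 2)

print(se_band)
print(sd_band)
```

The two bands answer different questions: the standard error band shrinks as more observations are added, while the standard deviation band describes the spread of the data and does not.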

When You Want Individual Series

If you do not want automatic aggregation (for example, when each stock is its own series and you want to see every series separately), set estimator=None and use units to name the column that identifies each series:

sns.lineplot(data=df, x="day", y="price", hue="stock", units="stock", estimator=None)

estimator=None tells seaborn not to aggregate. units="stock" tells it to draw one line per stock. The result is a spaghetti chart of 20 individual lines — useful for exploration but cluttered for presentation.

Regression Overlays with regplot and lmplot

sns.regplot and sns.lmplot overlay a regression line on a scatter plot:

sns.regplot(data=tips, x="total_bill", y="tip")

This fits a linear regression, draws the fitted line, and shades a confidence interval around it. For polynomial or logistic fits:

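sns.regplot(data=tips, x="total_bill", y="tip", order=2)  # polynomial (quadratic) fit
# Logistic fits need a binary outcome (and the optional statsmodels package)
tips["big_tip"] = (tips["tip"] / tips["total_bill"]) > 0.15
sns.regplot(data=tips, x="total_bill", y="big_tip", logistic=True)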

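For linear and polynomial fits, the regression is computed by seaborn internally; you do not need to call scikit-learn or statsmodels yourself. The logistic option relies on the optional statsmodels package being installed. For exploratory use, this is much faster than fitting a model and then plotting it in matplotlib.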
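As a rough illustration of what a second-order fit computes, the sketch below fits a quadratic by least squares with numpy on synthetic data. It is meant to show the idea of a polynomial fit, not to reproduce seaborn's internal code, and the true coefficients (1.0, 0.5, 0.2) are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data from a known quadratic plus noise
x = np.linspace(0, 10, 200)
y = 1.0 + 0.5 * x + 0.2 * x**2 + rng.normal(0, 0.5, size=x.size)

# Least-squares polynomial fit; coefficients come back highest power first
coeffs = np.polyfit(x, y, deg=2)

# Evaluate the fitted curve on an even grid, as a plotted regression line would be
grid = np.linspace(x.min(), x.max(), 100)
fitted = np.polyval(coeffs, grid)

print(coeffs)
```

When you need the coefficients themselves, or diagnostics such as residuals, fit the model explicitly like this (or with statsmodels) instead of reading values off the plot.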

The Trade-Off

Statistical operations inside plotting calls are convenient but occasionally obscure. The reader of your seaborn code has to know that sns.lineplot aggregates by default; if they assume it does not, they will misread the chart. For teaching and reproducibility, it is sometimes worth computing the statistics explicitly in pandas and plotting the result, so the operations are visible in the code. For quick exploration, the built-in operations are the right choice.

16.11 The Built-In Datasets

seaborn ships with several example datasets that appear in every tutorial, every documentation example, and every quick demonstration. Knowing them is useful because many seaborn answers you find on Stack Overflow use them, and they are convenient for testing.

Loading a Dataset

import seaborn as sns
tips = sns.load_dataset("tips")
iris = sns.load_dataset("iris")
penguins = sns.load_dataset("penguins")
flights = sns.load_dataset("flights")
titanic = sns.load_dataset("titanic")

Each call fetches the dataset from the seaborn-data GitHub repository (on first use) and caches it locally. The return value is a pandas DataFrame.

The Main Datasets

tips: restaurant tipping data (244 rows). Columns: total_bill, tip, sex, smoker, day, time, size. Used to demonstrate relational and categorical plots.

iris: the Fisher iris flowers (150 rows). Columns: sepal_length, sepal_width, petal_length, petal_width, species. The canonical dataset for pair plots and multi-class classification.

penguins: Palmer penguins data (344 rows). Columns: species, island, bill_length_mm, bill_depth_mm, flipper_length_mm, body_mass_g, sex. A modern replacement for iris with a similar use case.

flights: monthly airline passenger counts 1949-1960 (144 rows). Columns: year, month, passengers. Used for time-series and heatmap demonstrations.

titanic: survival data from the Titanic (891 rows). Columns: survived, pclass, sex, age, fare, etc. Used for categorical analysis and survival demonstrations.

mpg: miles per gallon data (398 rows). Columns: mpg, cylinders, displacement, horsepower, weight, acceleration, model_year, origin, name. Used for relational plots.

Why They Matter

These datasets are valuable for three reasons:

1. You can run every tutorial example. Most seaborn tutorials use these datasets. When you want to reproduce an example from the documentation or Stack Overflow, the data is one line away.

2. They have a mix of data types. Each dataset has categorical variables, continuous variables, and often a natural grouping variable. This makes them good for demonstrating seaborn's multi-variable encoding.

3. They are small. The six datasets above range from 144 to 891 rows, small enough to render instantly even on modest hardware. This makes them useful for testing and quick experimentation.

Using Them in Practice

For the rest of Part IV, the examples will mostly use penguins and tips because they are both tidy and demonstrate different seaborn features. If you want to follow along, load the datasets at the top of your script:

import seaborn as sns
import matplotlib.pyplot as plt

sns.set_theme(style="whitegrid", context="notebook")
penguins = sns.load_dataset("penguins")
tips = sns.load_dataset("tips")

Everything else in the chapter examples assumes you have these available.

16.11 A Worked Example: matplotlib vs. seaborn Side by Side

To make the imperative-to-declarative shift concrete, here is the same task — plotting three climate variables as a multi-panel line chart with grouping — in both libraries.

The matplotlib Version

import matplotlib.pyplot as plt
import pandas as pd

# Assume climate is a wide-form DataFrame with columns year, temperature, co2, sea_level
fig, axes = plt.subplots(3, 1, figsize=(12, 9), sharex=True, constrained_layout=True)

variables = ["temperature", "co2", "sea_level"]
colors = ["#d62728", "#7f7f7f", "#1f77b4"]
ylabels = ["Temperature (°C)", "CO2 (ppm)", "Sea Level (mm)"]

for ax, var, color, ylabel in zip(axes, variables, colors, ylabels):
    ax.plot(climate["year"], climate[var], color=color, linewidth=1.5)
    ax.set_ylabel(ylabel)
    ax.spines["top"].set_visible(False)
    ax.spines["right"].set_visible(False)

axes[-1].set_xlabel("Year")
fig.suptitle("Three Climate Variables, 1880-2024", fontsize=14, fontweight="semibold")
fig.savefig("climate_matplotlib.png", dpi=300, bbox_inches="tight")

Count the lines: a loop that unpacks four values per panel (Axes, variable, color, label), explicit color and label lists, manual spine removal on each panel, and a manual x-label on the bottom panel only. This is about 15 lines of code once you count the setup and the imports.

The seaborn Version (Wide Form)

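With wide-form data, seaborn is a little awkward. Axes-level functions such as sns.lineplot accept a wide-form DataFrame and draw one line per column, but all lines share a single Axes and y-scale. To give each variable its own panel, you still need to loop through the variables or reshape the data with melt before plotting.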

The seaborn Version (Tidy Form)

import seaborn as sns
import pandas as pd

# Convert wide to tidy
climate_tidy = climate.melt(id_vars="year", var_name="variable", value_name="value")

sns.set_theme(style="whitegrid", context="notebook")

g = sns.relplot(
    data=climate_tidy,
    x="year",
    y="value",
    col="variable",
    kind="line",
    col_wrap=1,
    height=3,
    aspect=3.5,
    facet_kws={"sharey": False},  # each panel uses its own y-scale
)

g.fig.suptitle("Three Climate Variables, 1880-2024", fontsize=14, fontweight="semibold", y=1.02)
g.fig.savefig("climate_seaborn.png", dpi=300, bbox_inches="tight")

Count the statements: one melt to convert the data, one set_theme for styling, one relplot call for the figure, one suptitle, and one savefig. That is five statements of actual chart code, although the relplot call is formatted across eleven lines.
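To see what the melt step does to the shape of the table, the sketch below applies the same call to a three-row stand-in for climate. The values are placeholders chosen for readability, not real measurements.

```python
import pandas as pd

# Hypothetical wide-form table: one row per year, one column per variable
climate = pd.DataFrame({
    "year": [2001, 2002, 2003],
    "temperature": [0.1, 0.2, 0.3],
    "co2": [10.0, 20.0, 30.0],
    "sea_level": [1.0, 2.0, 3.0],
})

# One row per (year, variable) pair; the former column names become values
climate_tidy = climate.melt(id_vars="year", var_name="variable", value_name="value")

print(climate.shape)       # (3, 4)
print(climate_tidy.shape)  # (9, 3)
print(climate_tidy.head(4))
```

Three rows by three measured columns become nine rows, and the new variable column is what col="variable" maps to panels.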

What Changed

The seaborn version is shorter, but that is not the only difference. Several things are different:

1. Independent y-axes need an explicit opt-in. seaborn facets share their y-axis by default, so the example passes facet_kws={"sharey": False} to give each panel its own y-limits. In matplotlib, plt.subplots already defaults to sharey=False, so each panel autoscales without any ax.set_ylim calls.

2. Legends are automatic when they are needed. This example has no legend because each panel shows a single variable. If we had instead mapped hue="variable" within a single panel, seaborn would have built the legend without any extra code.

3. Spine removal comes for free with figure-level functions. relplot builds a FacetGrid, which removes the top and right spines by default, so the manual loop in the matplotlib version is unnecessary. The whitegrid style by itself keeps all four spines; with axes-level functions, call sns.despine() to remove them.

4. The tidy-data transformation is required. The wide form is natural for some data sources, but seaborn wants tidy. This is extra work that matplotlib does not require.

Which Is Better?

Neither. They are different tools for different styles. If you like the imperative matplotlib style and need fine control, use matplotlib. If you prefer the declarative seaborn style and are willing to pay the tidy-data cost, use seaborn. For most exploratory and statistical work, seaborn wins on productivity. For most publication and customization work, matplotlib wins on control.

In practice, most experienced practitioners use both, switching between them based on the task. The next section walks through when each is the right choice.

16.12 When Seaborn Helps and When It Hurts

Seaborn is not always the right choice. This section gives you a decision guide for when seaborn's abstractions help and when they get in the way.

Seaborn Helps When:

1. You have tidy data in a DataFrame. Seaborn's data= parameter plus string column names is the cleanest API for DataFrame-first workflows. If your data is already a pandas DataFrame with proper column names, seaborn will feel natural.

2. You want automatic statistical operations. Seaborn computes means, confidence intervals, regression fits, kernel density estimates, and similar statistics automatically. If you were going to do these in pandas before matplotlib anyway, seaborn does them in one step.

3. You want faceting without writing loops. The col and row parameters on figure-level functions handle small multiples with no manual subplot arrangement. For grouped visualizations, this is dramatically shorter than matplotlib.

4. You want better defaults than matplotlib provides. Seaborn's themes are more polished than matplotlib's defaults, and its built-in sequential palettes (such as rocket and mako) are designed to be perceptually uniform.

5. You are doing exploratory data analysis. Quick scatter plots, distribution checks, and group comparisons are faster in seaborn than in raw matplotlib. For EDA, seaborn is the productivity win.

Seaborn Hurts When:

1. You need precise layout control. Seaborn's figure-level functions manage their own figure layout, which can conflict with manual GridSpec arrangements. For complex multi-panel figures where different panels use different chart types, raw matplotlib with GridSpec is often cleaner.

2. You need typographic polish beyond seaborn's defaults. Seaborn's text handling is adequate but not as flexible as direct matplotlib Text properties. For publication-quality typography with custom fonts and precise positioning, drop down to matplotlib after the seaborn call.

3. Your data is not in a DataFrame. If your data is numpy arrays or plain Python lists, the seaborn data= parameter does not apply, and you lose some of seaborn's convenience. You can still pass arrays directly, but you lose the string-column-name API.

4. You need a chart type seaborn does not support. Specialized visualizations (quiver plots, 3D surfaces, custom domain-specific chart types) are matplotlib-only. Seaborn does not reach into every corner of matplotlib.

5. You are producing a single highly-customized chart. For one very polished chart, the time saved by seaborn's automatic operations may be offset by the time spent fighting seaborn's layout and theme decisions. Direct matplotlib gives more control.
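As a brief illustration of the third point above, the sketch below passes numpy arrays straight to an axes-level function. The arrays are synthetic and the names x, y, and group are illustrative. The plot works, but because arrays carry no column names, the axis labels have to be set by hand.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# Synthetic arrays standing in for data that never lived in a DataFrame
rng = np.random.default_rng(2)
x = rng.normal(size=100)
y = 2.0 * x + rng.normal(scale=0.5, size=100)
group = np.where(x > 0, "positive", "negative")

# Arrays are passed as x=, y=, hue= directly; there is no data= argument
fig, ax = plt.subplots()
sns.scatterplot(x=x, y=y, hue=group, ax=ax)

# No column names are available, so label the axes manually
ax.set_xlabel("x (arbitrary units)")
ax.set_ylabel("y (arbitrary units)")
```

Wrapping the arrays in a small DataFrame first is often the simpler route, since it restores the string-column API and automatic labels.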

The Practical Rule

Default to seaborn for statistical and exploratory work. For most data analysis, seaborn's conciseness and automatic statistics are a net productivity win. Write the chart as a seaborn call first.

Drop down to matplotlib when seaborn is not enough. If you need specific typography, precise layout, or specialized chart types, add matplotlib code on top of the seaborn base. seaborn returns matplotlib objects, so the drop-down is always available.

Use raw matplotlib for one-off publication figures that need full control. When the chart is important enough to justify extensive customization, start with matplotlib and skip seaborn entirely.

This hybrid approach — seaborn for the statistical shortcuts, matplotlib for the customization — is how most experienced Python data visualization practitioners work. Neither library alone is the answer; they complement each other.

16.13 The Climate Plot in seaborn

For the progressive project, here is the climate line chart in seaborn:

import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

sns.set_theme(style="whitegrid", context="notebook")

climate = pd.read_csv("climate_data.csv")  # columns: year, anomaly

fig, ax = plt.subplots(figsize=(12, 5))
sns.lineplot(data=climate, x="year", y="anomaly", color="#d62728", linewidth=1.5, ax=ax)
ax.axhline(0, color="gray", linewidth=0.5, linestyle="--")
ax.set_title("Global Temperature Anomaly, 1880-2024", fontsize=14, loc="left", fontweight="semibold")
ax.set_xlabel("Year")
ax.set_ylabel("Temperature Anomaly (°C)")
fig.savefig("climate_seaborn.png", dpi=300, bbox_inches="tight")

Compare this to the Chapter 12 matplotlib version. The core plotting call is one line instead of two, and the theme supplies the grid and lighter spines (add sns.despine(ax=ax) if you also want the top and right spines removed). The title, labels, and savefig are the same because they come from matplotlib: seaborn delegates to matplotlib for the non-statistical parts.

The difference becomes more dramatic when the chart has grouping:

# Suppose climate_tidy has columns year, variable, value (variable = temperature, co2, sea_level)
sns.relplot(
    data=climate_tidy,
    x="year",
    y="value",
    col="variable",
    kind="line",
    height=3,
    aspect=3,
    col_wrap=1,
    color="#d62728",
    facet_kws={"sharey": False},  # the three variables are on very different scales
)

One call produces a 3-panel small multiple with one line per variable. The matplotlib equivalent takes a loop, manual subplot arrangement, and the roughly 15 lines shown in the side-by-side example earlier in this chapter.


Chapter Summary

This chapter introduced seaborn as a statistical visualization library built on matplotlib. seaborn is not a replacement for matplotlib but a higher-level interface that handles common statistical operations (grouping, confidence intervals, faceting, statistical estimators) automatically.

Tidy data is the DataFrame format seaborn expects: one column per variable, one row per observation. Convert wide-form data to tidy using pd.melt. Tidy data enables seaborn's declarative style — you map columns to visual channels (hue, style, size, col, row) and seaborn handles the iteration.

Three function families organize seaborn's plotting: relational (relplot, scatterplot, lineplot), distributional (displot, histplot, kdeplot, ecdfplot), and categorical (catplot, stripplot, boxplot, violinplot, barplot). Each family has a figure-level function with faceting support and axes-level functions that target a specific Axes.

Figure-level vs. axes-level is the most important API distinction. Figure-level functions create their own figure and return a FacetGrid; they support faceting via col and row. Axes-level functions target an existing Axes and return the Axes for further customization. Use figure-level for faceting, axes-level for precise matplotlib integration.

Multi-variable encoding is where seaborn shines. hue, style, and size parameters encode additional variables on a single chart with no manual loops. This is the threshold concept: imperative matplotlib ("for each group, plot in a color") becomes declarative seaborn ("map group to color").

Themes (set_theme, set_style, set_context) provide better-looking defaults than matplotlib alone. set_theme(style="whitegrid", context="notebook") is a reasonable general-purpose default. For print figures, consider context="paper"; for slides, use context="talk".

Access the underlying matplotlib for any customization seaborn does not handle. Axes-level functions return the Axes directly. Figure-level functions return a FacetGrid with g.fig (spelled g.figure in recent seaborn releases) and g.axes attributes. You can always drop down to matplotlib for typography, annotations, and fine-grained layout.

The next three chapters apply seaborn to specific visualization tasks: distributional (Ch 17), relational and categorical (Ch 18), and multi-variable exploration (Ch 19).


Spaced Review: Concepts from Chapters 1-15

  1. Chapter 5: The chart selection matrix maps questions to chart types. How does seaborn's three-family taxonomy (relational, distributional, categorical) map to that matrix?

  2. Chapter 8: Small multiples are "the best design principle" per Tufte. Which seaborn feature implements small multiples directly?

  3. Chapter 10: matplotlib is an object library — everything is an Artist. How does seaborn fit into this model? Does a seaborn call produce Artists?

  4. Chapter 11: You learned the five essential chart types in matplotlib. Which seaborn axes-level function corresponds to each one?

  5. Chapter 12: Styling is systematic, not ad hoc. Does seaborn replace the style system from Chapter 12, or does it work on top of it?

  6. Chapter 13: Multi-panel layouts in matplotlib use GridSpec. How does seaborn's figure-level faceting compare to GridSpec — what does it do better, and what does it do worse?