Exercises: seaborn Philosophy

DataField.Dev

These exercises require a Python environment with seaborn installed (pip install seaborn). All exercises assume import seaborn as sns, and most use seaborn's example datasets, which sns.load_dataset downloads on first use (an internet connection is required).

Part A: Conceptual (6 problems)

A.1 ★☆☆ | Recall

Name the three seaborn function families and describe the type of question each answers.

Guidance

**Relational** (relplot, scatterplot, lineplot): "How do two continuous variables relate?" **Distributional** (displot, histplot, kdeplot, ecdfplot): "How is a single variable distributed?" **Categorical** (catplot, stripplot, swarmplot, boxplot, violinplot, barplot, pointplot, countplot): "How do values compare across categories?"

A.2 ★☆☆ | Recall

Explain the difference between figure-level and axes-level seaborn functions. Give one example of each.

Guidance

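**Figure-level** functions (relplot, displot, catplot, lmplot, pairplot, jointplot, clustermap) create their own figure and return a grid object: a FacetGrid for relplot, displot, catplot, and lmplot; a PairGrid for pairplot; a JointGrid for jointplot; and a ClusterGrid for clustermap. The FacetGrid-based functions support faceting via the `col` and `row` parameters. **Axes-level** functions (scatterplot, histplot, boxplot, etc.) draw on an existing Axes passed via the `ax` parameter (or on the current Axes if none is given) and return that Axes. Axes-level functions integrate with manual matplotlib layouts; figure-level functions handle their own layout.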
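
One way to see the difference is to inspect what each kind of function returns. This is a minimal sketch on hypothetical synthetic data (the `panel` column is invented for the facets).

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.axes import Axes

# Hypothetical data; "panel" exists only to give relplot something to facet on.
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "x": rng.normal(size=60),
    "y": rng.normal(size=60),
    "panel": np.tile(["left", "right"], 30),
})

# Axes-level: draws on the Axes we pass in and hands it back.
fig, ax = plt.subplots()
returned = sns.scatterplot(data=df, x="x", y="y", ax=ax)
print(returned is ax)
print(isinstance(returned, Axes))

# Figure-level: builds its own figure and returns a grid object.
g = sns.relplot(data=df, x="x", y="y", col="panel", height=3)
print(type(g).__name__)
print(g.axes.shape)  # one row, one column per value of "panel"
```

Because the axes-level call returns a plain matplotlib Axes, it slots into any layout built with `plt.subplots`; the grid object returned by `relplot` owns its figure instead.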

A.3 ★★☆ | Understand

What does "tidy data" mean in the context of seaborn, and why does seaborn prefer it?

Guidance

Tidy data means each variable is a column and each observation is a row. In practice this is a long-form DataFrame in which, for example, each (year, variable) pair gets its own row. seaborn prefers this format because column names map directly onto plot parameters (`x="year", y="value", hue="variable"`). With wide-form data, the same grouped visualization typically requires a loop over columns. `pd.melt` converts wide-form data to tidy form.

A.4 ★★☆ | Understand

Explain what hue, style, and size parameters do in seaborn. Give an example using all three.

Guidance

`hue` maps a variable to color. `style` maps a variable to marker shape (or line style). `size` maps a variable to marker size. Example: `sns.scatterplot(data=tips, x="total_bill", y="tip", hue="smoker", style="time", size="size")` encodes five variables (x, y, hue, style, size) on a single chart. seaborn handles the legend and color palette automatically.

A.5 ★★☆ | Analyze

The chapter's threshold concept is the shift from imperative matplotlib to declarative seaborn. Explain this shift in your own words and give a specific example where it matters.

Guidance

Imperative: you write the iteration explicitly — "for each group in the data, plot a line in a color from the palette." Declarative: you map a column to a visual channel — `hue="group"` — and the library handles the iteration. The shift matters for grouped visualizations where the matplotlib version is a loop and the seaborn version is a parameter. Example: plotting 20 state trajectories. In matplotlib: a loop over states with manual color selection and manual legend building. In seaborn: `sns.lineplot(data=df, x="date", y="value", hue="state")` — one line.

A.6 ★★★ | Evaluate

The chapter says that seaborn is not always better than matplotlib. Under what circumstances would you prefer matplotlib?

Guidance

Prefer matplotlib when: you need precise layout control with GridSpec, you need typographic polish beyond seaborn's themes, your data is not in a DataFrame, you need a chart type seaborn does not support (quiver, 3D, custom types), or you are producing a one-off publication figure that needs extensive customization. seaborn is not a replacement for matplotlib but a higher-level layer; drop down to matplotlib when the higher level is not enough.

Part B: Applied (10 problems)

B.1 ★☆☆ | Apply

Install seaborn (if not already installed), import it as sns, and load the tips dataset. Print the first 5 rows.

Guidance

import seaborn as sns
tips = sns.load_dataset("tips")
print(tips.head())

B.2 ★☆☆ | Apply

Create a simple scatter plot of tip vs. total_bill from the tips dataset using sns.scatterplot.

Guidance

import seaborn as sns
import matplotlib.pyplot as plt

tips = sns.load_dataset("tips")
sns.scatterplot(data=tips, x="total_bill", y="tip")
plt.show()

B.3 ★★☆ | Apply

Extend B.2 to color points by smoker (yes/no) using the hue parameter. Add a title and save as PNG.

Guidance

fig, ax = plt.subplots(figsize=(8, 6))
sns.scatterplot(data=tips, x="total_bill", y="tip", hue="smoker", ax=ax)
ax.set_title("Tip vs. Total Bill, by Smoker Status")
fig.savefig("tips_by_smoker.png", dpi=300, bbox_inches="tight")

B.4 ★★☆ | Apply

Use sns.relplot (figure-level) to create a scatter plot of tip vs. total_bill, faceted by day (one panel for each day recorded in the dataset).

Guidance

g = sns.relplot(
    data=tips,
    x="total_bill",
    y="tip",
    col="day",
    kind="scatter",
    height=3,
    aspect=1,
)

This produces a 1×4 grid of scatter plots, one for each of the four days in the dataset (Thur, Fri, Sat, Sun).

B.5 ★★☆ | Apply

Apply a seaborn theme with sns.set_theme(style="whitegrid", context="notebook") and reproduce the chart from B.3. Compare the appearance to the un-themed version.

Guidance

sns.set_theme(style="whitegrid", context="notebook")

fig, ax = plt.subplots(figsize=(8, 6))
sns.scatterplot(data=tips, x="total_bill", y="tip", hue="smoker", ax=ax)
ax.set_title("Tip vs. Total Bill, by Smoker Status")

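The themed version has a white background with light gray gridlines and spines, plus font and line sizes scaled for the notebook context, which makes values easier to read off the chart. It is closer to publication-ready without additional customization. Note that `sns.set_theme` changes matplotlib's global defaults, so every figure created afterwards uses the theme as well.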
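
For a side-by-side comparison in one figure, one option is to create each Axes inside a temporary style context. The sketch below assumes hypothetical synthetic data and compares matplotlib's defaults with seaborn's `whitegrid` axes style only; it does not apply the `context` or palette parts of `set_theme`.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical data, used only to have something to draw.
rng = np.random.default_rng(3)
df = pd.DataFrame({"x": rng.normal(size=80), "y": rng.normal(size=80)})

fig = plt.figure(figsize=(10, 4))

# Axes pick up background and grid settings when they are created,
# so each Axes is built inside its own temporary style context.
with plt.style.context("default"):
    ax_plain = fig.add_subplot(1, 2, 1)
with sns.axes_style("whitegrid"):
    ax_grid = fig.add_subplot(1, 2, 2)

sns.scatterplot(data=df, x="x", y="y", ax=ax_plain)
sns.scatterplot(data=df, x="x", y="y", ax=ax_grid)
ax_plain.set_title("matplotlib defaults")
ax_grid.set_title('axes_style("whitegrid")')
fig.tight_layout()
```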

B.6 ★★☆ | Apply

Create a wide-form DataFrame with columns [year, temperature, co2, sea_level] using synthetic data. Convert it to tidy form using pd.melt and verify the result has three columns: year, variable, value.

Guidance

import pandas as pd
import numpy as np

# The synthetic series are deterministic linear trends, so no random seed is needed.
years = np.arange(1880, 2025)
wide = pd.DataFrame({
    "year": years,
    "temperature": -0.3 + (years - 1880) * 0.01,
    "co2": 290 + (years - 1880) * 0.9,
    "sea_level": (years - 1880) * 1.5,
})

tidy = wide.melt(id_vars="year", var_name="variable", value_name="value")
print(tidy.head(10))
print(tidy.columns.tolist())  # ['year', 'variable', 'value']
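
The verification step can go a little further than printing the columns. As a sketch, the checks below confirm the row count and pivot the tidy table back to wide form to confirm no values were lost; the data is rebuilt here so the block runs on its own.

```python
import numpy as np
import pandas as pd

years = np.arange(1880, 2025)
wide = pd.DataFrame({
    "year": years,
    "temperature": -0.3 + (years - 1880) * 0.01,
    "co2": 290 + (years - 1880) * 0.9,
    "sea_level": (years - 1880) * 1.5,
})
tidy = wide.melt(id_vars="year", var_name="variable", value_name="value")

# One row per (year, variable) pair: three measured columns per year.
assert len(tidy) == len(wide) * 3
assert tidy.columns.tolist() == ["year", "variable", "value"]

# Pivoting back to wide form should recover the original values.
round_trip = tidy.pivot(index="year", columns="variable", values="value")
for name in ["temperature", "co2", "sea_level"]:
    assert np.allclose(round_trip[name].to_numpy(), wide[name].to_numpy())
```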

B.7 ★★★ | Apply

Use the tidy DataFrame from B.6 to create a three-panel line chart with sns.relplot, one panel per variable, using col_wrap=1 for vertical stacking and facet_kws={"sharey": False}.

Guidance

g = sns.relplot(
    data=tidy,
    x="year",
    y="value",
    col="variable",
    kind="line",
    col_wrap=1,
    height=3,
    aspect=3.5,
    facet_kws={"sharey": False},
)

The `facet_kws={"sharey": False}` argument is critical because the three variables have different units and ranges. Without it, all panels would share one y-axis spanning roughly -0.3 to 420, and the temperature series (about -0.3 to 1.1) would be flattened into a nearly horizontal line.
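
A quick way to check whether panels can sensibly share an axis is to compare the value range of each variable before plotting. This sketch rebuilds the B.6 data so it runs on its own.

```python
import numpy as np
import pandas as pd

years = np.arange(1880, 2025)
tidy = pd.DataFrame({
    "year": years,
    "temperature": -0.3 + (years - 1880) * 0.01,
    "co2": 290 + (years - 1880) * 0.9,
    "sea_level": (years - 1880) * 1.5,
}).melt(id_vars="year", var_name="variable", value_name="value")

# Per-variable range: very different spans argue for independent y-axes.
ranges = tidy.groupby("variable")["value"].agg(["min", "max"])
ranges["span"] = ranges["max"] - ranges["min"]
print(ranges)
```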

B.8 ★★☆ | Apply

Access the underlying matplotlib Axes for B.7's FacetGrid and customize each panel's title.

Guidance

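variable_names = {"temperature": "Temperature (°C)", "co2": "CO2 (ppm)", "sea_level": "Sea Level (mm)"}
for var, ax in g.axes_dict.items():
    ax.set_title("")  # clear the default centered "variable = ..." title
    ax.set_title(variable_names[var], fontsize=10, loc="left")

`g.axes_dict` maps each facet value to its matplotlib Axes, so each title is matched to its panel by name rather than by position; `g.axes` holds the same Axes as a NumPy array. Because `loc="left"` sets a separate left-aligned title, the default centered title is cleared first. Any other matplotlib customization can be applied to each Axes in the same loop.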

B.9 ★★☆ | Apply

Load the penguins dataset with sns.load_dataset("penguins"). Create a histogram of body_mass_g split by species using sns.histplot with hue and multiple="stack".

Guidance

penguins = sns.load_dataset("penguins")

fig, ax = plt.subplots(figsize=(8, 5))
sns.histplot(data=penguins, x="body_mass_g", hue="species", multiple="stack", ax=ax)
ax.set_title("Penguin Body Mass by Species")

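The `multiple="stack"` parameter stacks the per-species bars on top of one another. The other options are `"layer"` (the default: overlaid, semi-transparent bars), `"dodge"` (side-by-side bars within each bin), and `"fill"` (stacked and normalized so that each bin sums to 1).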

B.10 ★★★ | Create

Using the penguins dataset, create a faceted chart with sns.relplot that shows bill_length_mm vs. bill_depth_mm, colored by species, with a separate panel for each island.

Guidance

# Reuses the penguins DataFrame loaded in B.9.
g = sns.relplot(
    data=penguins,
    x="bill_length_mm",
    y="bill_depth_mm",
    hue="species",
    col="island",
    kind="scatter",
    height=4,
    aspect=1,
)

This creates a 1×3 grid (three islands), with each panel showing a scatter colored by species. The legend is shared across panels automatically.

Part C: Synthesis (4 problems)

C.1 ★★★ | Analyze

Compare the matplotlib and seaborn versions of a three-panel climate chart (Section 16.10). Count the lines of code in each version. Identify three specific things seaborn handles automatically that matplotlib requires you to write explicitly.

Guidance

seaborn handles: (1) the iteration over variables (no for loop needed), (2) the removal of the top and right spines (FacetGrid despines by default), (3) automatic y-axis scaling per panel (when `sharey=False`). It also handles the legend construction and the per-panel sizing (via `height` and `aspect`), and it exposes the figure for an overall title via `g.figure.suptitle` (`g.fig` in older seaborn versions).

C.2 ★★★ | Evaluate

Find a matplotlib tutorial or Stack Overflow answer that shows a multi-panel grouped chart (grouped bar chart, small multiple, or similar). Rewrite it in seaborn. Compare the two versions for readability and conciseness.

Guidance

The rewrite usually becomes much shorter because seaborn handles the grouping automatically. The readability depends on the audience: readers familiar with matplotlib may prefer the explicit version; readers familiar with seaborn will prefer the declarative version.

C.3 ★★★ | Create

Build a personal "seaborn starter template" file that includes the standard imports, a set_theme call with your preferred style and context, and a comment block describing when to use figure-level vs. axes-level functions. Save it for reuse.

Guidance

"""seaborn starter template."""

import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

sns.set_theme(style="whitegrid", context="notebook", palette="colorblind")

# Usage notes:
# - For faceted charts: use figure-level (relplot, catplot, displot)
# - For single charts integrated with matplotlib layouts: use axes-level (scatterplot, lineplot, histplot)
# - For tidy data in a DataFrame, prefer the data= parameter with column names
# - For customization beyond seaborn's parameters, access .figure (.fig in older versions) and .axes after the plot

C.4 ★★★ | Evaluate

The chapter argues that seaborn is a "higher-level interface to the same system" (matplotlib). Does this mean you should always learn matplotlib first, or can you learn seaborn without understanding matplotlib?

Guidance

The chapter's implicit argument is that matplotlib is the foundation — you need at least a basic understanding of Figure, Axes, and the OO API to use seaborn effectively, because seaborn returns matplotlib objects and relies on matplotlib for customization. A beginner can produce charts with pure seaborn, but the moment they need to customize (add titles, adjust fonts, remove spines), they have to drop down to matplotlib. Learning matplotlib first builds the mental model that seaborn inherits.

These exercises introduce the seaborn API. Do at least five Part B exercises before moving on to Chapter 17, which uses the same API for distributional visualization.