Chapter 16 Quiz: Statistical Visualization with seaborn

Contributors to Introduction to Data Science

Instructions: This quiz tests your understanding of Chapter 16. Answer all questions before checking the solutions. For multiple choice, select the best answer — some options may be partially correct. For code analysis questions, predict the output without running the code. Total points: 100.

Section 1: Multiple Choice (8 questions, 5 points each)

Question 1. Which seaborn function would you use to create a violin plot comparing vaccination coverage across WHO regions?

(A) sns.displot(data=df, x="region", y="coverage_pct", kind="violin")
(B) sns.catplot(data=df, x="region", y="coverage_pct", kind="violin")
(C) sns.relplot(data=df, x="region", y="coverage_pct", kind="violin")
(D) sns.violinplot(data=df, x="region", y="coverage_pct", kind="violin")

Answer

**Correct: (B)** Violin plots are categorical comparisons, so they belong to the `catplot()` family. `displot()` is for distributions (histograms, KDEs, ECDFs) and `relplot()` is for scatter and line plots; neither supports `kind="violin"`, so (A) and (C) raise an error. `violinplot()` is the axes-level equivalent, but it has no `kind` parameter because it *is* the kind, so (D) also fails as written.
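
To make the figure-level versus axes-level distinction concrete, here is a minimal sketch. The DataFrame is synthetic: the region codes and coverage values are invented for illustration and are not the chapter's dataset.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Synthetic stand-in data; region codes and values are invented.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "region": np.repeat(["AFR", "EUR", "SEAR"], 40),
    "coverage_pct": np.concatenate([
        rng.normal(70, 10, 40),
        rng.normal(92, 4, 40),
        rng.normal(85, 7, 40),
    ]).clip(0, 100),
})

# Figure-level: creates its own figure and returns a FacetGrid.
g = sns.catplot(data=df, x="region", y="coverage_pct", kind="violin")

# Axes-level: draws on the Axes you pass in and returns that same Axes.
fig, ax = plt.subplots()
returned = sns.violinplot(data=df, x="region", y="coverage_pct", ax=ax)
```

The two calls draw the same kind of plot; what differs is who owns the figure and what object comes back.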

Question 2. What does the hue parameter do in seaborn plotting functions?

(A) Changes the overall color of the plot to a specified color
(B) Splits the data by a categorical variable and assigns each group a distinct color
(C) Sets the color palette for the entire seaborn session
(D) Adjusts the brightness of the plot background

Answer

**Correct: (B)** The `hue` parameter is one of seaborn's most powerful features. It takes a column name, splits the data by its unique values, and automatically assigns a distinct color to each group. A legend is added automatically. This is different from specifying a static `color="blue"` parameter, which sets a single color for all points.
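
A short sketch of `hue` in action, assuming a small synthetic DataFrame (the column names `gdp`, `coverage_pct`, and `income_group` and all values are made up for this example):

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data; the values carry no real-world meaning.
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "gdp": rng.uniform(1, 60, 90),
    "coverage_pct": rng.uniform(40, 100, 90),
    "income_group": np.tile(["Low", "Middle", "High"], 30),
})

# hue splits the points by income_group and colors each group differently.
ax = sns.scatterplot(data=df, x="gdp", y="coverage_pct", hue="income_group")

# The legend is created for us; collect its labels to see the groups.
legend_labels = [t.get_text() for t in ax.get_legend().get_texts()]
```

Replacing `hue="income_group"` with `color="blue"` would draw every point in the same color and produce no legend.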

Question 3. You want to create a scatter plot faceted into separate panels by income group. Which parameter should you use?

(A) hue="income_group"
(B) style="income_group"
(C) col="income_group"
(D) row="income_group"

Answer

**Correct: (C)** (or (D), depending on desired layout) Both `col` and `row` create faceted panels. `col` arranges panels horizontally (one per column), while `row` arranges them vertically. `hue` would overlay groups on the *same* panel using different colors. `style` would use different marker shapes. The question asks for "separate panels," which means faceting with `col` or `row`, not `hue`. (C) is the more common choice because side-by-side panels share a y-axis by default, which makes values easy to compare across panels.
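
The layout difference between `col` and `row` can be checked from the shape of the grid's `axes` array. This is a sketch on synthetic data with three invented income groups:

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data with three invented income groups.
rng = np.random.default_rng(2)
df = pd.DataFrame({
    "gdp": rng.uniform(1, 60, 90),
    "coverage_pct": rng.uniform(40, 100, 90),
    "income_group": np.tile(["Low", "Middle", "High"], 30),
})

# col: one panel per group, laid out left to right.
g_col = sns.relplot(data=df, x="gdp", y="coverage_pct",
                    col="income_group", height=3)

# row: one panel per group, stacked top to bottom.
g_row = sns.relplot(data=df, x="gdp", y="coverage_pct",
                    row="income_group", height=3)

col_shape = g_col.axes.shape  # (rows, columns) of the panel grid
row_shape = g_row.axes.shape
```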

Question 4. Which plot type best reveals that a distribution is bimodal (has two distinct peaks)?

(A) Box plot
(B) Bar plot
(C) Violin plot
(D) Point plot

Answer

**Correct: (C)** A violin plot displays the full shape of the distribution using KDE, making bimodality visible as two bulges. A box plot reduces the distribution to a handful of summary statistics (the median, the quartiles, and whiskers that by default reach the most extreme points within 1.5 × IQR of the box, with anything beyond drawn as outlier points) and hides multimodality. A bar plot shows only the mean, with an error bar. A point plot shows the mean with a confidence interval. Only the violin plot preserves enough distributional information to reveal two peaks.
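
One way to see this for yourself is to draw both plots from the same clearly bimodal sample. The sketch below uses invented data: two normal clusters centered at 30 and 75.

```python
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Invented bimodal sample: two well-separated clusters.
rng = np.random.default_rng(3)
values = np.concatenate([rng.normal(30, 4, 200), rng.normal(75, 4, 200)])

fig, axes = plt.subplots(1, 2, figsize=(8, 4), sharey=True)

# The box plot summarizes: the median lands in the empty gap between clusters.
sns.boxplot(y=values, ax=axes[0])
axes[0].set_title("Box plot")

# The violin plot draws the estimated density, so both bulges are visible.
sns.violinplot(y=values, ax=axes[1])
axes[1].set_title("Violin plot")

median = np.median(values)
```

The median sits between the two clusters, in a region where there is almost no data, which is exactly the kind of detail a box plot cannot show.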

Question 5. What does sns.set_theme(context="talk") change?

(A) It adds annotations explaining each plot element
(B) It scales font sizes, line widths, and marker sizes to be larger for presentations
(C) It changes the background to a dark theme suitable for projectors
(D) It enables interactive tooltips for live demos

Answer

**Correct: (B)** The `context` parameter scales all visual elements proportionally. `"talk"` makes everything larger so plots are readable on slides and projectors. The four options are `"paper"` (smallest), `"notebook"` (default), `"talk"` (larger), and `"poster"` (largest). It does not change the style (background, grid) or palette (colors) — those are separate parameters.
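
As a quick check of the scaling, `sns.plotting_context()` returns the parameter values a context would apply without changing any global state. A minimal sketch that compares the base font size across the four contexts:

```python
import seaborn as sns

contexts = ["paper", "notebook", "talk", "poster"]

# Look up the base font size each context would set (nothing is applied yet).
font_sizes = {name: sns.plotting_context(name)["font.size"] for name in contexts}

# Apply the "talk" context globally for all subsequent plots.
sns.set_theme(context="talk")
```

The sizes increase from `"paper"` to `"poster"`, in the order listed above.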

Question 6. In a seaborn correlation heatmap, what does center=0 do?

(A) Places zero at the center of the x-axis
(B) Ensures the colormap is symmetric around zero, so zero maps to the neutral color
(C) Filters out correlations equal to zero
(D) Shifts all correlation values so the mean is zero

Answer

**Correct: (B)** When using a diverging colormap like `"coolwarm"`, `center=0` ensures that a correlation of zero corresponds to the neutral color (white or pale). Positive correlations map to one end (warm/red), negative to the other (cool/blue), with equal intensity for equal magnitude. Without `center=0`, the colormap might be shifted if all correlations are positive (or all negative), making the visualization misleading.
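
The effect is easiest to see when every correlation is positive. The sketch below builds three synthetic, positively correlated columns (names `a`, `b`, `c` are placeholders) and draws the heatmap with and without `center=0`:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Three invented columns that share a common signal, so all correlations are positive.
rng = np.random.default_rng(4)
base = rng.normal(size=200)
df = pd.DataFrame({
    "a": base + rng.normal(scale=0.5, size=200),
    "b": base + rng.normal(scale=0.5, size=200),
    "c": base + rng.normal(scale=0.5, size=200),
})
corr = df.corr()

fig, axes = plt.subplots(1, 2, figsize=(9, 4))

# Without center: the colormap stretches over the observed range only.
sns.heatmap(corr, cmap="coolwarm", annot=True, ax=axes[0])
axes[0].set_title("No center")

# With center=0: the neutral color is pinned to zero.
sns.heatmap(corr, cmap="coolwarm", center=0, annot=True, ax=axes[1])
axes[1].set_title("center=0")
```

In the left panel the weakest positive correlation can appear blue, which suggests a negative relationship that is not there.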

Question 7. Which function creates a matrix of scatter plots showing all pairwise relationships between variables?

(A) sns.heatmap()
(B) sns.FacetGrid()
(C) sns.pairplot()
(D) sns.relplot()

Answer

**Correct: (C)** `pairplot()` creates an n-by-n grid where each off-diagonal cell is a scatter plot of two variables and each diagonal cell shows the distribution of one variable. `heatmap()` visualizes a numeric matrix as colored cells (no scatter plots). `FacetGrid()` creates multi-panel plots but requires you to specify what to plot. `relplot()` plots one x-y relationship (optionally faceted by category), not a pairwise matrix.

Question 8. What is the default error bar shown on seaborn bar plots (sns.barplot)?

(A) Standard deviation
(B) Standard error of the mean
(C) 95% bootstrap confidence interval
(D) Min-to-max range

Answer

**Correct: (C)** seaborn's `barplot` computes the mean of each category and shows a 95% bootstrap confidence interval by default. This tells you how uncertain the estimate of the mean is, not how spread out the data is. To show standard deviation instead, pass `errorbar="sd"`. To show standard error, pass `errorbar="se"`. This is a common source of misinterpretation — readers often mistake the CI for the data spread.

Section 2: True or False (3 questions, 5 points each)

Question 9. True or false: seaborn replaces matplotlib entirely; you never need to import matplotlib when using seaborn.

Answer

**False.** seaborn builds *on top* of matplotlib. Every seaborn plot is a matplotlib Figure and Axes underneath. You frequently need matplotlib for fine-tuning (axis labels, titles, saving figures, creating custom subplot layouts). The standard practice is to import both: `import matplotlib.pyplot as plt` and `import seaborn as sns`.

Question 10. True or false: The "colorblind" palette in seaborn is designed to be distinguishable by people with the most common forms of color vision deficiency.

Answer

**True.** The `"colorblind"` palette uses colors that remain distinct for people with deuteranopia (red-green color blindness, the most common form). Using it as your default is a simple accessibility improvement. We discuss visualization accessibility further in Chapter 18.

Question 11. True or false: sns.displot(kind="kde") and sns.kdeplot() always produce identical output.

Answer

**False.** While both create KDE curves, `displot(kind="kde")` is a figure-level function that creates its own Figure and returns a FacetGrid. `kdeplot()` is an axes-level function that draws onto an existing Axes. Their default behaviors can differ (e.g., figure size, support for `col`/`row` faceting). The underlying KDE computation is the same, but the function signatures and integration with matplotlib differ.

Section 3: Short Answer (4 questions, 5 points each)

Question 12. Name three types of information a box plot conveys that a simple bar plot (showing the mean) does not.

Answer

A box plot shows: (1) The **median** (which may differ from the mean), (2) The **interquartile range** (the spread of the middle 50% of data), and (3) **Outliers** (individual extreme values plotted as dots beyond the whiskers). A bar plot shows only the mean (and optionally a confidence interval around it), hiding the distribution's shape, spread, skewness, and outlier behavior.

Question 13. Explain the difference between using hue, col, and row to encode a categorical variable in a seaborn figure-level function.

Answer

`hue` encodes the variable as **color** within the same panel — all groups appear on the same axes with different colors. `col` creates **separate panels arranged horizontally**, one per category. `row` creates **separate panels arranged vertically**, one per category. Use `hue` when you want direct comparison on the same axes (few groups, little overlap). Use `col` or `row` when groups are too numerous or overlapping for color alone to distinguish them.

Question 14. What is the purpose of the inner parameter in sns.violinplot()? Name two valid options and describe what each shows.

Answer

The `inner` parameter controls what is drawn inside the violin shape. Valid options include: `"box"` — draws a miniature box plot inside the violin showing median, IQR, and whiskers. `"quartile"` — draws horizontal lines at the 25th, 50th, and 75th percentiles. `"stick"` — draws a line for each individual data point. `None` — draws nothing inside (useful when overlaying a strip plot).
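
To compare the options side by side, the sketch below draws the same invented sample four times, once per `inner` value named in the answer. The spelling `"quartile"` follows the answer above; treat it as an assumption that your installed version accepts it.

```python
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Small invented sample so the individual "stick" lines stay readable.
rng = np.random.default_rng(8)
values = rng.normal(70, 10, 50)

inner_options = ["box", "quartile", "stick", None]
fig, axes = plt.subplots(1, 4, figsize=(12, 3), sharey=True)

# Same data in every panel; only the interior annotation changes.
for ax, inner in zip(axes, inner_options):
    sns.violinplot(y=values, inner=inner, ax=ax)
    ax.set_title(f"inner={inner!r}")

titles = [ax.get_title() for ax in axes]
```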

Question 15. Why should you be cautious about using swarm plots when your dataset has more than a few hundred points per category?

Answer

Swarm plots position every individual data point without overlap by displacing them horizontally. With many points, the algorithm must push dots far from the center to avoid overlap, making the "swarm" very wide and slow to render. The result can be visually confusing, computationally expensive, and can extend beyond the category boundaries. For large datasets, violin plots or box plots summarize the distribution more effectively.

Section 4: Applied Scenarios (3 questions, 5 points each)

Question 16. You have a DataFrame with columns country, year, vaccination_rate, gdp, and region. You want to see how vaccination rates have changed over time, with a separate line for each region and a shaded confidence band. Write the seaborn code.

Answer

sns.relplot(data=df, x="year",
            y="vaccination_rate",
            hue="region", kind="line",
            height=5, aspect=1.5)

When multiple countries exist per region-year combination, `lineplot` (used internally by `relplot(kind="line")`) automatically computes the mean and draws a 95% confidence band. No manual aggregation needed. The `hue="region"` parameter creates separate colored lines with a legend.

Question 17. You create a scatter plot with 50,000 points and the result is an unreadable blob. Name two seaborn-based strategies and one matplotlib-based strategy to address overplotting.

Answer

**seaborn strategies:** (1) Use `sns.kdeplot(x=..., y=...)` to show a 2D density contour instead of individual points. (2) Use `sns.histplot(x=..., y=..., bins=50)` to create a 2D histogram with color-coded bins. **matplotlib strategy:** Use `plt.hexbin(df["x"], df["y"], gridsize=30, cmap="YlOrRd")` for a hexagonal binning plot that shows density without individual points. Other valid answers: adding `alpha=0.05` to make points semi-transparent (valid but still slow to render), or subsampling the data (pragmatic but loses information).

Question 18. Your manager asks you to create a heatmap showing average sales by day of the week (rows) and hour of the day (columns). The data is in a DataFrame with columns day, hour, and sales. Write the code to create the pivot table and heatmap.

Answer

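import matplotlib.pyplot as plt
import seaborn as sns

# Average sales for every (day, hour) pair: days as rows, hours as columns
pivot = df.pivot_table(values="sales",
                       index="day",
                       columns="hour",
                       aggfunc="mean")

# Annotated heatmap; fmt=".0f" prints each cell as a whole number
sns.heatmap(pivot, annot=True, fmt=".0f",
            cmap="YlGnBu", linewidths=0.5)
plt.title("Average Sales by Day and Hour")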

The `pivot_table()` reshapes the data into a matrix. `sns.heatmap()` visualizes it with color intensity proportional to value. `annot=True` prints numbers in each cell. `fmt=".0f"` rounds to integers for readability.
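
One practical detail: if `day` holds day names as plain strings, `pivot_table()` sorts the rows alphabetically, so Friday comes first. The sketch below, on invented sales data, reorders the rows into calendar order before plotting. The `day_order` list is an assumption about how the days are spelled in your data.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Assumed spelling of the day names; adjust to match your data.
day_order = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# Invented sales records.
rng = np.random.default_rng(11)
df = pd.DataFrame({
    "day": rng.choice(day_order, 2000),
    "hour": rng.integers(9, 18, 2000),
    "sales": rng.gamma(2.0, 50.0, 2000),
})

pivot = df.pivot_table(values="sales", index="day",
                       columns="hour", aggfunc="mean")
alphabetical = list(pivot.index)  # row order as produced by pivot_table

# Put the rows back into calendar order.
pivot = pivot.reindex(day_order)

ax = sns.heatmap(pivot, annot=True, fmt=".0f",
                 cmap="YlGnBu", linewidths=0.5)
ax.set_title("Average Sales by Day and Hour")
```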

Section 5: Code Analysis (2 questions, 5 points each)

Question 19. What will the following code produce? Describe the visualization without running it.

sns.catplot(data=df, x="region", y="coverage_pct",
            hue="income_group", kind="box",
            col="year", col_wrap=3,
            height=3, aspect=1.2)

Answer

This creates a multi-panel figure where each panel corresponds to a different year (determined by the unique values in the `year` column). Within each panel, there is a box plot with `region` on the x-axis and `coverage_pct` on the y-axis. Within each region, side-by-side boxes are colored by `income_group`. Panels wrap to a new row after every 3 panels. Each panel is 3 inches tall and 3.6 inches wide (3 * 1.2). The result is a detailed view of how coverage distributions across regions and income groups have evolved over time.
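
After making your prediction, you can check it against a small synthetic dataset. The sketch below invents six years, three regions, and two income groups, then counts the panels the call produces.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Invented data: 6 years x 3 regions x 2 income groups x 10 observations.
rng = np.random.default_rng(12)
rows = [
    {"year": year, "region": region, "income_group": income,
     "coverage_pct": rng.uniform(40, 100)}
    for year in range(2015, 2021)
    for region in ["AFR", "EUR", "SEAR"]
    for income in ["Low", "High"]
    for _ in range(10)
]
df = pd.DataFrame(rows)

g = sns.catplot(data=df, x="region", y="coverage_pct",
                hue="income_group", kind="box",
                col="year", col_wrap=3,
                height=3, aspect=1.2)

# One panel per year; with col_wrap=3 they fill two rows of three.
n_panels = len(g.axes.flat)
```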

Question 20. Identify the error in this code and explain how to fix it.

g = sns.catplot(data=df, x="region",
                y="coverage_pct", kind="bar")
g.set_xlabel("WHO Region")
g.set_ylabel("Mean Coverage (%)")

Answer

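The error is that `catplot` returns a `FacetGrid` object, not a matplotlib Axes. `FacetGrid` does not have `set_xlabel()` or `set_ylabel()` methods (those belong to matplotlib Axes objects), so the code raises an `AttributeError`. `FacetGrid` does provide the plural forms `set_xlabels()` and `set_ylabels()`, but the most direct fix is `set_axis_labels()`, which sets both at once: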

g = sns.catplot(data=df, x="region",
                y="coverage_pct", kind="bar")
g.set_axis_labels("WHO Region",
                  "Mean Coverage (%)")

Alternatively, you could use the axes-level `sns.barplot()`, which returns an Axes object where `set_xlabel()` and `set_ylabel()` would work.
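
Both fixes are sketched below on invented data, along with a check that the grid really lacks the singular setter:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Invented data.
rng = np.random.default_rng(13)
df = pd.DataFrame({
    "region": np.repeat(["AFR", "EUR", "SEAR"], 30),
    "coverage_pct": rng.uniform(40, 100, 90),
})

# Fix 1: stay figure-level and use the FacetGrid method.
g = sns.catplot(data=df, x="region", y="coverage_pct", kind="bar")
has_axes_setter = hasattr(g, "set_xlabel")  # the grid is not an Axes
g.set_axis_labels("WHO Region", "Mean Coverage (%)")

# Fix 2: go axes-level, where the matplotlib setters are available.
fig, ax = plt.subplots()
sns.barplot(data=df, x="region", y="coverage_pct", ax=ax)
ax.set_xlabel("WHO Region")
ax.set_ylabel("Mean Coverage (%)")
```

For a single-panel grid, the underlying Axes is also reachable as `g.ax`, so `g.ax.set_xlabel(...)` is a third option.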