Chapter 15 Quiz: matplotlib Foundations

Contributors to Introduction to Data Science

Instructions: This quiz tests your understanding of Chapter 15. Answer all questions before checking the solutions. For code analysis questions, read the code carefully and predict the output or identify the error. Total points: 98.

Section 1: Multiple Choice (8 questions, 4 points each)

Question 1. In matplotlib's object-oriented interface, what does the Figure object represent?

(A) A single chart with one set of axes
(B) The top-level container ("canvas") that can hold one or more Axes
(C) A collection of data points to be plotted
(D) The color scheme applied to a chart

Answer

**Correct: (B)** The `Figure` is the top-level container — think of it as the blank piece of paper. It can hold one `Axes` (a single chart) or multiple `Axes` (subplots). The `Axes` is where the actual data is plotted. The distinction between Figure and Axes is fundamental to the OO interface.
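To make the Figure/Axes split concrete, here is a minimal sketch, assuming a recent matplotlib. The non-interactive `"Agg"` backend is selected only so the example runs without a display. One Figure holds two Axes, and each Axes keeps a reference back to the Figure it lives on.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, so no window is opened
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(8, 3))   # the blank "sheet of paper"
left = fig.add_subplot(1, 2, 1)    # first Axes: where data gets drawn
right = fig.add_subplot(1, 2, 2)   # second Axes on the same Figure
left.plot([1, 2, 3], [2, 4, 3])
right.bar(["a", "b"], [3, 5])

print(len(fig.axes))                            # 2
print(left.figure is fig, right.figure is fig)  # True True
plt.close(fig)
```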

Question 2. Which line of code correctly creates a figure with 2 rows and 3 columns of subplots?

(A) fig, axes = plt.subplots(3, 2)
(B) fig, axes = plt.subplots(2, 3)
(C) fig, axes = plt.subplot(2, 3)
(D) fig, axes = plt.subplots(rows=2, cols=3)

Answer

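**Correct: (B)** `plt.subplots(nrows, ncols)` takes rows first, then columns, so `plt.subplots(2, 3)` creates a 2-row by 3-column grid and returns the Figure plus a 2 x 3 array of Axes. Option (A) creates 3 rows and 2 columns. Option (C) calls `plt.subplot` (singular, no "s"), the older interface: it expects three arguments `(nrows, ncols, index)` or a single three-digit number, returns one Axes rather than a figure-axes pair, and raises an error when given only two arguments. Option (D) uses incorrect keyword names; the parameters are `nrows` and `ncols`.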
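The sketch below checks each option directly. It assumes the `"Agg"` backend so no window opens. The exact exception raised by the invalid calls has differed between matplotlib versions, so the code catches a small set of exception types rather than a single one.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

# Option (B): nrows first, then ncols -> a 2 x 3 NumPy array of Axes
fig, axes = plt.subplots(2, 3)
print(axes.shape)  # (2, 3)

# Option (A) swaps the grid: 3 rows, 2 columns
_, axes_a = plt.subplots(3, 2)
print(axes_a.shape)  # (3, 2)

# Option (C): plt.subplot (singular) wants 1 or 3 positional arguments
try:
    plt.subplot(2, 3)
    subplot_failed = False
except (TypeError, ValueError):  # error type depends on the version
    subplot_failed = True

# Option (D): the keywords are nrows/ncols, not rows/cols
try:
    plt.subplots(rows=2, cols=3)
    keywords_failed = False
except (TypeError, AttributeError):  # error type depends on the version
    keywords_failed = True

plt.close("all")
```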

Question 3. What does ax.set_ylim(0, 100) do?

(A) Sets the y-axis label to "0 to 100"
(B) Filters the data to only include values between 0 and 100
(C) Sets the visible range of the y-axis from 0 to 100
(D) Sets the number of gridlines on the y-axis to 100

Answer

**Correct: (C)** `set_ylim(min, max)` controls the visible range of the y-axis — what portion of the coordinate space is shown. It does not filter data (data points outside the range are simply not visible) or affect labels or gridlines directly.
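A quick way to confirm that `set_ylim` changes only the view: plot some values that fall outside 0-100, set the limits, and inspect the line's stored data afterward. The values are arbitrary and chosen only for illustration.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

y = [-20, 40, 80, 150]  # two values lie outside the 0-100 window
fig, ax = plt.subplots()
(line,) = ax.plot([1, 2, 3, 4], y)

ax.set_ylim(0, 100)  # changes only which part of the y-range is shown

visible_range = ax.get_ylim()
stored_values = list(line.get_ydata())  # the data themselves are untouched
plt.close(fig)
```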

Question 4. Which method adds a horizontal reference line across the entire chart?

(A) ax.hline(y=50)
(B) ax.axhline(y=50)
(C) ax.plot([0, 100], [50, 50])
(D) Both (B) and (C) achieve the same visual result

Answer

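**Correct: (B)** `ax.axhline(y=50)` draws a horizontal line at y=50 across the full width of the axes, regardless of the x-axis range, because its horizontal extent is given in axes coordinates (0 to 1 by default). Option (A) is not a real matplotlib method (the similarly named `ax.hlines()` draws lines between x-positions you specify in data coordinates). Option (C) draws a segment between two fixed x-coordinates, so it will not span the full width if the x-axis range differs from 0-100, and its endpoints also feed into autoscaling. Because (C) can approximate the same visual, (D) is tempting, but (B) is the correct and robust approach.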
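The following sketch shows why `axhline` stays full-width: its x-data are the axes fractions 0 and 1, not data values, so widening the x-limits afterward does not shorten it. The plotted data are arbitrary.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 10], [20, 80])

# x runs from 0 to 1 in axes coordinates, y=50 in data coordinates
ref = ax.axhline(y=50, color="gray", linestyle="--")
ax.set_xlim(-100, 100)  # the reference line still spans the full width

print(hasattr(ax, "hline"), hasattr(ax, "hlines"))  # False True
plt.close(fig)
```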

Question 5. What does the dpi parameter in fig.savefig("chart.png", dpi=300) control?

(A) The number of data points plotted
(B) The resolution (dots per inch) of the output image
(C) The font size of the title
(D) The compression quality of the PNG file

Answer

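**Correct: (B)** DPI stands for "dots per inch" and controls the resolution of raster images (like PNG): the pixel dimensions equal the figure size in inches multiplied by the DPI. Higher DPI means more pixels, sharper images, and larger files. As rough conventions, 72 DPI is a traditional screen resolution, 150 is good for documents, and 300 is standard for print publications. DPI does not affect the vector content of formats like SVG or PDF; there it matters only for rasterized elements embedded in the file, such as images.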
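Because pixel dimensions are figure size (inches) times DPI, a 4 x 3 inch figure saved at 300 DPI should come out at 1200 x 900 pixels. This sketch checks that by saving to memory and reading the width and height straight from the PNG header with the standard library. `png_size` is a helper written for this example, not a matplotlib function.

```python
import io
import struct

import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt


def png_size(figure, dpi):
    """Render figure to PNG in memory; return (width, height) from IHDR."""
    buf = io.BytesIO()
    figure.savefig(buf, format="png", dpi=dpi)
    data = buf.getvalue()
    # PNG layout: 8-byte signature, 4-byte chunk length, b"IHDR",
    # then width and height as big-endian unsigned 32-bit integers
    return struct.unpack(">II", data[16:24])


fig, ax = plt.subplots(figsize=(4, 3))  # 4 x 3 inches
ax.plot([1, 2, 3], [1, 4, 9])

screen_size = png_size(fig, 100)
print_size = png_size(fig, 300)
print(screen_size, print_size)  # (400, 300) (1200, 900)
plt.close(fig)
```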

Question 6. Why is fig.tight_layout() recommended before saving or displaying a figure?

(A) It compresses the file size of the saved image
(B) It automatically adjusts spacing to prevent titles, labels, and subplots from overlapping
(C) It removes all whitespace around the figure
(D) It converts the figure from the pyplot interface to the OO interface

Answer

**Correct: (B)** `tight_layout()` automatically adjusts the padding between and around subplots so that axis labels, titles, and tick labels don't overlap with each other or get cut off. It's especially important for multi-panel figures. Option (C) is partially related — `bbox_inches="tight"` in `savefig()` trims external whitespace — but `tight_layout()` handles internal spacing.
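A minimal sketch of the two separate tools: `fig.tight_layout()` adjusts spacing *inside* the figure, while `bbox_inches="tight"` in `savefig()` trims whitespace *around* it when saving. The titles and labels are placeholders chosen to be long enough to crowd a 2 x 2 grid.

```python
import io

import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

fig, axes = plt.subplots(2, 2, figsize=(6, 4))
for i, ax in enumerate(axes.flat):
    ax.plot([0, 1], [0, i])
    ax.set_title(f"Panel {i + 1}: a fairly long title")
    ax.set_xlabel("x label")
    ax.set_ylabel("y label")

# Internal spacing: reposition the subplots so labels and titles fit
fig.tight_layout()

# External whitespace: crop the saved image to the drawn content
buf = io.BytesIO()
fig.savefig(buf, format="png", bbox_inches="tight")
png_bytes = buf.getvalue()
plt.close(fig)
```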

Question 7. In a scatter plot, what does the alpha parameter control?

(A) The size of the data points
(B) The shape of the data points
(C) The transparency of the data points (0 = invisible, 1 = fully opaque)
(D) The color saturation of the data points

Answer

**Correct: (C)** Alpha controls transparency. An alpha of 0.5 means points are 50% transparent, allowing overlapping points to be visible (their colors blend). This is particularly useful in scatter plots with many data points, where full opacity (alpha=1.0) would hide the density of overlapping points.
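As a sketch of the overplotting case, the example below draws 5,000 simulated points (random normal data, not from the chapter) with a low alpha, so regions where many points overlap render darker than sparse ones.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)          # simulated data for illustration
x = rng.normal(size=5_000)
y = x + rng.normal(scale=0.5, size=5_000)

fig, ax = plt.subplots()
points = ax.scatter(x, y, s=10, alpha=0.2)  # overlaps accumulate opacity
plt.close(fig)
```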

Question 8. What is the primary advantage of the object-oriented (OO) interface over the pyplot interface?

(A) The OO interface produces higher-resolution images
(B) The OO interface is faster to execute
(C) The OO interface makes it explicit which Figure and Axes you're modifying, enabling complex multi-panel layouts
(D) The OO interface supports more chart types

Answer

**Correct: (C)** Both interfaces produce identical output and support the same chart types at the same speed. The advantage of the OO interface is *explicitness*: you always know which Axes object you're modifying because you call methods directly on it (`ax.set_title(...)` rather than `plt.title(...)`). This is essential for multi-panel figures and for passing axes to functions. The pyplot interface relies on implicit "current axes" state, which can lead to confusing bugs.

Section 2: True or False (3 questions, 4 points each)

Question 9. True or False: The ax.bar() method requires the y-axis to start at zero; matplotlib enforces this automatically.

Answer

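**False.** matplotlib does *not* enforce a zero baseline for bar charts. By default, bars are drawn up from `bottom=0` and autoscaling keeps 0 as the lower y-limit, so a default bar chart usually does start at zero. But nothing prevents the axis from being truncated: a call such as `ax.set_ylim(80, 95)` is accepted without complaint and simply cuts off the bottom of every bar. Keeping the baseline at zero (for example, with `ax.set_ylim(0, ...)`) is the *data scientist's* responsibility. The principle that bars should start at zero comes from perceptual science (bars encode values as lengths), not from the software.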
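The sketch below, using arbitrary values, shows both halves of the answer: a default bar chart gets a lower y-limit of 0, yet `set_ylim` will still truncate it, while the bars themselves keep starting at 0 in data coordinates.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
bars = ax.bar(["A", "B", "C"], [85, 90, 88])

# Default: bars start at y=0 and autoscaling keeps 0 as the lower limit
default_bottom = ax.get_ylim()[0]

# Nothing stops a truncated axis; matplotlib just shows a narrower window
ax.set_ylim(80, 95)
truncated = ax.get_ylim()
bar_starts = [rect.get_y() for rect in bars]  # still 0 in data coordinates
plt.close(fig)
```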

Question 10. True or False: fig.savefig("chart.svg") produces a vector graphics file where text and lines remain sharp at any zoom level.

Answer

**True.** SVG (Scalable Vector Graphics) is a vector format that stores shapes, lines, and text as mathematical descriptions rather than pixels. This means the output remains perfectly sharp at any zoom level or print size. SVG is ideal for web display and for figures that need to scale, while PNG is a raster format that becomes pixelated when enlarged beyond its native resolution.
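To see that an SVG file is markup rather than pixels, this sketch saves a small chart to memory and inspects the start of the output. By default matplotlib also writes text in SVG output as vector paths (the `svg.fonttype` rcParam defaults to `"path"`), which is why labels stay sharp when zoomed.

```python
import io

import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [2, 1, 3])
ax.set_title("Vector output")

buf = io.BytesIO()
fig.savefig(buf, format="svg")
svg_text = buf.getvalue().decode("utf-8")
plt.close(fig)

# The file is XML describing shapes and paths, not a grid of pixels
print(svg_text[:5])        # <?xml
print("<svg" in svg_text)  # True
```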

Question 11. True or False: When using plt.subplots(1, 3, sharey=True), all three panels will display the same data.

Answer

**False.** `sharey=True` means all three panels will share the same y-axis *scale* (the same range and tick marks), not the same data. You still plot different data on each panel — but the shared scale ensures fair visual comparison. Without shared axes, each panel auto-scales independently, which can make small variations in one panel look as dramatic as large variations in another.
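In this sketch each panel gets a different made-up series, yet after autoscaling all three panels report the same y-limits, which span the combined range of the data.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

data = {  # different made-up series for each panel
    "Panel 1": [1, 2, 3],
    "Panel 2": [10, 12, 11],
    "Panel 3": [40, 35, 50],
}
fig, axes = plt.subplots(1, 3, sharey=True)
for ax, (name, values) in zip(axes, data.items()):
    ax.plot(values)
    ax.set_title(name)

limits = [ax.get_ylim() for ax in axes]  # one shared range for all panels
plotted = [list(ax.get_lines()[0].get_ydata()) for ax in axes]
plt.close(fig)
```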

Section 3: Short Answer (3 questions, 6 points each)

Question 12. Explain the difference between plt.plot(x, y) (pyplot style) and ax.plot(x, y) (OO style). When would the difference matter?

Answer

`plt.plot(x, y)` operates on the "current axes", an implicit, global state managed by matplotlib behind the scenes. `ax.plot(x, y)` operates on a specific, explicitly named Axes object. The difference matters in two key situations:

1. **Multi-panel figures:** When you have multiple Axes (subplots), `plt.plot()` might modify the wrong one, because the "current axes" may not be the one you intend. `ax.plot()` always modifies the specific Axes you name.
2. **Functions:** If you write a function that creates a chart, passing an `ax` parameter makes it clear which Axes the function operates on. Using `plt` functions inside a function creates hidden state dependencies.

For a single, quick exploratory chart, both work identically. For anything more complex, the OO style is safer and clearer.
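The function pattern from point 2 might look like this minimal sketch; `plot_trend` is a hypothetical helper name. Because it receives the Axes explicitly, the same function can fill any panel of any figure without relying on pyplot's current-axes state.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt


def plot_trend(ax, x, y, title):
    """Draw on the Axes passed in; no hidden 'current axes' state."""
    ax.plot(x, y, marker="o")
    ax.set_title(title)
    return ax


fig, (left, right) = plt.subplots(1, 2)
plot_trend(left, [1, 2, 3], [3, 1, 2], "Left panel")
plot_trend(right, [1, 2, 3], [1, 2, 3], "Right panel")
plt.close(fig)
```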

Question 13. You want to show vaccination rates for 20 countries, but when you create a bar chart, the x-axis labels overlap and become unreadable. Describe two different solutions to this problem.

Answer

**Solution 1: Rotate the labels.** Angle the labels so they no longer collide, for example with `ax.set_xticklabels(labels, rotation=45, ha="right")`. A 45-degree rotation is usually enough, and `ha="right"` (horizontal alignment = right) lines up the end of each label with its tick, so each rotated label points at the correct bar. In recent matplotlib versions, call `ax.set_xticks(...)` first (or use `plt.setp(ax.get_xticklabels(), rotation=45, ha="right")`) to avoid a warning about setting labels without fixed tick positions.

**Solution 2: Use a horizontal bar chart.** Switch from `ax.bar()` to `ax.barh()` so the category labels appear along the y-axis, where they read naturally from left to right with plenty of space. This is often the better solution for many categories because it eliminates the label collision problem entirely.

Other valid approaches include increasing the figure width (`figsize`), reducing the font size, abbreviating labels, or showing only the top/bottom N countries.
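Both solutions side by side, as a sketch with 20 hypothetical country labels and invented rates (placeholders, not real vaccination data). For the rotated version, the tick positions are fixed with `set_xticks` before the labels are set, which avoids matplotlib's warning about labels without fixed ticks.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

# Hypothetical labels and invented rates, for illustration only
countries = [f"Country {i:02d}" for i in range(1, 21)]
rates = [60 + (i * 7) % 35 for i in range(20)]

fig, (ax_rot, ax_h) = plt.subplots(1, 2, figsize=(14, 6))

# Solution 1: vertical bars with rotated, right-aligned tick labels
positions = range(len(countries))
ax_rot.bar(positions, rates, color="steelblue")
ax_rot.set_xticks(positions)  # fix tick positions before labeling them
ax_rot.set_xticklabels(countries, rotation=45, ha="right")
ax_rot.set_ylim(0, 100)

# Solution 2: horizontal bars; category labels sit on the y-axis
ax_h.barh(countries, rates, color="steelblue")
ax_h.set_xlim(0, 100)
ax_h.invert_yaxis()  # keep the first country at the top

fig.tight_layout()
rotations = [t.get_rotation() for t in ax_rot.get_xticklabels()]
plt.close(fig)
```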

Question 14. What is a colormap in matplotlib, and when would you use one? Name one sequential colormap and one diverging colormap.

Answer

A **colormap** is a mapping from numerical values to colors, used to encode a continuous variable as color. In matplotlib, colormaps are specified with the `cmap` parameter in functions like `scatter()` and `imshow()`. You use a colormap when a third variable needs to be shown via color, for example coloring scatter plot points by population or by temperature. The choice of colormap should match the data type:

- **Sequential colormap** (for ordered data from low to high): `"viridis"` (dark purple through blue and green to yellow; perceptually uniform and colorblind-friendly), `"Blues"`, `"YlOrRd"`.
- **Diverging colormap** (for data with a meaningful center point): `"coolwarm"` (blue at the low end, red at the high end, a light neutral color in the middle), `"RdBu"`, `"PiYG"`. To make the neutral middle match the meaningful center (such as zero), use symmetric `vmin`/`vmax` values or a centered normalization.
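A sketch of both colormap types on made-up values. For the diverging case it uses `matplotlib.colors.TwoSlopeNorm` so the neutral middle of `"coolwarm"` lines up with 0 even though the data range is lopsided (-3 to 5); symmetric `vmin`/`vmax` would be a simpler alternative when the range allows it.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
from matplotlib.colors import TwoSlopeNorm

# Made-up values for illustration
x = [1, 2, 3, 4, 5, 6]
y = [2, 3, 1, 5, 4, 6]
population = [1, 5, 20, 40, 80, 150]       # ordered low -> high
change = [-3.0, -1.5, 0.0, 1.0, 2.5, 5.0]  # meaningful center at 0

fig, (ax_seq, ax_div) = plt.subplots(1, 2, figsize=(10, 4))

# Sequential: one ordered variable, low to high
seq = ax_seq.scatter(x, y, c=population, cmap="viridis", s=80)
fig.colorbar(seq, ax=ax_seq, label="Population")

# Diverging: pin the colormap's neutral midpoint to 0
norm = TwoSlopeNorm(vmin=-3.0, vcenter=0.0, vmax=5.0)
div = ax_div.scatter(x, y, c=change, cmap="coolwarm", norm=norm, s=80)
fig.colorbar(div, ax=ax_div, label="Change")

plt.close(fig)
```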

Section 4: Code Analysis (4 questions, 5 points each)

Question 15. What is wrong with this code? Identify the error and explain how to fix it.

import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 3)
axes[0].plot([1, 2, 3], [4, 5, 6])
axes[1].bar(["A", "B"], [10, 20])
axes[2].scatter([1, 2], [3, 4])
plt.title("My Dashboard")
plt.show()

Answer

**Error:** `plt.title("My Dashboard")` uses the pyplot interface, which sets the title on the "current axes". After `plt.subplots()`, that is the last Axes created, `axes[2]`, so the title lands on the right-hand panel only. It does *not* set a title for the entire figure. **Fix:** Use `fig.suptitle("My Dashboard")` to set a figure-level title that appears above all three panels. Also add `fig.tight_layout()` to prevent overlap.

fig.suptitle("My Dashboard")
fig.tight_layout()
plt.show()
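The sketch below reproduces the bug and the fix on the same figure: it records where `plt.title` actually lands, then removes that title and adds a figure-level one with `fig.suptitle`.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 3)
axes[0].plot([1, 2, 3], [4, 5, 6])
axes[1].bar(["A", "B"], [10, 20])
axes[2].scatter([1, 2], [3, 4])

plt.title("My Dashboard")  # the bug: goes to the current (last) Axes
buggy_titles = [ax.get_title() for ax in axes]

axes[2].set_title("")                   # undo the misplaced title
heading = fig.suptitle("My Dashboard")  # the fix: one figure-level title
fig.tight_layout()
plt.close(fig)
```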

Question 16. This code is supposed to create a bar chart with the y-axis starting at zero, but the output shows bars floating above the baseline. What's wrong?

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.bar(["East", "West", "North", "South"], [85, 78, 82, 90])
ax.set_title("Regional Scores")
plt.show()

Answer

**Nothing in this code moves the baseline.** matplotlib's `bar()` draws bars up from y=0 by default, and autoscaling keeps 0 as the lower y-limit, so this code produces bars anchored at zero rather than floating. If you do see bars starting above zero, the cause is something not shown here, such as a later `ax.set_ylim()` call with a nonzero lower bound or a nonzero `bottom` argument to `bar()`. Either way, the robust fix is to set the limits explicitly to guarantee a zero baseline:

ax.set_ylim(0, 100)

This is a best practice for bar charts even when it seems redundant: it makes your intent explicit and protects against later code or style settings that change the y-limits.
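A quick check of the default behavior using the question's own data: with no explicit limits, the lower y-limit is already exactly 0, and setting the limits explicitly just makes that guarantee part of the code.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.bar(["East", "West", "North", "South"], [85, 78, 82, 90])
default_limits = ax.get_ylim()  # autoscaled; lower limit stays at 0

ax.set_ylim(0, 100)             # explicit, so later changes can't drift
explicit_limits = ax.get_ylim()
plt.close(fig)
```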

Question 17. Predict the output of this code. How many panels will the figure have, and what will appear in each?

import matplotlib.pyplot as plt

fig, axes = plt.subplots(2, 1, figsize=(8, 8), sharex=True)

axes[0].plot([2020, 2021, 2022, 2023], [100, 120, 115, 140],
             color="steelblue", label="Revenue")
axes[0].set_ylabel("Revenue ($K)")
axes[0].legend(frameon=False)

axes[1].bar([2020, 2021, 2022, 2023], [50, 55, 48, 62],
            color="seagreen")
axes[1].set_ylabel("Profit ($K)")
axes[1].set_xlabel("Year")

fig.suptitle("Revenue and Profit Trends")
fig.tight_layout()
plt.show()

Answer

The figure has **2 panels arranged vertically** (2 rows, 1 column):

- **Top panel (axes[0]):** A steelblue line chart of revenue from 2020-2023, with a legend showing "Revenue" and the y-axis label "Revenue ($K)".
- **Bottom panel (axes[1]):** A seagreen bar chart of profit from 2020-2023, with the y-axis label "Profit ($K)" and the x-axis label "Year".

Because `sharex=True`, both panels share the same x-axis range, and matplotlib hides the x tick labels on the top panel so they appear only on the bottom panel. The figure-level title "Revenue and Profit Trends" appears above both panels. This is a common layout for showing related metrics over the same time period.
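To check the prediction rather than eyeball it, this sketch rebuilds a reduced version of the figure (same data and layout, fewer labels) and inspects it after drawing. The checked properties are the ones the answer describes.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

years = [2020, 2021, 2022, 2023]
fig, axes = plt.subplots(2, 1, figsize=(8, 8), sharex=True)
axes[0].plot(years, [100, 120, 115, 140], color="steelblue",
             label="Revenue")
axes[0].legend(frameon=False)
axes[1].bar(years, [50, 55, 48, 62], color="seagreen")
axes[1].set_xlabel("Year")
fig.canvas.draw()  # populate the tick labels

top_labels = axes[0].get_xticklabels()
bottom_labels = axes[1].get_xticklabels()
checks = {
    "panels": len(fig.axes),
    "same_xlim": axes[0].get_xlim() == axes[1].get_xlim(),
    "top_ticklabels_hidden": all(not t.get_visible() for t in top_labels),
    "bottom_ticklabels_shown": any(t.get_visible() for t in bottom_labels),
    "legend_only_on_top": (axes[0].get_legend() is not None
                           and axes[1].get_legend() is None),
}
plt.close(fig)
```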

Question 18. This code runs without error but produces a confusing chart. Identify at least two design problems.

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.bar(["Q1", "Q2", "Q3", "Q4"], [250, 310, 290, 340],
       color=["red", "green", "blue", "yellow"],
       edgecolor="black", linewidth=3)
ax.set_title("Bars")
ax.set_ylim(200, 360)
plt.show()

Answer

**Problem 1: Y-axis starts at 200, not zero.** This is a bar chart, and bars encode values as lengths. With the axis starting at 200, the visible bar lengths are 50 for Q1 (250) and 140 for Q4 (340), so Q4 looks 2.8 times as large when it is actually only 36% larger. The visual exaggeration is misleading. Fix: `ax.set_ylim(0, 380)`.

**Problem 2: Four different colors for no reason.** Color doesn't encode any variable: all four bars are the same kind of data (one value per quarter). Using four colors is decorative chartjunk that suggests the bars are categorically different when they're not, and the yellow bar also has poor contrast against a white background. Fix: Use a single color for all bars.

**Problem 3: Heavy black borders (`linewidth=3`).** The thick black edges compete with the bars themselves, reducing the data-ink ratio. Fix: Remove or thin the edges (`linewidth=0`, or a thin `edgecolor="white"`).

**Problem 4: Uninformative title.** "Bars" says nothing about the data or the finding. Fix: Use a descriptive title that states the takeaway, such as "Q4 Revenue Reached a Year-High of $340K" (if the values are revenue in thousands of dollars).
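The size of the distortion in Problem 1 is simple arithmetic, and the fixes for all four problems fit in a few lines. A sketch follows; the title in the corrected chart is an illustrative choice that avoids assuming what the values measure.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

quarters = ["Q1", "Q2", "Q3", "Q4"]
values = [250, 310, 290, 340]

# How much bigger Q4 looks than Q1 with a baseline of 200, vs. reality
baseline = 200
apparent_ratio = (values[3] - baseline) / (values[0] - baseline)  # 140 / 50
actual_ratio = values[3] / values[0]                              # 340 / 250
print(round(apparent_ratio, 2), round(actual_ratio, 2))           # 2.8 1.36

# Corrected chart: zero baseline, one color, no heavy edges,
# and a title that states the finding
fig, ax = plt.subplots()
ax.bar(quarters, values, color="steelblue")
ax.set_ylim(0, 380)
ax.set_title("Q4 Was the Highest of the Four Quarters")
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
corrected_limits = ax.get_ylim()
plt.close(fig)
```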

Section 5: Applied Scenarios (2 questions, 8 points each)

Question 19. Elena has a DataFrame with vaccination rates for 6 WHO regions across 5 years (2019-2023). She wants to create a visualization that shows both (a) the comparison across regions in 2023 and (b) how each region's rate changed over time.

Design a figure that accomplishes both goals. Specify: How many panels? What chart type for each? What axes? What shared settings? Write pseudocode or describe the figure structure in detail.

Answer

**Recommended design: A 1x2 figure (two panels side by side).**

**Left panel:** Bar chart of 2023 vaccination rates by region.
- x-axis: WHO region
- y-axis: Vaccination rate (%), starting at 0
- Color: Single color, with the lowest region highlighted
- Purpose: Static comparison across regions for the most recent year

**Right panel:** Line chart of vaccination rates from 2019-2023, one line per region.
- x-axis: Year
- y-axis: Vaccination rate (%), shared with left panel (`sharey=True`)
- Color: One color per region (matching a legend)
- Purpose: Shows temporal trends: which regions improved, which declined

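import matplotlib.pyplot as plt

# regions, rates_2023, years, region_data (region -> list of yearly rates),
# and color_list are assumed to be prepared from Elena's DataFrame
fig, axes = plt.subplots(1, 2, figsize=(14, 5), sharey=True)

# Left: bar chart for 2023, with the lowest region highlighted
lowest = min(rates_2023)
bar_colors = ["tomato" if r == lowest else "steelblue" for r in rates_2023]
axes[0].bar(regions, rates_2023, color=bar_colors)
axes[0].set_title("2023 Rates by Region")
axes[0].set_ylabel("Vaccination Rate (%)")
axes[0].set_ylim(0, 100)
axes[0].tick_params(axis="x", labelrotation=45)  # long region names

# Right: line chart over time, one line per region
for region, color in zip(regions, color_list):
    axes[1].plot(years, region_data[region], label=region,
                 color=color, marker="o")
axes[1].set_title("Trends 2019-2023")
axes[1].legend(frameon=False, fontsize=9)
axes[1].set_xlabel("Year")

fig.suptitle("Global Vaccination Coverage: Current Status and Trends")
fig.tight_layout()
plt.show()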

The shared y-axis ensures that the bar chart and line chart are directly comparable. The two panels complement each other — the bar chart shows "where we are" and the line chart shows "how we got here."
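For a version that runs end to end, here is a sketch with placeholder data: six hypothetical regions labeled A-F and invented rates (not real WHO figures), stored in the region-to-list shape the design above assumes. It uses the default color cycle instead of a custom color list.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

# Placeholder data only: hypothetical regions and invented rates
years = [2019, 2020, 2021, 2022, 2023]
region_data = {
    "Region A": [72, 70, 68, 71, 74],
    "Region B": [85, 84, 82, 84, 86],
    "Region C": [90, 89, 88, 89, 91],
    "Region D": [80, 76, 75, 78, 81],
    "Region E": [78, 77, 74, 76, 79],
    "Region F": [92, 91, 90, 92, 93],
}
regions = list(region_data)
rates_2023 = [series[-1] for series in region_data.values()]
lowest = min(rates_2023)
bar_colors = ["tomato" if r == lowest else "steelblue" for r in rates_2023]

fig, axes = plt.subplots(1, 2, figsize=(14, 5), sharey=True)

# Left: 2023 comparison, lowest region highlighted
axes[0].bar(regions, rates_2023, color=bar_colors)
axes[0].set_title("2023 Rates by Region")
axes[0].set_ylabel("Vaccination Rate (%)")
axes[0].set_ylim(0, 100)  # shared, so the right panel also shows 0-100

# Right: one line per region over time (default color cycle)
for region, series in region_data.items():
    axes[1].plot(years, series, marker="o", label=region)
axes[1].set_title("Trends 2019-2023")
axes[1].set_xlabel("Year")
axes[1].set_xticks(years)
axes[1].legend(frameon=False, fontsize=9)

fig.suptitle("Vaccination Coverage: Current Status and Trends")
fig.tight_layout()
right_limits = axes[1].get_ylim()
plt.close(fig)
```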

Question 20. Priya is building a scatter plot of three-point attempt rate vs. win percentage for 30 NBA teams. She wants to highlight the top 5 teams in a different color and label them with team names. Write the matplotlib code to accomplish this (you may use made-up but realistic data).

Answer

import matplotlib.pyplot as plt
import random

random.seed(42)
teams = [f"Team {i}" for i in range(1, 31)]
three_pt_rate = [random.uniform(30, 45) for _ in range(30)]
# Made-up win percentages centered on .500, rising with three-point rate
win_pct = [0.5 + 0.02 * (r - 37.5) + random.uniform(-0.1, 0.1)
           for r in three_pt_rate]

# Identify top 5 by win percentage
sorted_idx = sorted(range(30), key=lambda i: win_pct[i], reverse=True)
top5 = set(sorted_idx[:5])

fig, ax = plt.subplots(figsize=(10, 7))

# Plot all teams
for i in range(30):
    if i in top5:
        ax.scatter(three_pt_rate[i], win_pct[i], color="tomato",
                   s=80, zorder=3)
        ax.annotate(teams[i], (three_pt_rate[i], win_pct[i]),
                    textcoords="offset points", xytext=(5, 5),
                    fontsize=8, color="tomato")
    else:
        ax.scatter(three_pt_rate[i], win_pct[i], color="steelblue",
                   s=50, alpha=0.6)

ax.set_title("Top 5 Teams by Win % Tend to Shoot More Threes")
ax.set_xlabel("Three-Point Attempt Rate (%)")
ax.set_ylabel("Win Percentage")
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
fig.tight_layout()
plt.show()

Key techniques: loop with conditional coloring, `zorder=3` to ensure highlighted points appear on top, `annotate` with offset for labeling, and selective transparency (`alpha=0.6`) for background points.
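A design alternative, sketched with made-up data generated in the same spirit: instead of one `scatter` call per team, draw each group with a single call. This produces two artists rather than 30, and the `label` arguments give a legend for free. The data-generation line here centers win percentage on .500.

```python
import random

import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

random.seed(42)  # made-up but plausible data
teams = [f"Team {i}" for i in range(1, 31)]
three_pt_rate = [random.uniform(30, 45) for _ in range(30)]
win_pct = [0.5 + 0.02 * (r - 37.5) + random.uniform(-0.1, 0.1)
           for r in three_pt_rate]

ranked = sorted(range(30), key=lambda i: win_pct[i], reverse=True)
leaders = sorted(ranked[:5])   # top 5 by win percentage
others = sorted(ranked[5:])

fig, ax = plt.subplots(figsize=(10, 7))
# One scatter call per group instead of one per team
ax.scatter([three_pt_rate[i] for i in others], [win_pct[i] for i in others],
           color="steelblue", s=50, alpha=0.6, label="Other teams")
ax.scatter([three_pt_rate[i] for i in leaders], [win_pct[i] for i in leaders],
           color="tomato", s=80, zorder=3, label="Top 5 by win %")
for i in leaders:
    ax.annotate(teams[i], (three_pt_rate[i], win_pct[i]),
                textcoords="offset points", xytext=(5, 5),
                fontsize=8, color="tomato")
ax.set_xlabel("Three-Point Attempt Rate (%)")
ax.set_ylabel("Win Percentage")
ax.legend(frameon=False)
plt.close(fig)
```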

Scoring Guide

Section	Points
Multiple Choice (8 x 4)	32
True/False (3 x 4)	12
Short Answer (3 x 6)	18
Code Analysis (4 x 5)	20
Applied Scenarios (2 x 8)	16
Total	98

- 90-98: Excellent. You're ready to build matplotlib charts for any project.
- 80-89: Strong. Review the code analysis questions to catch any syntax or design patterns you missed.
- 70-79: Adequate. Practice building charts from scratch (not copying examples) to solidify the OO interface.
- Below 70: Revisit the chapter, especially Sections 15.1-15.7. Building several charts from the exercises will help cement the patterns.

End of Chapter 15 Quiz