Chapter 15 Exercises: matplotlib Foundations

Contributors to Introduction to Data Science

How to use these exercises: This is a hands-on chapter, so most exercises require writing and running code. Complete them in a Jupyter notebook so you can see your charts inline. Start each exercise with a chart plan (even a quick mental one), then code it. Build iteratively — start with a basic chart, then customize.

Difficulty key: (star) Foundational | (star)(star) Intermediate | (star)(star)(star) Advanced | (star)(star)(star)(star) Extension

Part A: Core Chart Types (star)

These exercises verify you can create the four basic chart types with the OO interface.

Exercise 15.1 — Your first line chart

Create a line chart showing the following monthly temperature data for a fictional city. Use the OO interface (fig, ax = plt.subplots()). Include a descriptive title, axis labels with units, and markers at each data point.

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
temps_f = [28, 32, 42, 55, 66, 75, 82, 79, 70, 57, 43, 31]

Guidance

import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(months, temps_f, color="steelblue", linewidth=2,
        marker="o", markersize=6)
ax.set_title("Monthly Temperatures Peak in July at 82F")
ax.set_xlabel("Month")
ax.set_ylabel("Temperature (F)")
ax.grid(True, alpha=0.3)
fig.tight_layout()
plt.show()

Exercise 15.2 — Bar chart of course enrollments

Create a bar chart comparing enrollment numbers for five university courses:

courses = ["Intro Stats", "Data Science", "Calc I", "Linear Alg", "ML"]
enrollment = [320, 285, 410, 180, 225]

The y-axis must start at zero. Add value labels above each bar.

Guidance

fig, ax = plt.subplots(figsize=(9, 5))
bars = ax.bar(courses, enrollment, color="steelblue")
for bar, val in zip(bars, enrollment):
    ax.text(bar.get_x() + bar.get_width() / 2, bar.get_height() + 5,
            str(val), ha="center", va="bottom", fontsize=10)
ax.set_title("Calculus I Has the Highest Enrollment Among Math Courses")
ax.set_ylabel("Number of Students")
ax.set_ylim(0, 460)
fig.tight_layout()
plt.show()

Exercise 15.3 — Scatter plot of study hours vs. exam score

Create a scatter plot from the following data. Use alpha=0.6 for transparency.

hours = [2, 3, 1, 6, 5, 7, 4, 8, 3, 5, 6, 2, 7, 4, 9, 1, 8, 6, 3, 5]
scores = [55, 62, 45, 78, 72, 85, 68, 90, 58, 75,
          80, 50, 88, 65, 95, 42, 92, 77, 60, 73]

Guidance

fig, ax = plt.subplots(figsize=(8, 6))
ax.scatter(hours, scores, color="steelblue", s=60, alpha=0.6)
ax.set_title("More Study Hours Are Associated with Higher Exam Scores")
ax.set_xlabel("Hours Studied")
ax.set_ylabel("Exam Score")
fig.tight_layout()
plt.show()

Exercise 15.4 — Histogram of exam scores

Using the scores data from Exercise 15.3, create a histogram with 8 bins. Add white edges between bars.

Guidance

fig, ax = plt.subplots(figsize=(8, 5))
ax.hist(scores, bins=8, color="steelblue", edgecolor="white")
ax.set_title("Exam Score Distribution Is Roughly Uniform")
ax.set_xlabel("Exam Score")
ax.set_ylabel("Number of Students")
fig.tight_layout()
plt.show()

Exercise 15.5 — Horizontal bar chart

Using the courses and enrollment data from Exercise 15.2, create a horizontal bar chart using ax.barh(). Sort the bars from highest enrollment at the top to lowest at the bottom.

Guidance

# Sort ascending: barh draws the first item at the bottom,
# so the highest enrollment ends up at the top
sorted_pairs = sorted(zip(enrollment, courses))
sorted_enr = [p[0] for p in sorted_pairs]
sorted_courses = [p[1] for p in sorted_pairs]

fig, ax = plt.subplots(figsize=(8, 5))
ax.barh(sorted_courses, sorted_enr, color="steelblue")
ax.set_xlabel("Number of Students")
ax.set_title("Course Enrollment Comparison")
ax.set_xlim(0, 460)
fig.tight_layout()
plt.show()

Part B: Customization (star)(star)

These exercises practice customizing charts for clarity and polish.

Exercise 15.6 — Multi-line chart with legend

Plot the following three series on the same chart. Use different colors for each, add a legend, and make the lines distinguishable.

years = [2018, 2019, 2020, 2021, 2022, 2023]
product_a = [120, 135, 110, 145, 160, 175]
product_b = [80, 85, 95, 100, 110, 125]
product_c = [200, 190, 170, 180, 195, 210]

Guidance

fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(years, product_a, label="Product A", color="steelblue",
        linewidth=2, marker="o")
ax.plot(years, product_b, label="Product B", color="tomato",
        linewidth=2, marker="s")
ax.plot(years, product_c, label="Product C", color="seagreen",
        linewidth=2, marker="^")
ax.legend(frameon=False, fontsize=11)
ax.set_title("Product C Leads Revenue but All Three Are Growing")
ax.set_xlabel("Year")
ax.set_ylabel("Revenue ($K)")
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
fig.tight_layout()
plt.show()

Exercise 15.7 — Removing chart clutter

Start with this code, which produces a cluttered chart. Modify it to follow Tufte's principles: remove the top and right spines, lighten the gridlines, use a single bar color (the colors here do not encode a variable), and change the title from a topic label to a finding.

import matplotlib.pyplot as plt

categories = ["Q1", "Q2", "Q3", "Q4"]
values = [45, 52, 48, 61]

fig, ax = plt.subplots()
ax.bar(categories, values, color=["red", "blue", "green", "orange"])
ax.set_title("Bar Chart")
ax.grid(True)
plt.show()

Guidance

Key changes: single color for all bars (color doesn't encode a variable), descriptive title, faint gridlines, removed spines, y-axis starting at zero.

fig, ax = plt.subplots(figsize=(8, 5))
ax.bar(categories, values, color="steelblue")
ax.set_title("Q4 Revenue Jumped 27% Over Q3")
ax.set_ylabel("Revenue ($K)")
ax.set_ylim(0, 70)
ax.grid(True, alpha=0.2, axis="y")
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
fig.tight_layout()
plt.show()

Exercise 15.8 — Color-coded scatter plot

Extend the scatter plot from Exercise 15.3. Assign each student to a study group ("Group A" or "Group B") and color the points by group. Add a legend.

groups = ["A","B","A","A","B","B","A","B","A","B",
          "A","B","B","A","B","A","A","B","B","A"]

Guidance

colors_map = {"A": "steelblue", "B": "tomato"}

fig, ax = plt.subplots(figsize=(8, 6))
for group, color in colors_map.items():
    mask = [g == group for g in groups]
    h_g = [h for h, m in zip(hours, mask) if m]
    s_g = [s for s, m in zip(scores, mask) if m]
    ax.scatter(h_g, s_g, color=color, s=60, alpha=0.7,
               label=f"Group {group}")

ax.legend(frameon=False)
ax.set_title("Study Hours vs. Exam Score by Study Group")
ax.set_xlabel("Hours Studied")
ax.set_ylabel("Exam Score")
fig.tight_layout()
plt.show()

Exercise 15.9 — Annotation practice

Take the line chart from Exercise 15.1 (monthly temperatures). Add:

1. A horizontal reference line at the annual average temperature
2. An annotation with an arrow pointing to the hottest month
3. An annotation pointing to the coldest month

Guidance

avg_temp = sum(temps_f) / len(temps_f)
fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(months, temps_f, color="steelblue", linewidth=2, marker="o")

ax.axhline(y=avg_temp, color="gray", linestyle="--", alpha=0.5,
           label=f"Annual Avg ({avg_temp:.0f}F)")
ax.annotate("Peak: 82F", xy=(6, 82), xytext=(8, 86),
            arrowprops=dict(arrowstyle="->", color="tomato"),
            color="tomato", fontsize=10)
ax.annotate("Low: 28F", xy=(0, 28), xytext=(2, 22),
            arrowprops=dict(arrowstyle="->", color="navy"),
            color="navy", fontsize=10)
ax.legend(frameon=False)
ax.set_title("Temperatures Range 54 Degrees Between Winter and Summer")
ax.set_xlabel("Month")
ax.set_ylabel("Temperature (F)")
fig.tight_layout()
plt.show()

Exercise 15.10 — Highlighting a single bar

Create a bar chart of the enrollment data from Exercise 15.2, but highlight the "Data Science" bar in a different color from all others. Add a text annotation explaining why it's highlighted ("Our department").

Guidance

bar_colors = ["tomato" if c == "Data Science" else "steelblue"
              for c in courses]
fig, ax = plt.subplots(figsize=(9, 5))
ax.bar(courses, enrollment, color=bar_colors)
ax.text(1, 290, "Our department", ha="center", fontsize=10,
        color="tomato")
ax.set_title("Data Science Enrollment Is Mid-Range Among Math Courses")
ax.set_ylabel("Number of Students")
ax.set_ylim(0, 460)
fig.tight_layout()
plt.show()

Part C: Multi-Panel Figures (star)(star)

Exercise 15.11 — Side-by-side histograms

Create a figure with two panels (1 row, 2 columns). The left panel shows a histogram of Group A's exam scores; the right panel shows Group B's. Use sharey=True and give both the same x-axis range for fair comparison. Use the data from Exercises 15.3 and 15.8.

Guidance

scores_a = [s for s, g in zip(scores, groups) if g == "A"]
scores_b = [s for s, g in zip(scores, groups) if g == "B"]

fig, axes = plt.subplots(1, 2, figsize=(12, 5), sharey=True)
axes[0].hist(scores_a, bins=6, color="steelblue", edgecolor="white",
             range=(40, 100))
axes[0].set_title("Group A")
axes[0].set_xlabel("Exam Score")
axes[0].set_ylabel("Count")

axes[1].hist(scores_b, bins=6, color="tomato", edgecolor="white",
             range=(40, 100))
axes[1].set_title("Group B")
axes[1].set_xlabel("Exam Score")

fig.suptitle("Score Distributions by Study Group", fontsize=14)
fig.tight_layout()
plt.show()

Exercise 15.12 — 2x2 dashboard

Create a 2x2 figure that presents four views of the enrollment data:

1. Top-left: Vertical bar chart of enrollment
2. Top-right: Horizontal bar chart (sorted)
3. Bottom-left: A pie chart (yes, just this once for practice)
4. Bottom-right: A dot plot (scatter with enrollment on x-axis, courses on y-axis)

Guidance

fig, axes = plt.subplots(2, 2, figsize=(12, 10))

# Top-left: vertical bar
axes[0, 0].bar(courses, enrollment, color="steelblue")
axes[0, 0].set_title("Vertical Bar Chart")
axes[0, 0].set_ylim(0, 460)

# Top-right: horizontal bar (sorted)
s_pairs = sorted(zip(enrollment, courses))
axes[0, 1].barh([p[1] for p in s_pairs], [p[0] for p in s_pairs],
                color="steelblue")
axes[0, 1].set_title("Horizontal Bar (Sorted)")

# Bottom-left: pie chart
axes[1, 0].pie(enrollment, labels=courses, autopct="%1.0f%%")
axes[1, 0].set_title("Pie Chart")

# Bottom-right: dot plot
axes[1, 1].scatter(enrollment, courses, color="steelblue", s=80)
axes[1, 1].set_title("Dot Plot")
axes[1, 1].set_xlabel("Enrollment")

fig.suptitle("Four Ways to Show the Same Data", fontsize=14)
fig.tight_layout()
plt.show()

Exercise 15.13 — Small multiples for time series

Create a 1x4 figure showing monthly sales data for four stores. Use shared y-axes. Add a reference line at the overall average across all stores and months.

import random
random.seed(123)
stores = {}
for name in ["Downtown", "Mall", "Airport", "Online"]:
    base = random.randint(40, 80)
    stores[name] = [base + random.randint(-10, 15) for _ in range(12)]

Guidance

all_values = [v for vals in stores.values() for v in vals]
overall_avg = sum(all_values) / len(all_values)

fig, axes = plt.subplots(1, 4, figsize=(18, 4), sharey=True)
for ax, (name, data) in zip(axes, stores.items()):
    ax.plot(range(1, 13), data, color="steelblue", marker="o",
            markersize=4, linewidth=1.5)
    ax.axhline(y=overall_avg, color="tomato", linestyle="--",
               alpha=0.5)
    ax.set_title(name)
    ax.set_xlabel("Month")
    ax.set_xticks([1, 4, 7, 10])

axes[0].set_ylabel("Sales ($K)")
fig.suptitle("Monthly Sales Across Four Locations", fontsize=14)
fig.tight_layout()
plt.show()

Part D: Real-World Applications (star)(star)(star)

Exercise 15.14 — Vaccination rates by region

Using pandas, create this DataFrame and then build a grouped bar chart comparing 2020 and 2023 rates:

import pandas as pd
df = pd.DataFrame({
    "region": ["Africa", "Americas", "SE Asia",
               "Europe", "E Med", "W Pacific"],
    "rate_2020": [42, 73, 78, 87, 68, 84],
    "rate_2023": [48, 79, 82, 91, 73, 88],
})

Add value labels above each bar. Use a descriptive title.

Guidance

Use the grouped bar technique from Section 15.9 with `np.arange` for positioning and `width` offsets for the two years.
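One possible sketch of the grouped bar layout, using the DataFrame from the exercise. The bar width of 0.38, the colors, and the title wording are illustrative choices rather than values taken from Section 15.9. The value labels use ax.bar_label, which assumes matplotlib 3.4 or later; the ax.text loop from Exercise 15.2 works as well.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "region": ["Africa", "Americas", "SE Asia",
               "Europe", "E Med", "W Pacific"],
    "rate_2020": [42, 73, 78, 87, 68, 84],
    "rate_2023": [48, 79, 82, 91, 73, 88],
})

x = np.arange(len(df))   # one slot per region
width = 0.38             # bar width (illustrative choice)

fig, ax = plt.subplots(figsize=(10, 5))
# Shift each year's bars half a width left or right of the slot center
bars_2020 = ax.bar(x - width / 2, df["rate_2020"], width,
                   label="2020", color="lightsteelblue")
bars_2023 = ax.bar(x + width / 2, df["rate_2023"], width,
                   label="2023", color="steelblue")

# Value labels above each bar (bar_label assumes matplotlib 3.4+)
ax.bar_label(bars_2020, padding=2, fontsize=9)
ax.bar_label(bars_2023, padding=2, fontsize=9)

# Put the region names back at the slot centers
ax.set_xticks(x)
ax.set_xticklabels(df["region"])
ax.set_ylim(0, 100)
ax.set_ylabel("Vaccination Rate (%)")
ax.set_title("Vaccination Rates Rose in Every Region from 2020 to 2023")
ax.legend(frameon=False)
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
fig.tight_layout()
plt.show()
```

Offsetting by half the bar width in each direction keeps every pair centered on its tick, which is why the tick positions can stay at x.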

Exercise 15.15 — Before and after: the Tufte makeover

Start with this deliberately ugly chart. Apply at least 7 specific improvements and produce a polished version. List each change you made.

import matplotlib.pyplot as plt

x = ["A", "B", "C", "D", "E"]
y = [34, 28, 42, 19, 37]

fig, ax = plt.subplots()
ax.bar(x, y, color=["red", "blue", "green", "purple", "orange"],
       edgecolor="black", linewidth=2)
ax.set_title("Chart")
ax.set_facecolor("lightgray")
ax.grid(True, color="black", linewidth=1)
plt.show()

Guidance

Changes to make: (1) single color for all bars, (2) descriptive title stating a finding, (3) remove gray background, (4) lighten gridlines, (5) reduce/remove bar edge, (6) start y-axis at zero, (7) add axis labels with units, (8) remove top/right spines, (9) add value labels on bars, (10) set appropriate figure size.
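A sketch of one possible polished version, with the numbered changes from the list above marked in comments. The exercise gives no units or category meaning, so the axis labels and the title wording here are placeholders you should replace with real ones.

```python
import matplotlib.pyplot as plt

x = ["A", "B", "C", "D", "E"]
y = [34, 28, 42, 19, 37]

fig, ax = plt.subplots(figsize=(8, 5))             # (10) explicit figure size
bars = ax.bar(x, y, color="steelblue")             # (1) one color, (5) no bar edge
# (3) the gray background is gone simply because set_facecolor is not called

for bar, val in zip(bars, y):                      # (9) value labels on bars
    ax.text(bar.get_x() + bar.get_width() / 2, val + 0.5, str(val),
            ha="center", va="bottom", fontsize=10)

ax.set_title("Category C Leads at 42; D Trails at 19")  # (2) title states a finding
ax.set_xlabel("Category")                          # (7) placeholder label
ax.set_ylabel("Value (units)")                     # (7) placeholder label and units
ax.set_ylim(0, 48)                                 # (6) y-axis starts at zero
ax.grid(True, axis="y", alpha=0.2)                 # (4) faint horizontal gridlines
ax.set_axisbelow(True)                             # keep gridlines behind the bars
ax.spines["top"].set_visible(False)                # (8) remove top spine
ax.spines["right"].set_visible(False)              # (8) remove right spine
fig.tight_layout()
plt.show()
```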

Exercise 15.16 — Saving in multiple formats

Take any chart you built in a previous exercise and save it in three formats: PNG at 300 DPI, SVG, and PDF. Use bbox_inches="tight". Verify that each file was created by checking the file size or opening it.

Guidance

fig.savefig("my_chart.png", dpi=300, bbox_inches="tight")
fig.savefig("my_chart.svg", bbox_inches="tight")
fig.savefig("my_chart.pdf", bbox_inches="tight")

In Jupyter, you can verify with `!ls -la my_chart.*` (Mac/Linux) or `!dir my_chart.*` (Windows).
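If you prefer a check that works the same on every operating system, the file sizes can also be read from Python. The sketch below rebuilds a small chart from the Exercise 15.7 data and writes to a temporary directory so it leaves nothing behind; the out_dir and sizes names are illustrative, and in practice you would point out_dir at your project folder.

```python
import tempfile
from pathlib import Path

import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(["Q1", "Q2", "Q3", "Q4"], [45, 52, 48, 61], color="steelblue")
ax.set_ylabel("Revenue ($K)")

# Temporary folder for the demo; swap in your own directory
out_dir = Path(tempfile.mkdtemp())

sizes = {}
# Only the PNG needs a dpi setting; SVG and PDF are vector formats
for ext, extra in [("png", {"dpi": 300}), ("svg", {}), ("pdf", {})]:
    path = out_dir / f"my_chart.{ext}"
    fig.savefig(path, bbox_inches="tight", **extra)
    sizes[ext] = path.stat().st_size   # size in bytes
    print(f"{path.name}: {sizes[ext]:,} bytes")
```

A size of zero bytes, or a missing file, would mean the save did not succeed.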

Exercise 15.17 — Annotated policy chart

Build the following explanatory chart from scratch:

- A line chart showing a country's vaccination rate from 2010 to 2023
- A vertical line marking when a new health policy was implemented (2017)
- A text annotation labeling the policy introduction
- A horizontal line at the 90% WHO target
- A descriptive title that states whether the country reached the target

years = list(range(2010, 2024))
rates = [62, 65, 67, 70, 72, 74, 76, 80, 83, 86, 88, 85, 89, 92]

Guidance

fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(years, rates, color="steelblue", linewidth=2.5, marker="o")
ax.axvline(x=2017, color="seagreen", linestyle="--", alpha=0.7)
ax.text(2017.2, 63, "Policy\nimplemented", color="seagreen",
        fontsize=10)
ax.axhline(y=90, color="tomato", linestyle=":", alpha=0.6)
ax.text(2023.3, 90.5, "90% WHO target", color="tomato", fontsize=9)
ax.set_title("Country Reached WHO 90% Vaccination Target in 2023,\n"
             "Six Years After Policy Reform", fontsize=13)
ax.set_xlabel("Year")
ax.set_ylabel("Vaccination Rate (%)")
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
fig.tight_layout()
plt.show()

Exercise 15.18 — Scatter plot with size encoding

Build a bubble chart where:

- x-axis: Healthcare spending per capita
- y-axis: Life expectancy
- Bubble size: Population
- Color: Continent

Use the following data:

import pandas as pd
df = pd.DataFrame({
    "country": ["USA", "Germany", "Brazil", "Nigeria", "Japan",
                "India", "Australia", "Egypt", "Canada", "Kenya"],
    "health_spend": [11000, 6000, 900, 70, 4500,
                     75, 5400, 150, 5600, 80],
    "life_exp": [77, 81, 75, 54, 84, 69, 83, 72, 82, 67],
    "population_m": [330, 84, 213, 220, 125, 1400, 26, 104, 38, 54],
    "continent": ["N America", "Europe", "S America", "Africa",
                  "Asia", "Asia", "Oceania", "Africa",
                  "N America", "Africa"]
})

Guidance

Use a dictionary mapping continents to colors. Scale population for bubble sizes (e.g., multiply by 0.3). Add a legend manually or label individual points.
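A minimal sketch of that approach, using the DataFrame from the exercise. Several choices here are assumptions, not requirements: the specific colors, the logarithmic x-axis (chosen because spending runs from 70 to 11,000), the manual legend built from Line2D handles, and the title wording. The exercise does not state a currency for health_spend, so the axis label leaves it out.

```python
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.lines import Line2D

df = pd.DataFrame({
    "country": ["USA", "Germany", "Brazil", "Nigeria", "Japan",
                "India", "Australia", "Egypt", "Canada", "Kenya"],
    "health_spend": [11000, 6000, 900, 70, 4500,
                     75, 5400, 150, 5600, 80],
    "life_exp": [77, 81, 75, 54, 84, 69, 83, 72, 82, 67],
    "population_m": [330, 84, 213, 220, 125, 1400, 26, 104, 38, 54],
    "continent": ["N America", "Europe", "S America", "Africa",
                  "Asia", "Asia", "Oceania", "Africa",
                  "N America", "Africa"]
})

# Illustrative color choices, one per continent
continent_colors = {
    "N America": "steelblue", "Europe": "seagreen",
    "S America": "darkorange", "Africa": "tomato",
    "Asia": "mediumpurple", "Oceania": "goldenrod",
}
point_colors = df["continent"].map(continent_colors).tolist()

fig, ax = plt.subplots(figsize=(10, 6))
ax.scatter(df["health_spend"], df["life_exp"],
           s=df["population_m"] * 0.3,      # bubble area scales with population
           c=point_colors, alpha=0.6, edgecolors="white")

# Label every point with its country name, offset slightly from the bubble
for row in df.itertuples():
    ax.annotate(row.country, xy=(row.health_spend, row.life_exp),
                xytext=(5, 5), textcoords="offset points", fontsize=8)

# Manual legend: same-size markers so the legend encodes color only
handles = [Line2D([], [], marker="o", linestyle="", color=color,
                  markersize=8, label=name)
           for name, color in continent_colors.items()]
ax.legend(handles=handles, title="Continent", frameon=False,
          loc="lower right")

ax.set_xscale("log")   # assumption: log scale spreads out the low spenders
ax.set_xlabel("Healthcare Spending per Capita (log scale)")
ax.set_ylabel("Life Expectancy (years)")
ax.set_title("Higher Health Spending Tracks Longer Lives, "
             "but the USA Lags Its Peers")
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
fig.tight_layout()
plt.show()
```

With a 0.3 multiplier the smallest countries become very small dots; trying a larger multiplier or a minimum size is a reasonable experiment.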

Part E: Synthesis and Challenge (star)(star)(star)(star)

Exercise 15.19 — The complete workflow

Choose a research question that can be answered with the vaccination rate data (or create your own small dataset). Follow the complete workflow:

1. Write a chart plan (question, chart type, axes, color, audience)
2. Build the chart in matplotlib using the OO interface
3. Customize for explanatory use (title, labels, annotations, clean design)
4. Save in PNG at 300 DPI

Submit both your chart plan and the code.

Guidance

Grade based on: (a) chart plan completeness, (b) appropriate chart type for the question, (c) correct use of OO interface, (d) explanatory polish (title states finding, axis labels have units, minimal clutter), (e) saved file quality.

Exercise 15.20 — Recreate a published chart

Find a simple chart in a news article or data journalism piece (from sources like FiveThirtyEight, The Economist, Our World in Data, or similar). Using made-up but realistic data, recreate the chart's style in matplotlib as closely as you can. Focus on matching:

- Chart type and layout
- Color palette
- Typography choices
- Grid and spine styling

Note which design choices the original makes that align with (or violate) Tufte's principles.

Guidance

This exercise develops both technical matplotlib skills and design observation skills. Don't worry about pixel-perfect reproduction — focus on the structural choices: What chart type? Where are the spines? How heavy are the gridlines? What does the title communicate?

Exercise 15.21 — Dynamic figure sizing

Write a function make_bar_chart(categories, values, highlight=None) that:

- Creates a bar chart with the given data
- Automatically sizes the figure width based on the number of categories
- Highlights the bar at the highlight index in a different color
- Returns the fig, ax objects

Test it with 5 categories and with 15 categories.

Guidance

def make_bar_chart(categories, values, highlight=None):
    width = max(6, len(categories) * 1.2)
    fig, ax = plt.subplots(figsize=(width, 5))
    colors = ["darkorange" if i == highlight else "steelblue"
              for i in range(len(categories))]
    ax.bar(categories, values, color=colors)
    ax.set_ylim(0, max(values) * 1.15)
    ax.spines["top"].set_visible(False)
    ax.spines["right"].set_visible(False)
    fig.tight_layout()
    return fig, ax
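To run the two tests the exercise asks for, something like the following could work. The function body is repeated so the snippet runs on its own; the category names and values are placeholder data invented for the test, not part of the exercise.

```python
import matplotlib.pyplot as plt


def make_bar_chart(categories, values, highlight=None):
    # Same logic as the guidance: about 1.2 inches per bar, minimum 6 inches
    width = max(6, len(categories) * 1.2)
    fig, ax = plt.subplots(figsize=(width, 5))
    colors = ["darkorange" if i == highlight else "steelblue"
              for i in range(len(categories))]
    ax.bar(categories, values, color=colors)
    ax.set_ylim(0, max(values) * 1.15)
    ax.spines["top"].set_visible(False)
    ax.spines["right"].set_visible(False)
    fig.tight_layout()
    return fig, ax


# Placeholder test data (hypothetical names and values)
small_cats = [f"Cat {i}" for i in range(1, 6)]
small_vals = [12, 30, 18, 25, 9]
large_cats = [f"Cat {i}" for i in range(1, 16)]
large_vals = [10 + (i * 7) % 23 for i in range(15)]

fig_small, ax_small = make_bar_chart(small_cats, small_vals, highlight=1)
fig_large, ax_large = make_bar_chart(large_cats, large_vals)

# Compare the automatically chosen figure widths
print("5 categories: ", fig_small.get_size_inches())
print("15 categories:", fig_large.get_size_inches())
plt.show()
```

Because highlight defaults to None, the second call draws every bar in the base color.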

Part F: Progressive Project (star)(star)

Exercise 15.22 — Bar chart of vaccination rates by region

Using the progressive project vaccination data (or the sample data below), create a publication-quality bar chart comparing vaccination rates across WHO regions. Apply all the techniques from this chapter: OO interface, descriptive title, zero-baseline y-axis, reference line at the global average, one highlighted bar, clean design. Save as PNG at 300 DPI.

regions = ["Africa", "Americas", "SE Asia", "Europe", "E Med", "W Pacific"]
rates_2023 = [48, 79, 82, 91, 73, 88]

Guidance

This is the Chapter 15 project milestone. Follow the chart plan from Exercise 14.10 and implement it fully.
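If you are working from the sample data, a starting point might look like the sketch below. Two assumptions are worth flagging: the reference line is the unweighted mean of the six regional rates, which stands in for a true (population-weighted) global average, and the choice to highlight the lowest region is just one possible emphasis. The output filename is hypothetical.

```python
import matplotlib.pyplot as plt

regions = ["Africa", "Americas", "SE Asia", "Europe", "E Med", "W Pacific"]
rates_2023 = [48, 79, 82, 91, 73, 88]

# Assumption: unweighted mean of regional rates as the reference value
avg_rate = sum(rates_2023) / len(rates_2023)

# Highlight the region with the lowest rate (one possible emphasis)
lowest = rates_2023.index(min(rates_2023))
colors = ["tomato" if i == lowest else "steelblue"
          for i in range(len(regions))]

fig, ax = plt.subplots(figsize=(9, 5))
bars = ax.bar(regions, rates_2023, color=colors)
ax.axhline(y=avg_rate, color="gray", linestyle="--", linewidth=1,
           label=f"Regional average ({avg_rate:.0f}%)")

# Value labels above each bar
for bar, val in zip(bars, rates_2023):
    ax.text(bar.get_x() + bar.get_width() / 2, val + 1, f"{val}%",
            ha="center", va="bottom", fontsize=10)

gap = avg_rate - rates_2023[lowest]
ax.set_title(f"{regions[lowest]} Trails the Regional Average "
             f"by {gap:.0f} Points")
ax.set_ylabel("Vaccination Rate (%)")
ax.set_ylim(0, 100)                       # zero baseline
ax.legend(frameon=False, loc="upper left")
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
fig.tight_layout()

# Hypothetical filename; saved before show() so the figure is still open
fig.savefig("vaccination_by_region.png", dpi=300, bbox_inches="tight")
plt.show()
```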

Exercise 15.23 — Line chart of vaccination trends

Create a multi-panel line chart (1 row, 3 panels, shared y-axis) showing vaccination trends over time for three countries with different trajectories (one improving, one declining, one volatile). Add reference lines and annotations.

Guidance

Use the small multiples technique from Section 15.11. Choose three countries from your project data or create representative data.

Exercise 15.24 — Scatter plot of GDP vs. vaccination rate

Build a scatter plot with GDP per capita on the x-axis and vaccination rate on the y-axis. Color-code by WHO region and add a legend (region is categorical, so a legend fits better than a colorbar). Include at least one annotation labeling a notable outlier.

Guidance

For the region color coding, loop over unique regions and plot each group separately with its own color and label. This allows a proper legend. For the outlier, use `ax.annotate()` with an arrow.

Exercise 15.25 — Histogram of vaccination rate distribution

Create a histogram of vaccination rates for all countries. Experiment with at least three different bin counts. Choose the bin count that best reveals the distribution's shape. Add a vertical line at the median. Write a one-sentence interpretation of what the distribution tells you.

Guidance

Start with bins=10, 20, and 30. The right choice depends on the data — look for the version that shows meaningful structure without excessive noise. The median line (`ax.axvline()`) provides a reference point. Your interpretation should address shape (normal? skewed? bimodal?) and what that means for global health equity.
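One way to compare the three bin counts is to draw them side by side. The sketch below uses SYNTHETIC placeholder data (random draws, not real vaccination rates) purely so the code runs; replace rates with the column from your project data before interpreting anything.

```python
import matplotlib.pyplot as plt
import numpy as np

# SYNTHETIC placeholder values between 0 and 100, NOT real country data.
# Swap in your project's vaccination rates here.
rng = np.random.default_rng(42)
rates = rng.beta(5, 2, size=190) * 100

median_rate = np.median(rates)

fig, axes = plt.subplots(1, 3, figsize=(15, 4))
for ax, n_bins in zip(axes, [10, 20, 30]):
    # Fix the range so the bin edges are comparable across panels
    ax.hist(rates, bins=n_bins, range=(0, 100),
            color="steelblue", edgecolor="white")
    ax.axvline(x=median_rate, color="tomato", linestyle="--",
               label=f"Median ({median_rate:.0f}%)")
    ax.set_title(f"bins={n_bins}")
    ax.set_xlabel("Vaccination Rate (%)")
    ax.spines["top"].set_visible(False)
    ax.spines["right"].set_visible(False)

axes[0].set_ylabel("Number of Countries")
axes[0].legend(frameon=False)
fig.tight_layout()
plt.show()
```

The y-axes are left independent here because bar heights shrink as the bin count grows; sharing them would flatten the 30-bin panel.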

End of Chapter 15 Exercises