Chapter 14 Exercises: Introduction to Data Visualization with matplotlib
These exercises are organized into five tiers of increasing complexity. Exercises marked with a star (*) are the recommended minimum for completing the chapter.
Setup: Shared Data
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

# Monthly revenue data
monthly = pd.DataFrame({
    "month": ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
              "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"],
    "revenue": [42000, 38000, 51000, 55000, 47000, 60000,
                63000, 58000, 67000, 71000, 65000, 79000],
    "cost": [17000, 15500, 21000, 22500, 19000, 24000,
             25500, 23500, 27000, 29000, 26500, 32000],
    "north": [18000, 15500, 21000, 24000, 19000, 26000,
              27000, 23000, 28000, 31000, 27000, 33000],
    "south": [11000, 10000, 13000, 14000, 12000, 16000,
              17000, 15000, 18000, 20000, 18000, 22000],
    "west": [8000, 8500, 11000, 11000, 10000, 12000,
             12000, 12000, 13000, 12000, 12000, 14000],
    "east": [5000, 4000, 6000, 6000, 6000, 6000,
             7000, 8000, 8000, 8000, 8000, 10000],
})
monthly["margin"] = monthly["revenue"] - monthly["cost"]
monthly["margin_pct"] = (monthly["margin"] / monthly["revenue"] * 100).round(1)

# Regional annual summary
regional = pd.DataFrame({
    "region": ["North", "South", "East", "West"],
    "revenue": [292000, 186000, 82000, 137000],
    "margin": [131400, 74400, 36900, 61650],
    "deal_count": [88, 56, 27, 42],
})
regional["margin_pct"] = (regional["margin"] / regional["revenue"] * 100).round(1)

# Product line summary
products = pd.DataFrame({
    "product": [
        "Enterprise Cloud Suite",
        "Professional Services Bundle",
        "Data Analytics Platform",
        "Legacy Support Contract",
        "Hardware Refresh Package",
        "Security Compliance Module",
    ],
    "revenue": [285000, 198000, 176000, 143000, 121000, 98000],
    "margin_pct": [58.0, 45.0, 52.0, 38.0, 28.0, 51.0],
})

# Simulated order values: 160 orders centered on 3500 and 40 centered on 8500
np.random.seed(42)  # fixed seed so everyone gets the same values
order_values = np.concatenate([
    np.random.normal(3500, 800, 160),
    np.random.normal(8500, 1200, 40),
])
# Keep only orders above 300
order_values = order_values[order_values > 300]
Tier 1: Foundation (Recall and Recognition)
Exercise 1-1 * — Your First Line Chart
Create a line chart showing monthly revenue for the full year. Requirements:
- Figure size: 10 × 5 inches
- Line color: any blue
- Markers at each data point
- Title: "Acme Corp — Monthly Revenue 2024"
- X-axis label: "Month"
- Y-axis label: "Revenue (USD)"
- Y-axis formatted as currency (e.g., "$42K")
- Horizontal grid lines
- Save as ex1_line.png at 150 DPI
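Hint: the currency requirement is usually the part that trips people up. One possible approach, sketched here on toy data rather than the exercise data, is to pass a small function to matplotlib.ticker.FuncFormatter. The name usd_thousands and the quarterly figures are illustrative assumptions, not part of the chapter.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter


def usd_thousands(value, pos=None):
    """Format a tick value such as 42000 as '$42K' (hypothetical helper)."""
    return f"${value / 1000:.0f}K"


# Toy data (hypothetical, not the exercise data)
quarters = ["Q1", "Q2", "Q3", "Q4"]
sales = [12000, 15000, 11000, 18000]

fig, ax = plt.subplots(figsize=(6, 3))
ax.plot(quarters, sales, marker="o", color="tab:blue")
ax.yaxis.set_major_formatter(FuncFormatter(usd_thousands))
ax.grid(axis="y", alpha=0.3)  # horizontal grid lines only
fig.tight_layout()
```

The same formatter can be attached to ax.xaxis when the currency values run along the x-axis, as in a horizontal bar chart.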
Exercise 1-2 * — Simple Bar Chart
Create a vertical bar chart comparing annual revenue by region. Requirements:
- Sort bars from tallest to shortest
- Data labels above each bar showing the value in $XK format
- Y-axis starts at zero
- Remove the top and right spines
- Title: "Annual Revenue by Region"
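Hint: a minimal sketch of sorting, labeling, and spine removal, using made-up team data so the exercise itself is left to you. Placing each label with ax.text at the bar's center and top is only one option; the variable names are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Toy data (hypothetical, not the exercise data)
teams = ["A", "B", "C"]
sales = [30000, 52000, 41000]

# Sort tallest to shortest before plotting
pairs = sorted(zip(teams, sales), key=lambda p: p[1], reverse=True)
names = [p[0] for p in pairs]
values = [p[1] for p in pairs]

fig, ax = plt.subplots(figsize=(6, 4))
bars = ax.bar(names, values, color="tab:blue")

# One label per bar, centered horizontally and sitting just above the top edge
for bar in bars:
    height = bar.get_height()
    ax.text(bar.get_x() + bar.get_width() / 2, height,
            f"${height / 1000:.0f}K", ha="center", va="bottom")

ax.set_ylim(bottom=0)  # bar charts should start at zero
for side in ("top", "right"):
    ax.spines[side].set_visible(False)
```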
Exercise 1-3 * — Horizontal Bar Chart
Create a horizontal bar chart showing revenue by product (use the products DataFrame). Requirements:
- Sort ascending so the highest-revenue product is at the top
- Data labels at the right end of each bar
- Title: "Revenue by Product Line"
- X-axis formatted as currency
Exercise 1-4 — Histogram
Create a histogram of order_values. Requirements:
- 20 bins
- Add a vertical dashed line for the mean
- Add a vertical line for the median (different color)
- Add a legend explaining both lines
- Title: "Order Value Distribution"
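Hint: reference lines for summary statistics can be drawn with ax.axvline, and giving each one a label is what makes the legend work. The sketch below uses a randomly generated right-skewed sample (an assumption, not order_values) so that the mean and median are visibly different.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Toy right-skewed sample (hypothetical, not the exercise data)
rng = np.random.default_rng(0)
samples = rng.exponential(scale=100, size=500)

mean_val = samples.mean()
median_val = np.median(samples)

fig, ax = plt.subplots(figsize=(7, 4))
ax.hist(samples, bins=20, color="lightsteelblue", edgecolor="white")
ax.axvline(mean_val, color="tab:red", linestyle="--",
           label=f"Mean: {mean_val:.0f}")
ax.axvline(median_val, color="tab:green", linestyle="-",
           label=f"Median: {median_val:.0f}")
ax.legend()
```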
Exercise 1-5 — Basic Scatter Plot
Create a scatter plot with cost on the x-axis and revenue on the y-axis, using the monthly DataFrame. Requirements:
- One point per month
- Title: "Monthly Cost vs. Revenue"
- Axis labels with currency formatting
- Light grid lines
Tier 2: Application (Use in Context)
Exercise 2-1 * — Multi-Line Chart with Legend
Plot all four regions (north, south, west, east) on a single line chart over the 12 months. Requirements:
- Each region gets a distinct color
- Legend with region names, positioned outside the chart area
- Title describes all four regions
- Y-axis formatted as currency
- Markers on each line
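Hint: a legend can be pushed outside the axes by combining loc with bbox_to_anchor. The sketch below shows one common arrangement on toy series; the series names and the exact anchor coordinates are assumptions you will probably want to adjust.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Toy series (hypothetical, not the exercise data)
x = [1, 2, 3, 4]
series = {
    "Alpha": [3, 4, 6, 5],
    "Beta": [2, 2, 3, 4],
    "Gamma": [1, 3, 2, 2],
}

fig, ax = plt.subplots(figsize=(8, 4))
for name, ys in series.items():
    # Without an explicit color, each call takes the next color in the default cycle
    ax.plot(x, ys, marker="o", label=name)

# Anchor the legend's upper-left corner just past the right edge of the axes
ax.legend(loc="upper left", bbox_to_anchor=(1.02, 1.0), frameon=False)
fig.tight_layout()
```

When saving, passing bbox_inches="tight" to savefig is a common way to make sure an outside legend is not cropped.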
Exercise 2-2 * — Grouped Bar Chart
Create a grouped bar chart comparing total Q1 (Jan–Mar) and Q2 (Apr–Jun) revenue for each region. You will need to aggregate the monthly data first. Requirements:
- Two bars per region: one for Q1, one for Q2
- Different colors for each quarter
- Annotation above each Q2 bar showing the quarter-over-quarter percentage change from Q1
- Legend
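Hint: the two moving parts here are the aggregation and the side-by-side bar positions. The sketch below, on a toy six-month table with made-up columns alpha and beta, sums by quarter with groupby and then offsets two sets of bars by half a bar width. Treat the column names and the 0.38 width as assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Toy data (hypothetical, not the exercise data)
df = pd.DataFrame({
    "month": [1, 2, 3, 4, 5, 6],
    "alpha": [10, 12, 14, 15, 18, 21],
    "beta": [15, 15, 10, 10, 10, 10],
})

# Aggregate months into quarters, then sum each series per quarter
df["quarter"] = np.where(df["month"] <= 3, "Q1", "Q2")
totals = df.groupby("quarter")[["alpha", "beta"]].sum()
pct = (totals.loc["Q2"] - totals.loc["Q1"]) / totals.loc["Q1"] * 100

x = np.arange(len(totals.columns))  # one group per series
width = 0.38

fig, ax = plt.subplots(figsize=(6, 4))
q1_bars = ax.bar(x - width / 2, totals.loc["Q1"], width, label="Q1")
q2_bars = ax.bar(x + width / 2, totals.loc["Q2"], width, label="Q2")
ax.set_xticks(x)
ax.set_xticklabels(totals.columns)

# Signed percentage change above each Q2 bar
for bar, change in zip(q2_bars, pct):
    ax.text(bar.get_x() + bar.get_width() / 2, bar.get_height(),
            f"{change:+.0f}%", ha="center", va="bottom")
ax.legend()
```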
Exercise 2-3 — Good Chart Checklist Audit
Take the line chart from Exercise 1-1 and systematically apply the good chart checklist from Section 14.3. For each item you are missing, add it. Document in comments which items you added and why each matters.
Exercise 2-4 * — pandas .plot() Method
Reproduce Exercise 1-1 (monthly revenue line chart) using monthly.set_index("month")["revenue"].plot() instead of the full matplotlib API. Then add an axis label and save the figure. Compare the amount of code required between the two approaches.
Exercise 2-5 — Stacked Bar Chart
Create a stacked bar chart showing monthly revenue broken into north and south (stack south on top of north). Requirements:
- Legend identifying each layer
- Total value annotated on top of each stacked bar
- Formatted y-axis
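Hint: stacking is done by telling the second call to ax.bar where its bars should start, via the bottom argument. A minimal sketch on toy data (the layer names and values are assumptions):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Toy data (hypothetical, not the exercise data)
labels = ["Mon", "Tue", "Wed"]
base = np.array([3, 5, 4])
top = np.array([2, 1, 3])

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(labels, base, label="Base layer")
ax.bar(labels, top, bottom=base, label="Top layer")  # starts where base ends

# Annotate the combined height above each stack; categorical
# x positions are 0, 1, 2, ... in plotting order
totals = base + top
for i, total in enumerate(totals):
    ax.text(i, total, str(total), ha="center", va="bottom")
ax.legend()
```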
Tier 3: Analysis (Derive Insights)
Exercise 3-1 * — Revenue and Margin on One Chart (Dual Series)
Plot monthly revenue AND monthly margin on the same chart:
- Revenue as a solid blue line
- Margin as a dashed green line
- Both on the same y-axis (or try a secondary y-axis using ax.twinx())
- Legend identifying both series
Discuss in a comment: at which months does margin percentage drop even as revenue rises?
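Hint: if you try the secondary-axis variant, note that ax.twinx() returns a second Axes sharing the same x-axis, and each Axes keeps its own legend entries. One way to get a single combined legend, sketched on toy data with assumed names, is to collect the line handles yourself.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Toy data (hypothetical, not the exercise data): two series on different scales
x = [1, 2, 3, 4]
units = [120, 150, 140, 180]
return_rate = [2.5, 2.1, 2.8, 2.4]

fig, ax_left = plt.subplots(figsize=(7, 4))
ax_right = ax_left.twinx()  # second y-axis sharing the same x-axis

(line_units,) = ax_left.plot(x, units, color="tab:blue", label="Units sold")
(line_rate,) = ax_right.plot(x, return_rate, color="tab:green",
                             linestyle="--", label="Return rate (%)")

ax_left.set_ylabel("Units sold")
ax_right.set_ylabel("Return rate (%)")

# Pass both handles to one legend so the reader sees a single key
ax_left.legend(handles=[line_units, line_rate], loc="upper left")
```

A secondary axis makes two scales comparable in shape but not in magnitude, which is worth keeping in mind when you decide between the two variants.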
Exercise 3-2 — Annotated Line Chart
Plot the monthly revenue line chart and add:
- A text annotation at the highest revenue month (December) saying "Best Month"
- A text annotation at the lowest revenue month (February) saying "Lowest"
- An arrow pointing from the annotation to the data point for each
- A horizontal reference line at the mean revenue, labeled with its value
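Hint: ax.annotate takes the point being described (xy), the position of the text (xytext), and an arrowprops dictionary that draws the connecting arrow. The sketch below locates the peak from the data instead of hardcoding it; the toy values and text offsets are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Toy data (hypothetical, not the exercise data)
values = [5, 9, 4, 7]
x = list(range(len(values)))

peak = int(np.argmax(values))  # index of the largest value
mean_val = float(np.mean(values))

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(x, values, marker="o")
ax.set_ylim(0, max(values) + 2)  # headroom for the annotation text

# The arrow runs from the text position (xytext) to the data point (xy)
ax.annotate("Peak", xy=(peak, values[peak]),
            xytext=(peak + 0.5, values[peak] + 0.8),
            arrowprops=dict(arrowstyle="->"))

ax.axhline(mean_val, color="gray", linestyle=":",
           label=f"Mean: {mean_val:.2f}")
ax.legend()
```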
Exercise 3-3 — Margin Scatter Plot
Using the regional DataFrame, create a scatter plot with:
- X-axis: revenue
- Y-axis: margin_pct
- Bubble size: deal_count (scaled so bubbles are visible but not overlapping)
- Each region labeled with ax.annotate()
- A horizontal reference line for the company-average margin percentage
- Title and axis labels
Which region has the best return (margin per dollar of revenue)?
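Hint: two details are easy to get wrong here. First, the s argument of ax.scatter is a marker area in points squared, so raw counts usually need rescaling. Second, a company-wide margin percentage is total margin over total revenue, which generally differs from the simple mean of the per-region percentages. The toy numbers below are assumptions chosen to make that difference obvious.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Toy data (hypothetical, not the exercise data): two business units
units = ["Unit A", "Unit B"]
revenue = np.array([100.0, 300.0])
margin = np.array([50.0, 90.0])
deals = np.array([10, 40])

margin_pct = margin / revenue * 100                  # about 50% and 30%
simple_mean = margin_pct.mean()                      # about 40%
weighted_avg = margin.sum() * 100 / revenue.sum()    # 35%: total margin / total revenue

# Rescale counts so the largest bubble has an area of 600 points squared
sizes = deals / deals.max() * 600

fig, ax = plt.subplots(figsize=(6, 4))
ax.scatter(revenue, margin_pct, s=sizes, alpha=0.6)
for name, xv, yv in zip(units, revenue, margin_pct):
    # Offset each label by 8 points so it does not sit on top of its bubble
    ax.annotate(name, xy=(xv, yv), xytext=(8, 8), textcoords="offset points")
ax.axhline(weighted_avg, color="gray", linestyle="--",
           label=f"Company average: {weighted_avg:.1f}%")
ax.legend()
```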
Exercise 3-4 * — 2×2 Dashboard
Create a 2×2 subplot figure containing:
1. Monthly revenue line chart (Panel 1)
2. Regional revenue bar chart (Panel 2)
3. Order value histogram (Panel 3)
4. Product revenue horizontal bar (Panel 4)
Requirements:
- Figure size: 14 × 9 inches
- Overall figure title
- Each panel has its own title and axis labels
- plt.tight_layout() applied
- Saved as ex3_dashboard.png
Exercise 3-5 — Color by Value
Modify the regional bar chart so that bars are colored by margin percentage:
- Green if margin_pct >= 45
- Amber if 35 <= margin_pct < 45
- Red if margin_pct < 35
Compute the colors from the data (not hardcoded by region name).
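Hint: ax.bar accepts a list of colors, one per bar, so the rule can live in a small function that maps a percentage to a color. The sketch below applies the three thresholds to toy data; the function name margin_color and the specific color values are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

AMBER = "#FFBF00"


def margin_color(pct):
    """Map a margin percentage to a traffic-light color (hypothetical helper)."""
    if pct >= 45:
        return "tab:green"
    if pct >= 35:
        return AMBER
    return "tab:red"


# Toy data (hypothetical, not the exercise data)
names = ["P", "Q", "R"]
values = [120, 90, 150]
pcts = [48.0, 36.5, 20.0]

colors = [margin_color(p) for p in pcts]  # derived from the data, not from names

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(names, values, color=colors)
```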
Tier 4: Synthesis (Combine Techniques)
Exercise 4-1 * — Full Business Report Page
Build a 3-panel figure (3 rows, 1 column OR another layout of your choice) telling a coherent story:
- Panel 1: The trend (line chart of monthly revenue)
- Panel 2: The composition (stacked bar of north/south/west/east by month or quarter)
- Panel 3: The distribution (histogram of order values)
Each panel should pass the good chart checklist. Add a figure-level title and a footer with "Source: Acme Corp CRM".
Exercise 4-2 — Year-over-Year Line Comparison
Simulate a "prior year" dataset by multiplying the monthly revenue by 0.85 and subtracting a small amount of random noise. Plot both the current year and prior year on the same line chart:
- Solid line: current year (blue)
- Dashed line: prior year (gray)
- Shaded area between the two lines
- Legend and title indicating the year-over-year growth
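Hint: ax.fill_between shades the region between two y-series that share the same x values. The sketch below builds a toy "prior" series the same way the exercise describes and derives a growth figure for the title. The noise range, seed, and variable names are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Toy data (hypothetical, not the exercise data)
x = np.arange(6)
current = np.array([40.0, 38.0, 45.0, 50.0, 48.0, 55.0])

rng = np.random.default_rng(1)
prior = current * 0.85 - rng.uniform(0, 2, size=len(current))

# Growth of the full-period total, in percent
growth = (current.sum() / prior.sum() - 1) * 100

fig, ax = plt.subplots(figsize=(7, 4))
ax.plot(x, current, color="tab:blue", label="Current year")
ax.plot(x, prior, color="gray", linestyle="--", label="Prior year")
ax.fill_between(x, prior, current, color="tab:blue", alpha=0.15)
ax.set_title(f"Current vs. prior year (+{growth:.1f}% overall)")
ax.legend()
```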
Exercise 4-3 — Dynamic Data Labels
Write a function add_bar_labels(ax, bars, format_str="${}K") that:
- Accepts an Axes object and a container of bars
- Adds a label above each bar formatted with the given format string
- Handles both vertical and horizontal bars automatically
Test it on the regional revenue bar chart and the product horizontal bar chart.
Exercise 4-4 — Saving at Multiple Resolutions
Take any one of your charts and save it at three different DPI settings:
- 72 DPI (screen/draft)
- 150 DPI (digital delivery)
- 300 DPI (print-ready)
Use os.path.getsize() to check the file sizes. Document in a comment: how does DPI affect file size? Is the visual difference visible at normal screen resolution?
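Hint: the comparison is easiest to read if you save in a loop and collect the sizes in a dictionary. A minimal sketch, writing into a temporary directory and using a throwaway chart (both assumptions so the example does not touch your working folder):

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Throwaway chart (hypothetical stand-in for one of your own)
fig, ax = plt.subplots(figsize=(6, 4))
ax.plot([0, 1, 2, 3], [0, 1, 4, 9], marker="o")
ax.set_title("DPI comparison")

out_dir = tempfile.mkdtemp()
sizes = {}
for dpi in (72, 150, 300):
    path = os.path.join(out_dir, f"chart_{dpi}dpi.png")
    fig.savefig(path, dpi=dpi)           # same figure, different pixel density
    sizes[dpi] = os.path.getsize(path)   # size in bytes

for dpi, size in sizes.items():
    print(f"{dpi:>3} DPI: {size / 1024:.1f} KB")
plt.close(fig)
```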
Tier 5: Extension (Open-Ended Challenges)
Exercise 5-1 — Automated Report Generator
Write a function generate_monthly_report(df, output_path) that:
- Accepts a monthly DataFrame in the same shape as the shared data
- Creates a 2×2 dashboard (your choice of charts)
- Saves it to output_path
- Is parameterized enough to work for any monthly dataset with the same column structure, whether it covers a full 12 months or fewer
Test it by calling it twice: once on monthly (Jan–Dec) and once on a sliced version containing only Jan–Jun.
Exercise 5-2 — Animated Line Chart
Using matplotlib.animation.FuncAnimation, create a line chart that "draws" the monthly revenue trend one month at a time, simulating the passage of time. Save as a GIF or MP4. (Requires ffmpeg for MP4 export.)
This is a challenging exercise — consult the matplotlib animation documentation.
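Hint: the usual pattern is to create an empty line once and then have an update function extend its data on every frame. The skeleton below is a sketch on toy data, with the save call left commented out because the available writers depend on your installation (the "pillow" writer for GIFs needs the Pillow package). The interval and variable names are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

# Toy data (hypothetical, not the exercise data)
values = [3, 5, 4, 6, 8]
x = list(range(len(values)))

fig, ax = plt.subplots(figsize=(6, 4))
(line,) = ax.plot([], [], marker="o")  # start with an empty line
ax.set_xlim(0, len(values) - 1)        # fix the limits so the axes do not jump
ax.set_ylim(0, max(values) * 1.1)


def update(frame):
    """Show the first frame + 1 points."""
    line.set_data(x[:frame + 1], values[:frame + 1])
    return (line,)


# Keep a reference to the animation object, otherwise it may be garbage collected
anim = FuncAnimation(fig, update, frames=len(values), interval=300, repeat=False)

# Uncomment to export (assumes Pillow is installed):
# anim.save("trend.gif", writer="pillow")
```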
Exercise 5-3 — Chart Style System
Create a Python module acme_style.py that:
1. Sets plt.rcParams for font, font size, and figure DPI
2. Defines a color palette dictionary with at least 6 named colors
3. Provides a function apply_acme_style(ax) that removes top/right spines and adds a light horizontal grid
4. Provides a function currency_formatter(ax, axis="y") that applies dollar formatting
Import this module in a new script and use it to style three charts. Show that the styling is consistent across all three.
Exercise 5-4 — Pie Chart Defense
Create both a pie chart and a horizontal bar chart showing the same data: revenue by customer tier (Gold: $485,000, Silver: $298,000, Bronze: $127,000).
Write 4–6 sentences evaluating:
- In what context, if any, is the pie chart the better choice?
- What information does the bar chart communicate that the pie chart obscures?
- Does adding data labels to the pie chart close the gap? Why or why not?
Exercise 5-5 — Custom Annotation System
Write a function highlight_outliers(ax, x_data, y_data, labels, std_threshold=1.5) that:
- Computes mean and standard deviation of y_data
- Identifies points more than std_threshold standard deviations from the mean
- Annotates those points on the chart with their label and value
- Colors the outlier points red
Apply this to the monthly revenue line chart to automatically highlight any months that were unusually high or low.
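Hint: the detection step reduces to a boolean mask over the distance from the mean measured in standard deviations. The sketch below shows that step on toy data and then uses the mask for plotting; it is not wrapped in the required function signature. Note that NumPy's std defaults to the population formula (ddof=0) while pandas defaults to the sample formula (ddof=1), so the two can flag slightly different points.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Toy data (hypothetical, not the exercise data)
labels = np.array(["a", "b", "c", "d", "e", "f"])
x = np.arange(len(labels))
y = np.array([10, 11, 9, 10, 30, 10])
std_threshold = 1.5

# Distance of each point from the mean, in standard deviations
z_scores = (y - y.mean()) / y.std()
mask = np.abs(z_scores) > std_threshold

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(x, y, marker="o", color="tab:blue")
ax.scatter(x[mask], y[mask], color="red", zorder=3)  # draw outliers on top

for xv, yv, label in zip(x[mask], y[mask], labels[mask]):
    ax.annotate(f"{label}: {yv}", xy=(xv, yv),
                xytext=(6, 6), textcoords="offset points")
```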
Answer Guidance
For starred exercises (*), verify your charts against these criteria:
- Does the chart answer the stated business question at a glance?
- Are title, axis labels, and legend present and descriptive?
- Does the y-axis start at zero for bar charts?
- Are colors used purposefully (not just for decoration)?
- Does plt.tight_layout() prevent label clipping?
- Is the chart saved at >= 150 DPI?
For Tier 4 and 5 exercises, there is no single correct answer. Evaluate your work against the good chart checklist and consider asking a colleague to interpret your chart without explanation — if they can answer the chart's question unaided, the design is working.