Chapter 14 Exercises: Introduction to Data Visualization with matplotlib

These exercises are organized into five tiers of increasing complexity. Exercises marked with a star (*) are the recommended minimum for completing the chapter.


Setup: Shared Data

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

# Monthly revenue data
monthly = pd.DataFrame({
    "month": ["Jan","Feb","Mar","Apr","May","Jun",
              "Jul","Aug","Sep","Oct","Nov","Dec"],
    "revenue": [42000, 38000, 51000, 55000, 47000, 60000,
                63000, 58000, 67000, 71000, 65000, 79000],
    "cost":    [17000, 15500, 21000, 22500, 19000, 24000,
                25500, 23500, 27000, 29000, 26500, 32000],
    "north":   [18000, 15500, 21000, 24000, 19000, 26000,
                27000, 23000, 28000, 31000, 27000, 33000],
    "south":   [11000, 10000, 13000, 14000, 12000, 16000,
                17000, 15000, 18000, 20000, 18000, 22000],
    "west":    [8000,  8500,  11000, 11000, 10000, 12000,
                12000, 12000, 13000, 12000, 12000, 14000],
    "east":    [5000,  4000,   6000,  6000,  6000,  6000,
                7000,  8000,   8000,  8000,  8000,  10000],
})
monthly["margin"] = monthly["revenue"] - monthly["cost"]
monthly["margin_pct"] = (monthly["margin"] / monthly["revenue"] * 100).round(1)

# Regional annual summary
regional = pd.DataFrame({
    "region":     ["North", "South", "East", "West"],
    "revenue":    [292000, 186000, 82000, 137000],
    "margin":     [131400,  74400, 36900, 61650],
    "deal_count": [88, 56, 27, 42],
})
regional["margin_pct"] = (regional["margin"] / regional["revenue"] * 100).round(1)

# Product line summary
products = pd.DataFrame({
    "product": [
        "Enterprise Cloud Suite",
        "Professional Services Bundle",
        "Data Analytics Platform",
        "Legacy Support Contract",
        "Hardware Refresh Package",
        "Security Compliance Module",
    ],
    "revenue": [285000, 198000, 176000, 143000, 121000, 98000],
    "margin_pct": [58.0, 45.0, 52.0, 38.0, 28.0, 51.0],
})

# Simulated order values
np.random.seed(42)
order_values = np.concatenate([
    np.random.normal(3500, 800, 160),
    np.random.normal(8500, 1200, 40),
])
order_values = order_values[order_values > 300]

Tier 1: Foundation (Recall and Recognition)

Exercise 1-1 * — Your First Line Chart

Create a line chart showing monthly revenue for the full year. Requirements:

  • Figure size: 10 × 5 inches
  • Line color: any blue
  • Markers at each data point
  • Title: "Acme Corp — Monthly Revenue 2024"
  • X-axis label: "Month"
  • Y-axis label: "Revenue (USD)"
  • Y-axis formatted as currency (e.g., "$42K")
  • Horizontal grid lines
  • Save as ex1_line.png at 150 DPI
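
Hint: one common way to produce tick labels such as "$42K" is matplotlib.ticker.FuncFormatter. The sketch below uses made-up numbers rather than the shared data, and the helper name usd_thousands is an illustrative choice, not something defined in the chapter.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter


def usd_thousands(value, pos=None):
    """Format a dollar amount in thousands, e.g. 42000 -> '$42K'."""
    return f"${value / 1000:,.0f}K"


fig, ax = plt.subplots(figsize=(6, 3))
ax.plot([0, 1, 2], [42000, 38000, 51000], marker="o")  # toy data
ax.yaxis.set_major_formatter(FuncFormatter(usd_thousands))
ax.grid(axis="y", alpha=0.3)  # horizontal grid lines only
plt.close(fig)
```

FuncFormatter calls the function with the tick value and its position, which is why the helper accepts a second argument even though it ignores it.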


Exercise 1-2 * — Simple Bar Chart

Create a vertical bar chart comparing annual revenue by region. Requirements:

  • Sort bars from tallest to shortest
  • Data labels above each bar showing the value in $XK format
  • Y-axis starts at zero
  • Remove the top and right spines
  • Title: "Annual Revenue by Region"


Exercise 1-3 * — Horizontal Bar Chart

Create a horizontal bar chart showing revenue by product (use the products DataFrame). Requirements:

  • Sort ascending so the highest-revenue product is at the top
  • Data labels at the right end of each bar
  • Title: "Revenue by Product Line"
  • X-axis formatted as currency


Exercise 1-4 — Histogram

Create a histogram of order_values. Requirements:

  • 20 bins
  • Add a vertical dashed line for the mean
  • Add a vertical line for the median (different color)
  • Add a legend explaining both lines
  • Title: "Order Value Distribution"


Exercise 1-5 — Basic Scatter Plot

Create a scatter plot with cost on the x-axis and revenue on the y-axis, using the monthly DataFrame. Requirements:

  • One point per month
  • Title: "Monthly Cost vs. Revenue"
  • Axis labels with currency formatting
  • Light grid lines


Tier 2: Application (Use in Context)

Exercise 2-1 * — Multi-Line Chart with Legend

Plot all four regions (north, south, west, east) on a single line chart over the 12 months. Requirements:

  • Each region gets a distinct color
  • Legend with region names, positioned outside the chart area
  • Title describes all four regions
  • Y-axis formatted as currency
  • Markers on each line


Exercise 2-2 * — Grouped Bar Chart

Create a grouped bar chart comparing Q1 (months 1–3) and Q2 (months 4–6) total revenue for each region. You will need to aggregate the data first.

  • Two bars per region: one for Q1, one for Q2
  • Different colors for each quarter
  • Annotation above each Q2 bar showing the quarter-over-quarter percentage change from Q1
  • Legend
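
Hint: the aggregation step can be done by slicing rows and summing. The sketch below uses a small made-up table (not the chapter's figures) and also shows the usual trick of offsetting bar positions by half a bar width to place two bars side by side.

```python
import numpy as np
import pandas as pd

# Toy data: two regions over six months (values are invented)
toy = pd.DataFrame({
    "north": [10, 12, 14, 15, 18, 20],
    "south": [5, 5, 6, 6, 6, 8],
})

q1 = toy.iloc[0:3].sum()   # rows 0-2 correspond to months 1-3
q2 = toy.iloc[3:6].sum()   # rows 3-5 correspond to months 4-6
pct_change = ((q2 - q1) / q1 * 100).round(1)

# Grouped bars: shift each quarter half a bar width left or right of the tick
x = np.arange(len(q1))
width = 0.35
q1_positions = x - width / 2
q2_positions = x + width / 2

print(pct_change)
```

The two position arrays can then be passed as the x argument of two separate ax.bar() calls, one per quarter.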

Exercise 2-3 — Good Chart Checklist Audit

Take the line chart from Exercise 1-1 and systematically apply the good chart checklist from Section 14.3. For each item you are missing, add it. Document in comments which items you added and why each matters.


Exercise 2-4 * — pandas .plot() Method

Reproduce Exercise 1-1 (monthly revenue line chart) using monthly.set_index("month")["revenue"].plot() instead of the full matplotlib API. Then add an axis label and save the figure. Compare the amount of code required between the two approaches.
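
Hint: the pandas .plot() method returns the matplotlib Axes it drew on, so the usual Axes methods remain available afterwards. A minimal sketch on invented data:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

toy = pd.DataFrame({"month": ["Jan", "Feb", "Mar"], "units": [3, 5, 4]})

# Series.plot() draws a line chart and returns the Axes
ax = toy.set_index("month")["units"].plot(marker="o", figsize=(6, 3))
ax.set_ylabel("Units")      # ordinary Axes methods still work
fig = ax.get_figure()       # parent Figure, e.g. for fig.savefig(...)
plt.close(fig)
```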


Exercise 2-5 — Stacked Bar Chart

Create a stacked bar chart showing monthly revenue broken into north and south (stack south on top of north). Requirements:

  • Legend identifying each layer
  • Total value annotated on top of each stacked bar
  • Formatted y-axis
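
Hint: stacking in matplotlib is usually done with the bottom argument of ax.bar(). The sketch below uses invented values and generic layer names.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

labels = ["A", "B", "C"]          # toy categories
lower = np.array([4, 6, 5])
upper = np.array([2, 3, 1])
x = np.arange(len(labels))

fig, ax = plt.subplots(figsize=(5, 3))
ax.bar(x, lower, label="lower layer")
ax.bar(x, upper, bottom=lower, label="upper layer")  # starts where the lower bar ends
ax.set_xticks(x)
ax.set_xticklabels(labels)

totals = lower + upper
for xi, total in zip(x, totals):
    # va="bottom" places the text just above the top of the stack
    ax.text(xi, total, str(total), ha="center", va="bottom")
ax.legend()
plt.close(fig)
```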


Tier 3: Analysis (Derive Insights)

Exercise 3-1 * — Revenue and Margin on One Chart (Dual Series)

Plot monthly revenue AND monthly margin on the same chart:

  • Revenue as a solid blue line
  • Margin as a dashed green line
  • Both on the same y-axis (or try a secondary y-axis using ax.twinx())
  • Legend identifying both series

Discuss in a comment: in which months does margin percentage drop even as revenue rises?
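
Hint: if you try the secondary-axis variant, note that each Axes keeps its own legend entries, so a combined legend needs the line handles from both. A minimal sketch with invented series:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

x = [1, 2, 3]
fig, ax_left = plt.subplots(figsize=(6, 3))
(line_a,) = ax_left.plot(x, [100, 120, 150], color="tab:blue", label="series A")

ax_right = ax_left.twinx()  # second y-axis that shares the same x-axis
(line_b,) = ax_right.plot(x, [40, 38, 41], color="tab:green",
                          linestyle="--", label="series B")

# Collect handles from both Axes into a single legend
ax_left.legend(handles=[line_a, line_b], loc="upper left")
plt.close(fig)
```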


Exercise 3-2 — Annotated Line Chart

Plot the monthly revenue line chart and add:

  • A text annotation at the highest revenue month (December) saying "Best Month"
  • A text annotation at the lowest revenue month (February) saying "Lowest"
  • An arrow pointing from the annotation to the data point for each
  • A horizontal reference line at the mean revenue, labeled with its value
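
Hint: ax.annotate() takes the data point as xy, the text position as xytext, and draws an arrow when arrowprops is given. The sketch below uses invented values and only annotates the peak.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

values = [5, 3, 8, 6]                 # toy data
positions = list(range(len(values)))

fig, ax = plt.subplots(figsize=(6, 3))
ax.plot(positions, values, marker="o")

peak = values.index(max(values))      # position of the largest value
ax.annotate(
    "Peak",
    xy=(peak, values[peak]),                    # the point being annotated
    xytext=(peak - 1.2, values[peak] + 0.5),    # where the text sits
    arrowprops=dict(arrowstyle="->"),           # arrow from text to point
)

mean_value = sum(values) / len(values)
ax.axhline(mean_value, linestyle=":", color="gray",
           label=f"Mean: {mean_value:.1f}")
ax.legend()
plt.close(fig)
```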


Exercise 3-3 — Margin Scatter Plot

Using the regional DataFrame, create a scatter plot with:

  • X-axis: revenue
  • Y-axis: margin_pct
  • Bubble size: deal_count (scaled so bubbles are visible but not overlapping)
  • Each region labeled with ax.annotate()
  • A horizontal reference line for the company-average margin percentage
  • Title and axis labels

Which region has the best return (margin per dollar of revenue)?
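
Hint: the s argument of ax.scatter() is a marker area in points squared, so raw counts usually need rescaling. The sketch below scales invented counts so that the largest bubble has a chosen area; max_area is an arbitrary value picked for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

names = ["P", "Q", "R"]               # toy labels
x = np.array([1.0, 2.0, 3.0])
y = np.array([40.0, 55.0, 45.0])
counts = np.array([10, 40, 90])

max_area = 900                        # area of the largest bubble (points squared)
sizes = counts / counts.max() * max_area

fig, ax = plt.subplots(figsize=(6, 4))
ax.scatter(x, y, s=sizes, alpha=0.6)
for name, xi, yi in zip(names, x, y):
    # offset the label by a few points so it does not sit on the bubble center
    ax.annotate(name, (xi, yi), xytext=(8, 8), textcoords="offset points")

# Simple mean of the y values; a weighted average may suit some data better
ax.axhline(y.mean(), linestyle="--", color="gray")
ax.margins(0.25)                      # padding so large bubbles are not clipped
plt.close(fig)
```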


Exercise 3-4 * — 2×2 Dashboard

Create a 2×2 subplot figure containing:

  1. Monthly revenue line chart (Panel 1)
  2. Regional revenue bar chart (Panel 2)
  3. Order value histogram (Panel 3)
  4. Product revenue horizontal bar (Panel 4)

Requirements:

  • Figure size: 14 × 9 inches
  • Overall figure title
  • Each panel has its own title and axis labels
  • plt.tight_layout() applied
  • Saved as ex3_dashboard.png
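
Hint: plt.subplots(2, 2) returns a 2×2 array of Axes, which can be indexed by row and column or flattened with axes.flat. A skeleton with placeholder titles and no data:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, axes = plt.subplots(2, 2, figsize=(8, 5))

# axes[row, col] addresses a single panel; axes.flat iterates over all four
panel_titles = ["Panel 1", "Panel 2", "Panel 3", "Panel 4"]
for ax, title in zip(axes.flat, panel_titles):
    ax.set_title(title)

fig.suptitle("Overall figure title")
fig.tight_layout()   # applied after all titles and labels are set
plt.close(fig)
```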


Exercise 3-5 — Color by Value

Modify the regional bar chart so that bars are colored by margin percentage:

  • Green if margin_pct >= 45
  • Amber if 35 <= margin_pct < 45
  • Red if margin_pct < 35

Compute the colors from the data (not hardcoded by region name).
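
Hint: ax.bar() accepts a list of colors, one per bar, so the task reduces to mapping each value to a color name. The sketch below shows the pattern with a hypothetical helper band_color and invented thresholds that differ from the ones in the exercise.

```python
import pandas as pd


def band_color(value, low, high):
    """Return a color name depending on which band the value falls into."""
    if value >= high:
        return "green"
    if value >= low:
        return "orange"
    return "red"


scores = pd.Series([72, 55, 38])      # toy values
colors = [band_color(v, low=50, high=70) for v in scores]
print(colors)
# The list can then be passed as ax.bar(..., color=colors)
```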


Tier 4: Synthesis (Combine Techniques)

Exercise 4-1 * — Full Business Report Page

Build a 3-panel figure (3 rows, 1 column OR another layout of your choice) telling a coherent story:

  • Panel 1: The trend (line chart of monthly revenue)
  • Panel 2: The composition (stacked bar of north/south/west/east by month or quarter)
  • Panel 3: The distribution (histogram of order values)

Each panel should pass the good chart checklist. Add a figure-level title and a footer with "Source: Acme Corp CRM".


Exercise 4-2 — Year-over-Year Line Comparison

Simulate a "prior year" dataset by multiplying the monthly revenue by 0.85 and subtracting a small amount of random noise. Plot both the current year and prior year on the same line chart:

  • Solid line: current year (blue)
  • Dashed line: prior year (gray)
  • Shaded area between the two lines
  • Legend and title indicating the year-over-year growth
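
Hint: ax.fill_between() shades the region between two y-series that share the same x values. The sketch below uses invented numbers and an arbitrary noise range; the seed and the noise distribution are assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)        # seeded so the sketch is reproducible
current = np.array([10.0, 12.0, 11.0, 15.0])               # toy data
prior = current * 0.85 - rng.uniform(0, 0.5, size=current.size)
x = np.arange(current.size)

fig, ax = plt.subplots(figsize=(6, 3))
ax.plot(x, current, color="tab:blue", label="current")
ax.plot(x, prior, color="gray", linestyle="--", label="prior")
ax.fill_between(x, prior, current, alpha=0.15)  # shaded gap between the lines

growth_pct = (current.sum() / prior.sum() - 1) * 100
ax.set_title(f"Growth vs. prior period: {growth_pct:.1f}%")
ax.legend()
plt.close(fig)
```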


Exercise 4-3 — Dynamic Data Labels

Write a function add_bar_labels(ax, bars, format_str="${}K") that:

  • Accepts an Axes object and a container of bars
  • Adds a label to each bar formatted with the given format string
  • Handles both vertical and horizontal bars automatically (label above a vertical bar, at the right end of a horizontal bar)

Test it on the regional revenue bar chart and the product horizontal bar chart.
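
Hint: the container returned by ax.bar() and ax.barh() records its orientation and bar values, and ax.bar_label() can place labels for either orientation. The sketch below assumes matplotlib 3.4 or newer, where bar_label, BarContainer.orientation, and BarContainer.datavalues are available; on older versions you would inspect each bar's get_height() and get_width() instead.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, (ax_v, ax_h) = plt.subplots(1, 2, figsize=(8, 3))
vbars = ax_v.bar(["a", "b"], [30, 50])     # toy vertical bars
hbars = ax_h.barh(["a", "b"], [30, 50])    # toy horizontal bars

for ax, bars in ((ax_v, vbars), (ax_h, hbars)):
    # datavalues holds heights for vertical bars and widths for horizontal ones
    labels = ["${:.0f}K".format(v) for v in bars.datavalues]
    ax.bar_label(bars, labels=labels, padding=3)
    print(bars.orientation, labels)

plt.close(fig)
```

Writing your own version without bar_label is still a worthwhile exercise, since it shows how the text position depends on orientation.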


Exercise 4-4 — Saving at Multiple Resolutions

Take any one of your charts and save it at three different DPI settings:

  • 72 DPI (screen/draft)
  • 150 DPI (digital delivery)
  • 300 DPI (print-ready)

Use os.path.getsize() to check the file sizes. Document in a comment: how does DPI affect file size? Is the visual difference visible at normal screen resolution?
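
Hint: a loop over DPI values keeps the comparison tidy. The sketch below saves a toy chart into a temporary directory; the file naming pattern is an arbitrary choice.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(4, 3))
ax.plot([0, 1, 2], [1, 3, 2])         # toy chart

out_dir = tempfile.mkdtemp()          # throwaway folder for the demo files
sizes = {}
for dpi in (72, 150, 300):
    path = os.path.join(out_dir, f"chart_{dpi}dpi.png")
    fig.savefig(path, dpi=dpi)
    sizes[dpi] = os.path.getsize(path)   # size in bytes
    print(f"{dpi} DPI: {sizes[dpi]} bytes")

plt.close(fig)
```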


Tier 5: Extension (Open-Ended Challenges)

Exercise 5-1 — Automated Report Generator

Write a function generate_monthly_report(df, output_path) that:

  • Accepts a monthly DataFrame in the same shape as the shared data
  • Creates a 2×2 dashboard (your choice of charts)
  • Saves it to output_path
  • Is parameterized enough to work for any monthly dataset with the same column structure, whether it covers 12 months or fewer

Test it by calling it twice: once on monthly (Jan–Dec) and once on a sliced version containing only Jan–Jun.


Exercise 5-2 — Animated Line Chart

Using matplotlib.animation.FuncAnimation, create a line chart that "draws" the monthly revenue trend one month at a time, simulating the passage of time. Save as a GIF or MP4. (Requires ffmpeg for MP4 export.)

This is a challenging exercise — consult the matplotlib animation documentation.
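
Hint: the core of a FuncAnimation is an update function that receives a frame number and changes the artists. The sketch below is a starting point on invented data, not a full solution; it writes a GIF with PillowWriter (which relies on the Pillow package) into a temporary folder.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation, PillowWriter

values = [3, 5, 4, 6]                 # toy data

fig, ax = plt.subplots(figsize=(4, 3))
(line,) = ax.plot([], [], marker="o")
ax.set_xlim(0, len(values) - 1)       # fix the limits so the view does not jump
ax.set_ylim(0, max(values) + 1)


def update(frame):
    """Show the first frame + 1 points; frame runs from 0 to len(values) - 1."""
    line.set_data(list(range(frame + 1)), values[: frame + 1])
    return (line,)


anim = FuncAnimation(fig, update, frames=len(values), interval=400, repeat=False)

gif_path = os.path.join(tempfile.mkdtemp(), "demo.gif")
anim.save(gif_path, writer=PillowWriter(fps=2))
plt.close(fig)
```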


Exercise 5-3 — Chart Style System

Create a Python module acme_style.py that:

  1. Sets plt.rcParams for font, font size, and figure DPI
  2. Defines a color palette dictionary with at least 6 named colors
  3. Provides a function apply_acme_style(ax) that removes top/right spines and adds a light horizontal grid
  4. Provides a function currency_formatter(ax, axis="y") that applies dollar formatting

Import this module in a new script and use it to style three charts. Show that the styling is consistent across all three.
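
Hint: the building blocks are plt.rcParams.update() for global defaults and a small function that adjusts one Axes. The sketch below is a partial starting point; the names DEMO_PALETTE and apply_demo_style and all the specific values are placeholders, not the chapter's required design.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Global defaults applied to every figure created after this point
plt.rcParams.update({
    "font.family": "sans-serif",
    "font.size": 11,
    "figure.dpi": 100,
})

# Placeholder palette; the exercise asks for at least six named colors
DEMO_PALETTE = {"primary": "#1f77b4", "muted": "#7f7f7f"}


def apply_demo_style(ax):
    """Hide the top/right spines and add a light horizontal grid."""
    for side in ("top", "right"):
        ax.spines[side].set_visible(False)
    ax.grid(axis="y", alpha=0.3)
    ax.set_axisbelow(True)   # draw the grid behind the data
    return ax


fig, ax = plt.subplots()
ax.bar(["a", "b"], [3, 5], color=DEMO_PALETTE["primary"])
apply_demo_style(ax)
plt.close(fig)
```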


Exercise 5-4 — Pie Chart Defense

Create both a pie chart and a horizontal bar chart showing the same data: revenue by customer tier (Gold: $485,000, Silver: $298,000, Bronze: $127,000).

Write 4–6 sentences evaluating:

  • In what context, if any, is the pie chart the better choice?
  • What information does the bar chart communicate that the pie chart obscures?
  • Does adding data labels to the pie chart close the gap? Why or why not?


Exercise 5-5 — Custom Annotation System

Write a function highlight_outliers(ax, x_data, y_data, labels, std_threshold=1.5) that:

  • Computes mean and standard deviation of y_data
  • Identifies points more than std_threshold standard deviations from the mean
  • Annotates those points on the chart with their label and value
  • Colors the outlier points red

Apply this to the monthly revenue line chart to automatically highlight any months that were unusually high or low.
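
Hint: the detection step is a boolean mask over standardized values. The sketch below shows it on invented data and uses the population standard deviation (NumPy's default, ddof=0); whether to use the sample standard deviation instead is a design decision left to you.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

y = np.array([10, 11, 9, 10, 25, 10, 11, 0], dtype=float)   # toy data
x = np.arange(y.size)

mean, std = y.mean(), y.std()         # population std; use y.std(ddof=1) for sample
z_scores = (y - mean) / std
is_outlier = np.abs(z_scores) > 1.5   # True where a point is far from the mean

fig, ax = plt.subplots(figsize=(6, 3))
ax.plot(x, y, marker="o")
# zorder=3 draws the red points on top of the line markers
ax.scatter(x[is_outlier], y[is_outlier], color="red", zorder=3)
for xi, yi in zip(x[is_outlier], y[is_outlier]):
    ax.annotate(f"{yi:.0f}", (xi, yi), xytext=(6, 6), textcoords="offset points")
plt.close(fig)

print(np.flatnonzero(is_outlier))
```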


Answer Guidance

For starred exercises (*), verify your charts against these criteria:

  • Does the chart answer the stated business question at a glance?
  • Are title, axis labels, and legend present and descriptive?
  • Does the y-axis start at zero for bar charts?
  • Are colors used purposefully (not just for decoration)?
  • Does plt.tight_layout() prevent label clipping?
  • Is the chart saved at >= 150 DPI?

For Tier 4 and 5 exercises, there is no single correct answer. Evaluate your work against the good chart checklist and consider asking a colleague to interpret your chart without explanation — if they can answer the chart's question unaided, the design is working.