Exercises: Plotly Express

These exercises require `pip install plotly pandas kaleido`. All exercises assume `import plotly.express as px`, `import plotly.io as pio`, and `import pandas as pd`. Plotly's built-in datasets are accessed via `px.data.*` (e.g., `px.data.gapminder()`, `px.data.iris()`, `px.data.tips()`).


Part A: Conceptual (6 problems)

A.1 ★☆☆ | Recall

Name the three main components of a Plotly Figure and describe what each contains.

Guidance **Data (traces)**: a list of visual layers (Scatter, Bar, Heatmap, etc.), each with its own data arrays and style properties. **Layout**: everything that is not a trace — title, axis labels, legend, background, margins, annotations, interactive controls. **Frames** (optional): for animation, a list of data snapshots that the chart transitions through.

A.2 ★☆☆ | Recall

What is the difference between Plotly Express (px) and Plotly Graph Objects (go)? When should you use each?

Guidance Plotly Express is a high-level wrapper that produces common chart types with a single function call — similar in philosophy to seaborn. Graph Objects is the lower-level API where you construct Figure and trace objects manually. Use Plotly Express for most standard charts (faster to write, consistent API); use Graph Objects when you need mixed trace types, custom subplots, interactive buttons/dropdowns, or property-level control beyond what Express exposes.

A.3 ★★☆ | Understand

Explain why Plotly produces large output files compared to matplotlib. What strategies reduce the file size?

Guidance A Plotly HTML file includes the full plotly.js library (several MB) plus the figure's JSON spec. Matplotlib produces a raster PNG or vector SVG without any runtime dependency. **Reduction strategies**: (1) `include_plotlyjs="cdn"` references plotly.js from a CDN instead of embedding it; (2) `include_plotlyjs=False` omits the library entirely (requires it to be loaded elsewhere); (3) use `pio.write_image` + kaleido to produce a static PNG/SVG for archiving or print; (4) pre-aggregate large datasets before plotting to shrink the JSON payload.

A.4 ★★☆ | Understand

Explain how facet_col differs from animation_frame. When should you use each?

Guidance `facet_col` produces small multiples — one panel per value of the faceting variable, all visible simultaneously. `animation_frame` produces an animated chart — one "frame" per value, with a play button and slider. Use `facet_col` when the reader needs to compare all states at once (side-by-side); use `animation_frame` when time (or another ordered dimension) is itself the story and motion communicates the change. Faceting scales poorly beyond ~12 panels; animation works better for many frames.

A.5 ★★☆ | Analyze

The chapter argues that "interactive is not a gimmick." Give a specific example where interactivity changes what a chart can communicate.

Guidance Example: a time-series chart of global temperature anomalies from 1880 to 2024. A static version shows a line. An interactive version with a range slider lets the reader isolate any period (e.g., 1970-2024 to see the modern acceleration) without needing several pre-made zoomed views, and a hover tooltip gives the exact value for any year. Together, these features let one interactive chart answer questions that would otherwise require five or six static charts. The chart implements Shneiderman's mantra (overview first, zoom and filter, then details on demand) in a single figure.

A.6 ★★★ | Evaluate

Under what circumstances would you prefer matplotlib or seaborn over Plotly Express for a given task?

Guidance Prefer matplotlib/seaborn when: (1) the output is for print (PDF, publication); (2) you need fine typographic control; (3) you need rich statistical overlays (KDE bandwidth control, confidence bands, bootstrap intervals); (4) file size or accessibility matters; (5) the output must be archival-stable across versions; (6) the audience reads in an environment where interactivity is useless (slides with frozen screenshots, printed reports). Prefer Plotly Express when the reader will interact with the chart — dashboards, web pages, exploratory notebooks shared via HTML.

Part B: Applied (10 problems)

B.1 ★☆☆ | Apply

Load the gapminder dataset and create a scatter plot of GDP per capita vs. life expectancy for the year 2007, colored by continent.

Guidance
import plotly.express as px
gapminder = px.data.gapminder()
fig = px.scatter(
    gapminder.query("year == 2007"),
    x="gdpPercap",
    y="lifeExp",
    color="continent",
    log_x=True,
)
fig.show()

B.2 ★☆☆ | Apply

Extend B.1 by adding bubble size (population), hover name (country), and a title.

Guidance
fig = px.scatter(
    gapminder.query("year == 2007"),
    x="gdpPercap",
    y="lifeExp",
    color="continent",
    size="pop",
    hover_name="country",
    log_x=True,
    size_max=60,
    title="Life Expectancy vs. GDP per Capita, 2007",
)
fig.show()

B.3 ★★☆ | Apply

Load the tips dataset (px.data.tips()) and create a faceted box plot of total_bill by day, with columns for smoker and rows for time (Lunch/Dinner).

Guidance
tips = px.data.tips()
fig = px.box(
    tips,
    x="day",
    y="total_bill",
    facet_col="smoker",
    facet_row="time",
    color="day",
    points="all",
)
fig.show()
The `points="all"` parameter shows individual data points alongside the box, which is the strip+box alternative to dynamite plots from Chapter 18.

B.4 ★★☆ | Apply

Create an animated bubble chart of the full gapminder dataset with animation_frame="year", fixed axis ranges, and animation_group="country".

Guidance
fig = px.scatter(
    gapminder,
    x="gdpPercap",
    y="lifeExp",
    color="continent",
    size="pop",
    animation_frame="year",
    animation_group="country",
    hover_name="country",
    log_x=True,
    size_max=60,
    range_x=[100, 150000],  # Kuwait's GDP per capita exceeds 100,000 in the 1950s
    range_y=[20, 90],
)
fig.show()
The fixed ranges prevent disorientation as the data shifts. The `animation_group` keeps each country as a single object moving across frames.

B.5 ★★☆ | Apply

Build a time-series line chart with a range slider for the US-only subset of gapminder's lifeExp.

Guidance
us = gapminder.query("country == 'United States'")
fig = px.line(us, x="year", y="lifeExp", markers=True, title="US Life Expectancy, 1952-2007")
fig.update_layout(xaxis_rangeslider_visible=True)
fig.show()

B.6 ★★☆ | Apply

Customize the hover on B.1's chart: show country name prominently, with continent, population (comma-separated), and GDP (no decimals).

Guidance
fig = px.scatter(
    gapminder.query("year == 2007"),
    x="gdpPercap",
    y="lifeExp",
    color="continent",
    log_x=True,
    hover_name="country",
    hover_data={
        "continent": True,
        "pop": ":,",
        "gdpPercap": ":.0f",
        "lifeExp": ":.1f",
    },
)
fig.show()

B.7 ★★★ | Apply

Load the iris dataset with px.data.iris(). Create a scatter matrix (px.scatter_matrix) of the four numeric columns, colored by species.

Guidance
iris = px.data.iris()
fig = px.scatter_matrix(
    iris,
    dimensions=["sepal_width", "sepal_length", "petal_width", "petal_length"],
    color="species",
    title="Iris Scatter Matrix",
)
fig.show()
This is Plotly's equivalent of seaborn's pair plot.

B.8 ★★☆ | Apply

Apply the "simple_white" template to a scatter plot of your choice and compare the appearance to the default template.

Guidance
fig = px.scatter(
    gapminder.query("year == 2007"),
    x="gdpPercap",
    y="lifeExp",
    color="continent",
    log_x=True,
    template="simple_white",
)
fig.show()
Simple White has a clean white background, thin axes, no gridlines, and minimal chrome. It is a good default for publication or screenshot-to-slides use.

B.9 ★★★ | Apply

Export the chart from B.2 to (a) an interactive HTML file and (b) a static PNG at 2× scale.

Guidance
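import plotly.io as pio

# `fig` is the figure built in B.2.

# Interactive HTML (loads plotly.js from a CDN, so viewing it needs internet access)
pio.write_html(fig, "gapminder_2007.html", include_plotlyjs="cdn")

# Static PNG (requires kaleido: pip install kaleido).
# scale=2 doubles the pixel dimensions: a 1200 x 800 layout becomes a 2400 x 1600 PNG.
pio.write_image(fig, "gapminder_2007.png", width=1200, height=800, scale=2)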

B.10 ★★★ | Create

Build a treemap of the gapminder 2007 data showing population hierarchy: continent → country. Size cells by population and color by life expectancy.

Guidance
fig = px.treemap(
    gapminder.query("year == 2007"),
    path=[px.Constant("World"), "continent", "country"],
    values="pop",
    color="lifeExp",
    color_continuous_scale="RdYlGn",
    title="World Population Hierarchy, 2007",
)
fig.show()
The `px.Constant("World")` adds a root node so the whole tree has a single parent. The color scale encodes life expectancy (green = high, red = low).

Part C: Synthesis (4 problems)

C.1 ★★★ | Analyze

Take the climate dataset used throughout this textbook. Build an interactive Plotly Express chart of temperature anomaly over time with (1) hover showing year, anomaly, CO2, and era, and (2) a range slider. Compare the result to the matplotlib version from Chapter 12.

Guidance The Plotly version adds per-point inspection (hover) and time-range filtering (slider) that the matplotlib version cannot replicate without additional code. The matplotlib version has better typographic polish. For a shared notebook or dashboard, the Plotly version is more useful; for a printed paper, the matplotlib version is more appropriate. Both are valid for their contexts.

C.2 ★★★ | Evaluate

A colleague sends you a Plotly chart with 200,000 data points that takes 15 seconds to render in the browser. Suggest two fixes.

Guidance Fix 1: Render with WebGL instead of SVG. In Graph Objects, replace `go.Scatter` with `go.Scattergl`. In Plotly Express, pass `render_mode="webgl"`; the default `"auto"` already switches to WebGL for large inputs. The WebGL-accelerated trace handles orders of magnitude more points than SVG. Fix 2: Pre-aggregate the data before plotting. Options include binning (histograms, density heatmaps), sampling (a random or stratified subset), or summarization (means per category). For 200k points, switching to WebGL alone often solves the problem. For much larger datasets (millions of points), aggregation is usually necessary.

C.3 ★★★ | Create

Build a Plotly Express animated bubble chart using the climate dataset: CO2 on x, temperature on y, sea level as bubble size, era as color, animated by decade. Fix the axis ranges to prevent rescaling.

Guidance
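# `climate` is the textbook's climate DataFrame (columns used: year, co2_ppm,
# temperature_anomaly, sea_level_mm, era). Marker sizes must be non-negative,
# so shift sea_level_mm first if it is measured relative to a baseline.
climate["decade"] = (climate["year"] // 10) * 10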
fig = px.scatter(
    climate,
    x="co2_ppm",
    y="temperature_anomaly",
    size="sea_level_mm",
    color="era",
    animation_frame="decade",
    animation_group="year",
    hover_name="year",
    range_x=[270, 430],
    range_y=[-0.5, 1.5],
    size_max=25,
    title="Climate Indicators Across Decades",
)
fig.show()

C.4 ★★★ | Evaluate

The chapter argues that "interactive is not a gimmick" — that a good interactive chart can replace a static dashboard. Find a dashboard (online or in a report) that uses three or more static charts, and design a single Plotly Express chart that captures the same information. Describe what affordances (hover, zoom, filter) substitute for the static panels.

Guidance A typical example: an analytics dashboard with (1) a line chart of traffic over time, (2) a bar chart breaking down traffic by source, and (3) a table of top pages. The Plotly Express replacement might be a single line chart with one line per traffic source (`color="source"`, replacing the bar chart), a range slider (replacing multiple zoomed views), and a rich hover showing the top page per day (replacing the table). One chart, three levels of information. The substitution is not always cleaner, but when it is, the interactive single-chart version reduces visual clutter and keeps the reader in one place.

These exercises cover Plotly Express's essential chart types and interactive features. Chapter 21 takes you deeper into the Graph Objects API for custom interactive controls and complex layouts.