Exercises: Altair

DataField.Dev


These exercises assume you have run `pip install altair vega_datasets`. All exercises use `import altair as alt` and `from vega_datasets import data`.

Part A: Conceptual (6 problems)

A.1 ★☆☆ | Recall

Name the four data-type shorthand codes in Altair and describe what each means.

Guidance

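**Q (quantitative)**: numeric columns with meaningful magnitudes and ratios. **N (nominal)**: unordered categories. **O (ordinal)**: ordered categories. **T (temporal)**: dates or datetimes. Altair can infer a type from a pandas DataFrame column. Even so, specifying the type explicitly (e.g., `x="year:T"`) is best practice, because the type determines the scale, axis, and legend Altair chooses. Explicit types are required when the data is passed as a URL rather than a DataFrame. (Altair also accepts a fifth code, `G`, for GeoJSON features in maps.)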

A.2 ★☆☆ | Recall

What is the Grammar of Graphics, and which books introduced and popularized it?

Guidance

The Grammar of Graphics is Leland Wilkinson's theoretical framework for decomposing any statistical chart into components: data, marks, aesthetic mappings, scales, transformations, coordinates, and facets. Wilkinson introduced it in his 1999 book *The Grammar of Graphics*. Hadley Wickham popularized it through the R package ggplot2 (2005) and his book *ggplot2: Elegant Graphics for Data Analysis* (2009). Altair implements the grammar through the Vega-Lite JSON specification.

A.3 ★★☆ | Understand

What is Vega-Lite, and how does Altair relate to it?

Guidance

Vega-Lite is a JSON-based visualization grammar developed at the University of Washington Interactive Data Lab. A Vega-Lite spec describes a statistical graphic as data + marks + encodings + transforms + composition. The Vega-Lite JavaScript library compiles that spec into a lower-level Vega spec, which Vega then renders. Altair is the Python binding: it generates Vega-Lite JSON specs from Python code. Altair does not draw anything itself. It produces specs that these JavaScript libraries render, typically in a browser or notebook.

A.4 ★★☆ | Understand

Explain the difference between mark-level styling and encoding-level mapping in Altair.

Guidance

**Mark-level** styling applies uniformly to all marks. `mark_point(size=100, color="red")` makes every point large and red. **Encoding-level** mapping varies based on data. `encode(size="Population:Q", color="Region:N")` varies point size by population and color by region. Use mark-level for properties you want constant; use encoding-level for properties that should reflect the data.

A.5 ★★☆ | Analyze

Describe the chapter's threshold concept ("declaration over instruction") in your own words.

Guidance

In imperative libraries like matplotlib, you instruct the library step by step: create an axes, plot this, set that label. In declarative libraries like Altair, you describe what the chart should look like — data + marks + encodings — and the library figures out how to produce it. The shift mirrors the SQL vs. procedural-programming distinction: you specify what, not how. Once internalized, it makes complex visualizations composable from simple pieces.

A.6 ★★★ | Evaluate

When would you choose Altair over Plotly or matplotlib? Give a specific scenario.

Guidance

Altair is best for **linked-view exploration**: multiple charts where brushing one filters the others. The selection + `transform_filter` pattern expresses this in a few declarative lines. Plotly typically needs Dash callbacks or `FigureWidget` event handlers for the same behavior, and matplotlib needs custom event-handling code. Example scenario: a dashboard with a scatter plot, a histogram, and a map, where the user can brush any one chart and see the other two filter in real time. In Altair this takes a few dozen lines; in matplotlib it would take far more custom code.

Part B: Applied (10 problems)

B.1 ★☆☆ | Apply

Load the cars dataset and create a scatter plot of Horsepower vs. Miles_per_Gallon, colored by Origin.

Guidance

import altair as alt
from vega_datasets import data

cars = data.cars()
alt.Chart(cars).mark_point().encode(
    x="Horsepower:Q",
    y="Miles_per_Gallon:Q",
    color="Origin:N",
)

B.2 ★☆☆ | Apply

Extend B.1 by adding size="Cylinders:O" and a tooltip showing the car name, year, and MPG.

Guidance

alt.Chart(cars).mark_point().encode(
    x="Horsepower:Q",
    y="Miles_per_Gallon:Q",
    color="Origin:N",
    size="Cylinders:O",
    tooltip=["Name:N", alt.Tooltip("Year:T", format="%Y"), "Miles_per_Gallon:Q"],  # show the year only, not a full timestamp
)

B.3 ★★☆ | Apply

Create a line chart of the stocks dataset (data.stocks()) showing price over date, colored by symbol.

Guidance

stocks = data.stocks()
alt.Chart(stocks).mark_line().encode(
    x="date:T",
    y="price:Q",
    color="symbol:N",
)

B.4 ★★☆ | Apply

Build a layered chart: a scatter of Horsepower vs. Miles_per_Gallon with a line showing the mean MPG by horsepower. Use the + operator.

Guidance

points = alt.Chart(cars).mark_point().encode(x="Horsepower:Q", y="Miles_per_Gallon:Q")
# Aggregated layer: mean MPG among all cars that share each Horsepower value
line = alt.Chart(cars).mark_line(color="red").encode(
    x="Horsepower:Q", y="mean(Miles_per_Gallon):Q"
)
points + line

B.5 ★★☆ | Apply

Build an interval brush on a scatter plot and use it to filter a linked bar chart of car counts by Origin.

Guidance

brush = alt.selection_interval()  # drag a rectangle on the scatter to define the brush

scatter = alt.Chart(cars).mark_point().encode(
    x="Horsepower:Q",
    y="Miles_per_Gallon:Q",
    color=alt.condition(brush, "Origin:N", alt.value("lightgray")),
).add_params(brush)

hist = alt.Chart(cars).mark_bar().encode(
    x="count()",
    y="Origin:N",
    color="Origin:N",
).transform_filter(brush)  # keep only the rows inside the brush

scatter | hist

B.6 ★★☆ | Apply

Facet a scatter plot by Origin using .facet(column="Origin:N").

Guidance

alt.Chart(cars).mark_point().encode(
    x="Horsepower:Q",
    y="Miles_per_Gallon:Q",
).facet(column="Origin:N")

This produces three panels side by side, one per origin.

B.7 ★★★ | Apply

Build a pair plot using .repeat() with three numeric columns from the cars dataset.

Guidance

alt.Chart(cars).mark_point().encode(
    x=alt.X(alt.repeat("column"), type="quantitative"),
    y=alt.Y(alt.repeat("row"), type="quantitative"),
    color="Origin:N",
).repeat(
    row=["Horsepower", "Acceleration", "Miles_per_Gallon"],
    column=["Horsepower", "Acceleration", "Miles_per_Gallon"],
)

B.8 ★★☆ | Apply

Use transform_filter with a string predicate to show only cars from years 1975 and later.

Guidance

alt.Chart(cars).mark_point().encode(
    x="Horsepower:Q",
    y="Miles_per_Gallon:Q",
    color="Origin:N",
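).transform_filter("toDate(datum.Year) >= datetime(1975, 0, 1)")

`datum.Year` refers to the current row's Year column. When Altair serializes a pandas datetime column, it becomes an ISO-8601 string, so `toDate(...)` converts it back to a date before the comparison. The `datetime(1975, 0, 1)` expression builds January 1, 1975. Vega's `datetime()` counts months from 0, so `0` means January.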
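Vega's `datetime()` numbers months from 0, while Python's `datetime` and pandas `Timestamp` number them from 1. The same cutoff is therefore written differently in each. The minimal sketch below applies the equivalent filter in pandas to a made-up stand-in for the cars `Year` column. This is a handy way to check how many rows the Altair filter should keep.

```python
import pandas as pd

# Made-up stand-in for the cars dataset's Year column (illustrative only).
cars_like = pd.DataFrame({
    "Year": pd.to_datetime(["1970-01-01", "1974-01-01", "1975-01-01", "1982-01-01"]),
})

# pandas months start at 1, so January 1, 1975 is Timestamp(1975, 1, 1);
# the Vega expression datetime(1975, 0, 1) refers to the same date.
cutoff = pd.Timestamp(1975, 1, 1)
recent = cars_like[cars_like["Year"] >= cutoff]
print(len(recent))
```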

B.9 ★★☆ | Apply

Create a new column efficiency_tier (Efficient if MPG > 25, else Thirsty) using transform_calculate, then bar-chart the counts.

Guidance

alt.Chart(cars).mark_bar().encode(
    x="efficiency_tier:N",
    y="count():Q",
).transform_calculate(
    efficiency_tier="datum.Miles_per_Gallon > 25 ? 'Efficient' : 'Thirsty'"
)

B.10 ★★★ | Create

Build a three-panel linked-view climate dashboard: a temperature line chart (with brush), a CO2 line chart filtered by the brush, and a CO2-vs-temperature scatter colored by era filtered by the same brush.

Guidance

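import altair as alt

# Assumes `climate` is the chapter's pandas DataFrame with columns year,
# temperature_anomaly, co2_ppm, and era. If `year` holds plain integers, convert it
# first (e.g. pd.to_datetime(climate["year"].astype(str), format="%Y")), because
# ":T" would otherwise treat the integers as milliseconds since 1970.
brush = alt.selection_interval(encodings=["x"])  # brush along the time axis only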

temp = alt.Chart(climate).mark_line(color="red").encode(
    x=alt.X("year:T"),
    y="temperature_anomaly:Q",
).properties(width=600, height=200).add_params(brush)

co2 = alt.Chart(climate).mark_line(color="blue").encode(
    x=alt.X("year:T"),
    y="co2_ppm:Q",
).properties(width=600, height=200).transform_filter(brush)

scatter = alt.Chart(climate).mark_point().encode(
    x="co2_ppm:Q",
    y="temperature_anomaly:Q",
    color="era:N",
).properties(width=600, height=300).transform_filter(brush)

temp & co2 & scatter

Part C: Synthesis (4 problems)

C.1 ★★★ | Analyze

Take a complex chart from earlier in this book (e.g., Chapter 18's relational + categorical climate plot) and express it in Altair's grammar. Decompose it into data, marks, encodings, transforms, and composition.

Guidance

Example decomposition: *a scatter of CO2 vs. temperature with a regression line, colored by era, faceted by era*. In Altair: data = climate; marks = point + line; encodings = x=co2_ppm:Q, y=temperature_anomaly:Q, color=era:N; transforms = a regression fit via `transform_regression`; composition = `(points + regression_line).facet(column="era:N")`. The grammar maps cleanly onto the library: most standard statistical charts built in matplotlib have an Altair equivalent, often a more concise one.

C.2 ★★★ | Evaluate

You build an Altair chart with 20,000 rows and get a MaxRowsError. List three ways to fix it, and explain when each is appropriate.

Guidance

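Altair's default data transformer embeds every row in the chart spec and raises `MaxRowsError` above 5,000 rows.

(1) `alt.data_transformers.disable_max_rows()` removes the cap for the session. This is fine for quick exploration. Use it sparingly in production, though, because all 20,000 rows are embedded in the JSON spec, which becomes large and slow to load.

(2) Enable VegaFusion with `alt.data_transformers.enable("vegafusion")` (requires the `vegafusion` package). VegaFusion evaluates the chart's transforms in Python using a Rust engine built on Apache Arrow, so only the transformed result is sent to the browser. It is most effective when the chart aggregates (histograms, binned heatmaps, grouped means). A raw scatter of every row still has to ship every row.

(3) Pre-aggregate or sample the data in pandas before plotting, then plot the summary. This is best when the raw rows are not the unit of visualization (e.g., you want the mean per group, not individual points).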

C.3 ★★★ | Create

Design an Altair chart with a clickable legend that dims all non-selected categories. Use bind="legend" on a point selection.

Guidance

select = alt.selection_point(fields=["Origin"], bind="legend")

alt.Chart(cars).mark_point().encode(
    x="Horsepower:Q",
    y="Miles_per_Gallon:Q",
    color="Origin:N",
    opacity=alt.condition(select, alt.value(1.0), alt.value(0.1)),
).add_params(select)

Clicking a legend entry sets the selection to that origin; the opacity encoding then dims all other points.

C.4 ★★★ | Evaluate

The chapter argues that Altair's compositional operators (+, |, &, .facet(), .repeat()) make complex charts simpler to express than in matplotlib or Plotly. Test this claim: take a complex chart from an online gallery (NYT, FiveThirtyEight, The Pudding) and outline how you would build it in Altair vs. matplotlib. Which is shorter?

Guidance

Most linked-view charts are dramatically shorter in Altair. A scrollytelling chart (where content changes as the reader scrolls) is *not* shorter in Altair, because Altair has no native scrollytelling primitive. A chart with dense typographic annotations (like a New Yorker chart) is not shorter either. The grammar does not abstract hand-positioned labels, so each set of annotations becomes its own `mark_text` layer with explicit coordinates and offsets. The Altair advantage is strongest for linked views, composition, and declarative interactivity; where those dominate, Altair wins. Where precise typographic control or custom marks dominate, matplotlib or D3 wins.

These exercises cover Altair's essential features: the encoding pattern, composition operators, selections, and transforms. Chapter 23 moves from interactive libraries in general to geospatial visualization specifically.