Learning Objectives

  • Explain the Grammar of Graphics as a theoretical framework: data, marks, encodings, scales, transforms, selections, composition
  • Create Altair charts using the alt.Chart(data).mark_*().encode() pattern
  • Apply encoding channels: x, y, color, size, shape, opacity, row, column, and their data types (Q, N, O, T)
  • Add interactivity with selections: selection_point, selection_interval, and conditional encoding
  • Compose charts using layering (+), concatenation (|, &), faceting, and repetition
  • Transform data within the chart specification: filter, calculate, aggregate, fold, window
  • Compare Altair's declarative approach with matplotlib's imperative and Plotly's semi-declarative approaches

Chapter 22: Altair — Declarative Visualization and the Grammar of Graphics

"A statistical graphic is a mapping from data to a visual representation — no more, no less." — Leland Wilkinson, The Grammar of Graphics (1999)


22.1 A Different Philosophy

Every library in this textbook so far has treated a chart as something you assemble piece by piece. In matplotlib, you build an Axes and call plot methods on it. In seaborn, you call a function that wraps matplotlib. In Plotly Express, you call a function that builds traces. Each library has its own conventions and its own idioms, but the underlying model is the same: you are producing a chart by telling the library which elements to draw.

Altair takes a fundamentally different approach. An Altair chart is not a sequence of instructions; it is a specification. You do not tell Altair to draw anything. You describe what the visualization should look like — "this data, mapped as points, with x from this column and y from that column, colored by this third column" — and Altair (or more precisely, the Vega-Lite renderer underneath Altair) figures out how to make it happen. The Python code produces a JSON specification, which is then rendered by a JavaScript library that understands Vega-Lite. You never specify drawing operations; you specify mappings.

This philosophy comes from the Grammar of Graphics, a theoretical framework for visualization that was articulated by Leland Wilkinson in his 1999 book of the same name. Wilkinson argued that every statistical graphic can be decomposed into a small set of components: data, aesthetic mappings (encodings), scales, geometric objects (marks), statistical transformations, and coordinate systems. Once you have these building blocks, any chart is a composition of them. A scatter plot is data + point marks + x/y mappings. A histogram is data + bar marks + x mapping + a binning transformation. A faceted line chart is data + line marks + x/y/color/facet mappings. The grammar is not a chart library; it is a theory of what charts are.

Wilkinson's book was academic and dense. The ideas spread widely through the work of Hadley Wickham, who used the Grammar of Graphics as the foundation for ggplot2, the most influential visualization library in the R community. ggplot2 was first released in 2007 and became the de facto standard for statistical visualization in R, displacing R's built-in plotting functions for much everyday work. Wickham wrote several books and papers articulating the Grammar of Graphics for a practitioner audience, and ggplot2's success gave the grammar popular recognition.

Python did not have a ggplot2 equivalent for years. Several attempts were made — there was a ggplot Python package that tried to mimic the R version — but none of them gained traction, partly because they were tied to matplotlib's rendering model and lost the grammatical purity. The missing piece was a rendering backend that could take a pure grammar-of-graphics specification and render it without being constrained by matplotlib's conventions.

That backend arrived with Vega-Lite, developed at the University of Washington Interactive Data Lab starting in 2016. Vega-Lite is a JSON-based specification for statistical graphics that is essentially the grammar of graphics made executable. A Vega-Lite spec is a JSON document that describes the data, marks, encodings, transforms, and compositions; a JavaScript library reads the spec and produces the interactive chart. Vega-Lite is grammar-of-graphics through and through: every visualization is a composition of the same primitives.

Altair is the Python binding for Vega-Lite. It was created by Jake VanderPlas and Brian Granger in 2016-2017 as a way to bring Vega-Lite's grammatical approach into Python. Altair does not do any rendering itself — it generates Vega-Lite JSON specs, and the specs are rendered by the Vega-Lite JavaScript library in the browser (or in Jupyter, via an extension). The separation is similar to Plotly's (Python describes, JavaScript renders), but the philosophical approach is different. Plotly uses its own JSON schema that mirrors its trace-based API. Altair uses Vega-Lite's JSON schema that implements the grammar of graphics directly.

The chapter's threshold concept is that Altair is declarative: you describe the chart, not how to draw it. Once you internalize this, complex visualizations become compositions of simple pieces, the same way a complex SQL query is a composition of simple clauses. Layering two charts is a + operator. Concatenating two charts horizontally is a | operator. Faceting is a .facet() call. Selections and conditional encoding add interactivity with a single clause. The compositionality is the point: each piece is simple, and the power comes from combining them.

22.2 The Grammar of Graphics in Brief

Wilkinson's grammar decomposes a chart into components. The short version, adapted from Wickham's layered grammar:

  1. Data: the dataset being visualized, typically a DataFrame.
  2. Aesthetic mappings (encodings): the mapping from data columns to visual channels (x, y, color, size, shape, opacity, etc.).
  3. Geometric objects (marks): the visual primitives used to represent the data — points, lines, bars, areas, rectangles, ticks, text, etc.
  4. Scales: the transformations from data values to visual values (e.g., a linear scale from data range [0, 100] to pixel range [0, 400]).
  5. Statistical transformations: aggregations or summaries applied to the data before mapping (binning for histograms, counting for bar charts, smoothing for regression lines).
  6. Coordinate system: Cartesian, polar, geographic, etc.
  7. Facets (small multiples): the grouping of the chart into panels by one or more variables.

Every chart you have seen in this book can be expressed in these terms. A scatter plot is data + point marks + x/y encodings + linear scales + Cartesian coordinates. A histogram is data + bar marks + x encoding + a binning transformation + linear scales. A faceted line chart is data + line marks + x/y/facet encodings + the same scale repeated per panel. The grammar is the common language behind all of them.
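To make the scale component concrete, a linear scale is simply a straight-line map from a data domain to a visual range. The sketch below is plain Python and is not how Altair or Vega-Lite implement scales; linear_scale is a hypothetical helper, and the numbers reuse the [0, 100] to [0, 400] example from the list above.

```python
def linear_scale(domain, range_):
    """Return a function mapping data values in `domain` to visual values in `range_`."""
    d0, d1 = domain
    r0, r1 = range_

    def scale(value):
        # Fraction of the way through the domain, stretched onto the range.
        return r0 + (value - d0) / (d1 - d0) * (r1 - r0)

    return scale


# Data range [0, 100] mapped to pixel range [0, 400].
to_pixels = linear_scale(domain=(0, 100), range_=(0, 400))
pixels = [to_pixels(v) for v in (0, 25, 100)]
print(pixels)  # [0.0, 100.0, 400.0]
```

Every quantitative encoding in a chart involves a map of this kind; the library picks the domain and range unless you override them.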

Altair exposes the grammar directly. Every Altair chart is constructed as alt.Chart(data).mark_*().encode(...), where:

  • alt.Chart(data) binds the data to the chart.
  • .mark_*() specifies the mark type (mark_point, mark_line, mark_bar, etc.).
  • .encode(...) specifies the aesthetic mappings.

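Additional methods add scales, transforms, interactivity, and composition. Method order within a single chart is mostly free, because each call returns a new chart object rather than drawing anything: .encode() may come before .mark_*(), and transforms can be attached at any point. Order does matter at composition time: combining charts produces a compound chart, so marks and encodings are set on the individual charts before they are combined. Most chart construction therefore fits into a clear pipeline: data → mark → encode → customize → compose.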

22.3 Setup and the Basic Pattern

Altair is installed with pip install altair vega_datasets. The vega_datasets package provides example datasets analogous to seaborn's built-in datasets — iris, cars, stocks, gapminder. These datasets are used throughout the Altair documentation and make it easy to run examples without setting up your own data.

The basic pattern:

import altair as alt
from vega_datasets import data

cars = data.cars()

chart = alt.Chart(cars).mark_point().encode(
    x="Horsepower:Q",
    y="Miles_per_Gallon:Q",
    color="Origin:N",
)
chart

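This creates a scatter plot of horsepower vs. miles per gallon, colored by origin (USA, Japan, Europe). Returning the chart variable in a Jupyter cell displays it inline. Out of the box the chart is essentially static: hover tooltips, pan and zoom, and legend-driven filtering are not enabled by default. Tooltips come from the tooltip encoding channel (Section 22.5), pan and zoom from appending .interactive() to the chart, and legend interaction from selections (Section 22.7).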

The :Q and :N suffixes are data type shorthand. They tell Altair what kind of data each column contains:

  • :Q (quantitative) — a numeric column for which distances and ratios are meaningful. Horsepower, miles per gallon, GDP, temperature.
  • :N (nominal) — a categorical column with no natural order. Country, species, product line.
  • :O (ordinal) — a categorical column with a natural order. Rating (low/medium/high), quarter (Q1/Q2/Q3/Q4), education level.
  • :T (temporal) — a date or datetime column.

The data type affects Altair's rendering decisions. Quantitative columns get continuous scales; nominal columns get categorical scales; temporal columns get time-formatted axes; ordinal columns get discrete but ordered scales. Specifying the data type is technically optional when the data is a pandas DataFrame, because Altair infers a type from each column's dtype, but explicit types are best practice and prevent subtle bugs (an integer-coded category silently treated as quantitative, for example).

The encoding channels (x, y, color, etc.) can also be specified as alt.X("column_name:Q", ...) for more detailed configuration:

chart = alt.Chart(cars).mark_point().encode(
    x=alt.X("Horsepower:Q", title="Horsepower", scale=alt.Scale(zero=False)),
    y=alt.Y("Miles_per_Gallon:Q", title="MPG", axis=alt.Axis(format=".0f")),
    color=alt.Color("Origin:N", title="Country of origin"),
)

The alt.X, alt.Y, and alt.Color wrappers give you access to scale, axis, and other properties that the bare-string shorthand does not expose. For simple charts, the shorthand is enough; for anything that needs custom titles, scales, or formatting, use the full objects.

22.4 Marks: The Visual Primitives

A mark is the type of visual element used to represent data. Altair has over a dozen built-in marks, each corresponding to a different chart type:

  • mark_point() — scatter plot markers.
  • mark_circle() and mark_square() — alternative marker shapes with simpler styling.
  • mark_line() — connected line segments, usually for time series.
  • mark_area() — filled area below a line, for stacked or emphasized series.
  • mark_bar() — bars, horizontal or vertical depending on encoding.
  • mark_rect() — rectangles, used for heatmaps and density plots.
  • mark_tick() — small ticks, used for strip plots or 1D distributions.
  • mark_rule() — reference lines (horizontal, vertical, or at any angle).
  • mark_text() — text annotations or direct labels.
  • mark_geoshape() — geographic shapes for choropleth and topological maps.
  • mark_boxplot() — box plots (a composite mark that produces multiple underlying elements).
  • mark_errorbar() and mark_errorband() — uncertainty displays.

Each mark takes optional keyword arguments for static styling that does not depend on data. For example, mark_point(size=60, filled=True, color="steelblue") sets the marker size, fill, and color for all points. Dynamic styling (varying by data) goes in the .encode() call.

The distinction between mark-level styling and encoding-level mapping is fundamental to the grammar. A mark-level property applies uniformly to all marks; an encoding-level mapping varies based on data. Mixing them up is a common beginner mistake. If you want all points to be blue, mark_point(color="blue"). If you want points colored by a column, encode(color="column:N").

22.5 Encoding Channels

The encoding channels are the visual properties that can be mapped from data. Altair supports all the standard ones:

  • x, y — position along the horizontal and vertical axes.
  • color — fill or stroke color, depending on the mark.
  • size — marker area or line thickness.
  • shape — marker shape (circle, square, diamond, etc.), for point marks.
  • opacity — transparency.
  • order — z-order or drawing order (for lines and stacked areas).
  • text — the string to display (for text marks).
  • tooltip — the data shown on hover.
  • href — a URL to navigate to on click.
  • row, column — facet dimensions (for small multiples).
  • xError, yError, xError2, yError2 — error bar endpoints.
  • x2, y2 — secondary position for extent marks (e.g., rule and bar ranges).

A typical multi-encoding chart:

alt.Chart(cars).mark_circle().encode(
    x="Horsepower:Q",
    y="Miles_per_Gallon:Q",
    color="Origin:N",
    size="Cylinders:O",
    opacity=alt.value(0.7),
    tooltip=["Name:N", "Year:T", "Horsepower:Q", "Miles_per_Gallon:Q"],
)

This encodes six things: horsepower (x), MPG (y), origin (color), cylinders (size), constant opacity (via alt.value(0.7)), and a tooltip listing four columns. The alt.value(0.7) wrapper sets a constant value rather than mapping from data — useful for properties you want to fix without making them mark-level.

The tooltip channel is worth highlighting. Unlike Plotly, where you configure hover via templates, Altair's tooltip is just another encoding. You list the columns you want to show, optionally with format strings, and Altair produces the tooltip automatically. This is a good example of the grammar-of-graphics approach: tooltips are not a special feature, they are an encoding like any other.

22.6 Scales and Axes

Scales transform data values into visual values. By default, Altair chooses reasonable scales based on the data type: linear for quantitative, categorical for nominal, time for temporal. For customization, use alt.Scale(...) within an encoding channel:

alt.Chart(cars).mark_point().encode(
    x=alt.X("Horsepower:Q", scale=alt.Scale(type="log", domain=[50, 300])),
    y=alt.Y("Miles_per_Gallon:Q", scale=alt.Scale(zero=False)),
)

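The type="log" argument makes the x-axis logarithmic. The domain=[50, 300] sets the scale's domain explicitly, that is, the span of data values the axis covers. (In scale terminology, "range" refers to the visual output, such as pixel positions or colors.) The zero=False on the y-axis stops Altair from forcing zero into the scale, which it does by default for quantitative axes.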

Axis customization uses alt.Axis(...):

alt.Chart(cars).mark_point().encode(
    x=alt.X("Horsepower:Q", axis=alt.Axis(title="Horsepower (HP)", format=".0f", tickCount=6)),
    y=alt.Y("Miles_per_Gallon:Q", axis=alt.Axis(title="Fuel economy (MPG)", grid=True)),
)

The alt.Axis object controls title, format, grid, tick count, tick values, label angle, and many other properties. The format strings are the same d3-format strings that Plotly uses (.0f, .2%, $,.0f, etc.), so if you know one library's format conventions, the other is mostly familiar.

Color scales use alt.Scale(scheme=...) with a color-scheme name:

alt.Chart(cars).mark_point().encode(
    color=alt.Color("Origin:N", scale=alt.Scale(scheme="category10")),
)

Altair ships with dozens of color schemes, including the sequential palettes familiar from matplotlib and Plotly (viridis, plasma, inferno, magma, cividis, turbo) and categorical palettes drawn from D3 (category10, category20), ColorBrewer (dark2, set1, set2, set3), and Tableau (tableau10, tableau20). For custom palettes, pass range=[list of colors] instead of scheme.

22.7 Selections and Interactivity

Altair's interactivity model is called selections. A selection is a predicate — a condition — that describes which data points are currently "selected" by the user. You define selections, and then you use them in conditional encodings to change how the chart looks based on the selection state.

The simplest selection is an interval brush:

brush = alt.selection_interval()

alt.Chart(cars).mark_point().encode(
    x="Horsepower:Q",
    y="Miles_per_Gallon:Q",
    color=alt.condition(brush, "Origin:N", alt.value("lightgray")),
).add_params(brush)

This creates a scatter plot where the user can drag a rectangle to brush points. Selected points are colored by origin; unselected points are light gray. The brush is "just" a parameter added to the chart (add_params(brush)), and it is used in a conditional encoding (alt.condition(brush, if_selected, if_not_selected)).

For clicking individual points:

click = alt.selection_point(fields=["Origin"])

alt.Chart(cars).mark_point().encode(
    x="Horsepower:Q",
    y="Miles_per_Gallon:Q",
    color=alt.condition(click, "Origin:N", alt.value("lightgray")),
).add_params(click)

Clicking a point selects all other points with the same origin (because of fields=["Origin"]). The rest of the syntax is identical to the interval brush.

Linked views are where selections become powerful. When you brush one chart, you can filter or highlight another chart based on the selection. This is the multi-chart equivalent of the single-chart interactivity we have seen so far, and it is Altair's killer feature.

brush = alt.selection_interval()

scatter = alt.Chart(cars).mark_point().encode(
    x="Horsepower:Q",
    y="Miles_per_Gallon:Q",
    color=alt.condition(brush, "Origin:N", alt.value("lightgray")),
).add_params(brush)

histogram = alt.Chart(cars).mark_bar().encode(
    x="count()",
    y="Origin:N",
    color="Origin:N",
).transform_filter(brush)

scatter | histogram

This creates a scatter plot with a brush and a horizontal bar chart showing the count of cars per origin. The transform_filter(brush) on the bar chart means "only show data that matches the brush selection." When the user brushes on the scatter, the bar chart updates in real time to show the filtered counts. This is one of the most satisfying interactions in all of data visualization — you are literally sculpting the bar chart by dragging a rectangle on the scatter.

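Point and interval are the two selection types in current Altair (version 5 and later); the older selection_single and selection_multi helpers were folded into selection_point. Both types accept options that change their behavior: fields and encodings control what is matched, on changes the triggering event, bind attaches the selection to the legend, the scales, or an input widget, and name gives the parameter a stable name. The Altair documentation has the full reference.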

22.8 Composition: Layering and Concatenation

Altair's composition operators are where the grammar of graphics pays off most visibly. You can combine charts in five ways:

Layering (+) stacks charts on top of each other, sharing the same axes:

points = alt.Chart(cars).mark_point().encode(x="Horsepower:Q", y="Miles_per_Gallon:Q")
line = alt.Chart(cars).mark_line(color="red").encode(x="Horsepower:Q", y="mean(Miles_per_Gallon):Q")
points + line

This produces a scatter plot with a red average-mpg-by-horsepower line overlaid. The mean(Miles_per_Gallon):Q encoding uses Altair's aggregation syntax — more on that in the next section.

Horizontal concatenation (|) places charts side by side:

chart1 | chart2

Vertical concatenation (&) stacks charts vertically:

chart1 & chart2

Faceting (.facet(row=..., column=...)) creates small multiples:

alt.Chart(cars).mark_point().encode(
    x="Horsepower:Q",
    y="Miles_per_Gallon:Q",
).facet(column="Origin:N")

This is the small-multiples equivalent we have seen in matplotlib, seaborn, and Plotly — one panel per level of the facet variable.

Repetition (.repeat()) is Altair's equivalent of a pair plot. You pass a list of columns, and Altair repeats the chart with each column in turn:

alt.Chart(cars).mark_point().encode(
    x=alt.X(alt.repeat("column"), type="quantitative"),
    y=alt.Y(alt.repeat("row"), type="quantitative"),
).repeat(
    row=["Horsepower", "Acceleration", "Miles_per_Gallon"],
    column=["Horsepower", "Acceleration", "Miles_per_Gallon"],
)

This produces a 3×3 grid of scatter plots, one for each pairwise combination of the three columns — a pair plot in the style of seaborn's pairplot or Plotly's scatter_matrix. The alt.repeat("column") and alt.repeat("row") references tell Altair which repetition variable fills the x and y channels.

The composition operators are composable with each other. (chart1 | chart2) & chart3 puts chart1 and chart2 side by side with chart3 underneath; chart3 keeps its own width unless you set it to match. chart1 + chart2 | chart3 layers chart1 and chart2 together and places the combined result next to chart3, because Python evaluates + before |. Parentheses work as expected. The resulting complex layouts are expressed in a few characters of operator syntax, compact and readable once you know the operators.

22.9 Data Transformations

Altair can transform data as part of the chart specification. This is a significant departure from matplotlib and Plotly, where you typically prepare the DataFrame before calling the plotting function. Altair (via Vega-Lite) supports a range of transformations that happen inside the chart spec:

  • transform_filter(predicate) — filter rows based on a condition or selection.
  • transform_calculate(new_col="expression") — compute a new column from existing ones.
  • transform_aggregate(...) — group and aggregate.
  • transform_bin(as_="binned_col", field="col") — bin a continuous column.
  • transform_fold(["col1", "col2"]) — reshape wide to long (similar to pandas melt).
  • transform_window(...) — window functions (cumulative sums, rolling means, ranks).
  • transform_joinaggregate(...) — add aggregate columns back to the original row.
  • transform_stack(...) — stack values for stacked charts.
  • transform_sample(sample=N) — random sampling.

Examples:

alt.Chart(cars).mark_point().encode(
    x="Horsepower:Q",
    y="Miles_per_Gallon:Q",
    color="Origin:N",
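).transform_filter("year(datum.Year) > 1975")

This filters the data to cars from model years after 1975. The predicate is a Vega expression, written in a JavaScript-like mini-language in which datum refers to the current data row. Year is stored as a date, so the expression extracts the calendar year with year() before comparing; comparing the raw field against the string '1975' does not reliably compare years. The Vega expression language is powerful, but it is separate from Python, so Python functions and variables are not available inside the string.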

alt.Chart(cars).mark_bar().encode(
    x="efficiency_tier:N",
    y="count()",
).transform_calculate(
    efficiency_tier="datum.Miles_per_Gallon > 25 ? 'Efficient' : 'Thirsty'"
)

This computes a new column called efficiency_tier based on a ternary expression, then uses it as the x encoding. The computed column does not exist in the original DataFrame; Vega-Lite creates it on the fly when the chart is rendered.

Aggregation can also happen within encoding shortcuts:

alt.Chart(cars).mark_bar().encode(
    x="Origin:N",
    y="mean(Miles_per_Gallon):Q",
)

The mean(Miles_per_Gallon):Q encoding tells Altair to aggregate by origin and show the mean MPG. Other aggregation functions include sum, count, median, min, max, q1, q3, stdev, variance, and several more.

This compositional-data-transformation approach is powerful because the transformations happen inside the chart spec, which means they are serialized and re-evaluated whenever the chart is re-rendered. A filter that is tied to a selection (via transform_filter(brush)) re-runs every time the brush moves, without any Python involvement — Vega-Lite handles the update in JavaScript.

22.10 Altair's Limitations

Altair is not without its quirks. The most notorious is the 5,000-row default limit. By default, Altair refuses to render charts with more than 5,000 rows, on the theory that large datasets produce specifications too large to send to the browser efficiently. The error message tells you exactly what is happening:

MaxRowsError: The number of rows in your dataset is greater than the maximum allowed (5000).

There are several workarounds. The crude one is to lift the limit:

alt.data_transformers.disable_max_rows()

This removes the cap entirely. It is workable up to roughly 100,000 rows (a rule of thumb, not a hard limit), beyond which the embedded JSON makes notebooks and HTML files large and slow. For larger datasets, use the VegaFusion transformer:

alt.data_transformers.enable("vegafusion")

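VegaFusion is a separate library (pip install vegafusion) that evaluates the chart's transforms on the Python side, using Rust and Arrow, before anything is sent to the browser, so only the transformed result is embedded. For charts that aggregate or bin, this lets Altair work with millions of input rows; a chart that draws one mark per row is still limited by what the browser can render.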

Other limitations:

  • Limited 3D support. Altair does not really do 3D. For 3D scatter, surface plots, and volumetric data, use Plotly or matplotlib instead.
  • Custom mark types are hard. If you need a chart type that is not in the built-in marks, you are stuck — there is no low-level API to define custom marks in pure Altair (you would have to drop to raw Vega, which is much more complex).
  • Fewer statistical overlays than seaborn. Altair can compute basic statistics (mean, regression, density) through transforms, but it is less convenient than seaborn for rich statistical work.
  • Layout control is less fine-grained than matplotlib. If you need pixel-perfect control over a figure's layout (specific subplot positions, custom alignments), matplotlib is still better.

Despite these limitations, Altair is often the best tool for the kinds of visualization that align with its strengths: interactive exploration, linked views, faceted and layered compositions, and clean declarative code. For those use cases, Altair's grammar-of-graphics elegance is hard to beat.

22.11 Altair vs. Plotly vs. matplotlib

| Dimension | matplotlib | Plotly | Altair |
| --- | --- | --- | --- |
| Paradigm | Imperative (draw commands) | Semi-declarative (traces + layout) | Fully declarative (grammar of graphics) |
| Output | Static PDF/PNG/SVG | Interactive HTML | Interactive HTML (Vega-Lite JSON) |
| API mental model | Objects + methods | Traces + updates | Data + marks + encodings |
| Interactivity | None (by default) | Hover, zoom, pan, animations | Selections, linked views, conditional encoding |
| Linked views | Manual (custom event handlers) | Limited (custom updatemenus) | Native (selections + filters) |
| Composition | GridSpec (verbose) | make_subplots (verbose) | +, \|, &, .facet(), .repeat() (concise) |
| Data prep | Separate pandas code | Usually separate pandas code | Can be in the chart spec (transforms) |
| Scale | Any size | Struggles past 100k points | 5,000-row default, ~100k with disable_max_rows, millions with VegaFusion |
| Learning curve | Steep | Gentle | Moderate (grammar concepts take getting used to) |
| Best for | Publication-quality static | Interactive dashboards | Linked-view exploration, compositional charts |

The three libraries are complementary. matplotlib excels at print output and precise control. Plotly excels at standalone interactive charts and dashboards. Altair excels at linked-view exploration and grammatical composition. The professional Python practitioner learns all three and chooses based on the delivery context and the analytical task.

22.12 Progressive Project: Climate Data in Altair

We return to the climate dataset for its Altair treatment. The chart this time will showcase selections and linked views.

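import altair as alt

# `climate` is the project DataFrame from earlier chapters. The ":T" type
# below assumes its "year" column holds datetime values, not bare integers.
# Restrict the brush to the x (time) dimension.
brush = alt.selection_interval(encodings=["x"])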

temp_chart = alt.Chart(climate).mark_line(color="red").encode(
    x=alt.X("year:T", title="Year"),
    y=alt.Y("temperature_anomaly:Q", title="Temperature anomaly (°C)"),
).properties(width=600, height=200).add_params(brush)

co2_chart = alt.Chart(climate).mark_line(color="blue").encode(
    x=alt.X("year:T", title="Year"),
    y=alt.Y("co2_ppm:Q", title="CO₂ (ppm)"),
).properties(width=600, height=200).transform_filter(brush)

scatter = alt.Chart(climate).mark_point().encode(
    x=alt.X("co2_ppm:Q", title="CO₂ (ppm)"),
    y=alt.Y("temperature_anomaly:Q", title="Temperature anomaly (°C)"),
    color="era:N",
).properties(width=600, height=300).transform_filter(brush)

(temp_chart & co2_chart & scatter).properties(title="Climate: brush the top chart")

Three linked charts: a temperature line chart on top, a CO2 line chart in the middle, and a CO2-vs-temperature scatter on the bottom. The brush on the temperature chart filters the other two: dragging to select a time range limits the CO2 line to that window and reduces the scatter to the points from that period. With no brush drawn, both charts show all the data. This is a linked-view exploration tool, built in a few lines of Altair.

The same effect in Plotly would typically require callback code, for example Dash callbacks or FigureWidget event handlers, to coordinate the figures manually. Altair handles it declaratively: selections + transform_filter do the job automatically.

22.13 The Vega-Lite Layer and Why It Matters

Every Altair chart is a Vega-Lite specification. When you display a chart in a Jupyter notebook, Altair serializes the chart to JSON and hands it to the Vega-Lite JavaScript library, which interprets the spec and renders the chart in the browser. The Python code you write never touches the rendering directly — it produces a spec, and a separate layer does the rendering.

You can see the underlying spec at any time by calling .to_json() or .to_dict() on a chart:

chart = alt.Chart(cars).mark_point().encode(x="Horsepower:Q", y="Miles_per_Gallon:Q")
print(chart.to_json(indent=2))

The output is a JSON document with top-level fields including $schema, data, mark, and encoding. When the data comes from a DataFrame, the rows are embedded in the spec as well (under datasets), so the printed JSON grows with the data. Leaving the data aside, a simple chart amounts to perhaps 30 lines of spec, and a complex linked-view chart to several hundred. Either way, the spec is human-readable and can be edited directly or loaded into another tool that understands Vega-Lite.

This separation has several benefits:

  • The spec is portable. You can generate an Altair spec in Python, save the JSON, and render it in JavaScript, in R (for example, via the vegawidget package), or in any other environment that speaks Vega-Lite. The spec does not care about its source.
  • The renderer improves independently. When the Vega-Lite team ships better performance or new features, charts benefit without changes to your chart code. Each Altair release targets a specific Vega-Lite version, so in practice you pick up the improvements by upgrading Altair.
  • The spec is a contract. If you share an Altair chart with a collaborator who does not know Python, they can still read and understand the JSON. The grammar of graphics becomes a common language.

The trade-off is that Altair sometimes feels one step removed from the rendering. If a chart does not look right, you need to check whether the issue is in your Python code (the Altair API call), in the generated spec (the Vega-Lite JSON), or in the renderer (the JavaScript library). Most issues are in the first two layers, and chart.to_dict() is the debugging tool of choice — same pattern as Plotly's fig.to_dict().

For most practical work, you do not need to know Vega-Lite directly. Altair's Python API covers the common cases cleanly, and you can treat the JSON layer as an implementation detail. When you do need it — for debugging, for sharing specs across languages, for understanding why a feature is not exposed in Altair — the layer is accessible.

22.14 Saving and Exporting Altair Charts

Altair charts can be saved in several formats, each with its own trade-offs.

JSON spec. The most portable format. Save the Vega-Lite spec directly:

chart.save("chart.json")

The JSON can be loaded by any Vega-Lite-compatible tool and rendered without Python involvement.

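HTML. An HTML page that embeds the spec and loads the Vega, Vega-Lite, and Vega-Embed JavaScript libraries, ready to view in a browser:

chart.save("chart.html")

The page contains everything needed to render the chart except the libraries themselves, which are loaded from a CDN by default. That keeps the files small (unlike Plotly's default HTML export, which bundles its JavaScript library), and a chart with a moderate amount of data can be under 100 KB on disk. The trade-off is that the page needs a network connection to render; passing inline=True to save() embeds the libraries for offline use, which requires the vl-convert-python package.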

PNG and SVG. Static image exports use the same save() method with the appropriate extension. Rendering an image requires a helper library: vl-convert-python in current Altair versions, or altair_saver in older ones:

chart.save("chart.png")
chart.save("chart.svg")

Altair's static export is less polished than matplotlib's — the renderer is still a browser-simulation tool, and the output sometimes has small visual artifacts. For publication-quality static output, matplotlib is still better. For archiving a quick PNG of an interactive chart, Altair's export is sufficient.

PDF. Supported the same way as PNG and SVG, with a helper library. The result is adequate for most uses but offers less layout control for print than matplotlib.

The practical recommendation: for interactive delivery, save as HTML or share the JSON spec. For static delivery, consider whether matplotlib or seaborn would produce a better print-ready output, and use those tools when appropriate.

22.15 The Compositional Mindset

The threshold concept for this chapter is "declaration over instruction," and it is worth spending a moment on what that means in practice.

In imperative programming — and imperative visualization — you tell the computer the steps to take. In matplotlib: create a figure, add an axes, call plot, set the x-label, set the y-label, show. Each step is an instruction; the order matters; the result is produced by executing the steps in sequence. This is how most programmers learn to code, and it feels natural because it mirrors how humans describe step-by-step procedures.

In declarative programming, you describe the result you want. In SQL: SELECT name, age FROM users WHERE age > 18 ORDER BY age — you are not telling the database how to execute the query (which indexes to use, which join algorithm, which sort method). You are describing what you want, and the database figures out how. In Altair: alt.Chart(data).mark_point().encode(x="col1:Q", y="col2:Q") — you are not telling Altair how to draw the chart (which pixels to color, which line segments to connect). You are describing the data-to-visual mapping, and Altair (via Vega-Lite) figures out how.

The benefit of declarative code is that it is usually shorter and more expressive. You focus on the what, not the how, and the implementation details are handled by the library. The cost is a learning curve — you have to internalize the library's abstractions, and until you do, the code feels unfamiliar. Once the grammar-of-graphics concepts click, however, Altair's declarative style produces some of the most readable visualization code in Python.

The compositional mindset extends beyond single charts. Altair's +, |, &, .facet(), and .repeat() operators let you combine charts the way SQL's JOIN, UNION, and subqueries let you combine queries. Each operator takes simple pieces and produces more complex structures. A linked-view dashboard is not a special feature in Altair; it is a composition of simple charts with selections that link them. This compositionality is what the grammar of graphics is for.

A concrete test: take any complex visualization you have seen, and try to decompose it into Altair primitives. What data is bound? What marks are used? What encodings? What transformations? What selections? What compositions? Most charts decompose into a few lines of Altair code, even if they would be dozens of lines in matplotlib. The decomposition exercise is how you internalize the compositional mindset, and it is worth doing deliberately until it feels natural.

22.16 Interactive Legends and Conditional Encoding Patterns

Altair's conditional encoding lets you build interactive legends and hover effects with a small amount of code. The key is alt.condition, which takes a predicate and two alternatives — the "if true" value and the "if false" value — and picks one based on whether the predicate matches.

A common pattern is the clickable legend: clicking a legend entry highlights that category and dims the others. In Altair, this is a single chart with a point selection and a conditional opacity:

select_origin = alt.selection_point(fields=["Origin"], bind="legend")

alt.Chart(cars).mark_point().encode(
    x="Horsepower:Q",
    y="Miles_per_Gallon:Q",
    color="Origin:N",
    opacity=alt.condition(select_origin, alt.value(1.0), alt.value(0.1)),
).add_params(select_origin)

The bind="legend" argument tells Altair to link the selection to the legend — clicking a legend entry selects that category. The opacity encoding uses alt.condition: if the row matches the selection, opacity is 1.0; otherwise it is 0.1. The result is a chart where clicking a legend entry fades all other categories to near-transparent.

Another pattern is the hover highlight: as the user moves the cursor over the chart, the nearest point is enlarged while the others stay small. This requires a selection driven by mouse movement rather than clicks:

hover = alt.selection_point(on="mouseover", nearest=True, empty=False)

alt.Chart(cars).mark_point().encode(
    x="Horsepower:Q",
    y="Miles_per_Gallon:Q",
    color="Origin:N",
    size=alt.condition(hover, alt.value(200), alt.value(50)),
).add_params(hover)

The on="mouseover" argument makes the selection trigger on mouse movement. nearest=True snaps the selection to the closest point. empty=False controls what an empty selection matches: by default an empty selection matches every point, and empty=False makes it match none, so no point is enlarged until the cursor enters the chart. The conditional size encoding makes the hovered point larger.

These patterns can be combined. A chart with a clickable legend, a hover highlight, and a brush-based filter is a few dozen lines of Altair code — in matplotlib, it would be hundreds of lines of custom event-handling code. The declarative approach is not just shorter; it is also less error-prone because you are not manually managing state.

22.17 When the Grammar Hurts

Altair's grammatical approach is powerful but not universal. There are cases where the grammar gets in the way, and it is worth acknowledging them.

Custom chart types. If you need a chart type that is not in Altair's built-in marks — a violin plot with custom styling, a Sankey diagram, a specific business chart — you are stuck. Altair does not have a plugin system for custom marks. You would have to drop to raw Vega (one layer deeper than Vega-Lite), which is much more complex, or use a different library entirely.

Precise layout control. Altair's composition operators (+, |, &, .facet()) produce grid-like layouts with automatic sizing. If you need pixel-perfect control — a specific subplot position, a specific spacing, a custom alignment — matplotlib's GridSpec gives you more flexibility. Altair's properties(width=..., height=...) helps, but it is not as expressive as matplotlib for fine layout work.

Mixing with other Python visualization. An Altair chart does not integrate with matplotlib or Plotly figures. You cannot embed an Altair chart in a matplotlib Axes or combine it with a Plotly trace. Each library lives in its own world. For projects that require mixing libraries (common in reports and papers), this is a real limitation.

Large datasets. The 5000-row default limit is the most famous Altair gotcha. Even with disable_max_rows or VegaFusion, very large datasets (millions of rows) can produce unwieldy specs. For big-data visualization, Datashader or pre-aggregation are better approaches.

Very customized animations. Plotly's animation_frame and custom frames are more flexible for animated visualizations. Altair can step through time by binding a selection to a slider widget, but it does not offer the same out-of-the-box animation as Plotly.

None of these limitations are fatal for the cases Altair is designed for. But they are real, and a practitioner who recognizes them early avoids spending hours trying to force Altair to do something it is not built for. When you hit one of these limitations, switch tools — Plotly for customization-heavy interactive, matplotlib for precise layout and custom marks, seaborn for deep statistical overlays.

22.18 Theme and Styling

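Like matplotlib style sheets and Plotly templates, Altair has a theme system for consistent styling across charts. A theme is a function that returns a configuration dictionary, and you register themes with alt.themes. (Recent Altair releases, 5.5 and later, expose the same registry as alt.theme and deprecate alt.themes; the examples here use the older name.)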

The built-in themes are limited but include "default", "dark", "fivethirtyeight", "ggplot2", "quartz", "vox", "latimes", "powerbi", and a few others:

alt.themes.enable("fivethirtyeight")

After this call, every subsequent chart uses the FiveThirtyEight-style theme: specific background color, font, axis styling, and default color palette. The theme is global, so one call at the top of a notebook affects everything that follows.

For custom themes, register a function:

def custom_theme():
    return {
        "config": {
            "font": "Helvetica Neue",
            "view": {"stroke": "transparent"},
            "axis": {
                "domain": False,
                "labelColor": "#333",
                "titleColor": "#333",
                "titleFontSize": 13,
            },
            "legend": {"titleFontSize": 13, "labelFontSize": 12},
        }
    }

alt.themes.register("custom", custom_theme)
alt.themes.enable("custom")

The returned config dict maps onto Vega-Lite's top-level config schema, which has entries for every styling property. Full reference at vega.github.io/vega-lite/docs/config.html.

For an organization producing many Altair charts, a shared theme module is the Altair equivalent of matplotlib's style sheets (Chapter 12) and Plotly's templates (Chapter 21). The pattern is the same: define once, enable globally, keep charts visually consistent without touching each one individually.

Per-chart overrides are available through chart.configure_axis(...), chart.configure_legend(...), chart.configure_view(...), and friends. These methods return modified charts without changing the global theme, so you can override specific properties for one chart while keeping everything else on the default theme.

22.19 Common Pitfalls

New Altair users run into a predictable set of issues. Mentioning them here saves hours of debugging.

Forgetting the data-type suffix. encode(x="year") without a type shorthand makes Altair guess, and the guess is sometimes wrong — especially for integer columns that might be year codes or quantitative values. Always specify the type (:T, :Q, :O, :N) to avoid ambiguity.

Mark-level color vs. encoding-level color. mark_point(color="blue") makes all points blue. encode(color="Origin:N") colors by origin. Using both at the same time is a common mistake — the encoding wins, but the mark-level color is still in the spec and may confuse readers of your code.

Using alt.value(...) vs. a column reference. encode(color="Origin:N") maps a column. encode(color=alt.value("blue")) sets a constant. Omitting the alt.value wrapper and passing a bare color string (encode(color="blue")) is interpreted as a column named "blue" and usually produces an error.

Selections without add_params. A selection defined with alt.selection_*() is not active until you attach it to the chart with .add_params(selection). Forgetting this step raises nothing in Python: depending on how the selection is used, the chart either renders without interactivity or shows a JavaScript error about an unknown selection, and you wonder why nothing happens on click.

Transforms in the wrong order. Transforms like transform_filter and transform_calculate are applied in the order they are chained. A filter that references a column computed by a later transform will silently drop everything, because the column does not exist at filter time. When in doubt, check the order.

The 5000-row error. Already mentioned, but it bears repeating: if you see MaxRowsError, either call alt.data_transformers.disable_max_rows() or switch to VegaFusion. Do not pre-filter the data to avoid the error — that often defeats the purpose of the chart.

Column names with dots, brackets, or spaces. Column names like "Miles per Gallon" or "data.value" can confuse Altair's encoding parser because dots and brackets have special meaning in the Vega-Lite field syntax, where they denote nested access. Use alt.X(field="Miles per Gallon", type="quantitative") explicitly when your column names are not plain identifiers, escape literal dots and brackets with a backslash (for example field=r"data\.value"), or rename the columns to be identifier-safe before plotting.

Awareness of these pitfalls shortens the learning curve. Most of them are one-line fixes once you recognize the symptom.

22.20 Check Your Understanding

Before continuing to Chapter 23 (Geospatial Visualization), make sure you can answer:

  1. What is the Grammar of Graphics, and which three researchers are most associated with it?
  2. What is Vega-Lite, and how does Altair relate to it?
  3. What are the four data-type shorthand codes in Altair, and what does each mean?
  4. What is the difference between mark-level styling and encoding-level mapping?
  5. What is a selection, and how does it enable linked views?
  6. Name the four composition operators in Altair (layering, concatenation, faceting, repetition) and give an example of each.
  7. What is the 5000-row limit, and how do you work around it?
  8. When would you use Altair instead of Plotly or matplotlib?

If any of these are unclear, re-read the relevant section. The next chapter leaves interactive visualization libraries behind and introduces geospatial visualization: maps, choropleths, and location data.

22.21 Chapter Summary

This chapter introduced Altair and the grammar-of-graphics approach:

  • The Grammar of Graphics (Wilkinson 1999, Wickham 2005) decomposes every chart into data, marks, encodings, scales, transformations, coordinates, and facets. Any visualization is a composition of these primitives.
  • Vega-Lite is a JSON-based specification for statistical graphics that implements the grammar of graphics in executable form. Altair is the Python binding that generates Vega-Lite specs.
  • The canonical Altair pattern is alt.Chart(data).mark_*().encode(...), with optional scales, axes, transforms, and composition.
  • Marks are the visual primitives: point, line, bar, area, rect, tick, rule, text, geoshape, boxplot, errorbar.
  • Encodings map data columns to visual channels: x, y, color, size, shape, opacity, tooltip, row, column, and more.
  • Data types (Q, N, O, T) are specified as shorthand suffixes and affect scale and axis choices.
  • Selections add interactivity through conditions. Interval brushes, point clicks, and custom selections power linked views via transform_filter.
  • Composition operators (+, |, &, .facet(), .repeat()) combine charts compositionally.
  • Transforms (filter, calculate, aggregate, fold, window, sample) can run inside the chart spec, re-evaluated on interaction.
  • Altair has limitations: the 5000-row default cap (solved with disable_max_rows or VegaFusion), no native 3D charts, fewer custom-mark options than matplotlib.

The chapter's threshold concept — declaration over instruction — argues that Altair's power comes from describing what the chart should look like rather than how to draw it. Complex visualizations become compositions of simple pieces, and the grammar-of-graphics primitives are the pieces.

Chapter 23 moves from general interactive libraries to geospatial visualization: choropleths, maps, location scatter plots, and the specific libraries (GeoPandas, Folium, Plotly's mapbox traces) used for spatial data.

22.22 Spaced Review

Questions that reach back to earlier chapters:

  • From Chapter 16 (seaborn): seaborn's hue="column" maps a variable to color, similar to Altair's color="column:N". What is the difference between seaborn's declarative-ish API and Altair's fully declarative API?
  • From Chapter 9 (Storytelling): Altair's linked views enable a specific kind of exploratory interaction. How does this fit into Shneiderman's mantra?
  • From Chapter 4 (Honest Charts): Altair's transform_filter(brush) lets readers filter the data interactively. What ethical concerns arise from interactive filtering, and how do they compare to the range-slider concerns from Chapter 20?
  • From Chapter 17 (Distributional Viz): Altair's transform_bin and mean(col) aggregations are the Vega-Lite analog of pandas groupby. How do they compare in terms of clarity and flexibility?
  • From Chapter 5 (Choosing the Right Chart): The Grammar of Graphics argues that chart types are not fundamental — any chart is a composition of marks and encodings. Does this change how you think about Chapter 5's chart-selection matrix?

Altair is the most theoretically grounded of the three interactive libraries in Part V. Its grammar-of-graphics foundation makes it the right choice for analysts who think compositionally (data + marks + encodings) and who value linked views for exploratory analysis. For standard interactive charts, Plotly Express is still faster. For complex dashboards, Plotly Graph Objects is still more controllable. But for elegant compositional charts with linked interactivity, Altair is the tool that feels closest to the underlying theory. Part V is now complete; Chapter 23 begins Part VI with geospatial visualization.