Case Study 2: The UW Interactive Data Lab and the Making of Vega-Lite

DataField.Dev

Between 2014 and 2016, Jeffrey Heer and a group of graduate students at the University of Washington Interactive Data Lab built two visualization grammars: Vega (low-level, general-purpose) and Vega-Lite (high-level, statistical). Vega-Lite became the rendering backbone for Altair in Python. It is also available in Observable notebooks and JupyterLab, and it has been embedded in commercial products. The UW lab's story is a case study in two things: how academic research can produce tools that change the commercial landscape, and what it takes to design a grammar of graphics that programmers actually want to use.

The Situation: The JavaScript Visualization Gap

By the early 2010s, D3.js had become the dominant library for interactive visualization on the web. Mike Bostock's 2011 release of D3 gave JavaScript developers a powerful toolkit for binding data to DOM elements and producing sophisticated interactive graphics. D3 followed Protovis, an earlier library Bostock had built with Jeffrey Heer. D3 was the technology behind the New York Times graphics desk, the Washington Post's data desk, FiveThirtyEight's interactive graphics, and countless other high-profile visualizations. It was also famously hard to learn. D3 exposed raw SVG manipulation, required understanding selections and data joins, and offered no ready-made chart types, so you built everything from scratch.

The alternatives were Chart.js, Highcharts, and similar libraries that offered pre-built chart types. These were easy to use but inflexible: if you wanted a chart type the library did not support, you were stuck. The gap between "easy but inflexible" (Chart.js) and "flexible but hard" (D3) was the opportunity.

Meanwhile, in the Python and R worlds, the grammar of graphics was ascendant. ggplot2 had become the dominant R library. matplotlib and seaborn were widely used in Python, though seaborn's grammar-of-graphics adoption was partial and ggplot2-like libraries had not gained much traction. The question was whether JavaScript — where D3 dominated — would get a grammar-of-graphics library of its own.

Jeffrey Heer had been thinking about this question for years, first as a professor at Stanford and then at the University of Washington. He had co-authored Protovis and later D3, so he knew the trade-offs between low-level and high-level visualization libraries intimately. He had seen what ggplot2 accomplished in R and wanted to bring something similar to JavaScript.

In 2014, Heer and his students at the UW Interactive Data Lab started work on Vega, a JSON-based visualization grammar. Vega was not a grammar of graphics in the ggplot2 sense. It was a general visualization grammar, more flexible than ggplot2 but also more complex. Vega specs described visualizations as compositions of data, scales, marks, and axes. However, there was no compact channel-level encoding shorthand and there were no defaults: every scale, axis, and legend had to be declared explicitly. A Vega spec for a simple scatter plot ran to roughly 50 lines of JSON.

Vega worked, but it was too verbose for everyday use. The lab built it, researchers used it, but it did not catch on with working developers. The gap between Vega's theoretical power and its practical usability was too large.

The Vega-Lite Idea: A Higher-Level Grammar on Top of Vega

In 2015-2016, Heer's team started working on Vega-Lite — a higher-level language that compiled to Vega. The idea was simple: Vega-Lite would have a much more compact syntax, optimized for the common cases, and a Vega-Lite spec would be automatically translated into a Vega spec for rendering. Developers would write Vega-Lite (short) and get Vega (powerful) without having to think about the low-level details.
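The Python sketch below makes the two-layer design concrete. It hand-expands a tiny Vega-Lite-style scatter plot into the kind of Vega spec it compiles to. The function `compile_scatter` is hypothetical and handles only one case: inline data, a point mark, and two quantitative fields. The real Vega-Lite compiler is a TypeScript library that also infers scales, adds legends, and handles composition. The point is the contrast: the short spec only names fields and channels, while the Vega output must spell out every scale, axis, and mark.

```python
import json


def compile_scatter(vl_spec):
    """Hypothetical toy compiler: Vega-Lite-style scatter -> Vega-style spec.

    Handles only inline data, a point mark, and quantitative x/y fields.
    """
    values = vl_spec["data"]["values"]
    x_field = vl_spec["encoding"]["x"]["field"]
    y_field = vl_spec["encoding"]["y"]["field"]
    return {
        "width": 200,
        "height": 200,
        "data": [{"name": "source", "values": values}],
        # Vega needs every scale declared explicitly...
        "scales": [
            {"name": "x", "type": "linear", "range": "width", "nice": True,
             "zero": True, "domain": {"data": "source", "field": x_field}},
            {"name": "y", "type": "linear", "range": "height", "nice": True,
             "zero": True, "domain": {"data": "source", "field": y_field}},
        ],
        # ...and every axis, each bound to a scale and an orientation.
        "axes": [
            {"orient": "bottom", "scale": "x", "title": x_field},
            {"orient": "left", "scale": "y", "title": y_field},
        ],
        # The mark reads data fields through the named scales.
        "marks": [{
            "type": "symbol",
            "from": {"data": "source"},
            "encode": {"update": {
                "x": {"scale": "x", "field": x_field},
                "y": {"scale": "y", "field": y_field},
            }},
        }],
    }


vl = {
    "data": {"values": [{"a": 1, "b": 3}, {"a": 2, "b": 5}, {"a": 3, "b": 4}]},
    "mark": "point",
    "encoding": {"x": {"field": "a", "type": "quantitative"},
                 "y": {"field": "b", "type": "quantitative"}},
}
vega = compile_scatter(vl)
print(len(json.dumps(vl, indent=2).splitlines()), "lines of Vega-Lite-style JSON")
print(len(json.dumps(vega, indent=2).splitlines()), "lines of Vega-style JSON")
```

Comparing the two line counts shows how much structure the compiler fills in on the user's behalf.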

The Vega-Lite design was heavily inspired by ggplot2 and Wilkinson's grammar of graphics. A single-view Vega-Lite spec was built from a handful of top-level fields:

data: the data source (inline, URL, or generated).
mark: the visual primitive (point, bar, line, etc.).
encoding: the mapping from data fields to visual channels.
transform: optional data transformations.
selection (added in Vega-Lite 2; written as params since Vega-Lite 5): interactive selections for linked views.
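The sketch below writes one spec, as a plain Python dict, that uses all five kinds of top-level field. It assumes Vega-Lite 5 syntax, in which interactive selections are declared under `params` rather than `selection`. The field names (`hp`, `mpg`, `origin`) and the rows are made up for illustration. Serializing the dict with `json.dumps` should yield a spec that a Vega-Lite 5 renderer accepts.

```python
import json

spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    # data: inline rows (hypothetical sample values)
    "data": {"values": [
        {"hp": 130, "mpg": 18, "origin": "USA"},
        {"hp": 95, "mpg": 24, "origin": "Japan"},
        {"hp": 88, "mpg": 27, "origin": "Europe"},
        {"hp": 0, "mpg": 31, "origin": "Europe"},
    ]},
    # transform: drop rows whose horsepower reading is missing (zero)
    "transform": [{"filter": "datum.hp > 0"}],
    # mark: the visual primitive
    "mark": "point",
    # selection, expressed as a param in Vega-Lite 5: an interval brush
    "params": [{"name": "brush", "select": "interval"}],
    # encoding: data fields mapped to visual channels
    "encoding": {
        "x": {"field": "hp", "type": "quantitative"},
        "y": {"field": "mpg", "type": "quantitative"},
        # points inside the brush are colored by origin, others gray
        "color": {
            "condition": {"param": "brush", "field": "origin", "type": "nominal"},
            "value": "lightgray",
        },
    },
}

print(json.dumps(spec, indent=2))
```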

A Vega-Lite scatter plot took five or six lines of JSON instead of fifty. Faceting took a single extra encoding channel, layering was a composition operator, and linked views were selections plus filter transforms. The same principles that made ggplot2 productive in R (compositionality, consistency, explicit encodings) now worked in JavaScript too.
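The composition operators can be sketched the same way. Starting from a compact scatter plot, the dicts below show three patterns:

- layering, via `layer`;
- faceting, via an extra `column` channel;
- linked views, via an `hconcat` in which an interval brush in one view filters the other through a `filter` transform.

The sample rows and field names are hypothetical, and the specs again assume Vega-Lite 5 syntax.

```python
import json

rows = [
    {"hp": 130, "mpg": 18, "origin": "USA"},
    {"hp": 95, "mpg": 24, "origin": "Japan"},
    {"hp": 88, "mpg": 27, "origin": "Europe"},
]

# The compact scatter plot: mark + encoding (data is attached per spec below).
scatter = {
    "mark": "point",
    "encoding": {
        "x": {"field": "hp", "type": "quantitative"},
        "y": {"field": "mpg", "type": "quantitative"},
    },
}

# Layering: draw a horizontal rule at the mean mpg on top of the points.
mean_rule = {
    "mark": "rule",
    "encoding": {"y": {"aggregate": "mean", "field": "mpg", "type": "quantitative"}},
}
layered = {"data": {"values": rows}, "layer": [scatter, mean_rule]}

# Faceting: one extra encoding channel splits the plot into columns.
faceted = {
    "data": {"values": rows},
    "mark": "point",
    "encoding": {**scatter["encoding"],
                 "column": {"field": "origin", "type": "nominal"}},
}

# Linked views: a brush defined in the left view filters the right view.
linked = {
    "data": {"values": rows},
    "hconcat": [
        {**scatter, "params": [{"name": "brush", "select": "interval"}]},
        {
            "transform": [{"filter": {"param": "brush"}}],
            "mark": "bar",
            "encoding": {
                "x": {"field": "origin", "type": "nominal"},
                "y": {"aggregate": "count", "type": "quantitative"},
            },
        },
    ],
}

print(json.dumps(linked, indent=2))
```

Because each pattern is just a dict wrapped around smaller dicts, the same base view can be reused across all three.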

The first Vega-Lite release came in 2016. It was initially rough — the API was incomplete, the documentation was thin, and the compilation from Vega-Lite to Vega sometimes produced inefficient output. But the core design was right, and the lab continued to refine it over the following years. By 2019, Vega-Lite was stable enough for production use, and it had become the basis for several new visualization tools.

Altair Emerges: Python Meets Vega-Lite

Altair was born almost by accident. Jake VanderPlas, then at the UW eScience Institute, was a visualization practitioner looking for a good Python library that embodied the grammar of graphics. He had watched earlier attempts to port ggplot2 to Python, such as the ggplot package, and was unsatisfied with them: they lacked the interactivity and compositional power that Vega-Lite offered. He talked to Brian Granger (a co-founder of Project Jupyter) about building a Python binding for Vega-Lite.

The result was Altair, first released in 2016-2017 as an open-source project. Altair's design was simple: a Python API that generated Vega-Lite JSON specs. The API mirrored Vega-Lite's structure closely — alt.Chart(data).mark_point().encode(x="col1:Q") produced the JSON spec that corresponds to {"data": ..., "mark": "point", "encoding": {"x": {"field": "col1", "type": "quantitative"}}}. The mapping was so direct that reading Altair code felt like reading the JSON spec, with Python syntax sugar on top.
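The toy builder below shows how thin the Python layer can be. It uses the same method-chaining shape as Altair. Both `MiniChart` and `shorthand_to_channel` are hypothetical names written for this sketch, not Altair's implementation. The parser handles only the four common type codes (Q, O, N, T) and ignores aggregates and the other shorthand features Altair supports. Each method returns a new chart rather than mutating the old one, so a base chart can safely seed several different chains.

```python
import copy
import json

# The four most common shorthand type codes.
TYPE_CODES = {"Q": "quantitative", "O": "ordinal", "N": "nominal", "T": "temporal"}


def shorthand_to_channel(shorthand):
    """Hypothetical helper: "col1:Q" -> {"field": "col1", "type": "quantitative"}."""
    field, sep, code = shorthand.rpartition(":")
    if not sep or code not in TYPE_CODES:
        raise ValueError(f"expected 'field:Q|O|N|T', got {shorthand!r}")
    return {"field": field, "type": TYPE_CODES[code]}


class MiniChart:
    """Hypothetical method-chaining builder shaped like Altair's Chart."""

    def __init__(self, data):
        self._spec = {"data": {"values": data}}

    def _updated(self, key, value):
        # Shallow-copy the chart and replace one top-level field,
        # leaving the original chart untouched.
        new = copy.copy(self)
        new._spec = {**self._spec, key: value}
        return new

    def mark_point(self):
        return self._updated("mark", "point")

    def encode(self, **channels):
        encoding = {name: shorthand_to_channel(s) for name, s in channels.items()}
        return self._updated("encoding", encoding)

    def to_dict(self):
        return self._spec


base = MiniChart([{"col1": 1, "col2": 3}, {"col1": 2, "col2": 5}])
chart = base.mark_point().encode(x="col1:Q", y="col2:Q")
print(json.dumps(chart.to_dict(), indent=2))
```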

VanderPlas and Granger's decision to build Altair on Vega-Lite was consequential. It meant that Altair inherited Vega-Lite's strengths: grammar-of-graphics composition, declarative interactivity, linked views, and web-native rendering. It also came with trade-offs:

- essentially no 3D support;
- dependence on a JavaScript renderer that has to be loaded in the browser;
- a default 5,000-row limit.

The row limit is a safeguard that Altair itself adds, not a restriction in Vega-Lite. By default Altair embeds the full dataset in the JSON spec, so large datasets produce very large specs and notebooks. The trade-offs were worth it. Altair gave Python users an interactive, declarative grammar-of-graphics framework that the Python ecosystem otherwise lacked, and the JavaScript rendering made the charts interactive without any additional work.
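The row limit is easiest to understand as a guard on inline embedding. The sketch below is a hypothetical helper, not Altair's code. It builds the `data` field of a spec in one of three ways:

- it embeds rows under `values` when the dataset is small;
- it refuses when the dataset exceeds a threshold;
- it switches to a `url` reference when the caller supplies one.

Altair's own counterparts are its `MaxRowsError` and its data transformers. For example, `alt.data_transformers.disable_max_rows()` lifts the limit.

```python
import json

MAX_ROWS = 5000  # Altair's default threshold, mirrored here


class TooManyRowsError(ValueError):
    """Hypothetical error type, standing in for Altair's MaxRowsError."""


def data_block(rows, max_rows=MAX_ROWS, url=None):
    """Hypothetical helper returning the "data" field of a Vega-Lite spec.

    Inline embedding copies every row into the JSON, so the spec grows with
    the dataset. Past max_rows we refuse unless the caller supplies a URL;
    then the renderer fetches the data itself and the spec stays small.
    Pass max_rows=None to disable the check.
    """
    if url is not None:
        return {"url": url}
    if max_rows is not None and len(rows) > max_rows:
        raise TooManyRowsError(
            f"{len(rows)} rows exceeds max_rows={max_rows}; pass a url instead"
        )
    return {"values": rows}


small = data_block([{"x": i} for i in range(10)])
# "data/big.json" is a hypothetical path to a file served alongside the chart.
by_url = data_block([{"x": i} for i in range(10_000)], url="data/big.json")
print(len(json.dumps(small)), by_url)
```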

The Research-to-Production Pipeline

The UW Interactive Data Lab's approach to building Vega and Vega-Lite is a model for how academic visualization research can produce tools that matter commercially. The steps:

1. Theoretical grounding. The lab did not start with "let's build a chart library." They started with the question "what is the right abstraction for statistical graphics on the web?" The answer came from reading Wilkinson, studying ggplot2, analyzing D3, and thinking carefully about what developers needed.

2. Prototype in stages. Vega came first, as a low-level general-purpose grammar. Vega-Lite came later, layered on top of Vega to provide the higher-level abstraction. The two-stage approach let the lab evaluate the low-level layer before designing the high-level API.

3. Open-source development. Both Vega and Vega-Lite were open source from the start, released under a permissive BSD license and hosted on GitHub. This lowered the barrier to adoption and let outside contributors (including Altair's authors) build on top of the work.

4. Integration with existing tools. The lab worked closely with the Observable team, with Jupyter, with D3 developers, and with commercial visualization vendors to ensure Vega-Lite could be embedded in the tools people already used. Adoption was driven by usefulness, not by marketing.

5. Academic publications alongside code. The lab published peer-reviewed papers on Vega and Vega-Lite that explained the design decisions and evaluated the resulting tools. The papers gave the work academic credibility and also documented the design choices for future users.

6. Commercial downstream adoption. Once Vega-Lite was mature, products began building on it. Elastic's Kibana accepts Vega-Lite specs for custom visualizations, and JupyterLab renders Vega-Lite output out of the box. Observable, a notebook company co-founded by D3 creator Mike Bostock, makes Vega-Lite available in its notebooks. The lab's work became infrastructure.

This pipeline — theory → prototype → open source → integration → papers → commercial adoption — is rare in academic research. Most research tools never make it out of the paper. Vega and Vega-Lite did because the UW lab was deliberate about every stage and because the underlying design was genuinely good.

Theory Connection: Why Grammar-of-Graphics Tools Are Infectious

The grammar of graphics is infectious. Once a library adopts it, other libraries tend to follow, not out of fashion but because the grammar is productive. ggplot2 begat plotnine and Gadfly (Julia) and influenced seaborn. Vega begat Vega-Lite, and Vega-Lite begat Altair. Altair tends to attract the same kind of user as ggplot2: people who think compositionally about charts.

This is not a coincidence. The grammar of graphics is a mental model more than it is a library design. Once you start thinking in terms of data + marks + encodings, every library feels more or less expressive depending on whether it supports that mental model. An imperative library feels clumsy; a grammatical library feels natural. The infectiousness comes from the mental model, not from any single implementation.

The UW lab understood this. They did not try to compete with D3 on raw flexibility or Chart.js on raw simplicity. They built a tool that matched the way visualization researchers were already thinking. The mental model did the marketing.

For practitioners, the takeaway is to pick tools that match the way you think. If you think compositionally, grammar-of-graphics tools (ggplot2, Altair, Vega-Lite) are the right choice. If you think imperatively, matplotlib and D3 are the right choice. Neither is universally better; they serve different mental models. The best tool is the one that aligns with how your brain already organizes the problem.

The Impact: A Universal Visualization Grammar

Vega-Lite is now arguably the most widely used JSON-based visualization grammar. It is embedded in:

Altair (Python): covered throughout this chapter.
Observable notebooks: Vega-Lite is available through the vega-lite-api JavaScript library, alongside Observable's own Plot library.
JupyterLab: built-in rendering of Vega-Lite output as interactive charts.
Kibana (Elastic): its Vega visualization type accepts both Vega and Vega-Lite specs.
Voyager and Voyager 2 (also published as DataVoyager): visual analysis and chart recommendation tools that use Vega-Lite as their rendering engine and automatically suggest chart types based on the data.

The ecosystem around Vega-Lite is large and growing. Each new tool that adopts it extends the reach of grammar-of-graphics thinking. A user who learns Vega-Lite (through Altair, through Observable, or through direct JSON editing) can transfer their knowledge across tools.

This is how visualization tooling progresses: not by a single dominant library taking over, but by a shared vocabulary spreading across many implementations. The grammar of graphics is the vocabulary. ggplot2, Vega-Lite, and Altair are implementations in different languages. The vocabulary is what matters.

Discussion Questions

On academic research producing commercial tools. The UW lab's Vega/Vega-Lite work succeeded commercially in a way that most academic research does not. What factors contributed to this success? Could the model be replicated?
On shared vocabulary. The chapter argues that grammar-of-graphics thinking is infectious and that shared vocabulary matters more than individual libraries. Do you agree? Can you think of counterexamples?
On JSON as an API. Vega-Lite is a JSON specification language that Altair wraps. Is JSON a good API surface, or is Altair's Python layer doing important work that JSON alone could not?
On the 5,000-row limit. Altair enforces a default 5,000-row limit because, by default, it embeds data directly in the JSON spec; Vega-Lite itself imposes no such cap. Is this an acceptable limitation, or should it be fixed at the library level?
On cross-language portability. A Vega-Lite spec written in Python (via Altair) can be loaded and rendered in JavaScript, in R (via the vegawidget package), or in any other language that parses JSON. What are the benefits of this portability? What are the costs?
On your own use. After reading this chapter, are you more likely to use Altair for your next project? What would it take to convince you to make it your default visualization library?

The UW Interactive Data Lab's Vega and Vega-Lite are some of the most successful academic visualization projects of the 2010s. They produced a grammar of graphics that works on the web, integrates with Python through Altair, and serves as infrastructure for many other tools. When you write an Altair chart, you are using Vega-Lite through a Python wrapper — and the Python wrapper is itself a form of translation from one grammar-of-graphics dialect (Altair's Pythonic method-chain style) to another (Vega-Lite's JSON-spec style). The grammar underlies all of it.