Case Study 2: Bar Chart Races and the Limits of matplotlib Animation

DataField.Dev


In 2019, animated bar charts showing the rise and fall of countries' GDP, cities' populations, or YouTube subscriber counts went viral. The "bar chart race" format exploded across social media, spawned a Twitter account that specialized in them, and triggered a design controversy about whether the format was informative or manipulative.

The Situation

Sometime in early 2019, animated "bar chart races" started appearing on Twitter and YouTube. The format: a horizontal bar chart where the bars (representing countries, companies, or cities ranked by some metric) move up and down over time as values change. The bars re-sort each frame so that the largest is always at the top. Over 30-60 seconds, the viewer watches the ranking evolve: bars climb, bars fall, bars leapfrog each other, and the story of "who was number one when" unfolds.

The most-watched example was probably a YouTube video showing the "Most Popular Programming Languages 2004-2019" — a bar chart race of Stack Overflow tag usage that received millions of views. Within weeks, bar chart races were everywhere: GDP rankings, movie box office, subscriber counts, company valuations, social media platforms, city populations, internet browsers. A Twitter account called @BarChartRace started producing them automatically from viewer requests.

The format was compelling for several reasons. It was novel. It was short enough to hold attention. The competitive, up-down-leapfrog dynamics felt dramatic. And it revealed the historical flow of ranking changes in a way that a static chart could not.

It was also controversial. Some data visualization practitioners criticized the format as misleading. The rapid motion made it hard to read specific values. The re-sorting obscured the actual magnitude of changes. The viewer absorbed a vague sense of "things moved around" but could not extract specific facts. And the emphasis on "the race" implied competition in contexts where the underlying data was not actually competitive.

This case study examines the bar chart race format from both angles: how to build one in matplotlib, and what the design critique teaches about when animation is and is not appropriate.

The Data

A bar chart race needs time-indexed ranking data. For this case study, imagine GDP per capita for 10 countries over 35 years (1990 through 2024). The data would be a pandas DataFrame indexed by year, with one column per country:

import numpy as np
import pandas as pd

np.random.seed(42)
countries = ["USA", "China", "Japan", "Germany", "UK", "France", "India", "Brazil", "Russia", "Canada"]
years = list(range(1990, 2025))  # 35 years: 1990 through 2024

# Synthetic GDP-per-capita levels (NOT real data): each series starts just
# above 10,000 and adds a random increment of 200-1,200 per year.
data = pd.DataFrame(
    {c: np.cumsum(np.random.rand(len(years)) * 1000 + 200) + 10000 for c in countries},
    index=years,
)

Each row is a year; each column is a country. Each cell is the (synthetic) GDP per capita for that country in that year. The goal of the animation is to show how the rankings change over time.
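Real datasets often arrive in "long" format, with one row per (year, country) observation, rather than in the wide layout above. A minimal sketch of the reshaping step, using a hypothetical table with made-up values and placeholder names `A`, `B`, `C`, is a single `pivot` call:

```python
import pandas as pd

# Hypothetical long-format table with made-up values: one row per
# (year, country) observation.
long_df = pd.DataFrame({
    "year":    [2000, 2000, 2000, 2001, 2001, 2001],
    "country": ["A", "B", "C", "A", "B", "C"],
    "value":   [100.0, 120.0, 90.0, 110.0, 115.0, 130.0],
})

# Pivot to the wide layout the animation expects:
# index = year, one column per country, each cell = the metric.
wide = long_df.pivot(index="year", columns="country", values="value")
wide.columns.name = None  # drop the leftover "country" label on the column index

# A country missing for some year becomes NaN after the pivot.
# Linear interpolation along the time axis fills interior gaps
# (leading gaps stay NaN); here there are none, so this is a no-op.
wide = wide.interpolate(axis=0)

print(wide)
```

Whether to interpolate, drop, or leave gaps visible is a judgment call; interpolated values shown as exact annotations can overstate what the data knows.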

The Animation

Building a bar chart race in matplotlib requires more work than the simple FuncAnimation examples from Section 15.2 because the bars need to re-sort each frame. Here is a complete implementation:

import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

fig, ax = plt.subplots(figsize=(10, 6))

# Color palette: one fixed color per country, keyed by name, so a country
# keeps its color when it changes rank
colors = plt.cm.tab10(np.arange(len(countries)))
country_colors = dict(zip(countries, colors))

# Fixed x-axis range for every frame, so bar lengths are comparable across years
x_max = data.max().max() * 1.15

def update(frame):
    ax.clear()  # wipe the previous frame; the sort order changes every year

    year = years[frame]
    # Ascending sort: barh draws the first bar at the bottom, so the largest ends up on top
    year_data = data.loc[year].sort_values(ascending=True)
    positions = np.arange(len(year_data))

    # Draw bars at explicit y positions and label them with the country names
    bar_colors = [country_colors[c] for c in year_data.index]
    ax.barh(positions, year_data.values, color=bar_colors)
    ax.set_yticks(positions, labels=year_data.index)

    # Annotate each bar with its value, just past the end of the bar
    for y, value in zip(positions, year_data.values):
        ax.text(value + x_max * 0.01, y, f"{value:,.0f}", va="center", fontsize=9)

    # Style
    ax.set_xlim(0, x_max)
    ax.set_title(f"GDP per Capita, {year}", fontsize=14, loc="left", fontweight="semibold")
    ax.set_xlabel("GDP per Capita (USD)")
    ax.spines["top"].set_visible(False)
    ax.spines["right"].set_visible(False)

ani = FuncAnimation(fig, update, frames=len(years), interval=500, blit=False)
ani.save("bar_chart_race.gif", writer="pillow", fps=2)  # the "pillow" writer needs the Pillow package

Walk through the key decisions:

1. ax.clear() at the start of each frame. Unlike the line-chart animations from Section 15.2, the bar chart race redraws the entire chart each frame. This is because the sort order changes: bars move positions, and it is simpler to redraw than to animate each bar's position.

2. Sort data for each frame. data.loc[year].sort_values(ascending=True) gets the current year's values and sorts them so the largest is at the top of the horizontal bar chart (remember: barh reads from bottom to top, so ascending sort gives largest-at-top).

3. Color by country, not by rank. Each country has its own color, tracked in the country_colors dictionary. The colors persist across frames even as the bars move positions, so the viewer can visually track a specific country through the animation.

4. Annotate each bar with its value. The text annotation shows the exact number, mitigating the "cannot read specific values" criticism of bar chart races. Without this, the animation shows motion without quantities.

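5. Slow interval (500 ms per frame, 2 fps). Unlike smooth animations, which interpolate between states at many frames per second, this version shows one discrete frame per year, so each frame must stay on screen long enough to be read. The interval controls live playback and fps controls the saved GIF, so the two are set to agree. At 2 fps, the 35-year animation plays for 17.5 seconds. Holding each year for one to three seconds would make individual rankings easier to read but would stretch the video to between 35 and 105 seconds.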

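6. blit=False. Blitting redraws only the artists that update returns, on top of a cached background, which assumes everything else (axes, ticks, labels) stays fixed. Here the bars, tick labels, and title all change every frame, and ax.clear() rebuilds the axes from scratch, so blitting has nothing to save. The animation is slower to render, but for 35 frames it is still acceptable.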
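Point 1 notes that redrawing is simpler than animating each bar's position. The smooth transitions seen in many viral bar chart races come from interpolating between time steps. A minimal sketch follows; the helper name `interpolate_frames` is hypothetical, and linear interpolation is an assumption (eased curves are a common alternative). It expands the year-by-country table into several frames per year, interpolating both the values and each bar's vertical slot so bars slide past each other instead of jumping:

```python
import numpy as np
import pandas as pd


def interpolate_frames(data: pd.DataFrame, steps: int = 10):
    """Expand a year-by-entity table into `steps` frames per year.

    Returns (values, positions), both indexed by fractional year:
      values    - linearly interpolated metric values
      positions - interpolated bar slots (0 = bottom of a barh chart),
                  so a bar slides between ranks instead of jumping.
    """
    data = data.astype(float)
    # Rank within each row: smallest value -> slot 0 (bottom), largest -> top.
    # method="first" breaks ties by column order so every slot is unique.
    slots = data.rank(axis=1, method="first") - 1

    n = len(data)
    old_pos = np.arange(n)
    new_pos = np.linspace(0, n - 1, (n - 1) * steps + 1)
    new_index = np.interp(new_pos, old_pos, data.index.to_numpy(dtype=float))

    def expand(df: pd.DataFrame) -> pd.DataFrame:
        cols = {c: np.interp(new_pos, old_pos, df[c].to_numpy()) for c in df.columns}
        return pd.DataFrame(cols, index=new_index)

    return expand(data), expand(slots)


# Tiny illustration with made-up values: A overtakes B between 2000 and 2001.
demo = pd.DataFrame({"A": [1.0, 3.0], "B": [2.0, 2.0]}, index=[2000, 2001])
values, positions = interpolate_frames(demo, steps=2)
print(positions)
```

To use it with the animation above, call `values, positions = interpolate_frames(data)`, draw `ax.barh(positions.iloc[frame], values.iloc[frame], ...)` inside `update` instead of sorting, and pass `frames=len(values)`. With the default `steps=10`, an `interval` of 50 ms keeps each year on screen for about half a second, as in the original. Because slots are fractional mid-transition, labeling each bar with `ax.text` at its own position works better than fixed y ticks, and the title year can be taken as `int(values.index[frame])`.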

The result is a short animated GIF that shows how GDP rankings changed over 35 years. Countries rise and fall, colors persist across frames so the viewer can track specific countries, and value annotations give concrete numbers at each step.

The Critique

Despite the viral popularity of bar chart races, serious data visualization practitioners raised several objections.

1. Specific values are hard to read during motion. Unless the animation is paused or the annotation is very clear, viewers absorb the motion but not the numbers. A static small multiple (one panel per year, or one panel per 5 years) would let viewers read specific values at their own pace.

2. The changing sort order obscures magnitudes. In a static chart, a bar's position encodes its rank; the length encodes its magnitude. In a bar chart race, both position and length change simultaneously, and viewers struggle to separate the two. At playback speed, a country that climbs because its own value grew is easily confused with one that climbs only because the countries above it shrank.

3. The "race" framing implies competition where there may be none. GDP per capita is not a zero-sum competition among countries. Ranks are zero-sum (when one country moves up a place, another moves down), but the values behind them are not: a country can lose rank while its own GDP per capita keeps rising. The racing metaphor imposes a competitive frame on data that is not inherently competitive.

4. The format prioritizes drama over information. The up-down-leapfrog dynamics are entertaining but do not necessarily align with what the data shows. A slow, steady trend looks less "exciting" than a dramatic rank change, but the slow trend may be the more important finding.

5. Change blindness defeats close reading. The rapid motion (even at 1-2 fps) is too fast for viewers to carefully examine individual changes. The format rewards glance-viewing, which means it communicates general impressions but not specific facts.
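Critiques 2 and 3 both turn on the fact that rank and value can move in opposite directions. A minimal sketch (the helper name `value_up_rank_down` is hypothetical) flags every year-and-country cell where the value rose from the previous year while the rank fell, which is exactly the case a bar chart race makes look like a loss:

```python
import pandas as pd


def value_up_rank_down(data: pd.DataFrame) -> pd.DataFrame:
    """Flag (year, entity) cells where the value rose but the rank fell.

    Rank 1 = largest value in that year, so a larger rank number is a
    lower position on the chart.
    """
    ranks = data.rank(axis=1, ascending=False)
    value_rose = data.diff() > 0   # compared with the previous year (first row: False)
    rank_fell = ranks.diff() > 0   # rank number went up = moved down the chart
    return value_rose & rank_fell


# Made-up values: both entities grow, but A grows faster and overtakes B.
demo = pd.DataFrame({"A": [100.0, 150.0], "B": [120.0, 125.0]}, index=[2000, 2001])
flags = value_up_rank_down(demo)
print(flags)
```

In the demo, B's value grows from 120 to 125, yet it drops from first to second place. Calling `value_up_rank_down(data).sum()` on the case study's synthetic table would count, per country, how often a bar slid down while its value grew.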

Critics (including Alberto Cairo, who wrote publicly about the format) argued that bar chart races are a form of "eye candy": visually engaging but not genuinely informative. For the same data, a line chart with one line per country over time (on shared axes, or as small multiples with one panel per country) would be more informative because viewers could compare trajectories directly without waiting for an animation to play out.
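The static alternative the critics propose takes less code than the animation. A minimal sketch, assuming the same synthetic data and identity-based colors as the case study (the output filename `gdp_lines.png` is arbitrary), draws one line per country on shared axes and labels each line at its right end instead of using a legend:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Same synthetic data as the case study (NOT real GDP figures)
np.random.seed(42)
countries = ["USA", "China", "Japan", "Germany", "UK", "France", "India", "Brazil", "Russia", "Canada"]
years = list(range(1990, 2025))
data = pd.DataFrame(
    {c: np.cumsum(np.random.rand(len(years)) * 1000 + 200) + 10000 for c in countries},
    index=years,
)

# Same identity-based colors as the animation
country_colors = dict(zip(countries, plt.cm.tab10(np.arange(len(countries)))))

fig, ax = plt.subplots(figsize=(10, 6))
for country in countries:
    series = data[country]
    ax.plot(series.index, series.values, color=country_colors[country], linewidth=1.5)
    # Direct label at the end of each line instead of a legend
    ax.text(series.index[-1] + 0.5, series.iloc[-1], country,
            color=country_colors[country], va="center", fontsize=9)

ax.set_xlim(years[0], years[-1] + 4)  # leave room for the end labels
ax.set_title("GDP per Capita, 1990-2024 (synthetic data)", fontsize=14, loc="left", fontweight="semibold")
ax.set_ylabel("GDP per Capita (USD)")
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
fig.savefig("gdp_lines.png", dpi=150, bbox_inches="tight")
```

Because the synthetic series end close together, some end labels may overlap; with real data, nudging labels apart or splitting into one panel per country are the usual fixes.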

The Defense

Defenders of bar chart races argued the opposite: the format reaches audiences that static charts do not, provides an engaging entry point to data, and creates a sense of historical flow that line charts cannot. The criticism that specific values are hard to read is valid but misses the point: bar chart races are not meant for reading specific values; they are meant to convey the general dynamics of ranking change.

For public engagement with data, the defense goes, the bar chart race is an effective format even if it sacrifices some precision. A viewer who watches a bar chart race comes away with a better intuition for "how things changed" than a viewer who reads a static chart, even if the former cannot cite specific numbers. The engagement is the point.

Both sides have merit. The format is right for some audiences and wrong for others. The design controversy is not about whether bar chart races are "good" or "bad" in an absolute sense but about what specific trade-offs they make and whether those trade-offs are appropriate for a given use case.

Lessons for Practice

1. Animation can reach new audiences. Bar chart races went viral partly because they were novel and engaging. For audiences who do not usually look at data visualization, the format provided a hook. This is a legitimate value, even if the format sacrifices precision.

2. But animation can also obscure. The criticism is also legitimate: rapid motion makes close reading harder, and the "race" framing imposes meanings the data may not support. Know the trade-off and make the choice deliberately.

3. Specific values require annotation. If you produce a bar chart race, annotate each bar with its specific value. This partially mitigates the "cannot read values" criticism. Without annotations, the animation becomes pure motion with no numbers.

4. Colors should track identity, not rank. Give each country (or entity) its own color and keep it consistent across frames. This lets viewers track specific entities through the motion. Color-by-rank would be confusing because the colors would change as the ranks change.

5. Slow down for readability. Each time step must stay on screen long enough for viewers to read it: the example above holds each year for half a second, and one to three seconds per step is easier still to read. Rushing through the steps makes the format unreadable.

6. Consider the alternative. Before committing to a bar chart race, consider whether a small multiple (one panel per time point) or a line chart (all series over time) would serve better. For most data, a line chart is more informative and requires no animation. Use the bar chart race only when the ranking dynamics are the primary story.

7. Know when to say no. Some data is not appropriate for the bar chart race format. Non-competitive data, data with many ties, data where specific values matter more than ranks — these are poorly served by a bar chart race. Recognize these cases and choose a different format.
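Lessons 4 and 6 combine naturally in a static small multiple: one sorted bar chart per snapshot year, with each country keeping its color across panels. A minimal sketch, assuming the case study's synthetic data, a five-year spacing between panels, and an arbitrary output filename:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Same synthetic data as the case study (NOT real GDP figures)
np.random.seed(42)
countries = ["USA", "China", "Japan", "Germany", "UK", "France", "India", "Brazil", "Russia", "Canada"]
years = list(range(1990, 2025))
data = pd.DataFrame(
    {c: np.cumsum(np.random.rand(len(years)) * 1000 + 200) + 10000 for c in countries},
    index=years,
)

snapshot_years = years[::5]  # 1990, 1995, ..., 2020
country_colors = dict(zip(countries, plt.cm.tab10(np.arange(len(countries)))))

# sharex=True puts every panel on the same value scale, so bar lengths compare
fig, axes = plt.subplots(1, len(snapshot_years), figsize=(16, 4), sharex=True)
for ax, year in zip(axes, snapshot_years):
    row = data.loc[year].sort_values()  # ascending: largest bar on top
    positions = np.arange(len(row))
    ax.barh(positions, row.to_numpy(), color=[country_colors[c] for c in row.index])
    ax.set_yticks(positions, labels=row.index, fontsize=8)
    ax.set_title(str(year), loc="left")
    for side in ("top", "right"):
        ax.spines[side].set_visible(False)

fig.tight_layout()
fig.savefig("gdp_small_multiples.png", dpi=150)
```

Every state is visible at once, so viewers can compare any two years at their own pace, which is the main advantage the critique attributes to the static form.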

Discussion Questions

On the "race" framing. The bar chart race format implies competition. For data that is not actually competitive (GDP per capita, population counts), is this framing misleading? Does the metaphor distort the data?
On the trade-off between engagement and precision. Bar chart races engage audiences but sacrifice precision. Is this a worthwhile trade-off for public communication, or does it teach audiences to expect drama over information?
On value annotations. The case study's implementation annotates each bar with its value. Does this mitigate the "cannot read values" criticism effectively, or does the motion still overwhelm the annotations?
On the alternative static version. For the same GDP data, a small multiple or a line chart would be more informative but less viral. Which form serves readers better in the long run?
On production effort. A bar chart race takes more matplotlib code than a static chart. Is the extra production effort justified by the extra engagement? How would you decide for a specific project?
On the boundary of appropriate use. Name a specific dataset where a bar chart race would be the right choice. Name one where it would be wrong. What distinguishes them?

The bar chart race format is a useful illustration of the general principle that animation is not strictly better than static visualization — it is different, with different trade-offs. For engagement with public audiences, the format works. For serious analytical use, static charts are usually better. The critique and the defense both have merit, and the chart maker has to decide which side applies to their specific context. The matplotlib code to build one is straightforward; the harder question is whether it is the right chart for the job.