Case Study 1: The Clinical Trial Figure Wars

In the past decade, a slow revolution has been happening in scientific figure conventions. The bar-with-error-bar chart — the classic "dynamite plot" — is being replaced by charts that show individual data points. This case study walks through the history, the reasons, and the specific design changes.


The Situation

For decades, scientific figures for group comparisons used one standard format: a bar chart showing the mean for each group, with an error bar extending above and/or below the bar showing one standard error or one standard deviation. The resulting "dynamite plot" was the universal language of biomedical publication. If you picked up a random clinical trial paper from 1990, 2000, or 2010, the figures would almost all be bar + error bar. The format was so standard that many journals required it.

Starting around 2012, this changed. A series of influential papers, blog posts, and Twitter threads made the case that dynamite plots hide information and should be replaced with charts that show individual data points. The critique was not new — statisticians had been making it for decades — but around 2012-2015, it reached a critical mass in the biomedical publication community. Papers started adopting strip+box plots, violin plots, and overlaid raw data. Journals started accepting (and some recommending) these alternatives.

By 2020, the transition was well underway but not complete. Many papers still used dynamite plots; many journals still accepted them without comment. But the conversation had shifted — if you produced a bar-with-error-bar chart for a prominent paper, reviewers were likely to suggest an alternative. "Show the data" had become a mantra in scientific visualization.

This case study walks through the history of the critique, examines why it took so long to take hold, and shows specific before-and-after redesigns using seaborn. The goal is not just to learn the alternative charts but to understand why the shift happened and what lessons it teaches about scientific figure conventions.

The Dynamite Plot and Its Critics

The dynamite plot format has a specific anatomy:

  • One bar per group, with the bar height representing the mean (or sometimes the median).
  • An error bar extending above (and sometimes below) the bar.
  • The error bar usually shows one standard error of the mean (SEM) but sometimes shows one standard deviation (SD) or a 95% confidence interval.
  • Optional significance asterisks indicating statistical test results between groups.

The format is visually simple and easy to produce. For two-group comparisons with reasonable sample sizes, it works adequately: the reader sees that group A has a larger mean than group B, with some indication of uncertainty. The usual reading heuristic (non-overlapping error bars suggest the groups differ; heavily overlapping bars suggest they do not) is rough at best, since error-bar overlap does not map cleanly onto statistical significance.

The critiques started in academic statistics journals in the 1980s. Statisticians pointed out that dynamite plots hide the actual distribution of data:

  • Sample size is invisible. A bar with a small error bar might come from 10 points or 10,000. The chart does not say.
  • Distribution shape is invisible. A symmetric bell curve and a bimodal distribution can produce identical bar + error bar charts.
  • Outliers are invisible. The mean is pulled toward extremes; the chart does not show them.
  • Assumption of symmetric spread. The error bar is symmetric, implying the data is symmetric, but skewed data is common.
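The second point is easy to demonstrate. The two samples below are engineered (with invented parameters) so that a unimodal and a strongly bimodal distribution report nearly identical means and standard errors, which is all a dynamite plot would show:

```python
import numpy as np

rng = np.random.default_rng(1)
# A unimodal normal and a bimodal mixture, tuned to share
# (approximately) the same mean and overall spread.
unimodal = rng.normal(10.0, 3.04, 1000)
bimodal = np.concatenate([rng.normal(7.0, 0.5, 500),
                          rng.normal(13.0, 0.5, 500)])

def summary(x):
    """Mean and standard error of the mean: all a dynamite plot shows."""
    return x.mean(), x.std(ddof=1) / np.sqrt(len(x))

for name, x in [("unimodal", unimodal), ("bimodal", bimodal)]:
    mean, sem = summary(x)
    print(f"{name}: mean={mean:.2f} sem={sem:.2f}")
```

A histogram or strip plot of the same two samples would look nothing alike, yet their bar + error bar charts would be visually indistinguishable.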

These critiques existed in the statistical literature but did not reach the biomedical publication community. Biomedical authors and reviewers continued to use dynamite plots because they were the standard, and the few alternatives (box plots) were seen as harder to read for non-statistical readers.

The Turning Point: 2012-2015

Around 2012, several influential pieces pushed the critique into the mainstream:

Weissgerber et al. (2015) "Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm" (PLOS Biology). This paper, by researchers at the Mayo Clinic, directly critiqued dynamite plots in biomedical publication and proposed alternatives. The paper was widely cited and discussed in biomedical Twitter and blog communities. It included specific before-and-after examples and a call to journals to change their figure requirements.

The #BarBarPlots Twitter campaign (2016), a hashtag-based movement led by statisticians and data journalism practitioners, argued against the use of bar plots for continuous data. The campaign included templates, redesigns, and a Kickstarter campaign to print posters for display in scientific buildings.

Individual researcher blog posts from Bang Wong, Alberto Cairo, and others advocated for showing the raw data. These posts were shared widely and influenced how researchers thought about figures.

Journal editor position pieces in Nature, The Lancet, and similar high-impact journals began suggesting that authors should show individual observations where possible. Some journals changed their figure guidelines to explicitly recommend strip plots or box plots over dynamite plots.

By 2015, the conversation had shifted. Dynamite plots were still common, but they were no longer the unquestioned default. Reviewers would suggest alternatives. Papers that adopted the alternatives were seen as modern; papers that stuck with dynamite plots were seen as outdated.

Why It Took So Long

The critique was available for decades before it took hold. Why?

1. Convention inertia. Scientific publication conventions are slow to change. Reviewers, editors, and authors had all learned dynamite plots as the standard, and changing meant unlearning a familiar format. The inertia is not malicious; it is just how conventions work.

2. Software defaults. GraphPad Prism, the most popular scientific plotting software in biomedical research, produced dynamite plots by default. Authors who used Prism (millions of them) got dynamite plots without making an active choice. The software was not at fault, but its defaults shaped the visual conventions.

3. Space constraints in journals. Dynamite plots are compact. A four-panel figure with four dynamite plots fits in a small space. A four-panel figure with strip+box combinations needs more space. Journals that limited figure area implicitly favored the more compact format.

4. Reader familiarity. Dynamite plots are easy to read at a glance if you know the convention. For rapid review of many papers, readers are used to decoding them. Strip+box plots require a few more seconds of attention, even though they convey more information.

5. Lack of good alternative tools. Before seaborn (released 2012) and ggplot2's wider adoption, producing strip+box combinations in Python or R required more code than producing dynamite plots. The tooling friction reinforced the conventional choice.

6. Disagreement about what to show. Even among critics of dynamite plots, there was not unanimous agreement on the right alternative. Some preferred strip plots, some preferred box plots, some preferred violin plots. Without a clear single alternative, the old format persisted.

The Modern Alternatives

By 2020, several alternatives had emerged as modern replacements for dynamite plots. In seaborn:
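The snippets below assume matplotlib and seaborn are imported in the usual way and that `trial` is a tidy DataFrame with one row per participant. A minimal synthetic stand-in (the column names match the snippets; the numbers are invented) might look like this:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

rng = np.random.default_rng(42)
# Synthetic two-arm trial in tidy ("long") format:
# one row per participant, one column per variable.
trial = pd.DataFrame({
    "treatment": ["Control"] * 40 + ["Drug"] * 40,
    "outcome": np.concatenate([rng.normal(10, 2, 40),
                               rng.normal(12, 2.5, 40)]),
})
```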

Alternative 1: Strip + Box (Publication-Style)

fig, ax = plt.subplots(figsize=(6, 4))
# Box layer: median and quartiles. Outlier fliers are suppressed because
# the strip layer below draws every point anyway.
sns.boxplot(data=trial, x="treatment", y="outcome", showfliers=False,
            boxprops=dict(alpha=0.3, edgecolor="black"), ax=ax)
# Strip layer: every individual observation, jittered to reduce overlap.
sns.stripplot(data=trial, x="treatment", y="outcome",
              color="black", size=6, alpha=0.7, ax=ax)
ax.set_title("Outcome by Treatment")

This is the biomedical-publication-style replacement. The box plot shows the median and quartiles; the strip plot shows every individual observation. The box is made semi-transparent with alpha=0.3 so the points remain visible, and showfliers=False prevents outliers from being drawn twice (once as box-plot fliers, once as strip-plot points).

Alternative 2: Violin + Strip (Shape-Aware)

fig, ax = plt.subplots(figsize=(6, 4))
sns.violinplot(data=trial, x="treatment", y="outcome", inner=None, ax=ax)
sns.stripplot(data=trial, x="treatment", y="outcome",
              color="black", size=5, alpha=0.7, ax=ax)
ax.set_title("Outcome by Treatment")

The violin shows the distribution shape; the strip plot shows individual observations. inner=None removes the violin's default interior markings (a miniature box plot) so the points are not obscured.

Alternative 3: Swarm + Box (Small-Sample-Friendly)

fig, ax = plt.subplots(figsize=(6, 4))
sns.boxplot(data=trial, x="treatment", y="outcome", showfliers=False,
            width=0.5, boxprops=dict(alpha=0.3), ax=ax)
sns.swarmplot(data=trial, x="treatment", y="outcome",
              color="black", size=4, ax=ax)
ax.set_title("Outcome by Treatment")

Swarm plots are cleaner than strip plots for small samples because they avoid overlap without needing jitter. For 20-100 observations per group, this is often the cleanest option.

Alternative 4: Scatter with Mean Overlay

fig, ax = plt.subplots(figsize=(6, 4))
sns.stripplot(data=trial, x="treatment", y="outcome",
              size=8, alpha=0.7, ax=ax)
sns.pointplot(data=trial, x="treatment", y="outcome",
              estimator="mean", errorbar=("ci", 95),
              markers="D", color="#d62728", ax=ax)
ax.set_title("Outcome by Treatment (Mean ± 95% CI)")

The strip plot shows individual observations; the point plot overlays the mean as a diamond with a 95% CI error bar. This is the closest of the four to the dynamite plot format, but it keeps the individual observations visible beneath the summary overlay.

All four alternatives show more information than a dynamite plot for the same data. The choice among them depends on the specific audience (journal conventions matter), the sample size (small samples prefer strip or swarm; larger samples prefer box or violin), and the aesthetic preference.
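The sample-size rule of thumb above can be encoded as a small helper. This is a sketch of one possible policy; the function name and thresholds are illustrative assumptions, not a published standard:

```python
def pick_overlay(n_per_group: int) -> str:
    """Suggest a seaborn point layer for a group-comparison figure.

    Rule of thumb from the text: swarm for small samples (points stay
    non-overlapping), strip for medium samples, and a shape summary
    like a violin once individual points would just smear together.
    Thresholds are illustrative, not canonical.
    """
    if n_per_group <= 100:
        return "swarmplot"   # deterministic offsets, no overlap
    if n_per_group <= 1000:
        return "stripplot"   # jittered points, denser but still legible
    return "violinplot"      # too many points; show the shape instead

print(pick_overlay(30))    # small pilot study
print(pick_overlay(5000))  # registry-scale data
```

In practice this decision also depends on journal conventions and co-author preferences, so treat the helper as a starting point for discussion rather than a rule.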

Lessons for Practice

The dynamite-plot-to-strip+box transition is a specific example of a general pattern: scientific figure conventions can change, slowly, when a critical mass of practitioners advocates for better alternatives.

1. Convention inertia is real but not immovable. Changing an established convention takes years of sustained advocacy, but it is possible. If you believe a figure convention in your field is suboptimal, you can contribute to the change by using the alternative in your own work and explaining why when asked.

2. Tool defaults shape conventions. seaborn's default for sns.barplot is a dynamite plot (bar + CI), but this is arguably a design mistake. Matplotlib's ax.bar does not include error bars by default, which is more neutral. Tool authors should be aware that defaults shape conventions at scale.

3. Show the data. The single most important lesson from the critique is that showing individual observations reveals information that summaries hide. Whenever you produce a group comparison chart, ask whether showing the data would add useful information. Usually the answer is yes.

4. Sample size should be visible. A chart of a small experiment (n=10) should look different from a chart of a large experiment (n=10,000), even if the summary statistics are the same. Strip plots make the sample size visually obvious; bar plots hide it.

5. Journals change slowly. The transition from dynamite plots to strip+box combinations took a decade to reach mainstream acceptance and is still ongoing. If you are early in your career, consider using modern alternatives and making the case to your co-authors and reviewers. Change happens one paper at a time.

6. Not every critique takes hold. The dynamite plot critique succeeded because the alternatives were demonstrably better, the tools supported them, and a critical mass of advocates shared the message. Other critiques (against pie charts, 3D bar plots, rainbow colormaps) have succeeded to varying degrees. Not every critique will; choose where to spend your advocacy carefully.


Discussion Questions

  1. On convention inertia. Dynamite plots persisted for decades despite the critique. What would it take to change a different convention — say, the use of Excel for scientific figures, or the use of JPEG instead of PDF for publication?

  2. On tool defaults. sns.barplot defaults to showing a mean with a 95% CI — a dynamite plot. Should seaborn change its default to something better? What would the trade-offs be for existing users?

  3. On reviewers. In your field, would a reviewer today accept a dynamite plot for a group comparison, or would they suggest a strip+box alternative? How has the convention shifted in the last 5-10 years?

  4. On journal requirements. Some journals now recommend or require showing individual data points in group comparison figures. Is this a good use of editorial authority, or should the choice be left to authors?

  5. On your own practice. Think of the last group comparison chart you produced. Was it a dynamite plot? Would a strip+box alternative have been more informative? If not, why did you choose the format you did?

  6. On the general principle. Beyond dynamite plots, what other visualization conventions in scientific publication do you think should change? What alternatives would you propose?


The transition from dynamite plots to charts that show individual data points is a small example of how scientific conventions can change. The critique was around for decades; the tools to produce better alternatives existed for years before the shift; but it took coordinated advocacy from influential voices to reach critical mass. This is how visualization conventions actually change: slowly, through accumulated pressure from practitioners and editors who refuse to accept the status quo. The modern strip+box combination is the product of that change. When you produce one, you are participating in the practice that replaced the old convention.