Learning Objectives
- Locate the appropriate chart type for any data question by browsing the gallery's visual index
- Reproduce any of the 50 chart types from the provided code templates, adapting them to your own data
- Compare chart types within each category and select the most appropriate for a given context
- Identify the library (matplotlib, seaborn, Plotly, Altair) best suited for each chart type
- Use the quick-reference decision table to map from question to chart type in under 30 seconds
In This Chapter
- 35.1 How to Use This Gallery
- 35.2 Comparison
- 35.3 Composition
- 35.4 Distribution
- 35.5 Relationship
- 35.6 Trend
- 35.7 Geospatial
- 35.8 Flow and Network
- 35.9 Part-to-Whole and Ranking
- 35.10 Specialized
- 35.11 Quick-Reference Decision Table
- Check Your Understanding
- Chapter Summary
- A Final Word: From Blank Canvas to Informed Choice
Chapter 35: The Visualization Gallery
"The simple graph has brought more information to the data analyst's mind than any other device." — John W. Tukey
This is the chapter you keep open on your second monitor. It is not meant to be read front to back like the previous thirty-four chapters. It is a reference — a lookup table where the key is the question you are trying to answer and the value is the chart that answers it, complete with working code.
Fifty chart types. Nine question categories. Every code example self-contained and copy-paste-ready. No narrative preamble, no extended theory. You already have theory — Parts I and II gave you perception science, design principles, and a decision framework. This chapter gives you the recipes.
35.1 How to Use This Gallery
Visual Index
The gallery is organized by the question your visualization answers, following the framework introduced in Chapter 5:
| Question Category | Section | Chart Types | Count |
|---|---|---|---|
| Comparison | 35.2 | Bar, grouped bar, stacked bar, lollipop, dot plot, dumbbell, slope, bump | 8 |
| Composition | 35.3 | Stacked bar %, pie, donut, treemap, sunburst, waffle | 6 |
| Distribution | 35.4 | Histogram, KDE, ECDF, box, violin, strip, swarm, ridgeline | 8 |
| Relationship | 35.5 | Scatter, bubble, connected scatter, heatmap, correlogram, hexbin | 6 |
| Trend | 35.6 | Line, area, stacked area, sparkline, slope (temporal), candlestick | 6 |
| Geospatial | 35.7 | Choropleth, dot map, bubble map, hex tile map, cartogram | 5 |
| Flow and Network | 35.8 | Sankey, alluvial, chord, network, arc diagram | 5 |
| Part-to-Whole and Ranking | 35.9 | Waterfall, Marimekko, funnel | 3 |
| Specialized | 35.10 | Radar/spider, parallel coordinates, gauge | 3 |
Total: 50 chart types.
Question-Based Lookup
If you know the question, jump directly:
- "Which is bigger?" -- Section 35.2 (Comparison)
- "What share does each part have?" -- Section 35.3 (Composition)
- "How is the data spread?" -- Section 35.4 (Distribution)
- "Are these two things related?" -- Section 35.5 (Relationship)
- "How has this changed over time?" -- Section 35.6 (Trend)
- "Where is this happening?" -- Section 35.7 (Geospatial)
- "Where does this flow to?" -- Section 35.8 (Flow and Network)
- "How do the parts add to the whole?" -- Section 35.9 (Part-to-Whole and Ranking)
- "How do multiple dimensions compare for one entity?" -- Section 35.10 (Specialized)
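For readers who prefer to keep the lookup in code, the list above can be mirrored as a small dictionary. This is only an illustrative sketch: `QUESTION_TO_SECTION` and `find_section` are hypothetical names introduced here, not helpers used elsewhere in the chapter.

```python
# Hypothetical helper: each question from the list above mapped to
# its (section number, category name).
QUESTION_TO_SECTION = {
    "Which is bigger?": ("35.2", "Comparison"),
    "What share does each part have?": ("35.3", "Composition"),
    "How is the data spread?": ("35.4", "Distribution"),
    "Are these two things related?": ("35.5", "Relationship"),
    "How has this changed over time?": ("35.6", "Trend"),
    "Where is this happening?": ("35.7", "Geospatial"),
    "Where does this flow to?": ("35.8", "Flow and Network"),
    "How do the parts add to the whole?": ("35.9", "Part-to-Whole and Ranking"),
    "How do multiple dimensions compare for one entity?": ("35.10", "Specialized"),
}


def find_section(keyword):
    """Return (question, section, category) for every question containing keyword."""
    keyword = keyword.lower()
    return [
        (question, section, category)
        for question, (section, category) in QUESTION_TO_SECTION.items()
        if keyword in question.lower()
    ]


print(find_section("time"))
print(find_section("where"))
```

A plain substring match is deliberately simple; anything smarter (synonyms, fuzzy matching) is outside what this chapter needs.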
Entry Structure
Every chart type follows an identical structure for fast scanning:
- Name and one-line description
- When to use — the scenario where this chart shines (1-2 sentences)
- When NOT to use — the trap to avoid (1-2 sentences)
- Data requirements — column types and shape
- Recommended library — the library that makes this chart easiest
- Code example — minimal, self-contained, copy-paste-ready
- Design tips — 2-3 actionable bullets
Import Conventions
All code examples assume the following imports are available. Run this cell once before any example in this chapter:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
import seaborn as sns
Some examples additionally import Plotly or specialized libraries. Those imports are included inline within the specific code block. Every example ends with plt.show() or the Plotly equivalent so that the chart renders immediately.
35.2 Comparison
Comparison charts answer the question: "Which is bigger? How do groups differ?" Use these when your audience needs to judge relative magnitude across categories.
35.2.1 Bar Chart
One-line: Horizontal or vertical bars encoding a single quantitative value per category.
When to use: Comparing a single metric across a small-to-medium number of categories (3-20). The workhorse of data visualization. When in doubt, start here.
When NOT to use: When you have more than about 25 categories (the chart becomes a wall of bars), or when the data is continuous rather than categorical. Do not use a bar chart to show trends over time when a line chart would be clearer.
Data requirements: One categorical column, one numeric column.
Recommended library: matplotlib or seaborn.
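categories = ["Product A", "Product B", "Product C", "Product D", "Product E"]
values = [48, 35, 62, 21, 55]
# Sort descending so the largest bar ends up on top after invert_yaxis()
order = np.argsort(values)[::-1]
categories = [categories[i] for i in order]
values = [values[i] for i in order]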
fig, ax = plt.subplots(figsize=(8, 5))
ax.barh(categories, values, color="#4C72B0", edgecolor="none")
ax.set_xlabel("Revenue ($K)")
ax.set_title("Q3 Revenue by Product Line")
ax.invert_yaxis()
sns.despine(left=True, bottom=False)
plt.tight_layout()
plt.show()
Design tips:
- Horizontal bars are almost always better than vertical bars: they give room for long category labels without rotating text.
- Sort bars by value, not alphabetically, unless the categories have a natural order (months, ratings).
- Start the axis at zero. Always. Truncating the axis exaggerates differences and misleads the reader.
35.2.2 Grouped Bar Chart
One-line: Side-by-side bars comparing a metric across categories, split by a grouping variable.
When to use: Comparing two or three groups across the same categories. Excellent for before/after comparisons or contrasting a small number of segments.
When NOT to use: When you have more than three or four groups — the bars become too thin to read. Switch to small multiples or a heatmap instead.
Data requirements: One categorical column, one grouping column (2-4 levels), one numeric column.
Recommended library: seaborn or matplotlib.
data = pd.DataFrame({
    "Region": ["North", "South", "East", "West"] * 2,
    "Year": ["2023"] * 4 + ["2024"] * 4,
    "Sales": [120, 98, 145, 110, 135, 105, 160, 125]
})
fig, ax = plt.subplots(figsize=(8, 5))
sns.barplot(data=data, x="Region", y="Sales", hue="Year", palette="Set2", ax=ax)
ax.set_ylabel("Sales ($K)")
ax.set_title("Regional Sales: 2023 vs 2024")
ax.legend(title="Year")
sns.despine()
plt.tight_layout()
plt.show()
Design tips:
- Limit groups to 2-3 to keep the chart readable. Beyond that, consider faceting.
- Use a consistent color for each group across all your reports so readers build familiarity.
- Add value labels on each bar if precise numbers matter more than visual comparison.
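The value-label tip can be sketched as follows, assuming matplotlib 3.4 or newer, where `Axes.bar_label` is available. The sketch draws the bars with matplotlib directly rather than seaborn so that the two bar containers are easy to hold on to; the data are the same regional figures as above.

```python
import numpy as np
import matplotlib.pyplot as plt

regions = ["North", "South", "East", "West"]
sales_2023 = [120, 98, 145, 110]
sales_2024 = [135, 105, 160, 125]

x = np.arange(len(regions))  # one slot per region
width = 0.38                 # width of each bar in the pair

fig, ax = plt.subplots(figsize=(8, 5))
bars_2023 = ax.bar(x - width / 2, sales_2023, width, label="2023", color="#66C2A5")
bars_2024 = ax.bar(x + width / 2, sales_2024, width, label="2024", color="#FC8D62")

# bar_label writes each bar's height just above the bar
ax.bar_label(bars_2023, fmt="%d", padding=2)
ax.bar_label(bars_2024, fmt="%d", padding=2)

ax.set_xticks(x)
ax.set_xticklabels(regions)
ax.set_ylabel("Sales ($K)")
ax.set_title("Regional Sales: 2023 vs 2024")
ax.legend(title="Year", frameon=False)
plt.tight_layout()
plt.show()
```

With labels on every bar, the y-axis gridlines and even the tick labels become optional, which is one way to keep a labeled chart from feeling crowded.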
35.2.3 Stacked Bar Chart
One-line: Bars divided into segments showing how sub-groups contribute to each category's total.
When to use: When both the total and the breakdown matter. Good for showing how composition shifts across categories while preserving the ability to compare totals.
When NOT to use: When you need precise comparison of inner segments — only the bottom segment and the total have a common baseline, making middle segments hard to compare. Use a grouped bar or small multiples for precise segment comparison.
Data requirements: One categorical column, one grouping column, one numeric column.
Recommended library: matplotlib or pandas (via .plot(kind='bar', stacked=True)).
categories = ["Q1", "Q2", "Q3", "Q4"]
hardware = [30, 35, 28, 40]
software = [50, 55, 60, 58]
services = [20, 25, 30, 35]
fig, ax = plt.subplots(figsize=(8, 5))
ax.bar(categories, hardware, label="Hardware", color="#4C72B0")
ax.bar(categories, software, bottom=hardware, label="Software", color="#55A868")
ax.bar(categories, services, bottom=[h + s for h, s in zip(hardware, software)],
       label="Services", color="#C44E52")
ax.set_ylabel("Revenue ($M)")
ax.set_title("Quarterly Revenue by Segment")
ax.legend(loc="upper left")
sns.despine()
plt.tight_layout()
plt.show()
Design tips:
- Place the most important segment at the bottom so it shares the baseline and is easiest to compare.
- Use no more than 4-5 segments; beyond that, the chart becomes unreadable.
- Consider a 100% stacked bar (Section 35.3.1) if you care only about proportions, not totals.
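The pandas route mentioned under "Recommended library" can be sketched like this. It assumes the data are arranged wide, with one row per category and one column per segment; pandas then computes the `bottom` offsets itself.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Same quarterly figures as above, one column per segment
revenue = pd.DataFrame(
    {
        "Hardware": [30, 35, 28, 40],
        "Software": [50, 55, 60, 58],
        "Services": [20, 25, 30, 35],
    },
    index=["Q1", "Q2", "Q3", "Q4"],
)

fig, ax = plt.subplots(figsize=(8, 5))
# Columns are stacked in order, so the first column sits on the baseline
revenue.plot(kind="bar", stacked=True, ax=ax, rot=0,
             color=["#4C72B0", "#55A868", "#C44E52"])
ax.set_ylabel("Revenue ($M)")
ax.set_title("Quarterly Revenue by Segment")
ax.legend(loc="upper left")
plt.tight_layout()
plt.show()
```

Reordering the DataFrame columns is then the whole job of moving a different segment to the baseline.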
35.2.4 Lollipop Chart
One-line: A dot on a stick — a minimalist alternative to the bar chart that reduces visual clutter.
When to use: Anywhere you would use a bar chart but want a lighter, less ink-heavy appearance. Particularly effective with many categories (15-30) where dense bars create a heavy block of color.
When NOT to use: When your audience is unfamiliar with the format and expects traditional bars. In formal reports for non-technical stakeholders, bar charts remain safer.
Data requirements: One categorical column, one numeric column.
Recommended library: matplotlib.
categories = ["Dept A", "Dept B", "Dept C", "Dept D", "Dept E",
              "Dept F", "Dept G", "Dept H"]
scores = [82, 75, 91, 68, 88, 73, 95, 79]
# Sort for readability
order = np.argsort(scores)
categories = [categories[i] for i in order]
scores = [scores[i] for i in order]
fig, ax = plt.subplots(figsize=(8, 6))
ax.hlines(y=categories, xmin=0, xmax=scores, color="#4C72B0", linewidth=1.5)
ax.scatter(scores, categories, color="#4C72B0", s=80, zorder=3)
ax.set_xlabel("Satisfaction Score")
ax.set_title("Employee Satisfaction by Department")
ax.set_xlim(0, 100)
sns.despine(left=True)
plt.tight_layout()
plt.show()
Design tips:
- Always sort by value for easy scanning, so that the dot positions form a clear pattern from low to high.
- The stem (line) should start at zero to preserve proportional accuracy.
- Increase the dot size slightly for emphasis; keep the stem thin (linewidth 1-2).
35.2.5 Dot Plot (Cleveland Style)
One-line: Points plotted against a categorical axis with a common baseline, as championed by William Cleveland.
When to use: Comparing values across many categories (20+) where bar charts become overwhelming. Removing the bars leaves only position along a common scale, which readers decode more accurately than length or area.
When NOT to use: When audiences expect bars and would find dots confusing. Also less effective when the baseline is not zero and you need to show the full magnitude.
Data requirements: One categorical column, one numeric column.
Recommended library: matplotlib.
countries = ["Japan", "Germany", "UK", "France", "Canada",
             "Australia", "Italy", "Spain", "Netherlands", "Sweden"]
life_exp = [84.6, 81.3, 81.0, 82.5, 82.4, 83.2, 83.5, 83.4, 82.3, 83.0]
order = np.argsort(life_exp)
countries = [countries[i] for i in order]
life_exp = [life_exp[i] for i in order]
fig, ax = plt.subplots(figsize=(7, 6))
ax.scatter(life_exp, countries, color="#4C72B0", s=70, zorder=3)
ax.set_xlabel("Life Expectancy (years)")
ax.set_title("Life Expectancy by Country (2023)")
ax.grid(axis="y", linestyle="--", alpha=0.4)  # horizontal guides from each label to its dot
ax.set_axisbelow(True)
sns.despine(left=True)
plt.tight_layout()
plt.show()
Design tips:
- Add a light horizontal grid so readers can trace from the dot back to the category label.
- Sort categories by value, not alphabetically.
- For grouped comparisons, use different colored dots with a legend rather than side-by-side bars.
35.2.6 Dumbbell Chart
One-line: Two dots connected by a line, showing the gap between two values for each category.
When to use: Highlighting the difference between two points — before vs. after, Group A vs. Group B, target vs. actual. The connecting line draws the eye to the magnitude of the gap.
When NOT to use: When you have more than two comparison points per category. With three or more values, the connecting lines create visual confusion. Use a grouped dot plot instead.
Data requirements: One categorical column, two numeric columns (e.g., start and end values).
Recommended library: matplotlib.
departments = ["Engineering", "Marketing", "Sales", "Support", "HR", "Finance"]
score_2022 = [72, 65, 58, 70, 80, 75]
score_2024 = [85, 78, 74, 68, 82, 90]
# Sort ascending by gap: the last category is drawn at the top, so the largest improvement ends up there
order = np.argsort([e - s for s, e in zip(score_2022, score_2024)])
departments = [departments[i] for i in order]
score_2022 = [score_2022[i] for i in order]
score_2024 = [score_2024[i] for i in order]
fig, ax = plt.subplots(figsize=(8, 5))
for i, dept in enumerate(departments):
    ax.plot([score_2022[i], score_2024[i]], [dept, dept],
            color="#CCCCCC", linewidth=2, zorder=1)
ax.scatter(score_2022, departments, color="#C44E52", s=80, label="2022", zorder=2)
ax.scatter(score_2024, departments, color="#4C72B0", s=80, label="2024", zorder=2)
ax.set_xlabel("Engagement Score")
ax.set_title("Employee Engagement: 2022 vs 2024")
ax.legend(loc="lower right")
sns.despine(left=True)
plt.tight_layout()
plt.show()
Design tips:
- Sort by the gap size (largest improvement at top) to tell a story, or by end value if absolute performance matters more.
- Use contrasting colors for the two endpoints with a neutral gray for the connecting line.
- Add annotations for the largest and smallest gaps to guide the reader's attention.
35.2.7 Slope Chart
One-line: Two vertical axes connected by lines showing how each item's rank or value changed between two time points.
When to use: Showing change between exactly two time periods for a moderate number of items (5-15). The slope of each line immediately communicates direction and magnitude of change. Excellent for "winners and losers" narratives.
When NOT to use: When you have more than two time periods (use a line chart) or more than about 15 items (the lines become spaghetti). Also ineffective if most values barely change — the lines will be nearly horizontal and the chart will look empty.
Data requirements: One categorical column (items), two numeric columns (values at time 1 and time 2).
Recommended library: matplotlib.
teams = ["Alpha", "Beta", "Gamma", "Delta", "Epsilon"]
rank_before = [3, 1, 5, 2, 4]
rank_after = [1, 4, 2, 3, 5]
fig, ax = plt.subplots(figsize=(6, 6))
for i, team in enumerate(teams):
    color = "#4C72B0" if rank_after[i] < rank_before[i] else "#C44E52"
    ax.plot([0, 1], [rank_before[i], rank_after[i]], color=color,
            linewidth=2, marker="o", markersize=8)
    ax.text(-0.08, rank_before[i], team, ha="right", va="center", fontsize=10)
    ax.text(1.08, rank_after[i], team, ha="left", va="center", fontsize=10)
ax.set_xlim(-0.3, 1.3)
ax.set_xticks([0, 1])
ax.set_xticklabels(["Jan 2024", "Jun 2024"])
ax.invert_yaxis()
ax.set_ylabel("Rank")
ax.set_title("Team Performance Ranking Change")
ax.grid(axis="y", linestyle="--", alpha=0.3)
sns.despine(left=True, bottom=True)
plt.tight_layout()
plt.show()
Design tips:
- Color-code by direction: one color for improvement, another for decline. A neutral gray works for no change.
- Label each line at both endpoints to avoid the need for a legend.
- Invert the y-axis if showing ranks (rank 1 at top).
35.2.8 Bump Chart
One-line: A multi-period slope chart that tracks rankings over time, with lines bumping up and down as positions change.
When to use: Tracking how rankings shift across multiple time periods (3-10 periods, 5-12 items). Common in sports standings, market share rankings, and university league tables. The crossings of lines are the story.
When NOT to use: When you care about absolute values rather than ordinal rankings. Also problematic with many items — more than about 12 lines creates an unreadable tangle. Use a heatmap of ranks instead.
Data requirements: One categorical column (items), one temporal/ordinal column (periods), one rank or numeric column.
Recommended library: matplotlib (with manual construction) or plotly.
teams = ["Red", "Blue", "Green", "Gold"]
months = ["Jan", "Feb", "Mar", "Apr", "May"]
rankings = {
    "Red": [1, 2, 2, 3, 1],
    "Blue": [3, 1, 1, 1, 2],
    "Green": [2, 3, 4, 2, 3],
    "Gold": [4, 4, 3, 4, 4],
}
fig, ax = plt.subplots(figsize=(9, 5))
colors = {"Red": "#C44E52", "Blue": "#4C72B0", "Green": "#55A868", "Gold": "#DAA520"}
for team, ranks in rankings.items():
    ax.plot(months, ranks, marker="o", linewidth=2.5, markersize=8,
            color=colors[team], label=team)
    ax.text(len(months) - 1 + 0.1, ranks[-1], team,
            va="center", fontsize=10, color=colors[team])
ax.invert_yaxis()
ax.set_ylabel("Rank")
ax.set_title("League Standings Over the Season")
ax.set_yticks([1, 2, 3, 4])
ax.legend(loc="upper left", frameon=False)
sns.despine()
plt.tight_layout()
plt.show()
Design tips:
- Invert the y-axis so that rank 1 is at the top.
- Use thick lines and large markers so that crossings are visible.
- Label the endpoint of each line with the team/item name for direct identification without consulting a legend.
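The data requirements allow a numeric column in place of ready-made ranks. When only raw scores are available, one way to derive per-period ranks is pandas' `DataFrame.rank`. The points totals below are invented for illustration, chosen so that the resulting ranks match the `rankings` dictionary used in the example above.

```python
import pandas as pd

# Hypothetical cumulative points: rows are periods, columns are teams
points = pd.DataFrame(
    {
        "Red": [30, 41, 55, 60, 82],
        "Blue": [22, 45, 58, 75, 80],
        "Green": [25, 38, 40, 66, 70],
        "Gold": [18, 30, 47, 50, 61],
    },
    index=["Jan", "Feb", "Mar", "Apr", "May"],
)

# Rank across teams within each month; the highest points total gets rank 1.
# method="min" gives tied teams the same (best) rank.
ranks = points.rank(axis=1, ascending=False, method="min").astype(int)
print(ranks)

# Convert to the {team: [rank per period]} shape the bump chart code expects
rankings = {team: ranks[team].tolist() for team in ranks.columns}
```

How ties are broken is a design choice: `method="min"` draws tied teams on top of each other, while `method="first"` would force a strict order.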
35.3 Composition
Composition charts answer the question: "What share does each part have? How do the parts make up the whole?"
35.3.1 Stacked Bar Chart (100%)
One-line: Bars normalized to 100%, showing proportional composition across categories.
When to use: Comparing how the percentage breakdown differs across categories when you do not care about totals. Effective for survey results (strongly agree through strongly disagree) and demographic breakdowns.
When NOT to use: When absolute totals also matter — the viewer cannot recover the original magnitudes. Also poor when proportions barely differ across categories, as the visual differences become imperceptible.
Data requirements: One categorical column, one grouping column, one numeric column (will be converted to percentages).
Recommended library: matplotlib or pandas .plot().
categories = ["2020", "2021", "2022", "2023"]
mobile = [55, 60, 64, 68]
desktop = [38, 32, 28, 24]
tablet = [7, 8, 8, 8]
totals = [m + d + t for m, d, t in zip(mobile, desktop, tablet)]
mobile_pct = [m / tot * 100 for m, tot in zip(mobile, totals)]
desktop_pct = [d / tot * 100 for d, tot in zip(desktop, totals)]
tablet_pct = [t / tot * 100 for t, tot in zip(tablet, totals)]
fig, ax = plt.subplots(figsize=(8, 5))
ax.barh(categories, mobile_pct, label="Mobile", color="#4C72B0")
ax.barh(categories, desktop_pct, left=mobile_pct, label="Desktop", color="#55A868")
left2 = [m + d for m, d in zip(mobile_pct, desktop_pct)]
ax.barh(categories, tablet_pct, left=left2, label="Tablet", color="#C44E52")
ax.set_xlabel("Percentage of Traffic (%)")
ax.set_title("Web Traffic by Device Type")
ax.legend(loc="lower right")
ax.set_xlim(0, 100)
sns.despine()
plt.tight_layout()
plt.show()
Design tips:
- Place the segment you want readers to compare against the left (or bottom) baseline; that segment is the easiest to judge accurately.
- Add percentage labels inside each segment when precision matters.
- Limit to 4-5 segments maximum; merge small categories into "Other."
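When the source data are raw counts rather than percentages, the normalization step can be done in one line with pandas. The visit counts below are hypothetical, picked so that the resulting shares reproduce the percentages used in the example above.

```python
import pandas as pd

# Hypothetical raw visits (thousands): one row per year, one column per device
visits = pd.DataFrame(
    {
        "Mobile": [1100, 1500, 1920, 2380],
        "Desktop": [760, 800, 840, 840],
        "Tablet": [140, 200, 240, 280],
    },
    index=["2020", "2021", "2022", "2023"],
)

# Divide every row by its own total, then scale to percent
shares = visits.div(visits.sum(axis=1), axis=0) * 100
print(shares.round(1))
```

Each row of `shares` sums to 100, so its columns can be passed straight to the stacked `barh` calls in place of the hand-computed `*_pct` lists.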
35.3.2 Pie Chart
One-line: A circle divided into slices proportional to each category's share.
When to use: Showing parts of a whole when you have 2-4 categories and one slice dominates. Audiences instinctively understand pies for simple majority/minority splits. Acceptable in executive dashboards and infographics.
When NOT to use: When you have more than 5 slices, when slices are similar in size (humans are poor at comparing angles), or when precision matters. A bar chart sorted by size is almost always more informative.
Data requirements: One categorical column, one numeric column (positive values that sum to a meaningful whole).
Recommended library: matplotlib.
labels = ["North America", "Europe", "Asia-Pacific", "Other"]
sizes = [42, 28, 22, 8]
colors = ["#4C72B0", "#55A868", "#C44E52", "#8172B2"]
fig, ax = plt.subplots(figsize=(6, 6))
wedges, texts, autotexts = ax.pie(
    sizes, labels=labels, autopct="%1.0f%%", colors=colors,
    startangle=90, counterclock=False,  # start at 12 o'clock and run clockwise
    pctdistance=0.75,
    textprops={"fontsize": 11}
)
for t in autotexts:
    t.set_fontweight("bold")
ax.set_title("Revenue Distribution by Region")
plt.tight_layout()
plt.show()
Design tips:
- Start the largest slice at 12 o'clock and proceed clockwise (startangle=90 with counterclock=False; matplotlib draws counterclockwise by default).
- Never use 3D pie charts: the perspective distortion makes slices at the back appear smaller than they are.
- If you have more than 4-5 categories, switch to a horizontal bar chart sorted by value.
35.3.3 Donut Chart
One-line: A pie chart with the center removed, creating a ring — slightly easier to read and useful for placing a summary metric in the center.
When to use: Same scenarios as a pie chart, but when you want to display a key number (total, percentage, or headline metric) in the center of the ring. Common in dashboards.
When NOT to use: Same caveats as pie charts — too many slices, similar-sized slices, or when precision matters.
Data requirements: One categorical column, one numeric column.
Recommended library: matplotlib.
labels = ["Completed", "In Progress", "Overdue"]
sizes = [65, 25, 10]
colors = ["#55A868", "#4C72B0", "#C44E52"]
fig, ax = plt.subplots(figsize=(6, 6))
wedges, texts, autotexts = ax.pie(
    sizes, labels=labels, autopct="%1.0f%%", colors=colors,
    startangle=90, pctdistance=0.82,
    wedgeprops={"width": 0.4, "edgecolor": "white", "linewidth": 2}
)
ax.text(0, 0, "65%\nOn Track", ha="center", va="center",
        fontsize=16, fontweight="bold")
ax.set_title("Project Status Overview")
plt.tight_layout()
plt.show()
Design tips:
- The ring width (wedgeprops width) should be between 0.3 and 0.5: too thin and slices are hard to see, too thick and it looks like a pie.
- Use the center for one key number, not a paragraph of text.
- Add white borders between slices (edgecolor="white") for visual separation.
35.3.4 Treemap
One-line: Nested rectangles where area represents quantity, with hierarchy shown through nesting.
When to use: Displaying hierarchical data where both the size and the grouping matter. Excellent for disk usage, budget breakdowns, or market capitalization by sector and company. Works well for large numbers of items (50+) that would overwhelm a bar chart.
When NOT to use: When precise comparison matters — humans judge rectangular areas poorly. When the hierarchy has more than 2-3 levels, the nesting becomes hard to follow.
Data requirements: One or more categorical columns (hierarchy levels), one numeric column (size).
Recommended library: plotly or squarify (matplotlib add-on).
import squarify
sizes = [35, 25, 15, 10, 8, 4, 3]
labels = ["Tech", "Healthcare", "Finance", "Energy",
"Consumer", "Industrial", "Utilities"]
colors = sns.color_palette("Set2", len(sizes))
fig, ax = plt.subplots(figsize=(10, 6))
squarify.plot(sizes=sizes, label=labels, color=colors, alpha=0.85,
              text_kwargs={"fontsize": 12, "fontweight": "bold"}, ax=ax)
ax.set_title("Market Capitalization by Sector", fontsize=14)
ax.axis("off")
plt.tight_layout()
plt.show()
Design tips:
- Use color to encode a second variable (e.g., growth rate) rather than just distinguishing categories.
- Ensure labels are readable — hide labels for very small rectangles and use a tooltip or legend.
- Install squarify with pip install squarify for static treemaps; use Plotly for interactive ones.
35.3.5 Sunburst Chart
One-line: Concentric rings showing hierarchical composition — the inner ring is the top level, outer rings are sub-levels.
When to use: Exploring multi-level hierarchical data interactively. The radial layout naturally communicates parent-child relationships. Best in interactive contexts (Plotly) where users can click to zoom into sub-levels.
When NOT to use: In static/printed media where the interactivity advantage is lost. Also poor when the hierarchy is very unbalanced (one branch has 50 items, another has 2).
Data requirements: Two or more categorical columns (hierarchy levels), one numeric column (size).
Recommended library: plotly.
import plotly.express as px
data = pd.DataFrame({
    "region": ["Americas", "Americas", "Americas", "EMEA", "EMEA", "APAC", "APAC"],
    "country": ["USA", "Canada", "Brazil", "UK", "Germany", "Japan", "Australia"],
    "revenue": [500, 120, 90, 200, 180, 300, 95]
})
fig = px.sunburst(data, path=["region", "country"], values="revenue",
                  title="Revenue by Region and Country",
                  color_discrete_sequence=px.colors.qualitative.Set2)
fig.update_layout(width=600, height=600)
fig.show()
Design tips:
- Keep hierarchy depth to 2-3 levels; deeper nesting makes outer rings too thin.
- Use Plotly's built-in interactivity so users can click to drill down.
- Label only the largest segments; let tooltips handle the small ones.
35.3.6 Waffle Chart
One-line: A grid of small squares (typically 10x10 = 100) where colored squares represent percentages.
When to use: Showing a single percentage or a simple part-to-whole split (2-4 categories) in an engaging, infographic-friendly style. Each square represents 1%, making the chart highly intuitive. Effective for reports aimed at general audiences.
When NOT to use: When you need to show many categories or precise non-integer percentages. Also not suitable for comparing multiple groups simultaneously — you would need multiple waffle grids side by side.
Data requirements: One categorical column, one numeric column (percentages or counts convertible to percentages).
Recommended library: matplotlib (manual) or the pywaffle package.
from pywaffle import Waffle
fig = plt.figure(
    FigureClass=Waffle,
    rows=10,
    columns=10,
    values=[62, 25, 13],
    labels=["Renewable (62%)", "Natural Gas (25%)", "Coal (13%)"],
    colors=["#55A868", "#4C72B0", "#C44E52"],
    legend={"loc": "lower left", "bbox_to_anchor": (0, -0.15), "ncol": 3,
            "fontsize": 10, "frameon": False},
    figsize=(8, 8),
    title={"label": "Energy Mix 2024", "fontsize": 14}
)
plt.tight_layout()
plt.show()
Design tips:
- Use a 10x10 grid (100 squares) so each square is exactly 1% — this is the convention readers expect.
- Limit to 3-4 categories; with more, the grid becomes a confusing patchwork.
- Install with pip install pywaffle. For a dependency-free version, draw colored rectangles manually with matplotlib patches.
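The dependency-free route mentioned in the last tip might look like the sketch below. It assumes integer percentages that sum to 100 and fills the grid column by column; the layout choices (square size, fill order, legend placement) are illustrative rather than a fixed convention.

```python
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle

values = {"Renewable": 62, "Natural Gas": 25, "Coal": 13}  # must sum to 100
colors = ["#55A868", "#4C72B0", "#C44E52"]

# Expand to one color per square: 62 green, then 25 blue, then 13 red
square_colors = [
    color
    for count, color in zip(values.values(), colors)
    for _ in range(count)
]

fig, ax = plt.subplots(figsize=(6, 6))
for i, color in enumerate(square_colors):
    col, row = divmod(i, 10)  # fill each column bottom to top, left to right
    ax.add_patch(Rectangle((col, row), 0.9, 0.9, facecolor=color))

ax.set_xlim(0, 10)
ax.set_ylim(0, 10)
ax.set_aspect("equal")  # keep the squares square
ax.axis("off")

# Proxy rectangles for the legend (these are not added to the axes)
handles = [Rectangle((0, 0), 1, 1, facecolor=c) for c in colors]
legend_labels = [f"{name} ({pct}%)" for name, pct in values.items()]
ax.legend(handles, legend_labels, loc="upper center",
          bbox_to_anchor=(0.5, -0.02), ncol=3, frameon=False)
ax.set_title("Energy Mix 2024")
plt.show()
```

Drawing each square 0.9 units wide on a 1-unit grid leaves a thin gap between squares, which is what makes them countable.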
35.4 Distribution
Distribution charts answer the question: "How is the data spread? What is typical, and what is unusual?"
35.4.1 Histogram
One-line: Bars representing the frequency of values falling into contiguous bins.
When to use: Exploring the shape of a single continuous variable's distribution — modality, skewness, spread, and outliers. The first chart you should make when you receive any numeric column.
When NOT to use: When comparing distributions of multiple groups (use overlapping KDE or a violin). Bin width choice dramatically changes the story; avoid histograms in high-stakes presentations where the audience might question your binning.
Data requirements: One numeric column.
Recommended library: seaborn or matplotlib.
np.random.seed(42)
data = np.concatenate([np.random.normal(50, 10, 500),
                       np.random.normal(80, 5, 200)])
fig, ax = plt.subplots(figsize=(8, 5))
ax.hist(data, bins=30, color="#4C72B0", edgecolor="white", alpha=0.85)
ax.set_xlabel("Value")
ax.set_ylabel("Frequency")
ax.set_title("Distribution of Measurements")
sns.despine()
plt.tight_layout()
plt.show()
Design tips:
- Experiment with bin counts: too few bins hide structure, too many create noise. The Freedman-Diaconis rule (bins="fd") is a good default.
- Add a KDE overlay (sns.histplot(..., kde=True)) to smooth out binning artifacts.
- Use white edge lines between bars to make individual bins visually distinct.
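To make the Freedman-Diaconis tip concrete, the sketch below computes the rule's target bin width by hand (2 × IQR / n^(1/3)) and compares it with the edges NumPy produces for `bins="fd"`. It reuses the bimodal sample from the example above; the variable names are illustrative.

```python
import numpy as np

np.random.seed(42)
data = np.concatenate([np.random.normal(50, 10, 500),
                       np.random.normal(80, 5, 200)])

# Freedman-Diaconis target width: 2 * IQR / n^(1/3)
q1, q3 = np.percentile(data, [25, 75])
fd_width = 2 * (q3 - q1) / len(data) ** (1 / 3)

# NumPy applies the same rule, then rounds the bin count up so that
# equal-width bins span the data range exactly
edges = np.histogram_bin_edges(data, bins="fd")
n_bins = len(edges) - 1
actual_width = edges[1] - edges[0]

print(f"FD target width: {fd_width:.2f}")
print(f"bins used: {n_bins}, actual width: {actual_width:.2f}")
```

Because the bin count is rounded up, the width actually used is never larger than the target. The same `edges` array can be passed to `ax.hist(data, bins=edges)` when the exact bins need to be reported alongside the chart.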
35.4.2 KDE (Kernel Density Estimate) Plot
One-line: A smoothed, continuous estimate of the probability density function.
When to use: Comparing the shape of 2-4 distributions on the same axes without the binning artifacts of histograms. The smooth curves make overlap regions visible.
When NOT to use: When the sample size is very small (< 30) — KDE can create misleading smooth curves from sparse data. Also misleading at distribution boundaries (e.g., showing density below zero for strictly positive data).
Data requirements: One numeric column per group.
Recommended library: seaborn.
np.random.seed(42)
group_a = np.random.normal(60, 12, 300)
group_b = np.random.normal(72, 10, 300)
fig, ax = plt.subplots(figsize=(8, 5))
sns.kdeplot(group_a, label="Group A", color="#4C72B0", linewidth=2, ax=ax)
sns.kdeplot(group_b, label="Group B", color="#C44E52", linewidth=2, ax=ax)
ax.set_xlabel("Score")
ax.set_ylabel("Density")
ax.set_title("Score Distribution by Group")
ax.legend()
sns.despine()
plt.tight_layout()
plt.show()
Design tips:
- Adjust the bandwidth parameter (bw_adjust in seaborn) if the curve is too smooth or too jagged.
- Fill the area under the curve with low alpha (0.2-0.3) to make overlap visible without hiding either group.
- Clip the KDE to the data's natural domain (e.g., clip=(0, None) for non-negative data).
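The boundary problem behind the clipping tip is easy to see with a hand-rolled Gaussian KDE. The sketch below is not how seaborn computes its curves internally; it is a minimal NumPy version, with made-up strictly positive data and an arbitrarily chosen bandwidth, that measures how much probability mass the smooth curve places below zero.

```python
import numpy as np


def gaussian_kde(sample, grid, bandwidth):
    """Evaluate a Gaussian KDE on grid: the average of one bell curve per data point."""
    z = (grid[:, None] - sample[None, :]) / bandwidth
    bumps = np.exp(-0.5 * z ** 2) / (bandwidth * np.sqrt(2 * np.pi))
    return bumps.mean(axis=1)


rng = np.random.default_rng(42)
waits = rng.exponential(scale=2.0, size=300)  # made-up, strictly positive data

grid = np.linspace(-3, 30, 1321)              # evenly spaced, step 0.025
step = grid[1] - grid[0]
density = gaussian_kde(waits, grid, bandwidth=0.5)

total_mass = density.sum() * step             # Riemann sum, should be close to 1
mass_below_zero = density[grid < 0].sum() * step

print(f"total mass: {total_mass:.3f}")
print(f"mass assigned to impossible negative values: {mass_below_zero:.3f}")
```

Every data point near the boundary contributes a bell curve that spills over it, so the leak grows with the bandwidth. That is the effect the `clip` and `bw_adjust` tips are meant to keep in check.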
35.4.3 ECDF (Empirical Cumulative Distribution Function)
One-line: A step function showing the proportion of data at or below each value.
When to use: Comparing distributions without any binning or smoothing choices. The ECDF is a lossless representation — every data point is visible. Excellent for detecting differences in medians, tails, and spread between groups.
When NOT to use: When your audience is unfamiliar with cumulative distributions. The y-axis (cumulative proportion) is unintuitive to many non-technical readers. Use a histogram or violin for broader audiences.
Data requirements: One numeric column per group.
Recommended library: seaborn.
np.random.seed(42)
control = np.random.normal(100, 15, 200)
treatment = np.random.normal(108, 15, 200)
fig, ax = plt.subplots(figsize=(8, 5))
sns.ecdfplot(control, label="Control", color="#4C72B0", linewidth=2, ax=ax)
sns.ecdfplot(treatment, label="Treatment", color="#C44E52", linewidth=2, ax=ax)
ax.set_xlabel("Response Time (ms)")
ax.set_ylabel("Cumulative Proportion")
ax.set_title("Response Time: Control vs Treatment")
ax.legend()
sns.despine()
plt.tight_layout()
plt.show()
Design tips:
- Draw a horizontal reference line at 0.5 to mark the median.
- ECDFs never require parameter tuning (no bins, no bandwidth), which makes them ideal for automated reporting.
- The maximum vertical gap between two ECDF curves is the Kolmogorov-Smirnov statistic; annotate it if you want to highlight distributional differences.
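Both the ECDF itself and the Kolmogorov-Smirnov gap mentioned in the tips can be computed in a few lines of NumPy. The helper names `ecdf` and `ks_statistic` below are illustrative, not functions from seaborn or the chapter's code.

```python
import numpy as np


def ecdf(sample):
    """Return sorted values and the proportion of data at or below each one."""
    x = np.sort(sample)
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y


def ks_statistic(a, b):
    """Largest vertical gap between the ECDFs of samples a and b."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])  # the gap can only change at observed values
    cdf_a = np.searchsorted(a, grid, side="right") / len(a)
    cdf_b = np.searchsorted(b, grid, side="right") / len(b)
    return np.max(np.abs(cdf_a - cdf_b))


np.random.seed(42)
control = np.random.normal(100, 15, 200)
treatment = np.random.normal(108, 15, 200)

x, y = ecdf(control)
print(f"median of control is near {x[np.searchsorted(y, 0.5)]:.1f}")
print(f"KS statistic: {ks_statistic(control, treatment):.3f}")
```

The statistic ranges from 0 (identical ECDFs) to 1 (no overlap at all), which makes it a convenient single number to annotate on the chart.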
35.4.4 Box Plot
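One-line: A five-number summary (lower whisker, Q1, median, Q3, upper whisker) with outlier markers; by default the whiskers stop at the most extreme points within 1.5 × IQR of the box, so they equal the min and max only when there are no outliers.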
When to use: Comparing distributions across many groups (5-30) in a compact space. The box plot packs a lot of information into a small area and is the standard in scientific publications.
When NOT to use: When the distribution is multimodal — the box hides bimodality entirely. Also poor for small samples (n < 10) where the quartile estimates are unreliable.
Data requirements: One numeric column, optionally one categorical column for grouping.
Recommended library: seaborn.
np.random.seed(42)
data = pd.DataFrame({
    "Department": np.repeat(["Eng", "Sales", "Marketing", "Support"], 50),
    "Salary": np.concatenate([
        np.random.normal(95, 15, 50), np.random.normal(75, 20, 50),
        np.random.normal(70, 12, 50), np.random.normal(60, 10, 50)
    ])
})
fig, ax = plt.subplots(figsize=(8, 5))
sns.boxplot(data=data, x="Department", y="Salary", palette="Set2",
            width=0.5, ax=ax)
ax.set_ylabel("Salary ($K)")
ax.set_title("Salary Distribution by Department")
sns.despine()
plt.tight_layout()
plt.show()
Design tips:
- Pair with a strip or swarm overlay (sns.stripplot) to show actual data points alongside the summary.
- Horizontal box plots are better when category labels are long.
- Notched box plots (notch=True) provide a visual confidence interval around the median — useful for quick comparisons.
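It can help to see the numbers a box plot is built from. The sketch below computes them with NumPy under the common default of whiskers at 1.5 × IQR; `box_stats` is a hypothetical helper written for this illustration, and the ten-value sample is made up to include one obvious outlier.

```python
import numpy as np


def box_stats(values, whis=1.5):
    """Summary numbers behind a default box plot (whiskers at whis * IQR)."""
    values = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1
    low_fence, high_fence = q1 - whis * iqr, q3 + whis * iqr
    inside = values[(values >= low_fence) & (values <= high_fence)]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside.min(),   # most extreme point still inside the fence
        "whisker_high": inside.max(),
        "outliers": values[(values < low_fence) | (values > high_fence)],
    }


stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
for name, value in stats.items():
    print(name, value)
```

In this sample the upper whisker stops at 9, not at the maximum of 100, which is drawn as a separate outlier marker instead.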
35.4.5 Violin Plot
One-line: A mirrored KDE on each side of a central axis, combining distributional shape with the compact structure of a box plot.
When to use: Comparing distributional shapes across 2-8 groups. Unlike box plots, violins reveal bimodality, skewness, and other shape features. Ideal for scientific and academic contexts.
When NOT to use: When the audience is non-technical and would find the mirrored shape confusing. Also less effective with very small sample sizes where the KDE is unreliable.
Data requirements: One numeric column, one categorical column.
Recommended library: seaborn.
np.random.seed(42)
data = pd.DataFrame({
"Method": np.repeat(["A", "B", "C"], 200),
"Accuracy": np.concatenate([
np.random.beta(5, 2, 200) * 100,
np.random.beta(2, 5, 200) * 100,
np.random.normal(55, 8, 200)
])
})
fig, ax = plt.subplots(figsize=(8, 5))
sns.violinplot(data=data, x="Method", y="Accuracy", palette="Set2",
inner="quartile", ax=ax)
ax.set_ylabel("Accuracy (%)")
ax.set_title("Model Accuracy by Training Method")
sns.despine()
plt.tight_layout()
plt.show()
Design tips:
- Use inner="quartile" to show quartile lines inside the violin, combining the best of violin and box plots.
- Consider split violins (split=True with a hue variable) to compare two sub-groups side by side within each category.
- Cut the KDE at the data bounds (cut=0) to avoid implying density where no data exists.
35.4.6 Strip Plot
One-line: Individual data points plotted along a categorical axis with jitter to reduce overlap.
When to use: Small-to-medium datasets (n < 200 per group) where showing every individual observation matters. Useful as a standalone chart for small samples or as an overlay on box/violin plots for larger ones.
When NOT to use: With large samples (n > 300 per group), the dots merge into an unreadable blob. Switch to a violin, box plot, or ECDF for larger datasets.
Data requirements: One numeric column, one categorical column.
Recommended library: seaborn.
np.random.seed(42)
data = pd.DataFrame({
"Treatment": np.repeat(["Placebo", "Low Dose", "High Dose"], 30),
"Response": np.concatenate([
np.random.normal(5, 2, 30),
np.random.normal(7, 2, 30),
np.random.normal(10, 3, 30)
])
})
fig, ax = plt.subplots(figsize=(7, 5))
sns.stripplot(data=data, x="Treatment", y="Response", jitter=0.25,
color="#4C72B0", alpha=0.6, size=6, ax=ax)
ax.set_ylabel("Response Value")
ax.set_title("Treatment Response (Individual Observations)")
sns.despine()
plt.tight_layout()
plt.show()
Design tips:
- Increase jitter (0.2-0.4) to reduce overlap, but not so much that points spill into neighboring categories.
- Lower alpha (0.4-0.6) to reveal density through overplotting.
- Layer on top of a box plot for the best of both worlds: summary statistics plus individual data.
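The jitter and layering tips can be sketched in plain matplotlib, which makes the mechanics explicit: jitter is just a small random horizontal offset around each category position. The data is synthetic, and this mirrors the idea rather than seaborn's exact implementation.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
groups = {                                   # synthetic treatment responses
    "Placebo": rng.normal(5, 2, 30),
    "Low Dose": rng.normal(7, 2, 30),
    "High Dose": rng.normal(10, 3, 30),
}
jitter = 0.25   # half-width of the horizontal spread; keep it below 0.5

fig, ax = plt.subplots(figsize=(7, 5))
positions = np.arange(len(groups))
# Box plot first; fliers are hidden because every point is drawn below anyway.
ax.boxplot(list(groups.values()), positions=positions, widths=0.5,
           showfliers=False)

jittered = {}
for pos, (name, values) in zip(positions, groups.items()):
    x = pos + rng.uniform(-jitter, jitter, size=len(values))
    jittered[name] = x
    ax.scatter(x, values, color="#4C72B0", alpha=0.6, s=30, zorder=3)

ax.set_xticks(positions)
ax.set_xticklabels(list(groups.keys()))
ax.set_ylabel("Response Value")
ax.set_title("Box Plot with Jittered Observations")
plt.tight_layout()
plt.show()
```

Because categories sit one unit apart, a jitter half-width under 0.5 guarantees that points never cross into a neighboring category.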
35.4.7 Swarm / Beeswarm Plot
One-line: Like a strip plot, but points are algorithmically nudged so that no two overlap.
When to use: Small datasets (n < 100 per group) where every individual data point should be visible and distinguishable. The resulting shape resembles a violin — you get distributional form from the raw data.
When NOT to use: Computationally expensive for large samples (n > 300 per group). The algorithm slows dramatically, and the chart becomes too wide. Use a violin or KDE for larger data.
Data requirements: One numeric column, one categorical column.
Recommended library: seaborn.
np.random.seed(42)
data = pd.DataFrame({
"Class": np.repeat(["A", "B", "C"], 40),
"Score": np.concatenate([
np.random.normal(75, 8, 40),
np.random.normal(82, 6, 40),
np.random.normal(70, 10, 40)
])
})
fig, ax = plt.subplots(figsize=(7, 5))
sns.swarmplot(data=data, x="Class", y="Score", palette="Set2", size=5, ax=ax)
ax.set_ylabel("Exam Score")
ax.set_title("Exam Scores by Class Section")
sns.despine()
plt.tight_layout()
plt.show()
Design tips:
- Beeswarm plots shine for small n. With more than about 100 points per group, seaborn may warn that some points cannot be placed, and the swarm can spill past its category slot.
- Combine with a box plot overlay (plot the box first, swarm on top) for summary plus detail.
- Use a single color for simple comparison; add a hue parameter to color by a third variable.
35.4.8 Ridgeline Plot
One-line: Stacked, overlapping KDE plots — one per group — creating a mountain-range effect that makes distributional shifts easy to spot.
When to use: Comparing distributions across many groups (5-20) where the shift over an ordinal variable (months, versions, categories) is the story. The vertical stacking uses space efficiently and looks visually striking.
When NOT to use: When precise reading of density values matters — the overlap obscures exact heights. Also ineffective if distributions do not meaningfully differ; the chart just shows a stack of identical shapes.
Data requirements: One numeric column, one categorical/ordinal column (group identifier).
Recommended library: seaborn (via FacetGrid and kdeplot) or joypy.
from joypy import joyplot
np.random.seed(42)
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
"Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
records = []
for i, m in enumerate(months):
    # Seasonal cycle: coldest in January (i=0), warmest in July (i=6)
    temps = np.random.normal(loc=10 - 12 * np.cos(np.pi * i / 6), scale=4, size=200)
    records.extend([(m, t) for t in temps])
data = pd.DataFrame(records, columns=["Month", "Temperature"])
data["Month"] = pd.Categorical(data["Month"], categories=months, ordered=True)
fig, axes = joyplot(
data, by="Month", column="Temperature",
figsize=(8, 8), colormap=plt.cm.coolwarm, alpha=0.7,
linewidth=1.2
)
plt.title("Monthly Temperature Distributions", fontsize=14, y=1.02)
plt.xlabel("Temperature (°C)")
plt.show()
Design tips:
- Install with pip install joypy. Alternatively, build manually with seaborn FacetGrid plus kdeplot.
- Use a sequential or diverging colormap to reinforce the ordinal ordering of groups.
- Overlap should be moderate (30-50%) — too much hides lower distributions, too little wastes space.
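If installing joypy is not an option, a ridgeline can be assembled by hand. The sketch below swaps in a plain NumPy Gaussian KDE and matplotlib's `fill_between` instead of joypy or seaborn's FacetGrid. The helper `gaussian_kde_1d`, the bandwidth of 1.5, and the baseline spacing `step` are all assumptions chosen for this synthetic data.

```python
import numpy as np
import matplotlib.pyplot as plt

def gaussian_kde_1d(sample, grid, bandwidth):
    """Gaussian KDE in plain NumPy: the average of normal bumps centred on each observation."""
    z = (grid[:, None] - sample[None, :]) / bandwidth
    return np.exp(-0.5 * z**2).sum(axis=1) / (len(sample) * bandwidth * np.sqrt(2 * np.pi))

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
rng = np.random.default_rng(42)
# Synthetic temperatures: coldest in January, warmest in July
samples = [rng.normal(10 - 12 * np.cos(np.pi * i / 6), 4, 200) for i in range(12)]

grid = np.linspace(-20, 45, 400)
step = 0.06        # vertical spacing between baselines; smaller means more overlap
bandwidth = 1.5    # assumed smoothing width; tune for your data
means = np.array([s.mean() for s in samples])
colors = plt.cm.coolwarm((means - means.min()) / (means.max() - means.min()))

fig, ax = plt.subplots(figsize=(8, 8))
for i, (month, sample) in enumerate(zip(months, samples)):
    density = gaussian_kde_1d(sample, grid, bandwidth)
    baseline = -i * step                      # January at the top, December at the bottom
    ax.fill_between(grid, baseline, baseline + density,
                    color=colors[i], alpha=0.7, zorder=i)
    ax.plot(grid, baseline + density, color="black", linewidth=1.0, zorder=i)
    ax.text(grid[0], baseline, month, ha="right", va="bottom")
ax.set_yticks([])
ax.set_xlabel("Temperature (°C)")
ax.set_title("Monthly Temperature Distributions (manual ridgeline)")
plt.tight_layout()
plt.show()
```

With a peak density near 0.1 for these samples, a `step` of 0.06 leaves each ridge overlapping its upper neighbor by roughly 40%. Increasing `zorder` with the row index keeps lower rows in front.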
35.5 Relationship
Relationship charts answer the question: "Are these two (or more) things related? How does one variable change as another changes?"
35.5.1 Scatter Plot
One-line: Points plotted at (x, y) positions showing the relationship between two continuous variables.
When to use: Exploring correlation, clusters, and outliers between two numeric variables. The foundational chart for regression, classification boundaries, and bivariate analysis. Works well for n = 50 to n = 2,000.
When NOT to use: When one axis is categorical (use a strip or box plot). Also ineffective with very large n (> 5,000) due to overplotting — switch to a hexbin or 2D KDE.
Data requirements: Two numeric columns.
Recommended library: matplotlib or seaborn.
np.random.seed(42)
x = np.random.uniform(20, 80, 150)
y = 0.6 * x + np.random.normal(0, 8, 150)
fig, ax = plt.subplots(figsize=(7, 6))
ax.scatter(x, y, alpha=0.6, color="#4C72B0", edgecolors="white", s=50)
ax.set_xlabel("Study Hours per Month")
ax.set_ylabel("Exam Score")
ax.set_title("Study Time vs Exam Performance")
sns.despine()
plt.tight_layout()
plt.show()
Design tips:
- Use transparency (alpha 0.3-0.6) to reveal density in overlapping regions.
- Add a regression line (sns.regplot) only if you are making a claim about the linear relationship — do not add one by default.
- Encode a third variable with color (categorical) or size (continuous) to upgrade the scatter to a bubble chart.
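When you do want to make a claim about the linear relationship, it helps to report the fit alongside the line. The sketch below reuses the synthetic study-hours data from the template and fits a line with `np.polyfit`; it is one simple option, not the only way to add a trend line.

```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(42)
x = np.random.uniform(20, 80, 150)
y = 0.6 * x + np.random.normal(0, 8, 150)

# Ordinary least-squares line and Pearson correlation
slope, intercept = np.polyfit(x, y, deg=1)
r = np.corrcoef(x, y)[0, 1]

fig, ax = plt.subplots(figsize=(7, 6))
ax.scatter(x, y, alpha=0.6, color="#4C72B0", edgecolors="white", s=50)
xs = np.array([x.min(), x.max()])
ax.plot(xs, slope * xs + intercept, color="#C44E52", linewidth=2,
        label=f"y = {slope:.2f}x + {intercept:.1f}  (r = {r:.2f})")
ax.set_xlabel("Study Hours per Month")
ax.set_ylabel("Exam Score")
ax.legend(frameon=False)
plt.tight_layout()
plt.show()
```

Putting the equation and r in the legend makes the claim explicit, so readers can judge how much weight the line deserves.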
35.5.2 Bubble Chart
One-line: A scatter plot where point size encodes a third quantitative variable.
When to use: Showing the relationship between three continuous variables simultaneously. The classic Gapminder chart (GDP vs. life expectancy, sized by population) is the archetype.
When NOT to use: When the size differences are small — readers cannot distinguish circles whose radii differ by less than about 30%. Also poor when bubbles overlap extensively and obscure each other.
Data requirements: Three numeric columns (x, y, size), optionally a categorical column for color.
Recommended library: matplotlib or plotly.
np.random.seed(42)
countries = ["A", "B", "C", "D", "E", "F", "G", "H"]
gdp = [45, 35, 55, 20, 60, 30, 50, 25]
life_exp = [78, 72, 82, 65, 80, 70, 81, 68]
population = [50, 120, 30, 200, 80, 150, 40, 180]
fig, ax = plt.subplots(figsize=(8, 6))
scatter = ax.scatter(gdp, life_exp, s=[p * 3 for p in population],
alpha=0.6, c="#4C72B0", edgecolors="white", linewidth=1)
for i, c in enumerate(countries):
    ax.annotate(c, (gdp[i], life_exp[i]), fontsize=9, ha="center", va="center")
ax.set_xlabel("GDP per Capita ($K)")
ax.set_ylabel("Life Expectancy (years)")
ax.set_title("GDP vs Life Expectancy (bubble size = population)")
sns.despine()
plt.tight_layout()
plt.show()
Design tips:
- Scale by area, not radius — s in matplotlib is already area-based, so pass values directly (possibly with a multiplier).
- Label key bubbles directly rather than using a legend for each one.
- Provide a size legend (e.g., three reference circles) so readers can estimate the third variable.
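A size legend can be built from empty scatter calls whose only job is to create legend handles of known area. This is a minimal sketch that assumes the same multiplier as the template (called `SIZE_SCALE` here, a name introduced for illustration) and three arbitrary reference values.

```python
import matplotlib.pyplot as plt

gdp = [45, 35, 55, 20, 60, 30, 50, 25]
life_exp = [78, 72, 82, 65, 80, 70, 81, 68]
population = [50, 120, 30, 200, 80, 150, 40, 180]
SIZE_SCALE = 3   # marker area (points^2) per unit of population, as in the template

fig, ax = plt.subplots(figsize=(8, 6))
ax.scatter(gdp, life_exp, s=[p * SIZE_SCALE for p in population],
           alpha=0.6, color="#4C72B0", edgecolors="white", linewidth=1)

# Empty scatters: nothing is drawn on the axes, but each adds a legend entry
# whose marker area uses the same scaling as the data.
for ref in (50, 100, 200):
    ax.scatter([], [], s=ref * SIZE_SCALE, color="#4C72B0", alpha=0.6,
               label=str(ref))
legend = ax.legend(title="Population", labelspacing=1.2, frameon=False,
                   loc="lower right")

ax.set_xlabel("GDP per Capita ($K)")
ax.set_ylabel("Life Expectancy (years)")
plt.tight_layout()
plt.show()
```

Because `s` is an area, the reference circle for 200 has four times the area of the one for 50, and therefore twice the radius.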
35.5.3 Connected Scatter Plot
One-line: A scatter plot where consecutive points are connected by lines, showing how a bivariate relationship evolves over time.
When to use: Tracking the trajectory of two variables that change over time. The path reveals whether the relationship is stable, cyclical, or diverging. Common in economics (inflation vs. unemployment) and sports analytics.
When NOT to use: When the path crosses itself many times, creating an unreadable tangle. Also ineffective if there is no meaningful temporal ordering to the points.
Data requirements: Two numeric columns, one temporal/ordinal column for ordering.
Recommended library: matplotlib.
years = list(range(2015, 2025))
inflation = [1.2, 1.5, 2.1, 2.4, 1.8, 1.3, 4.7, 8.0, 4.1, 2.9]
unemployment = [5.3, 4.9, 4.4, 3.9, 3.7, 8.1, 5.4, 3.6, 3.5, 4.0]
fig, ax = plt.subplots(figsize=(8, 6))
ax.plot(inflation, unemployment, color="#4C72B0", linewidth=1.5,
marker="o", markersize=6, zorder=2)
for i, yr in enumerate(years):
    ax.annotate(str(yr), (inflation[i], unemployment[i]),
                textcoords="offset points", xytext=(6, 6), fontsize=8)
ax.set_xlabel("Inflation Rate (%)")
ax.set_ylabel("Unemployment Rate (%)")
ax.set_title("Inflation vs Unemployment (2015-2024)")
sns.despine()
plt.tight_layout()
plt.show()
Design tips:
- Label every point (or every other point) with the year/period to make the temporal sequence clear.
- Use an arrow or gradient color to indicate the direction of time.
- Highlight the start and end points with larger markers.
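One way to show the direction of time is to draw each segment as an arrow and shade the arrows from light to dark. The sketch below reuses the template's data and uses `ax.annotate` with an empty label purely as an arrow-drawing device; the colormap choice is an assumption.

```python
import matplotlib.pyplot as plt

years = list(range(2015, 2025))
inflation = [1.2, 1.5, 2.1, 2.4, 1.8, 1.3, 4.7, 8.0, 4.1, 2.9]
unemployment = [5.3, 4.9, 4.4, 3.9, 3.7, 8.1, 5.4, 3.6, 3.5, 4.0]

fig, ax = plt.subplots(figsize=(8, 6))
ax.scatter(inflation, unemployment, color="#4C72B0", s=30, zorder=3)

arrows = []
n_segments = len(years) - 1
for i in range(n_segments):
    shade = plt.cm.Blues(0.3 + 0.7 * i / (n_segments - 1))   # later segments are darker
    arrows.append(ax.annotate(
        "", xy=(inflation[i + 1], unemployment[i + 1]),
        xytext=(inflation[i], unemployment[i]),
        arrowprops=dict(arrowstyle="->", color=shade, linewidth=1.8)))

# Larger markers for the first and last periods
ax.scatter([inflation[0], inflation[-1]], [unemployment[0], unemployment[-1]],
           s=120, facecolors="none", edgecolors="#C44E52", linewidth=2, zorder=4)
ax.annotate(str(years[0]), (inflation[0], unemployment[0]),
            textcoords="offset points", xytext=(8, 8))
ax.annotate(str(years[-1]), (inflation[-1], unemployment[-1]),
            textcoords="offset points", xytext=(8, 8))
ax.set_xlabel("Inflation Rate (%)")
ax.set_ylabel("Unemployment Rate (%)")
plt.tight_layout()
plt.show()
```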
35.5.4 Heatmap
One-line: A matrix of colored cells where color intensity represents value magnitude.
When to use: Visualizing values across two categorical dimensions (e.g., day of week vs. hour of day, feature A vs. feature B). Also the standard way to display correlation matrices and confusion matrices.
When NOT to use: When either dimension has more than about 30 categories (the cells become too small). When precise numeric reading is required — color discrimination is imprecise. Add text annotations to the cells for precision.
Data requirements: A 2D matrix (DataFrame or array) of numeric values.
Recommended library: seaborn.
np.random.seed(42)
data = np.random.randint(10, 100, size=(7, 5))
days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
hours = ["9am", "11am", "1pm", "3pm", "5pm"]
fig, ax = plt.subplots(figsize=(8, 6))
sns.heatmap(pd.DataFrame(data, index=days, columns=hours),
annot=True, fmt="d", cmap="YlOrRd", linewidths=0.5,
cbar_kws={"label": "Visitors"}, ax=ax)
ax.set_title("Website Visitors by Day and Hour")
ax.set_ylabel("")
plt.tight_layout()
plt.show()
Design tips:
- Use a sequential colormap (YlOrRd, viridis, Blues) for values that run in one direction from low to high, such as counts. Use a diverging colormap (RdBu, coolwarm) when values fall on both sides of a meaningful midpoint, such as zero change.
- Always annotate cells with values when the matrix is small enough (< 15x15) so readers get exact numbers plus color patterns.
- Add linewidths to separate cells visually.
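For data with a meaningful midpoint, the colormap's neutral color should sit exactly on that midpoint even when the data range is lopsided. The sketch below shows one way to do this in plain matplotlib with `TwoSlopeNorm`; the week-over-week change values are synthetic. In seaborn, the `center` argument of `sns.heatmap` serves a similar purpose.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import TwoSlopeNorm

rng = np.random.default_rng(42)
change = rng.normal(5, 15, size=(7, 5))    # synthetic % change vs. the previous week
days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
hours = ["9am", "11am", "1pm", "3pm", "5pm"]

# Map zero to the middle of the colormap, whatever the min and max happen to be.
norm = TwoSlopeNorm(vmin=change.min(), vcenter=0.0, vmax=change.max())

fig, ax = plt.subplots(figsize=(8, 6))
im = ax.imshow(change, cmap="RdBu_r", norm=norm, aspect="auto")
ax.set_xticks(range(len(hours)))
ax.set_xticklabels(hours)
ax.set_yticks(range(len(days)))
ax.set_yticklabels(days)
for row in range(change.shape[0]):
    for col in range(change.shape[1]):
        ax.text(col, row, f"{change[row, col]:+.0f}", ha="center", va="center",
                fontsize=8)
fig.colorbar(im, ax=ax, label="Change in visitors (%)")
ax.set_title("Visitor Change by Day and Hour")
plt.tight_layout()
plt.show()
```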
35.5.5 Correlogram
One-line: A heatmap of pairwise correlation coefficients across all numeric columns in a dataset.
When to use: Quick exploratory analysis of multivariate relationships. Immediately reveals which variables are strongly correlated (positively or negatively) and which are independent.
When NOT to use: When you have fewer than 4 variables (just compute the correlation manually). Also misleading when relationships are non-linear — Pearson correlation measures only linear association.
Data requirements: A DataFrame with 4+ numeric columns.
Recommended library: seaborn.
np.random.seed(42)
data = pd.DataFrame({
"Revenue": np.random.normal(100, 20, 100),
"Marketing": np.random.normal(50, 10, 100),
"Headcount": np.random.normal(200, 30, 100),
"Satisfaction": np.random.normal(75, 8, 100),
})
data["Revenue"] += 0.7 * data["Marketing"] # inject correlation
corr = data.corr()
mask = np.triu(np.ones_like(corr, dtype=bool))
fig, ax = plt.subplots(figsize=(7, 6))
sns.heatmap(corr, mask=mask, annot=True, fmt=".2f", cmap="RdBu_r",
center=0, vmin=-1, vmax=1, linewidths=0.5, ax=ax)
ax.set_title("Correlation Matrix")
plt.tight_layout()
plt.show()
Design tips:
- Mask the upper triangle (np.triu) to remove redundant information — the matrix is symmetric.
- Use a diverging colormap centered at zero (RdBu_r or coolwarm) so negative and positive correlations are visually distinct.
- Annotate every cell with the correlation value rounded to two decimal places.
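The warning about non-linear relationships is easy to demonstrate. In the sketch below, `quadratic` is fully determined by `x`, yet its Pearson coefficient is close to zero; all column names and values are synthetic.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
x = rng.uniform(-1, 1, 1000)
df = pd.DataFrame({
    "x": x,
    "linear": 2 * x + rng.normal(0, 0.2, 1000),   # strong linear dependence
    "quadratic": x ** 2,                          # perfect but non-linear dependence
})

pearson = df.corr(method="pearson")
print(pearson.round(2))
```

A correlogram of this frame would show a near-white cell for `x` versus `quadratic`, so pair it with a scatter plot matrix before concluding that two variables are unrelated.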
35.5.6 Hexbin Plot
One-line: A 2D histogram using hexagonal bins, where color represents the count (or aggregated value) in each hex.
When to use: Scatter plots with massive overplotting (n > 5,000). Hexagons tile the plane more efficiently than squares and have uniform neighbor distance, making density patterns clearer.
When NOT to use: Small datasets where individual points are visible and informative. Also not ideal for non-technical audiences unfamiliar with the encoding.
Data requirements: Two numeric columns.
Recommended library: matplotlib.
np.random.seed(42)
x = np.random.normal(0, 1, 50000)
y = x * 0.5 + np.random.normal(0, 0.8, 50000)
fig, ax = plt.subplots(figsize=(8, 6))
hb = ax.hexbin(x, y, gridsize=40, cmap="YlGnBu", mincnt=1)
cb = fig.colorbar(hb, ax=ax, label="Count")
ax.set_xlabel("Variable X")
ax.set_ylabel("Variable Y")
ax.set_title("Density of 50,000 Observations")
sns.despine()
plt.tight_layout()
plt.show()
Design tips:
- Adjust gridsize: higher values give finer resolution but smaller hexagons with fewer points in each, so counts become noisier. Start with 30-50 for most datasets.
- Use mincnt=1 to suppress empty hexagons, keeping the chart clean.
- A sequential colormap with a light-to-dark progression communicates density most intuitively.
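The template colors each hexagon by count. To color by an aggregated value instead, `hexbin` accepts a third array through `C` and an aggregation function through `reduce_C_function`. The sketch below assumes a hypothetical third measurement `z` and averages it per hexagon.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
x = rng.normal(0, 1, 20000)
y = 0.5 * x + rng.normal(0, 0.8, 20000)
z = x**2 + y**2 + rng.normal(0, 0.5, 20000)   # hypothetical third measurement

fig, ax = plt.subplots(figsize=(8, 6))
hb = ax.hexbin(x, y, C=z, reduce_C_function=np.mean,
               gridsize=40, cmap="YlGnBu", mincnt=5)
fig.colorbar(hb, ax=ax, label="Mean of z per hexagon")
ax.set_xlabel("Variable X")
ax.set_ylabel("Variable Y")
ax.set_title("Mean of a Third Variable per Hexagon")
plt.tight_layout()
plt.show()
```

Raising `mincnt` hides hexagons whose mean rests on only a handful of points, which keeps noisy edge cells from dominating the color scale.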
35.6 Trend
Trend charts answer the question: "How has this changed over time?"
35.6.1 Line Chart
One-line: Points connected by lines, showing the trajectory of a value over a continuous or ordered axis.
When to use: Displaying change over time for 1-5 series. The most universally understood chart for temporal data. The slope of the line directly communicates rate of change.
When NOT to use: When the x-axis is categorical with no natural order — connected lines imply continuity. Also avoid when you have more than 6-7 lines, which creates spaghetti.
Data requirements: One temporal/ordinal column (x-axis), one or more numeric columns (y-axis).
Recommended library: matplotlib.
months = pd.date_range("2024-01", periods=12, freq="MS")
revenue = [42, 45, 50, 48, 55, 60, 58, 65, 70, 68, 75, 80]
fig, ax = plt.subplots(figsize=(9, 5))
ax.plot(months, revenue, color="#4C72B0", linewidth=2, marker="o", markersize=5)
ax.set_xlabel("Month")
ax.set_ylabel("Revenue ($K)")
ax.set_title("Monthly Revenue Trend (2024)")
ax.xaxis.set_major_formatter(plt.matplotlib.dates.DateFormatter("%b"))
sns.despine()
plt.tight_layout()
plt.show()
Design tips:
- Use direct labels at line endpoints instead of a legend whenever possible, since this reduces eye travel.
- Highlight key events (product launch, policy change) with vertical reference lines and annotations.
- Keep line widths between 1.5 and 2.5 for print; use thinner lines for on-screen interactive charts.
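Direct labeling takes only a few lines: place each series name at its last data point, in the line's own color, and leave room on the right. This sketch uses two synthetic series; the names and values are illustrative.

```python
import pandas as pd
import matplotlib.pyplot as plt

months = pd.date_range("2024-01", periods=12, freq="MS")
series = {   # synthetic revenue in $K
    "Product A": [42, 45, 50, 48, 55, 60, 58, 65, 70, 68, 75, 80],
    "Product B": [60, 58, 57, 59, 55, 54, 56, 52, 50, 51, 48, 46],
}

fig, ax = plt.subplots(figsize=(9, 5))
for name, values in series.items():
    (line,) = ax.plot(months, values, linewidth=2)
    # Label at the endpoint, matching the line color; leading spaces add padding.
    ax.text(months[-1], values[-1], f"  {name}", va="center",
            color=line.get_color())

# Extend the x-axis so the labels have room inside the plot area.
ax.set_xlim(months[0], months[-1] + pd.Timedelta(days=60))
ax.set_ylabel("Revenue ($K)")
ax.set_title("Monthly Revenue by Product (2024)")
plt.tight_layout()
plt.show()
```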
35.6.2 Area Chart
One-line: A line chart with the area between the line and the x-axis filled in.
When to use: Emphasizing the cumulative magnitude of a value over time. The filled area gives a stronger sense of volume than a plain line. Best for a single series or two non-overlapping series.
When NOT to use: With multiple overlapping series — later series obscure earlier ones. Use a stacked area (35.6.3) for multiple series, or just use line charts.
Data requirements: One temporal column, one numeric column.
Recommended library: matplotlib.
months = pd.date_range("2024-01", periods=12, freq="MS")
users = [1200, 1350, 1500, 1800, 2100, 2400, 2300, 2600, 2900, 3100, 3400, 3800]
fig, ax = plt.subplots(figsize=(9, 5))
ax.fill_between(months, users, alpha=0.3, color="#4C72B0")
ax.plot(months, users, color="#4C72B0", linewidth=2)
ax.set_xlabel("Month")
ax.set_ylabel("Active Users")
ax.set_title("User Growth Over 2024")
ax.xaxis.set_major_formatter(plt.matplotlib.dates.DateFormatter("%b"))
sns.despine()
plt.tight_layout()
plt.show()
Design tips:
- Use a low alpha (0.2-0.4) for the fill so that grid lines remain visible through the shading.
- Always include the line on top of the fill for a sharp boundary.
- The y-axis should start at zero when the filled area represents a cumulative quantity.
35.6.3 Stacked Area Chart
One-line: Multiple area layers stacked on top of each other, showing both individual trends and their sum.
When to use: Showing how multiple components contribute to a total over time. The top edge of the stack is the total, and each band is a component. Common for revenue by product line, traffic by source, or energy by fuel type.
When NOT to use: When you need to compare individual series precisely — only the bottom layer has a flat baseline, making all other layers hard to read. Use small multiples of line charts for precise per-series comparison.
Data requirements: One temporal column, two or more numeric columns.
Recommended library: matplotlib.
months = pd.date_range("2024-01", periods=12, freq="MS")
organic = [500, 520, 550, 580, 600, 620, 650, 680, 700, 730, 760, 800]
paid = [300, 310, 340, 360, 400, 420, 400, 430, 460, 480, 500, 520]
referral = [100, 110, 105, 115, 120, 130, 135, 140, 150, 155, 160, 170]
fig, ax = plt.subplots(figsize=(9, 5))
ax.stackplot(months, organic, paid, referral,
labels=["Organic", "Paid", "Referral"],
colors=["#4C72B0", "#55A868", "#C44E52"], alpha=0.8)
ax.set_xlabel("Month")
ax.set_ylabel("Visitors")
ax.set_title("Website Traffic by Source (2024)")
ax.legend(loc="upper left")
ax.xaxis.set_major_formatter(plt.matplotlib.dates.DateFormatter("%b"))
sns.despine()
plt.tight_layout()
plt.show()
Design tips:
- Place the most stable or most important series at the bottom, where it gets the flat baseline and is easiest to read.
- Limit to 4-5 layers. Beyond that, the chart becomes a colorful but unreadable mass.
- Use a 100% stacked area variant to focus on changing proportions over time rather than absolute values.
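The 100% stacked variant only needs one extra step: divide each column by its total before plotting. A minimal sketch using the template's traffic data:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

months = pd.date_range("2024-01", periods=12, freq="MS")
organic = [500, 520, 550, 580, 600, 620, 650, 680, 700, 730, 760, 800]
paid = [300, 310, 340, 360, 400, 420, 400, 430, 460, 480, 500, 520]
referral = [100, 110, 105, 115, 120, 130, 135, 140, 150, 155, 160, 170]

traffic = np.array([organic, paid, referral], dtype=float)   # shape (3, 12)
share = traffic / traffic.sum(axis=0) * 100                  # each month sums to 100

fig, ax = plt.subplots(figsize=(9, 5))
ax.stackplot(months, share, labels=["Organic", "Paid", "Referral"],
             colors=["#4C72B0", "#55A868", "#C44E52"], alpha=0.8)
ax.set_ylim(0, 100)
ax.set_ylabel("Share of Visitors (%)")
ax.set_title("Website Traffic Mix by Source (2024)")
ax.legend(loc="lower left")
plt.tight_layout()
plt.show()
```

The top edge is now a flat line at 100, so the chart answers "how is the mix changing?" and says nothing about total volume.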
35.6.4 Sparkline
One-line: A tiny, word-sized line chart embedded in text or a table cell, showing trend without axes or labels.
When to use: Providing at-a-glance trend context alongside tabular data. Sparklines let a table of numbers convey direction, volatility, and relative magnitude without demanding a full-size chart.
When NOT to use: When the audience needs to read precise values — sparklines intentionally lack axes and labels. They communicate shape, not specifics.
Data requirements: One numeric series (typically a time series).
Recommended library: matplotlib.
def sparkline(data, ax, color="#4C72B0"):
    ax.plot(data, color=color, linewidth=1.5)
    ax.fill_between(range(len(data)), data, alpha=0.1, color=color)
    ax.scatter([len(data) - 1], [data[-1]], color=color, s=15, zorder=3)
    ax.set_xlim(0, len(data) - 1)
    ax.axis("off")
metrics = {
"Revenue": [40, 42, 45, 43, 48, 52, 55, 58, 60, 62, 65, 70],
"Users": [1.0, 1.1, 1.2, 1.4, 1.3, 1.5, 1.6, 1.8, 2.0, 2.1, 2.3, 2.5],
"Churn (%)": [5.2, 5.0, 4.8, 4.9, 4.5, 4.3, 4.0, 3.8, 3.9, 3.5, 3.3, 3.0],
}
fig, axes = plt.subplots(len(metrics), 1, figsize=(4, 3))
for ax, (name, values) in zip(axes, metrics.items()):
    sparkline(values, ax)
    ax.text(-0.02, 0.5, name, transform=ax.transAxes, fontsize=10,
            va="center", ha="right")
fig.suptitle("KPI Sparklines (12-Month Trend)", fontsize=12, y=1.02)
plt.tight_layout()
plt.show()
Design tips:
- Highlight the last value with a dot to show the current state.
- Keep all sparklines on the same vertical scale if comparisons across rows matter; use independent scales if only direction matters.
- A faint fill under the line helps the eye detect the trend more quickly than the line alone.
35.6.5 Slope Chart (Temporal)
One-line: A minimalist two-point line connecting a "before" and "after" value, optimized for showing change between two time periods.
When to use: Comparing the change in a metric across exactly two time periods for multiple items. Simpler and cleaner than a grouped bar chart when the story is about direction and magnitude of change. The slope of each line directly communicates whether the value rose or fell and by how much.
When NOT to use: When you have more than two time periods (use a line chart). When all items changed by roughly the same amount, making all slopes nearly parallel and the chart uninformative.
Data requirements: One categorical column (items), two numeric values per item (before and after).
Recommended library: matplotlib.
products = ["Widget A", "Widget B", "Widget C", "Widget D", "Widget E"]
sales_h1 = [120, 95, 150, 80, 110]
sales_h2 = [140, 75, 165, 105, 100]
fig, ax = plt.subplots(figsize=(6, 6))
for i, p in enumerate(products):
    color = "#55A868" if sales_h2[i] >= sales_h1[i] else "#C44E52"
    ax.plot([0, 1], [sales_h1[i], sales_h2[i]], marker="o",
            color=color, linewidth=2, markersize=7)
    ax.text(-0.06, sales_h1[i], f"{p} ({sales_h1[i]})",
            ha="right", va="center", fontsize=9)
    ax.text(1.06, sales_h2[i], f"{p} ({sales_h2[i]})",
            ha="left", va="center", fontsize=9)
ax.set_xticks([0, 1])
ax.set_xticklabels(["H1 2024", "H2 2024"], fontsize=11)
ax.set_xlim(-0.4, 1.4)
ax.set_title("Sales Change: H1 vs H2 2024")
ax.get_yaxis().set_visible(False)
sns.despine(left=True, bottom=True)
plt.tight_layout()
plt.show()
Design tips:
- Color-code lines by direction: green for increase, red for decrease.
- Label both endpoints with the item name and value to eliminate the need for a y-axis.
- Avoid overlapping labels: if values are close, nudge the labels apart vertically so each stays legible.
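Label collisions can be handled with a simple greedy pass: sort the values, then push any label that sits too close to the one below it upward by a fixed gap. The helper `spread_labels` and the gap of 8 units are assumptions for illustration; this is one basic approach, not a full layout algorithm.

```python
def spread_labels(values, min_gap):
    """Return label y-positions, nudged upward so neighbours are at least min_gap apart."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    positions = [0.0] * len(values)
    previous = None
    for i in order:
        y = float(values[i])
        if previous is not None and y - previous < min_gap:
            y = previous + min_gap      # too close to the label below: push up
        positions[i] = y
        previous = y
    return positions

sales_h2 = [140, 75, 165, 105, 100]
label_y = spread_labels(sales_h2, min_gap=8)
print(label_y)   # [140.0, 75.0, 165.0, 108.0, 100.0]
```

In the template, the returned positions would replace `sales_h2[i]` as the y-coordinate of the right-hand `ax.text` calls, while the line endpoints keep their true values.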
35.6.6 Candlestick Chart
One-line: A financial chart showing open, high, low, and close values for each time period as a "candle" with a body and wicks.
When to use: Visualizing stock prices, currency exchange rates, or any data that has open-high-low-close (OHLC) structure per time period. Standard in financial analysis, where the body color (green/red) instantly signals up or down days.
When NOT to use: Outside of OHLC data contexts — the four-value-per-period structure is specific. Non-financial audiences will find the encoding unfamiliar and confusing.
Data requirements: One temporal column, four numeric columns (open, high, low, close).
Recommended library: mplfinance (matplotlib financial add-on) or plotly.
import plotly.graph_objects as go
dates = pd.date_range("2024-10-01", periods=20, freq="B")
np.random.seed(42)
close = 100 + np.cumsum(np.random.randn(20) * 2)
open_price = close + np.random.randn(20) * 0.5
high = np.maximum(open_price, close) + np.abs(np.random.randn(20))
low = np.minimum(open_price, close) - np.abs(np.random.randn(20))
fig = go.Figure(data=[go.Candlestick(
x=dates, open=open_price, high=high, low=low, close=close,
increasing_line_color="#55A868", decreasing_line_color="#C44E52"
)])
fig.update_layout(
title="Stock Price (Oct 2024)",
yaxis_title="Price ($)",
xaxis_rangeslider_visible=False,
width=800, height=500
)
fig.show()
Design tips:
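- Green (or white) for periods where close > open; red (or black) for close < open. This is the dominant convention in Western markets, so do not invert it for those audiences. Some East Asian markets use the opposite coloring (red for up, green for down).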
- Remove Plotly's default range slider (xaxis_rangeslider_visible=False) for a cleaner chart unless the user needs date navigation.
- Add volume bars as a subplot below the candlestick for full financial context.
35.7 Geospatial
Geospatial charts answer the question: "Where is this happening? How does it vary across geography?"
35.7.1 Choropleth Map
One-line: A map where geographic regions (countries, states, counties) are colored by a data value.
When to use: Showing how a metric varies across well-known geographic boundaries. Effective for election results, population density, GDP per capita, and similar region-level metrics.
When NOT to use: When geographic areas differ vastly in size — large, sparse regions (Siberia, Alaska) dominate visually even if their data values are unremarkable. Consider a cartogram or hex tile map to equalize visual weight.
Data requirements: A geographic identifier column (ISO code, FIPS, state name), one numeric column.
Recommended library: plotly.
import plotly.express as px
data = pd.DataFrame({
"state": ["CA", "TX", "NY", "FL", "IL", "PA", "OH", "GA", "NC", "MI"],
"value": [85, 72, 90, 68, 78, 65, 60, 55, 70, 58]
})
fig = px.choropleth(data, locations="state", locationmode="USA-states",
color="value", scope="usa",
color_continuous_scale="Blues",
title="Metric by State")
fig.update_layout(width=800, height=500)
fig.show()
Design tips:
- Use a sequential colormap for data with a natural minimum (e.g., counts, rates). Use a diverging colormap for data centered on a meaningful midpoint (e.g., change from baseline).
- Always include a color bar with a clear label and units.
- Consider normalizing by population or area. Raw counts on a choropleth often just reproduce the population map.
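The normalization tip is a one-line pandas operation. The sketch below uses made-up case counts and rounded, illustrative population figures (not official statistics) to show how the ranking changes once counts become rates.

```python
import pandas as pd

data = pd.DataFrame({
    "state": ["CA", "TX", "WY", "VT"],
    "cases": [7800, 5800, 140, 150],              # made-up counts
    "population_k": [39000, 29000, 580, 640],     # illustrative, in thousands
})

# Rate per 100,000 residents
data["cases_per_100k"] = data["cases"] / (data["population_k"] * 1000) * 100_000

top_by_count = data.sort_values("cases", ascending=False).iloc[0]["state"]
top_by_rate = data.sort_values("cases_per_100k", ascending=False).iloc[0]["state"]
print(data.round(1))
print("Highest raw count:", top_by_count)
print("Highest rate:", top_by_rate)
```

Passing the rate column as the `color` argument of the choropleth template, instead of the raw count, keeps large states from dominating simply because more people live there.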
35.7.2 Dot Map
One-line: Individual dots placed at geographic coordinates, one per data point or per fixed quantity.
When to use: Showing the spatial distribution of events or entities when you have precise location data (latitude/longitude). Each dot can represent one observation or a fixed number (e.g., 1 dot = 100 people).
When NOT to use: When you have so many points that the map becomes a solid mass of color. Use a heatmap or hexbin map for very dense point clouds.
Data requirements: Latitude and longitude columns, optionally a categorical column for color.
Recommended library: plotly or folium.
import plotly.express as px
np.random.seed(42)
data = pd.DataFrame({
"lat": np.random.uniform(25, 48, 200),
"lon": np.random.uniform(-125, -70, 200),
"type": np.random.choice(["Type A", "Type B"], 200)
})
fig = px.scatter_geo(data, lat="lat", lon="lon", color="type",
scope="usa", title="Event Locations by Type",
color_discrete_sequence=["#4C72B0", "#C44E52"])
fig.update_layout(width=800, height=500)
fig.show()
Design tips:
- Use small dots (marker size 3-5) to avoid overlap in dense areas.
- Encode a categorical variable with color and a quantitative variable with size for a richer display.
- Add a basemap with minimal detail. The data should be the visual focus, not the map tiles.
35.7.3 Bubble Map
One-line: A map with circles placed at geographic coordinates, sized by a quantitative variable.
When to use: Showing magnitude at specific locations — city populations, earthquake magnitudes, store revenue. The bubble size communicates value while the position communicates location.
When NOT to use: When bubbles overlap heavily in dense regions, obscuring both data and geography. Also ineffective if the range of values is very narrow and all bubbles look the same size.
Data requirements: Latitude, longitude, one numeric column for size, optionally one for color.
Recommended library: plotly.
import plotly.express as px
cities = pd.DataFrame({
"city": ["New York", "Los Angeles", "Chicago", "Houston", "Phoenix"],
"lat": [40.71, 34.05, 41.88, 29.76, 33.45],
"lon": [-74.01, -118.24, -87.63, -95.37, -112.07],
"population": [8.3, 3.9, 2.7, 2.3, 1.6]
})
fig = px.scatter_geo(cities, lat="lat", lon="lon", size="population",
text="city", scope="usa",
title="US City Populations (millions)",
size_max=40)
fig.update_layout(width=800, height=500)
fig.show()
Design tips:
- Scale by area (not radius) to avoid exaggerating large values. Plotly Express does this by default: its size argument maps values to marker area. With plotly.graph_objects, set marker.sizemode="area" yourself.
- Use transparency (opacity=0.6) so overlapping bubbles remain partially visible.
- Add text labels for the largest bubbles; use hover for the rest.
35.7.4 Hex Tile Map
One-line: A map where each geographic region (typically a US state) is represented by an identically sized hexagon arranged to approximate geographic layout.
When to use: When you want equal visual weight for every region regardless of its geographic area. A hex tile map of US states gives Wyoming the same size as Texas, ensuring that the data — not land mass — drives the visual impression.
When NOT to use: When precise geographic accuracy matters. The hexagonal approximation distorts spatial relationships enough that it is unsuitable for showing distances, borders, or spatial clustering.
Data requirements: A geographic identifier column, one numeric column for color.
Recommended library: plotly (with a hex tile GeoJSON) or manual matplotlib.
import plotly.express as px
# Simplified: use Plotly choropleth with a note about hex tile approach
# For true hex tiles, use a hex tile GeoJSON from sources like
# https://github.com/holtzy/D3-graph-gallery
states = pd.DataFrame({
"state": ["AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DE", "FL", "GA"],
"score": [62, 70, 58, 55, 85, 78, 80, 72, 65, 60]
})
fig = px.choropleth(states, locations="state", locationmode="USA-states",
color="score", scope="usa",
color_continuous_scale="Viridis",
title="State Scores (standard choropleth; hex tile requires custom GeoJSON)")
fig.update_layout(width=800, height=500)
fig.show()
Design tips:
- Source a hex tile GeoJSON from community repositories; building one from scratch is tedious.
- Color each hex by the data variable; use a sequential colormap for quantitative data.
- Label each hex with the state abbreviation since the hexagonal positions are approximate.
35.7.5 Cartogram
One-line: A map where the size of each region is distorted to be proportional to a data variable rather than land area.
When to use: When the story is about magnitude (population, GDP, votes) rather than geography. A cartogram of world population makes India and China visually dominant while shrinking Russia and Canada, telling a very different story than a standard map.
When NOT to use: When geographic accuracy matters, or when the audience is unfamiliar with the technique and will be confused by the distorted shapes. Always pair with a standard map for context.
Data requirements: A geographic boundary file (GeoJSON/shapefile), one numeric column for distortion.
Recommended library: geopandas with a cartogram algorithm, or use an external tool and import the result.
# Cartograms require specialized computation. This example shows the
# conceptual workflow using geopandas:
import geopandas as gpd
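# Load a Natural Earth world shapefile. gpd.datasets ships only with geopandas < 1.0
# (deprecated in 0.14, removed in 1.0); on newer versions, pass gpd.read_file the
# path to a separately downloaded Natural Earth file instead.
world = gpd.read_file(gpd.datasets.get_path("naturalearth_lowres"))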
world = world[world.continent != "Antarctica"]
# Standard choropleth as baseline — true cartogram distortion
# requires the 'cartogram' R package or the Python 'cartogram_geopandas' library
fig, ax = plt.subplots(figsize=(12, 6))
world.plot(column="pop_est", cmap="YlOrRd", legend=True,
legend_kwds={"label": "Population", "shrink": 0.5}, ax=ax)
ax.set_title("World Population (standard map; use cartogram library for distortion)")
ax.axis("off")
plt.tight_layout()
plt.show()
Design tips:
- Always include a legend explaining that area represents data, not land mass.
- Provide a small inset of the undistorted map so readers can orient themselves.
- Dorling cartograms (circles sized by value) are an easier alternative to boundary-distortion cartograms.
35.8 Flow and Network
Flow and network charts answer the question: "Where does this flow? How are things connected?"
35.8.1 Sankey Diagram
One-line: Weighted, directed flows between nodes, where the width of each link is proportional to the flow magnitude.
When to use: Showing how quantities flow from sources through intermediate stages to destinations — energy flows, budget allocation, user journeys, migration patterns. The visual width of the links makes magnitudes immediately apparent.
When NOT to use: When there are too many nodes (> 20) or links (> 40) — the diagram becomes an unreadable web. Also poor when flows are roughly equal in size, as all links look the same width.
Data requirements: Source and target columns (categorical), one numeric column (flow magnitude).
Recommended library: plotly.
import plotly.graph_objects as go
labels = ["Salary", "Freelance", "Rent", "Food", "Savings", "Transport", "Other"]
source = [0, 0, 1, 1, 0, 0, 1]
target = [2, 3, 4, 5, 4, 6, 6]
values = [1500, 800, 500, 300, 700, 400, 200]
fig = go.Figure(data=[go.Sankey(
node=dict(pad=20, thickness=20, label=labels,
color=["#4C72B0", "#55A868", "#C44E52", "#C44E52",
"#4C72B0", "#C44E52", "#8172B2"]),
link=dict(source=source, target=target, value=values,
color=["rgba(76,114,176,0.4)"] * len(values))
)])
fig.update_layout(title="Monthly Budget Flow", width=800, height=500)
fig.show()
Design tips:
- Order nodes logically: sources on the left, destinations on the right, intermediaries in the middle.
- Use low-opacity link colors so overlapping flows remain distinguishable.
- Label nodes clearly; label links with values if there are fewer than 10 links.
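The Sankey trace takes integer node indices rather than names, so data that arrives as source-target-value rows needs a small conversion first. A minimal sketch, assuming the rows are plain tuples; the helper name `to_sankey_indices` is hypothetical and not part of plotly.

```python
def to_sankey_indices(rows):
    """Convert (source_name, target_name, value) rows into the
    labels / source / target / value lists used by a Sankey trace."""
    labels = []   # unique node names, in first-seen order
    index = {}    # node name -> position in labels
    source, target, values = [], [], []
    for src, dst, val in rows:
        for name in (src, dst):
            if name not in index:
                index[name] = len(labels)
                labels.append(name)
        source.append(index[src])
        target.append(index[dst])
        values.append(val)
    return labels, source, target, values


rows = [
    ("Salary", "Rent", 1500),
    ("Salary", "Food", 800),
    ("Freelance", "Savings", 500),
]
labels, source, target, values = to_sankey_indices(rows)
print(labels)  # ['Salary', 'Rent', 'Food', 'Freelance', 'Savings']
print(source)  # [0, 0, 3]
print(target)  # [1, 2, 4]
```

The four lists can then be passed to `go.Sankey` in the same way as the hand-written lists in the example above.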
35.8.2 Alluvial Diagram
One-line: A Sankey-like chart where categorical variables at multiple stages are connected by flowing ribbons, showing how group memberships shift across stages.
When to use: Showing how individuals move between categories over time or across survey questions. Common for customer segmentation shifts, voting pattern changes, and academic major switches. The ribbons make transitions visible.
When NOT to use: When there are many categories at each stage (> 8) — the ribbons tangle. Also ineffective when most individuals stay in the same category (the dominant "straight-through" ribbons overwhelm the interesting transitions).
Data requirements: Two or more categorical columns (stages), optionally a count/weight column.
Recommended library: plotly (as a Sankey variant) or the pyalluvial package.
import plotly.graph_objects as go
# Three stages: initial segment, mid-year segment, end-year segment
labels = ["Free", "Basic", "Premium", # Stage 1 (indices 0-2)
"Free", "Basic", "Premium", # Stage 2 (indices 3-5)
"Free", "Basic", "Premium"] # Stage 3 (indices 6-8)
source = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
target = [3, 4, 4, 5, 4, 5, 6, 7, 7, 8, 7, 8]
values = [80, 20, 15, 35, 5, 45, 70, 10, 20, 20, 15, 65]  # inflow equals outflow at each Stage 2 node
fig = go.Figure(data=[go.Sankey(
node=dict(pad=20, thickness=15, label=labels,
color=["#C44E52", "#4C72B0", "#55A868"] * 3),
link=dict(source=source, target=target, value=values,
color=["rgba(200,200,200,0.4)"] * len(values))
)])
fig.update_layout(title="Customer Tier Migration (Q1 > Q2 > Q3)", width=900, height=500)
fig.show()
Design tips:
- Color the ribbons by source category to track where each group ends up.
- Arrange stages left-to-right with clear column labels.
- Highlight the most interesting transition (e.g., churn) with a distinct color while keeping others muted.
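Coloring ribbons by source category comes down to building one color per link from the color of the node it starts at. The sketch below assumes the node colors and `source` list from the example above; `hex_to_rgba` is a hypothetical helper, not a plotly function.

```python
def hex_to_rgba(hex_color, alpha=0.4):
    """Turn '#RRGGBB' into an 'rgba(r,g,b,a)' color string."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return f"rgba({r},{g},{b},{alpha})"


# Free, Basic, Premium repeated for each of the three stages
node_colors = ["#C44E52", "#4C72B0", "#55A868"] * 3
source = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5]

# One color per link, taken from the node the ribbon starts at
link_colors = [hex_to_rgba(node_colors[s]) for s in source]
print(link_colors[0])  # rgba(196,78,82,0.4)
```

Passing `color=link_colors` inside the `link` dictionary replaces the uniform gray ribbons with source-colored ones.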
35.8.3 Chord Diagram
One-line: A circular layout where arcs on the perimeter represent categories and chords connecting them represent flows or relationships.
When to use: Showing pairwise flows between categories where every category can be both a source and a destination. Trade flows between countries, migration between cities, or collaboration between departments. The circular layout treats all nodes equally.
When NOT to use: When the matrix is sparse (few connections) — the diagram looks empty. Also hard to read with more than about 10 nodes.
Data requirements: A square matrix (n x n) of flow values, or source-target-value triplets.
Recommended library: holoviews (with the matplotlib or bokeh backend); no single Python library dominates. Plotly has no built-in chord trace, so a plotly version requires constructing the arcs and ribbons by hand.
# Chord diagrams are complex to construct from scratch in matplotlib.
# The recommended approach is holoviews:
import holoviews as hv
hv.extension("matplotlib")
data = pd.DataFrame({
"source": ["A", "A", "B", "B", "C", "C"],
"target": ["B", "C", "A", "C", "A", "B"],
"value": [50, 30, 40, 60, 20, 35]
})
chord = hv.Chord(data)
chord.opts(
labels="index", cmap="Set2",
edge_color="source", node_color="index",
title="Inter-Department Collaboration"
)
hv.render(chord)
Design tips:
- Use color to encode the source of each chord for directional clarity.
- Sort nodes by total flow magnitude to minimize chord crossings.
- Install holoviews with pip install holoviews. Alternatively, export your adjacency matrix to an online chord diagram tool for quick prototyping.
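When the flows arrive as a square matrix rather than as triplets, they can be flattened into source-target-value rows before building the chart. A minimal sketch; `matrix_to_triplets` is a hypothetical helper, and dropping the diagonal (self-flows) and zero entries is an assumption that may not suit every dataset.

```python
def matrix_to_triplets(names, matrix):
    """Flatten a square flow matrix into source/target/value rows,
    skipping the diagonal and zero flows."""
    rows = []
    for i, src in enumerate(names):
        for j, dst in enumerate(names):
            if i != j and matrix[i][j] > 0:
                rows.append({"source": src, "target": dst, "value": matrix[i][j]})
    return rows


names = ["A", "B", "C"]
matrix = [
    [0, 50, 30],   # flows out of A
    [40, 0, 60],   # flows out of B
    [20, 35, 0],   # flows out of C
]
triplets = matrix_to_triplets(names, matrix)
print(triplets[0])    # {'source': 'A', 'target': 'B', 'value': 50}
print(len(triplets))  # 6
```

Wrapping the result in `pd.DataFrame(triplets)` yields the same three-column frame used in the holoviews example.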
35.8.4 Network Graph (Force-Directed)
One-line: Nodes and edges laid out by a physics-based simulation where connected nodes attract and unconnected nodes repel.
When to use: Exploring the structure of a network — social connections, citation graphs, dependency trees, co-occurrence relationships. The force-directed layout automatically reveals clusters, bridges, and outliers.
When NOT to use: When the network has more than about 500 nodes and 2,000 edges — the layout becomes a hairball. Use community detection and summarize at the group level, or switch to an adjacency matrix heatmap.
Data requirements: A node list and an edge list (source, target, optionally weight).
Recommended library: networkx (layout) + matplotlib (rendering), or pyvis for interactive output.
import networkx as nx
np.random.seed(42)
G = nx.karate_club_graph()
pos = nx.spring_layout(G, seed=42)
communities = nx.community.greedy_modularity_communities(G)
color_map = {}
palette = ["#4C72B0", "#C44E52", "#55A868", "#8172B2"]
for i, comm in enumerate(communities):
    for node in comm:
        color_map[node] = palette[i % len(palette)]
fig, ax = plt.subplots(figsize=(9, 7))
nx.draw_networkx(G, pos, ax=ax,
node_color=[color_map[n] for n in G.nodes()],
node_size=300, edge_color="#CCCCCC",
font_size=8, font_color="white")
ax.set_title("Zachary's Karate Club Network")
ax.axis("off")
plt.tight_layout()
plt.show()
Design tips:
- Color nodes by community/cluster to reveal group structure.
- Size nodes by degree (number of connections) or another centrality measure.
- Use low-opacity, thin edges to prevent the edge bundle from overwhelming the nodes.
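Sizing nodes by degree needs only a count of how many edges touch each node. The sketch below works from a plain edge list so that it runs without networkx; `degree_sizes` and its `base` and `step` parameters are hypothetical names chosen for illustration.

```python
from collections import Counter


def degree_sizes(edges, base=100, step=40):
    """Map each node to a marker size that grows linearly with its degree."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return {node: base + step * d for node, d in degree.items()}


edges = [(0, 1), (0, 2), (0, 3), (1, 2)]
sizes = degree_sizes(edges)
print(sizes)  # {0: 220, 1: 180, 2: 180, 3: 140}
```

With a networkx graph, `G.edges()` can supply the pairs, and `node_size=[sizes.get(n, 100) for n in G.nodes()]` passes the result to `nx.draw_networkx`; the `.get` fallback covers isolated nodes, which never appear in an edge list.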
35.8.5 Arc Diagram
One-line: Nodes arranged on a single horizontal line with semicircular arcs above (or below) connecting linked nodes.
When to use: When you want a network visualization that is more compact and orderly than a force-directed layout. The linear arrangement makes it easy to see the range and density of connections. Works well for small-to-medium networks (10-50 nodes).
When NOT to use: When the network is dense and arcs would stack heavily, creating an unreadable tangle. Also poor for revealing 2D spatial clustering — all structure is compressed to one dimension.
Data requirements: A node list (with ordering), an edge list.
Recommended library: matplotlib (manual construction).
from matplotlib.patches import Arc
nodes = list(range(15))
edges = [(0, 5), (1, 8), (2, 12), (3, 7), (4, 13), (5, 10),
         (6, 11), (7, 14), (0, 3), (8, 12), (1, 6)]
fig, ax = plt.subplots(figsize=(12, 5))
# Draw nodes
ax.scatter(nodes, [0] * len(nodes), s=100, color="#4C72B0", zorder=3)
for n in nodes:
    ax.text(n, -0.15, str(n), ha="center", fontsize=8)
# Draw arcs: the upper half of an ellipse spanning the two linked nodes
for u, v in edges:
    center = (u + v) / 2
    span = abs(v - u)
    arc = Arc(
        (center, 0), span, span * 0.8,
        angle=0, theta1=0, theta2=180,
        color="#C44E52", linewidth=1.2, alpha=0.6
    )
    ax.add_patch(arc)
ax.set_xlim(-1, max(nodes) + 1)
ax.set_ylim(-0.5, max(abs(v - u) for u, v in edges) * 0.5)
ax.set_title("Arc Diagram: Node Connections")
ax.axis("off")
plt.tight_layout()
plt.show()
Design tips:
- Order nodes meaningfully (by degree, by cluster, by time); the ordering determines which arcs are short (local connections) and which are long (distant connections).
- Use arc thickness to encode edge weight.
- Color arcs by source node or by community for additional information.
35.9 Part-to-Whole and Ranking
These charts answer the question: "How do the parts add up to the total, and where does each item rank?"
35.9.1 Waterfall Chart
One-line: Bars that float, showing how a starting value is increased or decreased by intermediate steps to arrive at a final value.
When to use: Explaining how a total changes through a series of additive or subtractive components. Classic use cases include financial bridges (revenue to profit), population change (births minus deaths plus migration), and variance analysis.
When NOT to use: When the components do not add sequentially to a meaningful total. Also confusing if there are more than about 10-12 steps, as the floating bars become hard to follow.
Data requirements: One categorical column (step names), one numeric column (values, positive or negative), with an implied running total.
Recommended library: matplotlib or plotly.
steps = ["Revenue", "COGS", "Gross Profit", "OpEx", "Tax", "Net Income"]
amounts = [500, -200, None, -150, -40, None]  # None marks a subtotal bar
# Derive each bar's baseline, height, and color from the running total
bottoms_v, bar_vals, colors_v, levels = [], [], [], []
total = 0
for a in amounts:
    if a is None:  # subtotal: full-height bar starting at zero
        bottoms_v.append(0)
        bar_vals.append(total)
        colors_v.append("#4C72B0")
    else:
        # An increase sits on top of the running total; a decrease hangs below it
        bottoms_v.append(total if a >= 0 else total + a)
        bar_vals.append(abs(a))
        colors_v.append("#55A868" if a >= 0 else "#C44E52")
        total += a
    levels.append(total)  # running total after this step
fig, ax = plt.subplots(figsize=(10, 6))
bars = ax.bar(steps, bar_vals, bottom=bottoms_v, color=colors_v, edgecolor="white",
              width=0.6)
# Connector lines at the running-total level carried into the next bar
for i in range(len(steps) - 1):
    ax.plot([i + 0.3, i + 0.7], [levels[i], levels[i]], color="gray",
            linewidth=0.8, linestyle="--")
# Value labels above each bar
for i, (b, v) in enumerate(zip(bottoms_v, bar_vals)):
    ax.text(i, b + v + 5, f"${v}M", ha="center", fontsize=9, fontweight="bold")
ax.set_ylabel("Amount ($M)")
ax.set_title("Income Statement Waterfall")
sns.despine()
plt.tight_layout()
plt.show()
Design tips:
- Use green for increases, red for decreases, and a neutral blue or gray for subtotals (Gross Profit, Net Income).
- Add thin connector lines between bars to show the running total level.
- Label each bar with its value since the floating baseline makes visual estimation difficult.
35.9.2 Marimekko (Mosaic) Chart
One-line: A stacked bar chart where both the bar widths and the segment heights are proportional to data values, creating a two-dimensional part-to-whole display.
When to use: Showing market share or segment composition where both the size of each group (bar width) and the breakdown within each group (segment height) matter. Classic in strategy consulting for market landscape analysis.
When NOT to use: When the audience is unfamiliar with the format — Marimekko charts are not intuitive to general audiences. Also difficult to read when there are many segments (> 5 columns or > 4 rows).
Data requirements: Two categorical columns and one numeric column, or a contingency table.
Recommended library: matplotlib (manual construction).
segments = ["Enterprise", "Mid-Market", "SMB"]
widths = [50, 30, 20] # Market size as percentage
# Product mix within each segment (percentages summing to 100)
products = {
"Product A": [60, 40, 25],
"Product B": [25, 35, 45],
"Product C": [15, 25, 30],
}
colors = {"Product A": "#4C72B0", "Product B": "#55A868", "Product C": "#C44E52"}
fig, ax = plt.subplots(figsize=(10, 6))
cum_width = 0
for i, (seg, w) in enumerate(zip(segments, widths)):
    cum_height = 0
    for prod, shares in products.items():
        h = shares[i]
        ax.bar(cum_width + w / 2, h, bottom=cum_height, width=w * 0.95,
               color=colors[prod], edgecolor="white", linewidth=1,
               label=prod if i == 0 else "")
        if h > 8:
            ax.text(cum_width + w / 2, cum_height + h / 2,
                    f"{h}%", ha="center", va="center", fontsize=9, color="white",
                    fontweight="bold")
        cum_height += h
    ax.text(cum_width + w / 2, -4, f"{seg}\n({w}%)", ha="center", fontsize=10)
    cum_width += w
ax.set_xlim(-2, 102)
ax.set_ylim(-8, 105)
ax.set_ylabel("Product Mix (%)")
ax.set_title("Market Landscape: Segment Size x Product Penetration")
ax.legend(loc="upper right")
ax.get_xaxis().set_visible(False)
sns.despine(bottom=True)
plt.tight_layout()
plt.show()
Design tips:
- Label both axes: bar width represents one dimension (e.g., market size), bar height the other (e.g., product mix).
- Use percentage labels inside cells so the reader does not need to estimate from axes.
- Keep the grid simple: 3-4 columns and 3-4 segments is the sweet spot.
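The example above starts from percentages that are already computed. When the input is a contingency table of raw values, the column widths and segment heights can be derived as sketched below; `marimekko_shares` is a hypothetical helper, and the revenue figures are invented purely to illustrate the arithmetic.

```python
def marimekko_shares(table):
    """table maps column name -> {row name -> raw value}.
    Returns (widths, heights): each column's share of the grand total and
    each cell's share of its own column, both in percent."""
    grand_total = sum(sum(cells.values()) for cells in table.values())
    widths, heights = {}, {}
    for col, cells in table.items():
        col_total = sum(cells.values())
        widths[col] = 100 * col_total / grand_total
        heights[col] = {row: 100 * v / col_total for row, v in cells.items()}
    return widths, heights


# Hypothetical revenue by segment and product
revenue = {
    "Enterprise": {"Product A": 30, "Product B": 15, "Product C": 5},
    "Mid-Market": {"Product A": 12, "Product B": 9, "Product C": 9},
    "SMB": {"Product A": 5, "Product B": 10, "Product C": 5},
}
widths, heights = marimekko_shares(revenue)
print(widths)                 # {'Enterprise': 50.0, 'Mid-Market': 30.0, 'SMB': 20.0}
print(heights["Enterprise"])  # {'Product A': 60.0, 'Product B': 30.0, 'Product C': 10.0}
```

Widths sum to 100 across columns and heights sum to 100 within each column, which is the property the plotting loop relies on.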
35.9.3 Funnel Chart
One-line: A progressively narrowing set of bars or trapezoids showing conversion or attrition through sequential stages.
When to use: Showing how an initial population decreases through a sequential process — sales funnels (leads to closed deals), user onboarding flows, or hiring pipelines. The visual narrowing immediately communicates drop-off.
When NOT to use: When the stages are not strictly sequential or when the process can loop back to earlier stages. Also misleading if the "stages" do not represent a single cohort progressing through time.
Data requirements: One ordinal column (stage names in order), one numeric column (count or percentage at each stage).
Recommended library: plotly.
import plotly.express as px
stages = ["Visitors", "Sign-ups", "Trial Users", "Paid Users", "Retained (12mo)"]
counts = [10000, 3500, 1200, 450, 280]
fig = px.funnel(
y=stages, x=counts,
title="User Acquisition Funnel",
color_discrete_sequence=["#4C72B0"]
)
fig.update_layout(width=700, height=500)
fig.show()
Design tips:
- Label each stage with both the absolute count and the conversion rate from the previous stage.
- Use a single color with decreasing opacity, or a sequential color scale that darkens as the funnel narrows.
- Place the widest stage at the top; do not invert the funnel.
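The first tip asks for a count plus a step conversion rate on every stage. One way to build those labels is sketched below, using the stage data from the example above; `funnel_labels` is a hypothetical helper name.

```python
stages = ["Visitors", "Sign-ups", "Trial Users", "Paid Users", "Retained (12mo)"]
counts = [10000, 3500, 1200, 450, 280]


def funnel_labels(stages, counts):
    """Build one text label per stage: count plus step conversion rate."""
    labels = []
    for i, (stage, n) in enumerate(zip(stages, counts)):
        if i == 0:
            labels.append(f"{stage}: {n:,}")  # the first stage has no predecessor
        else:
            rate = 100 * n / counts[i - 1]
            labels.append(f"{stage}: {n:,} ({rate:.1f}% of previous)")
    return labels


for line in funnel_labels(stages, counts):
    print(line)
# Visitors: 10,000
# Sign-ups: 3,500 (35.0% of previous)
# Trial Users: 1,200 (34.3% of previous)
# Paid Users: 450 (37.5% of previous)
# Retained (12mo): 280 (62.2% of previous)
```

Plotly's `go.Funnel` trace can also display this itself through `textinfo="value+percent previous"`, which may be simpler when custom wording is not needed.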
35.10 Specialized
These chart types serve niche but important roles that do not fit neatly into the categories above.
35.10.1 Radar / Spider Chart
One-line: A polygon on a radial grid where each spoke represents a variable, and the polygon's shape shows the profile across all variables.
When to use: Comparing the profile of one or two entities across 4-8 dimensions. Common in product comparison, athlete skill profiles, and survey results where a balanced "shape" is meaningful.
When NOT to use: When comparing more than 2-3 entities (overlapping polygons become unreadable). Also misleading because the area of the polygon is affected by variable order — rearranging spokes changes the visual impression without changing the data. Use parallel coordinates for precise multi-dimensional comparison.
Data requirements: One row per entity, one column per dimension (all on comparable scales).
Recommended library: matplotlib.
categories = ["Speed", "Power", "Accuracy", "Endurance", "Agility", "Technique"]
values_a = [85, 70, 90, 60, 80, 75]
values_b = [70, 90, 65, 85, 60, 80]
angles = np.linspace(0, 2 * np.pi, len(categories), endpoint=False).tolist()
values_a += values_a[:1] # close the polygon
values_b += values_b[:1]
angles += angles[:1]
fig, ax = plt.subplots(figsize=(7, 7), subplot_kw=dict(polar=True))
ax.plot(angles, values_a, "o-", linewidth=2, label="Player A", color="#4C72B0")
ax.fill(angles, values_a, alpha=0.15, color="#4C72B0")
ax.plot(angles, values_b, "o-", linewidth=2, label="Player B", color="#C44E52")
ax.fill(angles, values_b, alpha=0.15, color="#C44E52")
ax.set_xticks(angles[:-1])
ax.set_xticklabels(categories, fontsize=10)
ax.set_ylim(0, 100)
ax.set_title("Player Comparison", y=1.08, fontsize=14)
ax.legend(loc="upper right", bbox_to_anchor=(1.2, 1.1))
plt.tight_layout()
plt.show()
Design tips:
- Normalize all variables to the same scale (e.g., 0-100) before plotting.
- Limit to 2 overlapping polygons; for more entities, use small multiples of individual radar charts.
- Be aware that the polygon area is a meaningless artifact; do not draw conclusions from it.
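The warning about polygon area can be checked with a few lines of arithmetic. With equally spaced spokes, the polygon is a ring of triangles between neighboring spokes, so its area depends on which values sit next to each other. The sketch below uses hypothetical scores chosen to make the effect obvious.

```python
import math


def radar_area(values):
    """Area of a radar polygon with equally spaced spokes: the sum of the
    triangles formed by each pair of neighboring spokes."""
    n = len(values)
    wedge = math.sin(2 * math.pi / n)
    return 0.5 * wedge * sum(values[i] * values[(i + 1) % n] for i in range(n))


# The same six hypothetical scores in two different spoke orders
alternating = [90, 20, 80, 30, 85, 25]   # high and low scores interleaved
clustered = [90, 85, 80, 30, 25, 20]     # high scores placed side by side

ratio = radar_area(clustered) / radar_area(alternating)
print(round(ratio, 2))  # 1.56
```

Nothing about the data changed between the two orderings, yet the filled area grows by more than half, which is why the area should never be read as a summary score.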
35.10.2 Parallel Coordinates Plot
One-line: Vertical axes arranged side by side, with lines connecting each observation's value on each axis.
When to use: Exploring patterns, clusters, and outliers across 4-12 continuous variables simultaneously. Each line is one observation; patterns emerge when groups of lines take similar paths. Useful for high-dimensional data exploration before applying dimensionality reduction.
When NOT to use: When you have more than about 12 variables (the axes become too dense) or more than a few hundred observations (the lines become a solid mass). Also ineffective for categorical variables with few levels.
Data requirements: A DataFrame with 4+ numeric columns, optionally one categorical column for coloring.
Recommended library: plotly or pandas (via pandas.plotting.parallel_coordinates).
from pandas.plotting import parallel_coordinates
np.random.seed(42)
data = pd.DataFrame({
"Feature_1": np.concatenate([np.random.normal(5, 1, 50),
np.random.normal(8, 1, 50)]),
"Feature_2": np.concatenate([np.random.normal(3, 0.5, 50),
np.random.normal(6, 0.5, 50)]),
"Feature_3": np.concatenate([np.random.normal(7, 1.5, 50),
np.random.normal(4, 1.5, 50)]),
"Feature_4": np.concatenate([np.random.normal(2, 0.8, 50),
np.random.normal(5, 0.8, 50)]),
"Cluster": ["A"] * 50 + ["B"] * 50
})
fig, ax = plt.subplots(figsize=(10, 6))
parallel_coordinates(data, class_column="Cluster", color=["#4C72B0", "#C44E52"],
alpha=0.4, ax=ax)
ax.set_title("Parallel Coordinates: Two Clusters Across Four Features")
ax.legend(loc="upper right")
sns.despine()
plt.tight_layout()
plt.show()
Design tips:
- Color lines by cluster or class to reveal group separation across dimensions.
- Use transparency (alpha 0.2-0.4) to manage overplotting.
- Reorder axes to place correlated variables adjacent; this makes crossing patterns more informative.
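Reordering axes by correlation can be automated in several ways. The sketch below shows one simple greedy heuristic, not a standard algorithm from any library: start from the first axis and keep appending whichever remaining axis is most strongly correlated with the one just placed. The function names and sample columns are hypothetical, and the code assumes no column is constant (a constant column would cause a division by zero).

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def greedy_axis_order(columns):
    """columns maps axis name -> list of values. Returns axis names ordered so
    that each axis follows the remaining one it correlates with most strongly
    (by absolute value)."""
    remaining = list(columns)
    order = [remaining.pop(0)]
    while remaining:
        last = columns[order[-1]]
        best = max(remaining, key=lambda name: abs(pearson(last, columns[name])))
        remaining.remove(best)
        order.append(best)
    return order


columns = {
    "a": [1, 2, 3, 4, 5],
    "noise": [3, 1, 4, 1, 5],
    "a_doubled": [2, 4, 6, 8, 10],
    "b": [5, 3, 4, 1, 2],
}
print(greedy_axis_order(columns))  # ['a', 'a_doubled', 'b', 'noise']
```

The returned list can be used to reorder DataFrame columns (keeping the class column) before calling `parallel_coordinates`.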
35.10.3 Gauge Chart
One-line: A semicircular or circular dial displaying a single value against a scale, like a speedometer.
When to use: Displaying a single KPI against a target or range (e.g., "75% of goal achieved"). Common in executive dashboards where one metric is the focal point and the audience expects an at-a-glance indicator.
When NOT to use: When you need to show trends, comparisons, or multiple values — a gauge shows exactly one number and wastes significant display space doing so. A simple text callout with a sparkline is often more space-efficient.
Data requirements: One numeric value, a scale range (min, max), optionally target thresholds.
Recommended library: plotly.
import plotly.graph_objects as go
fig = go.Figure(go.Indicator(
mode="gauge+number+delta",
value=78,
delta={"reference": 70, "increasing": {"color": "#55A868"}},
title={"text": "Customer Satisfaction Score"},
gauge={
"axis": {"range": [0, 100], "tickwidth": 1},
"bar": {"color": "#4C72B0"},
"steps": [
{"range": [0, 40], "color": "#FFCCCC"},
{"range": [40, 70], "color": "#FFFFCC"},
{"range": [70, 100], "color": "#CCFFCC"},
],
"threshold": {
"line": {"color": "#C44E52", "width": 4},
"thickness": 0.75,
"value": 70
}
}
))
fig.update_layout(width=500, height=400)
fig.show()
Design tips:
- Use color bands (red/yellow/green) to show thresholds, but ensure the colors are distinguishable for color-blind readers.
- Include the delta (change from target or previous period) to add context.
- Limit to one or two gauges per dashboard row; they consume significant visual real estate for a single number.
35.11 Quick-Reference Decision Table
The table below maps from question type to recommended chart types. Use it as a rapid lookup when you know what question you are answering but need to choose the right visual form. Chart types are listed in rough order of preference (most common first).
| Question | Best Options | Also Consider | Avoid |
|---|---|---|---|
| Compare values across categories | Bar, lollipop, dot plot | Grouped bar, dumbbell | Pie (poor for comparison), 3D bar |
| Compare across categories + groups | Grouped bar, slope chart | Bump chart, dumbbell | Stacked bar (segments hard to compare) |
| Show parts of a whole (static) | Stacked bar 100%, treemap | Pie (2-4 cats only), donut, waffle | Pie with > 5 slices, 3D pie |
| Show parts of a whole (hierarchical) | Treemap, sunburst | Nested donut | Flat pie of all leaf categories |
| Show one distribution | Histogram, KDE, ECDF | Box plot (with strip overlay) | Pie, bar of binned counts |
| Compare distributions (2-4 groups) | Violin, KDE overlay, box | Ridgeline, ECDF | Back-to-back histograms with many groups |
| Compare distributions (5+ groups) | Ridgeline, box plot array | Violin small multiples | Overlapping histograms |
| Show individual observations | Strip, swarm/beeswarm | Scatter (if 2D) | Summary charts without raw data |
| Show bivariate relationship | Scatter, hexbin | Bubble, connected scatter | Line chart (implies ordering) |
| Show correlation matrix | Correlogram (heatmap) | Scatter matrix (pairplot) | Table of numbers only |
| Show trend over time (1-3 series) | Line, area | Sparkline (inline) | Bar chart of time periods |
| Show trend over time (4+ series) | Small multiple lines, stacked area | Bump chart (for ranks) | Spaghetti line chart |
| Show financial OHLC data | Candlestick | OHLC bar chart | Line chart (loses open/high/low) |
| Show geographic patterns | Choropleth, bubble map | Dot map, hex tile map | Over-detailed basemaps |
| Equalize geographic area bias | Hex tile map, cartogram | Dorling cartogram | Standard choropleth of area-variable regions |
| Show flows between stages | Sankey, funnel | Alluvial | Stacked bars disconnected from flow |
| Show network connections | Force-directed network, arc diagram | Chord diagram, adjacency matrix | Overloaded network (> 500 nodes) |
| Show categorical transitions | Alluvial, Sankey | Heatmap of transition matrix | Separate pie charts per stage |
| Show sequential gain/loss | Waterfall | Stacked bar with cumulative line | Grouped bar (loses running total) |
| Show market landscape | Marimekko | Bubble chart | Grouped bar (loses two-dimensional structure) |
| Profile an entity across dimensions | Radar (2-3 entities) | Parallel coordinates (many entities) | Radar with > 3 overlapping polygons |
| Explore high-dimensional data | Parallel coordinates | Scatter matrix (pairplot) | Single 2D scatter (misses dimensions) |
| Show a single KPI | Gauge, big number + sparkline | Bullet chart | Full-size chart for one number |
How to Read This Table
- Find your question in the left column.
- Start with the Best Options column — these are the most effective chart types for that question.
- Check Also Consider for alternatives that may suit your specific data shape or audience.
- Glance at Avoid to sidestep common missteps.
When two chart types seem equally valid, choose the one your audience knows. Familiarity reduces cognitive load. A well-executed bar chart communicates faster than a novel chart type, no matter how elegant.
Check Your Understanding
These questions are designed for self-assessment. No trick questions — just practical application of the gallery reference.
- Chart selection: You have a dataset of 12 product categories, each with revenue for Q1 and Q3. You want to show which products gained and which lost revenue. Which chart type from this gallery is the best fit, and why?
- Distribution choice: You need to compare the salary distributions of 8 departments in a single figure. A box plot array is one option. Name two alternatives from this gallery and explain one advantage of each over the box plot.
- Avoiding traps: A colleague proposes a pie chart with 9 slices to show website traffic sources. Using the decision table in Section 35.11, suggest a better chart type and explain what makes it more effective.
- Geospatial trade-offs: You are mapping a metric across US states, but you notice that Wyoming (large area, small population) visually dominates the choropleth while New Jersey (small area, large population) is barely visible. Name two chart types from Section 35.7 that address this problem and describe how each solves it.
- Flow visualization: Your company wants to show how 1,000 website visitors move from a landing page through three intermediate pages to one of four exit actions. Which chart type from Section 35.8 is the best match? What is the maximum number of nodes you should target before the chart becomes unreadable?
- Code adaptation: Take the lollipop chart code from Section 35.2.4 and describe (without writing code) the two changes you would make to turn it into a Cleveland dot plot (Section 35.2.5). What visual element is removed, and what is added?
- Specialized chart caution: A marketing team wants to use a radar chart to compare 15 competing products across 6 dimensions. Explain why this is problematic and suggest an alternative from Section 35.10 that handles 15 entities gracefully.
Chapter Summary
This chapter presented 50 chart types organized into nine question-driven categories:
- Comparison (8 types): From the universally understood bar chart to the specialized bump chart for tracking rankings over time. The lollipop and Cleveland dot plot offer lower-ink alternatives to bars. The dumbbell chart highlights gaps between two values, while slope and bump charts track changes across time periods.
- Composition (6 types): The 100% stacked bar remains the most reliable part-to-whole chart. Pie and donut charts have narrow but legitimate use cases (2-4 categories, dominant slice). Treemaps and sunbursts handle hierarchical data. The waffle chart provides an intuitive grid-based alternative for infographic-style reports.
- Distribution (8 types): Histograms and KDE plots for initial exploration. ECDF for parameter-free comparison. Box plots for compact multi-group summary. Violins for shape-preserving comparison. Strip and swarm plots for showing every individual data point. Ridgeline plots for comparing many groups with a visually striking layout.
- Relationship (6 types): The scatter plot remains the foundation of bivariate analysis. Bubble charts add a third dimension through size. Connected scatter plots reveal temporal trajectories. Heatmaps and correlograms summarize matrix-structured relationships. Hexbin plots handle massive overplotting.
- Trend (6 types): Line and area charts for standard temporal data. Stacked area for multi-component totals over time. Sparklines for inline context in tables and dashboards. Temporal slope charts for before-and-after comparisons. Candlestick charts for financial OHLC data.
- Geospatial (5 types): Choropleth maps for region-level data. Dot and bubble maps for point-level data. Hex tile maps and cartograms for correcting the area-bias problem inherent in standard choropleths.
- Flow and Network (5 types): Sankey diagrams for weighted directional flows. Alluvial diagrams for categorical transitions. Chord diagrams for mutual flows. Force-directed networks for structural exploration. Arc diagrams for compact, orderly network display.
- Part-to-Whole and Ranking (3 types): Waterfall charts for sequential gain-and-loss narratives. Marimekko charts for two-dimensional market landscape views. Funnel charts for conversion and attrition pipelines.
- Specialized (3 types): Radar charts for multi-dimensional profiling of 1-2 entities. Parallel coordinates for high-dimensional exploration. Gauge charts for single-KPI dashboard displays.
The quick-reference decision table in Section 35.11 provides a one-page lookup from question to chart type. Bookmark it. Print it. Keep it on your desk.
A Final Word: From Blank Canvas to Informed Choice
You have reached the end of a thirty-five-chapter journey. Let us take a moment to look back at the ground covered.
In Part I, you learned why visualization matters — that it is a cognitive amplifier, not decoration. You learned how the eye processes visual information, how color works and how it fails, how charts can lie, and how to choose the right chart for any question. Before writing a single line of Python, you built the intellectual foundation that separates someone who makes charts from someone who communicates with data.
In Part II, you internalized the design principles that make the difference between a chart that gets glanced at and one that gets understood. Data-ink ratio. Typography and annotation. Layout and composition. Storytelling structure. These are the principles that remain constant regardless of which library you use.
In Parts III through V, you mastered the Python visualization stack. Matplotlib gave you total control over every pixel. Seaborn gave you statistical intelligence built into every chart. Plotly and Altair gave you interactivity and the grammar of graphics. You learned not just one tool but the entire ecosystem, and — more importantly — you learned when to reach for each one.
In Part VI, you applied your skills to specialized domains: time series with their unique decomposition challenges, text and NLP visualization, scientific publication figures, and the performance demands of big data visualization.
In Part VII, you moved from individual charts to production systems: dashboards with Streamlit and Dash, automated reporting, theming and branding, and the complete visualization workflow from raw data to polished output.
In Part VIII, you built a capstone project that exercised every skill in the book, and you now hold this gallery — a permanent reference of 50 chart types that you can return to whenever you face a new dataset and a blank screen.
The blank screen is no longer intimidating. You know the question you are answering. You know which chart type answers it. You know which library makes it easiest. You know the design principles that make it effective. And you have the code to make it real.
Data visualization is not a final step in analysis. It is how you think about data — and now, it is how you help others think about data too. Every chart you create from this point forward is an argument rendered in ink, light, and code. Make it a good one.
End of Chapter 35. End of the book.