Case Study 2: The Sports Page Goes Digital — Priya's NBA Shot Charts

Contributors to Introduction to Data Science

Tier 3 — Illustrative/Composite Example: This case study follows Priya, the sports journalist introduced in Chapter 1, as she uses matplotlib to build visualizations for an article about how three-point shooting has changed the NBA. Priya is a composite character. The analysis methodology reflects genuine sports analytics practices, but all specific data points, player/team statistics, and chart outputs are constructed for pedagogical purposes. The broad trends described (increased three-point shooting in the NBA) are well-documented public facts.

The Setting

Priya has been hearing the same argument in every sports bar and podcast for years: "The three-point shot has ruined basketball." Or alternatively: "The three-point revolution is the best thing to happen to the game." She's a sports journalist who believes in data over hot takes, and she's finally ready to write the definitive article on how the three-point shot has reshaped the NBA.

She has a dataset spanning multiple NBA seasons: team-level statistics including three-point attempt rate, overall field goal percentage, pace (possessions per game), and win-loss records. She cleaned it in Chapter 8, reshaped it in Chapter 9, and planned her charts in Chapter 14.

Now she opens matplotlib and gets to work.

The Article's Visual Argument

Priya's article needs to make three points, each supported by a chart:

Teams are shooting dramatically more threes than they used to (the trend exists)
The relationship between three-point shooting and winning is real but nuanced (it's not a simple "more threes = more wins")
The distribution of team strategies has compressed (everyone is converging on a similar style)

Each point gets its own visualization.

Chart 1: The Trend Line

Priya's first chart shows how the league-average three-point attempt rate has changed over time.

import matplotlib.pyplot as plt

seasons = list(range(2000, 2024))
three_pt_rate = [
    22, 22, 23, 24, 25, 25, 26, 27, 27, 28,
    28, 28, 29, 31, 33, 34, 36, 37, 39, 39,
    40, 41, 42, 43
]

fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(seasons, three_pt_rate, color="steelblue",
        linewidth=2.5, marker="o", markersize=4)

# Annotate key moments
ax.annotate("Steph Curry's first\nMVP season",
            xy=(2015, 34), xytext=(2010, 38),
            fontsize=9, color="seagreen",
            arrowprops=dict(arrowstyle="->", color="seagreen"))

ax.annotate("43% of all shots\nare now threes",
            xy=(2023, 43), xytext=(2020, 46),
            fontsize=9, color="tomato",
            arrowprops=dict(arrowstyle="->", color="tomato"))

ax.set_title("The Three-Point Revolution: NBA Teams Now Take\n"
             "Nearly Twice as Many Threes as in 2000",
             fontsize=13, fontweight="bold")
ax.set_xlabel("Season")
ax.set_ylabel("Three-Point Attempt Rate (%)")
ax.set_ylim(15, 50)
ax.grid(True, alpha=0.15)
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
fig.tight_layout()
fig.savefig("three_pt_trend.png", dpi=300, bbox_inches="tight")
plt.show()

The line tells the story immediately: a climb from 22% in 2000 to 43% in 2023 that accelerates in the mid-2010s, the Curry era. The annotations connect the data to moments that basketball fans will recognize.
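
As a quick sanity check on the "nearly twice" headline and the mid-2010s acceleration, the same constructed series can be summarized numerically. This is a minimal sketch: the 2014 split year is an assumption chosen to sit just before the annotated MVP season, not something the data selects for itself.

```python
seasons = list(range(2000, 2024))
three_pt_rate = [
    22, 22, 23, 24, 25, 25, 26, 27, 27, 28,
    28, 28, 29, 31, 33, 34, 36, 37, 39, 39,
    40, 41, 42, 43
]

# Ratio of the final rate to the starting rate ("nearly twice")
growth_ratio = three_pt_rate[-1] / three_pt_rate[0]

# Average change per season before and after the assumed split year
split = seasons.index(2014)
last = len(seasons) - 1
early_slope = (three_pt_rate[split] - three_pt_rate[0]) / split
late_slope = (three_pt_rate[last] - three_pt_rate[split]) / (last - split)

print(f"Growth ratio 2000 -> 2023: {growth_ratio:.2f}x")
print(f"Average change per season, 2000-2014: {early_slope:.2f} points")
print(f"Average change per season, 2014-2023: {late_slope:.2f} points")
```

A later-period slope larger than the earlier one is what the eye reads as the line steepening.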

Priya's design decisions:

She chose a line chart because this is temporal data with a continuous trend.
She set the y-axis from 15 to 50 rather than 0 to 100 because the full range would flatten the trend into near-invisibility. For a line chart (which encodes position, not length), this is appropriate.
She annotated two key moments to connect the data to the human story. Without annotations, the chart is just a rising line. With them, it's a narrative.
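
The y-axis choice can be put in rough numbers. The sketch below computes what fraction of the vertical axis the 22-to-43 climb occupies under each range; `vertical_share` is a hypothetical helper introduced only for this illustration.

```python
def vertical_share(low, high, y_min, y_max):
    """Fraction of the axis height spanned by values from low to high."""
    return (high - low) / (y_max - y_min)

start_rate, end_rate = 22, 43  # first and last values in the series

share_zoomed = vertical_share(start_rate, end_rate, 15, 50)
share_full = vertical_share(start_rate, end_rate, 0, 100)

print(f"y-axis 15-50: trend spans {share_zoomed:.0%} of the height")
print(f"y-axis 0-100: trend spans {share_full:.0%} of the height")
```

With the 15-50 range the climb fills about 60% of the plot height; with 0-100 it fills about 21%.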

Chart 2: The Scatter Plot

Priya's second chart explores the relationship between three-point attempt rate and winning. Her hypothesis: teams that shoot more threes tend to win more, but there are important exceptions.

import matplotlib.pyplot as plt
import random

random.seed(42)
n_teams = 30
team_3pt = [random.uniform(33, 48) for _ in range(n_teams)]
# Win percentage loosely correlated with 3PT rate,
# but with significant scatter
win_pct = [0.25 + 0.008 * rate + random.uniform(-0.15, 0.15)
           for rate in team_3pt]
win_pct = [max(0.15, min(0.85, w)) for w in win_pct]

fig, ax = plt.subplots(figsize=(9, 7))
ax.scatter(team_3pt, win_pct, color="steelblue", s=70,
           alpha=0.7, edgecolors="navy", linewidth=0.5)

# Highlight an outlier: high 3PT rate, low wins
outlier_idx = max(range(n_teams),
                  key=lambda i: team_3pt[i] - win_pct[i] * 50)
ax.scatter(team_3pt[outlier_idx], win_pct[outlier_idx],
           color="tomato", s=100, zorder=3, edgecolors="darkred")
ax.annotate("High volume,\nlow wins",
            xy=(team_3pt[outlier_idx], win_pct[outlier_idx]),
            xytext=(team_3pt[outlier_idx] - 5,
                    win_pct[outlier_idx] + 0.08),
            fontsize=9, color="tomato",
            arrowprops=dict(arrowstyle="->", color="tomato"))

ax.set_title("More Threes Help — But Don't Guarantee Wins",
             fontsize=13, fontweight="bold")
ax.set_xlabel("Three-Point Attempt Rate (%)")
ax.set_ylabel("Win Percentage")
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
ax.grid(True, alpha=0.15)
fig.tight_layout()
fig.savefig("three_pt_vs_wins.png", dpi=300, bbox_inches="tight")
plt.show()

The scatter plot shows a general upward trend — higher three-point rates tend to correlate with higher win percentages — but with substantial scatter. Some teams shoot a lot of threes and lose. Some teams are more conservative and win anyway.
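
One way to put a number on "upward but noisy" is the Pearson correlation coefficient. The sketch below regenerates the simulated data with the same seed and computes r by hand; `pearson_r` is a helper written for this illustration, and the value it prints describes the simulated data only.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Same simulated data as the scatter plot code above
random.seed(42)
n_teams = 30
team_3pt = [random.uniform(33, 48) for _ in range(n_teams)]
win_pct = [0.25 + 0.008 * rate + random.uniform(-0.15, 0.15)
           for rate in team_3pt]
win_pct = [max(0.15, min(0.85, w)) for w in win_pct]

r = pearson_r(team_3pt, win_pct)
print(f"Pearson r = {r:.2f}")
```

A value near +1 or -1 would indicate a tight linear relationship; values closer to 0 correspond to the kind of scatter visible in the chart.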

Priya highlights one dramatic outlier: the team whose three-point rate is highest relative to its win percentage, a high-volume shooter without the wins to show for it. This is the nuance that a simple correlation coefficient would miss. The scatter plot shows that the relationship is real but noisy: shooting threes is not a magic formula for winning.
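
The chart code picks its outlier with an ad hoc score (rate minus a scaled win percentage). An alternative, swapped in here purely for illustration, is to fit a least-squares line and flag the team that falls furthest below it. The helper names and the five-team dataset are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def biggest_underperformer(xs, ys):
    """Index of the point with the most negative residual."""
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    return min(range(len(xs)), key=lambda i: residuals[i])

# Hypothetical toy data: the fourth team shoots a lot but wins little
rates = [34.0, 37.0, 40.0, 43.0, 46.0]
wins = [0.45, 0.50, 0.55, 0.35, 0.65]

idx = biggest_underperformer(rates, wins)
print(f"Furthest below the trend: team {idx} "
      f"({rates[idx]:.0f}% threes, {wins[idx]:.2f} win pct)")
```

The two rules answer slightly different questions: the ad hoc score favors teams with high volume and few wins in absolute terms, while the residual rule flags underperformance relative to the fitted trend.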

Priya's design decisions:

Scatter plot for two continuous variables: the standard choice for showing how one varies with the other.
Outlier highlighted in a different color with annotation: draws the reader's eye to the story within the data.
Transparency (alpha=0.7): keeps overlapping points from completely hiding each other.
Title states the nuanced finding: "help, but don't guarantee."

Chart 3: The Small Multiples Histogram

Priya's third chart is her most ambitious: a comparison of how the distribution of team three-point attempt rates has changed over time. Her hypothesis is that teams used to have diverse strategies (some shot many threes, others almost none), but the distribution has compressed as everyone converged on the "analytics revolution" approach.

import matplotlib.pyplot as plt
import random

random.seed(42)

# Simulate distributions for three eras
eras = {
    "2004-05": [random.gauss(24, 5) for _ in range(30)],
    "2013-14": [random.gauss(30, 4) for _ in range(30)],
    "2022-23": [random.gauss(42, 3) for _ in range(30)],
}

fig, axes = plt.subplots(1, 3, figsize=(15, 5),
                         sharey=True, sharex=True)

for ax, (era, data) in zip(axes, eras.items()):
    mean_rate = sum(data) / len(data)  # computed once, used twice
    ax.hist(data, bins=10, color="steelblue",
            edgecolor="white", range=(10, 55))
    ax.axvline(x=mean_rate, color="tomato",
               linestyle="--", alpha=0.8,
               label=f"Mean: {mean_rate:.0f}%")
    ax.set_title(era, fontsize=13, fontweight="bold")
    ax.set_xlabel("3PT Attempt Rate (%)")
    ax.legend(frameon=False, fontsize=9)

axes[0].set_ylabel("Number of Teams")

fig.suptitle("The Three-Point Convergence: Every Team Now Plays the Same Way",
             fontsize=14, fontweight="bold", y=1.02)
fig.tight_layout()
fig.savefig("three_pt_convergence.png", dpi=300, bbox_inches="tight")
plt.show()

The three panels tell a powerful story of strategic convergence:

2004-05: Wide distribution, centered around 24%. Some teams at 15%, others near 35%. Diverse strategies.
2013-14: Distribution shifted right (mean ~30%) but still fairly spread out. The analytics revolution is beginning.
2022-23: Distribution compressed tightly around 42%. Almost every team is within a narrow band. Strategic diversity has collapsed.

The shrinking spread is arguably the most important finding. It's not just that teams are shooting more threes — it's that all teams are shooting more threes. The outliers have been absorbed. The game has fundamentally changed.
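
The shrinking spread can also be checked numerically rather than by eye. This is a minimal sketch using the standard library's `statistics` module on the same simulated eras; because each era is a random sample of 30 teams, the sample standard deviations will only approximate the 5, 4, and 3 used to generate them.

```python
import random
import statistics

random.seed(42)

# Same simulated eras as the histogram code above
eras = {
    "2004-05": [random.gauss(24, 5) for _ in range(30)],
    "2013-14": [random.gauss(30, 4) for _ in range(30)],
    "2022-23": [random.gauss(42, 3) for _ in range(30)],
}

# Mean and sample standard deviation for each era
summary = {
    era: (statistics.mean(data), statistics.stdev(data))
    for era, data in eras.items()
}

for era, (mean, spread) in summary.items():
    print(f"{era}: mean = {mean:.1f}%, std dev = {spread:.1f} points")
```

Reporting the standard deviation alongside the mean makes the convergence claim something a reader can verify, not just see.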

Priya's design decisions:

Small multiples with shared axes: essential for honest comparison across eras.
Fixed x-axis range (10-55): ensures the visual shift from left to right is genuine, not an artifact of auto-scaling.
Mean lines in red: allow instant comparison of central tendency across panels.
Title states the interpretive finding, not just the data description.
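
To see what shared axes buy, the sketch below draws a wide and a narrow sample twice, once with independent x-axes and once with `sharex=True`, and compares the resulting axis limits. The two toy samples are made up for illustration, the `range` argument is omitted on purpose so the auto-scaling is visible, and the `Agg` backend is used so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no window is opened
import matplotlib.pyplot as plt

# Hypothetical toy samples: one spread out, one tightly clustered
wide = [15, 18, 20, 22, 24, 24, 26, 28, 30, 34]
narrow = [41, 41, 42, 42, 42, 42, 42, 43, 43, 43]

def panel_xlims(sharex):
    """Draw both histograms and return each panel's x-axis limits."""
    fig, axes = plt.subplots(1, 2, sharex=sharex)
    for ax, data in zip(axes, (wide, narrow)):
        ax.hist(data, bins=5)
    limits = [tuple(round(float(v), 2) for v in ax.get_xlim())
              for ax in axes]
    plt.close(fig)
    return limits

independent = panel_xlims(sharex=False)
shared = panel_xlims(sharex=True)

print("Independent x-limits:", independent)
print("Shared x-limits:     ", shared)
```

With independent axes each panel stretches its own data to fill the width, so the narrow sample looks as spread out as the wide one. With a shared axis both panels cover the same span and the difference in spread is visible.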

Putting the Article Together

Priya arranges her three charts in the article draft:

The trend line opens the piece — it's the simplest chart and establishes the basic fact: three-point shooting is way up.
The scatter plot provides nuance — yes, threes correlate with winning, but it's not automatic.
The convergence chart delivers the punchline — the game hasn't just changed; it's converged. Strategic diversity is shrinking.

She saves each chart as a 300 DPI PNG for the print edition and as an SVG for the web version, shown here for the convergence chart:

fig.savefig("three_pt_convergence.png", dpi=300, bbox_inches="tight")
fig.savefig("three_pt_convergence.svg", bbox_inches="tight")
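
Since every chart is saved twice, the two calls can be wrapped in a small helper. This is a sketch only: `save_for_print_and_web` is a hypothetical name, the placeholder figure stands in for one of Priya's charts, and the `Agg` backend line is there so the example runs without a display.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no window is opened
import matplotlib.pyplot as plt

def save_for_print_and_web(fig, basename, out_dir="."):
    """Save fig as a 300 DPI PNG and as an SVG; return both paths."""
    png_path = os.path.join(out_dir, f"{basename}.png")
    svg_path = os.path.join(out_dir, f"{basename}.svg")
    fig.savefig(png_path, dpi=300, bbox_inches="tight")
    fig.savefig(svg_path, bbox_inches="tight")
    return png_path, svg_path

# Placeholder figure standing in for one of the three charts
fig, ax = plt.subplots(figsize=(4, 3))
ax.plot([2000, 2023], [22, 43], color="steelblue")

out_dir = tempfile.mkdtemp()  # temporary folder so nothing is overwritten
paths = save_for_print_and_web(fig, "three_pt_demo", out_dir)
plt.close(fig)
print(paths)
```

Keeping the DPI and `bbox_inches` settings in one place means all three charts are exported consistently.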

Her editor gives her one note: "Can you add one more chart showing shot locations on a basketball court?" Priya grins — she's been wanting to try that. But that'll require some custom coordinate work that goes beyond basic matplotlib. She makes a note for a future article and sticks with her three-chart story for now.

Technical Takeaways

Annotations are storytelling tools. Without the "Steph Curry's first MVP" annotation, the trend chart is just a rising line. With it, the data connects to a human story that readers care about.
Outliers tell stories too. The highlighted team in the scatter plot — high volume, low wins — adds nuance that a trendline alone would miss. Scatter plots are for exploring relationships, and exceptions are as interesting as the rule.
Distribution comparison reveals structure. The convergence finding (narrowing spread across eras) is invisible in means alone. You need histograms — or box plots, or violin plots — to see distributional changes.
Shared axes are non-negotiable for comparison. Without shared x and y axes, the three era panels would auto-scale independently, and a narrow distribution could look just as spread out as a wide one.
Save in multiple formats for different outputs. PNG for print and presentations (at 300 DPI). SVG for web (scales perfectly to any screen size).
Iterative design is normal. Priya's charts went through multiple versions — starting with defaults, adding customization, getting editor feedback, refining. Nobody produces a perfect chart on the first attempt.

Discussion Questions

Priya's scatter plot shows a general correlation between three-point shooting and winning. Does this mean shooting more threes causes better records? What confounding variables might explain the relationship? (Preview of Chapter 24's correlation vs. causation discussion.)
The convergence chart shows strategic diversity shrinking. Is this a problem for the sport? From a data perspective, what happens to the predictive power of three-point rate if all teams have essentially the same value?
Priya chose not to include a shot-location chart (the "heat map on a court") because it would require custom coordinates. How would you approach this technically? What grammar of graphics components would you need? (This is an extension question — no single right answer.)
If Priya's editor asked her to add team names to every point on the scatter plot (30 labels), would this improve or hurt the chart? What alternatives might work better for identifying specific teams without cluttering the plot?

End of Case Study 2