Case Study 1: Calculating a Batting Average — Sports Data with Python Basics
Tier 3 — Illustrative Example: This case study uses Priya, one of our anchor characters, in a simplified scenario constructed for pedagogical purposes. The basketball statistics described are realistic in structure but are fictional numbers chosen to illustrate Python concepts. No specific NBA player or season is represented.
The Setting
Priya is a sports journalist who covers the NBA for an online publication. She's just finished Chapter 3 and has a working Jupyter notebook with Python basics under her belt: variables, arithmetic, strings, f-strings, and type conversion.
Today she's got a very specific task. Her editor wants a "by the numbers" sidebar for an article about three players on a local team. For each player, Priya needs to calculate several performance statistics and format them into a clean, readable summary. The stats she needs are:
- Points per game (PPG): total points divided by games played
- Field goal percentage (FG%): field goals made divided by field goals attempted, times 100
- Three-point percentage (3P%): three-pointers made divided by three-pointers attempted, times 100
- Free throw percentage (FT%): free throws made divided by free throws attempted, times 100
She could do this on a calculator. She's done it before — typing in numbers, writing results on sticky notes, hoping she doesn't transpose a digit. But with three players and four calculations each, that's twelve separate computations, each one a chance for a typo. And if her editor says "actually, can you also add rebounds per game?" she has to start over.
Let's see how Python makes this faster, less error-prone, and repeatable.
The Data
Priya has the following season statistics for three players. (In a future chapter, she'll load this from a CSV file. For now, she's typing it in.)
| Player | Games | Points | FG Made | FG Att | 3P Made | 3P Att | FT Made | FT Att |
|---|---|---|---|---|---|---|---|---|
| Amara Johnson | 72 | 1584 | 576 | 1210 | 144 | 398 | 288 | 331 |
| DeShaun Williams | 68 | 1122 | 408 | 892 | 102 | 295 | 204 | 240 |
| Kenji Nakamura | 79 | 987 | 372 | 814 | 81 | 248 | 162 | 194 |
Step 1: Storing the Data in Variables
Priya opens her Jupyter notebook and creates variables for the first player:
# Player 1: Amara Johnson
p1_name = "Amara Johnson"
p1_games = 72
p1_points = 1584
p1_fg_made = 576
p1_fg_att = 1210
p1_3p_made = 144
p1_3p_att = 398
p1_ft_made = 288
p1_ft_att = 331
She runs the cell. No output — that's expected. The variables are stored in memory, waiting to be used.
A few things to notice about her variable names:
- They use snake_case (p1_fg_made, not p1FgMade).
- They have a consistent prefix (p1_ for player 1) so she can keep track of which player each variable belongs to.
- They're descriptive enough that someone reading the code can understand what each one is without checking back.
- p1_3p_made starts with p1_, not 3, because variable names can't start with a digit.
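One way to test a candidate name without triggering a syntax error is the built-in string method isidentifier(), which reports whether a piece of text follows Python's naming rules. The sketch below is only an illustration: the three result names are invented for it, and 3p_made is a hypothetical example of a name Priya could not use.

```python
# Each candidate name is held as a string, so an illegal one can't break the cell.
valid_name = "p1_3p_made".isidentifier()   # starts with a letter
digit_first = "3p_made".isidentifier()     # hypothetical bad name: starts with a digit
camel_case = "p1FgMade".isidentifier()     # legal, just not snake_case

print(valid_name)   # True
print(digit_first)  # False
print(camel_case)   # True
```

Note that p1FgMade passes the check: snake_case is a style convention, while the leading-digit rule is enforced by the language.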
She does the same for players 2 and 3:
# Player 2: DeShaun Williams
p2_name = "DeShaun Williams"
p2_games = 68
p2_points = 1122
p2_fg_made = 408
p2_fg_att = 892
p2_3p_made = 102
p2_3p_att = 295
p2_ft_made = 204
p2_ft_att = 240
# Player 3: Kenji Nakamura
p3_name = "Kenji Nakamura"
p3_games = 79
p3_points = 987
p3_fg_made = 372
p3_fg_att = 814
p3_3p_made = 81
p3_3p_att = 248
p3_ft_made = 162
p3_ft_att = 194
What Priya notices: Typing all these variables is tedious. Three players, nine variables each — that's 27 variables. She can already see that this approach won't scale to an entire roster of 15 players, let alone a league of 450. In Chapter 5, she'll learn about dictionaries and lists, which will let her organize this data much more cleanly. For now, this works.
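As a preview only, here is a minimal sketch of how part of the same table might look once lists and dictionaries are available. The name players and the dictionary keys are assumptions made for this illustration, not necessarily the layout Chapter 5 uses, and only the games and points columns are included to keep it short.

```python
# One dictionary per player; the key names are hypothetical choices.
players = [
    {"name": "Amara Johnson", "games": 72, "points": 1584},
    {"name": "DeShaun Williams", "games": 68, "points": 1122},
    {"name": "Kenji Nakamura", "games": 79, "points": 987},
]

# The PPG formula is written once and applied to every player in the list.
for player in players:
    ppg = player["points"] / player["games"]
    print(f"{player['name']}: {ppg:.1f} PPG")
```

Adding a fourth player would mean adding one dictionary to the list; the loop would not change.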
Step 2: Computing the Statistics
Now Priya writes the calculations for Player 1:
# Calculations for Amara Johnson
p1_ppg = p1_points / p1_games
p1_fg_pct = p1_fg_made / p1_fg_att * 100
p1_3p_pct = p1_3p_made / p1_3p_att * 100
p1_ft_pct = p1_ft_made / p1_ft_att * 100
She prints the results to check:
print(f"Points per game: {p1_ppg}")
print(f"FG%: {p1_fg_pct}")
print(f"3P%: {p1_3p_pct}")
print(f"FT%: {p1_ft_pct}")
Points per game: 22.0
FG%: 47.603305785123965
3P%: 36.18090452261307
FT%: 87.00906344410876
The math is right, but those long decimals are hard to read. Priya adds the format specifier :.1f inside each f-string to display one digit after the decimal point:
print(f"{p1_name}")
print(f" PPG: {p1_ppg:.1f}")
print(f" FG%: {p1_fg_pct:.1f}%")
print(f" 3P%: {p1_3p_pct:.1f}%")
print(f" FT%: {p1_ft_pct:.1f}%")
Amara Johnson
PPG: 22.0
FG%: 47.6%
3P%: 36.2%
FT%: 87.0%
Much better. She repeats the same pattern for players 2 and 3:
# Player 2 calculations
p2_ppg = p2_points / p2_games
p2_fg_pct = p2_fg_made / p2_fg_att * 100
p2_3p_pct = p2_3p_made / p2_3p_att * 100
p2_ft_pct = p2_ft_made / p2_ft_att * 100
# Player 3 calculations
p3_ppg = p3_points / p3_games
p3_fg_pct = p3_fg_made / p3_fg_att * 100
p3_3p_pct = p3_3p_made / p3_3p_att * 100
p3_ft_pct = p3_ft_made / p3_ft_att * 100
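Two details of this step are easy to verify in a spare cell. The sketch below retypes some of Amara Johnson's totals from the table so it runs by itself, then checks that / returns a float even when the result is a whole number, and that :.1f changes only how a value is displayed, not what is stored. The name shown is hypothetical, introduced for this illustration.

```python
# Amara Johnson's totals, retyped so this cell is self-contained
p1_games = 72
p1_points = 1584
p1_fg_made = 576
p1_fg_att = 1210

p1_ppg = p1_points / p1_games        # int / int still gives a float
print(type(p1_ppg), p1_ppg)          # <class 'float'> 22.0

p1_fg_pct = p1_fg_made / p1_fg_att * 100
shown = f"{p1_fg_pct:.1f}"           # a string, rounded for display only
print(shown)                         # 47.6
print(p1_fg_pct == 47.6)             # False: the variable keeps full precision
```

Because the stored value is untouched, later comparisons between players use the full-precision numbers rather than the rounded ones that appear in the report.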
Step 3: Formatting the Report
Priya's editor wants a clean sidebar. She builds a formatted output:
print("=" * 40)
print("PLAYER PERFORMANCE SUMMARY")
print("=" * 40)
print(f"\n{p1_name}")
print(f" PPG: {p1_ppg:.1f} | FG: {p1_fg_pct:.1f}%"
f" | 3P: {p1_3p_pct:.1f}% | FT: {p1_ft_pct:.1f}%")
print(f"\n{p2_name}")
print(f" PPG: {p2_ppg:.1f} | FG: {p2_fg_pct:.1f}%"
f" | 3P: {p2_3p_pct:.1f}% | FT: {p2_ft_pct:.1f}%")
print(f"\n{p3_name}")
print(f" PPG: {p3_ppg:.1f} | FG: {p3_fg_pct:.1f}%"
f" | 3P: {p3_3p_pct:.1f}% | FT: {p3_ft_pct:.1f}%")
print("\n" + "=" * 40)
========================================
PLAYER PERFORMANCE SUMMARY
========================================
Amara Johnson
PPG: 22.0 | FG: 47.6% | 3P: 36.2% | FT: 87.0%
DeShaun Williams
PPG: 16.5 | FG: 45.7% | 3P: 34.6% | FT: 85.0%
Kenji Nakamura
PPG: 12.5 | FG: 45.7% | 3P: 32.7% | FT: 83.5%
========================================
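If the sidebar ever needs columns that line up, format specifiers also accept an alignment and a width. The following is a rough sketch rather than the layout Priya's editor asked for: the column widths (18 and 6) are arbitrary choices, the names header, row1, and row2 are invented here, and only two players' PPG and FG% are included, with the totals retyped from the table.

```python
# Values recomputed from the table so the cell runs independently
p1_name = "Amara Johnson"
p1_ppg = 1584 / 72
p1_fg_pct = 576 / 1210 * 100
p2_name = "DeShaun Williams"
p2_ppg = 1122 / 68
p2_fg_pct = 408 / 892 * 100

# <18 left-aligns text in 18 characters; >6 right-aligns in 6 characters
header = f"{'Player':<18}{'PPG':>6}{'FG%':>6}"
row1 = f"{p1_name:<18}{p1_ppg:>6.1f}{p1_fg_pct:>6.1f}"
row2 = f"{p2_name:<18}{p2_ppg:>6.1f}{p2_fg_pct:>6.1f}"

print(header)
print(row1)
print(row2)
```

Every row comes out 30 characters wide, so the numbers stack in straight columns when shown in a monospaced font.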
Step 4: Adding Comparisons
Priya's editor calls back: "Can you tell me which player has the best three-point percentage?" She adds some boolean comparisons:
# Each flag is True only if that player's 3P% beats both of the others
p1_best_3p = p1_3p_pct > p2_3p_pct and p1_3p_pct > p3_3p_pct
p2_best_3p = p2_3p_pct > p1_3p_pct and p2_3p_pct > p3_3p_pct
p3_best_3p = p3_3p_pct > p1_3p_pct and p3_3p_pct > p2_3p_pct
print(f"{p1_name} has best 3P%: {p1_best_3p}")
print(f"{p2_name} has best 3P%: {p2_best_3p}")
print(f"{p3_name} has best 3P%: {p3_best_3p}")
Amara Johnson has best 3P%: True
DeShaun Williams has best 3P%: False
Kenji Nakamura has best 3P%: False
It works, but Priya can see the problem: every additional player adds a new line and another condition to each existing line, and an exact tie for first place would leave every result False. In Chapter 4, she'll learn about if/elif/else to handle this more elegantly. And in Part II, she'll use pandas to compute these statistics for entire rosters in a single line of code.
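As a preview, the same question could be answered with if/elif/else roughly as follows. This is a sketch under assumptions, not the Chapter 4 solution: the percentages are recomputed from the table so the cell runs standalone, best_3p_name is a name invented here, and using >= means a tie goes to the earlier player.

```python
# Names and 3P% recomputed from the table so the cell runs by itself
p1_name = "Amara Johnson"
p2_name = "DeShaun Williams"
p3_name = "Kenji Nakamura"
p1_3p_pct = 144 / 398 * 100
p2_3p_pct = 102 / 295 * 100
p3_3p_pct = 81 / 248 * 100

# Check player 1 against both others; if that fails, the best is player 2 or 3
if p1_3p_pct >= p2_3p_pct and p1_3p_pct >= p3_3p_pct:
    best_3p_name = p1_name
elif p2_3p_pct >= p3_3p_pct:
    best_3p_name = p2_name
else:
    best_3p_name = p3_name

print(f"Best 3P%: {best_3p_name}")  # Best 3P%: Amara Johnson
```

The result is a single name instead of three separate True/False flags, which is closer to what the editor actually asked for.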
What Priya Learned
Looking back at her notebook, Priya realizes she's used almost every concept from Chapter 3:
- Variables to store player data and computed results
- Integers for counts (games, shots made, shots attempted)
- Floats for the computed averages and percentages (in Python 3, dividing two integers with / always produces a float)
- Strings for player names
- Arithmetic operators (/ for division, * for multiplication)
- f-strings with format specifiers (:.1f) for clean output
- Booleans, the comparison operator >, and the logical operator and to identify the best performer
- String repetition ("=" * 40) for visual formatting
More importantly, she understands why Python is better than a calculator for this task:
- Transparency. Every calculation is visible. If the FG% formula is wrong, she can see it and fix it. On a calculator, the wrong keystroke is gone forever.
- Repeatability. If a player's stats get updated, she changes one number and re-runs the cells. Every calculation updates automatically.
- Scalability. Adding a fourth player means copying the same pattern. (And in later chapters, even the copying becomes unnecessary.)
- Communication. The notebook itself — code, output, and explanatory Markdown cells — is a document she can share with her editor. It shows how she got the numbers, not just what they are.
Discussion Questions
- Priya used the naming convention p1_fg_made, p2_fg_made, etc. What are the advantages and disadvantages of this approach compared to just using fg_made_1, fg_made_2?
- When Priya computed field goal percentage, she wrote p1_fg_made / p1_fg_att * 100. The order of operations means this is evaluated as (p1_fg_made / p1_fg_att) * 100, which is correct. What would happen if she wrote p1_fg_made / (p1_fg_att * 100) instead? Would Python throw an error, or would it silently give the wrong answer?
- Imagine a player attempted zero three-point shots. What would happen when Priya tries to calculate their three-point percentage? What type of error would Python raise? (Hint: think about what 0 does in a denominator.)
- The code is repetitive: the same four-line calculation block appears three times with slightly different variable names. In Chapter 4, you'll learn about functions that let you write the calculation once and reuse it. What would you want the function to take as input, and what would it return?