Chapter 5 Quiz: Descriptive Statistics in Basketball
Instructions
This quiz tests your understanding of descriptive statistics concepts and their application to basketball data as covered in Chapter 5. Answer all questions to the best of your ability. Each question is worth 1 point unless otherwise noted.
Time Limit: 45 minutes
Total Points: 31 (plus up to 4 bonus points)
Section A: Measures of Central Tendency (Questions 1-6)
Question 1
If a player scores the following points in five games: 32, 24, 28, 31, and 22, what is their mean (average) points per game?
A) 27.0 B) 27.4 C) 28.0 D) 28.4
Question 2
NBA salary distributions are heavily right-skewed. In such a distribution, which statement is true?
A) Mean < Median < Mode B) Mode < Median < Mean C) Mean = Median = Mode D) Median < Mode < Mean
Question 3
When calculating a player's True Shooting Percentage across seasons with different attempt volumes, which type of average should you use?
A) Arithmetic mean of the percentages B) Median of the percentages C) Weighted mean using attempts as weights D) Mode of the percentages
Question 4
The median is preferred over the mean as a measure of central tendency when:
A) The distribution is symmetric B) The distribution is heavily skewed or contains outliers C) The sample size is very large D) All values are positive
Question 5
In shot chart analysis, identifying the most frequent shot location uses which measure of central tendency?
A) Mean B) Median C) Mode D) Range
Question 6
A player's seasonal scoring data shows Mean = 20.5 PPG and Median = 18.2 PPG. What does this indicate about the distribution?
A) The distribution is left-skewed B) The distribution is symmetric C) The distribution is right-skewed D) There are data errors
Section B: Measures of Variability (Questions 7-12)
Question 7
Two players both average 20 PPG. Player A has a standard deviation of 4 points, while Player B has a standard deviation of 10 points. Which player is more consistent?
A) Player A (lower standard deviation = more consistent) B) Player B (higher standard deviation = more consistent) C) They are equally consistent D) Cannot be determined from this information
Question 8
The Interquartile Range (IQR) is calculated as:
A) Maximum - Minimum B) Mean - Median C) Q3 - Q1 D) Standard Deviation / Mean
Question 9
Why is the Coefficient of Variation (CV) useful for comparing variability across different statistics?
A) It removes negative values B) It expresses variability relative to the mean, enabling comparison across different scales C) It is always between 0 and 1 D) It ignores outliers
Question 10
If a distribution has a large range but a small IQR, what does this indicate?
A) The data is uniformly distributed B) There are likely outliers in the data C) The distribution is symmetric D) The mean equals the median
Question 11
The formula for the Coefficient of Variation is:
A) Mean / Standard Deviation B) Standard Deviation / Mean x 100% C) (Max - Min) / Mean D) IQR / Median
Question 12
In basketball analytics, understanding scoring variability is important for:
A) Setting uniform playing time B) Predicting player performance ranges and assessing risk C) Calculating draft position D) Determining uniform numbers
Section C: Percentiles and Rankings (Questions 13-17)
Question 13
A player scoring 22 PPG is at the 85th percentile. This means:
A) 85% of their points come from certain shots B) They outscore 85% of NBA players C) They score 85% of their team's points D) Their efficiency is 85%
Question 14
The five-number summary includes all of the following EXCEPT:
A) Minimum B) First Quartile (Q1) C) Mean D) Maximum
Question 15
Which percentile corresponds to the median?
A) 25th percentile B) 50th percentile C) 75th percentile D) 100th percentile
Question 16
In a box plot, what typically defines an outlier?
A) Any value above the mean B) Values more than 1.5 x IQR beyond Q1 or Q3 C) The highest and lowest 10% of values D) Values more than 1 standard deviation from the mean
Question 17
If Player A is in the 92nd percentile for scoring and the 85th percentile for assists, which interpretation is correct?
A) They score more points than they assist B) They are an elite scorer and excellent passer relative to the league C) Their scoring is more valuable than their passing D) 92% of their value comes from scoring
Section D: Distribution Shape (Questions 18-21)
Question 18
A skewness value of +1.5 indicates:
A) A symmetric distribution B) A highly left-skewed distribution C) A highly right-skewed distribution D) A bimodal distribution
Question 19
Excess kurtosis measures:
A) The center of the distribution B) The spread of the distribution C) The tailedness of the distribution (frequency of extreme values) D) The number of modes
Question 20
Which of the following basketball statistics is typically right-skewed?
A) Plus/Minus ratings B) Points per game C) Free throw percentage D) Field goal percentage
Question 21
A Shapiro-Wilk test returns a p-value of 0.02. At a significance level of α = 0.05, what is the conclusion regarding normality?
A) The data is normally distributed B) The data significantly deviates from normality C) The test is inconclusive D) More data is needed
Section E: Correlation and Relationships (Questions 22-25)
Question 22
A Pearson correlation coefficient of r = -0.65 between turnovers and win percentage indicates:
A) No relationship B) A weak positive relationship C) A strong negative relationship D) A perfect negative relationship
Question 23
The coefficient of determination (R-squared) represents:
A) The correlation coefficient B) The proportion of variance in Y explained by X C) The standard error of the regression D) The intercept of the regression line
Question 24
When would you prefer Spearman's rank correlation over Pearson's correlation?
A) When both variables are normally distributed B) When the relationship is linear C) When the data contains outliers or the relationship is monotonic but non-linear D) When the sample size is large
Question 25
"Correlation does not imply causation" is illustrated by which basketball scenario?
A) Players who shoot more make more shots B) Teams that shoot more three-pointers when losing (correlation with losses, but causation reversed) C) Taller players get more rebounds D) More minutes leads to more points
Section F: Standardization and Z-Scores (Questions 26-30)
Question 26
A player has a z-score of +2.5 for points per game. This means:
A) They score 2.5 points per game B) They score 2.5 standard deviations above the league average C) They are in the 25th percentile D) Their scoring is below average
Question 27
Z-scores are useful for:
A) Calculating totals B) Comparing statistics measured on different scales C) Determining draft position D) Computing salary cap
Question 28
The formula for a z-score is:
A) (x + mean) / std B) (x - mean) * std C) (x - mean) / std D) std / (x - mean)
Question 29
Era-adjusted statistics use z-scores because:
A) They are easier to calculate B) They allow comparison of players from different eras relative to their peers C) They eliminate all bias D) They are required by the NBA
Question 30 (2 points)
A player from the 1960s averaged 30 PPG when the league average was 20 PPG with a standard deviation of 8 points. A modern player averages 28 PPG when the league average is 15 PPG with a standard deviation of 6 points.
Calculate both z-scores and determine which player was more exceptional relative to their era. Show your work.
Bonus Questions (2 points each)
Bonus Question 1
Explain why the weighted mean is more appropriate than the simple mean when combining shooting percentages across multiple seasons. Provide a numerical example demonstrating the difference.
Bonus Question 2
A team's analytics department wants to create a single "overall player rating" by combining multiple statistics. Describe how you would use z-scores to create this composite metric, including what considerations you would make for weighting different statistics.
Answer Key
Section A: Measures of Central Tendency
1. B) 27.4
- (32 + 24 + 28 + 31 + 22) / 5 = 137 / 5 = 27.4
2. B) Mode < Median < Mean
- In a right-skewed distribution, the long right tail pulls the mean above the median, while the mode stays at the peak of the distribution, below both.
3. C) Weighted mean using attempts as weights
- Seasons have different sample sizes; weighting by attempts gives each season influence in proportion to its shot volume.
4. B) The distribution is heavily skewed or contains outliers
- The median is resistant to extreme values and better represents the "typical" value in skewed data.
5. C) Mode
- The mode identifies the most frequently occurring value or location.
6. C) The distribution is right-skewed
- When the mean exceeds the median, a few high values are pulling the mean up, which indicates right skew.
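The Section A answers can be checked with the short Python sketch below. It uses only the standard library. The Question 1 scores come from the quiz. The shot-location log (Question 5) and the skewed game log (Question 6) are hypothetical and made up for illustration.

```python
from collections import Counter
from statistics import mean, median

# Question 1: points from the five games in the quiz.
points = [32, 24, 28, 31, 22]
print(f"Mean:   {mean(points):.1f}")    # 137 / 5
print(f"Median: {median(points):.1f}")  # middle value once sorted

# Question 5: hypothetical shot-location log (zone labels are illustrative).
shots = ["corner 3", "paint", "paint", "mid-range", "paint", "corner 3"]
zone, count = Counter(shots).most_common(1)[0]
print(f"Mode:   {zone} ({count} shots)")

# Question 6: hypothetical right-skewed game log; a single 45-point game
# pulls the mean above the median.
games = [12, 15, 16, 18, 18, 20, 45]
print(f"Mean {mean(games):.1f} vs. median {median(games)}")
```

Note that the median (28) sits above the mean (27.4) for the Question 1 games, a mild left lean. In the hypothetical log, one big game reverses that ordering.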
Section B: Measures of Variability
7. A) Player A (lower standard deviation = more consistent)
- A lower standard deviation means game scores cluster more tightly around the mean, indicating more consistent performance.
8. C) Q3 - Q1
- The IQR measures the spread of the middle 50% of the data.
9. B) It expresses variability relative to the mean, enabling comparison across different scales
- CV divides the standard deviation by the mean, so it is unitless and comparable across statistics such as points and rebounds.
10. B) There are likely outliers in the data
- A large range with a small IQR suggests extreme values in the tails that do not affect the middle 50%.
11. B) Standard Deviation / Mean x 100%
- CV = (s / x̄) * 100%, where s is the standard deviation and x̄ is the mean.
12. B) Predicting player performance ranges and assessing risk
- Variability shows how much a player's performance fluctuates from game to game. That determines how wide a realistic performance range is and how much risk relying on the player carries.
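As a sketch of Questions 7, 9, and 11, the code below computes the sample standard deviation and the CV for two 20-PPG scorers. All game logs are hypothetical. The CV here uses the sample standard deviation (statistics.stdev); some texts use the population version instead.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """CV as a percentage: sample standard deviation / mean * 100."""
    return stdev(values) / mean(values) * 100

# Hypothetical five-game logs for two 20-PPG scorers (Question 7).
player_a_pts = [16, 20, 24, 18, 22]
player_b_pts = [8, 32, 14, 30, 16]

for name, log in [("A", player_a_pts), ("B", player_b_pts)]:
    print(f"Player {name}: mean {mean(log):.1f}, SD {stdev(log):.2f}, "
          f"CV {coefficient_of_variation(log):.1f}%")

# CV puts different statistics on a common relative scale (Question 9):
# Player A's hypothetical rebounds have a smaller SD than their points,
# but a larger CV, i.e. more variable relative to their own average.
player_a_reb = [4, 6, 5, 7, 3]
print(f"Player A rebounds: SD {stdev(player_a_reb):.2f}, "
      f"CV {coefficient_of_variation(player_a_reb):.1f}%")
```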
Section C: Percentiles and Rankings
13. B) They outscore 85% of NBA players
- Being at the 85th percentile means their scoring exceeds that of 85% of the comparison group.
14. C) Mean
- The five-number summary is Min, Q1, Median, Q3, Max; the mean is not included.
15. B) 50th percentile
- By definition, the median is the 50th percentile.
16. B) Values more than 1.5 x IQR beyond Q1 or Q3
- This is the standard box-plot rule: a value is flagged if it falls below Q1 - 1.5 x IQR or above Q3 + 1.5 x IQR.
17. B) They are an elite scorer and excellent passer relative to the league
- Percentiles indicate relative standing within the league distribution.
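The five-number summary, the 1.5 x IQR outlier rule, and percentile rank can be sketched as follows. Two conventions here are assumptions:

- Quartiles use the "inclusive" method of statistics.quantiles. Other software may use a different method and report slightly different Q1 and Q3 values.
- Percentile rank counts values strictly below the given value. This is one of several common definitions.

The 15-player PPG list is hypothetical.

```python
from statistics import quantiles

def five_number_summary(values):
    """Min, Q1, median, Q3, max using the 'inclusive' quartile method."""
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, med, q3, max(values)

def iqr_outliers(values, k=1.5):
    """Values more than k x IQR below Q1 or above Q3 (box-plot rule)."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

def percentile_rank(value, values):
    """Percentage of values strictly below `value` (one of several conventions)."""
    return 100 * sum(v < value for v in values) / len(values)

# Hypothetical PPG figures for a 15-player comparison group.
ppg = [3.1, 4.5, 5.0, 6.2, 7.8, 8.4, 9.9, 10.5, 11.2, 12.0,
       13.6, 15.1, 17.3, 22.0, 31.5]

print("Five-number summary:", [round(x, 2) for x in five_number_summary(ppg)])
print("Outliers:", iqr_outliers(ppg))
print(f"Percentile rank of 22.0 PPG: {percentile_rank(22.0, ppg):.1f}")
```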
Section D: Distribution Shape
18. C) A highly right-skewed distribution
- Positive skewness indicates a long right tail; by a common rule of thumb, |skewness| > 1 is considered highly skewed.
19. C) The tailedness of the distribution (frequency of extreme values)
- Excess kurtosis (kurtosis minus 3) compares tail heaviness with that of a normal distribution, which has an excess kurtosis of 0.
20. B) Points per game
- Scoring distributions are typically right-skewed: most players score relatively little, and a few high scorers form a long right tail.
21. B) The data significantly deviates from normality
- At α = 0.05, p = 0.02 < 0.05 leads to rejecting the null hypothesis that the data are normally distributed.
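Skewness and excess kurtosis can be computed directly from central moments, as in the sketch below. It uses the plain moment-based estimators with no small-sample correction. Statistical packages often apply a correction by default or as an option, so their values may differ slightly. For the Shapiro-Wilk test in Question 21, a library routine such as SciPy's scipy.stats.shapiro is the usual choice rather than a hand-written version. The PPG list is hypothetical.

```python
from statistics import fmean

def _central_moment(values, k):
    """k-th central moment about the mean (divides by n, not n - 1)."""
    m = fmean(values)
    return fmean([(v - m) ** k for v in values])

def moment_skewness(values):
    """Moment-based skewness g1 = m3 / m2^1.5, without small-sample correction."""
    return _central_moment(values, 3) / _central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Moment-based excess kurtosis g2 = m4 / m2^2 - 3; a normal distribution gives 0."""
    return _central_moment(values, 4) / _central_moment(values, 2) ** 2 - 3

# Hypothetical PPG figures for a 15-player comparison group.
ppg = [3.1, 4.5, 5.0, 6.2, 7.8, 8.4, 9.9, 10.5, 11.2, 12.0,
       13.6, 15.1, 17.3, 22.0, 31.5]
print(f"Skewness:        {moment_skewness(ppg):+.2f}")
print(f"Excess kurtosis: {excess_kurtosis(ppg):+.2f}")
```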
Section E: Correlation and Relationships
22. C) A strong negative relationship
- By common rules of thumb, |r| = 0.65 counts as strong, and the negative sign indicates an inverse relationship: more turnovers go with a lower win percentage.
23. B) The proportion of variance in Y explained by X
- In simple linear regression, R-squared = r^2, the proportion of the variance in Y explained by X.
24. C) When the data contains outliers or the relationship is monotonic but non-linear
- Spearman's correlation is computed on ranks, which makes it robust to outliers and able to capture non-linear monotonic relationships.
25. B) Teams that shoot more three-pointers when losing (correlation with losses, but causation reversed)
- The correlation exists, but losing causes more three-point attempts (teams play catch-up), not the other way around.
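The difference between Pearson and Spearman (Question 24) shows up clearly on a relationship that is monotonic but curved. The sketch below implements both from scratch. Spearman's rho is computed as Pearson's r on the ranks, with tied values given the average of their positions. It also reports R-squared as r^2, which holds for simple linear regression (Question 23). The data are hypothetical.

```python
import math
from statistics import fmean

def pearson(x, y):
    """Pearson's r: covariance scaled by both standard deviations."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical data: y rises with x at every step, but along a curve.
x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]
r = pearson(x, y)
print(f"Pearson r = {r:.3f}, R^2 = {r ** 2:.3f}")
print(f"Spearman rho = {spearman(x, y):.3f}")
```

Because the ranks of x and y match exactly, Spearman's rho is 1. Pearson's r falls below 1 because the points do not lie on a straight line.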
Section F: Standardization and Z-Scores
26. B) They score 2.5 standard deviations above the league average
- Z-scores express values as the number of standard deviations from the mean.
27. B) Comparing statistics measured on different scales
- Z-scores convert different statistics to a common, unitless scale.
28. C) (x - mean) / std
- This is the standard z-score formula.
29. B) They allow comparison of players from different eras relative to their peers
- Era-adjusted z-scores measure each player against the mean and spread of their own league, which accounts for different league contexts.
30. (2 points)
- 1960s player: z = (30 - 20) / 8 = 10 / 8 = 1.25
- Modern player: z = (28 - 15) / 6 = 13 / 6 ≈ 2.17
- The modern player (z ≈ 2.17) was more exceptional relative to their era: measured in standard deviations, their scoring stood further above their contemporaries' average.
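The Question 30 calculation translates directly into code. In this minimal sketch, the league means and standard deviations are the ones given in the question, and z_score is simply the formula from Question 28.

```python
def z_score(x, mean, std):
    """Standard score: how many standard deviations x lies from the mean."""
    return (x - mean) / std

# League context from Question 30.
z_1960s = z_score(30, mean=20, std=8)
z_modern = z_score(28, mean=15, std=6)

print(f"1960s player:  z = {z_1960s:.2f}")
print(f"Modern player: z = {z_modern:.2f}")
print("More exceptional relative to era:",
      "modern player" if z_modern > z_1960s else "1960s player")
```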
Bonus Questions
Bonus 1 (2 points)
Example: a player shoots 60% on 100 attempts in Year 1 and 40% on 400 attempts in Year 2.
- Simple mean: (60% + 40%) / 2 = 50%
- Weighted mean: (0.60 * 100 + 0.40 * 400) / (100 + 400) = (60 + 160) / 500 = 220 / 500 = 44%
The weighted mean (44%) correctly reflects that most of the player's shots came at the lower percentage; it equals total makes divided by total attempts (220 / 500). The simple mean would overstate their actual shooting efficiency across the two seasons.
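The Bonus 1 example can be reproduced in a few lines. This sketch uses the percentages and attempt counts from the example above, weighting each season's percentage by its attempts.

```python
def simple_mean(pcts):
    """Unweighted average: every season counts equally."""
    return sum(pcts) / len(pcts)

def weighted_mean(pcts, attempts):
    """Attempt-weighted average: equivalent to total makes / total attempts."""
    return sum(p * a for p, a in zip(pcts, attempts)) / sum(attempts)

pcts = [0.60, 0.40]
attempts = [100, 400]
print(f"Simple mean:   {simple_mean(pcts):.1%}")
print(f"Weighted mean: {weighted_mean(pcts, attempts):.1%}")
```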
Bonus 2 (2 points)
Process:
1. Calculate z-scores for each statistic (PPG, RPG, APG, etc.) relative to the league distribution.
2. Decide on weights based on position or analytical purpose (e.g., offensive vs. defensive value).
3. Multiply each z-score by its weight.
4. Sum the weighted z-scores.
Considerations:
- Different positions contribute differently (e.g., blocks may deserve more weight for centers).
- Some statistics correlate highly (e.g., scoring with FGA); avoid double-counting the same underlying skill.
- Negative statistics such as turnovers should have their z-scores sign-flipped before combining, so that higher always means better.
- Sample size matters: players with too few games may produce unreliable z-scores.
- Context matters: bench players and starters should potentially be compared separately.
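The sketch below covers steps 1 through 4 of the Bonus 2 process and includes the sign flip for turnovers. Its assumptions: z-scores use the population standard deviation of a small comparison group, and the players, statistics, and weights are all hypothetical. A real rating would need to justify its weights and check for correlated statistics.

```python
from statistics import fmean, pstdev

# Hypothetical per-game stats for a small comparison group.
players = {
    "P1": {"pts": 25.0, "reb": 5.0, "ast": 7.0, "tov": 3.5},
    "P2": {"pts": 18.0, "reb": 11.0, "ast": 2.0, "tov": 1.5},
    "P3": {"pts": 12.0, "reb": 4.0, "ast": 9.0, "tov": 2.5},
    "P4": {"pts": 9.0, "reb": 8.0, "ast": 1.5, "tov": 1.0},
}
# Hypothetical weights (step 2); a real model would justify these.
weights = {"pts": 0.40, "reb": 0.25, "ast": 0.25, "tov": 0.10}
negative_stats = {"tov"}  # higher is worse, so the z-score's sign is flipped

def composite_ratings(players, weights, negative_stats):
    """Weighted sum of per-stat z-scores for every player."""
    stats = list(weights)
    mu = {s: fmean([p[s] for p in players.values()]) for s in stats}
    sd = {s: pstdev([p[s] for p in players.values()]) for s in stats}
    ratings = {}
    for name, p in players.items():
        total = 0.0
        for s in stats:
            z = (p[s] - mu[s]) / sd[s]  # step 1: standardize
            if s in negative_stats:
                z = -z  # invert "bad" stats so higher is better
            total += weights[s] * z  # steps 3-4: weight and sum
        ratings[name] = total
    return ratings

ranked = sorted(composite_ratings(players, weights, negative_stats).items(),
                key=lambda kv: kv[1], reverse=True)
for name, rating in ranked:
    print(f"{name}: {rating:+.2f}")
```

Each column of z-scores averages to zero across the group, so the composite ratings also average to zero. A positive rating therefore means above the group average under these weights.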
Scoring Guide
| Score | Grade | Feedback |
|---|---|---|
| 28-35 | A | Excellent understanding of descriptive statistics |
| 24-27 | B | Good grasp of core concepts; review calculation formulas |
| 20-23 | C | Adequate understanding; practice more examples |
| 16-19 | D | Review chapter material thoroughly |
| Below 16 | F | Seek additional help before proceeding |
Post-Quiz Reflection
After completing this quiz, consider:
- Which statistical concepts do you feel most/least comfortable with?
- How would you apply these statistics to evaluate a player you follow?
- What additional practice would help solidify your understanding?
- How do these concepts connect to the EDA techniques from Chapter 4?
Take time to revisit sections where you scored below 80% before moving to the next chapter.