Appendix A: Mathematical Foundations
This appendix provides a comprehensive reference for the mathematical concepts underlying basketball analytics. These foundations are essential for understanding player evaluation metrics, team performance modeling, and predictive analytics.
A.1 Linear Algebra Basics
Linear algebra forms the backbone of modern analytics, enabling us to represent and manipulate large datasets efficiently. In basketball analytics, we use these concepts to analyze player statistics, create composite metrics, and build predictive models.
A.1.1 Vectors
A vector is an ordered collection of numbers representing quantities in a multi-dimensional space. In basketball analytics, vectors commonly represent player statistics across multiple categories.
Definition: A vector $\mathbf{x}$ in $\mathbb{R}^n$ is written as:
$$\mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}$$
Example: A player's per-game statistics vector might be:
$$\mathbf{p} = \begin{pmatrix} 25.3 \\ 7.2 \\ 5.8 \\ 1.4 \\ 0.8 \end{pmatrix} \quad \text{representing} \quad \begin{pmatrix} \text{PPG} \\ \text{RPG} \\ \text{APG} \\ \text{SPG} \\ \text{BPG} \end{pmatrix}$$
Vector Operations:
Addition: For vectors $\mathbf{a}, \mathbf{b} \in \mathbb{R}^n$: $$\mathbf{a} + \mathbf{b} = \begin{pmatrix} a_1 + b_1 \\ a_2 + b_2 \\ \vdots \\ a_n + b_n \end{pmatrix}$$
Scalar Multiplication: For scalar $c$ and vector $\mathbf{a}$: $$c\mathbf{a} = \begin{pmatrix} ca_1 \\ ca_2 \\ \vdots \\ ca_n \end{pmatrix}$$
Dot Product: The inner product of two vectors: $$\mathbf{a} \cdot \mathbf{b} = \sum_{i=1}^{n} a_i b_i = a_1b_1 + a_2b_2 + \cdots + a_nb_n$$
Euclidean Norm: The length of a vector: $$\|\mathbf{a}\| = \sqrt{\sum_{i=1}^{n} a_i^2} = \sqrt{a_1^2 + a_2^2 + \cdots + a_n^2}$$
Application: The Euclidean distance between two players' statistical profiles measures their similarity: $$d(\mathbf{p}_1, \mathbf{p}_2) = \|\mathbf{p}_1 - \mathbf{p}_2\| = \sqrt{\sum_{i=1}^{n}(p_{1i} - p_{2i})^2}$$
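As a quick illustration, here is a minimal NumPy sketch computing this distance for two per-game stat vectors (the second player's values are hypothetical, added for the example):

```python
import numpy as np

# Per-game stat vectors: [PPG, RPG, APG, SPG, BPG]
player_a = np.array([25.3, 7.2, 5.8, 1.4, 0.8])
player_b = np.array([18.7, 4.1, 8.9, 1.9, 0.3])   # hypothetical comparison player

# Euclidean distance ||p1 - p2||; smaller values indicate more similar profiles
distance = np.linalg.norm(player_a - player_b)
print(f"Statistical distance: {distance:.2f}")
```

In practice, each statistic should be standardized first (see the z-score transformation in Section A.2.4), so that high-magnitude categories such as points do not dominate the distance.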
A.1.2 Matrices
A matrix is a rectangular array of numbers arranged in rows and columns. Matrices allow us to represent entire datasets and perform simultaneous operations on multiple observations.
Definition: An $m \times n$ matrix $\mathbf{A}$ has $m$ rows and $n$ columns:
$$\mathbf{A} = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}$$
Example: A team's player statistics matrix where rows are players and columns are statistics:
$$\mathbf{X} = \begin{pmatrix} 25.3 & 7.2 & 5.8 & 0.482 \\ 18.7 & 4.1 & 8.9 & 0.445 \\ 14.2 & 11.3 & 2.1 & 0.518 \\ 12.8 & 3.5 & 6.2 & 0.391 \\ 9.4 & 8.7 & 1.4 & 0.563 \end{pmatrix}$$
A.1.3 Matrix Operations
Matrix Addition: For matrices $\mathbf{A}, \mathbf{B}$ of the same dimensions: $$(\mathbf{A} + \mathbf{B})_{ij} = a_{ij} + b_{ij}$$
Matrix Multiplication: For $\mathbf{A}$ ($m \times n$) and $\mathbf{B}$ ($n \times p$), the product $\mathbf{C} = \mathbf{AB}$ is an $m \times p$ matrix: $$c_{ij} = \sum_{k=1}^{n} a_{ik}b_{kj}$$
Transpose: The transpose $\mathbf{A}^T$ of matrix $\mathbf{A}$ swaps rows and columns: $$(\mathbf{A}^T)_{ij} = a_{ji}$$
Matrix Inverse: For a square matrix $\mathbf{A}$, the inverse $\mathbf{A}^{-1}$ satisfies: $$\mathbf{A}\mathbf{A}^{-1} = \mathbf{A}^{-1}\mathbf{A} = \mathbf{I}$$
where $\mathbf{I}$ is the identity matrix.
Properties of the Inverse:
- $(\mathbf{A}^{-1})^{-1} = \mathbf{A}$
- $(\mathbf{AB})^{-1} = \mathbf{B}^{-1}\mathbf{A}^{-1}$
- $(\mathbf{A}^T)^{-1} = (\mathbf{A}^{-1})^T$
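A brief NumPy sketch of these operations on small illustrative matrices:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])
B = np.array([[1.0, 4.0],
              [2.0, 5.0]])

C = A @ B                    # matrix product (2 x 2)
At = A.T                     # transpose
A_inv = np.linalg.inv(A)     # inverse (A must be square and non-singular)

# Check A @ A^{-1} = I
print(np.allclose(A @ A_inv, np.eye(2)))          # True
# Verify (AB)^{-1} = B^{-1} A^{-1}
print(np.allclose(np.linalg.inv(A @ B),
                  np.linalg.inv(B) @ np.linalg.inv(A)))   # True
```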
A.1.4 Eigenvalues and Eigenvectors
Eigenvalues and eigenvectors are fundamental to Principal Component Analysis (PCA), which is used extensively in player profiling and dimensionality reduction.
Definition: For a square matrix $\mathbf{A}$, a non-zero vector $\mathbf{v}$ is an eigenvector with eigenvalue $\lambda$ if: $$\mathbf{A}\mathbf{v} = \lambda\mathbf{v}$$
Characteristic Equation: Eigenvalues are found by solving: $$\det(\mathbf{A} - \lambda\mathbf{I}) = 0$$
Application in PCA: The covariance matrix $\mathbf{\Sigma}$ of standardized player statistics has eigenvectors that define the principal components. The eigenvalues indicate the variance explained by each component.
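A minimal sketch of PCA by eigendecomposition, using the hypothetical player-by-statistic matrix from Section A.1.2:

```python
import numpy as np

# Rows = players, columns = statistics (hypothetical values from Section A.1.2)
X = np.array([[25.3,  7.2, 5.8, 0.482],
              [18.7,  4.1, 8.9, 0.445],
              [14.2, 11.3, 2.1, 0.518],
              [12.8,  3.5, 6.2, 0.391],
              [ 9.4,  8.7, 1.4, 0.563]])

# Standardize each column (mean 0, standard deviation 1)
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Covariance matrix of the standardized data
Sigma = np.cov(Z, rowvar=False)

# Eigendecomposition; eigh is appropriate for symmetric matrices
eigenvalues, eigenvectors = np.linalg.eigh(Sigma)
order = np.argsort(eigenvalues)[::-1]                 # sort by variance explained
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

explained = eigenvalues / eigenvalues.sum()           # proportion of variance per component
scores = Z @ eigenvectors                             # player scores on the principal components
print(np.round(explained, 3))
```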
A.2 Probability Theory Fundamentals
Probability theory provides the mathematical framework for quantifying uncertainty in basketball outcomes, from individual shot probabilities to game results.
A.2.1 Basic Probability Concepts
Sample Space and Events: The sample space $\Omega$ is the set of all possible outcomes. An event $A$ is a subset of $\Omega$.
Probability Axioms (Kolmogorov):
1. $P(A) \geq 0$ for all events $A$
2. $P(\Omega) = 1$
3. For mutually exclusive events $A_1, A_2, \ldots$: $P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)$
Conditional Probability: The probability of $A$ given $B$ has occurred: $$P(A|B) = \frac{P(A \cap B)}{P(B)}, \quad P(B) > 0$$
Independence: Events $A$ and $B$ are independent if: $$P(A \cap B) = P(A) \cdot P(B)$$
Equivalently, $P(A|B) = P(A)$.
A.2.2 Bayes' Theorem
Bayes' theorem is essential for updating probabilities based on new evidence, crucial for in-game win probability models.
$$P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}$$
Extended Form (with the denominator expanded via the Law of Total Probability): $$P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B|A) \cdot P(A) + P(B|A^c) \cdot P(A^c)}$$
Application: Updating a team's win probability given they made a three-pointer with 30 seconds remaining.
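A hedged sketch of such an update with made-up numbers: suppose the team's prior win probability is 0.55, and the late three-pointer occurs with different historical frequencies in eventual wins versus losses.

```python
# Hypothetical inputs for illustration only
p_win = 0.55                 # prior P(win)
p_three_given_win = 0.18     # P(made late three | eventual win)
p_three_given_loss = 0.10    # P(made late three | eventual loss)

# Bayes' theorem with the law of total probability in the denominator
numerator = p_three_given_win * p_win
denominator = numerator + p_three_given_loss * (1 - p_win)
posterior = numerator / denominator
print(f"Updated win probability: {posterior:.3f}")   # approximately 0.688
```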
A.2.3 Random Variables and Distributions
Discrete Random Variables: A random variable $X$ taking countable values with probability mass function (PMF): $$P(X = x) = p(x), \quad \sum_x p(x) = 1$$
Continuous Random Variables: A random variable $X$ with probability density function (PDF) $f(x)$: $$P(a \leq X \leq b) = \int_a^b f(x) \, dx$$
Expected Value:
- Discrete: $E[X] = \sum_x x \cdot p(x)$
- Continuous: $E[X] = \int_{-\infty}^{\infty} x \cdot f(x) \, dx$
Variance: $$\text{Var}(X) = E[(X - \mu)^2] = E[X^2] - (E[X])^2$$
Standard Deviation: $$\sigma = \sqrt{\text{Var}(X)}$$
A.2.4 Common Probability Distributions
Binomial Distribution: Models the number of successes in $n$ independent trials with success probability $p$. $$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$$
- Mean: $\mu = np$
- Variance: $\sigma^2 = np(1-p)$
Application: Number of free throws made out of $n$ attempts.
Poisson Distribution: Models count data for rare events. $$P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}$$
- Mean: $\mu = \lambda$
- Variance: $\sigma^2 = \lambda$
Application: Number of turnovers per game.
Normal (Gaussian) Distribution: The ubiquitous bell curve. $$f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
Standard Normal: When $\mu = 0$ and $\sigma = 1$: $$\phi(z) = \frac{1}{\sqrt{2\pi}} e^{-\frac{z^2}{2}}$$
Z-Score Transformation: $$z = \frac{x - \mu}{\sigma}$$
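The following sketch, with hypothetical rates, evaluates a binomial free-throw probability, a Poisson turnover probability, and a z-score directly from the formulas above:

```python
import math

# Binomial: P(making exactly 8 of 10 free throws) for a hypothetical 75% shooter
n, k, p = 10, 8, 0.75
p_binom = math.comb(n, k) * p**k * (1 - p)**(n - k)

# Poisson: P(exactly 2 turnovers) for a player averaging lambda = 2.5 per game
lam, x = 2.5, 2
p_pois = lam**x * math.exp(-lam) / math.factorial(x)

# Z-score: a 28 PPG season when the league average is 15 with standard deviation 6
z = (28 - 15) / 6

print(f"P(8 of 10 FT)  = {p_binom:.3f}")
print(f"P(2 turnovers) = {p_pois:.3f}")
print(f"z-score        = {z:.2f}")
```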
A.3 Statistical Inference Concepts
Statistical inference allows us to draw conclusions about populations based on sample data, essential for generalizing findings from observed games to future performance.
A.3.1 Point Estimation
Estimator Properties:
- Unbiasedness: $E[\hat{\theta}] = \theta$
- Consistency: $\hat{\theta} \xrightarrow{p} \theta$ as $n \to \infty$
- Efficiency: Minimum variance among unbiased estimators
Maximum Likelihood Estimation (MLE): Find $\hat{\theta}$ that maximizes: $$L(\theta) = \prod_{i=1}^{n} f(x_i | \theta)$$
Or equivalently, the log-likelihood: $$\ell(\theta) = \sum_{i=1}^{n} \log f(x_i | \theta)$$
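For example, for Poisson-distributed counts the MLE of $\lambda$ is the sample mean. A small sketch (with hypothetical turnover counts) confirms this by scanning the log-likelihood over a grid:

```python
import math

# Hypothetical turnover counts over ten games
counts = [2, 3, 1, 4, 2, 0, 3, 2, 1, 2]

def poisson_log_likelihood(lam, data):
    # l(lambda) = sum_i [ x_i * log(lambda) - lambda - log(x_i!) ]
    return sum(x * math.log(lam) - lam - math.lgamma(x + 1) for x in data)

# Evaluate the log-likelihood on a grid and pick the maximizer
grid = [l / 100 for l in range(50, 451)]            # lambda from 0.50 to 4.50
lam_hat = max(grid, key=lambda lam: poisson_log_likelihood(lam, counts))

print(lam_hat, sum(counts) / len(counts))           # both equal 2.0, the sample mean
```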
A.3.2 Confidence Intervals
A $(1-\alpha)$ confidence interval provides a range of plausible values for a parameter.
For the Mean (known $\sigma$): $$\bar{x} \pm z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$$
For the Mean (unknown $\sigma$): $$\bar{x} \pm t_{\alpha/2, n-1} \cdot \frac{s}{\sqrt{n}}$$
For Proportions: $$\hat{p} \pm z_{\alpha/2} \cdot \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
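A sketch computing a 95% t-interval for a player's true scoring average from a small sample of games (values hypothetical; assumes SciPy is available):

```python
import numpy as np
from scipy import stats

# Hypothetical points scored in 12 games
points = np.array([22, 31, 18, 27, 25, 19, 33, 24, 21, 28, 26, 23])

n = len(points)
mean = points.mean()
s = points.std(ddof=1)                      # sample standard deviation
t_crit = stats.t.ppf(0.975, df=n - 1)       # two-sided 95% critical value

half_width = t_crit * s / np.sqrt(n)
print(f"95% CI for mean PPG: ({mean - half_width:.1f}, {mean + half_width:.1f})")
```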
A.3.3 Hypothesis Testing
Framework:
1. State the null hypothesis $H_0$ and the alternative $H_1$
2. Choose a significance level $\alpha$ (commonly 0.05)
3. Calculate the test statistic
4. Determine the p-value or compare the statistic to a critical value
5. Make a decision: reject or fail to reject $H_0$
Test Statistic for Mean (unknown $\sigma$): $$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$$
P-value: Probability of observing a test statistic at least as extreme as the one calculated, assuming $H_0$ is true.
Type I Error ($\alpha$): Rejecting $H_0$ when it is true (false positive)

Type II Error ($\beta$): Failing to reject $H_0$ when it is false (false negative)

Power: $1 - \beta$, the probability of correctly rejecting a false $H_0$
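A one-sample t-test sketch following this framework, testing whether a hypothetical player's true scoring average differs from 20 PPG (assumes SciPy):

```python
import numpy as np
from scipy import stats

points = np.array([22, 31, 18, 27, 25, 19, 33, 24, 21, 28, 26, 23])  # hypothetical sample
mu_0 = 20.0                                                          # H0: true mean PPG = 20

t_stat = (points.mean() - mu_0) / (points.std(ddof=1) / np.sqrt(len(points)))
p_value = 2 * stats.t.sf(abs(t_stat), df=len(points) - 1)            # two-sided p-value

print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Reject H0 at the 5% level")
else:
    print("Fail to reject H0 at the 5% level")
```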
A.3.4 The Central Limit Theorem
For independent and identically distributed random variables $X_1, X_2, \ldots, X_n$ with mean $\mu$ and finite variance $\sigma^2$: $$\frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \xrightarrow{d} N(0, 1) \quad \text{as } n \to \infty$$
Practical Implication: The sampling distribution of the mean is approximately normal for large samples (typically $n \geq 30$), regardless of the underlying distribution.
A.4 Regression Analysis
Regression analysis is the workhorse of basketball analytics, used for everything from predicting points per game to estimating player value.
A.4.1 Ordinary Least Squares (OLS) Regression
Simple Linear Regression Model: $$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$
where $\varepsilon_i \sim N(0, \sigma^2)$ are independent errors.
OLS Estimators: Minimize the sum of squared residuals: $$\min_{\beta_0, \beta_1} \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_i)^2$$
Solution: $$\hat{\beta}_1 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n}(X_i - \bar{X})^2} = \frac{S_{XY}}{S_{XX}}$$
$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$$
Multiple Linear Regression: For $p$ predictors: $$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_p X_{ip} + \varepsilon_i$$
Matrix Form: $$\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$$
OLS Solution: $$\hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y}$$
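A sketch of this closed-form solution on hypothetical data (predicting points from minutes and usage rate), built directly from the normal equations. In practice, `np.linalg.lstsq` or a statistics package is numerically preferable to explicitly inverting $\mathbf{X}^T\mathbf{X}$.

```python
import numpy as np

# Hypothetical predictors: minutes per game and usage rate
minutes = np.array([36.1, 32.4, 28.9, 25.2, 20.7, 18.3])
usage   = np.array([0.31, 0.27, 0.22, 0.19, 0.17, 0.15])
points  = np.array([27.8, 22.1, 16.5, 13.2,  9.8,  7.4])   # response

# Design matrix with an intercept column of ones
X = np.column_stack([np.ones_like(minutes), minutes, usage])

# beta_hat = (X^T X)^{-1} X^T y
beta_hat = np.linalg.inv(X.T @ X) @ X.T @ points
y_hat = X @ beta_hat

print(np.round(beta_hat, 3))   # [intercept, coefficient on minutes, coefficient on usage]
```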
A.4.2 Model Evaluation Metrics
Coefficient of Determination ($R^2$): $$R^2 = 1 - \frac{SS_{res}}{SS_{tot}} = 1 - \frac{\sum_{i}(Y_i - \hat{Y}_i)^2}{\sum_{i}(Y_i - \bar{Y})^2}$$
Adjusted $R^2$: Penalizes for additional predictors: $$R^2_{adj} = 1 - \frac{(1 - R^2)(n - 1)}{n - p - 1}$$
Root Mean Squared Error (RMSE): $$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2}$$
Mean Absolute Error (MAE): $$MAE = \frac{1}{n}\sum_{i=1}^{n}|Y_i - \hat{Y}_i|$$
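These metrics are straightforward to compute directly; a small sketch assuming arrays of actual and predicted values (both hypothetical here):

```python
import numpy as np

def r_squared(y, y_hat):
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1 - ss_res / ss_tot

def rmse(y, y_hat):
    return np.sqrt(np.mean((y - y_hat) ** 2))

def mae(y, y_hat):
    return np.mean(np.abs(y - y_hat))

# Hypothetical actual vs. predicted points per game
y     = np.array([27.8, 22.1, 16.5, 13.2,  9.8, 7.4])
y_hat = np.array([26.9, 23.0, 17.1, 12.4, 10.5, 7.9])
print(r_squared(y, y_hat), rmse(y, y_hat), mae(y, y_hat))
```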
A.4.3 OLS Assumptions and Diagnostics
Gauss-Markov Assumptions:
1. Linearity: $E[Y|X] = \mathbf{X}\boldsymbol{\beta}$
2. Strict Exogeneity: $E[\varepsilon_i | \mathbf{X}] = 0$
3. No Multicollinearity: $\mathbf{X}$ has full column rank
4. Homoscedasticity: $\text{Var}(\varepsilon_i | \mathbf{X}) = \sigma^2$
5. No Autocorrelation: $\text{Cov}(\varepsilon_i, \varepsilon_j | \mathbf{X}) = 0$ for $i \neq j$
Variance Inflation Factor (VIF): Detects multicollinearity: $$VIF_j = \frac{1}{1 - R_j^2}$$
where $R_j^2$ is the $R^2$ from regressing $X_j$ on all other predictors. VIF > 10 suggests problematic multicollinearity.
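A sketch of the VIF computation: regress each predictor on all the others and convert the resulting $R_j^2$ (the predictor columns below are hypothetical and deliberately correlated):

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of a predictor matrix X (no intercept column)."""
    n, p = X.shape
    vifs = []
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        Z = np.column_stack([np.ones(n), others])        # add intercept
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)     # regress X_j on the remaining predictors
        y_hat = Z @ beta
        r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
        vifs.append(1 / (1 - r2))
    return np.array(vifs)

# Hypothetical predictors: minutes, field-goal attempts, usage rate
X = np.array([[36.1, 21.0, 0.31],
              [32.4, 18.2, 0.27],
              [28.9, 14.1, 0.22],
              [25.2, 11.8, 0.19],
              [20.7,  9.5, 0.17],
              [18.3,  7.9, 0.15]])
print(np.round(vif(X), 1))   # values well above 10 would flag problematic multicollinearity
```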
A.4.4 Ridge Regression
Ridge regression addresses multicollinearity by adding an L2 penalty to the OLS objective.
Objective Function: $$\min_{\boldsymbol{\beta}} \left\{ \sum_{i=1}^{n}(Y_i - \mathbf{x}_i^T\boldsymbol{\beta})^2 + \lambda \sum_{j=1}^{p}\beta_j^2 \right\}$$
Solution: $$\hat{\boldsymbol{\beta}}_{ridge} = (\mathbf{X}^T\mathbf{X} + \lambda\mathbf{I})^{-1}\mathbf{X}^T\mathbf{Y}$$
Properties:
- Shrinks coefficients toward zero
- Does not set coefficients exactly to zero
- $\lambda = 0$ reduces to OLS
- As $\lambda \to \infty$, $\hat{\boldsymbol{\beta}} \to \mathbf{0}$
Selecting $\lambda$: Use cross-validation to find the $\lambda$ that minimizes prediction error.
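A sketch of the ridge closed form on simulated, standardized predictors; the $\lambda$ values here are fixed by hand purely to show the shrinkage behavior, whereas in practice $\lambda$ would be chosen by cross-validation:

```python
import numpy as np

def ridge(X, y, lam):
    """Closed-form ridge solution on standardized predictors and a centered response."""
    p = X.shape[1]
    return np.linalg.inv(X.T @ X + lam * np.eye(p)) @ X.T @ y

# Simulated data with known coefficients (hypothetical example)
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))
beta_true = np.array([2.0, -1.0, 0.5])
y = X @ beta_true + rng.standard_normal(50)

for lam in [0.0, 1.0, 10.0, 100.0]:
    print(lam, np.round(ridge(X, y, lam), 3))   # coefficients shrink toward zero as lambda grows
```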
A.4.5 LASSO Regression
LASSO (Least Absolute Shrinkage and Selection Operator) uses an L1 penalty.
Objective Function: $$\min_{\boldsymbol{\beta}} \left\{ \sum_{i=1}^{n}(Y_i - \mathbf{x}_i^T\boldsymbol{\beta})^2 + \lambda \sum_{j=1}^{p}|\beta_j| \right\}$$
Properties:
- Can set coefficients exactly to zero (automatic feature selection)
- Useful when many predictors are irrelevant
- No closed-form solution; requires iterative algorithms
A.4.6 Elastic Net
Elastic Net combines L1 and L2 penalties:
$$\min_{\boldsymbol{\beta}} \left\{ \sum_{i=1}^{n}(Y_i - \mathbf{x}_i^T\boldsymbol{\beta})^2 + \lambda_1 \sum_{j=1}^{p}|\beta_j| + \lambda_2 \sum_{j=1}^{p}\beta_j^2 \right\}$$
Alternative Parameterization: $$\min_{\boldsymbol{\beta}} \left\{ \sum_{i=1}^{n}(Y_i - \mathbf{x}_i^T\boldsymbol{\beta})^2 + \lambda \left( \alpha \sum_{j=1}^{p}|\beta_j| + (1-\alpha) \sum_{j=1}^{p}\beta_j^2 \right) \right\}$$
where $\alpha \in [0, 1]$ controls the mix of L1 and L2 penalties.
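Because neither LASSO nor Elastic Net has a closed form, an iterative solver is used in practice. A brief sketch with scikit-learn (assuming it is installed): its `alpha` and `l1_ratio` arguments play roles analogous to $\lambda$ and $\alpha$ above, although scikit-learn's exact scaling of the objective differs slightly. The simulated data are hypothetical.

```python
import numpy as np
from sklearn.linear_model import Lasso, ElasticNet

# Simulated data: only the first two of ten predictors matter
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.standard_normal(200)

lasso = Lasso(alpha=0.1).fit(X, y)
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)

print(np.round(lasso.coef_, 2))   # many coefficients driven exactly to zero
print(np.round(enet.coef_, 2))
```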
A.4.7 Logistic Regression
Logistic regression models binary outcomes (e.g., win/loss, made/missed shot).
Model: $$P(Y = 1 | \mathbf{X}) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p)}} = \frac{e^{\mathbf{x}^T\boldsymbol{\beta}}}{1 + e^{\mathbf{x}^T\boldsymbol{\beta}}}$$
Logit (Log-Odds): $$\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p$$
Interpretation: A one-unit increase in $X_j$ multiplies the odds by $e^{\beta_j}$, holding the other predictors constant.
Estimation: Maximum Likelihood (no closed-form solution): $$L(\boldsymbol{\beta}) = \prod_{i=1}^{n} p_i^{y_i}(1-p_i)^{1-y_i}$$
$$\ell(\boldsymbol{\beta}) = \sum_{i=1}^{n} \left[ y_i \log(p_i) + (1-y_i)\log(1-p_i) \right]$$
Model Evaluation:
- Accuracy: Proportion of correct predictions
- Precision: $\frac{TP}{TP + FP}$
- Recall (Sensitivity): $\frac{TP}{TP + FN}$
- F1 Score: $2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$
- AUC-ROC: Area under the Receiver Operating Characteristic curve
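A sketch that fits a logistic regression by maximizing the log-likelihood with plain gradient ascent on simulated shot data; on real data one would typically use a library such as statsmodels or scikit-learn, and the coefficients below are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def fit_logistic(X, y, lr=0.5, n_iter=10000):
    """Gradient ascent on the average log-likelihood; X should include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ beta)
        beta += lr * X.T @ (y - p) / len(y)   # gradient of the average log-likelihood
    return beta

# Simulated shot data: shot distance and defender distance -> made (1) / missed (0)
rng = np.random.default_rng(2)
shot_dist = rng.uniform(1, 28, 500)
def_dist = rng.uniform(0, 10, 500)
made = rng.binomial(1, sigmoid(1.5 - 0.12 * shot_dist + 0.15 * def_dist))

# Standardize predictors so plain gradient ascent behaves well
Z = np.column_stack([(shot_dist - shot_dist.mean()) / shot_dist.std(),
                     (def_dist - def_dist.mean()) / def_dist.std()])
X = np.column_stack([np.ones(len(made)), Z])

beta_hat = fit_logistic(X, made)
p_hat = sigmoid(X @ beta_hat)
accuracy = np.mean((p_hat >= 0.5) == made)
print(np.round(beta_hat, 2), f"accuracy = {accuracy:.2f}")
```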
A.4.8 Multinomial and Ordinal Logistic Regression
Multinomial Logistic Regression: For categorical outcomes with $K > 2$ categories: $$\log\left(\frac{P(Y = k)}{P(Y = K)}\right) = \mathbf{x}^T\boldsymbol{\beta}_k, \quad k = 1, \ldots, K-1$$
Application: Predicting whether a possession results in a made shot, missed shot, or turnover.
Ordinal Logistic Regression: For ordered categorical outcomes: $$\log\left(\frac{P(Y \leq j)}{P(Y > j)}\right) = \alpha_j - \mathbf{x}^T\boldsymbol{\beta}$$
Application: Predicting performance ratings (Poor, Average, Good, Excellent).
A.5 Additional Mathematical Tools
A.5.1 Optimization
Gradient Descent: Iterative method to find minimum: $$\boldsymbol{\theta}_{t+1} = \boldsymbol{\theta}_t - \eta \nabla f(\boldsymbol{\theta}_t)$$
where $\eta$ is the learning rate.
Newton's Method: Second-order optimization: $$\boldsymbol{\theta}_{t+1} = \boldsymbol{\theta}_t - \mathbf{H}^{-1} \nabla f(\boldsymbol{\theta}_t)$$
where $\mathbf{H}$ is the Hessian matrix of second derivatives.
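A compact sketch of both update rules on a simple convex quadratic, for which Newton's method converges in a single step:

```python
import numpy as np

# f(theta) = 0.5 * theta^T A theta - b^T theta, a convex quadratic
A = np.array([[3.0, 0.5],
              [0.5, 1.0]])
b = np.array([1.0, 2.0])

grad = lambda theta: A @ theta - b       # gradient of f
hessian = A                              # Hessian of f (constant for a quadratic)

# Gradient descent
theta = np.zeros(2)
eta = 0.1                                # learning rate
for _ in range(200):
    theta = theta - eta * grad(theta)
print("Gradient descent:", np.round(theta, 4))

# Newton's method: one step reaches the minimizer A^{-1} b
theta_newton = np.zeros(2) - np.linalg.inv(hessian) @ grad(np.zeros(2))
print("Newton's method: ", np.round(theta_newton, 4))
```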
A.5.2 Information Criteria
Akaike Information Criterion (AIC): $$AIC = 2k - 2\ln(\hat{L})$$
where $k$ is the number of parameters and $\hat{L}$ is the maximized likelihood.
Bayesian Information Criterion (BIC): $$BIC = k\ln(n) - 2\ln(\hat{L})$$
BIC penalizes model complexity more heavily than AIC.
A.5.3 Correlation
Pearson Correlation Coefficient: $$r = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i - \bar{X})^2}\sqrt{\sum_{i=1}^{n}(Y_i - \bar{Y})^2}}$$
Spearman Rank Correlation: Pearson correlation applied to ranks.
Properties:
- $-1 \leq r \leq 1$
- $r = 1$: Perfect positive linear relationship
- $r = -1$: Perfect negative linear relationship
- $r = 0$: No linear relationship (not necessarily independence)
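A brief sketch computing both correlations between two hypothetical stat columns; Spearman is simply the Pearson correlation applied to the ranks:

```python
import numpy as np

# Hypothetical per-game minutes and points for eight players
minutes = np.array([36.1, 34.0, 31.2, 28.9, 25.2, 22.4, 20.7, 18.3])
points  = np.array([27.8, 20.5, 22.1, 16.5, 13.2, 14.0,  9.8,  7.4])

pearson = np.corrcoef(minutes, points)[0, 1]

# Spearman: rank each variable (no ties here), then take the Pearson correlation of the ranks
ranks_m = minutes.argsort().argsort()
ranks_p = points.argsort().argsort()
spearman = np.corrcoef(ranks_m, ranks_p)[0, 1]

print(f"Pearson r = {pearson:.3f}, Spearman rho = {spearman:.3f}")
```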
A.6 Quick Reference Formulas
Descriptive Statistics
| Measure | Formula |
|---|---|
| Sample Mean | $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ |
| Sample Variance | $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$ |
| Sample Standard Deviation | $s = \sqrt{s^2}$ |
| Coefficient of Variation | $CV = \frac{s}{\bar{x}} \times 100\%$ |
| Skewness | $\frac{n}{(n-1)(n-2)}\sum\left(\frac{x_i - \bar{x}}{s}\right)^3$ |
| Kurtosis | $\frac{n(n+1)}{(n-1)(n-2)(n-3)}\sum\left(\frac{x_i - \bar{x}}{s}\right)^4 - \frac{3(n-1)^2}{(n-2)(n-3)}$ |
Probability Distributions Summary
| Distribution | PMF/PDF | Mean | Variance |
|---|---|---|---|
| Bernoulli$(p)$ | $p^x(1-p)^{1-x}$ | $p$ | $p(1-p)$ |
| Binomial$(n,p)$ | $\binom{n}{k}p^k(1-p)^{n-k}$ | $np$ | $np(1-p)$ |
| Poisson$(\lambda)$ | $\frac{\lambda^k e^{-\lambda}}{k!}$ | $\lambda$ | $\lambda$ |
| Normal$(\mu,\sigma^2)$ | $\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}$ | $\mu$ | $\sigma^2$ |
| Exponential$(\lambda)$ | $\lambda e^{-\lambda x}$ | $\frac{1}{\lambda}$ | $\frac{1}{\lambda^2}$ |
| Uniform$(a,b)$ | $\frac{1}{b-a}$ | $\frac{a+b}{2}$ | $\frac{(b-a)^2}{12}$ |
A.7 References and Further Reading
- Casella, G., & Berger, R. L. (2002). Statistical Inference (2nd ed.). Duxbury Press.
- James, G., Witten, D., Hastie, T., & Tibshirani, R. (2021). An Introduction to Statistical Learning (2nd ed.). Springer.
- Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning (2nd ed.). Springer.
- Strang, G. (2016). Introduction to Linear Algebra (5th ed.). Wellesley-Cambridge Press.
- DeGroot, M. H., & Schervish, M. J. (2012). Probability and Statistics (4th ed.). Pearson.
This appendix provides the mathematical foundation necessary for understanding the analytical methods presented throughout this textbook. Students are encouraged to consult the referenced texts for more rigorous treatments of these topics.