Appendix A: Mathematical Foundations
This appendix provides a comprehensive reference for the mathematical concepts underlying basketball analytics. These foundations are essential for understanding player evaluation metrics, team performance modeling, and predictive analytics.
A.1 Linear Algebra Basics
Linear algebra forms the backbone of modern analytics, enabling us to represent and manipulate large datasets efficiently. In basketball analytics, we use these concepts to analyze player statistics, create composite metrics, and build predictive models.
A.1.1 Vectors
A vector is an ordered collection of numbers representing quantities in a multi-dimensional space. In basketball analytics, vectors commonly represent player statistics across multiple categories.
Definition: A vector $\mathbf{x}$ in $\mathbb{R}^n$ is written as:
$$\mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}$$
Example: A player's per-game statistics vector might be:
$$\mathbf{p} = \begin{pmatrix} 25.3 \\ 7.2 \\ 5.8 \\ 1.4 \\ 0.8 \end{pmatrix} \quad \text{representing} \quad \begin{pmatrix} \text{PPG} \\ \text{RPG} \\ \text{APG} \\ \text{SPG} \\ \text{BPG} \end{pmatrix}$$
Vector Operations:
Addition: For vectors $\mathbf{a}, \mathbf{b} \in \mathbb{R}^n$: $$\mathbf{a} + \mathbf{b} = \begin{pmatrix} a_1 + b_1 \\ a_2 + b_2 \\ \vdots \\ a_n + b_n \end{pmatrix}$$
Scalar Multiplication: For scalar $c$ and vector $\mathbf{a}$: $$c\mathbf{a} = \begin{pmatrix} ca_1 \\ ca_2 \\ \vdots \\ ca_n \end{pmatrix}$$
Dot Product: The inner product of two vectors: $$\mathbf{a} \cdot \mathbf{b} = \sum_{i=1}^{n} a_i b_i = a_1b_1 + a_2b_2 + \cdots + a_nb_n$$
Euclidean Norm: The length of a vector: $$\|\mathbf{a}\| = \sqrt{\sum_{i=1}^{n} a_i^2} = \sqrt{a_1^2 + a_2^2 + \cdots + a_n^2}$$
Application: The Euclidean distance between two players' statistical profiles measures their similarity: $$d(\mathbf{p}_1, \mathbf{p}_2) = \|\mathbf{p}_1 - \mathbf{p}_2\| = \sqrt{\sum_{i=1}^{n}(p_{1i} - p_{2i})^2}$$
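As a quick illustration, here is a minimal NumPy sketch computing this distance for two per-game stat vectors (the second player's values are hypothetical, added for the example):

```python
import numpy as np

# Per-game stat vectors: [PPG, RPG, APG, SPG, BPG]
player_a = np.array([25.3, 7.2, 5.8, 1.4, 0.8])
player_b = np.array([18.7, 4.1, 8.9, 1.9, 0.3])   # hypothetical comparison player

# Euclidean distance ||p1 - p2||; smaller values indicate more similar profiles
distance = np.linalg.norm(player_a - player_b)
print(f"Statistical distance: {distance:.2f}")
```

In practice, each statistic should be standardized first (see the z-score transformation in Section A.2.4), so that high-magnitude categories such as points do not dominate the distance.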
A.1.2 Matrices
A matrix is a rectangular array of numbers arranged in rows and columns. Matrices allow us to represent entire datasets and perform simultaneous operations on multiple observations.
Definition: An $m \times n$ matrix $\mathbf{A}$ has $m$ rows and $n$ columns:
$$\mathbf{A} = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}$$
Example: A team's player statistics matrix where rows are players and columns are statistics:
$$\mathbf{X} = \begin{pmatrix} 25.3 & 7.2 & 5.8 & 0.482 \\ 18.7 & 4.1 & 8.9 & 0.445 \\ 14.2 & 11.3 & 2.1 & 0.518 \\ 12.8 & 3.5 & 6.2 & 0.391 \\ 9.4 & 8.7 & 1.4 & 0.563 \end{pmatrix}$$
A.1.3 Matrix Operations
Matrix Addition: For matrices $\mathbf{A}, \mathbf{B}$ of the same dimensions: $$(\mathbf{A} + \mathbf{B})_{ij} = a_{ij} + b_{ij}$$
Matrix Multiplication: For $\mathbf{A}$ ($m \times n$) and $\mathbf{B}$ ($n \times p$), the product $\mathbf{C} = \mathbf{AB}$ is an $m \times p$ matrix: $$c_{ij} = \sum_{k=1}^{n} a_{ik}b_{kj}$$
Transpose: The transpose $\mathbf{A}^T$ of matrix $\mathbf{A}$ swaps rows and columns: $$(\mathbf{A}^T)_{ij} = a_{ji}$$
Matrix Inverse: For a square matrix $\mathbf{A}$, the inverse $\mathbf{A}^{-1}$ satisfies: $$\mathbf{A}\mathbf{A}^{-1} = \mathbf{A}^{-1}\mathbf{A} = \mathbf{I}$$
where $\mathbf{I}$ is the identity matrix.
Properties of the Inverse:
- $(\mathbf{A}^{-1})^{-1} = \mathbf{A}$
- $(\mathbf{AB})^{-1} = \mathbf{B}^{-1}\mathbf{A}^{-1}$
- $(\mathbf{A}^T)^{-1} = (\mathbf{A}^{-1})^T$
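A brief NumPy sketch of these operations on small illustrative matrices:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])
B = np.array([[1.0, 4.0],
              [2.0, 5.0]])

C = A @ B                    # matrix product (2 x 2)
At = A.T                     # transpose
A_inv = np.linalg.inv(A)     # inverse (A must be square and non-singular)

# Check A @ A^{-1} = I
print(np.allclose(A @ A_inv, np.eye(2)))          # True
# Verify (AB)^{-1} = B^{-1} A^{-1}
print(np.allclose(np.linalg.inv(A @ B),
                  np.linalg.inv(B) @ np.linalg.inv(A)))   # True
```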
A.1.4 Eigenvalues and Eigenvectors
Eigenvalues and eigenvectors are fundamental to Principal Component Analysis (PCA), which is used extensively in player profiling and dimensionality reduction.
Definition: For a square matrix $\mathbf{A}$, a non-zero vector $\mathbf{v}$ is an eigenvector with eigenvalue $\lambda$ if: $$\mathbf{A}\mathbf{v} = \lambda\mathbf{v}$$
Characteristic Equation: Eigenvalues are found by solving: $$\det(\mathbf{A} - \lambda\mathbf{I}) = 0$$
Application in PCA: The covariance matrix $\mathbf{\Sigma}$ of standardized player statistics has eigenvectors that define the principal components. The eigenvalues indicate the variance explained by each component.
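A minimal sketch of PCA by eigendecomposition, using the hypothetical player-by-statistic matrix from Section A.1.2:

```python
import numpy as np

# Rows = players, columns = statistics (hypothetical values from Section A.1.2)
X = np.array([[25.3,  7.2, 5.8, 0.482],
              [18.7,  4.1, 8.9, 0.445],
              [14.2, 11.3, 2.1, 0.518],
              [12.8,  3.5, 6.2, 0.391],
              [ 9.4,  8.7, 1.4, 0.563]])

# Standardize each column (mean 0, standard deviation 1)
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Covariance matrix of the standardized data
Sigma = np.cov(Z, rowvar=False)

# Eigendecomposition; eigh is appropriate for symmetric matrices
eigenvalues, eigenvectors = np.linalg.eigh(Sigma)
order = np.argsort(eigenvalues)[::-1]                 # sort by variance explained
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

explained = eigenvalues / eigenvalues.sum()           # proportion of variance per component
scores = Z @ eigenvectors                             # player scores on the principal components
print(np.round(explained, 3))
```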
A.2 Probability Theory Fundamentals
Probability theory provides the mathematical framework for quantifying uncertainty in basketball outcomes, from individual shot probabilities to game results.
A.2.1 Basic Probability Concepts
Sample Space and Events: The sample space $\Omega$ is the set of all possible outcomes. An event $A$ is a subset of $\Omega$.
Probability Axioms (Kolmogorov):
1. $P(A) \geq 0$ for all events $A$
2. $P(\Omega) = 1$
3. For mutually exclusive events $A_1, A_2, \ldots$: $P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)$
Conditional Probability: The probability of $A$ given $B$ has occurred: $$P(A|B) = \frac{P(A \cap B)}{P(B)}, \quad P(B) > 0$$
Independence: Events $A$ and $B$ are independent if: $$P(A \cap B) = P(A) \cdot P(B)$$
Equivalently, $P(A|B) = P(A)$.
A.2.2 Bayes' Theorem
Bayes' theorem is essential for updating probabilities based on new evidence, crucial for in-game win probability models.
$$P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}$$
Extended Form (with the denominator expanded via the Law of Total Probability): $$P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B|A) \cdot P(A) + P(B|A^c) \cdot P(A^c)}$$
Application: Updating a team's win probability given they made a three-pointer with 30 seconds remaining.
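A hedged sketch of such an update with made-up numbers: suppose the team's prior win probability is 0.55, and the late three-pointer occurs with different historical frequencies in eventual wins versus losses.

```python
# Hypothetical inputs for illustration only
p_win = 0.55                 # prior P(win)
p_three_given_win = 0.18     # P(made late three | eventual win)
p_three_given_loss = 0.10    # P(made late three | eventual loss)

# Bayes' theorem with the law of total probability in the denominator
numerator = p_three_given_win * p_win
denominator = numerator + p_three_given_loss * (1 - p_win)
posterior = numerator / denominator
print(f"Updated win probability: {posterior:.3f}")   # approximately 0.688
```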
A.2.3 Random Variables and Distributions
Discrete Random Variables: A random variable $X$ taking countable values with probability mass function (PMF): $$P(X = x) = p(x), \quad \sum_x p(x) = 1$$
Continuous Random Variables: A random variable $X$ with probability density function (PDF) $f(x)$: $$P(a \leq X \leq b) = \int_a^b f(x) \, dx$$
Expected Value:
- Discrete: $E[X] = \sum_x x \cdot p(x)$
- Continuous: $E[X] = \int_{-\infty}^{\infty} x \cdot f(x) \, dx$
Variance: $$\text{Var}(X) = E[(X - \mu)^2] = E[X^2] - (E[X])^2$$
Standard Deviation: $$\sigma = \sqrt{\text{Var}(X)}$$
A.2.4 Common Probability Distributions
Binomial Distribution: Models the number of successes in $n$ independent trials with success probability $p$. $$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$$
- Mean: $\mu = np$
- Variance: $\sigma^2 = np(1-p)$
Application: Number of free throws made out of $n$ attempts.
Poisson Distribution: Models count data for rare events. $$P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}$$
- Mean: $\mu = \lambda$
- Variance: $\sigma^2 = \lambda$
Application: Number of turnovers per game.
Normal (Gaussian) Distribution: The ubiquitous bell curve. $$f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
Standard Normal: When $\mu = 0$ and $\sigma = 1$: $$\phi(z) = \frac{1}{\sqrt{2\pi}} e^{-\frac{z^2}{2}}$$
Z-Score Transformation: $$z = \frac{x - \mu}{\sigma}$$
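The following sketch, with hypothetical rates, evaluates a binomial free-throw probability, a Poisson turnover probability, and a z-score directly from the formulas above:

```python
import math

# Binomial: P(making exactly 8 of 10 free throws) for a hypothetical 75% shooter
n, k, p = 10, 8, 0.75
p_binom = math.comb(n, k) * p**k * (1 - p)**(n - k)

# Poisson: P(exactly 2 turnovers) for a player averaging lambda = 2.5 per game
lam, x = 2.5, 2
p_pois = lam**x * math.exp(-lam) / math.factorial(x)

# Z-score: a 28 PPG season when the league average is 15 with standard deviation 6
z = (28 - 15) / 6

print(f"P(8 of 10 FT)  = {p_binom:.3f}")
print(f"P(2 turnovers) = {p_pois:.3f}")
print(f"z-score        = {z:.2f}")
```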
A.3 Statistical Inference Concepts
Statistical inference allows us to draw conclusions about populations based on sample data, essential for generalizing findings from observed games to future performance.
A.3.1 Point Estimation
Estimator Properties:
- Unbiasedness: $E[\hat{\theta}] = \theta$
- Consistency: $\hat{\theta} \xrightarrow{p} \theta$ as $n \to \infty$
- Efficiency: Minimum variance among unbiased estimators
Maximum Likelihood Estimation (MLE): Find $\hat{\theta}$ that maximizes: $$L(\theta) = \prod_{i=1}^{n} f(x_i | \theta)$$
Or equivalently, the log-likelihood: $$\ell(\theta) = \sum_{i=1}^{n} \log f(x_i | \theta)$$
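For example, for Poisson-distributed counts the MLE of $\lambda$ is the sample mean. A small sketch (with hypothetical turnover counts) confirms this by scanning the log-likelihood over a grid:

```python
import math

# Hypothetical turnover counts over ten games
counts = [2, 3, 1, 4, 2, 0, 3, 2, 1, 2]

def poisson_log_likelihood(lam, data):
    # l(lambda) = sum_i [ x_i * log(lambda) - lambda - log(x_i!) ]
    return sum(x * math.log(lam) - lam - math.lgamma(x + 1) for x in data)

# Evaluate the log-likelihood on a grid and pick the maximizer
grid = [l / 100 for l in range(50, 451)]            # lambda from 0.50 to 4.50
lam_hat = max(grid, key=lambda lam: poisson_log_likelihood(lam, counts))

print(lam_hat, sum(counts) / len(counts))           # both equal 2.0, the sample mean
```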
A.3.2 Confidence Intervals
A $(1-\alpha)$ confidence interval provides a range of plausible values for a parameter.
For the Mean (known $\sigma$): $$\bar{x} \pm z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$$
For the Mean (unknown $\sigma$): $$\bar{x} \pm t_{\alpha/2, n-1} \cdot \frac{s}{\sqrt{n}}$$
For Proportions: $$\hat{p} \pm z_{\alpha/2} \cdot \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
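A sketch computing a 95% t-interval for a player's true scoring average from a small sample of games (values hypothetical; assumes SciPy is available):

```python
import numpy as np
from scipy import stats

# Hypothetical points scored in 12 games
points = np.array([22, 31, 18, 27, 25, 19, 33, 24, 21, 28, 26, 23])

n = len(points)
mean = points.mean()
s = points.std(ddof=1)                      # sample standard deviation
t_crit = stats.t.ppf(0.975, df=n - 1)       # two-sided 95% critical value

half_width = t_crit * s / np.sqrt(n)
print(f"95% CI for mean PPG: ({mean - half_width:.1f}, {mean + half_width:.1f})")
```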
A.3.3 Hypothesis Testing
Framework:
1. State the null hypothesis $H_0$ and the alternative $H_1$
2. Choose a significance level $\alpha$ (commonly 0.05)
3. Calculate the test statistic
4. Determine the p-value or compare the statistic to a critical value
5. Make a decision: reject or fail to reject $H_0$
Test Statistic for Mean (unknown $\sigma$): $$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$$
P-value: Probability of observing a test statistic at least as extreme as the one calculated, assuming $H_0$ is true.
Type I Error ($\alpha$): Rejecting $H_0$ when it is true (false positive)

Type II Error ($\beta$): Failing to reject $H_0$ when it is false (false negative)

Power: $1 - \beta$, the probability of correctly rejecting a false $H_0$
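A one-sample t-test sketch following this framework, testing whether a hypothetical player's true scoring average differs from 20 PPG (assumes SciPy):

```python
import numpy as np
from scipy import stats

points = np.array([22, 31, 18, 27, 25, 19, 33, 24, 21, 28, 26, 23])  # hypothetical sample
mu_0 = 20.0                                                          # H0: true mean PPG = 20

t_stat = (points.mean() - mu_0) / (points.std(ddof=1) / np.sqrt(len(points)))
p_value = 2 * stats.t.sf(abs(t_stat), df=len(points) - 1)            # two-sided p-value

print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Reject H0 at the 5% level")
else:
    print("Fail to reject H0 at the 5% level")
```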
A.3.4 The Central Limit Theorem
For independent and identically distributed random variables $X_1, X_2, \ldots, X_n$ with mean $\mu$ and finite variance $\sigma^2$: $$\frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \xrightarrow{d} N(0, 1) \quad \text{as } n \to \infty$$
Practical Implication: The sampling distribution of the mean is approximately normal for large samples (typically $n \geq 30$), regardless of the underlying distribution.
A.4 Regression Analysis
Regression analysis is the workhorse of basketball analytics, used for everything from predicting points per game to estimating player value.
A.4.1 Ordinary Least Squares (OLS) Regression
Simple Linear Regression Model: $$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$
where $\varepsilon_i \sim N(0, \sigma^2)$ are independent errors.
OLS Estimators: Minimize the sum of squared residuals: $$\min_{\beta_0, \beta_1} \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_i)^2$$
Solution: $$\hat{\beta}_1 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n}(X_i - \bar{X})^2} = \frac{S_{XY}}{S_{XX}}$$
$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$$
Multiple Linear Regression: For $p$ predictors: $$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_p X_{ip} + \varepsilon_i$$
Matrix Form: $$\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$$
OLS Solution: $$\hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y}$$
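A sketch of this closed-form solution on hypothetical data (predicting points from minutes and usage rate), built directly from the normal equations. In practice, `np.linalg.lstsq` or a statistics package is numerically preferable to explicitly inverting $\mathbf{X}^T\mathbf{X}$.

```python
import numpy as np

# Hypothetical predictors: minutes per game and usage rate
minutes = np.array([36.1, 32.4, 28.9, 25.2, 20.7, 18.3])
usage   = np.array([0.31, 0.27, 0.22, 0.19, 0.17, 0.15])
points  = np.array([27.8, 22.1, 16.5, 13.2,  9.8,  7.4])   # response

# Design matrix with an intercept column of ones
X = np.column_stack([np.ones_like(minutes), minutes, usage])

# beta_hat = (X^T X)^{-1} X^T y
beta_hat = np.linalg.inv(X.T @ X) @ X.T @ points
y_hat = X @ beta_hat

print(np.round(beta_hat, 3))   # [intercept, coefficient on minutes, coefficient on usage]
```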
A.4.2 Model Evaluation Metrics
Coefficient of Determination ($R^2$): $$R^2 = 1 - \frac{SS_{res}}{SS_{tot}} = 1 - \frac{\sum_{i}(Y_i - \hat{Y}_i)^2}{\sum_{i}(Y_i - \bar{Y})^2}$$
Adjusted $R^2$: Penalizes for additional predictors: $$R^2_{adj} = 1 - \frac{(1 - R^2)(n - 1)}{n - p - 1}$$
Root Mean Squared Error (RMSE): $$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2}$$
Mean Absolute Error (MAE): $$MAE = \frac{1}{n}\sum_{i=1}^{n}|Y_i - \hat{Y}_i|$$
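These metrics are straightforward to compute directly; a small sketch assuming arrays of actual and predicted values (both hypothetical here):

```python
import numpy as np

def r_squared(y, y_hat):
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1 - ss_res / ss_tot

def rmse(y, y_hat):
    return np.sqrt(np.mean((y - y_hat) ** 2))

def mae(y, y_hat):
    return np.mean(np.abs(y - y_hat))

# Hypothetical actual vs. predicted points per game
y     = np.array([27.8, 22.1, 16.5, 13.2,  9.8, 7.4])
y_hat = np.array([26.9, 23.0, 17.1, 12.4, 10.5, 7.9])
print(r_squared(y, y_hat), rmse(y, y_hat), mae(y, y_hat))
```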
A.4.3 OLS Assumptions and Diagnostics
Gauss-Markov Assumptions:
1. Linearity: $E[Y|X] = \mathbf{X}\boldsymbol{\beta}$
2. Strict Exogeneity: $E[\varepsilon_i | \mathbf{X}] = 0$
3. No Multicollinearity: $\mathbf{X}$ has full column rank
4. Homoscedasticity: $\text{Var}(\varepsilon_i | \mathbf{X}) = \sigma^2$
5. No Autocorrelation: $\text{Cov}(\varepsilon_i, \varepsilon_j | \mathbf{X}) = 0$ for $i \neq j$
Variance Inflation Factor (VIF): Detects multicollinearity: $$VIF_j = \frac{1}{1 - R_j^2}$$
where $R_j^2$ is the $R^2$ from regressing $X_j$ on all other predictors. VIF > 10 suggests problematic multicollinearity.
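A sketch of the VIF computation: regress each predictor on all the others and convert the resulting $R_j^2$ (the predictor columns below are hypothetical and deliberately correlated):

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of a predictor matrix X (no intercept column)."""
    n, p = X.shape
    vifs = []
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        Z = np.column_stack([np.ones(n), others])        # add intercept
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)     # regress X_j on the remaining predictors
        y_hat = Z @ beta
        r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
        vifs.append(1 / (1 - r2))
    return np.array(vifs)

# Hypothetical predictors: minutes, field-goal attempts, usage rate
X = np.array([[36.1, 21.0, 0.31],
              [32.4, 18.2, 0.27],
              [28.9, 14.1, 0.22],
              [25.2, 11.8, 0.19],
              [20.7,  9.5, 0.17],
              [18.3,  7.9, 0.15]])
print(np.round(vif(X), 1))   # values well above 10 would flag problematic multicollinearity
```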
A.4.4 Ridge Regression
Ridge regression addresses multicollinearity by adding an L2 penalty to the OLS objective.
Objective Function: $$\min_{\boldsymbol{\beta}} \left\{ \sum_{i=1}^{n}(Y_i - \mathbf{x}_i^T\boldsymbol{\beta})^2 + \lambda \sum_{j=1}^{p}\beta_j^2 \right\}$$
Solution: $$\hat{\boldsymbol{\beta}}_{ridge} = (\mathbf{X}^T\mathbf{X} + \lambda\mathbf{I})^{-1}\mathbf{X}^T\mathbf{Y}$$
Properties:
- Shrinks coefficients toward zero
- Does not set coefficients exactly to zero
- $\lambda = 0$ reduces to OLS
- As $\lambda \to \infty$, $\hat{\boldsymbol{\beta}} \to \mathbf{0}$
Selecting $\lambda$: Use cross-validation to find the $\lambda$ that minimizes prediction error.
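A sketch of the ridge closed form on simulated, standardized predictors; the $\lambda$ values here are fixed by hand purely to show the shrinkage behavior, whereas in practice $\lambda$ would be chosen by cross-validation:

```python
import numpy as np

def ridge(X, y, lam):
    """Closed-form ridge solution on standardized predictors and a centered response."""
    p = X.shape[1]
    return np.linalg.inv(X.T @ X + lam * np.eye(p)) @ X.T @ y

# Simulated data with known coefficients (hypothetical example)
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))
beta_true = np.array([2.0, -1.0, 0.5])
y = X @ beta_true + rng.standard_normal(50)

for lam in [0.0, 1.0, 10.0, 100.0]:
    print(lam, np.round(ridge(X, y, lam), 3))   # coefficients shrink toward zero as lambda grows
```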
A.4.5 LASSO Regression
LASSO (Least Absolute Shrinkage and Selection Operator) uses an L1 penalty.
Objective Function: $$\min_{\boldsymbol{\beta}} \left\{ \sum_{i=1}^{n}(Y_i - \mathbf{x}_i^T\boldsymbol{\beta})^2 + \lambda \sum_{j=1}^{p}|\beta_j| \right\}$$
Properties:
- Can set coefficients exactly to zero (automatic feature selection)
- Useful when many predictors are irrelevant
- No closed-form solution; requires iterative algorithms
A.4.6 Elastic Net
Elastic Net combines L1 and L2 penalties:
$$\min_{\boldsymbol{\beta}} \left\{ \sum_{i=1}^{n}(Y_i - \mathbf{x}_i^T\boldsymbol{\beta})^2 + \lambda_1 \sum_{j=1}^{p}|\beta_j| + \lambda_2 \sum_{j=1}^{p}\beta_j^2 \right\}$$
Alternative Parameterization: $$\min_{\boldsymbol{\beta}} \left\{ \sum_{i=1}^{n}(Y_i - \mathbf{x}_i^T\boldsymbol{\beta})^2 + \lambda \left( \alpha \sum_{j=1}^{p}|\beta_j| + (1-\alpha) \sum_{j=1}^{p}\beta_j^2 \right) \right\}$$
where $\alpha \in [0, 1]$ controls the mix of L1 and L2 penalties.
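Because neither LASSO nor Elastic Net has a closed form, an iterative solver is used in practice. A brief sketch with scikit-learn (assuming it is installed): its `alpha` and `l1_ratio` arguments play roles analogous to $\lambda$ and $\alpha$ above, although scikit-learn's exact scaling of the objective differs slightly. The simulated data are hypothetical.

```python
import numpy as np
from sklearn.linear_model import Lasso, ElasticNet

# Simulated data: only the first two of ten predictors matter
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.standard_normal(200)

lasso = Lasso(alpha=0.1).fit(X, y)
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)

print(np.round(lasso.coef_, 2))   # many coefficients driven exactly to zero
print(np.round(enet.coef_, 2))
```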
A.4.7 Logistic Regression
Logistic regression models binary outcomes (e.g., win/loss, made/missed shot).
Model: $$P(Y = 1 | \mathbf{X}) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p)}} = \frac{e^{\mathbf{x}^T\boldsymbol{\beta}}}{1 + e^{\mathbf{x}^T\boldsymbol{\beta}}}$$
Logit (Log-Odds): $$\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p$$
Interpretation: A one-unit increase in $X_j$ multiplies the odds by $e^{\beta_j}$, holding the other predictors constant.
Estimation: Maximum Likelihood (no closed-form solution): $$L(\boldsymbol{\beta}) = \prod_{i=1}^{n} p_i^{y_i}(1-p_i)^{1-y_i}$$
$$\ell(\boldsymbol{\beta}) = \sum_{i=1}^{n} \left[ y_i \log(p_i) + (1-y_i)\log(1-p_i) \right]$$
Model Evaluation:
- Accuracy: Proportion of correct predictions
- Precision: $\frac{TP}{TP + FP}$
- Recall (Sensitivity): $\frac{TP}{TP + FN}$
- F1 Score: $2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$
- AUC-ROC: Area under the Receiver Operating Characteristic curve
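A sketch that fits a logistic regression by maximizing the log-likelihood with plain gradient ascent on simulated shot data; on real data one would typically use a library such as statsmodels or scikit-learn, and the coefficients below are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def fit_logistic(X, y, lr=0.5, n_iter=10000):
    """Gradient ascent on the average log-likelihood; X should include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ beta)
        beta += lr * X.T @ (y - p) / len(y)   # gradient of the average log-likelihood
    return beta

# Simulated shot data: shot distance and defender distance -> made (1) / missed (0)
rng = np.random.default_rng(2)
shot_dist = rng.uniform(1, 28, 500)
def_dist = rng.uniform(0, 10, 500)
made = rng.binomial(1, sigmoid(1.5 - 0.12 * shot_dist + 0.15 * def_dist))

# Standardize predictors so plain gradient ascent behaves well
Z = np.column_stack([(shot_dist - shot_dist.mean()) / shot_dist.std(),
                     (def_dist - def_dist.mean()) / def_dist.std()])
X = np.column_stack([np.ones(len(made)), Z])

beta_hat = fit_logistic(X, made)
p_hat = sigmoid(X @ beta_hat)
accuracy = np.mean((p_hat >= 0.5) == made)
print(np.round(beta_hat, 2), f"accuracy = {accuracy:.2f}")
```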
A.4.8 Multinomial and Ordinal Logistic Regression
Multinomial Logistic Regression: For categorical outcomes with $K > 2$ categories: $$\log\left(\frac{P(Y = k)}{P(Y = K)}\right) = \mathbf{x}^T\boldsymbol{\beta}_k, \quad k = 1, \ldots, K-1$$
Application: Predicting whether a possession results in a made shot, missed shot, or turnover.
Ordinal Logistic Regression: For ordered categorical outcomes: $$\log\left(\frac{P(Y \leq j)}{P(Y > j)}\right) = \alpha_j - \mathbf{x}^T\boldsymbol{\beta}$$
Application: Predicting performance ratings (Poor, Average, Good, Excellent).
A.5 Additional Mathematical Tools
A.5.1 Optimization
Gradient Descent: Iterative method to find minimum: $$\boldsymbol{\theta}_{t+1} = \boldsymbol{\theta}_t - \eta \nabla f(\boldsymbol{\theta}_t)$$
where $\eta$ is the learning rate.
Newton's Method: Second-order optimization: $$\boldsymbol{\theta}_{t+1} = \boldsymbol{\theta}_t - \mathbf{H}^{-1} \nabla f(\boldsymbol{\theta}_t)$$
where $\mathbf{H}$ is the Hessian matrix of second derivatives.
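A compact sketch of both update rules on a simple convex quadratic, for which Newton's method converges in a single step:

```python
import numpy as np

# f(theta) = 0.5 * theta^T A theta - b^T theta, a convex quadratic
A = np.array([[3.0, 0.5],
              [0.5, 1.0]])
b = np.array([1.0, 2.0])

grad = lambda theta: A @ theta - b       # gradient of f
hessian = A                              # Hessian of f (constant for a quadratic)

# Gradient descent
theta = np.zeros(2)
eta = 0.1                                # learning rate
for _ in range(200):
    theta = theta - eta * grad(theta)
print("Gradient descent:", np.round(theta, 4))

# Newton's method: one step reaches the minimizer A^{-1} b
theta_newton = np.zeros(2) - np.linalg.inv(hessian) @ grad(np.zeros(2))
print("Newton's method: ", np.round(theta_newton, 4))
```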
A.5.2 Information Criteria
Akaike Information Criterion (AIC): $$AIC = 2k - 2\ln(\hat{L})$$
where $k$ is the number of parameters and $\hat{L}$ is the maximized likelihood.
Bayesian Information Criterion (BIC): $$BIC = k\ln(n) - 2\ln(\hat{L})$$
BIC penalizes model complexity more heavily than AIC.
A.5.3 Correlation
Pearson Correlation Coefficient: $$r = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i - \bar{X})^2}\sqrt{\sum_{i=1}^{n}(Y_i - \bar{Y})^2}}$$
Spearman Rank Correlation: Pearson correlation applied to ranks.
Properties:
- $-1 \leq r \leq 1$
- $r = 1$: Perfect positive linear relationship
- $r = -1$: Perfect negative linear relationship
- $r = 0$: No linear relationship (not necessarily independence)
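A brief sketch computing both correlations between two hypothetical stat columns; Spearman is simply the Pearson correlation applied to the ranks:

```python
import numpy as np

# Hypothetical per-game minutes and points for eight players
minutes = np.array([36.1, 34.0, 31.2, 28.9, 25.2, 22.4, 20.7, 18.3])
points  = np.array([27.8, 20.5, 22.1, 16.5, 13.2, 14.0,  9.8,  7.4])

pearson = np.corrcoef(minutes, points)[0, 1]

# Spearman: rank each variable (no ties here), then take the Pearson correlation of the ranks
ranks_m = minutes.argsort().argsort()
ranks_p = points.argsort().argsort()
spearman = np.corrcoef(ranks_m, ranks_p)[0, 1]

print(f"Pearson r = {pearson:.3f}, Spearman rho = {spearman:.3f}")
```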
A.6 Quick Reference Formulas
Descriptive Statistics
| Measure | Formula |
|---|---|
| Sample Mean | $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ |
| Sample Variance | $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$ |
| Sample Standard Deviation | $s = \sqrt{s^2}$ |
| Coefficient of Variation | $CV = \frac{s}{\bar{x}} \times 100\%$ |
| Skewness | $\frac{n}{(n-1)(n-2)}\sum\left(\frac{x_i - \bar{x}}{s}\right)^3$ |
| Kurtosis | $\frac{n(n+1)}{(n-1)(n-2)(n-3)}\sum\left(\frac{x_i - \bar{x}}{s}\right)^4 - \frac{3(n-1)^2}{(n-2)(n-3)}$ |
Probability Distributions Summary
| Distribution | PMF/PDF | Mean | Variance |
|---|---|---|---|
| Bernoulli$(p)$ | $p^x(1-p)^{1-x}$ | $p$ | $p(1-p)$ |
| Binomial$(n,p)$ | $\binom{n}{k}p^k(1-p)^{n-k}$ | $np$ | $np(1-p)$ |
| Poisson$(\lambda)$ | $\frac{\lambda^k e^{-\lambda}}{k!}$ | $\lambda$ | $\lambda$ |
| Normal$(\mu,\sigma^2)$ | $\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}$ | $\mu$ | $\sigma^2$ |
| Exponential$(\lambda)$ | $\lambda e^{-\lambda x}$ | $\frac{1}{\lambda}$ | $\frac{1}{\lambda^2}$ |
| Uniform$(a,b)$ | $\frac{1}{b-a}$ | $\frac{a+b}{2}$ | $\frac{(b-a)^2}{12}$ |
A.7 References and Further Reading
- Casella, G., & Berger, R. L. (2002). Statistical Inference (2nd ed.). Duxbury Press.
- James, G., Witten, D., Hastie, T., & Tibshirani, R. (2021). An Introduction to Statistical Learning (2nd ed.). Springer.
- Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning (2nd ed.). Springer.
- Strang, G. (2016). Introduction to Linear Algebra (5th ed.). Wellesley-Cambridge Press.
- DeGroot, M. H., & Schervish, M. J. (2012). Probability and Statistics (4th ed.). Pearson.
This appendix provides the mathematical foundation necessary for understanding the analytical methods presented throughout this textbook. Students are encouraged to consult the referenced texts for more rigorous treatments of these topics.