Appendix B: Statistical Tables
This appendix provides reference tables for common statistical distributions used throughout this textbook. While modern practice relies on software to compute exact p-values and critical values, these tables remain valuable for building intuition, quick estimation, and exam settings where software is unavailable.
B.1 Standard Normal Distribution (Z-Table)
The standard normal distribution has mean 0 and variance 1. The table below gives the cumulative probability P(Z <= z) for the standard normal random variable Z.
Positive Z-Values: P(Z <= z)
| z | 0.00 | 0.01 | 0.02 | 0.03 | 0.04 | 0.05 | 0.06 | 0.07 | 0.08 | 0.09 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.0 | 0.5000 | 0.5040 | 0.5080 | 0.5120 | 0.5160 | 0.5199 | 0.5239 | 0.5279 | 0.5319 | 0.5359 |
| 0.1 | 0.5398 | 0.5438 | 0.5478 | 0.5517 | 0.5557 | 0.5596 | 0.5636 | 0.5675 | 0.5714 | 0.5753 |
| 0.2 | 0.5793 | 0.5832 | 0.5871 | 0.5910 | 0.5948 | 0.5987 | 0.6026 | 0.6064 | 0.6103 | 0.6141 |
| 0.3 | 0.6179 | 0.6217 | 0.6255 | 0.6293 | 0.6331 | 0.6368 | 0.6406 | 0.6443 | 0.6480 | 0.6517 |
| 0.4 | 0.6554 | 0.6591 | 0.6628 | 0.6664 | 0.6700 | 0.6736 | 0.6772 | 0.6808 | 0.6844 | 0.6879 |
| 0.5 | 0.6915 | 0.6950 | 0.6985 | 0.7019 | 0.7054 | 0.7088 | 0.7123 | 0.7157 | 0.7190 | 0.7224 |
| 0.6 | 0.7257 | 0.7291 | 0.7324 | 0.7357 | 0.7389 | 0.7422 | 0.7454 | 0.7486 | 0.7517 | 0.7549 |
| 0.7 | 0.7580 | 0.7611 | 0.7642 | 0.7673 | 0.7704 | 0.7734 | 0.7764 | 0.7794 | 0.7823 | 0.7852 |
| 0.8 | 0.7881 | 0.7910 | 0.7939 | 0.7967 | 0.7995 | 0.8023 | 0.8051 | 0.8078 | 0.8106 | 0.8133 |
| 0.9 | 0.8159 | 0.8186 | 0.8212 | 0.8238 | 0.8264 | 0.8289 | 0.8315 | 0.8340 | 0.8365 | 0.8389 |
| 1.0 | 0.8413 | 0.8438 | 0.8461 | 0.8485 | 0.8508 | 0.8531 | 0.8554 | 0.8577 | 0.8599 | 0.8621 |
| 1.1 | 0.8643 | 0.8665 | 0.8686 | 0.8708 | 0.8729 | 0.8749 | 0.8770 | 0.8790 | 0.8810 | 0.8830 |
| 1.2 | 0.8849 | 0.8869 | 0.8888 | 0.8907 | 0.8925 | 0.8944 | 0.8962 | 0.8980 | 0.8997 | 0.9015 |
| 1.3 | 0.9032 | 0.9049 | 0.9066 | 0.9082 | 0.9099 | 0.9115 | 0.9131 | 0.9147 | 0.9162 | 0.9177 |
| 1.4 | 0.9192 | 0.9207 | 0.9222 | 0.9236 | 0.9251 | 0.9265 | 0.9279 | 0.9292 | 0.9306 | 0.9319 |
| 1.5 | 0.9332 | 0.9345 | 0.9357 | 0.9370 | 0.9382 | 0.9394 | 0.9406 | 0.9418 | 0.9429 | 0.9441 |
| 1.6 | 0.9452 | 0.9463 | 0.9474 | 0.9484 | 0.9495 | 0.9505 | 0.9515 | 0.9525 | 0.9535 | 0.9545 |
| 1.7 | 0.9554 | 0.9564 | 0.9573 | 0.9582 | 0.9591 | 0.9599 | 0.9608 | 0.9616 | 0.9625 | 0.9633 |
| 1.8 | 0.9641 | 0.9649 | 0.9656 | 0.9664 | 0.9671 | 0.9678 | 0.9686 | 0.9693 | 0.9699 | 0.9706 |
| 1.9 | 0.9713 | 0.9719 | 0.9726 | 0.9732 | 0.9738 | 0.9744 | 0.9750 | 0.9756 | 0.9761 | 0.9767 |
| 2.0 | 0.9772 | 0.9778 | 0.9783 | 0.9788 | 0.9793 | 0.9798 | 0.9803 | 0.9808 | 0.9812 | 0.9817 |
| 2.1 | 0.9821 | 0.9826 | 0.9830 | 0.9834 | 0.9838 | 0.9842 | 0.9846 | 0.9850 | 0.9854 | 0.9857 |
| 2.2 | 0.9861 | 0.9864 | 0.9868 | 0.9871 | 0.9875 | 0.9878 | 0.9881 | 0.9884 | 0.9887 | 0.9890 |
| 2.3 | 0.9893 | 0.9896 | 0.9898 | 0.9901 | 0.9904 | 0.9906 | 0.9909 | 0.9911 | 0.9913 | 0.9916 |
| 2.4 | 0.9918 | 0.9920 | 0.9922 | 0.9925 | 0.9927 | 0.9929 | 0.9931 | 0.9932 | 0.9934 | 0.9936 |
| 2.5 | 0.9938 | 0.9940 | 0.9941 | 0.9943 | 0.9945 | 0.9946 | 0.9948 | 0.9949 | 0.9951 | 0.9952 |
| 2.6 | 0.9953 | 0.9955 | 0.9956 | 0.9957 | 0.9959 | 0.9960 | 0.9961 | 0.9962 | 0.9963 | 0.9964 |
| 2.7 | 0.9965 | 0.9966 | 0.9967 | 0.9968 | 0.9969 | 0.9970 | 0.9971 | 0.9972 | 0.9973 | 0.9974 |
| 2.8 | 0.9974 | 0.9975 | 0.9976 | 0.9977 | 0.9977 | 0.9978 | 0.9979 | 0.9979 | 0.9980 | 0.9981 |
| 2.9 | 0.9981 | 0.9982 | 0.9982 | 0.9983 | 0.9984 | 0.9984 | 0.9985 | 0.9985 | 0.9986 | 0.9986 |
| 3.0 | 0.9987 | 0.9987 | 0.9987 | 0.9988 | 0.9988 | 0.9989 | 0.9989 | 0.9989 | 0.9990 | 0.9990 |
Usage note: By symmetry of the standard normal distribution, P(Z <= -z) = 1 - P(Z <= z). For example, P(Z <= -1.96) = 1 - 0.9750 = 0.0250.
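The symmetry rule can be checked against software. The sketch below assumes scipy is installed; the helper names `table_lookup` and `lower_tail_by_symmetry` are illustrative and not part of any library.

```python
from scipy import stats


def table_lookup(z: float) -> float:
    """Return P(Z <= z) rounded to four decimals, like an entry in the table."""
    return round(float(stats.norm.cdf(z)), 4)


def lower_tail_by_symmetry(z: float) -> float:
    """Return P(Z <= -z) computed as 1 - P(Z <= z), as in the usage note."""
    return round(1.0 - table_lookup(z), 4)


# Row 1.9, column 0.06 of the table, then the mirrored lower-tail value.
print(table_lookup(1.96))
print(lower_tail_by_symmetry(1.96))
# Direct computation of the lower tail for comparison.
print(round(float(stats.norm.cdf(-1.96)), 4))
```

The last line computes the negative-z probability directly, so the table-plus-symmetry route is only needed when working by hand.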
Commonly Used Critical Values
| Confidence Level | alpha (two-tailed) | z* (critical value) |
|---|---|---|
| 90% | 0.10 | 1.645 |
| 95% | 0.05 | 1.960 |
| 99% | 0.01 | 2.576 |
| 99.5% | 0.005 | 2.807 |
| 99.9% | 0.001 | 3.291 |
Quick reference: Approximately 68% of values fall within 1 standard deviation of the mean, about 95% within 2 (the exact 95% cutoff is 1.96 standard deviations; 2 standard deviations cover roughly 95.4%), and 99.7% within 3. This is the well-known "68-95-99.7 rule."
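The critical values and the 68-95-99.7 rule can both be reproduced from the normal CDF and its inverse. This is a minimal sketch assuming scipy is available; `two_sided_z` and `coverage_within` are hypothetical helper names introduced here for illustration.

```python
from scipy import stats


def two_sided_z(confidence: float) -> float:
    """Critical value z* such that P(-z* <= Z <= z*) equals the confidence level."""
    alpha = 1.0 - confidence
    return float(stats.norm.ppf(1.0 - alpha / 2.0))


def coverage_within(k: float) -> float:
    """Probability that Z falls within k standard deviations of the mean."""
    return float(stats.norm.cdf(k) - stats.norm.cdf(-k))


# Reproduce the critical-value table.
for level in (0.90, 0.95, 0.99, 0.995, 0.999):
    print(f"{level:.1%}: z* = {two_sided_z(level):.3f}")

# Reproduce the 68-95-99.7 rule.
for k in (1, 2, 3):
    print(f"within {k} sd: {coverage_within(k):.4f}")
```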
B.2 Student's t-Distribution Critical Values
The t-distribution arises when estimating the mean of a normally distributed population with unknown variance using a small sample. The table gives the critical value t_{alpha, df} such that P(T > t_{alpha, df}) = alpha for a t-distributed random variable T with df degrees of freedom.
Upper-Tail Critical Values
| df | alpha=0.10 | alpha=0.05 | alpha=0.025 | alpha=0.01 | alpha=0.005 | alpha=0.001 |
|---|---|---|---|---|---|---|
| 1 | 3.078 | 6.314 | 12.706 | 31.821 | 63.657 | 318.309 |
| 2 | 1.886 | 2.920 | 4.303 | 6.965 | 9.925 | 22.327 |
| 3 | 1.638 | 2.353 | 3.182 | 4.541 | 5.841 | 10.215 |
| 4 | 1.533 | 2.132 | 2.776 | 3.747 | 4.604 | 7.173 |
| 5 | 1.476 | 2.015 | 2.571 | 3.365 | 4.032 | 5.893 |
| 6 | 1.440 | 1.943 | 2.447 | 3.143 | 3.707 | 5.208 |
| 7 | 1.415 | 1.895 | 2.365 | 2.998 | 3.499 | 4.785 |
| 8 | 1.397 | 1.860 | 2.306 | 2.896 | 3.355 | 4.501 |
| 9 | 1.383 | 1.833 | 2.262 | 2.821 | 3.250 | 4.297 |
| 10 | 1.372 | 1.812 | 2.228 | 2.764 | 3.169 | 4.144 |
| 11 | 1.363 | 1.796 | 2.201 | 2.718 | 3.106 | 4.025 |
| 12 | 1.356 | 1.782 | 2.179 | 2.681 | 3.055 | 3.930 |
| 13 | 1.350 | 1.771 | 2.160 | 2.650 | 3.012 | 3.852 |
| 14 | 1.345 | 1.761 | 2.145 | 2.624 | 2.977 | 3.787 |
| 15 | 1.341 | 1.753 | 2.131 | 2.602 | 2.947 | 3.733 |
| 20 | 1.325 | 1.725 | 2.086 | 2.528 | 2.845 | 3.552 |
| 25 | 1.316 | 1.708 | 2.060 | 2.485 | 2.787 | 3.450 |
| 30 | 1.310 | 1.697 | 2.042 | 2.457 | 2.750 | 3.385 |
| 40 | 1.303 | 1.684 | 2.021 | 2.423 | 2.704 | 3.307 |
| 50 | 1.299 | 1.676 | 2.009 | 2.403 | 2.678 | 3.261 |
| 60 | 1.296 | 1.671 | 2.000 | 2.390 | 2.660 | 3.232 |
| 80 | 1.292 | 1.664 | 1.990 | 2.374 | 2.639 | 3.195 |
| 100 | 1.290 | 1.660 | 1.984 | 2.364 | 2.626 | 3.174 |
| 120 | 1.289 | 1.658 | 1.980 | 2.358 | 2.617 | 3.160 |
| inf | 1.282 | 1.645 | 1.960 | 2.326 | 2.576 | 3.090 |
Usage note: For a two-tailed test at significance level alpha, use the column for alpha/2. For instance, for a 95% confidence interval (alpha = 0.05 two-tailed), look up alpha/2 = 0.025. With df = 10, the critical value is 2.228.
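To make the alpha/2 lookup concrete, the following sketch builds a two-sided confidence interval for a mean. It assumes scipy is installed and that the data come from an approximately normal population; the sample values and the function name `t_confidence_interval` are made up for illustration.

```python
import math

from scipy import stats


def t_confidence_interval(sample, confidence=0.95):
    """Return (lower, upper, t_crit) for a two-sided CI on the population mean."""
    n = len(sample)
    mean = sum(sample) / n
    # Sample standard deviation with the n - 1 denominator.
    s = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    alpha = 1.0 - confidence
    # Two-tailed interval: look up the upper-tail area alpha / 2.
    t_crit = float(stats.t.ppf(1.0 - alpha / 2.0, df=n - 1))
    half_width = t_crit * s / math.sqrt(n)
    return mean - half_width, mean + half_width, t_crit


# Hypothetical measurements: n = 11, so df = 10 as in the usage note.
sample = [4.9, 5.1, 5.0, 5.3, 4.8, 5.2, 5.0, 4.7, 5.1, 5.4, 4.9]
lower, upper, t_crit = t_confidence_interval(sample)
print(f"t critical value: {t_crit:.3f}")
print(f"95% CI for the mean: ({lower:.3f}, {upper:.3f})")
```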
Practical guidance: As the degrees of freedom increase, the t-distribution approaches the standard normal distribution. For df > 30, the difference is often negligible for practical purposes.
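One way to see this convergence is to track the gap between the t and z critical values as df grows. The sketch below assumes scipy; `critical_value_gap` is an illustrative helper, and whether a given gap counts as negligible depends on the application.

```python
from scipy import stats


def critical_value_gap(df: float, upper_tail: float = 0.025) -> float:
    """Difference between the t and standard normal upper-tail critical values."""
    t_crit = float(stats.t.ppf(1.0 - upper_tail, df=df))
    z_crit = float(stats.norm.ppf(1.0 - upper_tail))
    return t_crit - z_crit


for df in (5, 10, 30, 120, 1000):
    print(f"df = {df:>4}: t - z = {critical_value_gap(df):.4f}")
```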
B.3 Chi-Square Distribution Critical Values
The chi-square distribution with k degrees of freedom arises as the distribution of the sum of k independent squared standard normal random variables. It is used in goodness-of-fit tests, contingency table analysis, and confidence intervals for variance. The table gives chi^2_{alpha, df} such that P(X > chi^2_{alpha, df}) = alpha for a chi-square random variable X with df degrees of freedom.
Upper-Tail Critical Values
| df | alpha=0.995 | alpha=0.990 | alpha=0.975 | alpha=0.950 | alpha=0.900 | alpha=0.100 | alpha=0.050 | alpha=0.025 | alpha=0.010 | alpha=0.005 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.000 | 0.000 | 0.001 | 0.004 | 0.016 | 2.706 | 3.841 | 5.024 | 6.635 | 7.879 |
| 2 | 0.010 | 0.020 | 0.051 | 0.103 | 0.211 | 4.605 | 5.991 | 7.378 | 9.210 | 10.597 |
| 3 | 0.072 | 0.115 | 0.216 | 0.352 | 0.584 | 6.251 | 7.815 | 9.348 | 11.345 | 12.838 |
| 4 | 0.207 | 0.297 | 0.484 | 0.711 | 1.064 | 7.779 | 9.488 | 11.143 | 13.277 | 14.860 |
| 5 | 0.412 | 0.554 | 0.831 | 1.145 | 1.610 | 9.236 | 11.070 | 12.833 | 15.086 | 16.750 |
| 6 | 0.676 | 0.872 | 1.237 | 1.635 | 2.204 | 10.645 | 12.592 | 14.449 | 16.812 | 18.548 |
| 7 | 0.989 | 1.239 | 1.690 | 2.167 | 2.833 | 12.017 | 14.067 | 16.013 | 18.475 | 20.278 |
| 8 | 1.344 | 1.646 | 2.180 | 2.733 | 3.490 | 13.362 | 15.507 | 17.535 | 20.090 | 21.955 |
| 9 | 1.735 | 2.088 | 2.700 | 3.325 | 4.168 | 14.684 | 16.919 | 19.023 | 21.666 | 23.589 |
| 10 | 2.156 | 2.558 | 3.247 | 3.940 | 4.865 | 15.987 | 18.307 | 20.483 | 23.209 | 25.188 |
| 12 | 3.074 | 3.571 | 4.404 | 5.226 | 6.304 | 18.549 | 21.026 | 23.337 | 26.217 | 28.300 |
| 15 | 4.601 | 5.229 | 6.262 | 7.261 | 8.547 | 22.307 | 24.996 | 27.488 | 30.578 | 32.801 |
| 20 | 7.434 | 8.260 | 9.591 | 10.851 | 12.443 | 28.412 | 31.410 | 34.170 | 37.566 | 39.997 |
| 25 | 10.520 | 11.524 | 13.120 | 14.611 | 16.473 | 34.382 | 37.652 | 40.646 | 44.314 | 46.928 |
| 30 | 13.787 | 14.953 | 16.791 | 18.493 | 20.599 | 40.256 | 43.773 | 46.979 | 50.892 | 53.672 |
| 40 | 20.707 | 22.164 | 24.433 | 26.509 | 29.051 | 51.805 | 55.758 | 59.342 | 63.691 | 66.766 |
| 50 | 27.991 | 29.707 | 32.357 | 34.764 | 37.689 | 63.167 | 67.505 | 71.420 | 76.154 | 79.490 |
| 60 | 35.534 | 37.485 | 40.482 | 43.188 | 46.459 | 74.397 | 79.082 | 83.298 | 88.379 | 91.952 |
| 80 | 51.172 | 53.540 | 57.153 | 60.391 | 64.278 | 96.578 | 101.879 | 106.629 | 112.329 | 116.321 |
| 100 | 67.328 | 70.065 | 74.222 | 77.929 | 82.358 | 118.498 | 124.342 | 129.561 | 135.807 | 140.169 |
Usage note: The chi-square distribution is asymmetric and non-negative. The left columns (large alpha, e.g., 0.995) give left-tail values, while the right columns (small alpha, e.g., 0.005) give right-tail values. For a two-tailed confidence interval for variance at confidence level 1 - alpha, use chi^2_{alpha/2, df} and chi^2_{1-alpha/2, df}.
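As a worked illustration of the two-tailed case, the sketch below computes a confidence interval for a population variance using the standard result that (n - 1) * s^2 / sigma^2 follows a chi-square distribution with n - 1 degrees of freedom when the data are normal. It assumes scipy; the function name and the input numbers are hypothetical.

```python
from scipy import stats


def variance_confidence_interval(sample_variance, n, confidence=0.95):
    """Two-sided CI for the population variance, assuming normally distributed data."""
    df = n - 1
    alpha = 1.0 - confidence
    # isf(q) returns the value with upper-tail area q, matching the table's convention.
    upper_tail_crit = float(stats.chi2.isf(alpha / 2.0, df))        # chi^2_{alpha/2, df}
    lower_tail_crit = float(stats.chi2.isf(1.0 - alpha / 2.0, df))  # chi^2_{1-alpha/2, df}
    # The larger critical value gives the lower bound, and vice versa.
    return df * sample_variance / upper_tail_crit, df * sample_variance / lower_tail_crit


# Hypothetical example: s^2 = 4.0 from n = 11 observations (df = 10).
low, high = variance_confidence_interval(4.0, 11)
print(f"95% CI for the variance: ({low:.3f}, {high:.3f})")
```

With df = 10 the two critical values are the table entries 20.483 and 3.247, so the interval is not symmetric around the sample variance.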
B.4 Common Probability Distributions: Summary Reference
This section provides a consolidated reference table for the distributions most commonly encountered in machine learning and AI engineering.
Discrete Distributions
Bernoulli(p): Models a single binary trial.
- Support: {0, 1}
- PMF: P(X = k) = p^k * (1 - p)^{1-k} for k in {0, 1}
- Mean: p
- Variance: p(1 - p)
- ML usage: Binary classification outputs, dropout masks, stochastic neurons

Binomial(n, p): Models the number of successes in n independent Bernoulli trials.
- Support: {0, 1, ..., n}
- PMF: P(X = k) = C(n, k) * p^k * (1 - p)^{n-k}
- Mean: np
- Variance: np(1 - p)
- ML usage: Aggregating binary outcomes, bootstrap sampling analysis

Categorical(p_1, ..., p_K): Generalization of Bernoulli to K categories.
- Support: {1, 2, ..., K}
- PMF: P(X = k) = p_k
- ML usage: Multi-class classification, language model next-token prediction

Poisson(lambda): Models the count of events in a fixed interval.
- Support: {0, 1, 2, ...}
- PMF: P(X = k) = (lambda^k * e^{-lambda}) / k!
- Mean: lambda
- Variance: lambda
- ML usage: Count regression, event modeling, text word counts

Geometric(p): Number of trials until the first success.
- Support: {1, 2, 3, ...}
- PMF: P(X = k) = (1 - p)^{k-1} * p
- Mean: 1/p
- Variance: (1 - p) / p^2
- ML usage: Modeling waiting times, sequence lengths
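The closed-form means and variances above can be compared with what scipy reports. This is a small sanity-check sketch, assuming scipy is installed; the parameter values are arbitrary. Note that `stats.geom` uses the same "number of trials" convention (support starting at 1) as the Geometric entry above.

```python
import math

from scipy import stats

n, p, lam = 10, 0.3, 4.0  # arbitrary illustrative parameters

# (frozen scipy distribution, closed-form mean, closed-form variance)
discrete_checks = {
    "Bernoulli": (stats.bernoulli(p), p, p * (1 - p)),
    "Binomial": (stats.binom(n, p), n * p, n * p * (1 - p)),
    "Poisson": (stats.poisson(lam), lam, lam),
    "Geometric": (stats.geom(p), 1 / p, (1 - p) / p**2),
}

for name, (dist, mean, var) in discrete_checks.items():
    print(f"{name:<10} mean {dist.mean():.4f} vs {mean:.4f}   "
          f"var {dist.var():.4f} vs {var:.4f}")

# The Binomial PMF formula evaluated by hand at k = 3.
k = 3
binomial_pmf_by_hand = math.comb(n, k) * p**k * (1 - p) ** (n - k)
print(binomial_pmf_by_hand, stats.binom(n, p).pmf(k))
```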
Continuous Distributions
Uniform(a, b): All values in [a, b] are equally likely.
- PDF: f(x) = 1/(b - a) for a <= x <= b
- Mean: (a + b) / 2
- Variance: (b - a)^2 / 12
- ML usage: Random initialization, random search hyperparameter tuning

Gaussian (Normal) N(mu, sigma^2): The most important distribution in statistics.
- PDF: f(x) = (1 / sqrt(2 * pi * sigma^2)) * exp(-(x - mu)^2 / (2 * sigma^2))
- Mean: mu
- Variance: sigma^2
- ML usage: Weight initialization, noise modeling, variational inference, Gaussian processes

Exponential(lambda): Models time between events in a Poisson process.
- PDF: f(x) = lambda * exp(-lambda * x) for x >= 0
- Mean: 1/lambda
- Variance: 1/lambda^2
- ML usage: Survival analysis, learning rate schedules

Beta(alpha, beta): Distribution over probabilities (values in [0, 1]).
- PDF: f(x) = x^{alpha-1} * (1-x)^{beta-1} / B(alpha, beta)
- Mean: alpha / (alpha + beta)
- Variance: alpha * beta / ((alpha + beta)^2 * (alpha + beta + 1))
- ML usage: Bayesian prior for probabilities, Thompson sampling in bandits

Gamma(alpha, beta): Generalization of the exponential distribution, parameterized here by shape alpha and rate beta.
- PDF: f(x) = (beta^alpha / Gamma(alpha)) * x^{alpha-1} * exp(-beta * x) for x >= 0
- Mean: alpha / beta
- Variance: alpha / beta^2
- ML usage: Bayesian prior for precision parameters, modeling positive quantities

Student's t(nu): Heavy-tailed alternative to the Gaussian.
- PDF: f(x) = Gamma((nu+1)/2) / (sqrt(nu*pi) * Gamma(nu/2)) * (1 + x^2/nu)^{-(nu+1)/2}
- Mean: 0 (for nu > 1)
- Variance: nu / (nu - 2) (for nu > 2)
- ML usage: Robust regression, outlier modeling, small-sample inference

Dirichlet(alpha_1, ..., alpha_K): Multivariate generalization of the Beta distribution; a distribution over the probability simplex.
- Mean: E[x_k] = alpha_k / sum_j alpha_j
- ML usage: Prior for categorical distributions, topic modeling (LDA)
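The parameterizations above do not always match scipy's argument names, which is a common source of errors. The sketch below, assuming scipy and NumPy are installed, shows one mapping from the (a, b), (mu, sigma^2), lambda, and (alpha, beta) parameters used here to scipy's `loc`/`scale` arguments and checks the stated means and variances. All numeric values are arbitrary.

```python
import numpy as np
from scipy import stats

a, b = 2.0, 5.0          # Uniform endpoints
mu, sigma = 1.0, 2.0     # Gaussian mean and standard deviation
lam = 0.5                # Exponential rate
alpha, beta = 2.0, 3.0   # Beta shapes; also Gamma shape and rate
nu = 5.0                 # Student's t degrees of freedom

# (frozen scipy distribution, closed-form mean, closed-form variance)
continuous_checks = {
    "Uniform": (stats.uniform(loc=a, scale=b - a), (a + b) / 2, (b - a) ** 2 / 12),
    "Gaussian": (stats.norm(loc=mu, scale=sigma), mu, sigma**2),
    # scipy's scale is 1 / rate for the Exponential and Gamma families.
    "Exponential": (stats.expon(scale=1 / lam), 1 / lam, 1 / lam**2),
    "Beta": (stats.beta(alpha, beta), alpha / (alpha + beta),
             alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1))),
    "Gamma": (stats.gamma(a=alpha, scale=1 / beta), alpha / beta, alpha / beta**2),
    "Student t": (stats.t(df=nu), 0.0, nu / (nu - 2)),
}

for name, (dist, mean, var) in continuous_checks.items():
    print(f"{name:<12} mean {dist.mean():.4f} vs {mean:.4f}   "
          f"var {dist.var():.4f} vs {var:.4f}")

# Dirichlet mean: alpha_k divided by the sum of all concentration parameters.
concentration = np.array([1.0, 2.0, 3.0])
print(stats.dirichlet(concentration).mean(), concentration / concentration.sum())
```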
Distribution Relationships
Understanding how distributions relate to each other aids both theoretical understanding and practical modeling:
- Bernoulli is a special case of Binomial with n = 1.
- Binomial approaches Poisson when n is large and p is small (with lambda = np).
- Binomial approaches Gaussian when n is large (Central Limit Theorem).
- Exponential is a special case of Gamma with alpha = 1.
- Beta(1, 1) = Uniform(0, 1).
- Chi-square(k) = Gamma(k/2, 1/2).
- Student's t approaches Gaussian as degrees of freedom approach infinity.
- The sum of independent Gaussian random variables is Gaussian.
- The Dirichlet distribution with K = 2 reduces to a Beta distribution.
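Several of these relationships can be verified numerically. The following is a sketch under the assumption that scipy and NumPy are available; the grid points, parameter values, and the large df used to stand in for "infinity" are arbitrary choices.

```python
import numpy as np
from scipy import stats

ks = np.arange(0, 15)
xs = np.linspace(0.05, 0.95, 10)

# Bernoulli(p) has the same PMF as Binomial(1, p).
bernoulli_equals_binomial = np.allclose(
    stats.bernoulli(0.3).pmf([0, 1]), stats.binom(1, 0.3).pmf([0, 1]))

# Binomial(n, p) is close to Poisson(np) for large n and small p.
n, p = 1000, 0.003
binomial_poisson_gap = float(
    np.max(np.abs(stats.binom(n, p).pmf(ks) - stats.poisson(n * p).pmf(ks))))

# Exponential(lambda) has the same PDF as Gamma(1, lambda); scipy's scale is 1 / rate.
lam = 2.0
exponential_equals_gamma = np.allclose(
    stats.expon(scale=1 / lam).pdf(xs), stats.gamma(a=1, scale=1 / lam).pdf(xs))

# Beta(1, 1) has the same PDF as Uniform(0, 1).
beta_equals_uniform = np.allclose(stats.beta(1, 1).pdf(xs), stats.uniform().pdf(xs))

# Chi-square(k) matches Gamma(k/2, rate 1/2), i.e. scale = 2 in scipy.
k = 6
chi2_equals_gamma = np.allclose(
    stats.chi2(k).pdf(xs * 10), stats.gamma(a=k / 2, scale=2).pdf(xs * 10))

# Student's t with very large df is numerically close to the standard normal.
t_near_gaussian = np.allclose(stats.t(df=1e6).cdf(xs), stats.norm.cdf(xs), atol=1e-5)

# Dirichlet(a, b) evaluated at (x, 1 - x) matches the Beta(a, b) density at x.
x = 0.3
dirichlet_equals_beta = np.isclose(
    stats.dirichlet.pdf([x, 1 - x], [2.0, 3.0]), stats.beta(2.0, 3.0).pdf(x))

print(bernoulli_equals_binomial, exponential_equals_gamma, beta_equals_uniform,
      chi2_equals_gamma, t_near_gaussian, bool(dirichlet_equals_beta))
print(f"max |Binomial - Poisson| PMF gap: {binomial_poisson_gap:.2e}")
```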
B.5 Quantile Functions for Quick Reference
The quantile function (inverse CDF) returns the value x such that P(X <= x) = p. The following are commonly needed quantiles.
Standard Normal Quantiles
| Probability p | z (quantile) | Common Use |
|---|---|---|
| 0.001 | -3.090 | 99.8% CI lower bound |
| 0.005 | -2.576 | 99% CI lower bound |
| 0.010 | -2.326 | 98% CI lower bound |
| 0.025 | -1.960 | 95% CI lower bound |
| 0.050 | -1.645 | 90% CI lower bound |
| 0.100 | -1.282 | 80% CI lower bound |
| 0.250 | -0.674 | First quartile |
| 0.500 | 0.000 | Median |
| 0.750 | 0.674 | Third quartile |
| 0.900 | 1.282 | 80% CI upper bound |
| 0.950 | 1.645 | 90% CI upper bound |
| 0.975 | 1.960 | 95% CI upper bound |
| 0.990 | 2.326 | 98% CI upper bound |
| 0.995 | 2.576 | 99% CI upper bound |
| 0.999 | 3.090 | 99.8% CI upper bound |
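The quantile table can be regenerated with the inverse CDF, which scipy exposes as `ppf`. This is a minimal sketch assuming scipy is installed; it also illustrates that `ppf` undoes `cdf` and that the standard normal quantiles are symmetric about zero.

```python
from scipy import stats

probabilities = [0.001, 0.005, 0.010, 0.025, 0.050, 0.100, 0.250, 0.500,
                 0.750, 0.900, 0.950, 0.975, 0.990, 0.995, 0.999]

# Map each probability p to the quantile z with P(Z <= z) = p.
quantiles = {p: float(stats.norm.ppf(p)) for p in probabilities}

for p, z in quantiles.items():
    # Round-trip through the CDF to recover the original probability.
    recovered = float(stats.norm.cdf(z))
    print(f"p = {p:.3f}   z = {z:+.3f}   cdf(z) = {recovered:.3f}")
```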
B.6 Practical Notes for AI Engineering
While the tables in this appendix provide reference values rounded to three or four decimal places, in practice AI engineers almost always compute these values programmatically. Here are the key functions in Python:
from scipy import stats
# Standard normal
stats.norm.cdf(1.96) # P(Z <= 1.96) = 0.975
stats.norm.ppf(0.975) # Quantile: z = 1.96
stats.norm.sf(1.96) # Survival: P(Z > 1.96) = 0.025
# t-distribution
stats.t.ppf(0.975, df=10) # t critical value, df=10
stats.t.cdf(2.228, df=10) # CDF at 2.228 with df=10
# Chi-square
stats.chi2.ppf(0.95, df=5) # Chi-square critical value
stats.chi2.sf(11.07, df=5) # p-value for chi-square test
# General: any scipy distribution
dist = stats.gamma(a=2, scale=1/3) # Gamma(alpha=2, beta=3): scipy's scale is 1/beta
dist.mean() # Mean
dist.var() # Variance
dist.pdf(1.0) # PDF at x=1
dist.cdf(1.0) # CDF at x=1
dist.rvs(size=1000) # Random samples
When to use tables vs. code: Tables are useful for building intuition about distribution shapes and for quick mental estimates. In any production or research context, always use scipy.stats or equivalent libraries for exact computation. The tables here are provided so that readers can verify hand calculations and develop familiarity with the key distributions before relying entirely on software.