Appendix B: Statistical Tables

This appendix provides reference tables for common statistical distributions used throughout this textbook. While modern practice relies on software to compute exact p-values and critical values, these tables remain valuable for building intuition, quick estimation, and exam settings where software is unavailable.


B.1 Standard Normal Distribution (Z-Table)

The standard normal distribution has mean 0 and variance 1. The table below gives the cumulative probability P(Z <= z) for the standard normal random variable Z.

Positive Z-Values: P(Z <= z)

z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359
0.1 0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753
0.2 0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141
0.3 0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517
0.4 0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879
0.5 0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224
0.6 0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549
0.7 0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852
0.8 0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133
0.9 0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389
1.0 0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621
1.1 0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830
1.2 0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015
1.3 0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177
1.4 0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319
1.5 0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441
1.6 0.9452 0.9463 0.9474 0.9484 0.9495 0.9505 0.9515 0.9525 0.9535 0.9545
1.7 0.9554 0.9564 0.9573 0.9582 0.9591 0.9599 0.9608 0.9616 0.9625 0.9633
1.8 0.9641 0.9649 0.9656 0.9664 0.9671 0.9678 0.9686 0.9693 0.9699 0.9706
1.9 0.9713 0.9719 0.9726 0.9732 0.9738 0.9744 0.9750 0.9756 0.9761 0.9767
2.0 0.9772 0.9778 0.9783 0.9788 0.9793 0.9798 0.9803 0.9808 0.9812 0.9817
2.1 0.9821 0.9826 0.9830 0.9834 0.9838 0.9842 0.9846 0.9850 0.9854 0.9857
2.2 0.9861 0.9864 0.9868 0.9871 0.9875 0.9878 0.9881 0.9884 0.9887 0.9890
2.3 0.9893 0.9896 0.9898 0.9901 0.9904 0.9906 0.9909 0.9911 0.9913 0.9916
2.4 0.9918 0.9920 0.9922 0.9925 0.9927 0.9929 0.9931 0.9932 0.9934 0.9936
2.5 0.9938 0.9940 0.9941 0.9943 0.9945 0.9946 0.9948 0.9949 0.9951 0.9952
2.6 0.9953 0.9955 0.9956 0.9957 0.9959 0.9960 0.9961 0.9962 0.9963 0.9964
2.7 0.9965 0.9966 0.9967 0.9968 0.9969 0.9970 0.9971 0.9972 0.9973 0.9974
2.8 0.9974 0.9975 0.9976 0.9977 0.9977 0.9978 0.9979 0.9979 0.9980 0.9981
2.9 0.9981 0.9982 0.9982 0.9983 0.9984 0.9984 0.9985 0.9985 0.9986 0.9986
3.0 0.9987 0.9987 0.9987 0.9988 0.9988 0.9989 0.9989 0.9989 0.9990 0.9990

Usage note: By symmetry of the standard normal distribution, P(Z <= -z) = 1 - P(Z <= z). For example, P(Z <= -1.96) = 1 - 0.9750 = 0.0250.
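The symmetry rule can be checked numerically. The following is a minimal sketch that uses `statistics.NormalDist` from the Python standard library; the helper name `lower_tail` is an illustrative assumption, not something defined elsewhere in this appendix.

```python
from statistics import NormalDist

# Standard normal: mean 0, standard deviation 1.
Z = NormalDist(mu=0.0, sigma=1.0)

def lower_tail(z: float) -> float:
    """Return P(Z <= z), the quantity tabulated above."""
    return Z.cdf(z)

p_pos = lower_tail(1.96)    # table row 1.9, column 0.06
p_neg = lower_tail(-1.96)   # not tabulated; the table covers z >= 0 only

print(f"P(Z <= 1.96)     = {p_pos:.4f}")
print(f"P(Z <= -1.96)    = {p_neg:.4f}")
print(f"1 - P(Z <= 1.96) = {1 - p_pos:.4f}")
```

The last two printed values should agree, which is the symmetry identity stated in the usage note.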

Commonly Used Critical Values

Confidence Level    alpha (two-tailed)    z* (critical value)
90%                 0.10                  1.645
95%                 0.05                  1.960
99%                 0.01                  2.576
99.5%               0.005                 2.807
99.9%               0.001                 3.291
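Each z* in this table is the standard normal quantile at probability 1 - alpha/2. A short sketch of that relationship, again assuming the standard library's `NormalDist` (the function name `z_critical` is hypothetical):

```python
from statistics import NormalDist

def z_critical(confidence: float) -> float:
    """Two-tailed critical value z* for a given confidence level."""
    alpha = 1.0 - confidence
    # Put alpha/2 in each tail, then read off the upper quantile.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

for level in (0.90, 0.95, 0.99, 0.995, 0.999):
    print(f"{level:.1%}  z* = {z_critical(level):.3f}")
```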

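Quick reference: For a normal distribution, approximately 68% of values fall within 1 standard deviation of the mean, 95% within 2, and 99.7% within 3. This is the well-known "68-95-99.7 rule." The 2 is a rounded figure: exactly 95% of values fall within 1.96 standard deviations, while 2 standard deviations cover about 95.45%.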
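The coverage figures behind the rule come from differences of the CDF, P(-k <= Z <= k) = P(Z <= k) - P(Z <= -k). A minimal sketch, with the helper name `within` chosen only for illustration:

```python
from statistics import NormalDist

Z = NormalDist()

def within(k: float) -> float:
    """Return P(-k <= Z <= k) for the standard normal."""
    return Z.cdf(k) - Z.cdf(-k)

# Coverage at 1, 1.96, 2, and 3 standard deviations.
coverage = {k: within(k) for k in (1.0, 1.96, 2.0, 3.0)}
for k, p in coverage.items():
    print(f"within {k} sd: {p:.4f}")
```

Comparing the entries for 1.96 and 2.0 shows how much rounding the rule of thumb involves.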


B.2 Student's t-Distribution Critical Values

The t-distribution arises when estimating the mean of a normally distributed population with unknown variance using a small sample. The table gives the critical value t_{alpha, df} such that P(T > t_{alpha, df}) = alpha for a t-distributed random variable T with df degrees of freedom.

Upper-Tail Critical Values

df alpha=0.10 alpha=0.05 alpha=0.025 alpha=0.01 alpha=0.005 alpha=0.001
1 3.078 6.314 12.706 31.821 63.657 318.309
2 1.886 2.920 4.303 6.965 9.925 22.327
3 1.638 2.353 3.182 4.541 5.841 10.215
4 1.533 2.132 2.776 3.747 4.604 7.173
5 1.476 2.015 2.571 3.365 4.032 5.893
6 1.440 1.943 2.447 3.143 3.707 5.208
7 1.415 1.895 2.365 2.998 3.499 4.785
8 1.397 1.860 2.306 2.896 3.355 4.501
9 1.383 1.833 2.262 2.821 3.250 4.297
10 1.372 1.812 2.228 2.764 3.169 4.144
11 1.363 1.796 2.201 2.718 3.106 4.025
12 1.356 1.782 2.179 2.681 3.055 3.930
13 1.350 1.771 2.160 2.650 3.012 3.852
14 1.345 1.761 2.145 2.624 2.977 3.787
15 1.341 1.753 2.131 2.602 2.947 3.733
20 1.325 1.725 2.086 2.528 2.845 3.552
25 1.316 1.708 2.060 2.485 2.787 3.450
30 1.310 1.697 2.042 2.457 2.750 3.385
40 1.303 1.684 2.021 2.423 2.704 3.307
50 1.299 1.676 2.009 2.403 2.678 3.261
60 1.296 1.671 2.000 2.390 2.660 3.232
80 1.292 1.664 1.990 2.374 2.639 3.195
100 1.290 1.660 1.984 2.364 2.626 3.174
120 1.289 1.658 1.980 2.358 2.617 3.160
inf 1.282 1.645 1.960 2.326 2.576 3.090

Usage note: For a two-tailed test at significance level alpha, use the column for alpha/2. For instance, for a 95% confidence interval (alpha = 0.05 two-tailed), look up alpha/2 = 0.025. With df = 10, the critical value is 2.228.
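As a worked illustration of the lookup, the sketch below builds a 95% confidence interval for a mean from a sample of n = 11 observations (so df = 10), using the standard formula mean +/- t * s / sqrt(n). The data values are hypothetical and chosen only to make the arithmetic easy to follow; the critical value 2.228 is copied from the table rather than computed.

```python
import math
import statistics

# Hypothetical sample of n = 11 measurements (illustrative data only).
sample = [4.8, 5.1, 5.0, 4.9, 5.3, 5.2, 4.7, 5.0, 5.1, 4.9, 5.0]

n = len(sample)
df = n - 1                      # 10 degrees of freedom
t_crit = 2.228                  # table entry: df = 10, column alpha = 0.025

mean = statistics.mean(sample)
s = statistics.stdev(sample)    # sample standard deviation (n - 1 denominator)
margin = t_crit * s / math.sqrt(n)

ci = (mean - margin, mean + margin)
print(f"mean = {mean:.3f}, s = {s:.3f}, margin = {margin:.3f}")
print(f"95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

Using the z value 1.960 in place of 2.228 would give a narrower interval that understates the uncertainty at this sample size.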

Practical guidance: As the degrees of freedom increase, the t-distribution approaches the standard normal distribution. For df > 30, the difference is often negligible for practical purposes.
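To put a number on "negligible", the sketch below compares a few alpha = 0.025 entries from the table with the limiting normal value 1.960. The values are copied from the table above, not recomputed, and the variable names are illustrative.

```python
# Upper-tail critical values t_{0.025, df} copied from the table above.
t_025 = {1: 12.706, 5: 2.571, 10: 2.228, 30: 2.042, 60: 2.000, 120: 1.980}
z_025 = 1.960  # the df = inf row, i.e. the standard normal value

# Relative amount by which a t-based interval is wider than a z-based one.
excess = {df: t / z_025 - 1.0 for df, t in t_025.items()}

for df, e in excess.items():
    print(f"df = {df:>3}: t = {t_025[df]:.3f}, {e:.1%} wider than z")
```

Whether a gap of a few percent at df = 30 is acceptable depends on the application, which is why the guidance is phrased as "often" rather than "always".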


B.3 Chi-Square Distribution Critical Values

The chi-square distribution with k degrees of freedom arises as the distribution of the sum of k independent squared standard normal random variables. It is used in goodness-of-fit tests, contingency table analysis, and confidence intervals for variance. The table gives chi^2_{alpha, df} such that P(X > chi^2_{alpha, df}) = alpha.
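The sum-of-squares definition can be illustrated by simulation. The sketch below is a Monte Carlo check, not an exact computation: it draws sums of k = 5 squared standard normals and compares the sample mean and variance with the known chi-square values k and 2k, and the observed upper-tail frequency with the table entry for df = 5, alpha = 0.05. The seed, sample size, and function name are arbitrary choices for illustration.

```python
import random

def chi_square_draw(k: int, rng: random.Random) -> float:
    """One draw: the sum of k independent squared standard normals."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))

rng = random.Random(0)          # fixed seed so the run is repeatable
k, n_draws = 5, 20_000
draws = [chi_square_draw(k, rng) for _ in range(n_draws)]

sample_mean = sum(draws) / n_draws
sample_var = sum((x - sample_mean) ** 2 for x in draws) / (n_draws - 1)

# Fraction of draws beyond the tabulated critical value for df = 5, alpha = 0.05.
tail_frac = sum(x > 11.070 for x in draws) / n_draws

print(f"mean     ~ {sample_mean:.2f}  (theory: k = {k})")
print(f"variance ~ {sample_var:.2f}  (theory: 2k = {2 * k})")
print(f"P(X > 11.070) ~ {tail_frac:.3f}  (table: 0.050)")
```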

Upper-Tail Critical Values

df alpha=0.995 alpha=0.990 alpha=0.975 alpha=0.950 alpha=0.900 alpha=0.100 alpha=0.050 alpha=0.025 alpha=0.010 alpha=0.005
1 0.000 0.000 0.001 0.004 0.016 2.706 3.841 5.024 6.635 7.879
2 0.010 0.020 0.051 0.103 0.211 4.605 5.991 7.378 9.210 10.597
3 0.072 0.115 0.216 0.352 0.584 6.251 7.815 9.348 11.345 12.838
4 0.207 0.297 0.484 0.711 1.064 7.779 9.488 11.143 13.277 14.860
5 0.412 0.554 0.831 1.145 1.610 9.236 11.070 12.833 15.086 16.750
6 0.676 0.872 1.237 1.635 2.204 10.645 12.592 14.449 16.812 18.548
7 0.989 1.239 1.690 2.167 2.833 12.017 14.067 16.013 18.475 20.278
8 1.344 1.646 2.180 2.733 3.490 13.362 15.507 17.535 20.090 21.955
9 1.735 2.088 2.700 3.325 4.168 14.684 16.919 19.023 21.666 23.589
10 2.156 2.558 3.247 3.940 4.865 15.987 18.307 20.483 23.209 25.188
12 3.074 3.571 4.404 5.226 6.304 18.549 21.026 23.337 26.217 28.300
15 4.601 5.229 6.262 7.261 8.547 22.307 24.996 27.488 30.578 32.801
20 7.434 8.260 9.591 10.851 12.443 28.412 31.410 34.170 37.566 39.997
25 10.520 11.524 13.120 14.611 16.473 34.382 37.652 40.646 44.314 46.928
30 13.787 14.953 16.791 18.493 20.599 40.256 43.773 46.979 50.892 53.672
40 20.707 22.164 24.433 26.509 29.051 51.805 55.758 59.342 63.691 66.766
50 27.991 29.707 32.357 34.764 37.689 63.167 67.505 71.420 76.154 79.490
60 35.534 37.485 40.482 43.188 46.459 74.397 79.082 83.298 88.379 91.952
80 51.172 53.540 57.153 60.391 64.278 96.578 101.879 106.629 112.329 116.321
100 67.328 70.065 74.222 77.929 82.358 118.498 124.342 129.561 135.807 140.169

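Usage note: The chi-square distribution is right-skewed and takes only non-negative values, so its lower and upper critical values must be looked up separately. The left columns (large alpha, e.g., 0.995) give lower-tail critical values, while the right columns (small alpha, e.g., 0.005) give upper-tail critical values. For a two-sided confidence interval for the variance of a normal population at confidence level 1 - alpha, use chi^2_{alpha/2, df} and chi^2_{1-alpha/2, df} with df = n - 1: the interval is ((n - 1) s^2 / chi^2_{alpha/2, df}, (n - 1) s^2 / chi^2_{1-alpha/2, df}), where s^2 is the sample variance.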
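A worked example of the variance interval, assuming a normally distributed population and the standard interval ((n - 1) s^2 / chi^2_{alpha/2, df}, (n - 1) s^2 / chi^2_{1-alpha/2, df}) with df = n - 1. The sample size and sample variance below are hypothetical; the critical values are read from the df = 10 row of the table.

```python
# Hypothetical inputs: n = 11 observations with sample variance s^2 = 0.03.
n = 11
s2 = 0.03
df = n - 1

# Table entries for df = 10 and a 95% interval (alpha = 0.05).
chi2_upper = 20.483   # chi^2_{alpha/2, df}   = chi^2_{0.025, 10}
chi2_lower = 3.247    # chi^2_{1-alpha/2, df} = chi^2_{0.975, 10}

# Dividing by the larger critical value gives the lower endpoint.
var_lo = df * s2 / chi2_upper
var_hi = df * s2 / chi2_lower

print(f"95% CI for the variance: ({var_lo:.4f}, {var_hi:.4f})")
```

The interval is not symmetric around s^2, which reflects the skew of the chi-square distribution.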


B.4 Common Probability Distributions: Summary Reference

This section provides a consolidated reference table for the distributions most commonly encountered in machine learning and AI engineering.

Discrete Distributions

Bernoulli(p): Models a single binary trial.
  • Support: {0, 1}
  • PMF: P(X = k) = p^k * (1 - p)^{1-k} for k in {0, 1}
  • Mean: p
  • Variance: p(1 - p)
  • ML usage: Binary classification outputs, dropout masks, stochastic neurons

Binomial(n, p): Models the number of successes in n independent Bernoulli trials.
  • Support: {0, 1, ..., n}
  • PMF: P(X = k) = C(n, k) * p^k * (1 - p)^{n-k}
  • Mean: np
  • Variance: np(1 - p)
  • ML usage: Aggregating binary outcomes, bootstrap sampling analysis

Categorical(p_1, ..., p_K): Generalization of Bernoulli to K categories.
  • Support: {1, 2, ..., K}
  • PMF: P(X = k) = p_k
  • ML usage: Multi-class classification, language model next-token prediction

Poisson(lambda): Models the count of events in a fixed interval.
  • Support: {0, 1, 2, ...}
  • PMF: P(X = k) = (lambda^k * e^{-lambda}) / k!
  • Mean: lambda
  • Variance: lambda
  • ML usage: Count regression, event modeling, text word counts

Geometric(p): Number of trials until the first success.
  • Support: {1, 2, 3, ...}
  • PMF: P(X = k) = (1 - p)^{k-1} * p
  • Mean: 1/p
  • Variance: (1 - p) / p^2
  • ML usage: Modeling waiting times, sequence lengths
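The summary formulas can be checked directly against a PMF. As one example, the sketch below evaluates the Binomial PMF given above for n = 20 and p = 0.3 (arbitrary illustrative values) and confirms numerically that it sums to 1 with mean np and variance np(1 - p).

```python
import math

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) = C(n, k) * p^k * (1 - p)^(n - k)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 20, 0.3
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

total = sum(pmf)
mean = sum(k * pk for k, pk in enumerate(pmf))
var = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))

print(f"sum of PMF = {total:.6f}")
print(f"mean       = {mean:.4f}  (np = {n * p:.4f})")
print(f"variance   = {var:.4f}  (np(1 - p) = {n * p * (1 - p):.4f})")
```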

Continuous Distributions

Uniform(a, b): All values in [a, b] are equally likely.
  • PDF: f(x) = 1/(b - a) for a <= x <= b
  • Mean: (a + b) / 2
  • Variance: (b - a)^2 / 12
  • ML usage: Random initialization, random search hyperparameter tuning

Gaussian (Normal) N(mu, sigma^2): The most important distribution in statistics.
  • PDF: f(x) = (1 / sqrt(2 * pi * sigma^2)) * exp(-(x - mu)^2 / (2 * sigma^2))
  • Mean: mu
  • Variance: sigma^2
  • ML usage: Weight initialization, noise modeling, variational inference, Gaussian processes

Exponential(lambda): Models time between events in a Poisson process.
  • PDF: f(x) = lambda * exp(-lambda * x) for x >= 0
  • Mean: 1/lambda
  • Variance: 1/lambda^2
  • ML usage: Survival analysis, learning rate schedules

Beta(alpha, beta): Distribution over probabilities (values in [0, 1]).
  • PDF: f(x) = x^{alpha-1} * (1-x)^{beta-1} / B(alpha, beta)
  • Mean: alpha / (alpha + beta)
  • Variance: alpha * beta / ((alpha + beta)^2 * (alpha + beta + 1))
  • ML usage: Bayesian prior for probabilities, Thompson sampling in bandits

Gamma(alpha, beta): Generalization of the exponential distribution, with shape alpha and rate beta.
  • PDF: f(x) = (beta^alpha / Gamma(alpha)) * x^{alpha-1} * exp(-beta * x) for x >= 0
  • Mean: alpha / beta
  • Variance: alpha / beta^2
  • ML usage: Bayesian prior for precision parameters, modeling positive quantities

Student's t(nu): Heavy-tailed alternative to the Gaussian.
  • PDF: f(x) = Gamma((nu+1)/2) / (sqrt(nu*pi) * Gamma(nu/2)) * (1 + x^2/nu)^{-(nu+1)/2}
  • Mean: 0 (for nu > 1)
  • Variance: nu / (nu - 2) (for nu > 2)
  • ML usage: Robust regression, outlier modeling, small-sample inference

Dirichlet(alpha_1, ..., alpha_K): Multivariate generalization of the Beta distribution; a distribution over the probability simplex (vectors of K non-negative entries that sum to 1).
  • Mean: E[x_k] = alpha_k / sum_j alpha_j
  • ML usage: Prior for categorical distributions, topic modeling (LDA)
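For continuous distributions, the same kind of check works with numerical integration. The sketch below integrates the Beta PDF given above with a simple midpoint rule for alpha = 2, beta = 5 (arbitrary illustrative values) and compares the results with the closed-form mean and variance. The helper names and step count are assumptions made for this example, and the midpoint rule is chosen for simplicity rather than accuracy.

```python
import math

def beta_pdf(x: float, a: float, b: float) -> float:
    """f(x) = x^(a-1) * (1-x)^(b-1) / B(a, b), with B from the gamma function."""
    norm = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
    return x ** (a - 1) * (1 - x) ** (b - 1) / norm

def integrate(f, lo: float, hi: float, steps: int = 20_000) -> float:
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))

a, b = 2.0, 5.0
area = integrate(lambda x: beta_pdf(x, a, b), 0.0, 1.0)
mean = integrate(lambda x: x * beta_pdf(x, a, b), 0.0, 1.0)
second_moment = integrate(lambda x: x * x * beta_pdf(x, a, b), 0.0, 1.0)
var = second_moment - mean**2

print(f"area     = {area:.6f}")
print(f"mean     = {mean:.6f}  (formula: {a / (a + b):.6f})")
print(f"variance = {var:.6f}  (formula: {a * b / ((a + b) ** 2 * (a + b + 1)):.6f})")
```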

Distribution Relationships

Understanding how distributions relate to each other aids both theoretical understanding and practical modeling:

  • Bernoulli is a special case of Binomial with n = 1.
  • Binomial approaches Poisson when n is large and p is small (with lambda = np).
  • Binomial approaches Gaussian when n is large (Central Limit Theorem).
  • Exponential is a special case of Gamma with alpha = 1.
  • Beta(1, 1) = Uniform(0, 1).
  • Chi-square(k) = Gamma(k/2, 1/2).
  • Student's t approaches Gaussian as degrees of freedom approach infinity.
  • The sum of independent Gaussian random variables is Gaussian.
  • The Dirichlet distribution with K = 2 reduces to a Beta distribution.
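Two of these relationships are easy to confirm numerically from the formulas in this section. The sketch below compares Binomial(1000, 0.003) with Poisson(3) and checks that the Gamma PDF with alpha = 1 matches the Exponential PDF. The parameter values and evaluation points are arbitrary illustrative choices.

```python
import math

def binomial_pmf(k: int, n: int, p: float) -> float:
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k: int, lam: float) -> float:
    return lam**k * math.exp(-lam) / math.factorial(k)

def gamma_pdf(x: float, alpha: float, beta: float) -> float:
    """Gamma PDF in the shape/rate form used in this section."""
    return beta**alpha / math.gamma(alpha) * x ** (alpha - 1) * math.exp(-beta * x)

def exponential_pdf(x: float, lam: float) -> float:
    return lam * math.exp(-lam * x)

# Binomial with large n and small p versus Poisson with lambda = n * p.
n, p = 1000, 0.003
lam = n * p
max_gap = max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(21))

# Gamma with alpha = 1 versus Exponential with the same rate.
rate = 2.0
exp_gap = max(
    abs(gamma_pdf(x, 1.0, rate) - exponential_pdf(x, rate))
    for x in (0.1, 0.5, 1.0, 3.0)
)

print(f"largest Binomial-Poisson PMF gap: {max_gap:.2e}")
print(f"largest Gamma(1, rate)-Exponential PDF gap: {exp_gap:.2e}")
```

The first gap is small but nonzero because the Poisson limit is an approximation; the second is zero up to floating-point rounding because the two densities are identical.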

B.5 Quantile Functions for Quick Reference

The quantile function (inverse CDF) returns the value x such that P(X <= x) = p. The following are commonly needed quantiles.

Standard Normal Quantiles

Probability p    z (quantile)    Common Use
0.001            -3.090          99.8% CI lower bound
0.005            -2.576          99% CI lower bound
0.010            -2.326          98% CI lower bound
0.025            -1.960          95% CI lower bound
0.050            -1.645          90% CI lower bound
0.100            -1.282          80% CI lower bound
0.250            -0.674          First quartile
0.500             0.000          Median
0.750             0.674          Third quartile
0.900             1.282          80% CI upper bound
0.950             1.645          90% CI upper bound
0.975             1.960          95% CI upper bound
0.990             2.326          98% CI upper bound
0.995             2.576          99% CI upper bound
0.999             3.090          99.8% CI upper bound
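The pairing of lower and upper quantiles into a central interval can be sketched as follows: a central interval with coverage c runs from the quantile at (1 - c)/2 to the quantile at (1 + c)/2. This example uses the standard library's `NormalDist.inv_cdf`; the function name `central_interval` is hypothetical.

```python
from statistics import NormalDist

Z = NormalDist()

def central_interval(confidence: float) -> tuple[float, float]:
    """Return the (lower, upper) standard normal quantiles of a central interval."""
    p_lo = (1.0 - confidence) / 2.0
    p_hi = 1.0 - p_lo
    return Z.inv_cdf(p_lo), Z.inv_cdf(p_hi)

lo95, hi95 = central_interval(0.95)   # quantiles at p = 0.025 and p = 0.975
q1, q3 = central_interval(0.50)       # first and third quartiles

# The quantile function inverts the CDF, so this round trip returns p.
round_trip = Z.cdf(Z.inv_cdf(0.975))

print(f"95% interval: ({lo95:.3f}, {hi95:.3f})")
print(f"quartiles:    ({q1:.3f}, {q3:.3f})")
print(f"cdf(inv_cdf(0.975)) = {round_trip:.6f}")
```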

B.6 Practical Notes for AI Engineering

While the tables in this appendix provide reference values rounded to three or four decimal places, in practice AI engineers almost always compute these values programmatically. Here are the key functions in Python's scipy.stats module:

from scipy import stats

# Standard normal
stats.norm.cdf(1.96)           # P(Z <= 1.96) = 0.975
stats.norm.ppf(0.975)          # Quantile: z = 1.96
stats.norm.sf(1.96)            # Survival: P(Z > 1.96) = 0.025

# t-distribution
stats.t.ppf(0.975, df=10)     # t critical value, df=10
stats.t.cdf(2.228, df=10)     # CDF at 2.228 with df=10

# Chi-square
stats.chi2.ppf(0.95, df=5)    # Chi-square critical value
stats.chi2.sf(11.07, df=5)    # p-value for chi-square test

# General: any scipy distribution
# Gamma(alpha=2, beta=3) in the shape/rate form of B.4; scipy takes scale = 1/beta
dist = stats.gamma(a=2, scale=1/3)
dist.mean()                     # Mean
dist.var()                      # Variance
dist.pdf(1.0)                   # PDF at x=1
dist.cdf(1.0)                   # CDF at x=1
dist.rvs(size=1000)            # Random samples

When to use tables vs. code: Tables are useful for building intuition about distribution shapes and for quick mental estimates. In any production or research context, always use scipy.stats or equivalent libraries for exact computation. The tables here are provided so that readers can verify hand calculations and develop familiarity with the key distributions before relying entirely on software.