Chapter 21: Further Reading

Essential Sources

1. Andrew Gelman, John B. Carlin, Hal S. Stern, David B. Dunson, Aki Vehtari, and Donald B. Rubin, Bayesian Data Analysis (CRC Press, 3rd edition, 2013)

Chapter 5 of BDA3 is the definitive treatment of hierarchical models, and it is the single most important reference for this chapter. Gelman's development of the eight schools example — which this chapter uses as the canonical illustration — builds the intuition for partial pooling more carefully than any other source, spending 40 pages on what amounts to a model with just two hyperparameters, $\mu$ and $\tau$. The discussion of when to pool and when not to pool, the geometry of the posterior for $\tau$ (the between-group standard deviation), and the interpretation of shrinkage are all essential reading. Chapters 10-12 cover MCMC computational methods: the Metropolis algorithm, Gibbs sampling, Hamiltonian Monte Carlo, and convergence diagnostics. Chapter 6 covers model checking (posterior predictive checks), and Section 7.2 introduces information criteria and cross-validation for Bayesian models.

Reading guidance: Start with Chapter 5 (hierarchical models) immediately after finishing this chapter — it expands on every hierarchical modeling concept covered here. Then read Chapter 12 (computationally efficient Markov chain simulation) for the mathematical details behind Hamiltonian Monte Carlo and NUTS that this chapter explained intuitively. Chapters 6 and 7 (model checking; evaluating, comparing, and expanding models) provide deeper treatment of posterior predictive checks and model comparison. If you read only one additional chapter of BDA3, read Chapter 5.
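The shrinkage that BDA3's Chapter 5 develops at length can be reproduced in a few lines of arithmetic. The sketch below uses the published eight schools estimates and standard errors; the function name and the fixed value of $\tau$ are illustrative, and the computation is just the conditional posterior mean of each school effect given $\mu$ and $\tau$, i.e. a precision-weighted compromise between the raw estimate and the population mean.

```python
# Partial pooling in the eight schools model (BDA3, Ch. 5):
# each school's estimate is shrunk toward the overall mean,
# with the amount of shrinkage set by sigma_j relative to tau.

y     = [28, 8, -3, 7, -1, 1, 18, 12]    # observed effects
sigma = [15, 10, 16, 11, 9, 11, 10, 18]  # standard errors

def partial_pool(y, sigma, tau):
    # Precision-weighted estimate of the population mean mu,
    # conditional on the between-group scale tau.
    w = [1.0 / (s**2 + tau**2) for s in sigma]
    mu = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    # Conditional posterior mean of each theta_j given mu, tau:
    # weight the data y_j and the prior mean mu by precision.
    theta = []
    for yj, sj in zip(y, sigma):
        prec_data, prec_prior = 1.0 / sj**2, 1.0 / tau**2
        theta.append((prec_data * yj + prec_prior * mu) /
                     (prec_data + prec_prior))
    return mu, theta

mu, theta = partial_pool(y, sigma, tau=5.0)
for yj, tj in zip(y, theta):
    print(f"raw {yj:5.1f} -> pooled {tj:5.1f}")
# As tau -> 0, every estimate collapses to mu (complete pooling);
# as tau -> infinity, each returns to its raw y_j (no pooling).
```

Varying tau in this sketch traces exactly the continuum between complete pooling and no pooling that Chapter 5 discusses.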

2. Richard McElreath, Statistical Rethinking: A Bayesian Course with Examples in R and Stan (CRC Press, 2nd edition, 2020)

Chapters 13-14 of Statistical Rethinking cover multilevel (hierarchical) models with a pedagogical clarity that no other source matches. McElreath's "varying effects" terminology (instead of "random effects") is more intuitive and avoids the confusing frequentist connotations of "fixed" and "random." His development builds hierarchical models from the ground up: Chapter 13 starts with varying intercepts and partial pooling; Chapter 14 adds varying slopes and introduces the correlation between intercepts and slopes using the LKJ prior. Chapter 15 extends the same hierarchical machinery to measurement error and missing data. The associated lectures (freely available on YouTube) include animations of MCMC exploration and shrinkage that are more effective than any static diagram.

Reading guidance: Chapters 13-14 are the ideal companion to this chapter. McElreath's emphasis on simulation-based understanding (generating data from the model before fitting) aligns with the Bayesian workflow advocated here. The "overthinking" boxes in each chapter provide mathematical details for readers who want the derivations behind the intuition. If you found the non-centered parameterization discussion in this chapter too brief, McElreath's treatment in Section 13.4 is more thorough.
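Mechanically, the non-centered parameterization is nothing more than a shift-and-scale of a standard normal. The stdlib sketch below (function names are illustrative) shows that drawing a raw variable $z \sim N(0, 1)$ and computing $\theta = \mu + \tau z$ yields the same distribution as sampling $\theta \sim N(\mu, \tau)$ directly; the benefit is that the sampler explores $z$, whose geometry does not depend on $\tau$.

```python
import random
import statistics

random.seed(42)

def centered_draw(mu, tau):
    # Centered: sample theta directly from N(mu, tau).
    return random.gauss(mu, tau)

def non_centered_draw(mu, tau):
    # Non-centered: sample a standard-normal "raw" variable z,
    # then shift and scale. theta is still N(mu, tau), but a
    # sampler works on z, whose scale is fixed at 1.
    z = random.gauss(0.0, 1.0)
    return mu + tau * z

mu, tau, n = 4.0, 2.0, 50_000
draws = [non_centered_draw(mu, tau) for _ in range(n)]
print(statistics.mean(draws), statistics.stdev(draws))
# Sample mean approaches mu and sample sd approaches tau: both
# parameterizations target the same distribution. The payoff of
# the non-centered form comes when tau is itself a parameter,
# where the centered version produces the notorious "funnel".
```

This is the same reparameterization trick Section 13.4 applies inside a Stan model to eliminate divergent transitions.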

3. The PyMC Documentation (https://www.pymc.io/)

The PyMC documentation is both a reference manual and a curated collection of worked examples. The "Core Notebooks" section includes complete implementations of every model type in this chapter: linear regression, logistic regression, hierarchical models (both centered and non-centered), model comparison with ArviZ, and the full Bayesian workflow. The "Example Gallery" contains domain-specific applications including survival analysis, Gaussian processes, state-space models, and mixture models — many of which build directly on the hierarchical modeling foundation from this chapter.

Reading guidance: Start with the "Getting Started" tutorial and the "GLM: Linear Regression" example, which introduce the PyMC model specification pattern in detail. Then work through the "A Hierarchical model for Rugby prediction" example, which implements a hierarchical model with real sports data and demonstrates shrinkage, model comparison, and posterior predictive checking. The "Bayesian workflow" notebook (in the How-To section) codifies the five-stage workflow from this chapter with PyMC-specific code. Bookmark the ArviZ documentation (https://python.arviz.org/) for reference on diagnostic functions, plotting, and model comparison.

4. Aki Vehtari, Andrew Gelman, and Jonah Gabry, "Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC," Statistics and Computing 27(5), 2017

This paper is the technical reference for the PSIS-LOO method used by ArviZ's az.loo() and az.compare() functions. It explains why PSIS-LOO is preferred over WAIC for hierarchical models (WAIC can be unreliable when individual observations are highly influential), introduces the Pareto $\hat{k}$ diagnostic for identifying problematic observations, and provides practical guidelines for interpreting model comparison results. The paper also compares LOO with AIC, DIC, and WAIC on theoretical grounds, making a compelling case for LOO as the default model comparison tool.

Reading guidance: Read Sections 1-3 for the motivation and the PSIS-LOO algorithm. Section 4 (diagnostics) is essential — it explains the Pareto $\hat{k}$ thresholds (0.5 and 0.7) used in this chapter. Section 7 (recommendations) provides a concise summary of when to use LOO vs. WAIC vs. Bayes factors. The associated R package loo (https://mc-stan.org/loo/) and its Python equivalent in ArviZ implement everything in the paper. If you use LOO in practice — and you should — this paper is the authoritative reference.
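What PSIS-LOO approximates from a single posterior fit can be computed exactly, by brute force, for a conjugate model: refit with each observation held out and score it under the resulting posterior predictive. The stdlib sketch below does this for a toy normal-mean model with known variance (the model, priors, and data values are illustrative, not from the paper).

```python
import math

def log_norm_pdf(x, m, v):
    # Log density of N(m, v) at x.
    return -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)

def posterior(y, s2, m0, v0):
    # Conjugate update: y_i ~ N(theta, s2), theta ~ N(m0, v0).
    prec = 1 / v0 + len(y) / s2
    mean = (m0 / v0 + sum(y) / s2) / prec
    return mean, 1 / prec

def elpd_loo_exact(y, s2=1.0, m0=0.0, v0=10.0):
    # Exact leave-one-out: refit without y_i, then evaluate the
    # held-out point under the posterior predictive N(m, v + s2).
    total = 0.0
    for i in range(len(y)):
        rest = y[:i] + y[i + 1:]
        m, v = posterior(rest, s2, m0, v0)
        total += log_norm_pdf(y[i], m, v + s2)
    return total

y = [1.1, 0.3, -0.8, 2.0, 0.5]
print(elpd_loo_exact(y))
```

Refitting n times is exactly what PSIS-LOO avoids: it reweights draws from the single full-data posterior with Pareto-smoothed importance sampling, and the Pareto $\hat{k}$ diagnostic flags the observations for which that reweighting (but not the brute-force computation above) becomes unreliable.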

5. Andrew Gelman, Aki Vehtari, Daniel Simpson, et al., "Bayesian Workflow" (arXiv:2011.01808, 2020)

This paper formalizes the iterative Bayesian workflow practiced by the Stan development team and codified in Section 21.5 of this chapter. It covers the full cycle: model building, prior predictive simulation, computational faithfulness checking, posterior predictive checking, model comparison and selection, and troubleshooting at each stage. The paper's most valuable contribution is its treatment of what to do when things go wrong — non-convergence, poor posterior predictive checks, sensitivity to priors, conflicting model comparison metrics — situations that textbooks often gloss over but that practitioners encounter constantly.

Reading guidance: Read this paper after completing this chapter and working through several exercises. The paper assumes familiarity with the concepts covered here (MCMC, diagnostics, hierarchical models, model comparison) and builds on them with practical advice. Section 3 (prior predictive simulation) and Section 5 (posterior predictive checking) are the most immediately useful. The troubleshooting sections throughout the paper are worth reading even if you have not encountered the specific problems yet — they build the diagnostic intuition that separates competent Bayesian practitioners from practitioners who press "run" and hope.
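Prior predictive simulation, the workflow stage the paper treats in Section 3, requires nothing more than drawing parameters from the priors and pushing them through the likelihood before any real data is involved. A stdlib sketch with hypothetical regression priors (the function name and the specific prior scales are illustrative, not from the paper):

```python
import random

random.seed(1)

def prior_predictive(n_sims=1000, n_obs=50):
    # Draw parameters from the priors, then simulate a full
    # dataset from each draw -- before touching real data.
    datasets = []
    for _ in range(n_sims):
        alpha = random.gauss(0.0, 1.0)        # prior on intercept
        beta = random.gauss(0.0, 0.5)         # prior on slope
        sigma = abs(random.gauss(0.0, 1.0))   # half-normal noise scale
        xs = [random.uniform(-2, 2) for _ in range(n_obs)]
        ys = [random.gauss(alpha + beta * x, sigma) for x in xs]
        datasets.append(ys)
    return datasets

sims = prior_predictive()
# Sanity check: do simulated outcomes span a scientifically
# plausible range, or do the priors permit absurd data?
flat = [v for ys in sims for v in ys]
print(min(flat), max(flat))
```

If the simulated outcomes are wildly implausible on the scale of the real problem, the priors need tightening — which is precisely the feedback loop the paper recommends running before fitting anything.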