Further Reading: Logistic Regression
Books
For Deeper Understanding
David W. Hosmer, Stanley Lemeshow, and Rodney X. Sturdivant, Applied Logistic Regression, 3rd edition (2013) The standard reference for logistic regression in the health and social sciences. The first five chapters provide a thorough treatment of binary logistic regression with clear worked examples, including model-building strategies, diagnostics, and interpretation of results. The discussion of goodness-of-fit testing (Hosmer-Lemeshow test) fills a gap left by this chapter — how to assess whether the logistic model actually fits the data well, not just whether individual predictors are significant. The clinical and epidemiological examples make this particularly relevant for students interested in Maya's healthcare applications.
Alan Agresti, An Introduction to Categorical Data Analysis, 3rd edition (2019) Agresti is the authoritative voice on methods for categorical data. Chapters 4-6 cover logistic regression from a statistical perspective, with careful attention to the relationship between logistic regression and the chi-square tests you learned in Chapter 19. Agresti shows how a chi-square test of independence is a special case of logistic regression with a single categorical predictor — a connection that deepens your understanding of both methods. The treatment of conditional odds ratios and adjusted odds ratios extends the concepts introduced in this chapter.
James, Witten, Hastie, and Tibshirani, An Introduction to Statistical Learning (ISLR), 2nd edition (2021) Free online at statlearning.com. Chapter 4 covers logistic regression as the first classification method, then builds to linear discriminant analysis, quadratic discriminant analysis, and naive Bayes — showing how logistic regression fits into the broader classification landscape. The key insight is the comparison of logistic regression (which models P(Y|X) directly) with generative methods (which model P(X|Y) and use Bayes' theorem). This connects directly to the Bayes' theorem material from Chapter 9. The labs are available in both R and Python (the Python edition, by James, Witten, Hastie, Tibshirani, and Taylor, appeared in 2023). Previously recommended for Chapters 22 and 23.
Aurélien Géron, Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 3rd edition (2022) Chapter 4 covers logistic regression as the gateway to machine learning classification. Géron's treatment of softmax regression (the multiclass extension of logistic regression) and the connection to neural networks is exactly the "Theme 3" story: logistic regression is the simplest neural network — a single neuron with a sigmoid activation function. The sklearn implementation closely matches the code in this chapter. Recommended for students who want to see where logistic regression leads in the machine learning pipeline.
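The "single neuron" claim can be made concrete in a few lines. This is a minimal sketch with made-up coefficients (not fitted to any data): a weighted sum passed through a sigmoid is exactly the logistic regression model.

```python
import numpy as np

def sigmoid(z):
    # The logistic (sigmoid) function maps any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def neuron_predict_proba(x, w, b):
    # Weighted sum plus bias, then sigmoid activation -> P(y = 1 | x).
    # This is identical to logistic regression's predicted probability.
    return sigmoid(np.dot(w, x) + b)

# Hypothetical coefficients of the kind a fitting routine would produce
w = np.array([0.8, -1.2])
b = 0.5
x = np.array([1.0, 2.0])

p = neuron_predict_proba(x, w, b)  # a probability strictly between 0 and 1
```

Stacking many such units into layers, as Géron goes on to do, is what turns this building block into a neural network.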
For the Conceptually Curious
Nate Silver, The Signal and the Noise: Why So Many Predictions Fail — but Some Don't (2012) Silver's discussions of prediction in baseball (Chapter 3), weather forecasting (Chapter 4), and earthquake prediction (Chapter 5) all involve the same fundamental challenge as logistic regression: converting available information into a probability of a binary event. His treatment of calibration — does the model's predicted probability match the actual frequency? — adds an evaluation dimension not covered in this chapter. Silver's emphasis on "foxes" (who use many sources of information) over "hedgehogs" (who rely on one big idea) parallels the case for multiple predictors in logistic regression.
Cathy O'Neil, Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy (2016) O'Neil, a mathematician, systematically documents how classification algorithms — many based on logistic regression or its descendants — are used to score job applicants, set insurance premiums, evaluate teachers, and assess criminal defendants. The book provides extensive real-world context for the ethical analysis in this chapter. Her examples of "proxy variables" (using zip code as a proxy for race, or credit score as a proxy for socioeconomic status) directly support James's analysis of the predictive policing algorithm. Essential reading for Theme 6.
Charles Wheelan, Naked Statistics: Stripping the Dread from the Data (2013) Wheelan's chapter on regression includes an accessible discussion of logistic regression in the context of the Challenger space shuttle disaster. The logistic regression model that could have predicted the O-ring failure — had it been applied correctly to the full dataset — is one of the most compelling examples of why this technique matters. Previously recommended for Chapters 12, 13, 18, 20, 22, and 23.
Articles and Papers
Cox, D. R. (1958). "The Regression Analysis of Binary Sequences." Journal of the Royal Statistical Society, Series B, 20(2), 215-242. The foundational paper that established logistic regression as a standard statistical method. Cox showed how maximum likelihood estimation could be applied to binary outcome models and provided the theoretical foundation for the hypothesis tests and confidence intervals used in this chapter. While mathematically advanced, the introduction and first few pages clearly motivate the problem and solution. Reading the original paper gives you an appreciation for how the method evolved from a specialized technique into the universal tool it is today.
Angwin, J., Larson, J., Mattu, S., and Kirchner, L. (2016). "Machine Bias." ProPublica. https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing The investigative report that launched the national conversation about algorithmic fairness. Previously recommended in Chapter 23, this article is now directly connected to the technical content: ProPublica analyzed the COMPAS algorithm — a logistic regression-based risk assessment tool — and found that its false positive rate was nearly twice as high for Black defendants as for White defendants. This article is the real-world version of James's Case Study 2. Reading it with the confusion matrix vocabulary from this chapter transforms it from a journalistic narrative into a technical analysis.
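Reading the ProPublica analysis with confusion-matrix vocabulary comes down to computing one metric separately per group. A minimal sketch with toy labels and predictions (invented numbers, not the COMPAS data) shows how a per-group false positive rate comparison works:

```python
import numpy as np

def false_positive_rate(y_true, y_pred):
    # FPR = FP / (FP + TN): the share of actual negatives
    # that the classifier incorrectly flags as positive
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    negatives = (y_true == 0)
    return np.mean(y_pred[negatives] == 1)

# Hypothetical outcomes and predictions for two groups
y_true_a = [0, 0, 0, 0, 1, 1]
y_pred_a = [1, 1, 0, 0, 1, 0]   # 2 of 4 actual negatives flagged -> FPR 0.50
y_true_b = [0, 0, 0, 0, 1, 1]
y_pred_b = [1, 0, 0, 0, 1, 1]   # 1 of 4 actual negatives flagged -> FPR 0.25

fpr_a = false_positive_rate(y_true_a, y_pred_a)
fpr_b = false_positive_rate(y_true_b, y_pred_b)
```

A gap like fpr_a versus fpr_b, measured on real data at scale, is the core of ProPublica's finding.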
Chouldechova, A. (2017). "Fair Prediction with Disparate Impact: A Study of Bias in Recidivism Prediction Instruments." Big Data, 5(2), 153-163. The paper proving the impossibility result discussed in Case Study 2: when base rates differ between groups, an algorithm cannot simultaneously satisfy calibration, equal false positive rates, and equal false negative rates (except when the algorithm is trivially uninformative). This result has profound implications for anyone designing or evaluating classification algorithms. The mathematics is accessible to readers comfortable with probability and algebra. Previously recommended in Chapter 23.
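The heart of Chouldechova's impossibility result is a single accounting identity relating the error rates. Writing $p$ for the prevalence (base rate) of the positive class in a group, the definitions of FPR, FNR, and PPV imply

\[
\mathrm{FPR} \;=\; \frac{p}{1-p} \cdot \frac{1-\mathrm{PPV}}{\mathrm{PPV}} \cdot \bigl(1-\mathrm{FNR}\bigr).
\]

If two groups have different base rates $p$, then holding PPV equal across groups (calibration) and FNR equal forces the FPRs to differ — which is exactly the trade-off observed in the COMPAS debate.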
Dressel, J. and Farid, H. (2018). "The Accuracy, Fairness, and Limits of Predicting Recidivism." Science Advances, 4(1), eaao5580. A study showing that untrained Amazon Mechanical Turk workers predicted recidivism about as accurately as the COMPAS algorithm, using only a few pieces of information. This raises fundamental questions about the value of complex classification models: if a simple model (or human judgment) performs comparably, what does the algorithm add — and does the added complexity introduce hidden bias? Previously recommended in Chapter 23.
Kleinberg, J., Mullainathan, S., and Raghavan, M. (2017). "Inherent Trade-Offs in the Fair Determination of Risk Scores." Proceedings of the 8th Innovations in Theoretical Computer Science Conference (ITCS '17). A mathematical formalization of the fairness impossibility theorem. The paper identifies three natural fairness conditions and proves that (with few exceptions) they cannot all hold simultaneously. Accessible to readers with basic probability knowledge. The authors frame the impossibility result not as a reason to abandon algorithms but as a reason to be explicit about which fairness criterion is being prioritized.
King, G. and Zeng, L. (2001). "Logistic Regression in Rare Events Data." Political Analysis, 9(2), 137-163. Many important applications of logistic regression involve rare events: fraud (< 1% of transactions), disease outbreaks, airline accidents. This paper shows that standard logistic regression underestimates the probability of rare events and proposes corrections. Relevant for students who encounter highly imbalanced datasets in their progressive projects or future work.
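King and Zeng's specific correction is not built into sklearn, but a common practical first response to rare events is to reweight the classes during fitting. This is a sketch on simulated data (all names and numbers are illustrative), using sklearn's class_weight option rather than the King–Zeng estimator itself:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy imbalanced dataset: roughly 2% of cases are positive
n = 2000
X = rng.normal(size=(n, 2))
y = (X[:, 0] + rng.normal(scale=0.5, size=n) > 2.2).astype(int)

# class_weight="balanced" reweights observations inversely to class
# frequency, so the rare positive class is not drowned out in fitting
clf = LogisticRegression(class_weight="balanced").fit(X, y)
proba = clf.predict_proba(X)[:, 1]
```

Note that reweighting shifts the predicted probabilities; for applications that need well-calibrated rare-event probabilities, the corrections in the paper (or post-hoc recalibration) are the more careful route.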
Online Resources
StatQuest: "Logistic Regression, Clearly Explained" https://www.youtube.com/watch?v=yIYKR4sgzI8
Josh Starmer's visual explanation of logistic regression starts with the problem (why linear regression fails for binary data) and builds to the sigmoid function, maximum likelihood estimation, and odds ratios. The animations showing how the sigmoid curve shifts and stretches as coefficients change are especially helpful for visual learners. Follow-up videos on confusion matrices, ROC curves, and AUC complete the evaluation toolkit. Previously recommended for Chapters 13, 20, 22, and 23.
3Blue1Brown: "But What Is a Neural Network?" https://www.youtube.com/watch?v=aircAruvnKk
This animated introduction to neural networks makes the Theme 3 connection explicit: a single neuron with a sigmoid activation function IS logistic regression. The four-video series shows how stacking these "logistic regression units" into layers creates the deep learning models that power image recognition, language models, and autonomous vehicles. Understanding logistic regression means you understand the fundamental building block.
Google's Machine Learning Crash Course: Classification https://developers.google.com/machine-learning/crash-course/classification
Google's free course includes an interactive module on classification that covers logistic regression, the sigmoid function, confusion matrices, ROC curves, and AUC — the same concepts from this chapter, presented in the context of a tech company's machine learning workflow. The interactive visualizations of ROC curves (where you can drag the threshold and watch the metrics change in real time) are especially valuable.
Seeing Theory: Bayesian Inference https://seeing-theory.brown.edu/bayesian-inference/index.html
The Brown University interactive visualization tool, previously recommended for regression, also covers Bayesian inference in a way that connects to this chapter's discussion of sensitivity, specificity, and PPV. The visual Bayes' theorem calculator lets you adjust sensitivity, specificity, and prevalence and watch the PPV change — making the connection between Chapter 9 and Chapter 24 tangible.
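The calculation behind that visual tool is just Bayes' theorem. A minimal sketch (with made-up sensitivity, specificity, and prevalence values) shows why the same test can have a very different PPV depending on prevalence:

```python
def ppv(sensitivity, specificity, prevalence):
    # Bayes' theorem: P(condition | positive test).
    # Numerator: true positives; denominator: all positives.
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# Same hypothetical test (90% sensitive, 95% specific),
# applied at two different prevalences
ppv_common = ppv(0.90, 0.95, 0.20)   # condition is common
ppv_rare   = ppv(0.90, 0.95, 0.01)   # condition is rare
```

Dragging the prevalence slider in the Seeing Theory tool traces out exactly this function: as prevalence falls, false positives swamp true positives and PPV drops.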
Scikit-Learn Documentation: Logistic Regression https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
The official sklearn documentation for LogisticRegression. Includes discussion of regularization (L1/L2 penalties), multi-class classification, solver options, and practical guidance on when to use each option. The "User Guide" section provides mathematical background, and the "Examples" section includes complete workflows.
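A minimal fitting workflow of the kind the sklearn docs walk through, shown here on a synthetic dataset (the parameter values are illustrative defaults, not recommendations):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Small synthetic binary classification problem
X, y = make_classification(n_samples=300, n_features=4, random_state=0)

# penalty="l2" (the default) adds ridge-style regularization;
# C is the inverse regularization strength (smaller C = stronger penalty)
clf = LogisticRegression(penalty="l2", C=1.0, solver="lbfgs").fit(X, y)

probs = clf.predict_proba(X)[:, 1]   # P(y = 1 | x) for each observation
labels = clf.predict(X)              # probabilities thresholded at 0.5
```

The regularization, multi-class, and solver options discussed in the User Guide all enter through the constructor arguments shown here.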
Statsmodels Documentation: Logit https://www.statsmodels.org/stable/generated/statsmodels.discrete.discrete_model.Logit.html
The official statsmodels documentation for the Logit model. Includes the .summary() output explanation, log-likelihood values, pseudo-R-squared measures, and methods for computing odds ratios and confidence intervals. Essential reference for the statistical inference approach to logistic regression.
For the Ethically Engaged
Barocas, S. and Selbst, A. D. (2016). "Big Data's Disparate Impact." California Law Review, 104, 671-732. A legal analysis of how machine learning classification systems — including logistic regression-based tools — can violate anti-discrimination law even when they don't explicitly use protected characteristics as features. The discussion of "proxy discrimination" (using variables correlated with protected characteristics) provides the legal framework for the proxy variable analysis in James's case study. Required reading for anyone working at the intersection of data science and law.
Corbett-Davies, S. and Goel, S. (2018). "The Measure and Mismeasure of Fairness: A Critical Review of Fair Machine Learning." arXiv preprint arXiv:1808.00023. A comprehensive survey of mathematical definitions of algorithmic fairness. The paper organizes the many competing fairness definitions (demographic parity, equalized odds, calibration, individual fairness) and explains why they conflict with each other. This survey contextualizes the impossibility theorem from Case Study 2 and provides vocabulary for the ongoing debate about what "fair AI" actually means.
Obermeyer, Z., Powers, B., Vogeli, C., and Mullainathan, S. (2019). "Dissecting Racial Bias in an Algorithm Used to Manage the Health of Populations." Science, 366(6464), 447-453. A landmark study finding that a widely used healthcare algorithm — one used to allocate care to millions of patients — systematically underestimated the health needs of Black patients. The algorithm used healthcare costs as a proxy for health needs, but because Black patients face barriers to accessing care, their costs were systematically lower for the same level of illness. This is a real-world version of Maya's case study, demonstrating how proxy variables can introduce racial bias even in well-intentioned healthcare algorithms.
Eubanks, V. (2018). Automating Inequality: How High-Tech Tools Profile, Police, and Punish the Poor. St. Martin's Press. Eubanks documents three case studies of automated decision-making systems affecting poor communities: the Indiana Medicaid eligibility algorithm, the Los Angeles homelessness vulnerability index, and the Allegheny Family Screening Tool (a child welfare risk assessment). Each case involves classification algorithms — often logistic regression or similar models — that have real, life-changing consequences. The book makes concrete the abstract ethical questions raised in this chapter.