Quiz: Logistic Regression
Test your understanding of logistic regression, odds ratios, the sigmoid function, confusion matrices, ROC curves, and classification ethics. Try to answer each question before revealing the answer.
1. The primary reason we use logistic regression instead of linear regression for binary outcomes is:
(a) Logistic regression is always more accurate than linear regression (b) Linear regression cannot handle multiple predictors for binary outcomes (c) Linear regression can produce predicted probabilities outside the range [0, 1] (d) Logistic regression is easier to compute than linear regression
Answer
**(c) Linear regression can produce predicted probabilities outside the range [0, 1].** Linear regression handles multiple predictors just fine, so (b) is wrong, and (a) is false. The fundamental problem is that a straight line extended far enough will predict probabilities greater than 1 or less than 0 — both impossible. The sigmoid function in logistic regression constrains all predictions to the valid [0, 1] range.

2. If the probability of an event is 0.80, the odds of the event are:
(a) 0.80 (b) 0.20 (c) 4.0 (d) 1.6
Answer
**(c) 4.0** Odds = P/(1-P) = 0.80/0.20 = 4.0. This means the event is 4 times more likely to occur than not occur — "4 to 1 odds." Option (a) confuses probability with odds. Option (b) is 1-P. Option (d) has no standard interpretation.

3. The sigmoid (logistic) function outputs values in the range:
(a) $(-\infty, +\infty)$ (b) $(-1, 1)$ (c) $(0, 1)$ (d) $[0, 1]$
Answer
**(c) $(0, 1)$** The sigmoid function approaches 0 and 1 asymptotically but never actually reaches them. The open interval $(0, 1)$ is the precise range. Option (d) with square brackets would mean it includes exactly 0 and 1, which it doesn't. This is a fine mathematical point — for all practical purposes, the sigmoid can get extremely close to 0 or 1.

4. In the logistic regression equation $\ln(P/(1-P)) = b_0 + b_1 x$, the left side of the equation is:
(a) The probability (b) The odds (c) The log-odds (logit) (d) The odds ratio
Answer
**(c) The log-odds (logit)** $\ln(P/(1-P))$ is the natural logarithm of the odds. The odds are $P/(1-P)$ (option b). The log-odds transform the bounded probability into an unbounded quantity that can be modeled with a linear equation. The odds ratio (d) is $e^{b_1}$, which measures the multiplicative change in odds per unit change in $x$.

5. A logistic regression coefficient of $b_1 = 0.693$ corresponds to an odds ratio of:
(a) 0.693 (b) 1.0 (c) 2.0 (d) 0.50
Answer
**(c) 2.0** The odds ratio is $e^{b_1} = e^{0.693} = 2.0$. (Since $\ln(2) = 0.693$.) This means the odds double for each one-unit increase in the predictor. Option (a) confuses the coefficient with the odds ratio. Option (b) would correspond to $b_1 = 0$. Option (d) would correspond to $b_1 = -0.693$.

6. An odds ratio of 0.75 for a predictor means:
(a) The probability decreases by 75% for each unit increase (b) The odds decrease by 25% for each unit increase (c) The odds increase by 75% for each unit increase (d) The probability decreases by 25% for each unit increase
Answer
**(b) The odds decrease by 25% for each unit increase.** An OR of 0.75 means the odds are multiplied by 0.75 for each one-unit increase — a 25% reduction. This is NOT the same as a 25% reduction in probability (d), because the relationship between odds and probability is nonlinear. Options (a) and (c) misinterpret the number.

7. The sensitivity (recall) of a classification model is defined as:
(a) TP / (TP + FP) (b) TP / (TP + FN) (c) TN / (TN + FP) (d) (TP + TN) / (TP + FP + FN + TN)
Answer
**(b) TP / (TP + FN)** Sensitivity asks: "of all actual positives, how many did the model correctly identify?" The denominator is all actual positives (TP + FN). Option (a) is precision. Option (c) is specificity. Option (d) is accuracy. This is the same definition from Chapter 9, now applied to model evaluation.

8. A confusion matrix shows: TP = 50, FP = 30, FN = 10, TN = 210. The precision of this model is:
(a) 50/60 = 0.833 (b) 50/80 = 0.625 (c) 210/240 = 0.875 (d) 260/300 = 0.867
Answer
**(b) 50/80 = 0.625** Precision = TP / (TP + FP) = 50 / (50 + 30) = 50/80 = 0.625. This means that when the model predicts "positive," it's correct 62.5% of the time. Option (a) is sensitivity (50/(50+10)). Option (c) is specificity (210/(210+30)). Option (d) is accuracy ((50+210)/300).

9. Why can accuracy be misleading for imbalanced datasets?
(a) Because accuracy ignores the training data (b) Because a model that always predicts the majority class can have high accuracy without actually detecting the minority class (c) Because accuracy is always lower than AUC (d) Because accuracy cannot be calculated for binary outcomes
Answer
**(b) Because a model that always predicts the majority class can have high accuracy without actually detecting the minority class.** This is the accuracy paradox. If 98% of emails are not spam, a model that labels everything "not spam" has 98% accuracy but catches zero spam. That's why sensitivity, specificity, precision, F1, and AUC are essential complements to accuracy. Options (c) and (d) are factually incorrect. Option (a) is unrelated to the issue.

10. A logistic regression model predicts loan default with the equation:
$$\ln\left(\frac{P}{1-P}\right) = -2.5 + 0.04 \times \text{debt\_to\_income\_pct}$$
For a borrower with a debt-to-income ratio of 50%, the predicted probability of default is:
(a) About 0.12 (12%) (b) About 0.38 (38%) (c) About 0.50 (50%) (d) About 0.62 (62%)
Answer
**(b) About 0.38 (38%)**

Step 1: Log-odds $= -2.5 + 0.04 \times 50 = -2.5 + 2.0 = -0.5$.

Step 2: $P = \frac{1}{1 + e^{-(-0.5)}} = \frac{1}{1 + e^{0.5}} = \frac{1}{1 + 1.649} = \frac{1}{2.649} = 0.378$, or about 38%.

The negative log-odds ($-0.5$) indicate that default is slightly less likely than not defaulting.

11. Which of the following is true about the ROC curve?
(a) It plots accuracy vs. the threshold (b) It plots sensitivity vs. specificity (c) It plots sensitivity (TPR) vs. 1 - specificity (FPR) at various thresholds (d) It plots precision vs. recall at various thresholds
Answer
**(c) It plots sensitivity (TPR) vs. 1 - specificity (FPR) at various thresholds.** The ROC curve shows the tradeoff between the true positive rate and the false positive rate as the classification threshold varies from 0 to 1. Option (b) is close but has the x-axis wrong (it's 1 - specificity, not specificity). Option (d) describes a precision-recall curve, which is a different (also useful) visualization.

12. An AUC of 0.50 means the model:
(a) Is perfect (b) Has 50% accuracy (c) Performs no better than random guessing (d) Correctly classifies 50% of positive cases
Answer
**(c) Performs no better than random guessing.** AUC = 0.50 corresponds to the diagonal line on the ROC plot — the model has no discriminative ability. It's no better than flipping a coin. AUC = 1.0 would be a perfect model. Note that AUC = 0.50 does NOT mean 50% accuracy (b) — accuracy depends on the threshold and base rate.

13. In a cancer screening model, the consequences of a false negative (missing a cancer) are typically much worse than a false positive (unnecessary biopsy). To prioritize catching cancers, you should:
(a) Raise the classification threshold (e.g., from 0.5 to 0.7) (b) Lower the classification threshold (e.g., from 0.5 to 0.3) (c) Use accuracy as the primary metric (d) Remove predictors from the model
Answer
**(b) Lower the classification threshold (e.g., from 0.5 to 0.3).** Lowering the threshold means more observations are classified as positive. This increases sensitivity (catching more true cancers) at the cost of lower specificity (more false positives / unnecessary biopsies). When the cost of a false negative is much higher than a false positive, this tradeoff is worthwhile. Raising the threshold (a) would do the opposite — catch fewer cancers.

14. The relationship between logistic regression and Bayes' theorem (Ch. 9) is:
(a) They are completely unrelated concepts (b) Logistic regression uses Bayes' theorem in its fitting algorithm (c) The confusion matrix provides the ingredients for Bayesian reasoning about model predictions: sensitivity corresponds to P(positive test | disease) and precision corresponds to the PPV (d) Bayes' theorem can only be applied to medical tests, not to statistical models
Answer
**(c) The confusion matrix provides the ingredients for Bayesian reasoning about model predictions.** Sensitivity = P(model predicts positive | actually positive), which is analogous to P(test positive | disease) in Chapter 9. Precision (PPV) = P(actually positive | model predicts positive), which is the Bayes' theorem result. The base rate (prevalence) still matters — a model with high sensitivity can still have low precision when the base rate is low. This is the same lesson as Chapter 9's disease screening examples.

15. A predictive policing algorithm has the following error rates:
| Group | False Positive Rate | False Negative Rate |
|---|---|---|
| Group A | 15% | 20% |
| Group B | 30% | 10% |
Which statement is most accurate?
(a) The algorithm is fair because the total error rate is similar for both groups (b) The algorithm systematically over-predicts risk for Group B compared to Group A (c) The algorithm is biased against Group A because their false negative rate is higher (d) The false positive rates are irrelevant; only accuracy matters
Answer
**(b) The algorithm systematically over-predicts risk for Group B compared to Group A.** A higher false positive rate for Group B means more people in Group B are wrongly classified as high-risk when they would not have reoffended. In practical terms, this means Group B members are more likely to be denied bail or given harsher sentences based on incorrect predictions. While the total error rates may be similar (15+20=35% vs. 30+10=40%), the *type* of error differs, and the consequences are asymmetric. This is the core of the algorithmic fairness debate from Theme 6.

16. In logistic regression with multiple predictors, the coefficient for $x_1$ represents:
(a) The total effect of $x_1$ on the outcome (b) The effect of $x_1$ on the log-odds, not controlling for other variables (c) The change in log-odds of the outcome for a one-unit increase in $x_1$, holding all other predictors constant (d) The change in probability for a one-unit increase in $x_1$
Answer
**(c) The change in log-odds of the outcome for a one-unit increase in $x_1$, holding all other predictors constant.** This directly parallels the interpretation of multiple regression coefficients from Chapter 23. The "holding all other predictors constant" phrase is essential — without it, you'd be describing a simple logistic regression coefficient (b). Option (d) is incorrect because the coefficient is a change in *log-odds*, not probability. To get the effect on odds, exponentiate: $e^{b_1}$ is the odds ratio.

17. Which of the following is a correct interpretation of an odds ratio of 1.42 for "number of prior admissions" predicting hospital readmission?
(a) Patients with prior admissions have a 42% probability of readmission (b) Each additional prior admission increases the probability of readmission by 42% (c) Each additional prior admission is associated with a 42% increase in the odds of readmission, holding other variables constant (d) Patients with 1.42 prior admissions have even odds of readmission
Answer
**(c) Each additional prior admission is associated with a 42% increase in the odds of readmission, holding other variables constant.** This is the standard odds ratio interpretation. Option (a) confuses an odds ratio with a probability. Option (b) confuses odds with probability — a 42% increase in *odds* is not the same as a 42% increase in *probability*. Option (d) is nonsensical.

18. The F1 score is defined as:
(a) The arithmetic mean of precision and recall (b) The harmonic mean of precision and recall (c) The geometric mean of sensitivity and specificity (d) The weighted average of accuracy and AUC
Answer
**(b) The harmonic mean of precision and recall.**

$$F1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$

The harmonic mean penalizes extreme imbalances: if either precision or recall is very low, the F1 score will also be low, even if the other is high. The arithmetic mean (a) would be less sensitive to such imbalances.

19. Which statement best describes the relationship between logistic regression and machine learning classification?
(a) Logistic regression is not a machine learning method (b) Logistic regression is the simplest classifier and serves as the foundation for more complex classification models like neural networks (c) Machine learning classifiers are completely different from logistic regression and share no common principles (d) Logistic regression is only used in statistics, never in machine learning
Answer
**(b) Logistic regression is the simplest classifier and serves as the foundation for more complex classification models like neural networks.** Logistic regression IS a classification algorithm — it belongs equally to statistics and machine learning. Neural networks are essentially stacked logistic regressions with nonlinear activation functions. The evaluation metrics (confusion matrix, ROC, AUC) used for logistic regression are the same ones used for all classifiers. Understanding logistic regression means understanding the core architecture of classification.

20. The threshold concept for this chapter is "thinking in odds." Which of the following correctly describes the relationship between probability, odds, and log-odds?
(a) Probability ranges from 0 to 1; odds range from 0 to $\infty$; log-odds range from $-\infty$ to $+\infty$. Logistic regression models the log-odds as a linear function because log-odds have no bounds. (b) Probability, odds, and log-odds are three different names for the same quantity (c) Odds are always larger than probabilities (d) Log-odds are always negative
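As a closing check, the numerical answers above can be verified in a few lines of Python. This is a minimal sketch, not chapter code; the function and variable names are illustrative, and the values come from Questions 2, 5, 8, 10, and 18.

```python
import math

def sigmoid(z):
    # Maps log-odds to a probability in the open interval (0, 1)  (Q3)
    return 1 / (1 + math.exp(-z))

# Q2: probability 0.80 -> odds = P / (1 - P)
p = 0.80
odds = p / (1 - p)            # 4.0 (up to floating-point rounding)

# Q5: coefficient b1 = 0.693 -> odds ratio e^{b1}
odds_ratio = math.exp(0.693)  # ~2.0, since ln(2) = 0.693

# Q8 / Q18: confusion matrix TP=50, FP=30, FN=10, TN=210
tp, fp, fn, tn = 50, 30, 10, 210
precision   = tp / (tp + fp)                   # 0.625
sensitivity = tp / (tp + fn)                   # ~0.833 (recall)
specificity = tn / (tn + fp)                   # 0.875
accuracy    = (tp + tn) / (tp + fp + fn + tn)  # ~0.867
f1 = 2 * precision * sensitivity / (precision + sensitivity)  # ~0.714

# Q10: log-odds = -2.5 + 0.04 * 50 = -0.5 -> P = sigmoid(-0.5)
p_default = sigmoid(-2.5 + 0.04 * 50)          # ~0.378
```

Note how every quantity passes through the same pipeline: a linear predictor on the log-odds scale, exponentiated to odds, then squashed by the sigmoid into a probability — the "thinking in odds" idea of Question 20.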