Key Takeaways: Logistic Regression and Classification — Predicting Categories
This is your reference card for Chapter 27. It covers classification, the sigmoid function, confusion matrices, and the critical precision-recall tradeoff that governs every classification decision.
The Core Idea
Logistic regression is linear regression wrapped in a sigmoid function. It computes a weighted sum of features (like linear regression), then squishes the result through the sigmoid to produce a probability between 0 and 1. This probability is then thresholded to make a classification decision.
```
Score       = intercept + w1*x1 + w2*x2 + ... + wn*xn   (linear part)
Probability = 1 / (1 + e^(-Score))                      (sigmoid)
Prediction  = "positive" if Probability >= threshold    (decision)
```
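The three steps above can be sketched in plain Python. The intercept, weights, and feature values here are made up for illustration:

```python
import math

def sigmoid(score: float) -> float:
    # Squash any real-valued score into the (0, 1) range.
    return 1 / (1 + math.exp(-score))

# Hypothetical fitted parameters and one example's feature values.
intercept, weights = -1.0, [0.8, 0.5]
features = [2.0, 1.0]

# Linear part: weighted sum of features plus the intercept.
score = intercept + sum(w * x for w, x in zip(weights, features))

# Sigmoid part: convert the score to a probability, then threshold it.
probability = sigmoid(score)
prediction = "positive" if probability >= 0.5 else "negative"
```

Note that a score of 0 maps to a probability of exactly 0.5, which is why 0.5 is the natural default threshold.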
Key Concepts
- Classification: Predicting a category (yes/no, high/low, spam/not spam), as opposed to regression, which predicts a number.
- Sigmoid function: Maps any real number to the range (0, 1). An S-shaped curve that squashes extreme values.
- Threshold: The probability cutoff for classification decisions. The default is 0.5, but the optimal threshold depends on the costs of different types of errors.
- `predict_proba`: Returns probability estimates, which are more informative than binary predictions. Use probabilities to rank, communicate uncertainty, and set custom thresholds.
- Confusion matrix: A four-cell table of TP, FP, TN, and FN that reveals the types of errors, not just their count.
- Precision: TP / (TP + FP). "When the model says yes, how often is it right?"
- Recall (sensitivity): TP / (TP + FN). "Of all actual positives, how many does the model catch?"
- F1-score: The harmonic mean of precision and recall. It is only high when both are high.
- Class imbalance: When one class dominates, accuracy is misleading. Use precision, recall, and F1 instead.
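To make the metric definitions concrete, here is a tiny worked example with made-up confusion-matrix counts:

```python
# Hypothetical cell counts for illustration.
tp, fp, fn, tn = 40, 10, 20, 30

precision = tp / (tp + fp)  # "When the model says yes, how often is it right?"
recall = tp / (tp + fn)     # "Of all actual positives, how many are caught?"
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
accuracy = (tp + tn) / (tp + fp + fn + tn)
```

With these counts, precision is 0.8 but recall is only about 0.67, so the F1-score sits between them, closer to the weaker of the two.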
The scikit-learn Workflow
```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import (confusion_matrix,
                             classification_report, accuracy_score)

# Split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Predict (binary)
y_pred = model.predict(X_test)

# Predict (probabilities)
y_proba = model.predict_proba(X_test)[:, 1]

# Evaluate
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
```
The Confusion Matrix
```
                      Predicted
                  Positive   Negative
Actual  Positive     TP         FN      ← Recall = TP/(TP+FN)
        Negative     FP         TN
                     ↑
             Precision = TP/(TP+FP)
```
| Cell | Meaning | Also Called |
|---|---|---|
| TP | Correct positive | Hit |
| TN | Correct negative | Correct rejection |
| FP | Incorrect positive | False alarm, Type I error |
| FN | Missed positive | Miss, Type II error |
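Each cell is just a count of one prediction-versus-truth outcome. A minimal sketch of tallying the four cells by hand, on made-up labels (1 = positive):

```python
# Hypothetical true labels and model predictions.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # hits
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # misses (Type II)
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms (Type I)
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # correct rejections
```

The four counts always sum to the number of examples, which is a quick sanity check when reading a confusion matrix.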
Precision vs. Recall Tradeoff
```
Lower threshold  → More positives predicted
                 → Higher recall (fewer misses)
                 → Lower precision (more false alarms)

Higher threshold → Fewer positives predicted
                 → Lower recall (more misses)
                 → Higher precision (fewer false alarms)
```
When to prioritize recall: Missing a positive case is costly (cancer screening, security threats, fraud detection).
When to prioritize precision: False alarms are costly (spam filtering, criminal accusations, surgical recommendations).
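The tradeoff is easy to see by sweeping the threshold over a small set of hypothetical probabilities (both the labels and the probabilities below are invented for illustration):

```python
# Hypothetical true labels and predicted probabilities.
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
y_proba = [0.9, 0.7, 0.4, 0.6, 0.2, 0.8, 0.3, 0.1]

def precision_recall(threshold):
    # Apply the cutoff, then count hits, false alarms, and misses.
    y_pred = [1 if p >= threshold else 0 for p in y_proba]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

p_lo, r_lo = precision_recall(0.3)  # permissive threshold
p_hi, r_hi = precision_recall(0.5)  # default threshold
```

On this toy data, dropping the threshold from 0.5 to 0.3 raises recall (every positive is caught) at the cost of precision (more false alarms), exactly as the arrows above describe.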
Handling Class Imbalance
```python
# Strategy 1: Balanced class weights
model = LogisticRegression(class_weight='balanced', max_iter=1000)

# Strategy 2: Custom threshold
y_proba = model.predict_proba(X_test)[:, 1]
y_pred_custom = (y_proba >= 0.3).astype(int)

# Strategy 3: Better metrics (not just accuracy)
from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred))
```
The baseline test for imbalanced data: If your model's accuracy doesn't beat "always predict the majority class," it hasn't learned anything useful.
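The baseline is cheap to compute. A minimal sketch with hypothetical imbalanced labels:

```python
from collections import Counter

# Hypothetical imbalanced test labels: 90 negatives, 10 positives.
y_test = [0] * 90 + [1] * 10

# "Always predict the majority class" gets this accuracy for free.
majority_class, majority_count = Counter(y_test).most_common(1)[0]
baseline_accuracy = majority_count / len(y_test)
```

Here the do-nothing baseline scores 90% accuracy, so a model reporting 90% on this data has learned nothing.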
Interpreting Coefficients
| Coefficient Sign | Meaning |
|---|---|
| Positive | Higher feature value increases probability of positive class |
| Negative | Higher feature value decreases probability of positive class |
| Larger absolute value | Stronger association |
Important: The effect on probability is not linear — it depends on where you are on the sigmoid curve. The coefficient is constant in log-odds space, not in probability space.
Odds ratio: exp(coefficient). An odds ratio of 1.5 means a one-unit increase in the feature multiplies the odds by 1.5.
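Converting coefficients to odds ratios is one `exp()` call per coefficient. The coefficient values and feature names below are hypothetical, standing in for what you would read from `model.coef_[0]`:

```python
import math

# Hypothetical fitted log-odds coefficients for two features.
coefficients = {"age": 0.405, "income": -0.105}

# exp() turns each log-odds coefficient into an odds ratio.
odds_ratios = {name: math.exp(coef) for name, coef in coefficients.items()}
```

Here `exp(0.405)` is roughly 1.5, so each one-unit increase in `age` multiplies the odds of the positive class by about 1.5; the negative `income` coefficient gives an odds ratio below 1, meaning the odds shrink as that feature grows.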
Regression vs. Classification: When to Use Which
| Task | Model | Output | Use When |
|---|---|---|---|
| Predict a number | Linear Regression | Continuous value | You need a specific numerical prediction |
| Predict a category | Logistic Regression | Probability + class | You need a yes/no decision or risk ranking |
You can convert regression output to a classification (e.g., "if predicted rate >= 80%, classify as high"), but logistic regression is purpose-built for classification tasks.
Common Pitfalls
- Using accuracy alone for imbalanced data. Always check precision, recall, and the confusion matrix.
- Ignoring `predict_proba`. Binary predictions lose important information about confidence and ranking.
- Using the default 0.5 threshold without thinking. The optimal threshold depends on the costs of FP vs. FN.
- Confusing logistic regression with linear regression. Despite the name, logistic regression is for classification.
- Not scaling features. Logistic regression with regularization (default in scikit-learn) can produce different results depending on feature scales.
- Interpreting coefficients as linear effects on probability. The effect depends on where you are on the sigmoid curve.
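One way to address the scaling pitfall is to bundle a scaler with the model in a pipeline, so the scaler is fit only on the training data and applied consistently at prediction time. A minimal sketch on tiny synthetic data (the data is invented purely to show the pattern):

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Synthetic data: one feature on a large scale, one on a small scale.
X = [[1000, 0.1], [2000, 0.2], [3000, 0.3], [4000, 0.4]]
y = [0, 0, 1, 1]

# Standardizing first keeps the L2 penalty (on by default in scikit-learn)
# from treating features differently just because of their numeric scale.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X, y)
preds = model.predict(X)
```

With `make_pipeline`, calling `fit` scales and trains in one step, and `predict` reuses the training-set scaling automatically.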
What You Should Be Able to Do Now
- [ ] Explain why linear regression is inappropriate for classification
- [ ] Describe how the sigmoid function converts a score to a probability
- [ ] Fit a logistic regression model with scikit-learn
- [ ] Use `predict_proba` to get probability outputs
- [ ] Construct and read a confusion matrix
- [ ] Calculate precision and recall from a confusion matrix
- [ ] Choose between precision and recall based on error costs
- [ ] Adjust the classification threshold for different scenarios
- [ ] Recognize class imbalance and choose appropriate evaluation metrics
- [ ] Compare model performance to the majority-class baseline
The Decision Framework
When choosing how to evaluate your classifier, ask:
- Is the data imbalanced? If yes, don't trust accuracy alone.
- Which error is worse: FP or FN? This determines whether to prioritize precision or recall.
- Do I need a decision or a ranking? If ranking, use `predict_proba`.
- What threshold serves the application? Set it based on error costs, not convenience.
- Does the model beat the baseline? If it can't outperform "always predict the majority class," it isn't useful.
You're ready for Chapter 28, where you'll learn about decision trees — a completely different kind of model that makes predictions by asking a series of yes/no questions about the features. Decision trees handle nonlinear relationships naturally and are among the most interpretable models in machine learning.