Key Takeaways: Linear Regression — Your First Predictive Model
This is your reference card for Chapter 26. It covers the mechanics, interpretation, and evaluation of linear regression — the foundation of predictive modeling.
The Core Idea
Linear regression finds the straight line (or hyperplane) that minimizes the sum of squared prediction errors. Given features X and target y, it learns weights (coefficients) that produce predictions as close to the actual values as possible.
Simple: y = intercept + slope * x
Multiple: y = intercept + w1*x1 + w2*x2 + ... + wn*xn
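To make the "line that minimizes squared errors" concrete, here is a minimal sketch using synthetic data (the true intercept and slope of 2 and 3 are made up for illustration); `np.polyfit` with degree 1 performs exactly this least-squares line fit:

```python
import numpy as np

# Hypothetical data: y roughly follows 2 + 3*x plus noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2 + 3 * x + rng.normal(0, 1, size=50)

# Degree-1 polyfit = least-squares straight line; returns [slope, intercept]
slope, intercept = np.polyfit(x, y, 1)
print(f"intercept: {intercept:.2f}, slope: {slope:.2f}")
```

With only mild noise, the recovered coefficients land close to the true values used to generate the data.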
Key Concepts
- Slope (coefficient): The expected change in the target for a one-unit increase in the feature, holding all other features constant.
- Intercept: The predicted value when all features equal zero. May or may not have a meaningful real-world interpretation.
- Residual: The difference between an actual value and the predicted value. Residual = Actual - Predicted.
- Least squares: The method that finds coefficients by minimizing the sum of squared residuals. It penalizes large errors more heavily than small ones.
- R-squared (R²): The proportion of variance in the target explained by the model. Ranges from 0 (no better than predicting the mean) to 1 (perfect prediction); on test data it can even go negative when the model is worse than the baseline.
- Multiple regression: Linear regression with more than one feature. Each feature gets its own coefficient.
- Multicollinearity: When features are correlated with each other. It rarely hurts predictions much, but it makes individual coefficients unstable and hard to interpret.
- Feature scaling: Standardizing features to comparable scales. Makes coefficients directly comparable for assessing feature importance.
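The residual and R² definitions above can be computed by hand on a tiny example (the actual/predicted values here are made up for illustration):

```python
import numpy as np

# Hypothetical actual targets and model predictions
actual = np.array([10.0, 12.0, 15.0, 11.0])
predicted = np.array([11.0, 12.5, 14.0, 10.5])

residuals = actual - predicted                    # Residual = Actual - Predicted
ss_res = np.sum(residuals ** 2)                   # sum of squared residuals
ss_tot = np.sum((actual - actual.mean()) ** 2)    # total variation around the mean
r_squared = 1 - ss_res / ss_tot                   # proportion of variance explained
```

Note that if every prediction were simply the mean of `actual`, `ss_res` would equal `ss_tot` and R² would be exactly 0 — the baseline.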
The scikit-learn Workflow
```python
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, r2_score

# 1. Split data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 2. Train model
model = LinearRegression()
model.fit(X_train, y_train)

# 3. Predict
y_pred = model.predict(X_test)

# 4. Evaluate
print(f"R²: {model.score(X_test, y_test):.3f}")
print(f"MAE: {mean_absolute_error(y_test, y_pred):.2f}")

# 5. Inspect coefficients
print(f"Intercept: {model.intercept_:.2f}")
for feat, coef in zip(X.columns, model.coef_):
    print(f"  {feat}: {coef:.4f}")
```
Interpreting Results
Coefficients
| Coefficient Type | Interpretation |
|---|---|
| Positive slope | Feature increases → target increases |
| Negative slope | Feature increases → target decreases |
| Large absolute value | Strong association (but scale-dependent) |
| Near zero | Weak association with target |
Always include the qualifier: "holding other features constant" for multiple regression.
Always remember: Coefficients describe associations, not causal effects.
R-squared
| R² Value | Interpretation |
|---|---|
| 0.00 - 0.10 | Model barely beats baseline |
| 0.10 - 0.30 | Weak model (common in very noisy domains) |
| 0.30 - 0.50 | Moderate (decent for social/behavioral data) |
| 0.50 - 0.70 | Strong (good for most applications) |
| 0.70 - 0.90 | Very strong |
| 0.90+ | Excellent (verify this isn't too good to be true) |
Diagnostic Checks
Check for Overfitting
- Training R² ≈ Test R² → good generalization
- Training R² >> Test R² → overfitting
- Both R² low → underfitting
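The train/test comparison looks like this in practice; the data here is synthetic (a made-up linear relationship with noise), so on healthy data the two scores come out close:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical noisy linear data
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(200, 1))
y = 3 * X[:, 0] + rng.normal(0, 2, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model = LinearRegression().fit(X_train, y_train)

train_r2 = model.score(X_train, y_train)   # R² on data the model saw
test_r2 = model.score(X_test, y_test)      # R² on held-out data
gap = train_r2 - test_r2                   # large positive gap → overfitting
```

A gap of a few hundredths is normal sampling noise; a gap of tenths is a red flag.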
Check for Nonlinearity
- Plot residuals vs. predicted values
- Random scatter around zero: model is appropriate
- Curved pattern: relationship is nonlinear — consider transformations or a different model
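A residual plot is a few lines of matplotlib. This sketch uses made-up predictions and well-behaved residuals (random scatter around zero, the "model is appropriate" case); the `Agg` backend and output filename are assumptions for a non-interactive run:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line to show the plot on screen
import matplotlib.pyplot as plt

# Hypothetical predictions and residuals from a fitted model
rng = np.random.default_rng(0)
y_pred = rng.uniform(5, 25, size=100)
residuals = rng.normal(0, 2, size=100)  # random scatter around zero = good sign

fig, ax = plt.subplots()
ax.scatter(y_pred, residuals, alpha=0.6)
ax.axhline(0, color="red", linestyle="--")
ax.set_xlabel("Predicted value")
ax.set_ylabel("Residual")
ax.set_title("Residuals vs. predicted")
fig.savefig("residuals.png")
```

If the scatter bends into a U or an arc instead, the linear form is missing curvature in the data.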
Check for Outlier Influence
- A few extreme points can pull the regression line dramatically
- Look for points with very large residuals
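One simple way to surface suspicious points is to standardize the residuals and flag the extremes; the residual values and the 2-standard-deviation cutoff below are illustrative choices, not a fixed rule:

```python
import numpy as np

# Hypothetical residuals with one extreme point
residuals = np.array([0.5, -1.2, 0.8, -0.3, 9.7, 1.1])

# Standardize, then flag points more than 2 standard deviations from the mean
z = (residuals - residuals.mean()) / residuals.std()
outlier_idx = np.where(np.abs(z) > 2)[0]
```

Flagged points deserve a look at the raw data: a typo gets fixed, a genuine extreme case may warrant refitting with and without it to see how much it moves the line.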
Common Transformations
| Relationship | Transformation | When to Use |
|---|---|---|
| Diminishing returns | log(feature) | GDP vs. life expectancy |
| Quadratic | feature² | Temperature vs. comfort |
| Growth curves | log(target) | Population, revenue |
```python
import numpy as np

# Log transformation (the +1 guards against log(0))
X['log_feature'] = np.log(X['feature'] + 1)

# Quadratic term for curved (e.g. U-shaped) relationships
X['feature_sq'] = X['feature'] ** 2
```
Feature Scaling
```python
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # Fit on train
X_test_scaled = scaler.transform(X_test)        # Apply to test
```
After scaling, coefficients are directly comparable (each represents the effect of a one-standard-deviation change).
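Here is a sketch of why scaling matters for comparison, using made-up "income" and "education" features on very different scales (the coefficients 0.001 and 5 are chosen for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Hypothetical features on very different scales
rng = np.random.default_rng(1)
income = rng.normal(50_000, 15_000, size=300)   # dollars
education = rng.normal(14, 2, size=300)         # years
y = 0.001 * income + 5 * education + rng.normal(0, 1, size=300)

X = np.column_stack([income, education])
raw_coefs = LinearRegression().fit(X, y).coef_

X_scaled = StandardScaler().fit_transform(X)
scaled_coefs = LinearRegression().fit(X_scaled, y).coef_
# Each scaled coefficient ≈ raw coefficient × feature's standard deviation,
# i.e. the effect of a one-SD change — so magnitudes compare directly.
```

Here the raw income coefficient (≈0.001) looks negligible next to education's (≈5), yet after scaling income's one-SD effect is actually the larger of the two.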
Common Pitfalls
- Confusing correlation with causation. A positive coefficient for GDP doesn't mean increasing GDP increases the target.
- Ignoring nonlinearity. Always check scatter plots and residual plots before trusting a linear model.
- Extrapolating beyond data range. The model may predict impossible values (negative prices, >100% rates) outside the training range.
- Comparing unscaled coefficients. A coefficient of 0.001 for income and 50 for education doesn't mean education is more important — the scales are different.
- Adding features blindly. More features increase training R² but can decrease test R² through overfitting.
- Fitting scaler on full data. Always fit on training data only to prevent leakage.
What You Should Be Able to Do Now
- [ ] Fit a linear regression model using scikit-learn's LinearRegression
- [ ] Interpret slope and intercept in real-world context
- [ ] Evaluate model fit using R² and MAE
- [ ] Compare model performance to a baseline (predicting the mean)
- [ ] Check for overfitting by comparing training and test scores
- [ ] Create and interpret residual plots and predicted-vs-actual plots
- [ ] Extend simple regression to multiple features
- [ ] Handle nonlinear relationships with log transformations
- [ ] Explain multicollinearity and its effects on interpretation
- [ ] Scale features using StandardScaler (fit on training data only)
The Essential Comparison
Always present model results in context:
```
Baseline MAE: 12.5   ← "Just predict the average"
Model MAE:     7.2   ← Your model's performance
Improvement:   42%   ← The value your model adds
```
A model that can't beat the baseline isn't learning anything useful.
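The baseline comparison above takes only a few lines; the test targets and predictions here are made-up values for illustration:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

# Hypothetical held-out targets and model predictions
y_test = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
y_pred = np.array([12.0, 18.0, 33.0, 38.0, 52.0])

# Baseline: predict the training mean for every observation
# (using the test mean here only to keep the toy example self-contained)
baseline_pred = np.full_like(y_test, y_test.mean())
baseline_mae = mean_absolute_error(y_test, baseline_pred)
model_mae = mean_absolute_error(y_test, y_pred)
improvement = (baseline_mae - model_mae) / baseline_mae
```

In real use, compute the mean on the training targets, not the test targets, to keep the baseline honest.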
You're ready for Chapter 27, where you'll predict categories instead of numbers. The equation changes slightly (from a line to a curve), but the workflow — split, train, evaluate, compare — stays exactly the same.