Key Takeaways: Linear Regression — Your First Predictive Model

This is your reference card for Chapter 26. It covers the mechanics, interpretation, and evaluation of linear regression — the foundation of predictive modeling.


The Core Idea

Linear regression finds the straight line (or hyperplane) that minimizes the sum of squared prediction errors. Given features X and target y, it learns weights (coefficients) that produce predictions as close to the actual values as possible.

Simple:   y = intercept + slope * x
Multiple: y = intercept + w1*x1 + w2*x2 + ... + wn*xn

Key Concepts

  • Slope (coefficient): The expected change in the target for a one-unit increase in the feature, holding all other features constant.

  • Intercept: The predicted value when all features equal zero. May or may not have a meaningful real-world interpretation.

  • Residual: The difference between an actual value and the predicted value. Residual = Actual - Predicted.

  • Least squares: The method that finds coefficients by minimizing the sum of squared residuals. Penalizes large errors more than small ones.

  • R-squared (R²): The proportion of variance in the target explained by the model. Ranges from 0 (no better than predicting the mean) to 1 (perfect prediction). Note that on test data, R² can even be negative if the model predicts worse than the mean.

  • Multiple regression: Linear regression with more than one feature. Each feature gets its own coefficient.

  • Multicollinearity: When features are correlated with each other. Doesn't hurt predictions much but makes individual coefficients unstable and hard to interpret.

  • Feature scaling: Standardizing features to comparable scales. Makes coefficients directly comparable for assessing feature importance.
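
The least-squares idea can be made concrete with the closed-form formulas for simple regression. This is an illustrative sketch on made-up numbers, mathematically equivalent to what LinearRegression computes for one feature:

```python
import numpy as np

# Tiny made-up dataset: y is roughly 2x + 1 with a little noise
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 9.0, 10.8])

# Closed-form least-squares estimates for simple regression
slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept = y.mean() - slope * x.mean()

# Residuals and R² computed from their definitions
y_pred = intercept + slope * x
residuals = y - y_pred
r2 = 1 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)
```

For more than one feature there is no simple per-feature formula; scikit-learn solves the general least-squares problem for you.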


The scikit-learn Workflow

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, r2_score

# 1. Split data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 2. Train model
model = LinearRegression()
model.fit(X_train, y_train)

# 3. Predict
y_pred = model.predict(X_test)

# 4. Evaluate
print(f"R²:  {model.score(X_test, y_test):.3f}")
print(f"MAE: {mean_absolute_error(y_test, y_pred):.2f}")

# 5. Inspect coefficients
print(f"Intercept: {model.intercept_:.2f}")
for feat, coef in zip(X.columns, model.coef_):
    print(f"  {feat}: {coef:.4f}")

Interpreting Results

Coefficients

Coefficient type       Interpretation
Positive slope         Feature increases → target increases
Negative slope         Feature increases → target decreases
Large absolute value   Strong association (but scale-dependent)
Near zero              Weak association with target

For multiple regression, always include the qualifier "holding all other features constant."

And remember: coefficients describe associations, not causal effects.

R-squared

R² value      Interpretation
0.00 - 0.10   Model barely beats baseline
0.10 - 0.30   Weak model (common in very noisy domains)
0.30 - 0.50   Moderate (decent for social/behavioral data)
0.50 - 0.70   Strong (good for most applications)
0.70 - 0.90   Very strong
0.90+         Excellent (verify this isn't too good to be true)

Diagnostic Checks

Check for Overfitting

Training R² ≈ Test R²    → Good generalization
Training R² >> Test R²   → Overfitting
Both R² low              → Underfitting
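
The train-vs-test comparison can be run directly. Here is a minimal sketch on synthetic data (the dataset and its coefficients are made up for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data: a linear signal plus modest noise
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.5, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model = LinearRegression().fit(X_train, y_train)

train_r2 = model.score(X_train, y_train)
test_r2 = model.score(X_test, y_test)
gap = train_r2 - test_r2  # small gap → good generalization; large → overfitting
```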

Check for Nonlinearity

  • Plot residuals vs. predicted values
  • Random scatter around zero: model is appropriate
  • Curved pattern: relationship is nonlinear — consider transformations or a different model
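
A residual plot takes only a few lines. The sketch below deliberately fits a line to quadratic data so the curved pattern shows up (the data and output file name are illustrative):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this in a notebook
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Illustrative data with a quadratic relationship a straight line will miss
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=150)
y = 0.5 * x ** 2 + rng.normal(scale=1.0, size=150)

model = LinearRegression().fit(x.reshape(-1, 1), y)
y_pred = model.predict(x.reshape(-1, 1))
residuals = y - y_pred

# A curved band here (rather than random scatter) signals nonlinearity
plt.scatter(y_pred, residuals, alpha=0.5)
plt.axhline(0, color="red", linestyle="--")
plt.xlabel("Predicted value")
plt.ylabel("Residual")
plt.savefig("residuals_vs_predicted.png")
```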

Check for Outlier Influence

  • A few extreme points can pull the regression line dramatically
  • Look for points with very large residuals
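
One made-up point is enough to demonstrate the pull. In this sketch a single extreme value more than doubles the fitted slope, and its residual flags it:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Clean linear data: y = 2x + 1 exactly
x = np.arange(1.0, 11.0)
y = 2.0 * x + 1.0
slope_clean = LinearRegression().fit(x.reshape(-1, 1), y).coef_[0]

# Add one wild point and refit
x_out = np.append(x, 10.0)
y_out = np.append(y, 100.0)
model_out = LinearRegression().fit(x_out.reshape(-1, 1), y_out)
slope_out = model_out.coef_[0]  # pulled far above the true slope of 2

# The point with the largest absolute residual is the prime suspect
residuals = y_out - model_out.predict(x_out.reshape(-1, 1))
suspect = int(np.argmax(np.abs(residuals)))
```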

Common Transformations

Relationship          Transformation   When to use
Diminishing returns   log(feature)     GDP vs. life expectancy
Quadratic             feature²         Temperature vs. comfort
Growth curves         log(target)      Population, revenue

import numpy as np

# Log transformation; the +1 keeps zero values defined (np.log1p is equivalent)
X['log_feature'] = np.log(X['feature'] + 1)

Feature Scaling

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # Fit on train
X_test_scaled = scaler.transform(X_test)        # Apply to test

After scaling, coefficients are directly comparable (each represents the effect of a one-standard-deviation change).
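
A made-up illustration of why this matters: two features with equal real influence but very different units. (Fitting the scaler on all the data is acceptable here only because we are inspecting coefficients, not evaluating on a held-out test set.)

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Two features with equal "true" importance but wildly different scales
rng = np.random.default_rng(1)
income = rng.normal(50_000, 10_000, size=300)  # large scale
years_edu = rng.normal(14, 2, size=300)        # small scale
X = np.column_stack([income, years_edu])
y = 0.0001 * income + 0.5 * years_edu + rng.normal(scale=0.5, size=300)

# Raw coefficients reflect units, not importance
raw_coefs = LinearRegression().fit(X, y).coef_

# Scaled coefficients are per one-standard-deviation change, hence comparable
X_scaled = StandardScaler().fit_transform(X)
scaled_coefs = LinearRegression().fit(X_scaled, y).coef_
```

On raw data the income coefficient looks tiny next to the education one; after scaling the two come out nearly equal, matching their equal real influence.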


Common Pitfalls

  1. Confusing correlation with causation. A positive coefficient for GDP doesn't mean increasing GDP increases the target.
  2. Ignoring nonlinearity. Always check scatter plots and residual plots before trusting a linear model.
  3. Extrapolating beyond data range. The model may predict impossible values (negative prices, >100% rates) outside the training range.
  4. Comparing unscaled coefficients. A coefficient of 0.001 for income and 50 for education doesn't mean education is more important — the scales are different.
  5. Adding features blindly. More features increase training R² but can decrease test R² through overfitting.
  6. Fitting scaler on full data. Always fit on training data only to prevent leakage.

What You Should Be Able to Do Now

  • [ ] Fit a linear regression model using scikit-learn's LinearRegression
  • [ ] Interpret slope and intercept in real-world context
  • [ ] Evaluate model fit using R² and MAE
  • [ ] Compare model performance to a baseline (predicting the mean)
  • [ ] Check for overfitting by comparing training and test scores
  • [ ] Create and interpret residual plots and predicted-vs-actual plots
  • [ ] Extend simple regression to multiple features
  • [ ] Handle nonlinear relationships with log transformations
  • [ ] Explain multicollinearity and its effects on interpretation
  • [ ] Scale features using StandardScaler (fit on training data only)

The Essential Comparison

Always present model results in context:

Baseline MAE: 12.5    ← "Just predict the average"
Model MAE:     7.2    ← Your model's performance
Improvement:   42%    ← The value your model adds

A model that can't beat the baseline isn't learning anything useful.
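
The comparison can be automated with scikit-learn's DummyRegressor as the predict-the-mean baseline. A sketch on synthetic data (the numbers will differ from the example above):

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic data for illustration
rng = np.random.default_rng(7)
X = rng.normal(size=(200, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=1.0, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Baseline: always predict the training-set mean
baseline = DummyRegressor(strategy="mean").fit(X_train, y_train)
baseline_mae = mean_absolute_error(y_test, baseline.predict(X_test))

model = LinearRegression().fit(X_train, y_train)
model_mae = mean_absolute_error(y_test, model.predict(X_test))

improvement = 1 - model_mae / baseline_mae  # fraction of baseline error removed
```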


You're ready for Chapter 27, where you'll predict categories instead of numbers. The equation changes slightly (from a line to a curve), but the workflow — split, train, evaluate, compare — stays exactly the same.