Key Takeaways: Linear Regression — Your First Predictive Model

This is your reference card for Chapter 26. It covers the mechanics, interpretation, and evaluation of linear regression — the foundation of predictive modeling.


The Core Idea

Linear regression finds the straight line (or hyperplane) that minimizes the sum of squared prediction errors. Given features X and target y, it learns weights (coefficients) that produce predictions as close to the actual values as possible.

Simple:   y = intercept + slope * x
Multiple: y = intercept + w1*x1 + w2*x2 + ... + wn*xn

Key Concepts

  • Slope (coefficient): The expected change in the target for a one-unit increase in the feature, holding all other features constant.

  • Intercept: The predicted value when all features equal zero. May or may not have a meaningful real-world interpretation.

  • Residual: The difference between an actual value and the predicted value. Residual = Actual - Predicted.

  • Least squares: The method that finds coefficients by minimizing the sum of squared residuals. Penalizes large errors more than small ones.

  • R-squared (R²): The proportion of variance in the target explained by the model. Ranges from 0 (no better than predicting the mean) to 1 (perfect prediction). Note that on test data, R² can even be negative if the model predicts worse than the mean.

  • Multiple regression: Linear regression with more than one feature. Each feature gets its own coefficient.

  • Multicollinearity: When features are correlated with each other. Doesn't hurt predictions much but makes individual coefficients unstable and hard to interpret.

  • Feature scaling: Standardizing features to comparable scales. Makes coefficients directly comparable for assessing feature importance.
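
The least-squares idea can be made concrete with the closed-form formulas for simple regression. This is an illustrative sketch on made-up numbers, mathematically equivalent to what LinearRegression computes for one feature:

```python
import numpy as np

# Tiny made-up dataset: y is roughly 2x + 1 with a little noise
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 9.0, 10.8])

# Closed-form least-squares estimates for simple regression
slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept = y.mean() - slope * x.mean()

# Residuals and R² computed from their definitions
y_pred = intercept + slope * x
residuals = y - y_pred
r2 = 1 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)
```

For more than one feature there is no simple per-feature formula; scikit-learn solves the general least-squares problem for you.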


The scikit-learn Workflow

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, r2_score

# 1. Split data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 2. Train model
model = LinearRegression()
model.fit(X_train, y_train)

# 3. Predict
y_pred = model.predict(X_test)

# 4. Evaluate
print(f"R²:  {model.score(X_test, y_test):.3f}")
print(f"MAE: {mean_absolute_error(y_test, y_pred):.2f}")

# 5. Inspect coefficients
print(f"Intercept: {model.intercept_:.2f}")
for feat, coef in zip(X.columns, model.coef_):
    print(f"  {feat}: {coef:.4f}")

Interpreting Results

Coefficients

Coefficient type       Interpretation
Positive slope         Feature increases → target increases
Negative slope         Feature increases → target decreases
Large absolute value   Strong association (but scale-dependent)
Near zero              Weak association with target

For multiple regression, always include the qualifier "holding all other features constant."

And remember: coefficients describe associations, not causal effects.

R-squared

R² value      Interpretation
0.00 - 0.10   Model barely beats baseline
0.10 - 0.30   Weak model (common in very noisy domains)
0.30 - 0.50   Moderate (decent for social/behavioral data)
0.50 - 0.70   Strong (good for most applications)
0.70 - 0.90   Very strong
0.90+         Excellent (verify this isn't too good to be true)

Diagnostic Checks

Check for Overfitting

Training R² ≈ Test R²    → Good generalization
Training R² >> Test R²   → Overfitting
Both R² low              → Underfitting
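
The train-vs-test comparison can be run directly. Here is a minimal sketch on synthetic data (the dataset and its coefficients are made up for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data: a linear signal plus modest noise
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.5, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model = LinearRegression().fit(X_train, y_train)

train_r2 = model.score(X_train, y_train)
test_r2 = model.score(X_test, y_test)
gap = train_r2 - test_r2  # small gap → good generalization; large → overfitting
```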

Check for Nonlinearity

  • Plot residuals vs. predicted values
  • Random scatter around zero: model is appropriate
  • Curved pattern: relationship is nonlinear — consider transformations or a different model
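
A residual plot takes only a few lines. The sketch below deliberately fits a line to quadratic data so the curved pattern shows up (the data and output file name are illustrative):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this in a notebook
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Illustrative data with a quadratic relationship a straight line will miss
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=150)
y = 0.5 * x ** 2 + rng.normal(scale=1.0, size=150)

model = LinearRegression().fit(x.reshape(-1, 1), y)
y_pred = model.predict(x.reshape(-1, 1))
residuals = y - y_pred

# A curved band here (rather than random scatter) signals nonlinearity
plt.scatter(y_pred, residuals, alpha=0.5)
plt.axhline(0, color="red", linestyle="--")
plt.xlabel("Predicted value")
plt.ylabel("Residual")
plt.savefig("residuals_vs_predicted.png")
```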

Check for Outlier Influence

  • A few extreme points can pull the regression line dramatically
  • Look for points with very large residuals
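
One made-up point is enough to demonstrate the pull. In this sketch a single extreme value more than doubles the fitted slope, and its residual flags it:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Clean linear data: y = 2x + 1 exactly
x = np.arange(1.0, 11.0)
y = 2.0 * x + 1.0
slope_clean = LinearRegression().fit(x.reshape(-1, 1), y).coef_[0]

# Add one wild point and refit
x_out = np.append(x, 10.0)
y_out = np.append(y, 100.0)
model_out = LinearRegression().fit(x_out.reshape(-1, 1), y_out)
slope_out = model_out.coef_[0]  # pulled far above the true slope of 2

# The point with the largest absolute residual is the prime suspect
residuals = y_out - model_out.predict(x_out.reshape(-1, 1))
suspect = int(np.argmax(np.abs(residuals)))
```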

Common Transformations

Relationship          Transformation   When to use
Diminishing returns   log(feature)     GDP vs. life expectancy
Quadratic             feature²         Temperature vs. comfort
Growth curves         log(target)      Population, revenue

import numpy as np

# Log transformation; the +1 keeps zero values defined (np.log1p is equivalent)
X['log_feature'] = np.log(X['feature'] + 1)

Feature Scaling

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # Fit on train
X_test_scaled = scaler.transform(X_test)        # Apply to test

After scaling, coefficients are directly comparable (each represents the effect of a one-standard-deviation change).
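
A made-up illustration of why this matters: two features with equal real influence but very different units. (Fitting the scaler on all the data is acceptable here only because we are inspecting coefficients, not evaluating on a held-out test set.)

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Two features with equal "true" importance but wildly different scales
rng = np.random.default_rng(1)
income = rng.normal(50_000, 10_000, size=300)  # large scale
years_edu = rng.normal(14, 2, size=300)        # small scale
X = np.column_stack([income, years_edu])
y = 0.0001 * income + 0.5 * years_edu + rng.normal(scale=0.5, size=300)

# Raw coefficients reflect units, not importance
raw_coefs = LinearRegression().fit(X, y).coef_

# Scaled coefficients are per one-standard-deviation change, hence comparable
X_scaled = StandardScaler().fit_transform(X)
scaled_coefs = LinearRegression().fit(X_scaled, y).coef_
```

On raw data the income coefficient looks tiny next to the education one; after scaling the two come out nearly equal, matching their equal real influence.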


Common Pitfalls

  1. Confusing correlation with causation. A positive coefficient for GDP doesn't mean increasing GDP increases the target.
  2. Ignoring nonlinearity. Always check scatter plots and residual plots before trusting a linear model.
  3. Extrapolating beyond data range. The model may predict impossible values (negative prices, >100% rates) outside the training range.
  4. Comparing unscaled coefficients. A coefficient of 0.001 for income and 50 for education doesn't mean education is more important — the scales are different.
  5. Adding features blindly. More features increase training R² but can decrease test R² through overfitting.
  6. Fitting scaler on full data. Always fit on training data only to prevent leakage.

What You Should Be Able to Do Now

  • [ ] Fit a linear regression model using scikit-learn's LinearRegression
  • [ ] Interpret slope and intercept in real-world context
  • [ ] Evaluate model fit using R² and MAE
  • [ ] Compare model performance to a baseline (predicting the mean)
  • [ ] Check for overfitting by comparing training and test scores
  • [ ] Create and interpret residual plots and predicted-vs-actual plots
  • [ ] Extend simple regression to multiple features
  • [ ] Handle nonlinear relationships with log transformations
  • [ ] Explain multicollinearity and its effects on interpretation
  • [ ] Scale features using StandardScaler (fit on training data only)

The Essential Comparison

Always present model results in context:

Baseline MAE: 12.5    ← "Just predict the average"
Model MAE:     7.2    ← Your model's performance
Improvement:   42%    ← The value your model adds

A model that can't beat the baseline isn't learning anything useful.
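
The comparison can be automated with scikit-learn's DummyRegressor as the predict-the-mean baseline. A sketch on synthetic data (the numbers will differ from the example above):

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic data for illustration
rng = np.random.default_rng(7)
X = rng.normal(size=(200, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=1.0, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Baseline: always predict the training-set mean
baseline = DummyRegressor(strategy="mean").fit(X_train, y_train)
baseline_mae = mean_absolute_error(y_test, baseline.predict(X_test))

model = LinearRegression().fit(X_train, y_train)
model_mae = mean_absolute_error(y_test, model.predict(X_test))

improvement = 1 - model_mae / baseline_mae  # fraction of baseline error removed
```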


You're ready for Chapter 27, where you'll predict categories instead of numbers. The equation changes slightly (from a line to a curve), but the workflow — split, train, evaluate, compare — stays exactly the same.