Key Takeaways: Introduction to Prediction Models
One-Page Reference
Core Concept
A prediction model is a systematic method for generating forecasts about uncertain future events based on available information—not guessing, but mathematical transformation of data into probabilistic outcomes.
The Prediction Pipeline
[Raw Data] → [Feature Engineering] → [Model] → [Predictions] → [Evaluation]
(Evaluation results feed back into feature engineering, closing the loop.)
Types of NFL Predictions
| Type | Output | Use Case |
|---|---|---|
| Outcome | Winner + probability | Straight-up picks |
| Spread | Point margin | Betting analysis |
| Total | Combined score | Over/under analysis |
| Season | Win total | Futures, projections |
Key Evaluation Metrics
Accuracy Metrics
| Metric | Formula | Benchmark |
|---|---|---|
| Straight-up | Correct / Total | 50% random, 55-60% good |
| ATS | Covers / Total | 52.4% to profit |
| Brier Score | mean((prob - outcome)²) | 0.25 random, <0.22 good |
| MAE | mean(abs(pred - actual)) | ~10-12 points |
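The two error metrics above can be sketched in a few lines; the function and variable names are illustrative.

```python
def brier_score(probs, outcomes):
    """Mean squared error of predicted probabilities against 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

def mae(predicted_margins, actual_margins):
    """Mean absolute error of predicted point margins."""
    return sum(abs(p - a) for p, a in zip(predicted_margins, actual_margins)) / len(predicted_margins)

# A forecaster that always says 50% scores Brier = 0.25 no matter what happens,
# which is why 0.25 is the "random" benchmark in the table.
print(brier_score([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 1]))  # 0.25
```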
What "Good" Looks Like
- Straight-up: 55-60% over large sample
- ATS: 53-55% is elite (very rare)
- Brier: Below 0.22
- MAE: Below 12 points
Common Pitfalls
1. Overfitting
Problem: Model memorizes past data, fails on new data.
Solution: Use train/test splits, cross-validation.
2. Data Leakage
Problem: Using information not available at prediction time.
Solution: Strict temporal separation of features.
3. Ignoring Variance
Problem: Actual margins vary widely around the spread (std ≈ 13.5 points in the NFL).
Solution: Quantify uncertainty, accept randomness.
4. Small Sample Illusions
Problem: 60% accuracy over 20 games means nothing.
Solution: Require 100+ game samples for conclusions.
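The temporal separation called for in pitfalls 1 and 2 can be sketched as a split on a week index; the field names and cutoff are hypothetical.

```python
# Temporal train/test split: never let games from the "future" leak into training.
games = [{"week": w, "margin": m} for w, m in [(1, 7), (2, -3), (3, 10), (4, 0), (5, 14)]]

cutoff_week = 3  # hypothetical cutoff; everything after it is held out for testing
train = [g for g in games if g["week"] <= cutoff_week]
test = [g for g in games if g["week"] > cutoff_week]

# Sanity check: every training game predates every test game.
assert max(g["week"] for g in train) < min(g["week"] for g in test)
```

A random shuffle split would pass most unit tests but silently leak future information into the model.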
Building Blocks
1. Team Ratings
Single number representing team strength
rating = weighted_average(point_differentials)
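One way to realize the weighted average above is geometric recency weighting; the decay value and function name are illustrative assumptions.

```python
def team_rating(point_differentials, decay=0.9):
    """Recency-weighted average of point differentials (oldest game first).

    The most recent game gets weight 1.0; each earlier game is discounted
    by `decay`. With decay=1.0 this reduces to a plain average.
    """
    n = len(point_differentials)
    weights = [decay ** (n - 1 - i) for i in range(n)]
    return sum(w * d for w, d in zip(weights, point_differentials)) / sum(weights)

# A team trending upward rates higher than its season-long average suggests.
print(team_rating([-3, 0, 7, 14]))
```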
2. Home Field Advantage
~2.5 points in modern NFL
spread = away_rating - home_rating - HFA
(negative spread means the home team is favored)
3. Adjustments
- Rest days (+0.5 pts/day)
- Travel (long distance: +1-1.5 pts)
- Timezone (west→east: +1 pt)
- Weather, injuries
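Building blocks 1-3 combine into a single spread estimate. A minimal sketch, assuming the adjustments above all shift points toward the home side (the sign conventions and exact point values are assumptions):

```python
HFA = 2.5  # home field advantage, modern NFL

def predicted_spread(home_rating, away_rating,
                     home_rest_edge_days=0, long_travel=False, west_to_east=False):
    """Predicted spread; negative means the home team is favored."""
    spread = away_rating - home_rating - HFA
    spread -= 0.5 * home_rest_edge_days   # rest: +0.5 pts per extra day of rest
    if long_travel:
        spread -= 1.25                    # away team traveled far: +1-1.5 pts to home
    if west_to_east:
        spread -= 1.0                     # west-coast team playing east: +1 pt to home
    return spread

# Home team rated 3.0, visitor rated 1.0: home favored by 4.5.
print(predicted_spread(home_rating=3.0, away_rating=1.0))  # -4.5
```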
4. Uncertainty
total_std = sqrt(game_std² + sample_std² + rating_std²)
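Independent uncertainty sources combine by adding variances, not standard deviations. A small sketch (the example component values are illustrative):

```python
import math

def total_std(game_std, sample_std, rating_std):
    """Combine independent uncertainty sources: variances add, stds do not."""
    return math.sqrt(game_std ** 2 + sample_std ** 2 + rating_std ** 2)

# Inherent game variance dominates: modeling error barely moves the total.
print(total_std(13.5, 2.0, 3.0))
```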
Converting Spread to Probability
home_win_prob = normal_cdf(-spread / 13.5)
(treating the actual margin as roughly normal around the spread, with std ≈ 13.5)
| Spread | Home Win Prob |
|---|---|
| -14 | 85% |
| -7 | 70% |
| -3 | 60% |
| 0 | 50% |
| +3 | 40% |
| +7 | 30% |
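The conversion can be implemented with the standard normal CDF (via `math.erf`), assuming margins are roughly normal around the spread with std ≈ 13.5, one common convention:

```python
import math

def home_win_prob(spread, sigma=13.5):
    """P(home win) from the spread; negative spread = home favored."""
    # Standard normal CDF evaluated at the home team's edge in std units.
    return 0.5 * (1 + math.erf((-spread / sigma) / math.sqrt(2)))

for s in (-14, -7, -3, 0):
    print(s, round(home_win_prob(s), 2))  # ≈ 0.85, 0.70, 0.59, 0.50
```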
Quick Calibration Check
Your 70% predictions should win ~70% of the time.
# Group predictions by probability bin
# Compare predicted prob to actual win rate
calibration_error = abs(predicted_prob - actual_win_rate)
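A minimal sketch of the binning step described above, grouping forecasts into nearest-10% buckets; the function name is illustrative.

```python
from collections import defaultdict

def calibration_table(probs, outcomes):
    """Map each probability bucket (rounded to nearest 0.1) to its actual win rate."""
    buckets = defaultdict(list)
    for p, o in zip(probs, outcomes):
        buckets[round(p, 1)].append(o)
    return {b: sum(os) / len(os) for b, os in sorted(buckets.items())}

# Well calibrated: each bucket's realized win rate sits near its stated probability.
print(calibration_table([0.72, 0.68, 0.31, 0.29], [1, 1, 0, 1]))
```

Over a large sample, the ~0.7 bucket should land near 0.70; large gaps in any bucket flag miscalibration even when overall accuracy looks fine.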
Sample Size Requirements
| Sample | 95% CI Width | Reliable? |
|---|---|---|
| 20 games | ±22% | No |
| 50 games | ±14% | Barely |
| 100 games | ±10% | Somewhat |
| 500 games | ±4% | Yes |
| 1000 games | ±3% | Very |
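The CI widths in the table follow from the binomial standard error at p = 0.5 (the worst case): half-width = 1.96 · sqrt(0.25 / n).

```python
import math

def ci_half_width(n, p=0.5):
    """Half-width of a 95% normal-approximation CI for a win rate over n games."""
    return 1.96 * math.sqrt(p * (1 - p) / n)

for n in (20, 50, 100, 500, 1000):
    print(n, f"±{ci_half_width(n):.0%}")  # ±22%, ±14%, ±10%, ±4%, ±3%
```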
Model Comparison Framework
| Aspect | Ask |
|---|---|
| Inputs | What data does it use? |
| Process | How does it combine information? |
| Outputs | What does it predict? |
| Evaluation | How was it validated? |
| Uncertainty | Does it quantify confidence? |
Red Flags
- Claims of >65% sustained accuracy
- No reported sample size
- No out-of-sample testing
- Using post-game data to predict
- Ignoring model uncertainty
Baseline Expectations
| Method | Expected Accuracy |
|---|---|
| Coin flip | 50% |
| Always pick home | 52% |
| Always pick favorite | 67% |
| Vegas spread | 50% ATS |
| Good model | 55-58% SU |
| Elite model | 58-62% SU |
Remember
- Systematic > intuitive: models beat gut feelings over the long term
- Evaluation is mandatory: no testing, no credibility
- Variance is real: even a perfect model has bad weeks
- Simple often wins: complexity ≠ accuracy
- Improve continuously: update with new data