nflfastR Documentation - Technical implementation details - Model calibration and validation - `https://www.nflfastr.com/articles/nflfastR.html` → Chapter 11: Further Reading and Resources
Brian Burke - Original framework development - Foundational concepts for modern EPA - `https://www.advancedfootballanalytics.com/` → Chapter 11: Further Reading and Resources
Sports Analytics Conference - Details methodology for expected rushing models - Feature selection for pre-snap predictions - Validation and calibration approaches → Further Reading: Rushing and Running Game Analysis
"Expected Threat in Soccer"
Karun Singh (2019) - Introduces spatial value models applicable to football - Methodology for position-based analysis → Chapter 24: Further Reading
Eric Eager & George Chahrouri - Modern computational approaches - Code examples for special teams metrics - Data sources and manipulation → Chapter 10: Further Reading and Resources
Carter & Machol (1978) - Early quantitative fourth-down analysis - Expected value calculations - Historical context for modern work → Chapter 10: Further Reading and Resources
Python library for data validation - Declarative data quality rules - https://greatexpectations.io/ - Industry-standard tool for data pipelines → Further Reading: Data Cleaning and Preparation
Burke (2013) - Expected points framework for kicker assessment - Field goals over expected methodology - Career value estimation → Chapter 10: Further Reading and Resources
Jurafsky & Martin - Comprehensive NLP textbook - Free online draft available - Chapters on text classification, NER, sentiment → Chapter 25: Further Reading
Berry & Berry (2015) - Statistical analysis of FG make probability - Weather, distance, and situational factors - Foundation for probability modeling → Chapter 10: Further Reading and Resources
MIT Sloan Sports Analytics - Quantifies impact of pressure on offensive efficiency - Develops framework for valuing pass rush - Establishes pressure rate as key metric → Further Reading: Defensive Metrics and Analysis
"The Value of an Elite Running Back"
Football Outsiders Research - Analyzes replacement-level theory applied to RBs - Quantifies marginal value of rushing production - Discusses roster construction implications → Further Reading: Rushing and Running Game Analysis
"The Value of College Football Recruiting"
Journal of Sports Economics - Correlation between recruiting rankings and team success - Statistical methodology for evaluation - Long-term program building analysis → Chapter 20: Further Reading - Recruiting Analytics
Daniel Kahneman - Cognitive biases in decision-making - Risk aversion and probability assessment - Applications to coaching decisions → Chapter 10: Further Reading and Resources
"Tracking Data Analysis in Football"
Bornn, L., Cervone, D., & Fernandez, J. (2018) - Comprehensive overview of tracking data applications in sports - Framework for spatial analysis → Chapter 24: Further Reading
Final scores agreed 100% across all sources - Scores are the most verified/visible statistic - **Recommendation:** Any source is reliable for scores → Case Study: Comparing Data Sources for Accuracy
2. Yardage Statistics Have Minor Discrepancies
~11% of yardage comparisons showed differences - Most differences were small (<5 yards) - Differences likely due to definitional variations - **Recommendation:** Use consistent source within a project; document which source → Case Study: Comparing Data Sources for Accuracy
Consistent pattern across multiple games - Likely counts yards differently (e.g., how sack yardage is treated) - **Recommendation:** Don't mix ESPN passing stats with other sources → Case Study: Comparing Data Sources for Accuracy
When there are differences, they're typically small - Both likely use similar underlying sources - **Recommendation:** Either is reliable; CFBD preferred for programmatic access → Case Study: Comparing Data Sources for Accuracy
@benbbaldwin
nflfastR creator 2. **@thomasmock** - Sports data visualization 3. **@PFF** - Pro Football Focus 4. **@SethWalder** - ESPN analytics 5. **@CamPen66** - Expected rushing models → Further Reading: Rushing and Running Game Analysis
Play-by-play data filtered to rushing plays - Fields: game_id, team, down, distance, yards_gained, EPA, success - Potentially team schedules and opponent information → Quiz: The Data Landscape of NCAA Football
A) Elite (both metrics above average)
4.6 seconds is excellent hang time; 42 yards is solid distance. → Chapter 10: Quiz
A) Excellent punting with poor coverage
A 5.7-yard difference between gross and net indicates significant return yardage allowed. → Chapter 10: Quiz
Raw make percentages segmented by distance 2. **Pressure Performance** - Accuracy in high-leverage situations 3. **Environmental Adaptability** - Performance in adverse weather/conditions 4. **Consistency Metrics** - Variance in performance week-to-week 5. **Projection Model** - Expected NFL perform → Case Study 1: Evaluating a Kicker for the NFL Draft
4th & 1 anywhere in opponent's territory - 4th & 2 at opponent's 30-50 - 4th & goal from inside the 3 → Chapter 10: Key Takeaways
Almost Always Kick FG:
4th & 6+ at opponent's 15-25 (32-42 yard FG) - 4th & 4+ at opponent's 1-10 (under 30 yard FG) - Final seconds when FG wins/ties → Chapter 10: Key Takeaways
Almost Always Punt:
4th & 7+ from own territory - 4th & 10+ from anywhere except desperate situations - Leading late with chance to pin deep → Chapter 10: Key Takeaways
Always spot-check
Manually verify 5-10% of your data 2. **Use known results** - Cross-check high-profile games with news reports 3. **Document discrepancies** - Note when sources disagree 4. **Be consistent** - Use one source throughout a project 5. **Acknowledge limitations** - Report data source and known issues → Case Study: Comparing Data Sources for Accuracy
ROI calculation for scholarship specialists - Performance prediction models - Program-specific recommendations - Case studies of elite special teams programs → Chapter 10: Exercises
Analysis:
**Anderson** shows remarkable improvement under pressure, suggesting excellent mental composure - **Sterling** has slight regression but maintains competence - **Ramirez** shows significant decline, a concerning pattern for NFL pressure → Case Study 1: Evaluating a Kicker for the NFL Draft
Primary need: Flexible tools for ad-hoc analysis, model development - Time constraints: Variable based on project requirements - Technical sophistication: High; comfortable with code and complex interfaces - Access patterns: Continuous throughout the year → Chapter 27: Building a Complete Analytics System
Primary need: Performance tracking, resource allocation justification - Time constraints: Quarterly and annual reporting cycles - Technical sophistication: Low; need executive summaries - Access patterns: Periodic, often driven by reporting requirements → Chapter 27: Building a Complete Analytics System
Cloud analytics platform skills - **Google Cloud Professional Data Engineer** - Data engineering on GCP - **Kubernetes Administrator (CKA)** - Container orchestration - **PostgreSQL Certification** - Database administration → Chapter 27 Further Reading: Building a Complete Analytics System
On failure the opponent gets the ball at the 42. Their EP of 1.2 there counts as −1.2 from our perspective, so the failure term is subtracted rather than added: EP of attempt = 0.45(3.0) + 0.55(−1.2) = 1.35 − 0.66 = 0.69 (not 1.35 + 0.66 = 2.01)... Actually the question states opponent starts at → Chapter 10: Quiz
`/plays` with filters: year, conference="SEC", playType="Rush" - `/games` for game context and opponent information - Possibly `/teams` for team metadata → Quiz: The Data Landscape of NCAA Football
Probability as long-run frequency - Probability rules (0 ≤ P(A) ≤ 1) - Complement rule: P(not A) = 1 - P(A) - Understanding of independent vs. dependent events → Prerequisites
Basic Statistics
Passing yards, completions, attempts - Rushing yards, carries - Touchdowns, interceptions - Team wins and losses → Prerequisites
Film breakdown including rushing analysis 2. **JT O'Sullivan** - QB School (includes run game concepts) 3. **Baldy Breakdowns** - Brian Baldinger's analysis → Further Reading: Rushing and Running Game Analysis
Parquet for the main play-by-play data (will have many rows) - Reasoning: Smaller file size, faster reads, preserves data types - Could use CSV for smaller reference tables → Quiz: The Data Landscape of NCAA Football
C) Go for it
At opponent's 40, 4th and 3 typically favors going for it (high conversion %, good failure position, FG too long). → Chapter 10: Quiz
Free and open access - Comprehensive historical data - Active development and community - Pre-calculated advanced metrics - Well-documented API → Chapter 2: The Data Landscape of NCAA Football
cfbfastR (R Package)
`https://cfbfastR.sportsdataverse.org/` - College football play-by-play data - Expected points and win probability - Special teams play identification → Chapter 10: Further Reading and Resources
Is he creating yards after contact? 2. **Check RYOE** - Does he beat expectation? 3. **Check Success Rate** - Is he consistent? 4. **Check Situational** - Reliable in critical moments? 5. **Check Durability** - Efficiency across volume? → Key Takeaways: Rushing and Running Game Analysis
Primary need: Actionable insights for game preparation and in-game decisions - Time constraints: Decisions often needed in seconds during games, hours for game planning - Technical sophistication: Variable; prefer visual interfaces over raw data - Access patterns: Heavy use during season, especially → Chapter 27: Building a Complete Analytics System
Collaboration
Work effectively with coaches (often non-technical) - Partner with engineering teams - Navigate organizational politics - Build relationships across departments → Chapter 28: Career Paths in Sports Analytics
**SQLite**: File-based, no server needed, good for personal projects - **PostgreSQL**: Full-featured, good for production systems - **MySQL**: Popular, widely supported → Chapter 2: The Data Landscape of NCAA Football
Translate complex findings for non-technical audiences - Write clear, concise reports - Present confidently to groups - Listen actively to stakeholder needs → Chapter 28: Career Paths in Sports Analytics
Trailing late: More aggressive - Leading late: Consider game state - Elite offense: More aggressive - Poor defense: More aggressive - Weather impact: Adjust probabilities → Chapter 10: Key Takeaways
Lead with genuine passion for the organization - Connect your skills to their specific needs - Reference recent team performance or challenges - Include specific portfolio examples - Keep brief (3-4 paragraphs) → Chapter 28: Career Paths in Sports Analytics
Heavy user during opponent game prep - Primary questions: "What does the opponent do in [situation]?" and "What are their tendencies?" - Needed printable scouting reports - Valued comparative analysis against multiple opponents → Case Study 1: Redesigning a Football Analytics Dashboard
they describe what happened. They've existed as long as football has been played. Box scores in newspapers a century ago recorded rushing yards and passing completions. → Chapter 1: Introduction to College Football Analytics
Design for Failure
Assume components will fail; build resilience 2. **Measure Everything** - You can't improve what you don't measure 3. **Latency is a Feature** - Every millisecond matters in live sports 4. **Data Quality First** - Bad data produces bad insights 5. **Scale Horizontally** - Add capacity by adding mach → Chapter 26 Key Takeaways: Real-Time Analytics Systems
Reduces elite returner impact - Trades distance for field position certainty - Best against specific return threats → Chapter 10: Key Takeaways
Disadvantages:
No data type information (all values are strings) - Larger file sizes than binary formats - Slow for very large datasets - No native support for nested data → Chapter 2: The Data Landscape of NCAA Football
Understanding that data has a "shape" - Familiarity with the normal distribution concept - Awareness that not all data is normally distributed → Prerequisites
One row per possession - Includes start/end field position, result, plays count - Useful for studying possession efficiency - Medium granularity → Chapter 2: The Data Landscape of NCAA Football
Verify all 16 SEC teams are represented - Check for missing values in key fields (yards, down, distance) - Validate yards_gained is within reasonable range (-20 to 99) - Confirm play_type filter worked correctly - Cross-check total rushes against published box scores - Check for duplicate plays → Quiz: The Data Landscape of NCAA Football
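Most of the checks above can be sketched in plain Python. This is a minimal illustration, assuming each play is a dict with the hypothetical keys `play_id`, `team`, `down`, `distance`, `yards_gained`, and `play_type`; the sample rows are invented, and real field names may differ. The box-score cross-check is left out because it needs an external reference.

```python
from collections import Counter


def validate_rushing_plays(plays, expected_teams):
    """Return a list of human-readable problems found in `plays`."""
    problems = []

    # Every expected team should appear at least once.
    seen_teams = {p.get("team") for p in plays}
    missing_teams = sorted(set(expected_teams) - seen_teams)
    if missing_teams:
        problems.append(f"teams with no plays: {missing_teams}")

    # Key fields should not be missing.
    for field in ("yards_gained", "down", "distance"):
        n_missing = sum(1 for p in plays if p.get(field) is None)
        if n_missing:
            problems.append(f"{n_missing} play(s) missing {field}")

    # yards_gained should fall in the plausible range from the checklist.
    n_out_of_range = sum(
        1 for p in plays
        if p.get("yards_gained") is not None
        and not -20 <= p["yards_gained"] <= 99
    )
    if n_out_of_range:
        problems.append(f"{n_out_of_range} play(s) with yards_gained outside -20..99")

    # The play-type filter should have left only rushing plays.
    n_wrong_type = sum(1 for p in plays if p.get("play_type") != "Rush")
    if n_wrong_type:
        problems.append(f"{n_wrong_type} play(s) that are not rushes")

    # No play should appear twice.
    id_counts = Counter(p.get("play_id") for p in plays)
    duplicate_ids = [pid for pid, n in id_counts.items() if n > 1]
    if duplicate_ids:
        problems.append(f"duplicate play ids: {duplicate_ids}")

    return problems


# Invented sample rows (hypothetical data, three teams instead of 16).
sample_plays = [
    {"play_id": 1, "team": "Alabama", "down": 1, "distance": 10, "yards_gained": 4, "play_type": "Rush"},
    {"play_id": 2, "team": "Georgia", "down": 2, "distance": 6, "yards_gained": 120, "play_type": "Rush"},
    {"play_id": 2, "team": "Georgia", "down": 2, "distance": 6, "yards_gained": 120, "play_type": "Rush"},
    {"play_id": 3, "team": "Alabama", "down": None, "distance": 3, "yards_gained": 2, "play_type": "Pass"},
]
problems = validate_rushing_plays(sample_plays, ["Alabama", "Georgia", "LSU"])
for problem in problems:
    print(problem)
```

Returning a list of problems rather than raising on the first one lets a single run report every issue in the pull.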
1.1 What Is Sports Analytics? - 1.1.1 Defining Analytics in Sports - 1.1.2 The Evolution from Statistics to Analytics - 1.1.3 Analytics vs. Traditional Scouting - 1.2 The History of Football Analytics - 1.2.1 Early Statistical Analysis in Football - 1.2.2 The Moneyball Effect on Football - 1.2.3 The → College Football Analytics and Visualization
Estimated Time: 5 hours | Difficulty: Advanced
23.1 Network Concepts in Football - 23.1.1 Network Theory Basics - 23.1.2 Football as a Network - 23.1.3 Types of Football Networks - 23.2 Passing Networks - 23.2.1 Building QB-Receiver Networks - 23.2.2 Network Metrics (Centrality, Clustering) - 23.2.3 Visualizing Passing Networks - 23.2.4 Identify → College Football Analytics and Visualization
Estimated Time: 5 hours | Difficulty: Beginner
2.1 Understanding Football Data - 2.1.1 Play-by-Play Data Structure - 2.1.2 Game-Level vs. Play-Level Data - 2.1.3 Player-Level Data - 2.2 Primary Data Sources - 2.2.1 College Football Data API (CFBD) - 2.2.2 Sports Reference - 2.2.3 ESPN and Official NCAA Statistics - 2.2.4 PFF and Premium Data Pro → College Football Analytics and Visualization
6.1 The Box Score Era - 6.1.1 History of Football Statistics - 6.1.2 What Traditional Stats Capture - 6.1.3 Limitations of Traditional Metrics - 6.2 Offensive Counting Statistics - 6.2.1 Passing: Completions, Attempts, Yards, TDs, INTs - 6.2.2 Rushing: Carries, Yards, TDs - 6.2.3 Receiving: Receptio → College Football Analytics and Visualization
Estimated Time: 6 hours | Difficulty: Advanced
16.1 Understanding Football Spatial Data - 16.1.1 Coordinate Systems - 16.1.2 Tracking Data Concepts - 16.1.3 Working with X-Y Coordinates - 16.2 Drawing the Football Field - 16.2.1 Field Dimensions and Markings - 16.2.2 Creating Field Plots - 16.2.3 Reusable Field Functions - 16.3 Pass Location Ana → College Football Analytics and Visualization
Estimated Time: 6 hours | Difficulty: Beginner
3.1 Setting Up Your Analytics Environment - 3.1.1 Python Installation and Virtual Environments - 3.1.2 Essential Libraries Installation - 3.1.3 IDE Configuration for Data Science - 3.2 pandas Fundamentals - 3.2.1 DataFrames and Series - 3.2.2 Reading and Writing Data - 3.2.3 Indexing and Selection - → College Football Analytics and Visualization
7.1 Beyond Passer Rating - 7.1.1 Problems with Traditional Passer Rating - 7.1.2 The Need for Context - 7.1.3 Framework for Advanced Metrics - 7.2 Air Yards and Depth of Target - 7.2.1 Defining Air Yards - 7.2.2 Intended Air Yards vs. Completed Air Yards - 7.2.3 Average Depth of Target (aDOT) - 7.2. → College Football Analytics and Visualization
Estimated Time: 7 hours | Difficulty: Advanced
22.1 Machine Learning in Sports - 22.1.1 When to Use ML - 22.1.2 ML vs. Traditional Statistics - 22.1.3 Interpretability Considerations - 22.2 Tree-Based Methods - 22.2.1 Decision Trees - 22.2.2 Random Forests - 22.2.3 Gradient Boosting (XGBoost, LightGBM) - 22.2.4 Feature Importance - 22.3 Regulari → College Football Analytics and Visualization
Estimated Time: 8 hours | Difficulty: Advanced
27.1 System Design - 27.1.1 Requirements Gathering - 27.1.2 Architecture Planning - 27.1.3 Technology Selection - 27.2 Data Pipeline Construction - 27.2.1 Data Collection Layer - 27.2.2 Processing and Storage - 27.2.3 Access and API Design - 27.3 Analysis Layer - 27.3.1 Metric Calculations - 27.3.2 → College Football Analytics and Visualization
Salary: $58,000 (significant cut from finance) - Responsibilities: Opponent analysis, draft evaluation support - Initial challenges: Learning team dynamics, earning trust → Case Study 1: From Finance to NFL Analytics
**Right-skewed**: Individual play yards, player salaries, recruiting rankings - **Left-skewed**: Completion percentage (ceiling at 100%), time of possession - **Symmetric**: Point differentials, standardized metrics (z-scores) → Chapter 4: Descriptive Statistics in Football
[ ] Calculate conversion probability - [ ] Determine EP for success - [ ] Determine EP for failure - [ ] Calculate EP for alternatives (FG/punt) - [ ] Choose highest EP option - [ ] Adjust for game context → Chapter 10: Key Takeaways
Four quarters, 15 minutes each - Four downs to gain 10 yards - Scoring: touchdown (6), PAT (1 or 2), field goal (3), safety (2) - Two halves, halftime in between → Prerequisites
1st & 10 at own 25: +0.2 EP - 2nd & 3 at own 32: +0.9 EP - 1st & 10 at own 42: +1.4 EP - 2nd & 8 at own 44: +1.2 EP - 1st & 10 at opponent's 41: +2.1 EP → Chapter 11: Quiz
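The drive above can be turned into per-play EPA with a short sketch. It assumes the usual definition, EPA = EP at the start of the next play minus EP at the start of the current play, and that no score or change of possession occurs between the listed situations; the variable names are illustrative.

```python
# EP at the start of each play, in drive order (values from the list above).
ep_at_snap = [0.2, 0.9, 1.4, 1.2, 2.1]

# EPA of play i = EP before play i+1 minus EP before play i.
epa_per_play = [
    round(after - before, 2)
    for before, after in zip(ep_at_snap, ep_at_snap[1:])
]

# Total EPA over the sequence equals last EP minus first EP.
total_epa = round(sum(epa_per_play), 2)

print(epa_per_play)
print(total_epa)
```

Only four EPA values come out of five situations, because the fifth play has no following state listed.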
Given:
Conversion probability for 4th and 3: 52% - Field goal distance: 49 yards - Field goal make probability (49 yards): 55% - Expected punt net yards from this position: 35 yards - EP at opponent's 32: 2.8 - EP at own 25 (after failed conversion): -0.8 - EP after made field goal: 3.0 - EP after missed f → Chapter 10: Exercises
Limited time—wanted 60-second overview maximum - Primary questions: "Are we improving?" and "Where are we vulnerable?" - Needed talking points for staff meetings - Required export for athletic director reports → Case Study 1: Redesigning a Football Analytics Dashboard
Opponent has ball at your 40-yard line - Expected Points for opponent there: approximately +2.1 EP - Your perspective: -2.1 EP → Case Study: The Fourth Down Revolution
If you go for it:
Probability of conversion × Value of first down at that spot - Probability of failure × Cost of opponent having ball there → Case Study: The Fourth Down Revolution
Distinguish between statistics and analytics in a sports context - Understand the key developments that shaped modern football analytics - Identify real applications of analytics in college football programs - Apply the five-stage analytics workflow to football questions - Consider ethical implicati → Chapter 1: Introduction to College Football Analytics
Increasing Data Availability
Player tracking becoming standard across leagues - Biometric and load management data expanding - Video-synchronized analytics growing - Real-time data feeds improving → Chapter 28: Career Paths in Sports Analytics
Positive covariance: Variables tend to move together - Negative covariance: Variables tend to move opposite - Near zero: Little linear relationship → Chapter 4: Descriptive Statistics in Football
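A small sketch makes the sign interpretation concrete. It computes the sample covariance (dividing by n − 1) by hand; the per-game numbers are invented for illustration and do not come from any real team.

```python
def sample_covariance(xs, ys):
    """Sample covariance of two equal-length sequences (n - 1 denominator)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (len(xs) - 1)


# Hypothetical per-game values for one team.
pass_attempts = [20, 30, 40]
passing_yards = [150, 250, 350]
rush_attempts = [45, 35, 25]

# More attempts go with more yards: positive covariance.
cov_attempts_yards = sample_covariance(pass_attempts, passing_yards)

# More passing goes with less rushing: negative covariance.
cov_pass_rush = sample_covariance(pass_attempts, rush_attempts)

print(cov_attempts_yards, cov_pass_rush)
```

The magnitude depends on the units of both variables, so only the sign is directly comparable across pairs of metrics.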
`gameId` is a string (should be usable for joins) - `down` is a string (should be integer) - `yardsGained` contains "INC" for incomplete passes and has missing values - Team names are inconsistent across rows - No explicit game date or team identifiers → Case Study 1: Building a Season Database from Multiple Sources
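One possible way to handle the type problems listed above is sketched below. The raw keys `gameId`, `down`, and `yardsGained` come from the list; the output keys and the decision to record an incomplete pass as 0 yards plus an `incomplete` flag are assumptions for illustration, not necessarily what the case study does. Team-name harmonization is not covered.

```python
def clean_play(raw):
    """Convert one raw play record (all strings) into typed fields."""
    down_raw = raw.get("down")
    down = int(down_raw) if down_raw not in (None, "") else None

    yards_raw = raw.get("yardsGained")
    incomplete = yards_raw == "INC"
    if incomplete:
        yards = 0          # assumption: an incompletion gains no yards
    elif yards_raw in (None, ""):
        yards = None       # genuinely missing; leave for later inspection
    else:
        yards = int(yards_raw)

    return {
        "game_id": str(raw["gameId"]),  # keep as string so joins are consistent
        "down": down,
        "yards_gained": yards,
        "incomplete": incomplete,
    }


# Invented raw rows showing each problem case.
raw_rows = [
    {"gameId": "401520", "down": "1", "yardsGained": "7"},
    {"gameId": "401520", "down": "2", "yardsGained": "INC"},
    {"gameId": "401520", "down": "3", "yardsGained": None},
]
cleaned = [clean_play(row) for row in raw_rows]
print(cleaned)
```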
**Endpoint**: URL that returns specific data (e.g., `/games`, `/plays`) - **Parameters**: Filters for your request (`year=2023`, `team=Alabama`) - **Rate Limit**: Maximum requests per hour (~1000 for CFBD) → Key Takeaways: The Data Landscape of NCAA Football
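The three terms fit together as in the sketch below, which only builds a request URL and does not call the API. The base URL is an assumption about where the CFBD API is hosted, authentication is omitted, and the request spacing simply spreads the roughly 1000 requests per hour quoted above evenly over the hour.

```python
from urllib.parse import urlencode

# Assumed base URL for the CFBD API; check the official docs before use.
BASE_URL = "https://api.collegefootballdata.com"

# Approximate hourly limit quoted in the text above.
REQUESTS_PER_HOUR = 1000


def build_url(endpoint, **params):
    """Combine an endpoint path and query parameters into a full URL."""
    query = urlencode(params)
    return f"{BASE_URL}{endpoint}?{query}" if query else f"{BASE_URL}{endpoint}"


games_url = build_url("/games", year=2023, team="Alabama")

# Even spacing that would stay under the hourly limit.
min_seconds_between_requests = 3600 / REQUESTS_PER_HOUR

print(games_url)
print(min_seconds_between_requests)
```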
Key Observations:
**Anderson** is virtually unaffected by weather conditions - **Sterling** and **Ramirez** show significant weather-related regression - Anderson's Big Ten experience provides relevant cold-weather data → Case Study 1: Evaluating a Kicker for the NFL Draft
Key Questions:
How well does college EPA predict NFL success? - What other factors improve prediction? - Are there EPA thresholds that indicate NFL readiness? → Chapter 11: Exercises
**Ramirez** has the best long-range percentage and proven range to 57 yards - **Sterling** has solid volume from 50+ - **Anderson** has limited long-range attempts but reliable through 50 → Case Study 1: Evaluating a Kicker for the NFL Draft
Competition for talent increasing - Salaries rising (especially senior roles) - Remote work becoming common - Cross-sport movement growing → Chapter 28: Career Paths in Sports Analytics
add shape, pattern, or labels 2. **Use diverging schemes** for data with meaningful midpoint (EPA at 0) 3. **Use sequential schemes** for magnitude (more → darker) 4. **Test with colorblindness simulators** before publishing 5. **Ensure sufficient contrast** (4.5:1 minimum for text) → Chapter 12: Key Takeaways - Fundamentals of Sports Data Visualization
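The 4.5:1 contrast threshold can be checked programmatically. The sketch below follows the WCAG 2.x definitions of relative luminance and contrast ratio for sRGB colors; the function names are illustrative and colors are given as 8-bit `(r, g, b)` tuples.

```python
def _linearize(channel_8bit):
    """Convert one 8-bit sRGB channel to its linear-light value."""
    c = channel_8bit / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """WCAG relative luminance of an (r, g, b) color."""
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(color_a, color_b):
    """WCAG contrast ratio between two colors (always >= 1)."""
    lighter, darker = sorted(
        (relative_luminance(color_a), relative_luminance(color_b)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


WHITE = (255, 255, 255)
BLACK = (0, 0, 0)
GRAY_767676 = (118, 118, 118)

print(round(contrast_ratio(BLACK, WHITE), 2))
print(contrast_ratio(GRAY_767676, WHITE) >= 4.5)
```

Running label and annotation colors through a check like this before publishing catches low-contrast text that a colorblindness simulator alone may not flag.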
Behind-the-scenes of NFL's real-time tracking infrastructure. - **ESPN Analytics** - Technical posts on live win probability and decision analytics. - **PFF Engineering** - Pro Football Focus technical blog on grading systems. → Chapter 26 Further Reading: Real-Time Analytics Systems
NFL analytics in R - **cfbfastR Tutorial** - College football data in R - **Sports Analytics Course (Coursera)** - University of Michigan - **Open Source Football** - Community tutorials and guides → Chapter 28 Further Reading: Career Paths in Sports Analytics
All data sources have errors or inconsistencies 2. **Score data is most reliable** - Universal agreement on final scores 3. **Yardage stats vary by definition** - Different sources may define stats differently 4. **Validation is essential** - Always cross-check a sample of your data 5. **Document yo → Case Study: Comparing Data Sources for Accuracy
Checks analytics twice daily: morning prep and post-practice - Primary questions: "What worked yesterday?" and "What should we emphasize today?" - Preferred tablet access during film sessions - Wanted one-click access to play video from any data point → Case Study 1: Redesigning a Football Analytics Dashboard
Offensive Personnel:
Returning starting quarterback (senior) - New starting running back (sophomore transfer) - Experienced offensive line (4 returning starters) - New offensive coordinator (first year) → Case Study 2: Diagnosing Offensive Efficiency Decline
Analytics departments growing in size - Integration with coaching improving - Executive buy-in increasing - Professionalization of practices → Chapter 28: Career Paths in Sports Analytics
Other Premium Providers:
**Sports Info Solutions**: Detailed charting data - **Telemetry Sports**: Tracking data - **Pro Football Reference**: NFL data with college crossover → Chapter 2: The Data Landscape of NCAA Football
Outcome Values:
Touchback: 25-yard line - Average return: ~23-yard line - Out of bounds penalty: 35-yard line - Kick return TD: 0 (opponent scores) → Chapter 10: Key Takeaways
Output:
Team-by-team fourth down efficiency - Quantified cost of conservatism - Recommendations for improved decision-making → Chapter 11: Exercises
Over-relying on YPC
Doesn't account for blocking or situation 2. **Ignoring sample size** - Need 100+ carries for stable metrics 3. **Treating all yards equally** - 3rd down conversion > 1st down yards 4. **Forgetting context** - Box count, game script matter 5. **Conflating backs and blocking** - Separate with YBC/YAC → Key Takeaways: Rushing and Running Game Analysis
DataFrames and Series for structured data - Selection with `.loc`, `.iloc`, and boolean indexing - GroupBy for comparative analysis - Merging DataFrames for data enrichment → Chapter 3: Python for Sports Analytics
Panel 1: Radar Profile Comparison
Shows 8 core metrics in spider chart format - Overlays 2-3 prospects simultaneously - Uses consistent normalization (0-100 scale based on position percentiles) → Case Study 1: NFL Draft Comparison Dashboard
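A 0-100 percentile normalization of the kind described for the radar chart might look like the sketch below. It uses one common convention (share of peers below the value, with ties counted as half); the case study may use a different one, and the peer values are invented.

```python
def percentile_score(value, peer_values):
    """Percentile rank (0-100) of `value` within `peer_values`."""
    if not peer_values:
        raise ValueError("peer_values must not be empty")
    below = sum(1 for v in peer_values if v < value)
    equal = sum(1 for v in peer_values if v == value)
    return 100 * (below + 0.5 * equal) / len(peer_values)


# Hypothetical completion percentages for quarterbacks in one draft class.
qb_completion_pct = [58.0, 61.5, 64.0, 66.5, 70.0]

median_prospect = percentile_score(64.0, qb_completion_pct)
top_prospect = percentile_score(70.0, qb_completion_pct)

print(median_prospect, top_prospect)
```

For metrics where lower is better, such as interception rate, the score can be flipped with `100 - percentile_score(...)` so that every radar axis points the same way.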
Panel 2: Percentile Context Chart
Horizontal bars showing percentile ranking for each metric - Color-coded zones (Elite/Above Average/Average/Below Average) - Shows raw values alongside percentiles → Case Study 1: NFL Draft Comparison Dashboard
Panel 3: Historical Comparison
Identifies 5 most similar NFL players based on college profile - Shows how those players performed in NFL careers - Provides statistical similarity scores → Case Study 1: NFL Draft Comparison Dashboard
Both the upside and downside are greater 2. **Turnovers are devastating** - An interception costs 4+ expected points 3. **The average rush is barely positive** - Despite this, rushing has strategic value 4. **Sacks are very costly** - Worse than an incomplete pass → Chapter 11: Efficiency Metrics (EPA, Success Rate)
Requirements gathering and stakeholder interviews - Architecture design and technology selection - Development environment setup - Database schema design and implementation → Chapter 27: Building a Complete Analytics System
Design your structure and validation criteria first 2. **Cache aggressively** - Minimize API calls through local caching 3. **Validate early** - Catch data issues before analysis begins 4. **Document everything** - Future you will thank present you 5. **Use efficient formats** - Parquet saves signif → Case Study: Building a Complete Season Database
Primary need: Prospect evaluation and comparison - Time constraints: Recruiting cycles span months, but individual evaluations needed quickly - Technical sophistication: Moderate; comfortable with databases and reports - Access patterns: Year-round, with peaks during evaluation periods → Chapter 27: Building a Complete Analytics System
Player Statistics
Passing: completions, attempts, yards, TDs, INTs - Rushing: carries, yards, TDs, fumbles - Receiving: receptions, targets, yards, TDs - Defense: tackles, sacks, interceptions, pass breakups → Chapter 2: The Data Landscape of NCAA Football
Receiver values differ significantly 2. **Floor matters for consistency** - Not just ceiling chasers 3. **Weekly updates valuable** - Season-long projections drift 4. **Position scarcity** - TE and QB values depend on format 5. **Calibration critical** - Users trust well-calibrated intervals → Case Study 2: Fantasy Football Projection System
Pressure rate increased 28% → 36% - Time to throw decreased 2.65 → 2.38 seconds - Performance under pressure collapsed - Stacked boxes became more common (teams not respecting pass) - Running game YPC dropped significantly → Case Study 2: Diagnosing Offensive Efficiency Decline
**Go for it:** EP = 0.55(2.6) + 0.45(-0.7) = 1.43 - 0.315 = **1.115** - **Field goal:** EP = 0.48(3.0) + 0.52(-1.5) = 1.44 - 0.78 = **0.66** (Note: from kicking team perspective, opponent at their 40 = -1.5 for us) - **Punt:** EP = **1.5** (opponent at their 8 means +1.5 for kicking team, or we coul → Chapter 10: Quiz
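The go-for-it and field-goal figures above can be reproduced with a two-outcome expected value. The helper name is illustrative, and the punt option is left out because the source line is cut off before its reasoning is complete.

```python
def expected_points(p_success, ep_success, ep_failure):
    """Expected points of a decision with exactly two outcomes."""
    return p_success * ep_success + (1 - p_success) * ep_failure


# Inputs taken from the quiz answer above.
go_for_it_ep = expected_points(0.55, 2.6, -0.7)
field_goal_ep = expected_points(0.48, 3.0, -1.5)

options = {"go for it": go_for_it_ep, "field goal": field_goal_ep}
better_of_the_two = max(options, key=options.get)

for name, ep in options.items():
    print(f"{name}: {ep:.3f}")
print(better_of_the_two)
```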
Store prospect profiles with measurables, ratings, statistics - Track recruiting status and visit history - Handle data from multiple rating services → Chapter 20: Exercises - Recruiting Analytics
Punt Decision:
[ ] Current field position - [ ] Expected net punt yards - [ ] Opponent's starting position EP - [ ] Compare to going for it EP - [ ] Consider time and score → Chapter 10: Key Takeaways
Pandas utilities for data cleaning - Method chaining for cleaning pipelines - https://pyjanitor-devs.github.io/pyjanitor/ - Convenient cleaning functions → Further Reading: Data Cleaning and Preparation
Download from python.org - During installation, check "Add Python to PATH" → Prerequisites
Python 3.9+
Install from python.org 2. **Jupyter Lab** - `pip install jupyterlab` 3. **VS Code** - Excellent for Python development 4. **DB Browser for SQLite** - Visual database tool 5. **Postman** - API testing tool (helpful for exploring endpoints) → Further Reading: The Data Landscape of NCAA Football
Reddit community for football analytics 2. **r/CFBAnalysis** - College football analytics discussion 3. **Football Outsiders Forums** - FO methodology discussion 4. **Fantasy Football Analytics** - Applied rushing metrics → Further Reading: Rushing and Running Game Analysis
Uptime: 99.9% during games, 99% overall - Data backup: Daily automated backups with point-in-time recovery - Disaster recovery: < 4 hour recovery time objective → Chapter 27: Building a Complete Analytics System
Lead with relevant skills and projects - Quantify impact where possible - Highlight sports-specific experience - Include link to portfolio/GitHub - Keep to one page (early career) → Chapter 28: Career Paths in Sports Analytics
Strategies change when leading/trailing 2. **Time remaining** - Late-game plays have different implications 3. **Opponent quality** - Playing elite defense vs poor defense 4. **Weather conditions** - May affect passing vs rushing value → Chapter 11: Efficiency Metrics (EPA, Success Rate)
Elite special teams: +1.5 to +2.0 wins/season - Average special teams: Neutral - Poor special teams: -1.0 to -1.5 wins/season → Chapter 10: Key Takeaways
Game-winning/tying kicks (final 2 min): 8/10 (80.0%) - 4th quarter, within one score: 22/26 (84.6%) - Conference championship games: 5/6 (83.3%) - Adverse weather (rain/wind >15mph): 12/16 (75.0%) → Case Study 1: Evaluating a Kicker for the NFL Draft
Skills Applied:
Understanding expected value calculations - Analyzing decision-making under uncertainty - Recognizing behavioral factors in analytics adoption → Case Study: The Fourth Down Revolution
Software Engineering Daily
Regular episodes on distributed systems and streaming. - **Data Engineering Podcast** - Stream processing and real-time analytics discussions. - **Kubernetes Podcast** - Container orchestration news and interviews. - **The Sports Analytics Podcast** - Industry insights including real-time systems. → Chapter 26 Further Reading: Real-Time Analytics Systems
Solution:
Used color schemes that remain distinguishable in grayscale - Added value labels so color wasn't sole encoding - Tested printability at multiple DPI settings → Case Study 1: NFL Draft Comparison Dashboard
Solutions:
Document your definitions explicitly - Check source documentation - Be cautious when combining data from different sources - Note definitional changes in historical analyses → Chapter 2: The Data Landscape of NCAA Football
4th & 1-2 at opponent's 26-40: Go for it unless trailing by 7+ in final 2 minutes - 4th & 3 at opponent's 26-40: Go for it if not in FG range (<45 yards) - 4th & 4 at opponent's 26-40: Evaluate based on down/distance conversion history → Case Study 2: Fourth Down Decision Analysis for a Championship Team
`https://www.sports-reference.com/cfb/` - Historical kicking statistics - Punting records and averages - Team special teams rankings → Chapter 10: Further Reading and Resources
Sports Reference / College Football Reference
https://www.sports-reference.com/cfb/ - Comprehensive historical statistics - Box scores, season totals, career data - Free access to most statistics → Further Reading: Traditional Football Statistics
No official API (must scrape or use unofficial tools) - No play-by-play data - Terms of service restrict automated access - Pre-aggregated data only → Chapter 2: The Data Landscape of NCAA Football
Understand available endpoints 2. **Complete Chapter 2 exercises** - Hands-on API practice 3. **Read pandas documentation** - Prepare for Chapter 3 4. **Explore r/CFBAnalysis** - See how others use the data 5. **Build Case Study 1 project** - Reinforce learning with real project → Further Reading: The Data Landscape of NCAA Football
Heavy regression needed 2. **Development curves add significant value** - Especially for underclassmen 3. **Context matters** - OL and WR quality adjustments improve accuracy 4. **Confidence intervals well-calibrated** - System uncertainty appropriate → Case Study 1: Building a QB Projection System
Team Analysis:
Overall offensive efficiency - Passing vs rushing efficiency - Situational performance (down, field position) - Red zone efficiency → Chapter 11: Key Takeaways
Team and Culture
Analytics team size and experience - Leadership support for analytics - Collaboration with coaches/front office - Work environment and hours → Chapter 28: Career Paths in Sports Analytics
Computer vision enabling new analyses - Deep learning improving prediction - Natural language processing for scouting - Edge computing for in-game analytics → Chapter 28: Career Paths in Sports Analytics
Modern college football favors touchbacks - Touchback vs return break-even: ~22 yard line - Elite coverage units can justify kicking returnable kicks → Chapter 10: Key Takeaways
Average travel increase: 340 miles per road game - Largest increase: Oregon/Washington to Big Ten (+1,200 miles avg) - Some decreases: Colorado to Big 12 (-180 miles avg) → Case Study 2: Conference Realignment Impact Analysis
Follow #sportsanalytics, #NFLanalytics, #CFBanalytics - **Reddit r/sportsanalytics** - Discussion forum - **Sports Analytics Discord** - Real-time community chat - **LinkedIn Groups** - Sports Analytics Professionals, Football Analytics → Chapter 28 Further Reading: Career Paths in Sports Analytics
Uber Engineering Blog
Real-time systems at scale, including geospatial streaming. - **Netflix Tech Blog** - Stream processing and real-time personalization. - **LinkedIn Engineering** - Kafka and real-time data infrastructure. → Chapter 26 Further Reading: Real-Time Analytics Systems
Standard deviation: Typical distance from the mean (square root of the variance) - Variance: Average squared deviation from the mean - Range/IQR: Spread of values - Coefficient of variation: Variability relative to the mean (standard deviation ÷ mean) → Chapter 4: Descriptive Statistics in Football
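These spread measures can all be computed with Python's standard `statistics` module. The sketch below uses the sample versions (n − 1 denominator) and the module's default quartile method; the yardage values are invented for illustration.

```python
import statistics

# Hypothetical yards gained on five carries.
yards = [2, 4, 4, 6, 9]

mean_yards = statistics.mean(yards)
variance = statistics.variance(yards)      # sample variance
std_dev = statistics.stdev(yards)          # square root of the sample variance
value_range = max(yards) - min(yards)

# Quartiles with the module's default ("exclusive") method.
q1, _median, q3 = statistics.quantiles(yards, n=4)
iqr = q3 - q1

# Coefficient of variation: spread relative to the mean.
coefficient_of_variation = std_dev / mean_yards

print(mean_yards, variance, value_range)
print(round(std_dev, 3), iqr, round(coefficient_of_variation, 3))
```

The coefficient of variation is only meaningful when the mean is clearly positive, which is a caution for metrics like EPA that center near zero.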
Play-by-play data from 2001 to present - Game results and box scores - Team and player statistics - Recruiting data and rankings - Betting lines and spreads - Pre-calculated advanced metrics (EPA, WPA, etc.) - Draft and NFL data for former college players → Chapter 2: The Data Landscape of NCAA Football
What PFF Provides:
Play-by-play grades for every player - Detailed charting (coverage assignments, pressure, etc.) - Snap counts by position - Premium metrics (grades 0-100 for each player) → Chapter 2: The Data Landscape of NCAA Football
What Sports Reference Provides:
Team and player statistics back to 1869 - Game logs and box scores - Historical records and milestones - Award voting results - Conference standings and results - Bowl game history → Chapter 2: The Data Landscape of NCAA Football
What's Available:
Official NCAA statistics and records - ESPN's Team and Player pages - Real-time game updates - QBR and other ESPN-specific metrics - Depth charts and injury reports → Chapter 2: The Data Landscape of NCAA Football