Case Study 2: Building a Real-Time RAPM System for Player Evaluation

Introduction

This case study follows the development of a production RAPM system for a fictional NBA team, the Metro City Navigators. We'll walk through the complete process from data acquisition to deployment, highlighting the practical challenges and decisions faced when implementing RAPM in a professional sports analytics environment.

Project Context

Team Situation

The Metro City Navigators finished the previous season with a 38-44 record, missing the playoffs by 3 games. The front office has identified several key decisions for the upcoming offseason:

  1. Whether to re-sign their starting center (30 years old, seeking max contract)
  2. Evaluating three free agent targets at different salary levels
  3. Identifying undervalued role players for roster depth
  4. Assessing young players' development trajectories

The analytics department has been tasked with building an in-house RAPM system to inform these decisions.

Phase 1: Data Acquisition and Preparation

Data Sources

The team's analytics department secured access to the following data:

Primary Sources:

  - Play-by-play data from the official NBA feed (via Stats LLC)
  - Player tracking data (Second Spectrum)
  - Historical box scores (10 years)

Supplementary Sources:

  - Player biographical data (height, weight, position, age)
  - Team schedule and travel data
  - Injury reports

Data Structure

Raw play-by-play data arrives in the following format:

game_id: 0022200001
event_num: 45
event_type: shot
player_id: 203076
team_id: 1610612744
x_coord: 23.5
y_coord: 8.2
timestamp: 720.4
description: "Anthony Davis 17' pull-up jump shot"
result: made
home_score: 45
away_score: 48

Stint Construction Algorithm

The first technical challenge is transforming play-by-play events into stints:

import pandas as pd

def construct_stints(play_by_play_df, game_id, home_starters, away_starters,
                     home_team_id):
    """
    Transform play-by-play data into stint-level observations.

    A stint is a continuous period with the same 10 players on court.
    Starting lineups must be supplied: the play-by-play feed records
    substitutions, not who began the game.
    """
    stints = []

    # Track current lineups, seeded with the starting fives
    home_lineup = set(home_starters)
    away_lineup = set(away_starters)

    def open_stint(event):
        return {
            'game_id': game_id,
            'start_time': event['timestamp'],
            'start_home_score': event['home_score'],
            'start_away_score': event['away_score'],
            'home_lineup': frozenset(home_lineup),
            'away_lineup': frozenset(away_lineup)
        }

    current_stint = None

    for _, event in play_by_play_df.iterrows():
        # Open the first stint on the game's first event
        if current_stint is None:
            current_stint = open_stint(event)

        # Check for substitution
        if event['event_type'] == 'substitution':
            # End current stint
            current_stint['end_time'] = event['timestamp']
            current_stint['end_home_score'] = event['home_score']
            current_stint['end_away_score'] = event['away_score']
            stints.append(current_stint)

            # Update lineup (discard, not remove: the feed occasionally
            # drops substitution events, so the player may be absent)
            if event['team_id'] == home_team_id:
                home_lineup.discard(event['player_out'])
                home_lineup.add(event['player_in'])
            else:
                away_lineup.discard(event['player_out'])
                away_lineup.add(event['player_in'])

            # Start new stint with the updated lineups
            current_stint = open_stint(event)

        # Track scoring events for possession estimation
        if event['event_type'] in ['shot', 'free_throw', 'turnover']:
            # Update possession counters (omitted here)
            pass

    # Close the final stint at the end of the game
    if current_stint is not None:
        last_event = play_by_play_df.iloc[-1]
        current_stint['end_time'] = last_event['timestamp']
        current_stint['end_home_score'] = last_event['home_score']
        current_stint['end_away_score'] = last_event['away_score']
        stints.append(current_stint)

    return pd.DataFrame(stints)

Data Quality Challenges

The team encountered several data quality issues:

Challenge 1: Missing Substitution Events

Approximately 2% of substitutions were missing from the play-by-play feed, causing incorrect lineup tracking.

Solution: Cross-reference with box score data and player-tracking presence data. When discrepancies are detected, interpolate the substitution timing from the tracking data.

Challenge 2: Overtime and Technical Foul Periods

Overtime and technical foul free throws create edge cases in stint construction.

Solution: Treat overtime as separate "quarters" with fresh stint initialization. Exclude technical foul possessions from analysis.

Challenge 3: Garbage Time

Blowout games contain minutes where teams rest starters, potentially biasing RAPM estimates.

Solution: Flag stints where the score differential exceeds 25 points in the fourth quarter, then run the sensitivity analysis with and without these stints.
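This flagging rule is straightforward to implement; a minimal sketch, assuming the stint table carries a `period` column alongside the start scores (the numbers below are illustrative):

```python
import pandas as pd

def flag_garbage_time(stints_df, margin=25):
    """Flag fourth-quarter stints whose score differential exceeds
    `margin`, so sensitivity runs can include or exclude them."""
    out = stints_df.copy()
    diff = (out['start_home_score'] - out['start_away_score']).abs()
    out['garbage_time'] = (out['period'] == 4) & (diff > margin)
    return out

# Illustrative stints (made-up scores)
stints = pd.DataFrame({
    'period': [4, 4, 2],
    'start_home_score': [110, 98, 40],
    'start_away_score': [82, 95, 41],
})
flagged = flag_garbage_time(stints)
```

Keeping the flag as a column, rather than dropping rows, makes it easy to fit both model variants from the same table.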

Final Dataset Statistics

After processing, the team's dataset included:

| Metric | Value |
|---|---|
| Games processed | 1,230 (1 season) |
| Total stints | 42,847 |
| Unique players | 537 |
| Average stint length | 5.2 possessions |
| Total possessions | 223,124 |

Phase 2: Model Development

Design Matrix Construction

The analytics team built the design matrix using efficient sparse matrix operations:

import numpy as np
from scipy.sparse import csr_matrix, lil_matrix

def build_design_matrix(stints_df, player_ids):
    """
    Build sparse design matrix for RAPM.

    Parameters:
    -----------
    stints_df: DataFrame with home_lineup, away_lineup columns
    player_ids: List of all player IDs in order

    Returns:
    --------
    X: Sparse design matrix (n_stints x n_players)
    """
    n_stints = len(stints_df)
    n_players = len(player_ids)
    player_idx = {pid: i for i, pid in enumerate(player_ids)}

    # Use lil_matrix for efficient construction
    X = lil_matrix((n_stints, n_players), dtype=np.float32)

    for i, row in enumerate(stints_df.itertuples()):
        # Home team players get +1
        for player in row.home_lineup:
            if player in player_idx:
                X[i, player_idx[player]] = 1.0

        # Away team players get -1
        for player in row.away_lineup:
            if player in player_idx:
                X[i, player_idx[player]] = -1.0

    # Convert to CSR for efficient computation
    return csr_matrix(X)
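The design matrix needs a matching response vector and weight vector, a step the case study does not show. A common construction, sketched here under the assumption that the stint table also carries an estimated `possessions` column, is home net points per 100 possessions, weighted by possession count:

```python
import numpy as np
import pandas as pd

def build_target_and_weights(stints_df):
    """Response: home net points per 100 possessions for each stint.
    Weights: estimated possessions, so longer stints count more."""
    home_pts = stints_df['end_home_score'] - stints_df['start_home_score']
    away_pts = stints_df['end_away_score'] - stints_df['start_away_score']
    poss = stints_df['possessions'].clip(lower=1)  # guard against zero-possession stints
    y = 100.0 * (home_pts - away_pts) / poss
    return y.to_numpy(dtype=float), poss.to_numpy(dtype=float)

# Tiny example: home outscores away 8-4 over an 8-possession stint
example = pd.DataFrame({
    'start_home_score': [10], 'end_home_score': [18],
    'start_away_score': [12], 'end_away_score': [16],
    'possessions': [8],
})
y, w = build_target_and_weights(example)
```

Scaling to 100 possessions keeps the response on the same per-100 scale as the reported RAPM coefficients, while the possession weights prevent very short stints from dominating the fit.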

Regularization Parameter Selection

The team implemented cross-validation to select the optimal lambda:

from sklearn.model_selection import KFold
from sklearn.linear_model import Ridge
import numpy as np

def cross_validate_lambda(X, y, weights, lambda_values, n_folds=5):
    """
    Select optimal lambda using k-fold cross-validation.
    """
    kf = KFold(n_splits=n_folds, shuffle=True, random_state=42)

    cv_scores = {lam: [] for lam in lambda_values}

    for train_idx, val_idx in kf.split(X):
        X_train, X_val = X[train_idx], X[val_idx]
        y_train, y_val = y[train_idx], y[val_idx]
        w_train, w_val = weights[train_idx], weights[val_idx]

        for lam in lambda_values:
            model = Ridge(alpha=lam, fit_intercept=False)
            model.fit(X_train, y_train, sample_weight=w_train)

            # Weighted MSE on validation set
            predictions = model.predict(X_val)
            mse = np.average((y_val - predictions)**2, weights=w_val)
            cv_scores[lam].append(mse)

    # Average across folds
    mean_scores = {lam: np.mean(scores) for lam, scores in cv_scores.items()}
    optimal_lambda = min(mean_scores, key=mean_scores.get)

    return optimal_lambda, mean_scores

Cross-Validation Results:

| Lambda | CV MSE | Relative to Best |
|---|---|---|
| 100 | 156.3 | +12.4% |
| 500 | 142.8 | +2.7% |
| 1,000 | 139.1 | Best |
| 2,500 | 141.5 | +1.7% |
| 5,000 | 148.2 | +6.5% |
| 10,000 | 162.7 | +17.0% |

The optimal lambda of 1,000 was selected for production models.

Model Fitting and Results

The final model was fit using scikit-learn's Ridge implementation:

from sklearn.linear_model import Ridge

def fit_rapm(X, y, weights, lambda_reg=1000):
    """
    Fit RAPM model and return player coefficients.
    """
    model = Ridge(alpha=lambda_reg, fit_intercept=False)
    model.fit(X, y, sample_weight=weights)
    return model.coef_
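Turning the fitted coefficients into a player ranking is then a small step. The sketch below is self-contained, with a toy design matrix and made-up player IDs rather than the team's actual data:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge

# Toy stint data: 3 players, +1 when on court for the home side, -1 for away.
# Player IDs and all numbers are illustrative.
player_ids = [101, 202, 303]
X = np.array([
    [ 1.0, -1.0,  0.0],
    [ 1.0,  0.0, -1.0],
    [ 0.0,  1.0, -1.0],
    [-1.0,  1.0,  0.0],
])
y = np.array([6.0, 4.0, -2.0, -5.0])   # net points per 100 possessions
w = np.array([10.0, 8.0, 12.0, 9.0])   # possession weights

model = Ridge(alpha=1.0, fit_intercept=False)
model.fit(X, y, sample_weight=w)

rankings = (pd.DataFrame({'player_id': player_ids, 'rapm': model.coef_})
            .sort_values('rapm', ascending=False)
            .reset_index(drop=True))
```

The same pattern scales directly to the sparse production matrix: attach `model.coef_` to the ordered `player_ids` list and sort.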

Team's Key Players - RAPM Results:

| Player | RAPM | O-RAPM | D-RAPM | Minutes | Role |
|---|---|---|---|---|---|
| Marcus Thompson | +4.2 | +5.8 | -1.6 | 2,634 | Star Guard |
| DeAndre Jordan | +1.8 | +0.5 | +1.3 | 2,412 | Starting Center |
| Jaylen Williams | +2.9 | +2.1 | +0.8 | 2,187 | Starting SF |
| Chris Lopez | +0.5 | +1.2 | -0.7 | 1,893 | Starting PF |
| David Kim | +1.2 | +0.8 | +0.4 | 1,567 | 6th Man |
| Rest of roster | -0.3 | … | … | … | Various |
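The table reports separate O-RAPM and D-RAPM values, which a single coefficient per player cannot produce. One standard approach (the case study does not specify the team's exact method) is to give each player separate offense and defense columns and split every stint into two rows, one per attacking team:

```python
import numpy as np
from scipy.sparse import lil_matrix, csr_matrix

def build_od_design_matrix(stints, player_ids):
    """stints: list of (home_lineup, away_lineup) tuples.
    Each stint contributes two rows: home on offense, then away on
    offense. Player j gets column j for offense and column n + j
    for defense."""
    n = len(player_ids)
    idx = {pid: i for i, pid in enumerate(player_ids)}
    X = lil_matrix((2 * len(stints), 2 * n), dtype=np.float32)
    for s, (home, away) in enumerate(stints):
        for p in home:
            X[2 * s, idx[p]] = 1.0          # home attacks: home O columns
            X[2 * s + 1, n + idx[p]] = 1.0  # away attacks: home D columns
        for p in away:
            X[2 * s, n + idx[p]] = 1.0      # home attacks: away D columns
            X[2 * s + 1, idx[p]] = 1.0      # away attacks: away O columns
    return csr_matrix(X)

# Shape check with a deliberately undersized example (one player per side)
X_od = build_od_design_matrix([({101}, {202})], [101, 202])
```

Fitting ridge regression on this matrix against each row's offensive points per 100 possessions yields `coef_[:n]` as offensive estimates and `coef_[n:]` as defensive ones; since a positive defensive coefficient means opponents score more, the defensive signs are flipped to report a "higher is better" D-RAPM.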

Phase 3: Application to Offseason Decisions

Decision 1: Re-signing the Starting Center

Player Profile:

  - DeAndre Jordan, 30 years old
  - Seeking 4-year, $85M contract
  - RAPM: +1.8 (O-RAPM: +0.5, D-RAPM: +1.3)

RAPM Analysis:

The team projected Jordan's RAPM trajectory using age curves:

| Age | Projected RAPM | Contract Year |
|---|---|---|
| 30 | +1.8 | Year 1 |
| 31 | +1.4 | Year 2 |
| 32 | +0.9 | Year 3 |
| 33 | +0.4 | Year 4 |

Value Calculation:

Using $3.5M per win and expected minutes:

| Year | RAPM | Minutes | Wins Added | Value |
|---|---|---|---|---|
| 1 | +1.8 | 2,400 | 8.0 | $28.0M |
| 2 | +1.4 | 2,200 | 5.7 | $19.9M |
| 3 | +0.9 | 2,000 | 3.3 | $11.6M |
| 4 | +0.4 | 1,800 | 1.3 | $4.6M |
| Total | | | 18.3 | $64.1M |
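The arithmetic behind this table can be reproduced with a simple conversion. The team's exact minutes-to-wins translation is not stated, so the constant below (540 RAPM-minutes per win) is an assumption chosen only so the sketch matches the table's shape:

```python
# Hypothetical conversion: wins added ~= RAPM * minutes / 540. The 540 is
# an assumed constant; the team's actual minutes-to-wins translation is
# not given in the case study.
DOLLARS_PER_WIN = 3.5e6

def contract_value(projections, minutes_per_win_unit=540.0):
    """projections: list of (projected_rapm, projected_minutes) per year.
    Returns (total_wins_added, total_dollar_value)."""
    wins = [rapm * minutes / minutes_per_win_unit
            for rapm, minutes in projections]
    total_wins = sum(wins)
    return total_wins, total_wins * DOLLARS_PER_WIN

jordan_projection = [(1.8, 2400), (1.4, 2200), (0.9, 2000), (0.4, 1800)]
total_wins, total_value = contract_value(jordan_projection)
```

Under these assumptions the four-year total comes out near 18.4 wins and roughly $64M, well short of the $85M ask.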

Recommendation: The contract ($85M) exceeds projected value ($64M). Recommend either a shorter term (2 years) or lower annual value (max $18M/year).

Decision 2: Free Agent Evaluation

Candidate A: James Robinson (SG)

  - Age: 27, seeking $18M/year
  - RAPM: +2.4 (O-RAPM: +3.1, D-RAPM: -0.7)

Candidate B: Michael Chen (SF)

  - Age: 25, seeking $12M/year
  - RAPM: +1.6 (O-RAPM: +0.9, D-RAPM: +0.7)

Candidate C: Andre Williams (PF)

  - Age: 29, seeking $8M/year
  - RAPM: +0.8 (O-RAPM: +0.3, D-RAPM: +0.5)

Comparative Analysis:

| Player | RAPM | Contract | $ per Win | SE(RAPM) | Value Rating |
|---|---|---|---|---|---|
| Robinson | +2.4 | $18M | $3.4M | 1.2 | Fair |
| Chen | +1.6 | $12M | $3.8M | 1.5 | Good |
| Williams | +0.8 | $8M | $5.0M | 1.8 | Fair |

Uncertainty Analysis:

Given the standard errors, the team calculated the probability that each player would provide positive value at the proposed contract:

| Player | P(RAPM > break-even) |
|---|---|
| Robinson | 78% |
| Chen | 71% |
| Williams | 62% |
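Probabilities like these follow from a normal approximation on the RAPM estimate. A sketch, using the comparison table's RAPM and SE values but a hypothetical break-even threshold (the contract-implied break-evens are not stated):

```python
from scipy.stats import norm

def prob_above_breakeven(rapm, se, breakeven):
    """P(true RAPM > break-even), treating the estimate as normally
    distributed around the true value with standard deviation se."""
    return 1.0 - norm.cdf((breakeven - rapm) / se)

# RAPM and SE from the table; the break-even value of 1.5 is a
# hypothetical threshold for illustration.
p_robinson = prob_above_breakeven(2.4, 1.2, 1.5)
```

Larger standard errors pull these probabilities toward 50%, which is why Williams, despite a positive point estimate, clears his (lower) bar with the least confidence.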

Recommendation: Prioritize Robinson for impact, Chen for value/upside. Williams only if other options fall through.

Decision 3: Identifying Undervalued Role Players

The analytics team searched for players with:

  - RAPM > +1.0
  - Salary < $5M
  - Age < 28
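These criteria translate directly into a pandas filter; a minimal sketch over a hypothetical player table with assumed column names (the ages below are made up for illustration):

```python
import pandas as pd

def screen_role_players(players_df, min_rapm=1.0, max_salary=5e6, max_age=28):
    """Apply the three screening criteria; column names are assumed."""
    mask = (
        (players_df['rapm'] > min_rapm)
        & (players_df['salary'] < max_salary)
        & (players_df['age'] < max_age)
    )
    return players_df[mask].sort_values('rapm', ascending=False)

# Illustrative table; 'Veteran X' fails the salary and age screens.
players = pd.DataFrame({
    'name': ['Tyler Brooks', 'Marcus Green', 'Veteran X'],
    'rapm': [1.8, 1.4, 2.0],
    'salary': [2.1e6, 3.5e6, 14.0e6],
    'age': [24, 26, 31],
})
targets = screen_role_players(players)
```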

Hidden Gems Identified:

| Player | Team | RAPM | Salary | SE |
|---|---|---|---|---|
| Tyler Brooks | PHX | +1.8 | $2.1M | 2.1 |
| Marcus Green | MEM | +1.4 | $3.5M | 1.8 |
| Kevin Park | OKC | +1.2 | $1.9M | 2.4 |

Caveat: High standard errors suggest these estimates may be lucky outliers. The team recommended targeting multiple players to diversify risk.

Decision 4: Young Player Development Assessment

Rookie Analysis:

The team's lottery pick, Jason Williams (age 20), had mixed results:

| Metric | Value | League Rank |
|---|---|---|
| RAPM | -1.2 | 45th percentile |
| O-RAPM | +0.5 | 62nd percentile |
| D-RAPM | -1.7 | 28th percentile |
| Minutes | 1,450 | Starter-level |

Interpretation Challenges:

  1. Sample size: First-year players have high RAPM uncertainty
  2. Role effects: Young players often in developmental situations
  3. Trajectory matters: Current RAPM less important than improvement rate

Development Projection:

Using historical data on similar prospects, the team projected:

| Year | Age | Projected RAPM | 90% CI |
|---|---|---|---|
| Current | 20 | -1.2 | [-3.5, +1.1] |
| Year 2 | 21 | +0.3 | [-1.8, +2.4] |
| Year 3 | 22 | +1.5 | [-0.5, +3.5] |
| Year 4 | 23 | +2.4 | [+0.3, +4.5] |

Recommendation: Patience warranted. Offensive RAPM suggests scoring ability; defensive RAPM likely to improve with experience. Project as average-to-good starter by year 3.

Phase 4: System Deployment

Production Architecture

The team deployed their RAPM system with the following components:

┌─────────────────────────────────────────────────────────┐
│                    Data Pipeline                         │
├─────────────────────────────────────────────────────────┤
│  NBA API → ETL → PostgreSQL → Feature Store → Model    │
│     ↓         ↓          ↓            ↓          ↓     │
│  Live PBP  Stints    Player DB    Matrices    RAPM     │
└─────────────────────────────────────────────────────────┘
                           ↓
┌─────────────────────────────────────────────────────────┐
│                    Applications                          │
├─────────────────────────────────────────────────────────┤
│  Dashboard │ Contract Model │ Draft Board │ Scout API  │
└─────────────────────────────────────────────────────────┘

Update Schedule

| Update Type | Frequency | Latency |
|---|---|---|
| Game-level | Nightly | 4 hours |
| Full model | Weekly | 2 hours |
| Historical | Monthly | 8 hours |

Validation Protocol

Before deployment, the team validated their system against:

  1. External RAPM sources: Correlation of r=0.94 with published RAPM
  2. Future performance: Out-of-sample R² of 0.18 predicting next-season on/off
  3. Expert review: Front office confirmed face validity of rankings

Results and Lessons Learned

Offseason Outcomes

Following the RAPM-informed decisions:

  1. Center decision: Team let Jordan walk; signed younger replacement at $12M
  2. Free agency: Signed James Robinson; outbid for Michael Chen
  3. Role players: Acquired Tyler Brooks (trade) and Kevin Park (FA)
  4. Rookie development: Maintained starter minutes for Jason Williams

Season Results

The following season, Metro City finished 46-36 (+8 wins):

| Category | Previous Season | Current Season |
|---|---|---|
| Record | 38-44 | 46-36 |
| Off Rating | 110.2 | 113.8 |
| Def Rating | 112.5 | 110.6 |
| Net Rating | -2.3 | +3.2 |

Key Learnings

1. RAPM is necessary but not sufficient

RAPM identified value but couldn't predict chemistry, health, or system fit. The Robinson signing worked partly due to compatibility with existing players—a factor requiring qualitative assessment.

2. Uncertainty quantification is critical

Several "undervalued" players with high RAPM standard errors regressed. Diversifying acquisitions across multiple targets mitigated this risk.

3. Context requires human interpretation

RAPM flagged Jason Williams' poor defensive RAPM, but coaches identified correctable issues (positioning, effort) rather than fundamental limitations. Human expertise complemented statistical findings.

4. System maintenance requires ongoing investment

Data quality issues emerged throughout the season (new arena camera configurations, substitution pattern changes). Dedicated engineering support was essential.

Conclusion

Building a production RAPM system requires balancing statistical rigor with practical constraints. The Metro City Navigators' experience demonstrates that RAPM can provide valuable decision support when properly implemented and interpreted. The key is treating RAPM as one input among many, acknowledging uncertainty, and maintaining human judgment in the evaluation process.


Discussion Questions

  1. How should the team weight RAPM versus traditional statistics in contract negotiations?

  2. What additional data sources might improve RAPM estimates for young players?

  3. How should injury risk be incorporated into RAPM-based projections?

  4. What safeguards prevent over-reliance on a single metric like RAPM?

  5. How might opposing teams adjust their strategies knowing RAPM is being used for evaluation?

Technical Exercise

Implement the complete RAPM pipeline described in this case study:

  1. Download sample play-by-play data (available from Basketball Reference or NBA API)
  2. Build the stint construction algorithm
  3. Create the sparse design matrix
  4. Implement cross-validation for lambda selection
  5. Fit the final model and generate player rankings
  6. Compare your results to published RAPM values