Getting Data from Baseball Savant
Beginner
Nov 26, 2025
# Baseball Savant Data: Complete Guide to MLB's Advanced Analytics Platform
## What is Baseball Savant
Baseball Savant is Major League Baseball's official platform for accessing and visualizing Statcast data, the revolutionary tracking technology that captures detailed information about every play in every MLB game. Launched publicly in 2015, Baseball Savant has become the premier destination for baseball analysts, journalists, teams, and fans seeking the most granular baseball data available.
### The Platform's Purpose
Baseball Savant serves as the public-facing interface for Statcast technology, which uses high-resolution cameras and radar equipment installed in all 30 MLB stadiums. The platform provides:
- **Real-time tracking data** for every pitch, batted ball, and player movement
- **Historical Statcast data** dating back to 2015
- **Interactive visualizations** that make complex metrics accessible
- **Downloadable datasets** for custom analysis
- **Leaderboards** across dozens of performance categories
- **Player pages** with comprehensive statistical profiles
The platform democratizes access to data that was previously available only to MLB teams, enabling advanced analysis by the broader baseball community.
### Evolution and Improvements
Since its launch, Baseball Savant has continuously expanded its offerings. Early versions focused primarily on exit velocity and launch angle for batted balls. Modern Baseball Savant includes pitch movement, catcher framing metrics, sprint speed, defensive positioning, pitch arsenals, and much more. The platform receives regular updates to improve data quality, add new metrics, and enhance user experience.
## Statcast Data Overview
Statcast is the tracking system that powers Baseball Savant, using a combination of camera and radar systems to capture data at unprecedented detail.
### How Statcast Works
**Technology Stack:**
- **Trackman radar systems** track ball flight (used 2015-2019)
- **Hawk-Eye camera systems** now primary tracking technology (2020-present)
- **High-speed cameras** capture 30,000+ data points per second
- **Machine learning algorithms** process and validate data
**Data Capture Process:**
1. Cameras/radar track every object on the field
2. System identifies players, ball, and equipment
3. Algorithms calculate positions, velocities, and trajectories
4. Data is validated and quality-checked
5. Information is made available through Baseball Savant
### Types of Statcast Data
**Pitch Tracking:**
- Release point (x, y, z coordinates)
- Release speed and spin rate
- Pitch movement (horizontal and vertical break)
- Plate location
- Pitch classification
- Time to plate
**Batted Ball Tracking:**
- Exit velocity (speed off the bat)
- Launch angle (vertical angle)
- Launch direction (horizontal angle)
- Distance and hang time
- Hit probability and expected outcomes
- Barrel classification
**Player Movement:**
- Sprint speed (feet per second)
- Route efficiency
- Reaction time
- Lead distance for baserunners
- Catcher pop time and exchange time
- Outfielder jump and route metrics
**Defensive Positioning:**
- Player positions at pitch release
- Shift classifications
- Positioning impact on outcomes
### Data Quality and Limitations
While Statcast data is highly accurate, users should be aware of certain limitations:
- **Technology transition**: Data from 2015-2019 (Trackman) differs slightly from 2020+ (Hawk-Eye)
- **Missing data**: Occasional technical issues result in null values
- **Classification accuracy**: Pitch types and batted ball classifications can have errors
- **Sample size**: Single-game or small sample metrics can be noisy
- **Context**: Statcast measures what happened, not always why it happened
## Search Query Interface
Baseball Savant's search interface allows users to filter and download custom datasets based on specific criteria.
### Accessing the Search Tool
Navigate to baseballsavant.mlb.com/statcast_search to access the primary search interface. This tool is the most powerful way to query Statcast data interactively.
### Query Parameters
**Date Range:**
- Select specific dates or full seasons
- Filter by regular season, playoffs, or all games
- Most data available from 2015 onwards
**Player Filters:**
- Search by pitcher name
- Search by batter name
- Filter by team (pitching or batting)
- Multiple players can be selected
**Count Filters:**
- Balls and strikes count
- Outs (0, 1, 2)
- Runners on base
- Inning
**Pitch Filters:**
- Pitch type (fastball, slider, curveball, etc.)
- Pitch result (called strike, swinging strike, foul, in play)
- Velocity ranges
- Spin rate ranges
**Batted Ball Filters:**
- Exit velocity ranges
- Launch angle ranges
- Hit distance
- Batted ball type (ground ball, line drive, fly ball, popup)
- Hit result (out, single, double, triple, home run)
**Advanced Filters:**
- Game type (regular season, playoffs)
- Home/away
- Pitch hand (left/right)
- Bat side (left/right/switch)
- Barrel classification
- Hard-hit classification (95+ mph)
### Building Effective Queries
**Example Query 1: High-Velocity Fastballs**
- Pitch Type: Four-Seam Fastball
- Velocity: 98+ mph
- Season: 2024
- Result: Download all pitches meeting criteria
**Example Query 2: Barrel Analysis**
- Exit Velocity: 98+ mph
- Launch Angle: 26-30 degrees
- Player: Specific batter
- Season: 2024
- Result: Identify barrel rates
**Example Query 3: Breaking Ball Effectiveness**
- Pitch Type: Slider, Curveball
- Count: Two strikes
- Result: Called strike, swinging strike, foul
- Analysis: Which pitchers excel with breaking balls
### Downloading Query Results
Search results can be downloaded as CSV files containing:
- Pitch-by-pitch data with all Statcast measurements
- Game context (date, teams, inning, score)
- Player information
- Outcome data
- Up to 50,000 rows per download (pagination for larger sets)
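One way to stay under a per-download row cap is to split a long date range into shorter windows and request each window separately. The sketch below only builds the windows and makes no network calls; the helper name `date_chunks` and the 7-day default are illustrative assumptions, not part of Baseball Savant.

```python
from datetime import date, timedelta


def date_chunks(start, end, days=7):
    """Yield (chunk_start, chunk_end) pairs that cover start..end inclusive."""
    cursor = start
    while cursor <= end:
        # Each window spans `days` calendar days, clipped to the overall end date
        chunk_end = min(cursor + timedelta(days=days - 1), end)
        yield cursor, chunk_end
        cursor = chunk_end + timedelta(days=1)


chunks = list(date_chunks(date(2024, 8, 1), date(2024, 8, 20), days=7))
for chunk_start, chunk_end in chunks:
    print(chunk_start.isoformat(), "to", chunk_end.isoformat())
```

Each pair can then be used as the start and end date of a separate query, and the resulting files concatenated afterwards.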
## Available Data Endpoints
Baseball Savant provides multiple specialized endpoints beyond the main search interface.
### Statcast Leaderboards
**URL Pattern:** `baseballsavant.mlb.com/statcast_leaderboards`
Available leaderboards include:
- Exit velocity leaders
- Hard-hit rate leaders
- Barrel rate leaders
- Sprint speed leaders
- Outs Above Average (defensive metric)
- Expected statistics (xBA, xSLG, xwOBA)
- Pitch velocity by type
- Spin rate leaders
- Pitch movement leaders
Each leaderboard can be filtered by:
- Season or custom date range
- Minimum pitch/batted ball thresholds
- Player position
- Team
### Expected Statistics
**URL Pattern:** `baseballsavant.mlb.com/leaderboard/expected_statistics`
Expected stats use batted ball quality to estimate outcomes:
- **xBA** (expected batting average)
- **xSLG** (expected slugging percentage)
- **xwOBA** (expected weighted on-base average)
- **xwOBACON** (expected wOBA on contact)
Comparing actual vs. expected stats reveals:
- Players outperforming their batted ball quality (lucky)
- Players underperforming (unlucky)
- True talent assessment
### Pitch Arsenal Analysis
**URL Pattern:** `baseballsavant.mlb.com/pitch-arsenal-stats`
Detailed pitch-by-pitch breakdowns showing:
- Pitch usage percentages
- Average velocity by pitch type
- Spin rates
- Movement profiles
- Pitch values (runs above/below average)
- Whiff rates and chase rates by pitch
### Catcher Metrics
**URL Pattern:** `baseballsavant.mlb.com/catcher_framing`
Catcher-specific Statcast data:
- **Framing runs**: Runs saved by receiving strikes
- **Pop time**: Time from pitch receipt to throw arrival at 2B
- **Exchange time**: Time to transfer ball to throwing hand
- **Blocking runs**: Runs saved by blocking wild pitches
- **Throwing runs**: Runs saved by throwing out runners
### Running and Baserunning
**URL Pattern:** `baseballsavant.mlb.com/sprint_speed_leaderboard`
Movement and baserunning metrics:
- Sprint speed (competitive runs only, ft/sec)
- Bolts (runs 30+ ft/sec)
- Home to first times
- Baserunning runs above average
- Stolen base success vs. pop time
### Outs Above Average (OAA)
**URL Pattern:** `baseballsavant.mlb.com/outs_above_average`
Defensive positioning and range metrics:
- Plays made vs. expected (based on ball trajectory and fielder position)
- Success rate on different play types
- Reaction time and route efficiency
- Position-specific defensive value
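The "plays made vs. expected" idea can be sketched as a running sum: a fielder is credited with one minus the catch probability when the play is made and debited the catch probability when it is not. This is a simplified illustration using made-up probabilities; the function name and the sample plays are hypothetical, and the published metric relies on modeled probabilities that are not reproduced here.

```python
def outs_above_average(plays):
    """Sum per-play credit: (1 - p) for a made play, -p for a missed one.

    `plays` is an iterable of (catch_probability, made) pairs.
    """
    total = 0.0
    for catch_probability, made in plays:
        total += (1.0 - catch_probability) if made else -catch_probability
    return total


# Hypothetical plays: (estimated catch probability, whether the out was made)
sample_plays = [(0.25, True), (0.90, True), (0.60, False), (0.95, True)]
print(round(outs_above_average(sample_plays), 2))
```

Making a hard play (25% probability) is worth far more than making a routine one (95%), which is why the metric rewards range rather than volume.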
## Key Metrics Available
Baseball Savant provides dozens of metrics. Understanding the most important ones is crucial for effective analysis.
### Exit Velocity (EV)
**Definition:** The speed of the baseball as it comes off the bat, measured in miles per hour (mph).
**Why It Matters:**
- Strong predictor of offensive success
- Higher exit velocity = more hits, more power
- Average MLB exit velocity: ~88-89 mph
- Elite hitters average 92+ mph
**Key Thresholds:**
- 95+ mph: "Hard hit" classification
- 98+ mph with 26-30 degree LA: "Barrel" classification
- 105+ mph: Elite contact
- 115+ mph: Rare, exceptional contact
**Usage Tips:**
- Use average exit velocity, not just max
- Consider max exit velocity for power potential
- 90th percentile EV shows ability to make elite contact
- Compare to league average for context
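As a rough illustration of the tips above, the following sketch summarizes a list of exit velocities using the 95 mph hard-hit threshold. The sample values are invented, the function name is hypothetical, and the 90th percentile uses a simple nearest-rank rule, which may differ from how any particular leaderboard computes it.

```python
import math
from statistics import mean

HARD_HIT_MPH = 95.0  # hard-hit threshold quoted in this section


def exit_velocity_summary(exit_velocities):
    """Return average, max, 90th-percentile EV and hard-hit rate for a sample."""
    evs = sorted(exit_velocities)
    if not evs:
        raise ValueError("need at least one batted ball")
    # Nearest-rank percentile: the value at position ceil(0.9 * n), 1-indexed
    rank = math.ceil(9 * len(evs) / 10)
    hard_hit = sum(1 for ev in evs if ev >= HARD_HIT_MPH)
    return {
        "average": mean(evs),
        "max": evs[-1],
        "p90": evs[rank - 1],
        "hard_hit_rate": hard_hit / len(evs),
    }


# Invented exit velocities in mph, for illustration only
sample = [72.0, 85.5, 91.0, 95.0, 98.4, 101.2, 88.8, 104.9, 79.3, 110.1]
summary = exit_velocity_summary(sample)
print(summary)
```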
### Launch Angle (LA)
**Definition:** The vertical angle at which the ball leaves the bat, measured in degrees.
**Why It Matters:**
- Determines batted ball trajectory
- Optimal angles vary by exit velocity
- "Launch angle revolution" changed hitting approach
**Key Ranges:**
- Below 10 degrees: Ground balls
- 10 to 25 degrees: Line drives
- 25 to 50 degrees: Fly balls
- 50+ degrees: Pop-ups
**Optimal Angles:**
- For power: 25-30 degrees (with high EV)
- For batting average: 10-25 degrees
- Sweet spot: 8-32 degrees (line drives and low fly balls)
**Usage Tips:**
- Launch angle alone is insufficient; pair with exit velocity
- Extreme launch angles (very high/low) often unproductive
- Track launch angle trends to identify swing changes
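The ranges in this section translate directly into a small classifier. This is a sketch under two assumptions: anything below 10 degrees is treated as a ground ball, and boundary values go to the higher bucket except at 50 degrees. The function names are illustrative only.

```python
def batted_ball_type(launch_angle):
    """Bucket a launch angle (degrees) using the ranges listed in this section."""
    if launch_angle < 10:
        return "ground_ball"
    if launch_angle < 25:
        return "line_drive"
    if launch_angle <= 50:
        return "fly_ball"
    return "popup"


def in_sweet_spot(launch_angle):
    """True when the launch angle falls in the 8-32 degree sweet-spot band."""
    return 8 <= launch_angle <= 32


for angle in (-15, 12, 30, 62):
    print(angle, batted_ball_type(angle), in_sweet_spot(angle))
```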
### Sprint Speed
**Definition:** A player's top running speed in feet per second (ft/sec) on competitive plays.
**Why It Matters:**
- Measures raw speed and athleticism
- Impacts baserunning value and defensive range
- Age-related decline tracking
**Key Thresholds:**
- 30+ ft/sec: Elite (called a "Bolt")
- 28-30 ft/sec: Above average
- 27-28 ft/sec: Average
- Below 27 ft/sec: Below average
**Measurement Notes:**
- Only competitive runs counted (not home run trots)
- Typically home-to-first, or baserunning plays
- Less noisy than stolen base success rates
**Usage Tips:**
- Compare within position groups
- Track year-over-year changes
- Combine with baserunning runs for complete picture
### Spin Rate
**Definition:** The rate of spin on a pitched baseball, measured in revolutions per minute (RPM).
**Why It Matters:**
- Higher spin can increase rise effect (fastballs)
- Affects break and deception on breaking balls
- Spin efficiency matters as much as raw spin
- Foreign substance crackdown affected spin rates (2021)
**Typical Ranges by Pitch:**
- Four-seam fastball: 2200-2500 RPM
- Curveball: 2500-3000 RPM
- Slider: 2300-2700 RPM
- Changeup: 1700-2100 RPM
**Spin Efficiency:**
- Percentage of spin contributing to movement
- Pure backspin/topspin = 100% efficient
- Gyro spin (football spiral) = 0% efficient
- Spin axis determines efficiency
**Usage Tips:**
- Compare pitcher's spin to league average by pitch type
- Monitor for sudden drops (possible injury or grip issue)
- Combine with movement data for complete picture
- Consider spin efficiency, not just raw RPM
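Geometrically, spin efficiency can be thought of as the share of the spin vector that is perpendicular to the ball's direction of travel. The sketch below computes that share from a spin-axis vector and a velocity vector. It is an idealized illustration only: the function name and coordinate convention are assumptions, and published efficiency figures may be derived differently (for example, from observed movement).

```python
import math


def spin_efficiency(spin_axis, velocity):
    """Fraction of spin perpendicular to the direction of travel (0.0 to 1.0)."""
    wx, wy, wz = spin_axis
    vx, vy, vz = velocity
    spin_norm = math.hypot(wx, wy, wz)
    vel_norm = math.hypot(vx, vy, vz)
    if spin_norm == 0 or vel_norm == 0:
        raise ValueError("spin axis and velocity must be non-zero vectors")
    # |w x v| = |w| |v| sin(angle between w and v), so the ratio is sin(angle)
    cross = (wy * vz - wz * vy, wz * vx - wx * vz, wx * vy - wy * vx)
    return math.hypot(*cross) / (spin_norm * vel_norm)


toward_plate = (0.0, -1.0, 0.0)  # assumed convention: pitch travels along -y
print(spin_efficiency((1.0, 0.0, 0.0), toward_plate))   # axis sideways: pure backspin/topspin
print(spin_efficiency((0.0, -1.0, 0.0), toward_plate))  # axis along travel: pure gyro
print(round(spin_efficiency((1.0, -1.0, 0.0), toward_plate), 3))  # 45-degree mix
```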
### Barrel Rate
**Definition:** Percentage of batted balls that are "barreled" - the combination of exit velocity and launch angle most likely to produce extra-base hits.
**Why It Matters:**
- Best single metric for hard contact quality
- Strongly correlates with power production
- Relatively stable year-to-year
**Barrel Classification:**
- Minimum 98 mph exit velocity
- Launch angle range expands with higher exit velocity
- At 98 mph: Must be 26-30 degrees
- At 116+ mph: 8-50 degrees qualifies
**League Context:**
- Average barrel rate: ~6-7%
- Elite: 12%+ barrel rate
- Top power hitters: 15%+ barrel rate
**Usage Tips:**
- Better than raw home run totals for power assessment
- Less park-dependent than traditional stats
- Useful for identifying breakout/decline candidates
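The barrel window can be approximated with a few linear inequalities on exit velocity and launch angle. The sketch below uses a widely circulated community approximation rather than an official formula, so treat it as an assumption: it reproduces the 26-30 degree window at 98 mph and widens as exit velocity rises, but may disagree with the official classification at the edges. The sample batted balls are invented.

```python
def is_barrel(exit_velocity, launch_angle):
    """Approximate barrel test (community approximation, not an official formula)."""
    return (
        exit_velocity >= 98
        and launch_angle <= 50
        and exit_velocity * 1.5 - launch_angle >= 117  # upper edge of the window
        and exit_velocity + launch_angle >= 124        # lower edge of the window
    )


def barrel_rate(batted_balls):
    """Share of (exit_velocity, launch_angle) pairs that qualify as barrels."""
    if not batted_balls:
        return 0.0
    return sum(is_barrel(ev, la) for ev, la in batted_balls) / len(batted_balls)


# Invented (exit velocity mph, launch angle degrees) pairs
sample_bbe = [(98, 28), (90, 20), (105, 15), (110, 30)]
print(f"Barrel rate: {barrel_rate(sample_bbe):.1%}")
```

When working with the downloaded pitch-level data, the classification supplied in the export is preferable to recomputing it; an approximation like this is mainly useful for understanding the shape of the window.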
### Expected Statistics (xBA, xSLG, xwOBA)
**Definition:** Quality of contact metrics that estimate what a player's stats "should" be based on batted ball quality.
**Calculation Factors:**
- Exit velocity
- Launch angle
- Sprint speed (for xBA)
- Historical outcomes of similar batted balls
**Why It Matters:**
- Separates luck from skill
- Predictive of future performance
- Identifies regression candidates
**Usage Tips:**
- Large gaps suggest regression coming
- More stable than traditional batting stats
- Use over larger samples (300+ PA)
- Consider park factors for context
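A simple way to act on the "large gaps" tip is to compute actual minus expected and label the result. In this sketch the 30-point threshold, the function name, and the sample numbers are all illustrative assumptions rather than values taken from Baseball Savant.

```python
def regression_flag(actual_woba, expected_woba, threshold=0.030):
    """Label a hitter by the gap between actual and expected wOBA.

    The default threshold of .030 is an arbitrary illustration, not a standard.
    """
    gap = actual_woba - expected_woba
    if gap > threshold:
        return "overperforming"
    if gap < -threshold:
        return "underperforming"
    return "in_line"


# Hypothetical (actual wOBA, xwOBA) pairs
examples = {"Hitter A": (0.380, 0.330), "Hitter B": (0.300, 0.345), "Hitter C": (0.340, 0.335)}
for name, (actual, expected) in examples.items():
    print(name, f"{actual - expected:+.3f}", regression_flag(actual, expected))
```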
## Leaderboards and Player Pages
Baseball Savant's leaderboards and player pages offer rich analytical resources.
### Leaderboard Features
**Customization Options:**
- Date range selection
- Minimum threshold filters (e.g., 100 batted balls)
- Position filters
- Team filters
- Sorting by any metric column
**Export Functionality:**
- Download full leaderboard as CSV
- Includes all visible columns
- Can be imported into Excel, R, Python, etc.
**Available Leaderboards:**
- Batting (exit velocity, barrels, expected stats)
- Pitching (velocity, spin, movement, pitch arsenal)
- Running (sprint speed, baserunning value)
- Defense (OAA, catch probability)
- Catching (framing, pop time, blocking)
### Player Pages
Each player has a dedicated page accessed via search or leaderboard links.
**Player Page Sections:**
**1. Overview Dashboard:**
- Key metrics summary
- Percentile rankings (visualized)
- Season stats table
**2. Batted Ball Profile:**
- Exit velocity distributions
- Launch angle distributions
- Spray chart with exit velocity overlay
- Expected vs. actual stats
**3. Pitch Arsenal (Pitchers):**
- Pitch mix pie chart
- Velocity and movement by pitch type
- Pitch outcome tables
- Usage by count
**4. Plate Discipline (Batters):**
- Zone profile (swing/take rates)
- Chase rate and whiff rate
- Called strike probability heat maps
**5. Sprint Speed:**
- Seasonal sprint speed
- Competitive runs logged
- Percentile ranking
**6. Comparison Tools:**
- Compare to league average
- Compare to specific players
- Year-over-year comparisons
### Using Player Pages for Analysis
**Scouting Applications:**
- Identify pitch arsenal strengths/weaknesses
- Assess batted ball quality
- Evaluate defensive value
- Project future performance
**Fantasy Baseball:**
- Find undervalued players (good xStats, poor actual stats)
- Identify breakout candidates (improved exit velocity)
- Avoid regression (outperforming xStats)
- Injury monitoring (velocity drops)
**Team Building:**
- Complementary skills assessment
- Defensive positioning optimization
- Platoon advantage identification
## Visualization Tools on the Site
Baseball Savant excels at making complex data accessible through visualizations.
### Spray Charts
**Features:**
- Interactive plot of batted ball locations
- Color-coded by exit velocity
- Filter by date, count, pitcher, etc.
- Shows defensive positioning
- Displays outcome (hit/out)
**Use Cases:**
- Identify batted ball tendencies (pull, oppo, all-fields)
- Assess shift effectiveness
- Evaluate ballpark impacts
- Scout defensive positioning needs
### Pitch Movement Charts
**Features:**
- Horizontal and vertical movement plot
- Each pitch type color-coded
- Catcher's perspective view
- Overlay league average movements
- Filter by count, handedness
**Use Cases:**
- Evaluate pitch differentiation
- Identify tunneling opportunities
- Compare to league average movement
- Assess pitch development
### Strike Zone Heat Maps
**Features:**
- Color-intensity shows frequency
- Available for pitch location, swing rate, whiff rate
- Multiple perspectives (pitcher, batter, umpire)
- Overlay strike zone boundary
**Use Cases:**
- Evaluate command and control
- Identify hot/cold zones for batters
- Assess umpire tendencies
- Optimize pitch sequencing
### Percentile Rankings
**Features:**
- Player metrics compared to league
- Color-coded visualization (red = elite, blue = poor)
- Multiple metrics displayed simultaneously
- Quick visual assessment tool
**Use Cases:**
- Rapid player evaluation
- Identify strengths and weaknesses
- Compare across player types
- Track player development
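To make the idea of a percentile ranking concrete, the sketch below ranks one value against a list of league values by counting how many players it beats. The exact method behind the site's percentile displays is not described here, so this counting rule, the function name, and the sample numbers are assumptions for illustration.

```python
def percentile_rank(value, league_values, higher_is_better=True):
    """Percent of league values that `value` beats, rounded to a whole number."""
    if not league_values:
        raise ValueError("league_values must not be empty")
    if higher_is_better:
        beaten = sum(1 for v in league_values if v < value)
    else:
        beaten = sum(1 for v in league_values if v > value)
    return round(100 * beaten / len(league_values))


# Invented sprint speeds (ft/sec) and pop times (seconds)
sprint_speeds = [26.1, 26.8, 27.0, 27.4, 27.9, 28.3, 28.8, 29.2, 29.7, 30.1]
pop_times = [1.85, 1.90, 1.95, 2.00, 2.05]
print(percentile_rank(29.2, sprint_speeds))                         # faster is better
print(percentile_rank(1.90, pop_times, higher_is_better=False))     # lower is better
```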
### 3D Pitch Trajectory
**Features:**
- Animated pitch flight path
- Rotation visualization
- Release point to plate
- Multiple angles available
**Use Cases:**
- Understand pitch movement mechanics
- Educational tool for pitching instruction
- Evaluate deception
- Analyze pitch tunneling
## CSV Export Options
Baseball Savant allows users to export data for custom analysis outside the platform.
### Export Methods
**1. Search Tool Export:**
- Run a custom query in Statcast Search
- Click "Download CSV" button
- Maximum 50,000 rows per export
- For larger datasets, use pagination or date filtering
**2. Leaderboard Export:**
- Any leaderboard can be exported
- Click "Download CSV" link
- Includes all visible columns
- Filter before exporting to reduce file size
**3. Player Page Export:**
- Some player page tables have export options
- Seasonal data can be downloaded
- Pitch-by-pitch data available via search tool
### CSV File Structure
**Pitch-by-Pitch Export Columns:**
- `pitch_type`: Classified pitch type
- `game_date`: Date of game
- `release_speed`: Velocity in mph
- `release_pos_x`, `release_pos_y`, `release_pos_z`: Release point coordinates
- `player_name`: Pitcher name
- `batter`: Batter ID
- `events`: Result of at-bat (if applicable)
- `description`: Pitch result (ball, called_strike, etc.)
- `zone`: Strike zone location (1-9 in zone, 11-14 out of zone)
- `des`: Text description
- `stand`: Batter stance (R/L)
- `p_throws`: Pitcher throws (R/L)
- `balls`, `strikes`, `outs_when_up`
- `pfx_x`, `pfx_z`: Pitch movement in feet (multiply by 12 for inches)
- `plate_x`, `plate_z`: Location crossing plate
- `vx0`, `vy0`, `vz0`: Velocity components at release
- `ax`, `ay`, `az`: Acceleration components
- `sz_top`, `sz_bot`: Strike zone dimensions for batter
- `hit_distance_sc`: Distance traveled by batted ball
- `launch_speed`: Exit velocity
- `launch_angle`: Launch angle
- `effective_speed`: Perceived velocity
- `release_spin_rate`: Spin rate
- `release_extension`: Release point distance from rubber
- `launch_speed_angle`: Contact-quality zone from 1 to 6 (6 = barrel)
- `estimated_ba_using_speedangle`: Expected batting average
- `estimated_woba_using_speedangle`: Expected wOBA
- `woba_value`: wOBA value of outcome
- `babip_value`: BABIP value
- `iso_value`: Isolated power value
**Leaderboard Export Columns:**
Vary by leaderboard but typically include:
- Player identification
- Counting stats
- Rate stats
- Statcast metrics
- Percentile rankings
### Best Practices for CSV Exports
**1. Filter Before Exporting:**
- Reduce file size and processing time
- Get only relevant data
- Avoid hitting row limits
**2. Document Your Query:**
- Save query parameters in file name
- Record date of download
- Note any filters applied
**3. Data Cleaning:**
- Handle null values appropriately
- Validate data types
- Check for outliers/errors
**4. Storage and Version Control:**
- Keep raw exported files
- Document any transformations
- Track data source and date
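The cleaning steps above can be sketched with the standard library alone. The example parses a tiny inline CSV that mimics a few export columns, converts numeric fields, and turns blank or `null` cells into `None`. The sample rows are invented, the helper names are hypothetical, and how a real export encodes missing values should be checked against an actual file.

```python
import csv
import io

# Invented rows that mimic a handful of export columns
SAMPLE_CSV = """pitch_type,game_date,release_speed,launch_speed,launch_angle,events
FF,2024-08-01,97.4,,,
SL,2024-08-01,86.1,101.3,24,double
,2024-08-01,,,,
CH,2024-08-02,88.0,null,null,
"""

NUMERIC_COLUMNS = ("release_speed", "launch_speed", "launch_angle")


def to_float(text):
    """Convert a CSV cell to float, treating blank or 'null' cells as missing."""
    text = (text or "").strip()
    if text == "" or text.lower() == "null":
        return None
    return float(text)


def load_pitches(csv_text):
    """Parse CSV text into dicts, skipping rows without a pitch classification."""
    rows = []
    for raw in csv.DictReader(io.StringIO(csv_text)):
        if not raw["pitch_type"]:
            continue  # unclassified pitch: drop rather than guess
        row = dict(raw)
        for column in NUMERIC_COLUMNS:
            row[column] = to_float(raw[column])
        rows.append(row)
    return rows


pitches = load_pitches(SAMPLE_CSV)
print(f"Kept {len(pitches)} of 4 rows")
```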
## API Access Patterns
While Baseball Savant doesn't offer an official public API, data can be accessed programmatically.
### Understanding the Data Flow
Baseball Savant serves data through internal endpoints that power its visualizations. These endpoints are undocumented, so treat the patterns below as observed behavior rather than a supported interface.
**Important Caveats:**
- No official API support from MLB
- Endpoints may change without notice
- Rate limiting is enforced
- Terms of service should be respected
- Consider using pybaseball or baseballr packages instead
### Common Endpoint Patterns
**Statcast Search CSV:**
```
https://baseballsavant.mlb.com/statcast_search/csv?
all=true
&hfPT=
&hfAB=
&hfGT=R%7C
&hfPR=
&hfZ=
&stadium=
&hfBBL=
&hfNewZones=
&hfPull=
&hfC=
&hfSea=2024%7C
&hfSit=
&player_type=pitcher
&hfOuts=
&opponent=
&pitcher_throws=
&batter_stands=
&hfSA=
&game_date_gt=
&game_date_lt=
&hfMo=
&team=
&home_road=
&hfRO=
&position=
&hfInfield=
&hfOutfield=
&hfInn=
&hfBBT=
&player_id=
&type=details
```
**Parameter Breakdown:**
- `all=true`: Include all pitches
- `hfSea`: Season filter (e.g., 2024|)
- `player_type`: pitcher or batter
- `player_id`: MLB player ID
- `game_date_gt`: Start date
- `game_date_lt`: End date
- `type=details`: Detail level
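A query string like the one above can be assembled with `urllib.parse.urlencode`, which also takes care of encoding the `|` separator as `%7C`. The sketch below only builds the URL and does not send a request. The function name is hypothetical, and it includes only the parameters described in the breakdown; whether the server needs the remaining empty parameters is not verified here.

```python
from urllib.parse import urlencode

BASE_URL = "https://baseballsavant.mlb.com/statcast_search/csv"


def build_search_url(season, player_type="pitcher", player_id="",
                     start_date="", end_date=""):
    """Build a Statcast Search CSV URL from a few of the documented parameters."""
    params = {
        "all": "true",
        "hfGT": "R|",             # regular season, as in the example above
        "hfSea": f"{season}|",    # season filter with trailing pipe
        "player_type": player_type,
        "game_date_gt": start_date,
        "game_date_lt": end_date,
        "player_id": player_id,
        "type": "details",
    }
    return f"{BASE_URL}?{urlencode(params)}"


url = build_search_url(2024, start_date="2024-08-01", end_date="2024-08-07")
print(url)
```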
**Expected Statistics API:**
```
https://baseballsavant.mlb.com/leaderboard/expected_statistics?
type=batter
&year=2024
&position=
&team=
&min=q
```
### Rate Limiting and Etiquette
**Best Practices:**
- Implement delays between requests (1-2 seconds minimum)
- Cache results locally
- Avoid hammering the server
- Use during off-peak hours
- Consider using community-maintained Python/R packages such as pybaseball or baseballr
**Handling Errors:**
- Implement retry logic with exponential backoff
- Catch HTTP errors gracefully
- Validate response data
- Log failed requests
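Retry logic with exponential backoff can be sketched independently of any HTTP library by wrapping a callable. In the example below, `fetch_with_retry` and `flaky_fetch` are hypothetical names; the flaky function stands in for a real request, and the `sleep` argument is injectable so the demonstration runs instantly.

```python
import time


def fetch_with_retry(fetch, max_attempts=4, base_delay=1.0,
                     retry_on=(OSError,), sleep=time.sleep):
    """Call fetch(), retrying on listed errors with delays of 1x, 2x, 4x ... base_delay."""
    for attempt in range(max_attempts):
        try:
            return fetch()
        except retry_on as exc:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = base_delay * (2 ** attempt)
            print(f"Attempt {attempt + 1} failed ({exc}); retrying in {delay:.0f}s")
            sleep(delay)


# Stand-in for a network call: fails twice, then succeeds
calls = {"count": 0}


def flaky_fetch():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "csv payload"


recorded_delays = []
result = fetch_with_retry(flaky_fetch, sleep=recorded_delays.append)
print(result, recorded_delays)
```

In real use, `fetch` would wrap the actual download and `sleep` would be left at its default so the delays take effect.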
### Why Use Packages Instead
The pybaseball (Python) and baseballr (R) packages provide:
- Abstracted endpoint access
- Better error handling
- Maintained code that adapts to changes
- Documentation and community support
- Proper rate limiting built-in
## Comparing with FanGraphs Data
Baseball Savant and FanGraphs are complementary resources, each with unique strengths.
### Data Coverage Comparison
**Baseball Savant Strengths:**
- Pitch-level Statcast data (2015+)
- Exit velocity, launch angle, spin rate
- Sprint speed and defensive metrics
- Expected statistics based on batted ball quality
- Official MLB data source
- Granular batted ball and pitch tracking
**FanGraphs Strengths:**
- Historical data back to 1871
- Comprehensive sabermetric calculations (WAR, wRC+, FIP)
- Plate discipline metrics
- Pitch type linear weights
- Park factors and context adjustments
- Projection systems (Steamer, ZiPS, THE BAT)
- Financial data (contracts, salaries)
### Metric Comparison
**Similar Metrics:**
| Metric | Baseball Savant | FanGraphs |
|--------|----------------|-----------|
| Exit Velocity | Average EV, Max EV | Hard%, Average EV |
| Launch Angle | Average LA | GB%, LD%, FB% |
| Expected Stats | xBA, xSLG, xwOBA | xFIP (pitching only) |
| Batted Ball Data | Detailed Statcast | Batted-ball % profiles |
| Spin Rate | Raw RPM | Spin rate data |
**Unique to Baseball Savant:**
- Barrels and barrel rate
- Outs Above Average (OAA)
- Sprint speed
- Pitch movement (inches)
- Catcher framing runs (Statcast-based)
- Expected stats on every batted ball
**Unique to FanGraphs:**
- WAR (wins above replacement)
- wRC+ (weighted runs created plus)
- FIP, xFIP, SIERA (pitching metrics)
- ISO, wOBA, wRC (offensive context metrics)
- UZR, DRS (alternative defensive metrics)
- Historical comparison tools
- Contract and salary data
### When to Use Each Platform
**Use Baseball Savant When:**
- Analyzing batted ball quality
- Evaluating pitch characteristics
- Researching defensive positioning
- Studying player movement and speed
- Seeking pitch-level granularity
- Need official MLB Statcast data
- Want visualizations of tracking data
**Use FanGraphs When:**
- Need historical context (pre-2015)
- Calculating WAR or context-adjusted metrics
- Researching contract/salary information
- Using projection systems
- Need park-adjusted statistics
- Want comprehensive sabermetric suite
- Studying league-wide trends over time
**Use Both When:**
- Conducting comprehensive player evaluation
- Cross-referencing different metrics
- Validating findings across sources
- Building predictive models
- Writing detailed analytical pieces
### Data Integration Strategies
**Combining Data Sources:**
1. Use Baseball Savant for batted ball quality
2. Use FanGraphs for context and league adjustments
3. Merge datasets on player name or MLB ID
4. Cross-validate findings
**Example Workflow:**
- Pull exit velocity data from Baseball Savant
- Pull wRC+ and WAR from FanGraphs
- Merge on player ID
- Analyze correlation between EV and wRC+
- Identify players with strong EV but low wRC+ (unlucky)
- Identify players with weak EV but high wRC+ (likely to regress)
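The workflow above can be sketched with plain dictionaries keyed by a shared player ID. Everything in this example is hypothetical: the IDs, exit velocities, wRC+ values, cutoffs, and function names are invented for illustration, and real data would first need a crosswalk between the two sites' player ID systems.

```python
def merge_on_player_id(savant_ev, fangraphs_wrc):
    """Inner-join two {player_id: value} mappings into one record per player."""
    return {
        player_id: {"avg_ev": avg_ev, "wrc_plus": fangraphs_wrc[player_id]}
        for player_id, avg_ev in savant_ev.items()
        if player_id in fangraphs_wrc
    }


def flag_gaps(merged, ev_cutoff=91.0, wrc_cutoff=100):
    """Split players into strong-contact/low-output and weak-contact/high-output."""
    strong_ev_low_output = sorted(
        pid for pid, row in merged.items()
        if row["avg_ev"] >= ev_cutoff and row["wrc_plus"] < wrc_cutoff
    )
    weak_ev_high_output = sorted(
        pid for pid, row in merged.items()
        if row["avg_ev"] < ev_cutoff and row["wrc_plus"] >= wrc_cutoff
    )
    return strong_ev_low_output, weak_ev_high_output


# Made-up IDs and values, for illustration only
savant_ev = {1001: 93.5, 1002: 86.2, 1003: 91.8, 1004: 88.0}
fangraphs_wrc = {1001: 92, 1002: 125, 1003: 140, 1005: 101}

merged = merge_on_player_id(savant_ev, fangraphs_wrc)
unlucky_ids, regression_ids = flag_gaps(merged)
print("Strong EV, low wRC+:", unlucky_ids)
print("Weak EV, high wRC+:", regression_ids)
```

With pandas, the same join is a single `merge` on the ID column; the dictionary version is shown only to keep the logic visible.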
## Python Code Examples with pybaseball
The `pybaseball` package is the most popular Python library for accessing baseball data, including Baseball Savant's Statcast data.
### Installation and Setup
```python
# Install pybaseball first (run in a shell, not inside Python):
#   pip install pybaseball
# Import and enable cache for faster repeated queries
from pybaseball import cache
cache.enable()
# Import specific functions
from pybaseball import statcast
from pybaseball import statcast_batter, statcast_pitcher
from pybaseball import playerid_lookup
# Leaderboard helpers for sprint speed and exit velocity/barrels
from pybaseball import statcast_sprint_speed, statcast_batter_exitvelo_barrels
```
### Basic Statcast Query
```python
from pybaseball import statcast
import pandas as pd
# Get all statcast data for a date range
# Warning: Large date ranges can take time and memory
data = statcast(start_dt='2024-08-01', end_dt='2024-08-07')
# View first few rows
print(data.head())
# Check shape
print(f"Retrieved {len(data)} pitches")
# View available columns
print(data.columns.tolist())
# Basic filtering: Only home runs
homers = data[data['events'] == 'home_run']
print(f"Home runs in period: {len(homers)}")
# Calculate average exit velocity on home runs
avg_ev = homers['launch_speed'].mean()
print(f"Average home run exit velocity: {avg_ev:.1f} mph")
```
### Player-Specific Queries
```python
from pybaseball import statcast_batter, statcast_pitcher, playerid_lookup
# Look up player ID
# Returns DataFrame with player IDs across different systems
player = playerid_lookup('ohtani', 'shohei')
print(player)
# Get MLBAM ID (used by Baseball Savant)
mlbam_id = player['key_mlbam'].values[0]
# Get every pitch thrown to Shohei Ohtani in 2024
ohtani_batting = statcast_batter('2024-03-20', '2024-10-01', mlbam_id)
print(f"Total pitches tracked: {len(ohtani_batting)}")
# Keep only balls put in play (type 'X'); fouls can also carry a launch_speed
batted_balls = ohtani_batting[
    (ohtani_batting['type'] == 'X') & ohtani_batting['launch_speed'].notna()
]
# Calculate average exit velocity
avg_ev = batted_balls['launch_speed'].mean()
max_ev = batted_balls['launch_speed'].max()
print(f"Average EV: {avg_ev:.1f} mph")
print(f"Max EV: {max_ev:.1f} mph")
# Calculate barrel rate
# launch_speed_angle == 6 marks a barrel (the export has no 'barrel' column)
barrels = batted_balls[batted_balls['launch_speed_angle'] == 6]
barrel_rate = len(barrels) / len(batted_balls) * 100
print(f"Barrel rate: {barrel_rate:.1f}%")
# For pitchers
# Example: Get Gerrit Cole's pitching data
cole = playerid_lookup('cole', 'gerrit')
cole_id = cole['key_mlbam'].values[0]
cole_pitching = statcast_pitcher('2024-03-20', '2024-10-01', cole_id)
# Average fastball velocity
fastballs = cole_pitching[cole_pitching['pitch_type'] == 'FF']
avg_fb_velo = fastballs['release_speed'].mean()
avg_fb_spin = fastballs['release_spin_rate'].mean()
print(f"Average 4-seam fastball velocity: {avg_fb_velo:.1f} mph")
print(f"Average 4-seam fastball spin rate: {avg_fb_spin:.0f} RPM")
```
### Working with Season-Level Data
```python
from pybaseball import statcast
import pandas as pd
# Function to get full season data in chunks (avoid memory issues)
def get_season_data(year):
    """
    Get full season Statcast data by month to manage memory
    """
    # Define month ranges
    if year == 2024:
        months = [
            ('2024-03-20', '2024-04-30'),
            ('2024-05-01', '2024-05-31'),
            ('2024-06-01', '2024-06-30'),
            ('2024-07-01', '2024-07-31'),
            ('2024-08-01', '2024-08-31'),
            ('2024-09-01', '2024-10-01')
        ]
    else:
        # Adjust for other years
        months = [
            (f'{year}-03-15', f'{year}-04-30'),
            (f'{year}-05-01', f'{year}-05-31'),
            (f'{year}-06-01', f'{year}-06-30'),
            (f'{year}-07-01', f'{year}-07-31'),
            (f'{year}-08-01', f'{year}-08-31'),
            (f'{year}-09-01', f'{year}-10-01')
        ]
    all_data = []
    for start, end in months:
        print(f"Fetching {start} to {end}...")
        data = statcast(start_dt=start, end_dt=end)
        all_data.append(data)
    # Combine all months
    season_data = pd.concat(all_data, ignore_index=True)
    return season_data
# Get 2024 season data
season_2024 = get_season_data(2024)
# Save to CSV for future use
season_2024.to_csv('statcast_2024.csv', index=False)
# Analyze season trends
# Average exit velocity by month
season_2024['month'] = pd.to_datetime(season_2024['game_date']).dt.month
# Balls in play only (type 'X'); fouls can also carry a launch_speed
batted_balls = season_2024[
    (season_2024['type'] == 'X') & season_2024['launch_speed'].notna()
]
monthly_ev = batted_balls.groupby('month')['launch_speed'].mean()
print("Average exit velocity by month:")
print(monthly_ev)
```
### Advanced Analysis: Pitch Arsenal
```python
from pybaseball import statcast_pitcher, playerid_lookup
import pandas as pd
import numpy as np
# Get pitcher data
# Example: Analyzing Blake Snell's arsenal
snell = playerid_lookup('snell', 'blake')
snell_id = snell['key_mlbam'].values[0]
snell_data = statcast_pitcher('2024-03-20', '2024-10-01', snell_id)
# Remove null pitch types (.copy() so new columns can be added without warnings)
snell_data = snell_data[snell_data['pitch_type'].notna()].copy()
# Pitch usage rates
pitch_usage = snell_data['pitch_type'].value_counts(normalize=True) * 100
print("Pitch Usage:")
print(pitch_usage)
# Average velocity by pitch type
pitch_velo = snell_data.groupby('pitch_type')['release_speed'].agg(['mean', 'std', 'count'])
print("\nVelocity by Pitch Type:")
print(pitch_velo)
# Average spin rate by pitch type
pitch_spin = snell_data.groupby('pitch_type')['release_spin_rate'].agg(['mean', 'std'])
print("\nSpin Rate by Pitch Type:")
print(pitch_spin)
# Whiff rate by pitch type
# Whiff = swing and miss, including blocked swinging strikes
snell_data['whiff'] = snell_data['description'].isin([
    'swinging_strike', 'swinging_strike_blocked'
])
snell_data['swing'] = snell_data['description'].isin([
    'swinging_strike', 'swinging_strike_blocked', 'foul', 'foul_tip',
    'foul_bunt', 'hit_into_play'
])
# Sum whiffs and swings per pitch type, then divide
# (a pitch type with no swings yields NaN)
swing_counts = snell_data.groupby('pitch_type')[['whiff', 'swing']].sum()
whiff_rates = swing_counts['whiff'] / swing_counts['swing'] * 100
print("\nWhiff Rate by Pitch Type:")
print(whiff_rates)
# Pitch movement profile
# pfx_x and pfx_z are reported in feet, so convert to inches
movement = snell_data.groupby('pitch_type')[['pfx_x', 'pfx_z']].mean() * 12
print("\nAverage Movement (inches):")
print(movement)
```
### Building Custom Queries with Filters
```python
from pybaseball import statcast
import pandas as pd
# Get data for a specific period
data = statcast(start_dt='2024-07-01', end_dt='2024-08-01')
# Filter: late-and-close situations (7th inning or later, score within 2 runs),
# used here as a simple proxy for high leverage
high_leverage = data[
(data['inning'] >= 7) &
(abs(data['bat_score'] - data['fld_score']) <= 2)
]
print(f"High leverage pitches: {len(high_leverage)}")
# Filter: Fastballs 98+ mph in high leverage
fastballs_98 = high_leverage[
(high_leverage['pitch_type'].isin(['FF', 'SI'])) &
(high_leverage['release_speed'] >= 98)
]
# Which pitchers throw the most high-leverage 98+ mph fastballs?
top_throwers = fastballs_98['player_name'].value_counts().head(10)
print("\nTop high-leverage hard throwers:")
print(top_throwers)
# Filter: Barrels against
# Barrels are the EV/LA combinations most likely to produce extra-base hits;
# launch_speed_angle == 6 marks a barrel (the export has no 'barrel' column)
barrels = data[data['launch_speed_angle'] == 6]
# Which pitchers allowed the most barrels?
barrels_allowed = barrels['player_name'].value_counts().head(10)
print("\nMost barrels allowed:")
print(barrels_allowed)
# Filter: Two-strike breaking balls ('ST' is the sweeper code)
# .copy() lets us add columns below without a SettingWithCopyWarning
two_strike_breaking = data[
    (data['strikes'] == 2) &
    (data['pitch_type'].isin(['SL', 'ST', 'CU', 'KC', 'SV', 'CS']))
].copy()
# Whiff rate on two-strike breaking balls
two_strike_breaking['whiff'] = two_strike_breaking['description'].isin([
    'swinging_strike', 'swinging_strike_blocked'
])
two_strike_breaking['swing'] = two_strike_breaking['description'].isin([
    'swinging_strike', 'swinging_strike_blocked', 'foul', 'foul_tip', 'hit_into_play'
])
whiff_rate = (
two_strike_breaking['whiff'].sum() /
two_strike_breaking['swing'].sum() * 100
)
print(f"\nTwo-strike breaking ball whiff rate: {whiff_rate:.1f}%")
```
### Working with Expected Statistics
```python
from pybaseball import playerid_lookup, statcast_batter
# Get batter data with expected statistics
# The Statcast data includes estimated_ba_using_speedangle (xBA)
# and estimated_woba_using_speedangle (xwOBA)
# Example: Aaron Judge 2024
judge = playerid_lookup('judge', 'aaron')
judge_id = judge['key_mlbam'].values[0]
judge_data = statcast_batter('2024-03-20', '2024-10-01', judge_id)
# Filter to balls put in play (type == 'X'); launch_speed on its own
# can also be populated for foul balls
batted_balls = judge_data[
    (judge_data['type'] == 'X') & judge_data['launch_speed'].notna()
].copy()
# Calculate actual wOBA on contact from per-event values
# The woba_value column contains the wOBA weight for each outcome
actual_woba = batted_balls['woba_value'].mean()
# Calculate expected wOBA average
expected_woba = batted_balls['estimated_woba_using_speedangle'].mean()
print("Aaron Judge 2024:")
print(f"Actual wOBA on contact: {actual_woba:.3f}")
print(f"Expected wOBA on contact: {expected_woba:.3f}")
# A positive gap means results outran contact quality (often read as luck)
print(f"Difference (actual - expected): {actual_woba - expected_woba:+.3f}")
# Compare BABIP with expected batting average
# BABIP excludes home runs, so drop them first (xBA itself does include them)
bip = batted_balls[batted_balls['events'] != 'home_run']
# Actual BA on balls in play
actual_hits = bip[bip['events'].isin([
    'single', 'double', 'triple'
])].shape[0]
actual_babip = actual_hits / len(bip) if len(bip) > 0 else 0
# Expected BA on the same non-HR balls in play
expected_ba = bip['estimated_ba_using_speedangle'].mean()
print(f"\nBABIP: {actual_babip:.3f}")
print(f"Expected BA on non-HR contact: {expected_ba:.3f}")
# Identify "unlucky" batted balls
# Batted balls with high xBA but resulted in outs
unlucky = batted_balls[
(batted_balls['estimated_ba_using_speedangle'] > 0.500) &
(batted_balls['events'].isin(['field_out', 'force_out', 'double_play']))
]
print(f"\n'Unlucky' outs (xBA > .500): {len(unlucky)}")
print("\nTop 5 unluckiest batted balls:")
print(unlucky.nlargest(5, 'estimated_ba_using_speedangle')[
['game_date', 'launch_speed', 'launch_angle',
'estimated_ba_using_speedangle', 'events', 'des']
])
```
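One subtlety in comparing actual and expected wOBA is that the two columns can be missing on different rows, so separate averages may cover slightly different batted balls. The sketch below shows one way to compare them over the same rows. The function name `woba_gap` and the toy values are hypothetical, and missing values are represented as `None` purely for illustration.

```python
from statistics import mean


def woba_gap(batted_balls):
    """Return (actual, expected, gap) wOBA on contact.

    Only rows where both values are present are used, so the two
    averages are computed over the same set of batted balls.
    """
    paired = [
        (bb["woba_value"], bb["estimated_woba_using_speedangle"])
        for bb in batted_balls
        if bb.get("woba_value") is not None
        and bb.get("estimated_woba_using_speedangle") is not None
    ]
    if not paired:
        raise ValueError("no batted balls with both actual and expected wOBA")
    actual = mean(a for a, _ in paired)
    expected = mean(e for _, e in paired)
    return actual, expected, actual - expected


# Toy batted balls (values invented for illustration)
toy_batted_balls = [
    {"woba_value": 0.9, "estimated_woba_using_speedangle": 0.7},
    {"woba_value": 0.0, "estimated_woba_using_speedangle": 0.3},
    {"woba_value": 2.0, "estimated_woba_using_speedangle": 1.8},
    {"woba_value": 0.0, "estimated_woba_using_speedangle": None},  # skipped
]

actual, expected, gap = woba_gap(toy_batted_balls)
print(f"Actual: {actual:.3f}  Expected: {expected:.3f}  Gap: {gap:+.3f}")
```

With a pandas DataFrame, calling `dropna(subset=[...])` on both columns before taking the means gives the same pairing.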
## R Code Examples with baseballr
The `baseballr` package provides R users access to Baseball Savant data through convenient functions.
### Installation and Setup
```r
# Install baseballr from CRAN
install.packages("baseballr")
# Or install development version from GitHub
# install.packages("devtools")
# devtools::install_github("BillPetti/baseballr")
# Load library
library(baseballr)
library(dplyr)
library(ggplot2)
# Key functions used below: mlb_pbp() and scrape_statcast_savant()
# (recent releases also provide the same query as statcast_search())
```
### Basic Statcast Queries
```r
library(baseballr)
library(dplyr)
# Get Statcast data for a date range
# Note: Use scrape_statcast_savant() for date-based queries; Savant limits the
# rows returned per request, so keep league-wide date ranges short
data <- scrape_statcast_savant(
start_date = "2024-08-01",
end_date = "2024-08-07",
player_type = "batter"
)
# View structure
glimpse(data)
# How many pitches?
nrow(data)
# Filter to home runs only
home_runs <- data %>%
filter(events == "home_run")
# Average exit velocity on home runs
avg_hr_ev <- home_runs %>%
summarise(
avg_ev = mean(launch_speed, na.rm = TRUE),
max_ev = max(launch_speed, na.rm = TRUE),
count = n()
)
print(avg_hr_ev)
# Top home run hitters in the period
hr_leaders <- home_runs %>%
count(player_name, sort = TRUE) %>%
head(10)
print(hr_leaders)
```
### Player-Specific Analysis
```r
library(baseballr)
library(dplyr)
# Get player ID using lookup function
# Note: baseballr's playerid_lookup() returns the MLBAM ID in an
# mlbam_id column (pybaseball calls it key_mlbam)
player_lookup <- playerid_lookup("Ohtani", "Shohei")
print(player_lookup)
# Get MLBAM ID
mlbam_id <- player_lookup$mlbam_id[1]
# Scrape Statcast data for a specific batter
# Passing playerid restricts the query on Savant's side, which avoids
# downloading (and possibly truncating) a league-wide season pull
ohtani_data <- scrape_statcast_savant(
  start_date = "2024-03-20",
  end_date = "2024-09-30",
  playerid = mlbam_id,
  player_type = "batter"
)
# Filter to balls put in play (type == "X"); launch_speed on its own
# can also be recorded for foul balls
batted_balls <- ohtani_data %>%
  filter(type == "X", !is.na(launch_speed))
# Calculate metrics
metrics <- batted_balls %>%
summarise(
total_bbe = n(),
avg_ev = mean(launch_speed, na.rm = TRUE),
max_ev = max(launch_speed, na.rm = TRUE),
avg_la = mean(launch_angle, na.rm = TRUE),
barrel_count = sum(barrel == 1, na.rm = TRUE),
barrel_rate = mean(barrel == 1, na.rm = TRUE) * 100,
hard_hit_count = sum(launch_speed >= 95, na.rm = TRUE),
hard_hit_rate = mean(launch_speed >= 95, na.rm = TRUE) * 100
)
print(metrics)
# Pitch type faced
pitch_breakdown <- ohtani_data %>%
filter(!is.na(pitch_type)) %>%
count(pitch_type, sort = TRUE) %>%
mutate(pct = n / sum(n) * 100)
print(pitch_breakdown)
```
### Pitcher Analysis
```r
library(baseballr)
library(dplyr)
# Look up pitcher
pitcher_lookup <- playerid_lookup("Cole", "Gerrit")
cole_id <- pitcher_lookup$mlbam_id[1]
# Get pitcher data
# Passing playerid limits the query to this pitcher instead of
# downloading every pitch in the league and filtering afterwards
cole_data <- scrape_statcast_savant(
  start_date = "2024-03-20",
  end_date = "2024-09-30",
  playerid = cole_id,
  player_type = "pitcher"
)
# Pitch arsenal breakdown
arsenal <- cole_data %>%
filter(!is.na(pitch_type)) %>%
group_by(pitch_type, pitch_name) %>%
summarise(
count = n(),
usage_pct = n() / nrow(cole_data) * 100,
avg_velo = mean(release_speed, na.rm = TRUE),
avg_spin = mean(release_spin_rate, na.rm = TRUE),
avg_h_break = mean(pfx_x, na.rm = TRUE),
avg_v_break = mean(pfx_z, na.rm = TRUE),
.groups = "drop"
) %>%
arrange(desc(usage_pct))
print(arsenal)
# Whiff rate by pitch type
# Blocked swinging strikes are also swings and misses
whiff_data <- cole_data %>%
  filter(!is.na(pitch_type)) %>%
  mutate(
    whiff = description %in% c("swinging_strike", "swinging_strike_blocked"),
    swing = description %in% c(
      "swinging_strike", "foul", "hit_into_play",
      "swinging_strike_blocked", "foul_tip"
    )
  ) %>%
  group_by(pitch_type) %>%
  summarise(
    swings = sum(swing),
    whiffs = sum(whiff),
    whiff_rate = ifelse(swings > 0, whiffs / swings * 100, 0),
    .groups = "drop"
  ) %>%
  arrange(desc(whiff_rate))
print(whiff_data)
```
### Season-Level Data Analysis
```r
library(baseballr)
library(dplyr)
library(purrr)
# Function to scrape a full season in short chunks
# Savant limits how many rows a single query returns, so month-long
# league-wide requests can come back truncated; one-day requests stay small
scrape_season_data <- function(year) {
  # Approximate regular-season window; adjust the dates for the season you need
  start_date <- if (year == 2024) "2024-03-20" else paste0(year, "-03-15")
  end_date <- paste0(year, "-09-30")
  days <- as.character(seq(as.Date(start_date), as.Date(end_date), by = "day"))
  # Wrap the scraper so a day that errors (e.g. no games) is skipped
  safe_scrape <- possibly(scrape_statcast_savant, otherwise = NULL)
  # Scrape each day and combine
  all_data <- map_df(days, function(day) {
    message(sprintf("Fetching %s...", day))
    safe_scrape(
      start_date = day,
      end_date = day,
      player_type = "batter"
    )
  })
  return(all_data)
}
# Get 2024 season (this makes many requests and can take a while)
# season_2024 <- scrape_season_data(2024)
# Save for later use
# write.csv(season_2024, "statcast_2024.csv", row.names = FALSE)
# Load saved data
# season_2024 <- read.csv("statcast_2024.csv")
# Analyze league-wide trends
# Average exit velocity by month
# monthly_trends <- season_2024 %>%
# filter(!is.na(launch_speed)) %>%
# mutate(month = lubridate::month(game_date)) %>%
# group_by(month) %>%
# summarise(
# avg_ev = mean(launch_speed, na.rm = TRUE),
# median_ev = median(launch_speed, na.rm = TRUE),
# batted_balls = n(),
# .groups = "drop"
# )
#
# print(monthly_trends)
```
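The chunked scraping pattern above translates directly to Python. As a rough sketch, the generator below splits a date range into consecutive windows of a chosen length. The name `date_windows` and its `days` parameter are hypothetical and not part of any library.

```python
from datetime import date, timedelta


def date_windows(start, end, days=1):
    """Yield (start, end) ISO date strings covering [start, end] in `days`-day windows."""
    if days < 1:
        raise ValueError("days must be at least 1")
    one_day = timedelta(days=1)
    current = start
    while current <= end:
        # The last window is clipped so it never runs past `end`
        window_end = min(current + timedelta(days=days) - one_day, end)
        yield current.isoformat(), window_end.isoformat()
        current = window_end + one_day


# Ten days split into 4-day windows; the final window is shorter
windows = list(date_windows(date(2024, 7, 1), date(2024, 7, 10), days=4))
for window_start, window_end in windows:
    print(window_start, "to", window_end)
```

Each pair can then be passed as the start and end dates of a fetch call, for example `statcast(start_dt=..., end_dt=...)` as used earlier, with the results concatenated afterwards.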
### Custom Query Building
```r
library(baseballr)
library(dplyr)
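# Get data for a specific period
# Savant limits rows per query, so request one day at a time and combine;
# possibly() skips any day that errors (e.g. no games)
days <- as.character(seq(as.Date("2024-07-01"), as.Date("2024-08-01"), by = "day"))
safe_scrape <- purrr::possibly(scrape_statcast_savant, otherwise = NULL)
data <- purrr::map_df(days, function(day) {
  safe_scrape(
    start_date = day,
    end_date = day,
    player_type = "pitcher"
  )
})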
# Filter: High-leverage 98+ mph fastballs
high_lev_hard <- data %>%
filter(
inning >= 7,
abs(bat_score - fld_score) <= 2,
pitch_type %in% c("FF", "SI"),
release_speed >= 98
)
# Who throws them most?
top_throwers <- high_lev_hard %>%
count(player_name, sort = TRUE) %>%
head(10)
print(top_throwers)
# Two-strike breaking ball whiff rates
# Blocked swinging strikes are also swings and misses
two_strike_breaking <- data %>%
  filter(
    strikes == 2,
    pitch_type %in% c("SL", "CU", "KC", "SV", "CS")
  ) %>%
  mutate(
    whiff = description %in% c("swinging_strike", "swinging_strike_blocked"),
    swing = description %in% c(
      "swinging_strike", "foul", "hit_into_play",
      "swinging_strike_blocked", "foul_tip"
    )
  )
# Overall whiff rate
whiff_summary <- two_strike_breaking %>%
summarise(
total_swings = sum(swing),
total_whiffs = sum(whiff),
whiff_rate = total_whiffs / total_swings * 100
)
print(whiff_summary)
# By pitcher
pitcher_whiff <- two_strike_breaking %>%
group_by(player_name) %>%
summarise(
swings = sum(swing),
whiffs = sum(whiff),
whiff_rate = ifelse(swings >= 20, whiffs / swings * 100, NA),
.groups = "drop"
) %>%
filter(!is.na(whiff_rate)) %>%
arrange(desc(whiff_rate)) %>%
head(15)
print(pitcher_whiff)
```
### Visualization Examples
```r
library(baseballr)
library(dplyr)
library(ggplot2)
# Get batter data for visualization
player_lookup <- playerid_lookup("Judge", "Aaron")
judge_id <- player_lookup$mlbam_id[1]
# Passing playerid limits the query to this batter instead of
# downloading every pitch in the league and filtering afterwards
judge_data <- scrape_statcast_savant(
  start_date = "2024-03-20",
  end_date = "2024-09-30",
  playerid = judge_id,
  player_type = "batter"
)
# Filter to balls put in play (type == "X"); launch_speed on its own
# can also be recorded for foul balls
batted_balls <- judge_data %>%
  filter(type == "X", !is.na(launch_speed))
# 1. Exit velocity distribution
ggplot(batted_balls, aes(x = launch_speed)) +
geom_histogram(binwidth = 2, fill = "steelblue", color = "white") +
geom_vline(xintercept = 95, linetype = "dashed", color = "red") +
annotate("text", x = 95, y = Inf, label = "Hard Hit (95+ mph)",
vjust = 2, hjust = -0.1, color = "red") +
labs(
title = "Aaron Judge Exit Velocity Distribution - 2024",
x = "Exit Velocity (mph)",
y = "Count"
) +
theme_minimal()
# 2. Launch angle vs exit velocity scatter (dashed lines mark 95 mph and the 8-32 degree sweet spot)
ggplot(batted_balls, aes(x = launch_angle, y = launch_speed)) +
geom_point(aes(color = events), alpha = 0.6, size = 2) +
geom_hline(yintercept = 95, linetype = "dashed", alpha = 0.5) +
geom_vline(xintercept = c(8, 32), linetype = "dashed", alpha = 0.5) +
scale_color_brewer(palette = "Set1") +
labs(
title = "Aaron Judge Launch Angle vs Exit Velocity - 2024",
x = "Launch Angle (degrees)",
y = "Exit Velocity (mph)",
color = "Outcome"
) +
theme_minimal()
# 3. Expected vs Actual wOBA
woba_comparison <- batted_balls %>%
summarise(
actual_woba = mean(woba_value, na.rm = TRUE),
expected_woba = mean(estimated_woba_using_speedangle, na.rm = TRUE)
) %>%
tidyr::pivot_longer(cols = everything(), names_to = "metric", values_to = "woba")
ggplot(woba_comparison, aes(x = metric, y = woba, fill = metric)) +
geom_col() +
geom_text(aes(label = round(woba, 3)), vjust = -0.5) +
scale_fill_manual(values = c("actual_woba" = "steelblue",
"expected_woba" = "coral")) +
labs(
title = "Aaron Judge: Actual vs Expected wOBA - 2024",
x = "",
y = "wOBA"
) +
theme_minimal() +
theme(legend.position = "none")
```
### Working with Pitch-Level Data
```r
library(baseballr)
library(dplyr)
# Get detailed pitch-level data for a specific game
# First, find a game (the Dodgers visited the Yankees on June 7-9, 2024)
games <- mlb_schedule(season = 2024, level_ids = 1) %>%
  filter(teams_away_team_name == "Los Angeles Dodgers",
         teams_home_team_name == "New York Yankees") %>%
  head(1)
game_pk <- games$game_pk[1]
# Get play-by-play including Statcast
game_data <- mlb_pbp(game_pk)
# Examine structure
glimpse(game_data)
# Alternatively, use date-based scraping for specific teams
yankees_dodgers <- scrape_statcast_savant(
start_date = "2024-06-07",
end_date = "2024-06-09",
player_type = "batter"
) %>%
filter(
(home_team == "NYY" & away_team == "LAD") |
(home_team == "LAD" & away_team == "NYY")
)
# Analyze pitch sequencing
pitch_sequence <- yankees_dodgers %>%
arrange(game_date, at_bat_number, pitch_number) %>%
select(
at_bat_number, pitch_number, pitch_type, pitch_name,
release_speed, pfx_x, pfx_z, plate_x, plate_z,
description, events
)
# Count analysis
count_outcomes <- yankees_dodgers %>%
mutate(count = paste0(balls, "-", strikes)) %>%
group_by(count, pitch_type) %>%
summarise(
n = n(),
avg_velo = mean(release_speed, na.rm = TRUE),
.groups = "drop"
) %>%
arrange(count, desc(n))
print(count_outcomes)
```
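For readers working in Python, the count-based breakdown can be sketched without any third-party packages. This is an illustrative sketch under stated assumptions: `pitch_mix_by_count` is a hypothetical helper name, and the toy pitches are invented records that use the same `balls`, `strikes`, and `pitch_type` fields as the Statcast data.

```python
from collections import Counter, defaultdict


def pitch_mix_by_count(pitches):
    """Return {count: {pitch_type: share_pct}} with pitch types ordered by usage."""
    tallies = defaultdict(Counter)
    for pitch in pitches:
        pitch_type = pitch.get("pitch_type")
        if not pitch_type:  # skip pitches with no classification
            continue
        count = f"{pitch['balls']}-{pitch['strikes']}"
        tallies[count][pitch_type] += 1
    mix = {}
    for count, counter in tallies.items():
        total = sum(counter.values())
        mix[count] = {pt: n / total * 100 for pt, n in counter.most_common()}
    return mix


# Toy pitches (invented for illustration)
toy_pitches = [
    {"balls": 0, "strikes": 0, "pitch_type": "FF"},
    {"balls": 0, "strikes": 0, "pitch_type": "FF"},
    {"balls": 0, "strikes": 0, "pitch_type": "SL"},
    {"balls": 0, "strikes": 0, "pitch_type": "FF"},
    {"balls": 0, "strikes": 2, "pitch_type": "SL"},
    {"balls": 0, "strikes": 2, "pitch_type": "CU"},
    {"balls": 0, "strikes": 2, "pitch_type": None},  # unclassified, skipped
]

mix = pitch_mix_by_count(toy_pitches)
for count in sorted(mix):
    shares = ", ".join(f"{pt} {pct:.0f}%" for pt, pct in mix[count].items())
    print(f"{count}: {shares}")
```

Expressing usage as a share within each count, rather than a raw tally, makes counts with very different pitch totals easier to compare.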
## Conclusion
Baseball Savant has opened MLB's pitch-level tracking data to the public. From casual fans exploring player pages to analysts building sophisticated models, the platform offers a level of detail that was previously available only to teams. With a working knowledge of the search interface, the available metrics, and programmatic access through Python and R, you can run your own analyses on the same data.
Key takeaways:
- **Statcast data** provides granular tracking of every play
- **Search and leaderboards** offer interactive data exploration
- **CSV exports** enable custom analysis
- **Python (pybaseball)** and **R (baseballr)** packages simplify data access
- **Complementary to FanGraphs** for comprehensive analysis
- **Proper interpretation** requires understanding metrics and limitations
Whether you're evaluating players for fantasy baseball, conducting baseball research, or simply satisfying your curiosity about the game, Baseball Savant is an essential resource in the modern baseball landscape.