Accessing FanGraphs Data
FanGraphs Data Access and Analytics Guide
FanGraphs has established itself as one of the premier baseball analytics platforms, providing comprehensive statistics, advanced metrics, and data analysis tools. The site offers everything from traditional counting statistics to cutting-edge sabermetric measures, tracking data, and projection systems. Understanding how to access, interpret, and analyze FanGraphs data is essential for modern baseball analysis, player evaluation, and research.
This comprehensive guide covers FanGraphs metrics, data access methods, programming interfaces, and practical applications. Whether you're a casual fan looking to understand advanced statistics or a serious analyst building predictive models, this tutorial provides the knowledge and tools needed to leverage FanGraphs effectively.
What is FanGraphs?
FanGraphs (www.fangraphs.com) is a baseball statistics and analytics website founded in 2005 that has become an indispensable resource for analysts, writers, front office personnel, and fans. The platform distinguishes itself through several key features:
- Comprehensive Statistics: FanGraphs hosts complete historical data dating back to baseball's early eras, including traditional stats, advanced metrics, and modern tracking data from Statcast.
- Advanced Sabermetrics: The site calculates and displays sophisticated metrics like wOBA (weighted On-Base Average), wRC+ (weighted Runs Created Plus), FIP (Fielding Independent Pitching), and its own version of WAR (Wins Above Replacement).
- Projection Systems: FanGraphs provides access to multiple projection systems including ZiPS, Steamer, ATC (Average Total Cost), and THE BAT, enabling users to forecast future player performance.
- Research Tools: The platform offers leaderboards with customizable date ranges, splits analysis showing performance in different situations, game logs, and advanced filtering options.
- Articles and Analysis: A staff of writers produces daily analytical content explaining metrics, evaluating players and teams, and advancing sabermetric research.
FanGraphs differs from Baseball Reference (its main competitor) in several important ways. While both sites offer comprehensive statistics, FanGraphs emphasizes modern sabermetrics and predictive metrics, provides more granular plate discipline and batted ball data, and uses different methodologies for calculating WAR. Baseball Reference focuses more on historical context and traditional statistics, making the two sites complementary resources.
Understanding FanGraphs Metrics
FanGraphs has popularized numerous advanced metrics, some developed in-house and others originated by outside researchers, that provide deeper insights into player performance than traditional statistics. These metrics attempt to isolate skill, remove contextual factors, and better predict future outcomes.
Key Offensive Metrics
wOBA (weighted On-Base Average)
wOBA = (0.69×uBB + 0.72×HBP + 0.88×1B + 1.24×2B + 1.56×3B + 1.95×HR) / (AB + BB - IBB + SF + HBP)
wOBA measures overall offensive value by weighting each offensive outcome according to its run value. Here uBB denotes unintentional walks (BB - IBB), and the weights shown are approximate, since FanGraphs recalculates them every season. Unlike OPS, which simply adds OBP and SLG (giving OBP too little weight relative to SLG), wOBA weights each outcome according to its actual run-scoring impact. League average wOBA is scaled to match league average OBP (around .320), making it intuitive to interpret.
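As a minimal sketch, the formula translates directly into Python. FanGraphs uses unintentional walks (BB - IBB) in the numerator; the weights below are the approximate ones quoted above rather than any specific season's official values, and the stat line is invented for illustration.

```python
# Approximate linear weights from the formula above; FanGraphs publishes
# season-specific weights, so treat these as illustrative only.
WOBA_WEIGHTS = {"uBB": 0.69, "HBP": 0.72, "1B": 0.88, "2B": 1.24, "3B": 1.56, "HR": 1.95}


def woba(ab, bb, ibb, hbp, singles, doubles, triples, hr, sf, weights=WOBA_WEIGHTS):
    """Compute wOBA from counting stats using the given linear weights."""
    ubb = bb - ibb  # only unintentional walks carry run value in the numerator
    numerator = (
        weights["uBB"] * ubb
        + weights["HBP"] * hbp
        + weights["1B"] * singles
        + weights["2B"] * doubles
        + weights["3B"] * triples
        + weights["HR"] * hr
    )
    denominator = ab + bb - ibb + sf + hbp
    if denominator == 0:
        raise ValueError("no wOBA-eligible plate appearances")
    return numerator / denominator


# Hypothetical season: 500 AB, 60 BB (5 intentional), 5 HBP,
# 100 1B, 30 2B, 2 3B, 25 HR, 5 SF
example = woba(ab=500, bb=60, ibb=5, hbp=5, singles=100, doubles=30,
               triples=2, hr=25, sf=5)
print(f"wOBA: {example:.3f}")
```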
wRC+ (weighted Runs Created Plus)
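wRC+ = (((wRAA/PA + League R/PA) + (League R/PA - Park Factor × League R/PA)) / (League wRC/PA, excluding pitchers)) × 100
where wRAA = ((wOBA - League wOBA) / wOBA Scale) × PA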
wRC+ quantifies a player's total offensive value in a single number adjusted for park effects and league context. A wRC+ of 100 represents league average, with each point above or below representing one percent better or worse than average. A player with a 150 wRC+ has been 50% better than league average, while an 80 wRC+ indicates 20% below average. This context-neutral approach makes wRC+ ideal for comparing players across different eras and ballparks.
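The sketch below follows FanGraphs' published wRC+ formula, which first converts wOBA into runs above average (wRAA = ((wOBA - League wOBA) / wOBA Scale) × PA) and then adds back league runs per PA and a park adjustment. It assumes the park factor is expressed as a ratio (1.00 = neutral) and that the caller supplies the league constants; in practice those come from FanGraphs' season-by-season tables, and the values below are placeholders.

```python
def wraa(woba, lg_woba, woba_scale, pa):
    """Weighted runs above average: ((wOBA - lgwOBA) / wOBA scale) * PA."""
    return (woba - lg_woba) / woba_scale * pa


def wrc_plus(woba, pa, lg_woba, woba_scale, lg_r_per_pa, park_factor, lg_wrc_per_pa):
    """wRC+ on the 100 = league average scale.

    park_factor is assumed to be a ratio (1.00 neutral, >1 hitter-friendly);
    lg_wrc_per_pa is league wRC per PA excluding pitchers.
    """
    runs_per_pa = wraa(woba, lg_woba, woba_scale, pa) / pa + lg_r_per_pa
    park_adjustment = lg_r_per_pa - park_factor * lg_r_per_pa
    return (runs_per_pa + park_adjustment) / lg_wrc_per_pa * 100


# Placeholder league constants, not real values for any season
LEAGUE = dict(lg_woba=0.320, woba_scale=1.25, lg_r_per_pa=0.12, lg_wrc_per_pa=0.12)

average_hitter = wrc_plus(0.320, 600, park_factor=1.00, **LEAGUE)
good_hitter = wrc_plus(0.370, 600, park_factor=1.00, **LEAGUE)
print(round(average_hitter), round(good_hitter))
```

Because the park term is League R/PA minus Park Factor × League R/PA, a hitter-friendly park (factor above 1) lowers wRC+, which is the intended park correction.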
ISO (Isolated Power) measures raw power by subtracting batting average from slugging percentage (ISO = SLG - AVG), isolating extra-base power from overall batting average. League average ISO is typically around .140; .200 is a great mark and .250+ is excellent.
BABIP (Batting Average on Balls In Play) measures how often balls put in play fall for hits, excluding home runs and strikeouts: BABIP = (H - HR) / (AB - K - HR + SF). League average is consistently around .300, and large deviations often signal luck or unsustainable performance. However, some hitters (typically those with hard contact, line-drive approaches, or speed) can sustain higher BABIPs (.330+) through skill.
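Both stats are simple ratios. A minimal sketch using the standard BABIP formula, (H - HR) / (AB - K - HR + SF), with an invented stat line:

```python
def iso(slg, avg):
    """Isolated power: slugging percentage minus batting average."""
    return slg - avg


def babip(h, hr, ab, k, sf):
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF)."""
    balls_in_play = ab - k - hr + sf
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return (h - hr) / balls_in_play


# Hypothetical hitter: 150 H, 25 HR, 500 AB (.300 AVG), 120 K, 5 SF, .520 SLG
print(f"ISO:   {iso(0.520, 0.300):.3f}")
print(f"BABIP: {babip(h=150, hr=25, ab=500, k=120, sf=5):.3f}")
```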
Key Pitching Metrics
FIP (Fielding Independent Pitching)
FIP = ((13×HR + 3×(BB + HBP) - 2×K) / IP) + constant
FIP estimates what a pitcher's ERA should have been based solely on outcomes they control: strikeouts, walks, and home runs. By excluding balls in play, FIP removes the influence of defense and luck. The constant (typically around 3.10) calibrates FIP to the ERA scale. FIP generally predicts future ERA better than current ERA does, making it valuable for player evaluation and transactions.
xFIP (Expected FIP)
xFIP = ((13×(Fly Balls × League HR/FB Rate) + 3×(BB + HBP) - 2×K) / IP) + constant
xFIP adjusts FIP by normalizing home run rates to league average based on fly ball rate. Since HR/FB rates can fluctuate significantly due to luck and ballpark effects, xFIP provides an even more stable predictor of future performance than FIP. If a pitcher's ERA is much better than their xFIP, regression is likely.
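A minimal sketch of both estimators, counting HBP alongside walks as FanGraphs does. The season constant and league HR/FB rate are inputs because they change every year; the values in the example are placeholders. Innings are often written in baseball notation (180.1 means 180 1/3), so a small helper converts them first.

```python
def innings_from_notation(ip):
    """Convert baseball IP notation (e.g. 180.2 = 180 2/3) to true innings."""
    whole = int(ip)
    outs = round((ip - whole) * 10)  # digit after the decimal point = extra outs
    if outs not in (0, 1, 2):
        raise ValueError(f"invalid innings notation: {ip}")
    return whole + outs / 3


def fip(hr, bb, hbp, k, ip, constant):
    """FIP = (13*HR + 3*(BB + HBP) - 2*K) / IP + constant."""
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


def xfip(fly_balls, lg_hr_per_fb, bb, hbp, k, ip, constant):
    """xFIP replaces actual HR with fly balls times the league HR/FB rate."""
    expected_hr = fly_balls * lg_hr_per_fb
    return fip(expected_hr, bb, hbp, k, ip, constant)


# Hypothetical pitcher; the constant and league HR/FB rate are placeholders
ip = innings_from_notation(180.0)
print(f"FIP:  {fip(hr=20, bb=50, hbp=5, k=200, ip=ip, constant=3.10):.2f}")
print(f"xFIP: {xfip(fly_balls=200, lg_hr_per_fb=0.12, bb=50, hbp=5, k=200, ip=ip, constant=3.10):.2f}")
```

In this example the pitcher allowed home runs on 10% of fly balls against a 12% league rate, so xFIP comes out higher than FIP, flagging possible home-run luck.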
SIERA (Skill-Interactive ERA) is a more sophisticated ERA estimator that accounts for batted ball types, strikeouts, walks, and their interactions. Unlike FIP, SIERA recognizes that these components are not independent: for example, high-strikeout pitchers tend to allow weaker contact and fewer hits on balls in play, and ground-ball pitchers are hurt less by walks because ground balls can be turned into double plays.
K% and BB% (strikeouts and walks as percentages of total batters faced) provide clean measures of a pitcher's ability to miss bats and limit free passes. They are more stable and predictive than K/9 and BB/9, whose denominator (innings) depends on how many batters reach base, which is partly driven by team defense and luck on balls in play.
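A small worked example makes the denominator problem concrete. The two hypothetical pitchers below strike out the same share of batters, but the second allows more baserunners, so he records fewer innings while facing the same number of hitters, which inflates his K/9.

```python
def k_pct(strikeouts, batters_faced):
    """Strikeouts as a share of batters faced."""
    return strikeouts / batters_faced


def k_per_9(strikeouts, innings):
    """Strikeouts per nine innings."""
    return 9 * strikeouts / innings


# Hypothetical pitchers: same K and batters faced, different innings
pitchers = {
    "Pitcher A": {"K": 200, "TBF": 800, "IP": 200.0},
    "Pitcher B": {"K": 200, "TBF": 800, "IP": 180.0},  # more baserunners allowed
}

for name, line in pitchers.items():
    print(f"{name}: K% = {k_pct(line['K'], line['TBF']):.1%}, "
          f"K/9 = {k_per_9(line['K'], line['IP']):.1f}")
```

Both pitchers post a 25.0% strikeout rate, yet Pitcher B's K/9 is a full strikeout higher (10.0 vs 9.0) purely because more batters reached base against him.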
Wins Above Replacement (fWAR)
FanGraphs calculates WAR using their own methodology (abbreviated as fWAR to distinguish from Baseball Reference's bWAR). WAR attempts to summarize a player's total contribution in a single number representing wins contributed above a replacement-level player.
For position players, fWAR = (Batting Runs + Base Running Runs + Fielding Runs + Positional Adjustment + League Adjustment + Replacement Runs) / Runs Per Win
For pitchers, fWAR uses FIP rather than ERA as the foundation, making it defense-independent. Key differences between fWAR and bWAR include:
- Pitching Basis: fWAR uses FIP; bWAR uses RA9 (runs allowed per nine innings)
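- Defensive Metrics: fWAR has traditionally used UZR (Ultimate Zone Rating) for fielding runs, while bWAR uses DRS (Defensive Runs Saved); both add a positional adjustment
- Replacement Level: Since 2013 both sites use a unified replacement level (1,000 WAR per 2,430-game season), though they split that total between position players and pitchers slightly differently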
Neither version is definitively "correct" - they represent different philosophical approaches. fWAR's use of FIP makes it more predictive, while bWAR's use of actual runs allowed better reflects what actually happened.
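The position-player fWAR formula is a sum of run components divided by a runs-per-win conversion. A minimal sketch with invented component values and an assumed runs-per-win figure of 10 (FanGraphs derives the actual figure from each season's run environment):

```python
from dataclasses import dataclass


@dataclass
class RunComponents:
    """Run values for a position player (all in runs; hypothetical inputs)."""
    batting: float
    base_running: float
    fielding: float
    positional: float
    league: float
    replacement: float

    def total_runs(self) -> float:
        return (self.batting + self.base_running + self.fielding
                + self.positional + self.league + self.replacement)


def position_player_war(components: RunComponents, runs_per_win: float) -> float:
    """fWAR = (sum of run components) / runs per win."""
    return components.total_runs() / runs_per_win


# Invented example: a strong bat at a slightly premium position
player = RunComponents(batting=20.0, base_running=2.0, fielding=5.0,
                       positional=2.5, league=0.5, replacement=20.0)
print(f"WAR: {position_player_war(player, runs_per_win=10.0):.1f}")
```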
FanGraphs Leaderboards and Export Options
The FanGraphs leaderboards are the primary interface for accessing statistical data. They offer extensive customization and filtering options that make them powerful research tools.
Using Leaderboards Effectively
To access leaderboards, navigate to the "Leaders" tab and select either "Major League" or "Minor League" statistics. The interface provides numerous options:
- Date Range: Analyze any time period from single games to entire careers. Custom date ranges enable studying hot/cold streaks, second-half performance, or specific eras.
- Split Categories: View performance splits by handedness (vs LHP/RHP), home/road, day/night, month, count, men on base, and many other situations.
- Stat Types: Choose from standard batting, advanced batting, batted ball, plate discipline, pitch type, or value metrics. Multiple tabs organize the dozens of available statistics.
- Minimum Plate Appearances/Innings: Filter for qualified players or customize thresholds to include/exclude specific players.
- Teams and Positions: Filter by team, position, or league to narrow results.
The "Dashboard" view presents the most important metrics on a single screen, making it ideal for quick player comparisons. The "Standard" view shows traditional counting stats, while "Advanced" displays sabermetric measures like wRC+, wOBA, and WAR.
Exporting Data
FanGraphs makes data export simple and flexible. At the bottom of every leaderboard, you'll find an "Export Data" button. This generates a CSV file containing all displayed statistics for the filtered players. The export includes:
- All visible columns from your selected view (Standard, Advanced, Batted Ball, etc.)
- Player names, team affiliations, and identifiers
- Statistics formatted as clean numerical values suitable for analysis
Best practices for exporting FanGraphs data:
- Select the appropriate stat view before exporting - you can only export visible columns
- Adjust player minimums to avoid cluttering datasets with small sample sizes
- Use custom date ranges to isolate specific time periods of interest
- Export to CSV for easy import into Excel, R, Python, or SQL databases
- Maintain consistent naming conventions when saving exported files for analysis pipelines
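Once exported, a leaderboard CSV loads directly into pandas. The helper below is a sketch under one assumption worth checking against your own files: some FanGraphs exports store percentage columns as text such as "12.5 %", so it strips the percent sign from any non-numeric column whose name ends in %. The file name in the comment is hypothetical and the sample data is invented.

```python
import io

import pandas as pd


def load_fangraphs_export(path_or_buffer):
    """Read a FanGraphs leaderboard CSV and coerce text percentage columns to numbers.

    Values stay on the 0-100 scale (e.g. "12.5 %" -> 12.5).
    """
    df = pd.read_csv(path_or_buffer)
    for col in df.columns:
        if col.endswith("%") and not pd.api.types.is_numeric_dtype(df[col]):
            cleaned = df[col].astype(str).str.replace("%", "", regex=False).str.strip()
            df[col] = pd.to_numeric(cleaned, errors="coerce")
    return df


# Real use: load_fangraphs_export("fangraphs_batting_2023.csv")  (hypothetical file name)
sample_csv = io.StringIO(
    "Name,Team,PA,BB%,wRC+\n"
    "Player A,NYY,600,12.5 %,150\n"
    "Player B,LAD,550,8.0 %,110\n"
)
batting = load_fangraphs_export(sample_csv)
print(batting.dtypes)
```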
WAR Calculation Differences: fWAR vs bWAR
Understanding the methodological differences between FanGraphs WAR (fWAR) and Baseball Reference WAR (bWAR) is crucial for proper interpretation and analysis. Both attempt to measure total player value, but they make different assumptions and use different components.
Position Player WAR Differences
| Component | fWAR (FanGraphs) | bWAR (Baseball Reference) |
|---|---|---|
| Offensive Runs | wRAA (weighted Runs Above Average) | Batting Runs (similar methodology) |
| Base Running | Base running runs (BsR) | Base running runs (similar) |
| Defense | UZR (Ultimate Zone Rating); Total Zone for seasons before UZR data | DRS (Defensive Runs Saved); Total Zone (TZR) for earlier seasons |
| Positional Adjustment | Spectrum based on positional scarcity | Similar positional adjustments |
| Replacement Level | Unified baseline (1,000 WAR per 2,430 games) | Same unified baseline |
Pitcher WAR Differences
The most significant difference lies in pitching evaluation:
- fWAR uses FIP as its foundation, focusing on strikeouts, walks, and home runs. This approach credits pitchers only for outcomes they directly control, removing defense and luck. fWAR better predicts future performance and isolates pitcher skill.
- bWAR uses RA9-WAR based on actual runs allowed (both earned and unearned), adjusted for the quality of the team's defense and for park. This approach credits pitchers for their actual results, including their ability to prevent hits on balls in play and strand runners. bWAR better captures what actually happened in the past.
Practical implications of these differences:
- Pitchers with strong defenses behind them can post higher bWAR than fWAR, although bWAR's team-defense adjustment offsets part of this effect
- Pitchers who prevent hits on balls in play through skill will tend to have higher bWAR
- Pitchers with an unsustainably low BABIP may see bWAR overstate their true talent
- For projection and player acquisition, fWAR typically provides better guidance
- For historical assessment of what actually occurred, bWAR may be preferred
When comparing players, it's valuable to examine both fWAR and bWAR. Large discrepancies often reveal interesting insights about defense, luck, or specific skills. Neither metric is perfect - both provide useful perspectives on player value.
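Comparing the two systems programmatically is a join plus a difference. The sketch assumes you have already built one table of fWAR and one of bWAR keyed by a shared player ID; the column names `player_id`, `fWAR`, and `bWAR` are hypothetical (the ID register behind pybaseball's `playerid_lookup` returns both FanGraphs and Baseball Reference keys, which can bridge the two), and the numbers are invented.

```python
import pandas as pd


def war_discrepancies(fg: pd.DataFrame, bref: pd.DataFrame, threshold: float = 1.5) -> pd.DataFrame:
    """Return players whose fWAR and bWAR differ by at least `threshold` wins."""
    merged = fg[["player_id", "Name", "fWAR"]].merge(
        bref[["player_id", "bWAR"]], on="player_id", how="inner"
    )
    merged["gap"] = merged["fWAR"] - merged["bWAR"]  # positive: fWAR is higher
    flagged = merged.loc[merged["gap"].abs() >= threshold]
    return flagged.sort_values("gap", key=lambda s: s.abs(), ascending=False)


# Invented data for illustration
fg = pd.DataFrame({"player_id": [1, 2, 3],
                   "Name": ["Pitcher A", "Pitcher B", "Pitcher C"],
                   "fWAR": [5.0, 3.0, 2.0]})
bref = pd.DataFrame({"player_id": [1, 2, 3], "bWAR": [3.0, 2.8, 4.1]})
print(war_discrepancies(fg, bref))
```

For a pitcher, a negative gap (bWAR above fWAR) means he allowed fewer runs than his strikeouts, walks, and home runs imply, which is where questions about BABIP, strand rate, and defense come in.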
FanGraphs-Specific Advanced Metrics
Plate Discipline Metrics
FanGraphs provides granular plate discipline statistics that illuminate a hitter's approach and ability to identify pitches:
- O-Swing%: Percentage of pitches outside the strike zone at which a batter swings. League average is ~30%; elite plate discipline shows O-Swing% under 25%.
- Z-Swing%: Percentage of pitches in the strike zone at which a batter swings. League average is ~67%; aggressive hitters exceed 70%.
- Swing%: Overall swing percentage. Typically around 45-48% league-wide.
- O-Contact%: Percentage of swings on pitches outside the zone that result in contact. Measures ability to make contact on bad pitches. League average ~60%.
- Z-Contact%: Percentage of swings in the zone resulting in contact. Elite contact hitters exceed 90%.
- Contact%: Overall contact rate on swings. League average ~78%; elite contact skills show 85%+.
- Zone%: Percentage of pitches seen in the strike zone. Reveals how pitchers approach the hitter; star hitters often see a below-average Zone% as pitchers work around them.
- F-Strike%: Percentage of plate appearances beginning with a first-pitch strike. Measures how often a batter falls behind in the count.
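Each of these rates is a simple ratio of pitch counts. A minimal sketch, assuming you already have in-zone and out-of-zone pitch, swing, and contact counts for a hitter (the counts below are invented). FanGraphs' published figures come from its pitch-tracking data sources, whose zone definitions can differ, so this only shows how the ratios are built; F-Strike% needs plate-appearance-level data and is omitted.

```python
from dataclasses import dataclass


@dataclass
class PitchCounts:
    zone_pitches: int
    zone_swings: int
    zone_contact: int
    outside_pitches: int
    outside_swings: int
    outside_contact: int


def plate_discipline(c: PitchCounts) -> dict:
    """Compute FanGraphs-style plate discipline rates from raw counts."""
    total_pitches = c.zone_pitches + c.outside_pitches
    total_swings = c.zone_swings + c.outside_swings
    total_contact = c.zone_contact + c.outside_contact
    return {
        "O-Swing%": c.outside_swings / c.outside_pitches,
        "Z-Swing%": c.zone_swings / c.zone_pitches,
        "Swing%": total_swings / total_pitches,
        "O-Contact%": c.outside_contact / c.outside_swings,
        "Z-Contact%": c.zone_contact / c.zone_swings,
        "Contact%": total_contact / total_swings,
        "Zone%": c.zone_pitches / total_pitches,
    }


# Invented season of pitch counts
hitter = PitchCounts(zone_pitches=1000, zone_swings=670, zone_contact=580,
                     outside_pitches=1200, outside_swings=360, outside_contact=216)
for stat, value in plate_discipline(hitter).items():
    print(f"{stat:>11}: {value:.1%}")
```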
Batted Ball Metrics
FanGraphs tracks batted ball types to provide insight into hitting approach and power potential:
- GB% / LD% / FB%: Ground ball, line drive, and fly ball percentages. League average is roughly 45% GB, 20% LD, 35% FB. Extreme ground ball pitchers exceed 50% GB rate; fly ball power hitters show FB% over 40%.
- HR/FB: Home run to fly ball ratio. League average has ranged from roughly 10% to 15% depending on the season's run environment. Elite power hitters sustain HR/FB rates of 15-20% or more. Significant year-to-year variance is common, especially for pitchers, making it useful for identifying regression candidates.
- Pull% / Cent% / Oppo%: Percentages of batted balls hit to the pull field, center, and opposite field. Balanced hitters show a roughly even distribution; extreme pull hitters exceed a 50% pull rate and were historically vulnerable to defensive shifts, which MLB restricted starting in 2023.
- Soft% / Med% / Hard%: Percentage of batted balls hit softly, medium, or hard. These are subjective classifications based on how the ball comes off the bat. Hard% above 40% indicates strong contact ability; Soft% above 20% suggests weak contact.
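GB%, LD%, FB%, and HR/FB are likewise ratios of batted-ball counts. A minimal sketch with invented counts; it assumes the fly-ball total already includes home runs, and it leaves out Soft%/Med%/Hard%, which depend on the data provider's classification.

```python
def batted_ball_profile(gb: int, ld: int, fb: int, hr: int) -> dict:
    """Batted-ball mix and HR/FB from raw counts.

    Assumes `fb` counts every fly ball, home runs included.
    """
    total = gb + ld + fb
    if total == 0 or fb == 0:
        raise ValueError("need batted balls and at least one fly ball")
    return {
        "GB%": gb / total,
        "LD%": ld / total,
        "FB%": fb / total,
        "HR/FB": hr / fb,
    }


# Invented season: 400 batted balls, 20 home runs
profile = batted_ball_profile(gb=180, ld=80, fb=140, hr=20)
print({stat: f"{value:.1%}" for stat, value in profile.items()})
```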
Statcast Integration
FanGraphs incorporates Statcast metrics, providing objective measurements of batted ball and pitch characteristics:
- Exit Velocity: Average and maximum speed of batted balls off the bat. League average is roughly 88-89 mph; hitters averaging 92+ mph rank among the leaders, and 95+ mph represents exceptional raw power.
- Launch Angle: Average vertical angle of batted balls off the bat. Statcast defines the "sweet spot" as 8-32 degrees; power hitters generally post higher average launch angles than ground-ball or line-drive hitters.
- Barrel%: Percentage of batted balls classified as "barrels" (an optimal combination of exit velocity and launch angle, requiring at least 98 mph off the bat). Barrel% above 10% is well above average, and the top power hitters exceed 15%; league average is roughly 6-8%.
- HardHit%: Percentage of batted balls with 95+ mph exit velocity. Strongly correlated with offensive production; league average is roughly 38-40%, and 45%+ is a strong mark.
- EV95%: On Baseball Savant, the share of batted balls hit 95+ mph, the same threshold used for HardHit%. It describes contact quality rather than serving as an expected batting average.
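Given a list of exit velocities (for example, the `launch_speed` column of Statcast data pulled with PyBaseball), average exit velocity and HardHit% take a few lines. This sketch uses the 95 mph threshold described above. Barrel% is left out because the barrel definition pairs a minimum exit velocity with a launch-angle window that widens as exit velocity rises, which a short example cannot reproduce faithfully. The velocities are invented.

```python
def contact_quality(exit_velocities, hard_hit_mph=95.0):
    """Average EV, max EV, and HardHit% for batted-ball exit velocities (mph)."""
    # None marks an untracked ball; pandas NaN values would need dropna() first
    evs = [ev for ev in exit_velocities if ev is not None]
    if not evs:
        raise ValueError("no tracked batted balls")
    hard_hits = sum(1 for ev in evs if ev >= hard_hit_mph)
    return {
        "avg_ev": sum(evs) / len(evs),
        "max_ev": max(evs),
        "HardHit%": hard_hits / len(evs),
    }


# Invented batted balls
summary = contact_quality([101.2, 88.0, 95.0, 72.4, None])
print(summary)
```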
Splits and Game Logs
FanGraphs provides comprehensive splits data showing player performance in various contexts. This granular data reveals platoon splits, situational performance, and home/road differences.
Available Split Categories
- Platoon Splits: Performance vs LHP and RHP. Reveals vulnerability to same-handed pitching and potential platoon situations. Gaps of 50+ points of wRC+ are significant, but samples against one side (often left-handed pitching) are usually small, so large splits deserve caution.
- Home/Road: Identifies park effects and travel impacts. Large home/road disparities may indicate park-dependent skills or comfort levels.
- Month: Reveals seasonal patterns, hot/cold streaks, and adjustment periods. Useful for identifying players who start slow or fade in summer heat.
- Count: Performance in different count situations (ahead, behind, even). Every hitter performs worse when behind in the count, so compare a player's count splits with league-wide baselines rather than with his own overall line.
- Men On/Bases Empty: Clutch performance indicators. Large disparities may indicate pressing or rising to the occasion, though these effects are often overstated.
- High/Medium/Low Leverage: Performance in different game situations by leverage index. True clutch performers are rare; most variance is noise.
- Inning: Performance by inning, useful for identifying fatigue patterns in pitchers or late-game tendencies.
- Day/Night: Some players perform significantly better under lights or in day games, potentially due to vision or circadian rhythm factors.
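Splits can also be rebuilt from pitch-level Statcast data, for example the output of PyBaseball's `statcast_batter`. The sketch below assumes the Baseball Savant columns `p_throws`, `woba_value`, and `woba_denom` (the last two populated only on pitches that end a wOBA-eligible plate appearance) and computes wOBA against each pitcher hand. It produces raw wOBA, not FanGraphs' league- and park-adjusted split lines, and the sample rows are invented.

```python
import pandas as pd


def platoon_woba(pitches: pd.DataFrame) -> pd.Series:
    """wOBA by pitcher handedness from pitch-level rows."""
    # groupby().sum() skips NaN, so pitches that did not end a PA contribute nothing
    totals = pitches.groupby("p_throws")[["woba_value", "woba_denom"]].sum()
    return totals["woba_value"] / totals["woba_denom"]


# Invented rows: NaN marks pitches that did not end a plate appearance
sample = pd.DataFrame({
    "p_throws": ["L", "L", "L", "R", "R", "R"],
    "woba_value": [0.9, 0.0, float("nan"), 2.0, 0.0, 0.0],
    "woba_denom": [1.0, 1.0, float("nan"), 1.0, 1.0, 1.0],
})
splits = platoon_woba(sample)
print(splits.round(3))
```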
Game Logs
Game logs provide game-by-game performance for every game a player appeared in. Each log entry includes:
- Date, opponent, and location
- Traditional statistics (AB, H, R, RBI, BB, K, etc.)
- Advanced metrics calculated for that specific game
- Pitch counts and usage patterns for pitchers
- Win probability added (WPA) showing impact on game outcome
Game logs are invaluable for:
- Identifying streaks and slumps
- Analyzing performance against specific opponents or pitchers
- Tracking workload and fatigue indicators
- Building time-series models for prediction
- Verifying consistency versus volatility
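For streak and slump work, a rolling window over game logs is a common starting point. This sketch assumes a game-level DataFrame with hypothetical `game_date`, `woba_value` (sum of wOBA run values), and `woba_denom` (wOBA-eligible plate appearances) columns, such as one aggregated from Statcast rows by game. It sums numerator and denominator over the window rather than averaging per-game wOBA, so games with more plate appearances count for more.

```python
import pandas as pd


def rolling_woba(game_log: pd.DataFrame, window: int = 10) -> pd.Series:
    """PA-weighted wOBA over a rolling window of games (NaN until the window fills)."""
    ordered = game_log.sort_values("game_date")
    numerator = ordered["woba_value"].rolling(window, min_periods=window).sum()
    denominator = ordered["woba_denom"].rolling(window, min_periods=window).sum()
    return numerator / denominator


# Invented four-game log with a three-game window
log = pd.DataFrame({
    "game_date": pd.to_datetime(["2023-04-01", "2023-04-02", "2023-04-03", "2023-04-04"]),
    "woba_value": [1.8, 0.0, 2.7, 0.9],
    "woba_denom": [4, 4, 5, 3],
})
print(rolling_woba(log, window=3).round(3))
```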
Projection Systems on FanGraphs
FanGraphs hosts multiple projection systems that forecast future player performance. These systems use different methodologies but all attempt to predict next season's statistics based on historical performance, aging curves, and regression to the mean.
ZiPS (Szymborski Projection System)
Developed by Dan Szymborski, ZiPS uses a nearest-neighbor approach that identifies similar players from baseball history and projects based on their aging patterns. ZiPS strengths include:
- Deep historical database spanning decades of player comparisons
- Sophisticated aging curves customized by position and skill profile
- Automatic regression to league mean based on sample size and reliability
- Regular in-season updates as new performance data emerges
Steamer
Created by a team of analysts including Jared Cross, Dash Davidson, and Peter Rosenbloom, Steamer emphasizes recent performance and uses a hybrid approach combining regression and similar-player analysis. Steamer characteristics:
- Heavy weighting of recent three years of performance
- Component-based approach projects underlying skills rather than outcomes
- Playing time estimates based on depth charts and projected roles
- Generally conservative, often underestimating breakouts but avoiding overconfidence in unsustainable performance
ATC (Average Total Cost)
ATC aggregates multiple projection systems by taking weighted averages, creating an ensemble forecast. The methodology:
- Combines multiple systems, including ZiPS, Steamer, and THE BAT
- Weights systems based on historical accuracy
- Provides range estimates showing uncertainty
- Often most accurate due to diversification across methodologies
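The core of an ensemble projection is a weighted average of each system's forecast. The sketch below shows only that step; the system names, weights, and projected values are hypothetical, and ATC's actual weighting scheme is not reproduced here.

```python
from typing import Dict, Mapping


def weighted_ensemble(
    projections: Mapping[str, Mapping[str, float]],
    weights: Mapping[str, float],
) -> Dict[str, float]:
    """Weighted average of each stat across projection systems.

    Weights are renormalized over the systems present, so they need not sum to 1.
    """
    total_weight = sum(weights[system] for system in projections)
    stats = next(iter(projections.values())).keys()
    return {
        stat: sum(weights[system] * proj[stat] for system, proj in projections.items())
        / total_weight
        for stat in stats
    }


# Hypothetical forecasts and weights for one player
projections = {
    "System A": {"HR": 30, "wOBA": 0.350},
    "System B": {"HR": 20, "wOBA": 0.330},
}
weights = {"System A": 0.75, "System B": 0.25}
print(weighted_ensemble(projections, weights))
```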
THE BAT
A newer projection system emphasizing Statcast data and granular skill components:
- Incorporates exit velocity, launch angle, and swing metrics
- Uses machine learning to identify predictive patterns
- Accounts for player development and skill changes
- Generally more aggressive in projecting young player development
Using Projections Effectively
Best practices for working with projections:
- No projection system is perfect - use multiple systems and compare
- Projections work best for established players with 3+ years of data
- Young players and rookies have wide uncertainty ranges
- Consider projection ranges and confidence intervals, not just point estimates
- Update projections as season progresses - early season projections are less accurate
- Combine projections with scouting reports for complete player evaluation
- Use projections for relative comparisons rather than precise predictions
API and Data Access
While FanGraphs doesn't provide an official public API, several methods exist for programmatic data access. The most common approach is to use the PyBaseball library in Python or baseballr in R, which scrape FanGraphs pages and return clean DataFrames.
Python Data Access with PyBaseball
```python
import pandas as pd
from pybaseball import batting_stats, pitching_stats
from pybaseball import playerid_lookup, statcast_batter

# Enable caching to reduce load on FanGraphs servers
from pybaseball import cache
cache.enable()


# Fetch FanGraphs batting leaderboard data
def get_batting_leaders(start_season=2023, end_season=2023, min_pa=100):
    """
    Get batting statistics from FanGraphs.

    Parameters:
    -----------
    start_season : int
        First season to include
    end_season : int
        Last season to include
    min_pa : int
        Minimum plate appearances to include

    Returns:
    --------
    DataFrame with batting statistics
    """
    # batting_stats scrapes the FanGraphs leaderboard; qual sets the PA minimum
    data = batting_stats(start_season, end_season, qual=min_pa)

    # Select key columns
    columns = [
        'Name', 'Team', 'Age', 'G', 'PA', 'AB', 'H', 'HR', 'R', 'RBI', 'SB',
        'BB%', 'K%', 'ISO', 'BABIP', 'AVG', 'OBP', 'SLG', 'wOBA', 'wRC+',
        'BsR', 'Off', 'Def', 'WAR'
    ]
    result = data[columns].copy()
    return result


# Example: Get 2023 batting leaders
batting_2023 = get_batting_leaders(2023, 2023, min_pa=400)

# Sort by wRC+ and display top performers
top_hitters = batting_2023.nlargest(10, 'wRC+')
print("Top 10 Hitters by wRC+ (2023):")
print(top_hitters[['Name', 'Team', 'PA', 'wOBA', 'wRC+', 'WAR']])
```
```python
# Fetch pitching statistics
def get_pitching_leaders(start_season=2023, end_season=2023, min_ip=50):
    """
    Get pitching statistics from FanGraphs.

    Parameters:
    -----------
    start_season : int
        First season to include
    end_season : int
        Last season to include
    min_ip : int
        Minimum innings pitched to include

    Returns:
    --------
    DataFrame with pitching statistics
    """
    data = pitching_stats(start_season, end_season, qual=min_ip)

    columns = [
        'Name', 'Team', 'Age', 'W', 'L', 'SV', 'G', 'GS', 'IP',
        'K/9', 'BB/9', 'HR/9', 'BABIP', 'LOB%', 'GB%', 'HR/FB',
        'ERA', 'FIP', 'xFIP', 'SIERA', 'K%', 'BB%', 'WAR'
    ]
    result = data[columns].copy()
    return result


# Example: Get 2023 pitching leaders
pitching_2023 = get_pitching_leaders(2023, 2023, min_ip=100)

# Sort by WAR
top_pitchers = pitching_2023.nlargest(10, 'WAR')
print("\nTop 10 Pitchers by WAR (2023):")
print(top_pitchers[['Name', 'Team', 'IP', 'ERA', 'FIP', 'WAR']])
```
```python
# Get specific player data
def get_player_profile(last_name, first_name, end_season=2023):
    """
    Look up player and get their career statistics.

    Parameters:
    -----------
    last_name : str
        Player's last name
    first_name : str
        Player's first name
    end_season : int
        Last season to include

    Returns:
    --------
    Dictionary with player info and career stats
    """
    # Look up player IDs
    player_lookup = playerid_lookup(last_name, first_name)
    if len(player_lookup) == 0:
        return {"error": f"Player {first_name} {last_name} not found"}
    player_info = player_lookup.iloc[0]

    # playerid_lookup has no debut-date column; mlb_played_first holds the
    # first MLB season (NaN if unknown)
    first_season = player_info['mlb_played_first']
    start_season = int(first_season) if pd.notna(first_season) else 2020

    # Get career batting stats if available
    try:
        career_batting = batting_stats(start_season, end_season, qual=0)
        # Match on the FanGraphs ID rather than name substrings
        player_stats = career_batting[
            career_batting['IDfg'] == player_info['key_fangraphs']
        ]
        return {
            'player_info': player_info,
            'career_stats': player_stats
        }
    except Exception:
        # Network or parsing failure: return the ID information only
        return {
            'player_info': player_info,
            'career_stats': None
        }


# Example: Get Aaron Judge profile
judge_profile = get_player_profile('Judge', 'Aaron')
print("\nAaron Judge Career Information:")
print(judge_profile['player_info'][['name_first', 'name_last', 'mlb_played_first', 'key_mlbam']])
```
Advanced PyBaseball Usage
```python
import pandas as pd
from pybaseball import statcast_batter


# Get player game logs
def get_player_game_log(player_id, year):
    """
    Fetch game-by-game Statcast summaries for a batter.

    Parameters:
    -----------
    player_id : int
        MLB player ID (MLBAM ID)
    year : int
        Season year

    Returns:
    --------
    DataFrame with game-by-game performance
    """
    # PyBaseball has no direct FanGraphs game log function, so this
    # approximates one by aggregating pitch-level Statcast data by game.
    start_date = f'{year}-01-01'
    end_date = f'{year}-12-31'
    statcast_data = statcast_batter(start_date, end_date, player_id)

    # 'events' is filled only on the pitch that ends a plate appearance, so
    # its count approximates plate appearances; the launch and xwOBA columns
    # are averaged over the pitches where Statcast recorded them.
    game_logs = statcast_data.groupby('game_date').agg({
        'launch_speed': 'mean',
        'launch_angle': 'mean',
        'events': 'count',
        'estimated_woba_using_speedangle': 'mean',
        'woba_value': 'sum'
    }).reset_index()

    game_logs.columns = [
        'game_date', 'avg_exit_velo', 'avg_launch_angle',
        'plate_appearances', 'xwOBA_contact', 'total_woba'
    ]
    return game_logs
```
```python
# Working with projection data
def compare_projections(player_name, year=2024):
    """
    Compare multiple projection systems for a player.

    Parameters:
    -----------
    player_name : str
        Player name
    year : int
        Projection year

    Returns:
    --------
    DataFrame comparing projection systems
    """
    # Conceptual framework: real projections must be scraped from FanGraphs
    # pages. The numbers below are illustrative placeholders, not actual
    # projections, and player_name/year are not used by this stub.
    projections = {
        'System': ['ZiPS', 'Steamer', 'ATC', 'THE BAT'],
        'PA': [550, 560, 555, 565],
        'HR': [28, 25, 27, 30],
        'R': [85, 82, 84, 88],
        'RBI': [82, 78, 80, 85],
        'SB': [12, 10, 11, 13],
        'AVG': [.275, .268, .272, .278],
        'wOBA': [.350, .342, .346, .355],
        'WAR': [4.2, 3.8, 4.0, 4.5]
    }
    return pd.DataFrame(projections)
```
```python
# Batch processing multiple players
from pybaseball import batting_stats


def analyze_team_offense(team_abbrev, year):
    """
    Analyze offensive performance for all players on a team.

    Parameters:
    -----------
    team_abbrev : str
        Team abbreviation (e.g., 'NYY', 'LAD')
    year : int
        Season year

    Returns:
    --------
    Tuple of (DataFrame of player statistics, dict of team summary values)
    """
    # Get all batting data (qual=0 removes the plate-appearance minimum)
    all_batting = batting_stats(year, qual=0)

    # Filter to specific team. Players traded mid-season may appear under a
    # combined multi-team label and drop out of this filter.
    team_batting = all_batting[all_batting['Team'] == team_abbrev].copy()

    # Team totals, plus PA-weighted rate stats so part-time players
    # don't count as much as everyday regulars
    pa = team_batting['PA']
    team_summary = {
        'total_war': team_batting['WAR'].sum(),
        'pa_weighted_wrc_plus': (team_batting['wRC+'] * pa).sum() / pa.sum(),
        'total_hr': team_batting['HR'].sum(),
        'pa_weighted_woba': (team_batting['wOBA'] * pa).sum() / pa.sum(),
        'player_count': len(team_batting)
    }
    return team_batting, team_summary


# Example usage
yankees_2023, yankees_summary = analyze_team_offense('NYY', 2023)
print("\nNew York Yankees 2023 Offense Summary:")
for metric, value in yankees_summary.items():
    print(f"  {metric}: {value:.2f}" if isinstance(value, float) else f"  {metric}: {value}")
```
R Data Access with baseballr
```r
library(baseballr)
library(tidyverse)
library(lubridate)

# Fetch FanGraphs batting leaderboard
get_batting_leaders <- function(start_season = 2023, end_season = 2023, min_pa = 100) {
  # Get batting data from FanGraphs
  data <- fg_batter_leaders(
    startseason = start_season,
    endseason = end_season,
    qual = min_pa,
    ind = 1  # Individual season (0 for aggregate)
  )

  # Select key columns. Column names returned by baseballr have changed
  # between releases; check names(data) and adjust if select() fails.
  result <- data %>%
    select(
      Name, Team, Age, G, PA, AB, H, HR, R, RBI, SB,
      `BB%`, `K%`, ISO, BABIP, AVG, OBP, SLG, wOBA, `wRC+`,
      BsR, Off, Def, WAR
    )

  return(result)
}

# Example: Get 2023 batting leaders
batting_2023 <- get_batting_leaders(2023, 2023, 400)

# Display top hitters
top_hitters <- batting_2023 %>%
  arrange(desc(`wRC+`)) %>%
  head(10) %>%
  select(Name, Team, PA, wOBA, `wRC+`, WAR)

cat("Top 10 Hitters by wRC+ (2023):\n")
print(top_hitters)
```
```r
# Fetch pitching statistics
get_pitching_leaders <- function(start_season = 2023, end_season = 2023, min_ip = 50) {
  data <- fg_pitcher_leaders(
    startseason = start_season,
    endseason = end_season,
    qual = min_ip,
    ind = 1
  )

  result <- data %>%
    select(
      Name, Team, Age, W, L, SV, G, GS, IP,
      `K/9`, `BB/9`, `HR/9`, BABIP, `LOB%`, `GB%`, `HR/FB`,
      ERA, FIP, xFIP, SIERA, `K%`, `BB%`, WAR
    )

  return(result)
}

# Example: Get 2023 pitching leaders
pitching_2023 <- get_pitching_leaders(2023, 2023, 100)

top_pitchers <- pitching_2023 %>%
  arrange(desc(WAR)) %>%
  head(10) %>%
  select(Name, Team, IP, ERA, FIP, WAR)

cat("\nTop 10 Pitchers by WAR (2023):\n")
print(top_pitchers)
```
```r
# Get player game logs
get_player_game_logs <- function(playerid, year) {
  # fg_batter_game_logs() returns game-by-game batting lines, not splits;
  # playerid is the FanGraphs player ID
  game_logs <- fg_batter_game_logs(
    playerid = playerid,
    year = year
  )

  return(game_logs)
}
```
```r
# Analyze team performance
analyze_team_offense <- function(team_abbrev, year) {
  # Get all batting data
  all_batting <- get_batting_leaders(year, year, min_pa = 0)

  # Filter to team
  team_batting <- all_batting %>%
    filter(Team == team_abbrev)

  # Calculate team summary; rate stats are PA-weighted so part-time
  # players don't count as much as everyday regulars
  team_summary <- team_batting %>%
    summarise(
      total_war = sum(WAR, na.rm = TRUE),
      pa_weighted_wrc_plus = weighted.mean(`wRC+`, PA, na.rm = TRUE),
      total_hr = sum(HR, na.rm = TRUE),
      pa_weighted_woba = weighted.mean(wOBA, PA, na.rm = TRUE),
      player_count = n()
    )

  return(list(
    players = team_batting,
    summary = team_summary
  ))
}

# Example: Analyze Yankees offense
yankees_2023 <- analyze_team_offense("NYY", 2023)

cat("\nNew York Yankees 2023 Offense Summary:\n")
print(yankees_2023$summary)
```
```r
# Working with projection data
compare_projections <- function(player_name, year = 2024) {
  # Conceptual framework: the numbers below are illustrative placeholders,
  # not real projections. Actual values require scraping FanGraphs
  # projection pages.
  projections <- tibble(
    System = c('ZiPS', 'Steamer', 'ATC', 'THE BAT'),
    PA = c(550, 560, 555, 565),
    HR = c(28, 25, 27, 30),
    R = c(85, 82, 84, 88),
    RBI = c(82, 78, 80, 85),
    SB = c(12, 10, 11, 13),
    AVG = c(.275, .268, .272, .278),
    wOBA = c(.350, .342, .346, .355),
    WAR = c(4.2, 3.8, 4.0, 4.5)
  )

  return(projections)
}

projection_table <- compare_projections("Example Player")

# Calculate average projection across systems
avg_projection <- projection_table %>%
  summarise(across(where(is.numeric), mean)) %>%
  mutate(System = "Average")

cat("\nProjection System Comparison:\n")
print(bind_rows(projection_table, avg_projection))
```
```r
# Advanced analysis: Identify breakout candidates
identify_breakout_candidates <- function(year) {
  # Get current year and prior year data (250 PA minimum in each)
  current <- get_batting_leaders(year, year, 250)
  prior <- get_batting_leaders(year - 1, year - 1, 250)

  # Join datasets (matching on Name assumes player names are unique)
  comparison <- current %>%
    select(Name, Team, Age, wRC_current = `wRC+`, WAR_current = WAR) %>%
    inner_join(
      prior %>% select(Name, wRC_prior = `wRC+`, WAR_prior = WAR),
      by = "Name"
    ) %>%
    mutate(
      wrc_improvement = wRC_current - wRC_prior,
      war_improvement = WAR_current - WAR_prior
    ) %>%
    filter(
      wrc_improvement > 20,  # 20+ point wRC+ improvement
      Age <= 26              # Focus on young players
    ) %>%
    arrange(desc(wrc_improvement))

  return(comparison)
}

# Example: Find 2023 breakout players
breakouts_2023 <- identify_breakout_candidates(2023)

cat("\n2023 Breakout Candidates (wRC+ improvement):\n")
print(head(breakouts_2023, 10))
```
Comparing FanGraphs with Baseball Reference
FanGraphs and Baseball Reference are the two most popular baseball statistics websites. While they overlap significantly, each offers unique features and perspectives that make them complementary resources.
Key Differences
| Feature | FanGraphs | Baseball Reference |
|---|---|---|
| WAR Calculation | fWAR (FIP-based for pitchers) | bWAR (RA9-based for pitchers) |
| Focus | Advanced metrics, projections, modern sabermetrics | Historical context, traditional stats, comprehensive archives |
| Defensive Metrics | UZR (Total Zone for older seasons) | DRS (Total Zone for older seasons) |
| Interface | Modern, leaderboard-focused | Traditional, player page-focused |
| Plate Discipline Data | Extensive (O-Swing%, Z-Swing%, etc.) | Limited |
| Batted Ball Data | Comprehensive (GB%, FB%, Hard%, etc.) | Basic |
| Projections | Multiple systems (ZiPS, Steamer, ATC, THE BAT) | Limited projection access |
| Advanced Search | Limited search tools | Stathead (formerly the Play Index) for complex historical queries |
| Articles | Daily sabermetric analysis and research | Minimal editorial content |
| Historical Data | Complete but less contextual | Complete with rich historical context |
| Minor Leagues | Comprehensive coverage | Basic coverage |
When to Use Each Site
Use FanGraphs for:
- Modern player evaluation using advanced metrics
- Projecting future performance
- Analyzing plate discipline and batted ball profiles
- Comparing projection systems
- Understanding pitching with defense-independent metrics
- Reading analytical articles and sabermetric research
- Exporting leaderboard data for analysis
- Minor league player evaluation
Use Baseball Reference for:
- Historical research and career comparisons across eras
- Comprehensive player pages with complete career statistics
- Stathead (formerly the Play Index) for complex historical queries
- Game logs and play-by-play data
- Traditional statistics and counting stats
- Awards, transactions, and biographical information
- Actual run prevention evaluation (RA9-WAR)
- Similarity scores and Hall of Fame statistics
Ideal Approach: Use both sites for comprehensive analysis. Start with FanGraphs for modern metrics and projections, then verify with Baseball Reference's historical context and actual results. The different WAR calculations provide useful bounds on player value - truth often lies between fWAR and bWAR.
FanGraphs Statistics Glossary
Offensive Statistics
| Stat | Full Name | Description | League Average |
|---|---|---|---|
| wOBA | Weighted On-Base Average | Overall offensive value, weighting each way of reaching base by its run value | ~.320 |
| wRC+ | Weighted Runs Created Plus | Park and league-adjusted offensive value | 100 |
| ISO | Isolated Power | Raw power measure (SLG - AVG) | ~.140 |
| BABIP | Batting Average on Balls In Play | BA excluding HR and K | ~.300 |
| BB% | Walk Percentage | Walks per plate appearance | ~8.5% |
| K% | Strikeout Percentage | Strikeouts per plate appearance | ~22% |
| O-Swing% | Outside Swing Percentage | Swings on pitches outside zone | ~30% |
| Z-Swing% | Zone Swing Percentage | Swings on pitches in zone | ~67% |
| Contact% | Contact Percentage | Contact made on swings | ~78% |
| Hard% | Hard Hit Percentage | Percentage of hard-hit balls | ~35% |
| Barrel% | Barrel Percentage | Optimal contact (EV + LA combination) | ~6-7% |
| BsR | Base Running Runs | Runs contributed by base running | 0 |
| Off | Offensive Runs | Batting runs plus base running runs, above average | 0 |
| Def | Defensive Runs | Fielding runs plus positional adjustment | 0 |
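The rate stats in the offensive table can be computed directly from a standard batting line. The sketch below is a minimal illustration: the `BattingLine` class and function names are hypothetical, and the wOBA weights are placeholders of roughly the right size, not any season's official values (FanGraphs publishes season-specific weights on its Guts! page).

```python
from dataclasses import dataclass

# ASSUMPTION: placeholder linear weights of roughly the right magnitude.
# Real wOBA weights change every season; use FanGraphs' published values.
WOBA_WEIGHTS = {"ubb": 0.69, "hbp": 0.72, "1b": 0.88, "2b": 1.25, "3b": 1.59, "hr": 2.05}


@dataclass
class BattingLine:
    """Hypothetical container for a season batting line (counting stats)."""
    ab: int
    h: int
    doubles: int
    triples: int
    hr: int
    bb: int
    ibb: int
    hbp: int
    sf: int
    so: int

    @property
    def singles(self) -> int:
        # Hits that were not extra-base hits
        return self.h - self.doubles - self.triples - self.hr


def woba(line: BattingLine, w: dict = WOBA_WEIGHTS) -> float:
    """Weight each way of reaching base by its run value, per plate appearance."""
    unintentional_bb = line.bb - line.ibb
    numerator = (
        w["ubb"] * unintentional_bb
        + w["hbp"] * line.hbp
        + w["1b"] * line.singles
        + w["2b"] * line.doubles
        + w["3b"] * line.triples
        + w["hr"] * line.hr
    )
    # Intentional walks are excluded from the denominator, matching the numerator
    denominator = line.ab + line.bb - line.ibb + line.sf + line.hbp
    return numerator / denominator


def iso(line: BattingLine) -> float:
    """SLG - AVG, which simplifies to extra bases per at-bat."""
    return (line.doubles + 2 * line.triples + 3 * line.hr) / line.ab


def babip(line: BattingLine) -> float:
    """Hits on balls in play divided by balls in play (HR and K removed)."""
    return (line.h - line.hr) / (line.ab - line.so - line.hr + line.sf)


line = BattingLine(ab=500, h=140, doubles=30, triples=3, hr=25,
                   bb=60, ibb=5, hbp=5, sf=5, so=120)
print(f"wOBA {woba(line):.3f}  ISO {iso(line):.3f}  BABIP {babip(line):.3f}")
# wOBA 0.367  ISO 0.222  BABIP 0.319
```

Note that ISO needs no separate SLG and AVG calculation: subtracting one from the other leaves only the extra bases, which is why the function counts 1, 2, and 3 extra bases for doubles, triples, and home runs.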
Pitching Statistics
| Stat | Full Name | Description | League Average |
|---|---|---|---|
| FIP | Fielding Independent Pitching | ERA estimator using only K, BB, HBP, and HR | ~4.00 |
| xFIP | Expected FIP | FIP with normalized HR/FB rate | ~4.00 |
| SIERA | Skill-Interactive ERA | ERA estimator that adds batted-ball mix (GB/FB) and interactions among K, BB, and balls in play | ~4.00 |
| K/9 | Strikeouts per 9 Innings | Strikeout rate | ~8.5 |
| BB/9 | Walks per 9 Innings | Walk rate | ~3.0 |
| K% | Strikeout Percentage | Strikeouts per batter faced | ~22% |
| BB% | Walk Percentage | Walks per batter faced | ~8% |
| K-BB% | Strikeout Minus Walk Percentage | Net K vs BB rate | ~14% |
| LOB% | Left On Base Percentage | Strand rate for runners | ~72% |
| GB% | Ground Ball Percentage | Ground balls per ball in play | ~45% |
| FB% | Fly Ball Percentage | Fly balls per ball in play | ~35% |
| HR/FB | Home Run per Fly Ball | Home runs as % of fly balls | ~10-11% |
| WHIP | Walks + Hits per Inning Pitched | Base runners allowed per inning | ~1.30 |
| Soft% | Soft Contact Percentage | Weakly hit balls | ~20% |
| Hard% | Hard Contact Percentage | Hard hit balls allowed | ~35% |
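The FIP family and K-BB% reduce to short formulas. This sketch assumes a placeholder FIP constant of 3.10 and a placeholder league HR/FB rate of 12%; both are recalculated each season, and the function names are hypothetical. It also converts box-score innings notation, where 180.2 means 180 and two-thirds innings, since dividing by the raw decimal would skew every per-inning rate.

```python
from typing import Union

# ASSUMPTION: the FIP constant is set each season so league FIP matches
# league ERA; 3.10 is only a placeholder of typical size.
FIP_CONSTANT = 3.10


def innings_from_notation(ip: Union[float, str]) -> float:
    """Convert box-score IP (e.g. 180.2 = 180 and 2/3 innings) to true innings."""
    whole, _, outs = str(ip).partition(".")
    extra_outs = int(outs) if outs else 0
    if extra_outs not in (0, 1, 2):
        raise ValueError(f"not valid IP notation: {ip!r}")
    return int(whole) + extra_outs / 3


def fip(hr: int, bb: int, hbp: int, k: int, ip: float,
        constant: float = FIP_CONSTANT) -> float:
    """FIP: home runs, walks, hit batters, and strikeouts per inning, plus a constant."""
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


def xfip(fb: int, bb: int, hbp: int, k: int, ip: float,
         lg_hr_per_fb: float = 0.12, constant: float = FIP_CONSTANT) -> float:
    """Same as FIP, but actual HR are replaced by fly balls times a league HR/FB rate."""
    # ASSUMPTION: 0.12 is a placeholder league HR/FB rate
    expected_hr = fb * lg_hr_per_fb
    return (13 * expected_hr + 3 * (bb + hbp) - 2 * k) / ip + constant


def k_minus_bb_pct(k: int, bb: int, batters_faced: int) -> float:
    """K% minus BB%, both measured per batter faced."""
    return (k - bb) / batters_faced


ip = innings_from_notation("180.2")  # 180 and 2/3 innings
print(round(fip(hr=20, bb=50, hbp=5, k=200, ip=ip), 2))
print(round(xfip(fb=150, bb=50, hbp=5, k=200, ip=ip), 2))
print(k_minus_bb_pct(k=200, bb=50, batters_faced=750))
```

Because xFIP differs from FIP only in the home run term, a pitcher whose actual HR/FB sits well above the league rate will show an xFIP below his FIP.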
Value and Contextual Statistics
| Stat | Full Name | Description | Scale |
|---|---|---|---|
| WAR | Wins Above Replacement | Total player value in wins | 0 = replacement, ~2 = average regular, 4-5 = All-Star, 6+ = MVP |
| WPA | Win Probability Added | Change in win probability credited to a player's plays | A team's total is roughly (W - L) / 2 |
| RE24 | Run Expectancy based on 24 base-out states | Runs added by changing game states | 0 = average |
| REW | Run Expectancy Wins | RE24 converted to wins | Similar to WPA |
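| LI | Leverage Index | Importance of the game situation | 1.0 = average, 2.0 = twice the average potential swing in win probability |
| Clutch | Clutch Score | How much better or worse a player performed in high-leverage situations than in context-neutral ones | 0 = neutral, positive = better under pressure |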
| Dollars | Dollar Value | Estimated market value in $ | Based on $/WAR conversion |
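RE24 and WPA both credit a play with the change it causes in the game state. The sketch below assumes a tiny, illustrative run-expectancy table (not real MLB values; a full table covers all 24 base-out states and is built from play-by-play data) and hypothetical win probabilities; the function names are also hypothetical. Because per-play WPA values telescope, their sum over a game equals the final win probability minus the starting one.

```python
from typing import Dict, List, Tuple

# A base-out state: (runners, outs). Runners is a 3-character string for
# first, second, third, so "1__" means a runner on first only.
State = Tuple[str, int]

# ASSUMPTION: illustrative run-expectancy values for a few states only.
RUN_EXPECTANCY: Dict[State, float] = {
    ("___", 0): 0.50,
    ("1__", 0): 0.88,
    ("___", 1): 0.27,
    ("1__", 1): 0.52,
    ("___", 2): 0.10,
}


def run_expectancy(state: State) -> float:
    _, outs = state
    # Once the third out is recorded, no more runs can score in the inning
    return 0.0 if outs >= 3 else RUN_EXPECTANCY[state]


def re24(start: State, end: State, runs_scored: int) -> float:
    """Runs added by one play: change in run expectancy plus runs that scored."""
    return run_expectancy(end) - run_expectancy(start) + runs_scored


def wpa_by_play(win_probs: List[float]) -> List[float]:
    """WPA of each play = win probability after it minus win probability before it."""
    return [after - before for before, after in zip(win_probs, win_probs[1:])]


print(round(re24(("___", 0), ("1__", 0), 0), 2))  # leadoff single
print(round(re24(("1__", 0), ("___", 2), 0), 2))  # double play
print(round(re24(("___", 0), ("___", 0), 1), 2))  # solo home run

# Hypothetical win probabilities for the eventual winner, play by play
game = [0.50, 0.55, 0.48, 0.70, 1.00]
plays = wpa_by_play(game)
print([round(x, 2) for x in plays], round(sum(plays), 2))
```

The solo home run shows why runs scored are added back: the base-out state is unchanged, so the entire credit comes from the run itself.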
Best Practices and Tips
Effective FanGraphs Usage
- Use Custom Date Ranges: Analyze recent performance (last 30 days, second half) to identify trends and changes in approach or skill level.
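- Compare Multiple Metrics: Don't rely on a single statistic. Cross-reference wOBA with wRC+, ISO, and plate discipline metrics for a complete picture.
- Check Sample Sizes: Small samples are noisy, and stats stabilize at different rates (strikeout and walk rates become reliable much sooner than BABIP). As a rough rule of thumb, look for 200+ PA for hitters and 50+ IP for pitchers before drawing firm conclusions.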
- Use FIP Family for Pitchers: FIP, xFIP, and SIERA provide better predictive power than ERA for evaluating pitching talent.
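- Investigate Discrepancies: A large gap between ERA and FIP suggests ERA is likely to move toward FIP. For a pitcher, a high BABIP allowed alongside a low Hard% allowed points to bad luck; for a hitter, the same combination points to good luck that may not last.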
- Respect Projection Ranges: Point estimates are less useful than understanding uncertainty ranges around projections.
- Export for Analysis: Download CSV files for statistical modeling, visualization, and custom analysis in Python/R.
- Read the Glossary: FanGraphs provides detailed explanations of every metric - understand what you're measuring.
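The sample-size, discrepancy, and export tips can be combined into a small screening script. This is a minimal sketch using only Python's standard library: the inline CSV stands in for a downloaded FanGraphs pitching leaderboard, and its pitchers, column names, number formats, and thresholds (50 IP, a 0.75-run ERA-FIP gap, a BABIP 20 points above .300) are all assumptions for illustration.

```python
import csv
import io
from typing import Dict, List

# Stand-in for a CSV exported from a FanGraphs pitching leaderboard.
# ASSUMPTION: hypothetical pitchers; column names and formats are illustrative,
# so check the header row of a real export before reusing this.
SAMPLE_EXPORT = """Name,IP,ERA,FIP,BABIP,Hard%
Pitcher A,150.0,4.80,3.60,.335,31.0
Pitcher B,170.1,3.20,4.30,.255,38.0
Pitcher C,35.2,2.10,4.90,.210,40.0
Pitcher D,120.0,3.90,3.85,.300,35.0
"""


def innings(ip_text: str) -> float:
    """Box-score IP notation: '170.1' means 170 and 1/3 innings."""
    whole, _, outs = ip_text.partition(".")
    return int(whole) + int(outs or 0) / 3


def screen_pitchers(rows, min_ip=50.0, era_fip_gap=0.75,
                    lg_babip=0.300, lg_hard_pct=35.0) -> Dict[str, List[str]]:
    """Return notes for pitchers whose results look out of line with their inputs."""
    flags: Dict[str, List[str]] = {}
    for row in rows:
        if innings(row["IP"]) < min_ip:
            continue  # too few innings to read much into the numbers
        notes = []
        gap = float(row["ERA"]) - float(row["FIP"])
        if gap >= era_fip_gap:
            notes.append("ERA well above FIP: ERA may come down")
        elif gap <= -era_fip_gap:
            notes.append("ERA well below FIP: ERA may rise")
        # High BABIP allowed on below-average hard contact suggests bad luck
        if float(row["BABIP"]) > lg_babip + 0.020 and float(row["Hard%"]) < lg_hard_pct:
            notes.append("high BABIP despite below-average hard contact: likely bad luck")
        if notes:
            flags[row["Name"]] = notes
    return flags


rows = list(csv.DictReader(io.StringIO(SAMPLE_EXPORT)))
for name, notes in screen_pitchers(rows).items():
    print(name, "->", "; ".join(notes))
```

For a real export, replace the `io.StringIO` source with an open file, for example `open("pitching_leaderboard.csv", newline="")` (a hypothetical filename). Pitcher C is skipped entirely by the innings filter, even though his ERA-FIP gap is the largest in the sample.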
Common Pitfalls to Avoid
- Overvaluing Wins and RBI: These heavily context-dependent stats poorly measure individual value.
- Ignoring Defense: Defensive value is substantial. At roughly 10 runs per win, a +15 run defender is worth about 1.5 WAR.
- Treating WAR as Exact: WAR is an estimate with uncertainty. A 4.2 WAR player isn't meaningfully better than a 3.9 WAR player.
- Cherry-Picking Metrics: Don't select the single stat that supports your narrative - use comprehensive evaluation.
- Misunderstanding FIP: FIP predicts future ERA but isn't necessarily "better" than actual ERA for past evaluation.
- Ignoring Batted Ball Data: Exit velocity, launch angle, and barrel rate reveal skills traditional stats miss.
- Forgetting Context: Park effects, league differences, and era matter. Use wRC+ for hitters and ERA- or FIP- (or Baseball Reference's ERA+) for pitchers to make fair comparisons.
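The defense and WAR-precision pitfalls both rest on a runs-to-wins conversion. A minimal sketch, assuming the common rule of thumb of about 10 runs per win (FanGraphs derives a season-specific value from the run environment); the helper names are hypothetical.

```python
# ASSUMPTION: about 10 runs per win, a common rule of thumb; the exact
# figure varies by season and run environment.
RUNS_PER_WIN = 10.0


def runs_to_wins(runs: float, runs_per_win: float = RUNS_PER_WIN) -> float:
    """Convert runs above average or replacement into wins."""
    return runs / runs_per_win


def wins_to_runs(wins: float, runs_per_win: float = RUNS_PER_WIN) -> float:
    """Convert a WAR figure or WAR gap back into runs."""
    return wins * runs_per_win


print(runs_to_wins(15))                   # +15 run defender: about 1.5 wins
print(round(wins_to_runs(4.2 - 3.9), 1))  # 4.2 vs 3.9 WAR: about 3 runs
```

Expressed in runs, a 0.3 WAR gap is about three runs over a full season, a margin small enough that the uncertainty in WAR's fielding and positional inputs can easily swallow it.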
Key Takeaways
- FanGraphs is the premier source for modern baseball analytics, offering comprehensive statistics, advanced metrics, projection systems, and data export capabilities.
- Understanding core FanGraphs metrics like wOBA, wRC+, FIP, and WAR is essential for modern player evaluation and analysis.
- The difference between fWAR (FIP-based) and bWAR (runs-based) reflects different philosophies about pitcher evaluation - both provide value.
- FanGraphs excels at plate discipline data, batted ball profiles, and predictive metrics that reveal underlying skills beyond results.
- Multiple projection systems (ZiPS, Steamer, ATC, THE BAT) provide different perspectives on future performance; blending them, as ATC itself does with other systems, tends to be more reliable than leaning on any single one.
- PyBaseball (Python) and baseballr (R) enable programmatic access to FanGraphs data for statistical analysis and modeling.
- FanGraphs and Baseball Reference are complementary - FanGraphs for predictive analysis and modern metrics, Baseball Reference for historical context.
- Effective FanGraphs usage requires attention to sample size and metric limitations, plus the use of multiple statistics for comprehensive evaluation.
- Data export capabilities make FanGraphs invaluable for research, creating a bridge between web-based exploration and advanced statistical analysis.
- Regular engagement with FanGraphs articles and glossary entries deepens understanding of sabermetric principles and analytical best practices.