BABIP (Batting Average on Balls in Play)

Intermediate | 10 min read | Nov 26, 2025

BABIP Explained: Batting Average on Balls in Play

Batting Average on Balls in Play (BABIP) measures how often a batted ball that stays in the field of play goes for a hit. Traditional batting average tells us how often a batter gets a hit per at-bat; BABIP removes strikeouts and home runs, where fielders play no part, and adds back sacrifice flies, which are balls in play that do not count as at-bats.

The BABIP Formula

BABIP = (H - HR) / (AB - K - HR + SF)

Where:

  • H = Hits
  • HR = Home Runs
  • AB = At Bats
  • K = Strikeouts
  • SF = Sacrifice Flies
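
As a quick check of the formula, here is a minimal Python sketch that computes BABIP from counting stats. The function name `babip` and the sample season line are illustrative assumptions, not taken from any real player.

```python
def babip(h: int, hr: int, ab: int, k: int, sf: int) -> float:
    """Return BABIP = (H - HR) / (AB - K - HR + SF)."""
    balls_in_play = ab - k - hr + sf
    if balls_in_play <= 0:
        raise ValueError("no balls in play, BABIP is undefined")
    return (h - hr) / balls_in_play


# Hypothetical season line: 150 H, 25 HR, 550 AB, 120 K, 5 SF
example = babip(h=150, hr=25, ab=550, k=120, sf=5)
print(f"{example:.3f}")  # 125 / 410 -> 0.305
```

Note that the same hitter's batting average would be 150 / 550 = .273, so the two numbers can differ noticeably for players with many strikeouts or home runs.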

What Affects BABIP?

Speed and Athleticism

Faster players tend to sustain higher BABIPs (around .320-.340) because they beat out more infield grounders. Elite speedsters like Trea Turner and Bobby Witt Jr. regularly post BABIPs 20-30 points above league average.

Quality of Contact: Line Drive Rate

Line drives have the highest BABIP (~.700), followed by ground balls (~.240), then fly balls (~.200). Players who consistently hit line drives (22%+ rate) sustain higher BABIPs.

Batted Ball Authority

Hard contact generates higher BABIPs. Balls hit 95+ mph give defenders less reaction time.
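
One way to put a number on "hard contact" is a hard-hit rate: the share of batted balls at or above the 95 mph threshold mentioned above. The sketch below is a minimal illustration; the function name and the exit velocities are made up for the example.

```python
HARD_HIT_MPH = 95.0  # threshold quoted in the text


def hard_hit_rate(exit_velocities_mph):
    """Share of batted balls hit at or above HARD_HIT_MPH."""
    evs = list(exit_velocities_mph)
    if not evs:
        raise ValueError("need at least one batted ball")
    return sum(ev >= HARD_HIT_MPH for ev in evs) / len(evs)


# Hypothetical exit velocities (mph) for eight batted balls
sample = [88.1, 101.4, 95.0, 72.3, 104.9, 91.7, 98.2, 83.5]
rate = hard_hit_rate(sample)
print(f"{rate:.1%}")  # 4 of 8 balls qualify -> 50.0%
```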

Luck and Random Variation

A significant portion of the variation in BABIP comes from luck: well-struck balls sometimes find gloves, while weak pop-ups occasionally drop.
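
To get a feel for how large the luck component can be, the sketch below treats every ball in play as an independent coin flip with a fixed hit probability. That is a simplifying assumption (real batted balls are not identical or independent), but it gives a rough floor on the noise at a given sample size.

```python
import math


def babip_standard_error(true_babip: float, balls_in_play: int) -> float:
    """Binomial standard error of an observed BABIP."""
    if balls_in_play <= 0:
        raise ValueError("balls_in_play must be positive")
    return math.sqrt(true_babip * (1 - true_babip) / balls_in_play)


# A true .300 hitter observed over different numbers of balls in play
for n in (100, 400, 1600):
    se = babip_standard_error(0.300, n)
    low, high = 0.300 - 2 * se, 0.300 + 2 * se
    print(f"{n:>5} BIP: SE = {se:.3f}, ~95% range = {low:.3f} to {high:.3f}")
```

Under this model, a true .300 hitter has a standard error of roughly .023 after 400 balls in play, so an observed BABIP anywhere from the mid-.250s to the mid-.340s is unremarkable.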

League Average and Regression

League average BABIP: ~.300

BABIP Range | Interpretation                | Action
.370+       | Extreme high - Lucky          | Expect regression down
.330-.370   | High - Speed/skill or luck    | Evaluate underlying skills
.280-.320   | Normal range                  | Sustainable
.250-.280   | Low - Unlucky or weak contact | Expect regression up
Below .250  | Extreme low                   | Almost certainly unlucky
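
"Expect regression" can be made concrete by shrinking an observed BABIP toward the league average. The sketch below blends the player's balls in play with a block of league-average balls in play. The prior weight of 400 is an assumption chosen only for illustration, not a fitted value, and the function name is hypothetical.

```python
LEAGUE_BABIP = 0.300  # league average quoted in the text
PRIOR_BIP = 400       # assumed prior weight, for illustration only


def regressed_babip(hits_in_play: int, balls_in_play: int,
                    prior_bip: int = PRIOR_BIP,
                    league: float = LEAGUE_BABIP) -> float:
    """Shrink observed BABIP toward the league average."""
    return (hits_in_play + league * prior_bip) / (balls_in_play + prior_bip)


# Hypothetical hot start: 74 hits on 200 balls in play (.370 observed)
estimate = regressed_babip(hits_in_play=74, balls_in_play=200)
print(f"{estimate:.3f}")  # (74 + 120) / 600 -> 0.323
```

The smaller the sample, the harder the estimate is pulled toward .300; with thousands of balls in play the observed value dominates.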

BABIP by Batted Ball Type

Batted Ball Type   | BABIP
Line Drives        | .690 - .720
Ground Balls       | .230 - .250
Fly Balls (non-HR) | .180 - .220
Pop-ups            | .020 - .050
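
These per-type values can be combined into a rough expected BABIP for a hitter's batted ball mix. The sketch below uses the midpoint of each range in the table; the hitter's mix is hypothetical, and the shares are assumed to be fractions of balls in play that sum to 1.

```python
# Midpoints of the ranges in the table above
BABIP_BY_TYPE = {
    "line_drive": 0.705,
    "ground_ball": 0.240,
    "fly_ball": 0.200,   # non-HR fly balls
    "pop_up": 0.035,
}


def expected_babip(mix: dict) -> float:
    """Weighted average of per-type BABIP for a batted ball mix."""
    total = sum(mix.values())
    if abs(total - 1.0) > 1e-6:
        raise ValueError(f"shares must sum to 1, got {total:.3f}")
    return sum(share * BABIP_BY_TYPE[kind] for kind, share in mix.items())


# Hypothetical hitter: 21% line drives, 44% grounders, 28% flies, 7% pop-ups
mix = {"line_drive": 0.21, "ground_ball": 0.44, "fly_ball": 0.28, "pop_up": 0.07}
x_babip = expected_babip(mix)
print(f"{x_babip:.3f}")  # 0.312
```

Swapping a few percentage points of pop-ups for line drives moves the estimate by far more than any other trade, which is why line drive rate gets so much attention.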

Python Implementation

from pybaseball import batting_stats, pitching_stats
import pandas as pd
import matplotlib.pyplot as plt

# Get batting data
batting = batting_stats(2024, qual=300)

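# Rough expected BABIP from the batted ball mix, using approximate per-type BABIPs.
# Assumes LD%, GB%, and FB% are fractions (0.21, not 21); check batting['LD%'].head()
# and divide by 100 first if they are on a 0-100 scale.
# FB% still includes home runs and pop-ups, so treat xBABIP as a crude guide.
batting['xBABIP'] = batting['LD%'] * 0.70 + batting['GB%'] * 0.24 + batting['FB%'] * 0.20
batting['BABIP_diff'] = batting['BABIP'] - batting['xBABIP']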

# Find regression candidates
print("Positive Regression Candidates (Unlucky):")
unlucky = batting.nsmallest(10, 'BABIP_diff')[['Name', 'BABIP', 'xBABIP', 'BABIP_diff']]
print(unlucky.to_string(index=False))

print("\nNegative Regression Candidates (Lucky):")
lucky = batting.nlargest(10, 'BABIP_diff')[['Name', 'BABIP', 'xBABIP', 'BABIP_diff']]
print(lucky.to_string(index=False))

# BABIP vs Line Drive Rate
plt.figure(figsize=(10, 6))
plt.scatter(batting['LD%'], batting['BABIP'], alpha=0.6)
plt.xlabel('Line Drive %')
plt.ylabel('BABIP')
plt.title('BABIP vs Line Drive Rate')
plt.axhline(y=0.300, color='red', linestyle='--', label='League Avg')
plt.legend()
plt.grid(True, alpha=0.3)
plt.savefig('babip_vs_ld.png', dpi=300)
plt.show()

# Pitcher BABIP analysis
pitching = pitching_stats(2024, qual=100)
print("\nPitcher BABIP Summary:")
print(f"Mean: {pitching['BABIP'].mean():.3f}")
print(f"Std Dev: {pitching['BABIP'].std():.3f}")

# Low BABIP pitchers (expect regression up)
print("\nLowest BABIP Pitchers:")
low_babip = pitching.nsmallest(10, 'BABIP')[['Name', 'BABIP', 'ERA', 'FIP']]
print(low_babip.to_string(index=False))

R Implementation

library(baseballr)
library(dplyr)
library(ggplot2)

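# Get batting data
# Note: the arguments and returned column names of fg_batter_leaders() differ
# across baseballr versions; check ?fg_batter_leaders and names(batting) first.
batting <- fg_batter_leaders(2024, 2024, qual = 300)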

# Calculate expected BABIP
batting <- batting %>%
  mutate(
    xBABIP = `LD%` * 0.70 + `GB%` * 0.24 + `FB%` * 0.20,
    BABIP_diff = BABIP - xBABIP
  )

# Unlucky players
cat("Positive Regression Candidates (Unlucky):\n")
unlucky <- batting %>%
  arrange(BABIP_diff) %>%
  select(Name, BABIP, xBABIP, BABIP_diff) %>%
  head(10)
print(unlucky)

# Lucky players
cat("\nNegative Regression Candidates (Lucky):\n")
lucky <- batting %>%
  arrange(desc(BABIP_diff)) %>%
  select(Name, BABIP, xBABIP, BABIP_diff) %>%
  head(10)
print(lucky)

# Visualization
ggplot(batting, aes(x = `LD%`, y = BABIP)) +
  geom_point(alpha = 0.6) +
  geom_hline(yintercept = 0.300, color = "red", linetype = "dashed") +
  labs(title = "BABIP vs Line Drive Rate", x = "Line Drive %", y = "BABIP") +
  theme_minimal()

ggsave("babip_vs_ld.png", width = 10, height = 6, dpi = 300)

# Pitcher analysis
pitching <- fg_pitch_leaders(2024, 2024, qual = 100)

cat("\nPitcher BABIP Summary:\n")
cat(sprintf("Mean: %.3f\n", mean(pitching$BABIP, na.rm = TRUE)))
cat(sprintf("Std Dev: %.3f\n", sd(pitching$BABIP, na.rm = TRUE)))

# Low BABIP pitchers
cat("\nLowest BABIP Pitchers:\n")
low_babip <- pitching %>%
  arrange(BABIP) %>%
  select(Name, BABIP, ERA, FIP) %>%
  head(10)
print(low_babip)

Practical Applications

For Hitters

  • Buy-low candidates: Players with low BABIP but good batted ball metrics
  • Sell-high opportunities: Players with high BABIP and mediocre contact quality
  • Breakout vs fluke: Check if improved BABIP is backed by better contact
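
The three checks above can be sketched as a simple screen. The players and numbers below are invented, and the cutoffs are assumptions: .280 and .330 come from the range table earlier in the article, and 22% is the line drive rate cited earlier as a marker of good contact.

```python
LOW_BABIP = 0.280      # bottom of the "normal" band
HIGH_BABIP = 0.330     # start of the "high" band
GOOD_LD_RATE = 0.22    # line drive rate cited as sustaining high BABIP


def label(player: dict) -> str:
    """Classify a hitter as buy-low, sell-high, or hold."""
    good_contact = player["ld_rate"] >= GOOD_LD_RATE
    if player["babip"] < LOW_BABIP and good_contact:
        return "buy-low"
    if player["babip"] > HIGH_BABIP and not good_contact:
        return "sell-high"
    return "hold"


# Hypothetical hitters
players = [
    {"name": "Player A", "babip": 0.262, "ld_rate": 0.24},
    {"name": "Player B", "babip": 0.351, "ld_rate": 0.18},
    {"name": "Player C", "babip": 0.301, "ld_rate": 0.21},
    {"name": "Player D", "babip": 0.345, "ld_rate": 0.25},
]
labels = {p["name"]: label(p) for p in players}
for name, verdict in labels.items():
    print(f"{name}: {verdict}")
```

Player D shows the "breakout vs fluke" case: the BABIP is high, but it is backed by a strong line drive rate, so the screen does not flag it as a sell.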

For Pitchers

  • Limited control: Pitchers have minimal BABIP control compared to K/BB/HR
  • Regression indicator: Extreme BABIPs almost always regress toward .290-.300
  • Defense matters: Good defenses can sustain slightly lower BABIPs
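
For pitchers, one rough way to apply these points is to look for an extreme BABIP that coincides with a large gap between ERA and FIP. The sketch below is only an illustration: the pitchers are invented, and the cutoffs (.270, .320, and a half-run ERA-FIP gap) are assumptions rather than established thresholds.

```python
def regression_flag(pitcher: dict, low: float = 0.270,
                    high: float = 0.320, gap: float = 0.50) -> str:
    """Flag pitchers whose ERA looks driven by an extreme BABIP."""
    era_minus_fip = pitcher["era"] - pitcher["fip"]
    if pitcher["babip"] < low and era_minus_fip < -gap:
        return "ERA likely to rise"
    if pitcher["babip"] > high and era_minus_fip > gap:
        return "ERA likely to fall"
    return "no strong signal"


# Hypothetical pitchers
pitchers = [
    {"name": "Pitcher A", "babip": 0.245, "era": 2.90, "fip": 3.95},
    {"name": "Pitcher B", "babip": 0.296, "era": 3.60, "fip": 3.55},
    {"name": "Pitcher C", "babip": 0.335, "era": 4.80, "fip": 3.70},
]
flags = {p["name"]: regression_flag(p) for p in pitchers}
for name, flag in flags.items():
    print(f"{name}: {flag}")
```

A screen like this ignores team defense and park, so a flagged pitcher behind an excellent defense may hold a lower BABIP longer than the flag suggests.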

Key Takeaways

  • League average is ~.300: Extreme values usually regress
  • Speed matters: Fast players can sustain .320-.340 BABIP
  • Line drives are key: High LD% = sustainable high BABIP
  • Pitchers have little control: Use FIP/xFIP instead for pitcher evaluation
  • Sample size matters: Need 400+ balls in play for reliable BABIP
