Case Study 2: Golf Tournament Prediction with Strokes Gained and Course Fit

Overview

In this case study, we build a Monte Carlo golf tournament simulator. It uses the strokes gained framework to project individual golfer performance, adjusts for course-specific demands using a course fit model, and produces probability estimates for the major betting markets: outright winner, top-5/10/20 finishes, and make/miss cut. We demonstrate how course fit analysis can identify golfers whose modeled probability of success differs substantially from the market price, which is where betting value may arise in a market often regarded as one of the less efficient in sports betting.

The Problem

Golf tournament prediction is fundamentally different from team sport prediction for three reasons: (1) a full field typically contains 144 players, so the probability of any single outcome is small; (2) round-to-round variance is enormous: a golfer who shoots 65 on Thursday may shoot 74 on Friday; and (3) different courses demand dramatically different skill profiles, so a golfer's expected performance varies substantially from week to week. A naive model that uses overall strokes gained as the sole predictor ignores course fit and will systematically misprice golfers whose skill profiles are well or poorly matched to the specific course.
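
To put the variance point in numbers, the sketch below assumes round scores are independent and normally distributed with the same standard deviation for both golfers. That is a simplifying assumption made for illustration, and the function name head_to_head_prob and the 2.8-stroke default are illustrative choices rather than part of the simulator.

```python
import math


def head_to_head_prob(sg_edge: float, round_std: float = 2.8,
                      n_rounds: int = 1) -> float:
    """P(better golfer posts the lower total) under independent normal rounds.

    sg_edge: per-round strokes-gained advantage of the better golfer.
    round_std: per-round standard deviation, assumed equal for both golfers.
    """
    # Difference of totals has mean n * edge and std round_std * sqrt(2 * n).
    z = (sg_edge * n_rounds) / (round_std * math.sqrt(2 * n_rounds))
    # Standard normal CDF expressed through the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))


one_round = head_to_head_prob(2.0, n_rounds=1)
four_rounds = head_to_head_prob(2.0, n_rounds=4)
print(f"1 round: {one_round:.3f}, 4 rounds: {four_rounds:.3f}")
```

Under these assumptions, even a 2-stroke-per-round edge wins only roughly 69% of single rounds and roughly 84% of four-round totals, which is why outright probabilities stay small even for the best players.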

The Strokes Gained Framework

We decompose each golfer's ability into four components: SG:OTT (driving), SG:APP (approaches), SG:ARG (short game), and SG:PUTT (putting). Each course is characterized by a weight profile that reflects which skills matter most. The course-fit-adjusted projection applies these weights to produce a venue-specific expected performance.
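
As a worked example of the weighting, the sketch below applies the same arithmetic as the course_fit_score method in the implementation to one golfer and one course. The component values and weights are the illustrative Scheffler and Augusta National figures from the sample data, not measured statistics.

```python
import numpy as np

# Illustrative inputs taken from the sample field and course used later on.
components = np.array([0.8, 1.2, 0.4, 0.3])   # SG: OTT, APP, ARG, PUTT
weights = np.array([0.18, 0.35, 0.28, 0.19])  # course weights, same order

weights = weights / weights.sum()             # normalise so weights sum to 1
sg_total = float(components.sum())            # raw SG:Total

# Multiply by 4 so that equal weights (0.25 each) reproduce SG:Total exactly.
course_fit = float(np.sum(weights * components) * 4)

# Blend 60/40 with raw SG:Total to damp the effect of noisy course weights.
blended = 0.6 * course_fit + 0.4 * sg_total
print(f"SG:Total {sg_total:.3f} -> course-fit projection {blended:.3f}")
```

Here the projection rises from 2.70 to about 2.84 strokes per round, because the golfer's strongest component (approach) carries the largest course weight.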

Implementation

"""
Golf Tournament Prediction with Strokes Gained and Course Fit
Monte Carlo simulator for tournament outcome probabilities.
"""

import numpy as np
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Golfer:
    """Professional golfer with strokes gained profile.

    Attributes:
        name: Golfer name.
        sg_ott: Strokes Gained: Off the Tee.
        sg_app: Strokes Gained: Approach.
        sg_arg: Strokes Gained: Around the Green.
        sg_putt: Strokes Gained: Putting.
        round_std: Standard deviation of round score vs field.
        world_ranking: OWGR ranking.
        recent_form: Recent performance adjustment.
    """

    name: str
    sg_ott: float = 0.0
    sg_app: float = 0.0
    sg_arg: float = 0.0
    sg_putt: float = 0.0
    round_std: float = 2.8
    world_ranking: int = 100
    recent_form: float = 0.0

    @property
    def sg_total(self) -> float:
        """Total strokes gained per round."""
        return self.sg_ott + self.sg_app + self.sg_arg + self.sg_putt


@dataclass
class Course:
    """Course profile as SG component weights.

    Attributes:
        name: Course name.
        ott_weight: Importance of driving.
        app_weight: Importance of approach play.
        arg_weight: Importance of short game.
        putt_weight: Importance of putting.
        par: Course par.
    """

    name: str
    ott_weight: float = 0.25
    app_weight: float = 0.30
    arg_weight: float = 0.22
    putt_weight: float = 0.23
    par: int = 72


class GolfSimulator:
    """Monte Carlo golf tournament simulator.

    Simulates complete tournaments including cuts using course-fit
    adjusted golfer projections and realistic round-to-round variance.

    Args:
        course: Course profile for the tournament.
        golfers: List of golfers in the field.
        n_sims: Number of Monte Carlo simulations.
        n_rounds: Number of rounds.
        cut_after: Round after which the cut is applied.
        cut_top_n: Number of golfers making the cut.
        round_correlation: Round-to-round performance correlation.
        seed: Random seed for reproducibility.
    """

    def __init__(
        self,
        course: Course,
        golfers: List[Golfer],
        n_sims: int = 50000,
        n_rounds: int = 4,
        cut_after: int = 2,
        cut_top_n: int = 65,
        round_correlation: float = 0.08,
        seed: Optional[int] = None,
    ):
        self.course = course
        self.golfers = golfers
        self.n_sims = n_sims
        self.n_rounds = n_rounds
        self.cut_after = cut_after
        self.cut_top_n = cut_top_n
        self.round_corr = round_correlation
        self.rng = np.random.default_rng(seed)

    def course_fit_score(self, golfer: Golfer) -> float:
        """Calculate course-fit-adjusted expected strokes gained.

        Weights the four SG components by the course profile and blends
        with raw SG:Total to avoid overfitting to noisy course weights.

        Args:
            golfer: Golfer profile.

        Returns:
            Course-adjusted expected SG per round.
        """
        weights = np.array([
            self.course.ott_weight, self.course.app_weight,
            self.course.arg_weight, self.course.putt_weight,
        ])
        weights = weights / weights.sum()

        components = np.array([
            golfer.sg_ott, golfer.sg_app,
            golfer.sg_arg, golfer.sg_putt,
        ])

        course_fit = np.sum(weights * components) * 4
        blended = 0.6 * course_fit + 0.4 * golfer.sg_total
        return blended + golfer.recent_form

    def simulate(self) -> Dict[str, Dict]:
        """Run the full Monte Carlo tournament simulation.

        Returns:
            Dictionary mapping golfer names to outcome probabilities.
        """
        n = len(self.golfers)
        expected_sg = np.array([self.course_fit_score(g) for g in self.golfers])
        round_stds = np.array([g.round_std for g in self.golfers])

        wins = np.zeros(n)
        top_5 = np.zeros(n)
        top_10 = np.zeros(n)
        top_20 = np.zeros(n)
        made_cut = np.zeros(n)
        total_finish = np.zeros(n)

        for _ in range(self.n_sims):
            cumulative = np.zeros(n)
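            # Persistent tournament-level component shared by all rounds.
            # Scaling by sqrt(rho) gives a round-to-round correlation of
            # rho = self.round_corr while keeping per-round std at round_stds.
            form = (self.rng.normal(0, 1, n) * round_stds
                    * np.sqrt(self.round_corr))
            cut_mask = np.ones(n, dtype=bool)

            for rnd in range(self.n_rounds):
                if rnd == self.cut_after and self.cut_after < self.n_rounds:
                    # Keep the cut_top_n lowest cumulative scores.
                    order = np.argsort(cumulative)
                    cut_mask = np.zeros(n, dtype=bool)
                    cut_mask[order[:self.cut_top_n]] = True

                # Independent round-level noise carries the remaining variance.
                noise = (self.rng.normal(0, 1, n) * round_stds
                         * np.sqrt(1.0 - self.round_corr))
                # Strokes gained are good, so negate to get score vs field.
                scores = -(expected_sg + form + noise)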

                if rnd >= self.cut_after:
                    scores[~cut_mask] = np.inf
                    cumulative[~cut_mask] = np.inf

                cumulative += scores

            if self.cut_after < self.n_rounds:
                made_cut += cut_mask.astype(float)
            else:
                made_cut += 1.0

            valid = cumulative < np.inf
            rankings = np.full(n, n)
            if valid.any():
                valid_scores = cumulative[valid]
                order = np.argsort(valid_scores)
                ranks = np.empty_like(order)
                ranks[order] = np.arange(1, len(order) + 1)
                rankings[valid] = ranks

            wins += (rankings == 1).astype(float)
            top_5 += (rankings <= 5).astype(float)
            top_10 += (rankings <= 10).astype(float)
            top_20 += (rankings <= 20).astype(float)
            total_finish += rankings

        results = {}
        for i, g in enumerate(self.golfers):
            results[g.name] = {
                "course_fit_sg": round(expected_sg[i], 3),
                "win_prob": round(wins[i] / self.n_sims, 4),
                "top_5": round(top_5[i] / self.n_sims, 4),
                "top_10": round(top_10[i] / self.n_sims, 4),
                "top_20": round(top_20[i] / self.n_sims, 4),
                "make_cut": round(made_cut[i] / self.n_sims, 4),
                "avg_finish": round(total_finish[i] / self.n_sims, 1),
            }
        return results


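def create_sample_field() -> List[Golfer]:
    """Create a 30-golfer sample field.

    Twelve named golfers carry illustrative (estimated) SG profiles; the
    other 18 are random fillers drawn from NumPy's global RNG, so call
    np.random.seed beforehand for reproducible output.
    """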
    return [
        Golfer("Scheffler", sg_ott=0.8, sg_app=1.2, sg_arg=0.4, sg_putt=0.3,
               round_std=2.5, world_ranking=1),
        Golfer("McIlroy", sg_ott=1.0, sg_app=0.9, sg_arg=0.2, sg_putt=0.1,
               round_std=2.7, world_ranking=3),
        Golfer("Rahm", sg_ott=0.7, sg_app=0.8, sg_arg=0.5, sg_putt=0.4,
               round_std=2.6, world_ranking=5),
        Golfer("Schauffele", sg_ott=0.6, sg_app=0.7, sg_arg=0.3, sg_putt=0.5,
               round_std=2.6, world_ranking=4),
        Golfer("Morikawa", sg_ott=0.3, sg_app=1.1, sg_arg=0.3, sg_putt=0.0,
               round_std=2.8, world_ranking=8),
        Golfer("Hovland", sg_ott=0.5, sg_app=0.8, sg_arg=-0.1, sg_putt=0.2,
               round_std=2.9, world_ranking=10),
        Golfer("Cantlay", sg_ott=0.3, sg_app=0.6, sg_arg=0.4, sg_putt=0.4,
               round_std=2.7, world_ranking=12),
        Golfer("Clark", sg_ott=0.9, sg_app=0.5, sg_arg=0.1, sg_putt=0.1,
               round_std=2.8, world_ranking=15),
        Golfer("Homa", sg_ott=0.5, sg_app=0.5, sg_arg=0.2, sg_putt=0.3,
               round_std=3.0, world_ranking=18),
        Golfer("Fleetwood", sg_ott=0.4, sg_app=0.6, sg_arg=0.3, sg_putt=0.1,
               round_std=2.8, world_ranking=20),
        Golfer("Lowry", sg_ott=0.2, sg_app=0.4, sg_arg=0.6, sg_putt=0.3,
               round_std=2.9, world_ranking=22),
        Golfer("Fitzpatrick", sg_ott=0.1, sg_app=0.7, sg_arg=0.4, sg_putt=0.2,
               round_std=2.8, world_ranking=25),
    ] + [
        Golfer(f"Golfer_{i}", sg_ott=np.random.normal(0, 0.3),
               sg_app=np.random.normal(0, 0.3),
               sg_arg=np.random.normal(0, 0.2),
               sg_putt=np.random.normal(0, 0.2),
               round_std=np.random.uniform(2.7, 3.3),
               world_ranking=30 + i)
        for i in range(18)
    ]


def compare_courses(golfers: List[Golfer]) -> None:
    """Compare predictions across two different course profiles."""
    augusta = Course("Augusta National", 0.18, 0.35, 0.28, 0.19)
    torrey = Course("Torrey Pines", 0.30, 0.30, 0.18, 0.22)

    print("\nSimulating Augusta National (approach + short game emphasis)...")
    sim_a = GolfSimulator(augusta, golfers, n_sims=50000, seed=42)
    results_a = sim_a.simulate()

    print("Simulating Torrey Pines (driving emphasis)...")
    sim_t = GolfSimulator(torrey, golfers, n_sims=50000, seed=42)
    results_t = sim_t.simulate()

    named = [g for g in golfers if not g.name.startswith("Golfer_")]

    print(f"\n  {'Golfer':<15} {'Augusta Win%':>12} {'Torrey Win%':>12} {'Diff':>8}")
    print(f"  {'-'*15} {'-'*12} {'-'*12} {'-'*8}")
    for g in sorted(named, key=lambda x: -results_a[x.name]["win_prob"]):
        ra = results_a[g.name]
        rt = results_t[g.name]
        diff = rt["win_prob"] - ra["win_prob"]
        # Column widths match the 12/12/8-character headers printed above.
        print(f"  {g.name:<15} {ra['win_prob']:>12.1%} {rt['win_prob']:>12.1%} "
              f"{diff:>+8.1%}")

    print("\n  Biggest course fit effects:")
    diffs = [(g.name, results_t[g.name]["win_prob"] - results_a[g.name]["win_prob"])
             for g in named]
    diffs.sort(key=lambda x: abs(x[1]), reverse=True)
    for name, d in diffs[:5]:
        direction = "better at Torrey" if d > 0 else "better at Augusta"
        print(f"    {name}: {abs(d):.1%} probability swing ({direction})")


def identify_value(
    results: Dict[str, Dict],
    market_odds: Dict[str, int],
) -> List[Dict]:
    """Compare model predictions to market odds for value identification."""
    value = []
    for name, odds in market_odds.items():
        if name not in results:
            continue
        model_prob = results[name]["win_prob"]
        if odds > 0:
            implied = 100 / (odds + 100)
            decimal = 1 + odds / 100
        else:
            implied = abs(odds) / (abs(odds) + 100)
            decimal = 1 + 100 / abs(odds)

        ev = model_prob * decimal - 1
        edge = model_prob - implied

        if edge > 0.005:
            value.append({
                "golfer": name, "model_prob": model_prob,
                "implied_prob": round(implied, 4),
                "odds": odds, "edge": round(edge, 4),
                "ev": round(ev, 4),
            })

    return sorted(value, key=lambda x: -x["ev"])


def main() -> None:
    """Run the golf tournament prediction case study."""
    print("=" * 70)
    print("Case Study: Golf Tournament Prediction with Course Fit")
    print("=" * 70)

    np.random.seed(42)
    golfers = create_sample_field()

    augusta = Course("Augusta National", 0.18, 0.35, 0.28, 0.19)
    sim = GolfSimulator(augusta, golfers, n_sims=50000, seed=42)

    print(f"\nSimulating Augusta National ({sim.n_sims} iterations)...")
    results = sim.simulate()

    named = [g for g in golfers if not g.name.startswith("Golfer_")]
    print(f"\n  {'Golfer':<15} {'CF SG':>7} {'Win%':>7} {'Top5':>7} "
          f"{'Top10':>7} {'Cut%':>7} {'AvgFin':>7}")
    print(f"  {'-'*15} {'-'*7} {'-'*7} {'-'*7} {'-'*7} {'-'*7} {'-'*7}")
    for g in sorted(named, key=lambda x: -results[x.name]["win_prob"]):
        r = results[g.name]
        # All value columns are 7 wide so they line up with the headers.
        print(f"  {g.name:<15} {r['course_fit_sg']:>7.3f} {r['win_prob']:>7.1%} "
              f"{r['top_5']:>7.1%} {r['top_10']:>7.1%} "
              f"{r['make_cut']:>7.1%} {r['avg_finish']:>7.1f}")

    # Course fit comparison
    print("\n" + "=" * 70)
    print("Course Fit Comparison: Augusta vs Torrey Pines")
    print("=" * 70)
    compare_courses(golfers)

    # Value identification
    print("\n" + "=" * 70)
    print("Betting Value Identification")
    print("=" * 70)
    market = {
        "Scheffler": 500, "McIlroy": 800, "Rahm": 1000,
        "Schauffele": 1200, "Morikawa": 1400, "Hovland": 2000,
        "Cantlay": 2500, "Clark": 3000, "Lowry": 4000,
        "Fitzpatrick": 5000,
    }
    value_bets = identify_value(results, market)
    if value_bets:
        print(f"\n  {'Golfer':<15} {'Model':>7} {'Implied':>8} {'Edge':>7} {'EV':>7}")
        for v in value_bets:
            print(f"  {v['golfer']:<15} {v['model_prob']:>7.1%} "
                  f"{v['implied_prob']:>8.1%} {v['edge']:>+7.1%} {v['ev']:>+7.1%}")


if __name__ == "__main__":
    main()

Results and Discussion

The simulation reveals substantial course fit effects. At Augusta National (weighted toward approach play and short game), golfers like Scheffler (strong SG:APP of +1.2) and Rahm (strong SG:ARG of +0.5) project above their raw SG:Total. At Torrey Pines (which puts more weight on driving), McIlroy (SG:OTT of +1.0) and Clark (SG:OTT of +0.9) see their win probabilities increase meaningfully.
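
Before reading much into a difference between two simulated probabilities, it helps to check that it exceeds Monte Carlo noise. The following is a minimal sketch, assuming each simulated tournament is an independent Bernoulli trial for the event in question; the helper name mc_standard_error is hypothetical and not part of the script above.

```python
import math


def mc_standard_error(prob: float, n_sims: int) -> float:
    """Binomial standard error of a simulated probability estimate."""
    return math.sqrt(prob * (1.0 - prob) / n_sims)


# Example: a 5% win probability estimated from 50,000 simulated tournaments.
se = mc_standard_error(0.05, 50_000)
print(f"standard error: {se:.4%}")
```

At 50,000 simulations the standard error on a 5% estimate is roughly 0.1 percentage points, so swings of a full percentage point or more are unlikely to be sampling noise. Because compare_courses uses the same seed for both courses, the two runs also share their random draws, which should further reduce noise in the difference.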

The probability swings from course fit can be large. A golfer might have a 6% win probability at one course but only 3% at another with a different skill-demand profile. In a 144-player field where the favorite sits at 10-15%, that 3-percentage-point swing halves the golfer's chances, so the fair price roughly doubles.
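
To see what such a swing means in price terms, the sketch below converts a win probability into fair (no-vig) American odds. The function name fair_american_odds is an illustrative helper, not part of the script above.

```python
def fair_american_odds(prob: float) -> int:
    """Convert a win probability to no-vig American odds (rounded)."""
    if not 0.0 < prob < 1.0:
        raise ValueError("prob must be strictly between 0 and 1")
    if prob < 0.5:
        return round(100 * (1 - prob) / prob)   # underdog: positive odds
    return -round(100 * prob / (1 - prob))      # favourite: negative odds


for p in (0.06, 0.03):
    print(f"{p:.0%} -> {fair_american_odds(p):+d}")
```

A 6% golfer is fairly priced at about +1567 and a 3% golfer at about +3233, so a book that prices both courses the same way is offering a materially wrong number at one of them.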

Betting Implications

Golf outright winner markets are often considered among the least efficient, in part because these course fit dynamics are difficult for sportsbooks to incorporate for every golfer in every field. The model flags value when a golfer's course-fit-adjusted probability exceeds the market-implied probability at the offered odds. Note that identify_value uses the raw implied probability, which still includes the vig; this is the break-even probability for the bet, so a positive edge against it corresponds to positive expected value. The most consistent edges tend to appear for: (1) golfers with extreme skill profiles (very strong in one area, weak in another) at courses that heavily weight their strong area; (2) relative unknowns (ranked 30-80) at courses that happen to suit their game; and (3) derivative markets (top-5, top-10, make cut) where the sportsbook's pricing model does not fully account for course fit.
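
The implied probability at the offered odds includes the bookmaker's margin. To compare the model against the market's own estimate of each golfer's chances, the margin has to be removed first. The sketch below uses proportional rescaling, which is the simplest of several possible methods, and assumes odds are available for every golfer in the field. The function names and the three-golfer market are hypothetical.

```python
from typing import Dict


def american_to_implied(odds: int) -> float:
    """Raw implied probability from American odds (still includes the vig)."""
    if odds > 0:
        return 100 / (odds + 100)
    return abs(odds) / (abs(odds) + 100)


def remove_vig(market_odds: Dict[str, int]) -> Dict[str, float]:
    """Proportionally rescale implied probabilities so they sum to 1.

    Assumes market_odds covers EVERY golfer in the field; with a partial
    market the normalisation is not meaningful.
    """
    raw = {name: american_to_implied(o) for name, o in market_odds.items()}
    overround = sum(raw.values())  # exceeds 1 when the book carries a margin
    return {name: p / overround for name, p in raw.items()}


# Hypothetical three-golfer market, used only to show the mechanics.
toy_market = {"A": 120, "B": 200, "C": 300}
fair = remove_vig(toy_market)
for name, p in fair.items():
    print(f"{name}: raw {american_to_implied(toy_market[name]):.3f}, fair {p:.3f}")
```

The 10-golfer market dictionary in main covers only part of the field, so it should not be normalised this way as it stands.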

Limitations

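The simulation uses estimated strokes gained profiles rather than real data, and the 18 unnamed golfers are random draws. A production system would pull from PGA Tour ShotLink data or Data Golf. The sample field has only 30 golfers, fewer than the default cut_top_n of 65, so the cut never eliminates anyone in this demo: every make-cut probability is 100%, and win and top-N probabilities are inflated relative to a 144-player field. The cut also keeps exactly the top cut_top_n golfers and does not include ties at the cut line. The round-to-round correlation model is simplified; in practice, weather interruptions, course setup changes, and psychological factors (pressure in contention) affect late-round performance. Field strength adjustment is not implemented here but would be essential when comparing predictions across tournaments.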