Case Study 2: MMA Fight Prediction System with Style Matchup and Physical Attributes

Overview

In this case study, we build an MMA fight prediction system that combines three layers of analysis: Elo-based ratings for baseline skill estimation, style matchup adjustments that capture the "styles make fights" phenomenon, and physical attribute modeling that accounts for reach, age, weight cuts, and chin deterioration. We set up illustrative profiles for five UFC lightweights, generate predictions for six matchups, update the ratings after two results, and measure how far each layer moves the predicted win probability.

The Problem

Predicting MMA outcomes is uniquely challenging among sports. Fighters compete infrequently (2-3 bouts per year), meaning rating systems have few data points to work with. The outcome of any fight depends on the interaction between two specific skill profiles: a wrestler's dominance against strikers does not predict their performance against submission specialists. Physical attributes play a larger role than in most sports, with reach advantages, age-related decline, and the accumulated damage reflected in chin deterioration all creating measurable effects.

A model that uses only Elo ratings will miss the style-dependent variance in outcomes. A model that adds style matchups but ignores physical attributes will miss the slow degradation of aging fighters or the significance of a five-inch reach advantage. Our goal is to build a system that integrates all three layers into a single prediction, quantifying how much each contributes.

Data Requirements

Our system requires three categories of data for each fighter. The rating data consists of fight history including opponents, results, and method of victory. The style data consists of career statistics: significant strikes per minute, takedown average per 15 minutes, submission attempts per 15 minutes, strike defense percentage, and takedown defense percentage. The physical data consists of age, height, reach, weight class, walk-around weight, career fights, KO/TKO losses, and total significant strikes absorbed.
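
The rate statistics listed above are usually derived from raw career totals rather than stored directly. The following is a minimal sketch of that derivation. The RawCareerTotals container and its field names are hypothetical and not part of the case study code, and the defense percentages assume the common definition (the share of opponent attempts that did not land).

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class RawCareerTotals:
    """Hypothetical raw totals for one fighter (illustrative field names)."""
    fight_seconds: int
    sig_strikes_landed: int
    sig_strikes_faced: int      # opponent significant strike attempts
    sig_strikes_absorbed: int   # opponent significant strikes that landed
    takedowns_landed: int
    takedowns_faced: int        # opponent takedown attempts
    takedowns_conceded: int     # opponent takedowns that succeeded
    submission_attempts: int


def _defense(conceded: int, faced: int) -> Optional[float]:
    """Share of opponent attempts that failed; None when no attempts were faced."""
    if faced <= 0:
        return None
    return 1.0 - conceded / faced


def rate_stats(t: RawCareerTotals) -> Dict[str, Optional[float]]:
    """Convert raw totals into the per-minute and per-15-minute rates used as style inputs."""
    minutes = t.fight_seconds / 60.0
    if minutes <= 0:
        raise ValueError("fighter has no recorded cage time")
    per_15 = 15.0 / minutes  # scales a career total to a per-15-minute rate
    return {
        "sig_strikes_per_min": t.sig_strikes_landed / minutes,
        "takedown_avg": t.takedowns_landed * per_15,
        "submission_avg": t.submission_attempts * per_15,
        "sig_strike_defense": _defense(t.sig_strikes_absorbed, t.sig_strikes_faced),
        "takedown_defense": _defense(t.takedowns_conceded, t.takedowns_faced),
    }


if __name__ == "__main__":
    # Made-up totals chosen only to exercise the arithmetic.
    totals = RawCareerTotals(9000, 600, 1000, 450, 15, 20, 7, 5)
    print(rate_stats(totals))
```

Returning None for a defense rate with no attempts faced keeps an undefined value from being mistaken for a perfect or a zero defense; a caller would need to substitute a prior before classification.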

Implementation

"""
MMA Fight Prediction System
Integrates Elo ratings, style matchup adjustments, and physical attributes.
"""

import math
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional


@dataclass
class MMAFighter:
    """Complete fighter profile for multi-layer prediction."""

    name: str
    elo: float = 1500.0
    fights: int = 0
    last_fight: Optional[date] = None

    # Style statistics
    sig_strikes_per_min: float = 4.0
    takedown_avg: float = 1.5
    submission_avg: float = 0.5
    sig_strike_defense: float = 0.55
    takedown_defense: float = 0.65

    # Physical attributes
    age: float = 28.0
    height_inches: float = 72.0
    reach_inches: float = 74.0
    weight_class_lbs: float = 170.0
    walk_around_weight_lbs: float = 190.0
    ko_tko_losses: int = 0
    total_sig_strikes_absorbed: int = 0

    # Derived
    style: str = ""
    prediction_history: List[Dict] = field(default_factory=list)


class MMAFightPredictor:
    """
    Multi-layer MMA fight prediction system.

    Combines Elo rating, style matchup matrix, and physical attribute
    adjustments into a single win probability. All adjustments are added
    to the Elo log-odds, so their coefficients are in log-odds units.

    Args:
        base_k: Base K-factor for Elo updates.
        new_fighter_k_mult: K-factor multiplier for fighters with < 5 fights.
        reach_coeff: Log-odds adjustment per inch of reach beyond threshold.
        reach_threshold: Minimum reach difference (inches) for adjustment to apply.
        age_decline_rate: Decline coefficient applied to (years past 30) ** 1.3.
        weight_cut_threshold: Cut, as a fraction of walk-around weight, above
            which the penalty applies.
        weight_cut_penalty: Penalty per 5 percentage points of cut beyond the
            threshold.
        ko_vuln_per_loss: Vulnerability increase per KO/TKO loss.
        style_adjustment_weight: Scaling factor for matchup matrix adjustments.
    """

    STYLE_ARCHETYPES = [
        "striker", "wrestler", "grappler",
        "balanced", "counter_striker", "pressure_fighter",
    ]

    FINISH_MULTIPLIERS = {
        "ko_tko": 1.25, "submission": 1.20,
        "decision_unanimous": 1.00, "decision_split": 0.85,
        "decision_majority": 0.92, "draw": 0.50,
    }

    DEFAULT_MATCHUP_MATRIX = {
        "striker":          {"striker": 0.0, "wrestler": -0.06, "grappler": -0.04,
                             "balanced": 0.01, "counter_striker": 0.03,
                             "pressure_fighter": -0.02},
        "wrestler":         {"striker": 0.06, "wrestler": 0.0, "grappler": 0.02,
                             "balanced": 0.02, "counter_striker": 0.05,
                             "pressure_fighter": 0.04},
        "grappler":         {"striker": 0.04, "wrestler": -0.02, "grappler": 0.0,
                             "balanced": 0.01, "counter_striker": 0.03,
                             "pressure_fighter": 0.02},
        "balanced":         {"striker": -0.01, "wrestler": -0.02, "grappler": -0.01,
                             "balanced": 0.0, "counter_striker": 0.01,
                             "pressure_fighter": 0.0},
        "counter_striker":  {"striker": -0.03, "wrestler": -0.05, "grappler": -0.03,
                             "balanced": -0.01, "counter_striker": 0.0,
                             "pressure_fighter": -0.04},
        "pressure_fighter": {"striker": 0.02, "wrestler": -0.04, "grappler": -0.02,
                             "balanced": 0.0, "counter_striker": 0.04,
                             "pressure_fighter": 0.0},
    }

    def __init__(
        self,
        base_k: float = 120.0,
        new_fighter_k_mult: float = 1.5,
        reach_coeff: float = 0.012,
        reach_threshold: float = 2.5,
        age_decline_rate: float = 0.025,
        weight_cut_threshold: float = 0.12,
        weight_cut_penalty: float = 0.04,
        ko_vuln_per_loss: float = 0.03,
        style_adjustment_weight: float = 1.0,
    ):
        self.base_k = base_k
        self.new_fighter_k_mult = new_fighter_k_mult
        self.reach_coeff = reach_coeff
        self.reach_threshold = reach_threshold
        self.age_decline_rate = age_decline_rate
        self.weight_cut_threshold = weight_cut_threshold
        self.weight_cut_penalty = weight_cut_penalty
        self.ko_vuln_per_loss = ko_vuln_per_loss
        self.style_adjustment_weight = style_adjustment_weight
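        # Copy the defaults so per-instance tuning does not mutate the class-level matrix.
        self.matchup_matrix = {
            style: dict(row) for style, row in self.DEFAULT_MATCHUP_MATRIX.items()
        }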
        self.fighters: Dict[str, MMAFighter] = {}

    def add_fighter(self, fighter: MMAFighter) -> None:
        """Register a fighter with the system."""
        fighter.style = self._classify_style(fighter)
        self.fighters[fighter.name] = fighter

    def _classify_style(self, fighter: MMAFighter) -> str:
        """Classify fighter into a style archetype."""
        sspm = fighter.sig_strikes_per_min
        td = fighter.takedown_avg
        sub = fighter.submission_avg
        str_def = fighter.sig_strike_defense
        td_def = fighter.takedown_defense

        if td > 3.5 and td_def > 0.70:
            return "wrestler"
        elif sub > 1.5:
            return "grappler"
        elif sspm > 6.0 and str_def < 0.55:
            return "pressure_fighter"
        elif sspm < 3.5 and str_def > 0.62:
            return "counter_striker"
        elif sspm > 5.0:
            return "striker"
        else:
            return "balanced"

    def _elo_probability(self, ra: float, rb: float) -> float:
        """Standard Elo expected score."""
        return 1.0 / (1.0 + 10.0 ** ((rb - ra) / 400.0))

    def _style_adjustment(self, style_a: str, style_b: str) -> float:
        """Get matchup matrix adjustment value."""
        adj = self.matchup_matrix.get(style_a, {}).get(style_b, 0.0)
        return adj * self.style_adjustment_weight

    def _reach_adj(self, fa: MMAFighter, fb: MMAFighter) -> float:
        """Reach-based probability adjustment."""
        diff = fa.reach_inches - fb.reach_inches
        if abs(diff) < self.reach_threshold:
            return 0.0
        effective = abs(diff) - self.reach_threshold
        adj = self.reach_coeff * effective
        return adj if diff > 0 else -adj

    def _age_adj(self, fighter: MMAFighter) -> float:
        """Age-based performance adjustment (0 at peak, negative past peak)."""
        if fighter.age <= 30.0:
            return 0.0
        years_past = fighter.age - 30.0
        return -self.age_decline_rate * years_past ** 1.3

    def _weight_cut_adj(self, fighter: MMAFighter) -> float:
        """Weight cut penalty for severe cuts."""
        cut_pct = (
            (fighter.walk_around_weight_lbs - fighter.weight_class_lbs)
            / fighter.walk_around_weight_lbs
        )
        if cut_pct <= self.weight_cut_threshold:
            return 0.0
        excess = cut_pct - self.weight_cut_threshold
        return -self.weight_cut_penalty * (excess / 0.05)

    def _ko_vulnerability(self, fighter: MMAFighter) -> float:
        """Knockout vulnerability index."""
        base = 0.02
        ko_factor = self.ko_vuln_per_loss * fighter.ko_tko_losses
        strike_factor = 0.01 * fighter.total_sig_strikes_absorbed / 1000
        age_factor = max(0, (fighter.age - 30) * 0.005)
        return base + ko_factor + strike_factor + age_factor

    def predict(self, name_a: str, name_b: str) -> Dict:
        """
        Generate comprehensive fight prediction.

        Combines Elo baseline, style matchup adjustment, and physical
        attribute adjustments into a single probability estimate.
        """
        fa = self.fighters[name_a]
        fb = self.fighters[name_b]

        # Layer 1: Elo baseline
        elo_prob = self._elo_probability(fa.elo, fb.elo)

        # Layer 2: Style matchup (on log-odds scale)
        style_adj = self._style_adjustment(fa.style, fb.style)

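        # Layer 3: Physical attributes (also log-odds; positive favors fighter A)
        reach_adj = self._reach_adj(fa, fb)
        # Age and weight-cut adjustments are <= 0 for each fighter, so the net
        # value is positive when fighter B is penalized more than fighter A.
        age_a = self._age_adj(fa)
        age_b = self._age_adj(fb)
        net_age = age_a - age_b
        cut_a = self._weight_cut_adj(fa)
        cut_b = self._weight_cut_adj(fb)
        net_cut = cut_a - cut_b
        # Vulnerability hurts its owner, so the difference is negated.
        ko_a = self._ko_vulnerability(fa)
        ko_b = self._ko_vulnerability(fb)
        net_ko = -(ko_a - ko_b)

        total_physical = reach_adj + net_age + net_cut + net_ko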

        # Combine on log-odds scale
        elo_clipped = max(0.01, min(0.99, elo_prob))
        log_odds = math.log(elo_clipped / (1 - elo_clipped))
        adjusted_log_odds = log_odds + style_adj + total_physical
        final_prob = 1.0 / (1.0 + math.exp(-adjusted_log_odds))
        final_prob = max(0.01, min(0.99, final_prob))

        return {
            "fighter_a": name_a,
            "fighter_b": name_b,
            "style_a": fa.style,
            "style_b": fb.style,
            "elo_a": round(fa.elo, 1),
            "elo_b": round(fb.elo, 1),
            "elo_prob_a": round(elo_prob, 4),
            "style_adjustment": round(style_adj, 4),
            "physical_adjustment": round(total_physical, 4),
            "components": {
                "reach": round(reach_adj, 4),
                "net_age": round(net_age, 4),
                "net_weight_cut": round(net_cut, 4),
                "net_ko_vulnerability": round(net_ko, 4),
            },
            "final_prob_a": round(final_prob, 4),
            "final_prob_b": round(1 - final_prob, 4),
        }

    def update_after_fight(
        self,
        winner: str,
        loser: str,
        method: str,
        fight_date: date,
    ) -> Dict:
        """Update Elo ratings and fighter stats after a fight."""
        fw = self.fighters[winner]
        fl = self.fighters[loser]

        pre_w, pre_l = fw.elo, fl.elo
        exp_w = self._elo_probability(fw.elo, fl.elo)

        k_w = self.base_k
        k_l = self.base_k
        if fw.fights < 5:
            k_w *= self.new_fighter_k_mult
        if fl.fights < 5:
            k_l *= self.new_fighter_k_mult

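        # Scale K by how decisive the result was; unknown methods fall back to 1.0.
        # Note: passing "draw" only halves the update. The fighter named as
        # `winner` is still credited with a win, so true draws need separate handling.
        finish_mult = self.FINISH_MULTIPLIERS.get(method, 1.0)
        k_w *= finish_mult
        k_l *= finish_mult

        # Winner scores 1 against expectation exp_w; loser scores 0 against
        # expectation (1 - exp_w). The update is zero-sum only when k_w == k_l.
        fw.elo += k_w * (1.0 - exp_w)
        fl.elo += k_l * (0.0 - (1.0 - exp_w))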

        fw.fights += 1
        fl.fights += 1
        fw.last_fight = fight_date
        fl.last_fight = fight_date

        if method == "ko_tko":
            fl.ko_tko_losses += 1

        return {
            "winner": winner, "loser": loser, "method": method,
            "pre_elo": (round(pre_w, 1), round(pre_l, 1)),
            "post_elo": (round(fw.elo, 1), round(fl.elo, 1)),
        }

    def component_contribution_analysis(
        self, name_a: str, name_b: str
    ) -> Dict:
        """
        Analyze how much each model layer contributes to the final prediction.

        Computes the prediction with each layer in isolation and combined.
        """
        fa = self.fighters[name_a]
        fb = self.fighters[name_b]

        # Elo only
        elo_only = self._elo_probability(fa.elo, fb.elo)

        # Elo + Style (clip before taking log-odds, as predict() does)
        style_adj = self._style_adjustment(fa.style, fb.style)
        elo_clipped = max(0.01, min(0.99, elo_only))
        log_odds_elo = math.log(elo_clipped / (1 - elo_clipped))
        elo_style = 1.0 / (1.0 + math.exp(-(log_odds_elo + style_adj)))

        # Full model
        full = self.predict(name_a, name_b)

        return {
            "matchup": f"{name_a} vs {name_b}",
            "elo_only_prob_a": round(elo_only, 4),
            "elo_plus_style_prob_a": round(elo_style, 4),
            "full_model_prob_a": full["final_prob_a"],
            "style_contribution": round(elo_style - elo_only, 4),
            "physical_contribution": round(full["final_prob_a"] - elo_style, 4),
            "total_adjustment": round(full["final_prob_a"] - elo_only, 4),
        }


def main() -> None:
    """Run the MMA fight prediction case study."""
    print("=" * 70)
    print("Case Study: MMA Multi-Layer Fight Prediction System")
    print("=" * 70)

    system = MMAFightPredictor()

    # Create illustrative fighter profiles for a lightweight division
    fighters_data = [
        MMAFighter("Islam Makhachev", elo=1780, fights=25,
                   sig_strikes_per_min=4.2, takedown_avg=4.1,
                   submission_avg=0.9, sig_strike_defense=0.63,
                   takedown_defense=0.88,
                   age=32.5, height_inches=70, reach_inches=70.5,
                   weight_class_lbs=155, walk_around_weight_lbs=180,
                   ko_tko_losses=0, total_sig_strikes_absorbed=420),

        MMAFighter("Charles Oliveira", elo=1720, fights=42,
                   sig_strikes_per_min=3.5, takedown_avg=2.2,
                   submission_avg=1.8, sig_strike_defense=0.52,
                   takedown_defense=0.58,
                   age=34.5, height_inches=70, reach_inches=74.0,
                   weight_class_lbs=155, walk_around_weight_lbs=178,
                   ko_tko_losses=3, total_sig_strikes_absorbed=650),

        MMAFighter("Justin Gaethje", elo=1650, fights=28,
                   sig_strikes_per_min=7.6, takedown_avg=0.5,
                   submission_avg=0.0, sig_strike_defense=0.54,
                   takedown_defense=0.72,
                   age=35.5, height_inches=69, reach_inches=70.0,
                   weight_class_lbs=155, walk_around_weight_lbs=180,
                   ko_tko_losses=4, total_sig_strikes_absorbed=780),

        MMAFighter("Dustin Poirier", elo=1680, fights=38,
                   sig_strikes_per_min=5.8, takedown_avg=0.8,
                   submission_avg=0.8, sig_strike_defense=0.52,
                   takedown_defense=0.62,
                   age=35.0, height_inches=69, reach_inches=72.0,
                   weight_class_lbs=155, walk_around_weight_lbs=182,
                   ko_tko_losses=3, total_sig_strikes_absorbed=700),

        MMAFighter("Arman Tsarukyan", elo=1700, fights=22,
                   sig_strikes_per_min=5.2, takedown_avg=3.8,
                   submission_avg=0.4, sig_strike_defense=0.60,
                   takedown_defense=0.85,
                   age=27.5, height_inches=69, reach_inches=72.0,
                   weight_class_lbs=155, walk_around_weight_lbs=177,
                   ko_tko_losses=0, total_sig_strikes_absorbed=290),
    ]

    for f in fighters_data:
        system.add_fighter(f)

    # Display fighter profiles
    print("\nFighter Profiles:")
    print(f"  {'Name':<22} {'Elo':>6} {'Style':<18} {'Age':>5} {'Reach':>6}")
    print(f"  {'-'*22} {'-'*6} {'-'*18} {'-'*5} {'-'*6}")
    for f in fighters_data:
        print(
            f"  {f.name:<22} {f.elo:>6.0f} {f.style:<18} "
            f"{f.age:>5.1f} {f.reach_inches:>5.1f}\""
        )

    # Generate predictions for key matchups
    matchups = [
        ("Islam Makhachev", "Charles Oliveira"),
        ("Islam Makhachev", "Justin Gaethje"),
        ("Islam Makhachev", "Arman Tsarukyan"),
        ("Charles Oliveira", "Dustin Poirier"),
        ("Justin Gaethje", "Dustin Poirier"),
        ("Arman Tsarukyan", "Charles Oliveira"),
    ]

    print("\n" + "=" * 70)
    print("Fight Predictions (Multi-Layer)")
    print("=" * 70)

    for fa_name, fb_name in matchups:
        pred = system.predict(fa_name, fb_name)
        analysis = system.component_contribution_analysis(fa_name, fb_name)

        print(f"\n  {fa_name} vs {fb_name}")
        print(f"    Styles: {pred['style_a']} vs {pred['style_b']}")
        print(f"    Elo: {pred['elo_a']} vs {pred['elo_b']}")
        print(f"    Elo-only probability: {analysis['elo_only_prob_a']:.1%}")
        print(f"    + Style matchup:      {analysis['style_contribution']:+.1%}")
        print(f"    + Physical attributes: {analysis['physical_contribution']:+.1%}")
        print(f"    = Final probability:   {pred['final_prob_a']:.1%} / {pred['final_prob_b']:.1%}")
        comp = pred["components"]
        print(f"    Physical breakdown (log-odds): reach={comp['reach']:+.3f}, "
              f"age={comp['net_age']:+.3f}, "
              f"cut={comp['net_weight_cut']:+.3f}, "
              f"KO_vuln={comp['net_ko_vulnerability']:+.3f}")

    # Process some fight results
    print("\n" + "=" * 70)
    print("Processing Fight Results")
    print("=" * 70)

    fight_results = [
        ("Islam Makhachev", "Dustin Poirier", "submission", date(2024, 6, 1)),
        ("Arman Tsarukyan", "Charles Oliveira", "decision_unanimous", date(2024, 6, 1)),
    ]

    for winner, loser, method, d in fight_results:
        result = system.update_after_fight(winner, loser, method, d)
        print(f"\n  {winner} def. {loser} via {method}")
        print(f"    Pre-fight Elo: {result['pre_elo']}")
        print(f"    Post-fight Elo: {result['post_elo']}")

    # Re-predict with updated ratings
    print("\n  Updated prediction: Makhachev vs Tsarukyan")
    updated = system.predict("Islam Makhachev", "Arman Tsarukyan")
    print(f"    Final: {updated['final_prob_a']:.1%} / {updated['final_prob_b']:.1%}")

    print("\n" + "=" * 70)


if __name__ == "__main__":
    main()

Results and Analysis

The multi-layer system reveals dynamics that a pure Elo system would miss. Consider the Makhachev versus Gaethje matchup. Makhachev's Elo advantage (1780 vs 1650) gives him a baseline win probability of about 68%. The style matchup layer further favors Makhachev: as a wrestler facing a pressure_fighter, he receives a +0.04 log-odds adjustment from the matchup matrix. The physical attribute layer compounds this: Gaethje at 35.5 with four KO/TKO losses and 780 significant strikes absorbed has an elevated knockout vulnerability index and a steeper age penalty, while Makhachev at 32.5 with zero KO losses carries only a small age penalty. Reach and weight cut contribute nothing here, because the half-inch reach gap is below the 2.5-inch threshold and both fighters make the same cut. With the default coefficients, the full model raises Makhachev's probability by roughly 6.7 percentage points over Elo alone, to about 74.5%.

In contrast, the Makhachev versus Tsarukyan matchup shows how physical attributes can narrow a gap. Both fighters classify as wrestlers, so the style layer is neutral. Tsarukyan is younger (27.5 versus 32.5), with zero KO losses, fewer absorbed strikes, and a slightly smaller weight cut. These age and durability advantages partially offset Makhachev's Elo edge: with the default coefficients, the full model gives Makhachev about 58.7%, compared with about 61.3% from Elo alone.
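
The two walk-throughs above can be reproduced by hand. The sketch below re-implements the adjustment arithmetic in a compact, standalone form, using the default coefficients and fighter profiles from the listing; the helper names (physical_penalty, reach_adjustment, walk_through) and the dictionary keys are illustrative only, and the 0.01 to 0.99 clipping from predict() is omitted because neither matchup comes near those bounds.

```python
import math

# Default coefficients copied from MMAFightPredictor.__init__ in the listing.
REACH_COEFF, REACH_THRESHOLD = 0.012, 2.5
AGE_DECLINE_RATE, PEAK_AGE = 0.025, 30.0
CUT_THRESHOLD, CUT_PENALTY = 0.12, 0.04
KO_VULN_PER_LOSS = 0.03


def physical_penalty(f):
    """Per-fighter log-odds total: age and cut penalties minus KO vulnerability."""
    years_past = max(0.0, f["age"] - PEAK_AGE)
    age = -AGE_DECLINE_RATE * years_past ** 1.3
    cut_pct = (f["walk"] - f["limit"]) / f["walk"]
    cut = -CUT_PENALTY * max(0.0, cut_pct - CUT_THRESHOLD) / 0.05
    ko = (0.02 + KO_VULN_PER_LOSS * f["ko_losses"]
          + 0.01 * f["absorbed"] / 1000 + 0.005 * years_past)
    return age + cut - ko


def reach_adjustment(a, b):
    """Signed log-odds bonus for reach beyond the threshold (positive favors a)."""
    diff = a["reach"] - b["reach"]
    effective = max(0.0, abs(diff) - REACH_THRESHOLD)
    return math.copysign(REACH_COEFF * effective, diff)


def walk_through(a, b, style_adj):
    """Return (Elo-only probability, full-model probability) for fighter a."""
    elo_p = 1.0 / (1.0 + 10.0 ** ((b["elo"] - a["elo"]) / 400.0))
    physical = reach_adjustment(a, b) + physical_penalty(a) - physical_penalty(b)
    log_odds = math.log(elo_p / (1.0 - elo_p)) + style_adj + physical
    return elo_p, 1.0 / (1.0 + math.exp(-log_odds))


# Profiles as defined in main() of the listing.
makhachev = dict(elo=1780, age=32.5, reach=70.5, walk=180, limit=155,
                 ko_losses=0, absorbed=420)
gaethje = dict(elo=1650, age=35.5, reach=70.0, walk=180, limit=155,
               ko_losses=4, absorbed=780)
tsarukyan = dict(elo=1700, age=27.5, reach=72.0, walk=177, limit=155,
                 ko_losses=0, absorbed=290)

if __name__ == "__main__":
    # Style adjustments read from DEFAULT_MATCHUP_MATRIX:
    # wrestler vs pressure_fighter = 0.04, wrestler vs wrestler = 0.0.
    for label, opponent, style_adj in (("Gaethje", gaethje, 0.04),
                                       ("Tsarukyan", tsarukyan, 0.0)):
        elo_p, full_p = walk_through(makhachev, opponent, style_adj)
        print(f"Makhachev vs {label}: Elo-only {elo_p:.1%}, full model {full_p:.1%}")
```

Because every adjustment is added on the log-odds scale, the same log-odds shift moves the probability less when the baseline is far from 50%, which is worth keeping in mind when comparing layer contributions across matchups.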

Practical Betting Application

The component contribution analysis is directly useful for betting. When the market price aligns with the Elo-only prediction but the full model differs significantly, the style and physical adjustments represent information the market may not have priced in. The most common pattern is that aging former champions retain market respect beyond what their current physical condition warrants. A fighter with three recent KO losses trading at implied probabilities that reflect their peak-era Elo is the kind of potential value opportunity this framework is designed to flag.

Limitations

The matchup matrix values in this case study are illustrative. In production, they should be estimated from a large database of classified fights (ideally 50+ per style pairing). The physical attribute coefficients are drawn from published research estimates but should be calibrated against a specific dataset. The style classification is heuristic and would benefit from clustering algorithms applied to high-dimensional fight statistics. Despite these limitations, the multi-layer architecture provides a rigorous framework for combining fundamentally different types of information into a single prediction.
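
As one possible starting point for estimating the matchup matrix from data, the sketch below compares the observed win rate in each style pairing with the win rate Elo expected, and expresses the gap on the log-odds scale used by the predictor. The input format, the estimate_matchup_matrix name, and the simple logit-difference estimator are assumptions made for illustration; a production system would more likely fit a regularized regression. Pairings with fewer than min_fights classified fights are omitted rather than estimated.

```python
import math
from collections import defaultdict


def logit(p: float) -> float:
    """Log-odds of p, clipped to avoid infinities at 0 and 1."""
    p = min(0.99, max(0.01, p))
    return math.log(p / (1.0 - p))


def estimate_matchup_matrix(fights, min_fights: int = 50):
    """
    fights: iterable of (style_a, style_b, elo_prob_a, a_won) tuples.
    Returns {style_a: {style_b: log-odds adjustment}} for well-sampled pairings.
    """
    # For each ordered pairing: [fight count, actual wins, Elo-expected wins]
    stats = defaultdict(lambda: [0, 0.0, 0.0])
    for style_a, style_b, elo_prob_a, a_won in fights:
        if style_a == style_b:
            continue  # same-style fights carry no matchup information
        # Count each fight from both corners so the matrix is antisymmetric.
        for key, prob, won in (
            ((style_a, style_b), elo_prob_a, a_won),
            ((style_b, style_a), 1.0 - elo_prob_a, not a_won),
        ):
            entry = stats[key]
            entry[0] += 1
            entry[1] += 1.0 if won else 0.0
            entry[2] += prob

    matrix = {}
    for (style_a, style_b), (n, wins, expected) in stats.items():
        if n < min_fights:
            continue  # too few fights for a stable estimate
        adjustment = logit(wins / n) - logit(expected / n)
        matrix.setdefault(style_a, {})[style_b] = adjustment
    return matrix


if __name__ == "__main__":
    # Synthetic data only: 60 evenly rated wrestler-vs-striker fights, wrestler wins 36.
    synthetic = [("wrestler", "striker", 0.5, i < 36) for i in range(60)]
    print(estimate_matchup_matrix(synthetic))
```

Estimated values would replace the corresponding entries of the predictor's matchup_matrix, with unlisted pairings left at zero until enough fights accumulate.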