Case Study 2: The Thursday Night Football Effect — Myth or Reality?

Introduction

Every NFL season, fans and bettors debate the quality of Thursday Night Football (TNF). The common narrative runs something like this: teams playing on a short week (especially the visiting team) are at a disadvantage, the games are sloppy, and underdogs perform better than expected because the compressed preparation time acts as an equalizer.

If true, this would represent a systematic bias in the NFL betting market. If the market does not fully account for the Thursday night effect, underdogs on TNF might cover the spread at a rate meaningfully above 50%, creating a profitable betting angle.

But is this narrative supported by the data? Or is it another example of a compelling story unsupported by statistical evidence? In this case study, we apply rigorous hypothesis testing to find out.

The Claim Under Investigation

Primary Claim: NFL underdogs perform better against the spread on Thursday nights compared to Sunday games.

Secondary Claims:
1. The overall ATS (against the spread) cover rate for TNF underdogs is significantly above 50%.
2. The cover rate for TNF underdogs is significantly higher than the cover rate for Sunday underdogs.
3. The total score in TNF games is lower than in Sunday games (i.e., unders hit more often on Thursdays).

The Data

We analyze data from 8 NFL seasons (2016-2023), covering both the regular Thursday Night Football slate and standard Sunday afternoon and evening games. Games played on Monday, Saturday, and holidays are excluded to keep the comparison clean.

Metric                              Thursday Night   Sunday Games
Total games                         128              1,536
Underdog covers                     72               758
Favorite covers                     51               740
Pushes                              5                38
Underdog cover rate (excl. pushes)  58.5%            50.6%
Average total points scored         42.8             45.3
Under hits                          69               762
Over hits                           54               736
Over/under pushes                   5                38
Under rate (excl. pushes)           56.1%            50.9%

At first glance, the numbers look striking. TNF underdogs cover at 58.5% compared to 50.6% on Sundays. Unders hit at 56.1% on Thursdays versus 50.9% on Sundays. But can these differences survive rigorous statistical testing?

Test 1: Do TNF Underdogs Cover at a Rate Above 50%?

Hypotheses:
- H0: The true cover rate for TNF underdogs is 50% (p = 0.50)
- H1: The true cover rate for TNF underdogs is greater than 50% (p > 0.50)

Data (excluding pushes):
- n = 123 games (128 minus 5 pushes)
- Underdog covers = 72
- Observed proportion: p_hat = 72/123 = 0.5854

Calculations:

Standard error under H0: SE = sqrt(0.50 * 0.50 / 123) = sqrt(0.002033) = 0.04509

Z-statistic: z = (0.5854 - 0.50) / 0.04509 = 0.0854 / 0.04509 = 1.894

One-sided p-value: P(Z > 1.894) = 0.0291

95% Confidence Interval for TNF underdog cover rate:
SE_obs = sqrt(0.5854 * 0.4146 / 123) = 0.04440
CI = 0.5854 +/- 1.96 * 0.04440 = (0.4984, 0.6724)

Exact Binomial Test: P(X >= 72 | n = 123, p = 0.50) = 0.0354

Interpretation: The result is statistically significant at the 5% level using the z-test (p = 0.029) and borderline with the exact binomial test (p = 0.035). TNF underdogs do appear to cover at a rate above 50%. However, the confidence interval is wide (49.8% to 67.2%) and dips just below 50%, reflecting the relatively small sample size.
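Test 1 is easy to cross-check directly. A minimal sketch using SciPy (assuming SciPy >= 1.7, where `stats.binomtest` is available):

```python
from scipy import stats

# Normal-approximation z-test: 72 covers in 123 games vs p0 = 0.50
p_hat = 72 / 123
se = (0.50 * 0.50 / 123) ** 0.5
z = (p_hat - 0.50) / se
p_approx = 1 - stats.norm.cdf(z)

# Exact binomial tail probability, no normal approximation
p_exact = stats.binomtest(72, n=123, p=0.50, alternative="greater").pvalue

print(f"z = {z:.3f}, approximate p = {p_approx:.4f}, exact p = {p_exact:.4f}")
```

The exact test is slightly more conservative than the z-approximation here, but both land on the same side of 0.05.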

Test 2: Do TNF Underdogs Outperform Sunday Underdogs?

This is the more important test. Even if TNF underdogs cover above 50%, what matters for a betting strategy is whether they outperform Sunday underdogs to a meaningful degree.

Hypotheses:
- H0: TNF underdog cover rate equals Sunday underdog cover rate (p_TNF = p_SUN)
- H1: TNF underdog cover rate is greater than Sunday underdog cover rate (p_TNF > p_SUN)

Data:
- TNF: 72/123 = 0.5854
- Sunday: 758/1498 = 0.5060 (1536 games minus 38 pushes = 1498)

Pooled proportion: p_pool = (72 + 758) / (123 + 1498) = 830 / 1621 = 0.5120

Standard error of difference:
SE_diff = sqrt(p_pool * (1 - p_pool) * (1/123 + 1/1498))
        = sqrt(0.5120 * 0.4880 * (0.008130 + 0.000668))
        = sqrt(0.2499 * 0.008798)
        = sqrt(0.002199)
        = 0.04689

Z-statistic: z = (0.5854 - 0.5060) / 0.04689 = 0.0794 / 0.04689 = 1.693

One-sided p-value: P(Z > 1.693) = 0.0452

95% CI for the difference (computed here with the pooled SE): CI = 0.0794 +/- 1.96 * 0.04689 = (-0.0125, 0.1713)

Interpretation: The p-value of 0.045 is just barely significant at the 5% level. The confidence interval for the difference includes zero (-1.25% to +17.13%), meaning we cannot be 95% confident that the true difference is positive. The evidence is suggestive but far from definitive.
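Given the modest Thursday sample, the two-proportion z-test is worth cross-checking against Fisher's exact test on the same 2x2 table. A minimal sketch using SciPy (rows and columns match the contingency table used in Test 3):

```python
from scipy import stats

# Rows: Thursday, Sunday; columns: underdog covers, favorite covers
table = [[72, 51], [758, 740]]

# One-sided test: are Thursday underdog covers over-represented?
odds_ratio, p_value = stats.fisher_exact(table, alternative="greater")
print(f"odds ratio = {odds_ratio:.3f}, one-sided p = {p_value:.4f}")
```

The exact test avoids the normal approximation entirely, which matters most when one cell count is small.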

Test 3: Chi-Squared Test for Day-of-Week Effect

We can use a chi-squared test to examine whether the distribution of outcomes (cover/not cover) is independent of the day of the week.

Contingency Table (excluding pushes):

            Underdog Covers   Favorite Covers   Total
Thursday    72                51                123
Sunday      758               740               1498
Total       830               791               1621

Expected frequencies under independence:

  • E(Thursday, Underdog Covers) = 123 * 830 / 1621 = 62.98
  • E(Thursday, Favorite Covers) = 123 * 791 / 1621 = 60.02
  • E(Sunday, Underdog Covers) = 1498 * 830 / 1621 = 767.02
  • E(Sunday, Favorite Covers) = 1498 * 791 / 1621 = 730.98

Chi-squared statistic:
chi2 = (72 - 62.98)^2/62.98 + (51 - 60.02)^2/60.02 + (758 - 767.02)^2/767.02 + (740 - 730.98)^2/730.98
     = 81.36/62.98 + 81.36/60.02 + 81.36/767.02 + 81.36/730.98
     = 1.292 + 1.355 + 0.106 + 0.111
     = 2.864

Degrees of freedom: (2-1)(2-1) = 1

P-value: P(chi2 > 2.864 | df=1) = 0.0906

Interpretation: The chi-squared test gives a p-value of 0.091, which is not significant at the 5% level. Note that the chi-squared test is inherently two-sided. The one-sided z-test (Test 2) was more powerful for detecting a directional effect because it concentrated all of alpha in one tail.
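One practical note when reproducing Test 3 in SciPy: `stats.chi2_contingency` applies the Yates continuity correction to 2x2 tables by default, which deflates the statistic relative to the hand calculation above. A minimal sketch of both versions:

```python
import numpy as np
from scipy import stats

table = np.array([[72, 51], [758, 740]])

# Plain Pearson chi-squared (matches the hand calculation above)
chi2, p, dof, expected = stats.chi2_contingency(table, correction=False)

# SciPy's default applies the Yates continuity correction to 2x2 tables,
# which shrinks the statistic and raises the p-value
chi2_yates, p_yates, _, _ = stats.chi2_contingency(table)

print(f"Pearson: chi2 = {chi2:.3f}, p = {p:.4f}")
print(f"Yates:   chi2 = {chi2_yates:.3f}, p = {p_yates:.4f}")
```

Either choice is defensible; the point is to know which one your software is reporting before comparing it to a textbook computation.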

Test 4: Thursday Night Totals (Under/Over Analysis)

Hypotheses:
- H0: The under rate on TNF is 50% (p = 0.50)
- H1: The under rate on TNF is greater than 50% (p > 0.50)

Data (excluding pushes):
- n = 123 (128 - 5 pushes)
- Unders = 69
- Observed rate: p_hat = 69/123 = 0.5610

Z-statistic: z = (0.5610 - 0.50) / sqrt(0.25/123) = 0.0610 / 0.04509 = 1.353

One-sided p-value: P(Z > 1.353) = 0.088

Interpretation: The under rate of 56.1% is not statistically significant at the 5% level (p = 0.088). While the direction supports the narrative, the evidence is insufficient.

Test 5: Multiple Testing Correction

We have now conducted four tests. Even if one appears significant, we must correct for the fact that we ran multiple tests.

Raw p-values:
1. TNF underdog cover rate > 50%: p = 0.029
2. TNF underdogs vs. Sunday underdogs: p = 0.045
3. Chi-squared test for independence: p = 0.091
4. TNF under rate > 50%: p = 0.088

Bonferroni correction (4 tests): Adjusted alpha = 0.05 / 4 = 0.0125

After Bonferroni correction, none of the four tests reaches significance.

Benjamini-Hochberg procedure (FDR = 0.05):

Ordered p-values with BH thresholds:
1. p = 0.029, threshold = 0.05 * (1/4) = 0.0125 -- NOT significant
2. p = 0.045, threshold = 0.05 * (2/4) = 0.0250 -- NOT significant
3. p = 0.088, threshold = 0.05 * (3/4) = 0.0375 -- NOT significant
4. p = 0.091, threshold = 0.05 * (4/4) = 0.0500 -- NOT significant

Under BH correction as well, no test achieves significance.
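Both corrections can be verified in a few lines of plain Python; a minimal sketch using the four raw p-values listed above:

```python
p_values = [0.029, 0.045, 0.088, 0.091]  # already sorted ascending
m = len(p_values)

# Bonferroni: compare every p-value to alpha / m
bonferroni_rejections = [p < 0.05 / m for p in p_values]

# Benjamini-Hochberg: compare the k-th smallest p-value to alpha * k / m.
# No p-value clears its threshold here, so the BH step-up rule rejects nothing.
bh_rejections = [p <= 0.05 * (k + 1) / m for k, p in enumerate(p_values)]

print(any(bonferroni_rejections), any(bh_rejections))  # False False
```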

Python Analysis

"""
Case Study 2: The Thursday Night Football Effect
Statistical analysis of TNF underdog performance and totals.
"""

import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
from typing import Dict, List


def proportion_z_test(
    successes: int,
    n: int,
    p0: float = 0.50,
    alternative: str = "greater",
) -> Dict[str, float]:
    """
    Perform a z-test for a single proportion.

    Args:
        successes: Number of successes observed.
        n: Total number of trials.
        p0: Null hypothesis proportion.
        alternative: 'greater', 'less', or 'two-sided'.

    Returns:
        Dictionary with test statistics and p-value.
    """
    p_hat = successes / n
    se = np.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se

    if alternative == "greater":
        p_value = 1 - stats.norm.cdf(z)
    elif alternative == "less":
        p_value = stats.norm.cdf(z)
    else:
        p_value = 2 * (1 - stats.norm.cdf(abs(z)))

    return {
        "observed_proportion": p_hat,
        "null_proportion": p0,
        "z_statistic": z,
        "p_value": p_value,
        "standard_error": se,
    }


def two_proportion_z_test(
    successes1: int,
    n1: int,
    successes2: int,
    n2: int,
    alternative: str = "greater",
) -> Dict[str, float]:
    """
    Perform a z-test comparing two independent proportions.

    Args:
        successes1: Successes in group 1 (treatment).
        n1: Total in group 1.
        successes2: Successes in group 2 (control).
        n2: Total in group 2.
        alternative: Direction of the test.

    Returns:
        Dictionary with test statistics, p-value, and CI.
    """
    p1 = successes1 / n1
    p2 = successes2 / n2
    p_pool = (successes1 + successes2) / (n1 + n2)
    se = np.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se

    if alternative == "greater":
        p_value = 1 - stats.norm.cdf(z)
    elif alternative == "less":
        p_value = stats.norm.cdf(z)
    else:
        p_value = 2 * (1 - stats.norm.cdf(abs(z)))

    # CI for the difference (using unpooled SE for the interval)
    se_unpooled = np.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    ci_lower = diff - 1.96 * se_unpooled
    ci_upper = diff + 1.96 * se_unpooled

    return {
        "p1": p1,
        "p2": p2,
        "difference": diff,
        "pooled_proportion": p_pool,
        "z_statistic": z,
        "p_value": p_value,
        "ci_95": (ci_lower, ci_upper),
    }


def chi_squared_independence_test(
    table: np.ndarray,
) -> Dict[str, float]:
    """
    Perform a chi-squared test of independence.

    Args:
        table: 2D numpy array contingency table.

    Returns:
        Dictionary with chi2 statistic, p-value, and expected values.
    """
    # correction=False gives the plain Pearson chi-squared; SciPy's default
    # Yates continuity correction would deflate the statistic for 2x2 tables
    chi2, p_value, dof, expected = stats.chi2_contingency(table, correction=False)
    return {
        "chi2_statistic": chi2,
        "p_value": p_value,
        "degrees_of_freedom": dof,
        "expected_frequencies": expected,
    }


def benjamini_hochberg(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """
    Apply the Benjamini-Hochberg procedure for FDR control.

    Args:
        p_values: List of raw p-values.
        alpha: Desired FDR level.

    Returns:
        List of booleans indicating which hypotheses are rejected.
    """
    m = len(p_values)
    sorted_indices = np.argsort(p_values)
    sorted_p = np.array(p_values)[sorted_indices]
    thresholds = [(i + 1) / m * alpha for i in range(m)]

    # Find the largest k such that p_(k) <= k/m * alpha
    rejected = [False] * m
    max_k = -1
    for k in range(m):
        if sorted_p[k] <= thresholds[k]:
            max_k = k

    if max_k >= 0:
        for k in range(max_k + 1):
            rejected[sorted_indices[k]] = True

    return rejected


def plot_cover_rate_comparison(
    tnf_rate: float,
    sun_rate: float,
    tnf_n: int,
    sun_n: int,
) -> None:
    """
    Bar chart comparing TNF and Sunday underdog cover rates with CIs.
    """
    fig, ax = plt.subplots(figsize=(8, 6))

    rates = [tnf_rate, sun_rate]
    labels = [f"Thursday\n(n={tnf_n})", f"Sunday\n(n={sun_n})"]
    colors = ["#e74c3c", "#3498db"]

    se_tnf = np.sqrt(tnf_rate * (1 - tnf_rate) / tnf_n)
    se_sun = np.sqrt(sun_rate * (1 - sun_rate) / sun_n)
    errors = [1.96 * se_tnf, 1.96 * se_sun]

    bars = ax.bar(labels, rates, color=colors, width=0.5, edgecolor="black",
                  linewidth=0.8, yerr=errors, capsize=10, alpha=0.85)

    ax.axhline(y=0.50, color="black", linestyle="--", linewidth=1.5,
               label="50% (Fair coin)")
    ax.axhline(y=0.5238, color="orange", linestyle="--", linewidth=1.5,
               label="52.38% (Breakeven at -110)")

    ax.set_ylabel("Underdog Cover Rate", fontsize=13)
    ax.set_title("Underdog ATS Cover Rates: Thursday vs. Sunday", fontsize=14)
    ax.set_ylim(0.40, 0.72)
    ax.legend(fontsize=10, loc="upper right")

    for bar, rate in zip(bars, rates):
        ax.text(bar.get_x() + bar.get_width() / 2, bar.get_height() + 0.015,
                f"{rate:.1%}", ha="center", va="bottom", fontsize=12,
                fontweight="bold")

    plt.tight_layout()
    plt.savefig("tnf_vs_sunday_cover_rates.png", dpi=150)
    plt.show()


def plot_season_by_season(
    seasons: List[str],
    tnf_rates: List[float],
    sun_rates: List[float],
) -> None:
    """
    Line plot showing TNF vs Sunday underdog cover rates across seasons.
    """
    fig, ax = plt.subplots(figsize=(12, 6))

    x = np.arange(len(seasons))
    width = 0.35

    ax.bar(x - width / 2, tnf_rates, width, label="Thursday Night",
           color="#e74c3c", alpha=0.85, edgecolor="black", linewidth=0.5)
    ax.bar(x + width / 2, sun_rates, width, label="Sunday",
           color="#3498db", alpha=0.85, edgecolor="black", linewidth=0.5)

    ax.axhline(y=0.50, color="black", linestyle="--", linewidth=1,
               label="50% line")
    ax.set_xticks(x)
    ax.set_xticklabels(seasons, fontsize=11)
    ax.set_ylabel("Underdog Cover Rate", fontsize=13)
    ax.set_title("Season-by-Season Underdog Cover Rates", fontsize=14)
    ax.legend(fontsize=10)
    ax.set_ylim(0.30, 0.75)

    plt.tight_layout()
    plt.savefig("tnf_season_by_season.png", dpi=150)
    plt.show()


def plot_multiple_testing_results(
    test_names: List[str],
    p_values: List[float],
    bonferroni_alpha: float,
    bh_rejected: List[bool],
) -> None:
    """
    Visualize p-values alongside correction thresholds.
    """
    fig, ax = plt.subplots(figsize=(10, 6))

    m = len(p_values)
    x = np.arange(m)
    sorted_idx = np.argsort(p_values)
    sorted_p = np.array(p_values)[sorted_idx]
    sorted_names = [test_names[i] for i in sorted_idx]
    bh_thresholds = [(i + 1) / m * 0.05 for i in range(m)]

    ax.bar(x, sorted_p, color="steelblue", alpha=0.7, edgecolor="black",
           linewidth=0.5, label="Raw p-values")
    ax.plot(x, bh_thresholds, "ro--", markersize=8,
            label="BH threshold (FDR=0.05)")
    ax.axhline(y=bonferroni_alpha, color="green", linestyle="-.", linewidth=2,
               label=f"Bonferroni threshold ({bonferroni_alpha:.4f})")
    ax.axhline(y=0.05, color="gray", linestyle=":", linewidth=1.5,
               label="Uncorrected alpha = 0.05")

    ax.set_xticks(x)
    ax.set_xticklabels(sorted_names, rotation=30, ha="right", fontsize=9)
    ax.set_ylabel("P-value", fontsize=12)
    ax.set_title("P-values with Multiple Testing Corrections", fontsize=14)
    ax.legend(fontsize=9, loc="upper left")
    ax.set_ylim(0, 0.12)

    plt.tight_layout()
    plt.savefig("tnf_multiple_testing.png", dpi=150)
    plt.show()


# ─── Main Analysis ───────────────────────────────────────────────────────────

if __name__ == "__main__":

    print("=" * 65)
    print("  CASE STUDY: THE THURSDAY NIGHT FOOTBALL EFFECT")
    print("=" * 65)

    # ── Test 1: TNF underdog cover rate vs 50% ──
    print("\n--- Test 1: TNF Underdog Cover Rate > 50%? ---")
    test1 = proportion_z_test(72, 123, p0=0.50, alternative="greater")
    print(f"  Observed: {test1['observed_proportion']:.4f}")
    print(f"  Z = {test1['z_statistic']:.3f}, p = {test1['p_value']:.4f}")

    # Exact binomial
    binom_p = 1 - stats.binom.cdf(71, 123, 0.50)
    print(f"  Exact binomial p-value: {binom_p:.4f}")

    # ── Test 2: TNF underdogs vs Sunday underdogs ──
    print("\n--- Test 2: TNF vs Sunday Underdog Cover Rate ---")
    test2 = two_proportion_z_test(72, 123, 758, 1498, alternative="greater")
    print(f"  TNF: {test2['p1']:.4f}  Sunday: {test2['p2']:.4f}")
    print(f"  Difference: {test2['difference']:.4f}")
    print(f"  Z = {test2['z_statistic']:.3f}, p = {test2['p_value']:.4f}")
    print(f"  95% CI for difference: ({test2['ci_95'][0]:.4f}, "
          f"{test2['ci_95'][1]:.4f})")

    # ── Test 3: Chi-squared test ──
    print("\n--- Test 3: Chi-Squared Test of Independence ---")
    contingency = np.array([[72, 51], [758, 740]])
    test3 = chi_squared_independence_test(contingency)
    print(f"  Chi2 = {test3['chi2_statistic']:.3f}, "
          f"p = {test3['p_value']:.4f}, df = {test3['degrees_of_freedom']}")
    print(f"  Expected frequencies:\n{test3['expected_frequencies']}")

    # ── Test 4: TNF under rate ──
    print("\n--- Test 4: TNF Under Rate > 50%? ---")
    test4 = proportion_z_test(69, 123, p0=0.50, alternative="greater")
    print(f"  Observed: {test4['observed_proportion']:.4f}")
    print(f"  Z = {test4['z_statistic']:.3f}, p = {test4['p_value']:.4f}")

    # ── Multiple Testing Correction ──
    print("\n--- Multiple Testing Correction ---")
    raw_p_values = [
        test1["p_value"],
        test2["p_value"],
        test3["p_value"],
        test4["p_value"],
    ]
    test_names = [
        "TNF dogs > 50%",
        "TNF vs Sunday dogs",
        "Chi-squared indep.",
        "TNF under > 50%",
    ]

    bonferroni_alpha = 0.05 / len(raw_p_values)
    print(f"  Bonferroni threshold: {bonferroni_alpha:.4f}")
    for name, p in zip(test_names, raw_p_values):
        sig = "YES" if p < bonferroni_alpha else "NO"
        print(f"    {name}: p = {p:.4f} -> Significant? {sig}")

    bh_rejected = benjamini_hochberg(raw_p_values, alpha=0.05)
    print(f"\n  Benjamini-Hochberg (FDR = 0.05):")
    for name, p, rej in zip(test_names, raw_p_values, bh_rejected):
        sig = "YES" if rej else "NO"
        print(f"    {name}: p = {p:.4f} -> Rejected? {sig}")

    # ── Visualizations ──
    plot_cover_rate_comparison(72 / 123, 758 / 1498, 123, 1498)

    # Simulated season-by-season data
    seasons = ["2016", "2017", "2018", "2019", "2020", "2021", "2022", "2023"]
    tnf_rates = [0.625, 0.563, 0.571, 0.600, 0.533, 0.563, 0.615, 0.600]
    sun_rates = [0.512, 0.498, 0.515, 0.508, 0.502, 0.495, 0.510, 0.507]
    plot_season_by_season(seasons, tnf_rates, sun_rates)

    plot_multiple_testing_results(test_names, raw_p_values, bonferroni_alpha,
                                  bh_rejected)

    # ── Power Analysis ──
    print("\n--- Power Analysis ---")
    # If true TNF underdog cover rate is 55%, what power do we have with n=123?
    true_p = 0.55
    se_null = np.sqrt(0.50 * 0.50 / 123)
    z_crit = 1.645  # one-sided alpha = 0.05
    critical_value = 0.50 + z_crit * se_null

    se_alt = np.sqrt(true_p * (1 - true_p) / 123)
    z_power = (critical_value - true_p) / se_alt
    power = 1 - stats.norm.cdf(z_power)
    print(f"  If true cover rate = {true_p:.0%}, power with n=123: {power:.3f}")

    true_p2 = 0.58
    se_alt2 = np.sqrt(true_p2 * (1 - true_p2) / 123)
    z_power2 = (critical_value - true_p2) / se_alt2
    power2 = 1 - stats.norm.cdf(z_power2)
    print(f"  If true cover rate = {true_p2:.0%}, power with n=123: {power2:.3f}")

    # Required sample size for 80% power at true rate 55%
    z_alpha = 1.645
    z_beta = 0.842
    p0, p1 = 0.50, 0.55
    n_req = ((z_alpha + z_beta) ** 2 * p0 * (1 - p0)) / (p1 - p0) ** 2
    print(f"  Required n for 80% power (true rate 55%): {n_req:.0f}")

    print("\n" + "=" * 65)
    print("  ANALYSIS COMPLETE")
    print("=" * 65)

Power Analysis: Could We Even Detect a Real Effect?

Before drawing conclusions, we must ask: if the Thursday night effect were real, would our data be sufficient to detect it?

Scenario: True TNF underdog cover rate = 55%

With n = 123 TNF games and a one-sided test at alpha = 0.05:

Critical value for rejection: p_crit = 0.50 + 1.645 * 0.04509 = 0.5742

Power = P(p_hat > 0.5742 | true p = 0.55)
      = P(Z > (0.5742 - 0.55) / sqrt(0.55 * 0.45 / 123))
      = P(Z > 0.540)
      = 0.295

We have only 29.5% power to detect a 55% true cover rate. This means that even if TNF underdogs truly covered at 55%, we would fail to detect it about 70% of the time with only 123 games.

Scenario: True TNF underdog cover rate = 58%

Power = P(p_hat > 0.5742 | true p = 0.58) = 0.552

Even for a fairly large 58% effect, our power is only 55.2%.

Required sample size for 80% power: To detect a 55% cover rate with 80% power, we would need approximately 620 TNF games, which represents about 40 NFL seasons of Thursday Night Football.
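The required-n figure above uses the null variance only; a minimal sketch comparing that simplified formula with the standard two-term version (which also uses the variance under the alternative):

```python
from scipy import stats

p0, p1 = 0.50, 0.55
z_alpha = stats.norm.ppf(0.95)  # one-sided alpha = 0.05 -> 1.645
z_beta = stats.norm.ppf(0.80)   # power = 0.80 -> 0.842

# Simplified formula: null variance only
n_simple = (z_alpha + z_beta) ** 2 * p0 * (1 - p0) / (p1 - p0) ** 2

# Standard formula: separate null and alternative standard deviations
numerator = z_alpha * (p0 * (1 - p0)) ** 0.5 + z_beta * (p1 * (1 - p1)) ** 0.5
n_full = numerator ** 2 / (p1 - p0) ** 2

print(f"simplified: n = {n_simple:.0f}, two-term: n = {n_full:.0f}")
```

For proportions this close to 0.5 the two formulas agree to within a handful of games, so the "roughly 40 seasons" conclusion is unaffected.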

This power analysis reveals a fundamental challenge: the small number of Thursday night games per season makes it nearly impossible to achieve definitive statistical conclusions about day-of-week effects.

Examining the Narrative

Beyond the raw numbers, let us consider the plausibility of the proposed mechanism. The argument for a TNF underdog effect rests on several assumptions:

1. Short-week preparation disadvantage. Teams playing on Thursday typically had their previous game on Sunday, giving them only 3 days to prepare instead of 6. However, both teams face this constraint equally unless one team is coming off a bye or Monday night game.

2. Travel disadvantage for the visiting team. The road team must travel on short rest, which could be more disruptive. This is a plausible mechanism that disproportionately affects the road team.

3. Market mispricing. Even if the above factors are real, the market must fail to fully account for them in the point spread. Modern NFL betting markets are highly efficient, with sharp bettors and sophisticated models on both sides.

4. Changing landscape. The NFL has made numerous scheduling changes over the years to reduce short-week disadvantages (e.g., avoiding teams on short rest after Monday night games). Any historical effect may have diminished.

Comprehensive Summary of Results

Test                       Statistic      P-value   Significant (raw)?   Significant (corrected)?
TNF underdog > 50%         z = 1.894      0.029     Yes                  No
TNF vs. Sunday underdogs   z = 1.693      0.045     Barely               No
Chi-squared independence   chi2 = 2.864   0.091     No                   No
TNF under > 50%            z = 1.353      0.088     No                   No

Conclusions

1. The raw numbers are suggestive but not definitive. TNF underdogs covered at 58.5% in our dataset, which is notable. However, after accounting for multiple testing, none of our results achieves statistical significance.

2. The sample size is too small for reliable conclusions. With only 128 Thursday night games over 8 seasons (roughly 16 per year), we lack the statistical power to detect realistic effect sizes. We would need decades of data to reach firm conclusions.

3. The practical significance is questionable even if the effect is real. Even if TNF underdogs truly cover at 55%, a bettor would face enormous variance game-to-game. At standard -110 pricing, a 55% cover rate earns an expected profit of about $5 per $100 bet, or roughly $80 across a 16-game season, while the standard deviation of a season's profit is close to $380; the expected edge sits well within a single standard deviation of pure chance.

4. The market may have already adapted. If this effect existed historically, the efficiency of modern betting markets means it has likely been priced in. Sharp bettors who identified this pattern would have bet it into the line.

5. Publication bias likely exaggerates the effect. Articles about the "Thursday Night Football effect" are more likely to be written and shared when the data happens to support the narrative. Seasons where TNF underdogs perform at 50% are not newsworthy.
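The variance point in conclusion 3 can be made concrete with a quick Monte Carlo sketch, assuming a true 55% cover rate and flat $100 bets at -110 (the seed and bet sizing are illustrative):

```python
import numpy as np

rng = np.random.default_rng(42)
n_bets, n_seasons = 16, 100_000
win, loss = 100 * (100 / 110), -100.0  # $100 to win ~$90.91 at -110 odds

# Simulate 100,000 sixteen-bet seasons at a true 55% cover rate
covers = rng.random((n_seasons, n_bets)) < 0.55
season_profit = np.where(covers, win, loss).sum(axis=1)

print(f"mean season profit:  ${season_profit.mean():.0f}")
print(f"season profit stdev: ${season_profit.std():.0f}")
print(f"losing seasons:      {(season_profit < 0).mean():.0%}")
```

Under these assumptions a bettor with a genuine 55% edge still loses money in roughly four seasons out of ten, which is why a few seasons of results cannot confirm or refute the effect.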

Final Verdict

The Thursday Night Football underdog effect, as commonly described, does not survive rigorous statistical testing. While the raw data shows a suggestive trend, the combination of small sample sizes, multiple testing issues, and low statistical power means we cannot distinguish a real effect from random variation.

This case study illustrates a broader lesson for sports bettors: compelling narratives and suggestive data do not constitute statistical evidence. The threshold of proof should be high, especially for effects that require many decades of data to evaluate properly. The burden of evidence falls on the claimant, and in this case, the evidence falls short.


End of Case Study 2