Prerequisites

"The beginning of wisdom is the definition of terms." — Socrates


This chapter outlines what you should know before starting the textbook and provides self-assessment exercises to evaluate your readiness. If you find gaps, we provide resources and review material to help you prepare.


Required Background

Mathematics and Statistics

You should be comfortable with the following concepts:

Descriptive Statistics

  • Measures of central tendency: mean, median, mode
  • Measures of spread: variance, standard deviation, range
  • Percentiles and quartiles
  • Basic data visualization: histograms, scatter plots, box plots
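As a quick illustration of the measures listed above, here is a minimal sketch using Python's standard `statistics` module. The sample values are invented for this example and do not come from real match data.

```python
import statistics

# Hypothetical goals-per-match sample, invented for illustration.
goals = [0, 1, 1, 2, 2, 2, 3, 5]

mean_goals = statistics.mean(goals)        # arithmetic average
median_goals = statistics.median(goals)    # middle of the sorted values
mode_goals = statistics.mode(goals)        # most frequent value
spread = statistics.pstdev(goals)          # population standard deviation
goal_range = max(goals) - min(goals)       # largest minus smallest value
q1, q2, q3 = statistics.quantiles(goals, n=4)  # quartile cut points

print(mean_goals, median_goals, mode_goals, goal_range)
print(round(spread, 3), q1, q2, q3)
```

If each of these lines makes sense to you without looking anything up, your descriptive statistics background is likely sufficient.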

Probability Fundamentals

  • Basic probability rules: addition, multiplication
  • Conditional probability: $P(A|B)$
  • Independence and dependence
  • Expected value and variance
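The following sketch shows how conditional probability and independence can be checked from simple counts. The match counts are hypothetical, chosen only to make the arithmetic easy to follow.

```python
from fractions import Fraction

# Hypothetical counts from 100 matches, invented for illustration.
total_matches = 100
won = 50                     # event A: the team won
scored_first = 60            # event B: the team scored first
won_and_scored_first = 45    # both A and B happened

p_a = Fraction(won, total_matches)
p_b = Fraction(scored_first, total_matches)
p_a_and_b = Fraction(won_and_scored_first, total_matches)

# Conditional probability: P(A|B) = P(A and B) / P(B)
p_a_given_b = p_a_and_b / p_b
print(p_a_given_b)  # 3/4

# A and B are independent only if P(A and B) equals P(A) * P(B).
independent = p_a_and_b == p_a * p_b
print(independent)  # False, because 9/20 differs from 3/10
```

Here winning is more likely once the team has scored first (3/4 versus 1/2 overall), so the two events are dependent.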

Distributions

  • Normal (Gaussian) distribution
  • Binomial and Poisson distributions (basic awareness)
  • Understanding of probability density functions
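For a sense of what "basic awareness" of the Poisson distribution looks like in practice, the sketch below evaluates its probability mass function directly from the formula $P(X = k) = \lambda^k e^{-\lambda} / k!$. The rate of 1.5 goals per match is an assumed value for illustration only.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for a Poisson distribution with mean lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


# Hypothetical scoring rate of 1.5 goals per match, chosen for illustration.
rate = 1.5
for k in range(4):
    print(k, round(poisson_pmf(k, rate), 3))
```

The probabilities over all possible values of k add up to 1, which is a useful sanity check for any probability mass function.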

Statistical Inference (Basic Awareness)

  • Hypothesis testing concepts (null/alternative hypotheses)
  • p-values and significance (conceptual understanding)
  • Confidence intervals (what they represent)
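To make the idea of a confidence interval concrete, here is a rough sketch of a 95% interval for a mean using the normal approximation. The data are invented, and for a sample this small a t-based interval would usually be preferred; the point is only to show the structure "estimate plus or minus a margin of error".

```python
import math
import statistics

# Hypothetical shots-per-match sample, invented for illustration.
shots = [12, 9, 15, 11, 8, 14, 10, 13, 12, 16, 9, 11]

n = len(shots)
sample_mean = statistics.mean(shots)
standard_error = statistics.stdev(shots) / math.sqrt(n)

# z-value for a two-sided 95% interval (about 1.96).
z = statistics.NormalDist().inv_cdf(0.975)

lower = sample_mean - z * standard_error
upper = sample_mean + z * standard_error
print(f"95% CI for the mean: ({lower:.2f}, {upper:.2f})")
```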

Regression (Helpful but Not Required)

  • Chapters 3 and 7 will teach regression in depth
  • Prior exposure helpful but not assumed

Self-Assessment: Can you answer these questions?

  1. What is the difference between mean and median?
  2. If a coin is flipped 10 times, what is the probability of exactly 7 heads?
  3. What does it mean for two events to be independent?
  4. What is a 95% confidence interval?

If you struggled, review a basic statistics textbook or course before or alongside Part I.
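If you want to check your answer to the coin-flip question, the sketch below computes it with the binomial formula $\binom{n}{k} p^k (1-p)^{n-k}$. It assumes a fair coin (p = 0.5), which the question implies but does not state.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


# Assumes a fair coin: p = 0.5.
prob_seven_heads = binomial_pmf(7, 10, 0.5)
print(round(prob_seven_heads, 4))  # 0.1172
```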


Python Programming

You should be comfortable with:

Basic Syntax and Data Types

# Variables and basic operations
x = 5
y = 3.14
name = "player"
is_active = True

# Lists
players = ["Messi", "Ronaldo", "Mbappé"]
players.append("Haaland")
first_player = players[0]

# Dictionaries
stats = {"goals": 30, "assists": 15}
stats["shots"] = 100

Control Flow

# Conditionals
goals = 15  # example value so the snippet runs on its own
if goals > 20:
    print("Top scorer")
elif goals > 10:
    print("Good season")
else:
    print("Room for improvement")

# Loops
for player in players:
    print(player)

for i in range(10):
    print(i)

# List comprehensions
team_data = [{"name": "A", "goals": 10}, {"name": "B", "goals": 15}]  # example data
goal_totals = [player["goals"] for player in team_data]

Functions

def calculate_goal_rate(goals: int, minutes: int) -> float:
    """Calculate goals per 90 minutes."""
    if minutes == 0:
        return 0.0
    return (goals / minutes) * 90

# Using the function
rate = calculate_goal_rate(15, 2500)

Basic File Operations

# Reading files
with open("data.txt", "r") as f:
    content = f.read()

# Working with CSV (basic awareness)
import csv
with open("players.csv", "r", newline="") as f:  # newline="" is recommended by the csv docs
    reader = csv.reader(f)
    for row in reader:
        print(row)

Object-Oriented Basics (Helpful)

class Player:
    def __init__(self, name: str, position: str):
        self.name = name
        self.position = position

    def __repr__(self):
        return f"Player({self.name}, {self.position})"

Self-Assessment: Can you write code to:

  1. Create a list of numbers and find the maximum?
  2. Write a function that takes two parameters and returns their sum?
  3. Loop through a dictionary and print each key-value pair?
  4. Handle a potential division by zero error?

If you struggled, complete a Python basics tutorial before starting. Recommended: Python official tutorial or "Automate the Boring Stuff with Python."
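For reference, here is one possible approach to the last two self-assessment items. The function name `safe_divide` is made up for this example, and returning `None` is only one of several reasonable ways to signal "no result".

```python
from typing import Optional


def safe_divide(numerator: float, denominator: float) -> Optional[float]:
    """Return numerator / denominator, or None if the denominator is zero."""
    try:
        return numerator / denominator
    except ZeroDivisionError:
        # Dividing by zero raises ZeroDivisionError; report "no result" instead.
        return None


print(safe_divide(15, 2500))  # 0.006
print(safe_divide(15, 0))     # None

# Looping over a dictionary's key-value pairs.
stats = {"goals": 30, "assists": 15}
for key, value in stats.items():
    print(f"{key}: {value}")
```

An explicit `if denominator == 0` check, as in `calculate_goal_rate` above, works just as well; `try`/`except` is shown here because it is the general mechanism for handling errors.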


Data Manipulation (Helpful but Not Required)

Prior exposure to pandas is helpful but not required. Chapter 4 provides a comprehensive review.

Basic familiarity with these operations is beneficial:

import pandas as pd

# Creating a DataFrame
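df = pd.DataFrame({
    "player": ["A", "B", "C"],
    "team": ["Reds", "Blues", "Reds"],
    "goals": [10, 15, 8]
})

# Basic operations
df["goals"].mean()                   # average goals per player
df[df["goals"] > 10]                 # rows where goals exceed 10
df.groupby("team")["goals"].sum()    # total goals per team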

If you have never used pandas, don't worry—Chapter 4 covers everything you need.


Soccer Knowledge

You should understand:

Basic Rules

  • Match duration (90 minutes + stoppage time)
  • Scoring and offside
  • Fouls, free kicks, penalties
  • Yellow and red cards

Positions and Formations

  • Goalkeeper, defenders, midfielders, forwards
  • Common formations (4-3-3, 4-4-2, 3-5-2)
  • What different positions are responsible for

Competition Structure

  • League formats (points for win/draw/loss)
  • Cup competitions (knockout format)
  • Major leagues (Premier League, La Liga, Bundesliga, Serie A, Ligue 1)
  • European competitions (Champions League, Europa League)
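League points are simple to compute once results are coded as wins, draws, and losses. The sketch below assumes the 3-1-0 scheme used by most modern leagues; the result string is invented for illustration.

```python
# Points per result under the common 3-1-0 scheme (an assumption;
# some competitions, past and present, use other values).
POINTS = {"W": 3, "D": 1, "L": 0}


def league_points(results: str) -> int:
    """Sum the points for a sequence of results such as "WWDL"."""
    return sum(POINTS[result] for result in results)


# Hypothetical run of seven matches.
print(league_points("WWDLWDW"))  # 14
```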

Basic Tactical Concepts

  • Possession vs. counter-attacking styles
  • High press vs. low block defending
  • Build-up play through different thirds

Self-Assessment:

  1. What happens if a player is in an offside position when the ball is played?
  2. How many points does a team get for a win in most leagues?
  3. What is the difference between a 4-3-3 and a 4-4-2 formation?
  4. What does "pressing high" mean?

If you're new to soccer, watch several matches while paying attention to tactics. Resources like Tifo Football on YouTube provide excellent tactical explanations.


Recommended Resources

If you need to strengthen your background, the following resources are a good starting point:

Statistics

Textbooks:

  • OpenIntro Statistics (free online): comprehensive introduction
  • Statistics by Freedman, Pisani, and Purves: classic introduction
  • Naked Statistics by Charles Wheelan: accessible, non-technical

Online Courses:

  • Khan Academy Statistics and Probability (free)
  • Coursera: Statistics with Python Specialization
  • edX: Introduction to Probability and Statistics

Python

Interactive Tutorials:

  • Python.org official tutorial (free)
  • Codecademy Python course
  • DataCamp Introduction to Python

Books:

  • Python Crash Course by Eric Matthes
  • Automate the Boring Stuff with Python by Al Sweigart (free online)
  • Learning Python by Mark Lutz (comprehensive reference)

Data Science with Python

Tutorials:

  • pandas documentation tutorials
  • Real Python pandas tutorials
  • Kaggle Learn: Pandas course (free)

Books:

  • Python for Data Analysis by Wes McKinney (pandas creator)
  • Hands-On Machine Learning by Aurélien Géron (later chapters)

Soccer Tactics

YouTube Channels:

  • Tifo Football: tactical explanations
  • The Coaches' Voice: professional insights
  • HITC Sevens: tactical analysis

Books:

  • Inverting the Pyramid by Jonathan Wilson: history of tactics
  • The Mixer by Michael Cox: Premier League tactical evolution
  • Zonal Marking by Michael Cox: European tactical developments


Diagnostic Exercises

Complete these exercises to assess your readiness. Solutions are provided at the end.

Statistics Diagnostic

Problem 1: A soccer team scored the following number of goals in 10 matches: 2, 1, 0, 3, 2, 1, 4, 2, 1, 2.

  a) Calculate the mean number of goals
  b) Calculate the median number of goals
  c) Calculate the standard deviation

Problem 2: In a match, the probability that Team A scores is 0.6, and the probability that Team B scores is 0.4. Assuming independence, what is the probability that both teams score?

Problem 3: A striker takes 100 shots in a season. If each shot has a 15% probability of being a goal (independent of other shots), what is the expected number of goals?

Python Diagnostic

Problem 4: Write a function top_scorers(players, n) that takes a list of dictionaries (each with "name" and "goals" keys) and returns the names of the top n scorers.

players = [
    {"name": "Player A", "goals": 15},
    {"name": "Player B", "goals": 23},
    {"name": "Player C", "goals": 18},
    {"name": "Player D", "goals": 12}
]

# Should return ["Player B", "Player C"] for n=2

Problem 5: Write code that reads a CSV file called "matches.csv" with columns "home_team", "away_team", "home_goals", "away_goals" and prints how many matches ended in a draw.

Soccer Diagnostic

Problem 6: Explain in 2-3 sentences why a team might choose to play with a "high press" and what risks this involves.

Problem 7: A striker has 20 goals from 150 shots this season. Express their conversion rate as a percentage. How does this compare to a typical elite striker?


Diagnostic Solutions

Statistics Solutions

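Problem 1:

  a) Mean = (2+1+0+3+2+1+4+2+1+2)/10 = 18/10 = 1.8 goals
  b) Sorted: 0,1,1,1,2,2,2,2,3,4. Median = (2+2)/2 = 2 goals
  c) Variance = Σ(x-μ)²/n = 11.6/10 = 1.16, Standard deviation = √1.16 ≈ 1.08 goals. This is the population formula; dividing by n-1 instead gives a sample standard deviation of about 1.14.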
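You can verify the Problem 1 answers with Python's standard `statistics` module, as in this short sketch. Note that the module distinguishes the population functions (`pvariance`, `pstdev`, which divide by n) from the sample functions (`variance`, `stdev`, which divide by n-1).

```python
import statistics

goals = [2, 1, 0, 3, 2, 1, 4, 2, 1, 2]

mean_goals = statistics.mean(goals)
median_goals = statistics.median(goals)
population_variance = statistics.pvariance(goals)  # divides by n
population_sd = statistics.pstdev(goals)           # square root of the above
sample_sd = statistics.stdev(goals)                # divides by n - 1

print(mean_goals, median_goals)  # 1.8 2.0
print(round(population_variance, 2), round(population_sd, 2), round(sample_sd, 2))
# 1.16 1.08 1.14
```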

Problem 2: P(both score) = P(A scores) × P(B scores) = 0.6 × 0.4 = 0.24 or 24%

Problem 3: Expected goals = n × p = 100 × 0.15 = 15 goals

Python Solutions

Problem 4:

def top_scorers(players: list, n: int) -> list:
    """Return names of top n scorers."""
    sorted_players = sorted(players, key=lambda x: x["goals"], reverse=True)
    return [p["name"] for p in sorted_players[:n]]
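An alternative sketch uses `heapq.nlargest` from the standard library, which avoids sorting the whole list when only a few top entries are needed. The name `top_scorers_heap` is introduced here for illustration; either approach is an acceptable answer.

```python
import heapq


def top_scorers_heap(players: list, n: int) -> list:
    """Return names of the n highest scorers, highest first."""
    best = heapq.nlargest(n, players, key=lambda p: p["goals"])
    return [p["name"] for p in best]


players = [
    {"name": "Player A", "goals": 15},
    {"name": "Player B", "goals": 23},
    {"name": "Player C", "goals": 18},
    {"name": "Player D", "goals": 12},
]

print(top_scorers_heap(players, 2))  # ['Player B', 'Player C']
```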

Problem 5:

import pandas as pd

df = pd.read_csv("matches.csv")
draws = df[df["home_goals"] == df["away_goals"]]
print(f"Number of draws: {len(draws)}")

# Alternative without pandas:
import csv
draws = 0
with open("matches.csv", "r", newline="") as f:  # newline="" is recommended by the csv docs
    reader = csv.DictReader(f)
    for row in reader:
        if int(row["home_goals"]) == int(row["away_goals"]):
            draws += 1
print(f"Number of draws: {draws}")
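If you do not have a "matches.csv" file at hand, you can test the same logic against an in-memory file. The sketch below wraps the counting step in a hypothetical helper, `count_draws`, and feeds it invented sample data through `io.StringIO`.

```python
import csv
import io

# Hypothetical file contents, invented for illustration.
sample_csv = """home_team,away_team,home_goals,away_goals
Reds,Blues,2,2
Greens,Reds,0,1
Blues,Greens,1,1
"""


def count_draws(file_obj) -> int:
    """Count rows where home and away goals are equal."""
    reader = csv.DictReader(file_obj)
    return sum(
        1 for row in reader
        if int(row["home_goals"]) == int(row["away_goals"])
    )


print(count_draws(io.StringIO(sample_csv)))  # 2
```

Because `count_draws` accepts any file-like object, the same function also works with a real file opened via `open("matches.csv", newline="")`.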

Soccer Solutions

Problem 6: A high press aims to win the ball back quickly after losing possession, often in dangerous areas near the opponent's goal. This creates turnover opportunities but leaves space behind the defensive line that opponents can exploit with quick passes or long balls.

Problem 7: Conversion rate = 20/150 = 13.3%. This is slightly below average for elite strikers, who typically convert 15-20% of their shots. However, the quality of shots matters—a striker taking more difficult chances might have a lower conversion rate but still be highly effective.
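The conversion-rate arithmetic can be written as a small function. This is a minimal sketch; the name `conversion_rate` and the choice to return 0.0 when no shots were taken are assumptions made for the example.

```python
def conversion_rate(goals: int, shots: int) -> float:
    """Goals as a percentage of shots; 0.0 when no shots were taken."""
    if shots == 0:
        return 0.0
    return goals / shots * 100


print(round(conversion_rate(20, 150), 1))  # 13.3
```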


Readiness Assessment

Score yourself:

  • Problems 1-3 (Statistics): 3 points each
  • Problems 4-5 (Python): 4 points each
  • Problems 6-7 (Soccer): 2 points each

Total: 21 points

Score    Readiness
18-21    Ready to begin
14-17    Ready, but review weak areas as you go
10-13    Some preparation recommended
<10      Significant preparation needed

If you scored below 14, spend 1-2 weeks reviewing the recommended resources before starting Chapter 1. That preparation will pay off throughout the book.
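As a final bit of Python practice, the scoring bands above can be expressed as a function. This is only a sketch of the lookup; the function name `readiness` is made up for the example.

```python
def readiness(score: int) -> str:
    """Map a diagnostic score (0-21) to its readiness band."""
    if not 0 <= score <= 21:
        raise ValueError("score must be between 0 and 21")
    if score >= 18:
        return "Ready to begin"
    if score >= 14:
        return "Ready, but review weak areas as you go"
    if score >= 10:
        return "Some preparation recommended"
    return "Significant preparation needed"


# Maximum possible score: three 3-point, two 4-point, and two 2-point problems.
max_score = 3 * 3 + 2 * 4 + 2 * 2
print(max_score, readiness(max_score))  # 21 Ready to begin
```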


Ready? Turn to Chapter 1: Introduction to Soccer Analytics.