Prerequisites
"The beginning of wisdom is the definition of terms." — Socrates
This chapter outlines what you should know before starting the textbook and provides self-assessment exercises to evaluate your readiness. If you find gaps, we provide resources and review material to help you prepare.
Required Background
Mathematics and Statistics
You should be comfortable with the following concepts:
Descriptive Statistics
- Measures of central tendency: mean, median, mode
- Measures of spread: variance, standard deviation, range
- Percentiles and quartiles
- Basic data visualization: histograms, scatter plots, box plots
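As a quick refresher, the measures listed above can be computed with Python's standard `statistics` module. This is a minimal sketch; the `goals` list is made-up data used only for illustration.

```python
import statistics

# Hypothetical goals scored by one team across eight matches.
goals = [0, 1, 1, 2, 2, 2, 3, 5]

mean_goals = statistics.mean(goals)           # arithmetic average
median_goals = statistics.median(goals)       # middle value of the sorted data
mode_goals = statistics.mode(goals)           # most frequent value
variance = statistics.pvariance(goals)        # population variance (divide by n)
std_dev = statistics.pstdev(goals)            # population standard deviation
value_range = max(goals) - min(goals)         # distance between the extremes
quartiles = statistics.quantiles(goals, n=4)  # three cut points: Q1, Q2, Q3

print(mean_goals, median_goals, mode_goals)
print(variance, round(std_dev, 2), value_range)
print(quartiles)
```

Note that `pvariance` and `pstdev` treat the data as a full population; `variance` and `stdev` use the sample formulas, which divide by n - 1.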
Probability Fundamentals
- Basic probability rules: addition, multiplication
- Conditional probability: $P(A|B)$
- Independence and dependence
- Expected value and variance
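To make conditional probability and independence concrete, here is a small sketch that works from a table of counts. The match counts and the helper `prob` are hypothetical, chosen only so the arithmetic is easy to follow.

```python
from fractions import Fraction

# Hypothetical counts from 40 matches: (scored_first, won) -> number of matches.
counts = {
    (True, True): 18,
    (True, False): 6,
    (False, True): 4,
    (False, False): 12,
}
total = sum(counts.values())


def prob(predicate) -> Fraction:
    """Probability that a randomly chosen match satisfies the predicate."""
    favourable = sum(n for outcome, n in counts.items() if predicate(outcome))
    return Fraction(favourable, total)


p_scored_first = prob(lambda o: o[0])    # P(B)
p_won = prob(lambda o: o[1])             # P(A)
p_both = prob(lambda o: o[0] and o[1])   # P(A and B)

# Conditional probability: P(A | B) = P(A and B) / P(B)
p_won_given_scored_first = p_both / p_scored_first

# Independence would require P(A and B) == P(A) * P(B).
independent = p_both == p_won * p_scored_first

print(p_won_given_scored_first, independent)
```

In this made-up data, winning is more likely given that the team scored first than it is overall, so the two events are dependent.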
Distributions
- Normal (Gaussian) distribution
- Binomial and Poisson distributions (basic awareness)
- Understanding of probability density functions
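If the binomial and Poisson distributions are only vaguely familiar, the sketch below shows their probability mass functions written out directly. The function names and the example numbers are illustrative assumptions, not part of any library.

```python
import math


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p): exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam): k events when lam are expected on average."""
    return math.exp(-lam) * lam**k / math.factorial(k)


# Hypothetical example: 5 penalties, each converted with probability 0.8.
p_exactly_four = binomial_pmf(4, 5, 0.8)

# Hypothetical example: a team averaging 1.5 goals per match scores exactly 2.
p_two_goals = poisson_pmf(2, 1.5)

print(round(p_exactly_four, 4), round(p_two_goals, 4))
```

Both functions return probabilities that sum to 1 over all possible values of `k`, which is a useful sanity check when you write your own.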
Statistical Inference (Basic Awareness)
- Hypothesis testing concepts (null/alternative hypotheses)
- p-values and significance (conceptual understanding)
- Confidence intervals (what they represent)
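For readers who want a concrete picture of a confidence interval, the following sketch builds an approximate 95% interval for a mean using a normal approximation. The data are invented, and with a sample this small a t-based critical value would give a slightly wider interval.

```python
import math
import statistics

# Hypothetical sample: goals per match for one team over 30 matches.
goals = [1, 2, 0, 3, 1, 1, 2, 4, 0, 1, 2, 2, 1, 0, 3,
         1, 2, 1, 0, 2, 3, 1, 1, 2, 0, 1, 2, 1, 3, 2]

n = len(goals)
sample_mean = statistics.mean(goals)
sample_sd = statistics.stdev(goals)          # sample standard deviation (n - 1)
standard_error = sample_sd / math.sqrt(n)

# Critical value for a two-sided 95% interval under a normal approximation.
z = statistics.NormalDist().inv_cdf(0.975)   # roughly 1.96

lower = sample_mean - z * standard_error
upper = sample_mean + z * standard_error
print(f"95% CI for the mean: ({lower:.2f}, {upper:.2f})")
```

The interpretation to keep in mind: if the sampling were repeated many times, about 95% of intervals built this way would contain the true mean.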
Regression (Helpful but Not Required)
- Chapters 3 and 7 will teach regression in depth
- Prior exposure helpful but not assumed
Self-Assessment: Can you answer these questions?
1. What is the difference between mean and median?
2. If a fair coin is flipped 10 times, what is the probability of exactly 7 heads?
3. What does it mean for two events to be independent?
4. What is a 95% confidence interval?
If you struggled, review a basic statistics textbook or course before or alongside Part I.
Python Programming
You should be comfortable with:
Basic Syntax and Data Types
# Variables and basic operations
x = 5
y = 3.14
name = "player"
is_active = True
# Lists
players = ["Messi", "Ronaldo", "Mbappé"]
players.append("Haaland")
first_player = players[0]
# Dictionaries
stats = {"goals": 30, "assists": 15}
stats["shots"] = 100
Control Flow
# Conditionals
goals = 25  # example value
if goals > 20:
    print("Top scorer")
elif goals > 10:
    print("Good season")
else:
    print("Room for improvement")
# Loops
for player in players:
    print(player)
for i in range(10):
    print(i)
# List comprehensions
team_data = [{"name": "A", "goals": 10}, {"name": "B", "goals": 15}]  # example data
goal_totals = [player["goals"] for player in team_data]
Functions
def calculate_goal_rate(goals: int, minutes: int) -> float:
    """Calculate goals per 90 minutes."""
    if minutes == 0:
        return 0.0
    return (goals / minutes) * 90
# Using the function
rate = calculate_goal_rate(15, 2500)
Basic File Operations
# Reading files
with open("data.txt", "r") as f:
    content = f.read()
# Working with CSV (basic awareness)
import csv
# newline="" lets the csv module handle line endings itself
with open("players.csv", "r", newline="") as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)
Object-Oriented Basics (Helpful)
class Player:
    def __init__(self, name: str, position: str):
        self.name = name
        self.position = position
    def __repr__(self):
        return f"Player({self.name}, {self.position})"
Self-Assessment: Can you write code to:
1. Create a list of numbers and find the maximum?
2. Write a function that takes two parameters and returns their sum?
3. Loop through a dictionary and print each key-value pair?
4. Handle a potential division by zero error?
If you struggled, complete a Python basics tutorial before starting. Recommended: the official Python tutorial or "Automate the Boring Stuff with Python."
Data Manipulation (Helpful; Covered in Chapter 4)
Prior exposure to pandas is helpful but not required. Chapter 4 provides a comprehensive review.
Basic familiarity with these operations is beneficial:
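import pandas as pd
# Creating a DataFrame
df = pd.DataFrame({
    "player": ["A", "B", "C"],
    "team": ["X", "X", "Y"],
    "goals": [10, 15, 8]
})
# Basic operations
df["goals"].mean()                  # average goals per player
df[df["goals"] > 10]                # rows where goals exceed 10
df.groupby("team")["goals"].sum()   # total goals per team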
If you have never used pandas, don't worry—Chapter 4 covers everything you need.
Soccer Knowledge
You should understand:
Basic Rules
- Match duration (90 minutes + stoppage time)
- Scoring and offside
- Fouls, free kicks, penalties
- Yellow and red cards
Positions and Formations
- Goalkeeper, defenders, midfielders, forwards
- Common formations (4-3-3, 4-4-2, 3-5-2)
- What different positions are responsible for
Competition Structure
- League formats (points for win/draw/loss)
- Cup competitions (knockout format)
- Major leagues (Premier League, La Liga, Bundesliga, Serie A, Ligue 1)
- European competitions (Champions League, Europa League)
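The league points format mentioned above reduces to simple arithmetic. As a minimal sketch, assuming the common 3-1-0 system (three points for a win, one for a draw, none for a loss), the two hypothetical helpers below compute points for a single match and for a full record.

```python
def match_points(goals_for: int, goals_against: int) -> int:
    """Points earned from one match under the common 3-1-0 system."""
    if goals_for > goals_against:
        return 3  # win
    if goals_for == goals_against:
        return 1  # draw
    return 0      # loss


def league_points(wins: int, draws: int, losses: int) -> int:
    """Total points for a season record; losses contribute nothing."""
    return 3 * wins + 1 * draws + 0 * losses


# Hypothetical season record: 20 wins, 10 draws, 8 losses.
print(league_points(20, 10, 8))
```

Cup competitions do not use points at all in their knockout rounds, so these helpers apply only to league tables.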
Basic Tactical Concepts
- Possession vs. counter-attacking styles
- High press vs. low block defending
- Build-up play through different thirds
Self-Assessment:
1. What happens if a player is in an offside position when the ball is played?
2. How many points does a team get for a win in most leagues?
3. What is the difference between a 4-3-3 and a 4-4-2 formation?
4. What does "pressing high" mean?
If you're new to soccer, watch several matches while paying attention to tactics. Resources like Tifo Football on YouTube provide excellent tactical explanations.
Recommended Preparation
If you need to strengthen your background, here are recommended resources:
Statistics
Textbooks:
- OpenIntro Statistics (free online): comprehensive introduction
- Statistics by Freedman, Pisani, and Purves: classic introduction
- Naked Statistics by Charles Wheelan: accessible, non-technical
Online Courses:
- Khan Academy Statistics and Probability (free)
- Coursera: Statistics with Python Specialization
- edX: Introduction to Probability and Statistics
Python
Interactive Tutorials:
- Python.org official tutorial (free)
- Codecademy Python course
- DataCamp Introduction to Python
Books:
- Python Crash Course by Eric Matthes
- Automate the Boring Stuff with Python by Al Sweigart (free online)
- Learning Python by Mark Lutz (comprehensive reference)
Data Science with Python
Tutorials:
- pandas documentation tutorials
- Real Python pandas tutorials
- Kaggle Learn: Pandas course (free)
Books:
- Python for Data Analysis by Wes McKinney (pandas creator)
- Hands-On Machine Learning by Aurélien Géron (later chapters)
Soccer Tactics
YouTube Channels:
- Tifo Football: tactical explanations
- The Coaches' Voice: professional insights
- HITC Sevens: tactical analysis
Books:
- Inverting the Pyramid by Jonathan Wilson: history of tactics
- The Mixer by Michael Cox: Premier League tactical evolution
- Zonal Marking by Michael Cox: European tactical developments
Diagnostic Exercises
Complete these exercises to assess your readiness. Solutions follow the exercises.
Statistics Diagnostic
Problem 1: A soccer team scored the following number of goals in 10 matches: 2, 1, 0, 3, 2, 1, 4, 2, 1, 2.
- a) Calculate the mean number of goals
- b) Calculate the median number of goals
- c) Calculate the standard deviation (treat the 10 matches as the whole population)
Problem 2: In a match, the probability that Team A scores is 0.6, and the probability that Team B scores is 0.4. Assuming independence, what is the probability that both teams score?
Problem 3: A striker takes 100 shots in a season. If each shot has a 15% probability of being a goal (independent of other shots), what is the expected number of goals?
Python Diagnostic
Problem 4: Write a function top_scorers(players, n) that takes a list of dictionaries (each with "name" and "goals" keys) and returns the names of the top n scorers.
players = [
    {"name": "Player A", "goals": 15},
    {"name": "Player B", "goals": 23},
    {"name": "Player C", "goals": 18},
    {"name": "Player D", "goals": 12}
]
# Should return ["Player B", "Player C"] for n=2
Problem 5: Write code that reads a CSV file called "matches.csv" with columns "home_team", "away_team", "home_goals", "away_goals" and prints how many matches ended in a draw.
Soccer Diagnostic
Problem 6: Explain in 2-3 sentences why a team might choose to play with a "high press" and what risks this involves.
Problem 7: A striker has 20 goals from 150 shots this season. Express their conversion rate as a percentage. How does this compare to a typical elite striker?
Diagnostic Solutions
Statistics Solutions
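Problem 1:
- a) Mean = (2+1+0+3+2+1+4+2+1+2)/10 = 18/10 = 1.8 goals
- b) Sorted: 0, 1, 1, 1, 2, 2, 2, 2, 3, 4. Median = (2+2)/2 = 2 goals
- c) Population variance = Σ(x-μ)²/n = 11.6/10 = 1.16, so standard deviation = √1.16 ≈ 1.08 goals. Dividing by n-1 instead (the sample formula) gives √(11.6/9) ≈ 1.14 goals.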
Problem 2: P(both score) = P(A scores) × P(B scores) = 0.6 × 0.4 = 0.24 or 24%
Problem 3: Expected goals = n × p = 100 × 0.15 = 15 goals
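If you would like to check the statistics answers by machine, the sketch below reproduces them with the standard library. It computes both the population and the sample standard deviation for Problem 1, since the two formulas give different results on the same data.

```python
import statistics

# Goals from Problem 1.
goals = [2, 1, 0, 3, 2, 1, 4, 2, 1, 2]

mean_goals = statistics.mean(goals)
median_goals = statistics.median(goals)
population_sd = statistics.pstdev(goals)  # divides the squared deviations by n
sample_sd = statistics.stdev(goals)       # divides the squared deviations by n - 1

# Problem 2: independent events multiply.
p_both_score = 0.6 * 0.4

# Problem 3: expected value of a binomial count is n * p.
expected_goals = 100 * 0.15

print(mean_goals, median_goals)
print(round(population_sd, 2), round(sample_sd, 2))
print(round(p_both_score, 2), round(expected_goals, 1))
```

Which standard deviation is appropriate depends on whether the 10 matches are treated as the whole population of interest or as a sample from a longer season.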
Python Solutions
Problem 4:
def top_scorers(players: list, n: int) -> list:
    """Return names of top n scorers."""
    sorted_players = sorted(players, key=lambda x: x["goals"], reverse=True)
    return [p["name"] for p in sorted_players[:n]]
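Sorting the whole list is perfectly fine here. As an alternative sketch, the standard library's `heapq.nlargest` selects the top n entries without a full sort, which can matter for long lists. The name `top_scorers_heap` is just a label for this variant.

```python
import heapq


def top_scorers_heap(players: list[dict], n: int) -> list[str]:
    """Names of the n highest scorers, highest first, without sorting everything."""
    best = heapq.nlargest(n, players, key=lambda p: p["goals"])
    return [p["name"] for p in best]


players = [
    {"name": "Player A", "goals": 15},
    {"name": "Player B", "goals": 23},
    {"name": "Player C", "goals": 18},
    {"name": "Player D", "goals": 12},
]
print(top_scorers_heap(players, 2))
```

Either version is an acceptable answer to Problem 4; both return fewer than n names when the list is shorter than n.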
Problem 5:
import pandas as pd
df = pd.read_csv("matches.csv")
draws = df[df["home_goals"] == df["away_goals"]]
print(f"Number of draws: {len(draws)}")
# Alternative without pandas:
import csv
draws = 0
# newline="" lets the csv module handle line endings itself
with open("matches.csv", "r", newline="") as f:
    reader = csv.DictReader(f)
    for row in reader:
        # csv yields strings, so convert before comparing
        if int(row["home_goals"]) == int(row["away_goals"]):
            draws += 1
print(f"Number of draws: {draws}")
Soccer Solutions
Problem 6: A high press aims to win the ball back quickly after losing possession, often in dangerous areas near the opponent's goal. This creates turnover opportunities but leaves space behind the defensive line that opponents can exploit with quick passes or long balls.
Problem 7: Conversion rate = 20/150 ≈ 13.3%. This is slightly below the 15-20% that elite strikers typically convert. Shot quality matters, though: a striker who takes more difficult chances might have a lower conversion rate but still be highly effective.
Readiness Assessment
Score yourself:
- Problems 1-3 (Statistics): 3 points each
- Problems 4-5 (Python): 4 points each
- Problems 6-7 (Soccer): 2 points each
Total: 21 points
| Score | Readiness |
|---|---|
| 18-21 | Ready to begin |
| 14-17 | Ready, but review weak areas as you go |
| 10-13 | Some preparation recommended |
| <10 | Significant preparation needed |
If you scored below 14, spend 1-2 weeks reviewing the recommended resources before starting Chapter 1. The investment will pay dividends throughout your learning journey.
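If you prefer to tally your result in code, the scoring scheme and readiness table can be sketched as a small function. The names `POINTS` and `readiness` are hypothetical, and the sketch assumes all-or-nothing credit per problem.

```python
# Points per diagnostic problem, taken from the scoring scheme above.
POINTS = {1: 3, 2: 3, 3: 3, 4: 4, 5: 4, 6: 2, 7: 2}


def readiness(solved: set[int]) -> tuple[int, str]:
    """Return (score, readiness band) for the set of problems answered correctly."""
    score = sum(POINTS[problem] for problem in solved)
    if score >= 18:
        band = "Ready to begin"
    elif score >= 14:
        band = "Ready, but review weak areas as you go"
    elif score >= 10:
        band = "Some preparation recommended"
    else:
        band = "Significant preparation needed"
    return score, band


# Example: statistics and soccer problems correct, both Python problems missed.
print(readiness({1, 2, 3, 6, 7}))
```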
Ready? Turn to Chapter 1: Introduction to Soccer Analytics.