Self-Assessment Quiz: Python for Sports Analytics

Test your understanding of pandas and NumPy fundamentals for football analytics. Select the best answer for each question.

Section 1: DataFrame Fundamentals (Questions 1-7)

Question 1

What is the difference between a pandas DataFrame and a Series?

A) A DataFrame can only hold numeric data; a Series can hold any type
B) A DataFrame is 2-dimensional (rows and columns); a Series is 1-dimensional
C) A Series is faster for calculations; DataFrames are for display only
D) There is no difference; they are interchangeable

Question 2

Given this code:

import pandas as pd
games = pd.DataFrame({
    "team": ["Alabama", "Georgia", "Texas"],
    "wins": [12, 13, 10],
    "losses": [2, 1, 4]
})
result = games["wins"]

What is the type of result?

A) pandas.DataFrame
B) pandas.Series
C) numpy.ndarray
D) list

Question 3

Which expression selects the rows at positions 5 through 10 (inclusive) of a DataFrame?

A) df.loc[5:10]
B) df.iloc[5:11]
C) df[5:10]
D) df.select(5, 10)

Question 4

What is the output of df.shape for a DataFrame with 100 rows (one per game) and 12 columns?

A) 1200
B) (12, 100)
C) (100, 12)
D) [100, 12]

Question 5

To filter a DataFrame for games where the home team scored more than 35 points, which syntax is correct?

A) games[games.home_score > 35]
B) games.filter(home_score > 35)
C) games.where(home_score > 35)
D) games.select(games.home_score > 35)

Question 6

When combining multiple conditions in a pandas filter, which operators should you use?

A) and / or
B) && / ||
C) & / |
D) AND / OR

Question 7

What does df.head(3) return?

A) The first 3 columns
B) The first 3 rows
C) The last 3 rows
D) A random sample of 3 rows

Section 2: Data Manipulation (Questions 8-14)

Question 8

What is the correct way to create a new column called "total_points" that sums home_score and away_score?

A) games.total_points = games.home_score + games.away_score
B) games["total_points"] = games["home_score"] + games["away_score"]
C) games.add_column("total_points", sum(home_score, away_score))
D) games.new("total_points", home_score + away_score)

Question 9

Which groupby aggregation calculates the average home score for each week?

A) games.groupby("week").mean("home_score")
B) games.groupby("week")["home_score"].mean()
C) games.aggregate("week", "home_score", "mean")
D) games["home_score"].groupby("week").average()

Question 10

When called on a DataFrame, what does .apply(func, axis=1) do?

A) Applies the function to each column
B) Applies the function to each row
C) Applies the function to the entire DataFrame
D) Applies the function to the first column only

Question 11

To fill all missing values in a column with 0, which method would you use?

A) df["col"].replace_null(0)
B) df["col"].fillna(0)
C) df["col"].missing = 0
D) df["col"].nan_to(0)

Question 12

What does df.dropna(subset=["team", "score"]) do?

A) Drops the team and score columns
B) Drops rows where team OR score is missing
C) Drops rows where team AND score are missing
D) Drops rows where ALL columns have missing values

Question 13

Which is the most efficient way to create a column based on if-else logic?

A) Using a for loop to iterate through rows
B) Using .apply() with a lambda function
C) Using np.where() or np.select()
D) All methods have the same performance

Question 14

To rename a column from "pts" to "points", which is correct?

A) df.columns["pts"] = "points"
B) df.rename(columns={"pts": "points"})
C) df.column_rename("pts", "points")
D) df["points"] = df["pts"]

Section 3: Merging and Joining (Questions 15-19)

Question 15

What is the difference between an inner join and a left join?

A) Inner join keeps all rows from the left table; left join keeps only matching rows
B) Inner join keeps only matching rows; left join keeps all rows from the left table
C) There is no difference
D) Inner join is faster; left join is more accurate

Question 16

Given two DataFrames where the key column is named "team_id" in one and "id" in the other, which merge syntax is correct?

A) pd.merge(df1, df2, on=["team_id", "id"])
B) pd.merge(df1, df2, left_on="team_id", right_on="id")
C) df1.join(df2, "team_id" == "id")
D) pd.concat([df1, df2], on="team_id")

Question 17

In an outer join, what happens when a row in one table has no matching row in the other?

A) An error is raised
B) The row is excluded
C) Missing values (NaN) are filled in for the unmatched columns
D) The join fails silently

Question 18

To stack two DataFrames vertically (add rows), which function should you use?

A) pd.merge()
B) pd.concat()
C) pd.join()
D) pd.append()

Question 19

What does the how parameter of pd.merge() control?

A) It specifies which columns to keep in the result
B) It specifies the type of join (inner, left, right, outer)
C) It specifies the sorting order of the result
D) It specifies whether to keep duplicates

Section 4: NumPy Operations (Questions 20-25)

Question 20

What is the main advantage of NumPy arrays over Python lists for numerical operations?

A) NumPy arrays can store more data types
B) NumPy operations are vectorized and much faster
C) NumPy arrays use less syntax
D) Python lists cannot perform math operations

Question 21

What does np.mean(scores) return for scores = np.array([28, 35, 42, 21, 17])?

A) 28
B) 28.6
C) 35
D) 143

Question 22

Given two arrays of equal length, what does home_scores - away_scores produce?

A) A single number (the sum of differences)
B) An array of element-wise differences
C) An error (arrays can't be subtracted)
D) A 2D array combining both

Question 23

What does np.where(margin > 20, "Blowout", "Close") do?

A) Returns the indices where margin > 20
B) Returns an array of "Blowout" or "Close" based on the condition
C) Filters the array to only values > 20
D) Raises an error because np.where requires 1 argument

Question 24

What does arr.shape return for a 2D array with 10 rows and 5 columns?

A) (5, 10)
B) (10, 5)
C) 50
D) [10, 5]

Question 25

To calculate the Pearson correlation coefficient between two arrays, which function would you use?

A) np.correlation(a, b)
B) np.corrcoef(a, b)
C) np.correlate(a, b)
D) np.cor(a, b)

Answer Key

Question	Answer	Explanation
1	B	DataFrames are 2D (rows and columns), Series are 1D
2	B	Selecting a single column returns a Series
3	B	iloc slices by integer position and excludes the end, so positions 5-10 need iloc[5:11]
4	C	Shape returns (rows, columns)
5	A	Boolean indexing with condition in brackets
6	C	Use & for AND and | for OR, wrapping each condition in parentheses
7	B	head(n) returns the first n rows
8	B	Bracket assignment creates a column; attribute assignment (option A) does not
9	B	Select column with bracket, then apply aggregation
10	B	axis=1 applies function row-wise
11	B	fillna() fills missing values
12	B	subset specifies columns; drops if ANY are missing
13	C	Vectorized operations are fastest
14	B	rename() with columns dictionary
15	B	Inner = only matches; left = all from left table
16	B	Use left_on/right_on for different column names
17	C	Unmatched rows are kept, with NaN in the other table's columns
18	B	concat stacks DataFrames; merge joins on keys
19	B	how specifies join type
20	B	Vectorization is NumPy's main performance advantage
21	B	(28+35+42+21+17)/5 = 28.6
22	B	Array operations are element-wise by default
23	B	With three arguments, np.where returns "Blowout" where the condition is True and "Close" elsewhere
24	B	Shape is (rows, columns)
25	B	corrcoef returns a 2x2 correlation matrix; entry [0, 1] is the Pearson r (np.correlate computes cross-correlation)
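After scoring yourself, you can check several Section 1 and 2 answers (Questions 3, 6, 8, 9, 12, 13, and 14) by running them. The following is a minimal sketch that assumes pandas and NumPy are installed. The games data, the column names, and the variable names such as dominant_wins are hypothetical and were chosen only to exercise the answers. They are not real results.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data for illustration only; not real results.
games = pd.DataFrame({
    "week": [1, 1, 2, 2, 3, 3],
    "team": ["Alabama", "Georgia", None, "Texas", "Oregon", "Michigan"],
    "home_score": [42, 24, 38, 17, 45, np.nan],
    "away_score": [10, 21, 35, 20, 3, 28],
})

# Q3: iloc slices by position and excludes the end, so positions 5-10 need 5:11.
ranked = pd.DataFrame({"rank": range(1, 21)})
positions_5_to_10 = ranked.iloc[5:11]

# Q6: combine conditions with & (| for OR); wrap each condition in parentheses.
dominant_wins = games[(games["home_score"] > 35) & (games["away_score"] < 15)]

# Q9: select the column, then aggregate; mean() skips NaN by default.
avg_home_by_week = games.groupby("week")["home_score"].mean()

# Q12: drop rows where team OR home_score is missing (how="any" is the default).
cleaned = games.dropna(subset=["team", "home_score"]).copy()

# Q8 and Q13: bracket assignment plus vectorized np.where instead of a loop.
cleaned["total_points"] = cleaned["home_score"] + cleaned["away_score"]
margin = cleaned["home_score"] - cleaned["away_score"]
cleaned["game_type"] = np.where(margin.abs() > 20, "Blowout", "Close")

# Q14: rename() returns a new DataFrame, so reassign the result.
cleaned = cleaned.rename(columns={"team": "home_team"})

print(positions_5_to_10["rank"].tolist())
print(dominant_wins["team"].tolist())
print(avg_home_by_week.to_dict())
print(cleaned[["home_team", "total_points", "game_type"]])
```

The .copy() after dropna() is a defensive choice. It makes cleaned an independent DataFrame, so adding columns to it does not trigger a chained-assignment warning.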

Scoring Guide

23-25 correct: Excellent! You have a strong foundation in pandas and NumPy.
18-22 correct: Good understanding. Review the topics you missed.
13-17 correct: Fair. Spend more time practicing with real data.
Below 13 correct: Review the chapter material and complete all exercises.

Topics to Review by Question

Questions	Topic
1-4	DataFrame structure, properties, and positional selection
5-7	Filtering and previewing rows
8-10	Column creation, grouping, and row-wise functions
11-12	Missing data handling
13-14	Vectorized logic and renaming columns
15-19	Merging, joining, and concatenation
20-25	NumPy fundamentals
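To review Questions 15-25, the sketch below runs each join type, stacks rows with concat, and applies the core NumPy operations. It assumes pandas and NumPy are installed. The tables, the team_id/id keys (which mirror Question 16), the ratings, and the away scores are hypothetical. The home scores are the ones from Question 21.

```python
import numpy as np
import pandas as pd

# Hypothetical tables; the key is "team_id" in one and "id" in the other.
teams = pd.DataFrame({"team_id": [1, 2, 3], "team": ["Alabama", "Georgia", "Texas"]})
ratings = pd.DataFrame({"id": [1, 2, 4], "rating": [91.5, 93.0, 85.0]})

# Q15/Q16: inner keeps only matching keys; left keeps every row of the left table.
inner = pd.merge(teams, ratings, left_on="team_id", right_on="id", how="inner")
left = pd.merge(teams, ratings, left_on="team_id", right_on="id", how="left")

# Q17: outer keeps unmatched rows from both sides and fills the gaps with NaN.
outer = pd.merge(teams, ratings, left_on="team_id", right_on="id", how="outer")

# Q18: concat stacks rows (axis=0 by default); ignore_index renumbers them.
more_teams = pd.DataFrame({"team_id": [5], "team": ["Oregon"]})
all_teams = pd.concat([teams, more_teams], ignore_index=True)

# Q21, Q22, Q25: vectorized NumPy operations.
home_scores = np.array([28, 35, 42, 21, 17])
away_scores = np.array([14, 31, 10, 24, 20])       # hypothetical
margins = home_scores - away_scores                # element-wise differences
mean_home = np.mean(home_scores)                   # 143 / 5 = 28.6
r = np.corrcoef(home_scores, away_scores)[0, 1]    # off-diagonal entry = Pearson r

print(len(inner), len(left), len(outer))
print(all_teams)
print(margins, mean_home, round(r, 3))
```

In this sketch, Texas has no rating, so it appears with NaN in the left and outer joins. The id 4 rating has no team, so it appears only in the outer join, with NaN in the team columns.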