Self-Assessment Quiz: Python for Sports Analytics
Test your understanding of pandas and NumPy fundamentals for football analytics. Select the best answer for each question.
Section 1: DataFrame Fundamentals (Questions 1-7)
Question 1
What is the difference between a pandas DataFrame and a Series?
A) A DataFrame can only hold numeric data; a Series can hold any type B) A DataFrame is 2-dimensional (rows and columns); a Series is 1-dimensional C) A Series is faster for calculations; DataFrames are for display only D) There is no difference; they are interchangeable
Question 2
Given this code:
import pandas as pd

games = pd.DataFrame({
    "team": ["Alabama", "Georgia", "Texas"],
    "wins": [12, 13, 10],
    "losses": [2, 1, 4]
})
result = games["wins"]
What is the type of result?
A) pandas.DataFrame B) pandas.Series C) numpy.ndarray D) list
Question 3
Which expression selects the rows at integer positions 5 through 10 (inclusive), regardless of the index labels?
A) df.loc[5:10]
B) df.iloc[5:11]
C) df[5:10]
D) df.select(5, 10)
Question 4
What is the output of df.shape for a DataFrame with 100 games and 12 columns?
A) 1200 B) (12, 100) C) (100, 12) D) [100, 12]
Question 5
To filter a DataFrame for games where the home team scored more than 35 points, which syntax is correct?
A) games[games.home_score > 35]
B) games.filter(home_score > 35)
C) games.where(home_score > 35)
D) games.select(games.home_score > 35)
Question 6
When combining multiple conditions in a pandas filter, which operators should you use?
A) and and or
B) && and ||
C) & and |
D) AND and OR
Question 7
What does df.head(3) return?
A) The first 3 columns B) The first 3 rows C) The last 3 rows D) A random sample of 3 rows
Section 2: Data Manipulation (Questions 8-14)
Question 8
What is the correct way to create a new column called "total_points" that sums home_score and away_score?
A) games.total_points = games.home_score + games.away_score
B) games["total_points"] = games["home_score"] + games["away_score"]
C) games.add_column("total_points", sum(home_score, away_score))
D) games.new("total_points", home_score + away_score)
Question 9
Which groupby aggregation calculates the average home score for each week?
A) games.groupby("week").mean("home_score")
B) games.groupby("week")["home_score"].mean()
C) games.aggregate("week", "home_score", "mean")
D) games["home_score"].groupby("week").average()
Question 10
What does .apply(func, axis=1) do?
A) Applies the function to each column B) Applies the function to each row C) Applies the function to the entire DataFrame D) Applies the function to the first column only
Question 11
To fill all missing values in a column with 0, which method would you use?
A) df["col"].replace_null(0)
B) df["col"].fillna(0)
C) df["col"].missing = 0
D) df["col"].nan_to(0)
Question 12
What does df.dropna(subset=["team", "score"]) do?
A) Drops the team and score columns B) Drops rows where team OR score is missing C) Drops rows where team AND score are missing D) Drops rows where ALL columns have missing values
Question 13
Which is the most efficient way to create a column based on if-else logic?
A) Using a for loop to iterate through rows
B) Using .apply() with a lambda function
C) Using np.where() or np.select()
D) All methods have the same performance
Question 14
To rename a column from "pts" to "points", which is correct?
A) df.columns["pts"] = "points"
B) df.rename(columns={"pts": "points"})
C) df.column_rename("pts", "points")
D) df["points"] = df["pts"]
Section 3: Merging and Joining (Questions 15-19)
Question 15
What is the difference between an inner join and a left join?
A) Inner join keeps all rows from the left table; left join keeps only matching rows B) Inner join keeps only matching rows; left join keeps all rows from the left table C) There is no difference D) Inner join is faster; left join is more accurate
Question 16
Given two DataFrames where the key column is named "team_id" in one and "id" in the other, which merge syntax is correct?
A) pd.merge(df1, df2, on=["team_id", "id"])
B) pd.merge(df1, df2, left_on="team_id", right_on="id")
C) df1.join(df2, "team_id" == "id")
D) pd.concat([df1, df2], on="team_id")
Question 17
What happens when you perform an outer join and there's no matching row in one table?
A) An error is raised B) The row is excluded C) Missing values (NaN) are filled in for the unmatched columns D) The join fails silently
Question 18
To stack two DataFrames vertically (add rows), which function should you use?
A) pd.merge()
B) pd.concat()
C) pd.join()
D) pd.append()
Question 19
What is the how parameter in pd.merge()?
A) It specifies which columns to keep in the result B) It specifies the type of join (inner, left, right, outer) C) It specifies the sorting order of the result D) It specifies whether to keep duplicates
Section 4: NumPy Operations (Questions 20-25)
Question 20
What is the main advantage of NumPy arrays over Python lists for numerical operations?
A) NumPy arrays can store more data types B) NumPy operations are vectorized and much faster C) NumPy arrays use less syntax D) Python lists cannot perform math operations
Question 21
What does np.mean(scores) return for scores = np.array([28, 35, 42, 21, 17])?
A) 28 B) 28.6 C) 35 D) 143
Question 22
Given two arrays of equal length, what does home_scores - away_scores produce?
A) A single number (the sum of differences) B) An array of element-wise differences C) An error (arrays can't be subtracted) D) A 2D array combining both
Question 23
What does np.where(margin > 20, "Blowout", "Close") do?
A) Returns the indices where margin > 20 B) Returns an array of "Blowout" or "Close" based on the condition C) Filters the array to only values > 20 D) Raises an error because np.where requires 1 argument
Question 24
What does arr.shape return for a 2D array with 10 rows and 5 columns?
A) (5, 10) B) (10, 5) C) 50 D) [10, 5]
Question 25
To calculate the correlation between two arrays, which function would you use?
A) np.correlation(a, b)
B) np.corrcoef(a, b)
C) np.correlate(a, b)
D) np.cor(a, b)
Answer Key
| Question | Answer | Explanation |
|---|---|---|
| 1 | B | DataFrames are 2D (rows and columns), Series are 1D |
| 2 | B | Selecting a single column returns a Series |
| 3 | B | iloc uses integer positions; note that end is exclusive |
| 4 | C | Shape returns (rows, columns) |
| 5 | A | Boolean indexing with condition in brackets |
| 6 | C | Use & for AND and \| for OR; wrap each condition in parentheses |
| 7 | B | head(n) returns the first n rows |
| 8 | B | Use bracket notation to create/assign columns |
| 9 | B | Select column with bracket, then apply aggregation |
| 10 | B | axis=1 applies function row-wise |
| 11 | B | fillna() fills missing values |
| 12 | B | subset specifies columns; drops if ANY are missing |
| 13 | C | Vectorized operations are fastest |
| 14 | B | rename() with a columns dictionary; it returns a new DataFrame, so assign the result |
| 15 | B | Inner = only matches; left = all from left table |
| 16 | B | Use left_on/right_on for different column names |
| 17 | C | Unmatched rows get NaN for missing columns |
| 18 | B | concat stacks DataFrames; merge joins on keys |
| 19 | B | how specifies join type |
| 20 | B | Vectorization is NumPy's main performance advantage |
| 21 | B | (28+35+42+21+17)/5 = 28.6 |
| 22 | B | Array operations are element-wise by default |
| 23 | B | np.where returns values based on condition |
| 24 | B | Shape is (rows, columns) |
| 25 | B | np.corrcoef returns the correlation coefficient matrix; np.correlate computes cross-correlation instead |
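Several of the pandas answers above can be checked by running them. The following is a minimal sketch, not part of the original quiz: the `games` table and its values are made up for illustration, and the column names `week`, `home_score`, and `away_score` are assumed to match those used in the questions.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, invented for illustration only.
games = pd.DataFrame({
    "team": ["Alabama", "Georgia", "Texas", None],
    "week": [1, 1, 2, 2],
    "home_score": [42, 28, 38, 17],
    "away_score": [14, 31, 10, 21],
})

# Q2: selecting a single column with brackets gives a Series.
home = games["home_score"]
is_series = isinstance(home, pd.Series)

# Q3: iloc slices by position and excludes the end, so 1:3 covers positions 1 and 2.
middle = games.iloc[1:3]

# Q5/Q6: combine boolean conditions with & or |, each wrapped in parentheses.
big_home_wins = games[(games["home_score"] > 35) & (games["away_score"] < 20)]

# Q8: bracket assignment creates a new column.
games["total_points"] = games["home_score"] + games["away_score"]

# Q9: select the column after groupby, then aggregate.
avg_home_by_week = games.groupby("week")["home_score"].mean()

# Q12: dropna(subset=...) drops a row if ANY listed column is missing.
# The last row is missing only "team" and is still dropped.
complete = games.dropna(subset=["team", "away_score"])

# Q13: np.where builds an if-else column without a Python-level loop.
games["result"] = np.where(
    games["home_score"] > games["away_score"], "home", "not home"
)
```

Printing `type(home)`, `middle`, or `complete` in an interactive session shows the behaviour each explanation describes.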
Scoring Guide
- 23-25 correct: Excellent! You have a strong foundation in pandas and NumPy.
- 18-22 correct: Good understanding. Review the topics you missed.
- 13-17 correct: Fair. Spend more time practicing with real data.
- Below 13: Review the chapter material and complete all exercises.
Topics to Review by Question
| Questions | Topic |
|---|---|
| 1-4 | DataFrame structure and properties |
| 5-7 | Selection and filtering |
| 8-10 | Column creation and transformation |
| 11-12 | Missing data handling |
| 13-14 | Vectorized conditionals and column renaming |
| 15-19 | Merging and joining |
| 20-25 | NumPy fundamentals |
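For the merging and NumPy topics, a short experiment can be a quicker review than rereading. This sketch is an illustration under assumptions rather than chapter material: the `teams` and `records` tables and all their values are hypothetical, while the `scores` array is the one from Question 21.

```python
import numpy as np
import pandas as pd

# Hypothetical tables, invented for illustration only.
teams = pd.DataFrame({"team_id": [1, 2, 3],
                      "team": ["Alabama", "Georgia", "Texas"]})
records = pd.DataFrame({"id": [1, 2, 4], "wins": [12, 13, 9]})

# Q15/Q16/Q19: differently named keys use left_on/right_on; how picks the join type.
inner = pd.merge(teams, records, left_on="team_id", right_on="id", how="inner")
left = pd.merge(teams, records, left_on="team_id", right_on="id", how="left")

# Q17: an outer join keeps unmatched rows from both sides and fills gaps with NaN.
outer = pd.merge(teams, records, left_on="team_id", right_on="id", how="outer")

# Q18: concat stacks DataFrames vertically.
stacked = pd.concat([teams, teams], ignore_index=True)

# Q21: the mean of the scores from the question, 143 / 5.
scores = np.array([28, 35, 42, 21, 17])
mean_score = np.mean(scores)

# Q22/Q23: arithmetic is element-wise; np.where maps a condition to labels.
home_scores = np.array([42, 28, 38])
away_scores = np.array([14, 31, 10])
margin = home_scores - away_scores
labels = np.where(margin > 20, "Blowout", "Close")

# Q24/Q25: corrcoef returns a 2x2 matrix; the off-diagonal entry is the correlation.
corr_matrix = np.corrcoef(home_scores, away_scores)
```

Comparing `len(inner)`, `len(left)`, and `len(outer)` makes the difference between join types concrete, since only the unmatched keys (3 and 4 here) change the row counts.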