Quiz: Python for Football Analytics
Target: 70% or higher to proceed.
Section 1: Multiple Choice (1 point each)
1. What is the primary benefit of using a virtual environment?
- A) It makes Python run faster
- B) It isolates project dependencies from other projects
- C) It provides automatic code formatting
- D) It enables parallel processing
Answer
**B)** It isolates project dependencies from other projects
*Explanation:* Virtual environments ensure that each project has its own set of packages, preventing conflicts between different projects' requirements.
2. Which expression is the standard shorthand for selecting a single column as a Series?
- A) `df.loc[:, 'column']`
- B) `df[['column']]`
- C) `df['column']`
- D) `df.filter(['column'])`
Answer
**C)** `df['column']`
*Explanation:* Single-bracket selection returns a Series, while double brackets `[['column']]` return a DataFrame. (`df.loc[:, 'column']` also returns a Series, but single brackets are the usual shorthand.)
3. What operators should you use for multiple conditions in pandas boolean indexing?
- A) `and`, `or`
- B) `&`, `|`
- C) `&&`, `||`
- D) `AND`, `OR`
Answer
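**B)** `&`, `|`
*Explanation:* Python's `and`/`or` operate on the truth value of a whole object, not element-wise, and raise a `ValueError` on a Series. Use `&` for AND and `|` for OR, and wrap each condition in parentheses because `&` and `|` bind more tightly than comparison operators.
4. What does `df.groupby('team')['epa'].transform('mean')` return?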
- A) A single mean value for each team
- B) A Series the same length as the original DataFrame
- C) A DataFrame with team names as index
- D) A dictionary of means
Answer
**B)** A Series the same length as the original DataFrame
*Explanation:* `.transform()` returns a result with the same shape as the input, broadcasting group statistics back to each row. This is useful for comparing individual values to group averages.
5. How do you avoid the `SettingWithCopyWarning`?
- A) Use `.iloc[]` instead of `.loc[]`
- B) Add `.copy()` when creating a subset DataFrame
- C) Use `inplace=True` on all operations
- D) Disable warnings with `pd.options.mode.chained_assignment = None`
Answer
**B)** Add `.copy()` when creating a subset DataFrame
*Explanation:* The warning indicates you may be modifying a view rather than a copy. Using `.copy()` explicitly creates a new DataFrame you can safely modify.
6. What's the correct way to handle NaN values in a comparison?
- A) `df['col'] > 0` automatically excludes NaN
- B) Use `df['col'].fillna(0) > 0` or check explicitly with `notna()`
- C) NaN values are always treated as False
- D) Use `df['col'] > 0 or pd.isna(df['col'])`
Answer
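**B)** Use `df['col'].fillna(0) > 0` or check explicitly with `notna()`
*Explanation:* On ordinary float columns, ordering and equality comparisons against NaN evaluate to False (while `!=` evaluates to True), so missing values drop out of a mask silently. With nullable dtypes the comparison yields `<NA>` instead. Handle NaN explicitly so the behavior is intentional rather than accidental.
7. Which NumPy function is used for conditional column creation with multiple conditions?
- A) `np.where()` for all cases
- B) `np.select()` for multiple conditions
- C) `np.choose()`
- D) `np.if_else()`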
Answer
**B)** `np.select()` for multiple conditions
*Explanation:* `np.where()` handles simple if-else. For multiple conditions with different values, use `np.select(conditions, choices, default)`.
8. What is the typical speedup of NumPy vectorization over Python loops?
- A) 2-5x faster
- B) 10-100x faster
- C) 1000x faster
- D) No significant difference
Answer
**B)** 10-100x faster
*Explanation:* Vectorized NumPy operations avoid Python's interpreter overhead and use optimized C code, typically achieving 10-100x speedups, though the exact figure depends on the operation and the data size.
9. In a function docstring, what does the "Parameters" section describe?
- A) The function's return value
- B) The function's input arguments and their types
- C) Example usage of the function
- D) The function's internal variables
Answer
**B)** The function's input arguments and their types
*Explanation:* The Parameters section documents each input parameter, its type, and what it represents.
10. What does the `how='left'` parameter do in `pd.merge()`?
- A) Sorts the result by the left DataFrame's index
- B) Keeps all rows from the left DataFrame, filling with NaN where no match exists
- C) Only includes rows that appear in the left DataFrame
- D) Merges on columns from the left side only
Answer
**B)** Keeps all rows from the left DataFrame, filling with NaN where no match exists
*Explanation:* A left join retains all rows from the left DataFrame, adding matched data from the right or NaN if no match.
Section 2: True/False (1 point each)
11. Method chaining in pandas always creates new DataFrame copies at each step.
Answer
**False**
*Explanation:* Pandas does not guarantee a copy at every step; some operations return views for efficiency. Call `.copy()` explicitly when you need to be sure a copy is made.
12. The `.apply()` method is always faster than a Python for loop.
Answer
**False**
*Explanation:* `.apply()` with a Python function still calls that function once per row or element, so it carries much of the same overhead as a loop. For maximum speed, use vectorized operations with NumPy or pandas built-in methods.
13. Type hints in Python are enforced at runtime.
Answer
**False**
*Explanation:* Python type hints are for documentation and static analysis tools (like mypy). They are not enforced at runtime.
14. Using `df.query("column > 5")` is equivalent to `df[df['column'] > 5]`.
Answer
**True**
*Explanation:* `.query()` provides a string-based syntax for filtering that is functionally equivalent to boolean indexing.
15. The `@timer` decorator pattern can be used to measure function execution time.
Answer
**True**
*Explanation:* Decorators that wrap functions and add timing logic before/after execution are a common pattern for profiling.
Section 3: Code Analysis (2 points each)
16. What's wrong with this code?
filtered = pbp[pbp['pass'] == 1]
filtered['success'] = filtered['epa'] > 0
Answer
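**SettingWithCopyWarning / modifying a possible view**
`filtered` may be a view of `pbp`, so assigning a new column to it can trigger the warning and may not behave as intended. The fix is to add `.copy()`:
filtered = pbp[pbp['pass'] == 1].copy()
filtered['success'] = filtered['epa'] > 0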
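
As a minimal sketch of the fix above, the snippet below builds a small, made-up `pbp` sample (the values are hypothetical; only the `pass` and `epa` column names come from the question) and checks that assigning to the copied subset leaves the original DataFrame alone.

```python
import pandas as pd

# Hypothetical play-by-play sample; real data would come from your loader.
pbp = pd.DataFrame({
    "pass": [1, 0, 1, 1],
    "epa": [0.8, -0.2, -0.5, 1.3],
})

# Take an explicit copy of the filtered rows so the assignment below
# targets a new DataFrame rather than a possible view of pbp.
filtered = pbp[pbp["pass"] == 1].copy()
filtered["success"] = filtered["epa"] > 0

# The new column exists only on the copy, not on the original.
added_to_original = "success" in pbp.columns
success_flags = filtered["success"].tolist()
print(added_to_original, success_flags)
```
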
17. What's wrong with this code?
mask = df['epa'] > 0 and df['pass'] == 1
result = df[mask]
Answer
**Using `and` instead of `&` for element-wise operations**
`and` tries to reduce each Series to a single truth value, which raises a `ValueError`. The fix:
mask = (df['epa'] > 0) & (df['pass'] == 1)
result = df[mask]
Each condition must be wrapped in parentheses when using `&` or `|`.
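
To make the difference concrete, here is a small sketch on made-up data (the values are hypothetical) that shows `and` failing on two Series and `&` producing the element-wise mask.

```python
import pandas as pd

# Hypothetical sample with the two columns used in the question.
df = pd.DataFrame({
    "epa": [0.5, -0.3, 1.2, 0.1],
    "pass": [1, 1, 0, 1],
})

# `and` asks for the truth value of a whole Series, which pandas refuses.
try:
    bad_mask = (df["epa"] > 0) and (df["pass"] == 1)
    and_error = None
except ValueError as exc:
    and_error = type(exc).__name__

# `&` compares element by element; the parentheses are needed because
# `&` binds more tightly than `>` and `==`.
mask = (df["epa"] > 0) & (df["pass"] == 1)
result = df[mask]
kept_rows = result.index.tolist()
print(and_error, kept_rows)
```
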
18. What does this code return?
pbp.groupby('posteam')['epa'].agg(['sum', 'mean', 'count'])
Answer
A DataFrame with:
- Index: unique team names (`posteam`)
- Columns: `sum`, `mean`, `count` (EPA statistics)
- One row per team
Because only the `epa` column is selected, the columns are flat. Multi-level columns appear only when several columns are aggregated with a list of functions.
19. Rewrite this slow code to be faster:
results = []
for idx, row in pbp.iterrows():
    results.append(row['epa'] * 2)
pbp['epa_doubled'] = results
Answer
pbp['epa_doubled'] = pbp['epa'] * 2
Vectorized operations are often 100x or more faster than iterating with `.iterrows()`.
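
The sketch below runs both versions on randomly generated EPA values (synthetic data, not real play-by-play) and confirms they produce the same column. The timings it prints depend on the machine and data size, so treat them as illustrative only.

```python
import time

import numpy as np
import pandas as pd

# Synthetic EPA values standing in for a real play-by-play table.
rng = np.random.default_rng(0)
pbp = pd.DataFrame({"epa": rng.normal(size=5_000)})

# Slow version: build the result one row at a time with .iterrows().
start = time.perf_counter()
looped = [row["epa"] * 2 for _, row in pbp.iterrows()]
loop_seconds = time.perf_counter() - start

# Fast version: one vectorized multiplication over the whole column.
start = time.perf_counter()
pbp["epa_doubled"] = pbp["epa"] * 2
vectorized_seconds = time.perf_counter() - start

# Both approaches should agree on every value.
same_values = bool(np.allclose(looped, pbp["epa_doubled"]))
print(f"iterrows: {loop_seconds:.4f}s, vectorized: {vectorized_seconds:.6f}s")
```
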
Section 4: Short Answer (2 points each)
20. Explain when you would use `.transform()` vs `.agg()` in a groupby operation.
Sample Answer
Use `.agg()` when you want to reduce each group to a single summary value (e.g., calculating team totals or averages). The result is one row per group.
Use `.transform()` when you want to broadcast a group statistic back to each row of the original DataFrame (e.g., adding a column showing each play's EPA vs. team average). The result has the same length as the input.
21. Why is caching important when loading NFL data, and what should you consider when implementing it?
Sample Answer
Caching avoids repeated downloads of large datasets, saving time and reducing load on data servers. Considerations:
- Cache location (local files, memory)
- Cache invalidation (when to refresh)
- Storage format (Parquet is efficient)
- Cache key design (season, data type)
22. What are the key components of a well-written function docstring?
Sample Answer
A good docstring includes:
1. **Brief description** of what the function does
2. **Parameters section** listing each argument, its type, and purpose
3. **Returns section** describing the return value and type
4. **Examples** showing usage
5. **Raises section** if the function raises exceptions
Section 5: Matching (1 point each)
Match the pandas operation with its purpose:
| Operation | Purpose |
|---|---|
| 23. `.merge()` | A. Combine rows from multiple DataFrames |
| 24. `.concat()` | B. Join DataFrames on matching columns |
| 25. `.groupby().transform()` | C. Broadcast group statistics to each row |
Answers
- **23. B** - `.merge()` joins DataFrames on matching columns (like a SQL JOIN)
- **24. A** - `.concat()` stacks DataFrames vertically or horizontally
- **25. C** - `.transform()` broadcasts group calculations back to the original rows
Scoring
| Section | Points | Your Score |
|---|---|---|
| Multiple Choice (1-10) | 10 | ___ |
| True/False (11-15) | 5 | ___ |
| Code Analysis (16-19) | 8 | ___ |
| Short Answer (20-22) | 6 | ___ |
| Matching (23-25) | 3 | ___ |
| Total | 32 | ___ |
Passing Score: 23/32 (about 72%, the lowest score that meets the 70% target)