Quiz: Python for Football Analytics

Target: 70% or higher to proceed.

Section 1: Multiple Choice (1 point each)

1. What is the primary benefit of using a virtual environment?

A) It makes Python run faster
B) It isolates project dependencies from other projects
C) It provides automatic code formatting
D) It enables parallel processing

Answer

**B)** It isolates project dependencies from other projects *Explanation:* Virtual environments ensure that each project has its own set of packages, preventing conflicts between different projects' requirements.

2. Which pandas method returns a Series when selecting a single column?

A) `df.loc[:, 'column']`
B) `df[['column']]`
C) `df['column']`
D) `df.filter(['column'])`

Answer

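**C)** `df['column']` *Explanation:* Single bracket notation returns a Series, while double brackets `[['column']]` return a DataFrame. Note that `df.loc[:, 'column']` (option A) also returns a Series; `df['column']` is the idiomatic form the question is after.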
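A quick way to confirm the return types is to inspect them on a small frame. The column names and values below are made up for illustration and are not real play-by-play data.

```python
import pandas as pd

# Hypothetical toy data; real play-by-play frames have many more columns.
df = pd.DataFrame({"posteam": ["KC", "BUF", "KC"], "epa": [0.5, -0.2, 1.1]})

single = df["epa"]          # single brackets -> Series
double = df[["epa"]]        # double brackets -> one-column DataFrame
via_loc = df.loc[:, "epa"]  # scalar column label in .loc -> Series as well

print(type(single).__name__, type(double).__name__, type(via_loc).__name__)
# Series DataFrame Series
```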

3. What operators should you use for multiple conditions in pandas boolean indexing?

A) `and`, `or`
B) `&`, `|`
C) `&&`, `||`
D) `AND`, `OR`

Answer

**B)** `&`, `|` *Explanation:* Python's `and`/`or` don't work element-wise on arrays. Use `&` for AND and `|` for OR, and wrap each condition in parentheses.

4. What does `df.groupby('team')['epa'].transform('mean')` return?

A) A single mean value for each team
B) A Series the same length as the original DataFrame
C) A DataFrame with team names as index
D) A dictionary of means

Answer

**B)** A Series the same length as the original DataFrame *Explanation:* `.transform()` returns a result with the same shape as the input, broadcasting group statistics back to each row. This is useful for comparing individual values to group averages.

5. How do you avoid the `SettingWithCopyWarning`?

A) Use `.iloc[]` instead of `.loc[]`
B) Add `.copy()` when creating a subset DataFrame
C) Use `inplace=True` on all operations
D) Disable warnings with `pd.options.mode.chained_assignment = None`

Answer

**B)** Add `.copy()` when creating a subset DataFrame *Explanation:* The warning indicates you may be modifying a view rather than a copy. Using `.copy()` explicitly creates a new DataFrame you can safely modify.

6. What's the correct way to handle NaN values in a comparison?

A) `df['col'] > 0` automatically excludes NaN
B) Use `df['col'].fillna(0) > 0` or check explicitly with `notna()`
C) NaN values are always treated as False
D) Use `df['col'] > 0 or pd.isna(df['col'])`

Answer

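**B)** Use `df['col'].fillna(0) > 0` or check explicitly with `notna()` *Explanation:* For float columns, an ordered comparison with NaN evaluates to `False` (while `!=` evaluates to `True`), so rows with missing values silently land on one side of the filter. Handle NaN explicitly so the result matches your intent.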
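The behaviour of NaN in comparisons is easy to check on a float Series. This is a minimal sketch with made-up values; the Series name `epa` is only illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.5, np.nan, -0.3], name="epa")  # toy values

naive = s > 0                   # NaN > 0 evaluates to False
not_equal = s != 0              # NaN != 0 evaluates to True
filled = s.fillna(0) > 0        # treat missing as 0 before comparing
explicit = s.notna() & (s > 0)  # require a value to be present

print(naive.tolist())      # [True, False, False]
print(not_equal.tolist())  # [True, True, True]
print(explicit.tolist())   # [True, False, False]
```

Whether a missing value should count as zero (`fillna(0)`) or be excluded (`notna()`) depends on the analysis, which is why the choice is worth making explicit.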

7. Which NumPy function is used for conditional column creation with multiple conditions?

A) `np.where()` for all cases
B) `np.select()` for multiple conditions
C) `np.choose()`
D) `np.if_else()`

Answer

**B)** `np.select()` for multiple conditions *Explanation:* `np.where()` handles simple if-else. For multiple conditions with different values, use `np.select(conditions, choices, default)`.
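The difference between the two functions can be sketched as follows. The thresholds and labels are hypothetical and chosen only to show the mechanics.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"epa": [1.2, 0.1, -0.4, -1.5]})  # toy values

# Two-way choice: np.where is enough.
df["outcome"] = np.where(df["epa"] > 0, "success", "failure")

# Several branches: conditions are checked in order and the first match wins.
conditions = [df["epa"] >= 1.0, df["epa"] > 0, df["epa"] > -1.0]
choices = ["big gain", "positive", "slightly negative"]  # hypothetical labels
df["epa_bucket"] = np.select(conditions, choices, default="big loss")

print(df["epa_bucket"].tolist())
# ['big gain', 'positive', 'slightly negative', 'big loss']
```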

8. What is the typical speedup of NumPy vectorization over Python loops?

A) 2-5x faster
B) 10-100x faster
C) 1000x faster
D) No significant difference

Answer

**B)** 10-100x faster *Explanation:* Vectorized NumPy operations avoid Python's interpreter overhead and use optimized C code, typically achieving 10-100x speedups.

9. In a function docstring, what does the "Parameters" section describe?

A) The function's return value
B) The function's input arguments and their types
C) Example usage of the function
D) The function's internal variables

Answer

**B)** The function's input arguments and their types *Explanation:* The Parameters section documents each input parameter, its type, and what it represents.

10. What does the `how='left'` parameter do in `pd.merge()`?

A) Sorts the result by the left DataFrame's index
B) Keeps all rows from the left DataFrame, filling with NaN where no match exists
C) Only includes rows that appear in the left DataFrame
D) Merges on columns from the left side only

Answer

**B)** Keeps all rows from the left DataFrame, filling with NaN where no match exists *Explanation:* A left join retains all rows from the left DataFrame, adding matched data from the right or NaN if no match.
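A small left join illustrates the behaviour. Both frames are invented for the example, including the unmatched team code `XYZ`.

```python
import pandas as pd

# Hypothetical frames: plays on the left, team metadata on the right.
plays = pd.DataFrame({"posteam": ["KC", "BUF", "XYZ"], "epa": [0.5, -0.2, 0.3]})
teams = pd.DataFrame({"posteam": ["KC", "BUF"], "division": ["AFC West", "AFC East"]})

merged = plays.merge(teams, on="posteam", how="left")

# All three plays survive; the unmatched 'XYZ' row has a missing division.
print(merged)
print(merged["division"].isna().tolist())  # [False, False, True]
```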

Section 2: True/False (1 point each)

11. Method chaining in pandas always creates new DataFrame copies at each step.

Answer

**False** *Explanation:* Pandas often uses views for efficiency. Explicit `.copy()` is needed when you want to ensure a copy is made.

12. The `.apply()` method is always faster than a Python for loop.

Answer

**False** *Explanation:* `.apply()` with a Python function still has overhead. For maximum speed, use vectorized operations with NumPy or pandas built-in methods.

13. Type hints in Python are enforced at runtime.

Answer

**False** *Explanation:* Python type hints are for documentation and static analysis tools (like mypy). They are not enforced at runtime.
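To see this in practice, call a hinted function with arguments of the "wrong" type. The function name `yards_per_play` is hypothetical.

```python
def yards_per_play(yards: int, plays: int) -> float:
    """Return average yards per play."""
    return yards / plays


# The hints say int, but Python runs this call with floats without complaint.
result = yards_per_play(350.5, 60.0)
print(result)

# The hints are stored as metadata that tools such as mypy can read.
print(yards_per_play.__annotations__)
```

A static checker run over this file would flag the call; the interpreter itself does not.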

14. Using `df.query("column > 5")` is equivalent to `df[df['column'] > 5]`.

Answer

**True** *Explanation:* `.query()` provides a string-based syntax for filtering that is functionally equivalent to boolean indexing.
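The equivalence can be checked directly on a toy frame. The variable `threshold` is an assumption added to show the `@` syntax for referencing Python variables inside a query string.

```python
import pandas as pd

df = pd.DataFrame({"column": [3, 7, 5, 9]})  # toy values
threshold = 5

by_query = df.query("column > 5")
by_mask = df[df["column"] > 5]
by_query_var = df.query("column > @threshold")  # @ refers to a local variable

print(by_query.equals(by_mask))       # True
print(by_query_var.equals(by_mask))   # True
```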

15. The `@timer` decorator pattern can be used to measure function execution time.

Answer

**True** *Explanation:* Decorators that wrap functions and add timing logic before/after execution are a common pattern for profiling.
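One possible shape for such a decorator is sketched below. The names `timer` and `total_epa` are illustrative; this is not a specific library's implementation.

```python
import functools
import time


def timer(func):
    """Print how long each call to func takes."""
    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print(f"{func.__name__} took {elapsed:.4f}s")
        return result
    return wrapper


@timer
def total_epa(values):
    return sum(values)


total = total_epa([0.5, -0.2, 1.1])
```

`time.perf_counter()` is used rather than `time.time()` because it is a monotonic, high-resolution clock intended for measuring intervals.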

Section 3: Code Analysis (2 points each)

16. What's wrong with this code?

```python
filtered = pbp[pbp['pass'] == 1]
filtered['success'] = filtered['epa'] > 0
```

Answer

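**SettingWithCopyWarning / modifying a possible view.** `filtered` is derived from `pbp`, so pandas cannot tell whether the assignment is also meant to change `pbp` and may emit the warning. The fix is to add `.copy()`:

```python
filtered = pbp[pbp['pass'] == 1].copy()
filtered['success'] = filtered['epa'] > 0
```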
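A runnable version of the fix, on a made-up three-row frame, shows that the original DataFrame is left untouched. The `.assign()` variant is an alternative added here for comparison and is not part of the original answer.

```python
import pandas as pd

pbp = pd.DataFrame({"pass": [1, 0, 1], "epa": [0.5, 0.9, -0.2]})  # toy values

# Explicit copy: `filtered` is an independent DataFrame.
filtered = pbp[pbp["pass"] == 1].copy()
filtered["success"] = filtered["epa"] > 0

# Alternative: build the new column as part of the chain.
flagged = pbp[pbp["pass"] == 1].assign(success=lambda d: d["epa"] > 0)

print("success" in pbp.columns)   # False, the original is unchanged
print(filtered.equals(flagged))   # True
```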

17. What's wrong with this code?

```python
mask = df['epa'] > 0 and df['pass'] == 1
result = df[mask]
```

Answer

**Using `and` instead of `&` for element-wise operations** The fix:

```python
mask = (df['epa'] > 0) & (df['pass'] == 1)
result = df[mask]
```

Each condition must be wrapped in parentheses when using `&` or `|`.
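The failure mode is concrete: `and` asks pandas for the truth value of a whole Series, which raises a `ValueError`. The sketch below uses a toy frame to show both the error and the working version.

```python
import pandas as pd

df = pd.DataFrame({"epa": [0.5, -0.2, 1.1], "pass": [1, 1, 0]})  # toy values

error_message = None
try:
    mask = df["epa"] > 0 and df["pass"] == 1  # `and` needs a single True/False
except ValueError as err:
    error_message = str(err)
    print("ValueError:", error_message)

# Element-wise AND, with each comparison parenthesized.
mask = (df["epa"] > 0) & (df["pass"] == 1)
result = df[mask]
print(result.index.tolist())  # [0]
```

The parentheses matter because `&` binds more tightly than `>` and `==`, so without them Python would evaluate `0 & df["pass"]` first.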

18. What does this code return?

```python
pbp.groupby('posteam')['epa'].agg(['sum', 'mean', 'count'])
```

Answer

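A DataFrame with:

- Index: unique team names (`posteam`)
- Columns: `sum`, `mean`, `count` (EPA statistics)
- One row per team

Because a single column (`epa`) is selected before `.agg()`, the columns form a flat index. A multi-level column structure appears only when several columns are aggregated with a list of functions, in which case it usually needs flattening.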
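The column structure of the result can be verified on a small frame. The `wpa` column is an assumption added only to contrast the single-column and multi-column cases.

```python
import pandas as pd

pbp = pd.DataFrame({
    "posteam": ["KC", "KC", "BUF"],
    "epa": [0.5, 1.1, -0.2],
    "wpa": [0.01, 0.03, -0.02],  # hypothetical second metric
})

# One selected column: flat column index.
one_col = pbp.groupby("posteam")["epa"].agg(["sum", "mean", "count"])
print(one_col.columns.tolist())  # ['sum', 'mean', 'count']

# Several selected columns: two-level (column, function) index.
two_cols = pbp.groupby("posteam")[["epa", "wpa"]].agg(["sum", "mean"])
print(two_cols.columns.nlevels)  # 2
```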

19. Rewrite this slow code to be faster:

```python
results = []
for idx, row in pbp.iterrows():
    results.append(row['epa'] * 2)
pbp['epa_doubled'] = results
```

Answer

```python
pbp['epa_doubled'] = pbp['epa'] * 2
```

Vectorized operations are often 100x or more faster than iterating with `.iterrows()`, which builds a new Series for every row.
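A rough timing comparison can be run as below. The data is randomly generated, the row count is arbitrary, and the measured ratio will vary by machine and pandas version, so no specific speedup is claimed here.

```python
import time

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
pbp = pd.DataFrame({"epa": rng.normal(size=5_000)})  # synthetic EPA values

start = time.perf_counter()
looped = [row["epa"] * 2 for _, row in pbp.iterrows()]
loop_seconds = time.perf_counter() - start

start = time.perf_counter()
vectorized = pbp["epa"] * 2
vector_seconds = time.perf_counter() - start

print(f"iterrows: {loop_seconds:.4f}s, vectorized: {vector_seconds:.6f}s")
```

For more stable numbers, the standard-library `timeit` module repeats each measurement several times.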

Section 4: Short Answer (2 points each)

20. Explain when you would use `.transform()` vs `.agg()` in a groupby operation.

Sample Answer

Use `.agg()` when you want to reduce each group to a single summary value (e.g., calculating team totals or averages). The result is one row per group.

Use `.transform()` when you want to broadcast a group statistic back to each row of the original DataFrame (e.g., adding a column showing each play's EPA vs. team average). The result has the same length as the input.
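The contrast in output shape can be sketched on four made-up plays. The column name `epa_vs_team` is hypothetical.

```python
import pandas as pd

pbp = pd.DataFrame({
    "posteam": ["KC", "KC", "BUF", "BUF"],
    "epa": [0.5, 1.1, -0.2, 0.4],
})  # toy values

team_mean = pbp.groupby("posteam")["epa"].agg("mean")       # one value per team
row_mean = pbp.groupby("posteam")["epa"].transform("mean")  # one value per play

# transform aligns with the original rows, so it can be used directly.
pbp["epa_vs_team"] = pbp["epa"] - row_mean

print(len(team_mean), len(row_mean))  # 2 4
```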

21. Why is caching important when loading NFL data, and what should you consider when implementing it?

Sample Answer

Caching avoids repeated downloads of large datasets, saving time and reducing load on data servers. Considerations:

- Cache location (local files, memory)
- Cache invalidation (when to refresh)
- Storage format (parquet is efficient)
- Cache key design (season, data type)
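The considerations above can be sketched as a small file-based cache. Everything here is an assumption for illustration: the function name `load_with_cache`, the `loader` callable standing in for a real download, the one-day expiry, and the use of `pickle` so the sketch needs only the standard library. For DataFrames, pandas' `to_parquet` and `read_parquet` would be the natural substitute.

```python
import pickle
import tempfile
import time
from pathlib import Path


def load_with_cache(season, data_type, loader, cache_dir="cache", max_age_seconds=86_400):
    """Return data for (season, data_type), re-running loader only when stale."""
    # Cache key: data type plus season, encoded in the file name.
    cache_path = Path(cache_dir) / f"{data_type}_{season}.pkl"
    if cache_path.exists():
        age = time.time() - cache_path.stat().st_mtime
        if age < max_age_seconds:  # invalidation rule: refresh after max_age_seconds
            with cache_path.open("rb") as fh:
                return pickle.load(fh)
    data = loader(season)
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    with cache_path.open("wb") as fh:
        pickle.dump(data, fh)
    return data


# Usage with a stand-in loader that records how often it is called.
calls = []

def fake_loader(season):
    calls.append(season)
    return {"season": season, "epa": [0.5, -0.2]}

with tempfile.TemporaryDirectory() as tmp:
    first = load_with_cache(2023, "pbp", fake_loader, cache_dir=tmp)
    second = load_with_cache(2023, "pbp", fake_loader, cache_dir=tmp)

print(calls)  # [2023], the second call was served from the cache
```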

22. What are the key components of a well-written function docstring?

Sample Answer

A good docstring includes:

1. **Brief description** of what the function does
2. **Parameters section** listing each argument, its type, and purpose
3. **Returns section** describing the return value and type
4. **Examples** showing usage
5. **Raises section** if the function raises exceptions
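A docstring with these components might look like the following. It assumes the NumPy docstring layout, which is one common convention among several, and the function `success_rate` is a hypothetical example.

```python
def success_rate(epa_values):
    """Compute the share of plays with positive EPA.

    Parameters
    ----------
    epa_values : list of float
        EPA value for each play.

    Returns
    -------
    float
        Fraction of plays with EPA greater than zero, between 0 and 1.

    Raises
    ------
    ValueError
        If ``epa_values`` is empty.

    Examples
    --------
    >>> success_rate([0.5, -0.2, 1.1, -0.4])
    0.5
    """
    if not epa_values:
        raise ValueError("epa_values must not be empty")
    return sum(1 for value in epa_values if value > 0) / len(epa_values)


rate = success_rate([0.5, -0.2, 1.1, -0.4])
```

Because the Examples section uses the `>>>` prompt format, it can also be checked automatically with the standard-library `doctest` module.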

Section 5: Matching (1 point each)

Match the pandas operation with its purpose:

| Operation | Purpose |
|---|---|
| 23. `.merge()` | A. Combine rows from multiple DataFrames |
| 24. `.concat()` | B. Join DataFrames on matching columns |
| 25. `.groupby().transform()` | C. Broadcast group statistics to each row |

Answers

- **23. B** - `.merge()` joins DataFrames on matching columns (like SQL JOIN)
- **24. A** - `.concat()` stacks DataFrames vertically or horizontally
- **25. C** - `.transform()` broadcasts group calculations back to original rows

Scoring

| Section | Points | Your Score |
|---|---|---|
| Multiple Choice (1-10) | 10 | ___ |
| True/False (11-15) | 5 | ___ |
| Code Analysis (16-19) | 8 | ___ |
| Short Answer (20-22) | 6 | ___ |
| Matching (23-25) | 3 | ___ |
| Total | 32 | ___ |

Passing Score: 23/32 (70%)