Chapter 10 Quiz: Introduction to pandas
Instructions: Answer all questions. Multiple-choice questions have exactly one correct answer unless otherwise noted. For code questions, assume import pandas as pd and import numpy as np have already been run. The product catalog DataFrame from the exercises is available as df.
Part A: Conceptual Questions (Questions 1–7)
Question 1
What is the name "pandas" derived from?
A) Python ANalysis DAta Suite
B) Panel Data, a term from econometrics
C) Parallel ANalysis and DAta Structures
D) Python Advanced Numerical Data Analysis System
Question 2
Which statement best describes the relationship between a pandas Series and a pandas DataFrame?
A) A DataFrame is a collection of lists; a Series is a dictionary.
B) A DataFrame is a two-dimensional table; a Series is a one-dimensional labeled array. Each column in a DataFrame is a Series.
C) A Series is a simplified DataFrame with only one row.
D) A DataFrame and a Series are different names for the same object.
Question 3
Priya has a DataFrame df with 5,000 rows. She runs df.info(). Under which column would she look to detect missing values?
A) Dtype
B) Non-Null Count
C) Index
D) Memory Usage
Question 4
Which of the following correctly imports pandas using the universal convention?
A) import pandas
B) from pandas import *
C) import pandas as pd
D) import pd from pandas
Question 5
Maya has a DataFrame of projects. She wants to see a quick statistical summary (mean, median, min, max, standard deviation) of all numeric columns. Which method should she call?
A) df.info()
B) df.head()
C) df.summary()
D) df.describe()
Question 6
A pandas object dtype in a column most commonly means the column contains:
A) A mix of integers and floats
B) Python dictionaries
C) Strings (or mixed types)
D) Boolean True/False values
Question 7
Which statement accurately describes a "vectorized operation" in pandas?
A) An operation that iterates through each row one at a time using a Python for loop
B) An operation applied to an entire column (array) simultaneously using optimized low-level code, without explicit Python looping
C) A mathematical operation using vectors from the math module
D) Any pandas method that ends in .apply()
Part B: Code Interpretation (Questions 8–12)
Read the code and choose the correct description of what it does.
Question 8
result = df[df['unit_price'] > 50]
What does result contain?
A) A single value: the first unit price above 50
B) A DataFrame containing only the rows where unit_price is greater than 50
C) A Boolean Series of True/False values
D) The integer count of rows where unit_price is above 50
Question 9
col_a = df['unit_price']
col_b = df[['unit_price']]
What is the difference between col_a and col_b?
A) They are identical; single and double brackets are interchangeable.
B) col_a is a pandas Series; col_b is a single-column pandas DataFrame.
C) col_a is a DataFrame; col_b is a Series.
D) col_a raises a KeyError; only double brackets work for column selection.
Question 10
df['margin'] = (df['unit_price'] - df['unit_cost']) / df['unit_price']
What does this line of code do?
A) Filters the DataFrame to rows where the margin can be calculated
B) Calculates the margin for the first row only
C) Creates a new column called margin in df, with the margin calculated for every row simultaneously
D) Returns a new DataFrame without modifying df
Question 11
filtered = df[(df['category'] == 'Gadgets') & (df['unit_price'] > 100)]
Why are the individual conditions wrapped in parentheses?
A) Parentheses are required by the .loc[] method
B) Without parentheses, Python's operator precedence would cause & to bind before ==, producing a TypeError
C) Parentheses prevent pandas from converting the result to a Series
D) They are optional style; the code would work identically without them
Question 12
df_sorted = df.sort_values('unit_price', ascending=False)
print(df.head(1))
print(df_sorted.head(1))
After running this code, what is true?
A) Both df and df_sorted show the same first row (the most expensive product).
B) df is unchanged; df_sorted is a new DataFrame sorted by price descending, so df_sorted.head(1) shows the most expensive product.
C) df has been sorted in place; df_sorted is an identical copy.
D) The code raises an error because .sort_values() requires inplace=True.
Part C: Code Writing (Questions 13–17)
Write the single line of pandas code that accomplishes each task. Assume df is the product catalog DataFrame from the exercises.
Question 13
Write a line of code that selects only the product_name and category columns from df.
(Write your answer below)
Question 14
Write a line of code that filters df to show only products in the Components category.
(Write your answer below)
Question 15
Write a line of code that adds a new column called gross_profit to df, calculated as unit_price minus unit_cost.
(Write your answer below)
Question 16
Write a line of code that sorts df by inventory in ascending order (lowest inventory first) and assigns the result back to df.
(Write your answer below)
Question 17
Write a line of code that filters df to show products where inventory is less than reorder_point (both are column names in df).
(Write your answer below)
Part D: Scenario Questions (Questions 18–20)
Question 18
Priya is building an automated report. She loads a CSV into df and immediately runs df.info(). She notices that the unit_price column shows Non-Null Count: 4187 but the DataFrame has 4,200 rows. What does this mean and what should she do?
A) It means 13 rows have non-numeric prices, so she should delete those rows immediately.
B) It means 13 rows have missing values in unit_price. She should investigate whether those rows should be dropped, filled with a default value, or excluded from analysis depending on the business context.
C) It means the CSV was loaded incorrectly and she should re-run pd.read_csv().
D) It means unit_price is stored as strings rather than numbers, and she should use .astype(float) on the column.
Question 19
Marcus tells Priya: "I need the product list filtered to show only Widgets with more than 200 units in stock, sorted by unit price high to low." Which code correctly accomplishes this?
A)
result = df[df['category'] == 'Widgets' and df['inventory'] > 200].sort_values('unit_price')
B)
result = df[(df['category'] == 'Widgets') & (df['inventory'] > 200)].sort_values('unit_price', ascending=False)
C)
result = df[df['category'] == 'Widgets'][df['inventory'] > 200].sort_values('unit_price', ascending=False)
D)
result = df.loc['Widgets', 'inventory' > 200].sort_values('unit_price', ascending=False)
Question 20
Maya has her project DataFrame indexed by project_id (strings like "MR-2024-01"). She writes:
print(df.iloc[0])
print(df.loc["MR-2024-01"])
After sorting the DataFrame by revenue_remaining descending and re-assigning it, she runs the same two lines again. Which of the following is true?
A) Both lines return the same row, because iloc and loc always return the same result.
B) df.iloc[0] now returns whichever project has the highest revenue_remaining (the new first row after sorting). df.loc["MR-2024-01"] still returns the MR-2024-01 project, regardless of its position.
C) df.loc["MR-2024-01"] raises a KeyError because the sort changed the index.
D) df.iloc[0] still returns MR-2024-01 because .iloc[] always references the original order.
Answer Key
Part A: Conceptual
Q1 — B
The name "pandas" derives from "panel data," a term used in econometrics for multidimensional structured datasets. Wes McKinney created pandas while working with financial panel data.
Q2 — B
A Series is a one-dimensional labeled array. A DataFrame is a two-dimensional table. Each column in a DataFrame is a Series, and all columns share the same row index.
Q3 — B
The Non-Null Count column in .info() output shows how many non-missing values exist in each column. If Non-Null Count is less than the total number of rows, that column has missing values.
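As a small illustration, the sketch below builds a hypothetical three-row catalog (the product names and prices are made up, not taken from the exercises) with one missing price, then reads the gap two ways: from .info() and from a per-column count of missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row catalog; one unit_price is missing on purpose.
df = pd.DataFrame({
    "product_name": ["Widget A", "Gadget B", "Component C"],
    "unit_price": [19.99, np.nan, 4.50],
})

# The unit_price line reports 2 non-null values out of 3 entries.
df.info()

# The same information as numbers: missing values per column.
missing_per_column = df.isna().sum()
print(missing_per_column)
```

Counting with .isna().sum() is handy when a report needs the number of gaps rather than a printed summary.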
Q4 — C
import pandas as pd is the universal convention used throughout the Python data science ecosystem. Using pd as the alias is not required technically, but deviating from it creates confusion when reading any external documentation or community code.
Q5 — D
.describe() generates descriptive statistics for all numeric columns: count, mean, standard deviation, min, 25th percentile, median (50th percentile), 75th percentile, and max. .info() gives structural metadata, not statistical summaries.
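A minimal sketch of the call, using made-up data rather than the exercise catalog. It shows that the text column is left out of the default summary and that the median appears under the 50% label.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({
    "product_name": ["Widget A", "Gadget B", "Component C", "Gadget D"],
    "unit_price": [10.0, 20.0, 30.0, 40.0],
    "inventory": [100, 250, 50, 400],
})

# By default, only the numeric columns are summarized.
summary = df.describe()
print(summary)

# Individual statistics can be read by row label and column name.
median_price = summary.loc["50%", "unit_price"]
```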
Q6 — C
The object dtype almost always means strings. Pandas uses object as its dtype for columns containing Python objects, and in practice this nearly always means string (text) data. It can also appear when a column has mixed types (e.g., some rows have strings, some have numbers), which is a data quality problem worth investigating.
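One way to investigate a suspicious object column is to count the Python types it actually holds. The sketch below assumes a hypothetical price column in which one value was entered as text.

```python
import pandas as pd

# Hypothetical column where one price was typed in as text.
prices = pd.Series([19.99, "N/A", 4.50])
print(prices.dtype)  # object, because the values are not all one type

# Count the Python types stored in the column, by type name.
type_counts = prices.map(lambda value: type(value).__name__).value_counts()
print(type_counts)
```

A column that should be numeric but shows any str entries here is a candidate for cleaning before analysis.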
Q7 — B
A vectorized operation is one that operates on an entire array at once using optimized C code under the hood, rather than iterating row by row in Python. This is what makes pandas dramatically faster than looping through a DataFrame with for loops or .iterrows().
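To make the contrast concrete, here is a small sketch with made-up prices and an arbitrary 1.08 multiplier. Both versions give the same numbers; the difference is that the second is a single expression over the whole Series with no explicit loop.

```python
import pandas as pd

# Hypothetical prices; the 1.08 multiplier is an arbitrary example value.
prices = pd.Series([10.0, 20.0, 30.0])

# Explicit Python loop: one element at a time.
looped = pd.Series([price * 1.08 for price in prices])

# Vectorized: one expression applied to the whole Series.
vectorized = prices * 1.08

print(vectorized)
```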
Part B: Code Interpretation
Q8 — B
The expression df['unit_price'] > 50 produces a Boolean Series (True/False per row). Passing that Boolean Series inside df[...] returns a DataFrame containing only the rows where the condition is True — i.e., where unit_price is greater than 50.
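The two steps can be separated to see each intermediate result. The data below is hypothetical and stands in for the exercise catalog.

```python
import pandas as pd

# Hypothetical stand-in for the product catalog.
df = pd.DataFrame({
    "product_name": ["Widget A", "Gadget B", "Component C"],
    "unit_price": [19.99, 120.00, 75.00],
})

# Step 1: the comparison yields one True/False per row.
mask = df["unit_price"] > 50

# Step 2: indexing with the mask keeps only the True rows.
result = df[mask]
print(result)
```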
Q9 — B
Single-bracket notation df['unit_price'] returns a Series. Double-bracket notation df[['unit_price']] passes a list of column names and returns a DataFrame — in this case, a DataFrame with one column. They look similar when printed, but have different types and different behaviors when used in further operations.
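A quick check of the types and shapes, using a hypothetical three-row DataFrame, makes the difference visible.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({
    "product_name": ["Widget A", "Gadget B", "Component C"],
    "unit_price": [19.99, 120.00, 75.00],
})

col_a = df["unit_price"]     # single brackets: a Series
col_b = df[["unit_price"]]   # double brackets: a one-column DataFrame

print(type(col_a).__name__, col_a.shape)
print(type(col_b).__name__, col_b.shape)
```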
Q10 — C
This is a vectorized column assignment. The right-hand side calculates margin for every row simultaneously (vectorized arithmetic on Series). The result is stored as a new column named 'margin' in the existing DataFrame df. The original df is modified in place.
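The sketch below runs the same assignment on two made-up rows so the per-row results can be checked by hand: (100 - 60) / 100 and (50 - 40) / 50.

```python
import pandas as pd

# Hypothetical prices and costs chosen so the margins are easy to verify.
df = pd.DataFrame({
    "unit_price": [100.0, 50.0],
    "unit_cost": [60.0, 40.0],
})

# One assignment computes the margin for every row and adds the column to df.
df["margin"] = (df["unit_price"] - df["unit_cost"]) / df["unit_price"]
print(df)
```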
Q11 — B
Python gives & higher precedence than comparison operators such as == and >. Without the parentheses around each comparison, Python would try to evaluate 'Gadgets' & df['unit_price'] first, before either comparison runs, which raises a TypeError. The parentheses force each comparison to be evaluated first, producing a Boolean Series on each side of &.
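The failure can be reproduced safely inside a try/except block. The data is hypothetical, and the exact error message may differ between pandas versions, so the sketch only records that the unparenthesized form fails.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({
    "category": ["Gadgets", "Gadgets", "Widgets"],
    "unit_price": [150.0, 80.0, 200.0],
})

# Correct: each comparison is evaluated before & combines them.
filtered = df[(df["category"] == "Gadgets") & (df["unit_price"] > 100)]

# Incorrect: 'Gadgets' & df['unit_price'] is attempted first.
# Typically a TypeError; ValueError is also caught in case a pandas
# version reports the problem differently.
try:
    df[df["category"] == "Gadgets" & df["unit_price"] > 100]
    failed = False
except (TypeError, ValueError) as exc:
    failed = True
    print(type(exc).__name__, exc)
```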
Q12 — B
By default, .sort_values() returns a new DataFrame — it does not modify df in place. Therefore df is unchanged (its first row is whatever it was before), and df_sorted is a new DataFrame sorted by unit_price descending. df_sorted.head(1) shows the most expensive product; df.head(1) shows the original first row.
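A short check on hypothetical data confirms that the original order survives the call.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({
    "product_name": ["Widget A", "Gadget B", "Component C"],
    "unit_price": [19.99, 120.00, 75.00],
})

# sort_values returns a new DataFrame; df keeps its original row order.
df_sorted = df.sort_values("unit_price", ascending=False)

print(df.head(1))         # still Widget A
print(df_sorted.head(1))  # Gadget B, the most expensive product
```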
Part C: Code Writing
Q13
df[['product_name', 'category']]
Q14
df[df['category'] == 'Components']
Q15
df['gross_profit'] = df['unit_price'] - df['unit_cost']
Q16
df = df.sort_values('inventory', ascending=True)
Q17
df[df['inventory'] < df['reorder_point']]
Part D: Scenario Questions
Q18 — B
A Non-Null Count of 4,187 out of 4,200 rows means 13 rows have NaN (missing) values in the unit_price column. The correct response is to investigate: Are these products discontinued? Were they loaded incorrectly from the source? Can they be filled in? Option A misreads the count (the values are missing, not non-numeric), and deleting the rows immediately may discard otherwise valid products. Re-loading the CSV (C) would produce the same result, since the missing values are in the source data. Using .astype(float) (D) addresses a dtype issue, not a missing-value issue.
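The investigate-then-decide workflow might look like the sketch below. The data is hypothetical, and the 0.0 fill value is an arbitrary placeholder, not a recommendation; the right choice depends on the business context.

```python
import numpy as np
import pandas as pd

# Hypothetical catalog with one missing price.
df = pd.DataFrame({
    "product_name": ["Widget A", "Gadget B", "Component C"],
    "unit_price": [19.99, np.nan, 4.50],
})

# Step 1: look at the affected rows before changing anything.
missing_rows = df[df["unit_price"].isna()]
print(missing_rows)

# Option: drop rows that lack a price (returns a new DataFrame).
dropped = df.dropna(subset=["unit_price"])

# Option: fill with a default value (0.0 is an arbitrary example).
filled = df.fillna({"unit_price": 0.0})
```

Neither option modifies df itself, so the original data stays available while the decision is being made.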
Q19 — B
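Option A uses Python's and keyword instead of &. Python then tries to reduce each Series to a single True/False, which raises a ValueError (the truth value of a Series is ambiguous); it also omits ascending=False. Option C chains two separate [] selections: the second mask is built from the full df but applied to the already-filtered rows, so pandas has to realign it by index and issues a UserWarning. Even when the result comes out right, the pattern is fragile and harder to read than a single combined mask. Option D misuses .loc[] syntax: 'Widgets' is a category value, not a row label, and 'inventory' > 200 compares a string with an integer, which raises a TypeError. Option B correctly uses & with both conditions wrapped in parentheses, then chains .sort_values() with ascending=False for high-to-low order.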
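The sketch below runs Option B on a hypothetical four-row catalog and shows, inside a try/except block, that the and keyword from Option A fails.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({
    "category": ["Widgets", "Widgets", "Gadgets", "Widgets"],
    "inventory": [250, 150, 300, 500],
    "unit_price": [10.0, 99.0, 50.0, 25.0],
})

# Option B: combine both conditions with &, then sort high to low.
result = df[(df["category"] == "Widgets") & (df["inventory"] > 200)].sort_values(
    "unit_price", ascending=False
)
print(result)

# Option A's `and` asks Python for a single True/False from a whole Series.
try:
    df[df["category"] == "Widgets" and df["inventory"] > 200]
    and_failed = False
except ValueError as exc:
    and_failed = True
    print("ValueError:", exc)
```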
Q20 — B
After sorting by revenue_remaining descending, the DataFrame's row order changes. .iloc[0] always returns the first row by current position, so it now returns whichever project has the highest revenue remaining. .loc["MR-2024-01"] returns the row with index label "MR-2024-01" regardless of where it falls in the sort order — the index label does not change when you sort. This is the fundamental distinction between .iloc[] and .loc[].
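A three-project sketch shows the effect. Only "MR-2024-01" comes from the question; the other project IDs and all revenue figures are made up.

```python
import pandas as pd

# Hypothetical projects indexed by project_id.
df = pd.DataFrame(
    {"revenue_remaining": [1000, 5000, 3000]},
    index=["MR-2024-01", "MR-2024-02", "MR-2024-03"],
)

first_before = df.iloc[0].name  # position 0 before sorting

df = df.sort_values("revenue_remaining", ascending=False)

first_after = df.iloc[0].name   # position 0 now holds the largest value
by_label = df.loc["MR-2024-01"]  # the label lookup is unaffected by the sort

print(first_before, first_after)
print(by_label)
```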
Scoring Guide
| Score | Interpretation |
|---|---|
| 18–20 | Excellent. You have a strong command of Chapter 10 fundamentals. |
| 15–17 | Good. Review the questions you missed, particularly around .loc[] vs. .iloc[] and Boolean filtering syntax. |
| 11–14 | Satisfactory. Re-read Sections 10.7 and 10.8 and complete Tier 1–2 exercises before moving to Chapter 11. |
| 7–10 | Needs review. Work through the chapter examples interactively in a Jupyter notebook or Python REPL before proceeding. |
| 0–6 | Return to the chapter. Consider reading Sections 10.3–10.10 again with the code examples open in a Python environment. |