Chapter 5 Quiz: Working with Data Structures

Contributors to Introduction to Data Science

Instructions: This quiz tests your understanding of Chapter 5. Answer all questions before checking the solutions. For multiple choice, select the best answer; some options may be partially correct. For code analysis questions, predict the output before checking. Total points: 109 raw, scaled to 100.

Section 1: Multiple Choice (8 questions, 4 points each)

Question 1. Which data structure would be most appropriate for storing a mapping from country ISO codes (like "BRA") to full country names (like "Brazil")?

(A) List
(B) Tuple
(C) Dictionary
(D) Set

Answer

**Correct: (C)** A dictionary maps keys to values — exactly the use case here. `iso_to_name = {"BRA": "Brazil", "CAN": "Canada", ...}` gives you instant lookup by code. A list would require searching through all entries to find a match. A tuple is immutable and indexed by position, not by key. A set stores unique values but does not store key-value pairs.
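
To make the lookup difference concrete, here is a minimal sketch. The three-entry sample and the `find_name` helper are hypothetical, chosen only for illustration.

```python
# Hypothetical sample data for illustration.
pairs = [("BRA", "Brazil"), ("CAN", "Canada"), ("DNK", "Denmark")]


def find_name(code, pairs):
    """Linear search: check entries one by one until the code matches."""
    for iso, name in pairs:
        if iso == code:
            return name
    return None


# The same data as a dictionary: lookup goes straight to the key.
iso_to_name = dict(pairs)

list_result = find_name("DNK", pairs)   # walks past BRA and CAN first
dict_result = iso_to_name["DNK"]        # no scan needed
print(list_result, dict_result)
```

Both approaches return the same name; the difference is how much work each one does as the dataset grows.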

Question 2. What will the following code print?

data = {"name": "Elena", "role": "analyst"}
print(data.get("salary", "not specified"))

(A) None
(B) KeyError
(C) "not specified"
(D) 0

Answer

**Correct: (C)** The `.get()` method returns the default value (second argument) when the key does not exist. Since `"salary"` is not in the dictionary, it returns `"not specified"`. If no default were provided, `.get()` would return `None` (choice A). Using `data["salary"]` (without `.get()`) would raise a `KeyError` (choice B).
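
The three behaviors described above can be checked side by side. This is a small sketch using the dictionary from the question; the variable names are illustrative.

```python
data = {"name": "Elena", "role": "analyst"}

# Missing key with a default: the default comes back.
with_default = data.get("salary", "not specified")

# Missing key without a default: .get() returns None.
without_default = data.get("salary")

# Bracket access on a missing key raises KeyError.
try:
    data["salary"]
    raised_key_error = False
except KeyError:
    raised_key_error = True

print(with_default, without_default, raised_key_error)
```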

Question 3. Which of the following is a valid dictionary key in Python?

(A) [1, 2, 3] (a list)
(B) (1, 2, 3) (a tuple)
(C) {"a": 1} (a dictionary)
(D) {1, 2, 3} (a set)

Answer

**Correct: (B)** Dictionary keys must be *hashable*, which means they must be immutable. Tuples are immutable and therefore hashable (as long as they contain only hashable elements). Lists (A), dictionaries (C), and sets (D) are all mutable and therefore cannot be used as dictionary keys. This is why GPS coordinates are often stored as tuples — `{(40.7, -74.0): "New York"}` works.
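
A short sketch of the hashability rule, including the caveat about tuples that contain unhashable elements. The variable names are illustrative.

```python
# A tuple of floats is hashable, so it works as a key.
locations = {(40.7, -74.0): "New York"}
city = locations[(40.7, -74.0)]

# A list is not hashable, so using it as a key raises TypeError.
try:
    bad = {[40.7, -74.0]: "New York"}
    list_error = None
except TypeError as exc:
    list_error = str(exc)

# A tuple that contains a list is not hashable either.
try:
    hash((1, [2, 3]))
    nested_ok = True
except TypeError:
    nested_ok = False

print(city, list_error, nested_ok)
```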

Question 4. What does the following list comprehension produce?

result = [x ** 2 for x in range(6) if x % 2 == 0]

(A) [0, 1, 4, 9, 16, 25]
(B) [0, 4, 16]
(C) [1, 9, 25]
(D) [4, 16, 36]

Answer

**Correct: (B)** `range(6)` produces `[0, 1, 2, 3, 4, 5]`. The filter `if x % 2 == 0` keeps only even numbers: `[0, 2, 4]`. Squaring each gives `[0, 4, 16]`. Choice (A) squares all numbers without filtering. Choice (C) squares only odd numbers. Choice (D) uses a different range.
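
If the comprehension is hard to read, it may help to write it as the equivalent loop. The sketch below assumes nothing beyond the code in the question; `loop_result` is an illustrative name.

```python
result = [x ** 2 for x in range(6) if x % 2 == 0]

# The same logic written as an explicit loop.
loop_result = []
for x in range(6):            # x takes the values 0, 1, 2, 3, 4, 5
    if x % 2 == 0:            # keep only the even values
        loop_result.append(x ** 2)

print(result == loop_result)
```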

Question 5. When reading a CSV file with csv.reader, what data type are all values returned as?

(A) The appropriate type (integers for numbers, strings for text)
(B) Strings — always
(C) Objects
(D) It depends on the CSV file's encoding

Answer

**Correct: (B)** By default, the `csv` module returns every value as a string, regardless of what the value looks like. The string `"72.3"` is not the float `72.3`. You must explicitly convert using `int()` or `float()`. This is one of the most common sources of bugs in data processing with pure Python. Forgetting this conversion leads to incorrect comparisons (e.g., `"9" > "10"` is `True` because strings are compared character by character, and `"9"` sorts after `"1"`).
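
The following sketch shows the string-only behavior without needing a file on disk. It assumes a two-row sample held in memory with `io.StringIO`; the sample values are illustrative.

```python
import csv
import io

# In-memory stand-in for a CSV file (illustrative data).
raw = io.StringIO("country,vaccination_rate\nBrazil,72.3\nChad,41.7\n")
rows = list(csv.reader(raw))

value = rows[1][1]          # second field of the first data row
value_type = type(value)    # str, even though it looks like a number
rate = float(value)         # explicit conversion is required

string_cmp = "9" > "10"     # compared character by character
number_cmp = 9 > 10         # compared numerically

print(value_type, rate, string_cmp, number_cmp)
```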

Question 6. What is the primary advantage of using csv.DictReader over csv.reader?

(A) It is faster for large files
(B) It automatically converts numeric values to the correct type
(C) It returns each row as a dictionary with column names as keys
(D) It handles missing values automatically

Answer

**Correct: (C)** `csv.DictReader` uses the header row to create a dictionary for each data row, so you can access fields by name (e.g., `row["country"]`) instead of by index (e.g., `row[0]`). This makes code more readable and less error-prone. It does not convert types (B) — all values are still strings. It is not necessarily faster (A), and it does not handle missing values (D) — empty fields become empty strings.
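
A minimal sketch of access by column name, again using an in-memory sample rather than a real file. The second row deliberately has an empty rate field to show how missing values arrive.

```python
import csv
import io

# Illustrative data; Chad's rate is left empty on purpose.
raw = io.StringIO("country,vaccination_rate\nBrazil,72.3\nChad,\n")
rows = list(csv.DictReader(raw))

first_country = rows[0]["country"]          # access by name, not index
first_rate = rows[0]["vaccination_rate"]    # still a string
missing_rate = rows[1]["vaccination_rate"]  # empty field becomes ""

print(first_country, repr(first_rate), repr(missing_rate))
```

Because empty fields come back as `""`, calling `float()` on them raises `ValueError`, so missing values need an explicit check before conversion.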

Question 7. Which statement about sets is false?

(A) Sets automatically remove duplicate values
(B) You can check membership in a set with the in operator
(C) Sets maintain the order in which items were inserted
(D) Sets support union, intersection, and difference operations

Answer

**Correct: (C)** Sets are *unordered* collections. While CPython's implementation may sometimes appear to preserve insertion order for small sets, this is not guaranteed by the language specification. You should never rely on set ordering. All other statements are true: sets remove duplicates (A), support `in` for fast membership testing (B), and support mathematical set operations (D).
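
The true statements can be demonstrated in a few lines. The region names below are illustrative; `sorted()` is used wherever a predictable order is needed.

```python
a = {"north", "south", "east"}
b = {"east", "west"}

union = a | b          # everything in either set
common = a & b         # only what both sets share
only_a = a - b         # in a but not in b

unique = set(["x", "y", "x"])   # the duplicate "x" is dropped
has_x = "x" in unique           # membership test

# Sets have no guaranteed order, so sort when order matters.
ordered = sorted(union)
print(ordered, common, sorted(only_a), has_x)
```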

Question 8. You want to add a new key-value pair to an existing dictionary. Which syntax is correct?

(A) my_dict.add("key", "value")
(B) my_dict.append("key", "value")
(C) my_dict["key"] = "value"
(D) my_dict.insert("key", "value")

Answer

**Correct: (C)** To add or modify a dictionary entry, use bracket notation with assignment: `my_dict["key"] = "value"`. Dictionaries do not have `add()` (A), `append()` (B), or `insert()` (D) methods — those belong to sets, lists, and lists respectively.

Section 2: True/False with Justification (3 questions, 5 points each)

For each statement, indicate whether it is True or False, then write 1-2 sentences justifying your answer. A correct True/False answer without justification earns only 2 of 5 points.

Question 9. "A tuple can be used as a dictionary key, but a list cannot."

Answer

**True.** Dictionary keys must be hashable, which requires immutability. Tuples are immutable (their contents cannot be changed after creation), so they are hashable and can serve as keys. Lists are mutable, which makes them unhashable and ineligible as dictionary keys. Trying to use a list as a key raises `TypeError: unhashable type: 'list'`.

Question 10. "The expression my_list.sort() and sorted(my_list) produce the same result."

Answer

**False (with nuance).** They produce the same *sorted sequence*, but they work differently. `my_list.sort()` sorts the list *in place* and returns `None` — it modifies the original list. `sorted(my_list)` returns a *new* sorted list and leaves the original unchanged. If you write `result = my_list.sort()`, the variable `result` will be `None`, which catches many beginners off guard.
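
A quick sketch of the difference, using an illustrative three-element list:

```python
my_list = [3, 1, 2]

new_list = sorted(my_list)    # returns a new list
before_sort = list(my_list)   # snapshot: the original is still unsorted

result = my_list.sort()       # sorts in place and returns None

print(new_list, before_sort, my_list, result)
```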

Question 11. "When you assign one list to another variable with b = a, modifying b will also modify a."

Answer

**True.** `b = a` does not create a copy — it creates a second reference to the *same* list object in memory. Any changes made through `b` (like `b.append(4)`) will be visible through `a` as well, because both names point to the same underlying list. To create an independent copy, use `b = a.copy()` or `b = list(a)`.
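
The aliasing behavior and the copy-based fix can be sketched as follows (variable names are illustrative):

```python
a = [1, 2, 3]
b = a                 # second name for the same list object
b.append(4)           # visible through a as well

same_object = a is b  # both names refer to one list

c = a.copy()          # independent copy
c.append(5)           # does not affect a

print(a, c, same_object)
```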

Section 3: Short Answer (3 questions, 6 points each)

Answer in 2-4 complete sentences. Clarity and precision matter more than length.

Question 12. Explain the difference between a list of dictionaries and a dictionary of lists as ways to represent tabular data. Give a one-sentence example of when each would be more convenient.

Answer

**Sample answer:** A list of dictionaries represents each *row* as a dictionary with column names as keys — this is natural when processing data record by record (e.g., iterating through patients). A dictionary of lists represents each *column* as a list — this is natural when performing operations on entire columns (e.g., computing the average of all vaccination rates). The list-of-dicts approach is more common in pure Python and matches the row-by-row output of `csv.DictReader`. The dict-of-lists approach matches how pandas DataFrames store data internally and is convenient when you create a DataFrame with `pd.DataFrame(column_dict)`.
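
To see the two layouts side by side, the sketch below converts a small list of dictionaries into a dictionary of lists and back in pure Python. The two sample rows are illustrative.

```python
# Row-oriented: one dictionary per record.
rows = [
    {"country": "Brazil", "rate": 72.3},
    {"country": "Chad", "rate": 41.7},
]

# Column-oriented: one list per column, built from the rows.
columns = {key: [row[key] for row in rows] for key in rows[0]}

# Whole-column operations are easy on the dict-of-lists form.
average_rate = sum(columns["rate"]) / len(columns["rate"])

# Converting back: zip the column lists together row by row.
rebuilt = [dict(zip(columns, values)) for values in zip(*columns.values())]

print(columns["country"], rebuilt == rows)
```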

Question 13. What does the with keyword do when opening files, and why is it important?

Answer

**Sample answer:** The `with` keyword creates a *context manager* that automatically closes the file when the indented block finishes, even if an error occurs during processing. This is important because open files consume operating system resources, and failing to close them can lead to resource leaks or data corruption (especially when writing). Without `with`, you would need to call `f.close()` manually, and you might forget — or an exception might prevent it from running. Using `with` is a best practice that makes file handling both safer and more readable.
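
The "even if an error occurs" part can be checked directly. This sketch writes to a temporary folder and raises a simulated error inside the `with` block; the file name and the error are hypothetical.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "demo.txt")  # hypothetical file name
    try:
        with open(path, "w") as f:
            f.write("first line\n")
            raise ValueError("simulated failure during processing")
    except ValueError:
        pass

    # The file was closed on the way out of the with block.
    closed_after_error = f.closed

    # What was written before the error is safely on disk.
    with open(path) as check:
        saved_text = check.read()

print(closed_after_error, repr(saved_text))
```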

Question 14. A classmate says: "I just use lists for everything — they work fine." Give two specific scenarios where using a dictionary or set would be significantly better than using a list, and explain why.

Answer

**Sample answer:** First, if you need to look up a country's vaccination rate by name, a dictionary provides instant access (`rates["Brazil"]`) while a list requires looping through all entries to find the matching country — much slower for large datasets. Second, if you need to find all unique values in a dataset (e.g., all unique regions), converting to a set removes duplicates automatically in a single operation (`unique = set(regions)`), whereas with a list you would need to write a loop that checks `if item not in seen_list` for each element, which is both slower and more code. Choosing the right data structure is not about preference — it is about correctness and efficiency.

Section 4: Code Analysis (4 questions, 6 points each)

Predict the output of each code snippet, then explain your reasoning. Write your prediction before expanding the answer.

Question 15. What does this code print?

inventory = {"apples": 5, "bananas": 3, "oranges": 8}
inventory["apples"] += 2
inventory["grapes"] = 4
del inventory["bananas"]
print(len(inventory))
print(list(inventory.keys()))

Answer

**Output:**

3
['apples', 'oranges', 'grapes']

**Explanation:** The dictionary starts with 3 keys. `inventory["apples"] += 2` modifies the existing "apples" entry (now 7) — no change in count. `inventory["grapes"] = 4` adds a new key, bringing the count to 4. `del inventory["bananas"]` removes a key, bringing the count to 3. The remaining keys, in insertion order, are `['apples', 'oranges', 'grapes']`.

Question 16. What does this code print?

data = [
    {"name": "A", "score": 90},
    {"name": "B", "score": 75},
    {"name": "C", "score": 85},
]
result = {d["name"]: d["score"] for d in data if d["score"] >= 80}
print(result)

Answer

**Output:**

{'A': 90, 'C': 85}

**Explanation:** The dictionary comprehension iterates over the list of dictionaries. The filter `if d["score"] >= 80` excludes "B" (score 75). For each remaining record, the key is the name and the value is the score. The result is a dictionary mapping names to scores for only those records with scores of 80 or above.

Question 17. What does this code print?

fruits = ["apple", "banana", "cherry"]
veggies = ["broccoli", "carrot"]
combined = fruits + veggies
fruits.append("date")
print(len(combined))
print(len(fruits))

Answer

**Output:**

5
4

**Explanation:** `fruits + veggies` creates a *new* list with 5 elements and assigns it to `combined`. This new list is independent of `fruits` and `veggies`. When `fruits.append("date")` adds an element to `fruits`, it has no effect on `combined`. So `combined` remains at 5 elements, and `fruits` grows to 4 elements.

Question 18. What does this code print?

numbers = [10, 20, 30, 40, 50]
subset = numbers[1:4]
subset[0] = 999
print(numbers)
print(subset)

Answer

**Output:**

[10, 20, 30, 40, 50]
[999, 30, 40]

**Explanation:** Slicing a list creates a *new* list (a shallow copy of that portion). `subset` is `[20, 30, 40]`, and modifying `subset[0]` to 999 does not affect the original `numbers` list. This is different from `subset = numbers`, which would create a reference to the same list object.
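
The word "shallow" matters once a list contains other mutable objects. The sketch below repeats the question's case and then adds an illustrative nested list, where the slice still shares its inner lists with the original.

```python
# Flat list: the slice is fully independent.
numbers = [10, 20, 30, 40, 50]
subset = numbers[1:4]
subset[0] = 999

# Nested list: the slice is a new outer list,
# but its elements are the same inner list objects.
nested = [[1, 2], [3, 4]]
part = nested[:1]
part[0].append(99)    # mutates the inner list shared with nested

print(numbers, subset, nested)
```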

Section 5: Applied Scenario (2 questions, 10 points each)

These problems present realistic situations. Show your code and explain your reasoning.

Question 19. Elena's Data Processing

Elena has downloaded a CSV file with vaccination data. Here is a simplified version of her code:

import csv

records = []
with open("vaccinations.csv", "r") as f:
    reader = csv.DictReader(f)
    for row in reader:
        records.append(row)

# Attempt to find countries with rates above 70%
high_vax = []
for record in records:
    if record["vaccination_rate"] > 70:
        high_vax.append(record["country"])

print(high_vax)

The CSV file contains:

country,vaccination_rate
Brazil,72.3
Chad,41.7
Denmark,93.2
Ethiopia,8.5

(a) Elena runs this code under Python 3 and it stops with a `TypeError` on the comparison line instead of printing a list of countries. What is the bug? (3 points)

(b) Write the corrected version of the filtering code. (3 points)

(c) Elena also wants to store the results as a dictionary mapping country names to their rates (as floats). Write a dictionary comprehension to do this. (4 points)

Answer

**(a)** The bug is that `csv.DictReader` returns every value as a string, so `record["vaccination_rate"] > 70` compares a string with an integer. Python 3 rejects that comparison with `TypeError: '>' not supported between instances of 'str' and 'int'`. CPython 2 allowed it and ordered every string after every integer, which is how all four countries could end up in `high_vax`. Switching the right-hand side to the string `"70"` does not help either, because string comparison is lexicographic: `"8.5" > "70"` is `True` because `"8"` sorts after `"7"`, while `"41.7" > "70"` is `False` because `"4"` sorts before `"7"`. In every case the comparison fails to do what Elena intends because the values are strings, not numbers.

**(b)** Corrected code:

high_vax = []
for record in records:
    if float(record["vaccination_rate"]) > 70:
        high_vax.append(record["country"])

**(c)** Dictionary comprehension:

rate_lookup = {record["country"]: float(record["vaccination_rate"]) for record in records}
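
Putting the parts together, here is a runnable sketch of the corrected pipeline. It assumes the four-row CSV from the question is held in memory with `io.StringIO` instead of being read from `vaccinations.csv`, so it can be run without the file.

```python
import csv
import io

# The CSV content from the question, held in memory for the sketch.
csv_text = (
    "country,vaccination_rate\n"
    "Brazil,72.3\n"
    "Chad,41.7\n"
    "Denmark,93.2\n"
    "Ethiopia,8.5\n"
)

records = list(csv.DictReader(io.StringIO(csv_text)))

# Convert to float before comparing.
high_vax = [
    record["country"]
    for record in records
    if float(record["vaccination_rate"]) > 70
]

# Country name mapped to its rate as a float.
rate_lookup = {
    record["country"]: float(record["vaccination_rate"])
    for record in records
}

print(high_vax)
print(rate_lookup["Chad"])
```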

Question 20. Choosing Data Structures for Priya's Project

Priya is building a dataset of NBA player statistics. She needs to support these operations:

1. Look up any player's stats by name (e.g., "What are LeBron James's stats?")
2. Find all unique teams represented in the dataset
3. Get a list of all players sorted by points per game
4. Store each player's season-by-season stats (multiple seasons per player)

For each operation, recommend a specific data structure or combination of structures. Write a brief Python example (2-3 lines) showing the structure and how it would be accessed. Explain why your choice is better than alternatives.

Answer

**1. Lookup by name:** A dictionary keyed by player name.

players = {"LeBron James": {"team": "LAL", "ppg": 25.7, "rpg": 7.3}, ...}
print(players["LeBron James"]["ppg"])  # Instant lookup

Better than a list because lookup by name is O(1) vs. O(n) linear search.

**2. Unique teams:** A set.

teams = {p["team"] for p in players.values()}
# or: teams = set(p["team"] for p in players.values())

Better than a list because sets enforce uniqueness automatically without needing `if team not in seen`.

**3. Sorted by PPG:** A list of dictionaries (or list of tuples), sorted.

sorted_players = sorted(players.items(), key=lambda x: x[1]["ppg"], reverse=True)

Lists maintain order and support sorting; dictionaries and sets do not naturally support ordering by value.

**4. Season-by-season stats:** A dictionary of lists (or dictionary of dictionaries keyed by season).

career = {
    "LeBron James": [
        {"season": "2022-23", "team": "LAL", "ppg": 28.9},
        {"season": "2023-24", "team": "LAL", "ppg": 25.7},
    ]
}
print(career["LeBron James"][0]["ppg"])  # First season's PPG

The outer dictionary provides fast player lookup; the inner list preserves the chronological order of seasons.

**Rubric:** 2.5 points per operation (1 point for correct structure choice, 1 point for working code example, 0.5 points for explanation of why it is better than alternatives).

Scoring & Next Steps

| Section | Questions | Points | Your Score |
| --- | --- | --- | --- |
| 1. Multiple Choice | 8 | 32 | ___ / 32 |
| 2. True/False with Justification | 3 | 15 | ___ / 15 |
| 3. Short Answer | 3 | 18 | ___ / 18 |
| 4. Code Analysis | 4 | 24 | ___ / 24 |
| 5. Applied Scenario | 2 | 20 (note: 1 extra point possible) | ___ / 20 |
| Total | 20 | 109 (scaled to 100) | ___ / 100 |

| Score | Assessment | Recommended Action |
| --- | --- | --- |
| 90-100 | Excellent | You have a strong command of Python data structures and file I/O. Proceed to Chapter 6, where you will apply these skills to a real dataset. Consider tackling the Extension exercises (Part E) for extra depth. |
| 70-89 | Proficient | You understand the core concepts. Review any questions you missed, especially in Sections 4 and 5, because the ability to read and predict code behavior is essential. If dictionary operations or file reading felt shaky, re-read Sections 5.2 and 5.6 before moving on. |
| 50-69 | Developing | Revisit the chapter, focusing on: dictionary creation and access, the difference between mutable and immutable, and the CSV reading pattern. Then retake the quiz. These skills are the foundation for everything in Chapter 6 and beyond. |
| Below 50 | Needs review | Re-read Sections 5.1-5.4 carefully, typing every code example into a Jupyter cell. Then work through Part A and Part B of the exercises before retaking this quiz. Data structures take practice, and the practice is worth it. |

Remember: this quiz tests whether you can reason about data structures and file I/O, not whether you have memorized syntax. If you can explain why a dictionary is better than a list for lookups, predict the output of a comprehension, and debug a file-reading error, you are ready for Chapter 6.