Chapter 2 Quiz: Setting Up Your Toolkit

Contributors to Introduction to Data Science

Instructions: This quiz tests your understanding of Chapter 2. Answer all questions before checking the solutions. For multiple choice, select the best answer. For short answer questions, aim for 2-4 clear sentences. For code analysis questions, predict the output before checking. Total points: 100.

Section 1: Multiple Choice (8 questions, 4 points each)

Question 1. What is Anaconda?

(A) A Python library for data visualization
(B) A free distribution of Python that bundles Jupyter, data science libraries, and a package manager into one installer
(C) A type of Jupyter notebook optimized for large datasets
(D) An online platform for running Python code in the cloud

Answer

**Correct: (B)**

- **(A)** is incorrect: that would describe something like matplotlib or seaborn.
- **(B)** is correct. Anaconda is a distribution (a bundled package) that includes Python, Jupyter, conda (the package manager), and hundreds of data science libraries like pandas, NumPy, and matplotlib. It simplifies setup by giving beginners everything they need in one installer.
- **(C)** is incorrect: Anaconda is not a type of notebook.
- **(D)** is incorrect: that would describe something like Google Colab. Anaconda is installed locally on your computer.

Question 2. In Jupyter Notebook, what is the kernel?

(A) The file format used to save notebooks (.ipynb)
(B) The web browser that displays the notebook interface
(C) The computational engine that executes code sent from the notebook
(D) The toolbar at the top of the notebook that contains buttons and menus

Answer

**Correct: (C)** The kernel is the background process that receives code from the notebook, executes it (using the Python interpreter), and sends the results back to be displayed. When you "restart the kernel," you clear its memory — all variables, imported modules, and previously executed code are forgotten. The kernel runs independently of the notebook interface: the notebook is the *document*, the kernel is the *engine*.

Question 3. You're in Jupyter and want to create a new cell below the current one. Which of the following works? (Assume you're in command mode.)

(A) Press the B key
(B) Press Shift+Enter
(C) Press Ctrl+N
(D) Press Insert on your keyboard

Answer

**Correct: (A)** In command mode (press Esc first), **B** inserts a new cell **b**elow the current cell, and **A** inserts one **a**bove. Shift+Enter *runs* the current cell and moves to the next one (creating one if needed, but that's a side effect, not its purpose). Ctrl+N typically opens a new browser window, not a new cell. The Insert key has no default function in Jupyter.

Question 4. What is the main purpose of Markdown cells in a Jupyter notebook?

(A) To run Python code that is slower than normal
(B) To write formatted text — headings, paragraphs, lists, and other explanations — alongside code
(C) To import external Python libraries
(D) To save the notebook in a special compressed format

Answer

**Correct: (B)** Markdown cells contain formatted text, not code. They're used to explain what you're doing, why you're doing it, and what the results mean. They support headings, bold, italic, lists, links, block quotes, and more. When you "run" a Markdown cell (Shift+Enter), the plain-text formatting syntax gets rendered into formatted text. Without Markdown cells, a notebook would be a disconnected series of code snippets with no narrative.

Question 5. Why does the chapter recommend periodically using "Kernel > Restart & Run All"?

(A) To speed up the notebook's performance
(B) To verify that the notebook's cells work correctly when run sequentially from top to bottom
(C) To save the notebook and create a backup
(D) To update Python to the latest version

Answer

**Correct: (B)** Because Jupyter lets you run cells in any order, it's possible to create dependencies that only work in the specific order you happened to run them. For example, you might define a variable in cell 10 and use it in cell 3 — this works if you ran cell 10 first, but fails if someone runs the notebook top-to-bottom. "Restart & Run All" clears the kernel's memory and executes every cell sequentially, catching these out-of-order problems.

Question 6. Which of the following is NOT a reason the chapter gives for choosing Python over Excel for data science?

(A) Python code serves as a record of every step, making analysis reproducible
(B) Python can handle datasets with millions of rows
(C) Python is always faster than Excel for simple calculations
(D) Python can automate repetitive analyses

Answer

**Correct: (C)** The chapter explicitly acknowledges that Excel can be *faster* for simple tasks: "If you have 500 rows and want a bar chart, Excel is probably faster than Python." The advantages of Python are reproducibility (A), scalability (B), and automation (D) — not raw speed on small tasks. The case for Python is about doing things that Excel *can't* do well (or at all), not about doing everything faster.

Question 7. What does the .ipynb file extension stand for?

(A) Interactive Python Notebook
(B) IPython Notebook
(C) Integrated Python Notebook
(D) Internal Python Notebook

Answer

**Correct: (B)** The `.ipynb` extension stands for "IPython Notebook," a historical name from before the Jupyter project existed. Jupyter was originally called IPython Notebook because it only supported Python. The name "Jupyter" (introduced in 2014) comes from the three core programming languages it was designed to support: **Ju**lia, **Py**thon, and **R**. The file extension kept the old name for backward compatibility.

Question 8. When you launch Jupyter Notebook, it opens in your web browser. Which statement about this is correct?

(A) Your data is being sent to a remote server for processing
(B) You need an active internet connection to use Jupyter at all times
(C) A local notebook server runs on your computer, and the browser connects to it — everything stays on your machine
(D) Jupyter requires Google Chrome specifically and won't work in other browsers

Answer

**Correct: (C)** When you launch Jupyter, a small web server (the notebook server) starts running on your own computer. Your browser connects to this local server. No data leaves your machine. You don't need an internet connection to use Jupyter (only to install it). The browser is just being used as a display technology — Jupyter works in Chrome, Firefox, Safari, Edge, and other modern browsers.

Section 2: True or False (4 questions, 4 points each)

Question 9. True or False: In Jupyter, you must use the print() function to see any output from a code cell.

Answer

**False.** If the last line of a code cell is an expression that produces a value (like `2 + 3` or `x`), Jupyter will automatically display that value — no `print()` needed. This automatic display is a convenience feature of Jupyter's interactive environment. However, `print()` is needed when: (1) you want to display something that's not the last line, (2) you want to display multiple values, or (3) you want to control the format.
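As a small illustration of the automatic display described above, here is a sketch of a single code cell. The variable `x` and its value are made up for the example, and the comments describe what Jupyter would be expected to show.

```python
x = 5           # hypothetical value, for illustration only
x + 1           # not the last line, so Jupyter shows nothing for it
print(x + 2)    # print() shows 7 wherever it appears in the cell
x * 2           # last line is an expression, so Jupyter displays 10
```

Run as a plain Python script instead of in a notebook, only the `print()` line would produce output.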

Question 10. True or False: Restarting the kernel deletes all the code in your notebook.

Answer

**False.** Restarting the kernel clears the kernel's *memory* — all variables, imported libraries, and execution history. But the notebook's *content* (the code and Markdown in your cells) is preserved. Think of it as erasing the blackboard but keeping the textbook. After restarting, you can re-run all your cells to rebuild the kernel's state.

Question 11. True or False: Markdown cells and code cells use the same keyboard shortcut (Shift+Enter) to run.

Answer

**True.** Shift+Enter runs the current cell regardless of its type. For code cells, it sends the code to the kernel for execution. For Markdown cells, it renders the Markdown formatting. The behavior differs, but the shortcut is the same — this is by design, so you can flow through a notebook running cell after cell without thinking about types.

Question 12. True or False: JupyterLab and Jupyter Notebook are completely different tools with incompatible file formats.

Answer

**False.** JupyterLab and Jupyter Notebook (classic) use the same `.ipynb` file format and the same kernel system. JupyterLab is a newer, more feature-rich interface (with a file browser, multiple tabs, a built-in terminal, etc.), but the notebooks themselves are fully compatible between the two. Everything you learn about cells, Markdown, kernels, and shortcuts applies to both.

Section 3: Short Answer (3 questions, 6 points each)

Question 13. Explain the difference between command mode and edit mode in Jupyter. How do you switch between them? Why does Jupyter have two modes?

Answer

**Command mode** is the notebook-level mode where you select, create, delete, move, and change the type of cells. The active cell has a blue border. You enter command mode by pressing **Esc**. **Edit mode** is the cell-level mode where you type code or text inside a cell. The active cell has a green border. You enter edit mode by pressing **Enter** on a selected cell (or clicking inside it). Jupyter has two modes so that letter keys can serve double duty: in edit mode, pressing **B** types the letter "b." In command mode, pressing **B** inserts a new cell below. Without separate modes, there would be no way to use single-key shortcuts without conflicting with normal typing. This design is borrowed from the text editor Vim, which uses a similar modal approach.

Question 14. A friend tells you: "I wrote a great analysis in Excel, and I know exactly what I did. Why would I bother with notebooks?" Give two specific reasons why a Jupyter notebook would be better for sharing and repeating the analysis.

Answer

Two strong reasons:

1. **Reproducibility.** In a notebook, every step is written as code. Anyone (including your future self) can read the code to understand exactly how each result was produced, and can re-run the entire analysis to verify or replicate it. In Excel, many steps are clicks, drags, and menu selections that leave no trace. If someone asks "how did you get this number?" the only answer is "I remember" (or "I don't").
2. **Narrative context.** A notebook combines code *and* written explanations in a single document. You can explain *why* you made each analytical decision, what the results mean, and what the limitations are. An Excel spreadsheet can hold formulas and comments, but they are tucked away inside individual cells and don't form a readable narrative. A notebook reads like a report; a spreadsheet reads like a ledger.

Other valid answers include: automation (the notebook can be re-run on new data with little extra work), scalability (Python handles large datasets better than Excel), and version control (notebooks are text files, so they can be tracked with tools like Git).

Question 15. Describe three things that the very first cell of every well-organized notebook should contain, according to the chapter's best practices. Why does each one matter?

Answer

The first cell should be a Markdown cell containing:

1. **A descriptive title** (as a level-1 heading), so anyone opening the notebook immediately knows what it's about without having to read through the code.
2. **The author's name and date**, so readers know who created the analysis and when. This matters for accountability and for understanding whether the analysis might be outdated.
3. **A brief description of the notebook's purpose**: one or two sentences explaining what question the notebook addresses, what data it uses, or what project it belongs to. This helps readers (and future-you) decide whether this is the notebook they're looking for.

The underlying principle is that a notebook is a *document for humans*, not just a script for a computer. The first cell is your notebook's cover page.

Section 4: Applied Scenarios (2 questions, 7 points each)

Question 16. Priya is working on her NBA analysis notebook. She wrote the following cells in this order:

Cell 1: games = 82
Cell 2: points = games * ppg
Cell 3: ppg = 27.4
Cell 4: print("Total points:", points)

She runs them in order: Cell 1, then Cell 3, then Cell 2, then Cell 4. It works and prints a total of about 2246.8 points (because of floating-point rounding, Python may display the value as 2246.7999999999997).

Her editor asks her to share the notebook. The editor opens it and runs Kernel > Restart & Run All. What happens, and how should Priya fix it?

Answer

When the editor runs Restart & Run All, cells execute in order: Cell 1, Cell 2, Cell 3, Cell 4. Cell 2 (`points = games * ppg`) will fail with a **NameError** because `ppg` hasn't been defined yet: Cell 3 (which defines `ppg`) comes after Cell 2.

**Fix:** Priya should rearrange the cells so that dependencies are satisfied top-to-bottom:

- Cell 1: `games = 82`
- Cell 2: `ppg = 27.4`
- Cell 3: `points = games * ppg`
- Cell 4: `print("Total points:", points)`

Or she could combine Cells 1-3 into a single cell:

games = 82
ppg = 27.4
points = games * ppg

This is a classic example of the out-of-order problem. The notebook worked for Priya only because she happened to run the cells in an order that satisfied the dependencies. A well-organized notebook should always work when run top-to-bottom.
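The scenario can be reproduced outside Jupyter with a rough sketch. This is not how Jupyter is implemented: each string stands in for one cell, an empty dictionary plays the role of a freshly restarted kernel, and the helper name `restart_and_run_all` is made up for this illustration.

```python
# Each string stands in for one notebook cell, in the order it appears on the page.
original_cells = [
    "games = 82",
    "points = games * ppg",
    "ppg = 27.4",
    'print("Total points:", points)',
]
fixed_cells = [
    "games = 82",
    "ppg = 27.4",
    "points = games * ppg",
    'print("Total points:", points)',
]

def restart_and_run_all(cells):
    """Run cells top to bottom in a fresh namespace.

    Returns a message for the first NameError, or None if every cell runs.
    """
    namespace = {}  # empty dict = no variables left over from earlier runs
    for number, source in enumerate(cells, start=1):
        try:
            exec(source, namespace)
        except NameError as error:
            return f"Cell {number}: {error}"  # stop at the first failing cell
    return None

print(restart_and_run_all(original_cells))  # Cell 2: name 'ppg' is not defined
print(restart_and_run_all(fixed_cells))     # prints the total, then None
```

The original order stops at Cell 2, while the rearranged order runs through to the final `print()`.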

Question 17. Jordan creates a notebook to investigate grade distributions but doesn't use any Markdown cells. Their notebook contains only these code cells:

Cell 1: 312/1240
Cell 2: 87/380
Cell 3: 312/1240 - 87/380
Cell 4: print(312/1240 * 100)
Cell 5: print(87/380 * 100)

List three specific ways Jordan could improve this notebook by applying the best practices from Chapter 2.

Answer

Three improvements (many valid answers exist):

1. **Add Markdown cells with context and explanation.** Before the code, Jordan should have a title cell explaining that this is a grade distribution analysis, and explanatory cells before each calculation saying what's being computed (e.g., "Computing the A-rate for Biology 101 vs. English 101").
2. **Use descriptive variable names instead of raw numbers.** Instead of `312/1240`, write `bio_a_count = 312`, `bio_total = 1240`, `bio_a_rate = bio_a_count / bio_total`. This makes the code self-documenting: a reader can understand what the numbers represent without guessing.
3. **Add interpretive Markdown after the results.** After computing the rates, Jordan should write a cell explaining what the numbers mean: "Biology has a 25.2% A-rate compared to English's 22.9%. This 2.3 percentage point difference warrants further investigation." Numbers without interpretation are just numbers.

Other valid answers include: adding a title cell with name/date/purpose, adding code comments, using `print()` with descriptive labels (like `print("Bio A rate:", ...)`), running Restart & Run All to verify order, and giving the notebook a descriptive filename.
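One possible rewrite of Jordan's code cells, applying the naming and labeling suggestions, might look like the sketch below. The `bio_*` names come from the answer above; the `eng_*` names and `gap_points` are made up here to match, and the course labels assume the Biology 101 vs. English 101 wording used in the example.

```python
# Counts taken from Jordan's original cells.
bio_a_count = 312
bio_total = 1240
eng_a_count = 87   # hypothetical name, chosen to mirror the bio_* names
eng_total = 380

# A-rate = share of students who earned an A in each course.
bio_a_rate = bio_a_count / bio_total
eng_a_rate = eng_a_count / eng_total
gap_points = (bio_a_rate - eng_a_rate) * 100  # difference in percentage points

print("Bio A rate (%):", round(bio_a_rate * 100, 1))      # 25.2
print("Eng A rate (%):", round(eng_a_rate * 100, 1))      # 22.9
print("Difference (percentage points):", round(gap_points, 1))  # 2.3
```

In a real notebook, Markdown cells before and after this code would still be needed to supply the context and interpretation.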

Section 5: Code Analysis (3 questions, 6 points each)

For each question, predict the output. Then check your answer by running the code in Jupyter.

Question 18. What is the output of the following cell?

print(10 + 5)
print(10 - 5)
print(10 * 5)
print(10 / 5)
print(10 ** 2)

Answer

15
5
50
2.0
100

Notes:

- `10 / 5` produces `2.0` (a float), not `2` (an integer). In Python 3, the `/` operator always returns a float, even when the result is a whole number. To get integer division, you would use `//`.
- `10 ** 2` is "10 to the power of 2," which is 100.
- Each `print()` call produces a separate line of output.
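The difference between `/` and `//` can be checked directly by asking Python for the type of each result. This is a minimal sketch; the variable names are chosen only for illustration.

```python
true_quotient = 10 / 5    # / always produces a float
floor_quotient = 10 // 5  # // on two integers produces an integer

print(true_quotient, type(true_quotient).__name__)    # 2.0 float
print(floor_quotient, type(floor_quotient).__name__)  # 2 int
```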

Question 19. What is the output of the following cell?

x = 7
y = 3
print(x + y)
x = 10
print(x + y)

Answer

10
13

The first `print(x + y)` uses `x = 7` and `y = 3`, so it prints `10`. Then `x` is reassigned to `10`. The second `print(x + y)` uses the new value of `x` (`10`) plus `y` (`3`), so it prints `13`. The key insight: variables can be reassigned. When `x = 10` runs, it doesn't change the result of the first `print()` — that already ran. It only affects future uses of `x`. Code runs sequentially, top to bottom.
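A related point, sketched here with made-up names and values, is that a result stored before the reassignment keeps its old value. Only expressions evaluated afterwards see the new `x`.

```python
x = 7
y = 3
first_total = x + y   # computed now, while x is still 7
x = 20                # hypothetical new value; first_total is not recalculated
print(first_total)    # 10
print(x + y)          # 23
```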

Question 20. What is the output of the following cell?

a = 100
b = a / 3
c = a // 3
d = a % 3
print("Division:", b)
print("Integer division:", c)
print("Remainder:", d)
print("Check:", c * 3 + d)

Answer

Division: 33.333333333333336
Integer division: 33
Remainder: 1
Check: 100

This demonstrates the three division operators:

- `/` gives regular division with a decimal result (33.333...)
- `//` gives integer division: the result rounded down to the nearest whole number (33)
- `%` (modulo) gives the remainder after division (1, because 100 = 33 * 3 + 1)

The "Check" line verifies that `(quotient * divisor) + remainder = original number`, a useful property of integer division and modulo. This will be handy later when you need to work with remainders (like converting seconds to minutes and seconds, or distributing items evenly into groups).
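As a quick sketch of the seconds-to-minutes use mentioned above, the same two operators split a duration into whole minutes and leftover seconds. The value of `total_seconds` is an arbitrary example.

```python
total_seconds = 200             # arbitrary example duration

minutes = total_seconds // 60   # whole minutes
seconds = total_seconds % 60    # seconds left over

print(minutes, "min", seconds, "sec")          # 3 min 20 sec
print("Check:", minutes * 60 + seconds)        # Check: 200
```

Python's built-in `divmod(total_seconds, 60)` returns both values at once as the pair `(3, 20)`.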

Scoring Guide

| Section | Points |
| --- | --- |
| Section 1: Multiple Choice (8 questions × 4 points) | 32 |
| Section 2: True/False (4 questions × 4 points) | 16 |
| Section 3: Short Answer (3 questions × 6 points) | 18 |
| Section 4: Applied Scenarios (2 questions × 7 points) | 14 |
| Section 5: Code Analysis (3 questions × 6 points) | 18 |
| Extra credit for particularly insightful answers | 2 |
| Total | 100 |

Passing score: 70/100. If you scored below 70, review the relevant sections of the chapter before moving to Chapter 3. The installation and interface skills from this chapter are prerequisites for everything that follows.