Chapter 2 Exercises: Setting Up Your Toolkit

Contributors to Introduction to Data Science

How to use these exercises: These are hands-on exercises. Most of them require you to be sitting in front of your computer with Jupyter open. Do them in order — they build on each other. The best way to learn a tool is to use the tool, not to read about using the tool.

Difficulty key: ⭐ Foundational | ⭐⭐ Intermediate | ⭐⭐⭐ Advanced | ⭐⭐⭐⭐ Extension

Part A: Conceptual Understanding ⭐

These questions check that you understand the why behind the tools, not just the how.

Exercise 2.1 — Why Python?

In two to three sentences, explain why this book uses Python instead of Excel for data science. Now imagine you're explaining it to Marcus, our bakery owner from Chapter 1 — he's comfortable with Excel and skeptical about learning something new. How would you make the case to him specifically?

Guidance

For Marcus specifically, the most compelling arguments are: (1) **reproducibility**: when he runs the same analysis next month, the code is already written, not a series of forgotten clicks; (2) **automation**: he can write a script that processes his sales data every week without manual work; and (3) **scale**: an Excel worksheet tops out at 1,048,576 rows and often slows down well before that, while Python can handle datasets far larger than that limit. Don't lead with "Python is better"; lead with "Python solves problems you're already having." Marcus isn't interested in abstract superiority; he wants his life to be easier.

Exercise 2.2 — Kernels and notebooks

Explain the relationship between a Jupyter notebook and its kernel using an analogy. The chapter uses the analogy of "notebook = paper, kernel = brain." Come up with your own analogy that captures the same idea.

Guidance

Some good analogies: a notebook is like a script, and the kernel is like the actor performing it. A notebook is like a recipe card, and the kernel is like the kitchen where the cooking happens. A notebook is like a musical score, and the kernel is like the orchestra playing it. The key relationship to capture: the notebook *stores* the instructions and displays results, but the kernel *executes* the instructions. They're connected but separate — you can restart the kernel without losing the notebook contents.

Exercise 2.3 — Cell types

Without looking at the chapter, answer: what are the two main cell types in Jupyter, and what is each used for? What happens when you "run" each type?

Guidance

**Code cells** contain Python code. Running a code cell sends the code to the kernel for execution. The result (output) appears below the cell. **Markdown cells** contain formatted text written in Markdown syntax. Running a Markdown cell renders the formatting — converting plain-text syntax (like `**bold**`) into the formatted version (**bold**). The key insight: "run" means different things for different cell types, but the shortcut (Shift+Enter) is the same for both.

Exercise 2.4 — The out-of-order problem

Describe the "out-of-order problem" in Jupyter notebooks. Why does it happen? What is the recommended practice to catch it?

Guidance

The out-of-order problem occurs because Jupyter executes cells in whatever order you run them, not necessarily top to bottom. If you define a variable in cell 5, then use it in cell 3, the notebook will work during your session — but will fail if someone tries to run it from top to bottom, because cell 3 comes before cell 5. The recommended practice is to periodically do **Kernel > Restart & Run All**, which clears the kernel's memory and runs every cell sequentially. If the notebook still works after this, it's in good shape.

Exercise 2.5 — Anaconda's role

What is Anaconda, and why does this book recommend it over installing Python directly from python.org? Name at least two problems that Anaconda helps you avoid.

Guidance

Anaconda is a free distribution of Python bundled with Jupyter, data science libraries (pandas, NumPy, matplotlib, etc.), and the conda package manager. Problems it helps avoid: (1) **version conflicts** between libraries, (2) **missing dependencies** that cause confusing installation errors, (3) **PATH configuration issues** where the system can't find Python, and (4) the tedium of installing dozens of libraries one at a time. Anaconda provides a tested, pre-configured environment so beginners can focus on learning data science rather than system administration.

Part B: Hands-On Notebook Skills ⭐⭐

For these exercises, you should be working in a Jupyter notebook. Create a new notebook called chapter-02-exercises for your work.

Exercise 2.6 — Hello, Data Science

Create a code cell that uses print() to display the following three lines (exactly as shown):

Hello, Data Science!
My name is [your actual name].
Today I am learning Python and Jupyter.

Guidance

print("Hello, Data Science!")
print("My name is Jordan.")
print("Today I am learning Python and Jupyter.")

Each `print()` call produces one line of output. Note that the text inside the parentheses must be enclosed in quotation marks — either single quotes (`'...'`) or double quotes (`"..."`) work in Python.

Exercise 2.7 — Python as a calculator

Perform each of the following calculations in its own code cell. Before running each cell, predict what the answer will be. Then run it and check.

17 + 25
1000 - 372
24 * 60 (minutes in a day)
365 / 7 (weeks in a year — notice the decimal)
2 ** 16 (2 to the power of 16)
100 / 3 (notice the long decimal)
100 // 3 (this is integer division — what does it do?)
100 % 3 (this is the modulo operator — what does it do?)

Guidance

1. `42`
2. `628`
3. `1440`
4. `52.142857142857146`: regular division always gives a float (decimal number)
5. `65536`
6. `33.333333333333336`: Python shows many decimal places
7. `33`: integer division (`//`) divides and throws away the remainder
8. `1`: the modulo operator (`%`) gives the *remainder* after division. 100 divided by 3 is 33 remainder 1.

If you hadn't seen `//` and `%` before, that's expected. We'll cover them formally in Chapter 3. For now, just notice they exist.
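To see how `//` and `%` fit together, here is a minimal sketch using the same numbers as items 7 and 8. The variable names are illustrative only; `divmod()` is a Python built-in that returns both results at once.

```python
# Integer division and modulo split 100 / 3 into a whole part and a leftover.
quotient = 100 // 3    # how many whole times 3 fits into 100
remainder = 100 % 3    # what is left over afterwards

print("Quotient:", quotient)
print("Remainder:", remainder)

# Multiplying back and adding the remainder recovers the original number.
print("Check:", 3 * quotient + remainder)

# divmod() returns the quotient and the remainder together as a pair.
print("divmod(100, 3):", divmod(100, 3))
```

The check line is the useful habit here: quotient times divisor plus remainder should always give back the number you started with.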

Exercise 2.8 — A data science calculation

Marcus's bakery sold the following number of croissants each day last week: 42, 38, 55, 47, 61, 73, 44. Write Python code in a single cell that:

Computes the total number of croissants sold that week
Computes the average daily sales (total divided by 7)
Prints both values with descriptive labels

Guidance

total = 42 + 38 + 55 + 47 + 61 + 73 + 44
average = total / 7
print("Total croissants sold:", total)
print("Average daily sales:", average)

Output:

Total croissants sold: 360
Average daily sales: 51.42857142857143

Don't worry about rounding the average — we'll learn formatting later. The key thing is that you used variables (`total` and `average`) to store intermediate results, making the calculation readable and reusable.

Exercise 2.9 — Markdown practice

Create a Markdown cell that contains all of the following elements. Run the cell to verify the formatting looks correct.

A level-1 heading: "My Data Science Journey"
A short paragraph (2-3 sentences) about why you're interested in data science
A level-2 heading: "Things I Want to Learn"
An unordered (bulleted) list of at least 4 things you want to learn
A level-2 heading: "My Background"
A sentence that includes one bold word and one italic word
A horizontal rule (---)
A block quote with your favorite quote about learning or curiosity

Guidance

Your cell should look something like this before running:

# My Data Science Journey

I'm taking this course because I've always been curious about how people use data
to make decisions. I think data science could help me in my studies and future career.

## Things I Want to Learn

- How to make charts and visualizations
- How to clean messy data
- Basic statistics for analyzing surveys
- Machine learning fundamentals

## My Background

I have **some** experience with spreadsheets but am *completely* new to programming.

---

> "The important thing is not to stop questioning. Curiosity has its own reason for existing."
> — Albert Einstein

Exercise 2.10 — Mixing code and Markdown

Create a mini-notebook section (inside your exercises notebook) that looks like a real analysis. Include:

A Markdown cell with a level-2 heading: "Average Temperature Analysis"
A Markdown cell explaining what you're about to calculate: "Let's compute the average temperature for the week, given daily highs of 72, 68, 75, 80, 77, 73, and 71 degrees Fahrenheit."
A code cell that computes the average
A Markdown cell interpreting the result: "The average daily high was [X] degrees, suggesting a mild week overall."

Fill in the actual computed value in the interpretation cell.

Guidance

The code cell would be:

total_temp = 72 + 68 + 75 + 80 + 77 + 73 + 71
average_temp = total_temp / 7
print("Average daily high:", average_temp, "degrees F")

Output: `Average daily high: 73.71428571428571 degrees F` Your interpretation Markdown cell might say: "The average daily high was approximately 73.7 degrees Fahrenheit, suggesting a mild week overall." The pattern here — explanation, code, interpretation — is exactly the pattern you'll use in real data science notebooks.
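If you would rather let Python produce the "approximately 73.7" figure than round it by hand, one option is the built-in `round()` function. This is a small sketch rather than a required part of the exercise; `rounded_temp` is an illustrative name.

```python
# Same calculation as the guidance above, with the result rounded for reporting.
total_temp = 72 + 68 + 75 + 80 + 77 + 73 + 71
average_temp = total_temp / 7

# round(value, 1) keeps one digit after the decimal point.
rounded_temp = round(average_temp, 1)
print("Average daily high:", rounded_temp, "degrees F")
```

The full-precision value stays in `average_temp`, so rounding only affects what you report, not any later calculations.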

Part C: Keyboard Shortcut Mastery ⭐⭐

These exercises are designed to build muscle memory. Do them without using the mouse.

Exercise 2.11 — The cell creation drill

Starting from a single cell in your notebook, do the following using only keyboard shortcuts. Time yourself.

Press Esc to enter command mode
Press B five times (creating 5 new cells below)
Press A twice (creating 2 new cells above)
Navigate to the topmost new cell using arrow keys
Press Enter to go into edit mode
Type # Cell 1
Press Esc, then M (convert to Markdown)
Press Shift+Enter to run and move down
Type print("Cell 2") and press Shift+Enter
Continue numbering each cell until all 7 cells are filled

How long did it take? Try the exercise again and see if you're faster.

Guidance

The goal isn't speed — it's familiarity. After doing this 2-3 times, you should be able to create, navigate, and type in cells without thinking about which keys to press. The shortcuts should feel as natural as Ctrl+C for copy and Ctrl+V for paste. Common mistakes: forgetting to press Esc before using command-mode shortcuts (e.g., pressing B while in edit mode just types the letter "b"). If a shortcut doesn't seem to work, press Esc first.

Exercise 2.12 — The delete and undo drill

Create 4 new cells with simple content (anything — 1+1, 2+2, etc.)
Navigate to the second cell (command mode)
Press D, D to delete it
Notice it's gone. Press Z to undo the deletion
The cell should reappear
Now cut it instead: press X. Navigate elsewhere and press V to paste it.
Delete the last cell with D, D

Guidance

The key distinctions: **D, D** deletes a cell entirely. **X** cuts it (which means you can paste it elsewhere with **V**). **Z** undoes the last cell operation. Note that Z undoes *cell-level* operations (delete, cut, paste) — it's not the same as Ctrl+Z, which undoes *text editing* within a cell.

Exercise 2.13 — Cell type switching

Create a new cell
Type print("hello") (it's a code cell by default)
Press Esc to enter command mode
Press M — the cell is now Markdown. Notice the In [ ]: disappears
Press Y — the cell is code again. The In [ ]: returns
Run it with Shift+Enter to confirm it's a working code cell

Now do the reverse: create a Markdown cell, type a heading, switch it to code (Y), run it and notice what happens (does it render? does it raise an error?), then switch it back to Markdown (M) and run it to render.

Guidance

When you switch a Markdown heading like `## My Heading` to a code cell and run it, Python treats the whole line as a comment (the `#` character starts comments in Python), so the cell runs without an error and produces no output. It won't render as a heading either. Markdown text that doesn't start with `#`, such as an ordinary sentence, would typically raise a **SyntaxError** instead. The point is to understand that cell type matters: the same text behaves differently depending on whether the cell is code or Markdown.
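The behavior described above can be checked from a single code cell. The sketch below uses the built-in `exec()` to run text as if it were the contents of a code cell; the sample sentence and the variable names are made up for illustration.

```python
# A Markdown heading, treated as Python code, is just a comment.
heading_text = "## My Heading"
namespace = {}
exec(heading_text, namespace)   # runs without error and does nothing

# Nothing was defined by running it (ignoring Python's own bookkeeping entry).
defined_names = [name for name in namespace if name != "__builtins__"]
print("Names defined by the heading:", defined_names)

# An ordinary Markdown sentence is not valid Python, so it fails.
try:
    exec("Things I *want* to learn", {})
    outcome = "no error"
except SyntaxError:
    outcome = "SyntaxError"
print("Running a plain sentence as code:", outcome)
```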

Exercise 2.14 — The split shortcut

Create a code cell with two print() statements on separate lines:

print("Line one")
print("Line two")

Place your cursor at the very beginning of the second line
Press Ctrl+Shift+- (minus) to split the cell at that point
You should now have two separate cells, each with one print() statement
Run both with Shift+Enter to verify they work independently

Guidance

Cell splitting is useful when you realize one cell is doing too many things and you want to separate them. Each cell should ideally do one logical step. The split shortcut works in edit mode (you need to be inside the cell with your cursor positioned where you want the split).

Part D: Notebook Organization and Best Practices ⭐⭐⭐

These exercises push you to think about notebooks as documents, not just collections of code.

Exercise 2.15 — The well-organized notebook

Create a new notebook called exercise-2-15-organized-notebook. Build a complete mini-analysis that follows all the best practices from Section 2.6. Your notebook should include:

A title cell with your name, date, and a description
At least 3 sections with level-2 headings
Explanatory Markdown before each code cell
Interpretive Markdown after at least 2 code cells
At least 5 code cells with calculations
Comments (#) in your code cells
A conclusion section summarizing what you found

Your "analysis" can be about anything — calculating how much you spend per month, comparing distances between cities, converting recipe quantities, or anything else. The content doesn't matter as much as the structure.

Guidance

A strong submission follows the pattern: title, introduction, section heading, explanation, code, interpretation, section heading, explanation, code, interpretation, ..., conclusion. Common mistakes: (1) putting all code in one giant cell, (2) skipping Markdown explanations, (3) having code without comments, (4) no conclusion. Remember: a notebook is a *document* for humans, not just a script for a computer.

Exercise 2.16 — Naming and navigation

Answer the following questions:

Which is a better notebook filename: Untitled3.ipynb or sales-analysis-jan-2024.ipynb? Why?
Why should you avoid spaces in filenames? (What problems can spaces cause?)
Suggest a folder structure for someone working on three different data science projects simultaneously.

Guidance

1. `sales-analysis-jan-2024.ipynb` is better because it's descriptive: you can tell what's inside without opening it. `Untitled3` tells you nothing and will be confusing when you have 20 notebooks.
2. Spaces in filenames cause problems when working at the command line (you need to escape them or wrap in quotes), in URLs (they get converted to `%20`), and in some programming contexts. Use hyphens (`-`) or underscores (`_`) instead.
3. A reasonable structure:

```
data-science/
    project-a-sales-analysis/
        data/
        notebooks/
        output/
    project-b-survey-analysis/
        data/
        notebooks/
        output/
    project-c-web-scraping/
        data/
        notebooks/
        output/
```

Exercise 2.17 — The Restart & Run All test

Take any notebook you've created during these exercises. Before running the test, deliberately introduce an out-of-order dependency:

In cell 3, type y = x * 2
In cell 5, type x = 10
Run cell 5, then run cell 3. It works! (Because the kernel knows x = 10 from when you ran cell 5.)
Now do Kernel > Restart & Run All.
What happens? Why?
Fix the problem so that Restart & Run All succeeds.

Guidance

When you Restart & Run All, Jupyter runs cells from top to bottom. Cell 3 (`y = x * 2`) runs before cell 5 (`x = 10`), so Python doesn't know what `x` is yet — you'll get a **NameError**. The fix: move the `x = 10` cell above the `y = x * 2` cell, or merge them into one cell with `x = 10` on the first line and `y = x * 2` on the second line. The lesson: cells must be arranged so that they work when run sequentially from top to bottom.
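The failure and the fix can be simulated in one cell. This is only a sketch of the idea, not how Jupyter works internally: the hypothetical `run_all()` helper runs each cell's text in order inside a fresh, empty namespace, which is roughly what a restarted kernel gives you.

```python
# The two cells from the exercise, stored as text.
cell_3 = "y = x * 2"
cell_5 = "x = 10"

def run_all(cells):
    """Run cell sources top to bottom in an empty namespace."""
    namespace = {}
    for source in cells:
        exec(source, namespace)
    return namespace

# Original order: cell 3 runs before x exists.
try:
    run_all([cell_3, cell_5])
    original_order = "ok"
except NameError as err:
    original_order = "NameError"
    print("Original order fails:", err)

# Fixed order: define x first, then use it.
fixed = run_all([cell_5, cell_3])
print("Fixed order gives y =", fixed["y"])
```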

Exercise 2.18 — Notebook as story

Look at the two notebook outlines below. Which tells a better story, and why?

Notebook A:

Cell 1: 42 + 38 + 55 + 47 + 61 + 73 + 44
Cell 2: 360 / 7
Cell 3: 73 * 7
Cell 4: 73 - 51.4

Notebook B:

Cell 1 (Markdown): ## Weekly Croissant Sales Analysis
Cell 2 (Markdown): Marcus wants to know his average daily croissant sales...
Cell 3 (Code): total = 42 + 38 + 55 + 47 + 61 + 73 + 44
Cell 4 (Markdown): The total was 360 croissants. Let's find the daily average.
Cell 5 (Code): average = total / 7; print("Average:", average)
Cell 6 (Markdown): At 51.4 per day, Marcus should plan for about 360/week...

Guidance

Notebook B is clearly better. It uses Markdown to explain what's being calculated and why, gives variables meaningful names (`total`, `average` instead of bare numbers), and interprets results. Notebook A would be completely opaque to anyone (including the author, two weeks later) — there's no context for what these numbers mean. This is the difference between using Jupyter as a calculator and using it as a lab notebook. Both produce the same numbers, but only one produces understanding.

Part E: Applied Scenarios and Exploration ⭐⭐⭐

These exercises connect your new skills to the bigger picture of data science.

Exercise 2.19 — Elena's quick calculation

Elena received preliminary data showing that in her county, 3 out of 20 neighborhoods have vaccination rates below 50%. She wants to express this as a percentage. She also wants to know: if each neighborhood has approximately 15,000 residents, and a "below 50%" vaccination rate means roughly 6,500 people are vaccinated in each, how many total unvaccinated people live in these three neighborhoods?

Write a notebook section (Markdown + code) that performs these calculations with clear explanations.

Guidance

# Percentage of neighborhoods below 50%
pct_below_50 = 3 / 20 * 100
print("Percentage of neighborhoods below 50%:", pct_below_50, "%")

# Unvaccinated population in these neighborhoods
residents_per_neighborhood = 15000
vaccinated_per_neighborhood = 6500
unvaccinated_per_neighborhood = residents_per_neighborhood - vaccinated_per_neighborhood
total_unvaccinated = unvaccinated_per_neighborhood * 3
print("Unvaccinated people in low-rate neighborhoods:", total_unvaccinated)

Output:

Percentage of neighborhoods below 50%: 15.0 %
Unvaccinated people in low-rate neighborhoods: 25500

The Markdown explanations should set up *why* these calculations matter (Elena needs to prioritize resources for these neighborhoods) and interpret what the numbers mean.
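One extra check that fits naturally in the interpretation is confirming that 6,500 vaccinated residents out of 15,000 really is below the 50% threshold. A minimal sketch, reusing the variable names from the guidance code (`vaccination_rate` is an added, illustrative name):

```python
# Figures from the exercise statement.
residents_per_neighborhood = 15000
vaccinated_per_neighborhood = 6500

# Vaccination rate as a percentage of residents.
vaccination_rate = vaccinated_per_neighborhood / residents_per_neighborhood * 100

print("Vaccination rate:", round(vaccination_rate, 1), "%")
print("Below 50%?", vaccination_rate < 50)
```

Checks like this are cheap, and they catch the case where an assumed figure quietly contradicts the description it is supposed to represent.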

Exercise 2.20 — Priya's three-point comparison

Priya found the following data: In the 1999-2000 NBA season, teams averaged 14.9 three-point attempts per game. In the 2022-23 season, teams averaged 34.2 three-point attempts per game. Each season has 82 games, and there are 30 teams.

Write Python code to calculate:

1. Total three-point attempts across the entire league in each season
2. The increase from 1999 to 2023 (both as a raw number and as a percentage)

Format your output clearly with print() statements and labels.

Guidance

# Three-point attempts per season for the whole league
games_per_team = 82
teams = 30

# Each team plays 82 games, so a team's season total is
# its per-game average times 82
team_total_1999 = 14.9 * games_per_team
team_total_2023 = 34.2 * games_per_team

# The league total adds up all 30 teams
league_total_1999 = team_total_1999 * teams
league_total_2023 = team_total_2023 * teams

print("League-wide three-point attempts, 1999-00:", league_total_1999)
print("League-wide three-point attempts, 2022-23:", league_total_2023)
print("Increase:", league_total_2023 - league_total_1999)
print("Percentage increase:", (league_total_2023 - league_total_1999) / league_total_1999 * 100, "%")

The percentage increase is about 129.5%. Three-point attempts more than doubled.
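A quick way to sanity-check the 129.5% figure: the 82 games and 30 teams multiply both seasons equally, so they cancel out and the percentage increase depends only on the two per-game averages. The sketch below uses illustrative variable names.

```python
# Per-game averages from the exercise statement.
attempts_per_game_1999 = 14.9
attempts_per_game_2023 = 34.2

# How many times larger the later figure is.
ratio = attempts_per_game_2023 / attempts_per_game_1999

# A ratio of 1.0 would mean no change, so subtract 1 before converting to a percentage.
pct_increase = (ratio - 1) * 100

print("Ratio:", round(ratio, 2))
print("Percentage increase:", round(pct_increase, 1), "%")
```

A ratio of about 2.3 and an increase of about 129.5% are two ways of saying the same thing, which is why "more than doubled" is a fair summary.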

Exercise 2.21 — Jordan's grade comparison

Jordan found that in the Fall 2023 semester:

- Biology 101 (1,240 students): 312 A's, 396 B's, 285 C's, 156 D's, 91 F's
- English 101 (380 students): 87 A's, 128 B's, 98 C's, 42 D's, 25 F's

Write a notebook section that calculates the percentage of each grade for both courses and displays them clearly. Add a Markdown interpretation comparing the two distributions.

Guidance

# Biology 101 grade percentages
bio_total = 1240
print("Biology 101 Grade Distribution:")
print("  A:", 312 / bio_total * 100, "%")
print("  B:", 396 / bio_total * 100, "%")
print("  C:", 285 / bio_total * 100, "%")
print("  D:", 156 / bio_total * 100, "%")
print("  F:", 91 / bio_total * 100, "%")

print()

# English 101 grade percentages
eng_total = 380
print("English 101 Grade Distribution:")
print("  A:", 87 / eng_total * 100, "%")
print("  B:", 128 / eng_total * 100, "%")
print("  C:", 98 / eng_total * 100, "%")
print("  D:", 42 / eng_total * 100, "%")
print("  F:", 25 / eng_total * 100, "%")

Interpretation: Biology has a slightly higher A rate (25.2% vs. 22.9%) but also a higher F rate (7.3% vs. 6.6%). The distributions are broadly similar, but there are differences worth investigating — which is exactly what Jordan plans to do.
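For comparison, here is a more compact sketch of the same calculation. It relies on dictionaries, a function, and a loop, which go beyond what these exercises have used so far, so treat it as a preview rather than the expected answer. The helper name `grade_percentages` is hypothetical.

```python
# Grade counts from the exercise statement.
biology = {"A": 312, "B": 396, "C": 285, "D": 156, "F": 91}
english = {"A": 87, "B": 128, "C": 98, "D": 42, "F": 25}

def grade_percentages(counts):
    """Return each grade's share of the class as a percentage, rounded to 1 decimal."""
    total = sum(counts.values())
    return {grade: round(count / total * 100, 1) for grade, count in counts.items()}

bio_pct = grade_percentages(biology)
eng_pct = grade_percentages(english)

# Print the two distributions side by side.
print("Grade  Biology  English")
for grade in biology:
    print(" ", grade, "  ", bio_pct[grade], "  ", eng_pct[grade])
```

Computing the totals with `sum()` instead of typing 1240 and 380 also confirms that the grade counts add up to the stated enrollments.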

Exercise 2.22 — Build a Markdown reference card

Create a Markdown cell that serves as a personal reference card for Markdown syntax. Include at least 8 different formatting elements (headings, bold, italic, lists, links, code, block quotes, horizontal rules). Add examples of each that you can copy later.

This is a practical exercise — you're building a tool you'll actually use. Save this in a notebook called markdown-reference.ipynb in your course folder.

Guidance

Your reference card should be comprehensive enough that you never need to look up Markdown syntax online. Include the raw syntax *and* what it produces. One approach: create two columns in a Markdown table, with "What You Type" and "What You Get" as headers, showing each element. The fact that you have to write Markdown *in* Markdown to create this reference is wonderfully self-referential. Use code blocks (triple backticks) to show the raw syntax without it being interpreted.

Exercise 2.23 — The debugging challenge

Each of the following code cells contains an error. For each one, (a) predict what error message Python will show, (b) explain why the error occurs, and (c) write the corrected code. Test your corrections in Jupyter.

Cell A:

print("Welcome to data science!)

Cell B:

print(Hello, world!)

Cell C:

pritn("This should work")

Cell D:

total = 42 + 38 + 55
average = total / 7
print("The average is: " + average)

Cell E:

x = 10
print(x + y)

Guidance

- **Cell A:** **SyntaxError**. The closing quotation mark is missing, so Python never finds the end of the string. Should be: `print("Welcome to data science!")`
- **Cell B:** **SyntaxError**. The text isn't in quotes, so Python tries to read `Hello` and `world` as variable names and then hits the `!`, which isn't valid there. Should be: `print("Hello, world!")`
- **Cell C:** **NameError**. `pritn` is not a recognized name. It's a typo for `print`. Should be: `print("This should work")`
- **Cell D:** **TypeError**. You can't use `+` to combine a string (`"The average is: "`) with a number (`average`). Fix options: `print("The average is:", average)` (using a comma) or `print("The average is: " + str(average))` (converting the number to a string).
- **Cell E:** **NameError**. `y` hasn't been defined anywhere, so Python doesn't know what `y` is. You'd need to add `y = 5` (or whatever value) before the `print` line.

These are among the most common beginner errors. Getting comfortable reading error messages is one of the most valuable skills you can develop.
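If you want to confirm all five predictions in one go, the sketch below stores each broken cell as text and runs it with the built-in `exec()`, catching whatever error comes back. The `try`/`except` construct has not been introduced yet, so read this as a preview; the exact wording of each message can vary between Python versions.

```python
# Each broken cell from the exercise, stored as a string.
cells = {
    "A": 'print("Welcome to data science!)',
    "B": 'print(Hello, world!)',
    "C": 'pritn("This should work")',
    "D": 'total = 42 + 38 + 55\naverage = total / 7\nprint("The average is: " + average)',
    "E": 'x = 10\nprint(x + y)',
}

errors = {}
for name, source in cells.items():
    try:
        exec(source, {})  # run in a fresh, empty namespace
    except Exception as err:
        # Record the kind of error and show Python's message.
        errors[name] = type(err).__name__
        print("Cell", name, "->", type(err).__name__ + ":", err)
```

After running it, `errors` maps each cell letter to the name of the error it raised, which you can compare with your predictions.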

Exercise 2.24 — Exploring the Help system

Try each of the following in a Jupyter code cell and describe what happens:

Type print and press Shift+Tab
Type len? and run the cell
Type abs? and run the cell
Type help(round) and run the cell

What did you learn about each function? When would you use Shift+Tab versus ? versus help()?

Guidance

- **Shift+Tab** shows a brief tooltip while you're typing, which is great for a quick reminder of what arguments a function takes.
- **`?`** (e.g., `len?`) shows the docstring with more detail than Shift+Tab. In the classic Notebook interface it opens in a panel at the bottom of the window; other interfaces may show it as cell output instead.
- **`help()`** (e.g., `help(round)`) prints the full help text as cell output. It is the most detailed option.

You probably learned that: `len()` returns the length (number of items) of an object; `abs()` returns the absolute value of a number; `round()` rounds a number to a given number of decimal places.

Use Shift+Tab when you're in the middle of writing code and need a quick hint. Use `?` or `help()` when you want to learn about a function you haven't used before.
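All three tools draw on the same source: the documentation text stored on the function itself. As a minimal sketch, you can print that text directly through the `__doc__` attribute and then try each function to see whether it does what the documentation says.

```python
# The docstring that the help tools display is stored on the function.
print(abs.__doc__)
print(len.__doc__)

# help() prints the full help text as cell output.
help(round)

# Trying out what the documentation describes:
print(len("Jupyter"))        # number of characters in the string
print(abs(-7.5))             # distance from zero
print(round(3.14159, 2))     # keep two decimal places
```

Note that `len?` works only inside Jupyter (it is a notebook convenience, not Python syntax), while `help()` and `__doc__` work anywhere Python runs.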

Exercise 2.25 — Your own analysis notebook

This is the capstone exercise for the chapter. Create a new notebook called my-first-analysis and build a complete mini-analysis from scratch. Choose one of the following scenarios (or invent your own):

Option A: Personal Budget. You spent the following amounts on food this week: $12.50, $8.75, $22.30, $15.00, $9.25, $31.40, $18.60. Compute total spending, daily average, and how much you'd spend in a month at this rate. Format everything with Markdown context.

Option B: Fitness Tracker. You walked the following number of steps each day: 6200, 8400, 5100, 9800, 7300, 11200, 4500. Compute total steps, daily average, which day was highest, and how many miles that represents (assume 2,000 steps per mile). Format everything with Markdown context.

Option C: Study Time. You studied the following hours each day: 2.5, 1.0, 3.5, 2.0, 0.5, 4.0, 1.5. Compute total hours, daily average, and what percentage of each day (24 hours) you spent studying on average. Format everything with Markdown context.

Your notebook must include:

- Title cell with name and date
- At least 2 section headings
- At least 3 Markdown cells with explanations
- At least 4 code cells
- Comments in your code
- A conclusion section

Run Kernel > Restart & Run All before submitting to make sure it works.

Guidance

This exercise brings together everything from the chapter: Markdown, code cells, calculations, organization, and the "notebook as story" philosophy. There's no single right answer — the quality is in the *structure* and *clarity*, not the specific numbers. A strong submission reads like a short document, not a collection of random calculations. Someone who has never seen your notebook should be able to read it from top to bottom and understand what you analyzed, how you analyzed it, and what you found.

Reflection

After completing these exercises, take a moment to answer these questions for yourself (no need to submit — these are for your learning):

What was the most satisfying moment? (Most people say it was running their first code and seeing the output.)
What was the most frustrating moment? (Common answers: installation issues, or a shortcut not working because you were in the wrong mode.)
On a scale of 1-5, how comfortable do you feel with the Jupyter interface?
What's one thing you want to practice more before Chapter 3?

These reflections aren't graded, but they're valuable. Knowing where you are in your learning helps you decide where to go next.