Chapter 2: Setting Up Your Toolkit: Python, Jupyter, and Your First Notebook

Contributors to Introduction to Data Science

Learning Objectives

Install Python and Jupyter via Anaconda on their operating system and verify the installation works
Navigate the Jupyter notebook interface, creating, renaming, and organizing notebooks
Execute code cells and Markdown cells, understanding the difference between them
Apply Jupyter keyboard shortcuts to increase productivity (run cell, insert cell, restart kernel)
Create a well-organized notebook with headers, explanatory text, and code cells that tells a readable story

In This Chapter

Chapter Overview
2.1 Why Python? Why Not Excel, R, or Something Else?
2.2 Installing Your Data Science Toolkit (Anaconda)
2.3 Your First Jupyter Notebook — Launching, Creating, and the Interface Tour
2.4 Code Cells: Running Your First Python Code
2.5 Markdown Cells: Making Your Notebooks Tell a Story
2.6 The Notebook as a Lab Notebook for Data
2.7 Essential Keyboard Shortcuts and Productivity Tips
Project Checkpoint: Creating Your Project Notebook
Practical Considerations
Chapter Summary
Spaced Review: Chapter 1 Concepts
What's Next

"The best time to plant a tree was twenty years ago. The second best time is now." — Chinese proverb

Chapter Overview

Here's the thing about learning to cook: you can read every cookbook in the library, watch every YouTube tutorial, memorize the difference between a julienne and a brunoise, and still not know how to make dinner. At some point, you have to walk into a kitchen, pick up a knife, and start chopping.

This chapter is when you walk into the kitchen.

In Chapter 1, you learned what data science is, why it matters, and how to think like a data scientist. You met Elena, Marcus, Priya, and Jordan. You learned the six stages of the data science lifecycle. You wrote research questions. All of that was essential — you need a map before you start a journey. But a map isn't the territory.

Today, we install the tools. We open the tools. We use the tools. By the end of this chapter, you'll have a working Python installation, you'll know your way around a Jupyter notebook, and you'll have run your first lines of code. More importantly, you'll have created the project notebook that you'll build on for the entire rest of this book.

If you've never written a line of code in your life, that's perfectly fine. That's exactly who this chapter was written for. Every single step is spelled out. Every common error is anticipated. Every "I think I broke something" moment has a fix.

Let's get started.

In this chapter, you will learn to:

Install Python and Jupyter via Anaconda on your operating system and verify the installation works (all paths)
Navigate the Jupyter notebook interface, creating, renaming, and organizing notebooks (all paths)
Execute code cells and Markdown cells, understanding the difference between them (all paths)
Apply Jupyter keyboard shortcuts to increase productivity (all paths)
Create a well-organized notebook with headers, explanatory text, and code cells that tells a readable story (all paths)

Note — Learning path annotations: All objectives in this chapter are essential for every reader. There's no "skim this" option because this is a setup chapter — if you skip something here, everything that follows will be harder.

2.1 Why Python? Why Not Excel, R, or Something Else?

Before we install anything, let's address the question you might be asking: Why Python?

It's a fair question. There are lots of tools for working with data. You've probably used some already. Excel is everywhere. Google Sheets is free. R is popular in statistics departments. MATLAB is common in engineering. SAS and SPSS dominate certain industries. So why are we spending an entire book in Python?

Here's the honest answer: Python is the most versatile, most widely used, and most beginner-friendly language for data science today. Let me explain what I mean by each of those.

Python is versatile

Excel is fantastic for looking at data in rows and columns, making quick charts, and doing calculations on small datasets. If you have 500 rows and want a bar chart, Excel is probably faster than Python. No argument there.

But Excel starts struggling when your data has 100,000 rows. Or when you need to clean up inconsistent names across ten files. Or when you want to build a predictive model. Or when someone asks you to repeat your entire analysis on next month's data. Or when you need to pull data from a website automatically. Or when you want to share your work and let someone else verify exactly what you did, step by step.

Python handles all of these. It's a general-purpose programming language, meaning it wasn't designed specifically for data science. People use Python to build websites, create games, automate office tasks, control robots, and yes, analyze data. This generality is a strength: as your needs grow, Python grows with you. You're unlikely to hit a ceiling that forces you to switch tools.

Python is widely used

If you run into a problem while learning Python, someone else has almost certainly had the same problem and posted a solution online. The Python data science community is enormous. Stack Overflow (a website where programmers help each other) has millions of Python questions answered. YouTube has thousands of Python tutorials. Python is a standard tool at most organizations that do data science work.

This matters for practical reasons: when you get stuck (and you will — everyone does), help is easy to find.

Python is beginner-friendly

Compared to languages like Java, C++, or even R, Python reads more like English. Here's a taste — don't worry about understanding this yet, just look at how readable it is:

for student in class_roster:
    if student.grade < 60:
        print(student.name, "needs extra help")

Even without any programming experience, you can probably guess what that code does: it goes through every student in the class roster, checks if their grade is below 60, and prints the name of anyone who needs extra help. That readability is not an accident — Python was explicitly designed to be clear and easy to read.
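The snippet above won't run on its own, because it assumes a class_roster already exists. Here is a minimal self-contained sketch you could run later in a notebook. The Student record and the three sample students are made up for illustration; they are not part of the book's datasets.

```python
from dataclasses import dataclass


@dataclass
class Student:
    # Hypothetical record type: the snippet above only assumes .name and .grade exist.
    name: str
    grade: float


# Made-up roster, purely for illustration.
class_roster = [
    Student("Ana", 72),
    Student("Ben", 55),
    Student("Chloe", 91),
]

needs_help = []
for student in class_roster:
    if student.grade < 60:
        print(student.name, "needs extra help")
        needs_help.append(student.name)
```

With this made-up roster, only Ben has a grade below 60, so only his name is printed.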

What about R?

R is an excellent language for statistics, and it's the primary tool in many academic statistics and biostatistics departments. If you end up in a graduate statistics program, you'll likely learn R there.

But here's the key difference: R was designed for statisticians. Python was designed for everyone. R is powerful for statistical analysis and visualization, but if you need to do anything beyond statistics — web scraping, automation, building a web application, working with APIs — Python is more natural. Since this book is teaching you data science broadly, not statistics narrowly, Python is the better starting point.

That said, the concepts you learn in this book transfer directly to R. If you learn data science thinking in Python, picking up R later is a weekend project, not a semester-long ordeal. The hard part is learning to think with data. The language is just syntax.

What about Excel?

We'll actually come back to Excel a few times in this book, because it's a tool you almost certainly have and it's genuinely useful for quick data inspection. But Excel has fundamental limitations for data science work:

Reproducibility. When you click through menus and drag cells in Excel, there's no record of what you did. If someone asks "how did you get this number?", you have to remember. In Python, every step is written down as code that anyone can read and re-run.
Scale. An Excel worksheet has a hard limit of 1,048,576 rows, and it often becomes sluggish well before that. Real datasets often exceed this.
Complexity. Try building a machine learning model in Excel. Or scraping a website. Or processing text data. It's technically possible in some cases, but it's like trying to build a house with a Swiss Army knife — it has a lot of tools, but none of them are the right size.

Marcus — our bakery owner from Chapter 1 — has been doing everything in Excel. His seasonal sales analysis lives in a spreadsheet with color-coded tabs and complex formulas that only he understands. In Case Study 2 at the end of this chapter, you'll see exactly why he's going to be grateful for the switch to Jupyter notebooks.

🔄 Check Your Understanding

Name two advantages Python has over Excel for data science work.

Why might someone who learns Python for data science find it easier to learn R later, rather than the other way around?

What does "reproducibility" mean in the context of data analysis, and why does it matter?

2.2 Installing Your Data Science Toolkit (Anaconda)

Here's the plan: we're going to install something called Anaconda, which is a free software bundle that gives us Python, Jupyter, and hundreds of useful data science libraries all in one download. Think of it like buying a toolbox that comes pre-loaded with every tool you'll need — instead of buying each tool separately.

Why Anaconda?

You could install Python by itself from python.org, and then install Jupyter separately, and then install each data science library one at a time. Some experienced programmers prefer this approach because it gives them fine-grained control over their setup.

But for a beginner, that approach is a minefield. You'll run into version conflicts, missing dependencies, PATH problems, and half a dozen other headaches that have nothing to do with data science and everything to do with system administration. Anaconda avoids all of that.

Anaconda is a free, open-source distribution of Python specifically designed for data science. When you install Anaconda, you get:

Python (the programming language itself)
Jupyter Notebook (the interactive environment we'll use constantly)
JupyterLab (a newer, more feature-rich interface — we'll mention it but focus on classic notebooks)
pandas, NumPy, matplotlib, scikit-learn, and hundreds of other libraries (we'll use these starting in Part II)
conda (a package manager that handles installing and updating libraries)

One download. One installer. Everything works together.

Before You Install: A Quick Checklist

Before we start, make sure you have:

[ ] A computer with at least 5 GB of free disk space (Anaconda is a large download — about 800 MB for the installer, and about 3-4 GB installed)
[ ] An internet connection (to download the installer)
[ ] Administrator permissions on your computer (or the ability to install software in your user directory)
[ ] About 30 minutes of time

If you're on a shared or school computer where you can't install software, see the "Practical Considerations" section at the end of this chapter for cloud-based alternatives.

Now let's install. Find your operating system below and follow the steps.

Installation: Windows

Step 1: Download Anaconda.

Open your web browser and go to the Anaconda website (anaconda.com). Navigate to the Downloads page. You should see a large download button for Windows. Make sure you're downloading the version for Python 3 (Python 2 is outdated and no longer supported). As of this writing, the current version uses Python 3.11 or 3.12, but any Python 3.x version will work fine for this book.

Click the download button. The file will be something like Anaconda3-2024.xx-Windows-x86_64.exe. It's about 800 MB, so give it a few minutes.

Step 2: Run the installer.

Double-click the downloaded .exe file. You'll see a setup wizard.

Click Next on the welcome screen.
Click I Agree on the license agreement (after reading it, of course).
For "Install for," choose Just Me (recommended). This avoids needing administrator privileges.
For the installation location, the default is usually fine (something like C:\Users\YourName\anaconda3). Don't change it unless you have a reason to.
Important: On the "Advanced Options" screen, you'll see two checkboxes:
"Add Anaconda3 to my PATH environment variable" — The installer recommends against this, but for beginners, checking this box can make things easier. If you're unsure, leave it unchecked (the Anaconda Prompt will still work).
"Register Anaconda3 as my default Python" — Check this box. It means when programs look for Python on your computer, they'll find the Anaconda version.
Click Install and wait. This takes 5-15 minutes depending on your computer.

Step 3: Verify the installation.

Once installation is complete, look for Anaconda Navigator in your Start Menu. Click it. A window should open showing several application tiles, including Jupyter Notebook, JupyterLab, Spyder, and others.

If Anaconda Navigator opens, congratulations — you're installed.

For a more thorough check, open Anaconda Prompt (also in your Start Menu — look under Anaconda3). Type the following and press Enter:

python --version

You should see something like Python 3.11.5 or similar. Then type:

jupyter --version

You should see version information for Jupyter and its components. If both commands produce output without errors, you're ready to go.

🛠️ Debugging Walkthrough: "I don't see Anaconda in my Start Menu"

If you can't find Anaconda after installation:
1. Try searching for "Anaconda" in the Windows search bar (click the magnifying glass icon on your taskbar).
2. Check whether the installation actually completed. Windows Defender or other antivirus software sometimes interrupts the installer. Try running the installer again.
3. If you chose "Install for All Users" instead of "Just Me," the Start Menu items might be in a different location. Look under "All Users" or try searching.
4. As a last resort, uninstall (via Add/Remove Programs), reboot, and try again with "Just Me."

Installation: macOS

Step 1: Download Anaconda.

Go to anaconda.com and navigate to the Downloads page. There are two options for macOS: one for Intel-based Macs and one for Apple Silicon (M1/M2/M3) Macs.

How to check which Mac you have: Click the Apple menu (top-left corner of your screen) and select "About This Mac." If it says "Chip: Apple M1" (or M2, M3, etc.), download the Apple Silicon version. If it says "Processor: Intel," download the Intel version.

Download the graphical installer (the .pkg file), not the command-line installer.

Step 2: Run the installer.

Double-click the downloaded .pkg file. Follow the prompts:

Click Continue through the introduction and license screens.
Click Agree to the license.
When asked for the installation location, choose Install for me only (this puts it in your home directory and avoids needing admin permissions for day-to-day use).
Click Install and wait.

Step 3: Verify the installation.

Open the Terminal application (you'll find it in Applications > Utilities, or search for "Terminal" in Spotlight by pressing Cmd+Space and typing "Terminal").

Type:

python --version

You should see a Python 3.x version. Then type:

jupyter --version

You should see Jupyter version information.

If you see command not found, close Terminal, open a new Terminal window, and try again. The installation needs to update your shell configuration, and this change only takes effect in new terminal windows.

🛠️ Debugging Walkthrough: "python still shows Python 2" or "command not found"

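Older versions of macOS (before 12.3) shipped with Python 2 pre-installed. If python --version shows Python 2.x or gives an error, try:

python3 --version

If python3 works but python doesn't, check which Python you're getting by typing which python3. If the path it prints includes anaconda3, you're fine. If it points somewhere else (such as /usr/bin/python3), you're seeing the system's own Python rather than Anaconda's, which means Anaconda hasn't been added to your shell yet. On modern macOS (Catalina and later), the default shell is zsh, and Anaconda should have added itself to your ~/.zshrc file. If it didn't: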

Open a new Terminal window.

Type: source ~/anaconda3/bin/activate

Try python --version again.

If that works, Anaconda just needs to be initialized for your shell. Run: conda init zsh and then restart Terminal.

Installation: Linux

Step 1: Download Anaconda.

Go to anaconda.com and download the Linux installer. It will be a .sh file (a shell script), something like Anaconda3-2024.xx-Linux-x86_64.sh.

Step 2: Run the installer.

Open your terminal and navigate to wherever you downloaded the file (usually ~/Downloads). Then run:

bash Anaconda3-2024.xx-Linux-x86_64.sh

(Replace the filename with the actual name of the file you downloaded.)

The installer will:
- Show you the license agreement (press Space to scroll, type "yes" to agree)
- Ask for an installation location (the default ~/anaconda3 is fine; press Enter)
- Ask whether to initialize Anaconda (type "yes"; this adds Anaconda to your PATH)

Step 3: Verify the installation.

Close your terminal and open a new one (this is important — the PATH changes only take effect in new terminal sessions). Then:

python --version
jupyter --version

Both should produce version information.

🛠️ Debugging Walkthrough: "conda: command not found" on Linux

If you see this error after installation:
1. Make sure you opened a new terminal window after installation.
2. Check if Anaconda added itself to your shell configuration: grep conda ~/.bashrc (or ~/.zshrc if you use zsh). You should see several lines related to conda.
3. If nothing is there, run: ~/anaconda3/bin/conda init bash (or zsh), then open a new terminal.
4. If you installed to a custom location, replace ~/anaconda3 with your actual installation path.

Action Checklist: Verifying Your Installation

After installation, verify each item:

[ ] Anaconda Navigator opens (Windows: Start Menu; macOS: Applications; Linux: type anaconda-navigator in terminal)
[ ] python --version in terminal/Anaconda Prompt shows Python 3.x
[ ] jupyter --version shows version information
[ ] conda --version shows conda version information
[ ] conda list shows a long list of installed packages (you should see numpy, pandas, matplotlib, and many others)

If all five check out, your toolkit is installed and ready. Take a moment to feel good about that — you just set up a professional data science environment on your computer.
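Once you have a notebook open (Section 2.3), you can run a similar check from inside Python itself. The sketch below is one possible way to do it, not an official Anaconda tool: check_toolkit is a helper name invented here, and the three package names are simply the ones mentioned in the checklist above.

```python
import importlib.util
import sys


def check_toolkit(packages=("numpy", "pandas", "matplotlib")):
    """Return a dict mapping each package name to True if Python can find it."""
    return {name: importlib.util.find_spec(name) is not None for name in packages}


print("Python version:", sys.version.split()[0])
for name, found in check_toolkit().items():
    print(name, "found" if found else "NOT found")
```

If any package is reported as NOT found, the notebook is probably running a Python other than the one Anaconda installed.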

🔄 Check Your Understanding

What is Anaconda, and why do we use it instead of installing Python directly?

Why is it important to open a new terminal window after installation before testing?

What command do you type to check which version of Python is installed?

2.3 Your First Jupyter Notebook — Launching, Creating, and the Interface Tour

This is the moment. Let's open Jupyter Notebook and create your first notebook.

Launching Jupyter Notebook

There are two ways to launch Jupyter:

Option 1: From Anaconda Navigator.

Open Anaconda Navigator (you found it during the verification step above). You'll see a grid of application tiles. Find the one labeled Jupyter Notebook and click the Launch button underneath it.

Option 2: From the terminal/command line.

Open a terminal (macOS/Linux) or Anaconda Prompt (Windows). Type:

jupyter notebook

and press Enter.

Either way, two things will happen:

A terminal window will open (or a command line will start running) showing some text output. This is the notebook server — the engine that runs Jupyter in the background. Don't close this window. If you close it, Jupyter stops working. Just minimize it and forget about it.
Your web browser will open to a page that shows your files and folders. This is the Jupyter dashboard.

Let's pause here because this surprises a lot of people: Jupyter runs in your web browser. It's not a separate application with its own window — it uses Chrome, Firefox, Safari, or whatever browser you normally use. But even though it looks like a website, it's running entirely on your own computer. Your data stays on your machine. You don't need an internet connection to use it (only to install it).

Why does Jupyter use a browser? The short answer: browsers are really good at displaying rich content — text, images, code, charts — all mixed together. Instead of building a custom application from scratch, the Jupyter team leveraged what browsers already do well. The notebook server running in that terminal window is a small web server on your own computer, and your browser connects to it locally. Think of it like a private website that only you can see.

The Jupyter Dashboard

When Jupyter opens in your browser, you'll see a page that looks like a file browser. It shows the contents of whatever directory (folder) you were in when you launched Jupyter.

This is the Jupyter dashboard, and it's your home base. From here, you can:

Browse your folders
Open existing notebooks
Create new notebooks
Access a terminal

Creating Your First Notebook

Let's create a fresh notebook. Here's how:

In the Jupyter dashboard, navigate to a folder where you want to keep your work. (I recommend creating a folder called data-science-course or similar. You can create a new folder by clicking the New button in the top-right corner and selecting Folder.)
Click the New button and select Python 3 (or Python 3 (ipykernel)).

A new tab will open in your browser. You're now looking at a Jupyter notebook. Congratulations — this is where you'll spend a lot of time over the coming chapters.

The Notebook Interface: A Guided Tour

Let me walk you through what you see:

The title bar. At the top of the page, you'll see the word "Untitled" (or "Untitled1"). Click on it to rename your notebook. Let's call it chapter-02-first-notebook. Press Enter or click Rename to confirm. The notebook is automatically saved as a file called chapter-02-first-notebook.ipynb in whatever folder you're in. (The .ipynb extension stands for "IPython Notebook" — a historical name from before Jupyter supported multiple languages.)

The menu bar. Below the title, you'll see a menu bar with items like File, Edit, View, Insert, Cell, Kernel, and Help. These work like menus in any other application. We'll use some of these, but for most tasks, there are faster ways.

The toolbar. Below the menu bar is a row of icons — buttons for common actions like saving, adding cells, cutting/copying/pasting cells, running cells, and more. Hover over each icon to see a tooltip describing what it does.

The cell. Below the toolbar, you'll see a gray box with In [ ]: to its left. This is a cell — the basic building block of a Jupyter notebook. Everything you do in a notebook happens inside cells. Right now you have one empty cell, and it's waiting for you to type something.

The cell type dropdown. In the toolbar, you'll see a dropdown menu that currently says "Code." This tells you what kind of cell you're looking at. The two types you'll use constantly are:

Code cells: for writing and running Python code
Markdown cells: for writing formatted text (headings, paragraphs, lists, links)

We'll explore both in detail in the next two sections.
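It can help to know what an .ipynb file actually contains: it is a plain-text JSON document that lists the notebook's cells in order. The sketch below builds a deliberately simplified notebook by hand and reads it back. The field names follow the version 4 notebook format, but the content is made up, and a real file saved by Jupyter carries more metadata than this. You don't need to type it; just notice that each cell records its type and its text.

```python
import json

# A hand-written, simplified notebook: one Markdown cell and one code cell.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# My first notebook"]},
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,  # becomes 1, 2, 3... once the cell is run
            "outputs": [],
            "source": ['print("Hello, world!")'],
        },
    ],
}

# This text is roughly what sits inside an .ipynb file on disk.
file_text = json.dumps(notebook, indent=1)

# Reading it back: list each cell's type and its first line of source.
loaded = json.loads(file_text)
cell_summary = [(cell["cell_type"], cell["source"][0]) for cell in loaded["cells"]]
for cell_type, first_line in cell_summary:
    print(cell_type, "->", first_line)
```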

What Is a Kernel?

You'll hear the word kernel a lot in Jupyter. Here's what it means:

The kernel is the computational engine that runs your code. When you type Python code into a cell and run it, the cell sends that code to the kernel, the kernel executes it, and the kernel sends the result back to be displayed below the cell.

Think of it this way: the notebook is the piece of paper where you write things down. The kernel is the brain that does the actual thinking. They're connected, but they're not the same thing.

When you created your notebook and selected "Python 3," you started a Python 3 kernel. That kernel is now running in the background, waiting for you to send it code. The In [ ]: next to each cell is where Jupyter will show the execution order — once you run a cell, it will change to In [1]:, then the next one will be In [2]:, and so on.

This execution order matters, and we'll come back to it. But first, let's run some code.
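To make the paper-and-brain picture concrete, here is a toy model of a kernel. It is only an illustration under simplifying assumptions: ToyKernel and run_cell are names invented here, and the real Jupyter kernel is a separate process that is far more capable. What the sketch does capture is that the kernel remembers what earlier cells defined and counts each run.

```python
import contextlib
import io


class ToyKernel:
    """A toy stand-in for a kernel: it remembers variables between runs."""

    def __init__(self):
        self.namespace = {}       # variables defined by earlier cells live here
        self.execution_count = 0  # the number shown in In [n]:

    def run_cell(self, source):
        """Execute one cell's code and return (execution number, printed text)."""
        self.execution_count += 1
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            exec(source, self.namespace)
        return self.execution_count, buffer.getvalue()


kernel = ToyKernel()
first = kernel.run_cell("x = 5 + 3")
second = kernel.run_cell("print(x * 2)")
print(first)   # (1, '')
print(second)  # (2, '16\n')
```

The second cell can use x only because the first cell ran before it. That is why the order in which you run cells matters.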

🔄 Check Your Understanding

What two things happen when you type jupyter notebook in the terminal?

What is the difference between the Jupyter dashboard and a Jupyter notebook?

What is a kernel, and what does it do?

2.4 Code Cells: Running Your First Python Code

Here it is. The moment you write your first code.

Hello, World

Click on the empty cell in your notebook. A cursor should appear inside it. Type exactly this:

print("Hello, world!")

Now run the cell. There are three ways to do this:

Click the Run button in the toolbar (it looks like a play button, or the word "Run").
Press Shift+Enter on your keyboard. (This runs the cell and moves to the next one.)
Press Ctrl+Enter on your keyboard. (This runs the cell and stays on the same cell.)

I strongly recommend learning Shift+Enter right now. It will become muscle memory within an hour, and it's by far the fastest way to work.

After you run the cell, you should see:

Hello, world!

appear below the cell. The In [ ]: to the left should now say In [1]:, showing that this was the first cell you ran.

Let me explain what just happened:

print() is a Python function — a command that tells Python to do something. In this case, it tells Python to display whatever is inside the parentheses.
"Hello, world!" is a string — a piece of text. The quotation marks tell Python "this is text, not code."
When you ran the cell, the notebook sent print("Hello, world!") to the kernel. The kernel executed it and sent back the result. The notebook displayed the result below the cell.

You just ran your first Python code. Seriously — take a second to appreciate that.

Arithmetic: Python as a Calculator

Python is, among other things, a very powerful calculator. Click on the next cell (or create a new one by clicking the + button in the toolbar, or by pressing B on your keyboard when you aren't editing a cell; more on this later). Type:

2 + 3

Run the cell (Shift+Enter). You'll see:

5

Try a few more. Run each one in its own cell:

100 - 37

15 * 4

144 / 12

2 ** 10

That last one uses ** for exponentiation — raising a number to a power. So 2 ** 10 means "2 to the power of 10," which is 1024.

Notice something: for these arithmetic expressions, you didn't need print(). When the last line of a code cell is an expression (something that produces a value), Jupyter automatically displays the result. This is a convenience feature — in a regular Python script, you'd need print() for everything, but Jupyter is more conversational.
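For reference, here are the results of the calculations above gathered in one cell. Two operators in this sketch, // and %, do not appear in the examples above; they are included as an optional extra. Note that / always produces a decimal number, so 144 / 12 shows 12.0 rather than 12.

```python
results = {
    "100 - 37": 100 - 37,   # subtraction
    "15 * 4": 15 * 4,       # multiplication
    "144 / 12": 144 / 12,   # division always gives a decimal (float) result
    "2 ** 10": 2 ** 10,     # exponentiation
    "17 // 5": 17 // 5,     # floor division: how many whole 5s fit in 17
    "17 % 5": 17 % 5,       # remainder left over after that division
}
for expression, value in results.items():
    print(expression, "=", value)
```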

Here's a slightly more interesting calculation. Elena wants to know what percentage of her county's population is vaccinated. Say 187,000 people have been vaccinated out of a county population of 312,000:

187000 / 312000 * 100

59.93589743589744

Almost 60%. Not bad, Elena.
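Nobody needs fourteen decimal places of a vaccination rate. As a small sketch of how you might tidy that up, the built-in round() function trims a number to a chosen number of decimal places. The variable names below are chosen here for readability; variables are introduced properly a little later.

```python
vaccinated = 187_000   # underscores are just a readable way to write 187000
population = 312_000

rate = vaccinated / population * 100
rounded_rate = round(rate, 1)  # keep one decimal place
print(rounded_rate)
```

This prints 59.9, which is much easier to drop into a sentence or a report.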

The Joy of Immediate Feedback

Notice what's happening here. You type something, you run it, and you immediately see the result. There's no waiting, no compiling, no separate "run" step. This immediate feedback loop is one of the things that makes Jupyter notebooks so powerful for data science.

Think about it from Marcus's perspective. He wants to know his average daily sales for January. He could:

Open Excel, find January's data, select the column, find the AVERAGE function... or
Type sum_january / days_in_january and hit Shift+Enter.

Both work. But the Jupyter version is written down as code, which means:

Marcus can see exactly how the average was computed.
He can change the numbers and re-run instantly.
He can share the notebook with someone who can verify his math.
Next month, he can run the same code on February's data.

That's the power of the notebook: it's not just a calculator. It's a record of your thinking.
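Here is what that might look like as an actual cell. The sales totals are hypothetical, since Marcus's real figures aren't given, and average_daily is a helper name invented for this sketch.

```python
def average_daily(total_sales, number_of_days):
    """Average sales per day over a period."""
    return total_sales / number_of_days


# Hypothetical figures, made up for illustration.
sum_january = 18_600.00   # total January sales in dollars
days_in_january = 31

january_average = average_daily(sum_january, days_in_january)
print(january_average)

# Next month: same code, different numbers (also made up).
february_average = average_daily(15_400.00, 28)
print(february_average)
```

Re-running the analysis for February takes one extra line, and anyone reading the notebook can see exactly how both averages were computed.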

Understanding Cell Output

Let's talk about what Jupyter shows you after you run a cell, because there are some subtleties.

Rule 1: print() always produces output.

print("This will always show up")

Output:

This will always show up

Rule 2: The last expression in a cell is automatically displayed.

5 + 3

Output:

8

Rule 3: If the last line is an assignment or a function call that returns nothing, there's no automatic output.

x = 5 + 3

No output. The result (8) was stored in a variable called x, but nothing was displayed. To see it, you'd need to add print(x) or just put x on its own line at the end:

x = 5 + 3
x

Output:

8

Rule 4: print() output and automatic display look slightly different.

print(8)

Output:

8

They look the same here, but with strings:

print("hello")

Output:

hello

"hello"

Output:

'hello'

The print() version shows the text. The automatic display version shows the string's representation — with quotes. This distinction won't matter much right now, but it's good to be aware of.
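If you're curious where the quotes come from, Python has two built-in functions that correspond to these two views. This is a small sketch of the idea: str() gives the text that print() writes, and repr() gives the representation that automatic display shows.

```python
text = "hello"

printed_form = str(text)     # what print() writes: hello
displayed_form = repr(text)  # what automatic display shows: 'hello'

print(printed_form)
print(displayed_form)
```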

Your Turn: Try Some Calculations

Here are some calculations to try in your notebook. Type each one in a new cell and run it. Don't just read them — actually type and run them. The muscle memory matters.

# How many hours are in a year?
365 * 24

(The # sign starts a comment — text that Python ignores. Comments are notes to yourself.)

# If Marcus sells 47 croissants on average per day, how many per week?
47 * 7

# Priya's article says NBA teams attempted 34.2 three-pointers per game in 2023.
# If there are 82 games in a season, how many total three-point attempts?
34.2 * 82

# Jordan notices that 312 students got an A in Biology and 87 got an A in English.
# If Biology had 1,240 students and English had 380, what was the A rate in each?
print("Biology A rate:", 312 / 1240 * 100, "%")
print("English A rate:", 87 / 380 * 100, "%")

That last one introduces something new: print() can take multiple arguments separated by commas, and it'll print them all with spaces between them.
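Those automatic spaces put a gap between the number and the % sign, and the rates come out with many decimal places. The sketch below shows two common ways to tidy the output, offered as optional extras rather than something you need yet: the sep argument of print(), and f-strings, which let you place values directly inside text.

```python
biology_rate = 312 / 1240 * 100
english_rate = 87 / 380 * 100

# sep="" removes the automatic spaces, so the % sits right after the number.
print("Biology A rate: ", round(biology_rate, 1), "%", sep="")

# An f-string builds the whole line in one piece; :.1f keeps one decimal place.
biology_line = f"Biology A rate: {biology_rate:.1f}%"
english_line = f"English A rate: {english_rate:.1f}%"
print(biology_line)
print(english_line)
```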

🛠️ Debugging Walkthrough: "My cell shows an error instead of a result"

If you see red text below your cell, that's an error message. Don't panic. Here are the most common errors at this stage:

SyntaxError: You typed something Python doesn't understand. Common causes:
- Missing closing quotation mark: print("hello) instead of print("hello")
- Mismatched parentheses: print("hello" (missing the closing parenthesis)
- Using curly quotes instead of straight quotes (this happens if you copy code from a word processor like Word)

NameError: You used a name that Python doesn't recognize. Common causes:
- Misspelling: prnt("hello") instead of print("hello")
- Leaving the quotation marks off text: print(hello) instead of print("hello"), which makes Python look for a variable named hello
- Using a variable before defining it: typing x + 5 before you've ever told Python what x is

The fix is always the same: Read the error message. Python is usually quite specific about what went wrong and where. Look at the line it's pointing to and check for typos.
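To see which mistake produces which error without filling your notebook with red text, you could use a small helper like the one below. It is a sketch for exploration only: describe_error is a name invented here, and it relies on try/except, a feature covered much later, so treat it as a preview.

```python
def describe_error(source):
    """Run a snippet and return the name of the error it raises, or None."""
    try:
        code = compile(source, "<cell>", "exec")  # a SyntaxError is raised here
        exec(code, {})                            # a NameError is raised here
    except (SyntaxError, NameError) as error:
        return type(error).__name__
    return None


missing_parenthesis = describe_error('print("hello"')
misspelled_name = describe_error('prnt("hello")')
undefined_variable = describe_error("x + 5")
print(missing_parenthesis, misspelled_name, undefined_variable)
```

The unclosed parenthesis is reported as a SyntaxError, while the misspelled function name and the undefined variable are both reported as a NameError.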

🔄 Check Your Understanding

What are two ways to run a cell in Jupyter?

What's the difference between print(42) and just typing 42 in a cell?

What does the # symbol do in Python code?

2.5 Markdown Cells: Making Your Notebooks Tell a Story

Code cells are where the computation happens. But a notebook that's nothing but code is like a lab report that's nothing but numbers — technically complete, maybe, but impossible to understand.

This is where Markdown cells come in.

What Is Markdown?

Markdown is a simple formatting language for writing text. It lets you create headings, bold text, italic text, lists, links, and more — using plain text characters that are easy to remember. When you "run" a Markdown cell (same Shift+Enter as code cells), the plain text gets rendered into formatted text.

Markdown was created in 2004 by John Gruber, and it's now used all over the tech world — GitHub, Reddit, Stack Overflow, Slack, and many other platforms use some version of Markdown. Learning it once means you can use it everywhere.

Creating a Markdown Cell

To create a Markdown cell:

Create a new cell (click the + button, or press B to insert below / A to insert above).
Change the cell type from "Code" to "Markdown" using the dropdown in the toolbar.
- Shortcut: Press Esc to enter command mode (we'll explain this soon), then press M to convert the current cell to Markdown. Press Y to convert it back to code.

Basic Markdown Syntax

Type each of the following examples in a Markdown cell and run it (Shift+Enter) to see the formatted result.

Headings:

# This is a big heading (level 1)
## This is a medium heading (level 2)
### This is a small heading (level 3)
#### Even smaller (level 4)

The number of # symbols determines the heading level. Level 1 is the biggest.

Bold and italic:

This word is **bold** and this word is *italic*.
You can also do ***bold and italic*** together.

Lists:

Unordered list:
- First item
- Second item
- Third item

Numbered list:
1. First step
2. Second step
3. Third step

Links:

[Click here to visit Python.org](https://www.python.org)

Code in text:

Use the `print()` function to display output.

The backticks (`) create inline code formatting — useful when you mention function names or code in your text.

Block quotes:

> "The goal is to turn data into information, and information into insight."
> — Carly Fiorina

Why Markdown Matters for Data Science

Here's why I'm spending so much time on this: the best data scientists are storytellers. They don't just produce numbers and charts — they weave them into a narrative that explains what they found, why it matters, and what should be done about it.

Markdown cells are how you tell that story in a notebook. Between your code cells, you write explanations:

What you're about to do and why
What the results mean
What you noticed that was surprising
What questions the results raise
What you'd investigate next

A notebook without Markdown is like a recipe that's just a list of ingredients with no instructions. Sure, everything is there, but good luck making dinner.

Here's an example of how a well-organized notebook section might look. You don't need to type this — just read it and notice the pattern:

## Calculating Vaccination Rates by Region

Elena's dataset includes the total population and number of vaccinated individuals
for each region. Let's compute the vaccination rate (percentage vaccinated) for
each region to identify where rates are highest and lowest.

# Calculate vaccination rate
north_rate = 45200 / 78000 * 100
south_rate = 31400 / 62000 * 100
print(f"North region: {north_rate:.1f}%")
print(f"South region: {south_rate:.1f}%")

The North region has a significantly higher vaccination rate (57.9%) compared to
the South (50.6%). In the next section, we'll explore whether this difference
correlates with distance to the nearest clinic.

See the rhythm? Markdown, code, Markdown, code. Explanation, computation, interpretation. That's the heartbeat of a good notebook.
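
A brief aside on the code in that example: it uses an f-string with the format code :.1f, which we haven't formally introduced. The sketch below reuses the North region's numbers to show what that format code does. Nothing here is required yet; it is only meant to make the example easier to read.

```python
# The same North-region numbers used in the example above.
north_rate = 45200 / 78000 * 100

print(north_rate)              # the full, unrounded value
print(f"{north_rate:.1f}%")    # text with one digit after the decimal point
print(f"{north_rate:.0f}%")    # text with no digits after the decimal point
print(round(north_rate, 1))    # round() returns a number rather than text
```

The part after the colon controls only how the number is displayed. The value stored in north_rate is unchanged.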

Try It: Your First Markdown Cells

In your notebook, create a few Markdown cells and practice:

Create a level-1 heading that says "My First Jupyter Notebook"
Write a short paragraph introducing yourself and why you're learning data science
Create a level-2 heading that says "Practice Calculations"
Create a bulleted list of three things you'd like to learn in this course

Run each cell (Shift+Enter) to see the formatting.

📋 Quick Reference: Markdown Syntax

What You Want	What You Type	What You Get
Heading 1	# Title	Large heading
Heading 2	## Subtitle	Medium heading
Heading 3	### Section	Small heading
Bold	**bold text**	bold text
Italic	*italic text*	italic text
Bullet list	- item	Bulleted item
Numbered list	1. item	Numbered item
Link	[text](url)	Clickable link
Inline code	`code`	Formatted code
Block quote	> quoted text	Indented quote
Horizontal rule	---	Horizontal line

🔄 Check Your Understanding

How do you change a cell from a code cell to a Markdown cell?

What's the difference between running a code cell and running a Markdown cell?

Why should you include Markdown cells between your code cells?

2.6 The Notebook as a Lab Notebook for Data

Scientists have been keeping lab notebooks for centuries. Charles Darwin kept meticulous notebooks during his voyage on the HMS Beagle. Marie Curie's notebooks are still radioactive. A lab notebook records what you did, what happened, and what you thought about it — creating a trail that others (and future-you) can follow.

A Jupyter notebook is the modern version of this. It's your lab notebook for data science.

Best Practices for Organizing Notebooks

Here are the habits that will save you hours of confusion later:

1. Every notebook should start with a title and description.

The very first cell should be a Markdown cell with a level-1 heading that explains what the notebook is about. Include the date, your name, and a brief description of the notebook's purpose:

# Coffee Shop Sales Analysis — January 2024
**Author:** Marcus
**Date:** January 31, 2024
**Purpose:** Analyze daily sales data from Rise & Shine Bakery to identify
trends, best-selling items, and staffing needs for February.

2. Use headers to create clear sections.

Use level-2 headings (##) for major sections and level-3 headings (###) for subsections. This creates a logical structure that's easy to navigate. If you look at the Jupyter menu under View, you may find a "Table of Contents" option that automatically creates a clickable navigation sidebar based on your headers.

3. Explain before you compute.

Before a code cell, write a Markdown cell that says what you're about to do and why. Before the results, write what they mean. Don't make your reader guess.

4. Name things clearly.

When you save your notebook, give it a descriptive name: coffee-sales-jan2024-analysis.ipynb, not Untitled3.ipynb. Use hyphens or underscores instead of spaces in filenames — spaces in filenames cause all sorts of headaches in programming.
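
As an illustration of that naming rule, here is a minimal sketch that turns a human-readable title into a lowercase, hyphenated filename. The function make_notebook_filename is a hypothetical helper, not something Jupyter provides, and it uses the re module, which we haven't covered.

```python
import re


def make_notebook_filename(title):
    """Turn a readable title into a lowercase, hyphenated notebook filename.

    make_notebook_filename is a hypothetical helper, not part of Jupyter.
    """
    # Replace every run of characters that isn't a letter or digit with one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return slug + ".ipynb"


print(make_notebook_filename("Coffee Sales Jan2024 Analysis"))
```

This prints coffee-sales-jan2024-analysis.ipynb. You can of course just type a good name by hand; the point is the pattern: lowercase, no spaces, descriptive.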

5. Keep a folder structure.

Create a main folder for your data science work, with subfolders for different projects:

data-science-course/
    chapter-02/
        chapter-02-first-notebook.ipynb
    chapter-03/
        chapter-03-variables.ipynb
    project/
        vaccination-analysis.ipynb
    data/
        (data files will go here later)
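
You can create these folders by hand in your file manager or the Jupyter dashboard. If you're curious how Python could do it, the sketch below builds the same layout with the standard library's pathlib module. The function name create_course_folders is made up for this example, and the demonstration runs in a throwaway temporary directory so it won't touch your real files.

```python
import tempfile
from pathlib import Path

# The subfolders from the layout above.
SUBFOLDERS = ["chapter-02", "chapter-03", "project", "data"]


def create_course_folders(base):
    """Create the course subfolders under `base` (hypothetical helper)."""
    base = Path(base)
    for name in SUBFOLDERS:
        # parents=True also creates `base`; exist_ok=True makes re-running safe.
        (base / name).mkdir(parents=True, exist_ok=True)
    return sorted(p.name for p in base.iterdir() if p.is_dir())


# Dry run in a temporary directory that is deleted afterwards.
with tempfile.TemporaryDirectory() as scratch:
    created = create_course_folders(Path(scratch) / "data-science-course")
    print(created)
```

To create the folders for real, you would call create_course_folders with a location of your choosing instead of the temporary directory.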

6. Restart and run all — regularly.

Here's something that catches every beginner at some point: Jupyter runs cells in whatever order you run them, not necessarily top to bottom. You might run cell 5, then go back and run cell 2, then run cell 7. The kernel remembers everything you've run, in the order you ran it.

This means your notebook might work perfectly fine for you right now but fail completely when someone else tries to run it from top to bottom, because cells depend on things that were defined in a different order.

The fix: periodically go to Kernel > Restart & Run All. This clears the kernel's memory and runs every cell from top to bottom. If it works, great — your notebook is in good shape. If it doesn't, you've found a problem to fix.

We'll revisit this concept extensively — it's so important that we give it a name: restart kernel. When you restart the kernel, you clear all variables, imported libraries, and anything else the kernel was remembering. It's a fresh start.

⚠️ Common Pitfall: The Out-of-Order Problem

New Jupyter users frequently run into this scenario:

In cell 3, you write x = 10

In cell 4, you write print(x + 5) — it prints 15. Great.

You go back to cell 3 and change it to x = 20 but forget to re-run it.

You run cell 4 again — it still prints 15, because the kernel still remembers x = 10.

The fix: always re-run cells from the top when you make changes. Or use Kernel > Restart & Run All.
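
The scenario above can be replayed in a single cell. This is only a sketch: the comments stand in for the separate cells, and the names first_result, second_result, and third_result are made up for the illustration.

```python
# "Cell 3", as it was when you first ran it.
x = 10

# "Cell 4", first run.
first_result = x + 5
print(first_result)

# Now imagine editing cell 3 so its text reads x = 20, but NOT running it.
# Nothing executes, so the kernel's copy of x is unchanged.

# "Cell 4", second run: it still uses the old value.
second_result = x + 5
print(second_result)

# Only after cell 3 is actually re-run does the kernel see the new value.
x = 20
third_result = x + 5
print(third_result)
```

The three prints show 15, 15, and 25. What matters is the code the kernel has executed, not the text currently visible in the cell.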

🔄 Check Your Understanding

Why should the first cell of a notebook be a Markdown cell with a title and description?

What happens when you "Restart & Run All" in Jupyter?

What's the "out-of-order problem," and how do you avoid it?

2.7 Essential Keyboard Shortcuts and Productivity Tips

You can use Jupyter entirely with the mouse: clicking buttons, using dropdown menus, pointing and clicking. But once you learn the keyboard shortcuts, you'll work noticeably faster. Let me teach you the ones that matter most.

Command Mode vs. Edit Mode

Jupyter has two modes, and understanding them is the key to using shortcuts:

Edit Mode: You're inside a cell, typing code or text. The cell has a green border (or blue, depending on your theme). You got here by clicking inside a cell or pressing Enter on a selected cell.
Command Mode: You're not inside any cell — you're at the notebook level, selecting and manipulating cells. The selected cell has a blue border. You got here by pressing Esc.

Think of it like the difference between typing in a Word document (edit mode) and selecting paragraphs to move them around (command mode). You switch between them constantly.

Press Esc to go to Command Mode. Press Enter to go to Edit Mode.

The Essential Shortcuts

These are the shortcuts you'll use dozens of times per session. Learn these first.

In Command Mode (press Esc first):

Shortcut	What It Does
A	Insert a new cell above the current cell
B	Insert a new cell below the current cell
D, D	Delete the current cell (press D twice)
M	Convert the current cell to Markdown
Y	Convert the current cell to code (think "pYthon")
Up/Down arrows	Move between cells
Shift+Up/Down	Select multiple cells
X	Cut the current cell
C	Copy the current cell
V	Paste the cell below
Z	Undo cell operation (not text undo — cell-level undo)
L	Toggle line numbers in the current cell

In Edit Mode (inside a cell):

Shortcut	What It Does
Shift+Enter	Run the cell and move to the next one
Ctrl+Enter	Run the cell and stay on it
Alt+Enter	Run the cell and insert a new cell below
Tab	Autocomplete (start typing a function name and press Tab)
Shift+Tab	Show documentation for the function at cursor
Ctrl+Shift+-	Split the cell at the cursor

The Three Run Shortcuts

Of all the shortcuts, these three are the ones to memorize first:

Shortcut	Behavior	When to Use It
Shift+Enter	Run and advance	Most of the time — you're working through cells top to bottom
Ctrl+Enter	Run and stay	When you're tweaking one cell and want to re-run it repeatedly
Alt+Enter	Run and insert below	When you want to run the current cell and immediately start a new one

Productivity Tips

Tip 1: Use Tab completion.

Start typing a function name and press Tab. Jupyter will either complete it or show you a list of options. For example, type pri and press Tab — Jupyter will complete it to print. This saves typing and catches typos.

Tip 2: Use Shift+Tab for help.

Put your cursor after a function name (like print) and press Shift+Tab. A tooltip will appear showing what the function does and what arguments it takes. Press Shift+Tab multiple times to expand the documentation.

Tip 3: The exclamation mark runs terminal commands.

If you ever need to run a terminal command from inside a notebook, put ! at the beginning:

!python --version

This is handy for checking versions or installing packages without leaving the notebook.
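
The ! prefix is a Jupyter feature rather than part of Python itself. If you'd rather stay in plain Python, a rough equivalent for the version check uses the standard library's platform and sys modules, as in this small sketch.

```python
import platform
import sys

print(platform.python_version())   # the version number on its own
print(sys.executable)              # the Python program this kernel is running
```

The second line is useful when you have more than one Python installed and want to confirm the notebook is using the one that came with Anaconda.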

Tip 4: Use ? for quick help.

Type a function name followed by ? and run the cell:

print?

This displays the function's documentation right in the notebook.
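
Like !, the trailing ? only works inside Jupyter. In plain Python, the same documentation is available through the built-in help() function or the function's __doc__ attribute. A minimal sketch:

```python
# Every built-in function carries its documentation in a __doc__ attribute.
doc_text = print.__doc__
print(doc_text)

# help(print) shows the same information with a little extra formatting.
```

The exact wording of the documentation varies slightly between Python versions.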

Practice: The Shortcut Workout

Try this sequence to practice shortcuts. Don't use the mouse at all:

Press Esc to make sure you're in command mode.
Press B three times to create three new cells below.
Press Up arrow twice to go back to the first new cell.
Press Enter to go into edit mode.
Type 2 + 2.
Press Shift+Enter to run it and move down.
Press Enter to go into edit mode, then type 3 * 7.
Press Shift+Enter to run it and move down.
Press Enter to go into edit mode, then type print("I'm getting faster!").
Press Ctrl+Enter to run it but stay on the cell.
Press Esc to go to command mode.
Press A to insert a cell above.
Press M to make it a Markdown cell.
Press Enter to edit it.
Type ## My Calculations.
Press Shift+Enter to render it.

If you made it through that without touching the mouse, you're already faster than most beginners. If you had to peek at the shortcut table, that's completely normal — you'll memorize them through repetition.

🔄 Check Your Understanding

What's the difference between command mode and edit mode? How do you switch between them?

What's the shortcut to insert a cell below the current one?

What's the difference between Shift+Enter and Ctrl+Enter?

Project Checkpoint: Creating Your Project Notebook

Throughout this book, you're building a progressive project — a complete data science investigation of global vaccination rates using WHO and CDC data. In Chapter 1, you defined your research questions. Now it's time to create the notebook where you'll do the actual work.

Step-by-Step: Create Your Project Notebook

Step 1: In Jupyter, navigate to your course folder and create a subfolder called project (if you haven't already).

Step 2: Inside the project folder, create a new Python 3 notebook.

Step 3: Click on "Untitled" and rename it to vaccination-analysis.

Step 4: In the first cell, change it to Markdown and type:

# Global Vaccination Rate Analysis
## A Data Science Investigation

**Author:** [Your Name]
**Date Started:** [Today's Date]
**Course:** Introduction to Data Science: From Curiosity to Code

---

### Project Overview

This notebook contains my progressive analysis of global vaccination rates,
using data from the World Health Organization (WHO) and the Centers for Disease
Control and Prevention (CDC). Throughout the course, I'll build this analysis piece by piece
— from initial exploration through statistical modeling and final communication
of results.

### Research Questions

1. How do vaccination rates differ across world regions, and what patterns
   can we identify?
2. What demographic and economic factors are associated with higher or
   lower vaccination rates?
3. Are there meaningful clusters of countries with similar vaccination
   profiles?
4. Can we predict a country's vaccination rate based on its economic
   and healthcare indicators?
5. [Add your own question here]

Run this cell (Shift+Enter) to render the Markdown.

Step 5: Add section headers for the work we'll do in future chapters. Create a new Markdown cell for each:

## Part 1: Setup and Python Basics (Chapters 2-5)
*To be completed as we progress through Part I of the textbook.*

## Part 2: Data Loading and Exploration (Chapters 6-7)
*This is where we'll first load our real dataset.*

## Part 3: Data Cleaning and Wrangling (Chapters 8-13)
*Where we'll spend a lot of time making our data usable.*

## Part 4: Visualization (Chapters 14-18)
*Charts, maps, and visual storytelling.*

## Part 5: Statistical Analysis (Chapters 19-24)
*Formal analysis and hypothesis testing.*

## Part 6: Modeling (Chapters 25-30)
*Prediction and machine learning.*

## Part 7: Communication and Ethics (Chapters 31-36)
*Telling the story and doing it responsibly.*

Step 6: Go to Kernel > Restart & Run All to make sure everything works from top to bottom.

Step 7: Save the notebook (Ctrl+S or Cmd+S).

That's it. You now have a project notebook with a clear title, a description, your research questions, and section headers for the work ahead. It's mostly empty right now — and that's the point. Over the coming chapters, you'll fill it with code, analysis, charts, and insights.

Why This Structure Matters

Remember Elena from Chapter 1? When she started her vaccination rate investigation, one of her first steps was creating a well-organized notebook with clear sections and headers. She knew from experience that a project notebook you can't navigate is a project notebook you'll abandon.

Jordan learned this the hard way in their statistics class. They had a notebook full of calculations with no explanations, and when they came back to it two weeks later to prepare for an exam, they couldn't figure out what any of it meant. "I spent more time re-understanding my own work than it would have taken to just redo it," they told a friend. Adding Markdown explanations takes a minute now and saves an hour later.

Action Checklist: Project Notebook

[ ] Created a project folder in your course directory

[ ] Created vaccination-analysis.ipynb in the project folder

[ ] Added a title cell with your name, date, and project description

[ ] Listed your research questions from Chapter 1

[ ] Added section headers for Parts 1 through 7

[ ] Ran Kernel > Restart & Run All successfully

[ ] Saved the notebook

Practical Considerations

"I can't install software on my computer"

If you're on a school or work computer that doesn't allow installations, you have several options:

Google Colab (colab.research.google.com): A free, cloud-based Jupyter notebook environment from Google. It runs in your browser and requires only a Google account. Most of the code in this book will work in Colab without modification.
JupyterLite (jupyter.org/try): A lightweight version of Jupyter that runs entirely in your browser with no installation or account needed. It's more limited than a full installation but fine for basic work.
Binder (mybinder.org): Creates a temporary cloud environment from a GitHub repository. Some course materials may be available this way.

For most of this book, I'll assume you have a local Anaconda installation. But I'll note when something works differently in Colab.

"The installation failed and I don't know why"

Here's a systematic approach:

Check disk space. Anaconda needs about 5 GB. Free up space if needed.
Disable antivirus temporarily. Some antivirus programs interfere with the Anaconda installer. Disable it for the installation, then re-enable it afterward.
Try Miniconda instead. If Anaconda is too large, Miniconda is a minimal installer that includes only Python, conda, and their dependencies. Its download is about 50 MB, compared with about 800 MB for the full Anaconda installer. You can then install Jupyter separately with conda install jupyter.
Check for PATH conflicts. If you have a previous Python installation, it might conflict with Anaconda. Consider uninstalling the old Python first.
Use the Anaconda troubleshooting guide. The official Anaconda documentation has a detailed troubleshooting section at docs.anaconda.com.
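
If some version of Python is already available on the machine, two of the checks above (disk space and conflicting Python installations) can be roughly automated. The sketch below uses only the standard library. The helper free_space_gb is a made-up name, and the 5 GB figure is simply the estimate quoted in the list.

```python
import shutil
import sys
from pathlib import Path

REQUIRED_GB = 5  # the rough figure quoted in the checklist above


def free_space_gb(folder):
    """Return the free space, in gigabytes, on the drive holding `folder`.

    free_space_gb is a hypothetical helper written for this illustration.
    """
    usage = shutil.disk_usage(folder)
    return usage.free / 1024**3


home_free = free_space_gb(Path.home())
print(f"Free space: {home_free:.1f} GB (about {REQUIRED_GB} GB needed)")

# Which Python is running this code, and which one does the terminal find first?
print("Running Python:", sys.executable)
print("First python on PATH:", shutil.which("python"))
```

If the two paths at the end point to different locations, that is a hint that more than one Python is installed. shutil.which returns None when no program named python is found on the PATH.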

"Jupyter opens in a browser I don't want"

Jupyter uses your system's default browser. To change this:

Generate the Jupyter config file: jupyter notebook --generate-config
Open the generated file (it'll tell you where it is).
Find the line # c.NotebookApp.browser = '' and change it to (for example) c.NotebookApp.browser = 'chrome'.

"I see a 'Kernel not found' error"

This means Jupyter can't find the Python kernel. Try:

In a terminal/Anaconda Prompt, run: python -m ipykernel install --user
Restart Jupyter and try again.
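
Before reinstalling anything, it can help to confirm whether the ipykernel package is visible to your Python at all. Here is a minimal sketch using the standard library's importlib. The helper has_package is a made-up name. Because the notebook kernel isn't working in this situation, you would run this with python in a terminal or Anaconda Prompt rather than in a notebook.

```python
import importlib.util


def has_package(name):
    """Return True if Python can find an installed package with this name.

    has_package is a hypothetical helper written for this illustration.
    """
    return importlib.util.find_spec(name) is not None


print("ipykernel installed:", has_package("ipykernel"))
```

If this prints False, the package itself is missing, and installing it (for example with conda install ipykernel) would come before the kernel registration command above.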

Classic Jupyter Notebook vs. JupyterLab

When you install Anaconda, you get both Jupyter Notebook (the classic interface) and JupyterLab (a newer, more feature-rich interface). Throughout this book, we use the classic Jupyter Notebook because:

It's simpler and less overwhelming for beginners
Screenshots and instructions are more consistent
Most online tutorials and courses use it

JupyterLab is excellent, and once you're comfortable with notebooks, I encourage you to try it. It has a file browser sidebar, multiple tabs, a built-in terminal, and a more modern look. You can launch it from Anaconda Navigator or by typing jupyter lab in the terminal.

Everything you learn about cells, Markdown, kernels, and shortcuts works the same way in JupyterLab.

Saving and sharing notebooks

Jupyter notebooks are saved as .ipynb files. These files contain your code, Markdown text, and all outputs (including charts). You can:

Email them as attachments
Upload them to Google Colab
Put them on GitHub (which renders notebooks nicely)
Convert them to other formats: jupyter nbconvert --to html your-notebook.ipynb creates an HTML file, and --to pdf creates a PDF (requires LaTeX)

The .ipynb file is a text file in JSON format — you can even open it in a text editor, though you wouldn't want to edit it that way.
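
To make the JSON point concrete, here is a sketch that builds a tiny, hand-written notebook structure and reads it back with Python's json module. The structure is simplified for illustration; real notebooks written by Jupyter carry more metadata, and you would normally let Jupyter write the file for you.

```python
import json

# A hand-written, simplified notebook: one Markdown cell and one code cell.
notebook = {
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# My First Jupyter Notebook"],
        },
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ["print(2 + 3)"],
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 4,
}

# This is roughly the text that would sit inside the .ipynb file.
as_text = json.dumps(notebook, indent=1)

# Reading it back gives ordinary Python dictionaries and lists.
loaded = json.loads(as_text)
cell_types = [cell["cell_type"] for cell in loaded["cells"]]
print(cell_types)
```

Each cell records its type and its source text, which is how Jupyter knows whether to run a cell as Python or render it as Markdown.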

Chapter Summary

What You Installed

Anaconda — a free distribution that bundles Python, Jupyter, and hundreds of data science libraries into one installer
Python — the programming language we'll use throughout this book
Jupyter Notebook — the interactive environment where we write and run code alongside explanatory text

What You Learned

Why Python: It's versatile (handles everything from quick calculations to machine learning), widely used (massive community and abundant help), and beginner-friendly (reads almost like English).
The Jupyter interface: Notebooks consist of cells — either code cells (for Python) or Markdown cells (for formatted text). The kernel is the engine that executes your code. The notebook server runs in the background.
Code cells: You write Python code and run it with Shift+Enter. Jupyter displays the output immediately below the cell. Python works as a calculator, and print() displays values explicitly.
Markdown cells: You write formatted text using simple syntax — # for headings, ** for bold, * for italic, - for lists. Running a Markdown cell renders the formatting.
Organization: Start every notebook with a title and description. Use headers to create structure. Explain before you compute. Name files descriptively. Use "Restart & Run All" to verify your notebook works from top to bottom.
Keyboard shortcuts: Esc/Enter to switch between command and edit mode. Shift+Enter to run. A/B to insert cells. M/Y to switch cell types. These will become second nature.

Key Terms Introduced

Term	Definition
Python	A general-purpose programming language widely used in data science for its readability and rich ecosystem of libraries
Anaconda	A free distribution of Python that includes Jupyter, data science libraries, and the conda package manager
Jupyter notebook	An interactive document that combines code, text, and visualizations in a single file
Kernel	The computational engine that executes code in a Jupyter notebook
Code cell	A cell containing Python code that can be executed to produce output
Markdown cell	A cell containing formatted text written in Markdown syntax
Cell execution	The process of running a cell's contents — sending code to the kernel or rendering Markdown
Terminal	A text-based interface for interacting with your computer's operating system
Command line	The prompt in a terminal where you type commands
Environment	A self-contained collection of Python packages and their versions, managed by tools like conda
IDE	Integrated Development Environment — a software application for writing code (Jupyter is a type of IDE)
Notebook server	The background process that connects your browser to the Jupyter kernel
Cell output	The result displayed below a cell after execution
Restart kernel	Clearing the kernel's memory and state, erasing all variables and computations

Spaced Review: Chapter 1 Concepts

Before moving on, let's make sure the ideas from Chapter 1 are sticking. Answer these without looking back.

🔄 Retrieval Practice — Chapter 1 Review

What are the six stages of the data science lifecycle? (Try to name all six from memory.)

What's the difference between a descriptive question and a causal question? Give one example of each.

Elena, Marcus, Priya, and Jordan — can you remember what each person is investigating?

What does it mean to say "data science is a way of thinking, not a set of tools"?

Which stage of the data science lifecycle do practitioners spend the most time on?

If you got all five, great — the concepts from Chapter 1 are solid. If any felt fuzzy, spend five minutes reviewing the relevant section of Chapter 1 before continuing. The spaced review works precisely because forgetting a little and then re-learning strengthens the memory.

What's Next

In this chapter, you installed your tools, created your first notebook, ran your first code, and learned Markdown. You've got a working data science environment and the beginnings of a project notebook. That's a huge step — you went from "I've never written code" to "I have a running Python environment and I know my way around it."

But so far, we've only used Python as a calculator. In Chapter 3: Python Fundamentals I — Variables, Data Types, and Expressions, we'll start programming for real. You'll learn how to store information in variables, work with different types of data (numbers, text, true/false values), and build expressions that compute useful results. You'll store Marcus's sales figures in variables, compute Priya's shooting percentages with formulas, and start building the vocabulary you need to talk to Python fluently.

The gap between "I can run 2 + 3" and "I can analyze a dataset" might feel enormous right now. It isn't. It's a series of small, manageable steps — and Chapter 3 is the next one. You've already proven you can do this. Let's keep going.

🔗 Connection: The Markdown skills you learned in this chapter will be essential in every chapter going forward. Every project checkpoint asks you to combine code and Markdown into a coherent narrative. In Chapter 6, when you do your first real data analysis, the quality of your Markdown explanations will determine whether your notebook is a useful document or an unreadable mess. Practice Markdown now — your future self will thank you.
