

Learning Objectives

  • Create variables with descriptive names and assign values of different data types (int, float, str, bool)
  • Evaluate arithmetic expressions using Python operators and predict their results, including operator precedence
  • Convert between data types using int(), float(), str(), and bool() and explain when conversion is necessary
  • Manipulate strings using concatenation, f-strings, indexing, slicing, and common string methods
  • Diagnose common beginner errors (NameError, TypeError, SyntaxError) by reading error messages

Chapter 3: Python Fundamentals I — Variables, Data Types, and Expressions

"Programs must be written for people to read, and only incidentally for machines to execute." — Harold Abelson and Gerald Jay Sussman, Structure and Interpretation of Computer Programs


Chapter Overview

In Chapter 2, you installed Python, launched Jupyter, and typed a few things into code cells. You printed "Hello, world." You ran 2 + 2 and saw 4. You experienced the thrill of telling a computer to do something and having it actually do it.

Now we're going to learn the language.

This chapter covers the absolute fundamentals of Python programming: how to store data in variables, what data types are and why they matter, how to do math with expressions and operators, how to work with strings (text), and what to do when Python yells at you with an error message. These are the building blocks. Everything else in this book — loading datasets, making charts, building models — rests on these foundations.

If you've never programmed before, this chapter is going to move at your pace. Every concept gets multiple examples. Every new idea gets a "type this and run it" moment. And every common mistake gets its own debugging section, because errors aren't failures — they're how programming works.

In this chapter, you will learn to:

  1. Create variables with descriptive names and assign values of different data types (all paths)
  2. Evaluate arithmetic expressions using Python operators and predict their results, including operator precedence (all paths)
  3. Convert between data types using int(), float(), str(), and bool() and explain when conversion is necessary (all paths)
  4. Manipulate strings using concatenation, f-strings, indexing, slicing, and common string methods (all paths)
  5. Diagnose common beginner errors (NameError, TypeError, SyntaxError) by reading error messages (all paths)

A quick word before we start: programming feels awkward at first. You'll mistype things. You'll forget a quotation mark. You'll write Print when you meant print, or you'll use = when you meant ==. This is normal, and it doesn't mean you're bad at it. It means you're a beginner, which is temporary. Every single person who writes code today started exactly where you are now.

Let's write some Python.


3.1 Your First Variable: Giving Names to Data

Open a new Jupyter notebook (or use the project notebook you created in Chapter 2). In the first empty code cell, type this and press Shift+Enter:

patient_count = 4521

Nothing happens. No output, no fanfare. But something important just occurred: you created a variable. You gave the name patient_count to the number 4521, and Python is now holding onto that association for you.

To see that Python remembers, type this in the next cell:

print(patient_count)
4521

There it is. You stored a value, gave it a name, and retrieved it. That's what variables do.

What Is a Variable?

A variable is a name that refers to a value stored in your computer's memory. When you write patient_count = 4521, you're telling Python: "create a value 4521 and attach the label patient_count to it so I can refer to it later."

The = sign here is called the assignment operator. It doesn't mean "equals" in the mathematical sense. It means "assign the value on the right to the name on the left." Think of it as an arrow pointing left: patient_count <- 4521.

Let's create a few more variables. Type each of these in separate cells (or the same cell — either works):

vaccination_rate = 0.73
city_name = "Minneapolis"
study_complete = True

Together with patient_count from earlier, you now have four variables holding four different types of data: a whole number, a decimal number, some text, and a True/False value. We'll explore each type in detail shortly.

🚪 Threshold Concept — Variables Are Labels, Not Boxes

Here's a mental model that trips up many beginners and causes confusion later. You might imagine a variable as a box that contains a value — like a shoebox with the number 4521 inside it. That's intuitive, but it's wrong in a way that will bite you eventually.

A better mental model: a variable is a sticky note that you attach to a value. The value 4521 exists in Python's memory, and the name patient_count is a label stuck onto it. The label points to the value; it doesn't contain it.

Why does this matter? Because you can stick multiple labels on the same value:

patient_count = 4521
total_patients = patient_count

Now both patient_count and total_patients refer to the same value. You didn't make a copy of 4521. You put two sticky notes on the same thing.

And you can peel a label off one value and stick it on another:

patient_count = 4521
patient_count = 5000  # The label now points to 5000

The value 4521 didn't change — the name patient_count was simply reassigned to a different value. This distinction between "the name moved" and "the value changed" becomes crucial when we work with lists and dictionaries in Chapter 5. For now, just remember: variables are labels, not boxes.
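If you want to see the sticky-note model for yourself, Python's built-in is operator checks whether two names refer to the very same object in memory (not merely equal values). This is a small demonstration sketch reusing the names above; for everyday value comparisons you'll use == instead, which Section 3.4 covers.

```python
patient_count = 4521
total_patients = patient_count          # a second label on the same value

# `is` asks: "are these two names stuck on the same object?"
print(total_patients is patient_count)  # True

patient_count = 5000                    # peel the patient_count label off and move it
print(patient_count)                    # 5000
print(total_patients)                   # 4521 (this label did not move)
print(total_patients is patient_count)  # False
```

Reassigning patient_count moved only that one label; total_patients still points at 4521.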

Naming Rules and Conventions

Python has rules about what you can name a variable. These are the rules — break them and Python will throw a SyntaxError:

  1. Names can contain letters, numbers, and underscores. That's it. No spaces, no dashes, no special characters.
  2. Names cannot start with a number. patient_count is fine; 3rd_patient is not.
  3. Names are case-sensitive. Patient_Count, patient_count, and PATIENT_COUNT are three different variables.
  4. Names cannot be Python keywords. Words like if, for, True, False, and, or, not, in, class, def, and about 25 others are reserved by Python. You can't use them as variable names.
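You can ask Python to check these rules for you. Strings have a built-in isidentifier() method that covers the character rules (rules 1 and 2), and the standard-library keyword module lists the reserved words. A quick sketch; note that isidentifier() alone does not reject keywords, so you need both checks:

```python
import keyword

# isidentifier(): only letters, digits, underscores, and no leading digit
print("patient_count".isidentifier())     # True
print("3rd_patient".isidentifier())       # False: starts with a number
print("vaccination rate".isidentifier())  # False: contains a space

# isidentifier() does NOT reject reserved words, so check those separately
print("for".isidentifier())               # True
print(keyword.iskeyword("for"))           # True: reserved, so not usable as a name

# The full list of reserved keywords for your Python version
print(keyword.kwlist)
```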

Beyond the rules, Python has conventions — habits that experienced programmers follow to make code readable:

  • Use snake_case for variable names: patient_count, vaccination_rate, city_name. Words separated by underscores, all lowercase.
  • Choose descriptive names. vaccination_rate is better than vr. patient_count is better than x. Future you (and anyone who reads your code) will thank present you.
  • Avoid single-letter names except for throwaway counters (like i in a loop, which we'll see in Chapter 4).

Here are some examples of good and bad variable names, so you can build the instinct:

Name | Verdict | Why
patient_count | Good | Descriptive, snake_case
x | Avoid | What does x mean? Nobody knows
patientCount | Legal but non-Pythonic | This is camelCase: common in Java, not Python
2nd_dose | Invalid | Starts with a number
vaccination rate | Invalid | Contains a space
for | Invalid | Reserved keyword
total_vaccination_count_for_region_northwest | Legal but too long | Be descriptive, not encyclopedic
tot_vacc_ct | Legal but too abbreviated | Be readable, not cryptic
total_vaccinations | Good | Clear, reasonable length

Try it yourself. Type the following in a code cell and run it:

x = 42
temperature_fahrenheit = 98.6
my_name = "your name here"
print(x)
print(temperature_fahrenheit)
print(my_name)

Reassignment: Labels Can Move

Variables aren't permanent. You can reassign a variable to a new value at any time:

patient_count = 4521
print(patient_count)

patient_count = 4600
print(patient_count)
4521
4600

The name patient_count pointed to 4521, and then we moved it to point to 4600. The old value isn't destroyed — it just doesn't have a label anymore, and Python will eventually clean it up.

You can even use the old value to compute the new one:

patient_count = 4521
patient_count = patient_count + 79
print(patient_count)
4600

This looks strange if you think of = as "equals" — obviously 4521 doesn't equal 4521 + 79. But remember, = means "assign." Python evaluates the right side first (4521 + 79 = 4600), then assigns the result to patient_count.

🔄 Retrieval Practice

Before reading further, answer these in your head (or on paper):

  1. What character does Python use for assignment?
  2. Can a variable name start with a number?
  3. What's the difference between patient_count and Patient_count?
  4. If you write a = 10 and then b = a, how many copies of the number 10 exist?

Check your answers

  1. The = sign (assignment operator).
  2. No. Names can contain numbers but cannot start with one.
  3. They're two different variables — Python is case-sensitive.
  4. One copy. Both a and b are labels pointing to the same value 10.

3.2 Numbers: Integers and Floats

Data science is full of numbers. Patient counts, vaccination rates, average temperatures, GDP per capita, batting averages, p-values. Python handles numbers with two main types: integers and floats.

Integers: Whole Numbers

An integer (abbreviated int) is a whole number — no decimal point. Positive, negative, or zero.

patient_count = 4521
year = 2024
temperature_celsius = -15
population = 0

Integers in Python can be as large as your computer's memory allows. There's no upper limit. (This is unusual — many programming languages cap integers at a certain size.)
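To see this in action, compute a number far larger than the 64-bit limit (about 9.2 quintillion) that many languages impose. A minimal sketch:

```python
# Python integers grow as needed: no overflow, and the result is still an int
big_number = 2 ** 100
print(big_number)        # 1267650600228229401496703205376
print(type(big_number))  # <class 'int'>
```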

Floats: Decimal Numbers

A float (short for "floating-point number") is a number with a decimal point.

vaccination_rate = 0.73
pi = 3.14159
temperature = 98.6
negative_growth = -2.5

Even if the decimal part is zero, the presence of a decimal point makes it a float:

print(type(4))     # <class 'int'>
print(type(4.0))   # <class 'float'>

The type() function tells you what data type a value is. Try it — this will become one of your most-used debugging tools.

Arithmetic Operators

Python is a powerful calculator. Here are the arithmetic operators you'll use constantly:

Operator | Name | Example | Result
+ | Addition | 10 + 3 | 13
- | Subtraction | 10 - 3 | 7
* | Multiplication | 10 * 3 | 30
/ | Division | 10 / 3 | 3.3333...
// | Floor division | 10 // 3 | 3
% | Modulo (remainder) | 10 % 3 | 1
** | Exponentiation | 10 ** 3 | 1000

Type each of these in a code cell and verify the results. Seriously — type them. Don't just read.

Division Gotchas: / vs //

This one catches beginners. In Python, the / operator always returns a float, even when dividing two integers evenly:

print(10 / 2)    # 5.0, not 5
print(10 / 3)    # 3.3333333333333335

If you want integer division (throwing away the remainder), use //:

print(10 // 3)   # 3
print(10 // 2)   # 5
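One detail these positive examples hide: // is floor division, which rounds down toward negative infinity rather than simply chopping off the decimal part. For positive numbers the two behave identically; for negative numbers they don't. A quick check:

```python
print(7 // 2)     # 3
print(-7 / 2)     # -3.5
print(-7 // 2)    # -4: rounds DOWN to -4, not toward zero to -3
print(10.0 // 3)  # 3.0: a float operand gives a float result
```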

And % gives you just the remainder:

print(10 % 3)    # 1  (because 10 = 3*3 + 1)
print(10 % 2)    # 0  (because 10 = 5*2 + 0)

The modulo operator seems obscure, but it comes up in data science more than you'd expect — checking if a year is a leap year, cycling through colors in a chart, determining if a record is in an even or odd position.
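Here is a small sketch of those three uses. The variable names are illustrative, and the leap-year line shows only the "divisible by 4" part of the rule; the full Gregorian rule also excludes years divisible by 100 unless they're divisible by 400, which you'll be able to express once you can combine conditions.

```python
# Even or odd position: remainder after dividing by 2 is 0 (even) or 1 (odd)
record_position = 7
print(record_position % 2)        # 1, so position 7 is odd

# Leap years (partial rule): a remainder of 0 means "divisible by 4"
year = 2024
print(year % 4)                   # 0

# Cycling: with 3 chart colors, item number 4 wraps around to color slot 1
num_colors = 3
item_number = 4
print(item_number % num_colors)   # 1
```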

Order of Operations (Operator Precedence)

Python follows the standard mathematical order of operations — the same PEMDAS (or BODMAS) rules you learned in school:

  1. Parentheses first
  2. Exponentiation (**), which is evaluated right to left
  3. Multiplication, division, floor division, modulo (*, /, //, %), left to right
  4. Addition, subtraction (+, -), left to right

result = 2 + 3 * 4
print(result)    # 14, not 20

Python evaluates 3 * 4 first (giving 12), then adds 2. If you want 2 + 3 first, use parentheses:

result = (2 + 3) * 4
print(result)    # 20
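One place where Python can surprise even people who know PEMDAS well: ** binds more tightly than a leading minus sign. Parentheses fix it, just as above:

```python
print(-2 ** 2)    # -4: 2 ** 2 is computed first, then negated
print((-2) ** 2)  # 4: parentheses make -2 the base
print(2 ** -1)    # 0.5: a negative exponent on the right works as expected
```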

🧩 Productive Struggle — Predict Before You Run

Before running each of these expressions, write down what you think the result will be. Then run them and check.

print(2 ** 3 ** 2)
print(100 - 25 * 3 + 50)
print(15 // 4)
print(15 % 4)
print((10 + 5) * 2 / (3 + 2))

Answers

  • 2 ** 3 ** 2 = 2 ** 9 = 512. Exponentiation is right-to-left: Python evaluates 3 ** 2 first (9), then 2 ** 9.
  • 100 - 25 * 3 + 50 = 100 - 75 + 50 = 75. Multiplication before addition/subtraction, then left to right.
  • 15 // 4 = 3. The true quotient is 3.75, and floor division rounds it down to 3.
  • 15 % 4 = 3. 15 = 3*4 + 3, so the remainder is 3.
  • (10 + 5) * 2 / (3 + 2) = 15 * 2 / 5 = 30 / 5 = 6.0. Note: / always returns a float.

Real Data Calculations

Let's put arithmetic to work with a data science example. Imagine you're computing a simple statistic: the vaccination rate for a county.

total_population = 250000
vaccinated = 182500

vaccination_rate = vaccinated / total_population
print(vaccination_rate)
0.73

Now let's express that as a percentage:

percentage = vaccination_rate * 100
print(percentage)
73.0

And let's compute how many more people need to be vaccinated to reach an 85% target:

target_rate = 0.85
target_vaccinated = target_rate * total_population
additional_needed = target_vaccinated - vaccinated

print(additional_needed)
30000.0

Every number here is stored in a descriptive variable. If the population changes, you update one number and everything recalculates. That's already more powerful than punching numbers into a calculator one at a time.

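Here's another worked example. Marcus, our bakery owner, wants to know his average daily revenue for the month. (The f"..." syntax in the print line is explained in Section 3.3; for now, :.2f just means "show two decimal places.")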

monthly_revenue = 28750
days_in_month = 30

avg_daily_revenue = monthly_revenue / days_in_month
print(f"Average daily revenue: ${avg_daily_revenue:.2f}")
Average daily revenue: $958.33

And Priya, the sports journalist, wants to compare two players' points per game:

player_a_points = 1842
player_a_games = 78

player_b_points = 1536
player_b_games = 72

ppg_a = player_a_points / player_a_games
ppg_b = player_b_points / player_b_games

print(f"Player A PPG: {ppg_a:.1f}")
print(f"Player B PPG: {ppg_b:.1f}")
print(f"Difference: {ppg_a - ppg_b:.1f}")
Player A PPG: 23.6
Player B PPG: 21.3
Difference: 2.3

Notice how every calculation uses named variables, not raw numbers. You can read the code and understand what it's doing. If a stat turns out to be wrong, you can trace back to exactly which variable has the wrong value. This is what programmers mean when they talk about "readable code."

🔄 Retrieval Practice

  1. What's the difference between / and // in Python?
  2. What does type(7.0) return?
  3. In the expression 2 + 3 * 4, which operation happens first?

Check your answers

  1. / always returns a float (regular division). // performs floor division: it rounds the result down to the nearest whole number, which for positive numbers simply drops the decimal part.
  2. <class 'float'> — the decimal point makes it a float, even though the decimal part is zero.
  3. Multiplication (3 * 4 = 12), then addition (2 + 12 = 14). Python follows PEMDAS order.

3.3 Strings: Working with Text

Numbers are important, but data science is full of text too: country names, survey responses, column headers, dates as text, patient notes, tweet contents, product reviews. In Python, text is stored as strings.

Creating Strings

A string (abbreviated str) is a sequence of characters enclosed in quotes. You can use single quotes or double quotes — Python treats them identically:

country = "United States"
country = 'United States'   # Same thing

Many Python projects standardize on double quotes, and this book does too, but either works. The only time the choice matters is when your string contains a quote:

message = "It's a beautiful day"     # Double quotes around a single quote
message = 'She said "hello"'         # Single quotes around double quotes

For strings that span multiple lines, use triple quotes:

description = """This dataset contains
vaccination records from 2020-2023
across 195 countries."""

String Concatenation: Combining Text

You can glue strings together with the + operator. This is called concatenation:

first_name = "Elena"
last_name = "Rodriguez"
full_name = first_name + " " + last_name
print(full_name)
Elena Rodriguez

Notice the " " in the middle — without it, you'd get "ElenaRodriguez". Concatenation is literal; Python won't add spaces for you.

You can also repeat a string using *:

print("=" * 40)
print("REPORT HEADER")
print("=" * 40)
========================================
REPORT HEADER
========================================

This is handy for creating visual separators in output. You'll see it throughout the book.

But what if you want to mix text and numbers?

patient_count = 4521
message = "Total patients: " + patient_count
TypeError: can only concatenate str (not "int") to str

Python won't automatically convert numbers to text. Some languages do this automatically (JavaScript, for example), but Python makes you be explicit about it. This is actually a good thing — it catches bugs where you accidentally try to add a number to text, which is almost always a mistake.

You have two options: convert the number explicitly with str(), or — much better — use an f-string.
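The first option looks like this: wrap the number in str() so that both sides of the + are strings.

```python
patient_count = 4521

# Convert the integer to text before concatenating
message = "Total patients: " + str(patient_count)
print(message)  # Total patients: 4521
```

It works, but with several values the quotes, pluses, and str() calls pile up quickly, which is why f-strings are usually the better choice.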

f-Strings: The Best Way to Build Strings

f-strings (formatted string literals) are Python's modern, readable way to embed values inside text. Put an f before the opening quote, and then put any expression inside curly braces {}:

patient_count = 4521
vaccination_rate = 0.73

message = f"Total patients: {patient_count}"
print(message)
Total patients: 4521

You can put any expression inside the curly braces — arithmetic, function calls, variable names:

print(f"Vaccination rate: {vaccination_rate * 100}%")
Vaccination rate: 73.0%

f-strings can also format numbers nicely. Want two decimal places?

pi = 3.14159265
print(f"Pi is approximately {pi:.2f}")
Pi is approximately 3.14

The :.2f inside the braces is a format specifier: "format this as a float with 2 decimal places." You'll use this constantly when displaying data science results.

More formatting tricks:

big_number = 1234567
print(f"Population: {big_number:,}")
Population: 1,234,567

The :, adds commas as thousands separators. Small things like this make your output readable.

You can combine format specifiers:

gdp_per_capita = 45732.8914
print(f"GDP per capita: ${gdp_per_capita:,.2f}")
GDP per capita: $45,732.89

The :,.2f means "comma separators AND two decimal places." f-strings are one of those Python features that beginners learn in week one and professionals use every day. Get comfortable with them — you'll write hundreds of them in this book.

🧩 Productive Struggle — f-String Practice

Given the variables below, try to write an f-string that produces each target output. Write your f-string before checking the answer.

name = "Elena"
rate = 0.7314
count = 245891

Target outputs:

  1. "Elena's analysis"
  2. "Vaccination rate: 73.1%"
  3. "Processed 245,891 records"

Answers

  1. f"{name}'s analysis" — using double quotes for the f-string so the single quote (apostrophe) works inside.
  2. f"Vaccination rate: {rate * 100:.1f}%" — multiply by 100, format to one decimal place.
  3. f"Processed {count:,} records" — comma separator for readability.
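For target 2 there is also a shortcut worth knowing: the % format specifier multiplies by 100 and appends the percent sign for you. Both lines below produce the same text:

```python
rate = 0.7314

# Answer 2 as given: scale by 100 yourself, format to one decimal place
print(f"Vaccination rate: {rate * 100:.1f}%")  # Vaccination rate: 73.1%

# Alternative: let the % format specifier do the scaling and add the sign
print(f"Vaccination rate: {rate:.1%}")         # Vaccination rate: 73.1%
```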

Common String Methods

Strings come with built-in methods — functions attached to the string that transform or inspect it. Here are the ones you'll use most often in data science:

city = "  Minneapolis  "

# Remove leading/trailing whitespace
print(city.strip())         # "Minneapolis"

# Convert case
print(city.strip().upper())  # "MINNEAPOLIS"
print(city.strip().lower())  # "minneapolis"

# Replace text
print("New York".replace("New", "Old"))  # "Old York"

# Split into a list of parts
tags = "health,science,data"
print(tags.split(","))       # ['health', 'science', 'data']

Why do these matter for data science? Because real-world data is messy:

  • Column names might have extra spaces: " Patient_ID "
  • Text might be in mixed case: "minneapolis", "Minneapolis", "MINNEAPOLIS" — these are three different strings unless you normalize them
  • Data fields might need cleaning: replacing abbreviations, removing special characters, splitting combined fields

The .strip() method alone will save you hours of frustration when you start loading real datasets in Part II.
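Here is a small sketch of that normalization step, using three hypothetical spellings of the same city as they might appear in a messy column. After strip() and lower(), all three become the identical string:

```python
# Three versions of one city, as they might arrive in raw data
city_a = "  Minneapolis"
city_b = "minneapolis "
city_c = "MINNEAPOLIS"

# Normalize: remove surrounding whitespace, then lowercase
clean_a = city_a.strip().lower()
clean_b = city_b.strip().lower()
clean_c = city_c.strip().lower()

print(clean_a, clean_b, clean_c)  # minneapolis minneapolis minneapolis
```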

Two more useful methods worth knowing:

# Check how a string starts or ends
filename = "vaccination_data.csv"
print(filename.endswith(".csv"))     # True
print(filename.startswith("vacc"))   # True

# Count occurrences of a substring
sentence = "data science is the science of data"
print(sentence.count("data"))        # 2
print(sentence.count("science"))     # 2

You can also chain methods together — the output of one method becomes the input of the next:

messy = "  MINNEAPOLIS  "
clean = messy.strip().lower().title()
print(clean)   # "Minneapolis"

This chain reads left to right: strip the whitespace, convert to lowercase, then capitalize each word. Chaining is elegant, but don't chain more than 2-3 methods — readability matters more than cleverness.

🐛 Debugging Spotlight — String Methods Don't Change the Original

A very common beginner mistake:

city = "Minneapolis"
city.upper()
print(city)  # Still "Minneapolis"!

String methods return a new string — they don't modify the original. Strings in Python are immutable: once created, they cannot be changed. If you want the uppercase version, you need to save it:

city = "Minneapolis"
city_upper = city.upper()
print(city_upper)  # "MINNEAPOLIS"

Or reassign:

city = city.upper()
print(city)  # "MINNEAPOLIS"

Indexing: Accessing Individual Characters

Every character in a string has a position number called an index. Python uses zero-based indexing — the first character is at position 0, not position 1.

word = "Python"
print(word[0])    # 'P'
print(word[1])    # 'y'
print(word[5])    # 'n'

You can also count from the end using negative indices:

print(word[-1])   # 'n' (last character)
print(word[-2])   # 'o' (second to last)

This is a visual map of the indices for the string "Python":

 P   y   t   h   o   n
 0   1   2   3   4   5
-6  -5  -4  -3  -2  -1

Slicing: Extracting Substrings

A slice extracts a portion of a string using the syntax string[start:stop]. The start index is included; the stop index is not included.

word = "Python"
print(word[0:3])    # 'Pyt' (indices 0, 1, 2)
print(word[2:5])    # 'tho' (indices 2, 3, 4)

Shortcuts:

print(word[:3])     # 'Pyt' (from beginning to index 3)
print(word[3:])     # 'hon' (from index 3 to end)
print(word[:])      # 'Python' (entire string)
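Two consequences of the "start included, stop excluded" rule are worth trying yourself: splitting a string at any index and gluing the halves back together always reproduces the original, and a slice that runs past the end is quietly trimmed rather than causing an error.

```python
word = "Python"

# word[:3] stops just before index 3, and word[3:] starts at index 3,
# so together they cover every character exactly once
print(word[:3] + word[3:])  # Python

# Negative indices work in slices too: the last three characters
print(word[-3:])            # hon

# A stop index past the end is trimmed to the string's length
print(word[2:100])          # thon
```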

Slicing is especially useful in data science when extracting parts of coded values. For example, imagine a dataset where patient IDs encode information:

patient_id = "MN-2024-00451"

state_code = patient_id[:2]
year = patient_id[3:7]
sequence = patient_id[8:]

print(f"State: {state_code}")
print(f"Year: {year}")
print(f"Sequence: {sequence}")
State: MN
Year: 2024
Sequence: 00451
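Fixed positions work only if every ID has exactly the same layout. If the dashes are reliable but the piece lengths might vary, one alternative is the .split("-") method from earlier, assuming (for this hypothetical ID format) that dashes always separate the fields. Indexing into the result uses the same square brackets; lists themselves are covered in Chapter 5.

```python
patient_id = "MN-2024-00451"

# Break the ID at every dash; the result is a list of the pieces
parts = patient_id.split("-")
print(parts)  # ['MN', '2024', '00451']

state_code = parts[0]
year = parts[1]
sequence = parts[2]
print(f"State: {state_code}, Year: {year}, Sequence: {sequence}")
```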

🔄 Retrieval Practice

Given s = "Data Science", predict the output of each:

  1. s[0]
  2. s[-1]
  3. s[5:12]
  4. s[:4]
  5. s.lower()
  6. s.replace("Science", "Analysis")

Check your answers

  1. 'D' — first character (index 0)
  2. 'e' — last character
  3. 'Science' — characters at indices 5 through 11
  4. 'Data' — first four characters
  5. 'data science' — all lowercase (returns a new string)
  6. 'Data Analysis' — replaces the matched substring

The len() Function

The len() function tells you how many characters are in a string:

city = "Minneapolis"
print(len(city))    # 11

This works on any sequence in Python, not just strings. We'll use it with lists in Chapter 5.
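A few details about len() that connect back to indexing: spaces count as characters, the empty string has length 0, and because indices start at 0, the last valid index is always len() minus 1, which is exactly the character that index -1 refers to.

```python
city = "Minneapolis"

print(len("Data Science"))  # 12: the space counts
print(len(""))              # 0: the empty string

# The last character, reached two ways
print(city[len(city) - 1])  # s
print(city[-1])             # s
```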


3.4 Booleans: True, False, and Comparison Operators

The third data type we need is the boolean (abbreviated bool). Booleans have only two possible values: True and False. Note the capitalization — True and False are Python keywords; true and false won't work.

study_complete = True
data_is_clean = False

print(type(study_complete))   # <class 'bool'>

Booleans might seem too simple to be useful, but they're the foundation of decision-making in code. In Chapter 4, you'll use them with if statements to make your programs respond differently to different data. For now, let's focus on how booleans are created.

Comparison Operators

The most common way to get a boolean is by comparing two values:

Operator | Meaning | Example | Result
== | Equal to | 5 == 5 | True
!= | Not equal to | 5 != 3 | True
< | Less than | 3 < 5 | True
> | Greater than | 3 > 5 | False
<= | Less than or equal to | 5 <= 5 | True
>= | Greater than or equal to | 3 >= 5 | False

🐛 Debugging Spotlight — = vs ==

This is the single most common beginner mistake in Python:

  • = is assignment: patient_count = 4521 (give this name this value)
  • == is comparison: patient_count == 4521 (is this name's value equal to 4521?)

If you write if patient_count = 4521: Python will throw a SyntaxError. You meant ==.

Let's use comparisons with data:

vaccination_rate = 0.73
target_rate = 0.85

print(vaccination_rate >= target_rate)   # False
print(vaccination_rate < 1.0)            # True
print(vaccination_rate == 0.73)          # True
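The last comparison works because vaccination_rate was assigned the literal 0.73 directly. Be careful using == on floats that come out of arithmetic: most decimal fractions can't be stored exactly in binary, so tiny rounding differences appear. The standard-library math.isclose() function compares with a small tolerance instead. A quick preview:

```python
import math

print(0.1 + 0.2)                     # 0.30000000000000004
print(0.1 + 0.2 == 0.3)              # False: tiny rounding error
print(math.isclose(0.1 + 0.2, 0.3))  # True: equal within a tolerance
```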

Logical Operators: and, or, not

You can combine boolean values using logical operators:

rate = 0.73
sample_size = 500

# Both conditions must be True
meets_criteria = rate > 0.70 and sample_size >= 100
print(meets_criteria)    # True

# At least one condition must be True
needs_review = rate < 0.50 or sample_size < 30
print(needs_review)      # False

# Flip a boolean
is_incomplete = not study_complete
print(is_incomplete)     # False (since study_complete is True)

The rules are intuitive:

  • and returns True only if both sides are True
  • or returns True if either side is True
  • not flips True to False and vice versa

Here's a more realistic example. Elena is filtering patient records and needs to check multiple criteria:

age = 67
doses = 1
zip_code = "02134"

# Is this a senior citizen who isn't fully vaccinated?
needs_outreach = age >= 65 and doses < 2
print(f"Needs outreach: {needs_outreach}")   # True

# Is this person either very young or very old?
vulnerable = age < 5 or age >= 65
print(f"Vulnerable group: {vulnerable}")      # True

In Chapter 4, you'll use these boolean expressions inside if statements to make your code take different actions based on data. For now, practice building them and predicting their results.

Truthiness: What Counts as True?

Python has a concept called truthiness: every value, not just booleans, can be evaluated as True or False. The rule is simple:

  • Falsy values (treated as False): 0, 0.0, "" (empty string), None, False, and empty collections like []
  • Truthy values (treated as True): everything else

print(bool(0))        # False
print(bool(42))       # True
print(bool(""))       # False
print(bool("hello"))  # True
print(bool(0.0))      # False
print(bool(0.001))    # True

This becomes important in Chapter 4 when we write if statements. For now, just be aware that Python treats zero and empty things as "falsy."
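Truthiness interacts with messy data in one way worth remembering now. A survey answer made only of spaces is not empty, so it counts as truthy; strip it first if "blank" should mean "no answer." A small sketch with a hypothetical response value:

```python
survey_response = "   "  # a respondent who typed only spaces

print(bool(survey_response))          # True: the string contains characters
print(bool(survey_response.strip()))  # False: nothing left after stripping
```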

🧩 Productive Struggle — Boolean Predictions

Predict True or False for each expression, then check by running them:

print(10 > 5 and 3 < 1)
print(10 > 5 or 3 < 1)
print(not (5 == 5))
print("data" == "Data")
print(bool(""))
print(bool(" "))

Answers

  • 10 > 5 and 3 < 1 → True and False → False
  • 10 > 5 or 3 < 1 → True or False → True
  • not (5 == 5) → not True → False
  • "data" == "Data" → False (string comparison is case-sensitive)
  • bool("") → False (the empty string is falsy)
  • bool(" ") → True (a space is still a character, so the string isn't empty!)

3.5 Type Conversion: Moving Between Types

By now you've seen four data types: int, float, str, and bool. Sometimes you need to convert between them. Python provides built-in functions for this:

Function | What It Does | Example | Result
int() | Converts to integer | int(3.7) | 3
float() | Converts to float | float("3.14") | 3.14
str() | Converts to string | str(42) | "42"
bool() | Converts to boolean | bool(1) | True

Why Type Conversion Matters

Here's a scenario you'll encounter almost immediately in data science: you load data from a file, and everything comes in as a string. Even numbers.

# Simulating data read from a CSV file
age_from_file = "34"
weight_from_file = "72.5"

print(type(age_from_file))      # <class 'str'>
print(type(weight_from_file))   # <class 'str'>

You can't do math with strings:

print(age_from_file + 1)
TypeError: can only concatenate str (not "int") to str

You need to convert first:

age = int(age_from_file)
weight = float(weight_from_file)

print(age + 1)        # 35
print(weight * 2.2)   # 159.5

Conversion Gotchas

Not every conversion is possible:

int("hello")    # ValueError: invalid literal for int() with base 10: 'hello'
int("3.14")     # ValueError: invalid literal for int() with base 10: '3.14'
float("hello")  # ValueError: could not convert string to float: 'hello'

The error messages are actually quite clear about what went wrong. If you see ValueError, Python is telling you: "I understand what you're trying to do, but the value you gave me doesn't make sense for that operation."

To convert a decimal string to an integer, go through float first:

value = "3.14"
result = int(float(value))   # float("3.14") = 3.14, int(3.14) = 3
print(result)                # 3

Note that int() truncates — it drops the decimal part, it doesn't round:

print(int(3.9))    # 3, not 4
print(int(-3.9))   # -3, not -4

If you want rounding, use the round() function:

print(round(3.9))     # 4
print(round(3.14159, 2))  # 3.14 (round to 2 decimal places)
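One surprise with round(): when a value sits exactly halfway between two whole numbers, Python rounds to the nearest even number ("banker's rounding") rather than always rounding up. And because many decimals are stored only approximately, a value that looks like a halfway case may not be one. Try these:

```python
print(round(2.5))       # 2: halfway, rounds to the even neighbor
print(round(3.5))       # 4: halfway, rounds to the even neighbor
print(round(0.5))       # 0
print(round(2.675, 2))  # 2.67: 2.675 is stored as slightly less than 2.675
```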

A Common Data Science Scenario

Let's walk through a realistic type conversion situation. You're working with survey data, and the responses come in as strings (which is how most data arrives from forms, CSV files, and databases):

# Data arrives from a CSV file — everything is a string
age_str = "34"
weight_str = "72.5"
is_smoker_str = "True"
zip_str = "02134"

# Convert the ones that should be numbers
age = int(age_str)
weight = float(weight_str)

# Don't convert ZIP code — it's an identifier, not a number!
# zip_code = int(zip_str)  # DON'T DO THIS — loses the leading zero!
zip_code = zip_str  # Keep it as a string

# Now you can do math with the numeric values
bmi_height = 1.75  # We'd get this from another column
bmi = weight / (bmi_height ** 2)
print(f"Age: {age}, BMI: {bmi:.1f}, ZIP: {zip_code}")
Age: 34, BMI: 23.7, ZIP: 02134

The key insight: type conversion isn't mechanical — it requires thinking about what the data means. We converted age and weight because arithmetic makes sense for them. We kept zip_code as a string because it's an identifier. This decision framework (number vs. string) is explored in depth in Case Study 2.
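You may have noticed that is_smoker_str was never converted. That's deliberate, because bool() is the wrong tool for text like "True" or "False": it only checks whether the string is empty, so even "False" becomes True. A safer approach, assuming the column always contains exactly "True" or "False", is to compare the text directly:

```python
is_smoker_str = "False"

# bool() only asks "is this string non-empty?", so any text is True
print(bool(is_smoker_str))  # True (almost certainly not what you wanted!)

# Compare against the expected text instead
is_smoker = is_smoker_str == "True"
print(is_smoker)            # False
```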

The type() Function: Your Best Friend

When you're confused about what type something is — and you will be confused regularly — use type():

mystery_value = "42"
print(type(mystery_value))   # <class 'str'>

mystery_value = 42
print(type(mystery_value))   # <class 'int'>

mystery_value = 42.0
print(type(mystery_value))   # <class 'float'>

mystery_value = True
print(type(mystery_value))   # <class 'bool'>

🔄 Retrieval Practice

  1. What does int(7.9) return?
  2. Why does int("3.5") cause an error?
  3. What does str(100) return, and what type is it?
  4. What function tells you the type of a value?

Check your answers

  1. 7, because int() truncates (drops the decimal part) rather than rounding.
  2. Python can't directly convert a string containing a decimal to an integer. You need to do int(float("3.5")) — convert to float first, then to int.
  3. "100" — it's a string. The digits look like a number, but it's text. You can't do str(100) + 1.
  4. type() — e.g., type(3.14) returns <class 'float'>.

3.6 When Things Go Wrong: Reading Error Messages

Every programmer — from first-day beginners to people who have written code for 30 years — encounters error messages. They're not a sign that you're bad at this. They're Python trying to tell you what went wrong. Learning to read error messages is one of the most valuable skills you can develop.

Let's look at the three errors you'll see most often at this stage.

NameError: "I Don't Know That Name"

A NameError means Python encountered a name it doesn't recognize. The most common cause is a typo.

patient_count = 4521
print(patinet_count)    # Typo!
NameError: name 'patinet_count' is not defined

🐛 Debugging Walkthrough — The NameError

When you see a NameError, ask yourself these questions in order:

  1. Did I spell the variable name correctly? Check letter by letter. patinet_count vs patient_count — can you spot the difference?
  2. Did I define the variable before using it? Python reads top to bottom. If you use a name in cell 3 but define it in cell 5, you'll get a NameError.
  3. Did I run the cell that defines it? In Jupyter, just writing a variable definition isn't enough — you have to execute the cell. A common scenario: you restart your kernel and then run a cell that uses a variable defined in an earlier cell you haven't re-run yet.
  4. Did I accidentally delete the cell that defines it? It happens.

The fix is almost always one of these four things. Check them in order and you'll solve the vast majority of NameErrors.

Another common trigger: forgetting quotes around a string.

city = Minneapolis    # NameError! Python thinks Minneapolis is a variable
city = "Minneapolis"  # Correct — it's a string

TypeError: "Wrong Type for This Operation"

A TypeError means you tried to do something with a data type that doesn't support it.

age = "25"
print(age + 5)
TypeError: can only concatenate str (not "int") to str

Python is telling you: "age is a string, and you can't add an integer to a string. Did you mean to convert it first?"

🐛 Debugging Walkthrough — The TypeError

When you see a TypeError, read the message carefully. Python usually tells you exactly what went wrong:

  • "can only concatenate str (not "int") to str": You're trying to add an int to a string. One of them needs to be converted.
  • "unsupported operand type(s) for +: 'int' and 'str'": Same issue, with the number on the left (e.g., 5 + age).
  • "'int' object is not subscriptable" — You're trying to index into a number (e.g., 42[0]). Numbers don't have indices.

The fix: use type() to check what your variables actually are, then convert as needed:

age = "25"
print(type(age))   # <class 'str'>, aha!
age = int(age)     # Convert to integer
print(age + 5)     # 30
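The "'int' object is not subscriptable" error has the same kind of fix: convert first. If you really need individual digits of a number, such as the first two digits of a year, turn it into a string and index or slice that. A small sketch (year is an illustrative variable):

```python
year = 2024
# year[0] would raise TypeError: 'int' object is not subscriptable
year_text = str(year)        # "2024"
print(year_text[0])          # 2
print(year_text[:2])         # 20
print(type(year_text[:2]))   # <class 'str'>  (slicing a string gives a string)
```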

Here's another common TypeError:

count = 100
message = "Total: " + count
TypeError: can only concatenate str (not "int") to str

Two fixes:

# Fix 1: Convert to string
message = "Total: " + str(count)

# Fix 2: Use an f-string (better!)
message = f"Total: {count}"
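Both fixes build exactly the same text, which you can check with ==. The f-string is usually the cleaner choice: it converts the value for you, so there is no str() call to forget.

```python
count = 100
message_1 = "Total: " + str(count)   # Fix 1: explicit conversion
message_2 = f"Total: {count}"        # Fix 2: the f-string converts for you
print(message_1)                     # Total: 100
print(message_1 == message_2)        # True
```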

SyntaxError: "I Can't Even Read This"

A SyntaxError means Python can't parse your code — the structure itself is broken. Common causes:

Missing closing quote:

message = "Hello world
SyntaxError: EOL while scanning string literal

Missing parenthesis:

print("hello"
SyntaxError: unexpected EOF while parsing

Using = instead of ==:

if x = 5:
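SyntaxError: invalid syntax

The exact wording depends on your Python version. The messages above come from Python 3.9 and earlier. Python 3.10 and later are more specific for these three cases, reporting unterminated string literal (detected at line 1), '(' was never closed, and invalid syntax. Maybe you meant '==' or ':=' instead of '='?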

🐛 Debugging Walkthrough — The SyntaxError

SyntaxErrors usually have a caret (^) pointing to where Python got confused:

  File "<stdin>", line 1
    message = "Hello world
                          ^
SyntaxError: EOL while scanning string literal

The caret points to the end of the line — Python expected a closing quote and never found one. When you see a SyntaxError:

  1. Look at the line number Python tells you.
  2. Look at the caret — it points to where Python first noticed the problem.
  3. Check for missing quotes, parentheses, or colons.
  4. Check the line above — sometimes the error is on the previous line, but Python doesn't notice until the next line.

Practice: Fix These Errors

Here's a block of broken code. Can you identify and fix all the errors before reading the solution?

patient_naem = 4521
print(patient_name)

vaccination rate = 0.73

city = Minneapolis

total = "100" + 50

🧩 Productive Struggle

Try to fix each line yourself before reading the solutions below.

Solutions

```python
# Lines 1-2: Typo, 'naem' vs 'name'
patient_name = 4521
print(patient_name)

# Line 4: Variable names can't contain spaces; use an underscore
vaccination_rate = 0.73

# Line 6: Missing quotes; Minneapolis is text, not a variable
city = "Minneapolis"

# Line 8: Can't add a string and an int; convert one side
total = 100 + 50            # If you want math: 150
total = "100" + str(50)     # If you want a string: "10050"
total = int("100") + 50     # If you want math from string data: 150
```

How to Read Any Error Message

Every Python error message has the same structure:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'patinet_count' is not defined

The important parts:

  1. The last line tells you the type of error and a description.
  2. The line number tells you where to look.
  3. The traceback shows the sequence of calls that led to the error (more useful in larger programs).

Start from the bottom and read up. The last line is the most important.

A note on emotional response: If you feel frustrated or stupid when you hit an error, that's normal and universal. Every programmer gets errors constantly. The difference between a beginner and an expert isn't that experts avoid errors — it's that experts read the error message, understand it, and fix it in seconds. You're building that skill right now.

ValueError: A Fourth Common Error

While NameError, TypeError, and SyntaxError are the big three for beginners, you'll also see ValueError fairly often. A ValueError means the type is correct, but the value doesn't make sense for the operation:

int("hello")   # ValueError: the string "hello" can't become an integer
int("")        # ValueError: empty string can't become an integer
float("N/A")   # ValueError: "N/A" isn't a valid number

This is extremely common in data science. When you're converting data from a file, you might encounter values like "N/A", "missing", "", or "-" in a column that's supposed to be numeric. Python can't convert these to numbers, and it will tell you so with a ValueError. In Chapter 8, you'll learn systematic ways to handle missing data. For now, just know what the error message means.
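For contrast, here are strings that int() and float() can convert, because their values really do look like numbers. Surrounding whitespace is tolerated, which is handy for messy files:

```python
from_digits = int("42")        # 42
from_padded = int(" 42 ")      # 42    (leading/trailing spaces are ignored)
from_negative = int("-7")      # -7
from_decimal = float("3.14")   # 3.14
from_whole = float("42")       # 42.0
print(from_digits, from_padded, from_negative, from_decimal, from_whole)
# 42 42 -7 3.14 42.0

# int("3.14") still raises ValueError; use int(float("3.14")) instead
```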

Building Your Debugging Muscle

Here's a practical strategy for when you hit an error:

  1. Don't panic. Read the error message.
  2. Look at the last line. It tells you the error type and a description.
  3. Look at the line number. Go to that line in your code.
  4. Check the line above too. Sometimes the actual mistake is on the previous line, but Python doesn't notice until the next line.
  5. Use type() liberally. When you see a TypeError, print the types of all the variables involved. The culprit will reveal itself.
  6. Reproduce with a simpler example. If you're confused by a complex expression, break it into smaller pieces and test each one.
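Steps 5 and 6 can be sketched like this. Suppose an expression such as "Rate: " + rate * 100 + "%" fails with a TypeError (rate here is a hypothetical variable). Split it into pieces and check each type, and the culprit reveals itself:

```python
rate = 0.75

# Piece 1: the arithmetic on its own
percent = rate * 100
print(percent, type(percent))    # 75.0 <class 'float'>

# Piece 2: a float can't be joined to a string with +, so convert it first
label = "Rate: " + str(percent) + "%"
print(label)                     # Rate: 75.0%
```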

This debugging process is not something you memorize once and never think about again. It's a habit you build through repetition. Every error you encounter and solve makes you faster at solving the next one.

🔄 Retrieval Practice

Match each error to its most likely cause:

  1. NameError: name 'vacc_rate' is not defined
  2. TypeError: can only concatenate str (not "int") to str
  3. SyntaxError: EOL while scanning string literal

Causes:

  • A. You forgot the closing quotation mark on a string.
  • B. You're trying to add a number to a string with +.
  • C. You misspelled a variable name, or haven't defined it yet.

Answers

1 → C, 2 → B, 3 → A


Project Checkpoint: Storing Dataset Metadata

It's time to apply what you've learned to your progressive project. Throughout this book, you're building a data analysis of a real public health dataset — WHO vaccination data. In this chapter's milestone, you'll store the metadata about your dataset in Python variables.

Open your project notebook (the one you created in Chapter 2). Add a new Markdown cell with the heading "## Dataset Metadata" and then a code cell with the following:

# Progressive Project: WHO Vaccination Data Analysis
# Chapter 3 Milestone: Store dataset metadata in variables

# Dataset identification
dataset_name = "WHO Immunization Data"
source_url = "https://immunizationdata.who.int/"
data_format = "CSV"

# Temporal coverage
start_year = 2000
end_year = 2023
years_covered = end_year - start_year + 1

# Dataset dimensions (we'll verify these when we load the data)
expected_countries = 195
expected_columns = 12

# Key columns (we'll confirm when we load the data)
column_descriptions = (
    "country: Country name (string)\n"
    "iso3: Three-letter country code (string)\n"
    "year: Year of observation (integer)\n"
    "vaccine: Vaccine type abbreviation (string)\n"
    "coverage: Estimated coverage percentage (float)"
)

# Research questions (from Chapter 1)
question_1 = "How do vaccination rates vary across WHO regions?"
question_2 = "Which countries show the largest changes over time?"
question_3 = "Is there a relationship between GDP and coverage?"

# Print a summary
print(f"Dataset: {dataset_name}")
print(f"Source: {source_url}")
print(f"Period: {start_year}-{end_year} ({years_covered} years)")
print(f"Countries: {expected_countries}")
print(f"\nKey columns:\n{column_descriptions}")

Run this cell and verify the output. Every variable has a descriptive name. The metadata is structured and readable. And the computed value (years_covered) demonstrates that variables aren't just for storage — they're for computation.
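Why the + 1 in years_covered? Subtracting the years counts the steps between them, not the years themselves; counting both endpoints needs one more. A quick check with the same values:

```python
start_year = 2000
end_year = 2023
steps = end_year - start_year                # 23 steps from 2000 to 2023
years_covered = end_year - start_year + 1    # 24 years, counting both 2000 and 2023
print(steps, years_covered)                  # 23 24
```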

Add a Markdown cell below your code that says: "This metadata will be used throughout the project. When we load the actual dataset in Chapter 6, we'll verify these values against the real data."


Practical Considerations

Choosing Between Ints and Floats

When should you use an integer vs. a float? The rule of thumb:

  • Use integers for things you count: patient count, number of countries, year, number of columns. These are discrete — you can't have 4.7 patients.
  • Use floats for things you measure: vaccination rate, temperature, GDP per capita, batting average. These are continuous — a rate of 0.732 is meaningful.

In practice, Python handles the conversion smoothly. If you mix ints and floats in arithmetic, Python automatically converts the int to a float, so the result is a float:

count = 10        # int
rate = 0.5        # float
result = count * rate
print(result)     # 5.0 (float)
print(type(result))  # <class 'float'>

Floating-Point Precision

Here's something that surprises every beginner:

print(0.1 + 0.2)
0.30000000000000004

That's not a bug. It's how computers store decimal numbers in binary. The value 0.1 can't be represented exactly in binary, just like 1/3 can't be represented exactly in decimal (0.333...). The error is tiny — less than one quadrillionth — but it can cause problems if you compare floats with ==:

print(0.1 + 0.2 == 0.3)   # False!

For data science, this rarely matters in practice — you're working with real-world measurements that already have much larger uncertainty. But it's good to be aware of it. If you need to compare floats, use round() or check if the difference is very small:

result = 0.1 + 0.2
print(round(result, 10) == round(0.3, 10))  # True
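The other approach, checking whether the difference is very small, is built into Python's standard library as math.isclose(). By default it treats two floats as equal when they agree to within a relative tolerance of 1e-09 (about nine significant digits):

```python
import math

result = 0.1 + 0.2
print(result == 0.3)               # False (exact comparison)
print(abs(result - 0.3) < 1e-9)    # True  (manual "close enough" check)
print(math.isclose(result, 0.3))   # True  (same idea, built in)
```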

Multiple Assignment and Swapping

Python lets you assign multiple variables in one line:

x, y, z = 10, 20, 30
print(x, y, z)   # 10 20 30

And swap two variables without a temporary variable:

a = "first"
b = "second"
a, b = b, a
print(a, b)   # second first

Many languages, such as C and Java, have no equivalent; there you need a temporary variable. It's elegant, but clarity is more important than cleverness, so use it where it makes your intent obvious.
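For comparison, here is the temporary-variable swap that a, b = b, a replaces. Both versions leave a and b exchanged; the one-line form just avoids the extra name (temp is an illustrative name):

```python
a = "first"
b = "second"

# Swap using a temporary variable
temp = a      # remember a's old value
a = b         # a now holds "second"
b = temp      # b gets a's old value, "first"
print(a, b)   # second first
```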

Augmented Assignment

When you want to update a variable using its current value, Python offers shorthand operators:

count = 10
count += 5    # Same as count = count + 5; count is now 15
count -= 3    # Same as count = count - 3; count is now 12
count *= 2    # Same as count = count * 2; count is now 24
count /= 4    # Same as count = count / 4; count is now 6.0

Notice that /= converts an integer to a float (because / always returns a float).
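+= is not just for numbers. Because + concatenates strings, += appends text to a string variable. A small sketch with an illustrative label:

```python
label = "WHO"
label += " Immunization"   # same as label = label + " Immunization"
label += " Data"
print(label)               # WHO Immunization Data
```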


Chapter Summary

You've covered a lot of ground in this chapter. Let's consolidate everything into reference tables you can come back to.

Data Types Reference

| Type | Python Name | Example Values | When to Use |
|---|---|---|---|
| Integer | int | 42, -7, 0, 2024 | Counting things: patients, years, rows |
| Float | float | 3.14, -0.5, 0.0 | Measuring things: rates, temperatures, prices |
| String | str | "hello", 'data', "" | Text: names, labels, descriptions, IDs |
| Boolean | bool | True, False | Yes/no decisions: data clean? threshold met? |

Operators Reference

| Category | Operators | Notes |
|---|---|---|
| Arithmetic | +, -, *, /, //, %, ** | / always returns float |
| Comparison | ==, !=, <, >, <=, >= | Return True or False |
| Logical | and, or, not | Combine booleans |
| Assignment | =, +=, -=, *=, /= | = assigns; == compares |

String Methods Reference

| Method | What It Does | Example |
|---|---|---|
| .strip() | Remove leading/trailing whitespace | " hi ".strip() → "hi" |
| .upper() | Convert to uppercase | "hi".upper() → "HI" |
| .lower() | Convert to lowercase | "HI".lower() → "hi" |
| .replace(old, new) | Replace occurrences | "cat".replace("c", "b") → "bat" |
| .split(sep) | Split into list | "a,b,c".split(",") → ["a", "b", "c"] |
| .startswith(s) | Check prefix | "data".startswith("da") → True |
| .endswith(s) | Check suffix | "file.csv".endswith(".csv") → True |

Type Conversion Reference

| Function | Converts To | Gotcha |
|---|---|---|
| int(x) | Integer | Truncates floats; can't handle decimal strings like "3.14" |
| float(x) | Float | Can handle integer strings like "42" |
| str(x) | String | Works on anything |
| bool(x) | Boolean | 0, 0.0, "", None → False; everything else → True |
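The bool() row hides a classic gotcha: any non-empty string is True, including "0" and "False". A quick sketch:

```python
print(bool(0))        # False
print(bool(0.0))      # False
print(bool(""))       # False
print(bool(None))     # False
print(bool(42))       # True
print(bool("0"))      # True  (non-empty string)
print(bool("False"))  # True  (non-empty string, whatever the text says)
```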

Common Errors Reference

| Error | Meaning | Most Common Cause |
|---|---|---|
| NameError | Name not recognized | Typo in variable name, or variable not yet defined |
| TypeError | Wrong type for operation | Adding string + int, indexing a number |
| SyntaxError | Code structure is broken | Missing quote, missing parenthesis, = instead of == |
| ValueError | Right type, wrong value | int("hello"), int("3.14") |

Spaced Review

These questions revisit concepts from earlier chapters. Research on learning shows that recalling information at increasing intervals strengthens long-term memory.

From Chapter 1: What Is Data Science?

  1. Name the six stages of the data science lifecycle.
  2. What's the difference between a descriptive question and a predictive question? Give an example of each using the vaccination data from our progressive project.
  3. The chapter argued that "data science is a way of thinking, not a set of tools." Now that you've started learning Python (a tool), does that claim still make sense? Why or why not?

From Chapter 2: Setting Up Your Toolkit

  4. What's the difference between a code cell and a Markdown cell in Jupyter?
  5. If you restart your Jupyter kernel and then try to print(patient_count) without re-running the cell that defines it, what happens? What type of error do you get?
  6. You defined a variable x = 10 in cell 3 and print(x) works in cell 4. Then you delete cell 3. Does print(x) still work in cell 4? Why or why not?
Quick Answers

  1. Ask, Acquire, Clean, Explore, Model, Communicate.
  2. Descriptive: "What were vaccination rates by region in 2023?" Predictive: "Which regions will fall below 70% coverage next year?"
  3. Yes. The question still comes first. Python is a means to an end; the thinking (what question to ask, how to interpret the data) is the real skill.
  4. Code cells contain Python code that gets executed. Markdown cells contain formatted text for explanation and narrative.
  5. You get a `NameError: name 'patient_count' is not defined`. Restarting the kernel clears all variables from memory.
  6. Yes, in the current session: deleting a cell doesn't remove x from memory, because it was stored when you ran cell 3. But it will fail after a kernel restart, because no cell defines x anymore. This is a dangerous situation; always make sure your notebook runs top to bottom.

What's Next

You now have the building blocks: variables to store data, four data types to represent different kinds of values, operators to compute with them, strings for text, and the ability to read error messages when things go sideways.

But right now, your programs are linear — they execute every line in order, every time, with no decision-making and no repetition. Real data science requires programs that can choose (should I include this data point or skip it?) and repeat (do this calculation for every country in the dataset).

In Chapter 4: Python Fundamentals II — Control Flow, Functions, and Thinking Like a Programmer, you'll learn:

  • if/elif/else statements — how to make your code take different paths based on data values (like categorizing vaccination rates as "low," "medium," or "high")
  • for loops — how to repeat an operation for every item in a collection (like computing a statistic for each country)
  • Functions — how to package reusable logic so you write it once and use it everywhere

The booleans and comparison operators you learned in Section 3.4 are about to become very practical. See you in Chapter 4.