

Chapter 4: Python Fundamentals II: Control Flow, Functions, and Thinking Like a Programmer

"First, solve the problem. Then, write the code." — John Johnson


Chapter Overview

Elena is looking at vaccination data for 50 countries. She wants to label each country's vaccination rate as "low," "medium," or "high." She could type out 50 individual comparisons — but that would take all afternoon and would break the moment someone added a 51st country.

Marcus records his bakery's sales for every day it has been open. Each week, he wants the total, the daily average, and the best day, calculated automatically instead of by copying formulas in a spreadsheet.

Priya needs to convert raw three-point shooting percentages into readable labels like "35.2%" for her article, and she wants to do it the same way every time, without accidentally formatting one number differently from the rest.

Jordan wants to check whether every grade in a dataset is a valid number between 0 and 100 before running any analysis, so one bad entry doesn't silently corrupt the results.

All four of them need the same three things: a way to make decisions (if this, then that), a way to repeat actions (do this for every item), and a way to package logic into reusable pieces (define it once, use it anywhere). Those three things — conditionals, loops, and functions — are what turn a collection of isolated Python statements into a real program.

In this chapter, you will learn to:

  1. Write conditional statements using if, elif, and else to make decisions based on data values (all paths)
  2. Construct for loops to iterate over sequences and while loops for condition-based repetition (all paths)
  3. Define functions with parameters and return values that encapsulate reusable logic (all paths)
  4. Decompose a multi-step data processing task into a sequence of function calls (standard + deep dive paths)
  5. Trace program execution through conditionals and loops, predicting output before running code (all paths)

What you need before starting: Everything from Chapter 3 — variables, data types (int, float, str, bool), arithmetic operators, comparison operators, and f-strings. If you can create a variable, do math with it, compare two values with > or ==, and print the result using an f-string, you're ready.


4.1 Making Decisions with if, elif, and else

Programs would be pretty boring if they always did the same thing. The power of programming begins the moment your code can choose — when it can look at a piece of data and decide what to do based on what it sees.

Your First Conditional

Recall from Chapter 3 that comparison operators like >, <, ==, and != produce boolean values — True or False. Those booleans are about to become very useful.

Let's say Elena has a country's vaccination rate stored in a variable, and she wants to print a message if the rate is below 50%:

vaccination_rate = 42.3

if vaccination_rate < 50:
    print("Warning: low vaccination rate")

Output:

Warning: low vaccination rate

That's an if statement — the simplest form of a conditional. Let's break down every piece:

  • if is the keyword that starts the conditional.
  • vaccination_rate < 50 is the boolean expression — the test. Python evaluates it and gets either True or False.
  • The colon : at the end of the if line is required. Forget it, and Python stops with a SyntaxError before running any of your code.
  • The next line is indented by four spaces. This indentation isn't decorative — it tells Python that this line belongs to the if block. Everything indented under the if only runs when the condition is True.

Try changing the vaccination rate to 72.1 and running it again. Nothing prints — because the condition 72.1 < 50 is False, so Python skips the indented block entirely.

Adding else: What to Do When the Condition Is False

Often, you want to do one thing if the condition is true and something different if it's false. That's where else comes in:

vaccination_rate = 72.1

if vaccination_rate < 50:
    print("Warning: low vaccination rate")
else:
    print("Vaccination rate is acceptable")

Output:

Vaccination rate is acceptable

The else block catches everything that the if didn't. There's no condition on the else line — it simply means "otherwise." Notice that else is at the same indentation level as if, with its own colon and its own indented block.

Multiple Categories with elif

Elena doesn't just want "low" and "not low." She wants three categories: low (below 50%), medium (50% up to, but not including, 80%), and high (80% and above). For that, she needs elif, short for "else if":

vaccination_rate = 65.0

if vaccination_rate < 50:
    category = "low"
elif vaccination_rate < 80:
    category = "medium"
else:
    category = "high"

print(f"{vaccination_rate}% → {category}")

Output:

65.0% → medium

Here's how Python evaluates this, step by step:

  1. Is vaccination_rate < 50? Is 65.0 < 50? No. Skip this block.
  2. Is vaccination_rate < 80? Is 65.0 < 80? Yes. Execute this block: category = "medium".
  3. Since a condition matched, skip the else block entirely.

The elif chain is evaluated top to bottom, and Python stops at the first condition that's True. This matters! If you wrote the conditions in a different order, you'd get different results.

Why don't we write elif vaccination_rate >= 50 and vaccination_rate < 80? Because by the time Python reaches the elif, it already knows the rate is not less than 50 (it failed the first if). So the >= 50 check is redundant. Writing conditions that assume what's already been ruled out makes your code shorter and clearer. This is a habit worth building from the start.
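To make the point about order concrete, the sketch below runs the same two tests in both orders on a rate of 42.3. The variable names category_a and category_b are just for illustration. With the tests swapped, 42.3 < 80 is True, so Python takes that branch and never reaches the < 50 test.

```python
vaccination_rate = 42.3

# Correct order: the narrower test (< 50) comes first
if vaccination_rate < 50:
    category_a = "low"
elif vaccination_rate < 80:
    category_a = "medium"
else:
    category_a = "high"

# Swapped order: 42.3 < 80 is True, so Python stops at the first branch
if vaccination_rate < 80:
    category_b = "medium"
elif vaccination_rate < 50:
    category_b = "low"  # unreachable: any rate below 50 is also below 80
else:
    category_b = "high"

print(f"Correct order: {category_a}")  # Correct order: low
print(f"Swapped order: {category_b}")  # Swapped order: medium
```

A useful rule of thumb: when each test is a "less than" check, put the smallest threshold first.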

Indentation Matters — Really

In many programming languages, indentation is optional — it makes code prettier but doesn't change how it runs. Python is different. Indentation is how Python knows which lines belong to which block. This is one of Python's most distinctive features, and it catches every beginner at least once.

vaccination_rate = 42.3

if vaccination_rate < 50:
    print("Warning: low rate")
    print("Consider outreach programs")
print("Analysis complete")

Output:

Warning: low rate
Consider outreach programs
Analysis complete

The first two print statements are indented — they're inside the if block and only run when the condition is true. The third print is not indented — it's outside the if block and runs no matter what. Now change the rate to 85.0:

Output:

Analysis complete

Only the un-indented line runs, because the condition was false and the indented lines were skipped.

🐛 Debugging Spotlight: IndentationError

If you mix up your indentation, Python won't guess what you meant. It will stop and tell you:

if vaccination_rate < 50:
print("Warning")  # Oops: no indentation!

IndentationError: expected an indented block after 'if' statement on line 1

The fix: Make sure every line inside an if, elif, or else block is indented consistently; four spaces is the standard. Most code editors (including Jupyter) indent the next line automatically when you press Enter after a colon. If you get this error, look at the line Python points to and check that it's indented properly.

Also watch out for mixing tabs and spaces. If your editor uses tabs in one place and spaces in another, Python may raise a TabError. The simplest prevention: configure your editor to insert four spaces when you press Tab. Jupyter does this by default.

Nesting Conditionals

You can put an if inside another if. Elena might want to add a special flag for critically low rates:

vaccination_rate = 22.5

if vaccination_rate < 50:
    category = "low"
    if vaccination_rate < 25:
        category = "critically low"
else:
    category = "acceptable"

print(f"{vaccination_rate}% → {category}")

Output:

22.5% → critically low

Nesting works, but deep nesting (three or four levels) makes code hard to read. If you find yourself nesting more than two levels, it's usually a sign that you should restructure your logic — often by using functions, which we'll learn in Section 4.4.
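For this particular example, one way to remove the nesting is to order the tests from narrowest to widest in a single elif chain. This is one possible restructuring, not the only one; it gives the same label for 22.5 and keeps every branch at one level of indentation.

```python
vaccination_rate = 22.5

# Narrowest test first, so each later branch can assume the earlier ones failed
if vaccination_rate < 25:
    category = "critically low"
elif vaccination_rate < 50:
    category = "low"
else:
    category = "acceptable"

print(f"{vaccination_rate}% → {category}")  # 22.5% → critically low
```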

🔄 Check Your Understanding

  1. What is a boolean expression, and where does it appear in an if statement?
  2. What happens if you forget the colon at the end of an if line?
  3. In the three-category example (low/medium/high), what would category be if vaccination_rate were exactly 80.0?
  4. (From Chapter 3) What data type does vaccination_rate < 50 produce?

4.2 Repeating Actions with for Loops

Conditionals let your program make decisions. Loops let your program do things over and over again, and repeating the same operation across many values is a large share of what data science code actually does.

The Basic for Loop

Elena has a list of country names. (We'll formally learn about Python lists in Chapter 5, but for now, just think of a list as a sequence of items between square brackets.) She wants to print each one:

countries = ["Brazil", "India", "Nigeria", "Germany", "Japan"]

for country in countries:
    print(country)

Output:

Brazil
India
Nigeria
Germany
Japan

Let's read this almost like English: "For each country in the list countries, print the country." The variable country is called the loop variable — it automatically takes on each value in the sequence, one at a time. On the first pass (or iteration), country is "Brazil". On the second iteration, it's "India". And so on, until the list is exhausted.

The key pieces:

  • for starts the loop.
  • country is the loop variable (you can name it anything, but descriptive names help).
  • in countries tells Python what to iterate over.
  • The colon : and the indented block work exactly like if: everything indented runs once per iteration.

Iteration is the technical term for one pass through the loop. If the list has 5 items, the loop does 5 iterations. If it has 5,000 items, it does 5,000 iterations — and Python doesn't break a sweat.

Combining Loops and Conditionals

Here's where things start to feel powerful. Elena can loop through countries and make decisions about each one:

rates = [42.3, 65.0, 88.1, 71.5, 93.2]
countries = ["Brazil", "India", "Nigeria", "Germany", "Japan"]

for i in range(len(countries)):
    rate = rates[i]
    country = countries[i]

    if rate < 50:
        label = "LOW"
    elif rate < 80:
        label = "medium"
    else:
        label = "high"

    print(f"{country}: {rate}% ({label})")

Output:

Brazil: 42.3% (LOW)
India: 65.0% (medium)
Nigeria: 88.1% (high)
Germany: 71.5% (medium)
Japan: 93.2% (high)

Don't worry about the range(len(...)) syntax for now — we'll revisit it in Chapter 5 when we dig deeper into lists. The important thing is the pattern: a loop that processes each item, and a conditional inside the loop that handles each item differently.

This is exactly the kind of task that would be tedious to do by hand for 5 countries, painful for 50, and impossible for 5,000. But for Python, it's the same amount of effort no matter how long the list is.

Accumulating a Total

One of the most common loop patterns in data science is the accumulator — starting with zero and adding to it on each iteration:

daily_sales = [420, 380, 510, 475, 390, 620, 550]

total = 0
for sale in daily_sales:
    total = total + sale

print(f"Weekly total: ${total}")
print(f"Daily average: ${total / len(daily_sales):.2f}")

Output:

Weekly total: $3345
Daily average: $477.86

Marcus would love this. Instead of adding seven numbers by hand, he writes a loop that works for any number of days. The pattern is always the same: create a variable before the loop (the accumulator), update it inside the loop, and use it after the loop.
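Marcus also wanted his best day. A running maximum follows the same shape as the accumulator: set it up before the loop, update it inside, use it after. In the sketch below, best_day and best_sales are illustrative names, and the code assumes the list has at least one day in it, because it starts from the first day's figure rather than from 0. It borrows the range(len(...)) indexing from the vaccination example so it can report which day was best.

```python
daily_sales = [420, 380, 510, 475, 390, 620, 550]

# A maximum starts from a real value (the first day), not from 0
best_day = 0                 # index of the best day seen so far
best_sales = daily_sales[0]  # sales on that day

for i in range(len(daily_sales)):
    if daily_sales[i] > best_sales:
        best_sales = daily_sales[i]
        best_day = i

# Indexes count from 0, so add 1 for a human-friendly day number
print(f"Best day: day {best_day + 1} with ${best_sales}")  # Best day: day 6 with $620
```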

Looping with range()

Sometimes you don't have a list to loop over — you just want to do something a specific number of times. The range() function generates a sequence of numbers:

for i in range(5):
    print(f"Iteration {i}")

Output:

Iteration 0
Iteration 1
Iteration 2
Iteration 3
Iteration 4

Notice that range(5) produces 0, 1, 2, 3, 4 — five numbers, starting from 0. This zero-based counting might feel odd, but it's a Python convention you'll get used to quickly. (It matches how list indexing works, which you'll see in Chapter 5.)

You can also give range() a start and end: range(1, 6) produces 1, 2, 3, 4, 5. And you can add a step: range(0, 100, 10) produces 0, 10, 20, 30, 40, 50, 60, 70, 80, 90.
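A quick way to check what a range produces is to wrap it in list(), which collects all of its numbers at once. Lists get their full treatment in Chapter 5, so treat this as a preview; the variable names are just labels for the three examples above.

```python
zero_to_four = list(range(5))        # end only
one_to_five = list(range(1, 6))      # start and end (end is excluded)
tens = list(range(0, 100, 10))       # start, end, and step

print(zero_to_four)  # [0, 1, 2, 3, 4]
print(one_to_five)   # [1, 2, 3, 4, 5]
print(tens)          # [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
```

In every case the end value is left out, which is why range(1, 6) stops at 5.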

Counting Items That Meet a Condition

Here's another bread-and-butter pattern — counting how many items satisfy some criterion:

rates = [42.3, 65.0, 88.1, 71.5, 93.2, 38.7, 55.0]

low_count = 0
for rate in rates:
    if rate < 50:
        low_count = low_count + 1

print(f"{low_count} countries have low vaccination rates")

Output:

2 countries have low vaccination rates

This combines the accumulator pattern with a conditional inside the loop. It's a pattern you'll use constantly in data science: "How many X satisfy condition Y?"

🔄 Check Your Understanding

  1. What is the purpose of the loop variable in a for loop?
  2. How many times does for i in range(10): execute its body?
  3. (From Chapter 3) In the accumulator pattern, what type is the variable total — and what would happen if you forgot to initialize it to 0 before the loop?
  4. Write a for loop (mentally or on paper) that prints the squares of the numbers 1 through 5. What output would you expect?

4.3 While Loops: Repeating Until a Condition Changes

A for loop is perfect when you know in advance how many iterations you want — you have a list, a range, or some other sequence to go through. But sometimes you want to keep going until something changes, and you don't know when that will be. That's a while loop.

The Basic while Loop

count = 1
while count <= 5:
    print(f"Count is {count}")
    count = count + 1

print("Done!")

Output:

Count is 1
Count is 2
Count is 3
Count is 4
Count is 5
Done!

A while loop checks its condition before each iteration. If the condition is True, it runs the body and checks again. If it's False, it stops. The key difference from a for loop: you are responsible for making the condition eventually become False. If you don't, the loop runs forever.

Practical Use: Input Validation

Here's a scenario Jordan might face. Before analyzing grades, he wants to make sure each value is reasonable:

grade = -5  # Simulate a bad input

while grade < 0 or grade > 100:
    print(f"Invalid grade: {grade}")
    print("Grade must be between 0 and 100")
    grade = 75  # Simulate getting a corrected value

print(f"Valid grade accepted: {grade}")

Output:

Invalid grade: -5
Grade must be between 0 and 100
Valid grade accepted: 75

In a real interactive program, the line grade = 75 would be replaced by asking the user for new input. The while loop keeps asking until it gets something valid. We're simulating the pattern here — the important idea is that the loop doesn't know in advance how many times it will run. It depends on the data.
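To see the loop run a data-dependent number of times, here is a sketch that replaces the single hard-coded correction with a list of simulated attempts. The values in attempts are made up for illustration, and the sketch assumes at least one of them is valid (otherwise it would run past the end of the list). With these values the body runs twice: once for -5 and once for 120.

```python
# Simulated inputs; in a real program each would come from the user
attempts = [-5, 120, 75]
position = 0
grade = attempts[position]

# Keep going until the grade falls in the valid range
while grade < 0 or grade > 100:
    print(f"Invalid grade: {grade}")
    position = position + 1  # move on to the next simulated attempt
    grade = attempts[position]

print(f"Valid grade accepted: {grade} (after {position} rejected attempts)")
```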

A Data Processing Example

Let's say Marcus wants to process sales data until he encounters a zero, which signals the end of the data:

sales_data = [420, 380, 510, 0, 475, 390]
index = 0
total = 0

while index < len(sales_data) and sales_data[index] != 0:
    total = total + sales_data[index]
    index = index + 1

print(f"Processed {index} days, total: ${total}")

Output:

Processed 3 days, total: $1310

The loop stops as soon as it hits the zero — it doesn't process the numbers after it. This "process until a sentinel value" pattern is common in real-world data.
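The order of the two tests joined by and matters here. Python checks index < len(sales_data) first, and if that is False it skips the second test entirely. That guard is what keeps the loop safe when the data contains no zero at all, as in this sketch (the shorter sales_data list is made up for illustration):

```python
# Same loop as above, but this data has no zero sentinel
sales_data = [420, 380, 510]
index = 0
total = 0

# Once index reaches 3, the length check is False, so `and` never
# evaluates sales_data[3], which doesn't exist
while index < len(sales_data) and sales_data[index] != 0:
    total = total + sales_data[index]
    index = index + 1

print(f"Processed {index} days, total: ${total}")  # Processed 3 days, total: $1310
```

If you swapped the two tests, this version would fail with an IndexError on the fourth check.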

🐛 Debugging Spotlight: Infinite Loops

The most common while loop bug is an infinite loop — a loop whose condition never becomes False:

count = 1
while count <= 5:
    print(f"Count is {count}")  # Oops! Forgot to update count!

This prints "Count is 1" forever (or until you interrupt it). In Jupyter, you'll see the cell keep running with a [*] that never turns into a number. To stop it, click the "Interrupt Kernel" button (the square icon) or press I twice in command mode. If you're running Python in a terminal, press Ctrl+C.

Prevention: Every while loop should have something inside the body that moves the condition toward False. Before you run a while loop, ask yourself: "What changes on each iteration that will eventually make this stop?"
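One defensive habit is to give a while loop an upper limit on iterations, so it stops even if its main condition never turns False. In this sketch, halving a value until it drops below 1 stands in for a convergence-style loop, and max_iterations is an arbitrary cap chosen for illustration:

```python
value = 100.0
iterations = 0
max_iterations = 1000  # arbitrary safety cap, chosen for illustration

# Stop when the value drops below 1, or when the cap is reached
while value >= 1 and iterations < max_iterations:
    value = value / 2
    iterations = iterations + 1

if iterations == max_iterations:
    print("Stopped early: hit the safety cap")
else:
    print(f"Dropped below 1 after {iterations} halvings")  # Dropped below 1 after 7 halvings
```

The if after the loop tells you which condition ended it, so a runaway loop shows up as a message instead of a frozen cell.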

When to Use for vs. while

Here's a simple guideline:

  • Use for when you know what you're iterating over — a list, a range, a sequence of some kind. This covers the vast majority of data science loops.
  • Use while when you're waiting for a condition to change and you don't know how many iterations it will take — validation loops, convergence algorithms, or processing data until a stop condition.

In practice, the large majority of loops you write for data science will be for loops. But understanding while loops is important because they appear in algorithms, simulations, and anywhere you need "keep going until X."

🔄 Check Your Understanding

  1. What is the key difference between a for loop and a while loop?
  2. What happens if the condition in a while loop starts as False?
  3. Name one scenario where a while loop is more appropriate than a for loop.
  4. (From Section 4.1) If you put an if statement inside a while loop, which controls how many times the if is evaluated — the while or the if?

4.4 Writing Your Own Functions

Everything so far in this chapter has been about making programs smarter: conditionals make decisions, loops repeat actions. But there's a problem lurking. Watch what happens when Marcus tries to compute the same summary for two different weeks:

# Week 1
week1_sales = [420, 380, 510, 475, 390, 620, 550]
week1_total = 0
for sale in week1_sales:
    week1_total = week1_total + sale
week1_avg = week1_total / len(week1_sales)
print(f"Week 1: total=${week1_total}, avg=${week1_avg:.2f}")

# Week 2
week2_sales = [395, 410, 480, 520, 445, 580, 610]
week2_total = 0
for sale in week2_sales:
    week2_total = week2_total + sale
week2_avg = week2_total / len(week2_sales)
print(f"Week 2: total=${week2_total}, avg=${week2_avg:.2f}")

Do you see the repetition? The logic for computing the total and average is identical — the only thing that changes is the data. Marcus just copy-pasted the code and changed variable names. That works for two weeks. But what about 52 weeks? What if he decides to also track the maximum? He'd have to make the same change in 52 places.

This is the problem that functions solve.

The DRY Principle

There's a famous programming principle called DRY: Don't Repeat Yourself. The idea is simple: if you find yourself writing the same code more than once, that's a signal to write it once, give it a name, and reuse it.

The tool for this is the function — a named block of code that performs a specific task, can accept input, and can produce output.

Defining Your First Function

def summarize_sales(daily_sales):
    total = 0
    for sale in daily_sales:
        total = total + sale
    average = total / len(daily_sales)
    return total, average

Let's break this down:

  • def is the keyword that starts a function definition. It's short for "define."
  • summarize_sales is the function's name. Like variable names, function names should be descriptive and use lowercase with underscores.
  • (daily_sales) is the parameter — a variable that represents the input the function expects. When you call the function, you'll pass in actual data, and it will be assigned to this parameter name.
  • The colon : and indented block work just like if and for — everything indented is the function's body.
  • return total, average is the return statement — it sends results back to whoever called the function. A function without a return statement returns None (Python's way of saying "nothing").

Now Marcus can use this function for any week:

week1_total, week1_avg = summarize_sales([420, 380, 510, 475, 390, 620, 550])
week2_total, week2_avg = summarize_sales([395, 410, 480, 520, 445, 580, 610])

print(f"Week 1: total=${week1_total}, avg=${week1_avg:.2f}")
print(f"Week 2: total=${week2_total}, avg=${week2_avg:.2f}")

Output:

Week 1: total=$3345, avg=$477.86
Week 2: total=$3440, avg=$491.43

Each week's summary now takes one line instead of four, and the summarizing logic lives in exactly one place. If Marcus wants to add a third week, fourth week, or fifty-second week, it's one more line each.
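This also answers the earlier question, "What if he decides to also track the maximum?" The change happens inside the function, once. The sketch below is one way to extend summarize_sales; the name best and the choice to return three values are ours, and it assumes every week has at least one day of sales.

```python
def summarize_sales(daily_sales):
    total = 0
    best = daily_sales[0]  # assumes at least one day of sales
    for sale in daily_sales:
        total = total + sale
        if sale > best:
            best = sale  # new highest day so far
    average = total / len(daily_sales)
    return total, average, best


week1_total, week1_avg, week1_best = summarize_sales([420, 380, 510, 475, 390, 620, 550])
week2_total, week2_avg, week2_best = summarize_sales([395, 410, 480, 520, 445, 580, 610])

print(f"Week 1: total=${week1_total}, avg=${week1_avg:.2f}, best=${week1_best}")
print(f"Week 2: total=${week2_total}, avg=${week2_avg:.2f}, best=${week2_best}")
```

Because the function now returns three values, each call that unpacks its result needs a third variable name. The loop logic itself still lives in one place.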

🚪 Threshold Concept: Functions as Abstractions

This is one of those moments where something fundamental shifts in how you think about programming. Take a breath — this is important.

Before functions, you think about code line by line: "First do this, then do that, then do the next thing." Every detail is in front of you, all the time.

With functions, you start thinking at a higher level: "Summarize the sales data." You don't care how the summarizing works — you trust that the function does it correctly, the same way you trust that print() prints things without thinking about the engineering inside print().

This shift from "how does it work step by step?" to "what does it accomplish?" is called abstraction, and it's one of the most important ideas in computer science. It's also how the human brain naturally works: when you drive a car, you think "turn left" rather than "contract the muscles in my left arm to rotate the steering wheel 90 degrees counterclockwise while simultaneously easing pressure on the accelerator pedal."

Functions let you name a block of logic. Once it has a name, you can think about it as a single concept rather than a sequence of steps. You can say summarize_sales(data) and your brain processes that as one idea, not seven lines of code. This is how programmers manage complexity — by building layers of abstraction, where each layer hides the details of the layer below.

Here's the practical payoff: when you read code that uses well-named functions, you can understand what it does without reading the function definitions:

data = load_vaccination_data("who_data.csv")
cleaned = remove_missing_values(data)
summary = compute_regional_averages(cleaned)
display_report(summary)

These four functions are hypothetical, and you don't know how any of them work internally. But you can read the program and understand its purpose: load data, clean it, compute averages, display results. That's the power of abstraction.

If this feels obvious: Good — it means you're already thinking like a programmer. The idea gets more powerful as programs get more complex.

If this feels abstract (pun intended): That's completely normal. The concept will solidify as you write more functions. For now, just remember: a function lets you give a name to a process, and once it has a name, you can use it without thinking about its insides.

Parameters vs. Arguments

These two terms are often used interchangeably, but they have a precise difference:

  • A parameter is the variable name in the function definition: def summarize_sales(daily_sales) — here, daily_sales is the parameter.
  • An argument is the actual value you pass when you call the function: summarize_sales([420, 380, 510]) — here, [420, 380, 510] is the argument.

Think of it this way: the parameter is the placeholder; the argument is the real data that fills it in. Most of the time you don't need to worry about the distinction, but it helps when reading documentation.

Return Values: Getting Results Back

A function can return a value, and you can capture that value in a variable:

def format_percentage(value):
    return f"{value:.1f}%"

result = format_percentage(42.356)
print(result)

Output:

42.4%

The return statement does two things: it sends a value back to the caller, and it immediately exits the function. Any code after a return statement (at the same indentation level) will never run.
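Because return exits immediately, a function can stop as soon as it has its answer. In the sketch below, first_low_rate is a hypothetical helper: it returns the first rate under 50 without looking at the rest of the list, and the final return None only runs if the loop finishes without finding one.

```python
def first_low_rate(rates):
    for rate in rates:
        if rate < 50:
            return rate  # leaves the function here, which also ends the loop
    return None  # reached only if no rate was below 50


print(first_low_rate([65.0, 42.3, 38.7]))  # 42.3
print(first_low_rate([88.1, 93.2]))        # None
```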

A function can return multiple values separated by commas, as we saw with summarize_sales. Python packs them into a tuple (a concept we'll cover in Chapter 5), and you can unpack them into separate variables:

total, avg = summarize_sales([100, 200, 300])
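If you skip the unpacking and print the result directly, you can see the tuple Python builds. In this sketch, summarize_sales is repeated from above so the snippet runs on its own:

```python
def summarize_sales(daily_sales):
    total = 0
    for sale in daily_sales:
        total = total + sale
    average = total / len(daily_sales)
    return total, average


result = summarize_sales([100, 200, 300])
print(result)  # (600, 200.0): both values packed into one tuple

total, avg = result  # unpacking splits the tuple into two variables
print(total)  # 600
print(avg)    # 200.0
```

Note that the average is 200.0, not 200: the / operator always produces a float.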

🐛 Debugging Spotlight: The Forgotten return

One of the most common function bugs is forgetting to include a return statement:

```python
def compute_average(numbers):
    total = 0
    for n in numbers:
        total = total + n
    average = total / len(numbers)
    # Oops: forgot to return the average!

result = compute_average([10, 20, 30])
print(result)
```

Output:

None

The function computes the average correctly inside itself, but it never sends the result back. The variable result gets None — Python's default return value. This bug is sneaky because there's no error message. Everything runs fine; the answer just disappears.

The fix: If your function computes something, make sure the last meaningful line is return <the_thing_you_computed>.

Scope: What Happens Inside Stays Inside

Variables created inside a function exist only inside that function. This is called scope:

def greet(name):
    message = f"Hello, {name}!"
    return message

greet("Elena")
# print(message)  # This would cause a NameError!

The variable message exists inside greet but not outside it. If you uncomment the last line, you'll get NameError: name 'message' is not defined. This is actually a feature, not a bug — it means you can use common variable names like total, count, or result inside different functions without them interfering with each other.
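The same rule protects variables outside the function. In this sketch, add_up is a hypothetical helper: the total created at the top level and the total created inside add_up are two separate variables, so calling the function leaves the outer one alone.

```python
total = 1000  # a variable at the top level of the notebook


def add_up(numbers):
    total = 0  # a separate variable that exists only inside add_up
    for n in numbers:
        total = total + n
    return total


inner_result = add_up([1, 2, 3])
print(inner_result)  # 6
print(total)         # 1000: the outer total was never touched
```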

🔄 Check Your Understanding

  1. What does the DRY principle stand for, and why does it matter?
  2. What is the difference between a parameter and an argument?
  3. What does a function return if you forget to include a return statement?
  4. (From Chapter 3) What data type is None — and how would you check whether a variable is None?

4.5 Functions That Call Functions: Building Up Complexity

So far, each function has been self-contained. But the real power of functions comes when they work together — when one function calls another, and that one calls another, building layers of abstraction.

A Concrete Example

Let's build a mini data-processing pipeline for Elena. She wants to take a vaccination rate, categorize it, format it as a percentage, and produce a report string — all in one go.

First, we write small, focused functions:

def categorize_rate(rate):
    if rate < 50:
        return "low"
    elif rate < 80:
        return "medium"
    else:
        return "high"

def format_percentage(value):
    return f"{value:.1f}%"

def make_report_line(country, rate):
    category = categorize_rate(rate)
    formatted = format_percentage(rate)
    return f"{country}: {formatted} ({category})"

Now make_report_line calls both categorize_rate and format_percentage:

print(make_report_line("Brazil", 42.3))
print(make_report_line("Germany", 71.5))
print(make_report_line("Japan", 93.2))

Output:

Brazil: 42.3% (low)
Germany: 71.5% (medium)
Japan: 93.2% (high)

Each function does one thing. categorize_rate categorizes. format_percentage formats. make_report_line assembles. If Elena later decides to change the category thresholds, she edits categorize_rate in one place, and every report line automatically updates.

Decomposition: Breaking Big Problems into Small Pieces

The process of taking a big task and breaking it into smaller, manageable pieces is called decomposition. It's one of the most important skills in programming — and in data science more broadly.

Here's how to think about decomposition:

  1. Describe the big task in plain English. ("Generate a vaccination report for a list of countries.")
  2. Identify the sub-tasks. ("For each country: look up the rate, categorize it, format it, build a report line.")
  3. Turn each sub-task into a function. (categorize_rate, format_percentage, make_report_line)
  4. Write a main function or loop that orchestrates them.

Here's a main function that does this for Elena's report:
def generate_report(countries, rates):
    print("=== Vaccination Report ===")
    low_count = 0
    for i in range(len(countries)):
        line = make_report_line(countries[i], rates[i])
        print(line)
        if categorize_rate(rates[i]) == "low":
            low_count = low_count + 1
    print(f"\nCountries with low rates: {low_count}")

countries = ["Brazil", "India", "Nigeria", "Germany", "Japan"]
rates = [42.3, 65.0, 88.1, 71.5, 93.2]
generate_report(countries, rates)

Output:

=== Vaccination Report ===
Brazil: 42.3% (low)
India: 65.0% (medium)
Nigeria: 88.1% (high)
Germany: 71.5% (medium)
Japan: 93.2% (high)

Countries with low rates: 1

Notice how readable the generate_report function is. Even without seeing the definitions of make_report_line and categorize_rate, you can follow what's happening. That's decomposition at work.

The Project Milestone Functions

This is the perfect time to write the helper functions for your progressive project. Elena needs three things for analyzing WHO vaccination data:

Function 1: Format a number as a percentage

def format_as_percentage(value, decimals=1):
    return f"{value:.{decimals}f}%"

The decimals=1 is a default parameter — if you call format_as_percentage(42.356), it uses 1 decimal place. But you can override it: format_as_percentage(42.356, 2) gives "42.36%".
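The f-string in format_as_percentage nests one replacement field inside another. Python fills in {decimals} first, so with decimals=2 the format spec becomes .2f. A short usage sketch (the 0-decimal call is our own extra example):

```python
def format_as_percentage(value, decimals=1):
    # {decimals} is substituted first, producing a spec like .1f or .2f
    return f"{value:.{decimals}f}%"


print(format_as_percentage(42.356))     # 42.4%  (default: 1 decimal place)
print(format_as_percentage(42.356, 2))  # 42.36%
print(format_as_percentage(42.356, 0))  # 42%
```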

Function 2: Validate a data value

def is_valid_rate(value):
    if not isinstance(value, (int, float)):
        return False
    if value < 0 or value > 100:
        return False
    return True

Function 3: Categorize a vaccination rate

def categorize_vaccination_rate(rate):
    if not is_valid_rate(rate):
        return "invalid"
    if rate < 50:
        return "low"
    elif rate < 80:
        return "medium"
    else:
        return "high"

These three functions represent your project milestone for Chapter 4. You'll use them in later chapters as the project grows. Save them in a cell in your project notebook.

# Test your milestone functions
test_values = [42.3, 65.0, 88.1, -5, 110, "abc"]

for val in test_values:
    valid = is_valid_rate(val)
    if valid:
        cat = categorize_vaccination_rate(val)
        fmt = format_as_percentage(val)
        print(f"{val} → {fmt} ({cat})")
    else:
        print(f"{val} → INVALID")

Output:

42.3 → 42.3% (low)
65.0 → 65.0% (medium)
88.1 → 88.1% (high)
-5 → INVALID
110 → INVALID
abc → INVALID

🔄 Check Your Understanding

  1. What is decomposition, and why is it valuable in programming?
  2. In the vaccination report example, how many functions does generate_report call?
  3. What is a default parameter, and when would you use one?
  4. (From Section 4.2) The generate_report function uses a for loop. How many iterations does it perform for a list of 5 countries?

4.6 Thinking Like a Programmer: Pseudocode and Problem Decomposition

We've covered the syntax — if, for, while, def. But knowing the syntax of a language doesn't make you a writer, and knowing the syntax of Python doesn't make you a programmer. What separates someone who knows Python from someone who can solve problems with Python is a way of thinking.

Pseudocode: Thinking Before Typing

Pseudocode is a way of writing out your program logic in plain English (or whatever language you think in) before you write actual code. It's not meant to run — it's meant to help you think.

Let's say Elena asks you: "I have a list of vaccination rates for 50 countries. I want to know how many are in each category (low, medium, high) and what the overall average rate is."

Before you touch the keyboard, write pseudocode:

SET low_count, medium_count, high_count to 0
SET total to 0

FOR each rate in the list:
    ADD rate to total
    IF rate < 50:
        ADD 1 to low_count
    ELSE IF rate < 80:
        ADD 1 to medium_count
    ELSE:
        ADD 1 to high_count

COMPUTE average as total / number of rates

PRINT the counts and the average

Now translating to Python is almost mechanical:

def analyze_rates(rates):
    low_count = 0
    medium_count = 0
    high_count = 0
    total = 0

    for rate in rates:
        total = total + rate
        if rate < 50:
            low_count = low_count + 1
        elif rate < 80:
            medium_count = medium_count + 1
        else:
            high_count = high_count + 1

    average = total / len(rates)

    print(f"Low: {low_count}")
    print(f"Medium: {medium_count}")
    print(f"High: {high_count}")
    print(f"Average rate: {average:.1f}%")

The pseudocode step might feel unnecessary for simple problems. But as problems get more complex (and they will, starting in the very next chapter), pseudocode becomes essential. Experienced programmers routinely sketch their logic this way before writing code. It's not a beginner crutch; it's a professional tool.

The Problem-Solving Process

Here's a process that works for any programming problem:

  1. Understand the problem. What are the inputs? What are the expected outputs? Can you work through an example by hand?
  2. Write pseudocode. Describe your approach in plain language. Don't worry about syntax.
  3. Translate to code. Turn each pseudocode line into Python. Most lines will map almost one-to-one.
  4. Test with a small example. Don't start with 50 countries — start with 3. Verify by hand that the output is correct.
  5. Handle edge cases. What if the list is empty? What if a value is negative? What if there's only one item?
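Here is what steps 4 and 5 can look like on a small helper. This is a minimal sketch: `compute_average` is a hypothetical name (it isn't defined elsewhere in this chapter), and returning `None` for an empty list is just one design choice. Raising an error would be another.

```python
def compute_average(values):
    """Return the mean of a list of numbers, or None if the list is empty."""
    # Edge case (step 5): an empty list has length 0, and dividing by 0
    # would raise ZeroDivisionError, so handle it before doing any math.
    if len(values) == 0:
        return None

    total = 0
    for value in values:
        total = total + value
    return total / len(values)


# Step 4: a small example you can check by hand: (40 + 60 + 80) / 3 = 60
print(compute_average([40, 60, 80]))  # 60.0

# Step 5: edge cases
print(compute_average([]))            # None
print(compute_average([72.5]))        # 72.5
```

Notice that the empty-list check comes first. Once that case is handled, the rest of the function can safely assume there is at least one value to divide by.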

🧩 Productive Struggle: The Grading Summary

Here's a problem for you to work through. Don't look at the solution until you've spent at least 10 minutes on it. Struggle is where learning happens.

Problem: Jordan has a list of exam grades: [85, 92, 67, 78, 95, 43, 88, 71, 56, 90]. Write a program that:

  1. Counts how many grades are A (90-100), B (80-89), C (70-79), D (60-69), and F (below 60)
  2. Computes the class average
  3. Prints a summary

Step 1: Try writing pseudocode first. What are the accumulators you need? What conditions define each grade?

Step 2: Try translating your pseudocode to Python. You have all the tools — loops, conditionals, accumulators.

Step 3: Check your answer against this solution:

Solution:
def grade_summary(grades):
    a_count = 0
    b_count = 0
    c_count = 0
    d_count = 0
    f_count = 0
    total = 0

    for grade in grades:
        total = total + grade
        if grade >= 90:
            a_count = a_count + 1
        elif grade >= 80:
            b_count = b_count + 1
        elif grade >= 70:
            c_count = c_count + 1
        elif grade >= 60:
            d_count = d_count + 1
        else:
            f_count = f_count + 1

    average = total / len(grades)

    print("Grade Distribution:")
    print(f"  A: {a_count}")
    print(f"  B: {b_count}")
    print(f"  C: {c_count}")
    print(f"  D: {d_count}")
    print(f"  F: {f_count}")
    print(f"Class average: {average:.1f}")

grades = [85, 92, 67, 78, 95, 43, 88, 71, 56, 90]
grade_summary(grades)
Output:
Grade Distribution:
  A: 3
  B: 2
  C: 2
  D: 1
  F: 2
Class average: 76.5

If you got a different answer: That's okay — compare your approach to the solution and see where they diverge. Did you start the elif chain from the top (highest grade) or the bottom (lowest grade)? Both can work, but the order of conditions matters.
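To see why the order matters, compare the two hypothetical functions below, which label a single grade. The first tests the broadest condition first, so it gives wrong answers. The second starts from the bottom and works because each test uses < instead of >=.

```python
def letter_grade_wrong_order(grade):
    # Deliberately buggy: grade >= 60 is tested first, so any grade of 60
    # or above (including 95) matches here, and the grade >= 90 branch
    # can never be reached.
    if grade >= 60:
        return "D"
    elif grade >= 90:
        return "A"
    else:
        return "F"


def letter_grade_from_bottom(grade):
    # Bottom-up order: each test uses <, so the first condition that
    # matches is the correct one.
    if grade < 60:
        return "F"
    elif grade < 70:
        return "D"
    elif grade < 80:
        return "C"
    elif grade < 90:
        return "B"
    else:
        return "A"


print(letter_grade_wrong_order(95))  # D  (wrong!)
print(letter_grade_from_bottom(95))  # A
print(letter_grade_from_bottom(67))  # D
```

An elif chain stops at the first True condition. Whichever direction you choose, each test must only catch the grades that the earlier tests have not already claimed.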

If you got the same answer: Excellent. Now try adding a validate_grade function that checks whether each grade is between 0 and 100, and modify grade_summary to skip invalid grades.
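Once you've tried the extension yourself, you can compare notes with one possible sketch. `validate_grade` comes from the exercise. `average_valid_grades` and the sample list with bad entries are hypothetical and were made up for illustration. To keep the sketch short, it only covers the averaging part of the task. The same `if validate_grade(grade):` guard can wrap the elif chain inside grade_summary.

```python
def validate_grade(grade):
    """Return True if grade is a number between 0 and 100 (inclusive)."""
    if not isinstance(grade, (int, float)):
        return False
    if grade < 0 or grade > 100:
        return False
    return True


def average_valid_grades(grades):
    """Average only the valid grades and report how many were skipped."""
    total = 0
    valid_count = 0
    skipped = 0
    for grade in grades:
        if validate_grade(grade):
            total = total + grade
            valid_count = valid_count + 1
        else:
            skipped = skipped + 1  # invalid entry: leave it out of the total
    print(f"Skipped {skipped} invalid grade(s)")
    if valid_count == 0:
        return None  # nothing valid to average
    return total / valid_count


grades = [85, 92, -4, 78, 105, "n/a", 90]
print(average_valid_grades(grades))  # 86.25, after "Skipped 3 invalid grade(s)"
```

The key detail is dividing by valid_count rather than len(grades). The skipped entries shouldn't count toward the class size.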


Tracing Execution: Becoming the Computer

One of the most valuable skills you can build is the ability to trace through code in your head (or on paper) and predict what will happen before running it. This skill is what separates people who write code from people who understand it.

A Tracing Exercise

What does this code print? Work through it step by step before reading the answer.

values = [3, 7, 2, 8, 1]
result = values[0]

for val in values:
    if val > result:
        result = val

print(result)

Trace:

Iteration | val | val > result? | result after
--------- | --- | ------------- | ------------
Start     |     |               | 3
1         | 3   | 3 > 3? No     | 3
2         | 7   | 7 > 3? Yes    | 7
3         | 2   | 2 > 7? No     | 7
4         | 8   | 8 > 7? Yes    | 8
5         | 1   | 1 > 8? No     | 8

Answer: 8. The code finds the maximum value in the list. This is another fundamental pattern — and yes, Python has a built-in max() function that does this. But understanding the loop-based version helps you write similar patterns for tasks that don't have a built-in shortcut.
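Here is the same running-maximum loop adapted to remember *which* item won, not just the winning value. The example is Marcus's best sales day. This is a sketch under stated assumptions: `find_best_day` and the two parallel lists are hypothetical, and the code assumes both lists are non-empty and the same length. Chapter 5's data structures will offer tidier ways to keep names and numbers together.

```python
def find_best_day(day_names, sales):
    """Return the name of the day with the highest sales.

    Assumes day_names and sales are non-empty lists of the same length,
    where sales[i] is the total for day_names[i].
    """
    # Start with the first day as the best so far (same idea as result = values[0]).
    best_day = day_names[0]
    best_sales = sales[0]

    for i in range(len(sales)):
        # Strict > keeps the earliest day when two days tie.
        if sales[i] > best_sales:
            best_sales = sales[i]
            best_day = day_names[i]

    return best_day


days = ["Mon", "Tue", "Wed", "Thu", "Fri"]
daily_sales = [212.50, 198.00, 305.75, 305.75, 260.00]
print(find_best_day(days, daily_sales))  # Wed
```

Wednesday and Thursday tie here, and the function reports Wednesday because `>` only replaces the current best with a strictly larger value. Switching to `>=` would report the last of the tied days instead.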

🔄 Check Your Understanding

  1. What is pseudocode, and why should you write it before writing real code?
  2. In the tracing exercise, what would the code print if the list were [5, 3, 5, 1, 5]? (Trace it!)
  3. (From Section 4.4) If you wrapped the maximum-finding code in a function called find_max(numbers), what should the function return?
  4. (From Chapter 3) What would happen if values were an empty list []? What error would the line result = values[0] produce?

Project Checkpoint: Your Chapter 4 Milestone

Your progressive project milestone for this chapter is to write three helper functions that you'll use in later chapters. If you've been following along, you've already seen them — but now it's time to make them official.

Open your project notebook (the one you created in Chapter 2) and add a new section called "Chapter 4: Helper Functions." Write these three functions and test each one:

# === Chapter 4 Project Milestone ===
# Helper functions for the Global Health Data Explorer

def format_as_percentage(value, decimals=1):
    """Convert a number to a formatted percentage string."""
    return f"{value:.{decimals}f}%"

def is_valid_rate(value):
    """Check if a value is a valid rate (number between 0 and 100)."""
    if not isinstance(value, (int, float)):
        return False
    if value < 0 or value > 100:
        return False
    return True

def categorize_vaccination_rate(rate):
    """Categorize a vaccination rate as low, medium, or high."""
    if not is_valid_rate(rate):
        return "invalid"
    if rate < 50:
        return "low"
    elif rate < 80:
        return "medium"
    else:
        return "high"

# Test all three functions
print("Testing format_as_percentage:")
print(f"  42.356 → {format_as_percentage(42.356)}")
print(f"  42.356 (2 dec) → {format_as_percentage(42.356, 2)}")
print(f"  100 → {format_as_percentage(100)}")

print("\nTesting is_valid_rate:")
for test in [42.3, 0, 100, -5, 110, "abc"]:
    print(f"  {test} → {is_valid_rate(test)}")

print("\nTesting categorize_vaccination_rate:")
for test in [25.0, 65.0, 92.0, -10, 150]:
    print(f"  {test} → {categorize_vaccination_rate(test)}")

Expected output:

Testing format_as_percentage:
  42.356 → 42.4%
  42.356 (2 dec) → 42.36%
  100 → 100.0%

Testing is_valid_rate:
  42.3 → True
  0 → True
  100 → True
  -5 → False
  110 → False
  abc → False

Testing categorize_vaccination_rate:
  25.0 → low
  65.0 → medium
  92.0 → high
  -10 → invalid
  150 → invalid

Notice the triple-quoted strings under each def line (like """Convert a number to a formatted percentage string."""). These are called docstrings — they describe what the function does. They're not comments — they're actually stored by Python and can be accessed with help(format_as_percentage). Writing docstrings is a professional habit worth starting now.
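You can check this for yourself. A function's docstring is stored in its `__doc__` attribute, and `help()` displays it along with the function's signature. The sketch below repeats the milestone definition so it runs on its own. The exact layout of help()'s output varies between Python versions.

```python
def format_as_percentage(value, decimals=1):
    """Convert a number to a formatted percentage string."""
    return f"{value:.{decimals}f}%"


# The docstring lives on the function object itself
print(format_as_percentage.__doc__)
# → Convert a number to a formatted percentage string.

# help() shows the signature and the docstring together
help(format_as_percentage)
```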


Practical Considerations

Code Style: Making Your Code Readable

You've probably noticed that all the code in this chapter follows certain patterns: consistent indentation, descriptive variable names, blank lines between sections. This isn't accidental. Code is read far more often than it's written, and readable code is easier to debug, easier to modify, and easier for future-you to understand.

Some guidelines:

  • Use 4 spaces for indentation. Not 2, not 8, not tabs. Four spaces. This is the convention recommended by PEP 8, Python's official style guide.
  • Name functions with verbs: compute_average, validate_grade, format_percentage. A function does something.
  • Name variables with nouns: total_sales, country_name, vaccination_rate. A variable holds something.
  • Keep functions short. If a function is longer than about 15-20 lines, it's probably doing too much. Split it into smaller functions.
  • One function, one job. A function called compute_and_print_and_save_results is doing three jobs. Split it up.
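For example, analyze_rates from Section 4.6 both computes and prints. A hypothetical split might look like the sketch below. The function names are made up for illustration. One function computes and returns a number, and the other only handles the printing.

```python
def count_low_rates(rates):
    """Return how many rates are below 50 (computes only; prints nothing)."""
    count = 0
    for rate in rates:
        if rate < 50:
            count = count + 1
    return count


def print_low_rate_report(rates):
    """Print a one-line summary (reports only; the counting happens elsewhere)."""
    low = count_low_rates(rates)
    print(f"{low} of {len(rates)} countries have rates below 50%")


print_low_rate_report([42.3, 65.0, 88.1, 31.0])
# → 2 of 4 countries have rates below 50%
```

Because count_low_rates returns its result instead of printing it, you can reuse it in a test, a chart, or another calculation.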

Performance: Don't Worry About It (Yet)

You may have heard that Python is slower than languages like C or Java, and that for loops in particular are "slow." Both claims are true in a technical sense, but they don't matter yet.

For the data sizes you'll work with in this course (hundreds to hundreds of thousands of rows), Python loops are perfectly fast. When you reach millions of rows in later chapters, you'll learn about pandas and NumPy, which use optimized C code under the hood to process data much faster than a Python loop. But the logic you're learning now — loops, conditionals, accumulator patterns — is the same logic that pandas and NumPy express in more compact form.
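You can get a small taste of this with plain Python, no pandas or NumPy required. Below, the accumulator pattern sits next to the built-in `sum()`, which performs the same accumulation in a single call. The sales figures are made-up whole-dollar amounts, used for illustration.

```python
daily_sales = [212, 198, 305, 260]

# Accumulator pattern, written out as a loop
total = 0
for amount in daily_sales:
    total = total + amount

# The built-in sum() expresses the same accumulation in one call
compact_total = sum(daily_sales)

print(total)          # 975
print(compact_total)  # 975
```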

Learn the patterns now. Optimize later.


Summary: What You've Learned

This chapter introduced the three structures that make programs dynamic and powerful:

Concept                    | What It Does                                        | Key Syntax
-------------------------- | --------------------------------------------------- | ----------
Conditional (if/elif/else) | Makes decisions based on data                       | if condition:
for loop                   | Repeats an action for each item in a sequence       | for item in sequence:
while loop                 | Repeats an action as long as a condition stays True | while condition:
Function (def)             | Names a reusable block of logic                     | def name(params): ... return value
Pseudocode                 | Plans logic in plain language before coding         | No syntax, just clear thinking

Key patterns you should now recognize:

  • Accumulator pattern: Initialize a variable, update it in a loop, use it after the loop.
  • Count-with-condition pattern: Combine an accumulator with an if inside a loop.
  • Decomposition: Break big problems into small functions that call each other.
  • DRY principle: If you've written the same code twice, write a function instead.

Key pitfalls to watch for:

  • IndentationError: Inconsistent or missing indentation in blocks.
  • Infinite loops: A while loop whose condition never becomes False.
  • Forgotten return: A function that computes a value but doesn't send it back.
  • Scope confusion: Trying to use a variable outside the function where it was created.
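The last two pitfalls are easiest to recognize once you've seen them fail. In the hypothetical pair of functions below, the first computes an average but never returns it, so Python hands back None instead. The commented-out line at the end shows the scope error you would get by reaching for a function's local variable from outside the function.

```python
def average_without_return(values):
    result = sum(values) / len(values)
    # Forgotten return: the value is computed, then thrown away.


def average_with_return(values):
    result = sum(values) / len(values)
    return result


print(average_without_return([2, 4, 6]))  # None
print(average_with_return([2, 4, 6]))     # 4.0

# Scope: result exists only inside each function while it runs.
# print(result)  # would raise NameError: name 'result' is not defined
```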


🔄 Spaced Review: Concepts from Chapters 1-3

These questions pull from earlier chapters to keep foundational knowledge fresh. Spend a few minutes on them before moving on.

  1. (From Chapter 1) What are the six stages of the data science lifecycle? Which stage does this chapter's project milestone contribute to? (Hint: you're building tools that will be used in the exploration stage.)

  2. (From Chapter 2) In a Jupyter notebook, what's the difference between a Code cell and a Markdown cell? If you define a function in one Code cell and call it in another, will that work? (Yes — as long as you run the definition cell first.)

  3. (From Chapter 3) What's the difference between = and == in Python? Where did you use each one in this chapter?

  4. (From Chapter 3) If vaccination_rate = 42.3, what is the type of vaccination_rate < 50? What is its value?

  5. (From Chapter 1) Elena's vaccination rate analysis involves categorizing data, computing summaries, and generating reports. Which stages of the data science lifecycle is she working in?


What's Next

You can now make decisions, repeat actions, and package logic into reusable functions. That's a huge step — with just if, for, and def, you can write programs that process real data.

But there's a limitation you've probably felt: we've been storing data in individual variables or simple lists. Elena's vaccination data has country names and rates and regions and population sizes. Marcus's sales data has dates and amounts and product names. How do you keep all of that organized?

In Chapter 5: Working with Data Structures, you'll learn about Python's built-in tools for organizing complex data — dictionaries that map keys to values, lists of dictionaries that represent tables of data, and file I/O that lets you load data from the outside world. The functions you wrote in this chapter are about to get a lot more useful, because they'll have real, structured data to work with.

See you there.