Learning Objectives
- Write conditional statements using if, elif, and else to make decisions based on data values
- Construct for loops to iterate over sequences and while loops for condition-based repetition
- Define functions with parameters and return values that encapsulate reusable logic
- Decompose a multi-step data processing task into a sequence of function calls
- Trace program execution through conditionals and loops, predicting output before running code
In This Chapter
- Chapter Overview
- 4.1 Making Decisions with if, elif, and else
- 4.2 Repeating Actions with for Loops
- 4.3 While Loops: Repeating Until a Condition Changes
- 4.4 Writing Your Own Functions
- 4.5 Functions That Call Functions: Building Up Complexity
- 4.6 Thinking Like a Programmer: Pseudocode and Problem Decomposition
- Tracing Execution: Becoming the Computer
- Project Checkpoint: Your Chapter 4 Milestone
- Practical Considerations
- Summary: What You've Learned
- 🔄 Spaced Review: Concepts from Chapters 1-3
- What's Next
Chapter 4: Python Fundamentals II: Control Flow, Functions, and Thinking Like a Programmer
"First, solve the problem. Then, write the code." — John Johnson
Chapter Overview
Elena is looking at vaccination data for 50 countries. She wants to label each country's vaccination rate as "low," "medium," or "high." She could type out 50 individual comparisons — but that would take all afternoon and would break the moment someone added a 51st country.
Marcus has daily sales figures for every week his bakery has been open. He wants the total, the average, and the best day for every single week, automatically, without copying formulas in a spreadsheet.
Priya needs to convert raw three-point shooting percentages into readable labels like "35.2%" for her article, and she wants to do it the same way every time, without accidentally formatting one number differently from the rest.
Jordan wants to check whether every grade in a dataset is a valid number between 0 and 100 before running any analysis, so one bad entry doesn't silently corrupt the results.
All four of them need the same three things: a way to make decisions (if this, then that), a way to repeat actions (do this for every item), and a way to package logic into reusable pieces (define it once, use it anywhere). Those three things — conditionals, loops, and functions — are what turn a collection of isolated Python statements into a real program.
In this chapter, you will learn to:
- Write conditional statements using if, elif, and else to make decisions based on data values (all paths)
- Construct for loops to iterate over sequences and while loops for condition-based repetition (all paths)
- Define functions with parameters and return values that encapsulate reusable logic (all paths)
- Decompose a multi-step data processing task into a sequence of function calls (standard + deep dive paths)
- Trace program execution through conditionals and loops, predicting output before running code (all paths)
What you need before starting: Everything from Chapter 3 — variables, data types (int, float, str, bool), arithmetic operators, comparison operators, and f-strings. If you can create a variable, do math with it, compare two values with > or ==, and print the result using an f-string, you're ready.
4.1 Making Decisions with if, elif, and else
Programs would be pretty boring if they always did the same thing. The power of programming begins the moment your code can choose — when it can look at a piece of data and decide what to do based on what it sees.
Your First Conditional
Recall from Chapter 3 that comparison operators like >, <, ==, and != produce boolean values — True or False. Those booleans are about to become very useful.
Let's say Elena has a country's vaccination rate stored in a variable, and she wants to print a message if the rate is below 50%:
vaccination_rate = 42.3
if vaccination_rate < 50:
    print("Warning: low vaccination rate")
Output:
Warning: low vaccination rate
That's an if statement — the simplest form of a conditional. Let's break down every piece:
- if is the keyword that starts the conditional.
- vaccination_rate < 50 is the boolean expression, the test. Python evaluates it and gets either True or False.
- The colon : at the end of the if line is required. Forget it, and Python will complain with a SyntaxError.
- The next line is indented by four spaces. This indentation isn't decorative: it tells Python that this line belongs to the if block. Everything indented under the if only runs when the condition is True.
Try changing the vaccination rate to 72.1 and running it again. Nothing prints — because the condition 72.1 < 50 is False, so Python skips the indented block entirely.
Adding else: What to Do When the Condition Is False
Often, you want to do one thing if the condition is true and something different if it's false. That's where else comes in:
vaccination_rate = 72.1
if vaccination_rate < 50:
    print("Warning: low vaccination rate")
else:
    print("Vaccination rate is acceptable")
Output:
Vaccination rate is acceptable
The else block catches everything that the if didn't. There's no condition on the else line — it simply means "otherwise." Notice that else is at the same indentation level as if, with its own colon and its own indented block.
Multiple Categories with elif
Elena doesn't just want "low" and "not low." She wants three categories: low (below 50%), medium (50% up to, but not including, 80%), and high (80% and above). For that, she needs elif, short for "else if":
vaccination_rate = 65.0
if vaccination_rate < 50:
    category = "low"
elif vaccination_rate < 80:
    category = "medium"
else:
    category = "high"
print(f"{vaccination_rate}% → {category}")
Output:
65.0% → medium
Here's how Python evaluates this, step by step:
- Is vaccination_rate < 50? Is 65.0 < 50? No. Skip this block.
- Is vaccination_rate < 80? Is 65.0 < 80? Yes. Execute this block: category = "medium".
- Since a condition matched, skip the else block entirely.
The elif chain is evaluated top to bottom, and Python stops at the first condition that's True. This matters! If you wrote the conditions in a different order, you'd get different results.
Why don't we write elif vaccination_rate >= 50 and vaccination_rate < 80? Because by the time Python reaches the elif, it already knows the rate is not less than 50 (it failed the first if). So the >= 50 check is redundant. Writing conditions that assume what's already been ruled out makes your code shorter and clearer. This is a habit worth building from the start.
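To see why the order of the conditions matters, here is a small sketch that runs the same rate through the chain above and through a version with the first two tests swapped. The variable names category_original and category_swapped are made up for this illustration.

```python
vaccination_rate = 42.3

# Original order: the narrower test (< 50) comes first.
if vaccination_rate < 50:
    category_original = "low"
elif vaccination_rate < 80:
    category_original = "medium"
else:
    category_original = "high"

# Swapped order: the broader test (< 80) comes first,
# so it also catches every rate below 50.
if vaccination_rate < 80:
    category_swapped = "medium"
elif vaccination_rate < 50:
    category_swapped = "low"   # never reached: anything < 50 is also < 80
else:
    category_swapped = "high"

print(f"Original order: {category_original}")
print(f"Swapped order: {category_swapped}")
```

With the broader test first, a rate of 42.3 is labeled "medium", and the "low" branch can never run.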
Indentation Matters — Really
In many programming languages, indentation is optional — it makes code prettier but doesn't change how it runs. Python is different. Indentation is how Python knows which lines belong to which block. This is one of Python's most distinctive features, and it catches every beginner at least once.
vaccination_rate = 42.3
if vaccination_rate < 50:
    print("Warning: low rate")
    print("Consider outreach programs")
print("Analysis complete")
Output:
Warning: low rate
Consider outreach programs
Analysis complete
The first two print statements are indented — they're inside the if block and only run when the condition is true. The third print is not indented — it's outside the if block and runs no matter what. Now change the rate to 85.0:
Output:
Analysis complete
Only the un-indented line runs, because the condition was false and the indented lines were skipped.
🐛 Debugging Spotlight: IndentationError
If you mix up your indentation, Python won't guess what you meant. It will stop and tell you:
if vaccination_rate < 50:
print("Warning")  # Oops, no indentation!
IndentationError: expected an indented block after 'if' statement on line 1
The fix: Make sure every line inside an if, elif, or else block is indented consistently; four spaces is the standard. Most code editors (including Jupyter) indent automatically when you press Enter after a colon. If you get this error, look at the line Python is pointing to and check that it's indented properly.
Also watch out for mixing tabs and spaces. If your editor uses tabs in one place and spaces in another, Python may throw a TabError. The simplest prevention: configure your editor to insert four spaces when you press Tab. Jupyter does this by default.
Nesting Conditionals
You can put an if inside another if. Elena might want to add a special flag for critically low rates:
vaccination_rate = 22.5
if vaccination_rate < 50:
    category = "low"
    if vaccination_rate < 25:
        category = "critically low"
else:
    category = "acceptable"
print(f"{vaccination_rate}% → {category}")
Output:
22.5% → critically low
Nesting works, but deep nesting (three or four levels) makes code hard to read. If you find yourself nesting more than two levels, it's usually a sign that you should restructure your logic — often by using functions, which we'll learn in Section 4.4.
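One way to restructure a nested check like this, even before functions, is to flatten it into a single elif chain that tests the narrowest range first. The sketch below runs both versions side by side; nested_category and flat_category are illustrative names, not part of Elena's project.

```python
vaccination_rate = 22.5

# Nested version from the text, for comparison.
if vaccination_rate < 50:
    nested_category = "low"
    if vaccination_rate < 25:
        nested_category = "critically low"
else:
    nested_category = "acceptable"

# Flat version: test the narrowest range first, then widen.
if vaccination_rate < 25:
    flat_category = "critically low"
elif vaccination_rate < 50:
    flat_category = "low"
else:
    flat_category = "acceptable"

print(f"Nested: {nested_category}")
print(f"Flat: {flat_category}")
```

Both versions produce the same label here; the flat one simply has one level of indentation to keep track of.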
🔄 Check Your Understanding
- What is a boolean expression, and where does it appear in an if statement?
- What happens if you forget the colon at the end of an if line?
- In the three-category example (low/medium/high), what would category be if vaccination_rate were exactly 80.0?
- (From Chapter 3) What data type does vaccination_rate < 50 produce?
4.2 Repeating Actions with for Loops
Conditionals let your program make decisions. Loops let your program do things over and over again — and "over and over again" is about 90% of what data science code actually does.
The Basic for Loop
Elena has a list of country names. (We'll formally learn about Python lists in Chapter 5, but for now, just think of a list as a sequence of items between square brackets.) She wants to print each one:
countries = ["Brazil", "India", "Nigeria", "Germany", "Japan"]
for country in countries:
    print(country)
Output:
Brazil
India
Nigeria
Germany
Japan
Let's read this almost like English: "For each country in the list countries, print the country." The variable country is called the loop variable — it automatically takes on each value in the sequence, one at a time. On the first pass (or iteration), country is "Brazil". On the second iteration, it's "India". And so on, until the list is exhausted.
The key pieces:
- for starts the loop.
- country is the loop variable (you can name it anything, but descriptive names help).
- in countries tells Python what to iterate over.
- The colon : and the indented block work exactly like if — everything indented runs once per iteration.
Iteration is the technical term for one pass through the loop. If the list has 5 items, the loop does 5 iterations. If it has 5,000 items, it does 5,000 iterations — and Python doesn't break a sweat.
Combining Loops and Conditionals
Here's where things start to feel powerful. Elena can loop through countries and make decisions about each one:
rates = [42.3, 65.0, 88.1, 71.5, 93.2]
countries = ["Brazil", "India", "Nigeria", "Germany", "Japan"]
for i in range(len(countries)):
    rate = rates[i]
    country = countries[i]
    if rate < 50:
        label = "LOW"
    elif rate < 80:
        label = "medium"
    else:
        label = "high"
    print(f"{country}: {rate}% ({label})")
Output:
Brazil: 42.3% (LOW)
India: 65.0% (medium)
Nigeria: 88.1% (high)
Germany: 71.5% (medium)
Japan: 93.2% (high)
Don't worry about the range(len(...)) syntax for now — we'll revisit it in Chapter 5 when we dig deeper into lists. The important thing is the pattern: a loop that processes each item, and a conditional inside the loop that handles each item differently.
This is exactly the kind of task that would be tedious to do by hand for 5 countries, painful for 50, and impossible for 5,000. But for Python, it's the same amount of effort no matter how long the list is.
Accumulating a Total
One of the most common loop patterns in data science is the accumulator — starting with zero and adding to it on each iteration:
daily_sales = [420, 380, 510, 475, 390, 620, 550]
total = 0
for sale in daily_sales:
    total = total + sale
print(f"Weekly total: ${total}")
print(f"Daily average: ${total / len(daily_sales):.2f}")
Output:
Weekly total: $3345
Daily average: $477.86
Marcus would love this. Instead of adding seven numbers by hand, he writes a loop that works for any number of days. The pattern is always the same: create a variable before the loop (the accumulator), update it inside the loop, and use it after the loop.
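The same pattern extends to Marcus's "best day" question: keep a second variable that remembers the largest value seen so far. This is a minimal sketch that assumes the list is not empty; the names best and average are chosen here for illustration.

```python
daily_sales = [420, 380, 510, 475, 390, 620, 550]

total = 0
best = daily_sales[0]   # start with the first day as the best seen so far

for sale in daily_sales:
    total = total + sale
    if sale > best:     # found a new best day
        best = sale

average = total / len(daily_sales)
print(f"Total: ${total}, average: ${average:.2f}, best day: ${best}")
```

The shape is the same as before: set up the variables before the loop, update them inside it, and use them afterward.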
Looping with range()
Sometimes you don't have a list to loop over — you just want to do something a specific number of times. The range() function generates a sequence of numbers:
for i in range(5):
    print(f"Iteration {i}")
Output:
Iteration 0
Iteration 1
Iteration 2
Iteration 3
Iteration 4
Notice that range(5) produces 0, 1, 2, 3, 4 — five numbers, starting from 0. This zero-based counting might feel odd, but it's a Python convention you'll get used to quickly. (It matches how list indexing works, which you'll see in Chapter 5.)
You can also give range() a start and end: range(1, 6) produces 1, 2, 3, 4, 5. And you can add a step: range(0, 100, 10) produces 0, 10, 20, 30, 40, 50, 60, 70, 80, 90.
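A quick way to check what a range contains is to wrap it in list(), a built-in that collects the numbers so they can be printed. The variable names below are only for illustration.

```python
# list() turns each range into a list so its contents are visible.
first_five = list(range(5))          # stop only
one_to_five = list(range(1, 6))      # start and stop (stop is excluded)
by_tens = list(range(0, 100, 10))    # start, stop, and step

print(first_five)    # [0, 1, 2, 3, 4]
print(one_to_five)   # [1, 2, 3, 4, 5]
print(by_tens)       # [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
```

In every form, the stop value itself is never included.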
Counting Items That Meet a Condition
Here's another bread-and-butter pattern — counting how many items satisfy some criterion:
rates = [42.3, 65.0, 88.1, 71.5, 93.2, 38.7, 55.0]
low_count = 0
for rate in rates:
    if rate < 50:
        low_count = low_count + 1
print(f"{low_count} countries have low vaccination rates")
Output:
2 countries have low vaccination rates
This combines the accumulator pattern with a conditional inside the loop. It's a pattern you'll use constantly in data science: "How many X satisfy condition Y?"
🔄 Check Your Understanding
- What is the purpose of the loop variable in a for loop?
- How many times does for i in range(10): execute its body?
- (From Chapter 3) In the accumulator pattern, what type is the variable total, and what would happen if you forgot to initialize it to 0 before the loop?
- Write a for loop (mentally or on paper) that prints the squares of the numbers 1 through 5. What output would you expect?
4.3 While Loops: Repeating Until a Condition Changes
A for loop is perfect when you know in advance how many iterations you want — you have a list, a range, or some other sequence to go through. But sometimes you want to keep going until something changes, and you don't know when that will be. That's a while loop.
The Basic while Loop
count = 1
while count <= 5:
    print(f"Count is {count}")
    count = count + 1
print("Done!")
Output:
Count is 1
Count is 2
Count is 3
Count is 4
Count is 5
Done!
A while loop checks its condition before each iteration. If the condition is True, it runs the body and checks again. If it's False, it stops. The key difference from a for loop: you are responsible for making the condition eventually become False. If you don't, the loop runs forever.
Practical Use: Input Validation
Here's a scenario Jordan might face. Before analyzing grades, he wants to make sure each value is reasonable:
grade = -5 # Simulate a bad input
while grade < 0 or grade > 100:
    print(f"Invalid grade: {grade}")
    print("Grade must be between 0 and 100")
    grade = 75  # Simulate getting a corrected value
print(f"Valid grade accepted: {grade}")
Output:
Invalid grade: -5
Grade must be between 0 and 100
Valid grade accepted: 75
In a real interactive program, the line grade = 75 would be replaced by asking the user for new input. The while loop keeps asking until it gets something valid. We're simulating the pattern here — the important idea is that the loop doesn't know in advance how many times it will run. It depends on the data.
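To make the "keep asking" idea a little more concrete without real user input, the sketch below feeds the loop from a list of simulated entries. The list simulated_inputs and the counter attempt are assumptions made for this example, and the list must contain at least one valid grade or the loop would run past its end.

```python
# Hypothetical stand-in for user input: values tried in order.
simulated_inputs = [-5, 140, 75]

attempt = 0
grade = simulated_inputs[attempt]

while grade < 0 or grade > 100:
    print(f"Invalid grade: {grade}")
    attempt = attempt + 1
    grade = simulated_inputs[attempt]   # "ask again"

print(f"Valid grade accepted: {grade} (after {attempt} rejected values)")
```

Here the loop body runs twice, once for -5 and once for 140, and nothing in the code says "twice" anywhere: the data decides.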
A Data Processing Example
Let's say Marcus wants to process sales data until he encounters a zero, which signals the end of the data:
sales_data = [420, 380, 510, 0, 475, 390]
index = 0
total = 0
while index < len(sales_data) and sales_data[index] != 0:
    total = total + sales_data[index]
    index = index + 1
print(f"Processed {index} days, total: ${total}")
Output:
Processed 3 days, total: $1310
The loop stops as soon as it hits the zero — it doesn't process the numbers after it. This "process until a sentinel value" pattern is common in real-world data.
🐛 Debugging Spotlight: Infinite Loops
The most common while loop bug is an infinite loop, a loop whose condition never becomes False:
count = 1
while count <= 5:
    print(f"Count is {count}")
    # Oops! Forgot to update count!
This prints "Count is 1" forever (or until you interrupt it). In Jupyter, you'll see the cell keep running with a [*] that never turns into a number. To stop it, click the "Interrupt Kernel" button (the square icon); if you're running Python in a terminal, press Ctrl+C.
Prevention: Every while loop should have something inside the body that moves the condition toward False. Before you run a while loop, ask yourself: "What changes on each iteration that will eventually make this stop?"
When to Use for vs. while
Here's a simple guideline:
- Use for when you know what you're iterating over: a list, a range, a sequence of some kind. This covers the vast majority of data science loops.
- Use while when you're waiting for a condition to change and you don't know how many iterations it will take: validation loops, convergence algorithms, or processing data until a stop condition.
In practice, you'll use for loops about 90% of the time in data science. But understanding while loops is important because they appear in algorithms, simulations, and anywhere you need "keep going until X."
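As a small taste of the "keep going until X" idea, here is a sketch that halves a number until it drops below a tolerance. The starting value and tolerance are arbitrary choices for illustration; the point is that the number of iterations is discovered by the loop rather than known up front.

```python
value = 100.0
tolerance = 1.0
steps = 0

# Keep halving until the value drops below the tolerance.
while value >= tolerance:
    value = value / 2
    steps = steps + 1   # count how many halvings it took

print(f"Reached {value} after {steps} halvings")
```

The line value = value / 2 is what moves the condition toward False on every pass, so the loop is guaranteed to stop.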
🔄 Check Your Understanding
- What is the key difference between a for loop and a while loop?
- What happens if the condition in a while loop starts as False?
- Name one scenario where a while loop is more appropriate than a for loop.
- (From Section 4.1) If you put an if statement inside a while loop, which controls how many times the if is evaluated: the while or the if?
4.4 Writing Your Own Functions
Everything so far in this chapter has been about making programs smarter: conditionals make decisions, loops repeat actions. But there's a problem lurking. Watch what happens when Marcus tries to compute the same summary for two different weeks:
# Week 1
week1_sales = [420, 380, 510, 475, 390, 620, 550]
week1_total = 0
for sale in week1_sales:
    week1_total = week1_total + sale
week1_avg = week1_total / len(week1_sales)
print(f"Week 1: total=${week1_total}, avg=${week1_avg:.2f}")
# Week 2
week2_sales = [395, 410, 480, 520, 445, 580, 610]
week2_total = 0
for sale in week2_sales:
    week2_total = week2_total + sale
week2_avg = week2_total / len(week2_sales)
print(f"Week 2: total=${week2_total}, avg=${week2_avg:.2f}")
Do you see the repetition? The logic for computing the total and average is identical — the only thing that changes is the data. Marcus just copy-pasted the code and changed variable names. That works for two weeks. But what about 52 weeks? What if he decides to also track the maximum? He'd have to make the same change in 52 places.
This is the problem that functions solve.
The DRY Principle
There's a famous programming principle called DRY — Don't Repeat Yourself. The idea is simple: if you find yourself writing the same code more than once, something is wrong. You should write it once, give it a name, and then reuse it.
The tool for this is the function — a named block of code that performs a specific task, can accept input, and can produce output.
Defining Your First Function
def summarize_sales(daily_sales):
    total = 0
    for sale in daily_sales:
        total = total + sale
    average = total / len(daily_sales)
    return total, average
Let's break this down:
- def is the keyword that starts a function definition. It's short for "define."
- summarize_sales is the function's name. Like variable names, function names should be descriptive and use lowercase with underscores.
- (daily_sales) is the parameter, a variable that represents the input the function expects. When you call the function, you'll pass in actual data, and it will be assigned to this parameter name.
- The colon : and indented block work just like if and for: everything indented is the function's body.
- return total, average is the return statement. It sends results back to whoever called the function. A function without a return statement returns None (Python's way of saying "nothing").
Now Marcus can use this function for any week:
week1_total, week1_avg = summarize_sales([420, 380, 510, 475, 390, 620, 550])
week2_total, week2_avg = summarize_sales([395, 410, 480, 520, 445, 580, 610])
print(f"Week 1: total=${week1_total}, avg=${week1_avg:.2f}")
print(f"Week 2: total=${week2_total}, avg=${week2_avg:.2f}")
Output:
Week 1: total=$3345, avg=$477.86
Week 2: total=$3440, avg=$491.43
Two lines of code instead of twelve. And if Marcus wants to add a third week, fourth week, or fifty-second week, it's one more line each.
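The earlier worry about tracking the maximum in 52 places also goes away: with a function, the change happens once. A possible sketch is below. The name summarize_sales_with_best is hypothetical, and the code assumes each week's list has at least one entry.

```python
def summarize_sales_with_best(daily_sales):
    # Assumes daily_sales is not empty.
    total = 0
    best = daily_sales[0]          # best single day seen so far
    for sale in daily_sales:
        total = total + sale
        if sale > best:
            best = sale
    average = total / len(daily_sales)
    return total, average, best

w1_total, w1_avg, w1_best = summarize_sales_with_best([420, 380, 510, 475, 390, 620, 550])
w2_total, w2_avg, w2_best = summarize_sales_with_best([395, 410, 480, 520, 445, 580, 610])

print(f"Week 1: total=${w1_total}, avg=${w1_avg:.2f}, best=${w1_best}")
print(f"Week 2: total=${w2_total}, avg=${w2_avg:.2f}, best=${w2_best}")
```

Every week that calls the function picks up the new "best day" value without any further edits.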
🚪 Threshold Concept: Functions as Abstractions
This is one of those moments where something fundamental shifts in how you think about programming. Take a breath — this is important.
Before functions, you think about code line by line: "First do this, then do that, then do the next thing." Every detail is in front of you, all the time.
With functions, you start thinking at a higher level: "Summarize the sales data." You don't care how the summarizing works — you trust that the function does it correctly, the same way you trust that print() prints things without thinking about the engineering inside print().
This shift from "how does it work step by step?" to "what does it accomplish?" is called abstraction, and it's arguably one of the most important ideas in computer science. It's also how the human brain naturally works: when you drive a car, you think "turn left" rather than "contract the muscles in my left arm to rotate the steering wheel 90 degrees counterclockwise while simultaneously easing pressure on the accelerator pedal."
Functions let you name a block of logic. Once it has a name, you can think about it as a single concept rather than a sequence of steps. You can say summarize_sales(data) and your brain processes that as one idea, not seven lines of code. This is how programmers manage complexity — by building layers of abstraction, where each layer hides the details of the layer below.
Here's the practical payoff: when you read code that uses well-named functions, you can understand what it does without reading the function definitions:
data = load_vaccination_data("who_data.csv")
cleaned = remove_missing_values(data)
summary = compute_regional_averages(cleaned)
display_report(summary)
You don't know how any of these functions work internally. But you can read the program and understand its purpose: load data, clean it, compute averages, display results. That's the power of abstraction.
If this feels obvious: Good — it means you're already thinking like a programmer. The idea gets more powerful as programs get more complex.
If this feels abstract (pun intended): That's completely normal. The concept will solidify as you write more functions. For now, just remember: a function lets you give a name to a process, and once it has a name, you can use it without thinking about its insides.
Parameters vs. Arguments
These two terms are often used interchangeably, but they have a precise difference:
- A parameter is the variable name in the function definition: def summarize_sales(daily_sales). Here, daily_sales is the parameter.
- An argument is the actual value you pass when you call the function: summarize_sales([420, 380, 510]). Here, [420, 380, 510] is the argument.
Think of it this way: the parameter is the placeholder; the argument is the real data that fills it in. Most of the time you don't need to worry about the distinction, but it helps when reading documentation.
Return Values: Getting Results Back
A function can return a value, and you can capture that value in a variable:
def format_percentage(value):
    return f"{value:.1f}%"
result = format_percentage(42.356)
print(result)
Output:
42.4%
The return statement does two things: it sends a value back to the caller, and it immediately exits the function. Any code after a return statement (at the same indentation level) will never run.
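A short sketch makes the "exits immediately" behavior visible. The function describe_rate is invented for this example; note the final print, which can never run because it sits after a return.

```python
def describe_rate(rate):
    if rate < 0:
        return "invalid"              # exits here for negative input
    print("Checking a non-negative rate...")
    return "ok"
    print("This line never runs")     # unreachable: it follows a return

print(describe_rate(-3))
print(describe_rate(42.3))
```

For -3 the function leaves at the first return, so the "Checking..." message is skipped. For 42.3 the message prints and the function leaves at the second return.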
A function can return multiple values separated by commas, as we saw with summarize_sales. Python packs them into a tuple (a concept we'll cover in Chapter 5), and you can unpack them into separate variables:
total, avg = summarize_sales([100, 200, 300])
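If you capture the result in a single variable instead of unpacking it, you can see the tuple directly. This sketch repeats the summarize_sales definition from above so it runs on its own; the variable name result is just for illustration.

```python
def summarize_sales(daily_sales):
    total = 0
    for sale in daily_sales:
        total = total + sale
    average = total / len(daily_sales)
    return total, average

# One variable: the two values arrive packed together as a tuple.
result = summarize_sales([100, 200, 300])
print(result)          # (600, 200.0)

# Two variables: Python unpacks the tuple for you.
total, avg = result
print(total, avg)      # 600 200.0
```

Unpacking is usually the more readable choice, since each value gets a descriptive name.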
🐛 Debugging Spotlight: The Forgotten return
One of the most common function bugs is forgetting to include a return statement:
```python
def compute_average(numbers):
    total = 0
    for n in numbers:
        total = total + n
    average = total / len(numbers)
    # Oops, forgot to return the average!

result = compute_average([10, 20, 30])
print(result)
```
Output:
None
The function computes the average correctly inside itself, but it never sends the result back. The variable result gets None, Python's default return value. This bug is sneaky because there's no error message. Everything runs fine; the answer just disappears.
The fix: If your function computes something, make sure the last meaningful line is return <the_thing_you_computed>.
Scope: What Happens Inside Stays Inside
Variables created inside a function exist only inside that function. This is called scope:
def greet(name):
    message = f"Hello, {name}!"
    return message
greet("Elena")
# print(message) # This would cause a NameError!
The variable message exists inside greet but not outside it. If you uncomment the last line, you'll get NameError: name 'message' is not defined. This is actually a feature, not a bug — it means you can use common variable names like total, count, or result inside different functions without them interfering with each other.
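Here is a small sketch of that non-interference. Both functions below use a local variable called total for different purposes; the function names weekly_total and count_low are made up for this example.

```python
def weekly_total(sales):
    total = 0                 # local to weekly_total
    for sale in sales:
        total = total + sale
    return total

def count_low(rates):
    total = 0                 # a separate local that happens to share the name
    for rate in rates:
        if rate < 50:
            total = total + 1
    return total

sales_sum = weekly_total([420, 380, 510])
low_rates = count_low([42.3, 65.0, 38.7])
print(sales_sum, low_rates)   # 1310 2
```

Each call gets its own total, so the count of low rates never leaks into the sales sum or the other way around.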
🔄 Check Your Understanding
- What does the DRY principle stand for, and why does it matter?
- What is the difference between a parameter and an argument?
- What does a function return if you forget to include a return statement?
- (From Chapter 3) What data type is None, and how would you check whether a variable is None?
4.5 Functions That Call Functions: Building Up Complexity
So far, each function has been self-contained. But the real power of functions comes when they work together — when one function calls another, and that one calls another, building layers of abstraction.
A Concrete Example
Let's build a mini data-processing pipeline for Elena. She wants to take a vaccination rate, categorize it, format it as a percentage, and produce a report string — all in one go.
First, we write small, focused functions:
def categorize_rate(rate):
    if rate < 50:
        return "low"
    elif rate < 80:
        return "medium"
    else:
        return "high"

def format_percentage(value):
    return f"{value:.1f}%"

def make_report_line(country, rate):
    category = categorize_rate(rate)
    formatted = format_percentage(rate)
    return f"{country}: {formatted} ({category})"
Now make_report_line calls both categorize_rate and format_percentage:
print(make_report_line("Brazil", 42.3))
print(make_report_line("Germany", 71.5))
print(make_report_line("Japan", 93.2))
Output:
Brazil: 42.3% (low)
Germany: 71.5% (medium)
Japan: 93.2% (high)
Each function does one thing. categorize_rate categorizes. format_percentage formats. make_report_line assembles. If Elena later decides to change the category thresholds, she edits categorize_rate in one place, and every report line automatically updates.
Decomposition: Breaking Big Problems into Small Pieces
The process of taking a big task and breaking it into smaller, manageable pieces is called decomposition. It's one of the most important skills in programming — and in data science more broadly.
Here's how to think about decomposition:
- Describe the big task in plain English. ("Generate a vaccination report for a list of countries.")
- Identify the sub-tasks. ("For each country: look up the rate, categorize it, format it, build a report line.")
- Turn each sub-task into a function. (categorize_rate, format_percentage, make_report_line)
- Write a main function or loop that orchestrates them.
def generate_report(countries, rates):
    print("=== Vaccination Report ===")
    low_count = 0
    for i in range(len(countries)):
        line = make_report_line(countries[i], rates[i])
        print(line)
        if categorize_rate(rates[i]) == "low":
            low_count = low_count + 1
    print(f"\nCountries with low rates: {low_count}")
countries = ["Brazil", "India", "Nigeria", "Germany", "Japan"]
rates = [42.3, 65.0, 88.1, 71.5, 93.2]
generate_report(countries, rates)
Output:
=== Vaccination Report ===
Brazil: 42.3% (low)
India: 65.0% (medium)
Nigeria: 88.1% (high)
Germany: 71.5% (medium)
Japan: 93.2% (high)
Countries with low rates: 1
Notice how readable the generate_report function is. Even without seeing the definitions of make_report_line and categorize_rate, you can follow what's happening. That's decomposition at work.
The Project Milestone Functions
This is the perfect time to write the helper functions for your progressive project. Elena needs three things for analyzing WHO vaccination data:
Function 1: Format a number as a percentage
def format_as_percentage(value, decimals=1):
    return f"{value:.{decimals}f}%"
The decimals=1 is a default parameter — if you call format_as_percentage(42.356), it uses 1 decimal place. But you can override it: format_as_percentage(42.356, 2) gives "42.36%".
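The calls described above can be tried directly. This sketch repeats the function so it runs on its own, and adds one extra call that passes decimals by name, which is standard Python but goes slightly beyond what the text has shown so far.

```python
def format_as_percentage(value, decimals=1):
    return f"{value:.{decimals}f}%"

print(format_as_percentage(42.356))              # 42.4%  (default: 1 decimal)
print(format_as_percentage(42.356, 2))           # 42.36% (override by position)
print(format_as_percentage(42.356, decimals=0))  # 42%    (override by name)
```

Passing the argument by name can make a call easier to read, since the reader doesn't have to remember what the second position means.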
Function 2: Validate a data value
def is_valid_rate(value):
    if not isinstance(value, (int, float)):
        return False
    if value < 0 or value > 100:
        return False
    return True
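Before moving on, it can help to probe this function at its boundaries. The sketch below repeats the definition and tries a few inputs chosen for illustration; the last one shows a quirk worth knowing about rather than a recommended use.

```python
def is_valid_rate(value):
    if not isinstance(value, (int, float)):
        return False
    if value < 0 or value > 100:
        return False
    return True

print(is_valid_rate(0))        # True: the boundaries themselves are allowed
print(is_valid_rate(100))      # True
print(is_valid_rate(100.1))    # False: just over the upper limit
print(is_valid_rate("42.3"))   # False: a string, even though it looks numeric
print(is_valid_rate(True))     # True: bool is a kind of int in Python, so it slips through
```

Whether True should count as a valid rate is a design decision; for this project the simple check is likely good enough.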
Function 3: Categorize a vaccination rate
def categorize_vaccination_rate(rate):
    if not is_valid_rate(rate):
        return "invalid"
    if rate < 50:
        return "low"
    elif rate < 80:
        return "medium"
    else:
        return "high"
These three functions represent your project milestone for Chapter 4. You'll use them in later chapters as the project grows. Save them in a cell in your project notebook.
# Test your milestone functions
test_values = [42.3, 65.0, 88.1, -5, 110, "abc"]
for val in test_values:
    valid = is_valid_rate(val)
    if valid:
        cat = categorize_vaccination_rate(val)
        fmt = format_as_percentage(val)
        print(f"{val} → {fmt} ({cat})")
    else:
        print(f"{val} → INVALID")
Output:
42.3 → 42.3% (low)
65.0 → 65.0% (medium)
88.1 → 88.1% (high)
-5 → INVALID
110 → INVALID
abc → INVALID
🔄 Check Your Understanding
- What is decomposition, and why is it valuable in programming?
- In the vaccination report example, how many functions does generate_report call?
- What is a default parameter, and when would you use one?
- (From Section 4.2) The generate_report function uses a for loop. How many iterations does it perform for a list of 5 countries?
4.6 Thinking Like a Programmer: Pseudocode and Problem Decomposition
We've covered the syntax — if, for, while, def. But knowing the syntax of a language doesn't make you a writer, and knowing the syntax of Python doesn't make you a programmer. What separates someone who knows Python from someone who can solve problems with Python is a way of thinking.
Pseudocode: Thinking Before Typing
Pseudocode is a way of writing out your program logic in plain English (or whatever language you think in) before you write actual code. It's not meant to run — it's meant to help you think.
Let's say Elena asks you: "I have a list of vaccination rates for 50 countries. I want to know how many are in each category (low, medium, high) and what the overall average rate is."
Before you touch the keyboard, write pseudocode:
SET low_count, medium_count, high_count to 0
SET total to 0
FOR each rate in the list:
    ADD rate to total
    IF rate < 50:
        ADD 1 to low_count
    ELSE IF rate < 80:
        ADD 1 to medium_count
    ELSE:
        ADD 1 to high_count
COMPUTE average as total / number of rates
PRINT the counts and the average
Now translating to Python is almost mechanical:
def analyze_rates(rates):
    low_count = 0
    medium_count = 0
    high_count = 0
    total = 0
    for rate in rates:
        total = total + rate
        if rate < 50:
            low_count = low_count + 1
        elif rate < 80:
            medium_count = medium_count + 1
        else:
            high_count = high_count + 1
    average = total / len(rates)
    print(f"Low: {low_count}")
    print(f"Medium: {medium_count}")
    print(f"High: {high_count}")
    print(f"Average rate: {average:.1f}%")
The pseudocode step might feel unnecessary for simple problems. But as problems get more complex — and they will, starting in the very next chapter — pseudocode becomes essential. Professional programmers use it all the time. It's not a beginner crutch; it's a professional tool.
The Problem-Solving Process
Here's a process that works for any programming problem:
- Understand the problem. What are the inputs? What are the expected outputs? Can you work through an example by hand?
- Write pseudocode. Describe your approach in plain language. Don't worry about syntax.
- Translate to code. Turn each pseudocode line into Python. Most lines will map almost one-to-one.
- Test with a small example. Don't start with 50 countries — start with 3. Verify by hand that the output is correct.
- Handle edge cases. What if the list is empty? What if a value is negative? What if there's only one item?
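The last two steps of the process (testing with a small example and handling edge cases) might look like the sketch below. This is an illustration under assumptions, not part of the chapter's project: summarize_rates is a hypothetical variant of analyze_rates that returns its four results together instead of printing them, and using None as the average of an empty list is an assumed convention.

```python
def summarize_rates(rates):
    """Count rates per category and compute the average.

    Hypothetical variant of analyze_rates: returns
    (low_count, medium_count, high_count, average) instead of printing.
    """
    low_count = 0
    medium_count = 0
    high_count = 0
    total = 0
    for rate in rates:
        total = total + rate
        if rate < 50:
            low_count = low_count + 1
        elif rate < 80:
            medium_count = medium_count + 1
        else:
            high_count = high_count + 1

    # Edge case: with an empty list, total / len(rates) would divide by zero.
    if len(rates) == 0:
        average = None  # assumed convention for "no data"
    else:
        average = total / len(rates)
    return low_count, medium_count, high_count, average


# Small example that is easy to verify by hand:
# one value per category, and (40 + 60 + 80) / 3 = 60.
print(summarize_rates([40, 60, 80]))

# Edge cases: an empty list and a single item.
print(summarize_rates([]))
print(summarize_rates([95]))
```

Starting with three values means you can check every count and the average in your head before trusting the function with 50 countries.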
🧩 Productive Struggle: The Grading Summary
Here's a problem for you to work through. Don't look at the solution until you've spent at least 10 minutes on it. Struggle is where learning happens.
Problem: Jordan has a list of exam grades: [85, 92, 67, 78, 95, 43, 88, 71, 56, 90]. Write a program that:
1. Counts how many grades are A (90-100), B (80-89), C (70-79), D (60-69), and F (below 60)
2. Computes the class average
3. Prints a summary
Step 1: Try writing pseudocode first. What are the accumulators you need? What conditions define each grade?
Step 2: Try translating your pseudocode to Python. You have all the tools — loops, conditionals, accumulators.
Step 3: Check your answer against this solution:
Solution:
def grade_summary(grades):
    a_count = 0
    b_count = 0
    c_count = 0
    d_count = 0
    f_count = 0
    total = 0
    for grade in grades:
        total = total + grade
        if grade >= 90:
            a_count = a_count + 1
        elif grade >= 80:
            b_count = b_count + 1
        elif grade >= 70:
            c_count = c_count + 1
        elif grade >= 60:
            d_count = d_count + 1
        else:
            f_count = f_count + 1
    average = total / len(grades)
    print("Grade Distribution:")
    print(f" A: {a_count}")
    print(f" B: {b_count}")
    print(f" C: {c_count}")
    print(f" D: {d_count}")
    print(f" F: {f_count}")
    print(f"Class average: {average:.1f}")

grades = [85, 92, 67, 78, 95, 43, 88, 71, 56, 90]
grade_summary(grades)
Output:
Grade Distribution:
 A: 3
 B: 2
 C: 2
 D: 1
 F: 2
Class average: 76.5
If you got a different answer: That's okay — compare your approach to the solution and see where they diverge. Did you start the elif chain from the top (highest grade) or the bottom (lowest grade)? Both can work, but the order of conditions matters.
If you got the same answer: Excellent. Now try adding a validate_grade function that checks whether each grade is between 0 and 100, and modify grade_summary to skip invalid grades.
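To see why the order of conditions matters, here is a small sketch with two hypothetical functions (neither is part of the solution above). Both use >= checks with the same thresholds, but one starts from the lowest threshold. Because Python stops at the first condition that is true, the bottom-up version labels every passing grade a D.

```python
def letter_top_down(grade):
    """Check the highest threshold first."""
    if grade >= 90:
        return "A"
    elif grade >= 80:
        return "B"
    elif grade >= 70:
        return "C"
    elif grade >= 60:
        return "D"
    else:
        return "F"


def letter_bottom_up_buggy(grade):
    """Same thresholds, wrong order: grade >= 60 matches first."""
    if grade >= 60:
        return "D"  # a 95 stops here and never reaches the A check
    elif grade >= 70:
        return "C"
    elif grade >= 80:
        return "B"
    elif grade >= 90:
        return "A"
    else:
        return "F"


# Print each grade with the label from both versions.
for grade in [95, 85, 43]:
    print(grade, letter_top_down(grade), letter_bottom_up_buggy(grade))
```

A bottom-up chain can still work if it uses < comparisons instead (grade < 60 first, then grade < 70, and so on).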
Tracing Execution: Becoming the Computer
One of the most valuable skills you can build is the ability to trace through code in your head (or on paper) and predict what will happen before running it. This skill is what separates people who write code from people who understand it.
A Tracing Exercise
What does this code print? Work through it step by step before reading the answer.
values = [3, 7, 2, 8, 1]
result = values[0]
for val in values:
    if val > result:
        result = val
print(result)
Trace:
| Iteration | val | val > result? | result after |
|---|---|---|---|
| Start | — | — | 3 |
| 1 | 3 | 3 > 3? No | 3 |
| 2 | 7 | 7 > 3? Yes | 7 |
| 3 | 2 | 2 > 7? No | 7 |
| 4 | 8 | 8 > 7? Yes | 8 |
| 5 | 1 | 1 > 8? No | 8 |
Answer: 8. The code finds the maximum value in the list. This is another fundamental pattern — and yes, Python has a built-in max() function that does this. But understanding the loop-based version helps you write similar patterns for tasks that don't have a built-in shortcut.
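When a paper trace and the real output disagree, one option is to let the program print its own trace. The sketch below adds a counter and a temporary print call inside the same loop. The iteration variable and the wording of the printed line are additions for illustration only, and you would delete them once the logic is clear.

```python
values = [3, 7, 2, 8, 1]
result = values[0]
iteration = 0  # added only to label each pass through the loop
for val in values:
    iteration = iteration + 1
    if val > result:
        result = val
    # Temporary debugging line: shows the state at the end of each pass.
    print(f"Iteration {iteration}: val={val}, result={result}")
print(result)
```

Each printed line corresponds to one row of the trace table, so you can compare them side by side.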
🔄 Check Your Understanding
- What is pseudocode, and why should you write it before writing real code?
- In the tracing exercise, what would the code print if the list were [5, 3, 5, 1, 5]? (Trace it!)
- (From Section 4.4) If you wrapped the maximum-finding code in a function called find_max(numbers), what should the function return?
- (From Chapter 3) What would happen if values were an empty list []? What error would the line result = values[0] produce?
Project Checkpoint: Your Chapter 4 Milestone
Your progressive project milestone for this chapter is to write three helper functions that you'll use in later chapters. If you've been following along, you've already seen them — but now it's time to make them official.
Open your project notebook (the one you created in Chapter 2) and add a new section called "Chapter 4: Helper Functions." Write these three functions and test each one:
# === Chapter 4 Project Milestone ===
# Helper functions for the Global Health Data Explorer

def format_as_percentage(value, decimals=1):
    """Convert a number to a formatted percentage string."""
    return f"{value:.{decimals}f}%"

def is_valid_rate(value):
    """Check if a value is a valid rate (number between 0 and 100)."""
    if not isinstance(value, (int, float)):
        return False
    if value < 0 or value > 100:
        return False
    return True

def categorize_vaccination_rate(rate):
    """Categorize a vaccination rate as low, medium, or high."""
    if not is_valid_rate(rate):
        return "invalid"
    if rate < 50:
        return "low"
    elif rate < 80:
        return "medium"
    else:
        return "high"

# Test all three functions
print("Testing format_as_percentage:")
print(f" 42.356 → {format_as_percentage(42.356)}")
print(f" 42.356 (2 dec) → {format_as_percentage(42.356, 2)}")
print(f" 100 → {format_as_percentage(100)}")

print("\nTesting is_valid_rate:")
for test in [42.3, 0, 100, -5, 110, "abc"]:
    print(f" {test} → {is_valid_rate(test)}")

print("\nTesting categorize_vaccination_rate:")
for test in [25.0, 65.0, 92.0, -10, 150]:
    print(f" {test} → {categorize_vaccination_rate(test)}")
Expected output:
Testing format_as_percentage:
 42.356 → 42.4%
 42.356 (2 dec) → 42.36%
 100 → 100.0%

Testing is_valid_rate:
 42.3 → True
 0 → True
 100 → True
 -5 → False
 110 → False
 abc → False

Testing categorize_vaccination_rate:
 25.0 → low
 65.0 → medium
 92.0 → high
 -10 → invalid
 150 → invalid
Notice the triple-quoted strings under each def line (like """Convert a number to a formatted percentage string."""). These are called docstrings — they describe what the function does. They're not comments — they're actually stored by Python and can be accessed with help(format_as_percentage). Writing docstrings is a professional habit worth starting now.
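A short sketch of the difference between a comment and a docstring, reusing format_as_percentage from the milestone. The __doc__ attribute is where Python keeps a function's docstring, and help() reads from the same place.

```python
def format_as_percentage(value, decimals=1):
    """Convert a number to a formatted percentage string."""
    return f"{value:.{decimals}f}%"


# A comment like this one is ignored when Python runs the file.
# The docstring, by contrast, is kept on the function's __doc__ attribute:
print(format_as_percentage.__doc__)

# help() displays the function's signature together with its docstring.
help(format_as_percentage)
```

In a Jupyter notebook, running help(format_as_percentage) in a cell shows the same text below the cell.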
Practical Considerations
Code Style: Making Your Code Readable
You've probably noticed that all the code in this chapter follows certain patterns: consistent indentation, descriptive variable names, blank lines between sections. This isn't accidental. Code is read far more often than it's written, and readable code is easier to debug, easier to modify, and easier for future-you to understand.
Some guidelines:
- Use 4 spaces for indentation. Not 2, not 8, not tabs. Four spaces. This is the Python standard.
- Name functions with verbs: compute_average, validate_grade, format_percentage. A function does something.
- Name variables with nouns: total_sales, country_name, vaccination_rate. A variable holds something.
- Keep functions short. If a function is longer than about 15-20 lines, it's probably doing too much. Split it into smaller functions.
- One function, one job. A function called compute_and_print_and_save_results is doing three jobs. Split it up.
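As a rough illustration of the "one function, one job" guideline, here is one way to separate computing from printing. The function names and the sales numbers are made up for this sketch, and the saving step is left out because file handling has not been covered yet.

```python
def compute_average(values):
    """Return the average of a non-empty list of numbers."""
    total = 0
    for value in values:
        total = total + value
    return total / len(values)


def print_average_report(label, average):
    """Print one formatted line; it does no computing of its own."""
    print(f"{label}: {average:.1f}")


# Each function can now be tested and reused separately.
weekly_sales = [120, 95, 140]  # made-up numbers for illustration
average_sales = compute_average(weekly_sales)
print_average_report("Average daily sales", average_sales)
```

Because compute_average returns its result instead of printing it, the same function can feed a report, a comparison, or a later calculation.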
Performance: Don't Worry About It (Yet)
You might wonder whether for loops are "slow" in Python. You may have heard that Python is slower than languages like C or Java. Both of these things are true in a technical sense — but they don't matter yet.
For the data sizes you'll work with in this course (hundreds to hundreds of thousands of rows), Python loops are perfectly fast. When you reach millions of rows in later chapters, you'll learn about pandas and NumPy, which use optimized C code under the hood to process data much faster than a Python loop. But the logic you're learning now — loops, conditionals, accumulator patterns — is the same logic that pandas and NumPy express in more compact form.
Learn the patterns now. Optimize later.
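If you would like to check the speed claim on your own machine, a sketch like the following times a count-with-condition loop over 100,000 values. The data is synthetic (generated by a made-up formula, not real vaccination rates), and the time module's perf_counter function is used here only as a stopwatch. The exact time will vary from computer to computer.

```python
import time

# Synthetic data for illustration only: 100,000 made-up values from 0 to 100.
rates = []
for i in range(100_000):
    rates.append((i * 37) % 101)

start = time.perf_counter()  # read the stopwatch before the loop

low_count = 0
for rate in rates:
    if rate < 50:
        low_count = low_count + 1

elapsed = time.perf_counter() - start  # seconds spent in the loop

print(f"Counted {low_count} low values in {elapsed:.4f} seconds")
```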
Summary: What You've Learned
This chapter introduced the three structures that make programs dynamic and powerful (conditionals, loops, and functions), along with pseudocode as a way to plan them:
| Concept | What It Does | Key Syntax |
|---|---|---|
| Conditional (if/elif/else) | Makes decisions based on data | if condition: |
| for loop | Repeats an action for each item in a sequence | for item in sequence: |
| while loop | Repeats an action until a condition changes | while condition: |
| Function (def) | Names a reusable block of logic | def name(params): ... return value |
| Pseudocode | Plans logic in plain language before coding | No syntax — just clear thinking |
Key patterns you should now recognize:
- Accumulator pattern: Initialize a variable, update it in a loop, use it after the loop.
- Count-with-condition pattern: Combine an accumulator with an if inside a loop.
- Decomposition: Break big problems into small functions that call each other.
- DRY principle: If you've written the same code twice, write a function instead.
Key pitfalls to watch for:
- IndentationError: Inconsistent or missing indentation in blocks.
- Infinite loops: A while loop whose condition never becomes False.
- Forgotten return: A function that computes a value but doesn't send it back.
- Scope confusion: Trying to use a variable outside the function where it was created.
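The last two pitfalls are easy to reproduce in a few lines. In this sketch the function names are hypothetical, and try/except is used only to keep the program from stopping at the error so that the message can be printed.

```python
def double_forgot_return(number):
    doubled = number * 2  # computed, but never sent back


def double(number):
    doubled = number * 2
    return doubled


# Forgotten return: a function without a return statement gives back None.
print(double_forgot_return(21))
print(double(21))

# Scope confusion: doubled exists only inside the functions above,
# so using it out here raises a NameError.
try:
    print(doubled)
except NameError as error:
    print(f"NameError: {error}")
```

If a calculation unexpectedly produces None, a missing return is one of the first things to check.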
🔄 Spaced Review: Concepts from Chapters 1-3
These questions pull from earlier chapters to keep foundational knowledge fresh. Spend a few minutes on them before moving on.
- (From Chapter 1) What are the six stages of the data science lifecycle? Which stage does this chapter's project milestone contribute to? (Hint: you're building tools that will be used in the exploration stage.)
- (From Chapter 2) In a Jupyter notebook, what's the difference between a Code cell and a Markdown cell? If you define a function in one Code cell and call it in another, will that work? (Yes, as long as you run the definition cell first.)
- (From Chapter 3) What's the difference between = and == in Python? Where did you use each one in this chapter?
- (From Chapter 3) If vaccination_rate = 42.3, what is the type of vaccination_rate < 50? What is its value?
- (From Chapter 1) Elena's vaccination rate analysis involves categorizing data, computing summaries, and generating reports. Which stages of the data science lifecycle is she working in?
What's Next
You can now make decisions, repeat actions, and package logic into reusable functions. That's a huge step — with just if, for, and def, you can write programs that process real data.
But there's a limitation you've probably felt: we've been storing data in individual variables or simple lists. Elena's vaccination data has country names and rates and regions and population sizes. Marcus's sales data has dates and amounts and product names. How do you keep all of that organized?
In Chapter 5: Working with Data Structures, you'll learn about Python's built-in tools for organizing complex data — dictionaries that map keys to values, lists of dictionaries that represent tables of data, and file I/O that lets you load data from the outside world. The functions you wrote in this chapter are about to get a lot more useful, because they'll have real, structured data to work with.
See you there.