> "Data is the new oil. But raw data is crude oil — it needs to be refined."
In This Chapter
- Opening Scenario: The Invoice That Broke
- 3.1 Variables: Naming Things
- 3.2 Python's Core Data Types
- 3.3 Checking and Converting Types
- 3.4 Arithmetic Operators
- 3.5 String Operations
- 3.6 Comparison Operators
- 3.7 Logical Operators
- 3.8 The print() Function in Depth
- 3.9 Comments: Writing for the Human Reader
- 3.10 Bringing It Together: A Business Calculation Script
- 3.11 The Walrus Operator (Python 3.8+)
- Summary
Chapter 3: Python Basics — Variables, Data Types, and Operators
"Data is the new oil. But raw data is crude oil — it needs to be refined." — Clive Humby (adapted)
Opening Scenario: The Invoice That Broke
Marcus Webb sent Priya an email Tuesday morning. Subject line: "Found the January problem."
Apparently, a formula in the master report had been adding customer discounts instead of subtracting them. For three weeks, Acme Corp's reported margin had been off by 4.2 percentage points. The data was right. The calculation was wrong. The underlying issue: the formula worked differently for customers with a discount rate of zero — a subtle type mismatch that Excel's formula engine silently accepted.
The problem wasn't bad data. It was the difference between the number 0 and the empty cell that Excel treats as 0. Same value. Different type.
Python has types too. The difference is that Python makes them explicit, predictable, and checkable. This chapter is about understanding Python's type system — the foundation of every calculation, comparison, and operation you'll ever write.
3.1 Variables: Naming Things
A variable is a name you give to a value so you can use it again. That's all it is.
revenue = 45000
This line tells Python: store the value 45000 and refer to it by the name revenue. From this point forward in your code, anywhere you write revenue, Python substitutes 45000.
revenue = 45000
print(revenue) # Output: 45000
print(revenue * 12) # Output: 540000
print(revenue + 5000) # Output: 50000
Variables are the building blocks of every program. They let you:
- Give meaningful names to values (so your code reads like business logic, not just numbers)
- Change a value in one place and have it update everywhere (no more find-and-replace across a spreadsheet)
- Build calculations on top of other calculations
Naming Rules and Conventions
Python enforces some naming rules and has additional conventions by community consensus:
Rules (required by Python):
- Variable names can contain letters, numbers, and underscores
- Variable names cannot start with a number
- Variable names are case-sensitive (revenue and Revenue are different variables)
- Variable names cannot be Python keywords (if, for, while, class, etc.)
# Valid variable names
monthly_revenue = 45000
employee_count = 200
q1_sales = 125000
_temp = "temporary value"
# Invalid — would cause errors
3rd_quarter = 75000 # Cannot start with number
for = 100 # 'for' is a keyword
my-variable = 50 # Hyphens not allowed (that's subtraction)
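If you are unsure whether a name is legal, you can ask Python itself. The sketch below is one possible check, not an official validation routine: it combines the string method isidentifier() with the standard library's keyword.iskeyword(). The variable names (name_a, valid_a, and so on) are made up for illustration.

```python
import keyword

# isidentifier() checks the character rules (no leading digit, no hyphens);
# keyword.iskeyword() catches reserved words such as 'for'.
name_a = "monthly_revenue"
name_b = "3rd_quarter"
name_c = "for"

valid_a = name_a.isidentifier() and not keyword.iskeyword(name_a)
valid_b = name_b.isidentifier() and not keyword.iskeyword(name_b)
valid_c = name_c.isidentifier() and not keyword.iskeyword(name_c)

print(valid_a)  # True
print(valid_b)  # False (starts with a number)
print(valid_c)  # False ('for' is a keyword)
```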
Conventions (community practice, enforced by style guides):
- Use lowercase with underscores for variable names: monthly_revenue not monthlyRevenue
- Use descriptive names: customer_count not n, monthly_revenue not rev
- Use names that reflect business meaning: gross_margin_rate not gmr
# Business-appropriate naming
total_sales = 1_450_000
customer_count = 847
churn_rate = 0.12 # 12% churn
quarterly_target = 500_000
region = "Midwest"
is_active = True
Assignment and Reassignment
Variables can be reassigned. The new value replaces the old one:
stock_price = 142.50
print(stock_price) # 142.50
# End of day
stock_price = 143.75
print(stock_price) # 143.75 (the variable now holds the new value)
You can also use a variable's current value to compute its new value:
balance = 10000
balance = balance + 500 # Add 500 to current balance
print(balance) # 10500
# Shorthand for the same operation:
balance += 500
print(balance) # 11000
The shorthand operators (+=, -=, *=, /=) are common. balance += 500 reads as "add 500 to the current value of balance."
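The other shorthand operators follow the same pattern. Here is a short sketch with made-up inventory and budget figures:

```python
inventory = 500
inventory -= 120        # Sold 120 units
print(inventory)        # 380

total_units = 8
total_units *= 12       # 8 cases of 12 units each
print(total_units)      # 96

monthly_budget = 90_000
monthly_budget /= 12    # Spread an annual budget over 12 months
print(monthly_budget)   # 7500.0 (division always produces a float)
```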
3.2 Python's Core Data Types
Every value in Python has a type. The type determines what operations make sense (you can multiply two numbers; you can't multiply two names) and how Python stores the value in memory.
Python has five core types you'll use constantly. We'll add more (lists, dictionaries, etc.) in Chapter 7.
int — Integers (Whole Numbers)
employee_count = 200
fiscal_year = 2024
items_in_stock = 14_782 # Underscores for readability; Python ignores them
print(type(employee_count)) # <class 'int'>
Integers are exact. 200 + 1 is always 201. There's no rounding.
In business contexts, integers represent things you count: employees, products, orders, days, items in stock.
float — Floating-Point Numbers (Decimals)
unit_price = 29.99
tax_rate = 0.0875 # 8.75%
gross_margin = 0.342 # 34.2%
print(type(unit_price)) # <class 'float'>
Floats represent numbers that can have decimal places. In business: prices, rates, percentages.
Warning
Floats are not always exact. This is a fundamental property of how computers store decimal numbers. You may occasionally see surprising results:
```python
0.1 + 0.2
# Returns: 0.30000000000000004 (not 0.3!)
```
For financial calculations where exact decimal arithmetic matters, use the Decimal class from Python's decimal module:
```python
from decimal import Decimal
Decimal('0.1') + Decimal('0.2')
# Returns: Decimal('0.3'), which is exact
```
We'll cover Decimal in Chapter 29 (Financial Modeling). For most business analytics work, float precision is sufficient.
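To see why this matters for money, consider three line items of ten cents each. This is a minimal sketch with illustrative values; note that the Decimal values are created from strings, so the inexact float never enters the calculation.

```python
from decimal import Decimal

# Three line items of $0.10 each, summed as floats
float_total = 0.1 + 0.1 + 0.1
print(float_total)          # 0.30000000000000004
print(float_total == 0.3)   # False

# The same sum with Decimal, built from strings
decimal_total = Decimal("0.10") + Decimal("0.10") + Decimal("0.10")
print(decimal_total)                      # 0.30
print(decimal_total == Decimal("0.30"))   # True
```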
str — Strings (Text)
customer_name = "Sandra Chen"
region = 'Midwest' # Single or double quotes both work
product_sku = "OFF-CHR-2847"
empty_note = "" # Empty string
print(type(customer_name)) # <class 'str'>
Strings hold text: names, addresses, product codes, status values, descriptions.
String quirk: The number "200" (with quotes) is a string — text that happens to look like a number. The number 200 (no quotes) is an integer. They behave completely differently:
print("200" + "200") # "200200" (string concatenation)
print(200 + 200) # 400 (integer addition)
This distinction — and how to convert between string and numeric types — is one of the most common sources of beginner errors.
bool — Booleans (True/False)
is_active_customer = True
has_outstanding_invoice = False
over_quota = True
print(type(is_active_customer)) # <class 'bool'>
Booleans represent yes/no, on/off, true/false conditions. In business: active/inactive accounts, above/below threshold, paid/unpaid invoices.
Note that in Python, True and False are capitalized. Lowercase true and false are not recognized and raise a NameError.
None — The Absence of a Value
shipping_date = None # Date not set yet
manager_override = None # No override applied
None represents the absence of a value — not zero, not empty string, but genuinely "no value." It's Python's equivalent of a null value.
In business data, None commonly represents missing information: a field that hasn't been filled in, a date that doesn't apply yet, a value that hasn't been calculated.
print(type(None)) # <class 'NoneType'>
print(None == 0) # False — None is not zero
print(None == "") # False — None is not an empty string
print(None == False) # False — None is not False
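Because None is not equal to zero or an empty string, the usual way to test for a missing value is the is operator. A small sketch follows; the shipping date is an illustrative value.

```python
shipping_date = None

# 'is None' is the conventional test for a missing value
missing_before = shipping_date is None
print(missing_before)   # True

shipping_date = "2024-03-18"   # Illustrative value: the order has shipped
missing_after = shipping_date is None
print(missing_after)    # False
```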
3.3 Checking and Converting Types
Checking a Type
The type() function returns the type of any value:
print(type(42)) # <class 'int'>
print(type(3.14)) # <class 'float'>
print(type("hello")) # <class 'str'>
print(type(True)) # <class 'bool'>
print(type(None)) # <class 'NoneType'>
For a cleaner check, use isinstance():
revenue = 45000
print(isinstance(revenue, int)) # True
print(isinstance(revenue, float)) # False
print(isinstance(revenue, (int, float))) # True — either int OR float
isinstance() is preferred in production code because it handles type inheritance correctly (a topic for Chapter 39).
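One place where inheritance shows up early: in Python, bool is a subclass of int. The sketch below (variable names are illustrative) shows how isinstance() and type() answer differently as a result.

```python
is_active = True

is_bool = isinstance(is_active, bool)
is_int = isinstance(is_active, int)    # bool is a subclass of int
exact_int = type(is_active) == int     # type() compares the exact type only

print(is_bool)    # True
print(is_int)     # True
print(exact_int)  # False

# Because True behaves like 1 and False like 0, booleans can be added up
active_count = True + True + False
print(active_count)  # 2
```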
Converting Between Types
String to number:
revenue_text = "45000" # Came from a CSV — it's a string
revenue = int(revenue_text) # Convert to integer
print(revenue + 5000) # 50000 (now we can do arithmetic)
# or, if the value might contain decimals
revenue = float(revenue_text) # Convert to float
print(revenue + 5000) # 50000.0
Number to string:
employee_count = 200
message = "We have " + str(employee_count) + " employees"
print(message) # "We have 200 employees"
Float to int (truncates the decimal):
price = 29.99
price_int = int(price)
print(price_int) # 29 (not rounded — just truncated)
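If you want rounding rather than truncation, the built-in round() is the usual tool. The comparison below is a brief sketch; one detail to be aware of is that round() sends exact halves to the nearest even number.

```python
price = 29.99

truncated = int(price)             # Drops the decimal part
rounded = round(price)             # Rounds to the nearest whole number
negative_truncated = int(-29.99)   # Truncation goes toward zero, not down

print(truncated)           # 29
print(rounded)             # 30
print(negative_truncated)  # -29

# Exact halves go to the nearest even number
half_down = round(2.5)
half_up = round(3.5)
print(half_down)  # 2
print(half_up)    # 4
```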
Common conversion errors:
int("$45,000") # ValueError — can't convert currency format directly
float("N/A") # ValueError — "N/A" is not a number
These errors are common when loading real business data. Chapter 12 covers data cleaning strategies for handling them.
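Until then, here is a rough sketch of two simple workarounds, assuming the only problems are a dollar sign, comma separators, and an "N/A" placeholder. It uses try/except, which this chapter has not introduced, and treating an unreadable value as None is a choice made for this example rather than a rule.

```python
# Strip currency formatting before converting
raw_amount = "$45,000"
cleaned = raw_amount.replace("$", "").replace(",", "")
amount = int(cleaned)
print(amount)  # 45000

# Fall back to None when the text is not a number
raw_rate = "N/A"
try:
    rate = float(raw_rate)
except ValueError:
    rate = None   # Assumption: treat unreadable values as missing
print(rate)  # None
```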
3.4 Arithmetic Operators
Python supports the standard arithmetic operations plus a few that are particularly useful in business contexts.
revenue = 120000
cost = 78000
tax_rate = 0.21
units_sold = 847
# Basic arithmetic
profit = revenue - cost
print(f"Profit: ${profit:,}") # Profit: $42,000
# Multiplication and division
gross_margin_rate = profit / revenue
print(f"Gross margin: {gross_margin_rate:.1%}") # Gross margin: 35.0%
# Power/exponentiation
growth_rate = 0.08
years = 3
future_value = revenue * (1 + growth_rate) ** years
print(f"Revenue in 3 years (8% growth): ${future_value:,.0f}")
# Revenue in 3 years (8% growth): $151,165
The Complete Arithmetic Operator Set
| Operator | Operation | Example | Result |
|---|---|---|---|
| `+` | Addition | `100 + 50` | 150 |
| `-` | Subtraction | `100 - 50` | 50 |
| `*` | Multiplication | `100 * 2` | 200 |
| `/` | Division (float) | `100 / 3` | 33.333... |
| `//` | Floor division (integer) | `100 // 3` | 33 |
| `%` | Modulo (remainder) | `100 % 3` | 1 |
| `**` | Exponentiation | `2 ** 8` | 256 |
Business Use Cases for Each Operator
Floor division (//): Useful when you need whole units. "How many full cases of 12 can I ship from 100 items?" 100 // 12 = 8.
Modulo (%): Useful for remainders. "After shipping 8 full cases, how many items are left?" 100 % 12 = 4. Also useful for checking if a number is even/odd: n % 2 == 0.
Exponentiation (**): Compound growth, interest calculations. principal * (1 + rate) ** years.
# Practical floor division and modulo example
total_units = 100
units_per_case = 12
full_cases = total_units // units_per_case
leftover_units = total_units % units_per_case
print(f"Full cases: {full_cases}") # 8
print(f"Leftover units: {leftover_units}") # 4
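When you need both the quotient and the remainder, the built-in divmod() returns them together. A short sketch using the same case-packing numbers:

```python
total_units = 100
units_per_case = 12

# divmod(a, b) returns the pair (a // b, a % b)
full_cases, leftover_units = divmod(total_units, units_per_case)
print(full_cases)      # 8
print(leftover_units)  # 4

# Sanity check: the two parts add back up to the original total
adds_up = full_cases * units_per_case + leftover_units == total_units
print(adds_up)         # True
```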
Order of Operations
Python follows standard mathematical order of operations (PEMDAS):
1. Parentheses ()
2. Exponentiation **
3. Multiplication/Division/Floor Division/Modulo *, /, //, %
4. Addition/Subtraction +, -
# Without parentheses
result = 5 + 3 * 2 # 11 (multiplication first)
# With parentheses
result = (5 + 3) * 2 # 16 (parentheses first)
# Business example: compound growth with correct parentheses
principal = 100000
rate = 0.05
years = 10
# WRONG: ** binds tighter than * and +, so only rate is raised to the power
wrong = principal * 1 + rate ** years # 100000.0 (rate ** years is about 1e-13)
# RIGHT
correct = principal * (1 + rate) ** years # ~162889
When in doubt, use parentheses. They cost nothing and make intent explicit.
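Two precedence details catch people out often enough to be worth a quick illustration: exponentiation binds tighter than a leading minus sign, and a chain of ** operators is evaluated right to left. The variable names here are just for the example.

```python
neg_square = -2 ** 2       # Read as -(2 ** 2)
square_of_neg = (-2) ** 2  # Parentheses make the base negative
tower = 2 ** 3 ** 2        # Right to left: 2 ** (3 ** 2) = 2 ** 9

print(neg_square)      # -4
print(square_of_neg)   # 4
print(tower)           # 512
```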
3.5 String Operations
Strings are more than just text storage — Python provides a rich set of operations for manipulating them.
Concatenation
first_name = "Sandra"
last_name = "Chen"
full_name = first_name + " " + last_name
print(full_name) # Sandra Chen
String Methods
Python strings have built-in methods (functions that operate on the string). Call them with dot notation:
region = "midwest"
region_title = region.title() # "Midwest"
region_upper = region.upper() # "MIDWEST"
region_lower = region.lower() # "midwest"
product_code = " OFF-CHR-2847 "
clean_code = product_code.strip() # "OFF-CHR-2847" (removes whitespace)
clean_code = product_code.strip().upper() # Chain methods
# Check contents
status = "Active Customer"
print(status.startswith("Active")) # True
print(status.endswith("Customer")) # True
print("Customer" in status) # True
# Replace
description = "Q1 results (Q1 2023)"
updated = description.replace("Q1", "Q2")
print(updated) # "Q2 results (Q2 2023)"
# Split into a list
address = "123 Main St, Chicago, IL, 60601"
parts = address.split(", ")
print(parts) # ['123 Main St', 'Chicago', 'IL', '60601']
city = parts[1]
print(city) # Chicago
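These methods combine well when cleaning up codes typed by hand. The sketch below assumes a SKU always has three parts separated by hyphens, which is an assumption for the example, not a rule stated elsewhere in this chapter.

```python
raw_sku = "  off-chr-2847 "

# Remove stray whitespace and normalize the case in one chain
sku = raw_sku.strip().upper()
print(sku)  # OFF-CHR-2847

# Split into its parts (assumed layout: category-product-number)
parts = sku.split("-")
category = parts[0]
item_number = int(parts[2])   # Convert the numeric part for arithmetic
print(category)     # OFF
print(item_number)  # 2847
```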
f-Strings (Formatted String Literals)
f-strings are Python's modern approach to embedding values in text. They're the most readable and preferred format method:
company = "Acme Corp"
revenue = 1_450_000
employees = 200
# Basic f-string
print(f"Company: {company}")
print(f"Annual revenue: ${revenue:,}") # $1,450,000
print(f"Revenue per employee: ${revenue/employees:,.0f}") # $7,250
# Percentage formatting
margin = 0.342
print(f"Gross margin: {margin:.1%}") # 34.2%
print(f"Gross margin: {margin:.2%}") # 34.20%
# Decimal places
price = 29.987
print(f"Price: ${price:.2f}") # $29.99
Format specifiers — the code after : inside {} — control how the value is displayed:
| Specifier | Meaning | Example | Output |
|---|---|---|---|
| `,` | Thousands separator | `{1450000:,}` | 1,450,000 |
| `.2f` | 2 decimal places, float | `{29.987:.2f}` | 29.99 |
| `.1%` | Percentage, 1 decimal | `{0.342:.1%}` | 34.2% |
| `.0f` | 0 decimal places | `{1450.7:.0f}` | 1451 |
| `,.2f` | Comma separator + 2 decimals | `{1450.75:,.2f}` | 1,450.75 |
| `>10` | Right-align, width 10 | `{42:>10}` | 42 (preceded by 8 spaces) |
| `<10` | Left-align, width 10 | `{"hello":<10}` | hello (followed by 5 spaces) |
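Alignment and number formatting are most useful together, when you want columns that line up. Here is a small sketch; the column widths (12 and 14) are arbitrary choices for the example.

```python
# Left-align the label in 12 characters, right-align the number in 14
header = f"{'Region':<12}{'Sales':>14}"
line_1 = f"{'Chicago':<12}{312450.0:>14,.2f}"
line_2 = f"{'Cincinnati':<12}{187890.0:>14,.2f}"

print(header)
print(line_1)
print(line_2)

# Every row has the same total width, so the columns line up
print(len(header), len(line_1), len(line_2))  # 26 26 26
```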
Multi-line Strings
For longer text (like email templates or SQL queries), use triple quotes:
report_header = """
Acme Corp — Weekly Sales Report
Week ending: 2024-03-15
Prepared by: Priya Okonkwo
"""
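# Illustrative values so the template below runs
report_date = "2024-03-15"
total_revenue = 874_400
email_body = f"""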
Hi Sandra,
Attached is the weekly sales report for the period ending {report_date}.
Total revenue: ${total_revenue:,.0f}
Please let me know if you have questions.
Priya
"""
3.6 Comparison Operators
Comparison operators return a boolean (True or False). They're the foundation of decision logic (Chapter 4) and filtering.
revenue = 45000
target = 50000
print(revenue == target) # False — equal to?
print(revenue != target) # True — not equal to?
print(revenue > target) # False — greater than?
print(revenue >= target) # False — greater than or equal to?
print(revenue < target) # True — less than?
print(revenue <= target) # True — less than or equal to?
Business Applications
# Is a customer's account overdue?
days_outstanding = 45
is_overdue = days_outstanding > 30
print(f"Account overdue: {is_overdue}") # Account overdue: True
# Has the rep hit quota?
sales_this_month = 87500
monthly_quota = 85000
hit_quota = sales_this_month >= monthly_quota
print(f"Quota achieved: {hit_quota}") # Quota achieved: True
# Is inventory critically low?
units_in_stock = 12
reorder_point = 50
needs_reorder = units_in_stock <= reorder_point
print(f"Reorder needed: {needs_reorder}") # Reorder needed: True
Comparing Strings
region = "Midwest"
print(region == "Midwest") # True
print(region == "midwest") # False — case-sensitive!
print(region.lower() == "midwest") # True — normalize case first
# String comparison goes character by character (roughly alphabetical, but uppercase letters sort before lowercase)
print("Apple" < "Banana") # True (A comes before B)
print("Z" > "A") # True
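Two consequences of character-by-character comparison are easy to miss: mixed case can break alphabetical order, and numbers stored as text do not compare like numbers. A quick sketch with illustrative values:

```python
# Uppercase "B" sorts before lowercase "a"
raw_compare = "apple" < "Banana"
normalized_compare = "apple".lower() < "Banana".lower()
print(raw_compare)         # False
print(normalized_compare)  # True

# Text that looks like numbers is compared one character at a time
text_compare = "10" < "9"    # "1" comes before "9"
number_compare = 10 < 9
print(text_compare)    # True
print(number_compare)  # False
```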
Chained Comparisons
Python allows chaining comparisons in a natural, mathematical style:
margin = 0.342
is_healthy_margin = 0.25 <= margin <= 0.45
print(f"Healthy margin: {is_healthy_margin}") # True
This is equivalent to 0.25 <= margin and margin <= 0.45, except that margin is evaluated only once. The result is True only if both comparisons hold.
3.7 Logical Operators
Logical operators combine boolean expressions.
| Operator | Meaning | Result |
|---|---|---|
| `and` | Both must be True | True and True → True; True and False → False |
| `or` | At least one must be True | True or False → True; False or False → False |
| `not` | Inverts the boolean | not True → False; not False → True |
revenue = 45000
margin = 0.342
customer_tier = "Gold"
# 'and' — both conditions must hold
is_priority_account = revenue > 40000 and customer_tier == "Gold"
print(f"Priority account: {is_priority_account}") # True
# 'or' — at least one must hold
needs_review = revenue < 10000 or margin < 0.15
print(f"Needs review: {needs_review}") # False
# 'not' — invert
is_active = True
is_inactive = not is_active
print(f"Inactive: {is_inactive}") # False
# Complex combination
days_overdue = 45 # Illustrative value
flag_for_manager = (revenue > 100000 and margin < 0.10) or (days_overdue > 90)
print(f"Flag for manager: {flag_for_manager}") # False
Short-Circuit Evaluation
Python evaluates logical expressions from left to right and stops as soon as the outcome is determined:
- A and B: If A is False, B is never evaluated (because the result is already False)
- A or B: If A is True, B is never evaluated (because the result is already True)
This is called short-circuit evaluation and matters when the right side of the expression has side effects or expensive operations.
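A common practical use is guarding a calculation that would otherwise fail. In the sketch below (figures are made up), the division on the right is never attempted when units_sold is zero, so no ZeroDivisionError occurs.

```python
# A product with no sales yet
units_sold = 0
revenue = 0
# units_sold > 0 is False, so revenue / units_sold is never evaluated
high_price_no_sales = units_sold > 0 and revenue / units_sold > 100
print(high_price_no_sales)  # False

# A product with sales: both sides are evaluated
units_sold = 50
revenue = 6000
high_price_with_sales = units_sold > 0 and revenue / units_sold > 100
print(high_price_with_sales)  # True (6000 / 50 is 120.0)
```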
3.8 The print() Function in Depth
You've been using print() throughout this chapter. Let's understand it properly.
# Basic usage
print("Hello") # Hello
print(42) # 42
print(3.14) # 3.14
print(True) # True
print() # Empty line
# Multiple arguments (separated by commas)
name = "Sandra"
title = "VP of Sales"
print(name, title) # Sandra VP of Sales (space-separated by default)
print(name, title, sep=", ") # Sandra, VP of Sales
# End parameter (default is newline)
print("Loading", end="") # No newline at end
print(".") # On same line: "Loading."
# Separator parameter
print("Acme", "Corp", "2024", sep="-") # Acme-Corp-2024
Printing Multiple Values
revenue = 45000
cost = 30000
profit = revenue - cost
# All three are equivalent:
print("Revenue:", revenue, "| Cost:", cost, "| Profit:", profit)
print(f"Revenue: {revenue} | Cost: {cost} | Profit: {profit}")
print("Revenue: %d | Cost: %d | Profit: %d" % (revenue, cost, profit)) # Old style, avoid
# The f-string version is most readable and most modern
3.9 Comments: Writing for the Human Reader
Comments are lines that Python ignores — they're for human readers.
# Single-line comment — use the hash character
# Calculate gross margin
# (Revenue minus Cost of Goods Sold, divided by Revenue)
revenue = 125_000
cogs = 81_250 # Cost of Goods Sold
gross_margin = (revenue - cogs) / revenue
print(f"Gross margin: {gross_margin:.1%}")
"""
This is a multi-line comment (technically a multi-line string,
but used as a comment when not assigned to a variable).
Good for longer explanations.
"""
When to comment:
- Explain why, not what. The code says what it does. Comments say why.
- Flag non-obvious business rules: # Discount applies only to orders over $500 (company policy)
- Mark assumptions: # Assumes fiscal year starts January 1
- Explain tricky workarounds: # Using string comparison here because the database returns strings, not integers
When not to comment:
- Don't state the obvious: # Add 1 to count on count += 1
- Don't leave old code commented out in production — delete it (version control keeps history)
- Don't explain Python syntax to beginners in production code — comments are for domain knowledge
3.10 Bringing It Together: A Business Calculation Script
Let's apply everything in this chapter to a realistic calculation.
"""
acme_monthly_summary.py
Monthly business summary calculation for Acme Corp.
"""
# ── INPUT DATA ────────────────────────────────────────────────────────────────
# Regional sales figures for March 2024
chicago_sales = 312_450.00
cincinnati_sales = 187_890.00
nashville_sales = 205_340.00
st_louis_sales = 168_720.00
# Cost of goods sold (as a percentage of sales — varies by region)
chicago_cogs_rate = 0.62
cincinnati_cogs_rate = 0.64
nashville_cogs_rate = 0.61
st_louis_cogs_rate = 0.65
# Company overhead for the month
monthly_overhead = 95_000.00
# ── CALCULATIONS ─────────────────────────────────────────────────────────────
# Total revenue
total_revenue = chicago_sales + cincinnati_sales + nashville_sales + st_louis_sales
# Gross profit by region (revenue - direct costs)
chicago_gp = chicago_sales * (1 - chicago_cogs_rate)
cincinnati_gp = cincinnati_sales * (1 - cincinnati_cogs_rate)
nashville_gp = nashville_sales * (1 - nashville_cogs_rate)
st_louis_gp = st_louis_sales * (1 - st_louis_cogs_rate)
total_gross_profit = chicago_gp + cincinnati_gp + nashville_gp + st_louis_gp
# Operating profit (gross profit minus overhead)
operating_profit = total_gross_profit - monthly_overhead
# Gross margin rate
gross_margin_rate = total_gross_profit / total_revenue
# Operating margin rate
operating_margin_rate = operating_profit / total_revenue
# ── PERFORMANCE FLAGS ─────────────────────────────────────────────────────────
target_revenue = 850_000
hit_revenue_target = total_revenue >= target_revenue
# Healthy margin defined as >= 35%
healthy_margin = gross_margin_rate >= 0.35
# ── OUTPUT ───────────────────────────────────────────────────────────────────
print("=" * 50)
print("ACME CORP — MARCH 2024 MONTHLY SUMMARY")
print("=" * 50)
print()
print("Regional Breakdown:")
print(f" Chicago: ${chicago_sales:>12,.2f} | GP: ${chicago_gp:>10,.2f}")
print(f" Cincinnati: ${cincinnati_sales:>12,.2f} | GP: ${cincinnati_gp:>10,.2f}")
print(f" Nashville: ${nashville_sales:>12,.2f} | GP: ${nashville_gp:>10,.2f}")
print(f" St. Louis: ${st_louis_sales:>12,.2f} | GP: ${st_louis_gp:>10,.2f}")
print()
print(f"Total Revenue: ${total_revenue:>12,.2f}")
print(f"Total Gross Profit:${total_gross_profit:>12,.2f}")
print(f"Monthly Overhead: ${monthly_overhead:>12,.2f}")
print(f"Operating Profit: ${operating_profit:>12,.2f}")
print()
print(f"Gross Margin Rate: {gross_margin_rate:.1%}")
print(f"Operating Margin Rate: {operating_margin_rate:.1%}")
print()
print(f"Revenue target hit: {'✓ YES' if hit_revenue_target else '✗ NO'}")
print(f"Healthy margin: {'✓ YES' if healthy_margin else '✗ BELOW TARGET'}")
print("=" * 50)
Expected output:
==================================================
ACME CORP — MARCH 2024 MONTHLY SUMMARY
==================================================
Regional Breakdown:
Chicago: $ 312,450.00 | GP: $ 118,731.00
Cincinnati: $ 187,890.00 | GP: $ 67,640.40
Nashville: $ 205,340.00 | GP: $ 80,082.60
St. Louis: $ 168,720.00 | GP: $ 59,052.00
Total Revenue: $ 874,400.00
Total Gross Profit:$ 325,506.00
Monthly Overhead: $ 95,000.00
Operating Profit: $ 230,506.00
Gross Margin Rate: 37.2%
Operating Margin Rate: 26.4%
Revenue target hit: ✓ YES
Healthy margin: ✓ YES
==================================================
This is a real business summary. Everything that generated it fits in a screen of Python. In Chapter 9, we'll replace the hard-coded input values with data loaded from CSV files.
3.11 The Walrus Operator (Python 3.8+)
One modern Python feature worth knowing: the walrus operator := assigns and returns a value in a single expression.
# Without walrus operator
# (get_monthly_revenue() is a placeholder for a function defined elsewhere)
revenue = get_monthly_revenue()
if revenue > 100000:
    print(f"High-value month: ${revenue:,}")
# With walrus operator
if (revenue := get_monthly_revenue()) > 100000:
    print(f"High-value month: ${revenue:,}")
In business code, you'll encounter this occasionally but shouldn't overuse it — it can reduce readability. We'll see practical uses in later chapters when working with loops and data streams.
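To try the walrus operator yourself, you need a function to call. The version below is a self-contained sketch: get_monthly_revenue() is a hypothetical stand-in that returns a fixed number, where a real script would look the value up somewhere.

```python
def get_monthly_revenue():
    """Hypothetical stand-in for a real data lookup."""
    return 125_000

# Assign the result to revenue and compare it in the same expression
if (revenue := get_monthly_revenue()) > 100_000:
    message = f"High-value month: ${revenue:,}"
else:
    message = f"Regular month: ${revenue:,}"

print(message)  # High-value month: $125,000
```

The name revenue stays available after the if statement, just as it would with an ordinary assignment.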
Summary
- A variable is a named container for a value. Choose descriptive business-meaningful names.
- Python's core types: `int` (whole numbers), `float` (decimals), `str` (text), `bool` (True/False), `None` (no value).
- Convert between types with `int()`, `float()`, `str()`. Conversion can fail if the value isn't compatible; handle this in Chapter 8.
- Arithmetic operators: `+`, `-`, `*`, `/`, `//` (floor div), `%` (remainder), `**` (power). Use parentheses to control order.
- Comparison operators: `==`, `!=`, `>`, `>=`, `<`, `<=` return booleans.
- Logical operators: `and`, `or`, `not` combine booleans.
- f-strings are the preferred way to embed variables in text: `f"Revenue: ${revenue:,.2f}"`.
- Comments explain why, not what. Use them for business rules and non-obvious logic.
- Float arithmetic is not always exact; use `Decimal` for financial calculations requiring precision.