Chapter 3: Python Basics — Variables, Data Types, and Operators

"Data is the new oil. But raw data is crude oil — it needs to be refined." — Clive Humby (adapted)


Opening Scenario: The Invoice That Broke

Marcus Webb sent Priya an email Tuesday morning. Subject line: "Found the January problem."

Apparently, a formula in the master report had been adding customer discounts instead of subtracting them. For three weeks, Acme Corp's reported margin had been off by 4.2 percentage points. The data was right. The calculation was wrong. The underlying issue: the formula worked differently for customers with a discount rate of zero — a subtle type mismatch that Excel's formula engine silently accepted.

The problem wasn't bad data. It was the difference between the number 0 and the empty cell that Excel treats as 0. Same value. Different type.

Python has types too. The difference is that Python makes them explicit, predictable, and checkable. This chapter is about understanding Python's type system — the foundation of every calculation, comparison, and operation you'll ever write.


3.1 Variables: Naming Things

A variable is a name you give to a value so you can use it again. That's all it is.

revenue = 45000

This line tells Python: store the value 45000 and refer to it by the name revenue. From this point forward in your code, anywhere you write revenue, Python substitutes 45000.

revenue = 45000
print(revenue)          # Output: 45000
print(revenue * 12)     # Output: 540000
print(revenue + 5000)   # Output: 50000

Variables are the building blocks of every program. They let you:

  • Give meaningful names to values (so your code reads like business logic, not just numbers)
  • Change a value in one place and have it update everywhere (no more find-and-replace across a spreadsheet)
  • Build calculations on top of other calculations

Naming Rules and Conventions

Python enforces some naming rules and has additional conventions by community consensus:

Rules (required by Python):

  • Variable names can contain letters, numbers, and underscores
  • Variable names cannot start with a number
  • Variable names are case-sensitive (revenue and Revenue are different variables)
  • Variable names cannot be Python keywords (if, for, while, class, etc.)

# Valid variable names
monthly_revenue = 45000
employee_count = 200
q1_sales = 125000
_temp = "temporary value"

# Invalid — would cause errors
3rd_quarter = 75000    # Cannot start with number
for = 100              # 'for' is a keyword
my-variable = 50       # Hyphens not allowed (that's subtraction)
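
The rules above can also be checked from code. As a small sketch, the built-in string method isidentifier() tests the character rules and the standard library's keyword.iskeyword() tests for reserved words. The helper name is_valid_name is hypothetical, introduced only for illustration.

```python
import keyword

def is_valid_name(name):
    """Return True if name could be used as a Python variable name."""
    # isidentifier() checks the character rules; iskeyword() rejects reserved words
    return name.isidentifier() and not keyword.iskeyword(name)

print(is_valid_name("monthly_revenue"))  # True
print(is_valid_name("3rd_quarter"))      # False (starts with a number)
print(is_valid_name("for"))              # False (keyword)
print(is_valid_name("my-variable"))      # False (hyphen)
```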

Conventions (community practice, enforced by style guides):

  • Use lowercase with underscores for variable names: monthly_revenue not monthlyRevenue
  • Use descriptive names: customer_count not n, monthly_revenue not rev
  • Use names that reflect business meaning: gross_margin_rate not gmr

# Business-appropriate naming
total_sales = 1_450_000
customer_count = 847
churn_rate = 0.12        # 12% churn
quarterly_target = 500_000
region = "Midwest"
is_active = True

Assignment and Reassignment

Variables can be reassigned. The new value replaces the old one:

stock_price = 142.50
print(stock_price)   # 142.50

# End of day
stock_price = 143.75
print(stock_price)   # 143.75 (the variable now holds the new value)

You can also use a variable's current value to compute its new value:

balance = 10000
balance = balance + 500    # Add 500 to current balance
print(balance)              # 10500

# Shorthand for the same operation:
balance += 500
print(balance)              # 11000

The shorthand operators (+=, -=, *=, /=) are common. balance += 500 reads as "add 500 to the current value of balance."
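
A short illustration of the other shorthand operators, using made-up balance figures. One detail worth noticing: /= always produces a float, even when the result is a whole number.

```python
balance = 10000

balance -= 2500     # Subtract: balance is now 7500
balance *= 2        # Multiply: balance is now 15000
balance /= 4        # Divide: balance is now 3750.0 (division always gives a float)

print(balance)        # 3750.0
print(type(balance))  # <class 'float'>
```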


3.2 Python's Core Data Types

Every value in Python has a type. The type determines what operations make sense (you can multiply two numbers; you can't multiply two names) and how Python stores the value in memory.

Python has five core types you'll use constantly. We'll add more (lists, dictionaries, etc.) in Chapter 7.

int — Integers (Whole Numbers)

employee_count = 200
fiscal_year = 2024
items_in_stock = 14_782    # Underscores for readability; Python ignores them

print(type(employee_count))    # <class 'int'>

Integers are exact. 200 + 1 is always 201. There's no rounding.

In business contexts, integers represent things you count: employees, products, orders, days, items in stock.

float — Floating-Point Numbers (Decimals)

unit_price = 29.99
tax_rate = 0.0875          # 8.75%
gross_margin = 0.342       # 34.2%

print(type(unit_price))    # <class 'float'>

Floats represent numbers that can have decimal places. In business: prices, rates, percentages.

Warning

Floats are not always exact. This is a fundamental property of how computers store decimal numbers. You may occasionally see surprising results:

```python
0.1 + 0.2
# Returns: 0.30000000000000004 (not 0.3!)
```

For financial calculations where exact decimal arithmetic matters, use the Decimal class from Python's decimal module:

```python
from decimal import Decimal

Decimal('0.1') + Decimal('0.2')
# Returns: Decimal('0.3') (exact!)
```

We'll cover Decimal in Chapter 29 (Financial Modeling). For most business analytics work, float precision is sufficient.
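
When float precision is sufficient, the main thing to avoid is testing floats for exact equality. The sketch below shows two common alternatives from the standard toolkit, math.isclose() and round(); which one fits depends on the calculation.

```python
import math

total = 0.1 + 0.2

print(total == 0.3)              # False: the stored values differ slightly
print(math.isclose(total, 0.3))  # True: equal within a small tolerance
print(round(total, 2) == 0.3)    # True: rounding to cents removes the noise
print(f"{total:.2f}")            # 0.30
```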

str — Strings (Text)

customer_name = "Sandra Chen"
region = 'Midwest'          # Single or double quotes both work
product_sku = "OFF-CHR-2847"
empty_note = ""             # Empty string

print(type(customer_name))  # <class 'str'>

Strings hold text: names, addresses, product codes, status values, descriptions.

String quirk: The value "200" (with quotes) is a string: text that happens to look like a number. The value 200 (no quotes) is an integer. They behave completely differently:

print("200" + "200")    # "200200" (string concatenation)
print(200 + 200)        # 400 (integer addition)

This distinction — and how to convert between string and numeric types — is one of the most common sources of beginner errors.
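
As an illustration of what goes wrong, the sketch below mixes the two types. The variable names are made up for the example, and it uses try/except (covered later in the book) only so the script keeps running after the error.

```python
quantity_text = "200"   # Text, e.g. as read from a file
quantity = 200          # Integer

print(quantity_text * 2)   # 200200 (the string is repeated)
print(quantity * 2)        # 400 (integer multiplication)

# Mixing the two types in an addition raises an error instead of guessing
try:
    total = quantity_text + quantity
except TypeError as error:
    print(f"TypeError: {error}")

# Converting first makes the intent explicit
total = int(quantity_text) + quantity
print(total)               # 400
```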

bool — Booleans (True/False)

is_active_customer = True
has_outstanding_invoice = False
over_quota = True

print(type(is_active_customer))    # <class 'bool'>

Booleans represent yes/no, on/off, true/false conditions. In business: active/inactive accounts, above/below threshold, paid/unpaid invoices.

Note that in Python, True and False are capitalized. Lowercase true and false are not defined, so using them raises a NameError.

None — The Absence of a Value

shipping_date = None       # Date not set yet
manager_override = None    # No override applied

None represents the absence of a value — not zero, not empty string, but genuinely "no value." It's Python's equivalent of a null value.

In business data, None commonly represents missing information: a field that hasn't been filled in, a date that doesn't apply yet, a value that hasn't been calculated.

print(type(None))    # <class 'NoneType'>
print(None == 0)     # False — None is not zero
print(None == "")    # False — None is not an empty string
print(None == False) # False — None is not False
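
When testing for a missing value, the conventional idiom is the is operator rather than ==. A brief sketch, with illustrative variable names:

```python
shipping_date = None

# 'is None' asks "is this the None object?" and is the conventional check
print(shipping_date is None)       # True
print(shipping_date is not None)   # False

# Zero and the empty string are real values, so they are not None
discount = 0
note = ""
print(discount is None)            # False
print(note is None)                # False
```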

3.3 Checking and Converting Types

Checking a Type

The type() function returns the type of any value:

print(type(42))        # <class 'int'>
print(type(3.14))      # <class 'float'>
print(type("hello"))   # <class 'str'>
print(type(True))      # <class 'bool'>
print(type(None))      # <class 'NoneType'>

For a cleaner check, use isinstance():

revenue = 45000
print(isinstance(revenue, int))      # True
print(isinstance(revenue, float))    # False
print(isinstance(revenue, (int, float)))  # True — either int OR float

isinstance() is preferred in production code because it handles type inheritance correctly (a topic for Chapter 39).
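
One place where the inheritance point shows up even at this stage: in Python, bool is a subclass of int. The short sketch below shows how isinstance() and type() answer differently for a boolean.

```python
is_active = True

print(isinstance(is_active, bool))   # True
print(isinstance(is_active, int))    # True: bool is a subclass of int
print(type(is_active) == int)        # False: type() compares the exact type
print(True + True)                   # 2: True behaves like 1 in arithmetic
```

If a check must accept numbers but reject booleans, test for bool first.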

Converting Between Types

String to number:

revenue_text = "45000"            # Came from a CSV, so it's a string
revenue = int(revenue_text)       # Convert to integer
print(revenue + 5000)             # 50000 (now we can do arithmetic)

# or, to allow decimal values:
revenue = float(revenue_text)     # Convert to float
print(revenue + 5000)             # 50000.0

Number to string:

employee_count = 200
message = "We have " + str(employee_count) + " employees"
print(message)   # "We have 200 employees"

Float to int (truncates the decimal):

price = 29.99
price_int = int(price)
print(price_int)   # 29 (not rounded — just truncated)
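
If rounding is what you want, the built-in round() is the usual alternative to int(). The sketch below compares the two; note that round() sends exact ties to the nearest even number, which can surprise people expecting "round half up."

```python
price = 29.99

print(int(price))        # 29 (truncates toward zero)
print(round(price))      # 30 (rounds to the nearest whole number)
print(round(price, 1))   # 30.0 (rounds to one decimal place)
print(int(-2.7))         # -2 (truncation moves toward zero, not down)
print(round(2.5))        # 2 (ties round to the nearest even number)
print(round(3.5))        # 4
```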

Common conversion errors:

int("$45,000")   # ValueError — can't convert currency format directly
float("N/A")     # ValueError — "N/A" is not a number

These errors are common when loading real business data. Chapter 12 covers data cleaning strategies for handling them.
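
As a preview, here is one minimal way such values might be handled. This is a sketch, not the approach the later chapter necessarily takes: the function name parse_currency is hypothetical, and it assumes US-style formatting with "$" and "," as the only decoration.

```python
def parse_currency(text):
    """Convert text such as "$45,000" to a float, or return None if it is not numeric."""
    # Remove surrounding whitespace, the dollar sign, and thousands separators
    cleaned = text.strip().replace("$", "").replace(",", "")
    try:
        return float(cleaned)
    except ValueError:
        # The text was not a number (for example "N/A")
        return None

print(parse_currency("$45,000"))     # 45000.0
print(parse_currency(" 1,250.75 "))  # 1250.75
print(parse_currency("N/A"))         # None
```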


3.4 Arithmetic Operators

Python supports the standard arithmetic operations plus a few that are particularly useful in business contexts.

revenue = 120000
cost = 78000
tax_rate = 0.21
units_sold = 847

# Basic arithmetic
profit = revenue - cost
print(f"Profit: ${profit:,}")                    # Profit: $42,000

# Multiplication and division
gross_margin_rate = profit / revenue
print(f"Gross margin: {gross_margin_rate:.1%}")  # Gross margin: 35.0%

# Power/exponentiation
growth_rate = 0.08
years = 3
future_value = revenue * (1 + growth_rate) ** years
print(f"Revenue in 3 years (8% growth): ${future_value:,.0f}")
# Revenue in 3 years (8% growth): $151,165

The Complete Arithmetic Operator Set

| Operator | Operation | Example | Result |
|---|---|---|---|
| + | Addition | 100 + 50 | 150 |
| - | Subtraction | 100 - 50 | 50 |
| * | Multiplication | 100 * 2 | 200 |
| / | Division (float) | 100 / 3 | 33.333... |
| // | Floor division (integer) | 100 // 3 | 33 |
| % | Modulo (remainder) | 100 % 3 | 1 |
| ** | Exponentiation | 2 ** 8 | 256 |

Business Use Cases for Each Operator

Floor division (//): Useful when you need whole units. "How many full cases of 12 can I ship from 100 items?" 100 // 12 = 8.

Modulo (%): Useful for remainders. "After shipping 8 full cases, how many items are left?" 100 % 12 = 4. Also useful for checking if a number is even/odd: n % 2 == 0.

Exponentiation (**): Compound growth, interest calculations. principal * (1 + rate) ** years.

# Practical floor division and modulo example
total_units = 100
units_per_case = 12

full_cases = total_units // units_per_case
leftover_units = total_units % units_per_case

print(f"Full cases: {full_cases}")          # 8
print(f"Leftover units: {leftover_units}")  # 4
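
The same pair of results can be obtained in one step with the built-in divmod(). The sketch below also shows how floor division and modulo behave with negative numbers, which matters if the quantities can go below zero (returns or adjustments, for instance).

```python
total_units = 100
units_per_case = 12

# divmod() returns the quotient and remainder together
full_cases, leftover_units = divmod(total_units, units_per_case)
print(full_cases, leftover_units)   # 8 4

# Floor division rounds down, which matters for negative numbers
print(7 // 2)     # 3
print(-7 // 2)    # -4 (rounded down, not toward zero)
print(-7 % 2)     # 1 (the remainder takes the sign of the divisor)
```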

Order of Operations

Python follows standard mathematical order of operations (PEMDAS):

  1. Parentheses: ()
  2. Exponentiation: **
  3. Multiplication, division, floor division, modulo: *, /, //, %
  4. Addition, subtraction: +, -

# Without parentheses
result = 5 + 3 * 2      # 11 (multiplication first)

# With parentheses
result = (5 + 3) * 2    # 16 (parentheses first)

# Business example: compound growth with correct parentheses
principal = 100000
rate = 0.05
years = 10
# WRONG: without parentheses, ** applies only to rate,
# so this computes (principal * 1) + (rate ** years)
wrong = principal * 1 + rate ** years       # 100000.0
# RIGHT
correct = principal * (1 + rate) ** years  # ~162889

When in doubt, use parentheses. They cost nothing and make intent explicit.
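
A few edge cases show why the advice holds. These are standard Python precedence rules, shown here with plain numbers rather than business values:

```python
print(-2 ** 2)        # -4: ** binds tighter than the leading minus
print((-2) ** 2)      # 4
print(2 ** 3 ** 2)    # 512: ** groups right to left, so this is 2 ** (3 ** 2)
print((2 ** 3) ** 2)  # 64
print(100 / 10 * 2)   # 20.0: same-level operators run left to right
```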


3.5 String Operations

Strings are more than just text storage — Python provides a rich set of operations for manipulating them.

Concatenation

first_name = "Sandra"
last_name = "Chen"
full_name = first_name + " " + last_name
print(full_name)   # Sandra Chen

String Methods

Python strings have built-in methods (functions that operate on the string). Call them with dot notation:

region = "midwest"
region_title = region.title()     # "Midwest"
region_upper = region.upper()     # "MIDWEST"
region_lower = region.lower()     # "midwest"

product_code = "  OFF-CHR-2847  "
clean_code = product_code.strip()       # "OFF-CHR-2847" (removes whitespace)
clean_code = product_code.strip().upper()  # Chain methods

# Check contents
status = "Active Customer"
print(status.startswith("Active"))    # True
print(status.endswith("Customer"))    # True
print("Customer" in status)           # True

# Replace
description = "Q1 results (Q1 2023)"
updated = description.replace("Q1", "Q2")
print(updated)   # "Q2 results (Q2 2023)"

# Split into a list
address = "123 Main St, Chicago, IL, 60601"
parts = address.split(", ")
print(parts)   # ['123 Main St', 'Chicago', 'IL', '60601']
city = parts[1]
print(city)    # Chicago
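
One property of the methods above is easy to miss: strings are immutable, so a method returns a new string and leaves the original untouched. A brief sketch with a made-up input value:

```python
raw_name = "  sandra CHEN "

clean_name = raw_name.strip().title()

print(repr(raw_name))     # '  sandra CHEN ' (unchanged)
print(repr(clean_name))   # 'Sandra Chen'

# To keep the cleaned value, assign the result back to a name
raw_name = raw_name.strip().title()
print(raw_name)           # Sandra Chen
```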

f-Strings (Formatted String Literals)

f-strings are Python's modern approach to embedding values in text. They're the most readable and preferred format method:

company = "Acme Corp"
revenue = 1_450_000
employees = 200

# Basic f-string
print(f"Company: {company}")
print(f"Annual revenue: ${revenue:,}")          # $1,450,000
print(f"Revenue per employee: ${revenue/employees:,.0f}")  # $7,250

# Percentage formatting
margin = 0.342
print(f"Gross margin: {margin:.1%}")    # 34.2%
print(f"Gross margin: {margin:.2%}")    # 34.20%

# Decimal places
price = 29.987
print(f"Price: ${price:.2f}")           # $29.99

Format specifiers — the code after : inside {} — control how the value is displayed:

| Specifier | Meaning | Example | Output |
|---|---|---|---|
| , | Thousands separator | {1450000:,} | 1,450,000 |
| .2f | 2 decimal places, float | {29.987:.2f} | 29.99 |
| .1% | Percentage, 1 decimal | {0.342:.1%} | 34.2% |
| .0f | 0 decimal places | {1450.7:.0f} | 1451 |
| ,.2f | Comma separator + 2 decimals | {1450.75:,.2f} | 1,450.75 |
| >10 | Right-align, width 10 | {42:>10} | "        42" |
| <10 | Left-align, width 10 | {"hello":<10} | "hello     " |
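
Specifiers can be combined: width and alignment come first, then the separator and precision. As a sketch, the example below lines up a header and one row of a small report; the column widths (10, 12, 8) are arbitrary choices for illustration.

```python
region = "Midwest"
revenue = 1450000
margin = 0.342

# Text is left-aligned; numbers are right-aligned so the digits line up
header = f"{'Region':<10}{'Revenue':>12}{'Margin':>8}"
row = f"{region:<10}{revenue:>12,}{margin:>8.1%}"

print(header)   # Region         Revenue  Margin
print(row)      # Midwest      1,450,000   34.2%
```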

Multi-line Strings

For longer text (like email templates or SQL queries), use triple quotes:

report_header = """
Acme Corp — Weekly Sales Report
Week ending: 2024-03-15
Prepared by: Priya Okonkwo
"""

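# Example values so the template below runs on its own
report_date = "2024-03-15"
total_revenue = 874_400

email_body = f"""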
Hi Sandra,

Attached is the weekly sales report for the period ending {report_date}.
Total revenue: ${total_revenue:,.0f}

Please let me know if you have questions.

Priya
"""

3.6 Comparison Operators

Comparison operators return a boolean (True or False). They're the foundation of decision logic (Chapter 4) and filtering.

revenue = 45000
target = 50000

print(revenue == target)    # False — equal to?
print(revenue != target)    # True — not equal to?
print(revenue > target)     # False — greater than?
print(revenue >= target)    # False — greater than or equal to?
print(revenue < target)     # True — less than?
print(revenue <= target)    # True — less than or equal to?

Business Applications

# Is a customer's account overdue?
days_outstanding = 45
is_overdue = days_outstanding > 30
print(f"Account overdue: {is_overdue}")   # Account overdue: True

# Has the rep hit quota?
sales_this_month = 87500
monthly_quota = 85000
hit_quota = sales_this_month >= monthly_quota
print(f"Quota achieved: {hit_quota}")     # Quota achieved: True

# Is inventory critically low?
units_in_stock = 12
reorder_point = 50
needs_reorder = units_in_stock <= reorder_point
print(f"Reorder needed: {needs_reorder}") # Reorder needed: True

Comparing Strings

region = "Midwest"
print(region == "Midwest")    # True
print(region == "midwest")    # False — case-sensitive!
print(region.lower() == "midwest")  # True — normalize case first

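# String comparison follows Unicode code point order: alphabetical within
# the same case, but all uppercase letters sort before lowercase ones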
print("Apple" < "Banana")   # True (A comes before B)
print("Z" > "A")            # True
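
Mixed case and numeric text are where string comparison tends to surprise. The sketch below uses ord() to show the underlying character codes and contrasts text comparison with numeric comparison.

```python
print("apple" < "Banana")                   # False: "B" (66) sorts before "a" (97)
print(ord("B"), ord("a"))                   # 66 97
print("apple".lower() < "Banana".lower())   # True: compares "apple" with "banana"
print("10" < "9")                           # True: strings compare character by character
print(10 < 9)                               # False: numbers compare by value
```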

Chained Comparisons

Python allows chaining comparisons in a natural, mathematical style:

margin = 0.342
is_healthy_margin = 0.25 <= margin <= 0.45
print(f"Healthy margin: {is_healthy_margin}")   # True

This is equivalent to 0.25 <= margin and margin <= 0.45, except that margin is evaluated only once. The result is True only if both comparisons hold; if the first one fails, Python skips the second.


3.7 Logical Operators

Logical operators combine boolean expressions.

| Operator | Meaning | Result |
|---|---|---|
| and | Both must be True | True and True → True; True and False → False |
| or | At least one must be True | True or False → True; False or False → False |
| not | Inverts the boolean | not True → False; not False → True |

revenue = 45000
margin = 0.342
customer_tier = "Gold"

# 'and' — both conditions must hold
is_priority_account = revenue > 40000 and customer_tier == "Gold"
print(f"Priority account: {is_priority_account}")   # True

# 'or' — at least one must hold
needs_review = revenue < 10000 or margin < 0.15
print(f"Needs review: {needs_review}")              # False

# 'not' — invert
is_active = True
is_inactive = not is_active
print(f"Inactive: {is_inactive}")                   # False

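# Complex combination
days_overdue = 120   # Example value so the expression below can run
flag_for_manager = (revenue > 100000 and margin < 0.10) or (days_overdue > 90)
print(f"Flag for manager: {flag_for_manager}")      # True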

Short-Circuit Evaluation

Python evaluates logical expressions from left to right and stops as soon as the outcome is determined:

  • A and B: If A is False, B is never evaluated (because the result is already False)
  • A or B: If A is True, B is never evaluated (because the result is already True)

This is called short-circuit evaluation and matters when the right side of the expression has side effects or expensive operations.
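
A common practical use is guarding a calculation that would otherwise fail. The sketch below uses made-up values; the second half relies on the fact that or returns its first "truthy" operand (an empty string counts as false), which makes a simple default-value pattern.

```python
revenue = 45000
units_sold = 0

# The division runs only if units_sold > 0, so no ZeroDivisionError occurs
has_high_unit_price = units_sold > 0 and revenue / units_sold > 100
print(has_high_unit_price)    # False

# 'or' returns its first truthy operand, which gives a simple default value
region_input = ""
region = region_input or "Unknown"
print(region)                 # Unknown
```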


3.8 The print() Function in Depth

You've been using print() throughout this chapter. Let's understand it properly.

# Basic usage
print("Hello")                   # Hello
print(42)                         # 42
print(3.14)                       # 3.14
print(True)                       # True
print()                           # Empty line

# Multiple arguments (separated by commas)
name = "Sandra"
title = "VP of Sales"
print(name, title)                # Sandra VP of Sales (space-separated by default)
print(name, title, sep=", ")      # Sandra, VP of Sales

# End parameter (default is newline)
print("Loading", end="")         # No newline at end
print(".")                       # On same line: "Loading."

# Separator parameter
print("Acme", "Corp", "2024", sep="-")  # Acme-Corp-2024

Printing Multiple Values

revenue = 45000
cost = 30000
profit = revenue - cost

# All three are equivalent:
print("Revenue:", revenue, "| Cost:", cost, "| Profit:", profit)
print(f"Revenue: {revenue} | Cost: {cost} | Profit: {profit}")
print("Revenue: %d | Cost: %d | Profit: %d" % (revenue, cost, profit))  # Old style, avoid

# The f-string version is most readable and most modern

3.9 Comments: Writing for the Human Reader

Comments are lines that Python ignores — they're for human readers.

# Single-line comment — use the hash character

# Calculate gross margin
# (Revenue minus Cost of Goods Sold, divided by Revenue)
revenue = 125_000
cogs = 81_250          # Cost of Goods Sold
gross_margin = (revenue - cogs) / revenue
print(f"Gross margin: {gross_margin:.1%}")

"""
This is a multi-line comment (technically a multi-line string,
but used as a comment when not assigned to a variable).
Good for longer explanations.
"""

When to comment:

  • Explain why, not what. The code says what it does. Comments say why.
  • Flag non-obvious business rules: # Discount applies only to orders over $500 (company policy)
  • Mark assumptions: # Assumes fiscal year starts January 1
  • Explain tricky workarounds: # Using string comparison here because the database returns strings, not integers

When not to comment:

  • Don't state the obvious: # Add 1 to count on count += 1
  • Don't leave old code commented out in production; delete it (version control keeps history)
  • Don't explain Python syntax to beginners in production code; comments are for domain knowledge


3.10 Bringing It Together: A Business Calculation Script

Let's apply everything in this chapter to a realistic calculation.

"""
acme_monthly_summary.py
Monthly business summary calculation for Acme Corp.
"""

# ── INPUT DATA ────────────────────────────────────────────────────────────────
# Regional sales figures for March 2024
chicago_sales = 312_450.00
cincinnati_sales = 187_890.00
nashville_sales = 205_340.00
st_louis_sales = 168_720.00

# Cost of goods sold (as a percentage of sales — varies by region)
chicago_cogs_rate = 0.62
cincinnati_cogs_rate = 0.64
nashville_cogs_rate = 0.61
st_louis_cogs_rate = 0.65

# Company overhead for the month
monthly_overhead = 95_000.00

# ── CALCULATIONS ─────────────────────────────────────────────────────────────
# Total revenue
total_revenue = chicago_sales + cincinnati_sales + nashville_sales + st_louis_sales

# Gross profit by region (revenue - direct costs)
chicago_gp = chicago_sales * (1 - chicago_cogs_rate)
cincinnati_gp = cincinnati_sales * (1 - cincinnati_cogs_rate)
nashville_gp = nashville_sales * (1 - nashville_cogs_rate)
st_louis_gp = st_louis_sales * (1 - st_louis_cogs_rate)

total_gross_profit = chicago_gp + cincinnati_gp + nashville_gp + st_louis_gp

# Operating profit (gross profit minus overhead)
operating_profit = total_gross_profit - monthly_overhead

# Gross margin rate
gross_margin_rate = total_gross_profit / total_revenue

# Operating margin rate
operating_margin_rate = operating_profit / total_revenue

# ── PERFORMANCE FLAGS ─────────────────────────────────────────────────────────
target_revenue = 850_000
hit_revenue_target = total_revenue >= target_revenue

# Healthy margin defined as >= 35%
healthy_margin = gross_margin_rate >= 0.35

# ── OUTPUT ───────────────────────────────────────────────────────────────────
print("=" * 50)
print("ACME CORP — MARCH 2024 MONTHLY SUMMARY")
print("=" * 50)
print()
print("Regional Breakdown:")
print(f"  Chicago:      ${chicago_sales:>12,.2f}  |  GP: ${chicago_gp:>10,.2f}")
print(f"  Cincinnati:   ${cincinnati_sales:>12,.2f}  |  GP: ${cincinnati_gp:>10,.2f}")
print(f"  Nashville:    ${nashville_sales:>12,.2f}  |  GP: ${nashville_gp:>10,.2f}")
print(f"  St. Louis:    ${st_louis_sales:>12,.2f}  |  GP: ${st_louis_gp:>10,.2f}")
print()
print(f"Total Revenue:     ${total_revenue:>12,.2f}")
print(f"Total Gross Profit:${total_gross_profit:>12,.2f}")
print(f"Monthly Overhead:  ${monthly_overhead:>12,.2f}")
print(f"Operating Profit:  ${operating_profit:>12,.2f}")
print()
print(f"Gross Margin Rate:     {gross_margin_rate:.1%}")
print(f"Operating Margin Rate: {operating_margin_rate:.1%}")
print()
print(f"Revenue target hit:  {'✓ YES' if hit_revenue_target else '✗ NO'}")
print(f"Healthy margin:      {'✓ YES' if healthy_margin else '✗ BELOW TARGET'}")
print("=" * 50)

Expected output:

==================================================
ACME CORP — MARCH 2024 MONTHLY SUMMARY
==================================================

Regional Breakdown:
  Chicago:      $  312,450.00  |  GP: $118,731.00
  Cincinnati:   $  187,890.00  |  GP: $ 67,640.40
  Nashville:    $  205,340.00  |  GP: $ 80,082.60
  St. Louis:    $  168,720.00  |  GP: $ 59,052.00

Total Revenue:     $  874,400.00
Total Gross Profit:$  325,506.00
Monthly Overhead:  $   95,000.00
Operating Profit:  $  230,506.00

Gross Margin Rate:     37.2%
Operating Margin Rate: 26.4%

Revenue target hit:  ✓ YES
Healthy margin:      ✓ YES
==================================================

This is a real business summary. Everything that generated it fits in a screen of Python. In Chapter 9, we'll replace the hard-coded input values with data loaded from CSV files.


3.11 The Walrus Operator (Python 3.8+)

One modern Python feature worth knowing: the walrus operator := assigns and returns a value in a single expression.

# Without walrus operator
revenue = get_monthly_revenue()
if revenue > 100000:
    print(f"High-value month: ${revenue:,}")

# With walrus operator
if (revenue := get_monthly_revenue()) > 100000:
    print(f"High-value month: ${revenue:,}")

In business code, you'll encounter this occasionally but shouldn't overuse it — it can reduce readability. We'll see practical uses in later chapters when working with loops and data streams.
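
The snippet above assumes a get_monthly_revenue() function that the chapter does not define. For a version that runs on its own, the sketch below supplies a hypothetical stand-in that simply returns a fixed figure; in real code it would read from a file or database.

```python
def get_monthly_revenue():
    """Hypothetical stand-in that returns a fixed revenue figure."""
    return 125_000

# The walrus operator assigns revenue and compares it in the same expression
if (revenue := get_monthly_revenue()) > 100000:
    message = f"High-value month: ${revenue:,}"
else:
    message = f"Regular month: ${revenue:,}"

print(message)   # High-value month: $125,000
```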


Summary

  • A variable is a named container for a value. Choose descriptive business-meaningful names.
  • Python's core types: int (whole numbers), float (decimals), str (text), bool (True/False), None (no value).
  • Convert between types with int(), float(), str(). Conversion can fail if the value isn't compatible — handle this in Chapter 8.
  • Arithmetic operators: +, -, *, /, // (floor div), % (remainder), ** (power). Use parentheses to control order.
  • Comparison operators: ==, !=, >, >=, <, <= return booleans.
  • Logical operators: and, or, not combine booleans.
  • f-strings are the preferred way to embed variables in text: f"Revenue: ${revenue:,.2f}".
  • Comments explain why, not what. Use them for business rules and non-obvious logic.
  • Float arithmetic is not always exact — use Decimal for financial calculations requiring precision.
