Chapter 3 Exercises: Python for the Business Professional


Section A: Python Fundamentals (Exercises 1–10)

Exercise 1: Variables and Data Types

Create variables for the following business metrics and print each one with an appropriate label using f-strings:

  • Company name (string): "Meridian Healthcare"
  • Number of clinics (integer): 23
  • Annual revenue (float): 8,750,000.00
  • Year-over-year growth rate (float): 0.12
  • Publicly traded (boolean): False

Then print a summary sentence that incorporates all five variables.


Exercise 2: Type Conversion

A dataset arrives with revenue stored as strings (e.g., "$1,250,000"). Write code that:

  1. Starts with the string revenue_str = "$1,250,000"
  2. Removes the dollar sign and commas
  3. Converts the result to a float
  4. Prints the float value

Hint: Use the .replace() method on strings. You can chain multiple .replace() calls.


Exercise 3: Arithmetic Operators

A retail store has the following quarterly data:

  • Q1 Revenue: $320,000
  • Q2 Revenue: $415,000
  • Q3 Revenue: $380,000
  • Q4 Revenue: $510,000
  • Annual Costs: $1,200,000

Write code that calculates and prints:

  1. Total annual revenue
  2. Average quarterly revenue
  3. Annual profit (revenue minus costs)
  4. Profit margin as a percentage
  5. Projected next-year revenue assuming 8% growth

Exercise 4: String Operations

Given the following variables:

first_name = "Nkechi"
last_name = "Adeyemi"
department = "marketing"
employee_id = 4821

Write code that produces the following outputs:

  1. "ADEYEMI, Nkechi" (last name uppercase, comma, first name)
  2. "Marketing" (department with first letter capitalized)
  3. "Employee ID: 004821" (zero-padded to 6 digits)
  4. "nadeyemi@company.com" (first initial + last name, all lowercase)

Exercise 5: Conditional Logic — Discount Tiers

Write an if/elif/else statement that assigns a discount rate based on order value:

Order Value Discount
$500 or more 20%
$200 – $499.99 10%
$100 – $199.99 5%
Below $100 0%

Test your code with order values of $75, $150, $350, and $600. For each, print the original price, discount rate, and final price.


Exercise 6: For Loops — Monthly Report

You have a list of monthly sales figures:

monthly_sales = [42000, 38500, 51000, 47200, 39800, 55100,
                 60200, 58400, 45000, 52300, 61800, 74500]
month_names = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
               "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

Write a for loop that prints a line for each month showing: - Month name - Sales figure (formatted with dollar sign and commas) - Whether the month was above or below the annual average

At the end, print the annual total, annual average, best month, and worst month.


Exercise 7: While Loop — Break-Even Analysis

A startup has: - Initial investment: $500,000 - Monthly revenue: $45,000 - Monthly costs: $32,000

Write a while loop that determines how many months it takes to break even (i.e., when cumulative profit equals or exceeds the initial investment). Print the month number and cumulative profit at each step.


Exercise 8: List Comprehensions

Given a list of product prices:

prices = [29.99, 49.99, 15.00, 89.99, 12.50, 199.99, 34.99, 7.99]

Use list comprehensions to create:

  1. A list of prices with 10% tax added
  2. A list of only prices above $30
  3. A list of price labels (e.g., "$29.99" formatted as strings)
  4. A list of tuples pairing each price with its category: "Premium" if above $50, "Standard" if $20–$50, "Budget" if below $20

Exercise 9: Dictionaries — Product Catalog

Create a list of three dictionaries, each representing a product with keys: name, sku, price, category, in_stock.

Then write code that:

  1. Prints all products that are in stock
  2. Calculates the average price of all products
  3. Finds the most expensive product and prints its name
  4. Adds a new key discounted_price (15% off) to each dictionary

Exercise 10: Functions — Business Calculator

Write three functions:

  1. compound_growth(principal, rate, years) — Returns the future value given compound growth. Formula: principal * (1 + rate) ** years

  2. payback_period(investment, monthly_profit) — Returns the number of months to recoup an investment (rounded up to the nearest whole month).

  3. format_currency(amount) — Returns a string formatted as currency (e.g., 1234567.89 becomes "$1,234,567.89").

Test each function with at least two different inputs.


Section B: pandas Exercises (Exercises 11–25)

Exercise 11: Creating a DataFrame

Create a DataFrame from the following data about five regional offices:

Office City Employees Revenue Year Opened
HQ New York 120 4500000 2010
West San Francisco 85 3200000 2015
South Austin 45 1800000 2018
Midwest Chicago 60 2100000 2016
Northeast Boston 35 1200000 2020

After creating the DataFrame, display: - The shape - Data types of each column - Descriptive statistics


Exercise 12: Reading and Exploring Data

Using the Athena sales dataset created in the chapter (or a CSV of your choice), write code to answer:

  1. How many rows and columns does the dataset have?
  2. What are the unique values in the region column?
  3. What is the date range of the data?
  4. Are there any missing values?
  5. What are the mean, median, and standard deviation of the revenue column?

Exercise 13: Selecting and Filtering

Using the Athena sales dataset:

  1. Select only the store, category, and revenue columns.
  2. Filter for all rows where category is "Electronics."
  3. Filter for rows where revenue is above $60,000.
  4. Filter for rows in the "West" region with revenue above $40,000.
  5. Filter for rows where the category is either "Electronics" or "Apparel."

Exercise 14: Sorting

Using the Athena sales dataset:

  1. Sort by revenue in descending order and display the top 10 rows.
  2. Sort by region (alphabetically) and then by revenue (descending) within each region.
  3. Find the single highest-revenue record in the dataset. What store, category, and month does it represent?

Exercise 15: Creating New Columns

Using the Athena sales dataset:

  1. Create a column called avg_price calculated as revenue / units_sold.
  2. Create a column called quarter based on the month (Q1 = Jan–Mar, Q2 = Apr–Jun, etc.).
  3. Create a column called revenue_tier that classifies each row as "High" (above $60,000), "Medium" ($30,000–$60,000), or "Low" (below $30,000).

Exercise 16: GroupBy Basics

Using the Athena sales dataset:

  1. Calculate total revenue by region.
  2. Calculate average revenue by category.
  3. Calculate total units sold by store.
  4. Find the region with the highest total revenue.
  5. Find the category with the lowest average revenue.

Exercise 17: GroupBy Advanced

Using the Athena sales dataset:

  1. Calculate total revenue by region and category (two-level groupby).
  2. For each store, calculate the total revenue, average revenue, and number of months of data.
  3. Find the top 3 stores by total revenue.
  4. Calculate what percentage of total company revenue each region contributes.

Exercise 18: Pivot Tables

Using the Athena sales dataset, create a pivot table showing:

  • Rows: regions
  • Columns: categories
  • Values: total revenue

Then create a second pivot table showing average revenue instead of total.

Hint: Use pd.pivot_table() or df.pivot_table().


Exercise 19: Handling Missing Data

Create a DataFrame with some intentionally missing values:

import numpy as np

data = {
    "product": ["Widget A", "Widget B", "Widget C", "Widget D", "Widget E"],
    "price": [25.00, np.nan, 15.00, 42.00, np.nan],
    "units_sold": [100, 250, np.nan, 175, 300],
    "category": ["Electronics", "Electronics", "Home", np.nan, "Home"]
}
df = pd.DataFrame(data)

Write code that:

  1. Identifies which columns have missing values and how many
  2. Fills missing prices with the median price
  3. Fills the missing category with "Unknown"
  4. Drops any rows where units_sold is still missing
  5. Verifies that no missing values remain

Exercise 20: Merging DataFrames

You have two DataFrames:

stores = pd.DataFrame({
    "store_id": [1, 2, 3, 4, 5],
    "store_name": ["Downtown", "Airport", "Mall", "Suburban", "University"],
    "region": ["NE", "NE", "SE", "MW", "W"]
})

performance = pd.DataFrame({
    "store_id": [1, 2, 3, 4, 5],
    "q1_revenue": [320000, 280000, 410000, 150000, 90000],
    "q1_target": [300000, 300000, 400000, 200000, 100000]
})
  1. Merge these DataFrames on store_id.
  2. Add a column met_target that is True if q1_revenue >= q1_target.
  3. Add a column pct_of_target showing revenue as a percentage of target.
  4. Which stores met their target? Which did not?

Section C: Business Analysis Exercises (Exercises 21–30)

Exercise 21: Sales Trend Analysis

Using the Athena sales dataset, write code to:

  1. Calculate total revenue by month.
  2. Identify the month with the highest revenue and the month with the lowest.
  3. Calculate the month-over-month growth rate.
  4. Create a line chart showing the monthly revenue trend.
  5. Write a 2–3 sentence business summary of the trend.

Exercise 22: Store Performance Scorecard

Write a function called store_scorecard(df, store_name) that takes the Athena sales DataFrame and a store name, and prints:

  • Total annual revenue for that store
  • Revenue by category
  • Best-performing category
  • Worst-performing category
  • Monthly trend (list of monthly totals)
  • Whether the store is above or below the company average

Test it with at least three different stores.


Exercise 23: Category Analysis

Write code that answers the following questions about the Athena dataset:

  1. Which product category generates the most revenue overall?
  2. Which category has the highest average transaction value (revenue / units)?
  3. Is the category mix consistent across regions, or do some regions lean heavily toward certain categories?
  4. Which category shows the most seasonal variation?

Present your findings as printed output with clear labels.


Exercise 24: Underperformer Deep Dive

Extend the underperformer analysis from the chapter:

  1. Instead of using 80% of the category average as the threshold, try 70%, 80%, and 90%. How does the number of underperformers change?
  2. Are there any stores that appear as underperformers across multiple categories?
  3. For the most underperforming store, write a brief analysis (3–5 sentences) explaining what the data suggests and what additional information you would want.

Exercise 25: Automated Report Function

Write a function called monthly_report(df, month) that takes the Athena sales DataFrame and a month string (e.g., "2025-06") and produces a complete printed report including:

  • Total revenue for the month
  • Revenue by region
  • Revenue by category
  • Top-performing store
  • Lowest-performing store
  • Comparison to the previous month (if available)

Test it with at least two different months.


Exercise 26: Python vs. Excel Decision Framework

For each of the following tasks, decide whether Python or Excel would be the better tool (or whether both are needed). Explain your reasoning in 2–3 sentences per task.

  1. Calculating the sum of 50 numbers from a meeting
  2. Analyzing 500,000 rows of transaction data
  3. Building a one-time financial model for a board presentation
  4. Running the same sales report every Monday morning
  5. Exploring a new dataset you have never seen before
  6. Sharing results with a non-technical executive
  7. Connecting to an API to pull social media metrics

Exercise 27: Data Quality Assessment

Write a function called data_quality_check(df) that takes any DataFrame and prints:

  1. Shape (rows x columns)
  2. Columns with missing values and the percentage missing
  3. Columns with only one unique value (potential constants that add no information)
  4. Numeric columns with suspicious values (e.g., negative revenue)
  5. Duplicate rows (if any)

Test it on the Athena sales dataset and on a DataFrame you create with intentional quality issues.


Exercise 28: Year-over-Year Comparison

Create a second year of synthetic Athena data (2024) using the same structure as the 2025 data. Combine both years into a single DataFrame. Then:

  1. Calculate total revenue by year.
  2. Calculate the year-over-year growth rate by region.
  3. Identify which regions grew and which declined.
  4. Create a grouped bar chart comparing 2024 and 2025 revenue by region.

Exercise 29: Function Library

Create a Python file (or a notebook section) called "My Business Analysis Toolkit" containing the following reusable functions:

  1. calculate_cagr(start_value, end_value, years) — Compound annual growth rate
  2. classify_by_quartile(series) — Returns a Series with labels "Q1", "Q2", "Q3", "Q4" based on quartile position
  3. format_summary_table(df) — Takes a DataFrame and returns a string with formatted numbers (commas, dollar signs for revenue columns)
  4. detect_outliers(series, threshold=1.5) — Returns values that are more than threshold IQR away from Q1/Q3

Test each function.


Exercise 30: Capstone — Complete Business Analysis

Choose one of the following scenarios and write a complete analysis in a Jupyter notebook:

Option A: You are a new analyst at a chain of 12 coffee shops. Create a synthetic dataset of daily sales (date, shop name, drink category, revenue, cups sold) for 3 months. Analyze which shops are underperforming, which drink categories are most popular by location, and whether weekday vs. weekend patterns differ. Present your findings with at least two visualizations and a written recommendation.

Option B: You are a marketing analyst reviewing campaign performance. Create a synthetic dataset of 20 email campaigns (campaign name, date, audience size, open rate, click rate, conversions, spend). Analyze which campaigns were most cost-effective, whether audience size affects conversion rates, and what the optimal campaign profile looks like. Present findings with visualizations and recommendations.


Section D: Conceptual Questions (Exercises 31–35)

Exercise 31

Explain in your own words why reproducibility matters in business analysis. Give an example of a situation where a non-reproducible analysis could cause a business problem.


Exercise 32

A colleague says: "I don't need to learn Python — AI can write all my code for me." Write a 3–5 sentence response explaining why understanding Python is still valuable even in the age of AI coding assistants.


Exercise 33

Describe three situations where a Jupyter notebook would be a better deliverable than a PowerPoint presentation, and three situations where PowerPoint would be better.


Exercise 34

You have just shown your Python analysis to a VP who responds: "This is great, but can you put it in Excel so I can play with the numbers?" How do you handle this request, and what does it tell you about communication in data work?


Exercise 35

Professor Okonkwo says: "You are not learning to be software engineers. You are learning to ask better questions." In 3–5 sentences, explain what you think she means by this, and give an example of how Python enables "better questions" compared to manual analysis.


Solutions to selected exercises are available in Appendix B: Answers to Selected Exercises.