Chapter 10 Exercises: Introduction to pandas

These exercises are organized into five tiers. Complete them in order — each tier builds on the skills established in the previous one. Solutions are not provided here; test your work by running your code and verifying that the output makes sense.


How to Use These Exercises

Setup: All exercises in Tiers 1–3 use the starter DataFrame below. Copy it into a new .py file or Jupyter notebook before starting.

import pandas as pd
import numpy as np

# Starter DataFrame — Acme Corp Product Catalog
product_data = {
    "sku": ["ACM-001", "ACM-002", "ACM-003", "ACM-004", "ACM-005",
            "ACM-006", "ACM-007", "ACM-008", "ACM-009", "ACM-010"],
    "product_name": [
        "Widget A", "Widget B", "Gadget X", "Gadget Y", "Component Z",
        "Widget C", "Gadget Pro", "Component Kit", "Widget Mini", "Gadget Lite",
    ],
    "category": [
        "Widgets", "Widgets", "Gadgets", "Gadgets", "Components",
        "Widgets", "Gadgets", "Components", "Widgets", "Gadgets",
    ],
    "unit_price": [29.99, 49.99, 89.99, 129.99, 14.99, 39.99, 199.99, 24.99, 19.99, 59.99],
    "unit_cost":  [12.50, 22.00, 41.00, 58.50,  8.25, 34.50, 88.00,  11.00,  9.50, 28.00],
    "inventory":  [250,   180,   95,    42,      600,  310,   18,     420,    75,   140],
    "reorder_point": [100, 75, 50, 25, 200, 150, 20, 300, 40, 60],
    "weight_lbs": [0.5, 0.8, 1.2, 1.5, 0.1, 0.6, 2.1, 0.3, 0.4, 1.1],
}

df = pd.DataFrame(product_data)
df = df.set_index("sku")

Tier 1: Foundations (Getting Started)

These exercises practice the most fundamental pandas operations: creation, inspection, and basic selection. No prior pandas experience beyond this chapter is needed.


Exercise 1.1 — Your First Inspection

Load the starter DataFrame and answer the following questions using only pandas methods (not by reading the raw data dictionary):

a) How many rows and columns does the DataFrame have? Print the answer using an f-string: "The catalog has X products and Y columns."

b) What is the data type of the unit_price column?

c) What is the data type of the product_name column?

d) Does the DataFrame have any missing (null) values? How can you tell?

Expected output format:

The catalog has 10 products and 8 columns.
unit_price dtype: float64
product_name dtype: object
Missing values: False (no nulls)

Exercise 1.2 — Head, Tail, and Shape

Write code to:

a) Display only the first 3 rows of the DataFrame.

b) Display only the last 2 rows.

c) Display the column names as a Python list.

d) Print a statement that reads: "First product: [product_name]" and "Last product: [product_name]" — retrieve these values from the DataFrame, do not hard-code them.


Exercise 1.3 — Column Selection

a) Select only the product_name column. What type does this return?

b) Select only the unit_price column. What is the average of this column?

c) Select the columns product_name, category, and unit_price together. What type does this return?

d) Select unit_price and unit_cost together. Print the result.


Exercise 1.4 — Row Selection with .loc[]

Using .loc[]:

a) Retrieve the row for SKU ACM-005.

b) Retrieve the rows for SKUs ACM-001 and ACM-007.

c) Retrieve only the unit_price and unit_cost for SKU ACM-003.

d) Retrieve the label slice from ACM-002 to ACM-005 (all four rows).


Exercise 1.5 — Row Selection with .iloc[]

Using .iloc[]:

a) Retrieve the first row.

b) Retrieve the last row.

c) Retrieve rows at positions 2, 3, and 4 (a slice).

d) Retrieve the first 4 rows but only the first 3 columns.

e) What is the difference in output between df.iloc[0] and df.iloc[0:1]? Demonstrate with code and explain the difference.


Tier 2: Core Operations

These exercises practice the operations you will use daily: Boolean filtering, adding columns, and sorting.


Exercise 2.1 — Boolean Filtering: Single Condition

a) Filter the DataFrame to show only products in the Gadgets category.

b) Filter to show only products with unit_price greater than $50.

c) Filter to show only products with inventory less than or equal to 50.

d) Filter to show products where unit_cost is less than $15.

For each filter, print how many rows match using an f-string.


Exercise 2.2 — Boolean Filtering: Combined Conditions

a) Products in the Gadgets category with unit_price above $100.

b) Products with inventory below their reorder_point. (Hint: you can compare two columns directly.)

c) Products that are either in the Widgets category OR have a unit_price below $20.

d) Products where unit_price is between $30 and $100 (inclusive). Do this using the & operator.

e) All products that are NOT in the Components category.


Exercise 2.3 — Adding Calculated Columns

Add the following columns to df. Make sure each calculation is vectorized (no loops):

a) gross_profit = unit_price minus unit_cost

b) margin_pct = gross_profit divided by unit_price, rounded to 4 decimal places

c) potential_revenue = unit_price multiplied by inventory

d) is_below_reorder = a Boolean column that is True when inventory is less than reorder_point

e) shipping_cost = weight_lbs multiplied by 2.85 (a flat per-pound shipping rate), rounded to 2 decimal places

After adding all five columns, print df.columns to confirm they were added.


Exercise 2.4 — Sorting

a) Sort the DataFrame by unit_price from highest to lowest.

b) Sort by margin_pct from lowest to highest (worst margins first).

c) Sort by category alphabetically (A to Z), then by unit_price within each category from highest to lowest.

d) Sort by potential_revenue descending and show the top 3 rows. Which product has the highest potential revenue if all inventory is sold?


Exercise 2.5 — Dropping Data

a) Drop the weight_lbs column and assign the result to a new variable df_no_weight. Confirm the column is gone.

b) Drop the shipping_cost column in place using inplace=True. Confirm the column is gone by printing df.columns.

c) Remove the row for ACM-010 from df_no_weight. How many rows remain?

d) Create a new DataFrame called active_inventory_df that contains only products where inventory is greater than 0 AND the product is not Components. Do this using a filter, not .drop().


Tier 3: Analysis

These exercises combine multiple operations to answer realistic business questions.


Exercise 3.1 — Margin Analysis

Using the margin_pct column you added in Exercise 2.3:

a) How many products have a margin below 25%? Print the count and list the product names.

b) What is the average margin across all products? Across Gadgets only? Across Widgets only?

c) Which single product has the lowest margin? Use .idxmin() to get its SKU, then retrieve its full row.

d) Create a new column called margin_tier that classifies each product as: - "Low" if margin_pct < 0.25 - "Medium" if margin_pct is between 0.25 and 0.45 (inclusive) - "High" if margin_pct > 0.45

Use np.where() or pd.cut(). After creating the column, use .value_counts() on margin_tier to see how many products are in each tier.


Exercise 3.2 — Inventory and Reorder Analysis

a) How many products are below their reorder point?

b) What is the total inventory value of products below their reorder point? (Inventory value = unit_cost × inventory.)

c) Among products below the reorder point, which one has the highest unit_price? This is the most financially impactful stockout risk.

d) Add a column reorder_urgency that shows how many units are needed to reach the reorder point (i.e., reorder_point - inventory, but never negative — use .clip(lower=0)).

e) Sort the in-stock-shortage products by reorder_urgency in descending order and display the result.


Exercise 3.3 — Category Comparison

Without using .groupby() (that comes in Chapter 12), answer these questions using filters:

a) How many Widgets, Gadgets, and Components are in the catalog? (Three separate filters.)

b) What is the total potential revenue for Gadgets vs. Widgets vs. Components?

c) What is the average unit price for each category?

d) Which category has the highest average gross profit per unit?


Exercise 3.4 — Building a Report

Write code that produces a clean printed report in the following format. Use f-strings and .sort_values() to build it:

==============================
ACME CORP PRODUCT REPORT
==============================

Total SKUs: 10

Category Breakdown:
  Widgets:    4 products
  Gadgets:    4 products
  Components: 2 products

Top 3 Products by Potential Revenue:
  1. [product name]   $[revenue]
  2. [product name]   $[revenue]
  3. [product name]   $[revenue]

Products Needing Reorder:
  [list each SKU and product name]

Average Margin by Category:
  Widgets:    XX.X%
  Gadgets:    XX.X%
  Components: XX.X%

All values should be calculated dynamically from the DataFrame — do not hard-code any numbers.


Tier 4: Application

These exercises present open-ended problems. There is no single correct solution; focus on writing clean, readable code that produces correct results.


Exercise 4.1 — Price Sensitivity Analysis

Acme's pricing team wants to know: if they increase all Gadget prices by 10%, what happens to the potential revenue and margin?

a) Create a new DataFrame called gadgets_df containing only Gadgets.

b) Add a column new_unit_price = unit_price × 1.10 (10% increase).

c) Recalculate new_margin_pct using the new price and the original cost.

d) Calculate the change in potential revenue for each Gadget (new price × inventory vs. old price × inventory).

e) Print a comparison showing each Gadget's original and new margin, and the dollar change in potential revenue.


Exercise 4.2 — ABC Inventory Classification

In inventory management, the ABC method classifies products by their contribution to total revenue:

  • Class A: Top 20% of products by potential revenue (these are your critical items)
  • Class B: Next 30% of products
  • Class C: Bottom 50% of products

a) Sort the DataFrame by potential_revenue descending.

b) The top 2 products (20% of 10) are Class A. The next 3 are Class B. The remaining 5 are Class C. Add an abc_class column with values "A", "B", or "C" accordingly.

Hint: You can do this with three separate assignment statements using .iloc[] combined with .index to get the SKU labels, then assigning to .loc[].

c) Filter the DataFrame to show only Class A products. What percentage of total potential revenue do they account for?

d) Are any Class A products currently below their reorder point? This would be a critical business risk.


Exercise 4.3 — Building a Reusable Pricing Tool

Write a function called apply_discount(df, category, discount_pct) that:

  • Accepts a DataFrame, a category string, and a discount percentage (as a decimal, e.g., 0.15 for 15%)
  • Returns a new DataFrame containing only the specified category's products, with a new column discounted_price = unit_price × (1 - discount_pct)
  • Also adds a column margin_after_discount calculated using the discounted price and original cost
  • Does not modify the original DataFrame

Test your function:

widgets_sale = apply_discount(df, "Widgets", 0.15)
print(widgets_sale[["product_name", "unit_price", "discounted_price", "margin_after_discount"]])

What is the margin on Widget C (ACM-006) after a 15% discount? Is it still profitable?


Exercise 4.4 — Vendor Cost Comparison

Acme has received quotes from a new vendor who can supply all Components at a 12% lower cost. All other product costs remain the same.

a) Create a copy of the DataFrame (df_new_vendor = df.copy()).

b) For all Components rows, update unit_cost to be 12% lower.

c) Recalculate gross_profit and margin_pct for the updated DataFrame.

d) Compare the margin on each Component product before and after the vendor switch. How much does the margin improve?

e) What is the total annual savings if Acme sells through its entire Component inventory at the new cost?


Tier 5: Challenge

These exercises push beyond the chapter's core material. They may require looking up additional pandas documentation or combining techniques in ways not explicitly covered. Expect to spend time on these.


Exercise 5.1 — Multi-Level Filtering with Dynamic Thresholds

Write code that identifies products that are "simultaneously stressed" — meeting ALL of the following conditions at once:

  • Margin is below the median margin of all products (calculate this dynamically, do not hard-code)
  • Inventory is below the 75th percentile of all reorder points
  • Unit price is above $30 (these are not cheap items, making low margin more impactful)

Use .describe() or .quantile() to calculate the thresholds dynamically. Print the matching products and explain why each condition matters from a business perspective.


Exercise 5.2 — The Watchlist Report

You are Priya. Sandra wants a weekly "watchlist" email she can forward to her team. Write a Python function called generate_watchlist(df) that:

  1. Accepts the product catalog DataFrame
  2. Calculates all needed derived columns internally (do not require them to be pre-added)
  3. Returns a formatted multi-line string (not a DataFrame — a string, for emailing) that includes: - A count of products in each category - Products below 25% margin with their current margin shown - Products below reorder point with units needed to reach the reorder threshold - A one-line summary: "X products require attention: Y low-margin, Z below reorder."

Test it with: print(generate_watchlist(df))


Exercise 5.3 — Correlation Without groupby

Without using .groupby(), calculate and print a "category performance matrix" showing, for each category:

  • Average unit price
  • Average margin percentage
  • Average inventory
  • Total potential revenue
  • Count of products below reorder point

Store the results in a new DataFrame called category_matrix_df where the index is the category name. Each row should represent one category. Each column should be one of the five metrics above.

Hint: Build each result as a Series (using filtering and .mean(), .sum(), etc.), then combine them into a DataFrame manually.


Exercise 5.4 — Inventory Optimization Simulation

Simulate what happens to Acme's total inventory cost over three months if:

  • Products below reorder point are restocked to twice their reorder point in Month 1
  • Sales consume inventory at a rate of reorder_point / 3 units per month for each product
  • Any product falling below its reorder point mid-simulation is restocked again at the start of the following month

Build a DataFrame with columns for Month 1, Month 2, and Month 3 inventory levels for each product. Plot the total inventory value over time (calculate the value as unit_cost × inventory_level for each month). Which product drives the most volatility in total inventory cost?

Note: This exercise does not require matplotlib — a printed table of monthly inventory values is sufficient. The plotting reference is aspirational for those who explore Chapter 14 early.


Exercise 5.5 — Free-Form Business Problem

You are Maya Reyes. You have just onboarded a new client: a regional bookstore chain with 8 locations. They have given you a product catalog of their top 50 bestselling books with the following fields: isbn, title, author, genre, retail_price, wholesale_cost, monthly_sales_units, current_stock.

Design and build (using the product catalog DataFrame as a template, inventing realistic data for 10 books) a pandas-based analysis that answers the three questions their operations manager cares about most. Define what those three questions are, justify why they matter, and write the code. There is no single correct answer — this is a design exercise as much as a coding exercise.


Grading Guide for Instructors

Tier Focus Recommended Weighting
Tier 1 Recall and basic mechanics 15%
Tier 2 Core operations 25%
Tier 3 Multi-step analysis 30%
Tier 4 Applied problem-solving 20%
Tier 5 Extension and synthesis 10% (bonus)

Tiers 1–3 have deterministic correct answers. Tiers 4–5 should be assessed on code quality, correctness, and clarity of reasoning.