Case Study 1: What Does the Menu Tell Us? Exploring Fast Food Nutrition Data


Tier 3 — Illustrative/Composite Example: This case study uses a fictional but realistic fast food nutrition dataset modeled on the kinds of data that major restaurant chains are required to publish under menu labeling laws in the United States (such as the FDA's 2018 calorie disclosure rule). The restaurant name, specific menu items, and all numerical values are invented for pedagogical purposes. No specific company is represented.


The Setting

Amara is a second-year nutrition science student who's become obsessed with a question that annoys her friends at lunch: How bad are fast food meals, really?

Not "bad" in a vague, hand-wavy sense. Bad in a specific, quantifiable sense. She wants numbers. She's heard the conventional wisdom — fast food is full of calories, sodium, and fat — but she's also noticed that fast food chains now offer salads, grilled chicken, and "lighter" options. Has the conventional wisdom kept up with the menu?

Amara's professor has given the class a CSV file called fastfood_nutrition.csv containing nutritional information for 120 menu items from a fictional chain called BurgerBarn. The file has 8 columns:

Column          Description            Example
item_name       Menu item name         "Classic Burger"
category        Menu category          "Burgers", "Chicken", "Salads", "Sides", "Drinks", "Desserts"
calories        Total calories         "540"
total_fat_g     Total fat in grams     "28"
sodium_mg       Sodium in milligrams   "820"
protein_g       Protein in grams       "25"
sugar_g         Sugar in grams         "9"
serving_size_g  Serving size in grams  "215"

Amara has taken Chapters 1-5 of her data science course. She knows Python basics, functions, loops, lists, and dictionaries. She's never used pandas. Her assignment: load the data, explore it, compute summary statistics, identify data quality issues, and write up her findings as a notebook narrative. Sound familiar?

The Questions

Amara sits down with her Jupyter notebook and writes three questions before writing any code — just like Elena's sticky-note method from Chapter 6:

  1. What is the calorie distribution across the menu? (What does a "typical" BurgerBarn meal look like nutritionally?)
  2. How do different menu categories compare? (Are salads really better than burgers by the numbers?)
  3. Are there any suspiciously extreme values that might indicate data entry errors?

These are all descriptive questions — perfect for a first exploration.

Loading and First Inspection

Amara starts with the loading pattern she learned:

import csv

data = []
with open("fastfood_nutrition.csv", "r", encoding="utf-8") as f:
    reader = csv.DictReader(f)
    for row in reader:
        data.append(row)

print(f"Loaded {len(data)} menu items")
print(f"Columns: {list(data[0].keys())}")
Loaded 120 menu items
Columns: ['item_name', 'category', 'calories', 'total_fat_g',
          'sodium_mg', 'protein_g', 'sugar_g', 'serving_size_g']

She checks the first few rows:

for row in data[:3]:
    print(f"{row['item_name']:30s} | {row['category']:10s} | {row['calories']} cal")
Classic Burger                 | Burgers    | 540 cal
Double Stack Burger            | Burgers    | 820 cal
Bacon Deluxe                   | Burgers    | 930 cal

Then she counts items by category:

category_counts = {}
for row in data:
    cat = row["category"]
    category_counts[cat] = category_counts.get(cat, 0) + 1

for cat, count in sorted(category_counts.items()):
    print(f"  {cat}: {count} items")
  Breakfast: 19 items
  Burgers: 18 items
  Chicken: 15 items
  Desserts: 14 items
  Drinks: 22 items
  Salads: 12 items
  Sides: 20 items

Seven categories, not six — the data has a "Breakfast" category that wasn't in the original schema description. Amara makes a note: "Found 7 categories, not 6 as documented. Breakfast category present with 19 items."
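The comparison Amara made by eye can also be scripted. The following is a minimal sketch, assuming the documented categories are the six listed in the column table. The names `documented_categories` and `sample_rows` are introduced here for illustration, and `sample_rows` is an invented stand-in for the loaded `data` list. It uses sets, which may be new if you have only worked with lists and dictionaries so far.

```python
# Categories listed in the column description table.
documented_categories = {"Burgers", "Chicken", "Salads", "Sides", "Drinks", "Desserts"}

# Invented stand-in for the loaded `data` list (only the field used here).
sample_rows = [
    {"category": "Burgers"},
    {"category": "Breakfast"},
    {"category": "Drinks"},
    {"category": "Breakfast"},
]

# A set keeps each category once, no matter how many rows use it.
observed_categories = {row["category"] for row in sample_rows}

# Set differences show mismatches in both directions.
undocumented = sorted(observed_categories - documented_categories)
unused = sorted(documented_categories - observed_categories)

print(f"In data but not documented: {undocumented}")
print(f"Documented but not in data: {unused}")
```

Checking both directions is a design choice: a category that is documented but never appears can be as informative as one that appears without documentation.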

Computing Summary Statistics

Amara writes a utility function to extract numeric values safely (handling empty strings, which she already knows are a potential problem from Chapter 6):

def safe_extract(data, column):
    """Extract numeric values from a column, skipping empties."""
    values = []
    skipped = 0
    for row in data:
        raw = row[column].strip()
        if raw == "":
            skipped += 1
            continue
        try:
            values.append(float(raw))
        except ValueError:
            skipped += 1
    return values, skipped

She runs it on calories:

cal_values, cal_skipped = safe_extract(data, "calories")
print(f"Calorie data: {len(cal_values)} values, {cal_skipped} skipped")
print(f"Min:    {min(cal_values):.0f}")
print(f"Max:    {max(cal_values):.0f}")
print(f"Mean:   {sum(cal_values)/len(cal_values):.0f}")

sorted_cals = sorted(cal_values)
mid = len(sorted_cals) // 2
median_cal = sorted_cals[mid] if len(sorted_cals) % 2 else (sorted_cals[mid-1] + sorted_cals[mid]) / 2
print(f"Median: {median_cal:.0f}")
Calorie data: 118 values, 2 skipped
Min:    0
Max:    1450
Mean:   487
Median: 430

Two records have missing calorie data. And that minimum of 0 catches her eye immediately. Zero calories? That's either water or a data error.

# Find the zero-calorie items
for row in data:
    if row["calories"].strip() == "0":
        print(f"  Zero calories: {row['item_name']} ({row['category']})")
  Zero calories: Water Bottle (Drinks)
  Zero calories: Diet Soda Large (Drinks)

Water makes sense. Diet soda at exactly 0 is also reasonable (it might technically be under 5 calories, which is rounded to 0 per FDA labeling rules). Not errors — just edge cases. Amara writes a note explaining this.
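One way to double-check a hand-written mean and median is to compare them against Python's standard `statistics` module. The sketch below is not part of Amara's notebook. The `raw_calories` list is invented for illustration and includes one empty string and one non-numeric entry to show that both are skipped.

```python
import statistics

# Invented calorie strings, as they would arrive from csv.DictReader.
raw_calories = ["540", "820", "", "0", "n/a", "430"]

values = []
skipped = 0
for raw in raw_calories:
    try:
        values.append(float(raw.strip()))
    except ValueError:
        # float("") and float("n/a") both raise ValueError.
        skipped += 1

mean_cal = statistics.mean(values)
median_cal = statistics.median(values)

print(f"{len(values)} values, {skipped} skipped")
print(f"Mean:   {mean_cal:.1f}")
print(f"Median: {median_cal:.1f}")
```

If the library result and the hand-written result disagree, the hand-written median logic (the even-length case in particular) is usually the first place to look.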

Comparing Categories

Now for Amara's second question: how do categories compare? She writes a loop to compute mean calories by category:

print("Mean calories by category:")
print("-" * 40)
for cat in sorted(category_counts.keys()):
    cat_cals = []
    for row in data:
        if row["category"] == cat and row["calories"].strip() != "":
            try:
                cat_cals.append(float(row["calories"]))
            except ValueError:
                pass
    if cat_cals:
        mean_cal = sum(cat_cals) / len(cat_cals)
        print(f"  {cat:12s}  {mean_cal:6.0f} cal  (n={len(cat_cals)})")
Mean calories by category:
----------------------------------------
  Breakfast       485 cal  (n=19)
  Burgers         695 cal  (n=18)
  Chicken         542 cal  (n=15)
  Desserts        410 cal  (n=14)
  Drinks          198 cal  (n=20)
  Salads          345 cal  (n=12)
  Sides           312 cal  (n=20)

The conventional wisdom holds up — burgers are the highest-calorie category at 695 calories on average. Salads (345) are substantially lower, though Amara notes they're not exactly low-calorie. "A 345-calorie salad is probably fine for a meal," she writes, "but if someone thinks a salad is automatically 'light,' they might be surprised."

She extends the analysis to sodium:

print("\nMean sodium (mg) by category:")
print("-" * 40)
for cat in sorted(category_counts.keys()):
    cat_sodium = []
    for row in data:
        if row["category"] == cat and row["sodium_mg"].strip() != "":
            try:
                cat_sodium.append(float(row["sodium_mg"]))
            except ValueError:
                pass
    if cat_sodium:
        mean_na = sum(cat_sodium) / len(cat_sodium)
        print(f"  {cat:12s}  {mean_na:6.0f} mg   (n={len(cat_sodium)})")
Mean sodium (mg) by category:
----------------------------------------
  Breakfast       890 mg   (n=19)
  Burgers         980 mg   (n=18)
  Chicken         920 mg   (n=14)
  Desserts        245 mg   (n=14)
  Drinks           65 mg   (n=22)
  Salads          710 mg   (n=12)
  Sides           620 mg   (n=20)

"Here's the surprise," Amara writes in her notebook. "Salads average 710 mg of sodium, the fourth-highest of the seven categories and more than Sides (620 mg). That's because many salads include cheese, dressing, and seasoned toppings. The 'healthy' option isn't necessarily low in sodium."
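The calorie and sodium loops above differ only in the column name, so they could be folded into one function that groups values in a single pass. This is a sketch of one possible refactoring, not the approach the assignment requires. The function name `mean_by_category` and the `sample_rows` values are invented for illustration.

```python
def mean_by_category(rows, column):
    """Return {category: (mean, n)} for one numeric column, skipping bad values."""
    grouped = {}
    for row in rows:
        try:
            value = float(row[column].strip())
        except ValueError:
            continue  # empty or non-numeric entry
        # setdefault creates the list the first time a category is seen.
        grouped.setdefault(row["category"], []).append(value)
    return {cat: (sum(vals) / len(vals), len(vals)) for cat, vals in grouped.items()}


# Invented rows standing in for the loaded `data` list.
sample_rows = [
    {"category": "Burgers", "calories": "540", "sodium_mg": "820"},
    {"category": "Burgers", "calories": "820", "sodium_mg": ""},
    {"category": "Salads", "calories": "300", "sodium_mg": "700"},
    {"category": "Salads", "calories": "400", "sodium_mg": "720"},
]

for column in ("calories", "sodium_mg"):
    print(f"Mean {column} by category:")
    for cat, (mean_value, n) in sorted(mean_by_category(sample_rows, column).items()):
        print(f"  {cat:12s}  {mean_value:6.0f}  (n={n})")
```

Returning the count alongside the mean keeps the `n=` figure visible, which matters when missing values shrink one category more than the others.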

Spotting Data Quality Issues

Amara runs a missing values check:

print("Missing values by column:")
for col in data[0].keys():
    missing = sum(1 for row in data if row[col].strip() == "")
    pct = (missing / len(data)) * 100
    print(f"  {col:20s}  {missing:3d} ({pct:.1f}%)")
Missing values by column:
  item_name               0 (0.0%)
  category                0 (0.0%)
  calories                2 (1.7%)
  total_fat_g             3 (2.5%)
  sodium_mg               1 (0.8%)
  protein_g               2 (1.7%)
  sugar_g                 4 (3.3%)
  serving_size_g          5 (4.2%)

Missingness is low in every column (under 5%), which is manageable. But she also notices something when she checks for extreme values:

# Check for outliers in calories.
# Pair each value with its item name so tied values report distinct items.
cal_items = []
for row in data:
    raw = row["calories"].strip()
    if raw != "":
        cal_items.append((float(raw), row["item_name"]))
cal_items.sort(reverse=True)
print("Top 5 calorie counts:")
for val, name in cal_items[:5]:
    print(f"  {val:.0f} cal: {name}")
Top 5 calorie counts:
  1450 cal: The Monster Triple Stack
  1380 cal: Ultimate Breakfast Platter
  1250 cal: Loaded Nachos Supreme
  980 cal: Bacon BBQ Chicken Sandwich
  930 cal: Bacon Deluxe

The 1,450-calorie Monster Triple Stack is extreme but plausible — it's a triple-patty burger. But Amara wants to check one more thing: are there any values that seem physically impossible?

# Check for unreasonably high values
for row in data:
    if row["serving_size_g"].strip() != "":
        serving = float(row["serving_size_g"])
        if serving > 1000:
            # :.0f drops the trailing ".0" that float() would otherwise print
            print(f"  Large serving: {row['item_name']} - {serving:.0f}g")
    if row["sodium_mg"].strip() != "":
        sodium = float(row["sodium_mg"])
        if sodium > 3000:
            print(f"  High sodium: {row['item_name']} - {sodium:.0f}mg")
  High sodium: Loaded Nachos Supreme - 3240mg

3,240 mg of sodium in a single menu item — that's more than the American Heart Association's recommended daily limit of 2,300 mg. It's extreme, but for a large plate of loaded nachos, it's probably accurate rather than erroneous. Amara flags it as an outlier worth noting but not an error.
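Comparisons against a daily reference come down to one division. The sketch below works through the arithmetic for the two most extreme items. The helper name `percent_of_reference` is invented for illustration. The 2,300 mg sodium limit is the figure cited above, and the 2,000-calorie reference is assumed here as the standard intake printed on U.S. nutrition labels.

```python
def percent_of_reference(amount, reference):
    """Return amount as a percentage of a daily reference value."""
    return amount / reference * 100


SODIUM_LIMIT_MG = 2300    # daily sodium limit cited in the text
CALORIE_REFERENCE = 2000  # assumed: reference intake on U.S. nutrition labels

nachos_sodium_pct = percent_of_reference(3240, SODIUM_LIMIT_MG)
triple_stack_cal_pct = percent_of_reference(1450, CALORIE_REFERENCE)

print(f"Loaded Nachos Supreme sodium: {nachos_sodium_pct:.1f}% of the daily limit")
print(f"Monster Triple Stack calories: {triple_stack_cal_pct:.1f}% of a 2,000-calorie day")
```

Expressed this way, the nachos come to roughly 141% of the daily sodium limit, which is easier to interpret than the raw milligram count.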

Key Findings

Amara concludes her notebook with a findings section:

Finding 1: The average BurgerBarn menu item contains 487 calories, but this varies enormously by category — from 198 calories (drinks) to 695 calories (burgers).

Finding 2: Salads are lower in calories than burgers (345 vs. 695) but are not low in sodium (710 mg average). Customers choosing salads for health reasons should be aware of hidden sodium.

Finding 3: The highest-calorie single item (The Monster Triple Stack, 1,450 calories) contains roughly 72% of a standard 2,000-calorie daily intake in one meal.

Finding 4: Data quality is generally good, with under 5% missing values in any column. No obvious data entry errors were found, though some extreme values (3,240 mg sodium in the nachos) merit attention.

Next questions: How do these numbers compare to FDA daily recommended values? Are there items that are simultaneously low in calories, fat, and sodium? Would breaking down by individual macronutrient ratios tell a different story than looking at each nutrient in isolation?
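The second of these questions could be approached with the same loop-and-filter pattern used throughout. The sketch below is only a starting point: the three cutoffs are hypothetical values chosen for illustration, not official guidance, and `sample_rows` is an invented stand-in for the loaded `data` list.

```python
# Hypothetical cutoffs chosen only for illustration; not official guidance.
limits = {"calories": 400, "total_fat_g": 15, "sodium_mg": 600}

# Invented rows standing in for the loaded `data` list.
sample_rows = [
    {"item_name": "Item A", "calories": "350", "total_fat_g": "12", "sodium_mg": "480"},
    {"item_name": "Item B", "calories": "320", "total_fat_g": "10", "sodium_mg": "910"},
    {"item_name": "Item C", "calories": "280", "total_fat_g": "", "sodium_mg": "300"},
    {"item_name": "Item D", "calories": "700", "total_fat_g": "38", "sodium_mg": "1200"},
]

light_items = []
for row in sample_rows:
    try:
        # An item qualifies only if every nutrient is at or under its cutoff.
        qualifies = all(float(row[col].strip()) <= limit for col, limit in limits.items())
    except ValueError:
        continue  # a missing value means the item cannot be judged
    if qualifies:
        light_items.append(row["item_name"])

print(f"Items under all three cutoffs: {light_items}")
```

Skipping items with a missing value is a deliberate choice in this sketch: treating a blank as zero would let an item pass a test it was never actually measured against.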

What This Case Study Illustrates

Amara's exploration demonstrates several principles from Chapter 6:

  1. Questions first. She defined three questions before writing code, which gave direction to her entire analysis.
  2. EDA as conversation. Each finding naturally raised the next question. The calorie analysis led to the sodium analysis, which revealed the "healthy salad" surprise.
  3. Data quality matters. She found and investigated the zero-calorie items, missing values, and extreme outliers rather than blindly computing averages.
  4. Notebook narrative. She mixed code, output, and plain-English interpretation into a document that tells a story.
  5. Pure Python is sufficient. She accomplished all of this with csv.DictReader, loops, dictionaries, and basic arithmetic — the same tools you learned in Chapters 3-5.

The analysis isn't perfect. Amara would be the first to admit that with pandas, she could have done this in a fraction of the code. But she understands every step, and when she learns pandas in the next chapter, she'll appreciate what it's doing for her.