Case Study 2: From Spreadsheet Chaos to Notebook Clarity — A Business Analyst's Migration Story
Tier 3 — Illustrative/Composite Example: Marcus and Rise & Shine Bakery are fictional, but this case study reflects common real-world experiences of small business owners and analysts who transition from spreadsheet-based analysis to programmatic workflows. The pain points, learning curve, and eventual benefits described here are composites of widely documented experiences in the business analytics community. All names, data, and business details are invented for pedagogical purposes.
The Before: Marcus's Spreadsheet Empire
You met Marcus in Chapter 1. He owns Rise & Shine Bakery, a small but growing bakery that he started three years ago after a decade of working in restaurants. Marcus is smart, hard-working, and detail-oriented. He's also buried in spreadsheets.
Here's what Marcus's data situation looks like:
The Sales Tracker. Marcus has an Excel workbook called Sales_2024_FINAL_v3_ACTUAL_FINAL.xlsx. (The filename alone tells you something about the state of things.) It has 14 tabs:
- "Jan," "Feb," "Mar," ... , "Dec" — one tab per month, each with daily sales broken down by product
- "Summary" — yearly totals with formulas that reference the monthly tabs
- "Charts" — a few bar charts that Marcus made six months ago and hasn't updated since
Each monthly tab has about 30 rows (one per day) and 15 columns (date, total sales, croissant count, muffin count, bread loaf count, catering revenue, and so on). The formatting is inconsistent: some cells are currency-formatted, some are plain numbers, some have notes in yellow highlight, and one column header says "Misc." without any explanation of what it contains.
The Inventory Sheet. A separate workbook tracks ingredient purchases, supplier invoices, and waste. It has conditional formatting rules that nobody remembers setting up, and several #REF! errors from a time when Marcus accidentally deleted a column.
The "Analysis." When Marcus wants to answer a question — like "Are my Saturday sales consistently higher than weekday sales?" — he does the following:
- Opens the Sales Tracker
- Scrolls through each monthly tab, visually scanning the Saturday rows
- Maybe copies the Saturday values into a new tab
- Tries to create a chart
- Fights with Excel's chart formatting for 20 minutes
- Gives up and estimates the answer based on gut feeling
This process has several problems. First, it's slow — each question takes 30-60 minutes of manual work. Second, it's unreliable — Marcus occasionally misses a row or copies the wrong column. Third, it's not reproducible — if his accountant asks "how did you calculate your Saturday average?", Marcus can't retrace his exact steps. Fourth, it doesn't scale — when he has two years of data, the workbook becomes sluggish, and the manual process doubles in length.
Marcus knows something needs to change. He's heard about Python and Jupyter from a friend who took a data analytics bootcamp. But he's skeptical. "I've been running a business with Excel for three years," he tells his friend. "Why would I spend weeks learning a programming language to do what I already know how to do?"
It's a reasonable question. Let's follow Marcus as he answers it.
The Catalyst: The December Disaster
The tipping point comes in December. Marcus gets an email from a local business magazine that wants to feature Rise & Shine in a "local businesses" profile. They ask for some basic data: total revenue for each quarter, the top 5 best-selling products, and whether sales have been growing year-over-year.
Marcus thinks: "Easy. I have all this data in my spreadsheet."
It takes him four hours.
The quarterly totals required manually adding up three monthly tabs per quarter, because his "Summary" tab had a broken formula that was silently giving the wrong number. (The March tab had an extra row for "void transaction" that threw off the SUM formula. Marcus had never noticed.)
The top 5 products required scrolling through every monthly tab and manually tallying sales by product type, because there was no pivot table or aggregation anywhere in the workbook. He made several counting errors and had to start over twice.
The year-over-year comparison was impossible because Marcus's 2023 data was in a different workbook with a slightly different column layout, and he couldn't easily merge the two.
After four hours, Marcus sends the magazine his best estimates with a note: "These are approximate. I'd need more time for exact figures."
That evening, he texts his friend: "Tell me about the Python thing."
Week 1: Learning to Walk
Marcus's friend lends him a copy of this textbook. He follows Chapter 2 — the chapter you just read — and installs Anaconda. It takes 15 minutes. He opens Jupyter Notebook. He creates his first notebook.
And then he types:
print("Hello, Rise & Shine!")
He presses Shift+Enter. The words appear. Marcus grins.
He spends the next hour typing arithmetic into code cells:
# December daily sales
dec_sales = [1240, 1180, 1350, 980, 1420, 1560, 1100,
             1280, 1190, 1340, 1010, 1450, 1520, 1080,
             1300, 1220, 1380, 1040, 1480, 1600, 1150,
             1320, 1250, 1400, 1060, 1500, 1580, 1120,
             1340, 1260, 1390]
total = sum(dec_sales)
average = total / len(dec_sales)
print("December total:", total)
print("December average:", average)
print("December days:", len(dec_sales))
(We haven't formally covered lists or sum() or len() yet — those come in Chapters 4 and 5. But Marcus's friend showed him enough to get started, and Jupyter's immediate feedback let him experiment.)
The output appears instantly:
December total: 40090
December average: 1293.2258...
December days: 31
Marcus stares at the screen. In Excel, getting December's total required clicking on the right tab, selecting the right column, and using the SUM function — which might or might not include the right cells depending on whether he selected the full range. Here, the number is in the code. If it's wrong, he can see why. If it's right, he can prove it.
But more than the calculation, Marcus notices something else: he wrote a sentence above his code explaining what he was doing. The # comment says "December daily sales." When he comes back to this notebook in January, he'll know exactly what this cell contains and why.
In Excel, there was no good place to write "here's what I'm doing and why." In Jupyter, it's the default.
Week 2: The "Aha" Moment
Marcus works through Chapters 3 and 4 of this book. He learns variables, data types, loops, and functions. Then he tries something ambitious: replicating his Saturday-vs-weekday analysis in Jupyter.
In Excel, this analysis took him 45 minutes of scrolling, copying, and eyeballing. In Jupyter, once he has the data in a list with corresponding day labels, the code is:
# This is a preview — we'll learn these techniques properly in later chapters
# (dec_days is a list of day names — "Monday", "Tuesday", ... — matching dec_sales)
saturday_sales = [s for s, d in zip(dec_sales, dec_days) if d == "Saturday"]
weekday_sales = [s for s, d in zip(dec_sales, dec_days) if d != "Saturday" and d != "Sunday"]
sat_avg = sum(saturday_sales) / len(saturday_sales)
wkd_avg = sum(weekday_sales) / len(weekday_sales)
print(f"Saturday average: ${sat_avg:.2f}")
print(f"Weekday average: ${wkd_avg:.2f}")
print(f"Saturday premium: ${sat_avg - wkd_avg:.2f} ({(sat_avg - wkd_avg) / wkd_avg * 100:.1f}%)")
The code takes 2 minutes to write. It runs in less than a second. The output clearly shows the Saturday premium. And here's the key: Marcus can now run this exact same analysis on November's data, October's data, or all of 2024, by changing one variable. In Excel, he'd have to repeat the entire manual process.
This is what programmers call automation, and it's the moment Marcus understands why Python exists.
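The reuse Marcus discovered can be sketched as a function. This is a hypothetical sketch, not code from the book's later chapters; the one-week sample lists are invented for illustration:

```python
# A minimal sketch of a reusable version of Marcus's analysis: wrap the
# comparison in a function, then run it on any month's data.
def saturday_premium(sales, days):
    """Return (saturday_avg, weekday_avg) for matching sales/day lists."""
    sat = [s for s, d in zip(sales, days) if d == "Saturday"]
    wkd = [s for s, d in zip(sales, days) if d not in ("Saturday", "Sunday")]
    return sum(sat) / len(sat), sum(wkd) / len(wkd)

# Hypothetical one-week example, Monday through Sunday (invented numbers)
week_sales = [980, 1010, 1040, 1060, 1100, 1560, 1200]
week_days = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

sat_avg, wkd_avg = saturday_premium(week_sales, week_days)
print(f"Saturday average: ${sat_avg:.2f}")  # $1560.00
print(f"Weekday average: ${wkd_avg:.2f}")   # $1038.00
```

Running the same analysis on a different month is now one function call with different arguments, which is exactly the "change one variable" workflow Marcus discovered.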
He adds a Markdown cell above the code:
## Saturday vs. Weekday Sales Analysis
**Question:** Are Saturday sales consistently higher than weekday sales?
**Why it matters:** If Saturdays are reliably stronger, I should schedule my
best baker (me) for Saturdays and consider hiring weekend-specific staff.
He looks at his notebook. Code and explanation, side by side. Clear, readable, reproducible. He can share this with his accountant, his spouse, or the business magazine, and they can follow the logic from question to answer.
"This," Marcus tells his friend, "is what Excel can't do."
The Migration: What Changed
Over the next several weeks (spanning Chapters 5 through 8 of this book), Marcus gradually moves his analysis workflow from Excel to Jupyter. Here's what the transition looked like:
What He Kept in Excel
Marcus didn't abandon Excel entirely. He still uses it for:
- Quick data entry. When he's at the bakery jotting down the day's sales, it's faster to type into a spreadsheet on his phone than to open Jupyter.
- Quick visual inspection. When he gets a new invoice, he opens it in Excel to glance at the numbers before doing any analysis.
- Sharing with people who don't use Python. His accountant wants an Excel file, not a Jupyter notebook.
What He Moved to Jupyter
The analysis — everything beyond "look at the raw numbers" — moved to Jupyter:
- Calculating monthly and quarterly totals — previously done with error-prone SUM formulas across tabs
- Comparing products — previously done by manual tallying
- Identifying trends — previously done by eyeballing
- Producing charts — previously done by fighting with Excel chart formatting
- Answering ad hoc questions — "what would happen if I raised croissant prices by 10%?" is now a variable change, not a hypothetical
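The "variable change, not a hypothetical" idea in the last bullet looks something like this; every number here is invented for illustration:

```python
# Hypothetical what-if: how would monthly croissant revenue change if the
# price rose 10%? All figures are invented, not Marcus's real data.
croissants_sold = 850     # croissants sold last month (invented)
current_price = 3.50      # current price per croissant (invented)
price_increase = 0.10     # the "what if" variable: a 10% raise

current_revenue = croissants_sold * current_price
new_revenue = croissants_sold * current_price * (1 + price_increase)

print(f"Current revenue: ${current_revenue:.2f}")  # $2975.00
print(f"At +10% price:   ${new_revenue:.2f}")      # $3272.50
```

To ask a different question ("what about 5%? what about 20%?"), Marcus changes `price_increase` and re-runs the cell, rather than rebuilding a spreadsheet scenario by hand.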
The Before and After
Let's look at a specific example to see the concrete difference.
The question: "What were my total sales by quarter in 2024?"
The Excel way (before):
1. Open Sales_2024_FINAL_v3_ACTUAL_FINAL.xlsx
2. Navigate to the "Jan" tab. Select column B. Note the SUM at the bottom. Write it down.
3. Repeat for "Feb" and "Mar."
4. Add the three numbers (manually or in a spare cell) to get Q1.
5. Repeat the entire process for Q2, Q3, and Q4.
6. Realize that March has an extra "void transaction" row that's inflating the SUM. Exclude it.
7. Re-do the Q1 total.
8. Time: ~35 minutes. Confidence in accuracy: moderate.
The Jupyter way (after):
1. Load the sales data from the CSV export (one line of code, using pandas — Chapter 7).
2. Group by quarter and sum (two lines of code).
3. Display the result (one line of code).
4. Time: ~2 minutes (including opening the notebook). Confidence in accuracy: high.
5. Bonus: the code is saved. Next quarter, run the same notebook on updated data.
# Marcus's quarterly analysis (preview of pandas techniques from Chapter 7)
# import pandas as pd
# df = pd.read_csv("sales_2024.csv")
# quarterly = df.groupby("quarter")["daily_total"].sum()
# quarterly
That commented-out code is a preview of what Marcus will learn in Chapter 7. For now, the point is the contrast: 35 minutes of manual, error-prone work versus 2 minutes of reproducible code.
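If you want to see the groupby idea run today, here is a sketch that substitutes a tiny invented table for Marcus's CSV (the column names match the commented preview; it assumes pandas is installed, which Anaconda includes):

```python
# Runnable sketch of the quarterly groupby, using invented data in place of
# Marcus's "sales_2024.csv". Column names match the preview above.
import pandas as pd

df = pd.DataFrame({
    "quarter":     ["Q1", "Q1", "Q2", "Q2", "Q3", "Q4"],
    "daily_total": [1240, 1180, 1350, 980, 1420, 1560],
})

# One row per quarter: the sum of daily totals in that quarter
quarterly = df.groupby("quarter")["daily_total"].sum()
print(quarterly)
```

With real data, only the first step changes: `pd.read_csv` replaces the hand-built `DataFrame`, and the groupby line stays the same.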
The Learning Curve: What Was Hard
Marcus is honest about the learning curve. It wasn't easy. Here's what he struggled with:
The first two weeks were the hardest. Learning the Jupyter interface, understanding Python syntax, remembering keyboard shortcuts — all of this was unfamiliar. Marcus compared it to the first week of culinary school: "You know intellectually that you'll be fine, but your hands don't know what to do yet."
Error messages were initially terrifying. The first time Marcus saw a red SyntaxError traceback, he thought he'd broken something. It took him a few days to internalize that error messages are information, not punishment — they tell you exactly what went wrong and where.
The payoff wasn't immediate. For the first couple of weeks, Marcus could do things faster in Excel than in Python, simply because he'd been using Excel for years. The speedup came only after he'd built up enough Python skill to work fluently. "It's like learning to type," he reflected. "You're slower at first, but once you learn, you can never go back to hunt-and-peck."
Some things were unexpectedly easy. Markdown formatting was intuitive. The print() function made sense immediately. Basic arithmetic in Python is just... arithmetic. And the Shift+Enter workflow — write a thought, run it, see the result — matched how Marcus naturally thinks about problems.
Three Months Later: The Payoff
Three months into his Jupyter journey, Marcus sits down with the business magazine for a follow-up interview. They ask the same questions as before: quarterly totals, top products, year-over-year growth.
This time, Marcus opens his laptop, loads his notebook, and runs three cells. The answers appear in seconds, complete with charts. The journalist is impressed.
But the real payoff isn't the speed. It's the confidence. When the journalist asks, "How do you know these numbers are right?", Marcus can show the code. Every calculation is visible, every assumption is documented in Markdown, and every number can be traced back to the raw data.
"In Excel," Marcus says, "I had numbers. In Jupyter, I have a story — with receipts."
What Marcus Gained
- Reproducibility. The analysis runs the same way every time. No manual steps to forget or botch.
- Clarity. Markdown cells explain why each calculation exists, not just what it computes.
- Scalability. Adding a new month of data is trivial. Adding a new year is the same amount of work as adding a month.
- Confidence. Marcus trusts his numbers now because he can verify every step.
- Speed (eventually). After the initial learning curve, analysis that used to take an hour takes 5 minutes.
- Shareability. Marcus can email his notebook to his accountant, and the accountant can follow the logic even without knowing Python — the Markdown explanations make it readable as a document.
What Marcus Lost
Nothing, really. He still has Excel for quick data entry and inspection. He just stopped using it as an analysis tool. It's like switching from handwriting to typing — you can still handwrite a grocery list, but you wouldn't handwrite a business report anymore.
Lessons for Your Own Journey
Marcus's experience illustrates several principles that apply to anyone transitioning from spreadsheets to notebooks:
The question-first habit transfers directly. Marcus was already asking good questions about his business ("Are Saturday sales higher?"). He just gained a better tool for answering them. The thinking skills from Chapter 1 don't change when the tool changes.
Notebooks enforce good habits. In Excel, it's easy to be sloppy — no comments, no explanations, just numbers in cells. In Jupyter, the presence of Markdown cells invites you to explain yourself. The tool's design encourages good practices.
The learning curve is front-loaded. The first two weeks are the hardest. After that, each new skill builds on the previous ones, and progress accelerates. If you're in the "this is harder than Excel" phase right now, keep going. Marcus was too, and he came out the other side.
You don't have to abandon your old tools overnight. Marcus didn't throw away Excel on Day 1. He used both tools for weeks, gradually shifting more analysis to Jupyter as his skills grew. There's no pressure to make a clean break. Use whatever tools help you get your work done, and let the transition happen naturally.
The notebook is the deliverable. In Excel, the spreadsheet is a workspace and the deliverable is a separate report you write about the spreadsheet. In Jupyter, the notebook is the report. The analysis and the communication happen in the same document. This is a fundamental shift in how you work, and it's one of the most powerful things about the notebook model.
Discussion Questions
- Marcus's tipping point was the "December Disaster" — a task that should have been easy but wasn't. Have you experienced a similar moment with a tool you use, where you realized the tool was holding you back? What happened?
- Marcus kept using Excel for quick data entry and visual inspection. Can you think of other situations where a spreadsheet is genuinely the better tool, even if you know Python?
- The case study mentions that Marcus's accountant can read his notebook even without knowing Python, thanks to the Markdown explanations. How important is it for data analysis to be readable by non-technical people? Can you think of a situation where it matters and one where it doesn't?
- Marcus's learning curve was "front-loaded" — hard at first, then easier. Is this typical of learning new tools in general? What strategies might help someone push through the difficult early period?
- Re-read the "The Excel way (before)" section about quarterly totals. The process includes an error: the March tab had an extra "void transaction" row. Marcus found it during the Excel process, but what if he hadn't? How does the Jupyter approach reduce the risk of undetected errors like this?