Case Study 1: From Spreadsheets to Strategy — How a Local Coffee Chain Used Data Science to Survive a Pandemic
Tier 3 — Illustrative/Composite Example: Sunrise Coffee is a fictional company, but this case study is built from widely reported patterns in how small and mid-sized food-service businesses adapted during the COVID-19 pandemic. The data challenges, business decisions, and outcomes described here are composites of real-world experiences documented in industry reporting. No specific company is represented, and all names, locations, and figures are invented for pedagogical purposes.
The Setting
Imagine it's March 2020. You're Dara Okafor, the Director of Operations for Sunrise Coffee, a regional coffee chain with 12 locations spread across a mid-sized metropolitan area. The city straddles a river: six locations on the east side (a mix of downtown offices and a university campus), four on the west side (suburban neighborhoods and a shopping center), and two in outlying towns about 20 miles from the city core.
Sunrise Coffee is not Starbucks. It doesn't have a billion-dollar analytics department or an army of data scientists. What it has is a small central office with five people, a point-of-sale system that tracks every transaction, a loyal customer base, and an owner — Miriam Chen — who has been running coffee shops for 22 years.
And then everything stops.
Stay-at-home orders roll in. Foot traffic evaporates. The downtown locations, which depend on office workers grabbing lattes on their way to the elevator, see sales drop 85% in a single week. The university campus location closes entirely when students are sent home. Revenue across all 12 locations falls from $380,000 per week to $95,000 — and the overhead doesn't fall with it.
Miriam faces a decision that thousands of small business owners faced in 2020: Which locations should stay open? Which should pivot to delivery and drive-through only? And which should close temporarily — or permanently?
Her instinct, built over two decades of experience, says to keep the downtown locations open because they've always been the highest earners. But 2020 is not a normal year, and instinct alone isn't going to cut it.
This is where data science enters the story — not as a buzzword, not as a technology demo, but as a practical way of thinking through a life-or-death business decision.
The Question
The first thing a data scientist does — before touching any data, before opening any spreadsheet, before writing any code — is ask a clear question.
Dara, who has a business degree and has been reading about data-driven decision-making, pushes the team to be specific. "We can't just ask 'what should we do?'" she says at an emergency meeting. "That's too vague. We need to break it down."
After an hour of discussion, they land on three concrete questions:
1. Which locations have the best chance of generating positive revenue under pandemic restrictions? (This is a descriptive question — what does the data say about current conditions?)
2. If we add delivery service, which locations are positioned to capture the most delivery demand? (This is a predictive question — based on what we know, what's likely to happen if we change something?)
3. What's the minimum number of locations we need to keep open to avoid laying off our core staff? (This is a prescriptive question — what action should we take to achieve a specific goal?)
Notice something important: none of these questions mention algorithms, machine learning, or Python. They're business questions. The data science part is about figuring out how to answer them rigorously, rather than guessing.
🚪 Threshold Concept: This is the first lesson of data science, and arguably the most important one in this entire book: the question comes before the data, and the data comes before the tools. The most common mistake beginners make is diving into a dataset and looking for "interesting patterns" without knowing what they're looking for or why. Sunrise Coffee didn't start with a spreadsheet. They started with a crisis and a set of decisions they needed to make. The data was a means to an end.
The Data Science Approach
Let's walk through how Dara and her team approached this problem, mapping their work to the data science lifecycle introduced in the main chapter. This lifecycle has six stages, and real projects cycle through them messily — not in a neat, linear march.
Stage 1: Ask a Question
We've already done this. The three questions above gave the team a target. But even here, there was refinement. The first version of question 2 was "Should we add delivery?" Dara pushed back: "That's a yes/no question, and the answer is obviously yes — everyone's doing delivery. The real question is where delivery will work best, because we can't afford to set it up at all 12 locations simultaneously."
Good data science questions share a few traits: they're specific enough to be answerable, they're connected to a decision someone needs to make, and they acknowledge what's not known. "Which locations should we keep open?" is better than "What should we do?" but worse than "Which locations are projected to cover their operating costs under delivery-only operations in the next 90 days?"
Stage 2: Gather Data
Here's where things got interesting — and messy.
Dara realized they needed several types of data to answer their questions:
Data they already had:
- Point-of-sale (POS) transaction records: Every sale at every location, going back three years. This included the item sold, the price, the time of day, the payment method, and whether it was dine-in, takeout, or (at the two locations that already had it) drive-through.
- Staffing schedules: Who worked where, and when.
- Rent and operating costs: Monthly fixed costs for each location.
Data they needed to find:
- Neighborhood demographics: Population density, median household income, age distribution, and percentage of residents working from home (a new and suddenly crucial metric in 2020).
- Delivery radius data: How far could they reasonably deliver from each location? What other coffee and food delivery options already existed in each area?
- Foot traffic estimates: Some locations had been thriving because of walk-in traffic from nearby offices. With offices closed, what was foot traffic actually doing?
Data they wished they had but didn't:
- Customer home addresses: They knew where people bought coffee but not where they lived. Without this, they couldn't estimate delivery demand directly.
- Competitor data: They had no idea what other coffee shops were doing — who was closing, who was opening for delivery, who was offering discounts.
- Real-time mobility data: Tech companies were publishing aggregated mobility data showing how much movement in a neighborhood had declined, but Dara didn't know how to access or use it.
This is a completely normal experience in data science. You almost never have exactly the data you need. You have some of it, you can find some more, and you have to work around the gaps. The skill is in knowing what you can learn from the data you do have — and being honest about what you can't.
Stage 3: Clean and Prepare the Data
Dara exported the POS data into a spreadsheet. It was 847,000 rows. And it was a mess.
Here's what she found:
- Missing dates: Two locations had gaps in their records — one was missing an entire week in November 2019 (lost to a POS system crash that nobody had noticed at the time), and another had days where the timestamp was recorded as "00:00:00" for every transaction.
- Inconsistent location names: The same location appeared as "Sunrise - Downtown #1," "DT1," "Downtown Main," and "sunrise downtown" depending on which register recorded the sale. One location had been renamed when it moved across the street, so the same physical area had two different names in the data.
- Duplicate transactions: The POS system occasionally recorded the same sale twice — once when the payment was initiated and once when it was completed. About 3% of records were duplicates.
- Mixed formats: Dollar amounts were sometimes stored as numbers ($4.50) and sometimes as text ("four fifty" — from a brief period when one location was using a manual backup system). Dates appeared in three different formats across three POS system versions.
Cleaning this data took Dara and a part-time bookkeeper about two weeks. That might sound like a lot, but it's not unusual. A commonly cited (if hard to pin down precisely) observation in data science is that practitioners spend a very large share of their time — some say 60%, some say 80% — on cleaning and preparing data. The glamorous part is the analysis. The actual work is fixing inconsistent location names.
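Dara did all of this by hand in a spreadsheet, but the same steps translate directly to code. Here is a minimal pandas sketch of the kind of cleaning involved; the column names and the alias map are invented for illustration, not taken from Sunrise's actual system:

```python
import pandas as pd

# Hypothetical alias map: many spellings of the same store (invented names).
LOCATION_ALIASES = {
    "Sunrise - Downtown #1": "downtown_1",
    "DT1": "downtown_1",
    "Downtown Main": "downtown_1",
    "sunrise downtown": "downtown_1",
}

def clean_pos(df: pd.DataFrame) -> pd.DataFrame:
    """Clean raw POS exports: normalize names, parse mixed formats, drop dupes."""
    df = df.copy()
    # Normalize inconsistent location names to one canonical ID.
    df["location"] = df["location"].str.strip().map(
        lambda name: LOCATION_ALIASES.get(name, name)
    )
    # Parse timestamps one by one so mixed date formats don't break the batch;
    # anything unparseable becomes NaT instead of crashing the pipeline.
    df["timestamp"] = df["timestamp"].apply(
        lambda s: pd.to_datetime(s, errors="coerce")
    )
    # Drop duplicate records of the same sale (payment initiated vs. completed).
    df = df.drop_duplicates(subset=["location", "timestamp", "amount"])
    # Coerce amounts stored as text; entries like "four fifty" become NaN
    # and have to be handled separately rather than silently miscounted.
    df["amount"] = pd.to_numeric(df["amount"], errors="coerce")
    return df
```

Every `errors="coerce"` here is a decision, not a default: it trades silent crashes for explicit gaps that someone then has to review, which is exactly the kind of judgment the two weeks of cleaning consisted of.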
📊 Real-World Application: If you've ever worked with a spreadsheet where someone entered "N/A" in some cells, left others blank, typed "none" in others, and put a dash in the rest — all meaning the same thing — you've experienced the data-cleaning problem. Now imagine that spreadsheet has 847,000 rows, and the decisions you make about how to handle those inconsistencies will determine whether people keep their jobs. That's data cleaning in the real world.
Stage 4: Explore and Analyze
With clean data in hand, Dara started looking for patterns. She didn't use machine learning. She didn't use Python (she didn't know Python). She used spreadsheets, pivot tables, and bar charts. The tools don't matter. The thinking does.
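For readers curious what those pivot tables look like in code, here is a minimal pandas sketch of a location-by-week revenue summary. The column names are invented, since Dara's actual work lived in a spreadsheet:

```python
import pandas as pd

def weekly_revenue_by_location(sales: pd.DataFrame) -> pd.DataFrame:
    """Pivot cleaned POS transactions into a location-by-week revenue table."""
    sales = sales.copy()
    # Bucket each transaction into its calendar week.
    sales["week"] = pd.to_datetime(sales["timestamp"]).dt.to_period("W")
    # One row per location, one column per week, summed revenue in each cell;
    # weeks with no sales show 0 rather than a missing value.
    return sales.pivot_table(
        index="location", columns="week", values="amount",
        aggfunc="sum", fill_value=0,
    )
```

Putting pre-shutdown and post-shutdown weeks side by side in a table like this is what makes a finding such as "downtown dropped 85% while two suburban stores barely moved" jump out.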
Here's what the data revealed:
Finding 1: The downtown locations were subsidizing everything else. Before the pandemic, the six east-side locations (especially the three downtown ones) generated 72% of total revenue. The west-side suburban locations were profitable but modest. The two outlying locations had never been particularly strong — they existed because Miriam wanted a presence in those communities.
Finding 2: Two suburban locations had barely been affected. While downtown sales dropped 85%, two west-side locations — one near a large residential neighborhood and one next to a grocery store — had dropped only 15-20%. People working from home still wanted coffee. They were just buying it closer to home.
Finding 3: Drive-through changed everything. The two locations with drive-through windows were the top performers under pandemic conditions. Their sales had actually increased slightly, because they were capturing customers who used to go to locations that were now closed or inconvenient.
Finding 4: Delivery data was promising but thin. Only two locations had offered delivery (through a third-party app) before the pandemic. Both showed strong delivery numbers, but two data points weren't enough to predict what delivery would look like at the other ten locations.
Dara created a simple scoring system for each location, combining:
- Current revenue as a percentage of pre-pandemic revenue
- Proximity to residential neighborhoods (estimated from census data)
- Whether the location had or could add drive-through capability
- Fixed costs (rent, utilities)
This wasn't a fancy algorithm. It was a weighted spreadsheet. But it organized the information in a way that made the trade-offs visible.
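That weighted spreadsheet can be sketched as a small function. The weights and input scales below are invented for illustration (the case study doesn't report Dara's actual weights); choosing them is a judgment call, which is precisely why writing them down makes the trade-offs visible:

```python
def score_location(
    revenue_retention: float,      # current revenue / pre-pandemic revenue, 0-1
    residential_proximity: float,  # 0-1, estimated from census data
    has_drive_through: bool,
    monthly_fixed_costs: float,    # rent + utilities, in dollars
    max_fixed_costs: float,        # highest fixed cost across all locations
) -> float:
    """Weighted score in [0, 1]; higher means a stronger case to stay open.

    The weights are illustrative assumptions, not the ones Sunrise used.
    """
    # Cheaper locations score higher on the cost component.
    cost_score = 1 - monthly_fixed_costs / max_fixed_costs
    return (
        0.40 * revenue_retention
        + 0.25 * residential_proximity
        + 0.20 * (1.0 if has_drive_through else 0.0)
        + 0.15 * cost_score
    )
```

Ranking all 12 locations by a score like this is what sorted them into the three tiers below, and debating the weights is where the team's business knowledge entered the analysis.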
Stage 5: Build a Model (or, in this case, a Recommendation)
Based on her analysis, Dara recommended a three-tier plan:
Tier 1 — Keep open, add delivery (4 locations): The two drive-through locations, plus the two suburban locations near residential neighborhoods. All four would add third-party delivery.
Tier 2 — Delivery and pickup only (3 locations): Three locations with moderate scores — close enough to residential areas to sustain some business, but not strong enough to justify full dine-in operations.
Tier 3 — Temporarily close (5 locations): The three downtown locations, the campus location, and one outlying location. These would close until conditions changed, with leases maintained where possible.
She also recommended a 90-day review cycle: every three months, the team would re-examine the data and reassign locations between tiers.
Stage 6: Communicate Results
This is the step people forget about, and it's as important as any other.
Dara had to present her recommendation to three different audiences:
1. Miriam (the owner): Miriam needed the big picture. Dara built a single-page dashboard showing each location color-coded by tier, with projected revenue and costs. Miriam could see at a glance: "If we do this, we'll still be losing money, but we'll lose $22,000 per week instead of $65,000." That was the number that convinced her.
2. Location managers: Each location manager needed to understand why their location was in its tier. Dara held individual calls, walking through the data specific to each location. For the managers whose locations were closing, this was a painful conversation. Dara made sure to explain that the decision was based on location characteristics, not on the manager's performance.
3. Employees: Sunrise had 94 employees across all locations. The tier plan meant that 35 would continue working, 28 would shift to reduced hours, and 31 would be furloughed. Dara worked with Miriam to present this transparently, with a timeline for reassessment and a commitment to bring people back as locations reopened.
The communication wasn't just a formality. How the results were shared affected whether people trusted the process and supported the plan.
What They Got Wrong
No analysis is perfect, and Dara's was no exception. Here's where the data led them astray:
They underestimated delivery demand in the outlying towns. The two outlying locations had weak walk-in numbers, so they scored low and were closed. But it turned out that people in those towns had fewer delivery options than people in the city. When a competitor launched delivery in one of those towns, it was immediately successful — demand that Sunrise could have captured if they'd stayed open. The data showed low walk-in traffic, but it couldn't show the unmet demand for delivery in areas with few alternatives.
They assumed downtown would recover faster than it did. The 90-day review plan assumed that downtown offices would reopen in some form by mid-2021. They didn't. Two of the three downtown locations never reopened. The leases became a financial burden that the data analysis hadn't fully accounted for — it modeled revenue scenarios but not lease-termination costs.
They relied on pre-pandemic customer behavior to predict pandemic behavior. The analysis used three years of historical data, but 2020 broke the patterns that made that data useful. Customers who had always bought coffee at 7:30 AM downtown were now buying it at 10:00 AM from a suburban location. The data couldn't show this shift because it hadn't happened yet.
These mistakes aren't failures of data science. They're reminders that data science operates under uncertainty. The data tells you about the past. Your job is to make the best future decisions you can with that information, while staying humble about what you don't know.
The Human Element
Behind every row in Dara's spreadsheet was a person.
The 31 furloughed employees had families, rent, and bills. The barista at the campus location who had worked there for six years lost her primary income. The manager of Downtown #2, who had personally renovated the space, watched it close without knowing if it would ever reopen.
Data science gives you the numbers. It doesn't make the decisions easy. And it shouldn't. When Dara looked at her scoring spreadsheet, she knew that the columns labeled "Revenue" and "Cost" were really columns labeled "People who keep their jobs" and "People who don't."
This is not a caveat or a footnote. It's central to what data science is. Chapter 32 is devoted entirely to ethics in data science — the responsibility that comes with turning human lives into numbers, analyzing those numbers, and making recommendations that affect those lives. For now, it's enough to notice that the question "What does the data say?" is never the only question. "What should we do about it?" is a human question, not a statistical one.
🪞 Learning Check-In: Think about a time when someone made a decision that affected you based on numbers — a grade, a credit score, a performance review. Did the numbers tell the whole story? What was missing? How does that experience connect to what happened at Sunrise Coffee?
How This Connects to the Rest of the Course
If you're reading this in Week 1 of a data science course, you might be wondering: "When do I learn how to actually do this?"
That's what the rest of this book is for. Here's a preview of how the skills you'll build connect to what Dara did:
| What Dara Did | Where You'll Learn It |
|---|---|
| Exported data from a POS system | Chapter 12 (Getting Data from Files) |
| Cleaned messy, inconsistent records | Chapter 8 (Cleaning Messy Data) |
| Created pivot tables and bar charts | Chapters 7 (pandas) and 15 (matplotlib) |
| Scored and ranked locations | Chapter 19 (Descriptive Statistics) |
| Estimated delivery demand | Chapter 22 (Sampling and Estimation) |
| Communicated results to stakeholders | Chapter 31 (Communicating Results) |
| Navigated the human side of data decisions | Chapter 32 (Ethics in Data Science) |
Dara did all of this in spreadsheets. You'll learn to do it in Python — which is faster, more reproducible, and scales to problems far bigger than 12 coffee shops. But the thinking is the same. The question always comes first.
Discussion Questions
1. The question before the data. Dara's team spent an hour refining their questions before looking at any data. Why does this matter? Think of a situation where jumping straight to the data (or straight to a solution) without defining the question first could lead to wasted effort or wrong conclusions.
2. Missing data, missing people. Sunrise Coffee didn't have customer home addresses, competitor data, or real-time mobility data. How might the analysis have been different if they'd had access to these? Are there ethical concerns about a coffee shop knowing where its customers live?
3. The outlying towns. The data showed low walk-in traffic at the outlying locations, so they were closed — but it turned out there was unmet delivery demand. What does this teach us about the difference between "the data shows X" and "X is the whole picture"? Can you think of other situations where data about the past is a poor guide to the future?
4. Communication matters. Dara presented the same analysis three different ways to three different audiences. Why? Think about a time when how information was presented to you mattered as much as the information itself.
5. Data science without code. Dara didn't use Python, machine learning, or any sophisticated algorithms. She used spreadsheets and clear thinking. Does this count as data science? Why or why not? What would have been different if she had used more advanced tools?
Mini-Project: Apply the Lifecycle to a Business You Know
Choose a business you interact with regularly — a grocery store, a gym, a restaurant, a campus bookstore, an online service. It can be a business you work at, shop at, or simply observe.
Now work through the data science lifecycle for a question that business might face:
Step 1: Define the question. What's a specific, answerable question this business could explore with data? (Not "How can we make more money?" but something like "Which products should we stock more of on weekends?" or "Do members who attend group classes retain their memberships longer?")
Step 2: What data would you need? List the data sources that would be relevant. Which does the business probably already have? Which would they need to find externally?
Step 3: How might the data be messy? Think of at least three specific ways the data might be incomplete, inconsistent, or hard to work with.
Step 4: What analysis would you do? You don't need to actually do the analysis — describe, in plain language, what you would look for in the data. What comparisons would you make? What patterns would you hope to find?
Step 5: How would you communicate the results? Who would need to see your findings? What format would be most convincing for that audience?
Write up your answers in a notebook, a document, or even on paper. Keep it to one page. This exercise isn't about getting the "right" answer — it's about practicing the habit of thinking through a problem systematically before reaching for tools.
📝 Note: You'll revisit this mini-project in Chapter 6, when you have enough Python skills to actually start exploring a dataset. Choosing a question now gives you a head start.