Case Study 14-02: Raj's Data Analysis Sprint

Advanced Data Analysis Under Real Time Pressure

Persona: Raj Patel, Senior Data Analyst at a B2B SaaS company. Works in Python and SQL. Primary tools: BigQuery, dbt, Looker. Supports multiple product and business stakeholders. Reports to Head of Analytics.

The Situation: It is 9:47 AM on a Tuesday. The quarterly business review is at 2:00 PM. Raj's director, who is presenting, has just forwarded a request from the CFO: "Can you add a breakdown of revenue by customer segment showing growth versus last quarter, with any interesting patterns you spot? Attach something visual if possible."

This is a routine analysis in concept. In practice, it requires pulling and processing raw transaction data, building segment-level aggregations, calculating quarter-over-quarter comparisons, identifying statistical patterns, generating charts, and producing something presentable — all in a format the director can use in a presentation.

Without Advanced Data Analysis: three hours of SQL queries, Python processing, Excel chart building, and manual slide-making. With a two-hour buffer before the meeting, it would be tight.

With Advanced Data Analysis: Raj's actual path.


The Dataset

The dataset the director attached was a CSV exported from their data warehouse. It contained 50,247 rows of transaction records with these columns:

  • transaction_id (string)
  • transaction_date (date, format: YYYY-MM-DD)
  • customer_id (string)
  • customer_segment (string: Enterprise, Mid-Market, SMB, Unknown)
  • product_line (string: Core Platform, Analytics Add-On, Integrations Pack, Professional Services)
  • transaction_type (string: New Business, Expansion, Renewal, Churn Credit)
  • amount_usd (float)
  • rep_region (string: North America, EMEA, APAC)
  • contract_length_months (integer)

The company operates on a fiscal quarter system with quarters ending on the last calendar day of March, June, September, and December.


The Full Prompt Sequence

Opening Message (9:51 AM)

Raj uploaded the CSV and sent:

I have a quarterly business review in 2 hours. My director needs a breakdown of
revenue by customer segment showing growth versus last quarter, with visual output
and any interesting patterns.

Before we do any analysis:
1. Describe this dataset — row count, columns, data types, any quality issues
2. Tell me: is there anything in the data that I should know before we start
   that might affect the analysis?
3. Confirm: are the dates spanning what appears to be multiple fiscal quarters?
   My fiscal quarters end on the last day of March, June, September, December.

Do not start the analysis yet — just give me the dataset description and flag
anything I need to decide.

Why this approach: Raj does not rush into analysis. He always wants to understand what he is working with first. Data quality issues found mid-analysis are expensive; found at the start, they are cheap.

ChatGPT's Response (9:54 AM)

ChatGPT ran code, returned:

Dataset: 50,247 rows, 9 columns

Data types confirmed:
- transaction_id: object (string)
- transaction_date: date (parsed successfully, range: 2024-01-03 to 2024-12-29)
- customer_id: object (1,847 unique customers)
- customer_segment: object (4 categories)
- product_line: object (4 categories)
- transaction_type: object (4 categories)
- amount_usd: float64 (range: -$47,200 to $284,500 — note negative values from Churn Credits)
- rep_region: object (3 categories)
- contract_length_months: int64 (values: 1, 3, 6, 12, 24)

Quality issues found:
1. customer_segment has 1,634 rows (3.25%) with value "Unknown" — these are real
   transactions but unclassified. Decision needed: exclude or bucket?
2. amount_usd has negative values — these appear to be Churn Credits based on
   transaction_type. Do you want to include these in revenue totals (net revenue)
   or exclude them (gross revenue)?
3. 14 rows have transaction_date values in the first two days of January 2024 that
   may belong to Q4 2023 if your fiscal year is calendar-aligned. Minimal impact
   (0.03% of rows) but flagging for accuracy.

Fiscal quarters confirmed:
The data spans January through December 2024, covering all four fiscal quarters:
- Q1 2024: Jan 1 - Mar 31 (12,614 rows)
- Q2 2024: Apr 1 - Jun 30 (13,108 rows)
- Q3 2024: Jul 1 - Sep 30 (12,847 rows)
- Q4 2024: Oct 1 - Dec 29 (11,678 rows)
Note: Q4 ends Dec 29 in this dataset, suggesting some Dec 30-31 transactions
may be missing. Worth flagging to your data team.

Raj's Decision Message (9:57 AM)

Good catches. Here are my decisions:
1. Include "Unknown" as a fourth segment — bucket them as "Unclassified"
2. Use net revenue (include Churn Credits, they represent real economics)
3. The two January rows — include them in Q1
4. I'll flag the missing Dec 30-31 to the data team separately

Now: primary analysis. Focus on Q3 vs Q4 2024 (our most recent quarter comparison).
Give me:
a) Total net revenue by customer segment, Q3 vs Q4, with quarter-over-quarter % change
b) Product line breakdown within each segment for Q4 only
c) Any pattern that stands out as notable or unusual — something a CFO would ask about

Then generate a bar chart showing Q3 vs Q4 revenue by segment, side-by-side.
Use color for segments. Professional chart formatting.

Why this structure: Breaking the request into labeled parts (a, b, c) dramatically reduces the chance ChatGPT will miss one element. Raj also flagged the audience ("something a CFO would ask about") to calibrate what "notable" means.

ChatGPT's Analysis Response (10:03 AM)

ChatGPT ran the analysis and returned:

Table: Net Revenue by Segment, Q3 vs Q4 2024

Segment         Q3 Revenue    Q4 Revenue    QoQ Change
Enterprise      $4,847,200    $5,231,400    +7.9%
Mid-Market      $2,134,600    $2,089,300    -2.1%
SMB             $891,400      $743,200      -16.6%
Unclassified    $187,300      $142,100      -24.1%
Total           $8,060,500    $8,206,000    +1.8%

Product Line Mix in Q4 by Segment:

Product Line          Enterprise  Mid-Market   SMB      Unclassified
Core Platform         58%         71%          84%      N/A
Analytics Add-On      27%         18%          9%       N/A
Integrations Pack     11%         8%           7%       N/A
Professional Services  4%         3%           0%       N/A

Notable Patterns:

"Three patterns stand out:

  1. SMB deterioration is significant and concentrated: The -16.6% QoQ decline in SMB is large. Looking at transaction types within SMB Q4: Churn Credits represent 18.3% of SMB gross revenue (vs 11.2% in Q3). This is a churn acceleration, not a new business slowdown. This is the pattern most likely to draw CFO attention.

  2. Enterprise growth is driven by expansion, not new business: Of the $384K Enterprise revenue increase Q3 to Q4, $298K came from Expansion transactions and $86K from New Business. This is a healthy signal (existing customers growing) but may indicate the new business pipeline needs attention.

  3. The Analytics Add-On shows interesting segment correlation: Adoption drops sharply as you move from Enterprise (27%) to SMB (9%). This may be an opportunity in Mid-Market (currently 18%) or a signal that SMB customers find it harder to deploy. Worth a follow-up question."

The chart was generated as a professional side-by-side bar chart with the four segments color-coded and clear Q3/Q4 labeling.

Raj's Follow-Up (10:09 AM)

Good. The SMB churn pattern is important. Two follow-ups:
1. Can you show me the SMB churn by month within Q4 — was it early, late,
   or distributed?
2. Can you show me average deal size for SMB new business, Q3 vs Q4?
   I want to know if we're losing customers or losing big customers.

ChatGPT's Response (10:13 AM)

SMB Churn Credits by Month in Q4:
- October: $38,400 (27% of Q4 churn total)
- November: $41,200 (29% of Q4 churn total)
- December: $62,900 (44% of Q4 churn total)

Pattern: Churn accelerated through the quarter, with December alone representing
nearly half of Q4 SMB churn. This is not a slow bleed — it concentrated at
quarter-end.

SMB New Business Deal Size:
- Q3 average: $3,847 (n=47 deals)
- Q4 average: $2,914 (n=41 deals)
- Change: -24.2% smaller average deal, 13% fewer deals

Interpretation: Both dimensions are moving in the wrong direction. SMB is losing
both volume and deal quality in new business while churn is accelerating. This
combination suggests a systemic issue (pricing, product fit, competitive pressure,
or go-to-market) rather than a one-quarter anomaly.

Raj's Final Request (10:17 AM)

Perfect. Now write me a three-paragraph executive summary of the Q3 vs Q4
analysis that my director can use in the business review. Audience: CFO and
executive team. Tone: factual, direct, no fluff. Flag the SMB situation clearly.
Do not hedge or soften — if the data shows a problem, say it's a problem.
Plain text, no markdown formatting.

ChatGPT's Summary (10:19 AM)

Net revenue for Q4 2024 reached $8.2M, a 1.8% increase over Q3's $8.1M, driven
entirely by Enterprise growth (+7.9% QoQ). Enterprise revenue improvement came
primarily from expansion transactions, reflecting healthy existing customer
relationships. Mid-Market was essentially flat at -2.1%.

The SMB segment requires immediate attention. SMB net revenue declined 16.6% in Q4,
driven by a churn acceleration — December alone accounted for 44% of the quarter's
total SMB churn credits. Simultaneously, SMB new business declined in both volume
(41 deals in Q4 versus 47 in Q3) and average deal size ($2,914 versus $3,847). These
two trends together — rising churn and shrinking new business quality — indicate a
systemic problem in the SMB segment, not a one-quarter fluctuation.

The Analytics Add-On attachment rate also reveals a potential opportunity or risk.
Enterprise customers adopt at 27% versus 9% for SMB, with Mid-Market at 18%. If
SMB customers cannot successfully implement the Analytics Add-On, this may be
contributing to churn. Alternatively, there may be an expansion opportunity in
Mid-Market that is not being fully pursued. Both questions are worth bringing to
the sales and customer success teams for context before drawing conclusions.

What Happened in the Meeting

Raj's director sent the summary and the chart to the CFO at 10:45 AM. The CFO's response: "This is exactly what I needed. The SMB story needs to be on the agenda. Can someone set up a deeper dive for next week?"

The QBR was the starting point for a longer investigation into SMB churn that ultimately identified a pricing change implemented in September as the primary driver — a finding that took a week of additional analysis to establish, but one that began with the 90 minutes Raj spent with Advanced Data Analysis that Tuesday morning.


The Analysis Debrief

What Advanced Data Analysis Handled Well

Data profiling and quality flagging: ChatGPT found the quality issues (Unknown segment, negative values, missing December rows) that Raj would have found anyway — but it found them in minutes rather than hours.

Accurate aggregation: The QoQ calculations and product mix percentages were accurate. Raj verified three data points manually against raw SQL queries. All matched.

Pattern identification: The SMB churn concentration in December and the deal-size decline were patterns Raj might have found, but they required additional queries. ChatGPT surfaced them as part of the "notable patterns" request.

Executive communication: The three-paragraph summary required minimal editing before use.

What Required Human Judgment

Interpretation and strategic context: "Both questions are worth bringing to the sales and customer success teams" — ChatGPT correctly identified what it did not know (why the pattern was happening). It did not invent a cause. Raj added two sentences of organizational context to the summary before his director sent it.

The right questions: The follow-up questions about SMB churn timing and deal size came from Raj's domain knowledge. ChatGPT answered the questions; it did not know to ask them.

Verification: Raj checked three calculations manually. In this case, all were correct. The habit of verification is what makes the trust appropriate — not blind confidence in the outputs.

The "so what" for the organization: The connection between the Analytics Add-On adoption gap and potential SMB churn was flagged by ChatGPT as a hypothesis worth investigating. Raj knew from organizational context that there was a known implementation challenge for SMB customers that made this hypothesis plausible rather than speculative.


Transferable Principles

Front-load the data description step. The opening message — "describe the data, find quality issues, do not start analysis yet" — is a discipline that pays consistent dividends. Issues found before analysis cost almost nothing. Issues found after analysis cost everything.

Make decisions explicitly. Raj's decision message documented each quality issue and stated his choice. This creates a record of the analytical decisions and prevents ChatGPT from making assumptions.

Name your audience. "Something a CFO would ask about" is more specific than "interesting patterns" and produced more targeted output.

Use labeled sub-requests for complex asks. Breaking the analysis into labeled items (a, b, c) prevents items from being skipped or collapsed together.

Verify before presenting. Raj verified three numbers manually. Two minutes of verification protects two hours of work.

Plain-text output for executive communication. The summary prompt explicitly requested plain text. The result required no reformatting before use in an email and presentation.