Capstone Project 2: Business Analytics Report
Project Overview
You are a data analyst at a mid-size company. Your VP of Operations has asked you to analyze a dataset related to business performance and produce a report with actionable recommendations. The audience is senior leadership — people who care about results, not formulas. Your job is to translate data into decisions.
This project requires the full range of statistical skills from the course, but with a business lens: the emphasis is on practical significance, clear communication, and recommendations that a non-technical executive could act on.
Estimated time: 15-25 hours over 2-3 weeks
Deliverables:
1. A Jupyter notebook containing all code and technical analysis
2. A 3-page executive report written for senior leadership
3. A slide deck (5-7 slides) presenting key findings
Step 1: Choose Your Dataset and Business Question
Select a publicly available business dataset from one of the following sources (or propose an alternative with your instructor's approval):
Recommended Data Sources:
- Kaggle Business Datasets: Retail sales data, customer churn datasets, marketing campaign data, e-commerce transaction logs, employee attrition datasets. Search for datasets with at least 1,000 rows and a mix of numerical and categorical variables.
- UCI Machine Learning Repository: Several classic business datasets, including the "Online Retail" dataset, "Bank Marketing" dataset, and "Adult Income" dataset.
- Google Dataset Search (datasetsearch.research.google.com): Search for retail, marketing, HR, or operations datasets.
- IBM HR Analytics Dataset: Employee attrition and performance data (available on Kaggle).
- Superstore Sales Dataset: A widely used retail dataset with sales, profit, shipping, and customer segment data.
- U.S. Census Bureau / Bureau of Labor Statistics: Economic and labor market data.
Your business question must:
- Be specific enough to guide analysis but broad enough to require multiple techniques
- Have clear implications for a business decision
- Involve at least one comparison between groups or one predictive relationship
- Be answerable with the data available (don't promise what the data can't deliver)
Example business questions (choose your own):
- Which customer segments are most profitable, and what distinguishes high-value customers from low-value ones?
- Does the new marketing campaign lead to significantly higher conversion rates compared to the control group? Is the difference large enough to justify the cost?
- What factors best predict employee attrition, and which departments are most at risk?
- Is there a significant relationship between discount depth and profit margin? At what point do discounts hurt more than they help?
- Do shipping method and product category interact to affect return rates?
Deliverable for this step: A 200-300 word brief including your business question, the dataset, why this question matters to the business, and your initial hypothesis.
Step 2: Data Acquisition and Preparation
Download your dataset and prepare it for analysis. Document every step in your Jupyter notebook.
Required tasks:
- Load and profile the data. Report dimensions, variable types, and basic summary statistics. Identify the "shape" of the dataset: how many observations, how many features, and what time period does it cover?
- Create a data dictionary. For each variable you plan to use:
  - Variable name and business meaning
  - Type (categorical or numerical, and subtype)
  - Units and scale
  - Any known data quality issues
- Assess data quality. Report:
  - Missing values by variable (count and percentage)
  - Outliers or implausible values (e.g., negative quantities, prices of $0, dates in the future)
  - Duplicate transactions or records
  - Inconsistencies (e.g., a customer listed in two different segments)
- Clean and transform the data. For each cleaning decision, document your reasoning:
  - How did you handle missing values? Why?
  - Did you remove outliers? What threshold did you use and why?
  - Did you create new variables? (e.g., profit margin from revenue and cost, customer tenure from signup date)
  - Did you aggregate data? (e.g., monthly totals, per-customer summaries)
- Create analysis-ready subsets. If your analysis requires comparing groups (e.g., campaign A vs. B, churned vs. retained), create clearly labeled subsets and verify they're balanced enough for comparison.
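The profiling and quality checks above might start from something like the following sketch. The `orders` table, its column names, and the `profile_quality` helper are all hypothetical stand-ins for illustration; substitute your own dataset and the checks that make sense for it.

```python
import numpy as np
import pandas as pd

# Hypothetical toy transactions (an assumption for illustration only).
orders = pd.DataFrame({
    "order_id": [1, 2, 2, 3, 4, 5],
    "quantity": [3, 1, 1, -2, 5, 2],
    "unit_price": [9.99, 0.0, 0.0, 24.50, np.nan, 14.00],
    "segment": ["Consumer", "Corporate", "Corporate", None, "Consumer", "Home Office"],
})

def profile_quality(df: pd.DataFrame) -> pd.DataFrame:
    """Return dtype, missing count, and missing percentage for each column."""
    missing = df.isna().sum()
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_missing": missing,
        "pct_missing": (100 * missing / len(df)).round(1),
    })

quality = profile_quality(orders)

# Exact duplicate rows and implausible values.
n_duplicates = int(orders.duplicated().sum())
n_negative_qty = int((orders["quantity"] < 0).sum())
n_zero_price = int((orders["unit_price"] == 0).sum())

print(orders.shape)
print(quality)
print(f"duplicates={n_duplicates}, negative quantities={n_negative_qty}, "
      f"zero prices={n_zero_price}")
```

Whatever the checks turn up, record the decision you made (drop, impute, keep and flag) next to the code that implements it, so the reasoning travels with the notebook.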
Rubric focus for this step: Data handling, reproducibility.
Step 3: Exploratory Data Analysis
Explore the data before running formal analyses. Your EDA should tell a story about the business landscape.
Required elements:
- Key performance indicators (KPIs). Calculate and present the business metrics that matter most for your question:
  - Revenue, profit, margins, conversion rates, retention rates, or other relevant KPIs
  - Present these with appropriate measures of center and spread
  - Show how KPIs vary across segments, time periods, or categories
- Distribution analysis. For each key numerical variable:
  - Histogram or density plot with commentary on shape
  - Box plot comparing distributions across groups
  - Five-number summary and standard deviation
  - Flag any variables that are heavily skewed (this matters for your later analysis choices)
- Relationship exploration. For your key relationships:
  - Scatterplots for numerical-numerical relationships
  - Side-by-side box plots or violin plots for numerical-categorical relationships
  - Contingency tables or heatmaps for categorical-categorical relationships
  - Correlation matrix for multiple numerical variables
- Time-based patterns (if your data includes dates):
  - Line charts showing trends over time
  - Seasonal patterns or day-of-week effects
  - Before/after comparisons if a business change occurred during the data period
- EDA narrative. Write 2-3 paragraphs summarizing the business landscape your data reveals. What are the key patterns? What's surprising? What early hypotheses does the EDA suggest?
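For the distribution analysis, a per-group five-number summary with a skewness flag could be sketched as below. The data are simulated, the column names are hypothetical, and the cutoff of |skew| > 1 is only a common rule of thumb, not a course requirement.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)  # fixed seed so the simulated data are reproducible

# Simulated, right-skewed order values for two hypothetical regions.
sales = pd.DataFrame({
    "region": np.repeat(["East", "West"], 200),
    "order_value": np.concatenate([
        rng.lognormal(mean=3.5, sigma=0.8, size=200),
        rng.lognormal(mean=3.8, sigma=0.8, size=200),
    ]),
})

# describe() gives count, mean, std, min, quartiles, and max per group.
summary = sales.groupby("region")["order_value"].describe()

# Sample skewness per group; the threshold of 1 is an assumed rule of thumb.
summary["skew"] = sales.groupby("region")["order_value"].skew()
summary["heavily_skewed"] = summary["skew"].abs() > 1

print(summary.round(2))
```

A flagged variable is a prompt to consider a log transform, a median-based summary, or a method that does not rely on normality, and to say which you chose and why.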
Rubric focus for this step: Visualization, interpretation.
Step 4: Statistical Analysis
Conduct rigorous statistical analysis to answer your business question. You must include at least three of the following four analysis types:
4A: A/B Test or Group Comparison
Compare two or more groups to determine whether differences are statistically and practically significant.
Required elements:
- Clearly define the groups being compared and the metric of interest
- State null and alternative hypotheses
- Choose the appropriate test:
  - Two-sample t-test (for comparing means of two independent groups)
  - Paired t-test (for before/after or matched comparisons)
  - Two-proportion z-test (for comparing rates or percentages)
  - ANOVA (for comparing means across three or more groups)
  - Chi-square test of independence (for categorical outcomes)
- Verify conditions and assumptions
- Report the test statistic, p-value, and confidence interval for the difference
- Calculate and interpret the effect size
- Business interpretation: Is the difference large enough to matter for the business? Quantify the impact in business terms (e.g., "the campaign group spent an average of $12.40 more per transaction, 95% CI [$8.20, $16.60], which would translate to approximately $186,000 in additional annual revenue based on current customer volume")
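As one possible shape for a two-group comparison, the sketch below runs a Welch two-sample t-test (which does not assume equal variances), builds a 95% confidence interval for the difference in means, and computes Cohen's d. The spend data are simulated and the group names are hypothetical; a different test from the list may be the right one for your data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)  # fixed seed for reproducibility

# Simulated spend per transaction for two hypothetical groups.
control = rng.normal(loc=50.0, scale=20.0, size=400)
campaign = rng.normal(loc=56.0, scale=20.0, size=400)

# Welch's t-test: equal_var=False drops the equal-variance assumption.
t_stat, p_value = stats.ttest_ind(campaign, control, equal_var=False)

# 95% CI for the difference in means, using Welch-Satterthwaite degrees of freedom.
diff = campaign.mean() - control.mean()
var_t, var_c = campaign.var(ddof=1), control.var(ddof=1)
n_t, n_c = len(campaign), len(control)
se = np.sqrt(var_t / n_t + var_c / n_c)
dof = se**4 / ((var_t / n_t) ** 2 / (n_t - 1) + (var_c / n_c) ** 2 / (n_c - 1))
t_crit = stats.t.ppf(0.975, dof)
ci_low, ci_high = diff - t_crit * se, diff + t_crit * se

# Cohen's d with a pooled standard deviation as the effect size.
pooled_sd = np.sqrt(((n_t - 1) * var_t + (n_c - 1) * var_c) / (n_t + n_c - 2))
cohens_d = diff / pooled_sd

print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
print(f"difference = {diff:.2f}, 95% CI [{ci_low:.2f}, {ci_high:.2f}]")
print(f"Cohen's d = {cohens_d:.2f}")
```

The confidence interval and effect size are the parts that carry over into the business interpretation; the p-value alone says nothing about whether the difference is worth acting on.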
4B: Regression Analysis
Build a predictive model to identify drivers of a key business outcome.
Required elements:
- Start with simple linear regression if appropriate, then extend to multiple regression
- Interpret coefficients in business terms ("each additional day of shipping delay is associated with a 2.3 percentage point increase in return probability, holding product category and order value constant")
- Report R-squared (or adjusted R-squared) and discuss model fit
- Check assumptions using residual plots
- If multicollinearity is a concern, report VIF values
- If your outcome is binary (churn/no churn, buy/no buy), use logistic regression and report odds ratios
- Business interpretation: Which factors matter most? What levers can the business actually pull?
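To make the fit statistics concrete, here is a NumPy-only sketch of a multiple regression that reports coefficients, R-squared, adjusted R-squared, and VIF values computed from their definitions. The data are simulated, all variable names are hypothetical, and `fit_ols` and `vif` are illustrative helpers. In a real notebook a statistics library such as statsmodels would normally do this work and also give standard errors.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed for reproducibility
n = 500

# Simulated predictors; discount_pct is deliberately correlated with order_value.
shipping_days = rng.integers(1, 10, size=n).astype(float)
order_value = rng.normal(80.0, 25.0, size=n)
discount_pct = 0.1 * order_value + rng.normal(0.0, 3.0, size=n)

# Simulated outcome: return rate in percentage points.
return_rate = 5.0 + 2.3 * shipping_days - 0.02 * order_value + rng.normal(0.0, 4.0, size=n)

names = ["shipping_days", "order_value", "discount_pct"]
X = np.column_stack([shipping_days, order_value, discount_pct])

def fit_ols(X, y):
    """Least-squares fit with an intercept; returns (coefficients, R-squared)."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return beta, r2

beta, r2 = fit_ols(X, return_rate)
k = X.shape[1]
adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

def vif(X, j):
    """VIF for column j: 1 / (1 - R^2) from regressing it on the other predictors."""
    others = np.delete(X, j, axis=1)
    _, r2_j = fit_ols(others, X[:, j])
    return 1.0 / (1.0 - r2_j)

vifs = {name: vif(X, j) for j, name in enumerate(names)}

print(dict(zip(["intercept"] + names, beta.round(3))))
print(f"R-squared = {r2:.3f}, adjusted R-squared = {adj_r2:.3f}")
print({name: round(value, 2) for name, value in vifs.items()})
```

The coefficient on `shipping_days` is the kind of number to restate in business terms: the estimated change in the outcome per additional day, holding the other predictors constant.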
4C: Segmentation Analysis
Identify meaningful groups within the data and compare their characteristics.
Required elements:
- Define segments based on business logic or data-driven criteria (e.g., customer value tiers, product categories, geographic regions)
- Compare segments on key metrics using appropriate tests (t-tests, ANOVA, chi-square)
- Report descriptive statistics by segment
- Create visualizations that clearly show segment differences
- Calculate effect sizes for key comparisons
- Business interpretation: Which segments deserve the most attention? Where is the biggest opportunity or risk?
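A segment comparison on one metric might look like the sketch below: descriptive statistics by segment, a one-way ANOVA, and eta-squared as the effect size. The customer tiers and spend figures are simulated assumptions, not data from any recommended source.

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(11)  # fixed seed for reproducibility

# Simulated annual spend for three hypothetical customer tiers.
customers = pd.DataFrame({
    "tier": np.repeat(["Bronze", "Silver", "Gold"], 150),
    "annual_spend": np.concatenate([
        rng.normal(400, 120, 150),
        rng.normal(450, 120, 150),
        rng.normal(520, 120, 150),
    ]),
})

# Descriptive statistics by segment.
by_tier = customers.groupby("tier")["annual_spend"].agg(["count", "mean", "std"])

# One-way ANOVA across the tiers.
groups = [g["annual_spend"].to_numpy() for _, g in customers.groupby("tier")]
f_stat, p_value = stats.f_oneway(*groups)

# Eta-squared: share of total variation explained by tier membership.
grand_mean = customers["annual_spend"].mean()
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_total = ((customers["annual_spend"] - grand_mean) ** 2).sum()
eta_squared = ss_between / ss_total

print(by_tier.round(1))
print(f"F = {f_stat:.2f}, p = {p_value:.4g}, eta-squared = {eta_squared:.3f}")
```

A significant ANOVA only says that at least one segment differs; follow-up pairwise comparisons (with an adjustment for multiple testing) are needed to say which ones.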
4D: Trend or Forecast Analysis
Analyze patterns over time to inform future planning.
Required elements:
- Visualize the trend with appropriate time-series plots
- Test for significant changes (before/after a policy change, year-over-year comparison)
- If appropriate, fit a regression model with time as a predictor
- Use bootstrap methods to quantify uncertainty in your trend estimates
- Business interpretation: Where is the business headed? What actions could change the trajectory?
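One way to attach uncertainty to a trend estimate is a percentile bootstrap of the fitted slope, sketched below on simulated monthly revenue. The series, its units, and the number of resamples are assumptions. Note that resampling (month, revenue) pairs treats observations as independent, so it can understate uncertainty when the series is strongly autocorrelated.

```python
import numpy as np

rng = np.random.default_rng(2024)  # fixed seed so the bootstrap is reproducible

# Simulated monthly revenue (in $ thousands) over three years.
months = np.arange(36, dtype=float)
revenue = 100.0 + 1.5 * months + rng.normal(0.0, 8.0, size=months.size)

def slope(x, y):
    """Slope of the least-squares line of y on x."""
    return np.polyfit(x, y, deg=1)[0]

observed_slope = slope(months, revenue)

# Pairs bootstrap: resample (month, revenue) rows with replacement and refit.
n_boot = 2000
boot_slopes = np.empty(n_boot)
for b in range(n_boot):
    idx = rng.integers(0, months.size, size=months.size)
    boot_slopes[b] = slope(months[idx], revenue[idx])

# Percentile 95% interval for the monthly trend.
ci_low, ci_high = np.percentile(boot_slopes, [2.5, 97.5])

print(f"Estimated trend: {observed_slope:.2f} per month "
      f"(95% bootstrap CI {ci_low:.2f} to {ci_high:.2f})")
```

An interval that excludes zero supports the claim that revenue is trending in one direction; the width of the interval tells leadership how much to trust any projection built on it.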
Rubric focus for this step: Statistical analysis, question formulation.
Step 5: Business Recommendations
Translate your statistical findings into actionable business recommendations. This is where your analysis becomes useful.
Required elements (1-2 pages in notebook):
- Key findings summary. List your 3-5 most important findings, each stated in one sentence of plain language. Lead with the finding that has the biggest business impact.
- Recommended actions. For each key finding, state:
  - What should the business do differently?
  - What is the expected impact (quantified where possible)?
  - What is the confidence level in this recommendation?
  - What are the risks of acting, and of not acting?
- Limitations and caveats. Be transparent about:
  - What the data can and cannot support
  - Confounding variables that might explain your results
  - Whether your findings support causal claims or only correlations
  - Sample size limitations or data quality concerns
  - External factors not captured in the data
- Suggested next steps. What additional data or analysis would strengthen these recommendations? What experiments could the business run to test your hypotheses?
Rubric focus for this step: Interpretation, communication.
Step 6: Executive Report
Write a 3-page (approximately 1,200-1,500 word) executive report presenting your analysis and recommendations.
Format and style requirements:
- Use a professional business report format with clear section headers
- Open with a one-paragraph executive summary stating the question, key finding, and top recommendation
- Use bullet points and numbered lists for readability
- Include 3-4 well-designed visualizations embedded in the report (not screenshots from your notebook; they should be polished, labeled, and publication-ready)
- Avoid statistical jargon. Instead of "the two-sample t-test yielded p = 0.008," write "customers in the new campaign group spent significantly more per transaction, and the difference was large enough that random chance is unlikely to explain it."
- End with a clear "Recommended Actions" section
- Include a brief "Methodology" note (2-3 sentences) at the end for anyone who wants to understand how the analysis was done
This is the document leadership would actually read. Make it count.
Rubric focus for this step: Communication.
Step 7: Slide Deck
Create a 5-7 slide presentation of your findings, as if you were presenting to the executive team in a 10-minute meeting.
Suggested slide structure:
1. Title slide: Project title, your name, date
2. The question: What business problem are you addressing and why does it matter?
3. Key finding 1: The most important result, with one clear visualization
4. Key finding 2: The second most important result, with one clear visualization
5. Supporting analysis: Any additional results that strengthen your case
6. Recommendations: What should the company do? Expected impact?
7. Next steps: What additional data or experiments do you recommend?
Design principles:
- One main idea per slide
- Minimal text: let the visuals do the work
- Every chart must have a clear takeaway stated in the slide title (e.g., "Customers who receive same-day shipping return 40% fewer items", not "Return rate by shipping method")
Rubric focus for this step: Communication, visualization.
Step 8: Ethical Considerations
Even business data raises ethical questions. Address the following in a dedicated section of your notebook (at least half a page):
- Data privacy. Does your dataset contain personally identifiable information? If using customer data, what privacy protections are in place? Would customers expect their data to be used this way?
- Fairness and bias. Could your recommendations disproportionately affect certain customer segments, employees, or communities? If your analysis involves demographic variables, are your comparisons fair and responsible?
- Responsible use of results. Could your findings be used to justify harmful practices (e.g., discriminatory pricing, unfair employment practices, manipulative marketing)? What guardrails would you recommend?
- Transparency. Are your methods transparent enough that someone could challenge your findings? Have you presented uncertainty honestly, or only highlighted results that tell a convenient story?
Rubric focus for this step: Ethics.
Step 9: Reproducibility Check
Before submitting, verify that your work is fully reproducible.
Checklist:
- [ ] The Jupyter notebook runs from top to bottom without errors ("Restart and Run All")
- [ ] All data files are included or download instructions are provided
- [ ] All library imports are at the top of the notebook
- [ ] Data cleaning steps are documented and justified
- [ ] Random seeds are set for any bootstrap or simulation
- [ ] All figures have titles, axis labels, legends, and source annotations
- [ ] Code cells include comments
- [ ] Variable names are descriptive (not x1, temp2, df_final_v3)
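For the random-seed item in the checklist, the sketch below shows one way to make a bootstrap repeatable: create the random generator from a fixed seed inside the function that uses it. The seed value, sample values, and the `bootstrap_mean_ci` helper are arbitrary illustrations.

```python
import numpy as np

SEED = 12345  # arbitrary; any fixed integer works, as long as it is recorded

def bootstrap_mean_ci(values, n_boot=1000, seed=SEED):
    """Percentile 95% CI for the mean, reproducible for a given seed."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    means = np.array([
        rng.choice(values, size=values.size, replace=True).mean()
        for _ in range(n_boot)
    ])
    low, high = np.percentile(means, [2.5, 97.5])
    return float(low), float(high)

# Hypothetical sample; running the function twice gives identical intervals.
sample = [12.0, 15.5, 9.8, 22.1, 14.3, 18.7, 11.2, 16.4]
first_run = bootstrap_mean_ci(sample)
second_run = bootstrap_mean_ci(sample)

print(first_run)
print(first_run == second_run)
```

Seeding inside the function means the result does not depend on which other cells ran first, which is what "Restart and Run All" is meant to verify.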
Rubric focus for this step: Reproducibility.
Project Structure
Organize your Jupyter notebook with the following section headers:
1. Business Question and Context
2. Data Description and Dictionary
3. Data Cleaning and Preparation
4. Exploratory Data Analysis
5. Statistical Analysis
5a. Group Comparison / A/B Test
5b. Regression Analysis
5c. Segmentation Analysis
5d. Trend / Forecast Analysis
6. Business Recommendations
7. Limitations and Caveats
8. Ethical Considerations
9. References
Submit the following files:
- capstone_business_analytics.ipynb — your complete Jupyter notebook
- executive_report.pdf — your 3-page executive report
- presentation.pdf (or .pptx) — your 5-7 slide deck
- Any data files needed to run the notebook
Tips for Success
- Think like a business person, not a student. Your audience doesn't care that you ran an ANOVA; they care whether the East region is underperforming and what to do about it. Lead with the business insight, then back it up with the statistics.
- Quantify the impact. "Statistically significant" isn't enough. "Significant, with an estimated annual impact of $340,000" is what gets action. Translate your findings into dollars, customers, time, or whatever metric the business cares about.
- One chart, one message. Every visualization should make exactly one point, and that point should be obvious within 5 seconds. If it takes longer, simplify the chart.
- Acknowledge what you don't know. The strongest business reports are honest about limitations. "This analysis cannot determine whether the new layout caused higher sales because it was not a randomized experiment" is a sign of sophistication, not weakness.
- Use the rubric. It's included in this section. Read it before you start.
- Proofread the executive report. Typos and sloppy formatting undermine credibility. If you're presenting to a VP, present like a professional.
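The "Quantify the impact" tip usually comes down to simple arithmetic: scale the per-unit effect and both ends of its confidence interval by a volume figure. The sketch below reuses the $12.40 example from step 4A. The volume of 15,000 transactions per year is an assumption inferred from that example ($186,000 / $12.40), not a figure the project supplies, and `annual_impact` is an illustrative helper.

```python
# Effect and 95% CI per transaction, taken from the example in step 4A.
lift_per_transaction = 12.40
ci_low, ci_high = 8.20, 16.60

# Assumed annual transaction volume (inferred from the example, not real data).
annual_transactions = 15_000

def annual_impact(per_transaction: float, volume: int) -> int:
    """Scale a per-transaction effect to a yearly figure in whole dollars."""
    return round(per_transaction * volume)

point = annual_impact(lift_per_transaction, annual_transactions)
low = annual_impact(ci_low, annual_transactions)
high = annual_impact(ci_high, annual_transactions)

print(f"Estimated annual impact: ${point:,} (95% CI ${low:,} to ${high:,})")
# prints: Estimated annual impact: $186,000 (95% CI $123,000 to $249,000)
```

Reporting the range alongside the point estimate lets leadership see the plausible downside as well as the headline number.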
Assessment
This project is assessed using the Capstone Rubric provided in this section. The rubric evaluates eight criteria:
| Criterion | Weight |
|---|---|
| Question Formulation | 10% |
| Data Handling | 10% |
| Statistical Analysis | 25% |
| Visualization | 15% |
| Interpretation | 15% |
| Ethics | 5% |
| Communication | 15% |
| Reproducibility | 5% |
Total: 100%
Note: The weighting for this project gives extra weight to Communication and Visualization (compared to the Public Health project) because the business context demands clear, audience-aware presentation.
See the detailed rubric for performance level descriptions (Excellent / Good / Developing / Needs Improvement) for each criterion.