Capstone Project 2: Business Analytics Report

Project Overview

You are a data analyst at a mid-size company. Your VP of Operations has asked you to analyze a dataset related to business performance and produce a report with actionable recommendations. The audience is senior leadership — people who care about results, not formulas. Your job is to translate data into decisions.

This project requires the full range of statistical skills from the course, but with a business lens: the emphasis is on practical significance, clear communication, and recommendations that a non-technical executive could act on.

Estimated time: 15-25 hours over 2-3 weeks

Deliverables:

  1. A Jupyter notebook containing all code and technical analysis
  2. A 3-page executive report written for senior leadership
  3. A slide deck (5-7 slides) presenting key findings


Step 1: Choose Your Dataset and Business Question

Select a publicly available business dataset from one of the following sources (or propose an alternative with your instructor's approval):

Recommended Data Sources:

  - Kaggle Business Datasets: Retail sales data, customer churn datasets, marketing campaign data, e-commerce transaction logs, employee attrition datasets. Search for datasets with at least 1,000 rows and a mix of numerical and categorical variables.
  - UCI Machine Learning Repository: Several classic business datasets, including the "Online Retail" dataset, the "Bank Marketing" dataset, and the "Adult Income" dataset.
  - Google Dataset Search (datasetsearch.research.google.com): Search for retail, marketing, HR, or operations datasets.
  - IBM HR Analytics Dataset: Employee attrition and performance data (available on Kaggle).
  - Superstore Sales Dataset: A widely used retail dataset with sales, profit, shipping, and customer segment data.
  - U.S. Census Bureau / Bureau of Labor Statistics: Economic and labor market data.

Your business question must:

  - Be specific enough to guide analysis but broad enough to require multiple techniques
  - Have clear implications for a business decision
  - Involve at least one comparison between groups or one predictive relationship
  - Be answerable with the data available (don't promise what the data can't deliver)

Example business questions (choose your own):

  - Which customer segments are most profitable, and what distinguishes high-value customers from low-value ones?
  - Does the new marketing campaign lead to significantly higher conversion rates compared to the control group? Is the difference large enough to justify the cost?
  - What factors best predict employee attrition, and which departments are most at risk?
  - Is there a significant relationship between discount depth and profit margin? At what point do discounts hurt more than they help?
  - Do shipping method and product category interact to affect return rates?

Deliverable for this step: A 200-300 word brief including your business question, the dataset, why this question matters to the business, and your initial hypothesis.


Step 2: Data Acquisition and Preparation

Download your dataset and prepare it for analysis. Document every step in your Jupyter notebook.

Required tasks:

  1. Load and profile the data. Report dimensions, variable types, and basic summary statistics. Identify the "shape" of the dataset — how many observations, how many features, what time period does it cover?

  2. Create a data dictionary. For each variable you plan to use:
     - Variable name and business meaning
     - Type (categorical or numerical, and subtype)
     - Units and scale
     - Any known data quality issues

  3. Assess data quality. Report:
     - Missing values by variable (count and percentage)
     - Outliers or implausible values (e.g., negative quantities, prices of $0, dates in the future)
     - Duplicate transactions or records
     - Inconsistencies (e.g., a customer listed in two different segments)

  4. Clean and transform the data. For each cleaning decision, document your reasoning:
     - How did you handle missing values? Why?
     - Did you remove outliers? What threshold did you use and why?
     - Did you create new variables? (e.g., profit margin from revenue and cost, customer tenure from signup date)
     - Did you aggregate data? (e.g., monthly totals, per-customer summaries)

  5. Create analysis-ready subsets. If your analysis requires comparing groups (e.g., campaign A vs. B, churned vs. retained), create clearly labeled subsets and verify they're balanced enough for comparison.
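The profiling and quality checks in tasks 1, 3, and 4 can be sketched in a few lines of pandas. This is a minimal illustration, not a required template: the `orders` table, its column names, and the cleaning rules (drop exact duplicates, drop non-positive quantities and prices) are all hypothetical, and your own dataset will need rules that you justify in the notebook.

```python
import pandas as pd

# Hypothetical toy orders table; every column name and value is illustrative.
orders = pd.DataFrame(
    {
        "order_id": [1001, 1002, 1002, 1003, 1004, 1005],
        "order_date": pd.to_datetime(
            ["2023-01-05", "2023-01-06", "2023-01-06",
             "2023-01-09", "2023-01-11", "2023-01-12"]
        ),
        "segment": ["Consumer", "Corporate", "Corporate", None, "Consumer", "Home Office"],
        "quantity": [2, 1, 1, -3, 5, 4],
        "unit_price": [19.99, 250.00, 250.00, 12.50, 0.00, 8.75],
    }
)


def missing_value_report(df: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values for every column."""
    counts = df.isna().sum()
    return pd.DataFrame(
        {"missing_count": counts, "missing_pct": (counts / len(df) * 100).round(1)}
    )


def quality_flags(df: pd.DataFrame) -> dict:
    """Tally a few common problems (these rules are assumptions for the toy data)."""
    return {
        "duplicate_rows": int(df.duplicated().sum()),
        "negative_quantity": int((df["quantity"] < 0).sum()),
        "zero_price": int((df["unit_price"] == 0).sum()),
    }


# Task 1: dimensions and variable types.
print(orders.shape)
print(orders.dtypes)

# Task 3: missing values and implausible records.
missing = missing_value_report(orders)
flags = quality_flags(orders)
print(missing)
print(flags)

# Task 4: one documented cleaning pass, reporting how many rows it removed.
clean = orders.drop_duplicates()
clean = clean[(clean["quantity"] > 0) & (clean["unit_price"] > 0)]
print(f"Rows before: {len(orders)}, after cleaning: {len(clean)}")
```

Printing the row counts before and after each cleaning step is a cheap way to make the impact of every decision visible to a reader of the notebook.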

Rubric focus for this step: Data handling, reproducibility.


Step 3: Exploratory Data Analysis

Explore the data before running formal analyses. Your EDA should tell a story about the business landscape.

Required elements:

  1. Key performance indicators (KPIs). Calculate and present the business metrics that matter most for your question:
     - Revenue, profit, margins, conversion rates, retention rates, or other relevant KPIs
     - Present these with appropriate measures of center and spread
     - Show how KPIs vary across segments, time periods, or categories

  2. Distribution analysis. For each key numerical variable:
     - Histogram or density plot with commentary on shape
     - Box plot comparing distributions across groups
     - Five-number summary and standard deviation
     - Flag any variables that are heavily skewed (this matters for your later analysis choices)

  3. Relationship exploration. For your key relationships:
     - Scatterplots for numerical-numerical relationships
     - Side-by-side box plots or violin plots for numerical-categorical relationships
     - Contingency tables or heatmaps for categorical-categorical relationships
     - Correlation matrix for multiple numerical variables

  4. Time-based patterns (if your data includes dates):
     - Line charts showing trends over time
     - Seasonal patterns or day-of-week effects
     - Before/after comparisons if a business change occurred during the data period

  5. EDA narrative. Write 2-3 paragraphs summarizing the business landscape your data reveals. What are the key patterns? What's surprising? What early hypotheses does the EDA suggest?
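For the distribution analysis in element 2, a per-group numeric summary can sit alongside the plots. The sketch below assumes a hypothetical `sales` table with made-up `region` and `order_value` columns filled with simulated data, and it uses an absolute skewness above 1 as the "heavily skewed" flag, which is a common rule of thumb rather than a course requirement.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)  # fixed seed so the numbers are reproducible

# Hypothetical data: right-skewed order values for two made-up regions.
sales = pd.DataFrame(
    {
        "region": np.repeat(["East", "West"], 500),
        "order_value": np.concatenate(
            [
                rng.lognormal(mean=4.0, sigma=0.6, size=500),
                rng.lognormal(mean=4.2, sigma=0.6, size=500),
            ]
        ),
    }
)


def distribution_summary(df, value_col, group_col):
    """Five-number summary, mean, standard deviation, and skewness per group."""
    grouped = df.groupby(group_col)[value_col]
    summary = grouped.describe()[["min", "25%", "50%", "75%", "max", "mean", "std"]]
    summary["skew"] = grouped.skew()
    summary = summary.round(2)
    # Rule of thumb used here (an assumption): |skewness| above 1 counts as heavy.
    summary["heavily_skewed"] = summary["skew"].abs() > 1
    return summary


summary = distribution_summary(sales, "order_value", "region")
print(summary)
```

When the mean sits well above the median, as it does for right-skewed revenue data, the median is usually the more honest "typical value" to report to leadership.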

Rubric focus for this step: Visualization, interpretation.


Step 4: Statistical Analysis

Conduct rigorous statistical analysis to answer your business question. You must include at least three of the following four analysis types:

4A: A/B Test or Group Comparison

Compare two or more groups to determine whether differences are statistically and practically significant.

Required elements:

  - Clearly define the groups being compared and the metric of interest
  - State null and alternative hypotheses
  - Choose the appropriate test:
     - Two-sample t-test (for comparing means of two independent groups)
     - Paired t-test (for before/after or matched comparisons)
     - Two-proportion z-test (for comparing rates or percentages)
     - ANOVA (for comparing means across three or more groups)
     - Chi-square test of independence (for categorical outcomes)
  - Verify conditions and assumptions
  - Report the test statistic, p-value, and confidence interval for the difference
  - Calculate and interpret the effect size
  - Business interpretation: Is the difference large enough to matter for the business? Quantify the impact in business terms (e.g., "the campaign group spent an average of $12.40 more per transaction, 95% CI [$8.20, $16.60], which would translate to approximately $186,000 in additional annual revenue based on current customer volume")
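As one possible shape for the two-sample case, the sketch below reports the test statistic, p-value, confidence interval, and effect size together. It uses Welch's version of the t-test (which does not assume equal variances) and Cohen's d with a pooled standard deviation; both are choices made for this illustration, and the `campaign` and `control` arrays are simulated, hypothetical data.

```python
import numpy as np
from scipy import stats


def compare_two_groups(treatment, control, confidence=0.95):
    """Welch two-sample t-test with a CI for the mean difference and Cohen's d."""
    treatment = np.asarray(treatment, dtype=float)
    control = np.asarray(control, dtype=float)
    n1, n2 = len(treatment), len(control)
    var1, var2 = treatment.var(ddof=1), control.var(ddof=1)
    diff = treatment.mean() - control.mean()

    # Welch's test does not assume equal variances in the two groups.
    t_stat, p_value = stats.ttest_ind(treatment, control, equal_var=False)

    # Welch-Satterthwaite degrees of freedom for the confidence interval.
    se = np.sqrt(var1 / n1 + var2 / n2)
    dof = se**4 / ((var1 / n1) ** 2 / (n1 - 1) + (var2 / n2) ** 2 / (n2 - 1))
    margin = stats.t.ppf(0.5 + confidence / 2, dof) * se

    # Cohen's d, scaled by the pooled standard deviation.
    pooled_sd = np.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    return {
        "mean_difference": diff,
        "t_statistic": t_stat,
        "p_value": p_value,
        "ci": (diff - margin, diff + margin),
        "cohens_d": diff / pooled_sd,
    }


# Hypothetical spend per transaction (in dollars) for two simulated groups.
rng = np.random.default_rng(7)
campaign = rng.normal(loc=62.0, scale=20.0, size=400)
control = rng.normal(loc=50.0, scale=20.0, size=400)

result = compare_two_groups(campaign, control)
low, high = result["ci"]
print(f"Difference: ${result['mean_difference']:.2f} (95% CI ${low:.2f} to ${high:.2f})")
print(f"t = {result['t_statistic']:.2f}, p = {result['p_value']:.4g}, "
      f"d = {result['cohens_d']:.2f}")
```

The confidence interval, not the p-value, is the piece that carries over most directly into a business statement, because it is already expressed in dollars per transaction.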

4B: Regression Analysis

Build a predictive model to identify drivers of a key business outcome.

Required elements:

  - Start with simple linear regression if appropriate, then extend to multiple regression
  - Interpret coefficients in business terms ("each additional day of shipping delay is associated with a 2.3 percentage point increase in return probability, holding product category and order value constant")
  - Report R-squared (or adjusted R-squared) and discuss model fit
  - Check assumptions using residual plots
  - If multicollinearity is a concern, report VIF values
  - If your outcome is binary (churn/no churn, buy/no buy), use logistic regression and report odds ratios
  - Business interpretation: Which factors matter most? What levers can the business actually pull?
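To make the quantities above concrete, here is a NumPy-only sketch of multiple linear regression that reports R-squared, adjusted R-squared, and VIF values. It is meant to show what each number is, not to prescribe a tool; a statistics library would typically be used in the notebook. The predictors (`order_value`, `shipping_delay`), the outcome (`profit`), and the simulated coefficients are all hypothetical.

```python
import numpy as np


def fit_ols(X, y):
    """Ordinary least squares with an intercept; X must be 2-D (rows x predictors)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coefs
    ss_res = np.sum(residuals**2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    n, k = X.shape
    r2 = 1 - ss_res / ss_tot
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    return {"intercept": coefs[0], "slopes": coefs[1:], "r2": r2,
            "adj_r2": adj_r2, "residuals": residuals}


def vif(X):
    """VIF per predictor: 1 / (1 - R^2) from regressing it on the other predictors."""
    X = np.asarray(X, dtype=float)
    values = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        r2_j = fit_ols(others, X[:, j])["r2"]
        values.append(1.0 / (1.0 - r2_j))
    return np.array(values)


# Hypothetical simulated data: profit per order driven by two predictors.
rng = np.random.default_rng(11)
n = 500
order_value = rng.uniform(20, 200, size=n)
shipping_delay = rng.integers(0, 8, size=n).astype(float)
profit = 5.0 + 0.15 * order_value - 2.3 * shipping_delay + rng.normal(0, 4.0, size=n)

X = np.column_stack([order_value, shipping_delay])
model = fit_ols(X, profit)
vifs = vif(X)

print(f"Slopes (order_value, shipping_delay): {model['slopes'].round(3)}")
print(f"R-squared: {model['r2']:.3f}, adjusted: {model['adj_r2']:.3f}")
print(f"VIF: {vifs.round(2)}")
```

The `residuals` array returned by `fit_ols` is what you would plot against fitted values to check the linearity and constant-variance assumptions.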

4C: Segmentation Analysis

Identify meaningful groups within the data and compare their characteristics.

Required elements:

  - Define segments based on business logic or data-driven criteria (e.g., customer value tiers, product categories, geographic regions)
  - Compare segments on key metrics using appropriate tests (t-tests, ANOVA, chi-square)
  - Report descriptive statistics by segment
  - Create visualizations that clearly show segment differences
  - Calculate effect sizes for key comparisons
  - Business interpretation: Which segments deserve the most attention? Where is the biggest opportunity or risk?
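For a comparison across three or more segments, the sketch below pairs a one-way ANOVA with eta-squared as the effect size (the share of total variance explained by segment membership). The `customers` table, the tier names, and the simulated spend figures are hypothetical, and eta-squared is only one of several reasonable effect-size choices.

```python
import numpy as np
import pandas as pd
from scipy import stats


def anova_with_eta_squared(df, value_col, segment_col):
    """One-way ANOVA across segments plus eta-squared as the effect size."""
    groups = [g[value_col].to_numpy() for _, g in df.groupby(segment_col)]
    f_stat, p_value = stats.f_oneway(*groups)
    grand_mean = df[value_col].mean()
    # Between-group sum of squares over total sum of squares.
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_total = ((df[value_col] - grand_mean) ** 2).sum()
    return {"f_statistic": f_stat, "p_value": p_value,
            "eta_squared": ss_between / ss_total}


# Hypothetical simulated annual spend (in dollars) for three made-up tiers.
rng = np.random.default_rng(3)
customers = pd.DataFrame(
    {
        "tier": np.repeat(["Bronze", "Silver", "Gold"], 200),
        "annual_spend": np.concatenate(
            [
                rng.normal(400, 150, size=200),
                rng.normal(550, 150, size=200),
                rng.normal(900, 150, size=200),
            ]
        ),
    }
)

# Descriptive statistics by segment, then the formal test.
by_tier = customers.groupby("tier")["annual_spend"].agg(["count", "mean", "std"]).round(1)
anova = anova_with_eta_squared(customers, "annual_spend", "tier")

print(by_tier)
print(f"F = {anova['f_statistic']:.1f}, p = {anova['p_value']:.3g}, "
      f"eta-squared = {anova['eta_squared']:.2f}")
```

A significant ANOVA only says that at least one segment differs; follow-up pairwise comparisons are needed to say which ones.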

4D: Trend or Forecast Analysis

Analyze patterns over time to inform future planning.

Required elements:

  - Visualize the trend with appropriate time-series plots
  - Test for significant changes (before/after a policy change, year-over-year comparison)
  - If appropriate, fit a regression model with time as a predictor
  - Use bootstrap methods to quantify uncertainty in your trend estimates
  - Business interpretation: Where is the business headed? What actions could change the trajectory?
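One simple way to attach uncertainty to a trend estimate is a percentile bootstrap of the regression slope, sketched below. It resamples (time, value) pairs with replacement, which treats observations as independent; if your series is strongly autocorrelated, that assumption is questionable and should be listed as a limitation. The monthly revenue series is simulated and hypothetical.

```python
import numpy as np


def bootstrap_slope_ci(t, y, n_boot=2000, confidence=0.95, seed=0):
    """Percentile bootstrap CI for a linear trend slope, resampling (t, y) pairs."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)  # seeded so the interval is reproducible
    slope = np.polyfit(t, y, 1)[0]     # slope fitted to the original data
    n = len(t)
    boot = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # sample row indices with replacement
        boot[b] = np.polyfit(t[idx], y[idx], 1)[0]
    alpha = 1 - confidence
    lower, upper = np.quantile(boot, [alpha / 2, 1 - alpha / 2])
    return slope, (lower, upper)


# Hypothetical simulated series: 36 months of revenue (in $ thousands).
data_rng = np.random.default_rng(5)
months = np.arange(36)
revenue = 100 + 1.5 * months + data_rng.normal(0, 5, size=36)

slope, (low, high) = bootstrap_slope_ci(months, revenue, seed=123)
print(f"Estimated trend: {slope:.2f} per month (95% CI {low:.2f} to {high:.2f})")
```

Passing an explicit `seed` means a "Restart and Run All" reproduces exactly the same interval.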

Rubric focus for this step: Statistical analysis, question formulation.


Step 5: Business Recommendations

Translate your statistical findings into actionable business recommendations. This is where your analysis becomes useful.

Required elements (1-2 pages in notebook):

  1. Key findings summary. List your 3-5 most important findings, each stated in one sentence of plain language. Lead with the finding that has the biggest business impact.

  2. Recommended actions. For each key finding, state:
     - What should the business do differently?
     - What is the expected impact (quantified where possible)?
     - What is the confidence level in this recommendation?
     - What are the risks of acting, and of not acting?

  3. Limitations and caveats. Be transparent about:
     - What the data can and cannot support
     - Confounding variables that might explain your results
     - Whether your findings support causal claims or only correlations
     - Sample size limitations or data quality concerns
     - External factors not captured in the data

  4. Suggested next steps. What additional data or analysis would strengthen these recommendations? What experiments could the business run to test your hypotheses?
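Quantifying expected impact often comes down to scaling a per-unit estimate and its confidence interval by a volume figure. The sketch below reuses the numbers from the Step 4A example ($12.40 more per transaction, 95% CI [$8.20, $16.60]); the annual volume of 15,000 transactions is an assumption backed out from that example's $186,000 figure, and the function name is hypothetical.

```python
def annualize_impact(per_unit_effect, ci, annual_units):
    """Scale a per-transaction effect and its CI bounds to an annual total."""
    low, high = ci
    return {
        "estimate": round(per_unit_effect * annual_units, 2),
        "low": round(low * annual_units, 2),
        "high": round(high * annual_units, 2),
    }


# Assumed volume: 15,000 transactions per year (implied by the 4A example).
impact = annualize_impact(12.40, (8.20, 16.60), annual_units=15_000)
print(f"Expected annual impact: ${impact['estimate']:,.0f} "
      f"(plausible range ${impact['low']:,.0f} to ${impact['high']:,.0f})")
```

Reporting the range alongside the point estimate gives leadership a sense of the downside as well as the headline number. Note that this scaling carries over only the statistical uncertainty in the per-transaction estimate, not uncertainty in the volume assumption itself.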

Rubric focus for this step: Interpretation, communication.


Step 6: Executive Report

Write a 3-page (approximately 1,200-1,500 word) executive report presenting your analysis and recommendations.

Format and style requirements:

  - Use a professional business report format with clear section headers
  - Open with a one-paragraph executive summary stating the question, key finding, and top recommendation
  - Use bullet points and numbered lists for readability
  - Include 3-4 well-designed visualizations embedded in the report (not screenshots from your notebook, but polished, labeled, publication-ready figures)
  - Avoid statistical jargon. Instead of "the two-sample t-test yielded p = 0.008," write "customers in the new campaign group spent significantly more per transaction, and the difference was large enough that random chance is unlikely to explain it."
  - End with a clear "Recommended Actions" section
  - Include a brief "Methodology" note (2-3 sentences) at the end for anyone who wants to understand how the analysis was done

This is the document leadership would actually read. Make it count.

Rubric focus for this step: Communication.


Step 7: Slide Deck

Create a 5-7 slide presentation of your findings, as if you were presenting to the executive team in a 10-minute meeting.

Suggested slide structure:

  1. Title slide: Project title, your name, date
  2. The question: What business problem are you addressing and why does it matter?
  3. Key finding 1: The most important result, with one clear visualization
  4. Key finding 2: The second most important result, with one clear visualization
  5. Supporting analysis: Any additional results that strengthen your case
  6. Recommendations: What should the company do? Expected impact?
  7. Next steps: What additional data or experiments do you recommend?

Design principles:

  - One main idea per slide
  - Minimal text; let the visuals do the work
  - Every chart must have a clear takeaway stated in the slide title (e.g., "Customers who receive same-day shipping return 40% fewer items," not "Return rate by shipping method")

Rubric focus for this step: Communication, visualization.


Step 8: Ethical Considerations

Even business data raises ethical questions. Address the following in a dedicated section of your notebook (at least half a page):

  1. Data privacy. Does your dataset contain personally identifiable information? If using customer data, what privacy protections are in place? Would customers expect their data to be used this way?

  2. Fairness and bias. Could your recommendations disproportionately affect certain customer segments, employees, or communities? If your analysis involves demographic variables, are your comparisons fair and responsible?

  3. Responsible use of results. Could your findings be used to justify harmful practices (e.g., discriminatory pricing, unfair employment practices, manipulative marketing)? What guardrails would you recommend?

  4. Transparency. Are your methods transparent enough that someone could challenge your findings? Have you presented uncertainty honestly, or only highlighted results that tell a convenient story?

Rubric focus for this step: Ethics.


Step 9: Reproducibility Check

Before submitting, verify that your work is fully reproducible.

Checklist:

  - [ ] The Jupyter notebook runs from top to bottom without errors ("Restart and Run All")
  - [ ] All data files are included or download instructions are provided
  - [ ] All library imports are at the top of the notebook
  - [ ] Data cleaning steps are documented and justified
  - [ ] Random seeds are set for any bootstrap or simulation
  - [ ] All figures have titles, axis labels, legends, and source annotations
  - [ ] Code cells include comments
  - [ ] Variable names are descriptive (not x1, temp2, df_final_v3)

Rubric focus for this step: Reproducibility.


Project Structure

Organize your Jupyter notebook with the following section headers:

1. Business Question and Context
2. Data Description and Dictionary
3. Data Cleaning and Preparation
4. Exploratory Data Analysis
5. Statistical Analysis
   5a. Group Comparison / A/B Test
   5b. Regression Analysis
   5c. Segmentation Analysis
   5d. Trend / Forecast Analysis
6. Business Recommendations
7. Limitations and Caveats
8. Ethical Considerations
9. References

Submit the following files:

  - capstone_business_analytics.ipynb: your complete Jupyter notebook
  - executive_report.pdf: your 3-page executive report
  - presentation.pdf (or .pptx): your 5-7 slide deck
  - Any data files needed to run the notebook


Tips for Success

  • Think like a business person, not a student. Your audience doesn't care that you ran an ANOVA — they care whether the East region is underperforming and what to do about it. Lead with the business insight, then back it up with the statistics.

  • Quantify the impact. "Statistically significant" isn't enough. "Significant, with an estimated annual impact of $340,000" is what gets action. Translate your findings into dollars, customers, time, or whatever metric the business cares about.

  • One chart, one message. Every visualization should make exactly one point, and that point should be obvious within 5 seconds. If it takes longer, simplify the chart.

  • Acknowledge what you don't know. The strongest business reports are honest about limitations. "This analysis cannot determine whether the new layout caused higher sales because it was not a randomized experiment" is a sign of sophistication, not weakness.

  • Use the rubric. It's included in this section. Read it before you start.

  • Proofread the executive report. Typos and sloppy formatting undermine credibility. If you're presenting to a VP, present like a professional.


Assessment

This project is assessed using the Capstone Rubric provided in this section. The rubric evaluates eight criteria:

Criterion               Weight
Question Formulation    10%
Data Handling           10%
Statistical Analysis    25%
Visualization           15%
Interpretation          15%
Ethics                  5%
Communication           15%
Reproducibility         5%

Total                   100%

Note: The weighting for this project gives extra weight to Communication and Visualization (compared to the Public Health project) because the business context demands clear, audience-aware presentation.

See the detailed rubric for performance level descriptions (Excellent / Good / Developing / Needs Improvement) for each criterion.