Key Takeaways: Capstone Project

This is your reference card for Chapter 35 — the capstone. Use it as a checklist during your project to make sure you're hitting all the required elements.


Key Concepts

  • The capstone is integration, not new learning. You already have every skill you need. The capstone tests whether you can combine those skills into a coherent investigation — from question to conclusion, with every step documented and justified.

  • A capstone is not a homework assignment. Homework has a predetermined question, provided data, and a known answer. The capstone requires you to make every decision yourself: the question, the data strategy, the analytical approach, the interpretation, and the communication. That's what makes it hard. That's also what makes it valuable.

  • Communication is half the project. A capstone that runs perfect models but has no narrative, no interpretation, and no written conclusions is incomplete. The notebook should read like a story, not a code dump.

  • Limitations and ethics are required, not optional. The capstone rubric allocates 4 of its 24 points, a full dimension, to critical reflection. This reflects a genuine belief: a data scientist who doesn't think about what their analysis can't tell you, or about the human consequences of their work, is not yet a complete data scientist.

  • Done is better than perfect. A completed capstone at 85% quality is infinitely better than an unfinished one at 100%. Set a deadline and meet it.


Three Capstone Options

| Option | Topic | Data | Best For |
|--------|-------|------|----------|
| A | Global vaccination rate disparities | WHO + World Bank (provided through progressive milestones) | Students who completed progressive milestones |
| B | Small bakery business analytics | Simulated POS data + NOAA weather | Students interested in business analytics |
| C | NBA three-point revolution | Basketball Reference team stats | Students interested in sports analytics |
| Custom | Your own topic | Your own sources | Students with a specific domain interest |

Required Deliverables

The Notebook (10 sections)

1. Title and Abstract ............... 200-300 words
2. Introduction and Motivation ...... 500-800 words
3. Data Description ................. 400-600 words + code
4. Data Cleaning .................... 600-1000 words + code
5. Exploratory Analysis ............. 800-1200 words + 4-6 charts
6. Statistical Analysis/Modeling .... 800-1200 words + 2-4 charts
7. Findings and Conclusions ......... 500-800 words
8. Limitations and Future Work ...... 300-500 words
9. Ethical Reflection ............... 300-500 words
10. References ...................... Complete citations
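
If you want a quick way to see whether a section's prose falls inside its target range, the sketch below is one option. The word ranges are copied from the list above; the function names and the simple "split on whitespace" word count are assumptions made for illustration, so treat the result as a rough guide rather than an official measure.

```python
import re

# Word-count targets (min, max) copied from the section list above.
# References has no stated word range, so it is left out.
WORD_TARGETS = {
    "Title and Abstract": (200, 300),
    "Introduction and Motivation": (500, 800),
    "Data Description": (400, 600),
    "Data Cleaning": (600, 1000),
    "Exploratory Analysis": (800, 1200),
    "Statistical Analysis/Modeling": (800, 1200),
    "Findings and Conclusions": (500, 800),
    "Limitations and Future Work": (300, 500),
    "Ethical Reflection": (300, 500),
}


def count_words(text):
    """Count whitespace-separated tokens containing at least one letter, digit, or underscore.

    Pure punctuation tokens such as '##' or '--' are not counted.
    """
    return sum(1 for token in text.split() if re.search(r"\w", token))


def check_section(name, text):
    """Return 'short', 'ok', or 'long' for a section's prose (hypothetical helper)."""
    low, high = WORD_TARGETS[name]
    n = count_words(text)
    if n < low:
        return "short"
    if n > high:
        return "long"
    return "ok"
```

For example, you could paste the text of your abstract into `check_section("Title and Abstract", abstract_text)` and revise if it comes back "short" or "long".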

The Repository

project-name/
    README.md           # Overview, findings, reproduction instructions
    notebooks/          # The capstone notebook
    data/raw/           # Original data (or download instructions)
    data/processed/     # Cleaned data
    figures/            # Saved key visualizations
    requirements.txt    # Dependencies with versions
    .gitignore          # Exclude checkpoints, caches, large files
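
The layout above can be created by hand, but if you prefer to script it, here is a minimal sketch using only the standard library. The `scaffold` function name and the starter `.gitignore` entries are assumptions for illustration; adjust both to suit your project.

```python
from pathlib import Path

# Folders and files taken from the repository layout above.
DIRECTORIES = ["notebooks", "data/raw", "data/processed", "figures"]
EMPTY_FILES = ["README.md", "requirements.txt"]

# Assumed starter entries covering checkpoints and caches.
# Add patterns for your own large files as needed.
GITIGNORE_LINES = [".ipynb_checkpoints/", "__pycache__/"]


def scaffold(root):
    """Create the capstone folder structure under `root` and return its Path.

    Existing folders and files are left untouched.
    """
    base = Path(root)
    for directory in DIRECTORIES:
        (base / directory).mkdir(parents=True, exist_ok=True)
    for filename in EMPTY_FILES:
        (base / filename).touch(exist_ok=True)
    gitignore = base / ".gitignore"
    if not gitignore.exists():
        gitignore.write_text("\n".join(GITIGNORE_LINES) + "\n")
    return base
```

Calling `scaffold("project-name")` from the folder where you keep your projects would produce the empty skeleton, ready for `git init`.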

The Rubric (24 points total)

| Dimension | Points | What "Excellent" (4/4) Looks Like |
|-----------|--------|-----------------------------------|
| Question and Motivation | 4 | Specific, interesting question; compelling motivation; clear scope |
| Data Handling | 4 | Documented sources; 3+ cleaning decisions justified; summary stats |
| Exploration and Visualization | 4 | 4+ polished charts with titles, labels, and written interpretation |
| Statistical Analysis/Modeling | 4 | Appropriate methods; assumptions checked; proper evaluation; honest interpretation |
| Communication and Narrative | 4 | Reads as a story; accessible to non-technical readers; conclusion answers the question |
| Critical Reflection | 4 | Specific limitations; genuine ethical engagement; concrete future work |

| Total Score | Assessment |
|-------------|------------|
| 22-24 | Exceptional: portfolio-ready |
| 18-21 | Strong: solid competence |
| 14-17 | Satisfactory: needs strengthening |
| Below 14 | Developing: significant revision needed |
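
To self-score a draft, you can add up the six dimensions and look up the band. The sketch below does exactly that arithmetic. The dimension names and band cutoffs come from the rubric above; the `assess` function and the assumption that each dimension is scored as a whole number from 0 to 4 are illustrative only.

```python
# The six rubric dimensions, each worth up to 4 points (24 total).
RUBRIC_DIMENSIONS = (
    "Question and Motivation",
    "Data Handling",
    "Exploration and Visualization",
    "Statistical Analysis/Modeling",
    "Communication and Narrative",
    "Critical Reflection",
)


def assess(scores):
    """Return (total, band) for a dict mapping each dimension to its score.

    Assumes each score is between 0 and 4 inclusive (an assumption;
    the rubric only states that 4/4 is "Excellent").
    """
    missing = [d for d in RUBRIC_DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"Missing scores for: {missing}")
    for dimension in RUBRIC_DIMENSIONS:
        if not 0 <= scores[dimension] <= 4:
            raise ValueError(f"{dimension} must be between 0 and 4")

    total = sum(scores[d] for d in RUBRIC_DIMENSIONS)
    if total >= 22:
        band = "Exceptional"
    elif total >= 18:
        band = "Strong"
    elif total >= 14:
        band = "Satisfactory"
    else:
        band = "Developing"
    return total, band
```

Scoring yourself honestly on each dimension before peer review tends to show which section deserves your remaining time.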

Milestone Checklist

Week 1: Foundation

  • [ ] Choose project option
  • [ ] Write research question (1-2 sentences)
  • [ ] Inventory existing work / identify data sources
  • [ ] Set up GitHub repository
  • [ ] Load and inspect all data
  • [ ] Complete data cleaning with documented decisions

Week 2: Analysis

  • [ ] Create 4-6 exploratory visualizations with interpretations
  • [ ] Run 2+ formal statistical analyses
  • [ ] Build 2+ models with proper evaluation
  • [ ] Draft findings section

Week 3: Polish

  • [ ] Write introduction, abstract, and data description
  • [ ] Write conclusions, limitations, and ethical reflection
  • [ ] Polish all visualizations
  • [ ] Remove debugging cells and clean notebook
  • [ ] Kernel > Restart & Run All
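
The Restart & Run All check can also be approximated from outside the notebook interface. The sketch below builds a `jupyter nbconvert --execute` command, which runs every cell from top to bottom in a fresh kernel. It assumes Jupyter and nbconvert are installed and on your PATH; the function names and the default output name are placeholders, not part of the chapter.

```python
import subprocess


def build_execute_command(notebook_path, output_name="executed"):
    """Build the nbconvert command that executes a notebook top to bottom."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute", notebook_path,
        "--output", output_name,
    ]


def runs_cleanly(notebook_path, output_name="executed"):
    """Return True if the notebook executes without a failing cell.

    Assumes the `jupyter` executable is available; a non-zero exit
    code is treated as "the notebook did not run cleanly".
    """
    result = subprocess.run(build_execute_command(notebook_path, output_name))
    return result.returncode == 0
```

This does not replace looking at the output yourself, but it is a convenient last check before you commit.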

Week 4: Finalize

  • [ ] Write final README
  • [ ] Create requirements.txt
  • [ ] Peer review (give and receive)
  • [ ] Address feedback
  • [ ] Final Restart & Run All
  • [ ] Submit and celebrate
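
For the requirements.txt item, one hedged approach is to ask Python which versions are installed in the environment you actually used. The sketch below relies on the standard library's `importlib.metadata`; the `pinned_requirements` helper is an assumption for illustration, and you supply the package list yourself.

```python
from importlib.metadata import PackageNotFoundError, version


def pinned_requirements(packages):
    """Return (lines, missing) for the given distribution names.

    `lines` holds "name==version" strings for installed packages;
    `missing` lists names that are not installed in this environment.
    Use distribution names as you would with pip (for example
    "scikit-learn", not the import name "sklearn").
    """
    lines, missing = [], []
    for name in packages:
        try:
            lines.append(f"{name}=={version(name)}")
        except PackageNotFoundError:
            missing.append(name)
    return lines, missing
```

You could then write `"\n".join(lines)` to requirements.txt and investigate anything reported in `missing` before you submit.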

Common Pitfalls

| Pitfall | Prevention |
|---------|------------|
| Starting too late | Begin Week 1; even loading data counts as progress |
| Scope creep | Pin your question to your monitor; every analysis should connect to it |
| Code dump | Aim for equal amounts of Markdown and code |
| Overcomplicating methods | Use the simplest method that answers your question |
| Ignoring ethics | Engage genuinely; consider representation, misuse, and consequences |
| Never finishing | Set a deadline; "done" beats "perfect" |
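
To get a rough sense of whether your notebook is drifting toward a code dump, you can count its Markdown and code cells. A notebook file is JSON with a list of cells, each carrying a `cell_type`, so the standard library is enough. The helper names below are assumptions, and counting cells is only a proxy for "equal amounts": one long Markdown cell can outweigh several short code cells.

```python
import json
from collections import Counter


def cell_counts(notebook_json):
    """Count cells by type in a notebook given as a JSON string."""
    notebook = json.loads(notebook_json)
    return Counter(cell["cell_type"] for cell in notebook.get("cells", []))


def markdown_share(notebook_json):
    """Return the fraction of Markdown cells among Markdown and code cells.

    Returns 0.0 for a notebook with neither kind of cell.
    """
    counts = cell_counts(notebook_json)
    total = counts["markdown"] + counts["code"]
    return counts["markdown"] / total if total else 0.0
```

To use it on a real file, read the notebook's text first (for example with `Path("notebooks/capstone.ipynb").read_text(encoding="utf-8")`, where the path is a placeholder) and pass that string in. A share far below one half suggests some code cells still need written interpretation.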

Self-Review Checklist

Before you call it done, verify:

  • [ ] A non-data-scientist can understand the introduction and conclusions
  • [ ] Every chart has a title, labels, and written interpretation
  • [ ] At least 3 analytical decisions are documented with reasoning
  • [ ] The notebook runs cleanly (Kernel > Restart & Run All)
  • [ ] Limitations are specific, not generic
  • [ ] The ethical reflection engages with real human dimensions
  • [ ] The README is complete and informative
  • [ ] You'd be proud to show this to a hiring manager

What You Should Be Able to Do Now

  • [ ] Execute a complete data science investigation end-to-end
  • [ ] Integrate cleaning, exploration, visualization, statistics, and modeling into one coherent narrative
  • [ ] Document analytical decisions with clear reasoning
  • [ ] Communicate findings to a non-technical audience
  • [ ] Reflect honestly on limitations and ethical dimensions
  • [ ] Produce a portfolio-quality Jupyter notebook

If you've completed the capstone, take a moment to appreciate what you've done. You started this book not knowing what data science was. Now you've completed a full investigation — your own investigation, with your own question, your own decisions, and your own conclusions. That's real. Chapter 36 celebrates how far you've come and maps where you can go next.