Key Takeaways: Capstone Project
This is your reference card for Chapter 35 — the capstone. Use it as a checklist during your project to make sure you're hitting all the required elements.
Key Concepts
- The capstone is integration, not new learning. You already have every skill you need. The capstone tests whether you can combine those skills into a coherent investigation, from question to conclusion, with every step documented and justified.
- A capstone is not a homework assignment. Homework has a predetermined question, provided data, and a known answer. The capstone requires you to make every decision yourself: the question, the data strategy, the analytical approach, the interpretation, and the communication. That's what makes it hard. That's also what makes it valuable.
- Communication is half the project. A capstone that runs perfect models but has no narrative, no interpretation, and no written conclusions is incomplete. The notebook should read like a story, not a code dump.
- Limitations and ethics are required, not optional. Critical Reflection is one of the rubric's six dimensions (4 of the 24 points), and the notebook devotes two full sections to it: Limitations and Future Work, and Ethical Reflection. This reflects a genuine belief: a data scientist who doesn't think about what their analysis can't tell them, or about the human consequences of their work, is not yet a complete data scientist.
- Done is better than perfect. A completed capstone at 85% quality is worth far more than an unfinished one chasing 100%. Set a deadline and meet it.
Three Capstone Options
| Option | Topic | Data | Best For |
|---|---|---|---|
| A | Global vaccination rate disparities | WHO + World Bank (provided through progressive milestones) | Students who completed progressive milestones |
| B | Small bakery business analytics | Simulated POS data + NOAA weather | Students interested in business analytics |
| C | NBA three-point revolution | Basketball Reference team stats | Students interested in sports analytics |
| Custom | Your own topic | Your own sources | Students with a specific domain interest |
Required Deliverables
The Notebook (10 sections)
| # | Section | Expected Length |
|---|---|---|
| 1 | Title and Abstract | 200-300 words |
| 2 | Introduction and Motivation | 500-800 words |
| 3 | Data Description | 400-600 words + code |
| 4 | Data Cleaning | 600-1000 words + code |
| 5 | Exploratory Analysis | 800-1200 words + 4-6 charts |
| 6 | Statistical Analysis/Modeling | 800-1200 words + 2-4 charts |
| 7 | Findings and Conclusions | 500-800 words |
| 8 | Limitations and Future Work | 300-500 words |
| 9 | Ethical Reflection | 300-500 words |
| 10 | References | Complete citations |
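Checking those word ranges by hand is tedious. Because an .ipynb file is plain JSON, a short script can tally the words under each section. This sketch assumes each section begins with a level-2 Markdown heading (for example "## 4. Data Cleaning") whose text matches the section names exactly; the names TARGETS, load_notebook, markdown_words_by_section, and check_lengths are illustrative, not part of any library. The counts are approximate: heading lines are excluded, and Markdown syntax such as list markers counts as words.

```python
import json
import re

# Word-count targets (min, max) from the notebook outline.
# References is omitted because it has no word range.
TARGETS = {
    "Title and Abstract": (200, 300),
    "Introduction and Motivation": (500, 800),
    "Data Description": (400, 600),
    "Data Cleaning": (600, 1000),
    "Exploratory Analysis": (800, 1200),
    "Statistical Analysis/Modeling": (800, 1200),
    "Findings and Conclusions": (500, 800),
    "Limitations and Future Work": (300, 500),
    "Ethical Reflection": (300, 500),
}

# Assumed heading style: "## Name" or "## 4. Name" (level-2 headings only).
HEADING = re.compile(r"^##\s+(?:\d+\.\s*)?(.+?)\s*$")


def load_notebook(path):
    """Read an .ipynb file, which is plain JSON, into a dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def markdown_words_by_section(nb):
    """Count words in Markdown cells, grouped under the most recent '## ' heading."""
    counts = {}
    current = None
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "markdown":
            continue
        source = cell.get("source", "")
        text = "".join(source) if isinstance(source, list) else source
        for line in text.splitlines():
            match = HEADING.match(line)
            if match:
                current = match.group(1)
                counts.setdefault(current, 0)
            elif current is not None:
                # Strip leading '#' so sub-heading markers are not counted as words.
                counts[current] += len(line.lstrip("#").split())
    return counts


def check_lengths(counts, targets=TARGETS):
    """Return a warning for every section that is missing or outside its range."""
    warnings = []
    for section, (low, high) in targets.items():
        n = counts.get(section)
        if n is None:
            warnings.append(f"{section}: missing")
        elif not low <= n <= high:
            warnings.append(f"{section}: {n} words (target {low}-{high})")
    return warnings
```

To use it, pass the result of load_notebook on your own notebook file to markdown_words_by_section, then to check_lengths; an empty list means every section falls inside its range.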
The Repository
Everything lives in a single project-name/ folder:
| Path | Contents |
|---|---|
| README.md | Overview, findings, reproduction instructions |
| notebooks/ | The capstone notebook |
| data/raw/ | Original data (or download instructions) |
| data/processed/ | Cleaned data |
| figures/ | Saved key visualizations |
| requirements.txt | Dependencies with versions |
| .gitignore | Excludes checkpoints, caches, large files |
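If you would rather start from a script than create folders by hand, the sketch below builds this layout with Python's standard pathlib module. The .gitignore entries are an assumption (common Jupyter and Python artifacts); add patterns for any large data files you decide not to commit. The function name scaffold is illustrative. Existing files are never overwritten, so it is safe to run again.

```python
from pathlib import Path

# Directories from the repository layout.
DIRECTORIES = ["notebooks", "data/raw", "data/processed", "figures"]

# Assumed starter .gitignore: notebook checkpoints and Python caches.
# Add patterns for large data files you do not want in version control.
GITIGNORE = """\
.ipynb_checkpoints/
__pycache__/
*.pyc
.DS_Store
"""


def scaffold(root):
    """Create the capstone layout under `root` without overwriting existing files."""
    base = Path(root)
    for directory in DIRECTORIES:
        (base / directory).mkdir(parents=True, exist_ok=True)
    starters = {
        "README.md": f"# {base.name}\n\nOverview, key findings, and how to reproduce them.\n",
        "requirements.txt": "",
        ".gitignore": GITIGNORE,
    }
    for name, content in starters.items():
        path = base / name
        if not path.exists():  # keep any file you have already edited
            path.write_text(content, encoding="utf-8")
    return base
```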
The Rubric (24 points total)
| Dimension | Points | What "Excellent" (4/4) Looks Like |
|---|---|---|
| Question and Motivation | 4 | Specific, interesting question; compelling motivation; clear scope |
| Data Handling | 4 | Documented sources; 3+ cleaning decisions justified; summary stats |
| Exploration and Visualization | 4 | 4+ polished charts with titles, labels, and written interpretation |
| Statistical Analysis/Modeling | 4 | Appropriate methods; assumptions checked; proper evaluation; honest interpretation |
| Communication and Narrative | 4 | Reads as a story; accessible to non-technical readers; conclusion answers the question |
| Critical Reflection | 4 | Specific limitations; genuine ethical engagement; concrete future work |
Score Bands
| Total Score | Assessment |
|---|---|
| 22-24 | Exceptional — portfolio-ready |
| 18-21 | Strong — solid competence |
| 14-17 | Satisfactory — needs strengthening |
| Below 14 | Developing — significant revision needed |
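The arithmetic behind the bands is simple: six dimensions at up to 4 points each give a maximum of 24. The sketch below encodes the rubric so you can self-score a draft. It assumes whole-number scores from 0 to 4 per dimension (the table only describes the 4/4 level), and assess is an illustrative helper, not an official grading tool.

```python
# The six rubric dimensions, each scored 0-4, for a maximum of 24 points.
DIMENSIONS = [
    "Question and Motivation",
    "Data Handling",
    "Exploration and Visualization",
    "Statistical Analysis/Modeling",
    "Communication and Narrative",
    "Critical Reflection",
]


def assess(scores):
    """Validate per-dimension scores and map the total to a rubric band."""
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    for d in DIMENSIONS:
        if not 0 <= scores[d] <= 4:
            raise ValueError(f"{d} must be between 0 and 4, got {scores[d]}")
    total = sum(scores[d] for d in DIMENSIONS)
    if total >= 22:
        band = "Exceptional"
    elif total >= 18:
        band = "Strong"
    elif total >= 14:
        band = "Satisfactory"
    else:
        band = "Developing"
    return total, band
```

For example, scores of 4, 4, 3, 3, 4, and 3 total 21, which lands in the Strong band; raising any one of the 3s to a 4 reaches 22, the bottom of the Exceptional band.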
Milestone Checklist
Week 1: Foundation
- [ ] Choose project option
- [ ] Write research question (1-2 sentences)
- [ ] Inventory existing work / identify data sources
- [ ] Set up GitHub repository
- [ ] Load and inspect all data
- [ ] Complete data cleaning with documented decisions
Week 2: Analysis
- [ ] Create 4-6 exploratory visualizations with interpretations
- [ ] Run 2+ formal statistical analyses
- [ ] Build 2+ models with proper evaluation
- [ ] Draft findings section
Week 3: Polish
- [ ] Write introduction, abstract, and data description
- [ ] Write conclusions, limitations, and ethical reflection
- [ ] Polish all visualizations
- [ ] Remove debugging cells and clean notebook
- [ ] Kernel > Restart & Run All
Week 4: Finalize
- [ ] Write final README
- [ ] Create requirements.txt
- [ ] Peer review (give and receive)
- [ ] Address feedback
- [ ] Final Restart & Run All
- [ ] Submit and celebrate
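For the "Create requirements.txt" step, running pip freeze > requirements.txt pins every package installed in your environment, which often includes far more than the project actually uses. The sketch below is an alternative that pins only the packages you list, using the standard-library importlib.metadata module (Python 3.8+). It expects distribution names as published on PyPI (scikit-learn, not sklearn); the helper names and the package list in the closing comment are hypothetical examples.

```python
from importlib.metadata import PackageNotFoundError, version
from pathlib import Path


def pin_requirements(distributions):
    """Return (pinned 'name==version' lines, names that are not installed)."""
    lines, missing = [], []
    for name in distributions:
        try:
            lines.append(f"{name}=={version(name)}")
        except PackageNotFoundError:
            missing.append(name)  # not installed in this environment
    return lines, missing


def write_requirements(path, distributions):
    """Write pinned lines to `path` and return any names that could not be pinned."""
    lines, missing = pin_requirements(distributions)
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return missing


# Example (hypothetical package list for a typical capstone):
# missing = write_requirements("requirements.txt", ["pandas", "matplotlib", "scikit-learn"])
```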
Common Pitfalls
| Pitfall | Prevention |
|---|---|
| Starting too late | Begin Week 1; even loading data counts as progress |
| Scope creep | Pin your question to your monitor; every analysis should connect to it |
| Code dump | Aim for equal amounts of Markdown and code |
| Overcomplicating methods | Use the simplest method that answers your question |
| Ignoring ethics | Engage genuinely; consider representation, misuse, and consequences |
| Never finishing | Set a deadline; "done" beats "perfect" |
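The "Code dump" pitfall is easy to spot with a quick tally. This sketch reads the .ipynb JSON directly and counts cells and non-blank source lines for Markdown and code. Treat it as a rough proxy only: a single Markdown line can hold a long paragraph, while code lines tend to be short. The function names are illustrative.

```python
import json


def load_notebook(path):
    """Read an .ipynb file, which is plain JSON, into a dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def _text(cell):
    """Join a cell's source, which may be stored as a string or a list of lines."""
    source = cell.get("source", "")
    return "".join(source) if isinstance(source, list) else source


def cell_balance(nb):
    """Tally cells and non-blank source lines for Markdown vs code cells."""
    stats = {"markdown": {"cells": 0, "lines": 0}, "code": {"cells": 0, "lines": 0}}
    for cell in nb.get("cells", []):
        kind = cell.get("cell_type")
        if kind not in stats:
            continue  # raw cells are ignored
        stats[kind]["cells"] += 1
        stats[kind]["lines"] += sum(1 for line in _text(cell).splitlines() if line.strip())
    return stats
```

If the code column dwarfs the Markdown column, that is a signal to add interpretation between code cells.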
Self-Review Checklist
Before you call it done, verify:
- [ ] A non-data-scientist can understand the introduction and conclusions
- [ ] Every chart has a title, labels, and written interpretation
- [ ] At least 3 analytical decisions are documented with reasoning
- [ ] The notebook runs cleanly (Kernel > Restart & Run All)
- [ ] Limitations are specific, not generic
- [ ] The ethical reflection engages with real human dimensions
- [ ] The README is complete and informative
- [ ] You'd be proud to show this to a hiring manager
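Restart & Run All leaves a recognizable trace in the saved file: non-empty code cells are numbered 1, 2, 3, and so on from top to bottom, and no cell shows an error. The sketch below checks a saved notebook for that pattern. It is a heuristic, not proof that the notebook runs: it inspects only what was saved, and it skips empty code cells on the assumption that frontends may leave them unnumbered (deleting empty cells first avoids any ambiguity). The function name is illustrative. To actually re-execute everything in a fresh kernel from the command line, jupyter nbconvert --to notebook --execute does that.

```python
def _text(cell):
    """Join a cell's source, which may be stored as a string or a list of lines."""
    source = cell.get("source", "")
    return "".join(source) if isinstance(source, list) else source


def execution_order_problems(nb):
    """List code cells that are out of sequence or that saved an error output."""
    problems = []
    code_cells = [
        (index, cell)
        for index, cell in enumerate(nb.get("cells", []))
        if cell.get("cell_type") == "code" and _text(cell).strip()
    ]
    for expected, (index, cell) in enumerate(code_cells, start=1):
        count = cell.get("execution_count")
        if count != expected:
            problems.append(f"cell {index}: execution_count={count}, expected {expected}")
        for output in cell.get("outputs", []):
            if output.get("output_type") == "error":
                problems.append(f"cell {index}: raised {output.get('ename')}")
    return problems
```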
What You Should Be Able to Do Now
- [ ] Execute a complete data science investigation end-to-end
- [ ] Integrate cleaning, exploration, visualization, statistics, and modeling into one coherent narrative
- [ ] Document analytical decisions with clear reasoning
- [ ] Communicate findings to a non-technical audience
- [ ] Reflect honestly on limitations and ethical dimensions
- [ ] Produce a portfolio-quality Jupyter notebook
If you've completed the capstone, take a moment to appreciate what you've done. You started this book not knowing what data science was. Now you've completed a full investigation — your own investigation, with your own question, your own decisions, and your own conclusions. That's real. Chapter 36 celebrates how far you've come and maps where you can go next.