Further Reading: Capstone Project
The capstone is about doing, not reading. But these resources can help you do it better — whether you need inspiration, technical reference, or examples of excellent data science communication.
Tier 1: Verified Sources
Cole Nussbaumer Knaflic, Storytelling with Data: A Data Visualization Guide for Business Professionals (Wiley, 2015). If your capstone notebook needs better visualizations and clearer communication, this is the book to read. Knaflic's approach — identify your audience, choose appropriate chart types, eliminate clutter, and focus attention — applies directly to capstone presentation. Particularly useful for the Findings and Conclusions section.
Jake VanderPlas, Python Data Science Handbook: Essential Tools for Working with Data (O'Reilly, 2nd edition, 2023). Your go-to technical reference while building the capstone. When you need to look up how to do a specific pandas operation, build a particular chart in matplotlib, or configure a scikit-learn model, VanderPlas has clear, well-organized coverage. Keep it open in another tab.
Wes McKinney, Python for Data Analysis: Data Wrangling with pandas, NumPy, and Jupyter (O'Reilly, 3rd edition, 2022). For the data cleaning and preparation phases of the capstone, McKinney's comprehensive coverage of pandas operations is invaluable. If you're doing complex merges, reshaping operations, or time series manipulations, this is the definitive reference.
Joel Grus, Data Science from Scratch: First Principles with Python (O'Reilly, 2nd edition, 2019). If you want to understand why your models work the way they do — not just how to call scikit-learn functions — Grus builds everything from first principles. Useful when writing the Statistical Analysis/Modeling section and you need to explain your methods clearly.
David Spiegelhalter, The Art of Statistics: How to Learn from Data (Basic Books, 2019). If your capstone involves statistical testing and you want to be sure you're interpreting p-values, confidence intervals, and effect sizes correctly, Spiegelhalter's accessible explanations will give you confidence. Particularly relevant for writing the kind of qualified, honest conclusions the capstone rubric rewards.
Edward Tufte, The Visual Display of Quantitative Information (Graphics Press, 2nd edition, 2001). The gold standard for data visualization design. When you're polishing your capstone charts, Tufte's principles (maximize the data-ink ratio, avoid chartjunk, let the data speak) will help you create visualizations that are both beautiful and informative.
Tier 2: Attributed Resources
Kaggle's "Notebooks" section. Browse highly-voted notebooks on Kaggle for examples of well-structured data science analyses. While the context is competitive modeling, the best notebooks demonstrate excellent exploratory analysis, clear visualization, and thoughtful interpretation. Search for notebooks in your domain (healthcare, sports, business) for domain-specific inspiration.
FiveThirtyEight articles. The data journalism site founded by Nate Silver consistently publishes analyses that combine rigorous statistics with engaging narrative. Their articles are excellent models for the communication style the capstone rubric rewards — technical depth presented accessibly.
The Pudding (pudding.cool). An online publication that creates "visual essays" using data. If you're looking for inspiration on creative data visualization and narrative, The Pudding's projects demonstrate what's possible when data science meets storytelling.
TidyTuesday (R community, but applicable to Python). The R community's weekly data visualization challenge produces thousands of analyses of interesting datasets. While the code is in R, the analytical approaches, visualization choices, and question framing are language-agnostic and can inspire your capstone design. Search the #TidyTuesday hashtag on social media for examples.
Reproducibility in Science articles. If you want to deepen the reproducibility dimension of your capstone, search for articles by Victoria Stodden or Roger Peng, or for the documentation of the ReproZip project. Their work on making computational research reproducible directly informs how your capstone repository should be structured.
Domain-Specific Resources
For Vaccination / Public Health Analysis (Option A)
WHO COVID-19 Dashboard and Data Portal. The primary data source. Familiarize yourself with the data documentation, variable definitions, and update schedule.
Our World in Data (ourworldindata.org). Provides clean, well-documented datasets on global health topics including vaccination. Their visualizations are excellent models for your own charts. Founded by Max Roser at the University of Oxford.
Global Burden of Disease Study. Published by the Institute for Health Metrics and Evaluation (IHME), this provides context for understanding health system capacity across countries. Useful for interpreting your vaccination findings.
For Business Analytics (Option B)
Hyndman, R.J. and Athanasopoulos, G., Forecasting: Principles and Practice (3rd edition, OTexts, 2021). Available free online. The definitive guide to time series forecasting, written accessibly. If your bakery capstone includes a forecasting component, this book covers everything from simple methods to advanced techniques. While examples are in R, the concepts translate directly to Python.
McKinsey Global Institute reports on small business analytics. McKinsey publishes reports on how small and mid-sized businesses use data. These provide realistic context for framing your bakery analysis.
For Sports Analytics (Option C)
Basketball Reference (basketball-reference.com). The primary data source for NBA statistics. Understand their glossary of terms (offensive rating, pace, usage rate) before building your analysis.
Dean Oliver, Basketball on Paper: Rules and Tools for Performance Analysis (Potomac Books, 2004). The foundational text on basketball analytics. Oliver's "Four Factors" (shooting, turnovers, rebounding, free throws) provide a framework for understanding what drives winning.
Kirk Goldsberry, SprawlBall: A Visual Tour of the New Era of the NBA (Mariner Books, 2019). A beautifully visualized investigation of the three-point revolution — essentially the published version of what Option C asks you to investigate. Reading this before or during your capstone provides both inspiration and a benchmark.
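If you choose Option C, the Four Factors mentioned in the Oliver entry above can be computed from ordinary box-score totals. The sketch below uses the commonly cited definitions (the same ones found in the Basketball Reference glossary); treat it as one reasonable formulation rather than Oliver's exact method. The `TeamBoxScore` class, its field names, and the sample numbers are hypothetical and exist only for illustration. The 0.44 weight on free throw attempts is a conventional estimate of how many free throws end a possession.

```python
from dataclasses import dataclass


@dataclass
class TeamBoxScore:
    """Hypothetical container for one team's game or season totals."""
    fg: int        # field goals made
    fg3: int       # three-pointers made
    fga: int       # field goal attempts
    ft: int        # free throws made
    fta: int       # free throw attempts
    tov: int       # turnovers
    orb: int       # offensive rebounds
    opp_drb: int   # opponent's defensive rebounds


def four_factors(box: TeamBoxScore) -> dict[str, float]:
    """Return the Four Factors as proportions (multiply by 100 for percentages)."""
    return {
        # Shooting: effective field goal percentage gives threes 1.5x weight
        "efg_pct": (box.fg + 0.5 * box.fg3) / box.fga,
        # Turnovers: turnovers per estimated possession-ending play
        "tov_pct": box.tov / (box.fga + 0.44 * box.fta + box.tov),
        # Rebounding: share of available offensive rebounds the team collected
        "orb_pct": box.orb / (box.orb + box.opp_drb),
        # Free throws: made free throws per field goal attempt
        "ft_rate": box.ft / box.fga,
    }


# Made-up totals for illustration only, not real team data
example = TeamBoxScore(fg=40, fg3=10, fga=80, ft=15, fta=20,
                       tov=12, orb=10, opp_drb=30)
factors = four_factors(example)
for name, value in factors.items():
    print(f"{name}: {value:.3f}")
```

Some sources define the free throw factor as attempts per field goal attempt (`fta / fga`) instead of makes. Whichever version you use, state the definition in your notebook so readers can reproduce your numbers.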
Recommended Process Resources
- If you're stuck on data cleaning: Revisit Chapter 8 and McKinney's Python for Data Analysis, Chapter 7 (Data Cleaning and Preparation).
- If your visualizations need polish: Revisit Chapter 18 and read Knaflic's Storytelling with Data, Chapters 3-5.
- If your statistical analysis feels shaky: Revisit Chapters 22-24 and read Spiegelhalter's The Art of Statistics, Chapters 4-8.
- If your models aren't performing well: Revisit Chapters 29-30 and check for common issues: data leakage, overfitting, inappropriate metric choice, insufficient feature engineering.
- If you can't finish on time: Reduce scope, not quality. A well-executed analysis of a simpler question is better than a rushed analysis of an ambitious one. Cut the modeling section to one model instead of three if needed. Polish what you have.
- If you need motivation: Reread Section 35.9 of the chapter. Then close this book and start working. The hardest part is beginning. Everything after that is one step at a time.
Good luck. You've got this.