Acknowledgments

This textbook owes its existence to the open-source data science community. Every library used in these pages — scikit-learn, pandas, numpy, XGBoost, LightGBM, SHAP, MLflow, FastAPI — represents thousands of hours of volunteer work by developers who believe that powerful tools should be freely available.

Thank you to the practitioners who reviewed early drafts and insisted that the code be production-grade, the examples be realistic, and the war stories be honest. Your fingerprints are on every chapter.

Thank you to the instructors who tested these materials in classrooms and bootcamps, who reported every confusing passage, every broken code example, and every exercise that was either too easy or impossibly hard.

Thank you to the students — the aspiring data scientists, the career changers, the analysts leveling up — who asked the questions this book tries to answer: "How do I actually do this at work?" "What do hiring managers really want to see?" "Why doesn't the real world look like the Kaggle dataset?"

Thank you to the open-source textbook community. The decision to release this book under CC-BY-SA-4.0 was inspired by the belief that quality education should not require a $200 textbook.

And thank you to the senior data scientists who shared their war stories — the models that failed in production, the A/B tests that went sideways, the stakeholder presentations that taught more than any textbook could. Your hard-won experience is the backbone of the practitioner voice in these pages.


If you find an error, have a suggestion, or want to contribute, see CONTRIBUTING.md. This book gets better because of its community.