Acknowledgments

A textbook of this scope does not come into existence through any single effort. It is the product of a vast community — programmers, educators, researchers, and learners — whose work made this book both possible and necessary.

The Open-Source Community

Our deepest gratitude goes to the creators and maintainers of the Python data science ecosystem. Wes McKinney created pandas and fundamentally changed how a generation of analysts works with data. John Hunter started matplotlib, and the visualization community that grew from his work has given us seaborn (Michael Waskom), plotly (the Plotly team), and dozens of other tools that turn numbers into understanding. Travis Oliphant's work on NumPy and SciPy gave Python its scientific backbone. The scikit-learn team, whose project was started by David Cournapeau and developed extensively by researchers at INRIA, made machine learning accessible to anyone willing to learn. Fernando Pérez and the Jupyter team created a computing environment that has become the shared language of data science education worldwide.

These projects are maintained by communities of volunteers who fix bugs at midnight, answer questions on forums, review pull requests on weekends, and write documentation that teaches millions of people every year. If open-source software is the infrastructure of modern data science, these people are the engineers who built the bridge we all walk across.

Educators and Authors

This book is shaped by the work of those who have thought deeply about how to teach data science. We are grateful to the instructors and authors whose approaches influenced our own: Jake VanderPlas (Python Data Science Handbook), Joel Grus (Data Science from Scratch), Allen Downey (Think Stats and Think Bayes), Hadley Wickham (whose work on the grammar of graphics transcends any single language), Edward Tufte (whose principles of data visualization remain foundational), and the countless university instructors who share their syllabi and materials openly so that others can build on their work.

Data Providers

The datasets used throughout this book come from organizations committed to making public data accessible. We thank the World Health Organization (WHO), the U.S. Centers for Disease Control and Prevention (CDC), and the broader open data movement for making the raw materials of data science available to learners everywhere. Public data is a public good, and the people who collect, curate, and distribute it deserve more recognition than they typically receive.

The Students

Every worked example, every "Common Pitfall" callout, every carefully worded explanation in this book was shaped by learners who raised their hands and said, "I still don't get it." Students who struggle with a concept and say so are doing the hardest and most valuable thing a learner can do. They are the ones who reveal where a textbook's explanations are insufficient, where its examples are unclear, and where its assumptions about prior knowledge are wrong. This book is better because of the students who refused to nod along when they were confused.

A Note on AI-Assisted Authorship

This textbook was created using AI-assisted generation with human curation, review, and editorial oversight. We believe in full transparency about this process. The initial drafts were generated by large language models, then reviewed and revised by human editors who verified technical accuracy, tested code examples, refined explanations, and ensured pedagogical coherence. We view this as a collaboration between human expertise and AI capability — one that allowed us to produce a comprehensive, free resource at a scale that would have been impractical for a traditional authoring process.

All code examples have been tested against Python 3.12+. All factual claims have been checked against primary sources where possible, with appropriate hedging where uncertainty exists. Any remaining errors are ours, and we welcome corrections from the community.

You

Finally, thank you, the person reading this right now. You picked up a 1,400-page textbook about a subject you might know nothing about, and you decided to give it a try. That takes guts. That takes curiosity, and curiosity is the most important thing a data scientist can have.

Let's make it count.