Further Reading: Introduction to pandas

You've just taken a major step — from writing your own data analysis loops to expressing your intent in a single line of pandas. If you want to deepen your understanding before moving into data cleaning in Chapter 8, here are resources organized by what caught your interest.


Tier 1: Verified Sources

These are published books with full bibliographic details.

Wes McKinney, Python for Data Analysis: Data Wrangling with pandas, NumPy, and Jupyter (O'Reilly, 3rd edition, 2022). McKinney created pandas, and this book is the definitive reference. The third edition covers modern pandas (version 1.4+) with Jupyter-based examples throughout. Chapter 5 of McKinney's book ("Getting Started with pandas") aligns closely with what you learned here: DataFrames, Series, indexing, and selection. His Chapter 7 ("Data Cleaning and Preparation") makes a good companion to our Chapter 8. If you want to go deeper on any topic from this chapter, start with McKinney. The writing is clear, the examples are practical, and having the library's creator explain the design decisions behind loc vs. iloc or the SettingWithCopyWarning is invaluable.
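
As a quick reminder of what the SettingWithCopyWarning is about, here is a minimal sketch. The DataFrame and its column names are made up for illustration. Chained indexing first selects a subset and then assigns into it. Depending on your pandas version, that pattern may trigger the warning or leave the original DataFrame unchanged, so it is only shown as a comment. A single .loc call with a row mask and a column label writes into the original directly.

```python
import pandas as pd

# Hypothetical sample data, for illustration only
df = pd.DataFrame({
    "city": ["Austin", "Boston", "Chicago"],
    "temp_f": [95, 71, 80],
})

# Risky pattern (not executed): chained indexing selects a subset first,
# then assigns into it. Depending on the pandas version this may warn
# and/or fail to modify df.
# df[df["temp_f"] > 75]["temp_f"] = 75

# Preferred pattern: one .loc call with a boolean row mask and a column
# label, which writes directly into df.
df.loc[df["temp_f"] > 75, "temp_f"] = 75

print(df)
```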

Jake VanderPlas, Python Data Science Handbook: Essential Tools for Working with Data (O'Reilly, 2nd edition, 2023). VanderPlas covers NumPy, pandas, matplotlib, and scikit-learn in a single volume, with a strong focus on the conceptual framework behind each tool. His treatment of pandas indexing and selection is particularly good: he uses diagrams and analogies that make the loc/iloc distinction intuitive. The second edition was updated for current pandas versions and includes Jupyter notebook examples. The full text of the first edition is also available free online, as Jupyter notebooks in the author's GitHub repository.
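
To see the loc/iloc distinction in miniature, here is a short sketch that uses a made-up DataFrame. Its index labels (10 through 40) deliberately differ from its row positions (0 through 3), so label-based and position-based lookups clearly diverge. It also shows the slicing rule that often surprises beginners.

```python
import pandas as pd

# Made-up scores; index labels 10..40 deliberately differ from positions 0..3
df = pd.DataFrame({"score": [88, 92, 79, 85]}, index=[10, 20, 30, 40])

# .loc looks up by index LABEL: the row labeled 20 is the second row
by_label = df.loc[20, "score"]

# .iloc looks up by integer POSITION: position 0 is the first row
by_position = df.iloc[0, 0]

# Slicing differs too: .loc includes the end label, .iloc excludes the end position
label_slice = df.loc[10:30, "score"].tolist()
position_slice = df.iloc[0:2, 0].tolist()

print(by_label, by_position)   # 92 88
print(label_slice)             # [88, 92, 79]
print(position_slice)          # [88, 92]
```

Note that df.loc[0, "score"] would raise a KeyError here, because no row is labeled 0, even though position 0 exists.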

Daniel Y. Chen, Pandas for Everyone: Python Data Analysis (Addison-Wesley, 2nd edition, 2023). This book is specifically designed for pandas beginners and moves at a gentle pace. If the transition from pure Python to pandas felt too fast in this chapter, Chen's book provides additional worked examples and exercises at each stage. The second edition covers pandas 2.0 and modern best practices.

Matt Harrison, Effective Pandas: Patterns for Data Manipulation (Matt Harrison, 2nd edition, 2022). Once you're comfortable with the basics from this chapter, Harrison's book teaches you how to write better pandas code. It covers method chaining patterns, performance optimization, and idiomatic pandas style. Not a beginner book — save it for after you've finished Part II of our textbook — but excellent for leveling up your pandas fluency.
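
To give a flavor of the method-chaining style, here is a minimal sketch on made-up sales data. The column names and this particular chain are our own illustration, not an example taken from Harrison's book. Each method returns a new DataFrame, so the whole transformation reads top to bottom without any intermediate variables.

```python
import pandas as pd

# Hypothetical sales data, for illustration only
sales = pd.DataFrame({
    "region": ["East", "West", "East", "West", "East"],
    "units": [10, 3, 7, 12, 5],
    "price": [2.5, 4.0, 2.5, 4.0, 3.0],
})

# One chained expression wrapped in parentheses so each step gets its own line
summary = (
    sales
    .assign(revenue=lambda d: d["units"] * d["price"])  # add a derived column
    .query("units >= 5")                                 # keep rows with 5+ units
    .groupby("region", as_index=False)["revenue"].sum()  # total revenue per region
    .sort_values("revenue", ascending=False)             # largest total first
)

print(summary)
```

Because assign takes a lambda, the new column is computed from whatever DataFrame reaches that step, which keeps the chain correct if you later reorder or insert steps.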


Tier 2: Attributed Resources

These are articles, documentation, and online resources well-known in the data science community. We provide enough detail to find them, but not URLs (because links change).

The official pandas documentation (pandas.pydata.org). The pandas documentation includes a comprehensive "Getting Started" tutorial series, a detailed API reference, and a "User Guide" that walks through all major operations. The "10 Minutes to pandas" tutorial is a popular starting point that covers roughly the same ground as this chapter in condensed form. Bookmark the documentation — you'll reference it constantly.

Hadley Wickham, "Tidy Data" (2014). Published in the Journal of Statistical Software (Volume 59, Issue 10). This paper introduced the "tidy data" framework — the idea that data should be structured so that each variable is a column, each observation is a row, and each type of observational unit forms a table. Although Wickham writes in R, the concepts are universal and directly inform how you'll structure DataFrames. We'll build on tidy data principles in Chapters 8 and 9. Reading this paper now will give you vocabulary for talking about data structure.
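
One messy pattern the paper describes is column headers that are values rather than variable names. Here is a minimal pandas sketch of fixing it with DataFrame.melt. The table below is made up. Its years sit in the column headers, and melt reshapes the table so that country, year, and the measured value each become a column, with one row per country-year observation.

```python
import pandas as pd

# Hypothetical "wide" table: the variable "year" is hidden in the column headers
wide = pd.DataFrame({
    "country": ["A", "B"],
    "2020": [100, 200],
    "2021": [110, 210],
})

# melt() keeps "country" as an identifier and turns the year columns into
# two columns: one holding the former header ("year"), one holding the value
tidy = wide.melt(id_vars="country", var_name="year", value_name="cases")

print(tidy)
```

The result has four rows (two countries times two years) and three columns, one per variable.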

The pandas "Cookbook" section of the official documentation. A curated collection of short recipes showing how to accomplish common tasks in pandas: selecting data, merging DataFrames, working with dates, handling missing values, and more. Many entries are compact, self-contained code snippets, and many others link to the Stack Overflow discussions where a technique was worked out. Useful as a quick reference when you know what you want to do but can't remember the exact syntax.

Real Python's pandas tutorials (realpython.com). Real Python maintains a series of well-edited, peer-reviewed tutorials on pandas topics including DataFrames, selection, filtering, and groupby. The tutorials are longer and more detailed than typical blog posts, with runnable code examples. Search for "Real Python pandas DataFrame tutorial" for an entry point.


Where to Go Next

  • If you want more practice with the basics: Work through the "10 Minutes to pandas" tutorial from the official documentation, then try loading 2-3 different CSV datasets and running the Chapter 7 inspection workflow (info, head, describe, filter, sort) on each.

  • If you're ready for data cleaning: Move to Chapter 8, which builds directly on this chapter. Everything you learned about NaN, dtypes, and boolean indexing becomes essential for detecting and fixing data quality issues.

  • If the method chaining style appealed to you: Read the first three chapters of Harrison's Effective Pandas for a deep dive into chaining patterns and pandas idioms.

  • If you want to understand the internals: McKinney's book explains how pandas stores data in memory (using NumPy arrays), why certain operations are fast or slow, and the design philosophy behind the API. Understanding the internals will help you write more efficient code as your datasets grow.

  • If the "grammar of data manipulation" concept resonated: Read Wickham's "Tidy Data" paper and explore his R package dplyr, which formalizes the grammar idea with explicit verb functions (select, filter, mutate, arrange, summarize). The pandas equivalents map closely, and seeing both perspectives deepens your understanding of the framework.
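
To make the "pandas equivalents map closely" claim concrete, here is a rough sketch pairing each dplyr verb with a common pandas counterpart. It runs on a small made-up DataFrame. The pairing is our own approximate mapping, not an official correspondence. For example, dplyr's filter can also be expressed with DataFrame.query, and summarize often corresponds to agg after a groupby.

```python
import pandas as pd

# Hypothetical employee data, for illustration only
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cy", "Di"],
    "dept": ["ops", "eng", "eng", "ops"],
    "salary": [50, 80, 70, 60],
})

# dplyr select()    ~ choose columns by label
selected = df[["name", "salary"]]

# dplyr filter()    ~ boolean indexing (or df.query("salary > 55"))
filtered = df[df["salary"] > 55]

# dplyr mutate()    ~ assign() returns a copy with a derived column added
mutated = df.assign(salary_k=df["salary"] * 1000)

# dplyr arrange()   ~ sort_values()
arranged = df.sort_values("salary", ascending=False)

# dplyr summarize() ~ agg(), here after grouping by department
summarized = df.groupby("dept")["salary"].agg("mean")

print(summarized)
```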

Keep going — Chapter 8 is where you learn that real data is messy, and now you have the tools to do something about it.