Further Reading: Reshaping and Transforming Data
You've learned the structural transformations that let you take data in any shape and mold it for analysis. Here are resources to deepen your understanding, organized by what drew your attention most.
Tier 1: Verified Sources
These are published works with full bibliographic details.
Wes McKinney, Python for Data Analysis: Data Wrangling with pandas, NumPy, and Jupyter (O'Reilly, 3rd edition, 2022). McKinney created pandas, and Chapters 8 ("Data Wrangling: Join, Combine, and Reshape") and 10 ("Data Aggregation and Group Operations") in his book are the definitive reference for everything covered in this chapter. His explanation of the merge algorithm internals and the full range of groupby capabilities goes well beyond what we covered here. If you plan to work with pandas seriously, this book should be on your desk.
Hadley Wickham, Mine Çetinkaya-Rundel, and Garrett Grolemund, R for Data Science (O'Reilly, 2nd edition, 2023). Although written for R rather than Python, Chapters 5 ("Data tidying") and 19 ("Joins") present the same concepts with exceptional clarity. The book's visual diagrams of join types and its explanation of why long ("tidy") format is easier to analyze are some of the best teaching on these topics in any language. The concepts transfer directly to pandas.
Jake VanderPlas, Python Data Science Handbook: Essential Tools for Working with Data (O'Reilly, 2nd edition, 2023). Part III, "Data Manipulation with Pandas" (Chapter 3 in the first edition), covers pandas in detail, with particularly strong sections on hierarchical indexing (multi-index), combining datasets, and aggregation with groupby. VanderPlas's writing is precise and example-driven, making it an excellent complement to McKinney's more comprehensive treatment.
Daniel Chen, Pandas for Everyone: Python Data Analysis (Addison-Wesley, 2nd edition, 2023). This book is structured around building practical skills incrementally, much like our textbook. The chapters on tidy data, merging, and groupby operations include extensive worked examples with real-world datasets. Particularly useful if you want more practice problems beyond what our exercises provide.
Tier 2: Attributed Resources
These are articles, talks, and online resources well-known in the data science community. We provide enough detail to find them, but not URLs (because links rot).
Hadley Wickham, "Tidy Data" (2014). Published in the Journal of Statistical Software (Volume 59, Issue 10). This is the foundational paper on the concept of tidy data — the principle that each variable should be a column, each observation a row, and each type of observational unit a table. Wickham's paper formalizes the intuition behind why melting and pivoting matter: they're the operations that make data tidy. The paper includes a taxonomy of common "messy data" patterns and how to fix each one. Despite being an academic paper, it's written in remarkably accessible prose.
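As a reminder of what the paper's principle looks like in pandas, here is a minimal sketch. The table, its column names, and its values are hypothetical and chosen only for illustration; they do not come from Wickham's paper.

```python
import pandas as pd

# Hypothetical "messy" table: the year variable is spread across column headers.
wide = pd.DataFrame({
    "country": ["A", "B"],
    "2019": [10, 20],
    "2020": [11, 21],
})

# melt() moves the year columns into rows, so each row is one observation
# (a country in a year) and each variable gets its own column.
tidy = wide.melt(id_vars="country", var_name="year", value_name="cases")
tidy["year"] = tidy["year"].astype(int)

print(tidy)
```

The wide table stores three variables (country, year, cases) but names only one of them as a column. After melting, all three are columns, which is the shape groupby and most plotting functions expect.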
pandas official documentation: "Merge, join, concatenate and compare." The pandas user guide (accessible via pandas.pydata.org) covers all merge and join operations comprehensively, including cases not covered in this chapter: cross joins, ordered merges with merge_ordered(), and the relationship between merge(), join(), and concat(). The documentation includes visual diagrams similar to the ones in this chapter.
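To give a sense of two of those operations before you open the docs, the sketch below shows a cross join and an ordered merge. The DataFrames are made up for illustration, and `how="cross"` assumes pandas 1.2 or later.

```python
import pandas as pd

sizes = pd.DataFrame({"size": ["S", "M"]})
colors = pd.DataFrame({"color": ["red", "blue", "green"]})

# Cross join: every row of `sizes` is paired with every row of `colors`.
combos = sizes.merge(colors, how="cross")

prices = pd.DataFrame({"day": [1, 3, 5], "price": [100, 102, 101]})
volumes = pd.DataFrame({"day": [2, 3, 4], "volume": [50, 55, 60]})

# merge_ordered() performs an outer join by default and returns rows sorted
# by the key; fill_method="ffill" carries the last known value forward.
ordered = pd.merge_ordered(prices, volumes, on="day", fill_method="ffill")

print(combos)
print(ordered)
```

Note that the first row of `ordered` still has a missing volume, because forward filling has no earlier value to carry into day 1.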
pandas official documentation: "Group by: split-apply-combine." The official groupby documentation is extensive and covers advanced features like filtering groups, applying custom functions, and working with the Grouper object for time-based grouping. If you found the named aggregation syntax useful, the docs show many more patterns.
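The following sketch touches three of the features mentioned above: named aggregation, group filtering, and time-based grouping with Grouper. The `sales` data and the 500 threshold are hypothetical, and the documentation describes many more options than are shown here.

```python
import pandas as pd

sales = pd.DataFrame({
    "date": pd.to_datetime([
        "2024-01-05", "2024-01-20", "2024-02-03", "2024-02-15", "2024-02-28",
    ]),
    "region": ["East", "West", "East", "East", "West"],
    "amount": [100, 250, 80, 120, 300],
})

# Named aggregation: each keyword becomes an output column.
summary = sales.groupby("region").agg(
    total=("amount", "sum"),
    largest=("amount", "max"),
)

# filter(): keep the original rows of groups whose total exceeds 500.
big_regions = sales.groupby("region").filter(lambda g: g["amount"].sum() > 500)

# Grouper: group by calendar month ("MS" labels each group by month start).
monthly = sales.groupby(pd.Grouper(key="date", freq="MS"))["amount"].sum()

print(summary)
print(big_regions)
print(monthly)
```

Unlike agg(), filter() returns rows from the original DataFrame rather than one row per group, which makes it useful for dropping small or uninteresting groups before further analysis.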
Real Python, "Pandas Merge, Join, and Concat" tutorial series. Real Python (realpython.com) publishes long-form, peer-reviewed Python tutorials. Their pandas merging tutorial includes animated visualizations of how different join types work, which can supplement the ASCII diagrams in this chapter. Search for "Real Python pandas merge" to find it.
Chris Moffitt, "Pandas Pivot Table Explained" (Practical Business Python). This blog post walks through pivot_table with business-oriented examples (sales by region, revenue by quarter) that illustrate its parameters clearly. It's one of the most-referenced introductions to pivoting in the Python data community.
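For a quick taste of the kind of example that post works through, here is a minimal revenue-by-region-and-quarter sketch. The data is invented for illustration and is not taken from Moffitt's post.

```python
import pandas as pd

sales = pd.DataFrame({
    "region": ["East", "East", "West", "West", "East"],
    "quarter": ["Q1", "Q2", "Q1", "Q2", "Q1"],
    "revenue": [100, 150, 200, 50, 25],
})

# One row per region, one column per quarter, summing revenue in each cell.
# margins=True adds a "Total" row and column; fill_value replaces empty cells.
table = sales.pivot_table(
    index="region",
    columns="quarter",
    values="revenue",
    aggfunc="sum",
    fill_value=0,
    margins=True,
    margins_name="Total",
)

print(table)
```

The two East/Q1 rows are combined into a single cell by `aggfunc`, which is the main difference between pivot_table and the plain pivot method: pivot raises an error on duplicate index/column pairs, while pivot_table aggregates them.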
Recommended Next Steps
- If you want more practice merging: Download two related datasets from a public source (e.g., country-level health data from the WHO and economic data from the World Bank) and merge them on country codes. Real-world key alignment problems (misspelled names, different code systems, missing countries) will teach you more than any textbook exercise.
- If you want to master groupby: Chapter 10 of McKinney's Python for Data Analysis covers advanced groupby patterns including filter, apply with custom functions, and the Grouper object. These are powerful tools for complex analytical workflows.
- If the tidy data concept resonated: Read Wickham's 2014 paper. It will change how you think about data structure and make you faster at diagnosing why a particular analysis is difficult (the answer is usually: the data isn't in the right shape).
- If you're ready for the next chapter: Chapter 10 (Working with Text Data) will teach you the string cleaning techniques that make merging practical: standardizing country names, extracting codes from messy fields, and pattern matching with regular expressions. The merge failures caused by string mismatches in this chapter are exactly what Chapter 10 solves.
- If you want to see these techniques in a visualization context: Preview Chapter 15 (matplotlib) or Chapter 16 (seaborn). Both plotting libraries work best with long-format data, so the melt and groupby skills from this chapter are direct prerequisites for effective visualization.
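If you take up the merging practice suggested above, one way to find key alignment problems is an outer merge with `indicator=True`. The sketch below uses two tiny hypothetical tables with placeholder values (not real WHO or World Bank data), and the `unmatched` helper is our own name, not a pandas function.

```python
import pandas as pd

# Placeholder data: note the stray space and lowercase in " bra".
health = pd.DataFrame({
    "country_code": ["FRA", "DEU", " bra", "KEN"],
    "health_score": [1.0, 2.0, 3.0, 4.0],
})
economy = pd.DataFrame({
    "country_code": ["FRA", "DEU", "BRA", "JPN"],
    "gdp_index": [10.0, 20.0, 30.0, 40.0],
})


def unmatched(left, right, key):
    """Return the keys that appear in only one of the two tables."""
    merged = left.merge(right, on=key, how="outer", indicator=True)
    return merged.loc[merged["_merge"] != "both", [key, "_merge"]]


before = unmatched(health, economy, "country_code")

# Basic string cleanup: strip whitespace and normalize case.
health["country_code"] = health["country_code"].str.strip().str.upper()
after = unmatched(health, economy, "country_code")

print(before)
print(after)
```

Before cleaning, four keys fail to match. After cleaning, only the two countries that really are missing from one table remain. Checking the `_merge` column this way separates fixable string mismatches from real gaps in coverage.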