Further Reading: Exploring Data — Graphs and Descriptive Statistics

Recommended Resources for Data Visualization and Exploratory Data Analysis

Books (Start Here)

Tufte, E. R. (2001). The Visual Display of Quantitative Information (2nd ed.). Graphics Press. The classic of data visualization. Tufte coined terms like "chartjunk" (unnecessary visual clutter) and "data-ink ratio" (the share of a graphic's ink that presents actual data rather than decoration). His principles (show the data, erase non-data ink, avoid distortion) underpin most modern advice on graph design. The book is beautifully designed and reads like an art book. Even if you only look at the pictures, you'll come away with a sharper eye for effective visualization.

Cairo, A. (2019). How Charts Lie: Getting Smarter About Visual Information. W.W. Norton & Company. If Case Study 1 (misleading graphs) grabbed your attention, this is the book to read next. Alberto Cairo — a data journalism professor at the University of Miami — walks through dozens of real-world examples of deceptive graphs, from political ads to health infographics. He shows you exactly how each one misleads and teaches you to read graphs critically. Accessible, entertaining, and deeply practical.

Schwabish, J. (2021). Better Data Visualizations: A Guide for Scholars, Researchers, and Wonks. Columbia University Press. A practical guide to creating clear, effective visualizations — particularly useful if you're creating graphs for reports, presentations, or publications. Schwabish covers chart selection, color use, annotation, and layout. More focused on production than theory, making it a great companion to Tufte's more philosophical approach.

Wheelan, C. (2013). Naked Statistics: Stripping the Dread from the Data. W.W. Norton & Company. Chapters 2-3 ("Descriptive Statistics" and "Deceptive Description") cover the descriptive statistics concepts from this chapter (and Chapter 6) in the same conversational tone as this textbook. Wheelan's explanations of distributions, averages, and misleading statistics are some of the clearest in popular writing. If you enjoyed this chapter's style, you'll enjoy Wheelan.

Wilke, C. O. (2019). Fundamentals of Data Visualization. O'Reilly Media. A comprehensive, principle-based guide to choosing and creating visualizations. Available free online at clauswilke.com/dataviz/. Wilke covers every graph type in this chapter (and many more) with clear guidance on when each is appropriate. The "Directory of visualizations" chapter is especially useful as a quick reference. The book deliberately contains no code, so its advice applies equally whether you work in Python, R, or another tool.

Articles and Papers

Anscombe, F. J. (1973). "Graphs in Statistical Analysis." The American Statistician, 27(1), 17-21. One of the most famous short papers in statistics. Anscombe created four datasets, now known as Anscombe's Quartet, whose summary statistics are nearly identical (the same means, standard deviations, and correlation to two decimal places) but which look completely different when graphed. This paper is the single best argument for why you should always visualize your data before analyzing it. It's only five pages and entirely accessible to beginners. If you read one supplementary paper for this chapter, make it this one.
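
To see the quartet's point for yourself, the short sketch below computes the same summary statistics for all four datasets using only Python's standard library. The data values are the ones published in Anscombe's paper; the function and variable names (`pearson_r`, `summarize`, `quartet`) are our own choices for illustration.

```python
from math import sqrt
from statistics import mean, stdev

# Anscombe's four datasets. Sets I-III share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson_r(xs, ys):
    """Pearson correlation coefficient, computed directly from its definition."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def summarize(xs, ys):
    """Summary statistics rounded to two decimal places."""
    return {
        "mean_x": round(mean(xs), 2),
        "mean_y": round(mean(ys), 2),
        "sd_x": round(stdev(xs), 2),   # sample standard deviation
        "sd_y": round(stdev(ys), 2),
        "r": round(pearson_r(xs, ys), 2),
    }


summaries = {name: summarize(xs, ys) for name, (xs, ys) in quartet.items()}
for name, stats in summaries.items():
    print(name, stats)
```

Rounded to two decimals, all four rows print the same numbers. Plot the four (x, y) sets as scatterplots, with any tool you like, and the differences are immediate: a roughly linear cloud, a curve, a line with one outlier, and a vertical stack with one extreme point.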

Matejka, J., & Fitzmaurice, G. (2017). "Same Stats, Different Graphs: Generating Datasets with Varied Appearance and Identical Statistics through Simulated Annealing." Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, 1290-1294. A modern extension of Anscombe's Quartet. Starting from Alberto Cairo's dinosaur-shaped "Datasaurus" dataset, the authors generate a dozen more (a star, a circle, parallel lines, and others) whose means, standard deviations, and correlation all match to two decimal places. The paper demonstrates, unforgettably, that summary statistics can be wildly misleading. Search for "Datasaurus Dozen" to see the visualizations.

Heer, J., & Bostock, M. (2010). "Crowdsourcing Graphical Perception: Using Mechanical Turk to Assess Visualization Design." Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 203-212. A research study testing which visual encodings (position, length, angle, area, color) humans perceive most accurately, replicating Cleveland and McGill's classic 1984 experiments with online participants. The finding: position along a common scale (as in bar charts) is judged more accurately than angle (as in pie charts), which in turn is judged more accurately than area (as in bubble charts). This paper provides experimental support for preferring bar charts to pie charts.

Videos

Hans Rosling — "The Best Stats You've Ever Seen" (TED Talk, ~20 min) Rosling's legendary 2006 TED Talk uses animated bubble charts (built with his Gapminder software) to demolish assumptions about global health and development. It's the gold standard for what data visualization can accomplish as communication. If you saw this in Chapter 1, watch it again — you'll notice things you missed now that you understand distribution shapes and graph design.

Vox — "Why All World Maps Are Wrong" (YouTube, ~6 min) A short, engaging video about how the Mercator map projection distorts the relative sizes of countries, a visualization problem that parallels many of the distortions in this chapter. It's a vivid reminder that all visualizations involve choices, and those choices shape perception.

3Blue1Brown — "But What Does 'Random' Actually Mean?" (YouTube, ~14 min) While not directly about visualization, Grant Sanderson's video builds visual intuition about distributions and randomness. Watching his animations of how random processes create familiar distribution shapes prepares you beautifully for Chapter 10 (the normal distribution).

StatQuest with Josh Starmer — "Histograms Clearly Explained" (YouTube, ~5 min) A concise, animated explanation of how histograms work and why bin width matters. Starmer's videos are clear, no-nonsense, and perfectly pitched for statistics students. Watch this if you want a quick video refresher on Section 5.4.
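
If you'd like to experiment with bin width alongside the video, here is a minimal sketch that bins one dataset three different ways and prints text histograms. The exam scores are made up purely for illustration, the helper names (`bin_counts`, `text_histogram`) are our own, and the code assumes integer data with bins starting at a value you choose.

```python
# Hypothetical exam scores, invented for this illustration only.
scores = [52, 55, 58, 61, 63, 64, 66, 67, 68, 70, 71, 71, 72,
          73, 74, 75, 76, 78, 79, 81, 83, 85, 88, 91, 97]


def bin_counts(values, width, start):
    """Return {left_edge: count} for bins [left, left + width), empty bins included."""
    n_bins = (max(values) - start) // width + 1
    counts = {start + i * width: 0 for i in range(n_bins)}
    for v in values:
        left_edge = start + ((v - start) // width) * width
        counts[left_edge] += 1
    return counts


def text_histogram(values, width, start):
    """Print one row of '#' marks per bin (labels assume integer data)."""
    for left, n in bin_counts(values, width, start).items():
        print(f"{left:3d}-{left + width - 1:<3d} {'#' * n}")


for width in (25, 10, 5):
    print(f"bin width = {width}")
    text_histogram(scores, width, start=50)
    print()
```

With a width of 25 the data collapse into two bars and the shape disappears; with a width of 5 the picture gets bumpier and small gaps start to look meaningful. The data never change, only the binning does, which is why it pays to try more than one bin width before describing a distribution's shape.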

Interactive and Online Resources

Seeing Theory — "Basic Probability" and "Frequency Distributions" (seeing-theory.brown.edu) This beautiful interactive website from Brown University lets you create and manipulate histograms, experiment with bin widths, and build intuition about distribution shapes. The "Frequency Distributions" section is directly relevant to this chapter. Spend 15-20 minutes playing with the interactive tools — they build understanding faster than reading alone.

From Data to Viz (data-to-viz.com) An interactive decision tree that helps you choose the right graph type based on your data. Select the number and types of your variables, and the tool recommends appropriate visualizations with examples. This is the digital version of the graph selection guide from Section 5.9 — bookmark it.

The Pudding (pudding.cool) A digital publication that uses creative data visualization to tell stories about culture, politics, and society. Their interactive visual essays demonstrate what's possible when visualization is done well. Browsing a few articles will inspire you and expand your sense of what data visualization can accomplish beyond traditional charts.

Python Graph Gallery (python-graph-gallery.com) A comprehensive collection of Python visualization code examples organized by chart type. Need to make a grouped bar chart, a ridgeline plot, or a violin chart? Find the code here. Each example includes the data, the Python code, and the resulting graph. Invaluable as you build your Python visualization skills.

Matplotlib and Seaborn Documentation
- matplotlib.org/stable/tutorials/index.html (official matplotlib tutorials, from basic to advanced)
- seaborn.pydata.org/tutorial.html (seaborn's tutorial pages, with clear examples for every chart type)

Podcasts

Data Skeptic (dataskeptic.com) A weekly podcast covering data science and statistics concepts. Episodes on data visualization, exploratory data analysis, and common statistical mistakes connect well to this chapter. The "mini episodes" (10-15 min) are particularly accessible for introductory students.

Storytelling with Data (storytellingwithdata.com/podcast) Cole Nussbaumer Knaflic's podcast focuses on effective data communication — choosing the right chart, eliminating clutter, and designing for your audience. Every episode connects to the graph design principles from this chapter. Start with episodes on "decluttering" and "choosing an effective visual."

Looking Ahead

The concepts in this chapter are foundations for everything that follows:

Chapter 6 (Numerical Summaries): You'll quantify the "center" and "spread" you described verbally in this chapter, using means, medians, standard deviations, and percentiles. You'll also learn box plots, which draw an entire distribution from just five numbers (minimum, first quartile, median, third quartile, and maximum)
Chapter 7 (Data Wrangling): You'll use histograms and other graphs to check your data for problems — missing values, obvious errors, and distributions that don't look right
Chapter 10 (Normal Distribution): The symmetric, bell-shaped histogram you saw in the Productive Struggle exercise? That's the normal distribution — arguably the most important distribution in all of statistics
Chapter 22 (Correlation and Regression): Scatterplots — the graph type we previewed briefly in Section 5.9 — become the centerpiece of regression analysis
Chapter 25 (Communicating with Data): Everything you learned about effective graph design in this chapter and Case Study 1 comes back when you learn to present your findings
Chapter 27 (Ethical Data Practice): Misleading graphs from Case Study 1 reappear in the context of ethical responsibilities in data communication
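
As a small preview of the Chapter 6 item above, the five numbers behind a box plot can be computed with Python's standard library. This is only a sketch: the commute times are hypothetical, the function name `five_number_summary` is our own, and we use the "inclusive" quartile method from `statistics.quantiles`. Other textbooks and software packages use slightly different quartile conventions, so their answers may differ a little on small datasets.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    # quantiles(..., n=4) returns the three cut points that split the
    # sorted data into quarters: Q1, the median, and Q3.
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    return (min(values), q1, med, q3, max(values))


# Hypothetical commute times in minutes, invented for illustration.
commute_minutes = [12, 15, 18, 20, 22, 25, 27, 30, 65]

print(five_number_summary(commute_minutes))
```

Notice how the single long commute of 65 minutes shows up only in the maximum, while the quartiles and median stay put. That resistance to extreme values is one reason box plots are handy for skewed data.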