Chapter 25 Further Reading: Descriptive Statistics for Business Decisions

DataField.Dev

Affiliate disclosure

Book titles on this page link to Amazon. As an Amazon Associate, DataField.Dev earns from qualifying purchases — at no additional cost to you.

Books

Accessible Business Statistics

"Naked Statistics: Stripping the Dread from the Data" — Charles Wheelan (W. W. Norton, 2013)

The most readable introduction to statistical thinking for non-statisticians. Wheelan uses real-world examples and plain English to build genuine intuition for concepts like correlation, regression, and probability. Highly recommended as a companion to this chapter — read it on a plane, not at a desk. The chapter on the pitfalls of statistics (sampling bias, selection bias, publication bias) is worth the price alone.

"How to Lie with Statistics" — Darrell Huff (W. W. Norton, 1954, reprint)

A classic that is still entirely relevant. Short, punchy, and full of examples of how statistics can be selected and presented to mislead. Understanding these tricks from the "how to do it" angle makes you exceptionally good at spotting them when others do it to you. Read it in an afternoon.

"The Art of Statistics: How to Learn from Data" — David Spiegelhalter (Basic Books, 2019)

A more rigorous treatment than Wheelan, but still written for general audiences. Spiegelhalter, a former president of the Royal Statistical Society, writes with clarity and humor. Excellent on uncertainty, risk communication, and why we need to be humble about what data can tell us. The early chapter on summarizing and communicating numbers directly extends what you have learned here.

"Thinking, Fast and Slow" — Daniel Kahneman (Farrar, Straus and Giroux, 2011)

Not a statistics book per se, but essential reading for anyone making data-driven decisions. Kahneman's exploration of cognitive biases — including the tendency to over-weight vivid stories relative to statistical summaries, and to confuse small samples with reliable evidence — explains why the discipline of "so what" in this chapter matters. The chapter on regression to the mean alone is worth the read.

"Signal and the Noise: Why So Many Predictions Fail — But Some Don't" — Nate Silver (Penguin Press, 2012)

Silver built the FiveThirtyEight model and has thought deeply about prediction and uncertainty. While the book focuses more on forecasting (relevant to Chapter 26), his treatment of descriptive statistics, baselines, and what data is actually "signal" versus noise is directly useful here. The chapters on baseball and political forecasting are particularly engaging.

Online Resources

pandas Documentation: Statistical Functions https://pandas.pydata.org/docs/reference/frame.html#statistics

The official reference for every statistical method available on pandas DataFrames. Well-organized with examples. Bookmark the describe(), quantile(), corr(), and agg() sections.
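
As a rough sketch of how the four methods named above fit together, assuming a small made-up sales table (the column names and figures are illustrative and do not come from the chapter):

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({
    "region": ["East", "East", "West", "West", "West"],
    "units": [10, 14, 8, 12, 16],
    "revenue": [200.0, 280.0, 160.0, 240.0, 320.0],
})
numeric = df[["units", "revenue"]]

summary = numeric.describe()                          # count, mean, std, min, quartiles, max
p90_revenue = df["revenue"].quantile(0.90)            # 90th percentile, linear interpolation
units_revenue_corr = df["units"].corr(df["revenue"])  # Pearson correlation by default
overall = numeric.agg(["mean", "median", "max"])      # several statistics in one call

print(summary)
print(p90_revenue, units_revenue_corr)
print(overall)
```

Selecting the numeric columns first keeps the text column out of the calculations; agg() is the method to reach for when you want a custom set of statistics rather than the fixed set describe() returns.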

scipy.stats Documentation https://docs.scipy.org/doc/scipy/reference/stats.html

The full reference for scipy's statistics module. Start with the descriptive statistics section (describe, skew, kurtosis, zscore). Useful for when you outgrow what pandas provides and need more advanced calculations.
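
The four functions mentioned above can be tried on a handful of numbers. The sketch below assumes a made-up array of order values with one deliberately extreme entry; the variable names are illustrative.

```python
import numpy as np
from scipy import stats

# Hypothetical order values; the last one is deliberately extreme.
orders = np.array([20.0, 22.0, 19.0, 21.0, 23.0, 20.0, 95.0])

result = stats.describe(orders)           # nobs, minmax, mean, variance, skewness, kurtosis
skewness = stats.skew(orders)             # positive means a longer right tail
excess_kurtosis = stats.kurtosis(orders)  # Fisher definition: a normal distribution scores 0
z = stats.zscore(orders)                  # uses the population standard deviation (ddof=0) by default

print(result.nobs, result.minmax, round(result.mean, 2))
print(round(skewness, 2), round(excess_kurtosis, 2))
print(np.round(z, 2))
```

One detail worth checking in the reference: describe() reports the sample variance (ddof=1) by default while zscore() defaults to ddof=0, so the two do not use the same denominator unless you set ddof explicitly.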

Khan Academy: Statistics and Probability https://www.khanacademy.org/math/statistics-probability

Free, video-based coverage of every topic in this chapter with interactive practice problems. Particularly strong on measures of center and spread, box plots, and correlation. Useful for filling in conceptual gaps or reviewing foundational math that the chapter assumes.

Statistics How To https://www.statisticshowto.com

A well-organized reference site for statistical concepts with plain-English explanations. When you encounter an unfamiliar statistical term in a pandas output or a data science article, this is usually the fastest place to look it up. Covers everything from coefficient of variation to Simpson's Paradox to z-scores with worked examples.

Towards Data Science (Medium) https://towardsdatascience.com

A large, practitioner-focused blog on data science and analytics. Quality varies from author to author, but many of the descriptive statistics and exploratory data analysis (EDA) articles are strong. Search for "EDA business dataset" or "descriptive statistics pandas" to find applied tutorials that complement what you have learned here.

Specific Articles Worth Reading

"How to Detect Outliers in Machine Learning" — Jason Brownlee, Machine Learning Mastery A clear, practical walkthrough of IQR and z-score outlier detection with Python code. While the context is machine learning, the techniques are identical to what you use in business analytics.

"Simpson's Paradox in Real Life" — Eric Topol, Nature Medicine (and numerous blog adaptations) The Berkeley admissions case (the most famous real-world example of Simpson's Paradox) has been written about extensively. Search for "Berkeley admissions Simpson's Paradox" to read the original story and several modern retellings with clear explanations.

"The Inspection Paradox is Everywhere" — Allen Downey, Probably Overthinking It blog A related statistical trap not covered in this chapter: why samples you draw are often biased toward overrepresenting high-frequency or high-visibility events. Relevant for any analyst trying to understand customer behavior from observed data.

Python Libraries and Tools

Seaborn https://seaborn.pydata.org

A higher-level visualization library built on top of matplotlib. The seaborn.histplot(), seaborn.boxplot(), and seaborn.heatmap() functions produce publication-quality charts with less code than matplotlib. Once you are comfortable with the matplotlib examples in this chapter, explore seaborn for faster, more polished results.

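import seaborn as sns

# Correlation heatmap in one line. Assumes df is an existing pandas DataFrame.
# numeric_only=True skips text columns, which df.corr() rejects in pandas 2.0+.
sns.heatmap(df.corr(numeric_only=True), annot=True, cmap="RdYlGn", center=0)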

Plotly Express https://plotly.com/python/plotly-express/

Interactive charts you can embed in web applications or Jupyter notebooks. Particularly good for box plots and violin plots where interactivity (hover to see exact values, click to isolate groups) adds significant value.

Practice Datasets

The following public datasets are well-suited for practicing the techniques in this chapter:

Kaggle — Superstore Sales Dataset

A fictional retail dataset with sales, profit, discount, and category columns. Excellent for practicing regional comparisons, product margin analysis, and outlier detection. Search "superstore dataset kaggle."

UCI Machine Learning Repository — Bike Sharing Dataset

Daily bike rental counts with weather, season, and calendar features. Good for practicing distribution analysis, correlation with weather variables, and identifying seasonal patterns (a preview of Chapter 26).

data.world — Small Business Datasets

Curated collection of real small-business datasets from various industries. Filter by "sales" or "customer" to find datasets directly relevant to the business context of this book.

Going Deeper: When You Are Ready

If you find that descriptive statistics leaves you hungry for more analytical power, the natural next steps are:

Inferential statistics: Moving from describing your sample to making claims about the broader population (confidence intervals, hypothesis testing). See Chapter 27 of this book.
A/B testing: Applying statistical testing to business experiments. The foundational question: "Is this difference real or just random variation?" Chapter 28 covers this.
Regression analysis: Building on correlation, regression quantifies how strongly one variable is associated with another and lets you make predictions. See Chapter 29.

The statistics in this chapter are the vocabulary. The chapters ahead are the grammar.