Further Reading: Chapter 27 — Customer Analytics and Segmentation
Foundational Books
"Loyalty Rules!" by Frederick F. Reichheld (Harvard Business School Press, 2001) Reichheld's earlier work established the business case for customer loyalty, including the data showing that small improvements in retention rates produce disproportionately large improvements in profitability. The underlying economics of CLV and why retention matters more than acquisition are explained clearly here. This book predates NPS but contains the thinking that led to it.
"The Loyalty Effect" by Frederick F. Reichheld (Harvard Business School Press, 1996) The companion to the above. Heavy on financial modeling of customer lifetime value in various industries. The case studies (particularly insurance and financial services) remain instructive for how to think about customer equity. The math has aged well even if the examples have not.
"Competing on Analytics" by Thomas H. Davenport and Jeanne G. Harris (Harvard Business School Press, 2007) A broader work on analytical competition, but Chapter 4's treatment of customer analytics and Chapter 8 on embedding analytics into business processes are directly relevant. Useful for understanding how customer analytics fits into a larger organizational capability.
"Customer Analytics For Dummies" by Jeff Sauro (Wiley, 2015) Do not let the title mislead you. This is a well-organized, genuinely practical guide to the full customer analytics toolkit with emphasis on behavioral metrics, survey-based insights, and statistical testing. Good companion to the code-heavy approach of this chapter.
Academic and Research Papers
"RFM and CLV: Using Iso-Value Curves for Customer Base Analysis" by Peter S. Fader, Bruce G.S. Hardie, and Ka Lok Lee (Journal of Marketing Research, 2005) The definitive academic treatment of RFM analysis and its relationship to CLV models. Fader and Hardie also developed the BG/NBD (Beta Geometric / Negative Binomial Distribution) model, which is a more statistically rigorous CLV model for non-contractual businesses. This paper is accessible to a non-statistician with patience.
"Counting Your Customers: Who Are They and What Will They Do Next?" by David Schmittlein, Donald Morrison, and Richard Colombo (Management Science, 1987) The original Pareto/NBD model — the precursor to BG/NBD. Important historically and conceptually. Shows that customer purchase behavior follows predictable statistical distributions, which enables CLV prediction even with limited data.
"How to Project Customer Retention" by Philip Kotler and William Baumol (Journal of the Academy of Marketing Science, 1971) Older but foundational. Introduced cohort analysis as a customer retention measurement tool. Useful for understanding where the framework came from.
Online Resources
Peter Fader's "Customer Centricity" course materials (Wharton School) Fader is one of the foremost researchers on CLV and customer-based corporate valuation. His Coursera course on customer analytics covers the statistical models underlying CLV estimation in depth. Freely available in audit mode at: coursera.org — search "Customer Analytics Wharton."
The "Lifetimes" Python Library Documentation
lifetimes is a Python package that implements the BG/NBD and Gamma-Gamma models for non-contractual CLV estimation — the statistically rigorous version of the simple CLV we used in this chapter. Installation: pip install lifetimes. Documentation and examples: https://lifetimes.readthedocs.io/
scikit-learn Clustering User Guide The official scikit-learn documentation for clustering algorithms, including K-Means, DBSCAN (useful when clusters are non-spherical), and hierarchical clustering. Includes worked examples and guidance on choosing between algorithms: https://scikit-learn.org/stable/modules/clustering.html
Towards Data Science: "RFM Analysis for Customer Segmentation" (multiple authors) A collection of practical tutorials on RFM implementation in Python, including variations like weighted RFM, time-windowed RFM, and hybrid approaches. Search the site for "RFM customer segmentation Python" to find current articles.
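As a rough illustration of the weighted RFM variation mentioned above, the sketch below scores each dimension from 1 to 5 by rank and then combines the three scores with weights. It is one possible formulation, not the method of any particular tutorial. The helper names, the weights (0.2, 0.4, 0.4), and the five customers are all hypothetical.

```python
def quantile_scores(values, bins=5, higher_is_better=True):
    """Score values 1..bins by rank; ties are broken by input order."""
    order = sorted(
        range(len(values)),
        key=lambda i: values[i],
        reverse=not higher_is_better,  # the worst value is ranked first
    )
    scores = [0] * len(values)
    for rank, idx in enumerate(order):
        scores[idx] = rank * bins // len(values) + 1
    return scores


def weighted_rfm(customers, weights=(0.2, 0.4, 0.4)):
    """customers maps name -> (recency_days, frequency, monetary)."""
    names = list(customers)
    recency, frequency, monetary = zip(*(customers[n] for n in names))
    r = quantile_scores(recency, higher_is_better=False)  # recent buyers score high
    f = quantile_scores(frequency)
    m = quantile_scores(monetary)
    w_r, w_f, w_m = weights
    return {
        name: {
            "R": r[i],
            "F": f[i],
            "M": m[i],
            "score": w_r * r[i] + w_f * f[i] + w_m * m[i],
        }
        for i, name in enumerate(names)
    }


# Hypothetical customers: (days since last order, order count, total spend).
customers = {
    "A": (5, 12, 900.0),
    "B": (40, 3, 150.0),
    "C": (90, 1, 40.0),
    "D": (15, 8, 1200.0),
    "E": (200, 2, 75.0),
}

scored = weighted_rfm(customers)
for name, row in sorted(scored.items(), key=lambda kv: -kv[1]["score"]):
    print(f"{name}: R={row['R']} F={row['F']} M={row['M']} weighted={row['score']:.1f}")
```

Changing the weights changes the ranking: with these figures, customer D outspends A, so a monetary-heavy weighting would put D first.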
Tools and Libraries
lifetimes — BG/NBD and Gamma-Gamma CLV models. The statistical upgrade to this chapter's simple CLV calculation. Particularly valuable for e-commerce and other non-contractual businesses, where customers can lapse without ever formally cancelling.
pip install lifetimes
scikit-learn — K-Means and other clustering algorithms used in this chapter. Also includes silhouette scoring for cluster validation.
pip install scikit-learn
plotly — Interactive versions of the charts built in this chapter. Particularly effective for the cohort heatmap (which benefits from hover-over tooltips showing exact retention numbers) and the customer scatter plot.
pip install plotly
seaborn — The heatmap() function used for cohort visualization. Also useful for distribution plots of RFM score components and segment-level comparisons.
pip install seaborn # Usually included with the Anaconda distribution
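The cohort heatmaps mentioned for plotly and seaborn both start from the same input: a cohort-by-period retention matrix. The sketch below builds one from a list of (customer, order date) pairs using only the standard library. The function names and the sample orders are hypothetical, and monthly cohorts are an assumption; the chapter's own data may call for a different period.

```python
from collections import defaultdict
from datetime import date


def month_index(d: date) -> int:
    """Map a date to a running month number so months can be subtracted."""
    return d.year * 12 + d.month


def month_label(index: int) -> str:
    """Inverse of month_index, formatted as YYYY-MM."""
    return f"{(index - 1) // 12:04d}-{(index - 1) % 12 + 1:02d}"


def cohort_retention(orders):
    """Return {cohort label: [retention rate per month since first order]}."""
    # Cohort = month of each customer's first order.
    first_month = {}
    for customer, day in orders:
        m = month_index(day)
        first_month[customer] = min(first_month.get(customer, m), m)

    # Distinct customers active in each (cohort, months since first order) cell.
    active = defaultdict(set)
    for customer, day in orders:
        cohort = first_month[customer]
        active[(cohort, month_index(day) - cohort)].add(customer)

    last_month = max(month_index(day) for _, day in orders)
    cohorts = sorted(set(first_month.values()))
    width = last_month - cohorts[0] + 1  # columns needed by the oldest cohort

    matrix = {}
    for cohort in cohorts:
        size = len(active[(cohort, 0)])
        row = []
        for offset in range(width):
            if cohort + offset > last_month:
                row.append(None)  # period not observed yet for this cohort
            else:
                row.append(len(active.get((cohort, offset), ())) / size)
        matrix[month_label(cohort)] = row
    return matrix


# Hypothetical orders for four customers.
orders = [
    ("c1", date(2024, 1, 5)), ("c1", date(2024, 2, 9)), ("c1", date(2024, 3, 2)),
    ("c2", date(2024, 1, 20)), ("c2", date(2024, 3, 15)),
    ("c3", date(2024, 2, 3)), ("c3", date(2024, 3, 8)),
    ("c4", date(2024, 2, 11)),
]

retention = cohort_retention(orders)
for cohort, row in retention.items():
    print(cohort, row)
```

Each row is a cohort and each column is the number of months since the first order. Cells for periods that have not happened yet are left as None so a plotting library can render them as blanks rather than as zero retention.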
Datasets for Practice
The Online Retail II dataset (UCI Machine Learning Repository) A real-world dataset containing all transactions from a UK-based online retail company between December 2009 and December 2011. 1,067,371 rows. Excellent for practicing RFM, cohort analysis, and CLV modeling with real data. Available at: https://archive.ics.uci.edu/ml/datasets/Online+Retail+II
The Instacart Online Grocery Shopping Dataset (2017) A large-scale real transaction dataset (3+ million orders from 200,000 users) useful for practicing frequency and product breadth analysis. Available on Kaggle: search "Instacart Market Basket Analysis."
Northwind Traders Database A classic sample database representing a fictional food import/export company. Available in SQLite format and as CSV exports. Useful for practicing customer analytics on a more modest, realistic B2B scale. Available through multiple GitHub repositories — search "Northwind SQLite."
Related Chapters in This Book
- Chapter 25 (Descriptive Statistics for Business Decisions) — The statistical foundations underlying the RFM scoring distributions
- Chapter 26 (Business Forecasting and Trend Analysis) — Extends cohort analysis with time series forecasting for customer count prediction
- Chapter 28 (Sales and Revenue Analytics) — Applies analytics to the sales pipeline: the mechanics by which customer relationships convert to revenue
- Chapter 31 (Marketing Analytics and Campaign Analysis) — Connects RFM segments to campaign targeting, attribution, and marketing ROI
- Chapter 33 (Introduction to Machine Learning for Business) — Extends the K-Means clustering approach with supervised learning for churn prediction
- Chapter 34 (Predictive Models: Regression and Classification) — Builds a formal churn prediction model using logistic regression