Further Reading: Chapter 34
The Business of Data Science
The Economics of ML
1. Data Science for Business --- Foster Provost and Tom Fawcett (2013) The foundational text on connecting data science to business value. Chapter 7 ("Decision Analytic Thinking I: What Is a Good Model?") introduces the expected value framework used in this chapter. Chapter 8 extends it to cost-sensitive classification and threshold optimization. Despite its 2013 publication date, the framework has not aged. Every data scientist should read Chapters 7 and 8 before their first stakeholder meeting. O'Reilly.
2. "The Value of a Prediction" --- Agrawal, Gans, and Goldfarb (2018) From the authors of Prediction Machines. This paper formalizes the idea that ML models produce predictions, and predictions are only valuable when they change decisions. The authors introduce a decision-theoretic framework: the value of a prediction = the value of the decision improvement it enables, minus the cost of producing the prediction. Published in the American Economic Review, Papers & Proceedings.
3. Prediction Machines: The Simple Economics of Artificial Intelligence --- Agrawal, Gans, and Goldfarb (2018) A book-length treatment of the economics of AI from three economists at the Rotman School of Management. The core thesis: AI reduces the cost of prediction, and cheaper prediction changes the optimal decision strategy. Chapters 5--8 cover the expected value framework, the cost of errors, and when prediction is and is not valuable. Accessible to non-technical readers. Harvard Business Review Press.
4. "Artificial Intelligence for the Real World" --- Thomas Davenport and Rajeev Ronanki, Harvard Business Review (2018) A survey of 152 AI projects across industries, categorizing them by business objective: automating processes (cost reduction), generating insights (revenue growth), or engaging customers (experience improvement). The key finding: the most successful projects started with a clear business problem, not with a technology choice. The least successful projects started with "we need AI."
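The expected value framework from Provost and Fawcett (item 1), which also underpins the economics of prediction in items 2 and 3, reduces to a short calculation: weight each cell of the confusion matrix by its business value and the class priors. A minimal sketch; the churn-campaign numbers below are hypothetical, chosen only to illustrate the arithmetic:

```python
def expected_value_per_case(p_positive, tpr, fpr, benefit_tp, cost_fp,
                            cost_fn=0.0, benefit_tn=0.0):
    """Expected business value of acting on a classifier's predictions.

    Combines class priors, model performance (TPR/FPR), and a
    cost-benefit matrix, in the spirit of Provost and Fawcett's
    Chapter 7 framework.
    """
    p_negative = 1.0 - p_positive
    # Value from actual positives: true positives pay off, misses may cost.
    value_positives = p_positive * (tpr * benefit_tp - (1 - tpr) * cost_fn)
    # Value from actual negatives: false alarms cost, correct rejections may pay.
    value_negatives = p_negative * (fpr * -cost_fp + (1 - fpr) * benefit_tn)
    return value_positives + value_negatives

# Hypothetical churn campaign: 5% churn base rate, $200 saved per
# retained customer, $10 wasted per false-positive retention offer.
ev = expected_value_per_case(p_positive=0.05, tpr=0.60, fpr=0.10,
                             benefit_tp=200.0, cost_fp=10.0)
print(f"expected value per customer scored: ${ev:.2f}")
```

Running the same calculation across a grid of classification thresholds (each threshold implies a different TPR/FPR pair) is the threshold-optimization exercise Chapter 8 of item 1 describes.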
Stakeholder Communication
5. Storytelling with Data --- Cole Nussbaumer Knaflic (2015) The definitive guide to data visualization for communication. Knaflic's framework --- understand the context, choose an effective display, eliminate clutter, focus attention, think like a designer, tell a story --- applies directly to data science presentations. Chapters 3--5 on decluttering, focusing attention, and thinking like a designer will immediately improve your slides. Wiley.
6. The Pyramid Principle --- Barbara Minto (1987) The communication framework used by every major consulting firm. Minto's rule: start with the answer, then provide supporting evidence in a logical hierarchy. This inverts both the academic presentation order (context, methods, results, conclusion) and the order most data scientists default to (data, model, evaluation, recommendation). The Pyramid Principle is the single most impactful communication technique for data scientists presenting to executives.
7. Resonate --- Nancy Duarte (2010) A framework for building presentations that drive action. Duarte's structure: describe the current state, describe the desired future state, and alternate between the two to create tension that motivates change. Relevant for data scientists presenting strategic recommendations (e.g., "here is where we are without the model; here is where we could be with it"). Wiley.
8. "How to Communicate Results to Non-Technical Stakeholders" --- Cassie Kozyrkov, Medium (2019-2021) Kozyrkov (Google's former Chief Decision Scientist) writes extensively about the gap between statistical thinking and business thinking. Her articles on translating p-values, confidence intervals, and model performance into business language are directly applicable. Key articles: "Statistical Thinking for Non-Statisticians," "The What, Why, and How of A/B Testing," and "What Great Data Analysts Do."
Building a Data-Driven Culture
9. Competing on Analytics --- Thomas Davenport and Jeanne Harris (2007, updated 2017) The original business book on data-driven organizations. Davenport introduces the "analytics maturity model" (stages 1--5), which inspired the data maturity framework in this chapter. The updated edition covers AI and ML. Chapters 3--5 on organizational capabilities, culture, and leadership are directly relevant. Harvard Business Review Press.
10. Creating a Data-Driven Organization --- Carl Anderson (2015) A practitioner's guide to building data science teams and embedding them in organizations. Chapters cover team structure (centralized vs. embedded), hiring, stakeholder management, and project selection. The "data science project evaluation" framework --- impact, feasibility, data availability, organizational readiness --- is a useful checklist for prioritizing ML projects. O'Reilly.
11. "Why So Many Data Science Projects Fail to Deliver" --- Eric Colson, Harvard Business Review (2019) Colson (former VP of Data Science at Stitch Fix) argues that most data science projects fail not because of bad models but because of organizational dysfunction: unclear problem definitions, missing data infrastructure, and absent operational workflows. The article provides a framework for diagnosing why projects stall and recommendations for prevention.
Model Governance and Responsible Deployment
12. "Model Risk Management: Supervisory Guidance on Model Risk Management" --- Federal Reserve SR 11-7 (2011) The regulatory standard for model governance in financial services. SR 11-7 defines a model as "a quantitative method, system, or approach that applies statistical, economic, financial, or mathematical theories, techniques, and assumptions to process input data into quantitative estimates." It requires model validation, documentation, ongoing monitoring, and a governance framework. While written for banks, the principles apply to any production ML system.
13. "Model Cards for Model Reporting" --- Mitchell et al. (2019) Proposes a standardized documentation format for ML models: intended use, training data, evaluation metrics, ethical considerations, and limitations. Model cards are the model governance equivalent of nutrition labels. The paper provides a template and examples. Relevant to the model governance framework in this chapter. Published at FAT* 2019. Available on arXiv (1810.03993).
14. Responsible AI --- Virginia Dignum (2019) A comprehensive treatment of AI ethics, governance, and societal impact from a European perspective. Chapters 4--6 cover accountability frameworks, governance structures, and the organizational practices required for responsible AI deployment. The book complements Chapter 33's focus on fairness with a broader governance lens. Springer.
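The documentation format Mitchell et al. propose (item 13) organizes a model card into a fixed set of sections. A minimal skeleton might look like the following; every field value here is a placeholder, and the section names paraphrase the paper's headings:

```python
# A minimal model-card skeleton following the sections proposed by
# Mitchell et al. (2019). All field contents are illustrative placeholders.
model_card = {
    "model_details": {"name": "churn-classifier", "version": "1.0",
                      "owners": ["data-science-team"]},
    "intended_use": "Rank customers for retention outreach; not for "
                    "credit, employment, or other high-stakes decisions.",
    "factors": ["customer tenure", "region", "plan type"],
    "metrics": {"AUC": 0.81, "recall_at_top_decile": 0.44},
    "evaluation_data": "Held-out customer sample (placeholder)",
    "training_data": "Customer activity logs (placeholder)",
    "ethical_considerations": "Check performance parity across regions.",
    "caveats_and_recommendations": "Retrain quarterly; monitor for drift.",
}
```

In practice teams render this structure as a markdown or HTML page alongside the deployed model; the value is less the format than the discipline of filling in every section before launch.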
Career Development
15. Build a Career in Data Science --- Emily Robinson and Jacqueline Nolis (2020) The most practical guide to data science careers. Chapters on building a portfolio, interviewing, managing stakeholders, and navigating organizational politics. Chapter 9 ("Working with Stakeholders") and Chapter 10 ("Making Effective Presentations") are directly relevant to this chapter. The authors draw on their combined experience at companies including Etsy, T-Mobile, and Warby Parker. Manning.
16. "What Data Scientists Really Do, According to 35 Data Scientists" --- Hugo Bowne-Anderson, Harvard Business Review (2018) Interviews with 35 data scientists reveal that the role is 50% communication and stakeholder management, 30% data wrangling, and 20% modeling. The article is a useful corrective to the perception that data science is primarily about building models. It reinforces this chapter's thesis: the business skills are at least as important as the technical ones.
17. "The Data Science Portfolio: How to Build One and What to Include" --- Jeremie Harris, Towards Data Science (2019) A practical guide to building a data science portfolio that demonstrates business thinking. Harris emphasizes: start with a question (not a dataset), include an ROI estimate, show your decision process, and write clearly. The article provides before/after examples of portfolio project descriptions.
The A/B Testing Problem
18. Trustworthy Online Controlled Experiments --- Kohavi, Tang, and Xu (2020) The definitive reference on A/B testing at scale, written by the team that built Microsoft's experimentation platform (serving 200+ million users). Chapters 3--5 cover statistical foundations, sample size calculation, and interpreting results. Chapter 17 ("The Politics of Experimentation") directly addresses the scenario in Case Study 2: what to do when stakeholders want to launch despite inconclusive results. Cambridge University Press.
19. "Peeking at A/B Tests: Why It Matters and What to Do About It" --- Johari et al. (2017) Addresses the common practice of checking A/B test results before the planned sample size is reached (which inflates false positive rates). The authors propose "always valid" inference methods that allow continuous monitoring without inflating error rates. Relevant when stakeholders pressure the data science team to "just tell me the answer" before the test is complete. Published at KDD 2017.
20. "The Perils of A/B Testing in the Age of AI" --- Ron Kohavi, KDD 2020 Keynote Kohavi discusses common mistakes in A/B testing: insufficient sample size, multiple comparisons, ignoring network effects, and premature termination. The talk includes real-world examples from Microsoft and Amazon where incorrect A/B test interpretations led to bad product decisions. Available on YouTube.
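Sample size calculation, one of the statistical foundations Kohavi et al. cover (item 18), follows a standard normal-approximation formula for comparing two proportions. A sketch using only the standard library; the conversion-rate inputs are illustrative:

```python
from math import ceil
from statistics import NormalDist

def samples_per_variant(base_rate, mde_abs, alpha=0.05, power=0.8):
    """Approximate per-variant sample size for a two-proportion A/B test.

    Standard normal-approximation formula:
    n ~= (z_{1-alpha/2} + z_{power})^2 * (p1(1-p1) + p2(1-p2)) / (p2 - p1)^2
    where p1 is the baseline rate and p2 = p1 + minimum detectable effect.
    """
    p1, p2 = base_rate, base_rate + mde_abs
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided test
    z_power = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2)

# Detecting a 1-point absolute lift on a 10% conversion rate:
n = samples_per_variant(base_rate=0.10, mde_abs=0.01)
print(f"needed per variant: {n:,}")
```

Halving the minimum detectable effect roughly quadruples the required sample, which is why "just run it another week" conversations with stakeholders benefit from showing this arithmetic up front.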
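The peeking problem Johari et al. address (item 19) is easy to demonstrate by simulation: run A/A tests with no true effect, apply a z-test at several interim looks, and count how often any look appears "significant." A sketch, with trial counts kept small for speed:

```python
import random
from math import sqrt
from statistics import NormalDist

def peeking_false_positive_rate(n_trials=1000, n_pairs=1000,
                                n_peeks=10, alpha=0.05, seed=7):
    """Simulate A/A tests (no true effect) with repeated interim looks.

    At each peek we z-test the data so far; a trial counts as a false
    positive if ANY look crosses the nominal significance threshold.
    With repeated peeks the realized rate far exceeds alpha.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rng = random.Random(seed)
    checkpoints = {n_pairs * (k + 1) // n_peeks for k in range(n_peeks)}
    false_positives = 0
    for _ in range(n_trials):
        diff_sum = 0.0  # running sum of (treatment - control) differences
        for i in range(1, n_pairs + 1):
            diff_sum += rng.gauss(0, 1) - rng.gauss(0, 1)
            # Each difference has variance 2, so the z-statistic at
            # look i is the running sum divided by sqrt(2 * i).
            if i in checkpoints and abs(diff_sum / sqrt(2 * i)) > z_crit:
                false_positives += 1
                break
    return false_positives / n_trials

rate = peeking_false_positive_rate()
print(f"false-positive rate with peeking: {rate:.1%}")  # well above the nominal 5%
```

This is the inflation the "always valid" methods in item 19 are designed to eliminate, and it is a persuasive demo to show stakeholders who ask for early answers.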
How to Use This List
If you are preparing for your first stakeholder presentation, start with Knaflic (item 5) and Minto (item 6). They will transform how you structure and deliver results.
If you need to justify your model's ROI, read Provost and Fawcett Chapter 7 (item 1) and Agrawal et al. (item 3). The expected value framework is the foundation.
If you are building a data science portfolio, read Robinson and Nolis (item 15) and Harris (item 17). Both provide concrete guidance on demonstrating business thinking.
If you are navigating organizational politics around A/B testing, read Kohavi et al. Chapter 17 (item 18). It is the most practical treatment of the "stakeholders want to launch despite inconclusive results" scenario.
If you are building a model governance framework, start with Mitchell et al. (item 13) for documentation standards and SR 11-7 (item 12) for the regulatory perspective.
If you want to understand why data science projects fail, read Colson (item 11) and Davenport and Ronanki (item 4). Both provide diagnostic frameworks for organizational dysfunction.
This reading list supports Chapter 34: The Business of Data Science. Revisit the chapter to refresh its concepts before diving into these readings.