Chapter 4: Exercises
Data Strategy and Data Literacy
Recall and Comprehension
Exercise 4.1. Define data strategy in your own words. How does a data strategy differ from a data technology roadmap?
Exercise 4.2. Name and briefly describe the four pillars of data strategy presented in this chapter.
Exercise 4.3. List the six dimensions of data quality. For each, give one example of how a failure in that dimension could affect a retail company's AI initiatives.
Exercise 4.4. Distinguish among the three data governance roles: data owner, data steward, and data custodian. Who in a typical organization would fill each role?
Exercise 4.5. What is a "golden record" in the context of Master Data Management? Why is creating one more difficult than it might initially appear?
Exercise 4.6. Explain the difference between a data dictionary and a data catalog. When would an organization need both?
Exercise 4.7. List three reasons why data silos form in organizations. For each reason, suggest one structural change that could prevent or mitigate silo formation.
Exercise 4.8. What are the three mandates of a Chief Data Officer? Why do they sometimes conflict with one another?
Application
Exercise 4.9: Mini Data Audit. Choose a hypothetical mid-size company in an industry you know well (e.g., a regional hospital system, a restaurant chain, an e-commerce startup, or a financial services firm). Conduct a mini data audit by completing the following:
a) Identify the five most critical data domains for this organization (e.g., customer data, product data, financial data, employee data).
b) For each domain, estimate a data quality score (0–100%) across each of the six quality dimensions. Justify your estimates with specific scenarios.
c) Identify the three most likely data silos in this organization. For each, describe how the silo formed and what business impact it creates.
d) Recommend a prioritized action plan: which data domain should be addressed first, and why?
Exercise 4.10: Data Strategy One-Pager. You are a newly hired CDO at a 500-person SaaS company. The CEO has asked you to present a one-page data strategy at the next board meeting. Write that one-pager. It should include:
- A problem statement grounded in specific business pain points
- Three strategic priorities for the first 18 months
- Key metrics you will track to measure progress
- A brief statement on how data strategy connects to the company's AI ambitions
Exercise 4.11: Integration Pattern Selection. For each of the following scenarios, recommend the most appropriate data integration pattern (ETL, ELT, APIs, data mesh, or data virtualization) and explain your reasoning:
a) A hospital system that needs to combine electronic health records from five acquired clinics into a single reporting system for population health analytics.
b) An e-commerce company that needs real-time inventory checks across 12 warehouse locations when customers add items to their carts.
c) A global bank with 40+ business units, each with mature data engineering teams, that wants to enable cross-unit analytics without centralizing control.
d) A small nonprofit that needs to combine donor data from its website, email platform, and event management system for annual reporting.
Exercise 4.12: Privacy Impact Assessment. A fitness app company wants to use its users' health and workout data to train an AI model that predicts injury risk and recommends personalized recovery plans. Using the privacy principles from Section 4.10, analyze the following:
a) What data minimization challenges does this use case present?
b) How would purpose limitation apply if the company later wanted to sell anonymized aggregate data to insurance companies?
c) What data classification tier(s) would the relevant data fall into?
d) Draft a plain-language consent notice (100–150 words) that would satisfy the GDPR requirement for "freely given, specific, informed, and unambiguous" consent.
Exercise 4.13: Data Dictionary Creation. You have been given access to a dataset containing the following fields from a retail company's customer loyalty program: member_id, join_date, tier_status, points_balance, lifetime_spend, preferred_store, email_opt_in, last_purchase_date, referral_source, age_range, household_size, nps_score.
Create a complete data dictionary entry for each field. For each, specify: field name, data type, business definition, permissible values (or range), source system, and any data quality notes.
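One way to keep the twelve entries consistent is to record each one as a structured record with the six attributes listed above. The sketch below is illustrative only: the class name DictionaryEntry, the helper missing_attributes, and every value in the tier_status example (tier names, source system, quality note) are invented assumptions, not facts about the dataset.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class DictionaryEntry:
    """One data dictionary entry with the six attributes named in the exercise."""
    field_name: str
    data_type: str
    business_definition: str
    permissible_values: str
    source_system: str
    quality_notes: str = ""  # optional: not every field has known quality issues


# Attributes that must be filled in for an entry to count as complete.
REQUIRED = (
    "field_name",
    "data_type",
    "business_definition",
    "permissible_values",
    "source_system",
)


def missing_attributes(entry: DictionaryEntry) -> list[str]:
    """Return the names of required attributes that are blank."""
    record = asdict(entry)
    return [name for name in REQUIRED if not record[name].strip()]


# Hypothetical example entry: all values below are assumptions for illustration.
example = DictionaryEntry(
    field_name="tier_status",
    data_type="string (categorical)",
    business_definition="Current loyalty tier of the member.",
    permissible_values="Bronze | Silver | Gold (assumed tier names)",
    source_system="Loyalty platform (assumed)",
    quality_notes="Check for legacy tier names left over from past programs.",
)

print(missing_attributes(example))
```

A completeness check like missing_attributes is a small design choice worth copying: it makes it obvious when an entry has a name and a type but no business definition, which is the most common gap in real data dictionaries.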
Analysis
Exercise 4.14: Evaluating a Data Strategy. Read the following summary of a company's data strategy and identify at least five weaknesses:
"TechFlow Solutions has committed to becoming a data-driven organization. Our data strategy consists of the following initiatives: (1) Migrate all data to Snowflake by Q2. (2) Purchase Tableau licenses for all 300 employees. (3) Hire a data science team of five to build predictive models. (4) Collect as much customer data as possible to maximize our training dataset. (5) Launch a company-wide email announcing our new 'Data First' culture."
For each weakness you identify, suggest a specific improvement.
Exercise 4.15: CDO Organizational Positioning. A mid-size manufacturing company (2,000 employees, $800M revenue) is creating its first CDO position. The CEO has asked for your recommendation on:
a) Where should the CDO report — to the CEO, CIO, or CFO? Justify your recommendation with at least three arguments.
b) What should the CDO's first 90 days look like? Outline five specific actions.
c) What three metrics should the CDO be evaluated on after 12 months?
Exercise 4.16: Data Quality Cost Analysis. Athena Retail Group sends personalized email promotions based on customer purchase history. Due to a 15% duplicate rate in the customer database, some customers receive the same promotion multiple times. Estimate the financial impact of this problem using the following assumptions:
- Athena sends 12 promotional emails per year to each of its 2.8 million active customers
- Each email costs $0.03 to send (platform fees, creative production, etc.)
- Customer complaints about duplicate emails push the opt-out rate 2 percentage points above baseline
- Each opted-out customer represents $45 in lost annual email-driven revenue
- Resolving the duplicate problem would cost $2.1 million over 18 months
Calculate: (a) the annual cost of duplicates in wasted email spend, (b) the annual cost in lost revenue from excess opt-outs, (c) the total annual cost, and (d) the payback period for the remediation investment.
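The structure of this calculation can be sketched in code before working it by hand. The sketch below is one possible reading of the assumptions, not a model answer. It assumes that 15% of all sends are wasted on duplicate records, that the excess opt-out rate applies to the full active customer base unless optout_base is set, and that payback is simple (investment divided by annual savings, ignoring the 18-month spending schedule). The names duplicate_cost, payback_months, and optout_base are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DuplicateCost:
    wasted_email_spend: float   # part (a)
    lost_optout_revenue: float  # part (b)

    @property
    def total_annual_cost(self) -> float:  # part (c)
        return self.wasted_email_spend + self.lost_optout_revenue


def duplicate_cost(
    customers: float,
    emails_per_year: float,
    cost_per_email: float,
    duplicate_rate: float,
    excess_optout_rate: float,
    revenue_per_optout: float,
    optout_base: Optional[float] = None,
) -> DuplicateCost:
    """Annual cost of duplicates under the assumptions stated in the lead-in."""
    # Assumption: the excess opt-out rate applies to all active customers
    # unless a narrower base (e.g. only affected customers) is supplied.
    base = customers if optout_base is None else optout_base
    wasted = customers * emails_per_year * duplicate_rate * cost_per_email
    lost = base * excess_optout_rate * revenue_per_optout
    return DuplicateCost(wasted, lost)


def payback_months(investment: float, annual_saving: float) -> float:
    """Simple payback period in months, part (d)."""
    if annual_saving <= 0:
        raise ValueError("annual_saving must be positive")
    return 12 * investment / annual_saving


# Figures taken from the exercise statement.
athena = duplicate_cost(
    customers=2_800_000,
    emails_per_year=12,
    cost_per_email=0.03,
    duplicate_rate=0.15,
    excess_optout_rate=0.02,
    revenue_per_optout=45,
)

print(f"(a) wasted spend:  ${athena.wasted_email_spend:,.0f}")
print(f"(b) lost revenue:  ${athena.lost_optout_revenue:,.0f}")
print(f"(c) total annual:  ${athena.total_annual_cost:,.0f}")
print(f"(d) payback:       {payback_months(2_100_000, athena.total_annual_cost):.1f} months")
```

Rerunning the function with a different optout_base (for example, only the customers who actually receive duplicates) shows how sensitive the business case is to that single interpretive choice, which is worth stating explicitly in your answer.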
Exercise 4.17: Architecture Selection. A fast-growing direct-to-consumer (DTC) e-commerce brand ($50M revenue, 30 employees) currently stores all its data in Google Sheets, Shopify exports, and a Google Analytics dashboard. The CEO wants to "get serious about data" to support personalization and demand forecasting. The company has one data analyst and no data engineers.
a) Which data architecture pattern (warehouse, lake, lakehouse, modern data stack) would you recommend, and why?
b) What are the top three data quality issues you would expect to find in this company's current state?
c) Draft a 6-month data strategy roadmap with specific milestones for months 1, 3, and 6.
Research
Exercise 4.18: CDO Job Description Analysis. Find three real CDO or VP of Data job descriptions from job boards (LinkedIn, Indeed, etc.) for companies in different industries. For each:
a) Summarize the key responsibilities listed.
b) Identify whether the role emphasizes defense (governance/compliance), offense (analytics/AI), or transformation (culture/literacy).
c) Note the reporting structure if specified.
d) Compare the three job descriptions and identify common themes and notable differences.
Exercise 4.19: Data Regulation Landscape. Research the current state of data privacy regulation in your country (or a country of your choice). Answer the following:
a) What is the primary data privacy law or regulation?
b) How does it define "personal data"?
c) What are the key rights granted to individuals?
d) What are the penalties for non-compliance?
e) How does it compare to GDPR in scope and enforcement?
Exercise 4.20: Data Mesh in Practice. Research Zhamak Dehghani's data mesh concept. Find one published case study of a company that has implemented (or attempted to implement) data mesh. Summarize:
a) Why the company chose data mesh over a centralized approach.
b) How they organized domain teams and data products.
c) What challenges they encountered.
d) Whether they would describe the implementation as successful, and by what criteria.
Discussion Questions
Exercise 4.21. Professor Okonkwo distinguishes between "garbage in, garbage out" and "garbage in, decisions out." Why is the latter framing more dangerous? Can you think of a real-world example where AI outputs from bad data were acted on because they looked legitimate?
Exercise 4.22. CFO David Larsen objected to spending 40% of the AI budget on data infrastructure: "We approved $45M for AI, not for data plumbing." Is his objection unreasonable? If you were Ravi, how would you frame the data investment as an AI investment rather than a separate line item?
Exercise 4.23. Some organizations have adopted the principle "data is the new oil." Others have criticized this metaphor. What are the strengths and weaknesses of comparing data to oil? Propose a better metaphor if you disagree with this one.
Exercise 4.24. Should data literacy be a required competency for all MBA graduates, regardless of specialization? Make the case for and against.
Exercise 4.25. Apple has made privacy a central brand promise. Is this strategy available to all companies, or does it work only because Apple's business model does not depend on advertising revenue? Could a company like Meta or Google credibly adopt a similar positioning?
Exercise 4.26. Ravi recommends that Athena allocate 40% of Year 1 budget to data infrastructure. If you were Grace Chen, what conditions or milestones would you set to ensure this investment delivers value? How would you communicate this decision to the board of directors?
Exercise 4.27. The chapter identifies four stages for building a data-literate culture: executive commitment, role-specific training, data champions, and structural reinforcement. Which stage is most commonly skipped, and why? What happens when it is?
Integrative Exercise
Exercise 4.28: Data Strategy Presentation. Working in teams of 3–4, choose a real company (publicly traded, so financial information is available). Prepare a 15-minute presentation that includes:
a) Current State Assessment. Based on publicly available information (annual reports, earnings calls, press releases, job postings, news articles), assess the company's apparent data maturity. What evidence suggests strong or weak data governance?
b) Data Strategy Recommendations. Propose three strategic data initiatives for the company over the next 24 months. For each, explain the business case, estimated investment, expected outcomes, and key risks.
c) AI Readiness Evaluation. Using the Data Readiness Framework (Section 4.12), rate the company on each of the five readiness dimensions (accessibility, quality, governance, integration, ethics/compliance). Where is the company strongest? Where does it need the most work?
d) CFO Challenge. One team member plays the CFO. The presenting team must defend their recommendations against budget objections, timeline concerns, and the question "Why can't we just skip to AI?"
Reflection
Exercise 4.29. Think about your own organization (current or most recent employer). On a scale of 1–10, how would you rate its data literacy? What specific evidence supports your rating? If you could change one thing about how data is managed in that organization, what would it be?
Exercise 4.30. This chapter argues that data strategy is a business discipline, not a technology discipline. Do you agree? What are the risks of framing data strategy primarily as a business concern? What are the risks of framing it primarily as a technology concern?