Chapter 4: Quiz

Data Strategy and Data Literacy


Multiple Choice

Q1. Which of the following best describes a data strategy?

a) A plan for purchasing and implementing data management tools
b) A comprehensive plan for how an organization collects, manages, and uses data to support business objectives
c) A set of policies governing data access and security
d) A roadmap for migrating data to the cloud

Q2. When Ravi Mehta asked "How many customers do we have?" and received three different answers, the root cause was primarily a failure of:

a) Data accuracy
b) Data consistency and governance
c) Data timeliness
d) Data security

Q3. Which of the following is NOT one of the six dimensions of data quality?

a) Accuracy
b) Scalability
c) Completeness
d) Uniqueness

Q4. In data governance, a data steward is best described as:

a) The senior executive accountable for a data domain's strategic decisions
b) A hands-on practitioner responsible for day-to-day data quality within a domain
c) An IT professional responsible for database administration and backups
d) An external auditor who reviews data compliance annually

Q5. The "golden record" concept in Master Data Management refers to:

a) The largest and most complete database in the organization
b) A single, authoritative representation of each entity that serves as the definitive version across the organization
c) The original source system where data was first created
d) A backup copy of data stored in a secure vault

Q6. Which data integration pattern is characterized by loading data into the target system in its raw format first and transforming it later within that system?

a) ETL (Extract, Transform, Load)
b) ELT (Extract, Load, Transform)
c) Data mesh
d) Data virtualization

Q7. According to the chapter, the average tenure of a Chief Data Officer is approximately:

a) 5 years
b) 4 years
c) 2.5 years
d) 1 year

Q8. Which data architecture pattern combines the flexible storage of a data lake with the structured query capabilities of a data warehouse?

a) Data mesh
b) Data lakehouse
c) Modern data stack
d) Data virtualization

Q9. The GDPR principle of "data minimization" requires organizations to:

a) Minimize the number of employees who can access data
b) Collect only the data necessary for a specific, stated purpose
c) Store data in the smallest possible file formats
d) Delete all data older than one year

Q10. Which of the following best describes "entity resolution" in the context of MDM?

a) Resolving conflicts between data governance policies
b) Determining whether two or more records in different systems refer to the same real-world entity
c) Deciding which database technology to use for master data
d) Resolving disputes between data owners about data definitions

Q11. Professor Okonkwo's revision of "garbage in, garbage out" to "garbage in, decisions out" emphasizes that:

a) AI models are incapable of producing useful output from imperfect data
b) Bad data produces outputs that look legitimate and are acted upon with unwarranted confidence
c) Data quality problems are always obvious to end users
d) Organizations should not use AI until their data is perfect

Q12. According to research cited in this chapter, what percentage of Fortune 1000 companies identified organizational culture (not technology) as their biggest barrier to becoming data-driven?

a) 32%
b) 55%
c) Nearly 80%
d) 95%

Q13. Which of the following is a key feature of a data catalog that distinguishes it from a data dictionary?

a) It defines data types and permissible values for individual fields
b) It provides searchable discovery across all data assets in the organization, including lineage and social features
c) It stores backup copies of all organizational data
d) It enforces data quality rules at the point of entry

Q14. In Ravi Mehta's Data Strategy Roadmap for Athena, what percentage of the Year 1 AI transformation budget did he propose allocating to data infrastructure?

a) 15%
b) 25%
c) 40%
d) 60%

Q15. Data literacy at the "foundational" level means an employee can:

a) Build machine learning models and evaluate their performance
b) Design data architecture and write SQL queries
c) Read and interpret charts, understand basic metrics, and recognize data quality red flags
d) Perform self-service analysis using BI tools and build dashboards


Short Answer

Q16. Explain why data quality problems are particularly damaging for machine learning models, compared to traditional business reporting. Use an example from the chapter to support your answer.

Q17. Describe the difference between a data strategy and data tactics. Why must the strategy level precede the tactical level?

Q18. CFO David Larsen described Ravi's proposed data infrastructure investment as "data plumbing." Grace Chen responded with the metaphor of building a penthouse on a cracked foundation. In your own words, explain why the "data plumbing" framing is dangerous for organizations pursuing AI transformation.

Q19. Name two reasons why data literacy programs commonly fail, and for each, suggest a specific countermeasure.

Q20. Explain the concept of "privacy by design" and give one example of how it could be applied during the design of a customer loyalty program.


True or False (with justification)

For each statement, indicate whether it is true or false and provide a one-sentence justification.

Q21. A data lake enforces a predefined schema before data is loaded.

Q22. The CDO role is primarily a technology leadership position, similar to the CIO.

Q23. Data silos sometimes exist for legitimate privacy and security reasons.

Q24. According to GDPR, organizations must be able to delete an individual's personal data upon request.

Q25. An organization should achieve perfect data quality before beginning any AI initiatives.


Answer Key

Q1. b
Q2. b
Q3. b: Scalability is not one of the six dimensions. The six are accuracy, completeness, consistency, timeliness, validity, and uniqueness.
Q4. b
Q5. b
Q6. b
Q7. c
Q8. b
Q9. b
Q10. b
Q11. b
Q12. c: The NewVantage Partners (Wavestone) 2024 survey found 79.8%.
Q13. b
Q14. c
Q15. c
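Several of the six dimensions from the Q3 answer can be measured as simple ratios. The following is a minimal sketch that scores a toy customer table on completeness, uniqueness, and validity. The records, field names, and the email pattern used as a validity rule are all hypothetical illustrations, not the chapter's own measurement method.

```python
import re

# Hypothetical customer records, used only for illustration.
customers = [
    {"id": 1, "email": "ana@example.com", "country": "US"},
    {"id": 2, "email": None, "country": "US"},
    {"id": 3, "email": "raj@example", "country": None},
    {"id": 3, "email": "raj@example", "country": None},  # same id entered twice
]

# Assumed validity rule: a deliberately simple email format check.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def completeness(rows, field):
    """Share of rows in which the field is populated."""
    return sum(r[field] is not None for r in rows) / len(rows)


def uniqueness(rows, key):
    """Number of distinct key values divided by the number of rows."""
    return len({r[key] for r in rows}) / len(rows)


def validity(rows, field, pattern):
    """Share of populated values that match the expected format."""
    values = [r[field] for r in rows if r[field] is not None]
    return sum(bool(pattern.match(v)) for v in values) / len(values)


print(completeness(customers, "email"))  # 0.75
print(uniqueness(customers, "id"))       # 0.75
print(validity(customers, "email", EMAIL_PATTERN))  # 1 of 3 populated emails is valid
```

Accuracy, consistency, and timeliness usually need a reference point (a trusted source, a second system, or a timestamp and an agreed freshness window), so they are harder to score from a single table like this one.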

Q16. ML models learn patterns from data and encode them in model weights. If the training data contains systematic quality issues — duplicates that skew distributions, inconsistencies that create contradictory signals, or accuracy errors that misrepresent reality — the model learns these distortions as if they were real patterns. Unlike a human analyst reviewing a report, the model cannot exercise judgment about whether a data point "seems wrong." Tom Kowalski's startup experienced this: a fraud detection model that was 94% accurate on clean test data dropped to 61% in production because the production data had a 23% duplicate rate.
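The way duplicates "skew distributions" can be shown with a toy example. The sketch below uses hypothetical transactions and assumes duplicates arise when the same transaction is ingested more than once; it shows how the repeated rows inflate the fraud rate a model would learn from, and how deduplicating on a transaction key restores it. It does not reproduce the startup's actual data or figures.

```python
# Hypothetical transactions: (transaction_id, is_fraud).
# Transaction "t2" was ingested three times by a faulty pipeline.
raw = [
    ("t1", False), ("t2", True), ("t2", True), ("t2", True),
    ("t3", False), ("t4", False), ("t5", False), ("t6", False),
]


def fraud_rate(rows):
    """Fraction of rows labelled as fraud."""
    return sum(is_fraud for _, is_fraud in rows) / len(rows)


def deduplicate(rows):
    """Keep only the first occurrence of each transaction_id."""
    seen = set()
    unique = []
    for txn_id, is_fraud in rows:
        if txn_id not in seen:
            seen.add(txn_id)
            unique.append((txn_id, is_fraud))
    return unique


clean = deduplicate(raw)
duplicate_rate = (len(raw) - len(clean)) / len(raw)

print(f"duplicate rate: {duplicate_rate:.0%}")              # 25%
print(f"fraud rate (raw): {fraud_rate(raw):.1%}")            # 37.5%
print(f"fraud rate (deduplicated): {fraud_rate(clean):.1%}")  # 16.7%
```

A model trained on the raw rows would see fraud more than twice as often as it really occurs, and would weight whatever features happen to describe "t2" far too heavily. Real duplicates rarely share a clean key, which is why entity resolution (Q10) usually has to match records on normalized names, addresses, or other attributes.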

Q17. A data strategy defines why and what — the business objectives that data should support and the capabilities needed to achieve them (e.g., "unify customer identity across all channels"). Data tactics define how and when — the specific tools, timelines, and implementation steps (e.g., "implement Segment CDP by Q3"). Strategy must precede tactics because tactics derive their priority and justification from strategy. Without strategy, an organization may invest in tools that solve problems nobody prioritized.

Q18. The "data plumbing" framing treats data infrastructure as unglamorous, utilitarian, and separate from the "real" AI work. This framing is dangerous because it suggests data infrastructure can be minimized or deferred in favor of model building. In reality, data infrastructure is AI infrastructure — models trained on poorly governed, inconsistent, or low-quality data will fail regardless of their algorithmic sophistication. The plumbing metaphor also obscures the strategic nature of data governance: decisions about data architecture, quality standards, and integration patterns shape what AI applications are even possible.

Q19. (1) Training without application — employees attend workshops but never use the skills because their daily workflow doesn't require them. Countermeasure: pair training with immediate, practical projects using real organizational data. (2) One-size-fits-all curriculum — forcing all employees through the same generic training regardless of role. Countermeasure: design role-specific training paths that address the specific data tasks each function performs (e.g., supply chain managers need different skills than marketing managers).

Q20. Privacy by design means embedding privacy protections into systems and processes from the outset, rather than adding them retroactively. For a customer loyalty program, privacy by design might include: collecting only the data necessary for the program's stated purpose (data minimization), providing clear opt-in consent at enrollment with a plain-language description of how data will be used, designing the database with access controls that restrict personal data to authorized personnel, and building automated data deletion for members who cancel their enrollment.
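Some of these privacy-by-design measures can be built directly into an enrollment flow. The following is a minimal sketch with a hypothetical `LoyaltyStore` class and hypothetical field names: it refuses enrollment without explicit opt-in consent, keeps only an allowlist of fields the program needs (data minimization), and deletes a member's record on cancellation. It illustrates the idea only and is not a compliance implementation.

```python
from dataclasses import dataclass, field

# Hypothetical allowlist: the only fields the loyalty program needs.
REQUIRED_FIELDS = {"name", "email"}
OPTIONAL_FIELDS = {"birth_month"}  # e.g., to send a birthday reward


@dataclass
class LoyaltyStore:
    members: dict = field(default_factory=dict)

    def enroll(self, member_id: str, form: dict, consent_given: bool) -> dict:
        """Store a member only with explicit consent and only allowlisted fields."""
        if not consent_given:
            raise ValueError("Enrollment requires explicit opt-in consent.")
        missing = REQUIRED_FIELDS - set(form)
        if missing:
            raise ValueError(f"Missing required fields: {sorted(missing)}")
        allowed = REQUIRED_FIELDS | OPTIONAL_FIELDS
        # Data minimization: anything outside the allowlist is never stored.
        record = {k: v for k, v in form.items() if k in allowed}
        self.members[member_id] = record
        return record

    def cancel(self, member_id: str) -> None:
        """Delete the member's personal data when they leave the program."""
        self.members.pop(member_id, None)


store = LoyaltyStore()
record = store.enroll(
    "m1",
    {"name": "Ana", "email": "ana@example.com", "home_address": "12 Main St"},
    consent_given=True,
)
print(record)  # {'name': 'Ana', 'email': 'ana@example.com'}
store.cancel("m1")
print(store.members)  # {}
```

Putting the allowlist in one place makes the program's data footprint easy to review, and any new field has to be added deliberately rather than collected by default.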

Q21. False: A data lake uses "schema on read," meaning data is loaded in raw format and structured at query time, unlike a data warehouse, which uses "schema on write."
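The difference between the two approaches is when the schema gets checked. The sketch below uses plain Python lists as stand-ins for a warehouse table and a lake, with a hypothetical two-column `SALES_SCHEMA`. The warehouse write rejects records that do not already fit, while the lake accepts any raw line and applies structure only when it is read.

```python
import json

# Hypothetical table schema: column name -> required Python type.
SALES_SCHEMA = {"order_id": str, "amount": float}


def write_to_warehouse(table: list, record: dict) -> None:
    """Schema on write: reject the record unless it already fits the schema."""
    for column, expected_type in SALES_SCHEMA.items():
        if not isinstance(record.get(column), expected_type):
            raise TypeError(f"{column!r} must be {expected_type.__name__}")
    table.append(record)


def write_to_lake(lake: list, raw_line: str) -> None:
    """Schema on read: store the raw payload unchanged, with no checks."""
    lake.append(raw_line)


def read_from_lake(lake: list) -> list:
    """Impose structure only at query time; skip lines that cannot be parsed."""
    rows = []
    for raw_line in lake:
        try:
            data = json.loads(raw_line)
            rows.append({"order_id": str(data["order_id"]),
                         "amount": float(data["amount"])})
        except (KeyError, TypeError, ValueError):  # JSONDecodeError is a ValueError
            continue  # a real pipeline might quarantine these lines instead
    return rows


lake = []
write_to_lake(lake, '{"order_id": 1, "amount": "19.99"}')
write_to_lake(lake, "not valid json")
print(read_from_lake(lake))  # [{'order_id': '1', 'amount': 19.99}]
```

The trade-off follows directly: the lake never loses a record at load time but pushes parsing and error handling onto every reader, while the warehouse guarantees clean rows at the cost of rejecting anything that arrives in an unexpected shape.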

Q22. False — The chapter argues that the CDO role is a business leadership position that encompasses governance, analytics, AI, and culture, not merely technology management. CDOs who are perceived as technology leaders rather than business leaders tend to be less effective.

Q23. True: HR data should not be freely accessible to marketing, and protected health information must be restricted to authorized users to meet HIPAA requirements. The challenge is distinguishing deliberate, policy-driven separation from accidental, historical fragmentation.

Q24. True — GDPR grants individuals the "right to be forgotten" (right to erasure), which requires organizations to delete personal data upon request, subject to certain exceptions (e.g., legal obligations to retain data).
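The "subject to certain exceptions" clause is what makes erasure more than a simple delete. The following is a minimal sketch in which a hypothetical `legal_hold` flag stands in for a legal obligation to retain a record. Actual GDPR exceptions are broader than a single flag and need legal interpretation, so treat the flag and the record categories as illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Record:
    subject_id: str
    category: str             # hypothetical, e.g. "marketing" or "invoice"
    legal_hold: bool = False  # assumed flag: retention required by law


def erase_subject(records: list, subject_id: str):
    """Return (remaining, erased) for one erasure request.

    The subject's records are erased unless a legal hold applies;
    records belonging to other subjects are left untouched.
    """
    remaining, erased = [], []
    for rec in records:
        if rec.subject_id == subject_id and not rec.legal_hold:
            erased.append(rec)
        else:
            remaining.append(rec)
    return remaining, erased


records = [
    Record("c42", "marketing"),
    Record("c42", "invoice", legal_hold=True),  # e.g., tax retention rules
    Record("c7", "marketing"),
]
remaining, erased = erase_subject(records, "c42")
print([r.category for r in erased])     # ['marketing']
print([r.category for r in remaining])  # ['invoice', 'marketing']
```

Returning the erased records separately makes it straightforward to log what was removed for each request, which helps the organization demonstrate that the request was honored.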

Q25. False — Waiting for perfect data quality is neither realistic nor necessary. Different AI use cases have different quality requirements. The goal is to achieve data quality sufficient for the intended application, not perfection. The chapter's Data Readiness Framework assesses quality relative to specific use cases.
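Assessing quality "relative to specific use cases" can be expressed as comparing the same measurements against different thresholds. The sketch below is not the chapter's Data Readiness Framework; the use cases and threshold values in `USE_CASE_THRESHOLDS` are hypothetical numbers chosen only to show how one dataset can be ready for one application and not for another.

```python
# Hypothetical minimum quality thresholds per AI use case (illustrative only).
USE_CASE_THRESHOLDS = {
    "demand_forecasting": {"completeness": 0.90, "uniqueness": 0.95},
    "fraud_detection": {"completeness": 0.98, "uniqueness": 0.99},
}


def readiness_gaps(measured: dict, use_case: str) -> dict:
    """Return {dimension: (measured, required)} wherever quality falls short."""
    thresholds = USE_CASE_THRESHOLDS[use_case]
    return {
        dim: (measured.get(dim, 0.0), minimum)
        for dim, minimum in thresholds.items()
        if measured.get(dim, 0.0) < minimum
    }


measured = {"completeness": 0.93, "uniqueness": 0.97}
print(readiness_gaps(measured, "demand_forecasting"))  # {}
print(readiness_gaps(measured, "fraud_detection"))
# {'completeness': (0.93, 0.98), 'uniqueness': (0.97, 0.99)}
```

An empty result means the data is good enough for that use case as measured; a non-empty one shows exactly which dimensions to improve first, rather than treating "perfect data" as the target.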