Chapter 22 Further Reading: No-Code / Low-Code AI


AutoML and Automated Machine Learning

1. He, X., Zhao, K., & Chu, X. (2021). "AutoML: A Survey of the State-of-the-Art." Knowledge-Based Systems, 212, 106622. The most comprehensive academic survey of AutoML techniques, covering neural architecture search, hyperparameter optimization, feature engineering automation, and meta-learning. While technical in places, the taxonomy of AutoML approaches is invaluable for understanding what different platforms actually do under the hood. Particularly useful for readers who want to evaluate vendor claims against the underlying research.

2. Feurer, M., Klein, A., Eggensperger, K., Springenberg, J. T., Blum, M., & Hutter, F. (2015). "Efficient and Robust Automated Machine Learning." Advances in Neural Information Processing Systems, 28. The foundational paper behind auto-sklearn, one of the first open-source AutoML systems. Feurer and colleagues describe how meta-learning, Bayesian optimization, and ensemble construction can be combined to automate the machine learning pipeline. Reading this paper provides a solid understanding of the algorithmic foundations that commercial platforms like DataRobot and H2O build upon.
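
For readers who want to see these ingredients in running form, here is a minimal sketch using auto-sklearn's scikit-learn-style interface; the dataset and time budgets are illustrative, and defaults vary by version.

    import autosklearn.classification
    from sklearn.datasets import load_breast_cancer
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    # Illustrative dataset; any tabular classification task works.
    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

    # Meta-learning warm-starts the search, Bayesian optimization explores
    # candidate pipelines, and the final predictor is a greedy ensemble of
    # the best candidates: the three ingredients the paper combines.
    automl = autosklearn.classification.AutoSklearnClassifier(
        time_left_for_this_task=300,  # total search budget, in seconds
        per_run_time_limit=30,        # budget per candidate pipeline
    )
    automl.fit(X_train, y_train)
    print(accuracy_score(y_test, automl.predict(X_test)))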

3. Erickson, N., Mueller, J., Shirkov, A., et al. (2020). "AutoGluon-Tabular: Robust and Accurate AutoML for Structured Data." arXiv preprint arXiv:2003.06505. Amazon's AutoGluon paper demonstrates that stacking and ensembling multiple models with minimal hyperparameter tuning can match or exceed the performance of extensively tuned individual models. The practical implications are significant: AutoML's strength lies less in finding the perfect model than in combining diverse models effectively. This paper explains why NK's stacked ensemble performed so well with so little effort.
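
A minimal sketch of that stacking workflow, using AutoGluon's TabularPredictor API; the CSV file names and the 'label' column are hypothetical, and the time budget is illustrative.

    from autogluon.tabular import TabularDataset, TabularPredictor

    # Hypothetical CSV files with a binary 'label' target column.
    train_data = TabularDataset('train.csv')
    test_data = TabularDataset('test.csv')

    # The 'best_quality' preset enables the bagging and multi-layer stack
    # ensembling the paper credits for strong out-of-the-box accuracy.
    predictor = TabularPredictor(label='label').fit(
        train_data, presets='best_quality', time_limit=600)

    # Ranked models; stacked ensembles typically sit at the top.
    print(predictor.leaderboard(test_data))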

4. Drozdal, J., Weisz, J. D., Wang, D., Dass, G., Yao, B., Zhao, C., Muller, M., Ju, L., & Su, H. (2020). "Trust in AutoML: Exploring Information Needs for Establishing Trust in Automated Machine Learning Systems." Proceedings of the ACM International Conference on Intelligent User Interfaces. A human-computer interaction study examining what information users need to trust AutoML-generated models. The findings reveal that transparency about feature engineering decisions, model selection rationale, and performance trade-offs significantly affects user trust — and that current AutoML interfaces often fail to provide this information adequately. Directly relevant to Professor Okonkwo's questions about whether NK can explain and defend her model.


No-Code/Low-Code AI Platforms

5. Polyzotis, N., Roy, S., Whang, S. E., & Zinkevich, M. (2018). "Data Lifecycle Challenges in Production Machine Learning: A Survey." SIGMOD Record, 47(2), 17-28. A Google research paper cataloging the challenges of managing data in production ML systems — data collection, data validation, data cleaning, feature engineering, and data monitoring. The paper argues that data management, not model building, is the primary bottleneck in production ML. This perspective is essential for understanding why AutoML's automation of model building, while valuable, addresses only part of the problem.
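
To make the data-management bottleneck concrete, here is a minimal sketch, in plain pandas, of the kind of schema and null checking the paper groups under data validation; the expected columns and dtypes are invented for illustration.

    import pandas as pd

    # Hypothetical contract describing what the training pipeline expects.
    EXPECTED = {'age': 'int64', 'income': 'float64', 'label': 'int64'}

    def validate(df):
        """Return a list of human-readable schema problems (empty if clean)."""
        problems = []
        for col, dtype in EXPECTED.items():
            if col not in df.columns:
                problems.append(f'missing column: {col}')
            elif str(df[col].dtype) != dtype:
                problems.append(f'{col}: expected {dtype}, got {df[col].dtype}')
        if df.isna().any().any():
            problems.append('null values present')
        return problems

    df = pd.DataFrame({'age': [34, 51], 'income': [52000.0, None], 'label': [1, 0]})
    print(validate(df))  # -> ['null values present']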

6. Wang, D., Weisz, J. D., Muller, M., Ram, P., Geyer, W., Dugan, C., Tausczik, Y., Samulowitz, H., & Gray, A. (2019). "Human-AI Collaboration in Data Science: Exploring Data Scientists' Perceptions of Automated Machine Learning." Proceedings of the ACM on Human-Computer Interaction, 3 (CSCW). A qualitative study of how professional data scientists perceive and interact with AutoML tools. The findings are nuanced: data scientists value AutoML for rapid prototyping and baseline establishment but resist its use for production models, citing concerns about transparency, control, and the risk of deskilling. This tension — between automation's efficiency and expertise's depth — is the central theme of Chapter 22.

7. Xin, D., Ma, L., Liu, J., Macke, S., Song, S., & Parameswaran, A. (2021). "Whither AutoML? Understanding the Role of Automation in Machine Learning Workflows." Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems. An empirical study of how data scientists use AutoML in practice, based on analysis of Jupyter notebook histories and interviews. The authors find that AutoML is used primarily for model selection and hyperparameter tuning — not for the harder steps of data preparation and feature engineering. This finding aligns with the chapter's argument that AutoML automates the middle of the pipeline but leaves the ends (problem definition and deployment) to humans.


Shadow AI and AI Governance

8. Samsung Semiconductor ChatGPT Data Leak. (2023). Multiple sources: Bloomberg (April 2023), The Economist (May 2023), TechCrunch (April 2023). The primary journalistic accounts of Samsung's data leakage incidents. Bloomberg's original reporting provided the most detailed timeline, while The Economist's analysis placed the incident in the broader context of enterprise AI governance challenges. Essential reading for Case Study 2 and for any business leader developing an AI usage policy.

9. Cyberhaven. (2023). "New Data Shows 11% of Data Employees Paste into ChatGPT Is Confidential." Cyberhaven Labs. A data-driven analysis of ChatGPT usage patterns based on Cyberhaven's data loss prevention platform. The finding that 11 percent of employee inputs to ChatGPT contained confidential information — and that the rate was increasing over time — quantifies the shadow AI risk that many organizations sense but cannot measure. The report also identifies the most common categories of sensitive data shared (source code, client data, financial information).

10. Salesforce. (2024). "Generative AI Snapshot Research Series." Salesforce Research. A multi-wave survey tracking enterprise adoption of generative AI tools, including rates of unauthorized usage. The 2024 wave found that 55 percent of generative AI users at work had not received formal approval — a finding that directly supports the chapter's argument about the prevalence of shadow AI. The survey also tracks changes in organizational policies over time, providing a useful longitudinal view of governance evolution.

11. NIST. (2023). Artificial Intelligence Risk Management Framework (AI RMF 1.0). National Institute of Standards and Technology. The US government's voluntary framework for managing AI risks, organized around four functions: Govern, Map, Measure, and Manage. While the full framework is the subject of Chapter 27, its treatment of organizational governance structures — including roles, responsibilities, and accountability mechanisms — is directly relevant to designing citizen data science programs. The framework's emphasis on context-appropriate risk management aligns with the tiered governance model presented in this chapter.


Citizen Data Science and Organizational Models

12. Gartner. (2024). "Predicts 2025: Data and Analytics Strategy." Gartner Research. Gartner's annual predictions report includes analysis of citizen data science program adoption, success rates, and common failure modes. The 2024 edition estimated that by 2025, citizen data scientists would produce more analytics deliverables than professional data scientists at 60 percent of large organizations — but also predicted that 40 percent of citizen data science initiatives would be scaled back due to governance failures. A sobering complement to the chapter's optimistic framing of democratization.

13. Davenport, T. H. (2020). "Beyond Unicorns: Educating, Classifying, and Certifying Business Data Scientists." Harvard Business Review Digital Articles. Davenport argues that the "unicorn" data scientist — equally skilled in statistics, programming, domain knowledge, and communication — is an unrealistic ideal, and that organizations should instead develop specialized roles with targeted training. His framework for classifying data science practitioners into distinct categories (from "data-literate managers" to "deep analyticians") provides a more nuanced alternative to the binary "data scientist vs. citizen data scientist" framing. Relevant for designing training and certification programs.

14. Ransbotham, S., Kiron, D., Gerbert, P., & Reeves, M. (2017). "Reshaping Business with Artificial Intelligence." MIT Sloan Management Review and Boston Consulting Group. An early but still relevant study of organizational factors that differentiate AI leaders from AI laggards. The finding that organizational learning, executive sponsorship, and cross-functional collaboration matter more than technology sophistication is directly applicable to the citizen data science program design. Leaders create environments where AI experimentation is encouraged and governed — not one or the other.


Build vs. Buy and Platform Strategy

15. Iansiti, M., & Lakhani, K. R. (2020). Competing in the Age of AI: Strategy and Leadership When Algorithms and Networks Run the World. Harvard Business Review Press. Iansiti and Lakhani's analysis of the "AI factory" — the organizational architecture that enables companies to operate AI at scale — provides strategic context for the build-vs-buy-vs-configure framework. Their argument that AI capabilities become more valuable when integrated into organizational operating models challenges the assumption that no-code platforms can deliver strategic advantage without deep organizational integration. Chapter 6's further reading also references this text.

16. Andrus, M., Spitzer, E., Brown, J., & Xiang, A. (2021). "What We Can't Measure, We Can't Understand: Challenges to Demographic Data Procurement in the Pursuit of Fairness." FAccT '21: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency. While not directly about no-code AI, this paper illuminates a critical limitation of automated approaches to fairness: the difficulty of obtaining the demographic data necessary to test for bias. AutoML platforms can automate model building but cannot automate bias detection without access to protected attribute data — data that organizations often do not collect, cannot collect, or should not collect without appropriate safeguards. Essential context for understanding why Athena's HR resume-screening model is so problematic.
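
The limitation is easy to demonstrate. The sketch below computes a simple demographic parity gap on toy data with a hypothetical 'gender' column; if that protected attribute was never collected, no amount of automation can produce this check.

    import pandas as pd

    # Toy screening decisions; in practice these come from the deployed model.
    df = pd.DataFrame({
        'gender': ['F', 'F', 'M', 'M', 'M', 'F'],
        'prediction': [1, 0, 1, 1, 1, 0],
    })

    # Selection rate per group: the share of positive (advance) decisions.
    rates = df.groupby('gender')['prediction'].mean()
    print(rates)

    # Demographic parity difference: gap between the most- and
    # least-favored groups. Without the 'gender' column, none of this runs.
    print(rates.max() - rates.min())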


Responsible AI and Bias in Automated Systems

17. Selbst, A. D., Boyd, D., Friedler, S. A., Venkatasubramanian, S., & Vertesi, J. (2019). "Fairness and Abstraction in Sociotechnical Systems." FAT* '19: Proceedings of the Conference on Fairness, Accountability, and Transparency. A seminal paper arguing that fairness in AI systems cannot be reduced to mathematical optimization — it requires attention to the social context in which systems operate. The paper identifies five "traps" that technologists fall into when trying to make AI fair, including the "Ripple Effect Trap" (failing to account for how a system changes the social context it operates in). Directly relevant to the chapter's discussion of bias risks in citizen-built models and the limitations of automated fairness checks in AutoML platforms.

18. Raji, I. D., & Buolamwini, J. (2019). "Actionable Auditing: Investigating the Impact of Publicly Naming Biased Performance Results of Commercial AI Systems." Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society. A study of how public auditing of commercial AI systems (in this case, facial recognition) drove companies to improve their products. The paper's methodology — external auditing combined with public disclosure — provides a model for how organizations might audit their own AutoML-generated models for bias. The conclusion that automated systems require external accountability mechanisms reinforces the chapter's argument for governance structures around citizen data science.


Practical Guides and Industry Reports

19. Mollick, E. (2024). Co-Intelligence: Living and Working with AI. Portfolio. Ethan Mollick's guide to working with AI tools includes practical frameworks for evaluating when AI tools are appropriate, managing their limitations, and integrating them into professional workflows. His discussion of "centaur" models (human-AI collaboration) and "cyborg" models (deeply integrated human-AI workflows) provides vocabulary for thinking about how citizen data scientists and professional data scientists collaborate. Also referenced in Chapter 1's further reading.

20. O'Reilly Media. (2024). "AI Adoption in the Enterprise." O'Reilly Annual Survey. O'Reilly's annual survey of AI practitioners provides granular data on tool adoption, platform preferences, and organizational challenges. The 2024 survey found that AutoML tools were used by 38 percent of respondents (up from 25 percent in 2022) but that satisfaction varied significantly by platform and use case. The survey's breakdown by organization size, industry, and AI maturity level helps readers calibrate the chapter's recommendations to their own context.

21. Wilson, H. J., & Daugherty, P. R. (2018). "Collaborative Intelligence: Humans and AI Are Joining Forces." Harvard Business Review, 96(4), 114-123. An Accenture study arguing that the greatest performance improvements come not from AI alone or humans alone but from human-AI collaboration. The paper's framework for understanding how humans complement AI (training, explaining, sustaining) and how AI complements humans (amplifying, interacting, embodying) provides a lens for thinking about the relationship between citizen data scientists and AutoML platforms. The citizen data scientist does not replace the professional data scientist; the two roles complement each other.

22. Amershi, S., Begel, A., Bird, C., DeLine, R., Gall, H., Kamar, E., Nagappan, N., Nushi, B., & Zimmermann, T. (2019). "Software Engineering for Machine Learning: A Case Study." Proceedings of the 41st International Conference on Software Engineering: Software Engineering in Practice. A Microsoft Research study of the software engineering challenges specific to machine learning systems, including data management, model training, deployment, and monitoring. The paper documents ML's "hidden technical debt": maintenance costs that accumulate over time as models degrade, data pipelines break, and organizational knowledge is lost. Essential reading for understanding why one-click deployment from AutoML platforms is the beginning of the challenge, not the end.
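
As one concrete example of the post-deployment work the paper describes, the sketch below computes a Population Stability Index, a common drift statistic (not taken from the paper itself); the synthetic data and the 0.2 threshold are illustrative rules of thumb.

    import numpy as np

    def psi(expected, actual, bins=10):
        """Population Stability Index between a baseline and a live sample."""
        edges = np.histogram_bin_edges(expected, bins=bins)
        edges[0], edges[-1] = -np.inf, np.inf  # catch out-of-range live values
        e_pct = np.histogram(expected, bins=edges)[0] / len(expected)
        a_pct = np.histogram(actual, bins=edges)[0] / len(actual)
        e_pct = np.clip(e_pct, 1e-6, None)  # avoid log(0) on empty bins
        a_pct = np.clip(a_pct, 1e-6, None)
        return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

    rng = np.random.default_rng(0)
    baseline = rng.normal(0.0, 1.0, 5000)  # feature as seen at training time
    live = rng.normal(0.3, 1.2, 5000)      # the same feature in production
    print(psi(baseline, live))  # values above ~0.2 are commonly flagged as drift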


Online Resources and Tools

23. DataRobot Documentation and University. https://docs.datarobot.com/ DataRobot's comprehensive documentation and free learning resources provide hands-on exposure to enterprise AutoML. The DataRobot University courses cover platform usage, model interpretation, deployment, and governance features. Useful for the Try It exercises in the chapter and for anyone evaluating AutoML platforms.

24. H2O.ai Documentation and Tutorials. https://docs.h2o.ai/ H2O's open-source resources and Driverless AI documentation offer a complementary perspective to DataRobot's enterprise-focused approach. The open-source H2O-3 library is freely available and provides a good introduction to AutoML concepts within a Python coding environment. Recommended for readers who want to understand AutoML from both the no-code and the code-assisted perspectives.
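
For example, a minimal sketch of H2O-3's AutoML interface; 'train.csv' and its 'label' column are hypothetical, and the search budgets are illustrative.

    import h2o
    from h2o.automl import H2OAutoML

    h2o.init()  # start (or attach to) a local H2O cluster

    # Hypothetical CSV with a categorical 'label' target column.
    train = h2o.import_file('train.csv')
    train['label'] = train['label'].asfactor()  # treat the target as a class

    # Bounded search over GLMs, tree ensembles, deep learning, and
    # stacked ensembles built from the trained candidates.
    aml = H2OAutoML(max_models=20, max_runtime_secs=600, seed=1)
    aml.train(y='label', training_frame=train)

    print(aml.leaderboard.head())  # ranked models, ensembles usually on top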

25. Google Cloud Vertex AI AutoML Documentation. https://cloud.google.com/vertex-ai/docs/ Google's Vertex AI documentation covers AutoML training for tabular, image, and text data within the Google Cloud ecosystem. The pricing calculator and consumption-based model provide a useful contrast to DataRobot and H2O's license-based pricing. Practical for readers evaluating cloud-native AutoML options.


For foundational concepts in machine learning that AutoML automates, see Further Reading in Chapters 7-11. For AI governance frameworks referenced in this chapter, see Further Reading in Chapter 27. For bias detection and fairness concepts, see Further Reading in Chapters 25-26.