Chapter 13 Further Reading: Neural Networks Demystified


Neural Network Foundations

1. Nielsen, M. (2015). Neural Networks and Deep Learning. Determination Press. [Online: neuralnetworksanddeeplearning.com] The single best free resource for understanding neural networks from first principles. Nielsen explains backpropagation, gradient descent, and network architectures with exceptional clarity, using visualizations and interactive examples. Written for readers with modest mathematical backgrounds, it bridges the gap between the purely conceptual approach of this chapter and the technical depth of a machine learning textbook. Start with Chapters 1-2 for a natural extension of the concepts covered here.
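The backpropagation idea at the heart of Nielsen's book — pushing the chain rule backward through a network — can be sketched for a single sigmoid neuron. This is an illustrative toy, not code from the book: the names (`w`, `b`, `x`, `y`) and the squared-error loss are assumptions chosen for simplicity.

```python
import math

# One sigmoid neuron: prediction a = sigmoid(w*x + b),
# squared-error loss L = (a - y)^2. Backpropagation is just the
# chain rule: dL/dw = dL/da * da/dz * dz/dw.
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss_and_grads(w, b, x, y):
    z = w * x + b
    a = sigmoid(z)
    loss = (a - y) ** 2
    dL_da = 2 * (a - y)        # outermost chain-rule factor
    da_dz = a * (1 - a)        # derivative of the sigmoid
    dL_dw = dL_da * da_dz * x  # dz/dw = x
    dL_db = dL_da * da_dz      # dz/db = 1
    return loss, dL_dw, dL_db

# Sanity check: the analytic gradient should match a numerical one.
w, b, x, y = 0.5, -0.2, 1.5, 1.0
loss, dw, db = loss_and_grads(w, b, x, y)
eps = 1e-6
num_dw = (loss_and_grads(w + eps, b, x, y)[0] - loss) / eps
print(abs(dw - num_dw) < 1e-4)  # backprop agrees with finite differences
```

Comparing the chain-rule gradient against a finite-difference estimate is exactly the kind of check Nielsen encourages when building intuition for why backpropagation is correct.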

2. Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press. [Online: deeplearningbook.org] The definitive academic textbook on deep learning, written by three of the field's leading researchers. Substantially more mathematical than this chapter, but Part I (Applied Math and Machine Learning Basics) and Part II (Deep Networks: Modern Practices) are accessible to readers with undergraduate-level linear algebra and probability. The free online version makes it a valuable reference even if you only read selected chapters. Essential for anyone who wants to go deeper than the conceptual level.

3. 3Blue1Brown. (2017). "Neural Networks" [YouTube series, 4 videos]. Grant Sanderson's animated video series on neural networks is the best visual explanation of how neural networks learn. The series covers neurons, layers, gradient descent, and backpropagation using elegant geometric visualizations that make abstract concepts tangible. At roughly 20 minutes per video, it is the most efficient way to reinforce the concepts from this chapter. Particularly recommended for visual learners and for readers who want to see gradient descent "in action."
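The "gradient descent in action" that the videos animate amounts to repeatedly stepping downhill on a loss surface. A minimal one-variable sketch (the bowl-shaped loss, learning rate, and step count are illustrative choices, not taken from the series):

```python
# Minimize L(w) = (w - 3)^2 by gradient descent: repeatedly step
# opposite the gradient dL/dw = 2*(w - 3).
def gradient_descent(w0, lr=0.1, steps=100):
    w = w0
    for _ in range(steps):
        grad = 2 * (w - 3)  # slope of the loss at the current w
        w -= lr * grad      # step downhill, scaled by the learning rate
    return w

print(gradient_descent(w0=0.0))  # converges toward the minimum at w = 3
```

Real networks apply the same update rule to millions of weights at once, with the gradients supplied by backpropagation.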

4. Karpathy, A. (2015). "The Unreasonable Effectiveness of Recurrent Neural Networks." [Blog post: karpathy.github.io] Andrej Karpathy (later Tesla's Director of AI, then at OpenAI) demonstrates RNNs learning to generate Shakespeare, Wikipedia articles, LaTeX code, and C source code — character by character. The post is both entertaining and illuminating, showing how recurrent networks learn sequential patterns. It provides an intuitive feel for what it means for a neural network to "learn" — and for the difference between memorization and generation.


Transformer Architecture and Attention

5. Vaswani, A., Shazeer, N., Parmar, N., et al. (2017). "Attention Is All You Need." Advances in Neural Information Processing Systems (NeurIPS), 30. The paper that introduced the transformer architecture. While the paper is technical, the core idea — replacing recurrence with self-attention — is explained clearly enough that readers with the foundation from this chapter can follow the high-level argument. Understanding this paper's contribution (even at a conceptual level) is essential for anyone who wants to understand why modern AI looks the way it does. Over 130,000 citations as of 2026.
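The paper's core operation — scaled dot-product self-attention, softmax(QK^T / sqrt(d)) V — can be sketched in a few lines of NumPy. This is a single-head toy with randomly initialized weights, omitting the paper's multi-head projections, masking, and positional encodings; the matrix sizes are illustrative assumptions.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over a sequence X.

    Each row of X is one token's embedding; every token attends to
    every token, weighted by query-key similarity.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                   # pairwise similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: rows sum to 1
    return weights @ V                              # mix values by attention

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                  # 4 tokens, 8-dim embeddings
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)                             # one contextualized vector per token
```

Because every token's output is a weighted mixture over the whole sequence, the computation is parallel across positions — the property that let transformers replace sequential recurrence.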

6. Alammar, J. (2018). "The Illustrated Transformer." [Blog post: jalammar.github.io] The most accessible visual explanation of the transformer architecture. Alammar walks through self-attention, multi-head attention, and the encoder-decoder structure using clear diagrams and step-by-step examples. If the transformer discussion in this chapter left you wanting more detail but the Vaswani et al. paper feels too technical, this blog post is the ideal intermediate step. Also recommended: Alammar's "The Illustrated GPT-2" and "The Illustrated BERT."

7. Phuong, M., & Hutter, M. (2022). "Formal Algorithms for Transformers." arXiv preprint arXiv:2207.09238. A concise, precise summary of the mathematical operations in transformer models, written as a reference document rather than a tutorial. Useful for business leaders who want to understand what transformers actually compute without wading through a full textbook. At 22 pages, it is remarkably efficient. Best read after the Alammar blog post for conceptual context.


Deep Learning in Practice

8. Howard, J., & Gugger, S. (2020). Deep Learning for Coders with fastai and PyTorch. O'Reilly Media. A top-down, practical introduction to deep learning that starts with working code and gradually reveals the underlying theory. Howard's philosophy — that practitioners should build working systems before studying mathematical foundations — aligns with this textbook's MBA audience. Even if you never write deep learning code, the first four chapters provide an excellent practical perspective on transfer learning, training, and evaluation. The fastai library makes state-of-the-art deep learning accessible with remarkably few lines of code.

9. Chollet, F. (2021). Deep Learning with Python (2nd ed.). Manning Publications. Written by the creator of the Keras deep learning framework, this book is the standard practical reference for building deep learning models. Chollet is unusually skilled at explaining both the "what" and the "why" of deep learning design decisions. Chapter 1 ("What is deep learning?") and Chapter 5 ("Fundamentals of machine learning") are valuable even for non-technical readers seeking to understand the practitioner's perspective.

10. Ng, A. (2021). "A Chat with Andrew on MLOps: From Model-Centric to Data-Centric AI." DeepLearning.AI. Andrew Ng, one of the most influential figures in practical AI, argues that the AI community has over-invested in model architecture innovation and under-invested in data quality. His "data-centric AI" framework — focusing on improving training data rather than model complexity — is directly relevant to the "start simple" principle discussed in this chapter. Available as a free lecture/interview online.


Case Study Background

11. Silver, D., Huang, A., Maddison, C. J., et al. (2016). "Mastering the Game of Go with Deep Neural Networks and Tree Search." Nature, 529(7587), 484-489. The original AlphaGo paper. Describes how deep neural networks (policy and value networks) were combined with Monte Carlo Tree Search to achieve superhuman Go performance. A landmark paper in AI history. The methods section is technical, but the introduction and discussion sections are accessible and worth reading for the strategic reasoning behind the approach.

12. Silver, D., Schrittwieser, J., Simonyan, K., et al. (2017). "Mastering the Game of Go Without Human Knowledge." Nature, 550(7676), 354-359. The AlphaGo Zero paper, describing how a system trained entirely through self-play — with no human game data — surpassed the original AlphaGo in just three days. Its central finding — that human knowledge can be a constraint on deep learning systems, not just an enabler — has profound implications for business applications where "the way we've always done it" may not be optimal.

13. Koopman, P., & Wagner, M. (2017). "Autonomous Vehicle Safety: An Interdisciplinary Challenge." IEEE Intelligent Transportation Systems Magazine, 9(1), 90-96. A clear-eyed assessment of the safety challenges in autonomous driving, written by Carnegie Mellon researchers with deep expertise in safety-critical systems. Provides useful context for evaluating Tesla's approach and any deep learning deployment in safety-critical domains. Particularly relevant for business leaders who must assess whether deep learning systems are "safe enough" for their application.

14. Karpathy, A. (2023). "State of GPT." [Talk, Microsoft Build 2023] An hour-long presentation by Andrej Karpathy (by then at OpenAI, after his tenure leading Tesla AI) explaining how large language models are trained, including pre-training, supervised fine-tuning, and reinforcement learning from human feedback. Connects the neural network training concepts from this chapter to the large language models covered in Chapter 17. One of the most accessible expert-level explanations of modern AI available.


Deep Learning Strategy and Business Implications

15. Agrawal, A., Gans, J., & Goldfarb, A. (2022). Power and Prediction: The Disruptive Economics of Artificial Intelligence. Harvard Business Review Press. The sequel to Prediction Machines (recommended in Chapter 1), this book examines how AI changes organizational decision-making at the system level — not just individual predictions but entire decision workflows. The analysis of when to adopt AI "point solutions" vs. redesigning systems around AI is directly relevant to the deep learning vs. traditional ML decision framework discussed in this chapter.

16. Grinsztajn, L., Oyallon, E., & Varoquaux, G. (2022). "Why Do Tree-Based Models Still Outperform Deep Learning on Typical Tabular Data?" Advances in Neural Information Processing Systems (NeurIPS), 35. An empirical study confirming what practitioners have long observed: on structured, tabular data, tree-based models (random forests, gradient-boosted trees) match or outperform deep learning in most cases, while being faster, cheaper, and more interpretable. Essential evidence for the "start simple" principle and the deep learning decision framework. Every business leader who hears "we need deep learning for everything" should be aware of this paper's findings.

17. Bommasani, R., Hudson, D. A., Adeli, E., et al. (2021). "On the Opportunities and Risks of Foundation Models." arXiv preprint arXiv:2108.07258. Stanford Institute for Human-Centered Artificial Intelligence. A comprehensive (200+ page) report examining "foundation models" — large pre-trained models (like GPT, BERT, and CLIP) that serve as the basis for many downstream applications. The report examines technical capabilities, societal impact, ethical considerations, and economic implications. Sections on transfer learning, adaptation, and the economics of pre-trained models are directly relevant to this chapter's discussion of transfer learning and GPU economics.

18. Patterson, D., Gonzalez, J., Le, Q., et al. (2021). "Carbon Emissions and Large Neural Networks." arXiv preprint arXiv:2104.10350. A Google Research paper quantifying the carbon footprint of training large neural networks. The paper finds that the environmental cost of AI training is significant but manageable — and that it can be dramatically reduced through hardware efficiency, data center location, and training optimization. Relevant for business leaders who must balance AI ambition with environmental sustainability commitments, a theme explored further in Chapter 30.


Historical and Conceptual Background

19. LeCun, Y., Bengio, Y., & Hinton, G. (2015). "Deep Learning." Nature, 521(7553), 436-444. A survey paper by three pioneers of deep learning (often called the "godfathers of deep learning," all three won the 2018 Turing Award). Accessible to non-specialists, it offers historical context, an overview of CNN and RNN architectures, and a vision for the future of the field. The Nature format imposes conciseness, making this one of the most efficient ways to get the expert perspective on deep learning's significance.

20. Marcus, G. (2018). "Deep Learning: A Critical Appraisal." arXiv preprint arXiv:1801.00631. A prominent critique of deep learning's limitations by NYU professor Gary Marcus. Marcus identifies ten challenges — including brittleness, inability to handle novel situations, and lack of transparency — that remain relevant. Reading this alongside the LeCun-Bengio-Hinton survey provides a balanced view of deep learning's strengths and weaknesses. Essential for developing the critical perspective this chapter encourages.

21. Sejnowski, T. J. (2018). The Deep Learning Revolution. MIT Press. A history of deep learning told by one of the field's participants — Terry Sejnowski co-invented the Boltzmann machine with Geoffrey Hinton in the 1980s. The book provides rich historical context for the ideas in this chapter, including the AI winters, the resurgence of neural networks, and the personalities behind the breakthroughs. Written for a general audience with a scientist's precision.


Practical Resources

22. Hugging Face. [huggingface.co] The leading platform for pre-trained models, datasets, and deep learning tools. Hosts thousands of models for NLP, computer vision, and audio processing — all available for transfer learning. Exploring the model catalog provides a practical feel for the breadth of pre-trained models available and the tasks they address. The documentation includes tutorials suitable for beginners.

23. Papers With Code. [paperswithcode.com] A community-maintained database linking machine learning papers to their implementations and benchmark results. Invaluable for understanding the state of the art in any deep learning subfield — what accuracy levels are achievable, what architectures are currently dominant, and how quickly the field is advancing. Business leaders can use it to verify vendor claims: "Our model achieves state-of-the-art performance" can be checked against the actual leaderboards.

24. Google Colab. [colab.research.google.com] A free, browser-based environment for running Python code with GPU access. For readers who want to experiment with neural networks hands-on — even without installing any software — Colab provides free (though limited) GPU access. Many tutorials and courses (including fast.ai and DeepLearning.AI) provide Colab notebooks that let you train and experiment with neural networks at no cost.


For foundational machine learning concepts referenced in this chapter, see the further reading for Chapters 7-12. For the applications of neural networks to text, images, and generative AI, see the further reading for Chapters 14, 15, and 17-18.