Chapter 15 Further Reading: Computer Vision for Business
Foundations of Computer Vision
1. Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press. Chapters 9-12. The definitive technical reference on deep learning, including comprehensive treatments of convolutional neural networks (Chapter 9), sequence modeling (Chapter 10), and practical methodology (Chapter 11). Chapters 9 and 12 are particularly relevant to this chapter's discussion of CNN architecture and computer vision applications. More mathematical than this textbook requires, but invaluable as a reference when you need to go deeper. Available free online at deeplearningbook.org.
2. Chollet, F. (2021). Deep Learning with Python, 2nd ed. Manning Publications. Chapters 8-9. François Chollet, creator of the Keras deep learning library, provides the most accessible hands-on guide to building computer vision applications with Python. Chapters 8 (computer vision) and 9 (advanced CV techniques) walk through image classification, transfer learning, and object detection with complete code examples. Ideal for business professionals who want to move from conceptual understanding to practical implementation.
3. Szeliski, R. (2022). Computer Vision: Algorithms and Applications, 2nd ed. Springer. A comprehensive, updated survey of the computer vision field spanning feature detection, image segmentation, object recognition, and 3D reconstruction. More academic than Chollet but provides deeper theoretical grounding. The second edition includes substantial new material on deep learning approaches. Available free online from the author's website.
CNN Architecture and Transfer Learning
4. He, K., Zhang, X., Ren, S., & Sun, J. (2016). "Deep Residual Learning for Image Recognition." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 770-778. The paper that introduced ResNet — the residual network architecture that enabled training of much deeper neural networks by using skip connections. ResNet-50 remains one of the most widely used pre-trained models for transfer learning in business applications. Understanding why residual connections work (they allow gradients to flow through the network without vanishing) helps explain why modern CNNs can be so deep and so accurate.
5. Tan, M., & Le, Q. V. (2019). "EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks." Proceedings of the International Conference on Machine Learning, 6105-6114. The paper behind EfficientNet, which introduced a principled method for scaling CNN architectures across width, depth, and resolution. EfficientNet achieves state-of-the-art accuracy with significantly fewer parameters than previous models — a critical advantage for edge deployment and cost-sensitive applications. Directly relevant to the chapter's discussion of edge deployment and model optimization.
6. Howard, A., et al. (2017). "MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications." arXiv preprint arXiv:1704.04861. Introduces the MobileNet architecture, designed specifically for on-device inference with limited computational resources. The depthwise separable convolution technique described in this paper reduces model size and computation by an order of magnitude compared to standard convolutions, enabling computer vision on smartphones and IoT devices. Essential reading for anyone considering edge deployment.
7. Dosovitskiy, A., et al. (2020). "An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale." arXiv preprint arXiv:2010.11929. The Vision Transformer (ViT) paper demonstrated that the transformer architecture — originally developed for NLP (Chapter 14) — could match or exceed CNN performance on image recognition tasks. This convergence of vision and language architectures is foundational to the multimodal AI systems discussed in Chapter 18 and represents a significant shift in how the field approaches computer vision.
Object Detection and Segmentation
8. Redmon, J., et al. (2016). "You Only Look Once: Unified, Real-Time Object Detection." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 779-788. The original YOLO paper, which introduced the single-stage object detection paradigm. YOLO's key innovation — processing the entire image in a single forward pass — enabled real-time object detection at speeds suitable for video analysis and production deployment. The YOLO family has since evolved through multiple versions (YOLOv2 through YOLOv9+), each improving accuracy and speed.
9. Jocher, G. (2023). "Ultralytics YOLOv8." GitHub repository and documentation. The practical starting point for anyone deploying YOLO-based object detection today. YOLOv8 provides pre-trained models for detection, segmentation, classification, and pose estimation with a simple Python API. The documentation includes tutorials for training custom models — directly applicable to business use cases like Athena's shelf analytics.
Retail Computer Vision Applications
10. Tonioni, A., & Di Stefano, L. (2017). "Product Recognition in Store Shelves as a Sub-Graph Isomorphism Problem." Image Analysis and Processing — ICIAP 2017, Lecture Notes in Computer Science. An academic treatment of the shelf analytics problem, framing product recognition as a graph matching problem. Provides technical depth on how shelf analytics systems match detected products against planogram specifications. Useful for understanding the algorithmic foundations behind systems like Athena's.
11. Polacco, A., & Backes, K. (2018). "The Amazon Go Concept: Implications, Applications, and Sustainability." Journal of Business and Management, 24(1), 79-92. An early business analysis of Amazon Go's concept and its implications for the retail industry. While published before the full trajectory of Amazon Go's commercial challenges became apparent, it provides a useful framework for evaluating cashierless checkout technology against traditional retail models.
12. Willems, K., Smolders, A., Brengman, M., Luyten, K., & Schöning, J. (2017). "The Path-to-Purchase Is Paved with Digital Opportunities: An Overview of In-Store Technology Research." Technological Forecasting and Social Change, 124, 228-242. A comprehensive review of in-store digital technologies, including computer vision, beacon technology, digital signage, and augmented reality. Provides business context for retail CV deployments by situating them within the broader landscape of in-store technology innovation. Useful for understanding where computer vision fits in a retailer's technology portfolio.
Manufacturing and Quality Inspection
13. Villalba-Diez, J., Schmidt, D., Geegan, R., de Leon Hijes, F., & Ordieres-Meré, J. (2019). "Deep Learning for Industrial Computer Vision Quality Control in the Printing Industry." Sensors, 19(18), 3987. A detailed case study of deploying deep learning for quality inspection in a printing facility. The paper covers the full pipeline from data collection to deployment and reports a 25x improvement in defect detection rate compared to human inspectors. Provides a realistic template for manufacturing CV projects, including discussion of practical challenges and deployment considerations.
14. Bhatt, P. M., et al. (2021). "Image-Based Surface Defect Detection Using Deep Learning: A Review." Journal of Computing and Information Science in Engineering, 21(4). A systematic review of deep learning methods for surface defect detection across multiple manufacturing domains — steel, textiles, semiconductors, and more. Categorizes approaches by defect type, model architecture, and data strategy. Valuable for manufacturing leaders evaluating which CV approach is most appropriate for their specific inspection challenges.
Healthcare Computer Vision
15. McKinney, S. M., et al. (2020). "International Evaluation of an AI System for Breast Cancer Screening." Nature, 577, 89-94. The landmark Google Health study demonstrating that a deep learning system for mammography screening outperformed board-certified radiologists in breast cancer detection, reducing both false positives and false negatives. The study used data from the US and UK, providing evidence of cross-population generalization. Directly referenced in the chapter's discussion of medical imaging applications.
16. Topol, E. J. (2019). Deep Medicine: How Artificial Intelligence Can Make Healthcare Human Again. Basic Books. Cardiologist and digital health researcher Eric Topol argues that AI's greatest contribution to medicine will be freeing physicians from pattern-recognition tasks (image analysis, data review) so they can spend more time on the uniquely human aspects of care — empathy, communication, and complex decision-making. Provides important strategic context for healthcare CV: the goal is augmentation, not replacement. Accessible and thoughtfully written.
17. US Food and Drug Administration. (2024). "Artificial Intelligence and Machine Learning (AI/ML)-Enabled Medical Devices." FDA Online Database. The FDA's continuously updated list of AI-enabled medical devices that have received regulatory clearance or approval. As of 2025, over 700 devices are listed, with radiology and cardiology representing the largest categories. Essential reference for understanding what types of medical CV applications have achieved regulatory approval and what the regulatory expectations are.
Ethics, Bias, and Surveillance
18. Buolamwini, J., & Gebru, T. (2018). "Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification." Proceedings of the 1st Conference on Fairness, Accountability and Transparency, 77-91. The foundational study demonstrating that commercial facial analysis systems had dramatically higher error rates for darker-skinned women compared to lighter-skinned men. This paper catalyzed a global conversation about bias in AI systems and led to significant changes in how technology companies develop and deploy facial analysis technology. Required reading for anyone deploying computer vision systems that process images of people. Directly referenced in the chapter and explored further in Chapter 25.
19. Zuboff, S. (2019). The Age of Surveillance Capitalism: The Fight for a Human Future at the New Frontier of Power. PublicAffairs. A sweeping analysis of how technology companies extract value from behavioral data — often captured through visual surveillance — and use it to predict and influence human behavior. While broader than computer vision specifically, Zuboff's framework is directly applicable to understanding the surveillance implications of retail and workplace CV deployments. Provides the theoretical grounding for the chapter's ethical discussion.
20. Hill, K. (2020). "The Secretive Company That Might End Privacy as We Know It." The New York Times, January 18. The investigative report that revealed Clearview AI's practice of scraping billions of photographs from the internet to build a facial recognition database marketed to law enforcement. The story illustrates the extreme end of the surveillance spectrum discussed in the chapter and has become a landmark case study in the ethics of computer vision technology.
Edge Deployment and MLOps for CV
21. Warden, P., & Situnayake, D. (2019). TinyML: Machine Learning with TensorFlow Lite on Arduino and Ultra-Low-Power Microcontrollers. O'Reilly Media. A practical guide to deploying machine learning models on extremely resource-constrained devices. While the focus is on microcontrollers rather than edge GPUs, the principles of model optimization — quantization, pruning, architecture selection — apply directly to the edge deployment discussion in the chapter. Particularly relevant for IoT-based CV applications.
22. NVIDIA Developer. (2024). "Jetson AI Lab: Computer Vision Models and Workflows." NVIDIA Technical Documentation. NVIDIA's practical documentation for deploying computer vision models on Jetson edge devices. Includes tutorials for running YOLO, EfficientNet, and other models on Jetson hardware with TensorRT optimization. The hands-on, deployment-oriented focus makes this directly applicable for teams implementing the kind of edge-cloud hybrid architecture described in Athena's deployment.
Accessibility and Computer Vision
23. Shaikh, S. (2018). "Building Seeing AI." Microsoft Research Blog. A first-person account by Seeing AI's creator describing the technical and design decisions behind the app. Shaikh's perspective — as both a developer and a user — provides unique insight into how accessibility requirements shaped the product's architecture. Directly relevant to Case Study 2 and the broader principle that the best AI applications are built by people who understand the problem firsthand.
24. Morris, M. R. (2020). "AI and Accessibility." Communications of the ACM, 63(6), 35-37. A concise overview of how AI — particularly computer vision and natural language processing — is transforming assistive technology. Morris identifies key research challenges, including the need for more diverse training data, the tension between capability and bias, and the importance of co-design with disabled users. Provides strategic context for Seeing AI and similar applications.
Industry Reports and Surveys
25. Grand View Research. (2024). "Computer Vision Market Size, Share & Trends Analysis Report, 2024-2030." A comprehensive market analysis projecting the global computer vision market to exceed $50 billion by 2030, with manufacturing, retail, and healthcare as the largest application segments. Useful for business leaders building the market-sizing component of CV business cases. The report's segmentation by technology (cloud vs. edge), application, and geography provides a structured view of market opportunities.
For additional references on specific topics, see: Chapter 13 (neural network foundations), Chapter 18 (generative image AI and multimodal models), Chapter 23 (cloud AI services and APIs), and Chapter 25 (bias in AI systems). The Bibliography provides complete citation details for all references.