Chapter 27: Computer Vision and Video Analysis - Further Reading
Foundational Computer Vision Texts
Introductory
Szeliski, R. (2022). Computer Vision: Algorithms and Applications (2nd ed.). Springer. Comprehensive introduction to computer vision. Free online at szeliski.org/Book/. Covers image formation, feature detection, motion estimation, and 3D reconstruction. Essential foundation for sports video analysis.
Gonzalez, R. C., & Woods, R. E. (2018). Digital Image Processing (4th ed.). Pearson. Classic text on image processing fundamentals. Covers filtering, segmentation, and morphological operations that underpin detection systems.
Prince, S. J. D. (2012). Computer Vision: Models, Learning, and Inference. Cambridge University Press. Probabilistic approach to computer vision. Strong on mathematical foundations and model-based reasoning.
Deep Learning for Vision
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press. The definitive deep learning textbook. Free at deeplearningbook.org. Essential for understanding CNNs, RNNs, and modern architectures.
Chollet, F. (2021). Deep Learning with Python (2nd ed.). Manning. Practical introduction using Keras. Good for implementing vision models quickly.
Howard, J., & Gugger, S. (2020). Deep Learning for Coders with fastai and PyTorch. O'Reilly. Practical, top-down approach to deep learning. Excellent for building working systems.
Object Detection and Tracking
Seminal Papers
Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016). "You Only Look Once: Unified, Real-Time Object Detection." CVPR. Introduced YOLO, revolutionizing real-time object detection. Foundation for many sports applications.
Ren, S., He, K., Girshick, R., & Sun, J. (2015). "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks." NeurIPS. Two-stage detection architecture that remains highly influential.
Wojke, N., Bewley, A., & Paulus, D. (2017). "Simple Online and Realtime Tracking with a Deep Association Metric." ICIP. DeepSORT algorithm combining appearance features with tracking. Widely used in sports.
Bewley, A., Ge, Z., Ott, L., Ramos, F., & Upcroft, B. (2016). "Simple Online and Realtime Tracking." ICIP. Original SORT algorithm - simple, fast, effective baseline for multi-object tracking.
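The association step at the heart of the SORT and DeepSORT entries above can be illustrated with a short sketch. This is not the authors' implementation: SORT pairs Kalman-filter motion prediction with Hungarian assignment on IoU, whereas the sketch below swaps in a simpler greedy matcher and omits the motion model entirely. The names `iou` and `associate` and the `min_iou` value of 0.3 are illustrative assumptions.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def associate(
    tracks: Dict[int, Box],
    detections: List[Box],
    min_iou: float = 0.3,  # assumed threshold, tune per dataset
) -> Dict[int, int]:
    """Greedily match track IDs to detection indices by descending IoU.

    Simplification: SORT itself solves this assignment with the
    Hungarian algorithm on Kalman-predicted boxes.
    """
    pairs = sorted(
        (
            (iou(box, det), tid, j)
            for tid, box in tracks.items()
            for j, det in enumerate(detections)
        ),
        reverse=True,
    )
    matches: Dict[int, int] = {}
    used_detections = set()
    for score, tid, j in pairs:
        if score < min_iou:
            break  # remaining pairs overlap even less
        if tid in matches or j in used_detections:
            continue
        matches[tid] = j
        used_detections.add(j)
    return matches
```

Tracks left out of the returned mapping would be candidates for termination, and unmatched detections would start new tracks; both steps are left out of this sketch.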
Survey Papers
Ciaparrone, G., Luque Sánchez, F., Tabik, S., Troiano, L., Tagliaferri, R., & Herrera, F. (2020). "Deep Learning in Video Multi-Object Tracking: A Survey." Neurocomputing, 381, 61-88. Comprehensive survey of deep learning approaches to multi-object tracking.
Wu, Y., Lim, J., & Yang, M. H. (2013). "Online Object Tracking: A Benchmark." CVPR. Established tracking benchmarks and evaluation methodology.
Pose Estimation
Key Papers
Cao, Z., Simon, T., Wei, S. E., & Sheikh, Y. (2017). "Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields." CVPR. OpenPose paper - foundational work on multi-person pose estimation.
Sun, K., Xiao, B., Liu, D., & Wang, J. (2019). "Deep High-Resolution Representation Learning for Human Pose Estimation." CVPR. HRNet architecture maintaining high resolution for accurate keypoint detection.
Lugaresi, C., Tang, J., Nash, H., et al. (2019). "MediaPipe: A Framework for Building Perception Pipelines." arXiv. MediaPipe framework enabling real-time pose estimation on mobile devices.
Güler, R. A., Neverova, N., & Kokkinos, I. (2018). "DensePose: Dense Human Pose Estimation in the Wild." CVPR. Dense correspondence estimation for detailed body surface mapping.
Books and Tutorials
Murphy, K. P. (2022). Probabilistic Machine Learning: An Introduction. MIT Press. Excellent coverage of graphical models used in pose estimation. Free online.
OpenPose Documentation - https://github.com/CMU-Perceptual-Computing-Lab/openpose - Comprehensive implementation details and tutorials
Action Recognition
Foundational Papers
Carreira, J., & Zisserman, A. (2017). "Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset." CVPR. I3D architecture and Kinetics dataset - major advances in video understanding.
Tran, D., Bourdev, L., Fergus, R., Torresani, L., & Paluri, M. (2015). "Learning Spatiotemporal Features with 3D Convolutional Networks." ICCV. C3D model for learning video representations.
Feichtenhofer, C., Fan, H., Malik, J., & He, K. (2019). "SlowFast Networks for Video Recognition." ICCV. Dual-pathway architecture for video understanding at multiple temporal resolutions.
Vaswani, A., et al. (2017). "Attention Is All You Need." NeurIPS. Transformer architecture now widely applied to video understanding.
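The "multiple temporal resolutions" idea in the SlowFast entry above can be made concrete with a small frame-sampling sketch. It assumes a 64-frame raw clip, a Slow-pathway stride `tau` of 16, and a speed ratio `alpha` of 8, which match the configuration described in the paper as far as this sketch assumes; the function name `slowfast_indices` is hypothetical and is not part of the SlowFast codebase.

```python
from typing import List, Tuple


def slowfast_indices(
    num_frames: int = 64,  # raw clip length (assumed)
    tau: int = 16,         # temporal stride of the Slow pathway (assumed)
    alpha: int = 8,        # Fast pathway samples alpha times more densely
) -> Tuple[List[int], List[int]]:
    """Return (slow, fast) frame indices for one clip."""
    if tau % alpha != 0:
        raise ValueError("tau must be divisible by alpha")
    slow = list(range(0, num_frames, tau))           # stride tau
    fast = list(range(0, num_frames, tau // alpha))  # stride tau / alpha
    return slow, fast


slow, fast = slowfast_indices()
print(len(slow), len(fast))  # 4 slow frames, 32 fast frames
```

The Slow pathway sees few frames and can spend its capacity on spatial detail, while the Fast pathway sees many frames to capture rapid motion such as a shooting release.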
Sports-Specific
Yue, Y., Lucey, P., Carr, P., Bialkowski, A., & Matthews, I. (2014). "Learning Fine-Grained Spatial Models for Dynamic Sports Play Prediction." ICDM. Latent-factor spatial models for predicting near-term play events in basketball from tracking data.
Felsen, P., Lucey, P., & Ganguly, S. (2018). "Where Will They Go? Predicting Fine-Grained Adversarial Multi-Agent Motion Using Conditional Variational Autoencoders." ECCV. Trajectory prediction in sports using deep generative models.
Sports Analytics with Computer Vision
Basketball-Specific
Cervone, D., D'Amour, A., Bornn, L., & Goldsberry, K. (2016). "A Multiresolution Stochastic Process Model for Predicting Basketball Possession Outcomes." JASA, 111(514), 585-599. EPV model using tracking data - seminal work in basketball analytics.
Wang, K. C., & Zemel, R. (2016). "Classifying NBA Offensive Plays Using Neural Networks." MIT Sloan Sports Analytics Conference. Neural network approach to play classification.
Sicilia, A., Pelechrinis, K., & Goldsberry, K. (2019). "DeepHoops: Evaluating Micro-Actions in Basketball Using Deep Feature Representations of Spatio-Temporal Data." KDD. Deep learning for micro-action evaluation in basketball.
Sha, L., Lucey, P., Yue, Y., et al. (2016). "Chalkboarding: A New Spatiotemporal Query Paradigm for Sports Play Retrieval." IUI. Novel query interface for searching plays using spatial patterns.
Cross-Sport Applications
Gade, R., & Moeslund, T. B. (2018). "Constrained Multi-Target Tracking for Team Sports Activities." IPSJ Transactions on Computer Vision and Applications, 10, Article 2. Multi-target tracking framework for team sports.
Manafifard, M., Ebadi, H., & Moghaddam, H. A. (2017). "A Survey on Player Tracking in Soccer Videos." Computer Vision and Image Understanding, 159, 19-46. Comprehensive survey of player tracking techniques.
Thomas, G., Gade, R., Moeslund, T. B., Carr, P., & Hilton, A. (2017). "Computer Vision for Sports: Current Applications and Research Topics." Computer Vision and Image Understanding, 159, 3-18. Overview of computer vision applications across sports.
Datasets and Benchmarks
General Video Understanding
Kinetics Dataset - https://deepmind.com/research/open-source/kinetics - Large-scale video action recognition dataset - Pre-training source for sports models
UCF101 - https://www.crcv.ucf.edu/data/UCF101.php - 101 action categories from YouTube videos
COCO (Common Objects in Context) - https://cocodataset.org/ - Standard benchmark for detection and pose estimation
Sports-Specific
NBA Tracking Data - Available through NBA Stats API (limited) - Second Spectrum provides comprehensive data to teams
SoccerNet - https://www.soccer-net.org/ - Large-scale soccer video understanding benchmark
Sports-1M - Roughly one million YouTube sports videos across 487 classes (Karpathy et al., 2014) - Used for large-scale recognition research
Online Courses and Tutorials
Computer Vision
Stanford CS231n: CNNs for Visual Recognition - http://cs231n.stanford.edu/ - Premier course on deep learning for vision - Lecture videos and assignments available
Michigan EECS 498: Deep Learning for Computer Vision - https://web.eecs.umich.edu/~justincj/teaching/eecs498/ - Comprehensive modern curriculum
Fast.ai Practical Deep Learning - https://www.fast.ai/ - Practical, code-first approach
Specialized Topics
Coursera: Visual Perception for Self-Driving Cars - University of Toronto - Good coverage of detection, tracking, and depth estimation
PyTorch Tutorials: Video Classification - https://pytorch.org/tutorials/ - Official tutorials for video understanding
Tools and Libraries
Core Frameworks
PyTorch - https://pytorch.org/ - Preferred framework for vision research - Strong ecosystem (torchvision, PyTorchVideo)
TensorFlow - https://www.tensorflow.org/ - Good production deployment options - TF Hub for pre-trained models
Computer Vision Libraries
OpenCV - https://opencv.org/ - Essential for video I/O and basic vision operations - Python and C++ interfaces
Detectron2 - https://github.com/facebookresearch/detectron2 - State-of-the-art detection and segmentation
MMDetection - https://github.com/open-mmlab/mmdetection - Comprehensive detection toolbox
Pose Estimation
MediaPipe - https://mediapipe.dev/ - Real-time ML solutions for pose, hands, face
OpenPose - https://github.com/CMU-Perceptual-Computing-Lab/openpose - Multi-person pose estimation
MMPose - https://github.com/open-mmlab/mmpose - Pose estimation toolbox
Video Understanding
PyTorchVideo - https://pytorchvideo.org/ - Video understanding library from Facebook
SlowFast - https://github.com/facebookresearch/SlowFast - Video recognition models
Conferences and Venues
Computer Vision
CVPR (Computer Vision and Pattern Recognition) - Premier computer vision conference - Annual, proceedings available online
ICCV (International Conference on Computer Vision) - Biennial major conference
ECCV (European Conference on Computer Vision) - Biennial European venue
Machine Learning
NeurIPS (Neural Information Processing Systems) - Major ML venue with vision papers
ICML (International Conference on Machine Learning) - Top ML conference
Sports Analytics
MIT Sloan Sports Analytics Conference - https://www.sloansportsconference.com/ - Research paper track includes CV work
CVPR Sports Workshop - Workshop at CVPR focused on sports vision
ACM KDD Sports Analytics Workshop - Data mining approaches to sports
Industry Resources
Company Research Blogs
Second Spectrum Engineering Blog - Insights from NBA tracking provider
Google AI Blog - Sports - https://ai.googleblog.com/ - ML/CV research with sports applications
Meta AI - https://ai.facebook.com/ - Video understanding research
Practitioner Blogs
PyImageSearch - https://pyimagesearch.com/ - Practical computer vision tutorials
Towards Data Science - https://towardsdatascience.com/ - Sports analytics and CV articles
Recommended Reading Path
For CV Beginners
- Stanford CS231n lectures (free online)
- Szeliski textbook - Chapters 1-5
- PyTorch tutorials - vision section
- Implement basic detection with Detectron2
- Practice on sports datasets
For ML Practitioners New to Vision
- Szeliski textbook - selected chapters
- YOLO and Faster R-CNN papers
- OpenPose paper
- Implement tracking pipeline
- Apply to basketball video
For Sports Analysts Adding CV Skills
- OpenCV Python tutorials
- MediaPipe pose estimation
- Pre-trained model inference
- Basketball-specific papers (Cervone, Wang)
- Build shot analysis tool
For Researchers
- Current CVPR/ICCV proceedings
- Survey papers on tracking and action recognition
- Sports-specific workshops
- Implement and extend recent methods
- Publish novel applications
Future Directions Reading
Emerging Topics
Self-Supervised Learning - He, K., et al. (2020). "Momentum Contrast for Unsupervised Visual Representation Learning." CVPR.
Vision Transformers - Dosovitskiy, A., et al. (2021). "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale." ICLR.
Neural Radiance Fields - Mildenhall, B., et al. (2020). "NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis." ECCV.
Generative Models - Ho, J., Jain, A., & Abbeel, P. (2020). "Denoising Diffusion Probabilistic Models." NeurIPS.
Sports-Specific Frontiers
Broadcast Video Analysis - Converting broadcast footage to tracking data
Privacy-Preserving Analytics - Federated learning for distributed analysis
Real-Time 3D Reconstruction - Volumetric capture of sports action
Natural Language + Vision - Generating play descriptions from video
Quick Reference
Must-Read Papers (Top 10)
- YOLO (Redmon et al., 2016)
- Faster R-CNN (Ren et al., 2015)
- DeepSORT (Wojke et al., 2017)
- OpenPose (Cao et al., 2017)
- I3D (Carreira & Zisserman, 2017)
- Attention Is All You Need (Vaswani et al., 2017)
- EPV Model (Cervone et al., 2016)
- HRNet (Sun et al., 2019)
- SlowFast (Feichtenhofer et al., 2019)
- ResNet (He et al., 2016): "Deep Residual Learning for Image Recognition," CVPR
Essential Tools
| Task | Tool |
|---|---|
| Video I/O | OpenCV, FFmpeg |
| Detection | Detectron2, YOLOv8 |
| Tracking | DeepSORT, ByteTrack |
| Pose | MediaPipe, MMPose |
| Action | PyTorchVideo, SlowFast |
| Visualization | Matplotlib, Plotly |