Chapter 27: Computer Vision and Video Analysis - Further Reading

Foundational Computer Vision Texts

Introductory

Szeliski, R. (2022). Computer Vision: Algorithms and Applications (2nd ed.). Springer. Comprehensive introduction to computer vision. Free online at szeliski.org/Book/. Covers image formation, feature detection, motion estimation, and 3D reconstruction. Essential foundation for sports video analysis.

Gonzalez, R. C., & Woods, R. E. (2018). Digital Image Processing (4th ed.). Pearson. Classic text on image processing fundamentals. Covers filtering, segmentation, and morphological operations that underpin detection systems.

Prince, S. J. D. (2012). Computer Vision: Models, Learning, and Inference. Cambridge University Press. Probabilistic approach to computer vision. Strong on mathematical foundations and model-based reasoning.

Deep Learning for Vision

Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press. The definitive deep learning textbook. Free at deeplearningbook.org. Essential for understanding CNNs, RNNs, and modern architectures.

Chollet, F. (2021). Deep Learning with Python (2nd ed.). Manning. Practical introduction using Keras. Good for implementing vision models quickly.

Howard, J., & Gugger, S. (2020). Deep Learning for Coders with fastai and PyTorch. O'Reilly. Practical, top-down approach to deep learning. Excellent for building working systems.


Object Detection and Tracking

Seminal Papers

Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016). "You Only Look Once: Unified, Real-Time Object Detection." CVPR. Introduced YOLO, which treats detection as a single regression pass over the image and made real-time detection practical. Foundation for many sports applications.

Ren, S., He, K., Girshick, R., & Sun, J. (2015). "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks." NeurIPS. Two-stage detection architecture that remains highly influential.

Wojke, N., Bewley, A., & Paulus, D. (2017). "Simple Online and Realtime Tracking with a Deep Association Metric." ICIP. DeepSORT algorithm combining appearance features with tracking. Widely used in sports.

Bewley, A., Ge, Z., Ott, L., Ramos, F., & Upcroft, B. (2016). "Simple Online and Realtime Tracking." ICIP. Original SORT algorithm - simple, fast, effective baseline for multi-object tracking.
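
The association step at the heart of SORT can be sketched in a few lines. The example below is a minimal illustration, not the authors' implementation: it scores every track-detection pair by intersection over union (IoU) and matches pairs greedily, whereas SORT itself solves the assignment with the Hungarian algorithm and predicts track boxes with a Kalman filter. The function names, the (x1, y1, x2, y2) box format, the threshold value, and the sample boxes are assumptions chosen for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_associate(tracks, detections, iou_threshold=0.3):
    """Match track boxes to detection boxes, highest IoU first.

    Returns (matches, unmatched_track_indices, unmatched_detection_indices).
    Greedy matching is a simplification; SORT uses an optimal assignment.
    """
    pairs = sorted(
        ((iou(t, d), ti, di)
         for ti, t in enumerate(tracks)
         for di, d in enumerate(detections)),
        reverse=True,
    )
    matches, used_t, used_d = [], set(), set()
    for score, ti, di in pairs:
        if score < iou_threshold:
            break  # remaining pairs overlap too little to be the same object
        if ti in used_t or di in used_d:
            continue
        matches.append((ti, di))
        used_t.add(ti)
        used_d.add(di)
    unmatched_t = [i for i in range(len(tracks)) if i not in used_t]
    unmatched_d = [i for i in range(len(detections)) if i not in used_d]
    return matches, unmatched_t, unmatched_d


# Hypothetical boxes: two tracked players and three detections in the next frame.
tracks = [(10, 10, 50, 90), (100, 20, 140, 100)]
detections = [(102, 22, 142, 102), (12, 11, 52, 91), (300, 300, 340, 380)]
matches, lost_tracks, new_detections = greedy_associate(tracks, detections)
print(matches, lost_tracks, new_detections)
```

In a full tracker, unmatched detections would typically start new tracks and tracks left unmatched for several frames would be dropped.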

Survey Papers

Ciaparrone, G., Luque Sánchez, F., Tabik, S., Troiano, L., Tagliaferri, R., & Herrera, F. (2020). "Deep Learning in Video Multi-Object Tracking: A Survey." Neurocomputing, 381, 61-88. Comprehensive survey of deep learning approaches to multi-object tracking.

Wu, Y., Lim, J., & Yang, M. H. (2013). "Online Object Tracking: A Benchmark." CVPR. Established the OTB single-object tracking benchmark and its evaluation methodology.


Pose Estimation

Key Papers

Cao, Z., Simon, T., Wei, S. E., & Sheikh, Y. (2017). "Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields." CVPR. OpenPose paper - foundational work on multi-person pose estimation.

Sun, K., Xiao, B., Liu, D., & Wang, J. (2019). "Deep High-Resolution Representation Learning for Human Pose Estimation." CVPR. HRNet architecture maintaining high resolution for accurate keypoint detection.

Lugaresi, C., Tang, J., Nash, H., et al. (2019). "MediaPipe: A Framework for Building Perception Pipelines." arXiv. MediaPipe framework enabling real-time pose estimation on mobile devices.

Güler, R. A., Neverova, N., & Kokkinos, I. (2018). "DensePose: Dense Human Pose Estimation in the Wild." CVPR. Dense correspondence estimation for detailed body surface mapping.

Books and Tutorials

Murphy, K. P. (2022). Probabilistic Machine Learning: An Introduction. MIT Press. Excellent coverage of graphical models used in pose estimation. Free online.

OpenPose Documentation - https://github.com/CMU-Perceptual-Computing-Lab/openpose - Comprehensive implementation details and tutorials


Action Recognition

Foundational Papers

Carreira, J., & Zisserman, A. (2017). "Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset." CVPR. I3D architecture and Kinetics dataset - major advances in video understanding.

Tran, D., Bourdev, L., Fergus, R., Torresani, L., & Paluri, M. (2015). "Learning Spatiotemporal Features with 3D Convolutional Networks." ICCV. C3D model for learning video representations.

Feichtenhofer, C., Fan, H., Malik, J., & He, K. (2019). "SlowFast Networks for Video Recognition." ICCV. Dual-pathway architecture for video understanding at multiple temporal resolutions.
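
The dual-pathway idea can be illustrated by the frame sampling alone. The sketch below assumes a clip of raw frames and picks frame indices for a slow pathway (large temporal stride) and a fast pathway (stride reduced by a factor alpha). The function name and default values are illustrative assumptions; the networks themselves, their lateral connections, and their channel ratios are not shown.

```python
def slowfast_indices(num_frames, slow_stride=16, alpha=8):
    """Return (slow, fast) frame-index lists for a clip of num_frames frames.

    The fast pathway samples alpha times more densely than the slow pathway.
    """
    if slow_stride % alpha != 0:
        raise ValueError("slow_stride must be divisible by alpha")
    fast_stride = slow_stride // alpha
    slow = list(range(0, num_frames, slow_stride))
    fast = list(range(0, num_frames, fast_stride))
    return slow, fast


# Hypothetical 64-frame clip.
slow, fast = slowfast_indices(num_frames=64)
print(len(slow), len(fast))  # 4 slow frames, 32 fast frames
```

With these settings every slow-pathway frame also appears in the fast pathway, so the two streams stay temporally aligned.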

Vaswani, A., et al. (2017). "Attention Is All You Need." NeurIPS. Transformer architecture now widely applied to video understanding.

Sports-Specific

Yue, Y., Lucey, P., Carr, P., Bialkowski, A., & Matthews, I. (2014). "Learning Fine-Grained Spatial Models for Dynamic Sports Play Prediction." ICDM. Latent-factor spatial models for predicting in-game events from player tracking data.

Felsen, P., Lucey, P., & Ganguly, S. (2018). "Where Will They Go? Predicting Fine-Grained Adversarial Multi-Agent Motion Using Conditional Variational Autoencoders." ECCV. Trajectory prediction in sports using deep generative models.


Sports Analytics with Computer Vision

Basketball-Specific

Cervone, D., D'Amour, A., Bornn, L., & Goldsberry, K. (2016). "A Multiresolution Stochastic Process Model for Predicting Basketball Possession Outcomes." JASA, 111(514), 585-599. EPV model using tracking data - seminal work in basketball analytics.

Wang, K. C., & Zemel, R. (2016). "Classifying NBA Offensive Plays Using Neural Networks." MIT Sloan Sports Analytics Conference. Neural network approach to play classification.

Sicilia, A., Pelechrinis, K., & Goldsberry, K. (2019). "DeepHoops: Evaluating Micro-Actions in Basketball Using Deep Feature Representations of Spatio-Temporal Data." KDD. Deep learning for micro-action evaluation in basketball.

Sha, L., Lucey, P., Yue, Y., et al. (2016). "Chalkboarding: A New Spatiotemporal Query Paradigm for Sports Play Retrieval." IUI. Novel query interface for searching plays using spatial patterns.

Cross-Sport Applications

Gade, R., & Moeslund, T. B. (2018). "Constrained Multi-Target Tracking for Team Sports Activities." IPSJ Transactions on Computer Vision and Applications, 10, 2. Multi-target tracking framework for team sports.

Manafifard, M., Ebadi, H., & Moghaddam, H. A. (2017). "A Survey on Player Tracking in Soccer Videos." Computer Vision and Image Understanding, 159, 19-46. Comprehensive survey of player tracking techniques.

Thomas, G., Gade, R., Moeslund, T. B., Carr, P., & Hilton, A. (2017). "Computer Vision for Sports: Current Applications and Research Topics." Computer Vision and Image Understanding, 159, 3-18. Overview of computer vision applications across sports.


Datasets and Benchmarks

General Video Understanding

Kinetics Dataset - https://deepmind.com/research/open-source/kinetics - Large-scale video action recognition dataset - Pre-training source for sports models

UCF101 - https://www.crcv.ucf.edu/data/UCF101.php - 101 action categories from YouTube videos

COCO (Common Objects in Context) - https://cocodataset.org/ - Standard benchmark for detection and pose estimation

Sports-Specific

NBA Tracking Data - Available through NBA Stats API (limited) - Second Spectrum provides comprehensive data to teams

SoccerNet - https://www.soccer-net.org/ - Large-scale soccer video understanding benchmark

Sports-1M - About one million YouTube videos labeled with 487 sports classes (Karpathy et al., 2014) - Used for large-scale video classification research


Online Courses and Tutorials

Computer Vision

Stanford CS231n: CNNs for Visual Recognition - http://cs231n.stanford.edu/ - Premier course on deep learning for vision - Lecture videos and assignments available

Michigan EECS 498: Deep Learning for Computer Vision - https://web.eecs.umich.edu/~justincj/teaching/eecs498/ - Comprehensive modern curriculum

Fast.ai Practical Deep Learning - https://www.fast.ai/ - Practical, code-first approach

Specialized Topics

Coursera: Visual Perception for Self-Driving Cars - Good coverage of detection, tracking, depth estimation - University of Toronto

PyTorch Tutorials: Video Classification - https://pytorch.org/tutorials/ - Official tutorials for video understanding


Tools and Libraries

Core Frameworks

PyTorch - https://pytorch.org/ - Preferred framework for vision research - Strong ecosystem (torchvision, PyTorchVideo)

TensorFlow - https://www.tensorflow.org/ - Good production deployment options - TF Hub for pre-trained models

Computer Vision Libraries

OpenCV - https://opencv.org/ - Essential for video I/O, basic vision operations - Python and C++ interfaces

Detectron2 - https://github.com/facebookresearch/detectron2 - State-of-the-art detection and segmentation

MMDetection - https://github.com/open-mmlab/mmdetection - Comprehensive detection toolbox

Pose Estimation

MediaPipe - https://mediapipe.dev/ - Real-time ML solutions for pose, hands, face

OpenPose - https://github.com/CMU-Perceptual-Computing-Lab/openpose - Multi-person pose estimation

MMPose - https://github.com/open-mmlab/mmpose - Pose estimation toolbox

Video Understanding

PyTorchVideo - https://pytorchvideo.org/ - Video understanding library from Facebook

SlowFast - https://github.com/facebookresearch/SlowFast - Video recognition models


Conferences and Venues

Computer Vision

CVPR (Computer Vision and Pattern Recognition) - Premier computer vision conference - Annual, proceedings available online

ICCV (International Conference on Computer Vision) - Biennial major conference

ECCV (European Conference on Computer Vision) - Biennial European venue

Machine Learning

NeurIPS (Neural Information Processing Systems) - Major ML venue with vision papers

ICML (International Conference on Machine Learning) - Top ML conference

Sports Analytics

MIT Sloan Sports Analytics Conference - https://www.sloansportsconference.com/ - Research paper track includes CV work

CVPR Sports Workshop - Workshop at CVPR focused on sports vision

ACM KDD Sports Analytics Workshop - Data mining approaches to sports


Industry Resources

Company Research Blogs

Second Spectrum Engineering Blog - Insights from NBA tracking provider

Google AI Blog - Sports - https://ai.googleblog.com/ - ML/CV research with sports applications

Meta AI - https://ai.facebook.com/ - Video understanding research

Practitioner Blogs

PyImageSearch - https://pyimagesearch.com/ - Practical computer vision tutorials

Towards Data Science - https://towardsdatascience.com/ - Sports analytics and CV articles


Recommended Reading Paths

For CV Beginners

  1. Stanford CS231n lectures (free online)
  2. Szeliski textbook - Chapters 1-5
  3. PyTorch tutorials - vision section
  4. Implement basic detection with Detectron2
  5. Practice on sports datasets

For ML Practitioners New to Vision

  1. Szeliski textbook - selected chapters
  2. YOLO and Faster R-CNN papers
  3. OpenPose paper
  4. Implement tracking pipeline
  5. Apply to basketball video

For Sports Analysts Adding CV Skills

  1. OpenCV Python tutorials
  2. MediaPipe pose estimation
  3. Pre-trained model inference
  4. Basketball-specific papers (Cervone et al., 2016; Wang & Zemel, 2016)
  5. Build shot analysis tool
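
One building block for the shot analysis tool in step 5 is a joint angle computed from three pose keypoints, for example shoulder, elbow, and wrist. The sketch below is a minimal illustration under stated assumptions: the keypoints are plain 2D (x, y) pairs such as a pose estimator might return, the sample coordinates are hypothetical, and `joint_angle` is an illustrative helper rather than part of any library.

```python
import math


def joint_angle(a, b, c):
    """Angle at point b, in degrees, formed by segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    norm = math.hypot(*v1) * math.hypot(*v2)
    if norm == 0:
        raise ValueError("keypoints must be distinct from the joint")
    cos_theta = (v1[0] * v2[0] + v1[1] * v2[1]) / norm
    # Clamp to guard against floating-point drift outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))


# Hypothetical (x, y) keypoints for shoulder, elbow, and wrist.
shoulder, elbow, wrist = (0.50, 0.40), (0.60, 0.50), (0.60, 0.30)
elbow_angle = joint_angle(shoulder, elbow, wrist)
print(round(elbow_angle, 1))
```

If the keypoints are normalized to the image width and height, converting them to pixel coordinates first avoids distorting the angle on non-square frames.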

For Researchers

  1. Current CVPR/ICCV proceedings
  2. Survey papers on tracking and action recognition
  3. Sports-specific workshops
  4. Implement and extend recent methods
  5. Publish novel applications

Future Directions Reading

Emerging Topics

Self-Supervised Learning - He, K., et al. (2020). "Momentum Contrast for Unsupervised Visual Representation Learning." CVPR.

Vision Transformers - Dosovitskiy, A., et al. (2021). "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale." ICLR.

Neural Radiance Fields - Mildenhall, B., et al. (2020). "NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis." ECCV.

Generative Models - Ho, J., Jain, A., & Abbeel, P. (2020). "Denoising Diffusion Probabilistic Models." NeurIPS.

Sports-Specific Frontiers

Broadcast Video Analysis - Converting broadcast footage to tracking data

Privacy-Preserving Analytics - Federated learning for distributed analysis

Real-Time 3D Reconstruction - Volumetric capture of sports action

Natural Language + Vision - Generating play descriptions from video


Quick Reference

Must-Read Papers (Top 10)

  1. YOLO (Redmon et al., 2016)
  2. Faster R-CNN (Ren et al., 2015)
  3. DeepSORT (Wojke et al., 2017)
  4. OpenPose (Cao et al., 2017)
  5. I3D (Carreira & Zisserman, 2017)
  6. Attention Is All You Need (Vaswani et al., 2017)
  7. EPV Model (Cervone et al., 2016)
  8. HRNet (Sun et al., 2019)
  9. SlowFast (Feichtenhofer et al., 2019)
  10. ResNet (He et al., 2016), "Deep Residual Learning for Image Recognition," CVPR

Essential Tools

Task           Tools
Video I/O      OpenCV, FFmpeg
Detection      Detectron2, YOLOv8
Tracking       DeepSORT, ByteTrack
Pose           MediaPipe, MMPose
Action         PyTorchVideo, SlowFast
Visualization  Matplotlib, Plotly