Chapter 16: Further Reading
Foundational Texts
- Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep Learning. MIT Press. Chapter 14 (Autoencoders) provides a thorough theoretical treatment of undercomplete, sparse, denoising, and contractive autoencoders. Chapter 20 covers deep generative models including VAEs. Freely available at https://www.deeplearningbook.org/.
- Bishop, C. M. and Bishop, H. (2024). Deep Learning: Foundations and Concepts. Springer. Chapter 17 (Generative Models) covers VAEs with careful mathematical exposition, including the ELBO derivation and connections to expectation-maximization.
- Murphy, K. P. (2023). Probabilistic Machine Learning: Advanced Topics. MIT Press. Chapters 20--21 cover variational autoencoders and related deep generative models from a rigorous probabilistic perspective. Freely available at https://probml.github.io/pml-book/.
- Prince, S. J. D. (2023). Understanding Deep Learning. MIT Press. Chapters 17--18 cover autoencoders and VAEs with excellent diagrams and intuitive explanations. Freely available at https://udlbook.github.io/udlbook/.
Key Papers: Autoencoders
- Hinton, G. E. and Salakhutdinov, R. R. (2006). "Reducing the Dimensionality of Data with Neural Networks." Science, 313(5786), 504--507. The landmark paper that demonstrated deep autoencoders can learn nonlinear dimensionality reductions superior to PCA, revitalizing interest in deep learning.
- Vincent, P., Larochelle, H., Bengio, Y., and Manzagol, P.-A. (2008). "Extracting and Composing Robust Features with Denoising Autoencoders." ICML 2008. Introduced denoising autoencoders and showed they learn more useful features than standard autoencoders.
- Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y., and Manzagol, P.-A. (2010). "Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion." JMLR, 11, 3371--3408. Extended the denoising autoencoder to multiple layers and established connections to score matching.
- Ng, A. (2011). "Sparse Autoencoder." CS294A Lecture Notes, Stanford University. Clear tutorial on sparse autoencoders with both L1 and KL divergence sparsity penalties. Widely cited as an accessible introduction.
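The two ingredients that recur across these papers are simple to state in code: denoising autoencoders corrupt the input before encoding it, and sparse autoencoders penalize hidden units whose average activation drifts from a small target rho. A minimal NumPy sketch of both (function names and constants are illustrative, not taken from any of the papers above):

```python
import numpy as np

rng = np.random.default_rng(0)

def corrupt(x, noise_level=0.3):
    """Masking noise, as in denoising autoencoders: zero out a random
    fraction of the inputs; the network is trained to reconstruct the
    clean x from this corrupted version."""
    mask = rng.random(x.shape) >= noise_level
    return x * mask

def kl_sparsity_penalty(activations, rho=0.05):
    """KL-divergence sparsity penalty from Ng's lecture notes:
    sum_j KL(rho || rho_hat_j), where rho_hat_j is the mean activation
    of hidden unit j over the batch (activations assumed in (0, 1),
    e.g. sigmoid outputs). Zero exactly when every rho_hat_j == rho."""
    rho_hat = activations.mean(axis=0)
    return np.sum(rho * np.log(rho / rho_hat)
                  + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))

x = rng.random((4, 8))
x_noisy = corrupt(x)                               # inputs with holes

a = rng.uniform(0.01, 0.99, size=(32, 16))         # fake hidden activations
penalty = kl_sparsity_penalty(a)                   # added to the loss
```

In a real training loop the penalty is scaled by a weight and added to the reconstruction loss; the reconstruction target is always the clean input, never the corrupted one.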
Key Papers: Variational Autoencoders
- Kingma, D. P. and Welling, M. (2014). "Auto-Encoding Variational Bayes." ICLR 2014. arXiv:1312.6114. One of the two papers that simultaneously introduced VAEs. Presents the reparameterization trick and ELBO-based training.
- Rezende, D. J., Mohamed, S., and Wierstra, D. (2014). "Stochastic Backpropagation and Approximate Inference in Deep Generative Models." ICML 2014. The other foundational VAE paper, arriving at similar conclusions via stochastic backpropagation.
- Kingma, D. P. and Welling, M. (2019). "An Introduction to Variational Autoencoders." Foundations and Trends in Machine Learning, 12(4), 307--392. arXiv:1906.02691. A comprehensive tutorial by the original VAE authors. The definitive reference for understanding VAE theory and practice.
- Higgins, I., Matthey, L., Pal, A., et al. (2017). "β-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework." ICLR 2017. Showed that increasing the weight on the KL term encourages disentangled representations, sparking extensive research on disentanglement.
- van den Oord, A., Vinyals, O., and Kavukcuoglu, K. (2017). "Neural Discrete Representation Learning." NeurIPS 2017. Introduced VQ-VAE, replacing continuous latent variables with a discrete codebook. Produced much sharper reconstructions than standard VAEs.
- Bowman, S. R., Vilnis, L., Vinyals, O., Dai, A. M., Jozefowicz, R., and Bengio, S. (2016). "Generating Sentences from a Continuous Space." CoNLL 2016. Identified the posterior collapse problem in VAEs applied to text and proposed KL annealing as a solution.
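Both pieces that distinguish a VAE from a plain autoencoder, the reparameterization trick and the closed-form Gaussian KL term of the ELBO, fit in a few lines. A NumPy sketch (names are illustrative; a real implementation would live inside an autodiff framework so gradients flow through mu and log_var):

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_var):
    """Reparameterization trick (Kingma & Welling, 2014):
    z = mu + sigma * eps with eps ~ N(0, I), moving the randomness
    into eps so that mu and log_var stay differentiable."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL(q(z|x) || N(0, I)) for a diagonal Gaussian
    posterior: the regularization term of the ELBO, averaged over
    the batch. Summed over latent dimensions per example."""
    kl_per_example = 0.5 * np.sum(
        np.exp(log_var) + mu**2 - 1.0 - log_var, axis=1)
    return kl_per_example.mean()
```

Note that the KL term is exactly zero when mu = 0 and log_var = 0, so minimizing it pulls the approximate posterior toward the standard-normal prior; when this pull wins completely, the result is the posterior collapse described by Bowman et al.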
Key Papers: Contrastive and Self-Supervised Learning
- Chen, T., Kornblith, S., Norouzi, M., and Hinton, G. (2020). "A Simple Framework for Contrastive Learning of Visual Representations." ICML 2020. Introduced SimCLR, demonstrating that a simple contrastive framework with strong augmentations can match supervised pretraining. The ablation studies on augmentation strategies are particularly valuable.
- He, K., Fan, H., Wu, Y., Xie, S., and Girshick, R. (2020). "Momentum Contrast for Unsupervised Visual Representation Learning." CVPR 2020. Introduced MoCo, which uses a momentum-updated encoder and a queue of negative examples to decouple batch size from the number of negatives.
- Grill, J.-B., Strub, F., Altché, F., et al. (2020). "Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning." NeurIPS 2020. Showed that contrastive learning can work without negative pairs, using asymmetric architectures and EMA updates.
- Caron, M., Touvron, H., Misra, I., et al. (2021). "Emerging Properties in Self-Supervised Vision Transformers (DINO)." ICCV 2021. Demonstrated that self-supervised Vision Transformers learn features that contain explicit information about object segmentation, without any supervision.
- He, K., Chen, X., Xie, S., Li, Y., Dollár, P., and Girshick, R. (2022). "Masked Autoencoders Are Scalable Vision Learners." CVPR 2022. Showed that masking 75% of image patches and training a ViT to reconstruct them produces excellent representations, connecting the masked language modeling paradigm to computer vision.
- Oquab, M., Darcet, T., Moutakanni, T., et al. (2024). "DINOv2: Learning Robust Visual Features without Supervision." TMLR 2024. Scaled up self-supervised pretraining to produce foundation models for computer vision that match or exceed supervised pretraining across diverse tasks.
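The contrastive methods above (SimCLR and MoCo in particular) train with variants of the InfoNCE objective, which SimCLR calls NT-Xent: each embedding's positive is the other augmented view of the same image, and every other embedding in the batch acts as a negative. A minimal NumPy sketch of the loss, assuming z1 and z2 hold embeddings of two views of the same batch (row i of each forms a positive pair):

```python
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent (InfoNCE) loss as used in SimCLR, in NumPy.
    z1, z2: (N, d) arrays of embeddings of two augmented views."""
    z = np.concatenate([z1, z2], axis=0)              # (2N, d)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # cosine similarity
    sim = z @ z.T / temperature                       # (2N, 2N) logits
    n = z1.shape[0]
    np.fill_diagonal(sim, -np.inf)                    # exclude self-pairs
    # positive for row i is row i + n (and vice versa)
    pos_idx = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    # log-softmax over each row, then pick out the positive's log-prob
    log_prob = sim - np.log(np.sum(np.exp(sim), axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), pos_idx].mean()
```

The loss drops as positive pairs align and negatives spread apart; the temperature controls how hard the hardest negatives are weighted, and the SimCLR ablations are a good guide to choosing it.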
Tutorials and Surveys
- Doersch, C. (2016). "Tutorial on Variational Autoencoders." arXiv:1606.05908. An accessible introduction to VAEs with clear derivations and practical advice.
- Tschannen, M., Djolonga, J., Rubenstein, P. K., Gelly, S., and Lucic, M. (2020). "On Mutual Information Maximization for Representation Learning." ICLR 2020. A critical examination of the theoretical foundations of contrastive learning, showing that the success of these methods may not be explained by mutual information alone.
- Le-Khac, P. H., Healy, G., and Smeaton, A. F. (2020). "Contrastive Representation Learning: A Framework and Review." IEEE Access, 8, 193907--193934. A comprehensive survey of contrastive learning methods.
- Liu, X., Zhang, F., Hou, Z., et al. (2023). "Self-Supervised Learning: Generative or Contrastive." IEEE Transactions on Knowledge and Data Engineering, 35(1), 857--876. A thorough survey comparing generative (autoencoder-based) and contrastive self-supervised learning.
Software and Implementations
- PyTorch VAE examples: https://github.com/pytorch/examples/tree/main/vae --- The official PyTorch VAE example, a clean minimal implementation.
- lightly: https://github.com/lightly-ai/lightly --- A Python library for self-supervised learning with PyTorch, implementing SimCLR, BYOL, MoCo, and many other methods with a unified API.
- solo-learn: https://github.com/vturrisi/solo-learn --- A library of self-supervised methods for visual representation learning, with extensive benchmarking.