Chapter 17: Further Reading

Foundational Texts

  • Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep Learning. MIT Press. Chapter 20 (Deep Generative Models) covers GANs alongside other generative approaches, including the theoretical analysis of the minimax objective and convergence properties. Freely available at https://www.deeplearningbook.org/.

  • Prince, S. J. D. (2023). Understanding Deep Learning. MIT Press. Chapters 14--15 cover GANs with clear diagrams and intuitive explanations of training dynamics, mode collapse, and Wasserstein distance. Freely available at https://udlbook.github.io/udlbook/.

  • Bishop, C. M. and Bishop, H. (2024). Deep Learning: Foundations and Concepts. Springer. Chapter 17 (Generative Models) places GANs in context alongside VAEs and normalizing flows.

  • Goodfellow, I. (2016). "NIPS 2016 Tutorial: Generative Adversarial Networks." arXiv:1701.00160. A comprehensive tutorial by the GAN inventor, covering theory, practical advice, and open problems. Essential reading for anyone working with GANs.

Original Papers

Core GAN Framework

  • Goodfellow, I. J., Pouget-Abadie, J., Mirza, M., et al. (2014). "Generative Adversarial Nets." NeurIPS 2014. The paper that started it all. Introduces the adversarial framework, proves the theoretical optimum, and demonstrates initial results on MNIST and CIFAR-10. A minimal training loop in this spirit is sketched after this list.

  • Radford, A., Metz, L., and Chintala, S. (2016). "Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks." ICLR 2016. Introduced DCGAN with architectural guidelines that became the standard for GAN design. Also demonstrated that GAN latent spaces support semantic arithmetic.

  • Salimans, T., Goodfellow, I., Zaremba, W., et al. (2016). "Improved Techniques for Training GANs." NeurIPS 2016. Introduced several stabilization techniques including feature matching, minibatch discrimination, historical averaging, and the Inception Score metric.
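
The adversarial game described in these papers reduces to a short alternating update. The following is a minimal PyTorch sketch of one training step using the non-saturating generator loss recommended by Goodfellow et al. (2014); the networks G and D, their optimizers, and the real batch are hypothetical placeholders, and D is assumed to return raw logits of shape (batch, 1).

    import torch
    import torch.nn.functional as F

    def train_step(G, D, opt_G, opt_D, real, latent_dim=100):
        """One alternating GAN update (assumed components: G, D, opt_G, opt_D, real)."""
        batch = real.size(0)
        ones = torch.ones(batch, 1, device=real.device)
        zeros = torch.zeros(batch, 1, device=real.device)

        # Discriminator step: push D(real) toward 1 and D(G(z)) toward 0.
        z = torch.randn(batch, latent_dim, device=real.device)
        fake = G(z).detach()
        d_loss = (F.binary_cross_entropy_with_logits(D(real), ones)
                  + F.binary_cross_entropy_with_logits(D(fake), zeros))
        opt_D.zero_grad()
        d_loss.backward()
        opt_D.step()

        # Generator step (non-saturating loss): maximize log D(G(z)).
        z = torch.randn(batch, latent_dim, device=real.device)
        g_loss = F.binary_cross_entropy_with_logits(D(G(z)), ones)
        opt_G.zero_grad()
        g_loss.backward()
        opt_G.step()
        return d_loss.item(), g_loss.item()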

Wasserstein GAN and Training Stability

  • Arjovsky, M., Chintala, S., and Bottou, L. (2017). "Wasserstein Generative Adversarial Networks." ICML 2017. Introduced WGAN, providing theoretical motivation for the Wasserstein distance and demonstrating improved training stability.

  • Arjovsky, M. and Bottou, L. (2017). "Towards Principled Methods for Training Generative Adversarial Networks." ICLR 2017. The theoretical companion to the WGAN paper, analyzing the pathologies of GAN training and motivating the move to Wasserstein distance.

  • Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., and Courville, A. (2017). "Improved Training of Wasserstein GANs." NeurIPS 2017. Introduced the gradient penalty (WGAN-GP), replacing weight clipping with a soft constraint that works better in practice (a sketch of the penalty term appears after this list).

  • Miyato, T., Kataoka, T., Koyama, M., and Yoshida, Y. (2018). "Spectral Normalization for Generative Adversarial Networks." ICLR 2018. Introduced spectral normalization, a simple and effective method for enforcing Lipschitz constraints.

  • Mao, X., Li, Q., Xie, H., Lau, R. Y. K., Wang, Z., and Smolley, S. P. (2017). "Least Squares Generative Adversarial Networks." ICCV 2017. Proposed replacing the BCE loss with a least-squares loss, providing non-vanishing gradients and improved stability.
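
As a concrete companion to the Gulrajani et al. entry above, the sketch below computes the WGAN-GP gradient penalty in PyTorch; the critic D and the real and fake batches are hypothetical placeholders, and the tensors are assumed to be images of shape (batch, channels, height, width).

    import torch

    def gradient_penalty(D, real, fake, lambda_gp=10.0):
        """WGAN-GP term: penalize the critic's gradient norm for deviating from 1
        at random interpolates between real and generated samples."""
        batch = real.size(0)
        eps = torch.rand(batch, 1, 1, 1, device=real.device)  # assumes (B, C, H, W) inputs
        interp = (eps * real + (1.0 - eps) * fake).requires_grad_(True)
        scores = D(interp)
        grads = torch.autograd.grad(outputs=scores, inputs=interp,
                                    grad_outputs=torch.ones_like(scores),
                                    create_graph=True)[0]
        grad_norm = grads.flatten(start_dim=1).norm(2, dim=1)
        return lambda_gp * ((grad_norm - 1.0) ** 2).mean()

    # Spectral normalization (Miyato et al., 2018) is available as a PyTorch utility:
    #     layer = torch.nn.utils.spectral_norm(torch.nn.Conv2d(64, 128, 3))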

Conditional and Application-Specific GANs

  • Mirza, M. and Osindero, S. (2014). "Conditional Generative Adversarial Nets." arXiv:1411.1784. The original conditional GAN paper, showing how to condition both generator and discriminator on class labels (a minimal conditioning sketch appears after this list).

  • Isola, P., Zhu, J.-Y., Zhou, T., and Efros, A. A. (2017). "Image-to-Image Translation with Conditional Adversarial Networks (Pix2Pix)." CVPR 2017. Demonstrated conditional GANs for paired image-to-image translation, introducing the PatchGAN discriminator and U-Net generator.

  • Zhu, J.-Y., Park, T., Isola, P., and Efros, A. A. (2017). "Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks (CycleGAN)." ICCV 2017. Extended image-to-image translation to unpaired data using cycle consistency loss.

  • Brock, A., Donahue, J., and Simonyan, K. (2019). "Large Scale GAN Training for High Fidelity Natural Image Synthesis (BigGAN)." ICLR 2019. Scaled conditional GANs to ImageNet, achieving then-state-of-the-art image quality with large batch sizes and class conditioning.
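
The label conditioning introduced by Mirza and Osindero is commonly implemented by embedding the class label and concatenating it with the generator's latent vector (and, symmetrically, with the discriminator's input). The PyTorch sketch below shows one such formulation; the layer sizes and output dimension are illustrative only.

    import torch
    import torch.nn as nn

    class ConditionalGenerator(nn.Module):
        """Concatenates an embedded class label with the latent vector."""
        def __init__(self, latent_dim=100, n_classes=10, embed_dim=32, out_dim=784):
            super().__init__()
            self.embed = nn.Embedding(n_classes, embed_dim)
            self.net = nn.Sequential(
                nn.Linear(latent_dim + embed_dim, 256),
                nn.ReLU(inplace=True),
                nn.Linear(256, out_dim),
                nn.Tanh(),
            )

        def forward(self, z, labels):
            # z: (batch, latent_dim); labels: (batch,) integer class indices
            return self.net(torch.cat([z, self.embed(labels)], dim=1))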

StyleGAN Family

  • Karras, T., Aila, T., Laine, S., and Lehtinen, J. (2018). "Progressive Growing of GANs for Improved Quality, Stability, and Variation." ICLR 2018. Introduced progressive training from low to high resolution, enabling generation of high-resolution face images.

  • Karras, T., Laine, S., and Aila, T. (2019). "A Style-Based Generator Architecture for Generative Adversarial Networks (StyleGAN)." CVPR 2019. Introduced the mapping network, AdaIN-based style injection (sketched after this list), and noise inputs for stochastic variation.

  • Karras, T., Laine, S., Aittala, M., et al. (2020). "Analyzing and Improving the Image Quality of StyleGAN (StyleGAN2)." CVPR 2020. Traced the characteristic blob artifacts to AdaIN and replaced it with weight demodulation, dropped progressive growing in favor of skip and residual architectures, and added path length regularization, improving image quality.

  • Karras, T., Aittala, M., Laine, S., et al. (2021). "Alias-Free Generative Adversarial Networks (StyleGAN3)." NeurIPS 2021. Addressed texture sticking and aliasing issues through translation and rotation equivariance.
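
At the heart of the original StyleGAN is adaptive instance normalization (AdaIN): each feature map is normalized per sample, then rescaled and shifted by per-channel values produced from the style vector by a learned affine layer. The PyTorch sketch below shows the idea; the (1 + scale) parameterization is one common convention in reimplementations, and the dimensions are illustrative.

    import torch
    import torch.nn as nn

    class AdaIN(nn.Module):
        """Adaptive instance normalization: normalize each feature map, then apply a
        per-channel scale and bias computed from the style vector w."""
        def __init__(self, channels, style_dim):
            super().__init__()
            self.norm = nn.InstanceNorm2d(channels, affine=False)
            self.affine = nn.Linear(style_dim, 2 * channels)

        def forward(self, x, w):
            # x: (batch, channels, H, W); w: (batch, style_dim)
            scale, bias = self.affine(w).chunk(2, dim=1)
            scale = scale.unsqueeze(-1).unsqueeze(-1)
            bias = bias.unsqueeze(-1).unsqueeze(-1)
            return (1.0 + scale) * self.norm(x) + bias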

Evaluation Metrics

  • Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., and Hochreiter, S. (2017). "GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium." NeurIPS 2017. Introduced the Fréchet Inception Distance (FID), now the standard metric for evaluating generative models (the underlying computation is sketched after this list).

  • Sajjadi, M. S. M., Bachem, O., Lucic, M., Bousquet, O., and Gelly, S. (2018). "Assessing Generative Models via Precision and Recall." NeurIPS 2018. Decomposed generation quality into precision (quality of individual samples) and recall (coverage of the data distribution).

  • Borji, A. (2022). "Pros and Cons of GAN Evaluation Measures: New Developments." Computer Vision and Image Understanding, 215. A comprehensive survey of GAN evaluation metrics.
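
FID, as defined by Heusel et al., is the Fréchet distance between two Gaussians fitted to Inception feature statistics: d^2 = ||mu_r - mu_g||^2 + Tr(Sigma_r + Sigma_g - 2 (Sigma_r Sigma_g)^(1/2)). The sketch below computes this quantity from precomputed feature arrays; extracting the Inception activations themselves is best left to a maintained library such as clean-fid (listed under Software and Implementations).

    import numpy as np
    from scipy import linalg

    def frechet_distance(feats_real, feats_gen):
        """FID from two (n_samples, feat_dim) arrays of Inception activations."""
        mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
        sigma_r = np.cov(feats_real, rowvar=False)
        sigma_g = np.cov(feats_gen, rowvar=False)

        covmean, _ = linalg.sqrtm(sigma_r @ sigma_g, disp=False)
        if np.iscomplexobj(covmean):
            covmean = covmean.real  # discard small imaginary parts from numerical error

        diff = mu_r - mu_g
        return float(diff @ diff + np.trace(sigma_r + sigma_g - 2.0 * covmean))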

Theory and Analysis

  • Mescheder, L., Geiger, A., and Nowozin, S. (2018). "Which Training Methods for GANs Do Actually Converge?" ICML 2018. Rigorous analysis of GAN training convergence, showing that many popular methods do not converge and proposing gradient penalties that do (the R1 penalty is sketched after this list).

  • Lucic, M., Kurach, K., Michalski, M., Gelly, S., and Bousquet, O. (2018). "Are GANs Created Equal? A Large-Scale Study." NeurIPS 2018. A comprehensive comparison of GAN variants, finding that most improve over the original GAN but none consistently dominates.
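
One of the convergent regularizers analyzed by Mescheder et al. is the R1 penalty, which penalizes the squared gradient norm of the discriminator on real samples only. A minimal PyTorch sketch, with the discriminator D and the real batch as hypothetical placeholders:

    import torch

    def r1_penalty(D, real, gamma=10.0):
        """R1 regularization: (gamma / 2) * E[ ||grad_x D(x)||^2 ] on real samples."""
        real = real.detach().requires_grad_(True)
        scores = D(real)
        grads = torch.autograd.grad(outputs=scores.sum(), inputs=real,
                                    create_graph=True)[0]
        return 0.5 * gamma * grads.flatten(start_dim=1).pow(2).sum(dim=1).mean()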

Surveys

  • Gui, J., Sun, Z., Wen, Y., Tao, D., and Ye, J. (2022). "A Review on Generative Adversarial Networks: Algorithms, Theory, and Applications." IEEE Transactions on Knowledge and Data Engineering, 35(4), 3313--3332. A thorough survey covering GAN theory, architectures, training techniques, and applications.

  • Jabbar, A., Li, X., and Omar, B. (2021). "A Survey on Generative Adversarial Networks: Algorithms, Theory, and Applications." IEEE Access, 9, 137488--137510.

Software and Implementations

  • PyTorch DCGAN Tutorial: https://pytorch.org/tutorials/beginner/dcgan_faces_tutorial.html --- Official PyTorch tutorial implementing DCGAN on CelebA faces.

  • StyleGAN2-ADA: https://github.com/NVlabs/stylegan2-ada-pytorch --- NVIDIA's official PyTorch implementation of StyleGAN2 with adaptive data augmentation.

  • clean-fid: https://github.com/GaParmar/clean-fid --- A library for computing FID scores with proper preprocessing, avoiding common implementation pitfalls.
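
For orientation, the snippet below follows the basic usage shown in the clean-fid README at the time of writing (a compute_fid call on two image directories); treat it as a sketch, since the API may change between versions, and the paths are illustrative.

    from cleanfid import fid

    # Compare a directory of real images against a directory of generated images.
    score = fid.compute_fid("path/to/real_images", "path/to/generated_images")
    print(f"FID: {score:.2f}")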