Chapter 36 Further Reading: AI and Music Generation
Foundational Research Papers
Agostinelli, A., Denk, T. I., Borsos, Z., et al. (2023). "MusicLM: Generating Music from Text." arXiv preprint arXiv:2301.11325. The foundational Google paper on text-to-music generation using hierarchical audio language models. Essential reading for understanding the architecture described in this chapter.
Défossez, A., Copet, J., Synnaeve, G., & Adi, Y. (2022). "High Fidelity Neural Audio Compression." arXiv preprint arXiv:2210.13438. The EnCodec paper describing neural audio codecs, the compression systems that make latent audio diffusion tractable.
Engel, J., Hantrakul, L., Gu, C., & Roberts, A. (2020). "DDSP: Differentiable Digital Signal Processing." International Conference on Learning Representations (ICLR) 2020. The paper introducing neural audio synthesis built on differentiable signal-processing components such as oscillators and filters. Groundbreaking for demonstrating how to embed interpretable signal models in neural architectures. Freely available on arXiv.
Ho, J., Jain, A., & Abbeel, P. (2020). "Denoising Diffusion Probabilistic Models." NeurIPS 2020. The foundational paper on the diffusion model framework. Understanding this paper gives you the mathematical foundation for all audio diffusion systems.
Rombach, R., Blattmann, A., Lorenz, D., Esser, P., & Ommer, B. (2022). "High-Resolution Image Synthesis with Latent Diffusion Models." CVPR 2022. Stable Diffusion's foundational paper — the latent diffusion framework for images that is directly adapted for audio in systems like Lyria.
Vaswani, A., Shazeer, N., Parmar, N., et al. (2017). "Attention Is All You Need." NeurIPS 2017. The transformer paper. Indispensable for understanding the architecture underlying most modern AI music generation.
Books and Book Chapters
Boden, M. A. (2004). The Creative Mind: Myths and Mechanisms (2nd ed.). Routledge. The source for Boden's combinational/exploratory/transformational creativity framework discussed in Section 36.9. Accessible and philosophically rigorous.
Cope, D. (2001). Virtual Music: Computer Synthesis of Musical Style. MIT Press. David Cope's account of his EMI (Experiments in Musical Intelligence) system, including the philosophical controversy it generated. Includes responses from critics.
Roads, C. (2001). Microsound. MIT Press. A deep dive into the granular and spectral structures of sound at the sub-note level — essential background for understanding what AI systems are actually modeling.
Xenakis, I. (1971). Formalized Music: Thought and Mathematics in Music. Indiana University Press. Xenakis's account of stochastic composition — the founding text for mathematically rigorous algorithmic music. Dense but rewarding.
Accessible Reading
Levitin, D. (2006). This Is Your Brain on Music. Dutton. The neuroscience and cognitive science of musical experience: essential background for understanding what AI music does and does not achieve in terms of listener experience.
Marcus, G., & Southen, R. (2024, January 6). "Generative AI Has a Visual Plagiarism Problem." IEEE Spectrum. Covers the copyright issues around AI-generated content from a technical angle applicable to both images and music.
Metz, C. (2023, July 5). "AI and Music: What Does It Mean for the Future of the Industry?" The New York Times. Journalism covering the industry response to AI music generation in a balanced, well-reported way.
Pareles, J. (2024). "The A.I. Music Machine: Suno and the Challenge to the Industry." The New York Times. Coverage of the Suno emergence and RIAA lawsuit, accessible to general readers.
Legal Resources
U.S. Copyright Office. Copyright and Artificial Intelligence. (Multiple reports, 2023–2024.) The Copyright Office's ongoing analysis of AI and copyright, including analysis of training data, AI-generated works, and what constitutes "human authorship." Available at copyright.gov.
Pham, V. (2024). "RIAA v. Suno: What the AI Music Lawsuit Means for the Industry." Music Business Worldwide. Accessible legal analysis of the RIAA lawsuit and its implications.
Historical and Philosophical Context
Hofstadter, D. R. (1979). Gödel, Escher, Bach: An Eternal Golden Braid. Basic Books. Hofstadter's exploration of creativity, consciousness, and formal systems, which supplies the philosophical context for asking whether AI can be creative. Hofstadter was later famously troubled by Cope's EMI system.
Turing, A. M. (1950). "Computing Machinery and Intelligence." Mind, 59(236), 433–460. The paper that introduced the "imitation game" — the philosophical precursor to every discussion of whether a machine can "really" be intelligent, creative, or musical.
Online Resources and Tools
Magenta Project (Google). magenta.tensorflow.org — Open-source music AI research and tools, including MusicVAE, DDSP, and interactive demos. The educational resources are particularly strong.
OpenAI MuseNet. openai.com/research/musenet — Background on MuseNet, an earlier (2019) multi-instrument music generation system, with listening examples. A useful historical comparison point.
Suno. suno.com — The commercial text-to-song system discussed in this chapter. Try it yourself; listen critically with the spectral concepts in mind.
Udio. udio.com — The competing commercial text-to-song system. Compare outputs from both systems using the same prompt — an excellent practical exercise in AI music analysis.