Chapter 36 Exercises: AI and Music Generation — Pattern Machines and Creative Machines
Part A: Conceptual Understanding
A1. Describe the "Musikalisches Würfelspiel" and explain how it embodies the core principle of all algorithmic music generation. What does it assume about the nature of music?
A2. Explain the difference between an autoregressive transformer and a diffusion model for music generation. Which is better suited to generating long-range coherent musical structure, and why?
A3. Define "spectral averaging" in the context of AI-generated music. Why does this phenomenon occur, given the training objective of most neural music models?
A4. Explain the statistics/physics distinction using your own example. Choose a musical phenomenon other than the singer's formant and explain: (a) what the statistics look like, (b) what physical principle generates those statistics, and (c) what an AI system trained on the statistics would and would not capture.
A5. Margaret Boden distinguishes combinational, exploratory, and transformational creativity. Classify the following AI music tasks according to Boden's framework and justify your classification: (a) generating a melody in the style of Mozart; (b) generating music by exploring the interior of the "classical piano" feature space; (c) generating a piece that invents a new musical scale that was not present in the training data.
Part B: Technical Analysis
B1. The spectral centroid of a sound is its "center of mass" in frequency. A researcher finds that AI-generated pop music has a spectral centroid distribution tightly clustered around 2.1 kHz, while human-produced pop music shows spectral centroids ranging from 0.9 kHz to 4.3 kHz. What does this tell us about how the AI was trained, and what musical consequence does this have for a listener?
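Readers who want to check B1's claim empirically can compute a spectral centroid in a few lines. This is a minimal sketch, assuming a mono NumPy signal array; the 440 Hz test tone is illustrative and not part of the exercise data.

```python
import numpy as np

def spectral_centroid(signal, sample_rate):
    """Center of mass of the magnitude spectrum, in Hz."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return np.sum(freqs * spectrum) / np.sum(spectrum)

# Sanity check: a pure 440 Hz sine should have a centroid near 440 Hz.
sr = 44100
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)
print(round(spectral_centroid(tone, sr)))  # 440
```

Applied to many tracks, the spread (or lack of spread) of this single number is exactly the clustering the researcher in B1 observed.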
B2. Consider a first-order Markov chain trained on a C-major scale melody (notes C D E F G A B C). The transition matrix shows that from each note, the chain most likely moves to an adjacent scale degree. What does this Markov chain capture about tonal music, and what does it miss? Be specific about at least three features of tonal music not captured.
B3. The forward diffusion process adds Gaussian noise to audio according to: $x_t = \sqrt{\bar{\alpha}_t} x_0 + \sqrt{1-\bar{\alpha}_t}\epsilon$. At $t = T$, $\bar{\alpha}_T \approx 0$. Explain in plain language: (a) what $x_T$ looks like physically, (b) why starting from $x_T$ and reversing the process can generate music, and (c) why the model needs to have learned the structure of music for this to work.
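The forward equation in B3 can be simulated directly. A sketch, assuming a short synthetic "audio" array in place of real music; the correlation printout shows the signal content dissolving as $\bar{\alpha}_t \to 0$:

```python
import numpy as np

def forward_diffuse(x0, alpha_bar_t, rng):
    """One jump of the forward process: x_t = sqrt(abar_t)*x0 + sqrt(1-abar_t)*eps."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1 - alpha_bar_t) * eps

rng = np.random.default_rng(0)
x0 = np.sin(2 * np.pi * 440 * np.arange(1024) / 44100)  # clean "audio"

# Early in the process (abar near 1) the signal dominates;
# at t = T (abar ~ 0), x_T is effectively pure Gaussian noise.
for abar in (0.99, 0.5, 0.001):
    xt = forward_diffuse(x0, abar, rng)
    corr = np.corrcoef(x0, xt)[0, 1]
    print(f"alpha_bar={abar:<6} corr with x0 = {corr:.2f}")
```

The generative direction runs this in reverse: starting from the noise that `abar=0.001` produces, the model must supply at each step the structure that the forward process destroyed, which is why it only works if that structure was learned from data.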
B4. Aiko found that AI-generated soprano singing produced a singer's formant region approximately 1.8 kHz wide, versus the approximately 400 Hz wide formant cluster of a typical trained singer. Calculate the ratio of the two widths. What does this ratio reveal about the spectral precision of the AI's "knowledge" of soprano technique?
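The arithmetic in B4 is a one-line check (the interpretation, not the division, is the real exercise):

```python
# Bandwidths from the exercise statement.
ai_width_hz = 1800     # AI-generated formant region, ~1.8 kHz wide
human_width_hz = 400   # trained singer's formant cluster, ~400 Hz wide

ratio = ai_width_hz / human_width_hz
print(ratio)  # 4.5 -- the AI's spectral target is 4.5x broader, i.e. less precise
```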
B5. A researcher computes the Shannon entropy of interval distributions for three systems: a Markov chain trained on Bach chorales (entropy: 2.8 bits), a deep transformer trained on the same data (entropy: 2.1 bits), and the Bach chorales themselves (entropy: 3.4 bits). What does it mean that the transformer has lower entropy than Bach? What musical consequence follows from this?
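The entropy figures in B5 come from a standard formula that is easy to reproduce. A sketch, assuming an interval histogram given as a probability vector (the two example distributions are illustrative, not Bach's actual statistics):

```python
import numpy as np

def shannon_entropy(probabilities):
    """Entropy in bits: H = -sum(p * log2(p)), skipping zero-probability bins."""
    p = np.asarray(probabilities, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

# A uniform distribution over 8 interval classes gives the maximum, 3 bits;
# a peakier (more predictable) distribution gives less.
print(shannon_entropy([1 / 8] * 8))                   # 3.0
print(shannon_entropy([0.5, 0.2, 0.1, 0.1, 0.1]))     # ~1.96
```

Lower entropy means a more concentrated, more predictable interval distribution, which is the quantitative form of the question's claim that the transformer is more predictable than Bach himself.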
Part C: Creative Process Analysis
C1. An AI music system generates a "jazz improvisation" that passes spectral analysis checks for jazz style but receives uniformly negative reviews from jazz musicians. The musicians describe it as "technically correct but soulless." Using the concepts of this chapter, give a precise, physics-grounded account of what "soulless" might mean acoustically. What measurable properties might distinguish the AI jazz from human jazz?
C2. Compare two approaches to AI-assisted composition: (a) a composer uses AI to generate a complete piece from a text prompt, then publishes it unchanged; (b) a composer uses AI to generate many fragments, curates and arranges them with human judgment, and significantly edits the result. Using Boden's creativity framework and the statistics/physics distinction, analyze the differences between these approaches. Which would you consider more "creative," and why?
C3. David Cope's EMI system (1980s) used rule-based pattern extraction and recombination to generate music "in the style of" Bach or Chopin. Modern systems use statistical learning on millions of tracks. In what ways is the modern approach a qualitative improvement? In what ways do both approaches share the same fundamental limitation described by Aiko Tanaka?
C4. A recording artist proposes using AI to generate 50 different chord progressions, then composing melodies and lyrics for each one herself, selecting her favorite 10 for her album. At what point in this workflow is the "human creativity" contribution? Could this process result in music that is more creative, less creative, or the same as entirely human-composed music? Defend your position using specific arguments.
C5. Consider the embodiment argument against AI creativity: human musical performance is physically embodied (a violinist's bow arm has mass; a singer's lungs have capacity), and these physical constraints shape the musical output in meaningful ways. Give three specific examples — from different musical traditions or instruments — where the physical embodiment of the performer produces musical structure that a bodyless AI could not generate authentically.
Part D: Physics of Originality and Copyright
D1. The U.S. Copyright Office has stated that works generated "entirely by AI without meaningful human authorship" cannot be copyrighted. Analyze the following scenario: A composer writes a text prompt of 400 words specifying the key, tempo, time signature, mood, instrumentation, melodic contour, harmonic language, and lyrical theme of a piece, then selects the best of 20 AI-generated versions and publishes it. Does this constitute "meaningful human authorship"? What physical and philosophical criteria would you use to decide?
D2. Two pieces of music are physically indistinguishable by all spectral analysis measures — identical power spectra, identical temporal microstructure. One was composed by a human over three years; the other was generated by AI in four seconds. Are they equally "original"? Are they equally "valuable"? How do your answers to these two questions differ, and why?
D3. The RIAA argues that training AI music models on copyrighted recordings constitutes infringement even if no specific protected work is reproduced. Evaluate this argument using the following framework: What, physically, does the trained model "store"? What does it not store? Is the trained model more like a recording of the music, a musician who has listened to and learned from the music, or something else entirely?
D4. Consider musical style: a composer spends a lifetime developing a unique stylistic voice — specific harmonic preferences, rhythmic fingerprints, timbral choices. An AI system can learn this style from recordings and generate new music "in the composer's style." The composer is still alive. Analyze the ethical dimensions of this scenario from the perspective of: (a) physics (what is actually being copied?), (b) economics (what harm is done?), and (c) personhood (what is the relationship between a person and their artistic style?).
D5. Propose a "physics of originality" metric — a specific, measurable criterion for determining how original a piece of music is relative to a reference corpus. Your metric should be based on spectral or information-theoretic principles rather than human judgment. What would your metric measure? What would it miss? How would AI-generated music score compared to boundary-pushing human compositions?
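One information-theoretic starting point for D5 — offered as a seed, not a complete answer — is to measure how far a piece's feature distribution sits from the corpus average, e.g. with Kullback–Leibler divergence. The histograms below are hypothetical, invented purely to show the mechanics:

```python
import numpy as np

def kl_divergence_bits(p, q, eps=1e-12):
    """D_KL(p || q) in bits; eps guards against zero bins in q."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log2(p / q)))

# Hypothetical interval-class histograms (proportions over 5 bins):
corpus   = [0.40, 0.30, 0.15, 0.10, 0.05]  # reference corpus average
averaged = [0.42, 0.29, 0.15, 0.09, 0.05]  # a piece near the corpus mean
novel    = [0.10, 0.10, 0.20, 0.25, 0.35]  # a boundary-pushing piece

print(f"averaged vs corpus: {kl_divergence_bits(averaged, corpus):.3f} bits")
print(f"novel    vs corpus: {kl_divergence_bits(novel, corpus):.3f} bits")
```

Under this metric, spectrally averaged AI output would score near zero while boundary-pushing human work scores high — but so would random noise, which is one of the "what would it miss" failure modes the question asks you to analyze.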
Part E: Synthesis and Reflection
E1. Write a 500-word dialogue between Aiko Tanaka and an AI music researcher who believes AI will eventually solve all the limitations Aiko identified in her formant experiment. Each position should be argued with physics-based reasoning. You may reach a conclusion or leave the debate unresolved — but both sides must make substantive arguments.
E2. The chapter describes four positions on AI's role in music: generator, instrument, assistant, and analyst. Design a specific AI-assisted music creation workflow for each position, appropriate to a different musical context (choose your own contexts). For each workflow, specify: what the human does, what the AI does, how their contributions interact, and what the resulting music might be capable of that neither could achieve alone.
E3. Evaluate this claim: "The AI learned the statistics of music. It didn't learn music's physics." (Aiko Tanaka, this chapter.) Is this critique specific to current AI architectures, or is it a fundamental limitation of any data-driven learning approach? If a future AI system incorporated explicit physics models (harmonic series, vocal tract acoustics, room acoustics), would it have "learned music's physics"? Defend your position.
E4. The chapter's Thought Experiment asks whether an AI-generated Bach cantata, indistinguishable from authentic Bach, would be equally valuable. Write a 400-word response from the perspective of: (a) a musicologist who studies Bach, (b) a teenager who has never thought about Bach before, (c) a music licensing executive, (d) a philosopher of mind. Do their answers converge or diverge? What does the divergence (or convergence) tell us about the nature of musical value?
E5. Design a research study that would empirically test Aiko Tanaka's claim that AI music systems reproduce the average of the training distribution rather than intentional, physics-driven structural choices. Your study should: (a) specify a measurable musical phenomenon (not the singer's formant, which Aiko already used), (b) describe how you would produce AI-generated and human-composed versions, (c) specify what spectral analysis you would perform, (d) state what result would confirm and what would disconfirm Aiko's hypothesis, and (e) address one potential confound in your design.