Chapter 18 Exercises: Information Theory & Music

Part A: Shannon Entropy Calculations

A1. A melody consists of only three notes: C, G, and E, appearing with probabilities P(C) = 1/2, P(G) = 1/4, P(E) = 1/4.

(a) Calculate the Shannon entropy H of this melody in bits per note. Use H = -Σ P(x) log₂ P(x).

(b) What would the entropy be if all three notes were equally likely (P = 1/3 each)?

(c) What would the entropy be if C always appeared (P(C) = 1, P(G) = 0, P(E) = 0)?

(d) Which of these three distributions carries the most information per note? Why?
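The three distributions can be checked with a short helper (a standalone sketch, not code from the chapter):

```python
from math import log2

def shannon_entropy(probs):
    """H = -sum P(x) * log2 P(x); symbols with P = 0 or P = 1 contribute nothing."""
    return sum(-p * log2(p) for p in probs if 0 < p < 1)

print(shannon_entropy([0.5, 0.25, 0.25]))  # (a) 1.5 bits
print(shannon_entropy([1/3, 1/3, 1/3]))    # (b) log2(3) ≈ 1.585 bits
print(shannon_entropy([1.0, 0.0, 0.0]))    # (c) 0 bits -- certainty carries no information
```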

A2. A pentatonic scale has five notes. Consider two pentatonic melodies:

- Melody A: Uses all five notes with probabilities 0.4, 0.25, 0.15, 0.1, 0.1
- Melody B: Uses all five notes equally (0.2 each)

(a) Calculate H(A) and H(B).

(b) H_max for 5 symbols = log₂(5) ≈ 2.322 bits. What fraction of the maximum entropy does each melody achieve? (Express as H/H_max.)

(c) A third melody uses only three of the five pentatonic notes (with equal probability). Calculate its entropy. Is this higher or lower than Melody A? Does this match your intuition?
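A sketch for checking parts (a)–(c), using the standard entropy formula:

```python
from math import log2

def H(probs):
    """Shannon entropy in bits, skipping zero-probability symbols."""
    return sum(-p * log2(p) for p in probs if 0 < p < 1)

h_a, h_b = H([0.4, 0.25, 0.15, 0.1, 0.1]), H([0.2] * 5)
h_max = log2(5)
print(f"H(A) = {h_a:.3f} bits ({h_a / h_max:.1%} of H_max)")  # ≈ 2.104 bits, ≈ 90.6%
print(f"H(B) = {h_b:.3f} bits ({h_b / h_max:.1%} of H_max)")  # = log2(5) ≈ 2.322 bits, 100%
print(f"H(three equal notes) = {H([1/3] * 3):.3f} bits")      # log2(3) ≈ 1.585 bits
```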

A3. The information content of a specific event with probability P is I = -log₂(P) bits.

(a) In a C major melody, suppose the probability that the note following the leading tone (B) is the tonic (C) is P = 0.75. What is the information content of hearing C after B? Compare to the information content if P = 0.1 (an unexpected continuation).

(b) In a twelve-tone composition where all pitch classes are used with equal probability (P = 1/12), what is the information content of any particular pitch?

(c) A deceptive cadence substitutes the vi chord for the expected I chord. If the vi chord appears at a cadential point with probability P = 0.12, and the I chord appears with probability P = 0.70, calculate the information content of each. Explain why the deceptive cadence is informationally surprising.
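The surprisal values for all three parts can be computed directly (a minimal sketch):

```python
from math import log2

def surprisal(p):
    """Information content I = -log2(P) of an event with probability P, in bits."""
    return -log2(p)

print(f"B -> C,   P = 0.75: I = {surprisal(0.75):.3f} bits")  # ≈ 0.415
print(f"B -> ?,   P = 0.10: I = {surprisal(0.10):.3f} bits")  # ≈ 3.322
print(f"12-tone,  P = 1/12: I = {surprisal(1/12):.3f} bits")  # ≈ 3.585
print(f"vi chord, P = 0.12: I = {surprisal(0.12):.3f} bits")  # ≈ 3.059
print(f"I chord,  P = 0.70: I = {surprisal(0.70):.3f} bits")  # ≈ 0.515
```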

A4. Consider a melody where each note is a fair coin flip between the previous note (P = 0.5) and the note a minor second above the previous (P = 0.5).

(a) What is the conditional entropy H(current note | previous note)?

(b) Is this melody more or less predictable than a melody where each note is chosen uniformly from all 12 pitch classes? Explain.

(c) What is the unigram (context-free) entropy of this melody? Hint: think about what the long-run distribution of pitch classes will look like as the melody progresses.
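The long-run behavior in part (c) can be checked by simulation (a sketch; the seed, step count, and the wrapping of pitches into pitch classes are choices of this sketch, not of the exercise):

```python
import random
from collections import Counter
from math import log2

random.seed(0)
N = 200_000
pitch, counts = 0, Counter()
for _ in range(N):
    counts[pitch] += 1
    if random.random() < 0.5:       # move up a minor second, else repeat the note
        pitch = (pitch + 1) % 12    # track pitch classes so the walk wraps the octave

# (a) two equally likely continuations -> conditional entropy is exactly 1 bit
# (c) the walk cycles through all 12 pitch classes, so the long-run unigram
#     distribution is uniform and the unigram entropy approaches log2(12)
unigram_H = sum(-(c / N) * log2(c / N) for c in counts.values())
print(f"simulated unigram entropy: {unigram_H:.3f} bits (log2(12) ≈ {log2(12):.3f})")
```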

A5. The entropy of a source with N equally likely symbols is H = log₂(N) bits.

(a) Calculate H for: N=2 (binary choice), N=7 (diatonic scale), N=12 (chromatic scale), N=88 (full piano keyboard), N=128 (full MIDI range).

(b) A melody drawn uniformly from the 7 diatonic pitch classes has entropy log₂(7) ≈ 2.807 bits. A melody drawn uniformly from all 12 chromatic pitch classes has entropy log₂(12) ≈ 3.585 bits. What is the extra information "cost" of using chromaticism? Explain intuitively what this means.

(c) If a composer restricts their palette to 7 diatonic notes (reducing entropy from 3.585 to 2.807 bits), they have "thrown away" 0.778 bits of potential information per note. In what sense is this a good trade for the composer and listener?
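The values for parts (a) and (b) in one pass (a quick sketch):

```python
from math import log2

for n, label in [(2, "binary choice"), (7, "diatonic scale"), (12, "chromatic scale"),
                 (88, "piano keyboard"), (128, "MIDI range")]:
    print(f"N = {n:3d} ({label}): H = {log2(n):.3f} bits")
# N=2 -> 1.000, N=7 -> 2.807, N=12 -> 3.585, N=88 -> 6.459, N=128 -> 7.000

print(f"chromatic cost: {log2(12) - log2(7):.3f} bits per note")  # ≈ 0.778
```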


Part B: Expectation, Tension, and Prediction Error

B1. Describe a musical experience you have had where you predicted what would happen next and were proven right. Describe another experience where your prediction was wrong (pleasantly). Using ITPRA theory (Imagination, Tension, Prediction, Reaction, Appraisal):

(a) Identify which stage of ITPRA was most prominent in each experience.

(b) In the second experience (wrong prediction), was your initial Reaction positive or negative? What determined whether the Appraisal was ultimately positive or negative?

(c) Do you think the amount of musical training you have affects the accuracy of your musical predictions? How, and at what stage of ITPRA does training most affect processing?

B2. The ERAN (Early Right Anterior Negativity) is a brain potential that increases with the unexpectedness of a harmonic event. An experiment plays listeners a series of chord progressions in C major, followed by:

- Condition A: C major chord (tonic, expected)
- Condition B: Am chord (submediant, expected deceptive cadence)
- Condition C: F# major chord (out of key, unexpected)
- Condition D: F# diminished chord (maximally unexpected)

(a) Rank these four conditions from smallest to largest expected ERAN response.

(b) If P(C major at cadence) = 0.55 and P(F# major at cadence) = 0.01, calculate the information content of each event.

(c) Does a large ERAN response indicate that the music is "better" or "worse"? Explain why ERAN magnitude is not a quality measure.

B3. A jazz musician improvises a solo over a chord progression. Experienced jazz listeners can form accurate predictions for what notes will come next, based on their knowledge of bebop vocabulary. Novice listeners cannot.

(a) From an information-theoretic perspective, do experienced and novice listeners "receive" the same information from the same solo? Explain.

(b) Who receives more information per note — the expert or the novice? Is this consistent with the idea that the expert "gets more" out of the music?

(c) Propose an experiment that could test whether musical training changes the information content listeners extract from music. What would you measure, and what results would confirm or disconfirm the hypothesis?

B4. "Meta-expectation" is the expectation of a violation. An experienced listener who has heard many deceptive cadences can predict them, paradoxically lowering the information content of the violation itself.

(a) Explain in information-theoretic terms: if a deceptive cadence has probability P₁ for a novice listener and probability P₂ > P₁ for an expert listener (because the expert predicts it better), how does the information content differ for each listener?

(b) Does this mean that musical education reduces the information content of music for the educated listener? If so, is this a problem? What does the educated listener gain if not more information?

(c) There is a paradox: highly trained musicians can experience strong emotional responses to music they have heard hundreds of times and can predict completely. How can information theory account for this?

B5. Compose (or describe in detail) a four-measure melody in which you deliberately:

- Use two moments of high information content (very surprising notes or harmonies)
- Use two moments of low information content (highly expected, confirming notes)

Locate these moments strategically to create a specific emotional arc (you define the arc). Then analyze: does the melody's emotional arc match the information content profile you designed? Are there aspects of the emotional experience that your information analysis does not capture?


Part C: Compression and Musical Grammar

C1. Data compression algorithms work by finding redundancy — patterns that repeat — and encoding them more compactly. Apply this idea to music:

(a) A piece consists of a 4-bar theme followed by the same theme transposed up a fifth, then a third theme of similar length (all different), then a return of the first theme, and finally a coda using material from the first theme. Write out the compression scheme: how would you encode this piece in terms of themes (T1, T2, T3) and operations (transpose, repeat) rather than note by note? How does your compressed representation compare in length to the full note sequence?

(b) A standard 32-bar jazz chorus has the form AABA (where A is 8 bars repeated three times and B is an 8-bar "bridge"). If A and B sections are completely different, what compression ratio does the AABA form achieve compared to stating all 32 bars independently?

(c) The opening of Beethoven's Fifth Symphony (first movement) repeatedly uses the four-note motif in various transformations. Estimate the compression ratio of describing the first 30 bars in terms of the motif and its transformations, versus writing out all the notes.

C2. MP3 audio compression exploits two types of redundancy:

- Statistical redundancy: patterns in the audio that are predictable
- Perceptual redundancy: information the human ear cannot detect (removed by the encoder)

(a) Which type of redundancy is most directly related to Shannon entropy? Explain.

(b) A very high-entropy piece of music (maximally unpredictable, like white noise) should compress less well than a tonal melody. Why? What does this imply about the relationship between musical structure and audio file size?

(c) Consider two recordings of the same melody: (1) a live performance with natural room acoustics and performance variations, (2) a synthesized performance with perfect tempo and pitch. Which would compress to a smaller file size? Why? What does this tell us about the "information" contained in performance variation?

C3. Musical notation is itself a compression scheme: a score encodes many minutes of music in a compact symbolic representation.

(a) Estimate the compression ratio of musical notation. A typical symphony movement of 10 minutes contains approximately 100,000 notes across all instruments. A full orchestra score might contain 50 pages of notation. Roughly how many notes per page? How does this compare to storing the same information as a list of (pitch, duration, instrument, dynamics) tuples?

(b) What information does a score compress away that the performance must supply? List at least four types of information that must be added by the performer that are not explicitly in the score.

(c) A conductor's interpretation of a score can be thought of as resolving the ambiguity left by the compressed notation. In information-theoretic terms, the conductor is adding information that the score left unspecified. Is a more interpretively flexible score (one that leaves more to the performer) higher or lower entropy than a more prescriptive score?

C4. Tonal harmony can be understood as a grammar — a set of rules that constrains what note can follow what. This grammar reduces the conditional entropy of the pitch sequence.

(a) In the key of C major, estimate the conditional probability of each note following the leading tone B (assume the probabilities are approximately 0.7 for C, 0.1 for D, 0.05 for G, and 0.05 for A, with the remaining 0.1 distributed among other notes). Calculate the conditional entropy H(next note | current note = B).

(b) Compare this to the conditional entropy if no tonal grammar were operating (uniform distribution over 12 notes). The difference between these two entropies represents the "information reduction" provided by the tonal context. In what sense is this reduction cognitively useful for the listener?

(c) Does a composer who violates the expected resolution of the leading tone (e.g., resolving B down to Bb instead of up to C) produce a high-information or low-information event? How does this compare to the same note (Bb) appearing in a context where it was expected?
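Parts (a) and (b) can be checked numerically. The exercise does not specify how the remaining 0.1 is split, so spreading it equally over the other eight chromatic pitch classes is an assumption of this sketch:

```python
from math import log2

def H(probs):
    """Shannon entropy in bits, skipping zero-probability symbols."""
    return sum(-p * log2(p) for p in probs if 0 < p < 1)

# assumed split: the remaining 0.1 spread equally over the other 8 pitch classes
probs = [0.7, 0.1, 0.05, 0.05] + [0.1 / 8] * 8
h_tonal, h_uniform = H(probs), log2(12)
print(f"H(next | B) ≈ {h_tonal:.3f} bits")                       # ≈ 1.757 bits
print(f"uniform baseline: {h_uniform:.3f} bits")                 # ≈ 3.585 bits
print(f"information reduction: {h_uniform - h_tonal:.3f} bits")  # ≈ 1.828 bits
```

A different split of the remaining 0.1 changes the result only slightly; the roughly two-bit reduction relative to the uniform baseline is the robust finding.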

C5. The concept of redundancy in information theory is R = 1 - H/H_max, where H is the actual entropy and H_max is the maximum possible entropy. R ranges from 0 (no redundancy, maximum information) to 1 (complete redundancy, no information).

(a) Calculate the redundancy of a C major scale used as a melody (assume all 7 notes are equally likely). Compare to the redundancy of a chromatic melody (all 12 notes equally likely).

(b) Shannon showed that to transmit information reliably over a noisy channel, some redundancy is necessary — you need to add redundant information to detect and correct errors. Propose an analogy in music: in what sense does musical redundancy help listeners "decode" music reliably across noisy perceptual conditions (bad acoustics, inattention, unfamiliarity with style)?

(c) Too much redundancy makes communication inefficient; too little makes it unreliable. At what level of redundancy does music seem to operate? Is this level similar to the redundancy of natural language? What does this comparison suggest?
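For part (a), the redundancy depends on what you take H_max to be; this sketch assumes the full 12-note chromatic as the reference alphabet, so H_max = log2(12):

```python
from math import log2

def redundancy(h, h_max):
    """R = 1 - H / H_max."""
    return 1 - h / h_max

h_max = log2(12)  # assumption: chromatic pitch classes as the reference alphabet
print(f"diatonic melody (7 equal notes):   R = {redundancy(log2(7), h_max):.3f}")   # ≈ 0.217
print(f"chromatic melody (12 equal notes): R = {redundancy(log2(12), h_max):.3f}")  # = 0.000
```

If you instead take H_max = log2(7) for the diatonic melody (measuring against its own alphabet), its redundancy is 0 as well, which is why stating the reference alphabet matters.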


Part D: Cross-Cultural and Advanced Analysis

D1. The chapter discusses the Spotify Spectral Dataset analysis showing that pitch entropy is higher in jazz than in pop music. Design a more rigorous study to test this claim:

(a) Operationalize "pitch entropy": exactly what would you measure and how?

(b) What sample size would you need? How would you select representative tracks from each genre?

(c) What confounds could affect your results? (Consider: era, subgenre, instrumentation, recording technology.) How would you control for these?

(d) What alternative explanation for the jazz-pop entropy difference would need to be ruled out? (Hint: think about improvisation vs. composition.)

D2. Aiko's experiment found that her composition had higher Shannon entropy than Bach's chorale. But Aiko was unsatisfied with this conclusion because she felt her music was more "complex" in some sense.

(a) Propose at least three dimensions of musical complexity that Shannon entropy does not capture. For each, give an example of music that is complex in that dimension while potentially having low Shannon entropy.

(b) Aiko decides to compute the "Kolmogorov complexity" of her composition vs. Bach's as a complementary measure. If her composition has high Shannon entropy but also high Kolmogorov complexity (hard to describe compactly), while Bach has low Shannon entropy and low Kolmogorov complexity (easy to describe compactly), what does this suggest about the nature of each piece's "complexity"?

(c) Based on your analysis, give Aiko concrete advice: if she wants to write music that is both structurally sophisticated (low Kolmogorov complexity — describable by elegant rules) and locally surprising (high conditional Shannon entropy — hard to predict note by note), what compositional approach should she pursue?

D3. Information theory was originally developed for communication over noisy channels. Apply this framework to a live concert performance:

(a) Identify the "source" (what is being communicated), the "channel" (how it is transmitted), the "noise" (what interferes with transmission), and the "receiver" (who receives the communication) in a live concert context.

(b) Shannon's Channel Capacity theorem says that there is a maximum rate at which information can be reliably transmitted over a noisy channel. What limits the "channel capacity" of a concert performance? (Think about: acoustic quality of the hall, audience attention, performer technique, listener familiarity with the style.)

(c) Does a concert in a reverberant hall with poor acoustics have lower channel capacity than a concert in a perfectly designed concert hall? What kinds of musical information are most affected by acoustic quality?

D4. The chapter discusses how Western tonality's reduction of conditional entropy might explain its global commercial dominance. Critically evaluate this hypothesis:

(a) State the hypothesis clearly: why would lower conditional entropy be commercially advantageous?

(b) What evidence supports this hypothesis?

(c) What alternative explanations for Western tonality's global dominance should be considered? (Think about: colonialism and cultural imperialism, economic power of the music industry, the role of technology in disseminating Western music, etc.)

(d) If it were shown empirically that lower entropy music is more commercially successful globally, would this imply that lower entropy music is "better" music? What would be wrong with drawing this conclusion?

D5. The redundancy of natural English text is estimated at about 75%: Shannon showed that about 75% of the letters in an English sentence could be correctly predicted from their context, leaving only 25% of letters carrying genuine information. Research or estimate the redundancy of tonal music by the following thought experiment:

If you were to play a tonal melody and stop at each note, asking a musically trained listener to predict the next note, what fraction of notes would they correctly predict? How does this compare to the 25% information rate of English text? What does this comparison suggest about music's information density relative to language?


Part E: Synthesis and Research Projects

E1. Replicate Aiko's experiment using the code provided in the chapter's code directory (entropy_analysis.py). Run the script and examine the output.

(a) Report the unigram, bigram, and trigram entropy values for all three sequences.

(b) Modify the generate_bach_like_sequence function to make the tonal grammar stricter (reduce the probability of non-diatonic notes to near zero) or looser (increase chromatic probabilities). Run the analysis again. How do the entropy values change?

(c) Create a fourth sequence type: a "modal" sequence using only the notes of the Dorian mode (D, E, F, G, A, B, C) with approximately equal probabilities. How does its entropy compare to the Bach-like tonal sequence? What does this tell you about the information-theoretic consequences of different modal systems?
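Part (c) can be prototyped independently of the chapter's entropy_analysis.py script. This standalone sketch draws a Dorian-mode sequence with approximately equal note probabilities and measures its unigram entropy:

```python
import random
from collections import Counter
from math import log2

random.seed(42)
DORIAN = ["D", "E", "F", "G", "A", "B", "C"]  # D Dorian pitch classes

seq = random.choices(DORIAN, k=50_000)        # approximately equal probabilities
counts = Counter(seq)
H = sum(-(c / len(seq)) * log2(c / len(seq)) for c in counts.values())
print(f"modal unigram entropy ≈ {H:.3f} bits (maximum log2(7) ≈ {log2(7):.3f})")
```

With near-equal probabilities the unigram entropy sits close to log2(7) ≈ 2.807 bits; if the chapter's Bach-like generator weights tonic-triad notes more heavily, its unigram entropy should fall below this near-uniform modal value.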

E2. The chapter mentions that the entropy profile of a tonal piece — the pattern of high-entropy (uncertain) and low-entropy (predictable) moments — is itself part of the compositional design.

Analyze the entropy profile of a real piece of music you know. Specifically:

(a) Choose a short tonal composition (a folk song, a simple chorale, or a theme from a classical work) and identify its highest-entropy and lowest-entropy moments by ear. (High entropy = where you might not predict what comes next; low entropy = where the next note is obvious.)

(b) Map these high/low entropy moments onto the piece's formal structure. Do the high-entropy moments coincide with structural boundaries (beginnings of sections, modulations), and do the low-entropy moments coincide with cadences and arrivals?

(c) Does your analysis support the claim in the chapter that entropy profile is a compositional tool — that composers deliberately manipulate listener expectation? Or does the entropy pattern you found seem more like a consequence of the formal structure rather than its cause?

E3. The relationship between musical information and emotional response is one of the most debated topics in music psychology.

(a) Present the strongest version of the claim that emotion in music is purely information-theoretic — that musical emotions are entirely explained by surprise, prediction error, and dopamine release.

(b) Present the strongest objection to this reductionist claim. What aspects of musical emotion does information theory clearly fail to explain?

(c) Develop your own synthesis: is information theory a partial account of musical emotion, a complete account, or a misleading account? What additional theoretical resources would be needed for a complete theory of musical emotion?

E4. Design a study to test the claim that "listeners prefer music with intermediate entropy (approximately 1/f statistics) over music with too-high or too-low entropy."

(a) Define your stimuli: how would you generate musical excerpts with controlled entropy levels? What variables would you hold constant while varying entropy?

(b) Define your dependent variable: how would you measure "preference"? (Consider: explicit rating, listening time, physiological measures, behavioral indicators.)

(c) Predict potential confounds: what factors might cause some listeners to prefer high-entropy music over intermediate-entropy music, regardless of any universal preference? How would you control for these?

(d) What result would falsify the hypothesis? What result would support it? Would supporting the hypothesis necessarily imply that entropy is the cause of preference, or merely correlated with it?

E5. Essay Project: Write a 600–800 word essay titled "What Information Theory Tells Us — and Doesn't Tell Us — About Musical Meaning."

Your essay must:

- Explain Shannon's definition of information and its application to music (2–3 paragraphs)
- Describe at least two specific musical phenomena that information theory illuminates (1–2 paragraphs)
- Identify at least two aspects of musical meaning that information theory cannot explain (1–2 paragraphs)
- Conclude with your own assessment: is information theory a powerful framework for understanding music, a useful tool among others, or a fundamentally limited approach? (1 paragraph)

Use Aiko's entropy experiment as a specific example at some point in your essay. Cite at least one source beyond this textbook.