Chapter 37 Exercises: Music in Social Media — The Acoustics of Virality

Part A: Conceptual Understanding

A1. Define the "attention economy" and explain how it changes the structural demands placed on music compared to the pre-streaming era. Give two specific examples of acoustic properties that the attention economy favors and two that it penalizes, with physical explanations for each.

A2. Explain why spectral brightness (high spectral centroid) is particularly effective for gaining listener attention in the first three seconds of a social media video. Give two distinct physical reasons — one relating to the auditory system and one relating to the technology through which music is typically consumed on social media.

A3. TikTok's algorithm uses "completion rate" as its highest-weighted engagement signal. Explain what "completion rate" measures, why it is more valuable to the algorithm than "likes," and what acoustic properties would maximize it.

A4. Explain the difference between Spotify's "energy" metric and its "danceability" metric. What physical quantities contribute to each? Why would a piece of music score high on energy but low on danceability? Give a specific musical example.

A5. Describe the "double optimization loop" and explain how it leads to musical homogenization. At what point in the loop does the physics of acoustic features interface with the economics of platform design?


Part B: Acoustic Feature Analysis

B1. The spectral centroid formula is $C = \frac{\sum_k f_k |X_k|^2}{\sum_k |X_k|^2}$, where $f_k$ is the center frequency of frequency bin $k$ and $|X_k|^2$ is the power in that bin. A researcher measures the spectral centroid of three tracks: a trap hip-hop beat ($C = 1.1$ kHz), a pop song ($C = 2.3$ kHz), and a heavy metal song ($C = 1.9$ kHz). Rank these by predicted completion rate on TikTok and explain your ranking using the physics of spectral brightness and phone speaker reproduction.
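If you want to see how the centroid formula in B1 behaves on a signal you control, the following is a minimal sketch in pure Python. It assumes a naive DFT over the non-negative-frequency bins (fine for a short test tone, far too slow for real audio), and the names `spectral_centroid`, `sr`, and `tone` are illustrative, not taken from the chapter's code.

```python
import cmath
import math

def spectral_centroid(samples, sample_rate):
    """Power-weighted centroid in Hz: sum(f_k |X_k|^2) / sum(|X_k|^2)."""
    n = len(samples)
    weighted = 0.0
    total = 0.0
    # Use only the non-negative-frequency bins, k = 0 .. n/2.
    for k in range(n // 2 + 1):
        # Naive DFT coefficient for bin k (assumption: short signal).
        x_k = sum(s * cmath.exp(-2j * math.pi * k * i / n)
                  for i, s in enumerate(samples))
        power = abs(x_k) ** 2
        f_k = k * sample_rate / n  # center frequency of bin k in Hz
        weighted += f_k * power
        total += power
    return weighted / total if total > 0 else 0.0

# Hypothetical test tone: equal-amplitude sinusoids at 500 Hz and 1500 Hz.
# With 160 samples at 8 kHz the bins are 50 Hz apart, so both tones
# fall exactly on a bin and there is no spectral leakage.
sr = 8000
tone = [math.sin(2 * math.pi * 500 * i / sr) +
        math.sin(2 * math.pi * 1500 * i / sr) for i in range(160)]
centroid = spectral_centroid(tone, sr)
print(round(centroid))  # close to 1000 Hz, midway between the two tones
```

Because the weights are powers rather than magnitudes, making one of the two tones louder pulls the centroid toward it more strongly than a magnitude-weighted centroid would.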

B2. Calculate the RMS amplitude of the following simple waveform, given as eight samples: $x = [0.8, -0.6, 0.9, -0.7, 0.85, -0.65, 0.75, -0.8]$. If a mastering engineer applies a limiter that caps every sample magnitude at 0.6 (so all samples lie within ±0.6), calculate the new RMS and explain what this does to the dynamic range and perceived loudness.
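To check a hand calculation like the one in B2, a small helper can be useful. The sketch below assumes the limiter behaves as an ideal hard clipper, which is a simplification of real mastering limiters, and it uses a made-up four-sample waveform rather than the exercise data. The function names are illustrative.

```python
import math

def rms(samples):
    """Root-mean-square amplitude of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def hard_limit(samples, ceiling):
    """Clamp every sample to [-ceiling, +ceiling] (idealized limiter)."""
    return [max(-ceiling, min(ceiling, s)) for s in samples]

def crest_factor(samples):
    """Peak-to-RMS ratio, a simple proxy for dynamic range."""
    return max(abs(s) for s in samples) / rms(samples)

# Hypothetical waveform, deliberately different from the exercise data.
x = [0.5, -0.2, 1.0, -0.4]
limited = hard_limit(x, 0.5)

print(rms(x), crest_factor(x))
print(rms(limited), crest_factor(limited))
```

Comparing the crest factor before and after limiting shows how much the peaks have been pulled toward the average level, which is the quantity the exercise asks you to interpret.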

B3. Spotify normalizes tracks to −14 LUFS. A heavily mastered pop track arrives at −6 LUFS; a classical recording arrives at −22 LUFS. Describe what happens to each track during normalization. After normalization, which track will have more dynamic range? Which will sound more "intense"? Why?
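The arithmetic behind loudness normalization in B3 is a single gain offset in decibels. The sketch below assumes the platform applies one static gain per track and ignores any peak-headroom limit on positive gain. The measured values are hypothetical and differ from those in the exercise.

```python
def normalization_gain_db(measured_lufs, target_lufs=-14.0):
    """Static gain in dB that moves a track from measured to target loudness."""
    return target_lufs - measured_lufs

def db_to_linear(gain_db):
    """Convert a gain in dB to a linear amplitude multiplier."""
    return 10 ** (gain_db / 20)

# Hypothetical integrated-loudness measurements (not the exercise values).
for measured in (-9.0, -20.0):
    gain = normalization_gain_db(measured)
    print(measured, gain, round(db_to_linear(gain), 3))
```

A static gain shifts the whole track up or down together, so it changes playback level without changing the difference between the track's loud and quiet passages.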

B4. The onset strength function $O(t) = \sum_k \max(0, |X(t,k)|^2 - |X(t-1,k)|^2)$ measures the energy increase at each time frame. A song's opening 3 seconds are described as: "begins with a single solo acoustic guitar arpeggio, gradually increasing in volume." Estimate whether $O(t)$ at the very start of the track (treating the frame before $t = 0$ as silence) is likely to be high or low, and explain what this predicts about the song's first-second engagement on TikTok.
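The onset strength formula in B4 can be evaluated directly on a power spectrogram. The following is a minimal sketch, assuming the spectrogram is already available as a list of frames with `power_frames[t][k]` equal to $|X(t,k)|^2$. The toy three-bin data is invented for illustration.

```python
def onset_strength(power_frames):
    """Return O(t) for t = 1 .. T-1 as the summed positive power increase.

    power_frames[t][k] holds |X(t,k)|^2. The first frame has no
    predecessor, so no value is returned for t = 0.
    """
    strengths = []
    for prev, cur in zip(power_frames, power_frames[1:]):
        # Half-wave rectification: decreases in power are ignored.
        strengths.append(sum(max(0.0, c - p) for p, c in zip(prev, cur)))
    return strengths

# Hypothetical 3-bin spectrogram: a slow fade-in followed by a sudden hit.
frames = [
    [0.0, 0.0, 0.0],
    [0.1, 0.0, 0.0],
    [0.2, 0.1, 0.0],
    [4.0, 3.0, 2.0],
]
print(onset_strength(frames))
```

In this toy data the fade-in frames produce small values and the final frame produces a value more than an order of magnitude larger, which is the contrast between gradual and abrupt openings that the exercise asks about.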

B5. The SIR model gives the basic reproduction number $R_0 = \beta/\gamma$, where $\beta$ is the transmission rate and $\gamma$ is the recovery rate. For each of the following musical profiles, estimate whether $\beta$ and $\gamma$ are high or low, and whether the song is more likely to be a "viral smash" (burns bright, burns fast) or a "slow burn classic" (spreads slowly, lasts long): (a) a perfectly crafted 15-second hook, instantly recognizable but repetitive; (b) a 5-minute orchestral piece with stunning development but no obvious 15-second hook; (c) a 3-minute indie ballad with a distinctive vocal timbre and emotionally resonant lyrics that reward repeated listening.
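To build intuition for how $\beta$ and $\gamma$ shape a popularity curve, you can integrate the SIR equations numerically. The sketch below assumes the standard normalized SIR model ($dS/dt = -\beta S I$, $dI/dt = \beta S I - \gamma I$, $dR/dt = \gamma I$) with simple forward-Euler steps. The parameter values, time unit, and initial "infected" fraction are hypothetical and are not mapped to profiles (a) through (c).

```python
def simulate_sir(beta, gamma, i0=0.001, days=200, dt=0.1):
    """Forward-Euler integration of the normalized SIR model.

    s, i, r are population fractions: not yet exposed, actively
    listening/sharing, and fatigued. Returns summary statistics.
    """
    s, i, r = 1.0 - i0, i0, 0.0
    peak_i, peak_day = i, 0.0
    for step in range(1, int(days / dt) + 1):
        new_infections = beta * s * i * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        if i > peak_i:
            peak_i, peak_day = i, step * dt
    return {"r0": beta / gamma, "peak_fraction": peak_i,
            "peak_day": peak_day, "ever_reached": i + r}

# Two hypothetical parameter sets: fast spread with fast fatigue,
# and slower spread with slow fatigue.
fast = simulate_sir(beta=1.0, gamma=0.5)
slow = simulate_sir(beta=0.2, gamma=0.05)
print(fast)
print(slow)
```

Comparing `peak_day` and `peak_fraction` across the two runs shows that the timing of the peak and its height are controlled by different combinations of $\beta$ and $\gamma$, not by $R_0$ alone.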


Part C: Platform Physics and Algorithm Analysis

C1. TikTok's "sound layer" creates a positive feedback loop for songs associated with high-completion-rate videos. Compare this feedback loop to acoustic resonance in a physical cavity. In what specific ways is the analogy apt, and in what ways does it break down?

C2. Compare the loudness targets of three streaming platforms: Spotify (−14 LUFS), Apple Music (−16 LUFS), and TikTok (variable, with additional processing). What does the choice of target loudness reveal about each platform's assumptions about its listening environment? What acoustic consequence follows from a platform using a louder (less negative) LUFS target?

C3. A music producer wants to maximize the viral potential of a new song on TikTok. She identifies the "virality zone" as high energy (>0.70) and high danceability (>0.68). Design a specific set of production decisions — one for each of the following dimensions — that would push the song into this zone: (a) instrumentation choice, (b) tempo, (c) dynamic processing, (d) spectral EQ.

C4. The chapter argues that platform algorithms do not distinguish between "good" music and "engaging" music. Propose an acoustic feature or combination of features that might better track genuine musical quality rather than just engagement. What challenges would you face in operationalizing "quality" as a measurable acoustic quantity?

C5. Memeability is described as "maximum emotional meaning per second of audio." Using information theory concepts (mutual information, entropy), give a formal account of why a 2-second musical moment can carry a large amount of emotional information. What acoustic properties maximize the mutual information between a short clip and the emotional state it evokes?
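For C5, it may help to compute mutual information on a toy example before writing the formal account. The sketch below assumes a discrete joint distribution over a small set of clips and a small set of evoked emotions, using $I(C;E) = \sum_{c,e} p(c,e) \log_2 \frac{p(c,e)}{p(c)\,p(e)}$. The two probability tables are invented for illustration.

```python
import math

def mutual_information(joint):
    """Mutual information I(C;E) in bits.

    joint[c][e] is P(clip = c, emotion = e); entries must sum to 1.
    """
    p_clip = [sum(row) for row in joint]
    p_emotion = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for c, row in enumerate(joint):
        for e, p in enumerate(row):
            if p > 0:  # terms with p = 0 contribute nothing
                mi += p * math.log2(p / (p_clip[c] * p_emotion[e]))
    return mi

# Hypothetical case 1: each clip reliably evokes one emotion.
unambiguous = [[0.5, 0.0],
               [0.0, 0.5]]
# Hypothetical case 2: the clip tells you nothing about the emotion.
uninformative = [[0.25, 0.25],
                 [0.25, 0.25]]

print(mutual_information(unambiguous))    # 1.0
print(mutual_information(uninformative))  # 0.0
```

The duration of the clip does not appear anywhere in the calculation. What matters is how reliably the clip determines the listener's emotional state, which is the point the exercise asks you to formalize.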


Part D: Virality Research Design

D1. Design a study to empirically test the hypothesis that "danceability" as measured by Spotify is culturally biased toward Western popular music conventions. Specify: (a) the comparison groups of music you would analyze, (b) what additional "danceability" measure you would use as a culture-appropriate ground truth, (c) what result would confirm and what would disconfirm the hypothesis.

D2. A researcher claims that music with higher spectral centroid (brighter sound) goes viral faster on TikTok. Describe a confound that might explain this correlation without the causal relationship proposed. Then describe a research design that would control for this confound.

D3. Using the virality_analysis.py code from this chapter as a starting point, describe the analysis you would run to test whether the "virality zone" (high energy + high danceability) has become larger or smaller over the period 2012–2023. What data would you need? What statistical test would you apply to the null hypothesis "the virality zone did not change in size"?
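One possible starting point for D3 is sketched below. It does not reproduce virality_analysis.py, which is not shown here. It assumes that the "size" of the virality zone is operationalized as the share of charting tracks falling inside fixed thresholds (the 0.70 and 0.68 values from C3), and that a two-proportion z-test is an acceptable test for comparing two years. Both are assumptions you should defend or replace in your answer. The track counts are invented.

```python
import math

def in_zone(track, energy_min=0.70, dance_min=0.68):
    """True if a track's features fall inside the assumed virality zone."""
    return track["energy"] > energy_min and track["danceability"] > dance_min

def two_proportion_z_test(hits_a, n_a, hits_b, n_b):
    """Two-sided z-test of H0: the zone share is equal in samples A and B."""
    p_a, p_b = hits_a / n_a, hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: 20 of 100 charting tracks in the zone in an
# early year versus 40 of 100 in a later year.
z, p = two_proportion_z_test(20, 100, 40, 100)
print(round(z, 2), round(p, 4))
```

Counting tracks inside a fixed box is only one reading of "larger or smaller". An alternative is to estimate the region of feature space occupied by viral tracks in each year, which would call for a different statistic.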

D4. The chapter describes a polarization effect: mainstream music becomes acoustically more homogeneous while the long tail becomes more diverse. Design a study using Spotify acoustic feature data that would empirically test this hypothesis. What would you measure? How would you operationalize "homogeneous" and "diverse"?
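One candidate way to operationalize "homogeneous" and "diverse" in D4 is the mean pairwise distance between tracks in acoustic feature space. The sketch below assumes each track is a tuple of features on comparable scales (here energy and danceability). The feature choice, the Euclidean metric, and the sample values are all assumptions made for illustration.

```python
import math
from itertools import combinations

def mean_pairwise_distance(tracks):
    """Average Euclidean distance over all pairs of feature vectors.

    Lower values mean the set is more acoustically homogeneous.
    """
    pairs = list(combinations(tracks, 2))
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)

# Hypothetical (energy, danceability) vectors for two groups of tracks.
mainstream = [(0.80, 0.70), (0.82, 0.72), (0.79, 0.69)]
long_tail = [(0.20, 0.90), (0.90, 0.10), (0.50, 0.50)]

print(mean_pairwise_distance(mainstream))
print(mean_pairwise_distance(long_tail))
```

To test the polarization hypothesis you would compute this statistic separately for mainstream and long-tail samples in each year and examine whether the two trends diverge over time.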

D5. The SIR epidemic model predicts that songs with high $\beta$ (shareability) but also high $\gamma$ (listener fatigue) will have a sharp, brief viral peak, while songs with moderate $\beta$ and low $\gamma$ will spread slowly but achieve lasting cultural impact. Propose a way to empirically measure $\beta$ and $\gamma$ for specific songs using publicly available streaming data. What data streams would you need, and what assumptions would your measurement require?


Part E: Synthesis and Critical Analysis

E1. The chapter presents two opposing positions on whether algorithm-optimized music can still be art. Write a 400-word synthesis position that incorporates both views, uses specific acoustic/physical arguments, and arrives at a nuanced conclusion. Your conclusion should neither fully endorse nor fully reject either position.

E2. Compare the "acoustic selection environment" of TikTok/Spotify to two historical acoustic selection environments: (a) the concert hall in 18th-century Vienna, and (b) AM radio in 1960s America. For each comparison, identify what acoustic features the environment rewarded, what it penalized, and what music evolved to thrive in it. What is similar and what is different about the selection pressures of the current streaming environment?

E3. The chapter identifies a paradox: algorithmic recommendation narrows the mainstream while deepening niche communities. Use the physics of clustering (or network physics) to explain this paradox mechanistically — why does optimizing for engagement produce polarization rather than either pure homogenization or pure diversification?

E4. Aiko Tanaka appears only briefly in this chapter: she uses social media to disseminate her research on the singer's formant. If she were to apply the acoustic virality framework from this chapter to the question of how scientific content propagates on social media, what would she predict? Which acoustic or structural features of scientific communication are compatible with viral propagation, and which are not?

E5. Compose a 300-word critique of the "Spotify Sound" homogenization narrative from the perspective of a music sociologist who argues that streaming has actually increased access to musical diversity for listeners outside major music-industry centers (e.g., rural areas, developing countries, non-English-speaking populations). How does this critique relate to the mainstream/long-tail polarization analysis in the chapter? Is the critique compatible with the acoustic evidence?