Case Study 37.2: The "Spotify Sound" — How Streaming Algorithms Homogenized Pop Music

The Complaint

By approximately 2016-2018, a complaint had become common enough among music critics, artists, and listeners that it was being discussed in mainstream publications: popular music was beginning to sound the same. Not just "modern pop sounds like modern pop" — that has always been true — but something more specific: the variance within mainstream popular music seemed to be contracting. Songs felt similar in tempo, similar in energy, similar in emotional valence, similar in production approach. The guitar solo had largely disappeared from mainstream pop. The dramatic slow build was replaced by immediate hook delivery. Dynamic contrast was compressed away. The unique sonic fingerprint of specific recording studios, specific equipment, and specific production approaches — what audiophiles call "character" — was increasingly absent.

The blame, rightly or wrongly, was placed on Spotify.

The Evidence: What Researchers Found

Several academic studies analyzed Spotify API data from large samples of tracks released across multiple decades and compared them on acoustic features. The findings, while not uniform, pointed in the same direction for the mainstream tier:

Tempo convergence. The distribution of tempos in Billboard Hot 100 charting songs narrowed between 2010 and 2020. Songs in the 90-130 BPM range became increasingly dominant; songs outside this range became increasingly rare in the top tier.
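A tempo-convergence claim of this kind reduces to two summary statistics per year: the share of charting songs inside the 90-130 BPM band and the spread of the tempo distribution. The sketch below shows one way to compute both. The function name `tempo_summary` and the two tempo lists are hypothetical illustrations, not chart data.

```python
from statistics import pstdev


def tempo_summary(tempos_bpm, low=90.0, high=130.0):
    """Return (share of tracks inside [low, high] BPM, population std dev of tempo)."""
    inside = sum(1 for t in tempos_bpm if low <= t <= high)
    return inside / len(tempos_bpm), pstdev(tempos_bpm)


# Hypothetical tempo samples in BPM, invented for illustration only.
tempos_2010 = [68, 84, 96, 104, 118, 126, 140, 152, 174, 100]
tempos_2020 = [92, 96, 100, 104, 110, 118, 122, 126, 128, 140]

share_2010, spread_2010 = tempo_summary(tempos_2010)
share_2020, spread_2020 = tempo_summary(tempos_2020)

print(f"2010: {share_2010:.0%} in band, std dev {spread_2010:.1f} BPM")
print(f"2020: {share_2020:.0%} in band, std dev {spread_2020:.1f} BPM")
```

A rising in-band share together with a falling standard deviation is what "narrowing" means here. Either statistic alone can mislead, since a distribution can shift its center without becoming narrower.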

Loudness stabilization (post-streaming). Loudness remained high but became somewhat less variable after streaming platforms introduced loudness normalization (2017 onward), which reduced the competitive advantage of extreme over-mastering.

Declining timbral diversity. Research by Mauch et al. (2015, published in Royal Society Open Science) analyzed a large corpus of popular music from 1960-2010 and found that timbral variety (the diversity of instrument combinations and recording textures) had been declining since the early 1980s. Because that corpus ends in 2010, the further claim that streaming accelerated this trend in the most popular tier goes beyond the study's data and should be read as an extrapolation.

Harmonic simplification. The average number of distinct chords per song in the Billboard Hot 100 declined from approximately 5.8 (1960s) to approximately 3.5 (2020s). The dominance of the I-V-vi-IV progression (and its variants) in contemporary pop is well-documented.

Valence trends. Several studies found that the average valence of popular music has shifted, with some periods showing increasing "sad" or low-valence content (which some researchers link to the prevalence of bedroom pop and emo-adjacent aesthetics on streaming) and other periods showing preference for high-valence "feel-good" content. The trend is less consistent than tempo or timbre trends.

What "More Similar" Means Spectrally

From a physics perspective, "music becoming more similar" can be operationalized in several precise ways:

Reduced spectral variance. If the distribution of spectral centroids across songs in the popular tier narrows over time, songs are becoming more spectrally similar. A drop in the standard deviation of spectral centroid values from 500 Hz to 200 Hz means the acoustic "fingerprints" of mainstream songs cluster more tightly.
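To make the spectral-centroid measure concrete, the sketch below computes the magnitude-weighted mean frequency of a signal and then compares the spread of centroids across two small sets of "tracks". It makes strong simplifying assumptions: each track is a pure sine tone, the transform is a naive DFT (a real analysis would use an FFT over windowed frames), and the names `spectral_centroid`, `tone`, `wide_era`, and `narrow_era` are hypothetical.

```python
import cmath
import math
from statistics import pstdev


def spectral_centroid(samples, sample_rate):
    """Magnitude-weighted mean frequency (Hz) of a real signal, via a naive DFT."""
    n = len(samples)
    weighted = total = 0.0
    for k in range(n // 2 + 1):  # non-negative frequency bins only
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(samples))
        magnitude = abs(coeff)
        weighted += (k * sample_rate / n) * magnitude
        total += magnitude
    return weighted / total if total else 0.0


def tone(freq_hz, sample_rate=8000, n=256):
    """Hypothetical stand-in for a track: a pure sine at freq_hz."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


SAMPLE_RATE = 8000
# Two invented "eras": one with widely spread centroids, one tightly clustered.
wide_era = [spectral_centroid(tone(f), SAMPLE_RATE) for f in (500, 1000, 2000, 3000)]
narrow_era = [spectral_centroid(tone(f), SAMPLE_RATE) for f in (1500, 1750, 2000, 2250)]

wide_spread = pstdev(wide_era)      # standard deviation of centroids, in Hz
narrow_spread = pstdev(narrow_era)
print(f"wide era: {wide_spread:.0f} Hz, narrow era: {narrow_spread:.0f} Hz")
```

The quantity that matters for the homogenization argument is the standard deviation across tracks, not any single centroid value.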

Reduced dynamic range distribution. If the distribution of dynamic range (measured in LU, Loudness Units) across songs narrows, the space of compression choices is contracting. Heavy compression means low dynamic range; lighter processing allows more dynamic variation. A contracting distribution of dynamic range means producers are converging on a specific compression philosophy.
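A standards-based loudness range measurement in LU involves frequency weighting and gating that are beyond a short example. As a crude stand-in, the sketch below uses crest factor (peak-to-RMS ratio in dB), which also falls as a signal is compressed or clipped. The swap is an assumption made for brevity, and the helper names and the two example "eras" are hypothetical.

```python
import math
from statistics import pstdev


def crest_factor_db(samples):
    """Peak-to-RMS ratio in dB; a rough proxy for dynamic range, not an LU value."""
    peak = max(abs(x) for x in samples)
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    return 20 * math.log10(peak / rms)


def hard_clip(samples, ceiling):
    """Very blunt model of heavy limiting: flatten everything above the ceiling."""
    return [max(-ceiling, min(ceiling, x)) for x in samples]


# Ten full cycles of a unit sine wave as a test signal.
sine = [math.sin(2 * math.pi * 10 * i / 1000) for i in range(1000)]

# Invented eras: varied processing choices versus near-identical heavy limiting.
varied_era = [crest_factor_db(s) for s in (sine, hard_clip(sine, 0.8), hard_clip(sine, 0.3))]
uniform_era = [crest_factor_db(hard_clip(sine, c)) for c in (0.5, 0.45, 0.4)]

varied_spread = pstdev(varied_era)
uniform_spread = pstdev(uniform_era)
print(f"varied: {varied_spread:.2f} dB, uniform: {uniform_spread:.2f} dB")
```

The contraction described above corresponds to the second case: individual values are low, and the spread between tracks is small as well.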

Reduced harmonic entropy. If the distribution of "harmonic surprise" (measured as the information-theoretic entropy of chord sequence probability distributions) decreases over time, harmonic progressions are becoming more predictable — a sign of convergence toward the statistically most common patterns.
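One simple way to put a number on harmonic predictability is the Shannon entropy of the distribution of chord-to-chord transitions within a song. The sketch below assumes chords are already labeled as Roman numerals. The function name and both progressions are hypothetical, and published studies may define harmonic surprise differently (for example, with conditional entropies estimated over a whole corpus).

```python
import math
from collections import Counter


def transition_entropy_bits(chords):
    """Shannon entropy (bits) of the empirical distribution of chord bigrams."""
    pairs = Counter(zip(chords, chords[1:]))
    total = sum(pairs.values())
    return -sum((c / total) * math.log2(c / total) for c in pairs.values())


# A four-chord loop repeated four times and closed on the tonic:
# 16 transitions, four distinct types, each occurring equally often.
loop = ["I", "V", "vi", "IV"] * 4 + ["I"]

# A hypothetical, more varied progression of the same length.
varied = ["I", "ii", "V", "I", "IV", "vii", "iii", "vi", "ii",
          "V", "I", "bVII", "IV", "I", "vi", "IV", "I"]

loop_entropy = transition_entropy_bits(loop)      # exactly 2 bits
varied_entropy = transition_entropy_bits(varied)
print(f"loop: {loop_entropy:.2f} bits, varied: {varied_entropy:.2f} bits")
```

Four equally likely transitions give log2(4) = 2 bits, so the loop is the low-entropy case. A falling average across years would indicate convergence on a few familiar progressions.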

Timbral convex hull shrinkage. If you represent each song as a point in a multi-dimensional timbre space (defined by MFCCs — Mel-Frequency Cepstral Coefficients — or similar features), and measure the volume of the "convex hull" (the multi-dimensional shape that contains all songs), a shrinking hull means the space of distinct timbres being used is contracting.
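The hull idea is easiest to see in two dimensions. The sketch below computes the convex hull of a set of 2-D points with the monotone chain method and measures its area with the shoelace formula. It assumes each song has already been reduced to two timbre coordinates. A real MFCC space has many more dimensions and would need a general-purpose computational geometry routine. All names and coordinates are hypothetical.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive means a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Monotone chain convex hull; returns vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def hull_area(points):
    """Area enclosed by the convex hull, via the shoelace formula."""
    hull = convex_hull(points)
    if len(hull) < 3:
        return 0.0
    twice_area = sum(x1 * y2 - x2 * y1
                     for (x1, y1), (x2, y2) in zip(hull, hull[1:] + hull[:1]))
    return abs(twice_area) / 2


# Invented 2-D timbre coordinates for songs in two eras.
earlier_era = [(0, 0), (4, 0), (4, 4), (0, 4), (2, 2)]
later_era = [(1, 1), (3, 1), (3, 3), (1, 3), (2, 2)]

earlier_area = hull_area(earlier_era)
later_area = hull_area(later_era)
print(earlier_area, later_area)
```

Hull volume is driven entirely by the most extreme points, so a single outlier song can inflate it. That sensitivity is one reason to read it alongside variance-based measures rather than on its own.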

All four of these metrics showed measurable contraction in the mainstream popular music tier from approximately 2010 to 2022, according to the academic studies that examined them.

Is Spotify the Cause? A Causal Analysis

The attribution of this convergence to Spotify's algorithm is widespread but contested. A careful causal analysis reveals multiple interacting mechanisms:

The Spotify hypothesis (algorithm causes convergence): Spotify's recommendation algorithm preferentially distributes music that sounds like what has been successful before. Producers receive this signal through streaming data and rationally adapt their output toward what the algorithm rewards. The result is a self-reinforcing convergence toward the acoustic center of the historical success distribution.

The major label hypothesis (industry causes convergence): Major labels have always promoted homogeneous music — radio programming, distribution power, and promotional resources have historically concentrated on a narrow acoustic profile. Streaming did not change this; it just changed the mechanism. The convergence would have happened under any system dominated by major label priorities.

The technology hypothesis (production tools cause convergence): The widespread availability of the same digital audio workstations (Ableton Live, Logic Pro, Pro Tools), the same plugins, the same sample libraries, and the same production tutorials has made it easier to converge on a similar production aesthetic. When every bedroom producer has access to the same preset orchestral sounds, the same stock drum samples, and the same compression algorithms, the outputs will naturally cluster. This is independent of Spotify.

The cultural hypothesis (listener tastes cause convergence): Perhaps listeners genuinely prefer more similar music, and streaming is simply giving them what they want more efficiently than previous distribution systems. Under this hypothesis, the convergence reflects revealed preference rather than algorithmic imposition.

The most defensible conclusion is that all four mechanisms operate simultaneously and reinforce each other. Spotify's algorithm is a contributor to convergence but not the sole or necessarily primary cause. Blaming Spotify exclusively is too simple; exonerating it entirely is also too simple.
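The self-reinforcing loop in the Spotify hypothesis can be illustrated with a toy simulation. Each track is reduced to a single acoustic feature, the "algorithm" promotes the tracks nearest the current center, and every producer then moves part of the way toward the promoted sound. This is a deliberately simplified model under invented parameters (`top_k`, `pull`, the starting values), not a description of any real recommender, and it says nothing about the other three hypotheses.

```python
from statistics import mean, pstdev


def simulate(features, rounds=5, top_k=3, pull=0.3):
    """Return the std dev of the feature after each round of the feedback loop."""
    history = [pstdev(features)]
    for _ in range(rounds):
        center = mean(features)
        # Distribution step: promote the top_k tracks closest to the center.
        promoted = sorted(features, key=lambda x: abs(x - center))[:top_k]
        target = mean(promoted)
        # Generation step: each producer shifts a fraction `pull` toward the target.
        features = [x + pull * (target - x) for x in features]
        history.append(pstdev(features))
    return history


# Hypothetical starting values for one feature (for example, tempo in BPM).
history = simulate([60, 85, 100, 120, 150, 175])
print([round(s, 1) for s in history])
```

Because every track moves toward the same target, the spread shrinks by a factor of (1 - pull) each round. Adding independent variation at the generation step, or promoting tracks far from the center, are the obvious places where such a loop could be weakened.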

The Long Tail Counter-Evidence

Against the mainstream convergence story, one must hold the long tail evidence: streaming has dramatically increased the acoustic diversity of music that reaches audiences outside the mainstream tier.

Before streaming, independent artists in niche genres faced almost insurmountable barriers to reaching their potential audiences: radio gatekeeping, physical distribution costs, retail space limitations. A small independent jazz label releasing modal post-bop in 2001 could realistically reach only listeners who already knew to look for that label. By 2023, Spotify's recommendation algorithm could identify the 3,000 listeners in São Paulo, Oslo, and Chicago who share exactly that taste and route the music directly to them.

This long-tail diversity expansion is real and large. The number of distinct genres represented in streaming catalogs dwarfs what any previous distribution system made available. The acoustic variety of music that listeners in smaller markets can access has expanded enormously. If you measure diversity across the entire streaming catalog, the picture may be one of increasing, not decreasing, acoustic diversity.

The "Spotify Sound" homogenization story and the "streaming democratizes diversity" story are both true — they apply to different tiers of the same ecosystem.

The Acoustic Politics of the "Spotify Sound"

There is an additional dimension to the "Spotify Sound" debate that acoustic analysis alone does not capture: the political economy of whose music converges and whose diverges.

The mainstream pop acoustic profile that Spotify's algorithm tends to promote (high energy, high danceability, compressed dynamics, bright spectral profile, hook-dense structure) is not culturally neutral. It reflects the acoustic conventions of Western popular music production, particularly American and British pop. This means that non-Western music, music with complex rhythmic structures outside Western pop conventions, and music with different approaches to dynamic expression (including much African, South Asian, and East Asian popular music) all face a structural disadvantage in the mainstream tier even before factors of language and cultural familiarity are considered.

The "Spotify Sound" is, in this sense, acoustically hegemonic — it encodes a specific cultural aesthetic as the neutral default and makes deviation from it legible as "non-mainstream" regardless of the cultural context in which the music was made. Whether this represents a genuine imposition of acoustic norms or simply the market outcome of a demographically skewed user base is a contested question with both acoustic and political dimensions.

Discussion Questions

  1. Researchers found that the average number of distinct chords per song in the Billboard Hot 100 declined from approximately 5.8 (1960s) to approximately 3.5 (2020s). From a physics-of-music perspective, what does this decline represent spectrally and acoustically? Is "simpler" harmony necessarily "worse" harmony? What other musical parameters might compensate for reduced harmonic variety?

  2. The chapter's "double optimization loop" predicts convergence as a mathematically inevitable outcome when both generation and distribution systems optimize for the same signal. Design a modification to either the generation system (how music is produced) or the distribution system (how music is curated and recommended) that would break the feedback loop and allow more acoustic diversity in the mainstream tier. Be specific about the mechanism.

  3. The counter-evidence from the long tail suggests that streaming has expanded acoustic diversity for audiences outside the mainstream. Does this expansion compensate for the contraction in mainstream acoustic diversity? Who benefits from each? Is there an equity argument to be made about which form of diversity is more important?

  4. If the "Spotify Sound" acoustic profile reflects Western pop production conventions and systematically disadvantages non-Western music in the mainstream recommendation tier, what changes would be needed to make the recommendation system acoustically and culturally equitable? Is this technically feasible? Is it commercially likely?