Chapter 33 Further Reading: Audio Compression — MP3, Perceptual Coding & What We Lose
Foundational Psychoacoustics
Zwicker, Eberhard, and Hugo Fastl. Psychoacoustics: Facts and Models, 3rd ed. Springer, 2007. The foundational textbook of psychoacoustics, co-authored by Eberhard Zwicker, who developed the Bark scale and much of critical band theory: the perceptual science on which all perceptual audio codecs rest. Technically demanding but essential for anyone who wants to understand the masking model in depth. Covers simultaneous masking, temporal masking, the absolute threshold of hearing, and much more.
Moore, Brian C.J. An Introduction to the Psychology of Hearing, 6th ed. Emerald, 2012. More accessible than Zwicker-Fastl. Moore's textbook is the standard introduction to hearing science for students without a physics background. Excellent chapters on masking and its neural basis.
Békésy, Georg von. Experiments in Hearing. McGraw-Hill, 1960. The Nobel Prize-winning work on the mechanics of the cochlea. Foundational for understanding why masking behaves as it does: the traveling wave on the basilar membrane determines the spatial structure of masking. Historically important and surprisingly readable.
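As a quick orientation to the Bark scale and critical bands named in the Zwicker and Fastl entry, here is a minimal Python sketch. It uses the widely quoted Zwicker and Terhardt (1980) closed-form approximations, not anything taken from this chapter; the function names are illustrative. Treat it as a rough aid for reading the masking literature, not as the model any particular codec implements.

```python
import math

def hz_to_bark(freq_hz: float) -> float:
    """Approximate critical-band rate (in Bark) for a frequency in Hz.

    Zwicker & Terhardt (1980) approximation; an assumption of this sketch.
    """
    return (13.0 * math.atan(0.00076 * freq_hz)
            + 3.5 * math.atan((freq_hz / 7500.0) ** 2))

def critical_bandwidth_hz(freq_hz: float) -> float:
    """Approximate critical bandwidth (in Hz) centred on freq_hz.

    Same source as above: roughly 100 Hz wide at low frequencies,
    widening steadily above about 500 Hz.
    """
    return 25.0 + 75.0 * (1.0 + 1.4 * (freq_hz / 1000.0) ** 2) ** 0.69

if __name__ == "__main__":
    for f in (100, 500, 1000, 4000, 10000):
        print(f"{f:>6} Hz  {hz_to_bark(f):5.2f} Bark  "
              f"bandwidth {critical_bandwidth_hz(f):7.1f} Hz")
```

With these approximations, 1 kHz falls at about 8.5 Bark. The widening bandwidth at high frequencies is one reason a masker there can hide noise over a much larger span in Hz than a masker at low frequencies.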
Codec Standards and Technical Documents
Brandenburg, Karlheinz, and G. Stoll. "ISO/MPEG-1 Audio: Coding of High-Quality Digital Audio for Rates between 32 and 384 kbit/s." Journal of the Audio Engineering Society, 42/10 (1994): 789–814. The primary paper describing the MPEG-1 Audio standard, including Layer 3 (MP3), co-authored by MP3's principal developer. Technical but essential for understanding what the codec actually does versus what popular accounts claim it does.
ISO/IEC 11172-3:1993. Information Technology — Coding of Moving Pictures and Associated Audio for Digital Storage Media at up to about 1.5 Mbit/s — Part 3: Audio. The official MPEG-1 Audio standard, including Layer 3 (MP3). Publicly available from the ISO. More a specification than a tutorial, but essential for technical accuracy.
Herre, Jürgen, and James D. Johnston. "Enhancing the Performance of Perceptual Audio Coders by Using Temporal Noise Shaping." Proceedings of the AES 101st Convention, 1996. The paper introducing Temporal Noise Shaping (TNS) — the AAC technology that specifically addresses the pre-echo problem discussed in this chapter.
Listening Tests and Empirical Evidence
Meyer, E. Brad, and David R. Moran. "Audibility of a CD-Standard A/DA/A Loop Inserted into High-Resolution Audio Playback." Journal of the Audio Engineering Society, 55/9 (2007): 775–779. The definitive controlled study of high-resolution vs. CD-standard audio perceptibility. Their null result is the most cited finding in the hi-res audio debate. Essential reading alongside the Reiss meta-analysis.
Reiss, Joshua D. "A Meta-Analysis of High Resolution Audio Perceptual Evaluation." Journal of the Audio Engineering Society, 64/6 (2016): 364–379. Meta-analysis of 18 published listening studies, finding a small but statistically significant ability of listeners to discriminate hi-res audio from standard-quality audio. Provides important context for interpreting Meyer-Moran and understanding the state of the evidence.
Toole, Floyd E. Sound Reproduction: The Acoustics and Psychoacoustics of Loudspeakers and Rooms. Focal Press, 2018. Comprehensive treatment of the perception of reproduced sound, including extensive discussion of listening test methodology and the psychology of audiophile evaluation. Essential for critically evaluating subjective audio claims.
Historical and Cultural Analysis
Sterne, Jonathan. MP3: The Meaning of a Format. Duke University Press, 2012. The most thorough cultural and historical analysis of the MP3 codec. Sterne examines the cultural history of psychoacoustic research, the social construction of what counts as "good enough" audio, and the political economy of the MP3 patent regime. The chapter on the "audile technique" — the specific mode of listening cultivated by the psychoacoustic researchers — is particularly illuminating for this chapter's themes.
Katz, Mark. Capturing Sound: How Technology Has Changed Music. University of California Press, 2010. Chapter 9 ("Music in 1s and 0s") addresses the digital revolution, MP3, and Napster. Less technical and more accessible than Sterne, and useful for historical context.
Kot, Greg. Ripped: How the Wired Generation Revolutionized Music. Scribner, 2009. Journalistic account of the Napster era and the music industry's response to digital distribution. Excellent social history of the decade when the MP3 transformed the music business.
Voice Science and the Singer's Formant
Sundberg, Johan. The Science of the Singing Voice. Northern Illinois University Press, 1987. The essential text on the acoustic science of the trained singing voice, including the first systematic investigation of the singer's formant. Relevant to Aiko Tanaka's research context and to understanding why the singer's formant region is acoustically significant.
Titze, Ingo R. Principles of Voice Production. National Center for Voice and Speech, 2000. Comprehensive treatment of the voice as an acoustic instrument. Covers the relationship between formant structure and vocal quality, which is essential background for the singer's formant discussion in Section 33.7.
Bloothooft, G., and R. Plomp. "Spectral Analysis of Sung Vowels." Journal of the Acoustical Society of America, 80/4 (1986): 1304–1316. Empirical measurements of the singer's formant in professional singers across voice types. Direct evidence for the phenomenon that Aiko's research examines.
Online and Interactive Resources
Xiph.org, "A Digital Media Primer for Geeks" (video). Monty Montgomery's outstanding video introduction to digital audio fundamentals, including sampling, quantization, and, in the second part, codec principles. Available at xiph.org/video. The section on "what codecs do" is directly relevant to this chapter.
Hydrogenaudio Wiki (wiki.hydrogenaud.io). Community-maintained technical wiki covering every aspect of audio codecs, formats, and quality evaluation. The articles on MP3, AAC, Opus, and listening test methodology are technically rigorous and frequently updated.
IETF RFC 6716: Definition of the Opus Audio Codec. The official Opus codec specification, freely available from the IETF. For technically minded readers who want to understand the state of the art beyond what MP3 and AAC offer.
Hydrogenaudio ABX Test Results Archive (hydrogenaud.io). Extensive database of listener-reported ABX test results comparing formats and bit rates. Provides empirical data on where compression artifacts become audible across different material types and listener abilities.
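When reading listener-reported ABX results like those mentioned above, it helps to know how likely a given score is under pure guessing. The sketch below computes the standard one-sided binomial probability for an ABX session; the function name and the 12-of-16 example are illustrative assumptions, not figures from any of the listed resources.

```python
from math import comb

def abx_p_value(correct: int, trials: int) -> float:
    """Probability of getting at least `correct` answers right out of
    `trials` ABX trials by guessing alone (each trial is a 50/50 call).
    """
    if not 0 <= correct <= trials:
        raise ValueError("correct must be between 0 and trials")
    # Sum the upper tail of a Binomial(trials, 0.5) distribution.
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials

if __name__ == "__main__":
    # Hypothetical session: 12 correct identifications in 16 trials.
    print(f"p = {abx_p_value(12, 16):.4f}")
```

For the hypothetical 12-of-16 session the probability is 2517/65536, about 0.038, which falls below the conventional 0.05 threshold. A single such session is weak evidence on its own, especially when many listeners report many sessions, so pooled results deserve more weight than isolated scores.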
For the Mathematically Inclined
Princen, John, and Alan Bradley. "Analysis/Synthesis Filter Bank Design Based on Time Domain Aliasing Cancellation." IEEE Transactions on ASSP, 34/5 (1986): 1153–1161. The original paper on time-domain aliasing cancellation (TDAC), the principle behind the Modified Discrete Cosine Transform (MDCT). Supplies the mathematics behind the overlapping analysis windows of MP3 and AAC.
Noll, Peter. "MPEG Digital Audio Coding." IEEE Signal Processing Magazine, 14/5 (1997): 59–81. Tutorial on the MPEG audio coding standards, including the technical basis of the psychoacoustic model and bit allocation. Accessible to readers with a signal processing background.
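For readers who want to see the time-domain aliasing cancellation from the Princen and Bradley entry in action before tackling the paper, here is a small pure-Python sketch. It uses the textbook MDCT definition with a sine window and 50% overlap; the function names, the block size, and the test signal are assumptions made for illustration. It is a direct O(N²) evaluation, not the fast transform or hybrid filter bank a real MP3 or AAC encoder uses.

```python
import math

def sine_window(n_half: int) -> list[float]:
    """Sine window of length 2*n_half; satisfies w[n]^2 + w[n+n_half]^2 = 1."""
    size = 2 * n_half
    return [math.sin(math.pi * (n + 0.5) / size) for n in range(size)]

def _basis(n: int, k: int, n_half: int) -> float:
    """Shared MDCT/IMDCT cosine kernel."""
    return math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))

def mdct(block: list[float]) -> list[float]:
    """Map 2*n_half windowed samples to n_half coefficients."""
    n_half = len(block) // 2
    return [sum(x * _basis(n, k, n_half) for n, x in enumerate(block))
            for k in range(n_half)]

def imdct(coeffs: list[float]) -> list[float]:
    """Map n_half coefficients back to 2*n_half time-aliased samples."""
    n_half = len(coeffs)
    return [(2.0 / n_half) * sum(c * _basis(n, k, n_half)
                                 for k, c in enumerate(coeffs))
            for n in range(2 * n_half)]

def analyze_and_resynthesize(signal: list[float], n_half: int) -> list[float]:
    """Window, transform, invert, window again, and overlap-add with hop n_half."""
    window = sine_window(n_half)
    output = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * n_half + 1, n_half):
        block = [signal[start + n] * window[n] for n in range(2 * n_half)]
        aliased = imdct(mdct(block))
        for n in range(2 * n_half):
            output[start + n] += aliased[n] * window[n]
    return output

if __name__ == "__main__":
    n_half = 8  # assumed even block half-length
    signal = [math.sin(0.3 * i) + 0.5 * math.cos(1.1 * i) for i in range(40)]
    rebuilt = analyze_and_resynthesize(signal, n_half)
    # Only samples covered by two overlapping blocks are fully reconstructed.
    interior = range(n_half, len(signal) - n_half)
    print(max(abs(rebuilt[i] - signal[i]) for i in interior))
```

Each inverse-transformed block on its own contains aliasing, because 2N samples were squeezed into N coefficients. The aliasing terms in neighbouring blocks have opposite signs, so they cancel in the overlap-add, and the interior of the signal is recovered to within floating-point error. The first and last half-blocks are covered by only one window and are therefore not reconstructed in this sketch.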