Appendix F: Glossary

This glossary defines key technical terms used throughout The Physics of Music and the Music of Physics. Entries are organized alphabetically and cross-referenced to the chapter where the term is first introduced or discussed most thoroughly. Terms span acoustics, digital signal processing, music theory, psychoacoustics, neuroscience, and quantum mechanics. Where a term carries distinct meanings in physics and music, both definitions are given.


A

Absolute pitch (also perfect pitch) — The ability of an individual to identify or produce a musical pitch without reference to an external standard. Absolute pitch is present in roughly 1 in 10,000 people in Western populations but is significantly more common among speakers of tonal languages, suggesting a strong developmental and linguistic component. First discussed: Chapter 27 (Pitch Perception and the Brain).

Absorption coefficient — A dimensionless number between 0 and 1 expressing the fraction of incident sound energy absorbed by a surface rather than reflected. A coefficient of 0 indicates perfect reflection; a coefficient of 1 indicates complete absorption. Different materials have different absorption coefficients at different frequencies, which is why acoustic panels are frequency-selective in their treatment. First discussed: Chapter 8 (Room Acoustics and Architectural Sound).

Acoustic impedance — The opposition that a medium or structure presents to the flow of acoustic energy, defined as the ratio of sound pressure to particle velocity (Z = p/u). Impedance mismatches at the boundaries between media — for example, between a vibrating string and the surrounding air — cause partial reflection of acoustic energy, which is why instrument soundboards are designed to match impedances as effectively as possible. Units are Rayls (Pa·s/m). First discussed: Chapter 4 (Wave Transmission and Impedance).

Acoustics — The branch of physics concerned with the generation, transmission, and reception of mechanical waves in gases, liquids, and solids, with particular attention to audible sound (approximately 20 Hz to 20 kHz). Acoustics encompasses architectural acoustics, musical acoustics, psychoacoustics, underwater acoustics, and medical ultrasonics. First discussed: Chapter 1 (What Is Sound?).

ADSR envelope — A four-stage model describing the time-varying amplitude of a musical sound: Attack (the initial rise from silence to peak amplitude), Decay (the fall from peak to sustain level), Sustain (the steady-state amplitude maintained while a key is held), and Release (the fall back to silence after a key is released). ADSR envelopes are fundamental to synthesis and profoundly shape perceived timbre and instrument character. First discussed: Chapter 13 (Synthesis and the Electronic Voice).

Aliasing — A form of distortion that occurs in digital audio when a signal contains frequency components above the Nyquist frequency (half the sampling rate). These high-frequency components are "folded back" into the audible range and appear as spurious tones unrelated to the original signal. Aliasing is prevented by applying an anti-aliasing low-pass filter before analog-to-digital conversion. First discussed: Chapter 14 (Digital Audio and the Sampling Revolution).
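
A minimal numerical sketch (not from the text) of the fold-back described above, using NumPy; the 44.1 kHz sampling rate and 30 kHz input frequency are illustrative assumptions:

```python
import numpy as np

fs = 44100                       # CD sampling rate (illustrative)
f_in = 30000.0                   # component above the Nyquist frequency (22,050 Hz)

# fold the frequency back into the 0 .. fs/2 band
f_alias = abs(((f_in + fs/2) % fs) - fs/2)
print(f_alias)                   # 14100.0 Hz

# sampled at fs, the 30 kHz sine is indistinguishable from a (phase-inverted) 14.1 kHz sine
n = np.arange(64)
same = np.allclose(np.sin(2*np.pi*f_in*n/fs), -np.sin(2*np.pi*f_alias*n/fs))
print(same)                      # True
```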

Ambisonics — A full-sphere surround-sound technique that encodes and reproduces the acoustic field at a point in three-dimensional space using spherical harmonic decomposition. Unlike channel-based surround formats (5.1, 7.1), Ambisonics is speaker-layout-independent: a single B-format recording can be decoded for headphones, a stereo pair, or a large loudspeaker array. First discussed: Chapter 35 (Spatial Audio and Immersive Environments).

Amplitude — The maximum displacement of a wave from its equilibrium position; in acoustics, this is often expressed as maximum sound pressure (Pascals) or maximum particle displacement (meters). Amplitude is directly related to the perceived loudness of a sound, though the relationship is nonlinear and frequency-dependent. First discussed: Chapter 1 (What Is Sound?).

Anti-aliasing filter — A low-pass filter applied to an analog signal before analog-to-digital conversion to remove all frequency content above the Nyquist frequency, thereby preventing aliasing. The steepness (roll-off slope) of this filter is a critical design parameter; modern oversampling ADCs use gentler analog filters combined with steep digital filters operating at higher internal sample rates. First discussed: Chapter 14 (Digital Audio and the Sampling Revolution).

Appoggiatura — A melodic ornament in which a non-chord tone, typically placed on a strong beat, is resolved by step to an adjacent chord tone. Research by Huron and colleagues suggests that appoggiaturas reliably trigger frisson (chills) because the non-chord tone creates a tension-prediction violation that is then resolved — a cycle of expectation, disappointment, and relief that engages the reward system. First discussed: Chapter 30 (Expectation, Tension, and Musical Emotion).

Attractor (strange attractor) — In dynamical systems theory, an attractor is a set of states toward which a system evolves over time regardless of initial conditions. A strange attractor is a fractal attractor with sensitive dependence on initial conditions, characteristic of chaotic systems. Musical rhythm and performance timing have been modeled as dynamical systems with attractor-like behavior, and the 1/f noise structure of natural rhythms suggests low-dimensional chaotic dynamics. First discussed: Chapter 37 (Chaos, Complexity, and Musical Structure).


B

Bandwidth — The range of frequencies occupied by a signal or passed by a filter, typically measured in hertz (Hz). In acoustics, the bandwidth of a resonance is the range of frequencies within 3 dB of its peak response; a narrow bandwidth indicates a sharp, selective resonance (high Q factor). In digital audio, the maximum reproducible bandwidth is limited to the Nyquist frequency. First discussed: Chapter 5 (Resonance and Filters).

Basilar membrane — A thin, tapered membrane running the length of the cochlea (approximately 35 mm in humans) that performs a mechanical frequency analysis of incoming sound. The basilar membrane is stiff and narrow at the base (responding to high frequencies) and wide and flexible at the apex (responding to low frequencies), implementing a biological version of a Fourier analysis. First discussed: Chapter 22 (The Inner Ear as a Fourier Analyzer).

Beat (acoustic) — A periodic fluctuation in amplitude produced when two tones of slightly different frequencies are sounded simultaneously. The beat frequency equals the absolute difference between the two frequencies: f_beat = |f₁ - f₂|. Beats are used by musicians to tune instruments — when two strings are perfectly in tune, beats disappear. First discussed: Chapter 3 (Interference and Superposition).

Binaural audio — A recording and reproduction technique that captures or simulates the acoustic signals arriving at the two eardrums, including the direction- and distance-dependent filtering imposed by the head, pinnae, and torso (the HRTF). When reproduced over headphones, binaural recordings create a convincing three-dimensional auditory scene. First discussed: Chapter 23 (Spatial Hearing and Sound Localization).

Bit depth — The number of bits used to encode each audio sample in a digital audio system, determining the dynamic range and quantization noise floor. A bit depth of n bits provides a theoretical dynamic range of approximately 6n dB (e.g., 16-bit audio ≈ 96 dB; 24-bit audio ≈ 144 dB). CD audio uses 16-bit depth; professional studio recording typically uses 24-bit depth. First discussed: Chapter 14 (Digital Audio and the Sampling Revolution).

Bone conduction — The transmission of sound to the inner ear through vibrations of the bones of the skull, bypassing the outer and middle ear. Bone conduction explains why recorded voices sound different from how we hear ourselves speak (we hear ourselves partly through bone conduction) and is the basis for bone-conduction headphones and hearing aids used by individuals with outer or middle ear hearing loss. First discussed: Chapter 22 (The Inner Ear as a Fourier Analyzer).

Bypass (signal processing) — A routing configuration in which a signal is sent directly to the output without passing through a processing unit, allowing direct comparison between processed and unprocessed sound. The ability to bypass processing at unity gain is fundamental to critical listening practice and is required to evaluate the audible effect of any signal processing stage. First discussed: Chapter 34 (Mixing, Mastering, and the Signal Chain).


C

Cepstrum — The inverse Fourier transform of the logarithm of the power spectrum of a signal, defined as C(τ) = F⁻¹{log|F{x(t)}|²}. The cepstrum separates the spectral envelope (source filter characteristics) from the fine spectral structure (fundamental frequency and harmonics), making it useful for pitch detection, formant analysis, and voice identification. The word "cepstrum" is a deliberate anagram of "spectrum." First discussed: Chapter 20 (Voice, Speech, and the Source-Filter Model).
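
The following sketch (an illustration, not taken from the chapter) applies the cepstrum definition above to pitch detection on a synthetic harmonic tone; the 220 Hz fundamental and the 60–500 Hz search range are assumed values:

```python
import numpy as np

fs, f0 = 44100, 220.0
t = np.arange(0, 0.1, 1/fs)
x = sum(np.sin(2*np.pi*f0*k*t)/k for k in range(1, 6))    # synthetic harmonic tone

log_mag = np.log(np.abs(np.fft.rfft(x * np.hanning(len(x)))) + 1e-12)
cepstrum = np.fft.irfft(log_mag)                           # real cepstrum

# the strongest peak in a plausible pitch range gives the period in samples
q_lo, q_hi = int(fs/500), int(fs/60)
period = q_lo + np.argmax(cepstrum[q_lo:q_hi])
print(fs / period)                                         # approximately 220 Hz
```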

Chroma — The perceptual quality that makes pitches separated by an octave sound "the same" despite differing in pitch height. Chroma is the circular dimension of pitch — the quality of "C-ness" or "G-ness" — independent of register. The chroma circle (or pitch-class circle) underlies tonal relationships in music theory and is represented neurologically in the auditory cortex. First discussed: Chapter 26 (The Geometry of Tonal Space).

Chromatic scale — A musical scale dividing the octave into twelve equal (or approximately equal) semitones. In equal temperament, each semitone corresponds to a frequency ratio of 2^(1/12) ≈ 1.05946. The chromatic scale encompasses all pitches used in Western tonal and atonal music. First discussed: Chapter 6 (Scales, Temperament, and Tuning).

Cochlea — The snail-shaped, fluid-filled sensory organ of the inner ear that transduces mechanical vibrations into electrical nerve impulses. The cochlea contains approximately 3,500 inner hair cells arranged tonotopically along the basilar membrane, each tuned to a specific frequency range. It performs the primary spectral analysis that enables pitch discrimination and speech recognition. First discussed: Chapter 22 (The Inner Ear as a Fourier Analyzer).

Coherence — A measure of the correlation between two signals as a function of frequency, ranging from 0 (completely uncorrelated) to 1 (perfectly correlated). In room acoustics, coherence between the direct sound and reflections determines the quality of perceived spaciousness; in quantum mechanics, coherence refers to the definite phase relationship between superposed quantum states, which is analogous to constructive interference in wave physics. First discussed: Chapter 8 (Room Acoustics and Architectural Sound); quantum analog discussed in Chapter 17 (Quantum Acoustics).

Comma (Pythagorean, syntonic) — A small interval arising from the discrepancy between two different tuning systems. The Pythagorean comma (≈ 23.46 cents) is the gap between twelve pure perfect fifths and seven octaves; it is the reason that a circle of pure fifths does not close. The syntonic comma (≈ 21.51 cents) is the difference between a Pythagorean major third (81/64) and a just major third (5/4). Commas are the fundamental problem that temperament systems were invented to solve. First discussed: Chapter 6 (Scales, Temperament, and Tuning).
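
A quick numerical check of the two comma sizes quoted above, using the standard cents formula (1200 times the base-2 logarithm of a frequency ratio); a minimal sketch:

```python
import math

cents = lambda ratio: 1200 * math.log2(ratio)

pythagorean_comma = cents((3/2)**12 / 2**7)   # twelve pure fifths vs. seven octaves
syntonic_comma    = cents((81/64) / (5/4))    # Pythagorean vs. just major third

print(round(pythagorean_comma, 2))            # 23.46
print(round(syntonic_comma, 2))               # 21.51
```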

Compression (dynamic range) — A signal processing technique that reduces the dynamic range of an audio signal by attenuating the loudest portions, typically controlled by parameters including threshold, ratio, attack time, release time, and make-up gain. Compression is widely used in music production to increase perceived loudness, control peaks, and shape the temporal envelope of instruments. First discussed: Chapter 34 (Mixing, Mastering, and the Signal Chain).

Compression (data/lossy) — An encoding technique that reduces file size by discarding audio information deemed perceptually inaudible, typically using psychoacoustic masking models to identify redundant or masked data. MP3, AAC, and Ogg Vorbis are lossy compression formats; their algorithms rely on perceptual coding strategies including masking, MDCT, and Huffman coding. First discussed: Chapter 15 (Perceptual Coding and the Psychoacoustics of MP3).

Consonance — The perceptual quality of a sound combination or interval that sounds stable, pleasant, or resolved. Consonance is related to (but not entirely explained by) simple integer frequency ratios, coincidence of overtones, and low roughness. Perceptions of consonance are also culturally conditioned and context-dependent. First discussed: Chapter 3 (Interference and Superposition); extended treatment in Chapter 25 (Consonance, Dissonance, and Roughness).

Critical band — A frequency-analysis bandwidth of the auditory system, representing the range of frequencies that share a single auditory filter channel. Two tones within the same critical band interfere with each other's detection (masking); tones in different critical bands are processed relatively independently. Critical bands vary from about 100 Hz wide at low frequencies to several hundred Hz wide at high frequencies. First discussed: Chapter 22 (The Inner Ear as a Fourier Analyzer).


D

Damping — The process by which the amplitude of an oscillating system decreases over time due to energy dissipation (through friction, viscosity, radiation, or material loss). In acoustics, damping determines how quickly a resonance decays after excitation; in musical instruments, controlled damping shapes the sustain and decay of notes. The damping ratio ζ is 0 for an undamped system, between 0 and 1 for underdamped (oscillatory) decay, 1 at critical damping, and greater than 1 for an overdamped system. First discussed: Chapter 2 (Oscillators, Resonance, and Damping).

Decibel (dB) — A logarithmic unit expressing the ratio of two quantities, most commonly sound pressure or power. Sound pressure level (SPL) is defined as L = 20·log₁₀(p/p_ref) dB, where p_ref = 20 μPa (the threshold of human hearing). The decibel scale was adopted because the ear responds logarithmically to intensity: a 10 dB increase corresponds roughly to a doubling of perceived loudness. First discussed: Chapter 1 (What Is Sound?).
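
A short sketch of the SPL formula above; the 1 Pa test pressure is simply an example value:

```python
import math

p_ref = 20e-6                                   # 20 micropascal reference pressure
spl = lambda p: 20 * math.log10(p / p_ref)      # sound pressure level in dB

print(round(spl(20e-6), 1))   # 0.0  dB: the threshold of hearing
print(round(spl(1.0), 1))     # 94.0 dB: a common calibrator level (1 Pa)
```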

Decoherence — In quantum mechanics, the process by which a quantum system loses its quantum coherence (definite phase relationships) through interaction with its environment, causing quantum superpositions to behave like classical statistical mixtures. Decoherence is why macroscopic objects do not exhibit quantum interference. By analogy, acoustic decoherence describes the loss of phase relationships in a reverberant field over time. First discussed: Chapter 17 (Quantum Acoustics).

Diatonic scale — A seven-note scale consisting of five whole steps and two half steps arranged in a specific pattern (W-W-H-W-W-W-H for the major mode). The seven diatonic modes (Ionian, Dorian, Phrygian, Lydian, Mixolydian, Aeolian, Locrian) are rotations of this pattern. The diatonic scale is the foundation of Western tonal harmony. First discussed: Chapter 6 (Scales, Temperament, and Tuning).

Diffraction — The bending of waves around obstacles or through apertures, most pronounced when the wavelength is comparable to the obstacle size. In room acoustics, low-frequency sounds diffract around furniture and through doorways more readily than high-frequency sounds. Diffraction also explains why sound can be heard around corners and why microphones have frequency-dependent directional patterns. First discussed: Chapter 3 (Interference and Superposition).

Diffusion (acoustic) — The scattering of sound waves in many directions by irregular surfaces, distributing acoustic energy evenly throughout a space. Diffusion reduces the coloration caused by discrete echoes and flutter echo, and creates a more enveloping, uniform sound field. Diffusers are designed using quadratic residue sequences and other mathematical patterns to achieve frequency-independent scattering. First discussed: Chapter 8 (Room Acoustics and Architectural Sound).

Dissonance — The perceptual quality of a sound combination or interval that sounds tense, rough, or unresolved. Dissonance arises from beating between near-coincident partials, from roughness in the auditory system's response, and from violations of learned harmonic expectation. Like consonance, dissonance is both psychoacoustically grounded and culturally learned. First discussed: Chapter 25 (Consonance, Dissonance, and Roughness).

Doppler effect — The change in observed frequency of a wave caused by relative motion between the source and observer. When source and observer approach, observed frequency increases; when they recede, it decreases. In music, vibrato produced by moving a sound source (Leslie speaker cabinet) uses the Doppler effect; in astrophysics, the Doppler effect reveals stellar motion through spectral shifts. First discussed: Chapter 3 (Interference and Superposition).

Dynamic range — The ratio between the loudest and softest sound levels in a signal or system, typically expressed in decibels. The human ear has a dynamic range of approximately 120 dB (from threshold of hearing to threshold of pain). Digital audio formats have dynamic ranges determined by bit depth (16-bit ≈ 96 dB; 24-bit ≈ 144 dB). First discussed: Chapter 14 (Digital Audio and the Sampling Revolution).


E

Eigenstate — In quantum mechanics, a state in which a physical observable has a definite, well-defined value (an eigenvalue). An eigenstate of the energy operator (Hamiltonian) is a stationary state: a quantum system in an energy eigenstate does not change over time. The standing wave modes of vibrating strings are the acoustic analog of energy eigenstates in quantum systems — both arise from boundary conditions imposed on wave equations. First discussed: Chapter 17 (Quantum Acoustics).

Eigenvalue — In linear algebra and quantum mechanics, the scalar associated with an eigenvector or eigenstate: Aψ = λψ, where A is an operator, ψ is the eigenstate, and λ is the eigenvalue. In musical acoustics, the resonant frequencies of a vibrating system are the eigenvalues of its governing wave equation; each eigenfrequency corresponds to a normal mode. First discussed: Chapter 17 (Quantum Acoustics).

Entrainment — The synchronization of two oscillating systems through their mutual interaction, such that they adopt a common frequency or phase relationship. In music, entrainment describes the tendency of listeners to synchronize body movements (foot tapping, head nodding) to musical beat, and may also apply to neural oscillations synchronizing to rhythmic auditory stimuli. First discussed: Chapter 32 (Rhythm, Meter, and Neural Entrainment).

Equal temperament — A tuning system that divides the octave into twelve equal semitones, each a frequency ratio of 2^(1/12) ≈ 1.05946. Equal temperament enables free modulation to any key with identical interval qualities in every key, at the cost of slightly mistuned thirds and fifths compared to just intonation. Equal temperament became the standard Western tuning in the 19th century. First discussed: Chapter 6 (Scales, Temperament, and Tuning).
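
A small sketch (assuming A4 = 440 Hz as the reference) comparing an equal-tempered fifth with the pure 3:2 fifth of just intonation:

```python
import math

A4 = 440.0
equal_fifth = A4 * 2 ** (7/12)        # seven equal-tempered semitones above A4
just_fifth  = A4 * 3 / 2              # pure 3:2 fifth

print(round(equal_fifth, 2))          # 659.26 Hz
print(round(just_fifth, 2))           # 660.00 Hz
print(round(1200 * math.log2(just_fifth / equal_fifth), 2))   # the tempered fifth is about 1.96 cents narrow
```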

Equalizer (EQ) — A signal processing device or algorithm that selectively boosts or attenuates specified frequency bands of an audio signal. Parametric equalizers allow control of frequency, gain, and bandwidth (Q factor) for each band; graphic equalizers have fixed-frequency bands; shelving EQs boost or cut all frequencies above or below a set point. EQ is used for both corrective and creative purposes in recording and live sound. First discussed: Chapter 34 (Mixing, Mastering, and the Signal Chain).


F

Formant — A resonant frequency peak in the spectral envelope of a sound, produced by resonances of the vocal tract or instrument body that amplify certain harmonic frequencies. The first two formants (F1 and F2) largely determine vowel identity in speech; for vowel /a/, F1 ≈ 800 Hz and F2 ≈ 1,200 Hz. Formant transitions are the primary acoustic cue for place of articulation in consonants. First discussed: Chapter 20 (Voice, Speech, and the Source-Filter Model).

Fourier series — A mathematical representation of a periodic function as a sum of sinusoidal components (harmonics) at integer multiples of a fundamental frequency: x(t) = Σ[aₙcos(2πnf₀t) + bₙsin(2πnf₀t)]. Every periodic waveform — from a square wave to a violin tone — can be decomposed into a unique set of harmonic amplitudes and phases via the Fourier series. First discussed: Chapter 10 (Timbre and the Fourier Decomposition of Sound).

Fourier transform — A mathematical transformation that decomposes a time-domain signal into its constituent frequency components, producing a complex-valued frequency-domain representation. The Fourier transform generalizes the Fourier series to non-periodic signals: X(f) = ∫x(t)e^(−i2πft)dt. The Fast Fourier Transform (FFT) is an efficient algorithm for computing discrete Fourier transforms and is fundamental to digital audio analysis. First discussed: Chapter 10 (Timbre and the Fourier Decomposition of Sound).
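
A minimal FFT sketch (the frequencies and amplitudes are arbitrary choices) showing a two-component tone resolved into its frequency peaks:

```python
import numpy as np

fs = 8000
n = 8000                                         # exactly one second of samples
t = np.arange(n) / fs
x = np.sin(2*np.pi*200*t) + 0.5*np.sin(2*np.pi*600*t)

X = np.fft.rfft(x)                               # discrete Fourier transform of a real signal
freqs = np.fft.rfftfreq(n, 1/fs)                 # frequency axis in Hz

strongest = freqs[np.argsort(np.abs(X))[-2:]]    # the two largest spectral peaks
print(sorted(strongest))                         # [200.0, 600.0]
```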

Frequency — The number of complete oscillation cycles occurring per second, measured in hertz (Hz). For sound, frequency is the primary physical correlate of perceived pitch, though the mapping is compressive and roughly logarithmic over much of the audible range (see: mel scale). The audible range for healthy young humans is approximately 20 Hz to 20,000 Hz. First discussed: Chapter 1 (What Is Sound?).

Frisson — A pleasurable tingling sensation, often described as "chills" or "goosebumps," experienced in response to emotionally evocative music. Frisson involves activation of the reward system (dopamine release), strong emotional arousal, and often occurs at moments of musical surprise, violation of expectation, or particular timbral quality. Approximately 55–86% of people report experiencing frisson in response to music. First discussed: Chapter 30 (Expectation, Tension, and Musical Emotion).

Fugue — A contrapuntal compositional technique in which a subject (short melodic theme) is introduced in one voice and then imitated successively in other voices, while earlier voices continue with countersubjects and free counterpoint. Bach's fugues are the canonical examples of the form. The fugue demonstrates musical symmetry operations — transposition, inversion, retrograde — that have direct analogs to group-theoretic operations. First discussed: Chapter 38 (Symmetry Groups and Musical Structure).

Fundamental frequency (f₀) — The lowest frequency component of a complex periodic tone, which typically corresponds to the perceived pitch of the sound. For a string vibrating in its simplest mode, f₀ = (1/2L)√(T/μ), where L is length, T is tension, and μ is linear mass density. The fundamental frequency of the human speaking voice ranges from about 85 Hz (deep bass) to 255 Hz (high soprano). First discussed: Chapter 9 (Strings, Membranes, and Vibrating Bodies).


G

Gabor uncertainty principle — An acoustic analog of the Heisenberg uncertainty principle, stating that a signal cannot be simultaneously well-resolved in both time and frequency: ΔtΔf ≥ 1/(4π). A very short sound pulse (precise in time) must have a broad frequency spectrum; a pure tone (precise in frequency) must extend infinitely in time. This principle governs the fundamental limits of time-frequency analysis and shapes the design of audio codecs. First discussed: Chapter 16 (Time-Frequency Uncertainty and the Gabor Limit).
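
A numerical sketch of the limit (not drawn from the text): for a Gaussian pulse the RMS duration–bandwidth product equals the bound 1/(4π); the 2 ms width and 48 kHz sampling rate are arbitrary assumptions:

```python
import numpy as np

fs = 48000
t = np.arange(-0.05, 0.05, 1/fs)
sigma = 0.002                                  # 2 ms Gaussian envelope (assumed)
x = np.exp(-t**2 / (2*sigma**2))

# RMS duration of the energy envelope
energy = np.sum(x**2)
dt = np.sqrt(np.sum(t**2 * x**2) / energy)

# RMS bandwidth from the power spectrum
X = np.fft.fft(x)
f = np.fft.fftfreq(len(x), 1/fs)
P = np.abs(X)**2
df = np.sqrt(np.sum(f**2 * P) / np.sum(P))

print(dt * df, 1/(4*np.pi))                    # both approximately 0.0796: the Gaussian meets the bound
```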

Groove — The perception of a rhythmic pattern as inducing a desire for bodily movement, associated with "feel" in funk, jazz, and dance music. Groove arises from specific micro-timing deviations from an exact metric grid, combined with particular timbral and dynamic patterns. Computational groove models have quantified the contribution of timing, loudness, and spectral features to groove ratings. First discussed: Chapter 32 (Rhythm, Meter, and Neural Entrainment).

Group theory — The branch of abstract algebra studying the properties of mathematical groups — sets of elements with an associative binary operation, an identity element, and inverses. Group theory provides a powerful framework for analyzing musical symmetry operations: transposition, inversion, retrograde, and augmentation form groups. The twelve pitch classes under transposition form the cyclic group Z₁₂. First discussed: Chapter 38 (Symmetry Groups and Musical Structure).


H

Hair cells — Mechanosensory receptor cells of the inner ear that transduce mechanical vibrations into electrical nerve impulses. Outer hair cells (approximately 12,000) amplify cochlear motion through electromotility; inner hair cells (approximately 3,500) are the primary transducers that signal to the auditory nerve. Hair cell damage from noise exposure or ototoxic drugs is the leading cause of sensorineural hearing loss. First discussed: Chapter 22 (The Inner Ear as a Fourier Analyzer).

Harmonic — A sinusoidal frequency component of a complex tone that is an integer multiple of the fundamental frequency: the nth harmonic has frequency nf₀. An ideal, perfectly flexible string with fixed endpoints produces a complete harmonic series. Real instruments deviate from perfect harmonicity due to stiffness (inharmonicity), producing slightly stretched or compressed overtone series. First discussed: Chapter 9 (Strings, Membranes, and Vibrating Bodies).

Harmonic series — The sequence of frequencies that are integer multiples of a fundamental: f₀, 2f₀, 3f₀, 4f₀, ... The ratios between adjacent harmonics approximate simple musical intervals: the 2nd harmonic gives an octave, the 3rd a perfect fifth above that, the 4th another octave, the 5th a major third, and so on. Just intonation tuning systems are based on frequency ratios drawn from the harmonic series. First discussed: Chapter 6 (Scales, Temperament, and Tuning).

HRTF (Head-Related Transfer Function) — The direction-dependent filtering imposed on a sound by the diffraction and reflection properties of the head, pinnae, and torso. The HRTF encodes elevation, azimuth, and distance cues in the frequency and timing characteristics of binaural signals. Individualized HRTFs produce the most convincing three-dimensional audio; generic HRTFs can cause front-back confusions and poor externalization. First discussed: Chapter 23 (Spatial Hearing and Sound Localization).

Heisenberg uncertainty principle — A fundamental principle of quantum mechanics stating that the position and momentum of a particle cannot both be measured with arbitrary precision simultaneously: ΔxΔp ≥ ℏ/2. This is not a limitation of measurement technology but a fundamental property of quantum states. The acoustic Gabor limit is a classical analog arising from the wave nature of sound rather than quantum mechanics. First discussed: Chapter 17 (Quantum Acoustics).

Hilbert space — An abstract vector space equipped with an inner product and complete in the associated norm; it may be finite- or infinite-dimensional. In quantum mechanics, state vectors live in a Hilbert space, inner products determine the probabilities of measurement outcomes, every physical observable corresponds to a Hermitian operator on the space, and measurement collapses a state vector to an eigenstate. The Hilbert space formalism provides the unifying mathematical framework connecting quantum mechanics and wave-based physics. First discussed: Chapter 17 (Quantum Acoustics).


I

Impedance — See Acoustic impedance. In electronics, impedance (Z) is the complex ratio of voltage to current, encompassing both resistance and reactance. Impedance matching between microphones, preamplifiers, and amplifiers is critical for optimal signal transfer in audio systems. First discussed: Chapter 4 (Wave Transmission and Impedance).

Inharmonicity — The deviation of the overtone frequencies of a real instrument from the ideal harmonic series. In piano strings, stiffness (described by the inharmonicity coefficient B) raises the frequencies of upper partials above their ideal harmonic values: fₙ = nf₀√(1 + Bn²). Inharmonicity contributes to the characteristic "stretch tuning" of the piano and is responsible for the perception of piano sound as both bright and slightly "rough" at high registers. First discussed: Chapter 11 (The Physics of Keyboard Instruments).
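
A sketch of the stretched-partial formula above; the middle-C fundamental and the coefficient B = 4·10⁻⁴ are illustrative, assumed values rather than measurements from the text:

```python
import math

def partial(n, f0, B):
    """nth partial of a stiff string: f_n = n * f0 * sqrt(1 + B * n^2)."""
    return n * f0 * math.sqrt(1 + B * n**2)

f0, B = 261.63, 4e-4          # middle C; B is an assumed, piano-like value
for n in (1, 2, 4, 8):
    sharpening = 1200 * math.log2(partial(n, f0, B) / (n * f0))
    print(n, f"{partial(n, f0, B):7.1f} Hz", f"+{sharpening:.1f} cents")
```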

Information entropy — In information theory, the average amount of information (surprise) contained in a message, measured in bits: H = −Σpᵢlog₂(pᵢ). Musical information entropy quantifies the predictability of pitch, rhythm, and harmonic sequences; highly predictable music has low entropy, highly random music has high entropy. Optimal musical interest may lie at an intermediate entropy value. First discussed: Chapter 39 (Information Theory and Musical Complexity).
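
A minimal sketch of the entropy formula applied to pitch sequences; the toy note strings are invented examples:

```python
import math
from collections import Counter

def entropy_bits(sequence):
    """Shannon entropy H = -sum(p_i * log2 p_i) of the symbol distribution."""
    counts = Counter(sequence)
    n = len(sequence)
    return sum(-(c/n) * math.log2(c/n) for c in counts.values())

print(entropy_bits("CCCCCCCC"))    # 0.0:  one repeated pitch, fully predictable
print(entropy_bits("CDEFGABC"))    # 2.75: seven pitch classes, much less predictable
```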

Interference (constructive/destructive) — The superposition of two or more waves producing a resultant wave of greater amplitude (constructive interference, when waves are in phase) or lesser amplitude (destructive interference, when waves are out of phase). Acoustic interference produces the spatial patterns called standing waves in enclosed spaces, the beating of slightly mistuned intervals, and the directional patterns of stereo loudspeaker arrays. First discussed: Chapter 3 (Interference and Superposition).

Interval (musical) — The perceptual distance between two pitches, defined by the frequency ratio between them (in just intonation) or by the number of semitones separating them (in equal temperament). Intervals include the unison (1:1), octave (2:1), perfect fifth (3:2), perfect fourth (4:3), major third (5:4), minor third (6:5), and their inversions and compounds. First discussed: Chapter 6 (Scales, Temperament, and Tuning).

ITD (Interaural Time Difference) — The difference in arrival time of a sound at the two ears, used by the auditory system to localize sound in the horizontal plane. ITDs are maximal (approximately 650 microseconds) for sounds directly to one side and zero for sounds directly in front or behind. ITDs are the dominant localization cue for frequencies below approximately 1,500 Hz. First discussed: Chapter 23 (Spatial Hearing and Sound Localization).

ILD (Interaural Level Difference) — The difference in sound pressure level between the two ears, arising from the acoustic "shadow" cast by the head for high-frequency sounds. ILDs increase with frequency above approximately 1,500 Hz and provide the dominant sound localization cue at high frequencies, complementing the ITD mechanism. First discussed: Chapter 23 (Spatial Hearing and Sound Localization).


J

Just intonation — A tuning system in which all intervals are tuned to simple integer frequency ratios drawn from the harmonic series: octave 2:1, perfect fifth 3:2, major third 5:4, minor third 6:5, etc. Just intonation produces acoustically pure intervals free of beating, but certain intervals (particularly major thirds built on non-tonic scale degrees) are unacceptably mistuned in some keys, and free modulation is impossible without retuning. First discussed: Chapter 6 (Scales, Temperament, and Tuning).


K

Key (musical) — The tonal center of a musical passage, defined by a tonic pitch and the diatonic scale built on that pitch. A piece "in C major" uses the diatonic pitches of the C major scale and gravitates harmonically toward the C major triad. Key provides the harmonic framework within which individual notes and chords acquire their functional meaning (tonic, dominant, subdominant, etc.). First discussed: Chapter 7 (Harmony, Tonality, and Functional Theory).


L

Longitudinal wave — A wave in which the displacement of the medium is parallel to the direction of wave propagation, producing alternating compressions and rarefactions. Sound in air is a longitudinal wave; the air molecules oscillate back and forth along the direction of sound travel. Contrast with transverse waves (e.g., waves on a string, where displacement is perpendicular to propagation). First discussed: Chapter 1 (What Is Sound?).

LTAS (Long-Term Average Spectrum) — The average power spectrum of an audio signal calculated over a time interval long enough to capture its typical spectral characteristics, usually several seconds to minutes. LTAS is used to characterize the timbral "color" of voices, instruments, and mixes, and to measure the average spectral slope of natural sounds. First discussed: Chapter 20 (Voice, Speech, and the Source-Filter Model).

LUFS (Loudness Units Full Scale) — A standardized measure of integrated loudness conforming to the EBU R128 and ITU-R BS.1770 specifications, designed to align with human loudness perception better than peak-based measurements. Streaming platforms use LUFS targets (typically −14 LUFS for music) to normalize playback levels, reducing the incentive for extreme dynamic range compression ("loudness war"). First discussed: Chapter 34 (Mixing, Mastering, and the Signal Chain).


M

Masking (acoustic) — The phenomenon in which one sound (the masker) reduces the audibility of another sound (the target) because both stimulate overlapping regions of the basilar membrane or activate shared neural channels. Simultaneous masking occurs when masker and target overlap in time; forward masking (post-masking) occurs when a prior loud sound temporarily elevates thresholds; backward masking is weaker and shorter in duration. First discussed: Chapter 15 (Perceptual Coding and the Psychoacoustics of MP3).

MDCT (Modified Discrete Cosine Transform) — A lapped transform used in audio coding (MP3, AAC, Ogg Vorbis) that divides an audio signal into overlapping analysis frames and converts each frame into frequency-domain coefficients. The overlap-and-add structure of the MDCT avoids blocking artifacts at frame boundaries, and its energy compaction properties enable efficient quantization of audio spectra. First discussed: Chapter 15 (Perceptual Coding and the Psychoacoustics of MP3).

Mel scale — A perceptual frequency scale that maps physical frequency (Hz) to perceived pitch (mels). The scale is approximately linear in frequency below about 1,000 Hz and approximately logarithmic above, reflecting the increasingly compressed pitch response at high frequencies. The mel scale was derived from direct scaling experiments asking subjects to set a tone to "half the pitch" of a reference tone. First discussed: Chapter 22 (The Inner Ear as a Fourier Analyzer).

MFCC (Mel-Frequency Cepstral Coefficient) — A compact representation of the spectral envelope of a sound obtained by computing the cepstrum of a mel-scaled spectrogram. MFCCs capture the slowly varying spectral shape (formant structure, timbral identity) while discarding pitch information, making them widely used in automatic speech recognition, instrument classification, and music information retrieval. First discussed: Chapter 36 (Machine Listening and Music Information Retrieval).

Modal music — Music organized around a mode (a scale with a characteristic interval pattern and melodic orientation) rather than the functional tonal hierarchy of major/minor tonality. Modal music includes Medieval and Renaissance polyphony, Indian raga-based music, jazz modal improvisation (Miles Davis's Kind of Blue), and folk music from many cultures. First discussed: Chapter 7 (Harmony, Tonality, and Functional Theory).

Mode (acoustic) — A pattern of vibration of an acoustic system (room, instrument body, air column) at a specific resonant frequency, in which every part of the system oscillates at that frequency with a fixed spatial amplitude pattern. Each mode has characteristic nodal lines (zero displacement) and antinodal regions (maximum displacement). The modes of a rectangular room with dimensions Lx, Ly, Lz occur at frequencies f = (c/2)√[(l/Lx)² + (m/Ly)² + (n/Lz)²], where c is the speed of sound and l, m, n are non-negative integer mode indices. First discussed: Chapter 8 (Room Acoustics and Architectural Sound).
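
A sketch evaluating the rectangular-room mode formula above; the 5 m × 4 m × 3 m room and the 343 m/s sound speed are illustrative assumptions:

```python
import math

c = 343.0                        # speed of sound in air, m/s
Lx, Ly, Lz = 5.0, 4.0, 3.0       # assumed room dimensions, m

def mode_freq(l, m, n):
    return (c/2) * math.sqrt((l/Lx)**2 + (m/Ly)**2 + (n/Lz)**2)

print(round(mode_freq(1, 0, 0), 1))   # 34.3 Hz: first axial mode along the length
print(round(mode_freq(0, 1, 0), 1))   # 42.9 Hz: first axial mode along the width
print(round(mode_freq(1, 1, 0), 1))   # 54.9 Hz: first tangential mode
```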

Mode (musical) — One of the seven diatonic rotations (Ionian, Dorian, Phrygian, Lydian, Mixolydian, Aeolian, Locrian), each with a distinctive interval pattern producing a characteristic emotional quality. Modes can also refer to non-diatonic scales such as the octatonic (diminished), whole-tone, or other synthetic scales. First discussed: Chapter 7 (Harmony, Tonality, and Functional Theory).

Modulation — In music theory, the process of transitioning from one key to another within a composition, typically achieved through pivot chords common to both keys, chromatic alteration, or direct (abrupt) key change. In signal processing, modulation refers to the variation of a carrier signal's amplitude (AM), frequency (FM), or phase (PM) according to a modulating signal. First discussed: Chapter 7 (Harmony, Tonality, and Functional Theory) and Chapter 13 (Synthesis and the Electronic Voice).

Monophony — Musical texture consisting of a single melodic line without accompaniment or harmonic support. Gregorian chant is the canonical example. Monophony contrasts with homophony (melody with chordal accompaniment), polyphony (multiple independent melodic voices), and heterophony (simultaneous variations of the same melody). First discussed: Chapter 7 (Harmony, Tonality, and Functional Theory).

Musical scale — An ordered set of pitches spanning an octave, providing the pitch material for melody and harmony. Scales differ in the number of pitches, their interval pattern, and their relationship to a tonal center. Common scales include the chromatic (12 notes), diatonic major and minor (7 notes), pentatonic (5 notes), whole-tone (6 notes), and octatonic (8 notes). First discussed: Chapter 6 (Scales, Temperament, and Tuning).


N

Node (acoustic) — A point, line, or surface in a standing wave pattern where the amplitude of oscillation is always zero (a displacement node) or pressure is always zero (a pressure node). Displacement nodes coincide with pressure antinodes and vice versa. The nodal patterns of vibrating plates, visualized with sand (Chladni figures), reveal the mode shapes of the plate. First discussed: Chapter 9 (Strings, Membranes, and Vibrating Bodies).

Noise (pink, white, brown) — Random signals characterized by their power spectral density. White noise has equal power per unit frequency (flat spectrum); pink noise has equal power per octave (power proportional to 1/f); brown noise (Brownian noise) has power proportional to 1/f². Natural sounds (waterfalls, wind) have approximately pink spectra; musical dynamics and many natural time series also exhibit 1/f statistics. First discussed: Chapter 37 (Chaos, Complexity, and Musical Structure).
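
A sketch of one common way to generate pink (1/f) noise, by shaping the spectrum of white noise so that power falls as 1/f; the signal length, sample rate, and random seed are arbitrary assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n, fs = 2**16, 44100

white = rng.standard_normal(n)
X = np.fft.rfft(white)
f = np.fft.rfftfreq(n, 1/fs)

X[1:] = X[1:] / np.sqrt(f[1:])     # power proportional to 1/f, so amplitude falls as 1/sqrt(f)
X[0] = 0.0                         # drop the DC component
pink = np.fft.irfft(X, n)
pink /= np.max(np.abs(pink))       # normalize to +/-1

# crude check: power per octave should be roughly constant
P = np.abs(np.fft.rfft(pink))**2
band = lambda lo, hi: P[(f >= lo) & (f < hi)].sum()
print(band(100, 200) / band(1000, 2000))   # close to 1
```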

Nyquist frequency — Half the sampling rate of a digital audio system, representing the highest frequency that can be accurately represented. For CD audio (44,100 Hz sampling rate), the Nyquist frequency is 22,050 Hz. Frequencies above the Nyquist frequency alias to lower frequencies in the audio band and must be removed by an anti-aliasing filter before digitization. First discussed: Chapter 14 (Digital Audio and the Sampling Revolution).

Nyquist-Shannon sampling theorem — The fundamental theorem of digital audio, stating that a bandlimited signal can be perfectly reconstructed from discrete samples if the sampling rate is at least twice the highest frequency component in the signal. Formally: if f_max is the bandwidth of the signal, then sampling at f_s ≥ 2f_max allows perfect reconstruction. First discussed: Chapter 14 (Digital Audio and the Sampling Revolution).


O

Octave — The musical interval corresponding to a frequency ratio of exactly 2:1. A pitch one octave above another vibrates at twice the frequency; one octave below vibrates at half the frequency. The octave is the most universally recognized musical interval across cultures and is encoded in the tonotopic organization of the auditory cortex as a fundamental perceptual category. First discussed: Chapter 6 (Scales, Temperament, and Tuning).

Onset — The beginning of a note or sound event, characterized by a rapid rise in amplitude (the attack transient). Onset detection is a fundamental task in music information retrieval, enabling beat tracking, segmentation, and alignment. Onset characteristics (attack time, spectral flux) are major contributors to perceived timbre. First discussed: Chapter 13 (Synthesis and the Electronic Voice).

Oscillator — A system that produces a periodic waveform by cycling repeatedly through a sequence of states. Mechanical oscillators (pendula, springs), acoustic oscillators (vibrating strings, air columns), and electronic oscillators (voltage-controlled oscillators in synthesizers) all share the mathematical property of exchanging energy between potential and kinetic forms. The ideal simple harmonic oscillator produces a sinusoidal waveform. First discussed: Chapter 2 (Oscillators, Resonance, and Damping).

Overtone — A frequency component of a complex tone above the fundamental frequency. The first overtone is the second harmonic (2f₀), the second overtone is the third harmonic (3f₀), and so on. In some instruments (e.g., bells, bars), overtones may be inharmonic — not integer multiples of f₀ — profoundly affecting perceived timbre and pitch. First discussed: Chapter 9 (Strings, Membranes, and Vibrating Bodies).


P

Partial — Any sinusoidal frequency component of a complex tone, whether or not it is harmonically related to the fundamental. The term "partial" is more general than "harmonic" (which implies integer multiples) or "overtone" (which implies frequencies above the fundamental). Bell tones, for example, have partials that are markedly inharmonic. First discussed: Chapter 9 (Strings, Membranes, and Vibrating Bodies).

Pentatonic scale — A scale containing five notes per octave. The major pentatonic (e.g., C-D-E-G-A) and minor pentatonic are the most common forms in Western and global popular music; analogous five-note scales appear in music of China, Japan, Africa, and indigenous American cultures. Pentatonic scales are often considered the most universally shared pitch structures across human cultures. First discussed: Chapter 6 (Scales, Temperament, and Tuning).

Periodicity — The property of a signal that exactly repeats itself after a fixed interval (the period T = 1/f). Strict periodicity produces a tone with a definite pitch; aperiodic signals (noise) have no pitch. Quasi-periodic signals (natural instruments) have near-periodicity with cycle-to-cycle variation, producing a richer, more natural sound quality than perfectly periodic synthesized tones. First discussed: Chapter 1 (What Is Sound?).

Phase — The position within an oscillation cycle at a given moment, expressed as an angle in degrees or radians (0° to 360° or 0 to 2π). Phase relationships between sound waves determine whether interference is constructive or destructive; phase differences between the two ears provide spatial localization cues (ITD). In signal processing, phase shift describes the delay a filter imposes at each frequency. First discussed: Chapter 1 (What Is Sound?).

Phase transition — In physics, an abrupt qualitative change in the state or organization of a system as a control parameter crosses a critical value (e.g., water freezing at 0°C). In music and cultural physics, phase transitions describe abrupt shifts in musical style (e.g., the tonal-to-atonal transition in early 20th-century music), the onset of synchrony in ensemble playing, and the emergence of groove from rhythmic patterns. First discussed: Chapter 37 (Chaos, Complexity, and Musical Structure).

Pitch — The perceptual attribute of a sound ordered on a musical scale from low to high, most closely correlated with fundamental frequency. Pitch is a subjective percept, not a physical quantity: the pitch of a complex tone can be perceived even when the fundamental frequency is absent (the "missing fundamental" phenomenon). The just noticeable difference (JND) for pitch is approximately 0.3–3 Hz depending on frequency and musical training. First discussed: Chapter 24 (Pitch Perception: Theories and Mechanisms).

Polyphony — Musical texture in which two or more independent melodic lines are sounded simultaneously, each with its own rhythmic and melodic identity. Renaissance counterpoint, Bach fugues, and jazz improvisation are paradigm cases. Polyphony requires auditory stream segregation — the perceptual ability to separate simultaneous sound streams into distinct voices. First discussed: Chapter 28 (Auditory Scene Analysis and Stream Segregation).

Psychoacoustics — The scientific study of the relationship between physical acoustic stimuli and subjective auditory perception, including pitch, loudness, timbre, and spatial hearing. Psychoacoustics uses behavioral experiments (threshold measurements, magnitude estimation, detection tasks) and physiological recording to characterize the auditory system's response to sound. First discussed: Chapter 21 (Introduction to Psychoacoustics).

Pulse — In rhythm, the basic unit of temporal measurement — a regular, isochronous beat to which music is perceived to be organized. The pulse corresponds to the tapping rate of a listener responding to music and is related to, but distinct from, the notated beat of a score (tempo). Preferred pulse rates in music cluster around 100–120 beats per minute. First discussed: Chapter 32 (Rhythm, Meter, and Neural Entrainment).


Q

Q factor (Quality factor) — A dimensionless parameter measuring the sharpness of a resonance, defined as Q = f₀/Δf, where f₀ is the resonant frequency and Δf is the −3 dB bandwidth. High-Q resonators oscillate for many cycles after excitation (low damping); low-Q resonators damp quickly. In audio equalizers, Q determines the width of the frequency band affected by a boost or cut. First discussed: Chapter 2 (Oscillators, Resonance, and Damping).

Quantization (acoustic) — The restriction of pitch in a musical system to a discrete set of allowed values (the pitches of a scale or tuning system), rather than the continuum of possible frequencies. Quantization is culturally determined: Western music quantizes pitch to 12 equal divisions of the octave, while other traditions use different quantization schemes. First discussed: Chapter 6 (Scales, Temperament, and Tuning).

Quantization (digital) — The process of mapping a continuous-amplitude analog sample to the nearest value in a finite set of discrete digital levels, introducing quantization error (noise). With n bits, there are 2ⁿ quantization levels; quantization noise power decreases by approximately 6 dB for each additional bit. Dithering (adding small amounts of noise before quantization) linearizes the quantization process and reduces audible distortion. First discussed: Chapter 14 (Digital Audio and the Sampling Revolution).
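
A sketch verifying the roughly 6 dB-per-bit rule by quantizing a full-scale sine and measuring the resulting signal-to-noise ratio; the 997 Hz test frequency and the simple mid-tread quantizer are assumed choices:

```python
import numpy as np

def quantize(x, bits):
    """Round to the nearest of 2**bits uniformly spaced levels spanning -1 .. +1."""
    scale = 2 ** (bits - 1) - 1
    return np.round(x * scale) / scale

fs = 48000
t = np.arange(fs) / fs                        # one second of samples
x = np.sin(2*np.pi*997*t)                     # full-scale test sine

for bits in (8, 16):
    err = x - quantize(x, bits)
    snr = 10*np.log10(np.mean(x**2) / np.mean(err**2))
    print(bits, round(snr, 1))                # close to 6.02*bits + 1.76 dB
```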

Quantum mechanics — The branch of physics describing the behavior of matter and energy at atomic and subatomic scales, characterized by quantization of energy, wave-particle duality, the superposition principle, and probabilistic measurement outcomes. The mathematical formalism of quantum mechanics (wave equations, eigenvalues, Hilbert space) has deep structural analogies with acoustic wave physics, explored throughout this text. First discussed: Chapter 17 (Quantum Acoustics).

Quantum state — A complete mathematical description of a quantum system, represented as a vector (ket |ψ⟩) in Hilbert space. The quantum state encodes all information about the probability of measurement outcomes. A superposition state |ψ⟩ = α|0⟩ + β|1⟩ is the quantum analog of a superposition of acoustic modes — simultaneously occupying multiple states until measured. First discussed: Chapter 17 (Quantum Acoustics).


R

Raga — In Indian classical music, a melodic framework specifying: the scale (set of permissible pitches), characteristic ascending and descending phrases (aaroh and avaroh), emphasized tones (vadi and samvadi), ornaments, and association with specific times of day or seasons. A raga is not simply a scale but a complete aesthetic-melodic personality. There are hundreds of recognized ragas in Hindustani and Carnatic traditions. First discussed: Chapter 33 (World Musical Systems and Universal Structures).

Register (vocal) — A range of pitches produced by a particular configuration of the vocal folds and resonating cavities, characterized by distinct acoustic and physiological properties. The main registers are chest voice (modal, thick vocal fold vibration), falsetto (head voice, thin vibration, higher pitch), and mixed voice. Vocal pedagogy concerns the smooth blending of registers across "break" points. First discussed: Chapter 19 (The Physics of the Singing Voice).

Resonance — The tendency of a system to oscillate with greater amplitude at specific frequencies (resonant frequencies) when driven by an external force. Resonance occurs when the driving frequency matches a natural frequency of the system, enabling efficient energy transfer. The resonances of instrument bodies, vocal tracts, and concert halls fundamentally shape musical timbre and spatial character. First discussed: Chapter 2 (Oscillators, Resonance, and Damping).

Resonance frequency — The frequency at which a resonant system oscillates with maximum amplitude when driven. For a simple harmonic oscillator, f_res = (1/2π)√(k/m); for an open pipe, f_n = nc/2L; for a closed pipe, f_n = (2n−1)c/4L. The resonant frequencies of an instrument body or concert hall shape its frequency response and musical character. First discussed: Chapter 2 (Oscillators, Resonance, and Damping).

Reverberation — The persistence of sound in an enclosed space after the original source has ceased, caused by multiple reflections from surfaces. Reverberation is characterized by its decay curve (sound pressure as a function of time after source cessation) and its RT60 (time for level to fall by 60 dB). Appropriate reverberation is essential to the acoustic quality of performance spaces. First discussed: Chapter 8 (Room Acoustics and Architectural Sound).

RT60 — The time required for the reverberant sound level in a room to decay by 60 dB after the source is switched off. RT60 is the standard measure of room reverberation time. Optimal RT60 values depend on the intended use of the space: approximately 0.3–0.5 s for speech intelligibility, 1.5–2.0 s for orchestral music, 2.5–3.5 s for choral/organ music. Sabine's formula gives RT60 ≈ 0.161·V/A, where V is room volume and A is total sound absorption. First discussed: Chapter 8 (Room Acoustics and Architectural Sound).
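
A one-function sketch of Sabine's formula as quoted above; the hall volume and absorption area are invented example values:

```python
def rt60_sabine(volume_m3, absorption_m2):
    """Sabine reverberation time: RT60 is approximately 0.161 * V / A (SI units)."""
    return 0.161 * volume_m3 / absorption_m2

# assumed mid-sized concert hall: 12,000 m^3 volume, 1,100 m^2 equivalent absorption area
print(round(rt60_sabine(12000, 1100), 2))   # about 1.76 s
```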

Rhythm — The organization of sound events in time, including the durations of notes and silences, the grouping of beats into metric patterns, and the relationship between surface events and an underlying pulse. Rhythm is produced by the composer and performer, perceived and organized by the listener's rhythmic cognition system, and may entrain bodily movement. First discussed: Chapter 32 (Rhythm, Meter, and Neural Entrainment).


S

Sampling — The process of measuring the instantaneous amplitude of an analog signal at discrete, equally spaced time intervals (the sampling period T = 1/f_s). Sampling is the first step in analog-to-digital conversion; reconstruction of the original signal from samples is possible if the Nyquist-Shannon theorem conditions are satisfied. First discussed: Chapter 14 (Digital Audio and the Sampling Revolution).

Sampling rate — The number of samples taken per second during analog-to-digital conversion, measured in Hz or kHz. CD audio uses 44,100 Hz; professional studio audio commonly uses 48,000 Hz, 88,200 Hz, or 96,000 Hz. Higher sampling rates increase the reproducible bandwidth and reduce aliasing artifacts in the transition band of the anti-aliasing filter. First discussed: Chapter 14 (Digital Audio and the Sampling Revolution).

Serialism — A compositional technique in which a fixed ordered sequence (a tone row or series) of the twelve pitch classes is used as the basis for melody, harmony, and counterpoint, with permitted transformations including transposition, inversion, retrograde, and retrograde-inversion. Developed by Schoenberg, Berg, and Webern, serialism extends through Boulez and Stockhausen to total serialism, applying series to rhythm, dynamics, and timbre. First discussed: Chapter 38 (Symmetry Groups and Musical Structure).

Signal-to-noise ratio (SNR) — The ratio of signal power to noise power in a system, typically expressed in decibels: SNR = 10·log₁₀(P_signal/P_noise). Higher SNR indicates cleaner, less noisy audio. For digital audio, theoretical maximum SNR is determined by bit depth (approximately 6n dB for n bits); analog systems are limited by thermal and electronic noise floors. First discussed: Chapter 14 (Digital Audio and the Sampling Revolution).

Sine wave — The simplest periodic waveform, described by x(t) = A·sin(2πft + φ), with amplitude A, frequency f, and phase φ. A pure sine wave contains a single frequency component. All periodic waveforms can be decomposed into a sum of sine waves (Fourier series); all linear systems can be characterized by their response to sine wave inputs. First discussed: Chapter 1 (What Is Sound?).

Source-filter model — A model of sound production in which the output spectrum is the product of a source spectrum (produced by the glottis in the voice, or the reed/lips/bow in instruments) and a filter function (the resonances of the vocal tract or instrument body). The model explains how changes in articulation or instrument shape modify timbre while keeping the source characteristics constant. First discussed: Chapter 20 (Voice, Speech, and the Source-Filter Model).

Spectrogram — A two-dimensional representation of audio showing frequency (vertical axis) versus time (horizontal axis) with intensity encoded by color or brightness. The spectrogram is the standard visualization tool for time-varying spectral analysis. Its time-frequency resolution is governed by the Gabor uncertainty principle: narrow time windows give good time resolution but poor frequency resolution, and vice versa. First discussed: Chapter 10 (Timbre and the Fourier Decomposition of Sound).

Spectral centroid — A measure of the "center of mass" of a sound's frequency spectrum, calculated as the weighted average frequency: SC = Σ(f·A(f)) / ΣA(f). The spectral centroid correlates with perceived brightness or "sharpness" of a sound: a high centroid indicates bright, high-frequency-rich content, while a low centroid indicates a dark, bass-heavy sound. First discussed: Chapter 10 (Timbre and the Fourier Decomposition of Sound).
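
A sketch of the centroid formula using FFT magnitudes as the weights A(f); the two test tones are arbitrary choices:

```python
import numpy as np

def spectral_centroid(x, fs):
    mags = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), 1/fs)
    return np.sum(freqs * mags) / np.sum(mags)

fs = 44100
t = np.arange(22050) / fs                           # half a second of samples
dark   = np.sin(2*np.pi*220*t)                      # a single low partial
bright = dark + 0.8*np.sin(2*np.pi*3520*t)          # add a strong high partial

print(round(spectral_centroid(dark, fs)))           # about 220 Hz
print(round(spectral_centroid(bright, fs)))         # about 1687 Hz: a much brighter sound
```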

Spectral envelope — The smooth curve connecting the peaks of the harmonics in a spectrum, describing the overall shape of the frequency distribution independently of the fundamental frequency. The spectral envelope determines timbre and is shaped by the resonances of the instrument body, vocal tract, or acoustic environment. Formants are peaks in the spectral envelope. First discussed: Chapter 10 (Timbre and the Fourier Decomposition of Sound).

Standing wave — A wave pattern produced by the superposition of two waves traveling in opposite directions with the same frequency and amplitude, resulting in a stationary pattern of nodes (zero amplitude) and antinodes (maximum amplitude). Standing waves are the basis of acoustic resonances in pipes, strings, and rooms. The resonant frequencies of a standing wave system are determined by the boundary conditions. First discussed: Chapter 9 (Strings, Membranes, and Vibrating Bodies).

Superposition — The principle that the total response of a linear system to multiple simultaneous inputs is the sum of its responses to each input individually. In acoustics, the superposition principle means that sound waves add linearly (their pressures add point by point), enabling the decomposition of complex waveforms into simpler components and the construction of complex waveforms from simple ones. First discussed: Chapter 3 (Interference and Superposition).

Symmetry (musical) — A transformation of a musical object (a melody, chord, or rhythm) that leaves some property invariant. Musical symmetry operations include transposition (shifting all pitches by a constant interval), inversion (flipping the contour of a melody), retrograde (reversing the time sequence), and augmentation/diminution (multiplying all durations by a constant). These operations form mathematical groups. First discussed: Chapter 38 (Symmetry Groups and Musical Structure).

Symmetry breaking — The process by which a symmetric system transitions to a less symmetric state, typically triggered by instability. In physics, symmetry breaking underlies phase transitions (magnetization, crystal formation, the Higgs mechanism). In music, the establishment of a tonal center from a chromatic pitch space, or the emergence of a rhythmic pulse from isochronous events, can be understood as symmetry breaking. First discussed: Chapter 37 (Chaos, Complexity, and Musical Structure).


T

Tala — The rhythmic cycle framework of Indian classical music, consisting of a fixed pattern of beats (matras) organized into groups (vibhags), marked in performance by hand gestures such as claps and waves and recited as drum syllables (bols). Well-known talas include Teentaal (16 beats), Rupak (7 beats), and Jhaptaal (10 beats). Tala provides the metric framework within which rhythmic improvisation occurs. First discussed: Chapter 33 (World Musical Systems and Universal Structures).

Temperament — Any tuning system that adjusts the pure intervals of just intonation to achieve practical goals such as freedom of modulation or uniformity across keys. Temperament systems include equal temperament, meantone temperament, well temperament (e.g., Kirnberger, Werckmeister), and various irregular temperaments. The choice of temperament profoundly affects the expressive character of music in different keys. First discussed: Chapter 6 (Scales, Temperament, and Tuning).

Timbre — The perceptual quality that distinguishes sounds of the same pitch and loudness but different sonic character (e.g., a violin and a clarinet playing the same note). Timbre is a multidimensional percept correlated with spectral envelope, temporal envelope (ADSR), inharmonicity, noise content, and dynamic spectral evolution. The American Standards Association defines timbre as "that attribute of auditory sensation in terms of which a listener can judge that two sounds having the same loudness and pitch are dissimilar." First discussed: Chapter 10 (Timbre and the Fourier Decomposition of Sound).

Tonal music — Music organized around a tonal center (tonic) and governed by the hierarchical relationships between chords and scale degrees in the major-minor tonal system. Tonal music creates tension and release through functional harmonic progressions (I-IV-V-I) and voice leading conventions. Western music from approximately 1600–1900 is paradigmatically tonal. First discussed: Chapter 7 (Harmony, Tonality, and Functional Theory).

Tone — A sound with a definite, stable pitch, produced by a periodic or quasi-periodic waveform. In music theory, "tone" also refers to the interval of a major second (two semitones). Pure tones are single-frequency sinusoids; complex tones contain multiple frequency components (harmonics or partials). First discussed: Chapter 1 (What Is Sound?).

Tonotopic map — The spatial organization of frequency-selective neurons along a sensory structure, such that neurons responsive to low frequencies are located at one end and neurons responsive to high frequencies at the other. Tonotopic organization is present in the basilar membrane, the auditory nerve, the cochlear nucleus, the inferior colliculus, and the primary auditory cortex — a fundamental principle of auditory neural architecture. First discussed: Chapter 22 (The Inner Ear as a Fourier Analyzer).

Transient — A brief, non-periodic portion of a sound signal with rapid temporal change and broad spectral content, typically occurring at the onset of a note (the "attack transient") or upon sudden dynamic change. Transients are perceptually highly salient and are crucial for instrument recognition; audio codecs must handle transients carefully to avoid pre-echo artifacts. First discussed: Chapter 13 (Synthesis and the Electronic Voice).

Transposition — The operation of shifting all pitches of a musical passage by the same interval, producing a melodically identical version at a higher or lower level. In equal temperament, transposition by n semitones multiplies all frequencies by 2^(n/12). Transposition is a fundamental symmetry of musical pitch space; the twelve distinct transpositions (taken modulo the octave) form the cyclic group Z₁₂ under composition. First discussed: Chapter 38 (Symmetry Groups and Musical Structure).
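
A small numerical check of both statements (values are illustrative): transposing C4 up seven semitones multiplies its frequency by 2^(7/12), and shifting pitch classes by 7 modulo 12 maps a C major triad onto a G major triad.

```python
# Transposition in equal temperament: frequency ratio 2**(n/12), pitch classes mod 12.
f_c4 = 261.63                                    # C4 in Hz
n = 7                                            # up a perfect fifth
print(f_c4 * 2 ** (n / 12))                      # ~392.0 Hz (G4)

pitch_classes = [0, 4, 7]                        # C major triad as pitch classes
print([(pc + n) % 12 for pc in pitch_classes])   # [7, 11, 2] = G major triad
```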


U

Uncertainty principle (acoustic) — See Gabor uncertainty principle. The acoustic uncertainty principle states that time duration and frequency bandwidth of a signal are inversely related, with the minimum time-bandwidth product equal to 1/(4π). This principle governs the fundamental tradeoff between temporal and spectral resolution in all time-frequency analysis methods, including the short-time Fourier transform, wavelet analysis, and audio codec window design. First discussed: Chapter 16 (Time-Frequency Uncertainty and the Gabor Limit).
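
The bound can be verified numerically. The sketch below computes the RMS duration and RMS bandwidth of a Gaussian pulse, which attains the minimum product of 1/(4π); the envelope width and sample rate are arbitrary, and discretization makes the result approximate.

```python
# Numerical check of the Gabor limit for a Gaussian pulse (attains the bound).
import numpy as np

fs = 44100
t = np.arange(-0.5, 0.5, 1 / fs)
sigma = 0.01                                    # 10 ms Gaussian envelope (assumed)
x = np.exp(-t**2 / (2 * sigma**2))

energy_t = x**2 / np.sum(x**2)
dt = np.sqrt(np.sum(energy_t * t**2))           # RMS duration

X = np.fft.fft(x)
f = np.fft.fftfreq(len(x), d=1 / fs)
energy_f = np.abs(X)**2 / np.sum(np.abs(X)**2)
df = np.sqrt(np.sum(energy_f * f**2))           # RMS bandwidth

print(dt * df, 1 / (4 * np.pi))                 # both ~0.0796
```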

Unison — The interval between two pitches of identical frequency, corresponding to a frequency ratio of 1:1. Perfect unison produces complete constructive interference (assuming identical phase). In ensemble music, "playing in unison" means all voices perform the same pitches (in the same octave); slight mistuning creates the "chorus" effect of ensemble warmth. First discussed: Chapter 3 (Interference and Superposition).


V

Valence (musical) — The emotional dimension of a musical experience ranging from negative (unpleasant, sad) to positive (pleasant, happy). Valence is one of the two primary dimensions of the circumplex model of affect (the other being arousal/energy). Musical valence is influenced by mode (major = higher valence), tempo, loudness, and cultural context. First discussed: Chapter 29 (Music and Emotion: Theories and Evidence).

Vibrato — A regular, periodic variation in the pitch (frequency vibrato) or amplitude (amplitude vibrato/tremolo) of a musical tone, typically at a rate of 5–8 Hz and a depth of ±25–100 cents. Vibrato is an essential expressive device in singing and bowed string playing; it enriches timbre by smearing the harmonic spectrum and enhances the "warmth" of a tone by preventing neural adaptation. First discussed: Chapter 19 (The Physics of the Singing Voice).
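
Frequency vibrato is readily modeled as slow sinusoidal frequency modulation. The sketch below uses a 6 Hz rate and a ±50 cent depth, values chosen from the ranges quoted above; integrating the instantaneous frequency gives the phase of the modulated tone.

```python
# Frequency vibrato as slow FM: wobble the instantaneous frequency, integrate to phase.
import numpy as np

fs = 44100
t = np.arange(0, 2, 1 / fs)
f0, rate, depth_cents = 440.0, 6.0, 50.0        # carrier, vibrato rate, +/-50 cents

inst_freq = f0 * 2 ** ((depth_cents / 1200) * np.sin(2 * np.pi * rate * t))
phase = 2 * np.pi * np.cumsum(inst_freq) / fs   # integrate frequency to get phase
tone = np.sin(phase)
```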

Voice leading — In music theory, the principles governing the movement of individual melodic lines (voices) within a harmonic progression, concerned with smooth, stepwise motion, avoidance of parallel fifths and octaves, and resolution of dissonances. Voice leading rules encode centuries of accumulated practice in connecting chords smoothly. Dmitri Tymoczko's geometric model of voice leading maps chord progressions to paths through an orbifold. First discussed: Chapter 26 (The Geometry of Tonal Space).

Vowel formant — See Formant. The characteristic resonant frequencies of the vocal tract that define vowel quality. The first two formants (F1: related to jaw height and vowel openness; F2: related to tongue position front/back) are the primary determinants of vowel identity. The vowel quadrilateral maps vowels in the F1-F2 plane and corresponds roughly to the articulatory space of tongue height and advancement. First discussed: Chapter 20 (Voice, Speech, and the Source-Filter Model).


W

Waveform — The shape of a wave as a function of time — the graph of amplitude versus time for a sound signal. Common waveforms include sine (single frequency, pure tone), square (rich in odd harmonics), sawtooth (rich in all harmonics, fundamental of subtractive synthesis), and triangle waves (rich in odd harmonics with faster rolloff than square). Natural musical sounds have complex, time-varying waveforms. First discussed: Chapter 1 (What Is Sound?).
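
For reference, the classic waveforms named above can be generated with SciPy's band-unlimited oscillators; they will alias if played back naively at audio rates, and the frequency and sample rate here are arbitrary.

```python
# Classic synthesis waveforms (band-unlimited; parameters illustrative).
import numpy as np
from scipy.signal import square, sawtooth

fs, f = 44100, 220
t = np.arange(0, 1, 1 / fs)

sine = np.sin(2 * np.pi * f * t)                # single frequency
sq = square(2 * np.pi * f * t)                  # odd harmonics, 1/n rolloff
saw = sawtooth(2 * np.pi * f * t)               # all harmonics, 1/n rolloff
tri = sawtooth(2 * np.pi * f * t, width=0.5)    # odd harmonics, 1/n^2 rolloff
```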

Wavelength — The spatial period of a wave — the distance between two successive points of identical phase (e.g., two successive compressions in a sound wave). Wavelength λ = c/f, where c is wave speed and f is frequency. For middle C (262 Hz) in air (c = 343 m/s): λ ≈ 1.31 m. Wavelength determines the scale of acoustic phenomena: diffraction is significant when wavelength is comparable to obstacle size. First discussed: Chapter 1 (What Is Sound?).

Well temperament — A family of tuning systems popular in the Baroque and Classical periods (including Werckmeister III, Kirnberger II/III, and Vallotti temperament) in which all twelve keys are usable but have different interval qualities, giving each key a distinctive "character" or "color." Well temperament (not equal temperament) is believed to be what Bach intended for Das Wohltemperierte Klavier ("The Well-Tempered Clavier"). First discussed: Chapter 6 (Scales, Temperament, and Tuning).

Wolf note/fifth — In meantone temperament, the "wolf fifth" is the highly dissonant interval left over when eleven tempered fifths are tuned and the twelfth must absorb the accumulated error needed to close the circle; in quarter-comma meantone it is roughly 35–40 cents wider than a just fifth and beats audibly. More generally, a wolf note is a pitch on a stringed instrument whose frequency coincides with a strong body resonance, causing the tone to break up in an uncontrolled, unstable way. The wolf fifth is the practical price of certain temperament choices; the instrumental wolf note is a distinct resonance phenomenon that shares the name. First discussed: Chapter 6 (Scales, Temperament, and Tuning).
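
A quick calculation makes the size of the meantone wolf concrete; the quarter-comma values below are standard, and the arithmetic rather than the code is the point.

```python
# Size of the quarter-comma meantone wolf fifth in cents.
import math

def cents(ratio):
    return 1200 * math.log2(ratio)

meantone_fifth = cents(5 ** 0.25)          # quarter-comma meantone fifth ~696.6 cents
just_fifth = cents(3 / 2)                  # pure fifth ~702.0 cents

# Eleven meantone fifths plus the wolf must span exactly seven octaves
wolf = 7 * 1200 - 11 * meantone_fifth      # ~737.6 cents
print(wolf - just_fifth)                   # ~ +35.7 cents: wider than just
```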


Z

Zero-point energy — In quantum mechanics, the lowest possible energy of a quantum system, which is not zero but equals ℏω/2 for a simple harmonic oscillator. Zero-point energy arises from the Heisenberg uncertainty principle: a particle cannot simultaneously have exactly zero position uncertainty and exactly zero momentum uncertainty. The concept has a loose acoustic analog in the irreducible quantum noise floor of a resonant system at absolute zero temperature. First discussed: Chapter 17 (Quantum Acoustics).
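
For a sense of scale, the snippet below compares the zero-point energy hf/2 of an oscillator at concert A (440 Hz) with the thermal energy kT at room temperature; the point is only the order of magnitude.

```python
# Order-of-magnitude comparison: quantum zero-point energy vs. thermal energy.
h = 6.626e-34       # Planck constant, J*s
k = 1.381e-23       # Boltzmann constant, J/K

E_zero = h * 440 / 2            # ~1.5e-31 J for a 440 Hz oscillator
E_thermal = k * 293             # ~4e-21 J at room temperature
print(E_zero, E_thermal)        # thermal energy dwarfs the quantum floor
```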


This glossary covers the major technical terms used in this textbook. For further elaboration on any term, see the chapter(s) cited in each entry. Additional terminology used in specific chapters may be found in the "Key Terms" section at the end of each chapter.