
Appendix G: Answers to Selected Exercises

This appendix provides answers to selected exercises from each chapter of The Physics of Music and the Music of Physics. For Chapters 1–10, complete answers are given for all Part A (factual recall) questions, Part B Question 1 (conceptual application), and one worked Part C example. For Chapters 11–20, Part A answers are provided in full. For Chapters 21–40, key concept summaries are given. Students are encouraged to attempt all exercises before consulting these answers.


PART ONE: CHAPTERS 1–10 (Full Answers)


Chapter 1: What Is Sound?

Part A — Factual Recall

A1. What is the physical nature of sound, and how does it differ from electromagnetic waves?

Answer: Sound is a longitudinal mechanical wave — a pattern of alternating compressions (regions of increased pressure) and rarefactions (regions of reduced pressure) propagating through a material medium. The oscillation of air molecules is parallel to the direction of wave travel, distinguishing sound as a longitudinal wave. Unlike electromagnetic waves (light, radio, X-rays), sound requires a medium through which to propagate; sound cannot travel through a vacuum. In a solid or liquid, sound may also propagate as transverse waves (shear waves), but airborne sound is always longitudinal.

A2. Write the equation relating frequency, wavelength, and wave speed for sound. Calculate the wavelength of the note A4 (440 Hz) in air at 20°C (c = 343 m/s).

Answer: The fundamental wave relationship is c = fλ, or equivalently λ = c/f. For A4 at 440 Hz in air at 20°C:

λ = c/f = 343 m/s ÷ 440 Hz ≈ 0.780 meters

This wavelength is roughly the length of a typical human arm, which is why room-scale spaces interact significantly with musical pitches in the mid-frequency range.
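The λ = c/f arithmetic can be checked in a couple of lines of Python (a quick sketch using the values above):

```python
# Wavelength of A4 in air at 20 °C, from lambda = c / f
c = 343.0  # speed of sound in air at 20 °C, m/s
f = 440.0  # frequency of A4, Hz

wavelength = c / f
print(f"lambda = {wavelength:.3f} m")  # lambda = 0.780 m
```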

A3. What is the audible frequency range for a healthy young adult? How does this range typically change with age?

Answer: The audible frequency range for a healthy young adult is approximately 20 Hz to 20,000 Hz (20 kHz). The lower limit is set partly by the difficulty of producing sufficient air displacement at very low frequencies and partly by the transition from pitch perception to felt vibration. The upper limit is set by the mechanical properties of the basilar membrane and hair cells. With age (presbycusis), the upper limit progressively decreases, often to 12–15 kHz or below by middle age. High-frequency hearing loss is exacerbated by cumulative noise exposure and ototoxic drug exposure.

A4. Define the threshold of hearing and the threshold of pain in terms of sound pressure level (dB SPL).

Answer: The threshold of hearing (the softest sound detectable by a young adult with normal hearing at 1 kHz) is defined as 0 dB SPL, corresponding to a sound pressure of 20 micropascals (20 × 10⁻⁶ Pa). This is a vanishingly small pressure — approximately 2 × 10⁻¹⁰ times atmospheric pressure. The threshold of pain is approximately 130–140 dB SPL (about 200 Pa sound pressure), at which point the auditory system begins to be damaged. The dynamic range of human hearing thus spans approximately 130 dB, or a pressure ratio of roughly 3 million to one.
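These level figures follow from the definition SPL = 20·log₁₀(p/p₀) with p₀ = 20 µPa; a minimal Python check (the helper name spl_db is ours for illustration):

```python
import math

P_REF = 20e-6  # reference pressure p0 = 20 µPa, i.e. 0 dB SPL

def spl_db(pressure_pa):
    """Sound pressure level in dB SPL for an RMS pressure in pascals."""
    return 20.0 * math.log10(pressure_pa / P_REF)

print(spl_db(20e-6))  # threshold of hearing: 0.0 dB
print(spl_db(200.0))  # threshold of pain: ~140 dB
```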

A5. What is the relationship between the speed of sound and the temperature of air? How much does the speed of sound change between 0°C and 20°C?

Answer: The speed of sound in an ideal gas depends on temperature according to: c = √(γRT/M), where γ is the ratio of specific heats (≈1.4 for air), R is the gas constant, T is absolute temperature in Kelvin, and M is the molar mass of air. Practically, for air near room temperature: c ≈ 331.3 + 0.606·T_C (m/s), where T_C is the temperature in Celsius. At 0°C: c ≈ 331 m/s. At 20°C: c ≈ 343 m/s. The difference is about 12 m/s — roughly 3.5% — which is why wind instruments must be "warmed up" before tuning: the pitch of a flute, for example, rises as the instrument warms and the speed of sound in the air column increases.
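The linear approximation above is easy to tabulate; a short sketch (the helper name speed_of_sound is ours):

```python
def speed_of_sound(temp_c):
    """Approximate speed of sound in air (m/s), valid near room temperature."""
    return 331.3 + 0.606 * temp_c

for t in (0.0, 10.0, 20.0, 30.0):
    print(f"{t:5.1f} °C -> {speed_of_sound(t):6.1f} m/s")
```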


Part B — Conceptual Application

B1. A thunderstorm produces both lightning (electromagnetic, traveling at the speed of light) and thunder (acoustic, traveling at approximately 340 m/s). If you observe the lightning flash and count 5 seconds before hearing the thunder, approximately how far away is the lightning strike? What does this tell you about the relative speed of sound and light?

Answer: Since light travels at 3 × 10⁸ m/s, the light from the lightning reaches you essentially instantaneously. The 5-second delay represents the travel time of sound: distance = speed × time = 340 m/s × 5 s = 1,700 meters, or approximately 1.7 km (just over 1 mile). This illustrates that light is approximately 880,000 times faster than sound: 300,000,000/340 ≈ 882,000. In practical terms, this means that in an outdoor concert setting, a musician standing just 34 meters (about 112 feet) from a listener is already 100 ms "behind" in acoustic arrival time relative to a nearer musician — a delay equivalent to a sixteenth note at 150 BPM, and readily audible as a timing offset.
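The flash-to-bang estimate reduces to distance = c·t; a one-line check in Python (the function name is ours):

```python
C_SOUND = 340.0  # m/s, the rounded value used in the answer above

def lightning_distance_m(delay_s):
    """Distance to a lightning strike from the flash-to-thunder delay."""
    return C_SOUND * delay_s

print(lightning_distance_m(5.0))  # 1700.0 m, about 1.7 km
```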


Part C — Worked Example

C1. A guitar string vibrates at 330 Hz (E4). The string is 65 cm long. (a) Calculate the speed of the wave on the string. (b) If the string's linear mass density is 0.85 g/m and the wave speed you calculated is correct, what must the string tension be?

Answer:

(a) Wave speed on the string: For a string vibrating in its fundamental mode, the wavelength equals twice the string length (the fundamental has a half-wavelength fitting the string length): λ = 2L = 2 × 0.65 m = 1.30 m. The wave speed on the string is therefore: v = fλ = 330 Hz × 1.30 m = 429 m/s.

Note that this is the wave speed on the string itself — much faster than sound in air (343 m/s) because the string has high tension and low mass density.

(b) String tension: The wave speed on a string is given by v = √(T/μ), where T is the tension (Newtons) and μ is the linear mass density (kg/m). Converting μ = 0.85 g/m = 0.00085 kg/m, and solving for T:

T = v²μ = (429 m/s)² × 0.00085 kg/m = 184,041 × 0.00085 ≈ 156.4 Newtons

This is approximately 35 pounds of force on a string thinner than a millimeter — illustrating why guitar neck construction requires significant structural rigidity, and why string breakage is such a mechanical event: all that tension releasing instantaneously.
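The two-step calculation (wave speed from the fundamental, then tension by inverting v = √(T/μ)) can be verified numerically; a sketch with the given values:

```python
f = 330.0     # fundamental frequency, Hz (E4)
L = 0.65      # string length, m
mu = 0.85e-3  # linear mass density, kg/m

wavelength = 2 * L    # fundamental: half a wavelength spans the string
v = f * wavelength    # wave speed on the string, m/s
T = v ** 2 * mu       # invert v = sqrt(T / mu) for the tension, N

print(f"v = {v:.0f} m/s, T = {T:.1f} N")  # v = 429 m/s, T = 156.4 N
```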


Chapter 2: Oscillators, Resonance, and Damping

Part A — Factual Recall

A1. State the equation of motion for a simple harmonic oscillator and identify each term.

Answer: The equation of motion for a simple harmonic oscillator is: m(d²x/dt²) + kx = 0, where m is the mass (kg), x is displacement from equilibrium (m), t is time (s), and k is the spring constant (N/m). The term m(d²x/dt²) represents the inertial force (Newton's second law); kx represents the restoring force (Hooke's law), which is always directed toward equilibrium and proportional to displacement. The solution is x(t) = A·cos(ω₀t + φ), where ω₀ = √(k/m) is the natural angular frequency.

A2. Define the quality factor (Q) of a resonator and explain what it tells us about the resonator's behavior.

Answer: The quality factor Q = ω₀/(2β) = f₀/Δf, where ω₀ is the resonant angular frequency, β is the damping coefficient, f₀ is the resonant frequency, and Δf is the bandwidth (range of frequencies within 3 dB of the peak response). Q measures the sharpness of resonance. A high-Q resonator (large Q) has a narrow bandwidth, rings for many cycles after excitation (low energy loss per cycle), and is highly frequency-selective. A tuning fork has Q ≈ 1,000–10,000; a concert hall might have Q ≈ 5–10 for its lowest modes. Low-Q resonators damp quickly and have broad, gentle frequency responses.

A3. What is the difference between underdamped, critically damped, and overdamped systems?

Answer: These three regimes describe how a damped oscillator responds to initial displacement. An underdamped system (damping ratio ζ < 1) oscillates at a slightly reduced frequency while the amplitude decays exponentially — it rings. A critically damped system (ζ = 1) returns to equilibrium as rapidly as possible without oscillating; this is the ideal for mechanisms like door dampers. An overdamped system (ζ > 1) returns to equilibrium more slowly than critical damping, without oscillating but decaying asymptotically. Musical instruments are almost universally underdamped (they ring); piano dampers move systems toward the overdamped regime to end notes.

A4. What is resonance, and at what frequency does the amplitude of a driven oscillator peak?

Answer: Resonance is the phenomenon in which a driven oscillating system reaches maximum amplitude response when the driving frequency matches (or is near) the system's natural frequency. For a lightly damped system driven by a sinusoidal force F₀cos(ωt), the amplitude peaks near ω₀ = √(k/m). For higher damping, the peak shifts slightly below ω₀. At resonance, the driver does work most efficiently because the velocity of the oscillator is in phase with the driving force, enabling continuous energy transfer. Resonance underlies the amplification of sound in musical instruments, concert halls, and the vocal tract.

A5. How does the energy of an underdamped resonator decay over time?

Answer: The energy of an underdamped resonator decays exponentially as E(t) = E₀·e^(−2βt), where β is the damping coefficient. This means energy decreases by a fixed proportion in each unit of time. The decay of sound level in a reverberant room follows the same law (hence the logarithmic RT60 measure). Since energy is proportional to amplitude squared, the amplitude decays as A(t) = A₀·e^(−βt) — also exponential. This exponential decay appears as a straight line on a dB (logarithmic) scale, which is why room decay curves and instrument sustain curves are often plotted in dB versus time.


Part B — Conceptual Application

B1. Why do violin makers "tap tune" the top and back plates of a violin during construction, and what are they listening for?

Answer: When a luthier taps a violin plate, they excite the plate's resonant modes and listen to the resulting ring to assess the plate's vibrational characteristics before assembly. The main resonant modes of the top plate (modes known as the "monopole" or "breathing" mode at approximately 100–200 Hz, and higher modes) determine how efficiently the plate radiates sound and at what frequencies it amplifies the string's vibration. A skilled luthier listens for specific pitch, sustain (Q factor), and modal quality, and adjusts thickness and arching to bring these into a target range. The structural coupling between top plate, back plate, and air cavity inside the body creates the complex resonance structure that gives fine violins their characteristic sound.


Part C — Worked Example

C1. A mass-spring system has m = 0.25 kg and k = 100 N/m. It is set oscillating with an initial amplitude of 0.05 m. The damping coefficient β = 0.8 s⁻¹. (a) What is the natural frequency f₀? (b) What is the Q factor? (c) How long does it take for the amplitude to fall to half its initial value?

Answer:

(a) ω₀ = √(k/m) = √(100/0.25) = √400 = 20 rad/s. Therefore f₀ = ω₀/(2π) = 20/(2π) ≈ 3.18 Hz.

(b) Q = ω₀/(2β) = 20/(2 × 0.8) = 20/1.6 = 12.5. This is a reasonably high-Q system — it will oscillate for many cycles before the amplitude falls substantially.

(c) Amplitude decays as A(t) = A₀·e^(−βt). Setting A(t) = A₀/2: e^(−βt) = 0.5, so −βt = ln(0.5) = −0.693, giving t = 0.693/β = 0.693/0.8 ≈ 0.866 seconds. At f₀ ≈ 3.18 Hz, this represents about 0.866 × 3.18 ≈ 2.75 oscillation cycles to reach half amplitude — consistent with a moderately well-damped system.
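All three parts can be bundled into a short numeric check (variable names are ours):

```python
import math

m, k, beta = 0.25, 100.0, 0.8  # mass (kg), spring constant (N/m), damping (1/s)

omega0 = math.sqrt(k / m)       # natural angular frequency, rad/s
f0 = omega0 / (2 * math.pi)     # natural frequency, Hz
Q = omega0 / (2 * beta)         # quality factor
t_half = math.log(2) / beta     # amplitude half-life: exp(-beta * t) = 1/2

print(f"f0 = {f0:.2f} Hz, Q = {Q}, t_half = {t_half:.3f} s")
```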


Chapter 3: Interference and Superposition

Part A — Factual Recall

A1. State the principle of superposition for sound waves and explain its physical basis.

Answer: The principle of superposition states that when two or more waves occupy the same region of space simultaneously, the net displacement at any point is the algebraic sum of the individual displacements due to each wave. For sound in air at normal amplitudes, this principle holds exactly because air behaves as a linear medium: the pressure perturbations are small enough that the restoring forces are proportional to displacement. Superposition breaks down only at extreme sound intensities (nonlinear acoustics) or in some specialized acoustic metamaterials.

A2. Define constructive and destructive interference and state the conditions for each.

Answer: Constructive interference occurs when two waves arrive at a point in phase (phase difference of 0°, 360°, 720°, etc., or path difference of 0, λ, 2λ, ...). The amplitudes add, producing a resultant with amplitude equal to the sum of the individual amplitudes. Destructive interference occurs when two waves arrive exactly out of phase (phase difference of 180°, 540°, etc., or path difference of λ/2, 3λ/2, ...). The amplitudes cancel, producing a resultant of reduced or zero amplitude. Real acoustic situations typically produce partial interference because the two sources are not perfectly matched in amplitude.

A3. What is the beat frequency produced by two tones at 440 Hz and 443 Hz? Describe what a listener hears.

Answer: The beat frequency equals the absolute frequency difference: f_beat = |443 − 440| = 3 Hz. A listener hears a single tone at the average frequency (441.5 Hz, approximately A4) whose amplitude pulsates three times per second. The pulsation ranges from nearly zero amplitude (when the two waves are out of phase) to maximum amplitude (when they are in phase). Musically, 3 Hz beating is perceived as a slow "wavering" or "wobble" — acceptable as vibrato if pleasingly musical in character, but perceived as disagreeable mistuning when two instruments are expected to be in unison. Musicians use the disappearance of beats as the criterion for perfect tuning.
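The numbers in this answer follow directly from the beat relations; a minimal check:

```python
f1, f2 = 440.0, 443.0  # the two tones, Hz

beat_freq = abs(f2 - f1)     # amplitude pulsations per second
heard_pitch = (f1 + f2) / 2  # perceived tone at the average frequency

print(beat_freq, heard_pitch)  # 3.0 441.5
```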

A4. Explain how a standing wave forms in a string fixed at both ends.

Answer: A standing wave forms when a traveling wave on the string is reflected from the fixed end and the incident and reflected waves superpose. The fixed endpoints impose the boundary condition that displacement is always zero there (displacement nodes at both ends). Only wavelengths for which an integer number of half-wavelengths fits the string length are self-consistently reinforced by the reflections: L = nλ/2, giving frequencies f_n = nc/(2L) for n = 1, 2, 3, ... The resulting pattern has fixed nodal positions (zero displacement) and antinodal positions (maximum displacement) and does not travel — hence "standing wave." The fundamental (n = 1) has a single antinode at the midpoint; the second harmonic (n = 2) has two half-wavelengths and a node at the midpoint.

A5. Describe the Doppler effect and give a musical example.

Answer: The Doppler effect is the shift in observed frequency of a wave due to relative motion between the source and observer. When source and observer approach each other, successive wave crests arrive more frequently (higher pitch); when they recede, they arrive less frequently (lower pitch). The observed frequency is f_obs = f_source·(c ± v_obs)/(c ∓ v_source). A musical example is the Leslie rotating speaker cabinet used with Hammond organs: as the rotating speaker horn approaches and recedes from the listener, the perceived pitch varies periodically, producing a characteristic vibrato and amplitude modulation that is impossible to replicate with static speakers.


Part B — Conceptual Application

B1. Concert halls sometimes have "dead spots" where the sound level is significantly lower than in surrounding seats. Explain the acoustic mechanism responsible, and suggest why some frequencies might be more affected than others.

Answer: Dead spots arise from destructive interference between the direct sound from the stage and one or more reflected paths (from walls, ceiling, balcony faces, or floor). At the specific seat location, these paths have a length difference corresponding to a half-wavelength of a particular frequency, causing the reflected wave to arrive out of phase and partially cancel the direct sound. Because the interference condition (path difference = λ/2) is frequency-specific, a dead spot may affect only a narrow band of frequencies — a listener may find that certain pitches sound unnaturally quiet while others are unaffected. Mitigating dead spots requires either reducing the reflection amplitude (absorption), disrupting its coherence (diffusion), or redesigning the geometry so no single strong reflection path creates a stable cancellation zone.


Part C — Worked Example

C1. Two loudspeakers separated by 2.0 m are playing the same 500 Hz tone in phase. A listener stands 4.0 m directly in front of the midpoint between the speakers. (a) Confirm that the listener is at a point of constructive interference. (b) The listener moves 1.5 m to the side (perpendicular to the line between speakers). Calculate the path length difference to the two speakers and determine whether the new position is closer to constructive or destructive interference.

Answer:

Speed of sound c = 343 m/s; wavelength λ = 343/500 = 0.686 m.

(a) At the midpoint directly in front, the listener is equidistant from both speakers. Path length difference ΔL = 0. Since ΔL = 0 = 0·λ, this is a condition for constructive interference. The listener is at the central maximum.

(b) Speaker positions: Speaker 1 at (−1.0, 0) m; Speaker 2 at (+1.0, 0) m; Listener at (1.5, 4.0) m.

Distance to Speaker 1: d₁ = √[(1.5 − (−1.0))² + 4.0²] = √[(2.5)² + 16] = √[6.25 + 16] = √22.25 ≈ 4.717 m

Distance to Speaker 2: d₂ = √[(1.5 − 1.0)² + 4.0²] = √[(0.5)² + 16] = √[0.25 + 16] = √16.25 ≈ 4.031 m

Path difference: ΔL = 4.717 − 4.031 = 0.686 m

Since λ = 0.686 m, ΔL = exactly 1λ. This is a condition for constructive interference — the listener is at the first-order maximum off-axis. The listener will hear approximately the same loudness as at the center, but these maximum and minimum positions shift with frequency, explaining why the timbral balance of a loudspeaker system can vary significantly with listener position.
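The geometry of part (b) is a straightforward distance computation; a sketch using the coordinates above (math.dist requires Python 3.8+):

```python
import math

WAVELENGTH = 343.0 / 500.0  # 0.686 m at 500 Hz

SPK1 = (-1.0, 0.0)  # speaker positions, m
SPK2 = (1.0, 0.0)

def path_difference(listener):
    """Absolute path-length difference from a listener to the two speakers."""
    return abs(math.dist(listener, SPK1) - math.dist(listener, SPK2))

print(path_difference((0.0, 4.0)))               # 0.0 -> central maximum
print(path_difference((1.5, 4.0)) / WAVELENGTH)  # ~1.0 -> first-order maximum
```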


Chapter 4: Wave Transmission and Impedance

Part A — Factual Recall

A1. Define acoustic impedance and give its SI units.

Answer: Acoustic impedance Z is the ratio of acoustic pressure p to the volume velocity U through a surface: Z = p/U, with SI units Pa·s/m³. The closely related specific acoustic impedance is the ratio of pressure to particle velocity, z = p/u; for a plane wave in an infinite medium it equals the characteristic impedance of the medium, z₀ = ρc, where ρ is the medium's density (kg/m³) and c is the wave speed (m/s). SI units of specific acoustic impedance are Pa·s/m, also called Rayls (after Lord Rayleigh). At 20°C and 1 atm, the specific acoustic impedance of air is approximately 413 Rayls; for water, it is approximately 1.5 × 10⁶ Rayls.

A2. What fraction of sound intensity is transmitted across the boundary between air and water?

Answer: The transmission coefficient T_I for intensity at a boundary between two media is: T_I = 4Z₁Z₂/(Z₁ + Z₂)². With Z_air ≈ 413 Rayls and Z_water ≈ 1,480,000 Rayls:

T_I = 4 × 413 × 1,480,000 / (413 + 1,480,000)² ≈ 2.44 × 10⁹ / (1.48 × 10⁶)² ≈ 2.44 × 10⁹ / 2.19 × 10¹² ≈ 0.00111 ≈ 0.11%

Only about 0.11% of the incident intensity is transmitted; the rest is reflected. This 99.9% reflection (corresponding to nearly 30 dB of insertion loss) explains why we cannot hear an underwater conversation from above the water surface, and why sonar transducers require special impedance-matching techniques.
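The intensity transmission formula is simple to evaluate; a sketch with the impedances quoted above:

```python
def intensity_transmission(z1, z2):
    """Fraction of incident intensity transmitted across a boundary,
    T_I = 4 * Z1 * Z2 / (Z1 + Z2)^2."""
    return 4 * z1 * z2 / (z1 + z2) ** 2

Z_AIR = 413.0     # Rayls
Z_WATER = 1.48e6  # Rayls

t = intensity_transmission(Z_AIR, Z_WATER)
print(f"transmitted: {t * 100:.2f} % of incident intensity")  # ~0.11 %
```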

A3. Why does a violin's soundboard dramatically increase the acoustic output of the strings?

Answer: A vibrating string is a very poor direct radiator of sound because of severe acoustic impedance mismatch: the string has a very small cross-sectional area and radiates very little acoustic power directly into the air. The soundboard (top plate) addresses this by coupling the string vibration to a large vibrating surface area. More area means more air molecules set in motion, dramatically increasing radiation efficiency. The soundboard also introduces its own resonances (amplifying certain frequencies) and shapes the timbral characteristics of the instrument. The bridge couples string motion to the soundboard; the bass bar and soundpost inside the body distribute vibrations and coordinate the motion of top and back plates.

A4. Define the reflection coefficient at an acoustic boundary and explain what happens at a perfectly rigid boundary.

Answer: The pressure reflection coefficient at a boundary between two media is: r_p = (Z₂ − Z₁)/(Z₂ + Z₁). At a perfectly rigid boundary, Z₂ → ∞, so r_p → +1: the pressure wave is reflected with the same polarity (pressure doubles at the boundary) and zero transmission. At a perfectly soft boundary (Z₂ → 0, open end of a pipe), r_p → −1: the wave is reflected with inverted polarity, and pressure is zero at the boundary. These boundary conditions determine which modes can exist in acoustic resonators (organ pipes, wind instruments, vocal tract).

A5. What is the significance of the quarter-wavelength transformer in acoustics?

Answer: A quarter-wavelength transformer (a tube of length λ/4) connects two acoustic regions with different impedances. If the tube has an intermediate impedance Z_t = √(Z₁Z₂), it achieves perfect impedance matching at the design frequency: no reflection occurs, and all power is transmitted. This principle, well known in electrical transmission line theory, also appears in audiology (the middle ear ossicles function as an impedance matching transformer between the air in the outer ear canal and the fluid-filled inner ear), in instrument acoustics (the cone of a trumpet bell), and in the design of acoustic horns.


Part B — Conceptual Application

B1. Explain why speaking into a tube or megaphone makes your voice louder at a distance, using the concept of acoustic impedance matching.

Answer: The megaphone works by creating a gradual impedance transformation between the relatively high impedance at the narrow throat (near the mouth) and the low impedance of the open air at the wide bell. This gradual transformation reduces reflection at each point along the horn, allowing more acoustic energy to be transmitted into the air rather than reflected back toward the source. The directional focusing of the wavefront also concentrates energy in the forward direction, increasing intensity along the axis of projection. Without the megaphone, the abrupt impedance mismatch between the mouth and the large air volume causes most of the vocal energy to be reflected back rather than radiated outward.


Part C — Worked Example

C1. An open cylindrical pipe (open at both ends) is 0.60 m long. (a) Find the fundamental frequency and the first three overtones. (b) A closed pipe of the same length has which harmonics? (c) How does the tonal quality differ?

Answer:

(a) Open pipe: Resonant frequencies f_n = nc/(2L) for n = 1, 2, 3, 4, ... With c = 343 m/s and L = 0.60 m: f₁ = 343/(2 × 0.60) = 343/1.20 ≈ 286 Hz. Overtones: f₂ = 572 Hz, f₃ = 857 Hz, f₄ = 1143 Hz. The open pipe supports all integer harmonics.

(b) Closed pipe: Resonant frequencies f_n = (2n−1)c/(4L) for n = 1, 2, 3, ... f₁ = 343/(4 × 0.60) = 343/2.40 ≈ 143 Hz. This is one octave lower than the open pipe of the same length. Overtones: f₂ = 3 × 143 = 429 Hz, f₃ = 5 × 143 = 714 Hz. The closed pipe supports only odd harmonics (1st, 3rd, 5th, ...).

(c) Tonal quality: The open pipe's complete harmonic series (odd and even harmonics) produces a bright, full timbre similar to the flute (open at both ends). The closed pipe's odd-harmonic-only spectrum produces a hollow, clarinet-like quality. The clarinet is indeed a closed-pipe instrument (the reed-end is acoustically closed), which is why it overblows to the 12th (an octave + a fifth) rather than to the octave like the flute. This explains one of the distinctive features of woodwind acoustics directly from wave physics.
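The two harmonic series can be generated directly from the formulas in (a) and (b); a short sketch:

```python
C = 343.0  # speed of sound, m/s
L = 0.60   # pipe length, m

# Open-open pipe: all integer harmonics, f_n = n * c / (2 * L)
open_pipe = [n * C / (2 * L) for n in range(1, 5)]

# Closed (stopped) pipe: odd harmonics only, f_n = (2n - 1) * c / (4 * L)
closed_pipe = [(2 * n - 1) * C / (4 * L) for n in range(1, 4)]

print([round(f, 1) for f in open_pipe])
print([round(f, 1) for f in closed_pipe])
```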


Chapter 5: Resonance and Filters

Part A — Factual Recall

A1. What is a filter in audio signal processing, and what are the four basic filter types?

Answer: A filter is a system that selectively attenuates or passes certain frequency components of a signal. The four basic types are: (1) Low-pass filter (LPF) — passes frequencies below the cutoff frequency, attenuates above; (2) High-pass filter (HPF) — passes above the cutoff, attenuates below; (3) Band-pass filter (BPF) — passes a specific band of frequencies around a center frequency; (4) Band-reject (notch) filter — attenuates a specific frequency band. Filters are described by their order (which determines roll-off slope: 1st-order = 6 dB/octave, 2nd-order = 12 dB/octave, etc.) and their frequency response characteristics (Butterworth, Chebyshev, elliptic, etc.).

A2. What is the cutoff frequency of a filter, and how is it defined?

Answer: The cutoff frequency of a filter is the frequency at which the filter's response falls to 1/√2 (approximately 0.707) of its passband amplitude, corresponding to a −3 dB reduction in level (since 20·log₁₀(1/√2) ≈ −3 dB). The cutoff frequency marks the boundary between the passband (frequencies that pass relatively unattenuated) and the stopband (frequencies that are strongly attenuated). For a resonant bandpass filter, two −3 dB cutoff frequencies define the bandwidth: Δf = f₂ − f₁, and Q = f₀/Δf.

A3. Explain how a violin body acts as a filter that shapes the tonal spectrum of the strings.

Answer: A violin body has numerous acoustic resonances (modes of the top plate, back plate, air cavity, etc.) that create peaks in its frequency response at specific frequencies. When the strings vibrate and excite the bridge, the bridge transfers energy to the top plate; the body amplifies frequencies near its resonant modes and attenuates others. This selective amplification is equivalent to a bank of resonant bandpass filters applied to the string's harmonic spectrum. The resulting filtered spectrum — the radiated sound — has a characteristic timbre shaped by the specific location, bandwidth, and coupling strength of the body's resonances. This is why two violins with identical strings but different body shapes and wood characteristics sound profoundly different.

A4. Define roll-off rate for a filter and explain its significance for audio applications.

Answer: Roll-off rate is the rate at which a filter attenuates signals beyond its cutoff frequency, typically expressed in dB per octave or dB per decade. A first-order filter has a 6 dB/octave roll-off; a second-order filter has a 12 dB/octave roll-off; an nth-order filter has 6n dB/octave. In audio applications, steeper roll-off is often desired (e.g., anti-aliasing filters need to attenuate to below the noise floor within a fraction of an octave above the cutoff), but steeper filters introduce more phase distortion in the transition band. The tradeoff between roll-off steepness and phase linearity is a fundamental design constraint in audio filter engineering.

A5. What is an equalizer (EQ), and how does a parametric EQ differ from a graphic EQ?

Answer: An equalizer is a multi-band filter that allows selective boosting or cutting of different frequency regions of an audio signal. A parametric EQ allows the user to adjust three parameters for each band: center frequency, gain (boost/cut amount), and bandwidth (Q factor), giving maximum flexibility for precise frequency shaping. A graphic EQ has a fixed set of bands at predetermined frequencies (typically in octave or 1/3-octave intervals) and allows only gain adjustment at each band — the frequency and Q are fixed. Parametric EQs are preferred in studio and broadcast contexts for precision; graphic EQs are common in live sound for quick global spectrum adjustment and feedback control.


Part B — Conceptual Application

B1. The formant frequencies of the human vocal tract determine vowel identity. Using the analogy of acoustic filters, explain why different vowels require different vocal tract shapes.

Answer: The vocal tract acts as an acoustic resonator — a filter — whose resonant frequencies (formants) depend on its length, shape, and boundary conditions. Changing the shape of the vocal tract (raising or lowering the tongue, rounding or spreading the lips, raising or lowering the larynx) changes the resonant frequencies of the filter, much as changing the dimensions of an acoustic cavity changes its resonant modes. The glottal source produces a harmonic-rich spectrum at the vocal fundamental frequency; the vocal tract filter amplifies harmonics near formant frequencies and attenuates others. Because different vowels have distinct formant patterns (particularly F1 and F2), producing different vowels requires precisely controlling the vocal tract shape to tune the filter to the appropriate formant configuration.


Part C — Worked Example

C1. A guitar body has a main air resonance (Helmholtz mode) at 105 Hz and a top-plate resonance at 195 Hz, each with a Q of approximately 20. Sketch the qualitative frequency response these two resonances contribute, and calculate the −3 dB bandwidth of each resonance.

Answer: Bandwidth of Helmholtz resonance: Δf₁ = f₀/Q = 105/20 = 5.25 Hz. Bandwidth of top-plate resonance: Δf₂ = 195/20 = 9.75 Hz.

The Helmholtz resonance produces a narrow peak centered at 105 Hz with a 5.25 Hz −3 dB bandwidth, strongly amplifying the fundamental of pitches in the bass register near G♯2–A2. The top-plate resonance produces a broader peak at 195 Hz with a 9.75 Hz bandwidth, amplifying pitches near G3 (196 Hz). Between these resonances, there is typically a relative dip where the guitar's output is weaker. Between about 200 Hz and 2000 Hz, the radiation efficiency rises more broadly. The combination of these resonances creates the characteristic tonal "hump" of the acoustic guitar around 100–250 Hz and the "wolf note" sensitivity near 195 Hz, where certain notes may sound louder and less controlled than their neighbors.
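The bandwidth relation Δf = f₀/Q used here is one line of code; a quick check:

```python
def bandwidth_hz(f0, q):
    """-3 dB bandwidth of a resonance with center f0 and quality factor Q."""
    return f0 / q

print(bandwidth_hz(105.0, 20.0))  # 5.25 Hz (Helmholtz air mode)
print(bandwidth_hz(195.0, 20.0))  # 9.75 Hz (top-plate mode)
```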


Chapter 6: Scales, Temperament, and Tuning

Part A — Factual Recall

A1. What is the frequency ratio of a just perfect fifth, and why does building twelve consecutive just fifths not return to the original pitch?

Answer: A just perfect fifth has a frequency ratio of 3:2. Starting from C and ascending twelve consecutive just fifths (C-G-D-A-E-B-F♯-C♯-G♯-D♯-A♯-E♯-B♯), one arrives at a pitch (B♯) that should be the same as the original C seven octaves higher. Seven octaves = (2)⁷ = 128. Twelve just fifths = (3/2)¹² = 531,441/4,096 ≈ 129.746. The ratio between twelve just fifths and seven octaves is 129.746/128 ≈ 1.01364, approximately 23.46 cents (the Pythagorean comma). Because the circle of just fifths never closes perfectly, it is impossible to have a keyboard instrument with both a complete circle of fifths and pure fifths throughout.

A2. Define equal temperament and state the frequency ratio between adjacent semitones.

Answer: Equal temperament is the tuning system in which the octave is divided into twelve equal semitones, each a frequency ratio of 2^(1/12) ≈ 1.05946. Because all semitones are identical, all keys have exactly the same interval qualities, enabling free transposition and modulation without retuning. The equal-tempered perfect fifth is 2^(7/12) ≈ 1.49831, which is 1.955 cents narrower than the just fifth (3/2 = 1.5) — a nearly imperceptible difference melodically, but audible as slow beating when a tempered fifth is sustained.

A3. What is the syntonic comma, and where does it arise?

Answer: The syntonic comma is the interval of approximately 21.51 cents (frequency ratio 81/80) representing the discrepancy between a Pythagorean major third (four just fifths reduced by two octaves: (3/2)⁴/2² = 81/64 ≈ 408 cents) and a just major third (5:4 ratio ≈ 386 cents). The syntonic comma arises because the Pythagorean third, derived from cycles of pure fifths, is noticeably wider and more dissonant than the just third derived from the 5th harmonic. Meantone temperament was developed specifically to eliminate the syntonic comma by narrowing the fifth by one-quarter comma, producing pure major thirds.
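Both commas, and the tempering of the equal-tempered fifth quoted in the preceding answers, reduce to cent calculations; a sketch (the cents helper is ours):

```python
import math

def cents(ratio):
    """Size of a frequency ratio in cents (1200 cents per octave)."""
    return 1200.0 * math.log2(ratio)

pythagorean_comma = cents((3 / 2) ** 12 / 2 ** 7)  # twelve fifths vs seven octaves
syntonic_comma = cents(81 / 80)                    # Pythagorean vs just major third
et_fifth_tempering = cents(3 / 2) - 700.0          # just fifth minus ET fifth

print(f"{pythagorean_comma:.2f} {syntonic_comma:.2f} {et_fifth_tempering:.3f}")
# 23.46 21.51 1.955
```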

A4. Name three historical temperament systems other than equal temperament and briefly describe their distinguishing feature.

Answer: (1) Pythagorean tuning: eleven of the twelve fifths are pure (3:2), producing pure fifths and fourths but very wide ("Pythagorean") major thirds, and leaving one wolf fifth that is narrow by a Pythagorean comma. Used in medieval and early Renaissance music. (2) Meantone temperament (quarter-comma meantone): the fifth is narrowed by one-quarter of the syntonic comma to produce pure major thirds (5:4). Excellent for music in a few central keys; wolf fifth and wolf thirds become very harsh in remote keys. (3) Well temperament (e.g., Werckmeister III, Kirnberger III): all keys are usable but have slightly different interval qualities; central keys (C, G, D, F) are close to meantone, while remote keys (F♯, D♭) have slightly wider intervals, giving each key a distinct "color" or character.

A5. What is just intonation, and what is its main practical disadvantage for keyboard instruments?

Answer: Just intonation is a tuning system in which all intervals are tuned to pure, beatless frequency ratios derived from small integers: octave 2:1, fifth 3:2, fourth 4:3, major third 5:4, minor third 6:5. Just intonation produces the most acoustically pure consonances, free of the beating that arises from tempered tuning. Its main practical disadvantage for keyboard instruments is that the ratios required for "pure" intervals shift depending on the harmonic context: in the key of C, the D that forms a pure fifth above G (ratio 9/8) differs by a syntonic comma from the D that forms a pure fifth below A in the supertonic chord (ratio 10/9). A keyboard with fixed pitch cannot accommodate these shifts, resulting in unacceptably harsh intervals in some harmonically remote passages. Variable-pitch instruments (voice, trombone, fretless strings) naturally adjust toward just intonation in ensemble contexts.


Part B — Conceptual Application

B1. A choral group performs a cappella, without instrumental accompaniment. Research suggests that choral singers drift toward just intonation. Why might this happen, and what would be the consequence if the choir returned to the opening key after a modulation?

Answer: A cappella singers are free to adjust intonation in real time without the constraint of a fixed-pitch instrument. When tuning pure thirds and fifths (minimizing beats in the resonant acoustic of a church or concert hall), singers naturally gravitate toward just ratios, which maximally reinforce overtones and produce a "locked in" choral resonance. However, as the harmonic progression moves through modulations, if singers always tune pure intervals relative to the most recent chord, they may accumulate comma-sized errors. Moving through a progression that returns to the tonic via a different harmonic path (e.g., through the subdominant side) can result in a final chord that is a syntonic comma (21.5 cents) lower than the opening pitch — the choir has drifted flat. This is well-documented in experimental music psychology and presents a real challenge in long unaccompanied repertoire.
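The comma drift can be made concrete with a standard "comma pump". The I-IV-ii-V-I progression below is an illustrative choice (the exercise does not specify one); each step tunes the new chord purely against a tone carried over from the previous chord:

```python
import math

def cents(ratio):
    return 1200 * math.log2(ratio)

# Comma pump: C major -> F major -> D minor -> G major -> C major,
# tuning each link with a pure fourth, fifth, or major third.
c = 1.0
f = c * 4 / 3        # F: pure fourth above the opening C
a = f * 5 / 4        # A: pure major third above F
d = a * 2 / 3        # D: pure fifth below the held A
g = d * 2 / 3        # G: pure fifth below the held D
c_new = g * 4 / 3    # returning C: pure fourth above G

print(round(c_new, 6))         # 0.987654, i.e. 80/81 of the opening pitch
print(round(cents(c_new), 2))  # -21.51 cents: one syntonic comma flat
```

Every individual interval in the chain is pure, yet the returning tonic lands a full syntonic comma flat, which is exactly the drift described above.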


Part C — Worked Example

C1. Calculate the frequencies of all notes in a C major scale (C4 through C5) in (a) just intonation and (b) equal temperament. Use A4 = 440 Hz. Identify the note with the largest deviation between the two systems.

Answer:

A4 = 440 Hz. In equal temperament, C4 = A4 / 2^(9/12) = 440 / 1.6818 ≈ 261.63 Hz.

Just intonation ratios from C: C=1, D=9/8, E=5/4, F=4/3, G=3/2, A=5/3, B=15/8, C=2.

(a) Just intonation: C4 = 261.63 Hz; D4 = 261.63 × 9/8 = 294.33 Hz; E4 = 261.63 × 5/4 = 327.04 Hz; F4 = 261.63 × 4/3 = 348.83 Hz; G4 = 261.63 × 3/2 = 392.44 Hz; A4 = 261.63 × 5/3 = 436.04 Hz; B4 = 261.63 × 15/8 = 490.55 Hz; C5 = 523.25 Hz.

(b) Equal temperament: C4=261.63, D4=293.66, E4=329.63, F4=349.23, G4=392.00, A4=440.00, B4=493.88, C5=523.25.

Note that just A4 = 436.04 Hz but equal-tempered A4 = 440.00 Hz — a difference of 3.96 Hz, or about 15.6 cents (the just major sixth is flat relative to equal temperament). This is in fact the largest deviation in the scale. The next largest is at E4: just E4 = 327.04 Hz vs. ET E4 = 329.63 Hz, a difference of 2.59 Hz or about 13.7 cents (the just third is flatter). For the major third, just intonation is 13.7 cents flatter than equal temperament — a difference that is clearly audible as a reduction in beating and is the central motivation for just tuning in vocal and string ensemble music.
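A short Python sketch reproduces the table and the cent deviations (ratios and semitone counts exactly as given above):

```python
import math

A4_ET = 440.0
C4 = A4_ET / 2 ** (9 / 12)   # about 261.63 Hz, shared by both tunings here

just_ratios = {'C4': 1, 'D4': 9/8, 'E4': 5/4, 'F4': 4/3,
               'G4': 3/2, 'A4': 5/3, 'B4': 15/8, 'C5': 2}
et_semitones = {'C4': 0, 'D4': 2, 'E4': 4, 'F4': 5,
                'G4': 7, 'A4': 9, 'B4': 11, 'C5': 12}

for note, ratio in just_ratios.items():
    just_f = C4 * ratio
    et_f = C4 * 2 ** (et_semitones[note] / 12)
    dev = 1200 * math.log2(just_f / et_f)   # just minus ET, in cents
    print(f"{note}: just {just_f:7.2f} Hz, ET {et_f:7.2f} Hz, {dev:+6.1f} cents")
```

Running this confirms that A4 shows the largest deviation (about -15.6 cents), followed by E4 (-13.7) and B4 (-11.7).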


Chapters 7–10: Summary Answers

The following chapters continue the pattern above. For brevity, key answers are condensed.

Chapter 7: Harmony, Tonality, and Functional Theory — Part A Answers

A1: A triad consists of three pitch classes — root, third, and fifth. A major triad has a major third (4 semitones) plus a minor third (3 semitones) above the root; a minor triad has a minor third plus a major third. A diminished triad has two stacked minor thirds; an augmented triad has two stacked major thirds.

A2: The dominant seventh chord (V7) creates tension through the tritone interval between its third and seventh (in C major: the B and F of G7). The tritone is the most dissonant interval in tonal music; it resolves by contrary motion — B moves up a half step to C, F moves down a half step to E — giving the smooth voice-leading resolution to the tonic triad.

A3: A chord progression's "function" refers to its role in the tonal hierarchy: tonic (I) represents stability and rest; dominant (V) creates maximum tension and implies return to the tonic; subdominant (IV) has a "departing" quality. The I-IV-V-I progression demonstrates all three functions and is the harmonic backbone of an enormous range of Western music.

A4: Modulation is the process of establishing a new tonal center. Pivot chord modulation uses a chord common to both the old and new key as a "hinge." Direct modulation is an abrupt key change without preparation. Chromatic modulation uses chords altered by a half step to create smooth voice-leading into the new key.

A5: Voice leading refers to the smooth motion of individual voices (parts) in a harmonic progression. Good voice leading minimizes leap size (preferring stepwise motion), avoids parallel fifths and octaves (which collapse the independence of voices), and resolves dissonant intervals by step to consonant ones.

Chapter 8: Room Acoustics and Architectural Sound — Part A Answers

A1: RT60 (reverberation time) is the time for sound level in a room to decay by 60 dB after the source stops. Sabine's formula: RT60 = 0.161·V/A, where V is room volume (m³) and A is total absorption (m² sabins). For a room of 500 m³ with 50 m² sabins: RT60 = 0.161 × 500/50 = 1.61 seconds — appropriate for chamber music.
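The worked number can be checked directly; a minimal sketch of Sabine's formula in Python:

```python
def rt60_sabine(volume_m3, absorption_sabins):
    """Sabine reverberation time: RT60 = 0.161 * V / A (V in m^3, A in m^2 sabins)."""
    return 0.161 * volume_m3 / absorption_sabins

print(round(rt60_sabine(500, 50), 2))   # 1.61 seconds
```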

A2: Specular reflection follows the law of reflection (angle of incidence = angle of reflection). Diffuse reflection (diffusion) scatters sound in many directions from irregular surfaces. Diffusion is desirable in concert halls because it distributes energy uniformly and reduces coloration from discrete echoes, creating a sense of acoustic envelopment.

A3: Flutter echo is a rapid sequence of discrete reflections between two parallel, reflective surfaces. It is perceived as a metallic "ping" or "flutter" and severely degrades speech intelligibility and music quality. Remedies include splaying one or both walls slightly (breaking the parallel geometry), adding absorption to one surface, or adding diffusion.

A4: The direct-to-reverberant ratio compares the level of the direct sound (first arrival) to the total energy of all reflections. A high ratio (more direct sound) favors clarity (C80 and speech intelligibility); a low ratio (more reverberant sound) favors spaciousness and musical blending. Intimate venues and close microphone placement increase the direct-to-reverberant ratio.

A5: Early reflections (arriving within 20–80 ms of the direct sound) are perceptually integrated with the direct sound due to the Haas effect (precedence effect) and contribute to the sense of room intimacy and envelopment without degrading intelligibility. Late reflections (arriving after 80 ms) are heard as distinct echoes if sufficiently loud.

Chapter 9: Strings, Membranes, and Vibrating Bodies — Part A Answers

A1: The fundamental frequency of a stretched string: f₁ = (1/2L)√(T/μ), where L=length, T=tension, μ=linear mass density. Frequency increases with increasing tension, decreasing length, and decreasing mass density — exactly the adjustments made by tuning a stringed instrument (turning a peg to change tension), fretting (shortening the vibrating length), and choosing string gauge.
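A quick numerical illustration of the formula. The values below are assumed for illustration only (the tension is chosen so the result lands near 440 Hz; they are not measured data):

```python
import math

def string_f1(length_m, tension_n, mu_kg_per_m):
    """Fundamental of an ideal stretched string: f1 = (1/2L) * sqrt(T/mu)."""
    return math.sqrt(tension_n / mu_kg_per_m) / (2 * length_m)

# Assumed, violin-A-like numbers: L = 0.33 m, T = 50.6 N, mu = 0.6 g/m.
f = string_f1(0.33, 50.6, 0.0006)
print(round(f))   # 440 Hz with these assumed values
```

Doubling the tension raises the pitch by √2 (about 600 cents), while halving the length raises it by a full octave, as the formula predicts.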

A2: The normal modes of a rectangular membrane (drumhead) clamped at its boundary have frequencies f_{mn} = (c/2)√[(m/Lx)² + (n/Ly)²] where m and n are positive integers. Unlike strings, the modes of a membrane are not harmonically related (their ratios are irrational), which is why drums do not produce a definite pitch in the same way that strings do.
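The mode ratios can be tabulated directly from the formula; the square membrane (Lx = Ly = 1, c = 1) below is an arbitrary illustrative choice:

```python
import math

def membrane_mode(m, n, Lx=1.0, Ly=1.0, c=1.0):
    """Rectangular-membrane mode: f_mn = (c/2) * sqrt((m/Lx)**2 + (n/Ly)**2)."""
    return (c / 2) * math.sqrt((m / Lx) ** 2 + (n / Ly) ** 2)

f11 = membrane_mode(1, 1)
for m, n in [(1, 1), (2, 1), (2, 2), (3, 1)]:
    print(f"mode ({m},{n}): {membrane_mode(m, n) / f11:.4f} x f11")
```

Ratios such as (2,1)/(1,1) = √(5/2) ≈ 1.5811 are irrational, unlike the integer ratios of an ideal string. (The (2,2) mode is exactly twice the (1,1) mode, since scaling both indices scales the frequency, but most mode pairs are inharmonic.)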

A3: The Chladni figures of a vibrating plate are patterns formed by sand or particles accumulating at the nodal lines (where displacement is zero) when the plate is driven at one of its resonant frequencies. They visualize the two-dimensional mode shapes of the plate and are used by instrument makers to assess plate symmetry and mode frequency placement.

A4: Inharmonicity in a stiff string arises because the bending stiffness of the string adds a restoring force proportional to the fourth derivative of displacement (in addition to the tension-derived restoring force). This makes the equation of motion a fourth-order rather than second-order equation, with solutions giving overtone frequencies f_n = nf₁√(1 + Bn²) where B is the stiffness parameter. Piano strings have significant inharmonicity, particularly in the bass register.
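The stretched partials can be tabulated from the formula. B = 0.0004 below is an assumed, roughly piano-like order of magnitude, not a measured value:

```python
import math

def stiff_string_partial(n, f1, B):
    """Partial n of a stiff string: f_n = n * f1 * sqrt(1 + B * n**2)."""
    return n * f1 * math.sqrt(1 + B * n ** 2)

f1, B = 110.0, 0.0004   # assumed illustrative values
for n in (1, 2, 4, 8):
    harmonic = n * f1
    actual = stiff_string_partial(n, f1, B)
    sharp = 1200 * math.log2(actual / harmonic)
    print(f"partial {n}: {actual:7.2f} Hz, +{sharp:.1f} cents sharp")
# The sharpening grows roughly as n**2, so high partials are audibly stretched.
```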

A5: A bar (rod) clamped at one end (cantilever) has resonant frequencies in the ratios 1 : 6.27 : 17.55 : ... — highly inharmonic. This explains the metallic, bell-like quality of struck uniform bars and their lack of a clear musical pitch; it is also why marimba and xylophone bars are arched (undercut) on their undersides, pulling the lower partials toward harmonic ratios so that the bars yield a definite pitch.

Chapter 10: Timbre and the Fourier Decomposition of Sound — Part A Answers

A1: Timbre is the multidimensional perceptual quality that distinguishes sounds of the same pitch and loudness. Its principal physical correlates include: spectral envelope (distribution of energy across harmonics), temporal envelope (ADSR shape), inharmonicity of partials, noise content, and how these features change over time (temporal spectral evolution).

A2: The Fourier series states that any periodic function can be represented as a sum of sinusoidal functions at harmonically related frequencies: f(t) = a₀/2 + Σ[aₙcos(2πnf₀t) + bₙsin(2πnf₀t)]. The coefficients aₙ and bₙ are the Fourier coefficients, determining the amplitude and phase of each harmonic component. The analysis is unique: every periodic waveform has a unique harmonic spectrum.

A3: The spectral centroid is the amplitude-weighted average frequency of a spectrum: SC = Σ(f·A(f)) / ΣA(f). It correlates with perceived "brightness": high spectral centroid = bright, forward-sounding timbre (violin bowing near the bridge); low spectral centroid = dark, warm timbre (bowing near the fingerboard, or a cello vs. violin comparison).
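A minimal sketch of the centroid computation, using two hypothetical 8-harmonic spectra on a 220 Hz fundamental (the amplitude laws are illustrative assumptions):

```python
def spectral_centroid(freqs, amps):
    """Amplitude-weighted mean frequency: SC = sum(f*A) / sum(A)."""
    return sum(f * a for f, a in zip(freqs, amps)) / sum(amps)

freqs = [220 * n for n in range(1, 9)]
dark = [1 / n ** 2 for n in range(1, 9)]    # energy falls off quickly
bright = [1 / n for n in range(1, 9)]       # more high-harmonic energy

print(round(spectral_centroid(freqs, dark)))    # lower centroid: darker timbre
print(round(spectral_centroid(freqs, bright)))  # higher centroid: brighter timbre
```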

A4: The short-time Fourier transform (STFT) analyzes a signal's time-varying spectrum by applying the Fourier transform to successive short windows of the signal. The resulting spectrogram shows frequency content as a function of time. Time-frequency resolution is constrained by the Gabor uncertainty principle: short windows give good time resolution but poor frequency resolution; long windows give good frequency resolution but poor time resolution.

A5: A square wave of fundamental frequency f₀ contains only odd harmonics (f₀, 3f₀, 5f₀, ...) with amplitudes proportional to 1/n. A sawtooth wave contains all harmonics (odd and even) with amplitudes proportional to 1/n. These properties follow directly from Fourier series analysis and explain the timbral difference between clarinet-like tones (odd harmonics, hollow, nasal) and string-like tones (all harmonics, bright, full).
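These amplitude laws can be verified numerically by projecting one period of an ideal square wave onto each harmonic (a direct DFT, standard library only):

```python
import math

N = 1024
square = [1.0 if i < N // 2 else -1.0 for i in range(N)]  # one period

def harmonic_amp(signal, n):
    """Magnitude of the n-th harmonic via a direct DFT projection."""
    N = len(signal)
    re = sum(s * math.cos(2 * math.pi * n * i / N) for i, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * n * i / N) for i, s in enumerate(signal))
    return 2 * math.sqrt(re * re + im * im) / N

for n in range(1, 8):
    print(n, round(harmonic_amp(square, n), 3))
# Odd harmonics come out near 4/(pi*n): 1.273, 0.424, 0.255, ...;
# even harmonics come out essentially zero, as the Fourier series predicts.
```

Replacing the square wave with a linear ramp (sawtooth) and repeating the projection shows all harmonics present with 1/n amplitudes.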


PART TWO: CHAPTERS 11–20 (Part A Only)

Chapter 11: The Physics of Keyboard Instruments

A1: Piano strings are struck by a felt hammer (impulsive excitation), not plucked or bowed. The hammer contact duration (~1–4 ms) determines which harmonics are suppressed: harmonics with nodes at the strike point (typically 1/7 to 1/8 of string length) are not excited, contributing to the piano's characteristic tone.

A2: Piano inharmonicity increases with string stiffness (thicker, shorter strings). Bass strings are wound with copper to increase mass without excessive stiffness; treble strings are thinner and stiffer. "Stretch tuning" compensates for inharmonicity by tuning octaves slightly wide so the 2nd harmonic of the lower note better matches the fundamental of the upper note.

A3: The una corda (soft pedal) on a grand piano shifts the action so the hammer strikes two (or one in older designs) strings instead of three, changing both the timbre (fewer strings excited) and the contact characteristics.

A4: The sostenuto pedal holds up the dampers of only those notes depressed at the moment of pedaling, leaving all other dampers functional.

A5: Organ pipes are either flue pipes (sound produced by a jet of air directed across a mouth opening, exciting resonances of an air column) or reed pipes (sound produced by a vibrating brass tongue). Flue pipes are analogous to edge-tone whistles; reed pipes are analogous to clarinet reeds.

Chapter 12: The Physics of Wind Instruments

A1: Bore shape determines the harmonic series available: cylindrical bores (clarinets, flutes) behave as closed-end cylinders (clarinet) or open-end cylinders (flute), while conical bores (saxophones, oboes) support the complete harmonic series including even harmonics.

A2: Overblowing produces notes an octave higher (flute) or a 12th higher (clarinet) by increasing air pressure to excite the second (or third for clarinet) resonant mode.

A3: Tone holes in wind instruments terminate the effective vibrating air column length; opening holes shortens the effective column length, raising the pitch.

A4: The bell of a brass instrument provides impedance matching, enabling efficient radiation of high frequencies, and its flare affects which harmonics are reflected back to establish standing waves in the bore.

A5: Mutes for brass instruments work by altering the impedance at the bell, changing which frequencies are reflected and transmitted — fundamentally an acoustic filter applied to the resonator.

Chapter 13: Synthesis and the Electronic Voice

A1: Additive synthesis constructs sounds by summing sinusoidal partials; subtractive synthesis starts with a harmonically rich waveform and applies filters to remove partials.

A2: FM synthesis (Chowning, 1973) uses frequency modulation of a carrier oscillator by a modulator oscillator; the ratio of their frequencies and the modulation index determine the resulting harmonic spectrum — small changes in these parameters produce dramatic timbral shifts.

A3: The ADSR envelope controls the amplitude over time: attack (ms to s), decay (fall to sustain level), sustain (held amplitude), release (fall to zero).

A4: Wavetable synthesis reads stored waveform tables cyclically and morphs between them to produce evolving timbres.

A5: Granular synthesis decomposes sound into very short segments ("grains," typically 1–100 ms) that are scattered, transposed, and overlapped — enabling extreme time-stretching, pitch-shifting, and timbral transformation.

Chapter 14: Digital Audio and the Sampling Revolution

A1: The Nyquist-Shannon theorem states that a bandlimited signal can be perfectly reconstructed from samples taken at a rate at least twice the maximum signal frequency.

A2: Quantization error is the difference between the true analog sample value and the nearest representable digital level; it appears as low-level noise. With dithering (adding small pseudorandom noise before quantization), quantization error is randomized and becomes benign low-level broadband noise rather than audible harmonic distortion.

A3: CD audio uses 44,100 Hz sampling rate and 16-bit depth (96 dB dynamic range).

A4: Delta-sigma (ΔΣ) modulation uses oversampling (sampling at many times the Nyquist rate) and noise shaping (pushing quantization noise above the audible band) to achieve high resolution with fewer bits per sample.

A5: Jitter is timing irregularity in the sampling clock; it modulates the signal with phase noise, producing sidebands around the audio signal that degrade dynamic range and stereo imaging.
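The CD figures quoted above follow from two one-line calculations; a minimal sketch:

```python
import math

def dynamic_range_db(bits):
    """Theoretical quantization dynamic range: 20*log10(2**bits), about 6.02 dB/bit."""
    return 20 * math.log10(2 ** bits)

print(round(dynamic_range_db(16), 1))   # 96.3 dB for 16-bit CD audio

# Nyquist: a 44.1 kHz sampling rate captures content up to half that rate.
print(44_100 / 2)   # 22050.0 Hz
```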

Chapter 15: Perceptual Coding and the Psychoacoustics of MP3

A1: Simultaneous masking occurs when a loud sound masks a quieter sound at a nearby frequency; the masked sound's threshold is elevated by the masker.

A2: Forward masking raises thresholds for up to ~200 ms after a masking sound; backward masking is weaker and shorter (~10–20 ms).

A3: The MDCT (Modified Discrete Cosine Transform) is a lapped, critically sampled transform that avoids blocking artifacts; it is the core frequency analysis step in MP3 and AAC coding.

A4: A 128 kbps MP3 file is about 11 times smaller than the uncompressed PCM equivalent, achieved by discarding psychoacoustically inaudible components using masking models.

A5: Bit allocation in MP3 assigns more bits to spectral regions with high signal energy or low masking threshold, and fewer bits (or none) to strongly masked regions, optimizing perceived quality at a given bit rate.

Chapter 16: Time-Frequency Uncertainty and the Gabor Limit

A1: The Gabor limit: Δt·Δf ≥ 1/(4π) ≈ 0.08 (in natural units), or equivalently Δt·Δf ≥ 1 for many practical definitions. This means perfect joint time-frequency localization is impossible.

A2: The Gaussian window achieves the Gabor limit with equality — it provides the optimal time-frequency concentration for any analysis window.

A3: The wavelet transform uses dilated and translated versions of a mother wavelet, automatically providing better time resolution at high frequencies and better frequency resolution at low frequencies — well matched to the logarithmic frequency axis of music.

A4: A tone with a frequency uncertainty of ±5 Hz must last at least approximately 1/(2 × 5 Hz) ≈ 100 ms to be spectrally distinct from tones 5 Hz away.

A5: The STFT with a short (2 ms) window gives ~0.5 ms time resolution but only ~500 Hz frequency resolution; a long (100 ms) window gives ~10 Hz frequency resolution but only ~50 ms time resolution.
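The STFT numbers quoted above follow from the rough rule that frequency resolution is about the reciprocal of the window length (Δf ≈ 1/T); a sketch:

```python
# Rough STFT resolution trade-off: frequency resolution ~ 1 / window length.
for window_s in (0.002, 0.025, 0.100):
    delta_f = 1.0 / window_s   # approximate bin spacing / mainlobe width
    print(f"{window_s * 1000:5.0f} ms window -> ~{delta_f:6.0f} Hz resolution")
```

A 2 ms window gives ~500 Hz resolution and a 100 ms window ~10 Hz, matching the figures cited for the STFT.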

Chapter 17: Quantum Acoustics

A1: The energy levels of a quantum harmonic oscillator are En = ℏω(n + 1/2), for n = 0, 1, 2, ... — equally spaced, with the lowest energy ℏω/2 (zero-point energy). The acoustic analog: the normal modes of a vibrating string or resonator, also equally spaced in frequency for the ideal case.

A2: Phonons are the quantized units of vibrational energy in a crystal lattice — the acoustic analog of photons for light. At temperatures above absolute zero, lattice vibrations (phonons) contribute to the specific heat of solids and limit their electrical conductivity.

A3: Superposition in quantum mechanics allows a system to be simultaneously in multiple energy eigenstates; measurement collapses it to one eigenstate. The acoustic analog is the superposition of normal modes in a vibrating system.

A4: Decoherence destroys quantum superposition by entangling the system with environmental degrees of freedom; it explains why quantum superpositions are not observed in macroscopic systems at room temperature.

A5: The WKB (Wentzel-Kramers-Brillouin) approximation and transfer matrix methods apply to both quantum mechanical potential wells and acoustic waveguides with varying cross-section, enabling calculation of resonances and transmission coefficients.

Chapter 18: The Physics of Orchestral Instruments

A1: The violin family radiates sound most efficiently near the main air resonance (Helmholtz mode, ~270 Hz for violin) and the main wood resonance (~490 Hz), creating characteristic peaks in the radiated spectrum.

A2: The wolf note in cellos occurs when a bowed frequency coincides with a strong body resonance, causing the string-body coupled system to oscillate chaotically between two closely spaced modes rather than steadily.

A3: Percussion instruments with definite pitch (xylophone, marimba, timpani) are tuned by removing material to adjust partial frequencies toward harmonic ratios; indefinite-pitch instruments (cymbals, snare drum) have inharmonic partials distributed across a wide frequency range.

A4: The acoustics of the concert grand piano's soundboard are described by a two-dimensional wave equation with strong orthotropy (different stiffnesses parallel and perpendicular to the grain), producing complex, asymmetric radiation patterns.

A5: Orchestral string sections gain collective tonal richness (chorus effect) from the slight intonation differences among players, which create amplitude modulation and timbral beating analogous to the ensemble effect of multiple slightly detuned synthesizer oscillators.

Chapter 19: The Physics of the Singing Voice

A1: The glottis (opening between the vocal folds) opens and closes quasi-periodically during phonation; the Bernoulli effect partially drives the closing motion. Each opening produces a puff of air that travels up the vocal tract.

A2: The singer's formant is a cluster of resonances (around 2,500–3,200 Hz) that trained classical singers enhance by lowering the larynx and widening the pharynx, enabling them to be heard over an orchestra without amplification.

A3: Chest voice, middle voice, and head voice (falsetto) are produced by different vibratory modes of the vocal folds — full (thick edge) vibration, mixed, and thin-edge (flute-like) vibration respectively.

A4: Vibrato in singing is a near-sinusoidal variation in fundamental frequency at ~5–7 Hz; in Western classical tradition, vibrato depth of ±50–100 cents is considered expressive; excessive depth or irregularity is considered a defect.

A5: Throat singing (Khöömei) exploits strong resonance peaks in the vocal tract to amplify individual harmonics of the vocal fundamental, making a single harmonic audible as a distinct melody above the drone fundamental.

Chapter 20: Voice, Speech, and the Source-Filter Model

A1: The source-filter model of speech production: the glottal source generates a spectrally rich pulse train; the vocal tract filter shapes the spectrum by amplifying frequencies near formant resonances; the radiated sound at the lips is the product of source and filter spectra.

A2: The first formant (F1) correlates primarily with jaw opening (vowel height: open vowels have high F1); the second formant (F2) correlates with tongue front-back position (front vowels have high F2).

A3: Formant synthesis uses time-varying digital resonators (filter banks) to synthesize speech by directly specifying and controlling formant frequencies; it was the basis of the DECtalk and MITalk speech synthesizers.

A4: The LPC (Linear Predictive Coding) analysis method models the vocal tract as an all-pole filter whose coefficients are estimated from the autocorrelation of the speech waveform; the residual (prediction error) approximates the glottal source signal.

A5: Consonants are primarily characterized by transitions in formant frequency (especially F2 and F3) into and out of the consonant closure or constriction, rather than by static spectral features.


PART THREE: CHAPTERS 21–40 (Key Concepts)

Chapter 21: Introduction to Psychoacoustics — Psychoacoustics employs absolute threshold measurement (the softest detectable sound), just noticeable differences (JNDs), and magnitude estimation to characterize auditory sensitivity. Key concepts: equal-loudness contours (Fletcher-Munson curves), phon scale, sone scale. The ear is most sensitive at 1–4 kHz, corresponding to the resonance frequency of the outer ear canal.

Chapter 22: The Inner Ear as a Fourier Analyzer — The basilar membrane performs a continuous mechanical spectral analysis: high frequencies near the base, low frequencies near the apex. Inner hair cells transduce displacement to electrical signals; outer hair cells amplify weak inputs through electromotility (active cochlear mechanics), accounting for the ear's remarkable sensitivity and sharp tuning.

Chapter 23: Spatial Hearing and Sound Localization — Horizontal localization: ITD (below ~1500 Hz) and ILD (above ~1500 Hz) together span the full frequency range. Elevation localization: spectral notches in the HRTF introduced by the pinna shape. Distance: ratio of direct to reverberant level, high-frequency air absorption, and source familiarity.

Chapter 24: Pitch Perception: Theories and Mechanisms — Two main theories: place theory (pitch determined by which basilar membrane location is maximally excited, i.e., spectral peak) and temporal (periodicity) theory (pitch determined by the timing of neural firing patterns). The missing fundamental demonstrates that pitch is a perceptual construct not simply determined by the spectral peak. Duplex theory: place dominates at high frequencies; temporal coding important at low frequencies.

Chapter 25: Consonance, Dissonance, and Roughness — Roughness arises when two partials fall within the same critical band and interact to produce amplitude fluctuations at 20–300 Hz, which the ear perceives as "harsh." Helmholtz: consonance from coincident overtones. Plomp & Levelt: consonance curve peaks at octave, fifth, fourth; valley at about 1/4 critical bandwidth apart. Cultural conditioning also contributes significantly to consonance-dissonance perception.

Chapter 26: The Geometry of Tonal Space — Tymoczko (2006): voice-leading between chords can be modeled as paths in a geometric "orbifold" (quotient space of pitch space by symmetry groups). Chords cluster in geometric regions; efficient voice leading corresponds to short paths. The Tonnetz (Riemann, rediscovered by neo-Riemannian theorists) maps major and minor triads on a toroidal lattice connected by the three common-tone preserving transformations P, L, R.

Chapter 27: Pitch Perception and the Brain — Absolute pitch engages left posterior auditory cortex; pitch-class and chroma processing found in secondary auditory cortex. Training studies show plasticity in pitch processing areas. fMRI studies identify distinct activation patterns for melodic versus harmonic processing.

Chapter 28: Auditory Scene Analysis and Stream Segregation — Bregman's principles of auditory stream segregation: frequency proximity (nearby frequencies group), common fate (simultaneous amplitude/frequency changes group), harmonicity (common fundamental groups), continuity (gradual change preserves stream continuity). Cocktail party effect: selective attention to one voice in a noisy environment.

Chapter 29: Music and Emotion: Theories and Evidence — Valence (positive/negative) and arousal (high/low energy) are the two primary dimensions of musical affect. Major mode, fast tempo, high loudness, and upward pitch contour increase valence; minor mode, slow tempo, and downward contour decrease it. BRECVEMA framework (Juslin): eight distinct mechanisms by which music triggers emotion.

Chapter 30: Expectation, Tension, and Musical Emotion — Huron's ITPRA theory: Imagination → Tension → Prediction → Reaction → Appraisal. Musical expectation is built from statistical learning of musical regularities; violations of expectation (particularly resolution after delay) generate the strongest emotional responses. Frisson most commonly associated with unexpected harmonic events, dynamic changes, and timbral surprises.

Chapter 31: Musical Memory and Learning — Melody recognition involves encoding of contour, interval size and direction, and rhythmic pattern. Earworms (involuntary musical imagery) engage auditory cortex without external stimulus. Practice effects: neural efficiency increases, motor-auditory coupling strengthens; highly practiced musical sequences become "chunked" into single motor programs.

Chapter 32: Rhythm, Meter, and Neural Entrainment — Meter is a hierarchical grouping of beats into bars; the "tactus" is the beat level most natural for tapping. Neural oscillations (beta, 13–30 Hz; gamma, 30–80 Hz; delta, 1–4 Hz) entrain to musical rhythmic structure. Groove involves specific micro-timing patterns; late bass drum hits in funk music increase groove ratings.

Chapter 33: World Musical Systems and Universal Structures — Cross-cultural universals in music include: discrete pitches organized into scales, melodic contour, rhythmic patterns, group performance. Culture-specific features: scale intervals, tonal hierarchy, modal frameworks, rhythmic cycle complexity. Ragas, maqamat, and diatonic modes provide different frameworks for melodic organization but all involve hierarchical pitch sets.

Chapter 34: Mixing, Mastering, and the Signal Chain — Signal chain: source → preamp → EQ → dynamics processing (compression/limiting) → reverb → mix bus → mastering chain → delivery. LUFS metering for loudness normalization. Dynamic range compression is the dominant driver of the "loudness war" — the practice of maximizing perceived loudness at the expense of dynamic range.

Chapter 35: Spatial Audio and Immersive Environments — Stereo uses two channels and ILD (panning) to create a phantom image between speakers. 5.1 surround adds center and surround channels. Ambisonics encodes the full three-dimensional sound field using spherical harmonics and decodes for any speaker layout. Binaural audio uses HRTFs to render 3D audio over headphones.

Chapter 36: Machine Listening and Music Information Retrieval — MIR tasks: beat tracking (onset detection + periodicity analysis), key finding (pitch class profile matching), chord recognition (chroma feature analysis), genre classification (MFCCs + machine learning classifiers). Deep learning (CNNs, RNNs, Transformers) now dominates MIR performance on most benchmark tasks.

Chapter 37: Chaos, Complexity, and Musical Structure — 1/f noise in music: note duration distributions, melodic interval distributions, and dynamic fluctuations in many musical traditions follow 1/f power spectra — an intermediate between white noise and fully predictable structure. Phase transitions in musical complexity: sudden shifts in textural density, dynamics, or harmonic rhythm correspond to bifurcations in dynamical models of musical structure.

Chapter 38: Symmetry Groups and Musical Structure — The twelve pitch classes under transposition form Z₁₂ (cyclic group of order 12). Tone rows in serial music form equivalence classes under the dihedral group of the row (48 classical row forms: 12 transpositions × {original, inversion, retrograde, retrograde-inversion}). The Messiaen modes of limited transposition have non-trivial isotropy subgroups under the cyclic group Z₁₂.

Chapter 39: Information Theory and Musical Complexity — Shannon entropy H of a melody: H = −Σpᵢlog₂(pᵢ) bits per note, where pᵢ is the probability of each pitch transition. Redundancy = 1 − H/H_max measures predictability. Optimal musical interest: intermediate entropy ("not too predictable, not too random"). Kolmogorov complexity provides an alternative measure of algorithmic compressibility.
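The entropy formula is straightforward to compute directly; the two toy pitch sequences below are hypothetical illustrations, not examples from the chapter:

```python
import math
from collections import Counter

def entropy_bits(symbols):
    """Shannon entropy H = -sum(p * log2(p)), in bits per symbol."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

monotone = ['C'] * 16                # fully predictable: zero entropy
varied = ['C', 'D', 'E', 'G'] * 4    # four equiprobable pitches

print(entropy_bits(varied))          # 2.0 bits per note
print(entropy_bits(monotone) == 0)   # True
```

Real melodies fall between these extremes, and per the chapter's summary, listeners tend to prefer this intermediate range.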

Chapter 40: The Future of Music and Physics — Emerging areas: quantum-inspired audio algorithms, neural audio synthesis (neural vocoder, music language models), physics-informed neural networks for instrument modeling, acoustical metamaterials enabling subwavelength sound control. The interface between AI-generated music and human musicality raises fundamental questions about creativity, intention, and cultural meaning that will define the field's future.


For complete solutions to all exercises, including Part B and Part C questions for Chapters 11–40, see the companion solutions manual available through your instructor.