Chapter 10 Exercises: Electronic Sound & Synthesis
Part A: Oscillators, Waveforms, and Fourier Basics
Exercise A1 — Waveform Harmonic Content (a) List the first five harmonics (frequencies and their relative amplitudes) of a sawtooth wave at 110 Hz. (b) Do the same for a square wave at 110 Hz. (c) Which waveform would sound "brighter," and why? Use the concept of spectral energy distribution to explain. (d) Both the sawtooth and square wave include a component at 330 Hz. In the sawtooth, this component has amplitude 1/3. In the square wave, it also has amplitude 1/3 (relative to fundamental). Compare the amplitudes of the 4th harmonic in each waveform. What does this tell you about the importance of even harmonics to timbre? (e) A sine wave and a triangle wave are both "gentle" sounding. Using their harmonic content, explain why the triangle wave sounds slightly fuller or more complex than a pure sine wave, despite both being perceived as relatively simple.
Exercise A2 — The Fourier Reconstruction Using the formula for additive synthesis:
x(t) = Σ aₙ · sin(2πnf₀t) for n = 1, 2, 3, ...
(a) Write out the first four terms of the Fourier series for a square wave with f₀ = 100 Hz. (b) If you include only the first harmonic, what percentage of the square wave's total energy does this represent? (Hint: total energy ∝ Σ 1/n² for odd n; compute the sum for n = 1, 3, 5, ... 99) (c) Describe qualitatively what the sound would be like if you added harmonics one by one from lowest to highest. At what point would it start to sound "square-wave-like"? (d) The Gibbs phenomenon causes an overshoot at discontinuities in Fourier reconstructions. What does this mean in audio terms — what do you hear when a digital square wave has this artifact?
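For readers who want to check their reasoning in parts (c) and (d) numerically, the partial sums can be evaluated directly. The following is a minimal Python sketch, assuming the standard square-wave coefficients aₙ = 4/(πn) for odd n and 0 for even n; the function names square_partial_sum and gibbs_peak are illustrative and do not come from the chapter.

```python
import math

def square_partial_sum(t, f0, n_terms):
    """Sum of the first n_terms odd harmonics of a +/-1 square wave at time t (s)."""
    total = 0.0
    for k in range(n_terms):
        n = 2 * k + 1  # odd harmonic numbers 1, 3, 5, ...
        total += math.sin(2 * math.pi * n * f0 * t) / n
    return 4 / math.pi * total  # 4/pi scales the limiting waveform to +/-1

def gibbs_peak(f0, n_terms, samples=4000):
    """Largest value of the partial sum over the first quarter period."""
    quarter = 1 / (4 * f0)
    return max(
        square_partial_sum(i * quarter / samples, f0, n_terms)
        for i in range(samples + 1)
    )
```

Calling gibbs_peak(100.0, n) for increasing n shows the overshoot squeezing closer to the edge while its height stays at roughly 9% of the jump, which is the behavior part (d) asks you to interpret.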
Exercise A3 — Nyquist and Digital Audio (a) The standard CD sample rate is 44,100 samples per second. What is the maximum frequency that can be represented without aliasing (the Nyquist frequency)? (b) If a synthesizer generates a sawtooth wave at 10,000 Hz, what is the frequency of the 5th harmonic? The 6th? Explain what aliasing would do to these harmonics at the CD sample rate. (c) A recording is made at a sample rate of 22,050 Hz. Which frequencies are lost compared to a 44,100 Hz recording? Would a 60-year-old listener hear the difference? (Hint: consider age-related hearing loss.) (d) "High-resolution" audio (96 kHz or 192 kHz sample rates) is marketed as superior. Using the Nyquist theorem, evaluate this claim. Under what circumstances, if any, might a higher sample rate provide a perceptible benefit?
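The folding behavior asked about in part (b) can be sketched in a few lines of Python. This assumes an ideal sampler with no anti-aliasing filter, which is the worst case for a naively generated digital sawtooth; the function name alias_frequency is illustrative.

```python
def alias_frequency(f, sample_rate):
    """Frequency (Hz) at which a sinusoid of frequency f appears after sampling."""
    nyquist = sample_rate / 2
    f_mod = f % sample_rate  # sampling cannot distinguish f from f + k * sample_rate
    # Anything in the upper half of the band folds back around the Nyquist frequency.
    return f_mod if f_mod <= nyquist else sample_rate - f_mod
```

Feeding each harmonic of the 10,000 Hz sawtooth through alias_frequency with sample_rate = 44100 shows where the out-of-band partials land. Note that the folded frequencies are generally not harmonically related to the fundamental.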
Exercise A4 — ADSR Envelope Design You want to synthesize the following instrument characters using ADSR envelopes. For each, specify appropriate Attack, Decay, Sustain level, and Release time, and explain the physical acoustic model: (a) Grand piano (note struck and held for 3 seconds) (b) Pizzicato (plucked) cello (c) Flute playing a sustained note (d) Snare drum hit (e) Sustained string pad in electronic music (the characteristic "wash" of synthesizer strings)
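To try out candidate envelope settings, a piecewise-linear ADSR can be sketched as follows. This is only one possible model: the linear segments are an assumption (many hardware and software envelopes use exponential segments), and the names held_level and adsr are illustrative.

```python
def held_level(t, attack, decay, sustain):
    """Envelope level while the key is held: attack ramp, decay ramp, then sustain."""
    if t < attack:
        return t / attack
    if t < attack + decay:
        return 1.0 - (1.0 - sustain) * (t - attack) / decay
    return sustain

def adsr(t, attack, decay, sustain, release, note_off):
    """Linear ADSR value (0..1) at time t seconds; the key is released at note_off."""
    if t < 0:
        return 0.0
    if t <= note_off:
        return held_level(t, attack, decay, sustain)
    elapsed = t - note_off
    if elapsed >= release:
        return 0.0
    # Release ramps down from whatever level had been reached at note_off.
    start = held_level(note_off, attack, decay, sustain)
    return start * (1.0 - elapsed / release)
```

For example, adsr(t, 0.1, 0.2, 0.5, 0.3, 1.0) describes a note held for one second. A percussive character such as (b) or (d) would typically use a sustain level of 0, so that the decay segment does all the work.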
Exercise A5 — Source-Filter in Electronics vs. Voice Chapter 9 described the voice as a source-filter system. Chapter 10 describes the synthesizer as oscillator + filter + amplifier. (a) Create a mapping table between the voice's source-filter system and the synthesizer's VCO-VCF-VCA system. Include at least 5 specific correspondences. (b) A synthesizer LFO (low-frequency oscillator) modulates the VCF cutoff at 6 Hz. What vocal production process does this electronically mimic? (c) A synthesizer's VCF is set with high resonance (Q = 8) and cutoff at 800 Hz. What vocal or acoustic configuration does this resemble? (Hint: think about Chapter 9's discussion of formants and overtone singing.) (d) How would you use a VCO + VCF to produce something acoustically similar to the singer's formant described in Chapter 9?
Part B: Synthesis Paradigms — Subtractive, Additive, FM
Exercise B1 — Subtractive Synthesis Design You want to synthesize a bowed cello using subtractive synthesis. (a) What waveform would you choose as your source, and why? Consider the acoustic physics of how a bow excites a string. (b) The cello body has resonances (formants) at approximately 220 Hz, 600 Hz, and 1000 Hz. Design a multi-filter configuration that would produce these resonances. Specify filter type, cutoff, and Q for each. (c) The bow is pressed on the string and the sound builds over about 0.3 seconds. Design an ADSR envelope appropriate for this. (d) Cello vibrato is typically ±25 cents at 5.5 Hz. Describe the synthesizer routing (which LFO to which destination) that would achieve this. (e) What aspects of cello acoustics cannot be captured by this simple subtractive model? (Consider: string inharmonicity, bow-string friction physics, body radiation patterns.)
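The vibrato figures in part (d) are given in cents, so it can help to see the conversion to frequency spelled out. The Python sketch below assumes a sinusoidal LFO applied to oscillator pitch; the function name vibrato_frequency and its default arguments are illustrative.

```python
import math

def vibrato_frequency(t, f0, depth_cents=25.0, rate_hz=5.5):
    """Instantaneous pitch (Hz) of a note at f0 with sinusoidal vibrato."""
    # LFO output in cents, swinging between -depth_cents and +depth_cents.
    cents = depth_cents * math.sin(2 * math.pi * rate_hz * t)
    # 1200 cents make one octave, i.e. a factor of 2 in frequency.
    return f0 * 2 ** (cents / 1200)
```

Because the conversion is exponential, the upward and downward swings are equal in cents but slightly unequal in hertz.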
Exercise B2 — FM Synthesis Mathematics A DX7-style FM operator pair (one carrier, one modulator) has carrier frequency fc = 261 Hz and modulator frequency fm = 261 Hz (C:M = 1:1), with modulation index I = 2.5. (a) List the first five sideband frequencies above and below the carrier, fc ± n·fm for n = 1 to 5. (A component that falls at a negative frequency reflects back to the corresponding positive frequency with inverted phase.) (b) The amplitude of the nth sideband is proportional to |Jₙ(I)|, where Jₙ is the Bessel function of the first kind of order n. Using a Bessel function table or calculator, find the amplitudes of the carrier component (J₀(2.5)) and the first three sidebands (J₁(2.5), J₂(2.5), J₃(2.5)). (c) Convert these amplitudes to dB relative to the carrier. Which sidebands are significant (say, within 20 dB of the carrier)? (d) How would the spectrum change if the modulation index increased from 2.5 to 5.0? (Describe qualitatively and use Bessel function properties to support your answer.) (e) Change the C:M ratio to 1:1.414 (approximately 1:√2, so the modulator is a tritone above the carrier). Are the sidebands harmonically related to the carrier? What timbre type does this produce?
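If no Bessel table is at hand, the values can be computed from the power series for Jₙ. The Python sketch below uses only the standard library; the names bessel_j and fm_sidebands are illustrative, and the 30-term truncation is an assumption that is adequate for the small indices used in this exercise.

```python
import math

def bessel_j(n, x, terms=30):
    """Bessel function of the first kind J_n(x), integer n >= 0, via its power series."""
    return sum(
        (-1) ** k / (math.factorial(k) * math.factorial(n + k)) * (x / 2) ** (2 * k + n)
        for k in range(terms)
    )

def fm_sidebands(fc, fm, index, n_max):
    """Return (order, lower_hz, upper_hz, |J_n(index)|) for orders 0..n_max."""
    rows = []
    for n in range(n_max + 1):
        # Lower sidebands may be zero or negative; a negative frequency
        # reflects to its absolute value with inverted phase.
        rows.append((n, fc - n * fm, fc + n * fm, abs(bessel_j(n, index))))
    return rows
```

A useful sanity check on any set of values is the identity J₀² + 2(J₁² + J₂² + ...) = 1, which expresses the fact that frequency modulation redistributes energy among components without changing the total.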
Exercise B3 — Additive Synthesis: Organ and Beyond (a) A Hammond organ drawbar registration of [8, 8, 8, 4, 2, 2, 2, 0, 0] (drawbars in order from 16' to 1') mixes up to nine partials at the specified levels. Note that the 16' and 5⅓' drawbars sound below and between the harmonics of the 8' pitch, so not every drawbar is a harmonic of the played note. List the relative amplitudes and frequencies of the resulting partials when the 8' pitch is 130 Hz (approximately C3). (b) Design an additive synthesis preset for a "French Horn" timbre. Using published data or acoustic intuition, specify which harmonics (1st through 12th) are present and their relative amplitudes. Compare your design to published spectral analyses of French horn recordings. (c) A xylophone has inharmonic partial frequencies. If the fundamental is at 523 Hz (C5), and the partials appear at ratios of 1.00, 2.76, 5.40, and 8.93 relative to the fundamental, how would you implement this in additive synthesis? Why is a standard harmonic series inappropriate here? (d) What is the primary practical limitation of additive synthesis for real-time performance, and how have modern synthesizers addressed this?
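Part (a) amounts to a table lookup followed by a level conversion, which can be sketched in Python as below. Two things here are assumptions rather than statements from the chapter: the frequency ratios follow the standard Hammond drawbar footages (16', 5 1/3', 8', 4', 2 2/3', 2', 1 3/5', 1 1/3', 1'), and each drawbar step is taken to be about 3 dB, exposed as the parameter db_per_step so that a different taper can be substituted. The names DRAWBAR_RATIOS and drawbar_partials are illustrative.

```python
# Frequency of each drawbar relative to the 8' pitch, in drawbar order:
# 16', 5 1/3', 8', 4', 2 2/3', 2', 1 3/5', 1 1/3', 1'
DRAWBAR_RATIOS = [0.5, 1.5, 1, 2, 3, 4, 5, 6, 8]

def drawbar_partials(settings, f_8ft, db_per_step=3.0):
    """Return (frequency_hz, relative_amplitude) for each non-zero drawbar.

    Amplitudes are relative to a drawbar pulled out to 8.
    db_per_step is an assumed level change per drawbar step.
    """
    partials = []
    for setting, ratio in zip(settings, DRAWBAR_RATIOS):
        if setting == 0:
            continue  # a drawbar at 0 contributes nothing
        amplitude = 10 ** (-(8 - setting) * db_per_step / 20)
        partials.append((ratio * f_8ft, amplitude))
    return partials
```

Calling drawbar_partials([8, 8, 8, 4, 2, 2, 2, 0, 0], 130.0) returns seven partials, since the last two drawbars are pushed in.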
Exercise B4 — FM Synthesis: Instrument Design Using FM synthesis (one carrier, one modulator), design patches for the following timbres. For each, specify fc, fm, I, and explain why your choices produce the target timbre: (a) Electric bass guitar (fundamental at 41 Hz = E1) (b) Marimba note (inharmonic percussion, warm, wood-like) (c) Brass instrument approximation (trumpet-like) (d) Pad / sustained atmosphere (smooth, evolving texture for background music)
Exercise B5 — Comparing Synthesis Paradigms For each of the following acoustic phenomena, evaluate which synthesis paradigm (subtractive, additive, FM, physical modeling) would most naturally and accurately reproduce it. Justify each choice with a physical or mathematical argument: (a) The resonance of a struck metal plate (b) The vowel formants of a sung "ahh" (c) The slow evolution of a bowed string over several bow strokes (d) A rapidly changing, complex electronic texture with no acoustic referent (e) The precise harmonic series of a stopped organ pipe
Part C: Physical Modeling, Karplus-Strong, and Waveguide Synthesis
Exercise C1 — Karplus-Strong Analysis The Karplus-Strong algorithm:
1. Fill a delay line of length P samples with random noise.
2. On each step: output[i] = delay_line[0]; new = decay × 0.5 × (delay_line[0] + delay_line[1]); shift the delay line and feed new back in at the end.
(a) If the sample rate is 44,100 Hz and you want to synthesize the note A4 (440 Hz), how many samples long should the delay line be? (b) The averaging step (0.5 × (x[0] + x[1])) acts as a two-sample averaging filter. What is the frequency response of this filter? (Hint: it's a simple low-pass filter. Calculate its −3 dB cutoff frequency.) (c) This filter causes high-frequency harmonics to decay faster than low-frequency ones — exactly like a real string. Explain why this is physically correct. (d) If you change the averaging to a weighted average (e.g., 0.6 × x[0] + 0.4 × x[1]), how does this change the "brightness" of the resulting tone? Explain in terms of filter frequency response. (e) The decay factor (0.996 in the example code) controls how quickly the note fades. Translate this into an approximate decay time constant in seconds. What physical property of a real string does this parameter model?
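The two-step algorithm described in this exercise can be sketched in Python as follows. This is a minimal illustration, not necessarily the chapter's own example code: the function names, the fixed random seed, and the rounding of the delay length to a whole number of samples are all assumptions made here.

```python
import random
from collections import deque

def delay_length(frequency, sample_rate=44100):
    """Whole-sample delay-line length P for the requested pitch (at least 2)."""
    return max(2, round(sample_rate / frequency))

def karplus_strong(frequency, duration, sample_rate=44100, decay=0.996, seed=0):
    """Return a list of samples for a plucked-string tone."""
    period = delay_length(frequency, sample_rate)
    rng = random.Random(seed)
    # Step 1: fill the delay line with random noise.
    line = deque(rng.uniform(-1.0, 1.0) for _ in range(period))
    output = []
    for _ in range(int(duration * sample_rate)):
        first = line.popleft()   # delay_line[0] is output and shifted out
        output.append(first)
        second = line[0]         # what was delay_line[1] before the shift
        # Step 2: damped two-point average is fed back in at the end.
        line.append(decay * 0.5 * (first + second))
    return output
```

Because P must be a whole number of samples, the sounding pitch is quantized; in addition, the two-point average itself contributes about half a sample of delay, so the pitch is closer to sample_rate / (P + 0.5) than to sample_rate / P. Changing the 0.5 and 0.5 weights, as in part (d), changes both the brightness and this effective delay.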
Exercise C2 — Waveguide Synthesis Concepts Digital waveguide synthesis uses bidirectional delay lines to model wave propagation in acoustic instruments. (a) For a cylindrical tube (like a clarinet bore) of length L = 0.5 m, the round-trip time of a wave is 2L/v where v = 343 m/s. What is this round-trip time in milliseconds? What delay line length (in samples at 44,100 Hz) would model this tube? (b) The clarinet is closed at the reed end (one end) and open at the bell (other end). What boundary conditions apply at each end, and how would you implement these at the ends of the delay lines? (c) The reed of a clarinet is a nonlinear element — it can be in one of three states: open (linear flow), partially open (nonlinear transition), or closed (no flow). Why is this nonlinearity important for the clarinet's sound, and how would you implement it in a waveguide model? (d) Physical modeling synthesis is computationally expensive. Research question: What processing power was required to run a real-time physical model of a piano in the 1990s vs. today? What does this tell you about the relationship between physics, mathematics, and computing capability?
Exercise C3 — The Universal Oscillator Equation Section 10.8 showed that the resonant filter in a synthesizer is governed by the same differential equation as the quantum harmonic oscillator. This equation is:
m·ẍ + b·ẋ + k·x = F(t)
where ẍ = d²x/dt², ẋ = dx/dt.
(a) For a mass-spring system, identify what m, b, k, and F(t) physically represent. (b) For an RLC electrical circuit, identify the electrical equivalents of m, b, k, and x (charge or current). (c) For a vocal tract formant, identify the acoustic equivalents. (d) For a quantum harmonic oscillator (in the classical limit), identify the quantum equivalents. (e) The natural resonant frequency of this system is ω₀ = √(k/m). The Q factor is Q = √(km)/b. If you double the "mass" (inductance in the circuit, or mass of a mechanical oscillator), how do ω₀ and Q change? What does this mean for the synthesizer filter: if you conceptually "double the inductance" of the filter, how does the sound change?
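The formulas in part (e) are easy to explore numerically. The Python sketch below evaluates ω₀ = √(k/m) and Q = √(km)/b, and also returns the amplitude decay time τ = 2m/b of the unforced system (for the underdamped case), which is related to the other two by Q = ω₀τ/2. The function name resonator and the choice to return τ are illustrative additions.

```python
import math

def resonator(m, b, k):
    """Return (omega0, Q, tau) for the oscillator m*x'' + b*x' + k*x = F(t).

    omega0: natural angular frequency in rad/s
    Q:      quality factor, sqrt(k*m)/b
    tau:    time for the free-oscillation amplitude to fall by a factor of e
    """
    omega0 = math.sqrt(k / m)
    q = math.sqrt(k * m) / b
    tau = 2 * m / b
    return omega0, q, tau
```

Comparing resonator(m, b, k) with resonator(2 * m, b, k) for any positive values gives a quick check on your answer to part (e).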
Exercise C4 — Aiko's Insight: Physics Across Scales Inspired by Aiko Tanaka's discovery in Section 10.8: (a) The quantum harmonic oscillator has discrete energy levels Eₙ = ℏω₀(n + ½). The harmonic series of a vibrating string has frequencies fₙ = nf₁. Are these structures the same? Explain the similarity and the key difference. (b) In a resonant filter, the "selectivity" (how narrowly it responds to its resonant frequency) is measured by Q. In quantum mechanics, the "sharpness" of an energy level is related to the lifetime of the state (via the energy-time uncertainty principle). Identify the relationship between Q factor and decay time in a classical resonator, then compare this to the quantum relationship between energy width and lifetime. (c) Aiko realizes that her sawtooth + resonant filter is mathematically equivalent to the quantum harmonic oscillator. But the QHO is described by the Schrödinger equation, which is a partial differential equation, while the filter is described by an ordinary differential equation. Identify the specific approximation that makes these mathematically equivalent (hint: consider single-mode vs. multi-mode analysis). (d) Write a 200-word reflection on what it means that a synthesizer filter and a quantum system are governed by "the same equation." Does this mean they are "the same thing"? What are the scope and limits of mathematical analogy in physics?
Exercise C5 — Building a Physical Model Design (in mathematical terms, not necessarily implemented code) a physical model of a handbell (the hand-held tuned bell used in handbell choirs): (a) A handbell is a metal bell — its vibration modes are inharmonic (their frequencies are not integer ratios of the fundamental). Look up or estimate the first four partial frequency ratios for a bell-like object. How would you represent these in a synthesis model? (b) The handbell is struck and its amplitude decays exponentially. Different modes decay at different rates (higher modes decay faster). How would you model this differential decay in a physical model synthesizer? (c) When the ringer "damps" the bell (pressing it against the body), the sound stops instantly. How is this behavior modeled in the synthesis? (d) Handbells are tuned to produce a clear fundamental even though their spectrum is inharmonic. What acoustic phenomenon allows a listener to hear a clear pitch despite inharmonic partials? (Hint: recall the concept of "virtual pitch" or "missing fundamental.")
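One common way to express parts (a) through (c) mathematically is modal synthesis: a sum of exponentially decaying sinusoids, one per vibration mode. The Python sketch below illustrates the idea under loudly stated assumptions. The frequency ratios are borrowed from the xylophone in Exercise B3(c) purely as placeholders (they are not handbell data), the amplitudes and decay times are hypothetical, and damping by the ringer is approximated as a much faster extra decay that begins at damp_time.

```python
import math

# (frequency ratio, amplitude, decay time in seconds) per mode.
# Ratios are placeholders taken from Exercise B3(c); amplitudes and
# decay times are hypothetical values chosen only for illustration.
PLACEHOLDER_MODES = [
    (1.00, 1.0, 2.0),
    (2.76, 0.6, 1.0),
    (5.40, 0.4, 0.5),
    (8.93, 0.2, 0.25),
]

def modal_bell(t, f1, modes, damp_time=None, damp_tau=0.005):
    """Sample value at time t (s) of a struck, optionally damped, modal model."""
    value = sum(
        amp * math.exp(-t / tau) * math.sin(2 * math.pi * ratio * f1 * t)
        for ratio, amp, tau in modes
    )
    if damp_time is not None and t > damp_time:
        # After damping, every mode shares one short extra decay.
        value *= math.exp(-(t - damp_time) / damp_tau)
    return value
```

Giving each mode its own decay time is what produces the differential decay described in part (b); replacing the placeholder ratios with the values you find for part (a) turns the sketch into a rough handbell model.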
Part D: History, Culture, and the Physics of Analog/Digital Divide
Exercise D1 — The Moog Revolution (a) The Moog transistor ladder filter has a slope of 24 dB/octave (four poles). Explain what this means: if the cutoff is at 1000 Hz, how much does the filter attenuate a signal at 2000 Hz? At 4000 Hz? (b) The Moog filter's characteristic "warmth" comes from slight harmonic distortion in the transistors (saturation). From a physics standpoint, what does "saturation" mean for a transistor, and how does it generate harmonic distortion? (c) The Minimoog (1970) was a non-modular, fixed-architecture synthesizer — the signal flow was hardwired. Compare the creative flexibility of a modular vs. fixed-architecture synthesizer. What does each approach optimize for, and what does each sacrifice? (d) Wendy Carlos's Switched-On Bach (1968) was the first album to demonstrate that electronic synthesis could produce music of artistic substance. The album used a Moog synthesizer and required enormous time and effort (each note was recorded separately). Using your knowledge of synthesis, explain why sequential recording (one note at a time) was necessary given the technology of 1968.
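For part (a), it is instructive to compare the nominal 24 dB/octave figure with the response of a simple four-pole model. The Python sketch below does both. The second function assumes four identical, non-interacting one-pole low-pass stages with no resonance feedback, which is a simplification of a real ladder filter; both function names are illustrative.

```python
import math

def asymptotic_attenuation_db(f, cutoff, slope_db_per_octave=24.0):
    """Attenuation (dB) from the straight-line slope above cutoff."""
    if f <= cutoff:
        return 0.0
    return slope_db_per_octave * math.log2(f / cutoff)

def four_pole_attenuation_db(f, cutoff):
    """Attenuation (dB) of four cascaded one-pole low-pass stages (assumed model)."""
    per_stage = 10 * math.log10(1 + (f / cutoff) ** 2)
    return 4 * per_stage
```

The two functions agree more closely the further f is above the cutoff. Near the cutoff they differ noticeably: in this simplified cascade the signal is already about 12 dB down at the cutoff frequency itself.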
Exercise D2 — The DX7 and FM Synthesis in Pop Culture The Yamaha DX7 (1983) used 6-operator FM synthesis and became one of the best-selling synthesizers in history. (a) The DX7 has 6 FM operators that can be configured in 32 different "algorithms" — different routings of carriers and modulators. Explain why more operators (6 vs. 2) allow a much wider range of timbres. (b) The DX7's "electric piano" preset (one of its most famous sounds) uses a specific C:M ratio and modulation index. Based on your knowledge of FM synthesis, what ratio and index would produce a piano-like sound? Explain your reasoning. (c) Numerous iconic 1980s pop songs used the DX7's presets heavily — "Take On Me," "Owner of a Lonely Heart," "Jump," and hundreds of others. What does this cultural saturation of a single instrument's physics tell you about the relationship between technology and musical aesthetics? (d) FM synthesis re-emerged in the 2010s as "retro" and "vintage" — the same sounds that seemed cutting-edge in 1983 were nostalgic by 2015. Propose a physics-based explanation for why FM sounds "date" in a way that acoustic instruments don't.
Exercise D3 — Analog vs. Digital: Physics of the Divide (a) An analog synthesizer operates on continuous voltages. A digital synthesizer operates on discrete samples. List three acoustic/physical advantages of analog synthesis and three of digital synthesis. (b) "Analog warmth" is a frequently cited reason why musicians prefer vintage analog synthesizers. Propose at least two specific physical mechanisms that might account for a perceptible difference between analog and digital synthesis of the same sound. (c) A 1973 Minimoog and a modern software emulation of the Minimoog both aim to produce "the same sound." From a physical standpoint, what is it possible to emulate perfectly, and what can only be approximated? Consider component variations, thermal noise, nonlinear saturation, and mechanical characteristics. (d) If you could design the "perfect" synthesizer that combined the best of analog and digital, what would its architecture look like? Describe at least 4 specific design choices and explain the physics behind each.
Exercise D4 — Neural Audio and the Limits of Physics (a) A neural network trained on violin recordings generates audio that is perceptually indistinguishable from a real violin. Does this system "understand" the physics of the violin? Defend your position using both physics and philosophy. (b) Compare neural audio synthesis to the Karplus-Strong algorithm as approaches to violin simulation. What physical knowledge does Karplus-Strong encode that a neural network doesn't? What acoustic quality does the neural network achieve that Karplus-Strong struggles with? (c) A fundamental limitation of neural synthesis: the system can only generate sounds within the distribution of its training data. If you wanted to synthesize a physically impossible instrument (a 3-meter guitar string, a clarinet with infinite bore length), could a neural network do it? Could FM synthesis? Could physical modeling? Explain. (d) Section 10.13 suggests that the future of synthesis involves hybrid physical/neural systems. Design the architecture of such a hybrid system for violin synthesis, specifying which aspects of the instrument you would model physically and which you would learn from data.
Exercise D5 — The Stradivarius Thought Experiment Return to the thought experiment of Section 10.14. (a) Define "acoustically identical" precisely. What measurements would you need to take to verify that a synthesized Stradivarius is acoustically identical to the real thing? (b) A concert violinist plays both the real Stradivarius and the perfect simulation in a double-blind study. If the violinist cannot tell the difference, what conclusion can you draw? If the violinist can tell the difference, what does this imply about what "acoustic identity" means? (c) The argument was made that even a perfect acoustic simulation is not a Stradivarius because it lacks the historical identity. But consider: the ship of Theseus paradox — if every plank of a ship is replaced over time, is it still the same ship? Apply this to the Stradivarius. If every molecule of the original violin were replaced over time (as they are, slowly, through the physics of wood aging), at what point would it cease to be "the Stradivarius"? (d) This thought experiment ultimately is about what music is. Is music a physical phenomenon (sound waves), a social/historical phenomenon (the instrument's identity and the tradition it carries), or a perceptual one (what you experience when you listen)? Defend one position using arguments from physics, acoustics, and the material from this chapter.
Part E: Integration — Synthesis, Physics, and Creativity
Exercise E1 — Designing a Complete Synthesis Patch Design a complete synthesizer patch (VCO → VCF → VCA, with LFO and envelope) for the following target sound: "A bowed instrument of impossible physical size — a string the length of a concert hall, bowed by a giant."
Specify: (a) The waveform and frequency of the main oscillator. Consider: what pitch would a 30-meter string produce? What harmonic content would it have? (b) The filter type, cutoff, and Q. Consider: what resonances would a wooden body the size of a building have? (c) The ADSR envelope (a 30-meter string takes a long time to "speak" when bowed). Give specific values in seconds. (d) LFO modulation for realistic "bowing" behavior. (e) Describe what the resulting sound would feel like to hear. What aspects of this are physically derived, and what are aesthetic choices?
Exercise E2 — Synthesis and the Source-Filter Model Aiko's insight (Section 10.8) showed that the resonant filter is the same as the quantum harmonic oscillator. Extend this comparison: (a) The source-filter model of the voice (Chapter 9) maps onto the synthesizer architecture. Now map the complete vocal chain (vocal folds → vocal tract formants → radiation at the lips) onto the synthesizer chain: VCO → VCF → ??? → speaker. (b) In the vocal source-filter model, the filter (vocal tract) can be changed in real time by the singer. In a synthesizer, which control changes the filter in real time? Now consider: what is the equivalent of "vowel identity" in a synthesizer patch? Is there a concept of a "synthesizer phoneme"? (c) Create a "vowel synthesizer" design specification: a subtractive synthesizer designed specifically to produce all five English vowels using a sawtooth source and multiple bandpass filters. Specify filter parameters for each vowel (use the formant frequency table from Chapter 9). (d) If you implemented this vowel synthesizer, would the resulting sounds be perceived as a human voice? What is missing: which physical properties of the voice does the synthesizer's source-filter chain fail to capture?
Exercise E3 — FM Synthesis and Emergence FM synthesis is an example of emergence: simple mathematical rules (frequency modulation of one sine wave by another) produce complex acoustic results (rich, multi-sideband spectra). (a) The modulation index I controls how many Bessel function sidebands are significant. As I increases, the number of significant sidebands grows approximately as I + 2 (a rule of thumb). Estimate the number of significant sidebands for I = 0, 1, 2, 4, 8, 16. What pattern do you notice? (b) This emergence (simple formula → complex spectrum) is analogous to other examples of emergence in this book. Compare FM synthesis emergence to: (i) Choral blend emerging from 60 individual voices; (ii) A complex vowel spectrum emerging from the source-filter interaction; (iii) A musical scale emerging from the harmonic series. (c) Is FM synthesis "more reductive" or "more emergent" than additive synthesis? Explain. What does this tell you about the relationship between the "level of description" of a synthesis algorithm and its perceptual output? (d) Could you use FM synthesis to produce a sound that contains exactly the harmonic series (the fundamental plus exact integer multiples, with 1/n amplitude falloff)? Describe what FM parameters would achieve this, and what this tells you about the relationship between FM and additive synthesis.
Exercise E4 — Physical Modeling and the Philosophy of Simulation (a) A perfect physical model of a Stradivarius computes the exact solution to the wave equations governing the violin's physical structure. A perfect recording of the Stradivarius captures the acoustic output of those same equations. In what sense is the model "more" than the recording, and in what sense is the recording "more" than the model? (b) Physical modeling synthesis of a piano requires simulating several physical subsystems: string vibration, hammer felt compression, soundboard resonance, air resonance of the body, string-soundboard coupling, and radiation. Which of these subsystems are governed by the universal oscillator equation (second-order linear ODE)? Which require more complex models? (c) The piano has 88 keys, each with 2–3 strings and complex nonlinear interactions (sympathetic resonance, duplex scaling). A complete physical model would require simulating all of these interactions simultaneously. Estimate the computational complexity of this model in terms of differential equations that must be solved per time step. What does this tell you about the practical limits of physical modeling synthesis? (d) Is there a meaningful difference between "simulating" a piano and "building" a piano? At what level of simulation completeness does the distinction become physically meaningless? At what level does it remain philosophically meaningful?
Exercise E5 — The Future: Synthesis Design Project Design a synthesis instrument that exploits the physics of sound in a way that no existing instrument (acoustic or electronic) currently does. Your design should: (a) Be based on a specific physical principle — identify the differential equation or wave physics at its core. (b) Produce sounds that are physically impossible for any existing acoustic instrument. (c) Offer at least three expressive parameters that correspond to meaningful physical properties. (d) Connect to at least one of the chapter's running themes: technology as mediator, reductionism vs. emergence, constraint and creativity. (e) Include a 300-word "design rationale" explaining the physics, the expressive possibilities, and why this instrument couldn't exist without electronic synthesis.