Appendix A: Mathematical Foundations (Intuitive)
This appendix collects the mathematical tools that appear throughout the textbook. The goal here is not to derive everything from first principles but to give you a sturdy, intuitive grasp of each idea so that when you encounter it in a chapter, it feels like meeting a familiar face rather than a stranger. If you are a physics student who has taken calculus, some of this will be review; if you are a music student encountering these ideas for the first time, work through the examples slowly and use the "Chapter Links" sidebars to see exactly how each concept shows up in the main text.
A.1 Waves and Sine Functions
What a Sine Wave Is
Imagine you are watching a point on the rim of a bicycle wheel as someone slowly rolls the wheel past you. The point starts at the rightmost position, rises as the wheel turns, reaches the top, descends, hits the bottom, and returns to where it started. If you plot the height of that point against time, you get a smooth, repeating S-curve. That curve is a sine wave.
More precisely, a sine wave is the projection of uniform circular motion onto a straight line. The circle is rotating at a constant rate; the sine wave is what you see when you collapse that rotation down to one dimension. This geometric origin is the reason sine waves are everywhere in physics: any system that "wants" to return to equilibrium under a restoring force proportional to displacement (a spring, a pendulum at small angles, a vibrating string) will produce exactly this kind of motion.
[Figure: a pure sine wave y(t) with amplitude 1, plotted against time from 0 to 3T/2. Vertical axis: y(t) from −1 to +1. Horizontal axis: time, with ticks at 0, T/2, T, and 3T/2.]
The plot above shows a pure sine wave. Notice that it is perfectly symmetrical, rising and falling in an identical curved pattern. The horizontal axis is time; the vertical axis is the value of the wave at that moment — which could be air pressure, string displacement, or electrical voltage depending on context.
The Three Numbers That Define a Sine Wave
Every pure sine wave is completely described by exactly three numbers:
Amplitude (A) is the peak value — how far the wave swings from zero. In sound, amplitude corresponds to loudness: a larger amplitude means more air pressure variation and therefore a louder sound. In the diagram above, the amplitude is 1. If we doubled A to 2, the wave would swing from +2 to −2 but otherwise look identical — same shape, same timing, just taller.
Frequency (f) is the number of complete cycles per second, measured in Hertz (Hz). If f = 440 Hz, the wave completes 440 full oscillations every second. That is the note A above middle C, the standard orchestral tuning pitch. Higher frequency means more cycles per second, which means higher pitch. Lower frequency means fewer cycles per second, which means lower pitch.
Phase (φ) is the starting angle of the wave, measured in radians. It answers the question: where in its cycle is the wave at time t = 0? A phase of 0 means the wave starts at zero and goes up immediately. A phase of π/2 (90 degrees) means the wave starts at its maximum. Phase matters enormously when two waves interact — two waves with the same frequency and amplitude but opposite phases (φ = π apart) will cancel each other out completely. This is the principle behind noise-canceling headphones.
The Equation
The master equation for a sine wave is:
y(t) = A × sin(2π × f × t + φ)
Let us read every symbol in plain English:
- y(t) is the value of the wave at time t. This might be air pressure at a microphone, displacement of a guitar string, or voltage from a speaker amplifier.
- A is the amplitude (peak value). Units depend on context: Pascals for pressure, meters for displacement.
- sin(...) is the sine function, which takes an angle and returns a number between −1 and +1. It is the mathematical engine that generates the smooth oscillation.
- 2π converts frequency from cycles per second into radians per second. One complete cycle = 2π radians. So 2πf gives the angular frequency ω (omega), in radians per second.
- f is frequency in Hz. For A4, f = 440.
- t is time in seconds.
- φ (phi) is the initial phase in radians.
The product 2π × f × t grows steadily as time increases, and passing this ever-growing angle into the sine function is what makes the wave repeat.
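The equation above can be evaluated directly. The following is a minimal sketch, not code from the main text; the function name `sine_wave` and its default values are illustrative assumptions. It samples one cycle of A4 at quarter-period steps.

```python
import math

def sine_wave(t, amplitude=1.0, frequency=440.0, phase=0.0):
    """Value of y(t) = A * sin(2*pi*f*t + phi) at time t (in seconds)."""
    return amplitude * math.sin(2 * math.pi * frequency * t + phase)

period = 1 / 440.0  # duration of one cycle of A4, in seconds

# Step through one cycle in quarter-period increments.
for k in range(5):
    t = k * period / 4
    print(f"t = {t * 1000:.3f} ms   y = {sine_wave(t):+.3f}")
```

The printed values run through approximately 0, +1, 0, −1, 0, which is one full cycle. Passing `phase=math.pi / 2` makes the wave start at its maximum instead of at zero.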
Chapter Links: The sine wave equation appears in Chapter 1 (Introduction to Sound as Waveform), Chapter 5 (Resonance and Standing Waves), and Chapter 12 (Fourier Analysis in Music).
Why Sine Waves Are "Pure"
Sinusoids of frequency f, with any amplitude and phase, are the only solutions to the simplest possible oscillation equation:
d²y/dt² = −(2πf)² × y
This equation says: the acceleration of y is proportional to y but in the opposite direction. A spring exerts this kind of force. A pendulum (for small angles) does too. A column of air in a tube does as well. Any system described by this equation will naturally produce sine waves and only sine waves. That is why physicists call the sine wave the "natural" or "pure" oscillation — it is not a combination of anything; it is the ground floor of oscillatory motion.
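One way to convince yourself that a sine wave satisfies this equation is to estimate its second derivative numerically and compare it with −(2πf)² × y. The sketch below does this at a single instant; the step size `h`, the sample time `t`, and the helper name `second_derivative` are arbitrary illustrative choices.

```python
import math

f = 440.0                  # frequency in Hz
omega = 2 * math.pi * f    # angular frequency in rad/s
h = 1e-6                   # small time step for the finite difference (assumed value)

def y(t):
    return math.sin(omega * t)

def second_derivative(func, t, step):
    """Central-difference estimate of the second derivative of func at t."""
    return (func(t + step) - 2 * func(t) + func(t - step)) / step ** 2

t = 0.0004                             # an arbitrary instant, in seconds
lhs = second_derivative(y, t, h)       # estimated acceleration of y
rhs = -(omega ** 2) * y(t)             # what the oscillation equation predicts

print(f"estimated d2y/dt2 = {lhs:.1f}")
print(f"-(2*pi*f)^2 * y   = {rhs:.1f}")
```

The two numbers agree to within the small error of the finite-difference estimate.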
Reference Table: Common Musical Frequencies
| Note | Frequency (Hz) | Period (ms) | Wavelength in Air (cm) |
|---|---|---|---|
| A0 (lowest piano) | 27.5 | 36.4 | 1,247 |
| C2 | 65.4 | 15.3 | 524 |
| A2 | 110 | 9.09 | 312 |
| Middle C (C4) | 261.6 | 3.82 | 131 |
| A4 (concert A) | 440 | 2.27 | 78.0 |
| C5 | 523.3 | 1.91 | 65.5 |
| A5 | 880 | 1.14 | 39.0 |
| C8 (highest piano) | 4,186 | 0.239 | 8.2 |
Wavelengths assume speed of sound ≈ 343 m/s in air at 20°C. Notice that bass notes have wavelengths measured in meters — comparable to room dimensions — which is why bass frequencies are strongly affected by room acoustics and why subwoofer placement matters.
A.2 Frequency, Period, and Wavelength
The Reciprocal Relationship: Period and Frequency
Period and frequency are two ways of describing the same thing from opposite viewpoints. The period T is the time it takes to complete one full cycle, measured in seconds. The frequency f is the number of cycles completed per second.
They are exact reciprocals:
T = 1/f and f = 1/T
If a wave completes 440 cycles per second (f = 440 Hz), each cycle takes 1/440 seconds ≈ 0.00227 seconds = 2.27 milliseconds. This is the period of concert A.
The intuition is simple: the faster the wave oscillates (higher f), the less time each oscillation takes (smaller T). You cannot have a high-frequency wave with a long period — these are mutually exclusive.
Worked Example: The lowest note on a standard piano is A0 at 27.5 Hz. What is its period?
T = 1/27.5 ≈ 0.0364 seconds = 36.4 milliseconds
That means each oscillation of the lowest piano note takes about 36 thousandths of a second — imperceptible as individual events, but repeated 27.5 times per second they create the sensation of a rumbling bass pitch.
Wavelength in a Medium
When a sound wave travels through air, it has a wavelength λ (lambda) — the physical distance from one pressure peak to the next. Wavelength, frequency, and the speed of sound c are related by:
λ = c / f
The speed of sound in air at room temperature (20°C) is approximately 343 m/s. This speed is fixed by the medium — it does not depend on frequency or amplitude. Therefore, high-frequency sounds have short wavelengths and low-frequency sounds have long wavelengths.
Worked Example: What is the wavelength of A4 (440 Hz)?
λ = 343 / 440 ≈ 0.780 m = 78 cm
What about a bass note at 80 Hz?
λ = 343 / 80 ≈ 4.3 m
That 4.3-meter wavelength is comparable to the dimensions of a typical room, which is why bass frequencies interact so strongly with room boundaries (standing waves, room modes).
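The worked examples in this section reduce to two one-line formulas. The following sketch reproduces them; the function names are illustrative, and the speed of sound is the 343 m/s figure assumed throughout this appendix.

```python
SPEED_OF_SOUND = 343.0  # m/s in air at about 20 degrees C (value used in this appendix)

def period_ms(frequency_hz):
    """Period T = 1/f, converted to milliseconds."""
    return 1000.0 / frequency_hz

def wavelength_m(frequency_hz, speed=SPEED_OF_SOUND):
    """Wavelength = c / f, in meters."""
    return speed / frequency_hz

for name, freq in [("A0", 27.5), ("bass note", 80.0), ("A4", 440.0)]:
    print(f"{name}: T = {period_ms(freq):.2f} ms, wavelength = {wavelength_m(freq):.2f} m")
```

Changing `speed` shows how the wavelength of the same pitch shifts in a different medium or at a different temperature, while the period stays fixed.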
Frequency Ranges Reference Table
| Frequency Range | Category | Musical/Physical Notes |
|---|---|---|
| 20 – 60 Hz | Sub-bass | Organ pedal, kick drum felt, room modes |
| 60 – 250 Hz | Bass | Bass guitar, cello low notes, male voice fundamental |
| 250 – 500 Hz | Low-midrange | Piano midrange, vocal warmth |
| 500 Hz – 2 kHz | Midrange | Core of most instruments, vocal intelligibility |
| 2 – 4 kHz | Upper-mid | Presence, consonant clarity, nasal tones |
| 4 – 8 kHz | Presence/high-mid | Sibilance, high guitar, tin whistle |
| 8 – 20 kHz | Air/treble | Cymbals, breath noise, recording "air" |
| > 20 kHz | Ultrasound | Dog whistles, sonar, medical imaging |
| 20 Hz – 20 kHz | Human hearing | Shrinks with age, especially at high end |
| 261.6 Hz | Middle C | Reference point for keyboard instruments |
The Octave: Why Doubling Makes "Same but Higher"
In virtually every musical culture on Earth, a pitch that is exactly twice the frequency of another is perceived as the "same note, just higher." An A at 440 Hz and an A at 880 Hz are both called "A." Why does doubling produce this perceptual equivalence?
The answer lies partly in the physics of harmonics. When a typical pitched instrument plays A at 440 Hz, it also produces overtones at 880, 1320, 1760 Hz and so on. The note at 880 Hz is already present as the second harmonic of the lower A. The two pitches share a dense harmonic overlap, which makes them sound related, almost the same.
Psychoacoustically, the brain's pitch-processing mechanism has a compressive, approximately logarithmic character. On a logarithmic pitch scale, each octave spans the same perceptual "distance" regardless of where in the frequency range you are. The interval from 110 to 220 Hz feels the same size as the interval from 440 to 880 Hz.
This logarithmic octave equivalence is why we can meaningfully say that "doubling frequency equals one octave," and it is the foundation of all the ratio-based music theory in Section A.3.
Chapter Links: Wavelength and room acoustics appear in Chapter 8 (Room Acoustics and Standing Waves). Octave equivalence and logarithmic pitch are explored in Chapter 3 (Pitch Perception and Psychoacoustics).
A.3 Ratios and Intervals
What a Ratio Means Musically
In music, an interval is the relationship between two pitches — and relationships are best expressed as ratios. If note A has frequency f₁ and note B has frequency f₂, the interval between them is characterized by the ratio f₂/f₁.
What matters for musical perception is not the absolute difference (f₂ − f₁) but the ratio. The interval from 220 Hz to 330 Hz (ratio 3:2) sounds identical to the interval from 440 Hz to 660 Hz (also ratio 3:2). Both are perfect fifths, even though the first pair differs by 110 Hz and the second pair differs by 220 Hz. Ratios, not differences, define musical intervals.
Integer Ratios and Consonance
The simplest integer ratios correspond to the intervals that most human cultures regard as the most consonant (harmonious, stable-sounding):
| Interval Name | Frequency Ratio | Example (from A4=440 Hz) |
|---|---|---|
| Unison | 1 : 1 | 440 Hz (same note) |
| Octave | 2 : 1 | 880 Hz |
| Perfect Fifth | 3 : 2 | 660 Hz |
| Perfect Fourth | 4 : 3 | 586.7 Hz |
| Major Third | 5 : 4 | 550 Hz |
| Minor Third | 6 : 5 | 528 Hz |
| Major Sixth | 5 : 3 | 733.3 Hz |
| Minor Seventh | 7 : 4 | 770 Hz |
| Minor Second | 16 : 15 | 469.3 Hz (very dissonant) |
The physical reason these simple ratios sound consonant involves the alignment of overtones. When you play a perfect fifth (3:2), the overtones of both notes land on many shared frequencies: the 3rd harmonic of the lower note equals the 2nd harmonic of the upper note. This spectral alignment reduces beating (interference) and produces a smooth, fused sound.
Complex ratios (like 16:15 for a minor second) produce many misaligned overtones that beat against each other rapidly, generating the roughness perceived as dissonance.
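The overtone-alignment argument can be illustrated by listing which harmonics two notes have in common. This is a deliberately simplified sketch: it counts only exact coincidences among the first 12 harmonics of each note and ignores near-misses, which are what actually produce beating. The function name and the cutoff of 12 harmonics are assumptions made for illustration.

```python
from fractions import Fraction

def shared_harmonics(ratio, count=12):
    """Harmonics common to two notes, in units of the lower note's frequency.

    The lower note has harmonics 1, 2, 3, ...; the upper note has
    harmonics ratio, 2*ratio, 3*ratio, ...
    """
    lower = {Fraction(n) for n in range(1, count + 1)}
    upper = {n * ratio for n in range(1, count + 1)}
    return sorted(lower & upper)

fifth = shared_harmonics(Fraction(3, 2))
minor_second = shared_harmonics(Fraction(16, 15))

print("perfect fifth shares:", [int(h) for h in fifth])
print("minor second shares:", [int(h) for h in minor_second])
```

Within this range the perfect fifth shares the lower note's 3rd, 6th, 9th, and 12th harmonics, while the minor second shares none.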
Cents: The Logarithmic Unit
Ratios are the "correct" way to think about intervals, but they become cumbersome for precise comparison. A musician needs to know whether a particular tuning is 3 cents sharp or 7 cents flat — that precision requires a finer unit than "roughly 3:2."
The cent is defined such that one octave equals exactly 1200 cents, and one semitone (equal temperament) equals exactly 100 cents. The cent is a logarithmic unit: each cent is a ratio of 2^(1/1200), a tiny frequency multiplier.
Why logarithmic? Because our perception of pitch intervals is logarithmic. The perceptual "distance" from 440 to 880 Hz equals the distance from 880 to 1760 Hz — both are one octave. On a linear frequency scale, these distances are 440 Hz and 880 Hz — very different. On a logarithmic scale, they are identical. Cents live on the logarithmic scale, so they match how we actually hear.
Converting a frequency ratio to cents:
cents = 1200 × log₂(f₂ / f₁)
The log₂ (logarithm base 2) asks: "to what power must I raise 2 to get this ratio?"
- log₂(2) = 1, so an octave (ratio 2:1) gives 1200 × 1 = 1200 cents. ✓
- log₂(3/2) ≈ 0.585, so a pure fifth gives 1200 × 0.585 ≈ 702 cents.
- log₂(4/3) ≈ 0.415, so a pure fourth gives 1200 × 0.415 ≈ 498 cents.
- log₂(5/4) ≈ 0.322, so a major third gives 1200 × 0.322 ≈ 386 cents.
Intuition for log₂: The logarithm base 2 counts octaves. log₂(4) = 2 means "4 is two octaves above 1." log₂(8) = 3. If the ratio is between 1 and 2, log₂ gives a number between 0 and 1 — meaning the interval is less than an octave.
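The conversion formula translates directly into code. The sketch below uses illustrative function names and also includes the inverse conversion, which follows from the definition of the cent as a ratio of 2^(1/1200).

```python
import math

def ratio_to_cents(ratio):
    """Size of a frequency ratio in cents: 1200 * log2(ratio)."""
    return 1200 * math.log2(ratio)

def cents_to_ratio(cents):
    """Inverse conversion: the frequency ratio corresponding to a size in cents."""
    return 2 ** (cents / 1200)

for name, ratio in [("octave", 2 / 1), ("fifth", 3 / 2), ("fourth", 4 / 3), ("major third", 5 / 4)]:
    print(f"{name}: {ratio_to_cents(ratio):.1f} cents")
```

Because only the ratio enters the formula, `ratio_to_cents(660 / 440)` and `ratio_to_cents(330 / 220)` give the same result, which is the numerical counterpart of the claim that ratios, not differences, define intervals.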
Equal Temperament Reference Table
Modern instruments use equal temperament: the octave is divided into 12 equal semitones. "Equal" here means equal in ratio, not equal in Hertz — each semitone is a ratio of 2^(1/12) ≈ 1.05946.
| Semitone | Note (from C) | ET Ratio | Cents | Just Ratio (approx.) | Deviation (ET − just) |
|---|---|---|---|---|---|
| 0 | C | 1.0000 | 0 | 1:1 | 0 |
| 1 | C#/Db | 1.0595 | 100 | 16:15 | −12 cents |
| 2 | D | 1.1225 | 200 | 9:8 | −4 cents |
| 3 | D#/Eb | 1.1892 | 300 | 6:5 | −16 cents |
| 4 | E | 1.2599 | 400 | 5:4 | +14 cents |
| 5 | F | 1.3348 | 500 | 4:3 | +2 cents |
| 6 | F#/Gb | 1.4142 | 600 | 45:32 | +10 cents |
| 7 | G | 1.4983 | 700 | 3:2 | −2 cents |
| 8 | G#/Ab | 1.5874 | 800 | 8:5 | −14 cents |
| 9 | A | 1.6818 | 900 | 5:3 | +16 cents |
| 10 | A#/Bb | 1.7818 | 1000 | 7:4 | +31 cents |
| 11 | B | 1.8877 | 1100 | 15:8 | +12 cents |
| 12 | C' | 2.0000 | 1200 | 2:1 | 0 |
The "Deviation" column shows how far equal temperament departs from pure integer ratios. The perfect fifth (G, 700 cents) is only 2 cents flat from pure — nearly imperceptible. The major third (E, 400 cents) is 14 cents sharp from pure — a musically significant difference that gives equal temperament its characteristic slightly brash sound compared to just intonation.
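The deviations discussed above can be recomputed from the definitions. The sketch below takes the deviation to be the equal-tempered size minus the just size, so a positive value means equal temperament is sharp. The function name and the selection of four intervals are illustrative choices.

```python
import math
from fractions import Fraction

# (name, semitones in equal temperament, just ratio) for a few rows of the table
INTERVALS = [
    ("minor third", 3, Fraction(6, 5)),
    ("major third", 4, Fraction(5, 4)),
    ("perfect fourth", 5, Fraction(4, 3)),
    ("perfect fifth", 7, Fraction(3, 2)),
]

def deviation_cents(semitones, just_ratio):
    """Equal-tempered size minus just size, in cents (positive means ET is sharp)."""
    et_cents = 100 * semitones
    just_cents = 1200 * math.log2(float(just_ratio))
    return et_cents - just_cents

for name, semitones, ratio in INTERVALS:
    print(f"{name}: {deviation_cents(semitones, ratio):+.1f} cents")
```

Rounded to whole cents, the fifth comes out about 2 cents flat and the major third about 14 cents sharp.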
Chapter Links: Ratios and consonance are explored deeply in Chapter 4 (Consonance, Dissonance, and Harmony). Temperament and tuning systems occupy Chapters 6 and 7.
A.4 Logarithms and Decibels
Why Logarithms Match Human Perception
The human ear is capable of detecting sounds across an enormous range of intensities, from the faintest audible sound to levels that cause physical pain. The ratio of intensities between these extremes is approximately 10^12 (one trillion). If sound levels were reported on a linear scale in watts per square meter, you would need to say "that's 0.000000000001 W/m² at the threshold of hearing and 1 W/m² at the threshold of pain", which is deeply inconvenient.
More importantly, our perception is not linear. In the 1830s, the physiologist Ernst Weber and later Gustav Fechner formalized the observation that equal ratios of stimulus correspond to equal steps in perception. If doubling the intensity produces one noticeable step in loudness, then you need to double again (quadruple the original) to get another step — not merely add another unit. This Weber-Fechner Law implies that loudness is approximately proportional to the logarithm of intensity.
The decibel (dB) scale was designed to match this logarithmic perception. It is defined relative to a reference level:
dB = 20 × log₁₀(A / A₀) [for amplitude quantities such as sound pressure; this form gives dB SPL]
dB = 10 × log₁₀(P / P₀) [for power quantities such as intensity]
Why 20 and 10? Because power goes as amplitude squared (P ∝ A²). If you double amplitude (ratio 2), you quadruple power (ratio 4). log₁₀(2) ≈ 0.301. Multiplying by 20 gives 6.02 dB for a doubling of amplitude. log₁₀(4) ≈ 0.602. Multiplying by 10 gives 6.02 dB for a quadrupling of power. Both formulas produce the same dB value for the same physical change — they are two ways of expressing the same thing.
The reference level A₀ for Sound Pressure Level (SPL) is 20 micropascals (20 × 10⁻⁶ Pa) — approximately the quietest sound a young adult with normal hearing can detect at 1 kHz.
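Both decibel formulas are easy to check numerically. The following sketch uses illustrative function names; `P_REF` is the 20 micropascal reference stated above.

```python
import math

P_REF = 20e-6  # reference sound pressure in pascals (20 micropascals)

def amplitude_to_db(amplitude, reference):
    """Level in dB for an amplitude-like quantity such as pressure."""
    return 20 * math.log10(amplitude / reference)

def power_to_db(power, reference):
    """Level in dB for a power-like quantity such as intensity."""
    return 10 * math.log10(power / reference)

print(f"amplitude x2: {amplitude_to_db(2.0, 1.0):.2f} dB")
print(f"power x4:     {power_to_db(4.0, 1.0):.2f} dB")
print(f"1 Pa pressure: {amplitude_to_db(1.0, P_REF):.1f} dB SPL")
```

The first two lines print the same value, about 6.02 dB, confirming that doubling amplitude and quadrupling power describe the same physical change.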
Key Properties of Decibels
Adding dB values: When two independent sound sources of equal power combine, the total intensity doubles. A doubling of power = +3 dB. Two violins playing the same note will be about 3 dB louder than one violin — not 6 dB, not twice the dB value.
10 dB ≈ double perceived loudness: Research by Fletcher and others showed that an increase of approximately 10 dB corresponds to a perceived doubling of loudness (though this varies with frequency and listener). This is a rough perceptual rule, not a physical law.
Every 6 dB = doubling the amplitude: In audio engineering, 6 dB (more precisely, 20 × log₁₀(2) ≈ 6.02 dB) corresponds to doubling amplitude. This is used constantly in recording and mixing.
Decibel Reference Table
| dB SPL | Sound Source | Perception |
|---|---|---|
| 0 | Threshold of hearing | Inaudible to most adults |
| 10 | Rustling leaves | Barely perceptible |
| 20 | Whisper (1 m away) | Very quiet |
| 30 | Quiet bedroom at night | Quiet |
| 40 | Library, soft background | Quiet ambient |
| 60 | Normal conversation | Comfortable |
| 70 | Busy restaurant | Loud |
| 80 | Alarm clock, loud traffic | Annoyingly loud |
| 85 | NIOSH recommended 8-hour exposure limit (OSHA action level) | Hearing damage possible |
| 90 | Lawnmower, motorcycle | Loud; protective gear advised |
| 100 | Power saw, subway train | Very loud |
| 110 | Rock concert (near stage) | Painful for sustained exposure |
| 120 | Threshold of pain | Physical pain in ear |
| 130 | Jet engine at 100 m | Immediate hearing risk |
| 140 | Gunshot at close range | Instant damage possible |
| 194 | Theoretical maximum SPL in air | Pressure wave goes to vacuum |
Worked Example: Adding Two Sound Sources
A trumpet plays at 90 dB and another trumpet joins in at the same level. What is the combined level?
The physical intensities add, not the decibels:
I₁ = I₂ = I (same level)
I_total = 2I
dB_total = 10 × log₁₀(2I / I₀)
= 10 × log₁₀(2) + 10 × log₁₀(I / I₀)
= 10 × 0.301 + 90
≈ 3 + 90 = 93 dB
Two identical sources add only 3 dB — this surprises students who expect a doubling in dB. The reason is that the dB scale already "compresses" large ratios; adding a second equal source is only a 2:1 intensity ratio, which maps to just 3 dB.
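The same calculation generalizes to any number of independent sources: convert each level to a relative intensity, add the intensities, and convert back. The sketch below assumes the sources are incoherent (their intensities add); the function name is illustrative.

```python
import math

def combine_levels(levels_db):
    """Combined level, in dB, of independent sources whose intensities add."""
    total_intensity = sum(10 ** (level / 10) for level in levels_db)
    return 10 * math.log10(total_intensity)

print(f"two trumpets at 90 dB: {combine_levels([90, 90]):.1f} dB")
print(f"90 dB plus 70 dB:      {combine_levels([90, 70]):.2f} dB")
print(f"ten sources at 60 dB:  {combine_levels([60] * 10):.1f} dB")
```

The second line shows a related effect: a source 20 dB quieter than another adds almost nothing to the total level.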
Chapter Links: Decibels and loudness perception appear in Chapter 3 (Psychoacoustics), Chapter 9 (Dynamics and Dynamic Range), and Chapter 15 (Audio Recording and Signal Chain).
A.5 The Fourier Series (Intuitive)
The Central Idea
Jean-Baptiste Joseph Fourier showed, in the early 19th century, something that initially seems almost magical: any periodic function, no matter how complicated its shape, can be written as a sum of sine and cosine waves at different frequencies.
This is not an approximation. As more terms are added, the sum converges to the target function, exactly so wherever the function is continuous. A square wave, a sawtooth wave, the pressure waveform of a violin: all of these are (in principle) exact sums of pure sine waves.
The practical implication for music is profound: every musical timbre is a recipe for mixing harmonics. The characteristic sound of a clarinet vs. an oboe vs. a violin is entirely encoded in which harmonics are present and at what amplitudes. If you could dial up and down the amplitudes of harmonics at will, you could synthesize any timbre from pure sine waves. This is exactly what synthesizers do.
Reading a Fourier Series
A Fourier series for a periodic function with fundamental frequency f₀ looks like:
y(t) = A₀ + A₁sin(2πf₀t + φ₁) + A₂sin(2×2πf₀t + φ₂) + A₃sin(3×2πf₀t + φ₃) + ...
- A₀ is the DC offset — the average value of the function. For audio centered at zero pressure, this is 0.
- A₁, A₂, A₃, ... are the amplitudes of the 1st, 2nd, 3rd harmonics. The 1st harmonic is the fundamental; the 2nd and higher harmonics are the overtones.
- f₀ is the fundamental frequency — the perceived pitch.
- 2f₀, 3f₀, 4f₀, ... are the harmonics. They are all integer multiples of f₀.
- φ₁, φ₂, φ₃, ... are the phases of each harmonic.
To "read" a Fourier series, look at each term and ask: what frequency is this, how loud is it (amplitude), and where does it start (phase)? The amplitudes tell you the spectral content — the tone color.
The Square Wave: Building Complexity from Pure Tones
A square wave alternates instantly between +1 and −1, spending equal time at each value. Its Fourier series is:
y(t) = (4/π) × [sin(2πf₀t) + (1/3)sin(3×2πf₀t) + (1/5)sin(5×2πf₀t) + ...]
Only odd harmonics (1st, 3rd, 5th, 7th, ...) are present. Each has amplitude 1/n where n is the harmonic number. As you add more terms, the sum approaches the square wave:
1 term: ~~~~~ (smooth sine)
3 terms: _|‾|_ (roughly squarish, with ripples)
10 terms: |‾‾‾| (clearly square, with small Gibbs ripples at corners)
∞ terms: exact square wave
The clarinet's tone is dominated by odd harmonics — this is why clarinets sound "hollow" and are sometimes compared to a square wave.
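The convergence of the square-wave series can be watched numerically by evaluating partial sums at the middle of the positive half-cycle, where the ideal square wave equals +1. This is a minimal sketch; the function name and the choice of a 110 Hz fundamental are illustrative assumptions.

```python
import math

def square_partial_sum(t, f0, n_terms):
    """Sum of the first n_terms odd harmonics of the square-wave Fourier series."""
    total = 0.0
    for k in range(n_terms):
        n = 2 * k + 1  # odd harmonic numbers 1, 3, 5, ...
        total += math.sin(2 * math.pi * n * f0 * t) / n
    return 4 / math.pi * total

f0 = 110.0                     # fundamental in Hz (arbitrary illustrative choice)
quarter_period = 1 / (4 * f0)  # middle of the positive half-cycle

for n_terms in (1, 3, 10, 1000):
    value = square_partial_sum(quarter_period, f0, n_terms)
    print(f"{n_terms:5d} terms: y = {value:.4f}")
```

With a single term the value is 4/π ≈ 1.27 (the peak of the fundamental alone); as terms are added it oscillates around 1 and settles toward it.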
Spectral Content of Common Waveforms
| Waveform | Harmonics Present | Character |
|---|---|---|
| Pure sine | 1st only | Flute-like, pure |
| Square | Odd only (1/n amplitude) | Hollow, nasal (clarinet) |
| Sawtooth | All harmonics (1/n amplitude) | Bright, buzzy (string, brass) |
| Triangle | Odd only (1/n² amplitude) | Softer than square |
| Pulse (narrow) | All harmonics (nearly equal amplitude) | Thin, percussive |
From Fourier Series to Fourier Transform
The Fourier series applies to periodic functions. The Fourier Transform extends the idea to non-periodic signals — like a spoken word or a single piano note that decays over time. Instead of a discrete list of harmonics, the Fourier Transform produces a continuous spectrum of frequencies.
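In practice, audio is analyzed using the Discrete Fourier Transform (DFT) computed via the Fast Fourier Transform (FFT) algorithm. The FFT takes a block of N audio samples and returns N complex numbers, each representing the amplitude and phase at a specific frequency bin. For real-valued audio the upper half of the output mirrors the lower half, so only the first N/2 + 1 bins (from 0 Hz up to half the sample rate) carry independent information.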
The frequency resolution of an FFT is:
Δf = sample_rate / N
For N = 4096 samples at 44,100 Hz sample rate: Δf = 44100/4096 ≈ 10.8 Hz per bin. Two frequencies must be roughly 10.8 Hz apart or more to land in separate bins and be resolved.
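To see the bin spacing at work without any external library, the sketch below uses a naive DFT written with the standard library. It is far slower than a real FFT, but it computes the same values. The small sample rate and block size are illustrative assumptions chosen so that each bin is exactly 100 Hz wide; they are not audio standards.

```python
import cmath
import math

def dft(samples):
    """Naive O(N^2) discrete Fourier transform; an FFT yields the same values faster."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples))
        for k in range(n)
    ]

sample_rate = 6400                       # Hz (small illustrative value)
n_samples = 64
bin_width = sample_rate / n_samples      # frequency resolution: 100 Hz per bin

# A 300 Hz test tone that fits exactly three cycles into the block.
tone = [math.sin(2 * math.pi * 300 * i / sample_rate) for i in range(n_samples)]
spectrum = dft(tone)

# For real input, bins above N/2 mirror the lower ones, so search bins 0..N/2.
magnitudes = [abs(c) for c in spectrum[: n_samples // 2 + 1]]
peak_bin = magnitudes.index(max(magnitudes))
peak_frequency = peak_bin * bin_width

print(f"bin width = {bin_width} Hz, peak at bin {peak_bin} = {peak_frequency} Hz")
```

The peak lands in bin 3, which corresponds to 3 × 100 Hz = 300 Hz. A tone whose frequency is not a multiple of the bin width would spread its energy across neighboring bins.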
Chapter Links: Fourier series appear in Chapter 11 (Timbre and Spectral Analysis). The FFT is implemented in Python throughout Chapters 12–16. The connection between waveform shape and timbre is explored in Chapter 13.
A.6 Basic Statistics for Music Analysis
Mean and Standard Deviation
When we analyze audio features — the spectral centroid of 100 recordings, the tempo of songs across a decade, the fundamental frequency variation in a vocalist's vibrato — we are working with datasets that need statistical summarization.
The mean (average) is the most familiar summary: add all values and divide by the count.
mean = (x₁ + x₂ + ... + xₙ) / n
The standard deviation σ (sigma) measures how spread out the values are around the mean:
σ = sqrt[ Σ(xᵢ − mean)² / n ]
Intuitively: compute how far each value is from the mean, square those distances (to make them positive), average them, and take the square root. A small σ means the data clusters tightly around the mean; a large σ means it is widely scattered.
Example: A soprano sings a sustained A4 (440 Hz). Analysis of 500 ms of audio using pitch tracking yields individual estimates: 438, 441, 440, 443, 439, 440, 442, ... Hz. The mean might be 440.3 Hz (very close to target). The standard deviation might be 1.8 Hz (tight intonation) or 6.4 Hz (wavering intonation with excessive vibrato).
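Python's standard library can compute both summaries. The sketch below uses only the seven pitch estimates listed explicitly in the example, so its results differ from the illustrative 440.3 Hz and 1.8 Hz figures, which refer to a longer, unlisted series. `statistics.pstdev` is used because it divides by n, matching the formula for σ given above.

```python
import statistics

# Only the seven values written out in the example; the full series is not given.
pitch_estimates_hz = [438, 441, 440, 443, 439, 440, 442]

mean_hz = statistics.mean(pitch_estimates_hz)
sigma_hz = statistics.pstdev(pitch_estimates_hz)  # population form: divides by n

print(f"mean = {mean_hz:.2f} Hz, standard deviation = {sigma_hz:.2f} Hz")
```

The related function `statistics.stdev` divides by n − 1 instead and gives a slightly larger value; it is the usual choice when the data are a sample from a larger population.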
Correlation
Correlation measures the linear relationship between two variables, on a scale from −1 to +1:
- +1: Perfect positive relationship — when one variable increases, the other increases proportionally.
- 0: No linear relationship. The variables may be independent, or they may be related nonlinearly.
- −1: Perfect negative relationship — when one increases, the other decreases proportionally.
The formula is Pearson's correlation coefficient:
r = Σ[(xᵢ − x̄)(yᵢ − ȳ)] / [n × σₓ × σᵧ]
Example in music research: If we analyze 500 songs and measure both the "spectral brightness" (high-frequency content) and listener ratings of "energy," we might find r = 0.72 — a strong positive correlation. Brighter-sounding songs tend to be rated as more energetic. This does not prove that brightness causes the rating; it shows a strong statistical association.
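Pearson's formula can be implemented in a few lines. The sketch below follows the formula above term by term, using the population standard deviation. The three small datasets are invented toy values chosen to hit the landmark cases; they are not measurements from any study.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: sum of products of deviations over n * sigma_x * sigma_y."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_products = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sigma_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sigma_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return sum_products / (n * sigma_x * sigma_y)

# Toy data (hypothetical): perfectly rising, perfectly falling, and a symmetric curve.
rising = pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
falling = pearson_r([1, 2, 3, 4, 5], [5, 4, 3, 2, 1])
curved = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])

print(f"rising: {rising:+.2f}, falling: {falling:+.2f}, curved: {curved:+.2f}")
```

The third case is worth noting: y is completely determined by x (y = x²), yet r is 0 because the relationship is not linear.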
Histograms and Distributions
A histogram counts how many data points fall into each "bin" of values. For audio, we might histogram the distribution of note durations in a jazz solo, or the distribution of spectral centroid values across a genre.
A key distribution for music analysis is the normal (Gaussian) distribution, the famous bell curve:
[Figure: bell curve of the normal distribution, symmetric about the mean.]
Many biological measurements (vocal pitch variation, timing deviation in human performance) follow near-normal distributions. Many audio features (spectral centroid across many recordings) may be skewed or multimodal — not normal. Always visualize data before assuming normality.
The standard deviation defines the width of the bell curve: in a normal distribution, 68% of values fall within 1σ of the mean, 95% within 2σ, and 99.7% within 3σ.
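These percentages follow from the shape of the normal curve and can be recovered with the error function from Python's `math` module. This is a short sketch; the function name is an illustrative choice.

```python
import math

def fraction_within(k_sigma):
    """Fraction of a normal distribution within k_sigma standard deviations of the mean."""
    return math.erf(k_sigma / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} sigma: {100 * fraction_within(k):.1f}%")
```

The results round to the familiar 68%, 95%, and 99.7% figures.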
What p < 0.05 Means (and Its Limits)
In music research papers, you will frequently encounter statements like "listeners preferred the equal-tempered version significantly more (p = 0.03)." The p-value requires careful interpretation.
The p-value is the probability of getting results at least this extreme if there were truly no effect — if the null hypothesis (H₀: no difference) were true. p = 0.03 means: "if there were truly no preference difference, we would get results this extreme only 3% of the time by chance."
By convention, p < 0.05 is called "statistically significant": the result is judged unlikely enough under the null hypothesis that researchers reject the null hypothesis. The threshold is a convention, not a guarantee that the effect is real.
Critical caveats that responsible researchers acknowledge:
- p < 0.05 does not mean there is a 95% chance the hypothesis is true. The p-value is not the probability of the hypothesis; it is the probability of the data given no effect.
- Statistical significance ≠ practical significance. A study with 10,000 participants might find a "significant" difference of 0.5 Hz in pitch perception that is completely musically irrelevant.
- Multiple comparisons inflate false positive rates. If you test 20 different features and find p < 0.05 for one of them, that result has a high chance of being a false positive, because you would expect about one false positive by chance.
- Replication is the gold standard. A single study with p = 0.04 should be treated with appropriate skepticism. Replicated results from multiple independent labs carry much more weight.
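The multiple-comparisons caveat can be made concrete with a short calculation. The sketch below assumes the tests are independent and that no real effect exists in any of them, which is a simplification; the function name is illustrative.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one false positive among n_tests independent null tests."""
    return 1 - (1 - alpha) ** n_tests

expected_false_positives = 20 * 0.05  # average count of false positives across 20 tests

print(f"expected false positives in 20 tests: {expected_false_positives:.1f}")
print(f"chance of at least one: {familywise_error_rate(20):.2f}")
```

With 20 tests at the 0.05 threshold, the chance of at least one spurious "significant" result is roughly 64%, which is why a single hit among many tested features deserves skepticism.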
Chapter Links: Statistical analysis appears in Chapter 28 (Empirical Research in Music Psychology), Chapter 33 (Machine Learning for Music Analysis), and Chapter 38 (Cross-Cultural Comparisons of Musical Scales).
Summary of Key Formulas
| Concept | Formula | Units |
|---|---|---|
| Sine wave | y(t) = A sin(2πft + φ) | Varies |
| Period-frequency | T = 1/f | s, Hz |
| Wavelength | λ = c/f | m |
| Angular frequency | ω = 2πf | rad/s |
| Interval in cents | cents = 1200 log₂(f₂/f₁) | cents |
| Octave | f₂/f₁ = 2 | dimensionless |
| Equal temperament semitone | ratio = 2^(1/12) ≈ 1.0595 | dimensionless |
| dB (amplitude) | dB = 20 log₁₀(A/A₀) | dB |
| dB (power) | dB = 10 log₁₀(P/P₀) | dB |
| FFT frequency resolution | Δf = f_s / N | Hz |
| Standard deviation | σ = √[Σ(xᵢ−x̄)²/n] | same as data |
This appendix provides a reference foundation. For deeper treatment of any topic, consult the chapters indicated in the Chapter Links callouts, or the resources listed in Appendix B.