Appendix B: Python Audio Toolkit & Code Reference
This appendix is your technical home base for every Python code example in the textbook. Every library, every function, every common error pattern is documented here. When you encounter a code block in Chapter 12 that uses librosa.stft() or a spectrogram visualization in Chapter 19, the explanations below will tell you exactly what those calls do and how to customize them. The complete audio_toolkit.py module lives in the code/ subdirectory of this appendix and is imported by nearly all chapter exercises.
B.1 Installation and Setup
Python Version
This textbook requires Python 3.9 or newer. Python 3.11 and 3.12 have been tested and work well. To check your current version:
python --version
# or
python3 --version
If you need to install or update Python, download the official installer from python.org. During installation on Windows, check the box that says "Add Python to PATH." On macOS, the system Python is often outdated; use the official installer or Homebrew (brew install python).
Creating a Virtual Environment
A virtual environment is an isolated Python installation that keeps your audio analysis packages separate from other projects. This prevents version conflicts and makes your environment reproducible.
# Navigate to your working directory
cd ~/physics-of-music
# Create a virtual environment named 'audio-env'
python -m venv audio-env
# Activate it (macOS / Linux)
source audio-env/bin/activate
# Activate it (Windows Command Prompt)
audio-env\Scripts\activate.bat
# Activate it (Windows PowerShell)
audio-env\Scripts\Activate.ps1
# Your prompt should now show (audio-env) at the beginning
Once activated, all packages you install go into the virtual environment only. To deactivate when you are done:
deactivate
Installing All Required Packages
With your virtual environment activated, install all textbook dependencies in one command:
pip install numpy scipy matplotlib librosa soundfile pandas seaborn scikit-learn jupyterlab
This may take several minutes. librosa pulls in several dependencies automatically, including audioread and a resampling backend (resampy or soxr, depending on the librosa version). Let the installer work.
For Jupyter notebook support (recommended for exploratory analysis):
pip install jupyterlab ipywidgets
jupyter lab
Testing Your Installation
Run this script to confirm everything is working. Save it as test_install.py and run python test_install.py:
# test_install.py — confirms all textbook dependencies are installed correctly
import sys
print(f"Python version: {sys.version}")
import numpy as np
print(f"NumPy: {np.__version__}")
import scipy
print(f"SciPy: {scipy.__version__}")
import matplotlib
print(f"Matplotlib: {matplotlib.__version__}")
import librosa
print(f"librosa: {librosa.__version__}")
import soundfile as sf
print(f"soundfile: {sf.__version__}")
import pandas as pd
print(f"pandas: {pd.__version__}")
import sklearn
print(f"scikit-learn: {sklearn.__version__}")
# Generate a test sine wave and compute its FFT
sr = 44100
duration = 1.0
freq = 440.0
t = np.linspace(0, duration, int(sr * duration), endpoint=False)
y = np.sin(2 * np.pi * freq * t)
Y = np.fft.rfft(y)
freqs = np.fft.rfftfreq(len(y), 1/sr)
peak_freq = freqs[np.argmax(np.abs(Y))]
print(f"\nSelf-test: generated {freq} Hz sine wave")
print(f"FFT peak detected at: {peak_freq:.1f} Hz")
print("All imports successful. Installation verified.")
Expected output ends with FFT peak detected at: 440.0 Hz and All imports successful.
Common Installation Issues
Issue: pip install librosa fails with a C-compiler error
- On Linux: sudo apt-get install python3-dev libsndfile1-dev
- On macOS: Install Xcode Command Line Tools: xcode-select --install
- On Windows: Install Microsoft C++ Build Tools from visualstudio.microsoft.com
Issue: import soundfile fails on Linux
- Install the system library: sudo apt-get install libsndfile1
- Then reinstall: pip install soundfile --force-reinstall
Issue: librosa.load() fails with "audioread" errors on MP3 files
- Install ffmpeg: on macOS brew install ffmpeg; on Ubuntu sudo apt-get install ffmpeg
- WAV files do not require ffmpeg and are recommended for student exercises
Issue: Matplotlib shows no window (headless server)
- Add at the top of your script: import matplotlib; matplotlib.use('Agg') before importing pyplot
- Save figures with plt.savefig('output.png') instead of plt.show()
Issue: pip installs to the wrong Python
- Always activate your virtual environment first before running pip
- Use python -m pip install ... to ensure pip is linked to the correct Python
B.2 Core Libraries Reference
NumPy
NumPy (Numerical Python) is the foundation of scientific computing in Python. It provides the ndarray — a fast, fixed-type array — plus thousands of mathematical functions that operate on arrays without explicit loops.
In this textbook, NumPy appears in virtually every code example. Its most important roles are:
1. Generating test signals (sine waves, noise)
2. Computing the FFT (Fast Fourier Transform)
3. Array arithmetic needed for signal processing
Key functions used in this textbook:
| Function | Purpose |
|---|---|
| np.linspace(start, stop, n) | Evenly spaced array of n values |
| np.sin(x), np.cos(x) | Sine and cosine (operate on arrays) |
| np.fft.fft(y) | Full complex FFT |
| np.fft.rfft(y) | FFT for real input (returns n/2+1 values) |
| np.fft.rfftfreq(n, d) | Frequency values for rfft output |
| np.abs(x) | Magnitude of complex numbers |
| np.zeros(n), np.ones(n) | Arrays of zeros or ones |
| np.random.normal(0, 1, n) | Gaussian white noise |
| np.concatenate([a, b]) | Join arrays |
| np.argmax(x) | Index of the maximum value |
Minimal working example — generating and analyzing a sine wave:
import numpy as np
sr = 44100 # sample rate in Hz
duration = 2.0 # seconds
freq = 261.63 # Middle C
# Generate time array and sine wave
t = np.linspace(0, duration, int(sr * duration), endpoint=False)
y = 0.8 * np.sin(2 * np.pi * freq * t)
# FFT to find frequency content
Y = np.fft.rfft(y)
freqs = np.fft.rfftfreq(len(y), 1/sr)
# Find the dominant frequency
peak_index = np.argmax(np.abs(Y))
print(f"Dominant frequency: {freqs[peak_index]:.2f} Hz") # → 261.63 Hz
print(f"Signal duration: {len(y)/sr:.2f} seconds")
print(f"Amplitude range: {y.min():.3f} to {y.max():.3f}")
Important note on FFT output: np.fft.rfft() returns complex numbers. To get amplitude (magnitude), take np.abs(). To convert to decibels: 20 * np.log10(np.abs(Y) + 1e-10) (the small constant prevents log of zero).
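To make these scaling rules concrete, here is a small self-contained check (illustrative only, not part of audio_toolkit.py) that recovers a sine wave's amplitude from the normalized one-sided spectrum:

```python
import numpy as np

# Recover a known amplitude from the one-sided FFT using the
# normalization described above (divide by N/2).
sr = 1000                                 # 1 kHz keeps the example tiny
t = np.arange(sr) / sr                    # exactly 1 second -> 1 Hz per bin
y = 0.8 * np.sin(2 * np.pi * 50 * t)      # 50 Hz tone, amplitude 0.8

Y = np.fft.rfft(y)
magnitude = np.abs(Y) / (len(y) / 2)      # one-sided amplitude normalization

peak_bin = int(np.argmax(magnitude))
print(peak_bin)                           # → 50
print(round(float(magnitude[peak_bin]), 3))  # → 0.8

# Decibels relative to the peak; 1e-10 guards against log10(0)
magnitude_db = 20 * np.log10(magnitude / magnitude.max() + 1e-10)
```

Because the tone here sits exactly on a bin (an integer number of cycles in the analysis window), no window function is needed; real recordings rarely cooperate, which is why the patterns later in this appendix apply a Hann window first.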
SciPy Signal Processing (scipy.signal)
SciPy extends NumPy with algorithms for scientific computing. Its signal submodule provides tools for filtering, spectral analysis, and convolution that go beyond NumPy's basic FFT.
Key functions used in this textbook:
| Function | Purpose |
|---|---|
| scipy.signal.spectrogram(y, fs) | Compute short-time power spectrum |
| scipy.signal.periodogram(y, fs) | Power spectral density estimate |
| scipy.signal.butter(N, Wn, btype) | Design Butterworth IIR filter |
| scipy.signal.sosfilt(sos, y) | Apply filter using second-order sections |
| scipy.signal.convolve(x, h) | Convolution (FIR filtering, reverb) |
| scipy.signal.find_peaks(y) | Locate peaks in an array |
| scipy.signal.resample(y, n) | Resample signal to n samples |
Minimal working example — designing and applying a low-pass filter:
import numpy as np
from scipy import signal
sr = 44100
cutoff_hz = 1000 # Cut frequencies above 1 kHz
order = 4 # Filter steepness
# Design a Butterworth low-pass filter
sos = signal.butter(order, cutoff_hz, btype='low', fs=sr, output='sos')
# Generate test signal: 440 Hz + 5000 Hz
t = np.linspace(0, 1.0, sr, endpoint=False)
y = np.sin(2*np.pi*440*t) + np.sin(2*np.pi*5000*t)
# Apply filter
y_filtered = signal.sosfilt(sos, y)
print("Filter designed and applied.")
print(f"Original peak amplitude: {np.max(np.abs(y)):.3f}")
print(f"Filtered peak amplitude: {np.max(np.abs(y_filtered)):.3f}")
# The 5000 Hz component is attenuated; 440 Hz passes through
Matplotlib
Matplotlib is Python's primary plotting library. In audio analysis, it handles waveform plots, spectrum plots, spectrograms, and feature visualizations.
Key functions used in this textbook:
| Function | Purpose |
|---|---|
| plt.plot(x, y) | Line plot (waveforms, spectra) |
| plt.figure(figsize=(w, h)) | Create figure with specified size |
| plt.subplot(rows, cols, n) | Divide figure into subplots |
| plt.imshow(Z, aspect, origin) | Display 2D array as image |
| plt.colorbar() | Add colorbar to image/spectrogram |
| plt.specgram(y, NFFT, Fs) | Built-in spectrogram display |
| plt.xlabel(), plt.ylabel() | Axis labels |
| plt.title() | Figure title |
| plt.tight_layout() | Prevent subplot overlap |
| plt.savefig('file.png', dpi=150) | Save to file |
Minimal working example — multi-panel plot:
import numpy as np
import matplotlib.pyplot as plt
sr = 44100
t = np.linspace(0, 1.0, sr, endpoint=False)
y = np.sin(2 * np.pi * 440 * t)
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 6))
# Top panel: first 0.01 seconds of waveform
n_samples = int(0.01 * sr)
ax1.plot(t[:n_samples] * 1000, y[:n_samples]) # × 1000 for milliseconds
ax1.set_xlabel("Time (ms)")
ax1.set_ylabel("Amplitude")
ax1.set_title("Waveform (first 10 ms)")
# Bottom panel: frequency spectrum
Y = np.fft.rfft(y)
freqs = np.fft.rfftfreq(len(y), 1/sr)
ax2.plot(freqs[:2000], np.abs(Y[:2000]))  # first 2000 bins = 0–2000 Hz (1 Hz/bin for this 1 s signal)
ax2.set_xlabel("Frequency (Hz)")
ax2.set_ylabel("Magnitude")
ax2.set_title("FFT Spectrum")
plt.tight_layout()
plt.savefig("sine_analysis.png", dpi=150)
plt.show()
librosa
librosa is the most widely used Python library for music and audio analysis. It provides functions for loading audio, computing spectrograms, extracting musical features, and analyzing rhythm — all in a consistent, well-documented API.
Key functions used in this textbook:
| Function | Purpose |
|---|---|
| librosa.load(path, sr) | Load audio file; returns (y, sr) |
| librosa.stft(y, n_fft, hop_length) | Short-Time Fourier Transform |
| librosa.amplitude_to_db(S) | Convert magnitude to dB scale |
| librosa.display.specshow(D, sr) | Display spectrogram with frequency labels |
| librosa.feature.mfcc(y, sr, n_mfcc) | Mel-frequency cepstral coefficients |
| librosa.feature.spectral_centroid(y, sr) | Brightness of spectrum |
| librosa.feature.spectral_bandwidth(y, sr) | Spread of spectrum |
| librosa.feature.spectral_rolloff(y, sr) | Frequency below which 85% of energy lies |
| librosa.feature.zero_crossing_rate(y) | Rate of sign changes (noisiness) |
| librosa.beat.beat_track(y, sr) | Estimate tempo and beat frames |
| librosa.onset.onset_detect(y, sr) | Find note onsets |
| librosa.effects.harmonic(y) | Extract the harmonic (non-percussive) component |
| librosa.times_like(X, sr) | Time values matching the frames of feature matrix X |
| librosa.hz_to_midi(hz) | Convert Hz to MIDI note number |
Minimal working example — loading audio and extracting features:
import librosa
import numpy as np
# Load audio file (sr=None preserves original sample rate)
y, sr = librosa.load('recording.wav', sr=None)
print(f"Sample rate: {sr} Hz")
print(f"Duration: {len(y)/sr:.2f} seconds")
print(f"Samples: {len(y)}")
# Extract spectral centroid (brightness)
centroid = librosa.feature.spectral_centroid(y=y, sr=sr)
print(f"Mean spectral centroid: {centroid.mean():.1f} Hz")
# Estimate tempo
tempo, _ = librosa.beat.beat_track(y=y, sr=sr)
print(f"Estimated tempo: {tempo:.1f} BPM")
Note on sr=None: When you call librosa.load('file.wav', sr=None), librosa preserves the original sample rate. If you call librosa.load('file.wav') without sr=None, it resamples to 22,050 Hz by default. For this textbook, always use sr=None unless you explicitly want resampling.
soundfile
soundfile provides straightforward reading and writing of audio files in WAV, FLAC, and other formats. It does not support MP3 (which requires ffmpeg). For textbook exercises, we generate and save WAV files.
Key functions:
| Function | Purpose |
|---|---|
sf.read(path) |
Returns (data, samplerate) |
sf.write(path, data, samplerate) |
Write audio array to file |
sf.info(path) |
Returns metadata without loading |
sf.available_formats() |
Lists supported formats |
Minimal working example:
import numpy as np
import soundfile as sf
# Generate a 440 Hz tone
sr = 44100
t = np.linspace(0, 2.0, int(sr * 2.0), endpoint=False)
y = 0.5 * np.sin(2 * np.pi * 440 * t)
# Write to WAV file
sf.write('output_440hz.wav', y, sr)
# Read it back
y_loaded, sr_loaded = sf.read('output_440hz.wav')
print(f"Loaded {len(y_loaded)} samples at {sr_loaded} Hz")
print(f"Max amplitude: {np.max(np.abs(y_loaded)):.4f}")
B.3 Common Audio Analysis Patterns
Pattern 1: Loading and Plotting a Waveform
import librosa
import matplotlib.pyplot as plt
import numpy as np
# Load audio — sr=None preserves original sample rate
y, sr = librosa.load('audio.wav', sr=None)
# Generate time axis in seconds
times = np.arange(len(y)) / sr  # one value per sample; librosa.times_like is for feature frames
plt.figure(figsize=(12, 4))
plt.plot(times, y, color='steelblue', linewidth=0.5)
plt.xlabel('Time (s)')
plt.ylabel('Amplitude')
plt.title('Waveform')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
print(f"Duration: {len(y)/sr:.2f} s | Sample rate: {sr} Hz | Samples: {len(y)}")
Pattern 2: Computing and Displaying a Spectrogram
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np
y, sr = librosa.load('audio.wav', sr=None)
# Compute Short-Time Fourier Transform
# n_fft: FFT window size (frequency resolution)
# hop_length: samples between windows (time resolution)
n_fft = 2048
hop_length = 512
D = librosa.stft(y, n_fft=n_fft, hop_length=hop_length)
# Convert magnitude to decibels
D_db = librosa.amplitude_to_db(np.abs(D), ref=np.max)
# Plot
fig, ax = plt.subplots(figsize=(12, 5))
img = librosa.display.specshow(
    D_db,
    sr=sr,
    hop_length=hop_length,
    x_axis='time',
    y_axis='log',  # log frequency scale — matches musical intervals
    ax=ax,
    cmap='magma'
)
fig.colorbar(img, ax=ax, format='%+2.0f dB', label='Amplitude (dB)')
ax.set_title('Spectrogram (dB, log frequency scale)')
ax.set_xlabel('Time (s)')
ax.set_ylabel('Frequency (Hz)')
plt.tight_layout()
plt.show()
print(f"Spectrogram shape: {D_db.shape} (freq bins × time frames)")
print(f"Frequency resolution: {sr/n_fft:.1f} Hz per bin")
print(f"Time resolution: {hop_length/sr*1000:.1f} ms per frame")
Tuning the spectrogram: Larger n_fft gives better frequency resolution but worse time resolution — a fundamental trade-off called the time-frequency uncertainty principle. For music with fast transients, use smaller n_fft (512 or 1024). For tonal analysis, use larger n_fft (4096 or 8192).
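The trade-off can be tabulated directly: frequency resolution (sr/n_fft, in Hz per bin) and window duration (n_fft/sr) multiply to exactly 1, so improving one necessarily worsens the other. A quick sketch at the CD sample rate:

```python
# Frequency vs. time resolution for common n_fft choices at sr = 44100 Hz.
# Their product (Hz/bin x seconds of window) is always exactly 1.
sr = 44100
for n_fft in (512, 1024, 2048, 4096, 8192):
    freq_res = sr / n_fft          # Hz per frequency bin
    win_ms = n_fft / sr * 1000     # analysis window duration in ms
    print(f"n_fft={n_fft:5d}: {freq_res:6.1f} Hz/bin, {win_ms:6.1f} ms window")
```

Note that hop_length controls the spacing between frames (how often the window is taken), not this per-window trade-off.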
Pattern 3: Computing the FFT Spectrum of a Short Clip
import librosa
import numpy as np
import matplotlib.pyplot as plt
y, sr = librosa.load('audio.wav', sr=None)
# Extract a 2-second segment starting at 1 second
start_sample = int(1.0 * sr)
end_sample = int(3.0 * sr)
clip = y[start_sample:end_sample]
# Apply Hann window to reduce spectral leakage
window = np.hanning(len(clip))
clip_windowed = clip * window
# Compute FFT
Y = np.fft.rfft(clip_windowed)
freqs = np.fft.rfftfreq(len(clip_windowed), 1/sr)
magnitude_db = 20 * np.log10(np.abs(Y) + 1e-10)
# Find top 5 frequency peaks
from scipy.signal import find_peaks
peaks, _ = find_peaks(magnitude_db, height=-40, distance=50)
top_peaks = peaks[np.argsort(magnitude_db[peaks])[-5:]]
# Plot
plt.figure(figsize=(12, 5))
plt.plot(freqs, magnitude_db, color='royalblue', linewidth=0.8, label='Spectrum')
plt.plot(freqs[top_peaks], magnitude_db[top_peaks], 'ro', markersize=6, label='Peaks')
for p in top_peaks:
    plt.annotate(f"{freqs[p]:.0f} Hz",
                 xy=(freqs[p], magnitude_db[p]),
                 xytext=(5, 5), textcoords='offset points', fontsize=8)
plt.xlabel('Frequency (Hz)')
plt.ylabel('Magnitude (dB)')
plt.title('FFT Spectrum (2-second clip, Hann window)')
plt.xlim(0, 5000) # Show up to 5 kHz
plt.ylim(-80, 10)
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
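Why Pattern 3 bothers with the Hann window: a tone whose frequency falls between FFT bins smears ("leaks") energy across the spectrum, and windowing suppresses that leakage far from the peak. A small numpy-only demonstration (illustrative, not part of the pattern above):

```python
import numpy as np

# A 50.5 Hz tone over 1 second lands exactly between two 1 Hz bins --
# the worst case for spectral leakage.
sr = 1000
t = np.arange(sr) / sr
y = np.sin(2 * np.pi * 50.5 * t)

rect = np.abs(np.fft.rfft(y))                       # no window (rectangular)
hann = np.abs(np.fft.rfft(y * np.hanning(len(y))))  # Hann window

# Relative leakage ~150 Hz away from the tone (bin 200)
leak_rect = rect[200] / rect.max()
leak_hann = hann[200] / hann.max()
print(f"rect: {leak_rect:.1e}  hann: {leak_hann:.1e}")
# The Hann-windowed spectrum is orders of magnitude cleaner far from the peak.
```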
Pattern 4: Extracting Spectral Features with librosa
import librosa
import numpy as np
import matplotlib.pyplot as plt
y, sr = librosa.load('audio.wav', sr=None)
hop_length = 512
# --- Spectral Centroid (brightness) ---
centroid = librosa.feature.spectral_centroid(y=y, sr=sr, hop_length=hop_length)
# --- Spectral Bandwidth (spread) ---
bandwidth = librosa.feature.spectral_bandwidth(y=y, sr=sr, hop_length=hop_length)
# --- Spectral Rolloff (85% energy frequency) ---
rolloff = librosa.feature.spectral_rolloff(y=y, sr=sr, hop_length=hop_length, roll_percent=0.85)
# --- Zero Crossing Rate (noisiness) ---
zcr = librosa.feature.zero_crossing_rate(y, hop_length=hop_length)
# Time axis for features
times = librosa.times_like(centroid, sr=sr, hop_length=hop_length)
# --- Summary statistics ---
print("=== Spectral Feature Summary ===")
print(f"Spectral Centroid — Mean: {centroid.mean():.1f} Hz, Std: {centroid.std():.1f} Hz")
print(f"Spectral Bandwidth — Mean: {bandwidth.mean():.1f} Hz, Std: {bandwidth.std():.1f} Hz")
print(f"Spectral Rolloff — Mean: {rolloff.mean():.1f} Hz")
print(f"Zero Crossing Rate — Mean: {zcr.mean():.4f} crossings/sample")
# Plot all features over time
fig, axes = plt.subplots(4, 1, figsize=(12, 10), sharex=True)
axes[0].plot(times, centroid.T, color='orange')
axes[0].set_ylabel('Hz'); axes[0].set_title('Spectral Centroid (Brightness)')
axes[1].plot(times, bandwidth.T, color='green')
axes[1].set_ylabel('Hz'); axes[1].set_title('Spectral Bandwidth')
axes[2].plot(times, rolloff.T, color='purple')
axes[2].set_ylabel('Hz'); axes[2].set_title('Spectral Rolloff (85%)')
axes[3].plot(times, zcr.T, color='red')
axes[3].set_ylabel('Rate'); axes[3].set_title('Zero Crossing Rate')
axes[3].set_xlabel('Time (s)')
plt.tight_layout()
plt.show()
Pattern 5: Detecting Beats and Tempo
import librosa
import numpy as np
import matplotlib.pyplot as plt
y, sr = librosa.load('audio.wav', sr=None)
# Estimate tempo and beat frame locations
tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)
beat_times = librosa.frames_to_time(beat_frames, sr=sr)
print(f"Estimated tempo: {tempo:.1f} BPM")
print(f"Number of beats detected: {len(beat_times)}")
print(f"Beat interval: {60/tempo:.3f} seconds = {1000*60/tempo:.1f} ms")
# Plot waveform with beat markers
fig, ax = plt.subplots(figsize=(14, 4))
times = np.arange(len(y)) / sr  # one value per sample; librosa.times_like is for feature frames
ax.plot(times, y, color='steelblue', linewidth=0.5, alpha=0.7, label='Waveform')
for i, bt in enumerate(beat_times):
    ax.axvline(x=bt, color='red', linewidth=0.8, alpha=0.6,
               label='Beat markers' if i == 0 else '_nolegend_')
ax.set_xlabel('Time (s)')
ax.set_ylabel('Amplitude')
ax.set_title(f'Beat Detection — {tempo:.1f} BPM')
ax.legend()
plt.tight_layout()
plt.show()
Pattern 6: Comparing Two Audio Files Spectrally
import librosa
import numpy as np
import matplotlib.pyplot as plt
def load_and_analyze(filepath):
    """Load audio and compute its full-file spectrum (Hann-windowed FFT)."""
    y, sr = librosa.load(filepath, sr=None)
    Y = np.fft.rfft(y * np.hanning(len(y)))
    freqs = np.fft.rfftfreq(len(y), 1/sr)
    magnitude_db = 20 * np.log10(np.abs(Y) + 1e-10)
    return y, sr, freqs, magnitude_db
file1 = 'violin.wav'
file2 = 'cello.wav'
y1, sr1, freqs1, mag1 = load_and_analyze(file1)
y2, sr2, freqs2, mag2 = load_and_analyze(file2)
# Plot comparison
fig, axes = plt.subplots(2, 2, figsize=(14, 8))
# Waveforms
axes[0, 0].plot(np.linspace(0, len(y1)/sr1, len(y1)), y1, linewidth=0.3, color='royalblue')
axes[0, 0].set_title('Violin Waveform'); axes[0, 0].set_xlabel('Time (s)')
axes[0, 1].plot(np.linspace(0, len(y2)/sr2, len(y2)), y2, linewidth=0.3, color='firebrick')
axes[0, 1].set_title('Cello Waveform'); axes[0, 1].set_xlabel('Time (s)')
# Spectra
axes[1, 0].plot(freqs1[:len(freqs1)//4], mag1[:len(freqs1)//4], color='royalblue', linewidth=0.8)
axes[1, 0].set_title('Violin Spectrum'); axes[1, 0].set_xlabel('Frequency (Hz)')
axes[1, 0].set_ylabel('dB'); axes[1, 0].set_ylim(-80, 10)
axes[1, 1].plot(freqs2[:len(freqs2)//4], mag2[:len(freqs2)//4], color='firebrick', linewidth=0.8)
axes[1, 1].set_title('Cello Spectrum'); axes[1, 1].set_xlabel('Frequency (Hz)')
axes[1, 1].set_ylabel('dB'); axes[1, 1].set_ylim(-80, 10)
plt.tight_layout()
plt.savefig('spectral_comparison.png', dpi=150)
plt.show()
# Compute and report centroid difference
c1 = librosa.feature.spectral_centroid(y=y1, sr=sr1).mean()
c2 = librosa.feature.spectral_centroid(y=y2, sr=sr2).mean()
print(f"Violin centroid: {c1:.1f} Hz")
print(f"Cello centroid: {c2:.1f} Hz")
print(f"Difference: {c1 - c2:.1f} Hz ({(c1/c2-1)*100:.1f}% brighter)")
Pattern 7: Generating a Sine Wave and Saving as Audio
import numpy as np
import soundfile as sf
import matplotlib.pyplot as plt
# Parameters
sr = 44100 # sample rate
duration = 3.0 # seconds
frequency = 440.0 # Hz (A4)
amplitude = 0.7 # peak amplitude (0.0–1.0; avoid > 0.9 to prevent clipping)
# Add a short fade-in and fade-out to prevent click artifacts
fade_samples = int(0.01 * sr) # 10 ms fade
# Generate time array and sine wave
t = np.linspace(0, duration, int(sr * duration), endpoint=False)
y = amplitude * np.sin(2 * np.pi * frequency * t)
# Apply fade-in and fade-out
fade_in = np.linspace(0, 1, fade_samples)
fade_out = np.linspace(1, 0, fade_samples)
y[:fade_samples] *= fade_in
y[-fade_samples:] *= fade_out
# Save to WAV file
output_path = f'sine_{int(frequency)}hz.wav'
sf.write(output_path, y, sr)
print(f"Saved: {output_path} ({duration:.1f} s at {sr} Hz)")
# Verify by plotting a short segment
plt.figure(figsize=(10, 3))
n_show = int(0.01 * sr) # show 10 ms
plt.plot(t[:n_show] * 1000, y[:n_show])
plt.xlabel('Time (ms)'); plt.ylabel('Amplitude')
plt.title(f'Sine Wave: {frequency} Hz (first 10 ms)')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
Pattern 8: Implementing a Basic Filter
import numpy as np
from scipy import signal
import matplotlib.pyplot as plt
import soundfile as sf
def apply_filter(y, sr, filter_type='lowpass', cutoff_hz=1000, order=4):
    """
    Apply a Butterworth IIR filter to audio.
    filter_type: 'lowpass', 'highpass', 'bandpass', 'bandstop'
    cutoff_hz: scalar for low/high pass; [low, high] list for band filters
    """
    if filter_type in ('bandpass', 'bandstop'):
        Wn = [cutoff_hz[0], cutoff_hz[1]]
    else:
        Wn = cutoff_hz
    sos = signal.butter(order, Wn, btype=filter_type, fs=sr, output='sos')
    return signal.sosfilt(sos, y)
# Load or generate audio
sr = 44100
t = np.linspace(0, 2.0, int(sr * 2.0), endpoint=False)
# Create a multi-tone test signal: bass (100 Hz) + midrange (1000 Hz) + treble (8000 Hz)
y = (0.5 * np.sin(2*np.pi*100*t) +
     0.4 * np.sin(2*np.pi*1000*t) +
     0.3 * np.sin(2*np.pi*8000*t))
# Apply a low-pass filter at 500 Hz (should keep bass, remove treble)
y_lp = apply_filter(y, sr, filter_type='lowpass', cutoff_hz=500)
# Apply a high-pass filter at 2000 Hz (should keep treble, remove bass)
y_hp = apply_filter(y, sr, filter_type='highpass', cutoff_hz=2000)
# Plot frequency spectra to verify
def plot_spectrum(ax, signal_data, sr, title, color):
    Y = np.fft.rfft(signal_data * np.hanning(len(signal_data)))
    freqs = np.fft.rfftfreq(len(signal_data), 1/sr)
    ax.plot(freqs, 20*np.log10(np.abs(Y) + 1e-10), color=color, linewidth=0.8)
    ax.set_title(title); ax.set_xlabel('Frequency (Hz)'); ax.set_ylabel('dB')
    ax.set_xlim(0, 10000); ax.set_ylim(-80, 20); ax.grid(True, alpha=0.3)
fig, axes = plt.subplots(3, 1, figsize=(10, 9))
plot_spectrum(axes[0], y, sr, 'Original Signal (100 + 1000 + 8000 Hz)', 'black')
plot_spectrum(axes[1], y_lp, sr, 'After Low-pass Filter (cutoff 500 Hz)', 'steelblue')
plot_spectrum(axes[2], y_hp, sr, 'After High-pass Filter (cutoff 2000 Hz)', 'firebrick')
plt.tight_layout()
plt.show()
# Save filtered versions
sf.write('filtered_lowpass.wav', y_lp, sr)
sf.write('filtered_highpass.wav', y_hp, sr)
print("Filtered audio saved.")
B.4 Troubleshooting Guide
Error: librosa.load() returns wrong sample rate
Symptom: You call librosa.load('file.wav') and the returned sr is 22050, not what you expected.
Cause: By default, librosa resamples to 22,050 Hz. This saves memory for large files but discards high-frequency information above 11,025 Hz.
Fix: Always use sr=None to preserve the original sample rate:
y, sr = librosa.load('file.wav', sr=None) # preserves original SR
Error: FFT amplitude values seem very large
Symptom: np.abs(np.fft.fft(y)) returns values in the thousands, not between 0 and 1.
Cause: This is expected. The FFT of an N-sample signal sums N values, so a pure tone of amplitude A appears in the one-sided spectrum as a peak of magnitude roughly A × N/2, and the DC component (index 0) can reach A × N.
Fix: Normalize by dividing by N/2 (for the one-sided spectrum):
Y = np.fft.rfft(y)
magnitude_normalized = np.abs(Y) / (len(y) / 2)
# Now peak magnitude ≈ the amplitude of the original sine wave
Or use decibels with ref=np.max to focus on relative levels rather than absolute:
magnitude_db = 20 * np.log10(np.abs(Y) / np.max(np.abs(Y)) + 1e-10)
Error: Matplotlib spectrogram colors are all one shade
Symptom: plt.specgram() or librosa.display.specshow() shows a nearly uniform image.
Cause: The dynamic range of audio spectra is typically 60–100 dB. If you plot raw magnitudes (not in dB), the color scale is dominated by a few loud frequency bins, making everything else appear identical.
Fix: Always convert to decibels before displaying:
D = librosa.stft(y)
D_db = librosa.amplitude_to_db(np.abs(D), ref=np.max)
librosa.display.specshow(D_db, ...)
Set vmin and vmax to control the displayed dynamic range:
librosa.display.specshow(D_db, vmin=-80, vmax=0, ...)
Error: soundfile.write() produces distorted audio
Symptom: The saved WAV file sounds clipped, crackly, or distorted.
Cause: WAV files store audio as integers. Values outside [−1.0, +1.0] in a float array are clipped during the integer conversion, causing hard digital distortion.
Fix: Always check and normalize your signal before writing:
# Check for clipping
if np.max(np.abs(y)) > 1.0:
    print("Warning: signal will clip! Normalizing...")
    y = y / np.max(np.abs(y)) * 0.95  # normalize to 95% of maximum
sf.write('output.wav', y, sr)
Alternatively, write as 32-bit float which avoids the clipping issue at the format level:
sf.write('output.wav', y, sr, subtype='FLOAT')
Error: librosa.beat.beat_track() returns implausible tempo
Symptom: Tempo estimate is 280 BPM when the actual song is 70 BPM (or half the actual tempo).
Cause: Beat trackers can lock onto double- or half-time. This is a known limitation of autocorrelation-based beat tracking, especially in music with heavy syncopation or unusual meters.
Fix: Constrain the search range:
tempo, beats = librosa.beat.beat_track(y=y, sr=sr, bpm=70)  # override with a known tempo
# or
tempo, beats = librosa.beat.beat_track(y=y, sr=sr,
                                       start_bpm=120,  # seed the tempo search
                                       tightness=100)
Also try onset strength with different hop sizes. When in doubt, manually annotate a few beats and compare.
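When a genre gives you a plausible tempo range, a simple post-processing guard is to fold the estimate into that range by octave doubling or halving. This is a heuristic sketch, not a librosa feature, and the range bounds here are assumptions you should set per genre:

```python
def fold_tempo(bpm, lo=60.0, hi=180.0):
    """Fold a tempo estimate into [lo, hi) by halving/doubling.

    A heuristic guard against the half-/double-time errors described
    above. Requires hi >= 2*lo so the folding always terminates.
    """
    assert hi >= 2 * lo, "need hi >= 2*lo"
    while bpm >= hi:
        bpm /= 2.0
    while bpm < lo:
        bpm *= 2.0
    return bpm

print(fold_tempo(280.0))  # → 140.0 (double-time estimate folded down)
print(fold_tempo(35.0))   # → 70.0  (half-time estimate folded up)
```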
B.5 Further Resources
librosa Documentation: The official librosa documentation is comprehensive and includes tutorial notebooks. Search for "librosa documentation" with the current version number. Pay special attention to the "Examples" section and the "Advanced Usage" pages on custom feature extraction.
SciPy Signal Processing Tutorial: The SciPy documentation includes a dedicated signal processing tutorial that covers filtering theory, spectral analysis, and the discrete Fourier Transform in depth. The "scipy.signal" module reference is the definitive source for filter design functions.
NumPy FFT Documentation: NumPy's numpy.fft module documentation explains the exact conventions used (normalization, frequency ordering, one-sided vs. two-sided transforms). Reading the "Notes" section carefully will clarify many common confusion points about amplitude scaling.
Jupyter Notebooks for Audio Analysis: Search for "jupyter audio analysis tutorial" or "music information retrieval tutorial notebook." The ISMIR (International Society for Music Information Retrieval) community has published many high-quality tutorial notebooks that complement this textbook's material.
"Fundamentals of Music Processing" by Meinard Müller: This academic textbook provides the mathematical foundations underlying librosa's algorithms. If you want to understand the theory behind STFT, chroma features, and beat tracking, this is the reference to consult.
scikit-learn Documentation: For the machine learning applications in Part VII of the textbook (Chapters 33–37), the scikit-learn documentation's "User Guide" section on classification, clustering, and feature selection is essential reading.
The complete audio_toolkit.py module is in the code/ subdirectory. Import it in any chapter exercise with from audio_toolkit import load_audio, plot_spectrogram, compute_features.