Code Lab: Time-Frequency Uncertainty Demonstration

In quantum mechanics, the Heisenberg uncertainty principle states that position and momentum cannot both be known with arbitrary precision. An analogous principle governs signal analysis: you cannot simultaneously achieve perfect resolution in both time and frequency. A short analysis window gives you precise timing but blurs frequency content; a long window sharpens frequency resolution but smears events in time. This trade-off is fundamental to how we analyze musical timbre. In this lab, you will explore the time-frequency uncertainty principle hands-on using the Short-Time Fourier Transform (STFT) and spectrograms.

Generating Test Signals

We begin by creating three signals that stress different aspects of time-frequency analysis: a linear chirp (frequency changes continuously), a transient click (precisely located in time), and a sustained tone (precisely located in frequency).

import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import stft

sr = 8000  # sample rate in Hz
duration = 1.0
t = np.linspace(0, duration, int(sr * duration), endpoint=False)

# Linear chirp: sweeps from 200 Hz to 2000 Hz
chirp = np.sin(2 * np.pi * (200 * t + (2000 - 200) / (2 * duration) * t ** 2))

# Transient click at t = 0.5 s: a Gaussian pulse with a 2 ms standard deviation
click_center = int(0.5 * sr)
click_sigma = int(0.002 * sr)  # 16 samples at sr = 8000
transient = np.exp(-0.5 * ((np.arange(len(t)) - click_center) / click_sigma) ** 2)

# Sustained pure tone at 1000 Hz
sustained = np.sin(2 * np.pi * 1000 * t)

fig, axes = plt.subplots(3, 1, figsize=(10, 5), sharex=True)
for ax, sig, label in zip(axes, [chirp, transient, sustained],
                          ["Chirp (200-2000 Hz)", "Transient Click",
                           "Sustained Tone (1000 Hz)"]):
    ax.plot(t, sig, linewidth=0.5)
    ax.set_ylabel(label)
axes[-1].set_xlabel("Time (s)")
fig.suptitle("Test Signals for Time-Frequency Analysis")
plt.tight_layout()
plt.show()

The STFT and Window Size Trade-Off

The STFT divides a signal into overlapping segments, applies a window function to each segment, and computes the FFT of each windowed segment. The window length directly controls the uncertainty trade-off: shorter windows improve time resolution at the cost of frequency resolution, and vice versa. The helper below uses a Hann window with 75% overlap between successive segments.
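To make the three steps (segment, window, FFT) concrete, here is a minimal hand-rolled sketch using only NumPy. The function name `manual_stft` and its `hop` parameter are illustrative and not part of SciPy. The sketch also omits the edge padding and scaling that `scipy.signal.stft` applies, so its magnitudes will not match the library output exactly.

```python
import numpy as np

def manual_stft(x, nperseg, hop):
    """Hypothetical helper: frame, window, and FFT a 1-D signal."""
    # Periodic Hann window built by hand
    n = np.arange(nperseg)
    win = 0.5 - 0.5 * np.cos(2 * np.pi * n / nperseg)
    # Number of full frames that fit without padding
    n_frames = 1 + (len(x) - nperseg) // hop
    frames = np.stack([x[i * hop : i * hop + nperseg] * win
                       for i in range(n_frames)])
    # One-sided FFT of each frame -> shape (n_frames, nperseg // 2 + 1)
    return np.fft.rfft(frames, axis=1)

sr_demo = 8000
t_demo = np.arange(sr_demo) / sr_demo
tone = np.sin(2 * np.pi * 1000 * t_demo)

spec = manual_stft(tone, nperseg=256, hop=64)  # hop of 64 = 75% overlap
peak_bin = int(np.argmax(np.abs(spec).mean(axis=0)))
peak_hz = peak_bin * sr_demo / 256
print(spec.shape, peak_hz)
```

For the 1000 Hz tone the strongest bin should sit at 1000 Hz, because that frequency falls exactly on a bin centre (the bin spacing is 8000 / 256 = 31.25 Hz).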

def plot_spectrogram(signal, sr, nperseg, title=""):
    """Compute and plot an STFT-based spectrogram."""
    f, t_spec, Zxx = stft(signal, fs=sr, nperseg=nperseg,
                          noverlap=nperseg * 3 // 4, window="hann")
    plt.figure(figsize=(10, 3))
    plt.pcolormesh(t_spec, f, np.abs(Zxx), shading="gouraud", cmap="inferno")
    plt.ylabel("Frequency (Hz)")
    plt.xlabel("Time (s)")
    plt.ylim(0, 3000)
    plt.colorbar(label="Magnitude")
    plt.title(title)
    plt.tight_layout()
    plt.show()

# Short window: good time resolution, poor frequency resolution
plot_spectrogram(chirp, sr, nperseg=64,
                 title="Chirp -- Short Window (64 samples = 8 ms)")

# Long window: poor time resolution, good frequency resolution
plot_spectrogram(chirp, sr, nperseg=512,
                 title="Chirp -- Long Window (512 samples = 64 ms)")

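With the short window (8 ms), the chirp's position in time is sharply defined, but the frequency bins are 125 Hz apart, so the trace is broad and blurry along the frequency axis. With the long window (64 ms), the bins are only about 16 Hz apart, but the chirp sweeps roughly 115 Hz during a single frame ($1800\ \text{Hz/s} \times 0.064\ \text{s}$), so its energy is spread over several bins and across neighboring frames rather than forming a crisp diagonal.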
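The two sources of blur can be put side by side with a little arithmetic. The sketch below compares the bin spacing (the blur from a short window) with the frequency range the chirp covers within one frame (the blur from a long window). The helper name `chirp_blur` is hypothetical, and both quantities are rough indicators rather than exact ridge widths.

```python
sr_demo = 8000                     # Hz, same sample rate as the lab
chirp_rate = (2000 - 200) / 1.0    # Hz per second for the 200-2000 Hz sweep

def chirp_blur(nperseg, sr, rate):
    """Hypothetical helper: (bin spacing, sweep within one frame), both in Hz."""
    bin_spacing = sr / nperseg      # frequency-axis granularity
    sweep = rate * nperseg / sr     # how far the chirp moves during one frame
    return bin_spacing, sweep

for nperseg in (64, 512):
    bin_hz, sweep_hz = chirp_blur(nperseg, sr_demo, chirp_rate)
    print(f"nperseg={nperseg:4d}: bin spacing {bin_hz:6.1f} Hz, "
          f"sweep per frame {sweep_hz:6.1f} Hz")
```

When the sweep per frame is much smaller than the bin spacing, the window length limits the frequency detail. When it is much larger, the motion of the chirp inside each frame is the limiting factor.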

Comparing Window Sizes Across All Three Signals

window_sizes = [32, 128, 512]

fig, axes = plt.subplots(3, 3, figsize=(14, 8))
signals = [chirp, transient, sustained]
signal_names = ["Chirp", "Transient", "Sustained Tone"]

for col, (sig, name) in enumerate(zip(signals, signal_names)):
    for row, nperseg in enumerate(window_sizes):
        f, t_spec, Zxx = stft(sig, fs=sr, nperseg=nperseg,
                              noverlap=nperseg * 3 // 4, window="hann")
        ax = axes[row, col]
        ax.pcolormesh(t_spec, f, np.abs(Zxx), shading="gouraud",
                      cmap="inferno")
        ax.set_ylim(0, 3000)
        if row == 0:
            ax.set_title(name)
        if col == 0:
            ax.set_ylabel(f"win={nperseg}\nFreq (Hz)")
        if row == 2:
            ax.set_xlabel("Time (s)")

fig.suptitle("Time-Frequency Uncertainty: Window Size Comparison", y=1.01)
plt.tight_layout()
plt.show()

Study the 3x3 grid carefully. The transient is best resolved with a short window (top row), where it appears as a narrow vertical stripe. The sustained tone is best resolved with a long window (bottom row), where it appears as a thin horizontal line. No single window size is optimal for all three signals simultaneously -- this is the uncertainty principle in action.
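The visual impression from the grid can be backed up with a rough measurement. The sketch below rebuilds the click and the tone, then estimates how wide each appears at half of its peak power: the click along the time axis and the tone along the frequency axis. The helper `half_power_width` and the `_demo` variable names are hypothetical. The hop of 4 samples and the zero-padding to 2048 points are assumptions chosen only to sample each axis finely enough for a fair comparison.

```python
import numpy as np
from scipy.signal import stft

sr_demo = 8000
t_demo = np.arange(sr_demo) / sr_demo
# Same click and tone as in the lab: 2 ms Gaussian at 0.5 s, 1000 Hz sine
click = np.exp(-0.5 * ((t_demo - 0.5) / 0.002) ** 2)
tone = np.sin(2 * np.pi * 1000 * t_demo)

def half_power_width(profile, step):
    """Hypothetical helper: extent of the region at or above half the peak."""
    return np.count_nonzero(profile >= 0.5 * profile.max()) * step

time_widths, freq_widths = [], []
for nperseg in (32, 128, 512):
    # Click: hop of 4 samples so the time axis is finely sampled
    _, tt, Z = stft(click, fs=sr_demo, nperseg=nperseg, noverlap=nperseg - 4)
    energy_vs_time = (np.abs(Z) ** 2).sum(axis=0)
    time_widths.append(half_power_width(energy_vs_time, tt[1] - tt[0]))

    # Tone: zero-pad each frame to 2048 points for a fine frequency axis
    ff, _, Z = stft(tone, fs=sr_demo, nperseg=nperseg, nfft=2048)
    mid_frame = np.abs(Z[:, Z.shape[1] // 2]) ** 2
    freq_widths.append(half_power_width(mid_frame, ff[1] - ff[0]))

for n, dt, df in zip((32, 128, 512), time_widths, freq_widths):
    print(f"nperseg={n:4d}: click width {dt * 1e3:6.2f} ms, "
          f"tone width {df:7.2f} Hz")
```

The click width should grow and the tone width should shrink as the window gets longer, mirroring the top-to-bottom trend in the grid.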

Quantifying the Uncertainty Product

The time-frequency uncertainty principle can be stated as $\Delta t \cdot \Delta f \geq \frac{1}{4\pi}$, where $\Delta t$ and $\Delta f$ are the standard deviations of the window's energy density in time, $|w(t)|^2$, and in frequency, $|W(f)|^2$, respectively. The bound is approximately 0.08, and a Gaussian window achieves it exactly.

def uncertainty_product(nperseg, sr):
    """Return rough full-width measures delta_t (s) and delta_f (Hz) for a Hann window."""
    # Time extent: total window duration in seconds (an upper bound on
    # the effective width, since the Hann window tapers to zero at its edges)
    delta_t = nperseg / sr
    # Frequency extent: about 2 bins, roughly the -6 dB width of the Hann
    # main lobe (the null-to-null width is 4 bins)
    delta_f = 2.0 * sr / nperseg
    return delta_t, delta_f

print(f"{'Window (samples)':>20s} {'delta_t (ms)':>14s} {'delta_f (Hz)':>14s} "
      f"{'Product':>10s}")
print("-" * 62)

for nperseg in [32, 64, 128, 256, 512, 1024]:
    dt, df = uncertainty_product(nperseg, sr)
    print(f"{nperseg:>20d} {dt*1000:>14.2f} {df:>14.2f} {dt*df:>10.2f}")

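Notice that the product $\Delta t \cdot \Delta f$ remains constant (2.0 here) regardless of window size, because $\Delta t$ grows in proportion to the window length while $\Delta f$ shrinks in proportion to it. The value is far above $\frac{1}{4\pi}$ because the function uses full-width measures rather than standard deviations, so it illustrates the scaling rather than testing the bound itself. For a given window shape, you can trade time precision for frequency precision, but you can never reduce both simultaneously.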
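To compare against the $\frac{1}{4\pi}$ bound directly, the widths have to be standard deviations. The sketch below estimates them numerically for a Hann window and for a Gaussian window. The helper `rms_widths`, the 16x zero-padding factor, and the choice of a Gaussian with standard deviation N/12 are all assumptions made for illustration.

```python
import numpy as np

def rms_widths(g, sr):
    """Hypothetical helper: std. dev. of |g|^2 in time and of |G|^2 in frequency."""
    times = np.arange(len(g)) / sr
    p_t = np.abs(g) ** 2
    p_t = p_t / p_t.sum()                 # normalise to a probability-like density
    mean_t = (times * p_t).sum()
    delta_t = np.sqrt((((times - mean_t) ** 2) * p_t).sum())

    nfft = 16 * len(g)                    # zero-pad for a finely sampled spectrum
    G = np.fft.fft(g, nfft)
    freqs = np.fft.fftfreq(nfft, d=1 / sr)
    p_f = np.abs(G) ** 2
    p_f = p_f / p_f.sum()
    # |G|^2 of a real window is symmetric about 0 Hz, so its mean is zero
    delta_f = np.sqrt(((freqs ** 2) * p_f).sum())
    return delta_t, delta_f

sr_demo = 8000
N = 512
k = np.arange(N)
hann = 0.5 - 0.5 * np.cos(2 * np.pi * k / N)
# Gaussian with sigma = N/12 samples, i.e. truncated at about +/- 6 sigma
gauss = np.exp(-0.5 * ((k - N / 2) / (N / 12)) ** 2)

bound = 1 / (4 * np.pi)
for name, g in [("Hann", hann), ("Gaussian", gauss)]:
    dt, df = rms_widths(g, sr_demo)
    print(f"{name:>8s}: delta_t * delta_f = {dt * df:.4f} (bound {bound:.4f})")
```

The Gaussian should land essentially on the bound, while the Hann window should come out slightly above it, at roughly 0.082.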

Musical Implications: Analyzing a Multi-Component Signal

Real musical sounds contain both transient attacks and sustained harmonics. Let us build a composite signal that mimics a plucked string and examine how window choice affects our ability to see both features.

# Plucked-string model: sharp attack + decaying harmonics
rng = np.random.default_rng(0)  # fixed seed so the noise burst is reproducible
attack = np.exp(-t / 0.005) * rng.standard_normal(len(t)) * 0.3
harmonics = np.zeros_like(t)
for n, amp in enumerate([1.0, 0.7, 0.5, 0.3, 0.15], start=1):
    harmonics += amp * np.exp(-t * n * 2) * np.sin(2 * np.pi * 220 * n * t)
pluck = attack + harmonics

for nperseg, label in [(64, "Short Window (64)"),
                        (256, "Medium Window (256)"),
                        (1024, "Long Window (1024)")]:
    plot_spectrogram(pluck, sr, nperseg,
                     title=f"Plucked String -- {label}")

The short window resolves the attack transient clearly but smears the harmonic series. The long window separates each harmonic partial into a distinct ridge but loses the precise timing of the attack. The medium window offers a compromise. In practice, audio engineers and researchers choose window sizes (or use multi-resolution techniques such as wavelets) based on which features matter most for their application.
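A quick calculation suggests why the three windows behave this way. The partials of the model are 220 Hz apart, and the attack decays with a 5 ms time constant. The sketch below compares those numbers with each window's duration and with the null-to-null main-lobe width of a Hann window, which is 4 bins. The helper name `hann_main_lobe_hz` is hypothetical, and requiring the whole main lobe to fit between adjacent partials is a deliberately conservative rule of thumb, not a strict resolution limit.

```python
sr_demo = 8000
harmonic_spacing = 220.0  # Hz between adjacent partials of the plucked-string model

def hann_main_lobe_hz(nperseg, sr):
    """Hypothetical helper: null-to-null main-lobe width of a Hann window (4 bins)."""
    return 4.0 * sr / nperseg

for nperseg in (64, 256, 1024):
    lobe = hann_main_lobe_hz(nperseg, sr_demo)
    verdict = "separated" if lobe < harmonic_spacing else "overlapping"
    print(f"nperseg={nperseg:4d}: frame {1e3 * nperseg / sr_demo:6.1f} ms, "
          f"main lobe {lobe:6.1f} Hz -> partials {verdict}")
```

Only the 64-sample frame (8 ms) is comparable in length to the 5 ms attack, but its 500 Hz main lobe spans more than two partials. The longer frames separate the partials at the cost of spreading the attack over tens of milliseconds.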

Try It Yourself

  1. Window function comparison. Replace the "hann" window in plot_spectrogram with "boxcar" (rectangular), "hamming", and "blackman". For each, generate the chirp spectrogram with nperseg=256. How does the choice of window function affect spectral leakage (the vertical smearing of energy into neighboring frequency bins)? Which window gives the narrowest main lobe? Which has the lowest sidelobes?

  2. Build a two-tone signal. Create a signal containing two simultaneous sine waves at 1000 Hz and 1050 Hz (only 50 Hz apart). What is the minimum window length required to visually resolve the two frequencies as separate horizontal lines in the spectrogram? Verify your answer against the theoretical frequency resolution $\Delta f \approx f_s / N$.

  3. Design an adaptive spectrogram. Write a function that computes the STFT twice -- once with a short window and once with a long window -- and combines them by taking the element-wise maximum of the two magnitude arrays (after interpolating to a common time-frequency grid). Apply it to the plucked-string signal. Does this simple multi-resolution approach improve your ability to see both the attack transient and the harmonic partials simultaneously?