Case Study 1: Image Compression with SVD

Overview

Digital images are matrices. A grayscale image of $m \times n$ pixels is simply a matrix $\mathbf{A} \in \mathbb{R}^{m \times n}$ where each entry stores a pixel intensity (typically 0 to 255). As we learned in Section 2.5, the SVD decomposes any matrix into a sum of rank-1 components ordered by importance. By keeping only the top $k$ components, we obtain a compressed representation that retains the most visually significant information while dramatically reducing storage.

This case study walks through the complete pipeline: loading an image, computing its SVD, reconstructing it at various compression levels, and analyzing the trade-off between quality and storage.

The Mathematics

Recall the SVD of an image matrix $\mathbf{A} \in \mathbb{R}^{m \times n}$:

$$ \mathbf{A} = \mathbf{U} \boldsymbol{\Sigma} \mathbf{V}^T = \sum_{i=1}^{r} \sigma_i \mathbf{u}_i \mathbf{v}_i^T $$

where:

  • $r = \text{rank}(\mathbf{A})$ is the number of non-zero singular values,
  • $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r > 0$ are the singular values in decreasing order,
  • each term $\sigma_i \mathbf{u}_i \mathbf{v}_i^T$ is a rank-1 matrix capturing one "layer" of the image.

The rank-$k$ approximation keeps only the first $k$ terms:

$$ \mathbf{A}_k = \sum_{i=1}^{k} \sigma_i \mathbf{u}_i \mathbf{v}_i^T = \mathbf{U}_k \boldsymbol{\Sigma}_k \mathbf{V}_k^T $$

Storage analysis. The original image stores $m \times n$ values. The rank-$k$ representation stores:

  • $\mathbf{U}_k$: $m \times k$ values
  • $\boldsymbol{\Sigma}_k$: $k$ values
  • $\mathbf{V}_k^T$: $k \times n$ values

Total: $k(m + n + 1)$ values. The compression ratio is:

$$ \text{Compression ratio} = \frac{m \times n}{k(m + n + 1)} $$

For a $512 \times 512$ image with $k = 50$, this gives a compression ratio of approximately $\frac{262144}{50 \times 1025} \approx 5.1\times$.
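
A quick sketch to tabulate this ratio for a few values of $k$; note that compression pays off only while $k < \frac{mn}{m + n + 1}$, which is roughly 256 for a $512 \times 512$ image:

def compression_ratio(m: int, n: int, k: int) -> float:
    """Ratio of original m*n storage to rank-k SVD storage."""
    return (m * n) / (k * (m + n + 1))

for k in [10, 50, 100, 255]:
    print(f"k={k:3d}: {compression_ratio(512, 512, k):.2f}x")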

Step-by-Step Implementation

Step 1: Create a Test Image

We will generate a synthetic grayscale image with clear structure that SVD can exploit. A short sketch for loading a real image, if one is available, follows the generator below.

import numpy as np
import matplotlib.pyplot as plt

def create_test_image(size: int = 256) -> np.ndarray:
    """Create a synthetic test image with various features.

    Args:
        size: Width and height of the square image.

    Returns:
        A 2D NumPy array of shape (size, size) with values in [0, 255].
    """
    img = np.zeros((size, size))

    # Gradient background
    x = np.linspace(0, 1, size)
    y = np.linspace(0, 1, size)
    X, Y = np.meshgrid(x, y)
    img += 50 * X + 30 * Y

    # Circular feature
    center = size // 2
    radius = size // 4
    mask = (X * size - center) ** 2 + (Y * size - center) ** 2 < radius ** 2
    img[mask] += 100

    # Horizontal and vertical bars
    bar_width = size // 32
    for i in range(0, size, size // 8):
        img[i:i + bar_width, :] += 40
        img[:, i:i + bar_width] += 40

    # Clip to valid range
    img = np.clip(img, 0, 255)
    return img


image = create_test_image(256)
print(f"Image shape: {image.shape}")
print(f"Image dtype: {image.dtype}")
print(f"Pixel range: [{image.min():.1f}, {image.max():.1f}]")

Step 2: Compute the SVD

def compute_svd(image: np.ndarray) -> tuple[np.ndarray, np.ndarray, np.ndarray]:
    """Compute the full SVD of an image matrix.

    Args:
        image: 2D array representing a grayscale image.

    Returns:
        Tuple of (U, sigma, Vt) from the SVD.
    """
    U, sigma, Vt = np.linalg.svd(image, full_matrices=False)
    return U, sigma, Vt


U, sigma, Vt = compute_svd(image)
print(f"U shape: {U.shape}")
print(f"Number of singular values: {len(sigma)}")
print(f"Vt shape: {Vt.shape}")
print(f"Top 10 singular values: {sigma[:10].round(1)}")

Step 3: Analyze Singular Value Decay

The rate at which singular values decay tells us how compressible the image is. Images with smooth, regular structure have rapidly decaying singular values.

def analyze_singular_values(sigma: np.ndarray) -> None:
    """Analyze and plot singular value decay and cumulative energy.

    Args:
        sigma: Array of singular values in decreasing order.
    """
    # Cumulative energy (fraction of total squared singular values)
    energy = np.cumsum(sigma ** 2) / np.sum(sigma ** 2)

    fig, axes = plt.subplots(1, 2, figsize=(12, 4))

    # Singular value magnitudes
    axes[0].semilogy(sigma, 'b-', linewidth=1.5)
    axes[0].set_xlabel('Index')
    axes[0].set_ylabel('Singular Value (log scale)')
    axes[0].set_title('Singular Value Decay')
    axes[0].grid(True, alpha=0.3)

    # Cumulative energy
    axes[1].plot(energy, 'r-', linewidth=1.5)
    axes[1].axhline(y=0.95, color='gray', linestyle='--', label='95% energy')
    axes[1].axhline(y=0.99, color='gray', linestyle=':', label='99% energy')

    # Find k for 95% and 99% energy
    k_95 = np.searchsorted(energy, 0.95) + 1
    k_99 = np.searchsorted(energy, 0.99) + 1
    axes[1].axvline(x=k_95, color='green', linestyle='--',
                    label=f'k={k_95} (95%)')
    axes[1].axvline(x=k_99, color='orange', linestyle='--',
                    label=f'k={k_99} (99%)')

    axes[1].set_xlabel('Number of Components (k)')
    axes[1].set_ylabel('Cumulative Energy')
    axes[1].set_title('Cumulative Energy Captured')
    axes[1].legend()
    axes[1].grid(True, alpha=0.3)

    plt.tight_layout()
    plt.savefig('singular_value_analysis.png', dpi=150, bbox_inches='tight')
    plt.show()

    print(f"Components for 95% energy: {k_95}")
    print(f"Components for 99% energy: {k_99}")


analyze_singular_values(sigma)

Step 4: Reconstruct at Various Ranks

def reconstruct_image(
    U: np.ndarray,
    sigma: np.ndarray,
    Vt: np.ndarray,
    k: int
) -> np.ndarray:
    """Reconstruct an image using the top-k singular components.

    Args:
        U: Left singular vectors, shape (m, min(m,n)).
        sigma: Singular values, shape (min(m,n),).
        Vt: Right singular vectors transposed, shape (min(m,n), n).
        k: Number of singular components to keep.

    Returns:
        Reconstructed image as a 2D array, clipped to [0, 255].
    """
    reconstructed = U[:, :k] @ np.diag(sigma[:k]) @ Vt[:k, :]
    return np.clip(reconstructed, 0, 255)


def compression_analysis(
    image: np.ndarray,
    U: np.ndarray,
    sigma: np.ndarray,
    Vt: np.ndarray,
    ranks: list[int]
) -> None:
    """Reconstruct image at multiple ranks and display results.

    Args:
        image: Original image matrix.
        U: Left singular vectors.
        sigma: Singular values.
        Vt: Right singular vectors transposed.
        ranks: List of rank values to test.
    """
    m, n = image.shape
    original_size = m * n

    fig, axes = plt.subplots(2, len(ranks) // 2 + 1, figsize=(16, 8))
    axes = axes.flatten()

    # Show original
    axes[0].imshow(image, cmap='gray', vmin=0, vmax=255)
    axes[0].set_title(f'Original\n{m}x{n} = {original_size:,} values')
    axes[0].axis('off')

    results = []
    for idx, k in enumerate(ranks):
        reconstructed = reconstruct_image(U, sigma, Vt, k)

        # Compute metrics
        compressed_size = k * (m + n + 1)
        ratio = original_size / compressed_size
        rmse = np.sqrt(np.mean((image - reconstructed) ** 2))
        rel_error = np.linalg.norm(image - reconstructed, 'fro') / \
                    np.linalg.norm(image, 'fro')
        energy = np.sum(sigma[:k] ** 2) / np.sum(sigma ** 2)

        results.append({
            'k': k,
            'ratio': ratio,
            'rmse': rmse,
            'rel_error': rel_error,
            'energy': energy
        })

        axes[idx + 1].imshow(reconstructed, cmap='gray', vmin=0, vmax=255)
        axes[idx + 1].set_title(
            f'Rank {k}\nRatio: {ratio:.1f}x | Error: {rel_error:.3f}'
        )
        axes[idx + 1].axis('off')

    # Hide unused axes
    for idx in range(len(ranks) + 1, len(axes)):
        axes[idx].axis('off')

    plt.tight_layout()
    plt.savefig('svd_compression_comparison.png', dpi=150, bbox_inches='tight')
    plt.show()

    # Print summary table
    print(f"\n{'k':>4} | {'Comp. Ratio':>11} | {'RMSE':>8} | "
          f"{'Rel. Error':>10} | {'Energy':>8}")
    print("-" * 55)
    for r in results:
        print(f"{r['k']:4d} | {r['ratio']:11.2f}x | {r['rmse']:8.2f} | "
              f"{r['rel_error']:10.4f} | {r['energy']:7.4f}")


ranks = [1, 5, 10, 20, 50, 100]
compression_analysis(image, U, sigma, Vt, ranks)

Step 5: Color Image Extension

Real images have three color channels (RGB). We apply SVD independently to each channel.

def compress_color_image(
    image_rgb: np.ndarray,
    k: int
) -> np.ndarray:
    """Compress a color image using SVD on each channel independently.

    Args:
        image_rgb: 3D array of shape (m, n, 3) with RGB channels.
        k: Number of singular components to keep per channel.

    Returns:
        Compressed image as a 3D array, clipped to [0, 255].
    """
    compressed = np.zeros_like(image_rgb, dtype=float)

    for channel in range(3):
        U, sigma, Vt = np.linalg.svd(
            image_rgb[:, :, channel].astype(float),
            full_matrices=False
        )
        compressed[:, :, channel] = (
            U[:, :k] @ np.diag(sigma[:k]) @ Vt[:k, :]
        )

    return np.clip(compressed, 0, 255).astype(np.uint8)


# Create a synthetic color image
m, n = 256, 256
color_image = np.zeros((m, n, 3), dtype=np.uint8)
x = np.linspace(0, 1, n)
y = np.linspace(0, 1, m)
X, Y = np.meshgrid(x, y)

color_image[:, :, 0] = (200 * np.sin(2 * np.pi * X) ** 2).astype(np.uint8)
color_image[:, :, 1] = (200 * np.sin(2 * np.pi * Y) ** 2).astype(np.uint8)
# Squared so values stay non-negative before the uint8 cast
color_image[:, :, 2] = (150 * np.sin(2 * np.pi * (X + Y)) ** 2).astype(np.uint8)

for k in [5, 20, 50]:
    compressed = compress_color_image(color_image, k)
    error = np.sqrt(np.mean((color_image.astype(float) -
                             compressed.astype(float)) ** 2))
    ratio = (m * n * 3) / (k * (m + n + 1) * 3)
    print(f"k={k:3d}: RMSE={error:.2f}, Compression={ratio:.1f}x")

Results and Discussion

The results demonstrate several key insights:

  1. Rapid energy concentration. For structured images, the first few singular values capture the vast majority of the energy. Even keeping just 20 components out of 256 can retain over 95% of the image energy.

  2. Diminishing returns. The improvement from adding more components decreases as $k$ increases. Going from $k = 1$ to $k = 10$ produces a dramatic improvement, while going from $k = 50$ to $k = 100$ yields only marginal gains.

  3. Compression vs. quality trade-off. There is no single "right" value of $k$; it depends on the application. For thumbnail previews, $k = 10$ might suffice. For archival quality, you might need $k$ close to full rank.

  4. Image structure matters. Images with smooth gradients and regular patterns (like our synthetic test image) compress much better than images with fine-grained texture or noise, because their singular values decay faster; the sketch after this list makes the comparison concrete.
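
To see point 4 concretely, here is a small sketch comparing the top-20 energy of our structured test image against a zero-mean noise image of the same size (the energy fraction is scale-invariant, so the noise amplitude does not matter); it assumes image from Step 1 is still in scope:

rng = np.random.default_rng(0)
noise = rng.normal(size=image.shape)  # pure noise: a worst case for SVD

def energy_at_k(A: np.ndarray, k: int) -> float:
    """Fraction of squared-singular-value energy in the top-k components."""
    s = np.linalg.svd(A, compute_uv=False)
    return np.sum(s[:k] ** 2) / np.sum(s ** 2)

for name, A in [("structured", image), ("noise", noise)]:
    print(f"{name:>10}: top-20 energy = {energy_at_k(A, 20):.3f}")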

Connection to AI

SVD-based compression is a foundational concept that extends to many AI applications:

  • Autoencoders learn a non-linear generalization of SVD compression, where the encoder maps to a low-dimensional bottleneck and the decoder reconstructs.
  • Weight compression in large language models uses SVD to reduce the parameter count of weight matrices, a technique known as low-rank factorization (sketched after this list).
  • Data preprocessing with truncated SVD removes noise and reduces dimensionality before feeding data to downstream models.
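
As a minimal sketch of the weight-compression idea: the matrix W below is a random stand-in, not a real model weight (random matrices are in fact a worst case for low-rank approximation, so this illustrates only the mechanics and the parameter count, not the quality achievable on trained weights):

rng = np.random.default_rng(0)
W = rng.normal(size=(768, 768))  # stand-in for a trained weight matrix

k = 64
U_w, s_w, Vt_w = np.linalg.svd(W, full_matrices=False)
A = U_w[:, :k] * s_w[:k]  # shape (768, k); singular values folded into A
B = Vt_w[:k, :]           # shape (k, 768)

print(f"Parameters: {W.size:,} -> {A.size + B.size:,}")
# At inference time, x @ W is replaced by the cheaper (x @ A) @ B.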

We will explore the non-linear extensions of these ideas when we cover neural network architectures in later chapters.

Key Takeaways

  1. SVD provides the mathematically optimal low-rank approximation of any matrix (Eckart-Young theorem; a numerical check follows this list).
  2. The singular value spectrum reveals how compressible the data is.
  3. The compression ratio for a rank-$k$ approximation of an $m \times n$ matrix is $\frac{mn}{k(m + n + 1)}$.
  4. For color images, SVD is applied independently to each channel.
  5. SVD compression is the linear precursor to autoencoder-based compression in deep learning.
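
The Eckart-Young theorem also makes a concrete numerical prediction: the Frobenius error of the rank-$k$ truncation equals $\sqrt{\sum_{i > k} \sigma_i^2}$. A quick check, assuming U, sigma, Vt, and image from Steps 1-2 are still in scope:

k = 20
A_k = U[:, :k] @ np.diag(sigma[:k]) @ Vt[:k, :]
measured = np.linalg.norm(image - A_k, 'fro')
predicted = np.sqrt(np.sum(sigma[k:] ** 2))
print(f"Measured Frobenius error:   {measured:.4f}")
print(f"Predicted from tail sigmas: {predicted:.4f}")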