The Singular Value Decomposition (SVD) is perhaps the most important matrix factorization in applied mathematics. It works for any matrix, reveals its fundamental structure, and provides optimal low-rank approximations used in image compression, recommendation systems, and data analysis.
Every m×n matrix A can be factored as A = UΣVᵀ, where:
- U is an m×m orthogonal matrix whose columns uᵢ are the left singular vectors,
- Σ is an m×n diagonal matrix with non-negative entries σ₁ ≥ σ₂ ≥ ... ≥ 0 (the singular values),
- V is an n×n orthogonal matrix whose columns vᵢ are the right singular vectors.
Find SVD of .
Since A is already diagonal with positive entries, U = V = I (up to reordering) and Σ = A.
Singular values: the diagonal entries of A, listed in decreasing order.
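A quick numerical check of the factorization, as a minimal sketch with NumPy (the matrix below is an arbitrary stand-in, not the one from the worked example):

```python
import numpy as np

# Arbitrary example matrix (not the one from the worked example above).
A = np.array([[3.0, 0.0],
              [0.0, 2.0],
              [0.0, 0.0]])

# Full SVD: U is 3x3, s holds min(m, n) singular values, Vt is 2x2.
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# Rebuild the m x n Sigma and verify A = U Sigma V^T.
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)
print("singular values:", s)                        # [3. 2.]
print("reconstruction ok:", np.allclose(A, U @ Sigma @ Vt))
print("U orthogonal:", np.allclose(U.T @ U, np.eye(3)))
print("V orthogonal:", np.allclose(Vt @ Vt.T, np.eye(2)))
```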
Proof of SVD Existence:
1. AᵀA is symmetric positive semi-definite, so it has eigenvalues λ₁ ≥ λ₂ ≥ ... ≥ λₙ ≥ 0.
2. Let σᵢ = √λᵢ, and let v₁, ..., vₙ be orthonormal eigenvectors of AᵀA.
3. For σᵢ > 0, define uᵢ = Avᵢ/σᵢ. These are orthonormal.
4. Extend {uᵢ} and {vᵢ} to orthonormal bases of ℝᵐ and ℝⁿ.
5. Verify Avᵢ = σᵢuᵢ for every i, hence AV = UΣ and A = UΣVᵀ.
Find SVD of :
Step 1: Compute AᵀA.
Step 2: Find the eigenvalues λᵢ of AᵀA; the singular values are σᵢ = √λᵢ.
Step 3: Right singular vectors: the orthonormal eigenvectors vᵢ of AᵀA form the columns of V.
Step 4: Left singular vectors: for each σᵢ > 0, uᵢ = Avᵢ/σᵢ; these form the columns of U.
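The same steps carried out numerically, as a pedagogical sketch on an arbitrary 3×2 example of my own choosing (in practice you would call np.linalg.svd directly; see the warning about forming AᵀA later in these notes):

```python
import numpy as np

# Pedagogical construction following Steps 1-4 above.
A = np.array([[1.0, 1.0],
              [0.0, 1.0],
              [1.0, 0.0]])                 # arbitrary 3x2 example, rank 2

AtA = A.T @ A                               # Step 1: A^T A
lam, V = np.linalg.eigh(AtA)                # Steps 2-3: eigenpairs (ascending order)
order = np.argsort(lam)[::-1]               # reorder so sigma_1 >= sigma_2
lam, V = lam[order], V[:, order]
sigma = np.sqrt(lam)                        # singular values sqrt(3) and 1

U_thin = (A @ V) / sigma                    # Step 4: u_i = A v_i / sigma_i (all sigma_i > 0 here)

print("sigma:", sigma)
print("reconstructs A:", np.allclose(A, U_thin @ np.diag(sigma) @ V.T))
print("matches NumPy :", np.allclose(sigma, np.linalg.svd(A, compute_uv=False)))
```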
Find SVD of (3×1 matrix):
Here AᵀA is 1×1 and equals ||A||², so σ₁ = ||A|| and V = [1] (1×1).
The first left singular vector is u₁ = A/σ₁; extend it to an orthonormal basis of ℝ³ to fill out U.
The factorization A = UΣVᵀ means multiplying by A is equivalent to:
1. Rotate/reflect by Vᵀ,
2. Scale along the coordinate axes by Σ,
3. Rotate/reflect by U.
A maps the unit sphere to an ellipsoid. The singular values are the semi-axis lengths, and the singular vectors are the axis directions.
For :
The unit circle {x² + y² = 1} is mapped to an ellipse with semi-axis lengths σ₁ and σ₂ along the directions u₁ and u₂.
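A numerical check of this picture, as a sketch on an arbitrary 2×2 matrix of my own choosing: the longest image of a unit vector has length σ₁ and points along u₁.

```python
import numpy as np

# Map sampled points of the unit circle through A and measure the stretching.
A = np.array([[2.0, 1.0],
              [0.0, 1.0]])                              # arbitrary example
U, s, Vt = np.linalg.svd(A)

theta = np.linspace(0, 2 * np.pi, 2000)
circle = np.vstack([np.cos(theta), np.sin(theta)])      # points on the unit circle
image = A @ circle                                       # their images under A

lengths = np.linalg.norm(image, axis=0)
print("max stretch:", lengths.max(), "vs sigma_1 =", s[0])
print("min stretch:", lengths.min(), "vs sigma_2 =", s[1])

# The longest image vector is (up to sign) sigma_1 * u_1.
longest = image[:, lengths.argmax()]
print("aligned with u_1:", np.isclose(abs(longest @ U[:, 0]), s[0], atol=1e-3))
```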
The spectral norm is the maximum stretching factor: ||A||₂ = max_{x≠0} ||Ax||/||x|| = σ₁.
The maximum is achieved when x = v₁ (the first right singular vector).
The condition number of A is κ(A) = σ₁/σᵣ,
where σᵣ is the smallest nonzero singular value. It measures sensitivity of solutions to perturbations in the data.
For :
κ(A) ≈ 1000 (ill-conditioned).
A 0.1% error in the input could cause a 100% error in the output!
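Both quantities come straight from the singular values; a minimal NumPy sketch on an arbitrary ill-conditioned example:

```python
import numpy as np

# Spectral norm and condition number from the singular values.
A = np.array([[1.0, 1.0],
              [1.0, 1.001]])                 # arbitrary nearly singular example
s = np.linalg.svd(A, compute_uv=False)

print("||A||_2 :", np.linalg.norm(A, 2), "= sigma_1 =", s[0])
print("kappa(A):", np.linalg.cond(A), "= sigma_1/sigma_min =", s[0] / s[-1])
# Rule of thumb: a relative input error eps can be amplified by up to kappa(A).
```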
The best rank-k approximation to A (in Frobenius or spectral norm) is:
A_k = σ₁u₁v₁ᵀ + σ₂u₂v₂ᵀ + ... + σ_ku_kv_kᵀ = U_kΣ_kV_kᵀ.
The approximation errors are ||A - A_k||₂ = σ_{k+1} and ||A - A_k||_F = √(σ_{k+1}² + ... + σᵣ²).
For with :
Proof of Eckart-Young (Spectral Norm):
Let B be any rank-k matrix. We show ||A - B||₂ ≥ σ_{k+1}.
Since dim(null(B)) ≥ n - k and dim(span{v₁,...,v_{k+1}}) = k + 1, the two subspaces of ℝⁿ have dimensions summing to more than n, so there exists a nonzero w ∈ null(B) ∩ span{v₁,...,v_{k+1}}.
Then Bw = 0 and ||Aw|| ≥ σ_{k+1}||w||, so:
||A - B||₂ ≥ ||(A - B)w|| / ||w|| = ||Aw|| / ||w|| ≥ σ_{k+1}.
Equality is achieved by B = A_k.
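A numerical check of the theorem, as a sketch on a random matrix: the truncation error matches σ_{k+1} in the spectral norm and the tail-sum formula in the Frobenius norm.

```python
import numpy as np

# Verify the Eckart-Young error formulas for a rank-k truncation.
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]          # best rank-k approximation

spec_err = np.linalg.norm(A - A_k, 2)
frob_err = np.linalg.norm(A - A_k, 'fro')
print("spectral error :", spec_err, "vs sigma_{k+1} =", s[k])
print("Frobenius error:", frob_err, "vs", np.sqrt(np.sum(s[k:] ** 2)))
```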
The Eckart-Young theorem is remarkable:
For a matrix with singular values σ₁ ≥ σ₂ ≥ ... ≥ σᵣ:
Frobenius errors: ||A - A_k||_F = √(σ_{k+1}² + ... + σᵣ²) for each k, so the error drops as each successive singular value is included.
For an m×n matrix of rank r, full storage is mn values, i.e. O(mn).
A rank-k approximation stores U_k (m×k), the k singular values, and V_k (n×k): k(m + n + 1) values, i.e. O((m + n)k).
Compression ratio: mn / (k(m + n + 1)). For k ≪ min(m, n), this is a huge saving!
The relative error in a rank-k approximation is:
||A - A_k||_F / ||A||_F = √(σ_{k+1}² + ... + σᵣ²) / √(σ₁² + ... + σᵣ²).
This measures what fraction of "energy" is lost.
How to choose k? Common strategies:
- Energy threshold: keep the smallest k whose retained energy σ₁² + ... + σ_k² reaches, say, 90-99% of the total.
- Scree plot: look for an "elbow" where the singular values drop off sharply.
- Budget: pick k from a storage or compute limit, then check the resulting error.
A minimal sketch of the energy-threshold rule appears below.
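The sketch below implements the energy-threshold rule; the helper name choose_rank and the example singular values are my own, chosen for illustration.

```python
import numpy as np

def choose_rank(singular_values, energy=0.95):
    """Smallest k whose retained energy sum(sigma_i^2) reaches the given fraction."""
    s2 = np.asarray(singular_values) ** 2
    cumulative = np.cumsum(s2) / s2.sum()
    return int(np.searchsorted(cumulative, energy) + 1)

s = np.array([100.0, 50.0, 10.0, 5.0, 1.0])
print(choose_rank(s))        # prints 2: the first two values carry >95% of the energy
```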
- Image compression: keep the top k singular values; rank 50 is often visually indistinguishable from the original.
- Recommendation systems: matrix factorization for Netflix/Amazon; low-rank structure captures user-item preferences.
- PCA: SVD of centered data gives the principal components, a dimensionality reduction that preserves variance.
- Pseudoinverse: A⁺ = VΣ⁺Uᵀ solves least squares for any matrix.
The pseudoinverse is defined via the SVD: A⁺ = VΣ⁺Uᵀ,
where has diagonal entries (for nonzero σᵢ) and 0 otherwise.
The pseudoinverse satisfies the four Moore-Penrose conditions: AA⁺A = A, A⁺AA⁺ = A⁺, (AA⁺)ᵀ = AA⁺, and (A⁺A)ᵀ = A⁺A.
For :
SVD:
, so
For Ax = b (possibly overdetermined or rank-deficient): x = A⁺b.
This gives the minimum-norm least squares solution.
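A sketch of both ideas on a made-up rank-deficient system (the matrix and vector below are my own example): build A⁺ from the SVD, then compare against NumPy's pinv and lstsq.

```python
import numpy as np

# Pseudoinverse via the SVD and the minimum-norm least-squares solution.
# Rank-deficient example: column 3 = column 1 + column 2.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0],
              [2.0, 1.0, 3.0]])
b = np.array([1.0, 2.0, 3.0, 4.0])

U, s, Vt = np.linalg.svd(A, full_matrices=False)
tol = 1e-10 * s[0]                                     # simple cutoff for this example
s_plus = np.array([1.0 / si if si > tol else 0.0 for si in s])
A_plus = Vt.T @ np.diag(s_plus) @ U.T                  # A+ = V Sigma+ U^T

x = A_plus @ b                                         # minimum-norm least squares solution
print("matches np.linalg.pinv :", np.allclose(A_plus, np.linalg.pinv(A)))
print("matches np.linalg.lstsq:", np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))
```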
A 512×512 grayscale image has 262,144 values.
With rank-50 SVD approximation: 50×(512+1+512) = 51,250 values.
Compression ratio: 5:1 while retaining main features!
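The same storage arithmetic in code, with a random array standing in for a real image (random data compresses poorly, so the error shown here is pessimistic; real images have rapidly decaying singular values):

```python
import numpy as np

# Storage arithmetic for the 512x512, rank-50 example above.
m = n = 512
k = 50
print("original  :", m * n)                       # 262144 values
print("compressed:", k * (m + 1 + n))             # 51250 values, about 5:1

# Synthetic stand-in "image"; replace with a real grayscale array in practice.
img = np.random.default_rng(1).random((m, n))
U, s, Vt = np.linalg.svd(img, full_matrices=False)
img_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
rel_err = np.linalg.norm(img - img_k, 'fro') / np.linalg.norm(img, 'fro')
print("relative Frobenius error:", rel_err)
```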
In text analysis, the term-document matrix A has one row per term and one column per document; entry aᵢⱼ records how often (or how heavily weighted) term i appears in document j.
Given a centered data matrix X (n samples × p features):
Theoretical approach: the eigenvalues of XᵀX give the squared singular values σᵢ², and its eigenvectors give the principal components.
DON'T DO THIS numerically! Forming XᵀX squares the condition number and loses precision; compute the SVD of X directly, as sketched below.
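A minimal PCA-via-SVD sketch on synthetic data of my own (200 samples, 5 features with an underlying 3-dimensional structure):

```python
import numpy as np

# PCA via the SVD of the centered data matrix (no explicit X^T X).
rng = np.random.default_rng(2)
X = rng.standard_normal((200, 3)) @ rng.standard_normal((3, 5))   # low-rank structure
X += 0.01 * rng.standard_normal(X.shape)                          # small noise

Xc = X - X.mean(axis=0)                      # center each feature
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

components = Vt                              # rows of Vt are the principal directions
explained_var = s**2 / (len(Xc) - 1)         # variance along each component
scores = Xc @ Vt.T                           # data in the principal-component basis

print("explained variance ratio:", explained_var / explained_var.sum())
```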
| Algorithm | Complexity | Best For |
|---|---|---|
| Golub-Reinsch | O(mn²) | Dense matrices |
| Divide-and-conquer | O(mn²), faster in practice | Fast computation of all singular values |
| Randomized SVD | O(mnk) | Low-rank approximation |
| Lanczos/Arnoldi | roughly O(k · nnz(A)) | Sparse matrices, few singular values |
For large matrices, when only the top k singular values are needed, use an iterative (Lanczos-type) or randomized method rather than a full dense SVD; a sketch follows.
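A minimal sketch assuming SciPy is available; scipy.sparse.linalg.svds computes only k singular triplets with an iterative solver, and the random sparse matrix here is just a placeholder.

```python
import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import svds

# Top-k singular triplets of a large sparse matrix without forming a dense SVD.
A = sparse_random(10_000, 2_000, density=1e-3, format='csr', random_state=0)

k = 5
u, s, vt = svds(A, k=k)                  # k singular values/vectors only
order = np.argsort(s)[::-1]              # svds does not guarantee descending order
u, s, vt = u[:, order], s[order], vt[order, :]
print("top", k, "singular values:", s)
```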
SVD uses two different orthogonal matrices (U, V). Eigendecomposition uses one (and only works for square matrices).
Never form AᵀA to compute SVD—it squares the condition number and loses precision. Use direct SVD algorithms.
For m×n matrix A, Σ is m×n (not square). Only min(m,n) singular values exist.
The correct order is A = UΣVᵀ. Remember: U is m×m, V is n×n, and the transpose is on V.
Singular values are always non-negative. They come from √(eigenvalues of AᵀA), which are non-negative.
Verify your SVD by checking: UᵀU = I, VᵀV = I, the σᵢ are non-negative and in decreasing order, and UΣVᵀ reconstructs A.
Find SVD of :
Step 1: Compute AᵀA:
Step 2: Eigenvalues: λ = 3, 1, 0. So σ = √3, 1, 0.
Step 3: Find the eigenvectors of AᵀA (the columns of V) and compute uᵢ = Avᵢ/σᵢ for each nonzero σᵢ.
For a matrix with singular values σ = (10, 5, 2, 0.0001, 0), the exact rank is 4, but the numerical rank is effectively 3: σ₄ = 0.0001 is tiny relative to σ₁ = 10 and may just be noise.
Solve Ax = b for a given matrix A and right-hand side b:
Step 1: Compute SVD of A
Step 2: Compute x = A⁺b = VΣ⁺Uᵀb.
This gives minimum-norm least squares solution.
If A has singular values 100, 50, 10, 5, 1:
- Spectral theorem: the SVD generalizes it. For symmetric positive semi-definite A, the SVD and the eigendecomposition coincide (for general symmetric A, σᵢ = |λᵢ|).
- Polar decomposition: A = QP where Q is orthogonal and P = √(AᵀA); the SVD gives this decomposition.
- Four fundamental subspaces: the SVD reveals all four, col(A), null(Aᵀ), row(A), and null(A), via the columns of U and V.
- Projections: the projectors AA⁺ = U_rU_rᵀ (onto col(A)) and A⁺A = V_rV_rᵀ (onto row(A)) come directly from the SVD.
Any matrix factors as A = QP, where Q is orthogonal and P is symmetric positive semi-definite.
From the SVD A = UΣVᵀ: Q = UVᵀ and P = VΣVᵀ = √(AᵀA).
For A with rank r:
- col(A) = span{u₁, ..., uᵣ}
- null(Aᵀ) = span{u_{r+1}, ..., u_m}
- row(A) = span{v₁, ..., vᵣ}
- null(A) = span{v_{r+1}, ..., vₙ}
A sketch extracting these bases numerically follows.
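The sketch below pulls orthonormal bases for all four subspaces out of a full SVD; the 4×3 rank-2 example matrix is my own.

```python
import numpy as np

# Orthonormal bases for the four fundamental subspaces from the full SVD.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0],
              [0.0, 0.0, 0.0]])            # rank 2: column 3 = column 1 + column 2
U, s, Vt = np.linalg.svd(A, full_matrices=True)
r = int(np.sum(s > 1e-10 * s[0]))          # numerical rank

col_A   = U[:, :r]       # column space basis
null_At = U[:, r:]       # left null space basis
row_A   = Vt[:r, :].T    # row space basis
null_A  = Vt[r:, :].T    # null space basis

print("rank:", r)
print("A @ null_A ~ 0  :", np.allclose(A @ null_A, 0))
print("A.T @ null_At ~ 0:", np.allclose(A.T @ null_At, 0))
```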
You've mastered SVD when you can:
For matrices A and E (a perturbation): |σᵢ(A + E) - σᵢ(A)| ≤ ||E||₂ for every i (Weyl's inequality).
Singular values are stable under small perturbations.
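A quick numerical check of the bound on a random matrix and a small random perturbation:

```python
import numpy as np

# Check |sigma_i(A+E) - sigma_i(A)| <= ||E||_2 for all i.
rng = np.random.default_rng(3)
A = rng.standard_normal((8, 5))
E = 1e-3 * rng.standard_normal((8, 5))

s_A = np.linalg.svd(A, compute_uv=False)
s_AE = np.linalg.svd(A + E, compute_uv=False)
print("max |sigma_i(A+E) - sigma_i(A)|:", np.abs(s_AE - s_A).max())
print("||E||_2                        :", np.linalg.norm(E, 2))
```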
SVD is numerically well-behaved: standard algorithms are backward stable, and the computed singular values carry absolute errors on the order of ||A||₂ times machine epsilon.
The rank-k truncated SVD retains only the k largest singular values: A_k = U_kΣ_kV_kᵀ,
where U_k is m×k, Σ_k is k×k, and V_k is n×k.
The Netflix Prize used matrix factorization (related to SVD): missing entries of the sparse user-movie rating matrix are predicted from a learned low-rank model.
The singular values determine all unitarily invariant norms, for example ||A||₂ = σ₁, ||A||_F = √(σ₁² + ... + σᵣ²), and the nuclear norm ||A||_* = σ₁ + ... + σᵣ.
SVD for noise reduction: compute the SVD of the noisy data, keep only the singular values above the noise level, and reconstruct; the discarded small singular values mostly carry noise. A sketch follows.
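A denoising sketch on synthetic data of my own: an exactly rank-3 signal plus noise, truncated back to rank 3.

```python
import numpy as np

# Truncated-SVD denoising of a low-rank signal corrupted by noise.
rng = np.random.default_rng(4)
signal = rng.standard_normal((100, 3)) @ rng.standard_normal((3, 80))   # exact rank 3
noisy = signal + 0.05 * rng.standard_normal(signal.shape)

U, s, Vt = np.linalg.svd(noisy, full_matrices=False)
k = 3
denoised = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

def rel_err(X):
    return np.linalg.norm(X - signal, 'fro') / np.linalg.norm(signal, 'fro')

print("relative error, noisy   :", rel_err(noisy))
print("relative error, denoised:", rel_err(denoised))
```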
| Notation | Meaning |
|---|---|
| A = UΣVᵀ | SVD factorization |
| σᵢ | i-th singular value |
| uᵢ, vᵢ | Left/right singular vectors |
| A⁺ | Moore-Penrose pseudoinverse |
| A_k | Rank-k truncated SVD |
| κ(A) | Condition number |
The SVD was first developed by Eugenio Beltrami (1873) and Camille Jordan (1874) for bilinear forms. The modern computational algorithms were developed by Gene Golub and others in the 1960s. Today, SVD is computed billions of times daily in applications from Google searches to Netflix recommendations.
Find the SVD of :
Solution outline:
For , find:
Show that for any matrix A:
To master SVD:
In collaborative filtering for recommendation: factor the sparse user-item rating matrix into low-rank user and item factors; predicted ratings are inner products of these factors.
Classic face recognition using SVD (eigenfaces): stack vectorized face images as columns, center them, and take the top left singular vectors as "eigenfaces"; faces are then compared by their coordinates in this low-dimensional basis.
Latent Semantic Indexing for document search: take a truncated SVD of the term-document matrix; queries and documents are compared in the resulting low-dimensional "concept" space, which handles synonymy better than raw term matching.
SVD is foundational to many ML algorithms, including PCA, latent semantic analysis, spectral clustering, and low-rank matrix completion.
SVD for signal separation: arrange the measured signals in a matrix; the dominant singular subspaces capture the strong underlying components, while the weak directions capture noise and interference.
Why SVD algorithms avoid forming AᵀA:
For a matrix with σ = (1, 10⁻⁸): AᵀA has eigenvalues 1 and 10⁻¹⁶, and 10⁻¹⁶ sits at the level of double-precision round-off, so the small singular value is effectively destroyed; working with A directly preserves it (see the sketch below).
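A small demonstration: build a matrix with σ = (1, 10⁻⁸) from random orthogonal factors, then recover the singular values directly versus via the eigenvalues of AᵀA.

```python
import numpy as np

# Precision loss from forming A^T A when sigma = (1, 1e-8).
rng = np.random.default_rng(5)
Q1, _ = np.linalg.qr(rng.standard_normal((2, 2)))     # random orthogonal factors
Q2, _ = np.linalg.qr(rng.standard_normal((2, 2)))
A = Q1 @ np.diag([1.0, 1e-8]) @ Q2.T

direct = np.linalg.svd(A, compute_uv=False)
via_AtA = np.sqrt(np.abs(np.linalg.eigvalsh(A.T @ A)))[::-1]

print("direct SVD:", direct)       # recovers the 1e-8 singular value accurately
print("via A^T A :", via_AtA)      # small value comes from an eigenvalue near round-off
                                   # and may be badly wrong
```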
| Situation | Best Algorithm |
|---|---|
| Small dense matrix, all σᵢ needed | Golub-Reinsch or D&C |
| Large dense matrix, few σᵢ needed | Randomized SVD |
| Large sparse matrix | Lanczos/ARPACK |
| Streaming data | Incremental SVD |
The SVD is the most important matrix decomposition because it works for ANY matrix and reveals its complete structure. It provides the optimal low-rank approximation, the pseudoinverse, all matrix norms, and the four fundamental subspaces.
A = UΣVᵀ decomposes any matrix into rotations and scaling.
Truncated SVD gives THE best low-rank approximation: by Eckart-Young, no other rank-k matrix has smaller error.
Image compression, Netflix recommendations, Google search, and countless more.
SVD leads naturally to:
SVD works for ANY matrix (any shape, any rank), provides optimal low-rank approximations, reveals rank, gives the pseudoinverse, and connects to eigenvalues. It's the 'Swiss army knife' of matrix decompositions.
For symmetric positive semi-definite A, SVD and eigendecomposition coincide. For general A: singular values are √(eigenvalues of AᵀA), right singular vectors are eigenvectors of AᵀA, left singular vectors are eigenvectors of AAᵀ.
Any linear map can be decomposed as: (1) rotate/reflect by Vᵀ, (2) scale along axes by Σ, (3) rotate/reflect by U. SVD reveals the 'principal axes' of the transformation.
An image is a matrix of pixel values. Keep only top k singular values/vectors to get a rank-k approximation. This compresses the image while preserving the most important visual features.
PCA on data matrix X uses SVD of the centered data. Principal components are columns of V (right singular vectors). Explained variance comes from squared singular values.
Theoretically: find eigenvalues/vectors of AᵀA and AAᵀ. Practically: use iterative algorithms (Golub-Reinsch, divide-and-conquer). Never form AᵀA explicitly—it loses precision!
A⁺ solves least squares: x = A⁺b gives min-norm least squares solution. Works for any matrix (not just invertible). A⁺ = VΣ⁺Uᵀ where Σ⁺ inverts nonzero singular values.
Eckart-Young theorem: The rank-k truncated SVD minimizes ||A - B||₂ and ||A - B||_F over all rank-k matrices B. No other rank-k matrix gets closer to A.
κ(A) = σ₁/σₙ measures sensitivity to perturbations. Large κ means ill-conditioned: small input changes cause large output changes. Affects numerical stability.
No! Singular values are always non-negative (they're square roots of eigenvalues of the positive semidefinite matrix AᵀA). By convention, they're ordered σ₁ ≥ σ₂ ≥ ... ≥ 0.