The Singular Value Decomposition (SVD) is perhaps the most important matrix factorization in applied mathematics. It works for any matrix, reveals its fundamental structure, and provides optimal low-rank approximations used in image compression, recommendation systems, and data analysis.
Every m×n matrix A can be factored as A = UΣVᵀ, where:
- U is an m×m orthogonal matrix whose columns uᵢ are the left singular vectors,
- Σ is an m×n diagonal matrix with non-negative entries σ₁ ≥ σ₂ ≥ ... ≥ 0 (the singular values),
- V is an n×n orthogonal matrix whose columns vᵢ are the right singular vectors.
Find SVD of .
Since A is already diagonal with positive entries, U = V = I (up to reordering) and Σ = A.
Singular values: the diagonal entries of A, listed in decreasing order.
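A quick numerical check of the factorization, as a minimal sketch with NumPy (the matrix below is an arbitrary stand-in, not the one from the worked example):

```python
import numpy as np

# Arbitrary example matrix (not the one from the worked example above).
A = np.array([[3.0, 0.0],
              [0.0, 2.0],
              [0.0, 0.0]])

# Full SVD: U is 3x3, s holds min(m, n) singular values, Vt is 2x2.
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# Rebuild the m x n Sigma and verify A = U Sigma V^T.
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)
print("singular values:", s)                        # [3. 2.]
print("reconstruction ok:", np.allclose(A, U @ Sigma @ Vt))
print("U orthogonal:", np.allclose(U.T @ U, np.eye(3)))
print("V orthogonal:", np.allclose(Vt @ Vt.T, np.eye(2)))
```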
Proof of SVD Existence:
1. AᵀA is symmetric positive semi-definite, so it has eigenvalues λ₁ ≥ λ₂ ≥ ... ≥ λₙ ≥ 0.
2. Let σᵢ = √λᵢ, and let v₁, ..., vₙ be orthonormal eigenvectors of AᵀA.
3. For σᵢ > 0, define uᵢ = Avᵢ/σᵢ. These are orthonormal.
4. Extend {uᵢ} and {vᵢ} to orthonormal bases of ℝᵐ and ℝⁿ.
5. Verify Avᵢ = σᵢuᵢ for every i, hence AV = UΣ and A = UΣVᵀ.
Find SVD of :
Step 1: Compute AᵀA.
Step 2: Find the eigenvalues λᵢ of AᵀA; the singular values are σᵢ = √λᵢ.
Step 3: Right singular vectors: the orthonormal eigenvectors vᵢ of AᵀA form the columns of V.
Step 4: Left singular vectors: for each σᵢ > 0, uᵢ = Avᵢ/σᵢ; these form the columns of U.
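The same steps carried out numerically, as a pedagogical sketch on an arbitrary 3×2 example of my own choosing (in practice you would call np.linalg.svd directly; see the warning about forming AᵀA later in these notes):

```python
import numpy as np

# Pedagogical construction following Steps 1-4 above.
A = np.array([[1.0, 1.0],
              [0.0, 1.0],
              [1.0, 0.0]])                 # arbitrary 3x2 example, rank 2

AtA = A.T @ A                               # Step 1: A^T A
lam, V = np.linalg.eigh(AtA)                # Steps 2-3: eigenpairs (ascending order)
order = np.argsort(lam)[::-1]               # reorder so sigma_1 >= sigma_2
lam, V = lam[order], V[:, order]
sigma = np.sqrt(lam)                        # singular values sqrt(3) and 1

U_thin = (A @ V) / sigma                    # Step 4: u_i = A v_i / sigma_i (all sigma_i > 0 here)

print("sigma:", sigma)
print("reconstructs A:", np.allclose(A, U_thin @ np.diag(sigma) @ V.T))
print("matches NumPy :", np.allclose(sigma, np.linalg.svd(A, compute_uv=False)))
```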
Find SVD of (3×1 matrix):
Here AᵀA is 1×1 and equals ||A||², so σ₁ = ||A|| and V = [1] (1×1).
The first left singular vector is u₁ = A/σ₁; extend it to an orthonormal basis of ℝ³ to fill out U.
The factorization A = UΣVᵀ means multiplying by A is equivalent to:
1. Rotate/reflect by Vᵀ,
2. Scale along the coordinate axes by Σ,
3. Rotate/reflect by U.
A maps the unit sphere to an ellipsoid. The singular values are the semi-axis lengths, and the singular vectors are the axis directions.
For :
The unit circle {x² + y² = 1} is mapped to an ellipse with semi-axis lengths σ₁ and σ₂ along the directions u₁ and u₂.
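A numerical check of this picture, as a sketch on an arbitrary 2×2 matrix of my own choosing: the longest image of a unit vector has length σ₁ and points along u₁.

```python
import numpy as np

# Map sampled points of the unit circle through A and measure the stretching.
A = np.array([[2.0, 1.0],
              [0.0, 1.0]])                              # arbitrary example
U, s, Vt = np.linalg.svd(A)

theta = np.linspace(0, 2 * np.pi, 2000)
circle = np.vstack([np.cos(theta), np.sin(theta)])      # points on the unit circle
image = A @ circle                                       # their images under A

lengths = np.linalg.norm(image, axis=0)
print("max stretch:", lengths.max(), "vs sigma_1 =", s[0])
print("min stretch:", lengths.min(), "vs sigma_2 =", s[1])

# The longest image vector is (up to sign) sigma_1 * u_1.
longest = image[:, lengths.argmax()]
print("aligned with u_1:", np.isclose(abs(longest @ U[:, 0]), s[0], atol=1e-3))
```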
The spectral norm is the maximum stretching factor: ||A||₂ = max_{x≠0} ||Ax||/||x|| = σ₁.
The maximum is achieved when x = v₁ (the first right singular vector).
The condition number of A is κ(A) = σ₁/σᵣ,
where σᵣ is the smallest nonzero singular value. It measures sensitivity of solutions to perturbations in the data.
For :
κ(A) ≈ 1000 (ill-conditioned).
A 0.1% error in the input could cause a 100% error in the output!
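Both quantities come straight from the singular values; a minimal NumPy sketch on an arbitrary ill-conditioned example:

```python
import numpy as np

# Spectral norm and condition number from the singular values.
A = np.array([[1.0, 1.0],
              [1.0, 1.001]])                 # arbitrary nearly singular example
s = np.linalg.svd(A, compute_uv=False)

print("||A||_2 :", np.linalg.norm(A, 2), "= sigma_1 =", s[0])
print("kappa(A):", np.linalg.cond(A), "= sigma_1/sigma_min =", s[0] / s[-1])
# Rule of thumb: a relative input error eps can be amplified by up to kappa(A).
```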
The best rank-k approximation to A (in Frobenius or spectral norm) is:
A_k = σ₁u₁v₁ᵀ + σ₂u₂v₂ᵀ + ... + σ_ku_kv_kᵀ = U_kΣ_kV_kᵀ.
The approximation errors are ||A - A_k||₂ = σ_{k+1} and ||A - A_k||_F = √(σ_{k+1}² + ... + σᵣ²).
For with :
Proof of Eckart-Young (Spectral Norm):
Let B be any rank-k matrix. We show ||A - B||₂ ≥ σ_{k+1}.
Since dim(null(B)) ≥ n - k and dim(span{v₁,...,v_{k+1}}) = k + 1, the two subspaces of ℝⁿ have dimensions summing to more than n, so there exists a nonzero w ∈ null(B) ∩ span{v₁,...,v_{k+1}}.
Then Bw = 0 and ||Aw|| ≥ σ_{k+1}||w||, so:
||A - B||₂ ≥ ||(A - B)w|| / ||w|| = ||Aw|| / ||w|| ≥ σ_{k+1}.
Equality is achieved by B = A_k.
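A numerical check of the theorem, as a sketch on a random matrix: the truncation error matches σ_{k+1} in the spectral norm and the tail-sum formula in the Frobenius norm.

```python
import numpy as np

# Verify the Eckart-Young error formulas for a rank-k truncation.
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]          # best rank-k approximation

spec_err = np.linalg.norm(A - A_k, 2)
frob_err = np.linalg.norm(A - A_k, 'fro')
print("spectral error :", spec_err, "vs sigma_{k+1} =", s[k])
print("Frobenius error:", frob_err, "vs", np.sqrt(np.sum(s[k:] ** 2)))
```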
The Eckart-Young theorem is remarkable:
For a matrix with singular values σ₁ ≥ σ₂ ≥ ... ≥ σᵣ:
Frobenius errors: ||A - A_k||_F = √(σ_{k+1}² + ... + σᵣ²) for each k, so the error drops as each successive singular value is included.
For an m×n matrix of rank r, full storage is mn values, i.e. O(mn).
A rank-k approximation stores U_k (m×k), the k singular values, and V_k (n×k): k(m + n + 1) values, i.e. O((m + n)k).
Compression ratio: mn / (k(m + n + 1)). For k ≪ min(m, n), this is a huge saving!
The relative error in a rank-k approximation is:
||A - A_k||_F / ||A||_F = √(σ_{k+1}² + ... + σᵣ²) / √(σ₁² + ... + σᵣ²).
This measures what fraction of "energy" is lost.
How to choose k? Common strategies:
- Energy threshold: keep the smallest k whose retained energy σ₁² + ... + σ_k² reaches, say, 90-99% of the total.
- Scree plot: look for an "elbow" where the singular values drop off sharply.
- Budget: pick k from a storage or compute limit, then check the resulting error.
A minimal sketch of the energy-threshold rule appears below.
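The sketch below implements the energy-threshold rule; the helper name choose_rank and the example singular values are my own, chosen for illustration.

```python
import numpy as np

def choose_rank(singular_values, energy=0.95):
    """Smallest k whose retained energy sum(sigma_i^2) reaches the given fraction."""
    s2 = np.asarray(singular_values) ** 2
    cumulative = np.cumsum(s2) / s2.sum()
    return int(np.searchsorted(cumulative, energy) + 1)

s = np.array([100.0, 50.0, 10.0, 5.0, 1.0])
print(choose_rank(s))        # prints 2: the first two values carry >95% of the energy
```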
- Image compression: keep the top k singular values; rank 50 is often visually indistinguishable from the original.
- Recommendation systems: matrix factorization for Netflix/Amazon; low-rank structure captures user-item preferences.
- PCA: SVD of centered data gives the principal components, a dimensionality reduction that preserves variance.
- Pseudoinverse: A⁺ = VΣ⁺Uᵀ solves least squares for any matrix.
The pseudoinverse is defined via the SVD: A⁺ = VΣ⁺Uᵀ,
where has diagonal entries (for nonzero σᵢ) and 0 otherwise.
The pseudoinverse satisfies the four Moore-Penrose conditions: AA⁺A = A, A⁺AA⁺ = A⁺, (AA⁺)ᵀ = AA⁺, and (A⁺A)ᵀ = A⁺A.
For :
SVD:
, so
For Ax = b (possibly overdetermined or rank-deficient): x = A⁺b.
This gives the minimum-norm least squares solution.
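A sketch of both ideas on a made-up rank-deficient system (the matrix and vector below are my own example): build A⁺ from the SVD, then compare against NumPy's pinv and lstsq.

```python
import numpy as np

# Pseudoinverse via the SVD and the minimum-norm least-squares solution.
# Rank-deficient example: column 3 = column 1 + column 2.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0],
              [2.0, 1.0, 3.0]])
b = np.array([1.0, 2.0, 3.0, 4.0])

U, s, Vt = np.linalg.svd(A, full_matrices=False)
tol = 1e-10 * s[0]                                     # simple cutoff for this example
s_plus = np.array([1.0 / si if si > tol else 0.0 for si in s])
A_plus = Vt.T @ np.diag(s_plus) @ U.T                  # A+ = V Sigma+ U^T

x = A_plus @ b                                         # minimum-norm least squares solution
print("matches np.linalg.pinv :", np.allclose(A_plus, np.linalg.pinv(A)))
print("matches np.linalg.lstsq:", np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))
```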
A 512×512 grayscale image has 262,144 values.
With rank-50 SVD approximation: 50×(512+1+512) = 51,250 values.
Compression ratio: 5:1 while retaining main features!
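The same storage arithmetic in code, with a random array standing in for a real image (random data compresses poorly, so the error shown here is pessimistic; real images have rapidly decaying singular values):

```python
import numpy as np

# Storage arithmetic for the 512x512, rank-50 example above.
m = n = 512
k = 50
print("original  :", m * n)                       # 262144 values
print("compressed:", k * (m + 1 + n))             # 51250 values, about 5:1

# Synthetic stand-in "image"; replace with a real grayscale array in practice.
img = np.random.default_rng(1).random((m, n))
U, s, Vt = np.linalg.svd(img, full_matrices=False)
img_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
rel_err = np.linalg.norm(img - img_k, 'fro') / np.linalg.norm(img, 'fro')
print("relative Frobenius error:", rel_err)
```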
In text analysis, the term-document matrix A has one row per term and one column per document; entry aᵢⱼ records how often (or how heavily weighted) term i appears in document j.
Given a centered data matrix X (n samples × p features):
Theoretical approach: the eigenvalues of XᵀX give the squared singular values σᵢ², and its eigenvectors give the principal components.
DON'T DO THIS numerically! Forming XᵀX squares the condition number and loses precision; compute the SVD of X directly, as sketched below.
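A minimal PCA-via-SVD sketch on synthetic data of my own (200 samples, 5 features with an underlying 3-dimensional structure):

```python
import numpy as np

# PCA via the SVD of the centered data matrix (no explicit X^T X).
rng = np.random.default_rng(2)
X = rng.standard_normal((200, 3)) @ rng.standard_normal((3, 5))   # low-rank structure
X += 0.01 * rng.standard_normal(X.shape)                          # small noise

Xc = X - X.mean(axis=0)                      # center each feature
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

components = Vt                              # rows of Vt are the principal directions
explained_var = s**2 / (len(Xc) - 1)         # variance along each component
scores = Xc @ Vt.T                           # data in the principal-component basis

print("explained variance ratio:", explained_var / explained_var.sum())
```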
| Algorithm | Complexity | Best For |
|---|---|---|
| Golub-Reinsch | O(mn²) | Dense matrices |
| Divide-and-conquer | O(mn²), faster in practice | Fast computation of all singular values |
| Randomized SVD | O(mnk) | Low-rank approximation |
| Lanczos/Arnoldi | roughly O(k · nnz(A)) | Sparse matrices, few singular values |
For large matrices, when only the top k singular values are needed, use an iterative (Lanczos-type) or randomized method rather than a full dense SVD; a sketch follows.
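A minimal sketch assuming SciPy is available; scipy.sparse.linalg.svds computes only k singular triplets with an iterative solver, and the random sparse matrix here is just a placeholder.

```python
import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import svds

# Top-k singular triplets of a large sparse matrix without forming a dense SVD.
A = sparse_random(10_000, 2_000, density=1e-3, format='csr', random_state=0)

k = 5
u, s, vt = svds(A, k=k)                  # k singular values/vectors only
order = np.argsort(s)[::-1]              # svds does not guarantee descending order
u, s, vt = u[:, order], s[order], vt[order, :]
print("top", k, "singular values:", s)
```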
SVD uses two different orthogonal matrices (U, V). Eigendecomposition uses one (and only works for square matrices).
Never form AᵀA to compute SVD—it squares the condition number and loses precision. Use direct SVD algorithms.
For m×n matrix A, Σ is m×n (not square). Only min(m,n) singular values exist.
The correct order is A = UΣVᵀ. Remember: U is m×m, V is n×n, and the transpose is on V.
Singular values are always non-negative. They come from √(eigenvalues of AᵀA), which are non-negative.
Verify your SVD by checking: UᵀU = I, VᵀV = I, the σᵢ are non-negative and in decreasing order, and UΣVᵀ reconstructs A.
Find SVD of :
Step 1: Compute AᵀA:
Step 2: Eigenvalues: λ = 3, 1, 0. So σ = √3, 1, 0.
Step 3: Find the eigenvectors of AᵀA (the columns of V) and compute uᵢ = Avᵢ/σᵢ for each nonzero σᵢ.
For a matrix with singular values σ = (10, 5, 2, 0.0001, 0), the exact rank is 4, but the numerical rank is effectively 3: σ₄ = 0.0001 is tiny relative to σ₁ = 10 and may just be noise.
Solve Ax = b for a given matrix A and right-hand side b:
Step 1: Compute SVD of A
Step 2: Compute x = A⁺b = VΣ⁺Uᵀb.
This gives minimum-norm least squares solution.
If A has singular values 100, 50, 10, 5, 1:
- Spectral theorem: the SVD generalizes it. For symmetric positive semi-definite A, the SVD and the eigendecomposition coincide (for general symmetric A, σᵢ = |λᵢ|).
- Polar decomposition: A = QP where Q is orthogonal and P = √(AᵀA); the SVD gives this decomposition.
- Four fundamental subspaces: the SVD reveals all four, col(A), null(Aᵀ), row(A), and null(A), via the columns of U and V.
- Projections: the projectors AA⁺ = U_rU_rᵀ (onto col(A)) and A⁺A = V_rV_rᵀ (onto row(A)) come directly from the SVD.
Any matrix factors as A = QP, where Q is orthogonal and P is symmetric positive semi-definite.
From the SVD A = UΣVᵀ: Q = UVᵀ and P = VΣVᵀ = √(AᵀA).
For A with rank r:
- col(A) = span{u₁, ..., uᵣ}
- null(Aᵀ) = span{u_{r+1}, ..., u_m}
- row(A) = span{v₁, ..., vᵣ}
- null(A) = span{v_{r+1}, ..., vₙ}
A sketch extracting these bases numerically follows.
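The sketch below pulls orthonormal bases for all four subspaces out of a full SVD; the 4×3 rank-2 example matrix is my own.

```python
import numpy as np

# Orthonormal bases for the four fundamental subspaces from the full SVD.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0],
              [0.0, 0.0, 0.0]])            # rank 2: column 3 = column 1 + column 2
U, s, Vt = np.linalg.svd(A, full_matrices=True)
r = int(np.sum(s > 1e-10 * s[0]))          # numerical rank

col_A   = U[:, :r]       # column space basis
null_At = U[:, r:]       # left null space basis
row_A   = Vt[:r, :].T    # row space basis
null_A  = Vt[r:, :].T    # null space basis

print("rank:", r)
print("A @ null_A ~ 0  :", np.allclose(A @ null_A, 0))
print("A.T @ null_At ~ 0:", np.allclose(A.T @ null_At, 0))
```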
You've mastered SVD when you can:
For matrices A and E (a perturbation): |σᵢ(A + E) - σᵢ(A)| ≤ ||E||₂ for every i (Weyl's inequality).
Singular values are stable under small perturbations.
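A quick numerical check of the bound on a random matrix and a small random perturbation:

```python
import numpy as np

# Check |sigma_i(A+E) - sigma_i(A)| <= ||E||_2 for all i.
rng = np.random.default_rng(3)
A = rng.standard_normal((8, 5))
E = 1e-3 * rng.standard_normal((8, 5))

s_A = np.linalg.svd(A, compute_uv=False)
s_AE = np.linalg.svd(A + E, compute_uv=False)
print("max |sigma_i(A+E) - sigma_i(A)|:", np.abs(s_AE - s_A).max())
print("||E||_2                        :", np.linalg.norm(E, 2))
```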
SVD is numerically well-behaved: standard algorithms are backward stable, and the computed singular values carry absolute errors on the order of ||A||₂ times machine epsilon.
The rank-k truncated SVD retains only the k largest singular values: A_k = U_kΣ_kV_kᵀ,
where U_k is m×k, Σ_k is k×k, and V_k is n×k.
The Netflix Prize used matrix factorization (related to SVD): missing entries of the sparse user-movie rating matrix are predicted from a learned low-rank model.
The singular values determine all unitarily invariant norms, for example ||A||₂ = σ₁, ||A||_F = √(σ₁² + ... + σᵣ²), and the nuclear norm ||A||_* = σ₁ + ... + σᵣ.
SVD for noise reduction: compute the SVD of the noisy data, keep only the singular values above the noise level, and reconstruct; the discarded small singular values mostly carry noise. A sketch follows.
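A denoising sketch on synthetic data of my own: an exactly rank-3 signal plus noise, truncated back to rank 3.

```python
import numpy as np

# Truncated-SVD denoising of a low-rank signal corrupted by noise.
rng = np.random.default_rng(4)
signal = rng.standard_normal((100, 3)) @ rng.standard_normal((3, 80))   # exact rank 3
noisy = signal + 0.05 * rng.standard_normal(signal.shape)

U, s, Vt = np.linalg.svd(noisy, full_matrices=False)
k = 3
denoised = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

def rel_err(X):
    return np.linalg.norm(X - signal, 'fro') / np.linalg.norm(signal, 'fro')

print("relative error, noisy   :", rel_err(noisy))
print("relative error, denoised:", rel_err(denoised))
```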
| Notation | Meaning |
|---|---|
| A = UΣVᵀ | SVD factorization |
| σᵢ | i-th singular value |
| uᵢ, vᵢ | Left/right singular vectors |
| A⁺ | Moore-Penrose pseudoinverse |
| A_k | Rank-k truncated SVD |
| κ(A) | Condition number |
The SVD was first developed by Eugenio Beltrami (1873) and Camille Jordan (1874) for bilinear forms. The modern computational algorithms were developed by Gene Golub and others in the 1960s. Today, SVD is computed billions of times daily in applications from Google searches to Netflix recommendations.
Find the SVD of :
Solution outline:
For , find:
Show that for any matrix A:
To master SVD:
In collaborative filtering for recommendation: factor the sparse user-item rating matrix into low-rank user and item factors; predicted ratings are inner products of these factors.
Classic face recognition using SVD (eigenfaces): stack vectorized face images as columns, center them, and take the top left singular vectors as "eigenfaces"; faces are then compared by their coordinates in this low-dimensional basis.
Latent Semantic Indexing for document search: take a truncated SVD of the term-document matrix; queries and documents are compared in the resulting low-dimensional "concept" space, which handles synonymy better than raw term matching.
SVD is foundational to many ML algorithms, including PCA, latent semantic analysis, spectral clustering, and low-rank matrix completion.
SVD for signal separation: arrange the measured signals in a matrix; the dominant singular subspaces capture the strong underlying components, while the weak directions capture noise and interference.
Why SVD algorithms avoid forming AᵀA:
For a matrix with σ = (1, 10⁻⁸): AᵀA has eigenvalues 1 and 10⁻¹⁶, and 10⁻¹⁶ sits at the level of double-precision round-off, so the small singular value is effectively destroyed; working with A directly preserves it (see the sketch below).
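A small demonstration: build a matrix with σ = (1, 10⁻⁸) from random orthogonal factors, then recover the singular values directly versus via the eigenvalues of AᵀA.

```python
import numpy as np

# Precision loss from forming A^T A when sigma = (1, 1e-8).
rng = np.random.default_rng(5)
Q1, _ = np.linalg.qr(rng.standard_normal((2, 2)))     # random orthogonal factors
Q2, _ = np.linalg.qr(rng.standard_normal((2, 2)))
A = Q1 @ np.diag([1.0, 1e-8]) @ Q2.T

direct = np.linalg.svd(A, compute_uv=False)
via_AtA = np.sqrt(np.abs(np.linalg.eigvalsh(A.T @ A)))[::-1]

print("direct SVD:", direct)       # recovers the 1e-8 singular value accurately
print("via A^T A :", via_AtA)      # small value comes from an eigenvalue near round-off
                                   # and may be badly wrong
```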
| Situation | Best Algorithm |
|---|---|
| Small dense matrix, all σᵢ needed | Golub-Reinsch or D&C |
| Large dense matrix, few σᵢ needed | Randomized SVD |
| Large sparse matrix | Lanczos/ARPACK |
| Streaming data | Incremental SVD |
The SVD is the most important matrix decomposition because it works for ANY matrix and reveals its complete structure. It provides the optimal low-rank approximation, the pseudoinverse, all matrix norms, and the four fundamental subspaces.
A = UΣVᵀ decomposes any matrix into rotations and scaling.
Truncated SVD gives THE best low-rank approximation: by Eckart-Young, no other rank-k matrix has smaller error.
Image compression, Netflix recommendations, Google search, and countless more.
SVD leads naturally to:
SVD works for ANY matrix (any shape, any rank), provides optimal low-rank approximations, reveals rank, gives the pseudoinverse, and connects to eigenvalues. It's the 'Swiss army knife' of matrix decompositions.
For symmetric positive semi-definite A, SVD and eigendecomposition coincide. For general A: singular values are √(eigenvalues of AᵀA), right singular vectors are eigenvectors of AᵀA, left singular vectors are eigenvectors of AAᵀ.
Any linear map can be decomposed as: (1) rotate/reflect by Vᵀ, (2) scale along axes by Σ, (3) rotate/reflect by U. SVD reveals the 'principal axes' of the transformation.
An image is a matrix of pixel values. Keep only top k singular values/vectors to get a rank-k approximation. This compresses the image while preserving the most important visual features.
PCA on data matrix X uses SVD of the centered data. Principal components are columns of V (right singular vectors). Explained variance comes from squared singular values.
Theoretically: find eigenvalues/vectors of AᵀA and AAᵀ. Practically: use iterative algorithms (Golub-Reinsch, divide-and-conquer). Never form AᵀA explicitly—it loses precision!
A⁺ solves least squares: x = A⁺b gives min-norm least squares solution. Works for any matrix (not just invertible). A⁺ = VΣ⁺Uᵀ where Σ⁺ inverts nonzero singular values.
Eckart-Young theorem: The rank-k truncated SVD minimizes ||A - B||₂ and ||A - B||_F over all rank-k matrices B. No other rank-k matrix gets closer to A.
κ(A) = σ₁/σₙ measures sensitivity to perturbations. Large κ means ill-conditioned: small input changes cause large output changes. Affects numerical stability.
No! Singular values are always non-negative (they're square roots of eigenvalues of the positive semidefinite matrix AᵀA). By convention, they're ordered σ₁ ≥ σ₂ ≥ ... ≥ 0.