LA-8.1: Singular Value Decomposition

The Singular Value Decomposition (SVD) is perhaps the most important matrix factorization in applied mathematics. It works for any matrix, reveals its fundamental structure, and provides optimal low-rank approximations used in image compression, recommendation systems, and data analysis.

Learning Objectives
  • State and prove the SVD theorem
  • Understand the geometric interpretation of SVD
  • Compute SVD for simple matrices
  • Use SVD for low-rank approximation
  • Apply SVD to image compression
  • Understand the connection to eigenvalues
  • Compute the pseudoinverse using SVD
  • Apply SVD to data analysis and PCA
Prerequisites
  • Spectral theorem and orthogonal diagonalization (LA-7.5)
  • Eigenvalues and eigenvectors (LA-6.1-6.2)
  • Orthonormal bases (LA-7.2)
  • Matrix rank and null space (LA-4.5)

1. The SVD Theorem

Theorem 8.1: Singular Value Decomposition

Every matrix A \in M_{m \times n}(\mathbb{R}) can be factored as:

A = U \Sigma V^T

where:

  • U \in M_{m \times m} is orthogonal (columns are left singular vectors)
  • V \in M_{n \times n} is orthogonal (columns are right singular vectors)
  • \Sigma \in M_{m \times n} is diagonal with singular values \sigma_1 \geq \sigma_2 \geq \cdots \geq 0
Remark 8.1: Key Properties
  • Singular values are unique; singular vectors are unique up to sign (for distinct σᵢ)
  • Number of nonzero singular values = rank(A)
  • \sigma_i = \sqrt{\lambda_i(A^T A)} = \sqrt{\lambda_i(AA^T)}
Definition 8.1: Singular Values and Vectors
  • Singular values: \sigma_i are the diagonal entries of \Sigma
  • Right singular vectors: columns v_i of V (eigenvectors of A^T A)
  • Left singular vectors: columns u_i of U (eigenvectors of AA^T)
Example 8.1: SVD of 2×2 Matrix

Find the SVD of A = \begin{pmatrix} 3 & 0 \\ 0 & 2 \end{pmatrix}.

Since A is already diagonal with positive entries:

U = I, \quad \Sigma = \begin{pmatrix} 3 & 0 \\ 0 & 2 \end{pmatrix}, \quad V = I

Singular values: \sigma_1 = 3, \sigma_2 = 2.
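
As a quick check of Example 8.1, here is a minimal sketch using NumPy (the choice of library is mine, not the lesson's). Note that np.linalg.svd returns U, the vector of singular values, and V^T rather than V.

```python
import numpy as np

# Example 8.1 checked numerically: singular values come back sorted descending.
A = np.array([[3.0, 0.0],
              [0.0, 2.0]])
U, s, Vt = np.linalg.svd(A)

print(s)                                    # [3. 2.]
print(np.allclose(A, U @ np.diag(s) @ Vt))  # True: A = U Σ V^T
```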

Proof:

Proof of SVD Existence:

1. A^T A is symmetric positive semi-definite, so has eigenvalues \lambda_1 \geq \cdots \geq \lambda_n \geq 0.

2. Let \sigma_i = \sqrt{\lambda_i} and v_i be orthonormal eigenvectors of A^T A.

3. For \sigma_i > 0, define u_i = Av_i / \sigma_i. These are orthonormal, since u_i^T u_j = \frac{v_i^T A^T A v_j}{\sigma_i \sigma_j} = \frac{\lambda_j v_i^T v_j}{\sigma_i \sigma_j} = \delta_{ij}.

4. Extend to orthonormal bases of \mathbb{R}^m and \mathbb{R}^n.

5. Verify AV = U\Sigma, hence A = U\Sigma V^T.

Example 8.1a: SVD Computation

Find the SVD of A = \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix}:

Step 1: Compute A^T A = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}

Step 2: Eigenvalues: \lambda_1 = 2, \lambda_2 = 0, so \sigma_1 = \sqrt{2}, \sigma_2 = 0

Step 3: Right singular vectors: v_1 = (1, 1)^T/\sqrt{2}, v_2 = (1, -1)^T/\sqrt{2}

Step 4: Left singular vectors: u_1 = Av_1/\sigma_1 = (1, 0)^T, u_2 = (0, 1)^T

Theorem 8.1a: Relationship to Eigenvalues

For A \in M_{m \times n}:

  • Nonzero eigenvalues of A^T A and AA^T are the same (= \sigma_i^2)
  • If A is square, |\det(A)| = \prod \sigma_i
  • If A is symmetric positive semi-definite, singular values = eigenvalues
Remark 8.1a: Full vs Reduced SVD
  • Full SVD: U is m×m, Σ is m×n, V is n×n
  • Reduced/Thin SVD: U is m×r, Σ is r×r, V is n×r (r = rank)
  • Reduced form is more storage-efficient and commonly used in practice
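
To make the shapes in Remark 8.1a concrete, here is a short NumPy sketch of my own. The full_matrices flag switches between the full and thin forms; NumPy's thin SVD keeps min(m, n) columns, and truncating further to the nonzero σᵢ gives the rank-r form described above.

```python
import numpy as np

A = np.random.default_rng(0).normal(size=(5, 3))   # a 5×3 example matrix

# Full SVD: U is 5×5, Vt is 3×3; Σ is stored as the length-3 vector s.
U, s, Vt = np.linalg.svd(A, full_matrices=True)
print(U.shape, s.shape, Vt.shape)                  # (5, 5) (3,) (3, 3)

# Reduced/thin SVD: U is 5×3, which is cheaper to store yet rebuilds A exactly.
Ur, sr, Vtr = np.linalg.svd(A, full_matrices=False)
print(Ur.shape, Vtr.shape)                         # (5, 3) (3, 3)
print(np.allclose(A, Ur @ np.diag(sr) @ Vtr))      # True
```
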
Example 8.1b: Rectangular Matrix SVD

Find the SVD of A = \begin{pmatrix} 1 \\ 2 \\ 2 \end{pmatrix} (3×1 matrix):

A^T A = (9), so \sigma_1 = 3

V = (1) (1×1)

u_1 = A/3 = (1/3, 2/3, 2/3)^T

A = \begin{pmatrix} 1/3 \\ 2/3 \\ 2/3 \end{pmatrix} \cdot (3) \cdot (1)

2. Geometric Interpretation

Remark 8.2: SVD as Three Operations

The factorization A = U \Sigma V^T means multiplying by A is equivalent to:

  1. Rotate/reflect by V^T (align with principal axes)
  2. Scale by \Sigma (stretch/compress along axes)
  3. Rotate/reflect by U (to final orientation)
Remark 8.3: Image of Unit Sphere

A maps the unit sphere to an ellipsoid. The singular values are the semi-axis lengths, and the singular vectors are the axis directions.

Example 8.3: Geometric Visualization

For A = \begin{pmatrix} 2 & 1 \\ 0 & 1 \end{pmatrix}:

The unit circle {x² + y² = 1} is mapped to an ellipse with:

  • Major axis length 2\sigma_1 in direction u_1
  • Minor axis length 2\sigma_2 in direction u_2
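
A numerical check of Example 8.3 (a sketch of my own, using NumPy): sample the unit circle, push it through A, and compare the longest and shortest image vectors with σ₁ and σ₂.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 1.0]])
U, s, Vt = np.linalg.svd(A)

# Sample the unit circle and map it through A.
theta = np.linspace(0.0, 2.0 * np.pi, 2000)
circle = np.vstack([np.cos(theta), np.sin(theta)])   # 2×2000 points with ||x|| = 1
image = A @ circle

lengths = np.linalg.norm(image, axis=0)
print(lengths.max(), s[0])   # both ≈ σ1: longest semi-axis of the ellipse
print(lengths.min(), s[1])   # both ≈ σ2: shortest semi-axis
# The direction achieving the maximum is u1 (up to sign):
print(image[:, np.argmax(lengths)] / lengths.max(), U[:, 0])
```
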
Definition 8.2: Matrix Norms via SVD
  • Spectral norm: \|A\|_2 = \sigma_1 (largest singular value)
  • Frobenius norm: \|A\|_F = \sqrt{\sum \sigma_i^2}
  • Nuclear norm: \|A\|_* = \sum \sigma_i
Remark 8.4: Interpretation of Norms

The spectral norm \|A\|_2 = \sigma_1 is the maximum stretching factor:

\|A\|_2 = \max_{\|x\|=1} \|Ax\|

The maximum is achieved when x = v_1 (the first right singular vector).

Definition 8.3: Condition Number

The condition number of A is:

\kappa(A) = \frac{\sigma_1}{\sigma_r}

where \sigma_r is the smallest nonzero singular value. It measures sensitivity to perturbations.

Example 8.4: Condition Number

For A = \begin{pmatrix} 1 & 0 \\ 0 & 0.001 \end{pmatrix}:

\sigma_1 = 1, \sigma_2 = 0.001

\kappa(A) = 1000 (ill-conditioned)

A 0.1% relative error in the input could cause a 100% relative error in the output!
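
The norms of Definition 8.2 and the condition number above can all be read off the singular values. Here is a small NumPy check on the matrix of Example 8.4 (the code is my own illustration, not part of the lesson).

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 0.001]])
s = np.linalg.svd(A, compute_uv=False)      # singular values only

spectral  = s[0]                            # ||A||_2 = σ1
frobenius = np.sqrt(np.sum(s**2))           # ||A||_F = sqrt(Σ σi²)
nuclear   = np.sum(s)                       # ||A||_* = Σ σi
kappa     = s[0] / s[-1]                    # κ(A) = σ1/σr (A has full rank here)

print(spectral, np.linalg.norm(A, 2))       # agree
print(frobenius, np.linalg.norm(A, 'fro'))  # agree
print(kappa, np.linalg.cond(A, 2))          # 1000.0, matching Example 8.4
```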

3. Low-Rank Approximation

Theorem 8.2: Eckart-Young Theorem

The best rank-k approximation to A (in Frobenius or spectral norm) is:

A_k = \sum_{i=1}^{k} \sigma_i u_i v_i^T

The approximation error is:

\|A - A_k\|_2 = \sigma_{k+1}, \quad \|A - A_k\|_F = \sqrt{\sum_{i=k+1}^{r} \sigma_i^2}
Example 8.2: Rank-1 Approximation

For A = \begin{pmatrix} 3 & 1 \\ 1 & 3 \end{pmatrix} with \sigma_1 = 4, \sigma_2 = 2:

A_1 = 4 \cdot \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ 1 \end{pmatrix} \cdot \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \end{pmatrix} = \begin{pmatrix} 2 & 2 \\ 2 & 2 \end{pmatrix}
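
A short check of Example 8.2 and the Eckart-Young bound (a NumPy sketch of my own): the rank-1 truncation reproduces the matrix A₁ above, and its spectral error equals σ₂.

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0]])
U, s, Vt = np.linalg.svd(A)

# Truncated SVD: keep only the leading singular triplet (k = 1).
A1 = s[0] * np.outer(U[:, 0], Vt[0, :])
print(A1)                               # [[2. 2.] [2. 2.]], as in Example 8.2
print(np.linalg.norm(A - A1, 2), s[1])  # both 2: the error equals σ2 (Eckart-Young)
```
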
Proof:

Proof of Eckart-Young (Spectral Norm):

Let B be any rank-k matrix. We show \|A - B\|_2 \geq \sigma_{k+1}.

Since dim(null(B)) ≥ n - k and dim(span{v₁, ..., v_{k+1}}) = k + 1, there exists a nonzero x \in \text{null}(B) \cap \text{span}\{v_1, ..., v_{k+1}\}.

Then \|Ax\| \geq \sigma_{k+1}\|x\| and Bx = 0, so:

\|A - B\|_2 \geq \frac{\|(A-B)x\|}{\|x\|} = \frac{\|Ax\|}{\|x\|} \geq \sigma_{k+1}

Equality is achieved by A_k = \sum_{i=1}^k \sigma_i u_i v_i^T.

Remark 8.5: Optimal Approximation

The Eckart-Young theorem is remarkable:

  • Among ALL rank-k matrices, truncated SVD is the best!
  • No optimization needed—the answer comes directly from SVD
  • Works for both spectral and Frobenius norms
Example 8.2a: Approximation Error

For A with singular values \sigma = (10, 5, 2, 1):

  • Rank-1 error: \|A - A_1\|_2 = 5
  • Rank-2 error: \|A - A_2\|_2 = 2
  • Rank-3 error: \|A - A_3\|_2 = 1

Frobenius errors: \sqrt{25 + 4 + 1} = \sqrt{30}, \sqrt{5}, 1

Corollary 8.1: Compression Ratio

For m×n matrix with rank r, storage is O(mn).

Rank-k approximation stores: Uk (m×k) + σ (k) + Vk (n×k) = O((m+n)k).

Compression ratio: \frac{(m+n)k}{mn}. For k ≪ min(m,n), this is huge savings!

Definition 8.4: Relative Approximation Error

The relative error in rank-k approximation is:

\frac{\|A - A_k\|_F}{\|A\|_F} = \sqrt{\frac{\sum_{i=k+1}^r \sigma_i^2}{\sum_{i=1}^r \sigma_i^2}}

This measures what fraction of "energy" is lost.

Example 8.2b: Choosing Rank

How to choose k? Common strategies:

  • Keep 90% or 99% of total "energy" (sum of σᵢ²)
  • Look for "elbow" in singular value plot
  • Use cross-validation if doing prediction
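
One way to apply the energy rule from Example 8.2b is sketched below. The helper name rank_for_energy is a hypothetical choice of mine, and the spectrum reuses Example 8.2a.

```python
import numpy as np

def rank_for_energy(singular_values, energy=0.90):
    """Smallest k whose leading singular values capture `energy` of ||A||_F^2."""
    s2 = np.asarray(singular_values, dtype=float) ** 2
    cumulative = np.cumsum(s2) / np.sum(s2)
    return int(np.searchsorted(cumulative, energy) + 1)

sigma = np.array([10.0, 5.0, 2.0, 1.0])   # the spectrum from Example 8.2a
print(rank_for_energy(sigma, 0.90))       # 2: the top two values hold ≈96% of the energy
print(rank_for_energy(sigma, 0.99))       # 3
```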

4. Applications

Image Compression

Keep top k singular values to compress images. Rank-50 often visually indistinguishable from original.

Recommendation Systems

Matrix factorization for Netflix/Amazon. Low-rank structure captures user-item preferences.

Principal Component Analysis

SVD of centered data gives principal components. Dimensionality reduction preserving variance.

Pseudoinverse

A^+ = V\Sigma^+ U^T. Solves least squares for any matrix.

Definition 8.5: Moore-Penrose Pseudoinverse

The pseudoinverse A^+ is defined via SVD:

A^+ = V \Sigma^+ U^T

where \Sigma^+ has diagonal entries 1/\sigma_i (for nonzero σᵢ) and 0 otherwise.

Theorem 8.3: Pseudoinverse Properties

The pseudoinverse A^+ satisfies:

  1. AA^+A = A
  2. A^+AA^+ = A^+
  3. (AA^+)^T = AA^+
  4. (A^+A)^T = A^+A
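
The four Penrose conditions can be verified numerically. The sketch below uses np.linalg.pinv, which NumPy computes via the SVD; the example matrix is my own choice.

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [0.0, 0.0]])          # rank-1, not invertible
A_plus = np.linalg.pinv(A)          # pseudoinverse computed via the SVD

# The four Penrose conditions of Theorem 8.3:
print(np.allclose(A @ A_plus @ A, A))             # 1. A A⁺ A = A
print(np.allclose(A_plus @ A @ A_plus, A_plus))   # 2. A⁺ A A⁺ = A⁺
print(np.allclose((A @ A_plus).T, A @ A_plus))    # 3. (A A⁺)ᵀ = A A⁺
print(np.allclose((A_plus @ A).T, A_plus @ A))    # 4. (A⁺ A)ᵀ = A⁺ A
```
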
Example 8.5: Computing Pseudoinverse

For A = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}:

SVD: U = V = I, \Sigma = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}

\Sigma^+ = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, so A^+ = A

Remark 8.6: Least Squares via Pseudoinverse

For Ax = b (possibly overdetermined or rank-deficient):

x^+ = A^+ b

This gives the minimum-norm least squares solution.

Example 8.6: Image Compression

A 512×512 grayscale image has 262,144 values.

With rank-50 SVD approximation: 50×(512+1+512) = 51,250 values.

Compression ratio: 5:1 while retaining main features!
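
Here is a runnable sketch of Example 8.6 that uses a synthetic 512×512 array in place of a real photograph, so no image file is needed. The storage counts match the example; the relative error here is tiny only because the synthetic image happens to be nearly low-rank.

```python
import numpy as np

# A synthetic 512×512 "image" (smooth gradient plus stripes) stands in for pixel data.
n = 512
x = np.linspace(0.0, 1.0, n)
image = np.outer(x, 1.0 - x) + 0.1 * np.sin(40.0 * np.pi * x)[None, :]

U, s, Vt = np.linalg.svd(image, full_matrices=False)

k = 50
compressed = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

original_storage = n * n                     # 262,144 values
compressed_storage = k * (n + 1 + n)         # 51,250 values, as in Example 8.6
rel_error = np.linalg.norm(image - compressed) / np.linalg.norm(image)

print(original_storage, compressed_storage)  # ≈ 5:1 compression
print(rel_error)                             # tiny: this synthetic image is near low-rank
```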

Example 8.7: Latent Semantic Analysis

In text analysis, term-document matrix A has:

  • Rows = words, columns = documents
  • SVD reveals latent "topics"
  • Similar documents have similar V columns
  • Related words have similar U columns
Example 8.8: Principal Component Analysis

Given a centered data matrix X (n samples × p features):

  1. Compute SVD: X = U \Sigma V^T
  2. Principal components: columns of V
  3. Projected data: XV_k (first k components)
  4. Variance explained: \sigma_i^2 / (n-1)
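
The four PCA steps above translate almost line for line into NumPy. The random data and variable names below are my own illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                     # 200 samples, 5 features
X = X - X.mean(axis=0)                            # center each feature

U, s, Vt = np.linalg.svd(X, full_matrices=False)  # step 1: SVD of centered data

components = Vt                                   # step 2: rows of Vt = principal components
k = 2
scores = X @ Vt[:k].T                             # step 3: project onto the top k components
explained_var = s**2 / (X.shape[0] - 1)           # step 4: variance along each component

print(components.shape, scores.shape)             # (5, 5) (200, 2)
print(explained_var / explained_var.sum())        # fraction of variance per component
```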

5. Computing SVD

Remark 8.7: Never Form AᵀA!

Theoretical approach: the eigenvalues of A^T A give \sigma_i^2.

DON'T DO THIS! It squares the condition number and loses precision.

SVD Algorithms

  • Golub-Reinsch: O(mn^2), best for dense matrices
  • Divide-and-conquer: O(mn^2), fast for all singular values
  • Randomized SVD: O(mn \log k), best for low-rank approximation
  • Lanczos/Arnoldi: O(mnk), best for sparse matrices when few singular values are needed
Example 8.9: Randomized SVD

For large matrices when only top k singular values needed:

  1. Random projection: Y = A\Omega where \Omega is an n×(k+p) random matrix
  2. Orthogonalize: Y = QR
  3. Project: B = Q^T A
  4. SVD of the small matrix: B = \tilde{U}\Sigma V^T
  5. Recover: U = Q\tilde{U}
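
A minimal implementation of the five steps of Example 8.9; the function name randomized_svd and the oversampling default p=5 are my own choices.

```python
import numpy as np

def randomized_svd(A, k, p=5, rng=None):
    """Sketch of the five steps above; p is an oversampling parameter."""
    rng = np.random.default_rng(rng)
    m, n = A.shape
    Omega = rng.normal(size=(n, k + p))                # 1. random test matrix
    Q, _ = np.linalg.qr(A @ Omega)                     # 2. orthonormal basis for range(AΩ)
    B = Q.T @ A                                        # 3. small (k+p)×n matrix
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)  # 4. SVD of the small matrix
    U = Q @ Ub                                         # 5. lift back to m dimensions
    return U[:, :k], s[:k], Vt[:k]

# Demo on a matrix that is exactly rank 10, so the sketch is essentially exact.
rng = np.random.default_rng(1)
A = rng.normal(size=(500, 10)) @ rng.normal(size=(10, 300))
U, s, Vt = randomized_svd(A, k=10, rng=2)
print(np.allclose(s, np.linalg.svd(A, compute_uv=False)[:10]))   # True
```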

6. Common Mistakes

Confusing SVD with eigendecomposition

SVD uses two different orthogonal matrices (U, V). Eigendecomposition uses one (and only works for square matrices).

Computing AᵀA explicitly

Never form AᵀA to compute SVD—it squares the condition number and loses precision. Use direct SVD algorithms.

Forgetting Σ is rectangular

For m×n matrix A, Σ is m×n (not square). Only min(m,n) singular values exist.

Wrong order: UΣVᵀ vs VΣUᵀ

The correct order is A = UΣVᵀ. Remember: U is m×m, V is n×n, and the transpose is on V.

Thinking singular values can be negative

Singular values are always non-negative. They come from √(eigenvalues of AᵀA), which are non-negative.

Remark 8.8: Debugging SVD

Verify your SVD by checking:

  • UᵀU = I and VᵀV = I (orthogonality)
  • A = UΣVᵀ (reconstruction)
  • Singular values are non-negative and sorted
  • rank(A) = number of nonzero σᵢ
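
The checklist in Remark 8.8 can be automated. check_svd below is a hypothetical helper of my own, written for the thin SVD returned by np.linalg.svd.

```python
import numpy as np

def check_svd(A, U, s, Vt, tol=1e-10):
    """Run the sanity checks from Remark 8.8 on a (thin or full) SVD."""
    return {
        "U orthonormal columns": np.allclose(U.T @ U, np.eye(U.shape[1]), atol=tol),
        "V orthonormal columns": np.allclose(Vt @ Vt.T, np.eye(Vt.shape[0]), atol=tol),
        "A reconstructed":       np.allclose(A, (U * s) @ Vt, atol=tol),
        "sigma >= 0, sorted":    bool(np.all(s >= 0) and np.all(np.diff(s) <= 0)),
        "rank matches":          int(np.sum(s > tol)) == np.linalg.matrix_rank(A),
    }

A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(check_svd(A, U, s, Vt))    # every check should report True
```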

7. Key Formulas Summary

SVD

  • A = U \Sigma V^T
  • \sigma_i = \sqrt{\lambda_i(A^T A)}
  • rank(A) = # nonzero σᵢ
  • |det(A)| = ∏σᵢ (square matrices)

Applications

  • A_k = \sum_{i=1}^k \sigma_i u_i v_i^T
  • A^+ = V \Sigma^+ U^T
  • \|A\|_2 = \sigma_1, \|A\|_F = \sqrt{\sum \sigma_i^2}
  • κ(A) = σ₁/σᵣ

SVD Computation Algorithm

  1. Input: Matrix A (m×n)
  2. Eigendecomposition: AᵀA = VDVᵀ (eigenvalues λᵢ, eigenvectors vᵢ)
  3. Singular values: σᵢ = √λᵢ, sorted descending
  4. Left singular vectors: uᵢ = Avᵢ/σᵢ for nonzero σᵢ
  5. Extend: Complete U to orthonormal basis if needed
  6. Output: A = UΣVᵀ
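
For completeness, here is the algorithm above spelled out in NumPy. Per Remark 8.7 this route through AᵀA is for study only, never for production code; the function name svd_via_ata is my own, and it stops at the thin (rank-r) form rather than performing the step-5 extension.

```python
import numpy as np

def svd_via_ata(A, tol=1e-12):
    """Pedagogical thin SVD following steps 2-4 above (A assumed nonzero).

    Warning: forming AᵀA squares the condition number (Remark 8.7);
    use np.linalg.svd in practice."""
    lam, V = np.linalg.eigh(A.T @ A)            # step 2: eigendecomposition (ascending)
    order = np.argsort(lam)[::-1]               # sort eigenpairs in descending order
    lam, V = lam[order], V[:, order]
    sigma = np.sqrt(np.clip(lam, 0.0, None))    # step 3: σi = sqrt(λi)
    r = int(np.sum(sigma > tol * sigma[0]))     # numerical rank
    U = (A @ V[:, :r]) / sigma[:r]              # step 4: uᵢ = A vᵢ / σᵢ
    return U, sigma[:r], V[:, :r].T             # rank-r thin SVD

A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
U, s, Vt = svd_via_ata(A)
print(s)                                        # ≈ [1.732, 1.0], matching Example 8.10
print(np.allclose(A, U @ np.diag(s) @ Vt))      # True
```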

8. More Worked Examples

Example 8.10: Complete 2×3 SVD

Find the SVD of A = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \end{pmatrix}:

Step 1: Compute AᵀA:

A^T A = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 1 & 1 & 2 \end{pmatrix}

Step 2: Eigenvalues: λ = 3, 1, 0. So σ = √3, 1, 0.

Step 3: Find eigenvectors of AᵀA (V) and compute U = AV/σ.

Example 8.11: Rank Determination

For matrix with SVD showing σ = (10, 5, 2, 0.0001, 0):

  • Exact rank: 4 (four nonzero singular values)
  • Numerical rank at tolerance 0.01: 3
  • Condition number: 10/0.0001 = 100,000 (ill-conditioned!)
Example 8.12: Least Squares via SVD

Solve Ax = b where A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{pmatrix} and b = (2, 1, 1)^T:

Step 1: Compute the SVD of A

Step 2: x = A^+ b = V \Sigma^+ U^T b

This gives the minimum-norm least squares solution.
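
Solving Example 8.12 numerically (my own sketch): for this particular b the system happens to be consistent, so the least-squares solution x = (1, 1) satisfies Ax = b exactly.

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [0.0, 1.0],
              [1.0, 0.0]])
b = np.array([2.0, 1.0, 1.0])

x = np.linalg.pinv(A) @ b                 # x = A⁺ b = V Σ⁺ Uᵀ b
print(x)                                  # [1. 1.]
print(np.allclose(A @ x, b))              # True: the residual is zero for this b

# np.linalg.lstsq solves the same least-squares problem via the SVD.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x, x_lstsq))            # True
```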

Example 8.13: Matrix Approximation Error

If A has singular values 100, 50, 10, 5, 1:

  • ||A||₂ = 100, ||A||_F = √(10000+2500+100+25+1) ≈ 112.3
  • Rank-2 error: ||A-A₂||₂ = 10, ||A-A₂||_F = √(126) ≈ 11.2
  • Energy in top 2 components: (100²+50²)/(112.3²) ≈ 99%

9. Connections and Extensions

SVD in the Linear Algebra Ecosystem

Spectral Theorem (LA-7.5)

SVD generalizes the spectral theorem. For symmetric positive semi-definite A, the SVD and the eigendecomposition coincide; for a general symmetric A, σᵢ = |λᵢ|.

Polar Decomposition

A = UP where U is unitary and P = √(AᵀA). SVD gives this decomposition.

Four Fundamental Subspaces

SVD reveals all four: col(A), null(Aᵀ), row(A), null(A) via U and V columns.

Projections (LA-7.4)

Orthogonal projections onto the fundamental subspaces come from SVD: UᵣUᵣᵀ (first r columns of U) projects onto col(A), and VᵣVᵣᵀ projects onto row(A).

Theorem 8.4: Polar Decomposition

Any square matrix A can be factored as A = UP, where:

  • U is orthogonal (or unitary)
  • P = \sqrt{A^T A} is positive semi-definite

From the SVD: U_{polar} = UV^T and P = V\Sigma V^T.
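
A quick numerical check of Theorem 8.4, reusing the matrix from Example 8.3 (the sketch is my own).

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 1.0]])
U, s, Vt = np.linalg.svd(A)

U_polar = U @ Vt                   # orthogonal factor
P = Vt.T @ np.diag(s) @ Vt         # symmetric PSD factor, equal to sqrt(AᵀA)

print(np.allclose(A, U_polar @ P))                                   # A = U_polar P
print(np.allclose(U_polar.T @ U_polar, np.eye(2)))                   # U_polar orthogonal
print(np.allclose(P, P.T), np.all(np.linalg.eigvalsh(P) >= -1e-12))  # P symmetric PSD
```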

Remark 8.9: Four Fundamental Subspaces via SVD

For A = U\Sigma V^T with rank r:

  • \text{col}(A) = \text{span}\{u_1, ..., u_r\}
  • \text{null}(A^T) = \text{span}\{u_{r+1}, ..., u_m\}
  • \text{row}(A) = \text{span}\{v_1, ..., v_r\}
  • \text{null}(A) = \text{span}\{v_{r+1}, ..., v_n\}
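
The four bases can be sliced directly out of U and V; the rank-1 example matrix below is my own.

```python
import numpy as np

A = np.array([[1.0, 1.0, 0.0],
              [0.0, 0.0, 0.0],
              [2.0, 2.0, 0.0]])      # rank 1, so r = 1 and m = n = 3
U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-12))

col_A   = U[:, :r]        # basis for col(A)
null_At = U[:, r:]        # basis for null(Aᵀ)
row_A   = Vt[:r, :].T     # basis for row(A)
null_A  = Vt[r:, :].T     # basis for null(A)

print(r)                                                        # 1
print(np.allclose(A.T @ null_At, 0))                            # Aᵀ kills null(Aᵀ)
print(np.allclose(A @ null_A, 0))                               # A kills null(A)
print(col_A.shape, null_At.shape, row_A.shape, null_A.shape)    # (3,1) (3,2) (3,1) (3,2)
```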

10. Chapter Summary

Key Takeaways

SVD Theorem
  • A = UΣVᵀ for ANY matrix
  • U, V orthogonal; Σ diagonal
  • Singular values ≥ 0
  • Unique (up to signs for distinct σᵢ)
Key Results
  • Eckart-Young: optimal low-rank
  • Pseudoinverse: A⁺ = VΣ⁺Uᵀ
  • Norms: ||A||₂ = σ₁
  • rank(A) = # nonzero σᵢ
Applications
  • Image compression
  • Recommender systems
  • PCA / dimensionality reduction
  • Least squares / regression
Computation
  • Never form AᵀA!
  • Use Golub-Reinsch or similar
  • Randomized SVD for large matrices
  • O(mn²) for dense matrices
Remark 8.10: Mastery Checklist

You've mastered SVD when you can:

  • ✓ State the SVD theorem and explain the components
  • ✓ Compute SVD for simple matrices by hand
  • ✓ Explain the geometric interpretation
  • ✓ Apply Eckart-Young for low-rank approximation
  • ✓ Compute pseudoinverse using SVD
  • ✓ Connect SVD to eigendecomposition
  • ✓ Use SVD for applications (PCA, compression, etc.)

Pro Tips for Exams

  • Remember: A = UΣVᵀ, transpose is on V not U
  • σᵢ = √(eigenvalue of AᵀA), always non-negative
  • For symmetric positive semi-definite A: SVD = spectral decomposition (for symmetric A in general, σᵢ = |λᵢ|)
  • Low-rank approx: keep largest σᵢ terms
  • Pseudoinverse: invert nonzero σᵢ, leave zeros as zeros
  • Condition number: κ = σ₁/σᵣ (large = unstable)

11. Additional Theory

Theorem 8.5: Weyl's Inequality

For matrices A and E (perturbation):

|\sigma_i(A + E) - \sigma_i(A)| \leq \|E\|_2

Singular values are stable under small perturbations.
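
A random spot check of Weyl's inequality (my own sketch; it verifies one instance and is not a proof).

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 4))
E = 1e-3 * rng.normal(size=(6, 4))      # a small perturbation

s_A  = np.linalg.svd(A, compute_uv=False)
s_AE = np.linalg.svd(A + E, compute_uv=False)

# Weyl: every singular value moves by at most ||E||_2.
print(np.max(np.abs(s_AE - s_A)) <= np.linalg.norm(E, 2))   # True
```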

Remark 8.11: Perturbation Theory

SVD is numerically well-behaved:

  • Singular values change continuously with A
  • Singular vectors stable when σᵢ well-separated
  • Algorithms achieve backward stability
Definition 8.6: Truncated SVD

The rank-k truncated SVD retains only the k largest singular values:

A_k = U_k \Sigma_k V_k^T

where U_k is m×k, \Sigma_k is k×k, V_k is n×k.

Example 8.14: Netflix Prize

The Netflix Prize used matrix factorization (related to SVD):

  • User-movie rating matrix R (sparse, 100M ratings)
  • Factor as R ≈ UVᵀ (low-rank approximation)
  • Predict missing ratings: r_ij ≈ u_i · v_j
  • Winning team used modified SVD with regularization
Theorem 8.6: Relationship to Norms

The singular values determine all unitarily invariant norms:

  • Spectral: \|A\|_2 = \sigma_1
  • Frobenius: \|A\|_F = \sqrt{\sum \sigma_i^2}
  • Nuclear: \|A\|_* = \sum \sigma_i
  • Schatten-p: \|A\|_p = (\sum \sigma_i^p)^{1/p}
Example 8.15: Data Denoising

SVD for noise reduction:

  1. Noisy data: A = signal + noise
  2. Compute SVD of A
  3. Small σᵢ likely correspond to noise
  4. Truncate to keep only large σᵢ
  5. Reconstructed A_k has reduced noise
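
A sketch of the denoising recipe on synthetic data; the rank-2 "signal", the noise level 0.05, and the choice k = 2 are all my own assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ground truth: a rank-2 "signal" matrix, plus small dense noise.
signal = rng.normal(size=(100, 2)) @ rng.normal(size=(2, 80))
noisy = signal + 0.05 * rng.normal(size=signal.shape)

U, s, Vt = np.linalg.svd(noisy, full_matrices=False)
print(s[:4])                # two large values (signal), then a small noise floor

k = 2                       # keep only the singular values above the noise floor
denoised = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

err = lambda X: np.linalg.norm(X - signal) / np.linalg.norm(signal)
print(err(noisy), err(denoised))   # the truncated reconstruction is closer to the signal
```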

12. Quick Reference

Key Definitions

  • Singular values: σᵢ = √λᵢ(AᵀA)
  • Left singular vectors: columns of U
  • Right singular vectors: columns of V
  • Pseudoinverse: A⁺ = VΣ⁺Uᵀ

Key Properties

  • σᵢ ≥ 0, sorted descending
  • rank(A) = # nonzero σᵢ
  • |det(A)| = ∏σᵢ
  • κ(A) = σ₁/σᵣ

Norms

  • ||A||₂ = σ₁
  • ||A||_F = √(Σσᵢ²)
  • ||A||_* = Σσᵢ

Low-Rank Approximation

  • A_k = Σᵢ₌₁ᵏ σᵢuᵢvᵢᵀ
  • ||A - A_k||₂ = σ_{k+1}
  • ||A - A_k||_F = √(Σᵢ₌ₖ₊₁ σᵢ²)

Notation Summary

  • A = UΣVᵀ: SVD factorization
  • σᵢ: i-th singular value
  • uᵢ, vᵢ: left/right singular vectors
  • A⁺: Moore-Penrose pseudoinverse
  • A_k: rank-k truncated SVD
  • κ(A): condition number
Remark 8.12: Historical Note

The SVD was first developed by Eugenio Beltrami (1873) and Camille Jordan (1874) for bilinear forms. The modern computational algorithms were developed by Gene Golub and others in the 1960s. Today, SVD is computed billions of times daily in applications from Google searches to Netflix recommendations.

13. Additional Practice

Example 8.16: Practice Problem 1

Find the SVD of A = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}:

Solution outline:

  1. AᵀA = diag(2, 2), so σ₁ = σ₂ = √2
  2. A is already orthogonal (up to scaling)
  3. U = A/√2, Σ = √2 I, V = I
Example 8.17: Practice Problem 2

For A = \begin{pmatrix} 2 & 0 \\ 0 & 3 \\ 0 & 0 \end{pmatrix}, find:

  • (a) Singular values
  • (b) SVD factorization
  • (c) Pseudoinverse A⁺
  • (d) rank(A), ||A||₂, ||A||_F
Example 8.18: Practice Problem 3

Show that for any matrix A:

\text{tr}(A^T A) = \sum_i \sigma_i^2 = \|A\|_F^2
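
A numerical sanity check of the identity (not a substitute for the requested proof; the random matrix is my own choice).

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 3))
s = np.linalg.svd(A, compute_uv=False)

print(np.trace(A.T @ A))             # tr(AᵀA)
print(np.sum(s**2))                  # Σ σi²
print(np.linalg.norm(A, 'fro')**2)   # ||A||_F², all three agree
```
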
Remark 8.13: Practice Strategy

To master SVD:

  • Practice computing SVD of 2×2 and 2×3 matrices
  • Understand each component geometrically
  • Connect to applications (image compression, PCA)
  • Compare with eigendecomposition for symmetric matrices

14. Advanced Applications

Example 8.19: Recommendation Systems

In collaborative filtering for recommendation:

  1. User-item rating matrix R (mostly missing entries)
  2. Factor R ≈ UVᵀ (implicitly via SVD)
  3. User i's preferences: row i of U
  4. Item j's features: row j of V
  5. Predicted rating: r_{ij} \approx u_i \cdot v_j
Example 8.20: Face Recognition (Eigenfaces)

Classic face recognition using SVD:

  1. Flatten images to vectors, form data matrix X
  2. Center: subtract mean face
  3. Compute SVD: X = UΣVᵀ
  4. "Eigenfaces" are columns of U (principal components)
  5. Project new face onto eigenfaces for recognition
Example 8.21: Document Similarity (LSI)

Latent Semantic Indexing for document search:

  1. Term-document matrix A (rows=words, cols=docs)
  2. Compute truncated SVD: A ≈ UₖΣₖVₖᵀ
  3. Documents in "concept space": columns of ΣₖVₖᵀ
  4. Query: project to concept space, find nearest docs
  5. SVD captures semantic meaning beyond keyword matching
Remark 8.14: SVD in Machine Learning

SVD is foundational to many ML algorithms:

  • PCA: Dimensionality reduction via SVD
  • Matrix completion: Netflix, collaborative filtering
  • Robust PCA: Separate low-rank + sparse components
  • CUR decomposition: Interpretable low-rank approx
  • Word embeddings: SVD of co-occurrence matrix
Example 8.22: Signal Processing

SVD for signal separation:

  • Hankel matrix from time series
  • Large σᵢ correspond to dominant frequencies
  • Small σᵢ correspond to noise
  • Truncated SVD extracts clean signal

15. Computational Details

Remark 8.15: Numerical Stability

Why SVD algorithms avoid forming AᵀA:

  • κ(AᵀA) = κ(A)² — condition number squares!
  • Small singular values may be lost to roundoff
  • Direct algorithms maintain better precision
Example 8.23: Numerical Example

For A with σ = (1, 10⁻⁸):

  • κ(A) = 10⁸
  • AᵀA has eigenvalues (1, 10⁻¹⁶)
  • κ(AᵀA) = 10¹⁶ — near machine precision!
  • Direct SVD preserves small singular value
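
The Läuchli-type matrix below makes this concrete (my own example, in the same spirit as Example 8.23): the direct SVD recovers the 10⁻⁸ singular value, while the route through AᵀA returns essentially zero.

```python
import numpy as np

# Exact singular values of this matrix are sqrt(2 + eps²) and eps.
eps = 1e-8
A = np.array([[1.0, 1.0],
              [eps, 0.0],
              [0.0, eps]])

# Direct SVD keeps the tiny singular value.
print(np.linalg.svd(A, compute_uv=False))          # ≈ [1.41421356e+00, 1.0e-08]

# Going through AᵀA loses it: 1 + eps² rounds to 1 in float64,
# so the small eigenvalue comes out as (roughly) zero.
lam = np.linalg.eigvalsh(A.T @ A)
print(np.sqrt(np.clip(lam, 0.0, None))[::-1])      # ≈ [1.414, 0.0]: the 1e-8 value is gone
```
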
Algorithm Comparison

  • Small dense matrix, all σᵢ needed: Golub-Reinsch or divide-and-conquer
  • Large dense matrix, few σᵢ needed: randomized SVD
  • Large sparse matrix: Lanczos/ARPACK
  • Streaming data: incremental SVD
Definition 8.7: Truncated SVD Variants
  • Hard threshold: Keep σᵢ > τ
  • Soft threshold: σᵢ → max(σᵢ - τ, 0)
  • Top-k: Keep k largest σᵢ
  • Energy: Keep enough σᵢ to capture x% of ||A||_F²

16. Final Synthesis

The Big Picture

The SVD is the most important matrix decomposition because it works for ANY matrix and reveals its complete structure. It provides the optimal low-rank approximation, the pseudoinverse, all matrix norms, and the four fundamental subspaces.

Structure

A = UΣVᵀ decomposes any matrix into rotations and scaling.

Optimization

Truncated SVD gives THE best low-rank approximation: by Eckart-Young, no rank-k matrix is closer to A.

Applications

Image compression, Netflix recommendations, Google search, and countless more.

Remark 8.16: Looking Forward

SVD leads naturally to:

  • Quadratic Forms (LA-8.2): Classification using eigenvalues
  • Multilinear Algebra (LA-8.3): Tensor decompositions generalize SVD
  • Numerical Linear Algebra: Iterative methods, preconditioning
  • Machine Learning: Matrix factorization, deep learning compression
SVD Practice

  1. (Easy) SVD decomposes any m \times n matrix A as:
  2. (Medium) Singular values are:
  3. (Medium) Columns of U in the SVD are:
  4. (Medium) Columns of V in the SVD are:
  5. (Medium) The rank of A equals:
  6. (Medium) The best rank-k approximation to A is:
  7. (Hard) The pseudoinverse A^+ via SVD is:
  8. (Easy) SVD exists for:
  9. (Medium) \|A\|_2 (spectral norm) equals:
  10. (Medium) \|A\|_F (Frobenius norm) equals:
  11. (Easy) In image compression, keeping the top k singular values:
  12. (Hard) The condition number \kappa(A) is:

Frequently Asked Questions

What makes SVD so important?

SVD works for ANY matrix (any shape, any rank), provides optimal low-rank approximations, reveals rank, gives the pseudoinverse, and connects to eigenvalues. It's the 'Swiss army knife' of matrix decompositions.

How does SVD relate to eigendecomposition?

For symmetric positive semi-definite A, SVD and eigendecomposition coincide (for symmetric A with negative eigenvalues, σᵢ = |λᵢ|). For general A: singular values are √(eigenvalues of AᵀA), right singular vectors are eigenvectors of AᵀA, left singular vectors are eigenvectors of AAᵀ.

What's the geometric interpretation?

Any linear map can be decomposed as: (1) rotate/reflect by Vᵀ, (2) scale along axes by Σ, (3) rotate/reflect by U. SVD reveals the 'principal axes' of the transformation.

How is SVD used in image compression?

An image is a matrix of pixel values. Keep only top k singular values/vectors to get a rank-k approximation. This compresses the image while preserving the most important visual features.

What's the relationship to PCA?

PCA on data matrix X uses SVD of the centered data. Principal components are columns of V (right singular vectors). Explained variance comes from squared singular values.

How do I compute SVD?

Theoretically: find eigenvalues/vectors of AᵀA and AAᵀ. Practically: use iterative algorithms (Golub-Reinsch, divide-and-conquer). Never form AᵀA explicitly—it loses precision!

What's the pseudoinverse good for?

A⁺ solves least squares: x = A⁺b gives min-norm least squares solution. Works for any matrix (not just invertible). A⁺ = VΣ⁺Uᵀ where Σ⁺ inverts nonzero singular values.

Why is low-rank approximation optimal?

Eckart-Young theorem: The rank-k truncated SVD minimizes ||A - B||₂ and ||A - B||_F over all rank-k matrices B. No other rank-k matrix gets closer to A.

What does the condition number tell us?

κ(A) = σ₁/σᵣ (ratio of the largest to the smallest nonzero singular value) measures sensitivity to perturbations. Large κ means ill-conditioned: small input changes cause large output changes. Affects numerical stability.

Can singular values be negative?

No! Singular values are always non-negative (they're square roots of eigenvalues of the positive semidefinite matrix AᵀA). By convention, they're ordered σ₁ ≥ σ₂ ≥ ... ≥ 0.