MathIsimple
LA-C11

Course 11: Advanced Topics Part 1

SVD & Quadratic Forms

This course covers two powerful advanced topics: the Singular Value Decomposition (SVD), which works for any matrix and provides optimal low-rank approximations, and quadratic forms, which classify the behavior of symmetric matrices and are fundamental to optimization.

Estimated time: 12–15 hours · Advanced level · 10 learning objectives
Learning Objectives
  • State and prove the SVD theorem for any matrix.
  • Understand the geometric interpretation of SVD.
  • Apply SVD to low-rank approximation and image compression.
  • Compute the pseudoinverse using SVD.
  • Define quadratic forms and their matrix representation.
  • Classify quadratic forms (positive/negative definite, semidefinite, indefinite).
  • Apply Sylvester's criterion for positive definiteness.
  • Understand the law of inertia and signature.
  • Apply quadratic forms to optimization problems.
  • Connect SVD and quadratic forms to eigenvalues and the spectral theorem.
Prerequisites
  • LA-C10: Inner Product Spaces
  • Spectral theorem and orthogonal diagonalization
  • Eigenvalues and eigenvectors
  • Matrix rank and null space
  • Determinants
Historical Context

The Singular Value Decomposition was discovered independently by Eugenio Beltrami (1835–1900) and Camille Jordan (1838–1922) in the 1870s, with James Joseph Sylvester (1814–1897) also contributing. The modern computational SVD algorithm was developed by Gene Golub and William Kahan in the 1960s. Quadratic forms have been studied since ancient times, with Carl Friedrich Gauss (1777–1855) developing the theory of binary quadratic forms. James Joseph Sylvester proved the law of inertia in 1852, and Carl Gustav Jacob Jacobi (1804–1851) developed methods for diagonalizing quadratic forms.

1. Singular Value Decomposition (SVD)

The Singular Value Decomposition is one of the most important matrix factorizations. Unlike eigendecomposition, it exists for any matrix, regardless of shape or rank.

Theorem 1.1: SVD Theorem

For any $m \times n$ matrix $A$, there exist:

  • An orthogonal $m \times m$ matrix $U$ (left singular vectors)
  • A diagonal $m \times n$ matrix $\Sigma$ with non-negative entries $\sigma_1 \geq \sigma_2 \geq \cdots \geq 0$ (singular values)
  • An orthogonal $n \times n$ matrix $V$ (right singular vectors)

such that $A = U \Sigma V^T$.

Definition 1.1: Singular Values

The singular values of $A$ are $\sigma_i = \sqrt{\lambda_i}$, where the $\lambda_i$ are the eigenvalues of $A^T A$ (or $A A^T$).

Theorem 1.2: Rank and SVD

The rank of $A$ equals the number of nonzero singular values.

Theorem 1.3: Eckart-Young Theorem

The best rank-$k$ approximation to $A$ (in the Frobenius or spectral norm) is:

$$A_k = \sum_{i=1}^k \sigma_i u_i v_i^T$$

where $u_i, v_i$ are the columns of $U, V$.

Definition 1.2: Pseudoinverse

The Moore-Penrose pseudoinverse is $A^+ = V \Sigma^+ U^T$, where $\Sigma^+_{ii} = 1/\sigma_i$ if $\sigma_i > 0$ and $0$ otherwise.

Example 1.1: SVD for Image Compression

An image is a matrix of pixel values. SVD gives $A = U \Sigma V^T$. Keeping only the top $k$ singular values gives a rank-$k$ approximation that preserves most visual information while reducing storage.

Remark 1.1: Connection to PCA

Principal Component Analysis (PCA) is essentially SVD of the centered data matrix. Principal components are right singular vectors, and explained variance comes from squared singular values.

Theorem 1.4: Computing SVD

To compute the SVD of $A \in M_{m \times n}(\mathbb{R})$:

  1. Compute $A^T A$ (or $A A^T$ if $m < n$)
  2. Find the eigenvalues $\lambda_i$ and orthonormal eigenvectors $v_i$ of $A^T A$
  3. Singular values: $\sigma_i = \sqrt{\lambda_i}$
  4. Right singular vectors: $v_i$ (columns of $V$)
  5. Left singular vectors: $u_i = \frac{1}{\sigma_i} A v_i$ for $\sigma_i > 0$
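
This recipe is easy to check numerically. Below is a minimal sketch (using NumPy, chosen here only for illustration) that builds the factors from the eigendecomposition of $A^T A$ and compares the singular values with `numpy.linalg.svd`; the function name `svd_via_eig` is ours, not a library routine:

```python
import numpy as np

def svd_via_eig(A):
    """Illustrative SVD via the eigendecomposition of A^T A (not a production algorithm)."""
    AtA = A.T @ A
    eigvals, V = np.linalg.eigh(AtA)              # ascending eigenvalues, orthonormal eigenvectors
    order = np.argsort(eigvals)[::-1]             # reorder so sigma_1 >= sigma_2 >= ...
    eigvals, V = eigvals[order], V[:, order]
    sigma = np.sqrt(np.clip(eigvals, 0.0, None))  # sigma_i = sqrt(lambda_i)
    # Left singular vectors: u_i = A v_i / sigma_i for the nonzero singular values
    U = np.column_stack([A @ V[:, i] / s for i, s in enumerate(sigma) if s > 1e-12])
    return U, sigma, V

A = np.array([[1.0, 1.0], [0.0, 1.0]])
U, sigma, V = svd_via_eig(A)
r = U.shape[1]                                                # number of nonzero singular values
print(np.allclose(sigma, np.linalg.svd(A, compute_uv=False))) # matches the library routine
print(np.allclose(U @ np.diag(sigma[:r]) @ V[:, :r].T, A))    # reconstructs A
```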
Example 1.2: SVD Computation Example

For $A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$:

$$A^T A = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}$$

Eigenvalues: $\lambda_1 = \frac{3+\sqrt{5}}{2}$, $\lambda_2 = \frac{3-\sqrt{5}}{2}$

Singular values: $\sigma_1 = \sqrt{\lambda_1} = \frac{1+\sqrt{5}}{2} \approx 1.618$, $\sigma_2 = \sqrt{\lambda_2} = \frac{\sqrt{5}-1}{2} \approx 0.618$
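
As a quick numerical check of this example (a small NumPy sketch), the computed singular values should come out at approximately $1.618$ and $0.618$:

```python
import numpy as np

A = np.array([[1.0, 1.0], [0.0, 1.0]])
sigma = np.linalg.svd(A, compute_uv=False)                 # singular values, descending
expected = np.sqrt((3 + np.array([np.sqrt(5), -np.sqrt(5)])) / 2)
print(sigma)                                               # ~ [1.618, 0.618]
print(np.allclose(sigma, expected))                        # True
```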

Theorem 1.5: Uniqueness of SVD

The singular values are unique (up to ordering). The singular vectors are unique up to sign if the corresponding singular values are distinct.

Corollary 1.1: SVD and Matrix Norms

  • Frobenius norm: $\|A\|_F = \sqrt{\sum_i \sigma_i^2}$
  • Spectral norm: $\|A\|_2 = \sigma_1$ (the largest singular value)
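
Both identities are easy to confirm numerically. A minimal sketch, assuming an arbitrary test matrix:

```python
import numpy as np

A = np.array([[3.0, 1.0, 0.0], [1.0, 2.0, 4.0]])   # arbitrary test matrix
sigma = np.linalg.svd(A, compute_uv=False)

# Frobenius norm = sqrt(sum of squared singular values)
print(np.isclose(np.linalg.norm(A, 'fro'), np.sqrt(np.sum(sigma**2))))
# Spectral norm (2-norm) = largest singular value
print(np.isclose(np.linalg.norm(A, 2), sigma[0]))
```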

2. Quadratic Forms

Quadratic forms are homogeneous polynomials of degree 2. They are fundamental in optimization, geometry, and the classification of symmetric matrices.

Definition 2.1: Quadratic Form

A quadratic form on $\mathbb{R}^n$ is a function $Q: \mathbb{R}^n \to \mathbb{R}$ of the form:

$$Q(x) = x^T A x = \sum_{i,j} a_{ij} x_i x_j$$

where $A$ is a symmetric $n \times n$ matrix.

Definition 2.2: Classification

A symmetric matrix $A$ (or quadratic form $x^T A x$) is:

  • Positive definite: $x^T A x > 0$ for all $x \neq 0$
  • Positive semidefinite: $x^T A x \geq 0$ for all $x$
  • Negative definite: $x^T A x < 0$ for all $x \neq 0$
  • Negative semidefinite: $x^T A x \leq 0$ for all $x$
  • Indefinite: takes both positive and negative values
Theorem 2.1: Eigenvalue Characterization

$A$ is positive definite if and only if all eigenvalues of $A$ are positive.

Theorem 2.2: Sylvester's Criterion

A symmetric matrix $A$ is positive definite if and only if all leading principal minors are positive.

That is, the determinants of the upper-left $k \times k$ submatrices $A_1, A_2, \ldots, A_n$ are all positive.
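
Sylvester's criterion translates directly into code. The sketch below checks the leading principal minors; the function name and tolerance are our own choices, and for large or ill-conditioned matrices an attempted Cholesky factorization is the more robust test:

```python
import numpy as np

def is_positive_definite_sylvester(A, tol=1e-12):
    """Test positive definiteness of a symmetric matrix via its leading principal minors."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    # The k-th leading principal minor is det(A[:k, :k]); all must be positive.
    return all(np.linalg.det(A[:k, :k]) > tol for k in range(1, n + 1))

print(is_positive_definite_sylvester([[2, 2], [2, 6]]))   # True  (minors: 2, 8)
print(is_positive_definite_sylvester([[2, 2], [2, 2]]))   # False (minors: 2, 0)
```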

Definition 2.3: Signature

The signature of a quadratic form is the triple $(n_+, n_-, n_0)$ where:

  • $n_+$ = number of positive eigenvalues
  • $n_-$ = number of negative eigenvalues
  • $n_0$ = number of zero eigenvalues
Theorem 2.3: Sylvester's Law of Inertia

The signature is invariant under congruence transformations $P^T A P$ (where $P$ is invertible).

Example 2.1: Optimization

At a critical point of $f: \mathbb{R}^n \to \mathbb{R}$, the Hessian matrix $H$ (the matrix of second derivatives) determines the nature of the critical point:

  • $H$ positive definite → local minimum
  • $H$ negative definite → local maximum
  • $H$ indefinite → saddle point
Example 2.2: Conics and Quadrics

The equation $x^T A x = 1$ describes:

  • An ellipse/ellipsoid if $A$ is positive definite
  • A hyperbola/hyperboloid if $A$ is indefinite
  • A degenerate (cylindrical) quadric if $A$ is singular; genuinely parabolic types only arise once linear terms are present (see Theorem 4.2)
Theorem 2.4: Diagonalization of Quadratic Forms

Any quadratic form $Q(x) = x^T A x$ can be diagonalized: there exists an orthogonal matrix $P$ such that:

$$Q(Py) = y^T D y = \sum_{i=1}^n \lambda_i y_i^2$$

where $D$ is diagonal with the eigenvalues of $A$.

Proof:

Since $A$ is symmetric, it is orthogonally diagonalizable: $A = PDP^T$ with $P$ orthogonal and $D$ diagonal.

Substitute $x = Py$: $Q(Py) = (Py)^T A (Py) = y^T P^T A P y = y^T D y$.
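
The substitution in the proof can be verified numerically. A small sketch, assuming an arbitrary symmetric matrix (numpy.linalg.eigh returns an orthogonal eigenvector matrix for symmetric input):

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.normal(size=(3, 3))
A = (B + B.T) / 2                       # arbitrary symmetric matrix

eigvals, P = np.linalg.eigh(A)          # A = P D P^T with P orthogonal
y = rng.normal(size=3)                  # arbitrary coordinates in the eigenbasis
x = P @ y                               # substitution x = P y

print(np.isclose(x @ A @ x, np.sum(eigvals * y**2)))   # Q(Py) = sum_i lambda_i y_i^2
```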

Example 2.3: Classifying a Quadratic Form

For $Q(x, y) = 2x^2 + 4xy + 2y^2$, the matrix is $A = \begin{pmatrix} 2 & 2 \\ 2 & 2 \end{pmatrix}$.

Eigenvalues: $\lambda_1 = 4$, $\lambda_2 = 0$. So $A$ is positive semidefinite (but not positive definite).

3. SVD Applications

SVD has numerous practical applications in data science, signal processing, and machine learning. Understanding these applications demonstrates the power of this decomposition.

Definition 3.1: Low-Rank Approximation

The best rank-$k$ approximation of $A$ (in the Frobenius norm) is:

$$A_k = \sum_{i=1}^k \sigma_i u_i v_i^T$$

This minimizes $\|A - B\|_F$ over all matrices $B$ of rank at most $k$.

Theorem 3.1: Eckart-Young-Mirsky Theorem

The error in the rank-$k$ approximation is:

$$\|A - A_k\|_F = \sqrt{\sum_{i=k+1}^r \sigma_i^2}$$

where $r$ is the rank of $A$. The relative error is $\sqrt{\sum_{i=k+1}^r \sigma_i^2} \,/\, \sqrt{\sum_{i=1}^r \sigma_i^2}$.
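
The error formula is easy to verify: truncate an SVD and compare the Frobenius error with the discarded singular values. A sketch on a random matrix (chosen for illustration only):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(8, 6))
U, sigma, Vt = np.linalg.svd(A, full_matrices=False)

k = 3
A_k = (U[:, :k] * sigma[:k]) @ Vt[:k, :]      # best rank-k approximation
err = np.linalg.norm(A - A_k, 'fro')
tail = np.sqrt(np.sum(sigma[k:]**2))          # sqrt of the discarded sigma_i^2
print(np.isclose(err, tail))                  # True
print(tail / np.sqrt(np.sum(sigma**2)))       # relative error
```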

Example 3.1: Image Compression

A grayscale image is an $m \times n$ matrix. SVD gives $A = U \Sigma V^T$.

Keeping only the top $k$ singular values reduces storage from $mn$ to $k(m + n + 1)$ numbers.

For a $1000 \times 1000$ image with $k = 50$: compression ratio $\approx 1000^2 / (50 \cdot 2001) \approx 10$.

Definition 3.2: Principal Component Analysis (PCA)

Given a data matrix $X \in M_{m \times n}(\mathbb{R})$ (each row is a data point), center the data and compute the SVD of the centered matrix.

The principal components are the right singular vectors (columns of $V$), and the explained variance is proportional to $\sigma_i^2$.

Example 3.2: Dimensionality Reduction

Project the data onto the first $k$ principal components:

$$X_k = X V_k$$

where $V_k$ contains the first $k$ columns of $V$. This reduces the dimension from $n$ to $k$ while preserving maximum variance.
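
A minimal PCA-via-SVD sketch on synthetic data, assuming (as in the definition above) that rows are data points and columns are features:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))   # synthetic correlated data

Xc = X - X.mean(axis=0)                      # center each feature
U, sigma, Vt = np.linalg.svd(Xc, full_matrices=False)

k = 2
V_k = Vt[:k].T                               # first k principal components as columns
X_k = Xc @ V_k                               # project onto the top-k components
explained = sigma**2 / np.sum(sigma**2)      # variance fraction per component
print(X_k.shape)                             # (200, 2)
print(explained[:k].sum())                   # variance retained by the projection
```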

Theorem 3.2: Pseudoinverse and Least Squares

For the least squares problem $\min_x \|Ax - b\|^2$, the solution with minimum norm is:

$$x = A^+ b$$

where $A^+$ is the Moore-Penrose pseudoinverse computed via SVD.
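
A sketch comparing the pseudoinverse solution with NumPy's least-squares solver; both should agree on the minimum-norm solution (the rank-deficient system here is synthetic):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(6, 3)) @ rng.normal(size=(3, 4))   # 6x4 matrix of rank at most 3
b = rng.normal(size=6)

x_pinv = np.linalg.pinv(A) @ b                   # x = A^+ b via SVD
x_lstsq = np.linalg.lstsq(A, b, rcond=None)[0]   # minimum-norm least-squares solution
print(np.allclose(x_pinv, x_lstsq))              # True
print(np.linalg.norm(A @ x_pinv - b))            # residual of the fit
```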

Example 3.3: Recommender Systems

In collaborative filtering, the user-item matrix $A$ is approximated by a low-rank matrix via SVD. This captures latent factors (e.g., movie genres) and enables recommendations.

Remark 3.1: Numerical Stability

SVD is numerically stable and is the preferred method for computing matrix rank, solving least squares problems, and computing pseudoinverses in practice.

4. Quadratic Forms Applications

Quadratic forms appear naturally in optimization, geometry, and physics. Understanding their applications connects linear algebra to calculus, optimization theory, and differential geometry.

Theorem 4.1: Second Derivative Test

For a function $f: \mathbb{R}^n \to \mathbb{R}$ with a critical point at $x_0$, let $H$ be the Hessian matrix (the matrix of second partial derivatives) evaluated at $x_0$.

  • If $H$ is positive definite → $x_0$ is a local minimum
  • If $H$ is negative definite → $x_0$ is a local maximum
  • If $H$ is indefinite → $x_0$ is a saddle point
  • If $H$ is only semidefinite (singular) → the test is inconclusive and requires higher derivatives
Example 4.1: Optimization Example

For $f(x, y) = x^2 + 2xy + 3y^2$, the Hessian is:

$$H = \begin{pmatrix} 2 & 2 \\ 2 & 6 \end{pmatrix}$$

Eigenvalues: $4 + 2\sqrt{2}$ and $4 - 2\sqrt{2}$, both positive. So $H$ is positive definite, and $(0, 0)$ is a local minimum.

Definition 4.1: Conic Sections

A conic section in $\mathbb{R}^2$ is given by:

$$ax^2 + bxy + cy^2 + dx + ey + f = 0$$

In matrix form: $x^T A x + b^T x + c = 0$, where $A$ is the $2 \times 2$ symmetric matrix of the quadratic terms.

Theorem 4.2: Classification of Conics

The type of conic depends on the eigenvalues of $A$:

  • Both positive: ellipse
  • Both negative: ellipse (after sign change)
  • One positive, one negative: hyperbola
  • One zero: parabola
  • Both zero: degenerate (line or pair of lines)
Example 4.2: Quadric Surfaces

In $\mathbb{R}^3$, the equation $x^T A x = 1$ with $A$ positive definite gives an ellipsoid. The axes are along the eigenvectors of $A$, with semi-axis lengths $1/\sqrt{\lambda_i}$.

Theorem 4.3: Rayleigh Quotient

For a symmetric matrix $A$ and any $x \neq 0$:

$$\lambda_{\min} \leq \frac{x^T A x}{x^T x} \leq \lambda_{\max}$$

The minimum is attained when $x$ is an eigenvector for $\lambda_{\min}$, and the maximum when $x$ is an eigenvector for $\lambda_{\max}$.

Example 4.3: Constrained Optimization

Maximize $x^T A x$ subject to $\|x\| = 1$. By the Rayleigh quotient bound, the maximum is $\lambda_{\max}$, achieved at the corresponding eigenvector.
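
A quick numerical illustration of the bound: sample many unit vectors and check that the Rayleigh quotient never exceeds $\lambda_{\max}$, which is attained at the top eigenvector (random symmetric matrix, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(4)
B = rng.normal(size=(4, 4))
A = (B + B.T) / 2                                   # random symmetric test matrix

eigvals, eigvecs = np.linalg.eigh(A)                # ascending eigenvalues
lam_max, v_max = eigvals[-1], eigvecs[:, -1]

xs = rng.normal(size=(1000, 4))
xs /= np.linalg.norm(xs, axis=1, keepdims=True)     # random unit vectors
rayleigh = np.einsum('ij,jk,ik->i', xs, A, xs)      # x^T A x for each unit vector x
print(bool(rayleigh.max() <= lam_max + 1e-9))       # never exceeds lambda_max
print(np.isclose(v_max @ A @ v_max, lam_max))       # attained at the top eigenvector
```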

5. Positive Definite Matrices and Cholesky Decomposition

Positive definite matrices have special structure that enables efficient factorization and numerical algorithms. The Cholesky decomposition is a key tool for working with these matrices.

Theorem 5.1: Characterizations of Positive Definiteness

For a symmetric matrix $A$, the following are equivalent:

  1. $x^T A x > 0$ for all $x \neq 0$
  2. All eigenvalues of $A$ are positive
  3. All leading principal minors are positive (Sylvester's criterion)
  4. $A = B^T B$ for some invertible matrix $B$
  5. $A$ has a Cholesky decomposition
Definition 5.1: Cholesky Decomposition

A positive definite matrix $A$ can be uniquely factored as:

$$A = LL^T$$

where $L$ is lower triangular with positive diagonal entries. This is the Cholesky decomposition.

Theorem 5.2: Existence and Uniqueness of Cholesky

A symmetric matrix $A$ has a Cholesky decomposition if and only if $A$ is positive definite. The decomposition is unique if we require $L$ to have positive diagonal entries.

Example 5.1: Cholesky Example

For $A = \begin{pmatrix} 4 & 2 \\ 2 & 3 \end{pmatrix}$:

Cholesky decomposition:

$$L = \begin{pmatrix} 2 & 0 \\ 1 & \sqrt{2} \end{pmatrix}$$

Then $A = LL^T$.
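
This factorization can be reproduced with NumPy's Cholesky routine, which returns the lower-triangular factor; a quick check of the example:

```python
import numpy as np

A = np.array([[4.0, 2.0], [2.0, 3.0]])
L = np.linalg.cholesky(A)          # lower-triangular factor with positive diagonal
print(L)                           # [[2, 0], [1, sqrt(2)]]
print(np.allclose(L @ L.T, A))     # True: A = L L^T
```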

Theorem 5.3: Properties of Positive Definite Matrices

  • If $A, B$ are positive definite, so is $A + B$
  • If $A$ is positive definite and $P$ is invertible, then $P^T A P$ is positive definite
  • Principal submatrices of a positive definite matrix are positive definite
  • The inverse of a positive definite matrix is positive definite

Example 5.2: Covariance Matrices

In statistics, covariance matrices are always positive semidefinite. If no nontrivial linear combination of the random variables is almost surely constant (that is, the variables are not linearly dependent), the covariance matrix is positive definite.

Remark 5.1: Computational Advantages

Cholesky decomposition requires about $n^3/3$ floating-point operations (roughly half the cost of LU decomposition) and is numerically stable. It's used in solving linear systems, computing determinants, and generating correlated random variables.

6. More SVD Properties

SVD has many elegant mathematical properties that make it a fundamental tool in linear algebra. Understanding these properties deepens our appreciation of this decomposition.

Theorem 6.1: SVD and Eigenvalue Decomposition

For a symmetric positive semidefinite matrix $A$, the SVD coincides with the eigenvalue decomposition:

$$A = U \Sigma U^T$$

where $U$ contains the eigenvectors and $\Sigma$ contains the eigenvalues. For a general symmetric matrix, the singular values are the absolute values of the eigenvalues, and the left and right singular vectors are eigenvectors up to sign.

Theorem 6.2: Polar Decomposition

Any square matrix $A$ can be written as:

$$A = UP$$

where $U$ is orthogonal and $P$ is positive semidefinite. This is the polar decomposition.

From the SVD $A = U \Sigma V^T$, we get $A = (UV^T)(V \Sigma V^T)$, where $UV^T$ is orthogonal and $V \Sigma V^T$ is positive semidefinite.
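
A sketch that builds the polar factors from the SVD exactly as described, on an arbitrary square matrix (scipy.linalg.polar computes the same factorization directly, if SciPy is available):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.normal(size=(3, 3))                           # arbitrary square matrix

U, sigma, Vt = np.linalg.svd(A)
Q = U @ Vt                                            # orthogonal factor U V^T
P = Vt.T @ np.diag(sigma) @ Vt                        # positive semidefinite factor V Sigma V^T

print(np.allclose(Q @ Q.T, np.eye(3)))                # Q is orthogonal
print(bool(np.all(np.linalg.eigvalsh(P) >= -1e-12)))  # P is positive semidefinite
print(np.allclose(Q @ P, A))                          # A = Q P
```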

Definition 6.1: Condition Number

The condition number of a matrix $A$ (with respect to the 2-norm) is:

$$\kappa(A) = \frac{\sigma_{\max}}{\sigma_{\min}}$$

where $\sigma_{\max}$ and $\sigma_{\min}$ are the largest and smallest nonzero singular values.
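
The definition agrees with NumPy's built-in condition number. A small sketch contrasting a well-conditioned matrix with a nearly singular one (both matrices are arbitrary examples):

```python
import numpy as np

A_good = np.array([[2.0, 0.0], [0.0, 1.0]])           # kappa = 2
A_bad = np.array([[1.0, 1.0], [1.0, 1.0 + 1e-10]])    # nearly singular, huge kappa

for A in (A_good, A_bad):
    sigma = np.linalg.svd(A, compute_uv=False)
    kappa = sigma.max() / sigma.min()                 # sigma_max / sigma_min
    print(kappa, np.linalg.cond(A, 2))                # matches np.linalg.cond
```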

Remark 6.1: Numerical Stability

A large condition number indicates that the matrix is "ill-conditioned"—small changes in input can cause large changes in output. This is crucial for numerical algorithms.

Theorem 6.3: SVD and Matrix Factorization

SVD provides a factorization that generalizes many other decompositions:

  • For square invertible $A$: SVD gives the polar decomposition
  • For symmetric positive semidefinite $A$: SVD gives the eigenvalue decomposition
  • For any $A$: SVD exists regardless of shape or rank, making it the most general of these factorizations
Example 6.1: SVD for Rectangular Matrices

SVD works for any matrix, not just square ones. For $A \in M_{m \times n}$ with $m \neq n$, we still get $A = U \Sigma V^T$ where $U$ is $m \times m$ and $V$ is $n \times n$.

Theorem 6.4: SVD and Column/Row Spaces

  • The first $r$ left singular vectors (columns of $U$) form an orthonormal basis for the column space, where $r = \operatorname{rank}(A)$
  • The first $r$ right singular vectors (columns of $V$) form an orthonormal basis for the row space
  • The null space is spanned by the right singular vectors corresponding to zero singular values
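
These statements translate directly into code: split the singular vectors according to whether their singular value is numerically zero. A sketch on a deliberately rank-deficient matrix (the rank threshold below is a common heuristic, not the only choice):

```python
import numpy as np

# Rank-2 matrix: the third row is the sum of the first two
A = np.array([[1.0, 2.0, 3.0],
              [0.0, 1.0, 1.0],
              [1.0, 3.0, 4.0]])

U, sigma, Vt = np.linalg.svd(A)
tol = sigma.max() * max(A.shape) * np.finfo(float).eps
r = int(np.sum(sigma > tol))                 # numerical rank

col_basis = U[:, :r]                         # orthonormal basis of the column space
row_basis = Vt[:r].T                         # orthonormal basis of the row space
null_basis = Vt[r:].T                        # orthonormal basis of the null space

print(r)                                                  # 2
print(np.allclose(A @ null_basis, 0))                     # A kills the null space
print(np.allclose(col_basis @ (col_basis.T @ A), A))      # columns of A lie in the column space
```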

Example 6.2: SVD for Rank-Deficient Matrices

If $\operatorname{rank}(A) = r < \min(m, n)$, then only the first $r$ singular values are nonzero. The SVD reveals the structure: $A$ maps the $r$-dimensional row space onto the $r$-dimensional column space.

Course 11 Practice Quiz (10 questions)

  1. SVD decomposes any $m \times n$ matrix $A$ as: (Easy)
  2. Singular values are: (Medium)
  3. The rank of $A$ equals: (Medium)
  4. A quadratic form $Q(x) = x^T A x$ requires $A$ to be: (Easy)
  5. Positive definite means: (Easy)
  6. $A$ is positive definite iff: (Medium)
  7. Sylvester's criterion: $A$ is positive definite iff: (Medium)
  8. The best rank-$k$ approximation to $A$ is: (Medium)
  9. The pseudoinverse $A^+$ using SVD is: (Hard)
  10. Sylvester's law of inertia states: (Medium)

Frequently Asked Questions

What makes SVD so important?

SVD works for ANY matrix (any shape, any rank), provides optimal low-rank approximations, reveals rank, gives the pseudoinverse, and connects to eigenvalues. It's the 'Swiss army knife' of matrix decompositions.

How does SVD relate to eigendecomposition?

For symmetric positive semidefinite A, SVD and eigendecomposition coincide; for a general symmetric A, the singular values are the absolute values of the eigenvalues. For general A: singular values are √(eigenvalues of AᵀA), right singular vectors are eigenvectors of AᵀA, left singular vectors are eigenvectors of AAᵀ.

What's the geometric interpretation of SVD?

Any linear map can be decomposed as: (1) rotate/reflect by Vᵀ, (2) scale along axes by Σ, (3) rotate/reflect by U. SVD reveals the 'principal axes' of the transformation.

What's the difference between positive definite and positive semidefinite?

Definite: Q(x) > 0 for all x ≠ 0 (all eigenvalues positive). Semidefinite: Q(x) ≥ 0 for all x (all eigenvalues ≥ 0); it fails to be definite exactly when some eigenvalue is zero.

How do I check positive definiteness quickly?

Three ways: (1) All eigenvalues positive, (2) Sylvester's criterion (leading principal minors positive), (3) Cholesky decomposition exists (A = LLᵀ).