MathIsimple
LA-C11

Course 11: Advanced Topics Part 1

SVD & Quadratic Forms

This course covers two powerful advanced topics: the Singular Value Decomposition (SVD), which works for any matrix and provides optimal low-rank approximations, and quadratic forms, which classify the behavior of symmetric matrices and are fundamental to optimization.

Estimated time: 12–15 hours · Advanced level · 10 learning objectives
Learning Objectives
  • State and prove the SVD theorem for any matrix.
  • Understand the geometric interpretation of SVD.
  • Apply SVD to low-rank approximation and image compression.
  • Compute the pseudoinverse using SVD.
  • Define quadratic forms and their matrix representation.
  • Classify quadratic forms (positive/negative definite, semidefinite, indefinite).
  • Apply Sylvester's criterion for positive definiteness.
  • Understand the law of inertia and signature.
  • Apply quadratic forms to optimization problems.
  • Connect SVD and quadratic forms to eigenvalues and the spectral theorem.
Prerequisites
  • LA-C10: Inner Product Spaces
  • Spectral theorem and orthogonal diagonalization
  • Eigenvalues and eigenvectors
  • Matrix rank and null space
  • Determinants
Historical Context

The Singular Value Decomposition was discovered independently by Eugenio Beltrami (1835–1900) and Camille Jordan (1838–1922) in the 1870s, with James Joseph Sylvester (1814–1897) also contributing. The modern computational SVD algorithm was developed by Gene Golub and William Kahan in the 1960s. Quadratic forms have been studied since ancient times, with Carl Friedrich Gauss (1777–1855) developing the theory of binary quadratic forms. James Joseph Sylvester proved the law of inertia in 1852, and Carl Gustav Jacob Jacobi (1804–1851) developed methods for diagonalizing quadratic forms.

1. Singular Value Decomposition (SVD)

The Singular Value Decomposition is one of the most important matrix factorizations. Unlike eigendecomposition, it exists for any matrix, regardless of shape or rank.

Theorem 1.1: SVD Theorem

For any $m \times n$ matrix $A$, there exist:

  • An orthogonal $m \times m$ matrix $U$ (left singular vectors)
  • A diagonal $m \times n$ matrix $\Sigma$ with non-negative entries $\sigma_1 \geq \sigma_2 \geq \cdots \geq 0$ (singular values)
  • An orthogonal $n \times n$ matrix $V$ (right singular vectors)

such that $A = U \Sigma V^T$.

Definition 1.1: Singular Values

The singular values of $A$ are $\sigma_i = \sqrt{\lambda_i}$, where the $\lambda_i$ are the eigenvalues of $A^T A$ (or $A A^T$).

Theorem 1.2: Rank and SVD

The rank of $A$ equals the number of nonzero singular values.

Theorem 1.3: Eckart-Young Theorem

The best rank-$k$ approximation to $A$ (in the Frobenius or spectral norm) is:

$$A_k = \sum_{i=1}^k \sigma_i u_i v_i^T$$

where $u_i, v_i$ are the columns of $U, V$.

Definition 1.2: Pseudoinverse

The Moore-Penrose pseudoinverse is $A^+ = V \Sigma^+ U^T$, where $\Sigma^+_{ii} = 1/\sigma_i$ if $\sigma_i > 0$ and $0$ otherwise.

Example 1.1: SVD for Image Compression

An image is a matrix of pixel values. SVD gives $A = U \Sigma V^T$. Keeping only the top $k$ singular values gives a rank-$k$ approximation that preserves most visual information while reducing storage.

Remark 1.1: Connection to PCA

Principal Component Analysis (PCA) is essentially SVD of the centered data matrix. Principal components are right singular vectors, and explained variance comes from squared singular values.

Theorem 1.4: Computing SVD

To compute the SVD of $A \in M_{m \times n}(\mathbb{R})$:

  1. Compute $A^T A$ (or $A A^T$ if $m < n$)
  2. Find the eigenvalues $\lambda_i$ and orthonormal eigenvectors $v_i$ of $A^T A$
  3. Singular values: $\sigma_i = \sqrt{\lambda_i}$
  4. Right singular vectors: $v_i$ (columns of $V$)
  5. Left singular vectors: $u_i = \frac{1}{\sigma_i} A v_i$ for $\sigma_i > 0$
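
This recipe is easy to check numerically. Below is a minimal sketch (using NumPy, chosen here only for illustration) that builds the factors from the eigendecomposition of $A^T A$ and compares the singular values with `numpy.linalg.svd`; the function name `svd_via_eig` is ours, not a library routine:

```python
import numpy as np

def svd_via_eig(A):
    """Illustrative SVD via the eigendecomposition of A^T A (not a production algorithm)."""
    AtA = A.T @ A
    eigvals, V = np.linalg.eigh(AtA)              # ascending eigenvalues, orthonormal eigenvectors
    order = np.argsort(eigvals)[::-1]             # reorder so sigma_1 >= sigma_2 >= ...
    eigvals, V = eigvals[order], V[:, order]
    sigma = np.sqrt(np.clip(eigvals, 0.0, None))  # sigma_i = sqrt(lambda_i)
    # Left singular vectors: u_i = A v_i / sigma_i for the nonzero singular values
    U = np.column_stack([A @ V[:, i] / s for i, s in enumerate(sigma) if s > 1e-12])
    return U, sigma, V

A = np.array([[1.0, 1.0], [0.0, 1.0]])
U, sigma, V = svd_via_eig(A)
r = U.shape[1]                                                # number of nonzero singular values
print(np.allclose(sigma, np.linalg.svd(A, compute_uv=False))) # matches the library routine
print(np.allclose(U @ np.diag(sigma[:r]) @ V[:, :r].T, A))    # reconstructs A
```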
Example 1.2: SVD Computation Example

For $A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$:

$$A^T A = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}$$

Eigenvalues: $\lambda_1 = \frac{3+\sqrt{5}}{2}$, $\lambda_2 = \frac{3-\sqrt{5}}{2}$

Singular values: $\sigma_1 = \sqrt{\lambda_1} = \frac{1+\sqrt{5}}{2} \approx 1.618$, $\sigma_2 = \sqrt{\lambda_2} = \frac{\sqrt{5}-1}{2} \approx 0.618$
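
As a quick numerical check of this example (a small NumPy sketch), the computed singular values should come out at approximately $1.618$ and $0.618$:

```python
import numpy as np

A = np.array([[1.0, 1.0], [0.0, 1.0]])
sigma = np.linalg.svd(A, compute_uv=False)                 # singular values, descending
expected = np.sqrt((3 + np.array([np.sqrt(5), -np.sqrt(5)])) / 2)
print(sigma)                                               # ~ [1.618, 0.618]
print(np.allclose(sigma, expected))                        # True
```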

Theorem 1.5: Uniqueness of SVD

The singular values are unique (up to ordering). The singular vectors are unique up to sign if the corresponding singular values are distinct.

Corollary 1.1: SVD and Matrix Norms

  • Frobenius norm: $\|A\|_F = \sqrt{\sum_i \sigma_i^2}$
  • Spectral norm: $\|A\|_2 = \sigma_1$ (the largest singular value)
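
Both identities are easy to confirm numerically. A minimal sketch, assuming an arbitrary test matrix:

```python
import numpy as np

A = np.array([[3.0, 1.0, 0.0], [1.0, 2.0, 4.0]])   # arbitrary test matrix
sigma = np.linalg.svd(A, compute_uv=False)

# Frobenius norm = sqrt(sum of squared singular values)
print(np.isclose(np.linalg.norm(A, 'fro'), np.sqrt(np.sum(sigma**2))))
# Spectral norm (2-norm) = largest singular value
print(np.isclose(np.linalg.norm(A, 2), sigma[0]))
```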

2. Quadratic Forms

Quadratic forms are homogeneous polynomials of degree 2. They are fundamental in optimization, geometry, and the classification of symmetric matrices.

Definition 2.1: Quadratic Form

A quadratic form on $\mathbb{R}^n$ is a function $Q: \mathbb{R}^n \to \mathbb{R}$ of the form:

$$Q(x) = x^T A x = \sum_{i,j} a_{ij} x_i x_j$$

where $A$ is a symmetric $n \times n$ matrix.

Definition 2.2: Classification

A symmetric matrix $A$ (or quadratic form $x^T A x$) is:

  • Positive definite: $x^T A x > 0$ for all $x \neq 0$
  • Positive semidefinite: $x^T A x \geq 0$ for all $x$
  • Negative definite: $x^T A x < 0$ for all $x \neq 0$
  • Negative semidefinite: $x^T A x \leq 0$ for all $x$
  • Indefinite: takes both positive and negative values
Theorem 2.1: Eigenvalue Characterization

$A$ is positive definite if and only if all eigenvalues of $A$ are positive.

Theorem 2.2: Sylvester's Criterion

A symmetric matrix $A$ is positive definite if and only if all leading principal minors are positive.

That is, the determinants of the upper-left $k \times k$ submatrices $A_1, A_2, \ldots, A_n$ are all positive.
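
Sylvester's criterion translates directly into code. The sketch below checks the leading principal minors; the function name and tolerance are our own choices, and for large or ill-conditioned matrices an attempted Cholesky factorization is the more robust test:

```python
import numpy as np

def is_positive_definite_sylvester(A, tol=1e-12):
    """Test positive definiteness of a symmetric matrix via its leading principal minors."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    # The k-th leading principal minor is det(A[:k, :k]); all must be positive.
    return all(np.linalg.det(A[:k, :k]) > tol for k in range(1, n + 1))

print(is_positive_definite_sylvester([[2, 2], [2, 6]]))   # True  (minors: 2, 8)
print(is_positive_definite_sylvester([[2, 2], [2, 2]]))   # False (minors: 2, 0)
```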

Definition 2.3: Signature

The signature of a quadratic form is the triple $(n_+, n_-, n_0)$ where:

  • $n_+$ = number of positive eigenvalues
  • $n_-$ = number of negative eigenvalues
  • $n_0$ = number of zero eigenvalues
Theorem 2.3: Sylvester's Law of Inertia

The signature is invariant under congruence transformations $P^T A P$ (where $P$ is invertible).

Example 2.1: Optimization

At a critical point of $f: \mathbb{R}^n \to \mathbb{R}$, the Hessian matrix $H$ (the matrix of second derivatives) determines the nature of the critical point:

  • $H$ positive definite → local minimum
  • $H$ negative definite → local maximum
  • $H$ indefinite → saddle point
Example 2.2: Conics and Quadrics

The equation $x^T A x = 1$ describes:

  • An ellipse/ellipsoid if $A$ is positive definite
  • A hyperbola/hyperboloid if $A$ is indefinite
  • A degenerate (cylindrical) quadric if $A$ is singular; genuinely parabolic types only arise once linear terms are present (see Theorem 4.2)
Theorem 2.4: Diagonalization of Quadratic Forms

Any quadratic form $Q(x) = x^T A x$ can be diagonalized: there exists an orthogonal matrix $P$ such that:

$$Q(Py) = y^T D y = \sum_{i=1}^n \lambda_i y_i^2$$

where $D$ is diagonal with the eigenvalues of $A$.

Proof:

Since $A$ is symmetric, it is orthogonally diagonalizable: $A = PDP^T$ with $P$ orthogonal and $D$ diagonal.

Substitute $x = Py$: $Q(Py) = (Py)^T A (Py) = y^T P^T A P y = y^T D y$.
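
The substitution in the proof can be verified numerically. A small sketch, assuming an arbitrary symmetric matrix (numpy.linalg.eigh returns an orthogonal eigenvector matrix for symmetric input):

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.normal(size=(3, 3))
A = (B + B.T) / 2                       # arbitrary symmetric matrix

eigvals, P = np.linalg.eigh(A)          # A = P D P^T with P orthogonal
y = rng.normal(size=3)                  # arbitrary coordinates in the eigenbasis
x = P @ y                               # substitution x = P y

print(np.isclose(x @ A @ x, np.sum(eigvals * y**2)))   # Q(Py) = sum_i lambda_i y_i^2
```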

Example 2.3: Classifying a Quadratic Form

For $Q(x, y) = 2x^2 + 4xy + 2y^2$, the matrix is $A = \begin{pmatrix} 2 & 2 \\ 2 & 2 \end{pmatrix}$.

Eigenvalues: $\lambda_1 = 4$, $\lambda_2 = 0$. So $A$ is positive semidefinite (but not positive definite).

3. SVD Applications

SVD has numerous practical applications in data science, signal processing, and machine learning. Understanding these applications demonstrates the power of this decomposition.

Definition 3.1: Low-Rank Approximation

The best rank-$k$ approximation of $A$ (in the Frobenius norm) is:

$$A_k = \sum_{i=1}^k \sigma_i u_i v_i^T$$

This minimizes $\|A - B\|_F$ over all matrices $B$ of rank at most $k$.

Theorem 3.1: Eckart-Young-Mirsky Theorem

The error in the rank-$k$ approximation is:

$$\|A - A_k\|_F = \sqrt{\sum_{i=k+1}^r \sigma_i^2}$$

where $r$ is the rank of $A$. The relative error is $\sqrt{\sum_{i=k+1}^r \sigma_i^2} \,/\, \sqrt{\sum_{i=1}^r \sigma_i^2}$.
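
The error formula is easy to verify: truncate an SVD and compare the Frobenius error with the discarded singular values. A sketch on a random matrix (chosen for illustration only):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(8, 6))
U, sigma, Vt = np.linalg.svd(A, full_matrices=False)

k = 3
A_k = (U[:, :k] * sigma[:k]) @ Vt[:k, :]      # best rank-k approximation
err = np.linalg.norm(A - A_k, 'fro')
tail = np.sqrt(np.sum(sigma[k:]**2))          # sqrt of the discarded sigma_i^2
print(np.isclose(err, tail))                  # True
print(tail / np.sqrt(np.sum(sigma**2)))       # relative error
```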

Example 3.1: Image Compression

A grayscale image is an $m \times n$ matrix. SVD gives $A = U \Sigma V^T$.

Keeping only the top $k$ singular values reduces storage from $mn$ to $k(m + n + 1)$ numbers.

For a $1000 \times 1000$ image with $k = 50$: compression ratio $\approx 1000^2 / (50 \cdot 2001) \approx 10$.

Definition 3.2: Principal Component Analysis (PCA)

Given a data matrix $X \in M_{m \times n}(\mathbb{R})$ (each row is a data point), center the data and compute the SVD of the centered matrix.

The principal components are the right singular vectors (columns of $V$), and the explained variance is proportional to $\sigma_i^2$.

Example 3.2: Dimensionality Reduction

Project the data onto the first $k$ principal components:

$$X_k = X V_k$$

where $V_k$ contains the first $k$ columns of $V$. This reduces the dimension from $n$ to $k$ while preserving maximum variance.
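
A minimal PCA-via-SVD sketch on synthetic data, assuming (as in the definition above) that rows are data points and columns are features:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))   # synthetic correlated data

Xc = X - X.mean(axis=0)                      # center each feature
U, sigma, Vt = np.linalg.svd(Xc, full_matrices=False)

k = 2
V_k = Vt[:k].T                               # first k principal components as columns
X_k = Xc @ V_k                               # project onto the top-k components
explained = sigma**2 / np.sum(sigma**2)      # variance fraction per component
print(X_k.shape)                             # (200, 2)
print(explained[:k].sum())                   # variance retained by the projection
```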

Theorem 3.2: Pseudoinverse and Least Squares

For the least squares problem $\min_x \|Ax - b\|^2$, the solution with minimum norm is:

$$x = A^+ b$$

where $A^+$ is the Moore-Penrose pseudoinverse computed via SVD.
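
A sketch comparing the pseudoinverse solution with NumPy's least-squares solver; both should agree on the minimum-norm solution (the rank-deficient system here is synthetic):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(6, 3)) @ rng.normal(size=(3, 4))   # 6x4 matrix of rank at most 3
b = rng.normal(size=6)

x_pinv = np.linalg.pinv(A) @ b                   # x = A^+ b via SVD
x_lstsq = np.linalg.lstsq(A, b, rcond=None)[0]   # minimum-norm least-squares solution
print(np.allclose(x_pinv, x_lstsq))              # True
print(np.linalg.norm(A @ x_pinv - b))            # residual of the fit
```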

Example 3.3: Recommender Systems

In collaborative filtering, the user-item matrix $A$ is approximated by a low-rank matrix via SVD. This captures latent factors (e.g., movie genres) and enables recommendations.

Remark 3.1: Numerical Stability

SVD is numerically stable and is the preferred method for computing matrix rank, solving least squares problems, and computing pseudoinverses in practice.

4. Quadratic Forms Applications

Quadratic forms appear naturally in optimization, geometry, and physics. Understanding their applications connects linear algebra to calculus, optimization theory, and differential geometry.

Theorem 4.1: Second Derivative Test

For a function $f: \mathbb{R}^n \to \mathbb{R}$ with a critical point at $x_0$, let $H$ be the Hessian matrix (the matrix of second partial derivatives) evaluated at $x_0$.

  • If $H$ is positive definite → $x_0$ is a local minimum
  • If $H$ is negative definite → $x_0$ is a local maximum
  • If $H$ is indefinite → $x_0$ is a saddle point
  • If $H$ is only semidefinite (singular) → the test is inconclusive and requires higher derivatives
Example 4.1: Optimization Example

For $f(x, y) = x^2 + 2xy + 3y^2$, the Hessian is:

$$H = \begin{pmatrix} 2 & 2 \\ 2 & 6 \end{pmatrix}$$

Eigenvalues: $4 + 2\sqrt{2}$ and $4 - 2\sqrt{2}$, both positive. So $H$ is positive definite, and $(0, 0)$ is a local minimum.

Definition 4.1: Conic Sections

A conic section in $\mathbb{R}^2$ is given by:

$$ax^2 + bxy + cy^2 + dx + ey + f = 0$$

In matrix form: $x^T A x + b^T x + c = 0$, where $A$ is the $2 \times 2$ symmetric matrix of the quadratic terms.

Theorem 4.2: Classification of Conics

The type of conic depends on the eigenvalues of $A$:

  • Both positive: ellipse
  • Both negative: ellipse (after sign change)
  • One positive, one negative: hyperbola
  • One zero: parabola
  • Both zero: degenerate (line or pair of lines)
Example 4.2: Quadric Surfaces

In $\mathbb{R}^3$, the equation $x^T A x = 1$ with $A$ positive definite gives an ellipsoid. The axes are along the eigenvectors of $A$, with semi-axis lengths $1/\sqrt{\lambda_i}$.

Theorem 4.3: Rayleigh Quotient

For a symmetric matrix $A$ and any $x \neq 0$:

$$\lambda_{\min} \leq \frac{x^T A x}{x^T x} \leq \lambda_{\max}$$

The minimum is attained when $x$ is an eigenvector for $\lambda_{\min}$, and the maximum when $x$ is an eigenvector for $\lambda_{\max}$.

Example 4.3: Constrained Optimization

Maximize $x^T A x$ subject to $\|x\| = 1$. By the Rayleigh quotient bound, the maximum is $\lambda_{\max}$, achieved at the corresponding eigenvector.
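
A quick numerical illustration of the bound: sample many unit vectors and check that the Rayleigh quotient never exceeds $\lambda_{\max}$, which is attained at the top eigenvector (random symmetric matrix, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(4)
B = rng.normal(size=(4, 4))
A = (B + B.T) / 2                                   # random symmetric test matrix

eigvals, eigvecs = np.linalg.eigh(A)                # ascending eigenvalues
lam_max, v_max = eigvals[-1], eigvecs[:, -1]

xs = rng.normal(size=(1000, 4))
xs /= np.linalg.norm(xs, axis=1, keepdims=True)     # random unit vectors
rayleigh = np.einsum('ij,jk,ik->i', xs, A, xs)      # x^T A x for each unit vector x
print(bool(rayleigh.max() <= lam_max + 1e-9))       # never exceeds lambda_max
print(np.isclose(v_max @ A @ v_max, lam_max))       # attained at the top eigenvector
```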

5. Positive Definite Matrices and Cholesky Decomposition

Positive definite matrices have special structure that enables efficient factorization and numerical algorithms. The Cholesky decomposition is a key tool for working with these matrices.

Theorem 5.1: Characterizations of Positive Definiteness

For a symmetric matrix $A$, the following are equivalent:

  1. $x^T A x > 0$ for all $x \neq 0$
  2. All eigenvalues of $A$ are positive
  3. All leading principal minors are positive (Sylvester's criterion)
  4. $A = B^T B$ for some invertible matrix $B$
  5. $A$ has a Cholesky decomposition
Definition 5.1: Cholesky Decomposition

A positive definite matrix $A$ can be uniquely factored as:

$$A = LL^T$$

where $L$ is lower triangular with positive diagonal entries. This is the Cholesky decomposition.

Theorem 5.2: Existence and Uniqueness of Cholesky

A symmetric matrix $A$ has a Cholesky decomposition if and only if $A$ is positive definite. The decomposition is unique if we require $L$ to have positive diagonal entries.

Example 5.1: Cholesky Example

For $A = \begin{pmatrix} 4 & 2 \\ 2 & 3 \end{pmatrix}$:

Cholesky decomposition:

$$L = \begin{pmatrix} 2 & 0 \\ 1 & \sqrt{2} \end{pmatrix}$$

Then $A = LL^T$.
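
This factorization can be reproduced with NumPy's Cholesky routine, which returns the lower-triangular factor; a quick check of the example:

```python
import numpy as np

A = np.array([[4.0, 2.0], [2.0, 3.0]])
L = np.linalg.cholesky(A)          # lower-triangular factor with positive diagonal
print(L)                           # [[2, 0], [1, sqrt(2)]]
print(np.allclose(L @ L.T, A))     # True: A = L L^T
```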

Theorem 5.3: Properties of Positive Definite Matrices

  • If $A, B$ are positive definite, so is $A + B$
  • If $A$ is positive definite and $P$ is invertible, then $P^T A P$ is positive definite
  • Principal submatrices of a positive definite matrix are positive definite
  • The inverse of a positive definite matrix is positive definite

Example 5.2: Covariance Matrices

In statistics, covariance matrices are always positive semidefinite. If no nontrivial linear combination of the random variables is almost surely constant (that is, the variables are not linearly dependent), the covariance matrix is positive definite.

Remark 5.1: Computational Advantages

Cholesky decomposition requires about $n^3/3$ floating-point operations (roughly half the cost of LU decomposition) and is numerically stable. It's used in solving linear systems, computing determinants, and generating correlated random variables.

6. More SVD Properties

SVD has many elegant mathematical properties that make it a fundamental tool in linear algebra. Understanding these properties deepens our appreciation of this decomposition.

Theorem 6.1: SVD and Eigenvalue Decomposition

For a symmetric positive semidefinite matrix $A$, the SVD coincides with the eigenvalue decomposition:

$$A = U \Sigma U^T$$

where $U$ contains the eigenvectors and $\Sigma$ contains the eigenvalues. For a general symmetric matrix, the singular values are the absolute values of the eigenvalues, and the left and right singular vectors are eigenvectors up to sign.

Theorem 6.2: Polar Decomposition

Any square matrix $A$ can be written as:

$$A = UP$$

where $U$ is orthogonal and $P$ is positive semidefinite. This is the polar decomposition.

From the SVD $A = U \Sigma V^T$, we get $A = (UV^T)(V \Sigma V^T)$, where $UV^T$ is orthogonal and $V \Sigma V^T$ is positive semidefinite.
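
A sketch that builds the polar factors from the SVD exactly as described, on an arbitrary square matrix (scipy.linalg.polar computes the same factorization directly, if SciPy is available):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.normal(size=(3, 3))                           # arbitrary square matrix

U, sigma, Vt = np.linalg.svd(A)
Q = U @ Vt                                            # orthogonal factor U V^T
P = Vt.T @ np.diag(sigma) @ Vt                        # positive semidefinite factor V Sigma V^T

print(np.allclose(Q @ Q.T, np.eye(3)))                # Q is orthogonal
print(bool(np.all(np.linalg.eigvalsh(P) >= -1e-12)))  # P is positive semidefinite
print(np.allclose(Q @ P, A))                          # A = Q P
```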

Definition 6.1: Condition Number

The condition number of a matrix $A$ (with respect to the 2-norm) is:

$$\kappa(A) = \frac{\sigma_{\max}}{\sigma_{\min}}$$

where $\sigma_{\max}$ and $\sigma_{\min}$ are the largest and smallest nonzero singular values.
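
The definition agrees with NumPy's built-in condition number. A small sketch contrasting a well-conditioned matrix with a nearly singular one (both matrices are arbitrary examples):

```python
import numpy as np

A_good = np.array([[2.0, 0.0], [0.0, 1.0]])           # kappa = 2
A_bad = np.array([[1.0, 1.0], [1.0, 1.0 + 1e-10]])    # nearly singular, huge kappa

for A in (A_good, A_bad):
    sigma = np.linalg.svd(A, compute_uv=False)
    kappa = sigma.max() / sigma.min()                 # sigma_max / sigma_min
    print(kappa, np.linalg.cond(A, 2))                # matches np.linalg.cond
```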

Remark 6.1: Numerical Stability

A large condition number indicates that the matrix is "ill-conditioned"—small changes in input can cause large changes in output. This is crucial for numerical algorithms.

Theorem 6.3: SVD and Matrix Factorization

SVD provides a factorization that generalizes many other decompositions:

  • For square invertible $A$: SVD gives the polar decomposition
  • For symmetric positive semidefinite $A$: SVD gives the eigenvalue decomposition
  • For any $A$: SVD exists regardless of shape or rank, making it the most general of these factorizations
Example 6.1: SVD for Rectangular Matrices

SVD works for any matrix, not just square ones. For $A \in M_{m \times n}$ with $m \neq n$, we still get $A = U \Sigma V^T$ where $U$ is $m \times m$ and $V$ is $n \times n$.

Theorem 6.4: SVD and Column/Row Spaces

  • The first $r$ left singular vectors (columns of $U$) form an orthonormal basis for the column space, where $r = \operatorname{rank}(A)$
  • The first $r$ right singular vectors (columns of $V$) form an orthonormal basis for the row space
  • The null space is spanned by the right singular vectors corresponding to zero singular values
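
These statements translate directly into code: split the singular vectors according to whether their singular value is numerically zero. A sketch on a deliberately rank-deficient matrix (the rank threshold below is a common heuristic, not the only choice):

```python
import numpy as np

# Rank-2 matrix: the third row is the sum of the first two
A = np.array([[1.0, 2.0, 3.0],
              [0.0, 1.0, 1.0],
              [1.0, 3.0, 4.0]])

U, sigma, Vt = np.linalg.svd(A)
tol = sigma.max() * max(A.shape) * np.finfo(float).eps
r = int(np.sum(sigma > tol))                 # numerical rank

col_basis = U[:, :r]                         # orthonormal basis of the column space
row_basis = Vt[:r].T                         # orthonormal basis of the row space
null_basis = Vt[r:].T                        # orthonormal basis of the null space

print(r)                                                  # 2
print(np.allclose(A @ null_basis, 0))                     # A kills the null space
print(np.allclose(col_basis @ (col_basis.T @ A), A))      # columns of A lie in the column space
```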

Example 6.2: SVD for Rank-Deficient Matrices

If $\operatorname{rank}(A) = r < \min(m, n)$, then only the first $r$ singular values are nonzero. The SVD reveals the structure: $A$ maps the $r$-dimensional row space onto the $r$-dimensional column space.

Course 11 Practice Quiz (10 questions)

  1. SVD decomposes any $m \times n$ matrix $A$ as: (Easy)
  2. Singular values are: (Medium)
  3. The rank of $A$ equals: (Medium)
  4. A quadratic form $Q(x) = x^T A x$ requires $A$ to be: (Easy)
  5. Positive definite means: (Easy)
  6. $A$ is positive definite iff: (Medium)
  7. Sylvester's criterion: $A$ is positive definite iff: (Medium)
  8. The best rank-$k$ approximation to $A$ is: (Medium)
  9. The pseudoinverse $A^+$ using SVD is: (Hard)
  10. Sylvester's law of inertia states: (Medium)

Frequently Asked Questions

What makes SVD so important?

SVD works for ANY matrix (any shape, any rank), provides optimal low-rank approximations, reveals rank, gives the pseudoinverse, and connects to eigenvalues. It's the 'Swiss army knife' of matrix decompositions.

How does SVD relate to eigendecomposition?

For symmetric positive semidefinite A, SVD and eigendecomposition coincide; for a general symmetric A, the singular values are the absolute values of the eigenvalues. For general A: singular values are √(eigenvalues of AᵀA), right singular vectors are eigenvectors of AᵀA, left singular vectors are eigenvectors of AAᵀ.

What's the geometric interpretation of SVD?

Any linear map can be decomposed as: (1) rotate/reflect by Vᵀ, (2) scale along axes by Σ, (3) rotate/reflect by U. SVD reveals the 'principal axes' of the transformation.

What's the difference between positive definite and positive semidefinite?

Definite: Q(x) > 0 for all x ≠ 0 (all eigenvalues positive). Semidefinite: Q(x) ≥ 0 for all x (all eigenvalues ≥ 0); it fails to be definite exactly when some eigenvalue is zero.

How do I check positive definiteness quickly?

Three ways: (1) All eigenvalues positive, (2) Sylvester's criterion (leading principal minors positive), (3) Cholesky decomposition exists (A = LLᵀ).