SVD & Quadratic Forms
This course covers two powerful advanced topics: the Singular Value Decomposition (SVD), which works for any matrix and provides optimal low-rank approximations, and quadratic forms, which classify the behavior of symmetric matrices and are fundamental to optimization.
The Singular Value Decomposition was discovered independently by Eugenio Beltrami (1835–1900) and Camille Jordan (1838–1922) in the 1870s, with James Joseph Sylvester (1814–1897) also contributing. The modern computational SVD algorithm was developed by Gene Golub and William Kahan in the 1960s. Quadratic forms have been studied since ancient times, with Carl Friedrich Gauss (1777–1855) developing the theory of binary quadratic forms. James Joseph Sylvester proved the law of inertia in 1852, and Carl Gustav Jacob Jacobi (1804–1851) developed methods for diagonalizing quadratic forms.
The Singular Value Decomposition is one of the most important matrix factorizations. Unlike eigendecomposition, it exists for any matrix, regardless of shape or rank.
For any matrix $A \in \mathbb{R}^{m \times n}$, there exist an orthogonal matrix $U \in \mathbb{R}^{m \times m}$, an orthogonal matrix $V \in \mathbb{R}^{n \times n}$, and a diagonal matrix $\Sigma \in \mathbb{R}^{m \times n}$ with nonnegative diagonal entries $\sigma_1 \ge \sigma_2 \ge \cdots \ge 0$,
such that $A = U \Sigma V^T$.
The singular values of $A$ are $\sigma_i = \sqrt{\lambda_i}$, where $\lambda_i$ are the eigenvalues of $A^T A$ (or $A A^T$).
The rank of $A$ equals the number of nonzero singular values.
The best rank-$k$ approximation to $A$ (in Frobenius or spectral norm) is $A_k = \sum_{i=1}^{k} \sigma_i u_i v_i^T$,
where $u_i$ and $v_i$ are columns of $U$ and $V$.
The Moore-Penrose pseudoinverse is $A^+ = V \Sigma^+ U^T$, where $\Sigma^+$ has diagonal entries $1/\sigma_i$ if $\sigma_i \neq 0$, else $0$.
An image is a matrix of pixel values. SVD gives $A = U \Sigma V^T$. Keeping only the top $k$ singular values gives a rank-$k$ approximation that preserves most visual information while reducing storage.
Principal Component Analysis (PCA) is essentially SVD of the centered data matrix. Principal components are right singular vectors, and explained variance comes from squared singular values.
To compute the SVD of $A$: form $A^T A$, find its eigenvalues $\lambda_1 \ge \cdots \ge \lambda_n \ge 0$ and an orthonormal set of eigenvectors (the columns of $V$), set $\sigma_i = \sqrt{\lambda_i}$, and take $u_i = A v_i / \sigma_i$ for each nonzero $\sigma_i$.
For example, take $A = \begin{pmatrix} 3 & 0 \\ 4 & 5 \end{pmatrix}$, so $A^T A = \begin{pmatrix} 25 & 20 \\ 20 & 25 \end{pmatrix}$.
Eigenvalues: $\lambda_1 = 45$, $\lambda_2 = 5$.
Singular values: $\sigma_1 = \sqrt{45} = 3\sqrt{5}$, $\sigma_2 = \sqrt{5}$.
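A minimal NumPy sketch of this computation, using the illustrative matrix from the example above:

```python
import numpy as np

# Illustrative matrix chosen so the numbers come out cleanly.
A = np.array([[3.0, 0.0],
              [4.0, 5.0]])

# Full SVD: A = U @ diag(s) @ Vt, singular values in descending order.
U, s, Vt = np.linalg.svd(A)
print("singular values:", s)            # ~[6.708, 2.236] = [3*sqrt(5), sqrt(5)]

# Cross-check: squared singular values are the eigenvalues of A^T A.
print("eigenvalues of A^T A:", np.linalg.eigvalsh(A.T @ A))   # [5, 45], ascending

# Reconstruction check.
print("reconstruction error:", np.linalg.norm(A - U @ np.diag(s) @ Vt))
```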
The singular values are unique (up to ordering). The singular vectors are unique up to sign if the corresponding singular values are distinct.
A quadratic form is a homogeneous polynomial of degree 2. Quadratic forms are fundamental in optimization, geometry, and the classification of symmetric matrices.
A quadratic form on $\mathbb{R}^n$ is a function of the form $Q(x) = x^T A x$,
where $A$ is a symmetric $n \times n$ matrix.
A symmetric matrix $A$ (or the quadratic form $Q(x) = x^T A x$) is: positive definite if $Q(x) > 0$ for all $x \neq 0$; positive semidefinite if $Q(x) \ge 0$ for all $x$; negative definite if $Q(x) < 0$ for all $x \neq 0$; negative semidefinite if $Q(x) \le 0$ for all $x$; and indefinite if $Q$ takes both positive and negative values.
$A$ is positive definite if and only if all eigenvalues of $A$ are positive.
A symmetric matrix is positive definite if and only if all leading principal minors are positive.
That is, the determinants of the upper-left $k \times k$ submatrices are positive for $k = 1, \dots, n$.
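A short NumPy sketch of Sylvester's criterion (the helper name and the test matrix are illustrative choices):

```python
import numpy as np

def is_positive_definite_sylvester(A, tol=1e-12):
    # Sylvester's criterion: all leading principal minors
    # (determinants of the upper-left k x k blocks) must be positive.
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    return all(np.linalg.det(A[:k, :k]) > tol for k in range(1, n + 1))

# Example: minors are 2 and 2*2 - (-1)*(-1) = 3, both positive.
A = np.array([[2.0, -1.0],
              [-1.0, 2.0]])
print(is_positive_definite_sylvester(A))     # True
print(np.all(np.linalg.eigvalsh(A) > 0))     # agrees with the eigenvalue test
```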
The signature of a quadratic form is the triple $(n_+, n_-, n_0)$, where $n_+$ is the number of positive eigenvalues of $A$, $n_-$ is the number of negative eigenvalues, and $n_0$ is the number of zero eigenvalues.
The signature is invariant under congruence transformations $A \mapsto P^T A P$ (where $P$ is invertible).
At a critical point of $f$, the Hessian matrix $H$ (matrix of second derivatives) determines the nature: if $H$ is positive definite, the point is a local minimum; if negative definite, a local maximum; if indefinite, a saddle point.
The equation $x^T A x = 1$ describes an ellipse (or ellipsoid in higher dimensions) if $A$ is positive definite, a hyperbola if $A$ is indefinite, and has no solutions if $A$ is negative definite.
Any quadratic form can be diagonalized: there exists an orthogonal matrix $P$ such that the substitution $y = P^T x$ gives $Q(x) = y^T D y = \lambda_1 y_1^2 + \cdots + \lambda_n y_n^2$,
where $D$ is diagonal with the eigenvalues of $A$.
Since $A$ is symmetric, it's orthogonally diagonalizable: $A = P D P^T$ with $P$ orthogonal and $D$ diagonal.
Substitute $x = P y$: $Q(x) = x^T A x = y^T P^T (P D P^T) P y = y^T D y$.
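A brief NumPy sketch of this diagonalization (the matrix $A$ is an assumed example):

```python
import numpy as np

# Assumed symmetric matrix for the quadratic form Q(x) = x^T A x.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# Orthogonal diagonalization A = P D P^T (eigh handles symmetric matrices).
eigvals, P = np.linalg.eigh(A)
D = np.diag(eigvals)

# In the new variables y = P^T x, the form is a pure sum of squares.
x = np.array([1.0, -2.0])
y = P.T @ x
print(x @ A @ x)   # value of the quadratic form
print(y @ D @ y)   # same value in the diagonalized coordinates
```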
For example, for $Q(x_1, x_2) = x_1^2 + 2x_1x_2 + x_2^2$, the matrix is $A = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}$.
Eigenvalues: $\lambda_1 = 2$, $\lambda_2 = 0$. So $Q$ is positive semidefinite (not positive definite).
SVD has numerous practical applications in data science, signal processing, and machine learning. Understanding these applications demonstrates the power of this decomposition.
The best rank-$k$ approximation of $A$ (in Frobenius norm) is $A_k = \sum_{i=1}^{k} \sigma_i u_i v_i^T$.
This minimizes $\|A - B\|_F$ over all rank-$k$ matrices $B$.
The error in the rank-$k$ approximation is $\|A - A_k\|_F = \sqrt{\sigma_{k+1}^2 + \cdots + \sigma_r^2}$,
where $r$ is the rank of $A$. The relative error is $\|A - A_k\|_F / \|A\|_F = \sqrt{\sum_{i > k} \sigma_i^2 \,/\, \sum_{i \le r} \sigma_i^2}$.
A grayscale image is an $m \times n$ matrix. SVD gives $A = U \Sigma V^T$.
Keeping only the top $k$ singular values reduces storage from $mn$ to $k(m + n + 1)$ numbers.
For an $m \times n$ image, the compression ratio is $\approx mn / (k(m + n + 1))$; for example, a $512 \times 512$ image with $k = 50$ gives a ratio of about $5{:}1$.
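The storage arithmetic can be checked with a small NumPy sketch (the "image" here is synthetic random data standing in for real pixel values):

```python
import numpy as np

m, n, k = 512, 512, 50
rng = np.random.default_rng(0)
image = rng.random((m, n))               # synthetic stand-in for a grayscale image

U, s, Vt = np.linalg.svd(image, full_matrices=False)
image_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # rank-k approximation

storage_full = m * n
storage_k = k * (m + n + 1)
rel_error = np.linalg.norm(image - image_k) / np.linalg.norm(image)
print(f"compression ratio ~ {storage_full / storage_k:.1f}:1, relative error {rel_error:.3f}")
```

(A random matrix compresses poorly; a real photograph, with strongly correlated pixels, retains far more detail at the same rank.)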
Given a data matrix $X \in \mathbb{R}^{n \times d}$ (each row is a data point), center the data and compute the SVD of the centered matrix: $X_c = U \Sigma V^T$.
The principal components are the right singular vectors (columns of $V$), and the explained variance of component $i$ is proportional to $\sigma_i^2$.
Project the data onto the first $k$ principal components: $Z = X_c V_k$,
where $V_k$ contains the first $k$ columns of $V$. This reduces the dimension from $d$ to $k$ while preserving maximum variance.
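A minimal PCA-via-SVD sketch (the data here is assumed synthetic):

```python
import numpy as np

# Toy data: 200 points in 5 dimensions (rows are observations).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))

# Center, then take the SVD of the centered matrix.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Principal components = right singular vectors; explained variance ~ sigma_i^2.
explained_variance_ratio = s**2 / np.sum(s**2)

# Project onto the first k components: Z = Xc @ V_k.
k = 2
Z = Xc @ Vt[:k].T
print(Z.shape)                    # (200, 2)
print(explained_variance_ratio)
```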
For the least squares problem $\min_x \|Ax - b\|$, the solution with minimum norm is $\hat{x} = A^+ b$,
where $A^+ = V \Sigma^+ U^T$ is the Moore-Penrose pseudoinverse computed via SVD.
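A sketch of the pseudoinverse solution, checked against NumPy's least-squares solver (the system is an assumed random example with full column rank):

```python
import numpy as np

# Overdetermined system A x ~= b.
rng = np.random.default_rng(2)
A = rng.normal(size=(10, 3))
b = rng.normal(size=10)

# x = V Sigma^+ U^T b via the thin SVD; all sigma_i are nonzero here.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
x_svd = Vt.T @ ((U.T @ b) / s)

# Cross-check against the built-in least-squares solver.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x_svd, x_lstsq))    # True
```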
In collaborative filtering, the user-item matrix is approximated by a low-rank matrix via SVD. This captures latent factors (e.g., movie genres) and enables recommendations.
SVD is numerically stable and is the preferred method for computing matrix rank, solving least squares problems, and computing pseudoinverses in practice.
Quadratic forms appear naturally in optimization, geometry, and physics. Understanding their applications connects linear algebra to calculus, optimization theory, and differential geometry.
For a function $f : \mathbb{R}^n \to \mathbb{R}$ with a critical point at $x_0$, let $H$ be the Hessian matrix (matrix of second partial derivatives) at $x_0$.
For example, for $f(x, y) = x^2 + xy + y^2$ (critical point at the origin), the Hessian is $H = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$.
Eigenvalues: $3$ and $1$, both positive. So $H$ is positive definite, and the origin is a local minimum.
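The same classification, as a short NumPy sketch (using the Hessian from the example above):

```python
import numpy as np

# Hessian of f(x, y) = x^2 + x*y + y^2 at its critical point (the origin).
H = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigvals = np.linalg.eigvalsh(H)
if np.all(eigvals > 0):
    label = "local minimum (H positive definite)"
elif np.all(eigvals < 0):
    label = "local maximum (H negative definite)"
elif np.any(eigvals > 0) and np.any(eigvals < 0):
    label = "saddle point (H indefinite)"
else:
    label = "inconclusive (H is singular)"
print(eigvals, "->", label)      # [1. 3.] -> local minimum
```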
A conic section in $\mathbb{R}^2$ is given by $ax^2 + bxy + cy^2 + dx + ey + f = 0$.
In matrix form: $\mathbf{x}^T A \mathbf{x} + \mathbf{b}^T \mathbf{x} + f = 0$, where $A = \begin{pmatrix} a & b/2 \\ b/2 & c \end{pmatrix}$ is the symmetric matrix of quadratic terms.
The type of conic depends on the eigenvalues of $A$: eigenvalues of the same sign ($\det A > 0$) give an ellipse, opposite signs ($\det A < 0$) give a hyperbola, and a zero eigenvalue ($\det A = 0$) gives a parabola.
In $\mathbb{R}^3$, the equation $x^T A x = 1$ with $A$ positive definite gives an ellipsoid. The axes are along the eigenvectors of $A$, with semi-axis lengths $1/\sqrt{\lambda_i}$.
For a symmetric matrix $A$: $\lambda_{\min} \le \dfrac{x^T A x}{x^T x} \le \lambda_{\max}$ for all $x \neq 0$.
The minimum is achieved when $x$ is an eigenvector corresponding to $\lambda_{\min}$, and the maximum when $x$ is an eigenvector for $\lambda_{\max}$.
Maximize $x^T A x$ subject to $\|x\| = 1$. By the Rayleigh quotient bound, the maximum is $\lambda_{\max}$, achieved at the corresponding eigenvector.
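A small NumPy sketch of this constrained maximization (the symmetric matrix is an assumed example):

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

# eigh returns ascending eigenvalues with orthonormal eigenvectors in the columns.
eigvals, eigvecs = np.linalg.eigh(A)
x_max = eigvecs[:, -1]                  # unit eigenvector for lambda_max

print("lambda_max:", eigvals[-1])
print("x_max^T A x_max:", x_max @ A @ x_max)     # equals lambda_max

# Any other unit vector gives a value no larger than lambda_max.
rng = np.random.default_rng(3)
v = rng.normal(size=3)
v /= np.linalg.norm(v)
print(v @ A @ v <= eigvals[-1] + 1e-12)          # True
```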
Positive definite matrices have special structure that enables efficient factorization and numerical algorithms. The Cholesky decomposition is a key tool for working with these matrices.
For a symmetric matrix $A$, the following are equivalent: (1) $A$ is positive definite, i.e. $x^T A x > 0$ for all $x \neq 0$; (2) all eigenvalues of $A$ are positive; (3) all leading principal minors of $A$ are positive (Sylvester's criterion); (4) $A = L L^T$ for some lower triangular $L$ with positive diagonal entries (Cholesky).
A positive definite matrix $A$ can be uniquely factored as $A = L L^T$,
where $L$ is lower triangular with positive diagonal entries. This is the Cholesky decomposition.
A symmetric matrix $A$ has a Cholesky decomposition if and only if $A$ is positive definite. The decomposition is unique if we require $L$ to have positive diagonal entries.
For example, for $A = \begin{pmatrix} 4 & 2 \\ 2 & 3 \end{pmatrix}$:
the Cholesky decomposition is $L = \begin{pmatrix} 2 & 0 \\ 1 & \sqrt{2} \end{pmatrix}$.
Then $L L^T = \begin{pmatrix} 4 & 2 \\ 2 & 3 \end{pmatrix} = A$.
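The same factorization with NumPy (using the example matrix above):

```python
import numpy as np

A = np.array([[4.0, 2.0],
              [2.0, 3.0]])

L = np.linalg.cholesky(A)          # lower triangular, positive diagonal
print(L)                           # [[2, 0], [1, sqrt(2)]]
print(np.allclose(L @ L.T, A))     # True

# np.linalg.cholesky raises LinAlgError if the matrix is not positive definite,
# so it doubles as a practical positive-definiteness test.
```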
In statistics, covariance matrices are always positive semidefinite. If the random variables are linearly independent, the covariance matrix is positive definite.
Cholesky decomposition requires about $n^3/3$ operations (half the cost of LU decomposition) and is numerically stable. It's used in solving linear systems, computing determinants, and generating correlated random variables.
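One such use, sketched below: generating correlated Gaussian samples with an assumed target covariance matrix.

```python
import numpy as np

# Assumed target covariance (positive definite).
cov = np.array([[1.0, 0.8],
                [0.8, 1.0]])
L = np.linalg.cholesky(cov)

# If z has identity covariance, then x = L z has covariance L L^T = cov.
rng = np.random.default_rng(4)
z = rng.standard_normal((2, 100_000))
x = L @ z
print(np.cov(x))    # approximately [[1.0, 0.8], [0.8, 1.0]]
```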
SVD has many elegant mathematical properties that make it a fundamental tool in linear algebra. Understanding these properties deepens our appreciation of this decomposition.
For a symmetric matrix $A = Q \Lambda Q^T$, the SVD coincides with the eigenvalue decomposition:
$A = U \Sigma V^T$ with $V = Q$ containing the eigenvectors and $\Sigma = |\Lambda|$ containing the absolute values of the eigenvalues (any signs are absorbed into $U$).
Any square matrix $A$ can be written as $A = Q S$,
where $Q$ is orthogonal and $S$ is positive semidefinite. This is the polar decomposition.
From the SVD $A = U \Sigma V^T$, we get $A = (U V^T)(V \Sigma V^T) = Q S$, where $Q = U V^T$ is orthogonal and $S = V \Sigma V^T$ is positive semidefinite.
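A NumPy sketch of this construction (the matrix is an assumed random example):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.normal(size=(3, 3))

U, s, Vt = np.linalg.svd(A)
Q = U @ Vt                      # orthogonal factor
S = Vt.T @ np.diag(s) @ Vt      # symmetric positive semidefinite factor

print(np.allclose(Q @ S, A))                     # True
print(np.allclose(Q.T @ Q, np.eye(3)))           # Q is orthogonal
print(np.all(np.linalg.eigvalsh(S) >= -1e-12))   # S has nonnegative eigenvalues
```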
The condition number of a matrix $A$ (with respect to the 2-norm) is $\kappa(A) = \sigma_{\max} / \sigma_{\min}$,
where $\sigma_{\max}$ and $\sigma_{\min}$ are the largest and smallest nonzero singular values.
A large condition number indicates that the matrix is "ill-conditioned"—small changes in input can cause large changes in output. This is crucial for numerical algorithms.
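A short sketch computing the condition number from the singular values (the Hilbert matrix is a standard example of an ill-conditioned matrix):

```python
import numpy as np

# 6 x 6 Hilbert matrix: H[i, j] = 1 / (i + j + 1).
n = 6
H = 1.0 / (np.arange(n)[:, None] + np.arange(n)[None, :] + 1)

s = np.linalg.svd(H, compute_uv=False)
print(s[0] / s[-1])              # ~1.5e7: severely ill-conditioned
print(np.linalg.cond(H, 2))      # same value via the built-in helper
```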
SVD provides a factorization that generalizes many other decompositions: for symmetric matrices it reduces to the eigenvalue decomposition, it yields the polar decomposition, and it defines the pseudoinverse.
SVD works for any matrix, not just square ones. For $A \in \mathbb{R}^{m \times n}$ with $m \neq n$, we still get $A = U \Sigma V^T$, where $U$ is $m \times m$, $\Sigma$ is $m \times n$, and $V$ is $n \times n$.
If $\operatorname{rank}(A) = r$, then only the first $r$ singular values are nonzero. The SVD reveals the structure: $A$ maps the $r$-dimensional row space onto the $r$-dimensional column space.
SVD works for ANY matrix (any shape, any rank), provides optimal low-rank approximations, reveals rank, gives the pseudoinverse, and connects to eigenvalues. It's the 'Swiss army knife' of matrix decompositions.
For symmetric A, SVD and eigendecomposition coincide. For general A: singular values are √(eigenvalues of AᵀA), right singular vectors are eigenvectors of AᵀA, left singular vectors are eigenvectors of AAᵀ.
Any linear map can be decomposed as: (1) rotate/reflect by Vᵀ, (2) scale along axes by Σ, (3) rotate/reflect by U. SVD reveals the 'principal axes' of the transformation.
Positive definite: Q(x) > 0 for all x ≠ 0 (all eigenvalues positive). Positive semidefinite but not definite: Q(x) ≥ 0 for all x (all eigenvalues ≥ 0, at least one equal to zero).
Three ways: (1) All eigenvalues positive, (2) Sylvester's criterion (leading principal minors positive), (3) Cholesky decomposition exists (A = LLᵀ).