Orthogonal projection finds the closest point in a subspace to a given vector. This fundamental operation underlies least squares approximation—the backbone of regression, curve fitting, and countless applications in science and engineering.
Let $U$ be a subspace of an inner product space $V$. The orthogonal projection of $v \in V$ onto $U$ is the unique vector $\operatorname{proj}_U(v) \in U$ such that:
$$v - \operatorname{proj}_U(v) \perp U, \quad\text{i.e.}\quad \langle v - \operatorname{proj}_U(v),\, u\rangle = 0 \text{ for all } u \in U.$$
The orthogonal projection $\operatorname{proj}_U(v)$ is the unique closest point in $U$ to $v$:
$$\|v - \operatorname{proj}_U(v)\| \le \|v - w\| \quad \text{for all } w \in U.$$
Let $p = \operatorname{proj}_U(v)$ and $w \in U$. Then $v - p \perp U$ and $p - w \in U$.
By the Pythagorean theorem (since $v - p \perp p - w$):
$$\|v - w\|^2 = \|(v - p) + (p - w)\|^2 = \|v - p\|^2 + \|p - w\|^2 \ge \|v - p\|^2.$$
So $\|v - w\| > \|v - p\|$ unless $w = p$.
If $e_1, \dots, e_k$ is an orthonormal basis for $U$:
$$\operatorname{proj}_U(v) = \sum_{i=1}^{k} \langle v, e_i\rangle\, e_i.$$
Let $p = \sum_{i=1}^{k} \langle v, e_i\rangle e_i$. Then $p \in U$ (a linear combination of the basis).
Check $v - p \perp U$: for each $j$,
$$\langle v - p, e_j\rangle = \langle v, e_j\rangle - \sum_{i=1}^{k}\langle v, e_i\rangle\langle e_i, e_j\rangle = \langle v, e_j\rangle - \langle v, e_j\rangle = 0.$$
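A minimal NumPy sketch to check this formula numerically; the basis vectors and the vector `v` below are arbitrary illustrative values, not taken from the text:

```python
import numpy as np

# Orthonormal basis for the xy-plane in R^3 (illustrative choice)
e1 = np.array([1.0, 0.0, 0.0])
e2 = np.array([0.0, 1.0, 0.0])
v = np.array([2.0, -3.0, 5.0])

# proj_U(v) = sum_i <v, e_i> e_i
proj = sum(np.dot(v, e) * e for e in (e1, e2))
residual = v - proj

print(proj)                                        # [ 2. -3.  0.]
print(np.dot(residual, e1), np.dot(residual, e2))  # both 0: residual is orthogonal to U
```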
Project $v = (v_1, v_2, v_3)$ onto the xy-plane in $\mathbb{R}^3$: with the standard basis vectors $e_1, e_2$, the formula gives $\operatorname{proj}(v) = (v_1, v_2, 0)$.
The residual $(0, 0, v_3)$ is perpendicular to the plane.
Orthogonal projection is like "dropping a perpendicular" from $v$ to the subspace $U$:
Project a vector $v$ onto the line $\operatorname{span}\{u\}$.
First normalize: $\hat{u} = u/\|u\|$, then $\operatorname{proj}(v) = \langle v, \hat{u}\rangle\, \hat{u}$.
Distance from $v$ to the line: $\|v - \operatorname{proj}(v)\|$.
Every vector $v \in V$ can be uniquely written as:
$$v = u + w,$$
where $u = \operatorname{proj}_U(v) \in U$ and $w = v - \operatorname{proj}_U(v) \in U^{\perp}$.
Example: in $\mathbb{R}^3$, project $v$ onto $U = \operatorname{span}\{e_1, e_2\}$ with $\{e_1, e_2\}$ orthonormal, using $\operatorname{proj}_U(v) = \langle v, e_1\rangle e_1 + \langle v, e_2\rangle e_2$.
Residual: $r = v - \operatorname{proj}_U(v)$ ✓ (check: $\langle r, e_1\rangle = \langle r, e_2\rangle = 0$)
The projection map $v \mapsto \operatorname{proj}_U(v)$ is a linear transformation:
Using the formula $\operatorname{proj}_U(v) = \sum_i \langle v, e_i\rangle e_i$ and linearity of the inner product in its first argument, $\operatorname{proj}_U(av + bw) = a\,\operatorname{proj}_U(v) + b\,\operatorname{proj}_U(w)$.
The projection operator $P$ satisfies:
$$P^2 = P \quad\text{(idempotent)}, \qquad P^* = P \quad\text{(self-adjoint)}.$$
For the xy-plane projection in $\mathbb{R}^3$:
$$P = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}.$$
Check $P^2 = P$: ✓ (matrix multiplication confirms)
Check $P^T = P$: ✓ (symmetric)
$I - P$ projects onto the z-axis:
$$I - P = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$
The distance from $v$ to the subspace $U$ is:
$$d(v, U) = \|v - \operatorname{proj}_U(v)\|.$$
Find the distance from a point $v$ to a plane through the origin.
The plane has a unit normal $n$. Project $v$ onto $\operatorname{span}\{n\}$: $\operatorname{proj}_n(v) = \langle v, n\rangle\, n$.
Distance: $d = |\langle v, n\rangle|$, the length of the component of $v$ along the normal.
If $A$ has full column rank and its columns span $U$, the projection matrix onto $U$ is:
$$P = A(A^TA)^{-1}A^T.$$
Then $\operatorname{proj}_U(b) = Pb$.
If $p = \operatorname{proj}_U(b)$, then $p = A\hat{x}$ for some $\hat{x}$ (a combination of the columns). We need $b - p \perp \operatorname{col}(A)$, i.e., $A^T(b - A\hat{x}) = 0$:
$$A^TA\hat{x} = A^Tb \implies \hat{x} = (A^TA)^{-1}A^Tb.$$
So $p = A\hat{x} = A(A^TA)^{-1}A^Tb$.
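A short sketch of this construction in NumPy; the matrix `A` and vector `b` are made-up illustrative values, with the columns of `A` assumed to span the subspace:

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 2.0]])        # columns span a plane U in R^3 (full column rank)
b = np.array([3.0, -1.0, 4.0])

P = A @ np.linalg.inv(A.T @ A) @ A.T   # P = A (A^T A)^{-1} A^T
proj_b = P @ b

# Residual is orthogonal to col(A): A^T (b - Pb) should be ~0
print(A.T @ (b - proj_b))
print(np.allclose(P @ P, P), np.allclose(P.T, P))  # idempotent and symmetric
```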
An orthogonal projection matrix satisfies:
$$P^2 = P \quad\text{and}\quad P^T = P.$$
For the line through the origin in the direction of a nonzero vector $a$, take $A = a$ (a single column):
$$P = \frac{aa^T}{a^Ta}.$$
Check: $P^2 = \dfrac{a(a^Ta)a^T}{(a^Ta)^2} = P$, $P^T = P$ ✓
For a plane in $\mathbb{R}^3$ spanned by vectors $a_1$, $a_2$, form $A = [\,a_1 \;\; a_2\,]$ (a 3×2 matrix).
Then $P = A(A^TA)^{-1}A^T$ is a 3×3 projection matrix.
If the columns of $Q$ are orthonormal, then $Q^TQ = I$ and:
$$P = Q(Q^TQ)^{-1}Q^T = QQ^T.$$
This is much simpler: no inverse needed!
With the orthonormal basis $e_1 = (1,0,0)^T$, $e_2 = (0,1,0)^T$ for the xy-plane:
$$Q = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{pmatrix}, \qquad QQ^T = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix},$$
matching the projection matrix found earlier.
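The same projection matrix can be obtained from an orthonormal basis; a sketch using the QR factorization of the hypothetical `A` from the previous code block:

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 2.0]])

Q, R = np.linalg.qr(A)     # thin QR: columns of Q are an orthonormal basis of col(A)
P_qr = Q @ Q.T             # P = Q Q^T, no inverse needed
P_normal = A @ np.linalg.inv(A.T @ A) @ A.T

print(np.allclose(P_qr, P_normal))   # True: same projection matrix
```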
An orthogonal projection matrix has only two eigenvalues: $\lambda = 1$ and $\lambda = 0$.
If $u \in U$, then $Pu = u$, so $\lambda = 1$.
If $w \in U^{\perp}$, then $Pw = 0$, so $\lambda = 0$.
Since $V = U \oplus U^{\perp}$, these are all the eigenvalues.
For projection onto a $k$-dimensional subspace of $\mathbb{R}^n$: eigenvalue 1 has multiplicity $k$, eigenvalue 0 has multiplicity $n - k$, and $\operatorname{tr}(P) = \operatorname{rank}(P) = k$.
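A quick numerical check of the eigenvalue and trace claims, reusing the illustrative plane-projection matrix sketched above:

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 2.0]])
P = A @ np.linalg.inv(A.T @ A) @ A.T

eigvals = np.linalg.eigvalsh(P)        # P is symmetric, so eigvalsh applies
print(np.round(eigvals, 10))           # [0. 1. 1.]
print(np.isclose(np.trace(P), 2.0))    # trace = rank = dim U = 2
```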
An oblique projection has $P^2 = P$ but $P^T \neq P$. It projects along a direction that is not perpendicular to the subspace.
If $P$ projects onto $U$, then $I - P$ projects onto $U^{\perp}$:
$$(I - P)^2 = I - 2P + P^2 = I - P, \qquad (I - P)^T = I - P^T = I - P.$$
Both conditions are verified: $I - P$ is an orthogonal projection.
For any orthogonal projection $P$ and any vector $v$:
$$\|Pv\| \le \|v\|.$$
Equality holds iff $Pv = v$ ($v$ is already in the subspace).
By the Pythagorean theorem: $\|v\|^2 = \|Pv\|^2 + \|(I - P)v\|^2$.
Since $\|(I - P)v\|^2 \ge 0$, we have $\|Pv\| \le \|v\|$.
Given $Ax = b$ with $A \in \mathbb{R}^{m \times n}$ and $m > n$ (more equations than unknowns), find $\hat{x}$ minimizing:
$$\|Ax - b\|^2.$$
The least squares solution satisfies the normal equations:
$$A^TA\hat{x} = A^Tb.$$
If $A$ has full column rank, the unique solution is:
$$\hat{x} = (A^TA)^{-1}A^Tb.$$
We want $A\hat{x} = \operatorname{proj}_{\operatorname{col}(A)}(b)$. The residual $b - A\hat{x}$ must be orthogonal to $\operatorname{col}(A)$:
$$A^T(b - A\hat{x}) = 0 \iff A^TA\hat{x} = A^Tb.$$
Fit a line $y = c_0 + c_1t$ to data points $(t_i, y_i)$: the design matrix $A$ has rows $(1, t_i)$ and $b = (y_1, \dots, y_m)^T$.
Solving $A^TA\hat{c} = A^Tb$ gives the intercept $c_0$ and slope $c_1$ of the best-fit line.
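A sketch of this line-fitting setup in NumPy; the data points are made-up illustrative values, not the example's original data:

```python
import numpy as np

t = np.array([0.0, 1.0, 2.0, 3.0])         # made-up sample times
y = np.array([1.1, 1.9, 3.2, 3.9])         # made-up measurements

A = np.column_stack([np.ones_like(t), t])  # rows (1, t_i)
x_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
c0, c1 = x_hat
print(c0, c1)                              # intercept and slope of the best-fit line

# Equivalent route via the normal equations A^T A x = A^T y
x_normal = np.linalg.solve(A.T @ A, A.T @ y)
print(np.allclose(x_hat, x_normal))
```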
The least squares problem asks: find the closest point in $\operatorname{col}(A)$ to $b$.
The residual $b - A\hat{x}$ is perpendicular to the column space.
Fit $y = c_0 + c_1t + c_2t^2$ to data points $(t_i, y_i)$: the design matrix has rows $(1, t_i, t_i^2)$.
Solve $A^TA\hat{c} = A^Ty$ to find the best-fit parabola.
Let $\hat{x}$ be the least squares solution and $r = b - A\hat{x}$ the residual:
$$A^Tr = 0, \qquad r \perp \operatorname{col}(A).$$
In the line-fitting example, once $\hat{x}$ is computed, check that $A^Tr = 0$ (up to rounding).
Residual norm: $\|r\| = \sqrt{\sum_i r_i^2}$, a measure of how far the data deviate from the fitted model.
The R² value measures how well the model fits:
$$R^2 = 1 - \frac{\|r\|^2}{\sum_i (y_i - \bar{y})^2},$$
where $\bar{y}$ is the mean of the $y_i$. R² = 1 means a perfect fit.
Two standard methods for solving least squares: the normal equations and QR decomposition.
For $A = QR$ (thin QR factorization, $Q^TQ = I$, $R$ upper triangular), the least squares solution satisfies:
$$R\hat{x} = Q^Tb.$$
Solve this upper triangular system by back substitution.
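A sketch of the QR route, assuming the same kind of small overdetermined system as in the line-fitting sketch above (illustrative values):

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.1, 1.9, 3.2, 3.9])

Q, R = np.linalg.qr(A)                 # thin QR factorization
x_hat = np.linalg.solve(R, Q.T @ b)    # R is upper triangular; a dedicated triangular
                                       # solver would exploit this via back substitution
print(x_hat)
```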
If $A$ does not have full column rank: $A^TA$ is singular and the normal equations have infinitely many solutions.
The Moore-Penrose pseudoinverse generalizes the matrix inverse: from the SVD $A = U\Sigma V^T$,
$$A^{+} = V\Sigma^{+}U^T,$$
where $\Sigma^{+}$ inverts the nonzero singular values and transposes the shape.
Least squares then gives $\hat{x} = A^{+}b$, the minimizer of $\|Ax - b\|$ with the smallest norm.
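A sketch with a deliberately rank-deficient matrix (illustrative values), showing that `pinv` and `lstsq` both return the minimum-norm least squares solution:

```python
import numpy as np

# Rank-deficient: the second column is twice the first
A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])
b = np.array([1.0, 2.0, 2.0])

x_pinv = np.linalg.pinv(A) @ b                  # x = A^+ b (minimum-norm least squares)
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x_pinv, x_lstsq))             # same minimum-norm solution
```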
Fit a degree-n polynomial using the Vandermonde matrix with rows $(1, t_i, t_i^2, \dots, t_i^n)$. Same least squares setup.
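A sketch of polynomial fitting via a Vandermonde design matrix; the data and the degree are arbitrary illustrative choices:

```python
import numpy as np

t = np.linspace(-1, 1, 20)
y = np.sin(2 * t) + 0.05 * np.random.default_rng(0).standard_normal(t.size)

deg = 3
V = np.vander(t, deg + 1, increasing=True)      # columns 1, t, t^2, t^3
coeffs, *_ = np.linalg.lstsq(V, y, rcond=None)
print(coeffs)                                   # c_0, ..., c_3 of the best-fit cubic
```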
Project noisy signal onto "smooth" subspace (low frequencies, polynomials, etc.).
Overdetermined system from multiple satellites. Least squares finds best position estimate.
Given data $(x_{i1}, \dots, x_{ip}, y_i)$, fit $y = \beta_0 + \beta_1x_1 + \dots + \beta_px_p$:
Solve $X^TX\beta = X^Ty$ to get the regression coefficients.
If observations have different reliabilities, use weighted least squares:
$$\min_x\, (Ax - b)^T W (Ax - b), \qquad W = \operatorname{diag}(w_1, \dots, w_m),\; w_i > 0.$$
Solution: $\hat{x} = (A^TWA)^{-1}A^TWb$.
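A sketch of weighted least squares with hypothetical per-observation weights (larger weight = more reliable measurement); all values are illustrative:

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.1, 2.9, 4.2])
w = np.array([1.0, 1.0, 4.0, 0.5])     # hypothetical weights

W = np.diag(w)
x_hat = np.linalg.solve(A.T @ W @ A, A.T @ W @ b)   # (A^T W A) x = A^T W b

# Equivalent: scale each row by sqrt(w_i) and solve ordinary least squares
sw = np.sqrt(w)[:, None]
x_check, *_ = np.linalg.lstsq(sw * A, sw.ravel() * b, rcond=None)
print(np.allclose(x_hat, x_check))
```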
When $A^TA$ is near-singular, add regularization (ridge regression):
$$\hat{x} = (A^TA + \lambda I)^{-1}A^Tb, \qquad \lambda > 0.$$
This trades bias for variance reduction and prevents overfitting.
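A minimal ridge-regression sketch; the near-dependent columns and the regularization strength `lam` are arbitrary illustrative choices:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.000001],     # nearly dependent columns: A^T A is near-singular
              [1.0, 0.999999]])
b = np.array([2.0, 2.1, 1.9])

lam = 1e-3
n = A.shape[1]
x_ridge = np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ b)
print(x_ridge)
```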
A blurred image $b$ relates to the original image $x$ by $b = Kx$, where $K$ is a blurring operator.
Deblurring is an ill-posed inverse problem. Regularized least squares: minimize $\|Kx - b\|^2 + \lambda\|x\|^2$.
The Fourier series of $f$ is its projection onto the span of trigonometric functions:
$$f(x) \approx \frac{a_0}{2} + \sum_{n=1}^{N}\bigl(a_n\cos nx + b_n\sin nx\bigr),$$
where $a_n = \dfrac{1}{\pi}\displaystyle\int_{-\pi}^{\pi} f(x)\cos nx\,dx$ and $b_n = \dfrac{1}{\pi}\displaystyle\int_{-\pi}^{\pi} f(x)\sin nx\,dx$.
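A sketch that approximates these coefficient integrals numerically and returns the data for the truncated series; the example function $|x|$ is an assumption for illustration, not taken from the text:

```python
import numpy as np

def fourier_projection(f, N, num=4000):
    """Approximate the projection of f onto the first N Fourier modes on [-pi, pi]."""
    x = np.linspace(-np.pi, np.pi, num, endpoint=False)
    dx = 2 * np.pi / num
    fx = f(x)
    a0 = np.sum(fx) * dx / np.pi
    a = [np.sum(fx * np.cos(n * x)) * dx / np.pi for n in range(1, N + 1)]
    b = [np.sum(fx * np.sin(n * x)) * dx / np.pi for n in range(1, N + 1)]
    return a0, a, b

# Assumed example function
a0, a, b = fourier_projection(lambda x: np.abs(x), N=3)
print(a0 / 2, np.round(a, 4), np.round(b, 4))   # b_n ~ 0 since |x| is even
```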
Find the best degree-2 polynomial approximation to a function $f$ on $[-1, 1]$:
Project $f$ onto $\mathcal{P}_2 = \operatorname{span}\{1, x, x^2\}$ using the Legendre polynomials (an orthogonal basis).
On $[-1, 1]$, the inner product is:
$$\langle f, g\rangle = \int_{-1}^{1} f(x)\,g(x)\,dx.$$
Project $f(x) = |x|$ onto $\operatorname{span}\{1, x\}$ on $[-1, 1]$:
$\langle |x|, x\rangle = \int_{-1}^{1}|x|\,x\,dx = 0$ (odd function),
$\langle |x|, 1\rangle = \int_{-1}^{1}|x|\,dx = 1$ and $\langle 1, 1\rangle = 2$, so $\operatorname{proj}(|x|) = \frac{1}{2}$.
Among all functions $g$ in the subspace $W$, the projection minimizes:
$$\|f - g\|^2 = \int |f(x) - g(x)|^2\,dx.$$
Project $f$ onto the span of the first $N$ Fourier modes on $[-\pi, \pi]$.
Compute the Fourier coefficients $a_n$ and $b_n$ from the integral formulas above.
So $\operatorname{proj}(f) = S_Nf(x) = \dfrac{a_0}{2} + \sum_{n=1}^{N}\bigl(a_n\cos nx + b_n\sin nx\bigr)$.
The squared error in projecting $f$ onto the first $N$ Fourier modes is:
$$\|f - S_Nf\|^2 = \|f\|^2 - \sum_{n=1}^{N}|\langle f, e_n\rangle|^2 \quad\text{(for an orthonormal system }\{e_n\}\text{)}.$$
This decreases as we add more modes, and Bessel's inequality $\sum_n |\langle f, e_n\rangle|^2 \le \|f\|^2$ guarantees it stays nonnegative.
The Legendre polynomials $P_0(x) = 1$, $P_1(x) = x$, $P_2(x) = \tfrac{1}{2}(3x^2 - 1)$, ... are orthogonal on $[-1, 1]$:
$$\int_{-1}^{1} P_m(x)P_n(x)\,dx = \frac{2}{2n+1}\,\delta_{mn}.$$
Projection onto $\operatorname{span}\{P_0, \dots, P_n\}$ gives the best degree-$n$ polynomial approximation in the $L^2$ norm.
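A sketch of the continuous $L^2$ projection onto $\operatorname{span}\{P_0, P_1, P_2\}$, approximating the integrals with a fine Riemann sum; the example function $e^x$ is an assumption for illustration:

```python
import numpy as np
from numpy.polynomial import legendre as L

x = np.linspace(-1, 1, 20001)
dx = x[1] - x[0]
f = np.exp(x)                    # assumed example function

coeffs = []
for n in range(3):
    Pn = L.legval(x, [0] * n + [1])                 # values of P_n(x)
    c_n = np.sum(f * Pn) * dx * (2 * n + 1) / 2     # <f,P_n>/<P_n,P_n>, since <P_n,P_n>=2/(2n+1)
    coeffs.append(c_n)

proj = L.legval(x, coeffs)        # best degree-2 approximation in the L^2 sense
print(np.round(coeffs, 4))
```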
proj_U(v) is a vector. ⟨v, e⟩ is a scalar. Don't forget to multiply by e!
Σ⟨v, eᵢ⟩eᵢ only works for orthonormal basis. Use P = A(AᵀA)⁻¹Aᵀ for general basis.
(AᵀA)⁻¹ exists only if A has full column rank. Otherwise, infinitely many solutions.
For orthogonal projections, P = Pᵀ. But for general linear maps, this may not hold.
P = A(AᵀA)⁻¹Aᵀ, NOT A(AAᵀ)⁻¹Aᵀ. The inverse is of the small n×n matrix AᵀA; the order inside the inverse matters!
When verifying projections, check the properties below:
| Property | Formula | Meaning |
|---|---|---|
| Idempotent | P² = P | Project twice = project once |
| Symmetric | Pᵀ = P | Orthogonal (not oblique) projection |
| Eigenvalues | 0 and 1 only | Eigenspaces: U and U⊥ |
| Complement | I − P | Projects onto U⊥ |
For a nonzero orthogonal projection $P$:
$$\|P\| = \sup_{v \neq 0}\frac{\|Pv\|}{\|v\|} = 1.$$
The operator norm of a nonzero orthogonal projection is exactly 1.
An oblique projection onto $U$ along a complementary subspace $W$ (so $V = U \oplus W$) satisfies:
$$P^2 = P, \qquad \operatorname{ran}(P) = U, \qquad \ker(P) = W,$$
but in general $P^T \neq P$.
Example: in $\mathbb{R}^2$, project onto the x-axis along a slanted line through the origin (not perpendicular!).
Check: $P^2 = P$ ✓, but $P^T \neq P$ (not an orthogonal projection).
For a self-adjoint operator $T$ with eigenvalues $\lambda_i$ and eigenspaces $E_{\lambda_i}$, let $P_i$ denote the orthogonal projection onto $E_{\lambda_i}$:
$$T = \sum_i \lambda_i P_i.$$
These projections are mutually orthogonal ($P_iP_j = 0$ for $i \neq j$) and sum to the identity: $\sum_i P_i = I$.
The Spectral Theorem says:
Every self-adjoint operator is a weighted sum of orthogonal projections onto eigenspaces.
In quantum mechanics, observables are self-adjoint operators. Measurement projects the state onto an eigenspace:
The probability of outcome $\lambda_i$ is $\|P_i\psi\|^2$ (for a normalized state $\psi$).
Self-adjoint operators decompose into spectral projections onto eigenspaces.
Singular value decomposition generalizes spectral theory to all matrices.
Projections help diagonalize quadratic forms and classify critical points.
Iterative methods like GMRES and conjugate gradients use projections.
Project a given vector $v$ onto a given subspace $U$ of $\mathbb{R}^n$.
Solution outline: orthonormalize a basis for $U$ (Gram-Schmidt), apply $\operatorname{proj}_U(v) = \sum_i \langle v, e_i\rangle e_i$, and check that the residual is orthogonal to $U$.
Find the least squares fit of a model (for instance a straight line $y = c_0 + c_1x$) to the points (−1, 2), (0, 1), (1, 2), (2, 5).
Solution outline: build the design matrix row by row from the data, form the normal equations $A^TA\hat{c} = A^Tb$, and solve.
The method of least squares was developed independently by Carl Friedrich Gauss (1795) and Adrien-Marie Legendre (1805). Gauss used it to predict the orbit of the asteroid Ceres, one of the great triumphs of mathematical astronomy.
| Symbol | Meaning |
|---|---|
| proj_U(v) | Orthogonal projection of v onto U |
| P | Projection matrix (onto column space) |
| A⁺ | Moore-Penrose pseudoinverse |
| x̂ | Least squares solution |
| r | Residual (b − Ax̂) |
Project a vector $v$ onto a plane $\operatorname{span}\{a_1, a_2\}$ in $\mathbb{R}^3$:
Step 1: Orthonormalize $\{a_1, a_2\}$ using Gram-Schmidt to get $\{e_1, e_2\}$.
Step 2: Compute the coefficients $\langle v, e_1\rangle$ and $\langle v, e_2\rangle$.
Step 3: Projection: $\operatorname{proj}_U(v) = \langle v, e_1\rangle e_1 + \langle v, e_2\rangle e_2$.
Interesting: in this example $v$ is already in the plane! (proj = v, so the residual is zero.)
Temperature readings with errors: fit $T = c_0 + c_1t$ to t = 0, 1, 2, 3 with T = 15, ?, 21, 24 (the reading at t = 1 is missing).
Use only the available data points t = 0, 2, 3: the design matrix has rows $(1, t_i)$ and $b = (15, 21, 24)^T$.
Solve the normal equations for the best-fit line.
Find the projection matrix onto a line $\operatorname{span}\{a\}$ in $\mathbb{R}^3$: $P = \dfrac{aa^T}{a^Ta}$.
Verify: $P^2 = P$, $P^T = P$, and the eigenvalues are 1 (multiplicity 1) and 0 (multiplicity 2).
Fit the given model to the data points:
Set up the design matrix from the data and solve the normal equations for the unknown coefficients.
Decompose $v$ into parts parallel and perpendicular to a given vector $u$:
Parallel part: $v_{\parallel} = \operatorname{proj}_u(v) = \dfrac{\langle v, u\rangle}{\langle u, u\rangle}\,u$
Perpendicular part: $v_{\perp} = v - v_{\parallel}$
Verify: $v_{\parallel} + v_{\perp} = v$ and $\langle v_{\parallel}, v_{\perp}\rangle = 0$ ✓
For concrete $v$ and $u$, substitute into these formulas and check both conditions numerically.
Projections appear throughout mathematics: least squares and regression, Fourier series, the spectral theorem for self-adjoint operators, and iterative solvers such as GMRES and conjugate gradients.
You've mastered orthogonal projections when you can: compute $\operatorname{proj}_U(v)$ from an orthonormal basis, build and verify a projection matrix ($P^2 = P$, $P^T = P$), decompose a vector into components in $U$ and $U^{\perp}$, and set up and solve a least squares problem via the normal equations or QR.
Find the projection of a vector $v$ onto $\operatorname{null}(A)$, where $A$ is a given matrix:
Step 1: Find $\operatorname{null}(A)$ by solving $Ax = 0$ to obtain a basis for the null space.
Step 2: Orthonormalize the basis, then project.
Alternative: Use $P_{\operatorname{null}(A)} = I - A^T(AA^T)^{-1}A$ when $A$ has full row rank, since $\operatorname{null}(A) = \operatorname{row}(A)^{\perp}$.
Find the angle between two subspaces $U$ and $W$:
For lines spanned by $u$ and $w$, the angle satisfies:
$$\cos\theta = \frac{|\langle u, w\rangle|}{\|u\|\,\|w\|}.$$
So $\theta = \arccos\!\left(\dfrac{|\langle u, w\rangle|}{\|u\|\,\|w\|}\right)$.
Fit an exponential model such as $y = ae^{bt}$ to data by linearizing: take logarithms, so $\ln y = \ln a + bt$.
Set up least squares with unknowns $(\ln a, b)$, then solve and recover $a$ and $b$.
Fit the stated model to the data points:
Set up the design matrix and solve the normal equations for the unknown parameters.
When fitting data with polynomial, exponential (after linearizing), trigonometric, or multivariable linear models:
All reduce to linear least squares with appropriate design matrix!
| Method | Complexity | Stability | Best For |
|---|---|---|---|
| Normal equations | O(mn²) | Poor | Well-conditioned problems |
| QR decomposition | O(mn²) | Good | General use |
| SVD | O(mn²) (larger constant) | Excellent | Rank-deficient problems |
| Iterative (LSQR) | O(nnz(A)) per iteration | Good | Very large sparse problems |
The condition number $\kappa(A)$ affects accuracy: solving the normal equations squares it ($\kappa(A^TA) = \kappa(A)^2$), while QR and SVD work with $\kappa(A)$ directly.
Polynomial fitting with high-degree polynomials leads to Vandermonde matrices with very large condition numbers.
For better conditioning, use orthogonal polynomials (Legendre, Chebyshev) instead of monomials.
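A sketch comparing the condition numbers of a monomial (Vandermonde) design matrix and a Legendre design matrix on the same sample points; the grid and degree are arbitrary illustrative choices:

```python
import numpy as np
from numpy.polynomial import legendre as L

t = np.linspace(-1, 1, 50)
deg = 12

V_mono = np.vander(t, deg + 1, increasing=True)   # columns 1, t, ..., t^deg
V_leg = L.legvander(t, deg)                       # columns P_0(t), ..., P_deg(t)

print(np.linalg.cond(V_mono))   # very large
print(np.linalg.cond(V_leg))    # much smaller
```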
When the problem is ill-posed, add regularization: minimize $\|Ax - b\|^2 + \lambda\|x\|^2$ (Tikhonov regularization / ridge regression).
It's 'dropping a perpendicular' from v to the subspace U. The projection is where the perpendicular meets U, and the residual v - proj is perpendicular to U.
By the Pythagorean theorem: ||v - w||² = ||v - proj||² + ||proj - w||² for any w in U. Since the second term is non-negative, ||v - w|| ≥ ||v - proj||.
Ax = b has no solution when b ∉ col(A). The 'best' x makes Ax = proj_{col(A)}(b), the projection of b onto column space. This minimizes ||Ax - b||.
'Normal' means perpendicular. The equation AᵀAx = Aᵀb ensures the residual b - Ax is orthogonal (normal) to col(A), which is the defining property of projection.
AᵀA is invertible iff A has full column rank (columns are linearly independent). This is necessary for a unique least squares solution.
There are infinitely many least squares solutions. The minimum-norm solution uses the pseudoinverse: x = A⁺b. Or add regularization (ridge regression).
Filtering is projection! Keeping only certain frequency components projects the signal onto the subspace spanned by those frequencies.
Linear regression y = Xβ + ε is least squares: minimize ||y - Xβ||². The normal equation Xᵀy = XᵀXβ gives β = (XᵀX)⁻¹Xᵀy.
Yes, in Hilbert spaces. For closed subspaces, orthogonal projections exist. Fourier truncation (keeping first n terms) is projection onto finite-dimensional subspace.
Projection matrices have eigenvalues 0 and 1 only. Eigenvalue 1 corresponds to vectors in U, eigenvalue 0 to vectors in U⊥.