LA-7.4

Orthogonal Projections and Least Squares

Orthogonal projection finds the closest point in a subspace to a given vector. This fundamental operation underlies least squares approximation—the backbone of regression, curve fitting, and countless applications in science and engineering.

Learning Objectives
  • Define orthogonal projection onto a subspace
  • Compute projection using orthonormal bases
  • Derive the projection matrix formula
  • Understand the best approximation theorem
  • Solve least squares problems using normal equations
  • Apply least squares to linear regression
  • Understand geometric interpretation of projections
  • Compute projections in function spaces
Prerequisites
  • Orthonormal bases (LA-7.2)
  • Gram-Schmidt process (LA-7.3)
  • Matrix transpose and multiplication (LA-4.2)
  • Linear independence and span

1. Orthogonal Projection onto a Subspace

Definition 7.14: Orthogonal Projection

Let U be a subspace of an inner product space V. The orthogonal projection of v onto U is the unique vector \text{proj}_U(v) \in U such that:

v - \text{proj}_U(v) \in U^\perp
Theorem 7.24: Best Approximation Theorem

The orthogonal projection \text{proj}_U(v) is the unique closest point in U to v:

\|v - \text{proj}_U(v)\| < \|v - w\| \quad \text{for all } w \in U,\ w \neq \text{proj}_U(v)
Proof:

Let p = \text{proj}_U(v) and w \in U. Then p - w \in U and v - p \in U^\perp.

By the Pythagorean theorem (since (v-p) \perp (p-w)):

\|v - w\|^2 = \|(v-p) + (p-w)\|^2 = \|v-p\|^2 + \|p-w\|^2

So \|v-w\|^2 > \|v-p\|^2 unless p = w.

Theorem 7.25: Projection Formula (Orthonormal Basis)

If \{e_1, \ldots, e_k\} is an orthonormal basis for U:

\text{proj}_U(v) = \sum_{i=1}^{k} \langle v, e_i \rangle e_i
Proof:

Let p = \sum \langle v, e_i \rangle e_i. Then p \in U (a linear combination of basis vectors).

Check v - p \perp U: for each e_j,

\langle v - p, e_j \rangle = \langle v, e_j \rangle - \langle v, e_j \rangle = 0
Example 7.32: Projection onto a Plane

Project v = (1, 2, 3)^T onto the xy-plane U = \text{span}\{e_1, e_2\} in \mathbb{R}^3.

\text{proj}_U(v) = \langle v, e_1 \rangle e_1 + \langle v, e_2 \rangle e_2 = 1 \cdot e_1 + 2 \cdot e_2 = (1, 2, 0)^T

The residual v - \text{proj}_U(v) = (0, 0, 3)^T is perpendicular to the plane.
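
As a quick computational check, here is a minimal NumPy sketch of Theorem 7.25 applied to Example 7.32; the helper name project_onto_orthonormal is illustrative, and the sum is equivalent to E @ E.T @ v.

```python
import numpy as np

def project_onto_orthonormal(v, E):
    """Project v onto the span of the orthonormal columns of E,
    using proj_U(v) = sum_i <v, e_i> e_i (equivalent to E @ E.T @ v)."""
    return sum(np.dot(v, e) * e for e in E.T)

# Example 7.32: v = (1, 2, 3), U = the xy-plane spanned by e1, e2
E = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])            # orthonormal basis vectors as columns
v = np.array([1.0, 2.0, 3.0])

p = project_onto_orthonormal(v, E)
print(p)          # [1. 2. 0.]
print(v - p)      # [0. 0. 3.]  -- residual, orthogonal to the plane
```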

Remark 7.15: Geometric Interpretation

Orthogonal projection is like "dropping a perpendicular" from v to the subspace U:

  • The projection \text{proj}_U(v) is the "shadow" of v on U
  • The residual v - \text{proj}_U(v) is the perpendicular "height"
  • Together they give an orthogonal decomposition: v = \text{proj}_U(v) + \text{proj}_{U^\perp}(v)
Example 7.32a: Projection onto a Line

Project v = (3, 4)^T onto the line U = \text{span}\{(1, 1)^T\}.

First normalize: e = \frac{1}{\sqrt{2}}(1, 1)^T

\text{proj}_U(v) = \langle v, e \rangle e = \frac{3+4}{\sqrt{2}} \cdot \frac{1}{\sqrt{2}}(1, 1)^T = \frac{7}{2}(1, 1)^T = (3.5, 3.5)^T

Distance from v to the line: \|v - \text{proj}_U(v)\| = \|(-0.5, 0.5)^T\| = \frac{1}{\sqrt{2}}

Corollary 7.10: Orthogonal Decomposition

Every vector v \in V can be written uniquely as:

v = \text{proj}_U(v) + \text{proj}_{U^\perp}(v)

where \text{proj}_{U^\perp}(v) = v - \text{proj}_U(v).

Example 7.32b: Projection in ℂⁿ

In \mathbb{C}^2, project v = (1, i)^T onto U = \text{span}\{(1, 0)^T\}:

\text{proj}_U(v) = \langle v, e_1 \rangle e_1 = 1 \cdot (1, 0)^T = (1, 0)^T

Residual: (0, i)^T \perp (1, 0)^T ✓ (check: \langle (0,i), (1,0) \rangle = 0)

Theorem 7.25a: Projection is Linear

The projection map P_U: V \to U is a linear transformation:

P_U(\alpha v + \beta w) = \alpha P_U(v) + \beta P_U(w)
Proof:

Using the formula P_U(v) = \sum \langle v, e_i \rangle e_i and linearity of the inner product in the first argument:

P_U(\alpha v + \beta w) = \sum \langle \alpha v + \beta w, e_i \rangle e_i = \alpha \sum \langle v, e_i \rangle e_i + \beta \sum \langle w, e_i \rangle e_i = \alpha P_U(v) + \beta P_U(w)
Remark 7.15a: Projection Operator Properties

The projection operator P_U satisfies:

  • P_U^2 = P_U (idempotent: projecting twice = projecting once)
  • \text{ker}(P_U) = U^\perp (null space is the orthogonal complement)
  • \text{im}(P_U) = U (image is the subspace)
  • P_U + P_{U^\perp} = I (identity decomposition)
Example 7.32c: Verifying Projection Properties

For the xy-plane projection in \mathbb{R}^3:

P = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}

Check P^2 = P: ✓ (matrix multiplication confirms)

Check P^T = P: ✓ (symmetric)

I - P projects onto the z-axis: I - P = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}

Definition 7.14a: Distance to Subspace

The distance from v to the subspace U is:

d(v, U) = \|v - \text{proj}_U(v)\| = \|\text{proj}_{U^\perp}(v)\|
Example 7.32d: Distance Calculation

Find the distance from v = (1, 2, 2)^T to the plane x + y + z = 0.

The plane has normal n = (1, 1, 1)^T. Project v onto n:

\text{proj}_n(v) = \frac{\langle v, n \rangle}{\|n\|^2} n = \frac{5}{3}(1, 1, 1)^T

Distance: \|\text{proj}_n(v)\| = \frac{5}{3}\sqrt{3} = \frac{5\sqrt{3}}{3}

2. Projection Matrices

Theorem 7.26: Projection Matrix Formula

If A has full column rank and its columns span U, the projection matrix onto U is:

P = A(A^T A)^{-1} A^T

Then \text{proj}_U(v) = Pv.

Proof:

If u \in U, then u = Ax for some x. We need v - Ax \perp U, i.e., v - Ax orthogonal to every column of A:

A^T(v - Ax) = 0 \implies A^T A x = A^T v \implies x = (A^T A)^{-1} A^T v

So Ax = A(A^T A)^{-1} A^T v = Pv.

Theorem 7.27: Properties of Projection Matrices

An orthogonal projection matrix P satisfies:

  1. Idempotent: P^2 = P
  2. Symmetric: P^T = P
  3. Eigenvalues: only 0 and 1
  4. I - P projects onto U^\perp
Example 7.33: Projection Matrix onto Line

For the line through a = (1, 2)^T:

P = \frac{aa^T}{a^T a} = \frac{1}{5}\begin{pmatrix} 1 & 2 \\ 2 & 4 \end{pmatrix}

Check: P^2 = P, P^T = P ✓

Example 7.33a: Projection Matrix onto Plane

For the plane spanned by a_1 = (1, 0, 1)^T, a_2 = (0, 1, 1)^T:

A = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 1 \end{pmatrix}, \quad A^T A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}
(A^T A)^{-1} = \frac{1}{3}\begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix}

Then P = A(A^T A)^{-1} A^T is a 3×3 projection matrix.
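
A short NumPy sketch (illustrative, not part of the original text) that builds P = A(AᵀA)⁻¹Aᵀ for this A and checks the properties stated in Theorem 7.27:

```python
import numpy as np

# Columns of A span the plane from Example 7.33a
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])

# P = A (A^T A)^{-1} A^T  (valid because A has full column rank)
P = A @ np.linalg.inv(A.T @ A) @ A.T

print(np.allclose(P @ P, P))                 # idempotent: True
print(np.allclose(P.T, P))                   # symmetric:  True
print(np.round(np.linalg.eigvalsh(P), 8))    # eigenvalues [0. 1. 1.]
```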

Corollary 7.11: Orthonormal Case

If the columns of Q are orthonormal, then Q^T Q = I and:

P = Q Q^T

This is much simpler: no inverse is needed.

Example 7.33b: Orthonormal Projection Matrix

With orthonormal basis Q = [e_1 \mid e_2] for the xy-plane:

P = QQ^T = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}
Theorem 7.27a: Eigenvalues of Projection Matrix

An orthogonal projection matrix P has only two eigenvalues:

  • \lambda = 1 with eigenspace U (the subspace projected onto)
  • \lambda = 0 with eigenspace U^\perp
Proof:

If v \in U, then Pv = v, so \lambda = 1.

If v \in U^\perp, then Pv = 0, so \lambda = 0.

Since V = U \oplus U^\perp, these are all the eigenvalues.

Remark 7.16: Trace and Rank

For projection onto a k-dimensional subspace of an n-dimensional space:

  • \text{tr}(P) = k (sum of eigenvalues: k ones and n - k zeros)
  • \text{rank}(P) = k (dimension of the image)
  • \det(P) = 0 (unless k = n, in which case P = I)
Definition 7.14b: Oblique Projection

An oblique projection satisfies P^2 = P but P^T \neq P. It projects along a direction that is not perpendicular to the subspace.

Example 7.33c: Complementary Projections

If P is the orthogonal projection onto U, then I - P is the orthogonal projection onto U^\perp:

(I - P)^2 = I - 2P + P^2 = I - 2P + P = I - P
(I - P)^T = I - P^T = I - P

Both conditions are verified: I - P is an orthogonal projection.

Theorem 7.27b: Projection Inequality

For any orthogonal projection P onto U and any vector v:

\|Pv\| \leq \|v\|

Equality holds iff v \in U (v is already in the subspace).

Proof:

By the Pythagorean theorem: \|v\|^2 = \|Pv\|^2 + \|(I-P)v\|^2.

Since \|(I-P)v\|^2 \geq 0, we have \|Pv\|^2 \leq \|v\|^2.

3. Least Squares Approximation

Definition 7.15: Least Squares Problem

Given A \in M_{m \times n} with m > n and b \in \mathbb{R}^m, find \hat{x} minimizing:

\|Ax - b\|^2 = \sum_{i=1}^{m} (a_i \cdot x - b_i)^2

where a_i denotes the i-th row of A.
Theorem 7.28: Normal Equation

The least squares solution satisfies the normal equation:

A^T A \hat{x} = A^T b

If A has full column rank, the unique solution is:

\hat{x} = (A^T A)^{-1} A^T b
Proof:

We want A\hat{x} = \text{proj}_{\text{col}(A)}(b). The residual b - A\hat{x} must be orthogonal to \text{col}(A):

A^T(b - A\hat{x}) = 0 \implies A^T A \hat{x} = A^T b
Example 7.34: Fitting a Line

Fit y = ax + b to the points (0,1), (1,2), (2,4).

A = \begin{pmatrix} 0 & 1 \\ 1 & 1 \\ 2 & 1 \end{pmatrix}, \quad b = \begin{pmatrix} 1 \\ 2 \\ 4 \end{pmatrix}
A^T A = \begin{pmatrix} 5 & 3 \\ 3 & 3 \end{pmatrix}, \quad A^T b = \begin{pmatrix} 10 \\ 7 \end{pmatrix}

Solving A^T A \hat{x} = A^T b: \hat{x} = (3/2, 5/6)^T, so y = 1.5x + 5/6.
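
The same fit can be checked numerically; the sketch below (assuming NumPy) solves the normal equations directly and compares the result with np.linalg.lstsq:

```python
import numpy as np

# Data for the points (0, 1), (1, 2), (2, 4)
A = np.array([[0.0, 1.0],
              [1.0, 1.0],
              [2.0, 1.0]])
b = np.array([1.0, 2.0, 4.0])

# Solve the normal equations A^T A x = A^T b ...
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# ... or use the built-in least squares routine
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

print(x_normal)                          # [1.5        0.83333333]
print(np.allclose(x_normal, x_lstsq))    # True  -> y = 1.5x + 5/6
```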

Remark 7.17: Geometric Interpretation of Least Squares

The least squares problem asks: find the closest point in \text{col}(A) to b.

A\hat{x} = \text{proj}_{\text{col}(A)}(b)

The residual r = b - A\hat{x} is perpendicular to the column space.

Example 7.34a: Quadratic Fit

Fit y = c_0 + c_1 x + c_2 x^2 to the points (0,1), (1,0), (2,3), (3,10):

A = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 1 \\ 1 & 2 & 4 \\ 1 & 3 & 9 \end{pmatrix}, \quad b = \begin{pmatrix} 1 \\ 0 \\ 3 \\ 10 \end{pmatrix}

Solve A^T A c = A^T b to find the best-fit parabola.

Theorem 7.28a: Residual Properties

Let \hat{x} be the least squares solution and r = b - A\hat{x}:

  1. r \perp \text{col}(A), i.e., A^T r = 0
  2. \|r\|^2 = \|b\|^2 - \|A\hat{x}\|^2 (Pythagorean)
  3. r = (I - P)b where P = A(A^T A)^{-1}A^T
Example 7.34b: Computing Residual

From the line-fitting example, with \hat{x} = (3/2, 5/6)^T:

A\hat{x} = \begin{pmatrix} 5/6 \\ 7/3 \\ 23/6 \end{pmatrix}, \quad r = b - A\hat{x} = \begin{pmatrix} 1/6 \\ -1/3 \\ 1/6 \end{pmatrix}

Check: A^T r = \begin{pmatrix} 0 \cdot \tfrac{1}{6} + 1 \cdot (-\tfrac{1}{3}) + 2 \cdot \tfrac{1}{6} \\ \tfrac{1}{6} - \tfrac{1}{3} + \tfrac{1}{6} \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} ✓

Residual norm: \|r\| = \sqrt{\tfrac{1}{36} + \tfrac{1}{9} + \tfrac{1}{36}} = \frac{1}{\sqrt{6}}

Corollary 7.12: Coefficient of Determination

The R² value measures how well the model fits:

R^2 = 1 - \frac{\|r\|^2}{\|b - \bar{b}\|^2}

where \bar{b} is the mean of b (as a constant vector). R^2 = 1 means a perfect fit.

Remark 7.17a: QR vs Normal Equations

Two methods for solving least squares:

  • Normal equations: solve A^T A \hat{x} = A^T b. Fast, but squares the condition number.
  • QR decomposition: compute A = QR, then solve R\hat{x} = Q^T b. More numerically stable.
Example 7.34c: Least Squares via QR

For A = QR with Q having orthonormal columns and R invertible upper triangular, the least squares solution satisfies:

A^T A \hat{x} = A^T b \implies R^T Q^T Q R \hat{x} = R^T Q^T b \implies R \hat{x} = Q^T b

Solve the upper triangular system R\hat{x} = Q^T b by back substitution.
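
A minimal sketch of the QR route, assuming NumPy and SciPy are available; it reuses the line-fitting data from Example 7.34:

```python
import numpy as np
from scipy.linalg import solve_triangular

A = np.array([[0.0, 1.0],
              [1.0, 1.0],
              [2.0, 1.0]])
b = np.array([1.0, 2.0, 4.0])

# Thin QR: Q (3x2) has orthonormal columns, R (2x2) is upper triangular
Q, R = np.linalg.qr(A)

# Back substitution on R x = Q^T b; the normal equations are never formed
x_hat = solve_triangular(R, Q.T @ b)
print(x_hat)       # [1.5  0.8333...]  -- same fit, computed more stably
```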

Theorem 7.28b: Rank-Deficient Case

If A does not have full column rank:

  • Infinitely many x minimize \|Ax - b\|
  • The minimum-norm solution is \hat{x} = A^+ b (pseudoinverse)
  • It can be computed via the SVD: A = U \Sigma V^T gives A^+ = V \Sigma^+ U^T
Definition 7.15a: Pseudoinverse

The Moore-Penrose pseudoinverse A^+ generalizes the matrix inverse:

A^+ = V \Sigma^+ U^T

where \Sigma^+ is obtained by inverting the nonzero singular values and transposing.
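
A small illustrative sketch of the rank-deficient case, assuming NumPy; the particular matrix below is made up for demonstration:

```python
import numpy as np

# Rank-deficient design matrix: second column is twice the first
A = np.array([[1.0, 2.0],
              [1.0, 2.0],
              [2.0, 4.0]])
b = np.array([1.0, 2.0, 3.0])

# Minimum-norm least squares solution via the pseudoinverse
x_pinv = np.linalg.pinv(A) @ b

# lstsq also returns the minimum-norm solution when A is rank deficient
x_lstsq, _, rank, _ = np.linalg.lstsq(A, b, rcond=None)

print(rank)                           # 1
print(np.allclose(x_pinv, x_lstsq))   # True
```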

4. Applications

Linear Regression

y = X\beta + \varepsilon. Least squares gives \hat{\beta} = (X^T X)^{-1} X^T y.

Polynomial Fitting

Fit degree-n polynomial using Vandermonde matrix. Same least squares setup.

Signal Denoising

Project noisy signal onto "smooth" subspace (low frequencies, polynomials, etc.).

GPS and Navigation

Overdetermined system from multiple satellites. Least squares finds best position estimate.

Example 7.35: Linear Regression

Given data \{(x_i, y_i)\}_{i=1}^n, fit y = \beta_0 + \beta_1 x:

X = \begin{pmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix}, \quad y = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}

Solve X^T X \hat{\beta} = X^T y to get the regression coefficients.

Example 7.35a: Weighted Least Squares

If observations have different reliabilities, use weighted least squares with weight matrix W = \text{diag}(w_1, \ldots, w_m):

\min_x \sum_i w_i (a_i \cdot x - b_i)^2 = \min_x \|W^{1/2}(Ax - b)\|^2

Solution: \hat{x} = (A^T W A)^{-1} A^T W b
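
A hedged NumPy sketch of weighted least squares; the weights below are invented for illustration:

```python
import numpy as np

# Invented weights: larger weight = more trusted observation
A = np.array([[0.0, 1.0],
              [1.0, 1.0],
              [2.0, 1.0]])
b = np.array([1.0, 2.0, 4.0])
w = np.array([1.0, 4.0, 0.25])

# Scale rows by sqrt(w_i), then solve an ordinary least squares problem
sw = np.sqrt(w)
x_scaled, *_ = np.linalg.lstsq(A * sw[:, None], b * sw, rcond=None)

# Closed form (A^T W A)^{-1} A^T W b for comparison
W = np.diag(w)
x_closed = np.linalg.solve(A.T @ W @ A, A.T @ W @ b)
print(np.allclose(x_scaled, x_closed))   # True
```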

Remark 7.18: Regularization

When A^T A is nearly singular, add regularization (ridge regression):

\hat{x} = (A^T A + \lambda I)^{-1} A^T b

This trades bias for variance reduction and prevents overfitting.
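
A minimal ridge-regression sketch, assuming NumPy; the nearly collinear data is fabricated to show the effect of λ:

```python
import numpy as np

def ridge(A, b, lam):
    """Ridge-regularized least squares: (A^T A + lam*I)^{-1} A^T b."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ b)

# Nearly collinear columns make A^T A almost singular
A = np.array([[1.0, 1.001],
              [1.0, 0.999],
              [1.0, 1.000]])
b = np.array([2.0, 2.1, 2.0])

print(ridge(A, b, lam=0.0))   # unstable coefficients (roughly +52 and -50)
print(ridge(A, b, lam=0.1))   # regularized: both close to 1
```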

Example 7.35b: Image Deblurring

A blurred image b is related to the original x by b = Ax, where A is a blurring operator.

Deblurring is an ill-posed inverse problem. Regularized least squares:

\min_x \|Ax - b\|^2 + \lambda \|x\|^2
Fourier Series as Projection

The Fourier series of f is its projection onto the span of trigonometric functions:

f_n(x) = \sum_{k=-n}^{n} c_k e^{ikx} = \text{proj}_{V_n}(f)

where V_n = \text{span}\{e^{ikx} : |k| \leq n\} and c_k = \langle f, e^{ikx} \rangle.

Example 7.35c: Best Polynomial Approximation

Find the best degree-2 polynomial approximation to f(x) = e^x on [-1, 1]:

Project e^x onto \text{span}\{1, x, x^2\} using the orthogonal Legendre polynomials P_0, P_1, P_2:

p(x) = \frac{\langle e^x, P_0 \rangle}{\langle P_0, P_0 \rangle} P_0 + \frac{\langle e^x, P_1 \rangle}{\langle P_1, P_1 \rangle} P_1 + \frac{\langle e^x, P_2 \rangle}{\langle P_2, P_2 \rangle} P_2

5. Projection in Function Spaces

Definition 7.16: L² Inner Product

On L^2[a, b], the inner product is:

\langle f, g \rangle = \int_a^b f(x) \overline{g(x)}\, dx
Example 7.36: Projection onto Polynomials

Project f(x) = |x| onto \text{span}\{1, x\} on [-1, 1]:

\langle |x|, 1 \rangle = \int_{-1}^1 |x|\, dx = 1

\langle |x|, x \rangle = \int_{-1}^1 x|x|\, dx = 0 (odd integrand)

\langle 1, 1 \rangle = 2 and \langle x, x \rangle = \tfrac{2}{3}, so \text{proj}(|x|) = \frac{\langle |x|, 1 \rangle}{\langle 1, 1 \rangle} \cdot 1 + \frac{\langle |x|, x \rangle}{\langle x, x \rangle} \cdot x = \frac{1}{2}

Theorem 7.29: Best Approximation in L²

Among all functions in the subspace U, the projection p = \text{proj}_U(f) minimizes:

\|f - p\|_2 = \left( \int_a^b |f(x) - p(x)|^2\, dx \right)^{1/2}
Example 7.36a: Fourier Projection

Project f(x) = x onto V_1 = \text{span}\{1, \cos x, \sin x\} on [0, 2\pi]:

Compute the Fourier coefficients:

  • a_0 = \frac{1}{2\pi}\int_0^{2\pi} x\, dx = \pi
  • a_1 = \frac{1}{\pi}\int_0^{2\pi} x \cos x\, dx = 0
  • b_1 = \frac{1}{\pi}\int_0^{2\pi} x \sin x\, dx = -2

So \text{proj}_{V_1}(x) = \pi - 2\sin x.
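
These coefficients can be checked numerically; here is a small sketch assuming SciPy's quad integrator is available:

```python
import numpy as np
from scipy.integrate import quad

f = lambda x: x   # the function being projected

a0, _ = quad(lambda x: f(x) / (2 * np.pi), 0, 2 * np.pi)
a1, _ = quad(lambda x: f(x) * np.cos(x) / np.pi, 0, 2 * np.pi)
b1, _ = quad(lambda x: f(x) * np.sin(x) / np.pi, 0, 2 * np.pi)

print(a0, a1, b1)   # approximately pi, 0, and -2
# Matches proj_{V_1}(x) = pi - 2 sin x from the worked example.
```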

Remark 7.19: Parseval and Projection

The error in projecting onto the first n Fourier modes is:

\|f - f_n\|^2 = \|f\|^2 - \sum_{|k| \leq n} |c_k|^2

This decreases as we add more modes (Bessel's inequality).

Example 7.36b: Legendre Approximation

The Legendre polynomials are orthogonal on [-1, 1]:

P_0 = 1, \quad P_1 = x, \quad P_2 = \frac{3x^2 - 1}{2}, \quad \ldots

Projection onto \text{span}\{P_0, \ldots, P_n\} gives the best polynomial approximation in the L^2 norm.
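
A rough sketch using NumPy's Legendre module; note that legfit performs a discrete least squares fit on sample points, which only approximates the continuous L² projection described above:

```python
import numpy as np
from numpy.polynomial import legendre

# Discrete stand-in for the L^2 projection of e^x onto degree <= 2 polynomials:
# a dense least squares fit in the Legendre basis on [-1, 1].
x = np.linspace(-1.0, 1.0, 2001)
coeffs = legendre.legfit(x, np.exp(x), deg=2)

print(coeffs)                                     # Legendre coefficients c0, c1, c2
print(legendre.legval(0.5, coeffs), np.exp(0.5))  # approximation vs. exact value
```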

6. Common Mistakes

Confusing projection with the projection formula

proj_U(v) is a vector. ⟨v, e⟩ is a scalar. Don't forget to multiply by e!

Using the projection formula with a non-orthonormal basis

Σ⟨v, eᵢ⟩eᵢ only works for an orthonormal basis. Use P = A(AᵀA)⁻¹Aᵀ for a general basis.

Forgetting to check full column rank

(AᵀA)⁻¹ exists only if A has full column rank. Otherwise, infinitely many solutions.

Confusing P with Pᵀ

For orthogonal projections, P = Pᵀ. But for general linear maps, this may not hold.

Wrong matrix in the projection formula

P = A(AᵀA)⁻¹Aᵀ; the inverse is taken of AᵀA (an n×n matrix), not of AAᵀ. The order matters!

Remark 7.20: Debugging Tips

When verifying projections:

  • Check P² = P (idempotent)
  • Check Pᵀ = P (symmetric)
  • Check Pv ∈ U (result is in subspace)
  • Check (v - Pv) ⊥ U (residual is orthogonal)

7. Key Formulas Summary

Projection Formulas

  • Orthonormal basis: \text{proj}_U(v) = \sum \langle v, e_i \rangle e_i
  • General basis: P = A(A^T A)^{-1} A^T
  • Onto a line: \frac{aa^T}{a^T a}v
  • Orthonormal Q: P = QQ^T

Least Squares

  • Normal equation: A^T A \hat{x} = A^T b
  • Solution: \hat{x} = (A^T A)^{-1} A^T b
  • Via QR: R\hat{x} = Q^T b
  • Residual: b - A\hat{x} \perp \text{col}(A)

Projection Matrix Properties

Property | Formula | Meaning
Idempotent | P^2 = P | Project twice = project once
Symmetric | P^T = P | Orthogonal projection
Eigenvalues | 0 and 1 only | Eigenspaces U and U^\perp
Complement | I - P | Projects onto U^\perp

8. Advanced Topics

Theorem 7.30: Projection and Operator Norm

For orthogonal projection PP:

\|P\|_{op} = 1 \quad \text{(unless } P = 0\text{)}

The operator norm of a nonzero orthogonal projection is exactly 1.

Definition 7.17: Oblique Projection

An oblique projection onto U along W satisfies:

  • P^2 = P (still idempotent)
  • \text{im}(P) = U, \text{ker}(P) = W
  • But W need not be U^\perp!
Example 7.37: Oblique vs Orthogonal

Project onto the x-axis along the line y = x (not perpendicular!):

P = \begin{pmatrix} 1 & -1 \\ 0 & 0 \end{pmatrix}

Check: P^2 = P ✓, but P^T \neq P (not an orthogonal projection).

Theorem 7.31: Spectral Projections

For a self-adjoint A with eigenvalue \lambda and eigenspace E_\lambda:

P_\lambda = \text{proj}_{E_\lambda}

These projections are orthogonal and sum to the identity: \sum_\lambda P_\lambda = I.

Remark 7.21: Connection to Spectral Theorem

The Spectral Theorem says:

A = \sum_\lambda \lambda P_\lambda

Every self-adjoint operator is a weighted sum of orthogonal projections onto eigenspaces.

Example 7.37a: Projection in Quantum Mechanics

In quantum mechanics, observables are self-adjoint operators. Measurement projects the state onto an eigenspace:

|\psi\rangle \mapsto \frac{P_\lambda |\psi\rangle}{\|P_\lambda |\psi\rangle\|}

The probability of outcome \lambda is \|P_\lambda |\psi\rangle\|^2 (for a normalized state |\psi\rangle).

9. What's Next

Building on Projections

Spectral Theorem (LA-7.5)

Self-adjoint operators decompose into spectral projections onto eigenspaces.

SVD (LA-8.1)

Singular value decomposition generalizes spectral theory to all matrices.

Quadratic Forms (LA-8.2)

Projections help diagonalize quadratic forms and classify critical points.

Numerical Methods

Iterative methods like GMRES and conjugate gradients use projections.

Example 7.38: Practice Problem 1

Project v = (1, 2, 3, 4)^T onto U = \text{span}\{(1, 1, 0, 0)^T, (0, 0, 1, 1)^T\}:

Solution outline:

  1. Orthonormalize the basis (here the two vectors are already orthogonal, so only normalization is needed)
  2. Apply the projection formula: \text{proj}_U(v) = \sum \langle v, e_i \rangle e_i
Example 7.38a: Practice Problem 2

Find the least squares fit of y = ax^2 + b to (−1, 2), (0, 1), (1, 2), (2, 5):

Solution outline:

  1. Set up the design matrix A = [x_i^2 \mid 1]
  2. Form the normal equations A^T A \hat{x} = A^T y
  3. Solve for \hat{x} = (a, b)^T (verified in the sketch below)
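
A quick numerical check of this practice problem, sketched with NumPy:

```python
import numpy as np

# Fit y = a x^2 + b to (-1, 2), (0, 1), (1, 2), (2, 5)
x = np.array([-1.0, 0.0, 1.0, 2.0])
y = np.array([2.0, 1.0, 2.0, 5.0])

A = np.column_stack([x**2, np.ones_like(x)])       # design matrix [x_i^2 | 1]
a_hat, b_hat = np.linalg.solve(A.T @ A, A.T @ y)   # normal equations

print(a_hat, b_hat)                        # 1.0 1.0  -> y = x^2 + 1
print(np.allclose(A @ [a_hat, b_hat], y))  # True: this parabola fits exactly
```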
Remark 7.22: Study Tips
  • Visualize: Draw pictures for 2D/3D projections
  • Verify: Always check P² = P and Pᵀ = P
  • Compare methods: Normal equations vs QR decomposition
  • Understand geometry: Projection = closest point

10. Quick Reference

Key Definitions

  • Projection: closest point in subspace
  • Residual: v − proj(v), perpendicular to U
  • Distance: ||v − proj(v)||
  • Least squares: minimize ||Ax − b||

Computation Steps

  1. Get orthonormal basis for U (Gram-Schmidt)
  2. Compute projection: Σ⟨v, eᵢ⟩eᵢ
  3. Or: P = QQᵀ, then Pv
  4. Verify: check P² = P, Pᵀ = P

Least Squares Steps

  1. Form design matrix A
  2. Compute AᵀA and Aᵀb
  3. Solve AᵀA x̂ = Aᵀb
  4. Or: A = QR, solve Rx̂ = Qᵀb

Key Results

  • Best Approximation Theorem
  • Projection is a linear operator
  • v = proj_U(v) + proj_U⊥(v)
  • ||Pv|| ≤ ||v||
Remark 7.23: Historical Note

The method of least squares was developed independently by Carl Friedrich Gauss (1795) and Adrien-Marie Legendre (1805). Gauss used it to predict the orbit of the asteroid Ceres, one of the great triumphs of mathematical astronomy.

Algorithm: Computing Projection

  1. Given: Vector v, subspace U (as basis vectors)
  2. Step 1: Orthonormalize basis using Gram-Schmidt → {e₁, ..., eₖ}
  3. Step 2: Compute coefficients cᵢ = ⟨v, eᵢ⟩
  4. Step 3: proj_U(v) = Σᵢ cᵢ eᵢ
  5. Step 4: Verify: check proj ∈ U and (v − proj) ⊥ U
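
A minimal Python implementation of the algorithm above, assuming NumPy; the function name project is illustrative. It reproduces Practice Problem 1 (Example 7.38):

```python
import numpy as np

def project(v, basis):
    """Orthonormalize `basis` with Gram-Schmidt, then return sum_i <v, e_i> e_i."""
    E = []
    for u in basis:
        w = np.array(u, dtype=float)
        for e in E:                      # subtract projections onto earlier e's
            w = w - np.dot(w, e) * e
        norm = np.linalg.norm(w)
        if norm > 1e-12:                 # skip (nearly) dependent vectors
            E.append(w / norm)
    return sum(np.dot(v, e) * e for e in E)

# Practice Problem 1 (Example 7.38)
v = np.array([1.0, 2.0, 3.0, 4.0])
basis = [np.array([1.0, 1.0, 0.0, 0.0]), np.array([0.0, 0.0, 1.0, 1.0])]

p = project(v, basis)
print(p)                                             # [1.5 1.5 3.5 3.5]
print(all(abs((v - p) @ u) < 1e-12 for u in basis))  # residual ⊥ U: True
```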

Notation Summary

proj_U(v) | Orthogonal projection of v onto U
P | Projection matrix (onto column space)
A⁺ | Moore-Penrose pseudoinverse
x̂ | Least squares solution
r | Residual (b − Ax̂)

11. More Worked Examples

Example 7.39: Complete 3D Projection Example

Project v = (2, 3, 5)^T onto the plane U = \text{span}\{(1, 0, 1)^T, (0, 1, 1)^T\}:

Step 1: Orthonormalize using Gram-Schmidt:

e_1 = \frac{1}{\sqrt{2}}(1, 0, 1)^T
u_2 = (0, 1, 1)^T - \frac{1}{2}(1, 0, 1)^T = (-\tfrac{1}{2}, 1, \tfrac{1}{2})^T
e_2 = \frac{1}{\sqrt{6}}(-1, 2, 1)^T

Step 2: Compute the coefficients:

c_1 = \langle v, e_1 \rangle = \frac{1}{\sqrt{2}}(2 + 5) = \frac{7}{\sqrt{2}}
c_2 = \langle v, e_2 \rangle = \frac{1}{\sqrt{6}}(-2 + 6 + 5) = \frac{9}{\sqrt{6}}

Step 3: Projection:

\text{proj}_U(v) = \frac{7}{\sqrt{2}} \cdot \frac{1}{\sqrt{2}}(1, 0, 1)^T + \frac{9}{\sqrt{6}} \cdot \frac{1}{\sqrt{6}}(-1, 2, 1)^T
= \frac{7}{2}(1, 0, 1)^T + \frac{3}{2}(-1, 2, 1)^T = (2, 3, 5)^T

Interesting: v is already in the plane, so \text{proj}_U(v) = v.

Example 7.40: Least Squares with Missing Data

Temperature readings with a missing value: fit T = at + b to t = 0, 1, 2, 3 with T = 15, ?, 21, 24.

Use only the available data points:

A = \begin{pmatrix} 0 & 1 \\ 2 & 1 \\ 3 & 1 \end{pmatrix}, \quad b = \begin{pmatrix} 15 \\ 21 \\ 24 \end{pmatrix}

Solve the normal equations for the best-fit line.

Example 7.41: Projection Matrix Computation

Find the projection matrix onto U = \text{span}\{(1, 1, 1)^T\}:

P = \frac{aa^T}{a^T a} = \frac{1}{3}\begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix}

Verify: P^2 = P, P^T = P, eigenvalues are 1 (multiplicity 1) and 0 (multiplicity 2).

Example 7.42: Multivariate Regression

Fit z = \beta_0 + \beta_1 x + \beta_2 y to data points (x_i, y_i, z_i):

X = \begin{pmatrix} 1 & x_1 & y_1 \\ \vdots & \vdots & \vdots \\ 1 & x_n & y_n \end{pmatrix}, \quad z = \begin{pmatrix} z_1 \\ \vdots \\ z_n \end{pmatrix}

Solve X^T X \hat{\beta} = X^T z for \hat{\beta} = (\beta_0, \beta_1, \beta_2)^T.

Example 7.43: Orthogonal Decomposition

Decompose v = (3, 4, 5)^T into parts parallel and perpendicular to u = (1, 2, 2)^T:

Parallel part:

v_\parallel = \text{proj}_u(v) = \frac{\langle v, u \rangle}{\|u\|^2} u = \frac{21}{9}(1, 2, 2)^T = \frac{7}{3}(1, 2, 2)^T

Perpendicular part:

v_\perp = v - v_\parallel = (3, 4, 5)^T - \frac{7}{3}(1, 2, 2)^T = (\tfrac{2}{3}, -\tfrac{2}{3}, \tfrac{1}{3})^T

Verify: \langle v_\perp, u \rangle = 0 ✓

12. Connections to Other Topics

Within Linear Algebra

Gram-Schmidt (LA-7.3)
  • Provides the orthonormal basis for the projection formula
  • Each step subtracts the projection onto previous vectors
  • QR decomposition encodes the projection coefficients
Spectral Theorem (LA-7.5)
  • Self-adjoint operators decompose into projections
  • Eigenspaces are pairwise orthogonal
  • A = Σλᵢ Pᵢ (spectral decomposition)
SVD (LA-8.1)
  • Generalizes eigendecomposition to all matrices
  • Pseudoinverse from the SVD
  • Low-rank approximation via projection
Four Fundamental Subspaces
  • col(A), null(A), row(A), null(Aᵀ)
  • Projections relate these subspaces
  • ℝⁿ = row(A) ⊕ null(A)
Theorem 7.32: Fundamental Theorem Connection

For A \in M_{m \times n}:

  • \text{proj}_{\text{col}(A)} sends b to A\hat{x} (the least squares fit)
  • \text{proj}_{\text{null}(A^T)} sends b to the residual b - A\hat{x}
  • These projections are complementary: P_{\text{col}(A)} + P_{\text{null}(A^T)} = I_m
Remark 7.24: Beyond Linear Algebra

Projections appear throughout mathematics:

  • Functional Analysis: Hilbert space projections, Riesz representation
  • Optimization: Projected gradient descent, alternating projections
  • Statistics: Regression, ANOVA decompositions, PCA
  • Quantum Mechanics: Measurement as projection, density operators

13. Chapter Summary

Key Takeaways

Orthogonal Projection
  • Closest point in the subspace
  • Formula: Σ⟨v, eᵢ⟩eᵢ (orthonormal basis)
  • Matrix form: P = A(AᵀA)⁻¹Aᵀ
  • Properties: P² = P, Pᵀ = P
Least Squares
  • Minimize ||Ax − b||
  • Normal equation: AᵀAx̂ = Aᵀb
  • Ax̂ = projection of b onto col(A)
  • Residual ⊥ col(A)
Applications
  • Linear/polynomial regression
  • Signal processing
  • Data fitting
  • Navigation systems
Computation
  • Normal equations (simple)
  • QR decomposition (stable)
  • SVD (most general)
  • Always verify results
Remark 7.25: Mastery Checklist

You've mastered orthogonal projections when you can:

  • ✓ Compute projections using orthonormal bases
  • ✓ Construct and verify projection matrices
  • ✓ Set up and solve least squares problems
  • ✓ Interpret the geometric meaning of projections
  • ✓ Apply least squares to regression problems
  • ✓ Choose between normal equations and QR method

Pro Tips for Exams

  • For projection onto a line, use the simplified formula: proj = (v·u/u·u)u
  • Always verify P² = P and Pᵀ = P for projection matrices
  • When fitting data, set up the design matrix carefully with correct dimensions
  • Remember: residual ⊥ column space (this is the key geometric insight)
  • QR is more stable, but normal equations are faster for hand computation

14. Additional Practice Examples

Example 7.44: Projection onto Kernel

Find the projection of v = (1, 2, 3)^T onto \text{null}(A) where A = \begin{pmatrix} 1 & 1 & 1 \end{pmatrix}:

Step 1: Find \text{null}(A): x + y + z = 0, so \text{null}(A) = \text{span}\{(-1, 1, 0)^T, (-1, 0, 1)^T\}

Step 2: Orthonormalize this basis, then project.

Alternative: use I - P_{\text{row}(A)}, where P_{\text{row}(A)} = \frac{1}{3}\mathbf{1}\mathbf{1}^T.
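
A short NumPy check of the alternative route I − P_row(A):

```python
import numpy as np

v = np.array([1.0, 2.0, 3.0])
one = np.ones(3)

P_row = np.outer(one, one) / 3        # projection onto row(A) = span{(1,1,1)}
p_null = (np.eye(3) - P_row) @ v      # projection of v onto null(A)

print(p_null)                          # [-1.  0.  1.]
print(np.isclose(one @ p_null, 0.0))   # satisfies x + y + z = 0: True
```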

Example 7.45: Angle Between Subspaces

Find the angle between the subspaces U = \text{span}\{(1, 0, 0)^T\} and W = \text{span}\{(1, 1, 0)^T\}:

The angle \theta satisfies:

\cos\theta = \|P_U P_W\|_{op} = \frac{|\langle e_U, e_W \rangle|}{\|e_U\|\|e_W\|} = \frac{1}{\sqrt{2}}

So \theta = 45°.

Example 7.46: Exponential Fit

Fit y = ce^{ax} to data by linearizing: \ln y = \ln c + ax

Set up least squares with Y = \ln y, then solve for (\ln c, a):

\begin{pmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix} \begin{pmatrix} \ln c \\ a \end{pmatrix} \approx \begin{pmatrix} \ln y_1 \\ \vdots \\ \ln y_n \end{pmatrix}
Example 7.47: Sinusoidal Fit

Fit y = A\sin(x) + B\cos(x) to data points:

X = \begin{pmatrix} \sin x_1 & \cos x_1 \\ \vdots & \vdots \\ \sin x_n & \cos x_n \end{pmatrix}, \quad y = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}

Solve X^T X \hat{\beta} = X^T y for \hat{\beta} = (A, B)^T.

Remark 7.26: Choosing the Right Model

When fitting data:

  • Linear: y = ax + b for linear trends
  • Polynomial: y = \sum a_k x^k for smooth curves
  • Exponential: y = ce^{ax} for growth/decay
  • Trigonometric: y = \sum(a_k \cos kx + b_k \sin kx) for periodic data

All reduce to linear least squares with appropriate design matrix!

15. Computational Aspects

Method Comparison
Method | Complexity | Stability | Best For
Normal equations | O(mn^2 + n^3) | Poor | Well-conditioned problems
QR decomposition | O(mn^2) | Good | General use
SVD | O(mn^2) | Excellent | Rank-deficient problems
Iterative (LSQR) | O(mn) per iteration | Good | Very large sparse problems
Remark 7.27: Numerical Stability

The condition number \kappa(A) = \sigma_{\max}/\sigma_{\min} affects accuracy:

  • Normal equations: error ∝ κ(A)²
  • QR method: error ∝ κ(A)
  • For ill-conditioned problems (large κ), use QR or SVD!
Example 7.48: Ill-Conditioned Example

Polynomial fitting with high-degree polynomials leads to Vandermonde matrices with very large condition numbers:

A = \begin{pmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^n \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_m & x_m^2 & \cdots & x_m^n \end{pmatrix}

For better conditioning, use orthogonal polynomials (Legendre, Chebyshev) instead of monomials.
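
An illustrative NumPy comparison of condition numbers for monomial and Legendre design matrices; the degree and sample points below are arbitrary choices:

```python
import numpy as np
from numpy.polynomial import legendre

# Degree-12 design matrices on 50 equispaced points in [-1, 1]
x = np.linspace(-1.0, 1.0, 50)
deg = 12

V_mono = np.vander(x, deg + 1, increasing=True)   # columns 1, x, ..., x^12
V_leg = legendre.legvander(x, deg)                # columns P_0(x), ..., P_12(x)

print(f"monomial basis: cond = {np.linalg.cond(V_mono):.2e}")
print(f"Legendre basis: cond = {np.linalg.cond(V_leg):.2e}")
# The Legendre condition number is smaller by several orders of magnitude.
```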

Remark 7.28: Regularization Techniques

When the problem is ill-posed, add regularization:

  • Ridge regression: minimize \|Ax - b\|^2 + \lambda\|x\|^2
  • LASSO: minimize \|Ax - b\|^2 + \lambda\|x\|_1 (promotes sparsity)
  • Truncated SVD: keep only the largest singular values
Orthogonal Projections Practice
  1. The orthogonal projection of v onto a subspace U is: (Easy)
  2. If \{e_1, \ldots, e_k\} is an orthonormal basis for U, then \text{proj}_U(v) = ? (Medium)
  3. The projection matrix P onto the column space of A is: (Medium)
  4. The normal equation for least squares Ax \approx b is: (Medium)
  5. For an orthogonal projection P, which is true? (Medium)
  6. If v \in U, then \text{proj}_U(v) = ? (Easy)
  7. The residual v - \text{proj}_U(v) is: (Medium)
  8. Least squares minimizes: (Easy)
  9. The projection onto the line through a unit vector u is: (Easy)
  10. For overdetermined Ax = b, least squares gives: (Medium)
  11. I - P projects onto: (Medium)
  12. In linear regression y = ax + b, we minimize: (Medium)

Frequently Asked Questions

What is orthogonal projection geometrically?

It's 'dropping a perpendicular' from v to the subspace U. The projection is where the perpendicular meets U, and the residual v - proj is perpendicular to U.

Why is the projection the closest point?

By the Pythagorean theorem: ||v - w||² = ||v - proj||² + ||proj - w||² for any w in U. Since the second term is non-negative, ||v - w|| ≥ ||v - proj||.

How does projection relate to least squares?

Ax = b has no solution when b ∉ col(A). The 'best' x makes Ax = proj_{col(A)}(b), the projection of b onto column space. This minimizes ||Ax - b||.

Why is the normal equation called 'normal'?

'Normal' means perpendicular. The equation AᵀAx = Aᵀb ensures the residual b - Ax is orthogonal (normal) to col(A), which is the defining property of projection.

When does AᵀA have an inverse?

AᵀA is invertible iff A has full column rank (columns are linearly independent). This is necessary for a unique least squares solution.

What if A doesn't have full rank?

There are infinitely many least squares solutions. The minimum-norm solution uses the pseudoinverse: x = A⁺b. Or add regularization (ridge regression).

How is projection used in signal processing?

Filtering is projection! Keeping only certain frequency components projects the signal onto the subspace spanned by those frequencies.

What's the connection to regression?

Linear regression y = Xβ + ε is least squares: minimize ||y - Xβ||². The normal equation Xᵀy = XᵀXβ gives β = (XᵀX)⁻¹Xᵀy.

Can I project onto infinite-dimensional subspaces?

Yes, in Hilbert spaces. For closed subspaces, orthogonal projections exist. Fourier truncation (keeping first n terms) is projection onto finite-dimensional subspace.

How do projections relate to eigenvalues?

Projection matrices have eigenvalues 0 and 1 only. Eigenvalue 1 corresponds to vectors in U, eigenvalue 0 to vectors in U⊥.