MathIsimple LA-7.3

Gram-Schmidt Orthogonalization

The Gram-Schmidt process is an algorithm that transforms any basis into an orthonormal basis. It's a constructive proof that every inner product space has an orthonormal basis, and forms the foundation for QR decomposition—one of the most important algorithms in numerical linear algebra.

Learning Objectives
  • Understand the Gram-Schmidt orthogonalization algorithm
  • Apply Gram-Schmidt to construct orthonormal bases from any basis
  • Compute step-by-step examples in ℝⁿ
  • Understand the connection to QR decomposition
  • Recognize numerical stability issues with classical Gram-Schmidt
  • Learn the modified Gram-Schmidt algorithm
  • Apply Gram-Schmidt to function spaces
  • Use Gram-Schmidt for polynomial orthogonalization
Prerequisites
  • Orthogonality and orthonormal sets (LA-7.2)
  • Inner product definition (LA-7.1)
  • Linear independence and bases (LA-2.4-2.5)
  • Projections onto vectors

1. The Gram-Schmidt Algorithm

Theorem 7.22: Gram-Schmidt Process

Given linearly independent vectors \{v_1, v_2, \ldots, v_n\}, the Gram-Schmidt process produces orthonormal vectors \{e_1, e_2, \ldots, e_n\} with:

\text{span}\{e_1, \ldots, e_k\} = \text{span}\{v_1, \ldots, v_k\} \quad \text{for each } k
Definition 7.12: Gram-Schmidt Algorithm

Step 1: Normalize the first vector:

e_1 = \frac{v_1}{\|v_1\|}

Step k (for k = 2, 3, ...): Orthogonalize then normalize:

u_k = v_k - \sum_{i=1}^{k-1} \langle v_k, e_i \rangle e_i, \quad e_k = \frac{u_k}{\|u_k\|}
Remark 7.16: Intuition

At step k, we subtract from v_k its projections onto all previous orthonormal vectors. This removes all components in the directions of e_1, \ldots, e_{k-1}, leaving only the "new" direction perpendicular to them.

Proof:

Correctness: We show by induction that \{e_1, \ldots, e_k\} is orthonormal and spans the same space as \{v_1, \ldots, v_k\}.

Base case: e_1 = v_1/\|v_1\| has unit norm and \text{span}\{e_1\} = \text{span}\{v_1\}.

Inductive step: Assume true for k-1. Then:

  • u_k is orthogonal to each e_i for i < k: \langle u_k, e_i \rangle = \langle v_k, e_i \rangle - \langle v_k, e_i \rangle = 0
  • u_k \neq 0 since v_k is not in \text{span}\{v_1, \ldots, v_{k-1}\} (linear independence)
  • e_k = u_k/\|u_k\| is a unit vector orthogonal to e_1, \ldots, e_{k-1}
  • v_k = u_k + \sum_{i<k} \langle v_k, e_i \rangle e_i \in \text{span}\{e_1, \ldots, e_k\}
Corollary 7.6: Existence of Orthonormal Bases

Every finite-dimensional inner product space has an orthonormal basis.

Proof:

Start with any basis \{v_1, \ldots, v_n\} (exists by finite-dimensionality). Apply Gram-Schmidt to obtain orthonormal \{e_1, \ldots, e_n\} spanning the same space.

Remark 7.16a: Geometric Interpretation

The projection \text{proj}_{e_i}(v_k) = \langle v_k, e_i \rangle e_i is the component of v_k in the direction of e_i. Subtracting these removes all "old" directions, leaving only the genuinely new component.

Example 7.27a: Simple 2D Example

Apply Gram-Schmidt to v_1 = (3, 4)^T, v_2 = (1, 0)^T:

Step 1: \|v_1\| = 5, so e_1 = (3/5, 4/5)^T

Step 2: \langle v_2, e_1 \rangle = 3/5

u_2 = (1, 0)^T - \frac{3}{5}\left(\frac{3}{5}, \frac{4}{5}\right)^T = \left(1 - \frac{9}{25}, -\frac{12}{25}\right)^T = \left(\frac{16}{25}, -\frac{12}{25}\right)^T
\|u_2\| = \frac{1}{25}\sqrt{256 + 144} = \frac{20}{25} = \frac{4}{5}

So e_2 = (4/5, -3/5)^T. Verify: \langle e_1, e_2 \rangle = 12/25 - 12/25 = 0
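As a quick numerical check (a NumPy sketch, not part of the original example), the same two steps can be carried out and the result verified:

import numpy as np

v1 = np.array([3.0, 4.0])
v2 = np.array([1.0, 0.0])

e1 = v1 / np.linalg.norm(v1)            # (0.6, 0.8)
u2 = v2 - np.dot(v2, e1) * e1           # subtract the projection onto e1
e2 = u2 / np.linalg.norm(u2)            # (0.8, -0.6)

print(np.dot(e1, e2))                                  # ~0 (orthogonal)
print(np.linalg.norm(e1), np.linalg.norm(e2))          # both 1.0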

Theorem 7.22a: Span Preservation Property

At each step k of Gram-Schmidt:

\text{span}\{e_1, \ldots, e_k\} = \text{span}\{v_1, \ldots, v_k\}

The orthonormal vectors span exactly the same subspace as the original vectors up to that point.

Remark 7.16b: Uniqueness

The orthonormal basis produced by Gram-Schmidt depends on:

  • The order of input vectors (different orderings give different bases)
  • Sign choices (can flip any e_k \to -e_k)

But the span at each step is uniquely determined by the input order.

Definition 7.12a: Projection Operator

The orthogonal projection onto \text{span}\{e_1, \ldots, e_k\} is:

P_k(v) = \sum_{i=1}^{k} \langle v, e_i \rangle e_i

In Gram-Schmidt, u_k = v_k - P_{k-1}(v_k).

2. Step-by-Step Example

Example 7.28: Gram-Schmidt in ℝ³

Apply Gram-Schmidt to v_1 = (1, 1, 0)^T, v_2 = (1, 0, 1)^T, v_3 = (0, 1, 1)^T.

Step 1: \|v_1\| = \sqrt{2}

e_1 = \frac{1}{\sqrt{2}}(1, 1, 0)^T

Step 2: \langle v_2, e_1 \rangle = \frac{1}{\sqrt{2}}(1 + 0 + 0) = \frac{1}{\sqrt{2}}

u_2 = v_2 - \frac{1}{\sqrt{2}} e_1 = (1, 0, 1)^T - \frac{1}{2}(1, 1, 0)^T = \left(\frac{1}{2}, -\frac{1}{2}, 1\right)^T
\|u_2\| = \sqrt{\frac{1}{4} + \frac{1}{4} + 1} = \sqrt{\frac{3}{2}}, \quad e_2 = \frac{1}{\sqrt{6}}(1, -1, 2)^T

Step 3: \langle v_3, e_1 \rangle = \frac{1}{\sqrt{2}}, \quad \langle v_3, e_2 \rangle = \frac{1}{\sqrt{6}}(0 - 1 + 2) = \frac{1}{\sqrt{6}}

u_3 = v_3 - \frac{1}{\sqrt{2}} e_1 - \frac{1}{\sqrt{6}} e_2 = \left(-\frac{2}{3}, \frac{2}{3}, \frac{2}{3}\right)^T
e_3 = \frac{1}{\sqrt{3}}(-1, 1, 1)^T
Remark 7.17: Verification

Always verify: check that \langle e_i, e_j \rangle = \delta_{ij} and that each \|e_i\| = 1.
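For instance, the ℝ³ result above can be checked in a few lines of NumPy (a sketch; Q simply stacks e_1, e_2, e_3 as columns):

import numpy as np

Q = np.column_stack([
    np.array([1, 1, 0]) / np.sqrt(2),    # e1
    np.array([1, -1, 2]) / np.sqrt(6),   # e2
    np.array([-1, 1, 1]) / np.sqrt(3),   # e3
])

# Q^T Q should be the 3x3 identity, up to rounding
print(np.allclose(Q.T @ Q, np.eye(3)))   # True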

Example 7.28a: Another ℝ³ Example

Apply Gram-Schmidt to v_1 = (1, 0, 1)^T, v_2 = (0, 1, 1)^T, v_3 = (1, 1, 0)^T:

Step 1: \|v_1\| = \sqrt{2}

e_1 = \frac{1}{\sqrt{2}}(1, 0, 1)^T

Step 2: \langle v_2, e_1 \rangle = \frac{1}{\sqrt{2}}(0 + 0 + 1) = \frac{1}{\sqrt{2}}

u_2 = (0, 1, 1)^T - \frac{1}{2}(1, 0, 1)^T = \left(-\frac{1}{2}, 1, \frac{1}{2}\right)^T
\|u_2\| = \sqrt{\frac{1}{4} + 1 + \frac{1}{4}} = \sqrt{\frac{3}{2}}
e_2 = \sqrt{\frac{2}{3}}\left(-\frac{1}{2}, 1, \frac{1}{2}\right)^T = \frac{1}{\sqrt{6}}(-1, 2, 1)^T

Step 3: \langle v_3, e_1 \rangle = \frac{1}{\sqrt{2}}, \quad \langle v_3, e_2 \rangle = \frac{1}{\sqrt{6}}, so u_3 = v_3 - \frac{1}{2}(1, 0, 1)^T - \frac{1}{6}(-1, 2, 1)^T = \left(\frac{2}{3}, \frac{2}{3}, -\frac{2}{3}\right)^T and e_3 = \frac{1}{\sqrt{3}}(1, 1, -1)^T.

Example 7.28b: Complex Example

In \mathbb{C}^2 with the standard inner product, orthogonalize v_1 = (1, i)^T, v_2 = (1, 1)^T:

Step 1: \|v_1\|^2 = |1|^2 + |i|^2 = 2, so e_1 = \frac{1}{\sqrt{2}}(1, i)^T

Step 2: \langle v_2, e_1 \rangle = \frac{1}{\sqrt{2}}(1 \cdot \bar{1} + 1 \cdot \bar{i}) = \frac{1}{\sqrt{2}}(1 - i)

u_2 = (1, 1)^T - \frac{1-i}{2}(1, i)^T = \left(1 - \frac{1-i}{2},\; 1 - \frac{i(1-i)}{2}\right)^T = \left(\frac{1+i}{2}, \frac{1-i}{2}\right)^T

Since \|u_2\|^2 = \frac{1}{2} + \frac{1}{2} = 1, the vector is already normalized: e_2 = \frac{1}{2}(1+i, 1-i)^T.

Remark 7.17a: Tips for Hand Computation
  • Keep fractions exact until the final step to avoid rounding errors
  • Verify orthogonality after each step: \langle u_k, e_i \rangle = 0 for i < k
  • Use symmetry when possible (e.g., if input has integer entries)
  • The matrix form (QR) can sometimes be faster to compute
Example 7.28c: Detecting Linear Dependence

Apply Gram-Schmidt to v_1 = (1, 2)^T, v_2 = (2, 4)^T:

Step 1: e_1 = \frac{1}{\sqrt{5}}(1, 2)^T

Step 2: \langle v_2, e_1 \rangle = \frac{1}{\sqrt{5}}(2 + 8) = \frac{10}{\sqrt{5}} = 2\sqrt{5}

u_2 = (2, 4)^T - 2\sqrt{5} \cdot \frac{1}{\sqrt{5}}(1, 2)^T = (2, 4)^T - 2(1, 2)^T = (0, 0)^T

u_2 = 0 indicates that v_2 is linearly dependent on v_1!
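In code, dependence shows up as a (numerically) zero residual; the sketch below uses a small tolerance rather than an exact zero test:

import numpy as np

v1 = np.array([1.0, 2.0])
v2 = np.array([2.0, 4.0])

e1 = v1 / np.linalg.norm(v1)
u2 = v2 - np.dot(v2, e1) * e1

if np.linalg.norm(u2) < 1e-12:          # tolerance instead of an exact zero test
    print("v2 is (numerically) dependent on v1")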

3. QR Decomposition

Theorem 7.23: QR Decomposition

Every matrix A \in M_{m \times n}(\mathbb{R}) with linearly independent columns can be written as:

A = QR

where Q is m \times n with orthonormal columns and R is n \times n upper triangular with positive diagonal entries.

Proof:

Apply Gram-Schmidt to the columns of A. Let Q have columns e_1, \ldots, e_n.

Since v_j \in \text{span}\{e_1, \ldots, e_j\}, we can write:

v_j = \sum_{i=1}^{j} R_{ij} e_i \quad \text{where } R_{ij} = \langle v_j, e_i \rangle

This gives A = QR with R upper triangular.

Example 7.29: QR Decomposition Example

From the previous example with A = [v_1 \mid v_2 \mid v_3]:

Q = \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{6}} & -\frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{6}} & \frac{1}{\sqrt{3}} \\ 0 & \frac{2}{\sqrt{6}} & \frac{1}{\sqrt{3}} \end{pmatrix}, \quad
R = \begin{pmatrix} \sqrt{2} & \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ 0 & \sqrt{\frac{3}{2}} & \frac{1}{\sqrt{6}} \\ 0 & 0 & \frac{2}{\sqrt{3}} \end{pmatrix}
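These factors can be sanity-checked numerically. The sketch below rebuilds Q from Example 7.28 and recovers R as Q^T A; note that a library routine such as np.linalg.qr may return the same factorization only up to column signs:

import numpy as np

A = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])         # columns are v1, v2, v3

Q = np.column_stack([
    np.array([1, 1, 0]) / np.sqrt(2),
    np.array([1, -1, 2]) / np.sqrt(6),
    np.array([-1, 1, 1]) / np.sqrt(3),
])
R = Q.T @ A                              # since Q^T Q = I, R = Q^T A

print(np.allclose(Q @ R, A))             # True: the factorization reproduces A
print(np.allclose(np.triu(R), R))        # True: R is upper triangular (up to rounding)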
Remark 7.18: Applications of QR

QR decomposition is used for:

  • Solving least squares: Ax \approx b \Rightarrow Rx = Q^T b
  • Computing eigenvalues: QR iteration algorithm
  • Orthonormalizing: Finding orthonormal bases
Theorem 7.23a: Uniqueness of QR

If A has linearly independent columns, then A = QR is unique when we require:

  • Q has orthonormal columns
  • R is upper triangular with positive diagonal entries
Proof:

Suppose A = Q_1 R_1 = Q_2 R_2. Then Q_2^T Q_1 = R_2 R_1^{-1}.

The left side is orthogonal, the right side is upper triangular. An orthogonal upper triangular matrix must be diagonal with \pm 1 entries.

With the positive-diagonal constraint, Q_2^T Q_1 = I, so Q_1 = Q_2 and R_1 = R_2.

Example 7.29a: Solving Least Squares via QR

To solve Ax \approx b (minimize \|Ax - b\|):

  1. Compute A = QR
  2. Compute Q^T b
  3. Solve Rx = Q^T b by back substitution
\|Ax - b\|^2 = \|QRx - b\|^2 = \|Rx - Q^T b\|^2 + \|(I - QQ^T)b\|^2

Minimized when Rx = Q^T b.
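A minimal sketch of this recipe using NumPy's built-in QR and SciPy's triangular solver; the 4×2 system here is made up for illustration:

import numpy as np
from scipy.linalg import solve_triangular

A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
b = np.array([1.0, 2.0, 2.0, 4.0])

Q, R = np.linalg.qr(A)                   # reduced QR: Q is 4x2, R is 2x2
x = solve_triangular(R, Q.T @ b)         # back substitution on Rx = Q^T b

print(x)
print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))  # agrees with lstsq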

Remark 7.18a: Computational Cost

For an m \times n matrix:

  • Gram-Schmidt QR: O(mn^2) operations
  • Householder QR: O(mn^2 - n^3/3) operations, more stable
  • Givens QR: Better for sparse matrices
Definition 7.13a: Reduced vs Full QR

For A \in M_{m \times n} with m > n:

  • Reduced QR: Q is m \times n, R is n \times n
  • Full QR: Q is m \times m (orthogonal), R is m \times n

Reduced QR is more memory-efficient; full QR is useful for orthogonal projections.

Example 7.29b: QR for Rank Computation

With column pivoting (see Theorem 7.26), the diagonal of R reveals the rank:

\text{rank}(A) = \text{number of nonzero diagonal entries of } R

In numerical computation, count the entries whose magnitude exceeds a tolerance.

Theorem 7.23b: QR and Column Space

The columns of Q form an orthonormal basis for \text{col}(A):

\text{col}(A) = \text{col}(Q)

4. Numerical Stability

Remark 7.19: Problem with Classical Gram-Schmidt

In floating-point arithmetic, rounding errors accumulate. After many steps, the computed e_i may not be orthogonal! This is especially bad when the input vectors are nearly linearly dependent.

Definition 7.13: Modified Gram-Schmidt (MGS)

Instead of computing all projections at once, update v_k sequentially:

v_k^{(1)} = v_k - \langle v_k, e_1 \rangle e_1
v_k^{(2)} = v_k^{(1)} - \langle v_k^{(1)}, e_2 \rangle e_2
\vdots
u_k = v_k^{(k-1)} = v_k^{(k-2)} - \langle v_k^{(k-2)}, e_{k-1} \rangle e_{k-1}
Remark 7.20: Why MGS is Better

In exact arithmetic, classical and modified Gram-Schmidt give identical results. But with floating-point errors, MGS produces vectors closer to orthogonal because errors don't accumulate as quickly.

Example 7.30: Loss of Orthogonality

Consider nearly dependent vectors. Classical GS might produce:

\langle e_1, e_3 \rangle \approx 10^{-8} \text{ (should be 0)}

Modified GS typically gives \langle e_1, e_3 \rangle \approx 10^{-15} (machine precision).

Theorem 7.24: Error Analysis

Let \epsilon be machine precision and \kappa(A) the condition number. For classical Gram-Schmidt, the loss of orthogonality can grow like:

\|Q^T Q - I\| = O(\epsilon \cdot \kappa(A)^2)

For modified Gram-Schmidt:

\|Q^T Q - I\| = O(\epsilon \cdot \kappa(A))

So MGS degrades much more gracefully on ill-conditioned inputs.

Definition 7.13b: Condition Number

The condition number of A is \kappa(A) = \sigma_{\max}/\sigma_{\min} (the ratio of the largest to smallest singular value).

Large \kappa(A) indicates near-dependence; Gram-Schmidt is unstable for such matrices.

Remark 7.20a: When to Use Each Method
  • Classical GS: For hand computation or when stability doesn't matter
  • Modified GS: When some numerical stability is needed
  • Householder: For production-quality numerical code (LAPACK uses this)
Example 7.30a: Comparing Methods

For the nearly dependent vectors v_1 = (1, 10^{-8}, 0)^T, v_2 = (1, 0, 10^{-8})^T (compare the numerical sketch below):

  • Classical GS: \langle e_1, e_2 \rangle \approx 10^{-8}
  • Modified GS: \langle e_1, e_2 \rangle \approx 10^{-16}
  • Householder: \langle e_1, e_2 \rangle \approx 10^{-16}
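The sketch below contrasts the two variants on a nearly dependent (Läuchli-style) test matrix; the helper names cgs and mgs are ours, and the exact loss of orthogonality you observe depends on the data and the machine:

import numpy as np

def cgs(A):
    """Classical Gram-Schmidt: all projections use the original column."""
    m, n = A.shape
    Q = np.zeros((m, n))
    for k in range(n):
        u = A[:, k] - Q[:, :k] @ (Q[:, :k].T @ A[:, k])
        Q[:, k] = u / np.linalg.norm(u)
    return Q

def mgs(A):
    """Modified Gram-Schmidt: the working vector is updated after each projection."""
    m, n = A.shape
    Q = np.zeros((m, n))
    for k in range(n):
        u = A[:, k].copy()
        for i in range(k):
            u -= np.dot(Q[:, i], u) * Q[:, i]
        Q[:, k] = u / np.linalg.norm(u)
    return Q

eps = 1e-8
A = np.array([[1, 1, 1],
              [eps, 0, 0],
              [0, eps, 0],
              [0, 0, eps]], dtype=float)   # nearly dependent columns

for name, q in [("CGS", cgs(A)), ("MGS", mgs(A))]:
    print(name, np.linalg.norm(q.T @ q - np.eye(3)))   # deviation from orthogonality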
Definition 7.13c: Householder Reflection

A Householder reflector is H = I - 2uu^T where \|u\| = 1.

It reflects across the hyperplane perpendicular to u. Used to zero out entries below the diagonal.

Remark 7.20b: Reorthogonalization

To improve classical GS, apply it twice:

  1. Run classical Gram-Schmidt once
  2. Check orthogonality: if \|Q^T Q - I\| > \text{tol}, reorthogonalize
  3. Repeat until orthogonal enough

This is called "Gram-Schmidt with reorthogonalization" (GS2).

5. Gram-Schmidt on Function Spaces

Example 7.31: Orthogonalizing Polynomials on [-1,1]

Start with \{1, x, x^2\} and the inner product \langle f, g \rangle = \int_{-1}^{1} f(x)g(x)\, dx.

Step 1: \|1\|^2 = \int_{-1}^{1} 1\, dx = 2, so e_0 = 1/\sqrt{2}

Step 2: \langle x, e_0 \rangle = \frac{1}{\sqrt{2}}\int_{-1}^{1} x\, dx = 0 (odd function!)

So u_1 = x, \quad \|x\|^2 = \int_{-1}^{1} x^2\, dx = 2/3, \quad e_1 = \sqrt{3/2}\, x

Step 3: \langle x^2, e_0 \rangle = \frac{1}{\sqrt{2}} \cdot \frac{2}{3}, \quad \langle x^2, e_1 \rangle = 0 (even × odd = odd)

u_2 = x^2 - \frac{1}{3}, \quad e_2 \propto x^2 - \frac{1}{3}

These are (scalar multiples of) Legendre polynomials!
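The same computation can be reproduced symbolically with SymPy (a sketch; the helpers ip and gram_schmidt_poly are illustrative names, not a standard API):

import sympy as sp

x = sp.symbols('x')

def ip(f, g):
    """Inner product <f, g> = integral of f*g over [-1, 1]."""
    return sp.integrate(f * g, (x, -1, 1))

def gram_schmidt_poly(basis):
    es = []
    for v in basis:
        u = v - sum(ip(v, e) * e for e in es)            # subtract projections
        es.append(sp.simplify(u / sp.sqrt(ip(u, u))))    # normalize
    return es

print(gram_schmidt_poly([sp.Integer(1), x, x**2]))
# first two entries: sqrt(2)/2 and sqrt(6)*x/2, scalar multiples of Legendre polynomials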

Remark 7.21: Orthogonal Polynomials

Different intervals and weight functions give different orthogonal polynomial families. Gram-Schmidt provides a systematic way to construct them.

Example 7.31a: Chebyshev Polynomials

On [-1, 1] with weight w(x) = 1/\sqrt{1-x^2}:

\langle f, g \rangle = \int_{-1}^{1} \frac{f(x)g(x)}{\sqrt{1-x^2}}\, dx

Gram-Schmidt on \{1, x, x^2, \ldots\} produces the Chebyshev polynomials T_n(x).

Example 7.31b: Hermite Polynomials

On (-\infty, \infty) with weight w(x) = e^{-x^2}:

\langle f, g \rangle = \int_{-\infty}^{\infty} f(x)g(x)e^{-x^2}\, dx

Gram-Schmidt produces the Hermite polynomials H_n(x), used in quantum mechanics.

Theorem 7.25: Three-Term Recurrence

Orthogonal polynomials from Gram-Schmidt always satisfy a recurrence:

P_{n+1}(x) = (A_n x + B_n) P_n(x) - C_n P_{n-1}(x)

This allows efficient computation without repeated orthogonalization.

Example 7.31c: Fourier Series Connection

On [0, 2\pi] with inner product \langle f, g \rangle = \int_0^{2\pi} f(x)\overline{g(x)}\, dx:

The set \{e^{inx}\}_{n \in \mathbb{Z}} is already orthogonal! Normalizing gives the Fourier basis.

Remark 7.21a: Applications of Orthogonal Polynomials
  • Numerical integration: Gaussian quadrature uses zeros of orthogonal polynomials
  • Approximation: Best polynomial approximation via projection
  • Differential equations: Sturm-Liouville eigenfunctions
  • Physics: Quantum harmonic oscillator (Hermite), hydrogen atom (Laguerre)
Example 7.31d: Least Squares Polynomial Fitting

To fit data \{(x_i, y_i)\} with a degree-n polynomial:

  1. Form the matrix A_{ij} = x_i^{j-1} (Vandermonde)
  2. Compute A = QR
  3. Solve Rc = Q^T y for the coefficients

Or: orthogonalize \{1, x, x^2, \ldots\} at the data points, then project.

6. Common Mistakes

Forgetting to normalize

After computing uₖ, you must normalize: eₖ = uₖ/||uₖ||. The algorithm produces orthogonal, not orthonormal, vectors without this step.

Using wrong projection formula

Project onto the unit vector eᵢ: ⟨vₖ, eᵢ⟩eᵢ. For a non-unit vector w, the projection is (⟨vₖ, w⟩/||w||²)w.

Applying to dependent vectors

Gram-Schmidt requires linearly independent input. Dependent vectors produce uₖ = 0 at some step.

Arithmetic errors in computation

Always verify orthonormality of your result: ⟨eᵢ, eⱼ⟩ = δᵢⱼ.

Using classical GS for ill-conditioned problems

For numerically sensitive applications, use modified Gram-Schmidt or Householder reflections.

Confusing uₖ and eₖ

uₖ is orthogonal but not normalized. eₖ = uₖ/||uₖ|| is the final orthonormal vector.

Wrong sign in projection

It's uₖ = vₖ MINUS projections, not plus. We subtract the old components.

Remark 7.22: Debugging Tips

If Gram-Schmidt gives wrong results:

  • Check input is linearly independent (if uₖ = 0, vectors are dependent)
  • Verify each eₖ has ||eₖ|| = 1
  • Check orthogonality: ⟨eᵢ, eⱼ⟩ = 0 for i ≠ j
  • Confirm span preservation: span{e₁,...,eₖ} = span{v₁,...,vₖ}

7. Key Formulas Summary

Gram-Schmidt

  • e_1 = v_1/\|v_1\|
  • u_k = v_k - \sum_{i<k} \langle v_k, e_i \rangle e_i
  • e_k = u_k/\|u_k\|

QR Decomposition

  • A = QR
  • Q: orthonormal columns
  • R: upper triangular
  • R_{ij} = \langle v_j, e_i \rangle for i \leq j

Algorithm Comparison

  • Classical GS: O(mn²), poor stability; hand computation
  • Modified GS: O(mn²), good stability; general use
  • Householder: O(mn²), excellent stability; production code
  • Givens: O(mn²), excellent stability; sparse matrices

8. Applications

Least Squares Problems

QR decomposition provides a numerically stable method for solving least squares:

\min_x \|Ax - b\| \quad \Rightarrow \quad Rx = Q^T b

Advantages over the normal equations A^T A x = A^T b:

  • Better numerical stability (the condition number is squared in the normal equations)
  • Works for rank-deficient matrices with column pivoting
  • No need to form A^T A explicitly
QR Algorithm for Eigenvalues

The QR iteration computes eigenvalues of symmetric matrices:

  1. Start with A_0 = A
  2. Compute A_k = Q_k R_k (QR decomposition)
  3. Set A_{k+1} = R_k Q_k
  4. Repeat until A_k is diagonal (eigenvalues on the diagonal)

With shifts and deflation, this is the standard method for computing all eigenvalues.
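A bare-bones sketch of the unshifted iteration on a small symmetric matrix (production implementations add shifts and deflation, as noted above):

import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])          # symmetric test matrix

Ak = A.copy()
for _ in range(100):                      # fixed iteration count for simplicity
    Q, R = np.linalg.qr(Ak)
    Ak = R @ Q                            # similarity transform: same eigenvalues

print(np.sort(np.diag(Ak)))               # approximate eigenvalues from the diagonal
print(np.sort(np.linalg.eigvalsh(A)))     # reference values for comparison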

Signal Processing
  • Adaptive filtering: Recursive least squares uses incremental QR updates
  • MIMO systems: QR precoding for wireless communications
  • Beamforming: Orthogonalize antenna array responses
  • Spectral analysis: Orthogonalize basis functions for frequency estimation
Example 7.32: Data Fitting

Fit y = c_0 + c_1 x + c_2 x^2 to the data points (1, 2), (2, 3), (3, 7), (4, 9):

A = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 2 & 4 \\ 1 & 3 & 9 \\ 1 & 4 & 16 \end{pmatrix}, \quad b = \begin{pmatrix} 2 \\ 3 \\ 7 \\ 9 \end{pmatrix}

Compute A = QR, then solve Rc = Q^T b for the coefficients.
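Carrying out those steps numerically (a sketch; the printed coefficients are simply whatever the least-squares fit of these four points produces, not values quoted in the example):

import numpy as np
from scipy.linalg import solve_triangular

xs = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([2.0, 3.0, 7.0, 9.0])

A = np.vander(xs, 3, increasing=True)    # columns: 1, x, x^2 (the matrix above)
Q, R = np.linalg.qr(A)
c = solve_triangular(R, Q.T @ b)         # coefficients c0, c1, c2

print(c)
print(np.allclose(c, np.linalg.lstsq(A, b, rcond=None)[0]))  # matches lstsq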

9. What's Next

Building on Gram-Schmidt

Orthogonal Projections (LA-7.4)

Project vectors onto subspaces using orthonormal bases from Gram-Schmidt.

Spectral Theorem (LA-7.5)

Self-adjoint operators have orthonormal eigenbases (use Gram-Schmidt in eigenspaces).

SVD (LA-8.1)

Singular value decomposition produces two orthonormal bases via related QR-type algorithms.

Numerical Methods

Householder reflections and Givens rotations for production-quality implementations.

Example 7.33: Practice Problem 1

Apply Gram-Schmidt to v_1 = (1, 1, 1)^T, v_2 = (1, 0, 0)^T, v_3 = (0, 1, 0)^T:

Solution outline:

  1. e_1 = \frac{1}{\sqrt{3}}(1, 1, 1)^T
  2. Project v_2 onto e_1, subtract, normalize
  3. Project v_3 onto e_1, e_2, subtract, normalize
Example 7.34: Practice Problem 2

Find the QR decomposition of A = \begin{pmatrix} 1 & 2 \\ 2 & 1 \\ 2 & 2 \end{pmatrix}:

Solution outline:

  1. Apply Gram-Schmidt to the columns of A
  2. Form Q from the orthonormal columns
  3. Compute R = Q^T A
Remark 7.23: Study Tips
  • Practice by hand: Work through several examples to internalize the algorithm
  • Verify results: Always check Q^T Q = I and A = QR
  • Understand the geometry: Visualize projection and subtraction in 2D/3D
  • Know the variants: Classical vs modified, and when to use each

10. Quick Reference

Classical Gram-Schmidt

  1. e₁ = v₁/||v₁||
  2. For k = 2, 3, ..., n:
  3. uₖ = vₖ - Σᵢ₌₁ᵏ⁻¹⟨vₖ, eᵢ⟩eᵢ
  4. eₖ = uₖ/||uₖ||

Modified Gram-Schmidt

  1. For k = 1, 2, ..., n:
  2. uₖ = vₖ
  3. For i = 1, ..., k-1:
  4. uₖ = uₖ - ⟨uₖ, eᵢ⟩eᵢ
  5. eₖ = uₖ/||uₖ||

Key Properties

  • span{e₁,...,eₖ} = span{v₁,...,vₖ}
  • ⟨eᵢ, eⱼ⟩ = δᵢⱼ (orthonormal)
  • Detects linear dependence (uₖ = 0)
  • A = QR (Q orthonormal, R upper triangular)

Applications

  • Least squares: Rx = Qᵀb
  • Eigenvalue computation (QR algorithm)
  • Orthogonal polynomials
  • Signal processing
Remark 7.24: Historical Note

The process is named after Jørgen Pedersen Gram (1850-1916) and Erhard Schmidt (1876-1959). Gram introduced orthogonalization in 1879; Schmidt independently developed it in 1907 for function spaces.

Verification Checklist

After computing orthonormal basis {e₁, ..., eₙ}, verify:

  1. ||eₖ|| = 1 for all k
  2. ⟨eᵢ, eⱼ⟩ = 0 for i ≠ j
  3. span{e₁, ..., eₖ} = span{v₁, ..., vₖ}
  4. If QR: verify A = QR by matrix multiplication

Common Computation Shortcuts

  • If vₖ ⊥ all previous eᵢ, just normalize: eₖ = vₖ/||vₖ||
  • For symmetric input, exploit structure to reduce computation
  • Use R = Qᵀ A instead of computing R entry-by-entry
  • For verification, only check n(n-1)/2 + n values (not n²)

11. Advanced Topics

Theorem 7.26: QR with Column Pivoting

For potentially rank-deficient A, use column pivoting:

AP = QR

where P is a permutation matrix that reorders the columns by decreasing norm at each step.

Remark 7.25: Rank-Revealing QR

Column pivoting helps reveal the numerical rank:

  • The diagonal entries of R appear in decreasing magnitude
  • Small diagonal entry indicates near-dependence
  • Truncate at first small diagonal for low-rank approximation
Definition 7.14: Givens Rotation

A Givens rotation G(i, j, \theta) rotates in the (i, j) plane by angle \theta:

G = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}

Used to zero out individual entries (good for sparse matrices).

Example 7.35: Givens Rotation Example

To zero out a_{21} in A = \begin{pmatrix} 3 & 1 \\ 4 & 2 \end{pmatrix}:

Choose \cos\theta = 3/5, \sin\theta = 4/5 (normalize (3, 4)):

G^T A = \begin{pmatrix} 3/5 & 4/5 \\ -4/5 & 3/5 \end{pmatrix} \begin{pmatrix} 3 & 1 \\ 4 & 2 \end{pmatrix} = \begin{pmatrix} 5 & 11/5 \\ 0 & 2/5 \end{pmatrix}
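The same rotation in NumPy (a sketch; G is assembled directly from the cosine and sine chosen above):

import numpy as np

A = np.array([[3.0, 1.0],
              [4.0, 2.0]])

r = np.hypot(A[0, 0], A[1, 0])          # 5.0
c, s = A[0, 0] / r, A[1, 0] / r         # cos = 3/5, sin = 4/5
G = np.array([[c, -s],
              [s,  c]])

print(G.T @ A)   # [[5. , 2.2], [0. , 0.4]] -- the (2,1) entry has been zeroed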
Definition 7.15: Householder Reflection

A Householder reflector H = I - 2uu^T (with \|u\| = 1) reflects vectors across the hyperplane perpendicular to u.

Used to zero out entire subcolumns in one operation.

Example 7.36: Householder Example

To map x = (3, 4)^T to \|x\| e_1 = (5, 0)^T:

u = \frac{x - \|x\|e_1}{\|x - \|x\|e_1\|} = \frac{(-2, 4)^T}{\sqrt{20}} = \frac{1}{\sqrt{5}}(-1, 2)^T

Then H = I - 2uu^T maps x to (5, 0)^T.
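Numerically, following the formula above (a sketch):

import numpy as np

x = np.array([3.0, 4.0])
e1 = np.array([1.0, 0.0])

v = x - np.linalg.norm(x) * e1          # (-2, 4)
u = v / np.linalg.norm(v)
H = np.eye(2) - 2.0 * np.outer(u, u)    # Householder reflector

print(H @ x)          # [5., 0.]
print(H @ H)          # identity: a reflection applied twice is the identity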

Theorem 7.27: Householder QR

For A \in M_{m \times n}, there exist Householder reflectors H_1, \ldots, H_n such that:

H_n \cdots H_1 A = R \quad \text{(upper triangular)}

Then Q = H_1 \cdots H_n (a product of orthogonal matrices is orthogonal).

Remark 7.26: Comparison: Householder vs Givens
  • Operation: Householder zeros an entire subcolumn; Givens zeros a single entry
  • Best for: Householder suits dense matrices; Givens suits sparse matrices
  • Parallelization: harder for Householder, easier for Givens
  • Stability: excellent for both

12. More Worked Examples

Example 7.37: Complete ℝ⁴ Example

Apply Gram-Schmidt to vectors in \mathbb{R}^4:

v_1 = (1, 0, 0, 0)^T, \quad v_2 = (1, 1, 0, 0)^T, \quad v_3 = (1, 1, 1, 0)^T, \quad v_4 = (1, 1, 1, 1)^T

Step 1: e_1 = v_1 = (1, 0, 0, 0)^T (already unit)

Step 2: \langle v_2, e_1 \rangle = 1

u_2 = (1, 1, 0, 0)^T - 1 \cdot (1, 0, 0, 0)^T = (0, 1, 0, 0)^T

e_2 = (0, 1, 0, 0)^T (already unit)

Step 3: \langle v_3, e_1 \rangle = 1, \quad \langle v_3, e_2 \rangle = 1

u_3 = (1, 1, 1, 0)^T - (1, 0, 0, 0)^T - (0, 1, 0, 0)^T = (0, 0, 1, 0)^T = e_3

Step 4: Similarly, e_4 = (0, 0, 0, 1)^T

Result: Standard basis (this special input already has nice structure).

Example 7.38: Non-Standard Inner Product

In \mathbb{R}^2 with inner product \langle x, y \rangle = 2x_1 y_1 + x_2 y_2, orthogonalize v_1 = (1, 1)^T, v_2 = (1, 0)^T:

Step 1: \|v_1\|^2 = 2(1) + 1 = 3, so e_1 = \frac{1}{\sqrt{3}}(1, 1)^T

Step 2: \langle v_2, e_1 \rangle = \frac{1}{\sqrt{3}}(2 + 0) = \frac{2}{\sqrt{3}}

u_2 = (1, 0)^T - \frac{2}{3}(1, 1)^T = \left(\frac{1}{3}, -\frac{2}{3}\right)^T

Normalizing with the weighted inner product: \|u_2\|^2 = 2 \cdot \frac{1}{9} + \frac{4}{9} = \frac{2}{3}, so e_2 = \sqrt{\frac{3}{2}}\left(\frac{1}{3}, -\frac{2}{3}\right)^T = \frac{1}{\sqrt{6}}(1, -2)^T.

Example 7.39: QR for System Solving

Solve Ax \approx b where A = \begin{pmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{pmatrix}, \quad b = \begin{pmatrix} 1 \\ 2 \\ 4 \end{pmatrix}:

  1. Compute A = QR via Gram-Schmidt
  2. Compute Q^T b = \begin{pmatrix} c_1 \\ c_2 \end{pmatrix}
  3. Solve Rx = Q^T b (upper triangular system)
  4. This gives the least squares solution x
Example 7.40: Extending Orthonormal Set

Given the orthonormal vector e_1 = (1, 0, 0)^T, extend to an orthonormal basis of \mathbb{R}^3:

  1. Choose any v_2 not parallel to e_1, e.g., v_2 = (1, 1, 0)^T
  2. u_2 = v_2 - \langle v_2, e_1 \rangle e_1 = (0, 1, 0)^T
  3. e_2 = (0, 1, 0)^T (already unit)
  4. Choose v_3 = (0, 0, 1)^T, already orthogonal to e_1, e_2
  5. e_3 = (0, 0, 1)^T
Example 7.41: Orthogonalizing Trigonometric Functions

Apply Gram-Schmidt to \{1, \cos x, \cos 2x\} on [0, 2\pi]:

These are already orthogonal! Check:

\int_0^{2\pi} \cos x\, dx = 0, \quad \int_0^{2\pi} \cos 2x\, dx = 0
\int_0^{2\pi} \cos x \cos 2x\, dx = \frac{1}{2}\int_0^{2\pi} (\cos 3x + \cos x)\, dx = 0

Just normalize: e_0 = 1/\sqrt{2\pi}, \quad e_1 = \cos x/\sqrt{\pi}, \quad e_2 = \cos 2x/\sqrt{\pi}.

Remark 7.27: When Input is Already Orthogonal

If the input vectors are already orthogonal, Gram-Schmidt reduces to normalization only:

e_k = \frac{v_k}{\|v_k\|}

All projection terms vanish since \langle v_k, e_i \rangle = 0 for i < k.

13. Implementation Notes

Python: Classical Gram-Schmidt
import numpy as np

def classical_gs(V):
    """Classical Gram-Schmidt on the columns of V; returns Q, R with V = Q R."""
    m, n = V.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for k in range(n):
        q = V[:, k].astype(float)
        for i in range(k):
            # projection coefficient uses the ORIGINAL k-th column
            R[i, k] = np.dot(Q[:, i], V[:, k])
            q -= R[i, k] * Q[:, i]
        R[k, k] = np.linalg.norm(q)
        Q[:, k] = q / R[k, k]
    return Q, R
Python: Modified Gram-Schmidt
import numpy as np

def modified_gs(V):
    """Modified Gram-Schmidt; remaining columns are updated as soon as each e_k is formed."""
    m, n = V.shape
    Q = V.astype(float)          # work on a float copy of the columns
    R = np.zeros((n, n))
    for k in range(n):
        R[k, k] = np.linalg.norm(Q[:, k])
        Q[:, k] /= R[k, k]
        for j in range(k + 1, n):
            R[k, j] = np.dot(Q[:, k], Q[:, j])
            Q[:, j] -= R[k, j] * Q[:, k]
    return Q, R
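A quick usage check, assuming the two functions above are defined in the same session:

import numpy as np

V = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])

Q, R = modified_gs(V)
print(np.allclose(Q @ R, V))             # True: the factorization reproduces V
print(np.allclose(Q.T @ Q, np.eye(3)))   # True: the columns are orthonormal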
Remark 7.28: Key Differences
  • Classical: Compute all projections using the original v_k
  • Modified: Update v_j immediately after computing each projection
  • Memory: Modified can work in-place, overwriting V
  • Parallelism: Classical projections are independent; modified updates are sequential
Numerical Considerations
  • Tolerance for zero: If \|u_k\| < \epsilon, treat v_k as dependent
  • Reorthogonalization: If \|Q^T Q - I\| is large, run a second pass
  • Scaling: Normalize the input columns to similar magnitudes first
  • Pivoting: Process columns in order of decreasing norm for stability
Example 7.42: Detecting Near-Dependence

If during Gram-Schmidt we get \|u_k\| < 10^{-10} (for double precision):

  • The columns are numerically dependent
  • Either skip this column or stop the algorithm
  • The rank is less than the number of columns

14. Connections to Other Topics

Linear Algebra Connections

Eigenvalues (Chapter 6)
  • QR algorithm computes eigenvalues
  • Orthonormalize eigenvectors within eigenspaces
  • Spectral theorem uses orthonormal eigenbases
Orthogonal Projections (LA-7.4)
  • P = QQ^T projects onto col(Q)
  • Best approximation = orthogonal projection
  • Least squares = projection onto col(A)
SVD (LA-8.1)
  • SVD produces two orthonormal bases
  • Can be computed via QR-like iterations
  • Bidiagonalization uses Householder reflections
Determinants (Chapter 5)
  • det(Q) = ±1 for orthogonal Q
  • |det(A)| = |det(R)| (since Q has det ±1)
  • Volume = |det| is preserved by orthogonal maps
Theorem 7.28: Determinant via QR

For square A with A = QR:

\det(A) = \det(Q) \det(R) = \pm \prod_{i=1}^n R_{ii}

Since R is triangular, its determinant is the product of its diagonal entries.
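A numerical sanity check of this identity (a sketch; the test matrix is arbitrary):

import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [4.0, 1.0, 2.0]])

Q, R = np.linalg.qr(A)
print(np.linalg.det(A))                          # reference determinant
print(np.linalg.det(Q) * np.prod(np.diag(R)))    # det(Q) * prod(R_ii) agrees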

Remark 7.29: Why Gram-Schmidt is Fundamental

Gram-Schmidt is more than an algorithm—it's a proof technique:

  • Proves every inner product space has an orthonormal basis
  • Proves QR decomposition exists and is unique (with sign convention)
  • Provides constructive approach to orthogonal projections
  • Foundation for numerical methods in scientific computing
Example 7.43: Connection to Projection

The orthogonal projection onto \text{span}\{e_1, \ldots, e_k\} is:

P_k = \sum_{i=1}^k e_i e_i^T \quad \text{or} \quad P_k = Q_k Q_k^T

where Q_k = [e_1 \mid \cdots \mid e_k]. This projects any vector onto the span.

Theorem 7.29: Projection Formula

For orthonormal \{e_1, \ldots, e_k\}, the projection of v onto their span is:

\text{proj}(v) = \sum_{i=1}^k \langle v, e_i \rangle e_i

This is exactly what Gram-Schmidt computes and subtracts at each step!
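As a small sketch, here is the projection onto the span of the first two orthonormal vectors from Example 7.28; the test vector is chosen arbitrarily:

import numpy as np

e1 = np.array([1, 1, 0]) / np.sqrt(2)
e2 = np.array([1, -1, 2]) / np.sqrt(6)
Qk = np.column_stack([e1, e2])

v = np.array([3.0, 1.0, 2.0])            # arbitrary test vector
p = Qk @ (Qk.T @ v)                      # projection = Qk Qk^T v

print(p)
print(np.allclose(Qk.T @ (v - p), 0))    # the residual is orthogonal to the span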

Remark 7.30: Gram-Schmidt as Projections

Gram-Schmidt can be viewed as repeated projections:

u_k = v_k - \text{proj}_{\text{span}\{e_1, \ldots, e_{k-1}\}}(v_k)

Each step subtracts the projection onto the span of previous basis vectors.

15. Chapter Summary

Key Takeaways

The Algorithm
  • Normalize the first vector
  • For each subsequent vector: subtract projections, then normalize
  • Output: an orthonormal basis spanning the same space
  • Modified GS: update vectors sequentially for stability
QR Decomposition
  • A = QR with Q orthonormal, R upper triangular
  • Q from Gram-Schmidt on the columns of A
  • R entries are the projection coefficients
  • Unique with the positive-diagonal constraint
Applications
  • Least squares: Rx = Qᵀb
  • Eigenvalue computation: QR algorithm
  • Orthogonal polynomials: Legendre, Chebyshev
  • Signal processing and data fitting
Numerical Aspects
  • Classical GS: simple but unstable
  • Modified GS: more stable in practice
  • Householder: production-quality stability
  • Givens: good for sparse matrices
Remark 7.31: Mastery Checklist

You've mastered Gram-Schmidt when you can:

  • ✓ Execute the algorithm correctly by hand
  • ✓ Verify orthonormality of output
  • ✓ Extract QR decomposition from Gram-Schmidt
  • ✓ Explain why modified GS is more stable
  • ✓ Apply to function spaces (polynomials, etc.)
  • ✓ Use QR for least squares problems
  • ✓ Recognize when input is nearly dependent

Pro Tips for Exams

  • Always verify your answer: check ⟨eᵢ, eⱼ⟩ = δᵢⱼ and A = QR
  • Watch for already-orthogonal inputs (only normalize)
  • If uₖ = 0, the input vectors are linearly dependent
  • Remember: subtract projections (not add)
  • For function spaces, use the given inner product (often an integral)
Gram-Schmidt Practice (12 questions)

  1. Gram-Schmidt takes as input: (Easy)
  2. The first step in Gram-Schmidt is: (Easy)
  3. In Gram-Schmidt, u_k is computed by: (Medium)
  4. Gram-Schmidt proves that: (Medium)
  5. If the input has n vectors, the output has: (Easy)
  6. QR decomposition gives A = QR where: (Medium)
  7. Modified Gram-Schmidt is preferred because: (Medium)
  8. Gram-Schmidt on the polynomials \{1, x, x^2\} with the \int_0^1 inner product gives: (Hard)
  9. The projection \text{proj}_e(v) for unit e is: (Easy)
  10. Gram-Schmidt preserves: (Medium)
  11. If v_3 \in \text{span}\{v_1, v_2\}, Gram-Schmidt will: (Medium)
  12. The R matrix in QR contains: (Hard)

Frequently Asked Questions

What does Gram-Schmidt actually do?

It transforms any basis into an orthonormal basis spanning the same space. At each step, it makes the current vector orthogonal to all previous ones, then normalizes it.

Why subtract projections?

The projection of vₖ onto eᵢ is the 'component of vₖ in the eᵢ direction.' Subtracting all these projections removes the parts of vₖ that are in the span of previous vectors, leaving only the 'new' direction.

What if the input vectors are linearly dependent?

You'll get a zero vector at some step (after subtracting projections). This actually detects linear dependence! In practice, skip such vectors or stop the algorithm.

Why is numerical stability important?

Rounding errors accumulate. Classical Gram-Schmidt can produce vectors that are far from orthogonal in floating-point arithmetic. Modified Gram-Schmidt reduces error propagation.

What is the Modified Gram-Schmidt algorithm?

Instead of computing all projections using original vₖ, update vₖ sequentially: subtract proj_{e₁}(vₖ), then subtract proj_{e₂}(updated vₖ), etc. This reduces error accumulation.

How does Gram-Schmidt relate to QR decomposition?

A = QR where Q has orthonormal columns (from Gram-Schmidt) and R is upper triangular (the projection coefficients). Column j of A = Σᵢ Rᵢⱼ · (column i of Q).

Can Gram-Schmidt work on functions?

Yes! With an integral inner product, Gram-Schmidt orthogonalizes function spaces. Starting from {1, x, x², ...} gives orthogonal polynomials (Legendre, etc.).

What's the computational complexity?

For n vectors in ℝᵐ: O(mn²) operations. Each of n vectors requires projections onto up to n previous vectors, each projection costs O(m).

Is the orthonormal basis unique?

No! Different orderings of input vectors give different orthonormal bases. Also, each eₖ can be ±1 times what we compute. But the span is always the same.

What are alternatives to Gram-Schmidt?

Householder reflections and Givens rotations are more stable for numerical QR. They're used in production-quality linear algebra libraries (LAPACK, etc.).