MathIsimple LA-7.3

Gram-Schmidt Orthogonalization

The Gram-Schmidt process is an algorithm that transforms any basis into an orthonormal basis. It's a constructive proof that every inner product space has an orthonormal basis, and forms the foundation for QR decomposition—one of the most important algorithms in numerical linear algebra.

Learning Objectives
  • Understand the Gram-Schmidt orthogonalization algorithm
  • Apply Gram-Schmidt to construct orthonormal bases from any basis
  • Compute step-by-step examples in ℝⁿ
  • Understand the connection to QR decomposition
  • Recognize numerical stability issues with classical Gram-Schmidt
  • Learn the modified Gram-Schmidt algorithm
  • Apply Gram-Schmidt to function spaces
  • Use Gram-Schmidt for polynomial orthogonalization
Prerequisites
  • Orthogonality and orthonormal sets (LA-7.2)
  • Inner product definition (LA-7.1)
  • Linear independence and bases (LA-2.4-2.5)
  • Projections onto vectors

1. The Gram-Schmidt Algorithm

Theorem 7.22: Gram-Schmidt Process

Given linearly independent vectors \{v_1, v_2, \ldots, v_n\}, the Gram-Schmidt process produces orthonormal vectors \{e_1, e_2, \ldots, e_n\} with:

\text{span}\{e_1, \ldots, e_k\} = \text{span}\{v_1, \ldots, v_k\} \quad \text{for each } k
Definition 7.12: Gram-Schmidt Algorithm

Step 1: Normalize the first vector:

e_1 = \frac{v_1}{\|v_1\|}

Step k (for k = 2, 3, ...): Orthogonalize then normalize:

u_k = v_k - \sum_{i=1}^{k-1} \langle v_k, e_i \rangle e_i, \quad e_k = \frac{u_k}{\|u_k\|}
Remark 7.16: Intuition

At step k, we subtract from v_k its projections onto all previous orthonormal vectors. This removes all components in the directions of e_1, \ldots, e_{k-1}, leaving only the "new" direction perpendicular to them.

Proof:

Correctness: We show by induction that \{e_1, \ldots, e_k\} is orthonormal and spans the same space as \{v_1, \ldots, v_k\}.

Base case: e_1 = v_1/\|v_1\| has unit norm and \text{span}\{e_1\} = \text{span}\{v_1\}.

Inductive step: Assume true for k-1. Then:

  • u_k is orthogonal to each e_i for i < k: \langle u_k, e_i \rangle = \langle v_k, e_i \rangle - \langle v_k, e_i \rangle = 0
  • u_k \neq 0 since v_k is not in \text{span}\{v_1, \ldots, v_{k-1}\} (linear independence)
  • e_k = u_k/\|u_k\| is a unit vector orthogonal to e_1, \ldots, e_{k-1}
  • v_k = u_k + \sum_{i<k} \langle v_k, e_i \rangle e_i \in \text{span}\{e_1, \ldots, e_k\}
Corollary 7.6: Existence of Orthonormal Bases

Every finite-dimensional inner product space has an orthonormal basis.

Proof:

Start with any basis \{v_1, \ldots, v_n\} (exists by finite-dimensionality). Apply Gram-Schmidt to obtain orthonormal \{e_1, \ldots, e_n\} spanning the same space.

Remark 7.16a: Geometric Interpretation

The projection \text{proj}_{e_i}(v_k) = \langle v_k, e_i \rangle e_i is the component of v_k in the direction of e_i. Subtracting these removes all "old" directions, leaving only the genuinely new component.

Example 7.27a: Simple 2D Example

Apply Gram-Schmidt to v_1 = (3, 4)^T, v_2 = (1, 0)^T:

Step 1: \|v_1\| = 5, so e_1 = (3/5, 4/5)^T

Step 2: \langle v_2, e_1 \rangle = 3/5

u_2 = (1, 0)^T - \frac{3}{5}\left(\frac{3}{5}, \frac{4}{5}\right)^T = \left(1 - \frac{9}{25}, -\frac{12}{25}\right)^T = \left(\frac{16}{25}, -\frac{12}{25}\right)^T
\|u_2\| = \frac{1}{25}\sqrt{256 + 144} = \frac{20}{25} = \frac{4}{5}

So e_2 = (4/5, -3/5)^T. Verify: \langle e_1, e_2 \rangle = 12/25 - 12/25 = 0
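As a quick numerical check (a NumPy sketch, not part of the original example), the same two steps can be carried out and the result verified:

import numpy as np

v1 = np.array([3.0, 4.0])
v2 = np.array([1.0, 0.0])

e1 = v1 / np.linalg.norm(v1)            # (0.6, 0.8)
u2 = v2 - np.dot(v2, e1) * e1           # subtract the projection onto e1
e2 = u2 / np.linalg.norm(u2)            # (0.8, -0.6)

print(np.dot(e1, e2))                                  # ~0 (orthogonal)
print(np.linalg.norm(e1), np.linalg.norm(e2))          # both 1.0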

Theorem 7.22a: Span Preservation Property

At each step k of Gram-Schmidt:

\text{span}\{e_1, \ldots, e_k\} = \text{span}\{v_1, \ldots, v_k\}

The orthonormal vectors span exactly the same subspace as the original vectors up to that point.

Remark 7.16b: Uniqueness

The orthonormal basis produced by Gram-Schmidt depends on:

  • The order of input vectors (different orderings give different bases)
  • Sign choices (can flip any e_k \to -e_k)

But the span at each step is uniquely determined by the input order.

Definition 7.12a: Projection Operator

The orthogonal projection onto \text{span}\{e_1, \ldots, e_k\} is:

P_k(v) = \sum_{i=1}^{k} \langle v, e_i \rangle e_i

In Gram-Schmidt, u_k = v_k - P_{k-1}(v_k).

2. Step-by-Step Example

Example 7.28: Gram-Schmidt in ℝ³

Apply Gram-Schmidt to v_1 = (1, 1, 0)^T, v_2 = (1, 0, 1)^T, v_3 = (0, 1, 1)^T.

Step 1: \|v_1\| = \sqrt{2}

e_1 = \frac{1}{\sqrt{2}}(1, 1, 0)^T

Step 2: \langle v_2, e_1 \rangle = \frac{1}{\sqrt{2}}(1 + 0 + 0) = \frac{1}{\sqrt{2}}

u_2 = v_2 - \frac{1}{\sqrt{2}} e_1 = (1, 0, 1)^T - \frac{1}{2}(1, 1, 0)^T = \left(\frac{1}{2}, -\frac{1}{2}, 1\right)^T
\|u_2\| = \sqrt{\frac{1}{4} + \frac{1}{4} + 1} = \sqrt{\frac{3}{2}}, \quad e_2 = \frac{1}{\sqrt{6}}(1, -1, 2)^T

Step 3: \langle v_3, e_1 \rangle = \frac{1}{\sqrt{2}}, \quad \langle v_3, e_2 \rangle = \frac{1}{\sqrt{6}}(0 - 1 + 2) = \frac{1}{\sqrt{6}}

u_3 = v_3 - \frac{1}{\sqrt{2}} e_1 - \frac{1}{\sqrt{6}} e_2 = \left(-\frac{2}{3}, \frac{2}{3}, \frac{2}{3}\right)^T
e_3 = \frac{1}{\sqrt{3}}(-1, 1, 1)^T
Remark 7.17: Verification

Always verify: check that \langle e_i, e_j \rangle = \delta_{ij} and that each \|e_i\| = 1.
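For instance, the ℝ³ result above can be checked in a few lines of NumPy (a sketch; Q simply stacks e_1, e_2, e_3 as columns):

import numpy as np

Q = np.column_stack([
    np.array([1, 1, 0]) / np.sqrt(2),    # e1
    np.array([1, -1, 2]) / np.sqrt(6),   # e2
    np.array([-1, 1, 1]) / np.sqrt(3),   # e3
])

# Q^T Q should be the 3x3 identity, up to rounding
print(np.allclose(Q.T @ Q, np.eye(3)))   # True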

Example 7.28a: Another ℝ³ Example

Apply Gram-Schmidt to v_1 = (1, 0, 1)^T, v_2 = (0, 1, 1)^T, v_3 = (1, 1, 0)^T:

Step 1: \|v_1\| = \sqrt{2}

e_1 = \frac{1}{\sqrt{2}}(1, 0, 1)^T

Step 2: \langle v_2, e_1 \rangle = \frac{1}{\sqrt{2}}(0 + 0 + 1) = \frac{1}{\sqrt{2}}

u_2 = (0, 1, 1)^T - \frac{1}{2}(1, 0, 1)^T = \left(-\frac{1}{2}, 1, \frac{1}{2}\right)^T
\|u_2\| = \sqrt{\frac{1}{4} + 1 + \frac{1}{4}} = \sqrt{\frac{3}{2}}
e_2 = \sqrt{\frac{2}{3}}\left(-\frac{1}{2}, 1, \frac{1}{2}\right)^T = \frac{1}{\sqrt{6}}(-1, 2, 1)^T

Step 3: \langle v_3, e_1 \rangle = \frac{1}{\sqrt{2}}, \quad \langle v_3, e_2 \rangle = \frac{1}{\sqrt{6}}, so u_3 = v_3 - \frac{1}{2}(1, 0, 1)^T - \frac{1}{6}(-1, 2, 1)^T = \left(\frac{2}{3}, \frac{2}{3}, -\frac{2}{3}\right)^T and e_3 = \frac{1}{\sqrt{3}}(1, 1, -1)^T.

Example 7.28b: Complex Example

In \mathbb{C}^2 with the standard inner product, orthogonalize v_1 = (1, i)^T, v_2 = (1, 1)^T:

Step 1: \|v_1\|^2 = |1|^2 + |i|^2 = 2, so e_1 = \frac{1}{\sqrt{2}}(1, i)^T

Step 2: \langle v_2, e_1 \rangle = \frac{1}{\sqrt{2}}(1 \cdot \bar{1} + 1 \cdot \bar{i}) = \frac{1}{\sqrt{2}}(1 - i)

u_2 = (1, 1)^T - \frac{1-i}{2}(1, i)^T = \left(1 - \frac{1-i}{2},\; 1 - \frac{i(1-i)}{2}\right)^T = \left(\frac{1+i}{2}, \frac{1-i}{2}\right)^T

Since \|u_2\|^2 = \frac{1}{2} + \frac{1}{2} = 1, the vector is already normalized: e_2 = \frac{1}{2}(1+i, 1-i)^T.

Remark 7.17a: Tips for Hand Computation
  • Keep fractions exact until the final step to avoid rounding errors
  • Verify orthogonality after each step: \langle u_k, e_i \rangle = 0 for i < k
  • Use symmetry when possible (e.g., if input has integer entries)
  • The matrix form (QR) can sometimes be faster to compute
Example 7.28c: Detecting Linear Dependence

Apply Gram-Schmidt to v_1 = (1, 2)^T, v_2 = (2, 4)^T:

Step 1: e_1 = \frac{1}{\sqrt{5}}(1, 2)^T

Step 2: \langle v_2, e_1 \rangle = \frac{1}{\sqrt{5}}(2 + 8) = \frac{10}{\sqrt{5}} = 2\sqrt{5}

u_2 = (2, 4)^T - 2\sqrt{5} \cdot \frac{1}{\sqrt{5}}(1, 2)^T = (2, 4)^T - 2(1, 2)^T = (0, 0)^T

u_2 = 0 indicates that v_2 is linearly dependent on v_1!
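In code, dependence shows up as a (numerically) zero residual; the sketch below uses a small tolerance rather than an exact zero test:

import numpy as np

v1 = np.array([1.0, 2.0])
v2 = np.array([2.0, 4.0])

e1 = v1 / np.linalg.norm(v1)
u2 = v2 - np.dot(v2, e1) * e1

if np.linalg.norm(u2) < 1e-12:          # tolerance instead of an exact zero test
    print("v2 is (numerically) dependent on v1")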

3. QR Decomposition

Theorem 7.23: QR Decomposition

Every matrix A \in M_{m \times n}(\mathbb{R}) with linearly independent columns can be written as:

A = QR

where Q is m \times n with orthonormal columns and R is n \times n upper triangular with positive diagonal entries.

Proof:

Apply Gram-Schmidt to the columns of A. Let Q have columns e_1, \ldots, e_n.

Since v_j \in \text{span}\{e_1, \ldots, e_j\}, we can write:

v_j = \sum_{i=1}^{j} R_{ij} e_i \quad \text{where } R_{ij} = \langle v_j, e_i \rangle

This gives A = QR with R upper triangular.

Example 7.29: QR Decomposition Example

From the previous example with A = [v_1 \mid v_2 \mid v_3]:

Q = \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{6}} & -\frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{6}} & \frac{1}{\sqrt{3}} \\ 0 & \frac{2}{\sqrt{6}} & \frac{1}{\sqrt{3}} \end{pmatrix}, \quad
R = \begin{pmatrix} \sqrt{2} & \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ 0 & \sqrt{\frac{3}{2}} & \frac{1}{\sqrt{6}} \\ 0 & 0 & \frac{2}{\sqrt{3}} \end{pmatrix}
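These factors can be sanity-checked numerically. The sketch below rebuilds Q from Example 7.28 and recovers R as Q^T A; note that a library routine such as np.linalg.qr may return the same factorization only up to column signs:

import numpy as np

A = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])         # columns are v1, v2, v3

Q = np.column_stack([
    np.array([1, 1, 0]) / np.sqrt(2),
    np.array([1, -1, 2]) / np.sqrt(6),
    np.array([-1, 1, 1]) / np.sqrt(3),
])
R = Q.T @ A                              # since Q^T Q = I, R = Q^T A

print(np.allclose(Q @ R, A))             # True: the factorization reproduces A
print(np.allclose(np.triu(R), R))        # True: R is upper triangular (up to rounding)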
Remark 7.18: Applications of QR

QR decomposition is used for:

  • Solving least squares: Ax \approx b \Rightarrow Rx = Q^T b
  • Computing eigenvalues: QR iteration algorithm
  • Orthonormalizing: Finding orthonormal bases
Theorem 7.23a: Uniqueness of QR

If A has linearly independent columns, then A = QR is unique when we require:

  • Q has orthonormal columns
  • R is upper triangular with positive diagonal entries
Proof:

Suppose A = Q_1 R_1 = Q_2 R_2. Then Q_2^T Q_1 = R_2 R_1^{-1}.

The left side is orthogonal, the right side is upper triangular. An orthogonal upper triangular matrix must be diagonal with \pm 1 entries.

With the positive-diagonal constraint, Q_2^T Q_1 = I, so Q_1 = Q_2 and R_1 = R_2.

Example 7.29a: Solving Least Squares via QR

To solve Ax \approx b (minimize \|Ax - b\|):

  1. Compute A = QR
  2. Compute Q^T b
  3. Solve Rx = Q^T b by back substitution
\|Ax - b\|^2 = \|QRx - b\|^2 = \|Rx - Q^T b\|^2 + \|(I - QQ^T)b\|^2

Minimized when Rx = Q^T b.
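A minimal sketch of this recipe using NumPy's built-in QR and SciPy's triangular solver; the 4×2 system here is made up for illustration:

import numpy as np
from scipy.linalg import solve_triangular

A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
b = np.array([1.0, 2.0, 2.0, 4.0])

Q, R = np.linalg.qr(A)                   # reduced QR: Q is 4x2, R is 2x2
x = solve_triangular(R, Q.T @ b)         # back substitution on Rx = Q^T b

print(x)
print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))  # agrees with lstsq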

Remark 7.18a: Computational Cost

For an m \times n matrix:

  • Gram-Schmidt QR: O(mn^2) operations
  • Householder QR: O(mn^2 - n^3/3) operations, more stable
  • Givens QR: Better for sparse matrices
Definition 7.13a: Reduced vs Full QR

For A \in M_{m \times n} with m > n:

  • Reduced QR: Q is m \times n, R is n \times n
  • Full QR: Q is m \times m (orthogonal), R is m \times n

Reduced QR is more memory-efficient; full QR is useful for orthogonal projections.

Example 7.29b: QR for Rank Computation

With column pivoting (see Theorem 7.26), the diagonal of R reveals the rank:

\text{rank}(A) = \text{number of nonzero diagonal entries of } R

In numerical computation, count the entries whose magnitude exceeds a tolerance.

Theorem 7.23b: QR and Column Space

The columns of Q form an orthonormal basis for \text{col}(A):

\text{col}(A) = \text{col}(Q)

4. Numerical Stability

Remark 7.19: Problem with Classical Gram-Schmidt

In floating-point arithmetic, rounding errors accumulate. After many steps, the computed e_i may not be orthogonal! This is especially bad when the input vectors are nearly linearly dependent.

Definition 7.13: Modified Gram-Schmidt (MGS)

Instead of computing all projections at once, update v_k sequentially:

v_k^{(1)} = v_k - \langle v_k, e_1 \rangle e_1
v_k^{(2)} = v_k^{(1)} - \langle v_k^{(1)}, e_2 \rangle e_2
\vdots
u_k = v_k^{(k-1)} = v_k^{(k-2)} - \langle v_k^{(k-2)}, e_{k-1} \rangle e_{k-1}
Remark 7.20: Why MGS is Better

In exact arithmetic, classical and modified Gram-Schmidt give identical results. But with floating-point errors, MGS produces vectors closer to orthogonal because errors don't accumulate as quickly.

Example 7.30: Loss of Orthogonality

Consider nearly dependent vectors. Classical GS might produce:

\langle e_1, e_3 \rangle \approx 10^{-8} \text{ (should be 0)}

Modified GS typically gives \langle e_1, e_3 \rangle \approx 10^{-15} (machine precision).

Theorem 7.24: Error Analysis

Let \epsilon be machine precision and \kappa(A) the condition number. For classical Gram-Schmidt, the loss of orthogonality can grow like:

\|Q^T Q - I\| = O(\epsilon \cdot \kappa(A)^2)

For modified Gram-Schmidt:

\|Q^T Q - I\| = O(\epsilon \cdot \kappa(A))

So MGS degrades much more gracefully on ill-conditioned inputs.

Definition 7.13b: Condition Number

The condition number of A is \kappa(A) = \sigma_{\max}/\sigma_{\min} (the ratio of the largest to smallest singular value).

Large \kappa(A) indicates near-dependence; Gram-Schmidt is unstable for such matrices.

Remark 7.20a: When to Use Each Method
  • Classical GS: For hand computation or when stability doesn't matter
  • Modified GS: When some numerical stability is needed
  • Householder: For production-quality numerical code (LAPACK uses this)
Example 7.30a: Comparing Methods

For the nearly dependent vectors v_1 = (1, 10^{-8}, 0)^T, v_2 = (1, 0, 10^{-8})^T (compare the numerical sketch below):

  • Classical GS: \langle e_1, e_2 \rangle \approx 10^{-8}
  • Modified GS: \langle e_1, e_2 \rangle \approx 10^{-16}
  • Householder: \langle e_1, e_2 \rangle \approx 10^{-16}
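The sketch below contrasts the two variants on a nearly dependent (Läuchli-style) test matrix; the helper names cgs and mgs are ours, and the exact loss of orthogonality you observe depends on the data and the machine:

import numpy as np

def cgs(A):
    """Classical Gram-Schmidt: all projections use the original column."""
    m, n = A.shape
    Q = np.zeros((m, n))
    for k in range(n):
        u = A[:, k] - Q[:, :k] @ (Q[:, :k].T @ A[:, k])
        Q[:, k] = u / np.linalg.norm(u)
    return Q

def mgs(A):
    """Modified Gram-Schmidt: the working vector is updated after each projection."""
    m, n = A.shape
    Q = np.zeros((m, n))
    for k in range(n):
        u = A[:, k].copy()
        for i in range(k):
            u -= np.dot(Q[:, i], u) * Q[:, i]
        Q[:, k] = u / np.linalg.norm(u)
    return Q

eps = 1e-8
A = np.array([[1, 1, 1],
              [eps, 0, 0],
              [0, eps, 0],
              [0, 0, eps]], dtype=float)   # nearly dependent columns

for name, q in [("CGS", cgs(A)), ("MGS", mgs(A))]:
    print(name, np.linalg.norm(q.T @ q - np.eye(3)))   # deviation from orthogonality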
Definition 7.13c: Householder Reflection

A Householder reflector is H = I - 2uu^T where \|u\| = 1.

It reflects across the hyperplane perpendicular to u. Used to zero out entries below the diagonal.

Remark 7.20b: Reorthogonalization

To improve classical GS, apply it twice:

  1. Run classical Gram-Schmidt once
  2. Check orthogonality: if \|Q^T Q - I\| > \text{tol}, reorthogonalize
  3. Repeat until orthogonal enough

This is called "Gram-Schmidt with reorthogonalization" (GS2).

5. Gram-Schmidt on Function Spaces

Example 7.31: Orthogonalizing Polynomials on [-1,1]

Start with \{1, x, x^2\} and the inner product \langle f, g \rangle = \int_{-1}^{1} f(x)g(x)\, dx.

Step 1: \|1\|^2 = \int_{-1}^{1} 1\, dx = 2, so e_0 = 1/\sqrt{2}

Step 2: \langle x, e_0 \rangle = \frac{1}{\sqrt{2}}\int_{-1}^{1} x\, dx = 0 (odd function!)

So u_1 = x, \quad \|x\|^2 = \int_{-1}^{1} x^2\, dx = 2/3, \quad e_1 = \sqrt{3/2}\, x

Step 3: \langle x^2, e_0 \rangle = \frac{1}{\sqrt{2}} \cdot \frac{2}{3}, \quad \langle x^2, e_1 \rangle = 0 (even × odd = odd)

u_2 = x^2 - \frac{1}{3}, \quad e_2 \propto x^2 - \frac{1}{3}

These are (scalar multiples of) Legendre polynomials!
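The same computation can be reproduced symbolically with SymPy (a sketch; the helpers ip and gram_schmidt_poly are illustrative names, not a standard API):

import sympy as sp

x = sp.symbols('x')

def ip(f, g):
    """Inner product <f, g> = integral of f*g over [-1, 1]."""
    return sp.integrate(f * g, (x, -1, 1))

def gram_schmidt_poly(basis):
    es = []
    for v in basis:
        u = v - sum(ip(v, e) * e for e in es)            # subtract projections
        es.append(sp.simplify(u / sp.sqrt(ip(u, u))))    # normalize
    return es

print(gram_schmidt_poly([sp.Integer(1), x, x**2]))
# first two entries: sqrt(2)/2 and sqrt(6)*x/2, scalar multiples of Legendre polynomials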

Remark 7.21: Orthogonal Polynomials

Different intervals and weight functions give different orthogonal polynomial families. Gram-Schmidt provides a systematic way to construct them.

Example 7.31a: Chebyshev Polynomials

On [-1, 1] with weight w(x) = 1/\sqrt{1-x^2}:

\langle f, g \rangle = \int_{-1}^{1} \frac{f(x)g(x)}{\sqrt{1-x^2}}\, dx

Gram-Schmidt on \{1, x, x^2, \ldots\} produces the Chebyshev polynomials T_n(x).

Example 7.31b: Hermite Polynomials

On (-\infty, \infty) with weight w(x) = e^{-x^2}:

\langle f, g \rangle = \int_{-\infty}^{\infty} f(x)g(x)e^{-x^2}\, dx

Gram-Schmidt produces the Hermite polynomials H_n(x), used in quantum mechanics.

Theorem 7.25: Three-Term Recurrence

Orthogonal polynomials from Gram-Schmidt always satisfy a recurrence:

P_{n+1}(x) = (A_n x + B_n) P_n(x) - C_n P_{n-1}(x)

This allows efficient computation without repeated orthogonalization.

Example 7.31c: Fourier Series Connection

On [0, 2\pi] with inner product \langle f, g \rangle = \int_0^{2\pi} f(x)\overline{g(x)}\, dx:

The set \{e^{inx}\}_{n \in \mathbb{Z}} is already orthogonal! Normalizing gives the Fourier basis.

Remark 7.21a: Applications of Orthogonal Polynomials
  • Numerical integration: Gaussian quadrature uses zeros of orthogonal polynomials
  • Approximation: Best polynomial approximation via projection
  • Differential equations: Sturm-Liouville eigenfunctions
  • Physics: Quantum harmonic oscillator (Hermite), hydrogen atom (Laguerre)
Example 7.31d: Least Squares Polynomial Fitting

To fit data \{(x_i, y_i)\} with a degree-n polynomial:

  1. Form the matrix A_{ij} = x_i^{j-1} (Vandermonde)
  2. Compute A = QR
  3. Solve Rc = Q^T y for the coefficients

Or: orthogonalize \{1, x, x^2, \ldots\} at the data points, then project.

6. Common Mistakes

Forgetting to normalize

After computing uₖ, you must normalize: eₖ = uₖ/||uₖ||. The algorithm produces orthogonal, not orthonormal, vectors without this step.

Using wrong projection formula

Project onto the unit vector eᵢ: ⟨vₖ, eᵢ⟩eᵢ. For a non-unit vector w, the projection is (⟨vₖ, w⟩/||w||²)w.

Applying to dependent vectors

Gram-Schmidt requires linearly independent input. Dependent vectors produce uₖ = 0 at some step.

Arithmetic errors in computation

Always verify orthonormality of your result: ⟨eᵢ, eⱼ⟩ = δᵢⱼ.

Using classical GS for ill-conditioned problems

For numerically sensitive applications, use modified Gram-Schmidt or Householder reflections.

Confusing uₖ and eₖ

uₖ is orthogonal but not normalized. eₖ = uₖ/||uₖ|| is the final orthonormal vector.

Wrong sign in projection

It's uₖ = vₖ MINUS projections, not plus. We subtract the old components.

Remark 7.22: Debugging Tips

If Gram-Schmidt gives wrong results:

  • Check input is linearly independent (if uₖ = 0, vectors are dependent)
  • Verify each eₖ has ||eₖ|| = 1
  • Check orthogonality: ⟨eᵢ, eⱼ⟩ = 0 for i ≠ j
  • Confirm span preservation: span{e₁,...,eₖ} = span{v₁,...,vₖ}

7. Key Formulas Summary

Gram-Schmidt

  • e_1 = v_1/\|v_1\|
  • u_k = v_k - \sum_{i<k} \langle v_k, e_i \rangle e_i
  • e_k = u_k/\|u_k\|

QR Decomposition

  • A = QR
  • Q: orthonormal columns
  • R: upper triangular
  • R_{ij} = \langle v_j, e_i \rangle for i \leq j

Algorithm Comparison

  • Classical GS: O(mn²), poor stability; hand computation
  • Modified GS: O(mn²), good stability; general use
  • Householder: O(mn²), excellent stability; production code
  • Givens: O(mn²), excellent stability; sparse matrices

8. Applications

Least Squares Problems

QR decomposition provides a numerically stable method for solving least squares:

\min_x \|Ax - b\| \quad \Rightarrow \quad Rx = Q^T b

Advantages over the normal equations A^T A x = A^T b:

  • Better numerical stability (the condition number is squared in the normal equations)
  • Works for rank-deficient matrices with column pivoting
  • No need to form A^T A explicitly
QR Algorithm for Eigenvalues

The QR iteration computes eigenvalues of symmetric matrices:

  1. Start with A_0 = A
  2. Compute A_k = Q_k R_k (QR decomposition)
  3. Set A_{k+1} = R_k Q_k
  4. Repeat until A_k is diagonal (eigenvalues on the diagonal)

With shifts and deflation, this is the standard method for computing all eigenvalues.
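A bare-bones sketch of the unshifted iteration on a small symmetric matrix (production implementations add shifts and deflation, as noted above):

import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])          # symmetric test matrix

Ak = A.copy()
for _ in range(100):                      # fixed iteration count for simplicity
    Q, R = np.linalg.qr(Ak)
    Ak = R @ Q                            # similarity transform: same eigenvalues

print(np.sort(np.diag(Ak)))               # approximate eigenvalues from the diagonal
print(np.sort(np.linalg.eigvalsh(A)))     # reference values for comparison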

Signal Processing
  • Adaptive filtering: Recursive least squares uses incremental QR updates
  • MIMO systems: QR precoding for wireless communications
  • Beamforming: Orthogonalize antenna array responses
  • Spectral analysis: Orthogonalize basis functions for frequency estimation
Example 7.32: Data Fitting

Fit y = c_0 + c_1 x + c_2 x^2 to the data points (1, 2), (2, 3), (3, 7), (4, 9):

A = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 2 & 4 \\ 1 & 3 & 9 \\ 1 & 4 & 16 \end{pmatrix}, \quad b = \begin{pmatrix} 2 \\ 3 \\ 7 \\ 9 \end{pmatrix}

Compute A = QR, then solve Rc = Q^T b for the coefficients.
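Carrying out those steps numerically (a sketch; the printed coefficients are simply whatever the least-squares fit of these four points produces, not values quoted in the example):

import numpy as np
from scipy.linalg import solve_triangular

xs = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([2.0, 3.0, 7.0, 9.0])

A = np.vander(xs, 3, increasing=True)    # columns: 1, x, x^2 (the matrix above)
Q, R = np.linalg.qr(A)
c = solve_triangular(R, Q.T @ b)         # coefficients c0, c1, c2

print(c)
print(np.allclose(c, np.linalg.lstsq(A, b, rcond=None)[0]))  # matches lstsq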

9. What's Next

Building on Gram-Schmidt

Orthogonal Projections (LA-7.4)

Project vectors onto subspaces using orthonormal bases from Gram-Schmidt.

Spectral Theorem (LA-7.5)

Self-adjoint operators have orthonormal eigenbases (use Gram-Schmidt in eigenspaces).

SVD (LA-8.1)

Singular value decomposition produces two orthonormal bases via related QR-type algorithms.

Numerical Methods

Householder reflections and Givens rotations for production-quality implementations.

Example 7.33: Practice Problem 1

Apply Gram-Schmidt to v_1 = (1, 1, 1)^T, v_2 = (1, 0, 0)^T, v_3 = (0, 1, 0)^T:

Solution outline:

  1. e_1 = \frac{1}{\sqrt{3}}(1, 1, 1)^T
  2. Project v_2 onto e_1, subtract, normalize
  3. Project v_3 onto e_1, e_2, subtract, normalize
Example 7.34: Practice Problem 2

Find the QR decomposition of A = \begin{pmatrix} 1 & 2 \\ 2 & 1 \\ 2 & 2 \end{pmatrix}:

Solution outline:

  1. Apply Gram-Schmidt to the columns of A
  2. Form Q from the orthonormal columns
  3. Compute R = Q^T A
Remark 7.23: Study Tips
  • Practice by hand: Work through several examples to internalize the algorithm
  • Verify results: Always check Q^T Q = I and A = QR
  • Understand the geometry: Visualize projection and subtraction in 2D/3D
  • Know the variants: Classical vs modified, and when to use each

10. Quick Reference

Classical Gram-Schmidt

  1. e₁ = v₁/||v₁||
  2. For k = 2, 3, ..., n:
  3. uₖ = vₖ - Σᵢ₌₁ᵏ⁻¹⟨vₖ, eᵢ⟩eᵢ
  4. eₖ = uₖ/||uₖ||

Modified Gram-Schmidt

  1. For k = 1, 2, ..., n:
  2. uₖ = vₖ
  3. For i = 1, ..., k-1:
  4. uₖ = uₖ - ⟨uₖ, eᵢ⟩eᵢ
  5. eₖ = uₖ/||uₖ||

Key Properties

  • span{e₁,...,eₖ} = span{v₁,...,vₖ}
  • ⟨eᵢ, eⱼ⟩ = δᵢⱼ (orthonormal)
  • Detects linear dependence (uₖ = 0)
  • A = QR (Q orthonormal, R upper triangular)

Applications

  • Least squares: Rx = Qᵀb
  • Eigenvalue computation (QR algorithm)
  • Orthogonal polynomials
  • Signal processing
Remark 7.24: Historical Note

The process is named after Jørgen Pedersen Gram (1850-1916) and Erhard Schmidt (1876-1959). Gram introduced orthogonalization in 1879; Schmidt independently developed it in 1907 for function spaces.

Verification Checklist

After computing orthonormal basis {e₁, ..., eₙ}, verify:

  1. ||eₖ|| = 1 for all k
  2. ⟨eᵢ, eⱼ⟩ = 0 for i ≠ j
  3. span{e₁, ..., eₖ} = span{v₁, ..., vₖ}
  4. If QR: verify A = QR by matrix multiplication

Common Computation Shortcuts

  • If vₖ ⊥ all previous eᵢ, just normalize: eₖ = vₖ/||vₖ||
  • For symmetric input, exploit structure to reduce computation
  • Use R = Qᵀ A instead of computing R entry-by-entry
  • For verification, only check n(n-1)/2 + n values (not n²)

11. Advanced Topics

Theorem 7.26: QR with Column Pivoting

For potentially rank-deficient A, use column pivoting:

AP = QR

where P is a permutation matrix that reorders the columns by decreasing norm at each step.

Remark 7.25: Rank-Revealing QR

Column pivoting helps reveal the numerical rank:

  • The diagonal entries of R appear in decreasing magnitude
  • Small diagonal entry indicates near-dependence
  • Truncate at first small diagonal for low-rank approximation
Definition 7.14: Givens Rotation

A Givens rotation G(i, j, \theta) rotates in the (i, j) plane by angle \theta:

G = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}

Used to zero out individual entries (good for sparse matrices).

Example 7.35: Givens Rotation Example

To zero out a_{21} in A = \begin{pmatrix} 3 & 1 \\ 4 & 2 \end{pmatrix}:

Choose \cos\theta = 3/5, \sin\theta = 4/5 (normalize (3, 4)):

G^T A = \begin{pmatrix} 3/5 & 4/5 \\ -4/5 & 3/5 \end{pmatrix} \begin{pmatrix} 3 & 1 \\ 4 & 2 \end{pmatrix} = \begin{pmatrix} 5 & 11/5 \\ 0 & 2/5 \end{pmatrix}
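The same rotation in NumPy (a sketch; G is assembled directly from the cosine and sine chosen above):

import numpy as np

A = np.array([[3.0, 1.0],
              [4.0, 2.0]])

r = np.hypot(A[0, 0], A[1, 0])          # 5.0
c, s = A[0, 0] / r, A[1, 0] / r         # cos = 3/5, sin = 4/5
G = np.array([[c, -s],
              [s,  c]])

print(G.T @ A)   # [[5. , 2.2], [0. , 0.4]] -- the (2,1) entry has been zeroed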
Definition 7.15: Householder Reflection

A Householder reflector H = I - 2uu^T (with \|u\| = 1) reflects vectors across the hyperplane perpendicular to u.

Used to zero out entire subcolumns in one operation.

Example 7.36: Householder Example

To map x = (3, 4)^T to \|x\| e_1 = (5, 0)^T:

u = \frac{x - \|x\|e_1}{\|x - \|x\|e_1\|} = \frac{(-2, 4)^T}{\sqrt{20}} = \frac{1}{\sqrt{5}}(-1, 2)^T

Then H = I - 2uu^T maps x to (5, 0)^T.
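Numerically, following the formula above (a sketch):

import numpy as np

x = np.array([3.0, 4.0])
e1 = np.array([1.0, 0.0])

v = x - np.linalg.norm(x) * e1          # (-2, 4)
u = v / np.linalg.norm(v)
H = np.eye(2) - 2.0 * np.outer(u, u)    # Householder reflector

print(H @ x)          # [5., 0.]
print(H @ H)          # identity: a reflection applied twice is the identity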

Theorem 7.27: Householder QR

For A \in M_{m \times n}, there exist Householder reflectors H_1, \ldots, H_n such that:

H_n \cdots H_1 A = R \quad \text{(upper triangular)}

Then Q = H_1 \cdots H_n (a product of orthogonal matrices is orthogonal).

Remark 7.26: Comparison: Householder vs Givens
  • Operation: Householder zeros an entire subcolumn; Givens zeros a single entry
  • Best for: Householder suits dense matrices; Givens suits sparse matrices
  • Parallelization: harder for Householder, easier for Givens
  • Stability: excellent for both

12. More Worked Examples

Example 7.37: Complete ℝ⁴ Example

Apply Gram-Schmidt to vectors in \mathbb{R}^4:

v_1 = (1, 0, 0, 0)^T, \quad v_2 = (1, 1, 0, 0)^T, \quad v_3 = (1, 1, 1, 0)^T, \quad v_4 = (1, 1, 1, 1)^T

Step 1: e_1 = v_1 = (1, 0, 0, 0)^T (already unit)

Step 2: \langle v_2, e_1 \rangle = 1

u_2 = (1, 1, 0, 0)^T - 1 \cdot (1, 0, 0, 0)^T = (0, 1, 0, 0)^T

e_2 = (0, 1, 0, 0)^T (already unit)

Step 3: \langle v_3, e_1 \rangle = 1, \quad \langle v_3, e_2 \rangle = 1

u_3 = (1, 1, 1, 0)^T - (1, 0, 0, 0)^T - (0, 1, 0, 0)^T = (0, 0, 1, 0)^T = e_3

Step 4: Similarly, e_4 = (0, 0, 0, 1)^T

Result: Standard basis (this special input already has nice structure).

Example 7.38: Non-Standard Inner Product

In \mathbb{R}^2 with inner product \langle x, y \rangle = 2x_1 y_1 + x_2 y_2, orthogonalize v_1 = (1, 1)^T, v_2 = (1, 0)^T:

Step 1: \|v_1\|^2 = 2(1) + 1 = 3, so e_1 = \frac{1}{\sqrt{3}}(1, 1)^T

Step 2: \langle v_2, e_1 \rangle = \frac{1}{\sqrt{3}}(2 + 0) = \frac{2}{\sqrt{3}}

u_2 = (1, 0)^T - \frac{2}{3}(1, 1)^T = \left(\frac{1}{3}, -\frac{2}{3}\right)^T

Normalizing with the weighted inner product: \|u_2\|^2 = 2 \cdot \frac{1}{9} + \frac{4}{9} = \frac{2}{3}, so e_2 = \sqrt{\frac{3}{2}}\left(\frac{1}{3}, -\frac{2}{3}\right)^T = \frac{1}{\sqrt{6}}(1, -2)^T.

Example 7.39: QR for System Solving

Solve Ax \approx b where A = \begin{pmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{pmatrix}, \quad b = \begin{pmatrix} 1 \\ 2 \\ 4 \end{pmatrix}:

  1. Compute A = QR via Gram-Schmidt
  2. Compute Q^T b = \begin{pmatrix} c_1 \\ c_2 \end{pmatrix}
  3. Solve Rx = Q^T b (upper triangular system)
  4. This gives the least squares solution x
Example 7.40: Extending Orthonormal Set

Given the orthonormal vector e_1 = (1, 0, 0)^T, extend to an orthonormal basis of \mathbb{R}^3:

  1. Choose any v_2 not parallel to e_1, e.g., v_2 = (1, 1, 0)^T
  2. u_2 = v_2 - \langle v_2, e_1 \rangle e_1 = (0, 1, 0)^T
  3. e_2 = (0, 1, 0)^T (already unit)
  4. Choose v_3 = (0, 0, 1)^T, already orthogonal to e_1, e_2
  5. e_3 = (0, 0, 1)^T
Example 7.41: Orthogonalizing Trigonometric Functions

Apply Gram-Schmidt to \{1, \cos x, \cos 2x\} on [0, 2\pi]:

These are already orthogonal! Check:

\int_0^{2\pi} \cos x\, dx = 0, \quad \int_0^{2\pi} \cos 2x\, dx = 0
\int_0^{2\pi} \cos x \cos 2x\, dx = \frac{1}{2}\int_0^{2\pi} (\cos 3x + \cos x)\, dx = 0

Just normalize: e_0 = 1/\sqrt{2\pi}, \quad e_1 = \cos x/\sqrt{\pi}, \quad e_2 = \cos 2x/\sqrt{\pi}.

Remark 7.27: When Input is Already Orthogonal

If the input vectors are already orthogonal, Gram-Schmidt reduces to normalization only:

e_k = \frac{v_k}{\|v_k\|}

All projection terms vanish since \langle v_k, e_i \rangle = 0 for i < k.

13. Implementation Notes

Python: Classical Gram-Schmidt
import numpy as np

def classical_gs(V):
    """Classical Gram-Schmidt on the columns of V; returns Q, R with V = Q R."""
    m, n = V.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for k in range(n):
        q = V[:, k].astype(float)
        for i in range(k):
            # projection coefficient uses the ORIGINAL k-th column
            R[i, k] = np.dot(Q[:, i], V[:, k])
            q -= R[i, k] * Q[:, i]
        R[k, k] = np.linalg.norm(q)
        Q[:, k] = q / R[k, k]
    return Q, R
Python: Modified Gram-Schmidt
import numpy as np

def modified_gs(V):
    """Modified Gram-Schmidt; remaining columns are updated as soon as each e_k is formed."""
    m, n = V.shape
    Q = V.astype(float)          # work on a float copy of the columns
    R = np.zeros((n, n))
    for k in range(n):
        R[k, k] = np.linalg.norm(Q[:, k])
        Q[:, k] /= R[k, k]
        for j in range(k + 1, n):
            R[k, j] = np.dot(Q[:, k], Q[:, j])
            Q[:, j] -= R[k, j] * Q[:, k]
    return Q, R
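A quick usage check, assuming the two functions above are defined in the same session:

import numpy as np

V = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])

Q, R = modified_gs(V)
print(np.allclose(Q @ R, V))             # True: the factorization reproduces V
print(np.allclose(Q.T @ Q, np.eye(3)))   # True: the columns are orthonormal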
Remark 7.28: Key Differences
  • Classical: Compute all projections using the original v_k
  • Modified: Update v_j immediately after computing each projection
  • Memory: Modified can work in-place, overwriting V
  • Parallelism: Classical projections are independent; modified updates are sequential
Numerical Considerations
  • Tolerance for zero: If \|u_k\| < \epsilon, treat v_k as dependent
  • Reorthogonalization: If \|Q^T Q - I\| is large, run a second pass
  • Scaling: Normalize the input columns to similar magnitudes first
  • Pivoting: Process columns in order of decreasing norm for stability
Example 7.42: Detecting Near-Dependence

If during Gram-Schmidt we get \|u_k\| < 10^{-10} (for double precision):

  • The columns are numerically dependent
  • Either skip this column or stop the algorithm
  • The rank is less than the number of columns

14. Connections to Other Topics

Linear Algebra Connections

Eigenvalues (Chapter 6)
  • QR algorithm computes eigenvalues
  • Orthonormalize eigenvectors within eigenspaces
  • Spectral theorem uses orthonormal eigenbases
Orthogonal Projections (LA-7.4)
  • P = QQ^T projects onto col(Q)
  • Best approximation = orthogonal projection
  • Least squares = projection onto col(A)
SVD (LA-8.1)
  • SVD produces two orthonormal bases
  • Can be computed via QR-like iterations
  • Bidiagonalization uses Householder reflections
Determinants (Chapter 5)
  • det(Q) = ±1 for orthogonal Q
  • |det(A)| = |det(R)| (since Q has det ±1)
  • Volume = |det| is preserved by orthogonal maps
Theorem 7.28: Determinant via QR

For square A with A = QR:

\det(A) = \det(Q) \det(R) = \pm \prod_{i=1}^n R_{ii}

Since R is triangular, its determinant is the product of its diagonal entries.
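A numerical sanity check of this identity (a sketch; the test matrix is arbitrary):

import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [4.0, 1.0, 2.0]])

Q, R = np.linalg.qr(A)
print(np.linalg.det(A))                          # reference determinant
print(np.linalg.det(Q) * np.prod(np.diag(R)))    # det(Q) * prod(R_ii) agrees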

Remark 7.29: Why Gram-Schmidt is Fundamental

Gram-Schmidt is more than an algorithm—it's a proof technique:

  • Proves every inner product space has an orthonormal basis
  • Proves QR decomposition exists and is unique (with sign convention)
  • Provides constructive approach to orthogonal projections
  • Foundation for numerical methods in scientific computing
Example 7.43: Connection to Projection

The orthogonal projection onto \text{span}\{e_1, \ldots, e_k\} is:

P_k = \sum_{i=1}^k e_i e_i^T \quad \text{or} \quad P_k = Q_k Q_k^T

where Q_k = [e_1 \mid \cdots \mid e_k]. This projects any vector onto the span.

Theorem 7.29: Projection Formula

For orthonormal \{e_1, \ldots, e_k\}, the projection of v onto their span is:

\text{proj}(v) = \sum_{i=1}^k \langle v, e_i \rangle e_i

This is exactly what Gram-Schmidt computes and subtracts at each step!
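As a small sketch, here is the projection onto the span of the first two orthonormal vectors from Example 7.28; the test vector is chosen arbitrarily:

import numpy as np

e1 = np.array([1, 1, 0]) / np.sqrt(2)
e2 = np.array([1, -1, 2]) / np.sqrt(6)
Qk = np.column_stack([e1, e2])

v = np.array([3.0, 1.0, 2.0])            # arbitrary test vector
p = Qk @ (Qk.T @ v)                      # projection = Qk Qk^T v

print(p)
print(np.allclose(Qk.T @ (v - p), 0))    # the residual is orthogonal to the span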

Remark 7.30: Gram-Schmidt as Projections

Gram-Schmidt can be viewed as repeated projections:

u_k = v_k - \text{proj}_{\text{span}\{e_1, \ldots, e_{k-1}\}}(v_k)

Each step subtracts the projection onto the span of previous basis vectors.

15. Chapter Summary

Key Takeaways

The Algorithm
  • Normalize the first vector
  • For each subsequent vector: subtract projections, then normalize
  • Output: an orthonormal basis spanning the same space
  • Modified GS: update vectors sequentially for stability
QR Decomposition
  • A = QR with Q orthonormal, R upper triangular
  • Q from Gram-Schmidt on the columns of A
  • R entries are the projection coefficients
  • Unique with the positive-diagonal constraint
Applications
  • Least squares: Rx = Qᵀb
  • Eigenvalue computation: QR algorithm
  • Orthogonal polynomials: Legendre, Chebyshev
  • Signal processing and data fitting
Numerical Aspects
  • Classical GS: simple but unstable
  • Modified GS: more stable in practice
  • Householder: production-quality stability
  • Givens: good for sparse matrices
Remark 7.31: Mastery Checklist

You've mastered Gram-Schmidt when you can:

  • ✓ Execute the algorithm correctly by hand
  • ✓ Verify orthonormality of output
  • ✓ Extract QR decomposition from Gram-Schmidt
  • ✓ Explain why modified GS is more stable
  • ✓ Apply to function spaces (polynomials, etc.)
  • ✓ Use QR for least squares problems
  • ✓ Recognize when input is nearly dependent

Pro Tips for Exams

  • Always verify your answer: check ⟨eᵢ, eⱼ⟩ = δᵢⱼ and A = QR
  • Watch for already-orthogonal inputs (only normalize)
  • If uₖ = 0, the input vectors are linearly dependent
  • Remember: subtract projections (not add)
  • For function spaces, use the given inner product (often an integral)
Gram-Schmidt Practice (12 questions)

  1. Gram-Schmidt takes as input: (Easy)
  2. The first step in Gram-Schmidt is: (Easy)
  3. In Gram-Schmidt, u_k is computed by: (Medium)
  4. Gram-Schmidt proves that: (Medium)
  5. If the input has n vectors, the output has: (Easy)
  6. QR decomposition gives A = QR where: (Medium)
  7. Modified Gram-Schmidt is preferred because: (Medium)
  8. Gram-Schmidt on the polynomials \{1, x, x^2\} with the \int_0^1 inner product gives: (Hard)
  9. The projection \text{proj}_e(v) for unit e is: (Easy)
  10. Gram-Schmidt preserves: (Medium)
  11. If v_3 \in \text{span}\{v_1, v_2\}, Gram-Schmidt will: (Medium)
  12. The R matrix in QR contains: (Hard)

Frequently Asked Questions

What does Gram-Schmidt actually do?

It transforms any basis into an orthonormal basis spanning the same space. At each step, it makes the current vector orthogonal to all previous ones, then normalizes it.

Why subtract projections?

The projection of vₖ onto eᵢ is the 'component of vₖ in the eᵢ direction.' Subtracting all these projections removes the parts of vₖ that are in the span of previous vectors, leaving only the 'new' direction.

What if the input vectors are linearly dependent?

You'll get a zero vector at some step (after subtracting projections). This actually detects linear dependence! In practice, skip such vectors or stop the algorithm.

Why is numerical stability important?

Rounding errors accumulate. Classical Gram-Schmidt can produce vectors that are far from orthogonal in floating-point arithmetic. Modified Gram-Schmidt reduces error propagation.

What is the Modified Gram-Schmidt algorithm?

Instead of computing all projections using original vₖ, update vₖ sequentially: subtract proj_{e₁}(vₖ), then subtract proj_{e₂}(updated vₖ), etc. This reduces error accumulation.

How does Gram-Schmidt relate to QR decomposition?

A = QR where Q has orthonormal columns (from Gram-Schmidt) and R is upper triangular (the projection coefficients). Column j of A = Σᵢ Rᵢⱼ · (column i of Q).

Can Gram-Schmidt work on functions?

Yes! With an integral inner product, Gram-Schmidt orthogonalizes function spaces. Starting from {1, x, x², ...} gives orthogonal polynomials (Legendre, etc.).

What's the computational complexity?

For n vectors in ℝᵐ: O(mn²) operations. Each of n vectors requires projections onto up to n previous vectors, each projection costs O(m).

Is the orthonormal basis unique?

No! Different orderings of input vectors give different orthonormal bases. Also, each eₖ can be ±1 times what we compute. But the span is always the same.

What are alternatives to Gram-Schmidt?

Householder reflections and Givens rotations are more stable for numerical QR. They're used in production-quality linear algebra libraries (LAPACK, etc.).