The Gram-Schmidt process is an algorithm that transforms any basis into an orthonormal basis. It's a constructive proof that every inner product space has an orthonormal basis, and forms the foundation for QR decomposition—one of the most important algorithms in numerical linear algebra.
Given linearly independent vectors v₁, ..., vₙ, the Gram-Schmidt process produces orthonormal vectors e₁, ..., eₙ with:
span{e₁, ..., eₖ} = span{v₁, ..., vₖ} for each k = 1, ..., n.
Step 1: Normalize the first vector: e₁ = v₁ / ||v₁||.
Step k (for k = 2, 3, ...): Orthogonalize, then normalize:
uₖ = vₖ − Σᵢ₌₁ᵏ⁻¹ ⟨vₖ, eᵢ⟩ eᵢ,  eₖ = uₖ / ||uₖ||.
At step k, we subtract from vₖ its projections onto all previous orthonormal vectors. This removes all components in the directions of e₁, ..., eₖ₋₁, leaving only the "new" direction perpendicular to them.
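As a concrete illustration, here is a minimal NumPy sketch of these two steps. The function name `gram_schmidt` and the example vectors are our own, chosen only for demonstration; the pseudocode later in this section gives the same algorithm in matrix form.

```python
import numpy as np

def gram_schmidt(V):
    """Classical Gram-Schmidt: orthonormalize the columns of V."""
    m, n = V.shape
    E = np.zeros((m, n))
    for k in range(n):
        u = V[:, k].copy()
        for i in range(k):
            # subtract the projection of v_k onto each previous e_i
            u -= np.dot(V[:, k], E[:, i]) * E[:, i]
        E[:, k] = u / np.linalg.norm(u)   # normalize the "new" direction
    return E

V = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])               # two independent vectors in R^3 (made up)
E = gram_schmidt(V)
print(np.allclose(E.T @ E, np.eye(2)))   # True: columns are orthonormal
```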
Correctness: We show by induction that {e₁, ..., eₖ} is orthonormal and spans the same space as {v₁, ..., vₖ}.
Base case: e₁ = v₁/||v₁|| has unit norm and span{e₁} = span{v₁}.
Inductive step: Assume the claim holds for k − 1. Then for each j < k, ⟨uₖ, eⱼ⟩ = ⟨vₖ, eⱼ⟩ − ⟨vₖ, eⱼ⟩ = 0, so uₖ is orthogonal to e₁, ..., eₖ₋₁. Linear independence guarantees uₖ ≠ 0, so eₖ = uₖ/||uₖ|| is a unit vector, and span{e₁, ..., eₖ} = span{v₁, ..., vₖ}.
Every finite-dimensional inner product space has an orthonormal basis.
Start with any basis v₁, ..., vₙ (one exists by finite-dimensionality). Apply Gram-Schmidt to obtain an orthonormal set e₁, ..., eₙ spanning the same space.
The projection ⟨vₖ, eᵢ⟩eᵢ is the component of vₖ in the direction of eᵢ. Subtracting these removes all "old" directions, leaving only the genuinely new component.
Apply Gram-Schmidt to , :
Step 1: , so
Step 2:
So . Verify: ✓
At each step k of Gram-Schmidt:
span{e₁, ..., eₖ} = span{v₁, ..., vₖ}.
The orthonormal vectors span exactly the same subspace as the original vectors up to that point.
The orthonormal basis produced by Gram-Schmidt depends on: the order of the input vectors, and the choice of sign (each eₖ could be replaced by −eₖ).
But the span at each step is uniquely determined by the input order.
The orthogonal projection of a vector v onto a unit vector e is:
proj_e(v) = ⟨v, e⟩ e.
In Gram-Schmidt, uₖ = vₖ − Σᵢ₌₁ᵏ⁻¹ proj_{eᵢ}(vₖ).
Apply Gram-Schmidt to .
Step 1:
Step 2:
Step 3: ,
Always verify: check that ⟨eᵢ, eⱼ⟩ = 0 for i ≠ j and that each ||eᵢ|| = 1.
Apply Gram-Schmidt to :
Step 1:
Step 2:
Step 3: Compute projections and subtract...
In with standard inner product, orthogonalize :
Step 1: , so
Step 2:
Continue simplifying and normalize...
Apply Gram-Schmidt to :
Step 1:
Step 2:
A result of uₖ = 0 indicates that vₖ is linearly dependent on the previous vectors!
Every m×n matrix A with linearly independent columns can be written as:
A = QR,
where Q is m×n with orthonormal columns and R is n×n upper triangular with positive diagonal entries.
Apply Gram-Schmidt to the columns a₁, ..., aₙ of A. Let Q have columns q₁, ..., qₙ (the resulting orthonormal vectors).
Since span{q₁, ..., qₖ} = span{a₁, ..., aₖ}, we can write:
aₖ = Σᵢ₌₁ᵏ rᵢₖ qᵢ, with rᵢₖ = ⟨qᵢ, aₖ⟩.
This gives A = QR with R = (rᵢₖ) upper triangular.
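As a quick numerical check, here is a sketch with a made-up matrix; note that `np.linalg.qr` builds the factorization with Householder reflections rather than Gram-Schmidt, but it produces the same kind of QR.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [1.0, 0.0],
              [0.0, 1.0]])                  # made-up 3x2 matrix, independent columns

Q, R = np.linalg.qr(A)                      # reduced QR: Q is 3x2, R is 2x2
print(np.allclose(A, Q @ R))                # True: A = QR
print(np.allclose(Q.T @ Q, np.eye(2)))      # True: Q has orthonormal columns
print(np.allclose(R, np.triu(R)))           # True: R is upper triangular
```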
From the previous example with :
QR decomposition is used for: solving least-squares problems, computing eigenvalues (the QR iteration), determining numerical rank, and constructing orthonormal bases for column spaces.
If A has linearly independent columns, then the factorization A = QR is unique when we require: the diagonal entries of R to be positive.
Suppose A = Q₁R₁ = Q₂R₂. Then Q₂ᵀQ₁ = R₂R₁⁻¹.
The left side is orthogonal, the right side is upper triangular. An orthogonal upper triangular matrix must be diagonal with ±1 entries.
With the positive diagonal constraint, this diagonal matrix is I, so Q₁ = Q₂ and R₁ = R₂.
To solve Ax = b in the least-squares sense (minimize ||Ax − b||): substitute A = QR.
The residual is minimized when Rx = Qᵀb, which is solved by back substitution since R is triangular.
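A minimal NumPy sketch of this procedure; the system is a made-up illustration, and `scipy.linalg.solve_triangular` could replace `np.linalg.solve` to exploit the triangular structure of R.

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])        # made-up overdetermined system
b = np.array([1.0, 2.0, 2.0])

Q, R = np.linalg.qr(A)            # reduced QR
x = np.linalg.solve(R, Q.T @ b)   # solve R x = Q^T b

# agrees with the general least-squares solver up to rounding
print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))   # True
```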
For an m×n matrix, Gram-Schmidt costs O(mn²) operations: each of the n vectors is projected onto up to n previous vectors, and each projection costs O(m).
For an m×n matrix A with m > n there are two variants: reduced QR, A = QR with Q of size m×n and R of size n×n; and full QR, with Q an m×m orthogonal matrix and R of size m×n (zero rows below the triangular block).
Reduced QR is more memory-efficient; full QR is useful for orthogonal projections.
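In NumPy the two variants correspond to the `mode` argument of `np.linalg.qr`; a small sketch with an arbitrary matrix:

```python
import numpy as np

A = np.random.default_rng(0).standard_normal((5, 3))   # arbitrary 5x3 matrix

Q_hat, R_hat = np.linalg.qr(A, mode='reduced')     # reduced: Q_hat 5x3, R_hat 3x3
Q_full, R_full = np.linalg.qr(A, mode='complete')  # full:    Q_full 5x5, R_full 5x3

print(Q_hat.shape, R_hat.shape)     # (5, 3) (3, 3)
print(Q_full.shape, R_full.shape)   # (5, 5) (5, 3)
```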
The diagonal of R reveals the rank: rank(A) equals the number of nonzero diagonal entries of R.
In numerical computation, count entries greater than a tolerance.
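A rough numerical version of this check; the tolerance and the example matrix are our own choices:

```python
import numpy as np

def qr_rank_estimate(A, tol=1e-10):
    """Count diagonal entries of R larger than a tolerance."""
    _, R = np.linalg.qr(A)
    return int(np.sum(np.abs(np.diag(R)) > tol))

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [1.0, 0.0, 1.0]])    # third column = first + second, so rank 2
print(qr_rank_estimate(A))         # 2
```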
The columns of Q form an orthonormal basis for the column space of A: col(A) = span{q₁, ..., qₙ}.
In floating-point arithmetic, rounding errors accumulate. After many steps, the computed may not be orthogonal! This is especially bad when input vectors are nearly linearly dependent.
Instead of computing all projections at once from the original vₖ, update the vector sequentially: subtract the projection onto e₁, then subtract the projection of the updated vector onto e₂, and so on.
In exact arithmetic, classical and modified Gram-Schmidt give identical results. But with floating-point errors, MGS produces vectors closer to orthogonal because errors don't accumulate as quickly.
Consider nearly dependent vectors. Classical GS might produce inner products ⟨eᵢ, eⱼ⟩ far from zero.
Modified GS typically gives ⟨eᵢ, eⱼ⟩ ≈ 10⁻¹⁶ (machine precision).
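The following sketch compares the two on a nearly dependent set; the helper names `cgs` and `mgs` and the test data are our own, and the exact numbers will vary:

```python
import numpy as np

def cgs(V):
    """Classical Gram-Schmidt (projections computed from the original column)."""
    Q = np.zeros_like(V)
    for k in range(V.shape[1]):
        u = V[:, k] - Q[:, :k] @ (Q[:, :k].T @ V[:, k])
        Q[:, k] = u / np.linalg.norm(u)
    return Q

def mgs(V):
    """Modified Gram-Schmidt (projections computed from the updated column)."""
    Q = V.copy()
    for k in range(Q.shape[1]):
        Q[:, k] /= np.linalg.norm(Q[:, k])
        for j in range(k + 1, Q.shape[1]):
            Q[:, j] -= np.dot(Q[:, k], Q[:, j]) * Q[:, k]
    return Q

# Nearly dependent columns: small perturbations of the same vector.
rng = np.random.default_rng(1)
v = rng.standard_normal(50)
V = np.column_stack([v + 1e-8 * rng.standard_normal(50) for _ in range(5)])

for name, Q in [("classical", cgs(V)), ("modified", mgs(V))]:
    err = np.linalg.norm(Q.T @ Q - np.eye(5))
    print(f"{name:9s} loss of orthogonality: {err:.2e}")
```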
Let ε be machine precision (about 10⁻¹⁶ in double precision). For classical Gram-Schmidt: the loss of orthogonality ||QᵀQ − I|| can grow like ε · κ(A)².
For modified Gram-Schmidt: ||QᵀQ − I|| stays on the order of ε · κ(A).
But the constant for MGS is much smaller in practice.
The condition number of A is κ(A) = σ_max(A) / σ_min(A) (the ratio of largest to smallest singular values).
Large indicates near-dependence; Gram-Schmidt is unstable for such matrices.
For the nearly dependent vectors , :
A Householder reflector is H = I − 2vvᵀ, where v is a unit vector.
It reflects across the hyperplane perpendicular to . Used to zero out entries below the diagonal.
To improve classical GS, apply the orthogonalization twice: after computing uₖ, subtract its remaining (tiny) projections onto e₁, ..., eₖ₋₁ once more before normalizing.
This is called "Gram-Schmidt with reorthogonalization" (GS2).
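A sketch of this idea (the name `gs2` is ours); each column is orthogonalized against the previous ones twice before normalization:

```python
import numpy as np

def gs2(V):
    """Classical Gram-Schmidt with one pass of reorthogonalization."""
    m, n = V.shape
    Q = np.zeros((m, n))
    for k in range(n):
        u = V[:, k].copy()
        for _ in range(2):                    # orthogonalize, then re-orthogonalize
            u -= Q[:, :k] @ (Q[:, :k].T @ u)
        Q[:, k] = u / np.linalg.norm(u)
    return Q
```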
Start with {1, x, x², ...} and the inner product ⟨f, g⟩ = ∫₋₁¹ f(x) g(x) dx.
Step 1: ||1||² = ∫₋₁¹ 1 dx = 2, so e₁ = 1/√2.
Step 2: ⟨x, 1⟩ = ∫₋₁¹ x dx = 0 (odd function!)
So u₂ = x, ||x||² = ∫₋₁¹ x² dx = 2/3, e₂ = √(3/2) · x.
Step 3: ⟨x², e₁⟩ = √2/3 ≠ 0, ⟨x², e₂⟩ = 0 (even × odd = odd), so u₃ = x² − 1/3.
These are (scalar multiples of) Legendre polynomials!
Different intervals and weight functions give different orthogonal polynomial families. Gram-Schmidt provides a systematic way to construct them.
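A sympy sketch of this construction on [−1, 1]; the helper `inner` and the loop are our own, and the output is scalar multiples of the first few Legendre polynomials:

```python
import sympy as sp

x = sp.symbols('x')

def inner(f, g):
    """<f, g> = integral of f*g over [-1, 1]."""
    return sp.integrate(f * g, (x, -1, 1))

ortho = []
for v in [sp.Integer(1), x, x**2, x**3]:
    u = v - sum(inner(v, e) * e for e in ortho)           # subtract projections
    ortho.append(sp.simplify(u / sp.sqrt(inner(u, u))))   # normalize

for e in ortho:
    print(e)     # sqrt(2)/2, sqrt(6)*x/2, ... (multiples of Legendre polynomials)
```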
On [−1, 1] with weight w(x) = 1/√(1 − x²):
⟨f, g⟩ = ∫₋₁¹ f(x) g(x) / √(1 − x²) dx.
Gram-Schmidt on {1, x, x², ...} produces (scalar multiples of) the Chebyshev polynomials Tₙ(x).
On (−∞, ∞) with weight w(x) = e^(−x²):
⟨f, g⟩ = ∫ f(x) g(x) e^(−x²) dx, taken over all of ℝ.
Gram-Schmidt produces (scalar multiples of) the Hermite polynomials Hₙ(x), used in quantum mechanics.
Orthogonal polynomials from Gram-Schmidt always satisfy a three-term recurrence:
pₖ₊₁(x) = (x − αₖ) pₖ(x) − βₖ pₖ₋₁(x).
This allows efficient computation without repeated orthogonalization.
On [0, 2π] with inner product ⟨f, g⟩ = ∫ f(x) g(x) dx taken over [0, 2π]:
The set {1, cos x, sin x, cos 2x, sin 2x, ...} is already orthogonal! Normalizing gives the Fourier basis.
To fit data with a degree-d polynomial: form the matrix whose row for data point xᵢ contains 1, xᵢ, ..., xᵢᵈ, and solve the resulting least-squares problem, e.g. via QR.
Or: orthogonalize at data points, then project.
After computing uₖ, you must normalize: eₖ = uₖ/||uₖ||. The algorithm produces orthogonal, not orthonormal, vectors without this step.
Project vₖ onto eᵢ (a unit vector): ⟨vₖ, eᵢ⟩eᵢ. The formula ⟨vₖ, u⟩u/||u||² is only needed when u is not a unit vector.
Gram-Schmidt requires linearly independent input. Dependent vectors produce uₖ = 0 at some step.
Always verify orthonormality of your result: ⟨eᵢ, eⱼ⟩ = δᵢⱼ.
For numerically sensitive applications, use modified Gram-Schmidt or Householder reflections.
uₖ is orthogonal but not normalized. eₖ = uₖ/||uₖ|| is the final orthonormal vector.
It's uₖ = vₖ MINUS projections, not plus. We subtract the old components.
If Gram-Schmidt gives wrong results, recheck the points above: normalization, the sign of the subtraction, linear independence of the input, and the final orthonormality check.
| Method | Complexity | Stability | Use Case |
|---|---|---|---|
| Classical GS | O(mn²) | Poor | Hand computation |
| Modified GS | O(mn²) | Good | General use |
| Householder | O(mn²) | Excellent | Production code |
| Givens | O(mn²) | Excellent | Sparse matrices |
QR decomposition provides a numerically stable method for solving least squares:
Advantages over the normal equations AᵀAx = Aᵀb: forming AᵀA squares the condition number (κ(AᵀA) = κ(A)²), while QR works with A directly.
The QR iteration computes eigenvalues of symmetric matrices: set A₀ = A, factor Aₖ = QₖRₖ, and form Aₖ₊₁ = RₖQₖ; the iterates converge to a diagonal matrix whose entries are the eigenvalues.
With shifts and deflation, this is the standard method for computing all eigenvalues.
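A bare-bones sketch of the unshifted iteration on a made-up symmetric matrix; practical codes add the shifts and deflation mentioned above:

```python
import numpy as np

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])    # made-up symmetric matrix

Ak = A.copy()
for _ in range(200):               # A_{k+1} = R_k Q_k where A_k = Q_k R_k
    Q, R = np.linalg.qr(Ak)
    Ak = R @ Q

print(np.sort(np.diag(Ak)))              # approximate eigenvalues
print(np.sort(np.linalg.eigvalsh(A)))    # reference eigenvalues
```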
Fit to data points (1, 2), (2, 3), (3, 7), (4, 9):
Compute the QR factorization of the design matrix, then solve Rc = Qᵀy for the coefficients.
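For instance, assuming a straight-line model y ≈ c₀ + c₁x (the model is our assumption; the text does not specify it), the fit looks like this:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 3.0, 7.0, 9.0])

A = np.column_stack([np.ones_like(x), x])   # design matrix for y = c0 + c1*x
Q, R = np.linalg.qr(A)
c = np.linalg.solve(R, Q.T @ y)             # solve R c = Q^T y

print(c)    # approximately [-1.0, 2.5]
```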
Project vectors onto subspaces using orthonormal bases from Gram-Schmidt.
Self-adjoint operators have orthonormal eigenbases (use Gram-Schmidt in eigenspaces).
Singular value decomposition produces two orthonormal bases via related QR-type algorithms.
Householder reflections and Givens rotations for production-quality implementations.
Apply Gram-Schmidt to , , :
Solution outline:
Find the QR decomposition of :
Solution outline:
The process is named after Jørgen Pedersen Gram (1850-1916) and Erhard Schmidt (1876-1959). Gram introduced orthogonalization in 1879; Schmidt independently developed it in 1907 for function spaces.
After computing an orthonormal basis {e₁, ..., eₙ}, verify: ⟨eᵢ, eⱼ⟩ = δᵢⱼ for all i, j, and that the span matches that of the original vectors.
For a potentially rank-deficient A, use column pivoting:
AP = QR,
where P is a permutation matrix that reorders columns by decreasing norm at each step.
Column pivoting helps reveal the numerical rank: the diagonal entries of R appear in decreasing magnitude, so the rank is the number of entries above a tolerance.
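SciPy exposes pivoted QR through the `pivoting` flag of `scipy.linalg.qr`; a sketch on a rank-deficient matrix of our own choosing:

```python
import numpy as np
from scipy.linalg import qr

A = np.array([[1.0, 1.0, 2.0],
              [1.0, 2.0, 3.0],
              [1.0, 3.0, 4.0]])     # third column = first + second, so rank 2

Q, R, piv = qr(A, pivoting=True)    # factors A[:, piv] = Q R
diag = np.abs(np.diag(R))
print(diag)                          # decreasing magnitudes; last one near 0
print(int(np.sum(diag > 1e-10)))     # numerical rank: 2
```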
A Givens rotation G(i, j, θ) rotates in the (i, j) coordinate plane by angle θ: it equals the identity except in rows and columns i and j, where it acts as the 2×2 rotation with entries c = cos θ and s = sin θ.
Used to zero out individual entries (good for sparse matrices).
To zero out the second entry of the vector (3, 4):
Choose c = 3/5, s = 4/5 (normalize (3, 4), which has length 5): the rotation sends (3, 4) to (5, 0).
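A small numerical sketch of this rotation (names chosen for illustration):

```python
import numpy as np

a, b = 3.0, 4.0
r = np.hypot(a, b)        # 5.0
c, s = a / r, b / r       # cos and sin of the rotation angle

G = np.array([[ c, s],
              [-s, c]])   # 2x2 Givens rotation
print(G @ np.array([a, b]))    # [5. 0.]: the second entry is zeroed
```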
A Householder reflector H = I − 2vvᵀ (with ||v|| = 1) reflects vectors across the hyperplane perpendicular to v.
Used to zero out entire subcolumns in one operation.
To map x to ||x|| e₁: take v = x − ||x|| e₁.
Then H = I − 2vvᵀ/⟨v, v⟩ maps x to ||x|| e₁.
For an m×n matrix A, there exist Householder reflectors H₁, ..., Hₙ such that:
Hₙ ⋯ H₁ A = R is upper triangular.
Then Q = H₁ ⋯ Hₙ and A = QR (a product of orthogonal matrices is orthogonal).
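A compact sketch of Householder QR; the helper `householder_qr` is ours, and production code treats sign choices and zero columns more carefully:

```python
import numpy as np

def householder_qr(A):
    """QR factorization by Householder reflectors (basic version)."""
    A = A.astype(float)
    m, n = A.shape
    Q = np.eye(m)
    R = A.copy()
    for k in range(n):
        x = R[k:, k]
        v = x.copy()
        v[0] += np.sign(x[0]) * np.linalg.norm(x)   # sign choice avoids cancellation
        v /= np.linalg.norm(v)
        H = np.eye(m)
        H[k:, k:] -= 2.0 * np.outer(v, v)           # reflector acting on rows k..m
        R = H @ R
        Q = Q @ H
    return Q, R

A = np.random.default_rng(2).standard_normal((4, 3))
Q, R = householder_qr(A)
print(np.allclose(A, Q @ R), np.allclose(Q.T @ Q, np.eye(4)))   # True True
```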
| Aspect | Householder | Givens |
|---|---|---|
| Operation | Zero entire subcolumn | Zero single entry |
| Best for | Dense matrices | Sparse matrices |
| Parallelization | Harder | Easier |
| Stability | Excellent | Excellent |
Apply Gram-Schmidt to vectors in :
Step 1: (already unit)
Step 2:
(already unit)
Step 3: ,
Step 4: Similarly,
Result: Standard basis (this special input already has nice structure).
In with inner product , orthogonalize , :
Step 1: , so
Step 2:
Normalize with the weighted inner product to get .
Solve where , :
Given orthonormal , extend to orthonormal basis of :
Apply Gram-Schmidt to on :
These are already orthogonal! Check:
Just normalize: , , .
If the input vectors are already orthogonal, Gram-Schmidt reduces to normalization only:
All projection terms vanish since ⟨vⱼ, vᵢ⟩ = 0 for i ≠ j.
    function classical_gs(V):
        n = number of columns of V
        Q = empty matrix
        R = zeros(n, n)
        for k = 1 to n:
            q = V[:, k]                      // k-th column
            for i = 1 to k-1:
                R[i, k] = <Q[:, i], V[:, k]>
                q = q - R[i, k] * Q[:, i]
            R[k, k] = ||q||
            Q[:, k] = q / R[k, k]
        return Q, R

    function modified_gs(V):
        n = number of columns of V
        Q = copy of V
        R = zeros(n, n)
        for k = 1 to n:
            R[k, k] = ||Q[:, k]||
            Q[:, k] = Q[:, k] / R[k, k]
            for j = k+1 to n:
                R[k, j] = <Q[:, k], Q[:, j]>
                Q[:, j] = Q[:, j] - R[k, j] * Q[:, k]
        return Q, R

If during Gram-Schmidt we get ||uₖ|| below a small tolerance (on the order of machine precision for double precision), treat vₖ as linearly dependent on the previous vectors: skip it or stop the algorithm.
For square A with A = QR:
|det(A)| = |det(Q)| · |det(R)| = ∏ₖ |rₖₖ|.
Since R is triangular, its determinant is the product of its diagonal entries, and |det(Q)| = 1.
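A one-line numerical check on an arbitrary matrix:

```python
import numpy as np

A = np.random.default_rng(3).standard_normal((4, 4))
_, R = np.linalg.qr(A)
print(np.isclose(abs(np.linalg.det(A)), np.prod(np.abs(np.diag(R)))))   # True
```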
Gram-Schmidt is more than an algorithm; it is a proof technique: it shows constructively that every finite-dimensional inner product space has an orthonormal basis, and that any orthonormal set extends to an orthonormal basis.
The orthogonal projection onto span{e₁, ..., eₖ} is:
P = QQᵀ,
where Q = [e₁ ⋯ eₖ]. This projects any vector onto the span.
For orthonormal e₁, ..., eₖ, the projection of v onto their span is:
Σᵢ₌₁ᵏ ⟨v, eᵢ⟩ eᵢ.
This is exactly what Gram-Schmidt computes and subtracts at each step!
Gram-Schmidt can be viewed as repeated projections:
uₖ = (I − Pₖ₋₁) vₖ, where Pₖ₋₁ is the orthogonal projection onto span{e₁, ..., eₖ₋₁}.
Each step subtracts the projection onto the span of previous basis vectors.
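A quick numerical illustration of the projector P = QQᵀ, with vectors chosen arbitrarily:

```python
import numpy as np

V = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])      # two arbitrary vectors spanning a plane in R^3
Q, _ = np.linalg.qr(V)          # orthonormal basis for the span
P = Q @ Q.T                     # orthogonal projection onto the span

v = np.array([1.0, 2.0, 3.0])
print(P @ v)          # component of v inside the span
print(P @ (P @ v))    # same vector: P is idempotent (P^2 = P)
```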
You've mastered Gram-Schmidt when you can:
It transforms any basis into an orthonormal basis spanning the same space. At each step, it makes the current vector orthogonal to all previous ones, then normalizes it.
The projection of vₖ onto eᵢ is the 'component of vₖ in the eᵢ direction.' Subtracting all these projections removes the parts of vₖ that are in the span of previous vectors, leaving only the 'new' direction.
You'll get a zero vector at some step (after subtracting projections). This actually detects linear dependence! In practice, skip such vectors or stop the algorithm.
Rounding errors accumulate. Classical Gram-Schmidt can produce vectors that are far from orthogonal in floating-point arithmetic. Modified Gram-Schmidt reduces error propagation.
Instead of computing all projections using original vₖ, update vₖ sequentially: subtract proj_{e₁}(vₖ), then subtract proj_{e₂}(updated vₖ), etc. This reduces error accumulation.
A = QR where Q has orthonormal columns (from Gram-Schmidt) and R is upper triangular (the projection coefficients). Column j of A = Σᵢ Rᵢⱼ · (column i of Q).
Yes! With an integral inner product, Gram-Schmidt orthogonalizes function spaces. Starting from {1, x, x², ...} gives orthogonal polynomials (Legendre, etc.).
For n vectors in ℝᵐ: O(mn²) operations. Each of n vectors requires projections onto up to n previous vectors, each projection costs O(m).
No! Different orderings of input vectors give different orthonormal bases. Also, each eₖ can be ±1 times what we compute. But the span is always the same.
Householder reflections and Givens rotations are more stable for numerical QR. They're used in production-quality linear algebra libraries (LAPACK, etc.).