Linear algebra is everywhere in modern technology. From the neural networks powering AI to the graphics in video games, from Google's PageRank to quantum computers—the concepts you've learned have profound real-world applications. This capstone module showcases how theory meets practice.
A neural network layer computes
y = σ(Wx + b),
where W is the weight matrix, x is the input, b is the bias, and σ is the activation function.
For a 2-layer network with input x:
ŷ = σ₂(W₂ σ₁(W₁x + b₁) + b₂)
Training adjusts the weights W and biases b using gradient descent, with gradients computed via the chain rule on matrix derivatives.
The core of GPT, BERT, and modern NLP—all matrix multiplications!
Gradient computation uses chain rule on matrices:
For y = Wx: ∂L/∂W = (∂L/∂y)xᵀ, a rank-1 outer product update!
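As a concrete check of the rank-1 gradient, here is a minimal NumPy sketch (not the course's code); the layer sizes and the loss L = ½||y||² are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 4))      # weight matrix (output_dim x input_dim)
x = rng.standard_normal((4, 1))      # input column vector
y = W @ x                            # linear layer output

# Assume the loss is L = 0.5 * ||y||^2, so dL/dy = y.
dL_dy = y

# Chain rule: dL/dW = (dL/dy) x^T, a rank-1 outer product.
dL_dW = dL_dy @ x.T
print(dL_dW.shape)                    # (3, 4), same shape as W
print(np.linalg.matrix_rank(dL_dW))   # 1
```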
Principal Component Analysis finds directions of maximum variance: these are the top eigenvectors of the data's covariance matrix.
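A minimal PCA sketch in NumPy, assuming a synthetic 2-D dataset: center the data, form the covariance matrix, and take its top eigenvector as the first principal direction.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 2)) @ np.array([[3.0, 0.0],
                                              [1.0, 0.5]])   # correlated 2-D data

Xc = X - X.mean(axis=0)                   # center (mean-subtract) the data
C = Xc.T @ Xc / (len(Xc) - 1)             # sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)      # symmetric matrix -> eigh (ascending eigenvalues)

order = np.argsort(eigvals)[::-1]         # sort directions by variance, largest first
components = eigvecs[:, order]
scores = Xc @ components[:, :1]           # project onto the top principal component
print(eigvals[order])                     # variance explained by each direction
```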
Matrix factorization for recommendations: approximate the ratings matrix as R ≈ UVᵀ, where rows of U are user factors and rows of V are item factors.
Weight update rule: W ← W − η ∇_W L, where η is the learning rate.
The gradient ∇_W L is computed via backpropagation through matrix operations.
Every vertex goes through the transformation pipeline v′ = P · V · M · v, multiplying by the model (M), view (V), and projection (P) matrices.
The 2D rotation matrix R(θ) = [cos θ  −sin θ; sin θ  cos θ] rotates points counterclockwise by angle θ. It is orthogonal with det = 1.
Rotation about the z-axis: R_z(θ) = [cos θ  −sin θ  0; sin θ  cos θ  0; 0  0  1].
Any 3D rotation = composition of rotations about axes (Euler angles).
To include translation as matrix multiplication, use homogeneous coordinates: a point (x, y, z) becomes (x, y, z, 1), and translation by t is the 4×4 matrix [I  t; 0ᵀ  1].
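A short NumPy sketch of homogeneous transforms, assuming an illustrative rotation of 90° about z followed by a translation; the point and parameters are made up.

```python
import numpy as np

def rotation_z(theta):
    """4x4 homogeneous rotation about the z-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0, 0],
                     [s,  c, 0, 0],
                     [0,  0, 1, 0],
                     [0,  0, 0, 1]])

def translation(tx, ty, tz):
    """4x4 homogeneous translation."""
    T = np.eye(4)
    T[:3, 3] = [tx, ty, tz]
    return T

# Rotate 90 degrees about z, then translate by (1, 0, 0), as a single matrix.
M = translation(1, 0, 0) @ rotation_z(np.pi / 2)
p = np.array([1, 0, 0, 1])      # point (1, 0, 0) in homogeneous coordinates
print(M @ p)                     # approximately [1, 1, 0, 1]
```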
The 3D-to-2D perspective projection matrix maps the viewing frustum into clip space; its entries are determined by the near plane n and far plane f (and the field of view). The subsequent perspective divide creates realistic depth perception.
Graphics processing units (GPUs) are optimized for massively parallel matrix and vector operations, which is exactly what rendering requires.
Model the web as a graph with transition matrix P. The PageRank vector r satisfies Pr = r.
This is an eigenvector problem! Solved by power iteration.
For a graph with n nodes, P is n×n with P_ij = 1/(out-degree of j) if page j links to page i, and 0 otherwise.
A Markov chain has transition matrix P where P_ij = P(next state = j | current state = i); each row sums to 1.
The stationary distribution π satisfies πP = π (left eigenvector with eigenvalue 1).
Random walk transition matrix: P_ij = 1/deg(i) if there is an edge from i to j, and 0 otherwise.
PageRank is stationary distribution of this random walk (with teleportation).
For a positive stochastic matrix P, the Perron–Frobenius theorem guarantees that 1 is the dominant eigenvalue, the stationary distribution is unique and strictly positive, and power iteration converges to it.
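A minimal power-iteration sketch in NumPy, assuming a made-up 4-page link structure (column-stochastic, matching the convention above) and the usual α = 0.85 teleportation.

```python
import numpy as np

# Column-stochastic link matrix for a tiny, invented 4-page web (each column sums to 1).
P = np.array([[0.0, 0.0, 1.0, 0.5],
              [1/3, 0.0, 0.0, 0.0],
              [1/3, 0.5, 0.0, 0.5],
              [1/3, 0.5, 0.0, 0.0]])

alpha, n = 0.85, P.shape[0]
G = alpha * P + (1 - alpha) / n * np.ones((n, n))   # Google matrix with teleportation

r = np.ones(n) / n
for _ in range(100):           # power iteration converges to the dominant eigenvector
    r = G @ r
    r /= r.sum()               # keep r a probability distribution
print(r)                       # PageRank scores of the 4 pages
```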
Graph Laplacian (L = D − A) eigenvalues reveal cluster structure: the multiplicity of eigenvalue 0 equals the number of connected components, and the eigenvectors for the smallest nonzero eigenvalues drive spectral clustering.
A qubit is a unit vector in ℂ²: |ψ⟩ = α|0⟩ + β|1⟩ with |α|² + |β|² = 1.
Quantum gates are unitary matrices: Hadamard, Pauli, CNOT, etc.
The Hadamard gate H = (1/√2)[1  1; 1  −1] creates superposition: H|0⟩ = (|0⟩ + |1⟩)/√2.
The Pauli matrices are fundamental 2×2 unitary gates: X = [0 1; 1 0], Y = [0 −i; i 0], Z = [1 0; 0 −1].
X = NOT gate, Z = phase flip, Y = both.
Two qubits form a product state if |ψ⟩ = |a⟩ ⊗ |b⟩.
An entangled state cannot be written as a product. Example: the Bell state (|00⟩ + |11⟩)/√2.
Key quantum algorithms (Shor's factoring, Grover's search, the quantum Fourier transform) are built from sequences of such unitary operations.
Controlled-NOT acts on 2 qubits (4×4 matrix): CNOT = [1 0 0 0; 0 1 0 0; 0 0 0 1; 0 0 1 0].
If control qubit is |1⟩, flip target qubit. Creates entanglement!
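A small NumPy sketch that builds the Bell state from |00⟩ with a Hadamard and a CNOT; the gates and basis ordering are the standard ones stated above.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)     # Hadamard gate
I2 = np.eye(2)
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]])

ket00 = np.array([1, 0, 0, 0])                   # |00> in the basis |00>,|01>,|10>,|11>

# H on the first qubit, then CNOT: yields the Bell state (|00> + |11>)/sqrt(2).
bell = CNOT @ np.kron(H, I2) @ ket00
print(bell)                                       # [0.707, 0, 0, 0.707]
print(np.linalg.norm(bell))                       # 1.0: unitary gates preserve the norm
```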
Fourier transform is a linear map. Filtering is matrix multiplication. Audio, images, video—all processed with linear algebra.
Least squares solves overdetermined position equations from satellite signals. Error correction uses linear algebra.
State-space models: ẋ = Ax + Bu. Stability from eigenvalues. Kalman filter for optimal estimation—all linear algebra.
Input-output models, portfolio optimization, risk analysis—matrices model economic relationships and optimize allocations.
A grayscale image is an m×n matrix of pixel values, each entry an intensity (e.g., 0–255).
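A minimal sketch of SVD-based low-rank compression, assuming a synthetic matrix as a stand-in for real pixel data; the sizes and rank k are illustrative.

```python
import numpy as np

# Synthetic stand-in for an image: a 256x256 matrix that is approximately low rank.
rng = np.random.default_rng(2)
A = rng.standard_normal((256, 40)) @ rng.standard_normal((40, 256))

U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 20
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]       # best rank-k approximation of A
storage_full = A.size                              # values stored without compression
storage_k = k * (A.shape[0] + A.shape[1] + 1)      # k left vectors, k right vectors, k singular values
rel_error = np.linalg.norm(A - A_k) / np.linalg.norm(A)
print(storage_full, storage_k, rel_error)
```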
The Kalman filter performs optimal state estimation for linear dynamical systems: it uses matrix operations to combine model predictions with noisy observations optimally.
The least squares solution to Xβ = y: β̂ = (XᵀX)⁻¹Xᵀy.
Foundation of statistics, machine learning, and data science.
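A short NumPy sketch of least squares on synthetic data (the design matrix, true coefficients, and noise level are assumptions), using the SVD-based solver recommended later in this module rather than the explicit normal-equations inverse.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((50, 3))                  # overdetermined: 50 equations, 3 unknowns
beta_true = np.array([2.0, -1.0, 0.5])
y = X @ beta_true + 0.1 * rng.standard_normal(50) # noisy observations

# Preferred in practice: SVD-based solver (numerically stable).
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Normal equations (X^T X) beta = X^T y: same answer here, but ill-conditioned in general.
beta_ne = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat, beta_ne)
```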
Markowitz mean-variance optimization: minimize wᵀΣw subject to μᵀw = r and wᵀ1 = 1,
where Σ = covariance matrix, μ = expected returns, r = target return.
2D convolution with kernel K: (I ∗ K)[i, j] = Σ_m Σ_n I[i−m, j−n] K[m, n].
This is a linear operation! Can be represented as matrix multiplication.
Sobel operator for horizontal gradients: G_x = [−1 0 1; −2 0 2; −1 0 1].
Convolve with the image to detect vertical edges (horizontal intensity gradients).
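A minimal sketch of the Sobel idea on a toy image (the image and loop-based implementation are illustrative; production code would use an optimized library routine). Note the sketch uses the cross-correlation convention common in CNN libraries, which differs from convolution only by a kernel flip.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid-mode 2-D cross-correlation (convolution with an unflipped kernel)."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]])

# Toy image: dark left half, bright right half, i.e. a single vertical edge.
img = np.zeros((6, 6))
img[:, 3:] = 1.0
print(conv2d(img, sobel_x))   # strong response in the columns straddling the edge
```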
Convolutional Neural Networks use convolution layers, pooling, and fully connected (matrix-multiplication) layers.
All linear algebra under the hood!
The DFT is a linear transformation: X_k = Σ_{n=0}^{N−1} x_n e^(−2πikn/N).
Matrix form: X = Fx where F is the DFT matrix (unitary up to scaling).
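A short NumPy sketch that builds the DFT matrix explicitly and checks it against the FFT; the signal length N = 8 and the random input are arbitrary choices.

```python
import numpy as np

N = 8
n = np.arange(N)
F = np.exp(-2j * np.pi * np.outer(n, n) / N)      # DFT matrix: F[k, m] = e^{-2πi k m / N}

x = np.random.default_rng(4).standard_normal(N)
X_matrix = F @ x                                   # DFT as a matrix-vector product
X_fft = np.fft.fft(x)                              # fast O(N log N) algorithm, same result
print(np.allclose(X_matrix, X_fft))                # True

# F / sqrt(N) is unitary: its conjugate transpose is its inverse.
Fu = F / np.sqrt(N)
print(np.allclose(Fu.conj().T @ Fu, np.eye(N)))    # True
```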
A linear time-invariant system: ẋ = Ax + Bu,  y = Cx + Du.
x = state, u = input, y = output. A, B, C, D are system matrices.
The system ẋ = Ax is stable iff all eigenvalues of A have negative real part.
For discrete systems x_{k+1} = Ax_k: stable iff |λᵢ| < 1 for all eigenvalues.
For mẍ + cẋ + kx = u, the state-space form (with state vector [x, ẋ]) has A = [0  1; −k/m  −c/m] and B = [0; 1/m].
Controllability requires the matrix [B  AB  ⋯  Aⁿ⁻¹B] to have full rank n; observability requires the same of [C; CA; ⋯; CAⁿ⁻¹]. These matrix rank conditions determine whether a system can be controlled/observed.
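A minimal NumPy sketch checking stability and controllability for the mass-spring-damper system above; the values of m, c, k are illustrative assumptions.

```python
import numpy as np

m, c, k = 1.0, 0.5, 2.0                      # illustrative mass, damping, stiffness
A = np.array([[0.0, 1.0],
              [-k/m, -c/m]])
B = np.array([[0.0],
              [1.0/m]])

# Stability: all eigenvalues of A must have negative real part.
eigvals = np.linalg.eigvals(A)
print(eigvals, np.all(eigvals.real < 0))     # damped oscillator -> stable

# Controllability: [B, AB] must have full rank n = 2.
ctrb = np.hstack([B, A @ B])
print(np.linalg.matrix_rank(ctrb))           # 2 -> controllable
```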
Encrypt by matrix multiplication mod 26 (the Hill cipher): c = Km (mod 26).
K = key matrix (invertible mod 26), m = message vector.
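A minimal Hill-cipher sketch; the key matrix, its precomputed inverse mod 26, and the message "HELP" are all illustrative choices.

```python
import numpy as np

K = np.array([[3, 3],
              [2, 5]])                        # key matrix, invertible mod 26 (det = 9)
K_inv = np.array([[15, 17],
                  [20, 9]])                   # inverse of K modulo 26
assert np.array_equal((K @ K_inv) % 26, np.eye(2, dtype=int))

msg = "HELP"
m = np.array([ord(ch) - ord('A') for ch in msg]).reshape(2, 2).T  # letter pairs as columns

cipher = (K @ m) % 26                         # encrypt: c = K m (mod 26)
plain = (K_inv @ cipher) % 26                 # decrypt: m = K^{-1} c (mod 26)
print(''.join(chr(v + ord('A')) for v in cipher.T.flatten()))     # ciphertext
print(''.join(chr(v + ord('A')) for v in plain.T.flatten()))      # "HELP" recovered
```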
Hamming codes use a parity-check matrix H; for the (7,4) code, H is 3×7 with columns equal to the binary representations of 1 through 7.
Valid codewords c are in null space of H over GF(2).
Beyond PageRank, Google's search, advertising, and recommendation systems rely heavily on large-scale matrix computations.
Eigenfaces method: run PCA on a collection of face images; the top eigenvectors of the pixel covariance matrix ("eigenfaces") form a basis, and each face is encoded by its coordinates in that basis.
Schrödinger equation in matrix form: iℏ d|ψ⟩/dt = H|ψ⟩.
H = Hamiltonian (Hermitian matrix). Observables are Hermitian operators.
Special relativity coordinate change (Lorentz boost along x): [ct′; x′] = [γ  −γβ; −γβ  γ] [ct; x],
where γ = 1/√(1-β²), β = v/c.
Physics uses tensors everywhere: inertia and stress tensors in mechanics, the metric tensor and Lorentz transformations in relativity.
Build on linear algebra with follow-up subjects such as numerical linear algebra, functional analysis, optimization, and multilinear algebra.
You've completed a comprehensive journey through linear algebra—from abstract vector spaces to real-world applications. The concepts and techniques you've learned form the mathematical backbone of modern science, engineering, and technology. Keep practicing, and you'll find linear algebra appearing everywhere!
Cluster assignment (as in k-means) is a matrix computation: all point-to-centroid distances ||xᵢ − μⱼ||² can be formed at once with matrix operations, and each point is assigned to its nearest centroid.
Find the projection w maximizing class separation: J(w) = (wᵀS_B w) / (wᵀS_W w).
S_B = between-class scatter, S_W = within-class scatter. Solution: generalized eigenproblem.
SVM finds the separating hyperplane via quadratic optimization: minimize ½||w||² subject to yᵢ(wᵀxᵢ + b) ≥ 1 for all i.
Kernel methods: K(x,y) = φ(x)ᵀφ(y) without computing φ explicitly.
Kernels are inner products in feature space: for example, the polynomial kernel K(x, y) = (xᵀy + c)^d and the Gaussian (RBF) kernel K(x, y) = exp(−||x − y||²/(2σ²)).
Robot arm transformations use homogeneous matrices: T = [R  p; 0ᵀ  1].
R = rotation (3×3), p = position (3×1). Chain: T_total = T₁ T₂ ... Tₙ
The Jacobian relates joint velocities to end-effector velocity: ẋ = J(θ)θ̇.
Inverse kinematics uses J⁺ (pseudoinverse): θ̇ = J⁺ẋ
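A sketch for a 2-link planar arm, assuming illustrative link lengths and joint angles: forward kinematics via a chain of homogeneous transforms, the standard analytic Jacobian for this arm, and one pseudoinverse inverse-kinematics step.

```python
import numpy as np

def link_transform(theta, length):
    """Planar homogeneous transform: rotate by theta, then translate along the new x-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, length * c],
                     [s,  c, length * s],
                     [0,  0, 1]])

def forward(thetas, lengths):
    """End-effector (x, y) from chaining the link transforms."""
    M = np.eye(3)
    for th, L in zip(thetas, lengths):
        M = M @ link_transform(th, L)
    return M[:2, 2]

lengths = [1.0, 0.7]                                    # assumed link lengths
thetas = np.array([0.6, 0.4])                           # assumed joint angles
x = forward(thetas, lengths)

# Analytic Jacobian of the 2-link planar arm: relates joint rates to end-effector velocity.
l1, l2 = lengths
s1, c1 = np.sin(thetas[0]), np.cos(thetas[0])
s12, c12 = np.sin(thetas.sum()), np.cos(thetas.sum())
J = np.array([[-l1*s1 - l2*s12, -l2*s12],
              [ l1*c1 + l2*c12,  l2*c12]])

# Inverse-kinematics step via the pseudoinverse: theta_dot = J^+ x_dot.
x_dot_desired = np.array([0.1, 0.0])
theta_dot = np.linalg.pinv(J) @ x_dot_desired
print(x, theta_dot)
```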
Rotations often use quaternions instead of matrices: a unit quaternion q = (w, x, y, z) with ||q|| = 1 encodes a 3D rotation compactly and avoids gimbal lock.
Gene expression data is a matrix: e.g., rows index genes, columns index samples, and each entry is an expression level.
The Leslie matrix models population dynamics: n_{t+1} = L n_t, where the first row of L holds fecundities and the subdiagonal holds survival rates.
Dominant eigenvalue gives population growth rate.
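A minimal sketch with an invented 3-age-class Leslie matrix: the dominant eigenvalue is the long-run growth rate and its eigenvector the stable age distribution.

```python
import numpy as np

# Illustrative Leslie matrix: first row = fecundities, subdiagonal = survival rates.
L = np.array([[0.0, 1.5, 1.0],
              [0.6, 0.0, 0.0],
              [0.0, 0.8, 0.0]])

eigvals, eigvecs = np.linalg.eig(L)
idx = np.argmax(eigvals.real)                 # Perron (dominant) eigenvalue is real and positive
growth_rate = eigvals[idx].real
print(growth_rate)                            # > 1: population grows; < 1: it declines

# Corresponding eigenvector, normalized: the stable age distribution.
v = np.abs(eigvecs[:, idx].real)
print(v / v.sum())
```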
Bioinformatics uses matrices extensively: substitution/scoring matrices in sequence alignment, expression matrices analyzed with PCA and clustering, and adjacency matrices for interaction networks.
| Application | Key Linear Algebra Tools |
|---|---|
| Machine Learning | Matrix multiplication, SVD, eigendecomposition |
| Graphics | Homogeneous coordinates, orthogonal matrices |
| PageRank | Eigenvectors, power iteration |
| Quantum Computing | Unitary matrices, tensor products |
| Image Compression | SVD, low-rank approximation |
| Control Systems | State-space models, eigenvalue analysis |
| Data Science | PCA, least squares, covariance |
Linear algebra is one of the most practically useful areas of mathematics. Whether you go into data science, physics, engineering, finance, or any technical field, the concepts you've learned will appear repeatedly.
Linear algebra is not just a course—it's a way of thinking: represent problems with vectors and matrices, then reason about them through transformations, decompositions, and eigenstructure.
A 2-layer neural network has W₁ (100×784) and W₂ (10×100). How many parameters?
Solution: W₁: 100×784 = 78,400. W₂: 10×100 = 1,000.
Plus biases: 100 + 10 = 110. Total: 79,510 parameters.
For PageRank with damping factor α = 0.85 and 4 pages, what size is the transition matrix?
Solution: 4×4 matrix. Each column sums to 1.
An image is 1024×768 grayscale. Using SVD with k=50, what's the compression ratio?
Solution: Original: 1024×768 = 786,432 values.
Compressed: 50×(1024 + 768 + 1) = 89,650 values.
Ratio: 786,432 / 89,650 ≈ 8.77× compression.
Show that a quantum gate must preserve probability (|α|² + |β|² = 1).
Solution: For unitary U: ||U|ψ⟩||² = ⟨ψ|U†U|ψ⟩ = ⟨ψ|I|ψ⟩ = ||ψ||² = 1.
In PCA, data should be centered (mean-subtracted). In PageRank, columns should sum to 1.
SVD works for any matrix. Eigendecomposition requires square, often symmetric/Hermitian.
In neural networks: y = Wx requires W (output × input) and x (input × 1).
In practice, use SVD or QR instead of (AᵀA)⁻¹Aᵀ for least squares. Condition number matters!
Word2Vec-style embeddings can be viewed as implicitly factorizing a word-context co-occurrence matrix:
W_ij measures how often word i appears with context j. The resulting vectors capture semantic meaning!
Self-attention in transformers: Attention(Q, K, V) = softmax(QKᵀ/√d_k) V.
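A minimal single-head self-attention sketch in NumPy; the sequence length, model dimension, and random projection matrices are illustrative assumptions (in a real transformer they are learned).

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(5)
seq_len, d_model, d_k = 4, 8, 8

X = rng.standard_normal((seq_len, d_model))   # one token embedding per row
W_q = rng.standard_normal((d_model, d_k))     # random stand-ins for learned projections
W_k = rng.standard_normal((d_model, d_k))
W_v = rng.standard_normal((d_model, d_k))

Q, K, V = X @ W_q, X @ W_k, X @ W_v
scores = Q @ K.T / np.sqrt(d_k)               # scaled pairwise similarities between tokens
A = softmax(scores, axis=-1)                  # attention weights: each row sums to 1
output = A @ V                                # weighted mix of value vectors
print(A.shape, output.shape)                  # (4, 4) (4, 8)
```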
SVD on the term-document matrix reveals topics (latent semantic analysis): A ≈ U_k Σ_k V_kᵀ.
Columns of U_k represent "topics", V_k shows document-topic relationships.
The short-time Fourier transform analyzes audio frequency content over time by applying the DFT to successive windowed frames. The result is a matrix (a time × frequency spectrogram).
SVD-based denoising: compute the SVD of the noisy data matrix, keep only the largest singular values, and reconstruct; the discarded small singular values carry mostly noise.
Quadratic programming: minimize ½xᵀQx + cᵀx subject to linear constraints Ax ≤ b.
Q positive definite guarantees unique global minimum.
Second-order optimization (Newton's method) uses the Hessian: x_{k+1} = x_k − H(x_k)⁻¹ ∇f(x_k).
Converges faster than gradient descent near optimum.
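A tiny sketch of a Newton step on an assumed quadratic f(x) = ½xᵀQx − bᵀx: because the Hessian is constant, a single step from any starting point lands on the exact minimizer.

```python
import numpy as np

# Minimize f(x) = 0.5 x^T Q x - b^T x with Q positive definite (values assumed).
Q = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, -1.0])

grad = lambda x: Q @ x - b                # gradient of f
hess = Q                                  # Hessian of a quadratic is constant

x = np.zeros(2)
x = x - np.linalg.solve(hess, grad(x))    # one Newton step: solve H d = grad, then update
print(x, np.allclose(Q @ x, b))           # minimizer solves Qx = b
```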
Semidefinite programming (SDP) optimizes over positive semidefinite matrices: minimize tr(CX) subject to tr(AᵢX) = bᵢ and X ⪰ 0.
Used in control, combinatorics, quantum information.
You've completed the entire Linear Algebra course at MathIsimple!
8 chapters · 30+ topics · ∞ applications
ML is fundamentally about learning transformations from data. Neural networks are compositions of linear transformations (matrices) and nonlinear activations. Training uses gradient-based optimization, which relies on linear algebra.
Model the web as a directed graph. The transition matrix has P_ij = 1/(out-degree of j) if j links to i. PageRank is the dominant eigenvector: the stationary distribution of a random walk on the web.
Every transformation—rotation, scaling, translation, projection—is a matrix. Rendering pipelines multiply vertices by model, view, and projection matrices. Shaders perform matrix operations on GPU.
Quantum states are vectors in Hilbert space. Quantum gates are unitary matrices. Measurement involves projections. The entire theory is built on complex linear algebra.
An image is a matrix of pixel values. SVD: A = UΣVᵀ. Keep only top k singular values: A_k ≈ Σᵢ₌₁ᵏ σᵢuᵢvᵢᵀ. This rank-k matrix captures most visual information with fewer parameters.
A Markov chain is a random process where the next state depends only on the current state. Transition probabilities form a stochastic matrix. The long-run distribution is an eigenvector with eigenvalue 1.
PCA finds orthogonal directions of maximum variance in data. These are eigenvectors of the covariance matrix. Project onto top k eigenvectors to reduce dimensions while preserving most variance.
GPUs have many parallel cores designed for matrix operations. Neural networks, graphics, and simulations are matrix-heavy, making GPUs ideal. Libraries like cuBLAS optimize matrix multiplication for GPU architecture.
Lattice-based cryptography uses linear algebra over integers. Error-correcting codes use matrices over finite fields. Matrix groups provide structure for some protocols.
Quantum mechanics: operators are matrices, states are vectors. Mechanics: inertia tensors, stress tensors. Relativity: metric tensors, Lorentz transformations. Linear algebra is the language of physics.