MathIsimple
LA-C12

Course 12: Advanced Topics Part 2

Multilinear Algebra & Applications

This final course introduces multilinear algebra and tensors, which generalize vectors and matrices to higher dimensions. We then explore the vast applications of linear algebra in machine learning, computer graphics, quantum computing, network analysis, and many other fields.

12-15 hours · Advanced Level · 10 Objectives
Learning Objectives
  • Define multilinear maps and tensor products.
  • Understand tensors as multilinear maps and their transformation properties.
  • Use Einstein summation notation for tensor operations.
  • Distinguish covariant and contravariant indices.
  • Apply linear algebra to machine learning and neural networks.
  • Understand linear algebra in computer graphics and transformations.
  • Model networks and graphs with adjacency matrices.
  • See connections to quantum computing and quantum mechanics.
  • Apply SVD and other techniques to image processing.
  • Understand Markov chains, PageRank, and other applications.
Prerequisites
  • LA-C11: Advanced Topics Part 1 (SVD & Quadratic Forms)
  • Dual spaces and linear functionals
  • Bilinear forms
  • All previous linear algebra concepts
  • Basic understanding of applications
Historical Context

Tensors were introduced by Woldemar Voigt (1850–1919) in the 1890s and developed by Gregorio Ricci-Curbastro (1853–1925) and Tullio Levi-Civita (1873–1941). Albert Einstein (1879–1955) used tensor calculus in general relativity. The modern applications of linear algebra have exploded with the digital age: machine learning relies heavily on matrix operations, Larry Page and Sergey Brin used eigenvectors for PageRank in 1998, and quantum computing is fundamentally built on linear algebra over complex numbers. Today, linear algebra is the mathematical foundation of data science, AI, graphics, and many other fields.

1. Multilinear Algebra and Tensors

Tensors generalize vectors and matrices to higher dimensions. They are fundamental in physics, differential geometry, and modern machine learning.

Definition 1.1: Multilinear Map

A function $B: V_1 \times V_2 \times \cdots \times V_k \to U$ is multilinear if it is linear in each argument when the others are held fixed.

A bilinear map is a multilinear map with $k = 2$.
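As a quick numerical illustration (not part of the original course materials), the map $B(u, v) = u^\top A v$ on $\mathbb{R}^3 \times \mathbb{R}^3$ is bilinear; the NumPy sketch below checks linearity in the first argument while the second is held fixed. The matrix $A$ and the scalars are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))          # fixed matrix defining B(u, v) = u^T A v
B = lambda u, v: u @ A @ v               # a bilinear map R^3 x R^3 -> R

u1, u2, v = rng.standard_normal((3, 3))  # three arbitrary vectors in R^3
a, b = 2.0, -1.5                         # arbitrary scalars

# Linearity in the first argument (the second argument v is held fixed):
lhs = B(a * u1 + b * u2, v)
rhs = a * B(u1, v) + b * B(u2, v)
assert np.isclose(lhs, rhs)
```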

Definition 1.2: Tensor Product

The tensor product $V \otimes W$ is a vector space such that bilinear maps from $V \times W$ correspond to linear maps from $V \otimes W$.

Elements of the form $v \otimes w$ are called simple tensors. General elements are sums of these.

Theorem 1.1: Dimension of Tensor Product

If $\dim V = m$ and $\dim W = n$, then $\dim(V \otimes W) = mn$.
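For coordinate spaces, a simple tensor $v \otimes w$ can be represented by the outer product of the coordinate vectors (equivalently, their Kronecker product, flattened), which makes the dimension count $mn$ concrete. A minimal NumPy sketch, assuming $V = \mathbb{R}^3$ and $W = \mathbb{R}^2$:

```python
import numpy as np

v = np.array([1.0, 2.0, 3.0])   # vector in V = R^3  (m = 3)
w = np.array([4.0, 5.0])        # vector in W = R^2  (n = 2)

simple_tensor = np.outer(v, w)  # v ⊗ w represented as a 3x2 array
print(simple_tensor.shape)      # (3, 2): such arrays form a space of dimension 3 * 2 = 6

# The Kronecker product gives the same 6 coordinates as a flat vector in R^6
print(np.kron(v, w))
```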

Definition 1.3: Tensor

A (p,q)-tensor is a multilinear map from $(V^*)^p \times V^q$ to $F$.

Equivalently, it is an array of numbers with $p$ contravariant (upper) indices and $q$ covariant (lower) indices.

Example 1.1: Examples of Tensors
  • Scalar: (0,0)-tensor
  • Vector: (1,0)-tensor
  • Linear functional: (0,1)-tensor
  • Matrix/linear map: (1,1)-tensor
  • Bilinear form: (0,2)-tensor
Definition 1.4: Einstein Summation Convention

When an index appears twice, once as upper and once as lower, summation over that index is implied.

Example: $a_i b^i = \sum_i a_i b^i$.
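NumPy's einsum mirrors this convention: it does not distinguish upper from lower indices, but any repeated index letter is summed over. A small sketch:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])
A = np.arange(9.0).reshape(3, 3)

# a_i b^i : the repeated index i is summed -> the dot product
print(np.einsum('i,i->', a, b))      # 32.0

# A^i_j b^j : a (1,1)-tensor (matrix) acting on a vector
print(np.einsum('ij,j->i', A, b))

# A^i_i : contracting the two indices of a (1,1)-tensor gives the trace
print(np.einsum('ii->', A))          # 12.0
```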

Remark 1.1: Transformation Law

Tensors are characterized by how their components transform under change of basis. This makes them coordinate-independent objects.

2. Applications in Machine Learning

Linear algebra is the mathematical foundation of machine learning. Neural networks, data representation, and optimization all rely heavily on matrix operations.

Definition 2.1: Neural Network Layer

A fully connected layer computes $y = \sigma(Wx + b)$ where:

  • $W$ is a weight matrix
  • $x$ is the input vector
  • $b$ is a bias vector
  • $\sigma$ is a nonlinear activation function
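
A minimal NumPy sketch of the forward pass through one such layer, using ReLU as an example activation (the shapes and values are arbitrary):

```python
import numpy as np

def dense_layer(x, W, b, activation=lambda z: np.maximum(z, 0.0)):
    """Fully connected layer: y = sigma(W x + b); ReLU is the default activation here."""
    return activation(W @ x + b)

rng = np.random.default_rng(0)
x = rng.standard_normal(4)        # input vector in R^4
W = rng.standard_normal((3, 4))   # weight matrix mapping R^4 -> R^3
b = rng.standard_normal(3)        # bias vector in R^3

y = dense_layer(x, W, b)
print(y.shape)                    # (3,)
```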
Example 2.1: Data Representation

Data in ML is stored in tensors:

  • Image: $H \times W \times C$ (height × width × channels)
  • Batch: $N \times H \times W \times C$ (batch size × image dimensions)
  • Sequence: $N \times T \times D$ (batch × time steps × features)
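
For example, with arbitrarily chosen sizes, the corresponding NumPy arrays would look like this:

```python
import numpy as np

image = np.zeros((224, 224, 3))      # H x W x C: one 224x224 RGB image
batch = np.zeros((32, 224, 224, 3))  # N x H x W x C: a batch of 32 such images
sequence = np.zeros((32, 100, 512))  # N x T x D: 32 sequences, 100 time steps, 512 features

print(image.ndim, batch.ndim, sequence.ndim)  # 3 4 3
```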
Remark 2.1: Gradient Descent

Training neural networks uses gradient descent, which requires computing gradients of matrix-valued functions. The chain rule involves matrix multiplication and transposition.
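As a concrete instance of that chain rule, for $L = \tfrac{1}{2}\lVert Wx \rVert^2$ the gradient with respect to the input is $\partial L/\partial x = W^\top(Wx)$, i.e. a transposed matrix applied to the upstream gradient. The sketch below (an illustration, not the course's code) verifies this against finite differences.

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.standard_normal((3, 4))
x = rng.standard_normal(4)

loss = lambda x: 0.5 * np.sum((W @ x) ** 2)   # L = 1/2 ||Wx||^2, so dL/dy = y = Wx

grad_analytic = W.T @ (W @ x)                 # chain rule: dL/dx = W^T (dL/dy)

# Finite-difference check of the analytic gradient
eps = 1e-6
grad_numeric = np.array([(loss(x + eps * e) - loss(x - eps * e)) / (2 * eps)
                         for e in np.eye(4)])
assert np.allclose(grad_analytic, grad_numeric, atol=1e-4)
```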

Example 2.2: Principal Component Analysis (PCA)

PCA finds directions of maximum variance in data. It uses SVD of the centered data matrix. Principal components are right singular vectors, and explained variance comes from squared singular values.
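A compact sketch of PCA via the SVD of a centered data matrix (rows are samples; the data here is synthetic and the variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 5))  # 200 samples, 5 features

Xc = X - X.mean(axis=0)                          # center each feature
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

components = Vt                                  # principal directions = right singular vectors
explained_var = s**2 / (len(Xc) - 1)             # variance explained by each component
scores = Xc @ Vt.T                               # data expressed in the principal-component basis

print(explained_var / explained_var.sum())       # fraction of total variance per component
```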

3. Computer Graphics

Every transformation in computer graphics is a matrix operation. Understanding linear algebra is essential for 3D rendering, animation, and image processing.

Definition 3.1: 3D Transformations

Common transformations:

  • Rotation: Orthogonal matrix with $\det = 1$ (SO(3))
  • Scaling: Diagonal matrix
  • Translation: Affine transformation (homogeneous coordinates)
  • Projection: Perspective or orthographic projection matrix
Example 3.1: Rendering Pipeline

Vertex transformation: $v_{\text{screen}} = P \cdot V \cdot M \cdot v_{\text{model}}$, where:

  • $M$ = model matrix (object to world)
  • $V$ = view matrix (world to camera)
  • $P$ = projection matrix (camera to screen)
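
A toy version of this transform in homogeneous coordinates, with a z-axis rotation as the model matrix, a translation as the view matrix, and a crude orthographic projection; all three matrices are illustrative stand-ins rather than output of any graphics API:

```python
import numpy as np

def rotation_z(theta):
    """4x4 homogeneous rotation about the z-axis (SO(3) embedded in homogeneous form)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0, 0],
                     [s,  c, 0, 0],
                     [0,  0, 1, 0],
                     [0,  0, 0, 1]])

def translation(tx, ty, tz):
    """4x4 homogeneous translation."""
    T = np.eye(4)
    T[:3, 3] = [tx, ty, tz]
    return T

M = rotation_z(np.pi / 4)           # model matrix: object -> world
V = translation(0.0, 0.0, -5.0)     # view matrix: world -> camera (camera 5 units back)
P = np.diag([1.0, 1.0, 0.0, 1.0])   # toy orthographic projection: drop the z coordinate

v_model = np.array([1.0, 0.0, 0.0, 1.0])   # a vertex in homogeneous coordinates
v_screen = P @ V @ M @ v_model
print(v_screen)
```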
Remark 3.1: GPU Computing

Modern GPUs are optimized for matrix operations. Graphics shaders perform millions of matrix multiplications per second to render scenes in real-time.

4. Network Analysis and PageRank

Graphs and networks can be represented by matrices. Eigenvalues and eigenvectors reveal important structural properties.

Definition 4.1: Adjacency Matrix

For a graph with $n$ vertices, the adjacency matrix $A$ has:

$$A_{ij} = \begin{cases} 1 & \text{if there is an edge from } i \text{ to } j \\ 0 & \text{otherwise} \end{cases}$$
Definition 4.2: PageRank

PageRank models the web as a directed graph. The transition matrix $P$ has:

$$P_{ij} = \begin{cases} 1/\text{out-degree}(j) & \text{if } j \text{ links to } i \\ 0 & \text{otherwise} \end{cases}$$

PageRank is the dominant eigenvector (eigenvalue 1) of $P$, representing the stationary distribution of a random walk.
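A minimal power-method sketch on a tiny hand-made link graph. The damping factor 0.85 is the commonly quoted default; the graph itself and the iteration count are arbitrary choices for illustration.

```python
import numpy as np

# A[i, j] = 1 if page j links to page i (four pages, made-up links)
A = np.array([[0, 0, 1, 0],
              [1, 0, 0, 0],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)

out_degree = A.sum(axis=0)        # column sums = out-degrees
P = A / out_degree                # column-stochastic transition matrix
n = len(A)
d = 0.85                          # damping factor
G = d * P + (1 - d) / n           # random surfer with teleportation

r = np.full(n, 1 / n)             # start from the uniform distribution
for _ in range(100):              # power iteration converges to the dominant eigenvector
    r = G @ r
print(r)                          # PageRank scores (eigenvalue-1 eigenvector of G)
```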

Theorem 4.1: Perron-Frobenius

For an irreducible stochastic matrix, the eigenvalue 1 is simple and the corresponding eigenvector has all positive entries.

Example 4.1: Markov Chains

A Markov chain is a random process where the next state depends only on the current state. Transition probabilities form a stochastic matrix. The long-run distribution is an eigenvector with eigenvalue 1.

5. Quantum Computing

Quantum mechanics is fundamentally linear algebraic. Quantum states are vectors, operations are unitary matrices, and measurement involves projections.

Definition 5.1: Qubit

A qubit is a quantum state $|\psi\rangle = \alpha|0\rangle + \beta|1\rangle$ where $|\alpha|^2 + |\beta|^2 = 1$.

This is a unit vector in $\mathbb{C}^2$.

Definition 5.2: Quantum Gate

A quantum gate is a unitary matrix $U$ (satisfying $U^* U = I$).

Unitary matrices preserve the norm, ensuring probability conservation.

Example 5.1: Pauli Matrices

Important quantum gates include the Pauli matrices:

$$\sigma_x = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad \sigma_y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad \sigma_z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$$
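A short NumPy check that the Pauli matrices are unitary, together with one gate application and the resulting measurement probabilities (purely illustrative):

```python
import numpy as np

sigma_x = np.array([[0, 1], [1, 0]], dtype=complex)
sigma_y = np.array([[0, -1j], [1j, 0]])
sigma_z = np.array([[1, 0], [0, -1]], dtype=complex)

# Each Pauli matrix is unitary: U* U = I (conj().T is the conjugate transpose)
for U in (sigma_x, sigma_y, sigma_z):
    assert np.allclose(U.conj().T @ U, np.eye(2))

ket0 = np.array([1, 0], dtype=complex)   # |0> as a unit vector in C^2
psi = sigma_x @ ket0                     # the X gate flips |0> to |1>

# Measurement in the computational basis: P(outcome i) = |<e_i|psi>|^2
probs = np.abs(psi) ** 2
print(probs)                             # [0. 1.]
```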
Remark 5.1: Measurement

Quantum measurement projects the state onto an eigenspace. The probability of outcome $i$ is $|\langle \psi | e_i \rangle|^2$.

6. Other Applications

Linear algebra appears in countless other fields. Here are a few more important examples.

Example 6.1: Signal Processing

Convolution is a linear operation, represented by a Toeplitz or circulant matrix. The Fourier transform is a change of basis to the frequency domain. Filtering is projection onto frequency subspaces.
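A sketch of the circulant-matrix view of circular convolution, checked against the frequency-domain computation (the DFT diagonalizes circulant matrices); the signal and filter are made up:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])   # signal
h = np.array([1.0, 0.5, 0.0, 0.0])   # filter
N = len(x)

# Circulant matrix C with C[i, j] = h[(i - j) mod N]; C @ x is circular convolution
C = np.array([[h[(i - j) % N] for j in range(N)] for i in range(N)])
y_matrix = C @ x

# The same convolution computed in the frequency domain
y_fft = np.real(np.fft.ifft(np.fft.fft(h) * np.fft.fft(x)))

assert np.allclose(y_matrix, y_fft)
print(y_matrix)
```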

Example 6.2: Control Theory

System stability is determined by eigenvalues of the state matrix. Stable systems have all eigenvalues with negative real parts (continuous) or magnitude less than 1 (discrete).
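A quick eigenvalue check for a continuous-time system $\dot{x} = Ax$, with an arbitrary example state matrix:

```python
import numpy as np

A = np.array([[-1.0,  2.0],
              [ 0.0, -3.0]])             # state matrix of x' = A x

eigvals = np.linalg.eigvals(A)
stable = bool(np.all(eigvals.real < 0))  # continuous-time: stable iff every eigenvalue has negative real part
print(eigvals, stable)                   # [-1. -3.] True
```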

Example 6.3: Cryptography

Lattice-based cryptography uses linear algebra over integers. Error-correcting codes use matrices over finite fields. Matrix groups provide structure for cryptographic protocols.

Example 6.4: Physics

Quantum mechanics: operators are matrices, states are vectors. Mechanics: inertia tensors, stress tensors. Relativity: metric tensors, Lorentz transformations. Linear algebra is the language of physics.

Course 12 Practice Quiz
10 Questions

  1. (Easy) A bilinear map $B: V \times W \to U$ is linear in:
  2. (Medium) If $\dim V = m$ and $\dim W = n$, then $\dim(V \otimes W) =$
  3. (Easy) The outer product of column vectors $u$ and $v$ is:
  4. (Easy) In machine learning, the weights of a neural network layer are represented as:
  5. (Medium) The PageRank algorithm finds:
  6. (Medium) In computer graphics, a 3D rotation is represented by:
  7. (Medium) In quantum computing, a qubit state is:
  8. (Easy) The adjacency matrix of a graph has $A_{ij} = 1$ iff:
  9. (Easy) PCA finds directions of:
  10. (Medium) Einstein summation convention means:

Frequently Asked Questions

What is a tensor?

A tensor is a multilinear map from products of vector spaces (and their duals) to scalars. Equivalently, it's an array of numbers that transforms in a specific way under change of basis. Scalars are (0,0)-tensors, vectors are (1,0)-tensors, and matrices are (1,1)-tensors.

Why is linear algebra so important in machine learning?

ML is fundamentally about learning transformations from data. Neural networks are compositions of linear transformations (matrices) and nonlinear activations. Training uses gradient-based optimization, which relies on linear algebra. Data is stored in tensors (multi-dimensional arrays).

How does Google's PageRank work?

Model the web as a directed graph. The transition matrix has $P_{ij} = 1/\text{out-degree}(j)$ if $j$ links to $i$. PageRank is the dominant eigenvector: the stationary distribution of a random walk on the web. It is computed using the power method.

How is linear algebra used in computer graphics?

Every transformation (rotation, scaling, translation, projection) is a matrix. Rendering pipelines multiply vertices by model, view, and projection matrices. Shaders perform these matrix operations on the GPU. 3D rotations are orthogonal matrices with determinant +1.

What makes quantum computing linear algebraic?

Quantum states are vectors in Hilbert space. Quantum gates are unitary matrices. Measurement involves projections. The entire theory is built on complex linear algebra. A qubit is a unit vector in ℂ², and quantum operations preserve the norm.