Introduction to nonlinear dimensionality reduction through manifold learning. Understand when data lies on low-dimensional manifolds and how to preserve structure.
Manifold learning assumes that high-dimensional data lies on or near a low-dimensional manifold embedded in the high-dimensional space. A manifold is a space that is locally Euclidean (locally flat) but may be globally curved.
While the data exists in d-dimensional ambient space, it may actually lie on a d'-dimensional manifold with d' < d (often d' ≪ d). Manifold learning finds this intrinsic structure.
Example: the Swiss roll, a 2D sheet rolled up in 3D space. The data is 3D but intrinsically 2D (it can be "unrolled" into a flat sheet).
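The Swiss roll can be sampled directly. A minimal sketch with NumPy; the parameterization and constants are illustrative (scikit-learn's make_swiss_roll uses a similar form):

```python
import numpy as np

# Sample a Swiss roll: a 2-D sheet (parameters t, y) rolled up in 3-D.
# Constants here are illustrative choices, not a canonical definition.
rng = np.random.default_rng(0)
n = 1000
t = 1.5 * np.pi * (1 + 2 * rng.random(n))   # position along the roll (angle)
y = 21 * rng.random(n)                      # position across the sheet (width)
X = np.column_stack([t * np.cos(t), y, t * np.sin(t)])

print(X.shape)  # ambient dimension d = 3, intrinsic dimension d' = 2
```

Each point is fully determined by the two intrinsic coordinates (t, y), even though it is stored as three numbers.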
Linear methods (PCA, MDS) assume the data lies on or near a linear subspace. When the data lies on a curved manifold, a linear projection collapses distant parts of the manifold onto each other and fails to capture the structure.
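A small numerical illustration of this failure (the point set is an assumed toy example): points on a semicircle are intrinsically 1-D, yet no single linear (PCA) direction captures them, so one component leaves a substantial fraction of the variance unexplained.

```python
import numpy as np

# Points on a semicircle: intrinsically 1-D (parameterized by the angle theta),
# but they do not lie in any 1-D linear subspace.
theta = np.linspace(0.0, np.pi, 200)
X = np.column_stack([np.cos(theta), np.sin(theta)])
Xc = X - X.mean(axis=0)              # center the data, as PCA requires

# PCA via SVD: squared singular values are proportional to component variances
s = np.linalg.svd(Xc, compute_uv=False)
var_ratio = s[0] ** 2 / np.sum(s ** 2)
print(f"variance captured by 1 linear component: {var_ratio:.2f}")
```

Despite the data being a one-parameter curve, a single PCA component captures only part of the variance; a nonlinear method could recover the angle exactly.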
Different manifold learning methods preserve different aspects of structure:
Global structure: preserves distances and relationships across the entire dataset. Good for understanding overall data topology.
Local structure: preserves relationships within local neighborhoods. Good for preserving fine-grained local geometry.
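One way to make "preserving local relationships" concrete is the reconstruction-weight idea used by Locally Linear Embedding (LLE): each point should be expressible as an affine combination of its nearest neighbors. A sketch on evenly spaced points along an arc; the setup and regularization constant are illustrative:

```python
import numpy as np

# Local structure in miniature: a point on a curved arc is approximately an
# affine combination of its two nearest neighbours (the idea behind LLE's
# reconstruction weights). Point set and constants are illustrative.
t = np.linspace(0.0, 1.0, 20)
X = np.column_stack([np.cos(np.pi * t), np.sin(np.pi * t)])  # semicircular arc

i = 10                               # reconstruct point i from its neighbours
nbrs = X[[i - 1, i + 1]]

# Minimize ||x_i - w @ nbrs||^2 subject to sum(w) = 1, via the local Gram matrix
diff = X[i] - nbrs                   # shape (2, 2): offsets to each neighbour
G = diff @ diff.T
G += 1e-3 * np.trace(G) * np.eye(2)  # regularize: the offsets are near-collinear
w = np.linalg.solve(G, np.ones(2))
w /= w.sum()

err = np.linalg.norm(X[i] - w @ nbrs)
print(w, err)                        # small error: local geometry is nearly flat
```

The small reconstruction error shows why "locally Euclidean" matters: within a neighborhood, linear relationships hold almost exactly, and a local method tries to keep those relationships intact in the low-dimensional embedding.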
Two fundamental approaches to manifold learning:
Global approach (e.g., Isomap). Goal: preserve global geodesic distances (shortest paths along the manifold).
Local approach (e.g., LLE). Goal: preserve local linear reconstruction relationships.
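To make "geodesic distance" concrete: global methods in the Isomap family approximate it with shortest paths through a nearest-neighbor graph, since straight-line (Euclidean) distance cuts through the ambient space rather than following the manifold. A self-contained sketch; the point set, choice of k, and graph construction are illustrative:

```python
import numpy as np
import heapq

# Points along a semicircular arc: a 1-D manifold curved in 2-D.
theta = np.linspace(0.0, np.pi, 50)
X = np.column_stack([np.cos(theta), np.sin(theta)])

# Step 1: k-nearest-neighbour graph with Euclidean edge weights (k is a choice)
k = 2
D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)  # pairwise distances
neighbors = np.argsort(D, axis=1)[:, 1:k + 1]         # skip self at position 0

# Step 2: Dijkstra's algorithm — shortest graph path approximates the
# geodesic (on-manifold) distance from the source to every other point
def dijkstra(src):
    dist = np.full(len(X), np.inf)
    dist[src] = 0.0
    pq = [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue
        for v in neighbors[u]:
            nd = d + D[u, v]
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(pq, (nd, int(v)))
    return dist

geo = dijkstra(0)
# Between the arc's endpoints: the straight-line distance is 2 (the diameter),
# but the geodesic distance approaches the arc length pi ~ 3.14
print(geo[-1], D[0, -1])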