
Multidimensional Scaling (MDS)

Master the foundational linear dimensionality reduction method that preserves pairwise distances. Learn how MDS derives low-dimensional embeddings from distance matrices.

Module 3 of 9
Intermediate Level
90-120 min

What is Multidimensional Scaling?

Multidimensional Scaling (MDS) is a linear dimensionality reduction technique that finds a low-dimensional representation of data such that pairwise distances in the original space are preserved as closely as possible in the low-dimensional space.

Core Assumption

Given a distance matrix $D \in \mathbb{R}^{m \times m}$, where $dist_{ij}$ is the distance between samples $x_i$ and $x_j$, MDS finds low-dimensional representations $z_i, z_j \in \mathbb{R}^{d'}$ such that:

$$\|z_i - z_j\| \approx dist_{ij}$$

The goal is to preserve distances, making MDS ideal for visualization and distance-based analysis.

From Distance Matrix to Inner Product Matrix

The key insight of MDS is converting distance information into inner product information, which enables linear algebra techniques.

Step 1: Distance to Inner Product Relationship

For Euclidean distance in low-dimensional space:

$$dist_{ij}^2 = \|z_i - z_j\|^2 = \|z_i\|^2 + \|z_j\|^2 - 2z_i^T z_j$$

Let $b_{ij} = z_i^T z_j$ be the inner product between $z_i$ and $z_j$. Then:

$$dist_{ij}^2 = b_{ii} + b_{jj} - 2b_{ij}$$
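
This identity is easy to verify numerically; the sketch below uses two arbitrary vectors whose values are purely illustrative.

```python
import numpy as np

# Two arbitrary points in a low-dimensional space (values are illustrative only)
z_i = np.array([1.0, 2.0, -0.5])
z_j = np.array([0.5, -1.0, 2.0])

dist_sq = np.sum((z_i - z_j) ** 2)                   # squared Euclidean distance
b_ii, b_jj, b_ij = z_i @ z_i, z_j @ z_j, z_i @ z_j   # inner products
print(np.isclose(dist_sq, b_ii + b_jj - 2 * b_ij))   # True
```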

Step 2: Centering and Global Statistics

To solve for $b_{ij}$, we assume the embedded samples are centered, i.e. $\sum_{i=1}^m z_i = 0$, and introduce the following global distance statistics:

Average squared distance for sample $i$ (over all $j$):

$$dist_{i.}^2 = \frac{1}{m}\sum_{j=1}^m dist_{ij}^2$$

Average squared distance for sample $j$ (over all $i$):

$$dist_{.j}^2 = \frac{1}{m}\sum_{i=1}^m dist_{ij}^2$$

Global average squared distance:

$$dist_{..}^2 = \frac{1}{m^2}\sum_{i=1}^m\sum_{j=1}^m dist_{ij}^2$$
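
As a minimal sketch, these three statistics can be computed directly from the squared distance matrix; the function name distance_statistics is illustrative, and D is assumed to be an $m \times m$ matrix of (unsquared) distances.

```python
import numpy as np

def distance_statistics(D):
    """Row, column, and global means of the squared distances in D."""
    D2 = D ** 2
    dist_i_sq = D2.mean(axis=1)   # dist_{i.}^2 for each row i
    dist_j_sq = D2.mean(axis=0)   # dist_{.j}^2 for each column j
    dist_sq = D2.mean()           # dist_{..}^2
    return dist_i_sq, dist_j_sq, dist_sq
```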

Step 3: Solving for Inner Product Matrix

Through algebraic manipulation, we can solve for the inner product matrix $B$, where $B_{ij} = b_{ij}$:

$$b_{ij} = -\frac{1}{2}\left(dist_{ij}^2 - dist_{i.}^2 - dist_{.j}^2 + dist_{..}^2\right)$$

This formula converts distance information into inner product information, enabling eigenvalue decomposition.
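
A vectorized sketch of this conversion, assuming D is the $m \times m$ distance matrix (the function name inner_product_matrix is illustrative). Broadcasting applies the formula to every entry at once; the result matches the double-centering form $B = -\frac{1}{2} J D^{(2)} J$, where $D^{(2)}$ is the entrywise squared distance matrix and $J = I - \frac{1}{m}\mathbf{1}\mathbf{1}^T$.

```python
import numpy as np

def inner_product_matrix(D):
    """Convert a distance matrix D into the inner product matrix B."""
    D2 = D ** 2
    dist_i_sq = D2.mean(axis=1, keepdims=True)   # dist_{i.}^2 as a column vector
    dist_j_sq = D2.mean(axis=0, keepdims=True)   # dist_{.j}^2 as a row vector
    dist_sq = D2.mean()                          # dist_{..}^2
    # b_ij = -1/2 (dist_ij^2 - dist_{i.}^2 - dist_{.j}^2 + dist_{..}^2)
    return -0.5 * (D2 - dist_i_sq - dist_j_sq + dist_sq)
```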

Low-Dimensional Mapping via Eigenvalue Decomposition

Once we have the inner product matrix $B$, we can find the low-dimensional representation through eigenvalue decomposition.

Eigenvalue Decomposition

Decompose the inner product matrix:

$$B = V\Lambda V^T$$

Where $V$ contains the eigenvectors and $\Lambda$ contains the eigenvalues in descending order.

Select the top $d'$ eigenvalues and corresponding eigenvectors:

$$Z = \tilde{\Lambda}^{1/2} \tilde{V}^T$$

Where $\tilde{\Lambda}$ is the $d' \times d'$ diagonal matrix of the top eigenvalues and $\tilde{V}$ contains the corresponding eigenvectors. $Z \in \mathbb{R}^{d' \times m}$ is the low-dimensional representation.
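
A sketch of this step with NumPy's symmetric eigensolver (the function name embed_from_B is illustrative). If the input distances are not exactly Euclidean, $B$ can have small negative eigenvalues; clipping them to zero before taking the square root is a common practical safeguard.

```python
import numpy as np

def embed_from_B(B, d_prime):
    """Top-d' embedding Z (shape d' x m) from the inner product matrix B."""
    eigvals, eigvecs = np.linalg.eigh(B)          # ascending order for symmetric B
    idx = np.argsort(eigvals)[::-1][:d_prime]     # indices of the d' largest eigenvalues
    top_vals = np.clip(eigvals[idx], 0.0, None)   # guard against small negative values
    top_vecs = eigvecs[:, idx]
    return np.diag(np.sqrt(top_vals)) @ top_vecs.T   # Z = Lambda^{1/2} V^T
```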

Why This Works

The top eigenvalues capture the most variance in the distance structure. By keeping only the largest eigenvalues, we:

  • Preserve the most important distance relationships
  • Discard noise and redundant dimensions
  • Obtain the best rank-$d'$ approximation of the inner product matrix $B$
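
One common heuristic for choosing $d'$ is to check what fraction of the positive eigenvalue mass the leading eigenvalues account for; the short sketch below assumes $B$ has already been computed as in the previous step.

```python
import numpy as np

def explained_ratio(B, d_prime):
    """Fraction of positive eigenvalue mass captured by the top d' eigenvalues."""
    eigvals = np.linalg.eigvalsh(B)[::-1]   # descending order
    pos = np.clip(eigvals, 0.0, None)
    return pos[:d_prime].sum() / pos.sum()
```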

MDS Algorithm Steps

The complete MDS algorithm:

Step 1

Input Distance Matrix

Given a distance matrix $D \in \mathbb{R}^{m \times m}$, where $D_{ij} = dist_{ij}$ is the distance between samples $i$ and $j$.

Step 2

Compute Inner Product Matrix

Calculate the global statistics and solve for $B$:

$$b_{ij} = -\frac{1}{2}\left(dist_{ij}^2 - dist_{i.}^2 - dist_{.j}^2 + dist_{..}^2\right)$$

Step 3

Eigenvalue Decomposition

Decompose $B = V\Lambda V^T$ and select the top $d'$ eigenvalues and eigenvectors.

Step 4

Low-Dimensional Representation

Compute $Z = \tilde{\Lambda}^{1/2} \tilde{V}^T$, where each column is a low-dimensional sample.
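
Putting the four steps together, here is a compact end-to-end sketch (classical_mds is an illustrative name, not a library function). The usage example builds a Euclidean distance matrix from points that actually lie in a 2-D subspace of $\mathbb{R}^3$, so the 2-D embedding should reproduce the pairwise distances up to floating-point error.

```python
import numpy as np

def classical_mds(D, d_prime):
    """Classical MDS: distance matrix D (m x m) -> embedding Z (d' x m)."""
    D2 = D ** 2
    # Step 2: inner product matrix via the centering formula
    B = -0.5 * (D2 - D2.mean(axis=1, keepdims=True)
                   - D2.mean(axis=0, keepdims=True) + D2.mean())
    # Step 3: eigendecomposition, keep the d' largest eigenvalues
    eigvals, eigvecs = np.linalg.eigh(B)
    idx = np.argsort(eigvals)[::-1][:d_prime]
    vals = np.clip(eigvals[idx], 0.0, None)
    # Step 4: Z = Lambda^{1/2} V^T
    return np.diag(np.sqrt(vals)) @ eigvecs[:, idx].T

# Usage: 20 samples that live in a 2-D plane embedded in R^3
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 2)) @ rng.normal(size=(2, 20))        # data matrix, d x m
D = np.linalg.norm(X[:, :, None] - X[:, None, :], axis=0)     # Euclidean distances
Z = classical_mds(D, d_prime=2)
D_low = np.linalg.norm(Z[:, :, None] - Z[:, None, :], axis=0)
print(np.abs(D - D_low).max())   # close to zero: pairwise distances are preserved
```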

Linear Dimensionality Reduction Form

MDS is a linear dimensionality reduction method. All linear methods can be expressed as:

$$Z = W^T X$$

Where:

  • $X \in \mathbb{R}^{d \times m}$ is the original data matrix
  • $W \in \mathbb{R}^{d \times d'}$ is the projection matrix
  • $Z \in \mathbb{R}^{d' \times m}$ is the low-dimensional representation

Different linear methods differ in how they determine $W$:

  • MDS: $W$ preserves pairwise distances
  • PCA: $W$ maximizes variance
  • LDA: $W$ maximizes class separation
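
To make the shapes concrete, here is a minimal sketch of this generic projection form; the random $W$ is only a stand-in for whatever projection matrix a particular method (MDS, PCA, or LDA) would determine.

```python
import numpy as np

d, d_prime, m = 5, 2, 100
rng = np.random.default_rng(0)
X = rng.normal(size=(d, m))          # original data matrix, d x m
W = rng.normal(size=(d, d_prime))    # projection matrix, d x d'
Z = W.T @ X                          # low-dimensional representation, d' x m
print(X.shape, W.shape, Z.shape)     # (5, 100) (5, 2) (2, 100)
```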