
Multidimensional Scaling (MDS)

Master the foundational linear dimensionality reduction method that preserves pairwise distances. Learn how MDS derives low-dimensional embeddings from distance matrices.

Module 3 of 9
Intermediate Level
90-120 min

What is Multidimensional Scaling?

Multidimensional Scaling (MDS) is a linear dimensionality reduction technique that finds a low-dimensional representation of data such that pairwise distances in the original space are preserved as closely as possible in the low-dimensional space.

Core Assumption

Given a distance matrix $D \in \mathbb{R}^{m \times m}$, where $dist_{ij}$ is the distance between samples $x_i$ and $x_j$, MDS finds low-dimensional representations $z_i, z_j \in \mathbb{R}^{d'}$ such that:

$$\|z_i - z_j\| \approx dist_{ij}$$

The goal is to preserve distances, making MDS ideal for visualization and distance-based analysis.

From Distance Matrix to Inner Product Matrix

The key insight of MDS is converting distance information into inner product information, which enables linear algebra techniques.

Step 1: Distance to Inner Product Relationship

For Euclidean distance in low-dimensional space:

$$dist_{ij}^2 = \|z_i - z_j\|^2 = \|z_i\|^2 + \|z_j\|^2 - 2z_i^T z_j$$

Let $b_{ij} = z_i^T z_j$ be the inner product between $z_i$ and $z_j$. Then:

$$dist_{ij}^2 = b_{ii} + b_{jj} - 2b_{ij}$$
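
This identity is easy to verify numerically; the sketch below uses two arbitrary vectors whose values are purely illustrative.

```python
import numpy as np

# Two arbitrary points in a low-dimensional space (values are illustrative only)
z_i = np.array([1.0, 2.0, -0.5])
z_j = np.array([0.5, -1.0, 2.0])

dist_sq = np.sum((z_i - z_j) ** 2)                   # squared Euclidean distance
b_ii, b_jj, b_ij = z_i @ z_i, z_j @ z_j, z_i @ z_j   # inner products
print(np.isclose(dist_sq, b_ii + b_jj - 2 * b_ij))   # True
```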

Step 2: Centering and Global Statistics

To solve for $b_{ij}$, we assume the embedded samples are centered, i.e. $\sum_{i=1}^m z_i = 0$, and introduce the following global distance statistics:

Average squared distance for sample $i$ (over all $j$):

$$dist_{i.}^2 = \frac{1}{m}\sum_{j=1}^m dist_{ij}^2$$

Average squared distance for sample $j$ (over all $i$):

$$dist_{.j}^2 = \frac{1}{m}\sum_{i=1}^m dist_{ij}^2$$

Global average squared distance:

$$dist_{..}^2 = \frac{1}{m^2}\sum_{i=1}^m\sum_{j=1}^m dist_{ij}^2$$
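
As a minimal sketch, these three statistics can be computed directly from the squared distance matrix; the function name distance_statistics is illustrative, and D is assumed to be an $m \times m$ matrix of (unsquared) distances.

```python
import numpy as np

def distance_statistics(D):
    """Row, column, and global means of the squared distances in D."""
    D2 = D ** 2
    dist_i_sq = D2.mean(axis=1)   # dist_{i.}^2 for each row i
    dist_j_sq = D2.mean(axis=0)   # dist_{.j}^2 for each column j
    dist_sq = D2.mean()           # dist_{..}^2
    return dist_i_sq, dist_j_sq, dist_sq
```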

Step 3: Solving for Inner Product Matrix

Through algebraic manipulation, we can solve for the inner product matrix $B$, where $B_{ij} = b_{ij}$:

$$b_{ij} = -\frac{1}{2}\left(dist_{ij}^2 - dist_{i.}^2 - dist_{.j}^2 + dist_{..}^2\right)$$

This formula converts distance information into inner product information, enabling eigenvalue decomposition.
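
A vectorized sketch of this conversion, assuming D is the $m \times m$ distance matrix (the function name inner_product_matrix is illustrative). Broadcasting applies the formula to every entry at once; the result matches the double-centering form $B = -\frac{1}{2} J D^{(2)} J$, where $D^{(2)}$ is the entrywise squared distance matrix and $J = I - \frac{1}{m}\mathbf{1}\mathbf{1}^T$.

```python
import numpy as np

def inner_product_matrix(D):
    """Convert a distance matrix D into the inner product matrix B."""
    D2 = D ** 2
    dist_i_sq = D2.mean(axis=1, keepdims=True)   # dist_{i.}^2 as a column vector
    dist_j_sq = D2.mean(axis=0, keepdims=True)   # dist_{.j}^2 as a row vector
    dist_sq = D2.mean()                          # dist_{..}^2
    # b_ij = -1/2 (dist_ij^2 - dist_{i.}^2 - dist_{.j}^2 + dist_{..}^2)
    return -0.5 * (D2 - dist_i_sq - dist_j_sq + dist_sq)
```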

Low-Dimensional Mapping via Eigenvalue Decomposition

Once we have the inner product matrix $B$, we can find the low-dimensional representation through eigenvalue decomposition.

Eigenvalue Decomposition

Decompose the inner product matrix:

$$B = V\Lambda V^T$$

Where $V$ contains the eigenvectors and $\Lambda$ contains the eigenvalues in descending order.

Select the top $d'$ eigenvalues and corresponding eigenvectors:

$$Z = \tilde{\Lambda}^{1/2} \tilde{V}^T$$

Where $\tilde{\Lambda}$ is the $d' \times d'$ diagonal matrix of the top eigenvalues and $\tilde{V}$ contains the corresponding eigenvectors. $Z \in \mathbb{R}^{d' \times m}$ is the low-dimensional representation.
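
A sketch of this step with NumPy's symmetric eigensolver (the function name embed_from_B is illustrative). If the input distances are not exactly Euclidean, $B$ can have small negative eigenvalues; clipping them to zero before taking the square root is a common practical safeguard.

```python
import numpy as np

def embed_from_B(B, d_prime):
    """Top-d' embedding Z (shape d' x m) from the inner product matrix B."""
    eigvals, eigvecs = np.linalg.eigh(B)          # ascending order for symmetric B
    idx = np.argsort(eigvals)[::-1][:d_prime]     # indices of the d' largest eigenvalues
    top_vals = np.clip(eigvals[idx], 0.0, None)   # guard against small negative values
    top_vecs = eigvecs[:, idx]
    return np.diag(np.sqrt(top_vals)) @ top_vecs.T   # Z = Lambda^{1/2} V^T
```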

Why This Works

The top eigenvalues capture the most variance in the distance structure. By keeping only the largest eigenvalues, we:

  • Preserve the most important distance relationships
  • Discard noise and redundant dimensions
  • Obtain the best rank-$d'$ approximation of the inner product matrix $B$
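
One common heuristic for choosing $d'$ is to check what fraction of the positive eigenvalue mass the leading eigenvalues account for; the short sketch below assumes $B$ has already been computed as in the previous step.

```python
import numpy as np

def explained_ratio(B, d_prime):
    """Fraction of positive eigenvalue mass captured by the top d' eigenvalues."""
    eigvals = np.linalg.eigvalsh(B)[::-1]   # descending order
    pos = np.clip(eigvals, 0.0, None)
    return pos[:d_prime].sum() / pos.sum()
```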

MDS Algorithm Steps

The complete MDS algorithm:

Step 1

Input Distance Matrix

Given a distance matrix $D \in \mathbb{R}^{m \times m}$, where $D_{ij} = dist_{ij}$ is the distance between samples $i$ and $j$.

Step 2

Compute Inner Product Matrix

Calculate the global statistics and solve for $B$:

$$b_{ij} = -\frac{1}{2}\left(dist_{ij}^2 - dist_{i.}^2 - dist_{.j}^2 + dist_{..}^2\right)$$

Step 3

Eigenvalue Decomposition

Decompose $B = V\Lambda V^T$ and select the top $d'$ eigenvalues and eigenvectors.

Step 4

Low-Dimensional Representation

Compute $Z = \tilde{\Lambda}^{1/2} \tilde{V}^T$, where each column is a low-dimensional sample.
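
Putting the four steps together, here is a compact end-to-end sketch (classical_mds is an illustrative name, not a library function). The usage example builds a Euclidean distance matrix from points that actually lie in a 2-D subspace of $\mathbb{R}^3$, so the 2-D embedding should reproduce the pairwise distances up to floating-point error.

```python
import numpy as np

def classical_mds(D, d_prime):
    """Classical MDS: distance matrix D (m x m) -> embedding Z (d' x m)."""
    D2 = D ** 2
    # Step 2: inner product matrix via the centering formula
    B = -0.5 * (D2 - D2.mean(axis=1, keepdims=True)
                   - D2.mean(axis=0, keepdims=True) + D2.mean())
    # Step 3: eigendecomposition, keep the d' largest eigenvalues
    eigvals, eigvecs = np.linalg.eigh(B)
    idx = np.argsort(eigvals)[::-1][:d_prime]
    vals = np.clip(eigvals[idx], 0.0, None)
    # Step 4: Z = Lambda^{1/2} V^T
    return np.diag(np.sqrt(vals)) @ eigvecs[:, idx].T

# Usage: 20 samples that live in a 2-D plane embedded in R^3
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 2)) @ rng.normal(size=(2, 20))        # data matrix, d x m
D = np.linalg.norm(X[:, :, None] - X[:, None, :], axis=0)     # Euclidean distances
Z = classical_mds(D, d_prime=2)
D_low = np.linalg.norm(Z[:, :, None] - Z[:, None, :], axis=0)
print(np.abs(D - D_low).max())   # close to zero: pairwise distances are preserved
```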

Linear Dimensionality Reduction Form

MDS is a linear dimensionality reduction method. All linear methods can be expressed as:

$$Z = W^T X$$

Where:

  • $X \in \mathbb{R}^{d \times m}$ is the original data matrix
  • $W \in \mathbb{R}^{d \times d'}$ is the projection matrix
  • $Z \in \mathbb{R}^{d' \times m}$ is the low-dimensional representation

Different linear methods differ in how they determine $W$:

  • MDS: $W$ preserves pairwise distances
  • PCA: $W$ maximizes variance
  • LDA: $W$ maximizes class separation
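
To make the shapes concrete, here is a minimal sketch of this generic projection form; the random $W$ is only a stand-in for whatever projection matrix a particular method (MDS, PCA, or LDA) would determine.

```python
import numpy as np

d, d_prime, m = 5, 2, 100
rng = np.random.default_rng(0)
X = rng.normal(size=(d, m))          # original data matrix, d x m
W = rng.normal(size=(d, d_prime))    # projection matrix, d x d'
Z = W.T @ X                          # low-dimensional representation, d' x m
print(X.shape, W.shape, Z.shape)     # (5, 100) (5, 2) (2, 100)
```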