Move beyond dimensionality reduction to learning optimal distance metrics. Understand how to learn metrics that bring similar samples closer together and push dissimilar samples farther apart.
Metric learning goes beyond dimensionality reduction by directly learning an optimal distance metric for a specific task. Instead of first finding a low-dimensional space, it learns a distance function under which similar samples are close and dissimilar samples are far apart.
Every low-dimensional space corresponds to a distance metric on the original space: if samples are projected by $\mathbf{P}$, then $\lVert \mathbf{P}^{\top}\boldsymbol{x}_i - \mathbf{P}^{\top}\boldsymbol{x}_j \rVert_2^2 = (\boldsymbol{x}_i - \boldsymbol{x}_j)^{\top} \mathbf{M} (\boldsymbol{x}_i - \boldsymbol{x}_j)$ with $\mathbf{M} = \mathbf{P}\mathbf{P}^{\top}$. Metric learning skips the explicit dimensionality reduction step and learns the optimal metric directly, which makes it easy to incorporate supervised information and domain knowledge.
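A quick numerical check of this correspondence in NumPy (the projection `P` here is arbitrary, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)
x, y = rng.normal(size=5), rng.normal(size=5)
P = rng.normal(size=(5, 2))               # an arbitrary projection to 2 dimensions

# Distance measured in the low-dimensional space ...
d_low = np.sum((P.T @ x - P.T @ y) ** 2)
# ... equals a Mahalanobis distance with M = P P^T in the original space.
M = P @ P.T
d_mahalanobis = (x - y) @ M @ (x - y)
print(np.isclose(d_low, d_mahalanobis))   # True
```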
Distance metrics have evolved from simple to sophisticated:
- Euclidean distance: $\operatorname{dist}^2_{\mathrm{ed}}(\boldsymbol{x}_i, \boldsymbol{x}_j) = \sum_{u=1}^{d} (x_{iu} - x_{ju})^2$. All features are weighted equally; simple, but ignores both feature importance and feature correlations.
- Weighted Euclidean distance: $\operatorname{dist}^2_{\mathrm{wed}}(\boldsymbol{x}_i, \boldsymbol{x}_j) = (\boldsymbol{x}_i - \boldsymbol{x}_j)^{\top} \mathbf{W} (\boldsymbol{x}_i - \boldsymbol{x}_j)$ with a diagonal weight matrix $\mathbf{W} = \operatorname{diag}(w_1, \dots, w_d)$, $w_u \geq 0$. Allows features to carry different importance, but still ignores feature correlations.
- Mahalanobis distance: $\operatorname{dist}^2_{\mathbf{M}}(\boldsymbol{x}_i, \boldsymbol{x}_j) = (\boldsymbol{x}_i - \boldsymbol{x}_j)^{\top} \mathbf{M} (\boldsymbol{x}_i - \boldsymbol{x}_j)$, where $\mathbf{M}$ is a general symmetric positive semi-definite matrix (the metric matrix) that captures both feature importance and correlations. This is what metric learning optimizes.
Constraint: $\mathbf{M} \succeq 0$ (positive semi-definiteness) ensures that $\operatorname{dist}_{\mathbf{M}}$ satisfies the distance properties (non-negativity, symmetry, triangle inequality).
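As a concrete illustration, here is a minimal NumPy sketch of the Mahalanobis distance together with a validity check on the metric matrix; the helper names (`mahalanobis_sq`, `is_valid_metric_matrix`) are ours, not from any library:

```python
import numpy as np

def mahalanobis_sq(x, y, M):
    """Squared Mahalanobis distance (x - y)^T M (x - y) under metric matrix M."""
    d = x - y
    return d @ M @ d

def is_valid_metric_matrix(M, tol=1e-10):
    """Check symmetry and positive semi-definiteness of M."""
    symmetric = np.allclose(M, M.T)
    psd = np.all(np.linalg.eigvalsh((M + M.T) / 2) >= -tol)
    return symmetric and psd

# The three stages above are special cases of the Mahalanobis distance:
x, y = np.array([1.0, 2.0]), np.array([3.0, 0.0])
I = np.eye(2)                       # M = I       -> ordinary Euclidean distance
W = np.diag([2.0, 0.5])             # diagonal M  -> weighted Euclidean distance
M = np.array([[2.0, 0.6],           # full PSD M  -> captures correlations too
              [0.6, 0.5]])
for name, metric in [("euclidean", I), ("weighted", W), ("full", M)]:
    assert is_valid_metric_matrix(metric)
    print(name, mahalanobis_sq(x, y, metric))
```

Choosing $\mathbf{M} = \mathbf{I}$ or a diagonal $\mathbf{M}$ recovers the two simpler metrics, so the Mahalanobis distance strictly generalizes them.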
NCA (Neighbourhood Component Analysis) is a supervised metric learning method that directly optimizes the expected leave-one-out accuracy of kNN classification.
For a projection matrix $\mathbf{P}$ with $\mathbf{M} = \mathbf{P}\mathbf{P}^{\top}$, define the probability that sample $\boldsymbol{x}_j$ acts as a neighbour of $\boldsymbol{x}_i$:

$$p_{ij} = \frac{\exp\!\left(-\lVert \mathbf{P}^{\top}\boldsymbol{x}_i - \mathbf{P}^{\top}\boldsymbol{x}_j \rVert_2^2\right)}{\sum_{l \neq i} \exp\!\left(-\lVert \mathbf{P}^{\top}\boldsymbol{x}_i - \mathbf{P}^{\top}\boldsymbol{x}_l \rVert_2^2\right)}, \qquad p_{ii} = 0,$$

so the probability that sample $\boldsymbol{x}_i$ is correctly classified by its neighbours is $p_i = \sum_{j \in \Omega_i} p_{ij}$, where $\Omega_i$ is the set of samples with the same class as $\boldsymbol{x}_i$.

NCA maximizes the sum of these probabilities over the whole training set:

$$\max_{\mathbf{P}} \; \sum_{i=1}^{m} \sum_{j \in \Omega_i} p_{ij}$$
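In practice NCA is available off the shelf, e.g. scikit-learn's `NeighborhoodComponentsAnalysis`. A minimal sketch, with dataset and hyperparameters chosen only for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import (KNeighborsClassifier,
                               NeighborhoodComponentsAnalysis)
from sklearn.pipeline import Pipeline

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# NCA learns the projection; kNN then classifies in the projected space.
model = Pipeline([
    ("nca", NeighborhoodComponentsAnalysis(n_components=2, random_state=0)),
    ("knn", KNeighborsClassifier(n_neighbors=3)),
])
model.fit(X_train, y_train)
print(f"kNN accuracy with learned metric: {model.score(X_test, y_test):.3f}")

# scikit-learn stores the projection as rows (shape r x d); in the notation
# above, P = A.T, so the learned metric matrix is M = A.T @ A.
A = model.named_steps["nca"].components_
M = A.T @ A
```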
Incorporate domain knowledge through constraints on similar and dissimilar pairs:
Given a set $\mathcal{M}$ of similar pairs (must-link) and a set $\mathcal{C}$ of dissimilar pairs (cannot-link), learn the metric matrix by solving:

$$\min_{\mathbf{M}} \; \sum_{(\boldsymbol{x}_i,\, \boldsymbol{x}_j) \in \mathcal{M}} \lVert \boldsymbol{x}_i - \boldsymbol{x}_j \rVert_{\mathbf{M}}^2$$

where $\lVert \boldsymbol{x}_i - \boldsymbol{x}_j \rVert_{\mathbf{M}}^2 = (\boldsymbol{x}_i - \boldsymbol{x}_j)^{\top} \mathbf{M} (\boldsymbol{x}_i - \boldsymbol{x}_j)$, subject to:

$$\sum_{(\boldsymbol{x}_i,\, \boldsymbol{x}_j) \in \mathcal{C}} \lVert \boldsymbol{x}_i - \boldsymbol{x}_j \rVert_{\mathbf{M}} \geq 1, \qquad \mathbf{M} \succeq 0$$
This formulation (Xing et al., 2002) minimizes the total distance between similar pairs while the constraint keeps dissimilar pairs spread apart; since the objective is linear in $\mathbf{M}$ and the constraints define convex sets, it can be solved via convex optimization. A toy solver is sketched below.
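The following NumPy sketch uses penalised gradient steps followed by projection back onto the PSD cone. It is an illustrative heuristic, not Xing et al.'s exact algorithm; dedicated implementations of this formulation exist (e.g. the `MMC` class in the `metric-learn` package):

```python
import numpy as np

def psd_project(M):
    """Project a symmetric matrix onto the PSD cone by clipping eigenvalues at zero."""
    w, V = np.linalg.eigh((M + M.T) / 2)
    return (V * np.clip(w, 0, None)) @ V.T

def learn_metric(similar, dissimilar, dim, n_iters=500, lr=0.05):
    """Toy solver: penalised gradient steps, then projection back to the PSD cone.
    similar / dissimilar: arrays of shape (n_pairs, 2, dim)."""
    M = np.eye(dim)
    for _ in range(n_iters):
        # Pull similar pairs together: grad of (x-y)^T M (x-y) w.r.t. M is (x-y)(x-y)^T.
        grad = sum(np.outer(x - y, x - y) for x, y in similar)
        # If the cannot-link constraint is violated, push dissimilar pairs apart.
        margin = sum(np.sqrt(max((x - y) @ M @ (x - y), 1e-12)) for x, y in dissimilar)
        if margin < 1.0:
            for x, y in dissimilar:
                d = x - y
                grad -= np.outer(d, d) / (2 * np.sqrt(max(d @ M @ d, 1e-12)))
        M = psd_project(M - lr * grad)
    return M

# Toy data: similar pairs have small gaps, dissimilar pairs large ones.
rng = np.random.default_rng(0)
base = rng.normal(size=(10, 1, 3))
similar = np.concatenate([base, base + 0.1 * rng.normal(size=(10, 1, 3))], axis=1)
dissimilar = rng.normal(size=(10, 2, 3)) * 2.0
M = learn_metric(similar, dissimilar, dim=3)
```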
In summary, metric learning:

- Learns metrics optimized for a specific learning task (e.g., kNN classification accuracy) rather than for generic variance preservation.
- Can incorporate class labels, pairwise constraints, or domain knowledge to guide the learned metric.
- Works directly in the original feature space, but if $\mathbf{M}$ has rank $r < d$ it implicitly performs dimensionality reduction: $\mathbf{M}$ factors as $\mathbf{M} = \mathbf{P}\mathbf{P}^{\top}$ with $\mathbf{P} \in \mathbb{R}^{d \times r}$, an explicit projection to $r$ dimensions (see the sketch below).
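A small NumPy sketch of recovering the projection from a rank-deficient metric matrix via its eigendecomposition (the helper name `projection_from_metric` is ours):

```python
import numpy as np

def projection_from_metric(M, r):
    """Recover a projection P (d x r) from a rank-r metric matrix M, so that
    dist_M(x, y) equals the Euclidean distance between P^T x and P^T y."""
    w, V = np.linalg.eigh(M)
    idx = np.argsort(w)[::-1][:r]               # keep the r largest eigenvalues
    return V[:, idx] * np.sqrt(np.clip(w[idx], 0, None))

# Example: a rank-2 metric on 4-dimensional data.
rng = np.random.default_rng(0)
A = rng.normal(size=(4, 2))
M = A @ A.T                                     # PSD, rank 2
P = projection_from_metric(M, r=2)

x, y = rng.normal(size=4), rng.normal(size=4)
d_metric = (x - y) @ M @ (x - y)                # Mahalanobis distance under M
d_projected = np.sum((P.T @ x - P.T @ y) ** 2)  # distance after projecting to 2-D
assert np.isclose(d_metric, d_projected)        # the two coincide
```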