Master graph-based semi-supervised learning through label propagation. Learn how to construct graphs, propagate labels along edges, and leverage the manifold assumption for classification.
Map data to an undirected graph where nodes represent samples and edges represent similarity. Labels from labeled samples propagate along edges to unlabeled samples, following the manifold assumption: nearby samples (connected by edges) should have similar labels.
The quality of the graph strongly affects label propagation. We use a Gaussian kernel function to compute edge weights based on sample similarity:

$$W_{ij} = \exp\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right)$$

Where:

- $x_i, x_j$ are the feature vectors of samples $i$ and $j$
- $\sigma$ is the bandwidth parameter

The bandwidth $\sigma$ controls how quickly similarity decreases with distance: a small $\sigma$ keeps only very close samples strongly connected, while a large $\sigma$ also assigns noticeable weight to more distant pairs.
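To make this step concrete, here is a minimal NumPy sketch of the weighting computation; the function name and the choice to zero out self-loops are illustrative assumptions, not part of the method described above.

```python
import numpy as np

def gaussian_kernel_weights(X, sigma=1.0):
    """Edge weights W[i, j] = exp(-||x_i - x_j||^2 / (2 * sigma^2))."""
    # Pairwise squared Euclidean distances between all samples
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    sq_dists = np.maximum(sq_dists, 0.0)   # clip tiny negatives from round-off

    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)               # assumption: no self-loops in the graph
    return W
```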
Labels propagate iteratively from labeled nodes to unlabeled nodes. We define a degree matrix $D$ and a propagation matrix $S$ to control the propagation process.
A diagonal matrix where each diagonal element is the sum of edge weights connected to that node:

$$D_{ii} = \sum_{j} W_{ij}$$
The degree measures how "well-connected" node i is in the graph.
For multi-class classification, we use symmetric normalization:

$$S = D^{-1/2} W D^{-1/2}$$
This symmetric normalization ensures stable propagation and convergence.
Alternative (Row Normalization):

$$P = D^{-1} W$$

Here $P_{ij}$ represents the transition probability from node $i$ to node $j$; this form is used in some label propagation variants.
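The sketch below (with helper names of our choosing) builds the degrees and both normalizations from a weight matrix `W`; the small constant guarding against isolated nodes is an added safeguard, not part of the formulas.

```python
import numpy as np

def propagation_matrices(W):
    """Return (S, P): symmetric and row normalizations of the weight matrix W."""
    d = W.sum(axis=1)                        # degrees D_ii = sum_j W_ij
    d_safe = np.maximum(d, 1e-12)            # safeguard for isolated (zero-degree) nodes
    d_inv_sqrt = 1.0 / np.sqrt(d_safe)
    d_inv = 1.0 / d_safe

    S = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]   # S = D^{-1/2} W D^{-1/2}
    P = d_inv[:, None] * W                               # P = D^{-1} W (row-stochastic)
    return S, P
```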
Let $F^{(t)}$ be the label matrix at iteration $t$, where $F^{(t)}_{ij}$ represents the probability that node $i$ belongs to class $j$:

$$F^{(t+1)} = \alpha S F^{(t)} + (1 - \alpha) Y$$

Where:

- $S$ is the propagation matrix
- $Y$ is the initial label matrix (one-hot rows for labeled samples, zero rows for unlabeled samples)
- $\alpha \in (0, 1)$ balances propagated information against the initial labels

As $t \to \infty$, the label matrix converges to:

$$F^{*} = (1 - \alpha)(I - \alpha S)^{-1} Y$$
This closed-form solution can be computed directly, but iterative updates are often more efficient for large graphs.
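Both strategies are sketched below under the formulation above; the default `alpha=0.99` and the convergence tolerance are illustrative choices, not values prescribed by the text.

```python
import numpy as np

def propagate_iterative(S, Y, alpha=0.99, max_iter=1000, tol=1e-6):
    """Iterate F <- alpha * S @ F + (1 - alpha) * Y until F stops changing."""
    F = Y.astype(float).copy()
    for _ in range(max_iter):
        F_next = alpha * (S @ F) + (1.0 - alpha) * Y
        if np.abs(F_next - F).max() < tol:   # convergence check
            return F_next
        F = F_next
    return F

def propagate_closed_form(S, Y, alpha=0.99):
    """Closed-form limit F* = (1 - alpha) * (I - alpha * S)^{-1} Y."""
    n = S.shape[0]
    return (1.0 - alpha) * np.linalg.solve(np.eye(n) - alpha * S, Y)
```

The closed form solves one linear system, which is attractive for small graphs; for large graphs the sparse iterative update is usually cheaper, matching the remark above.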
Compute edge weights $W_{ij}$ using the Gaussian kernel for all sample pairs. Construct the degree matrix $D$ and the propagation matrix $S = D^{-1/2} W D^{-1/2}$.
Initialize the label matrix $F^{(0)} = Y$: one-hot rows for labeled samples, zero rows for unlabeled samples.
Repeat until convergence:

$$F^{(t+1)} = \alpha S F^{(t)} + (1 - \alpha) Y$$

Stop when $\|F^{(t+1)} - F^{(t)}\| < \epsilon$ for a small tolerance $\epsilon$.
For unlabeled sample $i$, predict the class:

$$\hat{y}_i = \arg\max_{j} F^{*}_{ij}$$

That is, choose the class with the highest probability in the converged label matrix.
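Putting the steps together, here is an end-to-end sketch that reuses the helper functions above; the convention of marking unlabeled samples with `-1` is an assumption made for illustration.

```python
import numpy as np

def label_propagation(X, y, sigma=1.0, alpha=0.99, max_iter=1000, tol=1e-6):
    """End-to-end sketch; y uses -1 for unlabeled samples, 0..c-1 otherwise."""
    n = X.shape[0]
    classes = np.unique(y[y >= 0])

    # Step 1: graph construction and symmetric propagation matrix
    W = gaussian_kernel_weights(X, sigma)
    S, _ = propagation_matrices(W)

    # Step 2: F^(0) = Y with one-hot rows for labeled samples, zeros otherwise
    Y = np.zeros((n, len(classes)))
    for j, c in enumerate(classes):
        Y[y == c, j] = 1.0

    # Step 3: propagate until convergence
    F = propagate_iterative(S, Y, alpha=alpha, max_iter=max_iter, tol=tol)

    # Step 4: assign each node the class with the highest converged score
    return classes[np.argmax(F, axis=1)]
```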
Apply graph-based learning to classify 200 social network users into "Tech Enthusiast" or "Sports Fan" based on friendship connections. Only 50 users have known interests (labeled), while 150 users are unlabeled.
Build a graph with 200 nodes (users), with edge weights based on friendship strength.
Iteratively propagate labels from the 50 labeled users to the 150 unlabeled users.
Achieves 88% accuracy on the test set (vs. 75% using labeled samples only). Labels successfully propagate through friendship connections, leveraging the "birds of a feather" principle.
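A hypothetical sketch of this setup is shown below. The randomly generated friendship matrix and label assignments are placeholders only and will not reproduce the reported 88% accuracy; the point is simply how a precomputed adjacency matrix plugs into the helpers sketched earlier.

```python
import numpy as np

# Placeholder inputs: a 200x200 matrix of friendship strengths and interest
# labels with 0 = "Tech Enthusiast", 1 = "Sports Fan", -1 = unknown.
rng = np.random.default_rng(0)
W_friend = rng.random((200, 200))
W_friend = (W_friend + W_friend.T) / 2.0     # symmetric friendship strengths
np.fill_diagonal(W_friend, 0.0)

y = np.full(200, -1)
y[:50] = rng.integers(0, 2, size=50)         # 50 users with known interests

# Propagate interests along friendship edges, reusing the helpers above.
S, _ = propagation_matrices(W_friend)
Y = np.zeros((200, 2))
Y[np.arange(50), y[:50]] = 1.0               # one-hot rows for labeled users
F = propagate_iterative(S, Y, alpha=0.9)
predicted = np.argmax(F, axis=1)             # predicted interest for every user
```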