When 5 Labels "Infect" 95 Samples
You manage a social network of 100 people. Five have labeled interests — three like sports, two like TV shows. The other 95? No data.
Traditional supervised learning says: no labels, no predictions. Go collect more data. Time-consuming. Expensive. Tedious.
Graph-based semi-supervised learning says: hold on. Look at how those 95 relate to the labeled 5. Someone who works out with the sports fans? Probably likes sports. Always chatting about plot twists? Likely a TV show enthusiast.
Let the known labels spread through the "social network." Those 95 unknowns get answers fast.
Not magic. Graph structure + label propagation.
This article tackles three questions:
- What's a "graph"? (Turn data into a relationship network)
- How do labels propagate? (Labeled nodes "pull" their neighbors)
- Why does this work? (Two common-sense assumptions)
A "Graph" Isn't a Chart — It's a Relationship Network
Step one: turn all samples into a graph.
Don't let the term intimidate you. A graph has two components. That's it.
Nodes: Each Sample
- Classifying animal images? Each image = one node.
- Tagging users? Each user = one node.
- Detecting spam emails? Each email = one node.
100 samples = 100 nodes. Simple.
Edges: Similarity Relationships Between Samples
Two samples "look alike" or "have a connection"? Draw an edge between them.
Edges have thickness (weights):
- More similar → thicker edge (higher weight)
- Less similar → thinner edge (weight near 0, or no edge at all)
Real-World Analogy
Imagine a classroom. Each student is a node. Xiao Ming and Xiao Gang play basketball daily → thick edge (weight 0.9). Xiao Ming and Xiao Hong occasionally chat → thin edge (weight 0.3). Xiao Ming and Xiao Li never interact → no edge (weight 0). This "classroom relationship graph" works exactly like a data graph.
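If you prefer code to prose, here is a minimal sketch of that classroom graph as a symmetric weight matrix. The names and weights mirror the analogy above; the use of NumPy and the helper function are just illustrative choices.

```python
# A toy version of the classroom graph: students are nodes, interaction
# strength is the edge weight. Names and weights mirror the analogy above.
import numpy as np

students = ["Xiao Ming", "Xiao Gang", "Xiao Hong", "Xiao Li"]
W = np.zeros((len(students), len(students)))

def connect(a, b, weight):
    """Add an undirected edge with the given similarity weight."""
    i, j = students.index(a), students.index(b)
    W[i, j] = W[j, i] = weight

connect("Xiao Ming", "Xiao Gang", 0.9)   # play basketball daily -> thick edge
connect("Xiao Ming", "Xiao Hong", 0.3)   # occasionally chat -> thin edge
# Xiao Ming and Xiao Li never interact -> weight stays 0 (no edge)

print(W)
```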
Label Propagation: "Seed Nodes" Pull Neighbors Along
Graph built. Now the core question: how do a few labeled nodes help label the rest?
Answer: Label Propagation.
The Core Logic
Labeled nodes ("seed nodes") pass their class labels along edges to neighbors.
- Stronger connection (higher weight) → faster propagation
- Neighbors continue passing to their neighbors
- Iterate until the entire graph stabilizes — all nodes get labels
Classroom Network Example
Only 2 students have labeled interests:
- Xiao Ming (likes sports)
- Xiao Hong (likes TV shows)
Network:
- Xiao Ming ↔ Xiao Gang, Xiao Liang (thick edges, hang out often)
- Xiao Gang ↔ Xiao Wei (thin edge, occasional interaction)
- Xiao Hong ↔ Xiao Li, Xiao Mei (thick edges)
- Xiao Li ↔ Xiao Na (thin edge)
Propagation Process:
Round 1: Xiao Ming spreads "sports" to Xiao Gang, Xiao Liang (thick edges, strong influence). Xiao Hong spreads "TV shows" to Xiao Li, Xiao Mei.
Round 2: Xiao Gang spreads "sports" to Xiao Wei (thin edge, weaker influence). Xiao Li spreads "TV shows" to Xiao Na.
Round 3: All labels stabilize. Propagation stops.
Result: the entire class now has "interest labels." Xiao Gang, Xiao Liang, and Xiao Wei end up tagged "sports"; Xiao Li, Xiao Mei, and Xiao Na end up tagged "TV shows."
That's graph-based semi-supervised learning.
Implementation: 5 Steps to Label Propagation
Let's walk through a real task: classifying 100 animal images.
Scenario:
- 100 images, only 5 labeled (3 cats, 2 dogs)
- 95 unlabeled
- Goal: use graph semi-supervised learning to classify them
Step 1: Build the "Image Relationship Graph"
Nodes: 100 images, one node each.
Edges: Compute similarity between every pair.
- Two cat images: pointy ears, round eyes → high similarity → thick edge (weight 0.9)
- Cat vs dog: floppy ears vs pointy, long muzzle vs short → low similarity → thin edge (weight 0.2) or no edge
Weight Formula (Gaussian kernel):

$$w_{ij} = \exp\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right)$$

- $x_i, x_j$: feature vectors of the two images (color, texture, etc.)
- Closer distance → $w_{ij}$ approaches 1
- Farther distance → $w_{ij}$ approaches 0
Produces a 100×100 affinity matrix W, recording all pairwise similarities.
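Here is a minimal sketch of Step 1, assuming each image has already been reduced to a feature vector; the random features, the sigma value, and the array sizes are illustrative, not a prescribed implementation.

```python
# Step 1 sketch: turn feature vectors into a Gaussian-kernel affinity matrix W.
# The random features, sigma value, and sizes below are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((100, 32))          # 100 images, each as a 32-dim feature vector
sigma = 0.5                        # bandwidth: how quickly similarity decays

# Pairwise squared Euclidean distances, then w_ij = exp(-dist^2 / (2 * sigma^2)).
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
W = np.exp(-sq_dists / (2 * sigma ** 2))
np.fill_diagonal(W, 0)             # no self-loops

print(W.shape)                     # (100, 100) affinity matrix
```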
Step 2: Initialize Labels
Labeled nodes (5 total):
- 3 cat images: label = 1 (represents "cat")
- 2 dog images: label = 0 (represents "dog")
Unlabeled nodes (95 total):
- Initial label = 0.5 (neutral, "50% cat, 50% dog")
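A minimal sketch of Step 2 under the same assumptions; which five indices are treated as labeled is arbitrary here.

```python
# Step 2 sketch: probability vector P holds each node's "cat" probability.
import numpy as np

n = 100
P = np.full(n, 0.5)                  # every node starts neutral: 50% cat, 50% dog

cat_idx = [0, 1, 2]                  # 3 labeled cat images -> P = 1
dog_idx = [3, 4]                     # 2 labeled dog images -> P = 0
P[cat_idx] = 1.0
P[dog_idx] = 0.0

labeled = np.zeros(n, dtype=bool)    # mask used later to keep the seeds fixed
labeled[cat_idx + dog_idx] = True
```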
Step 3: Iterative Propagation
Each unlabeled node "listens to its neighbors."
Propagation Formula:

$$P_i^{(t+1)} = \frac{\sum_j w_{ij}\, P_j^{(t)}}{\sum_j w_{ij}}$$

- $P_i^{(t+1)}$: node i's "cat" probability at round t+1
- $w_{ij}$: edge weight between node i and node j
- $P_j^{(t)}$: neighbor j's "cat" probability at round t
Example:
An unlabeled node has 3 neighbors:
- Neighbor A (cat, P=1, edge weight 0.8)
- Neighbor B (cat, P=1, edge weight 0.7)
- Neighbor C (dog, P=0, edge weight 0.4)
Calculation:

$$P = \frac{0.8 \times 1 + 0.7 \times 1 + 0.4 \times 0}{0.8 + 0.7 + 0.4} = \frac{1.5}{1.9} \approx 0.79$$

This node's "cat" probability becomes 0.79 — likely a cat.
Key Constraint
Labeled nodes' labels are always fixed. Cats stay at 1, dogs stay at 0. They're the "seeds" ensuring propagation doesn't drift from true classes.
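A minimal sketch of one propagation round, including the clamping of seed nodes. The tiny four-node graph below reuses the three-neighbor example, so node 3 should land near 0.79; everything else (function name, graph layout) is an illustrative choice.

```python
# Step 3 sketch: one propagation round. Each node takes the weight-averaged
# "cat" probability of its neighbors; labeled seeds are reset afterwards.
import numpy as np

def propagate_once(W, P, labeled, seeds):
    """One round of label propagation with clamped seed nodes."""
    degree = W.sum(axis=1)                                  # sum_j w_ij
    P_new = np.divide(W @ P, degree, out=P.copy(), where=degree > 0)
    P_new[labeled] = seeds[labeled]                         # seeds never drift
    return P_new

# Nodes 0-2 are neighbors A (cat), B (cat), C (dog); node 3 is unlabeled.
W = np.zeros((4, 4))
W[3, 0] = W[0, 3] = 0.8
W[3, 1] = W[1, 3] = 0.7
W[3, 2] = W[2, 3] = 0.4
P = np.array([1.0, 1.0, 0.0, 0.5])
labeled = np.array([True, True, True, False])

print(propagate_once(W, P, labeled, P.copy())[3])           # ~0.789 -> likely a cat
```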
Step 4: Check for Convergence
When to stop propagating? Two criteria.
Criterion 1: Probability change below threshold
If all unlabeled nodes' probabilities change by less than ε (e.g., 10⁻⁵) between two consecutive iterations:

$$\max_i \left| P_i^{(t+1)} - P_i^{(t)} \right| < \varepsilon$$

Labels have stabilized. Stop.
Criterion 2: Maximum iterations reached
Set an upper limit (e.g., 100 rounds). Hit it? Stop, avoiding infinite loops.
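Putting Steps 3 and 4 together, a minimal sketch of the full loop; it assumes W, the initial probabilities, and the labeled mask come from the earlier sketches.

```python
# Steps 3-4 sketch: iterate until the largest probability change drops below
# epsilon (Criterion 1) or max_iter rounds have run (Criterion 2).
import numpy as np

def label_propagation(W, P_init, labeled, eps=1e-5, max_iter=100):
    """Run label propagation to convergence; returns final probabilities."""
    P = P_init.copy()
    seeds = P_init.copy()
    degree = W.sum(axis=1)
    for _ in range(max_iter):
        P_new = np.divide(W @ P, degree, out=P.copy(), where=degree > 0)
        P_new[labeled] = seeds[labeled]        # key constraint: clamp the seeds
        if np.max(np.abs(P_new - P)) < eps:    # Criterion 1: change below threshold
            return P_new
        P = P_new
    return P                                   # Criterion 2: max iterations reached

# Usage (with W, P, labeled from the earlier sketches):
# P_final = label_propagation(W, P, labeled)
```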
Step 5: Predict Final Classes
After propagation stops, determine each node's class by its probability:
- P > 0.5 → predict cat (class 1)
- P < 0.5 → predict dog (class 0)
- P = 0.5 → random assignment (rare)
95 unlabeled images now have "cat" or "dog" predictions. Done.
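The same idea is also available off the shelf. Below is a generic sketch using scikit-learn's LabelPropagation on stand-in random data, not the exact hand-rolled procedure above; its rbf kernel plays the role of the Gaussian weights, and -1 marks unlabeled samples.

```python
# Off-the-shelf alternative: scikit-learn's LabelPropagation.
# The feature matrix here is random stand-in data, not real image features.
import numpy as np
from sklearn.semi_supervised import LabelPropagation

rng = np.random.default_rng(0)
X = rng.random((100, 32))                 # illustrative feature vectors
y = np.full(100, -1)                      # -1 means "unlabeled"
y[:3] = 1                                 # 3 labeled cats
y[3:5] = 0                                # 2 labeled dogs

model = LabelPropagation(kernel="rbf", gamma=20, max_iter=1000)
model.fit(X, y)
print(model.transduction_[:10])           # predicted classes for the first 10 nodes
```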
Why Does This Work? Two Common-Sense Assumptions
Graph-based semi-supervised learning relies on two assumptions aligned with reality.
Assumption 1: Homophily
"Close = similar": nodes with stronger connections (higher similarity) likely share the same class.
Like your friends — their interests probably match yours. Daily gym buddies? Probably all into fitness.
Same with graphs: two cat images with high similarity (thick edge) probably belong to the same class.
Assumption 2: Cluster Assumption
"Clusters match classes": if a group of nodes connects tightly (forming a "clique"), they likely share the same class.
Classroom friend groups often have similar interests. Data's no different: similar cat images cluster together, dog images form another cluster.
These assumptions aren't perfect, but they work for most scenarios.
Binary Classification: Simpler Propagation
Earlier we covered the general framework. Binary classification (e.g., spam vs legitimate email) offers some conveniences.
Difference 1: Only One Probability Needed
Multi-class: maintain "cat probability," "dog probability," "bird probability"...
Binary: one probability suffices — P(positive class). Negative class probability = 1 - P automatically.
Difference 2: Simpler Threshold Decision
Multi-class: compare multiple class probabilities, pick the highest.
Binary: check whether P > 0.5. Done.
Difference 3: More Intuitive Initialization
- Positive class (e.g., spam): P = 1
- Negative class (e.g., legitimate): P = 0
- Unlabeled: P = 0.5 (neutral)
Propagation formula stays identical, just less computation.
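A minimal sketch of how little the binary case needs; the probability values below are made up for illustration.

```python
# Binary case sketch: one probability per node is enough; thresholding gives the class.
import numpy as np

P_spam = np.array([0.92, 0.50, 0.13, 0.71])   # P(positive class) after propagation
P_legit = 1 - P_spam                          # negative class comes for free
predicted = (P_spam > 0.5).astype(int)        # 1 = spam, 0 = legitimate
print(predicted)                              # [1 0 0 1]; exactly 0.5 falls to 0 here
```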
Graph Semi-Supervised vs Other Semi-Supervised Methods
| Aspect | Graph Semi-Supervised | Self-Training | Co-Training |
|---|---|---|---|
| Core Idea | Leverage sample relationship network | Model trains on its own predictions | Two feature views train each other |
| Requires | Sample similarity measure | Initial classifier | Separable feature sets |
| Pros | Intuitive, interpretable (see propagation paths) | Simple, no graph construction | Exploits feature diversity |
| Cons | Large datasets = huge graph, slow computation | Error accumulation | Needs feature independence |
Graph semi-supervised's biggest strength: visualization. You see how labels spread along edges, which nodes influenced others.
Biggest weakness: scalability. 100,000 nodes? Affinity matrix = 100,000 × 100,000. Can't store it, can't compute it. New sample arrives? Rebuild the entire graph.
Real-World Applications
1. Social Network Analysis
Problem: Predict user interests, but only a few filled out surveys.
Solution:
- Build user relationship graph (follows, likes, comments)
- Label propagation: interest labels spread through social ties
- Result: non-survey users get interest predictions
2. Image Classification (Scarce Labels)
Problem: Medical imaging data, annotation extremely costly (requires experts). 1,000 scans, only 50 labeled.
Solution:
- Compute image similarity (color, texture, shape)
- Build graph, 50 labeled nodes as seeds
- Propagate: similar lesion scans get similar labels
3. Text Classification
Problem: News classification, 100 labeled articles, 10,000 unlabeled.
Solution:
- Compute text similarity (TF-IDF, word vectors)
- Build document relationship graph
- Propagate: topically similar documents cluster, auto-gain categories
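A minimal sketch of the first two bullets above: TF-IDF vectors plus cosine similarity produce the document affinity matrix, which then feeds the same propagation loop as before. The four sentences are placeholder documents.

```python
# Text pipeline sketch: TF-IDF vectors plus cosine similarity give the
# document-to-document affinity matrix W.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "stock markets rally as tech shares climb",
    "championship final goes to extra time",
    "new smartphone chip boosts market share",
    "striker scores twice in the cup final",
]
X = TfidfVectorizer().fit_transform(docs)
W = cosine_similarity(X)          # document-to-document similarity matrix
print(W.round(2))                 # feed W into the same propagation loop as before
```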
Limitations and Future Directions
Limitation 1: High Computational Complexity
Problem: Affinity matrix W is n × n (n = sample count). 100,000 samples = 10 billion elements.
Improvement:
- Sparsification: only keep k-nearest-neighbor edges (k-NN graph); see the sketch after this list
- Approximation algorithms: use sampling or dimensionality reduction
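As a sketch of the sparsification route, scikit-learn's kneighbors_graph builds a k-NN graph directly as a sparse matrix; the sample count, feature dimension, and k value below are arbitrary.

```python
# Sparsification sketch: keep only each node's k nearest neighbors instead of
# the full n x n matrix, via scikit-learn's kneighbors_graph.
import numpy as np
from sklearn.neighbors import kneighbors_graph

rng = np.random.default_rng(0)
X = rng.random((10_000, 32))      # a dense 10,000 x 10,000 graph would already hold 100M entries

# Sparse adjacency: each row keeps only k = 10 neighbors (0/1 connectivity here;
# mode="distance" would keep the distances instead).
W_sparse = kneighbors_graph(X, n_neighbors=10, mode="connectivity", include_self=False)
print(W_sparse.shape, W_sparse.nnz)   # (10000, 10000) with only ~100,000 stored entries
```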
Limitation 2: New Samples Require Graph Rebuild
Problem: New sample arrives, entire graph needs recomputation.
Improvement:
- Incremental graph construction: only compute new sample's edges to existing nodes
- Graph Neural Networks (GNNs): learn graph representations, support dynamic updates
Limitation 3: Depends on Similarity Accuracy
Problem: If the similarity measure is wrong (say it scores a cat image and a dog image as highly similar), the wrong labels propagate.
Improvement:
- Better feature extraction (deep learning)
- Multi-view fusion (combine different similarity metrics)
Key Takeaways
- Graph essence: Turn samples into nodes, similarity into edges, build relationship network.
- Label propagation logic:
- Labeled nodes are seeds
- Labels spread along edges (higher similarity = faster spread)
- Iterate until stable
- Two assumptions:
- Homophily: similar samples → same class
- Cluster assumption: tightly connected nodes → same class
- Binary classification advantages:
- Only one probability needed
- Simple threshold decision (P > 0.5)
- vs Other methods:
- Pros: intuitive, interpretable
- Cons: slow, not scalable
- Applications: Social networks, medical imaging, text classification — any scenario with "sample relationships"