
Graph Semi-Supervised Learning: When 5 Labels Spread to 95 Samples

How label propagation turns relationship networks into powerful classifiers

2026-01-25
Semi-Supervised Learning
Graph-Based Learning
Label Propagation
Classification
Unlabeled Data

When 5 Labels "Infect" 95 Samples

You manage a social network of 100 people. Five have labeled interests — three like sports, two like TV shows. The other 95? No data.

Traditional supervised learning says: no labels, no predictions. Go collect more data. Time-consuming. Expensive. Tedious.

Graph-based semi-supervised learning says: hold on. Look at how those 95 relate to the labeled 5. Someone who works out with the sports fans? Probably likes sports. Always chatting about plot twists? Likely a TV show enthusiast.

Let the known labels spread through the "social network." Those 95 unknowns get answers fast.

Not magic. Graph structure + label propagation.

This article tackles three questions:

  1. What's a "graph"? (Turn data into a relationship network)
  2. How do labels propagate? (Labeled nodes "pull" their neighbors)
  3. Why does this work? (Two common-sense assumptions)

A "Graph" Isn't a Chart — It's a Relationship Network

Step one: turn all samples into a graph.

Don't let the term intimidate you. A graph has two components. That's it.

Nodes: Each Sample

  • Classifying animal images? Each image = one node.
  • Tagging users? Each user = one node.
  • Detecting spam emails? Each email = one node.

100 samples = 100 nodes. Simple.

Edges: Similarity Relationships Between Samples

Two samples "look alike" or "have a connection"? Draw an edge between them.

Edges have thickness (weights):

  • More similar → thicker edge (higher weight)
  • Less similar → thinner edge (weight near 0, or no edge at all)

Real-World Analogy

Imagine a classroom. Each student is a node. Xiao Ming and Xiao Gang play basketball daily → thick edge (weight 0.9). Xiao Ming and Xiao Hong occasionally chat → thin edge (weight 0.3). Xiao Ming and Xiao Li never interact → no edge (weight 0). This "classroom relationship graph" works exactly like a data graph.
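
To make the structure concrete, here is a minimal sketch of how such a weighted relationship graph could be stored in code. The names and weights simply mirror the classroom analogy above; it's an illustration, not a required data structure.

```python
# A weighted graph as nested dictionaries: each key is a node (student),
# each value maps neighbors to edge weights (similarity strengths).
# Names and weights mirror the classroom analogy and are purely illustrative.
classroom_graph = {
    "Xiao Ming": {"Xiao Gang": 0.9, "Xiao Hong": 0.3},
    "Xiao Gang": {"Xiao Ming": 0.9},
    "Xiao Hong": {"Xiao Ming": 0.3},
    # Xiao Ming and Xiao Li never interact: no entry at all (weight 0).
}
```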

Label Propagation: "Seed Nodes" Pull Neighbors Along

Graph built. Now the core question: how do a few labeled nodes help label the rest?

Answer: Label Propagation.

The Core Logic

Labeled nodes ("seed nodes") pass their class labels along edges to neighbors.

  • Stronger connection (higher weight) → faster propagation
  • Neighbors continue passing to their neighbors
  • Iterate until the entire graph stabilizes — all nodes get labels

Classroom Network Example

Only 2 students have labeled interests:

  • Xiao Ming (likes sports)
  • Xiao Hong (likes TV shows)

Network:

  • Xiao Ming ↔ Xiao Gang, Xiao Liang (thick edges, hang out often)
  • Xiao Gang ↔ Xiao Wei (thin edge, occasional interaction)
  • Xiao Hong ↔ Xiao Li, Xiao Mei (thick edges)
  • Xiao Li ↔ Xiao Na (thin edge)

Propagation Process:

Round 1: Xiao Ming spreads "sports" to Xiao Gang, Xiao Liang (thick edges, strong influence). Xiao Hong spreads "TV shows" to Xiao Li, Xiao Mei.

Round 2: Xiao Gang spreads "sports" to Xiao Wei (thin edge, weaker influence). Xiao Li spreads "TV shows" to Xiao Na.

Round 3: All labels stabilize. Propagation stops.

Result: the entire class has "interest labels." The three students in Xiao Ming's circle (Xiao Gang, Xiao Liang, Xiao Wei) are tagged "sports." The three in Xiao Hong's circle (Xiao Li, Xiao Mei, Xiao Na) are tagged "TV shows."

That's graph-based semi-supervised learning.

Implementation: 5 Steps to Label Propagation

Let's walk through a real task: classifying 100 animal images.

Scenario:

  • 100 images, only 5 labeled (3 cats, 2 dogs)
  • 95 unlabeled
  • Goal: use graph semi-supervised learning to classify them

Step 1: Build the "Image Relationship Graph"

Nodes: 100 images, one node each.

Edges: Compute similarity between every pair.

  • Two cat images: pointy ears, round eyes → high similarity → thick edge (weight 0.9)
  • Cat vs dog: floppy ears vs pointy, long muzzle vs short → low similarity → thin edge (weight 0.2) or no edge

Weight Formula (Gaussian kernel):

W_{ij} = \exp\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right)

  • x_i, x_j: feature vectors of two images (color, texture, etc.)
  • Closer distance → W_{ij} approaches 1
  • Farther distance → W_{ij} approaches 0

Produces a 100×100 affinity matrix W, recording all pairwise similarities.
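
A rough NumPy sketch of this step, assuming the 100 images have already been converted to feature vectors (the `features` array below is a random placeholder standing in for real color/texture features):

```python
import numpy as np

def gaussian_affinity(features: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """W[i, j] = exp(-||x_i - x_j||^2 / (2 * sigma^2)) for all pairs."""
    # Pairwise squared Euclidean distances between feature vectors.
    sq_dists = np.sum((features[:, None, :] - features[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-loops: a node averages over neighbors, not itself
    return W

features = np.random.rand(100, 64)   # placeholder: 100 images, 64-dim features
W = gaussian_affinity(features, sigma=0.5)
print(W.shape)                       # (100, 100) affinity matrix
```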

Step 2: Initialize Labels

Labeled nodes (5 total):

  • 3 cat images: label = 1 (represents "cat")
  • 2 dog images: label = 0 (represents "dog")

Unlabeled nodes (95 total):

  • Initial label = 0.5 (neutral, "50% cat, 50% dog")
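
Continuing the sketch, initialization is just a probability vector. For illustration, assume the 5 labeled images happen to be the first 5 rows:

```python
import numpy as np

n = 100
P = np.full(n, 0.5)               # 95 unlabeled nodes start neutral at 0.5
P[0:3] = 1.0                      # 3 labeled cat images: P = 1
P[3:5] = 0.0                      # 2 labeled dog images: P = 0

labeled_idx = np.arange(5)        # indices of the seed nodes
P_seed = P[labeled_idx].copy()    # remembered so seeds can be clamped during propagation
```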

Step 3: Iterative Propagation

Each unlabeled node "listens to its neighbors."

Propagation Formula:

P_i^{(t+1)} = \frac{\sum_{j \in \text{neighbors}} W_{ij} \cdot P_j^{(t)}}{\sum_{j \in \text{neighbors}} W_{ij}}

  • P_i^{(t+1)}: node i's "cat" probability at round t+1
  • W_{ij}: edge weight between node i and node j
  • P_j^{(t)}: neighbor j's "cat" probability at round t

Example:

An unlabeled node has 3 neighbors:

  • Neighbor A (cat, P=1, edge weight 0.8)
  • Neighbor B (cat, P=1, edge weight 0.7)
  • Neighbor C (dog, P=0, edge weight 0.4)

Calculation:

P = \frac{0.8 \times 1 + 0.7 \times 1 + 0.4 \times 0}{0.8 + 0.7 + 0.4} = \frac{1.5}{1.9} \approx 0.79

This node's "cat" probability becomes 0.79 — likely a cat.
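
The same arithmetic in code, as a one-node version of the propagation formula (`propagate_node` is just an illustrative helper name):

```python
import numpy as np

def propagate_node(weights, neighbor_probs):
    """Weighted average of neighbors' current 'cat' probabilities."""
    weights = np.asarray(weights, dtype=float)
    neighbor_probs = np.asarray(neighbor_probs, dtype=float)
    return float(weights @ neighbor_probs / weights.sum())

print(propagate_node([0.8, 0.7, 0.4], [1.0, 1.0, 0.0]))  # ≈ 0.79
```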

Key Constraint

Labeled nodes' labels are always fixed. Cats stay at 1, dogs stay at 0. They're the "seeds" ensuring propagation doesn't drift from true classes.

Step 4: Check for Convergence

When to stop propagating? Two criteria.

Criterion 1: Probability change below threshold

If every unlabeled node's probability changes by less than ε (e.g., 10⁻⁵) between two consecutive iterations:

|P_i^{(t+1)} - P_i^{(t)}| < \epsilon \quad \text{(for all unlabeled nodes)}

Labels stabilized. Stop.

Criterion 2: Maximum iterations reached

Set an upper limit (e.g., 100 rounds). Hit it? Stop, avoiding infinite loops.
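
Putting Steps 3 and 4 together, a minimal propagation loop could look like the sketch below. It reuses the `W`, `P`, `labeled_idx`, and `P_seed` variables from the earlier sketches, and `eps` / `max_iter` correspond to the two stopping criteria just described:

```python
import numpy as np

def label_propagation(W, P, labeled_idx, P_seed, eps=1e-5, max_iter=100):
    """Repeatedly replace each node's probability with the weighted average
    of its neighbors, keeping the labeled seed nodes clamped."""
    P = P.copy()
    row_sums = W.sum(axis=1)
    row_sums[row_sums == 0] = 1.0                 # guard against isolated nodes
    for _ in range(max_iter):                     # Criterion 2: iteration cap
        P_new = (W @ P) / row_sums                # Step 3: weighted neighbor average
        P_new[labeled_idx] = P_seed               # key constraint: seeds never move
        converged = np.max(np.abs(P_new - P)) < eps   # Criterion 1: change below ε
        P = P_new
        if converged:
            break
    return P

P_final = label_propagation(W, P, labeled_idx, P_seed)
print(P_final[:10])                               # "cat" probabilities after convergence
```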

Step 5: Predict Final Classes

After propagation stops, determine each node's class by its probability:

  • P > 0.5 → predict cat (class 1)
  • P < 0.5 → predict dog (class 0)
  • P = 0.5 → random assignment (rare)

95 unlabeled images now have "cat" or "dog" predictions. Done.
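
In practice you rarely need to hand-roll this loop. scikit-learn ships a LabelPropagation estimator built on the same idea; here is a minimal usage sketch with a placeholder feature matrix (the library marks unlabeled samples with -1):

```python
import numpy as np
from sklearn.semi_supervised import LabelPropagation

X = np.random.rand(100, 64)    # placeholder features for the 100 images
y = np.full(100, -1)           # -1 = unlabeled, per scikit-learn's convention
y[0:3] = 1                     # 3 labeled cats
y[3:5] = 0                     # 2 labeled dogs

model = LabelPropagation(kernel="rbf", gamma=20)
model.fit(X, y)
print(model.transduction_[:10])   # predicted class (0 or 1) for the first 10 images
```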

Why Does This Work? Two Common-Sense Assumptions

Graph-based semi-supervised learning relies on two assumptions aligned with reality.

Assumption 1: Homophily

"Close = similar": nodes with stronger connections (higher similarity) likely share the same class.

Like your friends — their interests probably match yours. Daily gym buddies? Probably all into fitness.

Same with graphs: two cat images with high similarity (thick edge) probably belong to the same class.

Assumption 2: Cluster Assumption

"Clusters match classes": if a group of nodes connects tightly (forming a "clique"), they likely share the same class.

Classroom friend groups often have similar interests. Data's no different: similar cat images cluster together, dog images form another cluster.

These assumptions aren't perfect, but they work for most scenarios.

Binary Classification: Simpler Propagation

Earlier we covered the general framework; in fact, the cat-vs-dog walkthrough was already a binary case. Binary classification (e.g., spam vs legitimate email) offers some conveniences worth spelling out.

Difference 1: Only One Probability Needed

Multi-class: maintain "cat probability," "dog probability," "bird probability"...

Binary: one probability suffices — P(positive class). Negative class probability = 1 - P automatically.

Difference 2: Simpler Threshold Decision

Multi-class: compare multiple class probabilities, pick the highest.

Binary: check if P > 0.5. Done.

Difference 3: More Intuitive Initialization

  • Positive class (e.g., spam): P = 1
  • Negative class (e.g., legitimate): P = 0
  • Unlabeled: P = 0.5 (neutral)

Propagation formula stays identical, just less computation.

Graph Semi-Supervised vs Other Semi-Supervised Methods

| Aspect | Graph Semi-Supervised | Self-Training | Co-Training |
| --- | --- | --- | --- |
| Core Idea | Leverage sample relationship network | Model trains on its own predictions | Two feature views train each other |
| Requires | Sample similarity measure | Initial classifier | Separable feature sets |
| Pros | Intuitive, interpretable (see propagation paths) | Simple, no graph construction | Exploits feature diversity |
| Cons | Large datasets = huge graph, slow computation | Error accumulation | Needs feature independence |

The biggest strength of graph-based methods: interpretability. You can visualize how labels spread along edges and which nodes influenced others.

Biggest weakness: scalability. 100,000 nodes? Affinity matrix = 100,000 × 100,000. Can't store it, can't compute it. New sample arrives? Rebuild the entire graph.

Real-World Applications

1. Social Network Analysis

Problem: Predict user interests, but only a few filled out surveys.

Solution:

  • Build user relationship graph (follows, likes, comments)
  • Label propagation: interest labels spread through social ties
  • Result: non-survey users get interest predictions

2. Image Classification (Scarce Labels)

Problem: Medical imaging data, annotation extremely costly (requires experts). 1,000 scans, only 50 labeled.

Solution:

  • Compute image similarity (color, texture, shape)
  • Build graph, 50 labeled nodes as seeds
  • Propagate: similar lesion scans get similar labels

3. Text Classification

Problem: News classification, 100 labeled articles, 10,000 unlabeled.

Solution:

  • Compute text similarity (TF-IDF, word vectors)
  • Build document relationship graph
  • Propagate: topically similar documents cluster, auto-gain categories

Limitations and Future Directions

Limitation 1: High Computational Complexity

Problem: Affinity matrix W is n × n (n = sample count). 100,000 samples = 10 billion elements.

Improvement:

  • Sparsification: only keep k-nearest neighbor edges (k-NN graph)
  • Approximation algorithms: use sampling or dimensionality reduction
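
As a quick sketch of the k-NN sparsification idea, scikit-learn's kneighbors_graph builds exactly this kind of sparse graph; the feature matrix and k below are illustrative:

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

X = np.random.rand(10_000, 64)   # placeholder features; a dense 10,000 × 10,000 W is already heavy
# Keep only each sample's 10 nearest neighbors instead of all pairwise edges.
# The stored distances could then be mapped through the Gaussian kernel to get sparse weights.
W_sparse = kneighbors_graph(X, n_neighbors=10, mode="distance", include_self=False)
print(W_sparse.shape, W_sparse.nnz)   # (10000, 10000), roughly 100,000 stored edges
```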

Limitation 2: New Samples Require Graph Rebuild

Problem: New sample arrives, entire graph needs recomputation.

Improvement:

  • Incremental graph construction: only compute new sample's edges to existing nodes
  • Graph Neural Networks (GNNs): learn graph representations, support dynamic updates

Limitation 3: Depends on Similarity Accuracy

Problem: If the similarity measure is wrong (e.g., it rates a cat image and a dog image as highly similar), wrong labels propagate along the wrong edges.

Improvement:

  • Better feature extraction (deep learning)
  • Multi-view fusion (combine different similarity metrics)

Key Takeaways

  1. Graph essence: Turn samples into nodes, similarity into edges, build relationship network.
  2. Label propagation logic:
    • Labeled nodes are seeds
    • Labels spread along edges (higher similarity = faster spread)
    • Iterate until stable
  3. Two assumptions:
    • Homophily: similar samples → same class
    • Cluster assumption: tightly connected nodes → same class
  4. Binary classification advantages:
    • Only one probability needed
    • Simple threshold decision (P > 0.5)
  5. vs Other methods:
    • Pros: intuitive, interpretable
    • Cons: slow, not scalable
  6. Applications: Social networks, medical imaging, text classification — any scenario with "sample relationships"

Ready to explore semi-supervised learning?

Master graph-based methods, label propagation algorithms, and other semi-supervised techniques. Learn when to leverage unlabeled data and how to build robust classifiers with minimal supervision.
