
Graph Semi-Supervised Learning: When 5 Labels Spread to 95 Samples

How label propagation turns relationship networks into powerful classifiers

2026-01-25
Semi-Supervised Learning
Graph-Based Learning
Label Propagation
Classification
Unlabeled Data

When 5 Labels "Infect" 95 Samples

You manage a social network of 100 people. Five have labeled interests — three like sports, two like TV shows. The other 95? No data.

Traditional supervised learning says: no labels, no predictions. Go collect more data. Time-consuming. Expensive. Tedious.

Graph-based semi-supervised learning says: hold on. Look at how those 95 relate to the labeled 5. Someone who works out with the sports fans? Probably likes sports. Always chatting about plot twists? Likely a TV show enthusiast.

Let the known labels spread through the "social network." Those 95 unknowns get answers fast.

Not magic. Graph structure + label propagation.

This article tackles three questions:

  1. What's a "graph"? (Turn data into a relationship network)
  2. How do labels propagate? (Labeled nodes "pull" their neighbors)
  3. Why does this work? (Two common-sense assumptions)

A "Graph" Isn't a Chart — It's a Relationship Network

Step one: turn all samples into a graph.

Don't let the term intimidate you. A graph has two components. That's it.

Nodes: Each Sample

  • Classifying animal images? Each image = one node.
  • Tagging users? Each user = one node.
  • Detecting spam emails? Each email = one node.

100 samples = 100 nodes. Simple.

Edges: Similarity Relationships Between Samples

Two samples "look alike" or "have a connection"? Draw an edge between them.

Edges have thickness (weights):

  • More similar → thicker edge (higher weight)
  • Less similar → thinner edge (weight near 0, or no edge at all)

Real-World Analogy

Imagine a classroom. Each student is a node. Xiao Ming and Xiao Gang play basketball daily → thick edge (weight 0.9). Xiao Ming and Xiao Hong occasionally chat → thin edge (weight 0.3). Xiao Ming and Xiao Li never interact → no edge (weight 0). This "classroom relationship graph" works exactly like a data graph.
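
To make the structure concrete, here is a minimal sketch of how such a weighted relationship graph could be stored in code. The names and weights simply mirror the classroom analogy above; it's an illustration, not a required data structure.

```python
# A weighted graph as nested dictionaries: each key is a node (student),
# each value maps neighbors to edge weights (similarity strengths).
# Names and weights mirror the classroom analogy and are purely illustrative.
classroom_graph = {
    "Xiao Ming": {"Xiao Gang": 0.9, "Xiao Hong": 0.3},
    "Xiao Gang": {"Xiao Ming": 0.9},
    "Xiao Hong": {"Xiao Ming": 0.3},
    # Xiao Ming and Xiao Li never interact: no entry at all (weight 0).
}
```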

Label Propagation: "Seed Nodes" Pull Neighbors Along

Graph built. Now the core question: how do a few labeled nodes help label the rest?

Answer: Label Propagation.

The Core Logic

Labeled nodes ("seed nodes") pass their class labels along edges to neighbors.

  • Stronger connection (higher weight) → faster propagation
  • Neighbors continue passing to their neighbors
  • Iterate until the entire graph stabilizes — all nodes get labels

Classroom Network Example

Only 2 students have labeled interests:

  • Xiao Ming (likes sports)
  • Xiao Hong (likes TV shows)

Network:

  • Xiao Ming ↔ Xiao Gang, Xiao Liang (thick edges, hang out often)
  • Xiao Gang ↔ Xiao Wei (thin edge, occasional interaction)
  • Xiao Hong ↔ Xiao Li, Xiao Mei (thick edges)
  • Xiao Li ↔ Xiao Na (thin edge)

Propagation Process:

Round 1: Xiao Ming spreads "sports" to Xiao Gang, Xiao Liang (thick edges, strong influence). Xiao Hong spreads "TV shows" to Xiao Li, Xiao Mei.

Round 2: Xiao Gang spreads "sports" to Xiao Wei (thin edge, weaker influence). Xiao Li spreads "TV shows" to Xiao Na.

Round 3: All labels stabilize. Propagation stops.

Result: the entire class has "interest labels." The three students in Xiao Ming's circle (Xiao Gang, Xiao Liang, Xiao Wei) are tagged "sports." The three in Xiao Hong's circle (Xiao Li, Xiao Mei, Xiao Na) are tagged "TV shows."

That's graph-based semi-supervised learning.

Implementation: 5 Steps to Label Propagation

Let's walk through a real task: classifying 100 animal images.

Scenario:

  • 100 images, only 5 labeled (3 cats, 2 dogs)
  • 95 unlabeled
  • Goal: use graph semi-supervised learning to classify them

Step 1: Build the "Image Relationship Graph"

Nodes: 100 images, one node each.

Edges: Compute similarity between every pair.

  • Two cat images: pointy ears, round eyes → high similarity → thick edge (weight 0.9)
  • Cat vs dog: floppy ears vs pointy, long muzzle vs short → low similarity → thin edge (weight 0.2) or no edge

Weight Formula (Gaussian kernel):

W_{ij} = \exp\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right)

  • x_i, x_j: feature vectors of two images (color, texture, etc.)
  • Closer distance → W_{ij} approaches 1
  • Farther distance → W_{ij} approaches 0

Produces a 100×100 affinity matrix W, recording all pairwise similarities.
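
A rough NumPy sketch of this step, assuming the 100 images have already been converted to feature vectors (the `features` array below is a random placeholder standing in for real color/texture features):

```python
import numpy as np

def gaussian_affinity(features: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """W[i, j] = exp(-||x_i - x_j||^2 / (2 * sigma^2)) for all pairs."""
    # Pairwise squared Euclidean distances between feature vectors.
    sq_dists = np.sum((features[:, None, :] - features[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-loops: a node averages over neighbors, not itself
    return W

features = np.random.rand(100, 64)   # placeholder: 100 images, 64-dim features
W = gaussian_affinity(features, sigma=0.5)
print(W.shape)                       # (100, 100) affinity matrix
```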

Step 2: Initialize Labels

Labeled nodes (5 total):

  • 3 cat images: label = 1 (represents "cat")
  • 2 dog images: label = 0 (represents "dog")

Unlabeled nodes (95 total):

  • Initial label = 0.5 (neutral, "50% cat, 50% dog")
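
Continuing the sketch, initialization is just a probability vector. For illustration, assume the 5 labeled images happen to be the first 5 rows:

```python
import numpy as np

n = 100
P = np.full(n, 0.5)               # 95 unlabeled nodes start neutral at 0.5
P[0:3] = 1.0                      # 3 labeled cat images: P = 1
P[3:5] = 0.0                      # 2 labeled dog images: P = 0

labeled_idx = np.arange(5)        # indices of the seed nodes
P_seed = P[labeled_idx].copy()    # remembered so seeds can be clamped during propagation
```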

Step 3: Iterative Propagation

Each unlabeled node "listens to its neighbors."

Propagation Formula:

P_i^{(t+1)} = \frac{\sum_{j \in \text{neighbors}} W_{ij} \cdot P_j^{(t)}}{\sum_{j \in \text{neighbors}} W_{ij}}

  • P_i^{(t+1)}: node i's "cat" probability at round t+1
  • W_{ij}: edge weight between node i and node j
  • P_j^{(t)}: neighbor j's "cat" probability at round t

Example:

An unlabeled node has 3 neighbors:

  • Neighbor A (cat, P=1, edge weight 0.8)
  • Neighbor B (cat, P=1, edge weight 0.7)
  • Neighbor C (dog, P=0, edge weight 0.4)

Calculation:

P = \frac{0.8 \times 1 + 0.7 \times 1 + 0.4 \times 0}{0.8 + 0.7 + 0.4} = \frac{1.5}{1.9} \approx 0.79

This node's "cat" probability becomes 0.79 — likely a cat.
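
The same arithmetic in code, as a one-node version of the propagation formula (`propagate_node` is just an illustrative helper name):

```python
import numpy as np

def propagate_node(weights, neighbor_probs):
    """Weighted average of neighbors' current 'cat' probabilities."""
    weights = np.asarray(weights, dtype=float)
    neighbor_probs = np.asarray(neighbor_probs, dtype=float)
    return float(weights @ neighbor_probs / weights.sum())

print(propagate_node([0.8, 0.7, 0.4], [1.0, 1.0, 0.0]))  # ≈ 0.79
```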

Key Constraint

Labeled nodes' labels are always fixed. Cats stay at 1, dogs stay at 0. They're the "seeds" ensuring propagation doesn't drift from true classes.

Step 4: Check for Convergence

When to stop propagating? Two criteria.

Criterion 1: Probability change below threshold

If every unlabeled node's probability changes by less than ε (e.g., 10⁻⁵) between two consecutive iterations:

|P_i^{(t+1)} - P_i^{(t)}| < \epsilon \quad \text{(for all unlabeled nodes)}

Labels stabilized. Stop.

Criterion 2: Maximum iterations reached

Set an upper limit (e.g., 100 rounds). Hit it? Stop, avoiding infinite loops.
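
Putting Steps 3 and 4 together, a minimal propagation loop could look like the sketch below. It reuses the `W`, `P`, `labeled_idx`, and `P_seed` variables from the earlier sketches, and `eps` / `max_iter` correspond to the two stopping criteria just described:

```python
import numpy as np

def label_propagation(W, P, labeled_idx, P_seed, eps=1e-5, max_iter=100):
    """Repeatedly replace each node's probability with the weighted average
    of its neighbors, keeping the labeled seed nodes clamped."""
    P = P.copy()
    row_sums = W.sum(axis=1)
    row_sums[row_sums == 0] = 1.0                 # guard against isolated nodes
    for _ in range(max_iter):                     # Criterion 2: iteration cap
        P_new = (W @ P) / row_sums                # Step 3: weighted neighbor average
        P_new[labeled_idx] = P_seed               # key constraint: seeds never move
        converged = np.max(np.abs(P_new - P)) < eps   # Criterion 1: change below ε
        P = P_new
        if converged:
            break
    return P

P_final = label_propagation(W, P, labeled_idx, P_seed)
print(P_final[:10])                               # "cat" probabilities after convergence
```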

Step 5: Predict Final Classes

After propagation stops, determine each node's class by its probability:

  • P > 0.5 → predict cat (class 1)
  • P < 0.5 → predict dog (class 0)
  • P = 0.5 → random assignment (rare)

95 unlabeled images now have "cat" or "dog" predictions. Done.
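
In practice you rarely need to hand-roll this loop. scikit-learn ships a LabelPropagation estimator built on the same idea; here is a minimal usage sketch with a placeholder feature matrix (the library marks unlabeled samples with -1):

```python
import numpy as np
from sklearn.semi_supervised import LabelPropagation

X = np.random.rand(100, 64)    # placeholder features for the 100 images
y = np.full(100, -1)           # -1 = unlabeled, per scikit-learn's convention
y[0:3] = 1                     # 3 labeled cats
y[3:5] = 0                     # 2 labeled dogs

model = LabelPropagation(kernel="rbf", gamma=20)
model.fit(X, y)
print(model.transduction_[:10])   # predicted class (0 or 1) for the first 10 images
```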

Why Does This Work? Two Common-Sense Assumptions

Graph-based semi-supervised learning relies on two assumptions aligned with reality.

Assumption 1: Homophily

"Close = similar": nodes with stronger connections (higher similarity) likely share the same class.

Like your friends — their interests probably match yours. Daily gym buddies? Probably all into fitness.

Same with graphs: two cat images with high similarity (thick edge) probably belong to the same class.

Assumption 2: Cluster Assumption

"Clusters match classes": if a group of nodes connects tightly (forming a "clique"), they likely share the same class.

Classroom friend groups often have similar interests. Data's no different: similar cat images cluster together, dog images form another cluster.

These assumptions aren't perfect, but they work for most scenarios.

Binary Classification: Simpler Propagation

Earlier we covered the general framework; in fact, the cat-vs-dog walkthrough was already a binary case. Binary classification (e.g., spam vs legitimate email) offers some conveniences worth spelling out.

Difference 1: Only One Probability Needed

Multi-class: maintain "cat probability," "dog probability," "bird probability"...

Binary: one probability suffices — P(positive class). Negative class probability = 1 - P automatically.

Difference 2: Simpler Threshold Decision

Multi-class: compare multiple class probabilities, pick the highest.

Binary: check if P > 0.5. Done.

Difference 3: More Intuitive Initialization

  • Positive class (e.g., spam): P = 1
  • Negative class (e.g., legitimate): P = 0
  • Unlabeled: P = 0.5 (neutral)

Propagation formula stays identical, just less computation.

Graph Semi-Supervised vs Other Semi-Supervised Methods

| Aspect | Graph Semi-Supervised | Self-Training | Co-Training |
| --- | --- | --- | --- |
| Core Idea | Leverage sample relationship network | Model trains on its own predictions | Two feature views train each other |
| Requires | Sample similarity measure | Initial classifier | Separable feature sets |
| Pros | Intuitive, interpretable (see propagation paths) | Simple, no graph construction | Exploits feature diversity |
| Cons | Large datasets = huge graph, slow computation | Error accumulation | Needs feature independence |

The biggest strength of graph-based methods: interpretability. You can visualize how labels spread along edges and which nodes influenced others.

Biggest weakness: scalability. 100,000 nodes? Affinity matrix = 100,000 × 100,000. Can't store it, can't compute it. New sample arrives? Rebuild the entire graph.

Real-World Applications

1. Social Network Analysis

Problem: Predict user interests, but only a few filled out surveys.

Solution:

  • Build user relationship graph (follows, likes, comments)
  • Label propagation: interest labels spread through social ties
  • Result: non-survey users get interest predictions

2. Image Classification (Scarce Labels)

Problem: Medical imaging data, annotation extremely costly (requires experts). 1,000 scans, only 50 labeled.

Solution:

  • Compute image similarity (color, texture, shape)
  • Build graph, 50 labeled nodes as seeds
  • Propagate: similar lesion scans get similar labels

3. Text Classification

Problem: News classification, 100 labeled articles, 10,000 unlabeled.

Solution:

  • Compute text similarity (TF-IDF, word vectors)
  • Build document relationship graph
  • Propagate: topically similar documents cluster, auto-gain categories

Limitations and Future Directions

Limitation 1: High Computational Complexity

Problem: Affinity matrix W is n × n (n = sample count). 100,000 samples = 10 billion elements.

Improvement:

  • Sparsification: only keep k-nearest neighbor edges (k-NN graph)
  • Approximation algorithms: use sampling or dimensionality reduction
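
As a quick sketch of the k-NN sparsification idea, scikit-learn's kneighbors_graph builds exactly this kind of sparse graph; the feature matrix and k below are illustrative:

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

X = np.random.rand(10_000, 64)   # placeholder features; a dense 10,000 × 10,000 W is already heavy
# Keep only each sample's 10 nearest neighbors instead of all pairwise edges.
# The stored distances could then be mapped through the Gaussian kernel to get sparse weights.
W_sparse = kneighbors_graph(X, n_neighbors=10, mode="distance", include_self=False)
print(W_sparse.shape, W_sparse.nnz)   # (10000, 10000), roughly 100,000 stored edges
```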

Limitation 2: New Samples Require Graph Rebuild

Problem: New sample arrives, entire graph needs recomputation.

Improvement:

  • Incremental graph construction: only compute new sample's edges to existing nodes
  • Graph Neural Networks (GNNs): learn graph representations, support dynamic updates

Limitation 3: Depends on Similarity Accuracy

Problem: If the similarity measure is wrong (e.g., it rates a cat image and a dog image as highly similar), wrong labels propagate along the wrong edges.

Improvement:

  • Better feature extraction (deep learning)
  • Multi-view fusion (combine different similarity metrics)

Key Takeaways

  1. Graph essence: Turn samples into nodes, similarity into edges, build relationship network.
  2. Label propagation logic:
    • Labeled nodes are seeds
    • Labels spread along edges (higher similarity = faster spread)
    • Iterate until stable
  3. Two assumptions:
    • Homophily: similar samples → same class
    • Cluster assumption: tightly connected nodes → same class
  4. Binary classification advantages:
    • Only one probability needed
    • Simple threshold decision (P > 0.5)
  5. vs Other methods:
    • Pros: intuitive, interpretable
    • Cons: slow, not scalable
  6. Applications: Social networks, medical imaging, text classification — any scenario with "sample relationships"

Ready to explore semi-supervised learning?

Master graph-based methods, label propagation algorithms, and other semi-supervised techniques. Learn when to leverage unlabeled data and how to build robust classifiers with minimal supervision.
