
Semi-Supervised SVM (TSVM)

Master Transductive SVM for semi-supervised learning. Learn how to leverage unlabeled samples to optimize the decision boundary through the low-density separation principle.

Module 3 of 6
Intermediate to Advanced
100-130 min

Core Idea: Low-Density Separation

Based on the cluster assumption, the optimal decision boundary should pass through low-density regions (sparse areas between clusters) rather than high-density regions. A traditional SVM uses only labeled samples; TSVM additionally uses unlabeled samples to optimize the hyperplane's position.

Key Principle

The decision boundary should not only separate labeled samples correctly but also pass through regions where unlabeled samples are sparse. This aligns with the cluster assumption: low-density regions are likely boundaries between clusters (classes).

Advantages

  • High classification accuracy
  • Strong robustness
  • Leverages unlabeled data
  • Clear geometric intuition

Limitations

  • Computationally complex
  • Difficult for multi-class
  • Sensitive to initialization
  • Requires tuning $C_l$, $C_u$

TSVM (Transductive SVM) Algorithm

TSVM assigns pseudo-labels $\hat{y}$ to unlabeled samples and optimizes the SVM objective function over both labeled and pseudo-labeled samples.

Optimization Objective

$$\min_{w,b,\hat{y},\xi} \; \frac{1}{2}\|w\|_2^2 + C_l \sum_{i=1}^{l} \xi_i + C_u \sum_{i=l+1}^{m} \xi_i$$

Subject to:

$$y_i(w \cdot x_i + b) \geq 1 - \xi_i, \quad \xi_i \geq 0, \quad i = 1, \ldots, m$$

For the unlabeled samples ($i = l+1, \ldots, m$), $y_i$ denotes the pseudo-label $\hat{y}_i$.

Where:

  • $\|w\|_2^2$ = margin maximization term (traditional SVM)
  • $C_l$ = penalty parameter for labeled samples (controls misclassification cost)
  • $C_u$ = penalty parameter for unlabeled samples (controls pseudo-label confidence)
  • $\xi_i$ = slack variables (allow some misclassification)
  • $\hat{y}$ = pseudo-labels for unlabeled samples (values in $\{+1, -1\}$)
  • Typically, start with $C_u < C_l$ (less confidence in pseudo-labels initially)
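
To make the objective concrete, here is a minimal Python sketch (the array names `X_l`, `y_l`, `X_u`, `y_hat` are this sketch's own, not from any library) that evaluates the objective, writing the optimal slacks in their equivalent hinge-loss form:

```python
import numpy as np

def tsvm_objective(w, b, X_l, y_l, X_u, y_hat, C_l, C_u):
    """Evaluate the TSVM objective with slacks in hinge-loss form.

    At the optimum, xi_i = max(0, 1 - y_i * (w . x_i + b)), so the slack
    sums can be computed directly from the current hyperplane (w, b).
    """
    margin_term = 0.5 * np.dot(w, w)                     # (1/2) ||w||_2^2
    xi_l = np.maximum(0.0, 1.0 - y_l * (X_l @ w + b))    # slacks, labeled samples
    xi_u = np.maximum(0.0, 1.0 - y_hat * (X_u @ w + b))  # slacks, pseudo-labeled
    return margin_term + C_l * xi_l.sum() + C_u * xi_u.sum()
```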
Step 1

Initialization

Train a standard SVM using only the labeled samples $D_l$. Use this model to predict pseudo-labels $\hat{y}$ for the unlabeled samples $D_u$.

Example: If the SVM predicts $f(x_j) > 0$, assign $\hat{y}_j = +1$; otherwise $\hat{y}_j = -1$.
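
A minimal initialization sketch with scikit-learn (assuming NumPy arrays `X_l`, `y_l` for the labeled set and `X_u` for the unlabeled set; these names are illustrative):

```python
import numpy as np
from sklearn.svm import SVC

# Step 1: train a standard SVM on the labeled samples only.
base_svm = SVC(kernel="linear", C=1.0)
base_svm.fit(X_l, y_l)

# decision_function plays the role of f(x): its sign is the pseudo-label.
y_hat = np.where(base_svm.decision_function(X_u) > 0, +1, -1)
```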

Step 2

Iterative Optimization

Set initial $C_u < C_l$ (e.g., $C_u = 0.1 \cdot C_l$). Solve the optimization problem to obtain the hyperplane $(w, b)$.

Note: A lower $C_u$ means less penalty for pseudo-label errors, allowing the algorithm to explore different pseudo-label assignments.
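
One way to solve this weighted problem in scikit-learn is to exploit the fact that `SVC.fit` scales the penalty `C` by each sample's `sample_weight`, which emulates separate $C_l$ and $C_u$ penalties; a sketch continuing from Step 1:

```python
C_l = 1.0
C_u = 0.1 * C_l  # low initial confidence in pseudo-labels

# Stack labeled and pseudo-labeled samples; the effective per-sample
# penalty becomes C * weight, i.e. C_l for labeled and C_u for unlabeled.
X_all = np.vstack([X_l, X_u])
y_all = np.concatenate([y_l, y_hat])
weights = np.concatenate([np.full(len(X_l), C_l),
                          np.full(len(X_u), C_u)])

model = SVC(kernel="linear", C=1.0)
model.fit(X_all, y_all, sample_weight=weights)
```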

Step 3

Check Margin Conflicts

Check if pseudo-labels are consistent with the margin:

  • If a sample with $\hat{y}_j = +1$ has margin $< 0$ (wrong side), it is a conflict
  • If a sample with $\hat{y}_j = -1$ has margin $> 0$ (wrong side), it is a conflict
  • Swap the pseudo-labels of conflicting samples (see the sketch below)
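
A sketch of the conflict check and swap (note: Joachims' original TSVM swaps pairs of conflicting samples with opposite pseudo-labels; the simpler per-sample flip below follows the description above):

```python
def swap_conflicts(y_hat, f_u):
    """Flip pseudo-labels that lie on the wrong side of the boundary.

    f_u[j] is the decision value f(x_j) of unlabeled sample j; a conflict
    means y_hat[j] and f_u[j] disagree in sign.
    """
    conflicts = y_hat * f_u < 0      # wrong side under current pseudo-label
    y_hat = y_hat.copy()
    y_hat[conflicts] *= -1           # swap +1 <-> -1
    return y_hat, int(conflicts.sum())
```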
Step 4

Gradually Increase C_u

Increase $C_u$ (e.g., $C_u = \min\{2C_u, C_l\}$) and repeat steps 2-3 until $C_u = C_l$.

Rationale: Gradually increasing confidence in pseudo-labels allows the algorithm to refine the decision boundary progressively.
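
Putting steps 2-4 together, the outer annealing loop might look like this sketch (reusing `X_l`, `X_u`, `X_all`, `y_l`, `y_hat`, `C_l`, and `swap_conflicts` from the sketches above):

```python
C_u = 0.1 * C_l
for _ in range(200):  # guard against non-termination in this sketch
    # Step 2: re-fit the SVM with the current C_l / C_u weighting.
    weights = np.concatenate([np.full(len(X_l), C_l),
                              np.full(len(X_u), C_u)])
    model = SVC(kernel="linear", C=1.0)
    model.fit(X_all, np.concatenate([y_l, y_hat]), sample_weight=weights)

    # Step 3: resolve margin conflicts under the new hyperplane.
    y_hat, n_swapped = swap_conflicts(y_hat, model.decision_function(X_u))
    if n_swapped > 0:
        continue                     # re-solve with the swapped labels

    # Step 4: once consistent, raise confidence in the pseudo-labels.
    if C_u >= C_l:
        break                        # C_u has reached C_l: finished (step 5)
    C_u = min(2 * C_u, C_l)
```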

Step 5

Output

The final pseudo-labels $\hat{y}$ for the unlabeled samples are the prediction results (transductive learning: predicting the known unlabeled samples).

Handling Class Imbalance

If pseudo-labels result in severe class imbalance (e.g., 90% positive, 10% negative), it can bias the SVM training. Solution: split $C_u$ into $C_u^+$ and $C_u^-$.

Balanced Cost Adjustment

$$C_u^+ = \frac{u_-}{u_+} \, C_u^-$$

Where:

  • $u_+$ = number of pseudo-labeled positive samples
  • $u_-$ = number of pseudo-labeled negative samples
  • This ensures a balanced penalty for both classes

Example

If after pseudo-labeling we have 90 positive and 10 negative samples:

  • $u_+ = 90$, $u_- = 10$
  • Set $C_u^- = 0.1$ (base value)
  • Then $C_u^+ = \frac{10}{90} \cdot 0.1 \approx 0.011$
  • This reduces the penalty for the majority class, balancing the optimization
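
In code, the adjustment is a couple of lines (a sketch; `y_hat` is the pseudo-label vector from the steps above):

```python
u_pos = int((y_hat == +1).sum())      # u_+: pseudo-labeled positives
u_neg = int((y_hat == -1).sum())      # u_-: pseudo-labeled negatives

C_u_neg = 0.1                         # base value, as in the example
C_u_pos = (u_neg / u_pos) * C_u_neg   # e.g., (10 / 90) * 0.1 ≈ 0.011
```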

Medical Diagnosis Example

Apply TSVM to diagnose a disease using 200 patient records. Only 50 patients have confirmed diagnoses (labeled), while 150 patients lack diagnosis (unlabeled). Features include age, fever temperature, cough presence, and fatigue level.

Patient Dataset: Limited Labels

| ID | Age | Fever (°C) | Cough | Fatigue | Label    | Type      |
|----|-----|------------|-------|---------|----------|-----------|
| 1  | 45  | 38.5       | Yes   | Yes     | Positive | labeled   |
| 2  | 32  | 36.8       | No    | No      | Negative | labeled   |
| 3  | 58  | 37.2       | No    | Yes     | Negative | labeled   |
| 4  | 29  | 38.8       | Yes   | Yes     | Positive | labeled   |
| 5  | 41  | 36.9       | No    | No      | Negative | labeled   |
| 6  | 52  | 37.5       | Yes   | No      | Unknown  | unlabeled |
| 7  | 35  | 38.2       | Yes   | Yes     | Unknown  | unlabeled |
| 8  | 48  | 36.7       | No    | No      | Unknown  | unlabeled |
| 9  | 27  | 37.8       | No    | Yes     | Unknown  | unlabeled |
| 10 | 61  | 38.9       | Yes   | Yes     | Unknown  | unlabeled |

Dataset: 200 patients (50 labeled: 25 Positive, 25 Negative; 150 unlabeled). Binary classification: Positive (disease present) vs Negative (disease absent).
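
As a sketch of how the table's first rows might be encoded for the SVM (Yes/No mapped to 1/0; illustrative arrays, not the full 200-patient dataset):

```python
import numpy as np

# Feature columns: age, fever (°C), cough (1 = Yes), fatigue (1 = Yes)
X_l = np.array([[45, 38.5, 1, 1],    # patient 1
                [32, 36.8, 0, 0],    # patient 2
                [58, 37.2, 0, 1],    # patient 3
                [29, 38.8, 1, 1],    # patient 4
                [41, 36.9, 0, 0]])   # patient 5
y_l = np.array([+1, -1, -1, +1, -1]) # Positive = +1, Negative = -1

X_u = np.array([[52, 37.5, 1, 0],    # patient 6
                [35, 38.2, 1, 1],    # patient 7
                [48, 36.7, 0, 0],    # patient 8
                [27, 37.8, 0, 1],    # patient 9
                [61, 38.9, 1, 1]])   # patient 10 (labels unknown)
```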

TSVM Training Process

Initialization:

Train standard SVM on 50 labeled patients. Predict pseudo-labels for 150 unlabeled patients.

  • Initial accuracy on labeled data: 84%
  • Pseudo-labels: 78 positive, 72 negative (relatively balanced)

Iteration 1 ($C_u = 0.1 \cdot C_l$):

Optimize with low confidence in pseudo-labels. Check margin conflicts and swap 12 conflicting pseudo-labels.

Iterations 2-4 ($C_u$ gradually increases):

Continue optimization with increasing confidence. Decision boundary shifts to low-density regions.

Final Result ($C_u = C_l$):

Final model achieves 91% accuracy on test set (vs 84% using labeled samples only). Decision boundary passes through low-density region between positive and negative clusters.

Advantages and Limitations

Advantages

  • High classification accuracy: Leverages unlabeled data effectively
  • Strong robustness: Margin maximization provides good generalization
  • Clear geometric intuition: Low-density separation is easy to understand
  • Well-established theory: Based on solid SVM foundation

Limitations

  • Computationally complex: Requires solving optimization with pseudo-labels
  • Difficult for multi-class: Primarily designed for binary classification
  • Sensitive to initialization: Poor initial pseudo-labels can lead to local optima
  • Requires parameter tuning: $C_l$ and $C_u$ need careful selection