Master multi-view learning and co-training algorithms. Learn how multiple classifiers trained on different views can collaborate to improve performance by leveraging high-confidence unlabeled samples.
A view is a subset of attributes (feature set) that can be used to train a classifier. Multi-view data has multiple independent views, each providing complementary information about the same samples.
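As a minimal illustration (the column layout here is hypothetical, with values taken from the first rows of the worked example later in this section), multi-view data can be stored as one feature matrix whose columns are split into per-view subsets:

```python
import numpy as np

# One feature matrix: each row is a web page, each column an attribute.
# Columns 0-1 describe the text, columns 2-3 describe the hyperlinks.
X = np.array([
    [450, 8, 12, 35],   # page 1
    [320, 3,  5, 15],   # page 2
    [280, 2,  8, 25],   # page 3
])

view_text  = X[:, :2]   # View 1: text-based attributes
view_links = X[:, 2:]   # View 2: link-based attributes
```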
For disagreement-based methods to work effectively, two critical assumptions must hold:
Different views contain consistent class information: if one view predicts class A for a sample, the other views should predict class A for that sample as well.
Example:
If text view classifies a web page as "Tech", the link view should also classify it as "Tech" (not "Sports"). Both views provide consistent information about the same underlying class.
Each view is sufficient (it alone carries enough information to train a good classifier), and the views are conditionally independent (given the class, the views are independent of each other).
Sufficiency:
A single view contains enough information to train a good classifier. For example, text content alone can classify web pages reasonably well.
Conditional Independence:
Given the class label y, the two views x⁽¹⁾ and x⁽²⁾ are independent: P(x⁽¹⁾, x⁽²⁾ | y) = P(x⁽¹⁾ | y) · P(x⁽²⁾ | y).
Co-training is the most famous disagreement-based method. Two classifiers trained on different views "teach each other" by selecting high-confidence unlabeled samples and providing pseudo-labels to the other view.
From the unlabeled set D_u, randomly sample a fixed number of examples to form a buffer pool D_s. The remaining samples are D_u \ D_s.
Parameters: Base learning algorithm (e.g., SVM, Decision Tree), number of rounds T, positive samples per round p, negative samples per round n.
For each view j (j = 1, 2), train an initial classifier h_j on the labeled set D_l, using only the attributes of view j.
Example: View 1 (text) classifier h_1 and View 2 (links) classifier h_2, both trained on 50 labeled web pages.
For each round t = 1 to T, and for each view j: classifier h_j labels the buffer pool D_s, picks the p positive and n negative samples it is most confident about, and passes them, together with its pseudo-labels, to the other view as additional training samples. Both classifiers are then retrained on their enlarged training sets, and the buffer pool is replenished with fresh samples from D_u.
The final classifiers are h_1 and h_2. Either can be used on its own, or their predictions can be combined (e.g., by voting or averaging).
Only select unlabeled samples on which the classifier is highly confident, e.g., those whose predicted class probability exceeds a threshold or ranks among the top p (one class) / top n (the other class) within the buffer pool.
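The following is a minimal sketch of this procedure using scikit-learn-style estimators. The function name co_train, the default LogisticRegression base learner, the pool size, and the choice to keep one training set per view are implementation choices of this sketch, not prescribed by the algorithm description above.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression

def co_train(X1, X2, y, U1, U2, base=None, T=5, p=5, n=5,
             pool_size=75, seed=0):
    """Co-training sketch: classifiers trained on two views teach each
    other with high-confidence pseudo-labels drawn from a buffer pool."""
    rng = np.random.default_rng(seed)
    base = base if base is not None else LogisticRegression(max_iter=1000)

    # Each view keeps its own (growing) labeled training set.
    train = [{"X": X1.copy(), "y": y.copy()},
             {"X": X2.copy(), "y": y.copy()}]
    views = [U1, U2]                          # unlabeled data, one array per view

    # Buffer pool D_s: a random subset of the unlabeled sample indices.
    unlab = list(rng.permutation(len(U1)))
    pool, unlab = unlab[:pool_size], unlab[pool_size:]

    clfs = [clone(base), clone(base)]
    for _ in range(T):                        # T rounds
        for j in (0, 1):
            clfs[j].fit(train[j]["X"], train[j]["y"])
        if not pool:
            break
        picked = set()
        for j in (0, 1):                      # view j selects, the other view learns
            other = 1 - j
            proba = clfs[j].predict_proba(views[j][pool])
            # p most confident samples of one class, n of the other (binary case)
            for cls_idx, k in ((0, p), (1, n)):
                for b in np.argsort(proba[:, cls_idx])[::-1][:k]:
                    idx = pool[b]
                    if idx in picked:
                        continue
                    picked.add(idx)
                    pseudo = clfs[j].classes_[cls_idx]
                    train[other]["X"] = np.vstack([train[other]["X"],
                                                   views[other][idx:idx + 1]])
                    train[other]["y"] = np.append(train[other]["y"], pseudo)
        # Drop the used samples from the pool and refill it from D_u.
        pool = [i for i in pool if i not in picked]
        refill, unlab = unlab[:len(picked)], unlab[len(picked):]
        pool += refill

    for j in (0, 1):                          # final fit on the enlarged sets
        clfs[j].fit(train[j]["X"], train[j]["y"])
    return clfs
```

With per-view arrays of the labeled and unlabeled samples (names are up to the caller), calling co_train(..., T=5, p=5, n=5) mirrors the schedule used in the worked example below: 5 rounds, with each view passing 5 samples per class to the other view per round.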
Apply co-training to classify 200 web pages into "Tech" or "Sports" categories. Only 50 pages are labeled, while 150 are unlabeled. Two views: text content and hyperlink structure.
| ID | Text Word Count | Text Link Count | Total Links | Link Text Words | Label | Type |
|---|---|---|---|---|---|---|
| 1 | 450 | 8 | 12 | 35 | Tech | labeled |
| 2 | 320 | 3 | 5 | 15 | Sports | labeled |
| 3 | 280 | 2 | 8 | 25 | Tech | labeled |
| 4 | 520 | 5 | 15 | 45 | Sports | labeled |
| 5 | 380 | 6 | 10 | 30 | Tech | labeled |
| 6 | 410 | 4 | 7 | 20 | Unknown | unlabeled |
| 7 | 290 | 3 | 9 | 28 | Unknown | unlabeled |
| 8 | 480 | 7 | 11 | 38 | Unknown | unlabeled |
| 9 | 350 | 2 | 6 | 18 | Unknown | unlabeled |
| 10 | 440 | 5 | 13 | 42 | Unknown | unlabeled |
Dataset: 200 web pages (50 labeled: 25 Tech, 25 Sports; 150 unlabeled). View 1 (Text): word count, text link count. View 2 (Links): total links, link text words.
Train View 1 (text) classifier and View 2 (links) classifier on 50 labeled pages. Initial accuracy: 78% (text), 72% (links).
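A sketch of this initial-training step on the ten table rows above, using scikit-learn's LogisticRegression as a stand-in base learner (the variable names are illustrative; the reported 78%/72% accuracies come from the full 200-page dataset and cannot be reproduced from these ten rows):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Labeled pages 1-5.
# View 1 = (text word count, text link count), View 2 = (total links, link text words)
X1_lab = np.array([[450, 8], [320, 3], [280, 2], [520, 5], [380, 6]])
X2_lab = np.array([[12, 35], [5, 15], [8, 25], [15, 45], [10, 30]])
y_lab  = np.array(["Tech", "Sports", "Tech", "Sports", "Tech"])

# Unlabeled pages 6-10, split into the same two views.
X1_unlab = np.array([[410, 4], [290, 3], [480, 7], [350, 2], [440, 5]])
X2_unlab = np.array([[7, 20], [9, 28], [11, 38], [6, 18], [13, 42]])

# Train one initial classifier per view on the labeled pages only.
h_text  = LogisticRegression(max_iter=1000).fit(X1_lab, y_lab)
h_links = LogisticRegression(max_iter=1000).fit(X2_lab, y_lab)
```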
View 1 selects 5 high-confidence Tech and 5 Sports pages from buffer. View 2 selects 5 Tech and 5 Sports. Each view adds the other view's selections to its training set. Both classifiers retrain.
Continue iterative learning. Each round, classifiers improve by learning from high-confidence samples identified by the other view.
After 5 rounds, View 1 accuracy: 89%, View 2 accuracy: 85%. Combined (voting): 91% accuracy (vs 78% using labeled samples only). Co-training successfully leveraged unlabeled data.
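One simple way to obtain the combined prediction mentioned above is soft voting, i.e., averaging the class probabilities predicted from each view. A sketch (the function and argument names are mine):

```python
import numpy as np

def combined_predict(clf_view1, clf_view2, X_view1, X_view2):
    """Soft voting: average the class probabilities predicted from each view."""
    proba = (clf_view1.predict_proba(X_view1) +
             clf_view2.predict_proba(X_view2)) / 2.0
    # Both classifiers were fit on the same label set, so classes_ align.
    return clf_view1.classes_[np.argmax(proba, axis=1)]

# Usage with the classifiers from the initial-training sketch above, e.g.:
#   combined_predict(h_text, h_links, X1_unlab, X2_unlab)
```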