The Representer Theorem is a fundamental result that justifies the use of kernel methods across many learning algorithms.
For a regularized optimization problem in a reproducing kernel Hilbert space $\mathcal{H}$:

$$\min_{f \in \mathcal{H}} \; L\big(f(x_1), \ldots, f(x_n)\big) + \Omega\big(\lVert f \rVert_{\mathcal{H}}\big)$$

where $\Omega$ is monotonically increasing and $L$ is any loss function.

The optimal solution can always be expressed as a finite combination of kernels evaluated at the training points:

$$f^*(x) = \sum_{i=1}^{n} \alpha_i \, k(x_i, x)$$
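To make the theorem concrete, the sketch below fits kernel ridge regression in its dual form with plain NumPy; the RBF kernel, the regularization strength `lam`, and the synthetic sine data are illustrative assumptions. The fitted function is exactly a weighted sum of kernels centred on the training points, as the theorem guarantees.

```python
# Minimal sketch: kernel ridge regression illustrating the representer theorem.
# Assumptions: RBF kernel, lambda = 0.1, synthetic 1-D sine data.
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # k(a, b) = exp(-gamma * ||a - b||^2), computed pairwise
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(50, 1))                  # training inputs
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=50)       # noisy targets

lam = 0.1                                             # regularization strength
K = rbf_kernel(X, X)                                  # n x n Gram matrix
alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)  # dual coefficients

# Representer theorem in action: f*(x) = sum_i alpha_i * k(x_i, x)
X_test = np.linspace(-3, 3, 5).reshape(-1, 1)
print(rbf_kernel(X_test, X) @ alpha)
```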
Based on the representer theorem, many linear models can be kernelized to handle non-linear problems:
| Original Method | Kernelized Version | Core Principle |
|---|---|---|
| Linear SVM | Kernel SVM | Maximum margin classification |
| Ridge Regression | Kernel Ridge Regression | L₂ regularized regression |
| LDA | Kernel LDA | Class separation via discriminant projections |
| PCA | Kernel PCA | Variance-maximizing projections |
| Perceptron | Kernel Perceptron | Online learning algorithm |
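For example, the kernel perceptron keeps the perceptron's mistake-driven update but replaces the explicit weight vector with coefficients on training points. The sketch below is a minimal illustration; the RBF kernel, its `gamma`, the XOR-style toy data, and the fixed epoch count are assumptions made for the demo.

```python
# Minimal kernel perceptron sketch (assumed: labels in {-1, +1}, RBF kernel,
# fixed number of epochs). The model is f(x) = sum_i alpha_i * y_i * k(x_i, x).
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

def kernel_perceptron(X, y, epochs=50, gamma=3.0):
    n = len(X)
    alpha = np.zeros(n)                    # mistake counts = dual coefficients
    K = rbf_kernel(X, X, gamma)
    for _ in range(epochs):
        for i in range(n):
            score = (alpha * y) @ K[:, i]  # only kernel evaluations, no weight vector
            pred = 1.0 if score > 0 else -1.0
            if pred != y[i]:
                alpha[i] += 1.0            # same mistake-driven update as the linear perceptron
    return alpha

# XOR-style toy data that no linear perceptron can separate
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([-1., 1., 1., -1.])
alpha = kernel_perceptron(X, y)
print(np.sign((alpha * y) @ rbf_kernel(X, X, gamma=3.0)))  # recovers y
```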
In practice, the choice of method and kernel depends on the data (see the code sketch after this list):

- High-dimensional sparse features: a linear SVM works well.
- Pixel data: an RBF kernel captures non-linear patterns.
- Biological sequences with small sample sizes: custom kernels are effective.
- Robustness to outliers and the ability to handle mixed feature types are additional selection criteria.
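As a rough sketch of how that guidance translates into code (assuming scikit-learn is available; the estimators and parameter values below are placeholders, not tuned recommendations):

```python
# Illustrative kernel choices per scenario (scikit-learn assumed available).
from sklearn.svm import SVC, LinearSVC

# High-dimensional sparse features (e.g. bag-of-words text): linear SVM
text_clf = LinearSVC(C=1.0)

# Dense inputs with non-linear structure (e.g. pixel intensities): RBF kernel
image_clf = SVC(kernel="rbf", C=1.0, gamma="scale")

# Structured data with a custom similarity (e.g. sequences): precomputed kernel,
# where `gram_train` would be the n_train x n_train matrix of kernel values.
seq_clf = SVC(kernel="precomputed", C=1.0)
# seq_clf.fit(gram_train, labels)
```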
Key hyperparameters to tune (a small numerical illustration of $\gamma$ follows this list):

- **C:** Controls the trade-off between margin maximization and training error. Typically searched on a logarithmic scale, e.g. $10^{-2}$ to $10^{2}$.
- **$\gamma$:** For the RBF kernel $k(x, x') = \exp(-\gamma \lVert x - x' \rVert^2)$, $\gamma$ controls the influence radius of each training point. Also typically searched on a logarithmic scale, e.g. $10^{-3}$ to $10^{1}$.
- **$\varepsilon$ (SVR):** Width of the epsilon-insensitive tube. Should match the acceptable error tolerance in your problem domain.
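A quick numerical illustration of the influence radius controlled by $\gamma$ (the distances and $\gamma$ values below are arbitrary, chosen only to show the trend):

```python
# How gamma shapes the RBF kernel k(x, x') = exp(-gamma * ||x - x'||^2):
# larger gamma => kernel values decay faster with distance (narrower influence).
import numpy as np

distances = np.array([0.1, 0.5, 1.0, 2.0])
for gamma in [0.1, 1.0, 10.0]:
    print(f"gamma={gamma:>4}:", np.exp(-gamma * distances ** 2).round(3))
# Small gamma -> wide influence, smoother decision function;
# large gamma -> narrow influence, higher risk of overfitting.
```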
Common strategies for tuning them (a worked example follows this list):

- **Grid search:** Exhaustive search over a parameter grid. Simple but can be computationally expensive.
- **k-fold cross-validation:** Use k-fold CV (typically k = 5 or 10) to evaluate each parameter combination reliably.
- **Bayesian optimization:** Smarter search using probabilistic surrogate models. More efficient when each evaluation is expensive.
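A minimal sketch combining grid search with 5-fold cross-validation (assuming scikit-learn; the grid values and the synthetic data are illustrative):

```python
# Grid search over C and gamma with 5-fold CV on toy regression data.
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(80, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=80)

grid = GridSearchCV(
    SVR(kernel="rbf"),
    param_grid={"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1.0]},
    cv=5,                                  # k-fold CV for each combination
    scoring="neg_mean_squared_error",
)
grid.fit(X, y)
print(grid.best_params_)
# For Bayesian optimization, libraries such as scikit-optimize provide a
# BayesSearchCV with a similar interface (an optional dependency).
```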
SVM Evolution:
Relationship with Deep Learning: