While classification SVM finds a decision boundary to separate classes, SVR finds a function that fits data points within a tolerance margin.
| Aspect | Classification SVM | Regression SVR |
|---|---|---|
| Goal | Find maximum margin separating hyperplane | Find function within epsilon-tube containing most samples |
| Output | Discrete class labels | Continuous real values |
| Margin | Distance between classes | Epsilon-tube width around regression line |
| Loss Function | Hinge loss | Epsilon-insensitive loss |
| Support Vectors | Points on or violating margin | Points outside epsilon-tube |
In classification, we want to push samples apart. In regression, we want to fit samples within a tube while keeping the function as flat (simple) as possible.
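To make the contrast concrete, here is a small illustrative sketch using scikit-learn's parallel estimators `SVC` and `SVR` (the data and parameters below are made up for demonstration, not taken from this article):

```python
import numpy as np
from sklearn.svm import SVC, SVR

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))

# Classification: discrete labels, maximum-margin separating boundary.
y_class = (X[:, 0] + X[:, 1] > 0).astype(int)
clf = SVC(kernel="rbf", C=1.0).fit(X, y_class)
print(clf.predict(X[:3]))   # discrete class labels (0 or 1)

# Regression: continuous targets, epsilon-tube around the fitted function.
y_reg = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=100)
reg = SVR(kernel="rbf", C=1.0, epsilon=0.1).fit(X, y_reg)
print(reg.predict(X[:3]))   # continuous real values
```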
The $\epsilon$-insensitive loss allows prediction errors within $\epsilon$ to have zero loss, creating a "tube" around the regression function:

$$
L_\epsilon\big(y, f(x)\big) =
\begin{cases}
0 & \text{if } |y - f(x)| \le \epsilon \quad \text{(prediction is "good enough," no penalty)} \\
|y - f(x)| - \epsilon & \text{otherwise} \quad \text{(linear penalty beyond } \epsilon\text{)}
\end{cases}
$$
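As a concrete reference, here is a minimal NumPy sketch of this loss (the function name and example values are illustrative, not from the original text):

```python
import numpy as np

def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Epsilon-insensitive loss: zero inside the tube, linear outside."""
    abs_error = np.abs(y_true - y_pred)
    return np.maximum(0.0, abs_error - epsilon)

# Errors of 0.05 and 0.08 fall inside the tube (zero loss);
# the error of 0.30 is penalized linearly: 0.30 - 0.1 = 0.2.
y_true = np.array([1.00, 2.00, 3.00])
y_pred = np.array([1.05, 1.92, 3.30])
print(epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1))  # [0.  0.  0.2]
```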
Why two slack variables? Because predictions can deviate above or below the target: $\xi_i$ handles upper violations (target above the tube), $\xi_i^*$ handles lower violations (target below the tube).
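For reference, this is the textbook soft-margin SVR primal (standard notation with regularization constant $C$; the formulation is the conventional one, not quoted from the text above):

$$
\begin{aligned}
\min_{w,\,b,\,\xi,\,\xi^*} \quad & \tfrac{1}{2}\lVert w\rVert^2 + C\sum_{i=1}^{n}\left(\xi_i + \xi_i^*\right) \\
\text{subject to} \quad & y_i - \langle w, x_i\rangle - b \le \epsilon + \xi_i, \\
& \langle w, x_i\rangle + b - y_i \le \epsilon + \xi_i^*, \\
& \xi_i,\ \xi_i^* \ge 0 .
\end{aligned}
$$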
Lagrange multipliers: $\alpha_i^*$ for the lower-bound constraint, $\alpha_i$ for the upper-bound constraint (one pair per training sample).
Kernel trick: works just like in classification SVM; the inner product $\langle x_i, x_j\rangle$ can be replaced by any valid kernel $K(x_i, x_j)$.
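Putting these together, the fitted regression function takes the standard dual form (again textbook notation rather than a formula quoted from the text):

$$
f(x) = \sum_{i=1}^{n} \left(\alpha_i - \alpha_i^*\right) K(x_i, x) + b
$$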
In SVR, support vectors are samples that either lie on the boundary of the epsilon-tube or outside it.
Samples with prediction error less than $\epsilon$ have $\alpha_i = \alpha_i^* = 0$ and do not contribute to the final model. This creates the desired sparse representation: the regression function depends only on the "difficult" samples outside the tube.
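This sparsity is easy to inspect with scikit-learn's `SVR`, whose fitted `support_` attribute lists the indices of the support vectors (the toy data below is purely illustrative); the advantages that follow build on this property.

```python
import numpy as np
from sklearn.svm import SVR

# Noisy 1-D sine curve as toy regression data (illustrative only).
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, size=(200, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

svr = SVR(kernel="rbf", C=1.0, epsilon=0.1).fit(X, y)

# Samples strictly inside the epsilon-tube get zero dual coefficients
# and never appear among the support vectors.
frac = len(svr.support_) / len(X)
print(f"Support vectors: {len(svr.support_)} of {len(X)} ({frac:.0%})")
```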
- Only support vectors (samples on or outside the epsilon-tube) affect the final model, leading to efficient predictions.
- The epsilon-insensitive loss makes SVR less sensitive to outliers and noise than squared-error loss.
- Non-linear regression is handled through the kernel trick, without explicit feature transformation.
- The method is grounded in statistical learning theory, with provable generalization bounds.
Scenario: Predict median house prices in California districts based on location, demographics, and housing characteristics.
| Metric | Value |
|---|---|
| Support Vectors | 2,456 (12%) |
| Test RMSE | $48,200 |
| R² Score | 0.78 |
Result: SVR with RBF kernel captured non-linear relationships between location and price. Only 12% of districts became support vectors—these were areas with unusual price patterns (e.g., luxury neighborhoods, areas with recent development). The sparse model predicted efficiently while remaining robust to outliers like extremely expensive coastal properties.
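A minimal sketch of this kind of experiment is shown below, using scikit-learn's built-in California Housing data. The hyperparameters and the training subsample are illustrative choices, so the script will not reproduce the exact numbers reported above.

```python
import numpy as np
from sklearn.datasets import fetch_california_housing
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# District-level features (location, demographics, housing characteristics)
# and median house value; the target is in units of $100,000.
X, y = fetch_california_housing(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Feature scaling matters for RBF kernels; C and epsilon are illustrative,
# not tuned values.
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.1))

# SVR training cost grows roughly quadratically with sample count, so fit on
# a subsample of the training set for a quick run (illustrative shortcut).
idx = np.random.default_rng(0).choice(len(X_train), size=5_000, replace=False)
model.fit(X_train[idx], y_train[idx])

pred = model.predict(X_test)
rmse = np.sqrt(mean_squared_error(y_test, pred))
print(f"Test RMSE: {rmse:.3f} (x $100,000)")
print(f"R^2 score: {r2_score(y_test, pred):.3f}")

svr = model.named_steps["svr"]
print(f"Support vectors: {len(svr.support_)} of {len(idx)}")
```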