Master wrapper methods that tailor feature selection to specific learning algorithms. Learn the Las Vegas Wrapper (LVW) algorithm using random subset search and cross-validation to find optimal feature subsets.
Wrapper methods use the performance of a learning algorithm as the evaluation criterion for feature subsets. Unlike filter methods that evaluate features independently, wrapper methods train models on candidate feature subsets and select features that maximize the learner's performance.
Wrapper methods directly optimize for the learning algorithm's performance. The feature subset that works best with a specific learner (e.g., decision tree, SVM) may differ from the subset that works best with another learner. This "tailored" approach often yields better performance than filter methods.
LVW uses a random search strategy to explore feature subsets. It evaluates each candidate subset by training the learning algorithm and measuring its cross-validation error. The algorithm continues until no improvement has been found for a specified number of consecutive iterations, T (the stopping parameter).
Initialize the best error and feature set:
- E = ∞ (best error so far)
- A* = A, d = |A| (best feature set, start with all features)
- t = 0 (consecutive non-improvement counter)

While t < T (stop after T consecutive non-improvements):
- Randomly generate a feature subset A' and let d' = |A'|
- Estimate E', the cross-validation error of the learner using only the features in A'
- If E' < E, or (E' = E and d' < d): set E = E', d = d', A* = A', t = 0 (reset counter)
- Otherwise: t = t + 1 (increment counter)

Output the optimal feature subset A* with error E.
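The loop above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the feature names, the `evaluate` callback (standing in for the learner's cross-validation error), and the toy error function below are all assumptions.

```python
import random

def lvw(features, evaluate, T, seed=0):
    """Las Vegas Wrapper: random subset search with patience T.

    `evaluate(subset)` should return the learner's CV error on `subset`.
    """
    rng = random.Random(seed)
    best_feats = frozenset(features)       # A*: start with all features
    best_err = evaluate(best_feats)        # E: error of the full set
    t = 0                                  # consecutive non-improvement counter
    while t < T:
        k = rng.randint(1, len(features))  # randomly generate a subset A'
        cand = frozenset(rng.sample(list(features), k))
        err = evaluate(cand)               # E': CV error on the candidate
        # Accept on strictly lower error, or a tie with fewer features
        if err < best_err or (err == best_err and len(cand) < len(best_feats)):
            best_err, best_feats, t = err, cand, 0   # reset counter
        else:
            t += 1                         # increment counter
    return best_feats, best_err

# Toy stand-in for CV error (assumption): features "A" and "B" are the only
# informative ones; missing either costs 0.10, each extra feature costs 0.01.
RELEVANT = {"A", "B"}
def toy_error(subset):
    return 0.10 * len(RELEVANT - subset) + 0.01 * len(subset - RELEVANT)

best, err = lvw(["A", "B", "C", "D", "E"], toy_error, T=100)
```

Because the best subset only ever changes when the error drops (or ties at a smaller size), the returned error can never be worse than that of the full feature set.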
LVW uses cross-validation to evaluate feature subsets. This provides a reliable estimate of generalization performance and reduces overfitting risk.
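Concretely, the subset-evaluation step can be written with scikit-learn's `cross_val_score`; the synthetic dataset, the candidate index sets, and the tree settings below are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the data (assumption: 7 features, binary target)
X, y = make_classification(n_samples=500, n_features=7, n_informative=4,
                           random_state=0)

def cv_error(feature_idx, X, y, cv=5):
    """5-fold cross-validation error of a decision tree on a feature subset."""
    clf = DecisionTreeClassifier(random_state=0)
    scores = cross_val_score(clf, X[:, feature_idx], y, cv=cv)  # accuracies
    return 1.0 - scores.mean()                                  # error rate

err_all = cv_error(list(range(7)), X, y)   # all 7 features
err_sub = cv_error([0, 2, 3], X, y)        # a candidate 3-feature subset
```

Each candidate A' in the LVW loop would be scored this way, so the total cost is one full cross-validated training run per candidate subset.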
Consider a credit scoring dataset with features: Income, Age, Employment, Credit Score, Debt Ratio, Loan Amount, and Loan History. We want to select features for a decision tree classifier to predict loan default.
Start with all 7 features: A* = {Income, Age, Employment, Credit Score, Debt Ratio, Loan Amount, Loan History}, d = 7. Train the decision tree with 5-fold cross-validation: E = 18% (error rate).
Randomly generate a 3-feature subset A' and train the decision tree with 5-fold CV: its error E' falls below the current best E = 18%.
→ Update: E = E', d = 3, A* = A' (reset t = 0).
Random subsets {Age, Employment}, {Loan Amount}, etc. all have E' ≥ E (no improvement, and no tie at a smaller size). The counter increments: t = 1, 2, …
Randomly generate A' = {Income, Credit Score, Debt Ratio, Loan History} and train with these 4 features, 5-fold CV: E' = 12%.
→ Update: E = 12%, d = 4, A* = {Income, Credit Score, Debt Ratio, Loan History} (reset t = 0).
After T consecutive non-improvements, the algorithm stops. Optimal subset: {Income, Credit Score, Debt Ratio, Loan History} with a 12% error rate, reducing from 7 features to 4 while improving performance.
Understanding the trade-offs helps choose the right method:
| Aspect | Filter Methods | Wrapper Methods |
|---|---|---|
| Speed | Very fast (linear time) | Slow (requires model training) |
| Performance | Good, general-purpose | Better, learner-specific |
| Feature Interactions | Often ignored | Automatically considered |
| Overfitting Risk | Low | Higher (mitigated by CV) |
| Scalability | Excellent | Limited by training time |
| Use Case | Initial screening, large datasets | Final optimization, small-medium datasets |
Wrapper methods optimize feature selection for specific learning algorithms, often achieving better performance than filter methods.
LVW uses random subset search to avoid local optima, evaluating subsets with cross-validation for reliable performance estimates.
The algorithm stops after T consecutive iterations without improvement, balancing exploration and convergence.
Cross-validation provides reliable estimates of generalization error and reduces overfitting risk in feature selection.
Wrapper methods are computationally expensive but valuable when performance is critical and dataset size is manageable.
The tie-breaking rule (prefer smaller subsets when errors are equal) implements the parsimony principle.
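The acceptance test implied by this rule can be written as a single comparison (the function and argument names here are illustrative, not from the source):

```python
def accept(err_new, size_new, err_best, size_best):
    # Accept the candidate if its CV error is strictly lower, or if the
    # errors tie and the candidate uses fewer features (parsimony)
    return err_new < err_best or (err_new == err_best and size_new < size_best)

accept(0.12, 4, 0.12, 7)   # tie on error, fewer features -> accepted
```

The strict inequalities mean a candidate that is merely equal in both error and size is rejected, so the search never churns between equivalent subsets.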