Master filter-based feature selection that operates independently of learning algorithms. Learn Relief for binary classification and ReliefF for multi-class problems using near-hit and near-miss concepts.
Filter methods perform feature selection as a preprocessing step, independent of any learning algorithm. They evaluate features based on intrinsic properties of the data, such as correlation with the target variable or information content, without training a model.
Relief is a filter method designed for binary classification. It evaluates features by how well they distinguish between samples from different classes.
For a sample $x_i$, the near-hit is the nearest sample from the same class as $x_i$.
Intuition: A good feature should have similar values for samples in the same class.
For a sample $x_i$, the near-miss is the nearest sample from a different class.
Intuition: A good feature should have different values for samples from different classes.
The difference function $\operatorname{diff}(j, x_a, x_b)$ measures how different two samples $x_a$ and $x_b$ are on feature $j$:

For discrete/categorical features:

$$\operatorname{diff}(j, x_a, x_b) = \begin{cases} 0 & \text{if } x_{aj} = x_{bj} \\ 1 & \text{otherwise} \end{cases}$$

For continuous features (normalized to $[0, 1]$):

$$\operatorname{diff}(j, x_a, x_b) = |x_{aj} - x_{bj}|$$
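For illustration, here is a minimal sketch of this difference function in Python. The name `diff` and the boolean array `is_categorical` are our own, and continuous features are assumed to be pre-scaled to $[0, 1]$:

```python
def diff(j, x_a, x_b, is_categorical):
    """Per-feature difference between two samples on feature j."""
    if is_categorical[j]:
        # Discrete/categorical: 0 if the values match, 1 otherwise
        return 0.0 if x_a[j] == x_b[j] else 1.0
    # Continuous: absolute difference, assuming features are scaled to [0, 1]
    return abs(x_a[j] - x_b[j])
```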
For each feature $j$, Relief computes a relevance statistic:

$$\theta_j = \frac{1}{m}\sum_{i=1}^{m}\Big(\operatorname{diff}\big(j, x_i, \text{near-miss}(x_i)\big) - \operatorname{diff}\big(j, x_i, \text{near-hit}(x_i)\big)\Big)$$

where the sum is over the $m$ sampled instances; when $m = n$ it runs over all samples in the dataset.
Interpretation: Higher $\theta_j$ means feature $j$ is more relevant. The feature is good if $\theta_j > 0$: on average, it differs more between a sample and its near-miss than between a sample and its near-hit.
The complete Relief algorithm for binary classification:
Initialize relevance statistics: $\theta_j = 0$ for all features $j$.
For $m$ iterations (typically $m = n$, where $n$ is the number of samples): randomly select a sample $x_i$, find its near-hit and near-miss, and update each $\theta_j$ by adding $\big(\operatorname{diff}(j, x_i, \text{near-miss}(x_i)) - \operatorname{diff}(j, x_i, \text{near-hit}(x_i))\big)/m$.
Rank features by $\theta_j$ in descending order. Select the top $k$ features, or all features with $\theta_j$ above a chosen threshold.
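These steps translate almost directly into code. The following is a minimal sketch assuming continuous features scaled to $[0, 1]$ and Manhattan distance for the nearest-neighbour search; the names `relief`, `n_iter`, and `theta` are illustrative, not from any particular library:

```python
import numpy as np

def relief(X, y, n_iter=None, rng=None):
    """Relief relevance statistics for binary classification.

    X : (n_samples, n_features) array with features scaled to [0, 1]
    y : (n_samples,) array of binary class labels
    """
    rng = np.random.default_rng(rng)
    n, d = X.shape
    n_iter = n if n_iter is None else n_iter
    theta = np.zeros(d)

    for _ in range(n_iter):
        i = rng.integers(n)                    # randomly pick a sample
        dist = np.abs(X - X[i]).sum(axis=1)    # Manhattan distance to all samples
        dist[i] = np.inf                       # exclude the sample itself

        same = (y == y[i])
        near_hit = np.argmin(np.where(same, dist, np.inf))    # nearest same-class sample
        near_miss = np.argmin(np.where(~same, dist, np.inf))  # nearest other-class sample

        # Reward features that differ on the near-miss, penalize those that differ on the near-hit
        theta += (np.abs(X[i] - X[near_miss]) - np.abs(X[i] - X[near_hit])) / n_iter

    return theta
```

Features can then be ranked with `np.argsort(relief(X, y))[::-1]` and the top $k$ retained.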
ReliefF extends Relief to handle multi-class classification problems. Instead of a single near-miss, it considers near-misses from each different class.
For a sample $x_i$ with class $y_i$, ReliefF finds the near-hit from class $y_i$ and a near-miss from every other class $c \neq y_i$.
The relevance statistic for feature $j$:

$$\theta_j = \frac{1}{m}\sum_{i=1}^{m}\Bigg(\sum_{c \neq y_i}\frac{p_c}{1 - p_{y_i}}\,\operatorname{diff}\big(j, x_i, \text{near-miss}_c(x_i)\big) - \operatorname{diff}\big(j, x_i, \text{near-hit}(x_i)\big)\Bigg)$$
where $p_c$ is the proportion of samples belonging to class $c$ in the dataset, $y_i$ is the class of sample $x_i$, and $\text{near-miss}_c(x_i)$ is the nearest sample to $x_i$ from class $c$.
Interpretation: The weighted sum ensures that classes with more samples contribute more to the relevance statistic. This handles class imbalance naturally.
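A sketch of the corresponding multi-class update is below, again with illustrative names and with a single nearest neighbour per class (full ReliefF typically averages over the $k$ nearest hits and misses):

```python
import numpy as np

def relieff(X, y, n_iter=None, rng=None):
    """ReliefF relevance statistics for multi-class problems (one neighbour per class)."""
    rng = np.random.default_rng(rng)
    n, d = X.shape
    n_iter = n if n_iter is None else n_iter
    classes, counts = np.unique(y, return_counts=True)
    p = dict(zip(classes, counts / n))         # class proportions p_c
    theta = np.zeros(d)

    for _ in range(n_iter):
        i = rng.integers(n)
        dist = np.abs(X - X[i]).sum(axis=1)
        dist[i] = np.inf

        # Near-hit from the same class
        hit = np.argmin(np.where(y == y[i], dist, np.inf))
        theta -= np.abs(X[i] - X[hit]) / n_iter

        # One near-miss per other class, weighted by p_c / (1 - p_{y_i})
        for c in classes:
            if c == y[i]:
                continue
            miss = np.argmin(np.where(y == c, dist, np.inf))
            w = p[c] / (1.0 - p[y[i]])
            theta += w * np.abs(X[i] - X[miss]) / n_iter

    return theta
```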
Consider a medical diagnosis dataset with features: Age, Blood Pressure, Cholesterol, and Glucose. We want to predict disease presence (Yes/No).
| Age | BP | Cholesterol | Glucose | Disease |
|---|---|---|---|---|
| 0.6 | 0.8 | 0.7 | 0.9 | Yes |
| 0.5 | 0.7 | 0.6 | 0.8 | Yes |
| 0.3 | 0.2 | 0.3 | 0.2 | No |
| 0.4 | 0.3 | 0.4 | 0.3 | No |
| 0.7 | 0.9 | 0.8 | 0.95 | Yes |
Take Sample 1 as the current sample:
• Near-hit: Sample 2 (same class, closest)
• Near-miss: Sample 3 (different class, closest)
For the Glucose feature: $\operatorname{diff} = |0.9 - 0.8| = 0.1$ (near-hit) and $\operatorname{diff} = |0.9 - 0.2| = 0.7$ (near-miss).

Update: $\theta_{\text{Glucose}}$ increases by $0.7 - 0.1 = 0.6$ (before averaging over the $m$ iterations).
Glucose accumulates the highest relevance statistic because it best distinguishes between disease and no-disease cases. Blood Pressure and Cholesterol also show high relevance, while Age may be less discriminative.
Select the top 2-3 features based on their $\theta_j$ values. This reduces dimensionality while preserving the most discriminative information.
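The Sample 1 contributions above can be double-checked with a few lines of NumPy; the feature names and indexing simply follow the table:

```python
import numpy as np

# The five samples from the table: Age, BP, Cholesterol, Glucose
X = np.array([
    [0.6, 0.8, 0.7, 0.9],
    [0.5, 0.7, 0.6, 0.8],
    [0.3, 0.2, 0.3, 0.2],
    [0.4, 0.3, 0.4, 0.3],
    [0.7, 0.9, 0.8, 0.95],
])

# Contribution of Sample 1 (index 0): Sample 2 is the near-hit, Sample 3 the near-miss
contrib = np.abs(X[0] - X[2]) - np.abs(X[0] - X[1])
for name, c in zip(["Age", "BP", "Cholesterol", "Glucose"], contrib):
    print(f"{name}: {c:.2f}")
# Age: 0.20, BP: 0.50, Cholesterol: 0.30, Glucose: 0.60
```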
Relief and ReliefF are highly efficient filter methods. Their time complexity is

$$O(m \cdot n \cdot d)$$

where:
• $m$ is the number of iterations,
• $n$ is the number of samples (each iteration scans all samples to find the nearest neighbours),
• $d$ is the number of features.
Key Advantage: The cost grows linearly in the number of features and iterations, and no model is ever trained. This makes Relief/ReliefF fast on large-scale datasets and much faster than wrapper methods, which must train a model for every candidate feature subset. (Note that with $m = n$, the nearest-neighbour search makes the total cost quadratic in $n$.)
Filter methods evaluate features independently of learning algorithms, making them fast and general-purpose.
Relief uses near-hit and near-miss concepts to measure feature relevance: good features have similar values for same-class samples and different values for different-class samples.
ReliefF extends Relief to multi-class problems by considering near-misses from each different class, weighted by class proportions.
The relevance statistic $\theta_j$ quantifies feature importance: higher values indicate more discriminative features.
Relief/ReliefF have $O(m \cdot n \cdot d)$ time complexity, making them highly efficient for large-scale feature selection.
Filter methods are ideal for initial feature screening before applying more expensive wrapper or embedded methods.