Sequential Covering

Master the core learning framework for rule extraction. Learn the divide-and-conquer strategy, greedy algorithm characteristics, and step-by-step examples with loan approval classification.

Module 2 of 7
Intermediate Level
90-120 min

Core Idea: Divide-and-Conquer Strategy

Sequential covering is a divide-and-conquer approach (also called separate-and-conquer in the rule-learning literature) to learning rule sets. The algorithm learns one rule at a time, removes all samples covered by that rule (both positive and negative), and repeats the process on the remaining samples until all samples are covered or no effective rule can be learned.

Key Principle

Each learned rule should cover as many positive examples as possible while avoiding negative examples. Once a rule is learned, all covered samples (positive and negative) are removed from the training set, and the process continues with the remaining samples.

Why Remove Covered Samples?

  • Prevents redundant rules that cover the same samples
  • Focuses learning on uncovered samples
  • Ensures each rule contributes uniquely to the rule set
  • Simplifies the learning problem incrementally
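
In code, a rule body is just a conjunction of attribute tests, and "covers" means every test passes. A minimal Python sketch (the `Condition` class and its field names are illustrative, not taken from any particular library):

```python
from dataclasses import dataclass

# One attribute test in a rule body, e.g. income >= 50000.
@dataclass
class Condition:
    attribute: str
    op: str          # one of ">=", "<", "=="
    value: float

def covers(body: list[Condition], sample: dict) -> bool:
    """A rule covers a sample iff every condition in its body holds."""
    for c in body:
        v = sample[c.attribute]
        if c.op == ">=" and not (v >= c.value):
            return False
        if c.op == "<" and not (v < c.value):
            return False
        if c.op == "==" and v != c.value:
            return False
    return True
```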

Algorithm Steps

Sequential Covering Algorithm

  1. Initialize: Start with the full training set $D$ containing positive examples $P$ and negative examples $N$. Initialize the empty rule set $R = \emptyset$.
  2. Learn a rule: Use a single-rule learning algorithm to find the best rule $r$, one that covers many positive examples and few negative examples in the current training set.
  3. Add rule to set: Add the learned rule $r$ to the rule set: $R = R \cup \{r\}$.
  4. Remove covered samples: Remove all samples (both positive and negative) covered by rule $r$ from the training set: $D = D \setminus \text{covered}(r)$.
  5. Check termination: If no positive examples remain or no effective rule can be learned, stop. Otherwise, return to step 2.
  6. Add default rule: If needed, add a default rule (usually predicting the majority class) to handle any remaining uncovered samples.
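
Put together, the steps collapse into a short loop. The sketch below assumes a `learn_one_rule` callable standing in for any single-rule learner; it is assumed (not defined here) to return the rule plus the indices of the samples it covers, or `None` when no rule meets the accuracy threshold:

```python
def sequential_covering(samples, learn_one_rule, min_accuracy=0.8, max_rules=10):
    """Greedy covering loop over (features, label) pairs (label True = positive).

    learn_one_rule(remaining, min_accuracy) is assumed to return
    (rule, covered_indices) or None when no effective rule exists.
    """
    rules = []
    remaining = list(samples)                      # step 1: D = full training set
    while any(label for _, label in remaining):    # positives still uncovered?
        if len(rules) >= max_rules:
            break
        learned = learn_one_rule(remaining, min_accuracy)   # step 2
        if learned is None:                        # step 5: no effective rule
            break
        rule, covered = learned
        rules.append(rule)                         # step 3: R = R ∪ {r}
        remaining = [s for i, s in enumerate(remaining)
                     if i not in covered]          # step 4: D = D \ covered(r)
    # Step 6: default rule predicts the majority class of whatever is left.
    labels = [label for _, label in remaining]
    default = max(set(labels), key=labels.count) if labels else False
    return rules, default
```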

Greedy Algorithm Characteristics

Sequential covering is a greedy algorithm: at each step, it selects the locally optimal rule (best for current samples) without considering the global impact on the final rule set.

Advantages

  • Efficient: Polynomial time complexity
  • Simple: Easy to implement and understand
  • Incremental: Builds rule set step by step
  • Interpretable: Each rule is learned independently

Limitations

  • Suboptimal: May not find globally optimal rule set
  • Order dependency: Rule order matters for predictions
  • Early commitment: Once a rule is added, it cannot be modified
  • Local optima: May get stuck in suboptimal solutions

Example of Suboptimality

If the algorithm first learns rule "approve ← income ≥ $50,000", it may miss a better rule "approve ← income ≥ $40,000 ∧ credit_score ≥ 750" that covers more samples with higher accuracy. This is why advanced algorithms like RIPPER use global optimization to refine rules.

Loan Approval Example

Apply sequential covering to learn a rule set for loan approval. The training set contains 100 samples: 80 approve (positive) and 20 reject (negative).

Sequential Covering Process

Step | Learned Rule | Samples Removed | Remaining
---- | ------------ | --------------- | ---------
1 | IF income ≥ $50,000 AND credit_score ≥ 700 THEN approve | 45 | 55
2 | IF employment_years ≥ 2 AND credit_score ≥ 650 THEN approve | 30 | 25
3 | IF debt_ratio < 0.3 AND savings ≥ $10,000 THEN approve | 20 | 5
4 | Default: reject | 5 | 0

After 4 steps, all 100 samples are covered. Rules are evaluated in order (1 → 2 → 3 → default).
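
Prediction with the finished rule set is first-match-wins. A small sketch using the rules from the table, written as plain predicate functions over a feature dict (the attribute names follow the table):

```python
# Ordered decision list: (rule body, prediction), evaluated top to bottom.
rules = [
    (lambda s: s["income"] >= 50_000 and s["credit_score"] >= 700, "approve"),
    (lambda s: s["employment_years"] >= 2 and s["credit_score"] >= 650, "approve"),
    (lambda s: s["debt_ratio"] < 0.3 and s["savings"] >= 10_000, "approve"),
]

def predict(sample, rules, default="reject"):
    """Return the prediction of the first rule whose body matches."""
    for body, prediction in rules:
        if body(sample):
            return prediction
    return default  # step 4's default rule

applicant = {"income": 42_000, "credit_score": 660,
             "employment_years": 3, "debt_ratio": 0.4, "savings": 2_000}
print(predict(applicant, rules))  # fails rule 1, matches rule 2 -> "approve"
```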

Step-by-Step Walkthrough

Step 1: Initial State

Training set: 80 approve, 20 reject (100 total). Learn the first rule, covering high-income, high-credit applicants. The rule covers 45 samples (40 approve, 5 reject). Remove all 45 covered samples.

Step 2: Remaining Samples

Remaining: 40 approve, 15 reject (55 total). Learn a rule for stable-employment cases. The rule covers 30 samples (25 approve, 5 reject). Remove all 30 covered samples.

Step 3: Final Rule

Remaining: 15 approve, 10 reject (25 total). Learn a rule for low-debt, high-savings cases. The rule covers 20 samples (15 approve, 5 reject). Remove all 20 covered samples.

Step 4: Default Rule

Remaining: 0 approve, 5 reject (5 total). Add default rule "reject" to handle remaining samples. All samples now covered.

Final Rule Set:

Four rules learned sequentially. Each rule handles a distinct subset of samples. Rules are evaluated in order, and the first matching rule's prediction is used.
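
The bookkeeping in the walkthrough is easy to verify mechanically; each step just subtracts the (approve, reject) counts covered by that rule:

```python
# (approve_covered, reject_covered) for rules 1-3, from the walkthrough.
steps = [(40, 5), (25, 5), (15, 5)]
approve, reject = 80, 20
for i, (a, r) in enumerate(steps, start=1):
    approve, reject = approve - a, reject - r
    print(f"after rule {i}: {approve} approve, {reject} reject remaining")
# after rule 3: 0 approve, 5 reject remaining -> handled by the default rule
```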

Coverage and Termination Conditions

Coverage Metrics

Rule Coverage

Number of samples (positive and negative) that satisfy the rule body. A rule with high positive coverage and low negative coverage is desirable.

Rule Accuracy

Proportion of covered samples that are positive:

$$\text{accuracy} = \frac{\text{covered\_positives}}{\text{covered\_positives} + \text{covered\_negatives}}$$
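
Both metrics reduce to counting over the samples a rule matches. A minimal sketch (any boolean coverage test works for `rule_covers`, such as the `covers` function sketched earlier):

```python
def coverage_and_accuracy(rule_covers, samples):
    """samples: (features, label) pairs with label True = positive.

    Returns (coverage, accuracy): how many samples the rule body matches,
    and what fraction of those matched samples are positive."""
    covered = [label for features, label in samples if rule_covers(features)]
    coverage = len(covered)
    accuracy = sum(covered) / coverage if coverage else 0.0
    return coverage, accuracy
```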

Termination Conditions

The algorithm stops when one of the following conditions is met:

  • No positive examples remain: All positive samples are covered by learned rules
  • No effective rule can be learned: No rule meets the minimum accuracy threshold
  • Maximum rules reached: Predefined limit on rule set size
  • Coverage threshold met: Sufficient percentage of positive examples covered
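
In an implementation, these checks collapse into a single stopping test run at the top of each covering iteration. A sketch with illustrative threshold names (`min_accuracy`, `max_rules`, and `target_coverage` are assumptions, not standard parameters):

```python
def should_stop(remaining, rules, best_accuracy, covered_pos, total_pos,
                min_accuracy=0.8, max_rules=10, target_coverage=0.95):
    """True as soon as any termination condition holds."""
    if not any(label for _, label in remaining):    # no positive examples remain
        return True
    if best_accuracy < min_accuracy:                # no effective rule can be learned
        return True
    if len(rules) >= max_rules:                     # maximum rules reached
        return True
    if covered_pos / total_pos >= target_coverage:  # coverage threshold met
        return True
    return False
```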

Advantages and Limitations

Advantages

  • Simple and intuitive: Easy to understand and implement
  • Efficient: Polynomial time complexity
  • Incremental: Builds rule set step by step
  • Interpretable: Each rule learned independently
  • Handles large datasets: Scales well with data size

Limitations

  • Greedy approach: May not find globally optimal rule set
  • Order dependency: Rule evaluation order matters
  • Early commitment: Cannot modify learned rules
  • Suboptimal coverage: May miss better rule combinations
  • Local optima: Can get stuck in suboptimal solutions