MathIsimple

RIPPER Algorithm

Master the RIPPER algorithm for global rule set optimization. Learn how to overcome sequential covering's greedy limitations through rule re-specialization and re-generalization.

Module 5 of 7
Intermediate to Advanced
120-150 min

Motivation: Overcoming Sequential Covering Limitations

Sequential covering is a greedy algorithm that learns rules one at a time, never revisiting or modifying previously learned rules. This can lead to suboptimal rule sets because early rules may prevent better rules from being discovered later.

RIPPER's Solution: Global Optimization

RIPPER (Repeated Incremental Pruning to Produce Error Reduction) addresses this by optimizing rules globally. After generating an initial rule set, RIPPER iteratively refines each rule by re-learning it on its covered samples, potentially finding better rules that improve the entire rule set's performance.

Key Innovation:

Instead of treating rules as immutable, RIPPER allows rules to be re-specialized and re-generalized based on their contribution to the overall rule set performance, not just individual rule quality.

RIPPER Algorithm Steps

Step 1

Generate Initial Rule Set

Use the IREP* algorithm to generate an initial rule set R = {r_1, r_2, …, r_k} that covers most training samples. This provides a starting point for optimization.

Example:

Initial rule set: 3 rules covering 95% of training samples with 82% overall accuracy.
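The rule-set representation used throughout this module can be sketched in a few lines of code. This is a minimal illustration under our own assumptions: a rule is a conjunction of literal predicates over dict-valued samples, and the `covers` / `rule_set_accuracy` names are ours, not part of any RIPPER implementation.

```python
def covers(rule, sample):
    """A rule (list of literal predicates) covers a sample iff every literal holds."""
    return all(literal(sample) for literal in rule)

def rule_set_accuracy(rules, samples, labels):
    """Predict positive iff any rule fires; return accuracy of the whole rule set."""
    correct = 0
    for sample, label in zip(samples, labels):
        predicted = any(covers(r, sample) for r in rules)
        correct += (predicted == label)
    return correct / len(samples)

# Example: the fraud rule (amount > 10000) AND (location = foreign)
rule = [
    lambda s: s["amount"] > 10000,
    lambda s: s["location"] == "foreign",
]
samples = [
    {"amount": 15000, "location": "foreign"},   # fraudulent
    {"amount": 15000, "location": "domestic"},  # legitimate
]
labels = [True, False]
print(rule_set_accuracy([rule], samples, labels))  # → 1.0
```

Treating a rule set as a disjunction of conjunctions ("any rule fires → positive") is the evaluation convention assumed in the accuracy figures below.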

Step 2

Rule Optimization Loop

For each rule r_i in the rule set:

  1. Collect covered samples: find all samples (positive and negative) covered by r_i in the original training set.
  2. Re-learn the rule: apply single-rule learning to the covered samples to generate a new candidate rule r_i′.
  3. Two optimization strategies:
    • Re-specialize: start from r_i and add literals to improve accuracy
    • Re-generalize: start from r_i and remove literals to improve coverage
  4. Global evaluation: replace r_i with r_i′ in the rule set and evaluate the entire rule set on a validation set. Keep the version (original or new) with the higher validation accuracy.

Step 3

Iterative Improvement

Repeat step 2 for all rules in the rule set. Continue iterating until:

  • No rule can be improved (validation accuracy stops increasing)
  • The maximum number of iterations is reached
  • The improvement threshold is not met
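The loop in Steps 2–3 can be sketched as runnable toy code. This is a simplification under stated assumptions: rules are frozensets of (attribute, value) equality literals over dict samples, and "re-learning" is reduced to trying single-literal additions and deletions, whereas real RIPPER re-learns with IREP*-style growing and pruning. All names here are illustrative.

```python
def covers(rule, sample):
    return all(sample.get(a) == v for a, v in rule)

def accuracy(rules, data):
    """Rule-set accuracy: predict positive iff any rule covers the sample."""
    hits = sum((any(covers(r, s) for r in rules)) == y for s, y in data)
    return hits / len(data)

def candidates(rule, literals):
    """Neighbours of a rule: re-specialize (add a literal) or re-generalize (drop one)."""
    for lit in literals:
        if lit not in rule:
            yield rule | {lit}      # re-specialization
    for lit in rule:
        yield rule - {lit}          # re-generalization

def optimize(rules, valid, literals, max_iters=10):
    for _ in range(max_iters):
        improved = False
        for i, rule in enumerate(rules):
            best, best_acc = rule, accuracy(rules, valid)
            for cand in candidates(rule, literals):
                trial = rules[:i] + [cand] + rules[i + 1:]
                acc = accuracy(trial, valid)          # global evaluation (Step 2d)
                if acc > best_acc:
                    best, best_acc = cand, acc
            if best is not rule:
                rules = rules[:i] + [best] + rules[i + 1:]
                improved = True
        if not improved:                              # stopping criterion
            break
    return rules

# Toy validation data: fraud iff the transaction is big AND foreign.
valid = [
    ({"big": "yes", "loc": "foreign"}, True),
    ({"big": "yes", "loc": "domestic"}, False),
    ({"big": "no", "loc": "foreign"}, False),
]
literals = {("big", "yes"), ("loc", "foreign")}
initial = [frozenset({("big", "yes")})]          # 2/3 correct on valid
optimized = optimize(initial, valid, literals)
print(accuracy(optimized, valid))                # → 1.0
```

The loop converges here in two passes: the first pass re-specializes the rule by adding the location literal, and the second pass finds no further improvement.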

Re-Specialization and Re-Generalization

RIPPER uses two complementary strategies to optimize rules:

Re-Specialization (Adding Literals)

Start from current rule and add literals to reduce negative coverage while maintaining positive coverage:

Example:

Current rule: fraud ← (amount > 10000)

Covers: 50 positives, 20 negatives (accuracy: 71%)

Re-specialize: fraud ← (amount > 10000) ∧ (location = foreign)

Covers: 45 positives, 5 negatives (accuracy: 90%) ✓ Better!

Re-Generalization (Removing Literals)

Start from current rule and remove literals to increase coverage while maintaining acceptable accuracy:

Example:

Current rule: fraud ← (amount > 10000) ∧ (location = foreign) ∧ (time = night)

Covers: 30 positives, 2 negatives (accuracy: 94%, but low coverage)

Re-generalize: fraud ← (amount > 10000) ∧ (location = foreign)

Covers: 45 positives, 5 negatives (accuracy: 90%, better coverage) ✓ Better overall!
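The accuracy figures in both examples follow directly from the coverage counts. A quick check, using only the numbers quoted above:

```python
# Rule accuracy on its covered samples = positives / (positives + negatives).
def rule_accuracy(pos, neg):
    return pos / (pos + neg)

print(round(rule_accuracy(50, 20), 2))  # amount-only rule           → 0.71
print(round(rule_accuracy(45, 5), 2))   # + location = foreign       → 0.9
print(round(rule_accuracy(30, 2), 2))   # + time = night (narrow)    → 0.94
```

Note that the 94%-accurate rule covers only 30 positives against the 90% rule's 45; this coverage/accuracy trade-off is exactly what global evaluation arbitrates.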

Global Evaluation

The key difference between RIPPER and sequential covering is global evaluation. Instead of evaluating a rule in isolation, RIPPER evaluates how the rule contributes to the entire rule set's performance.

Evaluation Process

  1. Original rule set: evaluate R = {r_1, r_2, …, r_k} on the validation set → accuracy A_original
  2. Modified rule set: replace r_i with r_i′ to get R′ and evaluate on the validation set → accuracy A_modified
  3. Decision: if A_modified > A_original, keep r_i′; otherwise keep r_i

Why Global Evaluation Matters:

A rule that improves individually may not improve the rule set if it overlaps with other rules. Global evaluation ensures that rule changes benefit the entire system, not just the individual rule.
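This effect can be made concrete with a toy example. The data and rule names below are invented for illustration: a specialized candidate rule is strictly more accurate in isolation, yet the rule set is no better, because another rule already covers the samples on which the candidate gains precision.

```python
def covers(rule, sample):
    return all(sample.get(a) == v for a, v in rule)

def set_accuracy(rules, data):
    """Accuracy of the whole rule set: predict positive iff any rule fires."""
    hits = sum((any(covers(r, s) for r in rules)) == y for s, y in data)
    return hits / len(data)

def individual_accuracy(rule, data):
    """Accuracy of one rule on just the samples it covers."""
    covered = [(s, y) for s, y in data if covers(rule, s)]
    return sum(y for _, y in covered) / len(covered)

data = [
    ({"big": "yes", "time": "night", "loc": "foreign"}, True),
    ({"big": "yes", "time": "day", "loc": "domestic"}, True),
    ({"big": "yes", "time": "day", "loc": "domestic"}, False),
    ({"big": "no", "time": "night", "loc": "foreign"}, True),
]
r1 = frozenset({("big", "yes")})                           # original r_i
r1_spec = frozenset({("big", "yes"), ("time", "night")})   # candidate r_i'
r2 = frozenset({("loc", "foreign")})                       # another rule in the set

# Individually, the specialized candidate looks strictly better...
print(individual_accuracy(r1, data), individual_accuracy(r1_spec, data))
# ...but the rule set is no better, so global evaluation keeps the original r1.
print(set_accuracy([r1, r2], data), set_accuracy([r1_spec, r2], data))
```

The candidate's only gain (excluding the covered negative) overlaps with r2's coverage, so replacing r1 trades away coverage without reducing the set's errors.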

Fraud Detection Example

Apply RIPPER to fraud detection. Optimize an initial rule set through iterative re-specialization and re-generalization.

Rule Optimization Process

| Iteration | Rule | Validation Accuracy | Action |
|-----------|------|---------------------|--------|
| 1 | fraud ← (amount > $10,000) ∧ (location = foreign) | 85% | Initial rule |
| 2 | fraud ← (amount > $10,000) ∧ (location = foreign) ∧ (time = night) | 88% | Re-specialize |
| 3 | fraud ← (amount > $10,000) ∧ (location = foreign) | 90% | Re-generalize |
| 4 | fraud ← (amount > $10,000) ∧ (location = foreign) | 90% | Converged |

RIPPER improves rule from 85% to 90% validation accuracy through re-specialization and re-generalization.

Optimization Details

Iteration 2: Re-Specialization

Original rule covers some legitimate transactions. Add "time = night" literal to reduce false positives.

Result: Accuracy improves from 85% to 88%, but coverage decreases slightly.

Iteration 3: Re-Generalization

"time = night" literal is too restrictive. Remove it to increase coverage while maintaining good accuracy.

Result: Accuracy improves to 90% with better coverage. Global evaluation shows this is better for the rule set.

Final Result:

RIPPER successfully optimizes the rule through iterative refinement. The final rule achieves 90% validation accuracy, better than the initial 85%, demonstrating the power of global optimization over greedy sequential covering.

Advantages Over Sequential Covering

Key Advantages

  • Global optimization: Considers entire rule set performance
  • Rule refinement: Can improve rules after initial learning
  • Better accuracy: Typically achieves higher validation accuracy
  • Handles rule interactions: Accounts for overlap between rules
  • Flexible optimization: Can specialize or generalize as needed

Limitations

  • Computational cost: More expensive than sequential covering
  • Requires validation set: Needs separate data for evaluation
  • Iteration overhead: Multiple passes through rule set
  • Convergence issues: May require many iterations
  • Still not globally optimal: Heuristic optimization, not exhaustive