MathIsimple

First-Order Rule Learning

Master rule learning for relational data. Learn first-order logic, variables and predicates, the FOIL algorithm, and applications to social networks and recommendation systems.

Module 6 of 7
Intermediate to Advanced
120-150 min

Motivation: Handling Relational Data

Propositional rules can only handle attribute-value data (tabular format). They cannot express relationships between entities, such as "user X follows user Y" or "product A is similar to product B". First-order rules extend rule learning to relational data by introducing variables and predicates.

When to Use First-Order Rules

  • Relational databases: Data with multiple tables and relationships
  • Social networks: User connections, friendships, follows
  • Recommendation systems: User-item interactions, similarity relationships
  • Knowledge graphs: Entity relationships and properties
  • Bioinformatics: Protein-protein interactions, gene relationships

First-Order Logic Basics

Variables

Variables (typically uppercase letters like X, Y, Z) represent entities or objects in the domain. They can be instantiated to specific values.

Example:

In a social network, X and Y represent users; X = Alice, Y = Bob is one instantiation.

Predicates

Predicates are relations or properties that can be true or false for given entities. They take variables or constants as arguments.

Examples:

  • follows(X, Y): User X follows user Y
  • similar_interests(X, Y): Users X and Y have similar interests
  • same_city(X, Y): Users X and Y are in the same city
  • recommend(X, Y): Recommend user Y to user X

First-Order Rules

First-order rules use variables and predicates to express relationships:

recommend(X, Y) ← follows(X, Y) ∧ similar_interests(X, Y)

This rule states: "If X follows Y AND X and Y have similar interests, then recommend Y to X."
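With relations stored as sets of ground tuples, the conjunctive body of such a rule is simply a set intersection. A minimal Python sketch (the facts mirror the social-network example later in this module):

```python
# Predicates as sets of ground tuples; the variables X, Y range over users.
follows = {("Alice", "Bob"), ("Bob", "Alice"), ("Charlie", "Alice")}
similar_interests = {("Alice", "Bob"), ("Alice", "Charlie"),
                     ("Bob", "Alice"), ("Charlie", "Alice")}

# recommend(X, Y) <- follows(X, Y) AND similar_interests(X, Y):
# a pair (X, Y) satisfies the conjunctive body iff it appears in both relations.
recommend = follows & similar_interests
print(sorted(recommend))
# -> [('Alice', 'Bob'), ('Bob', 'Alice'), ('Charlie', 'Alice')]
```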

FOIL Algorithm

FOIL (First-Order Inductive Learner) extends sequential covering to first-order rules. It uses the same divide-and-conquer framework but generates candidate literals from predicates instead of attribute-value pairs.

Algorithm Framework

FOIL follows sequential covering:

  1. Learn a rule: Use top-down search to find the best first-order rule covering many positive examples and few negative examples
  2. Remove covered samples: Remove all samples (positive and negative) covered by the learned rule
  3. Repeat: Continue until no more positive examples remain
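The outer loop above can be sketched in a few lines. Here `learn_one_rule` is a hypothetical inner-loop learner (not defined in this module) that returns a rule together with the positive examples it covers:

```python
def sequential_covering(positives, negatives, learn_one_rule):
    """FOIL's outer loop (sketch): learn rules until every positive example
    is covered. `learn_one_rule(pos, neg)` is assumed to return
    (rule, covered_positives)."""
    rules = []
    pos = set(positives)
    while pos:
        rule, covered = learn_one_rule(pos, negatives)
        if not covered:               # inner learner made no progress: stop
            break
        rules.append(rule)
        pos -= covered                # remove the covered positive examples
    return rules

# Tiny demo with a stub learner that "covers" all positives sharing a first field.
def stub_learner(pos, neg):
    key = next(iter(pos))[0]
    covered = {p for p in pos if p[0] == key}
    return f"rule_for_{key}", covered

rules = sequential_covering({("a", 1), ("a", 2), ("b", 1)}, set(), stub_learner)
print(len(rules))  # -> 2
```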

Candidate Literal Generation

For each predicate in the background knowledge, FOIL generates candidate literals:

Types of Candidates:

  • Positive literals: P(X, Y), P(X, constant), P(constant, Y)
  • Negative literals: ¬P(X, Y) (negation of predicates)
  • Recursive literals: Q(X, Y), where Q is the target predicate (allows recursive rules)
  • Function literals: P(f(X), Y) (functions applied to variables)
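A sketch of candidate generation, assuming a hypothetical `predicates` map of name to arity and FOIL's usual restriction that each new literal shares at least one variable with the current rule. Positive and negated forms are emitted and one fresh variable may be introduced; function literals are omitted for brevity:

```python
from itertools import product

def candidate_literals(predicates, rule_vars, new_var="Z"):
    """Enumerate candidate body literals (sketch). Each candidate must share
    at least one variable with the current rule."""
    pool = list(rule_vars) + [new_var]      # allow one new variable
    candidates = []
    for name, arity in predicates.items():
        for args in product(pool, repeat=arity):
            if not any(a in rule_vars for a in args):
                continue  # skip literals disconnected from the rule
            candidates.append((name, args))           # P(...)
            candidates.append(("not_" + name, args))  # negated literal
    return candidates

cands = candidate_literals({"follows": 2}, ["X", "Y"])
print(len(cands))  # -> 16: 8 linked argument tuples, each in two polarities
```

Including the target predicate itself in `predicates` yields the recursive literals described above.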

F-GAIN Evaluation Metric

F-GAIN is the first-order extension of information gain. It measures the improvement in rule quality when adding a candidate literal to the current rule.

F-GAIN Formula

\text{F-GAIN} = \hat{m}_+ \left( \log_2 \frac{\hat{m}_+}{\hat{m}_+ + \hat{m}_-} - \log_2 \frac{m_+}{m_+ + m_-} \right)

Where:

  • m_+ = number of positive examples covered by the current rule
  • m_- = number of negative examples covered by the current rule
  • \hat{m}_+ = number of positive examples covered by the rule after adding the literal
  • \hat{m}_- = number of negative examples covered by the rule after adding the literal

Interpretation:

F-GAIN measures the reduction in entropy (uncertainty) weighted by the number of positive examples covered. Higher F-GAIN means the literal significantly improves rule quality by increasing accuracy while maintaining or increasing positive coverage.
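The formula translates directly into code. A small helper (names are illustrative), checked against the social-network walkthrough below, where the empty rule covers 3 positives and 2 negatives, and adding follows(X, Y) leaves 3 positives and 0 negatives:

```python
from math import log2

def foil_gain(m_pos, m_neg, new_pos, new_neg):
    """F-GAIN of adding a literal: positives covered afterwards, weighted by
    the improvement in log-precision (matches the formula above)."""
    before = log2(m_pos / (m_pos + m_neg))
    after = log2(new_pos / (new_pos + new_neg))
    return new_pos * (after - before)

print(round(foil_gain(3, 2, 3, 0), 2))  # -> 2.21
```

Note that the helper assumes `new_pos > 0`; a literal covering no positives should simply be discarded.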

Social Network Analysis Example

Apply FOIL to learn recommendation rules for a social network. Predict when to recommend user Y to user X based on their relationships and attributes.

Training Data: User Relationships

User X     User Y     Follows   Similar Interests   Same City   Recommend
Alice      Bob        Yes       Yes                 Yes         Yes
Alice      Charlie    No        Yes                 No          No
Bob        Alice      Yes       Yes                 Yes         Yes
Bob        David      No        No                  Yes         No
Charlie    Alice      Yes       Yes                 No          Yes

Goal: Learn rule predicting when to recommend user Y to user X based on relationships.

FOIL Learning Process

Step 1: Initial Rule

recommend(X, Y) ←

Covers all 5 examples (3 positive, 2 negative). Accuracy: 60%.

Step 2: Add Literal

Candidate literals: follows(X, Y), similar_interests(X, Y), same_city(X, Y)

F-GAIN for follows(X, Y): ≈ 2.21 (highest)

recommend(X, Y) ← follows(X, Y)

Covers 3 examples (3 positive, 0 negative). Accuracy: 100%.

Final Rule:

recommend(X, Y) ← follows(X, Y)

Perfect accuracy (100%) covering all positive examples. Rule states: "If X follows Y, then recommend Y to X."
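The greedy first step can be reproduced end to end. This sketch encodes the five examples using the walkthrough's counts (3 positive, 2 negative) and confirms that follows(X, Y) has the highest F-GAIN of the three candidates:

```python
from math import log2

# (X, Y, follows, similar_interests, same_city, recommend) for the five examples.
data = [
    ("Alice",   "Bob",     1, 1, 1, 1),
    ("Alice",   "Charlie", 0, 1, 0, 0),
    ("Bob",     "Alice",   1, 1, 1, 1),
    ("Bob",     "David",   0, 0, 1, 0),
    ("Charlie", "Alice",   1, 1, 0, 1),
]

def gain(col):
    """F-GAIN of adding the literal tested in column `col` to the empty rule."""
    m_pos = sum(r[5] for r in data)        # 3 positives covered before
    m_neg = len(data) - m_pos              # 2 negatives covered before
    covered = [r for r in data if r[col]]
    new_pos = sum(r[5] for r in covered)
    new_neg = len(covered) - new_pos
    if new_pos == 0:
        return float("-inf")               # literal covers no positives
    return new_pos * (log2(new_pos / (new_pos + new_neg))
                      - log2(m_pos / (m_pos + m_neg)))

scores = {"follows": gain(2), "similar_interests": gain(3), "same_city": gain(4)}
best = max(scores, key=scores.get)
print(best, round(scores[best], 2))  # -> follows 2.21
```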

Comparison with Propositional Rules

Propositional Rules

  • Handle attribute-value data only
  • Cannot express relationships
  • Simpler and more efficient
  • Limited to tabular data
  • Example: approve ← (income ≥ 50000)

First-Order Rules

  • Handle relational data
  • Express entity relationships
  • More expressive but more complex
  • Suitable for networks/graphs
  • Example: recommend(X, Y) ← follows(X, Y)

Advantages and Limitations

Advantages

  • Handles relational data: Can model entity relationships
  • More expressive: Captures complex patterns in networks
  • No attribute quantization: Works with qualitative relationships
  • Background knowledge: Can incorporate domain knowledge

Limitations

  • Large search space: Many possible predicates and variable bindings
  • Computational complexity: More expensive than propositional learning
  • Requires background knowledge: Need to define predicates
  • Less efficient: Slower than propositional methods