MathIsimple
Machine Learning/Learning Center/Rule Learning/Inductive Logic Programming

Inductive Logic Programming (ILP)

Master advanced ILP techniques for complex knowledge discovery. Learn Minimum General Generalization (LGG), inverse resolution, predicate invention, and applications to bioinformatics.

Module 7 of 7
Advanced Level
150-180 min

ILP Overview and Goals

Inductive Logic Programming (ILP) is the advanced form of first-order rule learning. Its goal is to comprehensively learn first-order rules (Horn clauses) and support predicate invention—automatically discovering new predicates that capture implicit relationships in the data.

Key Capabilities

  • Complete rule learning: Learns full first-order Horn clauses
  • Predicate invention: Discovers new predicates not in background knowledge
  • Background knowledge integration: Incorporates existing domain knowledge
  • Complex relationship modeling: Handles multi-entity relationships

Minimum General Generalization (LGG)

LGG generalizes two specific rules into the most general rule that covers both, making minimal changes to preserve essential information.

Core Idea

Given two specific rules r1r_1 and r2r_2, find the most general rule rLGGr_{\text{LGG}} such that:

  • rLGGr_{\text{LGG}} covers all examples covered by r1r_1 and r2r_2
  • rLGGr_{\text{LGG}} is the most general (least specific) such rule
  • • Changes from r1r_1 and r2r_2 are minimal

LGG Algorithm Steps

  1. 1. Find common predicates: Identify predicates that appear in both rules (with same predicate name)
  2. 2. For each common predicate:
    • • Recursively examine arguments (terms in parentheses)
    • • If arguments are identical, keep them as-is
    • • If arguments differ, replace with a new variable
  3. 3. Remove non-common predicates: Delete predicates that appear in only one rule
  4. 4. Result: The LGG rule contains only common predicates with generalized arguments

LGG Example

Rule 1:

better(1,10)rootmorecurled(1,10)soundmoredeep(1,10)better(1,10) ← root_more_curled(1,10) ∧ sound_more_deep(1,10)

Rule 2:

better(1,15)rootmorecurled(1,15)navelmoreconcave(1,15)better(1,15) ← root_more_curled(1,15) ∧ navel_more_concave(1,15)

LGG Result:

better(1,Y)rootmorecurled(1,Y)better(1,Y) ← root_more_curled(1,Y)

Common predicate: root_more_curled\text{root\_more\_curled}. Different arguments:1010 and 1515 → replaced with variable YY. Non-common predicates removed.

Inverse Resolution

Resolution is a deductive inference rule (from general to specific). Inverse resolution is the reverse—an inductive inference rule (from specific to general) that can invent new predicates to simplify rule structures.

Unification Concepts

Substitution

A substitution θ={t1/v1,t2/v2,}\theta = \{t_1/v_1, t_2/v_2, \ldots\} replaces variables viv_i with terms tit_i.

Example: θ={1/X,2/Y}\theta = \{1/X, 2/Y\} replaces XX with11 and YY with 22.

Unification

Unification finds a substitution that makes two expressions equal.A=color_deeper(1,X)A = \text{color\_deeper}(1, X) andB=color_deeper(Y,2)B = \text{color\_deeper}(Y, 2) unify withθ={2/X,1/Y}\theta = \{2/X, 1/Y\}.

Most General Unifier (MGU)

The MGU is the most general substitution that unifies two expressions. All other unifiers can be derived from the MGU.

Four Complete Inverse Resolution Operations

These operations enable predicate invention and rule generalization:

1. Absorption

pAB;qApqB;qA\frac{p \leftarrow A \land B; \quad q \leftarrow A}{p \leftarrow q \land B; \quad q \leftarrow A}

Introduces new predicate qq to factor out common subexpression AA.

2. Identification

pAB;pAqqB;pAq\frac{p \leftarrow A \land B; \quad p \leftarrow A \land q}{q \leftarrow B; \quad p \leftarrow A \land q}

Discovers that qq is equivalent to BB.

3. Intra-Construction

pAB;pACqB;pAq;qC\frac{p \leftarrow A \land B; \quad p \leftarrow A \land C}{q \leftarrow B; \quad p \leftarrow A \land q; \quad q \leftarrow C}

Invents predicate qq to represent common pattern in BB and CC.

4. Inter-Construction

pAB;qACprB;rA;qrC\frac{p \leftarrow A \land B; \quad q \leftarrow A \land C}{p \leftarrow r \land B; \quad r \leftarrow A; \quad q \leftarrow r \land C}

Invents predicate rr to factor out common subexpression AA.

Predicate Invention: Core Advantage

The most powerful feature of ILP is predicate invention—automatically discovering new predicates that capture implicit patterns in the data, making rules more concise and generalizable.

Example: Predicate Invention

Given rules:

better(X,Y)color_deeper(X,Y)root_more_curled(X,Y)\text{better}(X, Y) \leftarrow \text{color\_deeper}(X, Y) \land \text{root\_more\_curled}(X, Y)
better(X,Y)color_deeper(X,Y)navel_more_concave(X,Y)\text{better}(X, Y) \leftarrow \text{color\_deeper}(X, Y) \land \text{navel\_more\_concave}(X, Y)

ILP can invent new predicate superior(X,Y)\text{superior}(X, Y):

superior(X,Y)root_more_curled(X,Y)\text{superior}(X, Y) \leftarrow \text{root\_more\_curled}(X, Y)
superior(X,Y)navel_more_concave(X,Y)\text{superior}(X, Y) \leftarrow \text{navel\_more\_concave}(X, Y)
better(X,Y)color_deeper(X,Y)superior(X,Y)\text{better}(X, Y) \leftarrow \text{color\_deeper}(X, Y) \land \text{superior}(X, Y)

The invented predicate superior\text{superior} captures the common pattern, making the rule set more concise and interpretable.

Bioinformatics Application Example

Apply ILP to discover protein-protein interaction patterns. Learn rules predicting when two proteins interact based on their properties and relationships.

Protein Interaction Rules

Background Knowledge:

  • located_in(P,C)\text{located\_in}(P, C): Protein P is located in cellular component C
  • has_function(P,F)\text{has\_function}(P, F): Protein P has function F
  • similar_sequence(P1,P2)\text{similar\_sequence}(P1, P2): Proteins P1 and P2 have similar sequences
  • interacts(P1,P2)\text{interacts}(P1, P2): Proteins P1 and P2 interact (target predicate)

Learned Rule:

interacts(P1,P2)located_in(P1,C)located_in(P2,C)has_function(P1,F)has_function(P2,F)\text{interacts}(P1, P2) \leftarrow \text{located\_in}(P1, C) \land \text{located\_in}(P2, C) \land \text{has\_function}(P1, F) \land \text{has\_function}(P2, F)

Rule states: "If two proteins are in the same cellular component AND have the same function, then they interact."

With Predicate Invention:

ILP invents predicate co_localized(P1,P2)\text{co\_localized}(P1, P2):

co_localized(P1,P2)located_in(P1,C)located_in(P2,C)\text{co\_localized}(P1, P2) \leftarrow \text{located\_in}(P1, C) \land \text{located\_in}(P2, C)
interacts(P1,P2)co_localized(P1,P2)has_function(P1,F)has_function(P2,F)\text{interacts}(P1, P2) \leftarrow \text{co\_localized}(P1, P2) \land \text{has\_function}(P1, F) \land \text{has\_function}(P2, F)

The invented predicate co_localized\text{co\_localized} captures the co-location pattern, making the rule more concise and reusable.

ILP Advantage:

ILP discovers meaningful biological patterns and invents predicates that represent biological concepts (co-localization), enabling knowledge discovery that goes beyond simple pattern matching.

Advantages and Limitations

Advantages

  • Predicate invention: Discovers implicit relationships
  • Knowledge discovery: Reveals hidden patterns in data
  • Concise rules: Invented predicates simplify rule structures
  • Background knowledge: Integrates domain expertise
  • Comprehensive learning: Learns complete Horn clauses

Limitations

  • High complexity: Computationally expensive
  • Data quality sensitive: Requires clean, consistent data
  • Difficult to engineer: Hard to implement and tune
  • Limited scalability: May not scale to very large datasets
  • Requires expertise: Needs domain knowledge for predicates