
Exact Inference

Master precise probability computation methods that involve no approximation. Learn variable elimination and belief propagation, and understand when exact inference is feasible.

Module 5 of 7
Intermediate to Advanced
120-150 min

Inference Goals

In probabilistic graphical models, inference refers to computing probabilities of interest from the joint distribution. The two main types are:

Marginal Probability

P(x_E) = \sum_{x_F} P(x_E, x_F)

Compute the probability of the variable set E by summing over all possible values of the remaining variables F.

Conditional Probability

P(x_F \mid x_E) = \frac{P(x_E, x_F)}{\sum_{x_F} P(x_E, x_F)}

Compute the probability of F given the evidence E using Bayes' rule.
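To make both operations concrete, here is a minimal brute-force sketch. The three binary variables and the random joint table are illustrative assumptions, not part of the material above:

```python
import numpy as np

# Joint table P(A, B, C) over three binary variables; the entries are
# random placeholders, normalized so the whole table sums to 1.
rng = np.random.default_rng(0)
joint = rng.random((2, 2, 2))
joint /= joint.sum()

# Marginal P(A): sum out the other variables (the set F = {B, C}).
p_a = joint.sum(axis=(1, 2))

# Conditional P(B, C | A=1): restrict the joint to the evidence A=1,
# then renormalize by the evidence probability P(A=1).
slice_a1 = joint[1]                 # P(A=1, B, C), shape (2, 2)
p_bc_given_a1 = slice_a1 / slice_a1.sum()

print(p_a.sum())                    # 1.0
print(p_bc_given_a1.sum())          # 1.0
```

This enumeration touches every entry of the joint table, which is exactly the cost that variable elimination (next) is designed to avoid.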

Variable Elimination

Variable elimination is a general exact inference algorithm that eliminates variables one at a time, integrating them out (for continuous variables) or summing them out (for discrete variables).

Algorithm Steps

  1. Choose an elimination order: select an order in which to eliminate the variables x_1, x_2, \ldots, x_k (those not in the query set E).
  2. For each variable x_i:
    • Collect all factors (probability tables) involving x_i
    • Multiply these factors together
    • Sum out (eliminate) x_i from the product
    • The result is a new factor that no longer involves x_i
  3. Final result: after all variables have been eliminated, the remaining factors give the marginal probability P(x_E); a worked example and a code sketch follow.

Example: Computing P(x_3)

Given the joint distribution P(x_1, x_2, x_3, x_4) = P(x_1) P(x_2 \mid x_1) P(x_3 \mid x_2) P(x_4 \mid x_3):

Step 1: Eliminate x_1

\sum_{x_1} P(x_1) P(x_2 \mid x_1) = P(x_2)

Step 2: Eliminate x_2

\sum_{x_2} P(x_2) P(x_3 \mid x_2) = P(x_3)

Step 3: Eliminate x_4

\sum_{x_4} P(x_3) P(x_4 \mid x_3) = P(x_3)

Result: P(x_3), the marginal probability of x_3.
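The same chain computation as a short NumPy sketch. The state count S and the CPT values are arbitrary assumptions; the result is checked against brute-force marginalization of the full joint:

```python
import numpy as np

rng = np.random.default_rng(1)
S = 3  # states per variable (arbitrary choice)

# Chain CPTs: p1[a] = P(x1=a), t12[a, b] = P(x2=b | x1=a), and so on.
p1  = rng.dirichlet(np.ones(S))
t12 = rng.dirichlet(np.ones(S), size=S)
t23 = rng.dirichlet(np.ones(S), size=S)
t34 = rng.dirichlet(np.ones(S), size=S)

p2 = p1 @ t12                 # Step 1: sum_{x1} P(x1) P(x2|x1) = P(x2)
p3 = p2 @ t23                 # Step 2: sum_{x2} P(x2) P(x3|x2) = P(x3)
p3 = p3 * t34.sum(axis=1)     # Step 3: sum_{x4} P(x4|x3) = 1, so P(x3) is unchanged

# Brute-force check: build the full joint table and marginalize directly.
joint = np.einsum('a,ab,bc,cd->abcd', p1, t12, t23, t34)
assert np.allclose(p3, joint.sum(axis=(0, 1, 3)))
print(p3)
```

Each step manipulates a table of at most S × S entries, instead of the S⁴-entry joint.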

Complexity

Time complexity depends on the elimination order: the optimal order minimizes the size of the largest intermediate factor (finding such an order is NP-hard in general). For tree structures, variable elimination is efficient. For graphs with cycles, the complexity can be exponential in the size of the largest factor created.
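A small illustration of why the order matters, assuming a star-shaped graph over binary variables with hub x0 and leaves x1..x4 (the tables are random placeholders): eliminating the hub first couples all of its neighbors into one large factor, while eliminating the leaves first keeps every intermediate factor tiny.

```python
import numpy as np

rng = np.random.default_rng(0)
S = 2  # binary variables

# Star graph: P(x0) and P(xi | x0) for i = 1..4.
p0 = rng.dirichlet(np.ones(S))                                 # shape (S,)
cpts = [rng.dirichlet(np.ones(S), size=S) for _ in range(4)]   # each (S, S), rows = x0

# Bad order: eliminating the hub x0 first yields one factor over (x1..x4).
big = np.einsum('a,ab,ac,ad,ae->bcde', p0, *cpts)
print(big.shape, big.size)    # (2, 2, 2, 2) 16 entries -- grows as S**4

# Good order: eliminating the leaves first never produces a factor
# larger than S entries, since sum_{xi} P(xi | x0) = 1.
f = p0.copy()
for cpt in cpts:
    f = f * cpt.sum(axis=1)
print(f.shape, f.size)        # (2,) 2 entries
```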

Belief Propagation (Message Passing)

Belief Propagation (BP), also called the sum-product algorithm, is an efficient exact inference method for tree-structured graphs (acyclic graphs). It works by passing messages between nodes.

Message Passing Formula

Message from node ii to node jj:

m_{i \to j}(x_j) = \sum_{x_i} \psi_{ij}(x_i, x_j) \phi_i(x_i) \prod_{k \in n(i) \setminus j} m_{k \to i}(x_i)

Where:

  • \psi_{ij}(x_i, x_j) = potential function on edge (i, j)
  • \phi_i(x_i) = local potential at node i
  • n(i) \setminus j = neighbors of i except j
  • Messages are passed from the leaves to the root, then from the root back to the leaves

Marginal Probability Formula

After message passing, marginal probability of node ii:

P(x_i) \propto \phi_i(x_i) \prod_{j \in n(i)} m_{j \to i}(x_i)

Normalize to ensure probabilities sum to 1.

Algorithm Steps

  1. Initialize: set all messages to 1 (or a uniform distribution).
  2. Upward pass (leaves → root):
    • Start from the leaf nodes (nodes with only one neighbor)
    • Each leaf sends a message to its neighbor
    • Continue until the root has received messages from all of its children
  3. Downward pass (root → leaves):
    • The root sends messages to all of its children
    • Each node sends messages to its children after receiving the message from its parent
    • Continue until every leaf has received its message
  4. Compute marginals: for each node, combine the incoming messages with the local potential to get the marginal probability (see the sketch below).
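A minimal sum-product sketch on a four-node tree, verified against brute-force enumeration. The tree shape and the random potential tables are illustrative assumptions; the recursion computes each message on demand, which implicitly carries out the leaf-to-root and root-to-leaf schedule above:

```python
import numpy as np
from functools import lru_cache

rng = np.random.default_rng(2)
S = 2  # states per node

# A small tree: edges 0-1, 0-2, 2-3.
edges = [(0, 1), (0, 2), (2, 3)]
neighbors = {i: [] for i in range(4)}
for a, b in edges:
    neighbors[a].append(b)
    neighbors[b].append(a)

# Random positive potentials, purely for illustration.
phi = {i: rng.random(S) + 0.1 for i in range(4)}        # node potentials phi_i
psi = {e: rng.random((S, S)) + 0.1 for e in edges}      # edge potentials psi_ij

def edge_pot(i, j):
    # psi table with axis 0 indexed by x_i and axis 1 by x_j
    return psi[(i, j)] if (i, j) in psi else psi[(j, i)].T

@lru_cache(maxsize=None)
def message(i, j):
    # m_{i->j}(x_j) = sum_{x_i} psi_ij(x_i, x_j) phi_i(x_i) prod_{k in n(i)\j} m_{k->i}(x_i)
    prod = phi[i].copy()
    for k in neighbors[i]:
        if k != j:
            prod *= message(k, i)
    return edge_pot(i, j).T @ prod   # sums over x_i

def marginal(i):
    b = phi[i].copy()
    for j in neighbors[i]:
        b *= message(j, i)
    return b / b.sum()               # normalize to sum to 1

# Brute-force check: enumerate the joint over all S**4 assignments.
joint = np.zeros((S,) * 4)
for idx in np.ndindex(*joint.shape):
    p = np.prod([phi[i][idx[i]] for i in range(4)])
    for a, b in edges:
        p *= psi[(a, b)][idx[a], idx[b]]
    joint[idx] = p
joint /= joint.sum()

for i in range(4):
    others = tuple(a for a in range(4) if a != i)
    assert np.allclose(marginal(i), joint.sum(axis=others))
print(marginal(2))
```

One message per directed edge suffices, which is where the O(N · |S|²) cost quoted below comes from.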

Applicable Scenarios

Tree Structures (Acyclic Graphs)

Exact inference is most efficient for tree-structured graphs (no cycles):

  • Belief propagation: Guaranteed to converge and give exact results
  • Time complexity: O(N \cdot |S|^2), where N is the number of nodes and |S| is the state-space size
  • Examples: Chains (HMM), trees (hierarchical models), polytrees

Graphs with Cycles

For graphs with cycles, exact inference generally becomes intractable:

  • Variable elimination: Complexity can be exponential
  • Belief propagation: May not converge or give incorrect results
  • Solution: Use approximate inference methods (see next module)

Small Graphs

For small graphs (few variables, small state spaces), exact inference remains feasible:

  • Brute force: Enumerate all possible assignments
  • Variable elimination: Efficient even with cycles if the graph is small
  • Examples: Medical diagnosis networks with <10 variables

Advantages and Limitations

Advantages

  • Exact results: No approximation error
  • Efficient for trees: Polynomial time complexity
  • Interpretable: Clear algorithmic steps
  • Deterministic: Same input always gives same output

Limitations

  • Only for trees: Efficient exact inference requires an acyclic graph structure
  • Intractable for cycles: Exponential complexity
  • Large state spaces: Can be slow for many states
  • Memory intensive: May require storing large factors