
Markov Random Fields (MRF)

Master undirected graph probability models. Learn how cliques, potential functions, and Markov properties enable spatial correlation modeling for image processing and computer vision.

Module 3 of 7
Intermediate to Advanced
120-150 min

Core Definition

Markov Random Fields (MRF) are probability models based on undirected graphs. Nodes represent variables, edges represent correlation relationships between variables, and the joint probability distribution is defined through "cliques" and "potential functions".

Key Characteristics

  • Undirected edges: Represent symmetric correlations (no causal direction)
  • Clique-based: Joint distribution defined on cliques (fully connected subgraphs)
  • Potential functions: Non-negative functions quantifying clique variable associations
  • Markov properties: Enable conditional independence reasoning

Advantages

  • Natural for symmetric relationships
  • Flexible correlation modeling
  • Well-suited for spatial data
  • No causal assumptions needed

Limitations

  • Normalization constant Z can be intractable
  • Inference more complex than directed models
  • Parameter learning can be difficult
  • May require approximation methods

Key Concepts: Cliques and Potential Functions

Cliques

A clique is a subset of nodes in which every pair of nodes is connected by an edge (a fully connected subgraph). For example, if there is an edge between x_1 and x_2, then \{x_1, x_2\} is a clique.

Maximal Clique:

A clique that cannot be extended by adding more nodes while maintaining the clique property. It is not contained in any larger clique.

Example:

In a graph with nodes \{x_1, x_2, x_3, x_4\} and edges connecting all pairs except (x_1, x_4), the cliques include \{x_1, x_2\}, \{x_2, x_3\}, and \{x_1, x_2, x_3\}. The maximal cliques are \{x_1, x_2, x_3\} and \{x_2, x_3, x_4\}.
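Examples like this are easy to check programmatically. A minimal sketch, assuming Python with the networkx library (the library choice is an assumption; any graph library would do):

```python
import networkx as nx

# Nodes x1..x4, all pairs connected except (x1, x4).
G = nx.Graph()
G.add_edges_from([("x1", "x2"), ("x1", "x3"), ("x2", "x3"),
                  ("x2", "x4"), ("x3", "x4")])

# nx.find_cliques enumerates the maximal cliques of an undirected graph.
maximal_cliques = [sorted(c) for c in nx.find_cliques(G)]
print(maximal_cliques)  # [['x1', 'x2', 'x3'], ['x2', 'x3', 'x4']] (order may vary)
```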

Potential Functions (Factors)

A potential function \psi_Q(x_Q) is a non-negative real function defined on clique Q, used to quantify the strength of association among the variables in the clique.

Common Form:

\psi_Q(x_Q) = e^{-H_Q(x_Q)}

Where H_Q(x_Q) is the energy function. The exponential form ensures non-negativity. Lower energy = higher potential = more likely configuration.

Example:

For clique \{x_1, x_2\}: if x_1 = x_2 (same value), the energy is H = 0 and the potential is \psi = 1 (high); if x_1 \neq x_2, the energy is H = 1 and the potential is \psi = e^{-1} ≈ 0.37 (lower). This encourages neighboring variables to take similar values.
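The energy-to-potential mapping is a one-liner in code. A minimal Python sketch of the example above:

```python
import math

# Potential from an energy function: psi(x_Q) = exp(-H(x_Q)).
# Here H penalizes disagreement between two binary variables,
# matching the worked example in the text.
def energy(x1: int, x2: int) -> float:
    return 0.0 if x1 == x2 else 1.0

def potential(x1: int, x2: int) -> float:
    return math.exp(-energy(x1, x2))

print(potential(1, 1))  # 1.0     (same value, high potential)
print(potential(0, 1))  # ~0.3679 (different values, lower potential)
```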

Core Probability Formula

The joint probability distribution is based on clique (or maximal clique) decomposition, normalized by a partition function ZZ to ensure probabilities sum to 1.

Based on All Cliques

P(x) = \frac{1}{Z} \prod_{Q \in \mathcal{C}} \psi_Q(x_Q)

Where:

  • \mathcal{C} = set of all cliques in the graph
  • \psi_Q(x_Q) = potential function on clique Q
  • Z = \sum_x \prod_{Q \in \mathcal{C}} \psi_Q(x_Q) = normalization constant (partition function)

Based on Maximal Cliques

P(x) = \frac{1}{Z^*} \prod_{Q \in \mathcal{C}^*} \psi_Q(x_Q)

Where:

  • \mathcal{C}^* = set of all maximal cliques
  • Z^* = normalization constant for the maximal-clique decomposition
  • Hammersley-Clifford theorem: every strictly positive distribution satisfying the Markov properties factorizes over the graph's cliques, so decomposing over maximal cliques is sufficient and often more efficient

Normalization Constant Z

The partition function Z ensures probabilities sum to 1, but computing it requires summing over all possible variable assignments, which can be computationally intractable for large graphs.

This is why MRF inference often requires approximation methods (MCMC, variational inference).
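To make the cost concrete, here is a brute-force computation of Z for a small binary chain whose cliques are its edges. The agreement potential is an illustrative choice; the point is that the sum ranges over 2^n configurations, which is exactly what becomes intractable as n grows:

```python
import itertools
import math

# Pairwise potential on each edge (clique) of the chain x1 - x2 - ... - xn,
# rewarding agreement as in the examples above (an illustrative choice).
def psi(xi: int, xj: int) -> float:
    return math.exp(1.0 if xi == xj else -1.0)

def unnormalized(x: tuple) -> float:
    p = 1.0
    for i in range(len(x) - 1):  # one factor per edge
        p *= psi(x[i], x[i + 1])
    return p

n = 10
# Z sums the unnormalized product over all 2**n assignments.
Z = sum(unnormalized(x) for x in itertools.product([0, 1], repeat=n))

# P(x) = unnormalized(x) / Z; already at n = 10 the sum has 1024 terms,
# and it doubles with every additional variable.
x = (0,) * n
print(unnormalized(x) / Z)
```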

Three Markov Properties

The core property of MRF is conditional independence, which can be expressed through three equivalent Markov properties. All three are equivalent for positive distributions.

1. Global Markov Property

If variable sets A and B are separated by set C (every path from A to B passes through C), then:

P(x_A, x_B | x_C) = P(x_A | x_C) P(x_B | x_C)

This means x_A \perp x_B \mid x_C (A and B are conditionally independent given C).

2. Local Markov Property

Given a variable's neighbors (directly connected variables), the variable is conditionally independent of all other non-neighbor variables:

P(x_v | x_{V \setminus \{v\}}) = P(x_v | x_{n(v)})

Where n(v) is the set of neighbors of variable v, and V \setminus \{v\} is the set of all variables except v.

3. Pairwise Markov Property

Given all other variables, two non-adjacent variables are conditionally independent:

P(x_i, x_j | x_{V \setminus \{i,j\}}) = P(x_i | x_{V \setminus \{i,j\}}) P(x_j | x_{V \setminus \{i,j\}})

If there is no edge between x_i and x_j, they are conditionally independent given all other variables.
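These properties can be verified numerically on a small example. The sketch below builds the joint distribution of the chain x1 - x2 - x3 from pairwise agreement potentials (an illustrative choice) and checks that the non-adjacent pair (x1, x3) is conditionally independent given x2:

```python
import itertools
import math

# Joint distribution of the binary chain x1 - x2 - x3, built from
# pairwise agreement potentials on its two edges.
def psi(a: int, b: int) -> float:
    return math.exp(1.0 if a == b else -1.0)

states = list(itertools.product([0, 1], repeat=3))
weights = {x: psi(x[0], x[1]) * psi(x[1], x[2]) for x in states}
Z = sum(weights.values())
P = {x: w / Z for x, w in weights.items()}

def marginal(cond):
    # Sum P over all states whose fixed coordinates match `cond`.
    return sum(p for x, p in P.items()
               if all(x[i] == v for i, v in cond.items()))

# Pairwise Markov property: P(x1, x3 | x2) = P(x1 | x2) P(x3 | x2),
# since x1 and x3 are non-adjacent in the chain.
for x2 in (0, 1):
    for x1 in (0, 1):
        for x3 in (0, 1):
            joint = marginal({0: x1, 1: x2, 2: x3}) / marginal({1: x2})
            prod = (marginal({0: x1, 1: x2}) / marginal({1: x2})) * \
                   (marginal({1: x2, 2: x3}) / marginal({1: x2}))
            assert abs(joint - prod) < 1e-12
print("pairwise Markov property holds on the chain")
```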

Image Denoising Example

Apply MRF to denoise a 3×3 grayscale image. Each pixel is a variable, and neighboring pixels are connected by edges. The MRF encourages similar values for neighboring pixels while preserving observed noisy values.

Pixel Grid: Noisy vs Clean Values

| Pixel ID | Position | Noisy Value | Clean Value | Neighbors |
|----------|----------|-------------|-------------|-----------|
| 1 | (1, 1) | 0.8 | 1 | 2, 4 |
| 2 | (1, 2) | 0.2 | 0 | 1, 3, 5 |
| 3 | (1, 3) | 0.3 | 0 | 2, 6 |
| 4 | (2, 1) | 0.7 | 1 | 1, 5, 7 |
| 5 | (2, 2) | 0.4 | 0 | 2, 4, 6, 8 |
| 6 | (2, 3) | 0.1 | 0 | 3, 5, 9 |
| 7 | (3, 1) | 0.9 | 1 | 4, 8 |
| 8 | (3, 2) | 0.6 | 1 | 5, 7, 9 |
| 9 | (3, 3) | 0.5 | 0 | 6, 8 |

Values: 0 = black, 1 = white. Neighbors are connected by edges in the MRF graph. Goal: Infer clean values from noisy observations using spatial smoothness.

MRF Model for Image Denoising

Graph Structure:

Each pixel is a node. Edges connect neighboring pixels (4-connected: up, down, left, right). Cliques: all pairs of neighboring pixels.

Potential Function:

For clique \{x_i, x_j\} (neighboring pixels):

\psi(x_i, x_j) = \begin{cases} e^{1} & \text{if } x_i = x_j \\ e^{-1} & \text{if } x_i \neq x_j \end{cases}

Encourages similar values for neighbors (smoothness prior).

Inference:

Use belief propagation or Gibbs sampling to compute P(\text{clean} \mid \text{noisy}). The result is a denoised image with smooth regions and preserved edges.
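As an illustration, the sketch below runs iterated conditional modes (ICM), a simpler deterministic relative of Gibbs sampling, on the pixel grid above. The quadratic data term and its weight `lam` are assumptions added for the sketch; the section itself only specifies the pairwise smoothness potential:

```python
# Minimal ICM sketch for MAP denoising of the 3x3 grid above. ICM is a
# simpler alternative to belief propagation / Gibbs sampling: each pixel
# is set in turn to the label that minimizes its local energy.
noisy = [0.8, 0.2, 0.3, 0.7, 0.4, 0.1, 0.9, 0.6, 0.5]   # pixels 1..9
neighbors = [[1, 3], [0, 2, 4], [1, 5], [0, 4, 6], [1, 3, 5, 7],
             [2, 4, 8], [3, 7], [4, 6, 8], [5, 7]]        # 0-indexed

lam = 12.0  # assumed data-fidelity weight; large enough that pixel 8's
            # observation (0.6) survives the smoothing of its neighbors

def local_energy(x, i, v):
    # Smoothness: -log psi gives -1 per agreeing neighbor and +1 per
    # disagreeing one, plus a data term tying x_i to its observation.
    smooth = sum(-1.0 if v == x[j] else 1.0 for j in neighbors[i])
    return smooth + lam * (v - noisy[i]) ** 2

x = [int(v > 0.5) for v in noisy]   # start from the thresholded image
for _ in range(5):                   # a few sweeps settle 9 pixels
    for i in range(9):
        x[i] = min((0, 1), key=lambda v: local_energy(x, i, v))
print(x)  # [1, 0, 0, 1, 0, 0, 1, 1, 0] -- matches the clean values
```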

Result:

MRF successfully denoises the image by balancing observed noisy values with spatial smoothness. Pixels 1, 4, 7, 8 correctly inferred as white (1), pixels 2, 3, 5, 6, 9 as black (0).

Advantages and Limitations

Advantages

  • Natural for symmetric relationships: No need to specify causal direction
  • Flexible correlation modeling: Can capture complex spatial/temporal patterns
  • Well-suited for spatial data: Perfect for images, grids, and spatial structures
  • No causal assumptions: More flexible than directed models for some applications

Limitations

  • Normalization constant Z: Computing partition function can be intractable
  • Inference complexity: More complex than directed models for exact inference
  • Parameter learning: Learning potential functions can be challenging
  • May require approximation: Often needs MCMC or variational methods