Bayesian Networks

Probabilistic graphical models for complex dependency structures

What are Bayesian Networks?

Probabilistic Graphical Model

Bayesian Networks (also called "belief networks") are probabilistic graphical models that use a graph structure to represent conditional dependencies between variables. They provide a compact way to represent and reason about complex probability distributions.

Definition

A Bayesian network B = (G, \Theta) consists of:

  • Structure G: a Directed Acyclic Graph (DAG) whose nodes represent attribute variables and whose edges represent dependencies
  • Parameters \Theta: a set of Conditional Probability Tables (CPTs), one for each node given its parents

Core Assumption

Conditional Independence

Bayesian networks assume that each attribute is conditionally independent of its non-descendants given its parent nodes. This allows the joint probability distribution to be factorized:

P_B(x_1, x_2, \ldots, x_d) = \prod_{i=1}^{d} P_B(x_i | \pi_i) = \prod_{i=1}^{d} \theta_{x_i | \pi_i}

where \pi_i is the set of parent nodes of x_i.

Example Factorization

For a Bayesian network with variables (good melon x_1, sweetness x_2, knock sound x_3, color x_4, root stem x_5):

P(x_1, x_2, x_3, x_4, x_5) = P(x_1) P(x_2) P(x_3 | x_1) P(x_4 | x_1, x_2) P(x_5 | x_2)
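
As a sanity check, the factorization can be evaluated directly from per-node CPTs. Here is a minimal Python sketch for the five-variable network above; all CPT numbers are invented for illustration.

```python
# A minimal sketch: evaluating the factorized joint above from per-node
# CPTs. All probability values are invented for illustration only.

p_x1 = {0: 0.4, 1: 0.6}                   # P(x1): good melon
p_x2 = {0: 0.5, 1: 0.5}                   # P(x2): sweetness
p_x3 = {(0, 0): 0.7, (1, 0): 0.3,         # P(x3 | x1): knock sound
        (0, 1): 0.2, (1, 1): 0.8}
p_x4 = {(0, 0, 0): 0.9, (1, 0, 0): 0.1,   # P(x4 | x1, x2): color
        (0, 0, 1): 0.6, (1, 0, 1): 0.4,
        (0, 1, 0): 0.5, (1, 1, 0): 0.5,
        (0, 1, 1): 0.1, (1, 1, 1): 0.9}
p_x5 = {(0, 0): 0.8, (1, 0): 0.2,         # P(x5 | x2): root stem
        (0, 1): 0.3, (1, 1): 0.7}

def joint(x1, x2, x3, x4, x5):
    """P(x1,...,x5) as the product of local conditional probabilities."""
    return (p_x1[x1] * p_x2[x2] * p_x3[(x3, x1)]
            * p_x4[(x4, x1, x2)] * p_x5[(x5, x2)])

print(joint(1, 1, 1, 1, 1))  # 0.6 * 0.5 * 0.8 * 0.9 * 0.7 = 0.1512
```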

Three Basic Dependency Structures

Common Parent

Structure: x_1 \to x_3, x_1 \to x_4

Conditional Independence: x_3 \perp x_4 | x_1

Given the common parent x_1, the children x_3 and x_4 are independent.

V-Structure

Structure: x_1 \to x_4, x_2 \to x_4

Marginal Independence: x_1 \perp x_2

Key: Given x_4, x_1 and x_2 become dependent (the explaining-away effect).

Sequential Structure

Structure: x \to y \to z

Conditional Independence: x \perp z | y

Given the intermediate node y, x and z are independent.
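
The explaining-away effect in the V-structure is easy to verify numerically. The sketch below, with invented CPTs over binary variables, confirms that x_1 and x_2 are marginally independent but become dependent once x_4 is observed.

```python
from itertools import product

# A toy numerical check of the V-structure x1 -> x4 <- x2 with binary
# variables; all probability values below are invented for illustration.
p1 = {0: 0.7, 1: 0.3}                                        # P(x1)
p2 = {0: 0.8, 1: 0.2}                                        # P(x2)
p4 = {(0, 0): 0.05, (0, 1): 0.7, (1, 0): 0.6, (1, 1): 0.9}   # P(x4=1 | x1, x2)

def joint(a1, a2, a4):
    px4 = p4[(a1, a2)] if a4 == 1 else 1 - p4[(a1, a2)]
    return p1[a1] * p2[a2] * px4

def prob(x1=None, x2=None, x4=None):
    """Marginal probability: sum the joint over unspecified variables."""
    return sum(joint(a1, a2, a4)
               for a1, a2, a4 in product([0, 1], repeat=3)
               if (x1 is None or a1 == x1)
               and (x2 is None or a2 == x2)
               and (x4 is None or a4 == x4))

# Marginally independent: conditioning on x2 leaves P(x1 = 1) unchanged.
print(prob(x1=1))                             # 0.3
print(prob(x1=1, x2=1) / prob(x2=1))          # 0.3

# Explaining away: given x4 = 1, learning x2 = 1 lowers belief in x1 = 1.
print(prob(x1=1, x4=1) / prob(x4=1))                  # ~0.61
print(prob(x1=1, x2=1, x4=1) / prob(x2=1, x4=1))      # ~0.36
```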

D-separation

Moral Graph

To analyze D-separation, we first convert the directed graph to an undirected moral graph:

  1. Connect V-structure parents: For each V-structure (A \to C \gets B), add an undirected edge between A and B ("moralization")
  2. Convert directed edges: Replace all directed edges with undirected edges

D-separation Criterion

Two variables x and y are D-separated by a set z if x and y fall into different connected components of the moral graph after removing the nodes in z.

If x and y are D-separated by z, then x \perp y | z (conditional independence).
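
The moralization-based test above translates almost directly into code. The sketch below (plain Python, no graph library) builds the moral graph, deletes the conditioning set, and checks connectivity with BFS; note that the full algorithm first restricts attention to the ancestral graph of x, y, and z, which is omitted here for brevity.

```python
from collections import defaultdict, deque
from itertools import combinations

def d_separated(edges, x, y, z):
    """Test whether x and y are separated by set z, following the
    moralization procedure described above. `edges` are (parent, child)."""
    parents = defaultdict(set)
    adj = defaultdict(set)
    for u, v in edges:
        parents[v].add(u)
        adj[u].add(v); adj[v].add(u)      # undirected skeleton
    for ps in parents.values():           # marry all co-parents
        for a, b in combinations(ps, 2):
            adj[a].add(b); adj[b].add(a)
    blocked = set(z)                      # remove conditioning nodes
    seen, queue = {x}, deque([x])
    while queue:                          # BFS: can we still reach y?
        for nxt in adj[queue.popleft()]:
            if nxt == y:
                return False              # connected, hence not d-separated
            if nxt not in seen and nxt not in blocked:
                seen.add(nxt); queue.append(nxt)
    return True

# Sequential structure x -> y -> z: the middle node separates the ends.
print(d_separated([("x", "y"), ("y", "z")], "x", "z", ["y"]))  # True
print(d_separated([("x", "y"), ("y", "z")], "x", "z", []))     # False
```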

Structure Learning

Scoring Functions

Structure learning aims to find the optimal network structure GG that best fits the training data. Common scoring functions include:

AIC (Akaike Information Criterion)

s(B | D) = f(\theta) |B| - LL(B | D)

where f(\theta) = 1, |B| is the number of parameters of the network, and LL(B | D) is the log-likelihood of B on the data D

BIC (Bayesian Information Criterion)

s(B | D) = f(\theta) |B| - LL(B | D)

where f(\theta) = \frac{1}{2} \log m (m is the sample size)

MDL (Minimal Description Length)

Based on information-theoretic principles, balancing model complexity and data fit.
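
To make the scoring concrete, here is a sketch of the BIC score for a fixed structure on discrete data, with maximum-likelihood CPT estimates obtained from counts. The data format and names (`bic_score`, `structure`, `arity`) are our own choices for illustration.

```python
import math
from collections import Counter

def bic_score(data, structure, arity):
    """s(B | D) = (1/2) log(m) |B| - LL(B | D); lower is better.
    `data`: list of dicts (variable -> value); `structure`: variable ->
    tuple of its parents; `arity`: variable -> number of possible values."""
    m = len(data)
    log_lik, n_params = 0.0, 0
    for var, pars in structure.items():
        joint = Counter((tuple(r[p] for p in pars), r[var]) for r in data)
        parent = Counter(tuple(r[p] for p in pars) for r in data)
        for (pa, _), c in joint.items():
            log_lik += c * math.log(c / parent[pa])   # ML estimate of theta
        n_configs = math.prod(arity[p] for p in pars)
        n_params += (arity[var] - 1) * n_configs      # free parameters |B|
    return 0.5 * math.log(m) * n_params - log_lik

data = [{"x1": 1, "x2": 1}, {"x1": 1, "x2": 1},
        {"x1": 0, "x2": 0}, {"x1": 1, "x2": 0}]
print(bic_score(data, {"x1": (), "x2": ("x1",)}, {"x1": 2, "x2": 2}))
```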

Computational Challenge

Finding the optimal Bayesian network structure is NP-hard. In practice, we use heuristic search algorithms (e.g., greedy search, genetic algorithms) to find good structures rather than optimal ones.
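
As an illustration of the heuristic approach, the sketch below runs a greedy hill-climbing search that starts from the empty graph and keeps adding whichever edge improves the score, under any scoring function of the form above (e.g. the `bic_score` sketch). Edge deletion and reversal moves, used by practical implementations, are omitted for brevity.

```python
from itertools import permutations

def creates_cycle(structure, u, v):
    """Would adding the edge u -> v create a directed cycle?
    True iff a directed path v -> ... -> u already exists."""
    stack, seen = [v], set()
    while stack:
        node = stack.pop()
        if node == u:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(w for w, ps in structure.items() if node in ps)
    return False

def greedy_search(data, variables, arity, score):
    structure = {v: () for v in variables}     # start from the empty graph
    best = score(data, structure, arity)
    improved = True
    while improved:
        improved = False
        for u, v in permutations(variables, 2):
            if u in structure[v] or creates_cycle(structure, u, v):
                continue
            trial = dict(structure)
            trial[v] = structure[v] + (u,)     # tentatively add u -> v
            s = score(data, trial, arity)
            if s < best:                       # lower score is better
                structure, best, improved = trial, s, True
    return structure
```

Because each step only accepts locally improving edges, the result is a local optimum that depends on the traversal order, which is exactly the trade-off the NP-hardness result forces.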

Inference

Exact Inference

Directly compute posterior probabilities from the joint distribution defined by the Bayesian network. However, exact inference is NP-hard for general networks.

Approximate Inference

For complex networks, we use approximate methods:

  • Gibbs Sampling: an MCMC method that iteratively resamples each variable given the current values of the others to approximate the posterior distribution (see the sketch below)
  • Variational Inference: transforms inference into an optimization problem, approximating the true posterior with a simpler distribution
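
As a concrete illustration of Gibbs sampling, the sketch below estimates P(x = 1 | z = 1) on the small sequential network x \to y \to z with invented CPTs, resampling each unobserved variable from its conditional given the current state.

```python
import random

px = {1: 0.3, 0: 0.7}              # P(x)         -- invented numbers
py = {(1, 1): 0.9, (1, 0): 0.2}    # P(y = 1 | x)
pz = {(1, 1): 0.8, (1, 0): 0.1}    # P(z = 1 | y)

def joint(x, y, z):
    p = px[x]
    p *= py[(1, x)] if y == 1 else 1 - py[(1, x)]
    p *= pz[(1, y)] if z == 1 else 1 - pz[(1, y)]
    return p

def gibbs(n_samples=50_000, burn_in=1_000, z_obs=1):
    x, y = 1, 1                    # arbitrary initial state
    hits = 0
    for t in range(burn_in + n_samples):
        # Resample each free variable from P(var | all others), which is
        # proportional to the joint evaluated at the two candidate values.
        p1 = joint(1, y, z_obs)
        x = 1 if random.random() < p1 / (p1 + joint(0, y, z_obs)) else 0
        p1 = joint(x, 1, z_obs)
        y = 1 if random.random() < p1 / (p1 + joint(x, 0, z_obs)) else 0
        if t >= burn_in:
            hits += x
    return hits / n_samples

print(gibbs())   # converges near the exact value P(x=1 | z=1) ~ 0.566
```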

Example: Medical Diagnosis Network

Bayesian network for disease diagnosis

Network Structure

Variables: Disease (D), Symptom1 (S1), Symptom2 (S2), Test Result (T), Age (A)

Dependencies:

  • D → S1, D → S2, D → T
  • A → D

Joint Probability:

P(D, S_1, S_2, T, A) = P(A) P(D | A) P(S_1 | D) P(S_2 | D) P(T | D)

Inference Example

Given observed symptoms S1=present, S2=present, and test result T=positive, we can compute:

P(D = \text{disease} | S_1, S_2, T) = \frac{P(D, S_1, S_2, T)}{P(S_1, S_2, T)}

This lets us compute the probability of disease given the observed evidence; since Age A is unobserved, both numerator and denominator are obtained by summing the joint distribution over A.
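
Since this network is tiny, the posterior can be computed exactly by brute-force enumeration. The sketch below does so with invented CPT values (all variables binary, 1 meaning present/positive/older), summing A out of the numerator and denominator.

```python
from itertools import product

pA = {1: 0.3, 0: 0.7}              # P(A)          -- invented numbers
pD = {(1, 1): 0.15, (1, 0): 0.05}  # P(D = 1 | A)
pS1 = {(1, 1): 0.8, (1, 0): 0.1}   # P(S1 = 1 | D)
pS2 = {(1, 1): 0.7, (1, 0): 0.2}   # P(S2 = 1 | D)
pT = {(1, 1): 0.9, (1, 0): 0.05}   # P(T = 1 | D)

def bern(table, val, parent):
    """P(val | parent) for a binary child stored as P(child = 1 | parent)."""
    p1 = table[(1, parent)]
    return p1 if val == 1 else 1 - p1

def joint(d, s1, s2, t, a):
    return (pA[a] * bern(pD, d, a) * bern(pS1, s1, d)
            * bern(pS2, s2, d) * bern(pT, t, d))

# P(D = 1 | S1 = 1, S2 = 1, T = 1): sum out A, then normalize over D.
num = sum(joint(1, 1, 1, 1, a) for a in (0, 1))
den = sum(joint(d, 1, 1, 1, a) for d, a in product((0, 1), repeat=2))
print(num / den)   # ~0.978 with these made-up numbers
```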