
Probabilistic Graphical Models Fundamentals

Understand the foundation of probabilistic graphical models: graph structures, directed vs undirected models, and generative vs discriminative modeling paradigms.

Module 1 of 7
Intermediate Level
90-120 min

Definition and Core Goal

Probabilistic Graphical Models (PGMs) are a class of probability models that use graph structures as core tools to represent and reason about probabilistic relationships between random variables.

Graph Structure Components

  • Nodes (Vertices): Represent individual random variables or groups of random variables
  • Edges: Represent probabilistic dependency or correlation relationships between variables
  • Graph Structure: Provides intuitive visualization of complex probability distributions

Core Goal: Inference

The graph structure simplifies probability computation and enables inference (see the sketch after this list):

  • Use known variables (e.g., observable variables) to infer unknown variables (e.g., hidden variables)
  • Compute marginal probabilities: P(x_E) = \sum_{x_F} P(x_E, x_F)
  • Compute conditional probabilities: P(x_F | x_E) = \frac{P(x_E, x_F)}{\sum_{x_F} P(x_E, x_F)}
  • where E denotes the known (evidence) variable set and F the target (query) variable set
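A minimal sketch of both computations in Python, using brute-force summation over a small joint table; the rain/wet variables and all probability values are invented for illustration:

```python
# Brute-force marginal and conditional probabilities from a joint table,
# mirroring P(x_E) = sum_{x_F} P(x_E, x_F) with E = {rain}, F = {wet}.
# All numbers below are made up.

# Joint distribution P(rain, wet) as a dict: (rain, wet) -> probability.
joint = {
    (0, 0): 0.56, (0, 1): 0.14,  # no rain
    (1, 0): 0.06, (1, 1): 0.24,  # rain
}

# Marginal P(rain = 1): sum out the eliminated variable (wet).
p_rain = sum(joint[(1, wet)] for wet in (0, 1))

# Conditional P(rain = 1 | wet = 1) = P(rain=1, wet=1) / sum_r P(r, wet=1).
p_wet = sum(joint[(r, 1)] for r in (0, 1))
p_rain_given_wet = joint[(1, 1)] / p_wet

print(f"P(rain=1)         = {p_rain:.2f}")            # 0.30
print(f"P(rain=1 | wet=1) = {p_rain_given_wet:.2f}")  # 0.24 / 0.38 ≈ 0.63
```

Brute-force summation is exponential in the number of eliminated variables; the factorizations introduced below are what make inference tractable at scale.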

Two Classifications by Graph Structure

Probabilistic graphical models are classified into two main categories based on the type of graph structure used:

Directed Graph Models (Bayesian Networks)

Use directed acyclic graphs (DAGs) to represent one-way dependencies (e.g., "parent node → child node" causal relationships).

Key Characteristics:

  • Directed edges represent causal or conditional dependencies
  • Acyclic: no directed cycles
  • Joint distribution factorizes as: P(x_1, \ldots, x_n) = \prod_i P(x_i | \text{parents}(x_i))
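As a quick sketch of this factorization, consider a hypothetical three-node chain A → B → C with made-up conditional probability tables; the joint is just the product of each node's local conditional:

```python
# DAG factorization sketch: P(a, b, c) = P(a) * P(b | a) * P(c | b)
# for the chain A -> B -> C. All probability values are illustrative.

p_a = {0: 0.6, 1: 0.4}                       # P(A)
p_b_given_a = {0: {0: 0.7, 1: 0.3},          # P(B | A): outer key a, inner key b
               1: {0: 0.2, 1: 0.8}}
p_c_given_b = {0: {0: 0.9, 1: 0.1},          # P(C | B)
               1: {0: 0.5, 1: 0.5}}

def joint(a, b, c):
    """Joint probability via the chain's factorization."""
    return p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]

# Sanity check: the factored joint sums to 1 over all 2^3 assignments.
total = sum(joint(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1))
print(f"sum over all assignments = {total:.4f}")  # 1.0000
```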

Typical Examples:

  • Hidden Markov Models (HMM) - sequential dependencies
  • Bayesian Networks - general causal structures
  • Latent Dirichlet Allocation (LDA) - topic modeling

Undirected Graph Models (Markov Networks)

Use undirected graphs to represent symmetric correlation relationships between variables.

Key Characteristics:

  • Undirected edges represent symmetric correlations
  • Joint distribution based on clique decomposition
  • Uses potential functions (energy functions) on cliques
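A minimal sketch of this construction for a hypothetical three-variable chain with two pairwise cliques; the potential values are invented, agreement-favoring numbers, and Z is the partition function that normalizes their product into a distribution:

```python
# Markov network sketch: P(a, b, c) = (1/Z) * phi_ab(a, b) * phi_bc(b, c).
# Potentials need not be probabilities; these invented values simply
# favor neighboring variables that agree (an Ising-style choice).

phi_ab = {(0, 0): 2.0, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 2.0}
phi_bc = {(0, 0): 2.0, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 2.0}

def unnormalized(a, b, c):
    return phi_ab[(a, b)] * phi_bc[(b, c)]

# Partition function Z: sum of the unnormalized product over all states.
Z = sum(unnormalized(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1))

def prob(a, b, c):
    return unnormalized(a, b, c) / Z

print(f"Z = {Z}")                           # 12.5
print(f"P(0, 0, 0) = {prob(0, 0, 0):.3f}")  # 4 / 12.5 = 0.320
```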

Typical Examples:

  • Markov Random Fields (MRF) - spatial correlations
  • Conditional Random Fields (CRF) - discriminative sequence models
  • Boltzmann Machines - energy-based models

Modeling Paradigms: Generative vs Discriminative

Probabilistic graphical models can also be classified by their modeling approach:

Generative Models

Model the joint probability distribution P(Y, R, O), where:

  • Y = target variables (to be predicted)
  • O = observable variables
  • R = other (auxiliary) variables

The conditional distribution P(Y | O) is then derived from the joint distribution by marginalizing out R.

Examples:

  • Hidden Markov Models (HMM) - model P(\text{states}, \text{observations})
  • Latent Dirichlet Allocation (LDA) - model P(\text{documents}, \text{topics}, \text{words})
  • Gaussian Mixture Models (GMM) - model P(\text{data}, \text{components})

Discriminative Models

Directly model the conditional probability distribution P(Y, R | O), focusing on the direct relationship between the target variables and the observable variables.

Key Advantage:

More flexible and often more accurate for classification tasks, as they don't need to model the full joint distribution of all variables.

Examples:

  • Conditional Random Fields (CRF) - model P(\text{labels} | \text{observations})
  • Logistic Regression - model P(\text{class} | \text{features})

Comparison

Generative Models:

  • ✓ Can generate new samples
  • ✓ Model full data distribution
  • ✗ May be less accurate for classification

Discriminative Models:

  • ✓ Often more accurate for classification
  • ✓ More flexible assumptions
  • ✗ Cannot generate new samples
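To make the contrast concrete, here is a small scikit-learn sketch (assuming scikit-learn and NumPy are installed; the two-class data is synthetic). GaussianNB is generative: it fits P(x | y) and P(y) and applies Bayes' rule. LogisticRegression is discriminative: it fits P(y | x) directly:

```python
# Generative vs discriminative classifiers on the same synthetic data.
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Two Gaussian classes in 2-D feature space.
X = np.vstack([rng.normal(0.0, 1.0, (100, 2)),
               rng.normal(2.0, 1.0, (100, 2))])
y = np.array([0] * 100 + [1] * 100)

gen = GaussianNB().fit(X, y)           # fits P(x | y) and P(y), i.e. the joint
disc = LogisticRegression().fit(X, y)  # fits only P(y | x)

x_new = np.array([[1.0, 1.0]])
print("generative     P(y | x):", gen.predict_proba(x_new))
print("discriminative P(y | x):", disc.predict_proba(x_new))

# Only the generative model can synthesize new feature vectors, e.g. by
# sampling from the per-class Gaussians it estimated (gen.theta_, gen.var_).
```

Both models output P(y | x) at prediction time, but only the generative one retains enough of the distribution to sample new data.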

Medical Diagnosis Network Example

Consider a medical diagnosis system using a directed probabilistic graphical model (Bayesian network) to diagnose diseases based on patient symptoms and test results.

Network Variables and Dependencies

Variable      Type        Possible Values            Parent Nodes
Age           Observable  Young, Middle, Old         None
Disease       Hidden      Healthy, Flu, Pneumonia    Age
Fever         Observable  No, Yes                    Disease
Cough         Observable  No, Yes                    Disease
Test Result   Observable  Negative, Positive         Disease

Network structure: Age → Disease → {Fever, Cough, Test Result}. This directed graph represents causal relationships: age affects disease probability, and disease causes symptoms and test results.

Joint Probability Factorization

The joint probability distribution factorizes according to the graph structure:

P(\text{Age}, \text{Disease}, \text{Fever}, \text{Cough}, \text{Test}) = P(\text{Age}) \cdot P(\text{Disease} | \text{Age}) \cdot P(\text{Fever} | \text{Disease}) \cdot P(\text{Cough} | \text{Disease}) \cdot P(\text{Test} | \text{Disease})

This factorization replaces a single table over all five variables (3 × 3 × 2 × 2 × 2 = 72 joint entries) with a product of small conditional tables that together need only 2 + 6 + 3 + 3 + 3 = 17 free parameters.

Inference Example:

Given the observed variables (Age = Middle, Fever = Yes, Cough = Yes, Test = Positive), we can compute P(\text{Disease} = \text{Pneumonia} | \text{observations}) using Bayes' theorem and the graph structure, without enumerating all possible variable combinations.
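Below is a worked sketch of exactly this query. Only the network structure comes from the example above; every conditional probability value is invented for illustration. Since Age, Fever, Cough, and Test are all observed, the posterior over Disease reduces to normalizing the product of the factors that mention Disease:

```python
# Posterior over Disease given Age=Middle, Fever=Yes, Cough=Yes, Test=Positive.
# All CPT numbers are hypothetical placeholders, not clinical values.

p_disease_given_age = {                 # P(Disease | Age = Middle)
    "Healthy": 0.80, "Flu": 0.15, "Pneumonia": 0.05,
}
p_fever = {"Healthy": 0.05, "Flu": 0.80, "Pneumonia": 0.90}  # P(Fever=Yes | D)
p_cough = {"Healthy": 0.10, "Flu": 0.70, "Pneumonia": 0.95}  # P(Cough=Yes | D)
p_test  = {"Healthy": 0.02, "Flu": 0.10, "Pneumonia": 0.85}  # P(Test=Pos | D)

# P(Age) is a constant for the observed age, so it cancels on normalization.
score = {
    d: p_disease_given_age[d] * p_fever[d] * p_cough[d] * p_test[d]
    for d in ("Healthy", "Flu", "Pneumonia")
}
total = sum(score.values())
posterior = {d: s / total for d, s in score.items()}

for d, p in posterior.items():
    print(f"P(Disease={d} | observations) = {p:.3f}")
# With these numbers, Pneumonia ends up near 0.81 despite its low prior,
# because it is the only disease that explains all three findings at once.
```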

Advantages and Applications

Key Advantages

  • Intuitive visualization: Graph structure provides clear representation of dependencies
  • Simplified computation: Factorization reduces computational complexity
  • Handles uncertainty: Naturally represents probabilistic relationships
  • Interpretable: Easy to understand and explain to domain experts
  • Modular: Can combine simple models into complex structures

Real-World Applications

  • Medical diagnosis: Disease prediction from symptoms
  • Speech recognition: HMM for sequence modeling
  • Image processing: MRF for denoising and segmentation
  • Natural language processing: CRF for POS tagging, LDA for topic modeling
  • Bioinformatics: Gene network analysis, protein structure prediction
  • Computer vision: Object recognition, scene understanding

Frequently Asked Questions

Q: What's the difference between directed and undirected graphs?

A: Directed graphs use arrows to represent causal or conditional dependencies (parent → child), while undirected graphs use lines to represent symmetric correlations. Directed graphs are better for causal modeling, while undirected graphs are better for spatial or symmetric relationships.

Q: When should I use generative vs discriminative models?

A: Use generative models when you need to generate new samples or model the full data distribution. Use discriminative models when you only care about classification accuracy and don't need to generate samples. Discriminative models are often more accurate for classification tasks.

Q: How do graphical models simplify probability computation?

A: By factorizing the joint distribution according to the graph structure, we can break down complex high-dimensional probability calculations into simpler local computations. Instead of computing P(x_1, \ldots, x_n) directly, we compute products of conditional probabilities \prod_i P(x_i | \text{parents}(x_i)), which is much more efficient.

Q: Can I combine directed and undirected graphs?

A: Yes! There are hybrid models like chain graphs that combine both directed and undirected edges. However, most practical models use either purely directed (Bayesian networks) or purely undirected (Markov networks) structures for simplicity and computational efficiency.