Master probabilistic graphical models for reasoning under uncertainty
A Bayesian network consists of: a directed acyclic graph (DAG) whose nodes are random variables, and a conditional probability table (CPT) P(X_i | Parents(X_i)) for each node.
This compact representation exploits conditional independence to reduce the O(2^n) parameters of the full joint distribution to a manageable number.
Variables: D (Disease), F (Fever), C (Cough)
Structure: D → F, D → C (disease causes symptoms)
Parameters: P(D), P(F | D), P(C | D)
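A minimal Python sketch of how these CPTs could be stored and combined via the factorization; the probability values below are illustrative assumptions, not numbers from the text.

```python
# Hypothetical CPTs for the D -> F, D -> C network (values are made up).
P_D = {True: 0.01, False: 0.99}                      # P(D)
P_F_given_D = {True: {True: 0.9, False: 0.1},        # P(F | D)
               False: {True: 0.05, False: 0.95}}
P_C_given_D = {True: {True: 0.8, False: 0.2},        # P(C | D)
               False: {True: 0.1, False: 0.9}}

def joint(d, f, c):
    """Joint probability via the factorization P(D) * P(F | D) * P(C | D)."""
    return P_D[d] * P_F_given_D[d][f] * P_C_given_D[d][c]

# Only 1 + 2 + 2 = 5 independent numbers are needed instead of 2^3 - 1 = 7.
print(joint(True, True, True))   # P(D=1, F=1, C=1) = 0.01 * 0.9 * 0.8
```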
Independence: A ⊥ C | B (chain: A → B → C)
B blocks the path. Given B, A doesn't provide info about C.
Independence: A ⊥ C | B (fork: A ← B → C)
B is common cause. Conditioning on B blocks influence between A and C.
Independence: A ⊥ C marginally, but NOT A ⊥ C | B (collider: A → B ← C)
Marginally independent, but dependent when conditioning on B ('explaining away').
Scenario: Grass is wet (W). Possible causes: Rain (R) or Sprinkler (S). Structure: R → W ← S
Without observing W: R ⊥ S
(rain and sprinkler are independent events)
After observing W = wet: R and S become dependent, with P(R = 1 | W = 1, S = 1) < P(R = 1 | W = 1).
If we learn the sprinkler was on, rain is less likely (one cause explains the effect).
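A quick numeric check of explaining away, using made-up CPT values for the R → W ← S network (the numbers are assumptions for illustration only):

```python
from itertools import product

# Hypothetical priors and CPT for R -> W <- S (all numbers are illustrative).
P_R = {1: 0.2, 0: 0.8}
P_S = {1: 0.3, 0: 0.7}
P_W = {(1, 1): 0.99, (1, 0): 0.9, (0, 1): 0.9, (0, 0): 0.01}  # P(W=1 | R, S)

def joint(r, s, w):
    pw = P_W[(r, s)] if w == 1 else 1 - P_W[(r, s)]
    return P_R[r] * P_S[s] * pw

def posterior_rain(evidence):
    """P(R=1 | evidence) by brute-force enumeration over all assignments."""
    num = den = 0.0
    for r, s, w in product((0, 1), repeat=3):
        assignment = {"R": r, "S": s, "W": w}
        if any(assignment[k] != v for k, v in evidence.items()):
            continue
        p = joint(r, s, w)
        den += p
        if r == 1:
            num += p
    return num / den

print(posterior_rain({"W": 1}))          # rain is more likely than its 0.2 prior
print(posterior_rain({"W": 1, "S": 1}))  # learning S=1 lowers it: explaining away
```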
D-separation (directed separation) determines whether X ⊥ Y | Z by examining paths in the graph:
If all paths from X to Y are blocked by Z, then X ⊥ Y | Z.
Network: R → W ← S (the wet-grass network above)
Query: Is R ⊥ S (with W unobserved)?
Path R → W ← S: W is an unobserved collider, so the path is blocked.
Answer: Yes, R ⊥ S
Given evidence E = e (observed variables), compute the posterior for the query variable(s) X:
P(X | e) = (1/Z) Σ_H P(X, H, e)
where H are the hidden (unobserved) variables and Z is the normalization constant
Exact inference (variable elimination): ✓ Accurate | ✗ Exponential in tree-width
Approximate inference (sampling): ✓ Scales to large networks | ✗ Answers are approximate
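To make the approximate row concrete, here is a rejection-sampling sketch for the wet-grass network, reusing the illustrative CPT values from the explaining-away sketch above; the estimate converges to the exact posterior as the sample count grows, but samples inconsistent with the evidence are wasted.

```python
import random

# Same hypothetical CPT values as the explaining-away example above.
P_R1, P_S1 = 0.2, 0.3
P_W1 = {(1, 1): 0.99, (1, 0): 0.9, (0, 1): 0.9, (0, 0): 0.01}

def sample_once():
    """Draw one (R, S, W) sample by sampling each node given its parents."""
    r = int(random.random() < P_R1)
    s = int(random.random() < P_S1)
    w = int(random.random() < P_W1[(r, s)])
    return r, s, w

def rejection_sample_rain(evidence_w, n=100_000):
    """Estimate P(R=1 | W=evidence_w) by discarding inconsistent samples."""
    kept = rain = 0
    for _ in range(n):
        r, s, w = sample_once()
        if w == evidence_w:
            kept += 1
            rain += r
    return rain / kept if kept else float("nan")

print(rejection_sample_rain(1))   # close to the exact enumeration result above
```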
Key insight: Elimination order matters! Good orders exploit independence structure for efficiency.
Given structure, estimate CPT parameters from data:
Maximum Likelihood: P(X_i = x | Pa_i = pa) = Count(x, pa) / Count(pa)
With complete data, closed-form solution. For missing data, use EM algorithm.
Learn graph structure from data:
Challenge: Exponentially many structures, hard to learn causal direction from observational data alone.
Priors: P(Disease) for each candidate disease, e.g. P(Flu)
Likelihoods: a CPT per symptom, e.g. P(Fever | Flu), P(Cough | Flu)
Query: P(Flu | Fever = true, Cough = true)
Use variable elimination or belief propagation to compute the posterior probability of Flu given both symptoms; the same query can be asked of any other candidate disease in the network.
Note: Both can be high if they're not mutually exclusive (patient could have both).
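A brute-force enumeration sketch of a two-disease version of this example; the second disease ("Cold") and all probability values are illustrative assumptions, chosen only to show that both posteriors can be high at once.

```python
from itertools import product

# Hypothetical network: Flu -> {Fever, Cough} and Cold -> {Fever, Cough}.
# All probabilities below are made-up illustrative numbers.
P_FLU, P_COLD = 0.3, 0.4
P_FEVER = {(1, 1): 0.9, (1, 0): 0.8, (0, 1): 0.6, (0, 0): 0.05}  # P(Fever=1 | Flu, Cold)
P_COUGH = {(1, 1): 0.95, (1, 0): 0.7, (0, 1): 0.8, (0, 0): 0.1}  # P(Cough=1 | Flu, Cold)

def joint(flu, cold, fever, cough):
    """Joint probability via the factorization over the four nodes."""
    p = (P_FLU if flu else 1 - P_FLU) * (P_COLD if cold else 1 - P_COLD)
    p *= P_FEVER[(flu, cold)] if fever else 1 - P_FEVER[(flu, cold)]
    p *= P_COUGH[(flu, cold)] if cough else 1 - P_COUGH[(flu, cold)]
    return p

def posterior(var_index, fever=1, cough=1):
    """P(disease=1 | Fever, Cough) for var_index 0 = Flu, 1 = Cold."""
    num = den = 0.0
    for flu, cold in product((0, 1), repeat=2):
        p = joint(flu, cold, fever, cough)
        den += p
        if (flu, cold)[var_index] == 1:
            num += p
    return num / den

print(posterior(0))  # P(Flu  | fever, cough)
print(posterior(1))  # P(Cold | fever, cough) -- both posteriors can exceed 0.5
```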
By the chain rule, P(X_1, ..., X_n) = P(X_1) P(X_2 | X_1) ··· P(X_n | X_1, ..., X_{n-1}). This is always true by definition of conditional probability, but it requires exponentially many parameters!
Theorem: For a DAG G with nodes X_1, ..., X_n: P(X_1, ..., X_n) = Π_i P(X_i | Pa_i)
Each variable depends only on its parents, not all predecessors! This is the key insight.
CPT Size Reduction:
If each node has at most k parents: O(n × 2^k) vs O(2^n) parameters. Exponential savings!
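For example, with n = 20 binary variables and at most k = 3 parents per node, the CPTs hold at most 20 × 2^3 = 160 parameters, while the full joint table would need 2^20 ≈ 1,000,000 entries.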
D-separation determines conditional independence from graph structure:
Chain: A → B → C
A ⊥ C | B (B blocks path)
Fork: A ← B → C
A ⊥ C | B (B blocks path)
V-structure (collider): A → B ← C
A ⊥ C | ∅ but NOT A ⊥ C | B (B unblocks!)
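A small simulation illustrating the collider case; the generating mechanism (B as a noisy OR of A and C) is an assumption made up for this sketch.

```python
import numpy as np

# Simulate the collider A -> B <- C: A and C are independent coin flips,
# and B is (roughly) 1 whenever at least one parent is 1.
rng = np.random.default_rng(0)
n = 200_000
a = rng.random(n) < 0.5
c = rng.random(n) < 0.5
b = (a | c) & (rng.random(n) < 0.95)

print(np.corrcoef(a, c)[0, 1])              # ~0: A and C marginally independent
mask = b                                     # condition on B = 1
print(np.corrcoef(a[mask], c[mask])[0, 1])   # ~ -0.5: dependence appears given B
```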
Goal: Compute P(X_query | evidence)
Naive summation: exponential time in number of variables!
Example: P(A) in the chain A → B → C → D
Naive: P(A) = Σ_B Σ_C Σ_D P(A) P(B|A) P(C|B) P(D|C)
Cost: O(2^4) = 16 operations
Variable Elimination (eliminate D, then C, then B): P(A) = P(A) Σ_B P(B|A) Σ_C P(C|B) Σ_D P(D|C)
Cost: O(3×2) = 6 operations
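The same comparison in a few lines of Python, with made-up CPTs for the chain; both computations return the same distribution, but the eliminated version does far less work.

```python
from itertools import product

# Illustrative CPTs for the chain A -> B -> C -> D (values are made up).
P_A = [0.6, 0.4]                    # P(A)
P_B = [[0.7, 0.3], [0.2, 0.8]]      # P(B | A)
P_C = [[0.9, 0.1], [0.4, 0.6]]      # P(C | B)
P_D = [[0.5, 0.5], [0.1, 0.9]]      # P(D | C)

# Naive: enumerate the full joint over B, C, D for each value of A.
naive = [sum(P_A[a] * P_B[a][b] * P_C[b][c] * P_D[c][d]
             for b, c, d in product(range(2), repeat=3))
         for a in range(2)]

# Variable elimination: push the sums inward (eliminate D, then C, then B).
sum_d = [sum(P_D[c]) for c in range(2)]                                   # eliminate D
sum_c = [sum(P_C[b][c] * sum_d[c] for c in range(2)) for b in range(2)]   # eliminate C
sum_b = [sum(P_B[a][b] * sum_c[b] for b in range(2)) for a in range(2)]   # eliminate B
ve = [P_A[a] * sum_b[a] for a in range(2)]

print(naive)  # ~[0.6, 0.4]
print(ve)     # same answer with far fewer multiplications
```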
1. Choose elimination ordering π
2. For each variable Z in π:
a. Multiply all factors containing Z
b. Sum out Z from the product
c. Add resulting factor to factor list
3. Multiply remaining factors
4. Normalize to get probability
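A generic, factor-based sketch of these four steps, assuming binary variables and dictionary-backed factors (a teaching sketch, not a production implementation):

```python
from itertools import product

# A factor is a pair (vars, table): vars is a tuple of variable names and
# table maps value tuples (aligned with vars) to probabilities.

def multiply(f, g):
    """Pointwise product of two factors over the union of their variables."""
    f_vars, f_tab = f
    g_vars, g_tab = g
    out_vars = tuple(dict.fromkeys(f_vars + g_vars))
    out_tab = {}
    for values in product((0, 1), repeat=len(out_vars)):
        a = dict(zip(out_vars, values))
        out_tab[values] = (f_tab[tuple(a[v] for v in f_vars)]
                           * g_tab[tuple(a[v] for v in g_vars)])
    return out_vars, out_tab

def sum_out(var, f):
    """Marginalize one variable out of a factor."""
    f_vars, f_tab = f
    i = f_vars.index(var)
    out_vars = f_vars[:i] + f_vars[i + 1:]
    out_tab = {}
    for values, p in f_tab.items():
        key = values[:i] + values[i + 1:]
        out_tab[key] = out_tab.get(key, 0.0) + p
    return out_vars, out_tab

def eliminate(factors, order):
    """Steps 1-3: for each variable, multiply its factors and sum it out."""
    factors = list(factors)
    for z in order:
        involved = [f for f in factors if z in f[0]]
        factors = [f for f in factors if z not in f[0]]
        prod = involved[0]
        for f in involved[1:]:
            prod = multiply(prod, f)
        factors.append(sum_out(z, prod))
    result = factors[0]
    for f in factors[1:]:
        result = multiply(result, f)
    return result

# Usage: P(A) in the chain A -> B -> C -> D with the same made-up CPTs.
fA = (("A",), {(0,): 0.6, (1,): 0.4})
fB = (("A", "B"), {(0, 0): 0.7, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.8})
fC = (("B", "C"), {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.4, (1, 1): 0.6})
fD = (("C", "D"), {(0, 0): 0.5, (0, 1): 0.5, (1, 0): 0.1, (1, 1): 0.9})
# Step 4 (normalize) is a no-op here because no evidence is included.
print(eliminate([fA, fB, fC, fD], order=["D", "C", "B"]))  # ~{(0,): 0.6, (1,): 0.4}
```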
Time complexity: O(n × d^(w+1)), where n = number of variables, d = maximum domain size, and w = width of the elimination ordering (the tree-width for the best ordering)
Finding optimal elimination order is NP-hard, but good heuristics exist (min-fill, min-degree).
For node X_i with parents Pa_i, estimate P(X_i | Pa_i): P(X_i = x | Pa_i = pa) = Count(x, pa) / Count(pa)
Simply count frequencies in data! Each CPT entry estimated independently (global decomposition).
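A counting sketch for a single CPT entry, using a tiny made-up dataset of (Flu, Fever) records:

```python
from collections import Counter

# Made-up complete data: each record is (flu, fever).
data = [(1, 1), (1, 1), (1, 0), (0, 0), (0, 0), (0, 1), (0, 0), (0, 0)]

joint_counts = Counter(data)                    # Count(Flu = pa, Fever = x)
parent_counts = Counter(pa for pa, _ in data)   # Count(Flu = pa)

def mle(x, pa):
    """MLE estimate of P(Fever = x | Flu = pa) by relative frequency."""
    return joint_counts[(pa, x)] / parent_counts[pa]

print(mle(1, 1))  # 2/3: fraction of flu cases with fever
print(mle(1, 0))  # 1/5: fraction of non-flu cases with fever
```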
Log-likelihood of data D: log P(D) = Σ_m Σ_i log P(x_i[m] | pa_i[m]), summing over data cases m and nodes i
Rearranging sums: log P(D) = Σ_i ( Σ_m log P(x_i[m] | pa_i[m]) )
Each term involves only parameters of one CPT! Optimize independently.
With Dirichlet prior α: P(X_i = x | Pa_i = pa) = (Count(x, pa) + α) / (Count(pa) + α |X_i|), where |X_i| is the number of values of X_i
With α = 1 this is Laplace (add-one) smoothing! It prevents overfitting with sparse data.
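Continuing the counting sketch above, the smoothed estimate is a one-line change; alpha is the assumed symmetric pseudocount.

```python
def map_estimate(x, pa, alpha=1.0, n_values=2):
    """Dirichlet-smoothed estimate of P(Fever = x | Flu = pa); alpha=1 is Laplace smoothing."""
    return (joint_counts[(pa, x)] + alpha) / (parent_counts[pa] + alpha * n_values)

print(map_estimate(1, 1))  # (2 + 1) / (3 + 2) = 0.6, pulled toward uniform
```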
Computing P(X_query | evidence) in general Bayesian networks is #P-complete.
(#P-complete problems are at least as hard as NP-complete problems such as SAT and traveling salesman.)
Proof idea: construct a BN that encodes a 3-SAT formula, so that an inference query decides whether the formula is satisfiable.
Since SAT is NP-complete and we reduced it to BN inference, BN inference is NP-hard!
✓ Poly-time cases: polytrees (exact belief propagation) and networks of bounded tree-width
Approximate methods: sampling (rejection sampling, likelihood weighting, Gibbs sampling), variational inference, loopy belief propagation
Test your understanding with 10 multiple-choice questions