Handling latent variables and missing data with iterative optimization
The Expectation-Maximization (EM) Algorithm is an iterative optimization method for estimating parameters in probabilistic models with latent (unobserved) variables or missing data. It's a powerful tool for handling incomplete datasets.
Example: A watermelon's stem has fallen off, so we can't observe whether it's "curled" or "stiff". The "stem" attribute value is therefore unknown; it is a latent variable. The EM algorithm can estimate model parameters even when some variables are unobserved.
Let $X$ denote the observed variables, $Z$ the latent variables, and $\Theta$ the model parameters. We want to maximize the marginal log-likelihood:

$$LL(\Theta \mid X) = \ln P(X \mid \Theta) = \ln \sum_{Z} P(X, Z \mid \Theta)$$

Starting from initial parameters $\Theta^{0}$, iterate until convergence:

E-step (Expectation): Based on the current parameters $\Theta^{t}$, infer the expected value of the latent variables, denoted $Z^{t}$.

M-step (Maximization): Based on the observed variables $X$ and $Z^{t}$, perform maximum likelihood estimation to update the parameters, denoted $\Theta^{t+1}$.
By alternating between E-step and M-step, the marginal likelihood monotonically increases, eventually converging to a local maximum. EM provides a principled way to handle missing data without discarding incomplete samples.
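As a concrete sketch of these two steps, the loop below runs EM for a two-component one-dimensional Gaussian mixture, where each point's unknown component assignment plays the role of the latent variable $Z$. The synthetic data, initial parameter values, and iteration count are illustrative choices, not part of the original example.

```python
import numpy as np

# EM for a two-component 1-D Gaussian mixture: the component assignment of
# each point is the latent variable Z, and Theta = (pi, mu, var).
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 1, 200), rng.normal(3, 1, 300)])  # observed X

# Initial parameters Theta^0: mixing weights, means, variances
pi = np.array([0.5, 0.5])
mu = np.array([-1.0, 1.0])
var = np.array([1.0, 1.0])

for t in range(100):
    # E-step: responsibility of each component for each point (expected Z given Theta^t)
    dens = np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    resp = pi * dens
    resp /= resp.sum(axis=1, keepdims=True)

    # M-step: maximum-likelihood update of Theta given the expected assignments
    nk = resp.sum(axis=0)
    pi = nk / len(x)
    mu = (resp * x[:, None]).sum(axis=0) / nk
    var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk

print(pi, mu, var)  # settles near the parameters used to generate the data
```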
Gibbs Sampling is a Markov Chain Monte Carlo (MCMC) method for approximate inference in Bayesian networks. It iteratively samples variables to approximate the posterior distribution.
Gibbs sampling updates one variable at a time, drawing it from its conditional distribution given the current values of all the other variables; the resulting Markov chain gradually converges to the true posterior distribution. It is particularly useful in high-dimensional spaces and works well for complex Bayesian networks.
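A minimal sketch of the mechanics: to draw from a bivariate normal with correlation $\rho$, the sampler alternates between the two one-dimensional conditionals $p(x \mid y)$ and $p(y \mid x)$. The target distribution, the value of $\rho$, and the sample counts are assumptions made purely for illustration.

```python
import numpy as np

# Gibbs sampling from a standard bivariate normal with correlation rho:
# each variable is resampled from its conditional given the other one.
rng = np.random.default_rng(0)
rho = 0.8
n_samples, burn_in = 10_000, 1_000

x, y = 0.0, 0.0
samples = []
for i in range(n_samples + burn_in):
    x = rng.normal(rho * y, np.sqrt(1 - rho ** 2))  # p(x | y) is N(rho*y, 1 - rho^2)
    y = rng.normal(rho * x, np.sqrt(1 - rho ** 2))  # p(y | x) is N(rho*x, 1 - rho^2)
    if i >= burn_in:
        samples.append((x, y))

samples = np.array(samples)
print(np.corrcoef(samples.T))  # the empirical correlation approaches rho
```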
Variational Inference transforms complex probabilistic inference into an optimization problem. It uses a known simple distribution to approximate the complex posterior distribution.
The log-likelihood can be decomposed into two parts:

$$\ln p(\mathbf{x} \mid \Theta) = \mathcal{L}(q, \Theta) + \mathrm{KL}(q \,\|\, p)$$

Evidence Lower Bound (ELBO):

$$\mathcal{L}(q, \Theta) = \int q(\mathbf{z}) \ln \frac{p(\mathbf{x}, \mathbf{z} \mid \Theta)}{q(\mathbf{z})} \, d\mathbf{z}$$

KL Divergence:

$$\mathrm{KL}(q \,\|\, p) = -\int q(\mathbf{z}) \ln \frac{p(\mathbf{z} \mid \mathbf{x}, \Theta)}{q(\mathbf{z})} \, d\mathbf{z}$$

Since $\mathrm{KL}(q \,\|\, p) \ge 0$, we have $\ln p(\mathbf{x} \mid \Theta) \ge \mathcal{L}(q, \Theta)$. Variational inference maximizes $\mathcal{L}(q, \Theta)$ (the ELBO), which makes $q(\mathbf{z})$ as close as possible to the true posterior $p(\mathbf{z} \mid \mathbf{x}, \Theta)$.
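The decomposition itself follows in one line from $p(\mathbf{x}, \mathbf{z} \mid \Theta) = p(\mathbf{z} \mid \mathbf{x}, \Theta)\, p(\mathbf{x} \mid \Theta)$ and $\int q(\mathbf{z})\, d\mathbf{z} = 1$:

$$\ln p(\mathbf{x} \mid \Theta) = \int q(\mathbf{z}) \ln \frac{p(\mathbf{x}, \mathbf{z} \mid \Theta)}{p(\mathbf{z} \mid \mathbf{x}, \Theta)} \, d\mathbf{z} = \int q(\mathbf{z}) \ln \frac{p(\mathbf{x}, \mathbf{z} \mid \Theta)}{q(\mathbf{z})} \, d\mathbf{z} + \int q(\mathbf{z}) \ln \frac{q(\mathbf{z})}{p(\mathbf{z} \mid \mathbf{x}, \Theta)} \, d\mathbf{z} = \mathcal{L}(q, \Theta) + \mathrm{KL}(q \,\|\, p).$$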
In practice, the E-step of EM (inferring the posterior $p(\mathbf{z} \mid \mathbf{x}, \Theta)$) may be complex. Variational inference can simplify this by constraining the form of $q(\mathbf{z})$:

E-step: Fix $\Theta$ and find the $q$ that maximizes $\mathcal{L}(q, \Theta)$. This is equivalent to minimizing $\mathrm{KL}(q \,\|\, p(\mathbf{z} \mid \mathbf{x}, \Theta))$.

M-step: Fix $q$ and find the $\Theta$ that maximizes $\mathcal{L}(q, \Theta)$. This is equivalent to maximizing the expected complete-data log-likelihood $\mathbb{E}_{q}[\ln p(\mathbf{x}, \mathbf{z} \mid \Theta)]$.

By alternating the E-step and M-step, $\mathcal{L}(q, \Theta)$ monotonically increases, eventually converging to a local maximum. Variational inference makes the E-step tractable by using simpler approximate distributions.
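To see why these equivalences hold, note that $\mathcal{L}(q, \Theta)$ and the KL term always sum to $\ln p(\mathbf{x} \mid \Theta)$, which does not depend on $q$. The snippet below checks the decomposition numerically for a single discrete latent variable; the joint table p_xz and the approximate distribution q are arbitrary illustrative values.

```python
import numpy as np

# Numerical check of ln p(x) = ELBO + KL for a discrete latent variable z
# taking three values, with the observed x held fixed.
p_xz = np.array([0.10, 0.25, 0.30])       # illustrative joint p(x, z=k), k = 0, 1, 2
p_x = p_xz.sum()                          # marginal likelihood p(x)
p_z_given_x = p_xz / p_x                  # true posterior p(z | x)

q = np.array([0.5, 0.3, 0.2])             # an arbitrary approximate posterior q(z)

elbo = np.sum(q * np.log(p_xz / q))       # ELBO = E_q[ln p(x, z) - ln q(z)]
kl = np.sum(q * np.log(q / p_z_given_x))  # KL(q || p(z | x))

print(np.log(p_x), elbo + kl)             # the two numbers coincide
print(elbo <= np.log(p_x))                # ELBO is a lower bound because KL >= 0
```

Because the sum is fixed, any increase in the ELBO over $q$ is exactly a decrease in the KL term, which is the E-step equivalence stated above.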
Plate notation is a compact way to represent probabilistic graphical models. Mutually independent variables generated by the same mechanism are placed in a box (plate) with the number of repetitions labeled. Plates can be nested, and observed variables are typically shaded.
For $N$ mutually independent observed variables $\mathbf{x} = \{x_1, \ldots, x_N\}$ generated from the latent variable $\mathbf{z}$ (in plate notation, the $x_i$ sit inside a plate labeled $N$ and are shaded as observed):

The corresponding log-likelihood:

$$\ln p(\mathbf{x} \mid \Theta) = \sum_{i=1}^{N} \ln \sum_{\mathbf{z}} p(x_i, \mathbf{z} \mid \Theta)$$
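As a small illustration of this sum, the snippet below evaluates the log-likelihood of $N$ independent observations under a model with a discrete latent $\mathbf{z}$ taking two values (a unit-variance mixture); the data and parameter values are made up for the example.

```python
import numpy as np

# ln p(x | Theta) = sum_i ln sum_z p(z | Theta) * p(x_i | z, Theta)
# for a discrete latent z with two values and unit-variance Gaussian components.
rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=50)   # N observed variables x_1, ..., x_N

pz = np.array([0.4, 0.6])           # p(z | Theta)
mu = np.array([-1.0, 1.0])          # p(x_i | z, Theta) = N(x_i; mu_z, 1)

dens = np.exp(-(x[:, None] - mu) ** 2 / 2) / np.sqrt(2 * np.pi)
log_lik = np.log((pz * dens).sum(axis=1)).sum()
print(log_lik)
```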
Using the EM algorithm for incomplete datasets
In a medical dataset, some patients have missing test results. We want to estimate model parameters (e.g., disease probabilities) using all available data, including incomplete records.
E-step: For samples with missing test results, estimate the expected value of the missing variable based on current parameter estimates and observed symptoms.
M-step: Update disease probabilities and symptom distributions using both complete samples and expected values from incomplete samples.
Iterate: Repeat E-step and M-step until convergence. The algorithm naturally handles missing data without discarding incomplete samples.
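A hedged sketch of how this might look on a toy record set: we assume a latent binary disease variable D, one always-observed binary symptom S, and a binary test result T recorded as np.nan when missing. Treating the disease status itself as unobserved is our own modeling choice, and the data values, initial parameters, and variable names are all made up for illustration.

```python
import numpy as np

# Toy EM with missing data: latent disease D, observed symptom S,
# partially observed test result T (np.nan marks a missing record).
S = np.array([1, 1, 0, 1, 0, 0, 1, 1, 0, 1])
T = np.array([1, np.nan, 0, 1, 0, np.nan, 1, 1, np.nan, 0])
obs = ~np.isnan(T)                       # patients whose test result was recorded

# Initial parameters Theta^0 (invented starting values)
theta = 0.5                              # P(D = 1)
p_s = np.array([0.3, 0.7])               # P(S = 1 | D = d), d = 0, 1
p_t = np.array([0.2, 0.8])               # P(T = 1 | D = d)

for step in range(200):
    # E-step: posterior r_i = P(D = 1 | observed variables of patient i)
    like1 = theta * p_s[1] ** S * (1 - p_s[1]) ** (1 - S)
    like0 = (1 - theta) * p_s[0] ** S * (1 - p_s[0]) ** (1 - S)
    t_seen = np.where(obs, T, 0).astype(int)     # placeholder 0 where T is missing
    like1 = np.where(obs, like1 * p_t[1] ** t_seen * (1 - p_t[1]) ** (1 - t_seen), like1)
    like0 = np.where(obs, like0 * p_t[0] ** t_seen * (1 - p_t[0]) ** (1 - t_seen), like0)
    r = like1 / (like1 + like0)

    # Expected test result where T is missing: E[T | D = d] = current P(T = 1 | D = d)
    t1 = np.where(obs, T, p_t[1])
    t0 = np.where(obs, T, p_t[0])

    # M-step: maximum-likelihood updates from the expected counts
    theta = r.mean()
    p_s = np.array([((1 - r) * S).sum() / (1 - r).sum(), (r * S).sum() / r.sum()])
    p_t = np.array([((1 - r) * t0).sum() / (1 - r).sum(), (r * t1).sum() / r.sum()])

print(theta, p_s, p_t)   # estimates use every record, including the incomplete ones
```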