MathIsimple

Gaussian Mixture Model (GMM)

Learn generative model-based clustering using Gaussian distributions. Master the EM algorithm, understand soft clustering with probability assignments, and handle multi-modal distributions.

Module 7 of 9
Advanced Level
90-120 min

What is Gaussian Mixture Model?

Gaussian Mixture Model (GMM) is a generative model that assumes data is generated from a mixture of k Gaussian distributions. Each Gaussian component represents one cluster, and samples are assigned to clusters probabilistically (soft clustering).

Core Assumption

Each sample is generated by:

  1. First, select a Gaussian component i with probability αᵢ (mixing coefficient)
  2. Then, sample from that component's Gaussian distribution N(μᵢ, Σᵢ)

p_M(x) = Σᵢ₌₁ᵏ αᵢ · p(x | μᵢ, Σᵢ)

where Σᵢ αᵢ = 1, αᵢ > 0, and p(x | μᵢ, Σᵢ) is the Gaussian PDF
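The two-step generative process above can be sketched in a few lines of NumPy. The parameter values here (two 2-D components with unit covariance) are illustrative assumptions, not part of the model definition:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-component mixture in 2D (values chosen only for illustration)
alphas = np.array([0.6, 0.4])                # mixing coefficients, Σᵢ αᵢ = 1
mus = np.array([[0.0, 0.0], [3.0, 3.0]])     # means μᵢ
sigmas = np.array([np.eye(2), np.eye(2)])    # covariances Σᵢ

def sample_gmm(n):
    # Step 1: pick a component i with probability αᵢ
    zs = rng.choice(len(alphas), size=n, p=alphas)
    # Step 2: draw from the chosen Gaussian N(μᵢ, Σᵢ)
    xs = np.array([rng.multivariate_normal(mus[z], sigmas[z]) for z in zs])
    return xs, zs

X, zs = sample_gmm(500)
print(X.shape)   # (500, 2)
```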

Advantages

  • Supports soft clustering
  • Handles non-spherical clusters
  • Can model multi-modal distributions
  • Probabilistic assignments

Parameters

  • Mixing coefficients αᵢ
  • Mean vectors μᵢ
  • Covariance matrices Σᵢ
  • Number of components k

EM Algorithm (Expectation-Maximization)

GMM parameters are estimated with the EM algorithm, which alternates between an E-step (expectation) and an M-step (maximization) until convergence:

E-Step (Expectation)

Calculate the posterior probability that each sample belongs to each Gaussian component:

γⱼᵢ = P(zⱼ = i | xⱼ) = αᵢ · p(xⱼ | μᵢ, Σᵢ) / Σₗ αₗ · p(xⱼ | μₗ, Σₗ)

where zⱼ is the hidden variable indicating which component generated sample xⱼ
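The E-step formula above can be sketched directly, using SciPy's `multivariate_normal` for the Gaussian PDF; the sample data and parameter values are illustrative assumptions:

```python
import numpy as np
from scipy.stats import multivariate_normal

def e_step(X, alphas, mus, sigmas):
    """Responsibilities γⱼᵢ = αᵢ p(xⱼ|μᵢ,Σᵢ) / Σₗ αₗ p(xⱼ|μₗ,Σₗ)."""
    m, k = X.shape[0], len(alphas)
    gamma = np.empty((m, k))
    for i in range(k):
        # numerator: αᵢ · p(xⱼ | μᵢ, Σᵢ) for all samples at once
        gamma[:, i] = alphas[i] * multivariate_normal.pdf(X, mus[i], sigmas[i])
    gamma /= gamma.sum(axis=1, keepdims=True)  # divide by the sum over components
    return gamma

# Two well-separated samples against two unit-covariance components
X = np.array([[0.0, 0.0], [4.0, 4.0]])
gamma = e_step(X, np.array([0.5, 0.5]),
               np.array([[0.0, 0.0], [4.0, 4.0]]),
               np.array([np.eye(2), np.eye(2)]))
print(gamma)   # each row sums to 1; each sample leans toward its nearby component
```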

M-Step (Maximization)

Update model parameters to maximize the expected log-likelihood:

αᵢ = (1/m) Σⱼ γⱼᵢ

μᵢ = (Σⱼ γⱼᵢ · xⱼ) / (Σⱼ γⱼᵢ)

Σᵢ = (Σⱼ γⱼᵢ · (xⱼ - μᵢ)(xⱼ - μᵢ)ᵀ) / (Σⱼ γⱼᵢ)
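The three update formulas map directly onto a few lines of NumPy. This is a sketch: `gamma` is assumed to be the m×k responsibility matrix produced by the E-step, and the toy values are invented for illustration:

```python
import numpy as np

def m_step(X, gamma):
    """Update (αᵢ, μᵢ, Σᵢ) from responsibilities gamma with shape (m, k)."""
    m = X.shape[0]
    Nk = gamma.sum(axis=0)                  # Σⱼ γⱼᵢ: effective count per component
    alphas = Nk / m                         # αᵢ = (1/m) Σⱼ γⱼᵢ
    mus = (gamma.T @ X) / Nk[:, None]       # μᵢ = Σⱼ γⱼᵢ xⱼ / Σⱼ γⱼᵢ
    sigmas = []
    for i in range(gamma.shape[1]):
        d = X - mus[i]                      # centered data for component i
        # Σᵢ = Σⱼ γⱼᵢ (xⱼ - μᵢ)(xⱼ - μᵢ)ᵀ / Σⱼ γⱼᵢ
        sigmas.append((gamma[:, i, None] * d).T @ d / Nk[i])
    return alphas, mus, np.array(sigmas)

# Toy responsibilities for four 2-D samples and two components
X = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 4.0], [5.0, 5.0]])
gamma = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]])
alphas, mus, sigmas = m_step(X, gamma)
print(alphas)   # [0.5 0.5]; mixing coefficients sum to 1
```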

Soft Clustering Support

Unlike k-means, GMM provides soft clustering: each sample has a probability distribution over all clusters rather than a hard assignment.

For sample xⱼ, the posterior probabilities γⱼᵢ indicate how likely it belongs to each cluster:

Example probabilities:

  • Cluster 1: γⱼ₁ = 0.7 (70% probability)
  • Cluster 2: γⱼ₂ = 0.2 (20% probability)
  • Cluster 3: γⱼ₃ = 0.1 (10% probability)

This sample is primarily in Cluster 1 but has some membership in other clusters.
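In practice, libraries expose these posteriors directly. As a sketch using scikit-learn's `GaussianMixture.predict_proba` (the two synthetic blobs are an assumption made for the example):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Two synthetic 2-D blobs centered near (0, 0) and (4, 4)
X = np.vstack([rng.normal(0.0, 1.0, (100, 2)),
               rng.normal(4.0, 1.0, (100, 2))])

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
probs = gmm.predict_proba(X)   # probs[j, i] is γⱼᵢ, the soft membership
print(probs.shape)             # (200, 2); each row sums to 1
```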

Next: Density Clustering