Master sequential directed graphical models with hidden states. Learn how HMMs model temporal dependencies for speech recognition, weather prediction, and sequence analysis.
Hidden Markov Models (HMMs) are sequential directed graphical models based on Markov chains. Their core feature is "hidden states, observable outputs": the state variables cannot be observed directly and must instead be inferred indirectly through the observation variables.
Internal states that cannot be directly observed. Each state variable $y_t$ takes values from the state space $\mathcal{Y} = \{s_1, s_2, \ldots, s_N\}$, where $N$ is the number of states.
Examples: weather conditions (Sunny, Cloudy, Rainy), part-of-speech tags, phonemes in speech.
Variables that can be directly observed. Each observation variable $x_t$ takes values from the observation space $\mathcal{X} = \{o_1, o_2, \ldots, o_M\}$, where $M$ is the number of possible observations.
Examples: umbrella usage (Umbrella, No Umbrella), acoustic signals, words in a sentence.
An HMM is completely defined by three parameters $\lambda = (A, B, \pi)$:
State transition probability matrix $A = [a_{ij}]_{N \times N}$, where $a_{ij} = P(y_{t+1} = s_j \mid y_t = s_i)$ is the probability of transitioning from state $s_i$ to state $s_j$.
Example (Weather): $a_{\text{Sunny},\text{Rainy}} = P(\text{Rainy} \mid \text{Sunny}) = 0.3$ means a 30% chance of transitioning from Sunny to Rainy. Each row sums to 1: $\sum_{j=1}^{N} a_{ij} = 1$.
Emission probability matrix $B = [b_{ij}]_{N \times M}$, where $b_{ij} = P(x_t = o_j \mid y_t = s_i)$ is the probability of generating observation $o_j$ when in state $s_i$.
Example (Weather-Umbrella): $b_{\text{Rainy},\text{Umbrella}} = P(\text{Umbrella} \mid \text{Rainy}) = 0.7$ means a 70% chance of seeing "Umbrella" when the state is "Rainy". Each row sums to 1: $\sum_{j=1}^{M} b_{ij} = 1$.
Initial state probability vector $\pi = (\pi_1, \pi_2, \ldots, \pi_N)$, where $\pi_i = P(y_1 = s_i)$ is the probability of being in state $s_i$ at the initial time step ($t = 1$).
Example: $\pi_{\text{Sunny}} = P(y_1 = \text{Sunny}) = 0.5$ means a 50% chance of starting in the "Sunny" state. The vector sums to 1: $\sum_{i=1}^{N} \pi_i = 1$.
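To make the three parameters concrete, here is a minimal sketch of a two-state toy model in NumPy. Only the entries 0.3, 0.7, and 0.5 come from the examples above; every other value is an assumption chosen so that each row is a valid probability distribution.

```python
import numpy as np

# States: (Sunny, Rainy); observations: (No Umbrella, Umbrella)
A = np.array([[0.7, 0.3],    # Sunny row; a(Sunny -> Rainy) = 0.3 from the text
              [0.4, 0.6]])   # Rainy row: assumed values
B = np.array([[0.8, 0.2],    # Sunny row: assumed values
              [0.3, 0.7]])   # Rainy row; b(Rainy -> Umbrella) = 0.7 from the text
pi = np.array([0.5, 0.5])    # pi(Sunny) = 0.5 from the text; pi(Rainy) assumed

# Each row of A and B, and pi itself, must be a probability distribution
assert np.allclose(A.sum(axis=1), 1.0)
assert np.allclose(B.sum(axis=1), 1.0)
assert np.isclose(pi.sum(), 1.0)
```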
Based on the Markov chain assumption (the next state depends only on the current state, independent of earlier history), the joint probability distribution of all variables is:

$$P(x_1, y_1, \ldots, x_T, y_T) = P(y_1)\,P(x_1 \mid y_1) \prod_{t=2}^{T} P(y_t \mid y_{t-1})\,P(x_t \mid y_t)$$

This factorization breaks down into three ingredients: the initial state probability $P(y_1)$, given by $\pi$; the transition probabilities $P(y_t \mid y_{t-1})$, given by $A$; and the emission probabilities $P(x_t \mid y_t)$, given by $B$.
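As a quick sanity check of this factorization, the snippet below multiplies out the joint probability of one three-day state/observation pair by hand, using the Sunny and Rainy parameters from the worked weather example later in this article.

```python
# Joint probability of y = (Sunny, Sunny, Rainy) and x = (No Umbrella,
# No Umbrella, Umbrella), using pi(Sunny) = 0.5, a(S->S) = 0.7,
# a(S->R) = 0.1, b(S -> No Umbrella) = 0.6, b(R -> Umbrella) = 0.7.
p = 0.5 * 0.6 * 0.7 * 0.6 * 0.1 * 0.7
#   P(y1) P(x1|y1) P(y2|y1) P(x2|y2) P(y3|y2) P(x3|y3)
print(p)  # 0.00882
```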
How an HMM generates an observation sequence of length $T$ (a code sketch of this loop follows the list):

1. According to the initial state probability $\pi$, select the initial state $y_1$ and set $t = 1$.
2. According to the current state $y_t$ and the emission probabilities $B$, generate the observation $x_t$.
3. According to the current state $y_t$ and the transition probabilities $A$, transition to the next state $y_{t+1}$ and set $t = t + 1$.
4. Repeat steps 2-3 until $t = T$ (the desired sequence length is reached).
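To make the loop concrete, here is a minimal sketch of this generation process in NumPy, using the weather model from the worked example below. The Cloudy rows of $A$ and $B$ are assumed values, since the article only specifies the Sunny and Rainy rows.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

states = ["Sunny", "Cloudy", "Rainy"]
symbols = ["No Umbrella", "Umbrella"]

pi = np.array([0.5, 0.3, 0.2])            # initial state distribution
A = np.array([[0.7, 0.2, 0.1],            # Sunny  -> Sunny/Cloudy/Rainy
              [0.3, 0.4, 0.3],            # Cloudy row: assumed values
              [0.2, 0.2, 0.6]])           # Rainy  -> Sunny/Cloudy/Rainy
B = np.array([[0.6, 0.4],                 # Sunny  -> No Umbrella/Umbrella
              [0.5, 0.5],                 # Cloudy row: assumed values
              [0.3, 0.7]])                # Rainy  -> No Umbrella/Umbrella

def generate(T):
    """Sample a (state, observation) sequence of length T from the HMM."""
    y = rng.choice(len(states), p=pi)             # step 1: draw y_1 from pi
    seq = []
    for _ in range(T):
        x = rng.choice(len(symbols), p=B[y])      # step 2: emit x_t from B
        seq.append((states[y], symbols[x]))
        y = rng.choice(len(states), p=A[y])       # step 3: move to y_{t+1} via A
    return seq

for state, obs in generate(7):
    print(f"{state:6s} -> {obs}")
```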
HMM applications typically involve solving one of three fundamental problems:
Given: model parameters $\lambda = (A, B, \pi)$ and an observation sequence $X = (x_1, x_2, \ldots, x_T)$
Compute: $P(X \mid \lambda)$ (the probability of the observation sequence given the model)
Algorithm: the forward algorithm, a dynamic-programming recursion that computes $P(X \mid \lambda)$ in $O(N^2 T)$ time instead of summing over all $N^T$ state sequences (see the sketch below).
Application:
Model selection - compare different HMMs to see which best explains the observed data.
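Here is a minimal sketch of the forward algorithm on the weather model (the Cloudy rows are again assumed values). For long sequences the $\alpha$ vector should be rescaled at every step to avoid numerical underflow; this short example skips that.

```python
import numpy as np

pi = np.array([0.5, 0.3, 0.2])
A = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.4, 0.3],            # Cloudy row: assumed values
              [0.2, 0.2, 0.6]])
B = np.array([[0.6, 0.4],
              [0.5, 0.5],                 # Cloudy row: assumed values
              [0.3, 0.7]])

def forward(obs):
    """Return P(X | lambda) for a sequence of observation indices."""
    alpha = pi * B[:, obs[0]]             # alpha_1(i) = pi_i * b_i(x_1)
    for x in obs[1:]:
        alpha = (alpha @ A) * B[:, x]     # alpha_t(j) = sum_i alpha_{t-1}(i) a_ij * b_j(x_t)
    return alpha.sum()                    # P(X | lambda) = sum_i alpha_T(i)

obs = [0, 0, 1, 1, 0, 0, 1]               # 0 = No Umbrella, 1 = Umbrella
print(forward(obs))                       # likelihood of the 7-day sequence
```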
Given: model parameters $\lambda = (A, B, \pi)$ and an observation sequence $X = (x_1, x_2, \ldots, x_T)$
Find: the optimal state sequence $Y^* = \arg\max_{Y} P(Y \mid X, \lambda)$
Algorithm: the Viterbi algorithm, a dynamic-programming search for the single most probable state path (see the sketch below).
Application:
Speech recognition - infer phonemes/words from acoustic signals. Part-of-speech tagging - infer POS tags from words.
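A minimal sketch of Viterbi on the same weather model follows. Because the Cloudy rows are assumed values not given in the article, the recovered path may differ from the one quoted in the worked example below.

```python
import numpy as np

states = ["Sunny", "Cloudy", "Rainy"]
pi = np.array([0.5, 0.3, 0.2])
A = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.4, 0.3],            # Cloudy row: assumed values
              [0.2, 0.2, 0.6]])
B = np.array([[0.6, 0.4],
              [0.5, 0.5],                 # Cloudy row: assumed values
              [0.3, 0.7]])

def viterbi(obs):
    """Return the most likely hidden state path and its joint probability."""
    T, N = len(obs), len(pi)
    delta = pi * B[:, obs[0]]             # best-path probability per end state
    psi = np.zeros((T, N), dtype=int)     # back-pointers
    for t in range(1, T):
        cand = delta[:, None] * A         # cand[i, j]: best path into i, then i -> j
        psi[t] = cand.argmax(axis=0)
        delta = cand.max(axis=0) * B[:, obs[t]]
    path = [int(delta.argmax())]          # best final state
    for t in range(T - 1, 0, -1):         # follow back-pointers to t = 1
        path.append(int(psi[t][path[-1]]))
    path.reverse()
    return [states[i] for i in path], float(delta.max())

obs = [0, 0, 1, 1, 0, 0, 1]               # 0 = No Umbrella, 1 = Umbrella
path, p = viterbi(obs)
print(path, p)
```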
Given: an observation sequence $X = (x_1, x_2, \ldots, x_T)$, with the state sequence unknown
Estimate: model parameters $\lambda = (A, B, \pi)$ that maximize $P(X \mid \lambda)$
Algorithm: the Baum-Welch algorithm, an expectation-maximization (EM) procedure that alternates between computing expected state counts and re-estimating $\lambda$ (see the sketch below).
Application:
Train HMM from unlabeled observation sequences. Learn model parameters from data.
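Below is a compact sketch of one common formulation of Baum-Welch: a scaled forward-backward E-step followed by a re-estimation M-step. The random initialization, the fixed iteration count, and the epsilon guard against division by zero are illustrative choices, not from the article; with a single 7-day sequence the estimates will be noisy, since EM only finds a local optimum of the likelihood.

```python
import numpy as np

def baum_welch(obs, N, M, n_iter=50, seed=0, eps=1e-12):
    """Estimate (A, B, pi) for an N-state HMM over M symbols from obs indices."""
    rng = np.random.default_rng(seed)
    A = rng.random((N, N)); A /= A.sum(axis=1, keepdims=True)   # row-stochastic init
    B = rng.random((N, M)); B /= B.sum(axis=1, keepdims=True)
    pi = rng.random(N); pi /= pi.sum()
    obs = np.asarray(obs)
    T = len(obs)
    for _ in range(n_iter):
        # E-step: scaled forward (alpha) and backward (beta) passes
        alpha = np.zeros((T, N)); beta = np.zeros((T, N)); c = np.zeros(T)
        alpha[0] = pi * B[:, obs[0]]
        c[0] = alpha[0].sum(); alpha[0] /= c[0]
        for t in range(1, T):
            alpha[t] = (alpha[t-1] @ A) * B[:, obs[t]]
            c[t] = alpha[t].sum(); alpha[t] /= c[t]
        beta[-1] = 1.0
        for t in range(T - 2, -1, -1):
            beta[t] = (A @ (B[:, obs[t+1]] * beta[t+1])) / c[t+1]
        gamma = alpha * beta                  # gamma[t, i] = P(y_t = i | X)
        # xi[t, i, j] = P(y_t = i, y_{t+1} = j | X)
        xi = (alpha[:-1, :, None] * A[None, :, :] *
              (B[:, obs[1:]].T * beta[1:])[:, None, :] / c[1:, None, None])
        # M-step: re-estimate parameters from expected counts
        pi = gamma[0]
        A = xi.sum(axis=0) / (gamma[:-1].sum(axis=0)[:, None] + eps)
        B = np.stack([gamma[obs == k].sum(axis=0) for k in range(M)], axis=1)
        B /= (gamma.sum(axis=0)[:, None] + eps)
    return A, B, pi

obs = [0, 0, 1, 1, 0, 0, 1]                   # 0 = No Umbrella, 1 = Umbrella
A, B, pi = baum_welch(obs, N=3, M=2)
print(np.round(A, 3), np.round(B, 3), np.round(pi, 3), sep="\n")
```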
Apply an HMM to infer the hidden weather states (Sunny, Cloudy, Rainy) from umbrella observations (Umbrella, No Umbrella) over 7 days.
| Day | State (Hidden) | Observation | Emission Prob $P(x_t \mid y_t)$ |
|---|---|---|---|
| 1 | Sunny | No Umbrella | 0.6 |
| 2 | Sunny | No Umbrella | 0.6 |
| 3 | Rainy | Umbrella | 0.7 |
| 4 | Rainy | Umbrella | 0.7 |
| 5 | Cloudy | No Umbrella | 0.5 |
| 6 | Sunny | No Umbrella | 0.6 |
| 7 | Rainy | Umbrella | 0.7 |
The states are hidden (not directly observable): we see only the umbrella usage and infer the weather states from the HMM parameters below using the Viterbi algorithm.
Transition probabilities ($A$): Sunny → Sunny: 0.7, Sunny → Cloudy: 0.2, Sunny → Rainy: 0.1; Rainy → Rainy: 0.6, Rainy → Cloudy: 0.2, Rainy → Sunny: 0.2
Emission probabilities ($B$): Sunny → No Umbrella: 0.6, Sunny → Umbrella: 0.4; Rainy → Umbrella: 0.7, Rainy → No Umbrella: 0.3
Initial probabilities ($\pi$): Sunny: 0.5, Cloudy: 0.3, Rainy: 0.2
Given the observation sequence [No Umbrella, No Umbrella, Umbrella, Umbrella, No Umbrella, No Umbrella, Umbrella], the Viterbi algorithm infers the most likely state sequence: [Sunny, Sunny, Rainy, Rainy, Cloudy, Sunny, Rainy].