Master discriminative undirected graph models for sequence labeling. Learn how CRFs model conditional probabilities for part-of-speech tagging, named entity recognition, and syntactic analysis.
Conditional Random Fields (CRFs) are discriminative undirected graph models that directly model the conditional probability of a state sequence $Y = (y_1, y_2, \dots, y_T)$ given an observation sequence $X = (x_1, x_2, \dots, x_T)$: $P(Y \mid X)$.
The most commonly used CRF structure is the linear-chain CRF, where the state sequence $y_1, y_2, \dots, y_T$ forms a chain, corresponding element by element to the observation sequence $x_1, x_2, \dots, x_T$.
States form a linear chain $y_1 - y_2 - \cdots - y_T$ (undirected edges), and each state $y_t$ is connected to its corresponding observation $x_t$.
Example (POS Tagging): for the phrase "the lazy dog", the observations are the words $x_1 = \text{the}$, $x_2 = \text{lazy}$, $x_3 = \text{dog}$, and the chained states are the tags $y_1 = \text{DT}$, $y_2 = \text{JJ}$, $y_3 = \text{NN}$.
A CRF defines the conditional probability through feature functions combined in exponential form, which guarantees non-negativity while keeping the model interpretable:

$$P(Y \mid X) = \frac{1}{Z(X)} \exp\left( \sum_{t=1}^{T} \sum_{k} \lambda_k f_k(y_{t-1}, y_t, X, t) \right)$$

Where $Z(X)$ is the normalization (partition) function, obtained by summing the exponentiated score over all possible state sequences, and the $\lambda_k$ are learned feature weights.

Feature functions $f_k(y_{t-1}, y_t, X, t)$ are typically binary indicators (0 or 1) that fire on specific patterns in the data. They enable flexible feature engineering.
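As a minimal sketch of the binary-indicator idea, the toy functions below (hypothetical names and patterns, not from the text) each return 1 when their pattern matches and 0 otherwise:

```python
# Toy CRF feature functions: binary indicators over
# (previous tag, current tag, word sequence, position).
def f_dt_then_nn(y_prev, y, words, t):
    """Fires when a determiner is followed by a noun."""
    return 1 if (y_prev == "DT" and y == "NN") else 0

def f_word_the_is_dt(y_prev, y, words, t):
    """Fires when the current word is 'the' and it is tagged DT."""
    return 1 if (words[t].lower() == "the" and y == "DT") else 0

words = ["The", "dog"]
print(f_word_the_is_dt(None, "DT", words, 0))  # 1
print(f_dt_then_nn("DT", "NN", words, 1))      # 1
```

Each such indicator gets its own weight $\lambda_k$; training adjusts the weights so that frequently useful patterns score highly.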
Transition feature functions capture relationships between adjacent states $y_{t-1}$ and $y_t$, possibly depending on the observations $X$ and the position $t$.
Example (POS Tagging): a transition feature $f(y_{t-1}, y_t, X, t) = 1$ if $y_{t-1} = \text{DT}$ and $y_t = \text{NN}$, capturing the pattern that determiners are often followed by nouns.
State feature functions capture relationships between the current state $y_t$ and the observation $x_t$ at position $t$.
Example (POS Tagging): a state feature $f(y_{t-1}, y_t, X, t) = 1$ if $x_t = \text{the}$ and $y_t = \text{DT}$, capturing the fact that the word "the" is almost always a determiner.
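Putting the pieces together, the sketch below scores tag sequences with weighted transition and state features and normalizes by brute-force enumeration to obtain $P(Y \mid X)$. The features, weights, and tag set are illustrative assumptions, not values from the text:

```python
import math
from itertools import product

# Toy linear-chain CRF over a tiny tag set (illustrative only).
TAGS = ["DT", "JJ", "NN"]

# Feature functions f_k(y_prev, y, words, t) -> 0/1, paired with weights lambda_k.
def f_trans_dt_nn(y_prev, y, words, t):
    return 1 if (y_prev == "DT" and y == "NN") else 0

def f_state_the_dt(y_prev, y, words, t):
    return 1 if (words[t].lower() == "the" and y == "DT") else 0

def f_state_dog_nn(y_prev, y, words, t):
    return 1 if (words[t] == "dog" and y == "NN") else 0

FEATURES = [(2.0, f_trans_dt_nn), (3.0, f_state_the_dt), (2.5, f_state_dog_nn)]

def score(tags, words):
    """Unnormalized log-score: sum over positions of lambda_k * f_k(...)."""
    s = 0.0
    y_prev = None  # start-of-sequence marker
    for t, y in enumerate(tags):
        s += sum(lam * f(y_prev, y, words, t) for lam, f in FEATURES)
        y_prev = y
    return s

def probability(tags, words):
    """P(Y|X) = exp(score) / Z(X), with Z(X) by brute-force enumeration."""
    z = sum(math.exp(score(seq, words)) for seq in product(TAGS, repeat=len(words)))
    return math.exp(score(tags, words)) / z

words = ["the", "dog"]
best = max(product(TAGS, repeat=2), key=lambda seq: probability(seq, words))
print(best)  # ('DT', 'NN')
```

Enumerating all $|{\text{TAGS}}|^T$ sequences to compute $Z(X)$ is only feasible for toy inputs; real implementations compute it with the forward algorithm in $O(T \cdot |{\text{TAGS}}|^2)$ time.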
Apply a linear-chain CRF to tag the words in a sentence with their part-of-speech labels. The observations are words; the states are POS tags.
| Index | Word (Observation) | POS Tag (State) | Tag Description |
|---|---|---|---|
| 1 | The | DT | Determiner |
| 2 | quick | JJ | Adjective |
| 3 | brown | JJ | Adjective |
| 4 | fox | NN | Noun |
| 5 | jumps | VBZ | Verb (3rd person) |
| 6 | over | IN | Preposition |
| 7 | the | DT | Determiner |
| 8 | lazy | JJ | Adjective |
| 9 | dog | NN | Noun |
Sentence: "The quick brown fox jumps over the lazy dog". Goal: Given word sequence, infer POS tag sequence using CRF.
The tags form a linear chain $y_1 - y_2 - \cdots - y_9$, and each POS tag $y_t$ corresponds to the word $x_t$.
Given the word sequence, the CRF computes $P(Y \mid X)$ and finds the most likely tag sequence $\hat{Y} = \arg\max_Y P(Y \mid X)$ using the Viterbi algorithm. Result: DT-JJ-JJ-NN-VBZ-IN-DT-JJ-NN, as shown in the table above.
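The decoding step can be sketched with a standard Viterbi dynamic program over log-potentials. The `state_score` and `trans_score` tables below are hypothetical hand-set scores chosen to favor the tags in the table above, standing in for learned weighted feature sums:

```python
# Viterbi decoding for a linear-chain CRF.
TAGS = ["DT", "JJ", "NN", "VBZ", "IN"]

def viterbi(words, state_score, trans_score):
    """Return the tag sequence maximizing the total (log-space) CRF score."""
    n = len(words)
    best = [{} for _ in range(n)]  # best[t][y]: best score of a prefix ending in y at t
    back = [{} for _ in range(n)]  # back[t][y]: predecessor tag on that best prefix
    for y in TAGS:
        best[0][y] = state_score(words[0], y)
    for t in range(1, n):
        for y in TAGS:
            prev, s = max(
                ((yp, best[t - 1][yp] + trans_score(yp, y)) for yp in TAGS),
                key=lambda p: p[1],
            )
            best[t][y] = s + state_score(words[t], y)
            back[t][y] = prev
    # Backtrack from the best final tag.
    y = max(TAGS, key=lambda tag: best[n - 1][tag])
    path = [y]
    for t in range(n - 1, 0, -1):
        y = back[t][y]
        path.append(y)
    return path[::-1]

# Hypothetical scores standing in for learned weighted feature sums.
LEX = {"the": "DT", "quick": "JJ", "brown": "JJ", "fox": "NN",
       "jumps": "VBZ", "over": "IN", "lazy": "JJ", "dog": "NN"}

def state_score(word, tag):
    return 2.0 if LEX.get(word.lower()) == tag else 0.0

def trans_score(y_prev, y):
    return 0.5 if (y_prev, y) in {("DT", "JJ"), ("JJ", "NN"), ("DT", "NN")} else 0.0

sentence = "The quick brown fox jumps over the lazy dog".split()
print(viterbi(sentence, state_score, trans_score))
# ['DT', 'JJ', 'JJ', 'NN', 'VBZ', 'IN', 'DT', 'JJ', 'NN']
```

Unlike brute-force enumeration, Viterbi runs in $O(T \cdot |{\text{TAGS}}|^2)$ time, which is what makes exact decoding practical for real sentences and tag sets.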