Learn about prototype-based clustering methods that represent clusters using prototype vectors (cluster centers or representative samples)
Prototype clustering assumes that the clustering structure can be characterized by a set of prototypes (also called cluster centers or representative vectors). Each cluster is represented by one or more prototype vectors that capture the cluster's characteristics.
Once the prototypes are fixed, the partition follows directly: each sample belongs to the cluster whose prototype is nearest under the chosen distance measure.
Sample xⱼ → Cluster Cᵢ if dist(xⱼ, pᵢ) = min_k dist(xⱼ, pₖ)
where pᵢ is the prototype vector for cluster Cᵢ
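As a minimal sketch of the assignment rule above, the following Python snippet picks, for each sample, the index of the nearest prototype. It assumes Euclidean distance (via `math.dist`) as the distance measure; the function name `nearest_prototype` and the toy data are illustrative only and do not come from the original text.

```python
import math

def nearest_prototype(x, prototypes):
    """Return the index i that minimizes dist(x, prototypes[i]).

    Assumption: dist is Euclidean distance. Ties go to the lowest index.
    """
    return min(range(len(prototypes)), key=lambda i: math.dist(x, prototypes[i]))

# Hypothetical toy data: two prototypes and three samples in 2-D.
prototypes = [(0.0, 0.0), (5.0, 5.0)]
samples = [(0.5, 1.0), (4.0, 6.0), (2.0, 2.0)]

assignments = [nearest_prototype(x, prototypes) for x in samples]
print(assignments)  # -> [0, 1, 0]
```

Any other distance function could be swapped in for `math.dist` without changing the rule itself.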
All prototype clustering algorithms follow a similar iterative workflow:
1. Start with k initial prototype vectors. Common methods are random selection from the data, k-means++ initialization, or random initialization in feature space.
2. For each sample, calculate the distance to every prototype and assign the sample to the nearest prototype's cluster. This creates the current cluster partition.
3. Based on the current cluster assignments, update the prototype vectors. Different algorithms use different update rules: mean calculation (k-means), learning-rate-based updates that move prototypes toward or away from samples (LVQ), or maximum likelihood estimation (GMM).
4. If the prototypes have not changed (or have changed by less than a threshold) and the cluster assignments are stable, stop. Otherwise, return to step 2.
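The four steps above can be sketched as a short Python loop. This is one possible instance of the workflow, not a definitive implementation: it assumes random selection from the data for initialization, Euclidean distance, and the mean update rule (k-means style). The function name `prototype_clustering`, its parameters (`tol`, `max_iter`, `seed`), and the toy data are illustrative assumptions.

```python
import math
import random

def prototype_clustering(samples, k, tol=1e-6, max_iter=100, seed=0):
    """k-means-style sketch of the iterative workflow (mean update rule)."""
    rng = random.Random(seed)
    # Step 1: initialize prototypes by picking k distinct samples at random.
    prototypes = [tuple(p) for p in rng.sample(samples, k)]
    assignments = None
    for _ in range(max_iter):
        # Step 2: assign each sample to its nearest prototype (Euclidean).
        new_assignments = [
            min(range(k), key=lambda i: math.dist(x, prototypes[i]))
            for x in samples
        ]
        # Step 3: update each prototype to the mean of its assigned samples.
        new_prototypes = []
        for i in range(k):
            members = [x for x, a in zip(samples, new_assignments) if a == i]
            if members:
                mean = tuple(sum(c) / len(members) for c in zip(*members))
                new_prototypes.append(mean)
            else:
                # Assumption: an empty cluster keeps its previous prototype.
                new_prototypes.append(prototypes[i])
        # Step 4: stop when prototypes barely move and assignments are stable.
        shift = max(math.dist(p, q) for p, q in zip(prototypes, new_prototypes))
        stable = new_assignments == assignments
        prototypes, assignments = new_prototypes, new_assignments
        if shift < tol and stable:
            break
    return prototypes, assignments

# Hypothetical toy data: two well-separated groups of three points each.
samples = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
           (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]
prototypes, assignments = prototype_clustering(samples, k=2)
print(prototypes, assignments)
```

On data this well separated, the first three and last three samples should end up in different clusters, although which cluster gets label 0 depends on the random initialization. Other algorithms would keep the same loop and replace only the update in step 3.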
There are three main prototype clustering algorithms, each with different characteristics and use cases: