Learn how to evaluate clustering quality using external metrics (with reference labels) and internal metrics (without reference labels)
Since clustering has no absolute "correct" answer, we need metrics to assess clustering quality. These metrics help us:
Determine which clustering algorithm performs best on your dataset
Select optimal parameters (e.g., number of clusters k) for your clustering algorithm
Ensure clustering results meet quality standards for your application
Gain insights into data structure and cluster characteristics
Clustering performance metrics are divided into two categories based on whether they require reference labels (ground truth) or not.
External Metrics: compare clustering results against a reference model (ground-truth labels). They require labeled data for evaluation.
Use Cases: benchmarking clustering algorithms and validating results on datasets where true class labels are known
Examples: Jaccard Coefficient (JC), Rand Index (RI), Fowlkes-Mallows Index (FMI)
Internal Metrics: evaluate clustering quality without reference labels, based on the cluster structure itself (intra-cluster similarity and inter-cluster separation).
Use Cases: model selection and parameter tuning (e.g., choosing the number of clusters k) when no ground-truth labels are available
Examples: Davies-Bouldin Index (DBI), Dunn Index (DI)
External metrics are based on comparing sample pairs between the clustering result and a reference model. For a dataset D with m samples, there are m(m-1)/2 possible pairs.
For each pair of samples (xᵢ, xⱼ), we compare their cluster assignments in the clustering result C and in the reference model C*. Each pair falls into exactly one of four types: SS (same cluster in both, counted by a), SD (same in C, different in C*, counted by b), DS (different in C, same in C*, counted by c), and DD (different in both, counted by d).
Consider 4 samples with clustering result C = {C₁={x₁,x₂}, C₂={x₃,x₄}} and reference C* = {C₁*={x₁,x₃}, C₂*={x₂,x₄}}. All m(m−1)/2 = 6 pairs:

| Pair | In Clustering C | In Reference C* | Type | Count |
|---|---|---|---|---|
| (x₁, x₂) | Same | Different | SD | b |
| (x₁, x₃) | Different | Same | DS | c |
| (x₁, x₄) | Different | Different | DD | d |
| (x₂, x₃) | Different | Different | DD | d |
| (x₂, x₄) | Different | Same | DS | c |
| (x₃, x₄) | Same | Different | SD | b |

Here a=0, b=2, c=2, d=2, and a+b+c+d = m(m−1)/2 = 6. (This particular example happens to contain no SS pair, so a=0.)
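The pair-type counting described above can be sketched in a few lines of Python (a minimal illustration; `pair_counts` is a hypothetical helper name, not a library routine):

```python
from itertools import combinations

def pair_counts(labels_c, labels_ref):
    """Count the four pair types between a clustering and a reference:
    a = SS (same/same), b = SD (same/different),
    c = DS (different/same), d = DD (different/different)."""
    a = b = c = d = 0
    for i, j in combinations(range(len(labels_c)), 2):
        same_c = labels_c[i] == labels_c[j]
        same_ref = labels_ref[i] == labels_ref[j]
        if same_c and same_ref:
            a += 1
        elif same_c:
            b += 1
        elif same_ref:
            c += 1
        else:
            d += 1
    return a, b, c, d

# The 4-sample example: C = {{x1,x2},{x3,x4}}, C* = {{x1,x3},{x2,x4}}
print(pair_counts([1, 1, 2, 2], [1, 2, 1, 2]))  # (0, 2, 2, 2)
```

The two label lists encode cluster membership per sample; the counts always sum to m(m−1)/2.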
Three commonly used external metrics for evaluating clustering quality:
Jaccard Coefficient (JC)
Formula:
JC = a / (a + b + c)
Range:
[0, 1]
Interpretation:
Higher is better
Measures the proportion of pairs that are in the same cluster in both the clustering result and the reference model, relative to all pairs that are in the same cluster in at least one of the two.
Example Calculation:
If a=50, b=10, c=15, then JC = 50/(50+10+15) = 0.67
Rand Index (RI)
Formula:
RI = 2(a + d) / (m(m − 1))
Range:
[0, 1]
Interpretation:
Higher is better
Measures the proportion of pairs on which the clustering and the reference agree: pairs placed in the same cluster in both, or in different clusters in both.
Example Calculation:
If a=50, d=100, m=20, then RI = 2(50+100)/(20×19) = 0.79
Fowlkes-Mallows Index (FMI)
Formula:
FMI = √[(a / (a + b)) × (a / (a + c))]
Range:
[0, 1]
Interpretation:
Higher is better
Geometric mean of precision and recall, balancing both aspects of clustering agreement.
Example Calculation:
If a=50, b=10, c=15, then FMI = √[(50/60) × (50/65)] = 0.80
Internal metrics evaluate clustering quality based on the structure of clusters themselves, without requiring reference labels. They measure:
Compactness: how tightly packed samples are within each cluster (lower intra-cluster distance is better)
Separation: how well-separated different clusters are from each other (higher inter-cluster distance is better)
avg(C): Average intra-cluster distance
Average distance between all pairs within cluster C
diam(C): Cluster diameter
Maximum distance between any two points in cluster C
d_min(Cᵢ, Cⱼ): Minimum inter-cluster distance
Distance between nearest points in clusters Cᵢ and Cⱼ
d_cen(μᵢ, μⱼ): Center distance
Distance between cluster centers (mean vectors)
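The four distance quantities defined above can be sketched directly from their definitions. A toy example with made-up 2-D points (helper names mirror the notation; this is an illustration, not a library API):

```python
import numpy as np
from itertools import combinations

def avg(C):
    """avg(C): mean distance over all pairs of points in cluster C."""
    return float(np.mean([np.linalg.norm(C[i] - C[j])
                          for i, j in combinations(range(len(C)), 2)]))

def diam(C):
    """diam(C): maximum distance between any two points in cluster C."""
    return float(max(np.linalg.norm(C[i] - C[j])
                     for i, j in combinations(range(len(C)), 2)))

def d_min(Ci, Cj):
    """d_min: distance between the two nearest points of Ci and Cj."""
    return float(min(np.linalg.norm(p - q) for p in Ci for q in Cj))

def d_cen(Ci, Cj):
    """d_cen: distance between the cluster centers (mean vectors)."""
    return float(np.linalg.norm(Ci.mean(axis=0) - Cj.mean(axis=0)))

# Two small, well-separated 2-D clusters (made-up data)
C1 = np.array([[0.0, 0.0], [1.0, 0.0]])
C2 = np.array([[5.0, 0.0], [6.0, 0.0]])
print(avg(C1), diam(C1))  # 1.0 1.0
print(d_min(C1, C2))      # 4.0
print(d_cen(C1, C2))      # 5.0
```

Each cluster is an array of shape (n_points, n_features); the same helpers feed the DBI and Dunn Index below.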
Davies-Bouldin Index (DBI)
Formula:
DBI = (1/k) Σᵢ maxⱼ≠ᵢ [(avg(Cᵢ) + avg(Cⱼ)) / d_cen(μᵢ, μⱼ)]
Range:
[0, +∞)
Interpretation:
Lower is better
Measures average similarity between each cluster and its most similar cluster. Lower values indicate better separation.
Key Components: avg(Cᵢ) (intra-cluster compactness) and d_cen(μᵢ, μⱼ) (separation between cluster centers)
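This description can be transcribed directly into code. A sketch using mean pairwise distance as the intra-cluster scatter avg(C) (note that scikit-learn's `davies_bouldin_score` instead measures scatter as mean distance to the centroid, so its values can differ slightly):

```python
import numpy as np
from itertools import combinations

def avg(C):
    # avg(C): mean pairwise distance within a cluster
    return float(np.mean([np.linalg.norm(C[i] - C[j])
                          for i, j in combinations(range(len(C)), 2)]))

def davies_bouldin(clusters):
    """DBI: for each cluster, find its worst (largest) similarity ratio
    to another cluster, then average these maxima over all clusters."""
    k = len(clusters)
    centers = [C.mean(axis=0) for C in clusters]
    scatters = [avg(C) for C in clusters]
    total = 0.0
    for i in range(k):
        total += max((scatters[i] + scatters[j]) /
                     np.linalg.norm(centers[i] - centers[j])
                     for j in range(k) if j != i)
    return total / k

# Two compact, well-separated clusters give a low (good) DBI
clusters = [np.array([[0.0, 0.0], [1.0, 0.0]]),
            np.array([[5.0, 0.0], [6.0, 0.0]])]
print(davies_bouldin(clusters))  # 0.4
```

Each cluster has scatter 1.0 and the centers are 5.0 apart, so each pairwise ratio is (1 + 1)/5 = 0.4.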
Dunn Index (DI)
Formula:
DI = minᵢ { minⱼ≠ᵢ [ d_min(Cᵢ, Cⱼ) / maxₗ diam(Cₗ) ] }
Range:
(0, +∞)
Interpretation:
Higher is better
Measures the ratio of minimum inter-cluster distance to maximum intra-cluster diameter. Higher values indicate better separation.
Key Components: d_min(Cᵢ, Cⱼ) (minimum inter-cluster distance) and diam(C) (maximum intra-cluster diameter)
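Combining d_min and diam as described gives a short sketch of the ratio (illustrative helper names, reusing the same toy clusters as before):

```python
import numpy as np
from itertools import combinations

def diam(C):
    # diam(C): maximum pairwise distance within cluster C
    return float(max(np.linalg.norm(C[i] - C[j])
                     for i, j in combinations(range(len(C)), 2)))

def d_min(Ci, Cj):
    # d_min: distance between the nearest points of Ci and Cj
    return float(min(np.linalg.norm(p - q) for p in Ci for q in Cj))

def dunn_index(clusters):
    """DI: smallest inter-cluster gap divided by the largest
    cluster diameter; higher means better-separated clusters."""
    max_diam = max(diam(C) for C in clusters)
    return min(d_min(Ci, Cj) / max_diam
               for Ci, Cj in combinations(clusters, 2))

clusters = [np.array([[0.0, 0.0], [1.0, 0.0]]),
            np.array([[5.0, 0.0], [6.0, 0.0]])]
print(dunn_index(clusters))  # 4.0
```

Here the nearest gap between clusters is 4.0 and the largest diameter is 1.0, so DI = 4.0: the clusters are far apart relative to their size.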