MathIsimple

Hierarchical Clustering

Learn hierarchical clustering methods including AGNES (agglomerative) and DIANA (divisive). Understand dendrograms, cluster distance measures, and flexible cluster number selection.

Module 9 of 9
Intermediate to Advanced
80-100 min

What is Hierarchical Clustering?

Hierarchical clustering creates a tree-like structure (dendrogram) representing clusters at different levels of granularity. Unlike k-means, hierarchical clustering doesn't require specifying the number of clusters beforehand - you can choose the appropriate level after clustering.

Two Approaches

AGNES (Agglomerative)

Bottom-up: Start with each sample as a cluster, iteratively merge closest clusters

DIANA (Divisive)

Top-down: Start with all samples in one cluster, iteratively split clusters

Advantages

  • No need to specify k
  • Dendrogram visualization
  • Flexible cluster selection
  • Handles non-spherical clusters

Limitations

  • Higher computational cost
  • Sensitive to noise
  • Difficult to undo merges
  • O(n²) or O(n³) complexity

AGNES (Agglomerative Nesting)

AGNES is a bottom-up approach that starts with each sample as its own cluster and iteratively merges the closest clusters:

Step 1

Initialize

Start with m clusters (each sample is its own cluster).

Step 2

Find Closest Clusters

Calculate distance between all pairs of clusters and find the closest pair.

Step 3

Merge Clusters

Merge the two closest clusters into one. Update cluster distances.

Step 4

Repeat

Repeat steps 2-3 until all samples are in one cluster. Build dendrogram during process.
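The four steps above can be sketched in plain Python. This is a minimal illustration, not a library implementation: the function name `agnes` and its interface are hypothetical, single linkage is assumed for the cluster distance, and the loop stops at a target cluster count k rather than building the full dendrogram.

```python
import math

def agnes(points, k):
    """Minimal AGNES sketch with single linkage (illustrative, not a library API).

    points: list of coordinate tuples; k: number of clusters to stop at.
    Returns a list of clusters, each a list of point indices.
    """
    # Step 1: every sample starts as its own cluster
    clusters = [[i] for i in range(len(points))]

    def dist(a, b):
        return math.dist(points[a], points[b])

    # Single-linkage distance between two clusters: closest pair of points
    def cluster_dist(ci, cj):
        return min(dist(a, b) for a in ci for b in cj)

    # Steps 2-4: repeatedly find and merge the closest pair of clusters
    while len(clusters) > k:
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: cluster_dist(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return clusters

# Two well-separated groups
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(agnes(pts, 2))  # → [[0, 1, 2], [3, 4, 5]]
```

Recomputing all pairwise cluster distances each iteration is what gives the naive algorithm its high cost; efficient implementations cache distances in a priority queue instead.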

Cluster Distance Measures (Linkage)

The choice of how to measure distance between clusters (linkage criterion) significantly affects results:

Single Linkage (d_min)

Distance between clusters = minimum distance between any two points in different clusters.

d_min(Cᵢ, Cⱼ) = min dist(x, y), x ∈ Cᵢ, y ∈ Cⱼ

Tends to create elongated clusters (chaining effect).

Complete Linkage (d_max)

Distance between clusters = maximum distance between any two points in different clusters.

d_max(Cᵢ, Cⱼ) = max dist(x, y), x ∈ Cᵢ, y ∈ Cⱼ

Tends to create compact, spherical clusters.

Average Linkage (d_avg)

Distance between clusters = average distance between all pairs of points in different clusters.

d_avg(Cᵢ, Cⱼ) = (1/(|Cᵢ| |Cⱼ|)) Σ dist(x, y), x ∈ Cᵢ, y ∈ Cⱼ

Balanced approach, commonly used in practice.
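The three linkage measures can be compared directly on a small example. The two clusters below are made-up data chosen so the differences are easy to see; Euclidean distance is assumed.

```python
import math
from itertools import product

# Two hypothetical clusters in 2-D
Ci = [(0.0, 0.0), (0.0, 1.0)]
Cj = [(3.0, 0.0), (4.0, 0.0)]

# All pairwise distances between points in different clusters
pair_dists = [math.dist(x, y) for x, y in product(Ci, Cj)]

d_min = min(pair_dists)                    # single linkage
d_max = max(pair_dists)                    # complete linkage
d_avg = sum(pair_dists) / len(pair_dists)  # average linkage

print(d_min, d_max, d_avg)
```

Here single linkage reports the closest pair (distance 3), complete linkage the farthest pair (√17 ≈ 4.12), and average linkage a value in between, which is why it is often the balanced default.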

DIANA (Divisive Analysis)

DIANA is a top-down approach that starts with all samples in one cluster and iteratively splits clusters:

Algorithm:

  1. Start with all samples in one cluster
  2. Find the cluster with largest diameter (most diverse)
  3. Split it into two subclusters (e.g., using k-means with k=2)
  4. Repeat until each sample is its own cluster

Note: DIANA is computationally more expensive than AGNES: an exhaustive search over all ways to split a cluster into two is exponential in the cluster size, so practical implementations rely on heuristic splits, while AGNES runs in O(n³) time naively and O(n² log n) with efficient implementations. DIANA is therefore typically used for smaller datasets.
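The divisive steps can be sketched as follows. The function names `diameter`, `split`, and `diana` are illustrative; and instead of a full k-means with k=2, the split heuristic assumed here seeds with the two farthest-apart points and assigns each point to its nearer seed in a single pass.

```python
import math

def diameter(cluster, points):
    """Largest pairwise distance within a cluster (0 for a singleton)."""
    return max(math.dist(points[a], points[b]) for a in cluster for b in cluster)

def split(cluster, points):
    """Crude two-way split: seed with the farthest pair, assign to nearer seed."""
    a, b = max(
        ((a, b) for a in cluster for b in cluster),
        key=lambda ab: math.dist(points[ab[0]], points[ab[1]]),
    )
    left = [p for p in cluster
            if math.dist(points[p], points[a]) <= math.dist(points[p], points[b])]
    right = [p for p in cluster if p not in left]
    return left, right

def diana(points, k):
    # Step 1: all samples in one cluster
    clusters = [list(range(len(points)))]
    while len(clusters) < k:
        # Steps 2-3: split the cluster with the largest diameter
        widest = max(range(len(clusters)), key=lambda i: diameter(clusters[i], points))
        left, right = split(clusters.pop(widest), points)
        clusters += [left, right]
    return clusters

pts = [(0, 0), (0, 1), (10, 10), (10, 11)]
print(diana(pts, 2))  # → [[0, 1], [2, 3]]
```

Note how the loop runs top-down (one cluster growing toward n) while AGNES runs bottom-up; both trace out the same kind of tree, just in opposite directions.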

Dendrogram Interpretation

A dendrogram is a tree diagram showing the hierarchical relationship between clusters. The height of branches represents the distance at which clusters were merged.

How to Read a Dendrogram

  • Leaves: Individual samples at the bottom
  • Branches: Merged clusters
  • Height: Distance at which clusters merged
  • Cutting: Draw horizontal line to get clusters at that level
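Cutting at a horizontal line amounts to applying only the merges recorded below that height. A minimal sketch, assuming a hypothetical merge trace of (sample_a, sample_b, merge_height) triples and a union-find over the merges that survive the cut:

```python
# Hypothetical dendrogram trace for 4 samples, in merge order
merges = [
    (0, 1, 1.0),   # samples 0 and 1 join at height 1.0
    (2, 3, 1.5),   # samples 2 and 3 join at height 1.5
    (0, 2, 6.0),   # the two resulting clusters join at height 6.0
]

def cut(merges, n, height):
    """Flat cluster labels obtained by cutting the dendrogram at `height`."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Apply only the merges that happened below the cut line
    for a, b, h in merges:
        if h < height:
            parent[find(a)] = find(b)

    # Samples sharing a root belong to the same flat cluster
    return [find(i) for i in range(n)]

labels = cut(merges, 4, height=2.0)  # cut between heights 1.5 and 6.0
print(labels)  # → [1, 1, 3, 3]: two clusters, {0, 1} and {2, 3}
```

Cutting the same tree at a height above 6.0 would yield one cluster, and below 1.0 four singletons, which is the sense in which the cluster number is chosen after clustering.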