# How many clusters? Six Clusters, Two Clusters, Four Clusters

• Slides: 84

Limitations of K-means

K-means has problems when clusters differ in:

• Size
• Density
• Shape (non-globular clusters)

Limitations of K-means: Differing Sizes. [Figure: original points vs. K-means (3 clusters)]

Limitations of K-means: Differing Density. [Figure: original points vs. K-means (3 clusters)]

Limitations of K-means: Non-globular Shapes. [Figure: original points vs. K-means (2 clusters)]

Overcoming K-means Limitations

• One solution is to use many clusters. K-means then finds parts of the natural clusters, and these parts need to be put together afterwards.

[Figure: original points vs. K-means clusters]

Overcoming K-means Limitations. [Figures, two slides: original points vs. K-means clusters]
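The "many clusters, then put together" idea can be sketched as follows. This is a minimal illustration, not the method used to produce the slides: the deterministic initialisation, the centroid-distance merge rule, the `threshold` value, and the toy data are all assumptions made for the example.

```python
import math

def kmeans(points, k, iters=100):
    """Plain Lloyd's algorithm with evenly spaced initial centroids (an assumption)."""
    n = len(points)
    centroids = [points[i * n // k] for i in range(k)]
    labels = [0] * n
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, new_labels) if l == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids

def merge_close(centroids, threshold):
    """Join sub-clusters whose centroids lie within `threshold` (hypothetical rule)."""
    parent = list(range(len(centroids)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(centroids)):
        for j in range(i + 1, len(centroids)):
            if math.dist(centroids[i], centroids[j]) <= threshold:
                parent[find(i)] = find(j)
    return [find(i) for i in range(len(centroids))]

# Toy data: two elongated rows of points, ten units apart.
points = [(float(x), 0.0) for x in range(10)] + [(float(x), 10.0) for x in range(10)]
sub_labels, centroids = kmeans(points, k=4)        # deliberately more clusters than needed
group_of = merge_close(centroids, threshold=6.0)   # put the parts back together
final_labels = [group_of[l] for l in sub_labels]
```

On this toy data each row is first split into two sub-clusters, and the merge step joins the two halves of each row again. Choosing the merge rule is the hard part in practice; centroid distance is only the simplest option.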

Hierarchical Clustering: Comparison. [Figure: four panels showing the clusterings produced by MIN, MAX, Group Average, and Ward's Method]
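The four schemes compared above differ only in how they measure the proximity of two clusters. A minimal sketch, assuming Euclidean distance and clusters given as lists of coordinate tuples (the function names are illustrative, not from the slides):

```python
import math
from itertools import product

def min_link(a, b):
    """MIN (single link): distance between the closest pair of points."""
    return min(math.dist(p, q) for p, q in product(a, b))

def max_link(a, b):
    """MAX (complete link): distance between the farthest pair of points."""
    return max(math.dist(p, q) for p, q in product(a, b))

def group_average(a, b):
    """Group average: mean distance over all cross-cluster pairs."""
    return sum(math.dist(p, q) for p, q in product(a, b)) / (len(a) * len(b))

def centroid(cluster):
    return tuple(sum(x) / len(cluster) for x in zip(*cluster))

def ward_increase(a, b):
    """Ward's method: increase in SSE that merging a and b would cause."""
    weight = len(a) * len(b) / (len(a) + len(b))
    return weight * math.dist(centroid(a), centroid(b)) ** 2

# Two small one-dimensional clusters.
a = [(0.0,), (1.0,)]
b = [(3.0,), (5.0,)]
d_min = min_link(a, b)        # 2.0  (1 to 3)
d_max = max_link(a, b)        # 5.0  (0 to 5)
d_avg = group_average(a, b)   # 3.5  ((3 + 5 + 2 + 4) / 4)
d_ward = ward_increase(a, b)  # 12.25
```

At each step, agglomerative clustering merges the pair of clusters with the smallest value of the chosen measure.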

[Figure: original points and their point types (core, border, and noise) for Eps = 10, MinPts = 4]
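The three point types can be computed directly from Eps and MinPts. The sketch below assumes one common convention: a point is a core point if at least `min_pts` points, counting itself, lie within `eps`. Other texts use a strict inequality or exclude the point itself, so treat the threshold as an assumption.

```python
import math

def classify_points(points, eps, min_pts):
    """Label each point 'core', 'border', or 'noise'.

    Assumption: the neighbourhood includes the point itself and a core point
    needs at least `min_pts` neighbours.
    """
    neighbours = [
        [j for j, q in enumerate(points) if math.dist(p, q) <= eps]
        for p in points
    ]
    core = {i for i, nb in enumerate(neighbours) if len(nb) >= min_pts}
    kinds = []
    for i, nb in enumerate(neighbours):
        if i in core:
            kinds.append("core")
        elif any(j in core for j in nb):
            kinds.append("border")   # not dense itself, but near a core point
        else:
            kinds.append("noise")    # neither dense nor near a core point
    return kinds

# Four corners of a unit square, one nearby point, one far-away point.
points = [(0, 0), (1, 0), (0, 1), (1, 1), (2.4, 0), (10, 10)]
kinds = classify_points(points, eps=1.5, min_pts=4)
# kinds == ['core', 'core', 'core', 'core', 'border', 'noise']
```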

DBSCAN does not work well with:

• Varying densities
• High-dimensional data

[Figure: original points, and DBSCAN results for (MinPts=4, Eps=9.75) and (MinPts=4, Eps=9.92)]

Measuring Cluster Validity Via Correlation

• Correlation of incidence and proximity matrices for the K-means clusterings of the following two data sets.

[Figure: two data sets, Corr = -0.9235 and Corr = -0.5810]
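A minimal sketch of this measure, assuming Euclidean distance as the proximity and Pearson correlation over the n(n-1)/2 distinct point pairs. Because the proximity here is a distance, a good clustering gives a strongly negative value, in line with the figures above. The toy data and labelings are made up for illustration.

```python
import math
from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def incidence_proximity_correlation(points, labels):
    pairs = list(combinations(range(len(points)), 2))
    # Incidence matrix entry: 1 if both points share a cluster, else 0.
    incidence = [1.0 if labels[i] == labels[j] else 0.0 for i, j in pairs]
    # Proximity matrix entry: Euclidean distance (an assumption).
    proximity = [math.dist(points[i], points[j]) for i, j in pairs]
    return pearson(incidence, proximity)

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good = incidence_proximity_correlation(points, [0, 0, 0, 1, 1, 1])  # matches the groups
bad = incidence_proximity_correlation(points, [0, 1, 0, 1, 0, 1])   # ignores the groups
```

Only the upper triangle is used because both matrices are symmetric. The sketch does not handle the degenerate case where all points share one cluster, which would make the incidence values constant.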

Using Similarity Matrix for Cluster Validation

• Order the similarity matrix with respect to cluster labels and inspect visually.
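One way to build the reordered matrix is sketched below. The similarity definition `1 - d / d_max` and the toy data are assumptions; any similarity measure works as long as rows and columns are permuted together. With well-separated clusters, the reordered matrix shows bright blocks along the diagonal.

```python
import math

def ordered_similarity_matrix(points, labels):
    """Return (order, matrix) with rows and columns sorted by cluster label."""
    n = len(points)
    order = sorted(range(n), key=lambda i: labels[i])  # stable sort keeps ties in input order
    d_max = max(math.dist(p, q) for p in points for q in points)
    if d_max == 0:
        d_max = 1.0  # all points identical; avoid division by zero

    def sim(i, j):
        # Assumed similarity: 1 for identical points, 0 for the farthest pair.
        return 1.0 - math.dist(points[i], points[j]) / d_max

    matrix = [[sim(i, j) for j in order] for i in order]
    return order, matrix

# Points from two groups, deliberately interleaved in the input.
points = [(0, 0), (10, 10), (0, 1), (10, 11)]
labels = [0, 1, 0, 1]
order, matrix = ordered_similarity_matrix(points, labels)
# order == [0, 2, 1, 3]: the two 2x2 diagonal blocks hold the within-cluster similarities.
```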

Using Similarity Matrix for Cluster Validation

• Clusters in random data are not so crisp.

[Figures: similarity matrices for random data clustered with DBSCAN, K-means, and Complete Link]

Using Similarity Matrix for Cluster Validation. [Figure: DBSCAN]

Internal Measures: SSE

• SSE is good for comparing two clusterings or two clusters (average SSE).
• It can also be used to estimate the number of clusters.
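A minimal sketch of the SSE computation, assuming squared Euclidean distance to each cluster's centroid. The labelings for different numbers of clusters are written by hand here for illustration; in practice they would come from running a clustering algorithm for each k.

```python
from collections import defaultdict

def cluster_sse(points, labels):
    """Return {label: SSE of that cluster}, measured around the cluster centroid."""
    clusters = defaultdict(list)
    for p, l in zip(points, labels):
        clusters[l].append(p)
    result = {}
    for l, members in clusters.items():
        centroid = [sum(x) / len(members) for x in zip(*members)]
        result[l] = sum(
            sum((a - c) ** 2 for a, c in zip(p, centroid)) for p in members
        )
    return result

def total_sse(points, labels):
    return sum(cluster_sse(points, labels).values())

points = [(0, 0), (0, 2), (10, 0), (10, 2)]
sse_by_k = {
    1: total_sse(points, [0, 0, 0, 0]),  # 104.0
    2: total_sse(points, [0, 0, 1, 1]),  # 4.0
    3: total_sse(points, [0, 0, 1, 2]),  # 2.0
    4: total_sse(points, [0, 1, 2, 3]),  # 0.0
}
```

SSE always decreases as k grows, so the value alone does not pick k. The sharp drop from k = 1 to k = 2, followed by small gains, is the "knee" that suggests two clusters for this data. Dividing each entry of `cluster_sse` by the cluster size gives the average SSE for comparing individual clusters.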