
HIERARCHICAL CLUSTERING
Presented by Asst. Prof. Mahajan Uchita Vidyadhar
Department of Statistics, KCE’s Society PGCSTR, Jalgaon.

Contents
• Cluster Analysis
• Types of Clustering
• Good Clustering
• Distance or Similarity Measures
• Distance or Similarity Matrix
• Types of Clustering Algorithms
• Hierarchical Clustering
• Types of Hierarchical Clustering
• Conclusion

What Is Cluster Analysis?
• Cluster analysis, or simply clustering, is the process of partitioning a set of data objects (or observations) into subsets.
• Each subset is a cluster, such that objects in a cluster are similar to one another, yet dissimilar to objects in other clusters. The set of clusters resulting from a cluster analysis can be referred to as a clustering.

Types of Clustering
Roughly, clustering can be divided into two subgroups:
• Hard clustering
  • In hard clustering, each data point either belongs to a cluster completely or does not belong to it at all.
  • Strict partitioning clustering: each object belongs to exactly one cluster.
• Soft clustering
  • Instead of assigning each data point to exactly one cluster, a probability (degree of membership) of the point belonging to each cluster is assigned.
  • Each object belongs to each cluster to a certain degree.
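To make the hard/soft distinction concrete, here is a minimal Python sketch, assuming two fixed cluster centres. The hard assignment gives the point entirely to its nearest centre. For the soft assignment we use the fuzzy c-means membership formula with fuzzifier `m = 2`; that is one common choice, not something the slide prescribes. The function names are hypothetical.

```python
from math import dist

def hard_assign(point, centers):
    """Hard clustering: return the index of the single nearest center."""
    return min(range(len(centers)), key=lambda j: dist(point, centers[j]))

def soft_assign(point, centers, m=2.0):
    """Soft clustering: fuzzy c-means memberships, one per center, summing to 1."""
    d = [dist(point, c) for c in centers]
    if 0.0 in d:
        # The point coincides with a center: give that center full membership.
        j0 = d.index(0.0)
        return [1.0 if j == j0 else 0.0 for j in range(len(d))]
    exponent = 2.0 / (m - 1.0)
    # u_j = 1 / sum_k (d_j / d_k)^(2 / (m - 1))
    return [1.0 / sum((dj / dk) ** exponent for dk in d) for dj in d]

centers = [(0.0, 0.0), (4.0, 0.0)]
point = (1.0, 0.0)
print(hard_assign(point, centers))                          # → 0
print([round(u, 3) for u in soft_assign(point, centers)])   # → [0.9, 0.1]
```

A hard method reports only the label 0; the soft method says the point belongs mostly, but not entirely, to the first cluster.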

What Is Good Clustering?
• A good clustering method will produce high-quality clusters with:
  • high intra-class similarity (objects in the same cluster are similar), and
  • low inter-class similarity (objects in different clusters are dissimilar).
• The quality of a clustering result depends on both the similarity measure used by the method and its implementation.
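As a rough numerical check of the two criteria, the sketch below takes Euclidean distance as the dissimilarity measure and compares the average within-cluster distance (should be small) with the average between-cluster distance (should be large). The toy points and function names are hypothetical; established indices such as the silhouette coefficient refine the same idea.

```python
from itertools import combinations
from math import dist

def mean_intra(clusters):
    """Average distance between points in the same cluster (lower = more cohesive).
    Assumes at least one cluster has two or more points."""
    d = [dist(p, q) for c in clusters for p, q in combinations(c, 2)]
    return sum(d) / len(d)

def mean_inter(clusters):
    """Average distance between points in different clusters (higher = better separated)."""
    d = [dist(p, q) for a, b in combinations(clusters, 2) for p in a for q in b]
    return sum(d) / len(d)

# The same four points, grouped two different ways (hypothetical data)
good = [[(0, 0), (0, 1)], [(5, 5), (5, 6)]]
bad = [[(0, 0), (5, 5)], [(0, 1), (5, 6)]]

for name, clustering in [("good", good), ("bad", bad)]:
    print(name, round(mean_intra(clustering), 2), round(mean_inter(clustering), 2))
```

The first grouping has the smaller within-cluster average and the larger between-cluster average, which is what "high intra-class similarity, low inter-class similarity" means in distance terms.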

Distance Or Similarity Measures
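Common distance measures for numeric data include the Euclidean, Manhattan (city-block) and Minkowski distances; the Minkowski distance of order p generalizes the other two (p = 2 gives Euclidean, p = 1 gives Manhattan). A minimal Python sketch; these are standard measures, not necessarily the exact set intended for this slide.

```python
def minkowski(x, y, p):
    """Minkowski distance of order p between two equal-length numeric vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

def euclidean(x, y):
    """Straight-line distance (Minkowski with p = 2)."""
    return minkowski(x, y, 2)

def manhattan(x, y):
    """Sum of absolute coordinate differences (Minkowski with p = 1)."""
    return minkowski(x, y, 1)

x, y = (0, 0), (3, 4)
print(euclidean(x, y))             # → 5.0
print(manhattan(x, y))             # → 7.0
print(round(minkowski(x, y, 3), 3))
```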

Distance Or Similarity Matrix
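For n objects, the distance matrix is the n × n table whose (i, j) entry is the distance between objects i and j. It is symmetric, has zeros on the diagonal, and is the usual starting point for the agglomerative procedures. A minimal Python sketch, assuming Euclidean distance and three hypothetical points:

```python
from math import dist

def distance_matrix(points):
    """Return the n x n matrix of pairwise Euclidean distances."""
    n = len(points)
    return [[dist(points[i], points[j]) for j in range(n)] for i in range(n)]

points = [(0, 0), (3, 4), (6, 8)]
D = distance_matrix(points)
for row in D:
    print(row)
# → [0.0, 5.0, 10.0]
#   [5.0, 0.0, 5.0]
#   [10.0, 5.0, 0.0]
```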

Types of Clustering Algorithms
• Hierarchical
  • Agglomerative
  • Divisive
• Partitioning (non-hierarchical)
• Density based
• Model based

Hierarchical Clustering
• In hierarchical clustering, the goal is to produce a hierarchical series of nested clusters, ranging from clusters of individual points at the bottom to an all-inclusive cluster at the top. A diagram called a dendrogram graphically represents this hierarchy.
• One of the attractions of hierarchical techniques is that they correspond to taxonomies that are very common in the biological sciences, e.g., kingdom, phylum, genus, species.

Dendrogram
• A dendrogram is a tree diagram.
• In hierarchical clustering, it illustrates the arrangement of the clusters produced by the analysis; the height at which two branches join shows the distance at which those clusters were merged (or split).

Types of Hierarchical Clustering
A hierarchical method can be classified as either:
• Agglomerative (bottom-up: repeatedly merge clusters)
• Divisive (top-down: repeatedly split clusters)

Algorithmic Steps for Divisive Hierarchical Clustering
1. Start with one cluster that contains all samples.
2. Calculate the diameter of each cluster (the maximum distance between any two samples in the cluster) and choose the cluster C with the largest diameter to split.
3. Find the most dissimilar sample X in cluster C. Let X leave the original cluster C to form a new independent cluster N.
4. Assign all remaining members of cluster C to MC, then repeat step 5 until the members of clusters C and N no longer change.
5. Calculate the similarity of each member of MC to clusters C and N, and move the member of MC with the highest similarity to the cluster (C or N) it is most similar to. Update the members of C and N.
6. Repeat steps 2, 3, 4 and 5 until the number of clusters equals the number of samples, or a number specified by the user.
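The divisive steps can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the presenter's implementation: dissimilarity is Euclidean distance, the "most dissimilar sample" is taken to be the one with the largest average distance to the rest of C, and a member of MC moves to N only while it is, on average, closer to N than to the remaining members of C (the splinter-group rule used by DIANA). All function names are hypothetical.

```python
from itertools import combinations
from math import dist

def diameter(cluster, points):
    """Step 2: the largest distance between two members of a cluster."""
    return max((dist(points[i], points[j]) for i, j in combinations(cluster, 2)), default=0.0)

def avg_dist(i, group, points):
    """Average distance from point i to the other members of group."""
    others = [j for j in group if j != i]
    return sum(dist(points[i], points[j]) for j in others) / len(others)

def split(cluster, points):
    """Split one cluster into (C, N) with a DIANA-style splinter group."""
    c = list(cluster)
    # Step 3: the most dissimilar sample leaves C and starts the new cluster N.
    x = max(c, key=lambda i: avg_dist(i, c, points))
    c.remove(x)
    n = [x]
    # Steps 4-5: repeatedly move the member that is most clearly closer to N than to C.
    while len(c) > 1:
        gain = {i: avg_dist(i, c, points) - avg_dist(i, n, points) for i in c}
        best = max(gain, key=gain.get)
        if gain[best] <= 0:
            break  # no member prefers N, so C and N no longer change
        c.remove(best)
        n.append(best)
    return c, n

def divisive(points, k):
    """Split top-down until there are k clusters (k = len(points) gives singletons)."""
    clusters = [list(range(len(points)))]  # Step 1: one cluster holding every sample
    while len(clusters) < k:
        candidates = [cl for cl in clusters if len(cl) > 1]
        if not candidates:
            break
        # Step 2: split the cluster with the largest diameter.
        target = max(candidates, key=lambda cl: diameter(cl, points))
        clusters.remove(target)
        clusters.extend(split(target, points))  # Step 6: repeat on the new set of clusters
    return clusters

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(divisive(points, 2))  # separates indices 0-2 from indices 3-5
```

Clusters are returned as lists of indices into `points`; passing `k = len(points)` continues splitting until every sample is on its own.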

Agglomerative Hierarchical Clustering
• Start with the points as individual clusters and, at each step, merge the closest pair of clusters.
• Agglomerative hierarchical clustering methods can be further classified as:
  1. Linkage methods
     • Single linkage
     • Complete linkage
     • Average linkage
  2. Variance method, or Ward’s method

Algorithmic Steps for Agglomerative Hierarchical Clustering

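A common textbook form of the agglomerative procedure is: start with n singleton clusters, find the pair of clusters that is closest under the chosen linkage, merge them, record the merge distance (the height at which the two branches join in the dendrogram), and repeat until the desired number of clusters remains. The Python sketch below follows that outline under stated assumptions: Euclidean distance, a `method` argument limited to single, complete and average linkage, and hypothetical function names. It recomputes every cluster distance at each step, so it is roughly cubic in the number of points and meant only for small examples.

```python
from itertools import combinations
from math import dist

def linkage_distance(a, b, points, method):
    """Distance between clusters a and b (lists of point indices) under a linkage rule."""
    d = [dist(points[i], points[j]) for i in a for j in b]
    if method == "single":
        return min(d)
    if method == "complete":
        return max(d)
    if method == "average":
        return sum(d) / len(d)
    raise ValueError(f"unknown linkage: {method}")

def agglomerative(points, k=1, method="single"):
    """Merge bottom-up until k clusters remain; also return the merge history."""
    clusters = [[i] for i in range(len(points))]  # every point starts as its own cluster
    history = []  # (cluster_a, cluster_b, merge distance) for each merge
    while len(clusters) > k:
        # Find the closest pair of clusters under the chosen linkage.
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage_distance(clusters[p[0]], clusters[p[1]], points, method))
        d = linkage_distance(clusters[a], clusters[b], points, method)
        history.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        del clusters[b]       # b > a, so deleting b leaves index a valid
        clusters[a] = merged
    return clusters, history

points = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
clusters, history = agglomerative(points, k=2, method="single")
print(clusters)                                # → [[0, 1, 2, 3], [4]]
print([round(h, 2) for _, _, h in history])    # → [1.0, 1.0, 6.4]
```

Running with `k=1` produces the full merge history, which is the information a dendrogram draws. Ward’s method uses a different merging criterion and is not covered by this function.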

Single Linkage
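Single linkage (nearest neighbour) defines the distance between two clusters as the smallest distance between a member of one and a member of the other. A small Python sketch, assuming Euclidean distance and two hypothetical toy clusters:

```python
from math import dist

def single_linkage(cluster_a, cluster_b):
    """Distance between the closest pair of points, one taken from each cluster."""
    return min(dist(p, q) for p in cluster_a for q in cluster_b)

U = [(0, 0), (0, 3)]
W = [(4, 0), (4, 3)]
print(single_linkage(U, W))  # → 4.0
```

Because only the closest pair matters, after clusters U and V are merged the distance to any other cluster W is simply min(d(U, W), d(V, W)). The same property explains why single linkage tends to produce long, chained clusters.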

Complete Linkage
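Complete linkage (farthest neighbour) defines the distance between two clusters as the largest distance between a member of one and a member of the other. A small Python sketch, assuming Euclidean distance and two hypothetical toy clusters:

```python
from math import dist

def complete_linkage(cluster_a, cluster_b):
    """Distance between the farthest pair of points, one taken from each cluster."""
    return max(dist(p, q) for p in cluster_a for q in cluster_b)

U = [(0, 0), (0, 3)]
W = [(4, 0), (4, 3)]
print(complete_linkage(U, W))  # → 5.0 (the diagonal pairs are 5 apart)
```

After clusters U and V are merged, the distance to another cluster W becomes max(d(U, W), d(V, W)), so a merge is cheap only when every member is close to every other; this favours compact clusters.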

Average Linkage
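Average linkage defines the distance between two clusters as the mean of all distances between a member of one and a member of the other. A small Python sketch, assuming Euclidean distance and two hypothetical toy clusters:

```python
from math import dist

def average_linkage(cluster_a, cluster_b):
    """Mean distance over all pairs of points, one taken from each cluster."""
    d = [dist(p, q) for p in cluster_a for q in cluster_b]
    return sum(d) / len(d)

U = [(0, 0), (0, 3)]
W = [(4, 0), (4, 3)]
print(average_linkage(U, W))  # → 4.5, i.e. (4 + 5 + 5 + 4) / 4
```

After clusters U and V (with n_U and n_V members) are merged, the new distance to W is the size-weighted average (n_U · d(U, W) + n_V · d(V, W)) / (n_U + n_V), so it can be updated without revisiting the individual points.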

Ward’s Method
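Ward’s method merges, at each step, the pair of clusters whose union gives the smallest increase in the total within-cluster error sum of squares (ESS). A minimal Python sketch, assuming Euclidean geometry and two hypothetical toy clusters; it also checks the direct computation against the closed form n_A · n_B / (n_A + n_B) · ‖c_A - c_B‖², where c_A and c_B are the cluster centroids.

```python
def centroid(cluster):
    """Coordinate-wise mean of the points in a cluster."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))

def ess(cluster):
    """Error sum of squares: squared distances of the points to their centroid."""
    c = centroid(cluster)
    return sum(sum((x - m) ** 2 for x, m in zip(p, c)) for p in cluster)

def ward_cost(a, b):
    """Increase in total ESS caused by merging clusters a and b."""
    return ess(a + b) - ess(a) - ess(b)

def ward_cost_closed_form(a, b):
    """Same increase, computed from cluster sizes and centroids only."""
    ca, cb = centroid(a), centroid(b)
    squared_gap = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * squared_gap

A = [(0, 0), (0, 2)]
B = [(4, 0), (4, 2)]
print(ward_cost(A, B))              # → 16.0
print(ward_cost_closed_form(A, B))  # → 16.0
```

In a full agglomerative run, Ward’s method would evaluate this cost for every pair of current clusters and merge the pair with the smallest value.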

Example
• votes.repub data: Votes for Republican Candidate in Presidential Elections. A data frame with the percentages of votes given to the Republican candidate in presidential elections from 1856 to 1976. Rows represent the 50 states, and columns the 31 elections.
• votes.repub example.R

Conclusion


THANK YOU