
Interactive Exploration of Hierarchical Clustering Results
HCE (Hierarchical Clustering Explorer)
Jinwook Seo and Ben Shneiderman
Human-Computer Interaction Lab
Department of Computer Science
University of Maryland, College Park
jinwook@cs.umd.edu

Cluster Analysis of Microarray Experiment Data
• About 100 to 20,000 gene samples
• Under 2 to 80 experimental conditions
• Identify similar gene samples
  – starting point for studying unknown genes
• Identify similar experimental conditions
  – develop a better treatment for a special group
• Clustering algorithms
  – Hierarchical, K-means, etc.

Dendrogram (figure; scale labels: -3.64 and 4.87)

Interactive Exploration Techniques
• Dynamic Query Controls
  – Number of clusters, Level of detail
• Coordinated Display
  – Bi-directional interaction with 2D scattergrams
• Overview of the entire dataset
  – Coupled with detail view
• Visual Comparison of Different Results
  – Different results by different methods

Demonstration
• 99 Yeast genes
• 7 variables (time points)
• Download HCE at
  – www.cs.umd.edu/hcil/multi-cluster
• More demonstration
  – A. V. Williams Bldg, 3174
  – 3:30-5:00 pm, May 31

Dynamic Query Controls
Filter out less similar genes
§ By pulling down the minimum similarity bar
§ Show only the clusters that satisfy the minimum similarity threshold
§ Help users determine the proper number of clusters
§ Easy to find the most similar genes
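The effect of the minimum similarity bar can be sketched in a few lines of Python. This is only an illustration, not HCE's implementation: the merge-list format, the function name `clusters_at_threshold`, the gene names, and the similarity values are all assumptions made for the example. It also assumes that genes left unmerged at the chosen threshold are simply hidden.

```python
def clusters_at_threshold(items, merges, min_similarity):
    """Return the clusters formed by merges whose similarity >= min_similarity.

    items:  list of leaf (gene) names
    merges: list of (item_a, item_b, similarity); each entry joins the clusters
            that currently contain item_a and item_b (hypothetical format)
    """
    parent = {x: x for x in items}

    def find(x):
        # Follow parent links to the cluster representative (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, sim in merges:
        if sim >= min_similarity:
            parent[find(a)] = find(b)

    groups = {}
    for x in items:
        groups.setdefault(find(x), []).append(x)
    # Assumption: genes that joined no cluster at this threshold are filtered out.
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


# Made-up data for illustration only.
genes = ["g1", "g2", "g3", "g4", "g5"]
merges = [("g1", "g2", 0.95), ("g3", "g4", 0.90), ("g1", "g3", 0.60), ("g1", "g5", 0.20)]

high = clusters_at_threshold(genes, merges, 0.85)  # bar near the top
low = clusters_at_threshold(genes, merges, 0.50)   # bar pulled down
```

With the bar at 0.85 the sketch keeps two tight clusters; pulling it down to 0.50 fuses them into one, which is how moving the bar changes the number of clusters.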

Dynamic Query Controls
Adjust level of detail
§ By dragging up the detail cutoff bar
§ Show the representative pattern of each cluster
§ Hide detail below the bar
§ Easy to view global structure
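The slides do not say how the representative pattern is computed. As a minimal sketch, assuming it is the column-wise mean of the expression profiles in a cluster (an assumption, as are the function name and the sample values), it could look like this:

```python
def representative_pattern(profiles):
    """Average the expression profiles of one cluster, condition by condition.

    profiles: list of equal-length lists, one per gene in the cluster.
    Assumption: the representative pattern is the plain mean of the members.
    """
    n = len(profiles)
    return [sum(column) / n for column in zip(*profiles)]


# Two made-up gene profiles measured under three conditions.
cluster = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]
pattern = representative_pattern(cluster)
```

Everything below the detail cutoff bar would then be drawn as this single averaged row instead of one row per gene.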

Coordinated Displays
• Two experimental conditions for the x and y axes
• Two-dimensional scattergrams
  – limited to two variables at a time
  – readily understood by most users
  – users can concentrate on the data without distraction
• Bi-directional interactions between displays

Overview in a limited screen space
• What if there are more than 1,600 items to display?
• Compressed Overview: averaging adjacent leaves
• Easy to locate interesting spots
Melanoma Microarray Experiment (3614 x 38)
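Averaging adjacent leaves to fit a fixed number of screen slots can be sketched as below. The function name `compress_overview`, the equal-sized grouping, and the sample data are assumptions for illustration; the slides do not describe how HCE chooses the groups.

```python
def compress_overview(columns, max_slots):
    """Average runs of adjacent leaf columns so that at most max_slots remain.

    columns:   list of leaves in dendrogram order, each a list of values
    max_slots: number of display slots available (e.g. the screen width)
    """
    n = len(columns)
    if n <= max_slots:
        return [list(c) for c in columns]  # everything fits, nothing to compress
    group = -(-n // max_slots)  # ceiling division: leaves per display slot
    compressed = []
    for start in range(0, n, group):
        chunk = columns[start:start + group]
        compressed.append([sum(vals) / len(chunk) for vals in zip(*chunk)])
    return compressed


# Five made-up leaves with two values each, squeezed into three slots.
leaves = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0], [7.0, 6.0], [9.0, 8.0]]
overview = compress_overview(leaves, 3)
```

The trade-off is the one the slide names: the whole dataset stays visible at once, at the cost of blurring neighbouring items together.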

Overview in a limited screen space
• What if there are more than 1,600 items to display?
• Alternative Overview: changing bar width (2 to 10)
• Show more detail, but need scrolling

Cluster Comparison
• There is no perfect clustering algorithm!
• Different Distance Measures
• Different Linkage Methods
• Two dendrograms at the same time
  – Show the mapping of each gene between the two dendrograms
  – Busy screen with crossing lines
  – Easy to see anomalies
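The gene-to-gene mapping between two dendrograms, and how busy the connecting lines get, can be sketched as follows. The function names and the second leaf order are hypothetical; the first leaf order (A, D, C, B) is the single-linkage result from the worked example in this deck.

```python
def leaf_mapping(order_a, order_b):
    """Pair each gene's position in dendrogram A with its position in dendrogram B."""
    position_in_b = {gene: i for i, gene in enumerate(order_b)}
    return [(i, position_in_b[gene]) for i, gene in enumerate(order_a)]


def count_crossings(mapping):
    """Count pairs of connecting lines that cross between the two leaf orders."""
    return sum(
        1
        for i in range(len(mapping))
        for j in range(i + 1, len(mapping))
        if mapping[i][1] > mapping[j][1]
    )


order_single = ["A", "D", "C", "B"]  # leaf order from the single-linkage example
order_other = ["A", "D", "B", "C"]   # hypothetical result of another method
lines = leaf_mapping(order_single, order_other)
crossings = count_crossings(lines)
```

A gene whose line crosses many others sits in a very different place in the two results, which is one way to read the anomalies mentioned on the slide.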

Cluster Comparison (figure)

Conclusion
• Integrate four features to interactively explore clustering results to gain a stronger understanding of the significance of the clusters
  – Overview, Dynamic Query, Coordination, Cluster Comparison
• Powerful algorithms + Interactive tools
• Bioinformatics Visualization
www.cs.umd.edu/hcil/multi-cluster
July 2002 IEEE Computer Special Issue on Bioinformatics

Hierarchical Clustering
Initial Data Items: A, B, C, D
Distance Matrix:

Dist   A    B    C    D
A           20   7    2
B                10   25
C                     3
D

Hierarchical Clustering
Single Linkage
Current Clusters: A, B, C, D (closest pair: A and D, joined at distance 2)
Distance Matrix:

Dist   A    B    C    D
A           20   7    2
B                10   25
C                     3
D

Hierarchical Clustering
Single Linkage
Current Clusters: AD, B, C
Distance Matrix:

Dist   AD   B    C
AD          20   3
B                10
C

Hierarchical Clustering
Single Linkage
Current Clusters: AD, B, C (closest pair: AD and C, joined at distance 3)
Distance Matrix:

Dist   AD   B    C
AD          20   3
B                10
C

Hierarchical Clustering
Single Linkage
Current Clusters: ADC, B
Distance Matrix:

Dist   ADC  B
ADC         10
B

Hierarchical Clustering
Single Linkage
Current Clusters: ADC, B (joined at distance 10)
Distance Matrix:

Dist   ADC  B
ADC         10
B

Hierarchical Clustering
Single Linkage
Final Result (dendrogram leaf order): A, D, C, B
Distance Matrix:

Dist   ADC  B
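The single-linkage walk-through on these slides (A and D join at 2, then C at 3, then B at 10) can be reproduced with a short Python sketch. The distances are the ones in the slide's matrix; the function name `single_linkage` and the dictionary layout are illustrative choices, and the brute-force pair search is meant for tiny examples only.

```python
from itertools import combinations


def single_linkage(labels, dist):
    """Agglomerative clustering with single linkage.

    labels: list of item names
    dist:   dict mapping frozenset({a, b}) -> distance between items a and b
    Returns a list of (cluster_1, cluster_2, merge_distance) in merge order.
    """
    clusters = [(name,) for name in labels]
    merges = []

    def cluster_dist(c1, c2):
        # Single linkage: the smallest distance between any two members.
        return min(dist[frozenset((a, b))] for a in c1 for b in c2)

    while len(clusters) > 1:
        # Pick the closest pair of current clusters and merge them.
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: cluster_dist(*pair))
        merges.append((c1, c2, cluster_dist(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


# Distance matrix from the slides.
DIST = {
    frozenset("AB"): 20, frozenset("AC"): 7, frozenset("AD"): 2,
    frozenset("BC"): 10, frozenset("BD"): 25, frozenset("CD"): 3,
}
merges = single_linkage(["A", "B", "C", "D"], DIST)
for left, right, height in merges:
    print("".join(left), "+", "".join(right), "at distance", height)
```

The intermediate matrices on the slides fall out of the same rule: for example, the AD to C entry is min(7, 3) = 3 and the ADC to B entry is min(20, 25, 10) = 10.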