Analysis of Clustering Algorithms
Ethan Summers and Kathryn Cooper

College of Information Science and Technology, University of Nebraska at Omaha, NE 68182

ABSTRACT
In bioinformatics, choosing the right algorithm for a problem is very important. Choosing the wrong algorithm, or one that is less efficient, can make or break a project, so analyzing algorithms beforehand is key. The goal of this project is to analyze three clustering algorithms for protein interaction networks and compare their function and results. A clustering algorithm takes a dataset, in this case a simulated PPI (protein-protein interaction) network, and groups together similar data points based on some similarity criterion. It is important to know the differences between these algorithms to get the desired results.

Results
K-medoid network figure: Cluster 0: Red, Cluster 1: Green, Cluster 2: Purple

Algorithm   Clusters   Nodes in clusters                            Unclustered nodes
K-medoid    3          Cluster 0: 8, Cluster 1: 10, Cluster 2: 8    0
MCODE       2          Cluster 0: 5, Cluster 1: 4                   17
MCL         3          Cluster 0: 19, Cluster 1: 3, Cluster 2: 2    2

K-medoid
• Even clusters
• Forced cluster amount
• No unclustered nodes
• Seems to be good if there needs to be a specific number of clusters and all nodes must be used

MCODE
• Skewed nodes
• Fewer clusters
• Many unclustered nodes
• Seems to be good for pulling specialized clusters out, but only if there is a connection

MCL
• Very large single cluster
• Few unclustered nodes
• Seems to create a large cluster; this could be due to it being the first.

Methods
PPI data was simulated by random assignment of both values and edges. A number of network clustering algorithms were investigated for use. Many of these were not used; they included WDCM, MSARC, e-CCC biclustering, and SPICI. Ultimately, three algorithms were chosen and run using Cytoscape, a UI that was used to run the algorithms on a randomly generated list of nodes and edges.
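The poster does not say how the random assignment was carried out, so the following is only a minimal sketch of one way a comparable network could be simulated. It assumes 26 nodes labelled A to Z, three outgoing edges per node, and weights drawn in tenths from 0.1 to 1.0. The function names `simulate_network` and `to_tab_delimited`, the `edges_per_node` parameter, and the seed are hypothetical and not taken from the authors' workflow.

```python
import random
import string


def simulate_network(nodes, edges_per_node=3, seed=None):
    """Return a list of (node1, node2, weight) tuples with random targets and weights.

    Assumption: each node gets `edges_per_node` distinct random partners
    (never itself) and a weight chosen uniformly from 0.1, 0.2, ..., 1.0.
    """
    rng = random.Random(seed)
    edges = []
    for source in nodes:
        candidates = [n for n in nodes if n != source]  # no self-loops
        for target in rng.sample(candidates, edges_per_node):
            weight = rng.randint(1, 10) / 10
            edges.append((source, target, weight))
    return edges


def to_tab_delimited(edges):
    """Format the edge list as tab-separated text with a header row."""
    lines = ["Node 1\tNode 2\tEdge"]
    lines += [f"{s}\t{t}\t{w}" for s, t, w in edges]
    return "\n".join(lines)


nodes = list(string.ascii_uppercase)      # 26 nodes: A..Z
edges = simulate_network(nodes, seed=42)  # hypothetical seed for repeatability
print(to_tab_delimited(edges[:5]))        # preview of the first few rows
```

A delimited table of this shape (two node columns and a weight column) is the kind of file that can be imported into Cytoscape as a network, which is likely why a plain edge list was convenient here.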
The algorithms used were:
• K-medoids,
• MCODE, and
• MCL
These algorithms were chosen for their usability and ease of use with existing file formats.

“Base” Network
Randomized set with 26 nodes and edges, with edge weights between 0 and 1 (chosen randomly). Edges are listed in three blocks with the columns Node 1, Node 2, and Edge.

Block 1
Node 1-Node 2: A-G, B-W, C-E, D-R, E-T, F-Y, G-U, H-I, I-O, J-P, K-A, L-S, M-D, N-F, O-G, P-H, Q-J, R-K, S-L, T-Z, U-X, V-C, W-Q, X-B, Y-N, Z-M
Edge (column values in order): 0.1, 0.2, 0.3, 0.5, 0.8, 0.9, 0.8, 0.7, 0.6, 0.4, 0.5, 0.6, 0.5, 0.1, 1, 0.9, 0.8, 0.5, 0.6, 0.7, 0.9, 0.8, 1, 0.8

Block 2
Node 1-Node 2: A-P, B-O, C-I, D-U, E-Y, F-T, G-R, H-E, I-W, J-Q, K-L, L-J, M-K, N-H, O-V, P-F, Q-D, R-S, S-A, T-M, U-N, V-B, W-V, X-C, Y-Z, Z-X
Edge (column values in order): 0.9, 0.4, 0.6, 0.5, 0.8, 0.7, 0.9, 0.4, 0.3, 0.6, 0.8, 0.9, 1, 0.5, 0.4, 0.6, 0.3, 0.2, 0.8, 0.7, 0.9, 0.8, 0.9

Block 3
Node 1-Node 2: A-Q, B-A, C-Z, D-W, E-S, F-X, G-E, H-D, I-C, J-R, K-F, L-V, M-T, N-G, O-B, P-Y, Q-H, R-N, S-U, T-J, U-M, V-I, W-K, X-O, Y-L, Z-P
Edge (column values in order): 0.7, 0.6, 0.4, 0.5, 0.3, 0.2, 0.1, 1, 0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1, 0.8, 0.9, 0.7, 0.5, 0.6, 0.3, 0.2, 0.1

Results
MCODE network figure: Cluster 0: Red, Cluster 1: Green
MCL network figure: Cluster 0: Red, Cluster 1: Green, Cluster 2: Purple

Conclusion and Future Directions
• There is no set format to be used for this type of system, so finding a GUI is very helpful
• There is a general sense of people making systems only for their own work, due to a lack of examples, guides, and standardization
• For the future, look for software like Cytoscape rather than individual tools

References
Bader, G. D., & Hogue, C. W. (2003). An automated method for finding molecular complexes in large protein interaction networks. BMC Bioinformatics, 4(1), 2.
Morris, J. H., Apeltsin, L., Newman, A. M., Baumbach, J., Wittkop, T., Su, G., ... & Ferrin, T. E. (2011). clusterMaker: a multi-algorithm clustering plugin for Cytoscape. BMC Bioinformatics, 12(1), 436.
Shannon, P., Markiel, A., Ozier, O., Baliga, N. S., Wang, J. T., Ramage, D., ... & Ideker, T. (2003). Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Research, 13(11), 2498-2504.
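As a rough illustration of how one of the three algorithms groups nodes, the sketch below outlines Markov clustering (MCL) in plain Python. It is not the Cytoscape implementation used for the poster. The function names, the inflation value of 2.0, the added self-loops, the treatment of edges as undirected, and the two-triangle toy graph are all assumptions made for the example.

```python
def normalize_columns(matrix):
    """Scale each column so that it sums to 1 (column-stochastic matrix)."""
    n = len(matrix)
    sums = [sum(matrix[r][c] for r in range(n)) for c in range(n)]
    return [[matrix[r][c] / sums[c] for c in range(n)] for r in range(n)]


def multiply(a, b):
    """Plain matrix product of two square matrices."""
    n = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]


def mcl(nodes, weighted_edges, inflation=2.0, max_iterations=100, tolerance=1e-9):
    """Cluster an undirected weighted graph; returns sorted lists of node names."""
    index = {node: i for i, node in enumerate(nodes)}
    n = len(nodes)
    matrix = [[0.0] * n for _ in range(n)]
    for u, v, w in weighted_edges:
        matrix[index[u]][index[v]] = w  # assumption: edges are undirected
        matrix[index[v]][index[u]] = w
    for i in range(n):
        matrix[i][i] = 1.0              # self-loops, a common MCL convention
    matrix = normalize_columns(matrix)

    for _ in range(max_iterations):
        expanded = multiply(matrix, matrix)            # expansion: random walk step
        inflated = normalize_columns(                  # inflation: boost strong flows
            [[x ** inflation for x in row] for row in expanded])
        change = max(abs(inflated[r][c] - matrix[r][c])
                     for r in range(n) for c in range(n))
        matrix = inflated
        if change < tolerance:
            break

    # Each row that still carries flow lists the members of one cluster.
    clusters = set()
    for r in range(n):
        members = frozenset(nodes[c] for c in range(n) if matrix[r][c] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


# Hypothetical toy graph: two disconnected triangles.
toy_nodes = ["A", "B", "C", "D", "E", "F"]
toy_edges = [("A", "B", 1.0), ("B", "C", 1.0), ("A", "C", 1.0),
             ("D", "E", 1.0), ("E", "F", 1.0), ("D", "F", 1.0)]
toy_clusters = mcl(toy_nodes, toy_edges)
print(toy_clusters)  # [['A', 'B', 'C'], ['D', 'E', 'F']]
```

Unlike K-medoids, this procedure has no parameter for the number of clusters; cluster granularity is steered indirectly by the inflation value, which is one reason the three algorithms can return such different groupings on the same network.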