K-means Clustering
Group 15: Swathi Gurram, Prajakta Purohit
Goal
- Program K-means on Twister (iterative MapReduce) and on Hadoop (MapReduce), and see how the change of framework affects execution time.
Survey: Twister
- Configurable, long-running (cacheable) map/reduce tasks
- Pub/sub messaging based communication and data transfers
- Efficient support for iterative MapReduce computations
- Combine phase to collect all reduce outputs
- Data access via local disks
Survey: Hadoop
- A software framework that supports data-intensive distributed applications
- Uses the MapReduce programming model
- Has its own filesystem (HDFS, the Hadoop Distributed File System, based on the Google File System), which is specifically tailored for dealing with large files
- Manages the distribution of processing and of files, breaking the files down into more manageable chunks for processing
Survey: HaLoop
- A modified version of the Hadoop MapReduce framework
- Provides caching options for loop-invariant data access
- Lets users reuse major building blocks from their applications' Hadoop implementations
- Has intra-job fault-tolerance mechanisms similar to Hadoop's
- Reduces query runtimes by a factor of 1.85 compared with Hadoop
K-means Clustering
K-means Clustering
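The two slides above carry only the title, so the following is a minimal sketch of the standard K-means iteration (Lloyd's algorithm), not the group's own code. The function names `nearest` and `kmeans`, the random initialization from the input points, and the stop-when-unchanged rule are assumptions made for illustration.

```python
import math
import random


def nearest(point, centroids):
    """Return the index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points, k, max_iter=100, seed=0):
    """Cluster a list of equal-length tuples into k groups.

    Assumption: initial centroids are k distinct input points chosen at random.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(max_iter):
        # Assignment step: group every point with its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # assignments are stable
            break
        centroids = new_centroids
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(points, k=2)
```

The assignment step is independent per point, which is what makes the algorithm a natural fit for a map phase; the update step is an aggregation, which fits a reduce phase.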
Twister K-means
Hadoop K-means
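The Twister and Hadoop slides above survive only as titles. As a hedged illustration of how K-means is commonly expressed in the MapReduce model, the sketch below runs the map and reduce steps in plain Python on in-memory partitions. It does not use the Twister or Hadoop APIs; `kmeans_map`, `kmeans_reduce`, `run`, and the tolerance `tol` are hypothetical names chosen for this example.

```python
import math


def kmeans_map(partition, centroids):
    """Map: assign each point in one data partition to its nearest centroid
    and return per-centroid partial results {index: (vector_sum, count)}."""
    partial = {}
    for p in partition:
        idx = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        vec_sum, count = partial.get(idx, ((0.0,) * len(p), 0))
        partial[idx] = (tuple(s + x for s, x in zip(vec_sum, p)), count + 1)
    return partial


def kmeans_reduce(partials, centroids):
    """Reduce: merge the partial sums from all map tasks and compute new centroids.
    A centroid that received no points keeps its old position."""
    totals = {}
    for partial in partials:
        for idx, (vec_sum, count) in partial.items():
            if idx in totals:
                prev_sum, prev_count = totals[idx]
                vec_sum = tuple(a + b for a, b in zip(prev_sum, vec_sum))
                count += prev_count
            totals[idx] = (vec_sum, count)
    new_centroids = list(centroids)
    for idx, (vec_sum, count) in totals.items():
        new_centroids[idx] = tuple(s / count for s in vec_sum)
    return new_centroids


def run(partitions, centroids, max_iter=50, tol=1e-9):
    """Driver loop: one map/reduce round per iteration until centroids stop moving."""
    for iteration in range(1, max_iter + 1):
        # On a cluster these map calls would run in parallel, one per partition.
        partials = [kmeans_map(part, centroids) for part in partitions]
        new_centroids = kmeans_reduce(partials, centroids)
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tol:
            break
    return centroids, iteration


partitions = [[(0, 0), (0, 1)], [(1, 0), (10, 10)], [(10, 11), (11, 10)]]
final_centroids, iterations = run(partitions, [(0, 0), (10, 10)])
```

In this sketch the partitions stay in memory across iterations and only the small centroid list changes between rounds, which mirrors the cacheable long-running tasks listed for Twister. With plain Hadoop, each iteration is typically submitted as a separate job that reads its input from HDFS again.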
Twister-Hadoop Comparison
[Figure: bar chart of execution time in seconds (y-axis, 0 to 1000) for 10 centroid sets (x-axis), Twister vs. Hadoop]

Centroid set | Twister (s) | Hadoop (s)
1 | 1.1542 | 603
2 | 1.1263 | 630
3 | 1.1264 | 886
4 | 1.1097 | 642
5 | 1.1137 | 646
6 | 1.1262 | 942
7 | 1.0926 | 483
8 | 1.1102 | 690
9 | 1.1034 | 671
10 | 1.1159 | 713
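To put the comparison in numbers, the short sketch below summarizes the chart's data labels. It assumes the labels on this slide read as ten (Twister, Hadoop) pairs, both in seconds, one pair per centroid set; that reading of the extracted chart is an assumption, and the variable names are illustrative.

```python
from statistics import mean

# Execution times per centroid set, in seconds, as read from the slide's data labels.
twister_s = [1.1542, 1.1263, 1.1264, 1.1097, 1.1137, 1.1262, 1.0926, 1.1102, 1.1034, 1.1159]
hadoop_s = [603, 630, 886, 642, 646, 942, 483, 690, 671, 713]

# Per-run ratio of Hadoop time to Twister time.
ratios = [h / t for h, t in zip(hadoop_s, twister_s)]

summary = {
    "mean_twister_s": mean(twister_s),
    "mean_hadoop_s": mean(hadoop_s),
    "min_ratio": min(ratios),
    "max_ratio": max(ratios),
}

for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```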
Implementation Timeline

Week | Task | Team member
Oct 24th – Oct 31st | Understand K-means algorithm and design | Prajakta, Swathi
Nov 1st – Nov 7th | Implement K-means | Prajakta, Swathi
Nov 8th – Nov 21st | Implement K-means on Twister and performance analysis | Prajakta, Swathi
Nov 21st – Nov 28th | Optimized validation method for K-means algorithm | Prajakta, Swathi
Nov 29th – Dec 3rd | Implement K-means on Hadoop | Prajakta, Swathi
Dec 4th – Dec 5th | Performance analysis and presentation | Prajakta, Swathi
Dec 6th – Dec 12th | Final technical report | Prajakta, Swathi
Validation methods
Conclusion
- The Twister framework is faster than Hadoop for iterative MapReduce applications.
References
- http://salsahpc.indiana.edu
- http://www.iterativemapreduce.org/samples.html
- http://hadoop.apache.org/
- http://en.wikipedia.org/wiki/Apache_Hadoop
- http://clue.cs.washington.edu/node/14
- http://code.google.com/p/haloop/
- http://www.cs.washington.edu/homes/billhowe/pubs/HaLoop.pdf
Demo
Thank you