K-means Clustering
Group 15: Prajakta Purohit, Swathi Gurram
Goal
• Performance analysis of the K-means clustering algorithm in
  ◦ Twister
  ◦ Hadoop
  ◦ DryadLINQ
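Twister, Hadoop, and DryadLINQ all run K-means as repeated data-parallel passes over the input. The sketch below is not code for any of those frameworks. It is a plain-Python simulation, assuming the points are split into partitions and the current centroids are broadcast to every map task, of one common way to decompose an iteration: each map task assigns its local points and emits per-cluster partial sums and counts, and a reduce step combines them into new means. The names map_partition, reduce_partials, and run_iterations are hypothetical.

```python
def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centroids[j])))


def map_partition(points, centroids):
    """Map task (hypothetical): assign each local point to its nearest centroid and
    return per-cluster partial sums and counts for this partition only."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for x in points:
        j = nearest(x, centroids)
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += x[d]
    return sums, counts


def reduce_partials(partials, old_centroids):
    """Reduce task (hypothetical): add the partial sums from all map tasks and
    divide by the total counts to obtain the new centroids."""
    k, dim = len(old_centroids), len(old_centroids[0])
    total = [[0.0] * dim for _ in range(k)]
    count = [0] * k
    for sums, counts in partials:
        for j in range(k):
            count[j] += counts[j]
            for d in range(dim):
                total[j][d] += sums[j][d]
    new = []
    for j in range(k):
        if count[j]:
            new.append(tuple(v / count[j] for v in total[j]))
        else:
            new.append(tuple(old_centroids[j]))  # empty cluster keeps its centroid
    return new


def run_iterations(partitions, centroids, max_iter=20):
    """Driver: broadcast centroids, map every partition, reduce, repeat until stable."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        partials = [map_partition(part, centroids) for part in partitions]
        new = reduce_partials(partials, centroids)
        if new == centroids:  # no centroid moved, so the clustering has converged
            break
        centroids = new
    return centroids


partitions = [[(0, 0), (0, 1)], [(10, 10), (10, 11)]]
print(run_iterations(partitions, [(0, 0), (10, 10)]))  # [(0.0, 0.5), (10.0, 10.5)]
```

With this combiner-style split, only k partial sums per map task travel to the reducer rather than every point, which keeps the per-iteration shuffle small.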
Cluster Analysis
• Cluster analysis: assigning a set of objects to clusters such that objects in the same cluster are more similar to each other than to objects in other clusters.
• Used for: statistical data analysis, machine learning, pattern recognition, etc.
K-Means Algorithm
• Partitions n observations into k clusters such that each observation belongs to the cluster with the nearest mean.
• The initial means (seeds) are chosen using the k-means++ algorithm.
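As a minimal sketch of k-means++ seeding, assuming points are tuples of numbers and distance is squared Euclidean: the first seed is drawn uniformly, and each further seed is drawn with probability proportional to D(x)^2, the squared distance from x to its nearest already-chosen seed. The function names kmeans_pp_seeds and sq_dist are hypothetical; the random-number calls are from Python's standard random module.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_seeds(points, k, rng=random):
    """k-means++ seeding: pick the first seed uniformly at random, then pick each
    further seed with probability proportional to D(x)^2, the squared distance
    from x to its nearest already-chosen seed."""
    seeds = [rng.choice(points)]
    while len(seeds) < k:
        d2 = [min(sq_dist(p, s) for s in seeds) for p in points]
        if sum(d2) == 0:
            # Every point coincides with a seed; fall back to a uniform pick.
            seeds.append(rng.choice(points))
        else:
            seeds.append(rng.choices(points, weights=d2, k=1)[0])
    return seeds


points = [(0, 0), (0, 1), (10, 10), (10, 11)]
print(kmeans_pp_seeds(points, 2, random.Random(0)))
```

Because an already-chosen point has D(x)^2 = 0, it cannot be drawn again (unless all weights are zero), and far-away points are strongly favoured, so the seeds tend to spread across the data.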
Forming K-means clusters
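Forming the clusters means assigning each observation to the cluster with the nearest mean. The minimal sketch below, assuming squared Euclidean distance and points given as tuples, performs that assignment and computes the within-cluster sum of squares, the total squared distance from each point to its assigned mean, which is the quantity K-means tries to reduce. The helpers assign and within_cluster_ss are hypothetical names.

```python
def assign(points, centroids):
    """Assignment step: label each point with the index of its nearest centroid."""
    labels = []
    for p in points:
        dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
        labels.append(dists.index(min(dists)))
    return labels


def within_cluster_ss(points, centroids, labels):
    """Within-cluster sum of squares: the quantity K-means tries to minimize."""
    return sum(sum((a - b) ** 2 for a, b in zip(p, centroids[j]))
               for p, j in zip(points, labels))


points = [(0, 0), (0, 1), (10, 10), (10, 11)]
centroids = [(0, 0.5), (10, 10.5)]
labels = assign(points, centroids)
print(labels, within_cluster_ss(points, centroids, labels))  # [0, 0, 1, 1] 1.0
```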
Working of K-means
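The slides do not say which K-means variant is used, so the sketch below assumes the standard iteration often called Lloyd's algorithm: starting from given seeds (for example, from k-means++), repeat the assignment step and the update step (move each mean to the average of its points) until no point changes cluster. The function name kmeans and the max_iter cap are hypothetical.

```python
def kmeans(points, seeds, max_iter=100):
    """Lloyd's iteration: alternate the assignment and update steps until no
    point changes cluster (or max_iter is reached)."""
    centroids = [tuple(s) for s in seeds]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = []
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            new_labels.append(dists.index(min(dists)))
        if new_labels == labels:
            break  # assignments are stable, so the centroids will not move again
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, labels


pts = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centroids, labels = kmeans(pts, [(1, 1), (1, 2)])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Even with two poorly placed seeds in the same corner, the second seed is pulled toward the far group after one update, and the loop stops once the assignments repeat.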
Timeline and Responsibilities
Week | Task | Team members
Week 1 | Understand the K-means algorithm and design | Prajakta, Swathi
Week 2 | Implement K-means | Prajakta, Swathi
Week 3 | Implement K-means on Twister and analyze performance | Prajakta, Swathi
Week 4 | Implement K-means on Hadoop and analyze performance | Prajakta, Swathi
Week 5 | Implement K-means on DryadLINQ and analyze performance | Prajakta, Swathi
Week 6 | Optimize the algorithm based on the performance analysis | Prajakta, Swathi
Week 7 | Final technical report | Prajakta, Swathi
Deliverables
• K-means code
• Technical report and performance analysis