K-means Clustering
Group 15: Prajakta Purohit, Swathi Gurram

Goal
• Performance analysis of the K-means clustering algorithm in
  ◦ Twister
  ◦ Hadoop
  ◦ DryadLINQ

Cluster Analysis
• Cluster analysis: assigning a set of objects to clusters such that the objects in the same cluster are more similar to each other than to those in other clusters.
• Used for: statistical data analysis, machine learning, pattern recognition, etc.

K-Means Algorithm
• Partitions n observations into k clusters such that each observation belongs to the cluster with the nearest mean.
• Initial means (seeds) are found using the k-means++ algorithm.
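The two bullets above can be sketched in a few lines of plain Python. This is a minimal single-machine illustration, not the project's code: the function names (`squared_distance`, `kmeans_pp_seeds`, `kmeans`), the use of squared Euclidean distance, and the stopping rule (means unchanged between iterations) are all assumptions made for the example.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points (assumed metric)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_pp_seeds(points, k, rng):
    """k-means++ seeding: first seed uniform at random, each further seed
    drawn with probability proportional to its squared distance from the
    nearest seed chosen so far."""
    seeds = [rng.choice(points)]
    while len(seeds) < k:
        weights = [min(squared_distance(p, s) for s in seeds) for p in points]
        if sum(weights) == 0:
            # Every point coincides with a seed; fall back to a uniform pick.
            seeds.append(rng.choice(points))
        else:
            seeds.append(rng.choices(points, weights=weights, k=1)[0])
    return seeds


def mean(cluster):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))


def kmeans(points, k, max_iter=100, seed=0):
    """Assign each point to the nearest mean, recompute the means, and
    repeat until the means stop changing or max_iter is reached."""
    rng = random.Random(seed)
    means = kmeans_pp_seeds(points, k, rng)
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each observation joins the cluster of its nearest mean.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, means[i]))
            clusters[nearest].append(p)
        # Update step: an empty cluster keeps its previous mean.
        new_means = [mean(c) if c else means[i] for i, c in enumerate(clusters)]
        if new_means == means:
            break
        means = new_means
    return means, clusters


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    final_means, final_clusters = kmeans(data, k=2)
    print(final_means)
    print(final_clusters)
```

The assignment step is independent for each point and the update step is a per-cluster average, which is what makes the algorithm a natural fit for map and reduce phases on the frameworks named in the goal.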

Forming K-means clusters

Working of K-means

Timeline and Responsibilities

Week    Task                                                     Team member
Week 1  Understand K-means algorithm and design                  Prajakta, Swathi
Week 2  Implement K-means                                        Prajakta, Swathi
Week 3  Implement K-means on Twister and analyze performance     Prajakta, Swathi
Week 4  Implement K-means on Hadoop and analyze performance      Prajakta, Swathi
Week 5  Implement K-means on DryadLINQ and analyze performance   Prajakta, Swathi
Week 6  Optimize the algorithm based on performance analysis     Prajakta, Swathi
Week 7  Final technical report                                   Prajakta, Swathi

Deliverables
• K-means code
• Technical report and performance analysis

References
• Wikipedia
• http://salsahpc.indiana.edu
• http://www.iterativemapreduce.org/samples.html