K-means*: Clustering by Gradual Data Transformation
Mikko Malinen and Pasi Fränti
Speech and Image Processing Unit, School of Computing, University of Eastern Finland
School of Computing, P.O. Box 111, FIN-80101 Joensuu, FINLAND. Tel. +358 13 251 7959, fax +358 13 251 7955, www.uef.fi/cs
K-means* clustering
Gradual transformation of data: fit the data to a model.
[Figure: panels labeled Data, Model, Intermediate, Final]
K-means clustering
Iterate between two steps:
1. Assignment step: assign the points to the nearest centroids.
2. Update step: update the location of the centroids.
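The two steps above can be sketched in plain Python as follows. This is a minimal illustration, not code from the slides: the function names and the `max_iterations` parameter are hypothetical, and leaving an empty cluster's centroid in place is an assumption of this sketch.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Assignment step: index of the nearest centroid for every point."""
    return [
        min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
        for p in points
    ]


def update(points, labels, centroids):
    """Update step: move each centroid to the mean of its assigned points.

    A centroid with no assigned points stays where it is (an assumption of
    this sketch; the slides treat empty clusters as a separate problem).
    """
    new_centroids = []
    for j, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == j]
        if members:
            new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
        else:
            new_centroids.append(tuple(old))
    return new_centroids


def kmeans(points, centroids, max_iterations=100):
    """Alternate the two steps until the assignment stops changing."""
    labels = assign(points, centroids)
    for _ in range(max_iterations):
        centroids = update(points, labels, centroids)
        new_labels = assign(points, centroids)
        if new_labels == labels:
            break
        labels = new_labels
    return centroids, labels
```

For example, `kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], [(0, 0), (10, 10)])` groups the first two points under one centroid and the last two under the other.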
K-means* clustering
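A rough sketch of the idea, under stated assumptions, may help here. The slides say only that the data is first fitted to a model and then transformed gradually, with k-means applied along the way. Everything else below is an assumption of this sketch rather than the authors' procedure: the model is taken to be k points sampled from the data, each data point is mapped to a model point by a random equal-size partition, the inverse transform is a linear interpolation in `steps` equal increments, and empty clusters simply keep their previous centroid (no empty-cluster removal is implemented). The names `kmeans_star`, `steps`, and `seed` are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, centroids, max_iterations=100):
    """Plain k-means; an empty cluster keeps its previous centroid."""
    for _ in range(max_iterations):
        labels = [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
                  for p in points]
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == j]
            new_centroids.append(
                tuple(sum(c) / len(members) for c in zip(*members)) if members else old)
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


def kmeans_star(points, k, steps=20, seed=0):
    """Gradually move the data from a trivial model back to its true positions."""
    rng = random.Random(seed)
    n = len(points)
    # Assumed model: k points sampled from the data. Every data point is mapped
    # onto one model point, so the starting data set is trivial to cluster.
    model = [tuple(p) for p in rng.sample(points, k)]
    target = [i % k for i in range(n)]
    rng.shuffle(target)
    start = [model[t] for t in target]
    centroids = list(model)
    labels = target
    for step in range(1, steps + 1):
        t = step / steps  # fraction of the inverse transform done so far
        current = [tuple(s + t * (x - s) for s, x in zip(sp, p))
                   for sp, p in zip(start, points)]
        # Warm start: centroids from the previous step seed this k-means run.
        centroids, labels = kmeans(current, centroids)
    return centroids, labels
```

The warm start is the design point of interest: each k-means run begins from the centroids of the previous, slightly less transformed data set, so the centroids track the clusters as the points drift back to their original positions.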
Example of clustering (s2 dataset)
[Figures: the data set shown at 0%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, and 100% of the transformation done]
Empty clusters problem
Time complexity
• Initialization
• Data set transform
• Empty clusters removal
• K-means
• Algorithm total
Time complexity, fixed k-means
• Initialization
• Data set transform
• Empty clusters removal
• K-means
• Algorithm total
Datasets
Dataset    d     n       k
s1         2     5000    15
s2         2     5000    15
s3         2     5000    15
s4         2     5000    15
bridge     16    4096    256
missa      16    6480    256
house      3     34000   256
thyroid    5     215     2
iris       4     150     2
wine       13    178     3
Mean square error
Columns: dataset, k-means, proposed, GKM, optimal (rows with fewer than four values list them in their original order)
s1: 1.85, 1.01, 0.89
s2: 1.94, 1.52, 1.33
s3: 1.97, 1.71, 1.69
s4: 1.69, 1.63, 1.57
bridge: 168.2, 164.7, 164.1, 160.7
missa: 5.33, 5.15, 5.34, 5.12
house: 9.88, 9.48, 5.94, 5.86
thyroid: 6.97, 6.92, 1.52
iris: 3.70, 2.02
wine: 1.92, 1.90, 0.88
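For reference, a mean square error of a clustering can be computed as sketched below. The normalization is an assumption: the slides do not state it, and this sketch divides the total squared error by the number of points only (some work also divides by the dimension d). The function name `mean_square_error` is hypothetical.

```python
def mean_square_error(points, centroids, labels):
    """Total squared distance of points to their centroids, divided by n.

    Assumption: normalization by the number of points only.
    """
    total = sum(
        sum((x - c) ** 2 for x, c in zip(p, centroids[label]))
        for p, label in zip(points, labels)
    )
    return total / len(points)
```

With points `[(0, 0), (0, 2), (10, 10)]`, centroids `[(0, 1), (10, 10)]`, and labels `[0, 0, 1]`, the total squared error is 2 and the value returned is 2/3.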
Mean square error vs. number of steps
[Figures: seven plots of mean square error against the number of steps]
Number of incorrect clusters
Incorrect clusters    proposed    k-means
0 (all correct)       36%         14%
1                     64%         38%
2                     0%          34%
3                     0%          10%
Summary
• We have presented a clustering method based on gradual transformation of data and k-means. Instead of fitting the model to data, we fit the data to a model.
• The proposed method gives a lower mean square error than k-means.