ICIST 2013 Yangzhou China An Adaptive Clustering Algorithm

ICIST 2013, Yangzhou, China
An Adaptive Clustering Algorithm Based on Data Field in Complex Networks
Cui Xu, Yuhua Liu, Kaihua Xu and Ke Xu
Paper ID: CSE 0271
Correspondence email: yhliu@mail.ccnu.edu.cn
School of Computer, Central China Normal University, Wuhan, China
2013.03

Outline
• Introduction
• Related Research and Analysis
• The Importance Factor of Vertices Proposed and the Data Field Introduced
  - The importance factor of vertices proposed
  - The data field introduced
• An Adaptive Clustering Algorithm Based on Data Field in Complex Networks
• Simulations and Analysis
  - Effectiveness Analysis
  - Accuracy Analysis
• Conclusions

Introduction
• In the real world, many complex systems can be abstracted as networks, such as social networks, food webs, biological networks and Web networks. These networks have the characteristics of complex networks, and their topological structures show an obvious cluster structure.
• At present, there are many kinds of clustering algorithms, which can be roughly divided into optimization methods and heuristic methods. The algorithm proposed here is a heuristic algorithm that:
a) has a lower time complexity;
b) allows the number of clusters to be either adjusted manually or controlled by the algorithm itself;
c) has better effectiveness and accuracy.

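Related Research and Analysis (1/2)
Definition 1. The degree centrality of vertex $v_i$ is
$DC(v_i) = \frac{k_i}{n-1}$  (1)
where $k_i$ is the degree of vertex $v_i$ and $n$ is the number of vertices.
Definition 2. The closeness centrality of vertex $v_i$ is
$CC(v_i) = \frac{n-1}{\sum_{j \neq i} d_{ij}}$  (2)
where $d_{ij}$ is the length of the shortest path between vertices $v_i$ and $v_j$.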
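In an unweighted network the shortest-path lengths in Definition 2 are hop counts, so both centralities can be computed with a breadth-first search. The sketch below assumes the normalized forms DC(v_i) = k_i/(n-1) and CC(v_i) = (n-1)/Σ_j d_ij, an undirected and connected graph, and an adjacency-dict layout; the function names are illustrative, not from the paper.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path hop distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def degree_centrality(adj):
    """DC(v_i) = k_i / (n - 1): degree divided by the maximum possible degree."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}


def closeness_centrality(adj):
    """CC(v_i) = (n - 1) / sum_j d_ij, assuming the graph is connected."""
    n = len(adj)
    return {v: (n - 1) / sum(bfs_distances(adj, v).values()) for v in adj}


# Hypothetical example: the path graph 0 - 1 - 2 - 3.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
dc = degree_centrality(path)
cc = closeness_centrality(path)
```

On the path graph, vertex 0 has distance sum 1 + 2 + 3 = 6, so its closeness is 3/6 = 0.5, while the inner vertex 1 scores 3/4.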

Related Research and Analysis (2/2)
Definition 3. The mutual-information centrality of vertices is defined by (3).
The modularity is
$Q = \sum_{i=1}^{c}\left[\frac{e_i}{e} - \left(\frac{d_i}{2e}\right)^2\right]$  (4)
where $c$ is the number of clusters in the network, $e$ is the total number of edges in the network, $e_i$ is the number of edges inside cluster $i$, and $d_i$ is the sum of the degrees of all vertices in cluster $i$.
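Modularity in the form Q = Σ_i [e_i/e − (d_i/(2e))²] can be evaluated in one pass over the edge list. This is a minimal sketch for an undirected, unweighted graph; the edge-list and membership-dict inputs are illustrative choices.

```python
def modularity(edges, membership):
    """Q = sum_i [e_i / e - (d_i / (2e))^2] for an undirected, unweighted graph.

    edges: list of (u, v) pairs; membership: dict vertex -> cluster label.
    """
    e = len(edges)
    inner = {}   # e_i: number of edges with both endpoints in cluster i
    degree = {}  # d_i: total degree of the vertices in cluster i
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            inner[cu] = inner.get(cu, 0) + 1
    return sum(inner.get(c, 0) / e - (d / (2 * e)) ** 2 for c, d in degree.items())


# Hypothetical example: two triangles joined by the single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
membership = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
q = modularity(edges, membership)  # 2 * (3/7 - (7/14)^2) = 5/14, about 0.357
```

Each triangle holds 3 of the 7 edges and a total degree of 7, which gives Q = 5/14; putting all six vertices into one cluster would give Q = 0.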

The Importance Factor of Vertices Proposed and the Data Field Introduced (1/2)
A. The importance factor of vertices
The importance factor of vertices is defined by (5).
Table: The result of importance evaluation for the topological structure of 10 vertices
0.4444   0.2454   0.4734   0.3268
0.1111  -0.0701   0.3330   0.0137
0.2222  -0.0701   0.5292   0.0805
0.4444   0.2103   0.5292   0.3465
0.1111  -0.0701   0.3600   0.0148
0.2222   0.0000   0.3915   0.0870
0.1111  -0.0350   0.2907   0.0221

The Importance Factor of Vertices Proposed and the Data Field Introduced (2/2)
B. The data field
Definition 5. The field-strength function is given by (6).
Definition 6. The potential function is given by (7).
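Formulas (6) and (7) are not reproduced in this text. As a hedged illustration only, the sketch below assumes the Gaussian form common in data-field work: vertex j with mass m_j exerts field strength m_j·exp(−(d_ij/σ)²) on a vertex d_ij hops away, and the potential of vertex i is the sum of these strengths over a set of source vertices. The mass could be an importance value such as (5); the function names and dict layouts are hypothetical.

```python
import math


def field_strength(mass_j, d_ij, sigma):
    """ASSUMED Gaussian form: influence of a vertex of mass m_j at hop distance d_ij."""
    return mass_j * math.exp(-((d_ij / sigma) ** 2))


def potential(i, sources, mass, dist, sigma):
    """ASSUMED potential of vertex i: sum of the field strengths of the source vertices.

    mass: dict vertex -> mass; dist: dict (i, j) -> hop distance between i and j.
    """
    return sum(field_strength(mass[j], dist[(i, j)], sigma) for j in sources)


# Hypothetical example: vertex "x" is 1 hop from "a" and 2 hops from "b".
mass = {"a": 1.0, "b": 0.5}
dist = {("x", "a"): 1, ("x", "b"): 2}
phi_x = potential("x", ["a", "b"], mass, dist, sigma=1.0)  # e^-1 + 0.5 * e^-4
```

The parameter σ controls how quickly influence decays with distance; a heavier or closer source dominates the sum.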

An Adaptive Clustering Algorithm Based on Data Field in Complex Networks (1/5)
• In this algorithm, because of the mathematical properties of the Gaussian potential function (as shown in Figures 2 and 3), the influence scope of a vertex is limited to a certain range.
Figure 2. The potential curve of a single vertex.
Figure 3. The equipotential line distribution of a single vertex.
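The limited range follows from how quickly a Gaussian decays. A rule of thumb used in data-field clustering is that a vertex's influence is negligible beyond about 3σ/√2, where exp(−(d/σ)²) has fallen to e^(−4.5), roughly 1.1% of its peak. The sketch below assumes that Gaussian form (the slides show the curves but not the formula) and computes the radius for a chosen cut-off; the names are illustrative.

```python
import math


def influence_radius(sigma, threshold=math.exp(-4.5)):
    """Distance at which exp(-(d / sigma)^2) drops to `threshold` of its peak.

    Solving exp(-(d / sigma)^2) = threshold gives d = sigma * sqrt(ln(1 / threshold)).
    With the default threshold e^-4.5, this is d = 3 * sigma / sqrt(2).
    """
    return sigma * math.sqrt(math.log(1.0 / threshold))


def potential_curve(sigma, max_d, mass=1.0):
    """Potential of a single vertex sampled at integer hop distances 0..max_d."""
    return [mass * math.exp(-((d / sigma) ** 2)) for d in range(max_d + 1)]


radius = influence_radius(1.0)    # 3 / sqrt(2), about 2.1213
curve = potential_curve(1.0, 4)   # strictly decreasing, 1.0 at distance 0
```

Because the radius scales linearly with σ, choosing σ effectively fixes how many hops a vertex can influence.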

An Adaptive Clustering Algorithm Based on Data Field in Complex Networks (2/5)
• Studies show that the diameter of a cluster structure is always less than 9 [8], and [9], which studied the cluster structure of the Web, showed that the influence scope of a vertex on others is about 3 to 4 hops. Therefore, to reduce the computational complexity, we only consider vertices within 4 hops when calculating the field strength; that is, the influence range of a vertex is limited to 4 hops.
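Restricting the field-strength computation to 4 hops amounts to a depth-limited breadth-first search from each vertex. A minimal sketch, assuming an unweighted graph stored as an adjacency dict (the name within_hops is illustrative):

```python
from collections import deque


def within_hops(adj, source, max_hops=4):
    """Hop distances from `source` to every vertex at most `max_hops` away.

    The search stops expanding at max_hops, so farther vertices are never visited;
    this is what bounds the cost of the field-strength computation.
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == max_hops:
            continue  # do not expand beyond the cut-off
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


# Hypothetical example: on the path 0 - 1 - ... - 7, only vertices 0..4 are reached from 0.
path = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 7] for i in range(8)}
reached = within_hops(path, 0)
```

In a network whose clusters have a small diameter, each search touches only a local neighbourhood instead of the whole graph.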

An Adaptive Clustering Algorithm Based on Data Field in Complex Networks (3/5)
• Algorithm description
Initialization: the adjacency matrix of the network, …;
Step 1. If you want to decide the number of clusters yourself, set … and …;
Step 2. Using (5), calculate the importance factors of the vertices with the largest degrees and sort them in descending order;

An Adaptive Clustering Algorithm Based on Data Field in Complex Networks (4/5)
Step 3. Take the first c most important vertices from Step 2 as cluster centers, and add the remaining vertices to set A. Using (6), add each vertex i to the cluster j whose important vertex exerts the largest field strength on vertex i. If the two largest field-strength values are very small or equal, add the vertex to Temp and delete it from A;
Step 4. If …, use (7) to calculate the potential values of the vertices in Temp and add each of them to the cluster with the largest value;
Step 5. If …, go to Step 6; otherwise, go to Step 7;
Step 6. If …, …, go to Step 3; otherwise, go to Step 7;
Step 7. Output the cluster structure.
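Steps 2 to 4 can be sketched for a fixed cluster number c. This is a hypothetical reading, not the authors' implementation: it assumes the Gaussian field strength m_j·exp(−(d/σ)²) with hop distance d, a 4-hop cut-off, a tie/smallness tolerance eps, and a caller-supplied mass standing in for the importance factor (5). The adaptive Steps 5 and 6 are omitted because their conditions are not recoverable here.

```python
import math
from collections import deque


def hop_distances(adj, source, max_hops=4):
    """Depth-limited BFS: hop distances to every vertex at most max_hops away."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == max_hops:
            continue
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def gaussian(mass_j, d, sigma):
    """ASSUMED field-strength form m_j * exp(-(d / sigma)^2)."""
    return mass_j * math.exp(-((d / sigma) ** 2))


def dfc_sketch(adj, mass, c, sigma=1.0, eps=1e-3, max_hops=4):
    """Hypothetical fixed-c reading of Steps 2-4; Steps 5-6 are omitted.

    adj: dict vertex -> neighbours (undirected, unweighted).
    mass: dict vertex -> importance value, a stand-in for the factor in (5).
    """
    # Steps 2-3: the c most important vertices become cluster centres.
    centres = sorted(adj, key=lambda v: mass[v], reverse=True)[:c]
    reach = {j: hop_distances(adj, j, max_hops) for j in centres}
    clusters = {j: {j} for j in centres}
    temp = []
    for i in adj:
        if i in clusters:
            continue
        scores = sorted(
            ((gaussian(mass[j], reach[j][i], sigma) if i in reach[j] else 0.0, j)
             for j in centres),
            key=lambda s: s[0],
            reverse=True,
        )
        best = scores[0]
        runner_up = scores[1][0] if len(scores) > 1 else 0.0
        if best[0] < eps or best[0] - runner_up < eps:
            temp.append(i)  # strengths too small or too close: defer to Step 4
        else:
            clusters[best[1]].add(i)
    # Step 4: put each deferred vertex in the cluster whose members give it the
    # largest summed potential (undirected graph, so d_im = d_mi).
    for i in temp:
        dist_i = hop_distances(adj, i, max_hops)

        def cluster_potential(j):
            return sum(gaussian(mass[m], dist_i[m], sigma)
                       for m in clusters[j] if m in dist_i)

        clusters[max(clusters, key=cluster_potential)].add(i)
    return clusters


# Two triangles joined by the bridge 2-3; degree is a placeholder mass only.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
mass = {v: len(nbrs) for v, nbrs in adj.items()}
result = dfc_sketch(adj, mass, c=2)
```

On this toy graph the bridge endpoints 2 and 3 become the centres and each triangle forms one cluster. Vertices out of reach of every centre fall through to Step 4, where a tie simply goes to the first cluster, a simplification of this sketch.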

An Adaptive Clustering Algorithm Based on Data Field in Complex Networks (5/5)
Comparison of time complexity between DFC and the GN algorithm:
Algorithm | Time complexity | Notes
GN | $O(m^2 n)$ | n vertices, m edges
DFC | $O(n^2)$ | n vertices

Simulations and Analysis (1/4)
A. Effectiveness analysis
• Zachary's karate club
Figure 6. Clusters in Zachary's karate club network found by GN. The network is divided into two groups, represented by different shapes and colors; vertices 1 and 34 are much bigger than the others.
Figure 7. Clusters in Zachary's karate club network found by DFC. The network is divided into two groups, represented by different shapes and colors.

Simulations and Analysis (2/4)
B. Accuracy analysis
• Computer-generated networks
Each network has 128 vertices divided into 4 clusters of equal size, so we set the number of clusters in this algorithm to 4. Links are placed at random, with one probability for each pair of vertices in the same community and another probability for each pair of vertices in different communities, chosen so that the average degree of a vertex is 16, as given by (8).
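Such networks can be generated by drawing each vertex pair independently. The sketch below assumes the usual Girvan-Newman parameterization (formula (8) is not reproduced here): each vertex expects z_in links inside its group and z_out links outside it, with z_in + z_out = 16, so p_in = z_in/31 and p_out = z_out/96 for groups of 32 among 128 vertices. The function name and its parameters are hypothetical.

```python
import random


def gn_benchmark(z_out, n_groups=4, group_size=32, avg_degree=16, seed=None):
    """Random graph with planted groups in the style of the Girvan-Newman benchmark.

    ASSUMPTION: each vertex expects z_in = avg_degree - z_out links inside its group
    and z_out links to other groups, so p_in = z_in / (group_size - 1) and
    p_out = z_out / (n - group_size).
    """
    rng = random.Random(seed)
    n = n_groups * group_size
    p_in = (avg_degree - z_out) / (group_size - 1)
    p_out = z_out / (n - group_size)
    group = [v // group_size for v in range(n)]  # vertex -> planted cluster
    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            p = p_in if group[u] == group[v] else p_out
            if rng.random() < p:
                edges.append((u, v))
    return edges, group


edges, group = gn_benchmark(z_out=4, seed=1)
mean_degree = 2 * len(edges) / len(group)  # fluctuates around 16
```

Raising z_out moves links from inside the groups to between them while keeping the expected average degree at 16, which makes the planted clusters progressively harder to recover.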

Simulations and Analysis (3/4)
Figure 8. Comparison with the GN algorithm on computer-generated networks.

Simulations and Analysis (4/4)
As Figure 8 shows, the algorithm performs well, correctly identifying more than 90% of vertices when …, but when …, the accuracy decreases rapidly. When …, both algorithms perform well, identifying 100% of vertices correctly; when …, our method performs better than GN; when …, the networks become increasingly difficult for both algorithms to handle, but DFC still performs better. Hence DFC outperforms the GN algorithm on these networks.
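The slides do not state how "correctly identified" is measured. One common criterion, assumed here, maps each found cluster to the planted group holding most of its members and counts the vertices that agree with that mapping. A minimal sketch (the name fraction_correct is hypothetical):

```python
from collections import Counter


def fraction_correct(found, truth):
    """Share of vertices whose found cluster maps to their true group.

    found, truth: dicts vertex -> label. Each found cluster is mapped to the true
    group holding most of its members (ASSUMED majority-mapping criterion).
    """
    members = {}
    for v, label in found.items():
        members.setdefault(label, []).append(v)
    correct = 0
    for vs in members.values():
        # Vertices in the majority true group of this cluster count as correct.
        correct += Counter(truth[v] for v in vs).most_common(1)[0][1]
    return correct / len(found)


# Hypothetical example: vertex 2 is placed with the wrong group.
truth = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
found = {0: "x", 1: "x", 2: "y", 3: "y", 4: "y", 5: "y"}
acc = fraction_correct(found, truth)  # 5/6
```

Majority mapping lets two found clusters map to the same planted group, so a single all-encompassing cluster on the 4-group benchmark would still score 25%; stricter criteria exist and can give lower numbers.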

Conclusions
• This paper proposed an adaptive clustering algorithm based on the data field in complex networks. Considering both the attributes and the positions of vertices, an importance factor was put forward to identify the important vertices in a network. The data field theory from physics was introduced into complex networks, and the potential of vertices was used to find the cluster structure of a network. Compared with most existing algorithms, this algorithm has low time complexity; the number of clusters can be either adjusted manually or determined by the algorithm itself; and it also shows better effectiveness and accuracy.

References
[1] U. Brandes, D. Delling, M. Gaertler, R. Gorke, M. Hoefer, Z. Nikoloski, et al., "On Modularity Clustering", IEEE Transactions on Knowledge and Data Engineering, vol. 20, no. 2, pp. 172-188, Feb. 2008.
[2] X. F. Wang, G. R. Chen, "Complex Networks: Small-World, Scale-Free and Beyond", IEEE Circuits and Systems Magazine, vol. 3, no. 1, pp. 6-20, 2003.
[3] B. Yang, D. Y. Liu, J. M. Liu, "Complex Network Clustering Algorithms", Journal of Software, vol. 20, no. 1, pp. 54-66, Jan. 2009.
[4] M. E. J. Newman, M. Girvan, "Finding and Evaluating Community Structure in Networks", Physical Review E, vol. 69, no. 2, 2004.
[5] D. Koschutzki, K. A. Lehmann, L. Peeters, S. Richter, D. Tenfelde-Podehl, O. Zlotowski, "Centrality Indices", Lecture Notes in Computer Science, Springer, vol. 3418, pp. 16-61, 2005.
[6] Z. Yi, Y. H. Liu, K. H. Xu, Z. R. Luo, "Evaluation Method for Node Importance Based on Mutual Information in Complex Networks", Computer Science, vol. 38, no. 6, pp. 88-89, Jun. 2011.
[7] W. Y. Gan, D. Y. Li, J. M. Wang, "An Hierarchical Clustering Method Based on Data Fields", EJournal, vol. 34, no. 2, pp. 258-262, Feb. 2006.
[8] H. B. Hu, K. Wang, L. Xu, "Analysis of Online Social Networks Based on Complex Network Theory", Complex Systems and Complexity Science, vol. 5, no. 2, pp. 1-14, 2008.
[9] R. Albert, A. L. Barabasi, "Statistical Mechanics of Complex Networks", Reviews of Modern Physics, vol. 74, no. 1, pp. 47-97, 2002.
[10] M. K. Pakhira, S. Bandyopadhyay, U. Maulik, "Validity Index for Crisp and Fuzzy Clusters", Pattern Recognition, Elsevier, vol. 37, no. 3, pp. 487-501, Mar. 2004.
[11] W. W. Zachary, "An Information Flow Model for Conflict and Fission in Small Groups", Journal of Anthropological Research, vol. 33, no. 4, pp. 452-473, 1977.