- Slides: 104
Community Structures
What is Community Structure?
- Definition: a community is a group of nodes in which there are more edges (interactions) between nodes within the group than to nodes outside of it
My T. Thai mythai@cise.ufl.edu

Why Community Structure (CS)?
- Many systems can be expressed as a network in which nodes represent the objects and edges represent the relations between them:
  - Social networks: collaboration, online social networks
  - Technological networks: IP address networks, WWW, software dependency
  - Biological networks: protein interaction networks, metabolic networks, gene regulatory networks
Why CS? Yeast protein interaction networks
Why CS? IP address network
Why Community Structure?
- Nodes in a community have some common properties
- Communities represent some properties of a network
- Examples:
  - In social networks, communities represent social groupings based on interest or background
  - In citation networks, communities represent related papers on one topic
  - In metabolic networks, communities represent cycles and other functional groupings

An Overview of Recent Work
- Disjoint CS
- Overlapping CS
- Centralized Approach
  - Define the quantity of modularity and optimize it with greedy algorithms, IP, SDP, spectral methods, random walks, clique percolation
- Localized Approach
- Handle Dynamics and Evolution
- Incorporate other information
Graph Partitioning? It's not
- Graph partitioning algorithms are typically based on minimum cut approaches or spectral partitioning

Graph Partitioning
- Minimum cut partitioning breaks down when we don't know the sizes of the groups
  - Optimizing the cut size with the group sizes left free puts all vertices in the same group
- Cut size is the wrong thing to optimize
  - A good division into communities is not just one where there are a small number of edges between groups
- There must be a smaller than expected number of edges between communities
Edge Betweenness
- Focus on the edges which are least central, i.e., the edges which are most "between" communities
- Instead of adding edges to G = (V, ∅), progressively remove edges from the original graph G = (V, E)

Edge Betweenness
- Definition:
  - For each edge (u, v), the edge betweenness of (u, v) is defined as the number of shortest paths between any pair of nodes in the network that run through (u, v)
  - betweenness(u, v) = |{ P_xy : x, y in V, P_xy is a shortest path between x and y, and (u, v) in P_xy }|
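The definition can be checked by brute force on a small graph. The sketch below is only an illustration, not the method used in the slides: it counts, for every unordered pair (x, y), how many shortest x-y paths cross each edge, using one BFS per node. The graph `two_triangles` (two triangles joined by the bridge 2-3) and all function names are assumptions introduced for this example; the graph is assumed to be undirected, unweighted, and free of self-loops.

```python
from collections import deque
from itertools import combinations


def bfs_counts(adj, source):
    """Return (dist, sigma): hop distance and number of shortest paths from source."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:          # first time v is reached
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:  # u precedes v on a shortest path
                sigma[v] += sigma[u]
    return dist, sigma


def edge_betweenness(adj):
    """Count shortest paths (over unordered node pairs) through every edge."""
    info = {s: bfs_counts(adj, s) for s in adj}
    score = {frozenset((u, v)): 0 for u in adj for v in adj[u]}
    for x, y in combinations(adj, 2):
        dist_x, sigma_x = info[x]
        dist_y, sigma_y = info[y]
        if y not in dist_x:
            continue  # x and y are in different components
        for edge in score:
            u, v = tuple(edge)
            # The edge can be crossed as u->v or v->u, never both on shortest paths.
            for a, b in ((u, v), (v, u)):
                if a in dist_x and b in dist_y and dist_x[a] + 1 + dist_y[b] == dist_x[y]:
                    score[edge] += sigma_x[a] * sigma_y[b]
    return score


# Hypothetical example graph: triangles {0,1,2} and {3,4,5} joined by edge 2-3.
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
                 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
scores = edge_betweenness(two_triangles)
print(scores[frozenset((2, 3))])  # the bridge carries every cross-triangle path
```

The bridge receives the largest score because every path between the two triangles must use it, which is the intuition behind removing high-betweenness edges first.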
Why Edge Betweenness

Algorithm
- Initialize G = (V, E) representing a network
- While E is not empty:
  - Calculate the betweenness of all edges in G
  - Remove the edge e with the highest betweenness: G = (V, E – e)
- In fact, we only need to recalculate the betweenness of the edges affected by the removal
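The removal loop above can be sketched as follows. This is a minimal illustration under stated assumptions rather than a definitive implementation: betweenness is computed with a Brandes-style accumulation (which splits credit fractionally when a pair has several shortest paths), everything is recomputed from scratch after each removal, and the loop stops at the first split instead of continuing until E is empty. The names `girvan_newman_split`, `components`, and the example graph are hypothetical.

```python
from collections import deque


def edge_betweenness(adj):
    """Brandes-style edge betweenness for an undirected, unweighted graph."""
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        dist, sigma, preds, order = {s: 0}, {s: 1.0}, {s: []}, []
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v], sigma[v], preds[v] = dist[u] + 1, 0.0, []
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        delta = {v: 0.0 for v in order}
        for w in reversed(order):  # accumulate dependencies from the leaves up
            for u in preds[w]:
                credit = sigma[u] / sigma[w] * (1.0 + delta[w])
                score[frozenset((u, w))] += credit
                delta[u] += credit
    return {e: b / 2.0 for e, b in score.items()}  # each pair was seen from both ends


def components(adj):
    """Connected components of a graph given as {node: set(neighbors)}."""
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = set(), [s]
        while stack:
            u = stack.pop()
            if u not in comp:
                comp.add(u)
                stack.extend(adj[u] - comp)
        seen |= comp
        comps.append(comp)
    return comps


def girvan_newman_split(adj):
    """Remove highest-betweenness edges until the graph gains a component."""
    adj = {u: set(vs) for u, vs in adj.items()}  # work on a copy
    start = len(components(adj))
    while len(components(adj)) == start and any(adj.values()):
        bet = edge_betweenness(adj)
        u, v = max(bet, key=bet.get)
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)


# Hypothetical example: two triangles joined by the bridge 2-3.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
print(girvan_newman_split(graph))
```

Calling the function again on each resulting component would produce the next levels of the hierarchy.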
Time Complexity
- Let |V| = n and |E| = m
- Calculating the betweenness of all edges: O(mn)
- Since we need to recalculate each time we remove an edge: O(m^2 n)

An Example

Disadvantages/Improvements
- Can we improve the time complexity?
- The communities come out in hierarchical form; can we find disjoint communities instead?

Define the quantity (measurement) of modularity Q and find an approximation algorithm to maximize Q
Finding community structure in very large networks
Authors: Aaron Clauset, M. E. J. Newman, Cristopher Moore (2004)
- Consider edges that fall within a community or between a community and the rest of the network
- Define modularity:
  $Q = \frac{1}{2m} \sum_{vw} \left[ A_{vw} - \frac{k_v k_w}{2m} \right] \delta(c_v, c_w)$
  - $A_{vw}$: adjacency matrix
  - $\frac{k_v k_w}{2m}$: the probability of an edge between two vertices is proportional to their degrees
  - $\delta(c_v, c_w) = 1$ if vertices v and w are in the same community, 0 otherwise
- For a random network, Q = 0: the number of edges within a community is no different from what you would expect
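As a rough numerical check of the modularity definition, the sketch below evaluates Q = (1/2m) Σ_vw [A_vw − k_v k_w / 2m] δ(c_v, c_w) directly as a double sum over vertex pairs. The function name, the dictionary-based graph format, and the example graph are assumptions made for illustration; the quadratic loop is meant for small graphs only.

```python
def modularity(adj, community):
    """Modularity Q of a partition of an undirected, unweighted graph.

    adj: {node: iterable of neighbors}; community: {node: community label}.
    """
    two_m = sum(len(nbrs) for nbrs in adj.values())  # sum of degrees = 2m
    q = 0.0
    for v in adj:
        for w in adj:
            if community[v] != community[w]:
                continue  # delta(c_v, c_w) = 0
            a_vw = 1.0 if w in adj[v] else 0.0
            q += a_vw - len(adj[v]) * len(adj[w]) / two_m
    return q / two_m


# Hypothetical example: two triangles joined by the bridge 2-3.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
merged = {v: "all" for v in graph}
print(modularity(graph, split), modularity(graph, merged))
```

Putting every vertex in one community gives Q = 0, while the two-triangle split gives a positive value, consistent with the idea that Q measures edges in excess of what the degree-based null model expects.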
Finding community structure in very large networks
Authors: Aaron Clauset, M. E. J. Newman, Cristopher Moore (2004)
- Algorithm
  - Start with all vertices as isolates
  - Follow a greedy strategy:
    - Successively join the clusters with the greatest increase ΔQ in modularity
    - Stop when the maximum possible ΔQ from joining any two clusters is <= 0
  - Successfully used to find community structure in a graph with > 400,000 nodes and > 2 million edges
    - Amazon's "people who bought this also bought that…"
  - Alternatives for achieving optimum ΔQ:
    - Simulated annealing rather than greedy search
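The greedy strategy can be sketched naively as follows. This is not the data-structure-optimized algorithm from the paper, only a small illustration of the join rule: it rescans all community pairs in every round, which is far too slow for large graphs. It assumes the standard expression for the gain of joining communities a and b, ΔQ = e_ab / m − d_a d_b / (2 m^2), where e_ab is the number of edges between them and d_a, d_b are their total degrees; the function name and example graph are hypothetical.

```python
from itertools import combinations


def greedy_modularity(adj):
    """Naive agglomerative modularity maximization on an undirected graph."""
    m = sum(len(nbrs) for nbrs in adj.values()) / 2
    comms = {v: {v} for v in adj}            # community id -> members
    degree = {v: len(adj[v]) for v in adj}   # community id -> total degree
    while len(comms) > 1:
        best_gain, best_pair = 0.0, None
        for a, b in combinations(comms, 2):
            e_ab = sum(1 for u in comms[a] for w in adj[u] if w in comms[b])
            gain = e_ab / m - degree[a] * degree[b] / (2 * m * m)
            if gain > best_gain:
                best_gain, best_pair = gain, (a, b)
        if best_pair is None:
            break  # the best possible join has gain <= 0: stop
        a, b = best_pair
        comms[a] |= comms.pop(b)
        degree[a] += degree.pop(b)
    return list(comms.values())


# Hypothetical example: two triangles joined by the bridge 2-3.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
print(greedy_modularity(graph))
```

On this toy graph the joins stop once the two triangles have formed, because merging them would lower Q.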
Extensions to weighted networks
- Betweenness clustering?
  - Will not work: strong ties will have a disproportionate number of short paths, and those are the ones we want to keep
- Modularity (Analysis of weighted networks, M. E. J. Newman)
  - Example (figure): weighted edges between keywords of Reuters news articles

Structural Quality
- Coverage
- Modularity
- Conductance
- Inter-cluster conductance
- Average conductance
- There is no single perfect quality function. [Almeida et al. 2011]
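Two of the listed measures are easy to compute directly. The sketch below assumes the commonly used definitions, which are not spelled out in the slides: coverage is the fraction of edges whose endpoints lie in the same cluster, and the conductance of a cluster S is the number of edges leaving S divided by the smaller of the total degree inside S and the total degree outside S. Function names and the example graph are hypothetical.

```python
def coverage(adj, community):
    """Fraction of edges that fall inside clusters."""
    intra = sum(1 for u in adj for v in adj[u] if community[u] == community[v])
    total = sum(len(nbrs) for nbrs in adj.values())
    return intra / total  # both counts see each edge twice, so the ratio is exact


def conductance(adj, cluster):
    """Cut size of `cluster` divided by the smaller side's total degree.

    Assumes both sides of the cut contain at least one edge endpoint.
    """
    cluster = set(cluster)
    cut = sum(1 for u in cluster for v in adj[u] if v not in cluster)
    vol_in = sum(len(adj[u]) for u in cluster)
    vol_out = sum(len(adj[u]) for u in adj if u not in cluster)
    return cut / min(vol_in, vol_out)


# Hypothetical example: two triangles joined by the bridge 2-3.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
print(coverage(graph, split), conductance(graph, {0, 1, 2}))
```

High coverage and low conductance both indicate well-separated clusters, but each can be gamed on its own (a single all-inclusive cluster has coverage 1), which is one reason no single quality function is considered sufficient.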
Resolution Limit
- $l_s$: # links inside module s
- $L$: # links in the network
- $d_s$: the total degree of the nodes in module s
- $d_s^2 / 4L$: expected # of links in module s

The Limit of Modularity
- Modularity seems to have some intrinsic scale of order $\sqrt{L}$, which constrains the number and the size of the modules.
- For a given total number of nodes and links we could build many more than $\sqrt{L}$ modules, but the corresponding network would be less "modular", namely with a value of the modularity lower than the maximum
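A small numerical experiment makes the scale effect concrete. The sketch below uses the module-sum form of modularity, Q = Σ_s [ l_s / L − (d_s / 2L)^2 ], with the symbols l_s, d_s, and L as defined for the resolution limit. The test network is an assumption brought in for illustration, not taken from the slides: a ring of identical cliques in which each clique is linked to the next by a single edge (the number of cliques is assumed even). The code compares the partition with one module per clique against the partition that merges adjacent cliques in pairs.

```python
def partition_modularity(modules, total_links):
    """Q for a list of (l_s, d_s) pairs: internal links and total degree per module."""
    return sum(l / total_links - (d / (2 * total_links)) ** 2 for l, d in modules)


def ring_of_cliques(n_cliques, clique_size):
    """Return (Q with one module per clique, Q with adjacent cliques merged in pairs)."""
    inside = clique_size * (clique_size - 1) // 2   # links inside one clique
    total_links = n_cliques * (inside + 1)          # plus one ring link per clique
    single = [(inside, 2 * inside + 2)] * n_cliques
    # A merged pair holds two cliques plus the ring link between them.
    pairs = [(2 * inside + 1, 4 * inside + 4)] * (n_cliques // 2)
    return (partition_modularity(single, total_links),
            partition_modularity(pairs, total_links))


for n in (10, 30):
    q_single, q_pairs = ring_of_cliques(n, 5)
    print(n, round(q_single, 4), round(q_pairs, 4))
```

With 10 cliques of size 5 the natural one-module-per-clique partition scores higher, but with 30 cliques the merged partition wins, even though each clique is as dense as a subgraph can be. Modularity maximization therefore hides the small modules once the network is large enough.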
The Resolution Limit
Since M1 and M2 are constructed modules, we have

The Resolution Limit (cont)
Let's consider the following cases:
- $Q_A$: M1 and M2 are separate modules
- $Q_B$: M1 and M2 form a single module
Since both M1 and M2 are modules by construction, we need
That is,

The Resolution Limit (cont)
Now let's see how it contradicts the constructed modules M1 and M2. We consider the following two scenarios:
- The two modules have a perfect balance between internal and external degree ($a_1 + b_1 = 2$, $a_2 + b_2 = 2$), so they are on the edge between being or not being communities, in the weak sense.
- The two modules have the smallest possible external degree, which means that there is a single link connecting them to the rest of the network and only one link connecting each other ($a_1 = a_2 = b_1 = b_2 = 1/l$).

Scenario 1 (cont)
When and , the right side of can reach the maximum value
In this case, may happen.

Scenario 2 (cont)
$a_1 = a_2 = b_1 = b_2 = 1/l$

Schematic Examples (cont)
For example, p = 5, m = 20
The maximal modularity of the network corresponds to the partition in which the two smaller cliques are merged

Fix the resolution?
- Uncover communities of different sizes
Community Detection Algorithms
- Blondel (Louvain method), [Blondel et al. 2008]
  - Fast Modularity Optimization
  - Hierarchical clustering
- Infomap, [Rosvall & Bergstrom 2008]
  - Maps of Random Walks
  - Flow-based and information theoretic
- InfoH (InfoHiermap), [Rosvall & Bergstrom 2011]
  - Multilevel Compression of Random Walks
  - Hierarchical version of Infomap

Community Detection Algorithms
- RN, [Ronhovde & Nussinov 2009]
  - Potts Model Community Detection
  - Minimization of the Hamiltonian of a Potts model spin system
- MCL, [Dongen 2000]
  - Markov Clustering
  - Random walks stay longer in dense clusters
- LC, [Ahn et al. 2010]
  - Link Community Detection
  - A community is redefined as a set of closely interrelated edges
  - Overlapping and hierarchical clustering

Blondel et al.
- Two Phases:
  - Phase 1:
    - Initially, we have n communities (each node is a community)
    - For each node i, consider each neighbor j of i and evaluate the modularity gain that would result from placing i in the community of j
    - Node i is placed in the community for which this gain is maximum (and positive)
    - Stop this process when no further improvement can be achieved
  - Phase 2:
    - Compress each community into a node, thus constructing a new graph representing the community structure after Phase 1
    - Re-apply Phase 1
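Phase 1 can be sketched for an unweighted graph as below. This is a simplified illustration, not the reference implementation: it assumes no self-loops, visits nodes in dictionary order, and omits Phase 2 (the compression step). The gain of placing an isolated node i with degree k_i into community C is taken to be k_{i,C} / m − d_C k_i / (2 m^2), where k_{i,C} is the number of edges from i to C and d_C is the total degree of C; the function name and example graph are hypothetical.

```python
from collections import defaultdict


def louvain_phase1(adj):
    """Local moving phase: returns {node: community id} for an undirected graph."""
    m = sum(len(nbrs) for nbrs in adj.values()) / 2
    comm = {v: v for v in adj}             # every node starts in its own community
    tot = {v: len(adj[v]) for v in adj}    # total degree per community
    improved = True
    while improved:
        improved = False
        for i in adj:
            k_i = len(adj[i])
            old = comm[i]
            tot[old] -= k_i                # take i out of its community
            links = defaultdict(int)       # edges from i to each neighboring community
            for j in adj[i]:
                links[comm[j]] += 1
            # Default choice: put i back where it was.
            best = old
            best_gain = links.get(old, 0) / m - tot[old] * k_i / (2 * m * m)
            for c, k_in in links.items():
                gain = k_in / m - tot[c] * k_i / (2 * m * m)
                if gain > best_gain + 1e-12:
                    best, best_gain = c, gain
            comm[i] = best
            tot[best] += k_i
            if best != old:
                improved = True
    return comm


# Hypothetical example: two triangles joined by the bridge 2-3.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
print(louvain_phase1(graph))
```

Because each node only looks at the communities of its neighbors, one sweep costs time proportional to the number of edges, which is where the speed of the method comes from.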
State-of-the-art methods
- Evaluated by Lancichinetti and Fortunato, Physical Review E 09
  - Infomap [Rosvall and Bergstrom, PNAS 07]
  - Blondel's method [Blondel et al., J. of Statistical Mechanics: Theory and Experiment 08]
  - Ronhovde & Nussinov's method (RN) [Phys. Rev. E, 09]
- Many other recent heuristics
  - OSLOM, QCA…
- No provable performance guarantee
- Need approximation algorithms
Power-Law Networks
PLNs Model P(α, β)
LDF Algorithm – The Basis
LDF Algorithm
An Example of LDF
Theorem: Sketch of the proof
LDF Undirected – Theorem
D-LDF – Directed Networks
D-LDF – Directed Networks (cont)
LDF – Directed Networks
Dynamic Community Structure
[Figure: network evolution over time points t, t+1, t+2; labels: merge, move, more edges]

Quantifying social group evolution (Palla et al., Nature 07)
- Developed an algorithm based on clique percolation, which makes it possible to investigate the time dependence of overlapping communities
  - Uncover basic relationships characterizing community evolution
  - Understand the development and self-optimization

Findings
- Fundamental differences between the dynamics of small and large groups
  - Large groups persist for longer and are capable of dynamically altering their membership
  - Small groups: their composition must remain unchanged for them to be stable
- Knowledge of the time commitment of members to a given community can be used for estimating the community's lifetime
Research Problems
- How to update the evolving community structure (CS) without re-computing it
- Why?
  - Re-computing has prohibitive computational costs
  - Re-computing can introduce incorrect evolution phenomena
- How to predict new relationships based on the evolving CS

An Adaptive Model
- Input network → basic CS detection → basic communities
- Basic communities + network changes → updated communities
- Need to handle:
  - Node insertion
  - Edge insertion
  - Node removal
  - Edge removal

Related Work in Dynamic Networks
- GraphScope [J. Sun et al., KDD 2007]
- FacetNet [Y-R. Lin et al., WWW 2008]
- Bayesian inference approach [T. Yang et al., J. Machine Learning, 2010]
- QCA [N. P. Nguyen and M. T. Thai, INFOCOM 2011]
- OSLOM [A. Lancichinetti et al., PLoS ONE, 2011]
- AFOCS [Nguyen et al., MobiCom 2011]
An Adaptive Algorithm for Overlapping
- Our solution: AFOCS, a 2-phase and limited input dependent framework
  - Phase 1: Basic CS detection (input network → basic communities)
  - Phase 2: Adaptive CS update (basic communities + network changes → updated communities)
N. Nguyen and M. T. Thai, ACM MobiCom 2011

Phase 1: Basic Communities Detection
- Basic communities
  - Dense parts of the network
  - Can possibly overlap
  - Bases for the adaptive CS update
- Duties
  - Locates basic communities
  - Merges them if they are highly overlapped

Phase 1: Basic Communities Detection
- Locating basic communities: when (C) = 0.9 (C) = 0.725
- Merging: when OS(Ci, Cj) = 1.027 = 0.75

Phase 1: Basic Communities Detection

Phase 2: Adaptive CS Update
- Update network communities when changes are introduced (basic communities + network changes → updated communities)
- Need to handle:
  - Adding a node/edge
  - Removing a node/edge
- Approach:
  - Locally locate new local communities
  - Merge them if they highly overlap with current ones
Phase 2: Adding a New Node

Phase 2: Adding a New Edge

Phase 2: Removing a Node
- Identify the left-over structure(s) on C{u}
- Merge overlapping substructure(s)

Phase 2: Removing an Edge
- Identify the left-over structure(s) on C{u, v}
- Merge overlapping substructure(s)

AFOCS performance: Choosing β

AFOCS vs. Static Detection
- CFinder [G. Palla et al., Nature 2005]
- COPRA [S. Gregory, New J. of Physics, 2010]

AFOCS vs. Other Dynamic Methods
- iLCD [R. Cazabet et al., SOCIALCOM 2010]
Adaptive CS Detection in Dynamic Networks
- Running time is proportional to the amount of changes
- Can be locally employed
- More consistent community structure: critical for applications such as routing
- Flowchart (figure): START, Initial Network, Changes in the Network, Compact Representation Graph (CRG), Refine CS, Output CS

Adaptive CS Detection in Dynamic Networks (cont)

A-LDF – Dynamic Network Algorithm
- Flowchart (figure): START, Initial Network, Changes in the Network, Compact Representation Graph, Refine CS, Output CS
- Both selected as the LDF algorithm (without the refining phase)
- Compact representation:
  - Label nodes that represent communities with their leader
  - Unlabel all pulled-out nodes (nodes that are incident to changes)
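One possible reading of the compact representation step is sketched below. It is an assumption-laden illustration, not the authors' construction: every existing community is collapsed into a single super-node, nodes incident to a change are pulled out and kept as individual nodes, and edge weights count how many original edges run between (or inside) the resulting units. The function name, the tuple labels, and the example are hypothetical; the adjacency is assumed to already include the changes.

```python
from collections import Counter


def compact_representation(adj, community, changed_nodes):
    """Collapse communities into super-nodes, keeping changed nodes separate.

    Returns {frozenset of one or two unit labels: number of original edges}.
    A one-element key is a self-loop holding a unit's internal edges.
    """
    def label(v):
        # Pulled-out nodes stay individual; others are represented by their community.
        return ("node", v) if v in changed_nodes else ("community", community[v])

    weights = Counter()
    for u in adj:
        for v in adj[u]:
            weights[frozenset((label(u), label(v)))] += 1
    # Every undirected edge was visited from both endpoints.
    return {pair: w // 2 for pair, w in weights.items()}


# Hypothetical example: communities A = {0,1,2} and B = {3,4,5}; edge 0-3 was just added.
graph = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1, 3],
         3: [0, 2, 4, 5], 4: [3, 5], 5: [3, 4]}
community = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
crg = compact_representation(graph, community, changed_nodes={0, 3})
for pair, weight in crg.items():
    print(sorted(pair), weight)
```

The compact graph here has four units instead of six nodes, so a static algorithm run on it only has to decide where the pulled-out nodes belong, which is consistent with a running time proportional to the amount of change.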
A-LDF – Dynamic Network Algorithm (cont)

Experimental Results
- Datasets
  - Static data sets: Karate Club, Dolphin, Twitter, Flickr, etc.
  - Dynamic social networks:
    - Facebook (New Orleans): 63K nodes, 1.5M edges
    - arXiv citation network: 225K articles, ~40K new each year
Static Networks: Size
# | Network | Vertices | Edges
1 | Karate | 34 | 78
2 | Dolphin | 62 | 159
3 | Les Miserables | 77 | 254
4 | Political Books | 105 | 441
5 | Ame. Col. Fb. | 115 | 613
6 | Elec. Cir. S838 | 512 | 819
7 | Erdos Scie. Collab. | 6,100 | 9,939
8 | Foursquare | 44,832 | 1,664,402
9 | Facebook | 63,731 | 905,565
10 | Twitter | 88,484 | 2,364,322
11 | Flickr | 80,513 | 5,899,882
Performance Evaluation
[Figures: modularity of LDF, Blondel, and Optimal on the static networks; running time in seconds of LDF and Blondel]

Evaluation in Dynamic Networks (FB)
[Figures: modularity, running time, number of communities, and NMI over time points for LDF, OSLOM, QCA, and Blondel]

Evaluation in Dynamic Networks (arXiv)
[Figures: modularity, running time, number of communities, and NMI over time points for LDF, OSLOM, QCA, and Blondel]
Incorporate other information
- Social connections
  - Friendship (mutual) relation (Facebook, Google+)
  - Follower (unidirectional) relation (Twitter)
- The discussed topics
  - Topics that people in a group are mostly interested in
- Social interaction types
  - Wall posts, private or group messages (Facebook)
  - Tweets, retweets (Twitter)
  - Comments

In rich-content social networks
- It is not only the topology that matters, but also:
  - User interests: a user may be interested in many communities
  - Community interests: a community may be interested in different topics

In rich-content social networks
- Communities = "groups of users who are interconnected and communicate on shared topics"
  - Interconnected by social connection and interaction types
- Given a social network with
  - Network topology
  - Users and their social connections and interactions
  - Topics of interest
- How can we find meaningful communities as well as their topics of interest?

Approaches
- Use Bayesian models to extract latent communities
  - Topic User Community Model (TUCM)
    - Posts/tweets can be broadcast
  - Topic User Recipient Community Model (TURCM)
    - Posts/tweets are restricted to desired users only
  - Full Topic User Recipient Community Model (Full TURCM)
    - A post/tweet generated by a user can be based on multiple topics

Assumptions
- A user can belong to multiple communities
- A community can participate in multiple topics
- For TUCM and TURCM
  - Posts in general discuss one topic only
- Full TURCM
  - Posts can discuss multiple topics
Background
- Multinomial distribution, Mult(.)
  - n trials
  - k possible outcomes with probabilities p_1, p_2, …, p_k that sum up to 1
  - X_1, X_2, …, X_k (X_i denotes the number of times outcome #i appears in n trials)

Multinomial distribution
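For reference, the multinomial probability of observing counts x_1, …, x_k in n = x_1 + … + x_k trials is n! / (x_1! ⋯ x_k!) · p_1^{x_1} ⋯ p_k^{x_k}. The sketch below evaluates it with the standard library only; the function name is an assumption for this example, and `math.prod` requires Python 3.8 or later.

```python
from math import factorial, prod


def multinomial_pmf(counts, probs):
    """P(X_1 = counts[0], ..., X_k = counts[k-1]) for n = sum(counts) trials."""
    n = sum(counts)
    coeff = factorial(n)
    for x in counts:
        coeff //= factorial(x)  # stays an exact integer at every step
    return coeff * prod(p ** x for p, x in zip(probs, counts))


# Two fair-coin flips landing one head and one tail.
print(multinomial_pmf((1, 1), (0.5, 0.5)))
```

In the topic models that follow, a draw of this kind is what picks a community or a topic given a vector of mixing probabilities.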
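Symmetric Dirichlet Distribution
- $\mathrm{Dir}_K(\alpha)$, where $\alpha = (\alpha_1, \dots, \alpha_K)$, on variables $x_1, x_2, \dots, x_K$ with $x_K = 1 - (x_1 + \dots + x_{K-1})$, has probability density
  $f(x_1, \dots, x_K; \alpha) = \frac{\Gamma\left(\sum_{i=1}^{K} \alpha_i\right)}{\prod_{i=1}^{K} \Gamma(\alpha_i)} \prod_{i=1}^{K} x_i^{\alpha_i - 1}$
- In the symmetric case all $\alpha_i$ are equal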
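The Dirichlet density, Γ(Σ α_i) / Π Γ(α_i) · Π x_i^{α_i − 1} for points x on the probability simplex, can be evaluated in log space to avoid overflow in the gamma functions. The sketch below is a minimal illustration with a hypothetical function name; it assumes every x_i is strictly positive and that the x_i sum to 1.

```python
from math import exp, lgamma, log


def dirichlet_pdf(x, alpha):
    """Density of Dir(alpha) at a point x on the simplex (all x_i > 0)."""
    log_norm = lgamma(sum(alpha)) - sum(lgamma(a) for a in alpha)
    log_kernel = sum((a - 1) * log(xi) for a, xi in zip(alpha, x))
    return exp(log_norm + log_kernel)


# With alpha = (1, 1, 1) the distribution is uniform over the simplex.
print(dirichlet_pdf((0.2, 0.3, 0.5), (1, 1, 1)))
```

The Dirichlet is the usual prior for the multinomial parameters in Bayesian mixture models, since the posterior after observing counts is again a Dirichlet.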
Notations
- Observation variables
- Latent variables

Notations (cont'd)

Topic User Community Model
- Social Interaction Profile: SIP(u_i)
- The SIP of users is represented as random mixtures over latent community variables
  - Each community is in turn defined as a distribution over the interaction space

Topic User Community Model (figure: steps 1 and 2)

Topic User Community Model (figure: steps 3a and 3b)

TUCM
- Model presentation
- A Bayesian decomposition

TUCM – Parameter Estimation
Topic User Recipient Community
- This model
  - Does not allow mass messaging
  - The sender typically sends out messages to his/her acquaintances
  - The posts are on a topic that both sender and recipient are interested in
- In the same spirit as TUCM
  - Now we have user u_j for all u_j in R_i

TURC

Full TURC Model
- Previous models
  - Assume that each post generated by a user is based on a single topic
- Full TURC
  - Relaxes this requirement
  - Communities now have a higher relationship to authors

Full TURC Model (cont)

Experiments
- Data
  - 6 months of Twitter in 2009
    - 5405 nodes, 13214 edges, 23043 posts
  - Enron email
    - 150 nodes, ~300K emails in total
- Number of communities C = 10
- Number of topics = 20
- Competitor methods: CUT and CART
Results