Properties of network community structure
Jure Leskovec, CMU
Kevin Lang, Anirban Dasgupta, Michael Mahoney, Yahoo! Research
Networks
• Big data
• Study emerging behaviors
• How are small networks different from large ones?
Network community structure
• Communities (groups, clusters, modules): sets of nodes with many connections inside and few to the outside (the rest of the network)
Example 1: Biology
• Nodes represent proteins
• Edges represent interactions/associations
• Proteins with the same function interact more
• Can use the network to discover functional groups
• Figure: yeast transcriptional regulatory modules [Bar-Joseph et al., 2003]
Example 2: Social networks
• Clusters correspond to social communities and organizational units (e.g., departments)
• Zachary’s karate club network
• During the study the club split into 2
• The split corresponds to the min-cut (● vs. ■)
Example 3: Web (blogs)
• Democrat vs. Republican blogs [Adamic-Glance 2005]
Example 4: Science
• Citations
• Collaborations [Newman 2003]
Example 5: Hierarchies
• Nested communities: the modular structure of networks is hierarchically organized
• Figure labels: University; Arts, Science; CS, Math, Drama, Music
Example 6: Hierarchies
• Recursive hierarchical network: (a) N=5, E=8; (b) N=25, E=56; (c) N=125, E=344
Discovering communities
• Intuition: find nodes that can be easily separated from the rest of the network
• Various objective functions: min-cut, normalized cut, centrality, modularity
• Various algorithms: spectral clustering (random walks), Girvan-Newman (centrality), Metis (contraction based)
• Girvan-Newman: 1) compute betweenness centrality, the number of shortest paths passing through an edge; 2) remove edges in order of decreasing centrality
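The two Girvan-Newman steps listed above can be sketched in a few lines of plain Python. This is a minimal illustration, not the implementation used for the talk: the function names (`edge_betweenness`, `girvan_newman_split`) and the toy graph are hypothetical, the graph is assumed to be connected, undirected, and unweighted, and betweenness is recomputed after every removal, as in the original Girvan-Newman procedure.

```python
from collections import deque

def edge_betweenness(adj):
    """Number of shortest paths through each edge (split evenly among ties)."""
    bet = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        # BFS from s, counting shortest paths (sigma) and predecessors.
        dist = {s: 0}
        sigma = {u: 0 for u in adj}
        sigma[s] = 1
        preds = {u: [] for u in adj}
        order = []
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        # Accumulate path counts back from the farthest nodes.
        delta = {u: 0.0 for u in adj}
        for v in reversed(order):
            for u in preds[v]:
                c = sigma[u] / sigma[v] * (1 + delta[v])
                bet[frozenset((u, v))] += c
                delta[u] += c
    # Every unordered pair of endpoints was counted from both sides.
    return {e: b / 2 for e, b in bet.items()}

def components(adj):
    """Connected components as a list of node sets."""
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = {s}, [s]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in comp:
                    comp.add(v)
                    stack.append(v)
        seen |= comp
        comps.append(comp)
    return comps

def girvan_newman_split(adj):
    """Remove highest-betweenness edges until the graph falls apart."""
    adj = {u: set(vs) for u, vs in adj.items()}  # work on a copy
    while len(components(adj)) == 1:
        bet = edge_betweenness(adj)
        u, v = max(bet, key=bet.get)
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)

# Hypothetical toy graph: two triangles joined by the edge 2-3.
toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(girvan_newman_split(toy))
```

On the toy graph the bridge 2-3 carries every shortest path between the two triangles, so it is removed first and the two triangles come out as the two communities.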
Our question: How well are communities expressed?
Statistical properties of community structure
• Instead of searching for communities, we measure how well expressed communities are
Questions
• What is the community structure of real-world networks?
• How do we measure and quantify it?
• What does this tell us about network structure?
• What is a good model (intuition)?
• What are the consequences for clustering/partitioning algorithms?
Community score (quality)
• How community-like is a set of nodes S?
• Need a natural, intuitive measure
• Conductance (normalized cut): Φ(S) = # edges cut / # edges inside
• Small Φ(S) corresponds to more community-like sets of nodes
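The score as written on this slide can be computed directly from an edge list. The sketch below follows the slide's simplified ratio (edges cut over edges inside); the more common textbook definition of conductance divides the cut by the volume of S (the sum of its degrees) instead. The function name `community_score` and the toy graph are illustrative assumptions.

```python
def community_score(edges, S):
    """Slide's score: (# edges leaving S) / (# edges with both ends in S)."""
    S = set(S)
    cut = sum(1 for u, v in edges if (u in S) != (v in S))
    inside = sum(1 for u, v in edges if u in S and v in S)
    if inside == 0:
        return float("inf")  # no internal edges: worst possible score
    return cut / inside

# Hypothetical toy graph: two triangles joined by the edge 2-3.
toy_edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]

# One triangle: 1 edge cut, 3 edges inside.
print(community_score(toy_edges, {0, 1, 2}))
# A triangle plus one node of the other: 2 edges cut, 4 edges inside.
print(community_score(toy_edges, {0, 1, 2, 3}))
```

The triangle scores lower (better) than the four-node set, matching the intuition that a good community has few edges leaving it relative to the edges it contains.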
Community score (quality)
• What is the “best” community of 5 nodes?
• Score: Φ(S) = # edges cut / # edges inside
Community score (quality)
• What is the “best” community of 5 nodes?
• Bad community: Φ=5/6 = 0.83
• Score: Φ(S) = # edges cut / # edges inside
Community score (quality)
• What is the “best” community of 5 nodes?
• Bad community: Φ=5/7 = 0.7
• Better community: Φ=2/5 = 0.4
• Score: Φ(S) = # edges cut / # edges inside
Community score (quality)
• What is the “best” community of 5 nodes?
• Bad community: Φ=5/7 = 0.7
• Better community: Φ=2/5 = 0.4
• Best community: Φ=2/8 = 0.25
• Score: Φ(S) = # edges cut / # edges inside
Network Community Profile Plot
• We define: network community profile (NCP) plot
• Plot the score of the best community of size k
• Search over all subsets of size k and find the best: Φ(k=5) = 0.25
• The NCP plot is intractable to compute exactly
Network Community Profile Plot
• We define: network community profile (NCP) plot
• Plot the score of the best community of size k
• Example points: k=5, Φ(k)=0.25; k=7, Φ(k)=0.18
• Axes: community size, log k (x) vs. community score, log Φ(k) (y)
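To make the definition concrete, the following sketch computes the NCP of a tiny graph by exhaustive search over all node subsets of each size, which is exactly the search that becomes intractable on real networks. It uses the slide's score (edges cut over edges inside); `score`, `ncp_brute_force`, and the toy graph are hypothetical names and data chosen for illustration.

```python
from itertools import combinations

def score(edges, S):
    """Slide's score: edges cut / edges inside (infinite if none inside)."""
    cut = sum(1 for u, v in edges if (u in S) != (v in S))
    inside = sum(1 for u, v in edges if u in S and v in S)
    return cut / inside if inside else float("inf")

def ncp_brute_force(nodes, edges, max_k):
    """Best (lowest) score over all subsets of size k, for k = 1..max_k."""
    profile = {}
    for k in range(1, max_k + 1):
        profile[k] = min(score(edges, set(S)) for S in combinations(nodes, k))
    return profile

# Hypothetical toy graph: two triangles joined by the edge 2-3.
ncp_nodes = range(6)
ncp_edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]

for k, phi in ncp_brute_force(ncp_nodes, ncp_edges, 4).items():
    print(k, phi)
```

The number of subsets of size k grows combinatorially, so this only works for graphs with a handful of nodes; it is meant to pin down what the plot measures, not to be used at scale.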
NCP: Example
• Figure: NCP plot; x-axis: community size, log k; y-axis: community score, log Φ(k)
Scaling to large networks
• Local spectral clustering algorithm: pick a seed node, slowly diffuse mass around it (via a PageRank-like random walk), and find the bottleneck
• Repeat many times
• Many seed nodes for very local walks
• Fewer seed nodes for more global (longer) walks
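One way to read "diffuse mass from a seed and find the bottleneck" is a personalized PageRank vector followed by a sweep cut. The sketch below is an assumption about how such a procedure could look, not the algorithm used in the talk: it runs plain power iteration rather than a fast local approximation, the teleport probability `alpha` and iteration count `iters` are hypothetical defaults, and the sweep uses the slide's score (edges cut over edges inside). The graph is assumed to be simple, undirected, and free of isolated nodes.

```python
def personalized_pagerank(adj, seed, alpha=0.15, iters=100):
    """Random walk that teleports back to `seed` with probability alpha."""
    p = {u: 0.0 for u in adj}
    p[seed] = 1.0
    for _ in range(iters):
        nxt = {u: 0.0 for u in adj}
        nxt[seed] += alpha  # total mass is 1, so alpha of it teleports
        for u, mass in p.items():
            share = (1 - alpha) * mass / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        p = nxt
    return p

def sweep_cut(adj, p):
    """Scan prefixes of nodes ordered by p[u]/degree(u); keep the best one."""
    order = sorted(adj, key=lambda u: p[u] / len(adj[u]), reverse=True)
    S, inside, cut = set(), 0, 0
    best_set, best_score = None, float("inf")
    for u in order[:-1]:  # skip the prefix that is the whole graph
        to_S = sum(1 for v in adj[u] if v in S)
        inside += to_S
        # Edges from u into S stop being cut; u's other edges become cut.
        cut += len(adj[u]) - 2 * to_S
        S.add(u)
        if inside > 0 and cut / inside < best_score:
            best_score, best_set = cut / inside, set(S)
    return best_set, best_score

# Hypothetical toy graph: two triangles joined by the edge 2-3.
ppr_adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
ppr = personalized_pagerank(ppr_adj, seed=0)
print(sweep_cut(ppr_adj, ppr))
```

Repeating this from many seeds, with smaller `alpha` for longer walks, and recording the best score seen at each set size would give an approximate NCP along the lines the slide describes.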
What do NCP plots of real networks look like?
NCP plot: Small Social Network
• Dolphin social network
• Two communities of dolphins
• Figure: network and its NCP plot
NCP plot: Zachary’s karate club
• Zachary’s university karate club social network
• During the study the club split into 2
• The split (squares vs. circles) corresponds to cut B
• Figure: network and its NCP plot
NCP plot: Network Science
• Collaborations between scientists working on networks
• Figure: network and its NCP plot
NCP plot: Grids
• Figure: network and its NCP plot
NCP plot: Graphs with geometry
• Figure: network and its NCP plot
NCP plot: Manifold dataset
• Manifold learning dataset (Hands)
• Figure: network and its NCP plot
NCP plot: Power Grid
• Eastern US power grid
NCP plot: Hierarchical network
• Small social networks, geometric networks, and hierarchical networks have downward-sloping NCP plots
• What about large networks?
• Figure: network and its NCP plot
NCP of Large Networks
Our work: Large networks
• Previously, researchers examined the community structure of small networks (~100 nodes)
• We examined more than 70 different large networks
• Large real-world networks look very different!
Example of our findings
• Typical example: General relativity collaboration network (4,158 nodes, 13,422 edges)
NCP: LiveJournal (N=5M, E=42M)
• Figure: NCP plot; x-axis: community size; y-axis: community score
• Figure annotations: “Better and better communities”; “Best communities get worse and worse”; “Best community has 100 nodes”
Explanation: Downward part
• Whiskers are responsible for the downward slope of the NCP plot
• A whisker is a set of nodes connected to the network by a single edge
• Figure label: largest whisker
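Taking the slide's definition literally, whiskers can be found by looking for edges whose removal disconnects the graph and keeping the smaller side. The sketch below does this by brute force (one graph traversal per edge), which is fine for illustration but not for networks with millions of edges. The names `reachable` and `whiskers`, the integer node labels, the connected-graph assumption, and the choice to report only maximal whiskers are all assumptions made for this example.

```python
def reachable(adj, start, banned):
    """Nodes reachable from `start` without crossing the edge `banned`."""
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v in seen or {u, v} == banned:
                continue
            seen.add(v)
            stack.append(v)
    return seen

def whiskers(adj):
    """Maximal node sets attached to the rest of the graph by one edge."""
    n = len(adj)
    found = []
    for u in adj:
        for v in adj[u]:
            if u < v:  # visit each undirected edge once
                side = reachable(adj, u, {u, v})
                if v not in side:  # removing (u, v) disconnects the graph
                    other = set(adj) - side
                    found.append(side if len(side) <= n - len(side) else other)
    # Drop whiskers nested inside a larger whisker.
    return [w for w in found if not any(w < other for other in found)]

# Hypothetical toy graph: a 4-clique core {0,1,2,3},
# a path 3-4-5 hanging off node 3, and a pendant node 6 on node 0.
whisker_adj = {
    0: {1, 2, 3, 6}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2, 4},
    4: {3, 5}, 5: {4}, 6: {0},
}
print(whiskers(whisker_adj))
```

Each whisker has exactly one cut edge, so under the slide's score its value is 1 divided by its number of internal edges; larger, denser whiskers therefore push the NCP curve downward.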
Explanation: Upward part
• Each new edge inside the community costs more
• Φ=1/3 = 0.33; Φ=2/4 = 0.5; Φ=8/6 = 1.3; Φ=64/14 = 4.5
• Each node has twice as many children
• Figure: NCP plot
Comparison to rewired network
• Take a real network G
• Rewire edges for a long time
• We obtain a random graph with the same degree distribution as the real network G
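A common way to rewire while keeping every node's degree fixed is the double edge swap: pick two edges (a, b) and (c, d) and replace them with (a, d) and (c, b), rejecting swaps that would create a self-loop or a duplicate edge. The slides do not say which procedure was used, so the sketch below is an assumption; `rewire`, `degree_counts`, the attempt cap, and the toy graph are hypothetical.

```python
import random
from collections import Counter

def rewire(edges, n_swaps, seed=0):
    """Degree-preserving randomization of a simple undirected graph."""
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    done = attempts = 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if rng.random() < 0.5:  # try both orientations of the second edge
            c, d = d, c
        if a == d or c == b:  # swap would create a self-loop
            continue
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in present or new2 in present:  # would duplicate an edge
            continue
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edges[i], edges[j] = (a, d), (c, b)
        done += 1
    return edges

def degree_counts(edges):
    """Degree of every node, as a Counter."""
    return Counter(u for e in edges for u in e)

# Hypothetical toy graph: a 10-node ring with two chords.
ring = [(i, (i + 1) % 10) for i in range(10)] + [(0, 5), (2, 7)]
shuffled = rewire(ring, n_swaps=20, seed=1)
print(degree_counts(shuffled) == degree_counts(ring))
```

Comparing the NCP of the real network with the NCP of its rewired version separates what the degree distribution alone explains from what comes from genuine community structure.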
Comparison to a rewired network
• Rewired network: random network with the same degree distribution
LiveJournal whisker sizes
• Whiskers in real networks are larger than expected
Whisker shapes
• Whiskers in real networks are non-trivial (richer than trees)
• Figure label: edge to cut
Caveat: Bag of whiskers
• What if we allow cuts that give disconnected communities?
• Cut all whiskers
• Compose communities out of whiskers
• How good are the “communities” we get?
Communities made of whiskers
• We get better community scores when composing disconnected sets of whiskers
• Figure: community score vs. community size for connected communities and for bag of whiskers
What if I remove whiskers?
• Nothing happens!
• Now we have 2-edge-connected whiskers to deal with.
Conclusion: Core+Whiskers
• Figure: NCP plots for the rewired network, connected communities, and bag of whiskers
Conclusion: Network structure
• Network structure: core-periphery (jellyfish, octopus)
• Denser and denser core of the network
• Core contains 60% of the nodes and 80% of the edges
• Whiskers are responsible for good communities
What is a good model? Intuition?
(Sparse) Random graphs
• (Sparse) random graph: start with N nodes, pick pairs of nodes uniformly at random, and connect them
• Theorem (works for any degree distribution)
• NCP plot: flat (long random connections)
• Sparsity does not explain our observation
Preferential attachment
• Preferential attachment [Price 1965, Albert & Barabasi 1999]: add a new node, create m out-links
• The probability of linking to a node i is proportional to its degree k_i
• Based on Herbert Simon’s result: power laws arise from “rich get richer” (cumulative advantage)
• NCP plot: flat (connections to hubs, no locality)
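The linking rule above can be sketched with a standard trick: keep a list in which every node appears once per incident edge, so a uniform draw from that list picks a node with probability proportional to its degree. The starting graph (a clique on m+1 nodes), the function name `preferential_attachment`, and the parameter values are assumptions for illustration; the slides do not specify them.

```python
import random

def preferential_attachment(n, m, seed=0):
    """Grow a graph to n nodes; each new node adds m degree-biased links."""
    rng = random.Random(seed)
    # Assumed starting point: a clique on m + 1 nodes.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Each node appears here once per incident edge.
    endpoints = [u for e in edges for u in e]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:  # m distinct, degree-proportional targets
            targets.add(rng.choice(endpoints))
        for t in targets:
            edges.append((new, t))
            endpoints.extend((new, t))
    return edges

pa_edges = preferential_attachment(n=50, m=2, seed=1)
print(len(pa_edges))
```

Because targets are chosen by degree alone, with no notion of locality, new nodes attach to hubs anywhere in the graph, which is consistent with the flat NCP plot noted on the slide.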
Small-World model
• Let’s exploit local connections
• NCP plot: down (locally the network looks like a mesh) and then flat (at large scale the network looks random)
Geometry + Preferential attachment
• Geometric preferential attachment: place nodes at random in 2D, pick a node, pick nodes within a radius, connect preferentially
• NCP plot: flat (locally the network is random) and then down (globally the network is a mesh, a union of local expanders)
Model that works: Forest Fire
• Forest Fire: connections spread like a fire
• A new node joins the network, selects a seed node, connects to some of its neighbors, and continues recursively
• As a community grows it blends into the core of the network
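The growth process described above can be sketched as follows. This is a deliberately simplified, undirected variant written only from the steps on the slide: each neighbor of a burning node catches fire independently with probability `p_burn`, and the new node links to every burned node. The function name `forest_fire`, the parameter `p_burn` and its default value are hypothetical, and the sketch should not be read as the exact parameterization of the published model.

```python
import random

def forest_fire(n, p_burn=0.35, seed=0):
    """Grow an undirected graph on n nodes by recursive 'burning'."""
    rng = random.Random(seed)
    adj = {0: set()}
    for new in range(1, n):
        seed_node = rng.randrange(new)  # uniformly chosen existing node
        burned = {seed_node}
        frontier = [seed_node]
        while frontier:
            u = frontier.pop()
            for v in adj[u]:
                # The fire spreads to each unburned neighbor with p_burn.
                if v not in burned and rng.random() < p_burn:
                    burned.add(v)
                    frontier.append(v)
        # The new node links to everything the fire reached.
        adj[new] = set(burned)
        for u in burned:
            adj[u].add(new)
    return adj

ff_adj = forest_fire(200, seed=3)
print(sum(len(vs) for vs in ff_adj.values()) // 2, "edges")
```

Small values of `p_burn` keep fires local, so new nodes form small peripheral groups; larger values let fires reach far into the graph, which is one way to picture communities blending into the core as they grow.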
Forest Fire NCP plot
• Figure: NCP plots for Forest Fire, the rewired network, and bag of whiskers
Conclusion and connections
• Whiskers: the largest whisker has ~100 nodes, independent of network size
• Dunbar number: a person can maintain social relationships with at most 150 people
• Core: the core has little structure (hard to cut), but still more structure than the random network
Connections to previous work
• Other researchers examined small networks, so they did not hit Dunbar’s limit
• Small evidence: 400k-node Amazon co-purchasing network [Clauset et al. 2004]
▪ Largest community has 50% of all nodes
▪ It was labeled “Miscellaneous”
• Karate club has no significant community structure [Newman et al. 2007]
Other explanations
• Bond vs. identity communities
• Multiple hierarchies that blur the community boundaries
Is there still hope?
• Ground truth
• Yes: use attributes and better link semantics
Conclusion and connections
• The NCP plot is a way to analyze network community structure
• Our results agree with previous work on small networks (which are commonly used for testing community-finding algorithms)
• But large networks are different
• Large networks: whiskers + core structure
• Small, well-isolated communities blend into the core of the network as they grow