Community Detection Modularity and Trawling CS 224 W
- Slides: 43
Community Detection: Modularity and Trawling
CS224W: Social and Information Network Analysis
Jure Leskovec, Stanford University
http://cs224w.stanford.edu
Network Communities
- Communities: sets of tightly connected nodes
- Define: Modularity Q
  - A measure of how well a network is partitioned into communities
  - Given a partitioning of the network into groups s ∈ S:
    Q ∝ ∑_{s∈S} [ (# edges within group s) - (expected # edges within group s) ]
  - Need a null model!
12/5/2020 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu
Null Model: Configuration Model
[Figure: nodes i and j]
Modularity
- Normalizing constant: -1 < Q < 1
- A_ij = 1 if i and j are connected, 0 otherwise
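The definition above can be made concrete with a small sketch. It assumes the usual configuration-model expectation k_i k_j / (2m) for the number of edges between nodes i and j and the normalization 1/(2m); neither formula survives in this transcript, and the function name `modularity` and the toy graph are illustrative only.

```python
import numpy as np

def modularity(A, labels):
    """Q = 1/(2m) * sum_ij (A_ij - k_i*k_j/(2m)) * [labels_i == labels_j].

    Assumed form: configuration-model null model, undirected graph.
    """
    A = np.asarray(A, dtype=float)
    k = A.sum(axis=1)                 # node degrees
    two_m = k.sum()                   # 2m = sum of all degrees
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]   # same-group indicator
    B = A - np.outer(k, k) / two_m    # observed minus expected edges
    return float((B * same).sum() / two_m)

# Toy graph (hypothetical): two triangles joined by a single edge.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1

q_two_groups = modularity(A, [0, 0, 0, 1, 1, 1])   # one group per triangle
q_one_group = modularity(A, [0] * 6)               # everything in one group
```

Putting each triangle in its own group gives Q = 5/14, while a single all-inclusive group gives Q = 0, which is the sense in which Q rewards partitions with more internal edges than the null model predicts.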
Modularity: Number of clusters
- Modularity is useful for selecting the number of clusters
- Why not optimize modularity directly?
[Figure: plot of modularity Q]
Method 2: Modularity Optimization
- Indicator: 1 if s_i = s_j, 0 otherwise
Speaker note: rewrite the equation as two separate summations, one over A_ij and one over k_i k_j.
Modularity Matrix
- Note: each row/column of B sums to 0
Speaker note: Why is the sum of B_ij s_i s_j an s^T B s product? Explain. What do we mean by rewriting Q in terms of eigenvalues and eigenvectors? Give the basic definition of the eigendecomposition.
Modularity Optimization
Speaker note: Why is making s parallel to u_1 the right thing to do?
Finding Vector s
Speaker note: Explain the approximation: we only consider the first term in the summation. Explain the intuition that we are making s approximately parallel to u_1.
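The steps these slides refer to can be written out as follows. This is a sketch assuming the standard two-group spectral formulation, with s_i in {+1, -1}, B_ij = A_ij - k_i k_j / (2m), and eigenpairs (lambda_i, u_i) of B ordered so that lambda_1 is the largest; the symbols are assumptions in so far as the equations themselves are missing from this transcript.

```latex
\begin{aligned}
Q &= \frac{1}{2m}\sum_{i,j}\Bigl(A_{ij}-\frac{k_i k_j}{2m}\Bigr)\,\frac{s_i s_j + 1}{2}
   && \text{(the indicator is 1 if } s_i = s_j,\ 0 \text{ otherwise)} \\
  &= \frac{1}{4m}\sum_{i,j} B_{ij}\, s_i s_j
   && \text{(the } +1 \text{ term vanishes because rows of } B \text{ sum to 0)} \\
  &= \frac{1}{4m}\, s^{\top} B\, s
   = \frac{1}{4m}\sum_{i=1}^{n} \lambda_i\,\bigl(u_i^{\top} s\bigr)^{2}
   && \text{(using } B = \textstyle\sum_i \lambda_i u_i u_i^{\top}\text{)} \\
  &\approx \frac{1}{4m}\,\lambda_1\,\bigl(u_1^{\top} s\bigr)^{2}
   && \text{(keep only the leading term)}
\end{aligned}
```

Since s is restricted to entries of +1 or -1, it cannot be made exactly parallel to u_1; the closest choice is s_i = +1 where the i-th entry of u_1 is non-negative and s_i = -1 otherwise, which maximizes u_1^T s.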
Summary: Modularity Optimization
- Fast Modularity Optimization Algorithm:
  - Find the leading eigenvector u_1 of the modularity matrix B
  - Divide the nodes by the signs of the elements of u_1
  - Repeat hierarchically until:
    - If a proposed split does not cause modularity to increase, declare the community indivisible and do not split it
    - If all communities are indivisible, stop
- How to find u_1? Power method!
  - Start with random v(1), repeat: v(t+1) = B v(t) / ||B v(t)||
  - When converged (v(t) ≈ v(t+1)), set u_1 = v(t)
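A minimal sketch of one split of the algorithm summarized above, assuming B_ij = A_ij - k_i k_j / (2m). One deliberate deviation from the plain power method: the iteration runs on B + cI, because the plain method converges to the eigenvalue of largest magnitude, which for B may be negative. The shift c, the function names, and the toy graph are assumptions added here, not part of the slides.

```python
import numpy as np

def modularity_matrix(A):
    """B = A - k k^T / (2m) for an undirected adjacency matrix A."""
    A = np.asarray(A, dtype=float)
    k = A.sum(axis=1)
    return A - np.outer(k, k) / k.sum()

def leading_eigenvector(B, tol=1e-10, max_iter=10_000, seed=0):
    """Power method on the shifted matrix B + c*I.

    c = max absolute row sum bounds every |eigenvalue| of B, so B + c*I has
    only non-negative eigenvalues and the iteration converges to the
    eigenvector of the largest (not largest-magnitude) eigenvalue of B.
    """
    n = B.shape[0]
    c = np.abs(B).sum(axis=1).max()
    M = B + c * np.eye(n)
    v = np.random.default_rng(seed).normal(size=n)   # random start v(1)
    v /= np.linalg.norm(v)
    for _ in range(max_iter):
        w = M @ v
        w /= np.linalg.norm(w)
        if np.linalg.norm(w - v) < tol:               # v(t) ~ v(t+1)
            return w
        v = w
    return v

def split_by_sign(A):
    """Assign each node to group +1 or -1 by the sign of its entry in u_1."""
    u1 = leading_eigenvector(modularity_matrix(A))
    return np.where(u1 >= 0, 1, -1)

# Toy graph (hypothetical): two triangles joined by one edge.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1
s = split_by_sign(A)
```

On this graph the two triangles end up in opposite groups; which triangle gets +1 depends on the arbitrary sign of the eigenvector. The hierarchical repetition and the "does modularity increase?" check are left out of the sketch.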
Additional Heuristic Approaches
- (1) Greedy post-processing:
  - Start with nodes in two groups, s
  - Repeat t = 1..n until all nodes have been moved:
    - For i = 1..n: consider moving node i, compute the new Q_t(s_i)
    - Move the node j that hasn't yet been moved and that maximizes Q_t(s_j)
    - Note that Q_t can decrease with time t
  - Once the iteration is complete, find the intermediate state t with the highest Q_t
  - Start from this state and repeat until Q stops increasing
[Figure: seven-node example. Panels: "Start"; "Move best not-yet-moved node (3), store Q_1"; "Move best not-yet-moved node (5), store Q_2"; "Do this for every not-yet-moved node, pick state x that maximizes Q_t"]
Speaker note: Skip this slide!
Additional Heuristic Approaches
Speaker note: Skip! Too many details and not enough time to explain the updates to the modularity matrix.
Modularity Optimization Methods
- Fast modularity
- GN = Betweenness centrality, O(n^3)
- CNM = Clauset-Newman-Moore, O(n log^2 n)
- DA = Extremal optimization, O(n^2 log^2 n)
Speaker note: Cut out the CNM and DA columns.
Summary: Modularity
- Girvan-Newman (previous lecture):
  - Based on the “strength of weak ties”
  - Remove the edge of highest betweenness
- Modularity:
  - Overall quality of the partitioning of a graph
  - Use to determine the number of communities
- Fast modularity optimization:
  - Transform the modularity optimization into an eigenvalue problem
- Clauset-Newman-Moore:
  - Agglomerative clustering based on modularity
Trawling for Web Communities
Method 3: Trawling [Kumar et al. '99]
- Searching for small communities in the Web graph
- What is the signature of a community / discussion in a Web graph?
  - Dense 2-layer graph
  - Intuition: Many people all talking about the same things
- Use this to define “topics”: what the same people on the left talk about on the right
- Remember HITS!
Searching for Small Communities
- A more well-defined problem: enumerate complete bipartite subgraphs K_{s,t}
  - Where K_{s,t}: s nodes on the “left”, each of which links to the same t other nodes on the “right”
[Figure: K_{3,4}, fully connected, with left side X (|X| = s = 3) and right side Y (|Y| = t = 4)]
The Plan: (1), (2) and (3) [Kumar et al. '99]
- Two points:
  - (1) Dense bipartite graph: the signature of a community/discussion
  - (2) Complete bipartite subgraph K_{s,t}
    - K_{s,t} = graph on s nodes, each of which links to the same t other nodes
- Plan:
  - (A) From (2) get back to (1):
    - Via: any dense enough graph contains a smaller K_{s,t} as a subgraph
  - (B) How do we solve (2) in a giant graph?
    - What similar problems were solved on big non-graph data?
    - (3) Frequent itemset enumeration [Agrawal-Srikant '99]
Frequent Itemset Enumeration [Agrawal-Srikant '99]
- Market basket analysis:
  - What items are bought together in a store?
- Setting:
  - Market: universe U of n items (products sold in a store)
  - Baskets: m subsets of U: S_1, S_2, …, S_m ⊆ U (S_i is a set of items one person bought)
  - Support: frequency threshold f
- Goal:
  - Find all subsets T such that T ⊆ S_i for at least f of the sets S_i (items in T were bought together at least f times)
Frequent Itemsets: Example
- Given:
  - Universe of items: U = {1, 2, 3, 4, 5}
  - Market baskets: S_1 = {1, 3, 5}, S_2 = {2, 3, 4}, S_3 = {2, 4, 5}, S_4 = {3, 4, 5}, S_5 = {1, 3, 4, 5}, S_6 = {2, 3, 4, 5}
    - Support of T = {2, 3} is 2 (T appears in S_2 and S_6)
  - Minimum support: f = 3
- Goal: Find all sets T that appear in at least f of the S_i
  - Call such itemsets T frequent itemsets (they have support ≥ f)
- Algorithm: Build the lists bottom-up
  - Insight: for a frequent set of size k, all its subsets are also frequent
  - If T = {3, 4, 5} is frequent, then {3, 4}, {3, 5}, {4, 5} must also be frequent!
Example: the Apriori Algorithm [Agrawal-Srikant '99]
- Setting:
  - U = {1, 2, 3, 4, 5}, f = 3
  - S_1 = {1, 3, 5}, S_2 = {2, 3, 4}, S_3 = {2, 4, 5}, S_4 = {3, 4, 5}, S_5 = {1, 3, 4, 5}, S_6 = {2, 3, 4, 5}
- 2 steps: 1) Candidate generation 2) Pruning
- Itemsets listed at each size:
  Itemset size   Itemsets
  1              {1} {2} {3} {4} {5}
  2              {2,3} {2,4} {2,5} {3,4} {3,5} {4,5}
  3              {2,3,4} {3,4,5}
  4              {}
The Apriori Algorithm [Agrawal-Srikant '99]
- For i = 1, …, k
  - Generate all sets of size i by composing sets of size i-1 that differ in 1 element
  - Prune the sets of size i with support < f
- Open question:
  - Efficiently find only maximal frequent sets
- What's the connection between the itemsets and complete bipartite graphs?
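The two steps above (candidate generation, then pruning) can be sketched as follows. This is a small in-memory illustration, not the authors' implementation; the function name `apriori` is hypothetical, and the baskets are the ones from the example slides.

```python
from itertools import combinations

def apriori(baskets, f):
    """Return {itemset: support} for every itemset contained in >= f baskets."""
    baskets = [frozenset(b) for b in baskets]

    def support(itemset):
        # number of baskets that contain every item of `itemset`
        return sum(itemset <= b for b in baskets)

    items = sorted(set().union(*baskets))
    level = {frozenset([x]) for x in items if support(frozenset([x])) >= f}
    frequent = {}
    size = 1
    while level:
        for t in level:
            frequent[t] = support(t)
        size += 1
        # Candidate generation: unions of two frequent sets of the previous
        # size that differ in exactly one element.
        candidates = {a | b for a, b in combinations(level, 2)
                      if len(a | b) == size}
        # Pruning: every (size-1)-subset must itself be frequent, and the
        # candidate's own support must reach f.
        level = {c for c in candidates
                 if all(frozenset(sub) in level
                        for sub in combinations(c, size - 1))
                 and support(c) >= f}
    return frequent

baskets = [{1, 3, 5}, {2, 3, 4}, {2, 4, 5}, {3, 4, 5}, {1, 3, 4, 5}, {2, 3, 4, 5}]
frequent = apriori(baskets, f=3)
```

With f = 3 the frequent itemsets are {2}, {3}, {4}, {5}, {2,4}, {3,4}, {3,5}, {4,5} and {3,4,5}. The subset check is what makes the method "a priori": a candidate such as {2,3,4} is discarded without counting because {2,3} is already known to be infrequent.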
From Itemsets to Bipartite K_{s,t} [Kumar et al. '99]
- Frequent itemset enumeration finds complete bipartite graphs
- How?
  - View each node i as the set S_i of nodes that i points to
  - K_{s,t} = a set Y of size t that occurs in s sets S_i
  - Looking for K_{s,t}: set the frequency threshold to s and look at layer t (all frequent sets of size t)
  - s … minimum support (|X| = s)
  - t … itemset size
[Figure: node i pointing to a, b, c, d, so S_i = {a, b, c, d}; nodes i, j, k forming X on the left and a, b, c, d forming Y on the right]
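The correspondence above can be illustrated with a brute-force sketch: each node's out-neighbor set is a basket, and a size-t set Y that is contained in at least s baskets is the right-hand side of a K_{s,t}. For brevity this enumerates all size-t subsets directly rather than building them level by level as Apriori would, so it is only practical for small t and small degrees; the function name `find_kst` and the toy link data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def find_kst(out_links, s, t):
    """Find complete bipartite subgraphs with |Y| = t and |X| >= s.

    out_links: dict mapping node i to the nodes it points to (its set S_i).
    Returns a list of (X, Y) pairs, where every node in X links to every
    node in Y.
    """
    buckets = defaultdict(set)          # Y (as a sorted tuple) -> nodes i with Y in S_i
    for i, nbrs in out_links.items():
        for Y in combinations(sorted(set(nbrs)), t):
            buckets[Y].add(i)
    return [(X, set(Y)) for Y, X in buckets.items() if len(X) >= s]

# Toy data: i, j, k all point to a, b, c, d; node l points only to a and b.
links = {
    "i": ["a", "b", "c", "d"],
    "j": ["a", "b", "c", "d"],
    "k": ["a", "b", "c", "d"],
    "l": ["a", "b"],
}
found = find_kst(links, s=3, t=4)
```

Here the only K_{3,4} has X = {i, j, k} and Y = {a, b, c, d}. Lowering t to 2 with s = 4 would instead report X = {i, j, k, l}, Y = {a, b}.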
From K_{s,t} to Communities
END
Speaker notes:
- End of time: had 5 min left to cover the competition results
- Itemsets caught people's attention
- Need to find a better visual way to explain how itemsets find bipartite graphs
Proof: K_{s,t} and Communities
[Figure: plot of f(x) against x]
Nodes and Buckets
- Consider node i of degree k_i and neighbor set S_i
- Put node i in buckets for all size-t subsets of i's neighbors
  - The buckets are the potential right-hand sides of K_{s,t} (i.e., all size-t subsets of S_i)
  - As soon as s nodes appear in a bucket we have a K_{s,t}
[Figure: node i with neighbors a, b, c, d placed into buckets (a,b), (a,c), (a,d), (b,c), …]
Nodes and Buckets
- C(k_i, t) = # of ways to select t elements out of k_i, where k_i is the degree of node i
- By convexity (k_i > t)
Nodes and Buckets
- Plug in:
And We are Done!
- We have: total height of all buckets
- How many buckets are there?
- What is the average height of buckets?
- So, avg. bucket height ≥ s
- By the pigeonhole principle, there must be at least one bucket with at least s nodes in it
- We found a K_{s,t}
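The counting argument behind these slides can be sketched as below. The setup is an assumption, since the statement and equations are not present in this transcript: a graph on n nodes whose average degree is k̄ = s^{1/t} n^{1-1/t} + t, with node i placed in one bucket for each size-t subset of its k_i neighbors.

```latex
\begin{aligned}
\text{total height of all buckets}
  &= \sum_{i=1}^{n} \binom{k_i}{t}
   \;\ge\; n \binom{\bar{k}}{t}
   && \text{(convexity, } k_i > t\text{)} \\
  &\ge\; n\,\frac{(\bar{k}-t)^{t}}{t!}
   \;=\; n\,\frac{\bigl(s^{1/t}\, n^{1-1/t}\bigr)^{t}}{t!}
   \;=\; \frac{s\, n^{t}}{t!}
   && \text{(plug in } \bar{k}\text{)} \\[4pt]
\text{number of buckets}
  &= \binom{n}{t} \;\le\; \frac{n^{t}}{t!} \\[4pt]
\text{average bucket height}
  &\ge\; \frac{s\, n^{t}/t!}{n^{t}/t!} \;=\; s
\end{aligned}
```

An average of at least s forces some bucket to hold at least s nodes; those s nodes together with the bucket's t-element subset form a K_{s,t}.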
Method 3: Trawling (Summary) [Kumar et al. '99]
- Analytical result:
  - Complete bipartite subgraphs K_{s,t} are embedded in larger, dense enough graphs (i.e., the communities)
  - Bipartite subgraphs act as “signatures” of communities
- Algorithmic result:
  - Frequent itemset extraction and dynamic programming find the graphs K_{s,t}
  - The method is highly scalable
Spectral Graph Partitioning
Method 4: Graph Partitioning
- Undirected graph G(V, E)
- Bi-partitioning task:
  - Divide vertices into two disjoint groups (A, B)
- Questions:
  - How can we define a “good” partition of G?
  - How can we efficiently identify such a partition?
[Figure: six-node graph (nodes 1 to 6) and a split of its vertices into groups A and B]
11/8/2010 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu
Graph Partitioning
- What makes a good partition?
  - Maximize the number of within-group connections
  - Minimize the number of between-group connections
[Figure: six-node graph split into groups A and B]
Graph Cuts
- Express partitioning objectives as a function of the “edge cut” of the partition
- Cut: set of edges with only one vertex in a group
[Figure: groups A = {1, 2, 3} and B = {4, 5, 6}, cut(A, B) = 2]
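A minimal sketch of the cut definition above for an unweighted graph. The edge list is hypothetical (the figure's edges are not recoverable from this transcript); it is chosen only so that the split A = {1, 2, 3}, B = {4, 5, 6} has a cut of size 2.

```python
def cut_size(edges, A):
    """Number of edges with exactly one endpoint in A, i.e. cut(A, B)
    where B is every node not in A (unweighted graph)."""
    A = set(A)
    return sum((u in A) != (v in A) for u, v in edges)

# Hypothetical 6-node graph, not taken from the slides.
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (1, 5), (4, 5), (4, 6), (5, 6)]

balanced = cut_size(edges, {1, 2, 3})   # edges (3,4) and (1,5) cross
```

For a weighted graph the same idea applies with the sum of crossing edge weights in place of the count.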
Graph Cut Criterion
- Criterion: Minimum-cut
  - Minimise the weight of connections between groups: min_{A,B} cut(A, B)
- Degenerate case:
  [Figure: the “optimal cut” compared with the minimum cut]
- Problem:
  - Only considers external cluster connections
  - Does not consider internal cluster connectivity
Graph Cut Criteria [Shi-Malik]
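As an illustration of a criterion that also accounts for the size of each group, here is a sketch of the normalized cut usually attributed to Shi and Malik. The formula is an assumption, since it is not present in this transcript: ncut(A, B) = cut(A, B)/vol(A) + cut(A, B)/vol(B), with vol(A) taken as the sum of degrees of the nodes in A. The function name and the edge list are hypothetical.

```python
def normalized_cut(edges, A):
    """ncut(A, B) = cut/vol(A) + cut/vol(B) for an unweighted graph.

    vol(A) is assumed to be the sum of degrees of nodes in A. Both A and
    its complement must contain at least one edge endpoint, otherwise a
    volume is zero and the ratio is undefined.
    """
    A = set(A)
    cut = sum((u in A) != (v in A) for u, v in edges)
    vol_A = sum((u in A) + (v in A) for u, v in edges)   # degree sum over A
    vol_B = 2 * len(edges) - vol_A                        # degree sum over B
    return cut / vol_A + cut / vol_B

# Hypothetical 6-node graph, not taken from the slides.
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (1, 5), (4, 5), (4, 6), (5, 6)]

ncut_balanced = normalized_cut(edges, {1, 2, 3})   # cut 2, volumes 8 and 8
ncut_single = normalized_cut(edges, {6})           # cut 2, volumes 2 and 14
```

Both splits cut two edges, so minimum cut cannot tell them apart, but the normalized cut is 0.5 for the balanced split and about 1.14 for the split that isolates a single node, penalizing the degenerate case.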
Competition Results: Graph Alignment
Wikipedia Graph Alignment
- Given the German and French Wikipedia graphs
- And a few example corresponding articles
- Goal: Find the remaining correspondences:
  - Link “Paris” in German to “Paris” in French
  - Intuition: Paris in both languages links to “similar” pages (pages that also link to each other)
Approach 1: Square Maximization
- Winning solution:
- Start from some pairing S
  - Start from a random pairing
- Goodness of pairing S:
  - Number of “squares”
- Consider transforming (u_F, u_G), (v_F, v_G) to (v_F, u_G), (u_F, v_G)
- Accept the swap if the number of squares increases
- Improvements:
  - Bound on swap improvement: no need to swap nodes that don't give a good improvement
  - Computing the swap change efficiently
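A rough sketch of the swap step described above. The slide does not define a “square”, so this assumes one plausible reading: an edge (u, v) in the French graph whose paired nodes also form an edge in the German graph, so that the two edges and the two pairings close a 4-cycle. Graphs are treated as undirected for simplicity, the score is recomputed from scratch rather than incrementally, and all names are hypothetical.

```python
def count_squares(edges_F, edges_G, pairing):
    """Count F-edges (u, v) whose partners (pairing[u], pairing[v]) form a
    G-edge. `pairing` maps F-nodes to G-nodes (assumed definition of a square)."""
    G = {frozenset(e) for e in edges_G}
    return sum(1 for u, v in edges_F
               if u in pairing and v in pairing
               and frozenset((pairing[u], pairing[v])) in G)

def try_swap(edges_F, edges_G, pairing, u, v):
    """Exchange the partners of F-nodes u and v; keep the exchange only if
    the number of squares strictly increases. Returns True if kept."""
    before = count_squares(edges_F, edges_G, pairing)
    pairing[u], pairing[v] = pairing[v], pairing[u]
    if count_squares(edges_F, edges_G, pairing) > before:
        return True
    pairing[u], pairing[v] = pairing[v], pairing[u]   # undo the swap
    return False

# Toy example: two 3-node paths and a deliberately wrong starting pairing.
edges_F = [("a", "b"), ("b", "c")]
edges_G = [("x", "y"), ("y", "z")]
pairing = {"a": "x", "b": "z", "c": "y"}

start = count_squares(edges_F, edges_G, pairing)
accepted = try_swap(edges_F, edges_G, pairing, "b", "c")
after = count_squares(edges_F, edges_G, pairing)
```

In the toy example the swap raises the score from 1 to 2 and is kept. The improvements listed on the slide (bounding the gain of a swap and updating the count incrementally) are not shown here.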
Approach 1: Square Maximization
Approach 2: Machine Learning
- For a pair of nodes (u_F, u_G) construct a feature vector
  - Matches from the training set (M.txt) are “positive” examples
  - Pairs not in M.txt are “negative” examples
- Use Random Forests to label pairs (AUC = 0.87)
  - Each pair gets a probability that they match
- Now greedily fill in the remaining pairings by considering correspondence probabilities
Results and Extra Credit
  ID (extra credit)   # Correct   Fraction
  krish (10%)         3,308       0.83
  pmk (8%)            2,941       0.74
  lussier1 (6%)       2,191       0.55
  prgao (4%)          2,107       0.53
  jieyang (4%)        1,706       0.43
  carmenv             978         0.24
  anmittal            861         0.22
  adotey              828         0.21
  billyue             805         0.20
  gibbons4            507         0.13
  leonlin             145         0.04
  cktan               65          0.02