Spectral Analysis of Random Graphs with Skewed degrees

  • Slides: 33
Download presentation
Spectral Analysis of Random Graphs with Skewed degrees Anirban Dasgupta Joint work with John

Spectral Analysis of Random Graphs with Skewed degrees Anirban Dasgupta Joint work with John Hopcroft and Frank Mc. Sherry 1

Motivation • Clustering : partitioning data into subsets such that elements in each part

Motivation • Clustering : partitioning data into subsets such that elements in each part share a common ‘trait’. • Our interest is in clustering graph structured data e. g. web, linked databases. • Data can be represented by the adjacency matrix of the graph. 2

Spectral Clustering • Singular Value decomposition of a matrix 3

Spectral Clustering • Singular Value decomposition of a matrix 3

Spectral Clustering A 4

Spectral Clustering A 4

Spectral Clustering 5

Spectral Clustering 5

Spectral Clustering • Discard smaller singular values and vectors. • Cluster the low dimensional

Spectral Clustering • Discard smaller singular values and vectors. • Cluster the low dimensional representation. • Intuition: removing smaller singular values results in ‘more tightly’ clustered data 6

Spectral Clustering • Analyze the heuristic under a probabilistic model of data. - G(n,

Spectral Clustering • Analyze the heuristic under a probabilistic model of data. - G(n, p) : Erdos model for random graphs. - trivial clustering methods perform well in this space. • Need a model of random graphs with structure embedded. 7

Model • Partition the vertices into k parts. example: Set of papers ={T 1,

Model • Partition the vertices into k parts. example: Set of papers ={T 1, T 2, D 1, D 2}. G= • Adjacency matrix 1 w. p. Guv 0 w. p. 1 - Guv • For k-partitions, partition function for n vertices into k parts. - Guv depends on (u) and (v) - generalizes clique, k-coloring. (Mc. Sherry 01) 8

Model • T 1 and T 2 have – same distribution of probabilities. –

Model • T 1 and T 2 have – same distribution of probabilities. – same expected degree. 9

Incorporating degrees • {d 1, d 2, …, dn} set of degrees. • M

Incorporating degrees • {d 1, d 2, …, dn} set of degrees. • M = DGD Muv = du. Guvdv 10

Incorporating degrees • {d 1, d 2, …, dn} set of degrees. • M

Incorporating degrees • {d 1, d 2, …, dn} set of degrees. • M = DGD Muv = du. Guvdv • Adjacency matrix by 0/1 rounding 1 w. p. Muv 0 w. p. 1 – Muv • (actual) expected degrees proportional to du 11

Background • Vector norm |x|2 = (x 12 + …+xn 2)1/2 • Matrix norm

Background • Vector norm |x|2 = (x 12 + …+xn 2)1/2 • Matrix norm - |A|2 = max |x|2 =1| Ax |2 - |A|F = (sum of squared entries Aij )1/2 12

Background Theorem: For a rank-k matrix A, || A ||F 2 k || A

Background Theorem: For a rank-k matrix A, || A ||F 2 k || A ||22 13

Background • {v 1, v 2} orthogonal vectors - P = v 1 v

Background • {v 1, v 2} orthogonal vectors - P = v 1 v 1 T + v 2 v 2 T is projection onto space spanned by {v 1, v 2}. example: {v 1, . . , vk} eigenvectors of A PA = A(k) is optimal rank-k approximation of A 14

Why spectral works • obtained by rounding probability matrix G, rank(G) = k. –

Why spectral works • obtained by rounding probability matrix G, rank(G) = k. – is O(n 2 varmax[Ĝuv] ) 15

Why spectral works • obtained by rounding probability matrix G, rank(G) = k. –

Why spectral works • obtained by rounding probability matrix G, rank(G) = k. – is O(n 2 varmax[Ĝuv] ) – is O(kn varmax [Ĝuv] ) 16

Why spectral works • obtained by rounding probability matrix G, rank(G) = k. –

Why spectral works • obtained by rounding probability matrix G, rank(G) = k. – is O(n 2 varmax[Ĝuv] ) – is O(kn varmax [Ĝuv] ) Theorem: If varmax[Ĝuv]>> then E < 2 kn varmax[Ĝuv] 17

Why spectral works Reconstruction error E /(blocksize) + … • For the block matrix

Why spectral works Reconstruction error E /(blocksize) + … • For the block matrix G, - rows in a class suffer (almost)similar reconstruction error. - entries within a block have same variance. 18

Effect of Degree distribution • With degrees, Muv = du. Guvdv E = 2

Effect of Degree distribution • With degrees, Muv = du. Guvdv E = 2 kn varmax [Muv] 19

Effect of Degree distribution Theorem (Mihail, Papadimitriou) For a power law degree distribution {du},

Effect of Degree distribution Theorem (Mihail, Papadimitriou) For a power law degree distribution {du}, for i=1, . . , o(n) and Skewed degree distributions are bad ? ? 20

Solution • Want to scale entries such that variances are uniform. • Suppose X

Solution • Want to scale entries such that variances are uniform. • Suppose X and Y are 0/1 random variables with – Pr[X=1] = w Pr[Y=1] var[X] = w var [Y] var[ ] = var[ Y ] • Scale each entry by 21

Laplacian • Define the Laplacian matrix as Expected Laplacian 22

Laplacian • Define the Laplacian matrix as Expected Laplacian 22

Laplacian The total error Same as G, increased the signal / noise ratio of

Laplacian The total error Same as G, increased the signal / noise ratio of data. Skewed degree distributions are ok better !! 23

Related Work - Mc. Sherry 2001 Chung et al. use of Laplacian for normalized

Related Work - Mc. Sherry 2001 Chung et al. use of Laplacian for normalized graph-cuts. other normalizations used in IR. 24

Algorithm • Given find the Laplacian • Cluster the columns of 25

Algorithm • Given find the Laplacian • Cluster the columns of 25

Proof • Will show that • Proof proceeds by bounding the sides of the

Proof • Will show that • Proof proceeds by bounding the sides of the triangle. Systematic error: Random error: 26

Proof • Remember that • What if the eigenvectors form a bad basis ?

Proof • Remember that • What if the eigenvectors form a bad basis ? Idea: Space P does not have to be the eigenvector space. 27

Proof • Proof would work great with the space of the expectation vectors same

Proof • Proof would work great with the space of the expectation vectors same as space of cluster vectors. 28

New algorithm • Do an initial greedy clustering of to get the space P.

New algorithm • Do an initial greedy clustering of to get the space P. • Use this space P for projection to get final result. 29

Proof • Finally, if we have proved that then, if we had for u,

Proof • Finally, if we have proved that then, if we had for u, v in different clusters, we can separate the clusters. 30

Theorem If M=DGD, and each column in different parts of is separated as follows

Theorem If M=DGD, and each column in different parts of is separated as follows Then we can recover the partition w. p. 1 - . 31

Recap • To undo effects of degree on variance, need to take Laplacian. •

Recap • To undo effects of degree on variance, need to take Laplacian. • Rank-k approximation boosts ‘signal / noise’ ratio, more for skewed degrees. • Breaking the analysis into the systemic and random errors. • Constructing a different projection space. 32

Future Work • Can extend this to bipartite graphs but need both row and

Future Work • Can extend this to bipartite graphs but need both row and column separations. • Need better matrix inequality to extend to sparse graphs. • Can we represent the apriori weights in collaborative filtering using this degree formulation ? • Extending the random graph model ? 33