Spectral Analysis of Random Graphs with Skewed degrees

Motivation • Clustering : partitioning data into subsets such that elements in each part

Spectral Clustering • Singular Value decomposition of a matrix 3

Spectral Clustering • Discard smaller singular values and vectors. • Cluster the low dimensional

Spectral Clustering • Analyze the heuristic under a probabilistic model of data. - G(n,

Model • Partition the vertices into k parts. example: Set of papers ={T 1,

Model • T 1 and T 2 have – same distribution of probabilities. –

Incorporating degrees • {d 1, d 2, …, dn} set of degrees. • M

Background • Vector norm |x|2 = (x 12 + …+xn 2)1/2 • Matrix norm

Background Theorem: For a rank-k matrix A, || A ||F 2 k || A

Background • {v 1, v 2} orthogonal vectors - P = v 1 v

Why spectral works • obtained by rounding probability matrix G, rank(G) = k. –

Why spectral works Reconstruction error E /(blocksize) + … • For the block matrix

Effect of Degree distribution • With degrees, Muv = du. Guvdv E = 2

Effect of Degree distribution Theorem (Mihail, Papadimitriou) For a power law degree distribution {du},

Solution • Want to scale entries such that variances are uniform. • Suppose X

Laplacian • Define the Laplacian matrix as Expected Laplacian 22

Laplacian The total error Same as G, increased the signal / noise ratio of

Related Work - Mc. Sherry 2001 Chung et al. use of Laplacian for normalized

Algorithm • Given find the Laplacian • Cluster the columns of 25

Proof • Will show that • Proof proceeds by bounding the sides of the

Proof • Remember that • What if the eigenvectors form a bad basis ?

Proof • Proof would work great with the space of the expectation vectors same

New algorithm • Do an initial greedy clustering of to get the space P.

Proof • Finally, if we have proved that then, if we had for u,

Theorem If M=DGD, and each column in different parts of is separated as follows

Recap • To undo effects of degree on variance, need to take Laplacian. •

Future Work • Can extend this to bipartite graphs but need both row and

Slides: 33

Download presentation

Spectral Analysis of Random Graphs with Skewed degrees Anirban Dasgupta Joint work with John Hopcroft and Frank Mc. Sherry 1

Motivation • Clustering : partitioning data into subsets such that elements in each part share a common ‘trait’. • Our interest is in clustering graph structured data e. g. web, linked databases. • Data can be represented by the adjacency matrix of the graph. 2

Spectral Clustering • Singular Value decomposition of a matrix 3

Spectral Clustering A 4

Spectral Clustering 5

Spectral Clustering • Discard smaller singular values and vectors. • Cluster the low dimensional representation. • Intuition: removing smaller singular values results in ‘more tightly’ clustered data 6

Spectral Clustering • Analyze the heuristic under a probabilistic model of data. - G(n, p) : Erdos model for random graphs. - trivial clustering methods perform well in this space. • Need a model of random graphs with structure embedded. 7

Model • Partition the vertices into k parts. example: Set of papers ={T 1, T 2, D 1, D 2}. G= • Adjacency matrix 1 w. p. Guv 0 w. p. 1 - Guv • For k-partitions, partition function for n vertices into k parts. - Guv depends on (u) and (v) - generalizes clique, k-coloring. (Mc. Sherry 01) 8

Model • T 1 and T 2 have – same distribution of probabilities. – same expected degree. 9

Incorporating degrees • {d 1, d 2, …, dn} set of degrees. • M = DGD Muv = du. Guvdv 10

Incorporating degrees • {d 1, d 2, …, dn} set of degrees. • M = DGD Muv = du. Guvdv • Adjacency matrix by 0/1 rounding 1 w. p. Muv 0 w. p. 1 – Muv • (actual) expected degrees proportional to du 11

Background • Vector norm |x|2 = (x 12 + …+xn 2)1/2 • Matrix norm - |A|2 = max |x|2 =1| Ax |2 - |A|F = (sum of squared entries Aij )1/2 12

Background Theorem: For a rank-k matrix A, || A ||F 2 k || A ||22 13

Background • {v 1, v 2} orthogonal vectors - P = v 1 v 1 T + v 2 v 2 T is projection onto space spanned by {v 1, v 2}. example: {v 1, . . , vk} eigenvectors of A PA = A(k) is optimal rank-k approximation of A 14

Why spectral works • obtained by rounding probability matrix G, rank(G) = k. – is O(n 2 varmax[Ĝuv] ) 15

Why spectral works • obtained by rounding probability matrix G, rank(G) = k. – is O(n 2 varmax[Ĝuv] ) – is O(kn varmax [Ĝuv] ) 16

Why spectral works • obtained by rounding probability matrix G, rank(G) = k. – is O(n 2 varmax[Ĝuv] ) – is O(kn varmax [Ĝuv] ) Theorem: If varmax[Ĝuv]>> then E < 2 kn varmax[Ĝuv] 17

Why spectral works Reconstruction error E /(blocksize) + … • For the block matrix G, - rows in a class suffer (almost)similar reconstruction error. - entries within a block have same variance. 18

Effect of Degree distribution • With degrees, Muv = du. Guvdv E = 2 kn varmax [Muv] 19

Effect of Degree distribution Theorem (Mihail, Papadimitriou) For a power law degree distribution {du}, for i=1, . . , o(n) and Skewed degree distributions are bad ? ? 20

Solution • Want to scale entries such that variances are uniform. • Suppose X and Y are 0/1 random variables with – Pr[X=1] = w Pr[Y=1] var[X] = w var [Y] var[ ] = var[ Y ] • Scale each entry by 21

Laplacian • Define the Laplacian matrix as Expected Laplacian 22

Laplacian The total error Same as G, increased the signal / noise ratio of data. Skewed degree distributions are ok better !! 23

Related Work - Mc. Sherry 2001 Chung et al. use of Laplacian for normalized graph-cuts. other normalizations used in IR. 24

Algorithm • Given find the Laplacian • Cluster the columns of 25

Proof • Will show that • Proof proceeds by bounding the sides of the triangle. Systematic error: Random error: 26

Proof • Remember that • What if the eigenvectors form a bad basis ? Idea: Space P does not have to be the eigenvector space. 27

Proof • Proof would work great with the space of the expectation vectors same as space of cluster vectors. 28

New algorithm • Do an initial greedy clustering of to get the space P. • Use this space P for projection to get final result. 29

Proof • Finally, if we have proved that then, if we had for u, v in different clusters, we can separate the clusters. 30

Theorem If M=DGD, and each column in different parts of is separated as follows Then we can recover the partition w. p. 1 - . 31

Recap • To undo effects of degree on variance, need to take Laplacian. • Rank-k approximation boosts ‘signal / noise’ ratio, more for skewed degrees. • Breaking the analysis into the systemic and random errors. • Constructing a different projection space. 32

Future Work • Can extend this to bipartite graphs but need both row and column separations. • Need better matrix inequality to extend to sparse graphs. • Can we represent the apriori weights in collaborative filtering using this degree formulation ? • Extending the random graph model ? 33