CMU SCS Large Graph Algorithms Christos Faloutsos CMU

CMU SCS
Large Graph Algorithms
Christos Faloutsos, CMU
Akoglu, Leman; Chau, Polo; Kang, U; McGlohon, Mary; Prakash, Aditya; Tong, Hanghang; Tsourakakis, Babis
OpenCirrus'10, C. Faloutsos (CMU)

Graphs - why should we care?
• Internet Map [lumeta.com]
• Food Web [Martinez '91]
• Protein Interactions [genomebiology.com]
• Friendship Network [Moody '01]
ICDM-LDMTA 2009, C. Faloutsos

Graphs - why should we care?
• IR: bi-partite graphs (doc-terms), with documents D1 ... DN and terms T1 ... TM
• web: hyper-text graph
• Social networking sites (Facebook, Twitter)
• Users posing and answering questions
• Click-streams (user-page bipartite graph)
• ... and more: any M:N db relationship

Our goal: One-stop solution for mining huge graphs
PEGASUS project (PEta GrAph mining System)
• www.cs.cmu.edu/~pegasus
• Open-source code and papers

Outline – Algorithms & results
Topics: Degree Distr., Pagerank, Diameter/ANF, Conn. Comp, Triangles, Visualization
Columns: Centralized, Hadoop/PEGASUS
Status entries: old, old, old, DONE, STARTED

HADI for diameter estimation
• Radius Plots for Mining Tera-byte Scale Graphs. U Kang, Charalampos Tsourakakis, Ana Paula Appel, Christos Faloutsos, Jure Leskovec, SDM'10
• Naively: diameter needs O(N**2) space and up to O(N**3) time, which is prohibitive (N ~ 1 B)
• Our HADI: linear on E (~10 B)
  – Near-linear scalability wrt # machines
  – Several optimizations -> 5x faster
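The neighborhood-function idea behind this style of diameter estimation can be tried on a toy graph. The following is a minimal single-machine sketch, not the HADI implementation: it keeps exact reachability sets where a scalable method would keep small approximate sketches, and the function names, the undirected treatment of edges, and the 90% threshold are assumptions made for illustration.

```python
def neighborhood_function(n, edges):
    """Return [N(0), N(1), ...]; N(h) counts (node, reachable node) pairs within h hops."""
    adj = [set() for _ in range(n)]
    for u, v in edges:               # assumption: treat the graph as undirected
        adj[u].add(v)
        adj[v].add(u)
    reach = [{v} for v in range(n)]  # exact sets, a stand-in for approximate sketches
    counts = [n]                     # N(0): every node reaches itself
    while True:
        # One hop: each node absorbs whatever its neighbors could already reach.
        new_reach = [reach[v].union(*(reach[u] for u in adj[v])) for v in range(n)]
        total = sum(len(s) for s in new_reach)
        if total == counts[-1]:      # nothing grew, so all hops are accounted for
            return counts
        reach = new_reach
        counts.append(total)


def effective_diameter(counts, quantile=0.9):
    """Smallest h at which N(h) covers the given fraction of all reachable pairs."""
    target = quantile * counts[-1]
    return next(h for h, c in enumerate(counts) if c >= target)


path = [(0, 1), (1, 2), (2, 3), (3, 4)]   # a 5-node path
counts = neighborhood_function(5, path)
print(counts)                              # [5, 13, 19, 23, 25]
print(len(counts) - 1)                     # exact diameter: 4
print(effective_diameter(counts))          # 3
```

Each pass touches every edge once, so with constant-size sketches in place of the sets the per-hop cost would grow with the number of edges rather than with N**2.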

[Radius plot: x-axis Radius, y-axis Count; annotations: "???" and "19+? [Barabasi+]"]
YahooWeb graph (120 Gb, 1.4 B nodes, 6.6 B edges)
• Largest publicly available graph ever studied.

YahooWeb graph (120 Gb, 1.4 B nodes, 6.6 B edges)
• effective diameter: surprisingly small.
• Multi-modality: probably mixture of cores.

Radius Plot of GCC of YahooWeb.

Running time - Kronecker and Erdos-Renyi graphs with billions of edges.

Outline – Algorithms & results
Topics: Degree Distr., Pagerank, Diameter/ANF, Conn. Comp, Triangles, Visualization
Columns: Centralized, Hadoop/PEGASUS
Status entries: old, old, old, DONE, STARTED

Generalized Iterated Matrix Vector Multiplication (GIMV)
PEGASUS: A Peta-Scale Graph Mining System - Implementation and Observations. U Kang, Charalampos E. Tsourakakis, and Christos Faloutsos. ICDM 2009, Miami, Florida, USA. Best Application Paper (runner-up).

Generalized Iterated Matrix Vector Multiplication (GIMV)
Matrix-vector multiplication (iterated):
• PageRank
• proximity (RWR)
• Diameter
• Connected components
• (eigenvectors, Belief Prop., …)
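To make the "iterated matrix-vector multiplication" idea concrete, here is a minimal single-machine sketch with PageRank as the instance. The operation names combine2, combine_all, and assign follow the cited PEGASUS paper; everything else (the function names, the edge-list representation, the damping value, the fixed iteration count, and the assumption that every node has at least one out-link) is illustrative and is not the Hadoop implementation.

```python
def gimv_step(entries, v, combine2, combine_all, assign):
    """One generalized matrix-vector product over nonzero entries (row i, col j, value)."""
    partial = [[] for _ in v]
    for i, j, m_ij in entries:
        partial[i].append(combine2(m_ij, v[j]))      # pairwise "multiply"
    # Per-row "sum", then decide how the old value is replaced.
    return [assign(v[i], combine_all(partial[i])) for i in range(len(v))]


def pagerank(n, links, damping=0.85, iterations=100):
    """PageRank as iterated gimv_step; assumes every node has an out-link."""
    out_degree = [0] * n
    for src, _ in links:
        out_degree[src] += 1
    # Column-normalized, transposed adjacency: entry (dst, src) = 1 / out_degree(src).
    entries = [(dst, src, 1.0 / out_degree[src]) for src, dst in links]
    v = [1.0 / n] * n
    for _ in range(iterations):
        v = gimv_step(
            entries, v,
            combine2=lambda m, vj: damping * m * vj,
            combine_all=lambda xs: (1.0 - damping) / n + sum(xs),
            assign=lambda old, new: new,
        )
    return v


links = [(0, 1), (0, 2), (1, 2), (2, 0)]   # toy directed graph
ranks = pagerank(3, links)
print(ranks)   # node 2 ends up with the largest score, node 1 with the smallest
```

Swapping the three small functions changes the algorithm while the loop over nonzero entries stays the same, which is the part a MapReduce job would distribute.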

Example: GIM-V At Work
• Connected Components
[Plot: Count vs. Size of connected components]

Example: GIM-V At Work
• Connected Components
[Plot: Count vs. Size of connected components; annotations: "300-size cmpt X 500. Why?" and "1100-size cmpt X 65. Why?"]

Example: GIM-V At Work
• Connected Components
[Plot: Count vs. Size of connected components; annotation: suspicious financial-advice sites (not existing now)]
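The count-vs-size view used in these connected-components slides can be reproduced on a toy graph. This is a minimal sketch under stated assumptions: components are found by repeated minimum-label propagation (one matrix-vector-style pass per round), and the function names and the tiny edge list are hypothetical, chosen only for illustration.

```python
from collections import Counter


def connected_components(n, edges):
    """Label each node with the smallest node id in its component."""
    label = list(range(n))
    while True:
        new_label = label[:]
        for u, v in edges:                   # undirected edge
            smallest = min(label[u], label[v])
            new_label[u] = min(new_label[u], smallest)
            new_label[v] = min(new_label[v], smallest)
        if new_label == label:               # no label changed: converged
            return label
        label = new_label


def size_distribution(labels):
    """Map component size -> number of components of that size."""
    sizes = Counter(labels)                  # component id -> size
    return Counter(sizes.values())


labels = connected_components(7, [(0, 1), (1, 2), (3, 4)])
print(labels)                                # [0, 0, 0, 3, 3, 5, 6]
print(sorted(size_distribution(labels).items()))   # [(1, 2), (2, 1), (3, 1)]
```

A spike in this distribution, such as many components sharing one unusual size, is the kind of pattern the annotated plots call out.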

Outline – Algorithms & results
Topics: Degree Distr., Pagerank, Diameter/ANF, Conn. Comp, Triangles, Visualization
Columns: Centralized, Hadoop/PEGASUS
Status entries: old, old, old, DONE, STARTED

Triangles
• Real social networks have a lot of triangles
ASONAM 2009, C. Faloutsos

Triangles
• Real social networks have a lot of triangles
  – Friends of friends are friends
• Q1: how to compute quickly?
• Q2: Any patterns?

Triangles: Computations [Tsourakakis ICDM 2008]
Q: Can we do that quickly?
Triangles are expensive to compute (3-way join; several approx. algos)

Triangles: Computations [Tsourakakis ICDM 2008]
But: triangles are expensive to compute (3-way join; several approx. algos)
Q: Can we do that quickly?
A: Yes! $\#\text{triangles} = \frac{1}{6} \sum_i \lambda_i^3$, where the $\lambda_i$ are the eigenvalues of the adjacency matrix (and, because of skewness, we only need the top few eigenvalues!)
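The eigenvalue formula for the triangle count can be checked numerically on a small undirected graph. The sketch below is illustrative only: it uses a hand-written cyclic Jacobi eigenvalue routine to stay dependency-free (the deck mentions Lanczos for the scalable version), and all function names, the toy edge list, and the top_k parameter are assumptions made for this example.

```python
import math
from itertools import combinations


def adjacency(n, edges):
    a = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        a[u][v] = a[v][u] = 1.0              # undirected, unweighted
    return a


def count_triangles_exact(n, edges):
    """Brute-force check over all node triples (only viable for tiny graphs)."""
    a = adjacency(n, edges)
    return sum(1 for i, j, k in combinations(range(n), 3)
               if a[i][j] and a[j][k] and a[i][k])


def symmetric_eigenvalues(matrix, sweeps=50, tol=1e-18):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                theta = math.pi / 4 if diff == 0 else 0.5 * math.atan(2 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):           # A <- A J (rotate columns p, q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):           # A <- J^T A (rotate rows p, q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return [a[i][i] for i in range(n)]


def triangles_from_spectrum(eigenvalues, top_k=None):
    """(1/6) * sum of cubed eigenvalues, optionally using only the top_k by magnitude."""
    ranked = sorted(eigenvalues, key=abs, reverse=True)
    return sum(lam ** 3 for lam in ranked[:top_k]) / 6.0


# Toy graph: a 4-clique (4 triangles) joined to a separate triangle (4, 5, 6).
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
         (3, 4), (4, 5), (5, 6), (4, 6)]
eigs = symmetric_eigenvalues(adjacency(7, edges))
print(count_triangles_exact(7, edges))           # 5
print(round(triangles_from_spectrum(eigs), 6))   # 5.0, up to rounding
print(triangles_from_spectrum(eigs, top_k=2))    # truncated estimate
```

On a graph this small and this uniform the truncated estimate need not be close; the slide's point is that skewed spectra in real networks let a few eigenvalues dominate the sum.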

Triangles: Computations [Tsourakakis ICDM 2008]
1000x+ speed-up, high accuracy

Triangles
• Easy to implement on Hadoop: it only needs eigenvalues (working on it, using Lanczos)

Triangles
• Real social networks have a lot of triangles
  – Friends of friends are friends
• Q1: how to compute quickly?
• Q2: Any patterns?

Triangle Law: #1 [Tsourakakis ICDM 2008]
[Plots: HEP-TH, ASN, Epinions]
X-axis: # of triangles a node participates in
Y-axis: count of such nodes
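The two axes of this plot can be computed directly for a small graph. The following is a minimal sketch, assuming an undirected edge list; the function names and the toy graph are hypothetical, and the brute-force neighbor-pair check is only meant for small inputs.

```python
from collections import Counter
from itertools import combinations


def triangles_per_node(edges):
    """Map each node to the number of triangles it participates in."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # A node closes one triangle for every pair of its neighbors that are linked.
    return {u: sum(1 for a, b in combinations(sorted(nbrs), 2) if b in adj[a])
            for u, nbrs in adj.items()}


def participation_histogram(per_node):
    """Map (# of triangles) -> count of nodes with that many triangles."""
    return Counter(per_node.values())


# Toy graph: a 4-clique joined by one edge to a separate triangle (4, 5, 6).
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
         (3, 4), (4, 5), (5, 6), (4, 6)]
per_node = triangles_per_node(edges)
print(per_node)                                           # nodes 0-3: 3 each; nodes 4-6: 1 each
print(sorted(participation_histogram(per_node).items()))  # [(1, 3), (3, 4)]
```

Plotting the histogram keys against their counts on log-log axes gives the view described on the slide; the per-node counts sum to three times the total number of triangles.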

Triangle Law: #2 [Tsourakakis ICDM 2008]
[Plots: Reuters, SN, Epinions]
X-axis: degree
Y-axis: mean # triangles
Notice: slope ~ degree exponent (insets)

Outline – Algorithms & results
Topics: Degree Distr., Pagerank, Diameter/ANF, Conn. Comp, Triangles, Visualization
Columns: Centralized, Hadoop/PEGASUS
Status entries: old, old, old, DONE, STARTED

Visualization: ShiftR
• Supporting Ad Hoc Sensemaking: Integrating Cognitive, HCI, and Data Mining Approaches. Aniket Kittur, Duen Horng ('Polo') Chau, Christos Faloutsos, Jason I. Hong. Sensemaking Workshop at CHI 2009, April 4-5, Boston, MA, USA.

Conclusions
One-stop shopping for large graph mining:
• www.cs.cmu.edu/~pegasus
Akoglu, Leman; Tsourakakis, Babis; Kang, U; Chau, Polo; McGlohon, Mary
THANKS: NSF, Yahoo (M45), LLNL