CMU SCS Part 1: Graph Mining – patterns Christos Faloutsos CMU

Our goal
Open source system for mining huge graphs: PEGASUS project (PEta GrAph mining System)
• www.cs.cmu.edu/~pegasus
• code and papers
Tepper, CMU, April 4 (c) C. Faloutsos, 2017

References
• D. Chakrabarti, C. Faloutsos: Graph Mining – Laws, Tools and Case Studies, Morgan & Claypool, 2012
• http://www.morganclaypool.com/doi/abs/10.2200/S00449ED1V01Y201209DMK006

Outline
• Introduction – Motivation
• Part#1: Patterns in graphs
• Part#2: Tools (Ranking, proximity)
• Conclusions

Graphs - why should we care?
[Figures: Internet Map [lumeta.com]; Friendship Network [Moody ’01]; Food Web [Martinez ’91]; Protein Interactions [genomebiology.com]]
• IR: bi-partite graphs (doc-terms)
[Figure: documents D1 ... DN linked to terms T1 ... TM]
• web: hyper-text graph
• ... and more:
• network of companies & board-of-directors members
• ‘viral’ marketing
• web-log (‘blog’) news propagation
• computer network security: email/IP traffic and anomaly detection
• ...

Outline
• Introduction – Motivation
• Patterns in graphs
  – Patterns in Static graphs
  – Patterns in Weighted graphs
  – Patterns in Time evolving graphs

Network and graph mining
• What does the Internet look like?
• What does Facebook look like?
• What is ‘normal’/‘abnormal’?
• Which patterns/laws hold?
  – To spot anomalies (rarities), we have to discover patterns
  – Large datasets reveal patterns/anomalies that may be invisible otherwise…

Topology
What does the Internet look like? Any rules?
(Looks random, right?)

Laws and patterns
• Are real graphs random?
• A: NO!!
  – Diameter
  – in- and out-degree distributions
  – other (surprising) patterns
• So, let’s look at the data

Laws – degree distributions
• Q: avg degree is ~2 - what is the most probable degree?
[Figure: two candidate plots of count vs. degree, with degree 2 marked on the x-axis]

Solution# S.1: Power-law: outdegree exponent O
• Exponent = slope: O = -2.15 (Nov ’97)
• The plot is linear in log-log scale [FFF’99]
• freq = degree^(-2.15)
[Figure: frequency vs. outdegree, log-log scale]
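The S.1 slide reports a straight line on log-log axes with slope -2.15. Below is a minimal sketch of how such an exponent could be estimated, assuming a directed edge list of (source, destination) pairs and an ordinary least-squares fit on the log-log histogram. The function names are illustrative, nodes with outdegree 0 are ignored, and the original study may have used a different fitting procedure.

```python
import math
from collections import Counter

def degree_histogram(edges):
    """Return {outdegree: number of nodes with that outdegree} for (src, dst) edges."""
    outdeg = Counter(src for src, _ in edges)
    return Counter(outdeg.values())

def loglog_slope(points):
    """Least-squares slope of log(y) against log(x) for (x, y) pairs with x, y > 0."""
    xs = [math.log(x) for x, _ in points]
    ys = [math.log(y) for _, y in points]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

# Synthetic histogram (an assumption, not real data) that follows
# freq = 1e6 * degree^(-2.15) exactly, so the fit should recover -2.15.
synthetic = [(d, 1e6 * d ** -2.15) for d in range(1, 101)]
slope = loglog_slope(synthetic)
small_hist = degree_histogram([(1, 2), (1, 3), (2, 3)])
```

On real data the histogram tail is noisy, so a plain least-squares fit like this one is only a rough first estimate.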

Solution# S.1’
• Power law in the degree distribution [SIGCOMM 99]
[Figure: internet domains; log(degree) vs. log(rank), slope -0.82; att.com and ibm.com are labeled points]

Solution# S.2: Eigen Exponent E
• Exponent = slope: E = -0.48 (May 2001)
• A2: power law in the eigenvalues of the adjacency matrix
• [Mihail, Papadimitriou ’02]: slope is ½ of rank exponent
[Figure: eigenvalue vs. rank of decreasing eigenvalue, log-log scale]

But: How about graphs from other domains?

More power laws:
• web hit counts [w/ A. Montgomery]
[Figure: Web Site Traffic; count (log scale) vs. in-degree (log scale); labels: Zipf, ``ebay’’, users, sites]

epinions.com
• who-trusts-whom [Richardson + Domingos, KDD 2001]
[Figure: count vs. (out) degree; annotation: trusts-2000-people user]

And numerous more
• # of sexual contacts
• Income [Pareto]: ‘80-20 distribution’
• Duration of downloads [Bestavros+]
• Duration of UNIX jobs (‘mice and elephants’)
• Size of files of a user
• …
• ‘Black swans’

Outline
• Introduction – Motivation
• Patterns in graphs
  – Patterns in Static graphs
    • Degree
    • Triangles
    • …
  – Patterns in Weighted graphs
  – Patterns in Time evolving graphs
• Generators

Solution# S.3: Triangle ‘Laws’
• Real social networks have a lot of triangles
  – Friends of friends are friends
• Any patterns?

Triangle Law: #S.3 [Tsourakakis ICDM 2008]
[Figure: panels HEP-TH, ASN, Epinions. X-axis: # of triangles a node participates in. Y-axis: count of such nodes]
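The S.3 plots count, for each number of triangles, how many nodes participate in that many. Below is a small sketch of how those two quantities could be computed for a graph that fits in memory, assuming an undirected edge list. The function name and the toy graph are illustrative, and this brute-force neighbour-pair check is not the method of the cited paper.

```python
from collections import Counter, defaultdict
from itertools import combinations

def triangles_per_node(edges):
    """Return {node: number of triangles the node participates in}."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    counts = {}
    for node, nbrs in adj.items():
        # Each connected pair of neighbours closes one triangle at `node`.
        counts[node] = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return counts

# Toy graph with two triangles, {1, 2, 3} and {3, 4, 5}, plus a pendant node 6.
edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (3, 5), (5, 6)]
per_node = triangles_per_node(edges)       # x-axis values, one per node
histogram = Counter(per_node.values())     # y-axis: count of nodes per x value
```

Plotting `histogram` on log-log axes would give the kind of picture the slide describes.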

Triangle Law: #S.4 [Tsourakakis ICDM 2008]
• n friends -> ~n^1.6 triangles
[Figure: panels Reuters, SN, Epinions. X-axis: degree. Y-axis: mean # triangles]

Outline
• Introduction – Motivation
• Patterns in graphs
  – Patterns in Static graphs
  – Patterns in Weighted graphs
  – Patterns in Time evolving graphs
• Generators

Observations on weighted graphs?
• A: yes - even more ‘laws’!
M. McGlohon, L. Akoglu, and C. Faloutsos. Weighted Graphs and Disconnected Components: Patterns and a Generator. SIG-KDD 2008

Observation W.1: Fortification
Q: How do the weights of nodes relate to degree?
• More donors, more $?
[Figure: donations of $10, $5 and $7 flowing to candidates ‘Reagan’ and ‘Clinton’]

Observation W.1: Fortification: Snapshot Power Law
• Weight: super-linear on in-degree
• exponent ‘iw’: 1.01 < iw < 1.26
• More donors, even more $
• e.g. John Kerry, $10M received, from 1K donors
[Figure: Orgs-Candidates; in-weights ($) vs. edges (# donors)]
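To make "super-linear" concrete: if total in-weight W scales as d^iw with in-degree d, then doubling the number of donors multiplies the money received by 2^iw, which is more than 2 whenever iw > 1. Below is a short arithmetic sketch under that assumption; the function name is illustrative and the only numbers used are the ones quoted on the slide.

```python
def weight_multiplier(degree_ratio, iw):
    """Growth factor of in-weight when in-degree grows by `degree_ratio`, assuming W ∝ d^iw."""
    return degree_ratio ** iw

# Range of exponents reported on the slide: 1.01 < iw < 1.26.
low = weight_multiplier(2, 1.01)    # just over 2x the money for 2x the donors
high = weight_multiplier(2, 1.26)   # roughly 2.4x the money for 2x the donors

# Slide example: $10M received from 1K donors.
average_per_donor = 10_000_000 / 1_000
```

With an exactly linear relation (iw = 1) both multipliers would equal 2.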

Outline
• Introduction – Motivation
• Patterns in graphs
  – Patterns in Static graphs
  – Patterns in Weighted graphs
  – Patterns in Time evolving graphs
• Generators

Problem: Time evolution
• with Jure Leskovec (CMU -> Stanford)
• and Jon Kleinberg (Cornell – sabb. @ CMU)

T.1 Evolution of the Diameter
• Prior work on Power Law graphs hints at slowly growing diameter:
  – diameter ~ O(log N)
• What is happening in real data?
• Diameter shrinks over time

T.1 Diameter – “Patents”
• Patent citation network
• 25 years of data
• @1999
  – 2.9M nodes
  – 16.5M edges
[Figure: diameter vs. time [years]]
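A toy illustration of how a diameter can shrink as a graph gains edges. This sketch computes the exact diameter of a small, connected, undirected graph by running a breadth-first search from every node. The function name and the example graph are assumptions for illustration, and graphs with millions of nodes would need sampling or approximation instead.

```python
from collections import deque, defaultdict

def diameter(edges):
    """Longest shortest-path length (in hops) of a connected, undirected graph."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    def eccentricity(source):
        # Plain BFS: distance from `source` to every reachable node.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        return max(dist.values())

    return max(eccentricity(node) for node in list(adj))

chain = [(i, i + 1) for i in range(7)]            # path 0-1-2-...-7
before = diameter(chain)                          # early snapshot
after = diameter(chain + [(0, 4), (3, 7)])        # same nodes, two extra edges
```

Here the node set stays fixed and only edges are added; in the patent data both grow, yet the slide reports the same downward trend.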

T.2 Temporal Evolution of the Graphs
• N(t) … nodes at time t
• E(t) … edges at time t
• Suppose that N(t+1) = 2 * N(t)
• Q: what is your guess for E(t+1) =? 2 * E(t)
• A: over-doubled!
  – But obeying the ``Densification Power Law’’

T.2 Densification – Patent Citations
• Citations among patents granted
• @1999
  – 2.9M nodes
  – 16.5M edges
• Each year is a datapoint
[Figure: E(t) vs. N(t), log-log scale, slope 1.66]
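Reading the 1.66 slope as the exponent a in E(t) ∝ N(t)^a, doubling the nodes multiplies the edges by 2^1.66, a bit more than 3. The sketch below does that arithmetic and shows how an exponent could be read off two snapshots. The function names and the snapshot values are illustrative assumptions, not figures from the patent dataset.

```python
import math

def edge_growth_factor(node_growth, exponent):
    """Factor by which E grows when N grows by `node_growth`, assuming E ∝ N^exponent."""
    return node_growth ** exponent

def densification_exponent(n1, e1, n2, e2):
    """Exponent a in E ∝ N^a implied by two snapshots (N1, E1) and (N2, E2)."""
    return math.log(e2 / e1) / math.log(n2 / n1)

# Answer to the slide's question under a = 1.66: edges grow about 3.16x.
doubling = edge_growth_factor(2.0, 1.66)

# Hypothetical snapshots constructed to follow the law exactly.
a = densification_exponent(1_000, 5_000, 2_000, 5_000 * 2 ** 1.66)
```

With one datapoint per year, as on the slide, a line fit over all (log N, log E) points would be the more robust estimate.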

Outline
• Introduction – Motivation
• Patterns in graphs
  – Patterns in Static graphs
  – Patterns in Weighted graphs
  – Patterns in Time evolving graphs
• Generators

More on Time-evolving graphs
M. McGlohon, L. Akoglu, and C. Faloutsos. Weighted Graphs and Disconnected Components: Patterns and a Generator. SIG-KDD 2008

Observation T.3: NLCC behavior
Q: How do NLCC’s emerge and join with the GCC? (``NLCC’’ = non-largest conn. components)
  – Do they continue to grow in size?
  – or do they shrink?
  – or stabilize?
• After the gelling point, the GCC takes off, but NLCC’s remain ~constant (actually, oscillate).
[Figure: IMDB; CC size vs. time-stamp]
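One way to observe this behaviour on a timestamped edge stream is to maintain connected components incrementally and record the sizes of the largest and second-largest component after every edge. The sketch below does this with a union-find structure. It assumes nodes are numbered 0..n-1, and the function name and toy stream are illustrative rather than taken from the paper.

```python
def component_sizes_over_time(num_nodes, edge_stream):
    """Return [(largest size, second-largest size)] after each edge of the stream."""
    parent = list(range(num_nodes))
    size = [1] * num_nodes

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    history = []
    for u, v in edge_stream:
        ru, rv = find(u), find(v)
        if ru != rv:
            if size[ru] < size[rv]:
                ru, rv = rv, ru
            parent[rv] = ru          # attach the smaller tree under the larger
            size[ru] += size[rv]
        # Recomputed from scratch for clarity; a real run would update incrementally.
        sizes = sorted((size[r] for r in range(num_nodes) if parent[r] == r), reverse=True)
        history.append((sizes[0], sizes[1] if len(sizes) > 1 else 0))
    return history

stream = [(0, 1), (2, 3), (1, 2), (4, 5), (3, 4), (6, 7)]
history = component_sizes_over_time(8, stream)
```

In this toy stream the largest component keeps growing (2, 2, 4, 4, 6, 6) while the runner-up bounces between 1 and 2 as it is repeatedly absorbed.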

Generalized Iterated Matrix Vector Multiplication (GIM-V)
PEGASUS: A Peta-Scale Graph Mining System - Implementation and Observations. U Kang, Charalampos E. Tsourakakis, and Christos Faloutsos. ICDM 2009, Miami, Florida, USA. Best Application Paper (runner-up).

Example: GIM-V At Work
• Connected Components
  – ~0.7B singleton nodes
  – 300-size cmpt X 500. Why?
  – 1100-size cmpt X 65. Why?
  – suspicious financial-advice sites (not existing now)
[Figure: count vs. size of connected components]
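For intuition, connected components can be phrased as a repeated generalized matrix-vector product in which each node keeps the smallest component label seen among itself and its neighbours. The single-machine sketch below assumes the three-operation view of GIM-V (combine2, combineAll, assign). The Python names, the dictionary representation and the toy graph are illustrative assumptions and do not reproduce the PEGASUS code, which runs on Hadoop.

```python
from collections import Counter

def gimv(matrix, vector, combine2, combine_all, assign):
    """One generalized matrix-vector step over sparse (i, j, m_ij) triples."""
    partial = {}
    for i, j, m_ij in matrix:
        partial.setdefault(i, []).append(combine2(m_ij, vector[j]))
    return {
        i: assign(v_i, combine_all(partial[i])) if i in partial else v_i
        for i, v_i in vector.items()
    }

def connected_components(nodes, undirected_edges):
    """Label every node with the smallest node id in its component (ids must be ints)."""
    matrix = [(u, v, 1) for u, v in undirected_edges]
    matrix += [(v, u, 1) for u, v in undirected_edges]
    labels = {n: n for n in nodes}
    while True:
        updated = gimv(matrix, labels,
                       combine2=lambda m_ij, v_j: m_ij * v_j,
                       combine_all=min,   # smallest label among neighbours
                       assign=min)        # keep the smaller of old and new label
        if updated == labels:             # fixed point: no label changed
            return labels
        labels = updated

labels = connected_components(range(7), [(0, 1), (1, 2), (3, 4), (5, 5)])
component_sizes = Counter(labels.values())          # component label -> size
size_histogram = Counter(component_sizes.values())  # size -> count of components
```

`size_histogram` is the count-vs-size data plotted on the slide, where unexpected spikes at particular sizes are what prompt the "Why?" questions.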

GIM-V At Work
• Connected Components over Time
• LinkedIn: 7.5M nodes and 58M edges
• Stable tail slope after the gelling point

Timing for Blogs
• with Mary McGlohon (CMU)
• Jure Leskovec (CMU->Stanford)
• Natalie Glance (now at Google)
• Mat Hurst (now at MSR)
[SDM’07]

T.4: popularity over time
• Post popularity drops off: exponentially?
• POWER LAW! Exponent? -1.6
  – close to -1.5: Barabasi’s stack model
  – and like the zero-crossings of a random walk
[Figure: # in links vs. lag (days after post: 1, 2, 3), first on linear axes, then on log-log axes with slope -1.6; a post at time t receives an in-link at time t + lag]
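A quick way to tell the two hypotheses apart: a power law is a straight line on log-log axes, while an exponential is a straight line on semi-log axes (log count against raw lag). The sketch below fits a line both ways and compares the residuals. The synthetic in-link counts are an assumption, built to follow lag^(-1.6) exactly, and the helper name is illustrative.

```python
import math

def line_fit(xs, ys):
    """Return (slope, sum of squared residuals) of the least-squares line through the points."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return slope, sse

lags = list(range(1, 31))                        # days after post
in_links = [1000 * lag ** -1.6 for lag in lags]  # synthetic power-law decay
log_counts = [math.log(c) for c in in_links]

# Power-law hypothesis: log(count) is linear in log(lag).
power_slope, power_sse = line_fit([math.log(t) for t in lags], log_counts)
# Exponential hypothesis: log(count) is linear in lag itself.
exp_slope, exp_sse = line_fit(lags, log_counts)
```

For this synthetic series the log-log fit is essentially exact and recovers the -1.6 exponent, while the semi-log fit leaves a clearly larger residual.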

Conclusions (part 1)
MANY patterns in real graphs
• Skewed degree distributions
• Small (and shrinking) diameter
• Power-laws wrt triangles
• Oscillating size of connected components
• … and more

References
• Jure Leskovec, Jon Kleinberg and Christos Faloutsos: Graphs over Time: Densification Laws, Shrinking Diameters and Possible Explanations, KDD 2005 (Best Research paper award).

Project info
www.cs.cmu.edu/~pegasus
• Chau, Polo
• Akoglu, Leman
• McGlohon, Mary
• Kang, U
• Tsourakakis, Babis
• Prakash, Aditya
• Tong, Hanghang
Thanks to: NSF IIS-0705359, IIS-0534205, CTA-INARC; Yahoo (M45), LLNL, IBM, SPRINT, INTEL, HP

Part 1 END