CMU SCS
Graph Mining: Laws, Generators and Tools
Christos Faloutsos
CMU
IMC 08

Thank you!
• Dina Papagiannaki
• Zhi-li Zhang
• Bruce Maggs

Outline
• Problem definition / Motivation
• Static & dynamic laws; generators
• Tools: CenterPiece graphs; Tensors
• Other projects (Virus propagation, e-bay fraud detection)
• Conclusions

Motivation
Data mining: ~ find patterns (rules, outliers)
• Problem#1: What do real graphs look like?
• Problem#2: How do they evolve?
• Problem#3: How to generate realistic graphs
TOOLS
• Problem#4: Who is the ‘master-mind’?
• Problem#5: Track communities over time

Problem#1:
Joint work with Dr. Deepayan Chakrabarti (CMU/Yahoo R.L.)

Graphs - why should we care?
[Figures: Internet Map [lumeta.com]; Food Web [Martinez ’91]; Protein Interactions [genomebiology.com]; Friendship Network [Moody ’01]]

Graphs - why should we care?
• IR: bi-partite graphs (doc-terms)
[Figure: bipartite graph linking documents D1 ... DN to terms T1 ... TM]
• web: hyper-text graph
• ... and more:

Graphs - why should we care?
• network of companies & board-of-directors members
• ‘viral’ marketing
• web-log (‘blog’) news propagation
• computer network security: email/IP traffic and anomaly detection
• ...

Problem #1 - network and graph mining
• What does the Internet look like?
• What does the web look like?
• What is ‘normal’/‘abnormal’?
• Which patterns/laws hold?

Graph mining
• Are real graphs random?

Laws and patterns
• Are real graphs random?
• A: NO!!
  – Diameter
  – in- and out-degree distributions
  – other (surprising) patterns

Solution#1
• Power law in the degree distribution [SIGCOMM 99]
[Figure: internet domains, log(degree) vs. log(rank); slope -0.82; labeled points: att.com, ibm.com]

Solution#1’: Eigen Exponent E
[Figure: eigenvalue vs. rank of decreasing eigenvalue, May 2001; Eigen Exponent = slope E = -0.48]
• A2: power law in the eigenvalues of the adjacency matrix
• [Mihail, Papadimitriou ’02]: slope is ½ of rank exponent
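The exponents quoted on these slides are slopes of straight lines on log-log plots. As a minimal sketch of how such a slope can be estimated, the snippet below runs an ordinary least-squares fit on log(x) and log(y). The function name `fit_loglog_slope` and the synthetic data are assumptions made for illustration; this is not necessarily the fitting procedure used for the plots in the talk.

```python
import math

def fit_loglog_slope(xs, ys):
    """Least-squares slope and intercept of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Synthetic, made-up data that follows degree = 100 * rank^(-0.82) exactly.
ranks = list(range(1, 51))
degrees = [100.0 * r ** -0.82 for r in ranks]

slope, intercept = fit_loglog_slope(ranks, degrees)
print(f"estimated rank exponent: {slope:.2f}")
```

On real, noisy data the fitted slope depends on which part of the curve is included, so it should be read as a rough summary rather than an exact constant.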

But: How about graphs from other domains?

The Peer-to-Peer Topology [Jovanovic+]
• Count versus degree
• Number of adjacent peers follows a power-law

More power laws: citation counts (citeseer.nj.nec.com 6/2001)
[Figure: log(count) vs. log(#citations); labeled point: Ullman]

More power laws:
• web hit counts [w/ A. Montgomery]
[Figure: Web Site Traffic, log(count) vs. log(in-degree); Zipf; labels: users, sites, ``ebay’’]

epinions.com
• who-trusts-whom [Richardson + Domingos, KDD 2001]
[Figure: count vs. (out) degree; labeled point: trusts-2000-people user]


Problem#2: Time evolution
• with Jure Leskovec (CMU/MLD)
• and Jon Kleinberg (Cornell – sabb. @ CMU)

Evolution of the Diameter
• Prior work on Power Law graphs hints at slowly growing diameter:
  – diameter ~ O(log N)
• What is happening in real data?
• Diameter shrinks over time

Diameter – ArXiv citation graph
• Citations among physics papers
• 1992 – 2003
• One graph per year
[Figure: diameter vs. time [years]]

Diameter – “Autonomous Systems”
• Graph of Internet
• One graph per day
• 1997 – 2000
[Figure: diameter vs. number of nodes]

Diameter – “Affiliation Network”
• Graph of collaborations in physics
  – authors linked to papers
• 10 years of data
[Figure: diameter vs. time [years]]

Diameter – “Patents”
• Patent citation network
• 25 years of data
[Figure: diameter vs. time [years]]
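To make the quantity on these plots concrete, here is a minimal sketch that computes the plain hop-count diameter of a small connected, undirected graph with breadth-first search. The toy graphs and the function names are hypothetical, and the plots in the talk may use a different variant of diameter; the point is only that adding edges can shorten the longest shortest path.

```python
from collections import deque

def eccentricity(adj, source):
    """Longest shortest-path distance (in hops) from source, found by BFS."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())

def diameter(adj):
    """Largest eccentricity over all nodes; assumes a connected undirected graph."""
    return max(eccentricity(adj, u) for u in adj)

# Hypothetical toy graphs: a 5-node path, then the same nodes with edge 4-0 added.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
cycle = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 0]}

print(diameter(path), diameter(cycle))
```

Running BFS from every node costs O(N * (N + E)), which is fine for a toy example but too slow for graphs with millions of nodes.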

Temporal Evolution of the Graphs
• N(t) … nodes at time t
• E(t) … edges at time t
• Suppose that N(t+1) = 2 * N(t)
• Q: what is your guess for E(t+1) =? 2 * E(t)
• A: over-doubled!
  – But obeying the ``Densification Power Law’’

Densification – Physics Citations
• Citations among physics papers
• 2003:
  – 29,555 papers, 352,807 citations
[Figure: E(t) vs. N(t); slope 1.69; reference slopes: 1 (tree), 2 (clique)]

Densification – Patent Citations
• Citations among patents granted
• 1999
  – 2.9 million nodes
  – 16.5 million edges
• Each year is a datapoint
[Figure: E(t) vs. N(t); slope 1.66]

Densification – Autonomous Systems
• Graph of Internet
• 2000
  – 6,000 nodes
  – 26,000 edges
• One graph per day
[Figure: E(t) vs. N(t); slope 1.18]

Densification – Affiliation Network
• Authors linked to their publications
• 2002
  – 60,000 nodes
    • 20,000 authors
    • 38,000 papers
  – 133,000 edges
[Figure: E(t) vs. N(t); slope 1.15]
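The "over-doubled" answer follows from reading the slopes above as exponents: if E(t) is proportional to N(t)^a, then doubling the nodes multiplies the edges by 2^a. The sketch below does that arithmetic for the slopes quoted on the densification slides. The function name `edge_growth_factor` is made up for illustration, and it assumes the power law holds exactly.

```python
def edge_growth_factor(exponent, node_growth=2.0):
    """If E(t) is proportional to N(t)**exponent, return E(t+1) / E(t)
    when N(t+1) = node_growth * N(t)."""
    return node_growth ** exponent

# Slopes quoted on the densification slides.
slopes = {
    "physics citations": 1.69,
    "patent citations": 1.66,
    "autonomous systems": 1.18,
    "affiliation network": 1.15,
}

for name, a in slopes.items():
    factor = edge_growth_factor(a)
    print(f"{name}: doubling the nodes multiplies the edges by about {factor:.2f}")
```

The two reference slopes on the physics-citation plot bracket the possibilities: exponent 1 (tree-like, constant average degree) gives exactly a doubling, and exponent 2 (clique-like) gives a factor of 4.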


Problem#3: Generation

Problem Definition
• Given a growing graph with count of nodes N1, N2, …
• Generate a realistic sequence of graphs that will obey all the patterns
  – Static Patterns
    • Power Law Degree Distribution
    • Power Law eigenvalue and eigenvector distribution
    • Small Diameter
  – Dynamic Patterns
    • Growth Power Law
    • Shrinking/Stabilizing Diameters
• Idea: Self-similarity
  – Leads to power laws
  – Communities within communities
  – …

Kronecker Product – a Graph
[Figure: intermediate stage; adjacency matrices]
• Continuing multiplying with G1 we obtain G4 and so on …
[Figure: G4 adjacency matrix]
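The repeated multiplication with G1 can be sketched in a few lines. The code below builds Kronecker powers of a small adjacency matrix using plain lists. The 3-node initiator `g1` (a path with self-loops) is a hypothetical choice made for illustration, and the helper names are not taken from the talk.

```python
def kronecker(a, b):
    """Kronecker product of two matrices given as lists of lists."""
    rows_b, cols_b = len(b), len(b[0])
    return [
        [a[i // rows_b][j // cols_b] * b[i % rows_b][j % cols_b]
         for j in range(len(a[0]) * cols_b)]
        for i in range(len(a) * rows_b)
    ]

def kronecker_power(g1, k):
    """k-th Kronecker power of g1: g1 multiplied with itself k - 1 times."""
    result = g1
    for _ in range(k - 1):
        result = kronecker(result, g1)
    return result

# Hypothetical initiator: path 0-1-2 with a self-loop on every node.
g1 = [[1, 1, 0],
      [1, 1, 1],
      [0, 1, 1]]

g2 = kronecker_power(g1, 2)
g4 = kronecker_power(g1, 4)
print(len(g2), len(g4))  # number of nodes in G2 and G4
for row in g2:
    print("".join("#" if x else "." for x in row))
```

Each block of G2 is either a copy of G1 or all zeros, which is the self-similar, communities-within-communities structure the slides refer to.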

Properties:
• We can PROVE that
  – Degree distribution is multinomial ~ power law
  – Diameter: constant
  – Eigenvalue distribution: multinomial
  – First eigenvector: multinomial
• See [Leskovec+, PKDD’05] for proofs

Problem Definition
• Given a growing graph with nodes N1, N2, …
• Generate a realistic sequence of graphs that will obey all the patterns
  – Static Patterns
    • Power Law Degree Distribution
    • Power Law eigenvalue and eigenvector distribution
    • Small Diameter
  – Dynamic Patterns
    • Growth Power Law
    • Shrinking/Stabilizing Diameters
• First and only generator for which we can prove all these properties

Stochastic Kronecker Graphs
• Create N1 × N1 probability matrix P1
• Compute the kth Kronecker power Pk
• For each entry puv of Pk include an edge (u,v) with probability puv

P1 =
  0.4   0.2
  0.1   0.3

Pk (Kronecker multiplication of P1 with itself) =
  0.16  0.08  0.08  0.04
  0.04  0.12  0.02  0.06
  0.04  0.02  0.12  0.06
  0.01  0.03  0.03  0.09

Flip biased coins to obtain the instance matrix G2.
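The three steps on this slide can be sketched as follows, using the P1 values shown on the slide. The helper names and the fixed random seed are assumptions made for illustration. The sketch materializes the full probability matrix, which is only practical for small k.

```python
import random

def kronecker(a, b):
    """Kronecker product of two matrices given as lists of lists."""
    rows_b, cols_b = len(b), len(b[0])
    return [
        [a[i // rows_b][j // cols_b] * b[i % rows_b][j % cols_b]
         for j in range(len(a[0]) * cols_b)]
        for i in range(len(a) * rows_b)
    ]

def sample_graph(prob_matrix, rng):
    """Flip one biased coin per entry: edge (u, v) appears with probability p_uv."""
    return [[1 if rng.random() < p_uv else 0 for p_uv in row]
            for row in prob_matrix]

# P1 from the slide; P2 is its second Kronecker power.
p1 = [[0.4, 0.2],
      [0.1, 0.3]]
p2 = kronecker(p1, p1)

rng = random.Random(0)  # fixed seed (arbitrary) so the sketch is repeatable
g2 = sample_graph(p2, rng)

for row in p2:
    print(["%.2f" % p for p in row])
for row in g2:
    print(row)
```

Because every edge is an independent coin flip, the expected number of edges equals the sum of the entries of Pk, which is the k-th power of the sum of the entries of P1.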

Experiments
• How well can we match real graphs?
  – Arxiv: physics citations:
    • 30,000 papers, 350,000 citations
    • 10 years of data
  – U.S. Patent citation network
    • 4 million patents, 16 million citations
    • 37 years of data
  – Autonomous systems – graph of internet
    • Single snapshot from January 2002
    • 6,400 nodes, 26,000 edges
• We show both static and temporal patterns

(Q: how to fit the parm’s?)
A:
• Stochastic version of Kronecker graphs +
• Max likelihood +
• Metropolis sampling
• [Leskovec+, ICML’07]

Experiments on real AS graph
[Figure panels: Degree distribution; Hop plot; Adjacency matrix eigenvalues; Network value]

Conclusions
• Kronecker graphs have:
  – All the static properties
    • Heavy tailed degree distributions
    • Small diameter
    • Multinomial eigenvalues and eigenvectors
  – All the temporal properties
    • Densification Power Law
    • Shrinking/Stabilizing Diameters
  – We can formally prove these results


Problem#4: MasterMind – ‘CePS’
• w/ Hanghang Tong, KDD 2006
• htong <at> cs.cmu.edu

Center-Piece Subgraph (CePS)
• Given Q query nodes
• Find Center-piece
• App.
  – Social Networks
  – Law Enforcement, …
• Idea:
  – Proximity -> random walk with restarts
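As a minimal sketch of the proximity idea, the code below computes random-walk-with-restart scores for a single query node by power iteration. The restart probability of 0.15, the iteration count, the toy graph and the function name are all assumptions made for illustration. How the per-query scores are combined into a center-piece subgraph is described in the KDD 2006 paper and is not shown here.

```python
def random_walk_with_restart(adj, query, restart=0.15, iters=100):
    """Steady-state visiting probabilities of a walker that follows a random
    edge with probability 1 - restart and jumps back to `query` otherwise.
    Assumes an undirected graph where every node has at least one neighbor."""
    nodes = list(adj)
    score = {u: (1.0 if u == query else 0.0) for u in nodes}
    for _ in range(iters):
        nxt = {u: 0.0 for u in nodes}
        for u in nodes:
            # Split the walking share of u's probability mass among its neighbors.
            share = (1.0 - restart) * score[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        nxt[query] += restart  # the restart mass always returns to the query node
        score = nxt
    return score

# Hypothetical toy graph: a path a-b-c-d.
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
scores = random_walk_with_restart(graph, "a")
for node in graph:
    print(node, round(scores[node], 3))
```

Nodes that are close to the query, or reachable from it through many short paths, end up with higher scores, which is what makes the score usable as a proximity measure.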

Case Study: AND query
[Figures: center-piece subgraph for the query nodes R. Agrawal, Jiawei Han, V. Vapnik, M. Jordan]

2_SoftAnd query
[Figure: result subgraph with two groups labeled databases and ML/Statistics]

Conclusions
• Q1: How to measure the importance?
• A1: RWR + K_SoftAnd
• Q2: How to do it efficiently?
• A2: Graph Partition (Fast CePS)
  – ~90% quality
  – 150x speedup (ICDM’06, b.p. award)


Tensors for time evolving graphs
• [Jimeng Sun+ KDD’06]
• [ “ , SDM’07]
• [CF, Kolda, Sun, SDM’07 tutorial]

Social network analysis
• Static: find community structures
[Figure: Authors × Keywords matrix (DB), one per year: 1990, 1991, 1992]
• Dynamic: monitor community structure evolution; spot abnormal individuals; abnormal time-stamps

Application 1: Multiway latent semantic indexing (LSI)
[Figure: authors × keyword data from 1990 to 2004, with projection matrices Uauthors and Ukeyword; DB and DM clusters; labels: Philip Yu, Michael Stonebraker, Pattern, Query]
• Projection matrices specify the clusters
• Core tensors give cluster activation level

Bibliographic data (DBLP)
• Papers from VLDB and KDD conferences
• Construct 2nd order tensors with yearly windows with <author, keywords>
• Each tensor: 4584 × 3741
• 11 timestamps (years)

Multiway LSI

Group | Authors | Keywords | Year
DB | michael carey, michael stonebraker, h. jagadish, hector garcia-molina | queri, parallel, optimization, concurr, objectorient | 1995
DB | surajit chaudhuri, mitch cherniack, michael stonebraker, ugur cetintemel | distribut, systems, view, storage, servic, process, cache | 2004
DM | jiawei han, jian pei, philip s. yu, jianyong wang, charu c. aggarwal | streams, pattern, support, cluster, index, gener, queri | 2004

• Two groups are correctly identified: Databases and Data mining
• People and concepts are drifting over time

Network forensics
• Directional network flows
• A large ISP with 100 POPs, each POP 10 Gbps link capacity [Hotnets 2004]
  – 450 GB/hour with compression
• Task: Identify abnormal traffic pattern and find out the cause
[Figure: source vs. destination plots for normal traffic and abnormal traffic]
(with Prof. Hui Zhang and Dr. Yinglian Xie)

MDL mining on time-evolving graph (Enron emails)
GraphScope [w. Jimeng Sun, Spiros Papadimitriou and Philip Yu, KDD’07]

Conclusions
Tensor-based methods (WTA/DTA/STA):
• spot patterns and anomalies on time evolving graphs, and
• on streams (monitoring)


Outline
• Problem definition / Motivation
• Static & dynamic laws; generators
• Tools: CenterPiece graphs; Tensors
• Other projects (Virus propagation, e-bay fraud detection, blogs)
• Conclusions

Virus propagation
• How do viruses/rumors propagate?
• Blog influence?
• Will a flu-like virus linger, or will it become extinct soon?

The model: SIS
• ‘Flu’ like: Susceptible-Infected-Susceptible
• Virus ‘strength’ s = β/δ
[Figure: node N with neighbors N1, N2, N3; states Healthy and Infected; transitions labeled Prob. β and Prob. δ]

Epidemic threshold τ
of a graph: the value of τ, such that if strength s = β/δ < τ, an epidemic cannot happen
Thus,
• given a graph
• compute its epidemic threshold

Epidemic threshold τ
What should τ depend on?
• avg. degree? and/or highest degree?
• and/or variance of degree?
• and/or third moment of degree?
• and/or diameter?

Epidemic threshold
• [Theorem] We have no epidemic, if
    β/δ < τ = 1/λ1,A
  where β is the attack prob., δ is the recovery prob., τ is the epidemic threshold, and λ1,A is the largest eigenvalue of adj. matrix A
Proof: [Wang+03]
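The theorem reduces "compute the epidemic threshold" to finding the largest eigenvalue of the adjacency matrix. Below is a minimal sketch that estimates it by power iteration and returns its reciprocal. The function names, the iteration count and the toy graphs are assumptions made for illustration. The code assumes a connected, undirected graph and iterates on A + I so that the method also converges on bipartite graphs such as a star.

```python
import math

def largest_eigenvalue(adj_matrix, iters=1000):
    """Largest eigenvalue of a symmetric 0/1 adjacency matrix, by power
    iteration on A + I followed by a Rayleigh quotient on A."""
    n = len(adj_matrix)
    x = [1.0] * n
    for _ in range(iters):
        # y = (A + I) x
        y = [x[i] + sum(adj_matrix[i][j] * x[j] for j in range(n))
             for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    ax = [sum(adj_matrix[i][j] * x[j] for j in range(n)) for i in range(n)]
    return sum(x[i] * ax[i] for i in range(n))  # x has unit length

def epidemic_threshold(adj_matrix):
    """tau = 1 / lambda_1 of the adjacency matrix."""
    return 1.0 / largest_eigenvalue(adj_matrix)

# Hypothetical toy graphs on few nodes.
star = [[0, 1, 1, 1, 1],
        [1, 0, 0, 0, 0],
        [1, 0, 0, 0, 0],
        [1, 0, 0, 0, 0],
        [1, 0, 0, 0, 0]]
clique = [[0 if i == j else 1 for j in range(4)] for i in range(4)]

print(epidemic_threshold(star), epidemic_threshold(clique))
```

A denser or more hub-dominated graph has a larger leading eigenvalue and therefore a lower threshold, meaning a weaker virus can already sustain an epidemic.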

Experiments (Oregon)
[Figure: three curves, for β/δ > τ (above threshold), β/δ = τ (at the threshold), and β/δ < τ (below threshold)]


E-bay Fraud detection
w/ Polo Chau & Shashank Pandit, CMU

E-bay Fraud detection
• lines: positive feedbacks
• would you buy from him/her?
• or him/her?

E-bay Fraud detection - NetProbe


Blog analysis
• with Mary McGlohon (CMU)
• Jure Leskovec (CMU)
• Natalie Glance (now at Google)
• Mat Hurst (now at MSR)
[SDM’07]

Cascades on the Blogosphere
[Figure, three panels: Blogosphere (blogs B1 to B4 + posts); Blog network (links among blogs); Post network (links among posts a to e)]
Q1: popularity-decay of a post?
Q2: degree distributions?

Q1: popularity over time
[Figure: # in links (log) vs. days after post (log); slope -1.6]
Post popularity drops-off – exponentially?
POWER LAW!
Exponent? -1.6 (close to -1.5: Barabasi’s stack model)

Q2: degree distribution
• 44,356 nodes, 122,153 edges. Half of blogs belong to largest connected component.
[Figure: count vs. blog in-degree]
• in-degree slope: -1.7
• out-degree: -3
• ‘rich get richer’

Outline
• Problem definition / Motivation
• Static & dynamic laws; generators
• Tools: CenterPiece graphs; Tensors
• Other projects (Virus propagation, e-bay fraud detection)
  – And research directions
• Conclusions

Next steps:
• edges with
  – categorical attributes and/or
  – time-stamps and/or
  – weights
• nodes with attributes [G-Ray, Tong et al]
• scalability (cloud computing)

E.g.: self-* system @ CMU
• >200 nodes
• 40 racks of computing equipment
• 774 kW of power
• target: 1 PetaByte
• goal: self-correcting, self-securing, self-monitoring, self-...

Cloud computing, D.I.S.C. and hadoop
• ‘Data Intensive Scientific Computing’ [R. Bryant, CMU]
  – ‘big data’
  – http://www.cs.cmu.edu/~bryant/pubdir/cmu-cs-07128.pdf
• Yahoo: ~5 Pb of data [Fayyad’07]
• ‘M45’: 4K proc’s, 3 Tb RAM, 1.5 Pb disk
• Hadoop: open-source clone of map-reduce http://hadoop.apache.org/

OVERALL CONCLUSIONS
• Graphs pose a wealth of fascinating problems
• self-similarity and power laws work, when textbook methods fail!
• New patterns (shrinking diameter!)
• New generator: Kronecker
• SVD / tensors / RWR: valuable tools
• Scalability / cloud computing -> PetaBytes

References
• Hanghang Tong, Christos Faloutsos, and Jia-Yu Pan. Fast Random Walk with Restart and Its Applications. ICDM 2006, Hong Kong.
• Hanghang Tong, Christos Faloutsos. Center-Piece Subgraphs: Problem Definition and Fast Solutions. KDD 2006, Philadelphia, PA.
• Hanghang Tong, Brian Gallagher, Christos Faloutsos, and Tina Eliassi-Rad. Fast Best-Effort Pattern Matching in Large Attributed Graphs. KDD 2007, San Jose, CA.
• Jure Leskovec, Jon Kleinberg and Christos Faloutsos. Graphs over Time: Densification Laws, Shrinking Diameters and Possible Explanations. KDD 2005, Chicago, IL ("Best Research Paper" award).
• Jure Leskovec, Deepayan Chakrabarti, Jon Kleinberg, Christos Faloutsos. Realistic, Mathematically Tractable Graph Generation and Evolution, Using Kronecker Multiplication. ECML/PKDD 2005, Porto, Portugal.
• Jure Leskovec and Christos Faloutsos. Scalable Modeling of Real Graphs using Kronecker Multiplication. ICML 2007, Corvallis, OR, USA.
• Shashank Pandit, Duen Horng (Polo) Chau, Samuel Wang and Christos Faloutsos. NetProbe: A Fast and Scalable System for Fraud Detection in Online Auction Networks. WWW 2007, Banff, Alberta, Canada, May 8-12, 2007.
• Jimeng Sun, Dacheng Tao, Christos Faloutsos. Beyond Streams and Graphs: Dynamic Tensor Analysis. KDD 2006, Philadelphia, PA.
• Jimeng Sun, Yinglian Xie, Hui Zhang, Christos Faloutsos. Less is More: Compact Matrix Decomposition for Large Sparse Graphs. SDM, Minneapolis, Minnesota, Apr 2007.
• Jimeng Sun, Spiros Papadimitriou, Philip S. Yu, and Christos Faloutsos. GraphScope: Parameter-free Mining of Large Time-evolving Graphs. ACM SIGKDD Conference, San Jose, CA, August 2007.

Contact info:
www.cs.cmu.edu/~christos
(w/ papers, datasets, code, etc)