Social Networks and Graph Mining
Christos Faloutsos, CMU
MLD-AB '07

Outline
• Problem definition / Motivation
• Graphs and power laws
• [Virus propagation]
• [eBay fraud detection]
• Conclusions

Motivation
• Data mining: ~ find patterns (rules, outliers)
• Problem #1: What do real graphs look like?
• Problem #2: How do viruses propagate?
• Problem #3: How can we spot fraudsters on eBay?

Problem #1: Joint work with
• Dr. Deepayan Chakrabarti (CMU/Yahoo R.L.)

Graphs - why should we care?
• Internet Map [lumeta.com]
• Food Web [Martinez '91]
• Protein Interactions [genomebiology.com]
• Friendship Network [Moody '01]

Graphs - why should we care?
• network of companies & board-of-directors members
• 'viral' marketing
• web-log ('blog') news propagation
• computer network security: email/IP traffic and anomaly detection
• ...

Problem #1 - network and graph mining
• What does the Internet look like?
• What does the web look like?
• What constitutes a 'normal' social network?
• What is 'normal'/'abnormal'?
• Which patterns/laws hold?

Graph mining
• Are real graphs random?

Laws and patterns
NO!!
• Diameter
• in- and out-degree distributions
• other (surprising) patterns

Solution
• Power law in the degree distribution [SIGCOMM 99]
[Figure: internet domains, log(degree) vs. log(rank); fitted slope -0.82; att.com and ibm.com labeled]
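The slope in the rank plot above comes from a straight-line fit in log-log space. The sketch below shows one way to compute such a slope with ordinary least squares; it is an illustration, not the procedure of the SIGCOMM 99 paper. It assumes a plain list of node degrees, the function name `rank_degree_slope` is hypothetical, and the synthetic input is constructed to follow degree = 1000 · rank^-0.82 exactly rather than taken from the internet-domain data.

```python
import math


def rank_degree_slope(degrees):
    """Least-squares slope of log(degree) versus log(rank).

    Nodes are ranked by decreasing degree (rank 1 = highest degree).
    Zero-degree nodes are dropped because log(0) is undefined.
    """
    ranked = sorted((d for d in degrees if d > 0), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(d) for d in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical synthetic degrees that follow degree = 1000 * rank^-0.82.
synthetic = [1000 * r ** -0.82 for r in range(1, 501)]
slope = rank_degree_slope(synthetic)
print(f"fitted slope: {slope:.2f}")
```

On real data the points bend away from a straight line at both ends, so the fitted slope depends on which ranks are included.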

But:
• Q1: How about graphs from other domains?
• Q2: How about temporal evolution?

The Peer-to-Peer Topology [Jovanovic+]
• Frequency versus degree
• Number of adjacent peers follows a power law
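A straight-line fit to a frequency-versus-degree plot can be noticeably biased, so a commonly recommended alternative is the maximum-likelihood estimate of the power-law exponent described by Clauset, Shalizi and Newman. The sketch below counts degree frequencies and applies that estimator. It is a swapped-in technique, not the method used by [Jovanovic+]; the function names and the choice of `x_min` are assumptions, and the sample data is synthetic.

```python
import math
from collections import Counter


def degree_frequencies(degrees):
    """Map each degree value to how many nodes have it (frequency vs. degree)."""
    return dict(sorted(Counter(degrees).items()))


def powerlaw_alpha_mle(values, x_min, discrete=False):
    """Maximum-likelihood exponent alpha for p(x) ~ x^-alpha, x >= x_min.

    Continuous form: alpha = 1 + n / sum(ln(x / x_min)).
    With discrete=True, x_min is replaced by x_min - 0.5 in the logarithm,
    a common approximation for integer data such as degrees.
    """
    tail = [x for x in values if x >= x_min]
    shift = x_min - 0.5 if discrete else x_min
    s = sum(math.log(x / shift) for x in tail)
    if s == 0:
        raise ValueError("need at least one value above the cutoff")
    return 1.0 + len(tail) / s


# Hypothetical sample: evenly spaced quantiles of a continuous power law
# with alpha = 2.5 and x_min = 1 (inverse-transform sampling).
n, alpha = 10_000, 2.5
samples = [(1 - (i + 0.5) / n) ** (-1 / (alpha - 1)) for i in range(n)]
print(round(powerlaw_alpha_mle(samples, x_min=1.0), 2))
print(degree_frequencies([1, 2, 2, 3, 1, 1]))
```

The estimate is only meaningful above a cutoff `x_min`; choosing that cutoff is itself part of the fitting problem.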

More power laws: citation counts (citeseer.nj.nec.com, 6/2001)
[Figure: log(count) vs. log(#citations); Ullman labeled]

Swedish sex-web
Albert Laszlo Barabasi
• Nodes: people (Females; Males)
• Links: sexual relationships
• 4781 Swedes; 18-74; 59% response rate.
http://www.nd.edu/~networks/Publication%20Categories/04%20Talks/2005-norway-3hours.ppt
Liljeros et al., Nature 2001

More power laws:
• web hit counts [w/ A. Montgomery]
[Figure: Web Site Traffic, log(count) vs. log(in-degree) for users and sites; Zipf; "ebay" labeled]

epinions.com
• who-trusts-whom [Richardson + Domingos, KDD 2001]
[Figure: count vs. user (out-)degree; a "trusts-2000-people" user labeled]

Time evolution
• with Jure Leskovec (CMU/MLD)
• and Jon Kleinberg (Cornell; sabb. @ CMU)

Evolution of the Diameter
• Prior work on power-law graphs hints at slowly growing diameter:
  – diameter ~ O(log N)
• What is happening in real data?
• Diameter shrinks over time
  – As the network grows, the distances between nodes slowly decrease
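Each diameter plot is computed per snapshot of a graph. A minimal sketch for small undirected graphs: breadth-first search from every node yields all shortest-path lengths, from which both the exact diameter and an "effective diameter" (the smallest hop count within which a fraction q of connected node pairs lie) can be read off. The effective-diameter variant and the q = 0.9 default are assumptions chosen for robustness to a few long paths, not something stated on these slides; the helper names are hypothetical, and all-pairs BFS is only practical for small graphs.

```python
import math
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def diameter_stats(edges, q=0.9):
    """Return (diameter, effective diameter) of an undirected graph.

    Only connected (reachable) pairs are counted. The effective diameter is
    the smallest hop count d such that at least a fraction q of those pairs
    are within d hops (a simple integer percentile, no interpolation).
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    lengths = []
    for s in adj:
        lengths.extend(d for t, d in bfs_distances(adj, s).items() if t != s)
    lengths.sort()
    k = math.ceil(q * len(lengths)) - 1
    return lengths[-1], lengths[k]


# Hypothetical snapshots of one graph in which edges are added over time.
snapshots = {
    1: [(0, 1), (1, 2), (2, 3)],
    2: [(0, 1), (1, 2), (2, 3), (0, 3), (1, 3)],
}
for year, edges in snapshots.items():
    print(year, diameter_stats(edges))
```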

Diameter - arXiv citation graph
• Citations among physics papers
• 1992-2003
• One graph per year
[Figure: diameter vs. time (years)]

Diameter - "Autonomous Systems"
• Graph of Internet
• One graph per day
• 1997-2000
[Figure: diameter vs. number of nodes]

Diameter - "Affiliation Network"
• Graph of collaborations in physics: authors linked to papers
• 10 years of data
[Figure: diameter vs. time (years)]

Diameter - "Patents"
• Patent citation network
• 25 years of data
[Figure: diameter vs. time (years)]

Temporal Evolution of the Graphs
• N(t) … nodes at time t
• E(t) … edges at time t
• Suppose that N(t+1) = 2 * N(t)
• Q: what is your guess for E(t+1)? Is it 2 * E(t)?
• A: over-doubled!
  – But obeying the 'Densification Power Law'
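The arithmetic behind "over-doubled" follows directly from a densification power law of the form E(t) ∝ N(t)^a, with the exponent a between 1 (tree) and 2 (clique). The derivation below assumes that form; the value a = 1.69 is the physics-citation exponent reported in these slides.

```latex
\begin{aligned}
E(t) &= C\,N(t)^{a}, \qquad 1 \le a \le 2 \\
\frac{E(t+1)}{E(t)} &= \left(\frac{N(t+1)}{N(t)}\right)^{a} = 2^{a} \\
a = 1 &\;\Rightarrow\; 2^{1} = 2 \quad (\text{edges merely double}) \\
a = 1.69 &\;\Rightarrow\; 2^{1.69} \approx 3.23 \\
a = 2 &\;\Rightarrow\; 2^{2} = 4
\end{aligned}
```

Equivalently, the average degree 2E/N grows as N^(a-1), so any a > 1 means the graph gets denser as it grows.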

Densification - Physics Citations
• Citations among physics papers
• 2003: 29,555 papers, 352,807 citations
[Figure: E(t) vs. N(t), log-log; fitted slope 1.69, between the reference slopes 1 (tree) and 2 (clique)]
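An exponent such as 1.69 can be estimated from one (N(t), E(t)) pair per snapshot by fitting a least-squares line through the log-log points. A sketch under that assumption: the function names are hypothetical, the input is synthetic (E = 3 · N^1.5) rather than the arXiv data, and the slides do not say exactly how their fit was computed.

```python
import math


def densification_exponent(snapshots):
    """Least-squares slope a in log E = a * log N + c.

    snapshots: list of (num_nodes, num_edges) pairs, one per time step.
    """
    xs = [math.log(n) for n, _ in snapshots]
    ys = [math.log(e) for _, e in snapshots]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def describe(a):
    """Place an exponent relative to the reference slopes 1 (tree) and 2 (clique)."""
    if a <= 1.0:
        return "no densification: constant average degree (tree-like) or sparser"
    if a >= 2.0:
        return "clique-like growth"
    return "densifying: average degree grows as N^(a-1)"


# Hypothetical snapshots that obey E = 3 * N^1.5 exactly.
snapshots = [(n, 3 * n ** 1.5) for n in (100, 200, 400, 800, 1600)]
a = densification_exponent(snapshots)
print(f"a = {a:.2f}: {describe(a)}")
```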

Densification - Patent Citations
• Citations among patents granted
• 1999: 2.9 million nodes, 16.5 million edges
• Each year is a datapoint
[Figure: E(t) vs. N(t), log-log; fitted slope 1.66]

Densification - Autonomous Systems
• Graph of Internet
• 2000: 6,000 nodes, 26,000 edges
• One graph per day
[Figure: E(t) vs. N(t), log-log; fitted slope 1.18]

Densification - Affiliation Network
• Authors linked to their publications
• 2002: 60,000 nodes (20,000 authors, 38,000 papers), 133,000 edges
[Figure: E(t) vs. N(t), log-log; fitted slope 1.15]

Virus propagation
• How do viruses/rumors propagate?
• Will a flu-like virus linger, or will it become extinct soon?

The model: SIS
• 'Flu'-like: Susceptible-Infected-Susceptible
• Virus 'strength' s = β/δ
[Figure: node N and neighbors N1, N2, N3, each Healthy or Infected; infection with Prob. β, recovery with Prob. δ]
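The SIS dynamics can be made concrete with a small discrete-time simulation: at each step an infected node recovers with probability δ, and each infected neighbor independently attacks a healthy node with probability β. This is a simplified synchronous sketch; the exact update rule, the function names and the seeded random generator are assumptions, not taken from [Wang+03].

```python
import random


def sis_step(adj, infected, beta, delta, rng):
    """One synchronous step of a simplified discrete-time SIS model."""
    new_infected = set()
    for node in adj:
        if node in infected:
            # An infected node stays infected unless it recovers (prob. delta).
            if rng.random() >= delta:
                new_infected.add(node)
        else:
            # Each infected neighbor attacks independently (prob. beta each).
            for nb in adj[node]:
                if nb in infected and rng.random() < beta:
                    new_infected.add(node)
                    break
    return new_infected


def simulate_sis(edges, seeds, beta, delta, steps, seed=0):
    """Return the number of infected nodes after each step (step 0 = seeds)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    rng = random.Random(seed)
    infected = set(seeds)
    history = [len(infected)]
    for _ in range(steps):
        infected = sis_step(adj, infected, beta, delta, rng)
        history.append(len(infected))
    return history


# Hypothetical usage: a ring of 20 nodes, 2 initially infected, strength s = 0.05 / 0.3.
ring = [(i, (i + 1) % 20) for i in range(20)]
print(simulate_sis(ring, seeds=[0, 10], beta=0.05, delta=0.3, steps=30))
```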

Epidemic threshold τ
What should τ depend on?
• avg. degree? and/or highest degree?
• and/or variance of degree?
• and/or third moment of degree?
• and/or diameter?

Epidemic threshold
• [Theorem] We have no epidemic if
  $\beta/\delta < \tau = 1/\lambda_{1,A}$
  where $\beta$ = attack prob., $\delta$ = recovery prob., $\tau$ = epidemic threshold, and $\lambda_{1,A}$ = largest eigenvalue of the adjacency matrix A.
• Proof: [Wang+03]
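The theorem makes the graph enter only through λ1,A, so checking the condition reduces to one eigenvalue computation. A minimal sketch for an undirected graph, using power iteration on A + I: for a nonnegative symmetric A, λ1 is at least as large as the magnitude of every other eigenvalue, so the unit shift makes λ1 + 1 strictly dominant even for bipartite graphs. The shift and the function names are assumptions made for illustration, not part of the slides; a numerical library's symmetric eigensolver would be the usual choice for large graphs.

```python
import math


def largest_eigenvalue(edges, iters=1000, tol=1e-12):
    """lambda_1 of the adjacency matrix of an undirected graph (power iteration on A + I)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    norm0 = math.sqrt(len(adj))
    x = {node: 1.0 / norm0 for node in adj}  # unit-length start vector
    mu = 0.0
    for _ in range(iters):
        y = {node: x[node] + sum(x[nb] for nb in adj[node]) for node in adj}
        new_mu = sum(x[node] * y[node] for node in adj)  # Rayleigh quotient of A + I
        norm = math.sqrt(sum(val * val for val in y.values()))
        x = {node: val / norm for node, val in y.items()}
        converged = abs(new_mu - mu) < tol
        mu = new_mu
        if converged:
            break
    return mu - 1.0  # undo the unit shift


def epidemic_threshold(edges):
    """tau = 1 / lambda_1 of the adjacency matrix."""
    return 1.0 / largest_eigenvalue(edges)


def no_epidemic(beta, delta, edges):
    """True when beta/delta < tau, the theorem's sufficient condition."""
    return beta / delta < epidemic_threshold(edges)


# Hypothetical example: a star with one hub and 4 leaves has lambda_1 = 2, so tau = 0.5.
star = [(0, leaf) for leaf in range(1, 5)]
print(round(largest_eigenvalue(star), 6), round(epidemic_threshold(star), 6))
print(no_epidemic(beta=0.1, delta=0.5, edges=star))
```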

Experiments (Oregon)
[Figure, three panels: β/δ > τ (above threshold); β/δ = τ (at the threshold); β/δ < τ (below threshold)]

eBay fraud detection
w/ Polo Chau, CMU

eBay fraud detection - NetProbe

Conclusions
• Graphs pose fascinating problems
• self-similarity/fractals and power laws work when textbook methods fail!
• Need: ML/AI, Stat, NA, DB (Gb/Tb), Systems (Networks+), sociology, ++…

Contact info
• christos@cs.cmu.edu
• www.cs.cmu.edu/~christos
