Social Networks and Graph Mining Christos Faloutsos CMU
Social Networks and Graph Mining Christos Faloutsos CMU - MLD-AB '07
Outline • Problem definition / Motivation • Graphs and power laws • [Virus propagation] • [e-bay fraud detection] • Conclusions
Motivation • Data mining: ~ find patterns (rules, outliers) • Problem#1: What do real graphs look like? • Problem#2: How do viruses propagate? • Problem#3: How to spot fraudsters in e-bay?
Problem#1: Joint work with • Dr. Deepayan Chakrabarti (CMU/Yahoo R.L.)
Graphs - why should we care? • Internet Map [lumeta.com] • Food Web [Martinez ’91] • Protein Interactions [genomebiology.com] • Friendship Network [Moody ’01]
Graphs - why should we care? • network of companies & board-of-directors members • ‘viral’ marketing • web-log (‘blog’) news propagation • computer network security: email/IP traffic and anomaly detection • ...
Problem #1 - network and graph mining • What does the Internet look like? • What does the web look like? • What constitutes a ‘normal’ social network? • What is ‘normal’/‘abnormal’? • Which patterns/laws hold?
Graph mining • Are real graphs random?
Laws and patterns • NO!! Real graphs are not random • Diameter • in- and out-degree distributions • other (surprising) patterns
Solution • Power law in the degree distribution [SIGCOMM 99] • [Plot: internet domains, log(degree) vs. log(rank); slope -0.82; points labeled att.com and ibm.com]
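The slope on a log-log rank-degree plot like the one above can be estimated with an ordinary least-squares line through the points (log rank, log degree). The following is a minimal sketch under that assumption, not necessarily the fitting procedure of [SIGCOMM 99]; the function name `loglog_slope` and the synthetic degree list are hypothetical and used only for illustration.

```python
import math

def loglog_slope(values):
    """Least-squares slope of log(value) vs. log(rank), with values ranked in descending order."""
    ranked = sorted((v for v in values if v > 0), reverse=True)
    xs = [math.log(r) for r in range(1, len(ranked) + 1)]
    ys = [math.log(v) for v in ranked]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# Hypothetical synthetic data: degree = 1000 * rank^(-0.82), an exact power law
degrees = [1000.0 * r ** -0.82 for r in range(1, 201)]
slope = loglog_slope(degrees)
print(round(slope, 2))
```

On real degree data the points only roughly follow a line, so the fitted slope is an estimate and depends on how the tail is handled.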
But: • Q1: How about graphs from other domains? • Q2: How about temporal evolution?
The Peer-to-Peer Topology [Jovanovic+] • Frequency versus degree • Number of adjacent peers follows a power-law
More power laws: • citation counts (citeseer.nj.nec.com, 6/2001) • [Plot: log(count) vs. log(#citations); point labeled ‘Ullman’]
Swedish sex-web [Liljeros et al., Nature 2001] • Nodes: people (Females; Males) • Links: sexual relationships • 4781 Swedes; ages 18-74; 59% response rate • Figure credit: Albert Laszlo Barabasi, http://www.nd.edu/~networks/Publication%20Categories/04%20Talks/2005-norway3hours.ppt
More power laws: • web hit counts [w/ A. Montgomery] • [Plot: Web Site Traffic, log(count) vs. log(in-degree); Zipf; ‘ebay’ labeled; annotations: users, sites]
epinions.com • who-trusts-whom [Richardson + Domingos, KDD 2001] • [Plot: count vs. (out) degree; labeled point: a user who trusts 2000 people]
Time evolution • with Jure Leskovec (CMU/MLD) • and Jon Kleinberg (Cornell – sabb. @ CMU)
Evolution of the Diameter • Prior work on power-law graphs hints at a slowly growing diameter: – diameter ~ O(log N) • What is happening in real data? • Diameter shrinks over time – As the network grows, the distances between nodes slowly decrease
Diameter – arXiv citation graph • Citations among physics papers • 1992 – 2003 • One graph per year • [Plot: diameter vs. time [years]]
Diameter – “Autonomous Systems” • Graph of Internet • One graph per day • 1997 – 2000 • [Plot: diameter vs. number of nodes]
Diameter – “Affiliation Network” • Graph of collaborations in physics – authors linked to papers • 10 years of data • [Plot: diameter vs. time [years]]
Diameter – “Patents” • Patent citation network • 25 years of data • [Plot: diameter vs. time [years]]
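For a small graph, the diameter plotted on the slides above can be computed exactly by running a breadth-first search from every node and taking the longest shortest-path distance. The sketch below assumes an undirected, connected graph and uses two hypothetical toy snapshots; on graphs of the size discussed in the talk, an approximate or sampled measure is usually needed instead of this all-pairs approach.

```python
from collections import deque

def build_undirected(edges):
    """Adjacency sets for an undirected graph given as (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def bfs_distances(adj, source):
    """Hop distance from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def diameter(adj):
    """Longest shortest-path distance over all reachable pairs of nodes."""
    return max(max(bfs_distances(adj, s).values()) for s in adj)

# Hypothetical toy snapshots: a 5-node path, then the same nodes with two extra edges
early = build_undirected([(0, 1), (1, 2), (2, 3), (3, 4)])
later = build_undirected([(0, 1), (1, 2), (2, 3), (3, 4), (0, 3), (1, 4)])
d_early = diameter(early)
d_later = diameter(later)
print(d_early, d_later)
```

In this toy case only edges are added between snapshots, so it illustrates how a denser graph can have a smaller diameter rather than reproducing the growth process of the real datasets.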
Temporal Evolution of the Graphs • N(t) … nodes at time t • E(t) … edges at time t • Suppose that N(t+1) = 2 * N(t) • Q: what is your guess for E(t+1)? Is it 2 * E(t)? • A: over-doubled! – But obeying the ``Densification Power Law’’
Densification – Physics Citations • Citations among physics papers • 2003: – 29,555 papers, 352,807 citations • [Plot: E(t) vs. N(t); fitted slope 1.69; reference slopes: 1 for a tree, 2 for a clique]
Densification – Patent Citations • Citations among patents granted • 1999: – 2.9 million nodes – 16.5 million edges • Each year is a datapoint • [Plot: E(t) vs. N(t); slope 1.66]
Densification – Autonomous Systems • Graph of Internet • 2000: – 6,000 nodes – 26,000 edges • One graph per day • [Plot: E(t) vs. N(t); slope 1.18]
Densification – Affiliation Network • Authors linked to their publications • 2002: – 60,000 nodes (20,000 authors, 38,000 papers) – 133,000 edges • [Plot: E(t) vs. N(t); slope 1.15]
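If the number of edges follows a power law in the number of nodes, E(t) proportional to N(t)^a, then doubling the nodes multiplies the edges by 2^a, which exceeds 2 whenever a > 1. The sketch below works through that arithmetic for the four exponents reported on the slides, and shows one way to recover an exponent as the least-squares slope of log E against log N. The helper names and the synthetic snapshots are assumptions for illustration only.

```python
import math

# Exponents as reported on the densification slides
exponents = {
    "physics citations": 1.69,
    "patent citations": 1.66,
    "autonomous systems": 1.18,
    "affiliation network": 1.15,
}

def edge_growth_factor(a, node_factor=2.0):
    """If E is proportional to N**a, scaling N by node_factor scales E by node_factor**a."""
    return node_factor ** a

def fit_exponent(nodes, edges):
    """Least-squares slope of log(E) vs. log(N) across snapshots."""
    xs = [math.log(n) for n in nodes]
    ys = [math.log(e) for e in edges]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

growth = {name: edge_growth_factor(a) for name, a in exponents.items()}
for name, g in growth.items():
    print(f"{name}: edges grow by a factor of about {g:.2f} when nodes double")

# Hypothetical snapshots generated from an exact power law with a = 1.69
snapshot_nodes = [1000, 2000, 4000, 8000]
snapshot_edges = [0.01 * n ** 1.69 for n in snapshot_nodes]
fitted = fit_exponent(snapshot_nodes, snapshot_edges)
print(round(fitted, 2))
```

Every factor printed is above 2, which is the "over-doubled" answer from the earlier slide.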
Outline • Problem definition / Motivation • Graphs and power laws • [Virus propagation] • [e-bay fraud detection] • Conclusions
Virus propagation • How do viruses/rumors propagate? • Will a flu-like virus linger, or will it become extinct soon?
The model: SIS • ‘Flu’-like: Susceptible-Infected-Susceptible • Virus ‘strength’ s = β/δ • [Figure: node N with neighbors N1, N2, N3; an infected node infects a healthy neighbor with prob. β; an infected node becomes healthy again with prob. δ]
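The SIS model can be sketched as a discrete-time simulation: in each step every infected node tries to infect each healthy neighbor with probability β (`beta`), and every infected node recovers, becoming susceptible again, with probability δ (`delta`). The synchronous update order, the function names, and the 6-node ring used here are assumptions made for illustration; the talk does not specify an implementation.

```python
import random

def sis_step(adj, infected, beta, delta, rng):
    """One synchronous SIS step; returns the new set of infected nodes."""
    newly_infected = set()
    for u in infected:
        for v in adj[u]:
            # Each infected node attacks each healthy neighbor with probability beta
            if v not in infected and rng.random() < beta:
                newly_infected.add(v)
    # Each currently infected node recovers with probability delta
    recovered = {u for u in infected if rng.random() < delta}
    return (infected - recovered) | newly_infected

def simulate_sis(adj, initially_infected, beta, delta, steps, seed=0):
    """Number of infected nodes at step 0, 1, ..., steps."""
    rng = random.Random(seed)
    infected = set(initially_infected)
    history = [len(infected)]
    for _ in range(steps):
        infected = sis_step(adj, infected, beta, delta, rng)
        history.append(len(infected))
    return history

# Hypothetical toy graph: a ring of 6 nodes
ring = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}

spreads = simulate_sis(ring, {0}, beta=1.0, delta=0.0, steps=4)
dies_out = simulate_sis(ring, {0}, beta=0.0, delta=1.0, steps=4)
print(spreads)
print(dies_out)
```

The two runs are the deterministic extremes: certain infection with no recovery, and no infection with certain recovery. Intermediate values of `beta` and `delta` give random trajectories that depend on the seed.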
Epidemic threshold τ • What should τ depend on? • avg. degree? and/or highest degree? • and/or variance of degree? • and/or third moment of degree? • and/or diameter?
Epidemic threshold • [Theorem] We have no epidemic if β/δ < τ = 1/λ_{1,A} • β: attack prob. • δ: recovery prob. • τ: epidemic threshold • λ_{1,A}: largest eigenvalue of adj. matrix A • Proof: [Wang+03]
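The threshold condition can be checked numerically by estimating the largest eigenvalue of the adjacency matrix and comparing β/δ against its reciprocal. The sketch below uses plain power iteration on A + I (the shift is an implementation choice to avoid oscillation on bipartite graphs such as stars) and assumes a small, undirected, connected graph given as a dense 0/1 matrix. The function names and the 5-node star are hypothetical.

```python
def largest_eigenvalue(adj_matrix, iterations=1000):
    """Estimate the largest eigenvalue of a symmetric 0/1 adjacency matrix by power iteration on A + I."""
    n = len(adj_matrix)
    x = [1.0] * n
    shifted = 1.0
    for _ in range(iterations):
        # y = (A + I) x
        y = [x[i] + sum(adj_matrix[i][j] * x[j] for j in range(n)) for i in range(n)]
        shifted = max(abs(v) for v in y)
        x = [v / shifted for v in y]
    # Undo the +I shift
    return shifted - 1.0

def epidemic_threshold(adj_matrix):
    """tau = 1 / lambda_1 of the adjacency matrix."""
    return 1.0 / largest_eigenvalue(adj_matrix)

def below_threshold(beta, delta, adj_matrix):
    """True if beta/delta < tau, i.e. the theorem's no-epidemic condition holds."""
    return beta / delta < epidemic_threshold(adj_matrix)

# Hypothetical toy graph: a star with one hub (node 0) and four leaves
star = [
    [0, 1, 1, 1, 1],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
]
tau_star = epidemic_threshold(star)
print(round(tau_star, 3))
print(below_threshold(0.1, 0.4, star), below_threshold(0.3, 0.4, star))
```

For this star the largest eigenvalue is 2 (the square root of the number of leaves), so the threshold is 0.5: a virus with β/δ = 0.25 is below it and one with β/δ = 0.75 is above it.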
Experiments (Oregon) • β/δ > τ (above threshold) • β/δ = τ (at the threshold) • β/δ < τ (below threshold)
Outline • Problem definition / Motivation • Graphs and power laws • [Virus propagation] • [e-bay fraud detection] • Conclusions
E-bay Fraud detection • w/ Polo Chau, CMU
E-bay Fraud detection - NetProbe
Conclusions • Graphs pose fascinating problems • Self-similarity/fractals and power laws work when textbook methods fail! • Need: ML/AI, Stat, NA, DB (Gb/Tb), Systems (Networks+), sociology, ++…
Contact info • christos@cs.cmu.edu • www.cs.cmu.edu/~christos