CMU SCS Anomaly detection in large graphs Christos Faloutsos CMU

Thank you! • Dr. Lei Li
Toutiao Lab (c) C. Faloutsos, 2016

Roadmap
• Introduction
  – Motivation
  – Why study (big) graphs?
• Part#1: Patterns in graphs
• Part#2: time-evolving graphs; tensors
• Conclusions

Graphs - why should we care? • >$10B; ~1B users

Graphs - why should we care? • Food Web [Martinez ’91] • Internet Map [lumeta.com]

Graphs - why should we care?
• web-log (‘blog’) news propagation
• computer network security: email/IP traffic and anomaly detection
• Recommendation systems
• ...
• Many-to-many db relationship -> graph

Motivating problems
• P1: patterns? Fraud detection? Patterns -> anomalies*
• P2: patterns in time-evolving graphs / tensors (source × destination × time)
* Robust Random Cut Forest Based Anomaly Detection on Streams. Sudipto Guha, Nina Mishra, Gourav Roy, Okke Schrijvers, ICML’16

Roadmap
• Introduction
  – Motivation
  – Why study (big) graphs?
• Part#1: Patterns & fraud detection
• Part#2: time-evolving graphs; tensors
• Conclusions

Part 1: Patterns & fraud detection

Laws and patterns
• Q1: Are real graphs random?
• A1: NO!!
  – Diameter (‘6 degrees’; ‘Kevin Bacon’)
  – in- and out-degree distributions
  – other (surprising) patterns
• So, let’s look at the data

Solution# S.1
• Power law in the degree distribution [Faloutsos x 3, SIGCOMM 99]
(Plot: internet domains, log(degree) vs. log(rank), with att.com and ibm.com labeled; fitted slope -0.82)
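
The rank-degree plot can be reproduced with a least-squares line fit on log-log axes. A minimal sketch, assuming NumPy and a plain list of node degrees; the function name `rank_degree_exponent` is hypothetical, and the synthetic degrees below are generated to follow a -0.82 slope exactly rather than taken from the SIGCOMM 99 data:

```python
import numpy as np

def rank_degree_exponent(degrees):
    """Slope of log(degree) vs. log(rank), where rank 1 = highest degree."""
    d = np.sort(np.asarray(degrees, dtype=float))[::-1]
    d = d[d > 0]                                  # log(0) is undefined
    ranks = np.arange(1, len(d) + 1)
    slope, _intercept = np.polyfit(np.log10(ranks), np.log10(d), 1)
    return slope

# Hypothetical degrees that follow degree = 10000 * rank^(-0.82) exactly
ranks = np.arange(1, 5001)
degrees = 10_000 * ranks ** -0.82
print(round(rank_degree_exponent(degrees), 2))    # -0.82
```

Real degree data are noisier, so in practice the fit is usually restricted to the straight part of the curve and the slope is only approximate.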

S2: connected component sizes
• Connected components: 4 observations
(Plot: Count vs. Size of connected components, for a graph with 1.4B nodes and 6B edges)

S2: connected component sizes • 1) The largest component is 10K x larger than the next one

S2: connected component sizes • 2) ~0.7B singleton nodes

S2: connected component sizes • 3) SLOPE! (on the Count vs. Size plot)

S2: connected component sizes
• 4) Spikes!
  – 300-size components: ×500. Why?
  – 1100-size components: ×65. Why?

S2: connected component sizes • The spikes: suspicious financial-advice sites (not existing now)
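
Observations 1 to 4 all come from one histogram: how many connected components have each size. A minimal sketch using union-find in pure Python (the helper name is hypothetical); a spike is a size whose count stands far above the trend of its neighbours, such as the ×500 count at size 300. On a 1.4B-node graph this would need a distributed implementation, which the slides do not describe:

```python
from collections import Counter

def component_size_distribution(num_nodes, edges):
    """Return {component_size: number_of_components} using union-find."""
    parent = list(range(num_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving keeps trees shallow
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv                 # merge the two components

    sizes = Counter(find(x) for x in range(num_nodes))   # root -> component size
    return Counter(sizes.values())                       # size -> count

# Hypothetical toy graph: a 4-node path, two separate pairs, one singleton (node 8)
edges = [(0, 1), (1, 2), (2, 3), (4, 5), (6, 7)]
print(sorted(component_size_distribution(9, edges).items()))   # [(1, 1), (2, 2), (4, 1)]
```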

Roadmap
• Introduction
  – Motivation
• Part#1: Patterns in graphs
  – P1.1: Patterns: Degree; Triangles
  – P1.2: Anomaly/fraud detection
• Part#2: time-evolving graphs; tensors
• Conclusions

Solution# S.3: Triangle ‘Laws’
• Real social networks have a lot of triangles
  – Friends of friends are friends
• Any patterns?
  – 2x the friends, 2x the triangles?

Triangle Law: #S.3 [Tsourakakis ICDM 2008]
(Plots: Reuters, Epinions, SN; X-axis: degree; Y-axis: mean # triangles)
• n friends -> ~n^1.6 triangles
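
The Triangle Law plot needs, for every node, its degree and the number of triangles it belongs to, averaged per degree. A minimal exact sketch in pure Python (function names are hypothetical; nodes are assumed to be integers so they can be ordered), fine for small graphs; the ~n^1.6 exponent would then come from a log-log line fit of mean triangles against degree, skipping zero-triangle buckets:

```python
from collections import defaultdict

def triangles_per_node(adj):
    """adj: dict int -> set of neighbours (undirected, no self-loops)."""
    tri = {v: 0 for v in adj}
    for u in adj:
        for v in adj[u]:
            if u < v:
                # each triangle u < v < w is found exactly once
                for w in adj[u] & adj[v]:
                    if w > v:
                        tri[u] += 1
                        tri[v] += 1
                        tri[w] += 1
    return tri

def mean_triangles_by_degree(adj):
    """Mean # triangles (Y-axis) for each degree (X-axis)."""
    tri = triangles_per_node(adj)
    buckets = defaultdict(list)
    for v, nbrs in adj.items():
        buckets[len(nbrs)].append(tri[v])
    return {d: sum(ts) / len(ts) for d, ts in sorted(buckets.items())}

# Hypothetical toy graph: a 4-clique {0, 1, 2, 3} plus a pendant node 4 attached to 0
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (0, 4)]
adj = defaultdict(set)
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)
print(mean_triangles_by_degree(adj))   # {1: 0.0, 3: 3.0, 4: 3.0}
```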

Triangle counting for large graphs? • Anomalous nodes in Twitter (~3 billion edges) [U Kang, Brendan Meeder, +, PAKDD’11]
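
The slides do not say how triangles were counted at Twitter scale. One known approach rests on an identity: node i belongs to (A^3)_ii / 2 triangles, and for a symmetric adjacency matrix A with eigenpairs (λ_j, u_j), (A^3)_ii = Σ_j λ_j^3 u_ij^2. Because real graphs tend to have a few dominant eigenvalues, keeping only the top-k eigenpairs gives an approximation. A minimal dense NumPy sketch (function name hypothetical); a billion-edge graph would need a sparse or distributed eigensolver instead of `numpy.linalg.eigh`:

```python
import numpy as np

def eigen_triangles(adj, k=None):
    """Per-node triangle counts 0.5 * sum_j lambda_j^3 * u_ij^2, using the
    top-k eigenpairs by |lambda| (k=None uses all of them, which is exact)."""
    vals, vecs = np.linalg.eigh(adj)          # symmetric matrix, ascending eigenvalues
    order = np.argsort(-np.abs(vals))         # largest magnitude first
    if k is not None:
        order = order[:k]
    lam, U = vals[order], vecs[:, order]
    return 0.5 * (U ** 2) @ (lam ** 3)

# Complete graph K4: every node lies in exactly 3 triangles
A = np.ones((4, 4)) - np.eye(4)
print(np.round(eigen_triangles(A), 6))        # exact, using all eigenpairs
print(np.round(eigen_triangles(A, k=1), 6))   # top-1 approximation overestimates
```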

S4: k-core patterns: definitions
– k-core (of a graph): the maximal subgraph in which every vertex has degree at least k
– degeneracy (of a graph): the largest k for which the k-core is non-empty
– coreness (of a vertex)

CoreScope: Graph Mining Using k-Core Analysis: Patterns, Anomalies, and Algorithms. ICDM’16 (to appear). Kijung Shin, Tina Eliassi-Rad and CF

Mirror Pattern: Observation
– coreness (of a vertex): maximum k such that the vertex belongs to the k-core
– Definition: [Mirror Pattern] degree ~ coreness

Mirror Pattern: Application • Exceptions are ‘strange’
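
Coreness can be computed by repeatedly deleting a vertex of minimum remaining degree (k-core "peeling"). A minimal sketch with a heap (function name hypothetical; vertices are assumed to be comparable, e.g. integers). Comparing coreness with degree gives the Mirror Pattern view: in the star below, the hub has degree 5 but coreness 1, the kind of degree/coreness mismatch the slide calls ‘strange’. The slides do not give CoreScope's actual anomaly score, so no score is computed here:

```python
import heapq
from collections import defaultdict

def coreness(adj):
    """k-core decomposition by peeling. adj: dict node -> set of neighbours."""
    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    heap = [(d, v) for v, d in deg.items()]
    heapq.heapify(heap)
    removed, core, k = set(), {}, 0
    while heap:
        d, v = heapq.heappop(heap)
        if v in removed or d != deg[v]:
            continue                      # stale entry: degree changed since push
        k = max(k, d)                     # the peeling level never decreases
        core[v] = k
        removed.add(v)
        for w in adj[v]:
            if w not in removed:
                deg[w] -= 1
                heapq.heappush(heap, (deg[w], w))
    return core

# Hypothetical graph: 4-clique {0,1,2,3}, path 0-4-5, and a star centred at 10
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (0, 4), (4, 5)]
edges += [(10, leaf) for leaf in range(11, 16)]
adj = defaultdict(set)
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)
core = coreness(adj)
for v in (0, 4, 10):
    print(v, len(adj[v]), core[v])        # vertex, degree, coreness
```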

MORE Graph Patterns • RTG: A Recursive Realistic Graph Generator using Random Typing. Leman Akoglu and Christos Faloutsos. PKDD’09.

MORE Graph Patterns
• Mary McGlohon, Leman Akoglu, Christos Faloutsos. Statistical Properties of Social Networks. In "Social Network Data Analytics" (Ed.: Charu Aggarwal)
• Deepayan Chakrabarti and Christos Faloutsos. Graph Mining: Laws, Tools, and Case Studies. Oct. 2012, Morgan Claypool.

Roadmap
• Introduction
  – Motivation
• Part#1: Patterns in graphs
  – P1.1: Patterns
  – P1.2: Anomaly / fraud detection
    • No labels: spectral (Patterns -> anomalies)
    • With labels: Belief Propagation
• Part#2: time-evolving graphs; tensors
• Conclusions

How to find ‘suspicious’ groups? • ‘blocks’ are normal, right? (figure: fans × idols adjacency matrix)

Except that: • ‘blocks’ are normal, right? • ‘hyperbolic’ communities are more realistic [Araujo+, PKDD’14]

Except that:
• ‘blocks’ are usually suspicious
• ‘hyperbolic’ communities are more realistic [Araujo+, PKDD’14]
Q: Can we spot blocks, easily?
A: Silver bullet: SVD!

DETAILS: Crash intro to SVD
• Recall: (SVD) matrix factorization finds blocks
(Figure: the N fans × M idols matrix ≈ ‘music lovers’ × ‘singers’ + ‘sports lovers’ × ‘athletes’ + ‘citizens’ × ‘politicians’)
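
Why SVD "finds blocks" can be checked on a toy matrix: for a fans × idols matrix made of disjoint all-ones blocks, each leading singular vector pair is non-zero exactly on one block's fans and idols, and its singular value is sqrt(#fans × #idols). A minimal NumPy sketch; the matrix and the helper `block_members` are hypothetical:

```python
import numpy as np

def block_members(A, r, tol=1e-9):
    """Rows and columns with non-negligible weight in the r-th singular pair."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    rows = np.flatnonzero(np.abs(U[:, r]) > tol).tolist()
    cols = np.flatnonzero(np.abs(Vt[r]) > tol).tolist()
    return s[r], rows, cols

# Hypothetical fans x idols matrix with two disjoint blocks:
# fans 0-3 follow idols 0-2 ('music lovers' x 'singers'),
# fans 4-6 follow idols 3-4 ('sports lovers' x 'athletes').
A = np.zeros((7, 5))
A[0:4, 0:3] = 1
A[4:7, 3:5] = 1
for r in range(2):
    sigma, fans, idols = block_members(A, r)
    print(f"sigma={sigma:.3f} fans={fans} idols={idols}")
```

With overlapping or noisy blocks the singular vectors are no longer exactly zero outside a block, which is why the later slides look at scatter plots of singular-vector coordinates instead.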

Inferring Strange Behavior from Connectivity Pattern in Social Networks. PAKDD’14. Meng Jiang, Peng Cui, Shiqiang Yang (Tsinghua); Alex Beutel, Christos Faloutsos (CMU)

Lockstep and Spectral Subspace Plot
• Case #0: No lockstep behavior in a random power-law graph of 1M nodes, 3M edges
• Random “Scatter” (panels: adjacency matrix; spectral subspace plot)

Lockstep and Spectral Subspace Plot • Case #1: non-overlapping lockstep • Adjacency matrix: “Blocks”; spectral subspace plot: “Rays”

Lockstep and Spectral Subspace Plot • Case #2: non-overlapping lockstep • Adjacency matrix: “Blocks; low density”; spectral subspace plot: Elongation

Lockstep and Spectral Subspace Plot • Case #3: non-overlapping lockstep • Adjacency matrix: “Camouflage” (or “Fame”); spectral subspace plot: Tilting “Rays”

Lockstep and Spectral Subspace Plot • Case #4: overlapping lockstep • Adjacency matrix: “Staircase”; spectral subspace plot: “Pearls”
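
The points in a spectral subspace plot are the nodes' coordinates in a few top singular vectors of the adjacency matrix. A minimal sketch under stated assumptions: a hypothetical 500-node random graph with one injected 30 × 15 lockstep block, and a dense NumPy SVD (fine at this size; Tencent Weibo scale would need a sparse, truncated solver). The lockstep followers sit far out along the first singular vector and near zero elsewhere, i.e. on a ray when u1 is plotted against u2:

```python
import numpy as np

def spectral_coordinates(A, k=2):
    """Row i = node i's coordinates (u_1[i], ..., u_k[i]) in the top-k left
    singular vectors: the points of a spectral subspace plot."""
    U, _s, _Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k]

rng = np.random.default_rng(0)
n = 500
A = (rng.random((n, n)) < 0.01).astype(float)   # sparse random who-follows-whom
A[0:30, 100:115] = 1.0                           # 30 followers in lockstep on 15 followees
coords = spectral_coordinates(A)
suspects = np.argsort(-np.abs(coords[:, 0]))[:30]
print(sorted(suspects.tolist()) == list(range(30)))
```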

Dataset • Tencent Weibo • 117 million nodes (with profile and UGC data) • 3.33 billion directed edges

Real data: “Rays”, “Pearls”, “Block”, “Staircase”

Real data • Spikes in the out-degree distribution

Roadmap
• Introduction
  – Motivation
• Part#1: Patterns in graphs
  – P1.1: Patterns
  – P1.2: Anomaly / fraud detection
    • No labels: spectral methods; Suspiciousness
    • With labels: Belief Propagation
• Part#2: time-evolving graphs; tensors
• Conclusions

Suspicious Patterns in Event Data: 2 modes? n modes?
A General Suspiciousness Metric for Dense Blocks in Multimodal Data. Meng Jiang, Alex Beutel, Peng Cui, Bryan Hooi, Shiqiang Yang, and Christos Faloutsos. ICDM 2015.

ICDM 2015: Suspicious Patterns in Event Data
Which is more suspicious?
• 20,000 users retweeting the same 20 tweets, 6 times each, all in 10 hours (size)
vs.
• 225 users retweeting the same 1 tweet, 15 times each, all in 3 hours, all from 2 IP addresses (contrast)
Answer: volume * D_KL(p || p_background)
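
The answer scores a block by its volume times the KL divergence between its density and the background density, so both size and contrast count. A minimal sketch, assuming (as an illustration, not a transcription of the ICDM 2015 paper) that event counts are modeled as Poisson, so that D_KL(ρ || p) = ρ log(ρ/p) + p − ρ; function names and the numbers in the example are hypothetical:

```python
import math

def poisson_kl(rho, p):
    """KL divergence of Poisson(rho) from Poisson(p); p must be > 0."""
    if rho == 0:
        return p
    return rho * math.log(rho / p) + p - rho

def suspiciousness(block_dims, mass, background_density):
    """volume * D_KL(block density || background density).
    block_dims: block size along each mode, e.g. (#users, #tweets, #hours)."""
    volume = math.prod(block_dims)
    return volume * poisson_kl(mass / volume, background_density)

p = 1e-3                                            # hypothetical background density
dense = suspiciousness((10, 10, 10), 1000, p)       # 1,000 events in 1,000 cells
spread = suspiciousness((100, 100, 10), 1000, p)    # same events over 100,000 cells
print(round(dense, 1), round(spread, 1))            # 5908.8 1402.6
```

A block whose density equals the background scores 0, and packing the same events into fewer cells raises the score.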

ICDM 2015: Suspicious Patterns in Event Data • Retweeting: “Galaxy Note Dream Project: Happy Life Traveling the World”

Roadmap
• Introduction
  – Motivation
• Part#1: Patterns in graphs
  – P1.1: Patterns
  – P1.2: Anomaly / fraud detection
    • No labels: spectral methods
    • With labels: Belief Propagation
• Part#2: time-evolving graphs; tensors
• Conclusions

E-bay fraud detection, w/ Polo Chau & Shashank Pandit, CMU [WWW’07]

E-bay fraud detection

E-bay fraud detection: NetProbe
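
NetProbe labels users by running belief propagation on the transaction graph. The slides do not show its three-state model (fraud / accomplice / honest) or its propagation matrix, so the following is only a generic two-state sum-product loopy BP sketch with a hypothetical homophily matrix H; on a tree, as in the example, it returns exact marginals:

```python
import numpy as np

def loopy_bp(n, edges, priors, H, iters=20):
    """Sum-product loopy belief propagation on an undirected graph.
    priors: (n, k) node potentials; H[i, j]: compatibility of states i and j
    at the two ends of an edge. Returns normalized (n, k) beliefs."""
    nbrs = [[] for _ in range(n)]
    for u, v in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)
    k = priors.shape[1]
    msg = {(u, v): np.full(k, 1.0 / k) for u in range(n) for v in nbrs[u]}
    for _ in range(iters):
        new = {}
        for u, v in msg:
            b = priors[u].astype(float).copy()
            for w in nbrs[u]:
                if w != v:
                    b *= msg[(w, u)]        # all incoming messages except v's
            m = H.T @ b                     # m[j] = sum_i b[i] * H[i, j]
            new[(u, v)] = m / m.sum()
        msg = new
    beliefs = priors.astype(float).copy()
    for u in range(n):
        for w in nbrs[u]:
            beliefs[u] *= msg[(w, u)]
    return beliefs / beliefs.sum(axis=1, keepdims=True)

# Path 0 - 1 - 2; node 0 is believed fraudulent, the others are unknown.
priors = np.array([[0.9, 0.1], [0.5, 0.5], [0.5, 0.5]])   # columns: [fraud, honest]
H = np.array([[0.8, 0.2], [0.2, 0.8]])                     # hypothetical homophily
beliefs = loopy_bp(3, [(0, 1), (1, 2)], priors, H)
print(np.round(beliefs, 3))
```

The suspicion decays with distance from the labeled node (0.9, then 0.74, then 0.644 for "fraud").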

Popular press • And less desirable attention: E-mail from ‘Belgium police’ (‘copy of your code?’)

Roadmap
• Introduction
  – Motivation
• Part#1: Patterns in graphs
  – Patterns
  – Anomaly / fraud detection
    • No labels: spectral methods
    • w/ labels: Belief Propagation; closed formulas
• Part#2: time-evolving graphs; tensors
• Conclusions

Unifying Guilt-by-Association Approaches: Theorems and Fast Algorithms. Danai Koutra, U Kang, Hsing-Kuo Kenneth Pao, Tai-You Ke, Duen Horng (Polo) Chau, Christos Faloutsos. ECML PKDD, 5-9 September 2011, Athens, Greece

Problem Definition: GBA (guilt-by-association) techniques
• Given: graph & a few labeled nodes
• Find: labels of the rest (assuming network effects)

Are they related? YES!
• RWR (Random Walk with Restarts): Google’s PageRank (‘if my friends are important, I’m important, too’)
• SSL (Semi-supervised learning): minimize the differences among neighbors
• BP (Belief propagation): send messages to neighbors, on what you believe about them

Correspondence of Methods
Each method solves one linear system, (matrix) × (final labels/beliefs) = (prior labels/beliefs), where A is the adjacency matrix and D the diagonal matrix of node degrees d1, d2, ...:
• RWR: [I - c A D^-1] x = (1 - c) y
• SSL: [I + a (D - A)] x = y
• FABP: [I + a D - c′ A] b_h = φ_h
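
Since all three methods reduce to one linear solve, they can be compared side by side. A minimal dense NumPy sketch on a hypothetical 4-node graph; the constants c, a and h_h are illustrative choices, and the expressions for FABP's a and c′ in terms of the "about-half" homophily factor h_h, a = 4h_h²/(1 - 4h_h²) and c′ = 2h_h/(1 - 4h_h²), follow our reading of the FaBP paper and should be checked against it:

```python
import numpy as np

# Hypothetical undirected toy graph: edges 0-1, 1-2, 2-3, 1-3
n = 4
A = np.zeros((n, n))
for u, v in [(0, 1), (1, 2), (2, 3), (1, 3)]:
    A[u, v] = A[v, u] = 1.0
deg = A.sum(axis=1)
D = np.diag(deg)
I = np.eye(n)

# RWR: [I - c A D^-1] x = (1 - c) y, restarting at node 0
c = 0.85
y = np.array([1.0, 0.0, 0.0, 0.0])
x_rwr = np.linalg.solve(I - c * A @ np.diag(1.0 / deg), (1 - c) * y)

# SSL: [I + a (D - A)] x = y, node 0 labeled +1 and node 3 labeled -1
a_ssl = 1.0
y_ssl = np.array([1.0, 0.0, 0.0, -1.0])
x_ssl = np.linalg.solve(I + a_ssl * (D - A), y_ssl)

# FABP: [I + a D - c' A] b_h = phi_h, with "about-half" homophily h_h
h_h = 0.1
a_fabp = 4 * h_h**2 / (1 - 4 * h_h**2)
c_fabp = 2 * h_h / (1 - 4 * h_h**2)
phi_h = np.array([0.4, 0.0, 0.0, 0.0])   # node 0's prior 0.9, centered at 0.5
b_h = np.linalg.solve(I + a_fabp * D - c_fabp * A, phi_h)

print(np.round(x_rwr, 3), np.round(x_ssl, 3), np.round(b_h, 3))
```

The RWR scores sum to 1 because A D^-1 is column-stochastic; at scale one would use sparse matrices and an iterative solver rather than a dense solve.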

Results: Scalability (plot: runtime in minutes vs. # of edges, Kronecker graphs) • FABP is linear in the number of edges.
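
Linear scaling is what one expects from an iterative solver in which each sweep touches every edge a constant number of times. A minimal sketch of such a sweep for the FABP system [I + a D - c′ A] b_h = φ_h, rewritten as the fixed-point update b_h ← φ_h + c′ A b_h - a D b_h over an edge list; it converges only when the spectral radius of c′A - aD is below 1 (true here for small h_h), uses the same hedged a and c′ formulas as above, and is an illustration rather than the authors' implementation:

```python
import numpy as np

def fabp_iterate(n, edges, phi_h, h_h, iters=60):
    """Fixed-point sweeps for [I + a D - c' A] b = phi_h on an undirected
    edge list; each sweep costs O(#edges)."""
    a = 4 * h_h**2 / (1 - 4 * h_h**2)
    c = 2 * h_h / (1 - 4 * h_h**2)
    src = np.array([u for u, v in edges] + [v for u, v in edges])
    dst = np.array([v for u, v in edges] + [u for u, v in edges])
    deg = np.bincount(src, minlength=n).astype(float)
    phi_h = np.asarray(phi_h, dtype=float)
    b = phi_h.copy()
    for _ in range(iters):
        Ab = np.zeros(n)
        np.add.at(Ab, dst, b[src])          # (A b)[v] = sum of b over v's neighbours
        b = phi_h + c * Ab - a * deg * b
    return b

edges = [(0, 1), (1, 2), (2, 3), (1, 3)]
b = fabp_iterate(4, edges, [0.4, 0.0, 0.0, 0.0], h_h=0.1)
print(np.round(b, 4))
```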

Problem: e-commerce ratings fraud
• Given a heterogeneous graph on users, products, sellers and positive/negative ratings with “seed labels”
• Find the top k most fraudulent users, products and sellers
Dhivya Eswaran, Stephan Günnemann, Christos Faloutsos, “ZooBP: Belief Propagation for Heterogeneous Networks”, in submission to VLDB 2017

Problem: e-commerce ratings fraud [Eswaran, Günnemann, Faloutsos, ZooBP]

ZooBP: features • Fast; convergence guarantees • Near-perfect accuracy • Linear in graph size

ZooBP in the real world
• Near 100% precision on top 300 users (Flipkart)
• Flagged users: suspicious
  – 400 ratings in 1 sec
  – 5000 good ratings and no bad ratings

Summary of Part#1
• *many* patterns in real graphs
  – Power-laws everywhere
  – Long (and growing) list of tools for anomaly/fraud detection
• Patterns -> anomalies

Roadmap
• Introduction
  – Motivation
• Part#1: Patterns in graphs
• Part#2: time-evolving graphs
  – P2.1: tools/tensors
  – P2.2: other patterns
• Conclusions

Part 2: Time-evolving graphs; tensors

Graphs over time -> tensors!
• Problem #2.1:
  – Given who calls whom, and when
  – Find patterns / anomalies
(Figure: one caller × callee matrix per day (Mon, Tue, ...), stacked into a caller × callee × time tensor)

Graphs over time -> tensors!
• Problem #2.1’:
  – Given author-keyword-date
  – Find patterns / anomalies
(Tensor: author × keyword × date)
• MANY more settings, with >2 ‘modes’

Graphs over time -> tensors!
• Problem #2.1’’:
  – Given subject-verb-object facts
  – Find patterns / anomalies
(Tensor: subject × verb × object)
• MANY more settings, with >2 ‘modes’

Graphs over time -> tensors!
• Problem #2.1’’’:
  – Given <triplets>
  – Find patterns / anomalies
(Tensor: mode 1 × mode 2 × mode 3)
• MANY more settings, with >2 ‘modes’ (and 4, 5, etc. modes)
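
Turning such records into a tensor amounts to counting how often each (mode 1, mode 2, mode 3) combination occurs. A minimal dense NumPy sketch with a hypothetical call log; data at the scale of millions of callers would use a sparse coordinate-list representation instead:

```python
import numpy as np

def triplets_to_tensor(triplets, shape):
    """Count tensor X[i, j, k] = number of times (i, j, k) occurs,
    e.g. (caller, callee, day) or (subject, verb, object)."""
    X = np.zeros(shape)
    idx = np.array(triplets).T          # 3 x num_records index arrays
    np.add.at(X, tuple(idx), 1)         # unbuffered add: repeated records accumulate
    return X

calls = [(0, 1, 0), (0, 1, 0), (0, 2, 1), (3, 1, 1)]   # hypothetical (caller, callee, day)
X = triplets_to_tensor(calls, shape=(4, 3, 2))
print(X[0, 1, 0], X.sum())    # 2.0 4.0
```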

Answer: tensor factorization
• Recall: (SVD) matrix factorization finds blocks
(Figure: the N users × M products matrix ≈ a sum of blocks; user groups ‘meat-eaters’, ‘vegetarians’, ‘kids’; product groups ‘steaks’, ‘cookies’, ‘plants’)

Answer: tensor factorization
• PARAFAC decomposition
(Figure: the subject × verb × object tensor = a sum of rank-1 components, e.g. ‘artists’, ‘athletes’, ‘politicians’)

Answer: tensor factorization
• PARAFAC decomposition
• Results for who-calls-whom-when: caller × callee × time tensor, 4M x 15 days
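
PARAFAC (also called CP) approximates a 3-mode tensor by a sum of R rank-1 tensors a_r ∘ b_r ∘ c_r, one per group (e.g. a caller group, a callee group and a time profile). A minimal sketch using alternating least squares, one standard way to fit it (the slides do not say which solver was used); names are hypothetical, and there is no normalization, nonnegativity constraint or convergence check:

```python
import numpy as np

def khatri_rao(B, C):
    """Column-wise Kronecker product: row j*K + k holds B[j] * C[k]."""
    J, R = B.shape
    K = C.shape[0]
    return (B[:, None, :] * C[None, :, :]).reshape(J * K, R)

def parafac_als(X, rank, iters=300, seed=0):
    """Rank-R PARAFAC of a 3-mode tensor: X[i,j,k] ~ sum_r A[i,r] B[j,r] C[k,r]."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A, B, C = (rng.random((dim, rank)) for dim in (I, J, K))
    X1 = X.reshape(I, J * K)                       # X1[i, j*K + k] = X[i, j, k]
    X2 = X.transpose(1, 0, 2).reshape(J, I * K)    # X2[j, i*K + k] = X[i, j, k]
    X3 = X.transpose(2, 0, 1).reshape(K, I * J)    # X3[k, i*J + j] = X[i, j, k]
    for _ in range(iters):
        # each line is the exact least-squares update of one factor matrix
        A = X1 @ khatri_rao(B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = X2 @ khatri_rao(A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = X3 @ khatri_rao(A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    return A, B, C

def reconstruct(A, B, C):
    return np.einsum("ir,jr,kr->ijk", A, B, C)

# Hypothetical caller x callee x day tensor with two disjoint calling groups
X = np.zeros((6, 5, 4))
X[0:3, 0:2, 0:2] = 1.0
X[3:6, 2:5, 2:4] = 2.0
A, B, C = parafac_als(X, rank=2)
err = np.linalg.norm(X - reconstruct(A, B, C)) / np.linalg.norm(X)
print(f"relative reconstruction error: {err:.1e}")
```

Each recovered component points at one group of callers, callees and days, which is how a "1 caller, 5 receivers, 4 days" pattern would surface as its own component.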

Anomaly detection in time-evolving graphs
• Anomalous communities in phone call data:
  – European country, 4M clients, data over 2 weeks
• 1 caller, 5 receivers, 4 days of activity: ~200 calls to EACH receiver on EACH day!

Source: Miguel Araujo, Spiros Papadimitriou, Stephan Günnemann, Christos Faloutsos, Prithwish Basu, Ananthram Swami, Evangelos Papalexakis, Danai Koutra. Com2: Fast Automatic Discovery of Temporal (‘Comet’) Communities. PAKDD 2014, Tainan, Taiwan.

Roadmap
• Introduction
  – Motivation
• Part#1: Patterns in graphs
• Part#2: time-evolving graphs
  – P2.1: tools/tensors
  – P2.2: other patterns: inter-arrival time
• Conclusions

Universidade de São Paulo. KDD 2015, Sydney, Australia
RSC: Mining and Modeling Temporal Activity in Social Media
Alceu F. Costa*, Yuto Yamaguchi, Caetano Traina Jr., Agma J. M. Traina, Christos Faloutsos (*alceufc@icmc.usp.br)

Pattern Mining: Datasets
• Reddit dataset: time-stamps from comments; 21,198 users; 20 million time-stamps
• Twitter dataset: time-stamps from tweets; 6,790 users; 16 million time-stamps
• For each user we have:
  – Sequence of posting time-stamps: T = (t1, t2, t3, …)
  – Inter-arrival times (IAT) of postings: (∆1, ∆2, ∆3, …), where ∆i = t(i+1) - ti

Pattern Mining
Pattern 1: Distribution of IAT is heavy-tailed
• Users can be inactive for long periods of time before making new postings
(Plot: IAT complementary cumulative distribution function (CCDF), log-log axes, for Reddit users and Twitter users)
No surprises. Should we give up?
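
The IAT sequence and its empirical CCDF, the quantity on the plot's Y-axis, take only a few lines. A minimal NumPy sketch with hypothetical time-stamps in seconds; the function names are hypothetical, and plotting x against the CCDF on log-log axes gives the kind of curve shown on the slide:

```python
import numpy as np

def inter_arrival_times(timestamps):
    """Delta_i = t_(i+1) - t_i for the sorted posting times."""
    t = np.sort(np.asarray(timestamps, dtype=float))
    return np.diff(t)

def empirical_ccdf(values):
    """Sorted values x and, for each, the fraction of samples >= x (ties aside)."""
    x = np.sort(np.asarray(values, dtype=float))
    ccdf = 1.0 - np.arange(len(x)) / len(x)
    return x, ccdf

iat = inter_arrival_times([0, 10, 15, 115, 116])   # hypothetical posting times (s)
x, ccdf = empirical_ccdf(iat)
print(iat, x, ccdf)
```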

Human? Robots? (plots on linear and log axes, with marks at 2’, 3 h, and 1 day)
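
One simple way to look for the machine-like regularity hinted at by the marked time scales is a histogram of log10(IAT): an account posting on a timer piles up in a single bin, while heavy-tailed human gaps spread over many. A heuristic sketch with hypothetical data (a bot posting every 120 s, human gaps drawn from a Lomax distribution); it is not the RSC model or RSC-Spotter:

```python
import numpy as np

def log_iat_histogram(timestamps, bins=40, max_log10=6.0):
    """Histogram of log10(inter-arrival time in seconds) over [0, max_log10]."""
    iat = np.diff(np.sort(np.asarray(timestamps, dtype=float)))
    iat = iat[iat > 0]
    return np.histogram(np.log10(iat), bins=bins, range=(0.0, max_log10))

rng = np.random.default_rng(1)
bot = np.arange(500) * 120.0                           # posts exactly every 120 s
human = np.cumsum(1.0 + 60.0 * rng.pareto(1.2, 500))   # heavy-tailed gaps

for name, ts in (("bot", bot), ("human", human)):
    counts, edges = log_iat_histogram(ts)
    top = counts.argmax()
    share = counts[top] / counts.sum()
    print(f"{name}: busiest bin starts at 10^{edges[top]:.2f} s, holds {share:.0%} of gaps")
```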

Experiments: Can RSC-Spotter Detect Bots?
• Precision vs. sensitivity curves; good performance: curve close to the top
• Twitter: precision > 94%, sensitivity > 70%
• With strongly imbalanced datasets: # humans >> # bots

Experiments: Can RSC-Spotter Detect Bots?
• Precision vs. sensitivity curves; good performance: curve close to the top
• Reddit: precision > 96%, sensitivity > 47%
• With strongly imbalanced datasets: # humans >> # bots
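
Precision and sensitivity (recall) are the two numbers on these curves. With far more humans than bots, even a small false-positive rate can swamp the true detections, so high precision is the harder result. A minimal sketch on hypothetical labels (1 = bot, 0 = human); the function name is hypothetical:

```python
def precision_sensitivity(y_true, y_pred):
    """Precision = TP / (TP + FP); sensitivity (recall) = TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    return precision, sensitivity

# Hypothetical imbalanced sample: 990 humans, 10 bots; the detector flags
# one human by mistake and catches 8 of the 10 bots.
y_true = [0] * 990 + [1] * 10
y_pred = [0] * 989 + [1] + [1] * 8 + [0] * 2
print(precision_sensitivity(y_true, y_pred))
```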

Roadmap
• Introduction
  – Motivation
• Part#1: Patterns in graphs
• Part#2: time-evolving graphs
  – P2.1: tools/tensors
  – P2.2: other patterns
    • inter-arrival time
    • Network growth
• Conclusions

Beyond Sigmoids: the NetTide Model for Social Network Growth and its Applications. KDD’16. Chengxi Zang 臧承熙, Peng Cui, CF

PROBLEM: n(t) and e(t), over time?
• n(t): the number of nodes
• e(t): the number of edges
• E.g.:
  – How many members will we have next month?
  – How many friendship links will we have next year?
• Linear? • Exponential? • Sigmoid? (sketch: a sigmoid curve rising from 0 through C/2 to C)

Datasets
• WeChat, 2011/1-2013/1: 300M nodes, 4.75B links
• ArXiv, 1992/3-2002/3: 17K nodes, 2.4M links
• Enron, 1998/1-2002/7: 86K nodes, 600K links
• Weibo links, 2006: 165K nodes, 331K

A: Power-law growth (plot: cumulative growth, log-log scale)
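
On log-log axes, power-law growth n(t) ∝ t^β is a straight line with slope β, so a least-squares line fit recovers the exponent. A minimal sketch on hypothetical data (the exponent 1.5 is made up, not a value from the WeChat, ArXiv, Enron or Weibo datasets; the function name is hypothetical):

```python
import numpy as np

def growth_exponent(t, counts):
    """Fit log(count) = beta * log(t) + const and return beta."""
    t = np.asarray(t, dtype=float)
    counts = np.asarray(counts, dtype=float)
    mask = (t > 0) & (counts > 0)                 # logs need positive values
    beta, _const = np.polyfit(np.log10(t[mask]), np.log10(counts[mask]), 1)
    return beta

days = np.arange(1, 731)
nodes = 50.0 * days ** 1.5          # hypothetical cumulative growth n(t) = 50 t^1.5
print(round(growth_exponent(days, nodes), 3))   # 1.5
```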

Details. Proposed: NetTide Model • Nodes n(t) • Links e(t)

NetTide-Node Model (details): #nodes(t)
• Intuition:
  – Rich-get-richer
  – Limitation (total population)
  – Fizzling nature
• = SI; ~Bass
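
The three intuitions can be made concrete with a growth equation of the SI/Bass family. The form below is an assumption chosen for illustration, dn/dt = β t^(-θ) n (N - n): the factor n is rich-get-richer, N - n is the limitation imposed by a total population N, and t^(-θ) is a fizzling (decaying) rate. It is not claimed to be the exact NetTide-Node equation. With θ = 0 it reduces to the classic logistic (SI) curve, which the sketch uses as a check; all parameter values are hypothetical:

```python
import numpy as np

def simulate_growth(N, beta, theta, n0, t0=1.0, t_end=100.0, dt=0.01):
    """Forward-Euler integration of dn/dt = beta * t**(-theta) * n * (N - n)."""
    steps = int(round((t_end - t0) / dt))
    t = t0 + dt * np.arange(steps + 1)
    n = np.empty(steps + 1)
    n[0] = n0
    for i in range(steps):
        rate = beta * t[i] ** (-theta)                    # fizzling: rate decays over time
        n[i + 1] = n[i] + dt * rate * n[i] * (N - n[i])   # rich-get-richer x limitation
    return t, n

def logistic(t, N, beta, n0, t0=1.0):
    """Closed-form solution of the theta = 0 case (logistic / SI growth)."""
    return N / (1 + (N - n0) / n0 * np.exp(-beta * N * (t - t0)))

t, n_si = simulate_growth(N=1000, beta=1e-4, theta=0.0, n0=10)
_, n_fizzle = simulate_growth(N=1000, beta=1e-4, theta=0.5, n0=10)
print(n_si[-1], n_fizzle[-1])   # fizzling slows growth well below the population N
```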

Results: Accuracy

Results: Forecast. WeChat: from 100 million to 300 million nodes, 730 days ahead

Part 2: Conclusions
• Time-evolving / heterogeneous graphs -> tensors
• PARAFAC finds patterns
• Surprising temporal patterns (power-law growth)

Roadmap
• Introduction
  – Motivation
  – Why study (big) graphs?
• Part#1: Patterns in graphs
• Part#2: time-evolving graphs; tensors
• Acknowledgements and Conclusions

Thanks
Disclaimer: All opinions are mine; not necessarily reflecting the opinions of the funding agencies
Thanks to: NSF IIS-0705359, IIS-0534205, CTA-INARC; Yahoo (M45), LLNL, IBM, SPRINT, Google, INTEL, HP, iLab

Cast: Akoglu, Leman; Araujo, Miguel; Beutel, Alex; Chau, Polo; Eswaran, Dhivya; Hooi, Bryan; Kang, U; Koutra, Danai; Papalexakis, Vagelis; Shah, Neil; Shin, Kijung; Song, Hyun Ah

CONCLUSION#1: Big data
• Patterns -> anomalies
• Large datasets reveal patterns/outliers that are invisible otherwise

CONCLUSION#2: tensors • powerful tool (e.g., 1 caller, 5 receivers, 4 days of activity)

References
• D. Chakrabarti, C. Faloutsos: Graph Mining: Laws, Tools and Case Studies, Morgan Claypool 2012. http://www.morganclaypool.com/doi/abs/10.2200/S00449ED1V01Y201209DMK006

TAKE HOME MESSAGE: Cross-disciplinarity

Thank you! Cross-disciplinarity
