- Slides: 127
CMU SCS Anomaly detection in large graphs Christos Faloutsos CMU
CMU SCS Thank you! • Dr. Lei Li Toutiao Lab (c) C. Faloutsos, 2016 2
Roadmap • Introduction – Motivation – Why study (big) graphs? • Part#1: Patterns in graphs • Part#2: time-evolving graphs; tensors • Conclusions
Graphs - why should we care? >$10B; ~1B users
Graphs - why should we care? [Figures: Food Web [Martinez ’91]; Internet Map [lumeta.com]]
Graphs - why should we care? • web-log (‘blog’) news propagation • computer network security: email/IP traffic and anomaly detection • Recommendation systems • ... • Many-to-many db relationship -> graph
Motivating problems • P1: patterns? Fraud detection? Patterns and anomalies* • P2: patterns in time-evolving graphs / tensors [Figure: tensor with axes source, destination, time] * Robust Random Cut Forest Based Anomaly Detection on Streams. Sudipto Guha, Nina Mishra, Gourav Roy, Okke Schrijvers, ICML’16
Roadmap • Introduction – Motivation – Why study (big) graphs? • Part#1: Patterns & fraud detection • Part#2: time-evolving graphs; tensors • Conclusions
Part 1: Patterns & fraud detection
Laws and patterns • Q1: Are real graphs random? • A1: NO!! – Diameter (‘6 degrees’; ‘Kevin Bacon’) – in- and out-degree distributions – other (surprising) patterns • So, let’s look at the data
Solution# S.1 • Power law in the degree distribution [Faloutsos x3, SIGCOMM 99] [Figure: internet domains, log(degree) vs. log(rank); labeled points att.com and ibm.com; slope -0.82]
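The slope on the log(degree) vs. log(rank) plot can be estimated with an ordinary least-squares line. The sketch below is only an illustration: the function name `rank_degree_slope` and the synthetic degrees are assumptions, not the data or the fitting procedure behind the slide.

```python
import math

def rank_degree_slope(degrees):
    """Least-squares slope of log(degree) versus log(rank) (hypothetical helper)."""
    ranked = sorted((d for d in degrees if d > 0), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(d) for d in ranked]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Synthetic degrees that follow degree = 1000 * rank^(-0.82) exactly.
synthetic = [1000.0 * rank ** -0.82 for rank in range(1, 201)]
slope = rank_degree_slope(synthetic)
print(round(slope, 2))
```

On real degree sequences the points scatter around the line, so the fitted slope is only a summary of the trend.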
S2: connected component sizes • Connected Components – 4 observations (1.4B nodes, 6B edges) [Figure: count vs. size of connected components]
S2: connected component sizes • 1) Largest component is 10K x larger than the next
S2: connected component sizes • 2) ~0.7B singleton nodes
S2: connected component sizes • 3) SLOPE!
S2: connected component sizes • 4) Spikes! – 300-size cmpt x 500: why? – 1100-size cmpt x 65: why?
S2: connected component sizes • The spikes: suspicious financial-advice sites (not existing now)
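The count-vs-size plot needs one number per component size: how many components have that size. A minimal sketch with union-find follows; the function name and the toy edge list are assumptions for illustration, and a graph with billions of edges would need a distributed implementation instead.

```python
from collections import Counter

def component_size_counts(num_nodes, edges):
    """Map component size -> number of components of that size (hypothetical helper)."""
    parent = list(range(num_nodes))

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving keeps trees shallow
            u = parent[u]
        return u

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v

    sizes = Counter(find(u) for u in range(num_nodes))  # root -> component size
    return Counter(sizes.values())                      # size -> count

# Toy graph: one 3-node component, two 2-node components, two singletons.
toy_edges = [(0, 1), (1, 2), (3, 4), (5, 6)]
counts = component_size_counts(9, toy_edges)
print(sorted(counts.items()))
```

A spike then shows up as a size whose count is far above what the neighbouring sizes suggest.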
Roadmap • Introduction – Motivation • Part#1: Patterns in graphs – P1.1: Patterns: Degree; Triangles – P1.2: Anomaly/fraud detection • Part#2: time-evolving graphs; tensors • Conclusions
Solution# S.3: Triangle ‘Laws’ • Real social networks have a lot of triangles – Friends of friends are friends • Any patterns? – 2x the friends, 2x the triangles?
Triangle Law: #S.3 [Tsourakakis ICDM 2008] • n friends -> ~n^1.6 triangles [Figure: panels Reuters, Epinions, SN; X-axis: degree; Y-axis: mean # triangles]
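The two axes of the triangle-law plot (degree, mean number of triangles) can be computed directly on a small graph. The sketch below is a brute-force illustration under assumed names (`triangles_per_node`, `mean_triangles_by_degree`); it checks every pair of neighbours and is not the scalable method of the cited work.

```python
from collections import defaultdict
from itertools import combinations

def triangles_per_node(edges):
    """Return (adjacency sets, triangles each node participates in)."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    counts = {}
    for u, nbrs in adj.items():
        # A triangle at u is a pair of neighbours that are themselves connected.
        counts[u] = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return adj, counts

def mean_triangles_by_degree(edges):
    """Map degree -> mean number of triangles over nodes with that degree."""
    adj, counts = triangles_per_node(edges)
    by_degree = defaultdict(list)
    for u, t in counts.items():
        by_degree[len(adj[u])].append(t)
    return {deg: sum(ts) / len(ts) for deg, ts in by_degree.items()}

# Toy graph: a 4-clique on nodes 0..3 plus a pendant node 4 attached to node 0.
toy_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (0, 4)]
print(mean_triangles_by_degree(toy_edges))
```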
Triangle counting for large graphs? • Anomalous nodes in Twitter (~3 billion edges) [U Kang, Brendan Meeder, +, PAKDD’11]
S4: k-core patterns - definitions – k-core (of a graph) – degeneracy (of a graph) – coreness (of a vertex)
CoreScope: Graph Mining Using k-Core Analysis - Patterns, Anomalies, and Algorithms. ICDM’16 (to appear). Kijung Shin, Tina Eliassi-Rad and CF
Mirror Pattern: Observation – coreness (of a vertex): maximum k such that the vertex belongs to the k-core – Definition: [Mirror Pattern] degree ~ coreness
Mirror Pattern: Application • Exceptions are ‘strange’
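Coreness can be computed by repeatedly peeling off the vertex of smallest remaining degree. The sketch below uses that standard peeling idea on a toy graph; the function name, the toy graph, and the O(n) minimum search per step are assumptions made for brevity, not the algorithm of the cited paper.

```python
from collections import defaultdict

def degree_and_coreness(edges):
    """Return (degree, coreness) dictionaries via min-degree peeling."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    degree = {u: len(nbrs) for u, nbrs in adj.items()}
    remaining = dict(degree)  # current degree among not-yet-peeled vertices
    core = {}
    k = 0
    while remaining:
        u = min(remaining, key=remaining.get)  # linear scan; fine for a sketch
        k = max(k, remaining[u])               # coreness never decreases while peeling
        core[u] = k
        del remaining[u]
        for v in adj[u]:
            if v in remaining:
                remaining[v] -= 1
    return degree, core

# Toy graph: a 4-clique (0..3) and a hub (4) with five leaves, linked to node 0.
clique = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
star = [(4, leaf) for leaf in range(5, 10)] + [(4, 0)]
degree, core = degree_and_coreness(clique + star)
print(degree[4], core[4])  # high degree, low coreness
```

In this toy graph node 4 has the largest degree but coreness 1, which is the kind of deviation from degree ~ coreness that the slide calls ‘strange’.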
MORE Graph Patterns • RTG: A Recursive Realistic Graph Generator using Random Typing. Leman Akoglu and Christos Faloutsos. PKDD’09.
MORE Graph Patterns • Mary McGlohon, Leman Akoglu, Christos Faloutsos. Statistical Properties of Social Networks. In “Social Network Data Analytics” (Ed.: Charu Aggarwal) • Deepayan Chakrabarti and Christos Faloutsos, Graph Mining: Laws, Tools, and Case Studies. Oct. 2012, Morgan Claypool.
Roadmap • Introduction – Motivation • Part#1: Patterns in graphs – P1.1: Patterns – P1.2: Anomaly / fraud detection • No labels – spectral • With labels: Belief Propagation • Part#2: time-evolving graphs; tensors • Conclusions
How to find ‘suspicious’ groups? • ‘blocks’ are normal, right? [Figure: fans x idols adjacency matrix]
Except that: • ‘blocks’ are usually suspicious • ‘hyperbolic’ communities are more realistic [Araujo+, PKDD’14] Q: Can we spot blocks, easily? A: Silver bullet: SVD!
Crash intro to SVD (details) • Recall: (SVD) matrix factorization: finds blocks [Figure: N fans x M idols matrix ~ sum of three blocks; fan groups: ‘music lovers’, ‘sports lovers’, ‘citizens’; idol groups: ‘singers’, ‘athletes’, ‘politicians’]
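A small sketch of why SVD "finds blocks": on a fans x idols matrix made of two disjoint dense blocks, the leading pair of singular vectors is non-zero only on the rows and columns of the heavier block. The code uses plain power iteration as a stand-in for a full SVD routine, and the matrix, function name, and iteration count are all assumptions for illustration.

```python
def leading_singular_pair(matrix, iterations=100):
    """Power iteration for the top singular triple (u, sigma, v) of a dense matrix."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    v = [1.0] * n_cols
    sigma = 0.0
    for _ in range(iterations):
        u = [sum(matrix[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
        norm_u = sum(x * x for x in u) ** 0.5
        u = [x / norm_u for x in u]
        v = [sum(matrix[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]
        sigma = sum(x * x for x in v) ** 0.5
        v = [x / sigma for x in v]
    return u, sigma, v

# Toy matrix: fans 0..3 all follow idols 0..2; fans 4..5 all follow idols 3..4.
matrix = [[1.0 if (i < 4 and j < 3) or (i >= 4 and j >= 3) else 0.0
           for j in range(5)] for i in range(6)]
u, sigma, v = leading_singular_pair(matrix)
block_fans = [i for i, x in enumerate(u) if abs(x) > 1e-6]
block_idols = [j for j, x in enumerate(v) if abs(x) > 1e-6]
print(block_fans, block_idols)
```

The remaining blocks would appear in the next singular vectors, which a library SVD returns in one call.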
Inferring Strange Behavior from Connectivity Pattern in Social Networks. PAKDD’14. Meng Jiang, Peng Cui, Shiqiang Yang (Tsinghua); Alex Beutel, Christos Faloutsos (CMU)
Lockstep and Spectral Subspace Plot • Case #0: No lockstep behavior in random power law graph of 1M nodes, 3M edges • Adjacency matrix: random • Spectral subspace plot: “Scatter”
Lockstep and Spectral Subspace Plot • Case #1: non-overlapping lockstep • Adjacency matrix: “Blocks” • Spectral subspace plot: “Rays”
Lockstep and Spectral Subspace Plot • Case #2: non-overlapping lockstep • Adjacency matrix: “Blocks; low density” • Spectral subspace plot: “Elongation”
Lockstep and Spectral Subspace Plot • Case #3: non-overlapping lockstep • Adjacency matrix: “Camouflage” (or “Fame”) • Spectral subspace plot: “Tilting Rays”
Lockstep and Spectral Subspace Plot • Case #4: overlapping lockstep • Adjacency matrix: “Staircase” • Spectral subspace plot: “Pearls”
Dataset • Tencent Weibo • 117 million nodes (with profile and UGC data) • 3.33 billion directed edges
Real Data [Figure: “Rays” and “Pearls” in the spectral subspace plots; “Block” and “Staircase” in the adjacency matrix]
Real Data • Spikes on the out-degree distribution
Roadmap • Introduction – Motivation • Part#1: Patterns in graphs – P1.1: Patterns – P1.2: Anomaly / fraud detection • No labels – spectral methods – Suspiciousness • With labels: Belief Propagation • Part#2: time-evolving graphs; tensors • Conclusions
Suspicious Patterns in Event Data • 2-modes? n-modes? A General Suspiciousness Metric for Dense Blocks in Multimodal Data. Meng Jiang, Alex Beutel, Peng Cui, Bryan Hooi, Shiqiang Yang, and Christos Faloutsos, ICDM, 2015.
Suspicious Patterns in Event Data (ICDM 2015) • Which is more suspicious? – 20,000 users retweeting the same 20 tweets, 6 times each, all in 10 hours vs. – 225 users retweeting the same 1 tweet, 15 times each, all in 3 hours, all from 2 IP addresses • Answer: volume * D_KL(p || p_background), i.e., size times contrast
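A rough sketch of a "volume times KL divergence" score follows. It assumes that block and background densities are compared as Poisson rates, which is one possible reading of the formula on the slide; the function names and the toy numbers are hypothetical and say nothing about which of the two retweet blocks scores higher.

```python
import math

def poisson_kl(rho, p):
    """KL divergence from Poisson(rho) to Poisson(p) (assumed density model)."""
    if rho == 0:
        return p
    return p - rho + rho * math.log(rho / p)

def suspiciousness(block_dims, block_mass, background_density):
    """volume * D_KL(block density || background density)."""
    volume = math.prod(block_dims)   # number of cells in the block
    rho = block_mass / volume        # events per cell inside the block
    return volume * poisson_kl(rho, background_density)

# Toy 10 x 10 blocks against a background of 1 event per cell.
as_dense_as_background = suspiciousness((10, 10), 100, 1.0)
twice_as_dense = suspiciousness((10, 10), 200, 1.0)
print(as_dense_as_background, round(twice_as_dense, 3))
```

A block that is exactly as dense as the background scores zero, and the score grows with both the block's size and its contrast to the background.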
Suspicious Patterns in Event Data (ICDM 2015) • Retweeting: “Galaxy Note Dream Project: Happy Life Traveling the World”
Roadmap • Introduction – Motivation • Part#1: Patterns in graphs – P1.1: Patterns – P1.2: Anomaly / fraud detection • No labels – spectral methods • With labels: Belief Propagation • Part#2: time-evolving graphs; tensors • Conclusions
E-bay Fraud detection • w/ Polo Chau & Shashank Pandit, CMU [WWW’07]
E-bay Fraud detection - NetProbe
Popular press • And less desirable attention: – E-mail from ‘Belgium police’ (‘copy of your code?’)
Roadmap • Introduction – Motivation • Part#1: Patterns in graphs – Patterns – Anomaly / fraud detection • No labels - Spectral methods • w/ labels: Belief Propagation – closed formulas • Part#2: time-evolving graphs; tensors • Conclusions
Unifying Guilt-by-Association Approaches: Theorems and Fast Algorithms. Danai Koutra, U Kang, Hsing-Kuo Kenneth Pao, Tai-You Ke, Duen Horng (Polo) Chau, Christos Faloutsos. ECML PKDD, 5-9 September 2011, Athens, Greece
Problem Definition: GBA techniques • Given: Graph; & few labeled nodes • Find: labels of rest (assuming network effects)
Are they related? YES! • RWR (Random Walk with Restarts) – Google’s PageRank (‘if my friends are important, I’m important, too’) • SSL (Semi-supervised learning) – minimize the differences among neighbors • BP (Belief propagation) – send messages to neighbors, on what you believe about them
Correspondence of Methods
Method   Matrix              Unknown   Known
RWR      [I – c A D^-1]      x         (1-c) y
SSL      [I + a(D - A)]      x         y
FABP     [I + a D - c'A]     b_h       φ_h
In every row, matrix × unknown = known: A is the adjacency matrix, D is the diagonal degree matrix (d1, d2, d3), the unknown vector holds the final labels/beliefs, and the known vector holds the prior labels/beliefs.
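The RWR row, [I – c A D^-1] x = (1-c) y, can be solved without forming the matrix by iterating x <- (1-c) y + c A D^-1 x. The sketch below does this on a toy path graph; the function name, the value c = 0.85, the iteration count, and the graph are assumptions for illustration only.

```python
def rwr_scores(adj, prior, c=0.85, iterations=200):
    """Iteratively solve [I - c A D^-1] x = (1 - c) y for an undirected graph.

    adj: dict node -> set of neighbours (no isolated nodes); prior: dict node -> y value.
    """
    x = dict(prior)
    for _ in range(iterations):
        x = {
            i: (1 - c) * prior[i] + c * sum(x[j] / len(adj[j]) for j in adj[i])
            for i in adj
        }
    return x

# Toy path graph 0 - 1 - 2 - 3 with a single labeled (seed) node 0.
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
prior = {0: 1.0, 1: 0.0, 2: 0.0, 3: 0.0}
scores = rwr_scores(adj, prior)
print({node: round(s, 3) for node, s in scores.items()})
```

The SSL and FABP rows have the same "matrix times unknown equals prior" shape, so any linear solver could be swapped in for them.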
Results: Scalability • FABP is linear in the number of edges [Figure: runtime (min) vs. # of edges (Kronecker graphs)]
Problem: e-commerce ratings fraud • Given a heterogeneous graph on users, products, sellers and positive/negative ratings with “seed labels” • Find the top k most fraudulent users, products and sellers. Dhivya Eswaran, Stephan Günnemann, Christos Faloutsos, “ZooBP: Belief Propagation for Heterogeneous Networks”, in submission to VLDB 2017
ZooBP: features • Fast; convergence guarantees • Near-perfect accuracy • Linear in graph size
ZooBP in the real world • Near 100% precision on top 300 users (Flipkart) • Flagged users: suspicious – 400 ratings in 1 sec – 5000 good ratings and no bad ratings
Summary of Part#1 • *many* patterns in real graphs – Power-laws everywhere – Long (and growing) list of tools for anomaly/fraud detection • Patterns and anomalies
Roadmap • Introduction – Motivation • Part#1: Patterns in graphs • Part#2: time-evolving graphs – P2.1: tools/tensors – P2.2: other patterns • Conclusions
Part 2: Time evolving graphs; tensors
Graphs over time -> tensors! • Problem #2.1: – Given who calls whom, and when – Find patterns / anomalies [Figure: caller x callee matrices (e.g., johnson, smith), one per day (Mon, Tue, ...), stacked into a caller x callee x time tensor]
Graphs over time -> tensors! • Problem #2.1’: – Given author-keyword-date – Find patterns / anomalies • MANY more settings, with >2 ‘modes’ [Figure: author x keyword x date tensor]
Graphs over time -> tensors! • Problem #2.1’’: – Given subject – verb – object facts – Find patterns / anomalies • MANY more settings, with >2 ‘modes’ [Figure: subject x object x verb tensor]
Graphs over time -> tensors! • Problem #2.1’’’: – Given <triplets> – Find patterns / anomalies • MANY more settings, with >2 ‘modes’ (and 4, 5, etc. modes) [Figure: mode 1 x mode 2 x mode 3 tensor]
Answer: tensor factorization • Recall: (SVD) matrix factorization: finds blocks [Figure: N users x M products matrix ~ sum of three blocks; user groups: ‘meat-eaters’, ‘vegetarians’, ‘kids’; product groups: ‘steaks’, ‘cookies’, ‘plants’]
Crash intro to SVD • Recall: (SVD) matrix factorization: finds blocks [Figure: N fans x M idols matrix ~ sum of three blocks; fan groups: ‘music lovers’, ‘sports lovers’, ‘citizens’; idol groups: ‘singers’, ‘athletes’, ‘politicians’]
Answer: tensor factorization • PARAFAC decomposition [Figure: subject x object x verb tensor = sum of three components: artists + athletes + politicians]
Answer: tensor factorization • PARAFAC decomposition • Results for who-calls-whom-when – 4M x 15 days [Figure: caller x callee x time tensor = sum of components]
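To make the decomposition concrete, the sketch below fits a single PARAFAC component (one vector per mode) to a sparse caller x callee x time tensor by alternating least squares. It is restricted to rank 1 to stay short; the function name, the toy tensor (one caller, five receivers, four days, 200 calls per cell), and the sweep count are assumptions, and a real analysis would fit several components with a tensor library.

```python
def rank1_parafac(entries, shape, sweeps=20):
    """Fit X[i][j][k] ~ a[i] * b[j] * c[k] by alternating least squares.

    entries: dict {(i, j, k): value} holding the non-zero cells of the tensor.
    """
    size_i, size_j, size_k = shape
    a, b, c = [1.0] * size_i, [1.0] * size_j, [1.0] * size_k

    def sq_norm(vec):
        return sum(x * x for x in vec)

    for _ in range(sweeps):
        # Each update is the exact least-squares solution with the other two vectors fixed.
        new_a = [0.0] * size_i
        for (i, j, k), val in entries.items():
            new_a[i] += val * b[j] * c[k]
        a = [x / (sq_norm(b) * sq_norm(c)) for x in new_a]

        new_b = [0.0] * size_j
        for (i, j, k), val in entries.items():
            new_b[j] += val * a[i] * c[k]
        b = [x / (sq_norm(a) * sq_norm(c)) for x in new_b]

        new_c = [0.0] * size_k
        for (i, j, k), val in entries.items():
            new_c[k] += val * a[i] * b[j]
        c = [x / (sq_norm(a) * sq_norm(b)) for x in new_c]
    return a, b, c

# Toy tensor: caller 2 calls receivers 3..7 on days 5..8, 200 times per cell.
entries = {(2, j, k): 200.0 for j in range(3, 8) for k in range(5, 9)}
a, b, c = rank1_parafac(entries, shape=(10, 10, 14))
callers = [i for i, x in enumerate(a) if abs(x) > 1e-9]
receivers = [j for j, x in enumerate(b) if abs(x) > 1e-9]
days = [k for k, x in enumerate(c) if abs(x) > 1e-9]
print(callers, receivers, days)
```

The non-zero entries of the three vectors read off the "who", "whom", and "when" of the dense block.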
Anomaly detection in time-evolving graphs • Anomalous communities in phone call data: – European country, 4M clients, data over 2 weeks – 1 caller, 5 receivers, 4 days of activity – ~200 calls to EACH receiver on EACH day! Miguel Araujo, Spiros Papadimitriou, Stephan Günnemann, Christos Faloutsos, Prithwish Basu, Ananthram Swami, Evangelos Papalexakis, Danai Koutra. Com2: Fast Automatic Discovery of Temporal (Comet) Communities. PAKDD 2014, Tainan, Taiwan.
Roadmap • Introduction – Motivation • Part#1: Patterns in graphs • Part#2: time-evolving graphs – P2.1: tools/tensors – P2.2: other patterns – inter-arrival time • Conclusions
RSC: Mining and Modeling Temporal Activity in Social Media. Alceu F. Costa* (Universidade de São Paulo), Yuto Yamaguchi, Agma J. M. Traina, Caetano Traina Jr., Christos Faloutsos. KDD 2015, Sydney, Australia. *alceufc@icmc.usp.br
Pattern Mining: Datasets • Reddit dataset: time-stamps from comments; 21,198 users; 20 million time-stamps • Twitter dataset: time-stamps from tweets; 6,790 users; 16 million time-stamps • For each user we have: – Sequence of postings time-stamps: T = (t1, t2, t3, …) – Inter-arrival times (IAT) of postings: (∆1, ∆2, ∆3, …) [Figure: time axis with t1, t2, t3, t4 and the gaps ∆1, ∆2, ∆3 between them]
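The two per-user quantities above are simple to compute: the inter-arrival times are the gaps between consecutive sorted time-stamps, and their empirical CCDF is the fraction of gaps at or above each value. The sketch below uses assumed function names and made-up time-stamps (in seconds), not the Reddit or Twitter data.

```python
def inter_arrival_times(timestamps):
    """Gaps between consecutive postings: delta_i = t_(i+1) - t_i."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]

def ccdf(values):
    """Empirical CCDF as (value, fraction of samples >= value) pairs."""
    ordered = sorted(values)
    n = len(ordered)
    points = []
    for idx, value in enumerate(ordered):
        if idx == 0 or value != ordered[idx - 1]:
            points.append((value, (n - idx) / n))
    return points

# Made-up user: three postings one minute apart, then a long silence.
stamps = [0, 60, 120, 180, 4000]
iat = inter_arrival_times(stamps)
print(iat)
print(ccdf(iat))
```

Plotting the CCDF points on log-log axes gives the kind of curve used to judge whether the IAT distribution is heavy-tailed.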
Pattern Mining • Pattern 1: Distribution of IAT is heavy-tailed – Users can be inactive for long periods of time before making new postings • No surprises: should we give up? [Figure: IAT complementary cumulative distribution function (CCDF), log-log axes; panels: Reddit users, Twitter users]
Human? Robots? [Figure: IAT histograms on linear and log axes, with marks at 2’, 3h, and 1 day]
Experiments: Can RSC-Spotter Detect Bots? • Precision vs. sensitivity curves; good performance: curve close to the top • Twitter: precision > 94%, sensitivity > 70% • Reddit: precision > 96%, sensitivity > 47% • With strongly imbalanced datasets: # humans >> # bots
Roadmap • Introduction – Motivation • Part#1: Patterns in graphs • Part#2: time-evolving graphs – P2.1: tools/tensors – P2.2: other patterns • inter-arrival time • Network growth • Conclusions
Beyond Sigmoids: the NetTide Model for Social Network Growth and its Applications. KDD’16. Chengxi Zang 臧承熙, Peng Cui, CF
PROBLEM: n(t) and e(t), over time? • n(t): the number of nodes • e(t): the number of edges • E.g.: – How many members will we have next month? – How many friendship links will we have next year? • Linear? • Exponential? • Sigmoid? [Figure: sigmoid curve with levels 0, C/2, C]
Datasets • WeChat 2011/1-2013/1: 300M nodes, 4.75B links • ArXiv 1992/3-2002/3: 17K nodes, 2.4M links • Enron 1998/1-2002/7: 86K nodes, 600K links • Weibo 2006: 165K nodes, 331K links
A: Power Law Growth [Figure: cumulative growth, log-log scale]
Proposed: NetTide Model (details) • Nodes n(t) • Links e(t)
NetTide-Node Model (details) • Intuition: – Rich-get-richer – Limitation (total population) – Fizzling nature • = SI; ~Bass [Figure: #nodes(t) over time]
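The three ingredients can be combined in a simple growth equation. The sketch below assumes the form dn/dt = (beta / t^theta) * n * (N - n): the n factor is rich-get-richer, the (N - n) factor is the limit set by the total population N, and the 1/t^theta factor is the fizzling. This functional form, the Euler integration, and every parameter value are assumptions for illustration; the slide itself gives no equation.

```python
def simulate_node_growth(total_population, beta, theta, n0, steps, dt=1.0):
    """Euler integration of dn/dt = beta / t**theta * n * (N - n) (assumed form)."""
    n = float(n0)
    t = 1.0
    series = [n]
    for _ in range(steps):
        rate = beta / (t ** theta) * n * (total_population - n)
        n = min(float(total_population), n + dt * rate)
        t += dt
        series.append(n)
    return series

# Hypothetical parameters: theta = 0 gives SI-style logistic growth, theta > 0 fizzles.
fizzling = simulate_node_growth(total_population=1000, beta=1e-4, theta=0.5, n0=10, steps=200)
plain_si = simulate_node_growth(total_population=1000, beta=1e-4, theta=0.0, n0=10, steps=200)
print(round(fizzling[-1]), round(plain_si[-1]))
```

With the same beta, the fizzling curve stays below the plain SI curve, which is the qualitative behaviour the intuition bullets describe.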
Results: Accuracy [Figures: model fits on the datasets]
Results: Forecast • WeChat: from 100 million to 300 million • 730 days ahead
Part 2: Conclusions • Time-evolving / heterogeneous graphs -> tensors • PARAFAC finds patterns • Surprising temporal patterns (P.L. growth)
Roadmap • Introduction – Motivation – Why study (big) graphs? • Part#1: Patterns in graphs • Part#2: time-evolving graphs; tensors • Acknowledgements and Conclusions
Thanks • Disclaimer: All opinions are mine; not necessarily reflecting the opinions of the funding agencies • Thanks to: NSF IIS-0705359, IIS-0534205, CTA-INARC; Yahoo (M45), LLNL, IBM, SPRINT, Google, INTEL, HP, iLab
Cast • Akoglu, Leman • Araujo, Miguel • Beutel, Alex • Chau, Polo • Eswaran, Dhivya • Hooi, Bryan • Kang, U • Koutra, Danai • Papalexakis, Vagelis • Shah, Neil • Shin, Kijung • Song, Hyun Ah
CONCLUSION#1 – Big data • Patterns and anomalies • Large datasets reveal patterns/outliers that are invisible otherwise
CONCLUSION#2 – tensors • powerful tool [Figure: tensor component with 1 caller, 5 receivers, 4 days of activity]
References • D. Chakrabarti, C. Faloutsos: Graph Mining – Laws, Tools and Case Studies, Morgan Claypool 2012 • http://www.morganclaypool.com/doi/abs/10.2200/S00449ED1V01Y201209DMK006
TAKE HOME MESSAGE: Cross-disciplinarity
Thank you!