CMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos Faloutsos CMU
CMU SCS Thank you! • Brian Gallagher • Jan Winfield LLNL, Jan 2013 C. Faloutsos (CMU) 2
Roadmap • Introduction – Motivation – Why ‘big data’ – Why (big) graphs? • Problem#1: Patterns in graphs • Problem#2: Tools • Problem#3: Scalability • Conclusions
Why ‘big data’ • Why? • What is the problem definition? • What are the major research challenges?
Main message: Big data: often > experts • ‘Super Crunchers: Why Thinking-By-Numbers is the New Way To Be Smart’ by Ian Ayres, 2008 • Google won the machine translation competition, 2005 • http://www.itl.nist.gov/iad/mig//tests/mt/2005/doc/mt05eval_official_results_release_20050801_v3.html
Problem definition – big picture • Tera/Peta-byte data -> Analytics -> Insights, outliers • Main emphasis in this talk
Problem definition – big picture • Tera/Peta-byte data • (my personal) rules of thumb: if data – fits in memory -> R, matlab, scipy – single disk -> RDBMS (sqlite3, mysql, postgres) – multiple (<100-1000) disks -> parallel RDBMS (Vertica, TeraData) – multiple (>1000) disks -> hadoop, pig
(Free) Resource for graphs • Open source system for mining huge graphs: PEGASUS project (PEta GrAph mining System) • www.cs.cmu.edu/~pegasus • Apache license for s/w • code and papers
Research challenges • The usual ones from data mining – Data cleansing – Feature engineering – … • PLUS – Scalability (< O(N**2)) – Real data *disobey* textbook assumptions (uniformity, independence, Gaussian, Poisson), with huge performance implications
Roadmap • Introduction – Motivation – Why ‘big data’ – Why (big) graphs? • Problem#1: Patterns in graphs • Problem#2: Tools • Problem#3: Scalability • Conclusions
Graphs - why should we care? • Food Web [Martinez ’91] • Internet Map [lumeta.com] • >$10B revenue • >0.5B users
Graphs - why should we care? • IR: bi-partite graphs (doc-terms): documents D1 ... DN, terms T1 ... TM • web: hyper-text graph • ... and more:
Graphs - why should we care? • ‘viral’ marketing • web-log (‘blog’) news propagation • computer network security: email/IP traffic and anomaly detection • ... • Subject-verb-object -> graph • Many-to-many db relationship -> graph
Outline • Introduction – Motivation • Problem#1: Patterns in graphs – Static graphs – Weighted graphs – Time evolving graphs • Problem#2: Tools • Problem#3: Scalability • Conclusions
Problem #1 - network and graph mining • What does the Internet look like? • What does FaceBook look like? • What is ‘normal’/‘abnormal’? • Which patterns/laws hold? – To spot anomalies (rarities), we have to discover patterns – Large datasets reveal patterns/anomalies that may be invisible otherwise…
Graph mining • Are real graphs random?
Laws and patterns • Are real graphs random? • A: NO!! – Diameter – in- and out-degree distributions – other (surprising) patterns • So, let’s look at the data
Solution# S.1 • Power law in the degree distribution [SIGCOMM 99] • Plot (internet domains): log(degree) vs. log(rank), slope -0.82; labeled points: att.com, ibm.com
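A rank-degree slope such as the -0.82 above can be estimated with a least-squares line fit in log-log space. The sketch below is a minimal illustration, not the method of the cited paper; the function names `loglog_slope` and `rank_degree_slope` are hypothetical, and the input is assumed to be a list of positive node degrees.

```python
import math

def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x); all values must be > 0."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x, mean_y = sum(lx) / n, sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    return sxy / sxx

def rank_degree_slope(degrees):
    """Sort degrees in decreasing order and fit degree against rank (1 = largest)."""
    ranked = sorted(degrees, reverse=True)
    ranks = range(1, len(ranked) + 1)
    return loglog_slope(ranks, ranked)

# Synthetic degrees that follow degree = 1000 * rank**(-0.82) exactly.
synthetic = [1000.0 * r ** -0.82 for r in range(1, 201)]
slope = rank_degree_slope(synthetic)
```

On real data a plain least-squares fit is only a rough estimate, since the tail of the distribution is noisy.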
Solution# S.2: Eigen Exponent E • Plot (May 2001): eigenvalue vs. rank of decreasing eigenvalue; exponent = slope, E = -0.48 • A2: power law in the eigenvalues of the adjacency matrix • [Mihail, Papadimitriou ’02]: slope is ½ of rank exponent
But: How about graphs from other domains?
More power laws: • web hit counts [w/ A. Montgomery] • Plot (Web Site Traffic): count (log scale) vs. in-degree (log scale); annotations: Zipf, ``ebay’’, users, sites
epinions.com • who-trusts-whom [Richardson + Domingos, KDD 2001] • Plot: count vs. (out) degree; annotation: trusts-2000-people user
And numerous more • # of sexual contacts • Income [Pareto]: ‘80-20 distribution’ • Duration of downloads [Bestavros+] • Duration of UNIX jobs (‘mice and elephants’) • Size of files of a user • … • ‘Black swans’
Roadmap • Introduction – Motivation • Problem#1: Patterns in graphs – Static graphs (degree, diameter, eigen; triangles; cliques) – Weighted graphs – Time evolving graphs • Problem#2: Tools
Solution# S.3: Triangle ‘Laws’ • Real social networks have a lot of triangles – Friends of friends are friends • Any patterns?
Triangle Law: #S.3 [Tsourakakis ICDM 2008] • Datasets: HEP-TH, ASN, Epinions • X-axis: # of participating triangles • Y-axis: count (~ pdf)
Triangle Law: #S.4 [Tsourakakis ICDM 2008] • Datasets: Reuters, SN, Epinions • X-axis: degree • Y-axis: mean # triangles • n friends -> ~n^1.6 triangles
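To make the quantity on the Y-axis concrete, the sketch below counts, for each node of a small undirected graph, the triangles it participates in. It is a naive in-memory illustration with a hypothetical function name, not the approach of the cited paper, and it would not scale to the graph sizes discussed in this talk.

```python
from itertools import combinations

def triangles_per_node(edges):
    """Return {node: number of triangles the node participates in}."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    counts = {}
    for node, nbrs in adj.items():
        # A triangle through `node` is a pair of its neighbors that are adjacent.
        counts[node] = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return counts

# A 4-clique on nodes 0..3 plus a pendant node 4 attached to node 3.
example_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4)]
counts = triangles_per_node(example_edges)
```

Pairing each node's degree with its count gives the points that a degree-vs-triangles plot like the one above would average.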
Triangle counting for large graphs? • Anomalous nodes in Twitter (~3 billion edges) [U Kang, Brendan Meeder, +, PAKDD’11]
Roadmap • Introduction – Motivation • Problem#1: Patterns in graphs – Static graphs (degree, diameter, eigen; triangles; cliques) – Weighted graphs – Time evolving graphs • Problem#2: Tools
Observations on weighted graphs? • A: yes - even more ‘laws’! • M. McGlohon, L. Akoglu, and C. Faloutsos. Weighted Graphs and Disconnected Components: Patterns and a Generator. SIG-KDD 2008
Observation W.1: Fortification • Q: How do the weights of nodes relate to degree?
Observation W.1: Fortification • More donors, more $? • Figure: donations of $10, $5, $7 to ‘Reagan’, ‘Clinton’
Observation W.1: fortification: Snapshot Power Law • Weight: super-linear on in-degree • exponent ‘iw’: 1.01 < iw < 1.26 • More donors, even more $ • Plot (Orgs-Candidates): In-weights ($) vs. Edges (# donors); e.g. John Kerry, $10M received, from 1K donors
Roadmap • Introduction – Motivation • Problem#1: Patterns in graphs – Static graphs – Weighted graphs – Time evolving graphs • Problem#2: Tools • …
Problem: Time evolution • with Jure Leskovec (CMU -> Stanford) • and Jon Kleinberg (Cornell – sabb. @ CMU)
T.1 Evolution of the Diameter • Prior work on Power Law graphs hints at slowly growing diameter: – diameter ~ O(log N) • What is happening in real data? • Diameter shrinks over time
T.1 Diameter – “Patents” • Patent citation network • 25 years of data • @1999 – 2.9M nodes – 16.5M edges • Plot: diameter vs. time [years]
T.2 Temporal Evolution of the Graphs • N(t) … nodes at time t • E(t) … edges at time t • Suppose that N(t+1) = 2 * N(t) • Q: what is your guess for E(t+1) =? 2 * E(t) • A: over-doubled! – But obeying the ``Densification Power Law’’
T.2 Densification – Patent Citations • Citations among patents granted • @1999 – 2.9M nodes – 16.5M edges • Each year is a datapoint • Plot: E(t) vs. N(t), slope 1.66
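Reading the slope 1.66 as the exponent of a power law E(t) ∝ N(t)^1.66 makes "over-doubled" concrete: doubling the nodes multiplies the edges by about 2^1.66 ≈ 3.16. The sketch below works through that arithmetic and shows how an exponent could be recovered from two snapshots. The function names are hypothetical and the snapshot numbers are made up for illustration.

```python
import math

def edge_growth_factor(node_growth, exponent=1.66):
    """Factor by which edges grow when nodes grow by `node_growth`, if E ~ N**exponent."""
    return node_growth ** exponent

def densification_exponent(n1, e1, n2, e2):
    """Exponent a such that e2 / e1 == (n2 / n1) ** a, from two (nodes, edges) snapshots."""
    return math.log(e2 / e1) / math.log(n2 / n1)

doubling = edge_growth_factor(2.0)  # about 3.16, i.e. more than doubled

# Two illustrative snapshots constructed to follow the 1.66 exponent.
recovered = densification_exponent(1000, 5000, 2000, 5000 * 2 ** 1.66)
```

With one datapoint per year, as on the slide, the exponent would instead be the slope of a line fitted to all (log N, log E) points.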
Roadmap • Introduction – Motivation • Problem#1: Patterns in graphs – Static graphs – Weighted graphs – Time evolving graphs • Problem#2: Tools • …
T.3: popularity over time • Plot: # in links vs. lag (days after post); post at @t, in-link at @t + lag • Post popularity drops off: exponentially?
T.3: popularity over time • Plot: # in links (log) vs. days after post (log), slope -1.6 • Post popularity drops off: exponentially? POWER LAW! • Exponent? -1.6 – close to -1.5: Barabasi’s stack model – and like the zero-crossings of a random walk
-1.5 slope • J. G. Oliveira & A.-L. Barabási. Human Dynamics: The Correspondence Patterns of Darwin and Einstein. Nature 437, 1251 (2005). • Plot: Prob(RT > x) (log) vs. response time (log)
Roadmap • Introduction – Motivation • Problem#1: Patterns in graphs • Problem#2: Tools – Belief Propagation – Tensors – Spike analysis • Problem#3: Scalability • Conclusions
E-bay Fraud detection • w/ Polo Chau & Shashank Pandit, CMU [www’07]
E-bay Fraud detection - NetProbe • Compatibility matrix over states F, A, H (heterophily); entries shown: 99%, 99%, 49%, 49%
Roadmap • Introduction – Motivation • Problem#1: Patterns in graphs • Problem#2: Tools – Belief Propagation – Tensors – Spike analysis • Problem#3: Scalability • Conclusions
GigaTensor: Scaling Tensor Analysis Up By 100 Times – Algorithms and Discoveries • U Kang, Evangelos Papalexakis, Abhay Harpale, Christos Faloutsos • KDD’12
Background: Tensor • Tensors (= multi-dimensional arrays) are everywhere – Hyperlinks & anchor text [Kolda+, 05] • Figure: URL 1 x URL 2 x Anchor Text tensor; anchor terms C#, C++, Java
Background: Tensor • Tensors (= multi-dimensional arrays) are everywhere – Sensor stream (time, location, type) – Predicates (subject, verb, object) in knowledge base: “Eric Clapton plays guitar”, “Barack Obama is the president of U.S.” • NELL (Never Ending Language Learner) data: dimensions 26M, 26M, 48M; nonzeros = 144M
Background: Tensor • Tensors (= multi-dimensional arrays) are everywhere – Sensor stream (time, location, type) – Predicates (subject, verb, object) in knowledge base – Anomaly detection in computer networks: (time-stamp, IP-source, IP-destination)
all I learned on tensors: from • Nikos Sidiropoulos, UMN • Tamara Kolda, Sandia Labs (tensor toolbox)
Problem Definition • How to decompose a billion-scale tensor? – Corresponds to SVD in 2D case
Problem Definition • Q1: Dominant concepts/topics? • Q2: Find synonyms to a given noun phrase? • (and how to scale up: |data| > RAM) • NELL (Never Ending Language Learner) data: dimensions 26M, 26M, 48M; nonzeros = 144M
Experiments • GigaTensor solves 100x larger problem • Plot: tensor dimensions (I), (J), (K); number of nonzeros = I / 50 • Tensor Toolbox: out of memory; GigaTensor: 100x larger
A1: Concept Discovery • Concept Discovery in Knowledge Base
A2: Synonym Discovery
Roadmap • Introduction – Motivation • Problem#1: Patterns in graphs • Problem#2: Tools – Belief propagation – Tensors – Spike analysis • Problem#3: Scalability - PEGASUS • Conclusions
Rise and fall patterns in social media • Meme (# of mentions in blogs) – short phrases sourced from U.S. politics in 2008 – “you can put lipstick on a pig” – “yes we can”
Rise and fall patterns in social media • Can we find a unifying model, which includes these patterns? • four classes on YouTube [Crane et al. ’08] • six classes on Meme [Yang et al. ’11]
Rise and fall patterns in social media • Answer: YES! • We can represent all patterns by a single model • In Matsubara+ SIGKDD 2012
Main idea - SpikeM • 1. Un-informed bloggers (uninformed about rumor) • 2. External shock at time nb (e.g., breaking news) • 3. Infection (word-of-mouth) • Timeline: n=0, n=nb, n=nb+1 • Infectiveness of a blog-post at age n: – β: strength of infection (quality of news) – decay function
Details: SpikeM - with periodicity • Full equation of SpikeM • Periodicity: bloggers change their activity over time (e.g., daily, weekly, yearly) • Plot: activity vs. time n; peak at noon, dip at 3am
Details • Analysis: exponential rise and power-law fall • Rise-part (lin-log and log-log plots): SI -> exponential; SpikeM -> exponential
Details • Analysis: exponential rise and power-law fall • Fall-part (lin-log and log-log plots): SI -> exponential; SpikeM -> power law
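The SpikeM equation itself did not survive in this transcript, so the sketch below is only an assumed, simplified rise-and-fall recurrence in the spirit of the three ingredients listed earlier: uninformed bloggers, an external shock at time nb, and word-of-mouth infection whose strength decays with the age of a post. The power-law decay exponent of -1.5, the normalization by the population size, the omission of periodicity, and every name and default value are assumptions for illustration, not taken from the slides.

```python
def spike_sketch(n_ticks=100, population=1000.0, beta=1.0,
                 n_b=10, shock=5.0, decay=-1.5):
    """Simulate newly informed bloggers per tick under an assumed simplified model.

    Assumed recurrence (not the published SpikeM equation):
      new(n+1) = uninformed(n) / population
                 * sum over t in [n_b, n] of (new(t) + shock(t)) * beta * (n+1-t)**decay
    """
    uninformed = population
    new = [0.0] * n_ticks
    for n in range(n_ticks - 1):
        influence = 0.0
        for t in range(n_b, n + 1):
            external = shock if t == n_b else 0.0  # one-off external shock
            influence += (new[t] + external) * beta * (n + 1 - t) ** decay
        # Cannot inform more bloggers than remain uninformed.
        new[n + 1] = min(uninformed, uninformed / population * influence)
        uninformed -= new[n + 1]
    return new

series = spike_sketch()
peak_tick = series.index(max(series))
```

Nothing happens before the shock, activity then rises while many bloggers are still uninformed, and it falls off once the uninformed pool is depleted and old posts lose their infectiveness.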
Tail-part forecasts • SpikeM can capture tail part
“What-if” forecasting • e.g., given (1) first spike, (2) release date of two sequel movies, (3) access volume before the release date (two weeks before release) • SpikeM can forecast upcoming spikes
Roadmap • Introduction – Motivation • Problem#1: Patterns in graphs • Problem#2: Tools • Problem#3: Scalability – PEGASUS – Diameter – Connected components • Conclusions
Scalability • Google: > 450,000 processors in clusters of ~2000 processors each [Barroso, Dean, Hölzle, “Web Search for a Planet: The Google Cluster Architecture”, IEEE Micro 2003] • Yahoo: 5Pb of data [Fayyad, KDD’07] • Problem: machine failures, on a daily basis • How to parallelize data mining tasks, then? • A: map/reduce – hadoop (open-source clone) http://hadoop.apache.org/
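As a small illustration of the map/reduce style, the sketch below computes a degree distribution in two rounds: edges -> (node, degree), then (node, degree) -> (degree, count). It simulates the map, shuffle and reduce phases in memory with plain Python; it does not use the Hadoop API, and all function names are hypothetical.

```python
from collections import defaultdict

def run_mapreduce(records, mapper, reducer):
    """Simulate one map/reduce round in memory: map, group by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return [out for key in groups for out in reducer(key, groups[key])]

def edge_to_endpoints(edge):
    """Map an undirected edge to one (node, 1) pair per endpoint."""
    u, v = edge
    yield u, 1
    yield v, 1

def degree_to_count(node_degree):
    """Map a (node, degree) pair to (degree, 1)."""
    _node, degree = node_degree
    yield degree, 1

def sum_values(key, values):
    """Reduce by summing all values that share a key."""
    yield key, sum(values)

def degree_distribution(edges):
    node_degrees = run_mapreduce(edges, edge_to_endpoints, sum_values)
    return dict(run_mapreduce(node_degrees, degree_to_count, sum_values))

dist = degree_distribution([("a", "b"), ("a", "c"), ("b", "c"), ("d", "a")])
```

Because each mapper call sees a single record and each reducer call a single key, the work can be spread over many machines and a failed task can simply be rerun.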
Roadmap – Algorithms & results • Table: rows Degree Distr., Pagerank, Diameter/ANF, Conn. Comp, Triangles, Visualization; columns Centralized, Hadoop/PEGASUS • Cell entries as extracted: old, old, old, HERE, old, done, started
HADI for diameter estimation • Radius Plots for Mining Tera-byte Scale Graphs. U Kang, Charalampos Tsourakakis, Ana Paula Appel, Christos Faloutsos, Jure Leskovec, SDM’10 • Naively: diameter needs O(N**2) space and up to O(N**3) time – prohibitive (N~1B) • Our HADI: linear on E (~10B) – Near-linear scalability wrt # machines – Several optimizations -> 5x faster
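For contrast with HADI, the sketch below is the naive baseline: a breadth-first search from every node, which is fine for a toy graph and hopeless at N~1B. It is not HADI. The definition used for the effective diameter, the smallest hop count within which 90% of connected node pairs are reachable, is a commonly used one and is an assumption here, as are the function names.

```python
from collections import deque

def hop_histogram(adj):
    """Count ordered connected node pairs by shortest-path hop distance."""
    hist = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # plain BFS from `source`
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        for node, d in dist.items():
            if node != source:
                hist[d] = hist.get(d, 0) + 1
    return hist

def effective_diameter(adj, fraction=0.9):
    """Smallest h such that at least `fraction` of connected pairs are within h hops."""
    hist = hop_histogram(adj)
    total = sum(hist.values())
    reached = 0
    for h in sorted(hist):
        reached += hist[h]
        if reached >= fraction * total:
            return h
    return 0  # no connected pairs at all

# Path graph on 6 nodes: 0 - 1 - 2 - 3 - 4 - 5
path = {i: [j for j in (i - 1, i + 1) if 0 <= j < 6] for i in range(6)}
eff = effective_diameter(path)
full = max(hop_histogram(path))
```

On this path the full diameter is 5 while the 90% effective diameter is 4, which shows how the effective diameter discounts a few extreme pairs.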
Radius plot (count vs. radius) • ?? • 19+ [Barabasi+]: ~1999, ~1M nodes
Radius plot: YahooWeb graph (120Gb, 1.4B nodes, 6.6B edges) • Largest publicly available graph ever studied • 14 (dir.), ~7 (undir.), vs. 19+? [Barabasi+] • 7 degrees of separation (!) • Diameter: shrunk
Radius plot: YahooWeb graph (120Gb, 1.4B nodes, 6.6B edges) • ~7 (undir.) • Q: Shape?
YahooWeb graph (120Gb, 1.4B nodes, 6.6B edges) • effective diameter: surprisingly small • Multi-modality (?!)
Radius Plot of GCC of YahooWeb
YahooWeb graph (120Gb, 1.4B nodes, 6.6B edges) • effective diameter: surprisingly small • Multi-modality: probably mixture of cores • Conjecture: cores labeled DE, EN, BR; ~7
Roadmap • Introduction – Motivation • Problem#1: Patterns in graphs • Problem#2: Tools • Problem#3: Scalability – PEGASUS – Diameter – Connected components • Conclusions
Generalized Iterated Matrix Vector Multiplication (GIMV) • PEGASUS: A Peta-Scale Graph Mining System - Implementation and Observations. U Kang, Charalampos E. Tsourakakis, and Christos Faloutsos. ICDM 2009, Miami, Florida, USA. Best Application Paper (runner-up).
Details: Generalized Iterated Matrix Vector Multiplication (GIMV) • Matrix-vector multiplication (iterated) covers: • PageRank • proximity (RWR) • Diameter • Connected components • (eigenvectors, Belief Prop., …)
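The idea of a generalized, iterated matrix-vector product can be sketched by replacing the multiply, sum and write-back of an ordinary product with three pluggable operations. The names `combine2`, `combine_all` and `assign` follow my recollection of the PEGASUS paper's terminology and should be checked against it. The in-memory loop below is an illustrative assumption, not the Hadoop implementation. Connected components then amount to repeatedly taking the minimum label over each node's neighbors.

```python
def gimv_step(matrix, v, combine2, combine_all, assign):
    """One generalized matrix-vector product over (row, col, value) triples."""
    partials = [[] for _ in v]
    for i, j, m_ij in matrix:
        partials[i].append(combine2(m_ij, v[j]))  # replaces m_ij * v_j
    # combine_all replaces the row sum; assign decides how to update v_i.
    return [assign(v[i], combine_all(partials[i])) for i in range(len(v))]

def connected_components(n, edges):
    """Label every node with the smallest node id in its component."""
    # Symmetric adjacency matrix as sparse triples.
    matrix = [(i, j, 1) for u, w in edges for i, j in ((u, w), (w, u))]
    labels = list(range(n))
    while True:
        new_labels = gimv_step(
            matrix, labels,
            combine2=lambda m_ij, v_j: v_j,
            combine_all=lambda xs: min(xs) if xs else None,
            assign=lambda old, new: old if new is None else min(old, new),
        )
        if new_labels == labels:  # converged: no label changed
            return labels
        labels = new_labels

labels = connected_components(5, [(0, 1), (1, 2), (3, 4)])
```

Swapping in a weighted sum plus a damping term for the three operations would give a PageRank-style iteration with the same loop, which is the appeal of the formulation.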
Example: GIM-V At Work • Connected Components – 4 observations (plot: count vs. size): • 1) 10K x larger than next • 2) ~0.7B singleton nodes • 3) SLOPE! • 4) Spikes! – 300-size cmpt X 500. Why? – 1100-size cmpt X 65. Why? – suspicious financial-advice sites (not existing now)
GIM-V At Work • Connected Components over Time • LinkedIn: 7.5M nodes and 58M edges • Stable tail slope after the gelling point
Roadmap • Introduction – Motivation • Problem#1: Patterns in graphs • Problem#2: Tools • Problem#3: Scalability • Conclusions
OVERALL CONCLUSIONS – low level: • Several new patterns (fortification, triangle-laws, conn. components, etc) • New tools: – belief propagation, gigaTensor, etc • Scalability: PEGASUS / hadoop
OVERALL CONCLUSIONS – high level • BIG DATA: Large datasets reveal patterns/outliers that are invisible otherwise
Graph Analytics • related fields: Theory & Algo., Comp. Systems, ML & Stats., Biology, Physics, Social Science, Econ.
References
• Leman Akoglu, Christos Faloutsos: RTG: A Recursive Realistic Graph Generator Using Random Typing. ECML/PKDD (1) 2009: 13-28
• Deepayan Chakrabarti, Christos Faloutsos: Graph mining: Laws, generators, and algorithms. ACM Comput. Surv. 38(1) (2006)
• Deepayan Chakrabarti, Yang Wang, Chenxi Wang, Jure Leskovec, Christos Faloutsos: Epidemic thresholds in real networks. ACM Trans. Inf. Syst. Secur. 10(4) (2008)
• Deepayan Chakrabarti, Jure Leskovec, Christos Faloutsos, Samuel Madden, Carlos Guestrin, Michalis Faloutsos: Information Survival Threshold in Sensor and P2P Networks. INFOCOM 2007: 1316-1324
• Christos Faloutsos, Tamara G. Kolda, Jimeng Sun: Mining large graphs and streams using matrix and tensor tools. Tutorial, SIGMOD Conference 2007: 1174
• T. G. Kolda and J. Sun: Scalable Tensor Decompositions for Multi-aspect Data Mining. ICDM 2008: 363-372
• Jure Leskovec, Jon Kleinberg, Christos Faloutsos: Graphs over Time: Densification Laws, Shrinking Diameters and Possible Explanations. KDD 2005 (Best Research paper award)
• Jure Leskovec, Deepayan Chakrabarti, Jon M. Kleinberg, Christos Faloutsos: Realistic, Mathematically Tractable Graph Generation and Evolution, Using Kronecker Multiplication. PKDD 2005: 133-145
• Yasuko Matsubara, Yasushi Sakurai, B. Aditya Prakash, Lei Li, Christos Faloutsos: Rise and Fall Patterns of Information Diffusion: Model and Implications. KDD’12: 6-14, Beijing, China, August 2012
• Jimeng Sun, Yinglian Xie, Hui Zhang, Christos Faloutsos: Less is More: Compact Matrix Decomposition for Large Sparse Graphs. SDM, Minneapolis, Minnesota, Apr 2007
• Jimeng Sun, Spiros Papadimitriou, Philip S. Yu, Christos Faloutsos: GraphScope: Parameter-free Mining of Large Time-evolving Graphs. ACM SIGKDD Conference, San Jose, CA, August 2007
• Jimeng Sun, Dacheng Tao, Christos Faloutsos: Beyond streams and graphs: dynamic tensor analysis. KDD 2006: 374-383
• Hanghang Tong, Christos Faloutsos, Jia-Yu Pan: Fast Random Walk with Restart and Its Applications. ICDM 2006, Hong Kong
• Hanghang Tong, Christos Faloutsos: Center-Piece Subgraphs: Problem Definition and Fast Solutions. KDD 2006, Philadelphia, PA
• Hanghang Tong, Christos Faloutsos, Brian Gallagher, Tina Eliassi-Rad: Fast best-effort pattern matching in large attributed graphs. KDD 2007: 737-746
Project info & ‘thanks’ • www.cs.cmu.edu/~pegasus • Thanks to: NSF IIS-0705359, IIS-0534205, CTA-INARC; Yahoo (M45), LLNL, IBM, SPRINT, Google, INTEL, HP, iLab
Cast • Akoglu, Leman • Beutel, Alex • Chau, Polo • Kang, U • Koutra, Danae • McGlohon, Mary • Papalexakis, Vagelis • Prakash, Aditya • Tong, Hanghang
Thanks to LLNL colleagues • Brian Gallagher • Keith Henderson
Take-home message • Tera/Peta-byte data -> Analytics -> Insights, outliers • Big data reveal insights that would be invisible otherwise (even to experts)