CMU SCS Part 1: Graph Mining – patterns Christos Faloutsos CMU
Our goal: an open-source system for mining huge graphs: the PEGASUS project (PEta GrAph mining System)
• www.cs.cmu.edu/~pegasus
• code and papers
Tepper, CMU, April 4 (c) C. Faloutsos, 2017
References
• D. Chakrabarti, C. Faloutsos: Graph Mining – Laws, Tools and Case Studies, Morgan & Claypool, 2012
• http://www.morganclaypool.com/doi/abs/10.2200/S00449ED1V01Y201209DMK006
Outline
• Introduction
  – Motivation
• Part #1: Patterns in graphs
• Part #2: Tools (Ranking, proximity)
• Conclusions
Graphs - why should we care?
[Figure panels: Internet Map [lumeta.com]; Friendship Network [Moody '01]; Food Web [Martinez '91]; Protein Interactions [genomebiology.com]]
Graphs - why should we care?
• IR: bi-partite graphs (doc-terms)
  [Figure: documents D1 ... DN linked to terms T1 ... TM]
• web: hyper-text graph
• ... and more:
Graphs - why should we care?
• network of companies & board-of-directors members
• 'viral' marketing
• web-log ('blog') news propagation
• computer network security: email/IP traffic and anomaly detection
• ...
Outline
• Introduction
  – Motivation
• Patterns in graphs
  – Patterns in Static graphs
  – Patterns in Weighted graphs
  – Patterns in Time evolving graphs
Network and graph mining
• What does the Internet look like?
• What does Facebook look like?
• What is 'normal'/'abnormal'?
• Which patterns/laws hold?
  – To spot anomalies (rarities), we have to discover patterns
  – Large datasets reveal patterns/anomalies that may be invisible otherwise...
Topology
What does the Internet look like? Any rules?
(Looks random, right?)
Laws and patterns
• Are real graphs random?
• A: NO!!
  – Diameter
  – in- and out-degree distributions
  – other (surprising) patterns
• So, let's look at the data
Laws – degree distributions
• Q: avg degree is ~2 - what is the most probable degree?
[Figure: two candidate count-vs-degree plots, each with degree 2 marked on the x-axis]
Solution# S.1: Power law in the outdegree
[Figure: frequency vs outdegree, log-log scale, Nov'97]
• The plot is linear in log-log scale [FFF'99]: $\text{freq} \propto \text{degree}^{-2.15}$
• Outdegree exponent O = slope = -2.15
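A minimal sketch of how such an exponent might be estimated. It builds the degree histogram and fits a least-squares line to log(frequency) vs log(degree). This method is an assumption: plain regression on the log-log histogram is simple but noisy in the sparse tail. The function names are hypothetical.

```python
import math
from collections import Counter


def loglog_slope(xs, ys):
    """Least-squares slope of log10(y) against log10(x); all values must be > 0."""
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    sxx = sum((a - mx) ** 2 for a in lx)
    return sxy / sxx


def degree_exponent(degrees):
    """Slope of the log-log degree histogram (frequency vs degree)."""
    hist = Counter(d for d in degrees if d > 0)  # degree 0 has no logarithm
    ds = sorted(hist)
    return loglog_slope(ds, [hist[d] for d in ds])


# Hypothetical degree sequence whose histogram roughly follows freq ~ degree^-2
degrees = [d for d in range(1, 11) for _ in range(round(10000 * d ** -2))]
print(round(degree_exponent(degrees), 2))
```

Because a power-law histogram decreases monotonically, its most probable degree is the smallest one, not the average.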
Solution# S.1'
• Power law in the degree distribution [SIGCOMM 99]: $\text{degree} \propto \text{rank}^{-0.82}$
[Figure: log(degree) vs log(rank) for internet domains, with att.com and ibm.com marked; slope -0.82]
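The rank plot can be sketched in the same spirit: sort nodes by decreasing degree, then regress log(degree) on log(rank). This is a minimal sketch that assumes a degree list is already available. The name `rank_exponent` is hypothetical.

```python
import math


def rank_exponent(degrees):
    """Slope of log(degree) vs log(rank), where rank 1 is the highest-degree node."""
    ds = sorted((d for d in degrees if d > 0), reverse=True)
    lx = [math.log10(rank) for rank in range(1, len(ds) + 1)]
    ly = [math.log10(d) for d in ds]
    n = len(ds)
    mx, my = sum(lx) / n, sum(ly) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    sxx = sum((a - mx) ** 2 for a in lx)  # needs at least two nodes
    return sxy / sxx


# Hypothetical degrees that follow degree = 1000 * rank^-0.82 exactly
degrees = [1000 * r ** -0.82 for r in range(1, 201)]
print(round(rank_exponent(degrees), 2))
```

Unlike the histogram, the rank plot uses every node individually, so it needs no binning.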
Solution# S.2: Eigen Exponent E
• A2: power law in the eigenvalues of the adjacency matrix
• [Mihail, Papadimitriou '02]: slope is ½ of rank exponent
[Figure: eigenvalue vs rank of decreasing eigenvalue, log-log, May 2001; Eigenvalue Exponent E = slope = -0.48]
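The spectrum behind this plot comes from a standard symmetric eigenvalue problem. Below is a dependency-free sketch using the cyclic Jacobi method, adequate only for small matrices. A real analysis of a large graph would instead use a sparse solver for the top few eigenvalues. The Eigen Exponent would then be the slope of log(eigenvalue) vs log(rank) over the leading positive eigenvalues. Function names are hypothetical.

```python
import math


def symmetric_eigenvalues(a, tol=1e-12, max_sweeps=100):
    """Eigenvalues of a real symmetric matrix by the cyclic Jacobi method."""
    n = len(a)
    a = [row[:] for row in a]  # work on a copy
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return [a[i][i] for i in range(n)]


def ranked_eigenvalues(n, edges):
    """Adjacency-matrix eigenvalues of an undirected graph, largest first (rank 1, 2, ...)."""
    adj = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        adj[u][v] = adj[v][u] = 1.0
    return sorted(symmetric_eigenvalues(adj), reverse=True)


# Star with hub 0 and four leaves: eigenvalues are 2, 0, 0, 0, -2 up to rounding error
eigs = ranked_eigenvalues(5, [(0, 1), (0, 2), (0, 3), (0, 4)])
```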
But: what about graphs from other domains?
More power laws:
• web hit counts [w/ A. Montgomery]
[Figure: Web Site Traffic; count vs in-degree, both on log scale, following Zipf; "ebay" marked; users vs sites]
epinions.com
• who-trusts-whom [Richardson + Domingos, KDD 2001]
[Figure: count vs user (out)degree; one user trusts 2000 people]
And numerous more
• # of sexual contacts
• Income [Pareto]: '80-20 distribution'
• Duration of downloads [Bestavros+]
• Duration of UNIX jobs ('mice and elephants')
• Size of files of a user
• ...
• 'Black swans'
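The '80-20' label can be checked with a short calculation. Assume a continuous Pareto distribution with tail index α > 1. The top fraction p of the population then holds a share $p^{1 - 1/\alpha}$ of the total. So 80-20 corresponds to α = log 5 / log 4 ≈ 1.16. This is a minimal sketch with hypothetical names.

```python
import math


def top_share(p, alpha):
    """Share of the total held by the top fraction p of a Pareto(alpha) population (alpha > 1)."""
    return p ** (1 - 1 / alpha)


alpha_80_20 = math.log(5) / math.log(4)  # tail index that yields the 80-20 rule
print(round(alpha_80_20, 3), round(top_share(0.2, alpha_80_20), 3))  # prints: 1.161 0.8
```

Smaller α means heavier skew: as α approaches 1, the top 20% hold an ever larger share.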
Outline
• Introduction
  – Motivation
• Patterns in graphs
  – Patterns in Static graphs
    • Degree
    • Triangles
    • ...
  – Patterns in Weighted graphs
  – Patterns in Time evolving graphs
• Generators
Solution# S.3: Triangle 'Laws'
• Real social networks have a lot of triangles
  – Friends of friends are friends
• Any patterns?
Triangle Law: #S.3 [Tsourakakis ICDM 2008]
[Figure panels: HEP-TH, ASN, Epinions. X-axis: # of triangles a node participates in; Y-axis: count of such nodes]
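The quantity on these plots can be computed exactly for small graphs. For each node, count the pairs of its neighbours that are themselves linked. Then histogram those counts. This is a minimal, quadratic-in-degree sketch with hypothetical names. Large-scale studies typically rely on faster, approximate triangle counting instead.

```python
from collections import Counter
from itertools import combinations


def triangles_per_node(edges):
    """Number of triangles each node participates in (undirected simple graph)."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # self-loops cannot form triangles
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # A triangle through u is a pair of u's neighbours that are themselves linked
    return {u: sum(1 for v, w in combinations(nbrs, 2) if w in adj[v])
            for u, nbrs in adj.items()}


def triangle_histogram(edges):
    """Map: number of triangles -> count of nodes with that many (the S.3 plot data)."""
    return Counter(triangles_per_node(edges).values())


# Hypothetical toy graph: triangle 0-1-2 plus a pendant edge 2-3
print(triangle_histogram([(0, 1), (1, 2), (2, 0), (2, 3)]))
```

Nodes with zero triangles cannot appear on a log-log plot, so they are usually dropped before plotting.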
Triangle Law: #S.4 [Tsourakakis ICDM 2008]
[Figure panels: Reuters, SN, Epinions. X-axis: degree; Y-axis: mean # of triangles]
• $n$ friends $\rightarrow \sim n^{1.6}$ triangles
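To look for the S.4 pattern in other data, group nodes by degree and average their triangle counts. The exponent (about 1.6 on the slide's datasets) would then come from a log-log fit of these means against degree. This is a minimal sketch with hypothetical names. Disjoint cliques make a handy sanity check, because a clique node of degree d has exactly d(d-1)/2 triangles, the maximum possible for that degree.

```python
from collections import defaultdict
from itertools import combinations


def mean_triangles_by_degree(edges):
    """Map: degree -> mean number of triangles over nodes of that degree."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    total, count = defaultdict(int), defaultdict(int)
    for u, nbrs in adj.items():
        tri = sum(1 for v, w in combinations(nbrs, 2) if w in adj[v])
        total[len(nbrs)] += tri
        count[len(nbrs)] += 1
    return {d: total[d] / count[d] for d in sorted(count)}


def clique(nodes):
    """Edges of a complete graph on the given nodes (hypothetical test helper)."""
    return list(combinations(nodes, 2))


# Disjoint cliques K3, K4, K5
edges = clique(range(0, 3)) + clique(range(10, 14)) + clique(range(20, 25))
print(mean_triangles_by_degree(edges))  # prints {2: 1.0, 3: 3.0, 4: 6.0}
```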
Outline
• Introduction
  – Motivation
• Patterns in graphs
  – Patterns in Static graphs
  – Patterns in Weighted graphs
  – Patterns in Time evolving graphs
• Generators
Observations on weighted graphs?
• A: yes - even more 'laws'!
M. McGlohon, L. Akoglu, and C. Faloutsos: Weighted Graphs and Disconnected Components: Patterns and a Generator, SIG-KDD 2008
Observation W.1: Fortification
Q: How do the weights of nodes relate to degree?
Observation W.1: Fortification
More donors, more $?
[Figure: toy donor network with candidates 'Reagan' and 'Clinton' and donations of $10, $5, $7]
Observation W.1: Fortification: Snapshot Power Law
• Weight: super-linear on in-degree
• exponent 'iw': 1.01 < iw < 1.26
• More donors, even more $: e.g. John Kerry, $10M received, from 1K donors
[Figure: Orgs-Candidates; in-weights ($) vs edges (# donors), log-log]
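Assuming the law takes the form in-weight = c · in-degree^iw, the exponent iw can be fitted by least squares on log(weight) vs log(in-degree). Any value above 1 means "fortification": extra donors bring more than proportional extra money. This is a minimal sketch, and the function name is hypothetical.

```python
import math


def fortification_exponent(in_degrees, in_weights):
    """Fit in_weight = c * in_degree**iw by least squares in log-log space; return iw."""
    pts = [(math.log(d), math.log(w))
           for d, w in zip(in_degrees, in_weights) if d > 0 and w > 0]
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    return sxy / sxx


# With iw between 1.01 and 1.26, ten times the donors brings in
# roughly 10.2 to 18.2 times the dollars.
for iw in (1.01, 1.26):
    print(iw, round(10 ** iw, 1))
```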
Outline
• Introduction
  – Motivation
• Patterns in graphs
  – Patterns in Static graphs
  – Patterns in Weighted graphs
  – Patterns in Time evolving graphs
• Generators
Problem: Time evolution
• with Jure Leskovec (CMU -> Stanford)
• and Jon Kleinberg (Cornell – sabb. @ CMU)
T.1 Evolution of the Diameter
• Prior work on power-law graphs hints at slowly growing diameter:
  – diameter ~ O(log N)
• What is happening in real data?
• Diameter shrinks over time
T.1 Diameter – "Patents"
• Patent citation network
• 25 years of data
• @1999
  – 2.9M nodes
  – 16.5M edges
[Figure: diameter vs time [years]]
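A minimal sketch for tracking diameter across snapshots computes the exact diameter by running BFS from every node. That costs O(N·E), so it suits toy graphs only. Large-scale studies often report a percentile-based "effective diameter" instead, which this sketch does not compute. The snapshot data and names are hypothetical.

```python
from collections import deque


def diameter(edges):
    """Exact diameter (longest shortest path) of an undirected graph, via BFS from every node.

    Pairs in different components are ignored.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    best = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        best = max(best, max(dist.values()))
    return best


# Hypothetical growing graph: a path 0-1-2-3-4, then a shortcut edge 0-4 closes a cycle
snapshots = [[(0, 1), (1, 2), (2, 3), (3, 4)]]
snapshots.append(snapshots[-1] + [(0, 4)])
print([diameter(s) for s in snapshots])  # [4, 2]
```

The toy only shows that adding edges can shrink a diameter. It is not evidence for the empirical pattern.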
T.2 Temporal Evolution of the Graphs
• N(t) ... nodes at time t
• E(t) ... edges at time t
• Suppose that N(t+1) = 2 * N(t)
• Q: what is your guess for E(t+1) =? 2 * E(t)
• A: over-doubled!
  – But obeying the "Densification Power Law"
T.2 Densification – Patent Citations
• Citations among patents granted
• @1999
  – 2.9M nodes
  – 16.5M edges
• Each year is a datapoint
[Figure: E(t) vs N(t), log-log; slope 1.66, i.e. $E(t) \propto N(t)^{1.66}$]
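The densification exponent a in $E(t) \propto N(t)^a$ can be fitted from one (N, E) pair per time step. Average degree 2E/N grows like $N^{a-1}$, so a = 1 would keep it constant. The slide's a = 1.66 means the graph gets denser as it grows. This is a minimal sketch with a hypothetical function name.

```python
import math


def densification_exponent(snapshots):
    """Fit E(t) = c * N(t)**a from (N, E) pairs, one per time step; return a."""
    xs = [math.log(n) for n, _ in snapshots]
    ys = [math.log(e) for _, e in snapshots]
    k = len(xs)
    mx, my = sum(xs) / k, sum(ys) / k
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# With a = 1.66, doubling the nodes multiplies the edges by 2**1.66
print(round(2 ** 1.66, 2))  # 3.16
```

This answers the "over-doubled" question: when the node count doubles, the edge count more than triples.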
Outline
• Introduction
  – Motivation
• Patterns in graphs
  – Patterns in Static graphs
  – Patterns in Weighted graphs
  – Patterns in Time evolving graphs
• Generators
More on Time-evolving graphs
M. McGlohon, L. Akoglu, and C. Faloutsos: Weighted Graphs and Disconnected Components: Patterns and a Generator, SIG-KDD 2008
Observation T.3: NLCC behavior
Q: How do NLCCs emerge and join with the GCC? ("NLCC" = non-largest connected components)
  – Do they continue to grow in size?
  – or do they shrink?
  – or stabilize?
Observation T.3: NLCC behavior
• After the gelling point, the GCC takes off, but NLCCs remain ~constant (actually, oscillate).
[Figure: IMDB; CC size vs time-stamp]
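Curves like these can be traced by adding edges in time order and tracking component sizes with a union-find structure. This minimal sketch rests on two assumptions. First, edges arrive in batches, one batch per time-stamp. Second, a node exists only once it has an edge, so isolated nodes are not counted. All names are hypothetical.

```python
def component_sizes_over_time(edge_batches):
    """After each batch of edges, record (GCC size, largest NLCC size)."""
    parent, size = {}, {}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        for x in (a, b):
            if x not in parent:  # first edge of a new node
                parent[x], size[x] = x, 1
        ra, rb = find(a), find(b)
        if ra == rb:
            return
        if size[ra] < size[rb]:  # union by size
            ra, rb = rb, ra
        parent[rb] = ra
        size[ra] += size[rb]

    history = []
    for batch in edge_batches:
        for u, v in batch:
            union(u, v)
        sizes = sorted((size[r] for r in parent if parent[r] == r), reverse=True)
        gcc = sizes[0] if sizes else 0
        nlcc = sizes[1] if len(sizes) > 1 else 0
        history.append((gcc, nlcc))
    return history


# Hypothetical batches: a small component forms, then gets absorbed by the GCC
batches = [[(0, 1), (2, 3)], [(1, 2)], [(4, 5)], [(5, 0)]]
print(component_sizes_over_time(batches))  # [(2, 2), (4, 0), (4, 2), (6, 0)]
```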
Generalized Iterated Matrix Vector Multiplication (GIM-V)
PEGASUS: A Peta-Scale Graph Mining System - Implementation and Observations. U Kang, Charalampos E. Tsourakakis, and Christos Faloutsos. ICDM 2009, Miami, Florida, USA. Best Application Paper (runner-up).
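GIM-V generalizes the matrix-vector product into $v'_i = \text{assign}(v_i, \text{combineAll}_i(\{\text{combine2}(m_{i,j}, v_j)\}))$. Connected components fit this mold with four choices:
• start each node with its own id as label;
• let combine2 pass along the neighbour's label;
• let combineAll take the minimum;
• let assign keep the smaller of the old and new labels.
Iterate until no label changes. Below is a single-machine sketch of that loop under these choices. PEGASUS itself runs on Hadoop with further optimizations, which this sketch does not attempt. Function names and signatures are hypothetical.

```python
def gim_v(n, entries, v, combine2, combine_all, assign, max_iter=1000):
    """Iterate v'_i = assign(v_i, combine_all([combine2(m_ij, v_j) for nonzero m_ij]))."""
    for _ in range(max_iter):
        partial = [[] for _ in range(n)]
        for i, j, m_ij in entries:  # sparse matrix as (row, col, value) triples
            partial[i].append(combine2(m_ij, v[j]))
        new_v = [assign(v[i], combine_all(partial[i])) for i in range(n)]
        if new_v == v:  # fixed point reached
            return new_v
        v = new_v
    return v


def connected_components(n, edges):
    """Label every node with the smallest node id in its component."""
    entries = [(u, w, 1) for u, w in edges] + [(w, u, 1) for u, w in edges]  # symmetric
    return gim_v(
        n, entries, list(range(n)),
        combine2=lambda m, vj: m * vj,
        combine_all=lambda xs: min(xs, default=float("inf")),
        assign=min,
    )


print(connected_components(6, [(0, 1), (1, 2), (3, 4)]))  # [0, 0, 0, 3, 3, 5]
```

Other choices of combine2, combineAll and assign turn the same loop into other computations. For example, multiply, sum and a damped update give a PageRank-style iteration.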
Example: GIM-V At Work
• Connected Components
[Figure: count vs size of connected components, log-log]
  – ~0.7B singleton nodes
  – 300-size cmpt X 500. Why?
  – 1100-size cmpt X 65. Why?
  – suspicious financial-advice sites (not existing now)
GIM-V At Work
• Connected Components over Time
• LinkedIn: 7.5M nodes and 58M edges
[Figure: component size distributions over time; stable tail slope after the gelling point]
Timing for Blogs
• with Mary McGlohon (CMU)
• Jure Leskovec (CMU->Stanford)
• Natalie Glance (now at Google)
• Mat Hurst (now at MSR)
[SDM'07]
T.4: popularity over time
[Figure: # in-links vs lag (days after a post made @t, in-links at @t + lag), log-log; slope -1.6]
• Post popularity drops off – exponentially?
• POWER LAW! $\text{in-links}(\text{lag}) \propto \text{lag}^{-1.6}$
• Exponent? -1.6
  – close to -1.5: Barabasi's stack model
  – and like the zero-crossings of a random walk
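The exponential-vs-power-law question can be checked by seeing which axes straighten the curve. Exponential decay is a line on semi-log axes, and a power law is a line on log-log axes. This minimal sketch compares the two least-squares fits by R². That is a rough diagnostic only; careful work would use likelihood-based tests. Names and data are hypothetical.

```python
import math


def _fit(xs, ys):
    """Least-squares slope and R^2 of the line through (xs, ys)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sxx, sxy * sxy / (sxx * syy)


def compare_decay_models(days, inlinks):
    """R^2 of log(y) vs x (exponential) and of log(y) vs log(x) (power law), plus the exponent."""
    log_y = [math.log(y) for y in inlinks]
    _, r2_exp = _fit(days, log_y)
    slope, r2_pow = _fit([math.log(d) for d in days], log_y)
    return {"exponential_r2": r2_exp, "power_law_r2": r2_pow, "power_law_exponent": slope}


# Hypothetical in-link counts decaying as lag**-1.6 (the exponent reported on the slide)
days = list(range(1, 31))
print(compare_decay_models(days, [500 * d ** -1.6 for d in days]))
```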
Conclusions (part 1)
MANY patterns in real graphs
• Skewed degree distributions
• Small (and shrinking) diameter
• Power laws wrt triangles
• Oscillating size of connected components
• ... and more
References
• Jure Leskovec, Jon Kleinberg, and Christos Faloutsos: Graphs over Time: Densification Laws, Shrinking Diameters and Possible Explanations, KDD 2005 (Best Research Paper award)
Project info
www.cs.cmu.edu/~pegasus
Chau, Polo; Akoglu, Leman; McGlohon, Mary; Kang, U; Tsourakakis, Babis; Prakash, Aditya; Tong, Hanghang
Thanks to: NSF IIS-0705359, IIS-0534205, CTA-INARC; Yahoo (M45), LLNL, IBM, SPRINT, INTEL, HP
Part 1 END