SCS CMU
Proximity on Large Graphs
Speaker: Hanghang Tong
2008-4-10, 15-826 Guest Lecture

Graphs are everywhere!

Graph Mining: the big picture
• Graph/Global Level
• Subgraph/Community Level
• Node Level (we are here!)

Proximity on Graph: What?
a.k.a. Relevance, Closeness, 'Similarity'…

Proximity is the main tool behind…
• Link prediction [Liben-Nowell+], [Tong+]
• Ranking [Haveliwala], [Chakrabarti+]
• Email management [Minkov+]
• Image caption [Pan+]
• Neighborhood formulation [Sun+]
• Conn. subgraph [Faloutsos+], [Tong+], [Koren+]
• Pattern match [Tong+]
• Collaborative filtering [Fouss+]
• Many more… (will return to this later)

Roadmap
• Motivation
• Part I: Definitions
  – Basic: RWR
  – Variants
  – Asymmetry of Prox.
  – Group Prox.
  – Prox. w/ Attributes
  – Prox. w/ Time
• Part II: Fast Solutions
• Part III: Applications
• Conclusion

Some "bad" proximities: why not shortest path?
• 'pizza delivery guy' problem
• 'multi-facet' relationship

Some "bad" proximities: why not max. netflow?
• No penalty on long paths

What is a "good" proximity?
• Multiple connections
• Quality of connection
  – Direct & indirect connections
  – Length, degree, weight…

Random walk with restart
[Figure: example graph with 12 nodes]

Random walk with restart
[Figure: the same graph with each node's score for query node 4; more red, more relevant]
• Nearby nodes, higher scores
• Ranking vector (Node 1 … Node 12): 0.13, 0.10, 0.13, 0.22, 0.13, 0.05, 0.08, 0.04, 0.03, 0.04, 0.02

Why is RWR a good score?
[Equation: the score of node j for query i sums contributions from]
• all paths from i to j with length 1
• all paths from i to j with length 2
• all paths from i to j with length 3
• …
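
The path-sum reading can be written out as a series. This is a sketch under the standard RWR formulation, not a transcription of the slide's equation: the symbols $\tilde{W}$ (column-normalized adjacency matrix), $r_i$ (ranking vector for query $i$), $e_i$ (starting vector) and $c$ (so that $1-c$ is the restart probability) are assumptions.

```latex
\begin{align}
r_i &= c\,\tilde{W}\, r_i + (1-c)\, e_i \\
    &= (1-c)\,(I - c\,\tilde{W})^{-1} e_i \\
    &= (1-c)\sum_{k=0}^{\infty} c^{k}\, \tilde{W}^{k} e_i
\end{align}
```

Entry $j$ of $\tilde{W}^{k} e_i$ adds up the weights of all length-$k$ walks from $i$ to $j$, so the score rewards many connections and damps each one by $c^{k}$. The series converges for $0 < c < 1$ because $\tilde{W}$ is column-stochastic.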

Variant: escape probability
• Define Random Walk (RW) on the graph
• Esc_Prob(A -> B): Prob (starting at A, reaches B before returning to A)
[Figure: A, the remaining graph, B]
Esc_Prob = Pr (smile before cry)
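
A minimal sketch of this definition on a small connected graph, with no sink or fly-out. The function name `escape_probability` and the toy star graph are illustrative assumptions, not part of the lecture.

```python
import numpy as np

def escape_probability(adj, a, b):
    """Pr(walk started at a reaches b before returning to a)."""
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    p = adj / adj.sum(axis=1, keepdims=True)   # row-normalized transition matrix
    # v[k] = Pr(walk from k hits b before a): v[a] = 0, v[b] = 1,
    # and v[k] is the average of its neighbors' values on every other node.
    rest = [k for k in range(n) if k not in (a, b)]
    lhs = np.eye(len(rest)) - p[np.ix_(rest, rest)]
    rhs = p[rest, b]
    v = np.zeros(n)
    v[b] = 1.0
    v[rest] = np.linalg.solve(lhs, rhs)
    # First step out of a, then hit b before coming back.
    return float(p[a] @ v)

# Hypothetical star graph: node 1 is the hub, nodes 0, 2, 3 are leaves.
star = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
])
leaf_to_hub = escape_probability(star, 0, 1)
hub_to_leaf = escape_probability(star, 1, 0)
print(leaf_to_hub, hub_to_leaf)
```

The two directions differ even though the graph is undirected: a leaf has nowhere to go but the hub, while the hub picks that particular leaf only one time in three.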

Other Variants
• Other measures by RWs
  – Commute Time/Hitting Time [Fouss+]
  – SimRank [Jeh+]
• Equivalence of Random Walks
  – Electric Networks: EC [Doyle+]; SAEC [Faloutsos+]; CFEC [Koren+]
  – String Systems
• Katz [Katz], [Huang+], [Scholkopf+]
• Matrix-Forest-based Alg [Chebotarev+]
All are related to, or similar to, random walk with restart!

Asymmetry of Proximity [Tong+ KDD 07 a]
• What is Prox between A and B?
  – What is Prox from A to B?
  – What is Prox from B to A?

Asymmetry also exists in undirected graphs
• Hanghang's most important conf. is KDD
• The most important author in KDD is…
[Figure: Hanghang and KDD] So is love…

Group Proximity [Tong+ 2007]
• Q: How close are Accountants to SECs?
• A: Prob (starting at any RED, reaches any GREEN before touching any RED again)

Proximity on Attribute Graphs
• What is the proximity from node 7 to 10?
• If we know that…

Sol: Augmented graphs

(skip) Attributes on nodes/edges (ER graph) [Chakrabarti+ WWW 07]
• Works, Wrote, Sent, Cited, Received, In-Replied-to

Proximity w/ Time
• Sol #1: treat time as a categorical attr. [Minkov+]
• Sol #2: aggregate slice matrices [Tong+]
  – Global aggregation
  – Sliding window
  – Exponential emphasis
[Figure: time axis]
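
The three aggregation schemes of Sol #2 can be sketched as different weightings of the per-time-step matrices. The function name, the `window` and `decay` parameters and the exact weights are assumptions for illustration; the cited work may define them differently.

```python
import numpy as np

def aggregate_slices(slices, mode="global", window=2, decay=0.5):
    """Combine per-time-step adjacency matrices (oldest first) into one matrix."""
    slices = [np.asarray(s, dtype=float) for s in slices]
    if mode == "global":             # every time step counts equally
        weights = np.ones(len(slices))
    elif mode == "window":           # only the most recent `window` steps count
        weights = np.zeros(len(slices))
        weights[-window:] = 1.0
    elif mode == "exponential":      # older slices are down-weighted
        ages = np.arange(len(slices))[::-1]    # newest slice has age 0
        weights = decay ** ages
    else:
        raise ValueError(f"unknown mode: {mode}")
    return sum(w * s for w, s in zip(weights, slices))

# Three hypothetical 2-by-2 slices, oldest first.
slices = [np.eye(2) * 1.0, np.eye(2) * 2.0, np.eye(2) * 4.0]
global_sum = aggregate_slices(slices, "global")
recent_sum = aggregate_slices(slices, "window", window=2)
decayed_sum = aggregate_slices(slices, "exponential", decay=0.5)
```

Any proximity measure can then be computed on the aggregated matrix as if it were a static graph.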

Summary of Part I
• Goal: Summarize multiple … relationships
• Solutions
  – Basic: Random Walk with Restart
  – Property: Asymmetry
  – Variants: Esc_Prob and many others
  – Generalization: Group Prox.; w/ Attr.; w/ Time

Roadmap
• Motivation
• Part I: Definitions
• Part II: Fast Solutions
  – B_Lin: RWR
  – FastAllDAP: Esc_Prob
  – BB_Lin: Skewed BGs
  – FastUpdate: Time-Evolving
• Part III: Applications
• Conclusion

Preliminary: Sherman–Morrison Lemma
If: [equation] Then: [equation]
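
A numerical check of one common statement of the lemma (the Woodbury form): if X is invertible, then (X - U S V)^-1 = X^-1 + X^-1 U L V X^-1 with L = (S^-1 - V X^-1 U)^-1. The slide's own notation is not recoverable, so the symbols X, U, S, V, L and the random test matrices below are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 6, 2
x = np.eye(n) * 10.0 + rng.random((n, n))    # well-conditioned base matrix
u = rng.random((n, k))
s = np.eye(k) * 0.5
v = rng.random((k, n))

x_inv = np.linalg.inv(x)
# Only a small k-by-k matrix is inverted to account for the rank-k change.
core = np.linalg.inv(np.linalg.inv(s) - v @ x_inv @ u)
updated_inv = x_inv + x_inv @ u @ core @ v @ x_inv

# Reference: invert the changed n-by-n matrix from scratch.
direct_inv = np.linalg.inv(x - u @ s @ v)
print(np.allclose(updated_inv, direct_inv))
```

The point for what follows: once X^-1 is known, a low-rank change to X costs a small inversion plus a few matrix products instead of a fresh n-by-n inversion.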

SM Lemma: Applications
• RLS (and almost any algorithm in time series!)
• Leave-one-out cross validation for LS
• Kalman filtering
• Incremental matrix decomposition
• … and all the fast sols we will introduce!

Computing RWR
$r_i = c\,\tilde{W} r_i + (1 - c)\,e_i$
• $r_i$: ranking vector (n x 1)
• $\tilde{W}$: adjacency matrix (n x n)
• $1 - c$: restart probability
• $e_i$: starting vector (n x 1)
[Figure: the 12-node example graph]
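
A minimal sketch of solving this equation by iterating it until the ranking vector stops changing. It assumes the adjacency matrix is column-normalized and that 1 - c is the restart probability; the function name `rwr`, the default c = 0.9 and the 5-node toy graph are hypothetical.

```python
import numpy as np

def rwr(adj, query, c=0.9, tol=1e-10, max_iter=1000):
    """Random walk with restart scores for one query node (power iteration)."""
    adj = np.asarray(adj, dtype=float)
    # Column-normalize so each column sums to 1 (assumes no isolated nodes).
    w = adj / adj.sum(axis=0, keepdims=True)
    e = np.zeros(adj.shape[0])
    e[query] = 1.0                           # starting vector
    r = e.copy()
    for _ in range(max_iter):
        r_next = c * (w @ r) + (1.0 - c) * e
        if np.abs(r_next - r).sum() < tol:   # L1 change between iterations
            break
        r = r_next
    return r_next

# Hypothetical undirected graph: a triangle (0, 1, 2) with a tail 2 - 3 - 4.
adj = np.array([
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
])
scores = rwr(adj, query=0)
print(scores)
```

Each iteration is one sparse matrix-vector product in practice, which is where the on-line cost of the iterative approach comes from.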

Beyond RWR
• Maxwell Equation for Web! [Chakrabarti]
• SM Learning [Zhou, Zhu]
• RL in CBIR [He]
• P-PageRank [Haveliwala]
• RWR [Pan, Sun]
• PageRank [Haveliwala]
Fast RWR (B_Lin) finds the root solution!

Beyond RWR
• RWR is the building block for computing…
  – Escape Probability (augmented w/ sink) [Tong+]
  – Effective Conductance, Resistance Dist., Commute Time, …
  – MRF (special structure) [Cohen]
• Similar idea of B_Lin to compute other measurements

Q: Given query i, how to solve it?
[Equation: the ranking vector is unknown; the adjacency matrix and the starting vector are given]

OnTheFly
[Figure: scores computed on the example graph at query time]
• No pre-computation / light storage
• Slow on-line response: O(mE)

PreCompute [Haveliwala]
[Figure: matrix R holding the scores for every query node]
• Fast on-line response
• Heavy pre-computation/storage cost: O(n^3) / O(n^2)
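
A sketch of the PreCompute extreme, assuming the closed form r_i = (1 - c)(I - cW)^-1 e_i with a column-normalized W. The function names and the toy graph are hypothetical.

```python
import numpy as np

def precompute_rwr(adj, c=0.9):
    """Off-line: one n-by-n inversion; column i is the ranking vector of query i."""
    adj = np.asarray(adj, dtype=float)
    w = adj / adj.sum(axis=0, keepdims=True)   # column-normalized adjacency
    n = adj.shape[0]
    return (1.0 - c) * np.linalg.inv(np.eye(n) - c * w)

def query(all_scores, i):
    """On-line: no arithmetic at all, just read out column i."""
    return all_scores[:, i]

# Hypothetical undirected graph: a triangle (0, 1, 2) with a tail 2 - 3 - 4.
adj = np.array([
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
])
all_scores = precompute_rwr(adj)
scores_for_node_0 = query(all_scores, 0)
```

The dense inverse is what drives the O(n^3) time and O(n^2) storage quoted above, which is impractical for large graphs.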

Q: How to Balance?
• Off-line vs. on-line

B_Lin: Basic Idea [Tong+]
• Find community
• Fix the remaining
• Combine
[Figure: the example graph split into communities, with scores]

Pre-computational stage
• Q: Efficiently compute and store $Q^{-1}$
• A: A few small, instead of ONE BIG, matrix inversions

On-Line Query Stage
• Q: Efficiently recover one column of $Q^{-1}$
• A: A few, instead of MANY, matrix-vector multiplications

Pre-compute Stage
• p1: B_Lin Decomposition
  – P1.1 partition
  – P1.2 low-rank approximation
• p2: Q matrices
  – P2.1 computing [equation] (for each partition)
  – P2.2 computing [equation] (for concept space)

(skip) P1.1: partition
[Figure: within-partition links vs. cross-partition links]

(skip) P1.1: block-diagonal
[Figure: within-partition links as a block-diagonal matrix]

(skip) P1.2: LRA for [matrix]
[Figure] |S| << |W2|

(skip) [Equation: a matrix written as the sum of two parts]

(skip) p2.1 Computing [equation]

(skip) Comparing [equation] and [equation]
• Computing time
  – 100,000 nodes; 100 partitions
  – Computing [equation] is 10,000x faster!
• Storage cost
  – 100x saving!
[Equation: block-diagonal matrix with blocks $Q_{1,1}$, $Q_{1,2}$, …, $Q_{1,k}$]

(skip) Q: How to fix the green portions?
[Equation]

(skip) p2.2 Computing:
[Equation: an inverse involving V, the block-diagonal matrix with blocks $Q_{1,1}$, $Q_{1,2}$, …, $Q_{1,k}$, and U]

(skip) We have: communities, bridges
SM Lemma says: [equation]

(skip) On-Line Stage
• Q: Query + Pre-computation -> Result?
• A: (SM lemma) [equation]

(skip) On-Line Query Stage
• q1: … q6: [six steps; equations]
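
The two stages can be sketched end to end as follows. This is a reading of the slides, not the authors' code: it assumes W = W1 + W2 with W1 the within-partition links and W2 the cross-partition links approximated as U S V by a truncated SVD, and then applies the Woodbury form of the SM lemma. Partition labels are taken as given; the function names and the toy graph are hypothetical.

```python
import numpy as np

def b_lin_precompute(w, labels, rank, c=0.9):
    """Off-line stage. w: column-normalized n-by-n matrix; labels[i]: partition of node i."""
    w = np.asarray(w, dtype=float)
    labels = np.asarray(labels)
    n = w.shape[0]
    same = labels[:, None] == labels[None, :]
    w1 = np.where(same, w, 0.0)          # within-partition links (block-diagonal)
    w2 = w - w1                          # cross-partition links
    # A few small inversions, one per partition, instead of one big one.
    q1_inv = np.zeros((n, n))
    for part in np.unique(labels):
        idx = np.flatnonzero(labels == part)
        block = np.eye(len(idx)) - c * w1[np.ix_(idx, idx)]
        q1_inv[np.ix_(idx, idx)] = np.linalg.inv(block)
    # Low-rank approximation: w2 ~= u @ diag(s) @ v (rank must not exceed rank of w2).
    u, s, v = np.linalg.svd(w2)
    u, s, v = u[:, :rank], s[:rank], v[:rank, :]
    lam = np.linalg.inv(np.diag(1.0 / s) - c * v @ q1_inv @ u)
    return q1_inv, u, lam, v

def b_lin_query(pre, i, c=0.9):
    """On-line stage: a column read-out plus a few matrix-vector products."""
    q1_inv, u, lam, v = pre
    col = q1_inv[:, i]
    return (1.0 - c) * (col + c * q1_inv @ (u @ (lam @ (v @ col))))

# Hypothetical graph: two triangles (0, 1, 2) and (3, 4, 5) joined by the bridge 2 - 3.
adj = np.array([
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
], dtype=float)
w = adj / adj.sum(axis=0, keepdims=True)
pre = b_lin_precompute(w, labels=[0, 0, 0, 1, 1, 1], rank=2)
approx = b_lin_query(pre, i=0)
```

In this toy case the cross-partition matrix has rank 2, so keeping rank 2 reproduces the exact scores; on a real graph a smaller rank trades accuracy for speed and storage.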

Query Time vs. Pre-Compute Time
[Figure: log query time vs. log pre-compute time]
• Quality: 90%+
• On-line: up to 150x speedup
• Pre-computation: two orders of magnitude saving

FastAllDAP [Tong+]
• Q: How to compute Esc_Prob = Pr (smile before cry)?
[Figure: A, the remaining graph, B]
Footnote: augmented w/ universal sink as practical modification

Solving DAP (straightforward way)
• 1 - c: fly-out probability (to black-hole)
• [Equation with 1 x (n-2) vectors]
• One matrix inversion, one proximity!

P: transition matrix (row norm.)
$\mathrm{Esc\_Prob}(1 \to 5) = c^2\, p_{1,\mathcal{I}}\,(I - c\,P_{\mathcal{I},\mathcal{I}})^{-1}\, p_{\mathcal{I},5} + c\, p_{1,5}$, where $\mathcal{I}$ is the set of all nodes other than 1 and 5.

Challenges
• Case 1: Medium size graph
  – Matrix inversion is feasible, but…
  – What if we want many proximities?
  – Q: How to get all (n^2) proximities efficiently?
  – A: FastAllDAP!
• Case 2: Large size graph (skip)
  – Matrix inversion is infeasible
  – Q: How to get one proximity efficiently?
  – A: FastOneDAP!

FastAllDAP
• Q1: How to efficiently compute all possible proximities on a medium size graph?
  – a.k.a. how to efficiently solve multiple linear systems simultaneously?
• Goal: reduce # of matrix inversions!

FastAllDAP: Observation
[Figure: two different sub-matrices of P]
• Need two different matrix inversions!

FastAllDAP: Rescue
[Figure: sub-matrices of P for Prox(1 -> 5) and Prox(1 -> 6)]
• Overlap between two gray parts!
• Redundancy among different linear systems!

FastAllDAP: Theorem
• Theorem: [equation]
• Example: [equation]
• Proof: by SM Lemma

FastAllDAP: Algorithm
• Alg.
  – Compute Q
  – For i, j = 1, …, n, compute [equation]
• Computational save: O(1) instead of O(n^2) matrix inversions!
• Example
  – w/ 1000 nodes,
  – 1M matrix inversions vs. 1 matrix inversion!
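
The saving can be illustrated with one closed form that agrees with the straightforward solver. The theorem on the slide is not recoverable, so the formula below is an assumption derived for this sketch: with Q = (I - cP)^-1 and fly-out probability 1 - c, the escape probability from i to j is q_ij / (q_ii q_jj - q_ij q_ji). Function names and the random test matrix are hypothetical.

```python
import numpy as np

def all_pairs_escape(p, c=0.9):
    """All n^2 escape probabilities from a single n-by-n inversion."""
    p = np.asarray(p, dtype=float)
    n = p.shape[0]
    q = np.linalg.inv(np.eye(n) - c * p)
    d = np.diag(q)
    denom = np.outer(d, d) - q * q.T
    np.fill_diagonal(denom, 1.0)         # avoid 0/0; the diagonal is undefined
    prox = q / denom
    np.fill_diagonal(prox, np.nan)
    return prox

def escape_direct(p, i, j, c=0.9):
    """Straightforward way: one (n-2)-by-(n-2) linear system per pair."""
    p = np.asarray(p, dtype=float)
    n = p.shape[0]
    rest = [k for k in range(n) if k not in (i, j)]
    sub = np.eye(n - 2) - c * p[np.ix_(rest, rest)]
    return c * p[i, j] + c**2 * p[i, rest] @ np.linalg.solve(sub, p[rest, j])

# Hypothetical row-normalized transition matrix on 5 nodes, no self-loops.
rng = np.random.default_rng(1)
p = rng.random((5, 5))
np.fill_diagonal(p, 0.0)
p /= p.sum(axis=1, keepdims=True)
prox = all_pairs_escape(p)
print(prox[0, 4], escape_direct(p, 0, 4))
```

After the single inversion, each of the n^2 proximities costs a constant number of arithmetic operations.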

FastAllDAP
[Figure: time (sec) vs. size of graph, Straight-Solver vs. FastAllDAP]
• 1,000x faster!

RWR on Bipartite Graph
• Author-Conf. matrix: n authors x m conferences
• Observation: n >> m!
• Examples:
  1. DBLP: 400k aus, 3.5k confs
  2. NetFlix: 2.7M usrs, 18k mvs

RWR on skewed bipartite graphs
• Q: Given query i, how to solve it?
[Equation: block matrix with zero diagonal blocks and off-diagonal blocks Ac and Ar, over n authors and m conferences; the ranking vector is unknown]

BB_Lin: Pre-Computation [Tong+ 06]
• Step 1: 2-step RWR for conferences: M = Ac x Ar (m x m)
• Step 2: all conf-conf prox. scores: [equation]
• Cost: [equation]
• Examples
  – NetFlix: 1.5 hr for pre-computation
  – DBLP: a few minutes
[Figure: Ac/Ar with E edges; m x m core matrix]

BB_Lin: On-Line Stage
• Case 1: Conf - Conf: read out!
• Case 2: Au - Conf: 1 matrix-vec!
• Case 3: Au - Au: 2 matrix-vec!
[Figure: authors and conferences; Ac/Ar with E edges]
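
A sketch of both stages, derived here by eliminating the author block from the RWR equation r = cWr + (1 - c)e with W = [[0, Ar], [Ac, 0]]. The shapes (Ar is n x m, Ac is m x n, both column-normalized), the function names and the random test graph are assumptions; the paper's exact normalization may differ.

```python
import numpy as np

def bb_lin_precompute(a_r, a_c, c=0.9):
    """Off-line: invert only the small m-by-m conference-side system."""
    m = a_c.shape[0]
    core = a_c @ a_r                                   # 2-step walk among conferences
    return np.linalg.inv(np.eye(m) - c**2 * core)

def bb_lin_query(lam, a_r, a_c, e_au, e_conf, c=0.9):
    """On-line: at most two matrix-vector products with the sparse Ac and Ar."""
    r_conf = (1.0 - c) * lam @ (c * a_c @ e_au + e_conf)
    r_au = c * a_r @ r_conf + (1.0 - c) * e_au
    return r_au, r_conf

# Hypothetical weighted author-conference graph: n = 6 authors, m = 3 conferences.
rng = np.random.default_rng(0)
b = rng.random((6, 3))
a_r = b / b.sum(axis=0, keepdims=True)        # n x m, each column sums to 1
a_c = b.T / b.T.sum(axis=0, keepdims=True)    # m x n, each column sums to 1
lam = bb_lin_precompute(a_r, a_c)

e_au = np.zeros(6)
e_au[2] = 1.0                                 # query: author 2
e_conf = np.zeros(3)
r_au, r_conf = bb_lin_query(lam, a_r, a_c, e_au, e_conf)
```

For a conference query (e_au all zeros) the conference scores are a scaled column of `lam`, which matches the "read out" case; author targets need the extra product with Ar.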

BB_Lin: Examples
• NetFlix dataset (2.7M users x 18k movies)
  – 1.5 hr for pre-computation
  – < 1 sec for on-line
• DBLP dataset (400k authors x 3.5k confs)
  – A few minutes for pre-computation
  – < 0.01 sec for on-line

Challenges
• BB_Lin is good for skewed bipartite graphs
  – for NetFlix (2.7M nodes and 100M edges)
  – w/ 1.5 hr pre-computation for m x m core matrix
  – fraction of seconds for on-line query
• But… what if the graph is evolving over time?
  – New edges/nodes arrive; edge weights increase…
  – 1.5 hr itself becomes a part of on-line cost!

Q: How to update the core matrix?
[Figure: core matrix at t=0 and at t=1]

Update the core matrix
• Step 1: [Equation: the updated M, from the updated Ac x Ar, equals the old M plus a rank-2 update]
• Step 2: [Equation: the updated core matrix, from the old one]
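
A sketch of the rank-2 idea for a single changed edge. It deliberately ignores re-normalization (one edge weight is bumped by `delta` in both Ar and Ac and nothing else changes), so it shows the mechanism rather than the paper's algorithm. The function name, the argument layout and the random test matrices are assumptions.

```python
import numpy as np

def update_core_inverse(lam, a_r, a_c, au, conf, delta, c=0.9):
    """Update lam = (I - c^2 Ac Ar)^-1 after edge (au, conf) changes by delta."""
    m = lam.shape[0]
    e_k = np.zeros(m)
    e_k[conf] = 1.0
    # New Ac @ Ar = old Ac @ Ar + x @ y, a rank-2 correction.
    x = np.column_stack([e_k, delta * a_c[:, au]])               # m-by-2
    y = np.vstack([delta * a_r[au, :] + delta**2 * e_k, e_k])     # 2-by-m
    small = np.linalg.inv(np.eye(2) - c**2 * y @ lam @ x)        # 2-by-2 inverse
    return lam + c**2 * lam @ x @ small @ y @ lam

# Hypothetical small matrices: 5 authors, 3 conferences (not normalized).
rng = np.random.default_rng(0)
a_r = 0.1 * rng.random((5, 3))     # n x m
a_c = 0.1 * rng.random((3, 5))     # m x n
c = 0.9
lam = np.linalg.inv(np.eye(3) - c**2 * a_c @ a_r)
lam_fast = update_core_inverse(lam, a_r, a_c, au=2, conf=1, delta=0.05)
```

Only a 2-by-2 matrix is inverted per changed edge, so the m x m core matrix does not have to be rebuilt from scratch.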

Update: General Case [Tong+ 2008]
• E' edges changed
• Involves n' authors, m' confs.
• Observation: the rank of update is small!
• Algorithm: our Alg. (details in the paper)
[Figure: updated M = updated Ac x Ar; n authors, m conferences]

FastOneUpdate
[Figure: time (seconds) per dataset]
• 40x speedup; 176x speedup

Fast-Batch-Update
[Figure: time (seconds) vs. E' and min(n', m')]
• 15x speed-up on average!

Summary of Part II
• Goal: Efficiently solve linear system(s)
• Sols.
  – B_Lin: approximate one large linear system
  – FastAllDAP: multiple inter-related linear systems
  – BB_Lin: the intrinsic complexity is small
  – FastUpdate: (smooth) dynamic linear system

[Figure: FastAllDAP, B_Lin, BB_Lin, FastUpdate, …]

Roadmap
• Motivation
• Part I: Definitions
• Part II: Fast Solutions
• Part III: Applications
  – Link Prediction
  – NF
  – gCap
  – CePS
  – G-Ray
  – pTrack/cTrack
• Conclusion

Link Prediction: existence
[Figure: density of Prox (i -> j) + Prox (j -> i), for pairs with link and pairs with no link]
• Prox. is effective to distinguish red and blue!

Link Prediction: direction
• Q: Given the existence of the link, what is the direction of the link?
• A: Compare Prox (i -> j) and Prox (j -> i)
[Figure: density of Prox (i -> j) - Prox (j -> i)] >70%

Neighborhood Formulation
• Q: What is the most related conference to ICDM?
• A: RWR! [Sun ICDM 2005]
[Figure: Conference-Author bipartite graph]

NF: example

gCap: Automatic Image Caption
• Q: [Figure: images captioned {Sea, Sun, Sky, Wave} and {Cat, Forest, Grass, Tiger}; a new image with caption {?, ?, …}]
• A: RWR! [Pan KDD 2004]

[Figure: graph of Image, Region and Keyword nodes (Sea, Sun, Sky, Wave, Cat, Forest, Tiger, Grass), plus the test image]

[Figure: the same graph; the test image is captioned {Grass, Forest, Cat, Tiger}]

Center-Piece Subgraph (CePS)
• Q: [Figure: original graph; black: query nodes] CePS guy?
• A: RWR! [Tong KDD 2006]
• Red: Max (Prox(Red, A) x Prox(Red, B) x Prox(Red, C))
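
The AND-style score on this slide can be sketched as a product of RWR scores, one per query node. This covers only the scoring step, not the subgraph extraction; the function names and the toy graph are hypothetical, and RWR is computed by dense inversion for brevity.

```python
import numpy as np

def rwr_all(adj, c=0.9):
    """Column q holds the RWR scores of every node for query node q."""
    adj = np.asarray(adj, dtype=float)
    w = adj / adj.sum(axis=0, keepdims=True)   # column-normalized adjacency
    n = adj.shape[0]
    return (1.0 - c) * np.linalg.inv(np.eye(n) - c * w)

def center_piece(adj, queries, c=0.9):
    """Non-query node with the largest product of proximities to all queries."""
    scores = rwr_all(adj, c)[:, queries]       # scores[j, k]: node j w.r.t. k-th query
    combined = scores.prod(axis=1)
    combined[queries] = -np.inf                # exclude the query nodes themselves
    return int(np.argmax(combined)), combined

# Hypothetical graph: hub 0 linked to 1, 2, 3; node 4 hangs off node 1 only.
adj = np.array([
    [0, 1, 1, 1, 0],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
])
center, combined = center_piece(adj, queries=[1, 2, 3])
print(center)
```

A product behaves like a logical AND: a node close to only some of the queries, like node 4 here, is pulled down by its small scores for the others.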

CePS: Example

K_SoftAnd: Relaxation of AND
• Noise, disconnected communities
• Asking AND query? No answer!

[Figure: And vs. 2_SoftAnd vs. 1_SoftAnd (OR); scale x 1e-4]

CePS: 2_SoftAnd
[Figure: DB, Stat.]

X-Ray
• Input: query graph, attributed data graph
• Output: matching subgraph

G-Ray: How to?
[Figure: matching node]
• Goodness = Prox (12, 4) x Prox (4, 12) x Prox (7, 4) x Prox (4, 7) x Prox (11, 7) x Prox (7, 11) x Prox (12, 11) x Prox (11, 12)

Effectiveness: star-query
[Figure: query, result]

Effectiveness: line-query
[Figure: query, result]

Effectiveness: loop-query
[Figure: query, result]

pTrack
• [Given]
  – (1) a large, skewed time-evolving bipartite graph,
  – (2) the query nodes of interest
• [Track]
  – (1) top-k most related nodes for each query node at each time step t;
  – (2) the proximity score (or rank of proximity) between any two query nodes at each time step t
[Figure: author A's rank in KDD by year]

Philip S. Yu's top-5 conferences up to each year
• 1992: ICDE, ICDCS, SIGMETRICS, PDIS, VLDB
• 1997: CIKM, ICDCS, ICDE, SIGMETRICS, ICMCS
• 2002: KDD, SIGMOD, ICDM, CIKM, ICDCS
• 2007: ICDM, KDD, ICDE, SDM, VLDB
• From Databases, Performance, Distributed Sys. to Databases, Data Mining

KDD's rank wrt. VLDB over years
[Figure: rank vs. year]
• Data Mining and Databases are more and more relevant!

cTrack
• [Given]
  – (1) a large, skewed time-evolving graph,
  – (2) the query nodes of interest
• [Track]
  – (1) top-k most central nodes at each time step t;
  – (2) the centrality score (or rank of centrality) for each query node at each time step t

Ranking of centrality up to each year (in NIPS)
[Figure: rank of influential-ness vs. year for T. Sejnowski, C. Koch, G. Hinton, M. Jordan]

10 most influential authors up to each year
[Figure: T. Sejnowski, M. Jordan]
• Author-paper bipartite graph from NIPS 1987-1999. 3k. 1740 papers, 2037 authors, spreading over 13 years

Proximity on Graphs
• Applications (use proximity as building block): CePS, Link Prediction, G-Ray, NF, pTrack, cTrack, gCap
• Computations (efficiently solve linear system(s)): FastUpdate, BB_Lin, FastAllDAP, B_Lin
• Definitions (weighted multiple relationship): RWR, Variants, Group Prox., w/ Attribute, w/ Time

Take-home Messages
• Proximity definitions
  – RWR
  – and a lot of variants
• Computations
  – SM Lemma

References
• L. Page, S. Brin, R. Motwani & T. Winograd. (1998) The PageRank Citation Ranking: Bringing Order to the Web. Technical report, Stanford Library.
• T. H. Haveliwala. (2002) Topic-Sensitive PageRank. In WWW, 517-526, 2002.
• J. Y. Pan, H. J. Yang, C. Faloutsos & P. Duygulu. (2004) Automatic multimedia cross-modal correlation discovery. In KDD, 653-658, 2004.
• C. Faloutsos, K. S. McCurley & A. Tomkins. (2004) Fast discovery of connection subgraphs. In KDD, 118-127, 2004.
• J. Sun, H. Qu, D. Chakrabarti & C. Faloutsos. (2005) Neighborhood Formation and Anomaly Detection in Bipartite Graphs. In ICDM, 418-425, 2005.
• W. Cohen. (2007) Graph Walks and Graphical Models. Draft.
• P. Doyle & J. Snell. (1984) Random walks and electric networks, volume 22. Mathematical Association of America, New York.
• Y. Koren, S. C. North & C. Volinsky. (2006) Measuring and extracting proximity in networks. In KDD, 245-255, 2006.
• A. Agarwal, S. Chakrabarti & S. Aggarwal. (2006) Learning to rank networked entities. In KDD, 14-23, 2006.
• S. Chakrabarti. (2007) Dynamic personalized pagerank in entity-relation graphs. In WWW, 571-580, 2007.
• F. Fouss, A. Pirotte, J.-M. Renders & M. Saerens. (2007) Random-Walk Computation of Similarities between Nodes of a Graph with Application to Collaborative Recommendation. IEEE Trans. Knowl. Data Eng. 19(3), 355-369, 2007.
• H. Tong & C. Faloutsos. (2006) Center-piece subgraphs: problem definition and fast solutions. In KDD, 404-413, 2006.
• H. Tong, C. Faloutsos & J. Y. Pan. (2006) Fast Random Walk with Restart and Its Applications. In ICDM, 613-622, 2006.
• H. Tong, Y. Koren & C. Faloutsos. (2007) Fast direction-aware proximity for graph mining. In KDD, 747-756, 2007.
• H. Tong, B. Gallagher, C. Faloutsos & T. Eliassi-Rad. (2007) Fast best-effort pattern matching in large attributed graphs. In KDD, 737-746, 2007.
• H. Tong, S. Papadimitriou, P. S. Yu & C. Faloutsos. (2008) Proximity Tracking on Time-Evolving Bipartite Graphs. To appear in SDM 2008.

Thank you!
htong@cs.cmu.edu
www.cs.cmu.edu/~htong