Fast Random Walk with Restart and Its Applications

Hanghang Tong, Christos Faloutsos and Jia-Yu (Tim) Pan
ICDM 2006, Dec. 18-22, Hong Kong

Motivating Questions
• Q: How to measure relevance?
• A: Random walk with restart
• Q: How to do it efficiently?
• A: This talk tries to answer!

Random walk with restart
[Figure: example graph with 12 nodes]

Random walk with restart
• Query node: Node 4
• Nearby nodes, higher scores
• More red, more relevant
• Ranking vector (Node 1 through Node 12; 11 values survive): 0.13, 0.10, 0.13, 0.22, 0.13, 0.05, 0.08, 0.04, 0.03, 0.04, 0.02
[Figure: the 12-node example graph colored by RWR score]

Automatic Image Caption
• Q: … {Sea, Sun, Sky, Wave}, {Cat, Forest, Grass, Tiger}, {?, ?, } ?
• A: RWR! [Pan KDD 2004]

[Figure: graph linking Region, Image and Keyword nodes, with the Test Image; keywords: Sea, Sun, Sky, Wave, Cat, Forest, Tiger, Grass]

[Figure: the same Region, Image and Keyword graph; the Test Image is captioned {Grass, Forest, Cat, Tiger}]

Neighborhood Formulation
• Q: What is the most related conference to ICDM?
• A: RWR! [Sun ICDM 2005]
[Figure: graph of Conference and Author nodes]

NF: example

Center-Piece Subgraph (CePS)
• Q: Original graph (black: query nodes) → CePS?
• A: RWR! [Tong KDD 2006]

CePS: Example

Other Applications
• Content-based Image Retrieval [He]
• Personalized PageRank [Jeh], [Widom], [Haveliwala]
• Anomaly Detection (for nodes; links) [Sun]
• Link Prediction [Getoor], [Jensen]
• Semi-supervised Learning [Zhu], [Zhou]
• …

Roadmap
• Background
  – RWR: Definitions
  – RWR: Algorithms
• Basic Idea
• FastRWR
  – Pre-Compute Stage
  – On-Line Stage
• Experimental Results
• Conclusion

Computing RWR
• Ranking vector (n x 1)
• Adjacency matrix (n x n)
• Restart probability
• Starting vector (n x 1)
[Figure: the 12-node example graph]
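A minimal sketch of this computation, assuming the common formulation r = c W r + (1 - c) e_i, where W is a column-normalized adjacency matrix, e_i is the starting vector for query node i, and 1 - c is the restart probability. The symbol names, the normalization, the value c = 0.9 and the 4-node path graph are illustrative assumptions, not taken from the slide.

```python
import numpy as np

def column_normalize(A):
    """Scale each column of adjacency matrix A to sum to 1 (assumed normalization)."""
    col_sums = A.sum(axis=0)
    col_sums[col_sums == 0] = 1.0  # isolated nodes keep an all-zero column
    return A / col_sums

def rwr_exact(W, i, c=0.9):
    """Solve r = c W r + (1 - c) e_i directly for the ranking vector r."""
    n = W.shape[0]
    e = np.zeros(n)
    e[i] = 1.0  # starting vector: all mass on the query node
    return (1.0 - c) * np.linalg.solve(np.eye(n) - c * W, e)

# Hypothetical 4-node path graph 0 - 1 - 2 - 3 (not the graph on the slide).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
W = column_normalize(A)
r = rwr_exact(W, i=0)  # ranking vector for query node 0
```

Because W is column-stochastic here, the entries of r sum to 1, so r can be read as a relevance distribution over the nodes.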

Beyond RWR
• Maxwell Equation for Web! [Chakrabarti]
• SM Learning [Zhou, Zhu]
• RL in CBIR [He]
• P-PageRank [Haveliwala]
• RWR [Pan, Sun]
• PageRank [Haveliwala]
Fast RWR finds the root solution!

• Q: Given query i, how to solve it?

OnTheFly:
• No pre-computation / light storage
• Slow on-line response: O(mE)
[Figure: the example graph with RWR scores computed at query time]
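The O(mE) cost can be read as m iterations, each touching every edge once. A sketch of such an iterative solver follows, assuming the update r <- c W r + (1 - c) e_i; the function name, tolerance, c = 0.9 and the 3-node graph are hypothetical. A dense matrix is used for brevity, so each step here costs O(n^2) rather than the O(E) a sparse representation would give.

```python
import numpy as np

def rwr_on_the_fly(W, i, c=0.9, tol=1e-10, max_iter=1000):
    """Iterate r <- c W r + (1 - c) e_i until the L1 change drops below tol.

    Returns the ranking vector and the number of iterations m that were used.
    """
    n = W.shape[0]
    e = np.zeros(n)
    e[i] = 1.0
    r = e.copy()
    for m in range(1, max_iter + 1):
        # One pass over the links; with a sparse W this step costs O(E).
        r_next = c * (W @ r) + (1.0 - c) * e
        if np.abs(r_next - r).sum() < tol:
            return r_next, m
        r = r_next
    return r, max_iter

# Hypothetical column-normalized triangle graph on nodes 0, 1, 2.
W = np.array([[0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0]])
r, m = rwr_on_the_fly(W, i=0)
```

Nothing is stored between queries, which is the light-storage side of the trade-off; every new query pays for all m iterations again.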

PreCompute: [Haveliwala]
• Fast on-line response
• Heavy pre-computation/storage cost: O(n^3) time, O(n^2) storage
[Figure: ranking vectors R pre-computed for the example graph]
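A sketch of the opposite extreme, assuming the same formulation r = c W r + (1 - c) e_i: invert the full system once and answer each query with a column lookup. The names `precompute_all` and `lookup`, the value c = 0.9 and the 3-node graph are hypothetical.

```python
import numpy as np

def precompute_all(W, c=0.9):
    """Off-line: invert the n x n system once (O(n^3) time, O(n^2) storage)."""
    n = W.shape[0]
    return (1.0 - c) * np.linalg.inv(np.eye(n) - c * W)

def lookup(R, i):
    """On-line: the ranking vector for query i is column i of R."""
    return R[:, i]

# Hypothetical column-normalized triangle graph on nodes 0, 1, 2.
W = np.array([[0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0]])
R = precompute_all(W)
r0 = lookup(R, 0)
```

The dense n x n matrix R is what makes this approach impractical for large graphs, even though each query is a single column read.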

Q: How to balance? (Off-line vs. On-line)

Basic Idea
• Find community
• Fix the remaining
• Combine
[Figure: the example graph split into communities, then recombined with RWR scores]

Pre-computational stage
• Q: Efficiently compute and store Q^-1
• A: A few small, instead of ONE BIG, matrix inversions
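The "few small inversions" idea rests on a standard fact: the inverse of a block-diagonal matrix is the block-diagonal matrix of the block inverses. The sketch below checks this numerically; the helper names and the two 2 x 2 blocks are hypothetical.

```python
import numpy as np

def invert_blocks(blocks):
    """Invert each small diagonal block independently."""
    return [np.linalg.inv(B) for B in blocks]

def block_diag(blocks):
    """Assemble square blocks into one block-diagonal matrix."""
    n = sum(B.shape[0] for B in blocks)
    M = np.zeros((n, n))
    start = 0
    for B in blocks:
        k = B.shape[0]
        M[start:start + k, start:start + k] = B
        start += k
    return M

# Two hypothetical 2 x 2 blocks standing in for two partitions.
blocks = [np.array([[2.0, 1.0], [1.0, 3.0]]),
          np.array([[4.0, 0.0], [1.0, 2.0]])]
small = block_diag(invert_blocks(blocks))   # a few small inversions
big = np.linalg.inv(block_diag(blocks))     # one big inversion
```

Both routes give the same matrix, but only the first avoids ever forming or inverting the full n x n system.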

On-Line Query Stage
• Q: Efficiently recover one column of Q^-1
• A: A few, instead of MANY, matrix-vector multiplications

Pre-compute Stage
• P1: B_Lin Decomposition
  – P1.1 partition
  – P1.2 low-rank approximation
• P2: Q matrices
  – P2.1 computing (for each partition)
  – P2.2 computing (for concept space)

P1.1: partition
• Within-partition links
• Cross-partition links
[Figure: the example graph with its links marked as within-partition or cross-partition]
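A sketch of how a partition splits the link matrix into a within-partition part and a cross-partition part. The partition labels are assumed to be given by some graph partitioner; the names `split_links`, `W1`, `W2` and the 4-node path graph are hypothetical.

```python
import numpy as np

def split_links(W, labels):
    """Split W into within-partition links W1 and cross-partition links W2."""
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]  # True where both ends share a partition
    W1 = np.where(same, W, 0.0)
    W2 = W - W1
    return W1, W2

# Hypothetical column-normalized path graph 0 - 1 - 2 - 3,
# partitioned as {0, 1} and {2, 3}; the only bridge is the edge 1 - 2.
W = np.array([[0.0, 0.5, 0.0, 0.0],
              [1.0, 0.0, 0.5, 0.0],
              [0.0, 0.5, 0.0, 1.0],
              [0.0, 0.0, 0.5, 0.0]])
labels = [0, 0, 1, 1]
W1, W2 = split_links(W, labels)
```

When node indices are ordered by partition, as here, W1 is block-diagonal and W2 holds only the bridge entries.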

P1.1: block-diagonal
[Figure: within-partition links of the example graph form a block-diagonal matrix]

P1.2: low-rank approximation (LRA) for the cross-partition links, with |S| << |W2|
[Figure: cross-partition links of the example graph and their low-rank approximation]
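A sketch of one way to obtain a small core matrix S for the cross-partition links, assuming the factorization W2 ≈ U S V. Truncated SVD is swapped in here purely for illustration; the slide does not say which low-rank method is used, and the names `low_rank`, `U`, `S`, `V` and the rank t are assumptions.

```python
import numpy as np

def low_rank(W2, t):
    """Rank-t factors U (n x t), S (t x t), V (t x n) with W2 ~ U @ S @ V."""
    Uf, s, Vf = np.linalg.svd(W2, full_matrices=False)
    return Uf[:, :t], np.diag(s[:t]), Vf[:t, :]

# Hypothetical cross-partition matrix: a single bridge between nodes 1 and 2.
W2 = np.zeros((4, 4))
W2[1, 2] = 0.5
W2[2, 1] = 0.5
U, S, V = low_rank(W2, t=2)
```

Here W2 has rank 2, so t = 2 reproduces it exactly with a 2 x 2 core in place of a 4 x 4 matrix; on a large graph t trades accuracy against storage.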

P2.1: Computing (for each partition)

Comparing Q1^-1 and Q^-1
• Computing time
  – 100,000 nodes; 100 partitions
  – Computing Q1^-1 is 10,000x faster!
• Storage cost
  – 100x saving!
Q1 = diag(Q1,1, Q1,2, …, Q1,k)
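The two ratios can be checked with back-of-the-envelope arithmetic, assuming equal-size partitions, cubic-time dense inversion and dense storage of each inverse. Under those assumptions the time ratio is k^2 and the storage ratio is k for k partitions. The function name is hypothetical.

```python
def inversion_savings(n, k):
    """Compare one n x n inverse with k inverses of size (n/k) x (n/k).

    Assumes equal-size partitions, O(size^3) inversion time and
    O(size^2) storage per inverse.
    """
    block = n // k
    time_ratio = n**3 // (k * block**3)
    storage_ratio = n**2 // (k * block**2)
    return time_ratio, storage_ratio

# The slide's setting: 100,000 nodes split into 100 partitions.
time_ratio, storage_ratio = inversion_savings(n=100_000, k=100)
```

With these inputs the function returns a time ratio of 10,000 and a storage ratio of 100.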

• Q: How to fix the green portions?
[Figure: the approximate matrices, with the portions still to be corrected shown in green]

P2.2: Computing (for concept space), from V, the per-partition inverses Q1,1^-1, Q1,2^-1, …, Q1,k^-1, and U
[Figure: the example graph]

We have: Communities + Bridges
SM Lemma says: [equation image]
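One way to write out the step from communities and bridges to the full inverse is the Sherman-Morrison(-Woodbury) identity. The notation below is assumed rather than taken from the slide: W_1 holds the within-partition links (communities), W_2 ≈ U S V the cross-partition links (bridges), and c is the weight on continuing the walk.

```latex
% Assumed notation: Q_1 = I - c W_1 (communities), c U S V (bridges).
\begin{align*}
Q &= I - cW \;\approx\; Q_1 - c\,U S V, \qquad Q_1 = I - cW_1, \\
Q^{-1} &\approx Q_1^{-1} + c\,Q_1^{-1} U \Lambda V Q_1^{-1}, \qquad
\Lambda = \left(S^{-1} - c\,V Q_1^{-1} U\right)^{-1}.
\end{align*}
```

Only Q_1^{-1}, which is block-diagonal, and the small matrix \Lambda need to be inverted; the full Q^{-1} is never formed.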

On-Line Stage
• Q: Query + Pre-Computation → Result?
• A: SM lemma

On-Line Query Stage
• Six steps, q1 through q6
[Figure: the equation for each step]
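An end-to-end sketch of how a query could be answered from the pre-computed pieces using only matrix-vector products. It assumes the Sherman-Morrison form r = (1 - c)(Q1^-1 e + c Q1^-1 U Λ V Q1^-1 e) with Λ = (S^-1 - c V Q1^-1 U)^-1, and it is not claimed to match the slide's steps q1 to q6 one for one. The function names, truncated SVD as the low-rank step, c = 0.9 and the 6-node graph are all assumptions.

```python
import numpy as np

def precompute(W, labels, t, c=0.9):
    """Off-line stage: per-partition inverses plus a small core matrix."""
    n = W.shape[0]
    same = labels[:, None] == labels[None, :]
    W1 = np.where(same, W, 0.0)   # within-partition links
    W2 = W - W1                   # cross-partition links
    # Invert Q1 = I - c*W1 one partition at a time (stored densely for brevity).
    Q1_inv = np.zeros((n, n))
    for p in np.unique(labels):
        idx = np.flatnonzero(labels == p)
        block = np.eye(len(idx)) - c * W1[np.ix_(idx, idx)]
        Q1_inv[np.ix_(idx, idx)] = np.linalg.inv(block)
    # Rank-t approximation W2 ~ U S V (truncated SVD, an assumed choice).
    Uf, s, Vf = np.linalg.svd(W2)
    U, S, V = Uf[:, :t], np.diag(s[:t]), Vf[:t, :]
    Lam = np.linalg.inv(np.linalg.inv(S) - c * V @ Q1_inv @ U)
    return Q1_inv, U, Lam, V

def query(pre, i, c=0.9):
    """On-line stage: a handful of matrix-vector products per query."""
    Q1_inv, U, Lam, V = pre
    e = np.zeros(Q1_inv.shape[0])
    e[i] = 1.0
    r0 = Q1_inv @ e                          # walk restricted to within-partition links
    corr = Q1_inv @ (U @ (Lam @ (V @ r0)))   # correction through cross-partition links
    return (1.0 - c) * (r0 + c * corr)

# Hypothetical graph: two triangles {0,1,2} and {3,4,5} joined by the bridge 2 - 3.
A = np.zeros((6, 6))
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1.0
W = A / A.sum(axis=0)                        # column-normalized adjacency matrix
labels = np.array([0, 0, 0, 1, 1, 1])
pre = precompute(W, labels, t=2)
r = query(pre, i=0)
```

In this toy case the cross-partition matrix has rank 2, so t = 2 makes the result agree with the exact RWR solution; with a smaller t on a real graph the answer becomes an approximation.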

Experimental Setup
• Dataset
  – DBLP/authorship
  – Author-Paper
  – 315k nodes
  – 1,800k edges
• Approx. Quality: Relative Accuracy
• Application: Center-Piece Subgraph

Query Time vs. Pre-Compute Time
• Quality: 90%+
• On-line: up to 150x speedup
• Pre-computation: two orders of magnitude saving
[Figure: log query time vs. log pre-compute time]

Query Time vs. Pre-Storage
• Quality: 90%+
• On-line: up to 150x speedup
• Pre-storage: three orders of magnitude saving
[Figure: log query time vs. log storage]

Conclusion
• FastRWR
  – Reasonable quality preservation (90%+)
  – 150x speed-up: query time
  – Orders of magnitude saving: pre-compute & storage
• More in the paper
  – The variant of FastRWR and theoretical justification
  – Implementation details
    • normalization, low-rank approximation, sparse
  – More experiments
    • Other datasets, other applications

Q&A
Thank you!
htong@cs.cmu.edu
www.cs.cmu.edu/~htong