Distributed PageRank Computation Based on Iterative Aggregation-Disaggregation Methods
Yangbo Zhu, Shaozhi Ye and Xing Li, Tsinghua University, Beijing, China
ACM CIKM 2005, Bremen, Nov. 2, 2005
Outline
- Quick Review of PageRank
- Distributed PageRank Computation
  - Motivation
  - Basic Idea
  - Algorithm
- Experiments
- Conclusion and Future Work
PageRank - Background
Ranking Web pages
- Content-based methods
- Link-based methods
  - PageRank [Page & Brin, 1998]
  - HITS [Kleinberg, 1998]
  - SALSA [Lempel & Moran, 2000]
PageRank - Intuition
- A link from page A to page B means that the author of A recommends B.
- A page is of high quality if it is
  - referred to by many other pages
  - referred to by pages of high quality
PageRank - Model
- Random Surfer - Markov Chain
PageRank - Algorithm
- Power method
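
The formula on this slide did not survive extraction. As a rough illustration of the power method for the random-surfer model, here is a minimal pure-Python sketch. The damping factor 0.85, the uniform teleportation vector, the uniform treatment of dangling pages, and the names `pagerank`, `links`, `tol` are assumptions made for this example, not values taken from the slides.

```python
def pagerank(links, n_pages, damping=0.85, tol=1e-10, max_iter=1000):
    """Power method for PageRank on pages numbered 0..n_pages-1.

    links maps a page to the list of pages it points to.
    Assumptions: damping 0.85, uniform teleportation, and pages
    without out-links jump to a uniformly random page.
    """
    rank = [1.0 / n_pages] * n_pages
    for _ in range(max_iter):
        # Every page receives the teleportation share first.
        new = [(1.0 - damping) / n_pages] * n_pages
        for u in range(n_pages):
            out = links.get(u, [])
            if out:
                share = damping * rank[u] / len(out)
                for v in out:
                    new[v] += share
            else:
                # Dangling page: spread its rank over all pages.
                share = damping * rank[u] / n_pages
                for v in range(n_pages):
                    new[v] += share
        # Stop when the L1 distance between iterates is small.
        delta = sum(abs(a - b) for a, b in zip(new, rank))
        rank = new
        if delta < tol:
            break
    return rank
```

For example, `pagerank({0: [1, 2], 1: [2], 2: [0]}, 3)` returns a vector that sums to 1 and gives page 2 a higher score than page 1, because page 2 is linked from both other pages.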
Motivation
- Compass search engine confederation
Basic Idea
- Divide and conquer
- Make use of the natural block structure of web graphs
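
One way to expose the block structure is to group pages by web site, which is also how the experiments partition the graph. The sketch below is illustrative only: the helper names `group_by_site` and `split_links` are hypothetical, and using the URL hostname as the site key is an assumption.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def group_by_site(urls):
    """Group page URLs into blocks keyed by hostname (assumed site key)."""
    sites = defaultdict(list)
    for url in urls:
        sites[urlsplit(url).hostname].append(url)
    return dict(sites)


def split_links(edges):
    """Separate (source, target) URL pairs into intra-site and inter-site links."""
    intra, inter = [], []
    for src, dst in edges:
        same_site = urlsplit(src).hostname == urlsplit(dst).hostname
        (intra if same_site else inter).append((src, dst))
    return intra, inter
```

Intra-site links stay inside one block and can be handled by the local node that owns the site, while inter-site links are the part of the graph that has to be coordinated across nodes.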
DPC Algorithm
- Step 1 - Initialization: Local nodes compute local PageRank vectors.
- Step 2 - Aggregation: The central node computes the NodeRank vector.
- Step 3 - Disaggregation: Local nodes compute extended local PageRank vectors. (X: external nodes)
- Step 4 - The central node computes the L1 distance between the current global PageRank vector and the previous one.
Advantages
- DPC mainly consists of standard PageRank computation.
- Small matrices fit into main memory.
- Low communication overhead.
Experimental Setup
- Simulation on a single Linux box.
- Group web pages by sites.
- For comparison
  - Classic power method
  - LPR-Ref-2 algorithm in [Wang, VLDB 2004]
Data Sets
- ST01/03 - crawled in 2001/2003 by the Stanford WebBase Project
- CN04 - crawled in 2004 from web sites in China
Evaluation Metrics
- L1 distance
- Kendall's τ-distance, which counts the pairs of pages i and j that are in different order in the two ranking lists
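
Both metrics are short to compute. The sketch below is one plausible reading of them, since the slide's formulas were lost: the Kendall distance is normalized by the total number of pairs, and tied pairs are not counted as disagreements. Both choices, and the function names, are assumptions.

```python
from itertools import combinations


def l1_distance(x, y):
    """Sum of absolute differences between two score vectors."""
    return sum(abs(a - b) for a, b in zip(x, y))


def kendall_tau_distance(x, y):
    """Fraction of page pairs ordered differently by score vectors x and y.

    Assumptions: normalized by n*(n-1)/2, ties are not disagreements.
    This is the O(n^2) definition, fine only for small examples.
    """
    n = len(x)
    if n < 2:
        return 0.0
    discordant = sum(
        1
        for i, j in combinations(range(n), 2)
        if (x[i] - x[j]) * (y[i] - y[j]) < 0
    )
    return discordant / (n * (n - 1) / 2)
```

For two four-page rankings that differ only by swapping the top two pages, one of the six pairs is discordant, so `kendall_tau_distance` returns 1/6.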
Accuracy of the First Iteration
- L1 distance
- Kendall's τ-distance
Convergence Rate
Number of iterations for convergence
Conclusion
- A distributed PageRank computation algorithm based on iterative aggregation-disaggregation (IAD) methods with Block Jacobi smoothing.
- Experiments on real web graphs show that DPC outperforms LPR-Ref-2 [Wang, VLDB'04] and converges 5~7 times faster than the power method.
Future Work
- Implement DPC in a distributed system.
- Integrate with the Compass search engine confederation.
- How to update PageRank vectors efficiently within the DPC framework?
Thank you!
General PageRank Algorithm
IAD Method - Notations
- Aggregation matrix (n×N)
- Disaggregation matrix (N×n)
IAD Method
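
The formulas for this slide were lost in extraction. The following is a generic sketch of one IAD iteration for a Markov chain, not the DPC algorithm itself: it aggregates N pages into n blocks, solves the small n×n chain, disaggregates, and then smooths with a single power step instead of the Block Jacobi smoothing the slides mention. The row-vector convention, the damping factor 0.85, and all function names are assumptions for illustration.

```python
def google_matrix(links, n_pages, damping=0.85):
    """Dense row-stochastic matrix; G[u][v] is the probability of moving u -> v."""
    G = []
    for u in range(n_pages):
        out = links.get(u, [])
        if out:
            row = [(1.0 - damping) / n_pages] * n_pages
            for v in out:
                row[v] += damping / len(out)
        else:
            # Assumption: dangling pages jump uniformly.
            row = [1.0 / n_pages] * n_pages
        G.append(row)
    return G


def stationary(M, x, tol=1e-12, max_iter=10000):
    """Power iteration for the row vector x with x = x M."""
    n = len(M)
    for _ in range(max_iter):
        y = [sum(x[u] * M[u][v] for u in range(n)) for v in range(n)]
        if sum(abs(a - b) for a, b in zip(x, y)) < tol:
            return y
        x = y
    return x


def iad_step(G, blocks, x):
    """One aggregation, disaggregation and smoothing sweep on a positive vector x."""
    n_blocks = len(blocks)
    mass = [sum(x[u] for u in b) for b in blocks]
    # Aggregation: weight each block's rows by the normalized current vector.
    A = [
        [
            sum(x[u] / mass[i] * G[u][v] for u in blocks[i] for v in blocks[j])
            for j in range(n_blocks)
        ]
        for i in range(n_blocks)
    ]
    z = stationary(A, mass)  # block-level ranks of the aggregated chain
    # Disaggregation: rescale each block so its total mass matches z.
    y = list(x)
    for i, b in enumerate(blocks):
        for u in b:
            y[u] = z[i] * x[u] / mass[i]
    # Smoothing (assumption: one power step, not Block Jacobi).
    n = len(G)
    return [sum(y[u] * G[u][v] for u in range(n)) for v in range(n)]
```

A driver would start from a uniform vector and repeat `iad_step` until the L1 distance between successive vectors falls below a threshold. Because convergence of such schemes is not guaranteed in general, the loop should also cap the number of iterations. The exact stationary vector is a fixed point of `iad_step`, which gives a simple sanity check.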
DPC Algorithm
DPC Convergence Analysis
- The global convergence of the IAD method is still an open problem.
- The difficulty partly comes from the fact that the disaggregation step is non-linear.
- The paper proves the global convergence of the Block Jacobi method in the PageRank scenario when n > 2.
Experiments - Basic Facts
- Distribution over size of sites
- Distribution over number of pages hosted by sites of different size
Experiments - Communication Overhead
- Power method vs. LPR-Ref-2 / DPC
- Pos(•) - number of positive elements
- L/U - block strictly lower/upper triangular part of P