Exploiting the Hierarchical Structure for Link Analysis
Gui-Rong Xue, Qiang Yang, Hua-Jun Zeng, Yong Yu, Zheng Chen
Presented by: Xiaoguang Qi
2005-10-18

Introduction
  • Existing link analysis algorithms often suffer from two problems
      – Sparsity of the link graph
      – Biased ranking of newly emerging pages
  • Incorporate the inherent hierarchical structure of the web into link analysis to deal with these problems

Sketch of Hierarchical Ranking Algorithm
  1. Web pages are aggregated based on their hierarchical structure at the directory, host, or domain level
  2. Link analysis is performed on the aggregated graph
  3. The importance of each node on the aggregated graph is distributed to the individual pages belonging to that node

Two-Layer Hierarchical Graph
  • Upper-layer graph
      – Partition the page set on a certain level
      – One supernode for each partition
      – Edges between supernodes are weighted
          • Weight(Si, Sj) = # links from pages in Si to pages in Sj
  • Lower-layer graph
      – All the pages within a supernode are organized in a hierarchical structure based on the URL relationship
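
The upper-layer construction can be illustrated with a minimal Python sketch. It assumes host-level partitioning and a link list of (source URL, destination URL) pairs. The function names `supernode_of` and `build_upper_layer` and the example URLs are hypothetical, not taken from the paper. Links inside one supernode are kept as self-loops here, since the slide does not say whether they are counted.

```python
from collections import Counter
from urllib.parse import urlsplit


def supernode_of(url: str) -> str:
    """Map a page URL to its supernode. Assumption: partition by host."""
    return urlsplit(url).netloc.lower()


def build_upper_layer(page_links):
    """Weight(Si, Sj) = number of links from pages in Si to pages in Sj."""
    weights = Counter()
    for src, dst in page_links:
        weights[(supernode_of(src), supernode_of(dst))] += 1
    return weights


# Hypothetical page-level links, for illustration only.
links = [
    ("http://a.example/x.html", "http://b.example/"),
    ("http://a.example/y/z.html", "http://b.example/p.html"),
    ("http://b.example/p.html", "http://a.example/x.html"),
    ("http://a.example/x.html", "http://a.example/y/z.html"),
]
upper = build_upper_layer(links)
```

Switching to directory- or domain-level aggregation would only require a different `supernode_of`.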

Hierarchical Random Walk Model
  • Surf on the lower-layer graph
      – Go to another page within the current supernode
  • Surf on the upper-layer graph
      – Follow a link originating from the current supernode
      – Jump to a random supernode

Calculating Supernode Importance
  • Supernode importance
  • In matrix form
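
The slide's formulas are not preserved in this transcript. Purely as an illustration, the sketch below runs a weighted PageRank-style power iteration over the supernode graph: with probability `damping` the surfer follows an outgoing edge chosen in proportion to its weight, and otherwise jumps to a random supernode. The function name, the `damping` parameter and its default of 0.85, the uniform handling of supernodes without outgoing edges, and the toy graph are all assumptions, not the authors' formulation.

```python
def supernode_importance(weights, damping=0.85, iterations=100):
    """Weighted PageRank-style iteration; weights maps (src, dst) -> positive count."""
    nodes = sorted({n for edge in weights for n in edge})
    out_total = {n: 0.0 for n in nodes}
    for (src, _dst), w in weights.items():
        out_total[src] += w

    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Random-jump share, plus mass from supernodes with no outgoing
        # edges, spread uniformly over all supernodes.
        dangling = sum(rank[n] for n in nodes if out_total[n] == 0)
        base = (1.0 - damping + damping * dangling) / len(nodes)
        nxt = {n: base for n in nodes}
        # Link-following share, proportional to edge weight.
        for (src, dst), w in weights.items():
            nxt[dst] += damping * rank[src] * w / out_total[src]
        rank = nxt
    return rank


# Hypothetical supernode graph, for illustration only.
weights = {("A", "B"): 2, ("B", "A"): 1, ("B", "C"): 1, ("C", "A"): 3}
rank = supernode_importance(weights)
```

The returned scores sum to 1, so they can be read as the fraction of time the surfer spends at each supernode.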

Calculating Page Importance
  • Constructing weighted tree structure
  • Calculating page importance by DHC

Parameter Tuning
  • Aggregation level
      – Host-level aggregation is the best choice
  • Parameter tuning
      – θ = 0.6
      – α = 0.6
      – β = 0.4
      – γ = 0.8

Experimental Results
  • Hierarchical ranking algorithm consistently outperforms other well-known ranking algorithms
      – BM2500, BlockRank, PageRank, LayerRank, WeightedRank, HostRanking
  • On sparse data
      – Effectively alleviates the sparse link problem

Experimental Results (Cont.)
  • Ranking of new pages
      – Aim: to assign reasonable rank to newly emerging web pages
      – Test in an analogous way
          • Test set: 10,000 pages randomly selected with different rank values
          • Remove 90% of their incoming links
          • Perform algorithms on the modified graph
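
The link-removal step of this test could be set up roughly as follows. This is a sketch under stated assumptions: the 90% is applied to each test page separately (the slide does not say whether removal is per page or over the pooled links), and the function name `remove_incoming_links`, the fixed seed, and the toy data are hypothetical.

```python
import random


def remove_incoming_links(links, test_pages, fraction=0.9, seed=0):
    """Drop a fraction of the incoming links of each test page (per-page assumption)."""
    rng = random.Random(seed)
    test_pages = set(test_pages)

    # Group the indices of incoming links by the test page they point to.
    incoming = {}
    for idx, (_src, dst) in enumerate(links):
        if dst in test_pages:
            incoming.setdefault(dst, []).append(idx)

    dropped = set()
    for idxs in incoming.values():
        dropped.update(rng.sample(idxs, round(fraction * len(idxs))))
    return [link for i, link in enumerate(links) if i not in dropped]


# Hypothetical graph: page "t" is in the test set, page "u" is not.
links = [(f"s{i}", "t") for i in range(10)] + [(f"s{i}", "u") for i in range(5)]
modified = remove_incoming_links(links, ["t"])
```

In this toy graph, "t" keeps one of its ten incoming links and "u" keeps all five. The ranking algorithms would then be run on `modified`, and the ranks of the test pages compared with their ranks on the full graph.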