PageRank & Random Walk

“The importance of a Web page depends on the reader's interest, knowledge and attitudes…” – Larry Page, Co-Founder of Google

Presentation Outline
1. Introduction on PageRank
2. Calculation of PageRank on Webpage
3. Original algorithm on PageRank
4. Modifications of the original algorithm
5. Result on the modifications
6. Applications of PageRank

Introduction on PageRank
• PageRank is a link analysis algorithm … with the purpose of "measuring" its (the Webpage's) relative importance within the set. – From Wikipedia, the free encyclopedia
• Developed by Larry Page as his Ph.D. research topic
• 3 years later, he quit Stanford and founded Google with Brin
• He did not complete his Ph.D. In return, his net worth now is …

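Introduction on PageRank
• PageRank = Importance of the Webpage
• Concept is simple: a page's PageRank is the sum of the shares passed to it along its incoming links
[Figure: a page (Bloomberg) with PageRank = 60 and three links each carrying 20; another page receiving shares of 20 and 10 has PageRank = 20 + 10 = 30]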

Introduction on PageRank
[Figure: an example of a Webpage system]

Calculation of PageRank on Webpage
[Figure: link graph of four Webpages A, B, C and D]

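Calculation of PageRank on Webpage
• R(.) = PageRank of a Webpage
1. R(A) = 100%R(B) + 50%R(C)
2. R(B) = 50%R(C) + 100%R(D)
3. R(C) = 100%R(A)

\[
\begin{pmatrix} 1 & -1 & -0.5 & 0 \\ 0 & 1 & -0.5 & -1 \\ -1 & 0 & 1 & 0 \end{pmatrix}
\begin{pmatrix} A \\ B \\ C \\ D \end{pmatrix}
=
\begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}
\]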

Calculation of PageRank on Webpage
• Let A = 50, then B = 25, C = 50 and D = 0
• Normalize the PageRank by dividing each value by the total, A+B+C+D = 50+25+50+0 = 125
• Therefore,
– A = 0.4
– B = 0.2
– C = 0.4
– D = 0
• In general: [equation not preserved]
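The general rule is not preserved in this transcript. Judging from the three equations, each page appears to receive, from every page that links to it, that page's PageRank divided by its number of outgoing links. The sketch below checks the worked example under that reading. The `links` dictionary is an assumption reconstructed from the equations (the original figure is lost), and `incoming_rank` is a hypothetical helper name. The values are normalized by their sum so that the results add up to 1.

```python
from fractions import Fraction

# Assumed link graph, inferred from the three equations:
# each key links to the pages in its list.
links = {"A": ["C"], "B": ["A"], "C": ["A", "B"], "D": ["B"]}


def incoming_rank(page, rank):
    """Sum R(v) / N_v over every page v that links to `page`."""
    return sum(
        Fraction(rank[v], len(outs))
        for v, outs in links.items()
        if page in outs
    )


# Unnormalized solution obtained by fixing A = 50.
rank = {"A": 50, "B": 25, "C": 50, "D": 0}

# Each page's rank must equal the rank flowing into it.
for page in links:
    assert incoming_rank(page, rank) == rank[page]

# Normalize so that the four values sum to 1.
total = sum(rank.values())
normalized = {p: Fraction(r, total) for p, r in rank.items()}

print(total)                                          # 125
print({p: float(x) for p, x in normalized.items()})   # A: 0.4, B: 0.2, C: 0.4, D: 0.0
```

Exact fractions are used here only to keep the equality checks free of rounding error.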

Calculation of PageRank on Webpage
• There are 2 PROBLEMS!
– Problem 1: What if there are over 280,000 Web-pages, over 3 million hyperlinks and the …?
– Problem 2: The PageRank of D = 0. It will be a bias, and a rank sink may appear

Original algorithm on PageRank
• In order to tackle the 2 problems, a calculation algorithm was introduced: [equation not preserved]
• Where:
– c - Normalization factor
– N_v - No. of links on the page v
– E - A factor to tackle rank sink
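The equation on this slide did not survive, but the symbols listed match the definition given in the original PageRank paper by Page, Brin, Motwani and Winograd. Assuming the slide follows that paper, and writing B_u (a symbol not present in the slide) for the set of pages that link to u, the definition reads:

```latex
\[
R'(u) \;=\; c \sum_{v \in B_u} \frac{R'(v)}{N_v} \;+\; c\,E(u)
\]
```

In the paper, c is taken as large as possible subject to the entries of R' summing to 1, and E(u) is a source of rank spread over the pages, which is what keeps a rank sink from absorbing all of the rank.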

Original algorithm on PageRank
• Multiply R_k by the matrix A to form R_{k+1} (i.e. A R_k = R_{k+1})
• A is a square matrix.
– A_{v,u} = 1/N_u if there is an edge from u to v (row v collects the rank flowing into page v).
– A_{v,u} = 0 if there is no edge from u to v.
• R = cAR, so R is an eigenvector of A with eigenvalue 1/c
• We can treat c = 1/normalization factor, and R is the PageRank vector
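As a minimal sketch of the matrix step, the code below builds A for a small hypothetical four-page graph and applies one multiplication. The index order is an assumption: the entry for a link from u to v is stored in row v, column u, so that multiplying by A sums the rank flowing into each page. The `links` dictionary and the `step` helper are illustrative names, not part of the slides.

```python
# Hypothetical four-page link graph (page -> pages it links to).
links = {"A": ["C"], "B": ["A"], "C": ["A", "B"], "D": ["B"]}
pages = sorted(links)
index = {p: i for i, p in enumerate(pages)}
n = len(pages)

# A[v][u] = 1/N_u when u links to v, else 0 (assumed index order).
A = [[0.0] * n for _ in range(n)]
for u, outs in links.items():
    for v in outs:
        A[index[v]][index[u]] = 1.0 / len(outs)


def step(A, R):
    """One multiplication R_{k+1} = A R_k."""
    return [sum(a * r for a, r in zip(row, R)) for row in A]


R0 = [0.25, 0.25, 0.25, 0.25]   # uniform starting vector
R1 = step(A, R0)
print(R1)   # [0.375, 0.375, 0.25, 0.0]
```

Every column of this A sums to 1 because every page has at least one outgoing link, so the total rank is unchanged by the step.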

Original algorithm on PageRank
• The algorithm is: [algorithm listing not preserved]
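The listing itself is not preserved in this transcript. Assuming the slide reproduced the iterative loop from the original PageRank paper (multiply by A, measure the rank lost in that step, add it back in proportion to E, repeat until the change is below a threshold), a sketch could look like this. The function name `pagerank`, the uniform default for `E`, the uniform starting vector, and the `eps` and `max_iter` values are all assumptions for illustration.

```python
def pagerank(links, E=None, eps=1e-10, max_iter=1000):
    """Iterate R_{i+1} = A R_i + d E until the L1 change is at most eps."""
    pages = sorted(links)
    n = len(pages)
    if E is None:
        E = {p: 1.0 / n for p in pages}   # assumed: uniform rank source
    R = {p: 1.0 / n for p in pages}       # assumed: uniform starting vector

    for _ in range(max_iter):
        # R_{i+1} <- A R_i: each page splits its rank over its outgoing links.
        new = {p: 0.0 for p in pages}
        for u, outs in links.items():
            for v in outs:
                new[v] += R[u] / len(outs)

        # d <- ||R_i||_1 - ||R_{i+1}||_1: rank lost, e.g. at pages without links.
        d = sum(R.values()) - sum(new.values())

        # R_{i+1} <- R_{i+1} + d E: return the lost rank through E.
        for p in pages:
            new[p] += d * E[p]

        # delta <- ||R_{i+1} - R_i||_1
        delta = sum(abs(new[p] - R[p]) for p in pages)
        R = new
        if delta <= eps:
            break
    return R


# Hypothetical four-page graph (page -> pages it links to).
links = {"A": ["C"], "B": ["A"], "C": ["A", "B"], "D": ["B"]}
ranks = pagerank(links)
print({p: round(r, 4) for p, r in ranks.items()})
```

On this particular graph no rank is lost, so d stays at 0 and E has no effect; E only matters when some page has no outgoing links.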

Modifications of the original algorithm
• The original algorithm is not efficient in run time
• This is because Webpages with low PageRank converge faster, while those with high PageRank take more iterations to converge

Modifications of the original algorithm
• Modification 1
– Main concept: Webpages whose PageRank has already converged can be ignored in later iterations
– Therefore we separate the matrix and vector into 2 parts
– N = not yet converged; C = converged
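The idea of splitting the pages into N and C can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function name `adaptive_pagerank`, the absolute tolerance `tol`, and the `warmup` parameter are hypothetical. The warm-up sweeps are there because a page whose value happens not to change in an early sweep would otherwise be frozen at a wrong value.

```python
def adaptive_pagerank(links, tol=1e-12, warmup=5, max_sweeps=1000):
    """Power iteration that stops updating pages once they have converged."""
    pages = sorted(links)
    incoming = {p: [] for p in pages}
    for u, outs in links.items():
        for v in outs:
            incoming[v].append(u)

    R = {p: 1.0 / len(pages) for p in pages}
    N = set(pages)   # pages not yet converged; every other page is in C
    sweeps = 0
    while N and sweeps < max_sweeps:
        sweeps += 1
        new = dict(R)   # pages in C keep their value untouched
        for v in N:
            new[v] = sum(R[u] / len(links[u]) for u in incoming[v])
        if sweeps > warmup:
            # Move pages whose value stopped changing from N to C.
            N = {v for v in N if abs(new[v] - R[v]) > tol}
        R = new
    return R, sweeps


# Hypothetical four-page graph (page -> pages it links to).
links = {"A": ["C"], "B": ["A"], "C": ["A", "B"], "D": ["B"]}
R, sweeps = adaptive_pagerank(links)
print({p: round(r, 4) for p, r in R.items()}, sweeps)
```

The saving comes from the inner loop running over N only, which shrinks as pages converge.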

Modifications of the original algorithm
• Modification 2
– Disadvantage of modification 1: reordering the matrix A is expensive
– Set A_C to 0
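One reading of this step is that, instead of reordering A, the rows belonging to converged pages are overwritten with zeros, and the converged values are added back after the multiplication. The sketch below illustrates that reading with a small dense matrix. The helper names `zero_converged_rows` and `filtered_step`, the example vector, and the choice of which page counts as converged are all hypothetical.

```python
def zero_converged_rows(A, converged):
    """Return a copy of A whose rows for converged pages are all zero."""
    return [
        [0.0] * len(row) if i in converged else list(row)
        for i, row in enumerate(A)
    ]


def filtered_step(A_prime, R, converged):
    """R_{k+1} = A' R_k + R_C, where R_C holds only the converged entries."""
    R_C = [r if i in converged else 0.0 for i, r in enumerate(R)]
    return [
        sum(a * r for a, r in zip(row, R)) + rc
        for row, rc in zip(A_prime, R_C)
    ]


# Hypothetical matrix for four pages in the order A, B, C, D,
# with the entry for a link u -> v stored in row v, column u.
A = [
    [0.0, 1.0, 0.5, 0.0],
    [0.0, 0.0, 0.5, 1.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
R = [0.5, 0.125, 0.375, 0.0]
converged = {0}   # pretend the first page has already converged

A_prime = zero_converged_rows(A, converged)
R_next = filtered_step(A_prime, R, converged)
print(R_next)   # [0.5, 0.1875, 0.5, 0.0]
```

With a sparse matrix, a zeroed row costs nothing to multiply, which is where the saving would come from; the dense lists here are only for readability.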

Result on the modifications

Applications of PageRank
• Search engine
• Type 1 – Title search
– Finds all the Webpages whose titles contain all of the query words, then sorts the results by PageRank
• Type 2 – Google
– Full-text search engine using PageRank
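The title search described above can be sketched in a few lines. The page records, their PageRank values, and the function name `title_search` are made up for illustration. Matching is done on whole lowercase words, which is an assumption about how "contain" is meant.

```python
def title_search(query, pages):
    """Return pages whose title contains every query word, best PageRank first."""
    words = query.lower().split()
    hits = [
        page for page in pages
        if all(word in page["title"].lower().split() for word in words)
    ]
    return sorted(hits, key=lambda page: page["pagerank"], reverse=True)


# Hypothetical index entries with made-up PageRank values.
pages = [
    {"title": "Random Walk Basics", "pagerank": 0.2},
    {"title": "PageRank and the Random Walk", "pagerank": 0.4},
    {"title": "Link Analysis", "pagerank": 0.4},
]

results = title_search("random walk", pages)
print([page["title"] for page in results])
# ['PageRank and the Random Walk', 'Random Walk Basics']
```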