Introduction to PageRank Algorithm and Programming Assignment 1
CSC 4170 Web Intelligence and Social Computing
Tutorial 4
Tutor: Tom Chao Zhou
Email: czhou@cse.cuhk.edu.hk

Outline
- Background
- Markov Chains
- PageRank Computation
- Exercise on PageRank
- Example of Programming Assignment
- QA

Background
- History:
  - Proposed by Sergey Brin and Lawrence Page (the co-founders of Google) in 1998 at Stanford.
  - Algorithm of the first generation of the Google search engine.
  - “The Anatomy of a Large-Scale Hypertextual Web Search Engine”.
- Target:
  - Measure the importance of a Web page based on the link structure alone.
  - Assign each node a numerical score between 0 and 1: its PageRank.
  - Rank Web pages based on their PageRank values.

Background
- Scenario:
  - A random surfer begins at a Web page A.
  - The surfer executes a random walk from A to a randomly chosen Web page that A hyperlinks to, and repeats this from each page reached.
  - Some nodes are visited more often. Intuitively, these are nodes with many links coming in from other frequently visited nodes.
- Idea:
  - Pages visited more often in this walk are more important.
[Figure: example Web graph with nodes A, B, C, D]

Background
- Problem: What if the current location of the surfer, e.g., node A, has no out-links?
- Teleport operation:
  - The surfer jumps from a node to any other node in the Web graph, e.g., by typing an address into the URL bar.
  - The destination of a teleport operation is chosen uniformly at random from all Web pages: each with probability 1/N.
- PageRank scheme:
  - At a node with no out-links: teleport operation.
  - At a node with out-links: teleport operation with probability 0 < α < 1, and the standard random walk with probability 1 - α. α is a fixed parameter chosen in advance.
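
The scheme above can be sketched as a single step of the surfer's walk. This is a minimal illustration, not part of the original slides: the function name `next_page` and the dictionary `out_links` (page index to list of linked pages) are hypothetical, and pages are assumed to be numbered 0 to N-1. The teleport destination is drawn from all N pages, so it may coincide with the current page.

```python
import random

def next_page(current, out_links, n_pages, alpha, rng=random):
    """Return the surfer's next page under the teleport scheme.

    out_links maps a page index to the list of pages it links to
    (a hypothetical representation chosen for this sketch).
    """
    links = out_links.get(current, [])
    # No out-links: always teleport. Otherwise teleport with probability alpha.
    if not links or rng.random() < alpha:
        # Uniform over all N pages, i.e. probability 1/N each.
        return rng.randrange(n_pages)
    # Standard random walk: follow one out-link chosen uniformly at random.
    return rng.choice(links)

# Example: page 2 has no out-links, so every step from it is a teleport.
rng = random.Random(0)
links = {0: [1], 1: [0, 2], 2: []}
page = 0
for _ in range(5):
    page = next_page(page, links, 3, 0.5, rng)
```

Passing a seeded `random.Random` instance keeps a run reproducible, which helps when debugging a simulation.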

Markov Chains
- Markov chain:
  - A Markov chain is a discrete-time stochastic process consisting of N states; each Web page corresponds to a state.
  - A Markov chain is characterized by an N×N transition probability matrix P.
- Transition probability matrix:
  - Each entry is in the interval [0, 1].
  - P_ij is the probability that the state at the next time-step is j, conditioned on the current state being i.
  - Each entry P_ij is known as a transition probability and depends only on the current state i (the Markov property).

Markov Chains
- Transition probability matrix:
  - A matrix with non-negative entries that satisfies Σ_j P_ij = 1 for every row i (each row sums to 1) is known as a stochastic matrix.
  - It has a principal left eigenvector corresponding to its largest eigenvalue, which is 1.
- Derive the transition probability matrix P:
  - Build the adjacency matrix A of the Web graph: if there is a hyperlink from page i to page j, A_ij = 1; otherwise A_ij = 0.
  - Divide each 1 in A by the number of 1s in its row.
  - Multiply the resulting matrix by 1 - α.
  - Add α/N to every entry of the resulting matrix to obtain P.
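
The derivation steps can be sketched in a few lines of Python. The function name `transition_matrix` and the list-of-rows representation are assumptions made for illustration. The slide's steps do not say what to do with a row that contains no 1s; the sketch assumes that such a node always teleports, as in the PageRank scheme, so the row becomes 1/N in every entry.

```python
def transition_matrix(adj, alpha):
    """Build P from a 0/1 adjacency matrix given as a list of rows."""
    n = len(adj)
    P = []
    for row in adj:
        out_degree = sum(row)
        if out_degree == 0:
            # Assumption: a node with no out-links always teleports,
            # so every destination has probability 1/N.
            P.append([1.0 / n] * n)
        else:
            # Divide each 1 by the row count, scale by (1 - alpha),
            # then add alpha/N to every entry.
            P.append([(1 - alpha) * a / out_degree + alpha / n for a in row])
    return P

# Two pages: page 0 links to page 1, page 1 has no out-links.
P = transition_matrix([[0, 1], [0, 0]], alpha=0.5)
for row in P:
    print(row)
```

Every row of the result sums to 1, so the matrix is stochastic in the sense defined above.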

Markov Chains
- Ergodic Markov chain:
  - Conditions:
    - Irreducibility: there is a sequence of transitions of nonzero probability from any state to any other state.
    - Aperiodicity: the states are not partitioned into sets such that all state transitions occur cyclically from one set to another.
  - Property:
    - There is a unique steady-state probability vector π that is the principal left eigenvector of P.
    - η(i, t) is the number of visits to state i in t steps.
    - π(i) > 0 is the steady-state probability for state i.
    - lim (t → ∞) η(i, t)/t = π(i).
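
For an ergodic chain, the fraction of visits η(i, t)/t is expected to approach π(i) as t grows. The simulation below is a rough sketch of that behaviour, not part of the original slides. The function name `visit_frequencies` and the 2-state matrix are made up for illustration; for this matrix the steady-state vector works out to (5/6, 1/6).

```python
import random

def visit_frequencies(P, steps, start=0, seed=0):
    """Simulate the chain and return eta(i, t) / t for every state i."""
    rng = random.Random(seed)
    n = len(P)
    visits = [0] * n
    state = start
    for _ in range(steps):
        # Pick the next state according to row `state` of P.
        state = rng.choices(range(n), weights=P[state])[0]
        visits[state] += 1
    return [v / steps for v in visits]

# Hypothetical ergodic chain; solving pi P = pi by hand gives (5/6, 1/6).
P = [[0.9, 0.1],
     [0.5, 0.5]]
print(visit_frequencies(P, 100_000))
```

The printed frequencies vary slightly with the seed and the number of steps, so they should be read as estimates of π rather than exact values.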

PageRank Computation
- Target:
  - Solve for the steady-state probability vector π; π(i) is the PageRank of the corresponding Web page i.
  - πP = λπ, where λ = 1 for a stochastic matrix.
- Method:
  - Power iteration.
  - Start from an initial probability distribution vector x0.
  - Compute x0 P = x1, x1 P = x2, ... until the probability distribution converges (the variation in the computed values is below some predetermined threshold).
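
A minimal sketch of power iteration in plain Python follows. The uniform starting vector, the tolerance, the iteration cap, and the function name `power_iteration` are all assumptions chosen for illustration; the slides only require some initial distribution x0 and some predetermined threshold.

```python
def power_iteration(P, tol=1e-10, max_iter=10_000):
    """Approximate the steady-state vector pi with pi P = pi."""
    n = len(P)
    x = [1.0 / n] * n  # x0: uniform distribution (an assumed choice)
    for _ in range(max_iter):
        # Row vector times matrix: x_next[j] = sum_i x[i] * P[i][j]
        x_next = [sum(x[i] * P[i][j] for i in range(n)) for j in range(n)]
        # Stop once no component changes by more than the threshold.
        if max(abs(a - b) for a, b in zip(x, x_next)) < tol:
            return x_next
        x = x_next
    return x

# Hypothetical 2-state stochastic matrix; by hand, pi = (1/3, 2/3).
P = [[0.5, 0.5],
     [0.25, 0.75]]
print(power_iteration(P))
```

Because each x_k is multiplied on the left of P, the iteration converges to the left eigenvector, which is the one that defines PageRank.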

Exercise on PageRank
Consider a Web graph with three nodes 1, 2, and 3. The links are as follows: 1->2, 3->2, 2->1, 2->3. Write down the transition probability matrix P for the surfer's walk with teleporting, with teleport probability α = 0.5.

Adjacency matrix A:

    0    1    0
    1    0    1
    0    1    0

Each 1 is divided by the number of ones in its row:

    0    1    0
    1/2  0    1/2
    0    1    0

Multiply by (1 - α) and add α * 1/3 to every entry:

        1/6   2/3   1/6
    P = 5/12  1/6   5/12
        1/6   2/3   1/6
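
The arithmetic of this exercise can be checked with exact fractions. The sketch below is an illustration added here, with variable names chosen for convenience. It rebuilds P from the adjacency matrix of the exercise and then confirms that the vector (5/18, 4/9, 5/18), obtained by solving πP = π by hand, is left unchanged by P.

```python
from fractions import Fraction

alpha = Fraction(1, 2)
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
n = len(A)

# Divide each 1 by its row count, scale by (1 - alpha), add alpha / N.
P = [[(1 - alpha) * Fraction(a, sum(row)) + alpha / n for a in row]
     for row in A]
for row in P:
    print(" ".join(str(p) for p in row))

# Candidate steady-state vector, solved by hand from pi P = pi.
pi = [Fraction(5, 18), Fraction(4, 9), Fraction(5, 18)]
pi_next = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
print(pi_next == pi)
```

Using `Fraction` avoids rounding, so the rows can be compared directly with the hand-computed values 1/6, 2/3, and 5/12.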

Example of Programming Assignment
- Input:
  - 3
  - 01
  - 10000 1
- Output:
  - 0
  - 0.5
  - 0
[Figure: example graph with nodes 1, 2, and 3; the labels 5 and 10000 appear in the figure]

Example of Programming Assignment
Shortest paths in the example graph:

    From left node to right node    Shortest path
    12                              12
    13                              123 (node 2 is on this path)
    23                              23
    21                              none
    32                              none

C_B(2) = σ_13(2)/σ_13 + σ_31(2)/σ_31 = 1/1 + 0 = 1
C_B'(2) = C_B(2) / ((3 - 1)(3 - 2)) = 1/2 = 0.5
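
The calculation above can be reproduced with a short sketch. It is not the assignment's reference solution, and it rests on several assumptions: the directed edges 1->2 and 2->3 are inferred from the shortest-path table, edge weights are ignored (every edge counts as one hop), and the function names and the adjacency-dictionary input are hypothetical rather than the assignment's input format. The graph needs at least three nodes for the normalization to be defined.

```python
from collections import deque

def shortest_path_counts(adj, source):
    """BFS from source: hop distance and number of shortest paths per node."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj.get(u, []):
            if w not in dist:
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:
                sigma[w] += sigma[u]
    return dist, sigma

def betweenness(nodes, adj):
    """Normalized betweenness C_B'(v) for a directed, unweighted graph."""
    info = {s: shortest_path_counts(adj, s) for s in nodes}
    n = len(nodes)
    result = {}
    for v in nodes:
        total = 0.0
        dist_v, sigma_v = info[v]
        for s in nodes:
            dist_s, sigma_s = info[s]
            for t in nodes:
                if len({s, t, v}) < 3:
                    continue  # s, t, v must be three distinct nodes
                if t not in dist_s or v not in dist_s or t not in dist_v:
                    continue  # no s->t path, or v cannot lie on one
                if dist_s[v] + dist_v[t] == dist_s[t]:
                    # sigma_st(v) / sigma_st
                    total += sigma_s[v] * sigma_v[t] / sigma_s[t]
        result[v] = total / ((n - 1) * (n - 2))
    return result

# Edges inferred from the table: 1->2 and 2->3.
print(betweenness([1, 2, 3], {1: [2], 2: [3]}))
```

For this graph the values for nodes 1, 2, and 3 agree with the sample output 0, 0.5, 0.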

Reference
- http://infolab.stanford.edu/~backrub/google.html

Questions?