28. PageRank
Google PageRank

Quantifying Importance
How do you rank web pages for importance given that you know the link structure of the Web, i.e., the in-links and out-links for each web page?
A related question: How does a deleted or added link on a webpage affect its “rank”?
Insight Through Computing

Background
Index all the pages on the Web from 1 to n. (n is around ten billion.)
The PageRank algorithm orders these pages from “most important” to “least important.”
It does this by analyzing links, not content.

Key ideas
  • There is a random web surfer: a special random walk
  • The surfer has some random “surfing” behavior: a transition probability matrix
  • The transition probability matrix comes from the link structure of the web: a connectivity matrix
  • Applying the transition probability matrix: PageRank

A 3-node network with specified transition probabilities
[Figure: nodes 1, 2, and 3 joined by directed edges, with callouts “A node” and “Transition probabilities”. The nine edge labels are .2, .7, .1 out of node 1; .6, .3, .1 out of node 2; and .2, .3, .5 out of node 3.]

A special random walk
Suppose there are 1000 people on each node.
At the sound of a whistle they hop to another node in accordance with the “outbound” probabilities.
For now we assume we know these probabilities. Later we will see how to get them.

At Node 1
[Figure: the outbound probabilities at node 1 are 0.2 (stay), 0.7 (to node 2), and 0.1 (to node 3), so of its 1000 people 200 stay, 700 hop to node 2, and 100 hop to node 3.]

At Node 2
[Figure: of the 1000 people at node 2, 600 hop to node 1, 300 stay, and 100 hop to node 3.]

At Node 3
[Figure: of the 1000 people at node 3, 200 hop to node 1, 300 hop to node 2, and 500 stay.]
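
The hop away from a single node can be mimicked with random numbers. The following is a minimal sketch, assuming node 1's outbound probabilities are 0.2 (stay), 0.7 (to node 2), and 0.1 (to node 3) as read off the figure; the variable names are illustrative and not from the slides.

```matlab
% Outbound probabilities at node 1: stay at 1, hop to 2, hop to 3.
p = [0.2 0.7 0.1];
nPeople = 1000;

% Cumulative probabilities split (0,1) into one slice per destination.
edges = cumsum(p);
edges(end) = 1;          % guard against round-off in the last edge

% Each person draws one uniform random number and lands in a slice.
r = rand(nPeople, 1);
dest = zeros(nPeople, 1);
for k = 1:nPeople
    dest(k) = find(r(k) < edges, 1);
end

% Count the arrivals at each node.
counts = zeros(1, 3);
for i = 1:3
    counts(i) = sum(dest == i);
end
disp(counts)
```

The counts change from run to run but should land near 200, 700, and 100; the slides use those expected values directly.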

State Vector: describes the state at each node at a specific time

          T=0     T=1     T=2
Node 1    1000    1000    1120
Node 2    1000    1300    1300
Node 3    1000     700     580

After 100 iterations

          T=99, T=100
Node 1    1142.85
Node 2    1357.14
Node 3     500.00

Appears to reach a steady state. Call this the stationary vector.

Transition Probability Matrix

    P = [ .2  .6  .2
          .7  .3  .3
          .1  .1  .5 ]

P(i,j) is the probability of hopping to node i from node j.

Formula for the new state vector

    P = [ .2  .6  .2
          .7  .3  .3
          .1  .1  .5 ]

P(i,j) is the probability of hopping to node i from node j.

    w(1) = P(1,1)*v(1) + P(1,2)*v(2) + P(1,3)*v(3)
    w(2) = P(2,1)*v(1) + P(2,2)*v(2) + P(2,3)*v(3)
    w(3) = P(3,1)*v(1) + P(3,2)*v(2) + P(3,3)*v(3)

v is the old state vector; w is the updated state vector.
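
The three update formulas amount to the matrix-vector product P*v. A minimal sketch, using the example's P and the starting state of 1000 people per node, that writes the update both ways and carries it forward to T=1 and T=2 (the names w1 and w2 are illustrative):

```matlab
% P(i,j) = probability of hopping to node i from node j.
P = [0.2 0.6 0.2;
     0.7 0.3 0.3;
     0.1 0.1 0.5];
v = [1000; 1000; 1000];    % state at T=0

% Term-by-term update, as in the formulas.
w = zeros(3, 1);
w(1) = P(1,1)*v(1) + P(1,2)*v(2) + P(1,3)*v(3);
w(2) = P(2,1)*v(1) + P(2,2)*v(2) + P(2,3)*v(3);
w(3) = P(3,1)*v(1) + P(3,2)*v(2) + P(3,3)*v(3);

% The same update as a single matrix-vector product.
w1 = P*v;      % state at T=1
w2 = P*w1;     % state at T=2

disp([v w1 w2])   % columns are the state vectors at T=0, 1, 2
```

Because every column of P sums to 1, the total head count stays at 3000 after each update.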

The general case

    function w = Update(P, v)
    % Update state vector v based on transition
    % probability matrix P to give state vector w
    n = length(v);
    w = zeros(n, 1);
    for i = 1:n
        for j = 1:n
            w(i) = w(i) + P(i, j)*v(j);
        end
    end

To obtain the stationary vector…

    function [w, err] = StatVec(P, v, tol, kMax)
    % Iterate to get stationary vector w
    w = Update(P, v);
    err = max(abs(w - v));
    k = 1;
    while k < kMax && err > tol
        v = w;
        w = Update(P, v);
        err = max(abs(w - v));
        k = k + 1;
    end
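
As a usage sketch, the same iteration can be run inline on the 3-node example. The tolerance of 1e-6 and the cap of 200 iterations are assumed values chosen for illustration, and P*v stands in for a call to Update(P, v) so that the script runs without separate function files.

```matlab
P = [0.2 0.6 0.2;
     0.7 0.3 0.3;
     0.1 0.1 0.5];
v = [1000; 1000; 1000];   % 1000 people on each node
tol = 1e-6;               % assumed stopping tolerance
kMax = 200;               % assumed iteration cap

% Same loop structure as StatVec, with P*v in place of Update(P, v).
w = P*v;
err = max(abs(w - v));
k = 1;
while k < kMax && err > tol
    v = w;
    w = P*v;
    err = max(abs(w - v));
    k = k + 1;
end

fprintf('%d iterations: %.2f %.2f %.2f\n', k, w);
```

The loop stops well before the cap, and the result agrees with the steady state reported after 100 iterations.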

Stationary vector indicates importance
[Figure: the 3-node network annotated with stationary values and ranks: node 2 (1357) ranks 1st, node 1 (1143) ranks 2nd, and node 3 (500) ranks 3rd.]

Random island hopping
Repeat:
    You are on an island.
    According to the transition probabilities, go to another island.

A random walk on the web
Repeat:
    You are on a webpage.
    There are m outlinks, so choose one at random.
    Click on the link.

Use the link structure of the web to figure out the transition probabilities!
(Assume no dead ends for now; we deal with them later.)

Connectivity Matrix
[Matrix G: an example connectivity matrix of 0s and 1s.]
G(i,j) is 1 if there is a link on page j to page i. (I.e., you can get to i from j.)

Transition Probability Matrix derived from Connectivity Matrix
[Figure: connectivity matrix G beside transition probability matrix P. P is 0 wherever G is 0; the entries of P where G is 1 are shown as “?”.]

Connectivity Matrix, Transition Probability
[Figure: G beside P, with the unknown entries of P shown as “?”.]
A. 0    B. 1/8    C. 1/3    D. 1    E. rand(1)

Transition Probability Matrix derived from Connectivity Matrix
[Figure: G beside the completed P. The nonzero entries of P include 0.33, 0.50, 0.25, and 1.]

Connectivity (G) to Transition Probability (P)

    n = size(G, 1);
    P = zeros(n, n);
    for j = 1:n
        % Column j of P: each outlink on page j is equally likely
        P(:, j) = G(:, j)/sum(G(:, j));
    end
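
To see the column scaling in action, here is a minimal sketch on a small made-up web. The 4-page matrix G below is hypothetical (it is not the matrix from the slides) and is chosen so that every page has at least one outlink.

```matlab
% Hypothetical 4-page web. G(i,j) = 1 if page j links to page i.
G = [0 1 1 0;
     1 0 1 1;
     1 0 0 0;
     0 1 0 0];

n = size(G, 1);
P = zeros(n, n);
for j = 1:n
    % Divide column j by the number of outlinks on page j.
    P(:, j) = G(:, j)/sum(G(:, j));
end

disp(P)
disp(sum(P, 1))   % column sums of P
```

Page 1 has two outlinks, so each gets probability 0.5; page 4 has a single outlink, which gets probability 1. Every column of P sums to 1, as a transition probability matrix requires.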

Stationary vector represents how “popular” the pages are: PageRank

    statVec    sorted    idx    pR
    0.5723     0.8911     6      4
    0.8206     0.8206     2      2
    0.7876     0.7876     3      3
    0.2609     0.5723     1      6
    0.2064     0.4100     8      8
    0.8911     0.2609     4      1
    0.2429     0.2429     7      7
    0.4100     0.2064     5      5

    [sorted, idx] = sort(-statVec);
    for k = 1:length(statVec)
        j = idx(k);   % index of kth largest
        pR(j) = k;
    end

    statVec    sorted     idx    pR
    0.5723     -0.8911     6      4
    0.8206     -0.8206     2      2
    0.7876     -0.7876     3      3
    0.2609     -0.5723     1      6
    0.2064     -0.4100     8      8
    0.8911     -0.2609     4      1
    0.2429     -0.2429     7      7
    0.4100     -0.2064     5      5
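
The ranking step can be run on its own with the eight stationary values from the table. This is a minimal sketch; the vectorized assignment into pR2 is an alternative added here for comparison and is not from the slides.

```matlab
statVec = [0.5723; 0.8206; 0.7876; 0.2609; 0.2064; 0.8911; 0.2429; 0.4100];

% Sorting the negated values in ascending order lists the largest first.
[sorted, idx] = sort(-statVec);

% Loop form: the page with the kth largest value gets rank k.
pR = zeros(size(statVec));
for k = 1:length(statVec)
    j = idx(k);   % index of kth largest
    pR(j) = k;
end

% Vectorized alternative: scatter the ranks 1..n into positions idx.
pR2 = zeros(size(statVec));
pR2(idx) = 1:length(statVec);

disp([statVec idx pR])
```

Note that idx answers "which page is kth?" while pR answers "what rank does page j have?"; the two are inverse permutations of each other.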

The random walk idea gets the transition probabilities from connectivity. So how to deal with dead ends?
Repeat:
    You are on a webpage.
    There are m outlinks. Choose one at random.
    Click on the link.
What if there are no outlinks?

The random walk idea gets transition probabilities from connectivity. Can modify the random walk to deal with dead ends.
Repeat:
    You are on a webpage.
    If there are no outlinks
        Pick a random page and go there.
    else
        Flip an unfair coin.
        if heads
            Click on a random outlink and go there.
        else
            Pick a random page and go there.
        end
    end
In practice, an unfair coin with prob .85 heads works well.
This results in a different transition probability matrix.
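
One way the modified walk could translate into a matrix is sketched below. It assumes that "pick a random page" means choosing uniformly among all n pages (the current page included), and it uses a hypothetical 4-page G in which page 4 is a dead end; neither detail is specified in the slides.

```matlab
% Hypothetical 4-page web; page 4 has no outlinks (column 4 is all zeros).
G = [0 1 1 0;
     1 0 1 0;
     1 0 0 0;
     0 1 0 0];
p = 0.85;          % probability of heads, i.e., of following an outlink

n = size(G, 1);
P = zeros(n, n);
for j = 1:n
    c = sum(G(:, j));      % number of outlinks on page j
    if c == 0
        % Dead end: jump to a page chosen uniformly at random (assumption).
        P(:, j) = 1/n;
    else
        % Heads: follow one of the c outlinks. Tails: jump to a random page.
        P(:, j) = p*G(:, j)/c + (1 - p)/n;
    end
end

disp(P)
disp(sum(P, 1))    % column sums of P
```

Each column still sums to 1, and no entry of P is zero, so the surfer can reach any page from any other page.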

Shakespeare SubWeb (n=4383)

    PRank     1    2    3    4    5    6    7    8    9   10   11   12   13   14   15
    InRank   24  417  110   14   68    8   37   54    2  261    1   67  118   50    3

Nat’l Parks SubWeb (n=4757)

    PRank     1    2    3    4    5    6    7    8    9   10   11   12   13   14   15
    InRank    1  100   77  386   62  110   37  109  127   32   28  830  169  168   64

Basketball SubWeb (n=6049)

    PRank     1    2    3    4    5    6    7    8    9   10   11   12   13   14   15
    InRank    2    1   20   19    3   61   23   43   91   28   85  358  313   71   68