A Brief Introduction of PageRank
CSE 5243
Author: Y-K Shih, Ohio State University, Autumn 2012
Background
- Besides keywords, what other evidence can be used to rate the importance of a webpage within a set of webpages?
- Solution: use the hyperlink structure.
- E.g., a webpage linked to by many other webpages is probably important, but simply counting incoming links is a local measure, not a global (comprehensive) one.
- PageRank was developed by Larry Page in 1998.
Idea
- A graph represents the WWW.
  - Node: webpage
  - Directed edge: hyperlink
- A user surfs the WWW by randomly clicking hyperlinks.
- The probability that the user stops at a particular webpage is that page's PageRank value.
- A node linked to by many nodes with high PageRank values receives a high rank itself; if no links point to a node, that page has no support.
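
The random-surfer idea can be illustrated with a small simulation. This is only a sketch under stated assumptions: the function name `random_surfer`, the three-page graph, the step count, and the seed are all hypothetical, and every page is assumed to have at least one outgoing link so the surfer never gets stuck.

```python
import random
from collections import Counter


def random_surfer(links, steps=100_000, seed=0):
    """Estimate how often a random surfer lands on each page.

    `links` maps each page to the list of pages it links to.
    Assumption: every page has at least one outgoing link.
    """
    rng = random.Random(seed)          # seeded for reproducibility
    page = next(iter(links))           # arbitrary starting page
    visits = Counter()
    for _ in range(steps):
        page = rng.choice(links[page])  # click a hyperlink uniformly at random
        visits[page] += 1
    # Fraction of clicks that ended on each page
    return {p: visits[p] / steps for p in links}


# Hypothetical toy web: A links to B and C, B links to C, C links to A.
toy_links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
frequencies = random_surfer(toy_links)
```

On this toy graph the visit frequencies should settle close to 0.4 for A and C and 0.2 for B, which is what the link structure suggests: C collects links from both A and B, and A inherits all of C's weight.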
A simple version
- u: a webpage
- Bu: the set of u's backlinks (pages that link to u)
- Nv: the number of forward links of page v
- Initially, R(u) is 1/N for every webpage.
- Iteratively update each webpage's PR value until convergence.
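
The update formula from this slide did not survive extraction. The sketch below assumes the commonly used simplified rule, R(u) = sum over v in Bu of R(v)/Nv, which matches the definitions of Bu and Nv above but is not confirmed by the slide. The function name `simple_pagerank`, the tolerance, and the toy graph are hypothetical, and every page is assumed to have at least one forward link.

```python
def simple_pagerank(links, tol=1e-10, max_iter=1000):
    """Iterate R(u) = sum_{v in B_u} R(v) / N_v until values stop changing.

    `links` maps each page to the list of pages it links to.
    Assumptions: every page appears as a key and has >= 1 forward link.
    """
    nodes = list(links)
    n = len(nodes)

    # B_u: for each page u, the pages v that link to u
    backlinks = {u: [] for u in nodes}
    for v, targets in links.items():
        for u in targets:
            backlinks[u].append(v)

    rank = {u: 1.0 / n for u in nodes}  # initially R(u) = 1/N
    for _ in range(max_iter):
        new_rank = {
            u: sum(rank[v] / len(links[v]) for v in backlinks[u])
            for u in nodes
        }
        delta = max(abs(new_rank[u] - rank[u]) for u in nodes)
        rank = new_rank
        if delta < tol:  # converged
            break
    return rank


# Hypothetical toy web: A links to B and C, B links to C, C links to A.
ranks = simple_pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]})
```

For this toy graph the values converge to roughly R(A) = 0.4, R(B) = 0.2, R(C) = 0.4. Without damping, convergence is not guaranteed on every graph (for example, a pure cycle keeps oscillating unless it starts uniform), which is one motivation for a more robust version.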
Example 1
- [Figure: PageRank calculation, first iteration]
- [Figure: PageRank calculation, second iteration]
- [Figure: Convergence after some iterations]
A little more advanced version
- Add a damping factor d.
- Imagine that a surfer stops clicking hyperlinks with probability 1 - d.
- R(u) is at least (1 - d)/(N - 1).
- N is the total number of nodes.
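
The damped formula itself is not preserved on this slide. As a hedged sketch, the code below assumes the widely used form R(u) = (1 - d)/N + d * (sum over v in Bu of R(v)/Nv); the slide's own constant may differ, and under this assumed form the floor for every page is (1 - d)/N. The function name `damped_pagerank`, the default d = 0.85, and the toy graph are hypothetical, and every page is assumed to have at least one forward link.

```python
def damped_pagerank(links, d=0.85, tol=1e-12, max_iter=1000):
    """Iterate the assumed rule R(u) = (1 - d)/N + d * sum_{v in B_u} R(v)/N_v.

    `links` maps each page to the list of pages it links to.
    Assumptions: every page appears as a key and has >= 1 forward link.
    """
    nodes = list(links)
    n = len(nodes)

    # B_u: for each page u, the pages v that link to u
    backlinks = {u: [] for u in nodes}
    for v, targets in links.items():
        for u in targets:
            backlinks[u].append(v)

    rank = {u: 1.0 / n for u in nodes}  # start from the uniform distribution
    for _ in range(max_iter):
        new_rank = {
            u: (1 - d) / n
            + d * sum(rank[v] / len(links[v]) for v in backlinks[u])
            for u in nodes
        }
        delta = max(abs(new_rank[u] - rank[u]) for u in nodes)
        rank = new_rank
        if delta < tol:  # converged
            break
    return rank


# Hypothetical toy web; page D has no backlinks at all.
toy = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["A"]}
damped = damped_pagerank(toy)
```

In this sketch, page D receives no support from other pages, so its value stays at the baseline term of the assumed formula, while the remaining pages share the rest of the probability mass.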
Other applications
- Social network (Facebook, Twitter, etc.)
  - Node: Person; Edge: Follower / Followee / Friend
  - Higher PR values: Celebrities
- Citation network
  - Node: Paper; Edge: Citation
  - Higher PR values: Important papers
- Protein-protein interaction network
  - Node: Protein; Edge: Two proteins bind together
  - Higher PR values: Essential proteins