Part I: Web Structure Mining
Chapter 2: Hyperlink Based Ranking
• Social Network Analysis
• PageRank
• Authorities and Hubs
• Link Based Similarity Search
• Enhanced Techniques for Page Ranking
Zdravko Markov and Daniel T. Larose, Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage, Wiley, 2007. Slides for Chapter 1: Information Retrieval and Web Search

Social Networks
• Directed graph with weights assigned to its edges
• Nodes represent documents, and edges represent citations from one document to other documents
• Prestige can be associated with the number of input edges to a node (its in-degree)
• Prestige has a recursive nature: the prestige of a document depends on the authority (or, again, the prestige) of the documents that cite it

Social Networks
• Adjacency matrix $A$:
  – $A(u,v) = 1$ if document $u$ cites document $v$
  – $A(u,v) = 0$ otherwise
• Prestige score: $p(v) = \sum_{u} A(u,v)\, p(u)$
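A minimal sketch of the adjacency matrix and a single prestige update, assuming the prestige of a document is the sum of the prestige of the documents that cite it. The three-document citation list and the helper names `adjacency_matrix` and `prestige_step` are hypothetical, chosen only for illustration.

```python
def adjacency_matrix(n, citations):
    """Build an n x n matrix with A[u][v] = 1 if document u cites document v."""
    A = [[0] * n for _ in range(n)]
    for u, v in citations:
        A[u][v] = 1
    return A


def prestige_step(A, p):
    """One update: the new prestige of v is the sum of p[u] over all u citing v."""
    n = len(A)
    return [sum(A[u][v] * p[u] for u in range(n)) for v in range(n)]


# Hypothetical citations: 0 cites 2, 1 cites 2, 2 cites 0.
citations = [(0, 2), (1, 2), (2, 0)]
A = adjacency_matrix(3, citations)

# Starting from equal prestige, one step yields each document's in-degree.
p1 = prestige_step(A, [1, 1, 1])
```

With every document starting at prestige 1, the first update reduces to counting incoming citations, which ties the recursive definition back to in-degree.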

Social Networks
• Computing prestige: $P' = A^{T} P$, where $A$ is the adjacency matrix and $P$ is the vector of prestige scores
• Eigen decomposition: $\lambda P = A^{T} P$
  – Eigenvector $P$
  – Eigenvalue $\lambda$
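The eigenvector view can be checked by hand on a small graph. The sketch below assumes the prestige vector is an eigenvector of the transposed adjacency matrix; the graph, the candidate vector, and the helper names `transpose_times` and `is_eigenvector` are hypothetical.

```python
import math


def transpose_times(A, P):
    """Return A^T P for a square adjacency matrix A given as a list of rows."""
    n = len(A)
    return [sum(A[u][v] * P[u] for u in range(n)) for v in range(n)]


def is_eigenvector(A, P, lam, tol=1e-9):
    """Check whether A^T P equals lam * P component-wise within a tolerance."""
    AP = transpose_times(A, P)
    return all(abs(ap - lam * p) <= tol for ap, p in zip(AP, P))


# Hypothetical graph: documents 1 and 2 each cite document 0, and 0 cites both.
A = [[0, 1, 1],
     [1, 0, 0],
     [1, 0, 0]]

lam = math.sqrt(2)
P = [math.sqrt(2), 1.0, 1.0]   # candidate prestige vector

holds = is_eigenvector(A, P, lam)                 # candidate satisfies the relation
fails = is_eigenvector(A, [1.0, 1.0, 1.0], lam)   # uniform vector does not
```

In this candidate vector document 0, which is cited by both other documents, receives the largest score.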

Social Networks: Power Iteration
• $P \leftarrow P_0$
• Loop:
  – $Q \leftarrow P$
  – $P \leftarrow A^{T} Q$
  – $P \leftarrow P / \lVert P \rVert$
• While $\lVert P - Q \rVert > \epsilon$
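A minimal sketch of power iteration for prestige, assuming the update multiplies the current vector by the transposed adjacency matrix, normalizes it to unit length, and stops when successive vectors differ by less than a tolerance. The tolerance `eps`, the cap `max_iter`, the uniform starting vector, and the example graph are assumptions made for illustration.

```python
import math


def norm(x):
    """Euclidean length of a vector."""
    return math.sqrt(sum(value * value for value in x))


def power_iteration(A, eps=1e-10, max_iter=1000):
    """Iterate P <- A^T P / ||A^T P|| until P changes by at most eps."""
    n = len(A)
    P = [1.0 / math.sqrt(n)] * n          # uniform unit-length start
    for _ in range(max_iter):
        Q = P
        P = [sum(A[u][v] * Q[u] for u in range(n)) for v in range(n)]
        length = norm(P)
        if length == 0.0:                 # no citations at all: nothing to rank
            return P
        P = [value / length for value in P]
        if norm([a - b for a, b in zip(P, Q)]) <= eps:
            break
    return P


# Hypothetical graph: 0 cites 1 and 2, 1 cites 2, 2 cites 0.
A = [[0, 1, 1],
     [0, 0, 1],
     [1, 0, 0]]
P = power_iteration(A)
```

The example graph is strongly connected and contains cycles of lengths 2 and 3, so the iteration settles on a single vector. On a graph with only even-length cycles the vector can oscillate instead of converging, which is why the sketch caps the number of iterations.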

PageRank
• A “random web surfer” keeps clicking on hyperlinks at random with uniform probability
• This implements a random walk on the web graph
• Page $u$ links to $N_u$ web pages
• The probability of visiting page $v$ from page $u$ is $1/N_u$
• The amount of prestige that page $v$ receives from page $u$ is $1/N_u$ of the prestige of $u$
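The random-surfer rule can be sketched as a propagation step in which each page splits its current rank evenly across its outgoing links. The three-page link structure, the function name `propagate`, and the fixed count of 100 iterations are assumptions made for illustration; the sketch has no damping factor, and pages without outgoing links simply pass nothing on.

```python
def propagate(links, rank):
    """One step: every page u gives rank[u] / N_u to each page it links to."""
    new_rank = {page: 0.0 for page in links}
    for u, targets in links.items():
        if not targets:                   # page without outgoing links
            continue
        share = rank[u] / len(targets)    # uniform probability 1 / N_u
        for v in targets:
            new_rank[v] += share
    return new_rank


# Hypothetical web graph: a links to b and c, b links to c, c links to a.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}

# Start with rank spread evenly, then propagate repeatedly.
rank = {page: 1.0 / len(links) for page in links}
first = propagate(links, rank)
for _ in range(100):
    rank = propagate(links, rank)
```

Because every page in this example has at least one outgoing link, the total rank stays equal to 1 after each step, and the ranks can be read as the long-run probabilities of the surfer being on each page.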

PageRank
[Figure: Propagation of page rank]

PageRank
[Figure: Calculation of page rank (labels: Norm, Integers)]