Ranking
Link-based Ranking (2nd generation)
Reading 21
Query-independent ordering
- First generation: using link counts as simple measures of popularity.
  - Undirected popularity: each page gets a score given by the number of its in-links plus the number of its out-links (e.g. 3+2=5).
  - Directed popularity: score of a page = number of its in-links (e.g. 3).
  - Easy to spam.
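The two first-generation scores can be sketched in a few lines of Python. This is only an illustration: the function name `link_popularity` and the page names in the example graph are hypothetical, chosen so that page "p" has 3 in-links and 2 out-links as in the slide's example.

```python
def link_popularity(edges):
    """Return {page: (undirected_score, directed_score)} for a list of (source, target) links."""
    in_links = {}
    out_links = {}
    for src, dst in edges:
        out_links[src] = out_links.get(src, 0) + 1
        in_links[dst] = in_links.get(dst, 0) + 1
    pages = set(in_links) | set(out_links)
    return {
        # undirected = in-links + out-links, directed = in-links only
        p: (in_links.get(p, 0) + out_links.get(p, 0), in_links.get(p, 0))
        for p in pages
    }


# Hypothetical graph: three pages link to "p", and "p" links to two pages.
edges = [("a", "p"), ("b", "p"), ("c", "p"), ("p", "x"), ("p", "y")]
scores = link_popularity(edges)
print(scores["p"])  # (5, 3)
```

Both scores depend only on raw link counts, which is why they are easy to spam: anyone can create extra pages that link to a target.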
Second generation: PageRank
- Each link has its own importance!
- PageRank is
  - independent of the query
  - open to many interpretations…
Basic Intuition… What about nodes with no in/out links?
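Google's PageRank
$r = [\alpha P^T + (1 - \alpha)\, e e^T] \times r$
- r: the principal eigenvector of the matrix; the term $(1 - \alpha)\, e e^T$ models the random jump.
- B(i): set of pages linking to i.
- #out(j): number of outgoing links from j.
- e: vector with all components equal to $1/\sqrt{N}$.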
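Written per component, the same equation may be easier to read. The form below is a reconstruction under two assumptions that the definitions above suggest but do not state explicitly: the entry of P for a link from j to i is 1/#out(j), and r is normalized so that its components sum to 1. Since every entry of $e e^T$ equals $1/N$, the random-jump term then contributes $(1-\alpha)/N$ to every page.

```latex
r(i) \;=\; \alpha \sum_{j \in B(i)} \frac{r(j)}{\#out(j)} \;+\; \frac{1-\alpha}{N}
```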
Three different interpretations
- Graph (intuitive interpretation): co-citation.
- Matrix (easy for computation): eigenvector computation or a linear system solution.
- Markov chain (useful to prove convergence): a sort of usage simulation. "In the steady state" each page has a long-term visit rate; use this as the page's score.
[Figure: from the current page the surfer follows a link to one of its neighbors with probability α, or jumps to any node with probability 1 - α.]
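The Markov-chain reading can be illustrated by simulating the surfer directly and counting visits. This is a rough sketch, not an efficient way to compute PageRank: the function name `simulate_surfer`, the three-page graph, the value alpha = 0.85, the step count and the seed are all assumptions made for the example, and a page without out-links is assumed to always jump.

```python
import random


def simulate_surfer(out_links, alpha=0.85, steps=200_000, seed=0):
    """Estimate long-term visit rates by simulating a random surfer."""
    rng = random.Random(seed)
    pages = sorted(out_links)
    visits = {p: 0 for p in pages}
    current = pages[0]
    for _ in range(steps):
        neighbors = out_links[current]
        if neighbors and rng.random() < alpha:
            # with probability alpha: follow one of the out-links
            current = rng.choice(neighbors)
        else:
            # with probability 1 - alpha (or if there are no out-links): jump to any node
            current = rng.choice(pages)
        visits[current] += 1
    return {p: visits[p] / steps for p in pages}


# Hypothetical graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0
graph = {0: [1, 2], 1: [2], 2: [0]}
rates = simulate_surfer(graph)
print(rates)
```

For this graph the estimated visit rates should settle near 0.39, 0.21 and 0.40 for pages 0, 1 and 2, which are the values obtained by solving the linear system by hand.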
PageRank: use in search engines
- Preprocessing:
  - Given the graph, build the matrix $\alpha P^T + (1 - \alpha)\, e e^T$.
  - Compute its principal eigenvector r.
  - r[i] is the PageRank of page i.
  - We are interested in the relative order.
- Query processing:
  - Retrieve pages containing the query terms.
  - Rank them by their PageRank.
- The final order is query-independent.
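The two phases above can be sketched as follows. This is a minimal illustration under stated assumptions, not a production implementation: power iteration is one common way to obtain the principal eigenvector, the names `pagerank` and `answer_query` are hypothetical, the graph and page texts are made up, alpha = 0.85 is an assumed value, and the rank of pages without out-links is spread uniformly (one possible answer to the dangling-node question).

```python
def pagerank(out_links, alpha=0.85, tol=1e-12, max_iter=1000):
    """Preprocessing: approximate the principal eigenvector by power iteration."""
    pages = sorted(out_links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Assumption: rank held by pages with no out-links is spread uniformly.
        dangling = sum(rank[p] for p in pages if not out_links[p])
        new_rank = {p: (1 - alpha) / n + alpha * dangling / n for p in pages}
        for p in pages:
            for q in out_links[p]:
                # each page passes an equal share of its rank along every out-link
                new_rank[q] += alpha * rank[p] / len(out_links[p])
        change = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if change < tol:
            break
    return rank


def answer_query(query_terms, page_text, rank):
    """Query processing: keep pages containing all terms, order them by PageRank."""
    hits = [p for p, text in page_text.items()
            if all(term in text.split() for term in query_terms)]
    return sorted(hits, key=lambda p: rank[p], reverse=True)


# Hypothetical graph (0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0) and page contents.
graph = {0: [1, 2], 1: [2], 2: [0]}
rank = pagerank(graph)
page_text = {0: "link analysis", 1: "link spam", 2: "web link ranking"}
print(answer_query(["link"], page_text, rank))  # [2, 0, 1]
```

Note that `rank` is computed once, before any query arrives; the query only selects which pages are shown, not how they are ordered relative to each other.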
HITS: Hypertext Induced Topic Search
Calculating HITS
- It is query-dependent.
- It produces two scores per page:
  - Authority score: a good authority page for a topic is pointed to by many good hubs for that topic.
  - Hub score: a good hub page for a topic points to many authoritative pages for that topic.
Authority and Hub scores
[Figure: pages 2, 3 and 4 link to page 1; page 1 links to pages 5, 6 and 7.]
a(1) = h(2) + h(3) + h(4)
h(1) = a(5) + a(6) + a(7)
HITS: Link Analysis Computation
$a = A^T h$, $h = A a$
where
- a: vector of authority scores
- h: vector of hub scores
- A: adjacency matrix in which $a_{i,j} = 1$ if $i \to j$ (page i links to page j)
Thus, h is an eigenvector of $A A^T$ and a is an eigenvector of $A^T A$ (both are symmetric matrices).
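One common way to obtain these eigenvectors is to apply the two updates repeatedly and renormalize after each step. The sketch below assumes that approach; the function name `hits`, the fixed iteration count and the choice of Euclidean normalization are assumptions, and the example graph is the seven-page one from the authority and hub example (pages 2, 3, 4 link to page 1, which links to pages 5, 6, 7).

```python
import math


def hits(edges, iterations=50):
    """Iterate a = A^T h and h = A a, normalizing both vectors to unit length."""
    nodes = sorted({n for edge in edges for n in edge})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # a(j) = sum of h(i) over all links i -> j
        auth = {n: 0.0 for n in nodes}
        for i, j in edges:
            auth[j] += hub[i]
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {n: v / norm for n, v in auth.items()}
        # h(i) = sum of a(j) over all links i -> j
        new_hub = {n: 0.0 for n in nodes}
        for i, j in edges:
            new_hub[i] += auth[j]
        norm = math.sqrt(sum(v * v for v in new_hub.values())) or 1.0
        hub = {n: v / norm for n, v in new_hub.items()}
    return auth, hub


edges = [(2, 1), (3, 1), (4, 1), (1, 5), (1, 6), (1, 7)]
auth, hub = hits(edges)
print(round(auth[1], 3), round(auth[5], 3))  # 0.866 0.289
print(round(hub[1], 3), round(hub[5], 3))    # 0.5 0.0
```

In practice the graph passed to such a routine would be the query-specific subgraph, since HITS is query-dependent.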
Weighting links
Weight a link more if the query occurs in the neighborhood of the link (e.g. in its anchor text).