Topic-Sensitive PageRank
Taher H. Haveliwala

PageRank
• Importance is propagated
• A global ranking vector is pre-computed
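A minimal sketch of how importance propagates, assuming a small hypothetical link graph stored as an adjacency dict and a link-following probability of 0.85 (neither value comes from the slides). Each page repeatedly passes its score to the pages it links to; the converged scores play the role of the pre-computed global vector.

```python
def pagerank(links, damping=0.85, iters=100):
    """Power iteration over a dict {page: [pages it links to]}."""
    pages = sorted(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        # Every page first receives the uniform "random jump" share.
        nxt = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            out = links[p]
            if out:
                share = damping * rank[p] / len(out)
                for q in out:
                    nxt[q] += share
            else:
                # Dangling page: spread its score uniformly (an assumption).
                for q in pages:
                    nxt[q] += damping * rank[p] / n
        rank = nxt
    return rank

# Hypothetical 4-page graph, for illustration only.
toy_links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
global_rank = pagerank(toy_links)
```

Page "c" collects links from three pages and ends up with the highest score, while "d", which nobody links to, keeps only its random-jump share.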

Topic-Sensitive PageRank
• Basic idea
  ◦ For each topic, an importance score is computed for every page
  ◦ The composite score of a page is calculated by combining its per-topic scores, based on the topics of the query

Topic-Sensitive PageRank
• ODP-biasing
  ◦ The top-level categories of the Open Directory (16 topics) are used
  ◦ Let $T_j$ be the set of URLs in the ODP category $c_j$
  ◦ In computing the PageRank vector for topic $c_j$, we replace the uniform damping vector by the nonuniform vector $\vec{v}_j$, where $v_{ji} = 1/|T_j|$ if $i \in T_j$ and $0$ otherwise
  ◦ The resulting vector will be referred to as $PR(\alpha, \vec{v}_j)$
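The biasing step can be sketched as follows, assuming the nonuniform vector puts equal jump probability on every page of the topic set and zero elsewhere. The graph, the topic set, the name `topic_pagerank`, and the value 0.85 for the link-following probability are all hypothetical.

```python
def topic_pagerank(links, topic_pages, damping=0.85, iters=100):
    """PageRank whose random jumps land only on pages in topic_pages."""
    pages = sorted(links)
    # Nonuniform jump vector: uniform over the topic set, zero elsewhere.
    jump = {p: (1.0 / len(topic_pages) if p in topic_pages else 0.0)
            for p in pages}
    rank = dict(jump)
    for _ in range(iters):
        nxt = {p: (1.0 - damping) * jump[p] for p in pages}
        for p in pages:
            out = links[p]
            if out:
                for q in out:
                    nxt[q] += damping * rank[p] / len(out)
            else:
                # Dangling page: return its score through the jump vector
                # (an assumption, not taken from the slides).
                for q in pages:
                    nxt[q] += damping * rank[p] * jump[q]
        rank = nxt
    return rank

# Hypothetical graph with two disconnected clusters.
toy_links = {"a": ["b"], "b": ["a"], "c": ["d"], "d": ["c"]}
sports_rank = topic_pagerank(toy_links, topic_pages={"a"})
```

Because jumps only ever land on "a", pages "c" and "d" are never reached and score zero for this topic, which is the intended effect of biasing.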

Topic-Sensitive PageRank
• We chose to make $P(c_j)$ uniform
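The query-time formula on this slide was not preserved. The sketch below assumes a common reading: topic probabilities for a query are obtained with Bayes' rule from a uniform prior $P(c_j)$ and per-topic term probabilities, and a page's composite score is the probability-weighted sum of its per-topic scores. All numbers, the smoothing floor, and the function names are hypothetical.

```python
import math

def topic_posteriors(query_terms, term_probs, priors=None):
    """P(c_j | q) proportional to P(c_j) * prod_i P(q_i | c_j)."""
    topics = sorted(term_probs)
    if priors is None:
        # Uniform prior over topics, as chosen in the slides.
        priors = {c: 1.0 / len(topics) for c in topics}
    scores = {}
    for c in topics:
        logp = math.log(priors[c])
        for t in query_terms:
            # Unseen terms get a small floor (an assumption) to avoid log(0).
            logp += math.log(term_probs[c].get(t, 1e-6))
        scores[c] = math.exp(logp)
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}

def composite_score(page, posteriors, topic_ranks):
    """Weighted sum of the page's per-topic scores."""
    return sum(posteriors[c] * topic_ranks[c].get(page, 0.0)
               for c in posteriors)

# Hypothetical term statistics and per-topic page scores.
term_probs = {"sports": {"jaguar": 0.01, "team": 0.05},
              "autos": {"jaguar": 0.04, "engine": 0.05}}
topic_ranks = {"sports": {"p1": 0.30, "p2": 0.05},
               "autos": {"p1": 0.02, "p2": 0.40}}
post = topic_posteriors(["jaguar", "engine"], term_probs)
score_p1 = composite_score("p1", post, topic_ranks)
score_p2 = composite_score("p2", post, topic_ranks)
```

For the query "jaguar engine" the weight shifts almost entirely to the autos topic, so "p2" outranks "p1" even though "p1" scores higher under sports.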

Experiment

Experimental Results
• Similarity measures for induced rankings
  ◦ Overlap of two sets A and B
  ◦ Kendall's τ distance measure
  ◦ k = 20
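The two measures can be sketched as below. The formulas on the slide were lost, so this uses the textbook forms: overlap as the shared fraction of two top-k sets, and Kendall's τ distance as the fraction of item pairs that the two rankings order differently. The variant used in the experiments may differ, for example in how it treats items that appear in only one list.

```python
from itertools import combinations

def overlap(ranking_a, ranking_b, k=20):
    """Fraction of shared items among the top k of two rankings."""
    top_a, top_b = set(ranking_a[:k]), set(ranking_b[:k])
    return len(top_a & top_b) / k

def kendall_distance(ranking_a, ranking_b):
    """Fraction of item pairs ordered differently.

    Both lists must contain exactly the same items.
    """
    pos_a = {x: i for i, x in enumerate(ranking_a)}
    pos_b = {x: i for i, x in enumerate(ranking_b)}
    pairs = list(combinations(ranking_a, 2))
    discordant = sum(
        1 for u, v in pairs
        if (pos_a[u] - pos_a[v]) * (pos_b[u] - pos_b[v]) < 0
    )
    return discordant / len(pairs)

# Hypothetical rankings.
r1 = ["a", "b", "c", "d"]
r2 = ["b", "a", "c", "e"]
shared = overlap(r1, r2, k=4)                                 # 3 of 4 shared
swapped = kendall_distance(["a", "b", "c"], ["b", "a", "c"])  # 1 of 3 pairs
```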

Experimental Results
• Query-sensitive scoring
  ◦ User study
    • 10 queries (randomly selected from our test set)
    • 5 volunteers
    • For each query, the volunteer was shown 2 result rankings:
      1. top 10 results ranked with the unbiased PageRank vector
      2. top 10 results ranked with the topic-sensitive PageRank vector

Experimental Results
  ◦ User study (cont'd)
    • The volunteer was asked to
      1. select all URLs which were "relevant" to the query
      2. select the ranking list which was better
    • The volunteers were not told anything about how either of the rankings was generated.

Experimental Results
• Context-sensitive scoring

Other issues
• Search context
  ◦ hierarchical directory
  ◦ users' browsing patterns
  ◦ bookmarks
  ◦ email archives

Other issues
• Flexibility
  ◦ applies to any kind of context
• Transparency
  ◦ tune the classifier used on the search context, or adjust topic weights
• Privacy
  ◦ a client-side program could use the user context to generate the user profile locally
• Efficiency
  ◦ the query-time cost and the offline preprocessing cost are low

Automatic Identification of User Interest for Personalized Search
Feng Qiu, Junghoo Cho

User Preference Representation
• Topic preference vector
  ◦ T = [T(1), …, T(m)]
  ◦ T(i) represents the user's degree of interest in the i-th topic
  ◦ T is normalized: $\sum_{i=1}^{m} T(i) = 1$

User Model
• Topic-Driven Random Surfer Model
  ◦ The user browses the web in a two-step process.
  ◦ First, the user chooses a topic of interest t for the ensuing sequence of random walks with probability T(t).
  ◦ Then, with equal probability, she jumps to one of the pages on topic t.
  ◦ Starting from this page, the user performs a random walk such that at each step, with probability d, she randomly follows an out-link on the current page; with the remaining probability 1 − d she gets bored, picks a new topic of interest for the next sequence of random walks based on T, and jumps to a page on the chosen topic.
  ◦ This process is repeated forever.
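The surfer described above can be simulated directly. This is a rough sketch: the graph, the topic sets, the preference values, d = 0.85, and the rule that a page without out-links forces a restart are all assumptions made for illustration.

```python
import random
from collections import Counter

def simulate_surfer(links, topic_pages, prefs, d=0.85, steps=10000, seed=0):
    """Count page visits of a surfer who restarts on a topic drawn from prefs."""
    rng = random.Random(seed)
    topics = sorted(prefs)
    weights = [prefs[t] for t in topics]

    def restart():
        # Step 1: pick a topic t with probability T(t).
        t = rng.choices(topics, weights=weights)[0]
        # Step 2: jump to a uniformly chosen page on that topic.
        return rng.choice(sorted(topic_pages[t]))

    visits = Counter()
    page = restart()
    for _ in range(steps):
        visits[page] += 1
        out = links.get(page, [])
        if out and rng.random() < d:
            page = rng.choice(out)   # follow a random out-link
        else:
            page = restart()         # bored (or dead end): pick a new topic
    return visits

# Hypothetical graph: two disconnected clusters, one per topic.
toy_links = {"a": ["b"], "b": ["a"], "c": ["d"], "d": ["c"]}
toy_topics = {"sports": {"a", "b"}, "autos": {"c", "d"}}
visits = simulate_surfer(toy_links, toy_topics, {"sports": 0.9, "autos": 0.1})
```

Dividing each count by the number of steps gives an empirical estimate of how often the user visits each page; with these preferences most visits fall on the sports pages.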

User Model
• Topic-Driven Searcher Model
  ◦ The user always visits web pages through a search engine in a two-step process.
  ◦ First, the user chooses a topic of interest t with probability T(t).
  ◦ Then the user goes to the search engine and issues a query on the chosen topic t.
  ◦ The search engine then returns pages ranked by $TSPR_t(p)$, on which the user clicks.

User Model
• Relationship between V and T
  ◦ Under the Topic-Driven Random Surfer Model
  ◦ Under the Topic-Driven Searcher Model
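The formulas relating V and T did not survive in this transcript. One standard fact that bears on the random-surfer case is that PageRank is linear in its jump vector: running it with a T-weighted mix of topic jump vectors gives the same result as mixing the per-topic vectors with weights T. The sketch below checks this numerically, assuming V denotes the surfer's visit distribution; the graph, the topic sets, and T are hypothetical.

```python
def pagerank_with_jump(links, jump, d=0.85, iters=200):
    """Power iteration with an arbitrary jump distribution (no dangling pages)."""
    pages = sorted(links)
    rank = dict(jump)
    for _ in range(iters):
        nxt = {p: (1.0 - d) * jump[p] for p in pages}
        for p in pages:
            for q in links[p]:
                nxt[q] += d * rank[p] / len(links[p])
        rank = nxt
    return rank

# Hypothetical graph in which every page has at least one out-link.
links = {"a": ["b"], "b": ["a", "c"], "c": ["d"], "d": ["b", "c"]}
jump1 = {"a": 0.5, "b": 0.5, "c": 0.0, "d": 0.0}   # topic 1 pages: a, b
jump2 = {"a": 0.0, "b": 0.0, "c": 0.5, "d": 0.5}   # topic 2 pages: c, d
T = [0.7, 0.3]                                     # assumed preference vector

tspr1 = pagerank_with_jump(links, jump1)
tspr2 = pagerank_with_jump(links, jump2)
# Mix the per-topic results ...
mixed = {p: T[0] * tspr1[p] + T[1] * tspr2[p] for p in links}
# ... and compare with one run on the mixed jump vector.
direct = pagerank_with_jump(
    links, {p: T[0] * jump1[p] + T[1] * jump2[p] for p in links})
```

The two dictionaries agree up to floating-point error, which is why per-topic vectors can be pre-computed once and combined per user.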

Learning Topic Preference Vector
• Problem
  ◦ Given V and $TSPR_i$, find the T that satisfies the relationship between V and T under the assumed user model

Learning Topic Preference Vector
• Linear regression
  ◦ Minimize the square-root error
• Maximum likelihood estimator **
  ◦ V(p) = the probability that the user visits the page p
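A hedged sketch of the maximum-likelihood idea, assuming the visit probability is a T-weighted mixture of per-topic page distributions. Under that assumption the likelihood of a click log is the product over clicks of the mixture probability, and the weights can be fitted with the standard EM update for mixture proportions. This is one common way to maximize such a likelihood, not necessarily the procedure used by the authors; the distributions and the click log are invented.

```python
def estimate_preferences(clicks, topic_dists, iters=200):
    """EM for weights T maximizing prod_k sum_i T[i] * topic_dists[i][click_k]."""
    m = len(topic_dists)
    T = [1.0 / m] * m
    for _ in range(iters):
        totals = [0.0] * m
        for p in clicks:
            # Responsibility of each topic for this click.
            joint = [T[i] * topic_dists[i].get(p, 0.0) for i in range(m)]
            z = sum(joint)
            if z == 0.0:
                continue  # click impossible under every topic; skip it
            for i in range(m):
                totals[i] += joint[i] / z
        s = sum(totals)
        T = [x / s for x in totals]
    return T

# Hypothetical per-topic visit distributions.
topic_dists = [{"a": 0.7, "b": 0.3}, {"b": 0.2, "c": 0.8}]
# Click counts that match a true preference vector of [0.8, 0.2] exactly.
clicks = ["a"] * 56 + ["b"] * 28 + ["c"] * 16
T_est = estimate_preferences(clicks, topic_dists)
```

With this noise-free log the estimate converges to the preference vector that generated the counts; with real click data it would only approximate it.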

Ranking Search Results Using Topic Preference Vectors
• Ranking of page p = [formula not preserved]
• because [formula not preserved]

Evaluation Metrics
• Accuracy of topic preference vector
  ◦ $T_e$ is our estimation based on the user's click history
  ◦ T is the user's actual topic preference vector

Evaluation Metrics
• Accuracy of personalized ranking
  ◦ Kendall distance between the estimated and the true top-k lists
  ◦ The estimated list is the sorted list of top-k pages based on the estimated personalized ranking scores
  ◦ The true list is the sorted list of top-k pages computed from the user's true preference vector

Evaluation Metrics
• Improvement in search quality
  ◦ Average rank of relevant pages in the search result
  ◦ S denotes the set of pages the user u selected
  ◦ R(p) is the ranking of the page p
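The metric's formula is missing from the transcript; the definitions of S and R(p) suggest the mean of R(p) over the pages in S, and that reading is assumed in the sketch below. The result lists and the selected pages are hypothetical.

```python
def average_rank(selected_pages, ranking):
    """Mean 1-based position R(p) over the pages in S; lower is better."""
    position = {p: i + 1 for i, p in enumerate(ranking)}
    return sum(position[p] for p in selected_pages) / len(selected_pages)

# Hypothetical result lists and user selections.
baseline = ["p1", "p2", "p3", "p4", "p5"]
personalized = ["p3", "p5", "p1", "p2", "p4"]
S = {"p3", "p5"}
before = average_rank(S, baseline)       # (3 + 5) / 2
after = average_rank(S, personalized)    # (1 + 2) / 2
```

A drop in the average rank means the pages the user selected moved closer to the top of the list.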

Experiments
• User study
  ◦ 10 subjects in the UCLA Computer Science Department
  ◦ 04/2004 – 10/2004 (6 months)
  ◦ Queries to Google, results, and clicked URLs
    • average number of queries per subject = 255.6
    • average number of clicks per query = 0.91

Experiments
• Accuracy of learning method
  ◦ Synthetic dataset generated by simulation based on our topic-driven searcher model
    • Generation of topic preference vector: randomly choose K topics and assign random weights to them. The weights of the other topics are set to zero. The vector is then normalized.
    • Generation of click history: use the generated topic preference vector to generate clicks following the visit probability distribution dictated by the topic-driven searcher model.
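The two generation steps can be sketched as follows. The per-topic click distributions stand in for whatever visit probabilities the searcher model dictates; they, the function names, and the fixed seed are hypothetical.

```python
import random

def random_preference_vector(m, k, rng):
    """Sparse vector: k random topics get random weights, the rest are zero."""
    T = [0.0] * m
    for i in rng.sample(range(m), k):
        T[i] = rng.random() + 1e-9   # tiny offset keeps chosen weights nonzero
    total = sum(T)
    return [w / total for w in T]    # normalize so the weights sum to 1

def generate_clicks(T, topic_click_dists, n, rng):
    """Draw n clicks: pick a topic by T, then a page from that topic's distribution."""
    clicks = []
    for _ in range(n):
        t = rng.choices(range(len(T)), weights=T)[0]
        pages = sorted(topic_click_dists[t])
        probs = [topic_click_dists[t][p] for p in pages]
        clicks.append(rng.choices(pages, weights=probs)[0])
    return clicks

rng = random.Random(42)
# Hypothetical per-topic click distributions over pages.
dists = [{"a": 0.6, "b": 0.4}, {"b": 0.5, "c": 0.5}, {"d": 1.0}]
T_true = random_preference_vector(m=3, k=2, rng=rng)
history = generate_clicks(T_true, dists, n=100, rng=rng)
```

Feeding `history` to a learning method and comparing its output with `T_true` is the kind of accuracy check this experiment describes.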

Experiments
• Accuracy of estimated topic preference vector

Experiments
• Accuracy of personalized PageRank

Experiments
• Quality of personalized search

Conclusion
• Proposed a framework for investigating the problem of personalizing web search using the user's search history and TSPR
• Conducted both theoretical and real-life experiments to evaluate the approach

Thank you