Improvement of HITS-based Algorithms on Web Documents. Presenter: CHANG, SHIH-JIE. Authors: Longzhuang Li, Yi Shang, Wei Zhang. 2002, ACM. Intelligent Database Systems Lab
Outline • Motivation • Objectives • Methodology • Experiments • Conclusions • Comments
Motivation • Content analysis usually takes a long time, and for most Web documents it is almost impossible to obtain user feedback or visit counts.
Objectives • Present two ways to improve the precision of HITS-based algorithms on Web documents.
Methodology – HITS algorithm: limitations, hub and authority scores, and the new weighted HITS-based algorithm
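As a reminder of how the hub and authority scores interact, here is a minimal sketch of the standard HITS iteration: a page's authority score is the sum of the hub scores of pages linking to it, a page's hub score is the sum of the authority scores of pages it links to, and both are normalized each round. The optional `weights` argument is a hypothetical hook showing where a weighted variant could scale individual links; the slide does not give the paper's weighting scheme, so every link defaults to weight 1.0 (plain HITS).

```python
import math


def hits(edges, iterations=50, weights=None):
    """Iterative HITS over a directed link graph.

    edges: list of (source, target) page pairs.
    weights: optional {(source, target): weight}; a hypothetical hook for a
    weighted variant. Links missing from the dict get weight 1.0, which is
    plain HITS.
    """
    nodes = sorted({page for edge in edges for page in edge})
    w = weights or {}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}

    def normalize(scores):
        # Scale to unit Euclidean length so scores stay bounded.
        norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
        return {n: v / norm for n, v in scores.items()}

    for _ in range(iterations):
        # Authority of t: weighted sum of hub scores of pages linking to t.
        auth = {n: 0.0 for n in nodes}
        for s, t in edges:
            auth[t] += w.get((s, t), 1.0) * hub[s]
        auth = normalize(auth)
        # Hub of s: weighted sum of authority scores of pages s links to.
        hub = {n: 0.0 for n in nodes}
        for s, t in edges:
            hub[s] += w.get((s, t), 1.0) * auth[t]
        hub = normalize(hub)
    return hub, auth


hub, auth = hits([("A", "C"), ("B", "C"), ("A", "D")])
print(auth, hub)
```

In this toy graph C receives two in-links and D one, so C ends up with the higher authority score, while A, which links to both, is the stronger hub.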
Methodology – Limitations of the HITS algorithm
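Methodology – Vector Space Model (VSM): the query q and each document X_i are represented as term-weight vectors, and the relevance of X_i to q is their inner product, $\mathrm{sim}(q, X_i) = q \cdot X_i = \sum_{j} w_{q,j}\, w_{i,j}$.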
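The inner-product scoring can be sketched as follows. The slide names only weight vectors and the inner product, so the tf × idf weighting with idf = ln(N / df) used here is an assumption, chosen because it is a common VSM weighting; the function name `vsm_scores` is hypothetical.

```python
import math
from collections import Counter


def vsm_scores(query, docs):
    """Inner-product VSM scores of one tokenized query against tokenized docs.

    Term weights use tf * idf with idf = ln(N / df); this weighting is an
    assumption, since the slide names only weight vectors and the inner product.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log(n / df[t]) for t in df}

    def weigh(tokens):
        # Terms unseen in the collection get weight 0.
        return {t: tf * idf.get(t, 0.0) for t, tf in Counter(tokens).items()}

    q = weigh(query)
    scores = []
    for doc in docs:
        x = weigh(doc)
        # sim(q, X_i) = sum over shared terms j of w_qj * w_ij
        scores.append(sum(w * x[t] for t, w in q.items() if t in x))
    return scores


docs = [["web", "hits", "link"], ["web", "page"], ["cooking", "recipe"]]
print(vsm_scores(["hits", "web"], docs))
```

The first document shares both query terms and scores highest; the third shares none and scores zero.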
Methodology – Vector Space Model (VSM): coverage of Google
Methodology – Okapi Similarity Measurement (Okapi)
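The slide's Okapi formula did not survive extraction. As an illustration only, the sketch below uses the widely used Okapi BM25 form, assuming that is the variant meant; the parameters k1 = 1.2 and b = 0.75 are common defaults rather than values taken from the paper, the idf uses the non-negative `ln(1 + ...)` variant, and the query-term-frequency component is omitted (repeated query terms count once).

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Okapi BM25 scores of a tokenized query against tokenized docs.

    k1 and b are assumed common defaults, not values from the paper.
    """
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(t for d in docs for t in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        # Document-length normalization: longer-than-average docs are damped.
        norm = k1 * (1 - b + b * len(d) / avgdl)
        s = 0.0
        for t in set(query):
            if tf[t] == 0:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            # Saturating term-frequency component.
            s += idf * tf[t] * (k1 + 1) / (tf[t] + norm)
        scores.append(s)
    return scores


docs = [["web", "hits", "link", "web"], ["web", "page"], ["cooking", "recipe"]]
print(bm25_scores(["web"], docs))
```

Unlike the raw inner product, the term-frequency factor saturates, so the first document's two occurrences of "web" raise its score only modestly over the second document's single occurrence.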
Methodology – Cover Density Ranking (CDR): In CDR, the results of phrase queries are ranked in two steps. Score of the cover set:
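The cover-set score can be illustrated with the cover density formulation usually attributed to Clarke et al.: a cover is a span of the document that contains every query term and has no smaller span inside it that also does, and a cover of length l contributes 1 if l ≤ K and K / l otherwise. The slide does not reproduce its formula, so this is a sketch under that assumption; the cutoff K = 16 is an assumed default, and the function names are hypothetical.

```python
def covers(tokens, query_terms):
    """Return all covers: (p, q) spans containing every query term
    with no shorter span inside them that also contains every term.
    Assumes a non-empty query."""
    terms = set(query_terms)
    candidates = []
    for i in range(len(tokens)):
        seen = set()
        for j in range(i, len(tokens)):
            if tokens[j] in terms:
                seen.add(tokens[j])
            if seen == terms:
                candidates.append((i, j))  # shortest span starting at i
                break
    # A candidate is a cover only if no other candidate lies inside it.
    return [
        (p, q)
        for (p, q) in candidates
        if not any(p <= p2 and q2 <= q and (p2, q2) != (p, q)
                   for (p2, q2) in candidates)
    ]


def cover_density_score(tokens, query_terms, k=16):
    """Sum over covers of 1 if the cover length l <= k, else k / l (k assumed)."""
    score = 0.0
    for p, q in covers(tokens, query_terms):
        length = q - p + 1
        score += 1.0 if length <= k else k / length
    return score


doc = "a b x a c b".split()
print(covers(doc, ["a", "b"]))  # [(0, 1), (1, 3), (3, 5)]
print(cover_density_score(doc, ["a", "b"]))  # 3.0
```

The span (2, 5) is not a cover because the shorter span (3, 5) inside it already contains both terms. With the default K every cover here is short enough to count as 1; with K = 2 the two length-3 covers would each contribute only 2/3.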
Methodology – Three-Level Scoring Method (TLS): computes the relevance of a Web page to a query in two steps: (1) (2)
Experiments
Conclusions • The weighted HITS-based method performs better than Bharat's improved HITS algorithm.
Comments • Advantages - Effective. • Applications - Information retrieval, ranking Web pages.