Introduction to Information Retrieval: Homework 1 (TF-IDF)
Homework 1: TF-IDF, Homework Discussion
Prof. 楊立偉, wyang@ntu.edu.tw
© Copyright 2016
Demonstration
• Check the example Excel file.
Discussion (1)
• Why can an n-gram approach combined with tf-idf extract Chinese keywords?
  – What happens if only tf is used?
  – What happens if only df is used?
  – What happens if only idf is used?
• tf extracts candidate terms, but many of them are common terms that appear almost everywhere.
• idf filters out these common terms: a term that occurs in many documents receives a low idf.
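The tf/idf interplay above can be sketched in a few lines. This is a minimal illustration, not the course's Excel workflow: it assumes raw-count tf, idf = log10(N / df), and character n-grams of 2 to 4 characters; the function names and the toy sentences are hypothetical.

```python
import math
from collections import Counter

def char_ngrams(text, n_min=2, n_max=4):
    """Yield every contiguous character n-gram whose length is n_min..n_max."""
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            yield text[i:i + n]

def tfidf_scores(docs, n_min=2, n_max=4):
    """Return one {n-gram: tf-idf} dict per document.

    tf  = raw count of the n-gram in that document
    idf = log10(N / df), where N = number of documents and
          df = number of documents that contain the n-gram
    """
    tfs = [Counter(char_ngrams(doc, n_min, n_max)) for doc in docs]
    df = Counter()
    for tf in tfs:
        df.update(tf.keys())  # each document adds at most 1 to a term's df
    n_docs = len(docs)
    return [
        {term: count * math.log10(n_docs / df[term]) for term, count in tf.items()}
        for tf in tfs
    ]

# Hypothetical toy corpus: "我們" occurs in every document, so its idf is 0
# and the common term is filtered out even though its tf is non-zero.
docs = ["我們的健保制度", "我們的社會福利", "我們的健保財務"]
scores = tfidf_scores(docs)
print(scores[0]["我們"])            # 0.0
print(round(scores[0]["健保"], 3))  # 0.176  (df = 2)
print(round(scores[1]["社會"], 3))  # 0.477  (df = 1)
```

Ranking by tf alone would put "我們" on par with the real keywords; ranking by idf alone would favor every one-off fragment regardless of how often it occurs. The product balances the two.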
• Exercise
  – Longer terms are more informative.
  – Adjust the tf-idf weighting to favor longer terms.
• Apply the following weighting to the 保健 (health care) and 社會 (society) topics:
  tf-idf × len(term)^n, n ≥ 1
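The length adjustment in the exercise is a single re-weighting pass. A minimal sketch, assuming the tf-idf scores are already available as a {term: score} dict; `length_weighted` is a hypothetical helper name.

```python
def length_weighted(scores, n=1):
    """Re-weight tf-idf scores as tf-idf * len(term) ** n, with n >= 1.

    Larger n pushes longer (usually more informative) terms further up.
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    return {term: score * len(term) ** n for term, score in scores.items()}

# Hypothetical scores: two terms with equal tf-idf but different lengths.
print(length_weighted({"健保": 0.5, "健保制度": 0.5}, n=2))
# {'健保': 2.0, '健保制度': 8.0}
```

Because every substring of a long n-gram is also a candidate, the main effect of a larger n is to let a full term outrank its own fragments; the fragments still remain in the list.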
Discussion (2)
• Use SQL for topic pre-selection
  – e.g. SELECT * FROM corpus WHERE topic LIKE '%政治%' (政治 = politics)
  – What happens if the entire corpus is used instead?
• What is the relationship between the topic and the keywords?
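The pre-selection query can be tried against an in-memory SQLite database using Python's standard `sqlite3` module. Only the table name `corpus` and the `topic` column come from the slide; the `content` column and the rows are hypothetical, and the keyword is passed as a bound parameter instead of being pasted into the SQL string.

```python
import sqlite3

def build_corpus(rows):
    """Load (topic, content) rows into an in-memory SQLite table named corpus."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE corpus (topic TEXT, content TEXT)")
    conn.executemany("INSERT INTO corpus (topic, content) VALUES (?, ?)", rows)
    return conn

def select_topic(conn, keyword):
    """Parameterized equivalent of: SELECT * FROM corpus WHERE topic LIKE '%政治%'
    (for this two-column table, selecting topic and content is the same as *)."""
    cur = conn.execute(
        "SELECT topic, content FROM corpus WHERE topic LIKE ?", (f"%{keyword}%",)
    )
    return cur.fetchall()

# Hypothetical rows for illustration only.
conn = build_corpus([
    ("政治", "立法院通過預算"),
    ("國際政治", "外交部長出訪"),
    ("保健", "健保費率調整"),
])
for topic, content in select_topic(conn, "政治"):
    print(topic, content)
```

`LIKE '%政治%'` is a substring match, so it also selects composite labels such as the hypothetical 國際政治; `topic = '政治'` would select only the exact label.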
Topic-related Keywords
• Besides term weighting, we need to consider how relevant each term is to the topic.
• Use Mutual Information (MI): tf-idf × MI
• Mutual Information
  – P: probability
  – N: size of the corpus
  – f(x): number of occurrences of term x in the corpus
  – f(y): number of occurrences of term y in the corpus
  – f(x, y): number of co-occurrences of terms x and y in the corpus
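Assuming the standard pointwise mutual information is meant, which is built from exactly the quantities listed above, the formula can be written as follows (log base 2 is an assumption; a different base only rescales MI and does not change rankings):

```latex
P(x) = \frac{f(x)}{N}, \quad P(y) = \frac{f(y)}{N}, \quad P(x, y) = \frac{f(x, y)}{N}

MI(x, y) = \log_2 \frac{P(x, y)}{P(x)\,P(y)} = \log_2 \frac{N \cdot f(x, y)}{f(x)\, f(y)}
```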
• A larger MI means terms x and y tend to co-occur more often than chance would predict: the larger the value, the higher their co-occurrence rate.
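A small numeric check of that reading, assuming MI is the pointwise form log2(N · f(x, y) / (f(x) · f(y))) with P(x) = f(x)/N and so on; `pmi` is a hypothetical helper name, and what counts as one "unit" of N (a document, a sentence, a window) is left to the caller.

```python
import math

def pmi(f_xy, f_x, f_y, n):
    """Pointwise mutual information from raw counts (log base 2 assumed).

    f_xy: co-occurrences of x and y; f_x, f_y: occurrences of x and y;
    n: corpus size N. Uses P(x) = f(x)/N, P(y) = f(y)/N, P(x, y) = f(x, y)/N.
    """
    if f_xy == 0:
        return float("-inf")  # x and y never co-occur: log of 0
    return math.log2(n * f_xy / (f_x * f_y))

# x and y each occur 10 times in N = 100 units, so chance alone predicts
# 100 * (10/100) * (10/100) = 1 co-occurrence.
print(pmi(1, 10, 10, 100))  # 0.0  (co-occur exactly as often as chance)
print(pmi(8, 10, 10, 100))  # 3.0  (8 times more often than chance)
```

MI is positive when x and y co-occur more often than independence would predict, zero when they are independent, and negative when they tend to avoid each other.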
• Using tf-idf × MI may extract topic-related keywords more precisely.
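A sketch of the combined score, under loudly stated assumptions: MI is taken between a candidate term x and the topic label y, counted at document level (N = number of documents, f(y) = documents with that topic, f(x, y) = documents with that topic that contain x), and the tf-idf weights are supplied by the caller, e.g. from an n-gram tf-idf pass. The function name, documents, and labels are hypothetical.

```python
import math

def rank_by_tfidf_mi(tfidf, docs, topics, target):
    """Rank candidate terms by tf-idf * MI(term, topic).

    Assumed document-level counts: N = number of documents,
    f(x) = documents containing term x, f(y) = documents labelled `target`,
    f(x, y) = documents labelled `target` that contain x.
    """
    n = len(docs)
    f_y = sum(1 for t in topics if t == target)
    ranked = []
    for term, weight in tfidf.items():
        f_x = sum(1 for d in docs if term in d)
        f_xy = sum(1 for d, t in zip(docs, topics) if t == target and term in d)
        if f_xy == 0:
            continue  # term never appears in the topic, so MI is -inf; drop it
        mi = math.log2(n * f_xy / (f_x * f_y))
        ranked.append((term, weight * mi))
    return sorted(ranked, key=lambda kv: kv[1], reverse=True)

# Hypothetical labelled documents and equal tf-idf weights for two candidates.
docs = ["健保費率調整", "健保給付新制", "選舉結果揭曉", "選舉與健保"]
topics = ["保健", "保健", "政治", "保健"]
for term, score in rank_by_tfidf_mi({"健保": 1.0, "選舉": 1.0}, docs, topics, "保健"):
    print(term, round(score, 3))
# 健保 0.415
# 選舉 -0.585
```

Both candidates have the same tf-idf here, so the MI factor alone separates them: 選舉 does appear in a 保健 document, but less often than chance predicts, so its combined score turns negative and it drops below the topic-related term 健保.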