CS 533 5 min Presentations M Sami Arpa

CS 533 – 5 min. Presentations M. Sami Arpa Enes Taylan
Amit Singhal, Chris Buckley, and Mandar Mitra. 1996. Pivoted document length normalization. In Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval (SIGIR '96). ACM, New York, NY, USA, 21-29. DOI=10.1145/243199.243206 http://doi.acm.org/10.1145/243199.243206

Pivoted Document Normalization Subject: Automatic information retrieval systems work with documents of varying lengths in a text collection.

Pivoted Document Normalization Problem: Long documents have an advantage in retrieval over short documents because of:
- Higher term frequencies
- More terms

Pivoted Document Normalization Previous Solutions: Document length normalization, which aims to retrieve documents of all lengths fairly:
- Cosine normalization
- Maximum tf normalization
- Byte length normalization
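To make the three normalization factors named above concrete, here is a minimal sketch. It assumes whitespace tokenization and raw term frequencies as term weights (real systems usually apply tf-idf style weights before cosine normalization). The function name normalization_factors and the sample documents are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def normalization_factors(text: str) -> dict:
    """Return three length-normalization factors for one document.

    Assumption: terms are whitespace-separated and weighted by raw tf.
    """
    tf = Counter(text.lower().split())
    return {
        # Cosine: Euclidean length of the term-frequency vector.
        "cosine": math.sqrt(sum(f * f for f in tf.values())),
        # Maximum tf: the largest term frequency in the document.
        "max_tf": max(tf.values(), default=0),
        # Byte length: size of the document in bytes.
        "byte_length": len(text.encode("utf-8")),
    }


# Hypothetical documents: the long one repeats the short one four times
# and adds a few extra terms.
short_doc = "pivoted length normalization"
long_doc = " ".join([short_doc] * 4 + ["retrieval of long documents"])

short_norm = normalization_factors(short_doc)
long_norm = normalization_factors(long_doc)
```

All three factors grow with document length, so dividing term weights by any of them reduces the advantage that higher term frequencies and more terms give to long documents.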

Pivoted Document Normalization Problem with Previous Solutions: The probability of retrieval and the probability of relevance have different slopes because of the normalization factor.

Pivoted Document Normalization New approach: Pivoted Document Normalization

Pivoted Document Normalization Likelihood of relevance and retrieval:
- Order the documents in a collection by their lengths.
- Divide them into several equal-sized “bins”.
- Compute the probability that a randomly selected relevant/retrieved document belongs to a given bin.
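The binning steps above can be sketched as follows. This is a simplified illustration, not the exact procedure from the paper: it treats relevance and retrieval as plain sets of document ids for a single query, and the name bin_probabilities, the bin size, and the toy data are all assumptions.

```python
def bin_probabilities(lengths, flagged, bin_size):
    """Probability that a randomly chosen flagged document falls in each bin.

    lengths:  dict mapping doc id -> document length
    flagged:  set of doc ids that are relevant (or retrieved)
    bin_size: number of documents per bin (the last bin may be smaller)
    """
    # Step 1: order documents by length.
    ordered = sorted(lengths, key=lengths.get)
    total = sum(1 for doc in ordered if doc in flagged)
    probs = []
    # Step 2: cut the ordered list into consecutive bins.
    for start in range(0, len(ordered), bin_size):
        members = ordered[start:start + bin_size]
        hits = sum(1 for doc in members if doc in flagged)
        # Step 3: share of all flagged documents that land in this bin.
        probs.append(hits / total if total else 0.0)
    return probs


# Hypothetical toy collection.
lengths = {"d1": 50, "d2": 120, "d3": 300, "d4": 800, "d5": 1500, "d6": 4000}
relevant = {"d2", "d3", "d5"}
retrieved = {"d4", "d5", "d6"}

p_relevance = bin_probabilities(lengths, relevant, bin_size=2)
p_retrieval = bin_probabilities(lengths, retrieved, bin_size=2)
```

In this toy data the relevant documents are spread evenly over the three bins, while the retrieved ones concentrate in the bins holding the longest documents. Comparing the two lists bin by bin is the kind of mismatch the relevance and retrieval curves are meant to expose.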

Pivoted Document Normalization Pivoted Normalization Scheme:
- “The probability of retrieval of a document is inversely related to the normalization factor.”
- To increase the chances that documents of a given length are retrieved, decrease their normalization factor; to decrease those chances, increase it.

Pivoted Document Normalization Method:
- Use a previous normalization method (like cosine or byte size) to initially retrieve some documents.
- From the results of the previous normalization, find the amount of tilt to apply.
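The tilting step can be illustrated with the pivoted normalization formula from the Singhal et al. paper, pivoted = (1 - slope) * pivot + slope * old_norm. The sketch below is a minimal example under stated assumptions: the pivot is set to the average old normalization factor, and the slope value and the sample factors are hypothetical, not values reported in the slides.

```python
def pivoted_norm(old_norm: float, pivot: float, slope: float) -> float:
    """Tilt an old normalization factor around the pivot.

    At old_norm == pivot the factor is unchanged. With slope < 1,
    factors below the pivot are raised and factors above it are lowered.
    """
    return (1.0 - slope) * pivot + slope * old_norm


# Hypothetical old normalization factors (e.g. cosine norms) for
# a short, a medium, and a long document.
old_norms = [10.0, 20.0, 60.0]

# Assumption: use the average old factor as the pivot.
pivot = sum(old_norms) / len(old_norms)

# Assumption: an illustrative slope; in practice it is tuned.
slope = 0.25

new_norms = [pivoted_norm(n, pivot, slope) for n in old_norms]
```

Because the probability of retrieval is inversely related to the normalization factor, raising the factor for documents below the pivot makes them less likely to be retrieved, and lowering it for documents above the pivot makes them more likely to be retrieved.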

Pivoted Document Normalization Method:

Pivoted Document Normalization Results:

Pivoted Document Normalization Conclusion:
- If documents of different lengths are retrieved with equal chances, retrieval effectiveness increases.
- The pivoted normalization technique can make previously developed normalization techniques more powerful.

thank you.
