Maximum Likelihood Estimation for Information Thresholding
Yi Zhang & Jamie Callan
Carnegie Mellon University
{yiz, callan}@cs.cmu.edu

Overview
• Adaptive filtering: definition and challenges
• Thresholding based on the score distribution, and the sampling bias problem
• Maximum likelihood estimation for the score distribution parameters
• Results of experiments
• Conclusion

Adaptive Filtering
• Given an initial description of the user's information needs, a filtering system sifts through a stream of documents and delivers relevant documents to the user as soon as they arrive.
• Relevance feedback may be available for some of the delivered documents, so user profiles can be updated adaptively.
Figure: diagram of the filtering system

Adaptive Filtering
• Three major problems:
  - Learning corpus statistics, such as idf
  - Learning the user profile, such as adding or deleting keywords and adjusting term weights (the scoring method)
  - Learning the delivery threshold (the binary delivery judgment)
• Evaluation measures:
  - Linear utility = r1*RR + r2*NR + r3*RN + r4*NN, where RR/NR count relevant/non-relevant documents retrieved and RN/NN count relevant/non-relevant documents not retrieved
  - Optimizing linear utility => finding P(relevant|document); in one dimension, P(relevant|document) = P(relevant|score)
  - F measure
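As an informal illustration (not from the slides), the sketch below shows how a linear utility credits delivered and withheld documents and why maximizing its expectation reduces to a cutoff on P(relevant|score). The coefficient values r1=2, r2=-1, r3=r4=0 mirror the T9U'-style measure used later in the talk; the function names are my own.

```python
def linear_utility(rr, nr, rn, nn, r1=2.0, r2=-1.0, r3=0.0, r4=0.0):
    """Linear utility = r1*RR + r2*NR + r3*RN + r4*NN."""
    return r1 * rr + r2 * nr + r3 * rn + r4 * nn

def deliver(p_rel, r1=2.0, r2=-1.0, r3=0.0, r4=0.0):
    """Deliver a document iff its expected utility when delivered beats the
    expected utility when withheld: p*r1 + (1-p)*r2 >= p*r3 + (1-p)*r4.
    With r1=2, r2=-1, r3=r4=0 this reduces to p >= 1/3."""
    return p_rel * r1 + (1 - p_rel) * r2 >= p_rel * r3 + (1 - p_rel) * r4

print(linear_utility(rr=10, nr=5, rn=20, nn=1000))  # 2*10 - 1*5 = 15
print(deliver(0.4), deliver(0.2))                   # True False
```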

A Model of Score Distribution: Assumptions and Empirical Justification
• Relevant documents: scores follow a normal (Gaussian) distribution
• Non-relevant documents: scores follow an exponential distribution
• According to other researchers, this holds generally for various statistical retrieval systems and scoring methods (Manmatha's paper, Arampatzis's paper)
Figure 1. Density of document scores: TREC-9 OHSU Topic 3 and Topic 5
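A minimal sketch of the assumed score model: class-conditional densities with a Gaussian for relevant scores and an exponential for non-relevant scores, as stated above and in the conclusion. The parameter values are illustrative placeholders, not estimates from the paper.

```python
import numpy as np
from scipy import stats

# Illustrative parameters only (not values from the paper).
mu, sigma = 0.6, 0.1   # relevant scores     ~ Normal(mu, sigma)
lam = 8.0              # non-relevant scores ~ Exponential(rate=lam)

def p_score_given_relevant(score):
    return stats.norm.pdf(score, loc=mu, scale=sigma)

def p_score_given_nonrelevant(score):
    return stats.expon.pdf(score, scale=1.0 / lam)

scores = np.linspace(0.0, 1.0, 5)
print(p_score_given_relevant(scores))
print(p_score_given_nonrelevant(scores))
```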

Optimize for Linear Utility Measure: From Score Distribution to Probability of Relevance
• Let p = P(relevant), the ratio of relevant documents in the stream
• By Bayes' rule:
  P(relevant|score) = p*P(score|relevant) / (p*P(score|relevant) + (1-p)*P(score|non-relevant))
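A sketch of the Bayes step reconstructed above: the posterior probability of relevance for a given score under the two-component mixture. The density parameters are the same placeholders as in the previous sketch.

```python
from scipy import stats

def p_relevant_given_score(score, p, mu=0.6, sigma=0.1, lam=8.0):
    """P(relevant | score) by Bayes' rule over the assumed mixture:
    p * Normal(mu, sigma) for relevant scores and
    (1 - p) * Exponential(lam) for non-relevant scores.
    All numeric values are illustrative placeholders."""
    rel = p * stats.norm.pdf(score, loc=mu, scale=sigma)
    nonrel = (1.0 - p) * stats.expon.pdf(score, scale=1.0 / lam)
    return rel / (rel + nonrel)

print(p_relevant_given_score(0.55, p=0.05))
```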

Optimize for F Measure: From Score Distribution to Precision and Recall
• If the threshold is set at θ:
  - Expected recall = P(score > θ | relevant)
  - Expected precision = p*P(score > θ | relevant) / (p*P(score > θ | relevant) + (1-p)*P(score > θ | non-relevant))
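A sketch of how expected precision, recall, and F1 at a threshold θ follow from the score model, assuming the reconstruction above; parameter values are placeholders.

```python
from scipy import stats

def expected_precision_recall(theta, p, mu=0.6, sigma=0.1, lam=8.0):
    """Expected precision/recall/F1 at threshold theta under the assumed
    normal/exponential score model (parameter values are placeholders)."""
    recall = stats.norm.sf(theta, loc=mu, scale=sigma)           # P(score > theta | relevant)
    rel_above = p * recall                                       # P(relevant, score > theta)
    nonrel_above = (1 - p) * stats.expon.sf(theta, scale=1.0 / lam)
    precision = rel_above / (rel_above + nonrel_above)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

print(expected_precision_recall(0.5, p=0.05))
```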

What Do We Have Now?
• A model of the score distribution
• Algorithms to find the optimal threshold for different evaluation measures, given the model
• Remaining learning task: finding the parameters of the model

Bias Problem for Parameter Estimation While Filtering
• We only receive feedback for documents that were delivered
• Parameter estimation based on a random-sampling assumption is therefore biased
• The sampling criterion depends on the threshold, which changes over time
• Solution: maximum likelihood estimation that conditions on the delivery threshold, which removes the bias
Figure: estimated parameters for relevant-document scores on TREC-9 OHSU Topic 3 with a fixed dissemination threshold of 0.4435
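A tiny simulation (my own, not from the slides) illustrating the sampling bias: if feedback is only observed for documents whose scores clear a threshold, the naive sample mean of the relevant-document scores overestimates the true mean. The numeric values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
true_mu, true_sigma, threshold = 0.5, 0.1, 0.55   # arbitrary illustrative values

scores = rng.normal(true_mu, true_sigma, size=100_000)  # scores of all relevant docs
observed = scores[scores > threshold]                   # feedback only for delivered docs

print("true mean:", true_mu)
print("naive estimate from delivered docs only:", observed.mean())  # biased upward
```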

Unbiased Estimation of Parameters Based on the Maximum Likelihood Principle (1)
• ML: the best estimate of the parameters is the one that maximizes the probability of the training data, i.e., of the delivered documents and their relevance judgments:
  (p, μ, σ, λ)* = argmax Σ_i log P(score_i, rel_i | delivered_i; p, μ, σ, λ)

Unbiased Estimation of Parameters Based on the Maximum Likelihood Principle (2)
• For each term inside the sum of the previous formula:
  P(score_i, rel_i | delivered_i) = P(score_i, rel_i) / P(delivered_i)
  i.e., the joint density of the observed score and relevance judgment, divided by the probability that document i was delivered at all.

Unbiased Estimation of Parameters Based on the Maximum Likelihood Principle (3)
• Calculating the denominator, the probability that document i is delivered (its score exceeds the threshold θ_i in effect at that time):
  P(delivered_i) = p*P(score > θ_i | relevant) + (1-p)*P(score > θ_i | non-relevant)

Unbiased Estimation of Parameters Based on the Maximum Likelihood Principle (4)
• For a relevant document delivered with score s_i:
  P(s_i, rel | delivered_i) = p*N(s_i; μ, σ) / P(delivered_i)
• For a non-relevant document delivered with score s_i:
  P(s_i, non-rel | delivered_i) = (1-p)*λ*exp(-λ*s_i) / P(delivered_i)
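A sketch of the delivery-conditioned (censored) log-likelihood that slides 10 through 13 describe, assuming the normal/exponential model and the reconstructed formulas above; the function name, synthetic scores, labels, and per-document thresholds are my own.

```python
import numpy as np
from scipy import stats

def neg_log_likelihood(params, scores, labels, thresholds):
    """Negative log-likelihood of the delivered documents, conditioned on delivery.
    params = (p, mu, sigma, lam); labels[i] is 1 if feedback marked doc i relevant;
    thresholds[i] is the dissemination threshold in effect when doc i arrived."""
    p, mu, sigma, lam = params
    # Numerator: joint density of the observed (score, relevance) pair.
    dens_rel = p * stats.norm.pdf(scores, loc=mu, scale=sigma)
    dens_non = (1 - p) * stats.expon.pdf(scores, scale=1.0 / lam)
    numer = np.where(labels == 1, dens_rel, dens_non)
    # Denominator: probability that a document clears the threshold.
    deliver_prob = (p * stats.norm.sf(thresholds, loc=mu, scale=sigma)
                    + (1 - p) * stats.expon.sf(thresholds, scale=1.0 / lam))
    return -np.sum(np.log(numer) - np.log(deliver_prob))

# Synthetic example: three delivered documents with relevance feedback.
scores = np.array([0.61, 0.52, 0.47])
labels = np.array([1, 0, 1])
thresholds = np.array([0.44, 0.44, 0.46])
print(neg_log_likelihood((0.05, 0.6, 0.1, 8.0), scores, labels, thresholds))
```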

Relationship to Arampatzis's Estimation
• If no threshold exists (every document is delivered), the denominator equals 1 and the previous formula becomes:
  - For a relevant document delivered: p*N(s_i; μ, σ)
  - For a non-relevant document delivered: (1-p)*λ*exp(-λ*s_i)
• The corresponding result is then the same as Arampatzis's estimation

Unbiased Estimation of Parameters Based on the Maximum Likelihood Principle (5)
• Optimization using the conjugate gradient descent algorithm
• Smoothing using conjugate priors:
  - Prior for p: a beta distribution
  - Prior for the variance: its conjugate prior
  - The prior hyperparameters are set to fixed values
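Continuing the likelihood sketch above (it reuses neg_log_likelihood and the synthetic arrays defined there), this is one way the conjugate-gradient optimization with a beta prior on p might look. SciPy's CG optimizer stands in for the paper's conjugate gradient descent; the reparameterization and the Beta hyperparameters are my own assumptions, and the prior on the variance is omitted.

```python
import numpy as np
from scipy import optimize, stats

def map_objective(raw, scores, labels, thresholds, alpha=2.0, beta=50.0):
    """Penalized (MAP-style) objective for conjugate-gradient optimization.
    raw = (logit_p, mu, log_sigma, log_lam) keeps p in (0, 1) and sigma, lam > 0.
    The Beta(alpha, beta) prior on p and its hyperparameters are placeholders."""
    p = 1.0 / (1.0 + np.exp(-raw[0]))
    params = (p, raw[1], np.exp(raw[2]), np.exp(raw[3]))
    nll = neg_log_likelihood(params, scores, labels, thresholds)  # from the earlier sketch
    return nll - stats.beta.logpdf(p, alpha, beta)

# Initial guess corresponding to (p, mu, sigma, lam) = (0.05, 0.6, 0.1, 8.0).
x0 = np.array([np.log(0.05 / 0.95), 0.6, np.log(0.1), np.log(8.0)])
result = optimize.minimize(map_objective, x0,
                           args=(scores, labels, thresholds), method="CG")
logit_p, mu, log_sigma, log_lam = result.x
print(1.0 / (1.0 + np.exp(-logit_p)), mu, np.exp(log_sigma), np.exp(log_lam))
```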

Experimental Methodology (1)
• Optimization goal (similar to the measure used in TREC-9):
  T9U' = 2*Relevant_Retrieved - Non_Relevant_Retrieved = 2RR - NR
  Corresponding rule: deliver a document if P(relevant|score) > 1/3 (delivering changes the expected utility by 2*P - (1-P) = 3P - 1, which is positive exactly when P > 1/3)
• Datasets:
  - OHSUMED data (348,566 articles from 1987 to 1991; 63 OHSUMED queries and 500 MeSH headings to simulate user profiles)
  - FT data (210,158 Financial Times articles from 1991 to 1994; TREC topics 351-400 to simulate user profiles)
• Each profile begins with 2 relevant documents and an initial user profile
• No profile updating, for simplicity

Experimental Methodology (2)
• Four runs for each profile:
  - Run 1: biased estimation of parameters (sampling bias not taken into account)
  - Run 3: maximum likelihood estimation
  Both of these runs will stop delivering documents if the threshold is set too high, especially in the early stages of filtering. We therefore introduced a minimum delivery ratio: if a profile has not achieved the minimum delivery ratio, its threshold is decreased automatically (a sketch of this mechanism follows the slide).
  - Run 2: biased estimation + minimum delivery ratio
  - Run 4: maximum likelihood estimation + minimum delivery ratio
• Time: 21 minutes for the whole filtering process over the 63 OHSU topics on 4 years of OHSUMED data (ML algorithm)
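A minimal sketch of how the minimum delivery ratio mentioned above might be enforced; the slides do not give the adjustment rule, so the multiplicative decrease, the ratio value, and the function name are assumptions.

```python
def adjust_threshold(threshold, delivered, seen, min_ratio=0.01, decay=0.9):
    """Lower the dissemination threshold if the profile has delivered fewer
    documents than the minimum delivery ratio requires. The decay factor and
    ratio are illustrative assumptions, not values from the paper."""
    if seen > 0 and delivered / seen < min_ratio:
        return threshold * decay
    return threshold

print(adjust_threshold(0.44, delivered=1, seen=500))   # ratio too low -> threshold lowered
print(adjust_threshold(0.44, delivered=20, seen=500))  # ratio met -> unchanged
```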

Results: OHSUMED Data

OHSU topics
                                    Run 1     Run 2     Run 3     Run 4
  T9U' utility                       1.84      3.25      2.7       8.17
  Avg. docs delivered per profile    3.83      9.65      5.73     18.40
  Precision                          0.37      0.29      0.36      0.32
  Recall                             0.036     0.080     0.052     0.137

MeSH topics
                                    Run 1     Run 2     Run 3     Run 4
  T9U' utility                       1.89      4.28      2.44     13.10
  Avg. docs delivered per profile    3.51     11.82      6.22     27.91
  Precision                          0.42      0.39      0.40      0.34
  Recall                             0.018     0.046     0.025     0.068

Run 1: biased estimation; Run 2: biased estimation + min. delivery ratio;
Run 3: unbiased (ML) estimation; Run 4: unbiased (ML) estimation + min. delivery ratio

Results: Financial Times

                                    Run 1     Run 2     Run 3     Run 4
  T9U' utility                       1.44     -0.209     0.65      0.84
  Avg. docs delivered per profile    9.58     10.44      9.05     12.27
  Precision                          0.20      0.17      0.22      0.26
  Recall                             0.161     0.167     0.15      0.193

Run 1: biased estimation; Run 2: biased estimation + min. delivery ratio;
Run 3: unbiased (ML) estimation; Run 4: unbiased (ML) estimation + min. delivery ratio

Result Analysis: Difference Between Run 4 and Run 2 on TREC-9 OHSU Topics
Figure: per-topic utility difference, ML (Run 4) minus biased (Run 2)
• For some of the topics, ML (Run 4) has a much higher utility than Run 2, while the two are similar on most of the other topics
Figure: per-topic difference in documents delivered, ML (Run 4) minus biased (Run 2)
• For most of the topics, ML (Run 4) delivered more documents than Run 2

Conclusion
• Score density distributions:
  - Relevant documents: normal distribution
  - Non-relevant documents: exponential distribution
• The bias problem caused by non-random sampling can be solved using the maximum likelihood principle
• Significant improvement on the TREC-9 filtering task
• Future work:
  - Thresholding while updating profiles
  - The non-random sampling problem in other tasks