
Question Answering: From Zero to Hero
Elena Eneva
11 Oct 2001
Advanced IR Seminar

Sources
- TREC-9, 2001
- http://la.lti.cs.cmu.edu/Javelin
- V: E. Voorhees. "Overview of the TREC-9 Question Answering Track."
- P: J. Prager, E. Brown, A. Coden and D. Radev. "Question answering by predictive annotation." In Proceedings of SIGIR 2000.
- C: C. L. A. Clarke, G. V. Cormack and T. R. Lynam. "Exploiting redundancy in question answering." In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2001.

Question Answering
- IR (information retrieval)
  - Successful in large-scale text search problems
  - Retrieves full documents
- IE (information extraction)
  - Successful in extracting very precise answers from text
  - Works on pre-specified domains
- QA: combining the strengths of both

QA track in TREC
- Collection of unstructured documents (Table 1 in V)
- Short factual questions in English, e.g. "Why can't ostriches fly?" and "Where did Bill Gates go to college?" (see also Figure 1 in V)
- Answers are returned as a ranked list of 5 document fragments, in two categories: 50 bytes and 250 bytes
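The answer format can be sketched in a few lines of Python. This is a minimal sketch and not the official TREC submission format. The helper name `make_response` and its `(doc_id, text)` input pairs are hypothetical. We assume each fragment travels with the id of the document it came from, since that is the document its support is judged against. Truncation works on UTF-8 bytes because the limits are stated in bytes, not characters.

```python
def make_response(ranked_fragments, limit_bytes=50, max_answers=5):
    """Build a ranked answer list from (doc_id, text) pairs, best first.

    Hypothetical helper: keeps at most `max_answers` fragments and cuts each
    answer string to at most `limit_bytes` bytes (50 or 250 in the QA track).
    """
    if limit_bytes not in (50, 250):
        raise ValueError("the track uses 50-byte or 250-byte answers")
    response = []
    for rank, (doc_id, text) in enumerate(ranked_fragments[:max_answers], start=1):
        # Truncate on bytes, then drop any multi-byte character cut in half.
        raw = text.encode("utf-8")[:limit_bytes]
        response.append((rank, doc_id, raw.decode("utf-8", errors="ignore")))
    return response


# Usage: seven candidate fragments; only the top five are returned.
fragments = [(f"DOC{i}", "Bill Gates attended Harvard University " * 10) for i in range(7)]
for rank, doc_id, answer in make_response(fragments, limit_bytes=50):
    print(rank, doc_id, repr(answer))
```

Cutting on bytes rather than characters keeps non-ASCII answers within the byte limit.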

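Evaluation
- Answers are judged by people
- Per-question score: the reciprocal rank of the first correct answer, or 0 if none of the 5 answers is correct; a run is scored by the mean over all questions (mean reciprocal rank)
- Also reported: % of questions for which a correct answer was found
- Strict and lenient scores: strict counts only answers judged supported by their document, lenient also counts unsupported ones
- Short (50-byte) and long (250-byte) versions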
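The scoring rule can be made concrete with a small Python sketch. It assumes each question's ranked answers have already been judged using three hypothetical labels:
- "correct": the right answer, supported by its document.
- "unsupported": the right string, but the document does not support it.
- "wrong".

The strict score accepts only "correct", while the lenient score also accepts "unsupported".

```python
# Judgment labels are hypothetical: "correct" (supported), "unsupported", "wrong".
def reciprocal_rank(judgments, lenient=False, max_rank=5):
    """1/rank of the first acceptable answer among the top `max_rank`, else 0."""
    accepted = {"correct", "unsupported"} if lenient else {"correct"}
    for rank, label in enumerate(judgments[:max_rank], start=1):
        if label in accepted:
            return 1.0 / rank
    return 0.0


def evaluate(run, lenient=False):
    """Return (mean reciprocal rank, fraction of questions answered) for a run.

    `run` holds one list of judgment labels per question, in rank order.
    """
    scores = [reciprocal_rank(judgments, lenient) for judgments in run]
    mrr = sum(scores) / len(scores)
    found = sum(1 for s in scores if s > 0) / len(scores)
    return mrr, found


run = [
    ["wrong", "correct"],                 # first correct answer at rank 2
    ["unsupported", "wrong", "correct"],  # rank 1 only counts under lenient scoring
    ["wrong"] * 5,                        # no correct answer: scores 0
]
print("strict: ", evaluate(run))
print("lenient:", evaluate(run, lenient=True))
```

An answer found only at rank 6 or later contributes 0, the same as no answer, because only the top 5 responses are considered.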

Two TREC QA systems
- Question Answering by Predictive Annotation: Prager, Brown, Coden (IBM) and Radev (U of Michigan)
- Exploiting Redundancy in Question Answering: Clarke, Cormack, Lynam (U of Waterloo)
- Ranking: Table 2 in V

Exploiting Redundancy in Question Answering (Figure 1 in C)
- The question is turned into:
  - a query for submission to a passage retrieval component
  - a set of selection rules that guide the process of extracting answers from the passages (answer category)
- Get a list of k passages
- Identify possible answers
- Rank the possible answers
- Question analysis -> IR -> IE
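The three stages (question analysis, IR, IE) can be illustrated end to end on a toy collection. This is a minimal Python sketch, not the system in C:
- The selection rules are two hypothetical regular expressions keyed on the question word.
- Passage retrieval is replaced by simple query-term overlap.
- Candidates are ranked by how many retrieved passages contain them, a crude form of the redundancy idea.

```python
import re
from collections import Counter

# Hypothetical selection rules: question word -> pattern for its answer category.
ANSWER_PATTERNS = {
    "who": re.compile(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)*\b"),  # capitalized name sequences
    "when": re.compile(r"\b(?:1[0-9]|20)[0-9]{2}\b"),        # four-digit years
}
STOPWORDS = {"a", "an", "the", "was", "is", "did", "do", "does", "of", "in"}


def words(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def analyze(question):
    """Question analysis: split the question into a query and an answer category."""
    tokens = words(question)
    category = tokens[0]
    query = {t for t in tokens[1:] if t not in STOPWORDS}
    return query, category


def retrieve(passages, query, k):
    """IR stand-in: the k passages sharing the most distinct terms with the query."""
    return sorted(passages, key=lambda p: len(query & set(words(p))), reverse=True)[:k]


def extract_and_rank(passages, category):
    """IE: collect candidates matching the category, ranked by passage count."""
    pattern = ANSWER_PATTERNS.get(category)
    if pattern is None:
        return []
    counts = Counter()
    for passage in passages:
        counts.update(set(pattern.findall(passage)))
    return [answer for answer, _ in counts.most_common()]


passages = [
    "The Eiffel Tower was completed in 1889 for the World's Fair.",
    "Construction of the Eiffel Tower began in 1887 and ended in 1889.",
    "Paris hosted the Olympic Games in 1900.",
]
query, category = analyze("When was the Eiffel Tower completed?")
top = retrieve(passages, query, k=2)
print(extract_and_rank(top, category))
```

Here "1889" outranks "1887" only because two of the retrieved passages mention it.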

Three features with the greatest contribution
- Flexibility of the parser
- Passage retrieval technique (high-quality passages)
- Redundancy in the answer selection component: evidence from multiple passages contributes to identifying the most likely answer

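Passage retrieval technique
- Each document $D$ is an ordered sequence of terms: $D = d_1 d_2 d_3 \ldots d_m$
- An extent $(u, v)$, with $1 \le u \le v \le m$, is the subsequence $d_u \ldots d_v$
- A query $Q = \{q_1, q_2, q_3, \ldots\}$ is generated from the question
- A score is computed for each extent $(u, v)$ that is a cover for some $T \subseteq Q$, i.e. a minimal extent containing every term of $T$
- Higher scores go to passages whose probability of occurrence is lower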
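One way to turn "higher scores for less probable passages" into a formula is sketched below. It rests on an independence assumption and is not necessarily the exact score used in C. The derivation runs as follows:
- Assume term $t$ occurs at any position with probability $p_t = f_t / N$, where $f_t$ is its frequency in the collection and $N$ is the total number of terms.
- An extent of length $l = v - u + 1$ then contains $t$ with probability $1 - (1 - p_t)^l \approx l \, p_t$.
- It therefore contains every term of $T$ with probability about $\prod_{t \in T} l \, p_t$.
- Scoring a cover by the negative log of that probability gives $\sum_{t \in T} \log(N / f_t) - |T| \log l$.

The Python below finds all covers of a term set in one left-to-right pass and applies that score. The function names are hypothetical.

```python
import math


def covers(doc, terms):
    """Yield every cover (u, v) of `terms` in the token list `doc` (1-based).

    A cover is an extent containing every term of `terms` that has no
    shorter sub-extent also containing them all.
    """
    terms = set(terms)
    if not terms:
        return
    last = {}        # most recent position of each term seen so far
    prev_u = None
    for v, token in enumerate(doc, start=1):
        if token in terms:
            last[token] = v
        if len(last) == len(terms):
            u = min(last.values())  # shortest extent ending at v starts here
            if u != prev_u:         # u advanced, so (u, v) is minimal on both ends
                yield (u, v)
                prev_u = u


def cover_score(u, v, terms, N, freq):
    """Sketch under an independence assumption (not necessarily the score in C):
    -log of the approximate chance that an extent of this length holds all terms."""
    length = v - u + 1
    return sum(math.log(N / freq[t]) for t in terms) - len(terms) * math.log(length)


doc = "a b c a b".split()
T = {"a", "b"}
freq = {"a": 10, "b": 100}   # toy collection frequencies
N = 1000                     # toy collection length in terms
for u, v in covers(doc, T):
    print((u, v), round(cover_score(u, v, T, N, freq), 3))
```

The single pass works because the left end $u$ of the shortest extent ending at $v$ never moves backwards. A new cover therefore appears exactly when $u$ advances.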

Redundancy
- Each candidate term $t$ is assigned a weight that takes into account the number of distinct passages in which the term appears, as well as the relative frequency of the term in the database:
  $W_t = C_t \log(N / f_t)$
  where $C_t$ is the number of distinct passages in which $t$ appears and $f_t / N$ is the relative frequency of $t$
- The score of a candidate answer is the sum of the weights of all its terms
- Select the top-scoring answer, reduce the weights of its terms to 0, and repeat until 5 answers have been selected
- Figure 2 in C
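The weighting and selection loop can be sketched in Python under these assumptions:
- Candidate answers arrive as tuples of terms already extracted from the retrieved passages.
- A candidate's score sums the weights of its distinct terms.
- $N$ and $f_t$ are supplied as collection statistics.
- The function names are hypothetical.

Zeroing the weights of a chosen answer's terms is what stops the next pick from repeating it.

```python
import math
from collections import Counter


def term_weights(passages, N, freq):
    """W_t = C_t * log(N / f_t), with C_t = number of distinct passages containing t."""
    c = Counter()
    for passage in passages:
        c.update(set(passage))          # count each term once per passage
    return {t: c[t] * math.log(N / freq[t]) for t in c}


def select_answers(candidates, passages, N, freq, k=5):
    """Hypothetical helper: greedily pick up to k answers, zeroing each pick's term weights."""
    weights = term_weights(passages, N, freq)

    def score(candidate):
        return sum(weights.get(t, 0.0) for t in set(candidate))

    remaining = list(candidates)
    chosen = []
    while remaining and len(chosen) < k:
        best = max(remaining, key=score)   # ties go to the earliest candidate
        chosen.append(best)
        remaining.remove(best)
        for t in best:
            weights[t] = 0.0
    return chosen


passages = [
    ["bill", "gates", "harvard"],
    ["gates", "attended", "harvard"],
    ["harvard", "university"],
    ["lakeside", "school", "gates"],
]
freq = {t: 10 for p in passages for t in p}   # toy: every term equally rare
candidates = [("harvard",), ("harvard", "university"), ("lakeside", "school")]
print(select_answers(candidates, passages, N=1000, freq=freq))
```

After "harvard university" is chosen, the lone "harvard" scores 0 and is picked only once nothing better remains.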

Exploiting redundancy
- “Who” questions
- 100 GB corpus
- K = depth, W = width (Figure 2 in C)

Who Wants to Be a Millionaire?
- Real-life example
- 70% correct overall (Figure 5 in C)

Question Answering by Predictive Annotation
- IBM system
- Shallow NLP
- System structure (Figure 1 in P)
  - Annotation
  - Indexing
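The annotation and indexing steps can be illustrated with a toy inverted index. This is a minimal Python sketch, not IBM's implementation:
- A hypothetical gazetteer stands in for the shallow NLP annotator.
- The answer-type labels (`PERSON$`, `PLACE$`, `ORG$`) are illustrative.

As we understand P, answer-type labels are added to the text and indexed at indexing time. A question's expected answer type can then be searched for like an ordinary query term.

```python
from collections import defaultdict

# Hypothetical annotator data: word -> answer-type label, and question word -> label.
GAZETTEER = {"gates": "PERSON$", "seattle": "PLACE$", "harvard": "ORG$"}
QUESTION_TYPES = {"who": "PERSON$", "where": "PLACE$"}


def annotate(tokens):
    """Append an answer-type label after every token the gazetteer recognises."""
    out = []
    for token in tokens:
        out.append(token)
        if token in GAZETTEER:
            out.append(GAZETTEER[token])
    return out


def build_index(docs):
    """Inverted index over words and labels: term -> set of doc ids."""
    index = defaultdict(set)
    for doc_id, tokens in docs.items():
        for term in annotate(tokens):
            index[term].add(doc_id)
    return index


def search(index, question_tokens):
    """Rank docs by how many query terms they contain; the label is one of the terms."""
    wh_word, *rest = question_tokens
    query = set(rest)
    if wh_word in QUESTION_TYPES:
        query.add(QUESTION_TYPES[wh_word])
    hits = defaultdict(int)
    for term in query:
        for doc_id in index.get(term, ()):
            hits[doc_id] += 1
    return sorted(hits, key=lambda d: (-hits[d], d))


docs = {
    1: "gates grew up in seattle".split(),
    2: "gates founded a company".split(),
}
index = build_index(docs)
print(search(index, "where did gates grow up".split()))
```

The label is indexed like a word. As a result, the document that mentions a place outranks the one that only shares the name "gates" with the question.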