Automatic Set Expansion for List Question Answering
Richard C. Wang, Nico Schlaefer, William W. Cohen, and Eric Nyberg
Language Technologies Institute, Carnegie Mellon University, Pittsburgh, PA 15213 USA

Task
• Automatically improve answers generated by Question Answering systems for list questions by using a Set Expansion system.
• For example:
    ◦ Name cities that have Starbucks.
    ◦ QA answers: Boston, Seattle, Carnegie-Mellon, Aquafina, Google, Logitech
    ◦ Expanded answers (better!): Seattle, Boston, Chicago, Pittsburgh, Carnegie-Mellon, Google

Outline
• Introduction
    ◦ Question Answering
    ◦ Set Expansion
• Proposed Approach
    ◦ Aggressive Fetcher
    ◦ Lenient Extractor
    ◦ Hinted Expander
• Experimental Results
    ◦ QA System: Ephyra
    ◦ Other QA Systems
• Conclusion

Question Answering (QA)
• Question Answering task:
    ◦ Retrieve answers to natural language questions
• Different question types:
    ◦ Factoid questions
    ◦ List questions
    ◦ Definitional questions
    ◦ Opinion questions
• Major QA evaluations:
    ◦ Text REtrieval Conference (TREC): English
    ◦ NTCIR: Japanese, Chinese
    ◦ CLEF: European languages

Typical QA Pipeline
1. Question Analysis
    Input (question string): “Who invented the smiley?”
    Output (analyzed question): Answer type: Person; Keywords: invented, smiley
2. Query Generation & Search (over knowledge sources)
    Output (search results): “... The two original text smileys were invented on September 19, 1982 by Scott E. Fahlman ...”
3. Candidate Generation
    Output (candidate answers): smileys; September 19, 1982; Scott E. Fahlman
4. Answer Scoring
    Output (scored answers):
        Candidate             Score
        Scott E. Fahlman      0.853
        smileys               0.418
        September 19, 1982    0.239

QA System: Ephyra (Schlaefer et al., TREC 2007)
• History:
    ◦ Developed at University of Karlsruhe, Germany and Carnegie Mellon University, USA
    ◦ TREC participations in 2006 (13th out of 27 teams) and 2007 (7th out of 21 teams)
    ◦ Released as open source in 2008
• Different candidate generators:
    ◦ Answer type classification
    ◦ Regular expression matching
    ◦ Semantic parsing
• Available for download at: http://www.ephyra.info/

Set Expansion (SE)
• For example,
    ◦ Given a query: {“survivor”, “amazing race”}
    ◦ Answer is: {“american idol”, “big brother”, ...}
• More formally,
    ◦ Given a small number of seeds: x_1, x_2, …, x_k where each x_i ∈ S_t
    ◦ Answer is a listing of other probable elements: e_1, e_2, …, e_n where each e_i ∈ S_t
• A well-known example of a web-based set expansion system is Google Sets™
    ◦ http://labs.google.com/sets

SE System: SEAL (Wang & Cohen, ICDM 2007)
• Features
    ◦ Independent of human/markup language
        - Supports seeds in English, Chinese, Japanese, Korean, ...
        - Accepts documents in HTML, XML, SGML, TeX, WikiML, …
    ◦ Does not require pre-annotated training data
        - Utilizes a readily available corpus: the World Wide Web
• Based on two research contributions
    ◦ Automatically construct wrappers for extracting candidate items
    ◦ Rank extracted items using random graph walk
• Try it out for yourself: http://rcwang.com/seal

SEAL’s SE Pipeline
Seeds: Canon, Nikon, Olympus
Expanded set: Pentax, Sony, Kodak, Minolta, Panasonic, Casio, Leica, Fuji, Samsung, …
• Fetcher: downloads web pages from the Web
• Extractor: learns wrappers from web pages
• Ranker: ranks entities extracted by wrappers

Challenge
• SE systems require relevant (non-noisy) seeds, but answers produced by QA systems are often noisy.
• How can we integrate those two systems together?
    ◦ We propose three extensions to SEAL
        - Aggressive Fetcher
        - Lenient Extractor
        - Hinted Expander

Original Fetcher
Procedure:
1. Compose a search query by concatenating all seeds
2. Use Google to request top 100 web pages
3. Fetch web pages and send to the Extractor
Seeds: Boston, Seattle, Carnegie-Mellon
Query: Boston Seattle Carnegie-Mellon
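
The three-step procedure can be sketched as follows. This is a minimal illustration, not SEAL's actual code: `search` and `download` are hypothetical callables supplied by the caller (standing in for the search engine request and the page download), and the function names are made up for this sketch.

```python
from typing import Callable, List, Sequence


def compose_query(seeds: Sequence[str]) -> str:
    """Step 1: concatenate all seeds into one search query."""
    return " ".join(seeds)


def fetch_pages(
    seeds: Sequence[str],
    search: Callable[[str, int], List[str]],  # hypothetical: (query, limit) -> URLs
    download: Callable[[str], str],           # hypothetical: URL -> page text
    top_n: int = 100,
) -> List[str]:
    """Steps 2 and 3: request the top pages for the query and download them."""
    urls = search(compose_query(seeds), top_n)
    return [download(url) for url in urls[:top_n]]


query = compose_query(["Boston", "Seattle", "Carnegie-Mellon"])
```

Because every seed ends up in the same query, a single noisy seed such as Carnegie-Mellon affects every page that is fetched.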

Proposed Fetcher
• Aggressive Fetcher (AF)
    ◦ Sends a two-seed query for every possible pair of seeds to the search engines
    ◦ More likely to compose queries containing only relevant seeds
Seeds: Boston, Seattle, Carnegie-Mellon
Queries:
    Boston Seattle
    Boston Carnegie-Mellon
    Seattle Carnegie-Mellon
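
The pairwise query generation can be sketched with the standard library. The function name `pair_queries` is an assumption made for this illustration, and de-duplicating the seeds first is a choice of this sketch rather than something the slides specify.

```python
from itertools import combinations
from typing import List, Sequence


def pair_queries(seeds: Sequence[str]) -> List[str]:
    """Return one two-seed query for every unordered pair of distinct seeds."""
    unique = list(dict.fromkeys(seeds))  # drop duplicate seeds, keep their order
    return [f"{first} {second}" for first, second in combinations(unique, 2)]


queries = pair_queries(["Boston", "Seattle", "Carnegie-Mellon"])
# ['Boston Seattle', 'Boston Carnegie-Mellon', 'Seattle Carnegie-Mellon']
```

With k seeds this produces k(k-1)/2 queries, so expanding four seeds costs six queries instead of one. As long as at least two seeds are relevant, at least one query contains no noisy seed.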

Original Extractor
• A wrapper is a pair of L and R context strings
    ◦ Maximally-long contextual strings that bracket at least one instance of every seed
    ◦ Extracts strings between L and R
• Learn wrappers from web pages and seeds on the fly
    ◦ Utilize semi-structured documents
    ◦ Wrappers defined at character level
        - No tokenization required (language-independent)
        - However, very page specific (page-dependent)
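
A brute-force sketch of character-level wrapper learning is shown below. It is a simplification under stated assumptions, not SEAL's implementation: contexts are limited to the line on which a seed occurs, every combination of seed occurrences is tried exhaustively, and all function names are invented for this example.

```python
import re
from itertools import product
from os.path import commonprefix


def contexts(text, seed):
    """Left/right context of every occurrence of seed, limited to its own line."""
    found = []
    for match in re.finditer(re.escape(seed), text):
        line_start = text.rfind("\n", 0, match.start()) + 1
        line_end = text.find("\n", match.end())
        if line_end == -1:
            line_end = len(text)
        found.append((text[line_start:match.start()], text[match.end():line_end]))
    return found


def common_suffix(strings):
    """Longest string that every input ends with."""
    return commonprefix([s[::-1] for s in strings])[::-1]


def learn_wrappers(text, seeds):
    """Wrappers (L, R) that bracket at least one instance of every seed."""
    per_seed = [contexts(text, seed) for seed in seeds]
    if not per_seed or any(not ctx for ctx in per_seed):
        return set()  # some seed never appears on this page
    candidates = set()
    for combo in product(*per_seed):  # pick one occurrence per seed
        left = common_suffix([l for l, _ in combo])
        right = commonprefix([r for _, r in combo])
        if left and right:
            candidates.add((left, right))
    # Keep only maximally long wrappers: drop any that a longer one extends.
    return {
        (l, r) for l, r in candidates
        if not any((l2, r2) != (l, r) and l2.endswith(l) and r2.startswith(r)
                   for l2, r2 in candidates)
    }


def extract(text, wrapper):
    """Return every string found between the wrapper's L and R contexts."""
    left, right = wrapper
    return re.findall(re.escape(left) + r"(.+?)" + re.escape(right), text)


page = "\n".join([
    "<li>in Boston City Hall</li>",
    "<li>in Seattle City Hall</li>",
    "<li>at Boston University</li>",
    "<li>at Seattle University</li>",
    "<li>at Carnegie-Mellon University</li>",
    "<li>at Stanford University</li>",
])
wrappers = learn_wrappers(page, ["Boston", "Seattle", "Carnegie-Mellon"])
items = [item for wrapper in wrappers for item in extract(page, wrapper)]
```

On this made-up page the only wrapper that brackets all three seeds is the "at ... University" context, so the "in ... City Hall" pattern shared by the two city seeds is lost.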

Proposed Extractor
• Lenient Extractor (LE)
    ◦ Maximally-long contextual strings that bracket at least one instance of a minimum of two seeds
    ◦ More likely to find useful contexts that bracket only relevant seeds
Text:
    ... in Boston City Hall ...
    ... in Seattle City Hall ...
    ... at Boston University ...
    ... at Seattle University ...
    ... at Carnegie-Mellon University ...
Learned Wrapper (w/o LE):
    at <blah> University
Learned Wrappers (w/ LE):
    at <blah> University
    in <blah> City Hall
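
The relaxed requirement can be sketched by comparing seed occurrences pairwise: any context shared by occurrences of two different seeds is a candidate wrapper. This is an illustrative simplification, not SEAL's implementation. Contexts are limited to a seed's own line, and the function names and the sample page are assumptions made for this sketch.

```python
import re
from itertools import combinations
from os.path import commonprefix


def occurrences(text, seeds):
    """(seed, left, right) for every seed occurrence; contexts stay on one line."""
    found = []
    for seed in dict.fromkeys(seeds):  # ignore duplicate seeds
        for match in re.finditer(re.escape(seed), text):
            line_start = text.rfind("\n", 0, match.start()) + 1
            line_end = text.find("\n", match.end())
            if line_end == -1:
                line_end = len(text)
            found.append(
                (seed, text[line_start:match.start()], text[match.end():line_end])
            )
    return found


def learn_lenient_wrappers(text, seeds):
    """Wrappers (L, R) that bracket instances of at least two distinct seeds."""
    candidates = set()
    for first, second in combinations(occurrences(text, seeds), 2):
        (seed_a, left_a, right_a), (seed_b, left_b, right_b) = first, second
        if seed_a == seed_b:
            continue  # two instances of one seed do not count as two seeds
        left = commonprefix([left_a[::-1], left_b[::-1]])[::-1]  # common suffix
        right = commonprefix([right_a, right_b])
        if left and right:
            candidates.add((left, right))
    # Keep only maximally long wrappers: drop any that a longer one extends.
    return {
        (l, r) for l, r in candidates
        if not any((l2, r2) != (l, r) and l2.endswith(l) and r2.startswith(r)
                   for l2, r2 in candidates)
    }


page = "\n".join([
    "<li>in Boston City Hall</li>",
    "<li>in Seattle City Hall</li>",
    "<li>at Boston University</li>",
    "<li>at Seattle University</li>",
    "<li>at Carnegie-Mellon University</li>",
])
lenient = learn_lenient_wrappers(page, ["Boston", "Seattle", "Carnegie-Mellon"])
```

On this made-up page the sketch recovers both the "at ... University" and the "in ... City Hall" contexts, matching the two wrappers listed above for the lenient case.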

Hinted Expander (HE)
• Utilizes contexts in the question to constrain SEAL’s search space on the Web
    ◦ Extract up to three keywords from the question using Ephyra’s keyword extractor
    ◦ Append the keywords to the search query
• Example:
    ◦ Name cities that have Starbucks.
• More likely to find documents containing desired set of answers
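
A sketch of appending question keywords to a seed query follows. Ephyra's keyword extractor is not reproduced here: `extract_keywords` is a hypothetical stand-in that simply drops words from a small, made-up stopword list, so the keywords it picks may differ from what Ephyra would return.

```python
import re
from typing import List, Sequence

# Tiny illustrative stopword list; an assumption, not Ephyra's actual resources.
STOPWORDS = {"a", "an", "the", "of", "in", "that", "which", "who", "what",
             "have", "has", "are", "is", "name", "list"}


def extract_keywords(question: str, limit: int = 3) -> List[str]:
    """Stand-in keyword extractor: first `limit` distinct non-stopwords."""
    keywords: List[str] = []
    for word in re.findall(r"[\w'-]+", question):
        if word.lower() not in STOPWORDS and word not in keywords:
            keywords.append(word)
    return keywords[:limit]


def hinted_query(seeds: Sequence[str], question: str) -> str:
    """Append up to three question keywords to the seed query."""
    return " ".join(list(seeds) + extract_keywords(question))


query = hinted_query(["Boston", "Seattle"], "Name cities that have Starbucks.")
# 'Boston Seattle cities Starbucks'
```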

Experiment #1: Ephyra
• Evaluate on TREC 13, 14, and 15 datasets
    ◦ 55, 93, and 89 list questions respectively
• Use SEAL to expand top four answers from Ephyra
    ◦ Outputs a list of answers ranked by confidence scores
• For each dataset, we report:
    ◦ Mean Average Precision (MAP)
        - Mean of average precision for each ranked list
    ◦ Average F1 with Optimal Per-Question Threshold
        - For each question, cut off the list at a threshold which maximizes the F1 score for that particular question
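
The two metrics can be sketched as below. The sketch assumes the common definition of average precision (precision at each correct answer, summed and divided by the number of correct answers for the question) and assumes each ranked list has no duplicate answers. The slides do not spell out either detail. The ranked list and gold set are made up for illustration.

```python
from typing import List, Sequence, Set, Tuple


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Precision at each correct answer, averaged over all correct answers."""
    hits, total = 0, 0.0
    for rank, answer in enumerate(ranked, start=1):
        if answer in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(questions: List[Tuple[Sequence[str], Set[str]]]) -> float:
    """Mean of the average precision of each question's ranked list."""
    return sum(average_precision(r, g) for r, g in questions) / len(questions)


def best_f1(ranked: Sequence[str], relevant: Set[str]) -> float:
    """F1 at the cutoff that maximizes F1 for this particular question."""
    best, hits = 0.0, 0
    for cutoff, answer in enumerate(ranked, start=1):
        hits += answer in relevant
        if hits:
            precision, recall = hits / cutoff, hits / len(relevant)
            best = max(best, 2 * precision * recall / (precision + recall))
    return best


# Hypothetical ranked answers and gold set, for illustration only.
ranked = ["Seattle", "Boston", "Chicago", "Carnegie-Mellon", "Pittsburgh"]
gold = {"Seattle", "Boston", "Chicago", "Pittsburgh"}
ap = average_precision(ranked, gold)   # (1/1 + 2/2 + 3/3 + 4/5) / 4
f1 = best_f1(ranked, gold)             # best cutoff keeps all five answers
```

Cutting the ranked list at a score threshold is equivalent to choosing a cutoff position, which is why `best_f1` scans positions instead of scores.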

Experiment #1: Ephyra [results figure]

Experiment #2: Ephyra
• In practice, thresholds are unknown
• For each dataset, do 5-fold cross validation:
    ◦ Train: Find one optimal threshold for four folds
    ◦ Test: Use threshold to evaluate the fifth fold
• Introduce a fourth dataset: All
    ◦ Union of TREC 13, 14, and 15
• Introduce another system: Hybrid
    ◦ Intersection of original answers from Ephyra and expanded answers from SEAL
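
The cross-validated threshold and the Hybrid intersection can be sketched as follows. Several details are assumptions of this sketch rather than facts from the slides: questions are assigned to folds round-robin, candidate thresholds are the confidence scores seen in training, per-fold F1 scores are averaged with equal weight, and Hybrid keeps SEAL's scores for the answers it retains. At least as many questions as folds are assumed.

```python
from typing import List, Sequence, Set, Tuple

Scored = List[Tuple[str, float]]     # (answer, confidence score)
Question = Tuple[Scored, Set[str]]   # (scored answers, gold answers)


def f1(predicted: Sequence[str], relevant: Set[str]) -> float:
    """F1 of an unranked answer set against the gold answers."""
    answers = set(predicted)
    hits = len(answers & relevant)
    if hits == 0:
        return 0.0
    precision, recall = hits / len(answers), hits / len(relevant)
    return 2 * precision * recall / (precision + recall)


def mean_f1(questions: List[Question], threshold: float) -> float:
    """Average F1 when every list is cut off at the same score threshold."""
    return sum(
        f1([a for a, s in scored if s >= threshold], gold)
        for scored, gold in questions
    ) / len(questions)


def best_threshold(questions: List[Question]) -> float:
    """Train: the single threshold that maximizes mean F1 on these questions."""
    candidates = sorted({s for scored, _ in questions for _, s in scored})
    return max(candidates, key=lambda t: mean_f1(questions, t), default=0.0)


def cross_validate(questions: List[Question], folds: int = 5) -> float:
    """Mean held-out F1 over the folds (requires len(questions) >= folds)."""
    scores = []
    for i in range(folds):
        test = questions[i::folds]
        train = [q for j, q in enumerate(questions) if j % folds != i]
        scores.append(mean_f1(test, best_threshold(train)))
    return sum(scores) / folds


def hybrid(qa_answers: Sequence[str], expanded: Scored) -> Scored:
    """Keep only expanded answers that the QA system also returned."""
    original = set(qa_answers)
    return [(a, s) for a, s in expanded if a in original]


# Toy data for illustration only.
toy = [([("a", 0.9), ("b", 0.2)], {"a"}) for _ in range(5)]
cv_f1 = cross_validate(toy)
kept = hybrid(["a", "c"], [("a", 0.9), ("b", 0.2)])
```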

Experiment #2: Ephyra [results figure]

Experiment: Other QA Systems
• Top five QA systems that perform the best on list questions in TREC 15 evaluation
    1. Language Computer Corporation (lccPA06)
    2. The Chinese University of Hong Kong (cuhkqaepisto)
    3. National University of Singapore (NUSCHUAQA1)
    4. Fudan University (FDUQAT15A)
    5. National Security Agency (QACTIS06C)
• For each QA system, train thresholds for SEAL and Hybrid on the union of TREC 13 and 14
    ◦ Expand top four answers from the QA systems on TREC 15, and apply the trained threshold

Experiment: Top QA Systems [results figure]

Conclusion
• A feasible method for integrating an SE approach into any QA system
• Proposed SE approach is effective
    ◦ Improves QA systems on list questions by using only a few top answers as seeds
• Proposed hybrid system is effective
    ◦ Improves Ephyra and (most) top five QA systems

Thank You!