Continued QA System Development Arjun Bhalla, Laurel Hart, Kathleen Kamali

Overview
• Major Restructuring of System
• Enhanced Query Processing, Analysis
• Things to Focus on for Next Deliverable

Major Restructuring
• Code was messy, classes were disorganized, and data flow was difficult to track
• Split the code into smaller, more straightforward classes, and grouped code for related tasks into the same class
• Renamed methods and changed their arguments so that data is passed in properly and processed more efficiently
• Each portion of the system now has a discernible purpose/subtask, and the flow of data is much easier to trace (good for future debugging)

Enhanced Query Analysis
• New stopword list built from the training question corpus
• First attempt at basic question classification: formed an array of question words and a parallel array of answer categories
• {“who”, “what”, “when”, “where”, “which”, “how”, “why”}
• {“person”, “thing”, “time”, “place”, “thing”, “number”, “reason”}
• Mapped the arrays onto each other by index, and classified each question into a basic category depending on which question word was found in it
• Additionally, used the LingPipe API to code a POS tagger that operates on a question string
  - Not yet used; in place for further experiments
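The parallel-array classification described above can be sketched as follows. This is a minimal illustration in Python, not the authors' code: the function name `classify_question`, the `"unknown"` fallback, and the choice to match whole tokens and take the first question word in reading order are all assumptions, since the slides do not say how questions with zero or several question words are handled.

```python
# Parallel arrays taken from the slide: index i in one maps to index i in the other.
QUESTION_WORDS = ["who", "what", "when", "where", "which", "how", "why"]
ANSWER_CATEGORIES = ["person", "thing", "time", "place", "thing", "number", "reason"]


def classify_question(question, default="unknown"):
    """Return the answer category for the first question word in `question`.

    Assumption: tokens are compared whole (so "show" does not match "how"),
    and `default` is returned when no question word is present.
    """
    # Lowercase and strip surrounding punctuation from each whitespace token.
    tokens = [tok.strip(".,?!\"'").lower() for tok in question.split()]
    for tok in tokens:
        if tok in QUESTION_WORDS:
            # Look up the category at the same index as the question word.
            return ANSWER_CATEGORIES[QUESTION_WORDS.index(tok)]
    return default
```

For example, `classify_question("Who wrote Hamlet?")` yields `"person"`. Note that the mapping is coarse by design: every "how" question is labeled `"number"`, including "how did ..." questions.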

New Answer Extraction Strategy
• The D2 strategy read each document line by line, extracted n-grams of length 2 × the query length, and ranked them as possible answers by similarity to the query words
• The new strategy extracts full sentences/paragraphs from the document and compares each against every token of the query string (except the first, which is most likely the question word)
• If a sentence contains all query words (not including the presumed question word), it is kept as a possible answer
• As a result, the system's possible answers are full sentences containing the query words
• Rationale: obtain a grammatically complete string that is clearly related to the query and therefore more likely to contain the true answer; the n-gram extraction strategy from D2 achieved neither property
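The sentence-level filter described above can be sketched as follows. This is a hedged Python illustration under stated assumptions, not the system's actual implementation: the helper names (`tokenize`, `split_sentences`, `candidate_sentences`), the regex-based tokenizer, and the naive punctuation-based sentence splitter are all hypothetical, and stopword removal is left out because the slides do not say where it is applied.

```python
import re


def tokenize(text):
    """Lowercase `text` and return its alphanumeric tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def split_sentences(document):
    """Naive splitter: break on whitespace that follows '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", document)
    return [p.strip() for p in parts if p.strip()]


def candidate_sentences(query, document):
    """Return sentences that contain every query token except the first.

    The first token is skipped because it is presumed to be the question word.
    """
    query_terms = set(tokenize(query)[1:])
    if not query_terms:
        # Nothing left to match on, so no sentence can qualify.
        return []
    return [
        sentence
        for sentence in split_sentences(document)
        if query_terms <= set(tokenize(sentence))  # all query terms present
    ]
```

Requiring every query term makes the filter high-precision but strict: a sentence that paraphrases the question or uses a different inflection ("written" for "wrote") is rejected, which is one place the planned POS tagging or stemming could help.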

Next steps
• Indexing has not yet been a focus; it must improve before the system as a whole can show improvement
• For query processing, the pieces are in place but not yet in use; we must find ways to connect POS tagging with the answer categorizer, and use tagging to match queries more accurately with reasonable answer strings