ASQA Academia Sinica Question Answering System for CLQA

ASQA: Academia Sinica Question Answering System for CLQA (IASL)
Cheng-Wei Lee, Cheng-Wei Shih, Min-Yuh Day, Tzong-Han Tsai, Tian-Jiang, Chia-Wei Wu, Cheng-Lung Sung, Yu-Ren Chen, Shih-Hung Wu, Wen-Lian Hsu
Academia Sinica, Taipei
aska@iis.sinica.edu.tw
NTCIR 2005

Outline
- The Design Principles
- System Architecture
    - Question Processing
    - Passage Retrieval
    - Answer Extraction
    - Answer Ranking
- Performance
- Conclusion

The Design Principles of ASQA
- Reduce cost by adopting existing components
    - InfoMap: a knowledge representation framework
    - Mencius: an NER engine
    - AutoTag: a Chinese word segmentation tool
    - Lucene: an open source IR engine
    - SVMLight and opennlp.maxent: machine learning packages
- Minimize system complexity
    - Only shallow NLP techniques are used
    - We want to see how a Chinese QA system performs without deep NLP techniques
- Incorporate human knowledge into machine learning methods
    - Knowledge editing tool
    - Knowledge as machine learning features
    - Knowledge as dominant strategy

Chinese Word Segmentation
- Chinese text lacks explicit word boundaries. Word segmentation is a necessary step in many Chinese applications
- Some word segmentation tools exist, but they are not designed for QA
- Combination rules are applied to form meaningful words for our QA system

Example: 第一銅鐵公司 (First Copper Iron Corp.)
    Segmenter output:        第一(Neu) 銅(Na) 鐵(Na) 公司(Nc)
    After combination rules: 第一(Neu) 銅鐵(Na) 公司(Nc)
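The combination step on this slide can be sketched as follows. This is only a minimal illustration that assumes a single rule, merging adjacent tokens tagged Na; the actual ASQA combination rules are not listed in the slides, and the function name is hypothetical.

```python
def combine_adjacent_nouns(tokens):
    """Merge runs of adjacent common-noun (Na) tokens into one word.

    `tokens` is a list of (word, pos) pairs as produced by a segmenter.
    Assumption: only the Na+Na rule is modeled; real rules may differ.
    """
    combined = []
    for word, pos in tokens:
        if combined and pos == "Na" and combined[-1][1] == "Na":
            # Glue the current noun onto the previous noun.
            prev_word, _ = combined.pop()
            combined.append((prev_word + word, "Na"))
        else:
            combined.append((word, pos))
    return combined


segmented = [("第一", "Neu"), ("銅", "Na"), ("鐵", "Na"), ("公司", "Nc")]
print(combine_adjacent_nouns(segmented))
```

On the slide's example, the two single-character nouns 銅 and 鐵 are merged into 銅鐵, while the numeral and the final noun of a different class are left untouched.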

Architecture of ASQA
[Figure: system architecture diagram. Stages: Question Processing, Passage Retrieval, Answer Extraction, Answer Ranking. Components shown: SVM, InfoMap, ME, AutoTag, Mencius, Lucene, Filter. Data passed between stages: QType, Segments, QFocus, QLimitations, Passages, Answer Candidates, Answers. Indexes: word index and char index over the documents.]

Question Processing
- Capture what the user wants
- Question classification
    - Goal: accurately classify a Chinese question into a question type
    - Chinese question: 奧運的發源地在哪裡? (Where is the originating place of the Olympics?)
    - Question type: Q_LOCATION|地
- QFocus analysis
    - Goal: capture other detailed information about the question, such as QFocus, NE, Time, and QFDescription

Taxonomy of Question Types

A Hybrid Approach for Chinese Question Classification
- Hybrid approach
    - SVM: machine learning
        - binary classifiers for each question type
    - InfoMap: knowledge representation framework
        - syntactic templates for classifying questions
- Features for the SVM QC model
    - Character bigram
    - HowNet Main Definition
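As an illustration of the character bigram feature named above, the sketch below lists the overlapping two-character substrings of a question. How ASQA turned these into SVM feature vectors is not stated in the slides, so only the extraction step is shown, and the function name is hypothetical.

```python
def char_bigrams(question):
    """Return the overlapping character bigrams of a question string."""
    return [question[i:i + 2] for i in range(len(question) - 1)]


print(char_bigrams("奧運的發源地"))
```

Character bigrams need no word segmentation, which fits the slides' stated preference for shallow techniques.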

A Hybrid Approach for Chinese Question Classification
- InfoMap and SVM are integrated according to their individual advantages
    - The InfoMap templates for matching question types are designed for high precision.
    - The SVM model uses the HowNet Main Definition semantic feature and has better recall.
- Use the InfoMap approach as the dominant strategy
- Fall back to SVM only if no InfoMap template matches
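The "templates first, SVM as fallback" strategy can be sketched as below. The regular expression is a hypothetical stand-in for an InfoMap template, the stub stands in for a trained SVM, and the label Q_PERSON is invented for illustration; only Q_LOCATION appears in the slides.

```python
import re

# Hypothetical high-precision templates standing in for InfoMap templates.
TEMPLATES = [
    (re.compile(r"哪裡|何地"), "Q_LOCATION"),
]


def classify_question(question, fallback_classifier):
    """Return a template's label if one matches, else the fallback's label."""
    for pattern, qtype in TEMPLATES:
        if pattern.search(question):
            return qtype
    # No template matched: defer to the higher-recall statistical model.
    return fallback_classifier(question)


def stub_svm(question):
    # Placeholder for a trained SVM; always returns a fixed, hypothetical label.
    return "Q_PERSON"


print(classify_question("奧運的發源地在哪裡?", stub_svm))
```

Ordering the two classifiers this way lets the precise rules decide whenever they apply, so the statistical model only affects questions the templates cannot cover.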

QFocus Analysis
- QFocus analysis is a tagging problem, which differs from QType classification
- QFocus analysis extracts several types of information
    - QFocus (QF): the category name of the answers
    - Time (TI): time or date expressions
    - Named Entities (NE): PERSON, LOCATION, ORGANIZATION
    - QF Description (QFD): other description of the answer

Examples:
請問 [2000年/TI] 的 [G8高峰會/NE] 在 [日本/NE] 何地舉行?
    Glosses: 2000年 = year 2000; G8高峰會 = G8 summit; 日本 = Japan
    Which place in Japan hosted the G8 summit in 2000?
請問 [芬蘭第一位女總統/QF] 為誰?
    Gloss: 芬蘭第一位女總統 = Finland's first woman president
    Who is Finland's first woman president?
請問 [2000年/TI] [沉沒於北極圈巴倫支海/QFD] 的 [俄羅斯核子潛艇/QF] 的名字?
    Glosses: 2000年 = year 2000; 沉沒於北極圈巴倫支海 = sank in the Barents Sea; 俄羅斯核子潛艇 = Russian nuclear submarine
    Which Russian nuclear submarine sank in the Barents Sea in 2000?
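To make the bracket notation in these examples concrete, the sketch below reads an annotated question and returns its (tag, text) pairs. The [text/TAG] format is taken from the slide; the parser itself and its names are illustrative assumptions, not part of ASQA.

```python
import re

# Matches one annotated span such as "[2000年/TI]".
TAGGED_SPAN = re.compile(r"\[([^\[\]/]+)/([A-Z]+)\]")


def extract_tagged_spans(annotated):
    """Return (tag, text) pairs in the order they appear in the question."""
    return [(tag, text) for text, tag in TAGGED_SPAN.findall(annotated)]


example = "請問 [2000年/TI] 的 [G8高峰會/NE] 在 [日本/NE] 何地舉行?"
print(extract_tagged_spans(example))
```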

A Hybrid Approach of QFocus Analysis
- Combine syntactic rules and an ME model
    - Tagging problem
    - The ME features are context words, context POS, and previous tags
    - 718 tagged question sentences
- Syntactic rule examples
    - "Noun" string located behind “的”, “之” -> QF
    - "Noun" string located in front of “是”, “為”, “於”, and “在” -> QF
    - String quoted by “「」” and “( )” -> QFD
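One of the syntactic rules above, tagging quoted strings as QFD, can be sketched with a regular expression. Only the 「」 form is handled here; the parenthesis variant and the noun-based rules would need POS information and are left out. The function name is hypothetical.

```python
import re

# A string enclosed in corner brackets, e.g. 「天黑黑」.
QUOTED = re.compile(r"「([^」]+)」")


def tag_quoted_as_qfd(question):
    """Return (text, "QFD") for every corner-bracket quoted string."""
    return [(text, "QFD") for text in QUOTED.findall(question)]


print(tag_quoted_as_qfd("請問台灣童謠「天黑黑」是由哪位作曲家所創作?"))
```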

Passage Retrieval with Lucene
- The required operator
    - Initial Query (IQ) sets quoted and noun terms as required
    - Relaxed Query (RQ) doesn’t set any term as required
- The boosting operator
    - Quoted terms: 2
    - Nouns: 1.2
    - Verbs: 0.7
- Passage retrieval runtime workflow
    [Flowchart: four query steps, "Q by IQ with W-idx", "Q by IQ with C-idx", "Q by RQ with W-idx", and "Q by RQ with C-idx" (W-idx = word index, C-idx = char index), with an "Any result?" decision (YES/NO) and a Sort step before End.]

Example question: 請問台灣童謠「天黑黑」是由哪位作曲家所創作? (Which composer wrote the Taiwanese children's song 「天黑黑」?)
Initial query example: +"作曲家"^1.2 +"台灣"^1.2 "創作"^0.7 +"童謠"^1.2 +"天黑黑"^2
Relaxed query example: "作曲家"^1.2 "台灣"^1.2 "創作"^0.7 "童謠"^1.2 "天黑黑"^2
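The two query forms can be generated from the same term list, as in the sketch below. It only builds query strings in Lucene's `+` (required) and `^` (boost) syntax using the boost values from the slide; it does not call Lucene, and the function name and the "quoted"/"noun"/"verb" labels are illustrative assumptions.

```python
# Boost values from the slide, kept as strings so they print exactly as written.
BOOSTS = {"quoted": "2", "noun": "1.2", "verb": "0.7"}
# Term kinds that the initial query marks as required.
REQUIRED_KINDS = {"quoted", "noun"}


def build_query(terms, relaxed=False):
    """Build an initial (default) or relaxed query string from (text, kind) pairs."""
    parts = []
    for text, kind in terms:
        prefix = "+" if (not relaxed and kind in REQUIRED_KINDS) else ""
        parts.append(f'{prefix}"{text}"^{BOOSTS[kind]}')
    return " ".join(parts)


terms = [
    ("作曲家", "noun"),
    ("台灣", "noun"),
    ("創作", "verb"),
    ("童謠", "noun"),
    ("天黑黑", "quoted"),
]
print(build_query(terms))
print(build_query(terms, relaxed=True))
```

The relaxed form keeps the boosts but drops every `+`, so a passage can still be returned when it lacks some of the nouns or the quoted title.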

Answer Extraction
- The top 5 passages are sent to the answer extraction module
- Named entity recognition (Mencius)
    - PERSON, LOC, and ORG are recognized by the ME-based NER engine
    - Fine-grained and other coarse-grained types are identified by the taxonomy and templates in InfoMap
- Answer filtering
    - Answers that are incompatible with the QType are filtered out
    - Compatibility of question and answer types is defined by a mapping table
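A mapping-table filter of the kind described above might look like the following sketch. The table contents are hypothetical, since the slides do not reproduce ASQA's actual mapping, and apart from Q_LOCATION the question type names are invented for illustration.

```python
# Hypothetical compatibility table: question type -> acceptable entity types.
COMPATIBLE_TYPES = {
    "Q_LOCATION": {"LOCATION"},
    "Q_PERSON": {"PERSON"},
    "Q_ORGANIZATION": {"ORGANIZATION"},
}


def filter_candidates(qtype, candidates):
    """Keep only (text, entity_type) candidates compatible with the question type."""
    allowed = COMPATIBLE_TYPES.get(qtype, set())
    return [(text, ne_type) for text, ne_type in candidates if ne_type in allowed]


candidates = [("歐布萊特", "PERSON"), ("北韓", "LOCATION"), ("美國", "LOCATION")]
print(filter_candidates("Q_PERSON", candidates))
```

In this sketch an unknown question type filters out every candidate; a real system could instead choose to pass all candidates through in that case.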

Answer Ranking
- Rank answer candidates by their ranking scores
- A ranking score is calculated according to the QFocus analysis results
    - Score components shown: NE Score, Cue Score, QFocus Scores

System Performance of CLQA Chinese to Chinese Task

Performance
[Figures: Fig. 1, Fig. 2, Fig. 3, Fig. 4]

Conclusions
- We have demonstrated that an effective Chinese QA system can be created by
    - Using shallow NLP techniques
    - Integrating knowledge templates (InfoMap) and machine learning methods (SVM, ME)
- An open source IR engine is usable for Chinese QA
- In future work, we would like to include deeper NLP techniques
    - Parsing
    - Event structure/Relation

Thank You for Your Attention
Intelligent Agent Systems Lab (IASL) 智慧型代理人系統實驗室
Academia Sinica Question Answering System for CLQA

Knowledge representation for CQC in InfoMap
Question: 2004年奧運在哪一個城市舉行? (In which city were the Olympics held in 2004?)
Template: [5 Time]:[3 Organization]:[7 Q_Location]:([9 LocaitonRelatedEvent])

Keyword Extraction and Document Preprocessing
- Question keyword extraction
    - Question segments are combined by the combination rules
    - The combined words are filtered with a stop word list to remove unwanted Chinese words such as ‘請問’ (please) or ‘的’ (a possessive particle)
- Document preprocessing
    - Each news document is split into long sentences at these three punctuation marks: 。!?
    - These sentences are indexed by word and by character
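Both preprocessing steps are simple enough to sketch. The stop word list below contains only the two words the slide names, and the sentence pattern accepts both full-width and ASCII forms of the marks as an assumption; the function names are hypothetical.

```python
import re

# A run of non-terminator characters followed by an optional terminator.
SENTENCE = re.compile(r"[^。!?!?]+[。!?!?]?")
# Only the two stop words named on the slide; the real list is longer.
STOP_WORDS = {"請問", "的"}


def split_sentences(document):
    """Split a document into sentences, keeping the closing punctuation."""
    return [s.strip() for s in SENTENCE.findall(document) if s.strip()]


def filter_keywords(words):
    """Drop stop words from a list of combined question words."""
    return [w for w in words if w not in STOP_WORDS]


print(split_sentences("今天下雨。明天呢?好!"))
print(filter_keywords(["請問", "台灣", "的", "童謠"]))
```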

Answer Filtering Table

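QFocus Score Example
Question: 請問2000年率團訪問北韓的美國國務卿為誰? (Who was the U.S. Secretary of State who led a delegation to North Korea in 2000?)
Answer: 歐布萊特 (Albright)
Passage: 由於美國國務卿歐布萊特正在北韓訪問,楊潔篪此行將與副國務卿陶伯特等官員會面。 (Because U.S. Secretary of State Albright is visiting North Korea, Yang Jiechi will meet Deputy Secretary of State Talbott and other officials on this trip.)
- NE_Q = {美國, 北韓}, NE_P = {美國, 北韓}
- CUE_Q = {2000年, 國務卿}, CUE_P = {國務卿}
The ranking score is (2/2) + (1/2) + 1 + 0 = 2.5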
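The arithmetic in this example can be reproduced with the sketch below. It reads the first two terms as the fraction of the question's named entities and cue words that also occur in the passage, which is an inference from the sets shown. The slide does not explain the last two terms (1 and 0), so they are simply passed in as given scores. All function names are hypothetical.

```python
def overlap_ratio(question_items, passage_items):
    """Fraction of the question's items that also appear in the passage."""
    if not question_items:
        return 0.0
    return len(question_items & passage_items) / len(question_items)


def ranking_score(ne_q, ne_p, cue_q, cue_p, other_scores):
    """Sum of the NE overlap, the cue overlap, and any remaining given scores."""
    return overlap_ratio(ne_q, ne_p) + overlap_ratio(cue_q, cue_p) + sum(other_scores)


score = ranking_score(
    ne_q={"美國", "北韓"},
    ne_p={"美國", "北韓"},
    cue_q={"2000年", "國務卿"},
    cue_p={"國務卿"},
    other_scores=[1, 0],  # the slide's last two terms, taken as given
)
print(score)  # 2.5
```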

http://asqa.iis.sinica.edu.tw/clqa/

CLQA Lucene Indexer (CLQALucenePassageIndexer)
(1) Segment the text / POS tagging (data source: Big5)
(2) Combine short tokens via POS
(3) Create the Field from the combined texts (POS deleted). Context: field.Text(); DocID: field.Keyword()
(4) Add a Field into a Document: doc.add(field)
(5) Choose an Analyzer (standard for char / whitespace for word): iw = new IndexWriter(@IndexDirPath, new someAnalyzer(), true)
(6) Add a Document into an Index: IndexWriter.addDocument()
[Diagram labels: an IndexWriter adds documents (doc1, doc2), each holding fields (field1: name1/value1, field2: name2/value2), to the Index.]
Example of word-segmented text: 台北市<space>市長<space>馬英九
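The difference between the word index and the char index comes down to how the segmented text is tokenized in step (5). The sketch below imitates the two tokenizations in plain Python; it does not use Lucene, and the function names are hypothetical.

```python
def word_tokens(segmented_text):
    """Split on whitespace, as a whitespace analyzer would for the word index."""
    return segmented_text.split()


def char_tokens(segmented_text):
    """Emit one token per non-space character, as for the char index."""
    return [ch for ch in segmented_text if not ch.isspace()]


text = "台北市 市長 馬英九"
print(word_tokens(text))
print(char_tokens(text))
```

A query for a substring such as 台北 cannot match the word token 台北市, but it can match consecutive tokens in the character index, which is one reason to keep both indexes.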

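NER Features
Example: 我/O 是/O 台北市 人/O, with POS tags Nh VH Nc Na (我是台北市人 = "I am from Taipei City"). The current chunk is 台北市.
    chunk -1 = 是
    chunk 0 = 台北市
    chunk +1 = 人
    chunkpos -1 = VH
    chunkpos 0 = Nc
    chunkpos +1 = Na
    chunkfirstword = 台
    chunklastword = 市
    char -2 = 我
    char -1 = 是
    char 0 = 台
    char +1 = 北
    char +2 = 市
    tag -2 = O
    tag -1 = O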
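The feature table above can be derived mechanically from the chunk sequence, as in this sketch. It assumes that the character window is centered on the first character of the current chunk, which matches the values shown; the function name and the dictionary layout are illustrative only.

```python
def ner_features(chunks, pos_tags, previous_tags, i):
    """Build the slide's feature set for the chunk at position i."""

    def at(seq, j):
        # Return seq[j], or None when j falls outside the sequence.
        return seq[j] if 0 <= j < len(seq) else None

    text = "".join(chunks)
    start = sum(len(c) for c in chunks[:i])  # offset of the chunk's first character
    features = {
        "chunkfirstword": chunks[i][0],
        "chunklastword": chunks[i][-1],
    }
    for k in (-1, 0, 1):
        features[("chunk", k)] = at(chunks, i + k)
        features[("chunkpos", k)] = at(pos_tags, i + k)
    for k in (-2, -1, 0, 1, 2):
        features[("char", k)] = at(text, start + k)
    for k in (-2, -1):
        features[("tag", k)] = at(previous_tags, i + k)
    return features


features = ner_features(
    chunks=["我", "是", "台北市", "人"],
    pos_tags=["Nh", "VH", "Nc", "Na"],
    previous_tags=["O", "O"],  # tags already assigned to the first two chunks
    i=2,
)
print(features)
```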

QFocus Features
- Context word features: chunk at offsets -2, -1, 0, +1, +2
- Previous tags: tag -2, tag -1
- chunkpos -1, chunkpos 0, chunkpos +1

Maximum Entropy Framework
Manning and Schütze: "Maximum entropy modeling is a framework for integrating information from many heterogeneous information sources for classification. The data for a classification problem is described as a (potentially large) number of features. These features can be quite complex and allow the experimenter to make use of prior knowledge about what types of informations are expected to be important for classification. Each feature corresponds to a constraint on the model. We then compute the maximum entropy model, the model with the maximum entropy of all the models that satisfy the constraints. This term may seem perverse, since we have spent most of the book trying to minimize the (cross) entropy of models, but the idea is that we do not want to go beyond the data. If we chose a model with less entropy, we would add 'information' constraints to the model that are not justified by the empirical evidence available to us. Choosing the maximum entropy model is motivated by the desire to preserve as much uncertainty as possible."
