Exploring Segment Representations for Neural Segmentation Models

Yijia Liu, Wanxiang Che, Jiang Guo, Bing Qin, and Ting Liu
Research Center for Social Computing and Information Retrieval, Harbin Institute of Technology

Problem: NLP Segmentation Problem

Motivating: Can we use word embedding in CWS?
• 浦东开发与建设 → 浦东 / 开发 / 与 / 建设 ("Pudong development and construction")

Motivating: Can we use word embedding in CWS?
• To achieve this goal, we need
  • to access the segment (the potential word) during inference
  • to represent the segment

Motivating: Can we use word embedding in CWS?
• To achieve this goal, we need
  • to access the segment (the potential word) during inference → structure prediction: in "浦东开发与建设", "浦东" is a potential word
  • to represent the segment → segment representation: "浦东": [0.5, 0.3, 0.6, …] and "虹桥": [0.5, 0.2, 0.5, …] have similar syntactic/semantic functions

Motivating: Can we use word embedding in CWS?
• To achieve this goal, we need
  • to access the segment (the potential word) during inference → semi-Markov CRF: in "浦东开发与建设", "浦东" is a potential word
  • to represent the segment → deep learning: "浦东": [0.5, 0.3, 0.6, …] and "虹桥": [0.5, 0.2, 0.5, …] have similar syntactic/semantic functions

Refresh on semi-CRF

• CRF-style features: input-unit-level information, e.g. the character
• semi-CRF-style features: segment-level information, e.g. the length of the segment
• both suffer from sparsity and cannot efficiently utilize unlabeled data
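The semi-CRF formula on the slide did not survive the transcript. As a refresher: a semi-CRF scores a candidate segmentation as the sum of the scores of its segments, and decoding picks the highest-scoring segmentation with a dynamic program over segment end positions. Below is a minimal sketch of that decoder (not the paper's code), assuming a user-supplied segment_score(i, j) that scores the span x[i:j]; in the paper's models this score comes from the neural segment representation.

```python
def semi_markov_viterbi(n, segment_score, max_len):
    """Return the best segmentation of a length-n input as a list of (i, j) spans."""
    NEG_INF = float("-inf")
    best = [0.0] + [NEG_INF] * n   # best[j]: score of the best segmentation of x[0:j]
    back = [0] * (n + 1)           # back[j]: start of the last segment ending at j
    for j in range(1, n + 1):
        for i in range(max(0, j - max_len), j):
            score = best[i] + segment_score(i, j)
            if score > best[j]:
                best[j], back[j] = score, i
    spans, j = [], n
    while j > 0:                   # recover the spans by walking the backpointers
        spans.append((back[j], j))
        j = back[j]
    return list(reversed(spans))
```

Decoding costs O(n · max_len) segment evaluations; training replaces the max with a log-sum-exp over all segmentations to compute the partition function.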


Composing Input Units
• Nets: SRNN, SCNN, SCONCATE
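The architecture figure for the three nets is not in the transcript. As a rough sketch under assumed dimensions: SRNN runs a recurrent net over the unit vectors inside a candidate segment, while SCONCATE simply concatenates them up to a fixed length (SCNN, omitted here, applies a convolution over the units). The plain tanh cell and the sizes are illustrative, not the paper's exact architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
D, H, MAX_LEN = 50, 64, 6                  # illustrative sizes
W_x = rng.normal(scale=0.1, size=(H, D))   # input-to-hidden weights
W_h = rng.normal(scale=0.1, size=(H, H))   # hidden-to-hidden weights

def srnn(unit_vecs):
    """SRNN: recur over the segment's unit vectors; the last state represents the segment."""
    h = np.zeros(H)
    for x in unit_vecs:
        h = np.tanh(W_x @ x + W_h @ h)
    return h

def sconcate(unit_vecs):
    """SCONCATE: concatenate the unit vectors, padded/truncated to a fixed length."""
    padded = list(unit_vecs[:MAX_LEN]) + [np.zeros(D)] * max(0, MAX_LEN - len(unit_vecs))
    return np.concatenate(padded)
```

SCONCATE trades the recurrent composition for a fixed-size concatenation, which is why the conclusion later reports it comparable to SRNN in accuracy but faster.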

Embedding Entire Segment
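The body of this slide is lost in the transcript, but judging from the experiments that follow, the idea is a lookup table keyed on the full segment string, with vectors obtained from embeddings trained on auto-segmented raw text. A minimal sketch with a hypothetical toy table:

```python
import numpy as np

D = 50
# Hypothetical pretrained table; in the paper's setting it would be learned from
# large-scale auto-segmented data, not hand-written like this.
pretrained = {"浦东": np.full(D, 0.5), "虹桥": np.full(D, 0.4)}
UNK = np.zeros(D)  # shared vector for segments absent from the table

def segment_embedding(chars, i, j):
    """Look up the embedding of the candidate segment chars[i:j]."""
    return pretrained.get("".join(chars[i:j]), UNK)
```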

Final Model
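The final-model figure is likewise not recoverable. A toy sketch of how the pieces above plausibly combine: concatenate the composed unit-level vector with the segment embedding, score the segment with a linear layer, and hand that score to the semi-Markov decoder. The linear scorer and all weights here are illustrative assumptions; the sketch reuses srnn, segment_embedding, and semi_markov_viterbi from the earlier blocks.

```python
import numpy as np

rng = np.random.default_rng(1)
D, H = 50, 64                           # must match the earlier sketches
w = rng.normal(scale=0.1, size=H + D)   # illustrative linear scoring weights
b = 0.0

def make_segment_score(chars, unit_vecs):
    """Build the segment_score(i, j) consumed by semi_markov_viterbi."""
    def segment_score(i, j):
        rep = np.concatenate([srnn(unit_vecs[i:j]),             # composed units
                              segment_embedding(chars, i, j)])  # segment embedding
        return float(w @ rep + b)
    return segment_score

# Usage on the running example:
chars = list("浦东开发与建设")
unit_vecs = [rng.normal(scale=0.1, size=D) for _ in chars]
spans = semi_markov_viterbi(len(chars), make_segment_score(chars, unit_vecs), max_len=4)
```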

Experiments
• Two typical NLP segmentation tasks: NER and CWS
• Baselines:
  • sparse-feature CRF
  • neural sequence labeling
  • neural CRF

w/ Input Unit Composition only
• structure prediction models outperform classification models
• but the differences among the structure prediction models are not significant

w/ Segment Embedding: Learning from the Training Data?
• severe overfitting
• initializing with pretrained embeddings solves this problem

w/ Segment Embedding: Auto-segmented Data from a Homo- or Hetero- Baseline
• generally, they all help
• the hetero- baseline is a little better than the homo- one
• this mirrors the intuition behind boosting in machine learning

Final Result
• using segment-level representations greatly improves performance

Final Result (compared w/ NER SOTA)
• achieves comparable performance without domain-specific knowledge

Final Result (compared w/ CWS SOTA)
• achieves SOTA on two datasets

Conclusion
• We thoroughly study how to represent segments in the neural semi-CRF
• SCONCATE is comparable with SRNN but runs faster
• Segment embeddings greatly improve performance
• Our code can be found at: https://github.com/ExpResults/segrep-for-nn-semicrf

Thanks and Questions!