Language Knowledge Engineering Lab Kyoto University EBMT System
Language Knowledge Engineering Lab., Kyoto University

EBMT System of Kyoto University in OLYMPICS Task at IWSLT 2012
Chenhui Chu, Toshiaki Nakazawa, Sadao Kurohashi
{chu, nakazawa, kuro}@nlp.ist.i.kyoto-u.ac.jp
Graduate School of Informatics, Kyoto University

System Description

Sub-sentence Splitting
Zh: 我带了矿泉水和茶,您喜欢喝什么?
En: I’ve brought some mineral water and some tea. Which do you prefer?
Split into:
Zh 1: 我带了矿泉水和茶。
En 1: I’ve brought some mineral water and some tea.
Zh 2: 您喜欢喝什么?
En 2: Which do you prefer?

Optimized Chinese Segmenter
Original                          Optimized
联合国 (United Nations)            联合 (United)/国 (Nations)
研究所 (research institute)        研究 (research)/所 (institute)
服务业 (service industry)          服务 (service)/业 (industry)
竞争力 (competitive strength)      竞争 (competitive)/力 (strength)
发明人 (inventor)                  发明 (invent)/人 (person)
纳税人 (taxpayer)                  纳税 (pay taxes)/人 (person)
…                                 …
※ Proposed for Chinese-Japanese MT [Chu+, 2012]

Non-parallel Sentence Filtering
Zh: 我上牛津大学。 (I am studying at Oxford University.)
En: What about you?
Zh: 是的,这位女士要一杯曼哈顿酒,我要一杯马丁尼。 (Yes, this lady will have a Manhattan, and I’ll have a martini.)
En: Yes, I think so. This lady will have a Manhattan and I’ll have a martini.
[Figure: filtering pipeline. Labels: HIT & BTEC corpora; 5K parallel sentence pairs from BTEC; Cartesian Product; Non-parallel sentence pairs; Selection; 500 Non-parallel sentence pairs; Dictionary; Classifier; Step 1 & 2; Step 3]

Alignment Model
Related Work [DeNero+, 2008]: Simple position-based reordering
Proposed [Nakazawa+, 2011]: Dependency tree-based reordering

Rule-based Decoding Constraints
Punctuation-based splitting
- Reduce the computational complexity
- Avoid examples across punctuation boundaries

Additional Corpora
Wiki corpora
- Bilingual titles in Wikipedia
- Bilingual terms in Wiktionary

Experimental Results
        Baseline    Optimized    & Constrained    +Wiki
BLEU    11.62       12.09        12.71            12.22

Input: 这饭菜怎么样, 女士?
Output: How about this food, Madam?
Input: 干酪饼很不错, 尝一尝吗?
Output: The cheese pie is very good, taste?

IWSLT 2012 OLYMPICS Task, Hong Kong, Dec. 6-7, 2012
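
The sub-sentence splitting example on the poster can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: the function name `split_pair`, the punctuation sets, and the rule that a pair is split only when both sides yield the same number of pieces. A clause-final comma on the Chinese side is rewritten as 。 to match the poster's Zh 1 output.

```python
import re

# Assumed split points: Chinese clause/sentence punctuation (full-width and ASCII).
ZH_PIECE = re.compile(r"[^。?!,,?!]+[。?!,,?!]?")
# Assumed split points for English: whitespace following sentence-final punctuation.
EN_BOUNDARY = re.compile(r"(?<=[.?!])\s+")


def _close_clause(piece):
    """Strip whitespace and turn a trailing comma into a full stop."""
    piece = piece.strip()
    if piece and piece[-1] in ",,":
        piece = piece[:-1] + "。"
    return piece


def split_pair(zh, en):
    """Split a Zh-En sentence pair into sub-sentence pairs.

    Hypothetical rule: split only when both sides produce the same
    number (at least 2) of pieces; otherwise keep the pair unchanged.
    """
    zh_parts = [_close_clause(p) for p in ZH_PIECE.findall(zh) if p.strip()]
    en_parts = [p.strip() for p in EN_BOUNDARY.split(en) if p.strip()]
    if len(zh_parts) < 2 or len(zh_parts) != len(en_parts):
        return [(zh, en)]
    return list(zip(zh_parts, en_parts))


if __name__ == "__main__":
    pairs = split_pair(
        "我带了矿泉水和茶,您喜欢喝什么?",
        "I've brought some mineral water and some tea. Which do you prefer?",
    )
    for zh_part, en_part in pairs:
        print(zh_part, "|", en_part)
```

Requiring equal piece counts is a conservative choice: pairs whose punctuation does not line up are left whole rather than risk misaligned sub-sentences.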
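
The filtering figure mentions taking a Cartesian product of parallel sentence pairs to obtain non-parallel pairs, then selecting 500 of them for a classifier. The sketch below shows one way such negative examples could be generated. The poster does not state the selection criterion, so uniform random sampling is used here purely as a stand-in, and the function name `sample_non_parallel` and its parameters are hypothetical.

```python
import random


def sample_non_parallel(parallel_pairs, k, seed=0):
    """Build non-parallel (Zh, En) pairs from parallel ones.

    Pairs every source sentence with every target sentence (Cartesian
    product), drops the original aligned pairs, then samples k of the
    remaining mismatched pairs. Random sampling is an assumption; the
    poster's actual selection step is not recoverable from the text.
    """
    candidates = [
        (zh, en)
        for i, (zh, _) in enumerate(parallel_pairs)
        for j, (_, en) in enumerate(parallel_pairs)
        if i != j  # skip the aligned (parallel) combinations
    ]
    rng = random.Random(seed)
    return rng.sample(candidates, min(k, len(candidates)))


if __name__ == "__main__":
    toy_pairs = [
        ("我上牛津大学。", "I am studying at Oxford University."),
        ("您喜欢喝什么?", "Which do you prefer?"),
        ("这饭菜怎么样, 女士?", "How about this food, Madam?"),
    ]
    for zh, en in sample_non_parallel(toy_pairs, k=4):
        print(zh, "|", en)
```

Materializing the full product is quadratic in the number of pairs, so for a corpus of several thousand sentences one would more likely sample index pairs directly instead of building the whole candidate list.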