Finding Translation Correspondences from Parallel Parsed Corpus for Example-based Translation
Eiji Aramaki (Kyoto-U), Sadao Kurohashi (U-Tokyo), Satoshi Sato (Kyoto-U), Hideo Watanabe (IBM Japan)

Introduction
Extracting translation examples from a parallel corpus:
• Statistical approach (co-occurrence information): 1-2%
• Our method (syntactic information, translation dictionary): 50%

Goal
English: This paper shows ... great contributions of TFP ...
Japanese: 全要素生産性 が (TFP + case marker) 大きく (great) 寄与して いること が (contribution + case marker) 示されている (show)

Problems
• Finding many correspondences relies on a translation dictionary, which raises 2 problems:
  1: Some words cannot be looked up in the dictionary.
  2: Dictionary lookups are ambiguous and need to be resolved.

Overview • Introduction • Method • Experiments • Conclusion

Method
Step 1: Detection of Phrasal Dependency Structure
Step 2: Detection of Basic Phrasal Correspondences by Consulting Dictionary
Step 3: Discovery of New Correspondences by Handling Remaining Phrases

Step 1: Phrasal Dependency Structures
Input: I bought this car by monthly installments.
→ ESG (English parser) → Rules →
Output: phrasal dependency structure of "I bought this car by monthly installments"

Step 1: Phrasal Dependency Structures
Rules
• Function words are grouped together with the following content word.
• A compound noun is considered as one phrase.
• Auxiliary verbs are grouped together with the following verb (is playing, was tired, ...).
• A parallel-relation word is considered as one phrase (and, or, ...).
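
The four grouping rules can be illustrated with a small sketch. This is not the authors' implementation: the coarse tag names (DET, PREP, AUX, NOUN, ...), the treatment of pronouns as content words, and the function name group_phrases are assumptions made only for illustration, and the real system works on ESG parser output rather than a flat tagged list.

```python
# Hypothetical coarse tags; the actual ESG tagset is not given in the slides.
FUNCTION_TAGS = {"DET", "PREP", "AUX"}
PARALLEL_WORDS = {"and", "or"}


def group_phrases(tagged):
    """Group (word, tag) pairs into phrases following the four rules."""
    phrases = []
    pending = []        # function/auxiliary words waiting for a content word
    prev_noun = False   # True if the last phrase ends in a noun
    for word, tag in tagged:
        if word.lower() in PARALLEL_WORDS:
            # Rule 4: a parallel-relation word is a phrase of its own.
            if pending:
                phrases.append(pending)
                pending = []
            phrases.append([word])
            prev_noun = False
        elif tag in FUNCTION_TAGS:
            # Rules 1 and 3: hold function and auxiliary words for later.
            pending.append(word)
            prev_noun = False
        elif tag == "NOUN" and prev_noun:
            # Rule 2: adjacent nouns form one compound-noun phrase.
            phrases[-1].append(word)
        else:
            # Content word: attach any pending function words in front.
            phrases.append(pending + [word])
            pending = []
            prev_noun = tag == "NOUN"
    if pending:
        phrases.append(pending)
    return [" ".join(p) for p in phrases]


example = [
    ("She", "PRON"), ("is", "AUX"), ("playing", "VERB"),
    ("with", "PREP"), ("information", "NOUN"), ("technology", "NOUN"),
    ("and", "CONJ"), ("the", "DET"), ("car", "NOUN"),
]
print(group_phrases(example))
```

On this made-up input the sketch yields the phrases "She", "is playing", "with information technology", "and", "the car".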

Step 2: Detection of Phrasal Correspondences
English: ... information technology ... in science technology ...
Japanese: ... 科学 技術 に おける (science technology) 情報 技術 (information technology) ...

Step 2: Detection of Phrasal Correspondences
• Criteria to choose phrasal correspondences
  – Correspondences of content words:
    (# of word-links × 2) / (# of J content words + # of E content words)
  – Correspondences of neighboring phrases
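
The content-word criterion can be sketched as follows, using the "science technology" / "information technology" example from the earlier slide. The function name, the tiny dictionary, and the greedy one-to-one linking are assumptions for illustration. The sketch covers only the content-word ratio and leaves out the neighboring-phrase criterion, so its values are not the final scores the system reports.

```python
def content_word_score(e_words, j_words, dictionary):
    """Return 2 * links / (number of E content words + number of J content words).

    `dictionary` maps an English word to a set of Japanese translations.
    Each Japanese word is linked at most once (greedy, an assumption).
    """
    links = 0
    used = set()
    for e in e_words:
        for idx, j in enumerate(j_words):
            if idx not in used and j in dictionary.get(e, ()):
                used.add(idx)
                links += 1
                break
    total = len(e_words) + len(j_words)
    return 2 * links / total if total else 0.0


# Hypothetical dictionary entries for the slide's example.
toy_dictionary = {
    "science": {"科学"},
    "information": {"情報"},
    "technology": {"技術"},
}

# Both English phrases contain "technology", so a lookup of 技術 alone is ambiguous.
good = content_word_score(["science", "technology"], ["科学", "技術"], toy_dictionary)
bad = content_word_score(["information", "technology"], ["科学", "技術"], toy_dictionary)
print(good, bad)
```

Here the pair "science technology" / 科学 技術 has two word links out of four content words and scores 1.0, while "information technology" / 科学 技術 has one link and scores 0.5, so the ratio prefers the first pairing.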

Step 3: Discovery of New Correspondences by Handling Remaining Phrases
(New) in post Cold war years ↔ 冷戦 (cold war) 終結 (end) 後 に (after + case marker)
(Merge) goods and services ↔ 物 や (object) サービス の (service)

Step 3: Discovery of New Correspondences by Handling Remaining Phrases
• Criteria to discover new correspondences
  – Local and global supports
    • Local support: other phrasal correspondences within two-phrase distance in the dependency structure.
    • Global support: phrase correspondences in the parallel sentences.
  – POS consistency
  – Inner sufficiency
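
One way to read the local-support criterion is as a bounded breadth-first search over the phrasal dependency tree. The sketch below follows that reading under stated assumptions: the tree is an undirected adjacency dict, the phrase ids p0 to p4 are made up, and the function names are hypothetical. How the supports are combined into a decision is not specified in the slides and is not modeled here.

```python
from collections import deque


def phrases_within(tree, start, max_dist=2):
    """Return all phrases at dependency distance 1..max_dist from `start`."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_dist:
            continue  # do not expand beyond the distance limit
        for neighbor in tree.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return {n for n in dist if n != start}


def local_support(tree, phrase, aligned, max_dist=2):
    """Already-aligned phrases within `max_dist` of an unaligned phrase."""
    return phrases_within(tree, phrase, max_dist) & aligned


# Hypothetical chain-shaped tree: p0 - p1 - p2 - p3 - p4
toy_tree = {
    "p0": ["p1"],
    "p1": ["p0", "p2"],
    "p2": ["p1", "p3"],
    "p3": ["p2", "p4"],
    "p4": ["p3"],
}
print(local_support(toy_tree, "p0", {"p2", "p4"}))
```

In this toy tree only p2 lies within two phrases of p0, so p4 does not count as local support.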

Step 3: Discovery of New Correspondences by Handling Remaining Phrases
English: Japan / play / the role
Japanese: 日本 は (Japan + case marker) / 役割 を (role + case marker) / 果たす (achieve)

Step 3: Discovery of New Correspondences by Handling Remaining Phrases
English: ... technology has become important
Japanese: ... 技術 が (technology + case marker) 重要 と (important) なっている (become)

Experiments
• Evaluation data: 200 sentence pairs from a White Paper and example sentences in a Japanese-English dictionary
• Gold standard data: we manually tagged the correct correspondences in these sentences.
  – Correct: exactly matches a pre-aligned correspondence
  – Near-correct: partly matches a pre-aligned correspondence
  – Wrong: matches neither as Correct nor as Near-correct
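
The three labels can be made concrete with a small sketch. The slides do not define "partly matches" precisely, so the sketch assumes that a correspondence is a pair of token-index sets (English side, Japanese side) and that a partial match means overlap on both sides. Both the representation and the function name judge are illustrative assumptions.

```python
def judge(output, gold):
    """Label one output correspondence against the gold correspondences.

    `output` and each gold item are (english_indices, japanese_indices)
    pairs of frozensets. Overlap on both sides counts as a partial match
    (an assumption; the slides do not spell this out).
    """
    e_out, j_out = output
    if any((e_out, j_out) == pair for pair in gold):
        return "correct"
    if any(e_out & e_gold and j_out & j_gold for e_gold, j_gold in gold):
        return "near-correct"
    return "wrong"


# Hypothetical gold alignment: English tokens 0-1 correspond to Japanese tokens 3-4.
toy_gold = [(frozenset({0, 1}), frozenset({3, 4}))]

print(judge((frozenset({0, 1}), frozenset({3, 4})), toy_gold))
print(judge((frozenset({1}), frozenset({4})), toy_gold))
print(judge((frozenset({5}), frozenset({4})), toy_gold))
```

The three calls give "correct", "near-correct", and "wrong" respectively: the last output overlaps the gold pair only on the Japanese side.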

Output Examples
Correct
  English | Japanese | Score
  is being pursued | 行われている (is doing by) | 2.75
  of G7 nations | 先進 7カ国の (advanced 7 countries) | 2.6
  geographical proximity | 地理的に近い (near in geography) | 2.0
Near-correct
  English | Japanese | Score
  tree (become) | その木は (That tree is) | 1.2
  went [to bed] | 寝る (Go to bed) | 1.0
  She (held) | 彼女は (She is) | 0.5

Precision – Recall
[Figure: precision-recall curves for two scoring schemes: "Correct" and "Correct + Near-correct × 0.5"]
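
The two scoring schemes named in the figure can be sketched as below. The slides give only the labels, so the exact definitions are assumed: precision divides the credited outputs by the number of outputs, recall divides them by the number of gold correspondences, and a Near-correct output earns a credit of 0.5 (or 0 under the "Correct" scheme). The function name and the sample counts are made up.

```python
def precision_recall(judgements, n_gold, near_weight=0.5):
    """Precision and recall where near-correct outputs earn partial credit.

    `judgements` is a list of "correct" / "near-correct" / "wrong" labels,
    one per system output; `n_gold` is the number of gold correspondences.
    """
    credit = 0.0
    for label in judgements:
        if label == "correct":
            credit += 1.0
        elif label == "near-correct":
            credit += near_weight
    precision = credit / len(judgements) if judgements else 0.0
    recall = credit / n_gold if n_gold else 0.0
    return precision, recall


# Made-up judgements for four outputs against five gold correspondences.
toy_labels = ["correct", "correct", "near-correct", "wrong"]
print(precision_recall(toy_labels, n_gold=5))                   # Correct + Near-correct x 0.5
print(precision_recall(toy_labels, n_gold=5, near_weight=0.0))  # Correct only
```

Setting near_weight to 0.0 gives the stricter "Correct" scheme, so the same function covers both curves.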

Conclusion
• We can find more correspondences than a statistical approach.
  – Statistical approach: 1-2% of the input corpus
  – Our system: 51-68% of the input corpus
• For a comparable corpus, a statistical approach seems to be effective; for a parallel corpus, however, our approach is more effective for obtaining a large number of translation examples.