Statistical Machine Translation with Rule Based Re-ordering of Source Sentences
Amit Sangodkar, Vasudevan N, Om P. Damani (CSE, IIT Bombay)
Motivation
- Combining linguistic knowledge with statistical machine translation (SMT).
- Can re-ordering source-language sentences to follow target-language word order improve the alignment?
Example
- English: Many Bengali poets have sung songs in praise of this land.
- Hindi: [Devanagari text not recoverable]
- Re-ordered: Many Bengali poets this land of praise in songs sung have
Translation Architecture
Dependency Parser
Input: Many Bengali poets have sung songs in praise of this land.
Output of the Stanford Parser:
amod(poets-3, Many-1)
nn(poets-3, Bengali-2)
nsubj(sung-5, poets-3)
aux(sung-5, have-4)
dobj(sung-5, songs-6)
prep_in(sung-5, praise-8)
det(land-11, this-10)
prep_of(praise-8, land-11)
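Each parser line has the form relation(head-position, dependent-position). A minimal Python sketch for reading such lines into tuples follows; the regular expression, the tuple layout and the names `parse_dependencies` and `SAMPLE` are illustrative assumptions, not part of the Stanford tools.

```python
import re
from typing import List, Tuple

# (relation, head word, head position, dependent word, dependent position)
Dependency = Tuple[str, str, int, str, int]

# Matches lines such as "prep_in(sung-5, praise-8)" or "amod (poets-3, Many-1)".
# The lazy (.+?) groups let hyphenated words still split at their final "-<index>".
DEP_LINE = re.compile(r"^\s*([\w:]+)\s*\(\s*(.+?)-(\d+)\s*,\s*(.+?)-(\d+)\s*\)\s*$")


def parse_dependencies(lines: List[str]) -> List[Dependency]:
    """Turn typed-dependency strings into tuples; raises ValueError on malformed lines."""
    deps: List[Dependency] = []
    for line in lines:
        match = DEP_LINE.match(line)
        if match is None:
            raise ValueError(f"not a typed dependency: {line!r}")
        rel, head, head_idx, dep, dep_idx = match.groups()
        deps.append((rel, head, int(head_idx), dep, int(dep_idx)))
    return deps


# Parser output for "Many Bengali poets have sung songs in praise of this land."
SAMPLE = [
    "amod(poets-3, Many-1)",
    "nn(poets-3, Bengali-2)",
    "nsubj(sung-5, poets-3)",
    "aux(sung-5, have-4)",
    "dobj(sung-5, songs-6)",
    "prep_in(sung-5, praise-8)",
    "det(land-11, this-10)",
    "prep_of(praise-8, land-11)",
]

if __name__ == "__main__":
    for dep in parse_dependencies(SAMPLE):
        print(dep)
```

Keeping the word positions matters because the same word can occur twice in a sentence; later steps can key nodes by position rather than by surface form.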
Tree Processing
- Handling auxiliary verbs
  - Remove the auxiliary and postfix it to its verb, e.g. aux(sung, have) → sung_have
- Handling prepositions/conjunctions
  - Extract the preposition from the relation name and attach it to the parent/child, e.g. prep_in(sung, praise) → prep(sung, praise_in)
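The two rewrites can be sketched in Python as below, assuming the dependencies are already available as (relation, head position, dependent position) triples plus a position-to-word map; `process_tree`, `WORDS` and `DEPS` are hypothetical names. The sketch handles only the `prep_` case and attaches the preposition to the dependent, as in the slide's example; conjunctions are left out because the slide does not say which side they attach to.

```python
from typing import Dict, List, Tuple

# (relation, head position, dependent position); words are looked up by position.
Triple = Tuple[str, int, int]


def process_tree(words: Dict[int, str], deps: List[Triple]) -> Tuple[Dict[int, str], List[Triple]]:
    """Apply the auxiliary and preposition rewrites.

    * aux(verb, aux_word): the auxiliary node is removed and its word is
      postfixed to the verb's label, e.g. aux(sung, have) -> "sung_have".
    * prep_<p>(head, dep): the relation becomes plain "prep" and <p> is
      postfixed to the dependent, e.g. prep_in(sung, praise) -> prep(sung, praise_in).
    """
    labels = dict(words)  # node position -> (possibly extended) label
    tree: List[Triple] = []
    for rel, head, dep in deps:
        if rel == "aux":
            labels[head] = f"{labels[head]}_{words[dep]}"
            del labels[dep]  # the auxiliary is no longer a separate node
        elif rel.startswith("prep_"):
            preposition = rel[len("prep_"):]
            labels[dep] = f"{labels[dep]}_{preposition}"
            tree.append(("prep", head, dep))
        else:
            tree.append((rel, head, dep))
    return labels, tree


# Positions 7 ("in") and 9 ("of") are absent: the collapsed prep_in/prep_of
# relations already carry those words.
WORDS = {1: "Many", 2: "Bengali", 3: "poets", 4: "have", 5: "sung",
         6: "songs", 8: "praise", 10: "this", 11: "land"}
DEPS = [("amod", 3, 1), ("nn", 3, 2), ("nsubj", 5, 3), ("aux", 5, 4),
        ("dobj", 5, 6), ("prep_in", 5, 8), ("det", 11, 10), ("prep_of", 8, 11)]

if __name__ == "__main__":
    labels, tree = process_tree(WORDS, DEPS)
    for rel, head, dep in tree:
        print(f"{rel}({labels[head]}, {labels[dep]})")
```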
Modified Dependency Tree
Re-ordering
- Parent-child positioning
- Prioritizing the relations
Re-ordering (Parent-Child Positioning)
- Parent before child: conj (conjunction), appos (apposition), advcl (adverbial clause), ccomp (clausal complement), rcmod (relative clause modifier)
  - e.g. John cried because he fell: advcl(cry, fell). In Hindi, cry is ordered before fell.
- Child before parent: nsubj (subject), dobj (object)
  - e.g. Ram eats mango: dobj(eat, mango). In Hindi, mango is ordered before eat.
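The positioning rule amounts to a lookup on the relation name. A small Python sketch follows; `child_goes_first` and `place` are hypothetical names, and the default for relations the slide does not list (child first, i.e. head-final order) is an assumption, chosen because it matches how amod, nn, det and prep behave in the worked example.

```python
from typing import List

# Relations for which the head (parent) is emitted before the dependent (child).
PARENT_BEFORE_CHILD = {"conj", "appos", "advcl", "ccomp", "rcmod"}
# Relations for which the dependent is emitted before the head.
CHILD_BEFORE_PARENT = {"nsubj", "dobj"}


def child_goes_first(relation: str) -> bool:
    """True if the child should precede its parent in the re-ordered sentence."""
    if relation in PARENT_BEFORE_CHILD:
        return False
    if relation in CHILD_BEFORE_PARENT:
        return True
    # Assumption: unlisted relations default to child-first (head-final) order.
    return True


def place(parent: str, relation: str, child: str) -> List[str]:
    """Order a single parent/child pair according to its relation."""
    return [child, parent] if child_goes_first(relation) else [parent, child]


if __name__ == "__main__":
    print(place("eat", "dobj", "mango"))  # Ram eats mango
    print(place("cry", "advcl", "fell"))  # John cried because he fell
```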
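Re-ordering (Relation Priority)
- Deciding the order in case of multiple children
- Priority among relation pairs (L: the child attached by the row relation is placed to the left of the child attached by the column relation; R: to its right; -: no rule)

        nsubj  dobj  prep  amod  nn
nsubj     -     L     L     -     -
dobj      R     -     R     -     -
prep      R     L     -     L     L
amod      -     -     R     -     L
nn        -     -     R     R     -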
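Ordering siblings with this table is a sort driven by a pairwise comparator. The Python sketch below stores only the L/R cells; the names `PRIORITY`, `compare` and `order_children` are hypothetical. It assumes the rules are consistent among relations that actually occur as siblings (e.g. nsubj/dobj/prep under a verb, amod/nn under a noun); a comparator that is not a consistent order would make `sorted()` results unreliable.

```python
from functools import cmp_to_key
from typing import List, Tuple

# Non-empty cells of the relation-priority table. PRIORITY[(a, b)] == "L" means a
# child attached by relation a is placed to the left of a sibling attached by b;
# "R" means to the right. Cells marked "-" on the slide are simply omitted.
PRIORITY = {
    ("nsubj", "dobj"): "L", ("nsubj", "prep"): "L",
    ("dobj", "nsubj"): "R", ("dobj", "prep"): "R",
    ("prep", "nsubj"): "R", ("prep", "dobj"): "L",
    ("prep", "amod"): "L", ("prep", "nn"): "L",
    ("amod", "prep"): "R", ("amod", "nn"): "L",
    ("nn", "prep"): "R", ("nn", "amod"): "R",
}

Child = Tuple[str, str]  # (relation, child label)


def compare(a: Child, b: Child) -> int:
    cell = PRIORITY.get((a[0], b[0]))
    if cell == "L":
        return -1
    if cell == "R":
        return 1
    return 0  # no rule: treated as tied (sorted() is stable, so ties keep input order)


def order_children(children: List[Child]) -> List[Child]:
    """Sort siblings left-to-right using the pairwise priority table."""
    return sorted(children, key=cmp_to_key(compare))


if __name__ == "__main__":
    verb_children = [("dobj", "songs"), ("prep", "praise_in"), ("nsubj", "poets")]
    print(order_children(verb_children))
```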
Illustration - Re-ordering
Input dependency tree (after tree processing):
sung_have
  nsubj: poets
    amod: Many
    nn: Bengali
  dobj: songs
  prep: praise_in
    prep: land_of
      det: this
Relation priorities applied:
- Children of sung_have: nsubj is placed left of prep and dobj, and prep is placed left of dobj.
- Children of poets: amod is placed left of nn.
Output after each step:
- Many
- Many Bengali
- Many Bengali poets
- Many Bengali poets this
- Many Bengali poets this land of
- Many Bengali poets this land of praise in
- Many Bengali poets this land of praise in songs
- Many Bengali poets this land of praise in songs sung have
Hindi: [Devanagari text not recoverable]
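The walkthrough can be reproduced with a short recursive sketch that combines parent-child positioning and relation priority. This is an illustration under stated assumptions, not the authors' implementation: the tree is hard-coded and keyed by node label (so it assumes no repeated words), relations not listed on the positioning slide are assumed to place the child first, and `reorder` is a hypothetical name.

```python
from functools import cmp_to_key
from typing import Dict, List, Tuple

Child = Tuple[str, str]  # (relation, child label)

# Relations whose child follows its parent (Parent-Child Positioning slide).
# Assumption: every other relation places the child before the parent.
PARENT_BEFORE_CHILD = {"conj", "appos", "advcl", "ccomp", "rcmod"}

# Non-empty cells of the relation-priority table; "L" means the row relation's
# child goes to the left of the column relation's child, "R" to the right.
PRIORITY = {
    ("nsubj", "dobj"): "L", ("nsubj", "prep"): "L",
    ("dobj", "nsubj"): "R", ("dobj", "prep"): "R",
    ("prep", "nsubj"): "R", ("prep", "dobj"): "L",
    ("prep", "amod"): "L", ("prep", "nn"): "L",
    ("amod", "prep"): "R", ("amod", "nn"): "L",
    ("nn", "prep"): "R", ("nn", "amod"): "R",
}

# Modified dependency tree of the example, keyed by node label for brevity
# (this assumes no word occurs twice in the sentence).
TREE: Dict[str, List[Child]] = {
    "sung_have": [("nsubj", "poets"), ("dobj", "songs"), ("prep", "praise_in")],
    "poets": [("amod", "Many"), ("nn", "Bengali")],
    "praise_in": [("prep", "land_of")],
    "land_of": [("det", "this")],
}


def compare(a: Child, b: Child) -> int:
    cell = PRIORITY.get((a[0], b[0]))
    if cell == "L":
        return -1
    if cell == "R":
        return 1
    return 0


def reorder(node: str) -> List[str]:
    """Linearise the subtree rooted at `node` in the re-ordered (Hindi-like) order."""
    before: List[str] = []
    after: List[str] = []
    for relation, child in sorted(TREE.get(node, []), key=cmp_to_key(compare)):
        words = reorder(child)
        if relation in PARENT_BEFORE_CHILD:
            after.extend(words)
        else:
            before.extend(words)
    # Split merged labels back into words: "sung_have" -> ["sung", "have"].
    return before + node.split("_") + after


if __name__ == "__main__":
    # prints: Many Bengali poets this land of praise in songs sung have
    print(" ".join(reorder("sung_have")))
```

Each subtree is linearised before it is placed, which is why "this land of" stays together inside the "praise in" phrase.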
Experimental Setup
- Procedure
  - Train Moses on the training data with a 6-gram language model
  - Tune Moses on the development data
  - Decode the test data with the trained Moses system
- This procedure is run on both the original data and the re-ordered data
Results

Corpus          Metric   Baseline Dev   Baseline Test   Re-ordered Dev   Re-ordered Test
EILMT           BLEU     0.1488         0.1450          0.1751           0.1601
EILMT           NIST     4.7600         4.7287          4.8539           4.6923
IIIT Data Set   BLEU     0.0815         0.0842          0.0836           0.0853
IIIT Data Set   NIST     3.9036         4.2426          3.7335           4.0140
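The relative change of each score makes the size of the effect easier to read. A small Python sketch over the numbers in the table follows; the `RESULTS` layout and `relative_change` are assumptions of this sketch. On these figures BLEU rises on both test sets (about 10.4% on EILMT and 1.3% on IIIT) while NIST falls on both test sets.

```python
# Scores from the results table:
# (baseline dev, baseline test, re-ordered dev, re-ordered test)
RESULTS = {
    ("EILMT", "BLEU"): (0.1488, 0.1450, 0.1751, 0.1601),
    ("EILMT", "NIST"): (4.7600, 4.7287, 4.8539, 4.6923),
    ("IIIT", "BLEU"): (0.0815, 0.0842, 0.0836, 0.0853),
    ("IIIT", "NIST"): (3.9036, 4.2426, 3.7335, 4.0140),
}


def relative_change(baseline: float, reordered: float) -> float:
    """Percentage change of the re-ordered score over the baseline score."""
    return (reordered - baseline) / baseline * 100.0


if __name__ == "__main__":
    for (corpus, metric), (b_dev, b_test, r_dev, r_test) in RESULTS.items():
        print(f"{corpus:5} {metric:4} "
              f"dev {relative_change(b_dev, r_dev):+6.2f}%  "
              f"test {relative_change(b_test, r_test):+6.2f}%")
```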
Conclusion
- Using linguistic knowledge appears to improve SMT quality.
- The applicability of the BLEU score in this context needs to be investigated.
Acknowledgements
- We acknowledge the Department of IT (DIT), Government of India, and the English-to-Indian Languages (EILMT) consortium for making the EILMT tourism dataset available.
- IIIT Data Set: data acquired during the DARPA TIDES MT project in 2003 and later refined at LTRC, IIIT-H.
References
- [Hieu 2008] Hieu Hoang and Philipp Koehn. Design of the Moses Decoder for Statistical Machine Translation. ACL Workshop on Software Engineering, Testing, and Quality Assurance for NLP, 2008.
- [Marie 2006] Marie-Catherine de Marneffe, Bill MacCartney and Christopher D. Manning. Generating Typed Dependency Parses from Phrase Structure Parses. In Proceedings of LREC-06, 2006.
- [Manual 2008] Stanford Dependencies Manual. Available at http://nlp.stanford.edu/software/dependencies_manual.pdf
- [Moses] Moses Tutorial. Available at http://www.statmt.org/moses/?n=Moses.Tutorial
- [Singh 2007] Smriti Singh, Mrugunk Dalal, Vishal Vachhani, Pushpak Bhattacharyya and Om P. Damani. Hindi Generation from Interlingua (UNL). Machine Translation Summit XI, 2007.