Translation Models: Taking Translation Direction into Account
Gennadi Lembersky, Noam Ordan, Shuly Wintner
ISCOL, 2011

2 Statistical Machine Translation (SMT)
• Given a foreign sentence f:
  ▫ “Maria no dio una bofetada a la bruja verde”
• Find the most likely English translation e:
  ▫ “Maria did not slap the green witch”
• The most likely English translation e is given by arg max_e P(e|f)
• P(e|f) is the conditional probability of any e given f
• How to estimate P(e|f)?
• Noisy channel (written out below):
  ▫ Decompose P(e|f) into P(f|e) * P(e) / P(f)
  ▫ Estimate P(f|e) from a parallel corpus (translation model)
  ▫ Estimate P(e) from a monolingual corpus (language model)
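
Written out, the noisy-channel decomposition from the slide is (P(f) is constant for a given input sentence, so it drops out of the arg max):

```latex
\hat{e} = \arg\max_{e} P(e \mid f)
        = \arg\max_{e} \frac{P(f \mid e)\, P(e)}{P(f)}
        = \arg\max_{e} \underbrace{P(f \mid e)}_{\text{translation model}} \;
                       \underbrace{P(e)}_{\text{language model}}
```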

3 Translation Model
• How to model P(f|e)?
  ▫ Learn the parameters of P(f|e) from a parallel corpus
  ▫ Estimate translation model parameters at the phrase level
    – explicit modeling of word context
    – captures local reorderings, local dependencies
• IBM Models define how words in a source sentence can be aligned to words in a parallel target sentence
  ▫ EM is used to estimate the parameters
• Aligned words are extended to phrases
• Result: a phrase-table (sketched below)
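
As a rough illustration of the end product, a phrase-table can be thought of as a map from source phrases to scored target phrases. The sketch below uses simple relative-frequency estimates and a single score per pair; real Moses tables store several scores (both translation directions plus lexical weights) in a dedicated file format, and the counts here are made up.

```python
from collections import defaultdict

# Toy phrase-table: source phrase -> {target phrase: translation probability}.
phrase_table = defaultdict(dict)

def add_phrase_pair(src, tgt, pair_count, src_count):
    """Relative-frequency estimate of the phrase translation probability."""
    phrase_table[src][tgt] = pair_count / src_count

# Hypothetical counts extracted from word-aligned sentence pairs.
add_phrase_pair("la bruja verde", "the green witch", 8, 10)
add_phrase_pair("la bruja verde", "the green sorceress", 2, 10)

print(phrase_table["la bruja verde"])
# {'the green witch': 0.8, 'the green sorceress': 0.2}
```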

4 Log-Linear Models
• Log-linear models combine weighted feature functions (formula below)
  ▫ h_i are the feature functions and λ_i are the model parameters
  ▫ typical feature functions: phrase translation probabilities, lexical translation probabilities, language model probability, reordering model
• Model parameter estimation (tuning) uses discriminative training; the MERT algorithm (Och, 2003)
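
The formula the slide refers to is the standard log-linear model of phrase-based SMT:

```latex
P(e \mid f) = \frac{\exp\!\left(\sum_{i=1}^{n} \lambda_i\, h_i(e, f)\right)}
                   {\sum_{e'} \exp\!\left(\sum_{i=1}^{n} \lambda_i\, h_i(e', f)\right)},
\qquad
\hat{e} = \arg\max_{e} \sum_{i=1}^{n} \lambda_i\, h_i(e, f)
```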

5 Evaluation
• Human evaluation is not practical – too slow and costly
• Automatic evaluation is based on a human reference translation
  ▫ The output of an MT system is compared to the human translation of the same set of sentences
  ▫ The metric essentially calculates the distance between the MT output and the reference translation
• Dozens of metrics have been developed
  ▫ BLEU is the most popular one (a toy version is sketched below)
  ▫ METEOR and TER are close behind
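
A minimal, single-reference, sentence-level BLEU sketch. Real BLEU is computed at the corpus level, supports multiple references, and is usually smoothed; this version only shows the clipped n-gram precisions and the brevity penalty.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    """Geometric mean of clipped n-gram precisions times a brevity penalty."""
    precisions = []
    for n in range(1, max_n + 1):
        cand, ref = Counter(ngrams(candidate, n)), Counter(ngrams(reference, n))
        clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
        precisions.append(clipped / max(sum(cand.values()), 1))
    if min(precisions) == 0:
        return 0.0
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    brevity = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return brevity * geo_mean

hyp = "Maria did not slap the green witch".split()
ref = "Maria did not slap the green witch".split()
print(bleu(hyp, ref))  # 1.0 for an exact match
```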

6 Original vs. Translated Texts
Given this simplified model: Source Text → TM → Target Text, with an LM over the target side.
Two points are made with regard to the “intermediate component” (TM and LM):
1. The TM is blind to direction (but see Kurokawa et al., 2009)
2. LMs are based on originally written texts.

7 Original vs. Translated Texts
Translated texts are ontologically different from non-translated texts; they generally exhibit:
1. Simplification of the message, the grammar, or both (Al-Shabab, 1996; Laviosa, 1998);
2. Explicitation, the tendency to spell out implicit utterances that occur in the source text (Blum-Kulka, 1986).

8 Original vs. Translated Texts
• Translated texts can be distinguished from non-translated texts with high accuracy (87% and higher)
  - for Italian (Baroni & Bernardini, 2006)
  - for Spanish (Ilisei et al., 2010)
  - for English (Koppel & Ordan, 2011)

9 How Does Translation Direction Affect MT?
• Language Models
  ▫ Our work (accepted to EMNLP) shows that LMs trained on translated texts are better for MT systems than LMs trained on original texts.
• Translation Models
  ▫ Kurokawa et al. (2009) showed that when translating French into English it is better to use a French-translated-to-English parallel corpus, and vice versa.
  ▫ This work supports that claim and extends it (in review for WMT)

10 Our Setup
• Canadian Hansard corpus: a parallel French-English corpus
  ▫ 80% Original English (EO)
  ▫ 20% Original French (FO)
  ▫ The ‘source’ language is marked
• Two scenarios (a split sketch follows below):
  ▫ Balanced: 750K FO sentences and 750K EO sentences
  ▫ Biased: 750K FO sentences and 3M EO sentences
• MOSES PB-SMT toolkit
• Tuning & evaluation:
  ▫ 1,000 FO sentences for tuning and 5,000 FO sentences for evaluation
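
A rough sketch of how the two training sets might be assembled, assuming each parallel sentence pair carries an original-language tag (the tuple layout and tag values here are hypothetical, not the actual Hansard format):

```python
import random

def build_training_sets(corpus, n_fo=750_000, n_eo_balanced=750_000,
                        n_eo_biased=3_000_000, seed=0):
    """Split an annotated parallel corpus into the balanced and biased scenarios.

    `corpus` is assumed to be a list of (french, english, original_language)
    tuples, where original_language is "FO" or "EO".
    """
    rng = random.Random(seed)
    fo = [pair for pair in corpus if pair[2] == "FO"]
    eo = [pair for pair in corpus if pair[2] == "EO"]
    rng.shuffle(fo)
    rng.shuffle(eo)
    balanced = fo[:n_fo] + eo[:n_eo_balanced]
    biased = fo[:n_fo] + eo[:n_eo_biased]
    return balanced, biased
```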

11 Baseline Experiments
• We translate French-to-English
• EO – train the phrase-table on the EO portion of the parallel corpus
• FO – train the phrase-table on the FO portion of the parallel corpus
• FO+EO – train the phrase-table on the entire parallel corpus

12 Baseline Results

Set       System   BLEU    Size        Time
Balanced  EO       28.44   1,391,365   1.04
Balanced  FO       31.92   1,308,726   0.98
Balanced  FO+EO    31.72   2,429,807   1.09
Biased    EO       29.53   4,236,189   1.22
Biased    FO       31.92   1,308,726   0.98
Biased    FO+EO    32.85   5,101,973   1.15

13 System A: Two Phrase-Tables
• EO – train a phrase-table on the EO portion of the parallel corpus
• FO – train a phrase-table on the FO portion of the parallel corpus
• System A – let MOSES use both phrase-tables
  ▫ Log-linear model training assigns each phrase-table its own weights (see the sketch below)
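
A toy sketch of the idea: each table contributes its own feature to the log-linear score, with its own tuned weight, so tuning is free to trust one table more than the other. The feature names and weight values here are purely illustrative, not the actual Moses feature set.

```python
import math

def loglinear_score(features, weights):
    """Log-linear model score: a weighted sum of feature values."""
    return sum(weights[name] * value for name, value in features.items())

# Hypothetical tuned weights: the FO table ends up trusted more.
weights = {"p_fo_table": 0.9, "p_eo_table": 0.2, "lm": 0.5}

# One candidate phrase pair, scored (as log-probabilities) by both tables.
features = {
    "p_fo_table": math.log(0.6),  # score from the FO phrase-table
    "p_eo_table": math.log(0.3),  # score from the EO phrase-table
    "lm": math.log(0.1),          # language model contribution
}
print(loglinear_score(features, weights))
```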

14 System A Results

Set       System    BLEU    Size        Time
Balanced  System A  33.21   2,700,091   1.89
Biased    System A  33.54   5,544,915   2.39

• In the balanced scenario we gained 1.29 BLEU
• In the biased scenario we gained 0.69 BLEU
• The cost is in decoding time and memory resources

15 Looking Inside…
• Complete table – the phrase-table obtained after training
• Filtered table – a phrase-table that contains only the entries whose source phrase appears in the evaluation set (a filtering sketch follows below)
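
A minimal sketch of such filtering over the toy dict-of-dicts table representation used earlier (Moses performs this filtering with a dedicated script over the on-disk phrase-table; this only shows the idea, with made-up entries):

```python
def filter_phrase_table(phrase_table, eval_sentences, max_phrase_len=7):
    """Keep only entries whose source phrase occurs in the evaluation set."""
    eval_phrases = set()
    for sentence in eval_sentences:
        tokens = sentence.split()
        for n in range(1, max_phrase_len + 1):
            for i in range(len(tokens) - n + 1):
                eval_phrases.add(" ".join(tokens[i:i + n]))
    return {src: tgts for src, tgts in phrase_table.items() if src in eval_phrases}

table = {"la bruja verde": {"the green witch": 0.8},
         "dio una bofetada": {"slapped": 0.6},
         "no pertinente": {"irrelevant": 0.5}}
print(filter_phrase_table(table, ["Maria no dio una bofetada a la bruja verde"]))
# only the first two entries survive
```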

16 A Few Observations… / 1
• Balanced Set / Complete tables
  ▫ The FO table has many more unique French phrases (15.8M vs. 13M)
  ▫ The EO table has more translation options per source phrase (1.42 vs. 1.33)
  ▫ The source phrases in the intersection are shorter (3.76 vs. 5.07-5.16), but they have more translations (3.08-3.21 vs. 1.09-1.10)

17 A Few Observations… / 2
• Balanced Set / Filtered tables
  ▫ The intersection comprises 96.1% of the translation phrase-pairs in the FO table and 98.3% of the translation phrase-pairs in the EO table.

18 A Few Observations… / 3
• Biased Set – we added 2,250,000 English-original sentences. What happens?
  ▫ In the ‘complete’ EO table – everything grows
• In the filtered tables:
  ▫ the number of phrase-pairs increases by a factor of 3
  ▫ the number of unique source phrases increases by a third
    – coverage of French phrases hasn’t improved by much
  ▫ the average number of translations increases by a factor of 2.3 (from 13.2 to 30.3)
    – long tail: the probability mass is split among a larger number of translations, so good translations get lower probability than in the FO table (toy illustration below)
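
A toy illustration of that dilution effect under relative-frequency estimation; the phrases and counts are made up:

```python
def relative_freq(counts):
    """Turn raw translation counts into relative-frequency probabilities."""
    total = sum(counts.values())
    return {tgt: c / total for tgt, c in counts.items()}

# FO-style entry: few, frequent translations for a source phrase.
fo_counts = {"the green witch": 8, "the green sorceress": 2}
# EO-style entry: the same good translation plus a long tail of rare alternatives.
eo_counts = {"the green witch": 8, "the green sorceress": 2,
             "green witch": 1, "that green witch": 1, "the witch in green": 1}

print(relative_freq(fo_counts)["the green witch"])  # 0.8
print(relative_freq(eo_counts)["the green witch"])  # ~0.62: the same translation loses mass
```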

19 How Does MOSES Select Phrases?
• Balanced Set
  ▫ 96.5% of the selected phrases come from the FO table
  ▫ 99.3% of the phrase-pairs selected from the intersection originated in the FO table
• Biased Set
  ▫ 94.5% of the selected phrases come from the FO table
  ▫ 98.2% of the phrase-pairs selected from the intersection originated in the FO table

20 The Tuning Effect / 1
• A question: is the FO phrase-table inherently better than the EO phrase-table, or does it only become better during tuning?
• Let’s test System A with the initial (pre-tuning) configuration and with the configuration produced by tuning.

21 The Tuning Effect / 2
• Balanced Set / Before tuning
  ▫ only 58% of the selected phrases come from the FO table
  ▫ 57.7% of the phrase-pairs selected from the intersection originated in the FO table
• Balanced Set / After tuning
  ▫ 95.4% of the selected phrases come from the FO table
  ▫ 99.3% of the phrase-pairs selected from the intersection originated in the FO table

22 The Tuning Effect / 3
• The decoder prefers the FO table in the initial configuration (58%).
• The preference becomes much stronger after tuning (95.4%).
• Interestingly, the decoder doesn’t just replace EO phrases with FO phrases; it searches for longer phrases:
  ▫ the average length of a phrase selected from the EO table increases by about 1.5 words.

23 New Experiment: System B
• Based on these results, we can throw away the intersection subset of the EO phrase-table (a pruning sketch follows below)
  ▫ We expect a small loss in quality, but a significant improvement in translation speed.
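
A minimal sketch of that pruning step on the toy dict-of-dicts table representation used above (the actual experiment operates on Moses phrase-table files; the entries here are made up):

```python
def build_system_b_tables(fo_table, eo_table):
    """Keep the FO table intact; drop EO entries whose source phrase
    also exists in the FO table (i.e. the intersection)."""
    eo_only = {src: tgts for src, tgts in eo_table.items() if src not in fo_table}
    return fo_table, eo_only

fo = {"la bruja verde": {"the green witch": 0.8}}
eo = {"la bruja verde": {"the green witch": 0.7},  # in the intersection: dropped
      "une question": {"a question": 0.9}}          # EO-only: kept
print(build_system_b_tables(fo, eo)[1])
# {'une question': {'a question': 0.9}}
```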

24 System B Results

Set       System    BLEU    Size        Time
Balanced  EO        28.44   1,391,365   1.04
Balanced  FO        31.92   1,308,726   0.98
Balanced  FO+EO     31.72   2,429,807   1.09
Balanced  System A  33.21   2,700,091   1.89
Balanced  System B  33.19   1,327,955   0.94
Biased    EO        29.53   4,236,189   1.22
Biased    FO        31.92   1,308,726   0.98
Biased    FO+EO     32.85   5,101,973   1.15
Biased    System A  33.54   5,544,915   2.39
Biased    System B  33.34   1,382,017   0.95

25 What About a Classified Corpus?
• An annotation of the source language is rarely available in parallel corpora.
  ▫ Will our System A and System B still outperform the FO+EO and FO MT systems?
• We use an SVM for classification; our features are punctuation marks and n-grams of part-of-speech tags (a classifier sketch follows below).
• We train the classifier on an English-French subset of the Europarl corpus.
• Accuracy is about 73.5%
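
A rough sketch of such a classifier using scikit-learn, as a stand-in for whatever toolkit the authors actually used. The paper's features are punctuation marks and POS-tag n-grams, which would require a separate tagger; this sketch falls back to surface word n-grams, and the training texts and labels are placeholders.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Each training example is one text chunk; the label says whether the English
# side was originally written in English ("EO") or translated from French ("FO").
texts = [
    "I rise today to address the House on this question .",
    "Mr. Speaker , I would like to , first of all , thank the honourable member .",
]
labels = ["EO", "FO"]

clf = make_pipeline(
    CountVectorizer(ngram_range=(1, 3), lowercase=True),  # word n-gram features
    LinearSVC(),
)
clf.fit(texts, labels)
print(clf.predict(["Another chunk of text to classify ."]))
```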

26 Classified System Results

Set: Balanced

System                 BLEU
EO+FO                  31.72
FO (annotated)         31.92
FO (classified)        32.04
System A (classified)  32.91
System B (classified)  32.57
System A (annotated)   33.21
System B (annotated)   33.19

27 Thank You!