Knowledge-Rich MT
Chris Dyer, Kevin Gimpel, Waleed Ammar, Noah Smith
November 4, 2011
Outline
• Where are we starting with end-to-end MT?
• Adapting SMT for low-resource scenarios
• What progress have we been making?
• What does Year 2 hold?
Cross-site system comparison
The SMT baseline
[Diagram: an LM learner (trained on English text) and a TM learner (trained on English-français parallel text) supply the decoder, which translates "S'il vous plaît traduire..." into "Please translate..."]
SMT Baselines
System                              BLEU
Kinyarwanda – English (Hiero)        6.8
English – Kinyarwanda (Hiero)        4.7
Malagasy – English (Hiero)          24.3
Malagasy – English (Moses)          24.2
English – Malagasy (Hiero)          25.0
English – Malagasy (Moses)          30.5
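For readers unfamiliar with the metric behind these numbers, here is a minimal sketch of BLEU on a single sentence pair with one reference. It is an illustration only: the function names are made up for this example, it applies no smoothing, and it is not the evaluation script used for the scores above, which are presumably corpus-level and reported on a 0 to 100 scale.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(hypothesis, reference, max_n=4):
    """Unsmoothed single-reference BLEU for one sentence pair, in [0, 1]."""
    hyp, ref = hypothesis.split(), reference.split()
    if not hyp:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
        # Clip each hypothesis n-gram count by its count in the reference.
        matches = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = sum(hyp_counts.values())
        if matches == 0 or total == 0:
            return 0.0  # no smoothing: any empty order zeroes the score
        log_precisions.append(math.log(matches / total))
    # Brevity penalty: penalize hypotheses shorter than the reference.
    brevity = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    return brevity * math.exp(sum(log_precisions) / max_n)
```

Multiplying the result by 100 gives a value on the same scale as the table.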
Let’s make things better.
The problem?
[Diagram: the baseline pipeline again, with an LM learner on English text and a TM learner on English-français parallel text]
Low-resource!
[Diagram: the same pipeline with Malagasy in place of français. The English-Malagasy parallel data for the TM learner is small and out of domain.]
Additional knowledge sources:
• Malagasy verbal morphology
• “Partial” language models
• Dependency parses
• Unsupervised model outputs
• Word clusters, e.g.
  36: dieny, fara, fiompiny, hamoaka, handehanany
  37: adinina, aforeto, ahevao, akaiky, alao, ...
Year 1 MT Challenge
[Diagram: English text, Malagasy verbal morphology, dependency parses, and word clusters (36: dieny, fara, fiompiny, hamoaka, handehanany; 37: adinina, aforeto, ahevao, akaiky, alao, ...) feed a translation model that turns input such as "henemana no hana..." into "something intelligible..."]
Accomplishments
• Better alignments, better translations
• Feature-rich translation
  • Tens of millions of features
  • Diverse knowledge sources
• Phrase dependency translation model
  • Phrase ordering with a dependency model
[Figures: Model 4 vs. CMU]
Similar pattern of improvements, no language-specific features (yet).
Malagasy - English (version 1.0)
System             BLEU
Model 4 - GDA      24.2
Model 4 - GDFA     26.7
CMU - GDFA         26.3
Model 4 + CMU      27.6
What improvements?
• the sons of simeon were jemoela , jamin , jakin , and ohada zohara saul , the son of a canaanite woman .
• the sons of simeon were jemuel , jamin , ohada , jakin , zohar , and shaul , the son of a canaanite woman .
• the sons of simeon : jemuel , jamin , ohad , jakin , zohar , and shaul ( the son of a canaanite woman ) .
What improvements?
• then the woman said to the serpent , “ no ! you will not die .
• now the serpent said to the woman , “ you will not die .
• the serpent said to the woman , “ surely you will not die ,
Feature-rich translation
• Discriminative learning on training data
• Learn much sparser features than possible with just a development set
• Update weights to improve translation probability
• Final tuning pass on development set to optimize translation metrics (BLEU, METEOR, etc.)
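To make "update weights to improve translation probability" concrete, here is a minimal sketch of one gradient step for a log-linear model over a small candidate list with sparse features. Everything in it is an assumption for illustration: the function names, the feature names, the use of an explicit candidate list, and the plain gradient-ascent update. The slides do not specify the actual training procedure.

```python
import math
from collections import defaultdict


def score(weights, features):
    """Dot product of a sparse feature dict with the weight vector."""
    return sum(weights[name] * value for name, value in features.items())


def likelihood_step(weights, candidates, gold_index, learning_rate=0.1):
    """One gradient-ascent step on log p(gold | candidates).

    weights:    defaultdict(float) mapping feature name -> weight (updated in place)
    candidates: list of sparse feature dicts, one per candidate translation
    gold_index: position of the candidate treated as correct (hypothetical setup)
    Returns the probability of the gold candidate before the update.
    """
    scores = [score(weights, feats) for feats in candidates]
    max_score = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - max_score) for s in scores]
    z = sum(exps)
    probs = [e / z for e in exps]

    # Gradient of the log-likelihood: observed features minus expected features.
    gradient = defaultdict(float)
    for name, value in candidates[gold_index].items():
        gradient[name] += value
    for p, feats in zip(probs, candidates):
        for name, value in feats.items():
            gradient[name] -= p * value

    for name, g in gradient.items():
        weights[name] += learning_rate * g
    return probs[gold_index]
```

Because only features that fire on some candidate receive a gradient, the weight vector stays sparse even when the feature space runs to millions of names. A final pass on the development set would then tune toward BLEU or METEOR with a separate procedure not shown here.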
What features?
Contexts give clues to constituents
German - English
System                 BLEU    Features
baseline               25.0    11 / 11
+7-gram                25.0    13 / 13
+Context               25.2    11,194 / 80,006,646
+Context +7-gram       25.4    11,196 / 80,006,648
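As a rough illustration of why context features multiply the feature count, here is a sketch that conjoins a source phrase with its neighboring words. The function name, the feature templates, and the German example sentence are all invented for this illustration; the slides do not say which templates were actually used.

```python
def context_features(source_tokens, start, end, window=1):
    """Sparse indicator features for the words around source span [start, end).

    Each feature conjoins the phrase with one neighboring word, so the number
    of distinct features grows with (phrases x vocabulary).
    """
    phrase = "_".join(source_tokens[start:end])
    features = {}
    for offset in range(1, window + 1):
        left_pos = start - offset
        right_pos = end + offset - 1
        # Pad with sentence-boundary symbols when the window runs off the edge.
        left = source_tokens[left_pos] if left_pos >= 0 else "<s>"
        right = source_tokens[right_pos] if right_pos < len(source_tokens) else "</s>"
        features[f"left{offset}={left}|phrase={phrase}"] = 1.0
        features[f"right{offset}={right}|phrase={phrase}"] = 1.0
    return features


# Made-up example: context of the span "ein kleines".
tokens = "das ist ein kleines haus".split()
example = context_features(tokens, 2, 4)
```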
Phrasal dependency translation model
[Figure: example comparing phrase-based output with our system's output]
Use features from source-side parse
[Chart: % BLEU for three configurations: Target Syntax Only; Target Syntax + String-to-Tree Rules; + Tree-to-Tree Features]
• Our best results use supervised parsers for both source and target languages
• What about unsupervised parsing?
• We use the dependency model with valence (Klein & Manning, 2004)
• With careful initialization, it gives state-of-the-art results (Gimpel & Smith, 2011):
  • 53.1% attachment accuracy on Penn Treebank
  • 44.4% on Chinese Treebank
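Attachment accuracy, the measure quoted above, is the fraction of tokens whose predicted head matches the gold head. A minimal sketch follows, assuming each sentence is given as a list of head indices with 0 marking the root. The function name and input format are choices made for this example, and details such as punctuation handling in the cited evaluations are not reproduced.

```python
def attachment_accuracy(gold_heads, predicted_heads):
    """Directed attachment accuracy over a corpus.

    gold_heads, predicted_heads: lists of sentences; each sentence is a list
    where position i holds the 1-based index of token i's head (0 = root).
    """
    correct = 0
    total = 0
    for gold, pred in zip(gold_heads, predicted_heads):
        if len(gold) != len(pred):
            raise ValueError("gold and predicted sentences differ in length")
        correct += sum(1 for g, p in zip(gold, pred) if g == p)
        total += len(gold)
    return correct / total if total else 0.0
```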
[Chart: % BLEU]
Year 2: “Into other languages”
• Target morphological complexity
• Generate novel word forms
• Leverage morphological resources and machine learning
• Need better language models, not just translation models
Year 2 Challenges
• Generating new word forms means a much larger search space than is usual in MT
• Inference is expensive
• Use “high-recall” linguistic tools to constrain search
• Statistics do the rest
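One way to picture "high-recall tools constrain search, statistics do the rest" is the sketch below: over-generate candidate word forms, keep those a permissive filter accepts, then rank the survivors with a statistical score. All names here are hypothetical, the affixes are abstract placeholders rather than real morphology, and the filter and scorer stand in for a morphological analyzer and a language model that the slides do not describe in detail.

```python
from itertools import product


def generate_forms(stem, prefixes, suffixes):
    """Over-generate: every prefix + stem + suffix combination."""
    return {p + stem + s for p, s in product(prefixes, suffixes)}


def constrained_candidates(stem, prefixes, suffixes, is_plausible, score, beam=3):
    """Filter with a high-recall check, then rank with a statistical score.

    is_plausible: callable(form) -> bool, a permissive linguistic filter
    score:        callable(form) -> float, higher is better (e.g. an LM score)
    """
    forms = [f for f in generate_forms(stem, prefixes, suffixes) if is_plausible(f)]
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(forms, key=lambda f: (-score(f), f))[:beam]
```

The filter only needs to avoid rejecting valid forms; precision is left to the scorer, which sees a much smaller candidate set than the raw cross product.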
Year 2
• Data requirements
  • Large non-English monolingual corpora
  • Test sets for focus languages