Machine Translation
A Presentation by: Julie Conlonova, Rob Chase, and Eric Pomerleau
Overview
- Language Alignment System
- Datasets
  - Sentence-aligned sets for training (e.g. the Hansards Corpus, the European Parliament Proceedings Parallel Corpus)
  - A word-aligned set for testing and evaluation, to measure accuracy and precision
- Decoding
Language Alignment
- Goal: produce a word-aligned set from a sentence-aligned dataset
- First step on the road toward Statistical Machine Translation
- Example problem:
  - The motion to adjourn the House is now deemed to have been adopted.
  - La motion portant que la Chambre s'ajourne maintenant est réputée adoptée.
IBM Models 1 and 2 (Kevin Knight, A Statistical MT Tutorial Workbook, 1999)
- Each capable of producing a word-aligned dataset on its own
- EM Algorithm
- Model 1 produces T-values based on normalized fractional counting of corresponding words
- Model 2 additionally uses A-values, "reverse distortion probabilities": probabilities based on the positions of the words
Training Data
European Parliament Proceedings Parallel Corpus, 1996-2003
- Aligned languages:
  - English - French
  - English - Dutch
  - English - Italian
  - English - Finnish
  - English - Portuguese
  - English - Spanish
  - English - Greek
Training Data (cont.)
- Eliminated:
  - Misaligned sentences
  - Sentences with 50 or more words
  - XML tags
  - Symbols and numerical characters other than commas and periods
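A minimal sketch of this cleanup pass, assuming one sentence per line; the function names, the 50-word cutoff constant, and the accented-letter range are illustrative choices, not the project's actual code:

```python
import re

MAX_WORDS = 50  # sentences with 50 or more words are dropped

def clean_sentence(text):
    """Strip XML tags, then every symbol/digit except commas and periods."""
    text = re.sub(r"<[^>]+>", " ", text)          # remove XML tags
    text = re.sub(r"[^A-Za-zÀ-ÿ,. ]", " ", text)  # drop symbols and digits (keep accented letters)
    return " ".join(text.split())                 # collapse whitespace

def keep_pair(src, tgt):
    """Drop empty (misaligned) pairs and overly long sentences."""
    if not src or not tgt:
        return False
    if len(src.split()) >= MAX_WORDS or len(tgt.split()) >= MAX_WORDS:
        return False
    return True
```

Running each corpus line through `clean_sentence` and then filtering with `keep_pair` yields the reduced training set described above.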
Ideally…
http://www.cs.berkeley.edu/~klein/cs294-5
Bypassing Interlingua: Models I-III
- Variables contributing to the probability of a sentence:
  - Correlation between words in the source/target languages
  - Fertility of a word
  - Correlation between the order of words in the source sentence and the order of words in the target
A Translation Matrix

          Rob   Cat   is    Dog
  Rob     1     0     0     0
  Gato    0     1     0     0
  es      0     0     .5    0
  esta    0     0     .5    0
  Perro   0     0     0     1
Building the Translation Matrix: Starting from Alignments
- Find the sentence alignment
- If a word in the source aligns with a word in the target, increment the corresponding cell of the translation matrix
- Normalize the translation matrix
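The three steps above can be sketched with plain dictionaries; this is a hypothetical minimal version, with `links` as a set of `(i, j)` index pairs as our own representation of a word alignment:

```python
from collections import defaultdict

def translation_matrix(aligned_pairs):
    """aligned_pairs: list of (source_words, target_words, links), where
    links is a set of (i, j) pairs meaning source[i] aligns with target[j]."""
    counts = defaultdict(lambda: defaultdict(float))
    for src, tgt, links in aligned_pairs:
        for i, j in links:
            counts[src[i]][tgt[j]] += 1.0        # increment on each aligned pair
    # normalize each source word's row into a probability distribution
    t = {}
    for s, row in counts.items():
        total = sum(row.values())
        t[s] = {w: c / total for w, c in row.items()}
    return t
```

For example, feeding in the single aligned pair ("Rob es alto" / "Rob is tall", links {(0,0), (1,1), (2,2)}) yields a matrix where t["es"]["is"] is 1.0.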
Can’t Find Alignments
- Many sentences in the Hansards corpus run around 60 words; plenty exceed 100
- A pair of 100-word sentences admits on the order of 100^100 possible alignments
Counting
- Rob is a boy. / Rob es nino.
- Rob is tall. / Rob es alto.
- Eric is tall. / Eric es alto.
- …
- Base counts on co-occurrence, weighting by sentence length
Iterative Convergence
- Use the Expectation Maximization (EM) algorithm
- Creates the translation matrix

          Rob   is    tall  boy
  Rob     .66   .33
  es      .30   .66
  alto    .2    .05   .5    0
  nino    .2    .05   0     .5
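The EM loop on the three toy sentence pairs above can be sketched as follows. This is the standard Model 1 recipe (fractional counts, renormalized each round); the variable names are our own, not the project's code:

```python
from collections import defaultdict

pairs = [("Rob es nino".split(), "Rob is a boy".split()),
         ("Rob es alto".split(), "Rob is tall".split()),
         ("Eric es alto".split(), "Eric is tall".split())]

# t[s][w]: probability that source word s translates to target word w,
# initialized uniformly
t = defaultdict(lambda: defaultdict(lambda: 1.0))

for _ in range(20):
    count = defaultdict(lambda: defaultdict(float))
    # E-step: for each target word, distribute a fractional count over the
    # source words in its sentence, in proportion to the current t-values
    for src, tgt in pairs:
        for w in tgt:
            norm = sum(t[s][w] for s in src)
            for s in src:
                count[s][w] += t[s][w] / norm
    # M-step: renormalize each source word's row of fractional counts
    for s, row in count.items():
        total = sum(row.values())
        for w in row:
            t[s][w] = row[w] / total
```

After a handful of iterations the co-occurring translations dominate: "es" locks onto "is", "alto" onto "tall", while "nino" stays split between "a" and "boy" (the data cannot distinguish them).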
Distorting the Sentence
- Word order changes between languages
- How is a sentence with 2 words distorted?
- How is a sentence with 3 words distorted?
- How is a sentence with …
- To keep track of this information we use…
A tesseract!
- (A quadruply nested default dictionary)
- This could be a problem if there are more than 100 words in a sentence
- 100 x 100 x 100 x 100 entries: too big for RAM, and takes too much time
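As a sketch, the quadruply nested default dictionary might look like this; the index order (source position, target position, source length, target length) is our assumption about how the A-values are keyed:

```python
from collections import defaultdict

# a[i][j][l_e][l_f]: probability that source position i aligns with target
# position j, given source length l_e and target length l_f
a = defaultdict(lambda: defaultdict(lambda: defaultdict(lambda: defaultdict(float))))

a[1][2][3][3] = 0.5   # example entry
# unseen index combinations default to 0.0 instead of raising KeyError
```

The memory concern on the slide follows directly: with sentences up to 100 words, the index space is 100 x 100 x 100 x 100 = 10^8 possible entries.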
Broad Look at MT
- "The translation process can be described simply as: 1. decoding the meaning of the source text, and 2. re-encoding this meaning in the target language." ("Translation Process", Wikipedia, May 2006)
Decoding
- How do we go from the T-matrix and A-matrix to a word alignment?
- There are several approaches…
Viterbi
- If only doing alignment, much smaller memory and time requirements
- Returns the optimal path
- T-matrix probabilities function as the "emission" matrix
- A-matrix probabilities govern the positioning of words
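Because Models 1 and 2 score each link independently, the Viterbi-style alignment search reduces to an argmax per target word: emission (T-value) times position probability (A-value). A sketch, with the A-values stored in a flat tuple-keyed dictionary for brevity (the names and the tiny smoothing floor are illustrative):

```python
def best_alignment(src, tgt, t, a):
    """For each target position j, choose the source position i maximizing
    t[src[i]][tgt[j]] * a[(i, j, len(src), len(tgt))]."""
    le, lf = len(src), len(tgt)
    links = []
    for j, w in enumerate(tgt):
        best_i = max(range(le),
                     key=lambda i: t.get(src[i], {}).get(w, 1e-12)
                                   * a.get((i, j, le, lf), 1e-12))
        links.append((best_i, j))   # (source index, target index)
    return links
```

With a trained T-matrix and uniform A-values, "Rob es alto" / "Rob is tall" comes out aligned diagonally.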
Decoding as a Translator
- Without being supplied a translated sentence, the program can act as a stand-alone translator instead of a word aligner
- However, while the Viterbi algorithm runs quickly with pruning when aligning, for translation the run time skyrockets
Greedy Hill Climbing (Knight & Koehn, What's New in Statistical Machine Translation, 2003)
- Best-first search
- 2-step look-ahead to avoid getting stuck in local maxima
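The core idea, sketched generically; this is plain 1-step greedy hill climbing (the 2-step look-ahead from Knight & Koehn is omitted for brevity), and the function names are illustrative:

```python
def hill_climb(score, start, neighbors):
    """Greedy best-first search: repeatedly move to the best-scoring
    neighbor; stop when no neighbor improves on the current state."""
    current = start
    while True:
        cands = neighbors(current)
        if not cands:
            return current
        best = max(cands, key=score)
        if score(best) <= score(current):
            return current            # local maximum reached
        current = best
```

In decoding, the state is a candidate translation/alignment, `score` is the model probability, and `neighbors` applies small edits (swap two words, change one word's translation). The toy run below climbs the integer line to the peak of -(x-3)^2.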
Beam Search (Knight & Koehn, What's New in Statistical Machine Translation, 2003)
- Optimization of best-first search with heuristics and a "beam" of choices
- Exponential trade-off when increasing the beam width
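A schematic beam search over partial translations, extended one target word per step; only the `beam_width` best hypotheses survive each step, which is the time/quality trade-off the slide mentions. The tables and names below are illustrative, not from the paper:

```python
import heapq

def beam_search(src, options, score, beam_width=2):
    """options(j) yields candidate target words for position j;
    score(words) scores a partial hypothesis. Keep the top-k each step."""
    beam = [[]]                                   # start with the empty hypothesis
    for j in range(len(src)):
        expanded = [hyp + [w] for hyp in beam for w in options(j)]
        beam = heapq.nlargest(beam_width, expanded, key=score)
    return beam[0]                                # best complete hypothesis

# toy per-word translation table (hypothetical probabilities)
t = {"Rob": {"Rob": 0.9, "Eric": 0.1},
     "es": {"is": 0.8, "are": 0.2},
     "alto": {"tall": 0.7, "big": 0.3}}
src = ["Rob", "es", "alto"]

def score(words):
    p = 1.0
    for k, w in enumerate(words):
        p *= t[src[k]].get(w, 0.0)
    return p

best = beam_search(src, lambda j: list(t[src[j]]), score)
```

Widening the beam keeps more alternatives alive at each step, at a cost that grows with the width times the branching factor.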
Other Decoding Methods (Knight & Koehn, What's New in Statistical Machine Translation, 2003)
- Finite State Transducer
  - Mapping between languages based on a finite automaton
- Parsing
  - String-to-tree model
Problem: One to Many
- Necessary to take all alignments over a certain probability in order to capture the "probability that e has fertility at least a given value" (Al-Onaizan, Curin, Jahr, et al., Statistical Machine Translation, 1999)
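One way to realize this: keep every link whose translation probability clears a threshold, so a single source word can align to several target words. The helper and threshold below are hypothetical; the French "ne … pas" negation is a classic one-to-many case:

```python
def one_to_many_links(src, tgt, t, threshold=0.3):
    """Keep all (source, target) index pairs whose t-probability is at
    least `threshold`, allowing one source word to claim several targets."""
    links = []
    for i, s in enumerate(src):
        for j, w in enumerate(tgt):
            if t.get(s, {}).get(w, 0.0) >= threshold:
                links.append((i, j))
    return links
```

With t("ne"|"not") = 0.4 and t("pas"|"not") = 0.5, "not" aligns to both French words instead of only the single argmax link.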
Results
- Study done in 2003 on word-alignment error rates on the Hansards corpus:
  - Model 2: 29.3% on 8K training sentence pairs; 19.5% on 1.47M training sentence pairs
  - Optimized Model 6: 20.3% on 8K training sentence pairs; 8.7% on 1.47M training sentence pairs
- Och and Ney, A Systematic Comparison of Various Statistical Alignment Models, 2003
Expected Accuracy
- Overall performance: 70%
- By language: Dutch, French, Italian, Spanish, Portuguese, Greek, Finnish
Possible Future Work
- Given more time, we would have implemented IBM Model 3
- Additionally uses n, d, and p parameters for weighted alignments:
  - n, the fertility: the number of words produced by one word
  - d, distortion
  - p, a parameter for words that aren't directly involved (inserted from NULL)
- Invokes Model 2 for scoring
Another Possible Translation Scheme
- Example-Based Machine Translation
  - Translation-by-analogy
  - Can sometimes do better than the "gist" translations from other models
Why Is Improving Machine Translation Necessary?
A Chinese to English Translation
The End Are there any questions/comments?