Translating Collocations for Bilingual Lexicons
- Collocations (idiomatic multi-word expressions) are difficult to translate
- They are semantically opaque
- They cannot be translated word by word
- They are a major obstacle to second language acquisition
- Example: demonstrate support → prouver son adhésion (prove adherence)

The Champollion approach
- Input: large parallel corpora
- Output: a list of collocations in each language, and equivalence mappings between these collocations
- The method is statistical and language-independent

Algorithm
1. Align sentences across corpora
2. Extract collocations from co-occurrence
3. Identify all target-language words that frequently appear in sentences aligned with a source collocation
4. Iteratively consider and score combinations of those words
5. Select the best set of words for the translation
6. Determine word order and fill in prepositions

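Steps 3 to 5 can be illustrated with a small sketch. The slides do not name the scoring function, so the Dice coefficient used here is an assumption, as are the function names `dice` and `translate_collocation`, the `threshold` and `max_len` parameters, and the lemmatized toy corpus. The sketch scores every combination of candidate words exhaustively rather than growing them iteratively, and it returns an unordered word set, so step 6 (word order and prepositions) is not covered.

```python
from itertools import combinations


def dice(source_hits, target_hits):
    """Dice coefficient between two sets of sentence-pair indices."""
    if not source_hits or not target_hits:
        return 0.0
    overlap = len(source_hits & target_hits)
    return 2 * overlap / (len(source_hits) + len(target_hits))


def translate_collocation(source_colloc, aligned_pairs, threshold=0.6, max_len=3):
    """Return (target_words, score) for a source collocation.

    aligned_pairs is a list of (source_tokens, target_tokens) tuples,
    i.e. the output of sentence alignment (step 1).
    """
    src_words = set(source_colloc)
    # Sentence pairs whose source side contains the whole collocation.
    source_hits = {
        i for i, (src, _) in enumerate(aligned_pairs) if src_words <= set(src)
    }

    # Inverted index: target word -> indices of pairs whose target side has it.
    index = {}
    for i, (_, tgt) in enumerate(aligned_pairs):
        for word in set(tgt):
            index.setdefault(word, set()).add(i)

    # Step 3: keep target words that correlate with the source collocation.
    candidates = sorted(
        w for w, hits in index.items() if dice(source_hits, hits) >= threshold
    )

    # Steps 4 and 5: score combinations of candidates, keep the best one.
    best, best_score = (), 0.0
    for size in range(1, max_len + 1):
        for combo in combinations(candidates, size):
            hits = set.intersection(*(index[w] for w in combo))
            score = dice(source_hits, hits)
            if score > best_score:
                best, best_score = combo, score
    return best, best_score


# Hypothetical, lemmatized toy corpus (not data from the slides).
pairs = [
    (["they", "take", "new", "steps"], ["ils", "prendre", "nouvelles", "mesures"]),
    (["we", "take", "steps", "now"], ["nous", "prendre", "mesures", "maintenant"]),
    (["they", "take", "a", "walk"], ["ils", "prendre", "une", "promenade"]),
    (["new", "measures", "now"], ["nouvelles", "mesures", "maintenant"]),
    (["we", "walk"], ["nous", "marcher"]),
]

result = translate_collocation(("take", "steps"), pairs)
print(result)  # (('mesures', 'prendre'), 1.0)
```

On this toy input, "prendre" and "mesures" each score 0.8 alone, while the pair scores 1.0 because the two words co-occur only in the sentences that contain "take ... steps". That is the effect steps 4 and 5 rely on: a combination can be a better match than any single word.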
Sample translations
- additional costs → coûts supplémentaires
- affirmative action → positive
- free trade → libre-échange
- freer trade → libéralisation … échanges
- take … steps → prendre … mesures
- stock market → bourse

Evaluation results
- Corpus of 3.5 million words, collocations selected from the same corpus: 78%
- Corpus of 8.5 million words, collocations selected from the same corpus: 74%
- Corpus of 3.5 million words, collocations selected from a different corpus: 65%

Conclusion
- Champollion provides for collocation translation
- Robust
- Language-independent
- Requires no tools
- But: requires parallel corpora