Approaches to MT
Approaches of Machine Translation
• Rule-based MT (also called knowledge-based MT)
• Corpus-based MT
  • Statistical MT
  • Example-based MT
• Hybrid MT
Rule-based Machine Translation (RBMT)
• Machine translation systems that use handcrafted linguistic rules as their knowledge base are called rule-based MT systems.
• These were the dominant MT systems from around 1970.
• Ref: http://en.wikipedia.org/wiki/Rule-based_machine_translation
Rule-based Machine Translation (RBMT)
• Formation of linguistic rules:
  • The rules are formed from linguistic information about the source and target languages.
  • This information is drawn mainly from (bilingual) dictionaries and from grammars covering the main morphological, syntactic, and semantic regularities of each language.
• Other names:
  • Knowledge-based machine translation
  • Classical approach to MT
Rule-based Machine Translation (RBMT)
• Problems with rule-based MT systems:
  • Very expensive to build: large numbers of rules must be entered manually.
  • Trained linguists are needed.
  • The approach does not scale up well to a general-purpose system.
  • Translations are often awkward and hard to understand.
  • Complex grammars and large dictionaries are needed.
  • Development is slow and costly.
Types of RBMT Systems
• There are three types:
  • Transfer RBMT systems (transfer rule-based machine translation)
  • Interlingual RBMT systems (interlingua)
  • Dictionary-based MT systems
Corpus-based MT
• Corpus-based machine translation systems are built on bilingual corpora.
• Statistical models are applied to the bilingual corpora to translate from the source language(s) to the target language(s).
• These systems emerged in the 1990s and have partially replaced traditional rule-based approaches.
Corpus-based MT
• A word or phrase is translated into one of several possible translations according to its translation probability in the corpus.
• Advantage:
  • Self-customization: the system can learn translations of terminology, and even stylistic phrasing, from previously translated material.
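A minimal sketch of how a "probability of translation in the corpus" could be estimated, assuming the corpus has already been reduced to aligned word pairs. The pair list is invented for illustration (including the "parasol" alternative), and relative frequency is only the simplest possible estimator; real systems learn alignments and probabilities jointly from large corpora.

```python
from collections import Counter, defaultdict

# Hypothetical aligned (source word, target word) pairs; not real corpus data.
ALIGNED_PAIRS = [
    ("kasa", "umbrella"),
    ("kasa", "umbrella"),
    ("kasa", "parasol"),
    ("akai", "red"),
]


def translation_probabilities(pairs):
    """Estimate P(target | source) by relative frequency of aligned pairs."""
    counts = defaultdict(Counter)
    for source, target in pairs:
        counts[source][target] += 1
    return {
        source: {t: c / sum(targets.values()) for t, c in targets.items()}
        for source, targets in counts.items()
    }


def most_likely_translation(probabilities, source):
    """Pick the target word with the highest estimated probability."""
    candidates = probabilities[source]
    return max(candidates, key=candidates.get)


probs = translation_probabilities(ALIGNED_PAIRS)
print(most_likely_translation(probs, "kasa"))
```

With these toy counts, "kasa" is translated as "umbrella" because it is aligned with that word in two of its three occurrences.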
Corpus-based MT
• Corpus:
  • A corpus (plural: corpora) is a finite collection of linguistic data, compiled either from written texts or from transcriptions of recorded speech.
• Example of a bilingual corpus:
  English: How much is that red umbrella?
  Japanese: Ano akai kasa wa ikura desu ka.
  English: How much is that small camera?
  Japanese: Ano chiisai kamera wa ikura desu ka.
Corpus-based MT
• Types of corpus-based MT:
  • Statistical Machine Translation (SMT)
  • Example-based Machine Translation (EBMT)
Statistical Machine Translation (SMT)
• Definition:
  • In a statistical machine translation system, translations are generated by statistical models derived from the analysis of bilingual text corpora.
Statistical Machine Translation (SMT)
• History of SMT:
  • The first ideas of statistical machine translation were introduced by Warren Weaver in 1949.
  • Statistical machine translation was re-introduced in 1991 by researchers at IBM's Thomas J. Watson Research Center and has contributed to the significant revival of interest in machine translation in recent years.
  • Nowadays, it is by far the most widely studied machine translation method.
Statistical Machine Translation (SMT)
• Working of SMT:
  • TL (target-language) words are chosen as those most likely to correspond to the SL (source-language) words in a specific context.
  • The TL words are then combined in the way most appropriate for the TL, given the context, domain, style, etc.
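The two steps above are often formalised as a translation model (how well TL words correspond to SL words) multiplied by a language model (how natural the TL word sequence is), as in the IBM work of Brown et al. listed in the references. The sketch below is a toy illustration under strong assumptions: every probability is invented, candidates are supplied by hand instead of being searched for, and words are aligned one-to-one in order. It uses the Russian sentence "My trebuem mira" that appears later in these slides.

```python
import math

# Hypothetical translation model: P(source word | target word).
TRANSLATION_MODEL = {
    ("my", "we"): 0.9,
    ("trebuem", "require"): 0.5,
    ("trebuem", "want"): 0.4,
    ("mira", "world"): 0.5,
    ("mira", "peace"): 0.5,
}

# Hypothetical bigram language model: P(word | previous word).
LANGUAGE_MODEL = {
    ("<s>", "we"): 0.2,
    ("we", "require"): 0.05,
    ("we", "want"): 0.2,
    ("require", "world"): 0.001,
    ("want", "peace"): 0.05,
}

FLOOR = 1e-6  # probability assigned to anything unseen in the toy tables


def log_score(source, target):
    """Log of translation-model score times language-model score."""
    tm = sum(
        math.log(TRANSLATION_MODEL.get((s, t), FLOOR))
        for s, t in zip(source, target)
    )
    lm = sum(
        math.log(LANGUAGE_MODEL.get((prev, word), FLOOR))
        for prev, word in zip(["<s>"] + target, target)
    )
    return tm + lm


def best_candidate(source, candidates):
    """Return the candidate translation with the highest combined score."""
    return max(candidates, key=lambda cand: log_score(source, cand))


source = "my trebuem mira".split()
candidates = [["we", "require", "world"], ["we", "want", "peace"]]
print(" ".join(best_candidate(source, candidates)))
```

In this toy setting the language model is what separates the two candidates: both are plausible word-by-word, but "want peace" is a far more likely English sequence than "require world".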
Statistical Machine Translation (SMT)
• Advantage:
  • Expansion: SMT systems are easier and less expensive to expand, because the system can be taught new knowledge domains or languages by giving it large samples of existing human-translated texts.
• Disadvantages:
  • Outputs are often ungrammatical.
  • Quality and accuracy of translation fall well below those of a human linguist.
Statistical Machine Translation (SMT)
• Examples:
  • EUROPARL, the record of the European Parliament, used as a parallel corpus
  • CANDIDE, an SMT system from IBM
  • Google formerly used SYSTRAN before moving to its own statistical system; Yahoo! Babel Fish continued to use SYSTRAN
Example-based Machine Translation (EBMT)
• The EBMT approach is characterized by its use, at run time, of a bilingual corpus of parallel texts as its main knowledge base.
• It is essentially translation by analogy and can be viewed as an implementation of the case-based reasoning approach to machine learning.
• The EBMT approach was proposed by Makoto Nagao in 1984.
• Ref: http://en.wikipedia.org/wiki/Example-based_machine_translation
EBMT Basic Philosophy
• "Man does not translate a simple sentence by doing deep linguistic analysis, rather, man does translation, first, by properly decomposing an input sentence into certain fragmental phrases, then by translating these phrases into other language phrases, and finally by properly composing these fragmental translations into one long sentence. The translation of each fragmental phrase will be done by the analogy translation principle with proper examples as its reference."
  Makoto Nagao (1984)
Example-based Machine Translation (EBMT)
• Working of EBMT:
  • Based on the observation that translators look for SL phrases and sentences similar to the input, together with their TL equivalents, in previously translated texts.
  • The system therefore seeks sets of analogies and examples in bilingual corpora.
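A minimal sketch of translation by analogy, using the two English/Japanese example pairs from the bilingual corpus slide. It retrieves the stored example closest to the input and swaps in the words that differ. The word lexicon and the positional similarity measure are assumptions made for illustration; a real EBMT system would derive correspondences from the corpus and handle inputs of different lengths.

```python
# Example base taken from the bilingual corpus slide (lowercased, no punctuation).
EXAMPLES = [
    ("how much is that red umbrella", "ano akai kasa wa ikura desu ka"),
    ("how much is that small camera", "ano chiisai kamera wa ikura desu ka"),
]

# Hypothetical word correspondences; assumed here rather than learned.
LEXICON = {
    "red": "akai",
    "small": "chiisai",
    "umbrella": "kasa",
    "camera": "kamera",
}


def similarity(words, example_words):
    """Count positions where the input and the example share the same word."""
    return sum(a == b for a, b in zip(words, example_words))


def translate_by_analogy(sentence):
    words = sentence.lower().rstrip("?.!").split()
    # Step 1: retrieve the most similar stored example.
    source, target = max(
        EXAMPLES, key=lambda ex: similarity(words, ex[0].split())
    )
    # Step 2: adapt it by replacing the translations of the differing words.
    output = target.split()
    for new_word, old_word in zip(words, source.split()):
        if new_word != old_word:
            position = output.index(LEXICON[old_word])
            output[position] = LEXICON[new_word]
    return " ".join(output)


print(translate_by_analogy("How much is that red camera?"))
```

The input "How much is that red camera?" is not in the example base, but it differs from a stored example by a single word, so the stored translation can be reused with one substitution.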
EBMT Example
Hybrid Machine Translation (HMT)
• Hybrid machine translation (HMT) leverages the strengths of both statistical and rule-based translation methodologies.
• Several MT companies (Asia Online, LinguaSys, and Systran) claim to use a hybrid approach combining rules and statistics.
Recent Research
• Research currently continues on example-based and statistical machine translation systems.
Strategies Involved in Machine Translation
• Direct strategy
• Indirect approach
  • Interlingua
  • Transfer
Direct Strategy
• This is the strategy that was used in first-generation MT systems.
• The Georgetown experiment was based on the direct strategy.
• Translation is based on large dictionaries and word-by-word translation, with some simple grammatical adjustments, e.g. to word order and morphology.
Direct Strategy (Cont.)
• A direct translation system is designed for one specific pair of source and target languages, such as English to Pashto.
• The translation unit in this approach is usually the word.
• Example system:
  • Systran, one of the oldest MT systems still in use today, is basically a direct translation system.
Architecture for Direct Strategy of MT
[Figures: a compressed model of direct translation; a detailed model of direct translation. Source: www.hutchinsweb.me.uk/IntroMT-4.pdf]
Working of Direct Strategy
• Morphological analysis
  • Lexical identification (word identification) of the source-language words is carried out using morphological analysis.
• Bilingual dictionary look-up
  • The identified words are then looked up directly in a bilingual dictionary, which provides target-language equivalents.
• Local reordering
  • Some local reordering rules are applied to give more acceptable target-language output, for example moving some adjectives or verb particles.
• Target-language output
  • The target-language text is then produced for the source words one by one.
• Ref: www.hutchinsweb.me.uk/IntroMT-4.pdf
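The four steps can be sketched as a small pipeline. This is an illustrative toy for French to English, not a description of any real system: the morphology table, the bilingual dictionary, and the single reordering rule (noun followed by adjective becomes adjective followed by noun) are all assumptions chosen to keep the example short.

```python
# Hypothetical morphology table: surface form -> (lemma, number).
MORPHOLOGY = {
    "la": ("le", "sg"),
    "les": ("le", "pl"),
    "maison": ("maison", "sg"),
    "maisons": ("maison", "pl"),
    "blanche": ("blanc", "sg"),
    "blanches": ("blanc", "pl"),
}

# Hypothetical bilingual dictionary: lemma -> (target word, part of speech).
BILINGUAL = {
    "le": ("the", "DET"),
    "maison": ("house", "NOUN"),
    "blanc": ("white", "ADJ"),
}


def analyse(word):
    """Step 1: morphological analysis (unknown words pass through)."""
    return MORPHOLOGY.get(word.lower(), (word.lower(), "sg"))


def look_up(lemma):
    """Step 2: bilingual dictionary look-up."""
    return BILINGUAL.get(lemma, (lemma, "UNK"))


def reorder(items):
    """Step 3: local reordering, swapping each NOUN ADJ pair to ADJ NOUN."""
    out = list(items)
    i = 0
    while i < len(out) - 1:
        if out[i][1] == "NOUN" and out[i + 1][1] == "ADJ":
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2
        else:
            i += 1
    return out


def generate(word, pos, number):
    """Step 4: produce the target form, adding a plural -s to nouns."""
    return word + "s" if pos == "NOUN" and number == "pl" else word


def translate_direct(sentence):
    analysed = [analyse(w) for w in sentence.split()]
    items = [(*look_up(lemma), number) for lemma, number in analysed]
    return " ".join(generate(w, pos, num) for w, pos, num in reorder(items))


print(translate_direct("les maisons blanches"))
```

Every decision is made word by word or between adjacent words, which is why the approach works for simple phrases like this one but has no way to use wider sentence structure.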
Direct Strategy Example
[Figure omitted. Source: www.hutchinsweb.me.uk/IntroMT-4.pdf]
Drawbacks of Direct Approach
• It can be characterized as 'word-for-word' translation with some local word-order adjustment.
• (1) My trebuem mira. [Russian]
  We require world. [direct translation into English]
  'We want peace.' [intended meaning in English]
• (2) Ona navarila šcei na neskol'ko dnei. [Russian]
  It welded on cabbage soups on several days. [direct translation into English]
  'She cooked enough cabbage soup for several days.' [intended meaning in English]
• Ref: www.hutchinsweb.me.uk/IntroMT-4.pdf
Interlingua or Pivot Language Strategy
• This strategy is based on the idea of creating an intermediate representation of the document or text that is independent of any particular language.
• This intermediate representation functions as a neutral, universal rendering of the text that differs from both the source language and the target language.
• Esperanto has been used as an interlingua for translating between languages.
• Ref: http://www.thelanguagetranslation.com/machine-translation.html
Interlingua Strategy (Cont.)
• The intermediate representation includes all information necessary for the generation of the target text without 'looking back' to the original text.
• The representation is thus a projection from the source text and at the same time acts as the basis for the generation of the target text; it is an abstract representation of the target text as well as a representation of the source text.
Interlingua (Cont.)
• The interlingua approach is clearly most attractive for multilingual systems.
• Each analysis module can be independent, both of all other analysis modules and of all generation modules (see figure 4.2).
• Target languages have no effect on any processes of analysis; the aim of analysis is the derivation of an 'interlingual' representation.
[Figure: analysis and synthesis modules]
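The attraction for multilingual systems can be made concrete by counting modules. The sketch below uses the usual counting argument, which is an addition here and not something stated on the slide: an interlingua system needs one analysis and one generation module per language, whereas pairwise systems need one system for every ordered pair of languages.

```python
def interlingua_modules(n_languages: int) -> int:
    """One analysis module plus one generation module per language."""
    return 2 * n_languages


def pairwise_systems(n_languages: int) -> int:
    """One dedicated system per ordered (source, target) language pair."""
    return n_languages * (n_languages - 1)


for n in (2, 3, 9):
    print(n, interlingua_modules(n), pairwise_systems(n))
```

For nine languages this gives 18 modules against 72 pairwise systems, and the gap widens as more languages are added.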
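Interlingua (Cont.)
• Working of interlingua:
  • It is a two-step process:
    • Analysis of the source text into the interlingual representation
    • Generation of the target text from that representation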
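A toy sketch of the two steps, assuming a very small interlingua made of concept labels arranged in a frame with agent, predicate, and theme. The concept names, the English and German lexicons, and the fixed subject-verb-object order are all hypothetical simplifications; a real interlingua has to encode far more information.

```python
# Hypothetical concept lexicons: word -> language-neutral concept label.
LEXICONS = {
    "en": {"we": "SPEAKERS", "want": "WANT", "peace": "PEACE"},
    "de": {"wir": "SPEAKERS", "wollen": "WANT", "Frieden": "PEACE"},
}


def analyse(sentence, lang):
    """Analysis: source text -> interlingual frame (assumes S-V-O order)."""
    to_concept = {w.lower(): c for w, c in LEXICONS[lang].items()}
    agent, predicate, theme = (
        to_concept[w] for w in sentence.lower().split()
    )
    return {"agent": agent, "predicate": predicate, "theme": theme}


def generate(frame, lang):
    """Generation: interlingual frame -> target text, using only the frame."""
    to_word = {concept: word for word, concept in LEXICONS[lang].items()}
    return " ".join(
        to_word[frame[role]] for role in ("agent", "predicate", "theme")
    )


frame = analyse("we want peace", "en")
print(frame)
print(generate(frame, "de"))
```

Note that `generate` never sees the source sentence or the source language, only the frame, so adding a new language means adding one lexicon without touching the existing modules.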
Limitations of Interlingua Approach
• The first is the difficulty of devising language-independent representations.
• The second is the complexity of the analysis and generation grammars when the representations are inevitably far removed from the characteristic features of the source and target texts.
• Ref: www.hutchinsweb.me.uk/IntroMT-4.pdf
Interlingua Approach
• Example systems:
  • DLT
  • TRANSLATOR
Transfer Strategy
• The transfer strategy emphasizes the concept of "level of representation".
• This strategy involves three stages:
  • Analysis stage
    • Describes the source document linguistically, using a source-language dictionary.
  • Transfer stage
    • Transforms the results of the analysis stage and produces the linguistic and structural equivalents between the two languages. A bilingual dictionary is used to translate from the source language to the target language.
  • Generation stage
    • Produces the target-language document from the linguistic data carried over from the source language, using a target-language dictionary.
• Ref: http://www.thelanguagetranslation.com/machine-translation.html
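The three stages can be sketched as three functions that pass a representation from one to the next. This is a deliberately tiny French to English illustration under stated assumptions: the dictionaries are invented, the "representation" is just a mapping from grammatical role to lemma, and generation applies a single target word-order rule in place of a full target-language dictionary.

```python
# Hypothetical source-language dictionary: word -> (role, lemma).
SOURCE_DICTIONARY = {
    "la": ("DET", "le"),
    "maison": ("NOUN", "maison"),
    "blanche": ("ADJ", "blanc"),
}

# Hypothetical bilingual dictionary used by the transfer stage.
BILINGUAL_DICTIONARY = {"le": "the", "maison": "house", "blanc": "white"}

# Assumed target-language ordering rule for a simple noun phrase.
TARGET_ORDER = ("DET", "ADJ", "NOUN")


def analysis(sentence):
    """Stage 1: build a source-language representation (role -> lemma)."""
    return dict(SOURCE_DICTIONARY[w] for w in sentence.lower().split())


def transfer(source_repr):
    """Stage 2: map it to a target-language representation."""
    return {
        role: BILINGUAL_DICTIONARY[lemma]
        for role, lemma in source_repr.items()
    }


def generation(target_repr):
    """Stage 3: linearise the target representation in target word order."""
    return " ".join(
        target_repr[role] for role in TARGET_ORDER if role in target_repr
    )


print(generation(transfer(analysis("la maison blanche"))))
```

Unlike word-for-word translation, word order here is decided from the representation at generation time, so the source order of noun and adjective does not need a special swap rule.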
Transfer Strategy (Cont.)
[Figure labels: Source Language Intermediate Representation; Target Language Intermediate Representation. Source: http://www.thelanguagetranslation.com/machine-translation.html]
Transfer Strategy (Cont.)
Limitations of Transfer Strategy (Cont.)
• Ref: www.hutchinsweb.me.uk/IntroMT-4.pdf
References
• [1] W. Weaver (1955). Translation (1949). In: Machine Translation of Languages, MIT Press, Cambridge, MA.
• [2] P. Brown, S. Della Pietra, V. Della Pietra, and R. Mercer (1993). The mathematics of statistical machine translation: parameter estimation. Computational Linguistics, 19(2), 263-311.
• [3] M. Nagao (1984). A Framework of a Mechanical Translation between Japanese and English by Analogy Principle. In: Artificial and Human Intelligence, A. Elithorn and R. Banerji (eds.), North-Holland, pp. 173-180.
• [4] "The Association for Computational Linguistics - 2003 ACL Lifetime Achievement Award". Association for Computational Linguistics. http://www.aclweb.org/index.php?option=com_content&task=view&id=36&Itemid=30. Retrieved 2010-03-10.
• [5] A. Boretz, "AppTek Launches Hybrid Machine Translation Software", SpeechTechMag.com (posted 2 March 2009).
• [6] http://www.globalsecurity.org/intell/systems/mt-techniques.htm
• [7] M. A. Khan, Text-based Machine Translation System, 1995, Chapter 2.
• [8] www.hutchinsweb.me.uk