Postgraduate Diploma in Translation Machine Translation I Introduction


Postgraduate Diploma in Translation Machine Translation I Introduction to MT Feb 2007 Diploma in Translation I

Outline
• Translation
• Machine Translation (MT)
• Why MT is important
• MT and the Human Translator

Why Translation is Difficult

What is Translation?
• The process of transforming a text in one language into a text in another language that is
  – in some sense, equivalent to the text in the first language
  – in some sense, a good text in its own right.
• It is what translators do.

What Translators Actually Do: An Example of En/Fr Translation
English: As recently as a decade ago it was widely believed that infectious disease was no longer much of a threat in the developed world. The remaining challenges to public health there, it was thought, stemmed from noninfectious conditions such as cancer, heart disease and degenerative diseases.
French: Il y a une dizaine d’années, on croyait que les pays industrialisés étaient débarrassés des risques liés aux maladies infectieuses et que la santé publique n’était menacée que par des maladies comme le cancer, les troubles cardiaques, et les anomalies génétiques.

Problems: style and meaning
• English: Two sentences. French: One sentence.
• English: infectious disease was no longer much of a threat in the developed world. French: les pays industrialisés étaient débarrassés des risques liés aux maladies infectieuses
• English: The remaining challenges to public health there. French: la santé publique n’était menacée que
• English: noninfectious conditions. French: maladies

Translation
• These tasks are extremely difficult.
• They are more than what we expect of a human translator, let alone a computer.
• The work of human translators is typically multi-stage.

Translation Workflow
[Diagram: Pre-editing, then Translation, then Post-editing, all drawing on LANGUAGE RESOURCES: dictionaries, grammars, terminology, existing translations]

Translation Workflow
• No pre-editing: lots of post-editing!
• Lots of pre-editing: less post-editing!
• GARBAGE IN, GARBAGE OUT!!!

Translation Workflow: Pre-editing
• Text preparation
  – Delimitation of what is to be translated
  – Representation in electronic form
  – Spelling correction
• Stylistic guidelines, e.g.
  – avoidance of long sentences
  – avoidance of ambiguous terms
• Use of controlled languages and related tools
  – Grammar checkers
  – Critiquing systems
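The two stylistic guidelines above (avoid long sentences, avoid ambiguous terms) can be checked mechanically. The following is only a minimal sketch of such a pre-editing check, not a description of any real grammar checker or critiquing system: the function name `check_sentences`, the 20-word limit and the list of ambiguous terms are all illustrative assumptions.

```python
import re

MAX_WORDS = 20  # hypothetical limit; a real style guide would set its own


def check_sentences(text, max_words=MAX_WORDS, ambiguous_terms=()):
    """Return (sentence, problems) pairs for sentences that break a guideline."""
    flagged = []
    # Naive split after sentence-final punctuation; good enough for a sketch.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        if not sentence:
            continue
        words = re.findall(r"[A-Za-z']+", sentence.lower())
        problems = []
        if len(words) > max_words:
            problems.append(f"too long ({len(words)} words)")
        for term in ambiguous_terms:
            if term in words:
                problems.append(f"ambiguous term: {term}")
        if problems:
            flagged.append((sentence, problems))
    return flagged


if __name__ == "__main__":
    sample = "Replace the seal. Check the bank of switches."
    for sentence, problems in check_sentences(sample, ambiguous_terms=("bank",)):
        print(sentence, "->", "; ".join(problems))
```

A pre-editor would rewrite the flagged sentences before translation starts, which is where the saving in post-editing comes from.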

Translation Workflow
• Coordination of a large translation job
  – Distribution of the task to several translators
  – Integration of results
  – Communication between translators and coordinator
• Availability of language resources
  – Grammars
  – Dictionaries
  – Terminology
  – Existing translations

Controlled Languages
• Average number of words used by a native speaker: 75,000.
• Basic English (BE), invented by Ogden (1930). Vocabulary size 850.
• Simplified constructions, e.g. "make perfect" instead of the verb "perfect".
  – Learn English: seven years
  – Learn Esperanto: seven months
  – Learn BE: seven weeks
• Some industries have introduced controlled languages for their manuals.
• Xerox offers its technical writers a one-day course; British Aerospace does the same in a few short sessions.

Translational Equivalence
• Lexical Mismatches
• Cultural Mismatches
• Grammatical/Structural Mismatches
• Structural/Semantic Mismatches
• Role of Context

Lexical Mismatches
• English: ? French: Alpe
• English: spam. French: ?
• English: friend. French: amie
• English: truck, lorry. French: camion
• English: just. French: venir de

Words with Many Senses
[Figure from Hutchins & Somers (1992)]

Cultural Mismatches
• English: Health Insurance. French: Assurance Maladie
• English: validate. French: oblitérer

Cultural Mismatches
• English: It's no good closing the barn door after the horse has bolted
• French: Moutarde après le dîner

Grammatical/Structural Mismatches
• English: I miss you. French: tu me manques
• English: I like sausages. German: Ich habe Würste gern

Structural/Semantic Mismatches
• Head marking
  – In English the possessive relation is marked on the owner: the man's house
  – In Hungarian it is marked on the thing owned (the head): the man house-his
  – his house / sa maison
• Direction and manner of motion marking
  – He ran into the room (English)
  – He entered the room running (French)

Contextual Interpretation
• English: OPEN. French: ouvre or ouvert, depending on context

Structural Ambiguity
• I forgot how good beer tastes
• Time flies like an arrow
• I bought a car with four doors/liri
• The councillors refused the women a permit because they advocated/feared violence.

Similarities and Differences Between Languages
Similarities
• Communicative function for survival
• Mechanisms for reference to people, eating, politeness, time
• Nouns
• Verbs
Differences
• Lexical marking of semantic distinctions
• Morphology
  – English
  – Maltese
  – German
• Word order

Morphology
• English: try, tries, tried, trying
• Maltese: nikteb, tikteb, jikteb, niktbu, ...
• Turkish: uygarlaştıramadıklarımızdanmışsınızcasına, "behaving as if you are amongst those whom we could not cause to become civilized"

Differences in Word Order
• SVO (English): The man kicked the ball
• SOV (Japanese): The man the ball kicked
• VSO (Classical Arabic): Kicked the man the ball
• Mixed (German): The man who the sausage ate did.
• Free word order (Latin)

Summary
• Translation concerns translational equivalence.
• This transcends equivalence of meaning (e.g. it sometimes involves cultural conventions).
• Translation may involve the resolution of ambiguity.
• It makes sense to talk about the distance between languages. Languages that are close are easier to translate between.
• Translation is a hard problem for humans, let alone machines.

Why Machine Translation is Important

Implications of Multilinguality
Number of Languages    Number of Language Pairs
2                      2
3                      6
10                     90
20                     380
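The figures in the table are consistent with counting each translation direction separately: n languages give n × (n − 1) ordered source/target pairs. A minimal sketch in Python that reproduces the table (the function name `directed_pairs` is our own label, not something from the slides):

```python
def directed_pairs(n):
    """Number of ordered (source, target) language pairs among n languages."""
    return n * (n - 1)


if __name__ == "__main__":
    for n in (2, 3, 10, 20):
        print(n, directed_pairs(n))
```

The quadratic growth is the point: each added language requires translation to and from every language already covered.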

Commercial Interest
• The US has invested in MT for intelligence purposes.
• MT is popular on the web: the most used of Google's special features.
• The EU spends more than €1 billion per annum on translation.

Academic Interest
• Different NL technologies include
  – parsing
  – generation
  – morphology
  – pronoun resolution
  – understanding
  – ...

Misconceptions about MT
• MT is a waste of time because
  – you will never make a machine that can translate Shakespeare
  – the quality of translation you can get from an MT system is very low.
• MT threatens the jobs of translators.
• MT systems are machines, and buying an MT system should be very much like buying a car.

Facts about MT
• There are many situations where the ability to produce reliable, if less than perfect, translations at high speed is valuable.
• MT systems can take over some of the boring, repetitive translation jobs and allow human translators to concentrate on more interesting specialist tasks.
• Building an MT system is an arduous and time-consuming job, involving the construction of grammars and very large monolingual and bilingual dictionaries.

The Place for MT
• Human translators are good at
  – Getting the right turn of phrase
  – Preserving translation equivalence
• Human translators are bad at
  – Dictionary look-up
  – Consistency of translation
  – Translation of terminology
• MT can exploit these weaknesses.

Summary
MT is important because
  – There are too few human translators.
  – Availability of materials in the appropriate language has significant economic consequences.
  – Scientifically, it is still one of the best test areas for language technology.

Machine Translation and Human Translators

Different Styles of MT
• FAMT: fully automatic machine translation
  – FAHQMT (high quality)
  – FALQMT (low quality)
• MAHT: machine aided human translation
• HAMT: human aided machine translation

The Dream of FAMT
• Fully Automatic (High Quality) Machine Translation (Bar-Hillel 1960)
[Diagram: Source Language text, then FAHQMT, then Target Language text]

FAMT
• Basic Characteristics
  – No human intervention
  – Arbitrary text
• Evaluation Criteria
  – Quality of output
  – Cost ($/page)
  – Speed (pages/hour)

FAMT Success Story: TAUM METEO
• Written by Chevalier et al. 1978.
• Translation of weather reports from English to French
• Highly constrained subset of English:
  – Small number of senses for each word
  – Restricted syntactic constructions
• System determines whether a given sentence is within its capabilities
• Very fast, very accurate, no post-editing

FAMT: MORAL
• FAMT can work well, but only if we give up one or more of the goals, e.g.
  – Unrestricted text input
  – High quality translation
• This observation has led to research on sublanguages
• And to the use of FALQT

Sublanguages
• Restricted domain of reference
• Restricted purpose and orientation
• Restricted mode of communication (may include bandwidth considerations)
• Community of users sharing specialised knowledge
• Examples: weather reports; financial reports; car accident reports

Stock Market Report (ToM)
• Back to the Malta Stock Exchange: it was a positive week with gainers outpacing losers 6-3, and large cap stocks regaining popularity. Equity turnover by value topped the Lm 600,000 mark by a comfortable margin for the second time this year and the MSE index closed the week at 5,123.766, up 1.5%. BoV opened the week flat at Lm 3.651 but advanced steadily on sustained buying activity to close at Lm 3.68. The positive mood extended into Tuesday as the price climbed further, closing at Lm 3.70.

Fully Automatic Low Quality Translation (FALQT)
• Can be used where translation volume is high
• Where the gist is more important than an accurate translation
• Where we need to select a small group of documents from a large collection for subsequent high quality translation
• Must answer the question: could document X in collection Z be about Y?

Google Translation
French: En 2000-01, le recouvrement des coûts mettra l'emphase sur la réduction des comptes à recevoir et la mise en place d'un programme de vérification auprès des firmes ayant demandé une réduction des frais. En 2000-01, le budget du programme devrait augmenter substantiellement dû à des nouvelles initiatives du côté
English: In 2000-01, the covering of the costs will put the emphase on the reduction of the accounts receivable and the installation of a programme of checking near the firms having required a reduction of the expenses. In 2000-01, the budget of the program should increase substantially due to new initiatives on the side appropriation.

FAMT is not the only way
• FAMT lies at one extreme of a continuum of ways in which technology can be brought to bear upon the translation problem.
• At the other extreme are word-processing software, fax machines, and even mobile phones.
• Between these two extremes there are other points of interest where technology can radically affect the productivity of the individual translator.

MAHT and HAMT
• Machine Aided Human Translation (MAHT)
• Human Aided Machine Translation (HAMT)
• The essential difference between these two lies not only in the way in which the person is involved but also in the extent of their involvement.

MAHT
• All initiative resides with the human.
• Often based on a text editor with certain translation-specific functionalities such as
  – Simultaneous access to source and target texts
  – Online access to dictionaries, thesauri, terminological databases, and word concordance tools
• Identification of and access to secondary materials such as the texts being worked on and other texts like them, in both source and target forms.

MAHT - Translation Memories
• Systems consist of a database in which each source sentence of a translation is stored together with its target sentence (this pair is called a translation memory "unit").
• Each new source sentence is searched for in the database and a match value is calculated.
• When the match value is 100%, the translation of the source sentence from the database is inserted into the text being translated.

MAHT - Translation Memories
• If the match value is below 100% and above a certain user-definable percentage (i.e., a "fuzzy match"), the old translation is inserted as a translation proposal for the translator to review and edit.
• Sentences with match values below that threshold have to be translated from scratch.
• New and changed translations are then stored in the database for future use.
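The lookup procedure on the last two slides (exact match, fuzzy match above a user-defined threshold, otherwise translate from scratch) can be sketched as follows. This is an illustration under stated assumptions, not how any commercial TM tool computes its match value: we use the character-level similarity ratio from Python's standard `difflib` module as a stand-in, and the names `match_value` and `lookup` and the 75% default threshold are our own.

```python
from difflib import SequenceMatcher


def match_value(a, b):
    """Similarity of two sentences as a percentage (0 to 100).

    Assumption: difflib's ratio stands in for a real TM match score.
    """
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def lookup(memory, source, fuzzy_threshold=75.0):
    """Return (kind, proposal, score) for a new source sentence.

    memory maps stored source sentences to their target sentences.
    kind is "exact", "fuzzy" or "none".
    """
    if source in memory:
        return ("exact", memory[source], 100.0)
    best_score, best_target = 0.0, None
    for stored_source, stored_target in memory.items():
        score = match_value(source, stored_source)
        if score > best_score:
            best_score, best_target = score, stored_target
    if best_target is not None and best_score >= fuzzy_threshold:
        return ("fuzzy", best_target, best_score)  # proposal to review and edit
    return ("none", None, best_score)  # translate from scratch


if __name__ == "__main__":
    tm = {"Press the red button.": "Appuyez sur le bouton rouge."}
    print(lookup(tm, "Press the red button."))
    print(lookup(tm, "Press the green button."))
    # After the translator edits the proposal, the new unit is stored.
    tm["Press the green button."] = "Appuyez sur le bouton vert."
```

Note that even the "exact" branch only compares sentences, not their context, which is the weakness discussed under the drawbacks of translation memories.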

MAHT - Translation Memories - Advantages
• Avoid redoing translation of repeated material
• Use previous texts as a model for new translations
• Ensure consistency throughout a translation

MAHT - Translation Memories - Drawbacks
• If terminology changes between projects, the content of a TM needs to be updated to reflect these changes.
• Blind faith in exact matches (without validation) can generate incorrect translations, since typically there is no verification of context.

MAHT - Translation Memories - Remarks
• Translation process: TM tools may not easily fit into existing translation or localization processes; they work best where work can be signed off in pieces rather than as a whole.
• Customisation: a TM tool rarely works straight out of the box. Menu adaptation and filters to desktop applications may require significant effort.
• Investment costs are high and must include setup and maintenance of TMs.
• OpenTag/TMX formats exist for exchanging TM data between competing systems.

MAHT - Other Technology
• Communication/coordination amongst translators
• Integration of internet technologies and web services
• Database technology, smart indexing, and networking
• Improvements can be achieved that are well within the scope of current technology.

HAMT - Human Assisted Machine Translation
• Machine retains the initiative but works in collaboration with a human consultant.
• System translates autonomously until it recognises that a linguistic difficulty of a certain type has arisen, e.g.
  – ambiguity
  – pronoun reference
  – unknown word
  – unrecognised construction
• At this point it seeks help from the consultant.
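The control flow described above (the machine proceeds alone and hands over only when it detects a difficulty) can be illustrated with a deliberately tiny sketch. Word-by-word substitution is not how a real MT system translates; it is used here only to show where the initiative lies. The names `translate` and `ask_consultant`, and the toy lexicon, are hypothetical.

```python
def translate(words, lexicon, ask_consultant):
    """Translate word by word, consulting a human only when stuck.

    lexicon maps a source word to a list of candidate translations.
    ask_consultant(word, difficulty, candidates) returns the chosen output.
    """
    output = []
    for word in words:
        candidates = lexicon.get(word, [])
        if len(candidates) == 1:
            output.append(candidates[0])  # machine proceeds on its own
        elif not candidates:
            output.append(ask_consultant(word, "unknown word", []))
        else:
            output.append(ask_consultant(word, "ambiguity", candidates))
    return output


if __name__ == "__main__":
    toy_lexicon = {"the": ["le"], "bank": ["banque", "rive"]}

    def console_consultant(word, difficulty, candidates):
        # Stand-in for a human: take the first candidate, else keep the word.
        print(f"consulted about {word!r}: {difficulty}")
        return candidates[0] if candidates else word

    print(translate(["the", "bank", "closes"], toy_lexicon, console_consultant))
```

The fewer times `ask_consultant` is called, the closer the system is to FAMT; the first challenge on the next slide is precisely detecting reliably when a call is needed.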

HAMT - Challenges
• Reliable identification/classification of difficulty
• Reliable communication of difficulty to user
• Tradeoff between quality and scope of translation

HAMT - Advantages
• Quality: provided the challenges are met, a high quality of translation can be guaranteed.
• Speed: if large sections of text can be translated automatically.
• The human consultant need not have all the skills of a human translator; native competence in one or both languages may suffice.

Summary
• Machine Translation is a continuum
  – FAMT
  – HAMT
  – MAHT
• The utility of a given type of system cannot be assessed with very simple criteria.
• The utility function involves at least the human cost, the machine cost, the quality of the result, and the nature of the translation requirements.