Introduction to Machine Translation CSC 4598 Machine Translation

Introduction to Machine Translation
CSC 4598 Machine Translation, Fall 2018
Dr. Tom Way

TYPES OF MACHINE TRANSLATION

Types of Machine Translation
• rule based
  – dictionary based - EASIEST - also called the word-based or word-for-word approach
  – transfer rule approach - try to use the meaning of the source language to output the same meaning in the target language
  – interlingual - translate via a language-neutral intermediate form
• statistical
  – calculate the most likely translation by using pairs of translations (bilingual text corpora)
  – neural networks - HARDEST - state of the art, really a variation of statistical
• example based
  – use simple examples in one language to generate the same thing in another language
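To make the dictionary-based, word-for-word approach concrete, here is a minimal Python sketch. The dictionary entries, the function name, and the bracket convention for unknown words are all assumptions made for illustration; a real system would need a far larger lexicon plus handling for morphology and word order.

```python
# Toy bilingual dictionary (hypothetical entries, Spanish -> English).
TOY_DICTIONARY = {
    "el": "the",
    "gato": "cat",
    "negro": "black",
    "come": "eats",
    "pescado": "fish",
}


def translate_word_for_word(sentence, dictionary):
    """Replace each source word with its dictionary entry, keeping word order."""
    translated = []
    for word in sentence.lower().split():
        # Unknown words are passed through unchanged, marked with brackets.
        translated.append(dictionary.get(word, f"[{word}]"))
    return " ".join(translated)


if __name__ == "__main__":
    # Prints: the cat black eats fish
    print(translate_word_for_word("El gato negro come pescado", TOY_DICTIONARY))
```

The output keeps the Spanish word order ("cat black" rather than "black cat"), which shows why this approach is the easiest to build and also the weakest: it has no notion of grammar in either language.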

HISTORY OF MACHINE TRANSLATION

History of Machine Translation
(Based on work by John Hutchins, mt-archive.info)
• Before the computer: in the mid-1930s, the French-Armenian Georges Artsrouni and the Russian Petr Troyanskii applied for patents for ‘translating machines’.
• The pioneers (1947-1954): the first public MT demo was given in 1954 (by IBM and Georgetown University).
• Machine translation was one of the first applications envisioned for computers.

History of MT (2)
Warren Weaver, Ph.D., was an American scientist, mathematician, and science administrator. He is widely recognized as one of the pioneers of machine translation and as an important figure in creating support for science in the United States.

History of MT (3)
First demonstrated by IBM in 1954 with a basic word-for-word translation system.

History of MT (4)
• The decade of optimism (1954-1966) ended with the…
• ALPAC (Automatic Language Processing Advisory Committee) report in 1966: “There is no immediate or predictable prospect of useful machine translation.”

History of MT (5)
The ALPAC Report
The ALPAC (Automatic Language Processing Advisory Committee) was a government committee of seven scientists. Its 1966 report was very skeptical of the progress in computational linguistics and machine translation.

History of MT (6)
• The aftermath of the ALPAC report…
• Research on machine translation virtually stopped from 1966 to 1980.

History of MT (7)
• Then, a rebirth…
• The 1980s: Interlingua, example-based machine translation
• The 1990s: Statistical MT
• The 2000s: Hybrid MT
• The 2010s: Google, real-time, mobile, crowdsourcing, more hybrid approaches

MACHINE TRANSLATION TODAY

Where are we now?
• Huge potential and need due to the internet, globalization, and international politics.
• Quick development time due to Statistical Machine Translation (SMT) and the availability of parallel data and computers.
• Translation is reasonable for language pairs with a large amount of resources.
• Systems are starting to include more “minor” languages.

Rule-based MT: The Vauquois Triangle

Statistical MT: The Rosetta Stone

What is MT good for?
• Rough translation: web data
• Computer-aided human translation
• Translation for limited domain
• Cross-lingual IR
• Machines beat humans at:
  – Speed: much faster than humans
  – Memory: can easily memorize millions of word/phrase translations
  – Manpower: machines are much cheaper than humans
  – Fast learner: it takes minutes or hours to build a new system
  – Never complain, never get tired, …

Interest in Machine Translation (1)
• Commercial interest:
  – The U.S. has invested in machine translation (MT) for intelligence purposes
  – MT is popular on the web; it is the most used of Google’s special features
  – The EU spends more than $1 billion on translation costs each year
  – (Semi-)automated translation could lead to huge savings

Interest in Machine Translation (2)
• Academic interest:
  – One of the most challenging problems in NLP research
  – Requires knowledge from many NLP subareas, e.g., lexical semantics, syntactic parsing, morphological analysis, statistical modeling, …
  – Being able to establish links between two languages allows for transferring resources from one language to another

Goals & Uses
• Translating
• Summarizing
• Communicating
• Pre-editing
• Grammar analysis
• Analyzing text
• Understanding text and images

DO WE REALLY NEED MACHINE TRANSLATION?

Languages on the Internet

Languages on Twitter

Languages in Los Angeles

Why do we need MT?

Why is MT hard?
• For example…
• Commercial system “Language Weaver” created in 2002
• Uses statistical techniques from cryptography and machine learning to acquire statistical models from human translations
• Sold in 2010 for $42.5 million
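The slide says only that the system acquires statistical models from human translations; it does not say which models. As a rough illustration of the general idea, the sketch below uses IBM Model 1 style expectation-maximization, a classic textbook technique, on a hypothetical three-sentence Spanish-English corpus. It is not a description of the product's actual implementation; the corpus, function names, and iteration count are all assumptions.

```python
from collections import defaultdict

# Toy parallel corpus (hypothetical sentence pairs, Spanish -> English).
PARALLEL_CORPUS = [
    ("la casa", "the house"),
    ("la flor", "the flower"),
    ("una casa", "a house"),
]


def train_word_translation_probs(corpus, iterations=10):
    """Estimate t(target | source) with IBM Model 1 style EM."""
    pairs = [(src.split(), tgt.split()) for src, tgt in corpus]
    target_vocab = {word for _, tgt_words in pairs for word in tgt_words}
    # Start from a uniform distribution over the target vocabulary.
    uniform = 1.0 / len(target_vocab)
    probs = defaultdict(lambda: uniform)

    for _ in range(iterations):
        pair_counts = defaultdict(float)    # expected count of (target, source)
        source_counts = defaultdict(float)  # expected count of each source word
        for src_words, tgt_words in pairs:
            for tgt in tgt_words:
                # E-step: share each target word among the source words
                # of the same sentence, in proportion to current estimates.
                total = sum(probs[(tgt, src)] for src in src_words)
                for src in src_words:
                    share = probs[(tgt, src)] / total
                    pair_counts[(tgt, src)] += share
                    source_counts[src] += share
        # M-step: renormalize expected counts into probabilities.
        probs = defaultdict(float, {
            (tgt, src): count / source_counts[src]
            for (tgt, src), count in pair_counts.items()
        })
    return probs


def most_likely_translation(source_word, probs):
    """Return the target word with the highest estimated probability."""
    candidates = {tgt: p for (tgt, src), p in probs.items() if src == source_word}
    return max(candidates, key=candidates.get)


if __name__ == "__main__":
    model = train_word_translation_probs(PARALLEL_CORPUS)
    for word in ("la", "casa", "flor", "una"):
        print(word, "->", most_likely_translation(word, model))
```

No dictionary is supplied here: the word pairings emerge purely from which words co-occur across sentence pairs, which is why the amount of parallel data matters so much for statistical systems.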

“Language Weaver” SMT System – Comparison: Arabic to English
• v. 2.0 – October 2003
• v. 2.4 – October 2004
• v. 3.0 – February 2005