European Patent Office: European Machine Translation Programme
Wolfgang Täger, December 2006

Programme Partners and Goals
• Trigger: success of JP-EN patent translation
• Agreement between the EPO and its member states:
  1. MT of patents, abstracts and communications to/from English
  2. Three language pairs per year
  3. First three languages: FR, DE, ES
• Candidates for next year: Swedish, Dutch, Italian, Romanian, Greek

MT engine
• Trial with an SMT system (Language Weaver)
• Call for tender: winner Worldlingo (Systran)
• Going public (esp@cenet): December 2006
• Needed: improve translation with specific dictionaries

Dictionary format
Desiderata
• open standard
• XML-Unicode
• support features of MT engines
• support conditional translations (e.g. based on IPC)
Not intended for terminology work (no definitions; lexical rather than semantic focus).
=> OLIF format was chosen
How to get dictionaries? By bilingual term extraction!
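To illustrate what a conditional translation based on IPC could look like, here is a minimal Python sketch. It is not the OLIF schema and not the EPO dictionary: the `Entry` class, its field names, the longest-prefix rule and the sample data (German "Mutter", IPC subclass F16B for fastening devices) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Entry:
    """Hypothetical dictionary entry with IPC-conditional translations."""
    source: str
    default: str
    # Maps an IPC prefix (e.g. "F16B") to the translation used in that field.
    by_ipc: Dict[str, str] = field(default_factory=dict)

    def translate(self, ipc_code: Optional[str] = None) -> str:
        """Return the translation for the given IPC code, else the default."""
        if ipc_code:
            matches = [p for p in self.by_ipc if ipc_code.startswith(p)]
            if matches:
                # Assumption: the most specific (longest) prefix wins.
                return self.by_ipc[max(matches, key=len)]
        return self.default


# Illustrative data only: "Mutter" is "mother" in general, "nut" in fastening.
mutter = Entry(source="Mutter", default="mother", by_ipc={"F16B": "nut"})

print(mutter.translate("F16B 37/00"))
print(mutter.translate("A61K 31/00"))
```

The point of the sketch is only that the same lexical entry can carry several target terms and that the document's classification selects among them.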

Available corpora
• 560,000 EP-B publications => claims in EN, DE, FR
• 300,000 DE-T2 publications
• 37,000 ES-B3/T3 publications
=> Align corpora for term extraction, concordancing, translation memory (and SMT)
[Diagram: publication types DE-T2, EP-B1 and ES-B3/T3 (LaTeX) with their parts. Descriptions (DESC): EN or FR or DE; ES. Claims (CL): EN, (DE), ES, FR, DE.]

Alignment & Extraction
Alignment: trial at the EPO with internally developed software.
External companies did not improve on the result during the call for tender.

Alignment & Extraction
Call for tender for bilingual term extraction. Winner: DFKI
1. Alignment of corpora, POS tagging, identification of terms
2. Pairing of terms using clues such as co-occurrence score, string similarity, grammatical clues, position, available dictionaries, ...
3. Providing further information such as gender, inflection, transitivity, countability, ...
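Step 2 can be sketched with two of the listed clues, co-occurrence and string similarity. This is not DFKI's method: the Dice coefficient as the co-occurrence score, `difflib.SequenceMatcher` as the similarity measure, the weights, the function names and the sample segments are all assumptions chosen to keep the example small.

```python
from collections import Counter
from difflib import SequenceMatcher


def dice_scores(aligned, src_terms, tgt_terms):
    """Dice coefficient per term pair, counted over aligned segments."""
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src_seg, tgt_seg in aligned:
        # Naive substring matching; a real system would match tagged terms.
        s_found = {t for t in src_terms if t in src_seg}
        t_found = {t for t in tgt_terms if t in tgt_seg}
        src_count.update(s_found)
        tgt_count.update(t_found)
        pair_count.update((s, t) for s in s_found for t in t_found)
    return {
        (s, t): 2 * n / (src_count[s] + tgt_count[t])
        for (s, t), n in pair_count.items()
    }


def pair_terms(aligned, src_terms, tgt_terms, w_cooc=0.8, w_sim=0.2):
    """Pick, for each source term, the target term with the best mixed score."""
    best = {}
    for (s, t), dice in dice_scores(aligned, src_terms, tgt_terms).items():
        sim = SequenceMatcher(None, s.lower(), t.lower()).ratio()
        score = w_cooc * dice + w_sim * sim  # weights are arbitrary assumptions
        if s not in best or score > best[s][1]:
            best[s] = (t, score)
    return best


# Illustrative aligned DE-EN segments (invented, lower-cased).
aligned = [
    ("ein ventil mit einer feder", "a valve with a spring"),
    ("die feder ist aus stahl", "the spring is made of steel"),
    ("das ventil ist geschlossen", "the valve is closed"),
]
pairs = pair_terms(aligned, ["ventil", "feder"], ["valve", "spring"])
for src, (tgt, score) in sorted(pairs.items()):
    print(src, "->", tgt, round(score, 2))
```

Grammatical clues, position and existing dictionaries would enter as further weighted terms in the same score.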

Validation & Concordancing
Development of an OLIF editor at the EPO
• Remove noise
• Correct entries
• Use concordancer (provides statistics based on parallel corpora)
=> DEMO
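As a rough idea of what a concordancer over parallel corpora provides, the sketch below lists the aligned segments containing a source term and counts how often each candidate translation occurs on the target side. It does not describe the EPO tool: the function name `concordance`, the substring matching and the sample sentences are assumptions.

```python
def concordance(aligned, src_term, candidates):
    """Return matching segment pairs and per-candidate target-side counts."""
    term = src_term.lower()
    # Keep every aligned pair whose source side contains the term.
    hits = [(s, t) for s, t in aligned if term in s.lower()]
    # Count in how many of those pairs each candidate translation appears.
    stats = {
        c: sum(1 for _, t in hits if c.lower() in t.lower())
        for c in candidates
    }
    return hits, stats


# Illustrative DE-EN segment pairs (invented).
aligned = [
    ("Der Leiter besteht aus Kupfer", "The conductor consists of copper"),
    ("Ein elektrischer Leiter", "An electrical conductor"),
    ("Die Leiter hat zehn Sprossen", "The ladder has ten rungs"),
    ("Das Gehäuse ist aus Stahl", "The housing is made of steel"),
]
hits, stats = concordance(aligned, "Leiter", ["conductor", "ladder"])
for src, tgt in hits:
    print(src, "|", tgt)
print(stats)
```

Counts of this kind let a validator see which extracted translation is dominant and which is noise before correcting an entry.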

OLIF format
• Support of more languages
• Clarification of inflection scheme
• Clarification of term vs lex approach
• Tools

Relational database ??
[Diagram: entities Transl, SemRel, Concept, Term, Naming, InflForm, SurfForm, RegEx, Lemma, Infl, LexType]

Relational database ??
[Diagram, example instance: Transl; SemRel; grüner Tee; „hot drink...“; Naming; Nom. Sg. str. f. pos.; grüner; -er; grün; iLike „klein“; DE, Adj]
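One possible reading of the lower part of this example, as a relational sketch in Python with SQLite. The table and column names, and the way the slide's values are assigned to them (lemma "grün", language DE, type Adj, inflecting like "klein"; inflected form "grüner" with ending "-er"), are an interpretation and not a schema given in the presentation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE lemma (
    id        INTEGER PRIMARY KEY,
    text      TEXT NOT NULL,
    lang      TEXT NOT NULL,
    lex_type  TEXT NOT NULL,
    infl_like TEXT              -- model word whose inflection is shared
);
CREATE TABLE infl_form (
    id       INTEGER PRIMARY KEY,
    lemma_id INTEGER NOT NULL REFERENCES lemma(id),
    surface  TEXT NOT NULL,     -- surface form as it appears in text
    ending   TEXT,
    features TEXT               -- grammatical features, kept as on the slide
);
""")

# Hypothetical mapping of the slide's example values onto the two tables.
cur = conn.execute(
    "INSERT INTO lemma (text, lang, lex_type, infl_like) VALUES (?, ?, ?, ?)",
    ("grün", "DE", "Adj", "klein"),
)
conn.execute(
    "INSERT INTO infl_form (lemma_id, surface, ending, features) "
    "VALUES (?, ?, ?, ?)",
    (cur.lastrowid, "grüner", "-er", "Nom. Sg. str. f. pos."),
)

# Look up the lemma and features behind a surface form.
row = conn.execute(
    "SELECT l.text, l.lex_type, f.features "
    "FROM infl_form f JOIN lemma l ON l.id = f.lemma_id "
    "WHERE f.surface = ?",
    ("grüner",),
).fetchone()
print(row)
```

Terms, concepts, namings and translations from the upper part of the diagram would need further tables linked to these; they are left out because the slide does not show how they relate.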

End
Thank you!
