European Patent Office
European Machine Translation Programme
Wolfgang Täger, December 2006
Programme Partners and Goals
• Trigger: success of JP-EN patent translation
• Agreement between the EPO and its Member States:
1. MT of patents, abstracts and communications to/from English
2. Three language pairs per year
3. First three languages: FR, DE, ES
• Candidates for next year: Swedish, Dutch, Italian, Romanian, Greek
MT engine
• Trial with an SMT system (Language Weaver)
• Call for tender: winner WorldLingo (Systran)
• Going public (esp@cenet): December 2006
• Needed: improve translation with specific dictionaries
Dictionary format
Desiderata:
• open standard
• XML, Unicode
• support for the features of MT engines
• support for conditional translations (e.g. based on IPC)
Not intended for terminology: no definitions, and a lexical rather than semantic focus.
=> OLIF format was chosen
How to get dictionaries? By bilingual term extraction!
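A minimal sketch of what a conditional translation could look like in practice, assuming a dictionary entry can attach an IPC prefix to a translation as its condition. This is not OLIF syntax: the `Translation` and `Entry` classes, the `translate` function and the sample entry for German "Mutter" are hypothetical illustrations only.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Translation:
    target: str
    # Hypothetical condition: "" matches every IPC code (unconditional fallback).
    ipc_prefix: str = ""


@dataclass
class Entry:
    source: str
    translations: List[Translation] = field(default_factory=list)


def translate(entry: Entry, ipc_code: str) -> Optional[str]:
    """Return the translation whose IPC condition matches most specifically."""
    matches = [t for t in entry.translations if ipc_code.startswith(t.ipc_prefix)]
    if not matches:
        return None
    # The longest matching prefix wins, so a conditional entry beats the fallback.
    return max(matches, key=lambda t: len(t.ipc_prefix)).target


# Illustrative entry (made up for this sketch): "Mutter" is "nut" in a
# fastener class and "mother" everywhere else.
mutter = Entry("Mutter", [Translation("mother"), Translation("nut", ipc_prefix="F16B")])

print(translate(mutter, "F16B 37/00"))  # nut
print(translate(mutter, "A61K 31/00"))  # mother
```

Choosing the longest matching prefix is one possible tie-breaking rule; an MT engine may resolve competing conditions differently.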
Available corpora
• 560,000 EP-B publications => claims in EN, DE, FR
• 300,000 DE-T2 publications
• 37,000 ES-B3/T3 publications
=> Align corpora for term extraction, concordancing, translation memory (and SMT)
[Diagram labels: DE-T2; EP-B1; ES-B3/T3 (LaTeX); DESC EN or FR or DE; DESC ES; CL EN; (CL DE); CL ES; CL FR; CL DE]
Alignment & Extraction
Alignment: trial at the EPO with internally developed software. External companies did not improve on the result during the call for tender.
Alignment & Extraction
Call for tender for bilingual term extraction. Winner: DFKI
1. Alignment of corpora, POS tagging, identification of terms
2. Pairing of terms using clues such as co-occurrence score, string similarity, grammatical clues, position, available dictionaries, ...
3. Providing further information such as gender, inflection, transitivity, countability, ...
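Step 2 can be illustrated with a small sketch that combines two of the listed clues, co-occurrence and string similarity. It is not DFKI's method: the Dice coefficient, the use of `difflib.SequenceMatcher`, the weights and the toy segments are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

# Toy aligned segments (source, target), lowercased; invented for this sketch.
ALIGNED = [
    ("a valve for controlling fluid flow", "ein ventil zur steuerung des fluidstroms"),
    ("the valve comprises a housing", "das ventil umfasst ein gehäuse"),
    ("the housing is made of steel", "das gehäuse besteht aus stahl"),
]


def dice(src_term, tgt_term, aligned):
    """Dice co-occurrence score over aligned segment pairs."""
    both = sum(1 for s, t in aligned if src_term in s and tgt_term in t)
    n_src = sum(1 for s, _ in aligned if src_term in s)
    n_tgt = sum(1 for _, t in aligned if tgt_term in t)
    total = n_src + n_tgt
    return 2 * both / total if total else 0.0


def string_similarity(a, b):
    """Character-level similarity in [0, 1]; helps with cognates."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def pair_score(src_term, tgt_term, aligned, w_cooc=0.7, w_sim=0.3):
    # The weights are arbitrary placeholders, not tuned values.
    return (w_cooc * dice(src_term, tgt_term, aligned)
            + w_sim * string_similarity(src_term, tgt_term))


def best_pairing(src_term, candidates, aligned):
    """Pick the candidate target term with the highest combined score."""
    return max(candidates, key=lambda c: pair_score(src_term, c, aligned))


print(best_pairing("valve", ["ventil", "gehäuse", "stahl"], ALIGNED))  # ventil
```

The remaining clues named on the slide (grammatical clues, position, available dictionaries) would enter as further weighted terms in `pair_score`.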
Validation & Concordancing
Development of an OLIF editor at the EPO:
• Remove noise
• Correct entries
• Use concordancer (provides statistics based on parallel corpora)
=> DEMO
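As a rough idea of the statistics a concordancer can supply when validating an entry, the sketch below lists the aligned segments that contain a source term and counts how often each candidate translation appears on the target side. The function name `concordance` and the sample segments are hypothetical; the EPO tool itself is not described in enough detail here to reproduce.

```python
from collections import Counter

# Invented sample of aligned (source, target) segments.
SEGMENTS = [
    ("The valve comprises a housing.", "Das Ventil umfasst ein Gehäuse."),
    ("A valve for fluids.", "Ein Absperrorgan für Fluide."),
    ("The housing is made of steel.", "Das Gehäuse besteht aus Stahl."),
]


def concordance(term, aligned, candidates):
    """Return the segment pairs containing `term` and per-candidate hit counts."""
    hits = [(s, t) for s, t in aligned if term.lower() in s.lower()]
    counts = Counter()
    for _, target in hits:
        for cand in candidates:
            if cand.lower() in target.lower():
                counts[cand] += 1
    return hits, counts


hits, counts = concordance("valve", SEGMENTS, ["Ventil", "Absperrorgan"])
for source, target in hits:
    print(source, "|", target)
print(dict(counts))
```

A validator could use such counts to spot noisy entries, for example a proposed translation that never appears next to the source term.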
OLIF format
• Support of more languages
• Clarification of inflection scheme
• Clarification of term vs. lex approach
• Tools
Relational database ??
[Diagram labels: Transl; Sem. Rel; Concept; Term; Naming; Infl. Form; Surf. Form; Reg. Ex; Lemma; Infl; Lex. Type]
Relational database ??
[Diagram labels, example instance: Transl; Sem. Rel; grüner Tee; "hot drink..."; Naming; Nom. Sg. str. f. pos.; grüner; -er; grün; i. Like "klein"; DE, Adj]
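One possible reading of the lexical part of this diagram, sketched with SQLite: a lemma with its language, part of speech and inflection pattern, linked to its surface forms. The table and column names, and the assignment of the example values (grün, grüner, "klein", DE, Adj) to columns, are assumptions; the slide does not define the schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE lemma (
    id            INTEGER PRIMARY KEY,
    text          TEXT NOT NULL,
    lang          TEXT NOT NULL,   -- assumed from 'DE'
    pos           TEXT NOT NULL,   -- assumed from 'Adj'
    inflects_like TEXT             -- assumed from: i. Like "klein"
);
CREATE TABLE surface_form (
    id        INTEGER PRIMARY KEY,
    lemma_id  INTEGER NOT NULL REFERENCES lemma(id),
    text      TEXT NOT NULL,
    infl_form TEXT                 -- assumed from 'Nom. Sg. str. f. pos.'
);
""")

conn.execute(
    "INSERT INTO lemma (id, text, lang, pos, inflects_like) VALUES (?, ?, ?, ?, ?)",
    (1, "grün", "DE", "Adj", "klein"),
)
conn.execute(
    "INSERT INTO surface_form (lemma_id, text, infl_form) VALUES (?, ?, ?)",
    (1, "grüner", "Nom. Sg. str. f. pos."),
)


def lemma_of(surface):
    """Look up (lemma, language, part of speech) for a surface form, or None."""
    return conn.execute(
        "SELECT l.text, l.lang, l.pos FROM surface_form s "
        "JOIN lemma l ON l.id = s.lemma_id WHERE s.text = ?",
        (surface,),
    ).fetchone()


print(lemma_of("grüner"))
```

Terms, concepts and translations from the diagram would need further tables; they are left out because their relationships are not recoverable from the slide text.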
End
Thank you!