Multilingual & multi-institutional distance learning: example of an international master's programme in Computational Linguistics

Nikolai Vazov (Sofia University)
Kiril Simov (LML - BAS) and Petya Ossenova (Sofia University), BulTreeBank project
MiLCA Symposium, 14-16 November 2003, Blaubeuren, Germany

General goals of the programme
• to bring together linguistics (linguists) and computer technologies (computer scientists)
• to bring together foreign and local expertise
• to promote international multilingual cooperation in order to develop multilingual electronic language resources

Programme participants (years 1-2)
• Two project management partners
  – French Ministry of Foreign Affairs
  – French Cultural Institute in Sofia
• Two academic partners
  – University of Sofia (2 departments)
  – University of Paris IV - Sorbonne

Programme participants (year 3)
• Three project management partners
  – French Ministry of Foreign Affairs
  – French Cultural Institute in Sofia
  – Agence Universitaire de la Francophonie
• Six academic partners
  – University of Sofia (3 departments)
  – University of Paris IV - Sorbonne
  – LML - Bulgarian Academy of Sciences
  – University of Montréal (RALI & OLST)
  – University of Iaşi (Romania)
  – RACAI (Romania)

Organisation (educational activities)
• Foreign participants
  – 1-3 intensive teaching sessions
  – distance follow-up between the sessions
  – distance examination
• Local participants
  – successive modules (1-3 weeks each) accompanied by web-based courses
  – distance tutoring after the course: individual work with students (the ratio of 8 students to 15 professors allows for it)
  – on-line personal library for each student (articles to read and discuss with the other participants)
  – distance examination

Organisation (research activities)
Carried out as individual tasks with a twofold impact:
• development of personal skills in manipulating electronic text data (using CLaRK, Perl, MySQL, XML, HTML)
• integration of individual tasks into the main goal of the team: creation of mono- and multilingual electronic resources

Organisation (research activities)
Examples:
• writing tokenizers for French (already handled by TreeTagger)
• sentence boundary identification (not entirely handled by TreeTagger, but indispensable for parallel corpora)
• named entity recognition
• temporal expression extraction
• abbreviation identification
• parenthetical expression identification
• concordances (Bulgarian & French)
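
To make the sentence boundary task concrete, here is a minimal sketch in Python (the programme itself lists Perl and CLaRK, so this is an illustration, not the team's implementation). It assumes a tiny, hypothetical abbreviation list and a simple rule: a sentence ends at ".", "!" or "?" followed by whitespace and an uppercase letter, unless the token before the break is a known abbreviation. The names `ABBREVIATIONS` and `split_sentences` are made up for this example.

```python
import re

# Hypothetical abbreviation list (lowercased, with trailing period).
# A real system would rely on a much larger lexicon.
ABBREVIATIONS = {"m.", "mme.", "dr.", "etc.", "cf.", "p."}

# Candidate boundary: end punctuation, whitespace, then an uppercase
# Latin (including accented) or Cyrillic letter.
BOUNDARY = re.compile(r"[.!?]\s+(?=[A-ZÀ-ÖØ-ÞА-Я])")


def split_sentences(text):
    """Split text into sentences, skipping breaks after known abbreviations."""
    sentences = []
    start = 0
    for match in BOUNDARY.finditer(text):
        end = match.start() + 1  # keep the punctuation mark
        candidate = text[start:end]
        last_token = candidate.split()[-1].lower()
        if last_token in ABBREVIATIONS:
            continue  # e.g. "Dr." is not a sentence end
        sentences.append(candidate.strip())
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


if __name__ == "__main__":
    sample = "M. Dupont est arrivé. Il a vu le Dr. Martin à Paris. Bien !"
    for sentence in split_sentences(sample):
        print(sentence)
```

A rule of this kind also shows why abbreviation identification appears in the same list of tasks: the quality of the split depends directly on the abbreviation lexicon.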

On-line resources and tools
Developed by the team
• CLaRK system
• Morphological dictionary for Bulgarian
• Large tagged corpus of Bulgarian
• Concordances (French, Bulgarian)
• Temporal expressions extractor (French)
• Large archive of bilingual (French-Bulgarian) texts
Other available resources
• FRANTEXT, a large tagged corpus
• Hansard, a large aligned bilingual (French-English) corpus
• Taggers (TreeTagger and LATL)
• Le Monde on CD-ROM (with integrated search engine)

Future work
• New (better targeted) master « Electronic language resources »
• Goals of the master defined the other way around: research needs determine the course content and not vice versa
• Envisaged product: parallel French-Bulgarian corpus with named entity identification
• So far: collection of parallel texts, development of a proper names transcription module