Roadmap for Language Resources and Evaluation in a Multilingual Environment
Minority Languages in the African Context
Justus Roux
Centre for Language and Speech Technology (SU-CLaST)
Stellenbosch University, South Africa
jcr@sun.ac.za

Aim
• Overview of
  – proceedings of the LREC 2006 workshop on Networking the development of African Languages
  – resolutions taken at the meeting
• Remarks on future development and cooperation

Background to the LREC workshop
• African Language Association of Southern Africa Special Interest Group for Language and Speech Technology (ALASA-SIG)
  – Special Track on HLT at ALASA International Conference in Johannesburg in 2005
  – National and international participants
  – Proceedings to appear in SA Journal of African Languages
  – Decision to interact with the international community via LREC 2006

Why?
• UNESCO Year of African Languages (2006)
• Challenges in bridging the digital divide concerning African languages (connecting Africa)
• R&D activities in relative isolation
• Perceived need to develop resources and capacity for HLT R&D in African languages
• Similar activities in NEMLAR project
  – Language Technology for Arabic

AIMS of Workshop
• Develop an academic network for sharing ideas
• Promote co-operation in the development of resources and tools (BLARKs for African languages)
• Facilitate capacity building related to African languages in the context of HLT

Programme
• Area surveys
  – West Africa
  – East Africa
  – Central Africa
  – Southern Africa
• Projects per area
• Larger projects and infrastructures
• Discussion on networking possibilities

West Africa
– Language Documentation paradigm: specific role of Uni Bielefeld
– Doctoral students at various European universities
– ALT-I: African Language Technology Institute in Ibadan
– Local Language Speech Technology Initiative (Speech synthesis for Ibibio)
– Initiatives in development of morphological parsers (Cologne)
– West African Linguistics Society

East Africa
– Text corpora on Swahili across Europe
– University of Helsinki
  • Tools: Open Swahili Localisation Project (OSLP) – spelling checker for Swahili
  • Tagging tools
  • Localisation Microsoft Windows XP: Swahili
  • Morphological analysers
  • SALAMA: Machine Translation
– Centre for Science and New Technologies & CNRS (Avignon)
  • Speech mining in Somali
– University of Nairobi & University of Antwerp
  • Annotated corpora in Gikuyu and applied machine learning

Southern Africa
– Extremely wide range of activities in South Africa primarily by locals (see proceedings)
– University of South Africa
  • Morphological analysers for five African languages
  • Development of machine readable lexicons
– University of Pretoria
  • Text corpora and spelling checkers
  • Machine-aided Translation / Localisation
– Stellenbosch University Centre for Language and Speech Technology
  • ASR, TTS and Natural Language Understanding in five languages

Southern Africa (Continued)
– University of North West - Centre for Text Technology
  • Localisation, spelling checkers
– University of Limpopo & Cape Town
  • Speech Synthesis
– Meraka Institute (Pretoria)
  • Open source software for language and speech technology applications
– University of the Free State & Province of Flanders
  • Interpreting services, data warehousing

Southern Africa (Continued)
• Standardisation:
  – ISO/TC 37 mirror Committee (StanSA TC 37)
  – Terminology training workshops with Termnet
  – Workshop on text annotation (Sept 2006)
  – ISO-Meetings: Oslo (04), Warsaw (05), Beijing (06)
• AFRILEX:
  – International conferences and workshops
• National Language Service:
  – National Lexicography Units
  – National HLT Resource Centre

Larger Projects
• The African Anaphora Project (Rutgers, USA)
• Building an Infrastructure for Collaborative Development (Taiwan)

Decisions taken
• To consolidate an inventory on tools, resources etc. available in Africa by using the on-line ELRA BLARK website
• To set up a dedicated website (Wiki) to facilitate networking
• The current Organising Committee will be responsible for the activities above as well as for fundraising for training workshops in Africa
• To organise a similar workshop at LREC 2008

Concluding impressions
• European countries are playing an active role in the field in West and East Africa – to be welcomed
• International organisations are becoming increasingly involved in Africa:
  – ISCA International Affairs Committee for Africa
  – ISO
  – ELRA??
• International co-operation in EU projects (FP7)?