Språkbanken i Finland, Kielipankki, Language Bank of Finland
Nordic Treebank Network, Fefor, September 17, 2003
AEB/Yleisesittely

Vem, kuka, who?
• The Language Bank of Finland is a service provided by CSC
• CSC is owned by the Finnish Ministry of Education
  – provides HPCN services to all universities
  – maintains scientific applications and databases
• CSC focuses on providing shared services
• Services are gratis for universities, non-profit for companies
• The Language Bank serves the linguistic community in Finland
  – Server: corpus.csc.fi (Linux)
  – Text collections (Finnish and Finland-Swedish)
  – Taggers
  – Web-based corpus query tool

Varför, miksi, why?
• There is no treebank of Finnish at present
• … and it is a shame, so
• The Language Bank wants to bring about its creation
  – Infrastructure programme by the Academy of Finland in 2004
  – The plan is to use Finnish Dependency Grammar by Connexor
• Without query and analysis tools the treebank is just a large heap of files
  – We need information on tools and technology in order to create a nice service for linguists and language technology professionals

At present the Language Bank offers... (1)
• Text collection of Finnish
  – 180 million words
  – 60 % with msd tags (TextMorfo 2.0)
• Text collection of Finland-Swedish
  – 32 million words
  – 100 % with msd tags (SWECG)
• Swedish PAROLE
  – 19 million words (courtesy of Språkbanken, Gothenburg)
• Other:
  – Le Monde 1990, German PAROLE, FISC, Susanne, OTA, Middle French, Oulu

At present the Language Bank offers... (2)
• WWW Lemmie 2.0 (screenshot on next slide)
  – Easy-to-use corpus query tool developed at CSC
• Taggers
  – Fi-lite (Connexor)
  – En-lite (Connexor)
  – ENGCG (Lingsoft)
  – SWECG (Lingsoft)
  – FINTWOL (Lingsoft)
  – TextMorfo (Kielikone)
  – Morfo (Kielikone)

In the past the Language Bank has been active in...
• Preparing ground for research programmes
  – Preliminary survey on language technology 1998
  – Preliminary survey on spoken language research 2001
• Participating in programmes with universities
  – Enlargement of text collections 1999-2001
  – Integrated resources for speech technology and spoken language research 2002-2004

In the future the Language Bank will offer...
• Spoken language data
  – Academy of Finland funding
  – The work is being done
• Annotation editor for spoken language data (screenshot on next slide)
  – Annotation interchange format in RDF
  – Supports collaborative annotation
• Treebank of Finnish ;-)
  – Just need some money…
• Better tools for querying and processing research data

More information
http://www.csc.fi/kielipankki/
manne.miettinen@csc.fi