Results of the STEVIN programme
STEVIN Final Event, Rotterdam, 28 Nov 2011
Jan Odijk

Overview
• STEVIN Objectives
• Digital Language Infrastructure
  – Creation
  – Resource Management
  – IPR
• Strategic Research
• LST Community Consolidation
• Various Statistics

STEVIN Objectives
• Digital Language Infrastructure (DLI)
• Strategic Research (SR)
• LST community consolidation (CC)

Digital Language Infrastructure
• Creation
• Resource Management
• IPR

DLI: Creation
• Priorities for written language:
  a. A large corpus of written Dutch
  b. An electronic lexicon
  c. Parallel corpora

Realisation: Written (1)
• D-COI + SONAR: 500 M word corpus (a)
• LASSY: 1 M word treebank (a)
• CORNETTO: 40 k entry lexical semantic database (b)
• DPC: 10 M word parallel corpus D-E / D-F (c)

Realisation: Written (2)
• COREA: co-reference corpus (a)
• IRME: 5 k MWE lexical database (b)
• DAESO: 1 M word monolingual parallel corpus (c)
• DAISY (a)
• DUOMAN (a)
• PACO-MT (a, c)

Creation: Priorities Speech (1)
  a. speech and multimodal corpora for CALL, NAW, CCQA applications
  b. multimodal corpora for
     – broadcast news transcription or
     – person identification;
  c. text corpora for stochastic language models;

Creation: Priorities Speech (2)
  d. tools and data for the development of
     – robust speech recognition;
     – automatic annotation of corpora;
  e. speech synthesis;

Realisation: Speech (1)
• Autonomata (a, NAW; e)
• JASMIN-CGN (a, CALL)
• D-COI + SONAR (c)
• SPRAAK (d)
• STEVINcanPRAAT (d)

Realisation: Speech (2)
• Missing
  – (b) Multimodal corpora
• But partially covered by other projects
  – EU: AMI, AMIDA (U Twente)
  – NL: IMIX

DLI: Resource Management
• HLT Agency set up
• See presentation by Remco van Veenendaal

DLI: IPR
• Systematic attention to IPR & ethical issues from the start
  – Not easy, but
  – The only way to ensure that the R&D community can use LRs in a legal manner
• Specific regulation on how to deal with IPR in the STEVIN programme and projects

Strategic Research
• Will be dealt with by Walter in his presentation
• Work programme lists examples of applications
  – how do STEVIN projects contribute to such applications (directly or indirectly)

SR: Applications (1)
• Information extraction from Speech:
  – Rechtspraakherkenning, NEON, and SNRT
  – AUTONOMATA, JASMIN-CGN, SPRAAK, STEVINcanPRAAT, N-BEST, AUTONOMATA TOO and MIDAS.
• Detection of accent and identity of speakers.
  – JASMIN-CGN, SPRAAK, DISCO, Diademo, Rechtspraakherkenning

SR: Applications (2)
• Extraction of information from (monolingual or multilingual) text.
  – DAESO, DUOMAN, Gemeenteconnect and YourNews.
  – COREA, IRME, D-COI, SONAR, DPC, LASSY, CORNETTO, and PACO-MT
• Semantic web:
  – CORNETTO, D-COI and SONAR

SR: Applications (3)
• Dialogue systems and Q&A solutions
  – DAISY, DUOMAN, Gemeenteconnect, Web Assess.
• Automatic summarization and text generation
  – DAESO, Web Assess
  – D-COI and SONAR

SR: Applications (4)
• Automatic Translation
  – DPC, PACO-MT
  – D-COI, SONAR, LASSY, IRME, COREA, CORNETTO
• Educational systems
  – DISCO, SpelSpiek, Primus, HATCI, WooDy, AAP
  – All resource creation projects

LST Community Consolidation
• Create networks
• Consolidate LST activities
• Educate new experts
• Promote discussion
• Promote transfer of knowledge

LST Community Consolidation
• Set aside a specific budget and a dedicated WG
• Joint KI/SME and NL/FL projects preferred
  – 330 binary cooperation link occurrences
• Demonstration projects stimulated companies to participate

LST Community Consolidation
• Educational projects (3)
• Master classes (2)
• Networking events organized
  – brokerage events, “Taal in Bedrijf” (‘language@work’), STEVIN programme meetings, etc.
• Networking events supported
  – e.g. CLIN, InterSpeech 2007, ICT-Delta

Money Distribution
• R&D (76.0%)
• Demonstration (8.5%)
• Supporting Activities (6.0%)
• HLT Agency (2.5%)
• STEVIN Management (6.5%)

Strata Coverage
• Basic resources for LST (51.1%)
• Basic Research (23.3%)
• Application-oriented Research (15.4%)
• Demonstration projects (10.2%)

NL / FL Proportion
• R&D Projects: 63% : 37%
• Demonstrator projects: 66% : 34%
• Overall: 64% : 36%
• Educational projects (3): 68% : 32%
• Master classes (2): 100% : 0%

KI / SME Proportion
• Money: 83% : 17%
• R&D projects by project: 19 : 13
• R&D projects by #participations: 80% : 20%
• Demonstration projects: 15% : 85%
• Master classes: 0% : 100%
• Education activities: 83% : 17%

Language / Speech
• Money: 53.1% : 46.9%

Funded v. Submitted
• R&D count 1: 19/52 (36.5%)
• R&D count 2: 19/68 (27.9%)
• Demonstration: 14/41 (30.0%)
• Educational: 3/5 (60%)
• Master Classes: 2/3 (66.6%)
• Most proposals were very good
  – So many more could and should be done
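
The funded/submitted percentages can be recomputed from the counts on this slide. The sketch below is only an illustration: the names COUNTS, funding_rate and funding_rates are hypothetical and not part of the presentation, and rounding to one decimal is an assumption. With these counts, Master Classes come out at 66.7% under rounding (the slide shows 66.6%, probably truncated), and Demonstration comes out at about 34.1% rather than the 30.0% shown on the slide, so either that count or that percentage may have been transcribed differently.

```python
# Funded and submitted proposal counts, copied from the slide.
# The dictionary and function names are illustrative, not from the source.
COUNTS = {
    "R&D count 1": (19, 52),
    "R&D count 2": (19, 68),
    "Demonstration": (14, 41),
    "Educational": (3, 5),
    "Master Classes": (2, 3),
}


def funding_rate(funded: int, submitted: int) -> float:
    """Return the share of submitted proposals that were funded, in percent."""
    if submitted <= 0:
        raise ValueError("submitted must be positive")
    return 100.0 * funded / submitted


def funding_rates(counts: dict) -> dict:
    """Map each category to its funding rate, rounded to one decimal (an assumption)."""
    return {
        name: round(funding_rate(funded, submitted), 1)
        for name, (funded, submitted) in counts.items()
    }


if __name__ == "__main__":
    rates = funding_rates(COUNTS)
    for name, (funded, submitted) in COUNTS.items():
        print(f"{name}: {funded}/{submitted} ({rates[name]:.1f}%)")
```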

Thanks for your Attention!

DO NOT GO BEYOND THIS SLIDE!

Strategic Research
• Priorities written language:
  a. semantic analysis (tagging, integration with syntax and morphology)
  b. text pre-processing (tokenization, spelling correction, named entity recognition, ...)
  c. morphological analysis (compounding and derivation)
  d. syntactic analysis: a robust parser for Dutch

SR: Realisation Written (1)
• COREA: co-reference resolution (a)
• IRME: MWE identification + lexical representation (d, a)
• LASSY: parser (d)
• DAESO: semantic relations and text-to-text generation (a)

SR: Realisation Written (2)
• DAISY: automatic summarization (a)
• DUOMAN: attitude detection (a)
• PACO-MT: Machine translation (d, a)
• D-COI / SONAR (a, b)
• Lacking: (c) morphological analysis for derivation and compounding.

SR: Priorities Speech
  a. robustness of speech recognition;
  b. output treatment (inverse text normalization);
  c. confidence measures;
  d. adaptation;
  e. lattices.

SR: Realisation Speech
• AUTONOMATA (a)
• MIDAS (a)
• N-BEST (a)
• SPRAAK (a, b, c, d, e)
• DISCO: (a + CALL priority)
• AUTONOMATA TOO (a)