MAI Internship April-May 2002

What?
• The AST Project promotes development of speech technology for official languages of South Africa: SA English, Afrikaans, Zulu, Xhosa, Sesotho
• Create reusable databases & software
• Prototype hotel booking dialogue system
2000-2003
Slide 2 of 14

AST dialogue system: basics
[Diagram: Telephone Network, Speech Recognition, Natural Language Understanding, Dialogue Manager, Database, Speech Synthesis]
Slide 3 of 14

AST Speech Database
• Use?
    input ASR: acoustic training
    output ASR: dictionary
• Start from scratch, even for SAE
• Telephone data based on SpeechDat
    – Datasheet utterances
    – Hierarchical recruiting method
• Labeling tool: PRAAT
Slide 4 of 14

Language spoken                  Code   No. of speakers
1  English (E)                          1500-2000
   Speech varieties:
     Mother-tongue English       EE     300-400
     Black English               BE     300-400
     Coloured English            CE     300-400
     Asian English               ASE    300-400
     Afrikaans English           AE     300-400
2  isiXhosa (X)                  XX     300-400
3  Sesotho (S)                   SS     300-400
4  isiZulu (Z)                   ZZ     300-400
5  Afrikaans (A)                        900-1200
   Speech varieties:
     Mother-tongue Afrikaans     AA     300-400
     Black Afrikaans             BA     300-400
     Coloured Afrikaans          CA     300-400
Slide 5 of 14

AST Speech Database
Acoustic signal
  → (manual labour) Orthographic annotation
  → (rules & dictionary: Patana) Phonemic transcription
  → (forced alignment: HTK) Phonetic alignment
Slide 6 of 14

AST Speech Recognition
• Difficult:
    – Speaker independent, noisy conditions
    – Medium-size vocabulary (10,000 words)
    – Training data sparse
• Not so difficult:
    – Dialogue Manager helps
• Phoneme-based HMMs (future: diphones)
• Finite-state language model
• Pitch & clicks of African languages ignored
Slide 7 of 14
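
A finite-state language model restricts the recogniser to word sequences that follow a path through a small network. The sketch below is only an illustration of that idea: the states, the words and the helper names `allowed_next` and `accepts` are all hypothetical and are not taken from the AST grammar.

```python
# Hypothetical toy grammar for a hotel booking task (not the AST grammar).
# Each state maps an allowed word to the state reached after that word.
GRAMMAR = {
    "START": {"i": "I", "book": "ART"},
    "I": {"want": "ART"},
    "ART": {"a": "TYPE"},
    "TYPE": {"single": "ROOM", "double": "ROOM", "room": "END"},
    "ROOM": {"room": "END"},
    "END": {},
}


def allowed_next(state):
    """Words the recogniser may hypothesise from this state."""
    return sorted(GRAMMAR[state])


def accepts(words):
    """True if the word sequence follows a complete path through the network."""
    state = "START"
    for word in words:
        next_state = GRAMMAR[state].get(word)
        if next_state is None:
            return False  # word not allowed here, so it can never be recognised
        state = next_state
    return state == "END"


print(accepts("i want a double room".split()))  # True
print(accepts("i want room".split()))           # False
```

Because only the words returned by `allowed_next` compete at each point, the search space stays small, which is one way such a model can cope with sparse training data.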

AST Natural Language Understanding
• Same finite-state network as the recogniser's language model
    +: all utterances ‘understood’
    -: FSGs are limited
• Makes no sense to recognise more than we can understand
• Semantic labels are activated
• Alternative: robust parsing (Phoenix, ATIS)
Slide 8 of 14
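
One way to picture "semantic labels are activated" is to attach an optional label to each arc of the network, so that walking an utterance through the grammar also yields its meaning. The following is a minimal sketch under that assumption; the network, the label names (`room_type`, `nights`) and the function `understand` are hypothetical, not the AST implementation.

```python
# Hypothetical tagged network: word -> (next state, optional semantic label).
TAGGED_FSG = {
    "START": {"a": ("TYPE", None)},
    "TYPE": {
        "single": ("ROOM", ("room_type", "single")),
        "double": ("ROOM", ("room_type", "double")),
    },
    "ROOM": {"room": ("FOR", None)},
    "FOR": {"for": ("COUNT", None)},
    "COUNT": {
        "one": ("UNIT", ("nights", 1)),
        "two": ("UNIT", ("nights", 2)),
        "three": ("UNIT", ("nights", 3)),
    },
    "UNIT": {"night": ("END", None), "nights": ("END", None)},
    "END": {},
}
FINAL_STATES = {"FOR", "END"}  # an utterance may stop after "room"


def understand(words):
    """Return the labels activated along the path, or None if rejected."""
    state = "START"
    meaning = {}
    for word in words:
        arc = TAGGED_FSG[state].get(word)
        if arc is None:
            return None  # outside the grammar: neither recognised nor understood
        state, label = arc
        if label is not None:
            key, value = label
            meaning[key] = value
    return meaning if state in FINAL_STATES else None


print(understand("a double room for two nights".split()))
```

Every utterance the network accepts comes with a meaning, which is the "+" on the slide; anything phrased differently is rejected outright, which is the "-".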

AST Natural Language Understanding
[Diagram: Speech Recognition (FSG), NLU and Dialogue Manager, connected by arrows labelled Recognised utterance, Meaning and Grammar ID]
Slide 9 of 14

AST Natural Language Understanding
Embedded semantic tags:
‘drie honderd duisend agt en neëntig’
3 0 0 0 9 8
t1=3 t2=0 t3=0
V6=3 V5=0 V4=0 V3=0 V2=9 V1=8
Slide 10 of 14
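
The digit slots V6 to V1 in the example can be reproduced with a small stand-alone sketch. It assumes a simple word-value table and a left-to-right accumulation rule, both chosen for illustration; the slides do not say how the tags are embedded in the grammar, and the t1 to t3 values are not modelled here. The names `words_to_number` and `digit_slots` are hypothetical.

```python
# Assumed word tables (not from the slides). "neëntig" is kept as spelled
# on the slide, alongside the spelling "negentig".
UNITS = {"een": 1, "twee": 2, "drie": 3, "vier": 4, "vyf": 5,
         "ses": 6, "sewe": 7, "agt": 8, "nege": 9}
TENS = {"tien": 10, "twintig": 20, "dertig": 30, "veertig": 40,
        "vyftig": 50, "sestig": 60, "sewentig": 70, "tagtig": 80,
        "neëntig": 90, "negentig": 90}


def words_to_number(phrase):
    """Convert a spaced Afrikaans number phrase to an integer."""
    total = 0    # completed thousands
    current = 0  # value being built below the next multiplier
    for word in phrase.lower().split():
        if word == "en":
            continue  # "agt en neëntig": the connective carries no value
        if word in UNITS:
            current += UNITS[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "honderd":
            current = max(current, 1) * 100
        elif word == "duisend":
            total += max(current, 1) * 1000
            current = 0
        else:
            raise ValueError(f"unknown number word: {word!r}")
    return total + current


def digit_slots(number, width=6):
    """Spread a number over slots V<width>..V1, most significant digit first."""
    digits = str(number).zfill(width)
    if len(digits) > width:
        raise ValueError("number does not fit in the available slots")
    return {f"V{width - i}": int(d) for i, d in enumerate(digits)}


value = words_to_number("drie honderd duisend agt en neëntig")
print(value)               # 300098
print(digit_slots(value))  # V6=3, V5=0, V4=0, V3=0, V2=9, V1=8
```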

AST Dialogue Manager
• Trade-off: naturalness vs. response restriction
• System-directed: predictable user utterances, simple dialogues
• Mixed-initiative: shorter dialogues, more recognition errors
• User-initiative: unpopular
Slide 11 of 14

AST Dialogue Manager
Design:
• Early focus on users and task
• Wizard-of-Oz: pay no attention to the man behind the curtain
• System-in-the-loop
• Finite-state structure because of simplicity and functionality
• Possible frame-based approach in future
Slide 12 of 14
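
A finite-state, system-directed dialogue can be sketched as a table of states, each with a prompt, the answers it expects and the state that follows. The prompts, slot names and the function `run_dialogue` below are hypothetical and only illustrate the structure; they are not the AST dialogue design.

```python
# Hypothetical dialogue table: each state asks one question and fills one slot.
DIALOGUE = {
    "ask_room": {
        "prompt": "Single or double room?",
        "slot": "room_type",
        "answers": {"single": "single", "double": "double"},
        "next": "ask_nights",
    },
    "ask_nights": {
        "prompt": "How many nights?",
        "slot": "nights",
        "answers": {"one": 1, "two": 2, "three": 3},
        "next": "done",
    },
}


def run_dialogue(user_turns):
    """Drive the dialogue with a list of already-recognised user answers."""
    state = "ask_room"
    booking = {}
    transcript = []  # prompts the system has played
    turns = iter(user_turns)
    while state != "done":
        node = DIALOGUE[state]
        transcript.append(node["prompt"])
        answer = next(turns, None)
        if answer is None:
            break  # no more input, e.g. the caller hung up
        if answer in node["answers"]:
            booking[node["slot"]] = node["answers"][answer]
            state = node["next"]
        # otherwise stay in the same state, so the prompt is repeated
    return booking, transcript


booking, transcript = run_dialogue(["suite", "single", "three"])
print(booking)  # {'room_type': 'single', 'nights': 3}
```

Since each state expects only a handful of answers, user utterances stay predictable, at the cost of a longer and less natural dialogue than a mixed-initiative design would give.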

AST Speech Synthesis
• Fixed machine utterances: pre-recorded speech
• Database queries: limited-domain synthesis (Festival platform)
Slide 13 of 14

Conclusion
Finite-state approach in
    – Recogniser
    – NLU component
    – Dialogue manager
Workable prototype
New funding in 2003
Slide 14 of 14