HMM-Based and SVM-Based Recognition of the Speech of Talkers with Spastic Dysarthria Mark Hasegawa-Johnson (ECE), Jon Gunderson (DRES), Adrienne Perlman (SHS), and Thomas Huang (ECE), University of Illinois at Urbana-Champaign
Cerebral Palsy • Frequency: 0.26% of all children • Our interest: several students at University of Illinois • Symptoms: gross motor miscoordination • Typing: often requires a head stick, because fingers are not sufficiently agile • It would be nice if a speech interface were possible, but… • Speech: characterized by articulatory imprecision
Participants in this Research • Three talkers with CP (M1, M3, F1), all students at University of Illinois • One control talker (M2) • Each recorded 800 isolated words • [Embedded media for M2, M1, F1, and M3 not reproduced]
Recording Paradigm • The portable transducer array: eight microphones, four cameras • [Image: screenshot of a student using the array in the lab]
Analysis of Articulation Errors • Word intelligibility: 19% (M1), 19% (F1), 30% (M3)
HMM-Based Speaker-Dependent Isolated Word Recognition • [Figure: accuracy (percent) versus vocabulary size (number of words)]
SVM-Based Recognizers • [Block diagram. Recoverable labels: 7-mic array waveform (WAV); PLP; voice onset/offset sync; fixed-duration PLP streams; acoustic feature vectors; phonological feature labels; System 1: N-ary SVM; System 2: binary SVMs; output logic; recognized word]
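An SVM needs a fixed-length input vector, while spoken words vary in duration. The sketch below shows one way a variable-length sequence of acoustic frames (for example PLP vectors) could be cut at voice onset and offset and resampled to a fixed number of frames. The function names, the linear interpolation, and the default of five frames are assumptions made for illustration; the slides do not say how the fixed-duration streams were actually built.

```python
def resample_frames(frames, n_out):
    """Linearly interpolate a list of feature frames to exactly n_out frames.

    `frames` is a list of equal-length lists of floats (one list per frame).
    This is an illustrative choice, not the method documented in the slides.
    """
    if not frames:
        raise ValueError("need at least one input frame")
    if n_out < 1:
        raise ValueError("n_out must be at least 1")
    n_in = len(frames)
    # Degenerate cases: nothing to interpolate between.
    if n_in == 1 or n_out == 1:
        return [list(frames[0]) for _ in range(n_out)]
    out = []
    for k in range(n_out):
        # Fractional position of output frame k on the input time axis.
        pos = k * (n_in - 1) / (n_out - 1)
        lo = int(pos)
        hi = min(lo + 1, n_in - 1)
        w = pos - lo
        out.append([(1 - w) * a + w * b for a, b in zip(frames[lo], frames[hi])])
    return out


def fixed_length_vector(frames, onset, offset, n_out=5):
    """Keep frames in [onset, offset), resample them, and flatten to one vector."""
    voiced = frames[onset:offset]
    return [x for frame in resample_frames(voiced, n_out) for x in frame]
```

With this layout every word yields a vector of `n_out` times the frame dimension, regardless of how long the talker took to say it.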
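Binary SVM Targets • [Table of +/- targets for each digit; the cell alignment did not survive extraction. Feature labels: Onset, Fric, Vowel, Strident, Sonorant, Palatal, Labial, Coda, Dipth, Nasal. Rows: Zero, One, Two, Three, Four, Five, Six, Seven, Eight, Nine]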
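To illustrate how a bank of binary phonological-feature classifiers could be turned into a word decision, the sketch below scores each vocabulary word by how well the classifier margins agree with that word's +/- target pattern and picks the best match. The feature names and the four-word target table are hypothetical, filled in from general English phonology rather than taken from the slide, and the correlation-style scoring rule is an assumption about what the output logic might look like.

```python
# HYPOTHETICAL target table for illustration only (not the values on the slide).
# +1 means the feature is present in the word, -1 means it is absent.
FEATURES = ["onset_fricative", "onset_sonorant", "coda_nasal", "diphthong"]
TARGETS = {
    "one":  (-1, +1, +1, -1),
    "five": (+1, -1, -1, +1),
    "six":  (+1, -1, -1, -1),
    "nine": (-1, +1, +1, +1),
}


def word_scores(margins):
    """Score each word as the dot product of its target pattern with the margins.

    `margins` holds one signed real value per feature, as a binary SVM's
    decision function would produce (positive means "feature present").
    """
    if len(margins) != len(FEATURES):
        raise ValueError("expected one margin per feature")
    return {
        word: sum(t * m for t, m in zip(pattern, margins))
        for word, pattern in TARGETS.items()
    }


def decode(margins):
    """Return the word whose target pattern best matches the margins."""
    scores = word_scores(margins)
    return max(scores, key=scores.get)
```

Because the decision sums evidence over all features, one weakly wrong detector can be outvoted by the others: `decode([0.1, 0.9, 0.8, 0.7])` still selects "nine" even though the onset-fricative margin has the wrong sign.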
Results: SVM and HMM Recognizers, 10-Word Vocabulary • [Figure: digit recognition accuracy (percent)]
Conclusions • Novel isolated-word recognizer – Phonological feature SVM is as good as whole-word SVM – Could be generalized to large-vocabulary isolated-word tasks • Inter-talker variability: – F1: phone, syllable, word deletions. HMM (looking for canonical phone sequence) fails (80% WRA). – M1: stuttering, repeats, consonant substitutions. SVM (looking for canonical landmarks) fails (70% WRA). • Can ASR be an HCI for talkers with CP? – Isolated word intelligibility: 19% WRA. – Isolated 45-word ASR: 50% WRA. – Isolated digits: 90%... could be used as part of an HCI… • What’s missing? Dialog? – Current preferred method of human-computer interaction: “dictate my term paper to my friend.”