Efficient Computer Interfaces Using Continuous Gestures Language Models

The problem l l Speech recognizers make mistakes Correcting mistakes is inefficient Ø Ø

Research overview l Make correction of dictation: Ø Ø Ø l More efficient More

Building lattice l Example n-best list: 1: jack studied very hard 2: jack studied

Acoustic confusions l l Given a word, find words that sound similar Look pronunciation

Acoustic confusions: “Jack studied hard”

Language model confusions: “Jack studied hard” l Look at words before or after a

Probability model l Our confirmation and correction interface requires probability of a letter given

Probability model l l Keep track of possible paths in lattice Prediction based on

Handling word errors l l l Use default language model during entry of erroneous

Evaluating expansion l Assume a good model requires as little information from the user

Results on test set l l Model evaluated on held out test set (Hub

“To the mouse snow means freedom from want and fear”

Slides: 16

Download presentation

Efficient Computer Interfaces Using Continuous Gestures, Language Models, and Speech Keith Vertanen July 30 th, 2004

The problem l l Speech recognizers make mistakes Correcting mistakes is inefficient Ø Ø Ø l 140 WPM 14 WPM 32 WPM Uncorrected dictation Corrected dictation, mouse/keyboard Corrected typing, mouse/keyboard Voice-only correction is even slower and more frustrating

Research overview l Make correction of dictation: Ø Ø Ø l More efficient More fun More accessible Approach: Ø Ø Build a word lattice from a recognizer’s n-best list Expand lattice to cover likely recognition errors Make a language model from expanded lattice Use model in a continuous gesture interface to perform confirmation and correction

Building lattice l Example n-best list: 1: jack studied very hard 2: jack studied hard 3: jill studied hard 4: jill studied very hard 5: jill studied little

Insertion errors

Acoustic confusions l l Given a word, find words that sound similar Look pronunciation up in dictionary: studied l Use observed phone confusions to generate alternative pronunciations: s t ah d iy d l s t ah d iy d s ao d iy s t ah d iy … Map pronunciation back to words: s t ah d iy d s ao d iy s t ah d iy studied saudi study

Acoustic confusions: “Jack studied hard”

Language model confusions: “Jack studied hard” l Look at words before or after a node, add likely alternate words based on n-gram LM

Expansion results (on WSJ 1)

Probability model l Our confirmation and correction interface requires probability of a letter given prior letters:

Probability model l l Keep track of possible paths in lattice Prediction based on next letter on paths Interpolate with default language model Example, user has entered “the_cat”:

Handling word errors l l l Use default language model during entry of erroneous word Rebuild paths allowing for an additional deletion or substitution error Example, user has entered “the_cattle_”:

Evaluating expansion l Assume a good model requires as little information from the user as possible

Results on test set l l Model evaluated on held out test set (Hub 1) Default language model Ø Ø l 2. 4 bits/letter User decides between 5. 3 letters Best speech-based model Ø Ø 0. 61 bits/letter User decides between 1. 5 letters

“To the mouse snow means freedom from want and fear”

Questions?