Efficient Computer Interfaces Using Continuous Gestures, Language Models, and Speech
Keith Vertanen
July 30th, 2004

The problem
  • Speech recognizers make mistakes
  • Correcting mistakes is inefficient
      - 140 WPM: Uncorrected dictation
      - 14 WPM: Corrected dictation, mouse/keyboard
      - 32 WPM: Corrected typing, mouse/keyboard
  • Voice-only correction is even slower and more frustrating

Research overview
  • Make correction of dictation:
      - More efficient
      - More fun
      - More accessible
  • Approach:
      - Build a word lattice from a recognizer’s n-best list
      - Expand lattice to cover likely recognition errors
      - Make a language model from expanded lattice
      - Use model in a continuous gesture interface to perform confirmation and correction

Building lattice
  • Example n-best list:
      1: jack studied very hard
      2: jack studied hard
      3: jill studied hard
      4: jill studied very hard
      5: jill studied little
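A minimal sketch of how an n-best list like this one could be merged into a word lattice. The slides do not say how the lattice is actually built; this version simply assumes that identical words are merged into one node and that an edge is added for every adjacent word pair. The names `build_lattice` and `paths` and the `<s>` / `</s>` boundary markers are hypothetical.

```python
from collections import defaultdict

N_BEST = [
    "jack studied very hard",
    "jack studied hard",
    "jill studied hard",
    "jill studied very hard",
    "jill studied little",
]

def build_lattice(hypotheses):
    """Merge hypotheses into a graph: one node per word, one edge per adjacent pair."""
    edges = defaultdict(set)
    for hyp in hypotheses:
        words = ["<s>"] + hyp.split() + ["</s>"]
        for prev, nxt in zip(words, words[1:]):
            edges[prev].add(nxt)
    return dict(edges)

def paths(edges, node="<s>", prefix=()):
    """Enumerate every sentence the lattice accepts (assumes the graph has no cycles)."""
    if node == "</s>":
        yield " ".join(prefix)
        return
    for nxt in sorted(edges.get(node, ())):
        step = () if nxt == "</s>" else (nxt,)
        yield from paths(edges, nxt, prefix + step)

lattice = build_lattice(N_BEST)
sentences = list(paths(lattice))
```

Merging by word identity lets the lattice accept sentences that were not in the n-best list (here, "jack studied little"). It would also create cycles if a word repeated within a hypothesis, which this sketch does not handle.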

Insertion errors

Acoustic confusions
  • Given a word, find words that sound similar
  • Look pronunciation up in dictionary:
      studied → s t ah d iy d
  • Use observed phone confusions to generate alternative pronunciations:
      s t ah d iy d → s t ah d iy d
                      s ao d iy
                      s t ah d iy
                      …
  • Map pronunciation back to words:
      s t ah d iy d → studied
      s ao d iy → saudi
      s t ah d iy → study
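The three steps above can be sketched as follows. The tiny pronunciation dictionary and the confusion table are made up for illustration (the talk derives confusions from observed recognizer errors, which are not given here), and an empty string in `CONFUSIONS` stands for a phone being dropped. The function names are hypothetical.

```python
# Toy pronunciation dictionary (assumption: only the three words on the slide).
PRONUNCIATIONS = {
    "studied": ("s", "t", "ah", "d", "iy", "d"),
    "study": ("s", "t", "ah", "d", "iy"),
    "saudi": ("s", "ao", "d", "iy"),
}

# Hypothetical phone confusions: phone -> alternatives, "" meaning the phone is dropped.
CONFUSIONS = {"t": [""], "ah": ["ao"], "d": [""]}

def variants(phones, max_edits):
    """All pronunciations reachable with at most max_edits single-phone confusions."""
    seen = {phones}
    frontier = {phones}
    for _ in range(max_edits):
        next_frontier = set()
        for pron in frontier:
            for i, phone in enumerate(pron):
                for alt in CONFUSIONS.get(phone, []):
                    replacement = (alt,) if alt else ()
                    new = pron[:i] + replacement + pron[i + 1:]
                    if new not in seen:
                        seen.add(new)
                        next_frontier.add(new)
        frontier = next_frontier
    return seen

def similar_words(word, max_edits=1):
    """Look up the pronunciation, perturb it, and map the results back to words."""
    by_pron = {pron: w for w, pron in PRONUNCIATIONS.items()}
    candidates = variants(PRONUNCIATIONS[word], max_edits)
    return {by_pron[p] for p in candidates if p in by_pron}
```

With this toy table, one confusion is enough to reach "study" from "studied", while "saudi" needs three (drop `t`, swap `ah` for `ao`, drop the final `d`), so `similar_words("studied", max_edits=3)` is needed to recover all three words on the slide.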

Acoustic confusions: “Jack studied hard”

Language model confusions: “Jack studied hard”
  • Look at words before or after a node, add likely alternate words based on n-gram LM
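As a rough illustration, the sketch below proposes alternates for a lattice node from a bigram model, using only the word before the node (the slide also mentions the word after). The counts in `BIGRAM_COUNTS` are invented, and `lm_alternates` is a hypothetical helper, not something taken from the talk.

```python
from collections import Counter

# Invented bigram counts: previous word -> counts of the words that follow it.
BIGRAM_COUNTS = {
    "jack": Counter({"studied": 5, "worked": 3, "ran": 1}),
    "studied": Counter({"hard": 6, "law": 4, "art": 2}),
}

def lm_alternates(prev_word, current_word, k=2):
    """Return the k most likely words after prev_word, excluding the word already in the lattice."""
    following = BIGRAM_COUNTS.get(prev_word, Counter())
    ranked = [word for word, _ in following.most_common() if word != current_word]
    return ranked[:k]
```

For the node "hard" in "Jack studied hard", `lm_alternates("studied", "hard")` would add "law" and "art" as parallel alternates under these made-up counts.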

Expansion results (on WSJ1)

Probability model
  • Our confirmation and correction interface requires probability of a letter given prior letters:
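In generic notation (the symbols are assumed here, not taken from the slide), with $\ell_i$ denoting the $i$-th letter entered, the quantity the interface needs at each step can be written as:

```latex
P(\ell_i \mid \ell_1, \ell_2, \dots, \ell_{i-1})
```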

Probability model
  • Keep track of possible paths in lattice
  • Prediction based on next letter on paths
  • Interpolate with default language model
  • Example, user has entered “the_cat”:
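A minimal sketch of the idea, under several assumptions: lattice paths are flattened to weighted strings with "_" as the word separator, the default language model is replaced by a uniform distribution over 27 symbols, and the interpolation weight of 0.9 is arbitrary. The path weights and the name `next_letter_probs` are hypothetical.

```python
import string

ALPHABET = string.ascii_lowercase + "_"  # "_" stands for the space between words

def next_letter_probs(paths, prefix, lattice_weight=0.9):
    """Next-letter distribution: lattice paths matching prefix, interpolated with a default model."""
    # Stand-in for the default language model: uniform over the alphabet.
    default = {ch: 1.0 / len(ALPHABET) for ch in ALPHABET}

    # Sum the weight of each next letter over the paths still consistent with the input.
    mass = {}
    for text, weight in paths.items():
        if text.startswith(prefix) and len(text) > len(prefix):
            nxt = text[len(prefix)]
            mass[nxt] = mass.get(nxt, 0.0) + weight
    total = sum(mass.values())
    if total == 0:
        return default  # no path matches: fall back to the default model alone

    return {
        ch: lattice_weight * mass.get(ch, 0.0) / total
        + (1 - lattice_weight) * default[ch]
        for ch in ALPHABET
    }

# Hypothetical weighted paths through a lattice.
PATHS = {"the_cat_sat": 0.6, "the_cattle_sat": 0.3, "the_bat_sat": 0.1}
probs = next_letter_probs(PATHS, "the_cat")
```

After "the_cat", two paths remain, so most of the probability goes to "_" and "t", while every other letter keeps a small share from the default model and stays reachable.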

Handling word errors
  • Use default language model during entry of erroneous word
  • Rebuild paths allowing for an additional deletion or substitution error
  • Example, user has entered “the_cattle_”:

Evaluating expansion
  • Assume a good model requires as little information from the user as possible
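One common way to quantify "information required from the user" is the average number of bits per letter, that is, the mean negative log2 probability the model gives to each letter the user actually enters. The slide's own formula is not in the text, so the sketch below assumes this standard measure; `bits_per_letter` is a hypothetical name.

```python
import math

def bits_per_letter(letter_probs):
    """Average information (in bits) the user supplies per letter.

    letter_probs: the model's probability for each letter the user actually entered.
    """
    return -sum(math.log2(p) for p in letter_probs) / len(letter_probs)
```

For example, if the model gives the two entered letters probabilities 0.5 and 0.25, the user supplies 1 + 2 = 3 bits in total, or 1.5 bits per letter.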

Results on test set
  • Model evaluated on held out test set (Hub 1)
  • Default language model
      - 2.4 bits/letter
      - User decides between 5.3 letters
  • Best speech-based model
      - 0.61 bits/letter
      - User decides between 1.5 letters
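The "user decides between N letters" figures are consistent with reading each bits/letter value as a perplexity: b bits correspond to a choice among 2^b equally likely letters. A short check of that arithmetic, with a hypothetical helper name:

```python
def equivalent_choices(bits):
    """Number of equally likely letters that would carry the same information per choice."""
    return 2 ** bits

default_model = round(equivalent_choices(2.4), 1)   # default language model
speech_model = round(equivalent_choices(0.61), 1)   # best speech-based model
```

Both values round to the numbers on the slide (5.3 and 1.5).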

“To the mouse snow means freedom from want and fear”

Questions?