Speech Recognition Techniques
Xiaofeng Lai

What is speech recognition?
• Speech recognition: the ability of a machine or program to identify words and phrases in spoken language and convert them to a machine-readable format.

Outline
• Brief history of speech recognition
• Introduction to how it works
• Applications
  ◦ Dragon Dictation

Brief History
• 1950s: AT&T Bell Laboratories designed “Audrey”.
• 1960s: IBM demonstrated “Shoebox”.
• 1970s: research advanced with the help of the DoD’s DARPA.
• 1980s: the Hidden Markov Model helped improve recognition.
• 1990s: speech recognition software reached ordinary users, for example Dragon.
• 2000s: progress in computer speech recognition stalled somewhat; later examples include Google Voice Search and Siri.

How it works
Input speech → ADC → Statistical modeling systems → Output

How it works
• Input speech
  ◦ Discrete
  ◦ Continuous
• Analog-to-digital converter (ADC)
  ◦ The speech recognition technology converts the vibrations created by speech into a digital format.
    ▪ Extract phonemes
    ▪ Organize grammar
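The ADC step can be illustrated with a minimal sketch: sample a continuous signal at fixed intervals and round each sample to an integer level. The 16 kHz sample rate, the 16-bit depth, and the 440 Hz test tone standing in for microphone input are all assumptions chosen for illustration, and the function name `sample_and_quantize` is hypothetical.

```python
import math


def sample_and_quantize(signal, duration_s, sample_rate_hz=16000, bit_depth=16):
    """Sample a continuous signal and quantize each sample to a signed integer.

    `signal` is any function mapping time in seconds to an amplitude in [-1, 1].
    """
    max_level = 2 ** (bit_depth - 1) - 1  # 32767 for 16-bit audio
    n_samples = round(duration_s * sample_rate_hz)
    samples = []
    for n in range(n_samples):
        t = n / sample_rate_hz  # time of the n-th sample
        amplitude = max(-1.0, min(1.0, signal(t)))  # clip out-of-range values
        samples.append(round(amplitude * max_level))
    return samples


def tone(t):
    """Assumed stand-in for microphone input: a 440 Hz sine wave."""
    return math.sin(2 * math.pi * 440.0 * t)


pcm = sample_and_quantize(tone, duration_s=0.01)
print(len(pcm), pcm[:5])
```

The resulting list of integers is the "digital format" that later stages, such as phoneme extraction, would work on.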

How it works
• Statistical modeling systems
  ◦ The Hidden Markov model (HMM)
    ▪ The most commonly used; applied in everything from data compression to sound recognition.
  ◦ Artificial neural networks (ANN)
    ▪ Originally developed to model human brain function (biology).
  ◦ The difference
    ▪ HMM is a special case of the …
    ▪ ANNs are capable of modeling extremely complex biological functions.

The Hidden Markov Model
• It is a directed graph augmented with probability scores.
• The score of a path is the product of the probabilities along it:
  ◦ N1 N2 N3 = 0.4 × 0.8 × 0.5 = 0.16
  ◦ N1 N2 N2 N3 N3 N3 = 0.4 × 0.2 × 0.8 × 0.5 × 0.5 × 0.5 = 0.008
  ◦ N1 N1 N2 N2 N3 = 0.6 × 0.4 × 0.2 × 0.8 × 0.5 = 0.0192
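The path scores on this slide can be reproduced with a short sketch that multiplies the probabilities along a path. The transition table below is inferred from the factors on the slide (0.4, 0.8, 0.5, 0.2, 0.6) and is an assumption: in particular, the 0.5 self-loop on N3 and the explicit `END` state are hypothetical additions that make each state's outgoing probabilities sum to 1. This scores state paths only; a full HMM would also include emission probabilities for the observed audio.

```python
# Assumed transition probabilities, inferred from the slide's factors.
TRANSITIONS = {
    ("N1", "N1"): 0.6,   # stay in N1
    ("N1", "N2"): 0.4,
    ("N2", "N2"): 0.2,   # stay in N2
    ("N2", "N3"): 0.8,
    ("N3", "N3"): 0.5,   # assumption: self-loop on N3
    ("N3", "END"): 0.5,  # assumption: explicit exit state
}


def path_probability(path, transitions=TRANSITIONS, end_state="END"):
    """Multiply the transition probabilities along `path`, then the exit step."""
    steps = list(path) + [end_state]
    probability = 1.0
    for current, nxt in zip(steps, steps[1:]):
        # A missing edge means the transition is not allowed.
        probability *= transitions.get((current, nxt), 0.0)
    return probability


print(path_probability(["N1", "N2", "N3"]))  # the slide's first path, 0.4 x 0.8 x 0.5
```

Longer paths that linger in a state pick up the extra self-loop factors, which is why they score lower than the direct path.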

Example
• t ow m aa t ow - British English
• t ah m ey t ow - American English
• t ah m ey t a - Possible pronunciation when speaking quickly
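One way to picture how a recognizer uses such variants is a pronunciation lexicon that maps several phoneme sequences to the same word. This is a minimal sketch, not a real recognizer: the word "tomato" is inferred from the phonemes rather than stated on the slide, the third variant assumes "mey" was meant as "m ey", and the names `PRONUNCIATIONS` and `words_for` are hypothetical.

```python
# Assumed lexicon: each word lists the phoneme sequences that may realize it.
PRONUNCIATIONS = {
    "tomato": [
        ("t", "ow", "m", "aa", "t", "ow"),  # British English
        ("t", "ah", "m", "ey", "t", "ow"),  # American English
        ("t", "ah", "m", "ey", "t", "a"),   # possible when speaking quickly
    ],
}


def words_for(phonemes, lexicon=PRONUNCIATIONS):
    """Return every word with a pronunciation variant matching `phonemes`."""
    target = tuple(phonemes)
    return [word for word, variants in lexicon.items() if target in variants]


print(words_for("t ah m ey t ow".split()))
```

An exact lookup like this only works once the phonemes are known; in practice the statistical model scores candidate sequences instead of requiring a perfect match.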

Applications
• Healthcare
• Military
• Telephone
• Business
• People with disabilities
• Google’s Voice Search has been available on Android and iPhones.

Applications
• Dragon Dictation
  ◦ Powered by Nuance’s world-renowned Dragon NaturallySpeaking software 2.0.
  ◦ You can send text or email to your friends, or send notes and reminders to yourself, all using your voice.

Thank you
• Questions?
