Speech Recognition
Created by: Kanjariya Hardik G.
Introduction
- Speech recognition technology has recently reached a level of performance and robustness that lets users interact with a system simply by talking.
- Speech recognition is the process of decoding an acoustic speech signal, captured by a microphone or telephone, into a set of words.
- The whole utterance is recognized in this way, word by word.
Types of SR
- There are two main types of speaker models: speaker independent and speaker dependent.
- Speaker independent models recognize the speech patterns of a large group of people.
- Speaker dependent models recognize speech patterns from only one person.
- Both models use mathematical and statistical formulas to yield the best word match for speech.
- A third variation of speaker models is now emerging, called speaker adaptive.
- Speaker adaptive systems usually begin with a speaker independent model and adjust it more closely to each individual during a brief training period.
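The speaker adaptive idea can be illustrated with a small numeric sketch. It assumes, purely for illustration, that a model is summarized by a single mean feature value (for example an average pitch in Hz) and that adaptation blends the speaker independent mean with the new speaker's training samples. The function name `adapt_mean` and the parameter `prior_weight` are hypothetical; real systems adapt far richer models.

```python
def adapt_mean(independent_mean, speaker_samples, prior_weight=10.0):
    """Blend a speaker independent mean with samples from one speaker.

    prior_weight (an illustrative assumption) says how many training
    samples the speaker independent model is "worth". With no samples the
    result is the independent mean; with many samples it approaches the
    speaker's own average.
    """
    n = len(speaker_samples)
    return (prior_weight * independent_mean + sum(speaker_samples)) / (prior_weight + n)


# Brief training period: ten samples from a speaker whose value is 120.
adapted = adapt_mean(100.0, [120.0] * 10)
print(adapted)
```

A longer training period pulls the adapted value closer to the individual speaker, which mirrors the trade-off between speaker independent and speaker dependent models.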
How does it work?
- Speech produces a sound pressure wave, which forms an acoustic signal.
- The microphone receives the acoustic signal and converts it to an analogue electrical signal.
- To store the analogue signal, it must be converted to a digital signal.
- A speech recognizer tries to transform a digitally encoded acoustic signal of natural-language speech into text in that language.
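The analogue-to-digital step can be sketched as sampling followed by quantization. The example below is a minimal illustration, not a description of any particular sound card: the 8000 Hz sample rate, the 16-bit depth, and the names `digitize` and `tone` are assumptions chosen for the sketch, and a pure 440 Hz tone stands in for the microphone signal.

```python
import math


def digitize(signal, duration_s, sample_rate_hz=8000, bits=16):
    """Sample a continuous signal and quantize it to signed integers.

    signal is a function of time (seconds) returning a value that is
    expected to lie in [-1.0, 1.0]; values outside that range are clipped.
    """
    max_level = 2 ** (bits - 1) - 1            # 32767 for 16-bit audio
    n_samples = round(duration_s * sample_rate_hz)
    samples = []
    for n in range(n_samples):
        t = n / sample_rate_hz                 # time of the n-th sample
        value = max(-1.0, min(1.0, signal(t)))  # clip to the valid range
        samples.append(round(value * max_level))
    return samples


def tone(t):
    """Stand-in for the analogue microphone signal: a 440 Hz sine wave."""
    return math.sin(2 * math.pi * 440.0 * t)


pcm = digitize(tone, duration_s=0.01)
print(len(pcm), pcm[:5])
```

The resulting list of integers is the kind of digitally encoded signal that the recognizer then works on.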
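Speech Waveform/Spectrogram
[Figure: waveform and spectrogram of an utterance labelled with the segments s, p, ee, ch, l, a, b; spectrogram frequency axis in Hz]
- The spectrogram is an alternative way to characterize speech: it shows how the energy of the signal is distributed over frequency (Hz) as time passes.
- In the waveform, the louder the sound, the greater the amplitude on the y-axis.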
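A spectrogram can be computed by cutting the digitized signal into short overlapping frames and taking the magnitude of a Fourier transform of each frame. The sketch below uses a naive discrete Fourier transform so that it needs only the standard library; the frame length, hop size, Hann window, and the function names are illustrative assumptions, and practical systems would use a fast Fourier transform instead.

```python
import cmath
import math


def frame_spectrum(frame):
    """Magnitudes of the DFT of one frame (naive O(N^2) version)."""
    n = len(frame)
    magnitudes = []
    for k in range(n // 2 + 1):                # non-negative frequencies only
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        magnitudes.append(abs(acc))
    return magnitudes


def spectrogram(samples, frame_len=64, hop=32):
    """Return one list of frequency-bin magnitudes per frame.

    Bin k corresponds to the frequency k * sample_rate / frame_len.
    """
    rows = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        # A Hann window tapers the frame edges to reduce spectral leakage.
        windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (frame_len - 1)))
                    for i, x in enumerate(frame)]
        rows.append(frame_spectrum(windowed))
    return rows


def peak_bin(row):
    """Index of the strongest frequency bin in one spectrogram row."""
    return max(range(len(row)), key=row.__getitem__)


# A 1000 Hz tone sampled at 8000 Hz, used here as a test signal.
SAMPLE_RATE = 8000
signal = [math.sin(2 * math.pi * 1000 * n / SAMPLE_RATE) for n in range(256)]
spec = spectrogram(signal)
print(len(spec), "frames,", len(spec[0]), "bins per frame")
print("peak frequency of first frame:", peak_bin(spec[0]) * SAMPLE_RATE / 64, "Hz")
```

Plotting the rows side by side, with time on the x-axis and bin index on the y-axis, gives the familiar spectrogram image.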
Speech Recognition Process Flow
The major components
- Audio input
- Grammar
- Acoustic model
- Recognized text
Audio I/O
- It is important to understand that the audio stream is rarely pristine.
- It contains not only the speech data (what was said) but also background noise.
- This noise can interfere with the recognition process, and the speech engine must handle (and possibly even adapt to) the environment in which the speech is captured.
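One simple way to picture how speech might be separated from background noise is an energy threshold: frames whose energy is well above an estimated noise floor are treated as speech. This is only a sketch of that idea, not the method of any specific engine. It assumes that the first few frames contain noise only, and the names `frame_energies` and `detect_speech` and the parameters `noise_frames` and `factor` are hypothetical.

```python
def frame_energies(samples, frame_len=80):
    """Average energy (mean of squared samples) of each full frame."""
    return [sum(x * x for x in samples[i:i + frame_len]) / frame_len
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def detect_speech(samples, frame_len=80, noise_frames=3, factor=4.0):
    """Flag frames whose energy clearly exceeds the estimated noise floor.

    Assumption: the first noise_frames frames contain background noise
    only, so their average energy serves as the noise floor.
    """
    energies = frame_energies(samples, frame_len)
    head = energies[:noise_frames]
    noise_floor = sum(head) / len(head) if head else 0.0
    return [e > factor * noise_floor for e in energies]


# Quiet background, then a louder burst standing in for speech, then quiet.
quiet = [0.01 * (-1) ** i for i in range(240)]
loud = [0.5 * (-1) ** i for i in range(160)]
tail = [0.01 * (-1) ** i for i in range(80)]
flags = detect_speech(quiet + loud + tail)
print(flags)
```

Re-estimating the noise floor as conditions change is one way such a scheme could adapt to its environment.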
Acoustic Model + Grammar
- Once the speech data is in the proper format, the engine searches for the best match.
- It does this by taking into consideration the words and phrases it knows about (the active grammars), along with its knowledge of the environment in which it is operating.
- The knowledge of the environment is provided in the form of an acoustic model.
- Once it identifies the most likely match for what was said, it returns what it recognized as a text string.
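The interplay of grammar and acoustic model can be sketched as a constrained search: the acoustic model assigns a score to each candidate phrase, and only candidates allowed by the active grammar compete for the best match. The scores below are made-up numbers for illustration, and `best_match` is a hypothetical helper, not an API of any real engine.

```python
def best_match(acoustic_scores, active_grammar):
    """Return the highest-scoring phrase that the active grammar allows.

    acoustic_scores maps candidate phrases to scores (higher is better);
    active_grammar is the set of phrases the application accepts.
    Returns None when no candidate is in the grammar.
    """
    allowed = {phrase: score for phrase, score in acoustic_scores.items()
               if phrase in active_grammar}
    if not allowed:
        return None
    return max(allowed, key=allowed.get)


# Illustrative log-likelihood-style scores from a pretend acoustic model.
scores = {"call hone": -11.5, "call home": -12.0, "tall dome": -20.0}
grammar = {"call home", "call office"}
print(best_match(scores, grammar))
```

In this example the acoustically best candidate, "call hone", is not in the grammar, so the engine would return "call home" as its text string.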
About SR Engine
- SR requires a software application "engine" with built-in logic to decipher and act on the spoken word.
- Sound card: converts the analogue signal from the microphone to a digital signal.
- Function of the SR engine: it converts this digital signal to phonemes, and the phonemes to words.
- Different SR engines:
  - CMU Sphinx
  - Microsoft SAPI
  - IBM ViaVoice
Decoding process.
Recognition Process Flow Summary
- Step 1: User input. The system captures the user's voice in the form of an analogue acoustic signal.
- Step 2: Digitization. The analogue acoustic signal is digitized.
- Step 3: Phonetic breakdown. The signal is broken into phonemes.
- Step 4: Statistical modeling. Phonemes are mapped to their phonetic representation using a statistical model.
- Step 5: Matching. According to the grammar, the phonetic representation, and the dictionary, the system returns an n-best list (i.e., candidate words, each with a confidence score).
  - Grammar: the set of words or phrases that constrains the range of input or output in the voice application.
  - Dictionary: the table mapping phonetic representations to words (e.g., "thu" and "thee" both map to "the").
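The matching step can be sketched as a dictionary lookup that produces an n-best list. In this illustration the confidence score is simply the string similarity between the heard phonetic representation and each dictionary entry, computed with `difflib.SequenceMatcher` from the Python standard library. That similarity measure stands in for a real statistical model, and the dictionary contents and the name `n_best` are assumptions made for the example.

```python
from difflib import SequenceMatcher

# Dictionary: phonetic representation -> word (illustrative entries only).
DICTIONARY = {"thu": "the", "thee": "the", "then": "then", "tree": "tree"}


def n_best(heard, dictionary, grammar, n=3):
    """Return up to n (word, confidence) pairs, best first.

    Only words allowed by the grammar are considered. When a word has
    several phonetic representations, its best confidence is kept.
    """
    scores = {}
    for phonetic, word in dictionary.items():
        if word not in grammar:
            continue                           # the grammar constrains the output
        confidence = SequenceMatcher(None, heard, phonetic).ratio()
        scores[word] = max(confidence, scores.get(word, 0.0))
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


results = n_best("thee", DICTIONARY, grammar={"the", "then", "tree"})
for word, confidence in results:
    print(word, round(confidence, 2))
```

An application would typically accept the top entry when its confidence is high enough and otherwise ask the user to repeat or confirm.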
REPRESENTATION OF SOFTWARE
Challenges and Difficulties of SR
Speech recognition is still a very difficult problem. The main difficulties are:
- Speaker variability: two speakers, or even the same speaker on different occasions, will pronounce the same word differently.
- Channel variability: the quality and position of the microphone and the background environment affect the output.
Current Software Options for PC
- Dragon Systems – NaturallySpeaking
- Philips – FreeSpeech
- IBM – ViaVoice
- Lernout & Hauspie – Voice Xpress