8 - Speech Recognition
- Slides: 57
8 - Speech Recognition
- Speech Recognition Concepts
- Speech Recognition Approaches
- Recognition Theories
- Bayes Rule
- Simple Language Model
- P(A|W) Network Types
8 - Speech Recognition (Cont'd)
- HMM Calculating Approaches
- Neural Components
- Three Basic HMM Problems
- Viterbi Algorithm
- State Duration Modeling
- Training in HMM
Recognition Tasks
- Isolated Word Recognition (IWR), Connected Word (CW), and Continuous Speech Recognition (CSR)
- Speaker Dependent, Multiple Speaker, and Speaker Independent
- Vocabulary Size
  - Small: <20
  - Medium: >100, <1000
  - Large: >1000, <10000
  - Very Large: >10000
Speech Recognition Concepts
Speech recognition is the inverse of speech synthesis:
- Synthesis: text -> NLP -> phone sequence -> speech processing -> speech
- Recognition (speech understanding): speech -> speech processing -> phone sequence -> NLP -> text
Speech Recognition Approaches
- Bottom-Up Approach
- Top-Down Approach
- Blackboard Approach
Bottom-Up Approach
[Block diagram: the signal passes through signal processing, feature extraction, and segmentation stages; knowledge sources (voiced/unvoiced/silence detection, sound classification rules, phonotactic rules, lexical access, language model) are applied stage by stage to produce the recognized utterance.]
Top-Down Approach
[Block diagram: feature analysis feeds a unit matching system built on an inventory of speech recognition units; a word dictionary, a grammar, and a task model supply lexical, syntactic, and semantic hypotheses to an utterance verifier/matcher, which outputs the recognized utterance.]
Blackboard Approach
[Diagram: acoustic, environmental, lexical, syntactic, and semantic processes all read from and write to a shared blackboard.]
Recognition Theories
- Articulatory-Based Recognition: uses the articulatory system for recognition; this theory has been the most successful so far
- Auditory-Based Recognition: uses the auditory system for recognition
- Hybrid-Based Recognition: a hybrid of the above theories
- Motor Theory: models the intended gestures of the speaker
Recognition Problem
- We have a sequence of acoustic symbols and want to find the words expressed by the speaker
- Solution: find the most probable word sequence given the acoustic symbols
Recognition Problem (Cont'd)
- A: acoustic symbols
- W: word sequence
- We should find the word sequence W* such that
  W* = argmax_W P(W|A)
Bayes Rule
P(W|A) = P(A|W) P(W) / P(A)
Bayes Rule (Cont'd)
Since P(A) does not depend on W:
W* = argmax_W P(W|A) = argmax_W P(A|W) P(W)
Simple Language Model
P(W) = P(w1 w2 ... wn) = P(w1) P(w2|w1) P(w3|w1 w2) ... P(wn|w1 ... wn-1)
Computing this probability directly is very difficult and needs a very large database, so trigram and bigram models are used instead.
Simple Language Model (Cont'd)
- Trigram: P(wi | wi-2, wi-1)
- Bigram: P(wi | wi-1)
- Monogram (unigram): P(wi)
Simple Language Model (Cont'd)
Computing method:
P(w3 | w1, w2) = Count(w1 w2 w3) / Count(w1 w2)
i.e., the number of occurrences of w3 after w1 w2, divided by the total number of occurrences of w1 w2.
Ad hoc method:
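The count-based estimate above can be sketched with a toy corpus (the corpus is invented for illustration; a real system would also apply smoothing for unseen histories):

```python
from collections import Counter

# Count-based trigram estimate:
#   P(w3 | w1, w2) = Count(w1 w2 w3) / Count(w1 w2)
corpus = "the cat sat on the mat the cat ran".split()

bigrams  = Counter(zip(corpus, corpus[1:]))
trigrams = Counter(zip(corpus, corpus[1:], corpus[2:]))

def p_trigram(w1, w2, w3):
    if bigrams[(w1, w2)] == 0:
        return 0.0  # unseen history; a real system would smooth or back off
    return trigrams[(w1, w2, w3)] / bigrams[(w1, w2)]
```

Here "the cat" occurs twice, once followed by "sat" and once by "ran", so each continuation gets probability 0.5.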
Error Production Factors
- Prosody (recognition should be prosody independent)
- Noise (noise should be prevented)
- Spontaneous speech
P(A|W) Computing Approaches
- Dynamic Time Warping (DTW)
- Hidden Markov Model (HMM)
- Artificial Neural Network (ANN)
- Hybrid Systems
Dynamic Time Warping
Dynamic Time Warping
Search limitations:
- First and end interval
- Global limitation
- Local limitation
Dynamic Time Warping: Global Limitation
Dynamic Time Warping: Local Limitation
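A minimal DTW sketch using the standard local path constraint (each cell may extend a path from its left, lower, or diagonal neighbor); the 1-D distance function is an assumption for illustration, since real systems compare feature vectors:

```python
# Dynamic Time Warping: fill a cumulative-cost matrix D where each cell
# extends the cheapest of the three allowed predecessor paths.
def dtw(a, b, dist=lambda x, y: abs(x - y)):
    INF = float("inf")
    n, m = len(a), len(b)
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = dist(a[i - 1], b[j - 1]) + min(
                D[i - 1][j],      # insertion
                D[i][j - 1],      # deletion
                D[i - 1][j - 1],  # match
            )
    return D[n][m]  # total alignment cost
```

The global and local limitations above would be added by restricting which (i, j) cells are visited and which predecessor moves are allowed.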
Artificial Neural Network
[Figure: a simple computational element of a neural network]
Artificial Neural Network (Cont'd)
Neural network types:
- Perceptron
- Time Delay Neural Network (TDNN)
Artificial Neural Network (Cont'd)
[Figure: single-layer perceptron]
Artificial Neural Network (Cont'd)
[Figure: three-layer perceptron]
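As an illustration of what a three-layer perceptron can compute that a single layer cannot, here is a hand-wired network for XOR; the weights are chosen by hand for the example, not trained:

```python
# Three-layer perceptron (2 inputs, 2 hidden units, 1 output) with step
# activations, wired by hand to compute XOR.
def step(x):
    return 1 if x > 0 else 0

def xor_net(x1, x2):
    h1 = step(x1 + x2 - 0.5)    # hidden unit 1 fires for OR
    h2 = step(x1 + x2 - 1.5)    # hidden unit 2 fires for AND
    return step(h1 - h2 - 0.5)  # output: OR and not AND -> XOR
```

A single-layer perceptron cannot represent XOR because the function is not linearly separable; the hidden layer supplies the needed intermediate features.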
2.5.4.2 Neural Network Topologies
TDNN
2.5.4.6 Neural Network Structures for Speech Recognition
Hybrid Methods
[Diagram: a hybrid neural network and matched filter for recognition — speech features pass through delay lines into a pattern classifier whose acoustic output units produce the recognition result.]
Neural Network Properties
- The system is simple, but training requires many iterations
- Does not impose a specific structure
- Despite its simplicity, the results are good
- The training set is large, so training should be done offline
- Accuracy is relatively good
Pre-processing
- Different preprocessing techniques are employed as the front end of speech recognition systems
- The choice of preprocessing method depends on the task, the noise level, the modeling tool, etc.
Cepstrum: the Mel Method (MFCC)
time signal -> framing -> |FFT|^2 -> Mel-scaling -> logarithm -> IDCT -> cepstra (keep low-order coefficients) -> differentiator -> delta and delta-delta cepstra
Time-Frequency Analysis
- Short-term Fourier Transform: the standard way of frequency analysis; decompose the incoming signal into its constituent frequency components
  X(k, m) = sum_n x(n) w(n - mp) e^(-j 2*pi*k*n / N)
- w(n): windowing function
- N: frame length
- p: step size
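The short-term analysis above can be sketched directly; a Hamming window is assumed, and a direct O(N^2) DFT is used for clarity rather than an FFT:

```python
import cmath
import math

# Short-term Fourier transform: slide a length-N window over the signal in
# steps of p and take the DFT magnitude of each windowed frame.
def stft(x, N=8, p=4):
    # Hamming window (an assumed choice of w(n))
    w = [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]
    frames = []
    for start in range(0, len(x) - N + 1, p):
        frame = [x[start + n] * w[n] for n in range(N)]
        spectrum = [sum(frame[n] * cmath.exp(-2j * math.pi * k * n / N)
                        for n in range(N))
                    for k in range(N)]
        frames.append([abs(c) for c in spectrum])
    return frames  # one magnitude spectrum per frame
```

With a 20-sample signal, N = 8 and p = 4 yield four frames, one spectrum of N magnitude values each.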
Critical Band Integration
- Related to the masking phenomenon: the threshold of a sinusoid is elevated when its frequency is close to the center frequency of a narrow-band noise
- Frequency components within a critical band are not resolved; the auditory system interprets the signals within a critical band as a whole
Bark Scale
Feature Orthogonalization
- Spectral values in adjacent frequency channels are highly correlated
- This correlation forces a Gaussian model with many parameters: all elements of the covariance matrix must be estimated
- Decorrelation improves the parameter estimation
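One common decorrelating transform is the DCT-II, the same transform that carries the log mel spectrum to the cepstrum in the pipeline above; a sketch:

```python
import math

# DCT-II: C_k = sum_n x_n * cos(pi * k * (n + 0.5) / N).
# Highly correlated, slowly varying channel values collapse onto the first
# few coefficients, which is why only the low-order cepstra are kept.
def dct2(x):
    N = len(x)
    return [sum(x[n] * math.cos(math.pi * k * (n + 0.5) / N) for n in range(N))
            for k in range(N)]
```

For a perfectly flat (maximally correlated) input, all energy lands in coefficient 0 and every other coefficient is zero.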
Language Models for LVCSR Word Pair Model: Specify which word pairs are valid
Statistical Language Modeling
Perplexity of the Language Model
Entropy of the source:
H = -sum_W P(W) log2 P(W)
First-order entropy of the source treats each word independently:
H = -sum_w P(w) log2 P(w)
If the source is ergodic, meaning its statistical properties can be completely characterized by a sufficiently long sequence that the source puts out:
H = lim_{n -> inf} -(1/n) log2 P(w1 w2 ... wn)
We often compute H based on a finite but sufficiently large Q:
H ~= -(1/Q) log2 P(w1 w2 ... wQ)
H is the degree of difficulty that the recognizer encounters, on average, when it is to determine a word from the same source.
If an N-gram language model PN(W) is used, an estimate of H is:
H ~= -(1/Q) log2 PN(w1 w2 ... wQ)
In general, perplexity is defined as:
PP = 2^H
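The perplexity estimate can be sketched with a hypothetical unigram model (a real recognizer would use the N-gram probabilities PN(W)):

```python
import math

# Perplexity PP = 2^H, with H estimated from a test sequence of Q words as
#   H ~= -(1/Q) * log2 P(w_1 ... w_Q)
def perplexity(model, words):
    Q = len(words)
    log_p = sum(math.log2(model[w]) for w in words)  # log2 P(w_1 ... w_Q)
    return 2 ** (-log_p / Q)

# Hypothetical unigram model: uniform over four words.
model = {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}
```

A uniform model over V words has perplexity exactly V, matching the intuition that perplexity measures the average branching factor the recognizer faces.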
Overall recognition system based on subword units