Speech Recognition Principles

Speech Recognition Concepts
Speech recognition is the inverse of speech synthesis.
[Figure: block diagram with the labels Text, NLP, Phone Sequence, Speech Processing, Speech, Speech Recognition, Speech Understanding]

Speech Recognition Approaches
– Bottom-Up Approach
– Top-Down Approach
– Blackboard Approach

Bottom-Up Approach
[Figure: bottom-up processing chain from Signal Processing to Recognized Utterance. Labels: Signal Processing, Feature Extraction, Segmentation, Knowledge Sources, Voiced/Unvoiced/Silence, Sound Classification Rules, Phonotactic Rules, Lexical Access, Language Model, Recognized Utterance]

Top-Down Approach
[Figure: top-down system ending in Recognized Utterance. Labels: Inventory of speech recognition units, Word Dictionary, Grammar, Task Model, Feature Analysis, Unit Matching System, Lexical Hypothesis, Syntactic Hypothesis, Semantic Hypothesis, Utterance Verifier/Matcher, Recognized Utterance]

Blackboard Approach
[Figure: a central Blackboard shared by five groups of processes]
– Acoustic Processes
– Environmental Processes
– Lexical Processes
– Semantic Processes
– Syntactic Processes

An overall view of a speech recognition system
[Figure: overall system view, with the top-down and bottom-up directions marked] From Ladefoged 2001

Recognition Theories
Articulatory-Based Recognition
– Uses the articulatory system for recognition
– This theory has been the most successful so far
Auditory-Based Recognition
– Uses the auditory system for recognition
Hybrid-Based Recognition
– A hybrid of the two theories above
Motor Theory
– Models the intended gesture of the speaker

Recognition Problem
We have a sequence of acoustic symbols and want to find the words expressed by the speaker.
Solution: find the most probable word sequence given the acoustic symbols.

Recognition Problem
A : acoustic symbols
W : word sequence
We should find $\hat{W}$ so that $P(\hat{W} \mid A) = \max_{W} P(W \mid A)$

Bayes' Rule
$P(W \mid A) = \dfrac{P(A \mid W)\, P(W)}{P(A)}$

Bayes' Rule (Cont’d)
Since $P(A)$ does not depend on $W$: $\hat{W} = \arg\max_{W} P(A \mid W)\, P(W)$
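
As a rough illustration of this decision rule: under Bayes' rule, P(A) is the same for every candidate W, so the search reduces to maximizing P(A|W) P(W). The sketch below does this over a tiny candidate set. The candidate sentences and all probability values are made up for illustration, and `decode` is a hypothetical helper name, not something from the slides.

```python
import math

def decode(acoustic_likelihood, language_prior):
    """Return the candidate W that maximizes P(A|W) * P(W).

    Both arguments are dicts mapping a candidate word sequence to a
    probability. Log-probabilities are summed to avoid underflow.
    """
    best, best_score = None, -math.inf
    for w, p_a_given_w in acoustic_likelihood.items():
        score = math.log(p_a_given_w) + math.log(language_prior[w])
        if score > best_score:
            best, best_score = w, score
    return best

# Made-up numbers: two acoustically similar candidates.
p_a_given_w = {"recognize speech": 0.30, "wreck a nice beach": 0.35}
p_w = {"recognize speech": 0.010, "wreck a nice beach": 0.001}
best = decode(p_a_given_w, p_w)
```

With these numbers the acoustically slightly better candidate loses, because the language model prior P(W) outweighs the small acoustic difference.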

Simple Language Model
Computing $P(W)$ directly is very difficult and requires a very large database, so trigram and bigram models are used instead.

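Simple Language Model (Cont’d)
Trigram : $P(W) \approx \prod_{i} P(w_i \mid w_{i-2}, w_{i-1})$
Bigram : $P(W) \approx \prod_{i} P(w_i \mid w_{i-1})$
Monogram (unigram) : $P(W) \approx \prod_{i} P(w_i)$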

Simple Language Model (Cont’d)
Computing method :
$P(w_3 \mid w_1, w_2) = \dfrac{\text{number of occurrences of } w_3 \text{ after } w_1 w_2}{\text{total number of occurrences of } w_1 w_2}$
Ad hoc method :
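
The counting method can be sketched in a few lines of Python. This is a minimal illustration on a toy token list, not the slides' implementation; `train_trigram` and `trigram_prob` are hypothetical names, and no smoothing is applied, so unseen trigrams get probability 0.

```python
from collections import Counter

def train_trigram(tokens):
    """Count trigrams and their two-word histories in a token list."""
    trigrams = list(zip(tokens, tokens[1:], tokens[2:]))
    tri = Counter(trigrams)
    # A history (w1, w2) is counted only where a third word follows,
    # so the estimates for each history sum to 1.
    hist = Counter((w1, w2) for w1, w2, _ in trigrams)
    return tri, hist

def trigram_prob(tri, hist, w1, w2, w3):
    """Relative-frequency estimate of P(w3 | w1, w2); 0.0 if history unseen."""
    h = hist[(w1, w2)]
    return tri[(w1, w2, w3)] / h if h else 0.0

tokens = "the cat sat on the mat the cat ran".split()
tri, hist = train_trigram(tokens)
p = trigram_prob(tri, hist, "the", "cat", "sat")
```

Here "the cat" occurs twice as a history and is followed once by "sat" and once by "ran", so each continuation gets half of the probability mass.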

[Figure] From Ladefoged 2001

P(A|W) Computing Approaches
– Dynamic Time Warping (DTW)
– Hidden Markov Model (HMM)
– Artificial Neural Network (ANN)
– Hybrid Systems

Dynamic Time Warping Method (DTW)
To obtain a global distance between two speech patterns, a time alignment must be performed.
Ex : a time alignment path between a template pattern “SPEECH” and a noisy input “SsPEEhH”
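
A minimal dynamic-programming sketch of DTW follows. It uses letter strings in place of real feature frames, with “SPEECH” as the template and “SsPEEhH” as the noisy input. The step pattern (match, insertion, deletion with equal weight) and the local cost in `symbol_cost` are illustrative assumptions, not taken from the slides; a real recognizer would compare frame feature vectors with a spectral distance.

```python
def dtw_distance(template, query, local_cost):
    """Global DTW distance between two sequences.

    D[i][j] holds the minimal accumulated cost of aligning
    template[:i] with query[:j].
    """
    n, m = len(template), len(query)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = local_cost(template[i - 1], query[j - 1])
            # Extend the cheapest of the three allowed predecessor cells.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

def symbol_cost(a, b):
    """Assumed toy cost: 0 for identical symbols, 0.5 if only case differs."""
    if a == b:
        return 0.0
    if a.lower() == b.lower():
        return 0.5
    return 1.0

d = dtw_distance("SPEECH", "SsPEEhH", symbol_cost)
```

The time alignment lets one template symbol absorb several input symbols (for example "S" matching both "S" and "s"), which is what makes the distance tolerant to differences in speaking rate.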

Recognition Tasks
Isolated Word Recognition (IWR) and Continuous Speech Recognition (CSR)
Speaker Dependent and Speaker Independent
Vocabulary Size
– Small: <20
– Medium: >100, <1000
– Large: >1000, <10000
– Very Large: >10000

Error-Producing Factors
– Prosody (recognition should be prosody independent)
– Noise (noise should be prevented)
– Spontaneous speech

Artificial Neural Network
[Figure: simple computation element of a neural network]
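
The figure for this slide did not survive extraction. As a stand-in, here is a minimal sketch of a single computation element in its common textbook form: a weighted sum of the inputs plus a bias, passed through a nonlinearity. The sigmoid activation and the example weights are assumptions for illustration; the slide may have shown a different nonlinearity.

```python
import math

def neuron(inputs, weights, bias):
    """One computation element: sigmoid of the weighted input sum plus bias."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Made-up weights; here the weighted sum is exactly zero.
y = neuron([1.0, 0.0], [2.0, -1.0], -2.0)
```

A perceptron layer is a set of such elements sharing the same inputs, and a multi-layer perceptron feeds the outputs of one layer into the next.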

Artificial Neural Network (Cont’d)
Neural Network Types
– Perceptron
– Time Delay Neural Network (TDNN)
[Figure: TDNN computational element]

Artificial Neural Network (Cont’d)
[Figure: single-layer perceptron]

Artificial Neural Network (Cont’d)
[Figure: three-layer perceptron]

Hybrid Methods
Hybrid neural network and matched filter for recognition
[Figure labels: Speech, Acoustic Features, Delays, Pattern Classifier, Output Units]

Neural Network Properties
– The system is simple, but training is highly iterative
– It does not prescribe a specific structure
– Despite its simplicity, the results are good
– The training set is large, so training should be done offline
– Accuracy is relatively good

Hidden Markov Model
[Figure: transition between states $S_i$ and $S_j$]
Observations : $O_1, O_2, \ldots$
States in time : $q_1, q_2, \ldots$
All states : $s_1, s_2, \ldots$
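
To connect this notation to the computation of P(A|W), the sketch below evaluates the probability of an observation sequence O under a small discrete HMM using the standard forward algorithm. The slides only introduce the notation, so the two-state model, its initial, transition and emission probabilities, and the function name `forward` are all illustrative assumptions.

```python
def forward(obs, init, trans, emit):
    """P(O | model) for a discrete HMM via the forward algorithm.

    init[i]     : probability that q_1 = s_i
    trans[i][j] : probability of moving from s_i to s_j
    emit[i][o]  : probability that state s_i emits observation symbol o
    """
    n = len(init)
    # alpha[i] = P(O_1..O_t, q_t = s_i), initialised for t = 1.
    alpha = [init[i] * emit[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [
            sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][o]
            for j in range(n)
        ]
    return sum(alpha)

# Made-up two-state model with two observation symbols (0 and 1).
init = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.9, 0.1], [0.2, 0.8]]
p = forward([0, 1], init, trans, emit)
```

In a recognizer, one such model would typically be built per word or sub-word unit, and the resulting likelihood plays the role of P(A|W) in the decision rule.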