Signal Processing And Analysis Methods For Speech Recognition

  • Slides: 61
By Sarita Jondhale

Introduction
• Spectral analysis is the process of describing the speech signal by a set of parameters for further processing.
• Examples: short-term energy, zero-crossing rate, level-crossing rate, and so on.
• Methods for spectral analysis are therefore considered the core of the signal processing front end in a speech recognition system.
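As a minimal sketch of two of the parameters named above, the following Python computes short-term energy and zero-crossing rate for one analysis frame. The function names, the 8 kHz sampling rate, and the 100 Hz test tone are illustrative assumptions, not part of the slides.

```python
import math

def short_term_energy(frame):
    """Sum of squared samples in one analysis frame."""
    return sum(x * x for x in frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

# Hypothetical test frame: a 100 Hz sine sampled at 8 kHz, 240 samples (30 ms).
fs, f0, n_samples = 8000, 100, 240
frame = [math.sin(2 * math.pi * f0 * n / fs) for n in range(n_samples)]

energy = short_term_energy(frame)   # three full periods of a unit sine
zcr = zero_crossing_rate(frame)     # low, as expected for a low-frequency tone
```

A low-frequency, voiced-like frame gives a low zero-crossing rate; a noise-like frame would give a much higher one.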

Spectral Analysis Models
• Pattern recognition model
• Acoustic phonetic model

Spectral Analysis Model (figure)
Parameter measurement is common to both systems.

Pattern Recognition Model
The three basic steps in the pattern recognition model are:
1. Parameter measurement
2. Pattern comparison
3. Decision making

1. Parameter measurement
• The goal is to represent the relevant acoustic events in the speech signal in terms of a compact, efficient set of speech parameters.
• The choice of which parameters to use is dictated by other considerations, e.g.:
  – computational efficiency
  – type of implementation
  – available memory
• The way in which the representation is computed is based on signal processing considerations.

Acoustic Phonetic Model (figure)

Spectral Analysis
Two methods:
  – The filter bank spectrum
  – Linear predictive coding (LPC)

The Filter Bank Spectrum
(figure; labels: digital input, spectral representation)
The coverage of the bandpass filters spans the frequency range of interest in the signal.

1. The Bank-of-Filters Front-End Processor
• One of the most common approaches for processing the speech signal is the bank-of-filters model.
• This method takes a speech signal as input and passes it through a set of filters to obtain a spectral representation for each frequency band of interest.

• Typical frequency ranges: 100-3000 Hz for a telephone-quality signal, 100-8000 Hz for a broadband signal.
• The individual filters generally do overlap in frequency.
• The output of the ith bandpass filter is $X_n(e^{j\omega_i})$, where $\omega_i$ is the normalized frequency $2\pi f_i / F_s$ and $F_s$ is the sampling frequency.
• Each bandpass filter processes the speech signal independently to produce the spectral representation $X_n$.

The Bank-of-Filters Front-End Processor (figure)

The sampled speech signal, $s(n)$, is passed through a bank of $Q$ bandpass filters, giving the signals
$$s_i(n) = s(n) * h_i(n) = \sum_{m} h_i(m)\, s(n-m), \qquad 1 \le i \le Q,$$
where $h_i(m)$ is the impulse response of the ith bandpass filter and $*$ denotes convolution.

The bank-of-filters approach obtains the energy of the speech signal in each band through the following steps:
• Signal enhancement and noise elimination: makes the speech signal more evident to the bank of filters.
• Set of bandpass filters: separates the signal into frequency bands (uniform or nonuniform filters).
• Nonlinearity: the filtered signal in every band is passed through a nonlinear function (for example a full-wave or half-wave rectifier), which shifts the bandpass spectrum to the low-frequency band.
• Lowpass filter: eliminates the high-frequency components generated by the nonlinear function, i.e. the undesired spectral peaks.
• Sampling rate reduction and amplitude compression: the resulting signals are represented more economically by resampling at a reduced rate and compressing the signal's dynamic range.
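The steps above can be sketched for a single channel as follows. This is a minimal illustration under stated assumptions rather than the slides' implementation: the bandpass filter is a Hamming window modulated by a cosine, the lowpass filter is a moving average, compression is a logarithm, and the enhancement step is omitted. All names and parameter values (`channel_output`, `taps`, `smooth`, `decim`) are hypothetical.

```python
import math

def hamming(length):
    """Hamming window of the given length."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
            for n in range(length)]

def fir_filter(h, x):
    """Direct convolution y(n) = sum_m h(m) x(n - m), with x(n) = 0 for n < 0."""
    return [sum(h[m] * x[n - m] for m in range(min(len(h), n + 1)))
            for n in range(len(x))]

def channel_output(x, center_hz, fs, taps=65, smooth=65, decim=40):
    """One filter-bank channel: bandpass, rectify, lowpass, decimate, compress."""
    w = hamming(taps)
    wc = 2 * math.pi * center_hz / fs
    mid = (taps - 1) / 2
    # Bandpass filter: lowpass window modulated to the channel center frequency.
    h_bp = [w[n] * math.cos(wc * (n - mid)) for n in range(taps)]
    band = fir_filter(h_bp, x)
    rectified = [abs(v) for v in band]            # full-wave rectifier
    lowpass = [1.0 / smooth] * smooth             # moving-average lowpass filter
    envelope = fir_filter(lowpass, rectified)
    reduced = envelope[::decim]                   # sampling rate reduction
    return [math.log(v + 1e-9) for v in reduced]  # logarithmic amplitude compression

# Hypothetical input: a 1000 Hz tone sampled at 8 kHz.
fs = 8000
tone = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(800)]
in_band = channel_output(tone, 1000, fs)       # channel centered on the tone
out_of_band = channel_output(tone, 3000, fs)   # channel far from the tone
```

Once the filters have settled, the channel centered on the tone reports a much larger compressed energy than the distant channel.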

Assume that the output of the ith bandpass filter is a pure sinusoid at frequency $\omega_i$, and that a full-wave rectifier is used as the nonlinearity. The rectified signal then contains a DC (low-frequency) component plus components at even harmonics of $\omega_i$, which the lowpass filter removes.
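The effect of full-wave rectification on a pure sinusoid can be checked numerically. The sketch below rectifies a unit-amplitude sine and measures three spectral components; the signal length, cycle count, and helper name `dft_bin` are illustrative assumptions.

```python
import cmath
import math

def dft_bin(x, k):
    """Normalized DFT coefficient of x at bin k."""
    n_pts = len(x)
    return sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_pts)
               for n in range(n_pts)) / n_pts

# Hypothetical bandpass output: 10 full cycles of a unit sine in 800 samples.
n_pts, cycles = 800, 10
s_i = [math.sin(2 * math.pi * cycles * n / n_pts) for n in range(n_pts)]
v_i = [abs(v) for v in s_i]                      # full-wave rectifier

dc = abs(dft_bin(v_i, 0))                        # close to 2/pi
at_fundamental = abs(dft_bin(v_i, cycles))       # essentially zero
at_second = abs(dft_bin(v_i, 2 * cycles))        # first surviving harmonic
```

The rectifier moves the channel's energy to DC, where the lowpass filter keeps it, and leaves nothing at the original frequency.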

Types of Filter Bank Used for Speech Recognition
• Uniform filter bank
• Nonuniform filter bank

Uniform filter bank
• The most common filter bank is the uniform filter bank.
• The center frequency, $f_i$, of the ith bandpass filter is defined as $f_i = \frac{F_s}{N}\, i$, $1 \le i \le Q$, where $F_s$ is the sampling rate, $N$ is the number of uniformly spaced filters required to span the frequency range of the speech, and $Q$ is the number of filters used in the bank.
• The actual number of filters used in the filter bank satisfies $Q \le N/2$.
• $b_i$ is the bandwidth of the ith filter. It generally satisfies $b_i \ge F_s/N$: with equality there is no frequency overlap between adjacent filter channels, and with inequality adjacent channels overlap.
• If $b_i < F_s/N$, certain portions of the speech spectrum would be missing from the analysis, and the resulting speech spectrum would not be very meaningful.
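A short sketch of the uniform layout, assuming center frequencies at multiples of Fs/N and a bandwidth no smaller than Fs/N so that no part of the spectrum is skipped. The function name and the example values (8 kHz sampling rate, N = 32, Q = 15) are hypothetical.

```python
def uniform_filter_bank(fs, n_span, q, bandwidth=None):
    """Return (center_frequency, bandwidth) pairs for a uniform filter bank.

    fs        sampling rate in Hz
    n_span    N, number of uniformly spaced filters spanning 0..fs
    q         Q, number of filters actually used (Q <= N/2)
    bandwidth b_i, defaults to fs/N (adjacent channels just touch)
    """
    if 2 * q > n_span:
        raise ValueError("Q must not exceed N/2")
    spacing = fs / n_span
    b = spacing if bandwidth is None else bandwidth
    if b < spacing:
        raise ValueError("b_i < Fs/N would leave gaps in the spectrum")
    return [(spacing * i, b) for i in range(1, q + 1)]

# Hypothetical example: 8 kHz sampling, N = 32, Q = 15 channels.
bank = uniform_filter_bank(8000, 32, 15)
centers = [f for f, _ in bank]   # 250 Hz, 500 Hz, ..., 3750 Hz
```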

Nonuniform filter bank
• An alternative to the uniform filter bank is the nonuniform filter bank.
• The criterion is to space the filters uniformly along a logarithmic frequency scale.
• For a set of $Q$ bandpass filters with center frequencies $f_i$ and bandwidths $b_i$, $1 \le i \le Q$, we set
  $$b_1 = C, \qquad b_i = \alpha\, b_{i-1}, \quad 2 \le i \le Q,$$
  $$f_i = f_1 + \sum_{j=1}^{i-1} b_j + \frac{b_i - b_1}{2},$$
  where $C$ and $f_1$ are the bandwidth and center frequency of the first filter and $\alpha$ is the logarithmic growth factor.
• (figure)
• The most commonly used value is $\alpha = 2$, which gives octave-band spacing of adjacent filters.
• $\alpha = 4/3$ gives approximately 1/3-octave filter spacing.
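The logarithmic spacing can be illustrated with a small calculation. The sketch assumes the usual recursion in which each bandwidth is α times the previous one and adjacent bands touch; the function name and the example values (first band 200 Hz wide, centered at 300 Hz) are hypothetical.

```python
def nonuniform_filter_bank(first_center, first_bandwidth, alpha, q):
    """Return (center_frequency, bandwidth) pairs with b_i = alpha * b_(i-1)."""
    bandwidths = [first_bandwidth * alpha ** i for i in range(q)]
    bank = []
    for i, b in enumerate(bandwidths):
        # f_i = f_1 + sum of earlier bandwidths + (b_i - b_1) / 2
        center = first_center + sum(bandwidths[:i]) + (b - first_bandwidth) / 2
        bank.append((center, b))
    return bank

# Hypothetical octave-band example: alpha = 2, four filters.
bank = nonuniform_filter_bank(300.0, 200.0, 2, 4)
centers = [f for f, _ in bank]
edges = [(f - b / 2, f + b / 2) for f, b in bank]   # lower and upper band edges
```

With these values the bands cover 200-400, 400-800, 800-1600 and 1600-3200 Hz, so each band is one octave wide and adjacent bands meet without gaps.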

Implementations of Filter Banks
• A filter bank can be implemented in various ways, depending on the method used to design its filters.
• Design methods for digital filters fall into two classes:
  – Infinite impulse response (IIR), or recursive, filters
  – Finite impulse response (FIR), or nonrecursive, filters

The FIR (finite impulse response), or nonrecursive, filter
• The present output depends on the present input sample and previous input samples.
• The impulse response is restricted to a finite number of samples.
• Advantages:
  – Always stable, and less sensitive to noise
  – Excellent design methods are available for various kinds of FIR filters
  – Phase response can be made linear
• Disadvantages:
  – Costly to implement
  – Memory requirement and execution time are high
  – Requires powerful computational facilities

The IIR (infinite impulse response), or recursive, filter
• The present output sample depends on the present input, past input samples, and past output samples.
• The impulse response extends over an infinite duration.
• Advantages:
  – Simple to design
  – Efficient
• Disadvantages:
  – Phase response is nonlinear
  – More affected by noise
  – Stability is not guaranteed
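The difference between the two classes shows up directly in their impulse responses. This sketch compares a 4-tap FIR averager with a first-order recursive filter; both filters and their coefficients are illustrative choices, not taken from the slides.

```python
def fir(b, x):
    """Nonrecursive filter: y(n) = sum_k b(k) x(n - k)."""
    return [sum(b[k] * x[n - k] for k in range(min(len(b), n + 1)))
            for n in range(len(x))]

def iir_first_order(a, x):
    """Recursive filter: y(n) = x(n) + a * y(n - 1)."""
    y, previous = [], 0.0
    for sample in x:
        previous = sample + a * previous
        y.append(previous)
    return y

impulse = [1.0] + [0.0] * 19

fir_response = fir([0.25] * 4, impulse)       # exactly zero after 4 samples
iir_response = iir_first_order(0.5, impulse)  # decays as 0.5**n, never reaches zero
unstable = iir_first_order(1.1, impulse)      # |a| > 1: response grows without bound
```

The FIR response ends after as many samples as there are taps, while the recursive filter's response continues indefinitely and is stable only when the feedback coefficient has magnitude below one.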

FIR Filters (figure)
• A less expensive implementation can be derived by representing each bandpass filter by a fixed lowpass window, $w(n)$, modulated by the complex exponential $e^{j\omega_i n}$:
  $$h_i(n) = w(n)\, e^{j\omega_i n}$$

Frequency Domain Interpretation of the Short-Time Fourier Transform
$$S_n(e^{j\omega_i}) = \sum_{m} s(m)\, w(n-m)\, e^{-j\omega_i m} \qquad \text{(A)}$$
At $n = n_0$:
$$S_{n_0}(e^{j\omega_i}) = \mathrm{FT}\big[s(m)\, w(n_0 - m)\big]\Big|_{\omega = \omega_i}$$
where FT[·] denotes the Fourier transform. $S_{n_0}(e^{j\omega_i})$ is the conventional Fourier transform of the windowed signal, $s(m)\,w(n_0-m)$, evaluated at the frequency $\omega = \omega_i$.

• (figure) Shows which parts of $s(m)$ are used in the computation of the short-time Fourier transform.
• Since $w(m)$ is an FIR filter of length $L$, from the definition of $S_n(e^{j\omega_i})$ we can state that:
  – if $L$ is large relative to the signal periodicity, $S_n(e^{j\omega_i})$ gives good frequency resolution;
  – if $L$ is small relative to the signal periodicity, $S_n(e^{j\omega_i})$ gives poor frequency resolution.
• (figure) An $L = 500$ point Hamming window is applied to a section of voiced speech. The periodicity of the signal is seen in the windowed time waveform as well as in the short-time spectrum, in which the fundamental frequency and its harmonics show up as narrow peaks at equally spaced frequencies.
• (figure) For short windows, the time sequence $s(m)\,w(n-m)$ does not show the signal periodicity, nor does the signal spectrum. The spectrum does show the broad spectral envelope very well.
• (figure, unvoiced speech) The spectrum shows an irregular series of local peaks and valleys due to the random nature of unvoiced speech.
• (figure, unvoiced speech) Using the shorter window smooths out the random fluctuations in the short-time spectral magnitude and shows the broad spectral envelope very well.
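The effect of window length on frequency resolution can be reproduced with a synthetic signal. The sketch below uses a hypothetical "voiced" signal made of 20 harmonics of a 100 Hz pitch at a 10 kHz sampling rate (both values are assumptions) and compares the short-time spectral magnitude on a harmonic with the magnitude halfway between two harmonics.

```python
import cmath
import math

def hamming(length):
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
            for n in range(length)]

def short_time_magnitude(s, start, length, freq_hz, fs):
    """|Fourier transform| of a Hamming-windowed segment of s at one frequency."""
    w = hamming(length)  # symmetric, so w(m) and w(-m) give the same magnitude
    omega = 2 * math.pi * freq_hz / fs
    return abs(sum(s[start + m] * w[m] * cmath.exp(-1j * omega * m)
                   for m in range(length)))

# Hypothetical voiced-like signal: 20 harmonics of a 100 Hz pitch, fs = 10 kHz.
fs, pitch = 10000, 100
s = [sum(math.cos(2 * math.pi * pitch * k * n / fs) for k in range(1, 21))
     for n in range(600)]

def peak_to_valley(start, length):
    on_harmonic = short_time_magnitude(s, start, length, 1000, fs)
    between = short_time_magnitude(s, start, length, 1050, fs)
    return on_harmonic / between

long_ratio = peak_to_valley(0, 500)    # 500-point window spans five pitch periods
short_ratio = peak_to_valley(75, 50)   # 50-point window spans half a pitch period
```

With the long window the harmonic stands far above the valley next to it, so individual harmonics are resolved. With the short window the two magnitudes are nearly equal, so only the smooth envelope remains.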

Linear Filtering Interpretation of the Short-Time Fourier Transform
• From (A), the short-time Fourier transform can be written as
  $$S_n(e^{j\omega_i}) = \big[s(n)\, e^{-j\omega_i n}\big] * w(n),$$
  where $*$ denotes convolution.
• That is, $S_n(e^{j\omega_i})$ is the convolution of the lowpass window, $w(n)$, with the speech signal, $s(n)$, modulated to the center frequency $\omega_i$.
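The two interpretations can be compared numerically. This sketch evaluates the short-time Fourier transform once as the Fourier transform of a windowed segment and once as a modulate-then-filter operation; the test signal, window length, and function names are illustrative assumptions.

```python
import cmath
import math

def stft_direct(s, w, n, omega):
    """Fourier transform of the windowed segment s(m) w(n - m) at frequency omega."""
    return sum(s[m] * w[n - m] * cmath.exp(-1j * omega * m)
               for m in range(len(s)) if 0 <= n - m < len(w))

def convolve_at(x, h, n):
    """Output sample n of the convolution of x with impulse response h."""
    return sum(h[k] * x[n - k] for k in range(len(h)) if 0 <= n - k < len(x))

def stft_filtering(s, w, n, omega):
    """Modulate s(n) by exp(-j omega n), then filter with the lowpass window w."""
    modulated = [s[m] * cmath.exp(-1j * omega * m) for m in range(len(s))]
    return convolve_at(modulated, w, n)

# Hypothetical test data: two sinusoids and a 16-point Hamming window.
s = [math.sin(0.3 * m) + 0.5 * math.cos(1.1 * m) for m in range(64)]
w = [0.54 - 0.46 * math.cos(2 * math.pi * k / 15) for k in range(16)]

direct = stft_direct(s, w, 40, 0.3)
filtered = stft_filtering(s, w, 40, 0.3)
```

Both routes give the same complex value, which is why each channel of the filter bank can be viewed either as a spectrum sample or as the output of a filter.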

FFT Implementation of Uniform Filter Bank Based on the Short-Time FT
(figures)
The FFT implementation is more efficient than the direct-form structure.
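The efficiency gain comes from computing all uniformly spaced channels of one windowed frame in a single transform. The sketch below uses a small recursive radix-2 FFT written for illustration (not a production routine) and checks it against a channel-by-channel sum; the frame length and test tone are assumptions.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def channel_direct(frame, k):
    """One channel computed on its own: N multiplications per channel."""
    n = len(frame)
    return sum(frame[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))

# Hypothetical frame: 32 samples of a tone at channel 4, Hamming windowed.
N = 32
window = [0.54 - 0.46 * math.cos(2 * math.pi * m / (N - 1)) for m in range(N)]
frame = [window[m] * math.sin(2 * math.pi * 4 * m / N) for m in range(N)]

all_channels = fft(frame)   # every channel from one transform
max_error = max(abs(all_channels[k] - channel_direct(frame, k)) for k in range(N))
strongest = max(range(N // 2), key=lambda k: abs(all_channels[k]))
```

Computing N channels one at a time costs on the order of N² multiplications per frame, while the FFT needs on the order of N log2 N.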

Nonuniform FIR Filter Bank Implementations
• (figure) The most general form of a nonuniform FIR filter bank.
• The kth bandpass filter impulse response, $h_k(n)$, represents a filter with center frequency $\omega_k$ and bandwidth $\Delta\omega_k$.
• The set of $Q$ bandpass filters covers the frequency range of interest for the intended speech recognition application.
• Each bandpass filter is implemented via direct convolution.
• Each bandpass filter is designed via the windowing design method.
• The composite frequency response of the Q-channel filter bank is independent of the number and distribution of the individual filters.
• (figure) A filter bank with three filters has exactly the same composite frequency response as a filter bank with seven filters.

• The impulse response of the kth bandpass filter is
  $$h_k(n) = w(n)\, \hat{h}_k(n),$$
  where $w(n)$ is the FIR window and $\hat{h}_k(n)$ is the impulse response of the ideal bandpass filter.
• The frequency response of the kth bandpass filter is
  $$H_k(e^{j\omega}) = W(e^{j\omega}) \circledast \hat{H}_k(e^{j\omega}),$$
  where $\circledast$ denotes periodic convolution in frequency.
• Thus the frequency response of the composite filter bank is
  $$H(e^{j\omega}) = \sum_{k=1}^{Q} H_k(e^{j\omega}) = W(e^{j\omega}) \circledast \sum_{k=1}^{Q} \hat{H}_k(e^{j\omega}) \qquad (1)$$
• For contiguous ideal filters, $\sum_{k=1}^{Q} \hat{H}_k(e^{j\omega})$ equals 1 for $\omega_{min} \le |\omega| \le \omega_{max}$ and 0 otherwise, where $\omega_{min}$ is the lowest frequency in the filter bank and $\omega_{max}$ is the highest frequency.
• Equation (1) can therefore be written as
  $$H(e^{j\omega}) = W(e^{j\omega}) \circledast \hat{H}(e^{j\omega}),$$
  where $\hat{H}(e^{j\omega})$ is the ideal response of a single band from $\omega_{min}$ to $\omega_{max}$. This is independent of the number of ideal filters, $Q$, and their distribution in frequency.
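The independence from Q can be checked in the time domain, where the composite impulse response is the window times the sum of the ideal bandpass responses. This is a sketch assuming contiguous ideal bands, a 41-tap Hamming window, and arbitrarily chosen band edges; all names and values are hypothetical.

```python
import math

def hamming(length):
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
            for n in range(length)]

def ideal_bandpass(lo, hi, n):
    """Impulse response at offset n of an ideal bandpass filter (edges in rad/sample)."""
    if n == 0:
        return (hi - lo) / math.pi
    return (math.sin(hi * n) - math.sin(lo * n)) / (math.pi * n)

def composite_impulse_response(edges, taps=41):
    """Sum over k of w(n) * h_hat_k(n) for contiguous bands given by their edges."""
    mid = (taps - 1) // 2
    w = hamming(taps)
    bands = list(zip(edges[:-1], edges[1:]))
    return [w[n] * sum(ideal_bandpass(lo, hi, n - mid) for lo, hi in bands)
            for n in range(taps)]

pi = math.pi
three = composite_impulse_response([0.1 * pi, 0.2 * pi, 0.4 * pi, 0.8 * pi])
seven = composite_impulse_response(
    [0.1 * pi, 0.15 * pi, 0.2 * pi, 0.3 * pi, 0.4 * pi, 0.5 * pi, 0.65 * pi, 0.8 * pi])
single = composite_impulse_response([0.1 * pi, 0.8 * pi])

max_diff = max(abs(a - b) for a, b in zip(three, seven))
```

The three-band and seven-band banks give the same composite response as a single windowed filter spanning the whole range, because the ideal responses sum to the same band regardless of how it is divided.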

FFT-Based Nonuniform Filter Banks
• Nonuniformity can be created by combining two or more uniform channels.
• Consider taking an N-point DFT of the sequence $x(n)$:
  $$X_k = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi nk/N}, \qquad 0 \le k \le N-1.$$
• If channels $k$ and $k+1$ are added, the equivalent kth channel value is
  $$X_k' = X_k + X_{k+1} = \sum_{n=0}^{N-1} \big[x(n)\, 2 e^{-j\pi n/N} \cos(\pi n/N)\big]\, e^{-j 2\pi nk/N},$$
  i.e. $X_k'$ can be obtained by weighting the sequence $x(n)$ by the complex sequence $2\exp(-j\pi n/N)\cos(\pi n/N)$.
• If more than two channels are combined, a different equivalent weighting sequence results.
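The weighting-sequence identity for two combined channels can be verified with a direct DFT. The test sequence and the channel index below are arbitrary illustrative choices.

```python
import cmath
import math

def dft(x):
    """Direct N-point DFT."""
    n_pts = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * n * k / n_pts)
                for n in range(n_pts))
            for k in range(n_pts)]

N = 16
x = [math.sin(0.7 * n) + 0.3 * n for n in range(N)]   # hypothetical input sequence

# Route 1: add two adjacent uniform channels after the DFT.
X = dft(x)
k = 3
combined = X[k] + X[k + 1]

# Route 2: weight x(n) by 2 exp(-j pi n / N) cos(pi n / N), then take one DFT channel.
weighted = [x[n] * 2 * cmath.exp(-1j * math.pi * n / N) * math.cos(math.pi * n / N)
            for n in range(N)]
equivalent = dft(weighted)[k]
```

Both routes give the same channel value, since 1 + exp(-j2πn/N) = 2 exp(-jπn/N) cos(πn/N).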

Tree Structure Realizations of Nonuniform Filter Banks
• In this method the speech signal is filtered in stages, and the sampling rate is successively reduced at each stage.
• (figure)
• The original speech signal, $s(n)$, is first filtered into two bands, a low band and a high band.
• The high band is downsampled by 2 and represents the highest octave band ($\pi/2 \le \omega \le \pi$) of the filter bank.
• The low band is similarly downsampled by 2 and fed into a second filtering stage, in which the signal is again split into two equal bands.
• Again, the high band of stage 2 is downsampled by 2 and is used as the next-highest filter bank output.
• The low band is also downsampled by 2 and fed into a third stage of filters.
• The third-stage outputs, after downsampling by a factor of 2, are used as the two lowest filter bands.
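The staged structure can be sketched with the simplest possible two-band split. The slides do not specify the filters, so the 2-tap Haar sum and difference pair used here is an assumption chosen for brevity; a practical system would use longer lowpass and highpass filters. Function names are hypothetical.

```python
import math

def split_band(x):
    """Two-band split with downsampling by 2, using the Haar filter pair (assumed)."""
    root2 = math.sqrt(2)
    low = [(x[2 * m] + x[2 * m + 1]) / root2 for m in range(len(x) // 2)]
    high = [(x[2 * m] - x[2 * m + 1]) / root2 for m in range(len(x) // 2)]
    return low, high

def octave_tree(s, stages=3):
    """Repeatedly split the low band; returns bands ordered highest to lowest."""
    bands = []
    low = s
    for _ in range(stages):
        low, high = split_band(low)
        bands.append(high)   # high band of this stage is a filter bank output
    bands.append(low)        # remaining low band is the lowest output
    return bands

# Hypothetical input: 64 samples containing a slow and a fast component.
s = [math.sin(0.2 * n) + 0.5 * math.sin(2.8 * n) for n in range(64)]
bands = octave_tree(s)
lengths = [len(b) for b in bands]   # each stage halves the sampling rate
```

Three stages yield four outputs: the highest octave at half the original rate, the next at a quarter, and the two lowest bands at an eighth.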

Summary of considerations for speech recognition filter banks

1st. Type of digital filter used: IIR (recursive) or FIR (nonrecursive)
• IIR: Advantage: simple to implement and efficient. Disadvantage: phase response is nonlinear.
• FIR: Advantage: phase response is linear. Disadvantage: expensive to implement.

2nd. The number of filters to be used in the filter bank
1. For uniform filter banks the number of filters, Q, cannot be too small, or else the ability of the filter bank to resolve the speech spectrum is greatly impaired. Values of Q less than 8 are generally avoided.
2. The value of Q cannot be too large, because the filter bandwidths would eventually be too narrow for some talkers (e.g. high-pitched female voices), i.e. no prominent harmonics would fall within the band. In practical systems, Q ≤ 32.

In order to reduce overall computation, many practical systems have used nonuniformly spaced filter banks.

3rd. The choice of nonlinearity and LPF used at the output of each channel
• Nonlinearity: full-wave or half-wave rectifier
• LPF: varies from a simple integrator to a good-quality IIR lowpass filter
