Linear Prediction

Outline
- Windowing
- LPC
- Introduction to Vocoders
- Excitation modeling
  - Pitch Detection

Short-Time Processing
- The speech signal is inherently non-stationary.
- For continuant phonemes there are stationary periods of at least 20-25 ms.
- The short-time speech frames are assumed stationary.
- The frame length should be chosen to include just one phoneme or allophone.
- Frame lengths are usually chosen to be between 10 and 50 ms.
- We consider rectangular and Hamming windows here.

Rectangular Window

Hamming Window

Comparison of Windows

Comparison of Windows (cont’d)
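As a minimal sketch of short-time processing, the following splits a signal into frames and applies a rectangular or Hamming window. The function names, the 8 kHz sampling rate, the 25 ms frame length, and the 10 ms hop are illustrative assumptions, not values taken from the slides (apart from the frame length lying in the 10-50 ms range).

```python
import math

def rectangular_window(n):
    """Rectangular window: every sample is weighted by 1."""
    return [1.0] * n

def hamming_window(n):
    """Hamming window: w[k] = 0.54 - 0.46 * cos(2*pi*k / (n - 1))."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * k / (n - 1)) for k in range(n)]

def frame_signal(signal, frame_len, hop):
    """Split a signal into overlapping frames; an incomplete last frame is dropped."""
    return [signal[start:start + frame_len]
            for start in range(0, len(signal) - frame_len + 1, hop)]

def apply_window(frame, window):
    """Multiply a frame by a window, sample by sample."""
    return [x * w for x, w in zip(frame, window)]

# Hypothetical settings: 8 kHz sampling, 25 ms frames, 10 ms hop.
fs = 8000
frame_len = fs * 25 // 1000   # 200 samples
hop = fs * 10 // 1000         # 80 samples

# 100 ms test tone at 200 Hz (800 samples), used only as example input.
signal = [math.sin(2.0 * math.pi * 200.0 * n / fs) for n in range(fs // 10)]
frames = frame_signal(signal, frame_len, hop)
windowed = [apply_window(f, hamming_window(frame_len)) for f in frames]
```

The Hamming window tapers each frame toward its edges, while the rectangular window leaves the samples unchanged, so the two differ only in the weights passed to `apply_window`.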

Linear Prediction Coding (LPC)
- LPC is based on an all-pole model of the speech production system.
- In the time domain, this means we can predict s[n] as a function of the p previous signal samples (and the excitation).
- The set of coefficients {a_k} is one way of representing the time-varying filter. There are many other ways to represent this filter (e.g., pole values, lattice filter values, LSP, ...).
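One common way to write the all-pole model and its time-domain form is sketched below. The gain G, the excitation u[n], and the sign convention (predictor terms added, so the denominator carries a minus sign) are assumptions made here; some texts use the opposite sign for the coefficients.

```latex
H(z) = \frac{S(z)}{U(z)} = \frac{G}{1 - \sum_{k=1}^{p} a_k z^{-k}}
\qquad\Longleftrightarrow\qquad
s[n] = \sum_{k=1}^{p} a_k \, s[n-k] + G \, u[n]
```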

LPC parameter estimation
- There are many methods to estimate the LPC parameters:
  - Autocorrelation method: the optimal coefficients {a_k} are the solution of a set of p linear equations.
  - Covariance method
- Procedures such as Levinson-Durbin, Burg, and Le Roux estimate these parameters efficiently.
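The autocorrelation method combined with the Levinson-Durbin recursion can be sketched as follows. This is an illustrative implementation, not the one used in the slides: the function names are hypothetical, and it assumes the convention in which s[n] is predicted as the sum of a_k * s[n-k] for k = 1..p.

```python
def autocorrelation(frame, max_lag):
    """Short-time autocorrelation r[0..max_lag] of one (windowed) frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i - lag] for i in range(lag, n))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve sum_k a_k * r[|i - k|] = r[i], i = 1..order.

    Returns (a, error): a[k - 1] holds a_k, and error is the final
    prediction error energy. Needs order + 1 autocorrelation values.
    """
    if r[0] <= 0.0:
        # Silent frame: no energy to model, so return a zero predictor.
        return [0.0] * order, 0.0
    a = []
    error = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i.
        acc = r[i] - sum(a[j - 1] * r[i - j] for j in range(1, i))
        k = acc / error
        # Update the lower-order coefficients, then append a_i = k.
        a = [a[j - 1] - k * a[i - j - 1] for j in range(1, i)] + [k]
        error *= (1.0 - k * k)
    return a, error

# Example: autocorrelation values of a first-order process with a_1 = 0.5.
coeffs, err = levinson_durbin([1.0, 0.5, 0.25], order=2)
```

The recursion solves the p equations in O(p^2) operations by exploiting the Toeplitz structure of the autocorrelation matrix, instead of the O(p^3) cost of a general linear solver.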

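LPC Parameters in Coding (vocoders)
[Figure: block diagrams of the speech production model. Blocks in the full model: DT impulse generator (pitch period P, voiced), glottal filter G(z), white noise generator (unvoiced), V/UV switch, gain Θ0, vocal tract filter H(z), lip radiation filter R(z), output speech signal s(n). Blocks in the simplified model: DT impulse generator (pitch period P, voiced), white noise generator (unvoiced), V/UV switch, gain Θ0, all-pole filter, output speech signal s(n).]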

Linear Prediction (Introduction):
- The object of linear prediction is to estimate the output sequence from a linear combination of input samples, past output samples, or both.
- The factors a(i) and b(j) are called predictor coefficients.

Linear Prediction (Introduction):
- Many systems of interest to us are describable by a linear, constant-coefficient difference equation.
- If Y(z)/X(z) = H(z), where H(z) is a ratio of polynomials N(z)/D(z), then the coefficients of N(z) and D(z) are given by the predictor coefficients.
- Thus the predictor coefficients give us immediate access to the poles and zeros of H(z).
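A sketch of the difference equation and the corresponding transfer function is given below. The orders p and q, the pairing of a(i) with past outputs and b(j) with inputs, and the sign convention are assumptions made for illustration; texts differ on the sign of the a(i) terms.

```latex
y(n) = \sum_{i=1}^{p} a(i)\, y(n-i) + \sum_{j=0}^{q} b(j)\, x(n-j)

H(z) = \frac{Y(z)}{X(z)} = \frac{N(z)}{D(z)}
     = \frac{\sum_{j=0}^{q} b(j)\, z^{-j}}{1 - \sum_{i=1}^{p} a(i)\, z^{-i}}
```

Under these assumptions the zeros of H(z) are the roots of N(z), determined by the b(j), and the poles are the roots of D(z), determined by the a(i).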

Linear Prediction (Types of System Model):
- There are two important variants:
  - All-pole model (in statistics, the autoregressive (AR) model): the numerator N(z) is a constant.
  - All-zero model (in statistics, the moving-average (MA) model): the denominator D(z) is equal to unity.
- The mixed pole-zero model is called the autoregressive moving-average (ARMA) model.

Linear Prediction (Derivation of LP equations):
- Given a zero-mean signal y(n), the AR model predicts y(n) as a linear combination of its p previous samples.
- The error is the difference between y(n) and this prediction.
- To derive the predictor we use the orthogonality principle, which states that the desired coefficients are those that make the error orthogonal to the samples y(n-1), y(n-2), ..., y(n-p).

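Linear Prediction (Derivation of LP equations):
- Thus we require that the average < > of the product of the error and y(n-j) be zero for j = 1, 2, ..., p.
- Interchanging the operations of averaging and summing, and representing < > by summing over n, we obtain a set of p linear equations in the predictor coefficients.
- The required predictors are found by solving these equations.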

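Linear Prediction (Derivation of LP equations):
- The orthogonality principle also states that the resulting minimum error is given by the average of the product of the error and the signal y(n) itself.
- We can minimize the error over all time, in which case the sums over n become autocorrelations of y(n).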
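The derivation can be written out as follows. This is a reconstruction under stated assumptions: the predictor is taken with a plus sign, the symbol r(k) for the autocorrelation is introduced here for illustration, and E denotes the minimum error energy.

```latex
\begin{align}
\hat{y}(n) &= \sum_{i=1}^{p} a(i)\, y(n-i), \qquad e(n) = y(n) - \hat{y}(n) \\
\langle e(n)\, y(n-j) \rangle &= 0, \qquad j = 1, \dots, p \\
\sum_{i=1}^{p} a(i) \sum_{n} y(n-i)\, y(n-j) &= \sum_{n} y(n)\, y(n-j), \qquad j = 1, \dots, p \\
E = \langle e(n)\, y(n) \rangle &= \sum_{n} y^{2}(n) - \sum_{i=1}^{p} a(i) \sum_{n} y(n)\, y(n-i)
\end{align}

% Minimizing over all time, with r(k) = \sum_{n=-\infty}^{\infty} y(n)\, y(n-k):
\begin{align}
\sum_{i=1}^{p} a(i)\, r(|i-j|) &= r(j), \qquad j = 1, \dots, p \\
E &= r(0) - \sum_{i=1}^{p} a(i)\, r(i)
\end{align}
```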

Linear Prediction (Applications):
- Autocorrelation matching:
  - We have a signal y(n) with known autocorrelation. We model this with an AR system whose transfer function is σ / (1 - A(z)).

Linear Prediction (Order of Linear Prediction):
- The choice of predictor order depends on the analysis bandwidth. The rule of thumb is:
  - For a normal vocal tract, there is an average of about one formant per kilohertz of bandwidth.
  - One formant requires two complex-conjugate poles.
  - Hence for every formant we require two predictor coefficients, or two coefficients per kilohertz of bandwidth.
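The rule of thumb above amounts to a one-line calculation. The sketch below assumes that the analysis bandwidth equals half the sampling rate; the function name is hypothetical.

```python
def lp_order_rule_of_thumb(sampling_rate_hz):
    """Two predictor coefficients per kilohertz of analysis bandwidth.

    Assumption: the analysis bandwidth is the Nyquist bandwidth fs / 2.
    """
    bandwidth_khz = sampling_rate_hz / 2.0 / 1000.0
    return round(2.0 * bandwidth_khz)

order_8k = lp_order_rule_of_thumb(8000)    # 4 kHz bandwidth -> order 8
order_16k = lp_order_rule_of_thumb(16000)  # 8 kHz bandwidth -> order 16
```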

Linear Prediction (AR Modeling of Speech Signal):
- True model:
[Figure: block diagram of the true model. Blocks: DT impulse generator (pitch, voiced), glottal filter G(z), uncorrelated noise generator (unvoiced), V/U switch, gain, volume velocity U(n), vocal tract filter H(z), LP filter R(z), output speech signal s(n).]

Linear Prediction (AR Modeling of Speech Signal):
- Using LP analysis:
[Figure: block diagram of the LP model. Blocks: DT impulse generator (pitch estimate, voiced), white noise generator (unvoiced), V/U switch, gain, all-pole (AR) filter H(z), output speech signal s(n).]
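The LP model of the slide (impulse train or white noise, a gain, and an all-pole filter) can be sketched as a small synthesizer. All names and default values below are illustrative assumptions, and the filter uses the convention s[n] = sum of a_k * s[n-k] + gain * u[n].

```python
import random

def impulse_train(n_samples, pitch_period):
    """Voiced excitation: a unit impulse every pitch_period samples."""
    return [1.0 if n % pitch_period == 0 else 0.0 for n in range(n_samples)]

def white_noise(n_samples, seed=0):
    """Unvoiced excitation: Gaussian white noise (seeded for repeatability)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]

def all_pole_filter(excitation, a, gain):
    """Run s[n] = sum_k a[k-1] * s[n-k] + gain * u[n] with zero initial state."""
    out = []
    for n, u in enumerate(excitation):
        acc = gain * u
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc += ak * out[n - k]
        out.append(acc)
    return out

def synthesize_frame(voiced, n_samples, a, gain, pitch_period=80):
    """Pick the excitation with the V/U decision, then apply the all-pole filter."""
    if voiced:
        u = impulse_train(n_samples, pitch_period)
    else:
        u = white_noise(n_samples)
    return all_pole_filter(u, a, gain)

# Hypothetical frame: 160 samples, one stable pole at z = 0.5.
voiced_frame = synthesize_frame(True, 160, [0.5], gain=1.0, pitch_period=80)
unvoiced_frame = synthesize_frame(False, 160, [0.5], gain=1.0)
```

A real synthesizer would carry the filter state across frames rather than restarting from zero; that is left out here to keep the sketch short.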

Introduction to Vocoders
[Figure: vocoder block diagram. The original speech signal s(n) enters the vocoder analysis stage, which outputs V/UV, pitch, and filter parameters; these pass through the channel (or storage) to the vocoder synthesizer, which outputs the synthesized speech signal ŝ(n).]
- Besides the estimation of the vocal tract parameters, a vocoder needs excitation estimation.
- In early vocoders, this was achieved by estimating V/UV, pitch, and gain.
- More modern vocoders use more sophisticated estimation of the excitation, such as in CELP, where vector quantization is used.

Pitch Detection
- Because the speech signal in voiced frames is quasi-periodic (and not fully periodic), pitch detection is not always easy.
- Pitch detection is especially difficult in phonemes that manifest less periodic behavior.
- Some pitch detection methods:
  - AMDF (Average Magnitude Difference Function)
  - Autocorrelation with center clipping
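Both methods listed above can be sketched in a few lines. This is an illustrative version only: the function names, the clipping ratio of 0.3, and the lag search range are assumptions, and no voiced/unvoiced decision is made.

```python
import math

def amdf(frame, lag):
    """Average magnitude difference between the frame and its lagged copy."""
    n = len(frame) - lag
    return sum(abs(frame[i] - frame[i + lag]) for i in range(n)) / n

def pitch_by_amdf(frame, min_lag, max_lag):
    """Pitch period estimate: the lag with the deepest AMDF valley."""
    return min(range(min_lag, max_lag + 1), key=lambda lag: amdf(frame, lag))

def center_clip(frame, ratio=0.3):
    """Zero samples below the clipping level and shift the rest toward zero."""
    level = ratio * max(abs(x) for x in frame)
    clipped = []
    for x in frame:
        if x > level:
            clipped.append(x - level)
        elif x < -level:
            clipped.append(x + level)
        else:
            clipped.append(0.0)
    return clipped

def pitch_by_autocorrelation(frame, min_lag, max_lag, ratio=0.3):
    """Pitch period estimate: the lag with the highest autocorrelation peak
    of the center-clipped frame."""
    clipped = center_clip(frame, ratio)

    def r(lag):
        return sum(clipped[i] * clipped[i + lag]
                   for i in range(len(clipped) - lag))

    return max(range(min_lag, max_lag + 1), key=r)

# Synthetic voiced-like frame: 100 Hz fundamental plus one harmonic at 8 kHz,
# so the true pitch period is 80 samples.
fs = 8000
frame = [math.sin(2.0 * math.pi * 100.0 * n / fs)
         + 0.5 * math.sin(2.0 * math.pi * 200.0 * n / fs) for n in range(400)]
period_amdf = pitch_by_amdf(frame, min_lag=20, max_lag=120)
period_acf = pitch_by_autocorrelation(frame, min_lag=20, max_lag=120)
```

The search range is kept below twice the true period here because a perfectly periodic signal has equally deep AMDF valleys at every multiple of the period; practical detectors need extra logic to avoid such pitch-doubling errors.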