Speech & Audio Processing
Speech & Audio Coding Examples
Veton Këpuska

A Simple Speech Coder
• LPC-Based Analysis Structure
  [Block diagram; recoverable labels: Audio Input, Pre-emphasis, Windowing, Linear Prediction Analysis (Autocorrelation, Levinson-Durbin), Analysis Filter, Quantization, Residual, Filter Coeffs]
9/30/2020, Veton Këpuska

Windowing Analysis Stage
• N: length of the analysis window, typically 10-30 ms
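
The relation between the window duration and N can be sketched as follows. This is a minimal illustration, not the slide's own code: the sampling rate, the 20 ms duration, the synthetic test tone, and the choice of a Hamming window are all assumptions made for the example. The window is written out explicitly so that no toolbox function is required.

```matlab
% Sketch: derive the window length N from a duration, then window one frame.
fs       = 8000;                   % assumed sampling rate in Hz
frameDur = 0.020;                  % assumed 20 ms, inside the 10-30 ms range
N        = round(frameDur * fs);   % analysis window length in samples

t = (0:fs-1)' / fs;                % one second of a synthetic test signal
x = sin(2*pi*200*t);               % assumed 200 Hz test tone

n = (0:N-1)';
w = 0.54 - 0.46*cos(2*pi*n/(N-1)); % Hamming window, written out explicitly

frame = x(1:N) .* w;               % first windowed analysis frame
```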

Some Analysis Windows

MATLAB Useful Functions
• wintool
  - Use “doc wintool” for more information
• window
  - Use “doc window” for the list of supported windows
• Define your own window if needed, e.g.:
  - Sine window and Vorbis window
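
Defining your own window takes only a few lines. The sketch below uses the commonly quoted definitions of the sine and Vorbis windows (the slide does not give formulas, so these are assumptions) and checks the power-complementarity condition that makes both suitable for 50%-overlap transforms. The length N = 64 is an arbitrary example value.

```matlab
% Sketch: user-defined sine and Vorbis windows of even length N.
N = 64;                                  % assumed window length
n = (0:N-1)';

wSine   = sin(pi*(n + 0.5)/N);
wVorbis = sin((pi/2) * sin(pi*(n + 0.5)/N).^2);

% Power complementarity: w[n]^2 + w[n + N/2]^2 should equal 1
pbSine   = wSine(1:N/2).^2   + wSine(N/2+1:N).^2;
pbVorbis = wVorbis(1:N/2).^2 + wVorbis(N/2+1:N).^2;

fprintf('max deviation from 1: sine %.2e, Vorbis %.2e\n', ...
        max(abs(pbSine - 1)), max(abs(pbVorbis - 1)));
```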

LPC Analysis Stage
• LPC method described in:
  - Ch5-Analysis_&_Synthesis_of_PoleZero_Speech_Models.ppt
• Summary:
  - Perform autocorrelation
  - Solve the system of equations with the Levinson-Durbin method
• MATLAB help:
  - doc lpc, etc.
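
The two summary steps can be sketched in base MATLAB as follows. This is an illustrative implementation of the textbook autocorrelation method and Levinson-Durbin recursion, not the code from the referenced chapter. The order p, the synthetic test frame, and all variable names are assumptions. The sign convention follows MATLAB's lpc, i.e. A(z) = 1 + a(2) z^-1 + ... + a(p+1) z^-p.

```matlab
% Sketch: autocorrelation method solved with the Levinson-Durbin recursion.
p = 4;                                    % assumed LPC order
rng(0);                                   % repeatable synthetic frame
n = (0:159)';
x = sin(0.3*n) + 0.5*sin(1.1*n + 0.4) + 0.05*randn(size(n));
x = x .* (0.54 - 0.46*cos(2*pi*n/159));   % Hamming-windowed frame

% Autocorrelation lags: r(1) = R[0], ..., r(p+1) = R[p]
L = numel(x);
r = zeros(p+1, 1);
for lag = 0:p
    r(lag+1) = sum(x(1:L-lag) .* x(1+lag:L));
end

% Levinson-Durbin recursion
a = 1;              % order-0 polynomial
E = r(1);           % order-0 prediction error energy
k = zeros(p, 1);    % reflection (PARCOR) coefficients
for i = 1:p
    acc  = a * r(i+1:-1:2);               % R[i] + sum_j a_j R[i-j]
    k(i) = -acc / E;
    a    = [a 0] + k(i) * fliplr([a 0]);  % order update of the polynomial
    E    = E * (1 - k(i)^2);              % updated prediction error energy
end
```

The result can be cross-checked against a direct solution of the normal equations, `-(toeplitz(r(1:p)) \ r(2:p+1))`, which should match `a(2:end)`.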

Example of MATLAB Code

    function myLPCCodec(wavfile, N)
    % myLPCCodec  LPC analysis and lossless resynthesis of a speech file.
    %   wavfile - input MS wav file
    %   N       - LPC filter order

    % wavread has been removed from MATLAB; audioread replaces it
    [x, fs] = audioread(wavfile);
    x = x(:, 1);                 % use the first channel only
    % plot(x);

    % Playing original signal
    soundsc(x, fs);

    % Performing LPC analysis using the MATLAB lpc function.
    % a = [1 a(2) ... a(N+1)]; the second output of lpc (the prediction
    % error variance g) is not needed for lossless resynthesis.
    a = lpc(x, N);

    % Filtering with the estimated coefficients to produce
    % the predicted samples ŝ[n]
    est_x = filter([0 -a(2:end)], 1, x);

    % Error (residual) signal e[n]
    e = x - est_x;

    % Testing the quality of predicted samples
    soundsc(est_x, fs);

    % Synthesis stage with zero loss of information:
    % the all-pole filter 1/A(z) inverts the analysis filter A(z)
    syn_x = filter(1, a, e);
    soundsc(syn_x, fs);

Analysis of Quantization Errors
• Use MATLAB functions to research the effects of quantization errors introduced by the precision of the arithmetic operations and by the representation of the filter and error signal:
  - Double (float64) representation (software emulation)
  - Float (float32) representation (software emulation)
  - Int (int32) representation (hardware emulation)
  - Short (int16) representation (hardware emulation)
• Useful MATLAB functions:
  - fix, floor, round, ceil
  - Example: sig_hat = fix(sig*2^(B-1))/2^(B-1);
    truncates sig to B bits.
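
A small experiment along these lines might look as follows. It is a sketch under assumptions: the test signal, the word length B = 8, and the variable names are illustrative, and the signal is assumed to be scaled into (-1, 1). It compares truncation (fix) with rounding (round) and emulates float32 storage with a cast.

```matlab
% Sketch: B-bit quantization by truncation and by rounding, with SNR.
B   = 8;                           % assumed word length in bits
t   = (0:999)' / 1000;
sig = 0.9 * sin(2*pi*5*t);         % assumed test signal inside (-1, 1)

q        = 2^(B-1);                % number of steps per unit amplitude
sigTrunc = fix(sig*q) / q;         % truncation toward zero (slide's example)
sigRound = round(sig*q) / q;       % rounding to the nearest level

errTrunc = sig - sigTrunc;
errRound = sig - sigRound;
snrTrunc = 10*log10(sum(sig.^2) / sum(errTrunc.^2));
snrRound = 10*log10(sum(sig.^2) / sum(errRound.^2));

sigSingle = double(single(sig));   % float32 emulation by casting
errSingle = max(abs(sig - sigSingle));

fprintf('SNR: truncation %.1f dB, rounding %.1f dB\n', snrTrunc, snrRound);
```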

Quantization of Error Signal & Filter Coefficients
• ADPCM can be applied to the error signal.
• Filter coefficients in the direct form are sensitive to quantization errors:
  - A small quantization error can have a large effect on the filter characteristics.
  - The issue is that the polynomial coefficients map nonlinearly to the poles of the filter (i.e., the roots of the polynomial).
  - Alternative representations exist that tolerate quantization error significantly better.
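
The sensitivity of direct-form coefficients can be explored with a sketch like the one below. The pole locations and the quantization step are assumed example values, chosen only so that the poles lie close together near the unit circle. The script quantizes the polynomial coefficients and prints how far the largest pole radius moves, so that the movement can be compared with the coefficient error.

```matlab
% Sketch: effect of coefficient quantization on the poles of 1/A(z).
poles = 0.95 * exp(1j*pi*[0.20 -0.20 0.25 -0.25]);  % assumed clustered poles
a     = real(poly(poles));         % direct-form coefficients, a(1) = 1

step = 2^-5;                       % assumed quantization step
aQ   = round(a/step) * step;       % quantized coefficients

pOrig  = roots(a);
pQuant = roots(aQ);

coefErr     = max(abs(a - aQ));    % largest coefficient error
radiusOrig  = max(abs(pOrig));     % largest pole radius before
radiusQuant = max(abs(pQuant));    % largest pole radius after

fprintf('max coefficient error: %.4f\n', coefErr);
fprintf('max pole radius: %.4f -> %.4f\n', radiusOrig, radiusQuant);
```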

LPC Filter Representations
• As noted previously, when the Levinson-Durbin algorithm was introduced, one alternative representation of the filter coefficients was also mentioned: PARCOR coefficients.
• LPC to PARCOR:

PARCOR Filter Representation
• PARCOR to LPC:
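
The slide equations for the two conversions are not recoverable from this transcript, so the sketch below uses the standard step-up and step-down recursions instead, with the sign convention of MATLAB's lpc and poly2rc (the last coefficient of the order-i polynomial is k(i)). The example reflection coefficients are assumed values. The Signal Processing Toolbox functions rc2poly and poly2rc perform the same conversions.

```matlab
% Sketch: PARCOR -> LPC (step-up) and LPC -> PARCOR (step-down).
k = [0.5 -0.3 0.2];                  % assumed reflection coefficients, |k| < 1
p = numel(k);

% Step-up: build A(z) one order at a time
a = 1;
for i = 1:p
    a = [a 0] + k(i) * fliplr([a 0]);
end

% Step-down: peel off one order at a time
kBack = zeros(1, p);
b = a;
for i = p:-1:1
    kBack(i) = b(end);                                % k(i) is the last coefficient
    b = (b - kBack(i) * fliplr(b)) / (1 - kBack(i)^2);
    b = b(1:end-1);                                   % drop the trailing zero
end

disp(a);        % LPC polynomial coefficients
disp(kBack);    % recovered reflection coefficients
```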

Line Spectral Frequency Representation
• It turns out that PARCOR coefficients can be represented with LSFs, which have significantly better properties.
• The PARCOR lattice structure of the LPC synthesis filter above:
  [Lattice diagram; recoverable labels: Input, Output, delays z^-1, reflection coefficients k_p, k_{p-1}, signals A_p, A_{p-1}, A_0 and B_p, B_{p-1}, B_0, closing stage k_{p+1} = ∓1]
• Note that: k_0 = -1

Line Spectral Frequency Representation
• From the previous slide, the following holds:
• From this realization of the filter, the LSP representation is derived:

LSF Representation
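
Since the LSF equations did not survive in this transcript, the sketch below follows the standard textbook construction rather than the slide's notation: the lattice is closed with k_{p+1} = ∓1, which gives a symmetric polynomial P(z) and an antisymmetric polynomial Q(z), and the LSFs are the angles of their unit-circle roots in (0, π). The example polynomial and the tolerance are assumptions. The Signal Processing Toolbox function poly2lsf does the same conversion.

```matlab
% Sketch: line spectral frequencies from LPC coefficients.
a = [1 0.29 -0.23 0.2];            % assumed minimum-phase A(z), order p = 3
p = numel(a) - 1;

P = [a 0] + [0 fliplr(a)];         % A(z) + z^-(p+1) A(1/z), symmetric
Q = [a 0] - [0 fliplr(a)];         % A(z) - z^-(p+1) A(1/z), antisymmetric

rootsPQ = [roots(P); roots(Q)];    % all roots lie on the unit circle
ang     = angle(rootsPQ);

tol = 1e-6;                        % excludes the trivial roots at z = 1, -1
lsf = sort(ang(ang > tol & ang < pi - tol));

disp(lsf.');                       % p frequencies in radians, ascending
```

For a minimum-phase A(z) the sorted frequencies alternate between roots of P(z) and Q(z), which is the property that makes LSFs easy to check for stability after quantization.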

LPC Synthesis Filter with LSF

A Simple Speech Coder
• LPC-Based Synthesis Structure
  [Block diagram; recoverable labels: Residual, Decoding, Residual Signal, Filter Coeffs, Synthesis Filter, De-emphasis, Audio Output]

Audio Coding

Audio Coding
• Most audio coding standards use principles of psychoacoustics.
• Example of the basic structure of an MP3 encoder:
  [Block diagram; recoverable labels: Audio Input, Filterbank & Transform, Quantization, Bit-stream, Psychoacoustic Model]

Basic Structure of Audio Coders
• Filterbank Processing
• Psychoacoustic Model
• Quantization

Filter Bank Analysis Synthesis

Filterbank Processing:
• Splitting the full-band signal into several subbands:
  - Uniform sub-bands (FFT)
  - Critical bands (FFT followed by a nonlinear transformation)
    · Reflect the human auditory apparatus.
    · Mel-scale and Bark-scale transformations

Mel-Scale

Bark-Scale
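
The formulas shown on the Mel-scale and Bark-scale slides are not recoverable from this transcript. As a stand-in, the sketch below uses two widely quoted approximations: the 2595·log10(1 + f/700) mel formula and the Zwicker-Terhardt arctangent formula for the Bark scale. The function handle names and the test frequencies are assumptions for illustration.

```matlab
% Sketch: Hz -> mel and Hz -> Bark using common approximations.
hz2melSketch  = @(f) 2595 * log10(1 + f/700);
hz2barkSketch = @(f) 13*atan(0.00076*f) + 3.5*atan((f/7500).^2);

f    = [100 500 1000 4000 8000];   % assumed test frequencies in Hz
mel  = hz2melSketch(f);
bark = hz2barkSketch(f);

% Both scales are nearly linear at low frequencies and compress
% high frequencies, which is what critical-band grouping relies on.
disp([f; mel; bark].');
```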

Analysis Structure of Filterbank
h_k[n]: impulse response of the k-th quadrature mirror filter
N: number of channels, typically 32
↓: down-sampling
MDCT: Modified Discrete Cosine Transform
[Block diagram: Audio Input → h_1[n], ..., h_k[n], ..., h_N[n] → ↓ → MDCT → Quantization → Bit Stream]

Synthesis Structure of Filterbank
g_k[n]: impulse response of the k-th inverse quadrature mirror filter
N: number of channels, typically 32
↑: up-sampling
IMDCT: Inverse Modified Discrete Cosine Transform
[Block diagram: Bit Stream → Decoding → IMDCT → ↑ → g_1[n], ..., g_k[n], ..., g_N[n] → Audio Output]
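
The MDCT/IMDCT pair in these diagrams can be illustrated in isolation. The sketch below covers only the transform stage (no subband filters h_k[n] or g_k[n], no quantization) and shows that a sine-windowed MDCT with 50% overlap followed by a windowed IMDCT and overlap-add reconstructs the input. The block size M, the test signal, and the explicit basis matrix are assumptions chosen for clarity rather than speed.

```matlab
% Sketch: windowed MDCT analysis and IMDCT overlap-add synthesis.
M = 16;                                       % coefficients per frame (hop)
n = (0:2*M-1)';                               % time index within a frame
k = 0:M-1;                                    % coefficient index
C = cos(pi/M * (n + 0.5 + M/2) * (k + 0.5));  % 2M-by-M MDCT basis
w = sin(pi * (n + 0.5) / (2*M));              % sine window

x  = sin(2*pi*0.05*(0:8*M-1)');               % assumed test signal, 8 hops
xp = [zeros(M,1); x; zeros(M,1)];             % pad one hop on each side
y  = zeros(size(xp));

numFrames = numel(xp)/M - 1;
for j = 0:numFrames-1
    idx = j*M + (1:2*M)';
    X = C' * (w .* xp(idx));                  % MDCT: 2M samples -> M coeffs
    y(idx) = y(idx) + (2/M) * w .* (C * X);   % IMDCT, window, overlap-add
end

reconErr = max(abs(y(M+1:end-M) - x));        % aliasing cancels in the sum
fprintf('max reconstruction error: %.2e\n', reconErr);
```

Each frame on its own contains time-domain aliasing; it cancels only when neighboring frames are added, which is why the window must satisfy w[n]^2 + w[n+M]^2 = 1.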

Psycho-Acoustic Modeling

Psychoacoustic Model
• Masking threshold according to human auditory perception.
  - The masking threshold is used to quantize the discrete cosine transform coefficients.
  - Analysis is done in the frequency domain, represented by the DFT and computed by the FFT.

Threshold of Hearing
• Absolute threshold of audibly perceptible events in quiet conditions (no other sounds).
• Any signal below the threshold can be removed without affecting perception.
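
A common closed-form approximation of the threshold in quiet is Terhardt's formula, sketched below. The slides do not state which formula they plot, so its use here is an assumption, as are the test frequencies and the hypothetical 0 dB SPL component level.

```matlab
% Sketch: threshold in quiet (dB SPL) by Terhardt's approximation.
Tq = @(f) 3.64*(f/1000).^(-0.8) ...
        - 6.5*exp(-0.6*(f/1000 - 3.3).^2) ...
        + 1e-3*(f/1000).^4;

f   = [100 1000 3300 10000];       % assumed test frequencies in Hz
thr = Tq(f);                       % threshold at each frequency

levelDb = 0;                       % hypothetical component level, dB SPL
audible = levelDb > thr;           % components at or below thr can be dropped

disp([f; thr; audible].');
```

The ear is most sensitive around 3 to 4 kHz, so the same 0 dB SPL component is kept there and discarded at the other test frequencies.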

Threshold of Hearing

Frequency Masking
• Schröder Spreading Function
• Bark Scale Function:
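
The equations for these two functions are missing from this transcript. The sketch below uses the commonly cited form of the Schröder spreading function, with the frequency distance dz measured in Bark from the masker to the probe tone, and the Zwicker-Terhardt Bark approximation. Both are assumptions about what the slide showed. The masker and probe frequencies are example values.

```matlab
% Sketch: Schroeder spreading function evaluated around a 1 kHz masker.
bark     = @(f) 13*atan(0.00076*f) + 3.5*atan((f/7500).^2);
spreadDb = @(dz) 15.81 + 7.5*(dz + 0.474) ...
               - 17.5*sqrt(1 + (dz + 0.474).^2);

fMasker = 1000;                    % masker frequency in Hz
fProbe  = [900 5000];              % assumed probe tones in Hz

dz    = bark(fProbe) - bark(fMasker);  % distance from the masker in Bark
attDb = spreadDb(dz);                  % masker spread relative to its peak

fprintf('%5d Hz: dz = %6.2f Bark, spread = %7.1f dB\n', [fProbe; dz; attDb]);
```

The spread stays within a few dB of its peak for the nearby probe and drops by tens of dB for the distant one, which is why a nearby tone is masked far more easily than a distant one.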

Masking Curve

Primary Tone 1 kHz

Masked Tone 900 Hz

Combined Sound 1 kHz + 0.9 kHz

Combined 1 kHz + 0.9 kHz (-10 dB)

Combined 1 kHz + 5 kHz (-10 dB)