Speech & Audio Processing Speech & Audio Coding Examples

A Simple Speech Coder
  • LPC-Based Analysis Structure
[Block diagram. Labels: Audio Input, Preemphasis, Windowing, Linear Prediction Analysis, Autocorrelation, Levinson-Durbin, Analysis Filter, Residual, Filter Coeffs, Quantization]
11/30/2020 Veton Këpuska

Windowing Analysis Stage
  • N: length of the analysis window (10-30 msec)
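The window length in samples follows from the sampling rate. A minimal MATLAB sketch of the framing step, assuming an 8 kHz sampling rate, a 20 msec Hamming window, 50% overlap, and a synthetic tone as input (all illustrative choices, none taken from the slides):

```matlab
fs = 8000;                                  % assumed sampling rate in Hz
winMs = 20;                                 % assumed window length in msec (within 10-30)
N = round(fs * winMs / 1000);               % window length in samples
hop = N / 2;                                % assumed 50% frame overlap
x = sin(2*pi*200*(0:fs-1)'/fs);             % 1 s synthetic 200 Hz tone as stand-in input
w = 0.54 - 0.46*cos(2*pi*(0:N-1)'/(N-1));   % Hamming window of length N
numFrames = floor((length(x) - N) / hop) + 1;
frames = zeros(N, numFrames);
for m = 1:numFrames
    start = (m-1)*hop;                      % zero-based offset of frame m
    frames(:, m) = w .* x(start + (1:N)');  % windowed analysis frame
end
fprintf('N = %d samples per window, %d frames\n', N, numFrames);
```

Each column of frames is one windowed segment that would be passed on to the LPC analysis.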

Some Analysis Windows

MATLAB Useful Functions
  • wintool
      - Use "doc wintool" for more information
  • window
      - Use "doc window" for the list of supported windows
  • Define your own window if needed, e.g.:
      - Sine window and Vorbis window
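As an illustration of defining your own window, the sine and Vorbis windows can be written directly from their usual closed forms. This is a sketch in base MATLAB; the length N = 512 is an assumed value, and the formulas are the commonly quoted definitions rather than something shown on the slide:

```matlab
N = 512;                                        % assumed (even) window length
n = (0:N-1)';
wSine   = sin(pi*(n + 0.5)/N);                  % sine window
wVorbis = sin(pi/2 * sin(pi*(n + 0.5)/N).^2);   % Vorbis window
% Both windows are power complementary: w[n]^2 + w[n + N/2]^2 = 1
h = (1:N/2)';
devSine   = max(abs(wSine(h).^2   + wSine(h + N/2).^2   - 1));
devVorbis = max(abs(wVorbis(h).^2 + wVorbis(h + N/2).^2 - 1));
fprintf('max deviation from 1: sine %.2e, Vorbis %.2e\n', devSine, devVorbis);
```

The power-complementary property is what makes these windows suitable for 50% overlapped transforms such as the MDCT.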

LPC Analysis Stage
  • LPC method described in:
      - Ch5-Analysis_&_Synthesis_of_Pole-Zero_Speech_Models.ppt
  • Summary:
      - Perform autocorrelation
      - Solve the system of equations with the Levinson-Durbin method
  • MATLAB help
      - doc lpc, etc.
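The two summary steps can be sketched in base MATLAB without the lpc function. The test frame below is the impulse response of an assumed second-order all-pole filter (aTrue is a made-up example), chosen because the autocorrelation method should then return the same coefficients. The sign convention A(z) = 1 + a1 z^-1 + ... + ap z^-p is an assumption that follows the output format of lpc:

```matlab
aTrue = [1 -1.2 0.8];                          % assumed known all-pole model A(z)
x = filter(1, aTrue, [1; zeros(399, 1)]);      % its impulse response as test frame
p = 2;                                         % LPC order
r = zeros(p+1, 1);
for m = 0:p
    r(m+1) = sum(x(1:end-m) .* x(1+m:end));    % autocorrelation at lag m
end
a = 1;                                         % A_0(z) = 1
E = r(1);                                      % zeroth-order prediction error
for i = 1:p
    k = -(r(i+1:-1:2)' * a) / E;               % reflection coefficient k_i
    a = [a; 0] + k * [0; flipud(a)];           % order update to A_i(z)
    E = E * (1 - k^2);                         % prediction error update
end
disp(a');                                      % estimated [1 a1 a2]
```

The intermediate values k are the PARCOR (reflection) coefficients, which the recursion produces as a by-product.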

Example of MATLAB Code
    function myLPCCodec(wavfile, N)
    % myLPCCodec  LPC analysis and lossless resynthesis demo.
    %   wavfile - input MS wav file
    %   N       - LPC filter order
    [x, fs] = audioread(wavfile);   % audioread replaces wavread (removed from MATLAB)
    x = x(:, 1);                    % use the first channel if the file is stereo
    % plot(x);
    % Playing original signal
    soundsc(x, fs);
    % Performing LPC analysis using MATLAB lpc function
    % a = [1 a(2) ... a(N+1)], g = prediction error variance (not needed below)
    [a, g] = lpc(x, N);
    % Filtering with the estimated coefficients to produce predicted samples ŝ[n]
    est_x = filter([0 -a(2:end)], 1, x);
    % Error (residual) signal e[n]
    e = x - est_x;
    % Testing the quality of predicted samples
    soundsc(est_x, fs);
    % Synthesis stage with zero loss of information:
    % e = A(z) x, so the all-pole filter 1/A(z) applied to e returns x
    syn_x = filter(1, a, e);
    soundsc(syn_x, fs);
    end

Analysis of Quantization Errors
  • Use MATLAB functions to research the effects of quantization errors introduced by the precision of the arithmetic operations and the representation of the filter and error signal:
      - Double (float64) representation (software emulation)
      - Float (float32) representation (software emulation)
      - Int (int32) representation (hardware emulation)
      - Short (int16) representation (hardware emulation)
  • Useful MATLAB functions:
      - fix, floor, round, ceil
      - Example: sig_hat = fix(sig*2^(B-1))/2^(B-1);
      - Truncation of sig to B bits.
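The truncation formula can be compared against rounding on a test signal. This is a small sketch in base MATLAB, assuming B = 8 bits and a sine input scaled to stay inside (-1, 1); both choices are illustrative:

```matlab
B = 8;                                   % assumed word length in bits
n = (0:999)';
sig = 0.9 * sin(2*pi*0.01*n);            % test signal inside (-1, 1)
q = 2^(B-1);                             % scale factor used in the slide's formula
sigTrunc = fix(sig * q) / q;             % truncation toward zero
sigRound = round(sig * q) / q;           % rounding to the nearest level
snrDb = @(s, sh) 10*log10(sum(s.^2) / sum((s - sh).^2));
fprintf('B = %d bits: truncation %.1f dB, rounding %.1f dB\n', ...
        B, snrDb(sig, sigTrunc), snrDb(sig, sigRound));
```

Rounding never produces a larger per-sample error than truncation, so its signal-to-noise ratio is at least as high. Repeating the run for several values of B shows how the error shrinks with word length.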

Quantization of Error Signal & Filter Coefficients
  • ADPCM can be applied to the error signal.
  • Filter coefficients in the direct filter form are found to be sensitive to quantization errors:
      - A small quantization error can have a large effect on the filter characteristics.
      - The issue is that the polynomial coefficients map nonlinearly to the poles of the filter (i.e., the roots of the polynomial).
      - Alternate representations are possible that have significantly better tolerance to quantization error.
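The sensitivity of direct-form coefficients can be observed numerically. The sketch below uses a made-up filter with four closely spaced real poles and an assumed quantization step of 2^-10; it compares the largest coefficient rounding error with the resulting movement of the poles:

```matlab
poles = [0.90 0.92 0.94 0.96];           % assumed closely spaced real poles
a = poly(poles);                         % direct-form coefficients of A(z)
step = 2^(-10);                          % assumed quantization step
aq = round(a / step) * step;             % coefficients rounded to the grid
polesQ = roots(aq);                      % poles of the quantized filter
coefErr = max(abs(aq - a));              % largest coefficient error
poleShift = 0;
for i = 1:numel(poles)
    % distance from each original pole to the nearest quantized pole
    poleShift = max(poleShift, min(abs(polesQ - poles(i))));
end
fprintf('max coefficient error %.2e, max pole shift %.2e\n', coefErr, poleShift);
```

In this example the poles move by far more than the coefficients were changed, which is the nonlinear coefficient-to-root mapping at work. The effect is strongest when poles are clustered, as chosen here on purpose.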

LPC Filter Representations
  • As noted previously, when the Levinson-Durbin algorithm was introduced, one alternate representation of the filter coefficients was also mentioned: PARCOR coefficients.
  • LPC to PARCOR:
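One standard way to convert LPC coefficients to PARCOR coefficients is the step-down (backward Levinson) recursion. The sketch below assumes the convention A(z) = 1 + a1 z^-1 + ... + ap z^-p with k_i equal to the last coefficient of the order-i polynomial; sign conventions differ between texts, so this is not necessarily the exact form used in the lecture. The example polynomial is made up:

```matlab
a = [1 -1.2 0.8];                 % assumed example A(z) = 1 - 1.2 z^-1 + 0.8 z^-2
p = length(a) - 1;
k = zeros(1, p);
c = a(2:end);                     % c(j) holds a_j of the current order
for i = p:-1:1
    k(i) = c(i);                  % k_i is the last coefficient of order i
    % step down to the order i-1 coefficients
    c = (c(1:i-1) - k(i) * c(i-1:-1:1)) / (1 - k(i)^2);
end
disp(k);                          % PARCOR coefficients k_1 ... k_p
```

All |k_i| < 1 indicates a stable synthesis filter, which makes the PARCOR form easy to check after quantization. The Signal Processing Toolbox also provides a poly2rc function for this conversion.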

PARCOR Filter Representation
  • PARCOR to LPC:
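The reverse conversion can be sketched with the step-up recursion, which is the order update of the Levinson-Durbin algorithm. The convention assumed here is A(z) = 1 + a1 z^-1 + ... + ap z^-p, and the reflection coefficients are made-up example values:

```matlab
k = [-2/3 0.8];                           % assumed example PARCOR coefficients
a = 1;                                    % A_0(z) = 1
for i = 1:length(k)
    % A_i(z) = A_{i-1}(z) + k_i z^-i A_{i-1}(1/z)
    a = [a 0] + k(i) * [0 fliplr(a)];
end
disp(a);                                  % LPC polynomial [1 a1 ... ap]
```

With these inputs the result is [1 -1.2 0.8]. The Signal Processing Toolbox also provides an rc2poly function for this conversion.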

Line Spectral Frequency Representation
  • It turns out that PARCOR coefficients can be represented with LSFs (Line Spectral Frequencies), which have significantly better quantization properties.
  • Note that:
  • The PARCOR lattice structure of the LPC synthesis filter above:
[Lattice diagram. Labels: Input, Output; nodes Ap, Ap-1, ..., A0 and Bp, Bp-1, ..., B0; reflection coefficients kp, kp-1, ...; z^-1 delay elements; additional stage kp+1 = ∓1]

Line Spectral Frequency Representation
  • From the previous slide the following holds:
  • From this realization of the filter the LSP representation is derived:
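A common way to obtain the line spectral frequencies is to extend A(z) by one lattice stage with k_{p+1} = +1 and k_{p+1} = -1, which gives a symmetric polynomial P(z) and an antisymmetric polynomial Q(z), and then take the angles of their unit-circle roots. The sketch below follows that textbook construction in base MATLAB. The example polynomial and the names P, Q and tol are assumptions, and the slide's own equations may be arranged differently:

```matlab
a = [1 -1.2 0.8];                      % assumed minimum-phase example A(z)
P = [a 0] + [0 fliplr(a)];             % k_{p+1} = +1: symmetric polynomial
Q = [a 0] - [0 fliplr(a)];             % k_{p+1} = -1: antisymmetric polynomial
wP = angle(roots(P));                  % root angles of P(z)
wQ = angle(roots(Q));                  % root angles of Q(z)
tol = 1e-6;                            % excludes the trivial roots at z = 1 and z = -1
lsf = sort([wP(wP > tol & wP < pi - tol); wQ(wQ > tol & wQ < pi - tol)]);
aRec = (P(1:end-1) + Q(1:end-1)) / 2;  % A(z) is recovered as (P + Q)/2
disp(lsf');                            % LSFs in radians, ascending
```

For a minimum-phase A(z) the roots of P(z) and Q(z) lie on the unit circle and alternate in angle. That ordering is easy to preserve under quantization.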

LSF Representation

LPC Synthesis Filter with LSF

A Simple Speech Coder
  • LPC-Based Synthesis Structure
[Block diagram. Labels: Decoding, Residual Signal, Residual, Synthesis Filter, Filter Coeffs, Deemphasis, Audio Output]
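The synthesis chain undoes the analysis chain step by step. The sketch below assumes a first-order pre-emphasis filter with coefficient 0.95, a made-up LPC polynomial, and no quantization, so the output should match the input up to round-off:

```matlab
alpha = 0.95;                            % assumed pre-emphasis coefficient
a = [1 -1.2 0.8];                        % assumed LPC polynomial A(z)
x = sin(2*pi*0.02*(0:199)');             % stand-in input signal
% Analysis side
xPre = filter([1 -alpha], 1, x);         % pre-emphasis: x[n] - alpha*x[n-1]
e = filter(a, 1, xPre);                  % analysis filter A(z) gives the residual
% Synthesis side
s = filter(1, a, e);                     % synthesis filter 1/A(z)
xOut = filter(1, [1 -alpha], s);         % de-emphasis inverts the pre-emphasis
fprintf('max reconstruction error: %.2e\n', max(abs(xOut - x)));
```

In an actual coder the residual and the coefficients would be quantized between the two halves, and the reconstruction error would reflect that quantization.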

Audio Coding

Audio Coding
  • Most audio coding standards use principles of psychoacoustics.
  • Example of the basic structure of an MP3 encoder:
[Block diagram. Labels: Audio Input, Filterbank & Transform, Psychoacoustic Model, Quantization, Bit-stream]

Basic Structure of Audio Coders
  • Filterbank Processing
  • Psychoacoustic Model
  • Quantization

Filter Bank Analysis Synthesis

Filterbank Processing:
  • Splitting the full-band signal into several subbands:
      - Uniform sub-bands (FFT)
      - Critical bands (FFT followed by a non-linear transformation)
  • Reflects the human auditory apparatus.
  • Mel-scale and Bark-scale transformations

Mel-Scale
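A widely used closed form for the mel scale is m = 2595 log10(1 + f/700). The slide's own formula is not available here, so the constants below are that common variant and should be treated as an assumption. The function names hz2mel and mel2hz are chosen for illustration:

```matlab
hz2mel = @(f) 2595 * log10(1 + f/700);        % Hz to mel (common variant)
mel2hz = @(m) 700 * (10.^(m/2595) - 1);       % mel to Hz (inverse mapping)
f = [100 500 1000 2000 4000 8000];            % example frequencies in Hz
m = hz2mel(f);
disp([f; m]);                                 % first row Hz, second row mel
fBack = mel2hz(m);                            % round trip back to Hz
```

The mapping is close to linear below roughly 1 kHz and logarithmic above, so equal steps in mel correspond to progressively wider bands in Hz.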

Bark-Scale
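One frequently cited approximation of the Bark scale is z = 13 atan(0.00076 f) + 3.5 atan((f/7500)^2), with f in Hz. The sketch below uses that approximation as an assumption, since the formula on the slide is not available. The name hz2bark is chosen for illustration:

```matlab
hz2bark = @(f) 13*atan(0.00076*f) + 3.5*atan((f/7500).^2);   % Hz to Bark
f = [100 500 1000 2000 4000 8000 16000];      % example frequencies in Hz
z = hz2bark(f);
disp([f; z]);                                 % first row Hz, second row Bark
```

Each unit step in z corresponds to one critical band, which is why masking computations are usually carried out on this axis.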

Analysis Structure of Filterbank
  • hk[n]: impulse response of the k-th quadrature mirror filter
  • N: number of channels, typically 32
  • ↓: down-sampling
  • MDCT: Modified Discrete Cosine Transform
[Block diagram: Audio Input → h1[n], ..., hk[n], ..., hN[n], each followed by ↓ and MDCT → Quantization → Bit Stream]

Synthesis Structure of Filterbank
  • gk[n]: impulse response of the k-th inverse quadrature mirror filter
  • N: number of channels, typically 32
  • ↑: up-sampling
  • IMDCT: Inverse Modified Discrete Cosine Transform
[Block diagram: Bit Stream → Decoding → IMDCT and ↑ in each channel, followed by g1[n], ..., gk[n], ..., gN[n] → Audio Output]
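The MDCT/IMDCT pair in these structures reconstructs the signal through windowed overlap-add, even though each block of 2N samples yields only N coefficients. The sketch below covers only that transform stage (not the 32-channel filterbank), with an assumed block size, a sine window, and a synthetic input. The scaling 2/N in the inverse is one of several equivalent normalization choices:

```matlab
N = 16;                                        % coefficients per block (assumed)
n = (0:2*N-1)';                                % sample index within a block
k = 0:N-1;                                     % coefficient index
C = cos(pi/N * (n + 0.5 + N/2) * (k + 0.5));   % 2N-by-N MDCT basis
w = sin(pi*(n + 0.5)/(2*N));                   % sine window, w[n]^2 + w[n+N]^2 = 1

x  = cos(2*pi*0.03*(0:8*N-1)');                % stand-in input, 8*N samples
xp = [zeros(N,1); x; zeros(N,1)];              % pad so each sample lies in two blocks
y  = zeros(size(xp));
for t = 0:(length(xp)/N - 2)
    idx = t*N + (1:2*N)';                      % block t, overlapping the next by N
    X  = C' * (w .* xp(idx));                  % MDCT: 2N samples to N coefficients
    yb = (2/N) * (C * X);                      % IMDCT: N coefficients to 2N aliased samples
    y(idx) = y(idx) + w .* yb;                 % window again and overlap-add
end
xr = y(N+1:end-N);                             % remove the padding
fprintf('max reconstruction error: %.2e\n', max(abs(xr - x)));
```

The aliasing introduced in each block cancels between neighboring blocks, so without quantization the output equals the input up to round-off.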

Psycho-Acoustic Modeling

Psychoacoustic Model
  • Masking threshold according to human auditory perception.
      - The masking threshold is used to quantize the Discrete Cosine Transform coefficients.
      - Analysis is done in the frequency domain, represented by the DFT and computed by the FFT.

Threshold of Hearing
  • Absolute threshold of audibly perceptible events in quiet conditions (no other sounds).
  • Any signal below the threshold can be removed without affecting perception.
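A commonly used analytic approximation of the threshold in quiet, with f in Hz and the result in dB SPL, is Tq(f) = 3.64 (f/1000)^-0.8 - 6.5 exp(-0.6 (f/1000 - 3.3)^2) + 10^-3 (f/1000)^4. The sketch below evaluates that approximation as an assumption; it is not necessarily the curve plotted in the lecture:

```matlab
f = logspace(log10(20), log10(20000), 500);    % frequency axis in Hz
fk = f / 1000;                                 % frequency in kHz
Tq = 3.64*fk.^(-0.8) - 6.5*exp(-0.6*(fk - 3.3).^2) + 1e-3*fk.^4;   % dB SPL
[TqMin, iMin] = min(Tq);
fprintf('lowest threshold %.1f dB SPL near %.0f Hz\n', TqMin, f(iMin));
% semilogx(f, Tq); xlabel('Frequency (Hz)'); ylabel('dB SPL');   % optional plot
```

Spectral components whose level falls below Tq(f) are candidates for removal, since they would be inaudible even with no other sound present.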

Threshold of Hearing

Frequency Masking
  • Schröder Spreading Function
  • Bark Scale Function:
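The spreading function attributed to Schroeder is often written, in dB, as SF(dz) = 15.81 + 7.5 (dz + 0.474) - 17.5 sqrt(1 + (dz + 0.474)^2), where dz is the distance in Bark from the masker to the masked component. The sketch below evaluates that commonly quoted form as an assumption, since the slide's equation is not available. The name spreadDb is chosen for illustration:

```matlab
% dz = (Bark value of masked component) - (Bark value of masker)
spreadDb = @(dz) 15.81 + 7.5*(dz + 0.474) - 17.5*sqrt(1 + (dz + 0.474).^2);
dz = (-3:1:8)';                        % distances in Bark
disp([dz spreadDb(dz)]);               % columns: distance, spread level in dB
```

The curve peaks near dz = 0 at about 0 dB, falls off at roughly 25 dB per Bark toward lower frequencies and roughly 10 dB per Bark toward higher ones, so a masker affects components above it more than components below it.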

Masking Curve

Primary Tone 1 kHz

Masked Tone 900 Hz

Combined Sound 1 kHz + 0.9 kHz

Combined 1 kHz + 0.9 kHz (-10 dB)

Combined 1 kHz + 5 kHz (-10 dB)
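Test signals like the ones named on these slides can be generated for listening. The sketch below assumes that -10 dB is the level of the second tone relative to the 1 kHz primary tone, and uses a 44.1 kHz sampling rate and a 1 s duration; none of these details are confirmed by the slides:

```matlab
fs = 44100;                                  % assumed sampling rate in Hz
dur = 1;                                     % assumed duration in seconds
t = (0:round(dur*fs)-1)'/fs;
gain = 10^(-10/20);                          % -10 dB as an amplitude ratio
primary = sin(2*pi*1000*t);                  % 1 kHz primary tone
near = primary + gain*sin(2*pi*900*t);       % second tone close in frequency
far  = primary + gain*sin(2*pi*5000*t);      % second tone far away in frequency
near = near / max(abs(near));                % normalize to avoid clipping
far  = far  / max(abs(far));
% soundsc(near, fs); pause(dur + 0.5); soundsc(far, fs);   % uncomment to listen
```

Listening to the two mixtures one after the other lets you compare how audible the weaker tone is when it sits close to the primary tone and when it sits far from it.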

END Veton Këpuska