Universal Speech and Audio Codec Linear Prediction Domain


Universal Speech and Audio Codec: Linear Prediction Domain Processing
Philippe Gournay, Bruno Bessette, Roch Lefebvre
Université de Sherbrooke, Département de Génie Electrique et Informatique
Sherbrooke, Québec, Canada


Outline
• The 3GPP AMR-WB+ standard
  – Source of inspiration for LPD processing in USAC
• Changes brought to LPD processing
  – Forward Aliasing Cancellation
  – Frequency-Domain Noise Shaping
  – Other changes
• Conclusion
  – More efficient LPD processing
  – Better unification of the LPD and non-LPD (frequency-domain) coders


Context
• The 3GPP AMR-WB+ standard
  – Hybrid codec
  – Time-domain (ACELP) and frequency-domain (TCX) coding
  – Very efficient on speech and speech-over-music content


The AMR-WB+ Encoder
[Figure: audio mode selection between ACELP (1 frame) and TCX (1, 2 or 4 frames); the mode index and the ISF parameters are packetized into the bitstream.]


AMR-WB+ Frame Structure
[Figure: three example configurations, (a), (b) and (c), of one super-frame of 1024 samples, built from ACELP, short TCX, medium TCX and long TCX frames.]
• Three out of the 26 possible ACELP/TCX coding configurations
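The figure of 26 configurations can be checked by counting. This is a minimal sketch that assumes the 1024-sample super-frame is split into four 256-sample frames, each coded either with ACELP or with a short TCX. It further assumes that a medium TCX frame (512 samples) covers exactly one half of the super-frame and that a long TCX frame (1024 samples) covers all of it. The mode labels are hypothetical names used only for illustration.

```python
from itertools import product

# Hypothetical mode labels; the frame sizes (in samples) are assumptions.
SIZES = {"ACELP": 256, "TCX256": 256, "TCX512": 512, "TCX1024": 1024}
SUPER_FRAME = 1024


def half_options():
    """All ways to code one 512-sample half of the super-frame."""
    quarter = [("ACELP",), ("TCX256",)]
    # Two 256-sample frames, each ACELP or short TCX ...
    options = [a + b for a, b in product(quarter, repeat=2)]
    # ... or a single medium TCX frame covering the whole half.
    options.append(("TCX512",))
    return options


def all_configurations():
    """Combine the two halves, then add the single long TCX frame."""
    configs = [a + b for a, b in product(half_options(), repeat=2)]
    configs.append(("TCX1024",))
    return configs


configs = all_configurations()
# Every configuration must cover exactly one super-frame.
assert all(sum(SIZES[m] for m in c) == SUPER_FRAME for c in configs)
print(len(configs))  # 26
```

Each half offers 4 + 1 = 5 choices, so the total is 5 × 5 + 1 = 26.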


Transitions from ACELP to TCX
• The zero-input response (ZIR) of the LPC weighting filter provides a pseudo-windowing
[Figure: ACELP frame followed by the decoded TCX window, with a 1/8 overlap.]
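The pseudo-windowing works because the ZIR of a stable filter dies out by itself. This minimal sketch uses a generic all-pole filter 1/A(z) as a stand-in for the weighting filter, with made-up coefficients. The filter receives zero input and starts from the memory left by the previous (ACELP) frame. Its output then fades out smoothly, which gives a window-like decay.

```python
def zero_input_response(a, memory, length):
    """Output of the all-pole filter 1/A(z) when its input is zero.

    A(z) = 1 + a[0] z^-1 + ... + a[p-1] z^-p (convention assumed here), and
    `memory` holds the last p output samples of the previous frame, oldest first.
    """
    p = len(a)
    if len(memory) != p:
        raise ValueError("memory must hold len(a) past output samples")
    history = list(memory)
    out = []
    for _ in range(length):
        # y[n] = -(a[0] y[n-1] + ... + a[p-1] y[n-p]); the input is zero
        y = -sum(a[i] * history[-1 - i] for i in range(p))
        out.append(y)
        history.append(y)
    return out


# Made-up 2nd-order filter with poles of radius ~0.89: the ZIR rings and decays.
zir = zero_input_response([-1.6, 0.8], [0.2, 1.0], 32)
```

How quickly the response decays depends on the pole radii. This is why the resulting "window" depends on the signal and is only approximate.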


Transitions from TCX to ACELP
• Redundant windowed TCX samples are discarded
[Figure: decoded TCX window followed by an ACELP frame, with a 1/8 overlap.]


Limitations of the AMR-WB+ model
• Non-critically sampled transforms
  – FFT vs. MDCT
• Inefficiencies at transitions between modes
  – Sub-optimal windowing (from ACELP to TCX)
  – Discarded samples (from TCX to ACELP)
  – Transform windows not aligned with the ACELP grid
  – LPC analysis window also shifted to the right
• Even worse when switching with AAC
  – Time-Domain Aliasing Cancellation (TDAC)
  – Transitions between LPD and non-LPD processing
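The cost of non-critical sampling can be counted directly. The 1/8-overlap transition figures suggest one setup, which this sketch assumes: a TCX frame of N new samples is transformed by a real FFT spanning N + N/8 samples. The frame lengths 256, 512 and 1024 are also assumptions. A real FFT of L samples carries L independent real values, while an MDCT with hop size N always yields exactly N coefficients.

```python
# Assumed AMR-WB+ TCX frame lengths, each with an overlap of 1/8 of the frame.
TCX_FRAMES = {"short": 256, "medium": 512, "long": 1024}


def real_fft_values(size):
    """Independent real values in the FFT of `size` real samples (size even):
    the DC and Nyquist bins are real, the other size/2 - 1 bins are complex."""
    return 2 + 2 * (size // 2 - 1)


def fft_overhead_percent(frame_len, overlap):
    """Extra values per frame when the FFT spans frame_len + overlap samples."""
    return 100.0 * (real_fft_values(frame_len + overlap) - frame_len) / frame_len


for name, n in TCX_FRAMES.items():
    overlap = n // 8
    # An MDCT with hop size n yields exactly n coefficients: 0% overhead.
    print(f"{name}: FFT {real_fft_values(n + overlap)} values vs MDCT {n} "
          f"(+{fft_overhead_percent(n, overlap):.1f}%)")
```

Under these assumptions every TCX length carries 12.5% more values than it has new samples. The MDCT removes that overhead.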


Changes brought to the LPD processing
• Replaced FFTs by MDCTs
• Introduced Frequency-Domain Noise Shaping (FDNS)
• Introduced Forward Aliasing Cancellation (FAC)
• Other changes
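Two properties make the MDCT attractive here: critical sampling and time-domain aliasing cancellation (TDAC). This sketch uses a generic textbook MDCT rather than the exact TCX windowing of USAC. It assumes a sine window that satisfies the Princen-Bradley condition, a direct O(N²) transform, and an IMDCT scaled by 2/N so that overlap-add reconstructs the input.

```python
import math


def sine_window(length):
    """Sine window; satisfies w[n]^2 + w[n + N]^2 = 1 (Princen-Bradley)."""
    return [math.sin(math.pi * (i + 0.5) / length) for i in range(length)]


def mdct(block, window):
    """Windowed MDCT: 2N samples in, N coefficients out (direct O(N^2) form)."""
    n = len(block) // 2
    xw = [x * w for x, w in zip(block, window)]
    return [
        sum(xw[i] * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
            for i in range(2 * n))
        for k in range(n)
    ]


def imdct(coeffs, window):
    """Windowed IMDCT: N coefficients in, 2N time-aliased samples out."""
    n = len(coeffs)
    return [
        window[i] * 2.0 / n
        * sum(coeffs[k] * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
              for k in range(n))
        for i in range(2 * n)
    ]


def mdct_roundtrip(signal, n):
    """Analyse with hop size n (N new samples -> N coefficients), then
    overlap-add the IMDCT outputs; the time-domain aliasing cancels (TDAC)."""
    window = sine_window(2 * n)
    pad_end = n + (-len(signal)) % n
    padded = [0.0] * n + list(signal) + [0.0] * pad_end
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - n, n):
        coeffs = mdct(padded[start:start + 2 * n], window)
        for i, y in enumerate(imdct(coeffs, window)):
            out[start + i] += y
    return out[n:n + len(signal)]
```

Each block leaves aliasing in both of its halves, and the neighbouring blocks cancel it during overlap-add. If a neighbour is an ACELP frame instead of an MDCT frame, that cancellation is missing.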


Frequency-Domain Noise Shaping
• To unify the processing of AAC and TCX frames, the MDCT in TCX is applied in the original signal domain
• Noise shaping for TCX frames is performed in the MDCT domain, using LPC filters mapped to the MDCT domain
• FDNS provides a smooth (sample-by-sample) time-domain noise envelope by applying first-order filtering to the MDCT coefficients (similar in principle to TNS)
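One way to picture "LPC filters mapped to the MDCT domain" is as a set of per-band gains. This sketch does not reproduce the exact mapping in the standard. It samples the magnitude response of the LPC synthesis filter 1/A(z) at the centres of M equal-width bands and scales each band of MDCT coefficients by its gain. The LPC sign convention and the equal-width band layout are assumptions.

```python
import cmath
import math


def lpc_to_band_gains(a, num_bands):
    """Sample |1/A(e^{jw})| at the centres of num_bands equal-width bands.

    A(z) = 1 + a[0] z^-1 + ... + a[p-1] z^-p (convention assumed for this sketch).
    """
    gains = []
    for m in range(num_bands):
        w = math.pi * (m + 0.5) / num_bands  # band centre in radians
        a_of_w = 1 + sum(c * cmath.exp(-1j * w * (i + 1)) for i, c in enumerate(a))
        gains.append(1.0 / abs(a_of_w))
    return gains


def shape_mdct(coeffs, gains):
    """Decoder side: scale each band of MDCT coefficients by its gain, so the
    quantization noise inherits the LPC spectral envelope."""
    if len(coeffs) % len(gains):
        raise ValueError("number of coefficients must be a multiple of the band count")
    width = len(coeffs) // len(gains)
    return [x * gains[k // width] for k, x in enumerate(coeffs)]
```

The encoder would divide by the same gains before quantization. Quantization noise that is roughly flat in the shaped domain then follows the LPC envelope after decoding.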

Effect of FDNS on the spectral shape and the time-domain envelope of the noise
[Figure: noise gains g1[m] calculated at time position A and noise gains g2[m] calculated at time position B, one gain per band m along the frequency axis (k); below, the interpolated gains seen in the time domain (axis n) for each of the M bands.]
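Per the slide, FDNS replaces a constant gain per band with a first-order filter that runs along frequency, much like TNS. As a result, the noise envelope is interpolated between the gains g1[m] and g2[m] rather than switching abruptly. The coefficients a and b below are an assumption of this sketch, not a transcription of the USAC specification. They were chosen for two sanity properties: equal gains reduce to plain scaling, and the filter stays stable. The direction of the resulting time-domain ramp depends on the transform conventions and is not checked here.

```python
def fdns_band(coeffs, g1, g2):
    """First-order recursive filter applied along frequency inside one band.

    The coefficients a and b are an assumption of this sketch: with g1 == g2 the
    filter reduces to a plain gain, and |b| < 1 keeps it stable for positive gains.
    """
    if g1 <= 0 or g2 <= 0:
        raise ValueError("gains must be positive")
    a = 2.0 * g1 * g2 / (g1 + g2)
    b = (g2 - g1) / (g1 + g2)
    out, prev = [], 0.0
    for x in coeffs:
        prev = a * x + b * prev
        out.append(prev)
    return out


def fdns(coeffs, gains1, gains2):
    """Filter each of the M equal-width bands with its own pair (g1[m], g2[m])."""
    if len(gains1) != len(gains2) or len(coeffs) % len(gains1):
        raise ValueError("inconsistent band layout")
    width = len(coeffs) // len(gains1)
    out = []
    for m, (g1, g2) in enumerate(zip(gains1, gains2)):
        out += fdns_band(coeffs[m * width:(m + 1) * width], g1, g2)
    return out
```

The filter state restarts at zero in every band, so each band's interpolation depends only on its own pair of gains.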



Forward Aliasing Cancellation
• Introduced to compensate for the windowing and time-domain aliasing in MDCT-coded frames when switching to and from ACELP frames
[Figure: windowing effect and time-domain aliasing in the TCX frame output, shown together with the ACELP synthesis of the next ACELP frame.]
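At a transition from TCX to ACELP, the TCX output over its right overlap region still carries the windowing and the time-domain aliasing. No following MDCT frame is present to cancel them. This minimal sketch assumes a sine window, an overlap of n samples, and the usual MDCT sign convention (aliasing added on the right half, subtracted on the left). It ignores quantization and the ACELP contribution. The code computes the TCX output over the overlap, checks that an MDCT neighbour would have cancelled the aliasing, and forms the FAC target as the remaining difference.

```python
import math


def halves(n):
    """Rising (r) and falling (h) halves of a 2n-sample sine window; r = reversed(h)."""
    w = [math.sin(math.pi * (i + 0.5) / (2 * n)) for i in range(2 * n)]
    return w[:n], w[n:]


def tcx_right_overlap(v, h):
    """TCX output over its right overlap region: v is windowed, folded
    (time-domain aliasing, '+' sign on this side) and windowed again."""
    u = [hi * vi for hi, vi in zip(h, v)]
    return [hi * (ui + ur) for hi, ui, ur in zip(h, u, reversed(u))]


def next_mdct_left_overlap(v, r):
    """What a following MDCT frame would add over the same region ('-' sign)."""
    u = [ri * vi for ri, vi in zip(r, v)]
    return [ri * (ui - ur) for ri, ui, ur in zip(r, u, reversed(u))]


def fac_target(v, h):
    """The next frame is ACELP, so nothing cancels the aliasing: FAC must supply
    the difference between the original samples and the TCX output."""
    return [vi - yi for vi, yi in zip(v, tcx_right_overlap(v, h))]
```

When the next frame is an MDCT frame, the two overlap contributions sum back to the original samples. When it is ACELP, adding the FAC target to the TCX output fills that role.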


Forward Aliasing Cancellation
• FAC is applied in the original signal domain
• FAC is quantized in the LPC-weighted domain, so that the quantization noise of FAC and that of the decoded MDCT are of the same nature
• For transitions from ACELP to TCX, the ACELP synthesis can be taken into account, which reduces the bit rate needed to encode FAC


Computation of FAC targets for transitions from and to ACELP (encoder)
[Figure, with the LPC filters LPC1 and LPC2 marked at the ends of the TCX frame. Line 1: signal in the original domain. Line 2: TCX frame output and ACELP synthesis of the next ACELP frame. Line 3: ACELP contribution, made of the windowed ACELP ZIR and the windowed and folded ACELP synthesis. Line 4: FAC targets, namely the ACELP error and the TCX frame error (including the ACELP contribution).]


Quantization of FAC targets
[Figure, two processing chains:
Transition from ACELP to TCX (LPC1): ZIR of 1/W1(z) with filter memory from the ACELP error, W1(z) applied to the FAC target, DCT-IV, quantizer Q (transmitted to the decoder), inverse DCT-IV, and 1/W1(z) with zero memory (FAC synthesis).
Transition from TCX to ACELP (LPC2): W2(z) applied to the FAC target with filter memory from the TCX frame error, DCT-IV, quantizer Q (transmitted to the decoder), inverse DCT-IV, and 1/W2(z) with zero memory (FAC synthesis).]
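The chain in the diagram (weighting, DCT-IV, quantization, inverse DCT-IV, inverse weighting) can be sketched end to end. This sketch makes several assumptions: a simplified weighting filter W(z) = A(z/γ), the value γ = 0.92, a DCT-IV whose inverse is (2/N) times itself, and a uniform scalar quantizer. The diagram also shows filter memories taken from the ACELP error or the TCX frame error; this sketch does not model them, and both filters start from zero memory.

```python
import math


def weight(x, a, gamma):
    """FIR weighting W(z) = A(z/gamma), zero initial memory (simplified assumption)."""
    aw = [c * gamma ** (i + 1) for i, c in enumerate(a)]
    return [x[n] + sum(aw[i] * x[n - 1 - i] for i in range(len(aw)) if n - 1 - i >= 0)
            for n in range(len(x))]


def unweight(e, a, gamma):
    """All-pole inverse 1/W(z), zero initial memory."""
    aw = [c * gamma ** (i + 1) for i, c in enumerate(a)]
    y = []
    for n, en in enumerate(e):
        y.append(en - sum(aw[i] * y[n - 1 - i] for i in range(len(aw)) if n - 1 - i >= 0))
    return y


def dct4(x):
    """DCT-IV, direct O(N^2) form."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi / n * (i + 0.5) * (k + 0.5)) for i in range(n))
            for k in range(n)]


def idct4(c):
    """Inverse DCT-IV: the DCT-IV is its own inverse up to a factor 2/N."""
    n = len(c)
    return [2.0 / n * v for v in dct4(c)]


def code_fac(target, a, gamma, step):
    """Encoder: weight, transform, quantize. Decoder: dequantize, inverse transform,
    inverse weighting. Returns (indices sent to the decoder, FAC synthesis)."""
    indices = [round(c / step) for c in dct4(weight(target, a, gamma))]
    decoded = [q * step for q in indices]
    return indices, unweight(idct4(decoded), a, gamma)
```

The quantizer adds roughly uniform noise in the weighted DCT-IV domain, and the decoder then passes it through 1/W(z). That shapes the FAC quantization noise by the same LPC envelope as the TCX noise, which is the stated reason for quantizing FAC in the weighted domain.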


Other changes brought to the LPD processing
• Critical sampling
  – MDCT vs. FFT
  – FAC + FDNS
• Scalar quantizer + adaptive arithmetic coder for TCX (AMR-WB+ uses AVQ)
• Variable bit rate
  – LPC quantizer
  – Bit reservoir adaptation
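The slide names the scalar quantizer and the adaptive arithmetic coder without describing them. This sketch illustrates why the pair pays off. It computes the ideal code length that an arithmetic coder would approach when driven by a simple adaptive frequency model. The model is an assumption of this sketch: counts start at 1 and the alphabet is bounded by a hypothetical max_abs. It is much simpler than the model specified for USAC, which is not reproduced here.

```python
import math


def quantize(coeffs, step):
    """Uniform scalar quantizer: coefficient -> integer index."""
    return [round(c / step) for c in coeffs]


def adaptive_code_length(indices, max_abs):
    """Ideal cost, in bits, of coding integer indices in [-max_abs, max_abs]
    with an adaptive model: a frequency table that starts at 1 for every symbol
    and is incremented after each coded symbol."""
    counts = [1] * (2 * max_abs + 1)
    total = len(counts)
    bits = 0.0
    for q in indices:
        if abs(q) > max_abs:
            raise ValueError("index outside the modelled alphabet")
        s = q + max_abs
        bits -= math.log2(counts[s] / total)  # -log2 of the current probability
        counts[s] += 1
        total += 1
    return bits
```

Skewed index statistics, such as the many zeros that coarse quantization of high-frequency coefficients tends to produce, cost far fewer bits than evenly spread ones. The model learns those statistics as coding proceeds, with no codebook to train in advance.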


Conclusion
• USAC makes use of LPD and non-LPD processing
  – LPD mode inspired by AMR-WB+
  – Non-LPD mode derived from AAC
• Substantial changes were made to the LPD processing, and new tools were introduced to make it more efficient
  – Frequency-Domain Noise Shaping (FDNS)
  – Forward Aliasing Cancellation (FAC)
• USAC is a real unification of two coding models