Universal Speech and Audio Codec Linear Prediction Domain

Slides: 19

Universal Speech and Audio Codec
Linear Prediction Domain processing
Philippe Gournay, Bruno Bessette, Roch Lefebvre
Université de Sherbrooke, Département de Génie Electrique et Informatique
Sherbrooke, Québec, Canada

Outline
• The 3GPP AMR-WB+ Standard
  – Source of inspiration for LPD processing in USAC
• Changes brought to LPD processing
  – Forward Aliasing Cancellation
  – Frequency-Domain Noise Shaping
  – Other changes
• Conclusion
  – More efficient LPD processing
  – Better unification of LPD and non-LPD FD coders

Context
• The 3GPP AMR-WB+ Standard
  – Hybrid codec
  – Time (ACELP) and Frequency (TCX) Domain
  – Very efficient on speech and speech-over-music contents

The AMR-WB+ Encoder
[Block diagram with the following elements: Audio, Mode Selection, ACELP (1 frame), TCX (1, 2 or 4 frames), Mode Index, ISF, Packetization, Bitstream]

AMR-WB+ Frame Structure
• Three out of the 26 possible ACELP/TCX coding configurations
[Figure: three example super-frames, (a), (b) and (c), built from ACELP, Short TCX, Medium TCX and Long TCX frames; one super-frame = 1024 samples]
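The figure of 26 configurations can be reproduced by a small enumeration. The sketch below is illustrative only and rests on two assumptions not stated on the slide: a super-frame is split into 4 equal frames, and a medium TCX frame may start only at the first or third frame position. The mode labels and the `FRAME_SPAN` table are hypothetical names chosen for this example.

```python
from itertools import product

# Frames covered by each mode. The slides give 1 frame for ACELP and
# 1, 2 or 4 frames for TCX; the labels themselves are illustrative.
FRAME_SPAN = {"ACELP": 1, "short TCX": 1, "medium TCX": 2, "long TCX": 4}


def half_configs():
    """Configurations of one half super-frame (2 frames).

    Assumption: a medium TCX frame is aligned to a half super-frame.
    """
    short = [list(pair) for pair in product(["ACELP", "short TCX"], repeat=2)]
    return short + [["medium TCX"]]


def superframe_configs():
    """All configurations of a 4-frame super-frame."""
    halves = half_configs()
    # Any half configuration can be followed by any other one ...
    configs = [first + second for first, second in product(halves, halves)]
    # ... or a single long TCX frame covers the whole super-frame.
    configs.append(["long TCX"])
    return configs


configs = superframe_configs()
print(len(configs))  # 26
```

Each half super-frame has 5 options (4 combinations of one-frame modes plus one medium TCX), so two halves give 25 configurations, and the long TCX frame adds the 26th.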

Transitions from ACELP to TCX
• Zero-input response (ZIR) of LPC weighting filter provides pseudo-windowing
[Figure: ACELP frame followed by the decoded TCX window, with a 1/8 overlap]

Transitions from TCX to ACELP
• Redundant windowed TCX samples are discarded
[Figure: decoded TCX window followed by the ACELP frame, with a 1/8 overlap]

Limitations of the AMR-WB+ model
• Non-critically sampled transforms
  – FFT vs. MDCT
• Inefficiencies at transitions between modes
  – Sub-optimal windowing (from ACELP to TCX)
  – Discarded samples (from TCX to ACELP)
  – Transform windows not aligned with ACELP grid
  – LPC analysis window also shifted to the right
• Even worse when switching with AAC
  – Time-Domain Aliasing Cancellation (TDAC)
  – Transitions between LPD and non-LPD processing
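To make the critical-sampling and TDAC points concrete, here is a minimal NumPy sketch of a textbook MDCT with a sine window and 50% overlap. It is not the transform configuration of AMR-WB+ or USAC; the frame size, the window and all function names are assumptions for illustration. Each hop of N new samples produces exactly N coefficients, and the aliasing of each frame cancels only when overlapping frames are added.

```python
import numpy as np


def mdct_matrix(N):
    """N x 2N MDCT basis: cos(pi/N * (n + 1/2 + N/2) * (k + 1/2))."""
    n = np.arange(2 * N)
    k = np.arange(N)
    return np.cos(np.pi / N * np.outer(k + 0.5, n + 0.5 + N / 2))


def sine_window(N):
    """Sine window of length 2N; satisfies w[n]^2 + w[n + N]^2 = 1."""
    return np.sin(np.pi * (np.arange(2 * N) + 0.5) / (2 * N))


def mdct_analysis_synthesis(x, N):
    """Analyse x with hop N, then resynthesise by windowed overlap-add."""
    C = mdct_matrix(N)
    w = sine_window(N)
    num_frames = len(x) // N - 1
    y = np.zeros(len(x))
    coeffs = []
    for f in range(num_frames):
        segment = x[f * N : f * N + 2 * N]
        X = C @ (w * segment)          # N coefficients for N new samples
        coeffs.append(X)
        # Inverse transform, synthesis window, overlap-add (TDAC).
        y[f * N : f * N + 2 * N] += (2.0 / N) * w * (C.T @ X)
    return np.array(coeffs), y


rng = np.random.default_rng(0)
N = 16
x = rng.standard_normal(8 * N)
coeffs, y = mdct_analysis_synthesis(x, N)
print(coeffs.shape)                      # one row of N coefficients per hop
print(np.allclose(y[N:-N], x[N:-N]))     # interior samples are reconstructed
```

The first and last N samples are not reconstructed because they are covered by a single frame. This is the same situation as an MDCT frame whose neighbor is coded in another mode: the overlap partner that would cancel the aliasing is missing.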

Changes brought to the LPD processing
• Replaced FFTs by MDCTs
• Introduced Frequency Domain Noise Shaping
• Introduced Forward Aliasing Cancellation
• Other changes

Frequency Domain Noise Shaping
• To unify processing of AAC and TCX frames, the MDCT transform in TCX is applied in the original signal domain
• Noise shaping for TCX frames is performed in the MDCT domain based on LPC filters mapped to the MDCT domain
• FDNS allows a smooth (sample-by-sample) time-domain noise envelope by applying 1st-order filtering to the MDCT coefficients (similar in principle to TNS)

Effect of FDNS on the spectral shape and the time-domain envelope of the noise
[Figure: noise gains g1[m] calculated at time position A and noise gains g2[m] calculated at time position B, each plotted along the frequency axis (k or m); interpolated gains seen in the time domain, for each of the M bands, plotted along the time axis (n) with positions A, C and B marked]

Frequency-Domain Noise Shaping
• FDNS allows a smooth (sample-by-sample) time-domain noise envelope by applying 1st-order filtering to the MDCT coefficients (similar in principle to TNS)
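The idea of mapping an LPC filter to the MDCT domain can be sketched as follows. This is a simplified illustration, not the USAC procedure: it evaluates the magnitude response of a weighting filter W(z) = A(z/γ) at M band centers and uses the resulting gains to scale MDCT coefficients before and after quantization. The value γ = 0.92, the uniform band layout, the rounding quantizer and all function names are assumptions, and the interpolation between the two gain sets g1[m] and g2[m] by 1st-order filtering is not modeled.

```python
import numpy as np


def lpc_to_band_gains(a, num_bands, gamma=0.92):
    """Return g[m] = 1 / |W(e^{j w_m})| at band centers, W(z) = A(z/gamma).

    `a` holds the coefficients of A(z) = a[0] + a[1] z^-1 + ...;
    gamma is an assumed weighting factor.
    """
    a = np.asarray(a, dtype=float)
    aw = a * gamma ** np.arange(len(a))           # coefficients of A(z/gamma)
    omega = np.pi * (np.arange(num_bands) + 0.5) / num_bands
    E = np.exp(-1j * np.outer(omega, np.arange(len(a))))
    return 1.0 / np.abs(E @ aw)


def fdns_encode(X, gains):
    """Flatten the spectrum: divide each coefficient by its band gain."""
    return X / np.repeat(gains, len(X) // len(gains))


def fdns_decode(Xq, gains):
    """Restore the spectral shape: multiply by the band gain."""
    return Xq * np.repeat(gains, len(Xq) // len(gains))


rng = np.random.default_rng(0)
gains = lpc_to_band_gains([1.0, -0.9], num_bands=8)
X = 20.0 * rng.standard_normal(64)                 # stand-in MDCT coefficients
Xq = np.round(fdns_encode(X, gains))               # hypothetical quantizer
noise = fdns_decode(Xq, gains) - X
# Per-band noise level follows the gains, i.e. the inverse weighting filter.
print(np.sqrt(np.mean(noise.reshape(8, 8) ** 2, axis=1)))
```

Because the quantizer acts on the flattened spectrum, its roughly white error is multiplied by the band gains at the decoder, so the coding noise takes the shape of 1/|W| without any time-domain filtering of the signal.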

Forward Aliasing Cancellation
• Introduced to compensate windowing and time-domain aliasing in MDCT-coded frames when switching to and from ACELP frames
[Figure: windowing effect and time-domain aliasing in the TCX frame output, shown with the ACELP synthesis and the next ACELP frame and their + and - contributions]

Forward Aliasing Cancellation
• FAC is applied in the original signal domain
• FAC is quantized in the LPC weighted domain so that the quantization noise of FAC and that of the decoded MDCT are of the same nature
• For transition from ACELP to TCX, the ACELP synthesis can be taken into account; this reduces the bitrate needed to encode FAC
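What FAC has to correct can be shown with a single MDCT frame that has no overlapping MDCT neighbor. The sketch below is a simplified illustration under stated assumptions: it uses a full-overlap sine window rather than the actual transition windows, it ignores quantization, and it does not model the ACELP synthesis contribution mentioned above. The function names are hypothetical.

```python
import numpy as np


def tcx_frame_output(frame, N):
    """MDCT then inverse MDCT of one 2N-sample frame, sine-windowed twice."""
    n = np.arange(2 * N)
    k = np.arange(N)
    C = np.cos(np.pi / N * np.outer(k + 0.5, n + 0.5 + N / 2))
    w = np.sin(np.pi * (n + 0.5) / (2 * N))
    X = C @ (w * frame)
    return (2.0 / N) * w * (C.T @ X), w


def fac_target(frame, N):
    """Correction needed over the first N samples of an isolated frame."""
    y, _ = tcx_frame_output(frame, N)
    return frame[:N] - y[:N]


rng = np.random.default_rng(1)
N = 8
frame = rng.standard_normal(2 * N)
y, w = tcx_frame_output(frame, N)
target = fac_target(frame, N)

# The error splits into a windowing term and a time-domain aliasing term.
w1, x1 = w[:N], frame[:N]
windowing = (1.0 - w1 ** 2) * x1
aliasing = w1 * (w1 * x1)[::-1]
print(np.allclose(target, windowing + aliasing))
print(np.allclose(y[:N] + target, x1))   # adding the correction restores x
```

In a pure MDCT stream the previous frame supplies both terms through overlap-add. At a boundary with ACELP that frame does not exist, so the correction has to be transmitted forward, which is what the FAC target represents.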

Computation of FAC targets for transitions from and to ACELP (encoder)
[Figure: four-line signal diagram (Line 1 to Line 4), labeled with LPC 1 and LPC 2, showing the signal in the original domain, the TCX frame output, the ACELP synthesis, the next ACELP frame, the windowed ACELP ZIR, the windowed and folded ACELP synthesis (ACELP contribution), the ACELP error, the TCX frame error (including ACELP contribution) and the resulting FAC target]

Quantization of FAC targets
[Figure: two block diagrams.
Transition from ACELP to TCX (LPC 1): FAC target, ZIR of 1/W1(z), W1(z), DCT-IV, Q (transmit to decoder), inverse DCT-IV, 1/W1(z), FAC synthesis; annotations: filter memory (ACELP error), zero memory.
Transition from TCX to ACELP (LPC 2): FAC target, W2(z), DCT-IV, Q (transmit to decoder), inverse DCT-IV, 1/W2(z), FAC synthesis; annotations: filter memory (TCX frame error), zero memory.]
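The chain of blocks in this diagram (weighting filter, DCT-IV, quantizer, inverse DCT-IV, inverse weighting filter) can be sketched in a few lines. This is a minimal illustration, not the codec's quantization scheme: the uniform quantizer with step `step` stands in for the block Q, both filters start from zero memory, and the ZIR and filter-memory handling shown in the figure is left out. All function names and the example coefficients are hypothetical.

```python
import numpy as np


def dct_iv(x):
    """Orthonormal DCT-IV; this transform is its own inverse."""
    N = len(x)
    n = np.arange(N) + 0.5
    C = np.sqrt(2.0 / N) * np.cos(np.pi / N * np.outer(n, n))
    return C @ x


def weight(x, aw):
    """FIR weighting filter W(z) with zero initial memory."""
    return np.convolve(x, aw)[: len(x)]


def unweight(y, aw):
    """All-pole filter 1/W(z) with zero initial memory."""
    x = np.zeros(len(y))
    for n in range(len(y)):
        acc = y[n]
        for i in range(1, min(n, len(aw) - 1) + 1):
            acc -= aw[i] * x[n - i]
        x[n] = acc / aw[0]
    return x


def code_fac(target, aw, step):
    """Quantize a FAC target in the weighted domain and resynthesise it."""
    spectrum = dct_iv(weight(target, aw))
    indices = np.round(spectrum / step).astype(int)   # stand-in for "Q"
    synthesis = unweight(dct_iv(indices * step), aw)  # decoder side
    return indices, synthesis


rng = np.random.default_rng(2)
aw = np.array([1.0, -0.828])        # example W(z) coefficients (assumed)
target = rng.standard_normal(16)
indices, synthesis = code_fac(target, aw, step=0.25)
print(np.max(np.abs(synthesis - target)))   # error shrinks with the step
```

Because the rounding error is introduced between W(z) and 1/W(z), it is shaped by the inverse weighting filter in the signal domain, which is the sense in which the FAC noise and the decoded MDCT noise are of the same nature.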

Other changes brought to the LPD processing
• Critical sampling
  – MDCT vs. FFT
  – FAC+FDNS
• Scalar quantizer + adaptive arithmetic coder for TCX (AMR-WB+ uses AVQ)
• Variable bit rate
  – LPC quantizer
  – Bit reservoir adaptation

Conclusion
• USAC makes use of LPD and non-LPD processing
  – LPD mode inspired by AMR-WB+
  – Non-LPD mode derived from AAC
• Substantial changes were brought to the LPD processing, and new tools were introduced to make it more efficient
  – Frequency Domain Noise Shaping (FDNS)
  – Forward Aliasing Cancellation (FAC)
• USAC is a real unification of two coding models