Universal Speech and Audio Codec Linear Prediction Domain
- Slides: 19
Universal Speech and Audio Codec Linear Prediction Domain processing
Philippe Gournay, Bruno Bessette, Roch Lefebvre
Université de Sherbrooke, Département de Génie Electrique et Informatique, Sherbrooke, Québec, Canada
Outline
• The 3GPP AMR-WB+ Standard
  – Source of inspiration for LPD processing in USAC
• Changes brought to LPD processing
  – Forward Aliasing Cancellation
  – Frequency-Domain Noise Shaping
  – Other changes
• Conclusion
  – More efficient LPD processing
  – Better unification of LPD and non-LPD FD coders
Context
• The 3GPP AMR-WB+ Standard
  – Hybrid codec
  – Time (ACELP) and Frequency (TCX) Domain
  – Very efficient on speech and speech-over-music content
The AMR-WB+ Encoder
[Block diagram: Audio, Mode Selection, ACELP (1 frame) or TCX (1, 2 or 4 frames), Mode Index and ISF, Packetization, Bitstream]
AMR-WB+ Frame Structure
• One super-frame = 1024 samples
• Three out of the 26 possible ACELP/TCX coding configurations
[Figure: three example super-frames, (a), (b) and (c), built from ACELP, Short TCX, Medium TCX and Long TCX frames]
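The count of 26 configurations can be checked with a short enumeration. This is a minimal sketch, assuming a super-frame is split into four equal frames, that a medium TCX occupies an aligned pair of frames (first half or second half), and that a long TCX occupies the whole super-frame. The labels and function names are hypothetical and chosen only for illustration.

```python
from itertools import product

def half_configs():
    """Configurations of one half of a super-frame (two frames).

    Assumption: each frame is either ACELP or a short TCX, or the
    whole half is covered by one medium TCX.
    """
    singles = list(product(["ACELP", "TCX-short"], repeat=2))  # 4 cases
    return singles + [("TCX-medium",)]                          # 5 cases

def superframe_configs():
    """All ACELP/TCX layouts of one super-frame under the assumptions above."""
    configs = [left + right
               for left, right in product(half_configs(), repeat=2)]  # 25
    configs.append(("TCX-long",))                                      # +1
    return configs

configs = superframe_configs()
print(len(configs))
```

Under these assumptions each half has 5 possible layouts, giving 5 × 5 = 25 combinations, plus the single long-TCX case.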
Transitions from ACELP to TCX
• Zero-input response (ZIR) of LPC weighting filter provides pseudo-windowing
[Figure: ACELP frame followed by the decoded TCX window, with 1/8 overlap]
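To see why a zero-input response can act as a pseudo-window, the sketch below runs a stable all-pole filter with no input, starting from a non-zero memory. It is only an illustration: a plain all-pole filter 1/A(z) stands in for the LPC weighting filter, and the coefficients and memory values are made up.

```python
def zero_input_response(lpc, past_outputs, length):
    """Zero-input response of the all-pole filter 1/A(z).

    lpc          : [1, a1, ..., ap], coefficients of A(z)
    past_outputs : [y[-1], y[-2], ..., y[-p]], the filter memory
    length       : number of output samples to generate
    """
    memory = list(past_outputs)
    out = []
    for _ in range(length):
        # With zero input: y[n] = -(a1*y[n-1] + ... + ap*y[n-p])
        y = -sum(a * m for a, m in zip(lpc[1:], memory))
        out.append(y)
        memory = [y] + memory[:-1]
    return out

# Hypothetical 2nd-order filter with poles at 0.6 +/- 0.4j (inside the unit circle)
zir = zero_input_response([1.0, -1.2, 0.52], [1.0, 0.8], 64)
print(zir[0], zir[-1])
```

Because the filter is stable, the response decays toward zero, which is what gives the transition its smooth, window-like fade.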
Transitions from TCX to ACELP
• Redundant windowed TCX samples are discarded
[Figure: decoded TCX window followed by the ACELP frame, with 1/8 overlap]
Limitations of the AMR-WB+ model
• Non-critically sampled transforms
  – FFT vs. MDCT
• Inefficiencies at transitions between modes
  – Sub-optimal windowing (from ACELP to TCX)
  – Discarded samples (from TCX to ACELP)
  – Transform windows not aligned with ACELP grid
  – LPC analysis window also shifted to the right
• Even worse when switching with AAC
  – Time-Domain Aliasing Cancellation (TDAC)
  – Transitions between LPD and non-LPD processing
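The TDAC property, and the reason it makes mode transitions delicate, can be illustrated with a direct-form MDCT. The sketch below is an assumption-laden toy: it uses a sine window, a block size of M = 8 and an arbitrary test signal, none of which come from the codec. Each block of 2M samples yields only M coefficients (critical sampling), and the aliasing in each block cancels only when the neighbouring overlapping block is also MDCT-coded.

```python
import math

def mdct(block):
    """Direct O(M^2) MDCT: 2*M input samples -> M coefficients."""
    M = len(block) // 2
    return [sum(block[n] * math.cos(math.pi / M * (n + 0.5 + M / 2) * (k + 0.5))
                for n in range(2 * M))
            for k in range(M)]

def imdct(coefs):
    """Inverse MDCT: M coefficients -> 2*M time-aliased samples."""
    M = len(coefs)
    return [(2.0 / M) * sum(coefs[k] * math.cos(math.pi / M * (n + 0.5 + M / 2) * (k + 0.5))
                            for k in range(M))
            for n in range(2 * M)]

def sine_window(M):
    """Sine window, satisfies w[n]^2 + w[n+M]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / (2 * M)) for n in range(2 * M)]

def tdac_roundtrip(signal, M):
    """Windowed MDCT analysis with hop M, then windowed overlap-add synthesis."""
    w = sine_window(M)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * M + 1, M):
        block = [signal[start + n] * w[n] for n in range(2 * M)]
        y = imdct(mdct(block))
        for n in range(2 * M):
            out[start + n] += y[n] * w[n]
    return out

M = 8
signal = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(6 * M)]
recon = tdac_roundtrip(signal, M)

# Samples covered by two overlapping blocks are reconstructed exactly.
interior_err = max(abs(recon[n] - signal[n]) for n in range(M, 5 * M))
# The first M samples have no overlapping neighbour: aliasing remains.
edge_err = max(abs(recon[n] - signal[n]) for n in range(M))
print(interior_err, edge_err)
```

The uncancelled aliasing at the edge is the situation that arises when the neighbouring frame is coded by a different model, such as ACELP.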
Changes brought to the LPD processing
• Replaced FFTs by MDCTs
• Introduced Frequency Domain Noise Shaping
• Introduced Forward Aliasing Cancellation
• Other changes
Frequency Domain Noise Shaping
• To unify processing of AAC and TCX frames, the MDCT transform in TCX is applied in the original signal domain
• Noise shaping for TCX frames is performed in the MDCT domain based on LPC filters mapped to the MDCT domain
• FDNS allows a smooth (sample-by-sample) time-domain noise envelope by applying a 1st-order filtering to the MDCT coefficients (similar in principle to TNS)
Effect of FDNS on the spectral shape and the time-domain envelope of the noise
[Figure: noise gains g1[m] calculated at time position A and noise gains g2[m] calculated at time position B, each plotted along the frequency axis (k or m); interpolated gains seen in the time domain, for each of the M bands, along the time axis (n), with positions A, B and C marked]
Frequency-Domain Noise Shaping
• FDNS allows a smooth (sample-by-sample) time-domain noise envelope by applying a 1st-order filtering to the MDCT coefficients (similar in principle to TNS)
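The idea of mapping an LPC filter to the transform domain can be sketched as follows. This is a simplified illustration, not the codec's procedure: it samples 1/|A(e^{jω})| at band centres to obtain one gain per band, flattens the coefficients with those gains before a uniform quantizer, and reapplies them afterwards. The 1st-order filtering across coefficients mentioned above is not modelled, and the filter, band count, step size and function names are all hypothetical.

```python
import cmath
import math

def lpc_to_band_gains(lpc, num_bands):
    """Sample 1/|A(e^{jw})| at num_bands band centres in (0, pi)."""
    gains = []
    for m in range(num_bands):
        w = math.pi * (m + 0.5) / num_bands
        a = sum(c * cmath.exp(-1j * w * i) for i, c in enumerate(lpc))
        gains.append(1.0 / abs(a))
    return gains

def apply_gains(coefs, gains, inverse=False):
    """Multiply (or divide, if inverse) each coefficient by its band gain."""
    band = len(coefs) // len(gains)  # coefficients per band
    return [c / gains[k // band] if inverse else c * gains[k // band]
            for k, c in enumerate(coefs)]

def quantize(values, step):
    """Uniform scalar quantizer."""
    return [step * round(v / step) for v in values]

STEP = 0.05
coefs = [math.cos(0.37 * k) * 10.0 / (1 + k) for k in range(32)]  # toy spectrum
gains = lpc_to_band_gains([1.0, -0.9], num_bands=8)               # toy A(z)

flat = apply_gains(coefs, gains, inverse=True)      # encoder side: flatten
decoded = apply_gains(quantize(flat, STEP), gains)  # decoder side: reshape
noise = [d - c for d, c in zip(decoded, coefs)]
print(max(abs(e) for e in noise[:4]), max(abs(e) for e in noise[-4:]))
```

In this sketch the quantization error in each band is bounded by half a step times that band's gain, so the noise follows the shape of the LPC envelope.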
Forward Aliasing Cancellation
• Introduced to compensate for windowing and time-domain aliasing in MDCT-coded frames when switching to and from ACELP frames
[Figure: TCX frame output, showing the windowing effect and time-domain aliasing, between the ACELP synthesis and the next ACELP frame]
Forward Aliasing Cancellation
• FAC is applied in the original signal domain
• FAC is quantized in the LPC weighted domain so that the quantization noise of FAC and that of the decoded MDCT are of the same nature
• For transition from ACELP to TCX, the ACELP synthesis can be taken into account; this reduces the bitrate needed to encode FAC
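What FAC has to correct can be made concrete with a single MDCT block that has no MDCT-coded neighbour on its left. The sketch below is a toy under stated assumptions (sine window, M = 8, arbitrary signal, no quantization, no use of the ACELP synthesis): it checks that the first half of the windowed inverse transform equals the windowed signal minus a windowed, time-reversed copy, and forms the correction signal as the difference to the original. The names are hypothetical.

```python
import math

def mdct(block):
    """Direct O(M^2) MDCT: 2*M input samples -> M coefficients."""
    M = len(block) // 2
    return [sum(block[n] * math.cos(math.pi / M * (n + 0.5 + M / 2) * (k + 0.5))
                for n in range(2 * M))
            for k in range(M)]

def imdct(coefs):
    """Inverse MDCT: M coefficients -> 2*M time-aliased samples."""
    M = len(coefs)
    return [(2.0 / M) * sum(coefs[k] * math.cos(math.pi / M * (n + 0.5 + M / 2) * (k + 0.5))
                            for k in range(M))
            for n in range(2 * M)]

M = 8
w = [math.sin(math.pi * (n + 0.5) / (2 * M)) for n in range(2 * M)]  # sine window
x = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(2 * M)]

# Output of one isolated transform-coded block (analysis and synthesis windows)
y = imdct(mdct([x[n] * w[n] for n in range(2 * M)]))
out = [y[n] * w[n] for n in range(2 * M)]

# Expected first half: windowing effect minus folded (time-reversed) aliasing
predicted = [w[n] ** 2 * x[n] - w[n] * w[M - 1 - n] * x[M - 1 - n]
             for n in range(M)]
max_dev = max(abs(out[n] - predicted[n]) for n in range(M))

# Correction signal that restores the original over the first half
fac_target = [x[n] - out[n] for n in range(M)]
print(max_dev, max(abs(t) for t in fac_target))
```

In this simplified picture the correction must be sent explicitly, because no overlapping transform block exists to cancel the aliasing.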
Computation of FAC targets for transitions from and to ACELP (encoder)
[Figure in four lines, with LPC 1 and LPC 2 marked at the two transitions.
Line 1: signal in the original domain and TCX frame output, between the ACELP synthesis and the next ACELP frame.
Line 2: ACELP contribution (windowed ACELP ZIR; windowed and folded ACELP synthesis).
Line 3: ACELP error and TCX frame error (including ACELP contribution).
Line 4: FAC target.]
Quantization of FAC targets
[Figure with two signal chains.
Transition from ACELP to TCX (LPC 1): FAC target, W1(z) with filter memory (ACELP error), DCT-IV, Q (transmit to decoder), inverse DCT-IV, 1/W1(z) with zero memory, ZIR of 1/W1(z), FAC synthesis.
Transition from TCX to ACELP (LPC 2): FAC target, W2(z) with filter memory (TCX frame error), DCT-IV, Q (transmit to decoder), inverse DCT-IV, 1/W2(z) with zero memory, FAC synthesis.]
Other changes brought to the LPD processing
• Critical sampling
  – MDCT vs. FFT
  – FAC+FDNS
• Scalar quantizer + adaptive arithmetic coder for TCX (AMR-WB+ uses AVQ)
• Variable bit rate
  – LPC quantizer
  – Bit reservoir adaptation
Conclusion
• USAC makes use of LPD and non-LPD processing
  – LPD mode inspired by AMR-WB+
  – Non-LPD mode derived from AAC
• Substantial changes were brought to the LPD processing, and new tools were introduced to make it more efficient
  – Frequency Domain Noise Shaping (FDNS)
  – Forward Aliasing Cancellation (FAC)
• USAC is a real unification of two coding models