Overview of Adaptive Multi-Rate Narrow Band (AMR-NB) Speech Codec
Presented by Peter
AMR Narrow Band
- Adaptive Multi-Rate codec for narrow-band speech (AMR-NB)
- Specified by 3GPP for GSM/3G systems
- Input: 8 kHz sampling rate, 13-bit PCM
- 20 ms frames, no overlap
- 8 modes + comfort noise
- Output bit rate from 4.75 to 12.2 kbit/s
- Algebraic Code Excited Linear Prediction (ACELP) is used as the speech codec
Frequency Response
Speech Encoder
- Pre-processing
- Linear prediction analysis and quantization
- Open-loop pitch analysis
- Impulse response computation
- Target signal computation
- Adaptive codebook
- Algebraic codebook
- Quantization of the adaptive and fixed codebook gains
- Memory update
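Principles of the adaptive multi-rate speech encoder
- Eight source codecs with bit rates of 12.2, 10.2, 7.95, 7.40, 6.70, 5.90, 5.15 and 4.75 kbit/s
- A 10th-order linear prediction (LP), or short-term, synthesis filter is used, which is given by
  $H(z) = \frac{1}{\hat{A}(z)} = \frac{1}{1 + \sum_{i=1}^{m} \hat{a}_i z^{-i}}$, with $m = 10$
- The long-term, or pitch, synthesis filter is given by
  $\frac{1}{B(z)} = \frac{1}{1 - g_p z^{-T}}$, where $T$ is the pitch delay and $g_p$ is the pitch gain
- The pitch synthesis filter is implemented using the adaptive codebook approach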
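
The frame arithmetic implied by these figures can be checked with a short sketch. It uses only numbers stated on the slides (8 kHz sampling, 20 ms frames, the eight mode bit rates); the function and constant names are illustrative.

```python
# Frame-size arithmetic for AMR-NB, using only figures from the slides:
# 8 kHz sampling, 20 ms frames, and the eight mode bit rates.
SAMPLE_RATE_HZ = 8000
FRAME_MS = 20
MODE_BITRATES_BPS = [12200, 10200, 7950, 7400, 6700, 5900, 5150, 4750]


def samples_per_frame(rate_hz=SAMPLE_RATE_HZ, frame_ms=FRAME_MS):
    """Number of input samples in one frame."""
    return rate_hz * frame_ms // 1000


def bits_per_frame(bitrate_bps, frame_ms=FRAME_MS):
    """Number of coded speech bits produced per frame at a given bit rate."""
    return bitrate_bps * frame_ms // 1000


if __name__ == "__main__":
    print("samples per frame:", samples_per_frame())
    for bps in MODE_BITRATES_BPS:
        print(f"{bps / 1000:5.2f} kbit/s -> {bits_per_frame(bps)} bits per frame")
```

Integer arithmetic in bit/s is used so that values such as 7.95 kbit/s do not pick up floating-point rounding.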
ACELP
Pre-Processing
- Two pre-processing functions
  - High-pass filtering
  - Signal down-scaling (to prevent overflow)
- A filter with a cut-off frequency of 80 Hz is used
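
A minimal sketch of the two pre-processing steps follows. The slide gives only the 80 Hz cut-off; the second-order (biquad) design below uses generic audio-cookbook formulas and is an assumption, not the coefficients of the standardized filter. The divide-by-2 scaling matches the factor mentioned on the post-processing slide.

```python
import math


def highpass_biquad(fc, fs, q=1 / math.sqrt(2)):
    """Generic 2nd-order high-pass coefficients (assumed design, not the spec's)."""
    w0 = 2 * math.pi * fc / fs
    alpha = math.sin(w0) / (2 * q)
    cosw = math.cos(w0)
    a0 = 1 + alpha
    b = [(1 + cosw) / 2 / a0, -(1 + cosw) / a0, (1 + cosw) / 2 / a0]
    a = [1.0, -2 * cosw / a0, (1 - alpha) / a0]
    return b, a


def preprocess(samples, fc=80.0, fs=8000.0):
    """Down-scale the input by 2, then high-pass filter it."""
    b, a = highpass_biquad(fc, fs)
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for s in samples:
        x = s / 2.0  # down-scaling to leave headroom against overflow
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out
```

With this sketch a constant (DC) input decays to zero, while a 1 kHz tone passes at roughly half its input amplitude.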
Linear Prediction Analysis
- Frame is split into four sub-frames of 5 ms (40 samples) each
- 12.2 kbit/s mode
  - Performed twice per frame
  - 30 ms asymmetric window
  - No lookahead
- 10.2, 7.95, 7.40, 6.70, 5.90, 5.15, 4.75 kbit/s modes
  - Performed once per frame
  - 30 ms asymmetric window
  - 5 ms lookahead
Windowing and Auto-correlation Computation
- 12.2 kbit/s mode
  - Two different asymmetric windows
  - 1st window concentrates on the 2nd sub-frame
  - 2nd window concentrates on the 4th sub-frame
Windowing and Auto-correlation Computation
- 10.2, 7.95, 7.40, 6.70, 5.90, 5.15, 4.75 kbit/s modes
  - One asymmetric window
  - Concentrates on the 4th sub-frame
  - 5 ms (40 samples) lookahead
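
To illustrate what "asymmetric" means here, the sketch below builds a 240-sample (30 ms) window from a long half-Hamming rise and a short quarter-cosine fall. The slide gives neither the formula nor the split; the lengths 200 and 40 and the two cosine expressions are assumptions modelled on the window described in 3GPP TS 26.090 and should be checked against the specification.

```python
import math


def asymmetric_window(l1=200, l2=40):
    """Half-Hamming rise over l1 samples, quarter-cosine fall over l2 samples.

    l1, l2 and both formulas are assumptions for illustration.
    """
    rising = [0.54 - 0.46 * math.cos(2 * math.pi * n / (2 * l1 - 1)) for n in range(l1)]
    falling = [math.cos(2 * math.pi * n / (4 * l2 - 1)) for n in range(l2)]
    return rising + falling


if __name__ == "__main__":
    w = asymmetric_window()
    peak = max(range(len(w)), key=lambda n: w[n])
    print(len(w), "samples, peak at index", peak)
```

If the last 40 samples are taken to be the lookahead, the peak falls at the end of the current frame, which is consistent with a window that concentrates on the 4th sub-frame.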
Auto-correlation Computation
- Lags 0 to 10 are computed:
  $r(k) = \sum_{n=k}^{239} s'(n)\, s'(n-k), \quad k = 0, \ldots, 10$
- $s'(n)$ is the windowed speech
- 60 Hz bandwidth expansion is applied by lag windowing
- $r(0)$ is multiplied by the white noise correction factor 1.0001, which is equivalent to adding a noise floor at -40 dB
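
The three steps on this slide can be sketched as follows. The Gaussian form of the lag window, $w_{lag}(k) = \exp[-\frac{1}{2}(2\pi f_0 k / f_s)^2]$ with $f_0 = 60$ Hz and $f_s = 8000$ Hz, is not shown on the slide and is assumed here; function names are illustrative.

```python
import math


def autocorrelation(windowed, max_lag=10):
    """r(k) = sum over n of s'(n) * s'(n - k), for k = 0..max_lag."""
    n = len(windowed)
    return [
        sum(windowed[i] * windowed[i - k] for i in range(k, n))
        for k in range(max_lag + 1)
    ]


def condition(r, f0=60.0, fs=8000.0, white_noise=1.0001):
    """Apply white noise correction to r(0) and a lag window to r(1..)."""
    out = [r[0] * white_noise]
    for k in range(1, len(r)):
        w_lag = math.exp(-0.5 * (2 * math.pi * f0 * k / fs) ** 2)  # assumed form
        out.append(r[k] * w_lag)
    return out


if __name__ == "__main__":
    # Noise floor implied by the 1.0001 factor: 10*log10(0.0001) = -40 dB.
    print(10 * math.log10(1.0001 - 1.0))
```

The final line reproduces the -40 dB figure: the correction adds 0.0001 times the signal energy to r(0).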
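Levinson-Durbin algorithm
- The LP coefficients $a_k$ are obtained from the modified (lag-windowed) autocorrelations $r'(k)$ by solving the set of equations
  $\sum_{k=1}^{10} a_k\, r'(|i-k|) = -r'(i), \quad i = 1, \ldots, 10$
- The algorithm uses the following recursion:
  $E(0) = r'(0)$
  for $i = 1, \ldots, 10$:
    $k_i = -\left[ r'(i) + \sum_{j=1}^{i-1} a_j^{(i-1)}\, r'(i-j) \right] / E(i-1)$
    $a_i^{(i)} = k_i$
    $a_j^{(i)} = a_j^{(i-1)} + k_i\, a_{i-j}^{(i-1)}, \quad j = 1, \ldots, i-1$
    $E(i) = (1 - k_i^2)\, E(i-1)$
- The final solution is given as $a_j = a_j^{(10)}, \quad j = 1, \ldots, 10$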
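
A compact floating-point sketch of the Levinson-Durbin recursion is given below. It assumes the sign convention $A(z) = 1 + \sum_k a_k z^{-k}$, so that the normal equations read $\sum_k a_k r(|i-k|) = -r(i)$; a real codec would use fixed-point arithmetic.

```python
def levinson_durbin(r, order=10):
    """Solve for LP coefficients a[0..order] (a[0] = 1) from autocorrelations r.

    Returns the coefficient list and the final prediction error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient from the current prediction of r[i].
        acc = sum(a[j] * r[i - j] for j in range(i))
        k = -acc / err
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        err *= 1.0 - k * k
    return a, err


if __name__ == "__main__":
    # First-order autoregressive example: r(k) = 0.5**k.
    coeffs, energy = levinson_durbin([0.5 ** k for k in range(4)], order=3)
    print(coeffs, energy)
```

For the autocorrelation $r(k) = 0.5^k$ the only non-zero predictor coefficient is $a_1 = -0.5$, which makes a convenient sanity check.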
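LP to LSP conversion
- The LP filter coefficients $a_k$, $k = 1, \ldots, 10$, are converted to the line spectral pair (LSP) representation for quantization and interpolation purposes
- The LSPs are defined as the roots of the sum and difference polynomials
  $F_1'(z) = A(z) + z^{-11} A(z^{-1})$ and $F_2'(z) = A(z) - z^{-11} A(z^{-1})$
- All roots of these polynomials are on the unit circle and they alternate with each other
- The roots at $z = -1$ and $z = 1$ are eliminated:
  $F_1(z) = F_1'(z) / (1 + z^{-1})$ and $F_2(z) = F_2'(z) / (1 - z^{-1})$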
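
The sum and difference polynomials are easy to form from the coefficient list, as the sketch below shows. It assumes $A(z) = 1 + \sum_k a_k z^{-k}$ with even order $m$, and it only builds the polynomials and checks the trivial roots at $z = -1$ and $z = 1$; the actual root search is omitted.

```python
def sum_diff_polynomials(a):
    """Return coefficients (in powers of z^-1) of A(z) +/- z^-(m+1) A(1/z).

    a = [1, a1, ..., am] with m even (m = 10 in AMR-NB).
    """
    extended = list(a) + [0.0]          # A(z), padded to degree m + 1
    mirrored = [0.0] + list(a)[::-1]    # z^-(m+1) * A(1/z)
    f_sum = [x + y for x, y in zip(extended, mirrored)]
    f_diff = [x - y for x, y in zip(extended, mirrored)]
    return f_sum, f_diff


def evaluate(coeffs, z_inv):
    """Evaluate a polynomial in z^-1 at a real value of z^-1."""
    return sum(c * z_inv ** k for k, c in enumerate(coeffs))


if __name__ == "__main__":
    f1, f2 = sum_diff_polynomials([1.0, -0.9, 0.2])  # toy 2nd-order filter
    print(f1, evaluate(f1, -1.0))  # sum polynomial vanishes at z = -1
    print(f2, evaluate(f2, 1.0))   # difference polynomial vanishes at z = 1
```

The symmetric and antisymmetric coefficient patterns are what force these two fixed roots, which is why they can be divided out before the search.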
LP to LSP conversion
Quantization of the LSP coefficients
- 12.2 kbit/s mode
  - Two sets of LSPs are quantized using their representation in the frequency domain (line spectral frequencies, LSFs)
  - 1st-order MA prediction is applied
  - The two residual LSF vectors are jointly quantized using split matrix quantization (SMQ)
  - A weighted LSP distortion measure is used in the quantization process
- 10.2, 7.95, 7.40, 6.70, 5.90, 5.15, 4.75 kbit/s modes
  - 1st-order MA prediction is applied
  - The residual LSF vector is quantized using split vector quantization
  - A weighted LSP distortion measure is used
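
Split vector quantization with a weighted distortion measure can be sketched as below. The codebooks, split boundaries and weights are toy values invented for the example; the real AMR-NB tables and the MA prediction step are not reproduced.

```python
def weighted_sq_error(x, c, w):
    """Weighted squared error between a sub-vector and a codebook entry."""
    return sum(wi * (xi - ci) ** 2 for xi, ci, wi in zip(x, c, w))


def split_vq(residual, weights, codebooks, splits):
    """Quantize each sub-vector independently; return one index per split."""
    indices = []
    for (start, end), book in zip(splits, codebooks):
        sub = residual[start:end]
        w = weights[start:end]
        errors = [weighted_sq_error(sub, entry, w) for entry in book]
        indices.append(errors.index(min(errors)))
    return indices


if __name__ == "__main__":
    # Hypothetical 4-dimensional residual split into two 2-dimensional parts.
    residual = [0.1, 0.2, -0.3, 0.4]
    weights = [1.0, 1.0, 1.0, 1.0]
    codebooks = [
        [(0.0, 0.0), (0.1, 0.25)],
        [(-0.25, 0.5), (0.3, -0.4)],
    ]
    print(split_vq(residual, weights, codebooks, [(0, 2), (2, 4)]))
```

Splitting keeps each codebook small: searching two 2-dimensional books is far cheaper than one 4-dimensional book of the same total resolution.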
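Interpolation of the LSPs
- 12.2 kbit/s mode
  - Interpolated LSP vectors at the 1st and 3rd subframes are given by
    $\hat{q}_1^{(n)} = 0.5\,\hat{q}_4^{(n-1)} + 0.5\,\hat{q}_2^{(n)}$
    $\hat{q}_3^{(n)} = 0.5\,\hat{q}_2^{(n)} + 0.5\,\hat{q}_4^{(n)}$
- 10.2, 7.95, 7.40, 6.70, 5.90, 5.15, 4.75 kbit/s modes
  - Interpolated LSP vectors at the 1st, 2nd, and 3rd subframes are given by
    $\hat{q}_1^{(n)} = 0.75\,\hat{q}_4^{(n-1)} + 0.25\,\hat{q}_4^{(n)}$
    $\hat{q}_2^{(n)} = 0.5\,\hat{q}_4^{(n-1)} + 0.5\,\hat{q}_4^{(n)}$
    $\hat{q}_3^{(n)} = 0.25\,\hat{q}_4^{(n-1)} + 0.75\,\hat{q}_4^{(n)}$
- Here $\hat{q}_i^{(n)}$ denotes the quantized LSP vector at subframe $i$ of frame $n$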
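
The interpolation is a per-element linear blend, sketched below for both cases. The weights (0.5 for the 12.2 kbit/s mode; 0.75, 0.5 and 0.25 for the other modes) are assumed from the linear interpolation described in 3GPP TS 26.090; the function names are illustrative.

```python
def blend(older, newer, w_older):
    """Element-wise w_older * older + (1 - w_older) * newer."""
    return [w_older * o + (1.0 - w_older) * n for o, n in zip(older, newer)]


def subframe_lsps_12k2(q4_prev, q2, q4):
    """12.2 kbit/s mode: subframes 2 and 4 are transmitted, 1 and 3 interpolated."""
    return [blend(q4_prev, q2, 0.5), list(q2), blend(q2, q4, 0.5), list(q4)]


def subframe_lsps_other(q4_prev, q4):
    """Other modes: only subframe 4 is transmitted, 1 to 3 are interpolated."""
    return [
        blend(q4_prev, q4, 0.75),
        blend(q4_prev, q4, 0.5),
        blend(q4_prev, q4, 0.25),
        list(q4),
    ]


if __name__ == "__main__":
    print(subframe_lsps_other([0.0, 1.0], [1.0, 3.0]))
```

Interpolating in the LSP domain rather than on the LP coefficients directly keeps every intermediate filter stable, which is one reason the conversion is done at all.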
Open-loop pitch analysis
- Performed twice per frame (every 10 ms) for the 12.2, 10.2, 7.95, 7.40, 6.70 and 5.90 kbit/s modes
- Performed once per frame for the 5.15 and 4.75 kbit/s modes
- Uses the pre-processed signal filtered through a perceptual weighting filter
[Figure: original and weighted filters shown against the unit circle; cases labelled Flat and Tilted]
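Impulse response computation
- The impulse response $h(n)$ of the weighted synthesis filter $H(z)W(z) = A(z/\gamma_1) / [\hat{A}(z)\, A(z/\gamma_2)]$ is computed each subframe
- It is needed for the search of the adaptive and fixed codebooks
- Computed by filtering the vector of coefficients of the filter $A(z/\gamma_1)$, extended by zeros, through the two filters $1/\hat{A}(z)$ and $1/A(z/\gamma_2)$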
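
The procedure on this slide can be sketched as follows, assuming the weighted filters are $A(z/\gamma)$, i.e. coefficient $a_i$ scaled by $\gamma^i$. The values 0.9 and 0.6 used in the demo are illustrative placeholders, not values taken from the slides.

```python
def bandwidth_expand(a, gamma):
    """Coefficients of A(z/gamma): a[i] is scaled by gamma**i."""
    return [c * gamma ** i for i, c in enumerate(a)]


def all_pole_filter(x, a):
    """Filter x through 1/A(z), with a[0] = 1 and zero initial state."""
    y = []
    for n, xn in enumerate(x):
        taps = min(n, len(a) - 1)
        y.append(xn - sum(a[k] * y[n - k] for k in range(1, taps + 1)))
    return y


def impulse_response(a, a_hat, gamma1, gamma2, length=40):
    """h(n) of A(z/gamma1) / (A_hat(z) * A(z/gamma2)) over one subframe."""
    numerator = bandwidth_expand(a, gamma1)
    x = (numerator + [0.0] * length)[:length]  # coefficients extended by zeros
    return all_pole_filter(all_pole_filter(x, a_hat), bandwidth_expand(a, gamma2))


if __name__ == "__main__":
    a = [1.0, -0.9]  # toy first-order filter, used unquantized for a_hat too
    print(impulse_response(a, a, gamma1=0.9, gamma2=0.6)[:4])
```

Feeding the numerator coefficients in as the input signal is equivalent to exciting the full filter with a unit impulse, which avoids a separate FIR stage.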
Adaptive codebook
- Adaptive codebook search is performed on a subframe basis
- The parameters are the delay and gain of the pitch filter
- The codebook contains entries taken from the previously synthesized excitation signal
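
A simplified sketch of how an adaptive codebook entry is formed from past excitation, and how a gain can be fitted to it, is shown below. It handles integer delays only (no fractional interpolation), and the gain bounds of 0 and 1.2 are assumptions for the example.

```python
def adaptive_codebook_vector(past_excitation, delay, length=40):
    """Copy the excitation from `delay` samples back, sample by sample.

    When delay < length, already generated samples are reused, so the
    segment repeats with period `delay`.
    """
    buf = list(past_excitation)
    start = len(buf)
    for _ in range(length):
        buf.append(buf[-delay])
    return buf[start:]


def pitch_gain(target, filtered_vector, lo=0.0, hi=1.2):
    """Least-squares gain <x, y> / <y, y>, clamped to [lo, hi] (assumed bounds)."""
    num = sum(t * y for t, y in zip(target, filtered_vector))
    den = sum(y * y for y in filtered_vector)
    gain = num / den if den > 0 else 0.0
    return max(lo, min(hi, gain))


if __name__ == "__main__":
    past = [0.0, 1.0, 2.0, 3.0, 4.0] * 20  # excitation with period 5
    print(adaptive_codebook_vector(past, delay=5, length=8))
```

Because the entries come from the decoder-visible past excitation, the encoder only needs to transmit the delay and the gain, not the vector itself.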
Algebraic codebook
- Encodes the random portion of the excitation signal
- The periodic portion of the weighted residual is removed first; only the random portion remains to be coded by the fixed codebook
- The codebook is searched by minimizing the error between the perceptually weighted input speech and the reconstructed speech
- Based on the interleaved single-pulse permutation (ISPP) design
  - A few sparse impulse sequences that are phase-shifted versions of each other
  - All pulses have the same magnitude
  - Amplitudes are +1 or -1
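
The structure of an algebraic code vector can be sketched as below: pulse positions are drawn from interleaved tracks and each pulse has amplitude +1 or -1. The layout of five tracks over a 40-sample subframe is an assumption for illustration; the number of pulses and tracks differs between modes.

```python
def track_positions(track, n_tracks=5, length=40):
    """Allowed pulse positions for one interleaved track (assumed layout)."""
    return list(range(track, length, n_tracks))


def build_codevector(pulses, length=40):
    """Build a sparse code vector from (position, sign) pairs, sign in {+1, -1}."""
    code = [0] * length
    for position, sign in pulses:
        if sign not in (1, -1):
            raise ValueError("pulse amplitude must be +1 or -1")
        code[position] += sign
    return code


if __name__ == "__main__":
    print(track_positions(0))
    vector = build_codevector([(0, 1), (7, -1), (13, 1)])
    print([i for i, v in enumerate(vector) if v != 0])
```

Since the vector is fully described by a handful of positions and signs, no codebook table needs to be stored, which is what makes the codebook "algebraic".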
Speech decoder
- Codebook parameters are decoded by table look-up
- LSP coefficients are interpolated and converted to LP coefficients
- Excitation = sum of the adaptive and fixed codebook vectors multiplied by their respective gains, in each subframe
- Speech = excitation passed through the vocal tract (LP synthesis) filter
- Perceived quality is enhanced by adaptive post-filtering
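
The two central decoder steps, forming the excitation and passing it through the LP synthesis filter, can be sketched as follows. Names are illustrative, arithmetic is floating point, and post-filtering is left out.

```python
def build_excitation(adaptive_vec, fixed_vec, g_p, g_c):
    """u(n) = g_p * v(n) + g_c * c(n) for one subframe."""
    return [g_p * v + g_c * c for v, c in zip(adaptive_vec, fixed_vec)]


def lp_synthesis(excitation, a, memory=None):
    """Filter the excitation through 1/A(z), with a[0] = 1 and order >= 1.

    `memory` holds the last `order` output samples (most recent last) so
    that the filter state carries over between subframes.
    """
    order = len(a) - 1
    mem = list(memory) if memory is not None else [0.0] * order
    out = []
    for u in excitation:
        s = u - sum(a[k] * mem[-k] for k in range(1, order + 1))
        out.append(s)
        mem = mem[1:] + [s]
    return out, mem


if __name__ == "__main__":
    u = build_excitation([1.0, 0.0, 0.0], [0.0, 0.0, 0.0], g_p=1.0, g_c=0.0)
    speech, state = lp_synthesis(u, [1.0, -0.5])
    print(speech, state)
```

Returning the filter memory alongside the output lets the next subframe continue from the same state, mirroring the "memory update" step listed for the encoder.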
Speech decoder
Synthesis model
Synthesis model
- To reconstruct speech
  - A noise-like excitation
  - A pitch filter model of the glottal vibrations
  - A linear prediction filter model of the vocal tract
Post-processing
- Adaptive post-filtering
  - Cascade of two filters: a formant postfilter and a tilt compensation filter
  - Updated every subframe of 5 ms
- High-pass filter
  - Against undesired low-frequency components
  - Cut-off frequency of 60 Hz is used
- Up-scaling by a factor of 2 to compensate for the down-scaling by 2 applied to the input signal