SPEECH CODING
Maryam Zebarjad, Alessandro Chiumento

SPEECH PROPERTIES
Why speech coding?
- Efficient transmission
- Efficient storage
Problem: achieve high quality at the lowest possible bit rate
Two categories of speech: voiced and unvoiced
- Voiced: quasi-periodic in the time domain and harmonically structured in the frequency domain
- Unvoiced: random-like and broadband (like white noise)

Performance measures
Four standards for speech quality: Broadcast, Network, Communications, Synthetic
Two ways of measuring:
- Objective
  - SNR: signal-to-noise ratio, long term
  - SEGSNR: segmental SNR, short term
- Subjective
  - DRT: Diagnostic Rhyme Test
  - DAM: Diagnostic Acceptability Measure
  - MOS: Mean Opinion Score
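The two objective measures can be sketched as follows. This is a minimal illustration, not a standardized implementation: the function names, the 160-sample frame (20 ms at an assumed 8 kHz sampling rate) and the test signal are assumptions made for the example.

```python
import numpy as np

def snr_db(clean, coded):
    """Long-term SNR in dB, computed over the whole utterance."""
    clean = np.asarray(clean, dtype=float)
    noise = clean - np.asarray(coded, dtype=float)
    return 10.0 * np.log10(np.sum(clean ** 2) / np.sum(noise ** 2))

def segsnr_db(clean, coded, frame_len=160):
    """Segmental SNR in dB: the mean of per-frame SNRs (short term)."""
    clean = np.asarray(clean, dtype=float)
    coded = np.asarray(coded, dtype=float)
    values = []
    for i in range(len(clean) // frame_len):
        seg = slice(i * frame_len, (i + 1) * frame_len)
        sig = np.sum(clean[seg] ** 2)
        err = np.sum((clean[seg] - coded[seg]) ** 2)
        if sig > 0 and err > 0:  # skip silent or error-free frames
            values.append(10.0 * np.log10(sig / err))
    return float(np.mean(values))

# Assumed test signal: a loud half followed by a quiet half, plus constant noise.
fs = 8000
t = np.arange(fs) / fs
clean = np.sin(2 * np.pi * 200 * t)
clean[fs // 2:] *= 0.05
rng = np.random.default_rng(0)
coded = clean + 0.01 * rng.standard_normal(fs)

print("SNR    (dB):", snr_db(clean, coded))
print("SEGSNR (dB):", segsnr_db(clean, coded))
```

The long-term SNR is dominated by the loud segment, while SEGSNR weights every frame equally and therefore penalizes noise in quiet passages.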

Coding Techniques:
WAVEFORM CODERS
Digitize speech on a sample-by-sample basis. The goal is to have the output waveform closely match the input waveform.
- Scalar and vector quantization
- Sub-band coders
- Transform coders
SINUSOIDAL ANALYSIS-SYNTHESIS
Rely on the sinusoidal representation of the speech waveform.
- Short-Time Fourier Transform models
- Sinusoidal Transform Coding
- Multiband Excitation Coder
VOCODERS
Speech-specific coders.
- Formant vocoders
- Channel vocoders
- LPC vocoders

Scalar and Vector Quantization
SQ: every sample is mapped into a specific code.
Examples: PCM, DM, ADPCM, ...
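As an illustration of scalar quantization, here is a minimal sketch of uniform PCM. The 8-bit resolution, the amplitude range [-1, 1) and the function names are assumptions for the example; practical telephone PCM typically uses companding, which is not shown.

```python
import numpy as np

def pcm_encode(x, bits=8, x_max=1.0):
    """Map each sample to an integer code in 0 .. 2**bits - 1."""
    levels = 2 ** bits
    step = 2.0 * x_max / levels
    x = np.clip(np.asarray(x, dtype=float), -x_max, x_max)
    idx = np.floor((x + x_max) / step).astype(int)
    return np.minimum(idx, levels - 1)  # x == x_max falls in the top cell

def pcm_decode(idx, bits=8, x_max=1.0):
    """Reconstruct each sample at the midpoint of its quantization cell."""
    step = 2.0 * x_max / 2 ** bits
    return (np.asarray(idx) + 0.5) * step - x_max

# Assumed test signal: 20 ms of a 300 Hz tone sampled at 8 kHz.
n = np.arange(160)
x = 0.8 * np.sin(2 * np.pi * 300 * n / 8000)
codes = pcm_encode(x)
x_hat = pcm_decode(codes)
print("max error:", np.max(np.abs(x - x_hat)))
```

For inputs inside the range, the reconstruction error is at most half a quantization step.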

Scalar and Vector Quantization
VQ: the data (speech) is compressed by encoding it in blocks. The incoming vectors are formed from consecutive data samples or from model parameters.
Examples: VPCM, GS-VQ, A-VQ, ...
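Block-wise encoding can be sketched as a nearest-neighbor search in a codebook. In this hedged example the vectors are formed from consecutive samples, and the codebook is trained with a plain k-means (generalized Lloyd) loop; the training method, the dimension 4 and the codebook size 16 are assumptions, not the specific schemes named above.

```python
import numpy as np

def make_blocks(x, dim):
    """Form vectors from consecutive samples (trailing remainder dropped)."""
    n = len(x) // dim
    return np.asarray(x[:n * dim], dtype=float).reshape(n, dim)

def vq_encode(vectors, codebook):
    """Return, for each vector, the index of the nearest codeword."""
    d = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def vq_decode(idx, codebook):
    """Replace each index by its codeword."""
    return codebook[idx]

def train_codebook(vectors, size, iters=20, seed=0):
    """Plain k-means training (an assumption made for this sketch)."""
    rng = np.random.default_rng(seed)
    codebook = vectors[rng.choice(len(vectors), size, replace=False)].copy()
    for _ in range(iters):
        idx = vq_encode(vectors, codebook)
        for k in range(size):
            members = vectors[idx == k]
            if len(members):
                codebook[k] = members.mean(axis=0)
    return codebook

# Assumed test signal: a 200 Hz tone at 8 kHz, coded in blocks of 4 samples.
x = np.sin(2 * np.pi * 200 * np.arange(4000) / 8000)
blocks = make_blocks(x, dim=4)
codebook = train_codebook(blocks, size=16)
indices = vq_encode(blocks, codebook)
x_hat = vq_decode(indices, codebook).ravel()
print("bits per sample:", np.log2(len(codebook)) / blocks.shape[1])
```

With 16 codewords for 4-sample blocks, only 4 bits are sent per block, that is, 1 bit per sample.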

Sub-band Coders
Unlike SQ and VQ, these coders rely more on the frequency-domain properties of speech.
The signal band is divided into frequency sub-bands using a bank of bandpass filters. The output of each filter is then sampled (or down-sampled) and encoded.
Examples: AT&T, CCITT (G.722), ...
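The split, down-sample and encode steps can be sketched with the simplest possible two-band filter bank. The 2-tap Haar filters, the bit allocation (6 bits for the low band, 2 for the high band) and all function names are assumptions chosen for brevity; real sub-band coders such as G.722 use much longer filters.

```python
import numpy as np

def analysis(x):
    """Split an even-length signal into low and high bands, each down-sampled by 2."""
    x = np.asarray(x, dtype=float)
    even, odd = x[0::2], x[1::2]
    low = (even + odd) / np.sqrt(2.0)   # 2-tap lowpass, decimated
    high = (even - odd) / np.sqrt(2.0)  # 2-tap highpass, decimated
    return low, high

def synthesis(low, high):
    """Recombine the two bands into a full-rate signal."""
    x = np.empty(2 * len(low))
    x[0::2] = (low + high) / np.sqrt(2.0)
    x[1::2] = (low - high) / np.sqrt(2.0)
    return x

def quantize(v, bits, v_max):
    """Uniform quantizer with 2**bits levels over roughly [-v_max, v_max)."""
    step = 2.0 * v_max / 2 ** bits
    q = np.clip(np.round(v / step), -2 ** (bits - 1), 2 ** (bits - 1) - 1)
    return q * step

# Assumed test signal: a low-frequency tone, so most energy is in the low band.
x = np.sin(2 * np.pi * 200 * np.arange(1600) / 8000)
low, high = analysis(x)
x_hat = synthesis(quantize(low, 6, 1.5), quantize(high, 2, 1.5))
print("average bits per input sample:", (6 + 2) / 2)
print("max error:", np.max(np.abs(x - x_hat)))
```

Giving more bits to the band that carries most of the energy is the basic way sub-band coders reduce the bit rate.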

Transform Coders
- Work on the spectral properties of speech (like SBC).
- They use unitary transforms whose parameters are quantized at the transmitter, then decoded and inverse-transformed at the receiver.
- The potential for bit-rate reduction in transform coding lies in the fact that unitary transforms tend to generate nearly uncorrelated transform components, which can be coded independently.
- Although many transforms can be used (DCT, DFT, WHT, KLT, ...), all share the property of unitarity: the inverse of the transform matrix $A$ is its conjugate transpose, $A^{-1} = A^{H}$.

Example: Adaptive Transform Coder
It employs the DCT and has high performance.
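The transform, quantize and inverse-transform chain can be sketched with an orthonormal DCT. This is a simplified illustration only: it uses a fixed quantization step for every coefficient, whereas an adaptive transform coder varies the bit allocation. The block size 64, the step 0.05 and the function names are assumptions.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix C, so that C @ C.T is the identity."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)
    return c

def tc_encode(x, block=64, step=0.05):
    """Transform each block and quantize the coefficients (transmitter side)."""
    c = dct_matrix(block)
    n = len(x) // block
    blocks = np.asarray(x[:n * block], dtype=float).reshape(n, block)
    return np.round(blocks @ c.T / step).astype(int)

def tc_decode(q, step=0.05):
    """Dequantize and inverse-transform (receiver side)."""
    c = dct_matrix(q.shape[1])
    return ((q * step) @ c).ravel()

# Assumed test signal: two tones at 8 kHz sampling.
n = np.arange(1024)
x = 0.6 * np.sin(2 * np.pi * 250 * n / 8000) + 0.3 * np.sin(2 * np.pi * 900 * n / 8000)
q = tc_encode(x)
x_hat = tc_decode(q)
print("nonzero coefficients:", np.count_nonzero(q), "of", q.size)
print("max error:", np.max(np.abs(x - x_hat)))
```

Because the transform is unitary, the quantization error energy in the coefficient domain equals the reconstruction error energy in the time domain.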

Speech Coding Using Sinusoidal Analysis-Synthesis Models
These speech coders rely on the sinusoidal representation of the speech waveform.
Speech analysis-synthesis using the Short-Time Fourier Transform
- Speech is slowly time-varying (quasi-stationary) and can be modeled by its short-time spectrum.
- Analysis expression: $$S_n(e^{j\Omega}) = \sum_{m=-\infty}^{\infty} s(m)\, h(n-m)\, e^{-j\Omega m}$$
- Synthesis expression: $$s(n) = \frac{1}{2\pi\, h(0)} \int_{-\pi}^{\pi} S_n(e^{j\Omega})\, e^{j\Omega n}\, d\Omega$$
- h(n) is the sliding analysis window; its length is often constrained to about 5-20 ms.
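Short-time analysis and synthesis can be sketched with the DFT. This example makes several assumptions: an 8 kHz sampling rate, a 20 ms (160-sample) periodic Hann window, a hop of half a window, and overlap-add synthesis, which is one common way of inverting the short-time transform rather than the only one.

```python
import numpy as np

def hann_periodic(win_len):
    """Periodic Hann window; shifted copies at a half-window hop sum to 1."""
    n = np.arange(win_len)
    return 0.5 - 0.5 * np.cos(2 * np.pi * n / win_len)

def stft(x, win_len=160, hop=80):
    """Windowed DFT of each frame (analysis)."""
    h = hann_periodic(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack([x[i * hop:i * hop + win_len] * h for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(spec, win_len=160, hop=80):
    """Inverse DFT of each frame followed by overlap-add (synthesis)."""
    frames = np.fft.irfft(spec, n=win_len, axis=1)
    out = np.zeros(hop * (len(frames) - 1) + win_len)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + win_len] += frame
    return out

# Assumed test signal: 100 ms of a 440 Hz tone at 8 kHz.
x = np.sin(2 * np.pi * 440 * np.arange(800) / 8000)
spec = stft(x)
y = istft(spec)
# The first and last half-window are covered by a single frame only,
# so exact reconstruction is expected in the interior.
print("interior max error:", np.max(np.abs(x[80:720] - y[80:720])))
```

A coder built on this representation would quantize the short-time spectrum between the analysis and synthesis steps.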

Speech Coding Using Sinusoidal Analysis-Synthesis Models
Speech analysis-synthesis using Sinusoidal Transform Coding
- The speech is represented by a linear combination of sinusoids with time-varying amplitudes, phases and frequencies (McAulay-Quatieri): $$s(n) = \sum_{k=1}^{L} A_k \cos(\omega_k n + \phi_k)$$
- The number of sinusoids L is time-varying. The bit rate can be reduced because voiced speech is highly periodic, so L can be adjusted accordingly.
- Furthermore, the statistical properties of the short-time spectrum of unvoiced speech are preserved.
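The sum-of-sinusoids representation can be sketched for a single voiced frame. As a simplification, amplitudes, frequencies and phases are held constant within the frame and the sinusoids are placed at harmonics of an assumed pitch; the 8 kHz sampling rate, the 1/k amplitude roll-off and the function names are illustrative assumptions.

```python
import numpy as np

def synthesize(amps, freqs_hz, phases, n_samples, fs=8000):
    """Sum of sinusoids with fixed amplitude, frequency and phase per component."""
    n = np.arange(n_samples)
    omega = 2 * np.pi * np.asarray(freqs_hz, dtype=float) / fs
    return sum(a * np.cos(w * n + p) for a, w, p in zip(amps, omega, phases))

def harmonic_count(f0_hz, fs=8000):
    """Number of pitch harmonics at or below half the sampling rate."""
    return int((fs / 2) // f0_hz)

# Assumed voiced frame: pitch 250 Hz, amplitudes decaying as 1/k, zero phases.
f0 = 250.0
L = harmonic_count(f0)
k = np.arange(1, L + 1)
frame = synthesize(1.0 / k, k * f0, np.zeros(L), n_samples=160)
print("sinusoids for 250 Hz pitch:", L)
print("sinusoids for 100 Hz pitch:", harmonic_count(100.0))
```

A higher pitch needs fewer harmonics to cover the band, which is why L can be adapted frame by frame.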

Vocoders
- Speech-specific
- Low bit rate, but performance degrades for non-speech signals
- Four types: Channel, Formant, Homomorphic, LPC
- Vocoders are divided into three categories based on their excitation models:
  - Two-state excitation
  - Mixed excitation
  - Residual excitation

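LPC Vocoder
For a p-th order forward linear prediction, the present sample is predicted from a linear combination of the p past samples:
$$\hat{s}(n) = \sum_{i=1}^{p} a_i\, s(n-i)$$
The prediction parameters $a_i$ are obtained by minimizing the mean square forward prediction error
$$\varepsilon = E\left[e^2(n)\right]$$
where
$$e(n) = s(n) - \hat{s}(n)$$
For forward estimation, setting the derivatives of $\varepsilon$ with respect to each $a_j$ to zero gives the normal equations:
$$\sum_{i=1}^{p} a_i\, r(|i-j|) = r(j), \qquad j = 1, \dots, p$$
with the autocorrelation $r(m) = E\left[s(n)\, s(n-m)\right]$.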
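Minimizing the mean square prediction error leads to a set of linear equations in the autocorrelation of the signal. The sketch below estimates the autocorrelation from one frame (the autocorrelation method, an assumption for this example) and solves the equations directly with a general linear solver. The sign convention (prediction as a plus-weighted sum of past samples), the synthetic second-order test signal and the function names are also assumptions.

```python
import numpy as np

def autocorr(s, p):
    """Autocorrelation estimates r(0) .. r(p) from one frame."""
    s = np.asarray(s, dtype=float)
    return np.array([np.dot(s[:len(s) - m], s[m:]) for m in range(p + 1)])

def lpc_direct(s, p):
    """Solve the p x p Toeplitz system R a = [r(1) .. r(p)] directly."""
    r = autocorr(s, p)
    R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
    return np.linalg.solve(R, r[1:])

def prediction_error(s, a):
    """e(n) = s(n) - sum_i a_i s(n-i), with zero history before the frame."""
    s = np.asarray(s, dtype=float)
    pred = np.zeros_like(s)
    for i in range(1, len(a) + 1):
        pred[i:] += a[i - 1] * s[:-i]
    return s - pred

# Assumed test signal: s(n) = 1.3 s(n-1) - 0.6 s(n-2) + white noise.
rng = np.random.default_rng(0)
w = rng.standard_normal(20000)
s = np.zeros_like(w)
for n in range(2, len(s)):
    s[n] = 1.3 * s[n - 1] - 0.6 * s[n - 2] + w[n]

a_hat = lpc_direct(s, p=2)
e = prediction_error(s, a_hat)
print("estimated coefficients:", a_hat)
print("prediction gain (dB):", 10 * np.log10(np.sum(s ** 2) / np.sum(e ** 2)))
```

The residual has much less energy than the signal, which is what makes transmitting predictor parameters plus an excitation description cheaper than transmitting the waveform.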

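The system can be solved using the Levinson-Durbin recursion. With autocorrelation $r(m)$, predictor $\hat{s}(n) = \sum_{i=1}^{p} a_i\, s(n-i)$, and $a_i^{(m)}$ denoting the i-th coefficient of the order-m predictor, start from $E_0 = r(0)$ and iterate for $m = 1, \dots, p$:
$$k_m = \frac{r(m) - \sum_{i=1}^{m-1} a_i^{(m-1)}\, r(m-i)}{E_{m-1}}$$
$$a_m^{(m)} = k_m$$
$$a_i^{(m)} = a_i^{(m-1)} - k_m\, a_{m-i}^{(m-1)}, \qquad i = 1, \dots, m-1$$
$$E_m = \left(1 - k_m^2\right) E_{m-1}$$
The final predictor coefficients are $a_i = a_i^{(p)}$.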
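A minimal sketch of the Levinson-Durbin recursion follows, assuming the predictor is written as a plus-weighted sum of past samples and that r holds the autocorrelation values r(0) .. r(p). The function name and the example autocorrelation values are assumptions; the result is checked against a direct solve of the same Toeplitz system.

```python
import numpy as np

def levinson_durbin(r, p):
    """Return predictor coefficients a_1..a_p, reflection coefficients, final error."""
    r = np.asarray(r, dtype=float)
    a = np.zeros(p)
    err = r[0]                      # E_0 = r(0)
    ks = []
    for m in range(1, p + 1):
        # k_m = (r(m) - sum_{i=1}^{m-1} a_i r(m-i)) / E_{m-1}
        k = (r[m] - np.dot(a[:m - 1], r[m - 1:0:-1])) / err
        new_a = a.copy()
        new_a[m - 1] = k
        # a_i <- a_i - k_m * a_{m-i} for i = 1 .. m-1
        new_a[:m - 1] = a[:m - 1] - k * a[:m - 1][::-1]
        a = new_a
        err *= 1.0 - k * k          # E_m = (1 - k_m^2) E_{m-1}
        ks.append(k)
    return a, np.array(ks), err

# Assumed example autocorrelation sequence (positive definite for p = 3).
r = np.array([1.0, 0.5, 0.2, 0.1])
a, ks, err = levinson_durbin(r, p=3)

# Cross-check against a direct solve of the Toeplitz normal equations.
R = np.array([[r[abs(i - j)] for j in range(3)] for i in range(3)])
a_direct = np.linalg.solve(R, r[1:])
print("Levinson-Durbin:", a)
print("direct solve:   ", a_direct)
```

The recursion exploits the Toeplitz structure and needs on the order of p squared operations instead of p cubed; the reflection coefficients it produces as a by-product all have magnitude below 1 for a valid autocorrelation sequence.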

Workplan
Implementation of:
- LPC vocoder
- DCT transform coder
- DPCM coder
Comparison of the three methods for specific speech signals