Audio Coding
Ketan Mayer-Patel
CS 294-9 :: Fall 2003

Overview of Today
• PCM (sampling techniques)
  – Linear
  – μ-law
• DPCM, ADPCM (generic coding techniques)
• MPEG-1 (psychoacoustic coding)
• Vocoding (speech-specific techniques)

Audio Signals
• Analog audio is basically voltage as a continuous function of time.
• Unlike video, which is 3D, audio is a 1D signal.
  – It can be captured without having to discretize the higher dimensions.
• Audio sampling basically boils down to quantizing the signal level to a set of values.
• Digital audio parameters:
  – bits per sample
  – sampling rate
  – number of channels

Sampling
• Pulse Amplitude Modulation (PAM)
  – Each sample's amplitude is represented by one analog value.
• Sampling theory (Nyquist)
  – If the input signal has maximum frequency (bandwidth) f, the sampling frequency must be at least 2f.
  – With a low-pass filter to interpolate between samples, the input signal can be fully reconstructed.
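The Nyquist condition can be illustrated with a small sketch. The sampling rate and tone frequencies below are hypothetical values chosen for illustration: a tone above half the sampling rate yields exactly the same samples as a lower-frequency tone, so the two cannot be told apart after sampling.

```python
import math

def sample_tone(freq_hz, rate_hz, count):
    """Return `count` samples of a unit-amplitude cosine at freq_hz, sampled at rate_hz."""
    return [math.cos(2 * math.pi * freq_hz * n / rate_hz) for n in range(count)]

RATE_HZ = 10_000                          # hypothetical sampling rate; the limit is 5 kHz
below = sample_tone(3_000, RATE_HZ, 16)   # satisfies rate >= 2f
above = sample_tone(7_000, RATE_HZ, 16)   # violates rate >= 2f

# The 7 kHz tone produces the same samples as the 3 kHz tone (it aliases).
max_gap = max(abs(a - b) for a, b in zip(below, above))
print(f"largest sample difference: {max_gap:.2e}")
```

Any tone at (rate − f) would behave the same way, which is why the input is band-limited before sampling.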

PCM
[Figure: a sampled waveform against 4-bit code words (0100 down to 0000, then 1001 to 1100); the gap between the signal and the nearest level is the quantization error ("noise").]
• Pulse Code Modulation (PCM)
  – Each sample's amplitude is represented by an integer code word.
  – Each bit of resolution adds 6 dB of dynamic range.
  – The number of bits required depends on the amount of noise that is tolerated:
    $n = \dfrac{\mathrm{SNR} - 4.77}{6.02}$ (SNR in dB)
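The "6 dB per bit" rule can be checked numerically. This is a minimal sketch, assuming a uniform quantizer and a near-full-scale sine test tone (both hypothetical choices); it measures only the slope per bit, since the constant offset depends on the signal statistics.

```python
import math

def quantize(x, bits):
    """Uniformly quantize x in [-1, 1) to a signed integer code of the given width."""
    levels = 2 ** (bits - 1)
    code = math.floor(x * levels + 0.5)            # round to the nearest level
    return max(-levels, min(levels - 1, code))

def snr_db(bits, count=10_000):
    """Measured SNR (dB) of a sine wave after quantization to `bits` bits."""
    levels = 2 ** (bits - 1)
    signal_power = noise_power = 0.0
    for n in range(count):
        x = 0.99 * math.sin(2 * math.pi * 0.01234567 * n)   # hypothetical test tone
        err = x - quantize(x, bits) / levels
        signal_power += x * x
        noise_power += err * err
    return 10 * math.log10(signal_power / noise_power)

for bits in (8, 12, 16):
    print(bits, "bits:", round(snr_db(bits), 1), "dB")
```

Going from 8 to 12 bits should raise the measured SNR by roughly 4 × 6 dB.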

Linear PCM
• Uses evenly spaced quantization levels.
• Typically 16 bits per sample.
• Provides a large dynamic range.
• Difficult for humans to perceive quantization noise.
• Compact discs
  – 16-bit linear sampling
  – 44.1 kHz sampling rate
  – 2 channels

Non-linear Sampling
• If we try to use 8 bits per sample, dynamic range is reduced significantly and quantization noise can be heard.
• In particular, we end up with not enough levels for the lower amplitudes.
• The solution is to sample more densely in the lower amplitudes and less densely in the higher amplitudes.
• Sort of like a log scale.

Non-linear Sampling Illustrated
[Figure: output level plotted against input level.]

μ-law and A-law
• Non-linear sampling is called "companding".
• 8 bits companded provides dynamic range equivalent to 12 bits.
• μ-law and A-law are companding standards defined in G.711.
• The difference is in the exact shape of the piece-wise linear companding function.

μ-Law Companding
• Provides 14-bit quality (dynamic range) with an 8-bit encoding.
• Used in North American and Japanese ISDN voice service.
• Simple to compute encoding:
  $f(x) = 127 \times \operatorname{sign}(x) \times \dfrac{\ln(1 + \mu |x|)}{\ln(1 + \mu)}$ (x normalized to [-1, 1])
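The companding formula can be sketched directly in code. The slide does not give a value for μ; the sketch assumes μ = 255, the usual G.711 value, and the function names are hypothetical. It implements the continuous curve only, not the segmented table lookup used by real G.711 codecs.

```python
import math

MU = 255  # assumption: not stated on the slide; 255 is the usual G.711 value

def mulaw_compress(x):
    """Map x in [-1, 1] to an integer in [-127, 127] using the continuous mu-law curve."""
    magnitude = math.log1p(MU * abs(x)) / math.log1p(MU)
    return int(round(math.copysign(127 * magnitude, x)))

def mulaw_expand(code):
    """Approximate inverse of mulaw_compress; returns a value in [-1, 1]."""
    magnitude = math.expm1(abs(code) / 127 * math.log1p(MU)) / MU
    return math.copysign(magnitude, code)

for x in (0.001, 0.01, 0.1, 0.5, 1.0):
    code = mulaw_compress(x)
    print(f"x={x:<6} code={code:>4} decoded={mulaw_expand(code):.4f}")
```

Small inputs are spread over many codes while large inputs share few, which is the "sort of like a log scale" behavior described earlier.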

μ-Law Encoding
[Block diagram: at the sender, a high-resolution PCM encoding (12, 14, or 16 bits) passes through a table lookup to give the 8-bit μ-law encoding; at the receiver, an inverse table lookup gives the 14-bit decoding.]

Input amplitude  Step size  Segment  Quantization  Code value
0-1              1          000      0000          0
1-3              2          000      0001          1
...              ...        ...      ...           ...
29-31            2          000      1111          15
31-35            4          001      0000          16
...              ...        ...      ...           ...
91-95            4          001      1111          31
95-103           8          010      0000          32
...              ...        ...      ...           ...
215-223          8          010      1111          47
223-239          16         011      0000          48
...              ...        ...      ...           ...
463-479          16         011      1111          63

μ-Law Decoding
[Block diagram: at the sender, a high-resolution PCM encoding (12, 14, or 16 bits) passes through a table lookup to give the 8-bit μ-law encoding; at the receiver, an inverse table lookup gives the 14-bit decoding.]

μ-law encoding  Multiplier  Decoded amplitude
0000000         1           0
0000001         2           2
...             ...         ...
0001111         2           30
0010000         4           33
...             ...         ...
0011111         4           93
0100000         8           99
...             ...         ...
0101111         8           219
0110000         16          231
...             ...         ...
0111111         16          471

Difference Encoding
[Figure: a sampled waveform against the 4-bit code words 0100 down to 0000, then 1001 to 1100.]
• Differential PCM (DPCM)
  – Exploit temporal redundancy in samples.
  – The difference between two x-bit samples can be represented with significantly fewer than x bits.
  – Transmit the difference (rather than the sample).
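A minimal sketch of the idea, assuming the simplest possible predictor (the previous sample, with the first sample predicted as 0) and no quantization of the differences. The sample values are hypothetical.

```python
def dpcm_encode(samples):
    """Replace each sample with its difference from the previous one."""
    previous = 0
    diffs = []
    for s in samples:
        diffs.append(s - previous)
        previous = s
    return diffs

def dpcm_decode(diffs):
    """Rebuild the samples by accumulating the differences."""
    previous = 0
    out = []
    for d in diffs:
        previous += d
        out.append(previous)
    return out

samples = [1000, 1004, 1009, 1011, 1008, 1002]   # hypothetical slowly varying signal
diffs = dpcm_encode(samples)
print(diffs)                # after the first value, every difference is small
print(dpcm_decode(diffs))   # round-trips to the original samples
```

After the first value, the differences fit in a few bits even though the samples themselves need eleven.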

Slope Overload Problem
[Figure: a rapidly changing waveform against the 4-bit code words; the region where the coded differences cannot keep up is labeled "Slope Overload".]
• Differences in high-frequency signals near the Nyquist frequency cannot be represented with a smaller number of bits!
  – The error introduced leads to severe distortion in the higher frequencies.
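Slope overload can be reproduced with a small sketch: a DPCM coder whose differences are clamped to a fixed signed range. The 4-bit width and both test signals are hypothetical choices for illustration.

```python
def dpcm_clamped(samples, bits=4):
    """Closed-loop DPCM whose differences are clamped to a signed `bits`-wide range."""
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    reconstructed = 0
    out = []
    for s in samples:
        # Overload occurs whenever the true difference falls outside [lo, hi].
        diff = max(lo, min(hi, s - reconstructed))
        reconstructed += diff
        out.append(reconstructed)
    return out

slow = [0, 3, 6, 9, 12, 15]
fast = [0, 40, 0, 40, 0, 40]     # hypothetical signal alternating at the Nyquist rate
print(dpcm_clamped(slow))        # tracks the input exactly
print(dpcm_clamped(fast))        # swings are limited to the largest codable difference
```

The slow ramp is reproduced exactly, while the fast signal's amplitude collapses to what the difference code can express.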

Adaptive DPCM (ADPCM)
• Use a larger step size to encode differences between high-frequency samples and a smaller step size for differences between low-frequency samples.
• Use previous sample values to estimate changes in the signal in the near future.

ADPCM
• To ensure differences are always small...
  – Adaptively change the step size (quanta).
  – (Adaptively) attempt to predict the next sample value.
[Block diagram: the predicted PCM sample n+1 is subtracted from the y-bit PCM sample and the result feeds the difference quantizer, which outputs the x-bit ADPCM "difference". A step-size adjuster and a dequantizer feed the predictor.]

IMA's Proposed ADPCM
[Block diagram: PCM sample n-1, held in a register, is subtracted from the 16-bit PCM sample and the result feeds the difference quantizer, which outputs the 4-bit ADPCM difference. A step-size adjuster and a dequantizer update the register.]
• The predictor is not adaptive and simply uses the last sample value.
• The quantization step size increases logarithmically with signal frequency.

IMA Difference Quantization
[Block diagram: PCM sample n-1, held in a register, is subtracted from the 16-bit PCM sample; the difference quantizer outputs the 4-bit ADPCM difference in step-size units. A step-size adjuster and a dequantizer update the register.]

Difference range                                    Quantizer output  Step-size multiple
difference < 1/4 step_size                          000               0.0
1/4 step_size <= difference < 1/2 step_size         001               0.25
1/2 step_size <= difference < 3/4 step_size         010               0.50
3/4 step_size <= difference < step_size             011               0.75
step_size <= difference < 5/4 step_size             100               1.0
5/4 step_size <= difference < 3/2 step_size         101               1.25
3/2 step_size <= difference < 7/4 step_size         110               1.50
7/4 step_size <= difference                         111               1.75

IMA Step-size Table

Index  Step sizes
0-9    7, 8, 9, 10, 11, 12, 13, 14, 16, 17
10-19  19, 21, 23, 25, 28, 31, 34, 37, 41, 45
20-29  50, 55, 60, 66, 73, 80, 88, 97, 107, 118
30-39  130, 143, 157, 173, 190, 209, 230, 253, 279, 307
40-49  337, 371, 408, 449, 494, 544, 598, 658, 724, 796
50-59  876, 963, 1060, 1166, 1282, 1411, 1552, 1707, 1878, 2066
60-69  2272, 2499, 2749, 3024, 3327, 3660, 4026, 4428, 4871, 5358
70-79  5894, 6484, 7132, 7845, 8630, 9493, 10442, 11487, 12635, 13899
80-88  15289, 16818, 18500, 20350, 22385, 24623, 27086, 29794, 32767

Adaptive Step-size Selection
[Block diagram: the IMA encoder loop (16-bit PCM sample, register holding PCM sample n-1, difference quantizer, dequantizer, 4-bit ADPCM difference in step-size units) with the step-size adjuster expanded. The quantizer output selects an index adjustment by table lookup; the adjustment is added to the previous index held in a register; the sum is range-limited to 0 to 88; a step-size table lookup then gives the new step size.]

Adaptive Step-size Selection
[Block diagram: quantizer output → index adjustment lookup → add to previous index (register) → range limit (0 to 88) → step-size table lookup → new step size.]

Difference range                                    Quantizer output  Index adjustment  Step-size adjustment
difference < 1/4 step_size                          000               -1                × 0.91
1/4 step_size <= difference < 1/2 step_size         001               -1                × 0.91
1/2 step_size <= difference < 3/4 step_size         010               -1                × 0.91
3/4 step_size <= difference < step_size             011               -1                × 0.91
step_size <= difference < 5/4 step_size             100               +2                × 1.21
5/4 step_size <= difference < 3/2 step_size         101               +4                × 1.46
3/2 step_size <= difference < 7/4 step_size         110               +6                × 1.77
7/4 step_size <= difference                         111               +8                × 2.14

IMA ADPCM Example
[Block diagram: input Xn, register holding Xn-1, difference quantizer, step-size adjuster, dequantizer.]

Input  Difference  Step size  Quantizer output  Index adjustment  New index  Step-size multiplier  Reconstructed difference  Decoded value
(start)            7                                              0                                                          150
155    5           7          010               -1                0          0.5                   3.5                       154
167    13          7          111               +8                8          1.75                  12                        166
170    4           16         001               -1                7          0.25                  4                         170
250    80          14         111               +8                15         1.75                  24.5                      195
250    55          31         111               +8                23         1.75                  54                        249
250    1           66         000               -1                22         0.0                   0                         249
250    1           60         000               -1                21         0.0                   0                         249
200    -49         55         011               -1                20         0.75                  -41                       208

Remaining inputs: 200, 200, 200, 200.
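The quantizer, step-size table, and index adjustments from the preceding slides can be combined into a short sketch that walks through the example inputs. It follows the simplified description on the slides and is not bit-exact with deployed IMA ADPCM codecs. Two details are assumptions: the sign is carried in the top bit of the 4-bit code, and the reconstructed difference is rounded half up. The function names are hypothetical.

```python
STEP_SIZES = [
    7, 8, 9, 10, 11, 12, 13, 14, 16, 17, 19, 21, 23, 25, 28, 31, 34, 37,
    41, 45, 50, 55, 60, 66, 73, 80, 88, 97, 107, 118, 130, 143, 157, 173, 190, 209,
    230, 253, 279, 307, 337, 371, 408, 449, 494, 544, 598, 658, 724, 796, 876, 963,
    1060, 1166, 1282, 1411, 1552, 1707, 1878, 2066, 2272, 2499, 2749, 3024, 3327,
    3660, 4026, 4428, 4871, 5358, 5894, 6484, 7132, 7845, 8630, 9493, 10442, 11487,
    12635, 13899, 15289, 16818, 18500, 20350, 22385, 24623, 27086, 29794, 32767,
]
INDEX_ADJUST = [-1, -1, -1, -1, 2, 4, 6, 8]   # indexed by the 3-bit magnitude

def decode_step(code, predicted, index):
    """Apply one 4-bit code; return (reconstructed sample, new step-size index)."""
    step = STEP_SIZES[index]
    magnitude = code & 7
    delta = (magnitude * step + 2) // 4        # magnitude/4 of a step, rounded half up (assumption)
    if code & 8:                               # top bit carries the sign (assumption)
        delta = -delta
    predicted = max(-32768, min(32767, predicted + delta))
    index = max(0, min(88, index + INDEX_ADJUST[magnitude]))   # range limit 0 to 88
    return predicted, index

def encode(samples, predicted=0, index=0):
    """Return (4-bit codes, decoder-side reconstruction) for a list of PCM samples."""
    codes, recon = [], []
    for s in samples:
        diff = s - predicted
        magnitude = min(7, (4 * abs(diff)) // STEP_SIZES[index])
        code = magnitude | (8 if diff < 0 else 0)
        predicted, index = decode_step(code, predicted, index)  # mirror the decoder's state
        codes.append(code)
        recon.append(predicted)
    return codes, recon

codes, recon = encode([155, 167, 170, 250, 250, 250, 250, 200], predicted=150, index=0)
print([format(c, "04b") for c in codes])
print(recon)
```

The encoder runs the decoder's update on every sample so that both sides keep the same predicted value and step-size index. The jump from 170 to 250 takes two samples to track, which is the slope overload effect at a small step size.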

Networking Considerations
• The IMA codec is reasonably robust to errors.
• An interval with a low-level signal will correct any step-size error.
[Block diagram: decoder path (dequantizer, step-size adjuster, register holding PCM sample n-1) with the quantizer-output to index-adjustment table: 000 to 011 give -1; 100 gives +2; 101 gives +4; 110 gives +6; 111 gives +8.]
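One plausible reading of this self-correction is that the index is range-limited at 0, so a run of small differences drives both sender and receiver to the same index even if the receiver's state was corrupted. The sketch below illustrates that reading with hypothetical starting indices; it tracks only the step-size index, not the sample values.

```python
INDEX_ADJUST = [-1, -1, -1, -1, 2, 4, 6, 8]   # indexed by the 3-bit magnitude

def track_index(codes, index):
    """Return the step-size index after each 4-bit code, starting from `index`."""
    history = []
    for code in codes:
        index = max(0, min(88, index + INDEX_ADJUST[code & 7]))
        history.append(index)
    return history

quiet_codes = [0] * 30                          # hypothetical low-level interval
sender = track_index(quiet_codes, index=12)
receiver = track_index(quiet_codes, index=27)   # state corrupted by an earlier error
print(sender[-1], receiver[-1])                 # both reach the lower limit
```

Once both indices sit at the lower limit they stay in step for all later codes, so the step-size error does not persist.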

Psychoacoustic Properties
[Figure: threshold of audibility; sound level (dB, 0 to 100) versus frequency (0.02 to 20 kHz), separating the audible region from the inaudible region.]
• Human perception of sound is a function of frequency and signal strength.
  – (MPEG exploits this relationship.)

Auditory Masking
[Figure: sound level (dB, 0 to 100) versus frequency (0.02 to 20 kHz), showing a masking tone that raises the audibility threshold around it and a nearby masked tone that falls in the inaudible region.]
• The presence of tones at certain frequencies makes us unable to perceive tones at other "nearby" frequencies.
  – Humans cannot distinguish between tones within 100 Hz of each other at low frequencies, or within 4 kHz of each other at high frequencies.

MPEG Encoder Block Diagram
[Block diagram: PCM audio samples (32, 44.1, or 48 kHz) → Mapping → Quantizer and Coding → Frame Packing → encoded bitstream. A psychoacoustic model drives the quantizer; ancillary data enters at frame packing.]

Subband Filter
• Transforms the signal from the time domain to the frequency domain.
  – 32 PCM samples yield 32 subband samples.
• Each subband corresponds to a frequency band; the bands are evenly spaced from 0 to the Nyquist frequency.
  – The filter actually works on a window of 512 samples that is shifted over 32 samples at a time.
• Subband coefficients are analyzed with the psychoacoustic model, quantized, and coded.

Layer 1
• 384 samples per frame.
• Iterative bit allocation process:
  – For each subband, determine the mask-to-noise ratio (MNR).
  – Increase the number of quantization bits for the subband with the smallest MNR.
  – Iterate until all bits are used.
• Fixed allocation of bits among subbands for a particular frame.
• Up to 448 kb/s
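The iterative allocation loop can be sketched as a greedy procedure. This is a simplified illustration, not the normative MPEG procedure: it assumes MNR = SNR − SMR, approximates SNR as about 6.02 dB per allocated bit, and ignores scale-factor and header bits. The signal-to-mask ratios, the per-subband cap, and the function name are hypothetical.

```python
def allocate_bits(smr_db, total_bits, max_bits=15, db_per_bit=6.02):
    """Greedy allocation: repeatedly give one more bit to the subband with the lowest MNR."""
    bits = [0] * len(smr_db)
    for _ in range(total_bits):
        candidates = [i for i in range(len(bits)) if bits[i] < max_bits]
        if not candidates:
            break
        # MNR = SNR - SMR, with SNR approximated as db_per_bit per allocated bit.
        worst = min(candidates, key=lambda i: bits[i] * db_per_bit - smr_db[i])
        bits[worst] += 1
    return bits

smr = [30.0, 12.0, 0.0, -10.0]     # hypothetical signal-to-mask ratios for four subbands
allocation = allocate_bits(smr, total_bits=8)
print(allocation)
```

Subbands whose signal is already below the masking threshold (negative SMR) receive bits last, which is where the psychoacoustic saving comes from.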

Layer 2
• 1152 samples per frame.
• Iterative bit allocation.
• Subband allocation is dynamic.
• Up to 384 kb/s

Layer 3
• 1152 samples per frame.
• Up to 320 kb/s
• Each subband is further analyzed using the MDCT to create 576 frequency lines.
  – 4 different windowing schemes, depending on whether the samples contain the "attack" of new frequencies.
• Lots of bit allocation options for quantizing frequency coefficients.
• Quantized coefficients are Huffman coded.

Vocoding
• Concept: develop a mathematical model of the vocal cords and throat.
  – Derive/compute model parameters for a short interval and transmit them to the decoder.
  – Use the parameters to synthesize speech at the decoder.
• So what is a good model?
  – A "buzzer" in a "tube"!
  – The buzzer is characterized by its intensity and pitch.
  – The tube is characterized by its formants.

Vocoding - Basic Concepts
[Figure: spectrum of a speech signal; amplitude (0 to 75) versus frequency (kHz).]
• Formant: frequency maxima and minima in the spectrum of the speech signal.
• Vocoders group and code portions of the signal by amplitude.

"Buzzer" and "Tube" Model
[Figure: waveform of the utterance "yadda".]
• Vocoding principles:
  – voice = formants + buzz pitch and intensity
  – voice - estimated formants = "residue"
• Linear Predictive Coding (LPC)
  – A sample is represented as a linear combination of p previous samples:
    $y(n) = \sum_{k=1}^{p} a_k \, y(n-k) + G \, x(n)$
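The LPC equation can be run directly as a synthesis filter: the excitation x(n) plays the role of the buzzer and the coefficients a_k the role of the tube. This is a minimal sketch; the coefficients, gain, and pulse spacing are hypothetical values, not parameters from any standard.

```python
def lpc_synthesize(coeffs, gain, excitation):
    """Run y(n) = sum_k a_k * y(n - k) + G * x(n) over an excitation sequence."""
    history = [0.0] * len(coeffs)      # y(n-1), y(n-2), ..., y(n-p)
    out = []
    for x in excitation:
        y = gain * x + sum(a * h for a, h in zip(coeffs, history))
        out.append(y)
        history = [y] + history[:-1]   # shift the newest output into the history
    return out

# Hypothetical 2nd-order model driven by a "buzz": one pulse every 8 samples.
buzz = [1.0 if n % 8 == 0 else 0.0 for n in range(16)]
speech = lpc_synthesize(coeffs=[0.5, -0.25], gain=2.0, excitation=buzz)
print([round(v, 4) for v in speech[:4]])
```

Changing the pulse spacing changes the pitch, while changing the coefficients changes the resonances, which mirrors the split between buzzer and tube.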

LPC
• The decoder artificially generates speech via formant synthesis.
  – A mathematical simulation of the vocal tract as a series of bandpass filters.
  – The encoder codes and transmits filter coefficients, pitch period, gain factor, and nature of excitation.
• Standards:
  – Regular Pulse Excited Linear Predictive Coder (RPE-LPC)
    • Digital cellular standard GSM 06.10 (13 kbps)
  – Code Excited Linear Predictive Coder (CELP)
    • US Federal Standard 1016 (4.8 kbps)
  – Linear Predictive Coder (LPC)
    • US Federal Standard 1015 (2.4 kbps)

Networking Concerns
• Audio bandwidth is actually quite small.
• But human sensitivity to loss and noise is quite high.
• Networking concerns:
  – Loss concealment
  – Jitter control
• Especially for telephony applications.