MP3 and AAC
Trac D. Tran
ECE Department, The Johns Hopkins University, Baltimore, MD 21218

MP3
- MP3 = MPEG-1/MPEG-2 Audio Layer III audio coding
  - Transform: cascade of a 32-channel filter bank and a 6-channel or 18-channel MDCT
  - Quantization: uniform scalar quantizer with a psycho-acoustic model
  - Entropy coding: run-length + Huffman

Transformation Stage in MP3
[Figure: x[n] is split by a 32-channel, 512-tap CMFB (filters H0(z), H1(z), ..., H31(z), each followed by decimation by 32). Each subband then feeds a 6-channel, 12-tap MLT/MDCT for transients or an 18-channel, 36-tap MLT/MDCT for steady-state signals.]

Masking
- Masking was discovered through psycho-acoustic experiments
- The human auditory system is less sensitive around a strong tonal signal

Masking: Original Signal

Masking Threshold
- Signal components below the masking threshold are deemed insignificant (can be quantized to zero)
- Components are computed from overlapping 1024-sample Hanning windows

Advanced Audio Coding (AAC)
- Successor of MP3
- Better audio quality than MP3 at most bit rates
  - Perceptually lossless at 320 kbps for 5-channel surround sound (64 kbps/channel)
  - Almost CD quality at 96 kbps (48 kbps/channel)
- AAC is part of the MPEG-4 standard
- Default audio format of Apple’s iPhone, iPod, iTunes; Sony PlayStation 3; Nintendo Wii
- MDCT – Scalar Quantization – Huffman Coding

Transformation Stage in AAC
[Figure: two MDCT filter banks applied to x[n]. Transient signals: 128-channel, 256-tap MDCT (H0(z), H1(z), ..., H127(z), each decimated by 128). Steady-state signals: 1024-channel, 2048-tap MDCT (H0(z), ..., H1023(z), each decimated by 1024).]
- AAC adaptively switches between
  - 8 blocks of 128-point MDCT with 256-point windows
  - 1 block of 1024-point MDCT with a 2048-point window
  - All windows have 50% overlap
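The 50% overlap is what lets an MDCT produce only N coefficients per 2N-sample window and still reconstruct the signal: the aliasing of one block cancels against its neighbours during overlap-add. A minimal sketch of that idea follows, assuming a sine window and a 2/N scaling in the inverse transform (both are choices made here for illustration, not taken from the slides), with a tiny N = 8 instead of 128 or 1024.

```python
import math

def sine_window(length):
    # Sine window (an assumption): w[n]^2 + w[n + length/2]^2 = 1,
    # which is the condition needed for alias cancellation.
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]

def mdct(block):
    # 2N input samples -> N coefficients.
    half = len(block) // 2
    return [
        sum(block[n] * math.cos(math.pi / half * (n + 0.5 + half / 2) * (k + 0.5))
            for n in range(2 * half))
        for k in range(half)
    ]

def imdct(coeffs):
    # N coefficients -> 2N aliased samples (2/N scaling chosen to pair with the window).
    half = len(coeffs)
    return [
        (2.0 / half) * sum(coeffs[k] * math.cos(math.pi / half * (n + 0.5 + half / 2) * (k + 0.5))
                           for k in range(half))
        for n in range(2 * half)
    ]

def analyze_and_resynthesize(signal, half):
    # Window, transform, inverse transform, window again, overlap-add with hop = half.
    window = sine_window(2 * half)
    output = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * half + 1, half):
        windowed = [window[i] * signal[start + i] for i in range(2 * half)]
        decoded = imdct(mdct(windowed))
        for i in range(2 * half):
            output[start + i] += window[i] * decoded[i]
    return output

signal = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(32)]
rebuilt = analyze_and_resynthesize(signal, half=8)
# Only samples covered by two overlapping blocks are fully reconstructed.
max_error = max(abs(rebuilt[n] - signal[n]) for n in range(8, 24))
print(max_error)
```

The first and last half-blocks are covered by a single window only, so they are not expected to match; a real codec pads or handles the edges separately.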

JPEG Still Image Coding Standard
Trac D. Tran
ECE Department, The Johns Hopkins University, Baltimore, MD 21218

Overall Structure of JPEG
[Figure: block diagram. Color Converter -> Level Offset -> 8 x 8 DCT -> Uniform Quant., then two paths. DC path: DC Pred. -> DC VLC. AC path: Zigzag Scan -> Run-Level -> AC VLC.]
- Color converter
  - RGB to YUV
- Level offset
  - Subtract 2^(N-1), where N is the number of bits per pixel
- Quantization
  - Different step sizes for different coefficients
- DC
  - Predict from the DC of the previous block
- AC
  - Zigzag scan to get 1-D data
  - Run-level: joint coding of each non-zero coefficient and the number of zeros before it
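The first two stages of the pipeline are simple per-pixel operations. The sketch below illustrates them; the conversion coefficients are the common JFIF-style YCbCr ones, which is an assumption here since the slide only says "RGB to YUV", and the function names are illustrative.

```python
def rgb_to_ycbcr(r, g, b):
    # JFIF-style coefficients (assumed; the slides only say "RGB to YUV").
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 128
    cr = 0.5 * r - 0.418688 * g - 0.081312 * b + 128
    return y, cb, cr

def level_offset(sample, bits=8):
    # Subtract 2^(N-1) so that samples are centred around zero before the DCT.
    return sample - (1 << (bits - 1))

y, cb, cr = rgb_to_ycbcr(100, 100, 100)   # a grey pixel has neutral chroma
print(y, cb, cr, level_offset(124))
```

Centring the samples keeps the DC coefficient small for mid-grey blocks, which helps the differential DC coding that follows.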

JPEG Quantization
- Uniform mid-tread quantizer
- Larger step sizes for chroma components
- Different coefficients have different step sizes
  - Smaller steps for low-frequency coefficients (more bits)
  - Larger steps for high-frequency coefficients (fewer bits)
  - The human visual system is not sensitive to errors at high frequencies
- Actual step size: scale the basic table by a quality factor

Luma Quantization Table
  16  11  10  16  24  40  51  61
  12  12  14  19  26  58  60  55
  14  13  16  24  40  57  69  56
  14  17  22  29  51  87  80  62
  18  22  37  56  68 109 103  77
  24  35  55  64  81 104 113  92
  49  64  78  87 103 121 120 101
  72  92  95  98 112 100 103  99

Chroma Quantization Table
  17  18  24  47  99  99  99  99
  18  21  26  66  99  99  99  99
  24  26  56  99  99  99  99  99
  47  66  99  99  99  99  99  99
  99  99  99  99  99  99  99  99
  99  99  99  99  99  99  99  99
  99  99  99  99  99  99  99  99
  99  99  99  99  99  99  99  99

Scaling of Quantization Table
- Actual Q table = scaling x Basic Q table:
  - quality factor ≤ 50: scaling = 50/quality
  - quality factor > 50: scaling = 2 - quality/50

Basic Q table (luma)
  16  11  10  16  24  40  51  61
  12  12  14  19  26  58  60  55
  14  13  16  24  40  57  69  56
  14  17  22  29  51  87  80  62
  18  22  37  56  68 109 103  77
  24  35  55  64  81 104 113  92
  49  64  78  87 103 121 120 101
  72  92  95  98 112 100 103  99

Quality Factor   Scaling
--------------   -------
      10           5.0
      20           2.5
      50           1.0
      75           0.5
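The scaling rule can be written as a two-branch function. The sketch below follows the two formulas on the slide; rounding the scaled steps to integers and clamping them to at least 1 are assumptions added here so that the result is a usable step size.

```python
def scaling_factor(quality):
    # quality <= 50: 50 / quality; quality > 50: 2 - quality / 50
    if quality <= 50:
        return 50.0 / quality
    return 2.0 - quality / 50.0

def scaled_table(basic_table, quality):
    # Rounding and the lower clamp of 1 are assumptions, not from the slides.
    s = scaling_factor(quality)
    return [[max(1, int(step * s + 0.5)) for step in row] for row in basic_table]

for qf in (10, 20, 50, 75):
    print(qf, scaling_factor(qf))
print(scaled_table([[16, 11, 10, 16]], 75))
```

A lower quality factor enlarges every step size, so more coefficients quantize to zero and the bit rate drops.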

DC Prediction
- DC coefficient: the average of a block
- The DC coefficients of neighboring blocks are still similar to each other: redundancy
- The redundancy can be removed by differential coding:
  - e(n) = DC(n) – DC(n-1)
- Only the prediction error e(n) is encoded
[Figure: 8 x 8 DC coefficients of Lena]
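Differential coding of the DC terms is a running subtraction, and the decoder undoes it with a running sum. A minimal sketch, assuming the predictor for the first block starts at 0 (the slides do not say how the first block is handled):

```python
def dc_differences(dc_values, initial_prediction=0):
    # e(n) = DC(n) - DC(n-1); the first block is predicted from initial_prediction.
    errors = []
    previous = initial_prediction
    for dc in dc_values:
        errors.append(dc - previous)
        previous = dc
    return errors

def dc_reconstruct(errors, initial_prediction=0):
    # Decoder side: DC(n) = DC(n-1) + e(n).
    values = []
    previous = initial_prediction
    for e in errors:
        previous += e
        values.append(previous)
    return values

errors = dc_differences([5, 8, 7, 7])
print(errors, dc_reconstruct(errors))
```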

Coefficient Category
- Divide coefficients into categories of exponentially increasing sizes
- Use a Huffman code to encode the category ID
- Use a fixed-length code within each category
- Similar to the Exponential Golomb code

Ranges                                 Range Size   DC Cat. ID   AC Cat. ID
0                                          1            0           N/A
-1, 1                                      2            1            1
-3, -2, 2, 3                               4            2            2
-7, -6, -5, -4, 4, 5, 6, 7                 8            3            3
-15, …, -8, 8, …, 15                      16            4            4
-31, …, -16, 16, …, 31                    32            5            5
-63, …, -32, 32, …, 63                    64            6            6
…                                          …            …            …
[-32767, -16384], [16384, 32767]        32768          15           15
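Because each category doubles in size, the category ID of a value is simply the number of bits needed to write its magnitude. A short sketch of that observation (the function names are illustrative):

```python
def category(value):
    # Number of bits in |value|: 0 -> 0, +/-1 -> 1, +/-2..3 -> 2, +/-4..7 -> 3, ...
    return abs(value).bit_length()

def range_size(cat):
    # Number of distinct values in a category; category 0 holds only the value 0.
    return 1 << cat

for v in (0, -1, 3, -7, 8, -31, 63, 16384):
    print(v, category(v), range_size(category(v)))
```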

Coding of DC Coefficients
- Encode e(n) = DC(n) – DC(n-1)

DC Cat.   Prediction Errors               Base Codeword
0         0                               010
1         -1, 1                           011
2         -3, -2, 2, 3                    100
3         -7, -6, -5, -4, 4, 5, 6, 7      00
4         -15, …, -8, 8, …, 15            101
5         -31, …, -16, 16, …, 31          110
6         -63, …, -32, 32, …, 63          1110
…         …                               …

- Our example: DC = 8; assume the last DC = 5
  - e = 8 – 5 = 3
  - Cat.: 2, index 3
  - Bitstream: 10011
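The codeword for a DC difference is the base codeword of its category followed by a fixed-length index of the value within that category. The sketch below uses the base codewords listed in the table above for categories 0 to 6; ordering each category from most negative to most positive, so that 3 has index 3 in category 2, matches the slide's example but is otherwise an assumption.

```python
# Base codewords for categories 0..6 as listed in the table above.
DC_BASE_CODE = {0: "010", 1: "011", 2: "100", 3: "00", 4: "101", 5: "110", 6: "1110"}

def amplitude_bits(value):
    # Index of the value inside its category, written with `cat` bits.
    # Negative values map to the lower half, positive values to the upper half.
    cat = abs(value).bit_length()
    if cat == 0:
        return ""
    index = value if value > 0 else value + (1 << cat) - 1
    return format(index, "0{}b".format(cat))

def encode_dc(dc, previous_dc):
    e = dc - previous_dc
    return DC_BASE_CODE[abs(e).bit_length()] + amplitude_bits(e)

print(encode_dc(8, 5))
```

With DC = 8 and a previous DC of 5 this gives the base codeword 100 followed by the index bits 11.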

Coding of AC Coefficients
- Most non-zero coefficients are in the upper-left corner
- Zigzag scanning
- Example:

     8   24   -2    0    0    0    0    0
   -31   -4    6   -1    0    0    0    0
     0  -12   -1    2    0    0    0    0
     0    0   -2   -1    0    0    0    0
     0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0

- Zigzag scanning result (DC is coded separately):
  24 -31 0 -4 -2 0 6 -12 0 0 0 -1 -1 0 0 0 2 -2 0 0 0 0 0 -1 EOB
  (EOB = end-of-block)
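Zigzag scanning visits the anti-diagonals of the block in alternating directions, and run-level coding then turns the scanned AC values into (zeros before, non-zero value) pairs. A minimal sketch of both steps follows; it ignores details of the real bitstream such as the limit on run length, and the function names are illustrative.

```python
def zigzag_order(n=8):
    # Walk the anti-diagonals r + c = s, alternating direction on each one.
    order = []
    for s in range(2 * n - 1):
        diagonal = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        if s % 2 == 0:
            diagonal.reverse()   # even diagonals run from bottom-left to top-right
        order.extend(diagonal)
    return order

def zigzag_scan(block):
    return [block[r][c] for r, c in zigzag_order(len(block))]

def run_level(ac_coefficients):
    # (run of zeros, non-zero level) pairs; trailing zeros collapse into EOB.
    pairs, run = [], 0
    for value in ac_coefficients:
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    pairs.append("EOB")
    return pairs

block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[1][0], block[2][0] = 2, 1, -9, 3
scanned = zigzag_scan(block)
print(scanned[:4], run_level(scanned[1:]))
```

The first element of the scan is the DC term, which is why `run_level` is applied to `scanned[1:]` only.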

A Complete Example
- Original data:

  124 125 122 120 122 119 117 118
  121 121 120 119 119 120 120 118
  126 124 123 122 121 121 120 120
  124 124 125 125 126 125 124 124
  127 127 128 129 130 128 127 125
  143 142 143 142 140 139 139 139
  150 148 152 152 152 152 150 151
  156 159 158 155 158 158 157 156

- 2-D DCT:

    39.8   6.5  -2.2   1.2  -0.3  -1.0   0.7   1.1
  -102.4   4.5   2.2   1.1   0.3  -0.6  -1.0  -0.4
    37.7   1.3   1.7   0.2  -1.5  -2.2  -0.1   0.2
    -5.6   2.2  -1.3  -0.8   1.4   0.2  -0.1   0.1
    -3.3  -0.7  -1.7   0.7  -0.6  -2.6  -1.3   0.7
     5.9  -0.1  -0.4  -0.7   1.9  -0.2   1.4   0.0
     3.9   5.5   2.3  -0.5  -0.1  -0.8  -0.5  -0.1
    -3.4   0.5  -1.0   0.8   0.9   0.0   0.3   0.0

- Quantized by basic table:

   2   1   0   0   0   0   0   0
  -9   0   0   0   0   0   0   0
   3   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0

  Q table:
  16 11 …
  12 …
  14 …

  floor(39.8/16 + 0.5) = 2
  floor(6.5/11 + 0.5) = 1
  -floor(102.4/12 + 0.5) = -9
  floor(37.7/14 + 0.5) = 3

- Zigzag scanning: 2 1 -9 3 EOB
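The four rounding steps shown in this example follow one rule: divide the magnitude by the step size, add 0.5, take the floor, and restore the sign. A small sketch of that uniform mid-tread quantizer (the function name is illustrative):

```python
import math

def quantize(coefficient, step):
    # Round-to-nearest on the magnitude, then restore the sign (mid-tread: small inputs map to 0).
    level = math.floor(abs(coefficient) / step + 0.5)
    return level if coefficient >= 0 else -level

for coefficient, step in ((39.8, 16), (6.5, 11), (-102.4, 12), (37.7, 14), (4.5, 12)):
    print(coefficient, step, quantize(coefficient, step))
```

Handling the sign separately keeps the quantizer symmetric around zero, which is what the `-floor(102.4/12 + 0.5)` line in the example does by hand.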

A Complete Example
- Zigzag scanning: 2 1 -9 3 EOB
- Inverse quantization:

    32   11    0    0    0    0    0    0
  -108    0    0    0    0    0    0    0
    42    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0

- Reconstructed block:

  122 122 121 121 120 119 119 118
  121 121 120 119 119 118 117 117
  120 120 120 119 118 117 117 117
  123 123 122 122 121 120 120 120
  131 130 130 129 128 128 127 127
  142 141 141 140 139 139 138 138
  153 152 152 151 150 150 149 149
  159 159 159 158 157 157 156 156

- MSE: 5.67
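The whole round trip of this example (level offset, 8 x 8 DCT, quantization, inverse quantization, inverse DCT) fits in a short pure-Python sketch. The pixel block is the example block as recovered from the slide, the step sizes are the basic luma table, and the orthonormal DCT-II definition and round-to-nearest reconstruction are assumptions made here; under them the sketch should give the levels 2, 1, -9, 3 and an MSE close to the 5.67 quoted above.

```python
import math

BLOCK = [
    [124, 125, 122, 120, 122, 119, 117, 118],
    [121, 121, 120, 119, 119, 120, 120, 118],
    [126, 124, 123, 122, 121, 121, 120, 120],
    [124, 124, 125, 125, 126, 125, 124, 124],
    [127, 127, 128, 129, 130, 128, 127, 125],
    [143, 142, 143, 142, 140, 139, 139, 139],
    [150, 148, 152, 152, 152, 152, 150, 151],
    [156, 159, 158, 155, 158, 158, 157, 156],
]
LUMA_Q = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]

def scale(k):
    return math.sqrt(0.5) if k == 0 else 1.0

def basis(i, k):
    return math.cos((2 * i + 1) * k * math.pi / 16)

def dct2(block):
    # Orthonormal 8 x 8 DCT-II; u indexes vertical frequency, v horizontal.
    return [[0.25 * scale(u) * scale(v) *
             sum(block[x][y] * basis(x, u) * basis(y, v) for x in range(8) for y in range(8))
             for v in range(8)] for u in range(8)]

def idct2(coeffs):
    return [[0.25 * sum(scale(u) * scale(v) * coeffs[u][v] * basis(x, u) * basis(y, v)
                        for u in range(8) for v in range(8))
             for y in range(8)] for x in range(8)]

def quantize(coefficient, step):
    level = math.floor(abs(coefficient) / step + 0.5)
    return level if coefficient >= 0 else -level

shifted = [[p - 128 for p in row] for row in BLOCK]            # level offset, N = 8 bits
coeffs = dct2(shifted)
levels = [[quantize(coeffs[u][v], LUMA_Q[u][v]) for v in range(8)] for u in range(8)]
dequantized = [[levels[u][v] * LUMA_Q[u][v] for v in range(8)] for u in range(8)]
reconstructed = [[math.floor(value + 128 + 0.5) for value in row] for row in idct2(dequantized)]
mse = sum((BLOCK[x][y] - reconstructed[x][y]) ** 2 for x in range(8) for y in range(8)) / 64
print(levels[0][:2], levels[1][0], levels[2][0], round(mse, 2))
```

Only 4 of the 64 coefficients survive quantization, yet the reconstruction stays within a few grey levels of the original, which is the point of the example.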

Progressive JPEG
- Baseline JPEG encodes the image block by block:
  - The decoder has to wait until the end to decode and display the entire image
- Progressive: coding DCT coefficients in multiple scans
  - The first scan generates a low-quality version of the entire image
  - Subsequent scans refine the entire image gradually
- Two procedures defined in JPEG:
  - Spectral selection:
    - Divide all DCT coefficients into several bands (low, middle, high frequency subbands…)
    - Bands are coded into separate scans
  - Successive approximation:
    - Send the MSBs of all coefficients first
    - Send the less significant bits in subsequent scans
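Both procedures can be illustrated on the zigzag-ordered coefficients of a single block. The sketch below is a simplification under stated assumptions: the band edges are arbitrary, and successive approximation is modelled by dropping low-order bits of the magnitude, which shows what the decoder knows after each scan rather than reproducing the exact bitstream procedure.

```python
def spectral_selection(zigzag_coefficients, band_edges):
    # Split the zigzag-ordered coefficients into bands; each band goes into its own scan.
    return [zigzag_coefficients[start:end]
            for start, end in zip(band_edges[:-1], band_edges[1:])]

def approximate(coefficient, dropped_bits):
    # Value known to the decoder once all but the lowest `dropped_bits` bits were sent.
    magnitude = (abs(coefficient) >> dropped_bits) << dropped_bits
    return magnitude if coefficient >= 0 else -magnitude

coefficients = [2, 1, -9, 3] + [0] * 60
bands = spectral_selection(coefficients, [0, 1, 6, 64])   # DC, low band, remaining bands
print([len(band) for band in bands])
for dropped in (2, 1, 0):                                   # MSBs first, then refinements
    print(dropped, [approximate(c, dropped) for c in coefficients[:4]])
```

Each later scan adds one more bit of precision, so the image sharpens gradually instead of appearing block by block.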

JPEG Coding Result for Lena
[Figure: Lena coded at quality factors 5, 25, 50, 75 and 90; blocking artifacts are visible at QF 5.]

Summary
- Transformation
  - Karhunen-Loeve Transform (KLT): optimal linear transform
  - Discrete Cosine Transform (DCT): for images & video
  - MDCT: overlapped, higher frequency resolution, for audio
  - Discrete Wavelet Transform (DWT): multi-resolution representation
- MP3 & AAC
  - Audio coding: FB/MDCT – Quantization – Huffman
- JPEG: first international compression standard for still images
  - DCT – Quantization – Run-length – Huffman
- JPEG 2000: latest technology, wavelet-based
  - Scalable, progressive coding with flexible intelligent functionalities