Audio Compression Techniques
MUMT 611, January 2005
Assignment 2
Paul Kolesnik

Introduction
• Digital Audio Compression
  - Removal of redundant or otherwise irrelevant information from an audio signal
  - Audio compression algorithms are often referred to as “audio encoders”
• Applications
  - Reduces required storage space
  - Reduces required transmission bandwidth

Audio Compression
• Audio signal: overview
  - Sampling rate (number of samples per second)
  - Bit rate (number of bits per second). An uncompressed 16-bit stereo signal sampled at 44.1 kHz has a bit rate of about 1.4 Mbit/s (44,100 × 16 × 2 = 1,411,200 bit/s)
  - Number of channels (mono / stereo / multichannel)
• The amount of data can be reduced by lowering these values or by data compression / encoding
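
The bit-rate figure above is the product of the three parameters. The minimal Python sketch below computes it (the function name is illustrative, not from the original slides):

```python
def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Bit rate of uncompressed PCM audio, in bits per second."""
    return sample_rate_hz * bits_per_sample * channels

cd_rate = pcm_bit_rate(44_100, 16, 2)
print(cd_rate)        # 1411200 bit/s, i.e. about 1.4 Mbit/s
print(cd_rate // 8)   # 176400 bytes/s, i.e. about 0.18 MB/s
```

Lowering any one factor (fewer channels, fewer bits, lower sampling rate) reduces the rate proportionally. Data compression reduces it without changing these parameters.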

Audio Data Compression
• Redundant information
  - Implicit in the remaining information
  - Example: an oversampled audio signal
• Irrelevant information
  - Perceptually insignificant
  - Cannot be recovered from the remaining information

Audio Data Compression
• Lossless Audio Compression
  - Removes redundant data
  - Resulting signal is identical to the original: perfect reconstruction
• Lossy Audio Encoding
  - Removes irrelevant data
  - Resulting signal is similar, but not identical, to the original
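
The minimal sketch below contrasts the two cases on a synthetic 440 Hz test tone (the tone is an assumption). zlib stands in for a lossless compressor; it is a general-purpose tool, not an audio codec. Dropping the low 8 bits of each sample stands in for a crude lossy step:

```python
import math
import struct
import zlib

sr = 44_100
# One second of a 440 Hz tone as 16-bit integer samples (illustrative test signal).
pcm = [round(16383 * math.sin(2 * math.pi * 440 * n / sr)) for n in range(sr)]
raw = struct.pack(f"<{len(pcm)}h", *pcm)

# Lossless: the decompressed samples are bit-for-bit identical to the input.
packed = zlib.compress(raw, 9)
restored = list(struct.unpack(f"<{len(pcm)}h", zlib.decompress(packed)))
lossless_exact = restored == pcm

# Lossy: keep only the top 8 bits of each sample; the discarded bits are gone for good.
approx = [(s >> 8) << 8 for s in pcm]
lossy_exact = approx == pcm
max_error = max(abs(a - s) for a, s in zip(approx, pcm))

print(len(raw), len(packed), lossless_exact, lossy_exact, max_error)
```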

Audio Data Compression
• Audio vs. Speech Compression Techniques
  - Speech compression uses a model of the human vocal tract to compress signals
  - Audio compression does not use this technique, because general audio signals vary far more widely than speech and cannot be described by a single source model

Generic Audio Encoder
[Figure: generic audio encoder]

Generic Audio Encoder
• Psychoacoustic Model
  - Psychoacoustics: the study of how sounds are perceived by humans
  - Uses perceptual coding
    - Eliminates information from the audio signal that is inaudible to the ear
  - Detects conditions under which different audio signal components mask each other

Psychoacoustic Model
• Signal Masking
  - Threshold cut-off
  - Spectral (Frequency / Simultaneous) Masking
  - Temporal Masking
• Threshold cut-off and spectral masking occur in the frequency domain; temporal masking occurs in the time domain

Signal Masking
• Threshold cut-off
  - Hearing threshold level: a function of frequency
  - Any frequency components below the threshold will not be perceived by the human ear
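
Perceptual-coding literature often approximates the threshold in quiet with Terhardt's formula (f in kHz, result in dB SPL): T(f) = 3.64 f^-0.8 - 6.5 exp(-0.6 (f - 3.3)^2) + 10^-3 f^4. The minimal sketch below applies the threshold cut-off. It assumes that approximation and assumes component levels are already calibrated in dB SPL. Both function names are hypothetical:

```python
import math

def threshold_in_quiet_db(freq_hz: float) -> float:
    """Approximate hearing threshold in dB SPL (Terhardt's approximation)."""
    f = freq_hz / 1000.0  # the formula expects kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)

def is_audible(freq_hz: float, level_db_spl: float) -> bool:
    """A component is kept only if it lies above the threshold in quiet."""
    return level_db_spl > threshold_in_quiet_db(freq_hz)

print(threshold_in_quiet_db(100))   # about 23 dB SPL: low frequencies need more level
print(threshold_in_quiet_db(3300))  # negative: the ear is most sensitive around 3-4 kHz
print(is_audible(100, 20.0), is_audible(1000, 20.0))  # False True
```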

Signal Masking
• Spectral Masking
  - A frequency component can be partly or fully masked by another component that is close to it in frequency
  - This shifts (raises) the hearing threshold in the vicinity of the masking component
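
The minimal sketch below shows how one masker raises the threshold around it. It measures frequency distance on the Bark (critical-band) scale using the Zwicker-Terhardt approximation, and models spreading as a simple triangle. The offset and slopes are assumed values chosen for illustration, not those of any particular encoder:

```python
import math

def hz_to_bark(freq_hz: float) -> float:
    """Critical-band rate (Bark), Zwicker-Terhardt approximation."""
    return 13.0 * math.atan(0.00076 * freq_hz) + 3.5 * math.atan((freq_hz / 7500.0) ** 2)

def masked_threshold_db(masker_hz, masker_db, probe_hz,
                        offset_db=10.0, slope_below=27.0, slope_above=10.0):
    """Illustrative threshold raised by one masker (triangular spreading in Bark).

    offset_db and the slopes (dB per Bark) are assumed values; the shallower
    upper slope models masking spreading more toward higher frequencies.
    """
    dz = hz_to_bark(probe_hz) - hz_to_bark(masker_hz)
    slope = slope_below if dz < 0 else slope_above
    return masker_db - offset_db - slope * abs(dz)

# A 70 dB tone at 1 kHz: is a 50 dB component at each probe frequency masked?
for probe in (900, 1100, 4000):
    print(probe, 50.0 < masked_threshold_db(1000, 70.0, probe))  # only 1100 Hz is masked
```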

Signal Masking
• Temporal Masking
  - A quieter sound can be masked by a louder sound if they are close in time
  - Sounds occurring shortly before (pre-masking) or shortly after (post-masking) a sudden increase in volume can be masked
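
The minimal sketch below models the masking threshold around a loud burst as piecewise linear in time. Pre-masking is made much shorter than post-masking, as is commonly reported. Every shape parameter (durations, offset, drop) is a hypothetical value chosen for illustration, not a measured one:

```python
import math

def temporal_threshold_db(t_ms, onset_ms, offset_ms, masker_db,
                          pre_ms=20.0, post_ms=150.0, offset_db=10.0, drop_db=40.0):
    """Illustrative masking threshold (dB) around a loud sound lasting onset_ms..offset_ms.

    All shape parameters are assumptions for illustration only.
    """
    peak = masker_db - offset_db                   # threshold while the masker sounds
    if onset_ms <= t_ms <= offset_ms:
        return peak
    if onset_ms - pre_ms <= t_ms < onset_ms:       # pre-masking: short
        return peak - drop_db * (onset_ms - t_ms) / pre_ms
    if offset_ms < t_ms <= offset_ms + post_ms:    # post-masking: longer
        return peak - drop_db * (t_ms - offset_ms) / post_ms
    return -math.inf                               # no temporal masking

# 80 dB burst from 100 ms to 200 ms: threshold 10 ms before vs. 10 ms after.
print(temporal_threshold_db(90, 100, 200, 80), temporal_threshold_db(210, 100, 200, 80))
```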

Spectral Analysis
• Tasks of Spectral Analysis
  - To derive masking thresholds that determine which signal components can be eliminated
  - To generate a representation of the signal to which the masking thresholds can be applied
• Spectral analysis is done through transforms or filter banks
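
The minimal NumPy sketch below covers both tasks for one block. It computes a Hann-windowed FFT spectrum, then flags the bins that fall below a threshold. The levels are in dB on an arbitrary scale (not dB SPL). The threshold here is a hypothetical placeholder, where a real encoder would use the output of its psychoacoustic model:

```python
import numpy as np

def block_spectrum_db(block, sample_rate_hz):
    """Hann-windowed magnitude spectrum of one block (dB, arbitrary reference)."""
    block = np.asarray(block, dtype=float)
    spectrum = np.fft.rfft(block * np.hanning(len(block)))
    freqs = np.fft.rfftfreq(len(block), d=1.0 / sample_rate_hz)
    level_db = 20 * np.log10(np.abs(spectrum) + 1e-12)  # offset avoids log10(0)
    return freqs, level_db

def components_to_drop(level_db, threshold_db):
    """Boolean mask of bins whose level lies below the (masking) threshold."""
    return np.asarray(level_db) < np.asarray(threshold_db)

sr, n_fft = 48_000, 1024                 # bin spacing 46.875 Hz
n = np.arange(n_fft)
block = np.sin(2 * np.pi * 1500 * n / sr) + 0.001 * np.sin(2 * np.pi * 6000 * n / sr)
freqs, level_db = block_spectrum_db(block, sr)
drop = components_to_drop(level_db, level_db.max() - 40)  # placeholder threshold
```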

Spectral Analysis
• Transforms
  - Fast Fourier Transform (FFT)
  - Discrete Cosine Transform (DCT): similar to the Fourier transform, but uses only real-valued cosine basis functions
  - Modified Discrete Cosine Transform (MDCT) [used by MPEG-1 Layer III, MPEG-2 AAC, Dolby AC-3]: an overlapped, windowed variant of the DCT
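
The minimal NumPy sketch below shows the "overlapped and windowed" structure of the MDCT. Each block of 2N samples yields N coefficients, and consecutive blocks overlap by 50%. With a sine window (which satisfies w[n]^2 + w[n+N]^2 = 1), overlap-adding the inverse transforms cancels the time-domain aliasing and reconstructs the input. It uses a direct matrix product for clarity, not the fast algorithms or block sizes of any particular codec:

```python
import numpy as np

def mdct_basis(N):
    """N x 2N matrix of MDCT cosines: cos(pi/N * (n + 1/2 + N/2) * (k + 1/2))."""
    n = np.arange(2 * N)
    k = np.arange(N)
    return np.cos(np.pi / N * (n[None, :] + 0.5 + N / 2) * (k[:, None] + 0.5))

def sine_window(N):
    """Sine window of length 2N (satisfies the Princen-Bradley condition)."""
    return np.sin(np.pi * (np.arange(2 * N) + 0.5) / (2 * N))

def mdct(block, N):
    """2N time samples -> N spectral coefficients."""
    return mdct_basis(N) @ (sine_window(N) * block)

def imdct(coeffs, N):
    """N coefficients -> 2N windowed samples (still containing time-domain aliasing)."""
    return sine_window(N) * (mdct_basis(N).T @ coeffs) * (2.0 / N)

def mdct_roundtrip(x, N):
    """Analyze with 50%-overlapping blocks, then overlap-add the IMDCT outputs."""
    x = np.asarray(x, dtype=float)
    padded = np.concatenate([np.zeros(N), x, np.zeros(N + (-len(x)) % N)])
    out = np.zeros_like(padded)
    for start in range(0, len(padded) - N, N):
        block = padded[start:start + 2 * N]
        out[start:start + 2 * N] += imdct(mdct(block, N), N)  # aliasing cancels here
    return out[N:N + len(x)]

x = np.random.default_rng(0).standard_normal(100)
print(np.max(np.abs(mdct_roundtrip(x, 16) - x)))  # tiny: reconstruction is exact up to rounding
```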

Spectral Analysis
• Filter Banks
  - Blocks of time samples are passed through a set of bandpass filters
  - Masking thresholds are applied to the resulting frequency subband signals
  - Polyphase and wavelet filter banks are the most popular filter structures
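
The simplest filter bank that splits a signal into subbands and reconstructs it exactly is the two-band Haar bank. A lowpass (sum) filter and a highpass (difference) filter are each followed by downsampling by 2. The sketch below uses it only to illustrate the split/downsample/reconstruct idea. The filter banks in real encoders use many more bands and much longer filters:

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_analysis(x):
    """Two-band split: low and high half-band signals, each downsampled by 2."""
    if len(x) % 2:
        raise ValueError("signal length must be even")
    low = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return low, high

def haar_synthesis(low, high):
    """Upsample and recombine the two subbands (exact inverse of haar_analysis)."""
    out = []
    for lo, hi in zip(low, high):
        out.extend([(lo + hi) / SQRT2, (lo - hi) / SQRT2])
    return out

low, high = haar_analysis([1.0, 3.0, 2.0, 2.0, 5.0, -1.0])
print(low, high, haar_synthesis(low, high))
```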

Filter Bank Structures
• Polyphase Filter Bank [used in all three MPEG-1 audio layers]
  - The signal is separated into subbands of equal width across the entire frequency range
  - The resulting subband signals are downsampled to create shorter signals (which are later reconstructed during the decoding process)
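
The MPEG-1 polyphase filter bank uses 32 equal-width subbands. The sketch below computes the band edges for a 44.1 kHz input and the subband sampling rate after downsampling by 32. It only computes the layout; it does not implement the filters themselves:

```python
def uniform_subband_edges(sample_rate_hz: float, n_bands: int = 32):
    """(low, high) edges in Hz of n_bands equal-width subbands from 0 Hz to Nyquist."""
    width = (sample_rate_hz / 2) / n_bands
    return [(i * width, (i + 1) * width) for i in range(n_bands)]

bands = uniform_subband_edges(44_100)
print(bands[0], bands[-1])  # (0.0, 689.0625) (21360.9375, 22050.0)
# Each subband is downsampled by 32, so 32 subbands x 1378.125 samples/s
# together carry the same 44100 samples/s as the input.
print(44_100 / 32)          # 1378.125
```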

Filter Bank Structures
• Wavelet Filter Bank [used by the Enhanced Perceptual Audio Coder (EPAC) by Lucent]
  - Unlike the polyphase filter bank, the subbands are not of equal width: they are wider at higher frequencies and narrower at lower frequencies
  - This gives better time resolution at high frequencies (e.g., for short attacks), at the expense of frequency resolution there
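
In a generic dyadic (octave-band) wavelet decomposition, each level splits the current lowest band in half. The minimal sketch below lists the resulting band edges for a 44.1 kHz input. The five-level depth is an arbitrary choice, and the sketch is not EPAC's actual decomposition:

```python
def dyadic_subband_edges(sample_rate_hz: float, levels: int):
    """(low, high) edges in Hz of an octave-band (dyadic wavelet) split, lowest band first.

    Each level splits the current lowest band in half, so bands get wider
    toward high frequencies.
    """
    edges, top = [], sample_rate_hz / 2
    for _ in range(levels):
        edges.append((top / 2, top))
        top /= 2
    edges.append((0.0, top))
    return edges[::-1]

for low, high in dyadic_subband_edges(44_100, 5):
    print(f"{low:9.2f} - {high:9.2f} Hz  (width {high - low:.2f} Hz)")
```

The widest band (11025 to 22050 Hz) is downsampled the least, so it has the finest time resolution. This is why the wavelet structure handles short attacks well.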

Noise Allocation
• System task: derive the shifted hearing threshold and apply it to the input signal
  - Anything below the threshold does not need to be transmitted
  - Any noise below the threshold is irrelevant
• Frequency component quantization
  - Tradeoff between space and noise
  - The encoder saves space by using just enough bits for each frequency component to keep the quantization noise under the threshold; this is known as noise allocation
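
Each extra bit of a uniform quantizer lowers the quantization noise by about 6.02 dB. A band therefore needs roughly SMR / 6.02 bits, where SMR is its signal-to-mask ratio (signal level minus masking threshold, in dB). The minimal greedy sketch below allocates bits that way. The max_bits cap is a hypothetical parameter, and real encoders instead iterate under a fixed bit budget per frame:

```python
import math

DB_PER_BIT = 6.02  # approximate SNR gained per extra bit of a uniform quantizer

def allocate_bits(smr_db, max_bits=15):
    """Bits per band so that quantization noise stays below the masking threshold."""
    bits = []
    for smr in smr_db:
        needed = 0 if smr <= 0 else math.ceil(smr / DB_PER_BIT)  # <= 0 dB: fully masked
        bits.append(min(needed, max_bits))
    return bits

print(allocate_bits([-5.0, 3.0, 20.0, 40.0]))  # [0, 1, 4, 7]
```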

Noise Allocation
• Pre-echo
  - If a single audio block contains silence followed by a loud attack, a pre-echo error occurs: after decoding, the quantization noise is spread over the whole block and becomes audible in its silent part
  - This is avoided by pre-monitoring the audio data at the encoding stage and switching to shorter blocks when pre-echo is likely
  - This does not completely eliminate pre-echo, but can make it short enough to be masked by the attack (temporal masking)
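
The minimal sketch below shows one simple form of pre-monitoring. It splits a block into sub-blocks and switches to short blocks when the energy jumps sharply from one sub-block to the next. The sub-block count and the ratio threshold are hypothetical values, and real encoders use more elaborate criteria:

```python
import math

def choose_block_type(block, n_sub=8, ratio_threshold=10.0):
    """Return 'short' if energy jumps sharply between sub-blocks (likely attack), else 'long'."""
    size = len(block) // n_sub
    energies = [sum(s * s for s in block[i * size:(i + 1) * size]) + 1e-12  # avoid /0
                for i in range(n_sub)]
    ratios = [b / a for a, b in zip(energies, energies[1:])]
    return "short" if max(ratios) > ratio_threshold else "long"

steady = [math.sin(2 * math.pi * 0.05 * n) for n in range(1024)]
attack = [0.0] * 768 + steady[768:]        # silence, then a sudden onset
print(choose_block_type(steady), choose_block_type(attack))  # long short
```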

Pre-echo Effect
[Figure: pre-echo effect]

Additional Encoding Techniques
• Other encoding techniques are available (as alternatives or in combination)
  - Predictive Coding
  - Coupling / Delta Encoding
  - Huffman Encoding

Additional Encoding Techniques
• Predictive Coding
  - Often used in speech and image compression
  - Estimates the expected value of each sample based on previous sample values
  - Transmits/stores the difference between the expected and the actual value
  - The decoder generates the same estimate for each sample and then adjusts it by the stored difference
  - Used for additional compression in MPEG-2 AAC
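
The minimal sketch below shows the simplest possible predictor: each sample is predicted to equal the previous one, and only the residual is stored. It is much simpler than the predictor used in MPEG-2 AAC and only illustrates the encode/decode symmetry:

```python
def predictive_encode(samples):
    """First-order prediction: predict each sample as the previous one; store residuals."""
    residuals, prev = [], 0
    for s in samples:
        residuals.append(s - prev)   # difference between actual and expected value
        prev = s
    return residuals

def predictive_decode(residuals):
    """Rebuild samples: same prediction, adjusted by the stored difference."""
    samples, prev = [], 0
    for r in residuals:
        prev = prev + r
        samples.append(prev)
    return samples

x = [100, 102, 105, 107, 106]
print(predictive_encode(x))  # [100, 2, 3, 2, -1]: small residuals need fewer bits
```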

Additional Encoding Techniques
• Coupling / Delta Encoding
  - Used when the audio signal consists of two or more channels (stereo or surround sound)
  - Similarities between channels are exploited for compression
  - The sum and difference of two channels are derived; the difference is usually close to zero and therefore requires less space to encode
  - The sum/difference step itself is lossless: both channels can be recovered exactly
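
The minimal sketch below applies the sum/difference idea to integer samples. Because sum + difference = 2 × left, integer division recovers both channels exactly. The sum needs one extra bit of headroom compared with the inputs:

```python
def ms_encode(left, right):
    """Sum (mid) and difference (side) channels."""
    mid = [l + r for l, r in zip(left, right)]
    side = [l - r for l, r in zip(left, right)]
    return mid, side

def ms_decode(mid, side):
    """Exact inverse: (m + s) and (m - s) are always even, so // 2 loses nothing."""
    left = [(m + s) // 2 for m, s in zip(mid, side)]
    right = [(m - s) // 2 for m, s in zip(mid, side)]
    return left, right

left, right = [1000, -2000, 3000], [998, -2003, 3001]
mid, side = ms_encode(left, right)
print(side)  # [2, 3, -1]: correlated channels give a near-zero difference
```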

Additional Encoding Techniques
• Huffman Coding
  - Information-theory-based technique
  - Values that occur often in the signal are represented by shorter codewords, and rarer values by longer ones
  - Implemented using look-up tables in the encoder and in the decoder
  - Provides additional lossless compression at modest computational cost
  - Used by MPEG-1 Layer III and MPEG-2 AAC
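
The minimal sketch below builds a Huffman code from symbol counts with Python's heapq, then encodes and decodes with the resulting look-up table. MPEG-1 Layer III and MPEG-2 AAC instead use fixed, predefined code tables rather than building one per signal. This sketch only shows how frequent values get short codewords:

```python
import heapq
from collections import Counter
from itertools import count

def huffman_code(symbols):
    """Return {symbol: bitstring}; frequent symbols get shorter codewords."""
    tiebreak = count()  # keeps heap entries comparable when counts are equal
    heap = [(n, next(tiebreak), {sym: ""}) for sym, n in Counter(symbols).items()]
    heapq.heapify(heap)
    if len(heap) == 1:  # degenerate case: only one distinct symbol
        return {sym: "0" for sym in heap[0][2]}
    while len(heap) > 1:
        n1, _, t1 = heapq.heappop(heap)  # merge the two least frequent subtrees
        n2, _, t2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in t1.items()}
        merged.update({s: "1" + c for s, c in t2.items()})
        heapq.heappush(heap, (n1 + n2, next(tiebreak), merged))
    return heap[0][2]

def huffman_encode(symbols, table):
    return "".join(table[s] for s in symbols)

def huffman_decode(bits, table):
    """Prefix-free codes can be decoded greedily, one codeword at a time."""
    inverse = {code: sym for sym, code in table.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return out

data = [0, 0, 0, 0, 1, 1, -1, 2, 0, 0]
table = huffman_code(data)
bits = huffman_encode(data, table)
print(table, len(bits))  # 16 bits, versus 20 with a fixed 2-bit code
```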

Encoding - Final Stages
• Audio data is packed into frames
• Frames are stored or transmitted

Conclusion
• HTML Bibliography: http://www.music.mcgill.ca/~pkoles
• Questions
