CS 414 – Multimedia Systems Design
Lecture 11 – MP3 and MP4 Audio (Part 7)
Klara Nahrstedt
Spring 2012

Administrative
- MP1: deadline February 18

Outline
- MP3 Audio Encoding
- MP4 Audio
- Reading:
  - Media Coding book, Sections 7.7.2 – 7.7.5
  - Recommended paper on MP3: Davis Pan, “A Tutorial on MPEG/Audio Compression”, IEEE Multimedia, pp. 60-74, 1995
  - Recommended books on JPEG/MPEG Audio/Video Fundamentals:
    - Haskell, Puri, Netravali, “Digital Video: An Introduction to MPEG-2”, Chapman and Hall, 1996

Why Compression is Needed
- Data rate = sampling rate * quantization bits * channels (+ control information)
- For example (digital audio):
  - 44100 Hz; 16 bits; 2 channels
  - generates about 1.4 Mbit of data per second; 84 Mbit per minute; 5 Gbit per hour
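The arithmetic on this slide can be checked with a short sketch. The function name `pcm_data_rate_bps` is a hypothetical helper introduced here, and the optional control information is ignored.

```python
def pcm_data_rate_bps(sampling_rate_hz, bits_per_sample, channels):
    """Uncompressed PCM data rate in bits per second (control information ignored)."""
    return sampling_rate_hz * bits_per_sample * channels


# CD-quality stereo, as in the slide's example.
cd_rate = pcm_data_rate_bps(44100, 16, 2)

print(f"per second: {cd_rate / 1e6:.2f} Mbit")         # 1.41 Mbit
print(f"per minute: {cd_rate * 60 / 1e6:.1f} Mbit")    # 84.7 Mbit
print(f"per hour:   {cd_rate * 3600 / 1e9:.2f} Gbit")  # 5.08 Gbit
```

The results are in bits, not bytes; dividing by 8 gives roughly 176 kB per second.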

MPEG-1 Audio
- Lossy compression of audio
- In the late 1980s, ISO's MPEG group started to standardize
  - TV broadcasting
  - Use of audio on CD-ROM (later DVD)
- MPEG-1 Audio: 1992
- MPEG-2 Audio: 1994
- MPEG-1 Audio Layers I, II, III

Criteria for a Good Standard
- Achieve desired outcome
- Be comprehensible
- Allow efficient implementation
- Support competition
- Give benchmark tests
- Be supported by industry
- Be good for end users
- …
- Two models:
  - implement first, then standardize
  - standardize first, then implement

MPEG-1 Audio Layer II
- Called MP2
- Dominant standard for audio broadcasting
  - DAB digital radio and DVB digital television
- Came out of MUSICAM codecs with bit rates 64-196 kbps
  - MUSICAM audio coding: basis for MPEG-1 and MPEG-2 audio
- Sampling rates: 32, 44.1, 48 kHz
- Bit rates: 32, 48, 56, 64, 80, 96, … 384 kbps
- Format: mono, stereo, dual channel, …
- MP2: sub-band audio encoder in time domain

MPEG-1 Audio Layer III
- MPEG-1 Layer III is called the MP3 format
  - Popular for PC and Internet applications
  - Goal is to compress to 128 kbps, but higher or lower bit rates can be used, with correspondingly higher or lower quality
  - Utilization of psychoacoustics
    - Scientific study of sound perception

MPEG Audio – MP3
- First psychoacoustic masking codec was proposed in 1979 at AT&T Bell Labs, Murray Hill
- MP3 is based on OCF (optimum coding in the frequency domain) and PXFM (perceptual transform coding)
- MPEG-1 Audio Layer III: public release 1993
- MPEG-2 Audio Layer III: public release 1995

MPEG Audio – MP3
- 1997: mp3.com, offering thousands of MP3s created by independent artists for free
- 1999: Napster, MP3 peer-to-peer file sharing
- Problem: copyright infringement
- Authorized services: Amazon.com, Rhapsody, Juno Records, ...

MPEG-1 Audio Encoding
- Characteristics
  - Precision: 16 bits
  - Sampling frequency: 32 kHz, 44.1 kHz, 48 kHz
  - 3 compression layers: Layer 1, Layer 2, Layer 3 (MP3)
    - Layer 3: 32-320 kbps, target 64 kbps
    - Layer 2: 32-384 kbps, target 128 kbps
    - Layer 1: 32-448 kbps, target 192 kbps
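To get a feel for what the target bit rates mean, the sketch below computes the compression ratio each layer would need. It assumes the targets apply to a single 48 kHz, 16-bit channel; the slide does not say whether the figures are per channel or per stream, so treat the resulting ratios as illustrative only.

```python
def compression_ratio(source_kbps, target_kbps):
    """Ratio between the uncompressed and the compressed bit rate."""
    return source_kbps / target_kbps


# Assumption: one channel, 48 kHz sampling, 16 bits per sample -> 768 kbps.
SOURCE_KBPS = 48_000 * 16 / 1000

# Target bit rates taken from the slide.
TARGET_KBPS = {"Layer 1": 192, "Layer 2": 128, "Layer 3": 64}

ratios = {layer: compression_ratio(SOURCE_KBPS, target)
          for layer, target in TARGET_KBPS.items()}

for layer, ratio in ratios.items():
    print(f"{layer}: {ratio:.0f}:1")  # Layer 1: 4:1, Layer 2: 6:1, Layer 3: 12:1
```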

MPEG Audio Encoding Steps
[Figure: MPEG audio encoding steps]

MPEG Audio Filter Bank
- Filter bank divides input into multiple sub-bands (32 equal-width frequency sub-bands)
- The defining equation for sub-band i uses these terms:
  - the filter output sample for sub-band i at time t
  - C[n]: one of 512 coefficients
  - x[n]: audio input sample from a 512-sample buffer
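The defining equation did not survive in this transcript. The sketch below follows the form given in the recommended Pan tutorial, s_t[i] = sum over k = 0..63 and j = 0..7 of M[i][k] * C[k + 64j] * x[k + 64j], with M[i][k] = cos((2i + 1)(k - 16)π/64). Treating that as the slide's equation is an assumption. The window returned by `placeholder_window` is a hypothetical stand-in, not the coefficient table from the standard, so the output values are illustrative only.

```python
import math

NUM_SUBBANDS = 32
BUFFER_SIZE = 512

# Analysis matrix M[i][k] = cos((2i + 1)(k - 16) * pi / 64), as in Pan's tutorial.
M = [[math.cos((2 * i + 1) * (k - 16) * math.pi / 64) for k in range(64)]
     for i in range(NUM_SUBBANDS)]


def placeholder_window():
    """Stand-in for C[n]: a scaled Hann window, NOT the standard's table."""
    return [(0.5 - 0.5 * math.cos(2 * math.pi * (n + 0.5) / BUFFER_SIZE)) / BUFFER_SIZE
            for n in range(BUFFER_SIZE)]


def subband_samples(x, C):
    """Return one output sample per sub-band from the 512-sample buffer x."""
    if len(x) != BUFFER_SIZE or len(C) != BUFFER_SIZE:
        raise ValueError("x and C must each hold 512 values")
    # Window the buffer, then fold its 8 segments of 64 samples together.
    z = [C[n] * x[n] for n in range(BUFFER_SIZE)]
    y = [sum(z[k + 64 * j] for j in range(8)) for k in range(64)]
    # Matrixing: project the folded vector onto the 32 cosine rows.
    return [sum(M[i][k] * y[k] for k in range(64)) for i in range(NUM_SUBBANDS)]


C = placeholder_window()
tone = [math.sin(2 * math.pi * 0.05 * n) for n in range(BUFFER_SIZE)]
s = subband_samples(tone, C)
print(len(s))  # one sample for each of the 32 sub-bands
```

In an encoder this function would be called once for every 32 new input samples shifted into the buffer, so the total output sample rate matches the input sample rate.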

MPEG Audio Psycho-acoustic Model
- MPEG audio compresses by removing acoustically irrelevant parts of audio signals
- Takes advantage of the human auditory system's inability to hear quantization noise under auditory masking
- Auditory masking: occurs whenever the presence of a strong audio signal makes a temporal or spectral neighborhood of weaker audio signals imperceptible

Loudness and Pitch (Review of Psychoacoustic Effects)
- More sensitive to loudness at mid frequencies than at other frequencies
  - Intermediate frequencies at [500 Hz, 5000 Hz]
  - Human hearing frequencies at [20 Hz, 20000 Hz]
- Perceived loudness of a sound changes based on the frequency of that sound
  - Basilar membrane reacts more to intermediate frequencies than to other frequencies

Fletcher-Munson Contours
- Each contour represents an equal perceived loudness
- Perception sensitivity (loudness) is not linear across all frequencies and intensities

Masking Effects (Review of Psychoacoustic Effects)
- Frequency masking
- Temporal masking

MPEG/audio divides the audio signal into frequency sub-bands that approximate critical bands. Each sub-band is then quantized according to the audibility of quantization noise within that band.

MPEG Audio Bit Allocation
- This process determines the number of code bits allocated to each sub-band based on information from the psychoacoustic model
- Algorithm:
  1. Compute mask-to-noise ratio: MNR = SNR - SMR
     - Standard provides tables that give estimates for SNR resulting from quantizing to a given number of quantizer levels
  2. Get MNR for each sub-band
  3. Search for the sub-band with the lowest MNR
  4. Allocate code bits to this sub-band
     - Once a sub-band has been allocated more code bits, look up the new estimate of its SNR and repeat from step 1
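The loop above can be sketched as a greedy allocator. This is a simplified illustration, not the standard's procedure. The standard's SNR tables are replaced by the assumption that each bit adds about 6 dB of SNR, one bit is handed out per iteration, and the names `allocate_bits`, `max_bits_per_band` and `snr_per_bit_db` are hypothetical.

```python
def allocate_bits(smr_db, total_bits, max_bits_per_band=15, snr_per_bit_db=6.0):
    """Greedily hand out code bits to the sub-band with the lowest MNR.

    smr_db: signal-to-mask ratio per sub-band (from the psychoacoustic model).
    The SNR estimate is a stand-in for the standard's tables: 6 dB per bit.
    """
    bits = [0] * len(smr_db)
    remaining = total_bits
    while remaining > 0:
        # Sub-bands that can still accept another bit.
        candidates = [i for i in range(len(bits)) if bits[i] < max_bits_per_band]
        if not candidates:
            break
        # Steps 1 and 2: MNR = SNR - SMR for each candidate sub-band.
        mnr = {i: snr_per_bit_db * bits[i] - smr_db[i] for i in candidates}
        # Step 3: the sub-band where quantization noise is most audible.
        worst = min(candidates, key=lambda i: mnr[i])
        # Step 4: give it one more bit, then loop to re-estimate.
        bits[worst] += 1
        remaining -= 1
    return bits


# Three sub-bands: a strongly audible one, a mildly audible one, a masked one.
print(allocate_bits([20.0, 5.0, -10.0], total_bits=4))  # [3, 1, 0]
```

The masked sub-band (negative SMR) receives no bits at all, which is where the compression gain comes from.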

Audio Quality
- Bit rate
  - With too low a bit rate, we get compression artifacts
    - Ringing
    - Pre-echo: sound is heard before it occurs. It is most noticeable in impulsive sounds from percussion instruments such as cymbals
      - Occurs in transform-based audio compression algorithms
- Quality of encoder and encoding parameters
  - Constant bit rate encoding
  - Variable bit rate encoding

MP3 Audio Format
[Figure: MP3 file structure]
Source: http://wiki.hydrogenaudio.org/images/e/ee/Mp3filestructure.jpg

MPEG Audio Comments
- Precision of 16 bits per sample is needed to get a good SNR
- The noise we are getting is quantization noise from the digitization process
- For each added bit, we get about 6 dB better SNR
- Masking effect means that we can raise the noise floor around a strong sound because the noise will be masked away
- Raising the noise floor is the same as using fewer bits, and using fewer bits is the same as compression
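The 6 dB per bit rule can be made concrete with the standard textbook estimate for a full-scale sine wave under uniform quantization, SNR ≈ 6.02 N + 1.76 dB. That formula is not on the slide and is brought in here as an assumption. The helper `bits_saved` is hypothetical and simply inverts the per-bit slope.

```python
import math

DB_PER_BIT = 20 * math.log10(2)  # about 6.02 dB


def quantization_snr_db(bits):
    """Theoretical SNR of a full-scale sine wave quantized uniformly with `bits` bits."""
    return DB_PER_BIT * bits + 10 * math.log10(1.5)


def bits_saved(noise_floor_rise_db):
    """Bits per sample that can be dropped if masking hides this much extra noise."""
    return noise_floor_rise_db / DB_PER_BIT


print(f"{quantization_snr_db(16):.2f} dB")  # 98.09 dB for 16-bit samples
print(f"{bits_saved(30):.2f} bits")         # 4.98, so roughly 5 bits for 30 dB
```

So if a masker allows the noise floor in some sub-band to rise by 30 dB, that sub-band can be coded with roughly 5 fewer bits per sample.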

Successor of MP3
- Advanced Audio Coding (AAC): now part of MPEG-4 Audio
- Inclusion of 48 full-bandwidth audio channels
- Default audio format for iPhone, iPad, Nintendo, PlayStation, Nokia, Android, BlackBerry
- Introduced in 1997 as MPEG-2 Part 7
- In 1999, updated and included in MPEG-4

AAC's Improvements over MP3
- More sample frequencies (8-96 kHz)
- Arbitrary bit rates and variable frame length
- Higher efficiency and simpler filterbank
  - Uses pure MDCT (modified discrete cosine transform)
  - Used in Windows Media Audio

MPEG-4 Audio
- Variety of applications
  - General audio signals
  - Speech signals
  - Synthetic audio
  - Synthesized speech (structured audio)

MPEG-4 Audio Part 3
- Includes a variety of audio coding technologies
  - Lossy speech coding (e.g., CELP)
    - CELP: code-excited linear prediction, a speech coding technique
  - General audio coding (AAC)
  - Lossless audio coding
  - Text-to-Speech interface
  - Structured Audio (e.g., MIDI)

MPEG-4 Part 14
- Called MP4, with extension .mp4
- Multimedia container format
- Stores digital video and audio streams and allows streaming over the Internet
- Container or wrapper format
  - Meta-file format whose specification describes how different data elements and metadata coexist in a computer file

MPEG-4 Audio
- Bit rate: 2-64 kbps
- Scalable for variable rates
- MPEG-4 defines a set of coders
  - Parametric coding techniques: low bit rates 2-6 kbps, 8 kHz sampling frequency
  - Code Excited Linear Prediction: medium bit rates 6-24 kbps, 8 and 16 kHz sampling rate
  - Time-frequency techniques: high-quality audio at 16 kbps and higher bit rates, sampling rate > 7 kHz
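The bit-rate ranges above overlap, so more than one coder family can apply at a given rate. A minimal lookup sketch follows. The table name `CODERS` and the function `coders_for` are hypothetical, and the 64 kbps upper bound for time-frequency coding is an assumption taken from the slide's overall 2-64 kbps range.

```python
# (coder family, lowest bit rate in kbps, highest bit rate in kbps)
CODERS = [
    ("parametric coding", 2, 6),
    ("CELP", 6, 24),
    ("time-frequency coding", 16, 64),  # upper bound assumed from the 2-64 kbps range
]


def coders_for(bit_rate_kbps):
    """Return the coder families whose listed range covers the given bit rate."""
    return [name for name, low, high in CODERS if low <= bit_rate_kbps <= high]


print(coders_for(4))   # ['parametric coding']
print(coders_for(20))  # ['CELP', 'time-frequency coding']
```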

Conclusion
- MPEG Audio is an integral part of the MPEG standard, to be considered together with video
- MPEG-4 Audio represents a major extension in capabilities over MPEG-1 Audio