CS 414 – Multimedia Systems Design
Lecture 11 – MP3 and MP4 Audio (Part 7)
Klara Nahrstedt
Spring 2012
CS 414 - Spring 2012
Administrative
- MP1 – deadline February 18
Outline
- MP3 audio encoding
- MP4 audio
- Reading: Media Coding book, Sections 7.7.2 – 7.7.5
  - Recommended paper on MP3: Davis Pan, "A Tutorial on MPEG/Audio Compression", IEEE Multimedia, pp. 60-74, 1995
  - Recommended books on JPEG/MPEG audio/video fundamentals:
    - Haskell, Puri, Netravali, "Digital Video: An Introduction to MPEG-2", Chapman and Hall, 1996
Why Compression Is Needed
- Data rate = sampling rate × quantization bits × channels (+ control information)
- Example (digital audio): 44,100 Hz, 16 bits, 2 channels
  - generates about 1.4 Mbit of data per second, 84 Mbit per minute, and 5 Gbit per hour
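The data-rate formula can be checked with a few lines of arithmetic. This is a minimal sketch that ignores the control-information overhead; the function name `pcm_bit_rate` is hypothetical. It also shows that the slide's "M" and "G" figures are in bits, not bytes.

```python
# Uncompressed PCM data rate: sampling rate x bits per sample x channels.
# Control/framing overhead mentioned on the slide is ignored here.

def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Return the raw PCM data rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


if __name__ == "__main__":
    rate = pcm_bit_rate(44100, 16, 2)  # CD-quality stereo
    print(f"{rate / 1e6:.4f} Mbit/s")           # per second
    print(f"{rate * 60 / 1e6:.2f} Mbit/min")    # per minute
    print(f"{rate * 3600 / 1e9:.2f} Gbit/h")    # per hour
    print(f"{rate * 3600 / 8 / 1e6:.0f} MB/h")  # per hour, in bytes
```

This prints 1.4112 Mbit/s, 84.67 Mbit/min and 5.08 Gbit/h; in bytes, one hour is about 635 MB.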
MPEG-1 Audio
- Lossy compression of audio
- In the late 1980s, ISO's MPEG group started to standardize audio coding for:
  - TV broadcasting
  - Use of audio on CD-ROM (later DVD)
- MPEG-1 Audio – 1992
- MPEG-2 Audio – 1994
- MPEG-1 Audio Layers I, II, III
Criteria for a Good Standard
- Achieve the desired outcome
- Be comprehensible
- Allow efficient implementation
- Support competition
- Give benchmark tests
- Be supported by industry
- Be good for end users
- …
- Two models: implement first, then standardize; or standardize first, then implement
MPEG-1 Audio Layer II
- Called MP2
- Dominant standard for audio broadcasting
  - DAB digital radio and DVB digital television
- Came out of the MUSICAM codecs, with bit rates 64-196 kbps
  - MUSICAM audio coding: basis for MPEG-1 and MPEG-2 audio
- Sampling rates: 32, 44.1, 48 kHz
- Bit rates: 32, 48, 56, 64, 80, 96, … 384 kbps
- Format: mono, stereo, dual channel, …
- MP2: sub-band audio encoder in the time domain
MPEG-1 Audio Layer III
- MPEG-1 Layer III is called the MP3 format
  - Popular for PC and Internet applications
  - Goal: compress to 128 kbps, but audio can also be compressed to higher or lower bit rates, with correspondingly higher or lower quality
  - Uses psychoacoustics
    - Psychoacoustics: the scientific study of sound perception
MPEG Audio – MP3
- The first psychoacoustic masking codec was proposed in 1979 at AT&T Bell Labs, Murray Hill
- MP3 is based on OCF (optimum coding in the frequency domain) and PXFM (perceptual transform coding)
- MPEG-1 Audio Layer III – public release 1993
- MPEG-2 Audio Layer III – public release 1995
MPEG Audio – MP3
- 1997 – mp3.com offered thousands of MP3s created by independent artists for free
- 1999 – Napster: MP3 peer-to-peer file sharing
  - Problem: copyright infringement
- Authorized services: Amazon.com, Rhapsody, Juno Records, …
MPEG-1 Audio Encoding
- Characteristics
  - Precision: 16 bits
  - Sampling frequency: 32 kHz, 44.1 kHz, 48 kHz
  - 3 compression layers: Layer 1, Layer 2, Layer 3 (MP3)
    - Layer 3: 32-320 kbps, target 64 kbps
    - Layer 2: 32-384 kbps, target 128 kbps
    - Layer 1: 32-448 kbps, target 192 kbps
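To put the target bit rates in perspective, the sketch below computes the compression ratio each layer achieves against uncompressed 16-bit PCM. The assumption (not stated on the slide) is that the targets are per-channel rates at 44.1 kHz, so they are compared against one 705.6 kbps PCM channel. The names `compression_ratio` and `TARGET_KBPS` are illustrative.

```python
# Compression ratio of MPEG-1 audio layers versus uncompressed 16-bit PCM.
# Assumption: the target rates are read as per-channel rates at 44.1 kHz,
# so they are compared against one 16-bit PCM channel.

PCM_PER_CHANNEL_KBPS = 44.1 * 16  # 705.6 kbps for one 16-bit channel

TARGET_KBPS = {"Layer 1": 192, "Layer 2": 128, "Layer 3": 64}


def compression_ratio(coded_kbps: float, pcm_kbps: float = PCM_PER_CHANNEL_KBPS) -> float:
    """How many times smaller the coded stream is than the PCM stream."""
    return pcm_kbps / coded_kbps


if __name__ == "__main__":
    for layer, kbps in TARGET_KBPS.items():
        print(f"{layer}: {kbps} kbps -> about {compression_ratio(kbps):.1f}:1")
```

Under these assumptions the ratios are roughly 3.7:1, 5.5:1 and 11:1. A 128 kbps stereo MP3 compared with 1411.2 kbps stereo PCM also comes out at about 11:1.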
MPEG Audio Encoding Steps
[figure not reproduced]
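MPEG Audio Filter Bank
- The filter bank divides the input into multiple sub-bands (32 equal-width frequency sub-bands)
- Sub-band i (i = 0, …, 31) is defined as
  $$ s_t[i] = \sum_{k=0}^{63} \sum_{j=0}^{7} M[i][k]\,\bigl(C[k+64j]\cdot x[k+64j]\bigr), \qquad M[i][k] = \cos\!\left[\frac{(2i+1)(k-16)\pi}{64}\right] $$
  - s_t[i]: filter output sample for sub-band i at time t (t is an integer multiple of 32 audio samples)
  - C[n]: one of the 512 coefficients of the analysis window defined in the standard
  - x[n]: audio input sample from a 512-sample buffer
  - M[i][k]: analysis matrix coefficients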
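Pan's tutorial writes the filter-bank output as s_t[i] = Σ_{k=0..63} Σ_{j=0..7} M[i][k]·(C[k+64j]·x[k+64j]), with M[i][k] = cos((2i+1)(k−16)π/64). The sketch below evaluates that double sum directly for one 512-sample buffer. The 512 window coefficients C are tabulated in the standard and are not reproduced here, so the usage example plugs in a hypothetical placeholder window; its output is therefore not a real MPEG sub-band signal. Function names are illustrative.

```python
import math

NUM_SUBBANDS = 32
WINDOW_LEN = 512


def analysis_matrix():
    """M[i][k] = cos((2i + 1)(k - 16) * pi / 64), i = 0..31, k = 0..63."""
    return [[math.cos((2 * i + 1) * (k - 16) * math.pi / 64) for k in range(64)]
            for i in range(NUM_SUBBANDS)]


def subband_samples(x, C, M=None):
    """Compute s_t[i] for all 32 sub-bands from one 512-sample buffer x
    and 512 window coefficients C, following the double sum."""
    if len(x) != WINDOW_LEN or len(C) != WINDOW_LEN:
        raise ValueError("x and C must each hold 512 values")
    if M is None:
        M = analysis_matrix()
    # Window the buffer and fold the 512 products into 64 partial sums:
    # y[k] = sum over j of C[k + 64j] * x[k + 64j]
    y = [sum(C[k + 64 * j] * x[k + 64 * j] for j in range(8)) for k in range(64)]
    # Matrix the 64 partial sums into 32 sub-band samples.
    return [sum(M[i][k] * y[k] for k in range(64)) for i in range(NUM_SUBBANDS)]


if __name__ == "__main__":
    # HYPOTHETICAL placeholder window, not the standard's 512 coefficients.
    C = [math.sin(math.pi * (n + 0.5) / WINDOW_LEN) for n in range(WINDOW_LEN)]
    x = [math.sin(2 * math.pi * 1000 * n / 44100) for n in range(WINDOW_LEN)]
    print([round(s, 3) for s in subband_samples(x, C)])
```

In an encoder, 32 new input samples would be shifted into the buffer before each call, which gives one output sample per sub-band every 32 input samples.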
MPEG Audio Psychoacoustic Model
- MPEG audio compresses by removing acoustically irrelevant parts of audio signals
- It takes advantage of the human auditory system's inability to hear quantization noise under auditory masking
- Auditory masking occurs whenever the presence of a strong audio signal makes a temporal or spectral neighborhood of weaker audio signals imperceptible
Loudness and Pitch (Review of Psychoacoustic Effects)
- Humans are more sensitive to loudness at mid frequencies than at other frequencies
  - Intermediate frequencies: [500 Hz, 5000 Hz]
  - Human hearing range: [20 Hz, 20000 Hz]
- The perceived loudness of a sound changes with the frequency of that sound
  - The basilar membrane reacts more to intermediate frequencies than to other frequencies
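The mid-frequency sensitivity can be illustrated with a common analytic approximation of the threshold in quiet. The sketch uses Terhardt's formula, which is widely used in the perceptual-coding literature but does not come from these slides. A lower threshold means a quieter sound is still audible, so the curve's minimum near 3-4 kHz corresponds to the ear's most sensitive region.

```python
import math


def absolute_threshold_db_spl(freq_hz: float) -> float:
    """Terhardt's approximation of the threshold in quiet, in dB SPL."""
    f = freq_hz / 1000.0  # frequency in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)


if __name__ == "__main__":
    for f in (100, 500, 1000, 3300, 5000, 10000, 15000):
        print(f"{f:>6} Hz: {absolute_threshold_db_spl(f):6.1f} dB SPL")
```

The formula gives about 23 dB SPL at 100 Hz, about 3 dB at 1 kHz, about -5 dB near 3.3 kHz, and more than 50 dB at 15 kHz.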
Fletcher-Munson Contours
- Each contour represents equal perceived loudness
- Perception sensitivity (loudness) is not linear across all frequencies and intensities
Masking Effects (Review of Psychoacoustic Effects)
- Frequency masking
- Temporal masking
MPEG/audio divides the audio signal into frequency sub-bands that approximate critical bands. Each sub-band is then quantized according to the audibility of quantization noise within that band.
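The quality of the critical-band approximation can be checked on the Bark scale, where one Bark corresponds to one critical band. The sketch below uses the Zwicker-Terhardt Bark formula, an assumption that does not appear in the slides, to measure how many critical bands each of the 32 equal-width sub-bands spans at 44.1 kHz. The result suggests the approximation is coarse at low frequencies, where a single sub-band covers several critical bands.

```python
import math


def hz_to_bark(f: float) -> float:
    """Zwicker & Terhardt approximation of the critical-band rate (Bark)."""
    return 13.0 * math.atan(0.00076 * f) + 3.5 * math.atan((f / 7500.0) ** 2)


def subband_bark_widths(sample_rate: float = 44100.0, num_subbands: int = 32):
    """Width in Bark of each equal-width sub-band spanning 0..fs/2."""
    band_hz = sample_rate / 2 / num_subbands  # about 689 Hz at 44.1 kHz
    return [hz_to_bark((i + 1) * band_hz) - hz_to_bark(i * band_hz)
            for i in range(num_subbands)]


if __name__ == "__main__":
    widths = subband_bark_widths()
    print(f"lowest sub-band:  {widths[0]:.2f} Bark")
    print(f"highest sub-band: {widths[-1]:.3f} Bark")
```

The lowest sub-band (0-689 Hz) spans roughly 6.3 Bark, while the highest spans only a small fraction of one.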
MPEG Audio Bit Allocation
- This process determines the number of code bits allocated to each sub-band, based on information from the psychoacoustic model
- Algorithm:
  1. Compute the mask-to-noise ratio: MNR = SNR - SMR (all in dB; SMR is the signal-to-mask ratio from the psychoacoustic model)
     - The standard provides tables that give estimates of the SNR resulting from quantizing to a given number of quantizer levels
  2. Get the MNR for each sub-band
  3. Search for the sub-band with the lowest MNR
  4. Allocate code bits to this sub-band
- When a sub-band is allocated more code bits, look up the new SNR estimate and repeat from step 1, until no more code bits can be allocated
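The greedy loop described above can be sketched as follows. The sketch rests on two simplifying assumptions. First, it replaces the standard's SNR tables with the rule of thumb of about 6.02 dB per quantizer bit. Second, it charges 12 code bits per added quantizer bit, which corresponds to 12 samples per sub-band in a Layer I frame. Side information and the exact set of allowed quantizers are ignored. The function and parameter names are hypothetical.

```python
def allocate_bits(smr_db, budget_bits, snr_per_bit_db=6.02,
                  samples_per_band=12, max_bits=15):
    """Greedy MNR-driven bit allocation (simplified sketch).

    smr_db: signal-to-mask ratio of each sub-band, in dB
    budget_bits: code bits available for sub-band samples in one frame
    Assumption: SNR ~= snr_per_bit_db * bits instead of the standard's tables.
    """
    bits = [0] * len(smr_db)
    remaining = budget_bits

    def mnr(i):
        # MNR = SNR - SMR, all in dB
        return bits[i] * snr_per_bit_db - smr_db[i]

    while True:
        # One more bit per sample in a sub-band costs samples_per_band code bits.
        candidates = [i for i in range(len(bits)) if bits[i] < max_bits]
        if not candidates or remaining < samples_per_band:
            break  # no more code bits can be allocated
        worst = min(candidates, key=mnr)  # sub-band with the lowest MNR
        bits[worst] += 1                  # its MNR is recomputed next pass
        remaining -= samples_per_band
    return bits


if __name__ == "__main__":
    print(allocate_bits([20.0, 5.0, -10.0], budget_bits=72))
```

With SMRs of 20, 5 and -10 dB and 72 code bits, the loop gives [4, 2, 0]. The sub-band whose signal is already below the mask (negative SMR) receives no bits.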
Audio Quality
- Bit rate
  - With too low a bit rate, we get compression artifacts:
    - Ringing
    - Pre-echo: a sound is heard before it occurs; most noticeable for impulsive sounds from percussion instruments such as cymbals
  - Pre-echo occurs in transform-based audio compression algorithms
- Quality of the encoder and encoding parameters
  - Constant bit rate (CBR) encoding
  - Variable bit rate (VBR) encoding
MP3 Audio Format
[figure: MP3 file structure; source: http://wiki.hydrogenaudio.org/images/e/ee/Mp3filestructure.jpg]
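An MP3 file is a sequence of frames, and each frame starts with a 4-byte header. The sketch below decodes an MPEG-1 Layer III header using the standard header layout: 11-bit sync, version, layer, protection bit, bit-rate index, sampling-rate index, padding bit and channel mode. It then computes the frame length as 144 × bit rate / sampling rate + padding, where 144 = 1152 samples per frame / 8 bits per byte. For simplicity, it rejects MPEG-2/2.5, other layers, free-format streams and ID3 tags. The function name is illustrative.

```python
# Bit rates (kbps) for MPEG-1 Layer III, indexed by the 4-bit field.
# Index 0 ("free format") and 15 (invalid) are not handled here.
BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96, 112,
                 128, 160, 192, 224, 256, 320, None]
SAMPLE_RATES_HZ = [44100, 48000, 32000, None]  # MPEG-1 values; index 3 reserved
CHANNEL_MODES = ["stereo", "joint stereo", "dual channel", "mono"]


def parse_mp3_header(data: bytes) -> dict:
    """Decode a 4-byte MPEG-1 Layer III frame header."""
    if len(data) < 4:
        raise ValueError("need at least 4 bytes")
    h = int.from_bytes(data[:4], "big")
    if ((h >> 21) & 0x7FF) != 0x7FF:
        raise ValueError("frame sync (11 one-bits) not found")
    version = (h >> 19) & 0b11   # 0b11 = MPEG-1
    layer = (h >> 17) & 0b11     # 0b01 = Layer III
    if version != 0b11 or layer != 0b01:
        raise ValueError("not an MPEG-1 Layer III header")
    bitrate = BITRATES_KBPS[(h >> 12) & 0xF]
    sample_rate = SAMPLE_RATES_HZ[(h >> 10) & 0b11]
    if bitrate is None or sample_rate is None:
        raise ValueError("free-format, invalid, or reserved field value")
    padding = (h >> 9) & 1
    return {
        "bitrate_kbps": bitrate,
        "sample_rate": sample_rate,
        "crc_protected": ((h >> 16) & 1) == 0,  # protection bit 0 => CRC follows
        "channel_mode": CHANNEL_MODES[(h >> 6) & 0b11],
        # 1152 samples per frame / 8 bits per byte = 144
        "frame_length": 144 * bitrate * 1000 // sample_rate + padding,
    }


if __name__ == "__main__":
    print(parse_mp3_header(bytes([0xFF, 0xFB, 0x90, 0x64])))
```

The example header FF FB 90 64 decodes as 128 kbps, 44.1 kHz, joint stereo, no CRC, with a 417-byte frame.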
MPEG Audio Comments
- A precision of 16 bits per sample is needed to get a good SNR
- The noise we get is quantization noise from the digitization process
- Each added bit improves the SNR by about 6 dB
- The masking effect means that we can raise the noise floor around a strong sound, because the noise will be masked
- Raising the noise floor is the same as using fewer bits, and using fewer bits is the same as compression
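The "about 6 dB per bit" rule can be reproduced numerically. The sketch quantizes a near-full-scale sine wave with a uniform mid-rise quantizer and measures the resulting SNR. The test tone (997 Hz at 44.1 kHz), its amplitude and the function name are illustrative choices. For a full-scale sine, theory predicts SNR ≈ 6.02·N + 1.76 dB for N bits.

```python
import math


def quantization_snr_db(bits, n_samples=44100, freq=997.0, fs=44100.0, amp=0.999):
    """Measured SNR (dB) of a sine wave quantized with `bits` bits."""
    step = 2.0 / (2 ** bits)  # 2**bits levels spanning [-1, 1)
    sig_pow = 0.0
    err_pow = 0.0
    for n in range(n_samples):
        x = amp * math.sin(2 * math.pi * freq * n / fs)
        q = (math.floor(x / step) + 0.5) * step  # mid-rise uniform quantizer
        sig_pow += x * x
        err_pow += (x - q) ** 2
    return 10 * math.log10(sig_pow / err_pow)


if __name__ == "__main__":
    for b in (8, 12, 16):
        print(f"{b:2d} bits: measured {quantization_snr_db(b):5.1f} dB, "
              f"rule of thumb {6.02 * b + 1.76:5.1f} dB")
```

At 16 bits the measured SNR comes out close to 98 dB. Each removed bit costs about 6 dB, which is the headroom that masking lets the encoder spend.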
Successor of MP3
- Advanced Audio Coding (AAC), now part of MPEG-4 Audio
- Supports up to 48 full-bandwidth audio channels
- Default audio format for iPhone, iPad, Nintendo, PlayStation, Nokia, Android, BlackBerry
- Introduced in 1997 as MPEG-2 Part 7
- In 1999, updated and included in MPEG-4
AAC's Improvements over MP3
- More sampling frequencies (8-96 kHz)
- Arbitrary bit rates and variable frame length
- Higher efficiency and a simpler filter bank
  - Uses a pure MDCT (modified discrete cosine transform), instead of MP3's hybrid polyphase-plus-MDCT filter bank
  - The MDCT is also used in Windows Media Audio
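The MDCT maps 2N windowed input samples to N coefficients, with consecutive blocks overlapping by 50%. The time-domain aliasing that each inverse transform introduces cancels when neighboring blocks are overlap-added (TDAC). The sketch below is a direct O(N²) evaluation of the textbook MDCT/IMDCT pair with a sine window, which satisfies the Princen-Bradley condition. Real AAC encoders use FFT-based algorithms, larger blocks and also KBD windows, so the block size and window here are illustrative only.

```python
import math


def mdct(x):
    """MDCT: 2N (windowed) input samples -> N coefficients."""
    n2 = len(x)
    N = n2 // 2
    return [sum(x[n] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                for n in range(n2))
            for k in range(N)]


def imdct(X):
    """Inverse MDCT: N coefficients -> 2N time-aliased samples.

    Scaled by 2/N so that, with a Princen-Bradley window applied at both
    analysis and synthesis, overlap-add reconstructs the input.
    """
    N = len(X)
    return [sum(X[k] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                for k in range(N)) * 2 / N
            for n in range(2 * N)]


def sine_window(N):
    """Sine window of length 2N: w[n]^2 + w[n + N]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]


def mdct_roundtrip(signal, N):
    """Analyze with 50%-overlapping windowed MDCT blocks, then overlap-add.

    Samples covered by two blocks are reconstructed exactly, because the
    aliasing terms of neighbouring blocks cancel (TDAC).
    """
    w = sine_window(N)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * N + 1, N):
        block = [signal[start + n] * w[n] for n in range(2 * N)]
        y = imdct(mdct(block))
        for n in range(2 * N):
            out[start + n] += y[n] * w[n]
    return out


if __name__ == "__main__":
    sig = [math.sin(0.3 * i) + 0.01 * i for i in range(96)]
    rec = mdct_roundtrip(sig, 16)
    print(max(abs(rec[i] - sig[i]) for i in range(16, 80)))
```

The first and last N samples are each covered by only one block, so they are not reconstructed. A real codec handles this by padding the signal at both ends.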
MPEG-4 Audio
- Variety of applications
  - General audio signals
  - Speech signals
  - Synthetic audio
  - Synthesized speech (structured audio)
MPEG-4 Audio Part 3
- Includes a variety of audio coding technologies
  - Lossy speech coding (e.g., CELP)
    - CELP: code-excited linear prediction, a speech coding technique
  - General audio coding (AAC)
  - Lossless audio coding
  - Text-to-Speech interface
  - Structured Audio (e.g., MIDI)
MPEG-4 Part 14
- Called MP4, with the extension .mp4
- Multimedia container format
- Stores digital video and audio streams and allows streaming over the Internet
- Container or wrapper format
  - A meta-file format whose spec describes how different data elements and metadata coexist in a computer file
MPEG-4 Audio
- Bit rate: 2-64 kbps
- Scalable for variable rates
- MPEG-4 defines a set of coders
  - Parametric coding techniques: low bit rates of 2-6 kbps, 8 kHz sampling frequency
  - Code-excited linear prediction: medium bit rates of 6-24 kbps, 8 and 16 kHz sampling rates
  - Time/frequency techniques: high-quality audio at 16 kbps and higher bit rates, sampling rate > 7 kHz
Conclusion
- MPEG Audio is an integral part of the MPEG standard and should be considered together with video
- MPEG-4 Audio represents a major extension in capabilities over MPEG-1 Audio