Audio Codecs
Miikka Vilermo
Nokia Research Center – Audio Visual Systems Laboratory
© NOKIA

Introduction
• Codecs evolve and new technologies emerge.
• What can we do with all these codecs?
• Will the emerging technologies change the "status quo"?
• What do people want?

Audio Codecs
• Recent technical advances
• Existing and emerging codecs
• How good is good enough? – Codec requirements
• Important issues outside today’s presentation
• One case closed and another reopened (if time)
• Questions

Recent Technical Advances
• Spectral Band Replication (SBR)
• Binaural Cue Coding (BCC)
• Integer-to-Integer Modified Discrete Cosine Transform (INTMDCT)

Spectral Band Replication (SBR)
• SBR is one method of Bandwidth Extension (BWE).
• BWE is a class of methods that increase the perceived bandwidth without spending many bits, by exploiting psychoacoustics.
• SBR was introduced by Coding Technologies.
• The technology is applicable to any coder, e.g. AAC+ and mp3PRO.
• Achieves very high quality @ 48 kbps stereo.
• SBR has been standardized as High Efficiency Advanced Audio Coding (HE-AAC).
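The bandwidth-extension idea can be illustrated with a toy sketch: the decoder copies the decoded low band upward and rescales each copied group so that its energy matches a coarse envelope sent by the encoder. This is only an illustration under assumptions, not the standardized SBR algorithm; the function name `sbr_reconstruct`, the equal-sized grouping, and the use of plain magnitude lists are all hypothetical.

```python
import math


def sbr_reconstruct(low_band, high_env_energy):
    """Toy bandwidth extension: replicate the low band, then adjust its envelope.

    low_band        -- decoded spectral magnitudes of the core (low) band
    high_env_energy -- target energy for each high-band group, as an encoder
                       would measure and transmit it (hypothetical layout:
                       equal-sized groups; leftover bins are not replicated)
    """
    group = len(low_band) // len(high_env_energy)
    high_band = []
    for g, target in enumerate(high_env_energy):
        patch = low_band[g * group:(g + 1) * group]  # replicated low-band bins
        energy = sum(x * x for x in patch)
        gain = math.sqrt(target / energy) if energy > 0 else 0.0
        high_band.extend(x * gain for x in patch)    # envelope adjustment
    return list(low_band) + high_band
```

Only the few envelope energies need to be transmitted for the high band, which is why the extra bandwidth costs so few bits.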

Figure: High-level block diagram of SBR incorporated into an audio encoder (a) and an audio decoder (b). (Juha Ojanperä)

Figure: Block diagram of the SBR encoder module combined with the AAC core encoder. (Juha Ojanperä, Miikka Vilermo)

Figure: Example of the time/frequency grid of the SBR. (Juha Ojanperä, Miikka Vilermo)

Binaural Cue Coding (BCC)
• Traditional multichannel coding requires “the number of channels” × “mono bitrate” kbps.
• Without specific matrixing, traditional multichannel coding is restricted to a certain number of channels (e.g. 5.1) and a fixed speaker placement.
• BCC has two versions: flexible rendering and natural rendering.
• In flexible rendering, the original multichannel input is downmixed to (usually) one channel, and the spatial information is sent as one separate low-bitrate parametric stream. The decoder then renders as many channels as are needed based on the parameterised spatial image. The decoder can also apply Head-Related Transfer Functions (HRTFs) to create surround headphone playback.
• In natural rendering, one parameterised stream of spatial information is created for each of the original channels. This increases the bitrate and limits rendering options in the decoder, but also improves quality.
• BCC can also be used as parametric stereo.

BCC Continued
• Typical parameters for BCC are:
  • Inter-Channel Level Difference (ICLD)
  • Inter-Channel Time Difference (ICTD)
  • Inter-Channel Correlation (ICC)
• Parameters are applied on critical bands.
• BCC is based on the assumption that on every critical band the dominant source defines the spatial perception.
• BCC doesn’t suffer from unmasking effects, since the quantization noise is automatically rendered in the same direction as the source.
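As a rough illustration of the parameters listed above, the sketch below computes two of them for a single band of a two-channel signal. It is a simplification under assumptions: the samples are taken to be already filtered to one critical band, the ICLD sign convention (first channel over second) is arbitrary, ICC is taken as the normalised correlation at lag zero only, and ICTD estimation is left out. The function name `bcc_cues` is hypothetical.

```python
import math


def bcc_cues(left, right):
    """Return (ICLD in dB, ICC) for one band of two channels.

    left, right -- equal-length sample lists, assumed already band-limited
                   to one critical band and not silent (non-zero power)
    """
    p_left = sum(x * x for x in left)                 # band power, channel 1
    p_right = sum(x * x for x in right)               # band power, channel 2
    cross = sum(x * y for x, y in zip(left, right))   # lag-zero cross term
    icld_db = 10.0 * math.log10(p_left / p_right)     # level difference in dB
    icc = cross / math.sqrt(p_left * p_right)         # normalised correlation
    return icld_db, icc
```

In a BCC-style encoder such values would be computed per band and per frame and sent alongside the downmix; the decoder would use them to redistribute the downmix over the output channels.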

Integer-to-Integer Modified Discrete Cosine Transform (INTMDCT)
• Lossy coding is important, but how could you extend it to lossless coding?
• The Modified Discrete Cosine Transform (MDCT) is the most popular audio coding transform, but losslessly coding floating-point values is difficult.
• INTMDCT is similar to MDCT, but if the input is integers then the output is integers too.
• An integer version can be created of any transform whose matrix can be expressed as a product of matrices that have ones on the diagonal and zeros everywhere else except in a single row or column.
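A short worked note on why matrices of this shape permit an exact integer mapping, shown for the 2×2 case. The coefficient a and the rounding operator [·] are notation assumed here for illustration, not taken from the slides.

```latex
\[
\begin{pmatrix} y_1 \\ y_2 \end{pmatrix}
=
\begin{pmatrix} 1 & a \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
\quad\Longrightarrow\quad
y_1 = x_1 + [a\,x_2], \qquad y_2 = x_2
\]
\[
x_2 = y_2, \qquad x_1 = y_1 - [a\,y_2]
\]
```

Because x_2 passes through unchanged, the inverse can recompute exactly the same rounded term and subtract it, so the rounding costs no information even though a is not an integer.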

INTMDCT Continued
• Givens rotations (butterfly operations) can be expressed as products of such matrices. Thus any transform that can be expressed in terms of Givens rotations can be used as the basis for an integer transform.
• MPEG has ongoing standardisation work on lossless coding. INTMDCT was a basis for that work.
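A minimal sketch of the idea for one Givens rotation: the rotation by theta factors into three matrices of the kind described above, with off-diagonal coefficients (cos theta - 1)/sin theta, sin theta, and (cos theta - 1)/sin theta, and rounding each step keeps everything in integers while remaining exactly invertible. The function names `lift_rotate` and `lift_unrotate` are hypothetical, and this shows a single butterfly rather than a full INTMDCT.

```python
import math


def lift_rotate(x, y, theta):
    """Integer approximation of rotating (x, y) by theta via three lifting steps.

    theta must not be a multiple of pi (sin(theta) would be zero).
    """
    a = (math.cos(theta) - 1.0) / math.sin(theta)  # coefficient of steps 1 and 3
    s = math.sin(theta)                            # coefficient of step 2
    x = x + round(a * y)   # step 1: only x changes
    y = y + round(s * x)   # step 2: only y changes
    x = x + round(a * y)   # step 3: only x changes
    return x, y


def lift_unrotate(x, y, theta):
    """Exact inverse: undo the three steps in reverse order."""
    a = (math.cos(theta) - 1.0) / math.sin(theta)
    s = math.sin(theta)
    x = x - round(a * y)   # undo step 3
    y = y - round(s * x)   # undo step 2
    x = x - round(a * y)   # undo step 1
    return x, y
```

For example, `lift_rotate(100, 0, math.pi / 4)` gives `(71, 71)`, close to the exact rotation (70.71, 70.71), and `lift_unrotate(71, 71, math.pi / 4)` recovers `(100, 0)` exactly.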

Figure: Block diagram of a scalable lossless INTMDCT-enhanced perceptual codec.

Existing and Emerging Codecs
• Internet codecs
• Multichannel codecs
• Lossless codecs
• Low delay codecs
• New codecs
• Others

Internet Codecs
• MP3
  • MPEG-1 Layer 3
  • largest user base
  • near CD-quality can require over 192 kbps for difficult material
• Ogg Vorbis
  • open source
  • claimed to be IPR free
  • quality around mp3 but varies greatly between samples
• AAC
  • MPEG-2 and MPEG-4
  • lowest bitrate for CD-quality
  • near CD-quality around 128 kbps even for difficult material
  • QuickTime and RealAudio use AAC for high bitrates
• Windows Media
  • proprietary
  • large user base through Windows
  • better than mp3; WMA 9 comes close to AAC in quality
  • includes lossless and multichannel coding

Internet Codecs Continued
• RealAudio
  • uses AAC for high bitrates
  • proprietary low-bitrate codecs, the same as in earlier versions
  • proprietary multichannel codecs
  • built for streaming
• ATRAC
  • proprietary
  • ATRAC3plus for low bitrates (<= 64 kbps)
  • ATRAC3 for high bitrates
  • mp3-like quality at high bitrates
  • better than AAC at low bitrates

Multichannel Codecs
• Windows Media 9 and RealAudio 10 include multichannel coding; AAC and AAC+ support multichannel coding
• AC-3 (Audio Coding, Dolby)
  • proprietary
  • largest installed user base
  • quality close to mp3
  • production point of view taken into account
• DTS (Digital Theater Systems)
  • proprietary
  • high bitrate, high quality
• MLP (Meridian Lossless Packing)
  • proprietary
  • lossless
• SDDS (Sony Dynamic Digital Sound)
  • proprietary
  • based on ATRAC

Lossless Codecs
• Compression ratios 1/3 to 1/2 depending on the material
• FLAC (Free Lossless Audio Codec)
  • free
• Monkey’s Audio
  • free
• Windows Media
• Many others exist
• MPEG has ongoing standardization work
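To put the compression ratios above into bitrate terms, here is a small worked calculation. It assumes CD-format PCM input (44.1 kHz, 16 bits, stereo) and reads the ratio as compressed size divided by original size; both are assumptions, since the slide does not state them. The function names are hypothetical.

```python
def pcm_bitrate_kbps(sample_rate_hz=44100, bits_per_sample=16, channels=2):
    """Bitrate of uncompressed PCM in kbps (defaults: CD audio, an assumption)."""
    return sample_rate_hz * bits_per_sample * channels / 1000.0


def lossless_bitrate_kbps(ratio, source_kbps=None):
    """Average bitrate after lossless coding, with ratio = compressed / original."""
    if source_kbps is None:
        source_kbps = pcm_bitrate_kbps()
    return source_kbps * ratio


if __name__ == "__main__":
    for ratio in (1 / 3, 1 / 2):
        print(f"ratio {ratio:.2f}: {lossless_bitrate_kbps(ratio):.1f} kbps")
```

Under these assumptions CD audio is 1411.2 kbps uncompressed, so the quoted ratios correspond to roughly 470 to 706 kbps.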

Low-Delay Codecs
• G.722-based teleconferencing codecs
  • low quality, enough for speech @ 64 kbps
• AAC-LC
  • MPEG-4
  • quality better than mp3
• Most ordinary codecs are not good enough for two-way communication; AAC+ in particular has a very high delay

New Codecs
• Spectral Band Replication
  • AAC+ = MPEG HE-AAC, very high quality around 48 kbps
  • mp3+
• AMR-WB+ (Adaptive Multi-Rate Wideband, Nokia)
  • good quality around 24 kbps
  • optional codec in 3GPP alongside AAC+
• Discrete multichannel
  • AAC+ discrete 5.1 @ 128 kbps
  • E-AC-3 (Enhanced Audio Coding, Dolby)
• Binaural Cue Coding
  • mp3 surround 192 kbps (FhG, Agere)
  • HE-AAC surround 64 kbps, supposedly better than AC-3 at ??? kbps
  • MPEG standardization about to start
• Spectral Band Replication & Binaural Cue Coding
  • E-AAC+ (Enhanced AAC+, FhG, CT, Philips)

Other Codecs
• SBC (Sub Band Coding)
  • used with Bluetooth
  • low complexity, low power
  • near CD quality @ 320 kbps
• Dolby E
  • multichannel
  • synchronous with video frames
  • high bitrates, but tandem coding quality has been studied

How Good Is Good Enough? – Codec Requirements
• Many users are happy with 128 kbps mp3, but others are moving to 192 kbps mp3.
• iTunes AAC 128 is near CD-quality but not fully transparent. However, this seems to be enough, judging by the popularity of the service.
• On the other hand, RealAudio AAC 192 is practically transparent.
• Personally, I find AAC at 320 kbps enough, but at that point lossless codecs are close at 700 kbps.
• Some Internet music services offer songs with lossless compression.
• One unanswered question is: what is enough for streaming?
• Streaming over a fixed line at 128 kbps can be achieved. But how about wireless links: 3G, WLAN, Bluetooth? And in many cases there has to be room for video.

How Good Is Good Enough? – Codec Requirements Contd.
• Delay
  • usually high efficiency means long delay; AAC+ is a prime example
• Will multichannel become important?
• Error resilience is a must in wireless applications
• Scalability would be useful; some new ideas were presented recently by A. Aggarwal
• Editability
  • Transcoding is a sin!
  • Reversible codecs
  • High enough bitrate

Important Issues Outside Today’s Presentation
• DRM (Digital Rights Management)
• usability
• parametric coding

One Case Closed and Another Reopened (if time)
• Louder Sounds Can Produce Less Forward Masking: Effects of Component Phase in Complex Tones, Gockel et al., J. Acoust. Soc. Am., Vol. 114, No. 2, August 2003.
• Near-optimal selection of encoding parameters for audio coding, Aggarwal et al., IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '01), 7-11 May 2001, Proceedings, Volume 5, pp. 3269-3272.
• M. Wolters, ‘A closer look into MPEG-4 high efficiency AAC’, 115th AES Convention, New York, NY, Oct. 2003.

Conclusion
• Existing codecs have matured and added new features
• For most needs there already is a codec
• Emerging codecs make possible good-quality stereo @ 48 kbps and 5.1 multichannel @ 64 kbps
• User requirements are still a question
