Video Coding
TSBK01 Image Coding and Data Compression
Lecture 10
Jörgen Ahlberg

Outline
I. Colour coding
II. Moving images: From 2D to 3D?
III. Hybrid coding
IV. Video coding standards

Part I: Colour Coding
The base colours of colour television are
– Red: 700 nm
– Green: 546 nm
– Blue: 435 nm
Three base colours are enough to synthesize any visible colour!

The Colour Vector
Figure: the colour vector in the space spanned by the R, G and B axes.
In this plane, the luminance Y = R + G + B = 1

The PAL colours
Figure: a matrix converts R, G, B into Y, R-Y and B-Y.
Y = 0.30 R + 0.59 G + 0.11 B
Cr = R - Y = 0.70 R - 0.59 G - 0.11 B
Cb = B - Y = -0.30 R - 0.59 G + 0.89 B
Y: luminance; Cr, Cb: chrominance
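The conversion can be checked numerically. This is a minimal sketch, assuming R, G and B are normalized to the range 0 to 1 and that the luminance weights are 0.30, 0.59 and 0.11 for R, G and B. The function names are illustrative and not from the lecture.

```python
def rgb_to_ycrcb(r, g, b):
    """Map normalized RGB (0..1) to luminance Y and colour differences Cr, Cb."""
    y = 0.30 * r + 0.59 * g + 0.11 * b
    cr = r - y  # equals 0.70 R - 0.59 G - 0.11 B
    cb = b - y  # equals -0.30 R - 0.59 G + 0.89 B
    return y, cr, cb


def ycrcb_to_rgb(y, cr, cb):
    """Invert the mapping: recover R and B directly, then solve Y for G."""
    r = cr + y
    b = cb + y
    g = (y - 0.30 * r - 0.11 * b) / 0.59
    return r, g, b
```

For white (R = G = B = 1) the luminance is approximately 1 and both colour differences are approximately 0, so grey-scale content carries no chrominance.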

Digital Colour Coding
• Change basis to YUV (almost the same as YCrCb).
  – For more info on color spaces, see the colour FAQ at www.poynton.com/Poynton-color.html
• The Human Visual System perceives the luminance in higher resolution than the chrominance!
  → Subsample the colour components.
Figure: Y, U and V sample grids for 4:2:0 and 4:2:2.
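Chroma subsampling can be sketched as follows. The code assumes a chroma plane stored as a list of rows with even width and height, and it uses plain averaging as the downsampling filter. That filter is an assumption for illustration; real systems may use other filters and sample positions.

```python
def subsample_420(plane):
    """4:2:0: halve the chroma resolution horizontally and vertically (2x2 mean)."""
    height, width = len(plane), len(plane[0])
    return [
        [
            (plane[y][x] + plane[y][x + 1] + plane[y + 1][x] + plane[y + 1][x + 1]) / 4.0
            for x in range(0, width, 2)
        ]
        for y in range(0, height, 2)
    ]


def subsample_422(plane):
    """4:2:2: halve the chroma resolution horizontally only (mean of pairs)."""
    return [
        [(row[x] + row[x + 1]) / 2.0 for x in range(0, len(row), 2)]
        for row in plane
    ]


def samples_per_pixel(scheme):
    """Average number of samples per pixel: one Y plus two subsampled chroma planes."""
    chroma_fraction = {"4:4:4": 1.0, "4:2:2": 0.5, "4:2:0": 0.25}[scheme]
    return 1.0 + 2.0 * chroma_fraction
```

With 4:2:0 the picture needs 1.5 samples per pixel instead of 3, so half of the raw data is removed before any actual compression takes place.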

Part II: Coding of Moving Images
Principle I: Extend known methods to 3D

Coding method     Performance (bpp)   Complexity   Decoding complexity
PCM               6–8                 Low
VQ                0.5–2               Very high    Low
Predictive        2–5                 Low
Transform         0.5–1.5             High
Subband/Wavelet   0.1–1.0             High
Fractal           0.1–0.5             Very high    Low

Extending 2D Methods
• Predictive coding
  – 3D predictors
  – Motion compensated predictors
• Transform coding
  – 3D transforms
• Subband coding
  – 3D subband filters
BUT! The properties of the image signal are different in the temporal and the spatial domain!

Thus: Principle II: Hybrid methods
Hybrid predictive/transform coding is very popular.

Part III: Hybrid Coding
• Combine predictive coding and transform coding.
• Use predictive coding to predict the next frame in the sequence.
• Use transform coding to code the prediction error.

Transform Coding
Block diagram: T → Q → VLC
T: Transform
Q: Quantizer
VLC: Variable Length Coder
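The T and Q stages can be illustrated in one dimension. This is a minimal sketch, assuming an orthonormal DCT as the transform and a uniform quantizer with a single step size. The VLC stage is left out, and all function names are illustrative.

```python
import math


def dct(x):
    """Orthonormal DCT-II of a list of samples (the T block)."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(
            x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)
        ))
    return out


def idct(c):
    """Inverse of dct(): rebuild the samples from the coefficients."""
    n = len(c)
    out = []
    for i in range(n):
        total = 0.0
        for k in range(n):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            total += scale * c[k] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
        out.append(total)
    return out


def quantize(coeffs, step):
    """Uniform quantizer (the Q block): map each coefficient to an integer level."""
    return [int(round(c / step)) for c in coeffs]


def dequantize(levels, step):
    """Reconstruct coefficient values from the integer levels."""
    return [level * step for level in levels]
```

A constant input such as [4, 4, 4, 4] ends up with a single non-zero level, which is the energy compaction that makes the following variable length coding effective.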

Predictive Coding
Block diagram with Q, VLC, Q^-1 and P.
Q: Quantizer
Q^-1: Inverse quantizer (reconstructor)
P: Predictor
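The role of the inverse quantizer in the loop can be shown with a one-dimensional DPCM sketch. It assumes the simplest possible predictor (the previous reconstructed sample) and a uniform quantizer, and it omits the VLC. The encoder predicts from reconstructed values, not from the original ones, so encoder and decoder stay in step.

```python
def dpcm_encode(samples, step):
    """Quantize the prediction error; predict from the reconstructed signal."""
    levels = []
    prediction = 0.0
    for sample in samples:
        error = sample - prediction
        level = int(round(error / step))        # Q
        levels.append(level)
        reconstructed = prediction + level * step  # Q^-1 plus the prediction
        prediction = reconstructed              # P: previous reconstructed sample
    return levels


def dpcm_decode(levels, step):
    """Mirror of the encoder loop: the same Q^-1 and P, driven by the levels."""
    output = []
    prediction = 0.0
    for level in levels:
        value = prediction + level * step
        output.append(value)
        prediction = value
    return output
```

Because the quantization error does not accumulate, every decoded sample stays within half a quantizer step of the original sample.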

Hybrid Coding
Block diagram: T, Q and VLC in the forward path; Q^-1, T^-1 and P in the feedback loop.

Frame Prediction
Figure: an intra-coded I-frame followed by predictively coded P-frames.
Better prediction if it can compensate for motion!

Motion Compensation

Motion Compensated Hybrid Coding
Block diagram with TQ, VLC, TQ^-1, P, ME and a second VLC.
TQ: Transform + quantization
ME: Motion estimation

Motion Compensation
• Typically one motion vector per macroblock (4 transform blocks)
• Motion estimation is a time-consuming process
  – Hierarchical motion estimation
  – Maximum length of motion vectors
  – Clever search strategies
• Motion vector accuracy:
  – Integer, half or quarter pixel
  – Bilinear interpolation
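The cost of motion estimation is easiest to see in the brute-force case. The sketch below is an exhaustive block-matching search with integer-pixel accuracy. It assumes frames stored as lists of rows and uses the sum of absolute differences (SAD) as the matching criterion, which is a common choice but an assumption here. The names are illustrative.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between a block and a displaced reference block."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(n)
        for x in range(n)
    )


def full_search(cur, ref, bx, by, n, max_disp):
    """Try every displacement up to max_disp pixels; return (vector, cost)."""
    height, width = len(ref), len(ref[0])
    best_vector = (0, 0)
    best_cost = sad(cur, ref, bx, by, 0, 0, n)
    for dy in range(-max_disp, max_disp + 1):
        for dx in range(-max_disp, max_disp + 1):
            # Skip candidates that would read outside the reference frame.
            inside = (0 <= bx + dx and bx + dx + n <= width
                      and 0 <= by + dy and by + dy + n <= height)
            if inside:
                cost = sad(cur, ref, bx, by, dx, dy, n)
                if cost < best_cost:
                    best_vector, best_cost = (dx, dy), cost
    return best_vector, best_cost
```

The search evaluates (2 * max_disp + 1)^2 candidates per block, which is why a limit on the vector length and cleverer search strategies matter in practice.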

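Part IV: Video Coding Standards
Figure: applications and standards placed along a bitrate axis.
Bitrate axis: 8, 16, 64, 384 kbit/s; 1.5, 5, 20 Mbit/s.
Applications, from low to high bitrate: mobile videophone, videophone over PSTN, ISDN videophone, Video CD, digital TV, HDTV.
Bitrate ranges: very low, low, medium, high.
Standards, from low to high bitrate: MPEG-4, H.263, H.261, MPEG-1, MPEG-2.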

Standards
• H.26x
  – Standards for real-time communication like video telephony and video conferencing.
  – Standardized by ITU.
• MPEG
  – Standards for stored video data like movies on CDs, DVDs, etc.
  – Standardized by ISO.

H.261
• Standard for ISDN picture phones in 1990.
• Motion compensation:
  – One motion vector per macroblock.
  – One macroblock = four 8×8 luminance blocks + two chrominance blocks (one U and one V).
  – Motion vectors max 15 pixels long in each direction.
• Format:
  – CIF (352×288) or QCIF (176×144)
  – 7.5–30 frames/s.
• Bitrate: Multiple of 64 kbit/s (= ISDN) including audio.
• Quality: Acceptable for small motion at 128 kbit/s.
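The numbers on this slide give a feeling for how much compression is required. The sketch below assumes 8 bits per sample, which the slide does not state, and uses the macroblock layout above (four luminance blocks plus one U and one V block, that is 1.5 samples per pixel). The function names are illustrative.

```python
def macroblocks_per_frame(width, height):
    """A macroblock covers 16x16 luminance pixels (four 8x8 blocks)."""
    return (width // 16) * (height // 16)


def raw_bitrate(width, height, frames_per_second, bits_per_sample=8):
    """Uncompressed bitrate in bit/s with 1.5 samples per pixel (Y plus subsampled U, V)."""
    samples_per_frame = width * height * 3 // 2
    return samples_per_frame * bits_per_sample * frames_per_second


cif_raw = raw_bitrate(352, 288, 30)   # 36495360 bit/s, about 36.5 Mbit/s
ratio_at_128k = cif_raw / 128000.0    # about 285
```

Under these assumptions CIF has 396 macroblocks per frame and QCIF has 99, and CIF at 30 frames/s over 128 kbit/s needs a compression ratio of roughly 285:1, even before the audio share of the channel is subtracted.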

H.263
• Standard for picture telephones over analog subscriber lines in 1995.
• Format:
  – CIF, QCIF or Sub-QCIF.
  – Usually less than 10 frames/s.
• Bitrate: Typically 20–30 kbit/s.
• Quality: With new options as good as H.261 (at half the bitrate).

MPEG
• Moving Picture Experts Group – a committee under ISO and IEC.
• Original plan:
  – MPEG-1 for 1.5 Mbit/s (Video CD)
  – MPEG-2 for 10 Mbit/s (Digital TV)
  – MPEG-3 for 40 Mbit/s (HDTV)
• What happened:
  – MPEG-1 for 1.5 Mbit/s (Video CD)
  – MPEG-2 for 2–60 Mbit/s (TV and HDTV)
  – MPEG-4, -7 and -21 for other things.

MPEG-1
• ISO/IEC standard in 1991.
• Target bitrate around 1.5 Mbit/s (Video CD).
• Properties:
  – Bi-directionally predictively coded frames ("B-frames", see next slide).
  – More flexible than H.261.
  – Almost JPEG for intra frames.
• Format:
  – CIF
  – No interlace.
  – 24–30 frames/s.

MPEG Frame Types
Figure: a group of frames (GOF) in the order I B B P B B I.
I: Intra-coded I-frame
P: Predictively coded P-frames
B: Bi-directionally predictively coded B-frames
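A B-frame needs both of its reference frames before it can be decoded, so frames are transmitted in a different order than they are displayed. The sketch below derives a coding order from a display order. It assumes frame labels whose first character is the frame type, and the function name is illustrative.

```python
def coding_order(display_order):
    """Move each I- or P-frame ahead of the B-frames that precede it in display order."""
    output = []
    waiting_b_frames = []
    for frame in display_order:
        if frame[0] == "B":
            # B-frames wait until their later reference frame has been sent.
            waiting_b_frames.append(frame)
        else:
            output.append(frame)
            output.extend(waiting_b_frames)
            waiting_b_frames = []
    output.extend(waiting_b_frames)
    return output


display = ["I0", "B1", "B2", "P3", "B4", "B5", "I6"]
transmitted = coding_order(display)
# transmitted == ["I0", "P3", "B1", "B2", "I6", "B4", "B5"]
```

The reordering costs extra frame memory and delay in both encoder and decoder, which is one reason B-frames are less attractive for real-time communication.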

MPEG-coding of I-frames
• Intracoded
• 8×8 DCT
• Arbitrary weighting matrix for coefficients
• Predictive coding of DC-coefficients
• Uniform quantization
• Zig-zag, run-level, entropy coding
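The zig-zag and run-level steps can be sketched as follows. The code assumes a block of already quantized coefficients stored as a list of rows, treats the DC coefficient separately, and uses the string "EOB" as an illustrative end-of-block marker. The entropy coding of the resulting pairs is not shown.

```python
def zigzag_order(n=8):
    """Visit an n x n block along anti-diagonals, alternating direction."""
    order = []
    for s in range(2 * n - 1):
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)  # even diagonals run from bottom-left to top-right
        order.extend((r, s - r) for r in rows)
    return order


def zigzag_scan(block):
    """Flatten a square block into a list, low frequencies first."""
    return [block[r][c] for r, c in zigzag_order(len(block))]


def run_level(ac_coefficients):
    """Replace each non-zero value by (number of preceding zeros, value)."""
    pairs = []
    run = 0
    for value in ac_coefficients:
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    pairs.append("EOB")  # trailing zeros are implied by the end-of-block marker
    return pairs
```

After quantization most high-frequency coefficients are zero, so the scan tends to end in one long run of zeros that the end-of-block marker represents at almost no cost.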

MPEG-coding of P-frames
• Motion compensated prediction from I- or P-frame.
• Half-pixel accuracy of motion vectors, bilinear interpolation.
• Predictive coding of motion vectors.
• Prediction error coded as I-frame.
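Half-pixel accuracy means that the reference frame is sampled between pixel positions. A minimal sketch of bilinear interpolation, assuming non-negative coordinates given in half-pixel units and a reference frame stored as a list of rows. It averages in floating point, whereas real codecs work on integers with a defined rounding rule. The function name is illustrative.

```python
def sample_half_pel(ref, x2, y2):
    """Sample ref at position (x2 / 2, y2 / 2), where x2 and y2 are in half-pixel units."""
    x0, y0 = x2 // 2, y2 // 2
    # The neighbour is one pixel further only when the coordinate is odd (a half position).
    x1 = x0 + (x2 % 2)
    y1 = y0 + (y2 % 2)
    return (ref[y0][x0] + ref[y0][x1] + ref[y1][x0] + ref[y1][x1]) / 4.0
```

At integer positions the four terms coincide and the pixel is returned unchanged. At a horizontal or vertical half position the result is the mean of two neighbours, and at a diagonal half position it is the mean of four.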

MPEG-coding of B-frames
• Motion compensated prediction from two consecutive I- or P-frames.
  – Forward prediction only (1 vector/macroblock).
  – Backward prediction only (1 vector/macroblock).
  – Average of fwd and bwd (2 vectors/macroblock).
• Otherwise as P-frames.
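The three prediction modes can be compared per macroblock. The sketch below assumes the forward and backward predictions have already been motion compensated and picks the mode with the smallest sum of absolute differences. The selection rule is an encoder choice and an assumption here, and the names are illustrative.

```python
def block_sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(p - q) for row_a, row_b in zip(a, b) for p, q in zip(row_a, row_b))


def choose_b_prediction(current, forward, backward):
    """Return (mode, prediction) for the candidate closest to the current block."""
    average = [
        [(f + b) / 2.0 for f, b in zip(row_f, row_b)]
        for row_f, row_b in zip(forward, backward)
    ]
    candidates = {"forward": forward, "backward": backward, "average": average}
    mode = min(candidates, key=lambda name: block_sad(current, candidates[name]))
    return mode, candidates[mode]
```

The averaged mode pays for two motion vectors but often gives the smallest prediction error, for instance during fades or when noise in the two reference frames partly cancels.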

MPEG-2
• ISO/IEC standard in 1994.
• Properties:
  – Handles interlace (optimized for TV)
  – Even more flexible than MPEG-1
• Format:
  – 352×288
  – 704×576 (25 frames/s) or 720×480 (30 frames/s)
  – 1440×1152 or 1920×1080 (HDTV)
• Bitrate:
  – 2–60 Mbit/s
  – ~4 Mbit/s: Image quality similar to PAL / NTSC / SECAM.
  – 18–20 Mbit/s: HDTV.

MPEG-2 (cont.)
• Profiles:
  – Simple profile without B-frames.
  – Scalable profiles.
• Experience tells that:
  – At 1.5–2 Mbit/s MPEG-2 is not better than MPEG-1.
  – With manual interaction at the coding, good quality can be achieved at 3–4 Mbit/s.
  – Problems with implementing the full standard have caused compatibility problems.
  – Buffering and rate control are hard problems.

MPEG-4
• ISO/IEC standard in 1998, version 2 in 1999
• Instead of frames as coding units, MPEG-4 uses audio-visual objects
• Focus is not primarily on compression, but on content-based functionality
• Contains definitions of:
  – Media object types (video, audio, text, graphics, ...)
  – Parameters for describing the objects
  – Bitstream syntax for the (compressed) parameters
  – Scene description, file format, streaming, synchronization, ...
• Allows mixing of media objects.

Parts of the MPEG-4 standard
• Part 1, Systems, contains
  – The bitstream syntax and the binary "language" for scene description
  – Computer graphics object descriptions
  – Multiplexing, transport, ...
• Part 2, Visual, contains
  – Video coding
  – Still image coding
  – Texture coding, ...
• Part 3, Audio, contains a toolbox of audio coders for different applications
• ...

Structure of an MPEG-4 Decoder
Figure: the bitstream enters the MUX, which feeds one decoder per A/V object; the compositor assembles the decoded A/V objects into the audio/video scene.

MPEG-4 (Natural) Video
• Instead of frames: Video Object Planes (VOPs)
• Coded with Shape Adaptive DCT (SA-DCT)
Figure labels: a video frame, alpha map, VOP, SA-DCT, background VOP.
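How the alpha map separates an object from its background can be sketched with a binary mask. This is an illustration only, assuming planes stored as lists of rows and an alpha map that is 1 inside the object and 0 outside. It shows the composition step in the decoder, not the SA-DCT itself, and the function name is illustrative.

```python
def compose(background_vop, object_vop, alpha_map):
    """Take the object pixel where alpha is set, the background pixel elsewhere."""
    return [
        [obj if alpha else back for back, obj, alpha in zip(back_row, obj_row, alpha_row)]
        for back_row, obj_row, alpha_row in zip(background_vop, object_vop, alpha_map)
    ]
```

Only the pixels inside the alpha map need to be coded for the object, which is why a transform that adapts to the object shape is used instead of a DCT over full square blocks.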

MPEG-4 Video Coding
Block diagram with TQ, TQ^-1, predictor, motion estimation and shape coding; the texture, motion and shape branches each end in a VLC and are combined in a Mux.
TQ: Transform + quantization

Synthetic/Natural Hybrid Coding
• Mix traditional video with 2D/3D graphics
  – Compose virtual environments
  – Easy to add text, graphs, images, etc.
• High compression
• Receive objects from separate sources
  – Use predefined or locally defined objects
• Scalability
  – Progressive decoding
  – Better terminal gives better quality.

Synthetic Objects
• 2D/3D graphics
  – Lines, polygons
  – Still images
  – Image/video mapping on polygon meshes
• VRML scenes and objects
• Animated people
• More on animation and virtual characters in Lecture 12!
• Synthetic audio
• More on natural and synthetic audio in Lecture 11!

Figure labels:
– Natural video object mapped on 2D mesh
– Natural video object
– Computer graphics generated virtual environment
– Still image or natural video object mapped on animated 3D mesh
All mixed in the decoder!

Virtual Environments
• Downloaded virtual environment
• Different environments for different users
• Simple change between environments
• Synthetic environments are cheaper than real ones

Tools for Synthetic Objects
• Wavelet-based still image compression
  – Scalable quality and resolution
  – Progressive decoding
  – Can be mapped on 2D or 3D meshes
• Compression of 2D and 3D meshes
  – Mesh geometry and animation
  – Transmit vertex coordinates and let the receiving terminal calculate the polygons
  – A moving or still image can be mapped on the mesh (texture mapping).

More Tools for Synthetic Objects
• Face and Body Animation
• Text-to-speech (TTS) interface
• View-dependent scalable texture
  – Information about the user's view position in a 3D scene is transmitted on a back-channel
  – Only the necessary texture information is transmitted to the user

View-dependent Scalable Texture
The texture is mapped on a surface.
Figure: the original texture and what the user sees.

Other formats
• Microsoft, RealVideo, QuickTime, ...
• All are variations of the hybrid coder used in MPEG coders, with some extra features.

New Stuff
ITU and ISO in cooperation: H.264 = MPEG-4 part 10
Finished in 2003.

H.264 / MPEG-4 part 10
• 4×4 integer transform (approximating DCT).
• Prediction of blocks of sizes up to 16×16.
• Motion vectors for blocks of sizes 4×4 up to 16×16.
• Up to 5 reference images for prediction.
• Non-uniform quantization.
• Arithmetic coding of run-level pairs.
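The integer transform can be sketched with the core matrix that is commonly given for the H.264 forward 4×4 transform. The scaling that turns it into an approximation of the DCT is folded into the quantizer and is omitted here, so this is only the integer core. The helper names are illustrative.

```python
# Core matrix of the 4x4 forward integer transform (scaling factors not included).
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def matmul(a, b):
    """Plain integer matrix product of two lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    """Swap rows and columns."""
    return [list(column) for column in zip(*m)]


def forward_core_transform(block):
    """Compute CF * block * CF^T using integer arithmetic only."""
    return matmul(matmul(CF, block), transpose(CF))
```

Because every entry is a small integer, encoder and decoder produce bit-identical results, which avoids the mismatch that can build up when different floating-point DCT implementations are combined. The rows of CF are orthogonal but not of equal norm, which is why the scaling is left to the quantizer.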

What about the sound?
• MPEG-1
  – Audio layer I, II and III (mp3).
• MPEG-2
  – Four channels, same codec as in MPEG-1.
  – AAC (Advanced Audio Coding) added later.
• MPEG-4
  – AAC
  – Two speech coders
  – Structured audio
  – And more...
More on audio coding in Lecture 11.

Conclusion
• Colour coding
  – Change basis from RGB to YUV
  – Colour components are compressed harder than the luminance
• Moving image coding
  – Hybrid coding: Motion compensated predictive coding and transform coding of the prediction error
  – I-, P-, and B-frames
  – Object-based coding (MPEG-4) mixing synthetic and natural audio & video

Conclusion (cont.)
• Standards
  – MPEG-1: Video CD
  – MPEG-2: Digital TV
  – MPEG-4: Multimedia
  – H.261: ISDN videophone
  – H.263: PSTN videophone
  – H.264 / MPEG-4 part 10: Universal video

That was the last slide!