
Overview of the Scalable Video Coding Extension of the H.264/AVC Standard
Heiko Schwarz, Detlev Marpe, and Thomas Wiegand
CSVT, Sept. 2007
2009/5 MC 2008, VCLAB

Outline
- Introduction
  - Problems
  - Definition
  - Functionality
  - Goal
  - Competition
  - Applications
  - Targets
- History of SVC
- Structure of SVC
- Temporal Scalability
- Spatial Scalability
- Quality Scalability
- Combined Scalability
- Profiles of SVC
- Conclusions

Introduction - problem
- Non-scalable video streaming
  - Multiple video streams are needed to serve heterogeneous clients
[Figure: separately encoded streams at 8 Mb/s, 6 Mb/s, 4 Mb/s, 1 Mb/s, and 512 kb/s]

Introduction - definition
- Scalable video stream
  [Figure: a scalable stream made of sub-streams 1, 2, ..., n; reconstruction from a subset of sub-streams k1, k2, ..., ki gives low quality, reconstruction from all sub-streams gives high quality]
- Scalability: removal of parts of the video bit-stream to adapt to the various needs of end users and to varying terminal capabilities or network conditions

Introduction - functionality
- Functionality of SVC
  - Graceful degradation when the "right" parts of the bit-stream are lost
  - Bit-rate adaptation to match the channel throughput
  - Format adaptation for a backwards-compatible extension
  - Power adaptation for a trade-off between runtime and quality

Introduction - goal
- Goal of SVC: a sub-stream formed from sub-streams k1, k2, ..., ki should reach a quality similar to that of a single-layer H.264/AVC bit-stream carrying the same amount of data
- Scalability modes
  - Fidelity reduction (SNR scalability)
  - Picture size reduction (spatial scalability)
  - Frame rate reduction (temporal scalability)
  - Sharpness reduction (frequency scalability)
  - Selection of content (ROI or object-based scalability)

Introduction - competition
- SVC is an old research topic (more than 20 years) and has been included in H.262/MPEG-2, H.263, and MPEG-4 Visual.
  - It has rarely been used because of
    - the characteristics of traditional video transmission systems, and
    - a significant loss of coding efficiency and a large increase in decoder complexity.
- Competing approaches
  - Simulcast
  - Transcoding

Introduction - applications
- Applications
  - Heterogeneous clients
  - Unequal error protection
  - Surveillance
- Problems
  - Increased decoder complexity
  - Decreased coding efficiency
  - Temporal scalability is more often supported than spatial and quality scalability.

Introduction - targets
- Targets
  - Little decrease in coding efficiency
  - Little increase in decoding complexity
  - Support of temporal, spatial, and quality scalability
  - A backward-compatible base layer
  - Simple bit-stream adaptations after encoding

History of SVC
- October 2003: MPEG issues a call for proposals for Scalable Video Coding
  - 12 wavelet-based proposals
  - 2 proposals extending H.264/AVC
- ~October 2004: MSRA vs. HHI proposal (wavelet-based vs. H.264/AVC extension)
- October 2004: the HHI proposal is adopted as the starting point (due to reduced encoder and decoder complexity and improved coding efficiency)
- January 2005: MPEG and VCEG agree to jointly finalize the SVC project as an amendment of H.264/AVC
- Spring 2007: finalization

Structure of SVC
[Figure: two-layer SVC encoder. The input is spatially decimated for the lower layer; each layer applies temporal scalable coding, SNR scalable coding, and base layer coding; the upper layer is predicted from the lower layer, and both layers are multiplexed into one bit-stream.]

Outline
- Introduction
- History of SVC
- Structure of SVC
- Temporal Scalability
  - Hierarchical prediction structure
- Spatial Scalability
- Quality Scalability
- Combined Scalability
- Profiles of SVC
- Conclusions

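Temporal Scalability
- Hierarchical prediction structures
[Figure: three prediction structures; each picture is labeled with its coding order, listed here in display order:
 (a) hierarchical B pictures (dyadic, GOP size 8), pictures 0-16: 0 4 3 5 2 7 6 8 1 12 11 13 10 15 14 16 9
 (b) non-dyadic hierarchical prediction, pictures 0-18: 0 3 4 2 6 7 5 8 9 1 12 13 11 15 16 14 17 18 10
 (c) hierarchical prediction with zero delay, pictures 0-16: coding order equals display order]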
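The coding order of the dyadic hierarchical B pictures (GOP size 8) follows a simple rule: code each GOP's key picture first, then recursively code the middle picture of every interval. The Python sketch below assumes a power-of-two GOP size; the helpers `dyadic_coding_order` and `temporal_level` are hypothetical names. It reproduces the coding indices listed for the dyadic case and assigns each picture a temporal level T, so that dropping all pictures above a chosen level leaves a decodable sub-stream at a lower frame rate.

```python
def dyadic_coding_order(num_gops: int, gop_size: int = 8) -> list[int]:
    """Coding index of each picture, listed in display order.

    Assumes dyadic hierarchical B pictures: picture 0 is an initial key
    picture, each GOP ends with a key picture, and every interval is split
    at its midpoint (pre-order traversal of the hierarchy).
    """
    if gop_size < 1 or gop_size & (gop_size - 1):
        raise ValueError("gop_size must be a power of two")
    display_in_coding_order = [0]  # the leading key picture is coded first

    def split(lo: int, hi: int) -> None:
        if hi - lo < 2:
            return
        mid = (lo + hi) // 2
        display_in_coding_order.append(mid)
        split(lo, mid)
        split(mid, hi)

    for g in range(num_gops):
        start, end = g * gop_size, (g + 1) * gop_size
        display_in_coding_order.append(end)  # key picture closing this GOP
        split(start, end)

    coding_index = [0] * len(display_in_coding_order)
    for c, d in enumerate(display_in_coding_order):
        coding_index[d] = c
    return coding_index


def temporal_level(display_idx: int, gop_size: int = 8) -> int:
    """Temporal level T: 0 for key pictures, log2(gop_size) for the finest level."""
    levels = gop_size.bit_length() - 1  # log2(gop_size)
    if display_idx % gop_size == 0:
        return 0
    trailing_zeros = (display_idx & -display_idx).bit_length() - 1
    return levels - trailing_zeros


print(dyadic_coding_order(2))               # [0, 4, 3, 5, 2, 7, 6, 8, 1, 12, 11, 13, 10, 15, 14, 16, 9]
print([temporal_level(d) for d in range(9)])  # [0, 3, 2, 3, 1, 3, 2, 3, 0]
```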

Temporal Scalability
- Combination with multiple reference pictures
- Arbitrary modification of the prediction structure
- Issue of quantization
  - Lower temporal layers should have higher fidelity: smaller QPs are used in lower layers.
  - Reason: quantization errors in lower-layer reference pictures propagate to the pictures predicted from them, so larger QPs can be used in higher layers.
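A minimal sketch of a cascaded QP assignment in this spirit: key pictures (temporal level 0) get the smallest QP and each higher temporal level gets a larger one. The helper `cascaded_qp` and its offsets `first_offset` and `step` are illustrative assumptions, not values prescribed by SVC or taken from a particular experiment.

```python
def cascaded_qp(qp_key: int, level: int, first_offset: int = 1, step: int = 1) -> int:
    """QP for a picture of temporal level `level` (0 = key pictures).

    Hypothetical rule: level 0 uses qp_key, level 1 adds first_offset, and each
    further level adds `step`. The result is clipped to 0..51, the H.264/AVC QP
    range for 8-bit video.
    """
    if level < 0:
        raise ValueError("level must be non-negative")
    qp = qp_key if level == 0 else qp_key + first_offset + step * (level - 1)
    return max(0, min(51, qp))


print([cascaded_qp(30, t) for t in range(4)])  # [30, 31, 32, 33]
print(cascaded_qp(50, 3))                      # 51 (clipped)
```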

Temporal Scalability
- Video coding experiment with H.264/MPEG-4 AVC: Foreman, CIF, 30 Hz, at 1320 kbps
- Performance as a function of the GOP size N; temporal scalability is provided for N > 1
  - N=1: I P P P P
  - N=2: I B0 P
  - N=4: I B1 B0 B1 P
  - N=8: I B2 B1 B2 B0 B2 B1 B2 P
- Cascaded QP assignment (labels in the figure: QP(P), QP(B0)-3, QP(B1)-4, QP(B2)-5)
This slide is copied from the JVT-W132 talk.
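The picture-type patterns for N = 1, 2, 4, 8 follow one rule: between two I/P key pictures, the middle picture is B0, the middle pictures of the two halves are B1, and so on. A small sketch (hypothetical helper `gop_pattern`, assuming N is a power of two) regenerates the patterns used in this experiment.

```python
def gop_pattern(n: int, num_gops: int = 1) -> str:
    """Picture types in display order: an initial I picture followed by
    `num_gops` GOPs of size n, each ending with a P key picture."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    levels = n.bit_length() - 1  # number of B levels = log2(n)
    labels = ["I"]
    for _ in range(num_gops):
        for d in range(1, n):  # non-key pictures inside the GOP
            trailing_zeros = (d & -d).bit_length() - 1
            labels.append(f"B{levels - 1 - trailing_zeros}")
        labels.append("P")  # key picture closing the GOP
    return " ".join(labels)


print(gop_pattern(1, num_gops=4))  # I P P P P
print(gop_pattern(8))              # I B2 B1 B2 B0 B2 B1 B2 P
```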

Temporal Scalability
- Coding efficiency of hierarchical prediction
  - JSVM 11, High profile with CABAC
  - Only one reference frame
  - CIF

Temporal Scalability
- Compared with IPPP coding (with and without a delay constraint), providing temporal scalability through hierarchical prediction usually does not have any negative impact on coding efficiency.

Outline
- Introduction
- History of SVC
- Structure of SVC
- Temporal Scalability
- Spatial Scalability
  - Inter-layer prediction
    - Inter-layer motion prediction
    - Inter-layer residual prediction
    - Inter-layer intra prediction
- Quality Scalability
- Combined Scalability
- Profiles of SVC
- Conclusions

Spatial Scalability
[Figure: SVC encoder with three spatial layers. The input is spatially decimated for the lower layers. The lowest layer is coded with an H.264/AVC-compatible coder (H.264/AVC MCP and intra prediction, base layer coding) and yields an H.264/AVC-compatible base layer bit-stream. Each higher layer applies hierarchical MCP and intra prediction and base layer coding of texture and motion, with inter-layer prediction (intra, motion, residual) from the layer below. All layers are multiplexed into the scalable bit-stream.]

Spatial Scalability
- Similar to the multilayer approach of MPEG-2, H.263, and MPEG-4
- Arbitrary resolution ratio
- The same coding order in all spatial layers
- Combination with temporal scalability
[Figure: spatial layers 0 and 1, each with intra pictures and temporal levels, connected by inter-layer prediction]

Spatial Scalability
- The prediction signals are formed by
  - MCP inside the enhancement layer (temporal), suited to small motion and high spatial detail
  - Up-sampling from the lower layer (spatial)
  - The average of the above two predictions (temporal + spatial)
- Inter-layer prediction
  - Three kinds of inter-layer prediction
    - Inter-layer motion prediction
    - Inter-layer residual prediction
    - Inter-layer intra prediction
  - Base mode MB
    - Only the residual is transmitted; no additional side information is sent.

Spatial Scalability
- Inter-layer motion prediction
  - base_mode_flag = 1 and the reference layer is inter-coded
    - The following data are derived from the reference layer:
      - MB partitioning
      - Reference indices
      - MVs, scaled to the enhancement layer resolution
  - motion_pred_flag
    - 1: MV predictors are obtained from the reference layer
    - 0: MV predictors are obtained by the conventional spatial predictors
[Figure: an 8x8 block of the reference layer with MVs (x1, y1) and (x2, y2) corresponds to a 16x16 MB of the enhancement layer with MVs (2x1, 2y1) and (2x2, 2y2)]
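For the dyadic case in the figure, a reference-layer MV (x, y) becomes (2x, 2y) in the enhancement layer. The sketch below generalizes this to arbitrary ratios by scaling each component with the ratio of the layer dimensions. The helper `scale_mv` and its round-half-away-from-zero rule are assumptions of this sketch; the normative SVC derivation uses its own integer arithmetic.

```python
from fractions import Fraction


def _round_half_away(x: Fraction) -> int:
    """Round to the nearest integer, halves away from zero."""
    n = int(abs(x) + Fraction(1, 2))  # int() truncates, i.e. floors a non-negative value
    return n if x >= 0 else -n


def scale_mv(mv: tuple[int, int],
             ref_size: tuple[int, int],
             enh_size: tuple[int, int]) -> tuple[int, int]:
    """Scale a reference-layer MV (x, y) to the enhancement-layer sampling grid.

    ref_size and enh_size are (width, height) of the two layers.
    """
    (mvx, mvy), (ref_w, ref_h), (enh_w, enh_h) = mv, ref_size, enh_size
    return (_round_half_away(Fraction(mvx * enh_w, ref_w)),
            _round_half_away(Fraction(mvy * enh_h, ref_h)))


print(scale_mv((3, -5), (176, 144), (352, 288)))  # (6, -10): dyadic, as in the figure
print(scale_mv((3, -5), (640, 360), (960, 540)))  # (5, -8): ratio 1.5
```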

Spatial Scalability
- Inter-layer residual prediction
  - residual_pred_flag = 1
  - Predictor: block-wise up-sampling, with a bilinear filter, of the corresponding 8x8 sub-MB in the reference layer
    - Applied on a transform block basis (the filter is not applied across transform block boundaries)
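A minimal sketch of block-wise dyadic bilinear up-sampling of one residual block. Sample positions are clamped to the block, so the filter never reads samples from a neighboring transform block. The helper name `upsample_block_2x` and the sample-phase convention (output sample i sits at input position (i + 0.5)/2 - 0.5) are assumptions of this sketch, not the normative SVC filter.

```python
import math


def upsample_block_2x(block: list[list[float]]) -> list[list[float]]:
    """Dyadic bilinear up-sampling of one residual transform block.

    Positions are clamped to the block, so no sample outside it is used.
    """
    n_rows, n_cols = len(block), len(block[0])

    def src(i: int, n: int) -> tuple[int, int, float]:
        """Neighbouring input indices and fractional weight for output index i."""
        pos = min(max((i + 0.5) / 2 - 0.5, 0.0), n - 1)
        p0 = math.floor(pos)
        p1 = min(p0 + 1, n - 1)
        return p0, p1, pos - p0

    out = []
    for i in range(2 * n_rows):
        y0, y1, fy = src(i, n_rows)
        row = []
        for j in range(2 * n_cols):
            x0, x1, fx = src(j, n_cols)
            top = (1 - fx) * block[y0][x0] + fx * block[y0][x1]
            bottom = (1 - fx) * block[y1][x0] + fx * block[y1][x1]
            row.append((1 - fy) * top + fy * bottom)
        out.append(row)
    return out


print(upsample_block_2x([[0, 4], [0, 4]])[0])  # [0.0, 1.0, 3.0, 4.0]
```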

Spatial Scalability
- Inter-layer intra prediction
  - base_mode_flag = 1 and the reference layer is intra-coded
  - The prediction signal is obtained by up-sampling the reconstructed reference layer
    - Luma: one-dimensional 4-tap FIR filter
    - Chroma: bilinear filter

Spatial Scalability
- Past spatial scalable video coding
  - Inter-layer intra prediction requires complete decoding of the base layer.
  - Multiple motion compensation loops and deblocking filters are needed.
  - Full decoding plus inter-layer prediction: the complexity exceeds that of simulcast.
- Single-loop decoding
  - Inter-layer intra prediction is restricted to MBs whose co-located base layer signal is intra-coded.
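The single-loop constraint can be expressed as a per-MB check. Intra-coded base layer MBs can be reconstructed without motion compensation in the base layer, so only the target layer needs a motion compensation loop. A minimal sketch, assuming an MB-aligned ratio of 1 or 2 and a hypothetical helper `inter_layer_intra_allowed` that reads a map of which reference-layer MBs are intra-coded:

```python
def inter_layer_intra_allowed(base_is_intra: list[list[bool]],
                              mb_x: int, mb_y: int, ratio: int = 2) -> bool:
    """True if enhancement-layer MB (mb_x, mb_y) may use inter-layer intra prediction.

    With ratio 2, a 16x16 enhancement MB covers an 8x8 area that lies inside a
    single reference-layer MB; with ratio 1 the MBs coincide.
    """
    if ratio not in (1, 2):
        raise ValueError("this sketch only handles ratios 1 and 2")
    return base_is_intra[mb_y // ratio][mb_x // ratio]


base = [[True, False],
        [False, True]]  # 2x2 MBs in the reference layer
print([[inter_layer_intra_allowed(base, x, y) for x in range(4)] for y in range(4)])
```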

Spatial Scalability
- Single-loop vs. multi-loop decoding
[Figure: single-loop vs. multi-loop decoding of I, P, and B pictures with inter-layer prediction]
This slide is copied from http://iphome.hhi.de/wiegand/assets/pdfs/H264AVC_SVC.pdf

Spatial Scalability
- Generalized spatial scalability in SVC
  - Arbitrary resolution ratio
    - Only restriction: neither the horizontal nor the vertical resolution can decrease from one layer to the next.
  - Cropping
    - The enhancement layer may contain new regions
    - Higher quality for regions of interest
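The single restriction on resolutions can be checked directly on a list of layer sizes. A sketch with a hypothetical helper `valid_spatial_layers`; it checks only the restriction stated above, not profile-specific limits on the ratio or on cropping.

```python
def valid_spatial_layers(resolutions: list[tuple[int, int]]) -> bool:
    """True if neither width nor height decreases from one layer to the next.

    `resolutions` lists (width, height) from the base layer upwards.
    """
    return all(w1 >= w0 and h1 >= h0
               for (w0, h0), (w1, h1) in zip(resolutions, resolutions[1:]))


print(valid_spatial_layers([(176, 144), (352, 288), (704, 576)]))  # True: dyadic
print(valid_spatial_layers([(320, 240), (480, 270)]))             # True: different ratio per axis
print(valid_spatial_layers([(352, 288), (320, 480)]))             # False: width decreases
```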

Spatial Scalability
- Coding efficiency
  - Multi-loop decoding > single-loop decoding

Spatial Scalability
- Coding efficiency (IPPP...)
  - Multi-loop decoding > single-loop decoding

Spatial Scalability
- Encoder control (JSVM)
  - Base layer
    - The base layer coding parameters p0 are optimized for the base layer
  - Enhancement layer
    - The enhancement layer coding parameters p1 are optimized for the enhancement layer
    - Decisions for p1 depend on p0
  - Result: efficient base layer coding but inefficient enhancement layer coding

Spatial Scalability
- Encoder control (optimization)
  - Base layer
    - Enhancement layer coding is taken into account
      - Choices of p0 that disadvantage enhancement layer coding are eliminated
  - Enhancement layer
    - No change
  - Weight w
    - w = 0: JSVM encoder control
    - w = 1: single-loop encoder control (the base layer is not controlled)
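The weight w can be read as blending a base layer cost and an enhancement layer cost when choosing p0. The sketch below assumes Lagrangian costs J0 = D0 + λ0·R0 for the base layer and J1 = D1 + λ1·(R0 + R1) for the enhancement layer (which also needs the base layer bits), with p1 already chosen as the best response to each candidate p0. The `Candidate` fields, the numbers, and `choose_p0` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One hypothetical base layer coding choice p0 with measured costs."""
    name: str
    d0: float  # base layer distortion
    r0: float  # base layer rate
    d1: float  # enhancement layer distortion, best p1 given this p0
    r1: float  # enhancement layer rate, best p1 given this p0


def choose_p0(cands: list[Candidate], w: float, lam0: float, lam1: float) -> Candidate:
    """Pick p0 minimizing (1 - w) * J0 + w * J1."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")

    def cost(c: Candidate) -> float:
        j0 = c.d0 + lam0 * c.r0           # base layer cost
        j1 = c.d1 + lam1 * (c.r0 + c.r1)  # enhancement cost includes base layer bits
        return (1 - w) * j0 + w * j1

    return min(cands, key=cost)


cands = [Candidate("A", d0=10, r0=10, d1=20, r1=30),
         Candidate("B", d0=12, r0=11, d1=8, r1=20)]
print(choose_p0(cands, w=0.0, lam0=1.0, lam1=1.0).name)  # A: best for the base layer
print(choose_p0(cands, w=1.0, lam0=1.0, lam1=1.0).name)  # B: best for the enhancement layer
```

With w = 0 only the base layer cost matters, which matches the JSVM behavior described above; with w = 1 the base layer choice serves the enhancement layer alone.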

Spatial Scalability
- Coding efficiency of the optimized encoder control
  - Optimized encoder vs. JSVM encoder (QP_E = QP_B + 4, where QP_E and QP_B are the enhancement and base layer QPs)

Outline
- Introduction
- History of SVC
- Structure of SVC
- Temporal Scalability
- Spatial Scalability
- Quality Scalability
  - CGS
  - MGS
  - Drift control
- Combined Scalability
- Profiles of SVC
- Conclusions

Quality Scalability
- Coarse-grain quality scalability (CGS)
  - A special case of spatial scalability
    - Identical sizes (resolutions) for base and enhancement layers
  - Smaller quantization step sizes for the residual in higher enhancement layers
  - Designed for only a few selected bit-rate points
    - Number of supported bit-rate points = number of layers
  - Switching can only occur at IDR access units

Quality Scalability
- Medium-grain quality scalability (MGS)
  - More enhancement layers are supported
    - Quality refinement layers of the residual
  - Key pictures
    - Drift control
  - Switching can occur at any access unit
  - MGS = CGS + key pictures + quality refinement layers

Quality Scalability
- Drift control
  - Drift: the effect caused by unsynchronized MCP at the encoder and decoder side
  - MCP in quality scalable coding is a trade-off between coding efficiency and drift

Quality Scalability
- MPEG-4 quality scalability with FGS
  [Figure: base layer plus refinement (possibly lost or truncated)]
  - Only the base layer is stored and used for MCP of following pictures
    - Refinement data are not used for MCP
  - Drift: drift-free
  - Complexity: low
  - Efficiency: efficient base layer but inefficient enhancement layer

Quality Scalability
- MPEG-2 quality scalability (without FGS)
  [Figure: base layer plus refinement (possibly lost or truncated)]
  - Only one reference picture (the highest-quality reconstruction) is stored and used for MCP of following pictures
  - Drift: in both base layer and enhancement layer
    - Frequent intra updates are necessary
  - Complexity: low
  - Efficiency: efficient enhancement layer but inefficient base layer

Quality Scalability
- 2-loop prediction
  [Figure: base layer plus refinement (possibly lost or truncated)]
  - Several closed encoder loops run at different bit-rate points in a layered structure
  - Drift: in the enhancement layer
  - Complexity: high
  - Efficiency: efficient base layer and medium-efficient enhancement layer

Quality Scalability
- SVC concepts
  [Figure: base layer plus refinement (possibly lost or truncated)]
  - Key pictures: a trade-off between coding efficiency and drift
    - MPEG-4 FGS corresponds to all pictures being key pictures
    - MPEG-2 quality scalability corresponds to all pictures being non-key pictures

Quality Scalability
- Drift control with hierarchical prediction
  [Figure: base layer plus refinement (possibly lost or truncated); key pictures P with hierarchical B1 and B2 pictures in between]
  - Key pictures
    - The base layer reconstruction is stored and used for MCP of following key pictures
  - Other pictures
    - The enhancement layer reconstruction is stored and used for MCP of following pictures
  - The GOP size adjusts the trade-off between enhancement layer coding efficiency and drift
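A small simulation shows why key pictures confine drift to a GOP. The sketch assumes dyadic GOPs, key pictures predicted only from base layer reconstructions of earlier key pictures, and non-key pictures predicted from the highest-quality reconstructions of the two pictures that bound their interval. The helper `drift_affected` is hypothetical; it lists the pictures whose decoded result differs from the encoder's full-quality reconstruction when the refinement of one picture is lost.

```python
def drift_affected(lost: int, num_gops: int, gop_size: int = 8) -> list[int]:
    """Display indices affected by losing the quality refinement of picture `lost`."""
    if gop_size < 1 or gop_size & (gop_size - 1):
        raise ValueError("gop_size must be a power of two")
    refs: dict[int, tuple[int, int]] = {}  # non-key picture -> its two references

    def split(lo: int, hi: int) -> None:
        if hi - lo < 2:
            return
        mid = (lo + hi) // 2
        refs[mid] = (lo, hi)  # predicted from both ends of its interval
        split(lo, mid)
        split(mid, hi)

    for g in range(num_gops):
        split(g * gop_size, (g + 1) * gop_size)

    affected = {lost}
    changed = True
    while changed:  # propagate the mismatch along prediction references
        changed = False
        for pic, (a, b) in refs.items():
            if pic not in affected and (a in affected or b in affected):
                affected.add(pic)
                changed = True
    # Key pictures never appear in `refs`: they use only base layer
    # reconstructions, which the lost refinement does not touch.
    return sorted(affected)


print(drift_affected(4, num_gops=2))  # [1, 2, 3, 4, 5, 6, 7]
print(drift_affected(8, num_gops=2))  # pictures 1..15; key pictures 0 and 16 stay clean
```

Larger GOPs place key pictures further apart, so more pictures can use high-quality references (better efficiency) but more pictures can be hit by drift.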

Quality Scalability
- Comparisons of drift control
[Figure: the drift control approaches compared along two axes: drift-free vs. drift, and high vs. low efficiency]

Quality Scalability
- Comparisons of coding efficiency
  - Quantization step size: $Q_{\mathrm{step}} = 2^{(QP-4)/6}$
[Figure: results for high ΔQP and low ΔQP]
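The step size relation Q_step = 2^((QP - 4)/6) means the step size doubles every 6 QP values and equals 1 at QP 4, so a QP difference ΔQP between two quality layers changes the step size by a factor of 2^(ΔQP/6). A small sketch with hypothetical helper names:

```python
def qstep(qp: int) -> float:
    """Quantization step size from QP: 1.0 at QP 4, doubling every 6 QP."""
    return 2 ** ((qp - 4) / 6)


def step_ratio(delta_qp: int) -> float:
    """Factor between the step sizes of two layers whose QPs differ by delta_qp."""
    return 2 ** (delta_qp / 6)


print(qstep(4), qstep(10), qstep(28))  # 1.0 2.0 16.0
print(round(step_ratio(3), 3))         # 1.414
```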

Quality Scalability
- MGS with key pictures using optimized encoder control
[Figure: coding efficiency results, including a curve labeled "Only base layer"]

Outline
- Introduction
- History of SVC
- Structure of SVC
- Temporal Scalability
- Spatial Scalability
- Quality Scalability
- Combined Scalability
  - SVC encoder structure
  - Dependency and quality refinement layers
  - Bit-stream format
  - Bit-stream switching
- Profiles of SVC
- Conclusions

Combined Scalability
- SVC encoder structure
[Figure: SVC encoder structure; each dependency layer applies a temporal decomposition, and the layers within a dependency layer share the same motion/prediction information]

Combined Scalability
- Dependency and quality refinement layers
[Figure: a scalable bit-stream made of dependency layers D=0, 1, 2, each containing quality refinement layers Q=0, 1, 2]
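Because each NAL unit is labeled with its temporal, dependency, and quality identifiers (T, D, Q), adapting the bit-stream reduces to filtering NAL units. A minimal sketch, assuming a simplified rule that keeps a unit when T ≤ t_max, D ≤ d_max, and Q ≤ q_max; the normative SVC extraction process is more detailed (for example, in which quality layers of lower dependency layers are kept), and `NalUnit` and `extract` are hypothetical names.

```python
from typing import NamedTuple


class NalUnit(NamedTuple):
    temporal_id: int    # T
    dependency_id: int  # D
    quality_id: int     # Q
    payload: bytes = b""


def extract(units: list[NalUnit], t_max: int, d_max: int, q_max: int) -> list[NalUnit]:
    """Keep the NAL units for operating point (t_max, d_max, q_max), simplified rule."""
    return [u for u in units
            if u.temporal_id <= t_max
            and u.dependency_id <= d_max
            and u.quality_id <= q_max]


stream = [NalUnit(t, d, q) for t in range(3) for d in range(3) for q in range(3)]
sub = extract(stream, t_max=1, d_max=0, q_max=2)
print(len(stream), len(sub))  # 27 6
```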

Combined Scalability
[Figure: combined scalability with temporal levels T0, T1, T2 and dependency layers D0 and D1, each with quality layers Q0 and Q1]

Combined Scalability
- Bit-stream format
  - NAL unit header extension, followed by the NAL unit payload
    - P (priority_id): indicates the importance of a NAL unit
    - T (temporal_id): indicates the temporal level
    - D (dependency_id): indicates the spatial/CGS layer
    - Q (quality_id): indicates the MGS/FGS layer
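A sketch of reading these identifiers from a NAL unit. It assumes the layout of the final SVC specification (H.264/AVC Annex G): NAL unit types 14 (prefix NAL unit) and 20 (coded slice in scalable extension) carry a 3-byte extension after the 1-byte NAL unit header, starting with a flag equal to 1 for SVC, with a 6-bit priority_id, 3-bit dependency_id, 4-bit quality_id, and 3-bit temporal_id. These widths may not match those drawn on the slide, which may reflect an earlier draft. `SvcNalHeader` and `parse_svc_nal_header` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class SvcNalHeader:
    nal_ref_idc: int
    nal_unit_type: int
    idr_flag: int
    priority_id: int               # P
    no_inter_layer_pred_flag: int
    dependency_id: int             # D
    quality_id: int                # Q
    temporal_id: int               # T
    use_ref_base_pic_flag: int
    discardable_flag: int
    output_flag: int


def parse_svc_nal_header(data: bytes) -> SvcNalHeader:
    """Parse the 1-byte NAL unit header plus the 3-byte SVC extension."""
    if len(data) < 4:
        raise ValueError("need at least 4 bytes")
    b0 = data[0]
    nal_unit_type = b0 & 0x1F
    if nal_unit_type not in (14, 20):
        raise ValueError("expected NAL unit type 14 or 20")
    ext = int.from_bytes(data[1:4], "big")  # 24 bits, MSB first

    def field(offset: int, width: int) -> int:
        """Read `width` bits starting `offset` bits after the MSB of `ext`."""
        return (ext >> (24 - offset - width)) & ((1 << width) - 1)

    if field(0, 1) != 1:
        raise ValueError("extension flag is 0: not an SVC extension")
    return SvcNalHeader(
        nal_ref_idc=(b0 >> 5) & 0x3,
        nal_unit_type=nal_unit_type,
        idr_flag=field(1, 1),
        priority_id=field(2, 6),
        no_inter_layer_pred_flag=field(8, 1),
        dependency_id=field(9, 3),
        quality_id=field(12, 4),
        temporal_id=field(16, 3),
        use_ref_base_pic_flag=field(19, 1),
        discardable_flag=field(20, 1),
        output_flag=field(21, 1),
        # bits 22-23 are reserved
    )


hdr = parse_svc_nal_header(bytes([0x74, 0x85, 0x12, 0x67]))
print(hdr.priority_id, hdr.temporal_id, hdr.dependency_id, hdr.quality_id)  # 5 3 1 2
```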

Combined Scalability
- Bit-stream switching
  - Inside a dependency layer
    - Switching is possible everywhere
  - Between dependency layers
    - Switching up only at IDR access units
    - Switching down everywhere if multi-loop decoding is used
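The switching rules can be summarized as a small decision function. The case of switching down under single-loop decoding is not covered by the rules above; the sketch assumes it must wait for an IDR access unit, because a single-loop decoder has not reconstructed the lower layer's reference pictures. `can_switch` is a hypothetical helper.

```python
def can_switch(from_d: int, to_d: int, at_idr: bool, multi_loop: bool) -> bool:
    """Whether a decoder may switch from dependency layer from_d to to_d
    at the current access unit."""
    if to_d == from_d:
        return True              # inside a dependency layer: switching everywhere
    if to_d > from_d:
        return at_idr            # switching up: only at IDR access units
    return at_idr or multi_loop  # switching down: everywhere with multi-loop decoding


print(can_switch(0, 1, at_idr=False, multi_loop=True))  # False: up-switch needs IDR
print(can_switch(1, 0, at_idr=False, multi_loop=True))  # True: multi-loop down-switch
```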

Outline
- Introduction
- History of SVC
- Structure of SVC
- Temporal Scalability
- Spatial Scalability
- Quality Scalability
- Combined Scalability
- Profiles of SVC
  - Scalable Baseline
  - Scalable High
  - Scalable High Intra
- Conclusions

Profiles of SVC
- Scalable Baseline
  - For conversational and surveillance applications requiring low decoding complexity
  - Spatial scalability: fixed ratio (1, 1.5, or 2) and MB-aligned cropping
  - Temporal and quality scalability: arbitrary
  - No interlaced coding tools
  - B slices, weighted prediction, CABAC, and the 8x8 luma transform are supported in enhancement layers
  - The base layer conforms to the Baseline profile of H.264/AVC
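A sketch of the Scalable Baseline spatial-ratio restriction between two successive spatial layers. It assumes the same ratio must hold horizontally and vertically and does not check MB-aligned cropping; the helper `baseline_ratio_ok` is hypothetical.

```python
from fractions import Fraction

ALLOWED_RATIOS = {Fraction(1), Fraction(3, 2), Fraction(2)}  # 1, 1.5, or 2


def baseline_ratio_ok(lower: tuple[int, int], upper: tuple[int, int]) -> bool:
    """True if `upper` is 1, 1.5, or 2 times `lower` (width, height),
    with the same ratio in both directions (an assumption of this sketch)."""
    rx = Fraction(upper[0], lower[0])
    ry = Fraction(upper[1], lower[1])
    return rx == ry and rx in ALLOWED_RATIOS


print(baseline_ratio_ok((352, 288), (704, 576)))  # True: ratio 2
print(baseline_ratio_ok((320, 240), (400, 300)))  # False: ratio 1.25
```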

Profiles of SVC
- Scalable High
  - For broadcast, streaming, and storage applications
  - Spatial, temporal, and quality scalability: arbitrary
  - The base layer conforms to the High profile of H.264/AVC
- Scalable High Intra
  - Scalable High with all pictures coded as IDR pictures

Conclusions
- Temporal scalability
  - Hierarchical prediction structure
- Spatial and quality scalability
  - Inter-layer prediction of intra, motion, and residual information
  - Single-loop MC decoding
  - CGS: identical size for base and enhancement layers
  - MGS: CGS + key pictures + quality refinement layers
- Applications
  - Power adaptation: decoding only the needed parts of the video stream
  - Graceful degradation when the "right" parts are lost
  - Format adaptation: backwards-compatible extension in mobile TV
- What's next in SVC?
  - Bit-depth scalability (8-bit 4:2:0 → 10-bit 4:2:0)
  - Color format scalability (4:2:0 → 4:4:4)

References
- H. Schwarz, D. Marpe, and T. Wiegand, "Overview of the Scalable Video Coding Extension of the H.264/AVC Standard," CSVT, 2007.
- T. Wiegand, "Scalable Video Coding," Joint Video Team, doc. JVT-W132, San Jose, USA, April 2007.
- T. Wiegand, "Scalable Video Coding," Digital Image Communication, course at Technical University of Berlin, 2006. Available at http://iphome.hhi.de/wiegand/dic.htm
- H. Schwarz, D. Marpe, and T. Wiegand, "Constrained Inter-Layer Prediction for Single-Loop Decoding in Spatial Scalability," Proc. of ICIP'05.