MPEG-2 to H.264/AVC Transcoding Techniques
Jun Xilient Inc., Cupertino, CA

Digital Video Transcoder
[Diagram: coded digital video bit-stream “A” → Transcoder → coded digital video bit-stream “B”]
• “A” and “B” may differ in many aspects:
  ◦ coding formats, e.g. MPEG-2 to H.264/AVC
  ◦ bit-rate, frame rate, resolution, …
  ◦ features: error resilience features
  ◦ contents, e.g. logo insertion
November 07 Digital Video Transcoding

Applications
• Media storage
  ◦ Transcode broadcast MPEG-2 video to H.264/AVC format: enables long-time recording
  ◦ Effective for multi-channel recording
• Home gateway
  ◦ Provide connection to IPTV set-top box
    ▪ Box only supports H.264/AVC
    ▪ Over wireless network with bandwidth limitation
• Other potential uses
  ◦ Export to mobile
  ◦ Internet streaming
  ◦ ……

Goals and Challenges
• H.264/AVC: latest video compression standard
  ◦ Promises same quality as MPEG-2 at half the bit-rate
  ◦ Is being widely adopted
    ▪ HD consumer storage, e.g., HD-DVD and Blu-ray
    ▪ Mobile devices, e.g., Apple iPod, iPhone, Sony PSP
• Convert MPEG-2 video to H.264/AVC format
  ◦ More efficient storage, export to mobile devices, etc.
• Challenges
  ◦ Yield similar quality as full re-encoding, but with much lower cost
  ◦ Key to lower-cost/high-quality: how to intelligently reuse available information from the incoming bitstream
  ◦ May be loosely considered as a “two-pass coder”
    ▪ Could achieve better quality than full re-encoding given same complexity

Outline
• Intra-only transcoding techniques
  ◦ Efficient compressed domain processing
• Inter transcoding techniques
  ◦ Motion mapping / motion reuse

Intra Transcoding Techniques

Intra Transcoder – Pixel Domain
Legend: VLD: variable length decoding; (I)Q: (inverse) quantization; IDCT: inverse discrete cosine transform; HT: H.264/AVC 4x4 transform
[Block diagram. Input: MPEG-2 bitstream; output: H.264 bitstream. Blocks: VLD/IQ, IDCT, HT, Q, Entropy Coding, Inverse HT, Intra Prediction (pixel-domain), Mode decision, Pixel Buffer]

Compressed Domain Processing?
Legend: VLD: variable length decoding; (I)Q: (inverse) quantization; IDCT: inverse discrete cosine transform; HT: H.264/AVC 4x4 transform
[Block diagram. Input: MPEG-2 bitstream; output: H.264 bitstream. Blocks: VLD/IQ, Q, Entropy Coding, Inverse Q, Intra Prediction (comp-domain), Mode decision, Coeff Buffer]

AVC 4x4 Transform
• Motivation:
  ◦ DCT requires real-number operations, which may cause inaccuracies in inversion
  ◦ Better prediction means less spatial correlation, so there is no strong need for real-number operations
• H.264 uses a simple integer 4x4 transform
  ◦ Approximation to the 4x4 DCT
  ◦ Transform and inverse transform [matrices shown as a figure]
    ▪ Note: the ½ in the inverse transform represents a right shift, so it is non-linear
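
The integer transform on this slide can be sketched in a few lines. The matrix CF below is the forward core transform defined by the H.264/AVC standard, and the 1-D inverse butterfly uses right shifts for the ½ factors. The function names are illustrative assumptions, and the scaling that the standard folds into quantization is left out, so this is a sketch of the core arithmetic rather than a complete codec stage.

```python
# Forward core transform matrix of H.264/AVC (integer approximation of the 4x4 DCT).
CF = [[1, 1, 1, 1],
      [2, 1, -1, -2],
      [1, -1, -1, 1],
      [1, -2, 2, -1]]

def matmul(a, b):
    """Plain integer matrix product of two nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def forward_ht(block):
    """2-D forward core transform: Y = CF * X * CF^T (no scaling applied)."""
    return matmul(matmul(CF, block), transpose(CF))

def inverse_ht_1d(d):
    """1-D inverse butterfly; the 1/2 factors are implemented as right shifts."""
    e0 = d[0] + d[2]
    e1 = d[0] - d[2]
    e2 = (d[1] >> 1) - d[3]
    e3 = d[1] + (d[3] >> 1)
    return [e0 + e3, e1 + e2, e1 - e2, e0 - e3]

# The right shift discards the low bit, so doubling the input does not
# simply double the output: this is the non-linearity noted on the slide.
once = inverse_ht_1d([0, 1, 0, 0])    # [1, 0, 0, -1]
twice = inverse_ht_1d([0, 2, 0, 0])   # [2, 1, -1, -2], not 2 * once
```

Because only additions, subtractions and shifts are involved, encoder and decoder produce bit-identical results, which is the motivation stated above.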

Intra Prediction in H.264/AVC
• Motivation: intra-frames are natural images, so they exhibit strong spatial correlation
• Pixels in intra-coded frames are predicted based on previously-coded ones
  ◦ Prediction can be based on 4x4 blocks or 16x16 macroblocks (or 8x8 blocks for High profile)
• An encoded mode specifies which neighbor pixels should be used to predict, and how

4x4 Intra Prediction Example
• Current block: [figure]
• Prediction blocks: Vertical, Horizontal, Diagonal_Down_Right [figures]
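
As a rough illustration of how a prediction block is built from neighboring pixels, the sketch below forms the vertical, horizontal and DC predictions for one 4x4 block. The function names and the sample neighbor values are assumptions for illustration; the DC rounding follows the usual H.264/AVC rule when both neighbors are available, and Diagonal_Down_Right is omitted for brevity.

```python
def predict_vertical(above):
    """Each column repeats the reconstructed pixel directly above the block."""
    return [list(above) for _ in range(4)]

def predict_horizontal(left):
    """Each row repeats the reconstructed pixel directly to its left."""
    return [[left[r]] * 4 for r in range(4)]

def predict_dc(above, left):
    """All 16 pixels take the rounded mean of the eight neighbors."""
    dc = (sum(above) + sum(left) + 4) >> 3
    return [[dc] * 4 for _ in range(4)]

# Hypothetical neighbor pixels, for illustration only.
above = [10, 20, 30, 40]
left = [1, 2, 3, 4]
vertical = predict_vertical(above)
horizontal = predict_horizontal(left)
dc_block = predict_dc(above, left)
```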

Compressed Domain Processing?
• Challenges
  ◦ Different transforms
    ▪ MPEG-2 uses DCT, floating point
    ▪ H.264/AVC uses an integer transform
  ◦ New prediction modes in H.264/AVC
    ▪ Can prediction be performed in compressed domain?
• Goals
  ◦ Simpler computation and architecture

Compressed Domain Processing?
[Block diagram repeated from the earlier “Compressed Domain Processing?” slide]

Intra Transcoder – Proposed
Legend: VLD: variable length decoding; (I)Q: (inverse) quantization; IDCT: inverse discrete cosine transform; HT: H.264/AVC 4x4 transform
[Block diagram. Input: MPEG-2 bitstream. Blocks: VLD/IQ, DCT-to-HT conversion (S-Transform), Q, Entropy Coding, Inverse HT, Intra Prediction (HT-domain), Mode decision (HT-domain), Pixel Buffer]

Techniques
• DCT-to-HT conversion
• Compressed (HT) domain prediction
  ◦ Very simple for some prediction modes
• Compressed domain distortion calculation in mode decision
• Advantages
  ◦ Lower computational complexity
  ◦ No quality loss

DCT-to-HT Conversion

DCT-to-HT Conversion: Transform Kernel Matrix
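
The kernel matrix itself did not survive extraction, so the following is only a sketch of one way such a kernel can be built, not necessarily the exact matrix on the slide. It assumes an orthonormal 8x8 DCT on the MPEG-2 side and the H.264/AVC core transform CF applied to each of the four 4x4 sub-blocks. Under those assumptions, a single matrix S = H8 · T8ᵀ maps an 8x8 DCT block straight to the four HT blocks as S · X · Sᵀ. The names T8, H8, S, dct_to_ht and pixel_path are illustrative.

```python
import math

CF = [[1, 1, 1, 1], [2, 1, -1, -2], [1, -1, -1, 1], [1, -2, 2, -1]]
N = 8

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def dct8_matrix():
    """Orthonormal 8-point DCT-II matrix (assumed to match the MPEG-2 DCT)."""
    rows = []
    for k in range(N):
        scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
        rows.append([scale * math.cos((2 * n + 1) * k * math.pi / (2 * N))
                     for n in range(N)])
    return rows

def block_diag_ht():
    """8x8 block-diagonal matrix with CF on both diagonal 4x4 blocks."""
    h = [[0] * N for _ in range(N)]
    for base in (0, 4):
        for i in range(4):
            for j in range(4):
                h[base + i][base + j] = CF[i][j]
    return h

T8 = dct8_matrix()
H8 = block_diag_ht()
S = matmul(H8, transpose(T8))  # conversion kernel, computed once offline

def dct_to_ht(dct_block):
    """Transform-domain path: Y = S * X * S^T, no intermediate pixels."""
    return matmul(matmul(S, dct_block), transpose(S))

def pixel_path(dct_block):
    """Reference path: 2-D IDCT to pixels, then HT on each 4x4 sub-block."""
    pixels = matmul(matmul(transpose(T8), dct_block), T8)
    return matmul(matmul(H8, pixels), transpose(H8))
```

Both paths are algebraically identical; the transform-domain path simply avoids reconstructing (and rounding) the intermediate pixel block.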

Fast Algorithm (1-D)

Complexity Analysis
• Transform-domain DCT-to-HT (S-Transform): 704 operations
  ◦ 352 multiplications
  ◦ 352 additions
• Pixel-domain mapping (IDCT* followed by HT): 992 operations
  ◦ 256 multiplications
  ◦ 64 shifts
  ◦ 672 additions
• Advantage
  ◦ 29% saving in total operations
  ◦ Two-stage vs. six-stage implementation
  ◦ Better performance: no intermediate rounding
* W. H. Chen, C. H. Smith, and S. C. Fralick, “A Fast Computational Algorithm for the Discrete Cosine Transform,” IEEE Trans. on Communications, Vol. COM-25, pp. 1004-1009, 1977.
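
The totals and the quoted saving can be checked directly from the per-operation counts given on the slide; the variable names are illustrative.

```python
# Operation counts as listed on the slide.
s_transform = {"multiplications": 352, "additions": 352}
pixel_domain = {"multiplications": 256, "shifts": 64, "additions": 672}

total_s = sum(s_transform.values())        # transform-domain path
total_pixel = sum(pixel_domain.values())   # IDCT followed by HT

# Relative saving of the transform-domain path in total operations.
saving = (total_pixel - total_s) / total_pixel
saving_percent = round(100 * saving)
```

Note that the saving comes from fewer additions and shifts; the transform-domain path actually uses more multiplications (352 vs. 256).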

Intra Transcoder – Proposed
[Block diagram repeated from the earlier “Intra Transcoder – Proposed” slide]

Conventional Mode Decisions
• Given all possible prediction modes, the encoder needs to decide which one to use
• Low-complexity mode decision rule (RDO_Off): SATD Cost [formula shown as a figure]
or
• High-complexity mode decision rule with rate-distortion optimization (RDO_On): RD Cost [formula shown as a figure]
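
The two cost formulas were images and are not recoverable here. As a hedged sketch of what such costs commonly look like: SATD is typically the sum of absolute values of the Hadamard-transformed prediction residual, and the RD cost is typically a Lagrangian of distortion plus rate. The helper names, the omission of any mode-bit term in the SATD cost, and the absence of a normalizing divisor are assumptions, not the slide's definitions.

```python
# 4x4 Hadamard matrix commonly used for SATD.
HAD = [[1, 1, 1, 1],
       [1, 1, -1, -1],
       [1, -1, -1, 1],
       [1, -1, 1, -1]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transpose(m):
    return [list(row) for row in zip(*m)]

def satd_4x4(current, prediction):
    """Sum of absolute Hadamard-transformed differences (unnormalized)."""
    residual = [[current[i][j] - prediction[i][j] for j in range(4)]
                for i in range(4)]
    coeffs = matmul(matmul(HAD, residual), transpose(HAD))
    return sum(abs(v) for row in coeffs for v in row)

def rd_cost(distortion, rate_bits, lagrange_multiplier):
    """Generic Lagrangian cost J = D + lambda * R (assumed form)."""
    return distortion + lagrange_multiplier * rate_bits
```

SATD needs only the residual, whereas the distortion and rate in the RD cost are normally measured after actually coding the block, which is why the second rule is so much more expensive.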

Conventional RD Cost Computation
• The entire encoding/decoding process needs to be performed for every mode

Motivation & Previous Approaches
• RD_Cost-based mode decision gives the best performance, but is very expensive to compute
• Previous efforts in fast intra mode decisions
  ◦ Directional field
  ◦ Edge histogram
  ◦ Other pixel-domain approaches
  ◦ They all lead to lower coding performance
• Our approach is based on transform domain processing: no loss in coding performance

Transform Domain RD Cost Computation
• No inverse transform
• Transformations of some prediction signals are easy to compute
• Distortion calculated in transform domain

HT of DC Prediction
• No HT needs to be performed
• Pdc has only one non-zero element

HT of Horizontal Prediction
• Only one 1-D HT is needed
• Ph has only four non-zero elements (the first column)

HT of Vertical Prediction
• Only one 1-D HT is needed
• Pv has only four non-zero elements (the first row)
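
The sparsity claims for the DC, horizontal and vertical predictions follow from the fact that every row of the core transform except the first sums to zero. The sketch below computes the transformed prediction blocks with the shortcuts and leaves the full 2-D transform available for comparison. The function names are illustrative, and CF is the H.264/AVC forward core transform without scaling.

```python
CF = [[1, 1, 1, 1], [2, 1, -1, -2], [1, -1, -1, 1], [1, -2, 2, -1]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transpose(m):
    return [list(row) for row in zip(*m)]

def forward_ht(block):
    """Full 2-D transform, used here only as the reference."""
    return matmul(matmul(CF, block), transpose(CF))

def ht_of_dc(dc):
    """DC prediction: no transform needed, just one scaled coefficient."""
    out = [[0] * 4 for _ in range(4)]
    out[0][0] = 16 * dc
    return out

def ht_of_horizontal(left):
    """Horizontal prediction: one 1-D transform, result in the first column."""
    col = [4 * sum(CF[k][i] * left[i] for i in range(4)) for k in range(4)]
    return [[col[k], 0, 0, 0] for k in range(4)]

def ht_of_vertical(above):
    """Vertical prediction: one 1-D transform, result in the first row."""
    row = [4 * sum(CF[k][j] * above[j] for j in range(4)) for k in range(4)]
    return [row, [0] * 4, [0] * 4, [0] * 4]
```

For example, `ht_of_horizontal([10, 20, 30, 40])` matches `forward_ht` applied to the block whose rows are 10, 20, 30 and 40 repeated four times each.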

Calculate Distortion in Transform Domain
• Distortion in pixel domain: [equation shown as a figure]
• Distortion in transform domain: [equation shown as a figure]
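
The slide's two equations were not recovered, so the following is a sketch of one standard identity rather than the authors' exact formula. The rows of the H.264/AVC core transform are orthogonal with squared norms 4, 10, 4 and 10, so the pixel-domain sum of squared differences equals a weighted sum of squared coefficient differences. The names WEIGHTS, ssd_pixel and ssd_transform are illustrative, and the identity holds for the linear forward transform only (it ignores quantization and the shift-based inverse).

```python
from fractions import Fraction

CF = [[1, 1, 1, 1], [2, 1, -1, -2], [1, -1, -1, 1], [1, -2, 2, -1]]
WEIGHTS = [4, 10, 4, 10]  # squared norms of the rows of CF

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transpose(m):
    return [list(row) for row in zip(*m)]

def forward_ht(block):
    return matmul(matmul(CF, block), transpose(CF))

def ssd_pixel(a, b):
    """Sum of squared differences computed on pixels."""
    return sum((a[i][j] - b[i][j]) ** 2 for i in range(4) for j in range(4))

def ssd_transform(a_ht, b_ht):
    """Same distortion computed on HT coefficients, with per-position weights."""
    total = Fraction(0)
    for i in range(4):
        for j in range(4):
            diff = a_ht[i][j] - b_ht[i][j]
            total += Fraction(diff * diff, WEIGHTS[i] * WEIGHTS[j])
    return total
```

Exact fractions are used so the two results can be compared for equality; a real implementation would more likely use fixed-point weights.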

Ranking-based Fast Mode Decision
• Two cost functions: SATD_Cost & RD_Cost
• Observation: the best mode according to RD_Cost usually has a small SATD_Cost
• Proposed algorithm (mode reduction): rank the modes using SATD_Cost, then calculate RD_Cost only for the top several modes
  ◦ Algorithm can be conducted in transform domain
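
A minimal sketch of the mode-reduction idea, assuming the two cost functions are supplied as callables: rank all candidate modes by the cheap cost, then evaluate the expensive cost only on a short list. The function name, the mode labels and the cost tables are hypothetical.

```python
def ranked_mode_decision(modes, satd_cost, rd_cost, k=3):
    """Shortlist the k modes with the lowest SATD cost, then pick the one
    with the lowest RD cost among them. rd_cost is called at most k times."""
    shortlist = sorted(modes, key=satd_cost)[:k]
    return min(shortlist, key=rd_cost)

# Hypothetical cost tables, for illustration only.
satd_table = {"vertical": 10, "horizontal": 12, "dc": 11, "diag_down_right": 30}
rd_table = {"vertical": 50, "horizontal": 40, "dc": 45, "diag_down_right": 60}

rd_calls = []

def counted_rd_cost(mode):
    rd_calls.append(mode)  # record how often the expensive cost is evaluated
    return rd_table[mode]

best = ranked_mode_decision(list(satd_table), satd_table.get, counted_rd_cost, k=3)
```

In this example the expensive cost is evaluated three times instead of four; with the nine 4x4 intra modes of H.264/AVC, the reduction would be correspondingly larger.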

Verification Experiment
• Count the percentage of times when the best mode according to RD_Cost is within the best k modes ranked by SATD_Cost
  ◦ k fixed as 3 in all simulations
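
The measurement described here can be sketched as a simple hit-rate count. The data layout (one pair of cost tables per block) and the function name are assumptions, and the sample values are made up purely to exercise the code.

```python
def hit_rate(samples, k=3):
    """Percentage of blocks whose RD-best mode is among the k SATD-best modes.

    samples: list of (satd_costs, rd_costs) pairs, each a dict mode -> cost.
    """
    hits = 0
    for satd_costs, rd_costs in samples:
        shortlist = sorted(satd_costs, key=satd_costs.get)[:k]
        best_by_rd = min(rd_costs, key=rd_costs.get)
        if best_by_rd in shortlist:
            hits += 1
    return 100.0 * hits / len(samples)

# Two hypothetical blocks: the first is a hit, the second a miss.
samples = [
    ({"V": 10, "H": 12, "DC": 11, "DDR": 30}, {"V": 50, "H": 40, "DC": 45, "DDR": 60}),
    ({"V": 10, "H": 12, "DC": 11, "DDR": 30}, {"V": 50, "H": 40, "DC": 45, "DDR": 20}),
]
rate = hit_rate(samples, k=3)
```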

Simulation Conditions
• Three transcoders
  ◦ PDT – reference pixel domain transcoder, with fast IDCT implemented
  ◦ TDT – transform domain transcoder
  ◦ TDT-R – transform domain transcoder with ranking-based mode decision
• Test sequences
  ◦ 100 frames, CIF size, 30 fps
  ◦ Input: MPEG-2 all-I at 6 Mbps

Simulation – “Mobile”

Simulation – “Stefan”

Complexity: Run-time Results

Summary of Intra Transcoding
• Efficient transcoder architecture
• Efficient mode decision
  ◦ Transform domain distortion calculation
  ◦ Ranking-based mode decision
• Achieved virtually the same quality as the reference transcoder with significantly lower complexity

Inter Transcoding Techniques

Transcoder Architecture
[Block diagram. Blocks: MPEG-2 decoder, Motion/mode mapping, Prediction, HT/Q, Entropy coding, Inverse Q / Inverse HT, Deblocking filter, Pixel buffers. Signals: decoded picture and macroblock data; motion and modes]

Assumptions
• Input
  ◦ MPEG-2 frame pictures
• Output
  ◦ H.264/AVC Baseline profile (no B slices) and Main profile
  ◦ Frame pictures, MBAFF not considered
  ◦ Block partition sizes considered for motion compensation: 16x16, 16x8, 8x16 and 8x8

Motion Mapping: Problems

MPEG-2                            | H.264/AVC
Frame/field motion vector         | Frame motion vector
B, P pictures                     | Baseline profile has no B picture support
One motion vector per macroblock  | Motion vectors for different partition sizes: 16x16, 16x8, 8x16, 8x8

Motion Mapping Algorithm
1. Field-to-frame mapping: convert MPEG-2 field motion vectors (if any) to frame vectors
2. Reference picture mapping: for B to P frame type conversion
3. Block size mapping: map the MPEG-2 motion vectors to target H.264/AVC motion vectors of different block sizes
   • Algorithm: distance weighted average (DWA)
4. Motion refinement: (1 + 1/2 + 1/4) around estimated motion vectors for all block partitions
• Note: for B slice output, the above mapping is performed for motion vectors of both directions
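
The slides name the distance weighted average but do not define it, so the following is only one plausible reading, offered as a sketch. It assumes the candidate vectors come from the co-located and neighboring MPEG-2 macroblocks and that each weight is inversely related to the distance between the target partition's center and the source macroblock's center. The function name, the candidate layout and the 1/(d + 1) weighting are all assumptions.

```python
import math

def distance_weighted_average(target_center, candidates):
    """Estimate a motion vector for a target partition.

    target_center: (x, y) center of the H.264/AVC partition, in pixels.
    candidates: list of ((cx, cy), (mvx, mvy)) pairs, one per source
                MPEG-2 macroblock: its center and its motion vector.
    """
    weights = []
    for (cx, cy), _ in candidates:
        distance = math.hypot(cx - target_center[0], cy - target_center[1])
        weights.append(1.0 / (distance + 1.0))  # assumed weighting; +1 avoids 1/0
    total = sum(weights)
    mvx = sum(w * mv[0] for w, (_, mv) in zip(weights, candidates)) / total
    mvy = sum(w * mv[1] for w, (_, mv) in zip(weights, candidates)) / total
    return (mvx, mvy)

# Two hypothetical source macroblocks at equal distance from the target center.
estimate = distance_weighted_average(
    (8, 8), [((0, 8), (2, 0)), ((16, 8), (4, 2))])
```

The estimate would then serve as the starting point for the refinement step listed above.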

Field-to-frame Conversion

Reference Picture Mapping
[Figure: input picture sequence I B B P mapped to output sequence I P P P, with ti = 3 and to = 1; labelled vectors: MVcol, MVi,forw, MVi,back, MVo]
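
The figure labels suggest that an input vector spanning a temporal distance ti is mapped to an output vector spanning to. A common way to do this, assumed here rather than taken from the slide, is linear temporal scaling under a constant-motion assumption. The function name and the signed-distance convention for backward vectors are illustrative.

```python
def scale_motion_vector(mv_in, t_in, t_out):
    """Scale a motion vector from temporal distance t_in to t_out.

    Distances are signed picture counts: positive when the reference lies
    in the past, negative when it lies in the future (backward vector).
    Assumes constant motion over the interval.
    """
    if t_in == 0:
        raise ValueError("input temporal distance must be non-zero")
    return (mv_in[0] * t_out / t_in, mv_in[1] * t_out / t_in)

# A vector spanning three pictures, remapped to the previous picture (ti=3, to=1).
forward_example = scale_motion_vector((6, -3), 3, 1)

# A backward vector (reference two pictures ahead) remapped to the previous picture.
backward_example = scale_motion_vector((4, 2), -2, 1)
```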

Block Size Mapping: 16x8, 8x16

Block Size Mapping: 8x8

Simulation Conditions
• Test sequences:
  ◦ 1920x1080i, 30 fps, 450 frames
• MPEG-2 input:
  ◦ 30 Mbps, (30, 3)
• H.264/AVC output:
  ◦ UVLC, output bit-rate of interest ~10 Mbps
  ◦ Baseline profile (needs to convert B pictures to P slices) & Main profile
• Comparison points
  ◦ Mapping algorithm
  ◦ B slices
  ◦ RD optimization

Baseline output: no B slices

Main Output: with B slices

Complexity: Run-time Results

Conclusions
• Efficient motion mapping schemes that directly map MPEG-2 motion vectors to H.264/AVC motion vectors
• Evaluated the complexity-performance tradeoff of B-slices and RD optimization
• Achieved good rate-distortion performance with low complexity

Thank you