Motion Compensated Prediction and the Role of the

Block-Based Hybrid Motion Compensated Predictive Coding (c) 2008 Michael Horowitz

Motion Estimation (continued) (Xk, Yk) Sample Locations Reference Picture r(i, j) Search Range •

Fractional Sample Motion Estimation (continued)
• Coding efficiency gain H.263, [from Wang 2002]

Multiple Motion Vectors per MB • One motion vector for each sub-block • H.

Multiple Reference Pictures (continued)
• H.263 Annex U [Horowitz 2000]

Multi-Hypothesis for H.263
• Sequences Mobile & Calendar and Foreman
• Results [Flierl

Perceptual Tuning
• Prevent transparent foreground macroblocks
• Blurring of fast moving objects
• Deblocking

Coding Summary • Macroblock-based coding • Two basic macroblock coding modes – Inter-coded MB

1-D Discrete Cosine Transform
• Type II forward DCT [Ahmed et al. 1974]

2-Dimensional DCT
• Forward
• Inverse

Basis Functions for 8 x 8 DCT

Why Choose the DCT?
• Coding efficiency
• Computational complexity
• Perceptual implications

Rate Allocation Problem
• What is smallest D = D* subject to
• Find

Rate Allocation Problem (continued)
• It follows that and which implies

Generalize for k Quantizers
[Figure: source X split into components X1, X2, X3, ..., each with its own quantizer Q]

Generalization (continued)
• 2 quantizers with
• Minimize subject to with respect to R

Generalization (continued)
• It follows that
• Generalize to k quantizers by induction

Optimal Rate and Distortion [Huang & Schultheiss 1963]
• Rate
• Distortion

Question
• Given Gaussian source X & fixed encoder structure (i.e., k

Transform Coding [Kramer & Mathews 1956]
[Figure: block diagram with signals X, X1, X2, ..., Xk and their reconstructions]

Fact 3 • If KLT produces ≥ for , orthogonal produces then & Energy

Energy Compaction of Some Discrete Transforms • 1 x 32 block in natural images

2-D Energy Compaction [from Hedberg & Nilsson 2004]
• KLT DCT
• DFT

Concluding Summary • DCT – Near optimal R-D performance for wide range of sources

Backup slides • Little Things Big Difference • Motion search over picture boundary –

DCT from the DFT [Haralick 1976]
• N-point DCT
• Extend N-point sequence xk

Extend N-point Sequence xk by Reflection
• Example: N, 2N

Compute 2N-point DFT
• Second sum equals (by symmetry of xk)

Compute 2N-point DFT (continued)
• It follows that
• Multiply by & employ

Energy Compaction of Some Discrete Transforms
• Transform coefficient variances for N=16, ρ=0.95

Overlapped Block Motion Compensation in H.263
• Coding efficiency PSNR [dB] –

1-D DFT Energy Compaction Analysis
• Fourier transform of ramp (continuous both domains)

Ramp: First 5 Fourier Terms [ptolemy.eecs.berkeley.edu/eecs20/week8/examples.html]
• Fourier

Triangle: First 5 Fourier Terms [ptolemy.eecs.berkeley.edu/eecs20/week8/examples.html]
• Fourier

Contrast Sensitivity • Allen B. Poirson & Brian A. Wandell, Pattern-color separable pathways predict

Uniform Scalar Quantization • Distortion of ith cell 0 th Cell 0 – Assume

Outline
• Overview of block-based hybrid motion compensated predictive video coding
  – ITU-T standards: H.261, H.263, H.264
  – ISO/IEC standards: MPEG-1, MPEG-2 & MPEG-4
• Survey of motion estimation & compensation
• Discrete cosine transform (DCT)
  – Coding efficiency
  – Computational complexity
  – Perceptual implications
(c) 2008 Michael Horowitz

Block-Based Hybrid Motion Compensated Predictive Coding
• Video picture partitioned into macroblocks
• Macroblock (MB) has three components
  – One luma
    • “Y”, represents “lightness”
    • 16 x 16 luma samples
  – Two chroma
    • “Cb” & “Cr”, represent color
    • 16 x 16, 8 x 16, or 8 x 8 chroma samples

Block-Based Hybrid Motion Compensated Predictive Coding (continued)
  – Human Visual System more sensitive to luma
    • Chroma frequently sub-sampled
    • Sub-sampling examples: 4:4:4, 4:2:2, 4:2:0
      [Figure: Y, Cb and Cr sample grids for each format]
• Two coding modes for macroblocks

Inter-Picture Macroblock Coding
– Estimate motion of blocks from picture to picture
– Search previously coded (reference) pictures
  [Figure: reference picture showing the location of the input MB, the search region, and the motion estimate]
– Encode
  • Location of motion estimate (motion vector)
  • Difference between input MB and motion estimate

Intra-Picture Macroblock Coding
• Input MB coded using intra-picture prediction
  – Prediction derived from spatially adjacent MBs
  – Earlier algorithms offer no intra-picture prediction
• Significantly lower coding efficiency than inter-coded MBs at low data rates
• Useful when motion estimate is poor
• Can be used to stop error propagation

Survey Motion Estimation and Motion Compensation
• Motion models
  – Translational (focus of talk)
    • Location of kth motion compensated block: $(X_k + MV_{x,k},\ Y_k + MV_{y,k})$
      – $(X_k, Y_k)$ is location of kth input block
      – $(MV_{x,k}, MV_{y,k})$ is motion vector (MV) for kth block
  – Affine motion models
    • Rotation
    • Scaling
    • Video standards do not use affine models

Motion Estimation
• Estimate inter-picture block translation
• Luma samples (and sometimes chroma)
• Example
  – Distortion: Sum of Absolute Differences (SAD)
    • Low complexity
    • Commonly used in real-time production encoders
  – Find $(MV_{x,k}, MV_{y,k})$ that minimizes SAD between
    • Input block $s_k(i, j)$
    • Motion compensated prediction in reference picture $r(i, j)$
    • Subject to search range
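The SAD search described on this slide can be sketched as an exhaustive (full) search. This is only an illustration under stated assumptions: the function names, the square search window, and the rule of skipping candidates that fall outside the reference picture are choices made here, not details taken from the talk or from any standard. Pictures are plain lists of rows of luma samples.

```python
def sad(block, ref, top, left):
    """Sum of absolute differences between `block` and the same-sized
    region of `ref` whose top-left sample is at row `top`, column `left`."""
    return sum(
        abs(block[i][j] - ref[top + i][left + j])
        for i in range(len(block))
        for j in range(len(block[0]))
    )


def full_search(block, ref, y0, x0, search_range):
    """Return (mv_x, mv_y, best_sad) minimizing SAD over a square window.

    (x0, y0) is the top-left corner of the input block. Candidates that
    would read outside the reference picture are skipped (an assumption
    of this sketch).
    """
    h, w = len(block), len(block[0])
    best = None
    for mv_y in range(-search_range, search_range + 1):
        for mv_x in range(-search_range, search_range + 1):
            top, left = y0 + mv_y, x0 + mv_x
            if top < 0 or left < 0 or top + h > len(ref) or left + w > len(ref[0]):
                continue
            cost = sad(block, ref, top, left)
            if best is None or cost < best[2]:
                best = (mv_x, mv_y, cost)
    return best


# Toy data: every sample of the 8x8 reference picture has a distinct value.
ref = [[8 * row + col for col in range(8)] for row in range(8)]
# The 2x2 input block is a copy of the reference region at row 3, column 4.
block = [ref[3][4:6], ref[4][4:6]]
# The input block itself sits at row 2, column 2 of the current picture.
result = full_search(block, ref, y0=2, x0=2, search_range=2)
```

Here `result` holds the motion vector and its SAD. The cost of this search grows with the square of the search range, which is one reason a low-complexity distortion measure such as SAD is attractive.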

Fractional Sample Motion Estimation
• Estimate content between samples
• Example: bilinear interpolation
  – $x_1 \le x^* < x_2$ and $y_1 \le y^* < y_2$
  – $f_x = (x^* - x_1)/(x_2 - x_1)$, $f_y = (y^* - y_1)/(y_2 - y_1)$
  – [Figure: $z(x^*, y^*)$ inside the cell with corners $z(x_1, y_1)$, $z(x_2, y_1)$, $z(x_1, y_2)$, $z(x_2, y_2)$]
  – $z(x^*, y^*) = (1 - f_x)(1 - f_y)\,z(x_1, y_1) + f_x(1 - f_y)\,z(x_2, y_1) + f_x f_y\,z(x_2, y_2) + (1 - f_x) f_y\,z(x_1, y_2)$
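The bilinear formula above translates directly into code. The sketch below assumes unit sample spacing (so $x_2 - x_1 = y_2 - y_1 = 1$), non-negative coordinates, and clamping at the right and bottom picture edges. The helper names and the clamping rule are illustrative assumptions, not something the slides specify.

```python
def bilinear(z11, z21, z12, z22, fx, fy):
    """Bilinear blend of the four corner samples.

    z11 = z(x1, y1), z21 = z(x2, y1), z12 = z(x1, y2), z22 = z(x2, y2);
    fx and fy are the fractional offsets in [0, 1).
    """
    return ((1 - fx) * (1 - fy) * z11
            + fx * (1 - fy) * z21
            + fx * fy * z22
            + (1 - fx) * fy * z12)


def sample(picture, x, y):
    """Interpolate picture[row][col] at fractional position (x, y)."""
    x1, y1 = int(x), int(y)                  # floor, valid for x, y >= 0
    x2 = min(x1 + 1, len(picture[0]) - 1)    # clamp at right edge (assumption)
    y2 = min(y1 + 1, len(picture) - 1)       # clamp at bottom edge (assumption)
    fx, fy = x - x1, y - y1
    return bilinear(picture[y1][x1], picture[y1][x2],
                    picture[y2][x1], picture[y2][x2], fx, fy)


picture = [[10, 20],
           [30, 40]]
centre = sample(picture, 0.5, 0.5)     # equal weight on all four samples
half_right = sample(picture, 0.5, 0)   # midway between 10 and 20
```

At a half-sample position between two integer samples the result reduces to their average, which is the 1/2-sample case used by the earlier standards listed on the next slide.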

Fractional Sample Motion Estimation (continued)
– H.261
  • No fractional sample motion estimation
– MPEG-1, MPEG-2 and H.263
  • 1/2-sample, bilinear interpolation
– H.264 | MPEG-4 AVC & SVC
  • Luma
    – 1/2-sample, 6-tap interpolation
    – 1/4-sample, simple average
  • Chroma (1/8-sample, bilinear)

Multiple Reference Pictures [Wiegand, Zhang, & Girod 1997]
• Coding gains
  – Uncovered areas
  – More integer motion vector estimates
[Figure labels: integer sample location; t-3, t-2, t-1, t 0; direction of motion]

Multi-Hypothesis Motion Compensated Prediction [Flierl, Wiegand & Girod 1998]
• Linear combination of multiple predictions
  – One motion vector for each prediction
  – Bi-predicted pictures are special case (2 MVs)
  – Predictions may be forward & backward in time

Overlapped Block Motion Compensation [Orchard & Sullivan 1994]
• Special case of multi-hypothesis coding
• H.263 advanced prediction mode (Annex F)
  – Overlapped block motion compensation
    • 1 coded + 2 “derived” motion vectors
    • Non-uniform spatial weighting of samples
  – 4 motion vectors per macroblock

Rate-Distortion Optimization
• MV resulting in lowest distortion often not optimal
• Goal: Find best tradeoff between distortion and rate
• Strategy [Everett III 1963], [Shoham & Gersho 1988]: minimize the Lagrangian cost $J = D + \lambda R$
  – Total distortion $D = \sum_k D_k$
  – Total bit-rate $R = \sum_k R_k$
  – $D_k$ and $R_k$ are the distortion and rate for block k
  – Minimize $J_k = D_k + \lambda R_k$ for each block k separately, using common $\lambda$
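A minimal sketch of the per-block Lagrangian decision follows. The candidate list, its distortion and rate numbers, and the function name are invented for illustration. In a real encoder the distortion would come from a measure such as SAD and the rate from the bits needed to code the motion vector and residual.

```python
def choose_mv(candidates, lam):
    """Pick the candidate minimizing J = D + lam * R.

    `candidates` is a list of (motion_vector, distortion, rate_bits) tuples.
    """
    return min(candidates, key=lambda c: c[1] + lam * c[2])


# Hypothetical candidates for one block: (MV, distortion, rate in bits).
candidates = [
    ((3, -2), 100, 12),   # lowest distortion, most expensive to code
    ((1, 0), 110, 6),
    ((0, 0), 120, 2),     # cheapest to code, highest distortion
]

best_distortion_only = choose_mv(candidates, lam=0)   # pure distortion minimization
best_balanced = choose_mv(candidates, lam=2)
best_rate_heavy = choose_mv(candidates, lam=4)
```

With `lam=0` the lowest-distortion vector wins. As `lam` grows, cheaper vectors are preferred, which illustrates why the minimum-distortion MV is often not the best choice.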

Coding Efficiency
[Figure: source $X$ split into $X_1$ and $X_2$, quantized by $Q_1$ and $Q_2$ to give $\hat{X}_1$ and $\hat{X}_2$, which form $\hat{X}$]
• Source $X = [X_1, X_2]$
  – $X_i$ is a Gaussian random variable
  – Mean = 0, Variance = $\sigma_i^2$
• Rate of quantizer $Q_i$ is $R_i$ (bits / index)
  – Total rate $R = R_1 + R_2$

Coding Efficiency (continued)
• Distortion
  – Square error
  – High-rate assumption
    • High-rate implies R ≥ 3 bits / sample
    • Often works well for lower rates
    • Asymptotic Quantization Theory [Gray & Neuhoff 1998]
  – Total distortion

Observations and Comments
• #1 Optimal rate for Qi proportional to
• #2 Optimal distortion
• #3 In practice, systems use positive [Segall 1976] integer [Farber & Zeger 2005] Ri
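The classic high-rate bit allocation result can be sketched numerically. The sketch assumes the usual model $D_i = c\,\sigma_i^2\,2^{-2R_i}$, where $c$ is a quantizer-dependent constant (set to 1 here as an assumption). Minimizing $\sum_i D_i$ subject to $\sum_i R_i = R$ then gives $R_i = R/k + \tfrac{1}{2}\log_2\!\big(\sigma_i^2 / (\prod_j \sigma_j^2)^{1/k}\big)$. The function names and example variances are illustrative, and the slide's own notation may differ.

```python
import math


def optimal_rates(variances, total_rate):
    """High-rate optimal allocation: R_i = R/k + 0.5*log2(var_i / geometric mean)."""
    k = len(variances)
    log_gm = sum(math.log2(v) for v in variances) / k   # log2 of geometric mean
    return [total_rate / k + 0.5 * (math.log2(v) - log_gm) for v in variances]


def total_distortion(variances, rates, c=1.0):
    """Sum of D_i = c * var_i * 2^(-2 R_i); c = 1 is an assumption of this sketch."""
    return sum(c * v * 2.0 ** (-2.0 * r) for v, r in zip(variances, rates))


variances = [16.0, 1.0]
rates = optimal_rates(variances, total_rate=8.0)
d_opt = total_distortion(variances, rates)
d_equal = total_distortion(variances, [4.0, 4.0])   # naive equal split
```

In this example the higher-variance component receives more bits, each component ends up with the same distortion, and `d_opt` is lower than the equal-split `d_equal`. The formula can return fractional or even negative rates when variances differ widely, which is why observation #3 notes that practical systems restrict $R_i$ to positive integers.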

Fact 1
• Karhunen-Loeve Transform (KLT) produces smallest D* [Huang et al. 1963]
  – a) Gaussian input random variables
  – b) High-rate quantizers
  – c) Rate of each quantizer is arbitrary real value
  – d) Square error distortion measure

Fact 2
• The autocorrelation matrix of the KLT transform vector is diagonal.
  – KLT coefficients are uncorrelated
  – There is no general theorem stating uncorrelated quantities can be more efficiently quantized than correlated ones
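Fact 2 can be checked by hand for a two-sample source. The sketch assumes a zero-mean source with unit variances and correlation ρ = 0.95, values chosen here for illustration. For that autocorrelation matrix the KLT is the 45-degree rotation whose rows are the eigenvectors $[1, 1]/\sqrt{2}$ and $[1, -1]/\sqrt{2}$. The small matrix helpers are written out so the example needs only the standard library.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


rho = 0.95                                   # assumed correlation
R = [[1.0, rho],
     [rho, 1.0]]                             # autocorrelation of [X1, X2]

s = 1.0 / math.sqrt(2.0)
T = [[s, s],
     [s, -s]]                                # rows are eigenvectors of R (the KLT)

# Autocorrelation of the transformed vector Y = T X is T R T^T.
R_klt = matmul(matmul(T, R), transpose(T))
```

`R_klt` comes out diagonal with entries 1 + ρ and 1 − ρ, so the coefficients are uncorrelated and most of the energy sits in the first one. The total variance is unchanged, but the geometric mean of the variances drops from 1 to $\sqrt{1 - \rho^2}$, which is the quantity that governs distortion under the high-rate model.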

Practical Considerations
• KLT impractical for many systems
  – Computational complexity
    • Transform is signal dependent
    • Compute and apply transform for each input
• Consider Fourier based transforms
  – Fast algorithms exist
  – Examine loss of coding efficiency resulting from loss of energy compaction

Computational Complexity
• Recall DCT may be derived from DFT
  – First N coefficients of 2N-point DFT
  – Requires appropriate input sequence symmetry
  – Requires scaling [Tseng & Miller 1978], where $f_m$ is mth DFT coefficient
• Leverage FFT to compute DCT

Computational Complexity (continued)
• 1-D 8-point DCT from 16-point DFT
  – 13 mults, 29 adds [Arai et al. 1988]
  – 8 final scaling multiplies rolled into quantization
    • Net 5 mults, 29 adds best known
• Fast 2-D DCT (8 x 8)
  – Separable [from Pennebaker & Mitchell 1992]
    • 80 mults, 464 adds best known
  – Non-separable [Feig 1992]
    • 54 mults, 416 adds, 6 shifts

Perceptual Implications
• Contrast sensitivity of HVS
  – See last page of handout [Barlow & Mollon 1982]
• Perceptually tuned quantization tables [Watson]
• Filter coefficients prior to quantization
  – Shape frequency content of source
  – Exploit HVS contrast sensitivity

Concluding Summary
• Motion estimation & compensation
  – Translation-based motion models
  – Fractional sample motion estimation
  – Multiple motion vectors per macroblock
  – Multiple reference pictures
  – Multi-hypothesis motion compensated prediction
  – Overlapped block motion compensation

References
• N. Ahmed, T. Natarajan, and K. R. Rao, “Discrete cosine transform,” IEEE Trans. Comput., vol. C-23, pp. 90–93, Jan. 1974.
• Y. Arai, T. Agui, M. Nakajima, “A Fast DCT-SQ Scheme for Images”, Trans. of the IEICE, E71(11): 1095 (Nov. 1988).
• E. Feig, S. T. Winograd, “Fast Algorithms for Discrete Cosine Transform”, IEEE Trans. Signal Proc., 40, 2174-2193 (1992).
• H. B. Barlow and J. D. Mollon, The Senses. Cambridge: Cambridge University Press, 1982.
• G. Bjontegaard, “Objective simulation results”, Document VCEG-M34, Video Coding Experts Group (VCEG), Thirteenth Meeting: Austin, Texas, USA, 2-4 April, 2001.
• H. Everett III, “Generalized Lagrangian Multiplier Method for Solving Problems of Optimum Allocation of Resources,” Operations Research, vol. 11, pp. 399-417, 1963.
• B. Farber and K. Zeger, “Quantization of Multiple Sources Using Integer Bit Allocation”, Data Compression Conference (DCC), Salt Lake City, Utah, March 2005 (to appear).

References (continued)
• M. Flierl, T. Wiegand, B. Girod, “Locally Optimal Design Algorithm for Block-Based Multi-Hypothesis Motion-Compensated Prediction,” Proc. of the IEEE Data Compression Conference (DCC'98), pp. 239-248, Snowbird, USA, Apr. 1998.
• A. Gersho and R. M. Gray, Vector Quantization and Signal Compression, Kluwer Academic Publishers, Boston, 1992.
• B. Girod, Lecture for EE 368b, Video and Image Compression, Stanford University.
• R. M. Gray and D. L. Neuhoff, “Quantization,” IEEE Transactions on Information Theory, vol. 44, pp. 2325-2384, Oct. 1998.
• R. M. Haralick, “A Storage Efficient Way to Implement the Discrete Cosine Transform”, IEEE Transactions on Computers, 25 (6) (1976) 764–765.
• H. Hedberg and P. Nilsson, “A Survey of Various Discrete Transforms used in Digital Image Compression Algorithms,” Proceedings of the Swedish System-On-Chip Conference 2004, Bastad, Sweden, April 13-14, 2004.

References (continued)
• M. J. Horowitz, “Demonstration of H.263++ Annex U Performance”, Document Q15-J11, Tenth Meeting (Meeting J) of the ITU-T Q.15/16, Advanced Video Coding Experts Group, Osaka, Japan, 16-18 May, 2000.
• J.-Y. Huang and P. M. Schultheiss, “Block quantization of correlated Gaussian random variables,” IEEE Trans. Comm., vol. 11, pp. 289–296, September 1963.
• F. Kossentini, Y. Lee, M. Smith and R. Ward, “Predictive RD Optimized Motion Estimation for Very Low Bit-Rate Video Coding”, Special Issue of the IEEE Journal on Selected Areas in Communications, 15(9), pages 1752-1763, December 1997.
• H. P. Kramer and M. V. Mathews, “A linear coding for transmitting a set of correlated signals,” IRE Trans. Inform. Theory, vol. 23, no. 3, pp. 41-46, Sept. 1956.
• M. T. Orchard and G. J. Sullivan, “Overlapped block motion compensation: An estimation-theoretic approach,” IEEE Trans. Image Processing, vol. 3, no. 9, pp. 693-699, Sept. 1994.
• W. B. Pennebaker, J. L. Mitchell, JPEG, p. 53, Kluwer Academic Publishers, Norwell, MA, USA, 1992.

References (continued)
• A. Segall, “Bit allocation and encoding for vector sources,” IEEE Trans. Inform. Theory, IT-22 (March 1976) 162-169.
• Y. Shoham and A. Gersho, “Efficient Bit Allocation for an Arbitrary Set of Quantizers,” IEEE Trans. on Acoust., Speech, Signal Processing, vol. 36, no. 9, pp. 1445-1453, September 1988.
• B. D. Tseng and W. C. Miller, “On Computing the Discrete Cosine Transform”, IEEE Transactions on Computers, 27 (10), (1978) 966–968.
• Y. Wang, “Video Coding Standards”, lecture slides based on text Video Processing and Communications, Prentice Hall, 2002.
• A. B. Watson, “DCT quantization matrices visually optimized for individual images,” Proc. SPIE, 1913: 202-16, 1993.
• T. Wiegand, X. Zhang, and B. Girod, “Block-Based Hybrid Video Coding Using Motion-Compensated Long-Term Memory Prediction,” in Proc. of the Picture Coding Symposium, Berlin, Germany, pp. 153-158, Sept. 1997.

Compute 2N-point DFT (continued)
• Recognizing the DCT
• Note is even and real: $f_m = \mathrm{Re}\{f_m\}$ (i.e. $\mathrm{Im}\{f_m\} = 0$)
• It follows that [Tseng & Miller]
• First N coeffs of 2N-point DFT give the N-point DCT
  – with appropriate scaling and $x_k$ symmetry
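The relationship can be verified numerically. The sketch below assumes the unnormalized type II DCT, $C_m = \sum_{k=0}^{N-1} x_k \cos\!\big(\pi(2k+1)m/(2N)\big)$, and extends $x_k$ by reflection to a 2N-point sequence. Under that convention the first N DFT coefficients $f_m$ satisfy $C_m = \tfrac{1}{2}\,e^{-j\pi m/(2N)} f_m$, and the right-hand side is real. The scaling on the slides may follow a different normalization, and a naive $O(N^2)$ DFT stands in for an FFT to keep the example self-contained.

```python
import cmath
import math


def dct2_direct(x):
    """Unnormalized DCT-II computed straight from its definition."""
    n = len(x)
    return [sum(x[k] * math.cos(math.pi * (2 * k + 1) * m / (2 * n))
                for k in range(n))
            for m in range(n)]


def dft(y):
    """Naive DFT (an FFT would be used in practice)."""
    n = len(y)
    return [sum(y[k] * cmath.exp(-2j * math.pi * k * m / n) for k in range(n))
            for m in range(n)]


def dct2_via_dft(x):
    """DCT-II from the first N coefficients of a 2N-point DFT."""
    n = len(x)
    y = list(x) + list(reversed(x))          # extend by reflection to 2N points
    f = dft(y)
    # The phase factor removes the half-sample offset of the symmetry point.
    scaled = [0.5 * cmath.exp(-1j * math.pi * m / (2 * n)) * f[m] for m in range(n)]
    return [c.real for c in scaled], [c.imag for c in scaled]


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]   # 8-point ramp
direct = dct2_direct(x)
via_dft, residual_imag = dct2_via_dft(x)
```

`via_dft` matches `direct` to within rounding error and `residual_imag` is numerically zero, consistent with the slide's remark that the scaled coefficients are real.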

KLT Computational Complexity
• Transform is signal dependent
• Construct transform
  – Compute correlation matrix for input vector
  – Find eigenvectors of correlation matrix
• Apply transform

Practical Matters
• 16-bit math for 4 x 4 in H.264: complexity reduction on certain platforms
• 4 x 4 and 8 x 8 transforms in H.264
  – Exact inverses
• Non-exact specification for inverse DCT
  – How is it done?
  – Implications

Better Energy Compaction
• DFT energy compaction not very good
• Better energy compacting Fourier based transforms exist
• Consider DFT of extended sequence
  – Extend input to force even symmetry
  – Leads to DCT

Extended Ramp (Triangle)
• 2N-point extended ramp
  [Figure: amplitude vs. time]
• Sample Fourier Domain Fourier Series
  – No discontinuities at boundary (symmetrical extension)
  – Expect better energy compaction

Compaction Comparison Summary
• DFT coefficient amplitude decay
  – Ramp
  – Extended ramp
• Suggests DCT will compact well
• Fourier Series → DFT
  – Sampling in time ⇒ repetition in frequency
  – “series-based” observations valid for DFT