Performance Enhancement of Video Compression Algorithms using SIMD

Shamik Valia, Saket Jamkar

Motivation
  • Understand the SSE architecture.
  • Understand the video compression algorithm and identify its bottlenecks.
  • Improve the performance of the video compression algorithm using the SSE platform.

Components of Video Compression Algorithm
  • Motion Estimation
  • Motion Compensation and Image Subtraction
  • Discrete Cosine Transform
  • Quantization
  • Run Length Encoding
  • Huffman Coding

Bottleneck
  • Motion Estimation
    • The process of calculating motion vectors by searching for image blocks from a reference image in a new target image
  • DCT
    • Technique to change from the spatial (pixel) domain to the spatial frequency domain
    • Highest energy compaction after the KLT
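The block search described above can be sketched in plain C. This is a minimal illustration rather than the authors' code: the names `MotionVector`, `block_sad`, and `full_search`, the use of the sum of absolute differences (SAD) as the matching cost, and the shared row stride are all assumptions made for the example.

```c
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical result type: displacement of the best match and its SAD cost. */
typedef struct { int dx, dy; unsigned cost; } MotionVector;

/* Sum of absolute differences between two n x n blocks.
   Both blocks live in images with the same row stride (bytes per row). */
static unsigned block_sad(const uint8_t *a, const uint8_t *b, int stride, int n)
{
    unsigned sad = 0;
    for (int y = 0; y < n; y++)
        for (int x = 0; x < n; x++)
            sad += (unsigned)abs(a[y * stride + x] - b[y * stride + x]);
    return sad;
}

/* Exhaustive search: take the n x n block at (bx, by) in the reference image
   and try every displacement within +/- range in the target image. */
static MotionVector full_search(const uint8_t *ref, const uint8_t *tgt,
                                int width, int height,
                                int bx, int by, int n, int range)
{
    MotionVector best = { 0, 0, UINT_MAX };
    const uint8_t *block = ref + by * width + bx;

    for (int dy = -range; dy <= range; dy++) {
        for (int dx = -range; dx <= range; dx++) {
            int x = bx + dx, y = by + dy;
            /* Skip candidates that fall partly outside the target image. */
            if (x < 0 || y < 0 || x + n > width || y + n > height)
                continue;
            unsigned cost = block_sad(block, tgt + y * width + x, width, n);
            if (cost < best.cost) {
                best.dx = dx;
                best.dy = dy;
                best.cost = cost;
            }
        }
    }
    return best;
}
```

With a search range of ±7 this evaluates up to 225 candidate positions per block, each costing n × n absolute differences, which is why the inner SAD loop dominates the run time.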

SSE2 Specifics
  • Intel C/C++ Compiler 8
    • 3 coding styles
      • Intrinsics
      • Assembly
      • Vector Ops
  • Use of Intrinsics
    • _mm_sad_epu8 for the __m128i datatype
    • _m_psadbw for the __m64 datatype
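As an illustration of the `_mm_sad_epu8` intrinsic listed above, the following sketch computes the SAD of two 16 x 16 blocks one row at a time. The function names, the unaligned loads, and the scalar fallback for builds without SSE2 are assumptions of this example, not details from the slides.

```c
#include <stdint.h>
#include <stdlib.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2: __m128i, _mm_sad_epu8 */
#endif

/* Scalar reference: SAD of two 16 x 16 blocks; stride is bytes per image row. */
static unsigned sad16x16_scalar(const uint8_t *a, const uint8_t *b, int stride)
{
    unsigned sad = 0;
    for (int y = 0; y < 16; y++)
        for (int x = 0; x < 16; x++)
            sad += (unsigned)abs(a[y * stride + x] - b[y * stride + x]);
    return sad;
}

/* Same result using _mm_sad_epu8 when SSE2 is available. */
static unsigned sad16x16(const uint8_t *a, const uint8_t *b, int stride)
{
#if defined(__SSE2__)
    __m128i acc = _mm_setzero_si128();
    for (int y = 0; y < 16; y++) {
        /* Load one 16-pixel row from each block (no alignment assumed). */
        __m128i ra = _mm_loadu_si128((const __m128i *)(a + y * stride));
        __m128i rb = _mm_loadu_si128((const __m128i *)(b + y * stride));
        /* _mm_sad_epu8 yields two partial sums per row: bytes 0..7 in the
           low 64 bits and bytes 8..15 in the high 64 bits. */
        acc = _mm_add_epi64(acc, _mm_sad_epu8(ra, rb));
    }
    /* Add the low and high partial sums. */
    return (unsigned)_mm_cvtsi128_si32(acc) +
           (unsigned)_mm_cvtsi128_si32(_mm_srli_si128(acc, 8));
#else
    return sad16x16_scalar(a, b, stride);
#endif
}
```

One intrinsic call replaces 16 subtract, absolute-value, and add steps of the scalar loop, so a 16 x 16 block needs 16 SAD instructions instead of 256 scalar iterations.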

SSE2 Platform for Motion Estimation
  • Algorithms: Full Search 16 x 16, Full Search 8 x 8, Three Step 16 x 16, Three Step 8 x 8
  • Columns: Without SSE, With SSE
  • Timings as listed: 3 secs, 1 sec, 23 secs, 6 secs, 4 secs, 12 secs, 3 secs
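The slides do not say which three-step variant was timed. For orientation, here is a sketch of the textbook three-step search with step sizes 4, 2, and 1 (a ±7 pixel range); all names are hypothetical and the SAD cost is an assumption of the example.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical result type: displacement of the best match and its SAD cost. */
typedef struct { int dx, dy; unsigned cost; } MotionVector;

/* Sum of absolute differences between two n x n blocks with a shared stride. */
static unsigned block_sad(const uint8_t *a, const uint8_t *b, int stride, int n)
{
    unsigned sad = 0;
    for (int y = 0; y < n; y++)
        for (int x = 0; x < n; x++)
            sad += (unsigned)abs(a[y * stride + x] - b[y * stride + x]);
    return sad;
}

/* Three-step search for the n x n block at (bx, by) of the reference image.
   The block itself is assumed to lie fully inside both images. */
static MotionVector three_step_search(const uint8_t *ref, const uint8_t *tgt,
                                      int width, int height,
                                      int bx, int by, int n)
{
    const uint8_t *block = ref + by * width + bx;
    /* Start from zero displacement. */
    MotionVector best = { 0, 0, block_sad(block, tgt + by * width + bx, width, n) };

    for (int step = 4; step >= 1; step /= 2) {
        MotionVector centre = best;  /* probe the 8 neighbours around this point */
        for (int sy = -step; sy <= step; sy += step) {
            for (int sx = -step; sx <= step; sx += step) {
                int dx = centre.dx + sx, dy = centre.dy + sy;
                int x = bx + dx, y = by + dy;
                if (sx == 0 && sy == 0)
                    continue;  /* centre was already evaluated */
                if (x < 0 || y < 0 || x + n > width || y + n > height)
                    continue;  /* candidate leaves the image */
                unsigned cost = block_sad(block, tgt + y * width + x, width, n);
                if (cost < best.cost) {
                    best.dx = dx;
                    best.dy = dy;
                    best.cost = cost;
                }
            }
        }
    }
    return best;
}
```

This form evaluates at most 25 positions per block (1 + 3 × 8) where a ±7 full search evaluates 225, at the price of possibly settling on a local rather than the global minimum.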

Original Frame from Video

Part of Frames 4 and 5

Motion Compensated Frames: 16 x 16 and 8 x 8

Discrete Cosine Transform
  • The 2-D DCT is used extensively in the JPEG compression algorithm.
  • Highly computationally intensive.
  • Focus
    • Explore DCT implementation on SSE2.
    • Identify a DCT algorithm that scales with the SIMD architecture.
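For reference, the 8 x 8 forward 2-D DCT in the form used by JPEG can be written as below. The symbols f (input samples), F (coefficients), and C (normalization factor) are notation introduced here, not taken from the slides.

```latex
F(u,v) = \frac{1}{4}\, C(u)\, C(v)
         \sum_{x=0}^{7} \sum_{y=0}^{7} f(x,y)\,
         \cos\frac{(2x+1)u\pi}{16}\,
         \cos\frac{(2y+1)v\pi}{16},
\qquad
C(k) = \begin{cases} 1/\sqrt{2}, & k = 0 \\ 1, & \text{otherwise} \end{cases}
```

Evaluated directly, each of the 64 coefficients needs 64 multiply-accumulates, or 4096 per block, which is one way to see why the transform is computationally intensive.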

DCT Hardware Accelerator
  • Distributed Arithmetic (DA)
    • Choice of DA implementation of DCT
      • Scalable with the SSE platform.
  • 2-D 8 x 8 DCT operations can be performed as
    • Preprocessing
    • 1-D DCT (using DA)
    • Transpose
    • 1-D DCT (using DA)
    • Post Processing
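The 1-D DCT, transpose, 1-D DCT sequence above can be sketched in software as follows. This is a floating-point reference under stated assumptions, not the DA hardware: the 1-D transform is the direct form with a small cosine table, the function names are hypothetical, and the slide's preprocessing and post-processing steps are not modelled because the slides do not define them.

```c
#define N 8

/* cos(m * pi / 16) for any non-negative integer m, from a 9-entry table. */
static double cos_pi16(int m)
{
    static const double tab[9] = {
        1.0, 0.98078528040323044, 0.92387953251128676,
        0.83146961230254524, 0.70710678118654752, 0.55557023301960222,
        0.38268343236508977, 0.19509032201612827, 0.0
    };
    m %= 32;                      /* period is 2*pi = 32 table steps */
    if (m > 16) m = 32 - m;       /* cos(2*pi - t) = cos(t) */
    return m > 8 ? -tab[16 - m] : tab[m];  /* cos(pi - t) = -cos(t) */
}

/* Direct 1-D DCT-II of 8 samples with orthonormal scaling. */
static void dct8(const double in[N], double out[N])
{
    for (int u = 0; u < N; u++) {
        double sum = 0.0;
        for (int x = 0; x < N; x++)
            sum += in[x] * cos_pi16((2 * x + 1) * u);
        /* sqrt(1/8) for the DC term, sqrt(2/8) = 0.5 otherwise */
        out[u] = sum * (u == 0 ? 0.35355339059327376 : 0.5);
    }
}

/* In-place transpose of an 8 x 8 block. */
static void transpose8(double m[N][N])
{
    for (int r = 0; r < N; r++)
        for (int c = r + 1; c < N; c++) {
            double t = m[r][c];
            m[r][c] = m[c][r];
            m[c][r] = t;
        }
}

/* 2-D DCT as two row passes with a transpose after each pass,
   so the result has the same orientation as the input. */
static void dct8x8(double block[N][N])
{
    double tmp[N];
    for (int pass = 0; pass < 2; pass++) {
        for (int r = 0; r < N; r++) {
            dct8(block[r], tmp);
            for (int c = 0; c < N; c++)
                block[r][c] = tmp[c];
        }
        transpose8(block);
    }
}
```

The row-column form runs sixteen 1-D transforms per block. Working on rows in both passes is what makes the transpose step necessary, and it keeps every 1-D transform reading contiguous data, which suits SIMD registers.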

1-D DCT on SSE2 using DA
  [Block diagram: the sums x0+x7, x1+x6, x2+x5, x3+x4 and the differences x0-x7, x1-x6, x2-x5, x3-x4 feed ROM-based DAP units that produce X0 through X7; diagram labels include 4, ROM, 16, an adder, a register R, and the scale factors 0.5 and 0.25]
  • Total of 8 DAP structures.
  • Each DAP completes its operations in 8 cycles.
  • Scalable on various datapaths: 16, 32, 64, 128.
  • DAP subword dest, source
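The idea behind a distributed-arithmetic unit can be sketched in C: a fixed-coefficient dot product is replaced by ROM lookups addressed by one bit from each input, followed by shift-and-add. Everything named here (`DaUnit`, `da_init`, `da_dot`, `dct8_da`, the Q13 coefficient scaling, processing one bit per iteration) is an assumption for illustration; it does not model the slide's DAP timing of 8 cycles or its exact datapath.

```c
#include <stdint.h>

#define DA_INPUTS 4   /* four butterfly outputs feed each unit */
#define DA_BITS   16  /* input word length, two's complement */

/* Hypothetical DA unit: a 16-entry ROM holding every partial sum of 4 coefficients. */
typedef struct { int32_t rom[1 << DA_INPUTS]; } DaUnit;

static void da_init(DaUnit *u, const int32_t coeff[DA_INPUTS])
{
    for (unsigned addr = 0; addr < (1u << DA_INPUTS); addr++) {
        int32_t sum = 0;
        for (int k = 0; k < DA_INPUTS; k++)
            if (addr & (1u << k))
                sum += coeff[k];
        u->rom[addr] = sum;
    }
}

/* Bit-serial dot product: sum over k of coeff[k] * x[k], with no multiplier
   applied to the data (only ROM lookups weighted by powers of two). */
static int64_t da_dot(const DaUnit *u, const int16_t x[DA_INPUTS])
{
    int64_t acc = 0;
    for (int bit = 0; bit < DA_BITS; bit++) {
        unsigned addr = 0;
        for (int k = 0; k < DA_INPUTS; k++)
            addr |= (((uint16_t)x[k] >> bit) & 1u) << k;
        int64_t term = (int64_t)u->rom[addr] * ((int64_t)1 << bit);
        /* The sign bit of a two's complement word carries negative weight. */
        if (bit == DA_BITS - 1) acc -= term; else acc += term;
    }
    return acc;
}

/* round(8192 * cos(m * pi / 16)) with exact cosine symmetries. */
static int32_t cos_q13(int m)
{
    static const int32_t tab[9] = { 8192, 8035, 7568, 6811, 5793, 4551, 3135, 1598, 0 };
    m %= 32;
    if (m > 16) m = 32 - m;
    return m > 8 ? -tab[16 - m] : tab[m];
}

/* Unnormalized 8-point DCT: butterflies, then one DA unit per output.
   Even outputs use the sums, odd outputs use the differences.
   Inputs must be small enough that sums and differences fit in 16 bits. */
static void dct8_da(const int16_t x[8], int64_t X[8])
{
    int16_t s[4], d[4];
    for (int i = 0; i < 4; i++) {
        s[i] = (int16_t)(x[i] + x[7 - i]);
        d[i] = (int16_t)(x[i] - x[7 - i]);
    }
    for (int u = 0; u < 8; u++) {
        int32_t coeff[DA_INPUTS];
        DaUnit unit;
        for (int i = 0; i < 4; i++)
            coeff[i] = cos_q13((2 * i + 1) * u);
        da_init(&unit, coeff);  /* in hardware the ROM contents would be fixed */
        X[u] = da_dot(&unit, (u % 2 == 0) ? s : d);
    }
}
```

The butterfly stage halves the number of inputs per output from 8 to 4, which shrinks each ROM from 256 to 16 entries. The outputs are scaled by 8192 and still lack the DCT normalization factors, which a later scaling stage would apply.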

Work Done
  • Accomplished
    • Motion Estimation coding and analysis
    • DCT hardware accelerator in Verilog
    • ISA extension for DCT implementation
  • To be done
    • Synthesis to get delay and area estimates
    • Assembly code with SSE-DCT enhancements and its performance analysis

Questions
