Case Study: ARM Platform-based JPEG Codec HW/SW Co-design
S&IP Consortium Course Material
Outline
- Introduction to JPEG Codec
- Review ─ Software (Concept)
- Review ─ Hardware (Wrapper)
- Lab ─ Case Study
- Lab ─ Step by Step
- Reference
ISO/IEC 10918-1 JPEG
- JPEG: Joint Photographic Experts Group
- JPEG was adopted as an international standard in 1994
- The JPEG standard defines four compression methods:
  - Baseline sequential DCT-based coding
  - Progressive DCT-based coding
  - Lossless coding
    - Sampling and quantization are not used in the lossless coding scheme
  - Hierarchical coding
Compression Method
[Figure: baseline sequential vs. progressive DCT-based coding]
Block Diagram of JPEG Encoder
[Figure: encoder pipeline from RGB input through YCbCr conversion to the output bitstream 01001011101…]
- DPCM: Differential Pulse Code Modulation
- RLC: Run-Length Code
Color Model in Video ─ YCbCr
- Y: luminance
- Cb, Cr: chrominance
- The YCbCr color model is used in JPEG and MPEG
Color Model in Video ─ YCbCr
- CCIR-601 transform formula
- The color space transform is lossless: it is an invertible linear mapping between RGB and YCbCr
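The transform formula itself did not survive extraction. As a rough illustration, the sketch below uses the CCIR-601 coefficients in the full-range form commonly paired with JPEG (JFIF); whether the slide showed exactly these constants is an assumption, and the function names are illustrative.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one RGB pixel (0..255) to YCbCr.

    Coefficients are the CCIR-601 luma weights in the JFIF full-range
    form; this choice is an assumption, not taken from the slide.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 128
    cr = 0.5 * r - 0.418688 * g - 0.081312 * b + 128
    return y, cb, cr


def ycbcr_to_rgb(y, cb, cr):
    """Inverse mapping; exact up to the rounding of the constants."""
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return r, g, b
```

In floating point the round trip reproduces the input to within a small tolerance. Once the components are rounded to 8-bit integers, small errors can appear.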
Chroma Sub-sampling
- 4:1:1 and 4:2:0 are the formats most commonly used in JPEG and MPEG
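A minimal sketch of 4:2:0 sub-sampling, assuming each 2×2 group of chroma samples is replaced by its average. Averaging is one common choice and is an assumption here; the function name is illustrative.

```python
def subsample_420(plane):
    """Halve a chroma plane in both directions by averaging 2x2 groups.

    `plane` is a list of rows with even width and height.
    """
    height, width = len(plane), len(plane[0])
    return [
        [
            (plane[y][x] + plane[y][x + 1]
             + plane[y + 1][x] + plane[y + 1][x + 1]) / 4
            for x in range(0, width, 2)
        ]
        for y in range(0, height, 2)
    ]
```

The luminance plane is left at full resolution, so each chroma plane carries a quarter as many samples as Y.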
2-D DCT (Discrete Cosine Transform)
[Figure: a block in the space domain and its transform in the frequency domain]
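For reference, the 8×8 forward DCT used in baseline JPEG can be written as below, where f(x, y) is a level-shifted sample and F(u, v) a frequency coefficient. The symbol names are chosen here for illustration and are not taken from the slide.

```latex
F(u,v) = \frac{1}{4}\, C(u)\, C(v) \sum_{x=0}^{7} \sum_{y=0}^{7} f(x,y)\,
\cos\frac{(2x+1)u\pi}{16}\, \cos\frac{(2y+1)v\pi}{16},
\qquad
C(k) = \begin{cases} 1/\sqrt{2}, & k = 0 \\ 1, & k > 0 \end{cases}
```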
Basis Images of 2-D DCT
[Figure: basis images, with horizontal frequency running from low to high along one axis and vertical frequency along the other]
Frequency Distribution of 2-D DCT
[Figure: DCT coefficients grouped by frequency and by direction]
8-point 1-D DCT Algorithm (1/2)
- Better suited to VLSI implementation
8-point 1-D DCT Algorithm (2/2)
Implementation of 2-D DCT
- Separable, row-column decomposition
- [Figure: X → 1-D DCT unit (Y = AX) → transpose memory (Y) → 1-D DCT unit (Z = YA^T) → Z]
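The row-column decomposition can be sketched in software as two matrix products, Y = AX followed by Z = YA^T. This is only an illustration of the arithmetic, assuming an orthonormal DCT-II matrix A; it is not the fast algorithm or the hardware datapath from the course, and the helper names are illustrative.

```python
import math


def dct_matrix(n=8):
    """Orthonormal DCT-II matrix A: the 1-D DCT of a column vector x is A x."""
    rows = []
    for u in range(n):
        scale = math.sqrt(1 / n) if u == 0 else math.sqrt(2 / n)
        rows.append([scale * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                     for x in range(n)])
    return rows


def matmul(p, q):
    """Plain matrix product of two lists of rows."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))]
            for i in range(len(p))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def dct_2d(x):
    """2-D DCT by row-column decomposition."""
    a = dct_matrix(len(x))
    y = matmul(a, x)                # first 1-D pass:  Y = A X
    return matmul(y, transpose(a))  # second 1-D pass: Z = Y A^T
```

For a constant 8×8 block, only the DC coefficient Z[0][0] is nonzero, which makes a convenient sanity check.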
Quantization Table for Luminance
16  11  10  16   24   40   51   61
12  12  14  19   26   58   60   55
14  13  16  24   40   57   69   56
14  17  22  29   51   87   80   62
18  22  37  56   68  109  103   77
24  35  55  64   81  104  113   92
49  64  78  87  103  121  120  101
72  92  95  98  112  100  103   99
Quantization Table for Chrominance
17  18  24  47  99  99  99  99
18  21  26  66  99  99  99  99
24  26  56  99  99  99  99  99
47  66  99  99  99  99  99  99
99  99  99  99  99  99  99  99
99  99  99  99  99  99  99  99
99  99  99  99  99  99  99  99
99  99  99  99  99  99  99  99
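How a quantization table is applied can be sketched as an element-wise divide and round, with dequantization as the matching multiply. The example uses only the first row of the luminance table; the function names and sample coefficients are illustrative.

```python
import math

# First row of the luminance quantization table.
LUMA_Q_ROW0 = [16, 11, 10, 16, 24, 40, 51, 61]


def round_half_away(x):
    """Round to the nearest integer, with halves rounded away from zero."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def quantize(coeffs, table):
    """Divide each DCT coefficient by its table entry and round."""
    return [[round_half_away(c / q) for c, q in zip(c_row, q_row)]
            for c_row, q_row in zip(coeffs, table)]


def dequantize(levels, table):
    """Multiply each quantized level by its table entry (decoder side)."""
    return [[lv * q for lv, q in zip(l_row, q_row)]
            for l_row, q_row in zip(levels, table)]
```

Quantization is where the information is discarded: a coefficient of 100 with step 16 becomes level 6 and is reconstructed as 96.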
Predictive Coding of DC Coefficients
- Differential Pulse Code Modulation (DPCM)
- Storing the difference between successive DC values is more compact than storing each exact value, since neighboring blocks usually have similar DC levels
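A minimal sketch of DPCM on a sequence of DC coefficients, assuming the predictor starts at 0 for the first block; the function names and sample values are illustrative.

```python
def dpcm_encode(dc_values):
    """Replace each DC value by its difference from the previous one."""
    diffs, prev = [], 0
    for dc in dc_values:
        diffs.append(dc - prev)
        prev = dc
    return diffs


def dpcm_decode(diffs):
    """Rebuild the DC values by accumulating the differences."""
    values, prev = [], 0
    for d in diffs:
        prev += d
        values.append(prev)
    return values
```

For DC values 61, 63, 60, 60 the encoder emits 61, 2, -3, 0, so most symbols fall into small categories.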
Zig-zag Scan (AC Coefficients)
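The zig-zag order can be generated by walking the anti-diagonals of the block and alternating direction. The sketch below is one way to do this; the function names are illustrative.

```python
def zigzag_order(n=8):
    """Return (row, col) index pairs of an n x n block in zig-zag order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col selects one anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            # Even diagonals are walked from bottom-left to top-right.
            rows = reversed(rows)
        order.extend((r, s - r) for r in rows)
    return order


def zigzag_scan(block):
    """Flatten a square block into a list following the zig-zag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]
```

The first entry is the DC coefficient; the remaining 63 entries are the AC coefficients, ordered roughly from low to high frequency.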
Run-Length Coding (RLC)
- After the DC term, the AC coefficients are coded as (R, L) pairs, where R is the run of zeros preceding the nonzero value L
- (R, L) => (0, -3)(0, -2)(0, -1)(2, -1)(EOB)
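The pairing can be sketched as below. It assumes the baseline JPEG convention that a run longer than 15 zeros is split with a (15, 0) symbol and that trailing zeros collapse into a single EOB marker; the function name is illustrative.

```python
def run_length_encode(ac):
    """Encode zig-zag ordered AC coefficients as (run, level) pairs."""
    pairs, run = [], 0
    for level in ac:
        if level == 0:
            run += 1
            continue
        while run > 15:           # split long zero runs: (15, 0) = 16 zeros
            pairs.append((15, 0))
            run -= 16
        pairs.append((run, level))
        run = 0
    if run > 0:                   # only zeros remain: end of block
        pairs.append("EOB")
    return pairs
```

Feeding it the AC sequence -3, -2, -1, 0, 0, -1 followed by zeros reproduces the pairs shown on the slide.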
Huffman Coding
- (R, L) => (0, -3)(0, -2)(0, -1)(2, -1)(EOB)
- (Run, Category)(value) => (0, 2)(-3), (0, 2)(-2), (0, 1)(-1), (2, 1)(-1), …(0, 0)
- Each (Run, SSSS/Category) symbol is looked up in the Huffman table
Category  AC Coefficient Range
1         -1, 1
2         -3, -2, 2, 3
3         -7, …, -4, 4, …, 7
4         -15, …, -8, 8, …, 15
5         -31, …, -16, 16, …, 31
6         -63, …, -32, 32, …, 63
7         -127, …, -64, 64, …, 127
8         -255, …, -128, 128, …, 255
9         -511, …, -256, 256, …, 511
10        -1023, …, -512, 512, …, 1023
11        -2047, …, -1024, 1024, …, 2047
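The category (SSSS) of a coefficient is the number of bits needed to represent its magnitude, which gives a compact way to compute the table above. The helper names below are illustrative.

```python
def category(value):
    """Size category: number of bits in the magnitude of the coefficient."""
    return abs(value).bit_length()


def to_symbol(run, level):
    """Turn a (run, level) pair into ((run, category), level)."""
    return (run, category(level)), level
```

For example, -3 falls in category 2 and -1 in category 1, so (2, -1) becomes the symbol (2, 1) followed by the value -1.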
Huffman Coding for DC and AC Coefficients
Table for luminance DC coefficient differences
Category  Code length  Code word
0         2            00
1         3            010
2         3            011
3         3            100
4         3            101
5         3            110
6         4            1110
7         5            11110
8         6            111110
9         7            1111110
10        8            11111110
11        9            111111110
Table for luminance AC coefficients
Run/Size   Code length  Code word
0/0 (EOB)  4            1010
0/1        2            00
0/2        2            01
0/3        3            100
0/4        4            1011
0/5        5            11010
0/6        7            1111000
0/7        8            11111000
0/8        10           1111110110
0/9        16           1111111110000010
0/A        16           1111111110000011
1/1        4            1100
1/2        5            11011
1/3        7            1111001
1/4        9            111110110
Example: (0, 2)(3), (0, 2)(-2), (0, 1)(-1), (0, 2)(-2), …(0, 0) => (01)(11)(01)……(1010)
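As a sketch of how a symbol becomes bits: the (run, size) pair is replaced by its Huffman code word, followed by `size` extra bits for the value. Negative values are stored as value + 2^size - 1, which is the baseline JPEG convention. Only a few entries of the luminance AC table are included, and the function names are illustrative.

```python
# Subset of the luminance AC table, keyed by (run, size).
AC_LUMA_CODES = {
    (0, 0): "1010",  # EOB
    (0, 1): "00",
    (0, 2): "01",
    (0, 3): "100",
    (1, 1): "1100",
    (1, 2): "11011",
}


def amplitude_bits(value, size):
    """Extra bits identifying the value within its size category."""
    if size == 0:
        return ""
    if value < 0:
        value += (1 << size) - 1  # negative values use the low half
    return format(value, "0{}b".format(size))


def encode_ac(pairs):
    """Encode (run, level) pairs and append the EOB code."""
    bits = []
    for run, level in pairs:
        size = abs(level).bit_length()
        bits.append(AC_LUMA_CODES[(run, size)] + amplitude_bits(level, size))
    bits.append(AC_LUMA_CODES[(0, 0)])
    return "".join(bits)
```

Encoding the pairs (0, 3), (0, -2), (0, -1) yields 01 11, 01 01, 00 0 and then 1010 for EOB, matching the grouping in the example above.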
Example of Baseline DCT-based Coding
- Input: one 8×8 block of Y (8×8 pixels × 8 bits/pixel = 512 bits)
- Processing chain: level shift (-128), FDCT, quantization (Q table), zig-zag scan, run-length coding, Huffman coding
- Run-length symbols: (6)(61), (0, 2)(-3), (0, 3)(4), (0, 1)(-1), (0, 3)(-4), (0, 2)(2), (1, 2)(2), (0, 2)(-2), (0, 2)(-2), (5, 2)(2), (3, 1)(1), (6, 1)(-1), (2, 1)(-1), (4, 1)(-1), (7, 1)(-1), (0, 0)
- Huffman output: (1110)(111101)(00)(100)(100)(0)(100)(001)(01)(10)(11011)(10)(01)(01)(11111110111)(10)(111010)(1)(1111011)(0)(11100)(0)(111011)(0)(11111010)(0)(1010)
- Total: 102 bits, about a 5:1 reduction from the original 512 bits
Block Diagram of JPEG Decoder
[Figure: decoder pipeline starting from the input bitstream 01001011101…]
JPEG Bitstream
Review
- Process run
- Linker
- Tailoring the C library
- Load and execution view
Process Run
Stack
- Function parameters
- Local variables
Heap
- malloc()
- new operator
Linker
ARM Linker Control File
Linker
Tailoring the C Library
- Management of writable memory as static data, heap, and stack
- Functions that can be redefined
- Redirection of I/O functions
Memory Model
- Single memory region
- The stack grows downward from the top of the region
- The heap grows upward from the bottom of the region
Single Memory Model
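The single-region model can be illustrated with a small simulation in which the heap pointer climbs from the bottom, the stack pointer descends from the top, and an allocation fails once the two would cross. This is a teaching sketch only; the class, its methods, and the addresses are hypothetical and are not part of the ARM C library.

```python
class SingleRegion:
    """Toy model of one memory region shared by heap and stack."""

    def __init__(self, base, size):
        self.heap_top = base   # heap grows upward from the bottom
        self.sp = base + size  # stack grows downward from the top

    def malloc(self, nbytes):
        """Reserve heap space, rounded up to a multiple of 8 bytes."""
        nbytes = (nbytes + 7) & ~7
        if self.heap_top + nbytes > self.sp:
            raise MemoryError("heap would collide with stack")
        addr = self.heap_top
        self.heap_top += nbytes
        return addr

    def push(self, nbytes):
        """Reserve stack space and return the new stack pointer."""
        if self.sp - nbytes < self.heap_top:
            raise MemoryError("stack would collide with heap")
        self.sp -= nbytes
        return self.sp
```

The free space is always the gap between the two pointers, which is why neither side needs a fixed size in this model.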
Controlling Runtime Memory Model
Function                      Description
__user_initial_stackheap()    Returns the location of the initial heap
__user_heap_extend()          Returns the size and base address of an extra heap block
__user_stack_slop()           Returns the amount of extra stack
My Own Memory Model
Function                      Description
__rt_stackheap_init()         Sets up sp and sl to point to a valid stack
__rt_stack_overflow()         Called if a stack overflow occurs
__rt_heap_extend()            Returns a new 8-byte aligned block
Tailoring the I/O Functions
Load View and Execution View
AHB Protocol
AHB Wrapper
Input Pin Block Diagram
Output Pin Block Diagram
Lab ─ Case Study
- Goal
  - Implement the JPEG codec system on the ARM platform
- Principles
  - Implement the ARM platform-based JPEG codec with HW/SW co-design
- Requirements
  - Analyze the profiling results of the pure-software simulation
  - Explain how to partition the JPEG codec between hardware and software
  - Implement the JPEG codec with HW/SW co-design
- Discussion
  - Explain where the stack and heap are located, and what initializes them
File Structure
Read & Write Address
            FDCT         IDCT
Write_head  0xcc000000   0xcc000040
            0xcc000004   0xcc000044
            0xcc000008   0xcc000048
            0xcc00000c   0xcc00004c
            0xcc000010   0xcc000050
            0xcc000014   0xcc000054
            0xcc000018   0xcc000058
            0xcc00001c   0xcc00005c
Read_head   0xcc000020   0xcc000060
            0xcc000024   0xcc000064
            0xcc000028   0xcc000068
            0xcc00002c   0xcc00006c
            0xcc000030   0xcc000070
            0xcc000034   0xcc000074
            0xcc000038   0xcc000078
            0xcc00003c   0xcc00007c
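The address map follows a regular pattern: each of the four windows holds eight 32-bit words. The sketch below regenerates the addresses from a base and per-window offsets. The constant and function names are illustrative assumptions and do not come from the lab source code.

```python
BASE = 0xCC000000
WORDS_PER_WINDOW = 8

# Byte offset of each window from BASE, read off the address table.
WINDOW_OFFSET = {
    ("FDCT", "write"): 0x00,
    ("FDCT", "read"): 0x20,
    ("IDCT", "write"): 0x40,
    ("IDCT", "read"): 0x60,
}


def window_addresses(unit, direction):
    """List the eight word addresses of one window."""
    start = BASE + WINDOW_OFFSET[(unit, direction)]
    return [start + 4 * i for i in range(WORDS_PER_WINDOW)]
```

For instance, the IDCT read window spans 0xcc000060 through 0xcc00007c.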
Result for SW Simulation
[Figure: original image, encoder output, and decoder output]
Result for HW Simulation
[Figure: original image, encoder output, and decoder output]
Profiling Result of SW Simulation
Step 1 (Only Software)
- First, confirm the working directory, for example D:\ARMSoC\Final_project
- Check that the batch file sw.bat is present in the working directory
Step 3 (Only Software)
- Open the AXD Debugger window
- Select "File → Load Image"
- Select "Execute → Go"
Step 1 (SW/HW)
- Confirm the working directory, for example D:\ARMSoC\Final_project
Step 1 (SW/HW)
- Use the Xilinx ISE software to compile the provided Verilog HDL code into a programmable *.bit file
Step 2 (SW/HW)
- Program the ahbahbtop.bit file onto the LM module of the ARM Integrator
- Programming requires two files: Download.brd and LM_flash_load.bit
Step 4 (SW/HW)
- Open the AXD Debugger window
- Select "File → Load Image"
- Select "Execute → Go"
Reference
- W.-H. Chen, C. H. Smith, and S. C. Fralick, "A Fast Computational Algorithm for the Discrete Cosine Transform," IEEE Trans. Commun., vol. COM-25, pp. 1004-1009, Sept. 1977.
- W. B. Pennebaker and J. L. Mitchell, JPEG: Still Image Data Compression Standard, Kluwer Academic Publishers, ISBN 0442012721.