Run-Time FPGA Partial Reconfiguration for Image Processing Applications

Shaon Yousuf, Ph.D. Student, NSF CHREC Center, University of Florida
Dr. Ann Gordon-Ross, Assistant Professor of ECE, NSF CHREC Center, University of Florida

Introduction
  • Run-time reconfiguration is an important feature of SRAM-based FPGAs that allows functionality to be changed dynamically
      - Enables benefits such as flexibility, hardware reuse, and reduced power consumption
  • Drawbacks of run-time reconfiguration
      - The entire fabric is reconfigured even for slight design changes
          - System execution stalls completely
      - The time to load a design onto the fabric from external memory (reconfiguration time) increases with bitstream size
  • Run-time reconfiguration is enhanced by run-time partial reconfiguration (PR), which mitigates these drawbacks
[Figure: Designs A, B, and C are stored in external memory, and a configuration controller loads the required design onto the FPGA fabric. Power savings: Designs A, B, and C are stored in external memory. Flexibility: designs are loaded when required. Hardware reuse: the currently required design replaces the old one on the same fabric.]
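As a rough illustration of why reconfiguration time grows with bitstream size, the sketch below estimates load time as bitstream size divided by configuration-port throughput. The port width, clock rate, and both bitstream sizes are hypothetical values chosen for illustration; they are not measurements from this work.

```python
def reconfiguration_time_ms(bitstream_bytes, port_width_bits=32, port_clock_hz=100e6):
    """Estimate the time to push a bitstream through a configuration port.

    Assumes one port-width word is written per clock cycle with no stalls.
    All parameter defaults are illustrative assumptions.
    """
    bytes_per_second = (port_width_bits / 8) * port_clock_hz
    return 1000.0 * bitstream_bytes / bytes_per_second


# Hypothetical sizes: a full-device bitstream vs. a partial one for a single region.
full_bitstream_bytes = 2_000_000
partial_bitstream_bytes = 200_000

full_ms = reconfiguration_time_ms(full_bitstream_bytes)
partial_ms = reconfiguration_time_ms(partial_bitstream_bytes)
print(f"full: {full_ms:.2f} ms, partial: {partial_ms:.2f} ms")
```

Under this simple model the load time scales linearly with bitstream size, which is why a smaller partial bitstream shortens the stall.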

Partial Reconfiguration (PR)
  • PR allows a portion of an FPGA to be reconfigured dynamically by dividing the FPGA into two types of regions
      - Static region: contains the static portion of the design (static modules)
      - Partially reconfigurable region (PRR): loaded with a partial reconfiguration module (PRM)
  • PR benefits in addition to full reconfiguration benefits
      - Only the reconfigured PRR is stalled, while the static region and other PRRs continue operating
      - Smaller bitstream sizes
          - Reduced power consumption
          - Reduced memory requirements
          - Reduced time to reconfigure
[Figure: Example with 2 PRRs on the FPGA fabric. Labels: static region with static modules (central controlling agent, ICAP, memory controller); reconfigurable modules (PRMs) Module A, Module B, Module C, and Module D; PRR 1 (Modules A and B); PRR 2 (Modules C and D).]
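The two-PRR example can be modeled in a few lines. This is only a behavioral sketch: the class and function names are hypothetical, and it captures just two ideas from the slide, namely that each PRR accepts only the PRMs built for it and that reconfiguring one PRR leaves the other untouched.

```python
class PartialRegion:
    """Hypothetical model of one partially reconfigurable region (PRR)."""

    def __init__(self, name, allowed_modules):
        self.name = name
        self.allowed_modules = set(allowed_modules)
        self.loaded = None  # currently loaded PRM, if any

    def load(self, module):
        # A PRM can only be loaded into the region it was implemented for.
        if module not in self.allowed_modules:
            raise ValueError(f"{module} was not implemented for {self.name}")
        self.loaded = module


def reconfigure(regions, region_name, module):
    """Load `module` into one region and return the names of the untouched regions."""
    regions[region_name].load(module)
    return sorted(name for name in regions if name != region_name)


regions = {
    "PRR1": PartialRegion("PRR1", ["A", "B"]),
    "PRR2": PartialRegion("PRR2", ["C", "D"]),
}
regions["PRR1"].load("A")
regions["PRR2"].load("C")

# Swap Module B into PRR 1; PRR 2 keeps running Module C.
still_running = reconfigure(regions, "PRR1", "B")
```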

PR Challenges and Motivations
  • PR is hard
      - Complicated design flow
      - Requires manual intervention and knowledge about the target device
      - Can decrease system performance compared to full reconfiguration if the system design is not carefully considered
  • Despite these challenges, PR can prove beneficial for certain application types
      - Resource savings, flexibility, power savings, etc.
  • Thus, it becomes necessary to explore PR for different application types
      - Provides valuable insights
      - Potentially eases PR for designers of similar application types
[Figure: Xilinx PR implementation flow, with some steps marked as manual: HDL design description → HDL synthesis → set design constraints → timing/placement analysis → implement static design and PR modules → merge → final generated bitstreams.]

Contribution
  • PR architecture benefits for a JPEG encoder system
      - The JPEG encoder PR architecture provides increased flexibility compared to a non-PR architecture
  • Leveraging the JPEG encoder PR architecture, we propose a PR architecture for a JPEG encoder/decoder (codec) system
      - JPEG encoder/decoder systems are a key enabling technology for low-power and high-performance image transmission for on-line satellite communication
  • The proposed PR codec architecture will provide significant benefits in terms of resource savings and power savings as well as flexibility
  • The study of the PR architecture of the JPEG systems can be adapted to realize potential benefits for similar application types

JPEG Encoder/Decoder Process
  • The JPEG encoding process for color images is divided into four main steps
      - Color space transformation: RGB to YCbCr
      - Forward discrete cosine transform (FDCT)
      - Quantization
      - Entropy encoding
[Figure: Encoder pipeline. Labels: 8x8 data block of each color component, RGB to YCbCr, FDCT, quantizer, entropy encoder, table specifications, compressed image data.]

JPEG Encoder/Decoder Process (continued)
  • The JPEG encoding process for color images is divided into four main steps
      - Color space transformation: RGB to YCbCr
      - Forward discrete cosine transform (FDCT)
      - Quantization
      - Entropy encoding
  • The JPEG decoder performs these steps in reverse
[Figure: Decoder pipeline. Labels: compressed image data, entropy decoder, dequantizer, IDCT, YCbCr to RGB, table specifications, reconstructed 8x8 data block of each color component.]
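The first three encoding steps can be sketched in plain Python. This is a software illustration only, not the hardware design described in these slides: the color conversion uses the JFIF full-range equations, the FDCT is the direct textbook form rather than a fast pipelined one, and the uniform quantization table is a hypothetical placeholder. Entropy encoding is omitted.

```python
import math


def rgb_to_ycbcr(r, g, b):
    """JFIF (full-range) conversion of one 8-bit RGB pixel to YCbCr."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 128
    cr = 0.5 * r - 0.418688 * g - 0.081312 * b + 128
    return y, cb, cr


def fdct_8x8(block):
    """Direct (unoptimized) 2-D forward DCT of an 8x8 block of level-shifted samples."""
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    out = [[0.0] * 8 for _ in range(8)]
    for u in range(8):
        for v in range(8):
            total = 0.0
            for x in range(8):
                for y in range(8):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
            out[u][v] = 0.25 * c(u) * c(v) * total
    return out


def quantize(coeffs, table):
    """Divide each coefficient by its table entry and round to the nearest integer."""
    return [[round(coeffs[u][v] / table[u][v]) for v in range(8)] for u in range(8)]


# A flat gray tile: every pixel is (144, 144, 144), so Y = 144 and the
# level-shifted sample is 144 - 128 = 16 everywhere.
luma = [[rgb_to_ycbcr(144, 144, 144)[0] - 128 for _ in range(8)] for _ in range(8)]
coeffs = fdct_8x8(luma)
flat_table = [[16] * 8 for _ in range(8)]  # hypothetical uniform table
quantized = quantize(coeffs, flat_table)
```

For a flat block, only the DC coefficient is non-zero, which is the property the later run-length and Huffman stages exploit.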

JPEG Encoder Architecture
[Figure: Block diagram of the JPEG encoder. Blocks: HOST DATA, HOST PROG, Host Interface, RAM buffer (8x8 data blocks), RGB2YCbCr & FDCT 2D, ZigZag, Quantizer, Run Length Encoder, Huffman Encoder and Byte Stuffer, Header Generator, MUX, and Pipeline Controller, connected by data and control signals. The output is shown emitting a header byte (H: 0xFF).]
Skipping ahead to when processing is almost done...

JPEG Encoder Architecture (continued)
[Figure: The same JPEG encoder block diagram at the end of processing, with the output showing the end-of-image marker (EOI: 0xFF).]
Encoding complete!
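Two of the smaller blocks in the diagram, the ZigZag reorder and the Byte Stuffer, are easy to illustrate in software. The sketch below follows the standard JPEG behavior (zigzag scan of an 8x8 block, and a 0x00 byte inserted after every 0xFF in entropy-coded data so it cannot be mistaken for a marker). The function names are hypothetical, and this is not the hardware implementation shown on the slide.

```python
def zigzag_order(n=8):
    """Return (row, col) pairs of an n x n block in zigzag scan order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col indexes one anti-diagonal
        cells = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        if s % 2 == 0:
            cells.reverse()  # even diagonals run from bottom-left to top-right
        order.extend(cells)
    return order


def zigzag_scan(block):
    """Flatten an 8x8 block into a 64-entry list in zigzag order."""
    return [block[i][j] for i, j in zigzag_order(8)]


def stuff_bytes(data):
    """Insert 0x00 after every 0xFF byte of entropy-coded data."""
    out = bytearray()
    for b in data:
        out.append(b)
        if b == 0xFF:
            out.append(0x00)
    return bytes(out)


# Label each cell with its row-major index to make the scan order visible.
block = [[8 * i + j for j in range(8)] for i in range(8)]
scanned = zigzag_scan(block)
stuffed = stuff_bytes(bytes([0x12, 0xFF, 0x34]))
```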

JPEG Encoder PR Architecture
  • Pipeline controller module
      - Ability to replace with an updated module
      - Ability to replace with a different controller module
[Figure: JPEG encoder block diagram.]

JPEG Encoder PR Architecture
  • RGB2YCbCr and FDCT module
      - Ability to replace with an updated module
      - Ability to skip color space transformation by replacing with a module that only performs the DCT
      - Ability to swap in different DCT types
[Figure: JPEG encoder block diagram with a legend marking PR regions and PR modules.]

JPEG Encoder PR Architecture
  • Entropy encoder modules (Zigzag, Run Length, Huffman, Header Generator, Byte Stuffer)
      - Ability to update each individual module
      - Ability to employ different entropy encoding schemes
      - Ability to replace Huffman code tables and update the header accordingly
[Figure: JPEG encoder block diagram with a legend marking PR regions and PR modules.]

JPEG Encoder PR Architecture
  • Quantization module
      - Ability to replace with an updated module
      - Ability to change quantization matrix tables to control image quality
[Figure: JPEG encoder block diagram with a legend marking PR regions and PR modules.]
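To show how a quantization table can control image quality, the sketch below scales the example luminance table from Annex K of the JPEG standard by a quality setting. The scaling rule is the convention used by the IJG libjpeg software and is an assumption here; the slides do not say how the tables for this design are derived.

```python
# Example luminance quantization table from ITU-T T.81, Annex K.
BASE_LUMINANCE = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]


def scale_table(base, quality):
    """Scale a base table for a 1..100 quality setting (IJG-style rule, assumed)."""
    quality = max(1, min(100, quality))
    scale = 5000 // quality if quality < 50 else 200 - 2 * quality
    # Entries are clamped to 1..255 so they fit an 8-bit baseline table.
    return [[max(1, min(255, (q * scale + 50) // 100)) for q in row] for row in base]


table_q50 = scale_table(BASE_LUMINANCE, 50)  # quality 50 reproduces the base table
table_q25 = scale_table(BASE_LUMINANCE, 25)  # coarser steps, smaller files
table_q100 = scale_table(BASE_LUMINANCE, 100)  # all ones, finest quantization
```

Larger table entries discard more coefficient precision, so lowering the quality setting trades image fidelity for compression.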

JPEG Encoder PR Architecture
  • Advantages of the JPEG encoder PR architecture
      - Provides flexibility by allowing different modules to be replaced
      - More interesting benefits arise when the encoder architecture is combined with a decoder architecture
[Figure: JPEG encoder block diagram with a legend marking PR regions and PR modules.]

JPEG Codec PR Architecture
[Figure: Codec block diagram with a legend marking PR regions and PR modules. Blocks: HOST DATA, HOST PROG, Host IF, RAM buffer, MUX, DEMUX, and overlaid encoder/decoder module pairs: encoder/decoder Pipeline Controller, RGB2YCbCr & FDCT 2D / YCbCr2RGB & IDCT 2D, ZigZag / Reorder, Quantizer / Dequantizer, Run Length Encoder / Decoder, Huffman Encoder / Decoder, Byte Stuffer, and Header Generator.]

JPEG Codec PR Architecture: Encoder Data Path
[Figure: Codec block diagram configured for encoding, with a legend marking PR regions and PR modules. Blocks: HOST DATA, HOST PROG, Host IF, RAM buffer, MUX, DEMUX, Pipeline Controller, RGB2YCbCr & FDCT 2D, ZigZag, Quantizer, Run Length Encoder, Huffman Encoder, Byte Stuffer, and Header Generator.]

JPEG Codec PR Architecture: Decoder Data Path
[Figure: Codec block diagram configured for decoding, with a legend marking PR regions and PR modules. Blocks: HOST DATA, HOST PROG, Host IF, RAM buffer, MUX, DEMUX, Pipeline Controller, Header Decoder, Byte Stripper, Huffman Decoder, Run Length Decoder, Dequantizer, Reorder, and YCbCr2RGB & IDCT 2D.]

JPEG Codec PR Architecture (Contd.)
  • Benefits
      - Resource savings
          - Same hardware resources shared between encoder and decoder
      - Power savings
          - PR module bitstreams are stored in memory and loaded on demand (decoding or encoding), as opposed to both occupying actual hardware resources
      - Increased flexibility
          - Encoder and decoder PR modules can be updated as needed or replaced with one of another type as per application requirements
  • Architecture limitations
      - For a PR module loaded into a particular region
          - The loaded PR module's size and resource requirements (slices, FIFOs, BRAMs, DSPs) cannot exceed the maximum available in the PR region
          - The PR module's incoming and outgoing port connections cannot exceed the PR region's maximum incoming and outgoing port connections, respectively
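The two limitations amount to a per-resource "fits in region" check. The sketch below expresses that check; the `Resources` type, its field names, and all of the numbers are hypothetical and chosen only to illustrate the rule.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Resources:
    """Hypothetical resource and port budget of a PR region or PR module."""
    slices: int
    fifos_brams: int
    dsps: int
    ports_in: int
    ports_out: int


def fits(module, region):
    """True if the module needs no more of any resource or port than the region offers."""
    return all(getattr(module, f.name) <= getattr(region, f.name) for f in fields(Resources))


# Illustrative numbers only; they are not taken from the design in these slides.
region = Resources(slices=1200, fifos_brams=8, dsps=4, ports_in=32, ports_out=32)
encoder_module = Resources(slices=900, fifos_brams=6, dsps=4, ports_in=24, ports_out=16)
decoder_module = Resources(slices=1000, fifos_brams=6, dsps=4, ports_in=40, ports_out=16)

encoder_ok = fits(encoder_module, region)   # within every limit
decoder_ok = fits(decoder_module, region)   # too many incoming ports
```

In practice this means a shared region has to be sized for the larger of the encoder and decoder modules that will occupy it.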

Experimental Setup
  • Software
      - Xilinx ISE 9.2.04 with PR patch 12 installed
          - Synthesis options
              - Optimization goal: speed
              - Optimization effort: normal
  • Hardware
      - Xilinx Virtex-4 LX60

Results: Architecture Specifications
  • Input image specifications
      - Color images only (3 components, RGB input)
      - Supported resolution: 800 x 600
  • JPEG encoder system
      - JPEG baseline encoding (ITU-T T.81 | ISO/IEC 10918-1 standard)
      - Automatic generation of the JFIF v1.01 header
      - Design operates above 100 MHz
      - Hardcoded Huffman tables and two programmable quantization tables, one for luminance and one for chrominance, at 50% quality settings
[Figure: JPEG encoder system running at 100 MHz; output image at the 50% quality setting.]

Results: JPEG Encoder PR Architecture Floorplan
[Figure: Floorplan of the JPEG encoder PR architecture, with labeled placements for the Quantizer, Pipeline Controller, FDCT, Huffman Encoder, Zigzag, Byte Stuffer, Header Generation, and Run Length Encoder modules.]

Results: Resource Requirements
  • JPEG Encoder Architecture
      - Total slices = 5,531
      - Total DSP48s = 9
      - Total FIFO/RAMB16s = 27
  • JPEG Encoder PR Architecture
      - Total slices = 5,678
      - Total DSP48s = 9
      - Total FIFO/RAMB16s = 27
[Chart: Slice requirements.]

Results: Resource Requirements (continued)
[Chart: DSP48 requirements. The JPEG encoder architecture and the JPEG encoder PR architecture each use 9 DSP48s.]

Results: Resource Requirements (continued)
[Chart: FIFO/RAMB16 requirements. The JPEG encoder architecture and the JPEG encoder PR architecture each use 27 FIFO/RAMB16s.]

Results: PR vs. Non-PR Architectures
  • Encoder architecture slice requirement = 5,531
  • Predicted decoder architecture slice requirement ≈ 5,600

Results: PR vs. Non-PR Architectures
  • Predicted total slice requirement for the non-PR codec architecture ≈ 11,200
  • Predicted PR codec architecture slice requirement ≈ 6,100
[Chart annotations: 11,200 slices; slice macro overhead: 7% or 416 slices.]

Results: PR vs. Non-PR Architectures
  • Predicted total slice requirement for the non-PR codec architecture ≈ 11,200
  • Predicted PR codec architecture slice requirement ≈ 6,100
  • Predicted resource savings from the PR codec architecture = ((11,200 - 6,100) × 100) / 11,200 ≈ 45%
[Chart annotation: 45% savings.]
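The savings figure can be reproduced directly from the predicted slice counts on this slide. The function name is hypothetical; the input numbers are the slide's own predictions.

```python
def percent_savings(non_pr_slices, pr_slices):
    """Percentage of slices saved by the PR architecture relative to the non-PR one."""
    return 100.0 * (non_pr_slices - pr_slices) / non_pr_slices


non_pr_codec_slices = 11200  # predicted non-PR codec (encoder plus decoder)
pr_codec_slices = 6100       # predicted PR codec, including slice macro overhead

savings = percent_savings(non_pr_codec_slices, pr_codec_slices)
print(f"predicted savings: {savings:.1f}%")
```

The result is about 45.5%, which the slide reports as 45%.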

Conclusions and Future Work
  • Conclusions
      - We created a JPEG encoder PR architecture for image encoding
      - The architecture provides increased flexibility and potential power savings with a slice macro overhead of only 4%
      - The architecture forms a base for a JPEG codec PR architecture
      - The JPEG codec architecture is predicted to benefit from increased flexibility and power savings, as well as area savings of as much as 45% relative to a non-PR codec architecture
  • Future Work
      - Complete the proposed JPEG codec PR architecture
      - Extend the work to the development of PR architectures for MPEG/H.264 encoder and decoder systems

QUESTIONS?
This work was supported in part by the I/UCRC Program of the National Science Foundation under Grant No. EEC-0642422. We also gratefully acknowledge tools provided by Xilinx.