Synthesis for Partially Reconfigurable Computing Systems
Satish Ganesan, Abhijit Ghosh, Ranga Vemuri
Digital Design Environments Laboratory
Dept of ECECS, University of Cincinnati
[satish, ranga]@ececs.uc.edu
This work is sponsored in part by the US Air Force, Wright Laboratory, WPAFB, under contract number F33615-97-C-1043.
Ganesan 1 P 9

Synthesis System Overview
[Figure: synthesis flow with the stages Input Specification (VHDL / C), Translator, High-level Synthesis, Dynamic Reconfiguration Set Generation, Logic Elaboration, Layout Synthesis and Host-side Controller, targeting a partially reconfigurable FPGA]

Target Architecture Model
[Figure: target device divided into two portions, P1 and P2]
Features:
• Partially reconfigurable device: a portion of the device can be reconfigured while the remaining part is still operational
• The target device is split into two parts, P1 and P2
• The design is split into sequential blocks, which are loaded onto the two portions of the device
• Reconfiguration of one block is overlapped with the execution of another

Input Specification
• Behavior specification in a subset of VHDL/C
• Translated into an Intermediate Representation
• Intermediate Representation: Behavior Block Input Format
  • Single thread of control
  • Each block performs a set of computations
  • Data is transferred through the branch interface
  • Control constructs are supported
[Figure: block graph of Block 1 through Block 6]

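The block-graph IR can be pictured with a small Python sketch. Everything in it is hypothetical: the BehaviorBlock class, its fields and the trace function illustrate the idea and are not the format the translator actually emits. It assumes an acyclic block graph in which a non-branching block has at most one successor and a conditional block picks exactly one of two successors, which is what a single thread of control implies.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class BehaviorBlock:
    """Hypothetical IR node: one behavior block of the block graph."""
    name: str
    operations: List[str]                      # computations performed in the block
    succ: Optional[str] = None                 # unconditional successor, if any
    branch: Optional[Dict[bool, str]] = None   # predicate outcome -> successor


def trace(graph: Dict[str, BehaviorBlock], entry: str,
          predicate: Callable[[str], bool]) -> List[str]:
    """Follow the single thread of control from `entry` to an exit block.

    Assumes the block graph is acyclic; `predicate(name)` stands in for the
    run-time value of the branch condition at the end of block `name`.
    """
    order: List[str] = []
    current: Optional[str] = entry
    while current is not None:
        block = graph[current]
        order.append(current)
        if block.branch is not None:
            current = block.branch[predicate(current)]  # exactly one branch taken
        else:
            current = block.succ
    return order


# Hypothetical six-block graph in which block B2 ends in a conditional.
GRAPH = {
    "B1": BehaviorBlock("B1", ["a = x + y"], succ="B2"),
    "B2": BehaviorBlock("B2", ["c = a > 0"], branch={True: "B3", False: "B4"}),
    "B3": BehaviorBlock("B3", ["b = a * 2"], succ="B5"),
    "B4": BehaviorBlock("B4", ["b = a - 1"], succ="B5"),
    "B5": BehaviorBlock("B5", ["d = b + 3"], succ="B6"),
    "B6": BehaviorBlock("B6", ["out = d"]),
}

if __name__ == "__main__":
    print(trace(GRAPH, "B1", lambda name: True))   # ['B1', 'B2', 'B3', 'B5', 'B6']
    print(trace(GRAPH, "B1", lambda name: False))  # ['B1', 'B2', 'B4', 'B5', 'B6']
```

Only one of B3 and B4 ever appears in a trace, mirroring the single thread of control.
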
High-level Synthesis (HLS)
Inputs: Input Specification (Behavior Blocks), RTL Component Library, Area / Timing Constraints
High-level Synthesis Engine: Scheduling, Allocation, Binding
Output: Register-Transfer Level Design (RTL Blocks)

High-level Synthesis (HLS)
• Each behavior block in the block graph is synthesized separately
[Figure: HLS maps Block 1 through Block 6 to RTL Blk 1 through RTL Blk 6]

RTL Model
[Figure: Glushkovian model. The design consists of a DATAPATH (net-list of components) and a CONTROLLER (finite state machine); the controller sends Controls to the datapath and receives Flags from it; the external signals are I/O, Clock, Reset, Start and Finish]
• Components in the datapath implement the operations specified in the behavior
• The controller (FSM) provides the control signals needed for execution
• HLS generates four interface signals: Clock (in), Reset (in), Start (in), Finish (out)

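The datapath/controller split and the Start/Finish handshake can be illustrated with a cycle-level Python sketch. The computation (summing a list), the state names IDLE/RUN/DONE, and the control and flag names are all hypothetical and are not taken from the HLS tool's output. Reset returns the FSM to IDLE, a one-cycle Start pulse launches the block, and Finish is asserted once the controller reaches DONE.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    RUN = auto()
    DONE = auto()


class Datapath:
    """Hypothetical datapath: one adder, an accumulator and an index register."""

    def __init__(self, inputs):
        self.inputs = list(inputs)
        self.acc = 0
        self.idx = 0

    def flags(self):
        # Status flag sent to the controller: have all inputs been consumed?
        return {"done_all": self.idx >= len(self.inputs)}

    def clock(self, controls):
        # Register updates selected by the controller on this clock edge.
        if controls.get("clear"):
            self.acc, self.idx = 0, 0
        elif controls.get("add"):
            self.acc += self.inputs[self.idx]
            self.idx += 1


class Controller:
    """Finite state machine that sequences the datapath."""

    def __init__(self):
        self.state = State.IDLE

    def controls(self, start, flags):
        # Control outputs depend on the state, Start and the datapath flags.
        if self.state is State.IDLE and start:
            return {"clear": True}
        if self.state is State.RUN and not flags["done_all"]:
            return {"add": True}
        return {}

    def clock(self, reset, start, flags):
        # Next-state logic, applied on the clock edge.
        if reset:
            self.state = State.IDLE
        elif self.state is State.IDLE and start:
            self.state = State.RUN
        elif self.state is State.RUN and flags["done_all"]:
            self.state = State.DONE

    @property
    def finish(self):
        # Finish depends only on the state: it is high in DONE.
        return self.state is State.DONE


def run_block(inputs, max_cycles=1000):
    """Reset, pulse Start for one cycle, then clock until Finish rises."""
    dp, ctl = Datapath(inputs), Controller()
    ctl.clock(reset=True, start=False, flags=dp.flags())
    for cycle in range(1, max_cycles + 1):
        start = cycle == 1
        flags = dp.flags()
        ctrl = ctl.controls(start, flags)  # computed from the pre-edge state
        ctl.clock(reset=False, start=start, flags=flags)
        dp.clock(ctrl)
        if ctl.finish:
            return dp.acc, cycle
    raise RuntimeError("Finish was never asserted")


if __name__ == "__main__":
    print(run_block([3, 4, 5]))  # (12, 5)
```

With three inputs, Finish rises on the fifth clock: one cycle to clear the registers on Start, three additions, and one cycle to enter DONE.
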
Dynamic Reconfiguration
[Figure: DR transforms the RTL block graph (RTL Blk 1 to RTL Blk 6) into a sequence in which RTL Blk 3 and RTL Blk 4 are combined into a single entry, RTL Blk 3|4]
Input:
• RTL block graph, in which each block has been synthesized separately
Output:
• Sequence of reconfiguration sets
• Each reconfiguration set has two blocks: one is being reconfigured while the other executes
• Intermediate data between blocks is stored in board registers

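Under the two-partition model, a reconfiguration-set sequence for a chain of RTL blocks can be generated as below. This is a hedged sketch, not the DRSG algorithm itself: it assumes the block graph has already been reduced to a single sequence (for example, with alternative branches treated as one slot such as "Blk3|Blk4"), and the ReconfigSet type and the partition labels are hypothetical.

```python
from typing import List, NamedTuple, Optional

PARTITIONS = ("P1", "P2")


class ReconfigSet(NamedTuple):
    """One hypothetical reconfiguration set: one block runs, the other loads."""
    execute: Optional[str]      # block executing on one partition
    reconfigure: Optional[str]  # block being loaded into the other partition
    target: Optional[str]       # partition that receives `reconfigure`


def reconfiguration_sets(blocks: List[str]) -> List[ReconfigSet]:
    """Pair each executing block with the next block to load.

    Blocks alternate between P1 and P2, so block k+2 is loaded into the
    partition just vacated by block k.
    """
    sets = []
    for i in range(len(blocks) + 1):
        running = blocks[i - 1] if i > 0 else None        # nothing runs at first
        loading = blocks[i] if i < len(blocks) else None  # nothing left at the end
        target = PARTITIONS[i % 2] if loading is not None else None
        sets.append(ReconfigSet(running, loading, target))
    return sets


if __name__ == "__main__":
    for s in reconfiguration_sets(["Blk1", "Blk2", "Blk3|Blk4", "Blk5", "Blk6"]):
        print(s)
```

The first set only loads a block and the last set only executes one; every set in between pairs an executing block with a block being loaded.
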
Dynamic Reconfiguration: Example
Step 1: RTL Block 1 is loaded onto the device.
Step 2: RTL Block 1 executes while RTL Block 2 is configured.
Step 3: When RTL Block 1 completes execution, RTL Block 3 is reconfigured in place of RTL Block 1 while RTL Block 2 executes.
Step 4: Steps 2 and 3 are repeated until all RTL blocks have been loaded and executed.

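Steps 1 to 4 can be replayed as a small timing simulation. The sketch assumes per-block reconfiguration times R and execution times E in arbitrary units, that one partition can be reconfigured while the other executes, and that each step ends only when both its execution and its reconfiguration have finished. The replay function and its event-tuple layout are hypothetical.

```python
from typing import List, Tuple

# (block number, action, partition, start time, end time)
Event = Tuple[int, str, str, float, float]


def partition_of(i: int) -> str:
    """Blocks alternate between the two partitions (0-based index)."""
    return "P1" if i % 2 == 0 else "P2"


def replay(R: List[float], E: List[float]) -> List[Event]:
    """Replay Steps 1-4 and return the timed events."""
    n = len(R)
    # Step 1: block 1 is loaded before anything can run.
    events: List[Event] = [(1, "reconfigure", partition_of(0), 0, R[0])]
    t = R[0]
    for i in range(n):
        # Steps 2-3: block i+1 runs while block i+2 loads on the other partition.
        events.append((i + 1, "execute", partition_of(i), t, t + E[i]))
        step = E[i]
        if i + 1 < n:
            events.append((i + 2, "reconfigure", partition_of(i + 1), t, t + R[i + 1]))
            step = max(step, R[i + 1])
        t += step  # Step 4: the next step starts once both activities finish
    return events


if __name__ == "__main__":
    for ev in replay([2, 3, 1], [4, 1, 5]):
        print(ev)
```

For R = [2, 3, 1] and E = [4, 1, 5], the last event ends at time 12, which matches the DRSG latency L_2 defined under Latency Improvement.
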
Latency Improvement
• Latency of the design without the DRSG approach:
  $L_1 = \sum_{i=1}^{n} (R_i + E_i)$
• Latency of the design with the DRSG approach:
  $L_2 = R_1 + \sum_{i=1}^{n} \max(R_{i+1}, E_i)$, taking $R_{n+1} = 0$
  where $R_i$ is the reconfiguration time of the $i$th block and $E_i$ is the execution time of the $i$th block
• Since $\max(R_{i+1}, E_i) \le R_{i+1} + E_i$ for every $i$, summing over all blocks gives $L_2 \le L_1$

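The two latency expressions are easy to check numerically. In this minimal Python sketch the function names and the sample times are hypothetical; the code treats R_{n+1} as 0, so the final term of L_2 reduces to E_n.

```python
from typing import Sequence


def latency_without_drsg(R: Sequence[float], E: Sequence[float]) -> float:
    """L_1: every block is reconfigured, then executed, with no overlap."""
    return sum(r + e for r, e in zip(R, E))


def latency_with_drsg(R: Sequence[float], E: Sequence[float]) -> float:
    """L_2: R_1 plus, for each block i, max(R_{i+1}, E_i), with R_{n+1} = 0."""
    r_next = list(R[1:]) + [0]  # nothing remains to be loaded after block n
    return R[0] + sum(max(r, e) for r, e in zip(r_next, E))


if __name__ == "__main__":
    R, E = [2, 3, 1], [4, 1, 5]
    print(latency_without_drsg(R, E))  # 16
    print(latency_with_drsg(R, E))     # 12
```
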
Handling Conditional Constructs
[Figure: block graph of RTL Blk 1 to RTL Blk 4, with RTL Blk 1 branching to RTL Blk 2 or RTL Blk 3]
• RTL Block 1 is a conditional block
• Because there is a single thread of control, either RTL Block 2 or RTL Block 3 is executed, never both
• Two approaches handle conditional branching
Approach I: host polling
• The host waits for the conditional predicate to be evaluated and then loads the appropriate branch
• Latency: $L_1 = R_1 + \sum_{i=1}^{n} \max(R_{i+1}, E_i) + R_j$
  where $R_j$ is the reconfiguration time of the branch that is executed

Handling Conditional Constructs
Approach II: branch prediction
• The host loads one of the branches based on a user-supplied profile
• Latency of the design if the correct branch was loaded:
  $L_1 = R_1 + \sum_{i=1}^{n} \max(R_{i+1}, E_i)$
• Latency of the design if the wrong branch was loaded:
  $L_2 = R_1 + \sum_{i=1}^{n} \max(R_{i+1}, E_i) + R_j$
  where $R_j$ is the reconfiguration time of the branch that is actually executed
• Hence $L_1 \le L_2$ always: a misprediction adds the reload time $R_j$

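The two approaches differ only in whether the reconfiguration time R_j of the taken branch is exposed. The sketch below encodes the slide formulas; the function names are hypothetical, and the expected-latency helper goes beyond the slides by assuming that the user profile can be read as a probability that the preloaded branch is the one taken.

```python
from typing import Sequence


def drsg_latency(R: Sequence[float], E: Sequence[float]) -> float:
    """R_1 + sum_i max(R_{i+1}, E_i), with R_{n+1} = 0."""
    r_next = list(R[1:]) + [0]
    return R[0] + sum(max(r, e) for r, e in zip(r_next, E))


def latency_host_polling(R, E, r_taken):
    """Approach I: the taken branch is loaded only after the predicate resolves."""
    return drsg_latency(R, E) + r_taken


def latency_branch_prediction(R, E, r_taken, predicted_correctly):
    """Approach II: the profiled branch is preloaded; a miss costs a reload."""
    base = drsg_latency(R, E)
    return base if predicted_correctly else base + r_taken


def expected_latency_prediction(R, E, r_taken, p_correct):
    """Hypothetical: p_correct is the profiled chance the preloaded branch is taken."""
    hit = latency_branch_prediction(R, E, r_taken, True)
    miss = latency_branch_prediction(R, E, r_taken, False)
    return p_correct * hit + (1 - p_correct) * miss


if __name__ == "__main__":
    R, E = [2, 3, 1], [4, 1, 5]
    print(latency_host_polling(R, E, 1))               # 13
    print(expected_latency_prediction(R, E, 1, 0.75))  # 12.25
```

Under this reading, branch prediction never does worse than host polling, and it matches polling exactly when the profile is always wrong.
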
Logic Elaboration
Flow: Input RTL Specification + RTL Component Library → Logic Elaboration (VELAB) → Elaborated net-list file in EDIF format
Features:
• Pre-placed component library to aid layout synthesis
• RTL specification obtained from the HLS tool ASSERTA
• Net-list produced in EDIF format

Layout Synthesis
Flow: Input Net-list Specification → Layout Synthesis (XACT 6000) → FPGA bit-stream
Features:
• Manual placement is required to ensure that the design can be placed and routed with XACT 6000
• Each replacement block is placed in the same location as the block it substitutes
• Bitmap files are produced in .cal format

Host-side Controller
Flow: Bitmap files + Reconfiguration Set Sequence → Host-side Controller → RTR implementation of the design
Features:
• Manages the partially reconfigurable FPGA device
• Loads and executes bitmap files according to the reconfiguration sequence generated by the DRSG phase
• The device used is the Xilinx XC6200

Results: Percentage Configuration Time
Design        Total rec.  Total exec.  Overlap   Latency   % conf
4x4 1D DCT    929 us      1025 us      678 us    1276 us   19.7
4x4 2D FFT    1416 us     2008 us      1161 us   2263 us   11.2
16-tap FIR    338 us      200 us       -         538 us    62.8
• The table gives the percentage of total latency spent only in configuration, i.e. reconfiguration not overlapped with execution, using the synthesis flow
• Overlapping reconfiguration with execution reduces latency from 1954 us to 1276 us for the DCT and from 3424 us to 2263 us for the FFT; the FIR has no overlap (338 us + 200 us = 538 us)

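The table columns are related by simple arithmetic, sketched below in Python. The sketch assumes latency = total reconfiguration + total execution - overlap and % conf = (total reconfiguration - overlap) / latency, and it reads the blank FIR overlap cell as 0; these definitions are inferred from the numbers, not stated on the slide. They reproduce the DCT and FIR rows, while the FFT comes out at about 11.3 rather than the reported 11.2, so the exact definition or rounding used may differ.

```python
# Values from the results table, in microseconds.
# Assumption: the blank Overlap cell for the FIR is read as 0.
RESULTS = {
    "4x4 1D DCT": (929, 1025, 678),
    "4x4 2D FFT": (1416, 2008, 1161),
    "16-tap FIR": (338, 200, 0),
}


def summarize(rec, exe, overlap):
    """Return (sequential latency, overlapped latency, % time only configuring)."""
    sequential = rec + exe          # reconfigure-then-execute, no overlap
    latency = sequential - overlap  # overlapped time is counted once
    exposed = rec - overlap         # reconfiguration not hidden by execution
    return sequential, latency, 100.0 * exposed / latency


if __name__ == "__main__":
    for design, (rec, exe, overlap) in RESULTS.items():
        seq, lat, pct = summarize(rec, exe, overlap)
        print(f"{design}: {seq} us -> {lat} us, {pct:.1f}% configuring")
```
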
Conclusions and Future Work
Conclusions:
• Presented a synthesis system for partially reconfigurable FPGAs
• Proposed a dynamic reconfiguration set generation strategy that improves overall design latency by overlapping reconfiguration with execution, reducing the reconfiguration time exposed in the latency
• Results showed a considerable decrease in exposed reconfiguration time
Future work:
• Automate the procedure of generating run-time reconfigurable designs for partially reconfigurable FPGAs