Automating Resource Optimisation in Reconfigurable Design
Xinyu Niu, Thomas C. P. Chau, Qiwei Jin, Wayne Luk and Qiang Liu
{nx210, cpc10, qj04, wl}@doc.ic.ac.uk, qiangliu@tju.edu.cn

ABSTRACT
• New design approach: automatically identify and exploit run-time reconfiguration for optimising resource utilisation
• Configuration Data Flow Graph: a hierarchical graph structure, for synthesis of reconfigurable designs in three steps
• Evaluation: barrier option pricing (finance), particle filter (robotics), reverse time migration (oil and gas)
• Improvement: 1.61 to 2.19 times faster than optimised static FPGA designs, up to 28.8 times faster than optimised CPU reference designs, and 1.55 times faster than optimised GPU designs; up to 29 times more energy efficient than CPU/GPU

Motivating Example
[Figure: motivating example; axis labels n, 2n, 3n, 4n and n, 2n, 7n/3, 8n/3]
• Static design: accommodate all functions needed to accomplish the application
• Dynamic design: reconfigure the design dynamically and implement only active functions

Design Flow
Design issues are addressed at multiple levels:
• Function level: extract function information, assign data dependency levels
• Configuration level: generate configurations based on design rules, optimise configurations
• Partition level: generate partitions recursively, select the optimal run-time solutions

Configuration Level
• Generate configurations based on assigned ALAP and ATAP levels
• Optimise configurations to fully utilise available resources
• Adopt analysis of resource consumption and bandwidth in the design model

Partition Level
• Group configurations into partitions by mapping configurations into a Configuration Graph
• Assign design rules to build the search space
• Generate valid partitions within the pruned search space by a recursive search algorithm

Function Level
Algorithm level
Function properties extracted from algorithm details:
• resource consumption
• bandwidth requirement
• memory architectures
• data dependency
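The data dependency levels that the design flow assigns to function nodes can be illustrated with standard ALAP levelling on a directed acyclic graph. The following is a minimal sketch under stated assumptions, not the poster's implementation: the names `alap_levels` and `group_by_level`, the four-node example graph, and the idea of grouping nodes that share a level are all hypothetical and chosen only for illustration.

```python
from functools import lru_cache


def alap_levels(nodes, edges):
    """Assign As-Late-As-Possible levels to the nodes of a DAG.

    nodes: list of hashable node names (assumed: function nodes).
    edges: list of (producer, consumer) pairs; the graph must be acyclic.
    A node's ALAP level is the latest level at which it can run
    without delaying any of its consumers.
    """
    succs = {n: [] for n in nodes}
    for src, dst in edges:
        succs[src].append(dst)

    @lru_cache(maxsize=None)
    def height(n):
        # Longest path, in edges, from n to any sink node.
        if not succs[n]:
            return 0
        return 1 + max(height(s) for s in succs[n])

    depth = max(height(n) for n in nodes)
    # Sinks sit at the last level; every other node is pushed as late
    # as its longest chain of consumers allows.
    return {n: depth - height(n) for n in nodes}


def group_by_level(levels):
    """Group node names by level (illustrative assumption only)."""
    groups = {}
    for name, level in sorted(levels.items()):
        groups.setdefault(level, []).append(name)
    return groups


# Hypothetical graph: A -> B -> D and C -> D.
example_nodes = ["A", "B", "C", "D"]
example_edges = [("A", "B"), ("B", "D"), ("C", "D")]
example_levels = alap_levels(example_nodes, example_edges)
example_groups = group_by_level(example_levels)
```

In this example node C has no predecessors, so an as-soon-as-possible schedule would place it at level 0 alongside A. ALAP levelling instead defers it to level 1, next to B, because D is its only consumer.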
Function level
Function nodes are assigned:
• As Late As Possible (ALAP) levels, based on interactions between function nodes
• As Timely As Possible (ATAP) levels, based on data dependency inside a function

Results
Benchmarks:
• Barrier Option Pricing (BOP)
• Particle Filter (PF)
• Reverse-Time Migration (RTM)

[Figures: exploring the original search space; exploring the pruned search space]

Scalable design method:
• Design rules to eliminate redundant search space
• Design algorithm to explore only the valid design space

Improvement over static designs:
• FPGA devices: 4 Xilinx Virtex-6 SX475T FPGAs, hosted in a Maxeler MPC-C500 computing node, running at 100 MHz
• 1.94, 2.19 and 1.61 times faster for barrier option pricing, particle filter and reverse time migration, respectively
• Static design: 34% to 59% of theoretical performance
• Reconfiguration removing idle functions: 98% of theoretical performance
• Inefficiency of dynamic designs: due to reconfiguration overhead

Improvement over optimised CPU and GPU designs:
• CPU: 24 Intel Xeon X5660 cores running at 2.67 GHz
• GPU: an NVIDIA Tesla C2070 card running at 1.15 GHz, linearly scaled by 4 for comparison with multi-FPGA designs
• Up to 27 times faster than CPU, 1.55 times faster than GPU
• Up to 29 times more energy efficient than CPU and GPU designs
• Proposed approach applicable to multi-chip environments
• Scalability: limited by reconfiguration overhead, which can be overcome by parallel reconfiguration of multiple FPGAs

Acknowledgement: This work was supported in part by UK EPSRC, by the European Union Seventh Framework Programme under grant agreement numbers 257906, 287804 and 318521, by the HiPEAC NoE, by the Maxeler University Programme, and by Xilinx.