VAPRES: A Virtual Architecture for Partially Reconfigurable Embedded Systems
Presented by Joseph Antoon
Abelardo Jara-Berrocal, Ann Gordon-Ross
NSF Center for High-Performance Reconfigurable Computing (CHREC)
Department of Electrical and Computer Engineering, University of Florida
Adaptive Hardware Applications
• Kalman filter used for target tracking
• Finds likely location from noisy measurements
• Optimized filter depends on target type
[Figure: target types (slow, fast, airborne, noisy) shown alongside filter variants: low-power, constant-gain, low-bandwidth Kalman filter; high-power, constant-gain, high-bandwidth Kalman filter; high-power, variable-gain, low-bandwidth multi-scale smoother; high-power, variable-gain, low-bandwidth Kalman filter]
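To make the constant-gain versus variable-gain distinction concrete, here is a minimal sketch of a one-dimensional Kalman filter in Python. It is not the filter used in the presentation: the random-walk motion model, the noise variances `q` and `r`, and the names `ScalarKalman` and `constant_gain_step` are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class ScalarKalman:
    """Variable-gain filter for a 1-D position under an assumed random-walk model."""
    x: float  # current position estimate
    p: float  # variance of the estimate
    q: float  # process noise variance (assumed value)
    r: float  # measurement noise variance (assumed value)

    def step(self, z: float) -> float:
        # Predict: the state carries over, uncertainty grows by q.
        self.p += self.q
        # Update: the gain k weighs the new measurement against the prediction.
        k = self.p / (self.p + self.r)
        self.x += k * (z - self.x)
        self.p *= 1.0 - k
        return self.x


def constant_gain_step(x: float, z: float, k: float) -> float:
    """Constant-gain variant: same update, but k is fixed ahead of time."""
    return x + k * (z - x)


kf = ScalarKalman(x=0.0, p=100.0, q=0.01, r=1.0)
for z in [9.0, 11.0, 10.5, 9.5, 10.0]:  # noisy readings of a target near 10
    kf.step(z)
```

The variable-gain version recomputes `k` on every step, which costs extra arithmetic; the constant-gain version trades that adaptivity for a cheaper update.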
Adaptive Hardware Applications
• FPGAs often out-perform CPUs
  – Parallel computing power
  – Kalman filters scale well
• Partial Reconfiguration (PR)
  – Run-time HW adaptation
  – Allows FPGA time-sharing
• Communication Challenge
  – Transfers between modules can lock up the CPU
  – An inter-module network relieves the CPU of these transfers
[Figure: CPU compared with FPGAs; FPGA device containing a CPU, memory, Filter A, and Filter B]
Using Partial Reconfiguration
1. Define system (system specifications)
2. Platform Studio
3. Import into ISE
4. Divide project into mandated hierarchy (top, static, prr_a, prr_b)
5. Set PRRs as black boxes
6. Code PR region HDL
7. Synthesize!
8. Guess (estimate) a good floorplan
9. Map on to PlanAhead
10. Create “configurations”
11. Implement!
12. Write software
Callout: "Could you make it just a bit different…"
Identifying Issues With PR
• Support
  – Only supported by Xilinx
  – Altera support announced
• Lack of abstraction
  – Manual partitioning
  – Manual floor-planning
• App-specific architectures
  – Increased time-to-market
  – Reduced flexibility
Callout: Frustrating Design Flow!
In this work, we propose VAPRES
• A Virtual Architecture for PR Embedded Systems
• Abstracts base system from application
• Automates design flow and floor-planning
• Scalable, flexible features
VAPRES Architecture
• PR Regions (PRRs)
  – Independent clocks
  – FIFO-based I/O
  – Online placement
  – Created separately
• MACS
  – Intermodule network
• Flexible, scalable
  – PR Region Count
  – PR Region Size
  – MACS bandwidth
    • Module channel width
    • Left to right channel width
    • Right to left channel width
  – IO Module Count
[Figure: block diagram with labels MicroBlaze CPU, PLB Bus, DCR Bridge, FSL (Fast Simplex Links), PR Region 1, PR Region 2, PR Socket, IF, IO Module, Switch 1, Switch 2, CPU, To IO]
PR Region Connectivity
[Figure: PR socket detail with labels MicroBlaze, Device Control Register (DCR), Regional Clock Buffer (BUFR), PRR, FSL Macro, Clock Enable, Reset, Select, FSL (Fast Simplex Links), Slice Macros, PR Region, Fast Clock, Slow Clock, Multiplexer (BUFGMUX), Producer / Consumer Queues, MACS Switch]
MACS – Intermodule Network
• Minimal Adaptive-Routing Circuit Switched Network
• Circuit based
  – Uses streaming channels
  – Circuit set by first word in channel
  – Fast setup (<10 cycles)
[Figure: Modules 1, 2, and 3 attached through interfaces (IF) to switches; a stream carrying "dst" and "end" words]
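The "circuit set by first word in channel" idea can be sketched as a toy single-switch model in Python. This is an illustration under stated assumptions, not the MACS implementation: the class `CircuitSwitch`, the port names, and the `END` marker that releases the circuit are hypothetical (the slide only shows "dst" and "end" labels), and adaptive routing across multiple switches is not modeled.

```python
END = object()  # hypothetical end-of-stream marker that tears the circuit down


class CircuitSwitch:
    """Toy model of one switch: the first word on an idle channel picks the output."""

    def __init__(self, ports):
        self.outputs = {port: [] for port in ports}  # words delivered per output port
        self.circuit = None                          # destination of the open circuit

    def push(self, word):
        if self.circuit is None:
            # Idle channel: this word is the destination and opens the circuit.
            if word not in self.outputs:
                raise ValueError(f"unknown destination {word!r}")
            self.circuit = word
        elif word is END:
            # Release the circuit so the next word can name a new destination.
            self.circuit = None
        else:
            # Circuit is open: payload words stream straight through.
            self.outputs[self.circuit].append(word)


sw = CircuitSwitch(["module2", "module3"])
for word in ["module3", 1, 2, 3, END, "module2", 7, END]:
    sw.push(word)
```

Once the circuit is open, payload words need no per-word addressing, which is what makes a streaming channel cheap after the short setup.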
Design Methodology
• Two separate design flows
  – Base System
  – Application
• Applications made independently
  – Only base system specs needed
[Figure: base system flow and app flow, linked by the base system specifications]
Base System Design Flow
• User feeds specs to VAPRES
• Base design created from specs
  – Parametric templates used
• System files generated
  – Floorplan and Constraints
  – Embedded Dev. Kit (EDK) Files
  – HDL
• Synthesis
• Implementation
• Bitstream generated
• System downloaded to the board
[Figure: base system flow. System Specs and Templates feed the Base Design, which yields the Floorplan and HDL, followed by Synthesis, Implementation, and Generate Bitstream]
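As a rough illustration of "specs in, generated files out", the sketch below fills a parametric text template from a specification record. Everything here is hypothetical: the `BaseSystemSpec` field names, the template text, and the `render` helper are assumptions, and the output is a plain summary rather than any real floorplan, constraint, or EDK file format. The sample values are taken from the experimental setup slide.

```python
from dataclasses import dataclass, asdict
from string import Template


@dataclass(frozen=True)
class BaseSystemSpec:
    """Hypothetical record of the parameters the slides list as configurable."""
    prr_count: int
    prr_width_clb: int
    prr_height_clb: int
    iom_count: int
    channel_width_bits: int
    channels_left_to_right: int
    channels_right_to_left: int


# Illustrative template only; not an actual VAPRES or Xilinx file format.
SUMMARY_TEMPLATE = Template(
    "PR regions: $prr_count x ($prr_width_clb x $prr_height_clb CLB)\n"
    "IO modules: $iom_count\n"
    "MACS: $channels_left_to_right L->R, $channels_right_to_left R->L, "
    "$channel_width_bits-bit channels\n"
)


def render(spec: BaseSystemSpec, template: Template = SUMMARY_TEMPLATE) -> str:
    # substitute() raises KeyError if the template names a missing parameter.
    return template.substitute(asdict(spec))


spec = BaseSystemSpec(
    prr_count=2, prr_width_clb=16, prr_height_clb=11, iom_count=2,
    channel_width_bits=32, channels_left_to_right=2, channels_right_to_left=2,
)
summary = render(spec)
```

Keeping the specification in one record means an application designer can be handed the same values the base system was generated from.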
Application Design Flow
• Partition App
  – Hardware
  – Software
• Software flow
  – Compile
  – Link
• Hardware Flow
  – Synthesize
  – Implement
  – Bitstream gen
• Download App
[Figure: application flow. Application Decomposition splits into Source Code and HDL; with the System Specs and API, source code goes through Compile and Link to an Executable, and HDL goes through Synthesis and Implementation to Generate Bitstream]
Revisiting Target Tracking
[Figure: VAPRES system with labels PLB Bus, DCR Bridge, MicroBlaze CPU, ICAP, Filter Storage, Aerospace Kalman Filter, Blank PR Region, PR Socket, IO Module, Sensor, IF, Switch 2; callout: "Looks like a spaceship"]
Seamless Filter Swapping
• Filter tracks target
• Target slows down
• Filter swap needed
• First load new filter
  – Spare region used
  – Old filter continues
• Redirect traffic
• Downtime is now negligible
  – Previously in seconds
[Figure: MicroBlaze CPU, IO Module, and switches (SW) with a High Power Kalman Filter, a Low Power Kalman Filter, and Blank Modules before and after the swap; callout: "The target changed!"]
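The swap sequence on this slide (load into the spare region while the old filter keeps running, then redirect traffic) can be sketched as a small Python model. The `BaseSystem` class, its method names, and the step that blanks the old region afterwards are assumptions for illustration; `load` merely stands in for writing a partial bitstream.

```python
class BaseSystem:
    """Toy model of PR regions and the region that currently receives sensor data."""

    def __init__(self, regions):
        self.regions = {name: None for name in regions}  # PRR name -> loaded module
        self.active = None                               # PRR receiving traffic
        self.log = []

    def load(self, region, module):
        # Stands in for partial reconfiguration of one region.
        self.regions[region] = module
        self.log.append(f"load {module} into {region}")

    def redirect(self, region):
        self.active = region
        self.log.append(f"redirect traffic to {region}")

    def swap_filter(self, new_module):
        spare = next(r for r, m in self.regions.items() if m is None)
        old = self.active
        self.load(spare, new_module)  # old filter keeps processing meanwhile
        self.redirect(spare)          # downtime is only the traffic switch-over
        if old is not None:
            self.regions[old] = None  # assumed: old region becomes the new spare
            self.log.append(f"blank {old}")


system = BaseSystem(["prr1", "prr2"])
system.load("prr1", "high_power_kalman")
system.redirect("prr1")
system.swap_filter("low_power_kalman")
```

The ordering is the point: because the slow load happens before the redirect, the stream is interrupted only for the redirect itself.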
Experimental Setup - Resources
• Implemented on ML401 board
  – Virtex-4 LX25 FPGA
• VAPRES
  – Two PR Regions
  – 16 x 11 CLB region size
  – Two IOMs
• MACS
  – Four switches
  – 32-bit channels
  – Two channels left to right
  – Two channels right to left
[Figure: base system view, floor plan, and post place and route view]
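A quick worked calculation shows what these parameters imply. It assumes four slices per CLB, which is the Virtex-4 CLB organization; the count of wires between switches covers channel data bits only and ignores any control signals, which the slides do not describe.

```python
SLICES_PER_CLB = 4  # Virtex-4 CLBs contain four slices

prr_width_clb, prr_height_clb, prr_count = 16, 11, 2
slices_per_prr = prr_width_clb * prr_height_clb * SLICES_PER_CLB
total_prr_slices = slices_per_prr * prr_count

channel_bits = 32
channels = 2 + 2  # two left-to-right plus two right-to-left
data_bits_between_switches = channel_bits * channels

print(slices_per_prr, total_prr_slices, data_bits_between_switches)  # 704 1408 128
```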
Results – Resource Usage
[Figure: bar chart "VAPRES Resource Usage", y-axis 0 to 12000, with bars labeled MicroBlaze and MACS and data labels 9721 and 1890]
[Figure: pie charts of the MicroBlaze, MACS, and remaining shares on LX25, LX60, and LX100 devices; percentage groups shown: 19% / 14% / 67%, 28% / 6% / 66%, 17% / 4% / 79%]
Experimental Setup – Timing
• Two methods to reconfigure
  – Implemented in software
  – 1) Write bitfile in one stage
  – 2) Write bitfile in two stages
• One-stage method (less RAM required)
  – Load Flash sector to BRAM
  – Write to ICAP
  – Repeat until bitfile is loaded
• Two-stage method (load once, write often)
  – Load bitfile into BRAM
  – Write bitfile to ICAP
[Figure: data path Flash, BRAM, ICAP; legend distinguishes board peripheral from FPGA structure]
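The difference between the two methods is mostly about buffering, which the following Python sketch tries to show. It models Flash, BRAM, and the ICAP as byte buffers; the function names, the sector size, and the sample bitfile are assumptions, and no real device access is performed.

```python
def one_stage(flash: bytes, sector_size: int):
    """Copy sector by sector: the BRAM buffer never holds more than one sector."""
    icap = bytearray()
    peak_buffer = 0
    for offset in range(0, len(flash), sector_size):
        sector = flash[offset:offset + sector_size]  # load one Flash sector into BRAM
        peak_buffer = max(peak_buffer, len(sector))
        icap += sector                               # write that sector to the ICAP
    return bytes(icap), peak_buffer


def two_stage(flash: bytes):
    """Stage 1 buffers the whole bitfile; stage 2 writes it without touching Flash."""
    bram = bytes(flash)  # stage 1: entire bitfile into BRAM
    icap = bytes(bram)   # stage 2: BRAM to ICAP, repeatable for later reloads
    return icap, len(bram)


bitfile = bytes(range(256)) * 10            # 2560-byte stand-in for a partial bitfile
written_1, peak_1 = one_stage(bitfile, 512)
written_2, peak_2 = two_stage(bitfile)
```

Both paths deliver the same bytes; the one-stage path needs only a sector-sized buffer, while the two-stage path pays for a bitfile-sized buffer so that later writes can skip the Flash read.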
Results – Reconfiguration Time
[Figure: bar chart comparing two-stage and one-stage reconfiguration time, axis 0 to 1.25, bars split into loading and writing; annotation: ICAP write reduced to 71.94 ms]
[Figure: pie charts "One-Stage Time Breakdown" (5% / 95%) and "Two-Stage Time Breakdown" (7% / 93%), split between loading Flash and writing ICAP]
Experimental Setup - Scaling
• Four VAPRES Systems Set Up
System     PRRs   Width    Height   MACS
Small      1      10 CLB   1 row    No
Medium     1      10 CLB   2 rows   No
Large      2      16 CLB   2 rows   Yes
Populous   3      16 CLB   1 row    Yes
Results - Scalability
[Figure: resources (slices) for the Small, Medium, Large, and Populous systems, y-axis 6600 to 7600; annotations: increased PRR size, added PRR, decreased PRR size]
Results - Scalability
[Figure: maximum clock (MHz) for the Small, Medium, Large, and Populous systems, y-axis 114 to 121; annotation: all designs meet 100 MHz constraint]
Conclusions
• We developed VAPRES
  – Virtual Architecture for Partially Reconfigurable Embedded Systems
• Contributions
  – Modular design methodology
  – PR regions with independent, selectable clocks
  – Highly parametric design
  – Seamless filter swapping
• Future work
  – Algorithms for runtime module placement
  – Tools to assist system design formulation
  – Context save and restore for modules
Thank you for attending
Questions?