HW/SW Co-Synthesis of Dynamically Reconfigurable Embedded Systems: HW/SW Partitioning and Scheduling Algorithms
Presentation Outline
- Introduction
- Basics/Preliminaries
- Problem Formulation
- Representative Approaches
- Conclusion
Introduction
- Embedded systems?
  - Special-purpose/dedicated systems
- Design goals?
  - Highly optimized but cost-efficient
- Examples
  - An embedded system that provides a friendly interface: hand-held devices, such as a cellular phone or PDA
  - An industrial controller
  - A safety-critical controller, such as an antilock brake controller in a car or an autopilot
Generic Architectural Template
[Figure labels: ASIC, General-Purpose Processor, Digital Signal Processor, Dedicated Data Path, Memory]
HW/SW Co-Design
- Need?
  - Increasing design complexity
  - Need to explore the design space efficiently
  - CAD/design automation
- Co-design steps
  - Co-specification: specifications describing both HW and SW elements (and the relationship between them)
  - Co-synthesis: automatic or semi-automatic design of HW and SW to meet a specification
  - Co-simulation: simultaneous simulation of HW and SW elements, often at different levels of abstraction
Co-Synthesis Problem
- Partitioning the functional description between HW and SW
- Allocating processes to processing elements (PEs)
- Scheduling processes on the PEs
- Binding processing elements to particular component types
Dynamically Reconfigurable Logic
- Alternative to conventional ASICs and general-purpose processors
- Customized post-fabrication for a wide class of applications
- Partially reconfigured at run time to implement different tasks without affecting the computation of other tasks
[Figure labels: On-Chip SRAM/Cache, Embedded CPU, Dynamically Reconfigurable Data Path]
Inputs Specification
- Task graphs
- Estimation/profiling
- Resource libraries
DRL Architecture Model
- Frame: atomic reconfiguration storage unit that can be dynamically updated
- Multiple frames are reconfigured one by one
- Reconfiguration of one frame does not disturb the execution of other frames
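The frame model above can be illustrated with a minimal sketch. The class name `FrameArray`, the `(task, offset)` tags, and the fixed per-frame delay are assumptions made for illustration only; they do not come from any particular device or from the systems discussed later.

```python
class FrameArray:
    """Toy DRL device: a row of frames, each holding one configuration tag."""

    def __init__(self, num_frames, frame_delay=1):
        # None marks an empty frame; frame_delay is an assumed constant cost.
        self.frames = [None] * num_frames
        self.frame_delay = frame_delay

    def reconfigure(self, start, width, task):
        """Load `task` into frames [start, start + width) and return the delay.

        Frames are rewritten one by one; frames outside the range are untouched.
        """
        if start < 0 or width < 1 or start + width > len(self.frames):
            raise ValueError("task does not fit at this location")
        delay = 0
        for offset in range(width):
            tag = (task, offset)
            if self.frames[start + offset] != tag:
                self.frames[start + offset] = tag
                delay += self.frame_delay
        return delay
```

Loading a task into frames that already hold its configuration costs nothing in this sketch, which is the effect that reuse-oriented placement policies try to exploit.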
Partitioning and Scheduling
- Partitioning
  - Coarse-grained: task level
  - Fine-grained: basic-block level
- Scheduling
  - Static (at design time)
  - Dynamic (at run time)
Challenges of Using DRL
1. Reconfiguration management
- Goal: minimize the number of reconfigurations
  - Reconfiguration delays execution
  - Reconfiguration consumes power
- How?
  - Task ordering
  - Pre-fetching
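To make the effect of task ordering concrete, here is a small sketch. It assumes independent tasks and a device that holds a single configuration at a time; the function names and the example configurations are hypothetical.

```python
def count_reconfigurations(order, config_of):
    """Count how often the loaded configuration changes along `order`."""
    loaded = None
    count = 0
    for task in order:
        cfg = config_of[task]
        if cfg != loaded:
            count += 1
            loaded = cfg
    return count


def group_by_configuration(tasks, config_of):
    """Reorder independent tasks so that equal configurations run back to back."""
    first_seen = {}
    for task in tasks:
        # Remember the order in which configurations first appear.
        first_seen.setdefault(config_of[task], len(first_seen))
    # sorted() is stable, so tasks keep their relative order within a group.
    return sorted(tasks, key=lambda t: first_seen[config_of[t]])


config_of = {"t1": "fir", "t2": "fft", "t3": "fir", "t4": "fft"}
tasks = ["t1", "t2", "t3", "t4"]
grouped = group_by_configuration(tasks, config_of)
```

In this example the original order needs four reconfigurations and the grouped order needs two. Real task graphs have precedence constraints, so a scheduler can only reorder among ready tasks.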
Representative Co-synthesis Systems
- CORDS (Princeton University)
- CRUSADE (Bell Labs)
- SLOPES (Princeton University)
- NIMBLE Compiler
- Recent: run-time scheduling (by Juanjo Noguera, Rosa M. Badia)
NIMBLE Compiler
- Partitioning algorithm
  - Selects which loops to implement in the FPGA, and which hardware version of each loop should be used, to achieve the highest application-level performance
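The selection problem stated above can be sketched as an exhaustive search. This is not the NIMBLE algorithm; it is an illustrative simplification that assumes all selected loops must fit in the FPGA together (a single shared area budget) and that reconfiguration cost is ignored. The function name, the data layout, and the numbers are hypothetical.

```python
from itertools import product


def select_loops(loops, area_budget):
    """Pick, per loop, software or one hardware version to minimize total time.

    loops: {name: {"sw": time, "hw": [(time, area), ...]}}
    Returns (best_total_time, {name: version_index}), where index -1 means software.
    """
    names = list(loops)
    options = [[-1] + list(range(len(loops[n]["hw"]))) for n in names]
    best_time, best_choice = None, None
    for combo in product(*options):
        time = area = 0
        for name, idx in zip(names, combo):
            if idx < 0:
                time += loops[name]["sw"]
            else:
                hw_time, hw_area = loops[name]["hw"][idx]
                time += hw_time
                area += hw_area
        if area <= area_budget and (best_time is None or time < best_time):
            best_time, best_choice = time, dict(zip(names, combo))
    return best_time, best_choice


loops = {
    "L1": {"sw": 100, "hw": [(40, 6), (20, 10)]},
    "L2": {"sw": 80, "hw": [(30, 5)]},
}
best_time, best_choice = select_loops(loops, area_budget=11)
```

With a budget of 11 area units, the fastest version of L1 cannot coexist with L2 in hardware, so the search picks the smaller version of L1 together with L2. Exhaustive search grows exponentially with the number of loops, which is why practical compilers rely on heuristics.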
NIMBLE Compiler
- Multiple loop implementations in HW
NIMBLE Compiler
- Heuristic using the loop-procedure hierarchy graph
SLOPES
- Multi-objective: price, power, performance
- Genetic algorithm for partitioning and allocation
- Scheduling heuristic
  - Takes into account the delay and power overheads of dynamic reconfiguration
Scheduling Issues
- Scheduling sequence
  - Multiple ready tasks may reside in the candidate pool
  - Tasks differ in time, resource, and reconfiguration requirements, and in power consumption
  - Changing the scheduling order may have a significant impact on scheduling quality
Scheduling Issues
- Location assignment policy
  - There are several possible positions in the FPGA where the circuit implementing a task can be located
  - The chosen location not only influences the current task, but may also impact tasks scheduled before or after it
SLOPES Scheduling
- Scheduling sequence
  - The order of scheduling tasks is determined dynamically by task priorities
- Location assignment policy
  - The global reconfiguration information for all the tasks assigned to the FPGA is considered
Examples
Scheduling Sequence Policy
- Dynamic priority assignment
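A rough sketch of dynamic priority assignment in a list scheduler follows. The priority used here (least slack first, with the reconfiguration delay counted only when the needed configuration is not already loaded) is an assumption chosen for illustration, not the priority function used by SLOPES. It also assumes a single-configuration device and one task running at a time.

```python
def schedule(tasks, preds, reconfig_delay):
    """List-schedule tasks, recomputing priorities at every step.

    tasks: {name: (exec_time, deadline, configuration)}
    preds: {name: set of predecessor names}
    Returns a list of (name, start, finish) tuples.
    """
    done, order = set(), []
    now, loaded = 0, None
    while len(done) < len(tasks):
        ready = [t for t in tasks if t not in done and preds.get(t, set()) <= done]
        if not ready:
            raise ValueError("cyclic dependencies")

        def slack(t):
            # Slack depends on the current time and the loaded configuration,
            # so the priority of a task changes as scheduling proceeds.
            exec_time, deadline, cfg = tasks[t]
            overhead = 0 if cfg == loaded else reconfig_delay
            return deadline - (now + overhead + exec_time)

        chosen = min(ready, key=lambda t: (slack(t), t))
        exec_time, _, cfg = tasks[chosen]
        if cfg != loaded:
            now += reconfig_delay
            loaded = cfg
        start = now
        now += exec_time
        order.append((chosen, start, now))
        done.add(chosen)
    return order


tasks = {"a": (2, 10, "X"), "b": (2, 10, "Y"), "c": (2, 20, "X")}
preds = {"c": {"a"}}
result = schedule(tasks, preds, reconfig_delay=3)
```

After task a finishes, task c could run without reconfiguration, but b has less slack and is chosen first. A priority function that also weighs reconfiguration cost or power could make the opposite choice.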
Location Assignment Policy
- Reconfiguration prefetch
- Configuration pattern reutilization
- Eviction candidate
- Fitting policy
- Slack time utilization
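As one possible reading of configuration pattern reutilization, eviction, and fitting, the sketch below picks a start frame for a task on a one-dimensional frame array. The scoring rule (fewest frames to load, then fewest occupied frames evicted, then lowest index) and the `(task, offset)` frame tags are assumptions for illustration; they are not taken from SLOPES.

```python
def choose_location(frames, task, width):
    """Return (start_frame, frames_to_load) for placing `task`.

    frames: list where each entry is None (empty) or a (task, offset) tag.
    """
    best = None
    for start in range(len(frames) - width + 1):
        window = frames[start:start + width]
        # Frames that already hold the right pattern can be reused as-is.
        to_load = sum(1 for k, tag in enumerate(window) if tag != (task, k))
        # Occupied frames that would be overwritten are eviction candidates.
        evicted = sum(
            1 for k, tag in enumerate(window)
            if tag is not None and tag != (task, k)
        )
        key = (to_load, evicted, start)
        if best is None or key < best:
            best = key
    if best is None:
        raise ValueError("task is wider than the device")
    return best[2], best[0]


frames = [("A", 0), ("A", 1), None, None, ("B", 0)]
reuse = choose_location(frames, "A", 2)
fresh = choose_location(frames, "C", 2)
```

Here task A is placed back onto its old frames at no cost, while the new task C goes into the two empty frames so that neither A nor B is evicted.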
Location Assignment Policy
- Frame priorities
Dynamic Run-time Scheduling
- Motivations
  - Data-dependent computation
  - Multi-function systems
Proposed Architecture Model
Partitioning: List Based
Scheduler
Scheduling
Conclusion
- Low-delay reconfigurable devices
- Automated co-synthesis systems using DRL are able to meet specifications cost-efficiently
- Reduced design time