
Design Flow from Domain Specific Languages to Embedded Multiprocessors
William Plishker, Kaushik Ravindran, Kurt Keutzer
http://chess.eecs.berkeley.edu
May 11, 2005

Trend towards Embedded Multiprocessors
Popular examples:
–Network processors (Intel, Motorola, etc.)
–Graphics (NVIDIA)
–Gaming (IBM, Sony, and Toshiba)
–Research (RAW, iWarp, etc.)
The processor is the basic building block. Software flexibility is key.
[Figure: Tensilica MPSoC processor: instruction fetch, core and extension register files, core and extension ALUs, data load/store with caches, instruction and data RAM/ROM, system bus, XLMI (peripherals), timers, interrupts]

Programming Challenges
–Multiple processing elements
–Heterogeneous memories
–Special purpose hardware
An implementation gap separates the natural representation of the application from the low-level programming environment of the target. For application specific programmable systems to succeed, it is necessary to deliver high-performance implementations quickly, and to extract parallelism from the application without explicit designer intervention.

Domain Specific Languages
Natural representation of the application:
–DSLs are tailored to an application domain with:
–component libraries
–communication and computation semantics
–visualization tools
–test suites
[Figure: Click router configuration: FromDevice(0)…FromDevice(n) feeding IPVerify, Lookup IPRoute, DiffServ, DecIPTTL, and Discard elements, leading to ToDevice(0)…ToDevice(n)]

Proposed Design Approach
–Application specification in a domain specific language (DSL)
–Abstract model of the architecture, and transformation of the application to an execution model
–Automated mapping from the execution model to the target architecture

Key Models
Computation Model
–Abstract model to represent concurrency
–Natural to the application domain
Architectural Model
–Capture those features of the architecture which most impact performance
–Define components which must be annotated in the application to facilitate good mappings
Execution Model
–Description of computation on the target hardware
–Task graph with platform specific computation and memory annotations

Mapping Procedure
–Transform the application description in the DSL to the execution model
–Explore the design space of assigning computation and communication to architectural resources
–Produce sequential code to be handed off to traditional compilation techniques

Example Flow
[Figure: application description, profile, and architecture configuration feed design space exploration (HW/SW partitioning, task allocation, communication assignment), followed by compilation / synthesis]
Analytical Models for the Architecture
–Profile information for task execution times
–Assume performance and communication requirements can be evaluated statically
Constraint Formulation and Optimization Methods
–Partition tasks between processing elements
–Assign application state to memory
–Assign communication to hardware links
–Find the optimal configuration to maximize a chosen performance metric
[Figure: forwarding example: packet header and memory read through Receive, Lookup Stage 1, Lookup Stage 2, and Transmit on two branches (tasks R1, R2, L11, L12, L21, L22, T1, T2), mapped onto PEs, FPGA, and memory]

Generating an Application Execution Model
–Unravel application tasks to expose concurrency
–Partition application components into tasks
–Annotate memory and communication requirements
[Figure: high-level optimizations and task graph boundary formation (queue requirements, schedulable element rates, periodicity) produce the application execution model, annotated with element implementation options, computation requirements per option, communication requirements, and shared resources]

Current Work
Mapping network applications to multiple platforms:
–Application in the Click DSL
–Target multiprocessors: IXP2xxx network processor, Xilinx Virtex2 VP50 soft multiprocessor
–Integer-linear programming approaches for task allocation
[Figure: platform-independent flow from the application description in the DSL through high-level optimizations to the execution model; platform-dependent mapping to the Intel IXP2800 network processor (XScale, ME clusters, SRAM, SDRAM, scratchpad, hash unit, media switch fabric; translation to IXP-C) and to a soft multiprocessor on FPGA logic (MicroBlaze (MB) PEs; configure architecture: PEs, memory, interconnect; HW/SW partitioning; element implementation selection, element tuning, arbitration scheme selection, floor planning; translation to C and RTL, producing programs + MHS)]
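The "Generating an Application Execution Model" step partitions application components into tasks. The code below is a minimal sketch of one way to form task boundaries. It rests on our own assumption, not something stated on the poster: boundaries sit at Queue elements, so each task is a connected group of elements between queues. The function name partition_into_tasks and the element list in the usage lines are hypothetical, loosely modeled on the router figure.

```python
from collections import defaultdict


def partition_into_tasks(elements, edges, is_boundary):
    """Group elements into tasks (hypothetical helper).

    Assumption: a task is a connected component of the element graph
    once boundary elements (here: queues) and every edge touching them
    are removed. Each task is returned as a sorted list of names.
    """
    adj = defaultdict(set)
    for src, dst in edges:
        if is_boundary(src) or is_boundary(dst):
            continue  # an edge into or out of a queue never merges tasks
        # Direction is irrelevant for grouping, so store both ways.
        adj[src].add(dst)
        adj[dst].add(src)

    seen, tasks = set(), []
    for start in elements:
        if is_boundary(start) or start in seen:
            continue
        # Iterative depth-first search collects one component.
        seen.add(start)
        stack, component = [start], []
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        tasks.append(sorted(component))
    return tasks


if __name__ == "__main__":
    # Hypothetical single-port forwarding path; "Queue0" is an assumed
    # queue separating receive-side and transmit-side processing.
    elements = ["FromDevice(0)", "IPVerify", "LookupIPRoute", "Discard",
                "Queue0", "DecIPTTL", "ToDevice(0)"]
    edges = [("FromDevice(0)", "IPVerify"), ("IPVerify", "LookupIPRoute"),
             ("IPVerify", "Discard"), ("LookupIPRoute", "Queue0"),
             ("Queue0", "DecIPTTL"), ("DecIPTTL", "ToDevice(0)")]
    for task in partition_into_tasks(elements, edges,
                                     lambda name: name.startswith("Queue")):
        print(task)
```

With this input, the receive side (FromDevice(0), IPVerify, LookupIPRoute, Discard) and the transmit side (DecIPTTL, ToDevice(0)) become two separate tasks. A real flow would also attach the queue sizes and element rates that the figure lists as annotations.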
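The poster names integer-linear programming for task allocation but does not give a formulation. The block below is one plausible sketch, built entirely on our own assumptions. The symbols are:

- T: the set of tasks from the execution model.
- P: the set of processing elements.
- x_{t,p}: a binary variable, 1 if task t runs on PE p.
- w_{t,p}: the profiled execution time of t on p.
- m_t: the memory footprint of t.
- M_p: the memory capacity of p.
- c_{t,u}: the communication volume on task-graph edge (t,u) in E.
- z_{t,u}: a binary variable marking edges that cross PEs.
- B: a single interconnect budget.
- L: the load of the most heavily loaded PE.

Minimizing L maximizes throughput only if throughput is limited by the slowest PE. That is one choice of the "performance metric" the poster leaves open.

```latex
\begin{aligned}
\min_{x,\,z,\,L}\quad & L \\
\text{s.t.}\quad & \sum_{p \in P} x_{t,p} = 1 && \forall t \in T \\
& \sum_{t \in T} w_{t,p}\, x_{t,p} \le L && \forall p \in P \\
& \sum_{t \in T} m_t\, x_{t,p} \le M_p && \forall p \in P \\
& z_{t,u} \ge x_{t,p} - x_{u,p} && \forall (t,u) \in E,\ \forall p \in P \\
& \sum_{(t,u) \in E} c_{t,u}\, z_{t,u} \le B \\
& x_{t,p} \in \{0,1\},\quad z_{t,u} \in \{0,1\}
\end{aligned}
```

If t and u share a PE, every difference x_{t,p} - x_{u,p} is 0, so z_{t,u} may be 0. If they are split, the difference is 1 on the PE hosting t, which forces z_{t,u} = 1 and charges c_{t,u} against B.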
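The "Constraint Formulation and Optimization Methods" bullets can be made concrete on a toy instance. The sketch below is a plain exhaustive search, swapped in for the integer-linear programming approach the poster names because it needs no solver. It only scales to a handful of tasks.

It assumes, as the Analytical Models bullets do, that profiled execution times and communication volumes are known statically. The function allocate, its parameters, and all numbers in the usage block are hypothetical, loosely modeled on the Receive / Lookup Stage 1 / Lookup Stage 2 / Transmit example.

```python
from itertools import product


def allocate(tasks, pes, exec_time, mem, capacity, comm):
    """Exhaustively assign every task to one processing element (PE).

    Minimizes the bottleneck load (largest summed execution time on any
    PE) subject to per-PE memory capacity, breaking ties by the total
    communication volume on task-graph edges that cross PEs.
    Returns (bottleneck, cut_volume, assignment), or None if no
    assignment satisfies the memory constraints.
    """
    best = None
    # |pes| ** |tasks| candidates: only viable for very small instances.
    for choice in product(pes, repeat=len(tasks)):
        assign = dict(zip(tasks, choice))
        load = {p: 0 for p in pes}
        used = {p: 0 for p in pes}
        for t, p in assign.items():
            load[p] += exec_time[t][p]  # profiled time of task t on PE p
            used[p] += mem[t]
        if any(used[p] > capacity[p] for p in pes):
            continue  # memory-capacity constraint violated
        cut = sum(v for (t, u), v in comm.items() if assign[t] != assign[u])
        key = (max(load.values()), cut)
        if best is None or key < (best[0], best[1]):
            best = (key[0], key[1], assign)
    return best


if __name__ == "__main__":
    # Hypothetical numbers for a receive / two-stage lookup / transmit chain.
    tasks = ["R", "L1", "L2", "T"]
    pes = ["PE0", "PE1"]
    work = {"R": 2, "L1": 4, "L2": 4, "T": 2}
    exec_time = {t: {p: w for p in pes} for t, w in work.items()}
    mem = {"R": 1, "L1": 3, "L2": 3, "T": 1}
    capacity = {"PE0": 4, "PE1": 4}
    comm = {("R", "L1"): 1, ("L1", "L2"): 5, ("L2", "T"): 1}
    print(allocate(tasks, pes, exec_time, mem, capacity, comm))
```

With 4 memory units per PE, the two lookup stages cannot share a PE. The search therefore splits them and pairs the receive task with the first lookup and the transmit task with the second, which balances the load and crosses only the lookup-to-lookup edge. The bottleneck objective reflects pipelined packet processing; a different metric can be substituted by changing key.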