Design Support for Embedded Processors and Applications
Prof. Kurt Keutzer
EECS, University of California
Berkeley, CA
keutzer@eecs.berkeley.edu

Embedded system needs meet technology constraints
• Embedded system design needs:
  - Fast time to market
  - Predictability
  - Reliability
  - Robustness
  - Efficiency
  - Economy
• Application-specific integrated circuits (ASICs) are failing to deliver this:
  - Design risk
  - Design/tool cost
  - Unmanageable complexity

Demise of ASIC: Total IC Designs
[Figure: total IC designs, ASSP versus ASIC. Source: Handel Jones, IBS, 9/23/2002]

Customer needs meet technology constraints
• Customer needs:
  - Fast time to market
  - Predictability
  - Reliability
  - Robustness
  - Efficiency
  - Economy
• Looking for "platforms": devices that will amortize system design costs over multiple generations
• "Based on our analysis, having a software approach is the only way to scale to the next generation," Corgan (Intel PMM) said. "If you have to approach each fourfold increase in speed, from OC-48 [2.5 Gbits/second] to OC-192 [10 Gbits/s], say, with a new architecture, it's not cost-effective."

Solution: ASIC => ASSP => ASIP: Programmable Platforms
• Develop platforms that allow for amortization of design costs over multiple generations
• Make platforms programmable so that they have maximum flexibility with minimum overhead
[Figure: block diagram of an example programmable platform: StrongARM core with I$, D$, and mini D$; µEngines; hash engine; IX bus interface; scratchpad SRAM; SDRAM controller; SRAM controller; PCI interface]

Example: Philips Nexperia™
[Figure: Nexperia DVP system silicon: a MIPS CPU (PRxxxx) and a TriMedia CPU (TM-xxxx), each with I$ and D$, plus device IP blocks on PI buses, connected through the MMI to SDRAM over the DVP memory bus]
• General-purpose RISC processor (MIPS)
  - 50 to 300+ MHz
  - 32-bit or 64-bit
• VLIW media processor (TriMedia)
  - 100 to 300+ MHz
  - 32-bit or 64-bit
• Library of device blocks
  - Image coprocessors
  - DSPs
  - UART
  - 1394
  - USB
• Nexperia system buses
  - PI bus
  - Memory bus
  - 32-128 bit
  - …and more
Flexible architecture for digital video applications

Configurable/Reconfigurable Processors
[Figure: landscape of configurable and reconfigurable processors. Labels: Domain-Specialization; Specialized Instruction-Set Architectures; Specialized Micro-Architectures; Processor; FPGA. Vendors shown: Morphics, PMC-Sierra, Improv Systems, Chameleon Systems, Frontier Design, Tensilica, Xilinx, ARC, Altera, Triscend, eASIC, Actel, Adaptive Silicon, Proceler, Atmel]

Galaxy of Network Processors
[Figure: scatter plot of network processors by number of PEs (y-axis, 0 to 64) versus issue width per PE (x-axis, 0 to 10), with contour lines at 8, 16, and 64 instrs/cycle. Vendors shown: EZchip, Cisco, IBM, Lexra, Motorola, Cognigine, Xelerated, ClearSpeed, Vitesse, Alchemy, Conexant, Intel, Applied Micro, Agere, BRECIS, Broadcom, PMC-Sierra, Clearwater]

ASIP/Programmable Platform Characteristics
5 Axes of the Architectural Design Space
• Approaches to parallel processing
  - Processing element (PE) level
  - Instruction level
  - Bit level
• Elements of special-purpose hardware
• Structure of memory architectures
• Types of on-chip communication mechanisms
• Use of peripherals
[Figure: example PEs built from RISC + SFU, alongside RISC, Ethernet MAC, CAM, coprocessor, and SRAM blocks]
Era of a single RISC PE (+ 1 DSP) over!
This is not a "run POSIX on an ARM" problem!
Explosion of application specific

Three Key Problem Areas Emerge
• Development of programmable platforms/ASIP:
  - Characterizing target applications
  - Design space exploration
• Deployment of programmable platforms/ASIP:
  - Development of programming model
  - Provision of software environment
• Mapping applications onto programmable platforms/ASIP:
  - Application modeling
  - Application mapping

Addressing the problem areas
MESCAL: Modern Embedded Systems: Compilers, Architectures, and Languages
MESCAL research mission: To bring a disciplined methodology, and a supporting tool set, to the development, deployment, and programming of application-specific programmable platforms, aka application-specific instruction processors.
[Figure: block diagram of an example programmable platform: StrongARM core with I$, D$, and mini D$; µEngines; hash engine; IX bus interface; scratchpad SRAM; SDRAM controller; SRAM controller; PCI interface]
• Invited paper: K. Keutzer, S. Malik, R. Newton, "From ASIC to ASIP: The Next Design Discontinuity", Proceedings of ICCD, pp. 84-91, 2002.
• Press coverage, Sept 2002:
  - "Programmable Platforms will Rule": http://www.eetimes.com/story/OEG20020911S
  - "High on MESCAL": http://www.eetimes.com/story/OEG20020911S
• www.gigascale.org/mescal

Three Key Problem Areas
• Development of programmable platforms:
  - Characterizing target applications
  - Design space exploration
• Deployment of programmable platforms:
  - Development of programming model
  - Provision of software environment
• Mapping applications onto programmable platforms:
  - Application modeling
  - Application mapping

Complementary Issues
[Figure: IP forwarding engine between ports, linking heterogeneous applications to heterogeneous programmable platforms/ASIP]
• Programming environment
  - Domain-specific models
  - Programming model
    - Domain-specific libraries
    - Environmental models
  - "Software architecture"/MOC
  - Primitive computation and communication mechanisms
• Mapping to implementation
  - Domain-specific presentation
    - Device-specific libraries
    - Environmental support
  - System architecture/micro-architecture/MOC
  - Primitive computation and communication mechanisms

Our Approach
• Bottom-up view: create abstractions of existing devices
  - Opacity: hide micro-architectural details from the programmer
  - Visibility: expose sufficient detail of the architecture to allow the programmer to improve the efficiency of the program
• Top-down view: experiment with existing modeling/programming environments
  - Learn from their abstractions of the devices
  - Try to maximize performance within these environments

Our Constraint/Angle/Prejudice
• In real-time embedded systems, correct logical functionality can never be divorced from system performance
• In commercial (especially consumer-oriented) embedded systems, system price is of utmost concern
• Quantitative
  - (Quantitatively) examine trade-offs among:
    - Quality of results (e.g. speed, but also power and device cost)
    - Programmer productivity (how long does all this take?)

Application: IPv4 Forwarding Benchmark
[Figure: 16 ingress ports (Port 0 to Port 15) feed input FIFOs; the IP forwarding engine implements the forwarding functionality; output FIFOs feed 16 egress ports (Port 0 to Port 15)]
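
To make the benchmark's per-packet work concrete, here is a minimal Python sketch of one forwarding step. The slides do not spell out the benchmark's exact requirements, so this assumes the usual core of IPv4 forwarding: validate the header checksum, decrement the TTL, look up the destination by longest-prefix match, and recompute the checksum. The route table, the function names (`ipv4_checksum`, `lookup`, `forward`, `make_packet`), and the linear-scan lookup are illustrative assumptions, not the benchmark's implementation or IXP1200 code.

```python
import ipaddress
import struct


def ipv4_checksum(header: bytes) -> int:
    """One's-complement checksum over the 16-bit words of an IPv4 header."""
    words = struct.unpack(f"!{len(header) // 2}H", header)
    total = sum(words)
    while total >> 16:                      # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def lookup(table, dst):
    """Longest-prefix match by linear scan; returns an egress port or None."""
    addr = ipaddress.IPv4Address(dst)       # accepts str, int, or 4 packed bytes
    matches = [(net.prefixlen, port) for net, port in table if addr in net]
    return max(matches)[1] if matches else None


def forward(packet: bytes, table):
    """Return (egress_port, rewritten_packet), or None if the packet is dropped."""
    if len(packet) < 20:
        return None
    ihl = (packet[0] & 0x0F) * 4            # header length in bytes
    if packet[0] >> 4 != 4 or ihl < 20 or len(packet) < ihl:
        return None
    header = bytearray(packet[:ihl])
    if ipv4_checksum(bytes(header)) != 0:   # a valid header sums to zero
        return None
    if header[8] <= 1:                      # TTL expired (a real router would
        return None                         # also send ICMP Time Exceeded)
    port = lookup(table, bytes(header[16:20]))
    if port is None:
        return None
    header[8] -= 1                          # decrement TTL
    header[10:12] = b"\x00\x00"             # clear, then recompute checksum
    header[10:12] = struct.pack("!H", ipv4_checksum(bytes(header)))
    return port, bytes(header) + packet[ihl:]


def make_packet(src: str, dst: str, ttl: int, payload: bytes = b"") -> bytes:
    """Build a minimal 20-byte-header IPv4 packet (test helper)."""
    hdr = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + len(payload), 0, 0,
                      ttl, 17, 0,
                      ipaddress.IPv4Address(src).packed,
                      ipaddress.IPv4Address(dst).packed)
    hdr = hdr[:10] + struct.pack("!H", ipv4_checksum(hdr)) + hdr[12:]
    return hdr + payload


# Hypothetical route table: (prefix, egress port in the range 0..15).
TABLE = [
    (ipaddress.ip_network("0.0.0.0/0"), 0),
    (ipaddress.ip_network("192.168.0.0/16"), 3),
    (ipaddress.ip_network("192.168.1.0/24"), 7),
]
```

On a network processor the lookup would typically use a trie or a hardware hash unit rather than a linear scan; the scan is used here only to keep the sketch short.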

Example programming target: IXP1200
• Intel IXP1200
  - Multiple processors
  - Specialized execution units
  - Hardware context swap
• "Intel Corp., … chose a "fully programmable" architecture with plenty of space for users to add their own software, but one that turned out to be difficult to program."
• http://www.eetimes.com/story/OEG20020830S0061
• How do we program these architectures? What's the right programming model?
[Figure: IXP1200 block diagram: StrongARM core with I$, D$, and mini D$; µEngines; hash engine; IX bus interface; scratchpad SRAM; SDRAM controller; SRAM controller; PCI interface]

Baseline reference implementations from Intel
Assembler
• Reference application
µEngine C
• Reference application, modified
Features
• Basic C language constructs such as loops, conditional statements, and basic data types (char, int, float)
• The IXP library defines additional data types, macros, and functions (useful for common networking applications)
• Memory management is user defined, so memory allocation must be declared explicitly (and there is no support for pointers)

Commercial NPU programming environment: Teja Technologies
• Teja was founded by Akash Deshpande, a student of Prof. Pravin Varaiya
• Based on his thesis "Control of Hybrid Systems" (1994)
Teja Language Features
• The user interacts mostly with the graphical interface (which exports predefined application primitives)
• Extending the Teja primitives is done via an FSM-based model (however, this still requires coding in assembly via the graphical interface)
• Memory management for predefined primitives is done by Teja. The user can alter this process, but doing so is tedious and error-prone

Teja Features

Our own NPU programming environment: NPClick
• Based on Click
  - Popular environment for describing/implementing network applications
  - Developed by Eddie Kohler (MIT => ICSI)
• NPClick
  - Implemented subset of element library in IXP µC
  - Element communication via function calls
    - Maintained semantics (packet push/pull)
  - Packet storage fixed:
    - Header in SRAM
    - Payload in DRAM
• Designer needs to specify:
  - Thread boundaries
  - Thread/µEngine assignment
  - Memory allocation of queues (SRAM, DRAM, Scratch)
• Opportunities for optimization (future work)
  - Redundant memory loads/stores based on element/thread mapping
  - Schemes for multiplexing hardware resources among multiple element instantiations (e.g. muxing the TFIFO among 8 ToDevice elements)
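
As an illustration of what "element communication via function calls" with push/pull semantics can look like, here is a minimal Python sketch. It is not the Click or NPClick API: the class names (`Element`, `DecTTL`, `Queue`, `ToDevice`), the dictionary packet representation, and the `memory` tag on the queue are hypothetical stand-ins chosen for brevity. Upstream elements push packets by calling the next element directly, and a queue converts push into pull, which makes it a natural place for a thread boundary.

```python
from collections import deque


class Element:
    """Base class: an element hands packets downstream with a plain call."""

    def __init__(self):
        self.next = None

    def connect(self, downstream):
        self.next = downstream
        return downstream

    def push(self, packet):
        raise NotImplementedError


class DecTTL(Element):
    """Push element: decrement the TTL, dropping packets whose TTL expires."""

    def push(self, packet):
        if packet["ttl"] <= 1:
            return                      # drop silently in this sketch
        packet["ttl"] -= 1
        self.next.push(packet)


class Queue(Element):
    """Push input, pull output: decouples a producer from a consumer."""

    def __init__(self, capacity, memory):
        super().__init__()
        self.capacity = capacity
        self.memory = memory            # hypothetical tag: "SRAM", "DRAM", "Scratch"
        self.items = deque()
        self.drops = 0

    def push(self, packet):
        if len(self.items) >= self.capacity:
            self.drops += 1             # tail drop when the queue is full
            return
        self.items.append(packet)

    def pull(self):
        return self.items.popleft() if self.items else None


class ToDevice:
    """Pull element: asks its upstream queue for a packet when it is ready."""

    def __init__(self, upstream):
        self.upstream = upstream
        self.sent = []

    def run_once(self):
        packet = self.upstream.pull()
        if packet is not None:
            self.sent.append(packet)
        return packet is not None


# Wire a tiny pipeline: DecTTL pushes into a queue, ToDevice pulls from it.
queue = Queue(capacity=2, memory="SRAM")
rx = DecTTL()
rx.connect(queue)
tx = ToDevice(queue)
for ttl in (64, 1, 5, 9):
    rx.push({"ttl": ttl})
```

In this sketch the push side (`rx`) and the pull side (`tx`) could run in different threads, with the queue's placement as one of the choices left to the designer.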

Programming Models for IXP1200

Productivity Estimates
• "First time" learning-curve issues make it difficult to compare the productivity of these approaches
• Based on our experience, we estimate the following design times for implementing an IPv4 router:

            Time to functional correctness    Additional time for performance tuning
  ASM       8 weeks
  µC        4 weeks                           6 weeks
  Teja      2 weeks                           3-4 weeks
  NPClick   2 days                            2 weeks

• The advantages of Teja and NPClick come from the ability to perform design-space exploration at a higher level

Conclusions: Programming Embedded Systems
• Neither ASICs nor general-purpose processors will fill the needs of most embedded system applications
• System design teams will increasingly choose ASIPs/programmable platforms
• Programming these devices is a new challenge:
  - Parallelism
    - Process
    - Operator
    - Bit/gate level
  - Special-purpose execution units
• Need to develop matches between application development environments and programming models of ASIPs/programmable platforms
• Match must consider:
  - Efficiency
  - Productivity
  - Robustness
  - Reliability