
Present and Future of Reconfigurable Systems
Reiner Hartenstein*, University of Kaiserslautern
CAPES / DFG Project: Universidade de Brasília, Universität Kaiserslautern, Universität Karlsruhe
November 14, 2003, Brasília, Brazil
*) IEEE Fellow

Literature (also downloads): http://hartenstein.de (also click "recent talks")
This page also links to the available Ph.D. theses: Becker, Herz, Kress, Nageldinger
© 2003, reiner@hartenstein.de

Reconfigurable Computing: a second programming domain
• Migration of programming to the structural domain
• The structural domain has become RAM-based
• The opportunity to introduce the structural domain to programmers... to bridge the gap by clever abstraction mechanisms, using a simple new machine paradigm

IT ages [timeline figure, 1957 to 2007: mainframe age, computer age (PC age), morphware age (here?), with flowware / data streams]: von Neumann does not support morphware

>> outline << (http://www.uni-kl.de)
• fine grain reconfigurable
• Placement and routing
• coarse grain reconfigurable
• Flowware
• Datastream-based Computing
• The Anti Machine Paradigm
• Final Remarks

fine grain
• Fine grain morphware platforms are already mainstream: reconfigurable logic, just logic design on a strange platform?
• speed-up of up to 3 orders of magnitude

you don't need specific silicon!
[figures: number of masks (12 to more than 30) and mask set (NRE) cost rising steeply as feature size shrinks from 0.8 µm to 0.07 µm [Dataquest, e.ASIC]; top 4 PLD manufacturers in 2000 (total: $3.7 billion): Xilinx 42%, Altera 37%, Lattice 15%, Actel 6%; rGA application segments: others 31%, PC 25%, communication 22%, consumer 16%, automotive 6%]
• FPGAs are going into every type of application, also SoC
• fastest-growing segment of the semiconductor market
• more than $7 billion by 2003 [Dataquest]

rGA with island architecture (detail): connect boxes and switch boxes

• Reconfigurable switch box: switch points

• Reconfigurable connect box: connect points

• Reconfigurable connect point (illustration); reconfigurable logic box

Connection activated (illustration); the lines for selecting the rLB's function are not shown; reconfigurable logic box

connect point activated • Routing

switch points activated (3 switch points) • Routing

Routing continued • Routing

• Routing: routing completed for 1 net (A to B)
20 transistors + 20 flip-flops. Placement and routing software has been known for 25 years: in 1979 Silvar-Lisco (Silicon Valley Research Corp.) offered CALM-P. Such network problems can be handled manually or with the help of graph theory.

Routing: long-distance nets (A to B). Passing through: long-distance wiring from rLBs outside this region. A path can be used only once at a time...

Routing congestion: C and D are not reachable; C cannot be connected with D. A bridge can be passed only once (bridges of Königsberg).


Leonhard Euler's problem of the bridges of Königsberg (1736) is such a network problem: find a way which passes each bridge exactly once... also an optimization: none of the bridges remains unused.

L. Euler: Solutio Problematis ad Geometriam Situs Pertinentis; Commentarii Academiae Scientiarum Imperialis Petropolitanae 8 (1736), pp. 128-140 [figure: the problem as a graph; nodes: Left Bank, Right Bank, Kneiphof Island, Other Island; edges: the bridges]
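Euler's criterion settles the question: a connected multigraph has a walk that crosses every edge exactly once if and only if 0 or 2 of its vertices have odd degree. The sketch below checks this for Königsberg. The bridge list encodes the usual textbook layout (Kneiphof linked twice to each bank and once to the other island, which is linked once to each bank); this layout is an assumption, not spelled out on the slide.

```python
from collections import Counter

# The seven bridges as edges between land masses (usual textbook layout, assumed).
BRIDGES = [
    ("Kneiphof", "Left Bank"), ("Kneiphof", "Left Bank"),
    ("Kneiphof", "Right Bank"), ("Kneiphof", "Right Bank"),
    ("Kneiphof", "Other Island"),
    ("Other Island", "Left Bank"), ("Other Island", "Right Bank"),
]

def degrees(edges):
    """Count how many bridge ends touch each land mass."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg

def is_connected(edges):
    """Depth-first search over the undirected multigraph."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    start = next(iter(adj))
    seen, stack = {start}, [start]
    while stack:
        for nxt in adj[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen) == len(adj)

def has_euler_walk(edges):
    """Euler: a walk using every edge exactly once exists iff the graph
    is connected and 0 or 2 vertices have odd degree."""
    odd = [v for v, d in degrees(edges).items() if d % 2 == 1]
    return is_connected(edges) and len(odd) in (0, 2)

print(dict(degrees(BRIDGES)))
print(has_euler_walk(BRIDGES))  # False: all four land masses have odd degree
```

Adding an eighth bridge between the two banks leaves only two odd-degree land masses, and the walk becomes possible.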

Data structures for graphs: adjacency matrix and adjacency list [figure: a directed and an undirected graph with nodes 1 to 4, each represented by an adjacency matrix (from / to) and by adjacency lists]
J. E. Hopcroft, R. E. Tarjan: Efficient algorithms for graph manipulation; Comm. ACM, 1973
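Both representations can be sketched in a few lines of Python. The 4-node edge set below is hypothetical (not the graph on the slide); for the undirected case each edge is entered in both directions, which makes the matrix symmetric.

```python
def to_matrix(n, edges, directed=True):
    """Adjacency matrix: m[i][j] == 1 iff there is an edge from node i+1 to node j+1."""
    m = [[0] * n for _ in range(n)]
    for a, b in edges:
        m[a - 1][b - 1] = 1
        if not directed:
            m[b - 1][a - 1] = 1
    return m

def to_lists(matrix):
    """Adjacency lists: for each node, the (1-based) nodes it points to."""
    return {i + 1: [j + 1 for j, bit in enumerate(row) if bit]
            for i, row in enumerate(matrix)}

# Hypothetical 4-node example graph.
EDGES = [(1, 2), (1, 3), (2, 4), (3, 2), (4, 3)]

directed = to_matrix(4, EDGES, directed=True)
undirected = to_matrix(4, EDGES, directed=False)
print(to_lists(directed))    # {1: [2, 3], 2: [4], 3: [2], 4: [3]}
print(to_lists(undirected))  # {1: [2, 3], 2: [1, 3, 4], 3: [1, 2, 4], 4: [2, 3]}
```

The matrix always costs n × n entries, while the lists grow with the number of edges, which is why sparse netlists are usually kept as lists.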

Large-scale routing: ENIAC, completed 1945
• Partitioning over racks in the hall
• Partitioning over card cages in the rack
• Partitioning over boards (cards) in card cages
• Partitioning over chips etc. on the card (e.g. SBC)
• Partitioning over blocks on the chip (e.g. microprocessor)

PCBs (printed circuit boards) for 40 years: planar "wiring"; MULTEC in Böblingen has produced printed circuit boards since 1963; the number of pins is limited

Integrated circuit (chip): limited number of pins; "wiring" on a planar ...

hierarchy (top to bottom, more levels possible): rack, card cage, card, chip, macro cell, basic cell [figure with example designs labelled KL 2 to KL 4, FTI 1, FTI 2, IMS 1 to IMS 3, JWGU, IMS Kaiserslautern 1]

wiring hierarchy: cables in the rack connect the card cages; card cage wiring connects the cards; card wiring connects the chips; on-chip wiring connects the macro cells and cells*
*) 1930s: telephone switching (without chips; crossbars and two-motion selectors (Hebdrehwähler) instead of cards); 1940s: first computers (without chips)

An obsolete application area [outline slide overlaid with the questions "before fabrication?" and "after fabrication?"]

Emulators: Dini Group, Quickturn, Celaro Pro (Mentor), PCI bus extender (Dini)

Crossbar [figure: full crossbar vs. partial crossbar, e.g. n = 8 and 4 x 4; crossbar chips in a row and number of crossbar chips as a function of n]

Routing on a logic card:
• 14 logic chips (Lchips) with 128 pins each (occasionally also used for route-through)
• 32 crossbar chips (Xchips) with 72 I/O pins each (for route-through only)
• each Xchip has 4 pins connected to each Lchip
• 8 logic cards per card cage; 8 Ychip cards per card cage
• 8 card cages per rack; backplane: 8 Zboard cards per rack
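The pin figures above fit together exactly, as a little arithmetic shows. The numbers come from the slide; the assumption that the Xchip pins left over go to the next hierarchy level (the Ychip cards) is ours, not stated on the slide.

```python
# Figures taken from the slide.
LCHIPS_PER_CARD = 14          # logic chips (Lchips) per logic card
LCHIP_PINS = 128              # pins per Lchip
XCHIPS_PER_CARD = 32          # crossbar chips (Xchips) per logic card
XCHIP_IO_PINS = 72            # I/O pins per Xchip
PINS_PER_LCHIP_XCHIP_PAIR = 4  # each Xchip has 4 pins wired to each Lchip
LOGIC_CARDS_PER_CAGE = 8
CAGES_PER_RACK = 8

# Every Lchip spreads its pins over all Xchips: 32 x 4 pins.
lchip_pins_used = XCHIPS_PER_CARD * PINS_PER_LCHIP_XCHIP_PAIR
# Each Xchip spends 14 x 4 pins on the Lchips of its card ...
xchip_pins_to_lchips = LCHIPS_PER_CARD * PINS_PER_LCHIP_XCHIP_PAIR
# ... leaving the rest for the next level (assumption: towards the Ychip cards).
xchip_pins_left = XCHIP_IO_PINS - xchip_pins_to_lchips
lchips_per_rack = LCHIPS_PER_CARD * LOGIC_CARDS_PER_CAGE * CAGES_PER_RACK

print(lchip_pins_used, xchip_pins_to_lchips, xchip_pins_left, lchips_per_rack)
# 128 56 16 896
```

So all 128 Lchip pins are consumed by the partial crossbar, and one rack holds 896 logic chips.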

Crossbar?
• 1913: J. N. Reynolds' crossbar switch (patent granted 1915)
• 1919: Betulander's crossbar switch
• 1926: first public telephone switching application in Sweden
• 1964: NASA telemetry crossbar array

RWC (Real World Computing, Japan): 40 TFLOPS; crossbar weight: 220 tons, 3000 km of cable; 5120 processors with 5000 pins each

Routing congestion example (rGA): direct connection impossible, detour connection
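A classic way for a router to find such a detour is Lee's maze-routing algorithm: a breadth-first wave spreads from the source over free cells, then the path is traced back from the target. This technique is not named on the slide; the sketch below uses a hypothetical grid in which `#` marks cells already occupied by other nets.

```python
from collections import deque

def lee_route(grid, src, dst):
    """Lee-style maze routing: breadth-first wave expansion from src,
    then backtrace from dst. grid[r][c] == '#' marks an occupied cell.
    Returns the cells of a shortest path, or None if unroutable."""
    rows, cols = len(grid), len(grid[0])
    prev = {src: None}
    wave = deque([src])
    while wave:
        cell = wave.popleft()
        if cell == dst:
            break
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] != '#' and (nr, nc) not in prev):
                prev[(nr, nc)] = cell
                wave.append((nr, nc))
    if dst not in prev:
        return None
    path, cell = [], dst
    while cell is not None:
        path.append(cell)
        cell = prev[cell]
    return path[::-1]

# Hypothetical congestion: the straight row from A (0,0) to B (0,4)
# is blocked by another net, so the router must take a detour.
GRID = [
    "..#..",
    "..#..",
    ".....",
]
path = lee_route(GRID, (0, 0), (0, 4))
print(len(path) - 1)  # 8 wire segments instead of 4 for the direct connection
```

If the blocking net also occupied the bottom row, `lee_route` would return `None`: the "C cannot be connected with D" case.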

Routing-only configuration (2 examples) • Routing: rLB configured as identity function

Graphs, partitioning, algorithms:
• T. Uehara, W. M. van Cleemput: Optimal Layout of CMOS Functional Arrays; IEEE Trans. Computers C-30, pp. 305-312, May 1981
• B. Kernighan, S. Lin: An Efficient Heuristic Procedure for Partitioning Graphs; BSTJ 49, 1970
• C. Alpert, A. Kahng: Recent Directions in Netlist Partitioning: A Survey; Integration, vol. 19 (1-2), pp. 1-81, 1995
• T. Cormen et al.: Introduction to Algorithms; MIT Press / McGraw-Hill, 1991
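To make the Kernighan-Lin reference concrete, here is a simplified single pass of that heuristic for an unweighted graph and a balanced bipartition: repeatedly swap the unlocked pair with the largest gain (D_a + D_b - 2c_ab, where D is external minus internal edges), lock it, and keep the prefix of swaps with the best cumulative gain. It is a teaching sketch on a hypothetical graph, not a tuned netlist partitioner.

```python
def cut_size(edges, part_a):
    """Number of edges crossing the bipartition (part_a vs. the rest)."""
    return sum((u in part_a) != (v in part_a) for u, v in edges)

def kl_pass(nodes, edges, part_a):
    """One Kernighan-Lin pass; returns the improved side A."""
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    cur_a = set(part_a)
    cur_b = set(nodes) - cur_a

    def d_value(v, own, other):
        # Kernighan-Lin D value: external minus internal edges of v.
        return len(adj[v] & other) - len(adj[v] & own)

    locked, swaps, gains = set(), [], []
    for _ in range(min(len(cur_a), len(cur_b))):
        best = None
        for a in sorted(cur_a - locked):
            for b in sorted(cur_b - locked):
                gain = (d_value(a, cur_a, cur_b) + d_value(b, cur_b, cur_a)
                        - 2 * (b in adj[a]))
                if best is None or gain > best[0]:
                    best = (gain, a, b)
        gain, a, b = best
        # Tentatively exchange a and b, then lock both for the rest of the pass.
        cur_a.remove(a)
        cur_b.remove(b)
        cur_a.add(b)
        cur_b.add(a)
        locked |= {a, b}
        swaps.append((a, b))
        gains.append(gain)

    # Keep only the first k swaps, where k maximizes the cumulative gain.
    best_k, best_total, running = 0, 0, 0
    for k, gain in enumerate(gains, start=1):
        running += gain
        if running > best_total:
            best_k, best_total = k, running
    result = set(part_a)
    for a, b in swaps[:best_k]:
        result.remove(a)
        result.add(b)
    return result

# Hypothetical netlist: two 4-cliques joined by a single edge (4, 5).
NODES = range(1, 9)
EDGES = [(u, v) for group in ((1, 2, 3, 4), (5, 6, 7, 8))
         for i, u in enumerate(group) for v in group[i + 1:]] + [(4, 5)]
START = {1, 2, 5, 6}
improved = kl_pass(NODES, EDGES, START)
print(cut_size(EDGES, START), cut_size(EDGES, improved))  # 9 1
```

Temporarily accepting negative-gain swaps and then rolling back to the best prefix is what lets the heuristic escape some local minima.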

why emulators are obsolete: system gates per rGA chip [figure, Xilinx data: system gates per rGA chip from 1984 to 2004, log scale, rising through XC 4085 XL, XC 40250 XV and Virtex II to planned devices]

why the declining ASIC business? you don't need specific silicon!
• More and more, the prototyping platform of rGA-based systems will be delivered directly to the customer as the product: fully configured
• [figure: number of design starts, rGA-based vs. ASICs [N. Tredennick, Gilder Technology Report, 2003]]
• ASICs lost the battle; rGAs are the winners
• ASIC emulators have been a transient solution, now with declining commercial significance

Xilinx: full hierarchy on chip, from rack to chip
• Xilinx Virtex-II Pro FPGA architecture
• PowerPC 405 RISC CPU (PPC 405) cores
• FPGA fabric based on the Virtex-II architecture
[figure labels: RocketIO, PowerPC core, on-chip memory controller, embedded RAM] Source: Ivo Bolsens, Xilinx


focusing on coarse grain
• Fine grain morphware platforms are already mainstream: reconfigurable logic, just logic design on a strange platform
• Coarse grain platforms: Reconfigurable Computing, not that new, but shaking the fundamentals of CS curricula; an order of magnitude more MIPS/mW than fine grain

why coarse grain [figure: MOPS/mW (0.001 to 1000, log scale) vs. feature size (2 µm down to 0.07 µm), with curves for hardwired logic, coarse grain reconfigurable computing (rDPAs)*, FPGAs (reconfigurable logic), DSPs, and standard microprocessors (instruction set processors); axis from throughput (hardwired) to flexibility (von Neumann): coarse grain goes far beyond bridging the gap]
Sources: T. Claasen et al.: ISSCC 1999; *) R. Hartenstein: ISIS 1997

rDPA (Reconfigurable Datapath Array): rDPUs plus a reconfigurable interconnect fabric (RIF)
• either with a separate routing area
• or with the RIF laid out over the rDPUs: rDPA wired by abutment

CMOS interconnect resources: foundries offer up to 9 metal layers and up to 3 poly layers; the reconfigurable interconnect fabric is laid out over the rDPU cell

Commercial rDPAs: XPU family (IP cores), PACT Corp., Munich

Mapping algorithms efficiently onto an rDPA: SNN filter on the KressArray [figure: array size 10 x 16 = 160 rDPUs; some rDPUs used for route-through only; backbus connect not used; "Structured Configware Design" [R. H.]]. By the way: an example of scalability / relocatability through EDA support

• Routing: hundreds of rGAs or very large r...

Communication resource requirements: ... often the functional resources are not the throughput bottleneck. In some application areas, e.g. wireless communication, reconfigurable computing arrays need extraordinarily rich and powerful communication resources. The solution: generators for domain-specific RA platforms

KressArray family, generic fabrics: a few examples
• select the mode, the number and the width of the nearest neighbour (NN) ports; select the function repertory of the rDPU
• NN interconnect, an example: rDPUs for route-through only, or for route-through and function; more NN ports give rich routing resources
• examples of 2nd-level interconnect: laid out over the rDPU cell, no separate routing areas!
http://kressarray.de

Super Pipe Networks: the key is mapping, rather than architecture


Morphware machines vs. hardwired machines: a clear terminology helps a lot
platform | program source running on it
hardware | (not programmable)
morphware, fine grain: rGA (FPGA) | configware
morphware, coarse grain: rDPU, rDPA | configware
morphware: reconfigurable data stream processor | flowware & configware
hardwired data stream processor | flowware
hardwired instruction stream processor (vN) | software

Flowware defines which data item at which time at which port [figure: a DPA with input data streams and output data streams; each port is fed by, or delivers, a time-ordered sequence of data items]
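A flowware schedule can be pictured as a table mapping (time, port) to a data item. The sketch below builds one for the skewed input typical of systolic arrays, where port p receives its stream p time steps late. The skew scheme and the function names are illustrative assumptions, not a flowware language.

```python
def skewed_schedule(rows):
    """Build a flowware-style schedule {time: {port: item}}.
    Port p receives row p of the input, delayed by p time steps
    (the input skew typical of systolic arrays)."""
    schedule = {}
    for port, row in enumerate(rows):
        for k, item in enumerate(row):
            schedule.setdefault(port + k, {})[port] = item
    return schedule

def show(schedule, ports):
    """Print one line per time step; '-' marks an idle port."""
    for t in sorted(schedule):
        print(t, " ".join(str(schedule[t].get(p, "-")) for p in range(ports)))

DATA = [["a0", "a1", "a2"], ["b0", "b1", "b2"], ["c0", "c1", "c2"]]
sched = skewed_schedule(DATA)
show(sched, ports=3)
# 0 a0 - -
# 1 a1 b0 -
# 2 a2 b1 c0
# 3 - b2 c1
# 4 - - c2
```

The diagonal pattern is the textual counterpart of the staggered data streams in the figure: the datapath never fetches an instruction, it just sees the right item arrive at the right port.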

Paradigm shifts, Nick Tredennick's view: why 2 program sources?
• instruction-stream-based computing: resources fixed, algorithms variable; programmable by Software
• data-stream-based reconfigurable computing: resources variable, algorithms variable; programmable by Configware and Flowware

Flowware heading toward mainstream
• Data-stream-based computing is heading for mainstream:
– 1997 SCCC (LANL): Streams-C Configurable Computing
– SCORE (UCB): Stream Computations Organized for Reconfigurable Execution
– ASPRC (UCB): Adapting Software Pipelining for Reconfigurable Computing
– 2000 Bee (UCB), ...
– most stream-based multimedia systems, etc.
– many other areas...
• Flowware is mostly not yet modelled that way: most flowware is hidden by its indirect, instruction-stream-based implementation
• Flowware: managing data streams; Software: managing instruction streams

control-procedural vs. data-procedural
• The structural domain is primarily data-stream-based: flowware provides a (data-)procedural abstraction of the (data-stream-based) structural domain
• Flowware converts "procedural vs. structural" into "control-procedural vs. data-procedural": a Trojan horse to introduce the structural domain to the procedural mind set of programmers


Configware / flowware compilation [figure: a high-level source is translated by a mapper into configware for the rDPA (rData Path Array), with the "instruction" fetch done before run time, and by a scheduler into flowware for the data sequencer; address generators of asM (auto-sequencing memory) blocks in a distributed memory architecture supply the data streams; further labels: intermediate, wrapper]
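In such an asM the address stream comes from a dedicated address generator rather than from instructions executed in the datapath. Below is a minimal sketch of a generic nested-loop generator (base address plus one stride per loop level); the parameter names are hypothetical and do not describe any particular hardware interface.

```python
from itertools import product

def address_stream(base, counts, strides):
    """Hypothetical generic address generator: emits
    base + sum(i_k * stride_k) for all index tuples, outermost loop first,
    like cascaded hardware counters (no CPU instructions involved)."""
    if len(counts) != len(strides):
        raise ValueError("counts and strides must have equal length")
    for idx in product(*(range(c) for c in counts)):
        yield base + sum(i * s for i, s in zip(idx, strides))

# Scan a 2 x 3 block of a row-major, 8-column image starting at word 100:
# 2 rows (stride 8 words) of 3 pixels (stride 1 word).
print(list(address_stream(100, counts=(2, 3), strides=(8, 1))))
# [100, 101, 102, 108, 109, 110]
```

Swapping the loop parameters to `counts=(3, 2), strides=(1, 8)` scans the same block column by column, without changing anything in the datapath.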

>>> extremely high efficiency: flowware-based computing
1. avoiding address computation memory cycle overhead
2. avoiding instruction fetch and interpretation overhead
3. high parallelism, massively multiple deep pipelines
4. much less configuration memory
5. interconnect laid out over the cell: no extra routing areas
6. methodologies readily available

Programming language paradigms [figure comparing flowware languages with software languages; recoverable labels: "very easy to learn", "multiple GAs", "much more powerful", "more simple"]

Machine paradigms [figure; label: ("instruction fetch")] *) e.g. Bee project


computing paradigms and methodologies
• 1946: machine paradigm (von Neumann)
• 1980: data streams (Kung, Leiserson): flowware
• 1989: anti machine paradigm
• 1990: 1st rDPU* (Rabaey)
• 1994: anti machine high-level programming language
• 1995: super systolic rDPA (Kress)
• 1996+: SCCC (LANL), SCORE, ASPRC, Bee (UCB), ...
• 1997: 1st configware / software partitioning compiler
• 1997+: discipline of distributed memory architecture
*) rDPU = reconfigurable Data Path Unit

The secret of success: co-compilation supporting platform-based design [figure: a high-level programming-language source passes an analyzer / profiler and a partitioner driven by the resource parameters of the supported platforms; the partitioner splits it into a software (SW) part for the von Neumann machine paradigm, which the SW compiler turns into SW code, and a configware (CW) part for the anti machine paradigm, which the CW compiler turns into CW code]
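The partitioning step can be illustrated with a much simplified greedy rule: move the profiled regions that save the most run time per rDPU into configware while they fit the array, and leave the rest as software. The profile records, their numbers, and the greedy rule are all hypothetical; this is not the algorithm of the co-compiler on the slide.

```python
from dataclasses import dataclass

@dataclass
class Loop:
    """Profiled code region (hypothetical profile record)."""
    name: str
    sw_time: float  # measured run time on the host (ms)
    cw_time: float  # estimated run time on the rDPA (ms)
    area: int       # rDPUs needed when mapped to configware

def partition(loops, area_budget):
    """Greedy CW/SW partitioning: move the regions with the best time saved
    per rDPU to configware while they fit into the area budget."""
    candidates = sorted(
        (lp for lp in loops if lp.cw_time < lp.sw_time),
        key=lambda lp: (lp.sw_time - lp.cw_time) / lp.area,
        reverse=True,
    )
    cw, used = [], 0
    for lp in candidates:
        if used + lp.area <= area_budget:
            cw.append(lp.name)
            used += lp.area
    sw = [lp.name for lp in loops if lp.name not in cw]
    total = sum(lp.cw_time if lp.name in cw else lp.sw_time for lp in loops)
    return cw, sw, total

LOOPS = [
    Loop("fir", sw_time=40.0, cw_time=2.0, area=60),
    Loop("fft", sw_time=55.0, cw_time=5.0, area=120),
    Loop("parse", sw_time=10.0, cw_time=12.0, area=30),
    Loop("dct", sw_time=25.0, cw_time=3.0, area=50),
]
print(partition(LOOPS, area_budget=160))
# (['fir', 'dct'], ['fft', 'parse'], 70.0)
```

A real partitioner would also account for communication between host and array and for the resource parameters of each target platform; the area budget stands in for those parameters here.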

Machine paradigms [figure: von Neumann machine: a CPU with instruction sequencer and DPU, driven by an instruction stream from memory (Software), plus I/O; (reconfigurable) data-stream machine (anti machine): an (r)DPU or (r)DPA with a data sequencer (Flowware, Configware), fed by data streams from a distributed memory architecture* of asM (auto-sequencing memory) blocks with data address generators]
*) the new discipline came just in time: see Herz et al.: Proc. IEEE ICECS, 2002; also see the books by Francky Catthoor et al.

Synthesizable distributed memory architecture for a stream-based soft machine [figure: a compiler produces the "instructions" for the rDPA and the scheduler; memory banks (data memory) with sequencers (data stream generators) feed the rDPA]

PC replaced by PS (personal supercomputer) [timeline figure, 1957 to 2007: mainframe age; computer age (PC age) with µProc, von Neumann; morphware age with rDPA, anti machine, flowware / data streams and co-compiler]

all methodologies available [timeline figure, 1957 to 2007: µProc; morphware age with rDPA, flowware / data streams and co-compiler]: free know-how for the personal supercomputer ... and all other methodologies available from the literature

We have an education problem
• µprocessor + accelerators: crossing the hardware / software chasm [Mike Butts]
• It's the gap between the procedural and the structural mind set
• Traditional CS: programming is (control-)procedural, instruction-stream-based; sources: software
• The typical programmer has problems understanding function evaluation without machine mechanisms... we need a second machine paradigm

Ubiquitous embedded systems
• embedded software and configware have become the main vehicle for product differentiation... and the main focus in system design
• (performance and) flexibility are key issues
• current CS curricula do not qualify our students

misqualified: jobless CS graduates?
[figure: growth of embedded software, factor 4.1 per year [DTI law], vs. Moore's law, factor 2 per 18 months, up to 2010]
• 90% of all code is written for embedded systems*
• The real labor market: by 2010, 10 times more programmers will write embedded applications than computer software
*) Department of Trade and Industry, London


EDA industry revolution every 7 years: EDA industry paradigm switching every 7 years [Keutzer / Newton], McKinsey curves
• 1978: transistor entry: Applicon, Calma, CV...
• 1985: schematics entry: Daisy, Mentor, Valid...
• 1992: synthesis (HDLs): Cadence, Synopsys...
• 1999: the next switch

EDA, the main bottleneck [courtesy of Richard Newton]: math formula?

guess it! The biggest mistake of EDA

The next EDA industry revolution: von Neumann does not support morphware
EDA industry paradigm switching every 7 years [Keutzer / Newton], McKinsey curves
• 1978: transistor entry: Applicon, Calma, CV...
• 1985: schematics entry: Daisy, Mentor, Valid...
• 1992: synthesis (HDLs): Cadence, Synopsys...
• 1999: (co-)compilation: data-stream-based DPAs; higher abstraction level: math formula, TRS*
*) Term Rewriting Systems

Algorithmic cleverness needed
• We need an all-embracing taxonomy of algorithms and a survey of algorithm transformations: loop transformations, optimization, partitioning, signal processing, (de-)coding algorithms (wireless communication), image processing, sorting, and many more areas...
• Example, migration from a signal processor to an rGA: very high throughput on low-power, slow FPGAs is obtained only by algorithmic cleverness:

algorithmic cleverness needed for CS graduates in embedded systems
• the hardware / configware / software partitioning problem: current CS curricula do not qualify our students
• software / configware migration: current CS curricula do not qualify our students
• extending software engineering into software / flowware engineering: the anti machine paradigm and reconfigurable computing are the curricular enablers

>>> thank you

- END -

Appendix for discussion

Processor-memory performance gap [figure: performance (log scale, 1 to 1000) from 1980 to 2000; µProc (CPU): +60%/yr.; DRAM: +7%/yr.; the processor-memory performance gap grows 50%/year]
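The "50% per year" follows from the two growth rates on the slide: the gap grows by the ratio of the yearly factors, 1.60 / 1.07. The short computation below checks this and compounds it over the 1980 to 2000 span of the chart.

```python
CPU_GROWTH = 1.60   # µProc performance: +60% per year (from the slide)
DRAM_GROWTH = 1.07  # DRAM performance: +7% per year (from the slide)

gap_per_year = CPU_GROWTH / DRAM_GROWTH
print(f"gap grows by {gap_per_year - 1:.1%} per year")  # gap grows by 49.5% per year

def gap_after(years):
    """Relative processor/DRAM performance gap after a number of years."""
    return gap_per_year ** years

print(f"1980 -> 2000: gap x {gap_after(20):,.0f}")
```

Compounded over twenty years, the ratio reaches roughly three orders of magnitude, which is what the widening area between the two curves shows.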

Why a dichotomy of machine paradigms? vN: unbalanced
• the vN bottleneck... caches [stolen from Bob Colwell]
• data stream machine: bad message: caches do not help; good message: no vN bottleneck, caches not needed
• The anti machine has no von Neumann bottleneck

[Intel] "Pollack's Law" (simplified) [figure: growth factors of area, performance and efficiency]
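As Pollack's rule is commonly stated, processor performance grows roughly with the square root of the area (complexity) invested. Worked through for a doubling of area, this explains why efficiency (performance per area) falls:

```latex
\text{perf} \propto \sqrt{A}
\quad\Rightarrow\quad
\frac{\text{perf}(2A)}{\text{perf}(A)} = \sqrt{2} \approx 1.41,
\qquad
\frac{\text{perf}(2A)/(2A)}{\text{perf}(A)/A} = \frac{1}{\sqrt{2}} \approx 0.71
```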

Loop transformation examples: resource-parameter-driven co-compilation [figure: the sequential process "loop 1-16 body endloop" is mapped onto a host and a reconfigurable array by unrolling (host: "loop 1-8 trigger endloop"; array: "fork" of bodies for iterations 1-8 and 9-16, then "join") and by strip mining (host loops such as "loop 1-4 trigger endloop" and "loop 1-2 trigger endloop")]
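One reading of the figure, sketched in Python: unrolling lets each of 8 host triggers start two body instances in parallel (one from iterations 1-8, one from 9-16), while strip mining lets each host trigger run one strip of iterations on the array. The `body` function and this trigger model are hypothetical stand-ins; the point is that all variants compute the same results.

```python
def body(i):
    """Hypothetical loop body: any side-effect-free function of i."""
    return i * i

def sequential():
    # loop 1-16 body endloop
    return [body(i) for i in range(1, 17)]

def unrolled():
    # host: loop 1-8 trigger endloop; per trigger t the array forks two
    # body instances, one from iterations 1-8 and one from 9-16, then joins.
    results = [None] * 16
    for t in range(1, 9):
        for i in (t, t + 8):  # the two forked instances
            results[i - 1] = body(i)
    return results

def strip_mined(strip=4):
    # host: one trigger per strip; the array runs the inner loop
    # over the `strip` iterations of that strip.
    results = []
    for s in range(16 // strip):
        for i in range(s * strip + 1, (s + 1) * strip + 1):
            results.append(body(i))
    return results

print(sequential() == unrolled() == strip_mined())  # True
```

The strip size plays the role of a resource parameter: a larger array can take longer strips and fewer host triggers.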

The design crisis
• The long turnaround times of ASIC fabrication are becoming increasingly unaffordable
• Increasing demand: fast patches and upgrades, preferably at the customer's site; promoting the longevity of the product
[figure: product design cost vs. life cycle over the years]

Summary of the anti machine paradigm
• anti language primitives are almost the same (slightly extended)
• anti machine execution potential is dramatically more powerful
• provides drastically more flexibility
• not always replacing von Neumann

Reconfigurable Computing: a second programming domain
• Migration of programming to the structural domain
• The structural domain has become RAM-based
• Currently running: the next fundamental revolution after the introduction of the microprocessor
• However, CS curricula ignore this impact of Reconfigurable Computing, a key issue in embedded systems... causing the coming disaster of unqualified CS graduates pushing up the unemployment rate?

All enabling technologies are available
• literature from the last 30 years
• languages & (co-)compilation techniques
• the anti machine and all its architectural resources
• parallel memory IP cores and generators
• morphware vendors like PACT...
• anything else needed

New horizons
• A new RAM-based platform going mainstream
• Configware industry
• New machine paradigm
• New theory needed
• New architectures, without the vN bottleneck
• New compilation techniques
• More effective parallelism provided
• Rich material is already available in many areas
• Lots of similarities with the classical vN world
• But a few asymmetries: a challenge

Evangelist's material + lobby space
Evangelist's material:
• http://hartenstein.de (click "recent talks")
Lobby space:
• http://morphware.net
• http://configware.org
• http://data-streams.org
• http://flowware.net
Trailblazer group:
• you are welcome to improve, rewrite, post links...
• you are welcome to join the trailblazer group

The genius of von Neumann
• enormous impact of the von Neumann paradigm
• even stronger impact through a dichotomy of paradigms: von Neumann of matter, von Neumann of anti matter
• von Neumann machine vs. anti machine: this does not mean overthrowing von Neumann's monument
• it multiplies the glory of von Neumann

MPU performance stalled: Bill Gates' law (the relative computation time needed doubles every 2 years) had been compensated by Moore's law, which will stall soon for MPUs

Basics of binding time: time of "instruction fetch"
• run time: microprocessor, parallel computer
• loading time: Reconfigurable Computing
• compile time

Time to market
• Morphware brings a new dimension to digital system development and has a strong impact on SoC design
• Flexibility supports spin-around times of minutes instead of months for real-time in-system debugging, profiling, verification, tuning, field maintenance, and field upgrades
• A new business model (in-field debugging and upgrading...)
• A fundamental paradigm shift in silicon application
[figure [Tom Kean]: revenue per month over time (1 to 30 months) for a reconfigurable product with downloaded updates 1 and 2, compared with an ASIC product]

KressArray principles
• take systolic array principles
• replace classical synthesis by simulated annealing
• this yields the super systolic array, a generalization of the systolic array
• no longer restricted to regular data dependencies
• now reconfigurability makes sense
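To show what "replace classical synthesis by simulated annealing" means in practice, here is a minimal annealing placer. It places the operators of a small, hypothetical dataflow graph onto a grid of rDPUs so as to minimize total Manhattan wire length. Moves swap the contents of two cells, worse placements are accepted with probability exp(-delta / T), and T cools geometrically. The cost function, schedule and parameters are arbitrary choices for illustration, not those of the KressArray tools.

```python
import math
import random

def wirelength(placement, nets, cols):
    """Total Manhattan length of two-point nets. placement[k] is the
    operator placed in cell k (row-major order) or None for an unused rDPU."""
    where = {op: divmod(k, cols) for k, op in enumerate(placement) if op is not None}
    return sum(abs(where[a][0] - where[b][0]) + abs(where[a][1] - where[b][1])
               for a, b in nets)

def anneal(ops, nets, rows, cols, seed=1, t0=4.0, alpha=0.995, steps=3000):
    """Simulated-annealing placement: swap the contents of two cells, accept
    a worse placement with probability exp(-delta / T), cool T geometrically."""
    rng = random.Random(seed)
    placement = list(ops) + [None] * (rows * cols - len(ops))
    rng.shuffle(placement)
    cost = start_cost = wirelength(placement, nets, cols)
    best, best_cost = list(placement), cost
    temp = t0
    for _ in range(steps):
        i, j = rng.sample(range(len(placement)), 2)
        placement[i], placement[j] = placement[j], placement[i]
        new_cost = wirelength(placement, nets, cols)
        delta = new_cost - cost
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            cost = new_cost  # accept the move
            if cost < best_cost:
                best, best_cost = list(placement), cost
        else:
            placement[i], placement[j] = placement[j], placement[i]  # undo
        temp *= alpha
    return best, best_cost, start_cost

# Hypothetical example: an 8-stage pipeline (a chain of operators) on 3 x 3 rDPUs.
OPS = [f"op{k}" for k in range(8)]
NETS = [(OPS[k], OPS[k + 1]) for k in range(7)]
best, best_cost, start_cost = anneal(OPS, NETS, rows=3, cols=3)
print(start_cost, "->", best_cost)
```

For this chain the lower bound is 7 (one unit per net). Because the cost function only measures wire length, the same placer accepts irregular data dependencies as readily as regular ones.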

Significance of address generators
• Address generators have the potential to reduce computation time significantly
• In a grid-based design rule check, a speed-up of more than 2000 has been achieved, compared to a VAX-11/750
• Dedicated address generators contributed a factor of 10, by avoiding memory cycles for address computation overhead

Acceleration mechanisms
• parallelism by multi-bank memory architecture
• auxiliary hardware for address calculation
• address calculation before run time
• avoiding multiple accesses to the same data
• avoiding memory cycles for address computation
• improving parallelism by storage scheme transformations
• improving parallelism by memory architecture transformations
• alleviating interconnect overhead (delay, power and area)
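Two of these mechanisms, multi-bank memory and storage scheme transformations, interact as shown below. With plain low-order interleaving over 4 banks, reading one column of a row-major 4 x 4 matrix hits the same bank 4 times. A skewed storage scheme, which shifts each row by its row index, spreads the column over all banks. The bank count, the skew, and the function names are illustrative assumptions.

```python
from collections import Counter

BANKS = 4

def interleaved(addr):
    """Low-order interleaving: consecutive words go to consecutive banks."""
    return addr % BANKS, addr // BANKS

def skewed(row, col, width):
    """Skewed storage scheme for a row-major matrix whose width is a
    multiple of BANKS: shift each row by its row index so that a column
    also spans all banks. Returns (bank, offset within bank)."""
    return (col + row) % BANKS, (row * width + col) // BANKS

def max_conflicts(banks):
    """Largest number of accesses hitting one bank (1 = fully parallel)."""
    return max(Counter(banks).values())

WIDTH = 4  # a 4 x 4 matrix stored row-major
column0 = [(r, 0) for r in range(4)]

plain = [interleaved(r * WIDTH + c)[0] for r, c in column0]
skew = [skewed(r, c, WIDTH)[0] for r, c in column0]
print(plain, max_conflicts(plain))  # [0, 0, 0, 0] 4
print(skew, max_conflicts(skew))    # [0, 1, 2, 3] 1
```

Rows stay conflict-free under the skew as well, so both access patterns can be served by the 4 banks in one parallel cycle.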

Why coarse grain? [figure, 1960 to 2010; sources: Proc. ISSCC, ICSPAT, DAC, DSPWorld; curves: memory (transistors per chip), algorithmic complexity of wireless generations 1G to 4G (Shannon's law), normalized processor speed of microprocessors / DSPs, computational efficiency (mA/MIPS; SH7752, StrongARM), battery performance]