Platform Design: ASIP (Application Specific Instruction-set Processor), TU/e

![Example: Simple processor [Leupers] (slide 9)](https://slidetodoc.com/presentation_image_h2/793af06494aacdb61788bb6bde901a58/image-9.jpg)
![Example: Simple processor [Leupers] (slide 10)](https://slidetodoc.com/presentation_image_h2/793af06494aacdb61788bb6bde901a58/image-10.jpg)

- Slides: 30
Platform Design: ASIP (Application Specific Instruction-set Processor)
TU/e, 5kk70
Henk Corporaal, Bart Mesman
6/17/2021, Platform Design, H. Corporaal and B. Mesman

Application domain specific processors (ADSP or ASIP)
[Figure: spectrum of processor types: programmable CPU, programmable DSP, application domain specific processor, application specific processor; axes: flexibility, efficiency]

Application domain specific processors (ADSP or ASIP)
An ADSP takes a well-defined application domain as a starting point:
• exploits characteristics of the domain (computation kernels)
• still programmable within the domain
e.g. MPEG-2 coding (uses 8*8 DCT transform), DECT, GSM etc.
Application domain implemented on a general-purpose (GP) processor:
• performance: clock speed + ILP
• flexible development (new apps.)
• problems: manual design, large effort
Application domain implemented on an ADSP:
• performance: ILP + tuning to domain
• cost effective (high volume)
• problems: specification, design time and effort => synthesized cores

www.adelantetech.com

Outline
• design process
• retargetable code generation (problem statement)
• ADSP/VLIW architectures (Mistral 2 / A|RT designer)
• low power aspects (Mistral 2 / A|RT designer)
• discussion
• conclusion

Design process
[Flowchart: application(s) and an instance of the processor model (e.g. VLIW with shared RFs; parameters) feed two branches. SW (code generation) gives estimations of cycles/alg and occupation; HW design gives estimations of nsec/cycle, area and power/instr. Decision "OK?": if no, iterate; if yes, "more appl.?": if no, go to phase 2.]
3 phases:
1. exploration
2. hw design (layout) + processing
3. design appl. sw
Fast, accurate and early feedback

Problem statement
A compiler is retargetable if it can generate code for a ‘new’ processor architecture specified in a machine description file.
A guarded register transfer pattern (GRTP) is a register transfer pattern (RTP) together with the control bits of the instruction word that control the RTP, e.g.
    a := b + c | instr = xxxx0101
GRTPs contain all inter-RT-conflict information.
Instruction set extraction (ISE) is the process of generating all possible GRTPs for a specific processor.

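To make the GRTP definition concrete, here is a minimal Python sketch of how a guard such as `xxxx0101` could be stored and checked for conflicts. The class `GRTP`, the helpers `compatible` and `merged_guard`, and the `sub` and `load` patterns are hypothetical names and values introduced for illustration only; only `a := b + c | instr = xxxx0101` comes from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GRTP:
    """A register transfer pattern plus the instruction bits that guard it."""
    rtp: str    # e.g. "a := b + c"
    guard: str  # one character per instruction bit: '0', '1' or 'x' (don't care)


def compatible(p: GRTP, q: GRTP) -> bool:
    """True if a single instruction word can satisfy both guards."""
    if len(p.guard) != len(q.guard):
        raise ValueError("guards must cover the same instruction word")
    return all(a == b or "x" in (a, b) for a, b in zip(p.guard, q.guard))


def merged_guard(p: GRTP, q: GRTP) -> str:
    """Instruction bits that activate both patterns (assumes compatible)."""
    return "".join(b if a == "x" else a for a, b in zip(p.guard, q.guard))


add = GRTP("a := b + c", "xxxx0101")   # the example from the slide
sub = GRTP("a := b - c", "xxxx0110")   # hypothetical: same bits, other opcode
load = GRTP("d := mem", "10xxxxxx")    # hypothetical: uses other control bits

print(compatible(add, sub))     # the two opcodes need different bit values
print(compatible(add, load))    # the guards touch disjoint bits
print(merged_guard(add, load))  # one word that triggers both transfers
```

In this reading, two register transfers conflict exactly when their guards demand different values for the same instruction bit, which is one way the statement "GRTPs contain all inter-RT-conflict information" can be interpreted.
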
Problem statement
[Figure: Algorithm spec -> FE -> CDFG; Processor spec (instance) -> ISE -> GRTP; CDFG and GRTP -> Code Generation -> Machine code. Annotation at ISE: in ch. 4 this is part of the code generator.]

Example: Simple processor [Leupers]
[Figure: datapath with Inp, PC, +1, IM, RAM, REG and outp; instruction word I.(20:0) with fields I.(20:13), I.(12:5), I.(4), I.(3:2) and I.(1:0)]

Example: Simple processor [Leupers] (figure only)

ASIP/VLIW architectures
A|RT designer template as an example (= set of rules, a model)
Differences with VLIW processors of ch. 4:
1. Parallel FUs
  – ASUs = complex application specific FUs (beyond subword parallelism), e.g. biquad, median, DCT etc.
  – larger grain size, more heterogeneous, more pipelines
2. Register files
  – many Rfiles (>5 vs. 1 or 2)
  – limited # ports (3 vs. 15)
  – limited size (<16 vs. 128)
3. Issue slots
  – all in parallel vs. 5

[Figure: datapath with register files RF 1 to RF 8, functional units FU 1 to FU 4, flags, instruction registers IR 1 to IR 4, instruction memory and control]

ASIP/VLIW architectures
Additional characteristics of the A|RT designer template:
• interconnect network: busses + input multiplexers
  – mux control is part of the instruction
  – control can change every clock cycle
  – network can be incomplete
  – busses can be merged
• memories are modeled as FUs
  – separate data in and data out
  – 2 inputs (data in and address) and 1 output
• each FU can generate one or more flags
• instruction format (per issue slot), fields shown: read address and write address (RF 1, RF 2), mux 1, mux 2, FU control, output drivers

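A per-issue-slot instruction format of this kind can be pictured as a packed bit field. The sketch below is only an illustration: the field names loosely follow the slide, while the bit widths, the field order and the helpers `pack` and `unpack` are assumptions, not the A|RT designer encoding.

```python
# Hypothetical field layout for one issue slot, most significant field first.
# Field names loosely follow the slide; all bit widths are assumptions.
FIELDS = [
    ("read_addr", 4),
    ("write_addr", 4),
    ("mux1", 1),
    ("mux2", 1),
    ("fu_control", 3),
    ("output_drivers", 2),
]


def pack(values: dict) -> int:
    """Concatenate the field values into one instruction-slot word."""
    word = 0
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack(word: int) -> dict:
    """Split an instruction-slot word back into its fields."""
    values = {}
    for name, width in reversed(FIELDS):
        values[name] = word & ((1 << width) - 1)
        word >>= width
    return values


slot = {"read_addr": 3, "write_addr": 5, "mux1": 1, "mux2": 0,
        "fu_control": 2, "output_drivers": 1}
word = pack(slot)
print(f"{word:015b}")          # 15 bits in this hypothetical layout
print(unpack(word) == slot)    # round trip recovers every field
```

Because the mux selects are ordinary fields of the word, the interconnect setting can differ in every instruction, which matches the remark that control can change every clock cycle.
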
ASIP/VLIW architectures: example
[Figure: datapath with RF 1, RF 2, RF 3, RF 4, an ALU and a MAC, connected by bus 1 and bus 2; instruction word with bit positions 19, 10, 9 and 0 marked, containing mux selects, read and write fields for RF 1 to RF 4, ALU instr. and MAC instr.]

ASIP/VLIW architectures: example (figure only)

ASIP/VLIW architectures: design flow
Pragmas:
    assign ( a+b, ALU, fu_alu1 )
    assign ( a+_, ALU, fu_alu2 )
    assign ( _+_, ALU, fu_alu3 )
[Flowchart: Algorithm spec -> Datapath synthesis -> RTs -> Controller synthesis -> Estimations (area, power, timing) -> OK? If no: change pragmas; if yes: done.]
Example RT:
    RF1:x = RF2:y, RF3:z | ALU = ADD, Inmux = bus2
VLIW makes relatively simple code selection possible.

ASIP/VLIW architectures: list scheduling
[Figure: list scheduling of a data-flow graph of * and + operations onto the units IPB, OPB, MULT and ALU; stages shown: candidate list, conflict & priority computation, scheduled operation]

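The stages named on this slide (candidate list, conflict and priority computation, scheduled operation) correspond to a generic resource-constrained list scheduler. The following Python sketch shows that generic technique under simplifying assumptions: every operation takes one cycle, the graph is acyclic, and priority is the longest path to a sink. The function `list_schedule` and the small data-flow graph are hypothetical and do not reproduce the scheduler of Mistral 2 or A|RT designer.

```python
def list_schedule(ops, deps, units):
    """Resource-constrained list scheduling with unit latency.

    ops:   {operation name: unit type}
    deps:  {operation name: [predecessor names]}
    units: {unit type: number of units available (at least 1)}
    Returns {operation name: cycle}. Assumes an acyclic graph.
    """
    succs = {op: [] for op in ops}
    for op, preds in deps.items():
        for p in preds:
            succs[p].append(op)

    # Priority = length of the longest dependence path to a sink.
    priority = {}

    def path_length(op):
        if op not in priority:
            priority[op] = 1 + max((path_length(s) for s in succs[op]), default=0)
        return priority[op]

    for op in ops:
        path_length(op)

    schedule, cycle = {}, 0
    while len(schedule) < len(ops):
        # Candidate list: unscheduled ops whose predecessors ran in earlier cycles.
        candidates = [op for op in ops if op not in schedule
                      and all(schedule.get(p, cycle) < cycle
                              for p in deps.get(op, []))]
        candidates.sort(key=lambda op: (-priority[op], op))
        free = dict(units)
        for op in candidates:
            if free.get(ops[op], 0) > 0:   # resource conflict check
                free[ops[op]] -= 1
                schedule[op] = cycle
        cycle += 1
    return schedule


# Hypothetical graph: a1 = m1 + m2, a2 = a1 + m3, with one MULT and one ALU.
ops = {"m1": "MULT", "m2": "MULT", "m3": "MULT", "a1": "ALU", "a2": "ALU"}
deps = {"a1": ["m1", "m2"], "a2": ["a1", "m3"]}
schedule = list_schedule(ops, deps, {"MULT": 1, "ALU": 1})
print(schedule)
```

With a single multiplier the three multiplications are serialized, and the less critical `m3` is deferred until it can share a cycle with `a1` on the ALU.
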
ASIP/VLIW architectures: feedback (figure only)

Outline
• design process
• retargetable code generation (problem statement)
• ADSP/VLIW architectures (Mistral 2 / A|RT designer)
• low power aspects (Mistral 2 / A|RT designer)
• discussion
• conclusion

Low power aspects
[Figure: Implementation Independent Design Database, Mistral 2, Architecture and Estimation Database; estimation of area, speed + power]

GSM Viterbi decoder: default solution
Cycle count: 13750

| EXU | ACTIV | AREA | POWER |
|---|---|---|---|
| alu_1 | 96% | 3469 | 46196 |
| romctrl_1 | 48% | 39 | 259 |
| acu_1 | 26% | 327 | 1209 |
| ipb_1 | 5% | 131 | 105 |
| opb_1 | 23% | 1804 | 5801 |
| ctrl | | 9821 | 135035 |
| total | | 15591 | 188605 |

• controller responsible for 70% of power consumption
  – maximum resource-sharing
  – heavy decision-making: “main” loop with 16 metrics-computations per iteration
• EXU numbers include registers for local storage

GSM Viterbi decoder: no loop-folding
Cycle count: 14247

| EXU | ACTIV | AREA | POWER |
|---|---|---|---|
| alu_1 | 92% | 3411 | 45073 |
| romctrl_1 | 45% | 39 | 255 |
| acu_1 | 25% | 294 | 1087 |
| ipb_1 | 5% | 107 | 86 |
| opb_1 | 22% | 1661 | 5340 |
| ctrl | | 4919 | 70087 |
| total | | 10431 | 121928 |

• area down by 33%
• power down by 35%
• next step: reduce # of program-steps with second ALU

GSM Viterbi decoder: 2 ALUs
Cycle count: 9739

| EXU | ACTIV | AREA | POWER |
|---|---|---|---|
| alu_1 | 69% | 1797 | 12248 |
| alu_2 | 65% | 1393 | 8916 |
| romctrl_1 | 67% | 39 | 255 |
| acu_1 | 37% | 294 | 1087 |
| ipb_1 | 8% | 149 | 119 |
| opb_1 | 33% | 2136 | 6871 |
| ctrl | | 8957 | 87235 |
| total | | 14766 | 116731 |

• cycle count down 30%
• area up 42%
• power down by 5%
• next step: introduce ASU to reduce ALU-load

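The percentages quoted for the first three solutions can be rechecked from the totals. The short Python sketch below does that arithmetic; it assumes that the leading number on each slide (13750, 14247, 9739) is the cycle count, which is consistent with the "cycle count down 30%" bullet. The helper name `change` is introduced here for illustration.

```python
def change(before: float, after: float) -> float:
    """Relative change in percent; negative values are reductions."""
    return 100.0 * (after - before) / before


# Totals quoted on the slides, as (cycle count, area, power).
# Reading the leading number of each slide as a cycle count is an assumption.
default = (13750, 15591, 188605)
no_folding = (14247, 10431, 121928)
two_alus = (9739, 14766, 116731)

# Controller power (135035) as a share of total power in the default solution.
ctrl_share = 100.0 * 135035 / default[2]
print(f"ctrl share of power: {ctrl_share:.1f}%")

for label, old, new in [("no loop-folding vs. default", default, no_folding),
                        ("2 ALUs vs. no loop-folding", no_folding, two_alus)]:
    cycles, area, power = (change(o, n) for o, n in zip(old, new))
    print(f"{label}: cycles {cycles:+.1f}%, area {area:+.1f}%, power {power:+.1f}%")
```

The results agree with the rounded figures on the slides: the controller share is about 72% (quoted as 70%), dropping loop-folding reduces area by about 33% and power by about 35%, and the second ALU cuts the cycle count by about 32% while area rises by about 42% and power falls by about 4%.
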
GSM Viterbi decoder: 1 x ACS-ASU
    func ACS ( M1, M2, d ) MS, MS8 =
    begin
      MS  = if ( M1+d > M2-d ) -> ( M1+d ) || ( M2-d ) fi;
      MS8 = if ( M1-d > M2+d ) -> ( M1-d ) || ( M2+d ) fi;
    end;
Cycle count: 1930

| EXU | ACTIV | AREA | POWER |
|---|---|---|---|
| alu_1 | 20% | 261 | 105 |
| acs_asu_1 | 83% | 2382 | 3816 |
| or_asu_1 | 10% | 611 | 122 |
| romctrl_1 | 16% | 65 | 21 |
| acu_1 | 36% | 294 | 205 |
| ipb_1 | 20% | 107 | 43 |
| opb_1 | 11% | 163 | 35 |
| ctrl | | 1864 | 3597 |
| total | | 5747 | 7944 |

• cycle count down 5X
• power down 20X!

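The ACS function above is the add-compare-select step of a Viterbi decoder. A direct Python transcription, as a minimal sketch, is given below; reading `M1` and `M2` as path metrics and `d` as a branch metric is the usual interpretation and is an assumption here, as is the lower-case function name `acs`.

```python
from typing import Tuple


def acs(m1: int, m2: int, d: int) -> Tuple[int, int]:
    """Add-compare-select as written on the slide.

    Each output adds or subtracts d on both inputs, compares the two
    candidates and selects the larger one.
    """
    ms = m1 + d if m1 + d > m2 - d else m2 - d
    ms8 = m1 - d if m1 - d > m2 + d else m2 + d
    return ms, ms8


print(acs(10, 12, 3))  # candidates (13, 9) and (7, 15)
```

Each output is simply the maximum of its two candidates, so one call replaces four additions, two comparisons and two selections that would otherwise be separate ALU operations.
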
GSM Viterbi decoder: 4 x ACS-ASU
Cycle count: 425

| EXU | AREA | POWER |
|---|---|---|
| alu_1 | 243 | 97 |
| acs_asu_1 | 1041 | 420 |
| acs_asu_2 | 1041 | 420 |
| acs_asu_3 | 1041 | 420 |
| acs_asu_4 | 1041 | 420 |
| split_asu_1 | 90 | 18 |
| or_asu_1 | 592 | 118 |
| romctrl_1 | 48 | 6 |
| acu_1 | 212 | 85 |
| ipb_1 | 60 | 6 |
| opb_1 | 369 | 80 |
| ctrl | 1306 | 555 |
| total | 7084 | 2645 |

ACTIV (column assignment not recoverable from the extraction): 94% 95% 95% 47% 28% 98% 23% 50%

• cycle count down another 5X
• area up 23%
• power down another 3X!

GSM Viterbi example: summary
[Figure: Implementation Independent Design Database and Mistral 2; overall result annotated "72 x !"]

Discussion: phase 3
[Flowchart: exploration phase: application(s) and processor model feed SW (code generation) and HW design; "OK?": if no, iterate; if yes, "more appl.?". After exploration: freeze processor model. Application software development (constraint driven compilation): application(s) -> SW (code generation) -> "OK?" (yes/no).]

Discussion: problems with VLIWs
Code size and instruction bandwidth:
• code compaction = reduce code size after scheduling
  – possible compaction ratio?
  – e.g. p0 = 0.9 and p1 = 0.1: information content (entropy) = $-\sum_i p_i \log_2 p_i = 0.47$
  – maximum compression factor 2
• control parallelism during scheduling = switch between different processor models (10% of code = 90% runtime)
• architecture: reduce number of control bits for operand addresses
  – e.g. 128 reg (TM) -> 28 bits/issue slot for addresses only
  – => use stacks and fifos

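The two numerical examples on this slide can be reproduced in a few lines of Python. The sketch assumes that p0 and p1 are the probabilities of an instruction bit being 0 or 1, and that register addresses are plain binary indices; the helper name `entropy` is introduced here for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy in bits per symbol."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


h = entropy([0.9, 0.1])        # information per instruction bit
max_compression = 1.0 / h      # stored bits per bit of information

addr_bits = math.ceil(math.log2(128))  # bits needed to select 1 of 128 registers
fields = 28 // addr_bits               # register fields that fit in 28 bits

print(f"entropy = {h:.2f} bit, max compression factor = {max_compression:.1f}")
print(f"{addr_bits} bits per register address, {fields} address fields per issue slot")
```

An entropy of about 0.47 bit per stored bit bounds lossless compaction at a factor of roughly 2, and 28 address bits per issue slot correspond to four 7-bit register fields.
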
[Figure: datapath with register files RF 1 to RF 4, functional units FU 1 to FU 4, flags, instruction registers IR 1 to IR 4, instruction memory and control]

Conclusions
• ASIPs provide efficient solutions for well-defined application domains (2 orders of magnitude higher efficiency).
• The methodology is interesting for IP creation.
• The key problem is retargetable compilation.
• A (distributed) VLIW model is a good compromise between HW and SW.
• Although an automatic process can generate a default solution, the process is usually interactive and iterative for efficiency reasons. The key is fast and accurate feedback.