To DSP or Not to DSP?
Chad Erven

Words to Bits – Your Options
• ASIC
• FPGA
• DSP
• Embedded RISC
• General Purpose Processor (GPP)

Why Go Programmable?
1. Building the chip wrong
   – Systems are increasingly too complex to be described efficiently by RTL designers
   – Errors are orders of magnitude more difficult to find in hardware than in software
   – Defects are extremely costly in hardware
2. Building the wrong chip
   – Only software is flexible enough to adapt during and after system design
HARDWARE IS TOO HARD!

So Software and Processors, Right?
• Using processors has its drawbacks, especially in SOC designs
   – There is never a perfect match between the application and the hardware
   – Performance costs, power penalties, and wasted silicon will ALWAYS occur to some extent
   – Multiple disparate cores must be integrated with each other

Splitting the Difference – ASIPs
• Ever wish you were the processor designer?
• Now you are! Write exactly the instructions you need and nothing more.
• An Application-Specific Instruction-set Processor (ASIP) offers the best of both worlds

Back Up!
• Isn’t hardware too much work? Yes.
• So doesn’t an ASIP defeat the purpose? No.
• Why not?
   – Extending a base processor is much easier
   – It is readily amenable to automation
   – You only have to verify the instruction description; integration into the processor is guaranteed

Cool, Show Me How It Works
• ASIPs derive their performance by attacking three problems that a conventional processor has:
1. Operations that are innately parallel must be expressed serially
   – Somewhat solved by SIMD or MIMD processors
2. Memory space is addressed as one continuous space
   – Somewhat solved by modifiers and/or pragmas (dm/pm)
3. Applications are complicated by their expression as operations on C types
   – Somewhat alleviated by powerful instructions in hardware

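Working with the Innate Nature of the Algorithm
• Example – byte swap (a common telecom task):
    unsigned int *a, *b; /* unsigned: shifting into the sign bit of an int is undefined behavior */
    …
    for (int i = 0; i < 4096; i++) {
        /* Reverse the byte order of each 32-bit word. */
        a[i] = ((b[i] & 0x000000ff) << 24) |
               ((b[i] & 0x0000ff00) <<  8) |
               ((b[i] & 0x00ff0000) >>  8) |
               ((b[i] & 0xff000000) >> 24);
    }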

Working with the Innate Nature of the Algorithm
• Write your own instruction:
    operation swap {in AR x, out AR y} {} {
        y = {x[7:0], x[15:8], x[23:16], x[31:24]};
    }
• Making the C code:
    for (int i = 0; i < 4096; i++)
        a[i] = swap(b[i]);
• Execution cycles without TIE (Tensilica Instruction Extension): 4,915,300
• Execution cycles with TIE extension: 1,638,524
• 3X SPEEDUP!!! (4,915,300 / 1,638,524 ≈ 3.0)

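Instruction Fusion
[Figure: Unfused operation: op 1 reads reg 1 and reg 2 (inputs) and writes reg 3 (output); op 2 reads reg 3 and reg 4 (inputs) and writes reg 5 (output). Fused operation: a single instruction reading reg 1 and reg 2 (inputs).]
• Fusing dependent operations into one instruction removes the intermediate result (reg 3) from the register file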

• Example:
    for (i = 0; i < n; i++)
        c[i] = (a[i] * b[i]) >> 4;
• Assembly:
    loop:
        l8ui   a12, a11, 0
        l8ui   a13, a10, 0
        addi   a11, 1
        addi   a10, 1
        mul16u a8, a12, a13
        srai   a8, 4
        s8i    a8, a9, 0
        addi   a9, 1

Example
[Figure: dataflow graph of the loop body, with nodes l8ui, addi, mul16u, srai, s8i and addi and operands a9, a10, a11, 0, 1 and 4]

Example
[Figure: the same dataflow graph after fusion, with mul16u, srai, s8i and the a9 addi collapsed into a single fusion.mul16u.srai.s8i.addi node]

Example
• New assembly code:
    loop:
        l8ui a12, a11, 0
        l8ui a13, a10, 0
        addi a10, 1
        addi a11, 1
        fusion.mul16u.srai.s8i.addi a9, a12, a13

Benchmarking
[Chart: EEMBC ConsumerMarks (performance). From [Rowen].]
[Chart: EEMBC Summary (performance/MHz). From [Rowen].]
• Hand-coded assembly was used for the other processors

And I Haven’t Even Gotten To…
• Sharing input operands
• Substituting variables with constants
• Replacing memory tables with logic
• Limiting immediate values to the minimum required width
• Placing operands in special registers
• Creating SIMD instructions
• Reducing the size of operand specifiers
• Custom input/output queues

OK, Let Me Have It, Dr. Smith (The rest of you can ask questions too)