Comparing Processor Architectures: CISC, RISC and DSP
M. R. Smith, Electrical and Computer Engineering, University of Calgary, Alberta, Canada
smithmr@ucalgary.ca

To be tackled today
  • Processor architectures to be covered
      • 6809, 68HC11, 68332, 68020, 68040, 5206e
      • ADSP 218X, ADSP 2106X, ADSP 2116X
      • 29K, PowerPC
  • How to program various processors (in the broad sense) when you can program 68332 and 21061 processors
  • Basic implications of architectures on program performance
11/25/2020 ENCM 515 -- Comparison of Architectures Copyright smithmr@ucalgary.ca

6809 and 68HC11
  • 6809 -- microprocessor
  • 68HC11 -- microcontroller
  • 8-bit architecture for data registers and data bus
  • 16-bit architecture for index registers

Programming the 6809/68HC11
  • Treat as 8-bit version of 68332 except
      • only 2 data registers (8-bit)
      • only 2 index registers (16-bit)
      • Most instructions are 16 bits long (2 fetches)
  • Special features
      • 2 data registers (A and B) can act as one 16-bit register
      • Direct page register -- fetch an 8-bit address (fast) and combine with DP register to generate 16-bit address. Like segment register in Intel x86
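The direct page idea above can be sketched in C. This is only an illustration of how an 8-bit operand and a page register could combine into a 16-bit address; the function name `direct_page_address` is hypothetical, and the sketch assumes the DP value simply supplies the upper address byte.

```c
#include <stdint.h>

/* Hypothetical helper: form a 16-bit address from an 8-bit direct page
 * value (upper byte) and an 8-bit operand fetched with the instruction
 * (lower byte). Assumes the page register supplies bits 15..8. */
uint16_t direct_page_address(uint8_t dp, uint8_t offset)
{
    return (uint16_t)(((uint16_t)dp << 8) | offset);
}
```

For example, with DP set to 0x04 an operand of 0x10 would refer to address 0x0410, so only one address byte has to be fetched with the instruction.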

Points when 6809/68HC11 programming
  • Look at limitations discussed in Micro magazine articles
  • Use 8-bit features in algorithms
      • 16-bit operation VERY slow
      • Handles arrays, loops smaller than 256 well
  • Use 8-bit peripherals (low precision)
  • Very cheap, lots of designs and software out there.

68332
  • 16-bit integer microcontroller, essentially 68010 with peripherals.
  • 8 data registers (32 bits) over 16-bit busses.
  • 7 index registers (24-bit)
  • Can handle floating point operations -- called F-line -- by “trapping” (SWI) to software floating point routines -- very slow

Programming 68332
  • Treat as integer version of 21061 except
      • Maximum of two registers are mentioned in each instruction.
      • No hardware circular buffers
      • Program code and data fight to come over same bus
  • Has complex instructions that essentially do the same operations as several 21061 instructions
  • ADD.W #4, D0 does
      • Fetch the value 4 from memory into internal register
      • Add internal register to D0
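Without hardware circular buffers, the wrap-around has to be coded explicitly. A minimal sketch in C of what that software step might look like; the name `circ_next` and the modulo approach are assumptions for illustration, not code from the course.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: advance an index into a buffer of `length`
 * elements and wrap it in software, the job a DSP's hardware circular
 * buffer would otherwise do without extra instructions. */
size_t circ_next(size_t index, size_t step, size_t length)
{
    assert(length > 0 && index < length);
    return (index + step % length) % length;
}
```

The extra compare or modulo on every access is the kind of per-sample overhead that raises the instruction count relative to a processor with hardware circular addressing.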

68020
  • 32-bit version of 68000?
      • Which has implications in the speed at which certain instructions (16-bit values) are accessed and the speed at which 32-bit values can be read.
  • Program as if it were 68332
  • Has instruction cache?
  • Floating point instructions (F-line) can trap to software routines or to hardware coprocessor

68040
  • 32-bit version of 68332
  • Has on-board math coprocessor
      • Limited math operations compared to “external coprocessor”
      • Floating point operations very heavily pipelined
  • Program as if it were a 68332

ADI 2181 -- integer processor
  • 16-bit DSP integer processor
      • handles fractional numbers
  • Treat as if 21061 processor, but
      • no floating point operations
      • only 4 registers in each DAG -- not 8
      • does not have instruction cache (but that does not cause a problem for some non-obvious reason -- perhaps memory fetching is not the critical element in the clock timing)
      • Only certain registers can do circular buffers
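To make "fractional numbers" on a 16-bit integer processor concrete, here is a minimal C sketch of a fractional multiply. It assumes the common 1.15 (Q15) format, where a 16-bit integer v represents v / 32768; the function name `q15_mul` and the saturation choice are illustrative assumptions, not a description of the 2181 hardware.

```c
#include <stdint.h>

/* Hypothetical Q15 multiply: treats a and b as fractions in [-1, 1).
 * The 32-bit product is in Q30, so dividing by 32768 returns to Q15
 * (truncating toward zero). -1 * -1 would give +1, which does not fit,
 * so the result is saturated to the largest Q15 value. */
int16_t q15_mul(int16_t a, int16_t b)
{
    int32_t product = (int32_t)a * (int32_t)b;
    int32_t scaled = product / 32768;
    if (scaled > INT16_MAX) {
        scaled = INT16_MAX;
    }
    return (int16_t)scaled;
}
```

In this format 16384 stands for 0.5, so multiplying 16384 by 16384 gives 8192, which stands for 0.25.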

ADSP 21160 processor (SIMD)
  • Essentially a 21061, except that it has 2 of everything, where 1 CPU can be controlled and the other CPU simply mimics the first CPU except that it works with the “next” memory location.
  • Program as if 2106X processor, except
      • If the algorithm can be broken up into 2 components, then one CPU processes memory operations 1, 3, 5, 7 etc. with the other CPU automatically handling 2, 4, 6, 8 etc.
      • Use concept of “partial sums” algorithms
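The "partial sums" idea can be sketched in plain C. This is not SIMD code; it only shows how a sum might be split into two independent streams (alternate elements) that two processing elements could work on in lockstep. The name `sum_partial2` is hypothetical.

```c
#include <stddef.h>

/* Hypothetical partial-sums accumulation: one accumulator takes
 * elements 0, 2, 4, ... and the other takes 1, 3, 5, ...; the two
 * partial sums are combined only once, at the end. */
float sum_partial2(const float *x, size_t n)
{
    float even = 0.0f;
    float odd = 0.0f;
    size_t i;

    for (i = 0; i + 1 < n; i += 2) {
        even += x[i];       /* stream for the first processing element  */
        odd += x[i + 1];    /* stream for the second processing element */
    }
    if (i < n) {
        even += x[i];       /* leftover element when n is odd */
    }
    return even + odd;
}
```

Because the two accumulators never depend on each other inside the loop, the work divides cleanly in two; with floating point data the result can differ slightly from a strictly sequential sum because the additions are reordered.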

TigerSHARC processor (MIMD)
  • Essentially a 21061, except that it has 2 of everything where each CPU can be controlled.
  • Very wide instruction word -- 2 +, 2 * and 4 memory ops
  • Jump pipeline is much deeper
  • Uses register interlock, no race conditions when doing math and memory operations
  • Easier to program than TI VLIW
  • Difficult to take full advantage

TI VLIW (MIMD)
  • Essentially a 21061, except that it has 2 of everything where each CPU can be controlled.
  • Very wide instruction -- 2 +, 2 * and 4 memory ops
  • Jump pipeline is much deeper
  • DOES NOT USE register interlock, there are race conditions when doing math and memory operations.
  • Some people claim “this is an advantage” -- when?

29050 RISC processor
  • Program as if 21061 except
      • + and * can be done together (FMAC) but less flexible than 21061 -- but normally an issue
      • dm and pm data operations can’t be done together
  • Has special architecture for handling nested “C” subroutines -- register window
  • Has large number of registers, but they typically play the same role as the “secondary registers” on 21061. Meaning -- set aside certain registers for when doing interrupts
  • There are no “modify” registers
  • No circular buffers, but do have “virtual memory management unit” which can be made to act like circular buffers

29K processor
  • Speed approaches that of 21061 except on the multiple memory accesses.
  • Floating point unit has a pipeline of 4 (FADD) or 6 (FMAC)
  • Can get 1 new floating point result every clock cycle “provided” you can keep the pipeline full.
  • Keeping the pipeline full means breaking the algorithm up into “4 segments” that can work “interlinked”. This means that coding approach is very similar to VLIW processors in the way the algorithm is broken up.
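A rough way to see why "4 segments" matter is a toy timing model in C. It is a sketch under simplifying assumptions, not a cycle-accurate 29K model: one operation can issue per cycle, every operation has the same latency, and each operation depends only on the previous result of its own chain. The name `pipeline_cycles` and the `MAX_CHAINS` limit are hypothetical.

```c
#include <assert.h>

#define MAX_CHAINS 16

/* Toy model: returns the cycle at which the last of n_ops results is
 * available when the work is split round-robin over `chains`
 * independent dependency chains on a pipeline of depth `latency`. */
unsigned pipeline_cycles(unsigned n_ops, unsigned latency, unsigned chains)
{
    unsigned ready[MAX_CHAINS] = {0}; /* when each chain's last result is ready */
    unsigned issue = 0;               /* earliest cycle the next op may issue   */
    unsigned finish = 0;
    unsigned i;

    assert(chains > 0 && chains <= MAX_CHAINS);

    for (i = 0; i < n_ops; i++) {
        unsigned c = i % chains;
        /* An op waits for both a free issue slot and its chain's result. */
        unsigned start = (issue > ready[c]) ? issue : ready[c];
        ready[c] = start + latency;
        issue = start + 1;
        if (ready[c] > finish) {
            finish = ready[c];
        }
    }
    return finish;
}
```

Under these assumptions, 16 dependent adds on a 4-deep pipeline take 64 cycles as a single chain but 19 cycles when split into 4 interleaved chains, which approaches the one-result-per-cycle figure quoted above.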

Examples
  • Compare 68332 code when operating on 16-bit DSP algorithm compared to 32-bit algorithm
  • Execution time = # of instructions in code * average # of cycles / instruction * clock period (1 / clock speed)
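The execution-time relation can be written as a small C helper. This is a sketch only; the function name `execution_time_seconds` is hypothetical, and the clock is expressed as a frequency in Hz, so the time per cycle is 1 / clock_hz.

```c
#include <assert.h>

/* Hypothetical helper:
 * time = instructions * (cycles per instruction) / (clock frequency). */
double execution_time_seconds(unsigned long instructions,
                              double cycles_per_instruction,
                              double clock_hz)
{
    assert(clock_hz > 0.0);
    return (double)instructions * cycles_per_instruction / clock_hz;
}
```

With made-up numbers, 1000 instructions averaging 8 cycles each on a 16 MHz clock come to 0.5 ms; doubling the average cycles per instruction doubles the time even though the instruction count and clock are unchanged.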

Examples
  • Compare 68332 code when operating on 16-bit DSP algorithm compared to 32-bit algorithm
      • # of instructions same (switch between .W and .L format)
      • Clock speed the same
      • Average # of cycles / instruction increases as must fetch 32-bit values in 2 steps.

Examples
  • Compare 68332 code when operating on 16-bit DSP algorithm with array in memory locations 0x400, 0x8400, 0xFFFF8400
  • Execution time = # of instructions in code * average # of cycles / instruction * clock period (1 / clock speed)

Examples
  • Compare 68332 code when operating on 16-bit DSP algorithm with array in memory locations 0x400, 0x8400, 0xFFFF8400
      • # of instructions same (using .W)
      • Clock speed the same
  • Average # of cycles / instruction increases if not using 0x400 (16-bit address with sign extension) or 0xFFFF8400 (16-bit address with sign extension) -- absolute addressing
  • Average # of cycles / instruction stays the same if using the index registers for accessing memory (except the few instructions where you actually load the index registers using a 16-bit or 32-bit value)
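Why 0x400 and 0xFFFF8400 are cheap while 0x8400 is not can be checked with a short C sketch: an address fits the 16-bit sign-extended absolute form only if sign-extending its low 16 bits reproduces the full 32-bit address. The function name `fits_absolute_short` is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check: true when the 32-bit address equals the
 * sign extension of its own low 16 bits, so it can be encoded as a
 * single 16-bit absolute address word. */
bool fits_absolute_short(uint32_t address)
{
    uint16_t low = (uint16_t)(address & 0xFFFFu);
    uint32_t extended = (low & 0x8000u) ? (0xFFFF0000u | low) : low;
    return extended == address;
}
```

For 0x8400 the low word has its top bit set, so sign extension yields 0xFFFF8400 rather than 0x00008400, and the address needs the longer 32-bit form with its extra fetch.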

Examples
  • Compare 68332 and 68HC11 code when operating on 16-bit DSP algorithm with array in memory locations 0x400, 0x8400, 0xFFFF8400
  • Execution time = # of instructions in code * average # of cycles / instruction * clock period (1 / clock speed)

Examples
  • Compare 68332 and 68HC11 code when operating on 16-bit DSP algorithm with array in memory locations 0x400, 0x8400, 0xFFFF8400
  • Can’t use 0xFFFF8400 on 68HC11 since only 16-bit registers
  • If array is only 256 in size, then can set the DP register to 0x4 or 0x84, and absolute memory addresses can then be given as 8-bit operands

Examples
  • Compare 68332 and 68HC11 code when operating on 16-bit DSP algorithm with array in memory locations 0x400, 0x8400, 0xFFFF8400
  • 8-bit bus on HC11, therefore # of cycles / instruction is higher (each 16-bit value takes 2 fetches)
  • Only one 16-bit register on HC11, therefore very likely that # of instructions in program goes up as have to do multi-precision arithmetic
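Multi-precision arithmetic is where the extra instructions come from. As a sketch in C, here is a 32-bit addition built only from 16-bit pieces plus a carry, roughly the steps a processor with a single 16-bit accumulator would have to spell out; the name `add32_via_16` is hypothetical.

```c
#include <stdint.h>

/* Hypothetical multi-precision add: a 32-bit sum computed from
 * 16-bit halves. The low halves are added first, and the carry out
 * of that addition is fed into the addition of the high halves. */
uint32_t add32_via_16(uint32_t a, uint32_t b)
{
    uint16_t a_lo = (uint16_t)(a & 0xFFFFu);
    uint16_t a_hi = (uint16_t)(a >> 16);
    uint16_t b_lo = (uint16_t)(b & 0xFFFFu);
    uint16_t b_hi = (uint16_t)(b >> 16);

    uint32_t lo_sum = (uint32_t)a_lo + (uint32_t)b_lo;   /* may need 17 bits  */
    uint32_t carry = lo_sum >> 16;                       /* 0 or 1            */
    uint32_t hi_sum = ((uint32_t)a_hi + (uint32_t)b_hi + carry) & 0xFFFFu;

    return (hi_sum << 16) | (lo_sum & 0xFFFFu);
}
```

What is a single instruction on a processor with 32-bit data registers becomes a split, two additions and a carry propagation here, before counting the loads and stores needed to juggle the halves.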

Examples
  • Compare 68332 and 68040 code when operating on 32-bit DSP algorithm with array in memory locations 0x400, 0x8400, 0xFFFF8400
  • 16-bit bus on 68332, therefore # of cycles / instruction is higher (each 32-bit value takes 2 fetches)
  • Depending on addressing mode, it is likely that # of cycles used for fetching instruction information is smaller on 68040 since it can fetch 32 bits of instruction
  • Then there is the issue of speed associated with 68040 caches, 1 cycle if cached and higher if not cached (true or false)

Examples
  • Compare 68332 and 68040 code when operating on 16-bit DSP algorithm with array in memory locations 0x400, 0x8400, 0xFFFF8400
  • 16-bit bus on 68332, therefore # of cycles / instruction is higher (each 32-bit value takes 2 fetches)
  • Depending on addressing mode, it is likely that # of cycles used for fetching instruction information is smaller on 68040 since it can fetch 32 bits of instruction
  • Then there is the issue of speed associated with 68040 caches, 1 cycle if cached and higher if not cached (true or false)

Examples
  • Compare 68332 and 29K code when operating on 16-bit DSP algorithm
  • One 68332 CISC instruction = several 29K RISC instructions, therefore 29K program code longer (more instructions to execute).
  • If 29K pipeline is kept full, then 1 cycle / instruction compared to 4, 8, 12, 24 or more on 68332
  • Issues of JUMP pipeline on 29K, but also branch target cache
  • Special 16-bit access mode for 32-bit 29K
  • Faster “C” subroutine handling -- multiple 29K registers rather than 68332 stack
  • Faster interrupt handling on 29K -- dedicated registers