RISC Instruction Set Architecture

Simple instruction set
• opcodes are primitive operations
  • use instructions in combination for more complex operations
• data transfer, arithmetic/logical, control
• few & simple addressing modes (register, immediate, displacement/indexed)

Load/store architecture
• load/store values from/to memory with explicit instructions
• compute in general-purpose registers

Easily decoded instruction set
• fixed-length instructions
• few instruction formats, many fields in common; a field that appears in many formats is in the same bit location in all of them

Spring 2003 CSE P 548 1
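The fixed-length, fixed-field property can be made concrete with a small Python sketch that decodes a MIPS-style 32-bit instruction word. MIPS is used only as a familiar example of this style of encoding, and `field` and `decode` are hypothetical helper names. Because `rs` and `rt` sit in the same bits of the R-type and I-type formats, the sketch extracts them before it knows which of the two formats it is decoding (jumps, whose 26-bit target overlaps those bits, are split off first).

```python
def field(word, hi, lo):
    """Return bits hi..lo (inclusive) of a 32-bit instruction word."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode(word):
    """Decode a MIPS-style 32-bit instruction word into named fields."""
    op = field(word, 31, 26)
    if op in (2, 3):  # J-type (j, jal): a 26-bit target fills bits 25..0
        return {"op": op, "target": field(word, 25, 0)}
    # rs and rt occupy the same bits in every R-type and I-type instruction,
    # so they can be extracted before the format is fully known
    d = {"op": op, "rs": field(word, 25, 21), "rt": field(word, 20, 16)}
    if op == 0:  # R-type: register-register operations
        d.update(rd=field(word, 15, 11), shamt=field(word, 10, 6),
                 funct=field(word, 5, 0))
    else:  # I-type: loads, stores, branches, ALU immediates
        imm = field(word, 15, 0)
        # sign-extend the 16-bit immediate (MIPS zero-extends logical
        # immediates such as ori; this sketch ignores that case)
        d["imm"] = imm - (1 << 16) if imm & 0x8000 else imm
    return d
```

For instance, `decode(0x01095020)` returns the fields of `add $t2, $t0, $t1`: op 0, rs 8, rt 9, rd 10, shamt 0, funct 0x20. A hardware decoder does the same extraction with fixed wiring, which is a large part of why fixed fields decode quickly.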

RISC Instruction Set Architecture

Designed for pipeline efficiency
• simple instructions do almost the same amount of work
• instructions with simple & regular formatting can be decoded in parallel

Still some issues
• condition codes vs. condition registers
• GPR organization: register windows vs. flat file
• sizes of immediates
• support for integer divide & FP operations
• how CISCy do we get?

New RISC Architectures

64-bit architectures
• 64b registers & datapath
• 64b addresses (used in loads, stores, indirect branches)
• linear address space (no segmentation)

New instructions
• better performance
• emerging applications
• changes in microarchitecture
• changes in process technology
• take advantage of new compiler optimizations
• impulse to CISCyness

Backwards Compatibility

Problem: have to be able to execute the “old” 32b codes, with 32b operands

Some general approaches
• start all over: design a new 64-bit instruction set (Alpha)
• 2 instruction subsets, a mode for each (MIPS-III)
  • 32b instructions from the previous architecture
  • new 64b instructions: ld/st, arithmetic, shift, conditional branch
  • illegal-instruction trap on 64b instructions in 32-bit mode
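A minimal Python sketch of the "two subsets, one mode bit" approach, assuming a simplified machine: the 32b subset is always legal, and the new 64b subset traps unless 64-bit mode is on. The mnemonics are MIPS-style names chosen only for illustration; `OPS_32`, `OPS_64`, `check_issue` and `IllegalInstruction` are hypothetical, and the trap is modeled as a Python exception.

```python
class IllegalInstruction(Exception):
    """Stands in for the illegal-instruction trap."""


# Hypothetical opcode tables using MIPS-style mnemonics for illustration.
OPS_32 = {"lw", "sw", "add", "sll", "beq"}   # inherited 32b subset
OPS_64 = {"ld", "sd", "dadd", "dsll"}        # new 64b subset


def check_issue(mnemonic, mode64):
    """Return if the instruction may execute in the current mode, else trap."""
    if mnemonic in OPS_32:
        return  # old 32b code runs in either mode
    if mnemonic in OPS_64 and mode64:
        return  # 64b instructions are enabled by the mode bit
    if mnemonic in OPS_64:
        raise IllegalInstruction(f"{mnemonic} executed in 32-bit mode")
    raise IllegalInstruction(f"unknown opcode {mnemonic!r}")
```

Old 32b binaries never use the 64b opcodes, so they run unchanged in either mode; only new code needs the mode bit set.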

Backwards Compatibility

ld/st
• the datapath is 64b; therefore manipulate 64b values in 1 instruction
• when loading 32b data in 64b mode, sign extend the value
• when loading 32b data in 32b mode, zero extend the value, for backwards compatibility with 32b binaries
• operand sizes
  • byte, halfword, double (SPARC V9)
  • ld/st 32/64 only (Alpha): load & extract, insert & store for smaller operands
    • avoids a MUX & shifter between the execution units & the L1 cache

shift right
• specify the operand width so the hardware can sign/zero extend from the correct bit (either 31 or 63)
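The sign-versus-zero extension and "extend from bit 31 or 63" points can be sketched in Python, modeling a 64b register as an unsigned integer below 2**64. The helper names are hypothetical, and the sketch assumes a 32-bit shift returns its result sign-extended to 64 bits; it shows which bit gets copied, not any particular ISA's exact semantics.

```python
MASK64 = (1 << 64) - 1  # a 64b register modeled as an unsigned int


def sign_extend(value, from_bit):
    """Copy bit `from_bit` (e.g. 31 or 63) into every higher bit of 64."""
    sign = 1 << from_bit
    value &= (sign << 1) - 1           # keep bits from_bit..0
    return ((value ^ sign) - sign) & MASK64


def zero_extend(value, from_bit):
    """Clear every bit above `from_bit`."""
    return value & ((1 << (from_bit + 1)) - 1)


def load32(word, signed):
    """Place a 32b memory word into a 64b register."""
    return sign_extend(word, 31) if signed else zero_extend(word, 31)


def sra(value, amount, width):
    """Arithmetic shift right of a `width`-bit operand (32 or 64).

    The operand's sign is taken from bit width-1, and the result is
    returned sign-extended to 64 bits.
    """
    x = sign_extend(value, width - 1)
    signed = x - (1 << 64) if x >> 63 else x  # reinterpret as signed 64-bit
    return (signed >> amount) & MASK64
```

Shifting `0x80000000` right by 4 gives `0xFFFFFFFFF8000000` when the operand is declared 32 bits wide (bit 31 is the sign) but `0x08000000` when it is declared 64 bits wide, which is why the width has to be part of the instruction.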

Backwards Compatibility

FP registers
• want to support more than 16 double-precision values
  • 1 64b register holds a single- or double-precision operand (Alpha)
  • mode bit to enable 16 new double-precision registers (MIPS-III)
  • specify a register pair with unused low-order bits (SPARC V9)

Handling conditions
• condition registers (Alpha)
  • but the overflow condition is not in a register, so trap on overflow
  • so separate 64b/32b integer add & subtract instructions
• 64b & 32b integer condition codes (SPARC V9)
  • 1 set of arithmetic instructions sets them both
  • conditional branches (positive/negative or 0/not 0) & overflow instructions (overflow/not overflow) test a specific CC set
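To show what "1 set of arithmetic instructions sets them both" means, here is a minimal Python sketch in which a single 64-bit add produces a 32-bit condition code set (called icc in SPARC V9) and a 64-bit set (xcc), each with N, Z, V and C flags. The function names are hypothetical, and the flag logic is the standard two's-complement definition rather than a transcription of the SPARC manual.

```python
MASK64 = (1 << 64) - 1


def flags(a, b, width):
    """N, Z, V, C condition codes for a + b evaluated at `width` bits."""
    mask = (1 << width) - 1
    sign = 1 << (width - 1)
    a &= mask
    b &= mask
    full = a + b
    r = full & mask
    return {
        "N": bool(r & sign),                    # result negative
        "Z": r == 0,                            # result zero
        "V": bool(~(a ^ b) & (a ^ r) & sign),   # signed overflow
        "C": full > mask,                       # unsigned carry out
    }


def add_setting_icc_xcc(rs1, rs2):
    """One 64-bit add that sets both a 32-bit (icc) and 64-bit (xcc) CC set."""
    result = (rs1 + rs2) & MASK64
    return result, flags(rs1, rs2, 32), flags(rs1, rs2, 64)
```

With `rs1 = 0xFFFFFFFF` and `rs2 = 1`, the 32-bit set reports zero and carry while the 64-bit set reports a nonzero result with no carry, so a conditional branch gives different answers depending on which CC set it tests.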

New Instructions

Purpose:
• for better performance
• to better match changes in technology (e.g., a bigger discrepancy between CPU & memory speeds)
• to better match new implementations (e.g., deeper pipelines)
• to take advantage of new compiler optimizations (e.g., statically determining which array accesses will hit or miss in the L1 cache)
• to support new, compute-intensive applications (e.g., multimedia)
• impulse to CISCyness (they think it’s for better performance) (e.g., multiple loop-related operations)

New Instructions

data prefetch
• fetch data before its load instruction
• increases the chance of a cache hit; hides load latency
• caveats:
  • may displace data still being used
  • may saturate a multiprocessor bus
  • an extra instruction
• issues
  • prefetch distance
  • prefetched data size
  • number of outstanding prefetches
  • prefetch destination: L1 or L2 cache
  • can be mandatory/a hint, faulting/nonfaulting
  • compiler support for prefetching only data cache misses
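One common rule of thumb for the prefetch distance is to cover the memory latency with loop work: d = ceil(latency / cycles per iteration). The Python sketch below checks that rule in a deliberately crude timing model, which is an assumption and not any real machine: prefetches cost nothing to issue, every load that was not prefetched misses, and bandwidth and the number of outstanding prefetches are unlimited. The function names are hypothetical.

```python
import math


def prefetch_distance(latency, cycles_per_iter):
    """Iterations ahead a prefetch must be issued to hide `latency` cycles."""
    return math.ceil(latency / cycles_per_iter)


def simulate(n, latency, cycles_per_iter, distance):
    """Total cycles for a loop over n elements in a toy timing model.

    Iteration i prefetches element i + distance (when distance > 0), then
    loads element i, stalling until its data has arrived, then does
    cycles_per_iter cycles of work. Elements that were never prefetched
    miss and wait the full latency. Prefetches are free and unlimited.
    """
    arrival = {}  # element index -> cycle at which its data is in the cache
    t = 0
    for i in range(n):
        j = i + distance
        if distance > 0 and j < n:
            arrival[j] = t + latency                 # issue prefetch now
        t = max(t, arrival.get(i, t + latency))      # stall until data arrives
        t += cycles_per_iter
    return t
```

In this model, `simulate(100, 40, 10, 0)` takes 5000 cycles and `simulate(100, 40, 10, prefetch_distance(40, 10))` takes 1160: after the first few cold misses, every load finds its data already present. A larger distance does not help here (1320 cycles at d = 8), because the sketch issues no prologue prefetches, so the first d elements all miss.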

New Instructions

conditional move instruction (an example of predicated execution)
• replaces a conditional branch & move with one instruction that tests a condition & moves a source operand to a destination operand if the condition is true
• example, with the condition in R1 (set by an earlier instruction):
      cmovez R2, R3, R1
  replaces
      bnez R1, Label
      mov R2, R3
      Label:
  (both move R3 into R2 only when R1 is zero)
• eliminates branch latency & the branch misprediction penalty
• also used to detect address aliasing
  • allows loads to float above stores
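The cmovez example can be mirrored in Python to show the dataflow a conditional move performs: the condition becomes an all-ones or all-zeros mask that selects between the two sources, with no branch around the move. This is an illustrative sketch with hypothetical function names; registers are assumed to hold unsigned 64-bit values, and since Python may still branch internally, it says nothing about performance.

```python
MASK64 = (1 << 64) - 1


def branchy(r1, r2, r3):
    """bnez R1, Label; mov R2, R3; Label:  -> returns the final R2."""
    if r1 == 0:
        r2 = r3
    return r2


def cmovez(r2, r3, r1):
    """cmovez R2, R3, R1: R2 <- R3 if R1 == 0, else R2 is unchanged."""
    mask = -(r1 == 0) & MASK64   # all ones if the condition holds, else 0
    return ((r3 & mask) | (r2 & ~mask)) & MASK64
```

Both functions compute the same final R2; the difference a real cmov makes is that the hardware never has to predict which way the comparison goes.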

New Instructions

Is predicated execution a good idea?

New Instructions

loop support
• combine simple instructions that handle common programming idioms
  • scaled add/subtract/compare
  • branch on count
• eliminates instructions
• are these a good idea?
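The two loop idioms can be sketched in Python, assuming 8-byte array elements. `s8add` is modeled on Alpha's scaled add S8ADDQ (ra * 8 + rb), and the loop counter mimics a branch-on-count instruction such as PowerPC's `bdnz`, which decrements a count register and branches if it is nonzero in one step. The function names and the dictionary standing in for memory are hypothetical.

```python
MASK64 = (1 << 64) - 1


def s8add(ra, rb):
    """Scaled add, modeled on Alpha S8ADDQ: ra * 8 + rb in one operation."""
    return ((ra << 3) + rb) & MASK64


def sum_array(memory, base, n):
    """Sum n 8-byte elements starting at address `base`.

    `memory` is a dict from byte address to element value (a stand-in for
    real memory). The count register is decremented and tested once per
    iteration, as a single branch-on-count instruction would do.
    """
    total = 0
    i = 0
    count = n
    while count != 0:           # branch on count: loop while count != 0
        addr = s8add(i, base)   # one scaled add instead of shift + add
        total += memory[addr]
        i += 1
        count -= 1              # folded into the branch on real hardware
    return total
```

On machines with these idioms, the address computation and the loop control each take one instruction, instead of a shift plus an add and a decrement plus a separate conditional branch.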

New Instructions

multimedia instructions (implementation-dependent)
• targeted for graphics, audio and video data
• partitioned arithmetic
  • a 64b register is wasted on common (small) data types
  • arithmetic on two 32b, four 16b or eight 8b data
  • example operations: add, subtract, multiply, compare
• special instructions that manipulate < 64b data:
  • complex operations that are executed frequently
  • expand, pack, partial store
  • pixel distance instruction for motion estimation, handling boundary conditions in convolution
• examples: MMX, VIS
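Partitioned arithmetic can be imitated in software with the classic "SIMD within a register" trick, which this Python sketch uses to add eight 8-bit lanes packed in one 64-bit word. It illustrates the idea only and is not how MMX or VIS hardware is built: the low 7 bits of every lane are added with each lane's top bit cleared, so no carry can cross into the next lane, and each top bit is then restored with XOR. The result wraps modulo 256 per lane (saturating variants also exist and are not modeled); `padd8`, `pack` and `unpack` are hypothetical names.

```python
MASK64 = (1 << 64) - 1
HIGH = 0x8080808080808080  # top bit of each 8-bit lane
LOW = 0x7F7F7F7F7F7F7F7F   # low 7 bits of each lane


def padd8(a, b):
    """Lane-wise a + b (mod 256) for eight 8-bit lanes in a 64-bit word."""
    low_sum = (a & LOW) + (b & LOW)                # each lane sum <= 0xFE: no carry out
    return (low_sum ^ ((a ^ b) & HIGH)) & MASK64   # fix each lane's top bit


def pack(lanes):
    """Pack eight byte values into a 64-bit word, lane 0 in the low byte."""
    w = 0
    for i, x in enumerate(lanes):
        w |= (x & 0xFF) << (8 * i)
    return w


def unpack(w):
    """Split a 64-bit word into eight byte values, lane 0 first."""
    return [(w >> (8 * i)) & 0xFF for i in range(8)]
```

For example, adding lanes `[250, 1, 2, 3, 255, 128, 0, 100]` and `[10, 1, 2, 3, 1, 128, 0, 27]` gives `[4, 2, 4, 6, 0, 0, 0, 127]`: overflowing lanes wrap without disturbing their neighbors.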

New Instructions

multimedia instructions
• ramifications on the architecture
  • new instructions
  • new formats
• ramifications on the implementation
  • part of FP hardware
    • already handles multicycle operations
    • “register partitioning” already done to implement single-precision arithmetic
  • integer pipeline needed to execute integer instructions
  • surprisingly small proportion of the die

New Instructions

multimedia instructions
• ramifications on programming ease; either:
  • call assembly language library routines
  • write assembly language code
• ramifications on performance
  • ex: VIS pixel distance instruction eliminates ~50 RISC instructions
  • ex: 5.5X speedup to compute the absolute sum of differences on 16 x 16-pixel image blocks

Bottom line:
+ increases performance on an important compute-intensive application that uses MM instructions a lot
+ with a small hardware cost
- but a large programming effort
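A Python sketch of what a pixel-distance instruction computes, modeled on the description of VIS PDIST (the sum of absolute differences of eight byte pairs, accumulated into a 64-bit destination), applied to a 16 x 16 block. The names are hypothetical, and the inner Python loop only spells out work the hardware performs in one instruction. The point is the count: each 16-pixel row is two 64-bit words, so a whole block takes 32 pixel-distance operations.

```python
def pdist(rs1, rs2, acc):
    """Add sum(|a_i - b_i|) over the eight byte lanes of rs1, rs2 to acc."""
    for i in range(8):
        a = (rs1 >> (8 * i)) & 0xFF
        b = (rs2 >> (8 * i)) & 0xFF
        acc += abs(a - b)
    return acc


def pack_row(pixels):
    """Pack eight 8-bit pixels into one 64-bit word (little-endian)."""
    return int.from_bytes(bytes(pixels), "little")


def sad_16x16(block_a, block_b):
    """SAD of two 16x16 blocks using pdist; returns (sad, pdist_count)."""
    acc, ops = 0, 0
    for row_a, row_b in zip(block_a, block_b):
        for half in (0, 8):  # each 16-pixel row is two 64-bit words
            acc = pdist(pack_row(row_a[half:half + 8]),
                        pack_row(row_b[half:half + 8]), acc)
            ops += 1
    return acc, ops


def sad_scalar(block_a, block_b):
    """Reference SAD computed one pixel at a time."""
    return sum(abs(x - y) for ra, rb in zip(block_a, block_b)
               for x, y in zip(ra, rb))
```

The scalar version needs a subtract, an absolute value and an add per pixel (256 pixels per block) plus loads and loop overhead, which is the kind of instruction count a single pixel-distance instruction replaces.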

CISC Instruction Set Architecture, aka x86

Complex instruction set
• more complex opcodes
  • ex: transcendental functions, string manipulation
  • ex: different opcodes for intra-/inter-segment transfers of control
• more addressing modes
  • 7 data memory addressing modes + multiple displacement sizes
  • restrictions on what registers can be used with what modes

Register-memory architecture
• operands in computation instructions can reside in memory

Complex instruction encoding
• variable-length instructions (different numbers of operands, different operand sizes, prefixes for machine word size, postbytes to specify addressing modes, etc.)
• lots of formats, tight encoding
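To show why variable-length decoding is harder, here is a Python sketch of one step of x86 decoding: reading the ModR/M postbyte (mod in bits 7:6, reg in 5:3, r/m in 2:0) to find out how many SIB and displacement bytes follow. It covers only 32-bit addressing and ignores prefixes, 16-bit addressing and 64-bit mode (where mod 00, r/m 101 means RIP-relative); the function names are hypothetical. Unlike the fixed RISC fields, the length of the rest of the instruction depends on field values, and in one case on a second byte.

```python
def modrm_fields(modrm):
    """Split a ModR/M byte into (mod, reg, rm)."""
    return (modrm >> 6) & 0b11, (modrm >> 3) & 0b111, modrm & 0b111


def addressing_bytes(modrm, sib=None):
    """Count the SIB + displacement bytes following a ModR/M byte.

    32-bit addressing only; prefixes and other modes are ignored.
    """
    mod, _reg, rm = modrm_fields(modrm)
    if mod == 0b11:
        return 0                    # register operand: no memory access
    has_sib = rm == 0b100           # r/m 100 means a SIB byte follows
    if mod == 0b00:
        if rm == 0b101:
            disp = 4                # [disp32] with no base register
        elif has_sib:
            if sib is None:
                raise ValueError("need the SIB byte to finish decoding")
            disp = 4 if (sib & 0b111) == 0b101 else 0  # SIB base 101: disp32
        else:
            disp = 0                # [reg]
    elif mod == 0b01:
        disp = 1                    # [reg + disp8]
    else:
        disp = 4                    # [reg + disp32]
    return int(has_sib) + disp
```

A ModR/M byte of `0x04` followed by SIB `0x24` adds one byte, while the same ModR/M byte followed by SIB `0x25` adds five; a RISC decoder never has to look that far to learn the instruction length.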

CISC Instruction Set Architecture, aka x86

More complex register design
• special-purpose registers
• hybrid stack architecture for floating point
  • has been extended with addressing modes

More complex memory management
• segmentation with paging
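The "hybrid stack" can be illustrated with a toy Python model of an x87-style register stack, evaluating (a + b) * c as fld a; fld b; faddp; fld c; fmulp. It keeps only the push/pop discipline, ST(i) access and one memory-operand form (the "addressing modes" extension); real x87 hardware also has a TOP pointer, tag word and stack-fault signaling, which are simplified away here, and the class and method names are hypothetical.

```python
class FPStack:
    """Toy x87-style register stack; ST(0) is the top of the stack."""

    DEPTH = 8  # x87 has eight stack registers

    def __init__(self):
        self.regs = []  # regs[-1] is ST(0)

    def st(self, i):
        """Read ST(i), counting down from the top of the stack."""
        return self.regs[-1 - i]

    def fld(self, value):
        """Push a value (real hardware would signal a stack fault on overflow)."""
        if len(self.regs) == self.DEPTH:
            raise OverflowError("FP stack overflow")
        self.regs.append(float(value))

    def faddp(self):
        """ST(1) <- ST(1) + ST(0), then pop."""
        top = self.regs.pop()
        self.regs[-1] += top

    def fmulp(self):
        """ST(1) <- ST(1) * ST(0), then pop."""
        top = self.regs.pop()
        self.regs[-1] *= top

    def fadd_mem(self, value):
        """Memory-operand form: ST(0) <- ST(0) + m, no push."""
        self.regs[-1] += float(value)
```

Evaluating (2 + 3) * 4 with `fld(2)`, `fld(3)`, `faddp()`, `fld(4)`, `fmulp()` leaves 20.0 alone in ST(0); the compiler has to schedule operations around the stack top instead of naming arbitrary registers.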

Backwards Compatibility is Harder with CISCs

Must support:
• registers with special functions
  • even though it is now recognized that register speed, not how a register is used, is what matters
• multiple data sizes & instructions for all data sizes
  • even though implementations have to translate them to RISC-like instructions to pipeline easily
• special categories of instructions
  • even though they are no longer used
• real addressing, segmentation without paging, segmentation with paging
  • even though addressing range is now obtained with address size
• stack model for floating point
  • even though most programs use arbitrary memory operand addresses
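"Real addressing" is one of the modes that must still be supported. A minimal Python sketch of 8086 real-mode address formation, physical = segment * 16 + offset, follows; `a20_enabled` is a hypothetical parameter standing in for whether the address bit above 1 MiB is passed through or wrapped as on the original 8086.

```python
def real_mode_address(segment, offset, a20_enabled=False):
    """8086-style real-mode address: segment * 16 + offset.

    With only 20 address lines (a20_enabled=False) the result wraps at
    1 MiB, as on the original 8086.
    """
    linear = ((segment & 0xFFFF) << 4) + (offset & 0xFFFF)
    return linear if a20_enabled else linear & 0xFFFFF
```

Many segment:offset pairs alias the same byte (1000:0000 and 0FFF:0010 both map to 0x10000), and FFFF:0010 wraps to 0 unless the extra address bit is enabled: details a modern x86 still has to reproduce for old code.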

RISC vs. CISC

Which is best?

RISC vs. CISC

Advantage of RISC depends on (among other things):
• chip technology
• processor complexity

Pre-1990: chip density was low & processor implementations were simple
• single-chip RISC CPUs (1986) & on-chip caches
• instruction decoding was a “large” part of the execution cycle for CISCs

Post-1990: chip density is high & processor implementations are complex
• both RISC & CISC implementations fit on a chip with big L1 caches
• instruction decoding is a smaller time component in processors with:
  • multiple-instruction issue
  • out-of-order execution
  • speculative execution & sophisticated branch prediction
  • multithreading

Other Important Factors

Clock rate
• dense process technology (0.18 micron at the time, Spring 2003)
• superpipelining (all pipelines manipulate primitive instructions)

Compiler technology
• architecture features that help compilation
  • orthogonal architecture, simple architecture
  • primitive operations
  • lots of general-purpose registers
  • operations without side effects

Ability of the design team

New/old architecture
• historical legacy takes time (whether RISC or CISC)

$$$

Wrap-up

What RISC ISAs look like today
• the original model
• the new instructions
  • what they do
  • why they’re used

64b architectures
• issues with backwards compatibility to old “word” sizes (makes you realize how pervasive the “word” size is; it’s not just the addressable memory space)

RISC vs. CISC is not the simplistic debate it used to be