CS 136, Advanced Architecture: Instruction Set Architecture


Types of ISAs
• Stack
  – Implicit operands (top of stack)
  – Heavy memory traffic
  – Limited ability to access operands at will
  – Obsolete
• Accumulator
  – Implicit register operand (“accumulator”)
  – One memory operand
  – Insufficient temporaries
  – Obsolete
• General-purpose register
  – Multiple registers
  – Several variations

GPR Architectures
• Memory-memory
  – CISC idea
  – Usually allows any operand to be in a register as well
• Register-memory
  – Example: x86
  – Can do one operand in register, one in memory, or two in registers
• Register-register
  – Only design used in modern machines
  – Lots of registers ⇒ fast, flexible operand access
  – Simplicity of hardware
  – Compiler has full flexibility in register usage

Five Ways to Do C = A + B

STACK     ACCUM     MEM-MEM       REG-MEM        REG-REG
PUSH A    LOAD A    ADD C, A, B   LOAD R1, A     LOAD R1, A
PUSH B    ADD B                   ADD R1, B      LOAD R2, B
ADD       STORE C                 STORE R1, C    ADD R3, R1, R2
POP C                                            STORE R3, C
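
As a rough illustration of the sequences above, the following Python sketch runs the stack version and the register-register version on toy interpreters. The mnemonics come from the slide; the tuple encoding and the function names (`run_stack`, `run_reg_reg`) are assumptions made up for this example, not part of any real ISA.

```python
# Toy interpreters for two of the styles above. Hypothetical encoding:
# each instruction is a tuple of (mnemonic, operand, ...).

def run_stack(program, memory):
    """Stack machine: operands are implicit (top of stack)."""
    stack = []
    for op, *args in program:
        if op == "PUSH":
            stack.append(memory[args[0]])
        elif op == "POP":
            memory[args[0]] = stack.pop()
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        else:
            raise ValueError(f"unknown op {op}")
    return memory


def run_reg_reg(program, memory):
    """Load/store machine: ALU operations touch registers only."""
    regs = {}
    for op, *args in program:
        if op == "LOAD":
            regs[args[0]] = memory[args[1]]
        elif op == "STORE":
            memory[args[1]] = regs[args[0]]
        elif op == "ADD":
            dst, src1, src2 = args
            regs[dst] = regs[src1] + regs[src2]
        else:
            raise ValueError(f"unknown op {op}")
    return memory


STACK_PROG = [("PUSH", "A"), ("PUSH", "B"), ("ADD",), ("POP", "C")]
REG_PROG = [("LOAD", "R1", "A"), ("LOAD", "R2", "B"),
            ("ADD", "R3", "R1", "R2"), ("STORE", "R3", "C")]

stack_result = run_stack(STACK_PROG, {"A": 2, "B": 3, "C": 0})
reg_result = run_reg_reg(REG_PROG, {"A": 2, "B": 3, "C": 0})
```

Both runs leave A + B in C; the difference is only in where the operands are named.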

Memory Addressing
• Originally just word addressing
• 8-bit bytes and byte addressing introduced on IBM 360 series
• Brief experiments with bit addressing (bad idea)
• Unaligned accesses not worth supporting
• Some machines byte-address but only load/store a word at a time
  – Turned out to be bad design decision
  – Too many programs do string processing 1 character at a time
  – May need to revisit in future (32-bit characters?)
• Modern RISC designs allow short load/store, but not short arithmetic
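
To see why word-only load/store hurts character processing, here is a minimal Python sketch of what software must do to fetch one byte on such a machine. The 32-bit word size, the choice of the least-significant byte as byte 0, and the helper names are all assumptions for illustration.

```python
WORD_BYTES = 4  # assumption: 32-bit words


def is_aligned(addr, size):
    """True when an access of `size` bytes at `addr` is naturally aligned."""
    return addr % size == 0


def load_byte_via_word(words, addr):
    """Fetch one byte using only whole-word loads.

    `words` is a list of 32-bit integers standing in for memory; byte 0 of
    each word is taken to be the least-significant byte (an assumption).
    """
    word = words[addr // WORD_BYTES]   # the only memory access available
    shift = 8 * (addr % WORD_BYTES)    # position of the byte inside the word
    return (word >> shift) & 0xFF      # extra shift and mask per character
```

Every character access pays for the extra shift and mask, which is the cost the slide complains about.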

Endian-ness
• The word is “Endian”, not “Indian”
• Reference to Gulliver’s Travels
• Little-Endian invented by Digital Equipment on the PDP-11
  – Mathematically more elegant
  – Horrible for humans
  – “It seemed like a good idea at the time”
  – Should be banished from the face of the Earth
• Some machines can switch endianness with a control bit
  – This idea is even stupider than the original
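
For readers who have not seen the two byte orders side by side, this short Python sketch lays out the same 32-bit value both ways using the standard `struct` module. The value `0x0A0B0C0D` is an arbitrary example.

```python
import struct

value = 0x0A0B0C0D

little = struct.pack("<I", value)  # least-significant byte at lowest address
big = struct.pack(">I", value)     # most-significant byte at lowest address

print(little.hex())  # 0d0c0b0a
print(big.hex())     # 0a0b0c0d
```

The big-endian dump reads the way a human writes the number, which is the "horrible for humans" complaint about the little-endian one.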

Addressing Modes
• How can an instruction reference memory?
• Early days: absolute address in instruction
  – Led to instruction modification
  – Improvement: “Indirection” picked up absolute location, used it as final address
• Minimum necessary today: follow pointer in register
  – Clumsy if only option
• Fanciest conceivable: *(R1+S*R2+constant), with either or both of R1 and R2 autoincremented or autodecremented as side effect, either before or after instruction
  – No machine went quite this far
  – But VAX came close
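
The "fanciest conceivable" mode can be written out as a small calculation. The sketch below is one way to model it in Python, with a dictionary standing in for the register file; the function names and the register-file representation are assumptions for illustration only.

```python
def effective_address(regs, base, index=None, scale=1, disp=0):
    """Compute base + scale*index + disp, the general form above."""
    addr = regs[base] + disp
    if index is not None:
        addr += scale * regs[index]
    return addr


def autoincrement(regs, reg, size):
    """Post-increment: use the register as the address, then bump it."""
    addr = regs[reg]
    regs[reg] += size
    return addr


def autodecrement(regs, reg, size):
    """Pre-decrement: lower the register first, then use it as the address."""
    regs[reg] -= size
    return regs[reg]
```

Plain register-indirect addressing is the special case with no index and a zero displacement.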

Addressing Modes (cont’d)
• What’s actually useful?
• Need to follow pointers: can restrict to registers
  – ADD R1, (R2)
  – Better: LOAD R1, (R2) (like MIPS)
• Frequent stack access ⇒ register + constant useful
• Immediates needed for built-in constants
• Access to globals ⇒ absolute memory addresses
  – (We’ll see that’s painful)
• PC-relative modes
  – Used to be needed for data; not in modern systems
  – Still needed for calls and branches
• Absolute addresses no longer needed for branches
  – Can always emulate with PC-relative, since PC known
  – Still available on some architectures to solve distance issues

Operand Types and Sizes
• Type usually implies size
• Integers can safely be widened to word size
  – Shrink again when stored
  – Takes advantage of two’s-complement representation
• Single-precision FP gives different results than double-precision
  ⇒ Necessary to support both widths
  – Some FPUs can do two SP operations in parallel
• Older machines allowed “packed” decimal (2 digits per byte)
  – x86 supports with DAA (Decimal Adjust after Addition) instruction
  – Still useful in business world, though dying
• 32 bits standard these days, 64 bits coming
  – 128 some day?
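
Two of the points above are easy to check numerically: widening and re-narrowing a two's-complement integer loses nothing, while rounding a double to single precision can change the value. The helper names in this Python sketch are assumptions; `struct` is the standard library module.

```python
import struct


def sign_extend(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit


def truncate(value, bits):
    """Keep only the low `bits` bits, as a narrow store would."""
    return value & ((1 << bits) - 1)


def to_single(x):
    """Round a Python float (double precision) to single precision."""
    return struct.unpack("<f", struct.pack("<f", x))[0]
```

For example, computing -100 + 30 at full width and then truncating to 8 bits gives the same bit pattern as an 8-bit machine would store, whereas `to_single(0.1)` no longer equals the double-precision `0.1`.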

Operations Provided
• Only one instruction truly needed: SJ
  – Subtract A from B, giving C; if result is < 0, jump to D
  – It’s Turing-complete!
• Practical machines need a bit more at minimum:
  – Arithmetic and logical (add, multiply, divide?, and, or, …)
  – Data movement (load/store, move between registers)
  – Control (conditional/unconditional branch, procedure call and return, trap to OS)
  – System control (return from interrupt, manage VM, set unprivileged mode, access I/O devices)
• Other builtins can be useful:
  – Basic floating point
    » Bad x86 design idea: sin, sqrt, etc.!
  – Decimal
  – String
  – Vector, graphics
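
The SJ instruction is simple enough to interpret in a few lines. The Python sketch below follows the slide's definition (C = B - A, jump to D if negative); keeping code separate from data, addressing memory by name, and halting when the PC leaves the program are assumptions made to keep the example small.

```python
def run_sj(program, mem, max_steps=1000):
    """Run a program of (A, B, C, D) tuples.

    Each step does mem[C] = mem[B] - mem[A]; a negative result jumps to
    instruction D, otherwise execution falls through. Keeping code separate
    from data and halting when the PC leaves the program are assumptions.
    """
    pc = 0
    for _ in range(max_steps):
        if not 0 <= pc < len(program):
            break
        a, b, c, d = program[pc]
        result = mem[b] - mem[a]
        mem[c] = result
        pc = d if result < 0 else pc + 1
    return mem


# Addition built from subtraction: T = 0 - X, then SUM = Y - T.
ADD_PROGRAM = [
    ("X", "ZERO", "T", 1),   # T = -X; both outcomes continue at 1
    ("T", "Y", "SUM", 2),    # SUM = Y - (-X); both outcomes halt
]
sj_memory = run_sj(ADD_PROGRAM, {"X": 7, "Y": 5, "ZERO": 0, "T": 0, "SUM": 0})
```

Even an add takes two instructions and a temporary, which hints at why practical machines provide more than the minimum.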

Control Flow
• Addressing modes are important
  – PC-relative means code can run at any virtual address
  – Useful for dynamically linked (shared) libraries
• Pointer-following jump needed for returns
  – Also useful for switch statements, function pointers, virtual functions, and shared libraries
• How to specify condition for conditional branches?
  – Condition code as side effect of every instruction
    » Boils down to extra register
    » Spurious dependencies in pipeline
  – Condition register explicitly set by comparison
  – Compare as part of branch
    » Adds delay slots in pipeline
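
The position-independence claim for PC-relative branches can be checked with a little arithmetic. In this Python sketch the offset is measured from the branch's own address, which is a simplifying assumption (real ISAs often measure from the following instruction); the function names and the addresses are also hypothetical.

```python
def encode_offset(branch_addr, target_addr):
    """Offset stored in the instruction; depends only on the distance."""
    return target_addr - branch_addr


def pc_relative_target(pc, offset):
    """Target of a branch that encodes a signed offset from its own PC."""
    return pc + offset


# The same code loaded at two different base addresses.
offset_at_low = encode_offset(0x1000 + 0x20, 0x1000 + 0x80)
offset_at_high = encode_offset(0x7F000000 + 0x20, 0x7F000000 + 0x80)
```

Because the two offsets are identical, the instruction bits need no patching when a shared library is mapped at a different address.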

Encodings
• Variable-length instructions
  – Highly efficient (few wasted bits)
  – Allows complex specifications (e.g., x86 addressing modes)
  – Usually means misaligned instruction fetch
  – Greatly complicates fetch/decode units
• Fixed-length instructions
  – May limit number of registers
  – Usually very few instruction formats
  – Wastes space but gains speed (e.g., only aligned fetches)
  – Limits width of immediate operands
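
One reason variable-length encodings complicate fetch and decode is that the start of each instruction depends on the one before it. The Python sketch below uses a made-up encoding in which the first byte of each instruction holds its total length; that encoding and the function names are assumptions, not any real ISA.

```python
def variable_length_starts(code):
    """Instruction start offsets when byte 0 of each instruction holds its
    total length (a made-up encoding). Each start is known only after the
    previous instruction has been at least partly decoded."""
    starts, pos = [], 0
    while pos < len(code):
        length = code[pos]
        if length == 0:
            raise ValueError("zero-length instruction")
        starts.append(pos)
        pos += length
    return starts


def fixed_length_starts(code, width=4):
    """With fixed-width instructions every start is known up front."""
    return list(range(0, len(code), width))
```

The fixed-length version has no loop-carried dependency, so hardware can locate several instructions in parallel.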

The Fight for Bits
• How wide should instruction be?
  – Wider ⇒ can encode more registers, more options
  – Wider ⇒ bigger programs, need more memory bandwidth
  – Bigger programs ⇒ fewer cache hits
• Things you need to encode:
  – Operation code (16 to 1000 instructions)
  – Operands (at least one, normally two or three)
  – Immediate operands
  – Memory offsets
  – Branch targets
  – Branch conditions
  – Conditional operations (e.g., conditional load, add)
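
The bit budget can be worked out mechanically. This Python sketch counts what is left in an instruction word after the opcode and register fields are paid for; the function names and the example parameter values are assumptions chosen for illustration.

```python
def field_bits(count):
    """Bits needed to distinguish `count` different values."""
    return (count - 1).bit_length()


def leftover_bits(instr_bits, num_opcodes, num_registers, num_operands):
    """Bits left for immediates, offsets, and so on after the opcode and
    the register operand fields have been allocated."""
    used = field_bits(num_opcodes) + num_operands * field_bits(num_registers)
    return instr_bits - used
```

With a 32-bit word, 64 opcodes, and 32 registers, three register operands leave 11 bits while two leave 16, which is the trade-off between operand count and room for everything else.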

Two or Three Operands?
• In favor of three:
  – Smaller code size
  – No clobbered operands ⇒ fewer copies or reloads
  – Fixing R0 at zero means the ALU needs to support fewer operations
• In favor of two:
  – Can address more registers

How to Decide All These Questions?
• Slide rules at 50 paces?
• Analysis wars
  – Look at existing designs, existing programs
  – “Recompile” programs for hypothetical architecture
    » Analyze size of resulting program
    » Run through simulator to see how it performs
  – Impractical approach
    » Writing compiler back ends is expensive
    » Simulators are slow
  – Instead, make projections based on existing object code

Example of Bad Analysis: @-(R2)
• DEC VAX had three “auto” addressing modes: autopostincrement, autopredecrement, and indirect autopostincrement
• What happened to indirect autopredecrement?
  – Analyzed output of BLISS compiler on many programs
  – Language didn’t provide way to express autopredecrement
  – Concluded it wasn’t necessary
  – Very different result if they had analyzed C!
    *--p1 = a[--i];

Example of Difficult Analysis: imm16
• How big should an immediate be?
• Easy analysis: examine existing code
  – Calculate frequency of various widths
  – Analyze tradeoff of using those bits for other purposes
• Problem: analyzed architecture affects frequency of different widths
  – E.g., Alpha has only 16 bits, so you’ll never see over 16!
  – Alternative: look for multi-instruction sequences that effectively use more than 16 bits
    » Hard to find (compiler pipeline scheduling)
    » Compiler will stand on head, use sneaky tricks to avoid generating extra instructions
  – Need for wider constants depends on architecture
    » E.g., MIPS needs them when jumping to shared libraries
    » 64-bit machine needs 64-bit addresses
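
The multi-instruction sequences mentioned above usually build a wide constant from two 16-bit halves. The Python sketch below checks whether a value fits a signed 16-bit immediate and, if not, splits it the way a load-upper / OR-lower pair (in the style of MIPS LUI followed by ORI) would carry it. The function names are assumptions for illustration.

```python
def fits_signed(value, bits=16):
    """True if `value` fits in a signed immediate of the given width."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def split_32bit_constant(value):
    """Split a 32-bit constant into the two 16-bit halves that a
    load-upper / OR-lower instruction pair would carry."""
    value &= 0xFFFFFFFF
    return value >> 16, value & 0xFFFF


def rebuild_constant(upper, lower):
    """What the two-instruction sequence leaves in the register."""
    return (upper << 16) | lower
```

An analysis that only counts single-instruction immediates would see two 16-bit values here and miss the 32-bit constant behind them.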

Interaction with Compilers
• Nearly all modern code generated by compilers
• Architect must make compiler’s job easier
  – Lots of registers
  – Orthogonal instruction set
  – Few side effects
  – Instructions and addressing modes matched to language constructs
    » But NOT attempt to implement them in detail!
    » Primitives are better than “solutions” even when solutions are correct
  – Good support for stack, globals, and pointers
  – Support for both compile-time and run-time binding
  – Don’t ask compiler to predict dynamic information (e.g., branch targets)
  – Don’t provide features language can’t express
    » Example pro and con: vector architectures

The MIPS64 Architecture
• Extension of MIPS32
• Data path widened to 64 bits
  – Still 32-bit instructions
  – Still only 32 registers
• Most instructions have “D” as prefix to indicate 64-bit version

MIPS Instruction Formats

I-Type Instruction
  Bits:   6       5    5    16
  Field:  Opcode  rs   rt   Immediate

R-Type Instruction
  Bits:   6       5    5    5    5      6
  Field:  Opcode  rs   rt   rd   shamt  funct

J-Type Instruction
  Bits:   6       26
  Field:  Opcode  Offset inserted into PC
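
The three layouts above translate directly into shifts and ORs. The Python sketch below packs fields into 32-bit words; the function names are assumptions, and the two worked examples use the classic MIPS32 opcode and register numbers rather than anything specific to the MIPS64 variant discussed in these slides.

```python
def encode_r(opcode, rs, rt, rd, shamt, funct):
    """R-type: 6 + 5 + 5 + 5 + 5 + 6 bits."""
    return ((opcode << 26) | (rs << 21) | (rt << 16)
            | (rd << 11) | (shamt << 6) | funct)


def encode_i(opcode, rs, rt, imm):
    """I-type: 6 + 5 + 5 + 16 bits; the immediate is masked to 16 bits."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


def encode_j(opcode, target):
    """J-type: 6-bit opcode plus 26-bit field."""
    return (opcode << 26) | (target & 0x03FFFFFF)


# Classic MIPS32 examples (register numbers: $t0 = 8, $s1 = 17, $s2 = 18).
add_word = encode_r(0, 17, 18, 8, 0, 0x20)   # add  $t0, $s1, $s2
addi_word = encode_i(8, 17, 8, 5)            # addi $t0, $s1, 5

print(hex(add_word))   # 0x2324020
print(hex(addi_word))  # 0x22280005
```

Note how the opcode, rs, and rt fields sit in the same bit positions in both formats, which keeps register fetch independent of the format.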

I-Type Instructions
  Bits:   6       5    5    16
  Field:  Opcode  rs   rt   Immediate
• Encodes loads, stores (all widths), immediate ALU ops
• Also conditional branches (rt unused)

R-Type Instructions
  Bits:   6       5    5    5    5      6
  Field:  Opcode  rs   rt   rd   shamt  funct
• Register-register ALU operations
  – “funct” encodes the ALU operation: add, sub, etc.
  – Opcode chooses operands, special registers, sizes, etc.
  – Conditional moves
• Handles special registers, floating point, …

J-Type Instructions
  Bits:   6       26
  Field:  Opcode  Offset inserted into PC
• Jump, jump and link
• Trap, return from exception

MIPS Control Flow
• Unconditional jump substitutes low bits of PC
  – NOT addition!
  – Exceptionally bad on 64-bit architecture, where 36 bits unchanged
• No built-in stack
  – Subroutine call stores return in register
  – Callee must save on stack if necessary
  – Reduces overall cycle time
  – Ultra-efficient for leaf functions
• Conditional branches only test against zero
  – Complex tests (e.g., <) store Z/NZ result in a register
  – We’ve seen how this improves the pipeline
• Conditional moves can eliminate many branches
  – Feature of many modern architectures
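
The "substitution, not addition" point is easiest to see in code. The Python sketch below computes a J-type target by replacing the low 28 bits of the next PC and, for contrast, a conditional-branch target by adding a sign-extended offset. It assumes classic MIPS conventions (offsets in words, relative to PC + 4); the function names are hypothetical.

```python
def jump_target(pc, offset26, pc_bits=64):
    """J-type target: low 28 bits replaced, upper bits of PC + 4 kept."""
    mask = (1 << pc_bits) - 1
    next_pc = (pc + 4) & mask
    low = (offset26 & 0x03FFFFFF) << 2      # 26-bit word offset -> 28 bits
    return (next_pc & ~0x0FFFFFFF & mask) | low


def branch_target(pc, imm16, pc_bits=64):
    """Conditional branch, for contrast: sign-extended offset is ADDED."""
    offset = ((imm16 & 0xFFFF) ^ 0x8000) - 0x8000
    return (pc + 4 + (offset << 2)) & ((1 << pc_bits) - 1)
```

With a 64-bit PC, 64 - 28 = 36 upper bits pass through a jump untouched, so a plain jump cannot leave its 256 MB region.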

MIPS Floating Point
• Floating point was originally coprocessor
  ⇒ Separate FP registers
  – Special instructions to move to/from integer registers
• MIPS64 (but not 32) has paired single operations
  – Two SP numbers pass through DP ALU simultaneously
• MIPS64 also has multiply-add in one instruction
  – Useful in signal processing (multimedia)
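
A paired-single operation treats one 64-bit FP register as two independent single-precision lanes. The Python sketch below models that idea with the standard `struct` module; which lane occupies the upper 32 bits, and all function names, are assumptions for illustration rather than the MIPS64 definition.

```python
import struct


def pack_pair(hi, lo):
    """Pack two single-precision values into one 64-bit register image."""
    return struct.unpack("<Q", struct.pack("<ff", lo, hi))[0]


def unpack_pair(reg):
    """Recover (hi, lo) from the 64-bit image."""
    lo, hi = struct.unpack("<ff", struct.pack("<Q", reg))
    return hi, lo


def add_paired(reg_a, reg_b):
    """Add both lanes independently, as one paired-single add would."""
    a_hi, a_lo = unpack_pair(reg_a)
    b_hi, b_lo = unpack_pair(reg_b)
    return pack_pair(a_hi + b_hi, a_lo + b_lo)
```

One such instruction does the work of two single-precision adds, which is the bandwidth gain the slide refers to.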

Fallacies and Pitfalls
• PITFALL: Instruction designed to support feature in some language
  – Examples: PDP-11/45 MARK, VAX CALLS, IBM 360 ED/EDMK
  – Why is this bad?
    » Easy to get wrong (PDP-11 MARK instruction)
    » Easy to make inefficient (VAX CALLS)
    » Languages evolve, hardware doesn’t

Fallacies and Pitfalls (2)
• FALLACY: Typical programs exist
  – We wish!
• PITFALL: Ignoring the compiler
  – Designing for better code size based on a bad compiler’s output
  – Good compiler can blow your idea out of the water
• FALLACY: Flawed architectures can’t succeed
  – Ummm, x86?
  – Every architecture has drawbacks
• FALLACY: You (YOU!) can design a flawless architecture
  – Always tradeoffs
  – Always something new to learn

Summary
• Instruction encoding is important
• Don’t forget to provide what the compiler needs
  – This is NOT what you think the compiler needs!
• Addresses will only get wider
• Data will only get wider
  – Including characters
• Cleverness to improve bandwidth (e.g., MADD)
• RISC is here to stay