COMP 206 Computer Architecture and Implementation Montek Singh

COMP 206: Computer Architecture and Implementation
Montek Singh
Mon, Sep 12, 2005
Lecture 4: Instruction Set Design

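Organization of an Instruction
(Figure: the parts of an instruction and the design choices for each.)
• Operations: Arithmetic, Logical, Shift, Load (from MM), Store (to MM), Move (reg-reg), Move (MM-MM), I/O (if I/O is not memory-mapped)
• Control transfer: Unconditional (branch), Conditional (jump), Call, Return
• Instruction length: fixed (e.g., MIPS: 4 bytes) or variable (e.g., VAX: 1-37 bytes)
• Further details encoded in the instruction:
  1) Length of operands
  2) Shift/rotate: direction, amount
  3) Branch condition
• Number of addresses: 0 address, 1 address, 2 address, 3 address, implied
• Operand location: Instruction, Register, Memory
• Addressing modes: immediate, absolute, computed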

Classification by Operands
(m, n) means m memory operands, n total operands.
• Important machines that are difficult to classify
  - Intel 80x86
    * Variable instruction size: 1-17 bytes
    * Memory can be destination
    * Uses implied registers
  - Motorola 680x0
    * Instruction size: 2, 4, 6, 8, 10 bytes
    * Two-address format only (2, 2)

Instruction Set Design Objective #1
Code size (code density):
  - Depends on:
    * size of MM/cache
    * access time of cache (on-chip/off-chip)
    * CPU-MM bandwidth
  - Frequently used (written down) instructions should be short
  - Implies variable-length instructions

Instruction Set Design Objective #2
Execution speed (performance):
  - Only frequently executed instructions should be included in the instruction set
    * Infrequently executed instructions slow down the others
    * Complex and long instructions tend to be used infrequently
    * Defining hardware-software interface
  - Frequently executed instructions should be fast
  - Pipelining should be made as easy as possible
    * Overlapped execution lowers CPI value
    * Single instruction length, simple instruction formats, and few addressing modes for easy decoding
    * Three (register) address instructions decouple CPU and memory, and also do not destroy their operands (reducing memory accesses)

Instruction Set Design Objective #3
Size and complexity of hardware (ALU, CU):
  - Implementing infrequently executed instructions ties down hardware that is rarely used and that could be put to better use elsewhere
  - Some instructions should not be included in the instruction set

Instruction Set Design Objective #4
Instruction set as a programming language:
  - Needs of a human programmer (less important today)
    * Several desirable properties of instruction sets have been recognized and described, such as orthogonality (each operand can be specified independently of the others) and consistency (being able to predict the remainder of an architecture given partial knowledge of the system)
  - Needs of an optimizing compiler
    * Simple instructions are more suitable for code optimizations
    * Optimizing compilers try to find the shortest or fastest code sequence that implements the semantics of a HLL program. To make code reorganization tractable, an instruction set is needed that makes:
      + the size of each instruction easy to calculate;
      + the execution time of each instruction easy to calculate;
      + the interactions between instructions easy to figure out.
    * ISA features such as complex addressing modes, variable-length instructions, and special-purpose registers provide too many ways of doing the same thing and lead to combinatorial explosion

Addressing Modes
Notation:
  R: the register file
  M: the memory address space
  d: the size of the data item being accessed (1, 2, 4, 8 bytes)
• We can't directly refer to data values, only their addresses
  - Except for immediate operands
• Register deferred and direct addressing modes can be synthesized from displacement addressing mode
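The synthesis claim on this slide can be sketched in Python. This is an illustrative model, not taken from the lecture: the function names are hypothetical, memory is modeled as word-sized cells (so the access size d is ignored), and register 0 is assumed to always read as zero, as on MIPS and DLX.

```python
# Illustrative model: R is the register file, M the memory address space.
R = [0] * 8                       # assumption: R[0] always holds 0
M = {100: 11, 104: 22, 108: 33}   # hypothetical memory contents

def load_displacement(base_reg, disp):
    """Displacement mode: the operand is M[R[base_reg] + disp]."""
    return M[R[base_reg] + disp]

def load_register_deferred(reg):
    """Register deferred mode is displacement mode with a zero displacement."""
    return load_displacement(reg, 0)

def load_direct(addr):
    """Direct mode is displacement mode based on the zero register."""
    return load_displacement(0, addr)

R[1] = 100                        # base address held in register 1
```

With these definitions, `load_register_deferred(1)` reads M[100] and `load_direct(108)` reads M[108], both through the single displacement path.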

Registers versus Cache
• Similarities
  - Both are small, fast, and expensive (flip-flops)
  - Both are used to increase execution speed of CPU
  - Both operate based on locality of reference
• Differences
  - Registers are visible in ISA; caches are not (except for instructions for invalidation, prefetch, or flushing)
  - Number of registers is fixed by instruction format; size of cache is easily changeable
  - Registers have higher BW: 3 words/cycle, and are random-access; caches have lower BW: 1 word/cycle, and are associative
  - Register access time is fixed; cache access time is statistical
  - Register allocation is explicit by compiler; cache allocation is automatic
  - Registers require fewer bits to address; caches require full memory addresses
  - Registers create no I/O problems; caches do

Organization of Registers
• One general-purpose set (all interchangeable, "typeless")
• One general-purpose set (a few with dedicated uses)
  - PDP-11: eight 16-bit registers (R6: stack pointer, R7: PC)
  - VAX 11/780: sixteen 32-bit registers (four special-purpose, R14: stack pointer, R15: PC)
• Two sets
  - Motorola 68000: eight 32-bit data, eight 32-bit address
  - IBM 370: sixteen 32-bit integer, four 64-bit FP
  - DLX, MIPS: thirty-one 32-bit integer, thirty-two 32-bit FP
• Three sets
  - CDC 6600: eight 18-bit integer, eight 18-bit address, eight 60-bit FP
• Many registers with dedicated use
  - Intel 80x86

Notations for Information Representation
"On holy wars and a plea for peace", Danny Cohen, IEEE Computer 14(10), pages 49-54, Oct 1981

64 bits = 8 bytes = 2 words = 1 doubleword
Q: How do we number these various units of information in a consistent manner?

  Digits:                      9 6 2 1 7 6 6
  "Big End"-ian numbering:     0 1 2 3 4 5 6
  "Little End"-ian numbering:  6 5 4 3 2 1 0

The leftmost digit is the Most Significant Digit (MSD), the "Big End"; the rightmost digit is the Least Significant Digit (LSD), the "Little End".

Why Is Numbering Important?
• English text is written left-to-right and the characters are numbered left-to-right
• Numbers can be numbered in two different ways
• Memory locations are numbered (addresses)
• Consequences of numbering
  - Data is stored in memory according to byte numbering (the lower-numbered byte goes into a byte in memory with a smaller address)
  - Data is sent through a bit-serial communication channel according to bit numbering (bit 0 goes first, followed by bit 1, etc.)
• When displaying computer representation for humans
  - Numbers are written in the usual way (MSD on left, LSD on right)
  - Text is written in such a way as to match the numbering of numbers

Consequences of Numbering
The digits 1 9 9 5 below are not the string "1995", but the bytes making up the integer 1995.

Machine A: Big Endian Numbering
  data:     1 9 9 5 W a s h i n g t o n *
  address:  0 1 2 3 4 5 6 7 8 9 a b c d e f

Machine B: Little Endian Numbering (the same data in B's own layout)
  data:     * n o t g n i h s a W 1 9 9 5
  address:  f e d c b a 9 8 7 6 5 4 3 2 1 0

Protocol 1: Byte j of A goes to byte j of B
  data:     * n o t g n i h s a W 5 9 9 1
  address:  f e d c b a 9 8 7 6 5 4 3 2 1 0
  (text arrives intact, but the bytes of the number are in the wrong order)

Protocol 2: Byte j of A goes to byte n-j-1 of B (so this protocol reverses the entire message)
  data:     1 9 9 5 W a s h i n g t o n *
  address:  f e d c b a 9 8 7 6 5 4 3 2 1 0
  (the number arrives intact, but the text is reversed)

Fix: Complicate the protocol and treat numbers and character strings differently, which has to be done in software (by attaching descriptors to data items)
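The two protocols can be tried out with a short Python sketch. It is an illustration under assumptions rather than the lecture's own code: the integer is taken to be 32 bits wide, the message is the integer followed by the ASCII text, and the helper names are hypothetical. Python's standard `struct` module does the big-endian packing (`">I"`) and little-endian unpacking (`"<I"`).

```python
import struct

def protocol_1(msg: bytes) -> bytes:
    """Byte j of A goes to byte j of B."""
    return bytes(msg)

def protocol_2(msg: bytes) -> bytes:
    """Byte j of A goes to byte n-j-1 of B (reverses the whole message)."""
    return msg[::-1]

number = 1995
text = b"Washington*"

# Machine A is big-endian: most significant byte at the lowest address.
machine_a = struct.pack(">I", number) + text

# Machine B is little-endian and reads the number with "<I".
b1 = protocol_1(machine_a)
p1_number = struct.unpack("<I", b1[:4])[0]   # number bytes in the wrong order
p1_text = b1[4:]                             # text intact

b2 = protocol_2(machine_a)
p2_number = struct.unpack("<I", b2[-4:])[0]  # number intact, now at the end
p2_text = b2[:-4]                            # text reversed
```

Under protocol 1 the text survives and the number does not; under protocol 2 the reverse holds, which is why the fix has to treat numbers and strings differently.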

Odds and Ends about Numbering
• The Little Endian notation is compatible with mathematical conventions of positional notation
• The Little Endian notation has the disadvantage that it displays English text in reverse
  - To overcome this, manuals for Little Endian machines usually display character strings vertically
• Example machines
  - Little Endian: PDP-11, VAX, 80x86
  - Big Endian: IBM 370, MIPS, DLX, SPARC
  - Mixed: Motorola 68000, Z8000
    * Big Endian byte ordering
    * Little Endian bit ordering

Alignment of Words in Memory
(Figure: a memory controller in front of four memory banks, Mem Bank 00, 01, 10, and 11, each feeding 8 bits of a 32-bit bus.)
• CPU accesses a 32-bit word of data starting at byte address x…x00
  - Such an address (multiple of 32[b]/8[b/B] = 4[B]) is called word-aligned
  - Memory controller is simple and fast, data available in one cycle
• CPU accesses a 32-bit word of data starting at byte address 01111
  - Byte addresses are 01111, 10000, 10001, 10010 (misaligned address)
  - Doubles the access time of word
• Requiring aligned addresses results in simpler memory controller and faster execution
• Costs some loss of storage, and adds complexity in code generators
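The alignment rule and the doubled access time can be checked with a small Python sketch. It assumes, as the figure suggests, that one memory cycle delivers one row of four byte-wide banks; the function names are hypothetical.

```python
WORD_BYTES = 4  # 32 bits / 8 bits per byte

def is_word_aligned(byte_addr: int) -> bool:
    """A word-aligned address is a multiple of 4 (its low two bits are 00)."""
    return byte_addr % WORD_BYTES == 0

def memory_cycles(byte_addr: int, size: int = WORD_BYTES) -> int:
    """Number of bank rows touched, assuming one row is read per cycle."""
    first_row = byte_addr // WORD_BYTES
    last_row = (byte_addr + size - 1) // WORD_BYTES
    return last_row - first_row + 1

aligned = 0b01100      # bytes 12..15 lie in a single row
misaligned = 0b01111   # bytes 15..18 straddle two rows, as on the slide
```

For the slide's address 01111, the word covers bytes 01111 through 10010, which fall in two different rows, so the sketch reports two cycles instead of one.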

Sub-Word Accesses
(Figure: a memory controller and four memory banks, Mem Bank 00, 01, 10, and 11, each feeding 8 bits of a 32-bit bus to the CPU register file, which is 32 bits wide.)
• Byte operand in register is usually the rightmost byte of register
• Byte may come from any of the four memory banks
• Needs routing/permuting hardware
  - Either at memory side of bus (justified bus): byte always travels on rightmost quarter of bus
  - Or on CPU side (unjustified bus): bus lanes are extensions of memory bank lanes
• Source of complications in either case
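The two routing options can be contrasted in a minimal Python sketch. The mapping of banks to bus lanes is an assumption made for illustration (bank 00 is taken to drive the leftmost lane), and all function names are hypothetical.

```python
def bank_of(byte_addr: int) -> int:
    """The low two address bits select one of the four byte-wide banks."""
    return byte_addr & 0b11

def justified_bus(banks, byte_addr):
    """Memory-side routing: the byte always arrives on the rightmost lane."""
    return banks[bank_of(byte_addr)]

def unjustified_bus(banks, byte_addr):
    """No memory-side routing: the byte stays on its own bank's lane."""
    lane = bank_of(byte_addr)
    shift = 8 * (3 - lane)          # assumption: bank 00 is the leftmost lane
    return banks[lane] << shift

def cpu_side_route(bus_value, byte_addr):
    """CPU-side routing: shift the selected lane down to the rightmost byte."""
    shift = 8 * (3 - bank_of(byte_addr))
    return (bus_value >> shift) & 0xFF

banks = [0xAA, 0xBB, 0xCC, 0xDD]    # hypothetical contents of one bank row
```

Either way the register ends up holding the same byte in its rightmost position; the designs differ only in where the shifting hardware sits.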

Control Transfer Instructions
Terminology:
  - BTA (Branch Target Address): the destination address of the branch
    * The BTA is static if it is always the same during execution
    * The BTA is dynamic if it can vary during a single execution of a program (procedure return, O-O dynamic dispatch, and switch statements are major examples)
  - Branch is taken if next instruction to be executed is at address BTA
  - Branch is not taken if next instruction to be executed is the one following the branch instruction ("fall-through")
  - Branch outcome: whether the branch is taken or not taken
  - Forward branch: BTA > (PC), where (PC) is the address of the branch instruction
  - Backward branch: BTA < (PC)
  - An unconditional branch is always taken
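These definitions translate directly into a small Python sketch. It assumes fixed 4-byte instructions so that the fall-through address is PC + 4; that constant and the function names are hypothetical and not part of the lecture.

```python
INSTR_BYTES = 4  # assumption: fixed-length 4-byte instructions

def branch_direction(pc: int, bta: int) -> str:
    """Forward if BTA > (PC), backward if BTA < (PC)."""
    if bta > pc:
        return "forward"
    if bta < pc:
        return "backward"
    return "self"  # BTA == (PC) is not covered by the definitions above

def branch_outcome(pc: int, bta: int, next_pc: int) -> str:
    """Taken if execution continues at the BTA, not taken on fall-through."""
    if next_pc == bta:
        return "taken"
    if next_pc == pc + INSTR_BYTES:
        return "not taken"
    raise ValueError("next_pc is neither the BTA nor the fall-through address")
```

A loop-closing branch at 0x120 that targets 0x100 is a backward branch, and it is taken whenever the next instruction executed is the one at 0x100.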

Code Generation Examples for Branches

Register r3 contains y, r4 contains z, r5 contains a, r6 contains b, r7 contains x.

if (x > 0) y += z; else y -= z;

        blez  r7, L18
        addu  r3, r4
        j     L33
L18:    subu  r3, r4
L33:

while (a < b) { a++; b--; x++; }

        j     L33
L34:    addu  r5, 1
        addu  r6, -1
        addu  r7, 1
L33:    slt   r2, r5, r6
        bne   r2, r0, L34

Classification of Branches
Classifying branches into these four groups (forward or backward, taken or not taken) permits us to compute some of the dynamic frequencies if some others have been measured.
Rule of thumb: Backward branches tend to be taken, forward branches tend not to be taken.

Computing Branch Frequencies
Assume that 75% of all branches are forward, and that 55% of all branches are taken. If 80% of all backward branches are taken, what is the probability that a taken branch is a forward branch?
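One way to work the exercise is to fill in the joint frequencies step by step. The Python sketch below uses exact fractions; the variable names are illustrative.

```python
from fractions import Fraction

p_forward = Fraction(75, 100)               # given
p_taken = Fraction(55, 100)                 # given
p_taken_given_backward = Fraction(80, 100)  # given

# Every branch is either forward or backward.
p_backward = 1 - p_forward                                 # 1/4

# Fraction of all branches that are backward AND taken.
p_backward_taken = p_backward * p_taken_given_backward     # 1/5

# The remaining taken branches must be forward.
p_forward_taken = p_taken - p_backward_taken               # 7/20

# Conditional probability that a taken branch is forward.
p_forward_given_taken = p_forward_taken / p_taken          # 7/11
```

The result is 7/11, or about 64%: most taken branches are forward ones, simply because forward branches are so much more common, even though an individual forward branch is taken less than half the time (7/20 out of 3/4, which is 7/15).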

Evaluating Branch Conditions
• Typical set of condition codes (e.g., Motorola 680x0)
  - NegativeResult, ZeroResult, ArithmeticOverflow, CarryOut
• Many RISC machines do not use condition codes (e.g., MIPS, Alpha)
  - Magnitude comparisons are done with explicit COMPARE instructions that put their results into named registers
  - Overflow and carry-out have to be inferred by software
  - Some instructions have two variants: one traps on overflow, the other does not
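To illustrate what "inferred by software" can look like, the Python sketch below derives the four condition codes for a 32-bit addition using only the operands and the wrapped sum. It is a common technique shown under assumptions (32-bit unsigned operand encoding, hypothetical function name), not code from the lecture.

```python
MASK32 = 0xFFFFFFFF

def add_with_flags(a: int, b: int):
    """Add two 32-bit values and infer N, Z, V, C from operands and result."""
    s = (a + b) & MASK32
    negative = (s >> 31) & 1                  # sign bit of the result
    zero = int(s == 0)
    # Carry-out: the wrapped sum is smaller than an operand only if it wrapped.
    carry = int(s < a)
    # Signed overflow: both operands differ in sign from the result.
    overflow = (((a ^ s) & (b ^ s)) >> 31) & 1
    return s, negative, zero, overflow, carry
```

For example, adding 1 to 0x7FFFFFFF sets the negative and overflow flags without a carry, while adding 1 to 0xFFFFFFFF produces zero with a carry and no signed overflow.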