Design of Digital Circuits Lecture 10a: Instruction Set Architecture

- Slides: 50

Design of Digital Circuits
Lecture 10a: Instruction Set Architecture
Prof. Onur Mutlu
ETH Zurich
Spring 2019
22 March 2019

Talk Announcement (Today)
- 22 March 2019, Friday, 16:00-17:00, CAB G51
- Cross-Layer Architecture for Deep Learning
- Prof. Mattan Erez, University of Texas at Austin
- High-performance DNN inference and training is essential for the ongoing ML revolution. Training of DNNs requires massive memory capacity and bandwidth, and is generally a huge pain, especially for researchers. While significant research effort has been dedicated to inference accelerators, less work has been done on training, especially work that crosses the algorithmic and implementation layers. The result is a very limited number of high-cost accelerators available, in particular, with very expensive high-bandwidth memories. I will motivate and discuss some of our recent work on accelerating training (of CNNs) that combines understanding of and changes to the algorithm with matching hardware architecture modifications.
- Optional Review

Extra Assignment 2: Moore’s Law (I)
- Paper review
  - G. E. Moore, "Cramming more components onto integrated circuits," Electronics magazine, 1965
- Optional assignment, for 1% extra credit
  - Write a 1-page review
  - Upload PDF file to Moodle. Deadline: Friday, March 22
- I strongly recommend that you follow my guidelines for (paper) review (see next slide)

Extra Assignment 2: Moore’s Law (II)
- Guidelines on how to review papers critically
  - Guideline slides: pdf ppt
  - Video: https://www.youtube.com/watch?v=tOL6FANAJ8c
  - Example reviews on “Main Memory Scaling: Challenges and Solution Directions” (link to the paper)
    - Review 1
    - Review 2
  - Example review on “Staged memory scheduling: Achieving high performance and scalability in heterogeneous systems” (link to the paper)
    - Review 1

Agenda for Today & Next Few Lectures
- LC-3 and MIPS Instruction Set Architectures
- LC-3 and MIPS assembly and programming
- Introduction to microarchitecture and single-cycle microarchitecture
- Multi-cycle microarchitecture

Required Readings
- This week
  - Von Neumann Model, LC-3, and MIPS
    - P&P, Chapters 4, 5
    - H&H, Chapter 6
    - P&P, Appendices A and C (ISA and microarchitecture of LC-3)
    - H&H, Appendix B (MIPS instructions)
  - Programming
    - P&P, Chapter 6
  - Recommended: H&H Chapter 5, especially 5.1, 5.2, 5.4, 5.5
- Next week
  - Introduction to microarchitecture and single-cycle microarchitecture
    - H&H, Chapter 7.1-7.3
    - P&P, Appendices A and C
  - Multi-cycle microarchitecture
    - H&H, Chapter 7.4
    - P&P, Appendices A and C

Recall: The Instruction Cycle
- FETCH
- DECODE
- EVALUATE ADDRESS
- FETCH OPERANDS
- EXECUTE
- STORE RESULT

Instruction Set Architectures

Recall: The Instruction Set Architecture
- The ISA is the interface between what the software commands and what the hardware carries out
- The ISA specifies
  - The memory organization
    - Address space (LC-3: 2^16, MIPS: 2^32)
    - Addressability (LC-3: 16 bits, MIPS: 32 bits)
    - Word- or byte-addressable
  - The register set
    - R0 to R7 in LC-3
    - 32 registers in MIPS
  - The instruction set
    - Opcodes
    - Data types
    - Addressing modes
- Levels of transformation (figure): Problem, Algorithm, Program, ISA, Microarchitecture, Circuits, Electrons

Recall: Opcodes in LC-3 (opcode table figure, continued over two slides)

Recall: Funct in MIPS R-Type Instructions (I)
- Opcode is 0 in MIPS R-Type instructions. Funct defines the operation
- Harris and Harris, Appendix B: MIPS Instructions

Recall: Funct in MIPS R-Type Instructions (II)
- Find the complete list of instructions in the appendix
- Harris and Harris, Appendix B: MIPS Instructions

Data Types
- An ISA supports one or several data types
- LC-3 only supports 2’s complement integers
  - Negative of a 2’s complement binary value X = NOT(X) + 1
- MIPS supports
  - 2’s complement integers
  - Unsigned integers
  - Floating point
- Again, tradeoffs are involved
  - What data types should be supported and what should not be?
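The negation rule above (NOT(X) + 1) can be checked with a short sketch. This is a minimal illustration that assumes the 16-bit register width of LC-3; the helper names negate16 and to_signed16 are hypothetical and not part of the lecture.

```python
MASK16 = 0xFFFF  # assumption: 16-bit values, as in LC-3 registers


def negate16(x: int) -> int:
    """Return the 16-bit 2's complement negation of x, computed as NOT(x) + 1."""
    return ((~x & MASK16) + 1) & MASK16


def to_signed16(x: int) -> int:
    """Interpret a 16-bit pattern as a signed 2's complement integer."""
    return x - 0x10000 if x & 0x8000 else x


print(hex(negate16(5)))          # 0xfffb
print(to_signed16(negate16(5)))  # -5
```

Note that negating 0 wraps back to 0, since the carry out of bit 15 is discarded.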

Data Type Tradeoffs
- What is the benefit of having more or high-level data types in the ISA? What is the disadvantage?
- Think compiler/programmer vs. microarchitect
- Concept of semantic gap
  - Data types coupled tightly to the semantic level, or complexity of instructions
- Example: Early RISC architectures vs. Intel 432
  - Early RISC machines: Only integer data type
  - Intel 432: Object data type, capability-based machine
  - VAX: Complex types, e.g., doubly-linked list

Addressing Modes
- An addressing mode is a mechanism for specifying where an operand is located
- There are five addressing modes in LC-3
  - Immediate or literal (constant)
    - The operand is in some bits of the instruction
  - Register
    - The operand is in one of R0 to R7 registers
  - Three of them are memory addressing modes
    - PC-relative
    - Indirect
    - Base+offset
- In addition, MIPS has pseudo-direct addressing (for j and jal), but does not have indirect addressing

Operate Instructions

Operate Instructions
- In LC-3, there are three operate instructions
  - NOT is a unary operation (one source operand)
    - It executes bitwise NOT
  - ADD and AND are binary operations (two source operands)
    - ADD is 2’s complement addition
    - AND is bitwise SR1 & SR2
- In MIPS, there are many more
  - Most of the R-type instructions (they are binary operations)
    - E.g., add, and, nor, xor…
  - I-type versions (i.e., with one immediate operand) of the R-type operate instructions
  - F-type operations, i.e., floating-point operations

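NOT in LC-3
- NOT assembly and machine code
  - LC-3 assembly: NOT R3, R5
  - Field values: OP = 9, DR = 3, SR = 5, bits [5:0] = 111111
  - Machine code: 1001 011 101 111111 (OP = bits [15:12], DR = bits [11:9], SR = bits [8:6], bits [5:0] all ones)
  - Datapath figure: register file with DR and SR, control signals from the FSM
- There is no NOT in MIPS. How is it implemented?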

Operate Instructions
- We are already familiar with LC-3’s ADD and AND with register mode (R-type in MIPS)
- Now let us see the versions with one literal (i.e., immediate) operand
- Subtraction is another necessary operation
  - How is it implemented in LC-3 and MIPS?

Operate Instr. with one Literal in LC-3
- ADD and AND
  - Format: OP (4 bits) | DR (3 bits) | SR1 (3 bits) | 1 (1 bit) | imm5 (5 bits)
  - OP = operation
    - E.g., ADD = 0001 (same OP as the register-mode ADD)
      - DR ← SR1 + sign-extend(imm5)
    - E.g., AND = 0101 (same OP as the register-mode AND)
      - DR ← SR1 AND sign-extend(imm5)
  - SR1 = source register
  - DR = destination register
  - imm5 = literal or immediate (sign-extended to 16 bits)

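ADD with one Literal in LC-3
- ADD assembly and machine code
  - LC-3 assembly: ADD R1, R4, #-2
  - Field values: OP = 1, DR = 1, SR1 = 4, bit [5] = 1, imm5 = -2
  - Machine code: 0001 001 100 1 11110 (OP = bits [15:12], DR = bits [11:9], SR1 = bits [8:6], bit [5] = 1, imm5 = bits [4:0])
  - Datapath figure: instruction register, register file with DR and SR, sign-extend unit for imm5, control signals from the FSM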
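As a rough check of the field layout above, the following sketch packs the fields of an immediate-mode ADD into a 16-bit word and sign-extends imm5 back out. The function names encode_add_imm and sign_extend are hypothetical helpers introduced for illustration only.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a signed 2's complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def encode_add_imm(dr: int, sr1: int, imm5: int) -> int:
    """Pack OP=0001 | DR | SR1 | 1 | imm5 into a 16-bit LC-3 instruction word."""
    if not -16 <= imm5 <= 15:
        raise ValueError("imm5 must fit in 5 signed bits")
    return (0b0001 << 12) | (dr << 9) | (sr1 << 6) | (1 << 5) | (imm5 & 0x1F)


word = encode_add_imm(dr=1, sr1=4, imm5=-2)  # ADD R1, R4, #-2
print(format(word, "016b"))                  # 0001001100111110
print(sign_extend(word & 0x1F, 5))           # -2
```

Because imm5 is only 5 bits wide, the literal is limited to the range -16 to +15; larger constants need more than one instruction.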

Instructions with one Literal in MIPS
- I-type
  - 2 register operands and immediate
  - Some operate and data movement instructions
  - Format: opcode (6 bits) | rs (5 bits) | rt (5 bits) | imm (16 bits)
  - opcode = operation
  - rs = source register
  - rt =
    - destination register in some instructions (e.g., addi, lw)
    - source register in others (e.g., sw)
  - imm = literal or immediate

Add with one Literal in MIPS
- Add immediate
  - MIPS assembly: addi $s0, $s1, 5
  - rt ← rs + sign-extend(imm)
  - Field values: op = 8, rs = 17, rt = 16, imm = 5
  - Machine code: 001000 10001 10000 0000 0000 0000 0101 (0x22300005)
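The I-type encoding can be reproduced with a few shifts. This is a minimal sketch; encode_itype is a hypothetical helper name, and the register numbers ($s0 = 16, $s1 = 17) follow the standard MIPS register numbering.

```python
def encode_itype(op: int, rs: int, rt: int, imm: int) -> int:
    """Pack opcode (6 bits) | rs (5 bits) | rt (5 bits) | imm (16 bits) into 32 bits."""
    return (op << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


ADDI = 8  # addi opcode
S0, S1 = 16, 17

word = encode_itype(ADDI, rs=S1, rt=S0, imm=5)  # addi $s0, $s1, 5
print(hex(word))                                # 0x22300005
```

Masking the immediate with 0xFFFF also handles negative literals, which are stored in 2's complement form in the 16-bit field.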

Subtract in LC-3
- MIPS assembly
  - High-level code: a = b + c - d;
  - MIPS assembly:
      add $t0, $s0, $s1
      sub $s3, $t0, $s2
- LC-3 assembly
  - High-level code: a = b + c - d;
  - LC-3 assembly:
      ADD R2, R0, R1
      NOT R4, R3        ; NOT and the following ADD form the 2’s complement of R3
      ADD R5, R4, #1
      ADD R6, R2, R5
- Tradeoff in LC-3
  - More instructions
  - But, simpler control logic
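The LC-3 sequence above can be mimicked on 16-bit values to confirm that NOT followed by adding 1 yields a subtraction. This is only a sketch of the arithmetic, not a model of the datapath; the function names are hypothetical, and the register mapping in the comments (R0 = b, R1 = c, R3 = d, R6 = a) is the one implied by the slide.

```python
MASK16 = 0xFFFF  # LC-3 registers are 16 bits wide


def lc3_add(a: int, b: int) -> int:
    """16-bit 2's complement addition (carry out is discarded)."""
    return (a + b) & MASK16


def lc3_not(a: int) -> int:
    """16-bit bitwise NOT."""
    return ~a & MASK16


def b_plus_c_minus_d(b: int, c: int, d: int) -> int:
    r2 = lc3_add(b, c)    # ADD R2, R0, R1
    r4 = lc3_not(d)       # NOT R4, R3
    r5 = lc3_add(r4, 1)   # ADD R5, R4, #1   (R5 = -d)
    r6 = lc3_add(r2, r5)  # ADD R6, R2, R5   (R6 = b + c - d)
    return r6


print(b_plus_c_minus_d(10, 7, 4))      # 13
print(hex(b_plus_c_minus_d(1, 1, 5)))  # 0xfffd, i.e. -3 in 16-bit 2's complement
```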

Subtract Immediate
- MIPS assembly
  - High-level code: a = b - 3;
  - MIPS assembly: subi $s1, $s0, 3
  - Is subi necessary in MIPS?
  - MIPS assembly: addi $s1, $s0, -3
- LC-3
  - High-level code: a = b - 3;
  - LC-3 assembly: ADD R1, R0, #-3

Data Movement Instructions and Addressing Modes

Data Movement Instructions
- In LC-3, there are seven data movement instructions
  - LD, LDR, LDI, LEA, ST, STR, STI
- Format of load and store instructions
  - Opcode (bits [15:12])
  - DR or SR (bits [11:9])
  - Address generation bits (bits [8:0])
  - Four ways to interpret bits, called addressing modes
    - PC-Relative Mode
    - Indirect Mode
    - Base+offset Mode
    - Immediate Mode
- In MIPS, there are only Base+offset and immediate modes for load and store instructions

PC-Relative Addressing Mode
- LD (Load) and ST (Store)
  - Format: OP (bits [15:12], 4 bits) | DR/SR (bits [11:9], 3 bits) | PCoffset9 (bits [8:0], 9 bits)
  - OP = opcode
    - E.g., LD = 0010
    - E.g., ST = 0011
  - DR = destination register in LD
  - SR = source register in ST
  - LD: DR ← Memory[PC✝ + sign-extend(PCoffset9)]
  - ST: Memory[PC✝ + sign-extend(PCoffset9)] ← SR
  - ✝ This is the incremented PC

LD in LC-3
- LD assembly and machine code
  - LC-3 assembly: LD R2, 0x1AF
  - Field values: OP = 2, DR = 2, PCoffset9 = 0x1AF
  - Machine code: 0010 010 110101111 (OP = bits [15:12], DR = bits [11:9], PCoffset9 = bits [8:0])
  - Datapath steps: 1. Address calculation (incremented PC + sign-extended PCoffset9), 2. Memory read, 3. DR is loaded
- The memory address can only be -256 to +255 locations away from the incremented PC of the LD or ST instruction
- Limitation: The PC-relative addressing mode cannot address locations far away from the instruction
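The address calculation for LD can be sketched as follows. The slide does not say where the LD instruction is located, so the instruction address 0x4018 below is an assumption chosen for illustration; the helper names are hypothetical as well.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a signed 2's complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def pc_relative_address(instr_addr: int, pcoffset9: int) -> int:
    """Effective address = incremented PC + sign-extend(PCoffset9), in 16 bits."""
    incremented_pc = (instr_addr + 1) & 0xFFFF
    return (incremented_pc + sign_extend(pcoffset9, 9)) & 0xFFFF


# LD R2, 0x1AF placed at the assumed address 0x4018
print(sign_extend(0x1AF, 9))                    # -81 (bit 8 of the offset is set)
print(hex(pc_relative_address(0x4018, 0x1AF)))  # 0x3fc8
```

The 9-bit signed offset is what bounds the reach of this mode to 256 locations below and 255 locations above the incremented PC.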

Indirect Addressing Mode
- LDI (Load Indirect) and STI (Store Indirect)
  - Format: OP (bits [15:12], 4 bits) | DR/SR (bits [11:9], 3 bits) | PCoffset9 (bits [8:0], 9 bits)
  - OP = opcode
    - E.g., LDI = 1010
    - E.g., STI = 1011
  - DR = destination register in LDI
  - SR = source register in STI
  - LDI: DR ← Memory[Memory[PC✝ + sign-extend(PCoffset9)]]
  - STI: Memory[Memory[PC✝ + sign-extend(PCoffset9)]] ← SR
  - ✝ This is the incremented PC

LDI in LC-3
- LDI assembly and machine code
  - LC-3 assembly: LDI R3, 0x1CC
  - Field values: OP = A, DR = 3, PCoffset9 = 0x1CC
  - Machine code: 1010 011 111001100 (OP = bits [15:12], DR = bits [11:9], PCoffset9 = bits [8:0])
  - Datapath steps: 1. Address calculation (incremented PC + sign-extended PCoffset9), 2. Memory read, 3. Loaded address from MDR to MAR, 4. Memory read, 5. DR is loaded
- Now the address of the operand can be anywhere in the memory
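The two memory reads of LDI can be sketched with a dictionary standing in for memory. The instruction address (0x4A1B) and the memory contents below are assumptions made up for this example; only the instruction LDI R3, 0x1CC comes from the slide.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a signed 2's complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def ldi(memory: dict, instr_addr: int, pcoffset9: int) -> int:
    """Return Memory[Memory[incremented PC + sign-extend(PCoffset9)]]."""
    pointer_addr = (instr_addr + 1 + sign_extend(pcoffset9, 9)) & 0xFFFF
    operand_addr = memory[pointer_addr]  # first read: fetch the operand's address
    return memory[operand_addr]          # second read: fetch the operand itself


# Hypothetical memory image: 0x49E8 holds a pointer to 0x2110, which holds the data
memory = {0x49E8: 0x2110, 0x2110: 0xFFFF}
r3 = ldi(memory, instr_addr=0x4A1B, pcoffset9=0x1CC)
print(hex(r3))  # 0xffff
```

The pointer stored at 0x49E8 is a full 16-bit value, which is why the operand itself may live anywhere in memory even though the pointer must be within reach of the PC-relative offset.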

Base+Offset Addressing Mode
- LDR (Load Register) and STR (Store Register)
  - Format: OP (bits [15:12], 4 bits) | DR/SR (bits [11:9], 3 bits) | BaseR (bits [8:6], 3 bits) | offset6 (bits [5:0], 6 bits)
  - OP = opcode
    - E.g., LDR = 0110
    - E.g., STR = 0111
  - DR = destination register in LDR
  - SR = source register in STR
  - LDR: DR ← Memory[BaseR + sign-extend(offset6)]
  - STR: Memory[BaseR + sign-extend(offset6)] ← SR

LDR in LC-3
- LDR assembly and machine code
  - LC-3 assembly: LDR R1, R2, 0x1D
  - Field values: OP = 6, DR = 1, BaseR = 2, offset6 = 0x1D
  - Machine code: 0110 001 010 011101 (OP = bits [15:12], DR = bits [11:9], BaseR = bits [8:6], offset6 = bits [5:0])
  - Datapath steps: 1. Address calculation (BaseR + sign-extended offset6), 2. Memory read, 3. DR is loaded
- Again, the address of the operand can be anywhere in the memory

Base+Offset Addressing Mode in MIPS
- In MIPS, lw and sw use base+offset mode (or base addressing mode)
  - High-level code: A[2] = a;
  - MIPS assembly: sw $s3, 8($s0)
  - Memory[$s0 + 8] ← $s3
  - Field values: op = 43, rs = 16, rt = 19, imm = 8
- imm is the 16-bit offset, which is sign-extended to 32 bits

An Example Program in MIPS and LC-3
- High-level code:
    a = A[0];
    c = a + b - 5;
    B[0] = c;
- MIPS registers: A = $s0, b = $s2, B = $s1
- LC-3 registers: A = R0, b = R2, B = R1
- MIPS assembly:
    lw   $t0, 0($s0)
    add  $t1, $t0, $s2
    addi $t2, $t1, -5
    sw   $t2, 0($s1)
- LC-3 assembly:
    LDR R5, R0, #0
    ADD R6, R5, R2
    ADD R7, R6, #-5
    STR R7, R1, #0

Immediate Addressing Mode
- LEA (Load Effective Address)
  - Format: OP (bits [15:12], 4 bits) | DR (bits [11:9], 3 bits) | PCoffset9 (bits [8:0], 9 bits)
  - OP = 1110
  - DR = destination register
  - LEA: DR ← PC✝ + sign-extend(PCoffset9)
  - ✝ This is the incremented PC
- What is the difference from PC-Relative addressing mode?
  - Answer: Instructions with PC-Relative mode access memory, but LEA does not
  - Hence the name Load Effective Address

LEA in LC-3
- LEA assembly and machine code
  - LC-3 assembly: LEA R5, #-3
  - Field values: OP = E, DR = 5, PCoffset9 = 0x1FD
  - Machine code: 1110 101 111111101 (OP = bits [15:12], DR = bits [11:9], PCoffset9 = bits [8:0])
  - Datapath figure: instruction register, incremented PC, sign-extend unit, register file with DR

Immediate Addressing Mode in MIPS
- In MIPS, lui (load upper immediate) loads a 16-bit immediate into the upper half of a register and sets the lower half to 0
- It is used to assign 32-bit constants to a register
  - High-level code: a = 0x6d5e4f3c;
  - MIPS assembly:
      # $s0 = a
      lui $s0, 0x6d5e
      ori $s0, $s0, 0x4f3c
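The lui/ori pair can be modeled with two small functions operating on 32-bit values. This is a sketch of the arithmetic only; the function names simply mirror the mnemonics, and ori is modeled with a zero-extended immediate, as in MIPS.

```python
MASK32 = 0xFFFFFFFF


def lui(imm16: int) -> int:
    """Place a 16-bit immediate in the upper half of a register; lower half is 0."""
    return ((imm16 & 0xFFFF) << 16) & MASK32


def ori(reg: int, imm16: int) -> int:
    """Bitwise OR of a register with a zero-extended 16-bit immediate."""
    return (reg | (imm16 & 0xFFFF)) & MASK32


s0 = lui(0x6D5E)      # lui $s0, 0x6d5e       -> 0x6d5e0000
s0 = ori(s0, 0x4F3C)  # ori $s0, $s0, 0x4f3c  -> 0x6d5e4f3c
print(hex(s0))        # 0x6d5e4f3c
```

Using ori rather than addi for the lower half avoids sign extension of the immediate, which would corrupt the upper half whenever bit 15 of the lower constant is 1.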

Addressing Example in LC-3
- What is the final value of R3?
- P&P, Chapter 5.3.5 (program listing figure, with address x30F4 marked)

Addressing Example in LC-3
- What is the final value of R3? (P&P, Chapter 5.3.5)
  - LEA R1, #-3: R1 = PC - 3 = 0x30F7 - 3 = 0x30F4
  - ADD R2, R1, #14: R2 = R1 + 14 = 0x30F4 + 14 = 0x3102
  - ST R2, #-5: M[PC - 5] = M[0x30F4] = 0x3102
  - AND R2, R2, #0: R2 = 0
  - ADD R2, R2, #5: R2 = R2 + 5 = 5
  - STR R2, R1, #14: M[R1 + 14] = M[0x30F4 + 14] = M[0x3102] = 5
  - LDI R3, #-9: R3 = M[M[PC - 9]] = M[M[0x30FD - 9]] = M[M[0x30F4]] = M[0x3102] = 5
- The final value of R3 is 5
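The walkthrough above can be replayed step by step. The sketch below hard-codes the seven instructions instead of decoding machine code, and it assumes the first instruction sits at 0x30F6, which is inferred from the incremented PC value 0x30F7 used in the LEA step. The function name run_example is hypothetical.

```python
MASK16 = 0xFFFF


def run_example():
    """Replay the seven-instruction LC-3 addressing example; return (regs, mem)."""
    regs = [0] * 8  # R0..R7
    mem = {}        # sparse memory: address -> 16-bit value
    pc = 0x30F6     # assumed address of the first instruction

    pc += 1                                    # LEA R1, #-3
    regs[1] = (pc - 3) & MASK16                # R1 = 0x30F4
    pc += 1                                    # ADD R2, R1, #14
    regs[2] = (regs[1] + 14) & MASK16          # R2 = 0x3102
    pc += 1                                    # ST R2, #-5
    mem[(pc - 5) & MASK16] = regs[2]           # M[0x30F4] = 0x3102
    pc += 1                                    # AND R2, R2, #0
    regs[2] &= 0                               # R2 = 0
    pc += 1                                    # ADD R2, R2, #5
    regs[2] = (regs[2] + 5) & MASK16           # R2 = 5
    pc += 1                                    # STR R2, R1, #14
    mem[(regs[1] + 14) & MASK16] = regs[2]     # M[0x3102] = 5
    pc += 1                                    # LDI R3, #-9
    regs[3] = mem[mem[(pc - 9) & MASK16]]      # R3 = M[M[0x30F4]] = 5
    return regs, mem


regs, mem = run_example()
print(regs[3])  # 5
```

Each `pc += 1` models the PC increment during fetch, so every PC-relative offset is applied to the incremented PC, matching the arithmetic in the walkthrough.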

Control Flow Instructions

Control Flow Instructions
- Allow a program to execute out of sequence
- Conditional branches and jumps
  - Conditional branches are used to make decisions
    - E.g., if-else statement
  - In LC-3, three condition codes are used
  - Jumps are used to implement
    - Loops
    - Function calls
  - JMP in LC-3 and j in MIPS

Condition Codes in LC-3
- Each time one GPR (R0-R7) is written, three single-bit registers are updated
- Each of these condition codes is either set (set to 1) or cleared (set to 0)
  - If the written value is negative
    - N is set, Z and P are cleared
  - If the written value is zero
    - Z is set, N and P are cleared
  - If the written value is positive
    - P is set, N and Z are cleared
- x86 and SPARC are examples of ISAs that use condition codes
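The update rule for N, Z, and P can be written as a small function of the 16-bit value being written to a register. This is a minimal sketch; the function name condition_codes is a hypothetical helper.

```python
def condition_codes(value: int) -> tuple:
    """Return (N, Z, P) for a 16-bit value written to a general-purpose register."""
    value &= 0xFFFF
    n = 1 if value & 0x8000 else 0       # sign bit set: negative
    z = 1 if value == 0 else 0           # all bits clear: zero
    p = 1 if not n and not z else 0      # otherwise: positive
    return n, z, p


print(condition_codes(0xFFFB))  # (1, 0, 0)  since 0xFFFB is -5
print(condition_codes(0))       # (0, 1, 0)
print(condition_codes(5))       # (0, 0, 1)
```

Exactly one of the three codes is set after every register write.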

Conditional Branches in LC-3
- BRz (Branch if Zero)
  - Assembly: BRz PCoffset9
  - Format: 0000 (4 bits) | n z p (3 bits) | PCoffset9 (9 bits)
  - n, z, p = which condition code is tested (N, Z, and/or P)
    - n, z, p: instruction bits to identify the condition codes to be tested
    - N, Z, P: values of the corresponding condition codes
  - PCoffset9 = immediate or constant value
  - if ((n AND N) OR (p AND P) OR (z AND Z))
    - then PC ← PC✝ + sign-extend(PCoffset9)
  - Variations: BRn, BRz, BRp, BRzp, BRnz, BRnzp
  - ✝ This is the incremented PC

Conditional Branches in LC-3
- BRz
  - LC-3 assembly: BRz 0x0D9
  - Datapath figure: program counter, instruction register with the n, z, p bits*, condition registers
- What if n = z = p = 1?* (i.e., BRnzp)
- And what if n = z = p = 0?
- *n, z, p are the instruction bits to identify the condition codes to be tested
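The branch condition can be evaluated directly from the instruction bits and the condition codes. In the sketch below, the instruction address 0x4027 is an assumption made for illustration (the slide only gives BRz 0x0D9), and br_next_pc is a hypothetical helper name.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a signed 2's complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def br_next_pc(instr_addr, nzp_bits, codes, pcoffset9):
    """Return the next PC of a BR instruction.

    nzp_bits: (n, z, p) instruction bits; codes: (N, Z, P) condition codes.
    """
    n, z, p = nzp_bits
    N, Z, P = codes
    incremented_pc = (instr_addr + 1) & 0xFFFF
    if (n and N) or (z and Z) or (p and P):
        return (incremented_pc + sign_extend(pcoffset9, 9)) & 0xFFFF
    return incremented_pc


BRZ = (0, 1, 0)
print(hex(br_next_pc(0x4027, BRZ, (0, 1, 0), 0x0D9)))  # 0x4101, branch taken
print(hex(br_next_pc(0x4027, BRZ, (0, 0, 1), 0x0D9)))  # 0x4028, branch not taken
```

Since exactly one condition code is set at any time, n = z = p = 1 makes the test true in every case (an unconditional branch), while n = z = p = 0 makes it false in every case (the branch is never taken).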

Conditional Branches in MIPS
- beq (Branch if Equal)
  - Assembly: beq $s0, $s1, offset
  - Format: 4 (6 bits) | rs (5 bits) | rt (5 bits) | offset (16 bits)
  - 4 = opcode
  - rs, rt = source registers
  - offset = immediate or constant value
  - if rs == rt
    - then PC ← PC✝ + sign-extend(offset) * 4
  - Variations: beq, bne, blez, bgtz
  - ✝ This is the incremented PC
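The beq target computation differs from LC-3 in two ways: the PC advances by 4 bytes per instruction, and the offset counts instructions, so it is multiplied by 4. A minimal sketch follows; the instruction address 0x00400000 and the helper name beq_next_pc are assumptions for illustration.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a signed 2's complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def beq_next_pc(instr_addr: int, rs_val: int, rt_val: int, offset: int) -> int:
    """Return the next PC for beq: incremented PC + sign-extend(offset) * 4 if equal."""
    incremented_pc = (instr_addr + 4) & 0xFFFFFFFF
    if rs_val == rt_val:
        return (incremented_pc + sign_extend(offset, 16) * 4) & 0xFFFFFFFF
    return incremented_pc


print(hex(beq_next_pc(0x00400000, 7, 7, 3)))       # 0x400010, taken: 3 instructions ahead
print(hex(beq_next_pc(0x00400000, 7, 8, 3)))       # 0x400004, not taken
print(hex(beq_next_pc(0x00400000, 7, 7, 0xFFFF)))  # 0x400000, offset -1 branches to itself
```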

Branch If Equal in MIPS and LC-3
- MIPS assembly:
    beq $s0, $s1, offset
- LC-3 assembly:
    NOT R2, R1        ; the first three instructions subtract (R0 - R1)
    ADD R3, R2, #1
    ADD R4, R3, R0
    BRz offset
- This is an example of a tradeoff in the instruction set
  - The same functionality requires more instructions in LC-3
  - But, the control logic requires more complexity in MIPS

Lecture Summary
- Instruction Set Architectures: LC-3 and MIPS
  - Operate instructions
  - Data movement instructions
  - Control instructions
- Instruction formats
- Addressing modes
