Instruction Set Architecture
COE 308 Computer Architecture
Prof. Muhamed Mudawar
Computer Engineering Department
King Fahd University of Petroleum and Minerals

Presentation Outline
• Instruction Set Architecture
• Overview of the MIPS Processor
• R-Type Arithmetic, Logical, and Shift Instructions
• I-Type Format and Immediate Constants
• Jump and Branch Instructions
• Translating If Statements and Boolean Expressions
• Load and Store Instructions
• Translating Loops and Traversing Arrays
• Alternative Architecture

Instruction Set Architecture, COE 308 – Computer Architecture – KFUPM, © Muhamed Mudawar

Instruction Set Architecture (ISA)
• Critical interface between hardware and software
• An ISA includes the following:
  ◦ Instructions and instruction formats
  ◦ Data types, encodings, and representations
  ◦ Programmable storage: registers and memory
  ◦ Addressing modes: to address instructions and data
  ◦ Handling exceptional conditions (like division by zero)
• Examples:
    ISA       Versions                     First introduced in
    Intel     8086, 80386, Pentium, ...    1978
    MIPS      MIPS I, III, IV, V           1986
    PowerPC   601, 604, ...                1993

Instructions
• Instructions are the language of the machine
• We will study the MIPS instruction set architecture
  ◦ Known as a Reduced Instruction Set Computer (RISC)
  ◦ Elegant and relatively simple design
  ◦ Similar to RISC architectures developed in the mid-1980s and 1990s
  ◦ Very popular, used in many products
    ▪ Silicon Graphics, ATI, Cisco, Sony, etc.
  ◦ Comes next in sales after Intel IA-32 processors
    ▪ Almost 100 million MIPS processors sold in 2002 (and increasing)
• Alternative design: Intel IA-32
  ◦ Known as a Complex Instruction Set Computer (CISC)

Basics of RISC Design
• All instructions are typically of one size
• Few instruction formats
• Arithmetic instructions are register to register
  ◦ Operands are read from registers
  ◦ Result is stored in a register
• General-purpose integer and floating-point registers
  ◦ Typically, 32 integer and 32 floating-point registers
• Memory access only via load and store instructions
  ◦ Load and store: bytes, half words, words, and double words
• Few simple addressing modes

Logical View of the MIPS Processor
[Figure: block diagram of the MIPS processor and memory]
• Memory: 4 bytes per word, up to 2^32 bytes = 2^30 words
• Execution & Integer Unit (EIU, main processor): 32 general-purpose registers ($0 to $31), Arithmetic & Logic Unit (ALU), integer multiplier/divider with the Hi and Lo registers
• Floating Point Unit (FPU, coprocessor 1): 32 floating-point registers ($F0 to $F31), floating-point arithmetic unit
• Trap & Memory Unit (TMU, coprocessor 0): BadVaddr, Status, Cause, and EPC registers

Overview of the MIPS Registers
• 32 General Purpose Registers (GPRs): $0 – $31
  ◦ 32-bit registers are used in MIPS32
  ◦ Register 0 is always zero
  ◦ Any value written to $0 is discarded
• Special-purpose registers LO and HI
  ◦ Hold results of integer multiply and divide
• Special-purpose program counter PC
• 32 Floating Point Registers (FPRs): $F0 – $F31
  ◦ Floating-point registers can be either 32-bit or 64-bit
  ◦ A pair of registers is used for double-precision floating-point

MIPS General-Purpose Registers
• 32 General Purpose Registers (GPRs)
  ◦ Assembler uses the dollar notation to name registers
    ▪ $0 is register 0, $1 is register 1, …, and $31 is register 31
  ◦ All registers are 32 bits wide in MIPS32
  ◦ Register $0 is always zero
    ▪ Any value written to $0 is discarded
• Software conventions
  ◦ Software defines names for all registers
    ▪ To standardize their use in programs
  ◦ $8 - $15 are called $t0 - $t7
    ▪ Used for temporary values
  ◦ $16 - $23 are called $s0 - $s7

    $0  = $zero    $16 = $s0
    $1  = $at      $17 = $s1
    $2  = $v0      $18 = $s2
    $3  = $v1      $19 = $s3
    $4  = $a0      $20 = $s4
    $5  = $a1      $21 = $s5
    $6  = $a2      $22 = $s6
    $7  = $a3      $23 = $s7
    $8  = $t0      $24 = $t8
    $9  = $t1      $25 = $t9
    $10 = $t2      $26 = $k0
    $11 = $t3      $27 = $k1
    $12 = $t4      $28 = $gp
    $13 = $t5      $29 = $sp
    $14 = $t6      $30 = $fp
    $15 = $t7      $31 = $ra

MIPS Register Conventions
• Assembler can refer to registers by name or by number
  ◦ It is easier for you to remember registers by name
  ◦ Assembler converts a register name to its corresponding number

    Name          Register     Usage
    $zero         $0           Always 0 (forced by hardware)
    $at           $1           Reserved for assembler use
    $v0 – $v1     $2 – $3      Result values of a function
    $a0 – $a3     $4 – $7      Arguments of a function
    $t0 – $t7     $8 – $15     Temporary values
    $s0 – $s7     $16 – $23    Saved registers (preserved across call)
    $t8 – $t9     $24 – $25    More temporaries
    $k0 – $k1     $26 – $27    Reserved for OS kernel
    $gp           $28          Global pointer (points to global data)
    $sp           $29          Stack pointer (points to top of stack)
    $fp           $30          Frame pointer (points to stack frame)
    $ra           $31          Return address (used by jal for function call)

Instruction Formats
• All instructions are 32 bits wide. There are three instruction formats:
• Register (R-Type)
  ◦ Register-to-register instructions
  ◦ Op: operation code, which also specifies the format of the instruction
      Op6 | Rs5 | Rt5 | Rd5 | sa5 | funct6
• Immediate (I-Type)
  ◦ 16-bit immediate constant is part of the instruction
      Op6 | Rs5 | Rt5 | immediate16
• Jump (J-Type)
  ◦ Used by jump instructions
      Op6 | immediate26

Instruction Categories
• Integer Arithmetic
  ◦ Arithmetic, logical, and shift instructions
• Data Transfer
  ◦ Load and store instructions that access memory
  ◦ Data movement and conversions
• Jump and Branch
  ◦ Flow-control instructions that alter the sequential flow of execution
• Floating Point Arithmetic
  ◦ Instructions that operate on floating-point registers
• Miscellaneous
  ◦ Instructions that transfer control to/from exception handlers
  ◦ Memory management instructions

R-Type Format
    Op6 | Rs5 | Rt5 | Rd5 | sa5 | funct6
• Op: operation code (opcode)
  ◦ Specifies the operation of the instruction
  ◦ Also specifies the format of the instruction
• funct: function code, extends the opcode
  ◦ Up to 2^6 = 64 functions can be defined for the same opcode
  ◦ MIPS uses opcode 0 to define R-type instructions
• Three register operands (common to many instructions)
  ◦ Rs, Rt: first and second source operands
  ◦ Rd: destination operand
  ◦ sa: the shift amount used by shift instructions

Integer Add/Subtract Instructions

    Instruction             Meaning             R-Type Format
    add  $s1, $s2, $s3      $s1 = $s2 + $s3     op = 0  rs = $s2  rt = $s3  rd = $s1  sa = 0  f = 0x20
    addu $s1, $s2, $s3      $s1 = $s2 + $s3     op = 0  rs = $s2  rt = $s3  rd = $s1  sa = 0  f = 0x21
    sub  $s1, $s2, $s3      $s1 = $s2 - $s3     op = 0  rs = $s2  rt = $s3  rd = $s1  sa = 0  f = 0x22
    subu $s1, $s2, $s3      $s1 = $s2 - $s3     op = 0  rs = $s2  rt = $s3  rd = $s1  sa = 0  f = 0x23

• add & sub: overflow causes an arithmetic exception
  ◦ In case of overflow, the result is not written to the destination register
• addu & subu: same operation as add & sub
  ◦ However, no arithmetic exception can occur
  ◦ Overflow is ignored
• Many programming languages ignore overflow
  ◦ The + operator is translated into addu
  ◦ The - operator is translated into subu
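The difference between add and addu can be modeled in a few lines. The following is a minimal Python sketch, not part of the original slides; the helper names (to_signed32, add, addu) are hypothetical, and the arithmetic exception is represented by a Python OverflowError.

```python
MASK32 = 0xFFFFFFFF

def to_signed32(x):
    """Interpret a 32-bit pattern as a two's-complement signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def addu(a, b):
    """Model of addu: wrap around to 32 bits, overflow is ignored."""
    return (a + b) & MASK32

def add(a, b):
    """Model of add: signal an exception on signed overflow, write nothing."""
    total = to_signed32(a) + to_signed32(b)
    if not -(1 << 31) <= total < (1 << 31):
        raise OverflowError("arithmetic exception: signed overflow")
    return total & MASK32
```

With these definitions, addu(0x7FFFFFFF, 1) wraps to 0x80000000, while add(0x7FFFFFFF, 1) raises instead of returning a result.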

Addition/Subtraction Example
• Consider the translation of: f = (g+h) - (i+j)
• Compiler allocates registers to variables
  ◦ Assume that f, g, h, i, and j are allocated registers $s0 thru $s4
  ◦ Called the saved registers: $s0 = $16, $s1 = $17, …, $s7 = $23
• Translation of: f = (g+h) - (i+j)
    addu $t0, $s1, $s2      # $t0 = g + h
    addu $t1, $s3, $s4      # $t1 = i + j
    subu $s0, $t0, $t1      # f = (g+h)-(i+j)
  ◦ Temporary results are stored in $t0 = $8 and $t1 = $9
• Translate: addu $t0, $s1, $s2 to binary code
• Solution:
    op       rs = $s1   rt = $s2   rd = $t0   sa      func
    000000   10001      10010      01000      00000   100001
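The binary translation above can be checked mechanically. The following Python sketch is an illustration added here, not the author's code; encode_r_type is a hypothetical helper that packs the six R-type fields into a 32-bit word.

```python
def encode_r_type(rs, rt, rd, sa, funct, op=0):
    """Pack the R-type fields (6, 5, 5, 5, 5, 6 bits) into one 32-bit word."""
    return ((op & 0x3F) << 26) | ((rs & 0x1F) << 21) | ((rt & 0x1F) << 16) \
        | ((rd & 0x1F) << 11) | ((sa & 0x1F) << 6) | (funct & 0x3F)

# addu $t0, $s1, $s2: rs = $s1 = 17, rt = $s2 = 18, rd = $t0 = 8, funct = 0x21
word = encode_r_type(rs=17, rt=18, rd=8, sa=0, funct=0x21)
bits = format(word, "032b")
print(bits[0:6], bits[6:11], bits[11:16], bits[16:21], bits[21:26], bits[26:32])
```

The printed fields should match the solution row of the slide, and the whole word is 0x02324021.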

Logical Bitwise Operations
• Logical bitwise operations: and, or, xor, nor

    x  y    x and y    x or y    x xor y    x nor y
    0  0       0          0         0          1
    0  1       0          1         1          0
    1  0       0          1         1          0
    1  1       1          1         0          0

• AND instruction is used to clear bits: x and 0 = 0
• OR instruction is used to set bits: x or 1 = 1
• XOR instruction is used to toggle bits: x xor 1 = not x
• NOR instruction can be used as a NOT, how?
  ◦ nor $s1, $s2, $s2 is equivalent to not $s1, $s2

Logical Bitwise Instructions

    Instruction            Meaning                R-Type Format
    and $s1, $s2, $s3      $s1 = $s2 & $s3        op = 0  rs = $s2  rt = $s3  rd = $s1  sa = 0  f = 0x24
    or  $s1, $s2, $s3      $s1 = $s2 | $s3        op = 0  rs = $s2  rt = $s3  rd = $s1  sa = 0  f = 0x25
    xor $s1, $s2, $s3      $s1 = $s2 ^ $s3        op = 0  rs = $s2  rt = $s3  rd = $s1  sa = 0  f = 0x26
    nor $s1, $s2, $s3      $s1 = ~($s2 | $s3)     op = 0  rs = $s2  rt = $s3  rd = $s1  sa = 0  f = 0x27

• Examples: assume $s1 = 0xabcd1234 and $s2 = 0xffff0000
    and $s0, $s1, $s2      # $s0 = 0xabcd0000
    or  $s0, $s1, $s2      # $s0 = 0xffff1234
    xor $s0, $s1, $s2      # $s0 = 0x54321234
    nor $s0, $s1, $s2      # $s0 = 0x0000edcb
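The four example results can be reproduced with ordinary integer operators, as long as results are masked to 32 bits. This is a small Python sketch added for illustration; the function name nor is a hypothetical helper, not an existing library call.

```python
MASK32 = 0xFFFFFFFF

def nor(a, b):
    """Model of nor: bitwise NOT of (a OR b), kept to 32 bits."""
    return ~(a | b) & MASK32

s1, s2 = 0xABCD1234, 0xFFFF0000
print(hex(s1 & s2))      # and
print(hex(s1 | s2))      # or
print(hex(s1 ^ s2))      # xor
print(hex(nor(s1, s2)))  # nor
print(hex(nor(s1, s1)))  # nor of a value with itself acts as not
```

The last line illustrates why nor with a repeated source operand serves as the not pseudo-instruction.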

Shift Operations
• Shifting moves all the bits in a register left or right
• Shifts by a constant amount: sll, srl, sra
  ◦ sll/srl mean shift left/right logical by a constant amount
  ◦ The 5-bit shift amount field is used by these instructions
  ◦ sra means shift right arithmetic by a constant amount
  ◦ The sign bit (rather than 0) is shifted in from the left
[Figure: a 32-bit register being shifted]
  ◦ sll: shift out the MSB, shift in 0 from the right
  ◦ srl: shift in 0 from the left, shift out the LSB
  ◦ sra: shift in the sign bit from the left, shift out the LSB

Shift Instructions

    Instruction             Meaning               R-Type Format
    sll  $s1, $s2, 10       $s1 = $s2 << 10       op = 0  rs = 0    rt = $s2  rd = $s1  sa = 10  f = 0
    srl  $s1, $s2, 10       $s1 = $s2 >>> 10      op = 0  rs = 0    rt = $s2  rd = $s1  sa = 10  f = 2
    sra  $s1, $s2, 10       $s1 = $s2 >> 10       op = 0  rs = 0    rt = $s2  rd = $s1  sa = 10  f = 3
    sllv $s1, $s2, $s3      $s1 = $s2 << $s3      op = 0  rs = $s3  rt = $s2  rd = $s1  sa = 0   f = 4
    srlv $s1, $s2, $s3      $s1 = $s2 >>> $s3     op = 0  rs = $s3  rt = $s2  rd = $s1  sa = 0   f = 6
    srav $s1, $s2, $s3      $s1 = $s2 >> $s3      op = 0  rs = $s3  rt = $s2  rd = $s1  sa = 0   f = 7

  (Here >>> denotes a logical right shift and >> an arithmetic right shift.)
• Shifts by a variable amount: sllv, srlv, srav
  ◦ Same as sll, srl, sra, but a register is used for the shift amount
• Examples: assume that $s2 = 0xabcd1234, $s3 = 16
    sll  $s1, $s2, 8        $s1 = $s2 << 8        $s1 = 0xcd123400
    sra  $s1, $s2, 4        $s1 = $s2 >> 4        $s1 = 0xfabcd123
    srlv $s1, $s2, $s3      $s1 = $s2 >>> $s3     $s1 = 0x0000abcd
• Encoding of srlv $s1, $s2, $s3:
    op=000000  rs=$s3=10011  rt=$s2=10010  rd=$s1=10001  sa=00000  f=000110
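The three shift flavors differ only in what is shifted in. The Python sketch below is an added illustration with hypothetical helper names; it relies on the fact that Python's >> on a negative integer replicates the sign, which mirrors sra, and it masks the shift amount to 5 bits as the sa field and the variable-shift instructions do.

```python
MASK32 = 0xFFFFFFFF

def sll(x, n):
    """Shift left logical: zeros enter from the right."""
    return (x << (n & 31)) & MASK32

def srl(x, n):
    """Shift right logical: zeros enter from the left."""
    return (x & MASK32) >> (n & 31)

def sra(x, n):
    """Shift right arithmetic: copies of the sign bit enter from the left."""
    x &= MASK32
    signed = x - (1 << 32) if x & 0x80000000 else x
    return (signed >> (n & 31)) & MASK32

s2, s3 = 0xABCD1234, 16
print(hex(sll(s2, 8)), hex(sra(s2, 4)), hex(srl(s2, s3)))
```

The three printed values correspond to the sll, sra, and srlv examples on the slide.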

Binary Multiplication
• Shift-left (sll) instruction can perform multiplication
  ◦ When the multiplier is a power of 2
• You can factor any binary number into powers of 2
  ◦ Example: multiply $s1 by 36
    ▪ Factor 36 into (4 + 32) and use the distributive property of multiplication
  ◦ $s2 = $s1*36 = $s1*(4 + 32) = $s1*4 + $s1*32
    sll  $t0, $s1, 2        # $t0 = $s1 * 4
    sll  $t1, $s1, 5        # $t1 = $s1 * 32
    addu $s2, $t0, $t1      # $s2 = $s1 * 36
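The factoring step generalizes: every 1 bit in the constant contributes one shifted copy of the operand. The following Python sketch is an added illustration (shift_terms and mul_by_const are hypothetical names), assuming 32-bit wraparound as with addu.

```python
MASK32 = 0xFFFFFFFF

def shift_terms(k):
    """Return the bit positions set in k, i.e. the shift amounts to use."""
    return [bit for bit in range(k.bit_length()) if (k >> bit) & 1]

def mul_by_const(x, k):
    """Multiply x by the constant k using only shifts and adds (mod 2^32)."""
    result = 0
    for amount in shift_terms(k):
        result = (result + (x << amount)) & MASK32
    return result

print(shift_terms(36))      # shift amounts for 36 = 4 + 32
print(mul_by_const(7, 36))
```

For 36 the shift amounts are 2 and 5, matching the two sll instructions in the slide's example.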

Your Turn . . .
Multiply $s1 by 26, using shift and add instructions
Hint: 26 = 2 + 8 + 16
    sll  $t0, $s1, 1        # $t0 = $s1 * 2
    sll  $t1, $s1, 3        # $t1 = $s1 * 8
    addu $s2, $t0, $t1      # $s2 = $s1 * 10
    sll  $t0, $s1, 4        # $t0 = $s1 * 16
    addu $s2, $s2, $t0      # $s2 = $s1 * 26
Multiply $s1 by 31
Hint: 31 = 32 - 1
    sll  $s2, $s1, 5        # $s2 = $s1 * 32
    subu $s2, $s2, $s1      # $s2 = $s1 * 31

Integer Multiplication & Division
• Consider a×b and a/b where a and b are in $s1 and $s2
  ◦ Signed multiplication:    mult  $s1, $s2
  ◦ Unsigned multiplication:  multu $s1, $s2
  ◦ Signed division:          div   $s1, $s2
  ◦ Unsigned division:        divu  $s1, $s2
• For multiplication, the result is 64 bits
  ◦ LO = low-order 32 bits and HI = high-order 32 bits
• For division
  ◦ LO = 32-bit quotient and HI = 32-bit remainder
  ◦ If the divisor is 0 then the result is unpredictable
• Moving data
  ◦ mflo rd (move from LO to rd), mfhi rd (move from HI to rd)
  ◦ mtlo rs (move to LO from rs), mthi rs (move to HI from rs)
[Figure: registers $0 to $31 feed the multiply/divide unit, which writes HI and LO]

Integer Multiply/Divide Instructions

    Instruction      Meaning               Format (op6, rs5, rt5, rd5, sa5, funct6)
    mult  rs, rt     hi, lo = rs × rt      op6 = 0   rs5   rt5   0     0   0x18
    multu rs, rt     hi, lo = rs × rt      op6 = 0   rs5   rt5   0     0   0x19
    div   rs, rt     hi, lo = rs / rt      op6 = 0   rs5   rt5   0     0   0x1a
    divu  rs, rt     hi, lo = rs / rt      op6 = 0   rs5   rt5   0     0   0x1b
    mfhi  rd         rd = hi               op6 = 0   0     0     rd5   0   0x10
    mflo  rd         rd = lo               op6 = 0   0     0     rd5   0   0x12
    mthi  rs         hi = rs               op6 = 0   rs5   0     0     0   0x11
    mtlo  rs         lo = rs               op6 = 0   rs5   0     0     0   0x13

• Signed arithmetic: mult, div (rs and rt are signed)
  ◦ LO = 32-bit low-order and HI = 32-bit high-order of multiplication
  ◦ LO = 32-bit quotient and HI = 32-bit remainder of division
• Unsigned arithmetic: multu, divu (rs and rt are unsigned)
• NO arithmetic exception can occur
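How HI and LO are filled can be sketched in Python. This is an illustrative model only, with hypothetical function names that return an (hi, lo) pair; it assumes signed division truncates toward zero, and it leaves the divide-by-zero case undefined as the slides do (Python itself would raise ZeroDivisionError).

```python
MASK32 = 0xFFFFFFFF

def to_signed32(x):
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def multu(a, b):
    """Unsigned multiply: 64-bit product split into (hi, lo)."""
    product = (a & MASK32) * (b & MASK32)
    return (product >> 32) & MASK32, product & MASK32

def mult(a, b):
    """Signed multiply: 64-bit two's-complement product split into (hi, lo)."""
    product = (to_signed32(a) * to_signed32(b)) & ((1 << 64) - 1)
    return (product >> 32) & MASK32, product & MASK32

def divu(a, b):
    """Unsigned divide: hi = remainder, lo = quotient."""
    a, b = a & MASK32, b & MASK32
    return a % b, a // b

def div(a, b):
    """Signed divide, quotient truncated toward zero (assumption of this sketch)."""
    sa, sb = to_signed32(a), to_signed32(b)
    quotient = abs(sa) // abs(sb)
    if (sa < 0) != (sb < 0):
        quotient = -quotient
    remainder = sa - quotient * sb
    return remainder & MASK32, quotient & MASK32
```

For example, the bit pattern 0xFFFFFFFF times 2 gives hi = 1 under multu but hi = 0xFFFFFFFF under mult, because the same bits mean 4294967295 in one case and -1 in the other.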

I-Type Format
• Constants are used quite frequently in programs
  ◦ The R-type shift instructions have a 5-bit shift amount constant
  ◦ What about other instructions that need a constant?
• I-Type: instructions with immediate operands
    Op6 | Rs5 | Rt5 | immediate16
• 16-bit immediate constant is stored inside the instruction
  ◦ Rs is the source register number
  ◦ Rt is now the destination register number (for R-type it was Rd)
• Examples of I-Type ALU instructions:
  ◦ Add immediate:  addi $s1, $s2, 5     # $s1 = $s2 + 5
  ◦ OR immediate:   ori  $s1, $s2, 5     # $s1 = $s2 | 5

I-Type ALU Instructions

    Instruction             Meaning              I-Type Format
    addi  $s1, $s2, 10      $s1 = $s2 + 10       op = 0x8   rs = $s2   rt = $s1   imm16 = 10
    addiu $s1, $s2, 10      $s1 = $s2 + 10       op = 0x9   rs = $s2   rt = $s1   imm16 = 10
    andi  $s1, $s2, 10      $s1 = $s2 & 10       op = 0xc   rs = $s2   rt = $s1   imm16 = 10
    ori   $s1, $s2, 10      $s1 = $s2 | 10       op = 0xd   rs = $s2   rt = $s1   imm16 = 10
    xori  $s1, $s2, 10      $s1 = $s2 ^ 10       op = 0xe   rs = $s2   rt = $s1   imm16 = 10
    lui   $s1, 10           $s1 = 10 << 16       op = 0xf   rs = 0     rt = $s1   imm16 = 10

• addi: overflow causes an arithmetic exception
  ◦ In case of overflow, the result is not written to the destination register
• addiu: same operation as addi but overflow is ignored
• Immediate constant for addi and addiu is signed
  ◦ No need for subi or subiu instructions
• Immediate constant for andi, ori, xori is unsigned

Examples: I-Type ALU Instructions
• Examples: assume A, B, C are allocated $s0, $s1, $s2
    A = B+5;    translated as    addiu $s0, $s1, 5
    C = B-1;    translated as    addiu $s2, $s1, -1
        op=001001  rs=$s1=10001  rt=$s2=10010  imm = -1 = 1111111111111111
    A = B&0xf;  translated as    andi  $s0, $s1, 0xf
    C = B|0xf;  translated as    ori   $s2, $s1, 0xf
    C = 5;      translated as    ori   $s2, $zero, 5
    A = B;      translated as    ori   $s0, $s1, 0
• No need for subi, because addi has a signed immediate
• Register 0 ($zero) always has the value 0
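The encoding of addiu $s2, $s1, -1 shows how a negative immediate is stored as a 16-bit two's-complement field. A minimal Python sketch, added for illustration with a hypothetical helper name:

```python
def encode_i_type(op, rs, rt, imm):
    """Pack the I-type fields (6, 5, 5, 16 bits); imm is truncated to 16 bits."""
    return ((op & 0x3F) << 26) | ((rs & 0x1F) << 21) | ((rt & 0x1F) << 16) \
        | (imm & 0xFFFF)

# addiu $s2, $s1, -1: op = 0x9, rs = $s1 = 17, rt = $s2 = 18
word = encode_i_type(0x9, 17, 18, -1)
bits = format(word, "032b")
print(bits[0:6], bits[6:11], bits[11:16], bits[16:32])
```

Masking -1 with 0xFFFF yields sixteen 1 bits, which is the immediate field shown in the example.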

32-bit Constants
• I-Type instructions can have only 16-bit constants
    Op6 | Rs5 | Rt5 | immediate16
• What if we want to load a 32-bit constant into a register?
• Can't have a 32-bit constant in I-Type instructions
  ◦ We have already fixed the sizes of all instructions to 32 bits
• Solution: use two instructions instead of one
  ◦ Suppose we want: $s1 = 0xAC5165D9 (32-bit constant)
  ◦ lui: load upper immediate (loads the upper 16 bits and clears the lower 16 bits)
    lui $s1, 0xAC51           # $s1 = $17 = 0xAC51 0000
    ori $s1, $s1, 0x65D9      # $s1 = $17 = 0xAC51 65D9
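The lui/ori pair can be modeled directly. This short Python sketch is an added illustration; split_constant, lui, and ori are hypothetical helpers mirroring the two instructions.

```python
MASK32 = 0xFFFFFFFF

def split_constant(value):
    """Split a 32-bit constant into its (upper, lower) 16-bit halves."""
    return (value >> 16) & 0xFFFF, value & 0xFFFF

def lui(imm16):
    """Load upper immediate: imm16 goes to bits 31..16, bits 15..0 are cleared."""
    return ((imm16 & 0xFFFF) << 16) & MASK32

def ori(reg, imm16):
    """OR immediate: the 16-bit immediate is zero-extended."""
    return (reg | (imm16 & 0xFFFF)) & MASK32

upper, lower = split_constant(0xAC5165D9)
s1 = ori(lui(upper), lower)
print(hex(upper), hex(lower), hex(s1))
```

Because ori zero-extends its immediate, it cannot disturb the upper half that lui just loaded.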

J-Type Format
    Op6 | immediate26
• J-type format is used for the unconditional jump instruction:
    j label           # jump to label
    . . .
    label:
• 26-bit immediate value is stored in the instruction
  ◦ Immediate constant specifies the address of the target instruction
• Program Counter (PC) is modified as follows:
  ◦ Next PC = upper 4 bits of PC, followed by immediate26, followed by 00
  ◦ The least-significant 2 bits are 00
  ◦ The upper 4 most significant bits of PC are unchanged
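The Next PC rule amounts to a mask, a shift, and an OR. The Python sketch below follows the slide's description (upper 4 bits taken from the current PC); the function names and the example addresses are assumptions chosen for illustration.

```python
def jump_target(pc, imm26):
    """Next PC = PC[31:28] followed by imm26 followed by two 0 bits."""
    return (pc & 0xF0000000) | ((imm26 & 0x03FFFFFF) << 2)

def jump_immediate(target):
    """Inverse direction: the 26-bit field an assembler would store for target."""
    return (target >> 2) & 0x03FFFFFF

# Hypothetical example: a jump located at 0x00400000 targeting 0x00400010
imm = jump_immediate(0x00400010)
print(hex(imm), hex(jump_target(0x00400000, imm)))
```

Dropping the two low bits is safe because instructions are 4 bytes wide and word-aligned, so those bits are always 00.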

Conditional Branch Instructions
• MIPS compare and branch instructions:
    beq Rs, Rt, label     branch to label if (Rs == Rt)
    bne Rs, Rt, label     branch to label if (Rs != Rt)
• MIPS compare to zero & branch instructions
  Compare to zero is used frequently and implemented efficiently
    bltz Rs, label        branch to label if (Rs < 0)
    bgtz Rs, label        branch to label if (Rs > 0)
    blez Rs, label        branch to label if (Rs <= 0)
    bgez Rs, label        branch to label if (Rs >= 0)
• No need for beqz and bnez instructions. Why?

Set on Less Than Instructions
• MIPS also provides set on less than instructions
    slt   rd, rs, rt       if (rs < rt) rd = 1 else rd = 0
    sltu  rd, rs, rt       unsigned <
    slti  rt, rs, im16     if (rs < im16) rt = 1 else rt = 0
    sltiu rt, rs, im16     unsigned <
• Signed / unsigned comparisons can produce different results
  Assume $s0 = 1 and $s1 = -1 = 0xffffffff
    slt  $t0, $s0, $s1     results in $t0 = 0
    sltu $t0, $s0, $s1     results in $t0 = 1
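The signed/unsigned difference comes down to how the same 32 bits are interpreted before comparing. A minimal Python sketch with hypothetical helper names, added here for illustration:

```python
MASK32 = 0xFFFFFFFF

def to_signed32(x):
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def slt(rs, rt):
    """Set on less than, operands compared as signed values."""
    return 1 if to_signed32(rs) < to_signed32(rt) else 0

def sltu(rs, rt):
    """Set on less than unsigned, operands compared as raw bit patterns."""
    return 1 if (rs & MASK32) < (rt & MASK32) else 0

s0, s1 = 1, 0xFFFFFFFF   # $s1 holds -1 as a signed value
print(slt(s0, s1), sltu(s0, s1))
```

As a signed value 0xffffffff is -1, so 1 < -1 is false; as an unsigned value it is the largest 32-bit number, so the comparison is true.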

More on Branch Instructions
• MIPS hardware does NOT provide instructions for:
    blt, bltu     branch if less than (signed/unsigned)
    ble, bleu     branch if less or equal (signed/unsigned)
    bgt, bgtu     branch if greater than (signed/unsigned)
    bge, bgeu     branch if greater or equal (signed/unsigned)
  Can be achieved with a sequence of 2 instructions
• How to implement:    blt $s0, $s1, label
• Solution:            slt $at, $s0, $s1
                       bne $at, $zero, label
• How to implement:    ble $s2, $s3, label
• Solution:            slt $at, $s3, $s2
                       beq $at, $zero, label

Pseudo-Instructions
• Introduced by the assembler as if they were real instructions
  ◦ To facilitate assembly language programming

    Pseudo-Instruction          Conversion to Real Instructions
    move $s1, $s2               addu $s1, $s2, $zero
    not  $s1, $s2               nor  $s1, $s2, $s2
    li   $s1, 0xabcd            ori  $s1, $zero, 0xabcd
    li   $s1, 0xabcd1234        lui  $s1, 0xabcd
                                ori  $s1, $s1, 0x1234
    sgt  $s1, $s2, $s3          slt  $s1, $s3, $s2
    blt  $s1, $s2, label        slt  $at, $s1, $s2
                                bne  $at, $zero, label

• Assembler reserves $at = $1 for its own use
  ◦ $at is called the assembler temporary register

Jump, Branch, and SLT Instructions

    Instruction            Meaning                  Format
    j    label             jump to label            op6 = 2   imm26
    beq  rs, rt, label     branch if (rs == rt)     op6 = 4   rs5   rt5   imm16
    bne  rs, rt, label     branch if (rs != rt)     op6 = 5   rs5   rt5   imm16
    blez rs, label         branch if (rs <= 0)      op6 = 6   rs5   0     imm16
    bgtz rs, label         branch if (rs > 0)       op6 = 7   rs5   0     imm16
    bltz rs, label         branch if (rs < 0)       op6 = 1   rs5   0     imm16
    bgez rs, label         branch if (rs >= 0)      op6 = 1   rs5   1     imm16

    Instruction            Meaning                  Format
    slt   rd, rs, rt       rd = (rs < rt ? 1 : 0)   op6 = 0   rs5   rt5   rd5   0   0x2a
    sltu  rd, rs, rt       rd = (rs < rt ? 1 : 0)   op6 = 0   rs5   rt5   rd5   0   0x2b
    slti  rt, rs, imm16    rt = (rs < imm ? 1 : 0)  op6 = 0xa   rs5   rt5   imm16
    sltiu rt, rs, imm16    rt = (rs < imm ? 1 : 0)  op6 = 0xb   rs5   rt5   imm16

Translating an IF Statement
• Consider the following IF statement:
    if (a == b) c = d + e; else c = d - e;
  Assume that a, b, c, d, e are in $s0, …, $s4 respectively
• How to translate the above IF statement?
          bne  $s0, $s1, else
          addu $s2, $s3, $s4
          j    exit
    else: subu $s2, $s3, $s4
    exit: . . .

Compound Expression with AND
• Programming languages use short-circuit evaluation
• If the first expression is false, the second expression is skipped
    if (($s1 > 0) && ($s2 < 0)) {$s3++;}

    # One possible implementation . . .
          bgtz  $s1, L1          # first expression
          j     next             # skip if false
    L1:   bltz  $s2, L2          # second expression
          j     next             # skip if false
    L2:   addiu $s3, $s3, 1      # both are true
    next:

Better Implementation for AND
    if (($s1 > 0) && ($s2 < 0)) {$s3++;}
The following implementation uses less code
• Reverse the relational operator
• Allow the program to fall through to the second expression
• Number of instructions is reduced from 5 to 3

    # Better implementation . . .
          blez  $s1, next        # skip if false
          bgez  $s2, next        # skip if false
          addiu $s3, $s3, 1      # both are true
    next:

Compound Expression with OR
• Short-circuit evaluation for logical OR
• If the first expression is true, the second expression is skipped
    if (($s1 > $s2) || ($s2 > $s3)) {$s4 = 1;}
• Use fall-through to keep the code as short as possible
          bgt $s1, $s2, L1       # yes, execute if part
          ble $s2, $s3, next     # no: skip if part
    L1:   li  $s4, 1             # set $s4 to 1
    next:
• bgt, ble, and li are pseudo-instructions
  ◦ Translated by the assembler to real instructions

Your Turn . . .
• Translate the IF statement to assembly language
• $s1 and $s2 values are unsigned
    if ($s1 <= $s2) { $s3 = $s4 }

          bgtu $s1, $s2, next
          move $s3, $s4
    next:
• $s3, $s4, and $s5 values are signed
    if (($s3 <= $s4) && ($s4 > $s5)) { $s3 = $s4 + $s5 }

          bgt  $s3, $s4, next
          ble  $s4, $s5, next
          addu $s3, $s4, $s5
    next:

Load and Store Instructions
• Instructions that transfer data between memory & registers
• Programs include variables such as arrays and objects
• Such variables are stored in memory
• Load instruction:
  ◦ Transfers data from memory to a register
• Store instruction:
  ◦ Transfers data from a register to memory
• Memory address must be specified by load and store
[Figure: load moves data from Memory to Registers; store moves data from Registers to Memory]

Load and Store Word
• Load Word instruction (word = 4 bytes in MIPS)
    lw Rt, imm16(Rs)       # Rt = MEMORY[Rs+imm16]
• Store Word instruction
    sw Rt, imm16(Rs)       # MEMORY[Rs+imm16] = Rt
• Base or displacement addressing is used
  ◦ Memory address = Rs (base) + immediate16 (displacement)
  ◦ Immediate16 is sign-extended to have a signed displacement
[Figure: base or displacement addressing. The immediate16 field of the instruction (Op6 | Rs5 | Rt5 | immediate16) is added to the base address in Rs to select a memory word]
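The address calculation for lw and sw can be sketched as follows. This Python model is an added illustration: memory is represented by a dictionary keyed by byte address, the helper names are hypothetical, and the base address 0x10010000 is an arbitrary assumption.

```python
MASK32 = 0xFFFFFFFF

def sign_extend16(imm16):
    """Sign-extend a 16-bit immediate to a Python integer."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16

def effective_address(base, imm16):
    """Memory address = Rs (base) + sign-extended immediate16."""
    return (base + sign_extend16(imm16)) & MASK32

def lw(memory, base, imm16):
    return memory[effective_address(base, imm16)]

def sw(memory, base, imm16, value):
    memory[effective_address(base, imm16)] = value & MASK32

# A[1] = A[2] + 5 for an array of words; the array base is an assumed address
A = 0x10010000
memory = {A + 8: 10}
sw(memory, A, 4, lw(memory, A, 8) + 5)
print(memory[A + 4])
```

Because the displacement is signed, the same mechanism reaches words both above and below the base register.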

Example on Load & Store
• Translate A[1] = A[2] + 5 (A is an array of words)
  ◦ Assume that the address of array A is stored in register $s0
    lw    $s1, 8($s0)        # $s1 = A[2]
    addiu $s2, $s1, 5        # $s2 = A[2] + 5
    sw    $s2, 4($s0)        # A[1] = $s2
• Index of A[2] and A[1] should be multiplied by 4. Why?
[Figure: registers and memory]
  ◦ Registers: $s0 = $16 holds the address of A, $s1 = $17 holds the value of A[2], $s2 = $18 holds A[2] + 5
  ◦ Memory: A[0] at address A, A[1] at A+4, A[2] at A+8, A[3] at A+12

Load and Store Byte and Halfword
• The MIPS processor supports the following data formats:
  ◦ Byte = 8 bits, Halfword = 16 bits, Word = 32 bits
• Load & store instructions for bytes and halfwords
  ◦ lb = load byte, lbu = load byte unsigned, sb = store byte
  ◦ lh = load half, lhu = load half unsigned, sh = store halfword
• Load expands memory data to fit into a 32-bit register
• Store reduces a 32-bit register to fit in memory
[Figure: filling a 32-bit register]
  ◦ lb: the byte is sign-extended; lbu: the byte is zero-extended
  ◦ lh: the halfword is sign-extended; lhu: the halfword is zero-extended
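Sign extension versus zero extension is easy to see on concrete values. The Python sketch below is an added illustration; each hypothetical helper takes the raw byte or halfword read from memory (or the register value for stores) and returns the resulting 32-bit pattern.

```python
MASK32 = 0xFFFFFFFF

def lb(byte):
    """Load byte: sign-extend 8 bits to 32 bits."""
    byte &= 0xFF
    return (byte - 0x100 if byte & 0x80 else byte) & MASK32

def lbu(byte):
    """Load byte unsigned: zero-extend 8 bits to 32 bits."""
    return byte & 0xFF

def lh(half):
    """Load halfword: sign-extend 16 bits to 32 bits."""
    half &= 0xFFFF
    return (half - 0x10000 if half & 0x8000 else half) & MASK32

def lhu(half):
    """Load halfword unsigned: zero-extend 16 bits to 32 bits."""
    return half & 0xFFFF

def sb(reg):
    """Store byte: only the low 8 bits of the register reach memory."""
    return reg & 0xFF

def sh(reg):
    """Store halfword: only the low 16 bits of the register reach memory."""
    return reg & 0xFFFF

print(hex(lb(0x80)), hex(lbu(0x80)), hex(lh(0x8001)), hex(lhu(0x8001)))
```

The signed and unsigned loads agree whenever the top bit of the loaded byte or halfword is 0.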

Load and Store Instructions

    Instruction            Meaning                 I-Type Format
    lb  rt, imm16(rs)      rt = MEM[rs+imm16]      0x20   rs5   rt5   imm16
    lh  rt, imm16(rs)      rt = MEM[rs+imm16]      0x21   rs5   rt5   imm16
    lw  rt, imm16(rs)      rt = MEM[rs+imm16]      0x23   rs5   rt5   imm16
    lbu rt, imm16(rs)      rt = MEM[rs+imm16]      0x24   rs5   rt5   imm16
    lhu rt, imm16(rs)      rt = MEM[rs+imm16]      0x25   rs5   rt5   imm16
    sb  rt, imm16(rs)      MEM[rs+imm16] = rt      0x28   rs5   rt5   imm16
    sh  rt, imm16(rs)      MEM[rs+imm16] = rt      0x29   rs5   rt5   imm16
    sw  rt, imm16(rs)      MEM[rs+imm16] = rt      0x2b   rs5   rt5   imm16

• Base or displacement addressing is used
  ◦ Memory address = Rs (base) + immediate16 (displacement)
• Two variations on base addressing
  ◦ If Rs = $zero = 0 then address = immediate16 (absolute)
  ◦ If immediate16 = 0 then address = Rs (register indirect)

Translating a WHILE Loop
• Consider the following WHILE statement:
    i = 0; while (A[i] != k) i = i+1;
  Where A is an array of integers (4 bytes per element)
  Assume the address of A, i, and k are in $s0, $s1, $s2, respectively
• How to translate the above WHILE statement?
          xor   $s1, $s1, $s1    # i = 0
          move  $t0, $s0         # $t0 = address A
    loop: lw    $t1, 0($t0)      # $t1 = A[i]
          beq   $t1, $s2, next   # exit if (A[i] == k)
          addiu $s1, $s1, 1      # i = i+1
          sll   $t0, $s1, 2      # $t0 = 4*i
          addu  $t0, $s0, $t0    # $t0 = address A[i]
          j     loop
    next: . . .
[Figure: memory layout of the array. A[0] at address A, A[1] at A+4, A[2] at A+8, …, A[i] at A+4×i]

Using Pointers to Traverse Arrays v Consider the same WHILE loop:
      i = 0; while (A[i] != k) i = i+1;
Where the address of A, i, and k are in $s0, $s1, $s2, respectively v We can use a pointer to traverse array A ² Pointer is incremented by 4 (faster than indexing)
      move  $t0, $s0          # $t0 = $s0 = addr A
      j     cond              # test condition
loop: addiu $s1, $s1, 1       # i = i+1
      addiu $t0, $t0, 4       # point to next
cond: lw    $t1, 0($t0)       # $t1 = A[i]
      bne   $t1, $s2, loop    # loop if A[i] != k
v Only 4 instructions (rather than 6) in loop body

Arrays vs. Pointers v Array indexing involves ² Multiplying index by element size ² Using shift instruction when element size is a power of 2 ² Adding to array base address v Array version requires shift to be inside loop ² Part of index calculation for incremented i v Pointers correspond directly to memory addresses ² Can avoid indexing complexity ² Induction variable elimination ² Fewer instructions and faster code
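
The difference between the two address calculations can be sketched in Python by modeling memory as a dictionary from byte address to word. This is an illustration under that assumption only; `find_by_index` and `find_by_pointer` are hypothetical names, and both searches assume the value k is present in the array.

```python
ELEMENT_SIZE = 4  # bytes per integer element, as on the slides

def find_by_index(memory: dict, base: int, k: int) -> int:
    """Indexed version: recompute base + 4*i (shift and add) every iteration."""
    i = 0
    address = base
    while memory[address] != k:
        i += 1
        address = base + (i << 2)   # sll by 2, then addu with the base
    return i

def find_by_pointer(memory: dict, base: int, k: int) -> int:
    """Pointer version: keep the address itself and add 4 every iteration."""
    i = 0
    pointer = base
    while memory[pointer] != k:
        i += 1
        pointer += ELEMENT_SIZE     # a single addiu replaces shift and add
    return i

if __name__ == "__main__":
    base = 0x10010000
    values = [7, 3, 9, 5]
    memory = {base + ELEMENT_SIZE * j: v for j, v in enumerate(values)}
    print(find_by_index(memory, base, 9), find_by_pointer(memory, base, 9))
```

Both functions visit the same addresses; the pointer version simply carries the address forward instead of rebuilding it from i.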

Copying a String v The following code copies the source string to the target string v Address of source is in $s0 and address of target is in $s1 v Strings are terminated with a null character (C strings) v The loop tests the byte it has just copied, so the null terminator is copied as well
      i = 0;
      do { target[i] = source[i]; i++; } while (source[i-1] != 0);

      move  $t0, $s0          # $t0 = pointer to source
      move  $t1, $s1          # $t1 = pointer to target
L1:   lb    $t2, 0($t0)       # load byte into $t2
      sb    $t2, 0($t1)       # store byte into target
      addiu $t0, $t0, 1       # increment source pointer
      addiu $t1, $t1, 1       # increment target pointer
      bne   $t2, $zero, L1    # loop until NULL char
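
The assembly loop tests the byte it has just stored, so the terminating null byte is copied before the loop exits. A minimal Python model of that behavior, using a `bytearray` as byte-addressable memory (the name `copy_string` is hypothetical):

```python
def copy_string(memory: bytearray, source: int, target: int) -> int:
    """Copy a null-terminated string byte by byte, terminator included.

    Mirrors the lb / sb / addiu / addiu / bne loop: the exit test looks at
    the byte that was just copied. Returns the number of bytes written.
    """
    copied = 0
    while True:
        byte = memory[source]    # lb  $t2, 0($t0)
        memory[target] = byte    # sb  $t2, 0($t1)
        source += 1              # addiu $t0, $t0, 1
        target += 1              # addiu $t1, $t1, 1
        copied += 1
        if byte == 0:            # bne $t2, $zero, L1 falls through on NULL
            break
    return copied

if __name__ == "__main__":
    memory = bytearray(16)
    memory[0:3] = b"hi\x00"
    print(copy_string(memory, 0, 8), bytes(memory[8:11]))
```

An empty source string still copies one byte, the terminator, which leaves the target a valid C string.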

Summing an Integer Array
      sum = 0;
      for (i=0; i<n; i++) sum = sum + A[i];
Assume $s0 = array address, $s1 = array length = n. The loop test is at the bottom, so the body runs at least once: the code assumes n > 0.
      move  $t0, $s0          # $t0 = address A[i]
      xor   $t1, $t1, $t1     # $t1 = i = 0
      xor   $s2, $s2, $s2     # $s2 = sum = 0
L1:   lw    $t2, 0($t0)       # $t2 = A[i]
      addu  $s2, $s2, $t2     # sum = sum + A[i]
      addiu $t0, $t0, 4       # point to next A[i]
      addiu $t1, $t1, 1       # i++
      bne   $t1, $s1, L1      # loop if (i != n)

Addressing Modes v Where are the operands? v How are memory addresses computed? v Immediate Addressing: operand is a constant inside the instruction ² Format: Op6 | Rs5 | Rt5 | immediate16 v Register Addressing: operand is in a register ² Format: Op6 | Rs5 | Rt5 | Rd5 | sa5 | funct6 v Base or Displacement Addressing: operand is in memory (load/store) ² Format: Op6 | Rs5 | Rt5 | immediate16 ² Address = Register (base address) + immediate16, selecting a byte, halfword, or word

Branch / Jump Addressing Modes v PC-Relative Addressing: used for branching (beq, bne, …) ² Format: Op6 | Rs5 | Rt5 | immediate16 ² Target instruction address = (PC30 + immediate16 + 1) followed by 00, where PC30 is the upper 30 bits of PC ² Equivalently, PC = PC + 4 × (1 + immediate16) v Pseudo-direct Addressing: used by the jump instruction ² Format: Op6 | immediate26 ² Target instruction address = PC4 : immediate26 : 00, where PC4 is the upper 4 bits of PC
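
Both target calculations can be written out directly. The sketch below follows the formulas on this slide, treats immediate16 as a signed 16-bit value, and takes PC to be the address of the branch or jump instruction itself; the function names are hypothetical.

```python
def sign_extend16(imm16: int) -> int:
    """Interpret a 16-bit field as a signed instruction count."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16

def branch_target(pc: int, imm16: int) -> int:
    """PC-relative: target = PC + 4 * (1 + immediate16)."""
    return (pc + 4 * (1 + sign_extend16(imm16))) & 0xFFFFFFFF

def jump_target(pc: int, imm26: int) -> int:
    """Pseudo-direct: upper 4 bits of PC, then immediate26, then 00."""
    return (pc & 0xF0000000) | ((imm26 & 0x03FFFFFF) << 2)

if __name__ == "__main__":
    pc = 0x00400010
    print(hex(branch_target(pc, 3)))           # forward branch
    print(hex(branch_target(pc, 0xFFFF)))      # immediate16 = -1
    print(hex(jump_target(pc, 0x00100004)))    # word address shifted left by 2
```

An immediate of -1 yields the branch's own address, since the offset is counted from the instruction after the branch.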

Jump and Branch Limits v Jump Address Boundary = 2^26 instructions = 256 MB ² Text segment cannot exceed 2^26 instructions or 256 MB ² Upper 4 bits of PC are unchanged ² Target Instruction Address = PC4 : immediate26 : 00 v Branch Address Boundary ² Branch instructions use I-Type format (16-bit immediate constant) ² PC-relative addressing: (PC30 + immediate16 + 1) followed by 00 § Target instruction address = PC + 4×(1 + immediate16) § Count number of instructions to branch from next instruction § Positive constant => Forward branch, Negative => Backward branch § At most ±2^15 instructions to branch (most branches are near)
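
The two limits can be checked numerically. In the sketch below, `branch_offset` inverts the PC-relative formula and rejects targets that do not fit a signed 16-bit field, and `same_jump_region` tests whether a jump can reach its target without changing the upper 4 bits of PC. Both names are hypothetical, and PC is taken to be the address of the branch or jump itself.

```python
JUMP_REGION_BYTES = (1 << 26) * 4   # 2^26 instructions of 4 bytes each

def branch_offset(pc: int, target: int) -> int:
    """Return immediate16 such that target = pc + 4 * (1 + immediate16)."""
    if (target - pc) % 4 != 0:
        raise ValueError("instructions are word aligned")
    offset = (target - pc) // 4 - 1          # counted from the next instruction
    if not -(1 << 15) <= offset < (1 << 15):
        raise ValueError("target is out of branch range")
    return offset

def same_jump_region(pc: int, target: int) -> bool:
    """A jump keeps the upper 4 bits of PC, so both must share them."""
    return (pc >> 28) == (target >> 28)

if __name__ == "__main__":
    print(JUMP_REGION_BYTES // (1024 * 1024), "MB")
    print(branch_offset(0x00400010, 0x00400020))
    print(same_jump_region(0x00400010, 0x0FFFFFFC))
    print(same_jump_region(0x00400010, 0x10000000))
```

A target outside these limits has to be reached some other way, for example with a jump through a register.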

Alternative Architecture v Design alternative: ² Provide more complex instructions ² Goal is to reduce number of instructions executed ² Danger is a slower cycle time and/or a higher CPI v Let’s look briefly at IA-32 (Intel Architecture - 32 bits) ² An architecture that is “difficult to explain and impossible to love” ² Developed by several independent groups ² Evolved over more than 20 years ² History illustrates impact of compatibility on the ISA

The Intel x86 ISA v Evolution with backward compatibility ² 8080 (1974): 8-bit microprocessor § Accumulator, plus 3 index-register pairs ² 8086 (1978): 16-bit extension to 8080 § Complex instruction set (CISC) ² 8087 (1980): floating-point coprocessor § Adds FP instructions and register stack ² 80286 (1982): 24-bit addresses, MMU § Segmented memory mapping and protection ² 80386 (1985): 32-bit extension (now IA-32) § Additional addressing modes and operations § Paged memory mapping as well as segments

The Intel x86 ISA v Further evolution… ² i486 (1989): pipelined, on-chip caches and FPU § Compatible competitors: AMD, Cyrix, … ² Pentium (1993): superscalar, 64-bit datapath § Later versions added MMX (Multi-Media eXtension) instructions § The infamous FDIV bug ² Pentium Pro (1995), Pentium II (1997) § New microarchitecture (see Colwell, The Pentium Chronicles) ² Pentium III (1999) § Added SSE (Streaming SIMD Extensions) and registers ² Pentium 4 (2001) § New microarchitecture § Added SSE2 instructions

The Intel x86 ISA v And further… ² AMD64 (2003): extended architecture to 64 bits ² EM64T – Extended Memory 64 Technology (2004) § AMD64 adopted by Intel (with refinements) § Added SSE3 instructions ² Intel Core (2006) § Added SSE4 instructions, virtual machine support ² AMD64 (announced 2007): SSE5 instructions § Intel declined to follow, instead… ² Advanced Vector Extension (announced 2008) § Longer SSE registers, more instructions v Technical elegance ≠ market success

Basic x86 Registers (IA-32) [Figure: register diagram]

Typical IA-32 Instructions v Data movement instructions ² MOV, PUSH, POP, LEA, … v Arithmetic and logical instructions ² ADD, SUB, SHL, SHR, ROL, OR, XOR, INC, DEC, CMP, … v Control flow instructions ² JMP, JZ, JNZ, CALL, RET, LOOP, … v String instructions ² MOVS, LODS, … v First operand is a source and destination ² Can be register or memory operand v Second operand is a source ² Can be register, memory, or an immediate constant

IA-32 Instruction Formats v Complexity: ² Instruction formats from 1 to 17 bytes long ² One operand must act as both a source and destination ² One operand can come from memory ² Complex addressing modes § Base or scaled index with 8 or 32 bit displacement v Typical IA-32 Instruction Formats (field widths in bits):
  JE   EIP + displacement    JE (4) | Condition (4) | Displacement (8)
  CALL                       CALL (8) | Offset (32)
  MOV  EBX, [EDI + 45]       MOV (6) | d (1) | w (1) | r/m Postbyte (8) | Displacement (8)
  PUSH ESI                   PUSH (5) | Reg (3)
  ADD  EAX, #6765            ADD (4) | Reg (3) | w (1) | Immediate (32)
  TEST EDX, #42              TEST (7) | w (1) | Postbyte (8) | Immediate (32)

ARM & MIPS Similarities v ARM: the most popular embedded core v Similar basic set of instructions to MIPS
                          ARM             MIPS
  Date announced          1985            1985
  Instruction size        32 bits         32 bits
  Address space           32-bit flat     32-bit flat
  Data alignment          Aligned         Aligned
  Data addressing modes   9               3
  Registers               15 × 32-bit     31 × 32-bit
  Input/output            Memory mapped   Memory mapped

Compare and Branch in ARM v Uses condition codes for the result of an arithmetic/logic instruction ² Negative, zero, carry, overflow ² Compare instructions to set condition codes without keeping the result v Each instruction can be conditional ² Top 4 bits of instruction word: condition value ² Can avoid branches over single instructions
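
To make the condition codes concrete, the sketch below computes the four flags for a compare, modeled as a 32-bit subtraction whose result is discarded, and extracts the 4-bit condition field from an instruction word. It assumes the ARM convention that the carry flag after a subtraction means "no borrow"; the function names and the sample instruction word are chosen only for illustration.

```python
MASK32 = 0xFFFFFFFF

def compare_flags(a: int, b: int) -> tuple:
    """Return (N, Z, C, V) for a - b on 32-bit values, discarding the result."""
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32
    negative = result >> 31
    zero = int(result == 0)
    carry = int(a >= b)                              # assumed: carry = no borrow
    overflow = ((a ^ b) & (a ^ result)) >> 31        # signed overflow of a - b
    return negative, zero, carry, overflow

def condition_field(instruction_word: int) -> int:
    """Top 4 bits of the instruction word hold the condition value."""
    return (instruction_word >> 28) & 0xF

if __name__ == "__main__":
    print(compare_flags(5, 5))
    print(compare_flags(3, 5))
    print(compare_flags(0x80000000, 1))
    print(hex(condition_field(0xE1A00000)))   # sample word with condition 0xE
```

A conditional instruction executes only when the flags satisfy its condition field, which is how a short if-body can be written without a branch.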

Instruction Encoding [Figure: instruction encoding formats]

Fallacies v Powerful instruction => higher performance ² Fewer instructions required ² But complex instructions are hard to implement § May slow down all instructions, including simple ones ² Compilers are good at making fast code from simple instructions v Use assembly code for high performance ² But modern compilers are better at dealing with modern processors ² More lines of code => more errors and less productivity

Fallacies v Backward compatibility => instruction set doesn’t change ² But they do introduce more instructions [Figure: x86 instruction set]

Summary of Design Principles 1. Simplicity favors regularity ² Simple instructions dominate the instruction frequency § So design them to be simple and regular, and make them fast § Use general-purpose registers uniformly across instructions ² Fix the size of instructions (simplifies fetching & decoding) ² Fix the number of operands per instruction § Three operands is the natural number for a typical instruction 2. Smaller is faster ² Limit the number of registers for faster access (typically 32) 3. Make the common case fast ² Include constants inside instructions (faster than loading them) ² Design most instructions to be register-to-register 4. Good design demands good compromises ² Smaller immediate constants in I-type instructions