Instruction Set Architecture
Prof. Ting-Ting Hwang (黃婷婷), Department of Computer Science, National Tsing Hua University

Outline
• Instruction set architecture
• Operands
  - Register operands and their organization
  - Memory operands, data transfer
  - Immediate operands
• Signed and unsigned numbers
• Representing instructions
• Operations
  - Logical
  - Decision making and branches
• Supporting procedures in hardware
• Communicating with people
• Addressing for 32-bit immediate and addresses
• Translating and starting a program
• A sort example
• Arrays versus pointers
• ARM and x86 instruction sets

What Is Computer Architecture?
Computer Architecture = Instruction Set Architecture + Machine Organization
• "... the attributes of a [computing] system as seen by the [assembly language] programmer, i.e., the conceptual structure and functional behavior ..."
• What are specified?

Recall in C Language
• Operators: +, -, *, /, % (mod), ...
  - 7/4==1, 7%4==3
• Operands:
  - Variables: lower, upper, fahr, celsius
  - Constants: 0, 1000, -17, 15.4
• Assignment statement: variable = expression
  - Expressions consist of operators operating on operands, e.g.,
      celsius = 5*(fahr-32)/9;
      a = b+c+d-e;

When Translating to Assembly...
• Statement: a = b + 5;
      load  $r1, M[b]
      load  $r2, 5
      add   $r3, $r1, $r2
      store $r3, M[a]
• Operator (op code): load, add, store
• Operands: memory (M[b], M[a]), register ($r1, $r2, $r3), constant (5)

Components of an ISA
• Organization of programmable storage
  - registers
  - memory: flat, segmented
  - modes of addressing and accessing data items and instructions
• Data types and data structures
  - encoding and representation
• Instruction formats
• Instruction set (or operation code)
  - ALU, control transfer, exception handling

MIPS ISA as an Example
• Instruction categories:
  - Load/Store
  - Computational
  - Jump and Branch
  - Floating Point
  - Memory Management
  - Special
• Registers: $r0 - $r31, PC, HI, LO
• 3 instruction formats, all 32 bits wide:
      OP | $rs | $rt | $rd | sa | funct
      OP | $rs | $rt | immediate
      OP | jump target

Operations of Hardware
• Syntax of basic MIPS arithmetic/logic instructions:
       1    2    3    4
      add $s0, $s1, $s2    # f = g + h
  1) operation by name
  2) operand getting result ("destination")
  3) 1st operand for operation ("source 1")
  4) 2nd operand for operation ("source 2")
• Each instruction is 32 bits
• Syntax is rigid: 1 operator, 3 operands
  - Why? Keep hardware simple via regularity
• Design Principle 1: Simplicity favors regularity
  - Regularity makes implementation simpler
  - Simplicity enables higher performance at lower cost

Example
• How to do the following C statement?
      f = (g + h) - (i + j);
• Compiled MIPS code:
      add t0, g, h     # temp t0 = g + h
      add t1, i, j     # temp t1 = i + j
      sub f, t0, t1    # f = t0 - t1

Operands and Registers
• Unlike high-level languages, assembly doesn't use variables => assembly operands are registers
  - Limited number of special locations built directly into the hardware
  - Operations are performed on these
• Benefits:
  - Registers in hardware => faster than memory
  - Registers are easier for a compiler to use
      e.g., as a place for temporary storage
  - Registers can hold variables to reduce memory traffic and improve code density (since a register is named with fewer bits than a memory location)

MIPS Registers
• 32 registers, each 32 bits wide
  - Why 32? Design Principle 2: Smaller is faster
  - Groups of 32 bits are called a word in MIPS
  - Registers are numbered from 0 to 31
• Each can be referred to by number or name
  - Number references: $0, $1, $2, ... $30, $31
  - By convention, each register also has a name to make it easier to code, e.g.,
      $16 - $23  =>  $s0 - $s7 (C variables)
      $8 - $15   =>  $t0 - $t7 (temporary)
• 32 x 32-bit FP registers (paired DP)
• Others: HI, LO, PC

Register Conventions for MIPS

    No.   Name   Use
    0     zero   constant 0
    1     at     reserved for assembler
    2     v0     expression evaluation &
    3     v1     function results
    4     a0     arguments
    5     a1
    6     a2
    7     a3
    8     t0     temporary: caller saves
    ...          (callee can clobber)
    15    t7
    16    s0     callee saves
    ...          (caller can clobber)
    23    s7
    24    t8     temporary (cont'd)
    25    t9
    26    k0     reserved for OS kernel
    27    k1
    28    gp     pointer to global area
    29    sp     stack pointer
    30    fp     frame pointer
    31    ra     return address (HW)

Fig. 2.18

MIPS R2000 Organization
(Figure: Fig. A.10.1)

Example
• How to do the following C statement?
      f = (g + h) - (i + j);
• f, ..., j in $s0, ..., $s4
• Use intermediate temporary registers $t0, $t1
      add $t0, $s1, $s2    # t0 = g + h
      add $t1, $s3, $s4    # t1 = i + j
      sub $s0, $t0, $t1    # f = (g+h)-(i+j)

Register Architecture
• Accumulator (1 register):
      1 address:     add A          acc <- acc + mem[A]
      1+x address:   addx A         acc <- acc + mem[A+x]
• Stack:
      0 address:     add            tos <- tos + next
• General Purpose Register:
      2 address:     add A, B       EA(A) <- EA(A) + EA(B)
      3 address:     add A, B, C    EA(A) <- EA(B) + EA(C)
• Load/Store (a special case of GPR):
      3 address:     add   $ra, $rb, $rc    $ra <- $rb + $rc
                     load  $ra, $rb         $ra <- mem[$rb]
                     store $ra, $rb         mem[$rb] <- $ra

Register Organization Affects Programming
Code for C = A + B for four register organizations:

    Stack     Accumulator   Register (reg-mem)   Register (load-store)
    Push A    Load A        Load $r1, A          Load $r1, A
    Push B    Add B         Add $r1, B           Load $r2, B
    Add       Store C       Store C, $r1         Add $r3, $r1, $r2
    Pop C                                        Store C, $r3

=> Register organization is an attribute of the ISA!
• Comparison: Bytes per instruction? Number of instructions? Cycles per instruction?
• Since 1975 all machines use GPRs

Memory Operands
• C variables map onto registers; what about large data structures like arrays?
  - Memory contains such data structures
• But MIPS arithmetic instructions operate on registers, not directly on memory
  - Data transfer instructions (lw, sw, ...) transfer data between memory and registers
  - They need a way to address memory operands

Data Transfer: Memory to Register (1/2)
• To transfer a word of data, need to specify two things:
  - Register: specify this by number (0 - 31)
  - Memory address: more difficult
      Think of memory as a 1D array
      Address it by supplying a pointer to a memory address
      Offset (in bytes) from this pointer
      The desired memory address is the sum of these two values, e.g., 8($t0)
      Specifies the memory address pointed to by the value in $t0, plus 8 bytes (why "bytes", not "words"?)
      Each address is 32 bits

Data Transfer: Memory to Register (2/2)
• Load instruction syntax:
      1    2     3    4
      lw  $t0,  12($s0)
  1) operation name
  2) register that will receive value
  3) numerical offset in bytes
  4) register containing pointer to memory
• Example: lw $t0, 12($s0)
  - lw (Load Word, so a word (32 bits) is loaded at a time)
  - Take the pointer in $s0, add 12 bytes to it, and then load the value from the memory pointed to by this calculated sum into register $t0
• Notes:
  - $s0 is called the base register, 12 is called the offset
  - Offset is generally used in accessing elements of an array: the base register points to the beginning of the array
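The base-plus-offset rule can be sketched in C. This is only an illustrative model, not MIPS hardware behavior: memory is assumed to be a flat byte array, words are kept in the host's byte order, and the names effective_address, load_word, and store_word are hypothetical helpers.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model: lw/sw add a signed 16-bit byte offset to the
 * value held in the base register. */
uint32_t effective_address(uint32_t base, int16_t offset)
{
    return base + (uint32_t)(int32_t)offset;
}

/* Read the 32-bit word stored at byte address addr (host byte order). */
uint32_t load_word(const uint8_t *mem, uint32_t addr)
{
    uint32_t word;
    memcpy(&word, mem + addr, sizeof word);
    return word;
}

/* Write a 32-bit word at byte address addr (host byte order). */
void store_word(uint8_t *mem, uint32_t addr, uint32_t value)
{
    memcpy(mem + addr, &value, sizeof value);
}
```

With $s0 = 1000, lw $t0, 12($s0) corresponds to load_word(mem, effective_address(1000, 12)), which reads the word at byte address 1012.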

Example
• $s0 = 1000
• lw $t0, 12($s0)
• $t0 = ? (answer: 999)

    Address   Memory
    1000      ...
    1004
    1008
    1012      999
    1016      ...

Data Transfer: Register to Memory
• Also want to store a value from a register into memory
• Store instruction syntax is identical to load instruction syntax
• Example: sw $t0, 12($s0)
  - sw (Store Word, so 32 bits or one word are stored at a time)
  - This instruction takes the pointer in $s0, adds 12 bytes to it, and then stores the value from register $t0 into the memory address pointed to by the calculated sum

Example
• $s0 = 1000
• $t0 = 25
• sw $t0, 12($s0)
• M[?] = 25 (answer: M[1012] = 25)

    Address   Memory
    1000      ...
    1004
    1008
    1012      25
    1016      ...

Compilation with Memory
• Compile by hand using registers: $s1: g, $s2: h, $s3: base address of A
      g = h + A[8];
• What offset in lw selects array element A[8] in a C program?
  - 4 x 8 = 32 bytes to select A[8]
  - First, transfer from memory to register:
      lw $t0, 32($s3)      # $t0 gets A[8]
  - Add 32 to $s3 to select A[8], put into $t0
• Next, add it to h and place the result in g
      add $s1, $s2, $t0    # $s1 = h + A[8]
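The offset arithmetic can be checked with a small C sketch. It assumes array elements are 32-bit words, as in MIPS; word_offset and add_element are hypothetical names used only for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte offset of element `index` in an array of 32-bit words. */
size_t word_offset(size_t index)
{
    return index * sizeof(int32_t);    /* 4 bytes per word */
}

/* Computes h + A[index], reading the element through an explicit byte
 * offset from the base address, the way lw does. */
int32_t add_element(const int32_t *A, int32_t h, size_t index)
{
    const unsigned char *base = (const unsigned char *)A;   /* plays the role of $s3 */
    int32_t elem;                                            /* plays the role of $t0 */
    memcpy(&elem, base + word_offset(index), sizeof elem);  /* lw $t0, offset($s3) */
    return h + elem;                                         /* add $s1, $s2, $t0 */
}
```

Here word_offset(8) is 32, matching the offset used in lw $t0, 32($s3).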

Memory Operand Example 2
• C code:
      A[12] = h + A[8];
  - h in $s2, base address of A in $s3
• Compiled MIPS code (X and Y stand for the byte offsets to fill in):
  - Index 8 requires an offset of X
      lw  $t0, X($s3)      # load word
      add $t0, $s2, $t0
      sw  $t0, Y($s3)      # store word
  - Answer: X = 32 (8 x 4), Y = 48 (12 x 4)

Addressing: Byte versus Word
• Every word in memory has an address, similar to an index in an array
• Early computers numbered words like C numbers elements of an array:
  - Memory[0], Memory[1], Memory[2], ...
  - Called the "address" of a word
• Computers need to access 8-bit bytes as well as words (4 bytes/word)
• Today, machines address memory as bytes, hence word addresses differ by 4
  - Memory[0], Memory[4], Memory[8], ...
  - This is also why lw and sw use bytes in the offset

A Note about Memory: Alignment
• MIPS requires that all words start at addresses that are multiples of 4 bytes
  (Figure: byte positions 0, 1, 2, 3 within a word, showing one aligned placement and several not-aligned placements)
• Called alignment: objects must fall on an address that is a multiple of their size
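A minimal C sketch of the alignment rule follows. The helper names is_aligned and align_up are hypothetical, and both assume the object size is a power of two (1, 2, 4, 8, ...).

```c
#include <stdbool.h>
#include <stdint.h>

/* True if addr is a multiple of size; size must be a power of two. */
bool is_aligned(uint32_t addr, uint32_t size)
{
    return (addr & (size - 1)) == 0;
}

/* Smallest aligned address that is >= addr; size must be a power of two. */
uint32_t align_up(uint32_t addr, uint32_t size)
{
    return (addr + size - 1) & ~(size - 1);
}
```

For 4-byte words, is_aligned(1012, 4) holds while is_aligned(1013, 4) does not.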

Another Note: Endianness
• Byte order: numbering of bytes within a word
• Big endian: the most significant byte is at the lowest address of a word
  - IBM 360/370, Motorola 68k, MIPS, Sparc, HP PA
• Little endian: the least significant byte is at the lowest address of a word
  - Intel 80x86, DEC Vax, DEC Alpha (Windows NT)

                     msb             lsb
    little endian:    3    2    1    0     (byte 0 is the lsb)
    big endian:       0    1    2    3     (byte 0 is the msb)

  The word address is the address of byte 0.
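The two byte orders can be made concrete with a short C sketch. The function names are hypothetical; the code writes the four bytes of a word explicitly, so the result does not depend on the host machine's own byte order.

```c
#include <stdint.h>

/* Big endian: most significant byte goes to the lowest address. */
void store_big_endian(uint8_t out[4], uint32_t word)
{
    out[0] = (uint8_t)(word >> 24);
    out[1] = (uint8_t)(word >> 16);
    out[2] = (uint8_t)(word >> 8);
    out[3] = (uint8_t)word;
}

/* Little endian: least significant byte goes to the lowest address. */
void store_little_endian(uint8_t out[4], uint32_t word)
{
    out[0] = (uint8_t)word;
    out[1] = (uint8_t)(word >> 8);
    out[2] = (uint8_t)(word >> 16);
    out[3] = (uint8_t)(word >> 24);
}

/* Returns 1 if the machine running this code is little endian. */
int host_is_little_endian(void)
{
    uint32_t probe = 1;
    return *(const uint8_t *)&probe == 1;
}
```

For the word 0x12345678, the byte at the lowest address is 0x12 in big endian order and 0x78 in little endian order.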

Role of Registers vs. Memory
• What if there are more variables than registers?
  - Compiler tries to keep the most frequently used variables in registers
  - Writes less common variables to memory: spilling
• Why not keep all variables in memory?
  - Smaller is faster: registers are faster than memory
  - Registers are more versatile:
      MIPS arithmetic instructions can read 2 registers, operate on them, and write 1 per instruction
      MIPS data transfers only read or write 1 operand per instruction, and perform no operation

Constants
• Small constants are used frequently (50% of operands), e.g.,
      A = A + 5;
      B = B + 1;
      C = C - 18;
• Put 'typical constants' in memory and load them
• Constant data specified in an instruction:
      addi $29, $29, 4
      slti $8, $18, 10
      andi $29, $29, 6
      ori  $29, $29, 4
• Design Principle 3: Make the common case fast

Immediate Operands
• Immediate: numerical constants
  - Often appear in code, so there are special instructions for them
  - Add Immediate:
      f = g + 10              (in C)
      addi $s0, $s1, 10       (in MIPS)
    where $s0, $s1 are associated with f, g
  - Syntax similar to the add instruction, except that the last argument is a number instead of a register
  - No subtract immediate instruction
      Just use a negative constant
      addi $s2, $s1, -1

The Constant Zero
• The number zero (0) appears very often in code, so we define register zero
• MIPS register 0 ($zero) is the constant 0
  - Cannot be overwritten
  - This is defined in hardware, so an instruction like
      addi $0, $0, 5
    will not do anything
• Useful for common operations
  - E.g., move between registers
      add $t2, $s1, $zero

Signed and unsigned numbers (read by students)

Instructions as Numbers
• Currently we only work with words (32-bit blocks):
  - Each register is a word
  - lw and sw both access memory one word at a time
• So how do we represent instructions?
  - Remember: the computer only understands 1s and 0s, so "add $t0, $0, $0" is meaningless to hardware
  - MIPS wants simplicity: since data is in words, make instructions be words too

MIPS Instruction Format
• One instruction is 32 bits => divide the instruction word into "fields"
  - Each field tells the computer something about the instruction
• We could define different fields for each instruction, but MIPS is based on simplicity, so define 3 basic types of instruction formats:
  - R-format: for register
  - I-format: for immediate, and lw and sw (since the offset counts as an immediate)
  - J-format: for jump

R-Format Instructions (1/2)
• Define the following "fields":

      6 bits   5 bits   5 bits   5 bits   5 bits   6 bits
      opcode   rs       rt       rd       shamt    funct

  - opcode: partially specifies what instruction it is (Note: 0 for all R-format instructions)
  - funct: combined with opcode to specify the instruction
    Question: Why aren't opcode and funct a single 12-bit field?
  - rs (Source Register): generally used to specify the register containing the first operand
  - rt (Target Register): generally used to specify the register containing the second operand
  - rd (Destination Register): generally used to specify the register that will receive the result of the computation

R-Format Instructions (2/2)
• Notes about register fields:
  - Each register field is exactly 5 bits, which means that it can specify any unsigned integer in the range 0-31. Each of these fields specifies one of the 32 registers by number.
• Final field:
  - shamt: contains the amount a shift instruction will shift by. Shifting a 32-bit word by more than 31 is useless, so this field is only 5 bits
  - This field is set to 0 in all but the shift instructions

R-format Example

      op        rs       rt       rd       shamt    funct
      6 bits    5 bits   5 bits   5 bits   5 bits   6 bits

      add $t0, $s1, $s2

      special   $s1      $s2      $t0      0        add
      0         17       18       8        0        32
      000000    10001    10010    01000    00000    100000

      00000010001100100100000000100000₂ = 02324020₁₆
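The field packing can be sketched in C as follows. The helper name encode_r is hypothetical; the field widths and positions follow the R-format layout (opcode 0, then rs, rt, rd, shamt, funct).

```c
#include <stdint.h>

/* Pack the R-format fields into one 32-bit instruction word.
 * The opcode field is 0 for all R-format instructions. */
uint32_t encode_r(uint32_t rs, uint32_t rt, uint32_t rd,
                  uint32_t shamt, uint32_t funct)
{
    return ((rs    & 0x1Fu) << 21) |   /* bits 25..21 */
           ((rt    & 0x1Fu) << 16) |   /* bits 20..16 */
           ((rd    & 0x1Fu) << 11) |   /* bits 15..11 */
           ((shamt & 0x1Fu) << 6)  |   /* bits 10..6  */
            (funct & 0x3Fu);           /* bits 5..0   */
}
```

For add $t0, $s1, $s2 the call encode_r(17, 18, 8, 0, 32) yields 0x02324020.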

Hexadecimal
• Base 16
  - Compact representation of bit strings
  - 4 bits per hex digit

      0  0000     4  0100     8  1000     c  1100
      1  0001     5  0101     9  1001     d  1101
      2  0010     6  0110     a  1010     e  1110
      3  0011     7  0111     b  1011     f  1111

• Example: eca8 6420
  - 1110 1100 1010 1000 0110 0100 0010 0000

I-Format Instructions
• Define the following "fields":

      6 bits   5 bits   5 bits   16 bits
      opcode   rs       rt       immediate

  - opcode: uniquely specifies an I-format instruction
  - rs: specifies the only register operand
  - rt: specifies the register that will receive the result of the computation (target register)
  - For addi and slti, the immediate is sign-extended to 32 bits and treated as a signed integer
  - 16 bits can represent up to 2^16 different immediate values
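Sign extension of the 16-bit immediate can be sketched in C. The names sign_extend16 and zero_extend16 are hypothetical; zero extension is shown only for contrast.

```c
#include <stdint.h>

/* Interpret a 16-bit field as a signed value widened to 32 bits:
 * if bit 15 is set, the value is negative. */
int32_t sign_extend16(uint16_t imm)
{
    int32_t value = imm;
    if (value & 0x8000)
        value -= 0x10000;
    return value;
}

/* Widen the same field without regard to sign: upper 16 bits are 0. */
uint32_t zero_extend16(uint16_t imm)
{
    return imm;
}
```

The 16-bit pattern 0xFFCE sign-extends to -50, while zero extension of the same pattern gives 65486.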

MIPS I-format Instructions
• Design Principle 4: Good design demands good compromises
  - Different formats complicate decoding, but allow 32-bit instructions uniformly
  - Keep formats as similar as possible

I-Format Example 1
• MIPS instruction: addi $21, $22, -50
  - opcode = 8 (look up in table)
  - rs = 22 (register containing operand)
  - rt = 21 (target register)
  - immediate = -50 (by default, this is decimal)

      decimal representation:   8        22      21      -50
      binary representation:    001000   10110   10101   1111111111001110

I-Format Example 2
• MIPS instruction: lw $t0, 1200($t1)
  - opcode = 35 (look up in table)
  - rs = 9 (base register)
  - rt = 8 (destination register)
  - immediate = 1200 (offset)

      decimal representation:   35       9       8       1200
      binary representation:    100011   01001   01000   0000010010110000
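The I-format packing can be sketched in C. The helper name encode_i is hypothetical; the immediate is truncated to its low 16 bits, which for a negative value gives its 16-bit two's complement pattern.

```c
#include <stdint.h>

/* Pack the I-format fields into one 32-bit instruction word. */
uint32_t encode_i(uint32_t opcode, uint32_t rs, uint32_t rt, int32_t imm)
{
    return ((opcode & 0x3Fu) << 26) |      /* bits 31..26 */
           ((rs     & 0x1Fu) << 21) |      /* bits 25..21 */
           ((rt     & 0x1Fu) << 16) |      /* bits 20..16 */
           ((uint32_t)imm & 0xFFFFu);      /* bits 15..0  */
}
```

With these fields, addi $21, $22, -50 encodes as encode_i(8, 22, 21, -50), and lw $t0, 1200($t1) as encode_i(35, 9, 8, 1200).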

Stored Program Computers (The BIG Picture)
• Instructions represented in binary, just like data
• Instructions and data stored in memory
• Programs can operate on programs
  - e.g., compilers, linkers, ...
• Binary compatibility allows compiled programs to work on different computers
  - Standardized ISAs
(Figure: a processor connected to a memory that holds an accounting program (machine code), an editor program (machine code), a C compiler (machine code), payroll data, book text, and source code in C for the editor program)

Bitwise Operations
• Up until now, we've done arithmetic (add, sub, addi) and memory access (lw and sw)
• All of these instructions view the contents of a register as a single quantity (such as a signed or unsigned integer)
• New perspective: view the contents of a register as 32 bits rather than as a single 32-bit number
• Since registers are composed of 32 bits, we may want to access individual bits rather than the whole
• Introduce two new classes of instructions:
  - Shift instructions
  - Logical operators

Logical Operations
• Instructions for bitwise manipulation

      Operation      C     Java   MIPS
      Shift left     <<    <<     sll
      Shift right    >>    >>>    srl
      Bitwise AND    &     &      and, andi
      Bitwise OR     |     |      or, ori
      Bitwise NOT    ~     ~      nor

• Useful for extracting and inserting groups of bits in a word

Shift Operations

      op       rs       rt       rd       shamt    funct
      6 bits   5 bits   5 bits   5 bits   5 bits   6 bits

• shamt: how many positions to shift
• Shift left logical
  - Shift left and fill with 0 bits
  - sll by i bits multiplies by 2^i
• Shift right logical
  - Shift right and fill with 0 bits
  - srl by i bits divides by 2^i (unsigned only)

Shift Instructions (1/3)
• Shift instruction syntax:
       1    2    3    4
      sll  $t2, $s0,  4
  1) operation name
  2) register that will receive value
  3) first operand (register)
  4) shift amount (constant)
• MIPS has three shift instructions:
  - sll (shift left logical): shifts left, fills emptied bits with 0s
  - srl (shift right logical): shifts right, fills emptied bits with 0s
  - sra (shift right arithmetic): shifts right, fills emptied bits by sign extending

Shift Instructions (2/3)
• Move (shift) all the bits in a word to the left or right by a number of bits, filling the emptied bits with 0s
• Example: shift right by 8 bits
      0001 0010 0011 0100 0101 0110 0111 1000
      0000 0000 0001 0010 0011 0100 0101 0110
• Example: shift left by 8 bits
      0001 0010 0011 0100 0101 0110 0111 1000
      0011 0100 0101 0110 0111 1000 0000 0000

Shift Instructions (3/3)
• Example: shift right arithmetic by 8 bits
      0001 0010 0011 0100 0101 0110 0111 1000
      0000 0000 0001 0010 0011 0100 0101 0110
• Example: shift right arithmetic by 8 bits
      1001 0010 0011 0100 0101 0110 0111 1000
      1111 1111 1001 0010 0011 0100 0101 0110
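The three shifts can be modeled in C on unsigned 32-bit values. This is a sketch with hypothetical function names; the arithmetic shift is built by hand because right-shifting a negative signed value in C is implementation-defined.

```c
#include <stdint.h>

/* Shift left logical: vacated low bits are filled with 0. */
uint32_t sll(uint32_t x, unsigned shamt)
{
    return x << (shamt & 31);
}

/* Shift right logical: vacated high bits are filled with 0. */
uint32_t srl(uint32_t x, unsigned shamt)
{
    return x >> (shamt & 31);
}

/* Shift right arithmetic: vacated high bits copy the sign bit. */
uint32_t sra(uint32_t x, unsigned shamt)
{
    shamt &= 31;
    uint32_t shifted = x >> shamt;
    if ((x & 0x80000000u) && shamt > 0)
        shifted |= ~(0xFFFFFFFFu >> shamt);   /* set the top shamt bits */
    return shifted;
}
```

For the bit pattern 0x92345678, srl by 8 gives 0x00923456 while sra by 8 gives 0xFF923456.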

Uses for Shift Instructions
• Shift for multiplication: in binary
  - Multiplying by 4 is the same as shifting left by 2:
      11₂ x 100₂ = 1100₂
      1010₂ x 100₂ = 101000₂
  - Multiplying by 2^n is the same as shifting left by n
• Since shifting is so much faster than multiplication (you can imagine how complicated multiplication is), a good compiler usually notices when C code multiplies by a power of 2 and compiles it to a shift instruction:
      a *= 8;              (in C)
  would compile to:
      sll $s0, $s0, 3      (in MIPS)

AND Operations
• Useful to mask bits in a word
  - Select some bits, clear others to 0
      and $t0, $t1, $t2

      $t2   0000 0000 0000 0000 0000 1101 1100 0000
      $t1   0000 0000 0000 0000 0011 1100 0000 0000
      $t0   0000 0000 0000 0000 0000 1100 0000 0000

OR Operations
• Useful to include bits in a word
  - Set some bits to 1, leave others unchanged
      or $t0, $t1, $t2

      $t2   0000 0000 0000 0000 0000 1101 1100 0000
      $t1   0000 0000 0000 0000 0011 1100 0000 0000
      $t0   0000 0000 0000 0000 0011 1101 1100 0000

NOT Operations
• Useful to invert bits in a word
  - Change 0 to 1, and 1 to 0
• MIPS has a 3-operand NOR instruction
  - a NOR b == NOT ( a OR b )
      nor $t0, $t1, $zero      # register 0: always read as zero

      $t1   0000 0000 0000 0000 0011 1100 0000 0000
      $t0   1111 1111 1111 1111 1100 0011 1111 1111
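The logical instructions map directly onto C's bitwise operators. The sketch below uses hypothetical function names, and extract_field is an added illustration of "extracting groups of bits" with a shift followed by a mask; it assumes a field width between 1 and 31 bits.

```c
#include <stdint.h>

uint32_t mips_and(uint32_t a, uint32_t b) { return a & b; }
uint32_t mips_or(uint32_t a, uint32_t b)  { return a | b; }

/* NOR is NOT (a OR b); with b == 0 it acts as a plain NOT. */
uint32_t mips_nor(uint32_t a, uint32_t b) { return ~(a | b); }

/* Extract `width` bits starting at bit position `low` (0 = lsb). */
uint32_t extract_field(uint32_t word, unsigned low, unsigned width)
{
    uint32_t mask = (1u << width) - 1u;   /* `width` ones in the low bits */
    return (word >> low) & mask;
}
```

For example, mips_nor(0x00003C00, 0) inverts every bit and gives 0xFFFFC3FF, and extract_field(0x02324020, 21, 5) recovers the rs field value 17 from an R-format word.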

So Far...
• All instructions have allowed us to manipulate data
• So we've built a calculator
• In order to build a computer, we need the ability to make decisions...

MIPS Decision Instructions
• Decision instruction in MIPS:
      beq register1, register2, L1
  "Branch if (registers are) equal"
  meaning: if (register1 == register2) goto L1
• Complementary MIPS decision instruction:
      bne register1, register2, L1
  "Branch if (registers are) not equal"
  meaning: if (register1 != register2) goto L1
• These are called conditional branches

MIPS Goto Instruction
• MIPS has an unconditional branch:
      j label
  - Called a jump instruction: jump directly to the given label without testing any condition
  - meaning: goto label
• Technically, it has the same effect as:
      beq $0, $0, label
  since the condition is always satisfied
• It has the J-type instruction format

Compiling C if into MIPS
• Compile by hand:
      if (i == j) f = g + h;
      else f = g - h;
• Use this mapping: f, g, ..., j : $s0, $s1, $s2, $s3, $s4
  (Flowchart: test i == j? If true (i == j), do f = g + h; if false (i != j), do f = g - h; both paths meet at Exit)
• Final compiled MIPS code:
            bne $s3, $s4, Else     # branch i != j
            add $s0, $s1, $s2      # f = g + h (true)
            j   Exit               # go to Exit
      Else: sub $s0, $s1, $s2      # f = g - h (false)
      Exit:
• Note: the compiler automatically creates labels to handle decisions (branches) appropriately
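The same control flow can be written in C with explicit goto statements, which makes the correspondence with bne and j visible. This is an illustrative sketch; the function name select_sum_or_diff is hypothetical, and ordinary C code would use if/else instead.

```c
/* f = (i == j) ? g + h : g - h, written in the shape of the
 * compiled branch sequence. */
int select_sum_or_diff(int i, int j, int g, int h)
{
    int f;

    if (i != j) goto Else;    /* bne $s3, $s4, Else */
    f = g + h;                /* add $s0, $s1, $s2  */
    goto Exit;                /* j Exit             */
Else:
    f = g - h;                /* sub $s0, $s1, $s2  */
Exit:
    return f;
}
```

Note that the branch tests the opposite condition (i != j) so that the "true" case can fall through to the next instruction.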

Compiling Loop Statements
• C code:
      while (save[i] == k) i += 1;
  - i in $s3, k in $s5, address of save in $s6
• Compiled MIPS code:
      Loop: sll  $t1, $s3, 2       # $t1 = i x 4
            add  $t1, $t1, $s6     # $t1 = addr of save[i]
            lw   $t0, 0($t1)       # $t0 = save[i]
            bne  $t0, $s5, Exit    # if save[i] != k goto Exit
            addi $s3, $s3, 1       # i = i + 1
            j    Loop              # goto Loop
      Exit: ...
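The loop can be mirrored step by step in C, with the address computation spelled out. This is a sketch under stated assumptions: the function name scan_while_equal is hypothetical, i is non-negative, elements are 32-bit words, and the array contains some element different from k so the loop terminates inside the array.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns the first index at or after i where save[index] != k. */
int32_t scan_while_equal(const int32_t *save, int32_t i, int32_t k)
{
    const unsigned char *base = (const unsigned char *)save;  /* $s6 */
    const unsigned char *addr;                                /* $t1 */
    int32_t elem;                                             /* $t0 */

Loop:
    addr = base + ((size_t)i << 2);      /* sll, then add: base + i x 4 */
    memcpy(&elem, addr, sizeof elem);    /* lw $t0, 0($t1) */
    if (elem != k) goto Exit;            /* bne $t0, $s5, Exit */
    i = i + 1;                           /* addi $s3, $s3, 1 */
    goto Loop;                           /* j Loop */
Exit:
    return i;
}
```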

Inequalities in MIPS
• Until now, we've only tested equalities (== and != in C), but general programs need to test < and >
• Set on Less Than:
  - slt rd, rs, rt
      if (rs < rt) rd = 1; else rd = 0;
  - slti rt, rs, constant
      if (rs < constant) rt = 1; else rt = 0;
• Compile by hand: if (g < h) goto Less;
  Let g: $s0, h: $s1
      slt $t0, $s0, $s1     # $t0 = 1 if g < h
      bne $t0, $0, Less     # goto Less if $t0 != 0
• MIPS has no "branch on less than" => too complex

Branch Instruction Design
• Why not blt, bge, etc.?
• Hardware for <, ≥, ... is slower than for =, ≠
  - Combining with branch involves more work per instruction, requiring a slower clock
  - All instructions penalized!
• beq and bne are the common case
• This is a good design compromise

Signed vs. Unsigned
• Signed comparison: slt, slti
• Unsigned comparison: sltu, sltiu
• Example
  - $s0 = 1111 1111 1111 1111 1111 1111 1111 1111
  - $s1 = 0000 0000 0000 0000 0000 0000 0000 0001
  - slt $t0, $s0, $s1     # signed
      -1 < +1  =>  $t0 = 1
  - sltu $t0, $s0, $s1    # unsigned
      +4,294,967,295 > +1  =>  $t0 = 0
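The difference between the two comparisons can be sketched in C on raw 32-bit patterns. The function names are hypothetical. The signed version flips the sign bit of both operands before an unsigned compare, a portable trick that gives the two's complement ordering without casting to a signed type.

```c
#include <stdint.h>

/* Unsigned set-on-less-than: compare the bit patterns as unsigned. */
uint32_t sltu(uint32_t rs, uint32_t rt)
{
    return rs < rt ? 1u : 0u;
}

/* Signed set-on-less-than: flipping bit 31 maps the two's complement
 * order onto the unsigned order. */
uint32_t slt(uint32_t rs, uint32_t rt)
{
    return (rs ^ 0x80000000u) < (rt ^ 0x80000000u) ? 1u : 0u;
}
```

With rs = 0xFFFFFFFF and rt = 1, slt returns 1 (the pattern is read as -1) while sltu returns 0 (the pattern is read as 4,294,967,295).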

Procedure Calling
• Steps required
  Caller:
    1. Place parameters in registers
    2. Transfer control to procedure
  Callee:
    3. Acquire storage for procedure
    4. Perform procedure's operations
    5. Place result in register for caller
    6. Return to place of call

C Function Call Bookkeeping
      sum = leaf_example(a, b, c, d);
      ...
      int leaf_example(int g, int h, int i, int j)
      {
          int f;
          f = (g + h) - (i + j);
          return f;
      }
• Return address: $ra
• Procedure address: labels
• Arguments: $a0, $a1, $a2, $a3
• Return value: $v0, $v1
• Local variables: $s0, $s1, ..., $s7
• Note the use of register conventions

Procedure Call Instructions
• Procedure call: jump and link
      jal ProcedureLabel
  - Address of the following instruction is put in $ra
  - Jumps to the target address (i.e., ProcedureLabel)
• Procedure return: jump register
      jr $ra
  - Copies $ra to the program counter
  - Can also be used for computed jumps, e.g., for case/switch statements
      Jump table is an array of addresses corresponding to labels in the code
      Load the appropriate entry into a register
      Jump register
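The jump-table idea has a close analogue in C: an array of function addresses indexed by the case selector. The sketch below is only an analogy, since calling through a function pointer also records a return address (closer to a jump-and-link through a register than to a bare jr). All names are hypothetical.

```c
/* Three hypothetical case bodies. */
static int case_increment(int x) { return x + 1; }
static int case_double(int x)    { return x * 2; }
static int case_negate(int x)    { return -x; }

typedef int (*handler_fn)(int);

/* Select a case body by index through a table of addresses. */
int dispatch(unsigned selector, int x)
{
    /* Jump table: one code address per case label. */
    static const handler_fn table[] = {
        case_increment, case_double, case_negate
    };

    if (selector >= sizeof table / sizeof table[0])
        return x;                          /* default case: no change */

    handler_fn target = table[selector];   /* load the entry into a "register" */
    return target(x);                      /* transfer control to that address */
}
```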

Caller's Code
• C code:
      ...
      sum = leaf_example(a, b, c, d);
      ...
• MIPS code: a, ..., d in $s0, ..., $s3, and sum in $s4
      add $a0, $0, $s0       # move a, b, c, d to $a0..$a3
      add $a1, $0, $s1
      add $a2, $0, $s2
      add $a3, $0, $s3
      jal leaf_example       # jump to leaf_example
      add $s4, $0, $v0       # move result in $v0 to sum

Procedure, Stack, Activation Record
• We have only one register file...

Leaf Procedure Example
- C code:
      int leaf_example (int g, int h, int i, int j) {
          int f;
          f = (g + h) - (i + j);
          return f;
      }
  - Arguments g, …, j in $a0, …, $a3
  - f in $s0 (hence, need to save $s0 on stack)
  - Save $t0 and $t1
  - Result in $v0

Leaf Procedure Example
- MIPS code:
  leaf_example:
      addi $sp, $sp, -12     # save $s0, $t0, $t1 on stack
      sw   $s0, 0($sp)
      sw   $t0, 4($sp)
      sw   $t1, 8($sp)
      add  $t0, $a0, $a1     # procedure body
      add  $t1, $a2, $a3
      sub  $s0, $t0, $t1
      add  $v0, $s0, $zero   # result
      lw   $s0, 0($sp)       # restore $s0, $t0, $t1
      lw   $t0, 4($sp)
      lw   $t1, 8($sp)
      addi $sp, $sp, 12
      jr   $ra               # return

Local Data on the Stack
[Figure: the stack before, in, and after the procedure, with high addresses at the top. In the procedure, the stack holds, from high to low address, the contents of $t1, $t0, and $s0, with $sp pointing at the contents of $s0.]

Use of Register Convention
- Do not save the values stored in the temporary registers $t0-$t7
  - This saves load and store operations
- Then, we have …

Leaf Procedure Example
- C code:
      int leaf_example (int g, int h, int i, int j) {
          int f;
          f = (g + h) - (i + j);
          return f;
      }
  - Arguments g, …, j in $a0, …, $a3
  - f in $s0 (hence, need to save $s0 on stack)
  - $t0 and $t1 are not saved on stack
  - Result in $v0

Leaf Procedure Example
- MIPS code:
  leaf_example:
      addi $sp, $sp, -4      # save $s0 on stack
      sw   $s0, 0($sp)
      add  $t0, $a0, $a1     # procedure body
      add  $t1, $a2, $a3
      sub  $s0, $t0, $t1
      add  $v0, $s0, $zero   # result
      lw   $s0, 0($sp)       # restore $s0
      addi $sp, $sp, 4
      jr   $ra               # return

Local Data on the Stack
[Figure: the stack with high addresses at the top; $sp points at the saved contents of $s0.]

Non-Leaf Procedures
- Procedures that call other procedures
- For a nested call, the caller needs to save on the stack:
  - Its return address
  - Any arguments and temporaries needed after the call (because the callee will not save them)
- Restore from the stack after the call

Non-Leaf Procedure Example
- C code:
      int fact (int n) {
          if (n < 1) return 1;
          else return n * fact(n - 1);
      }
  - Argument n in $a0
  - Result in $v0

Non-Leaf Procedure Example
- MIPS code:
  fact:
      addi $sp, $sp, -8      # adjust stack for 2 items
      sw   $ra, 4($sp)       # save return address
      sw   $a0, 0($sp)       # save argument
      slti $t0, $a0, 1       # test for n < 1
      beq  $t0, $zero, L1
      addi $v0, $zero, 1     # if so, result is 1
      addi $sp, $sp, 8       #   pop 2 items from stack
      jr   $ra               #   and return
  L1: addi $a0, $a0, -1      # else decrement n
      jal  fact              # recursive call
      lw   $a0, 0($sp)       # restore original n
      lw   $ra, 4($sp)       #   and return address
      addi $sp, $sp, 8       # pop 2 items from stack
      mul  $v0, $a0, $v0     # multiply to get result
      jr   $ra               # and return

Local Data on the Stack
- Local data allocated by callee
  - e.g., C automatic variables
- Procedure frame (activation record)
  - Used by some compilers to manage stack storage

Memory Layout
- Text: program code
- Static data: global variables
  - e.g., static variables in C, constant arrays and strings
  - $gp initialized to an address allowing ±offsets into this segment
- Dynamic data: heap
  - e.g., malloc in C, new in Java
- Stack: automatic storage

Outline
- Instruction set architecture (using MIPS ISA as an example)
- Operands
  - Register operands and their organization
  - Memory operands, data transfer
  - Immediate operands
- Signed and unsigned numbers
- Representing instructions
- Operations
  - Logical
  - Decision making and branches
- Supporting procedures in hardware
- Communicating with people
- Addressing for 32-bit immediate and addresses
- Translating and starting a program
- A sort example
- Arrays versus pointers
- ARM and x86 instruction sets

Character Data
- Byte-encoded character sets
  - ASCII: 128 characters
    - 95 graphic, 33 control
  - Latin-1: 256 characters
    - ASCII, +96 more graphic characters
- Unicode: 32-bit character set
  - Used in Java, C++ wide characters, …
  - Most of the world's alphabets, plus symbols
  - UTF-8, UTF-16: variable-length encodings

Byte/Halfword Operations
- Could use bitwise operations
- MIPS byte/halfword load/store
  - String processing is a common case
      lb rt, offset(rs)     lh rt, offset(rs)
  - Sign extend to 32 bits in rt
      lbu rt, offset(rs)    lhu rt, offset(rs)
  - Zero extend to 32 bits in rt
      sb rt, offset(rs)     sh rt, offset(rs)
  - Store just rightmost byte/halfword

Load Byte Signed/Unsigned
- Memory at the address in $t0: … 12 F7 F0 …, where the byte being loaded is F7
      lb  $t1, 0($t0)     # $t1 = FFFFFFF7 (sign-extended)
      lbu $t2, 0($t0)     # $t2 = 000000F7 (zero-extended)
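The difference between lb and lbu can be modeled in C. This is a minimal sketch, assuming memory is a plain byte array; the function names `load_byte_signed` and `load_byte_unsigned` are made up for this illustration and are not part of any MIPS toolchain.

```c
#include <stdint.h>

/* Model of lb: fetch one byte and sign-extend it to 32 bits. */
uint32_t load_byte_signed(const uint8_t *mem, uint32_t addr) {
    int32_t value = mem[addr];          /* 0..255 */
    if (value & 0x80) {
        value -= 256;                   /* bit 7 set: the byte is negative */
    }
    return (uint32_t)value;             /* upper 24 bits copy the sign bit */
}

/* Model of lbu: fetch one byte and zero-extend it to 32 bits. */
uint32_t load_byte_unsigned(const uint8_t *mem, uint32_t addr) {
    return (uint32_t)mem[addr];         /* upper 24 bits are zero */
}
```

With the byte F7 from the slide, the signed load yields FFFFFFF7 and the unsigned load yields 000000F7.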

String Copy Example
- C code (naïve):
  - Null-terminated string
      void strcpy (char x[], char y[])
      {
          int i;
          i = 0;
          while ((x[i] = y[i]) != '\0')
              i += 1;
      }
  - Addresses of x, y in $a0, $a1
  - i in $s0

String Copy Example
- MIPS code:
  strcpy:
      addi $sp, $sp, -4       # adjust stack for 1 item
      sw   $s0, 0($sp)        # save $s0
      add  $s0, $zero, $zero  # i = 0
  L1: add  $t1, $s0, $a1      # addr of y[i] in $t1
      lbu  $t2, 0($t1)        # $t2 = y[i]
      add  $t3, $s0, $a0      # addr of x[i] in $t3
      sb   $t2, 0($t3)        # x[i] = y[i]
      beq  $t2, $zero, L2     # exit loop if y[i] == 0
      addi $s0, $s0, 1        # i = i + 1
      j    L1                 # next iteration of loop
  L2: lw   $s0, 0($sp)        # restore saved $s0
      addi $sp, $sp, 4        # pop 1 item from stack
      jr   $ra                # and return

32-bit Constants
- Most constants are small
  - 16-bit immediate is sufficient
- For the occasional 32-bit constant, use Load Upper Immediate:
      lui rt, constant
  - Copies 16-bit constant to left 16 bits of rt
  - Clears right 16 bits of rt to 0
- Load a big number into $s0 using lui
      target value:         0000 0000 0011 1101 0000 1001 0000 0000
      lui $s0, 61           0000 0000 0011 1101 0000 0000 0000 0000   ($s0)
      ori $s0, $s0, 2304    0000 0000 0011 1101 0000 1001 0000 0000   ($s0)
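The lui/ori pair can be checked with a few lines of C. This is a sketch that models only the arithmetic of the two instructions; the helper names `lui`, `ori`, and `load_constant32` are assumptions made for this example.

```c
#include <stdint.h>

/* lui: place the 16-bit constant in the upper half, clear the lower half. */
uint32_t lui(uint16_t imm) {
    return (uint32_t)imm << 16;
}

/* ori: OR a zero-extended 16-bit constant into the register value. */
uint32_t ori(uint32_t rs, uint16_t imm) {
    return rs | (uint32_t)imm;
}

/* Build any 32-bit constant from its upper and lower 16-bit halves. */
uint32_t load_constant32(uint32_t value) {
    uint16_t upper = (uint16_t)(value >> 16);
    uint16_t lower = (uint16_t)(value & 0xFFFFu);
    return ori(lui(upper), lower);
}
```

For the slide's constants, 61 × 2^16 + 2304 = 4,000,000, so `ori(lui(61), 2304)` reproduces that value.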

Branch Addressing (1)
- Use I-format: opcode | rs | rt | immediate
  - opcode specifies beq or bne
  - rs and rt specify the registers to compare
- What can immediate specify? PC-relative addressing
  - Immediate is only 16 bits, but the PC is 32 bits => immediate cannot specify the entire address
  - Loops are generally small: < 50 instructions
    - Though we want to be able to branch anywhere in memory, a single branch only needs to change the PC by a small amount
- How to use PC-relative addressing
  - Treat the 16-bit immediate as a signed two's complement integer to be added to the PC if the branch is taken
  - Now we can branch ±2^15 bytes from the PC?

Branch Addressing (2)
- Immediate specifies word address
  - Instructions are word aligned (byte address is always a multiple of 4, i.e., it ends with 00 in binary)
    - The number of bytes to add to the PC will always be a multiple of 4
  - Specify the immediate in words (confusing?)
  - Now we can branch ±2^15 words from the PC (or ±2^17 bytes), so we can handle loops 4 times as large
- Immediate is relative to PC + 4
  - Due to hardware, add immediate to (PC+4), not to PC
  - If branch not taken: PC = PC + 4
  - If branch taken: PC = (PC+4) + (immediate*4)
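The taken-branch rule PC = (PC+4) + (immediate*4) can be written as a small C sketch. The function names `branch_target` and `branch_offset` are hypothetical and chosen only for this illustration; the sketch ignores whether the offset actually fits in 16 bits.

```c
#include <stdint.h>

/* Address reached when a branch at byte address pc is taken. */
uint32_t branch_target(uint32_t pc, int16_t immediate) {
    /* The immediate counts words relative to the following instruction. */
    return pc + 4u + (uint32_t)((int32_t)immediate * 4);
}

/* Word offset an assembler would store to reach target from pc. */
int32_t branch_offset(uint32_t pc, uint32_t target) {
    int64_t byte_distance = (int64_t)target - ((int64_t)pc + 4);
    return (int32_t)(byte_distance / 4);
}
```

For example, a branch at address 80012 with immediate 2 lands on 80016 + 2 × 4 = 80024.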

Branch Example
- MIPS code:
      Loop: beq  $9, $0, End
            add  $8, $8, $10
            addi $9, $9, -1
            j    Loop
      End:
- Branch is I-format: opcode | rs | rt | immediate
      opcode = 4 (look up in table)
      rs = 9 (first operand)
      rt = 0 (second operand)
      immediate = ???
  - Number of instructions to add to (or subtract from) the PC, starting at the instruction following the branch
    => immediate = 3

Branch Example
- MIPS code:
      Loop: beq  $9, $0, End
            add  $8, $8, $10
            addi $9, $9, -1
            j    Loop
      End:
- Decimal representation:   4 | 9 | 0 | 3
- Binary representation:    000100 | 01001 | 00000 | 0000000000000011
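The field packing for this beq can be reproduced in C. This is a sketch assuming the standard I-format layout (6-bit opcode, 5-bit rs, 5-bit rt, 16-bit immediate); the function name `encode_i_format` is made up for this example.

```c
#include <stdint.h>

/* Pack opcode | rs | rt | immediate into one 32-bit instruction word. */
uint32_t encode_i_format(uint32_t opcode, uint32_t rs, uint32_t rt, int16_t immediate) {
    return ((opcode & 0x3Fu) << 26)     /* bits 31..26 */
         | ((rs & 0x1Fu) << 21)         /* bits 25..21 */
         | ((rt & 0x1Fu) << 16)         /* bits 20..16 */
         | (uint32_t)(uint16_t)immediate; /* bits 15..0, two's complement */
}
```

With opcode 4, rs 9, rt 0, and immediate 3, the result is 0x11200003, which matches the binary fields 000100 01001 00000 0000000000000011.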

Jump Addressing (1/3)
- For branches, we assumed that we won't want to branch too far, so we can specify a change in PC.
- For general jumps (j and jal), we may jump to anywhere in memory.
- Ideally, we could specify a 32-bit memory address to jump to.
- Unfortunately, we can't fit both a 6-bit opcode and a 32-bit address into a single 32-bit word, so we compromise.

Jump Addressing (2/3)
- Define "fields" of the following number of bits each:
      6 bits | 26 bits
- As usual, each field has a name:
      opcode | target address
- Key concepts:
  - Keep opcode field identical to R-format and I-format for consistency
  - Combine other fields to make room for target address
- Optimization:
  - Jumps only jump to word-aligned addresses
    - Last two bits are always 00 (in binary)
    - So the 26-bit field specifies 28 bits of the 32-bit address

Jump Addressing (3/3)
- Where do we get the other 4 bits?
  - Take the 4 highest-order bits from the PC
  - Technically, this means that we cannot jump to anywhere in memory, but it's adequate 99.9999…% of the time, since programs aren't that long
  - Linker and loader avoid placing a program across an address boundary of 256 MB
- Summary:
  - New PC = PC[31..28] || target address (26 bits) || 00
  - Note: || means concatenation; 4 bits || 26 bits || 2 bits = 32-bit address
- If we absolutely need to specify a 32-bit address:
  - Use jr $ra    # jump to the address specified by $ra
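The concatenation PC[31..28] || target || 00 can be sketched in C as follows. The function name `jump_target` is an assumption for this example, and the sketch follows the slide in taking the upper four bits from the PC of the jump itself.

```c
#include <stdint.h>

/* New PC for a j/jal: upper 4 bits of the PC, 26-bit field, then two zero bits. */
uint32_t jump_target(uint32_t pc, uint32_t target26) {
    uint32_t upper4  = pc & 0xF0000000u;                /* PC[31..28] */
    uint32_t lower28 = (target26 & 0x03FFFFFFu) << 2;   /* word address -> byte address */
    return upper4 | lower28;
}
```

A jump at address 80020 with a target field of 20000 goes to 20000 × 4 = 80000, and the same field used from a PC in another 256 MB region stays inside that region.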

Target Addressing Example
- Loop code from earlier example
  - Assume Loop at location 80000

      Loop: sll  $t1, $s3, 2        80000 |  0 |  0 | 19 |  9 |  2 |  0
            add  $t1, $t1, $s6      80004 |  0 |  9 | 22 |  9 |  0 | 32
            lw   $t0, 0($t1)        80008 | 35 |  9 |  8 |      0
            bne  $t0, $s5, Exit     80012 |  5 |  8 | 21 |      2
            addi $s3, $s3, 1        80016 |  8 | 19 | 19 |      1
            j    Loop               80020 |  2 |        20000
      Exit: …                       80024

- bne target: 80016 + 2 × 4 = 80024
- j target: 20000 × 4 = 80000

Branching Far Away
- If the branch target is too far to encode with a 16-bit offset, the assembler rewrites the code
- Example:
          beq $s0, $s1, L1
              ↓
          bne $s0, $s1, L2
          j   L1
      L2: …

MIPS Addressing Modes
1. Immediate addressing: the operand is the immediate field of the instruction (op | rs | rt | immediate)
2. Register addressing: the operand is a register (op | rs | rt | rd | … | funct)
3. Base addressing: the operand is the byte, halfword, or word in memory at register + address field (op | rs | rt | address)

MIPS Addressing Modes
4. PC-relative addressing: the target is the word in memory at PC + address field (op | rs | rt | address)
5. Pseudodirect addressing: the target is the word in memory whose address is the address field concatenated with the upper bits of the PC (op | address)

Translation and Startup
[Figure: translation and startup flow. Annotations: many compilers produce object modules directly; static linking.]

Producing an Object Module
- Assembler (or compiler) translates program into machine instructions
- Provides information for building a complete program from the pieces
  - Header: describes contents of object module
  - Text segment: translated instructions
  - Static data segment: data allocated for the life of the program
  - Relocation info: for contents that depend on absolute location of loaded program
  - Symbol table: global definitions and external refs
  - Debug info: for associating with source code

Linking Object Modules
- Produces an executable image
  1. Merges segments
  2. Resolves labels (determines their addresses)
  3. Patches location-dependent and external refs
- Could leave location dependencies for fixing by a relocating loader
  - But with virtual memory, no need to do this
  - Program can be loaded into absolute location in virtual memory space

Loading a Program
- Load from image file on disk into memory
  1. Read header to determine segment sizes
  2. Create virtual address space
  3. Copy text and initialized data into memory
     - Or set page table entries so they can be faulted in
  4. Set up arguments on stack
  5. Initialize registers (including $sp, $fp, $gp)
  6. Jump to startup routine
     - Copies arguments to $a0, … and calls main
     - When main returns, do exit syscall

C Sort Example
- Illustrates use of assembly instructions for a C bubble sort function
- Swap procedure (leaf)
      void swap(int v[], int k)
      {
          int temp;
          temp = v[k];
          v[k] = v[k+1];
          v[k+1] = temp;
      }
  - v in $a0, k in $a1, temp in $t0

The Procedure Swap
swap:   sll  $t1, $a1, 2      # $t1 = k * 4
        add  $t1, $a0, $t1    # $t1 = v + (k * 4) (address of v[k])
        lw   $t0, 0($t1)      # $t0 (temp) = v[k]
        lw   $t2, 4($t1)      # $t2 = v[k+1]
        sw   $t2, 0($t1)      # v[k] = $t2 (v[k+1])
        sw   $t0, 4($t1)      # v[k+1] = $t0 (temp)
        jr   $ra              # return to calling routine

The Sort Procedure in C
- Non-leaf (calls swap)
      void sort (int v[], int n)
      {
          int i, j;
          for (i = 0; i < n; i += 1) {
              for (j = i - 1; j >= 0 && v[j] > v[j + 1]; j -= 1) {
                  swap(v, j);
              }
          }
      }
  - v in $a0, n in $a1, i in $s0, j in $s1

The Procedure Body
# Move params
         move $s2, $a0           # save $a0 into $s2
         move $s3, $a1           # save $a1 into $s3
# Outer loop
         move $s0, $zero         # i = 0
for1tst: slt  $t0, $s0, $s3      # $t0 = 0 if $s0 ≥ $s3 (i ≥ n)
         beq  $t0, $zero, exit1  # go to exit1 if $s0 ≥ $s3 (i ≥ n)
# Inner loop
         addi $s1, $s0, -1       # j = i - 1
for2tst: slti $t0, $s1, 0        # $t0 = 1 if $s1 < 0 (j < 0)
         bne  $t0, $zero, exit2  # go to exit2 if $s1 < 0 (j < 0)
         sll  $t1, $s1, 2        # $t1 = j * 4
         add  $t2, $s2, $t1      # $t2 = v + (j * 4)
         lw   $t3, 0($t2)        # $t3 = v[j]
         lw   $t4, 4($t2)        # $t4 = v[j + 1]
         slt  $t0, $t4, $t3      # $t0 = 0 if $t4 ≥ $t3
         beq  $t0, $zero, exit2  # go to exit2 if $t4 ≥ $t3
# Pass params & call
         move $a0, $s2           # 1st param of swap is v (old $a0)
         move $a1, $s1           # 2nd param of swap is j
         jal  swap               # call swap procedure
# Inner loop
         addi $s1, $s1, -1       # j -= 1
         j    for2tst            # jump to test of inner loop
# Outer loop
exit2:   addi $s0, $s0, 1        # i += 1
         j    for1tst            # jump to test of outer loop

The Full Procedure
sort:    addi $sp, $sp, -20      # make room on stack for 5 registers
         sw   $ra, 16($sp)       # save $ra on stack
         sw   $s3, 12($sp)       # save $s3 on stack
         sw   $s2, 8($sp)        # save $s2 on stack
         sw   $s1, 4($sp)        # save $s1 on stack
         sw   $s0, 0($sp)        # save $s0 on stack
         …                       # procedure body
         …
exit1:   lw   $s0, 0($sp)        # restore $s0 from stack
         lw   $s1, 4($sp)        # restore $s1 from stack
         lw   $s2, 8($sp)        # restore $s2 from stack
         lw   $s3, 12($sp)       # restore $s3 from stack
         lw   $ra, 16($sp)       # restore $ra from stack
         addi $sp, $sp, 20       # restore stack pointer
         jr   $ra                # return to calling routine
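The C versions of swap and sort from the preceding slides can be compiled and checked directly. The sketch below repeats them and adds a helper, `is_sorted`, which is not part of the slides and is introduced here only to verify the result.

```c
/* Exchange v[k] and v[k+1], as in the swap slide. */
void swap(int v[], int k) {
    int temp = v[k];
    v[k] = v[k + 1];
    v[k + 1] = temp;
}

/* Sort v[0..n-1] in ascending order, as in the sort slide. */
void sort(int v[], int n) {
    int i, j;
    for (i = 0; i < n; i += 1) {
        /* Move v[i] down until the element below it is not larger. */
        for (j = i - 1; j >= 0 && v[j] > v[j + 1]; j -= 1) {
            swap(v, j);
        }
    }
}

/* Hypothetical helper: returns 1 if v[0..n-1] is in ascending order. */
int is_sorted(const int v[], int n) {
    int i;
    for (i = 1; i < n; i += 1) {
        if (v[i - 1] > v[i]) {
            return 0;
        }
    }
    return 1;
}
```

Note that the inner loop tests `j >= 0` before reading `v[j]`, which is the same order in which the assembly checks `slti` before the two loads.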

Effect of Compiler Optimization
[Figure: compiled with gcc for Pentium 4 under Linux.]

Effect of Language and Algorithm
[Figure]

Lessons Learnt
- Instruction count and CPI are not good performance indicators in isolation
- Compiler optimizations are sensitive to the algorithm
- Nothing can fix a dumb algorithm!

Arrays vs. Pointers
- Array indexing involves
  - Multiplying index by element size
  - Adding to array base address
- Pointers correspond directly to memory addresses
  - Can avoid indexing complexity

Example: Clearing an Array
- Array version:
      void clear1(int array[], int size) {
          int i;
          for (i = 0; i < size; i += 1)
              array[i] = 0;
      }

             move $t0, $zero         # i = 0
      loop1: sll  $t1, $t0, 2        # $t1 = i * 4
             add  $t2, $a0, $t1      # $t2 = &array[i]
             sw   $zero, 0($t2)      # array[i] = 0
             addi $t0, $t0, 1        # i = i + 1
             slt  $t3, $t0, $a1      # $t3 = (i < size)
             bne  $t3, $zero, loop1  # if (…) goto loop1

- Pointer version:
      void clear2(int *array, int size) {
          int *p;
          for (p = &array[0]; p < &array[size]; p = p + 1)
              *p = 0;
      }

             move $t0, $a0           # p = &array[0]
             sll  $t1, $a1, 2        # $t1 = size * 4
             add  $t2, $a0, $t1      # $t2 = &array[size]
      loop2: sw   $zero, 0($t0)      # Memory[p] = 0
             addi $t0, $t0, 4        # p = p + 4
             slt  $t3, $t0, $t2      # $t3 = (p < &array[size])
             bne  $t3, $zero, loop2  # if (…) goto loop2

Comparison of Array vs. Ptr
- Multiply "strength reduced" to shift (strength reduction)
- Array version requires shift to be inside loop
  - Part of index calculation for incremented i
  - c.f. incrementing pointer
- Compiler can achieve same effect as manual use of pointers
  - Eliminating array address calculations within loop (induction variable elimination): 6 instructions reduced to 4 in loop
  - Better to make program clearer and safer
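A quick way to convince yourself that the indexed and pointer versions do the same work is to run both on identical data. The sketch below repeats clear1 and clear2 from the earlier slide and adds a helper, `all_zero`, that is not in the slides and exists only for checking.

```c
/* Indexed version: address is recomputed from i on every iteration. */
void clear1(int array[], int size) {
    int i;
    for (i = 0; i < size; i += 1)
        array[i] = 0;
}

/* Pointer version: the pointer itself is the loop variable. */
void clear2(int *array, int size) {
    int *p;
    for (p = &array[0]; p < &array[size]; p = p + 1)
        *p = 0;
}

/* Hypothetical helper: returns 1 if every element of v[0..n-1] is zero. */
int all_zero(const int *v, int n) {
    int i;
    for (i = 0; i < n; i += 1) {
        if (v[i] != 0)
            return 0;
    }
    return 1;
}
```

Since an optimizing compiler can turn the first form into the second, the clearer indexed form is usually the one to write.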

ARM & MIPS Similarities
- ARM: the most popular embedded core
- Similar basic set of instructions to MIPS

                              ARM               MIPS
      Date announced          1985              1985
      Instruction size        32 bits           32 bits
      Address space           32-bit flat       32-bit flat
      Data alignment          Aligned           Aligned
      Data addressing modes   9                 3
      Registers               15 × 32-bit       31 × 32-bit
      Input/output            Memory mapped     Memory mapped

Compare and Branch in ARM
- Uses condition codes for result of an arithmetic/logical instruction
  - Negative, zero, carry, overflow
  - Compare instructions set condition codes without keeping the result
- Each instruction can be conditional
  - Top 4 bits of instruction word: condition value
  - Can avoid branches over single instructions

The Intel x86 ISA
- Evolution with backward compatibility
  - 8080 (1974): 8-bit microprocessor
    - Accumulator, plus 3 index-register pairs
  - 8086 (1978): 16-bit extension to 8080
    - Complex instruction set (CISC)
  - 8087 (1980): floating-point coprocessor
    - Adds FP instructions and register stack
  - 80286 (1982): 24-bit addresses, MMU
    - Segmented memory mapping and protection
  - 80386 (1985): 32-bit extension (now IA-32)
    - Additional addressing modes and operations
    - Paged memory mapping as well as segments

The Intel x86 ISA
- Further evolution…
  - i486 (1989): pipelined, on-chip caches and FPU
    - Compatible competitors: AMD, Cyrix, …
  - Pentium (1993): superscalar, 64-bit datapath
    - Later versions added MMX (Multi-Media eXtension) instructions
    - The infamous FDIV bug
  - Pentium Pro (1995), Pentium II (1997)
    - New microarchitecture (see Colwell, The Pentium Chronicles)
  - Pentium III (1999)
    - Added SSE (Streaming SIMD Extensions) and associated registers
  - Pentium 4 (2001)
    - New microarchitecture
    - Added SSE2 instructions

The Intel x86 ISA
- Further evolution (continuing the i486 through Pentium 4 list above)
  - …
  - Advanced Vector Extension (announced 2008)
    - Longer SSE registers, more instructions
- Technical elegance ≠ market success

x86 Instruction Set
- Backward compatibility: the instruction set doesn't change
  - But they do accrete more instructions
[Figure: x86 instruction set.]

Implementing IA-32
- Complex instruction set makes implementation difficult
  - Hardware translates instructions to simpler microoperations
    - Simple instructions: 1–1
    - Complex instructions: 1–many
  - Microengine similar to RISC
  - Market share makes this economically viable
- Comparable performance to RISC
  - Compilers avoid complex instructions

Fallacies
- Powerful instruction ⇒ higher performance
  - Fewer instructions required
  - But complex instructions are hard to implement
    - May slow down all instructions, including simple ones
  - Compilers are good at making fast code from simple instructions
- Use assembly code for high performance
  - But modern compilers are better at dealing with modern processors
  - More lines of code ⇒ more errors and less productivity

Pitfalls
- Sequential words are not at sequential addresses
  - Increment by 4, not by 1!
- Keeping a pointer to an automatic variable after procedure returns
  - e.g., passing pointer back via an argument
  - Pointer becomes invalid when stack popped
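The first pitfall can be made concrete in C by walking an array of 32-bit words through byte addresses. This is a sketch; the function names `word_stride_bytes` and `sum_words` are hypothetical and used only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte distance between two consecutive 32-bit words in memory. */
ptrdiff_t word_stride_bytes(void) {
    int32_t words[2] = { 0, 0 };
    return (const char *)&words[1] - (const char *)&words[0];
}

/* Sum count words starting at base, stepping a byte address by hand. */
int32_t sum_words(const int32_t *base, size_t count) {
    const unsigned char *addr = (const unsigned char *)base;
    int32_t sum = 0;
    size_t i;
    for (i = 0; i < count; i++) {
        int32_t word;
        memcpy(&word, addr, sizeof word);   /* load the word at this byte address */
        sum += word;
        addr += sizeof(int32_t);            /* next word: increment by 4, not by 1 */
    }
    return sum;
}
```

C pointer arithmetic normally hides this scaling, which is why the mistake tends to appear when writing the address arithmetic by hand in assembly.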

Concluding Remarks
- Design principles
  1. Simplicity favors regularity
  2. Smaller is faster
  3. Make the common case fast
  4. Good design demands good compromises
- MIPS: typical of RISC ISAs
  - c.f. x86

Concluding Remarks
- Measure MIPS instruction executions in benchmark programs
  - Consider making the common case fast
  - Consider compromises

      Instruction class   MIPS examples                        SPEC2006 Int   SPEC2006 FP
      Arithmetic          add, sub, addi                       16%            48%
      Data transfer       lw, sw, lbu, lhu, sb, lui            35%            36%
      Logical             and, or, nor, andi, ori, sll, srl    12%            4%
      Cond. Branch        beq, bne, slti, sltiu                34%            8%
      Jump                j, jr, jal                           2%             0%