CMPUT 229 Fall 2003 Topic 7 Floating Point

MIPS Coprocessors CMPUT 229 - Computer Organization and Architecture I COPYRIGHT 1998 MORGAN KAUFMANN

CMPUT 229 - Fall 2003
Topic 7: Floating Point
José Nelson Amaral
CMPUT 229 - Computer Organization and Architecture I

Reading Assignment

Representing Large and Small Numbers
How would you represent a number such as 6.023 × 10^23 in binary?
The range (10^23) of this number is greater than the range of the 32-bit representation that we have used for integers (2^31 ≈ 2.147 × 10^9).
However, the precision (6023) of this number is quite small and can be expressed in a small number of bits.
The solution is to use a floating point representation. A floating point representation allocates some bits for the range of the value, some bits for the precision, and one bit for the sign.
From: Patt and Patel, pp. 32

Floating Point Representation
Most standard floating point representations use:
  1 bit for the sign (positive or negative)
  8 bits for the range (exponent field)
  23 bits for the precision (fraction field)

  bits:    1       8           23
  field:   S    exponent    fraction
From: Patt and Patel, pp. 33

Floating Point Representation (example)
  bits:    1       8           23
  field:   S    exponent    fraction
  value:   1    10000001    10101000000000
Thus the exponent is given by the exponent field minus the bias of 127: 10000001 = 129, and 129 - 127 = 2.
From: Patt and Patel, pp. 34

Floating Point Representation (example)
What is the decimal value of the following floating point number?
  S = 0    exponent = 01111011    fraction = 000000000000
exponent field = 64+32+16+8+2+1 = (128-8)+3 = 120+3 = 123
actual exponent = 123 - 127 = -4
value = +1.0 × 2^-4 = 0.0625
From: Patt and Patel, pp. 34

Floating Point Representation (example)
What is the decimal value of the following floating point number?
  S = 0    exponent = 10000011    fraction = 00101000000000
exponent field = 128+2+1 = 131
actual exponent = 131 - 127 = 4
value = +1.00101 (binary) × 2^4 = 10010.1 (binary) = 18.5
From: Patt and Patel, pp. 35

Floating Point Representation (example)
What is the decimal value of the following floating point number?
  S = 1    exponent = 10000010    fraction = 1000000000
exponent field = 128+2 = 130
actual exponent = 130 - 127 = 3
value = -(1.fraction) × 2^3
From: Patt and Patel, pp. 35
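The decoding steps used in these examples can be sketched in C. This is a minimal sketch, not code from the slides: the names fp_fields, fp_split, pow2 and fp_value_normalized are hypothetical, and the hexadecimal patterns mentioned afterwards assume that the fraction bits not shown on the slides are all zero.

```c
#include <stdint.h>

/* The three fields of a 32-bit IEEE 754 single-precision pattern. */
typedef struct {
    unsigned sign;      /* 1 bit */
    unsigned exponent;  /* 8 bits, stored with a bias of 127 */
    uint32_t fraction;  /* 23 bits */
} fp_fields;

/* Split a raw 32-bit pattern into sign, exponent and fraction. */
fp_fields fp_split(uint32_t bits) {
    fp_fields f;
    f.sign     = bits >> 31;
    f.exponent = (bits >> 23) & 0xFFu;
    f.fraction = bits & 0x7FFFFFu;
    return f;
}

/* 2^e computed by repeated doubling or halving (exact in a double). */
static double pow2(int e) {
    double r = 1.0;
    for (; e > 0; e--) r *= 2.0;
    for (; e < 0; e++) r /= 2.0;
    return r;
}

/* Value of a normalized number (exponent field between 1 and 254):
   (-1)^sign * 1.fraction * 2^(exponent - 127). */
double fp_value_normalized(uint32_t bits) {
    fp_fields f = fp_split(bits);
    double significand = 1.0 + (double)f.fraction / 8388608.0; /* 2^23 */
    double magnitude = significand * pow2((int)f.exponent - 127);
    return f.sign ? -magnitude : magnitude;
}
```

Under the zero-padding assumption, the pattern 1 10000001 10101... is 0xC0D40000, 0 01111011 000... is 0x3D800000, and 0 10000011 00101... is 0x41940000; passing them to fp_value_normalized gives -6.625, 0.0625 and 18.5.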

Floating Point
What is the largest number that can be represented in 32-bit floating point using the IEEE 754 format above?
  S = 0    exponent = 11111110    fraction = 111...1 (all ones)
exponent field = 254
actual exponent = 254 - 127 = 127
value = 1.11...1 (binary) × 2^127, which is just under 2^128 ≈ 3.4 × 10^38
From: Patt and Patel, pp. 35

Floating Point
What is the smallest number (closest to zero) that can be represented in 32-bit floating point using the IEEE 754 format above?
  S = 0    exponent = 00000000    fraction = 000...01
exponent field = 0, which marks a denormalized number: the actual exponent is -126 and there is no implicit leading 1
value = 0.00...01 (binary) × 2^-126 = 2^-23 × 2^-126 = 2^-149 ≈ 1.4 × 10^-45
From: Patt and Patel, pp. 35
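The two extreme values can be checked on a host machine. The sketch below assumes that the C type float is an IEEE 754 single-precision value, which is common but not guaranteed by the C standard; bits_to_float and two_to_the are hypothetical helper names.

```c
#include <float.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a raw 32-bit pattern as a float.
   Assumption: float is IEEE 754 single precision on this machine. */
float bits_to_float(uint32_t bits) {
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* 2^e computed by repeated doubling or halving (exact in a double). */
double two_to_the(int e) {
    double r = 1.0;
    for (; e > 0; e--) r *= 2.0;
    for (; e < 0; e++) r /= 2.0;
    return r;
}
```

With these helpers, bits_to_float(0x7F7FFFFF) is the largest finite value, (2 - 2^-23) × 2^127, which equals FLT_MAX from float.h, and bits_to_float(0x00000001) is the smallest positive value, 2^-149.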

Special Floating Point Representations
In the 8-bit exponent field we can represent numbers from 0 to 255. We studied how to read numbers with exponents from 0 to 254.
What is the value represented when the exponent is 255 (i.e. 11111111 in binary)?
An exponent equal to 255 = 11111111 (binary) in a floating point representation indicates a special value:
  When the exponent is 255 and the fraction is 0, the value represented is infinity.
  When the exponent is 255 and the fraction is non-zero, the value represented is Not a Number (NaN).
Hen/Patt, pp. 301
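The two rules for an exponent field of 255 translate directly into a test on the raw bits. A minimal sketch, with hypothetical names (fp_kind, classify_bits):

```c
#include <stdint.h>

typedef enum { KIND_FINITE, KIND_INFINITY, KIND_NAN } fp_kind;

/* Classify a 32-bit single-precision pattern using the exponent and
   fraction fields only; the sign bit does not affect the class. */
fp_kind classify_bits(uint32_t bits) {
    uint32_t exponent = (bits >> 23) & 0xFFu;
    uint32_t fraction = bits & 0x7FFFFFu;
    if (exponent != 255u)
        return KIND_FINITE;                       /* exponent 0..254 */
    return fraction == 0 ? KIND_INFINITY : KIND_NAN;
}
```

For example, 0x7F800000 and 0xFF800000 are classified as infinity (positive and negative), while 0x7FC00000 is classified as NaN.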

Double Precision
The 32-bit floating point representation is usually called the single precision representation.
A double precision floating point representation requires 64 bits. In double precision the following numbers of bits are used:
  1 sign bit
  11 bits for the exponent
  52 bits for the fraction (also called significand)

Floating Point Addition (Decimal)
How do we perform the following addition?
  9.999 × 10^1 + 1.610 × 10^-1
Step 1: Align the decimal point of the number with the smaller exponent (notice the loss of precision):
  9.999 × 10^1 + 0.016 × 10^1
Step 2: Add the significands:
  9.999 × 10^1 + 0.016 × 10^1 = 10.015 × 10^1
Step 3: Renormalize the result:
  10.015 × 10^1 = 1.0015 × 10^2
Step 4: Round off the result to the representation available:
  1.0015 × 10^2 = 1.002 × 10^2
Hen/Patt, pp. 281
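The four steps can be sketched with integer arithmetic. This is an illustration under stated assumptions, not the slides' method: the type dec4 and the function dec4_add are hypothetical, both operands are assumed positive, digits shifted out during alignment are simply dropped, and rounding is round-half-up.

```c
/* A decimal floating point value with a four-digit significand:
   value = digits * 10^(exponent - 3), with digits in 1000..9999.
   Example: 9.999 x 10^1 is {9999, 1}. */
typedef struct { int digits; int exponent; } dec4;

dec4 dec4_add(dec4 a, dec4 b) {
    dec4 result;
    int sum, e;

    /* Make a the operand with the larger exponent. */
    if (a.exponent < b.exponent) { dec4 t = a; a = b; b = t; }

    /* Step 1: align b; digits shifted out on the right are lost. */
    for (e = b.exponent; e < a.exponent && b.digits != 0; e++)
        b.digits /= 10;

    /* Step 2: add the significands. */
    sum = a.digits + b.digits;
    result.exponent = a.exponent;

    /* Steps 3 and 4: renormalize and round to four digits. */
    if (sum > 9999) {
        sum = (sum + 5) / 10;
        result.exponent++;
    }
    result.digits = sum;
    return result;
}
```

Adding {9999, 1} and {1610, -1} aligns the second operand to 0016, produces the sum 10015, and returns {1002, 2}, that is 1.002 × 10^2.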

Floating Point Addition (Example)
Convert the numbers 0.5 and -0.4375 (both in base 10) to floating point binary representation, and then perform the binary floating point addition of these numbers.
Which number should have its significand adjusted?
Hen/Patt, pp. 283
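One way to work through this exercise is to simulate the addition with a 4-bit significand. The sketch below is an assumption-laden illustration: bin4 and bin4_add are hypothetical names, bits shifted out during alignment or normalization are truncated, and exponents are assumed to be small.

```c
/* A binary floating point value with a 4-bit significand 1.xxx:
   value = sig * 2^(exponent - 3), with 8 <= |sig| <= 15.
   Example: 0.5 = 1.000 x 2^-1 is {8, -1};
            -0.4375 = -1.110 x 2^-2 is {-14, -2}. */
typedef struct { int sig; int exponent; } bin4;

bin4 bin4_add(bin4 a, bin4 b) {
    bin4 result = {0, 0};
    int shift, mag, sum;

    /* Make a the operand with the larger exponent. */
    if (a.exponent < b.exponent) { bin4 t = a; a = b; b = t; }

    /* Align the operand with the smaller exponent: shift its
       significand right; bits shifted out are lost. */
    shift = a.exponent - b.exponent;
    mag = b.sig < 0 ? -b.sig : b.sig;
    mag = shift > 30 ? 0 : mag >> shift;
    b.sig = b.sig < 0 ? -mag : mag;

    /* Add the significands. */
    sum = a.sig + b.sig;
    if (sum == 0)
        return result;                 /* exact zero */

    /* Renormalize so that the magnitude is again 1.xxx. */
    mag = sum < 0 ? -sum : sum;
    result.exponent = a.exponent;
    while (mag > 15) { mag >>= 1; result.exponent++; }
    while (mag < 8)  { mag <<= 1; result.exponent--; }
    result.sig = sum < 0 ? -mag : mag;
    return result;
}
```

For {8, -1} plus {-14, -2}, the second operand (the one with the smaller exponent) is adjusted to -0.111 × 2^-1, the sum is 0.001 × 2^-1, and the normalized result is {8, -4}, that is 1.000 × 2^-4 = 0.0625.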

Floating Point Multiplication (Decimal)
Assume that we can store only four digits of the significand and two digits of the exponent in a decimal floating point representation.
How would you multiply 1.110 × 10^10 by 9.200 × 10^-5 in this representation?
Step 1: Add the exponents:
  new exponent = 10 - 5 = 5
Step 2: Multiply the significands:
        1.110
      × 9.200
      -------
         0000
        0000
       2220
      9990
     --------
    10.212000
Step 3: Normalize the product:
  10.212 × 10^5 = 1.0212 × 10^6
Step 4: Round off the product:
  1.0212 × 10^6 = 1.021 × 10^6
Hen/Patt, pp. 286
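The multiplication steps can also be sketched with integers. As before, dec4 and dec4_mul are hypothetical names, both operands are assumed positive, and rounding is round-half-up; the two-digit exponent limit is not checked.

```c
/* A decimal floating point value with a four-digit significand:
   value = digits * 10^(exponent - 3), with digits in 1000..9999.
   Example: 1.110 x 10^10 is {1110, 10}. */
typedef struct { int digits; int exponent; } dec4;

dec4 dec4_mul(dec4 a, dec4 b) {
    dec4 result;
    long product, divisor, digits;

    /* Step 1: add the exponents. */
    result.exponent = a.exponent + b.exponent;

    /* Step 2: multiply the significands; the product carries six
       digits after the decimal point (it is scaled by 10^6). */
    product = (long)a.digits * (long)b.digits;

    /* Step 3: normalize; a product of 10.0 or more moves the
       decimal point one place and increments the exponent. */
    divisor = 1000L;
    if (product >= 10000000L) {
        divisor = 10000L;
        result.exponent++;
    }

    /* Step 4: round to four significant digits. */
    digits = (product + divisor / 2) / divisor;
    if (digits > 9999L) {          /* rounding carried into a new digit */
        digits = (digits + 5) / 10;
        result.exponent++;
    }
    result.digits = (int)digits;
    return result;
}
```

Multiplying {1110, 10} by {9200, -5} gives the raw product 10212000 (10.212000), which normalizes and rounds to {1021, 6}, that is 1.021 × 10^6.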

Floating Point in MIPS
MIPS supports the IEEE 754 single-precision and double-precision formats.
MIPS has a separate set of registers to store floating point operands: $f0, $f1, $f2, ...
In single precision, each individual register $f0, $f1, $f2, ... contains one single precision (32-bit) value.
In double precision, each pair of registers $f0-$f1, $f2-$f3, ... contains one double precision (64-bit) value.
Hen/Patt, pp. 288

Floating Point in MIPS
To load a value into a floating point register, MIPS offers the load word coprocessor instructions, lwcz. Because the floating point coprocessor is coprocessor number 1, the instruction is lwc1.
Similarly, to store the value of a floating point register into memory, MIPS offers the store word coprocessor instruction, swc1.
Hen/Patt, pp. 288

Floating Point Instruction in MIPS
What does the following assembly code do?
      lwc1  $f4, 4($sp)
      lwc1  $f6, 8($sp)
      add.s $f2, $f4, $f6
      swc1  $f2, 12($sp)
It reads two floating point values from the stack, adds them, and stores the result on the stack.
Hen/Patt, pp. 288

Floating Point (example)
    void mm (double x[][32], double y[][32], double z[][32])
    {
      int i, j, k;
      for( i=0 ; i != 32 ; i=i+1 )
        for( j=0 ; j != 32 ; j=j+1 )
        {
          x[i][j] = 0.0;
          for( k=0 ; k != 32 ; k=k+1 )
            x[i][j] = x[i][j] + y[i][k] * z[k][j];
        }
    }
Parameter Passing Convention:
  base of x[][] in $a0
  base of y[][] in $a1
  base of z[][] in $a2
Assumption:
  i in $s0
  j in $s1
  k in $s2
Hen/Patt, pp. 294

Flowchart for mm:
  i ← 0
  if i = 32: return
  j ← 0
  if j = 32: i ← i+1, then test i again
  x[i][j] ← 0.0
  k ← 0
  if k = 32: j ← j+1, then test j again
  load x[i][j]
  load y[i][k]
  load z[k][j]
  d1 ← y[i][k]*z[k][j]
  d1 ← d1 + x[i][j]
  x[i][j] ← d1
  k ← k+1, then test k again
Do we need to load and store x[i][j] in every iteration of loop k?

Flowchart for mm, keeping the running sum in d2:
  i ← 0
  if i = 32: return
  j ← 0
  if j = 32: i ← i+1, then test i again
  d2 ← 0.0
  k ← 0
  if k = 32: x[i][j] ← d2, j ← j+1, then test j again
  load y[i][k]
  load z[k][j]
  d1 ← y[i][k]*z[k][j]
  d2 ← d2 + d1
  k ← k+1, then test k again
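In C, the same idea amounts to accumulating the sum in a local variable and storing x[i][j] once per (i, j) pair. A minimal sketch, assuming 32 × 32 matrices as in the loops above; the constant N and the explicit column dimension in the parameter types are additions needed to make the function compile as standard C.

```c
#define N 32

/* x = y * z for N x N matrices of doubles.  The running sum d2 stays
   in a local variable (a register, ideally) during loop k, so x[i][j]
   is stored only once instead of loaded and stored in every iteration. */
void mm(double x[][N], double y[][N], double z[][N]) {
    int i, j, k;
    for (i = 0; i != N; i = i + 1) {
        for (j = 0; j != N; j = j + 1) {
            double d2 = 0.0;
            for (k = 0; k != N; k = k + 1) {
                double d1 = y[i][k] * z[k][j];
                d2 = d2 + d1;
            }
            x[i][j] = d2;
        }
    }
}
```

A quick sanity check is to multiply by the identity matrix: if y is the identity, x must come out equal to z.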

MIPS assembly (loop tests at the top of each loop):
      li    $t1, 32            # t1 ← 32
      li    $s0, 0             # i ← 0
L1:   beq   $s0, $t1, D1
      li    $s1, 0             # j ← 0
L2:   beq   $s1, $t1, D2
      $f4 ← 0.0
      li    $s2, 0             # k ← 0
L3:   beq   $s2, $t1, D3
      <loop body>
      addiu $s2, $s2, 1        # k ← k+1
      j     L3
D3:   x[i][j] ← $f4
      addiu $s1, $s1, 1        # j ← j+1
      j     L2
D2:   addiu $s0, $s0, 1        # i ← i+1
      j     L1
D1:

Flowchart for mm with the loop tests moved to the bottom of each loop:
  i ← 0
  j ← 0
  d2 ← 0.0
  k ← 0
  load y[i][k]
  load z[k][j]
  d1 ← y[i][k]*z[k][j]
  d2 ← d2 + d1
  k ← k+1
  if k ≠ 32: repeat the loop body
  x[i][j] ← d2
  j ← j+1
  if j ≠ 32: go back to d2 ← 0.0
  i ← i+1
  if i ≠ 32: go back to j ← 0
  return

MIPS assembly (loop tests at the bottom of each loop):
      li    $t1, 32            # t1 ← 32
      li    $s0, 0             # i ← 0
L1:   li    $s1, 0             # j ← 0
L2:   $f4 ← 0.0
      li    $s2, 0             # k ← 0
L3:   <loop body>
      addiu $s2, $s2, 1        # k ← k+1
      bne   $s2, $t1, L3
      x[i][j] ← $f4
      addiu $s1, $s1, 1        # j ← j+1
      bne   $s1, $t1, L2
      addiu $s0, $s0, 1        # i ← i+1
      bne   $s0, $t1, L1

The loop body
  load y[i][k]
  load z[k][j]
  d1 ← y[i][k]*z[k][j]
  d2 ← d2 + d1
How do we load y[i][k] into a floating point register?
First we have to consider how a 2-dimensional matrix of doubles is stored in memory. The rows are stored one after the other, and each element takes 8 bytes:
  Base of y[][]             y[0][0]   y[0][1]   y[0][2]   ...  y[0][31]
  Base of y[][] + 8×32      y[1][0]   y[1][1]   y[1][2]   ...  y[1][31]
  ...
                            y[31][0]  y[31][1]  y[31][2]  ...  y[31][31]
The element y[0][1] is at Base of y[][] + 8.
In general, the address of y[i][k] is given by:
  addr(y[i][k]) = base of y[][] + (i × 32 + k) × 8

The loop body
In general, the address of y[i][k] is given by:
  addr(y[i][k]) = base of y[][] + (i × 32 + k) × 8
MIPS assembly for load y[i][k]:
L3:   sll   $t2, $s0, 5        # $t2 ← 32*i
      addu  $t2, $t2, $s2      # $t2 ← 32*i + k
      sll   $t2, $t2, 3        # $t2 ← (32*i + k)*8
      addu  $t2, $a1, $t2      # $t2 ← Addr(y[i][k])
      l.d   $f16, 0($t2)       # $f16 ← y[i][k]
Write the code to load z[k][j] in $f18.
MIPS assembly for load z[k][j]:
      sll   $t2, $s2, 5        # $t2 ← 32*k
      addu  $t2, $t2, $s1      # $t2 ← 32*k + j
      sll   $t2, $t2, 3        # $t2 ← (32*k + j)*8
      addu  $t2, $a2, $t2      # $t2 ← Addr(z[k][j])
      l.d   $f18, 0($t2)       # $f18 ← z[k][j]
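The shift-and-add sequence can be checked against the offsets that a C compiler uses for the same array. This sketch assumes sizeof(double) is 8 bytes; element_offset and c_offset are hypothetical helper names.

```c
#include <stddef.h>
#include <stdint.h>

enum { N = 32 };

/* Byte offset of element [row][col], computed with the same steps as
   the assembly: shift left by 5 (times 32), add the column index,
   shift left by 3 (times 8, the assumed size of a double). */
uint32_t element_offset(uint32_t row, uint32_t col) {
    uint32_t t2 = row << 5;   /* sll  $t2, row, 5   */
    t2 = t2 + col;            /* addu $t2, $t2, col */
    t2 = t2 << 3;             /* sll  $t2, $t2, 3   */
    return t2;
}

/* The same offset as computed by C array indexing. */
ptrdiff_t c_offset(double m[][N], int row, int col) {
    return (char *)&m[row][col] - (char *)&m[0][0];
}
```

For a 32 × 32 array of doubles, the two functions agree for every valid index pair; for instance, element [1][0] sits 256 bytes past the base.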

The loop body (cont.)
Once we have loaded y[i][k] into $f16 and z[k][j] into $f18, we can proceed to perform the multiply and the add.
MIPS assembly for multiply and add:
      mul.d $f16, $f18, $f16   # $f16 ← y[i][k] * z[k][j]
      add.d $f4, $f4, $f16     # $f4 ← $f4 + $f16

Initializing and Storing $f4
How can we initialize $f4?
MIPS assembly to initialize $f4:
      mtc1  $zero, $f4
      mtc1  $zero, $f5
Warning: In your textbook, page A-69, mtcz is specified as follows:
  Move to coprocessor z: mtcz rd, rt
  Move CPU register rt to coprocessor z's register rd.
Note that the code above writes the CPU register first and the coprocessor register second.

Initializing and Storing $f4
How can we initialize $f4?
MIPS assembly to initialize $f4:
      mtc1  $zero, $f4
      mtc1  $zero, $f5
How can we store $f4 in x[i][j]?
MIPS assembly to store $f4 in x[i][j]:
      sll   $t2, $s0, 5        # $t2 ← 32*i
      addu  $t2, $t2, $s1      # $t2 ← 32*i + j
      sll   $t2, $t2, 3        # $t2 ← (32*i + j)*8
      addu  $t2, $a0, $t2      # $t2 ← Addr(x[i][j])
      swc1  $f4, 0($t2)        # x[i][j] ← $f4-$f5
      swc1  $f5, 4($t2)

MIPS assembly (complete program):
      li    $t1, 32            # t1 ← 32
      li    $s0, 0             # i ← 0
L1:   li    $s1, 0             # j ← 0
L2:   mtc1  $zero, $f4         # $f4-$f5 ← 0.0
      mtc1  $zero, $f5
      li    $s2, 0             # k ← 0
L3:   sll   $t2, $s0, 5        # $t2 ← 32*i
      addu  $t2, $t2, $s2      # $t2 ← 32*i + k
      sll   $t2, $t2, 3        # $t2 ← (32*i + k)*8
      addu  $t2, $a1, $t2      # $t2 ← Addr(y[i][k])
      l.d   $f16, 0($t2)       # $f16 ← y[i][k]
      sll   $t2, $s2, 5        # $t2 ← 32*k
      addu  $t2, $t2, $s1      # $t2 ← 32*k + j
      sll   $t2, $t2, 3        # $t2 ← (32*k + j)*8
      addu  $t2, $a2, $t2      # $t2 ← Addr(z[k][j])
      l.d   $f18, 0($t2)       # $f18 ← z[k][j]
      mul.d $f16, $f18, $f16   # $f16 ← y[i][k] * z[k][j]
      add.d $f4, $f4, $f16     # $f4 ← $f4 + $f16
      addiu $s2, $s2, 1        # k ← k+1
      bne   $s2, $t1, L3
      sll   $t2, $s0, 5        # $t2 ← 32*i
      addu  $t2, $t2, $s1      # $t2 ← 32*i + j
      sll   $t2, $t2, 3        # $t2 ← (32*i + j)*8
      addu  $t2, $a0, $t2      # $t2 ← Addr(x[i][j])
      swc1  $f4, 0($t2)        # x[i][j] ← $f4-$f5
      swc1  $f5, 4($t2)
      addiu $s1, $s1, 1        # j ← j+1
      bne   $s1, $t1, L2
      addiu $s0, $s0, 1        # i ← i+1
      bne   $s0, $t1, L1

Regions of the complete program:
  The five instructions starting at L3 load y[i][k] in $f16.
  The next five instructions load z[k][j] in $f18.
  The six instructions after bne $s2, $t1, L3 store $f4 in x[i][j].

Write the code to save/restore the registers that need to be saved on the stack.

MIPS foo stack saving assembly:
      addi  $sp, $sp, -36
      sw    $s0, 32($sp)
      sw    $s1, 28($sp)
      sw    $s2, 24($sp)
      swc1  $f4, 20($sp)
      swc1  $f5, 16($sp)
      swc1  $f16, 12($sp)
      swc1  $f17, 8($sp)
      swc1  $f18, 4($sp)
      swc1  $f19, 0($sp)
MIPS foo stack restoring assembly:
      lwc1  $f19, 0($sp)
      lwc1  $f18, 4($sp)
      lwc1  $f17, 8($sp)
      lwc1  $f16, 12($sp)
      lwc1  $f5, 16($sp)
      lwc1  $f4, 20($sp)
      lw    $s2, 24($sp)
      lw    $s1, 28($sp)
      lw    $s0, 32($sp)
      addi  $sp, $sp, 36

Suppose that we classify the instructions of this program into:
  integer logic and arithmetic
  32-bit loads/stores
  conditional branches
  FP additions
  FP multiplications
  move to/from coprocessor
How many instructions of each class are executed?

First we will have to examine the pseudoinstructions. For instance,
      li    $t1, 32
is translated to
      ori   $t1, $zero, 32
and
      l.d   $f16, 0($t2)
is translated to
      lwc1  $f16, 0($t2)
      lwc1  $f17, 4($t2)

How many times is each loop executed?
  outside the loops = 1 time
  L1 = 32 times
  L2 = 32 × 32 times
  L3 = 32 × 32 × 32 times

Loop counts:
  L1 = 32 times
  L2 = 32 × 32 times = 1024 times
  L3 = 32 × 32 × 32 times = 32768 times
Complete a table with the number of instructions of each type executed in each region of the program.
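One way to cross-check such a table is to tally the instructions per region and multiply by the loop counts. The per-region numbers below are a reading of the program listing after expanding each l.d into two lwc1 instructions and counting each li as one integer instruction; they are not taken from the slides' table. The names counts and count_instructions are hypothetical.

```c
/* Executed-instruction totals per class for the n x n x n loop nest. */
typedef struct {
    long int_alu;     /* integer logic and arithmetic (li counts as ori) */
    long load_store;  /* 32-bit loads/stores (each l.d counts as 2 lwc1) */
    long branch;      /* conditional branches */
    long fp_add;      /* FP additions */
    long fp_mul;      /* FP multiplications */
    long move_cop;    /* move to/from coprocessor (mtc1) */
} counts;

counts count_instructions(long n) {
    long outer = n;           /* region L1 runs n times     */
    long middle = n * n;      /* region L2 runs n*n times   */
    long inner = n * n * n;   /* region L3 runs n*n*n times */
    counts c;
    /* Outside the loops: 2 li.  L1: li and addiu.  L2: li, the four
       address instructions of the store, and addiu.  L3: eight address
       instructions and addiu. */
    c.int_alu    = 2 + 2 * outer + 6 * middle + 9 * inner;
    c.load_store = 2 * middle + 4 * inner;   /* 2 swc1; 4 lwc1 */
    c.branch     = outer + middle + inner;   /* one bne per loop */
    c.fp_add     = inner;
    c.fp_mul     = inner;
    c.move_cop   = 2 * middle;               /* 2 mtc1 */
    return c;
}
```

Calling count_instructions(32) reproduces the loop counts above (1024 and 32768) inside each class total.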

Computing CPI
Suppose you know that each of the following types of instructions takes the indicated number of clock cycles to execute. How would you compute the CPI for this machine?

Computing CPI (cont.)

Computing Execution Time
If the machine that we are using has a processor that operates at 1.3 GHz, how long does it take to execute foo()?
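Both computations follow from the usual definitions: CPI is the total number of clock cycles divided by the total number of instructions, and execution time is instruction count × CPI / clock rate. The sketch below uses hypothetical function names; the cycle counts per class are inputs, because the table of cycle counts is not reproduced here.

```c
/* Average cycles per instruction for a program, given the number of
   executed instructions and the cycles per instruction of each class. */
double compute_cpi(const long count[], const int cycles[], int classes) {
    double total_cycles = 0.0;
    double total_instructions = 0.0;
    int c;
    for (c = 0; c < classes; c++) {
        total_cycles += (double)count[c] * (double)cycles[c];
        total_instructions += (double)count[c];
    }
    return total_cycles / total_instructions;
}

/* Execution time in seconds = instructions * CPI / clock rate (Hz). */
double execution_time(double instructions, double cpi, double clock_hz) {
    return instructions * cpi / clock_hz;
}
```

As a made-up illustration, a program with 2, 1 and 1 instructions in three classes costing 1, 2 and 4 cycles uses 8 cycles for 4 instructions, a CPI of 2.0. For the 1.3 GHz machine, the clock rate argument would be 1.3e9.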

In preparation for the midterm...
Write a code segment that reads a byte B from the address 0x8400 0040 and:
  a) writes 0x0000 00FF in address 0x8400 0044 if bit 5 of B is 1;
  b) writes 0xFFFF FF00 in address 0x8400 0044 otherwise.

In preparation for the midterm...
Write a minimum instruction sequence that inverts all the bits in the exponent field of the number stored in register $f2.