
Computer Organization and Design, 5th Edition: The Hardware/Software Interface
Chapter 3: Arithmetic for Computers

§ 3.1 Introduction
Arithmetic for Computers
- Operations on integers
  - Addition and subtraction
  - Multiplication and division
  - Dealing with overflow
- Floating-point real numbers
  - Representation and operations

§ 3.2 Addition and Subtraction
Integer Addition
- Example: 7 + 6
- Overflow if result out of range
  - Adding +ve and –ve operands: no overflow
  - Adding two +ve operands: overflow if result sign is 1
  - Adding two –ve operands: overflow if result sign is 0
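The sign rule above can be sketched in C. This is a minimal illustration, not part of the original slides: the function name `add_overflows` is hypothetical, and the sum is formed in unsigned arithmetic because signed overflow is undefined behavior in C.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: reports whether a + b leaves the 32-bit
   two's-complement range, using the sign rule from the slide. */
bool add_overflows(int32_t a, int32_t b, int32_t *sum)
{
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    uint32_t us = ua + ub;              /* unsigned add wraps, which is well defined */

    /* Overflow iff both operands have the same sign bit
       and the result's sign bit differs from it. */
    bool overflow = ((~(ua ^ ub)) & (ua ^ us) & 0x80000000u) != 0;

    /* Converting an out-of-range value back to int32_t is
       implementation-defined; mainstream compilers wrap. */
    *sum = (int32_t)us;
    return overflow;
}
```

Adding 7 and 6 reports no overflow; adding 1 to INT32_MAX does, because two positive operands produce a result whose sign bit is 1.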

Integer Subtraction
- Add negation of second operand
- Example: 7 – 6 = 7 + (–6)

    +7:  0000 … 0000 0111
    –6:  1111 … 1111 1010
    +1:  0000 … 0000 0001

- Overflow if result out of range
  - Subtracting two +ve or two –ve operands: no overflow
  - Subtracting +ve from –ve operand: overflow if result sign is 0
  - Subtracting –ve from +ve operand: overflow if result sign is 1

Dealing with Overflow
- Some languages (e.g., C) ignore overflow
  - Use MIPS addu, addiu, subu instructions
- Other languages (e.g., Ada, Fortran) require raising an exception
  - Use MIPS add, addi, sub instructions
  - On overflow, invoke exception handler
    - Save PC in exception program counter (EPC) register
    - Jump to predefined handler address
    - mfc0 (move from coprocessor reg) instruction can retrieve EPC value, to return after corrective action

Arithmetic for Multimedia
- Graphics and media processing operates on vectors of 8-bit and 16-bit data
  - Use 64-bit adder, with partitioned carry chain
    - Operate on 8×8-bit, 4×16-bit, or 2×32-bit vectors
  - SIMD (single-instruction, multiple-data)
- Saturating operations
  - On overflow, result is largest representable value
    - c.f. 2s-complement modulo arithmetic
  - E.g., clipping in audio, saturation in video
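To contrast saturating with modulo arithmetic, here is a small C sketch. The function names are hypothetical; the signed version also clamps at the smallest representable value on negative overflow, which is an assumption beyond what the slide states.

```c
#include <stdint.h>

/* Unsigned 8-bit saturating add: clamp to 255 instead of wrapping. */
uint8_t sat_add_u8(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + b;     /* computed in a wider type */
    return sum > UINT8_MAX ? UINT8_MAX : (uint8_t)sum;
}

/* Signed 16-bit saturating add, e.g. for audio samples:
   clamp at both ends of the int16_t range. */
int16_t sat_add_s16(int16_t a, int16_t b)
{
    int32_t sum = (int32_t)a + b;
    if (sum > INT16_MAX) return INT16_MAX;
    if (sum < INT16_MIN) return INT16_MIN;
    return (int16_t)sum;
}
```

With modulo arithmetic, 200 + 100 in 8 bits wraps to 44; the saturating version returns 255, which is what clipping means for a pixel or sample value.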

§ 3.3 Multiplication
Multiplication
- Start with long-multiplication approach

    multiplicand        1000
    multiplier        × 1001
                        1000
                       0000
                      0000
                     1000
    product          1001000

- Length of product is the sum of operand lengths

Multiplication Hardware
[Figure: multiplication hardware; a register is labeled "Initially 0"]
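The long-multiplication approach can be modeled in software as a shift-and-add loop. The sketch below is an illustration under assumed names (`shift_add_multiply` is hypothetical); each loop iteration stands in for one step of the sequential hardware, with the running product starting at 0.

```c
#include <stdint.h>

/* Shift-and-add multiply of two unsigned 32-bit operands.
   The 64-bit result reflects "length of product = sum of operand lengths". */
uint64_t shift_add_multiply(uint32_t multiplicand, uint32_t multiplier)
{
    uint64_t product = 0;               /* running product, initially 0 */
    uint64_t mcand = multiplicand;      /* widened so it can shift left */

    for (int step = 0; step < 32; step++) {
        if (multiplier & 1u)            /* low multiplier bit is 1: add */
            product += mcand;
        mcand <<= 1;                    /* next partial product is worth 2x */
        multiplier >>= 1;               /* expose the next multiplier bit */
    }
    return product;
}
```

For the slide's operands, 1000₂ × 1001₂ (8 × 9), the loop adds the multiplicand at steps 0 and 3 and yields 1001000₂ = 72.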

Optimized Multiplier
- Perform steps in parallel: add/shift
- One cycle per partial-product addition
  - That's ok, if frequency of multiplications is low

Faster Multiplier
- Uses multiple adders
  - Cost/performance tradeoff
- Can be pipelined
  - Several multiplications performed in parallel

MIPS Multiplication
- Two 32-bit registers for product
  - HI: most-significant 32 bits
  - LO: least-significant 32 bits
- Instructions
  - mult rs, rt / multu rs, rt
    - 64-bit product in HI/LO
  - mfhi rd / mflo rd
    - Move from HI/LO to rd
    - Can test HI value to see if product overflows 32 bits
  - mul rd, rs, rt
    - Least-significant 32 bits of product –> rd
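The HI/LO split and the "test HI" overflow check can be modeled in C. This is a sketch only: `hilo_t`, `mips_mult`, and `product_overflows_32` are hypothetical names, and the overflow test shown is for the signed case, where the product fits in 32 bits exactly when HI is the sign extension of LO.

```c
#include <stdint.h>

typedef struct { uint32_t hi, lo; } hilo_t;   /* hypothetical HI/LO pair */

/* Model of signed mult: the 64-bit product is split across HI and LO. */
hilo_t mips_mult(int32_t rs, int32_t rt)
{
    uint64_t product = (uint64_t)((int64_t)rs * (int64_t)rt);
    hilo_t r;
    r.hi = (uint32_t)(product >> 32);          /* what mfhi would return */
    r.lo = (uint32_t)(product & 0xFFFFFFFFu);  /* what mflo would return */
    return r;
}

/* Nonzero when the signed product does not fit in 32 bits. */
int product_overflows_32(hilo_t r)
{
    uint32_t sign_ext = (r.lo & 0x80000000u) ? 0xFFFFFFFFu : 0u;
    return r.hi != sign_ext;
}
```

For example, 65536 × 65536 leaves HI = 1 and LO = 0, so the check reports overflow, while –3 × 5 leaves HI all ones and does not.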

§ 3.4 Division
Division

                     1001      quotient
    divisor  1000 ) 1001010    dividend
                   -1000
                       10
                       101
                       1010
                      -1000
                         10    remainder

- n-bit operands yield n-bit quotient and remainder
- Check for 0 divisor
- Long division approach
  - If divisor ≤ dividend bits: 1 bit in quotient, subtract
  - Otherwise: 0 bit in quotient, bring down next dividend bit
- Restoring division
  - Do the subtract, and if remainder goes < 0, add divisor back
- Signed division
  - Divide using absolute values
  - Adjust sign of quotient and remainder as required
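Restoring division as described above can be sketched in C for unsigned operands. The function name `restoring_divide` is hypothetical, and the caller is assumed to have already checked for a zero divisor.

```c
#include <stdint.h>

/* Restoring division of unsigned 32-bit operands.
   Precondition (assumed): divisor != 0. */
void restoring_divide(uint32_t dividend, uint32_t divisor,
                      uint32_t *quotient, uint32_t *remainder)
{
    int64_t rem = 0;        /* signed, so a failed subtraction shows as < 0 */
    uint32_t quo = 0;

    for (int bit = 31; bit >= 0; bit--) {
        /* Bring down the next dividend bit. */
        rem = (rem << 1) | ((dividend >> bit) & 1u);
        rem -= divisor;                 /* trial subtraction */
        if (rem < 0) {
            rem += divisor;             /* restore; quotient bit is 0 */
            quo <<= 1;
        } else {
            quo = (quo << 1) | 1u;      /* quotient bit is 1 */
        }
    }
    *quotient = quo;
    *remainder = (uint32_t)rem;
}
```

Dividing 1001010₂ (74) by 1000₂ (8) gives quotient 1001₂ (9) and remainder 10₂ (2), matching the long-division example.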

Division Hardware
[Figure: division hardware; labels "Initially divisor in left half" and "Initially dividend"]

Optimized Divider
- One cycle per partial-remainder subtraction
- Looks a lot like a multiplier!
  - Same hardware can be used for both

Faster Division
- Can't use parallel hardware as in multiplier
  - Subtraction is conditional on sign of remainder
- Faster dividers (e.g., SRT division) generate multiple quotient bits per step
  - Still require multiple steps

MIPS Division
- Use HI/LO registers for result
  - HI: 32-bit remainder
  - LO: 32-bit quotient
- Instructions
  - div rs, rt / divu rs, rt
  - No overflow or divide-by-0 checking
    - Software must perform checks if required
  - Use mfhi, mflo to access result
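Since the hardware does no checking, the checks fall to software. A minimal sketch of what such a wrapper might look like in C follows; `checked_div` is a hypothetical name, and treating INT32_MIN / –1 as the overflow case is an assumption about which checks a program would want.

```c
#include <stdbool.h>
#include <stdint.h>

/* Performs the checks that div leaves to software.
   Returns false and leaves the outputs untouched on
   divide-by-zero or on the one overflowing case, INT32_MIN / -1. */
bool checked_div(int32_t rs, int32_t rt, int32_t *lo_quot, int32_t *hi_rem)
{
    if (rt == 0)
        return false;                   /* divide by zero */
    if (rs == INT32_MIN && rt == -1)
        return false;                   /* quotient would not fit in 32 bits */

    *lo_quot = rs / rt;     /* C99: quotient truncates toward zero */
    *hi_rem  = rs % rt;     /* C99: remainder takes the dividend's sign */
    return true;
}
```

For –7 divided by 2 this yields quotient –3 and remainder –1, consistent with dividing absolute values and then adjusting signs.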

§ 3.5 Floating Point
Floating Point
- Representation for non-integral numbers
  - Including very small and very large numbers
- Like scientific notation
  - –2.34 × 10^56 (normalized)
  - +0.002 × 10^–4 (not normalized)
  - +987.02 × 10^9 (not normalized)
- In binary
  - ±1.xxxxxxx₂ × 2^yyyy
- Types float and double in C

Floating Point Standard
- Defined by IEEE Std 754-1985
- Developed in response to divergence of representations
  - Portability issues for scientific code
- Now almost universally adopted
- Two representations
  - Single precision (32-bit)
  - Double precision (64-bit)

IEEE Floating-Point Format

    | S | Exponent                      | Fraction                        |
          single: 8 bits                  single: 23 bits
          double: 11 bits                 double: 52 bits

- S: sign bit (0 ⇒ non-negative, 1 ⇒ negative)
- Normalize significand: 1.0 ≤ |significand| < 2.0
  - Always has a leading pre-binary-point 1 bit, so no need to represent it explicitly (hidden bit)
  - Significand is Fraction with the "1." restored
- Exponent: excess representation: actual exponent + Bias
  - Ensures exponent is unsigned
  - Single: Bias = 127; Double: Bias = 1023

Single-Precision Range
- Exponents 00000000 and 11111111 reserved
- Smallest value
  - Exponent: 00000001 ⇒ actual exponent = 1 – 127 = –126
  - Fraction: 000…00 ⇒ significand = 1.0
  - ±1.0 × 2^–126 ≈ ±1.2 × 10^–38
- Largest value
  - Exponent: 11111110 ⇒ actual exponent = 254 – 127 = +127
  - Fraction: 111…11 ⇒ significand ≈ 2.0
  - ±2.0 × 2^+127 ≈ ±3.4 × 10^+38

Double-Precision Range
- Exponents 0000…00 and 1111…11 reserved
- Smallest value
  - Exponent: 00000000001 ⇒ actual exponent = 1 – 1023 = –1022
  - Fraction: 000…00 ⇒ significand = 1.0
  - ±1.0 × 2^–1022 ≈ ±2.2 × 10^–308
- Largest value
  - Exponent: 11111111110 ⇒ actual exponent = 2046 – 1023 = +1023
  - Fraction: 111…11 ⇒ significand ≈ 2.0
  - ±2.0 × 2^+1023 ≈ ±1.8 × 10^+308


Floating-Point Precision
- Relative precision
  - All fraction bits are significant
  - Single: approx 2^–23
    - Equivalent to 23 × log10(2) ≈ 23 × 0.3 ≈ 6 decimal digits of precision
  - Double: approx 2^–52
    - Equivalent to 52 × log10(2) ≈ 52 × 0.3 ≈ 16 decimal digits of precision
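The range and precision figures for single and double precision can be cross-checked against the constants in C's <float.h>. The sketch below assumes that float and double are the IEEE 754 single and double formats, which holds on mainstream platforms but is not guaranteed by the C standard; the function names are hypothetical. It uses C99 hexadecimal floating-point literals, where 0x1p-126f means 1.0 × 2^–126.

```c
#include <float.h>
#include <stdbool.h>

/* True when float has the single-precision limits quoted on the slides. */
bool single_matches_slides(void)
{
    return FLT_MIN     == 0x1p-126f            /* smallest normalized value */
        && FLT_MAX     == 0x1.fffffep+127f     /* fraction all ones, exp +127 */
        && FLT_EPSILON == 0x1p-23f;            /* relative precision 2^-23 */
}

/* True when double has the double-precision limits quoted on the slides. */
bool double_matches_slides(void)
{
    return DBL_MIN     == 0x1p-1022
        && DBL_MAX     == 0x1.fffffffffffffp+1023
        && DBL_EPSILON == 0x1p-52;
}
```

Both functions return true on an IEEE 754 platform; a false result would indicate a different floating-point format rather than an error in the slides.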

Floating-Point Example
- Represent –0.75
  - –0.75 = (–1)^1 × 1.1₂ × 2^–1
  - S = 1
  - Fraction = 1000…00₂
  - Exponent = –1 + Bias
    - Single: –1 + 127 = 126 = 01111110₂
    - Double: –1 + 1023 = 1022 = 01111111110₂
- Single: 1011111101000…00
- Double: 1011111111101000…00

Floating-Point Example
- What number is represented by the single-precision float 11000000101000…00?
  - S = 1
  - Fraction = 01000…00₂
  - Exponent = 10000001₂ = 129
- x = (–1)^1 × (1 + .01₂) × 2^(129 – 127)
    = (–1) × 1.25 × 2^2
    = –5.0
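Both worked examples can be reproduced by pulling the three fields out of a float's bit pattern. The following C sketch assumes that float is IEEE 754 single precision; `fp32_fields`, `fp32_decode`, and `fp32_from_bits` are hypothetical names introduced for illustration.

```c
#include <stdint.h>
#include <string.h>

typedef struct {
    unsigned sign;        /* 1 bit */
    unsigned exponent;    /* 8 bits, biased by 127 */
    uint32_t fraction;    /* 23 bits, hidden leading 1 not included */
} fp32_fields;

/* Split a single-precision value into S, Exponent, Fraction. */
fp32_fields fp32_decode(float value)
{
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);   /* copy the bytes; no aliasing issues */

    fp32_fields f;
    f.sign     = bits >> 31;
    f.exponent = (bits >> 23) & 0xFFu;
    f.fraction = bits & 0x7FFFFFu;
    return f;
}

/* Reinterpret a 32-bit pattern as a single-precision value. */
float fp32_from_bits(uint32_t bits)
{
    float value;
    memcpy(&value, &bits, sizeof value);
    return value;
}
```

Decoding –0.75f gives S = 1, Exponent = 126, and a Fraction whose top bit is set (0x400000). Going the other way, the pattern 11000000101000…00 is 0xC0A00000, which `fp32_from_bits` turns into –5.0.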

Floating-Point Addition
- Consider a 4-digit decimal example
  - 9.999 × 10^1 + 1.610 × 10^–1
- 1. Align decimal points
  - Shift number with smaller exponent
  - 9.999 × 10^1 + 0.016 × 10^1
- 2. Add significands
  - 9.999 × 10^1 + 0.016 × 10^1 = 10.015 × 10^1
- 3. Normalize result & check for over/underflow
  - 1.0015 × 10^2
- 4. Round and renormalize if necessary
  - 1.002 × 10^2

Floating-Point Addition
- Now consider a 4-digit binary example
  - 1.000₂ × 2^–1 + –1.110₂ × 2^–2 (0.5 + –0.4375)
- 1. Align binary points
  - Shift number with smaller exponent
  - 1.000₂ × 2^–1 + –0.111₂ × 2^–1
- 2. Add significands
  - 1.000₂ × 2^–1 + –0.111₂ × 2^–1 = 0.001₂ × 2^–1
- 3. Normalize result & check for over/underflow
  - 1.000₂ × 2^–4, with no over/underflow
- 4. Round and renormalize if necessary
  - 1.000₂ × 2^–4 (no change) = 0.0625
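The align, add, and normalize steps can be traced with a toy model in C. Everything here is an assumption made for illustration: `toy_fp` stores a signed 4-bit significand as an integer (1.xxx₂ is held as 8 to 15) together with an exponent, bits shifted out during alignment are simply dropped, and step 4 (rounding) is omitted because the model keeps no guard bits.

```c
#include <stdlib.h>

/* Toy value = sig × 2^(exp − 3); normalized when 8 <= |sig| <= 15. */
typedef struct { int sig; int exp; } toy_fp;

toy_fp toy_add(toy_fp a, toy_fp b)
{
    /* Step 1: align. Make a the operand with the larger exponent and
       shift the other significand right by the exponent difference. */
    if (a.exp < b.exp) { toy_fp t = a; a = b; b = t; }
    int shift = a.exp - b.exp;
    int b_mag = shift > 4 ? 0 : abs(b.sig) >> shift;   /* dropped bits are lost */
    int b_aligned = b.sig < 0 ? -b_mag : b_mag;

    /* Step 2: add significands. */
    toy_fp r = { a.sig + b_aligned, a.exp };

    /* Step 3: normalize back to 1.xxx form (zero stays as it is). */
    if (r.sig == 0)
        return r;
    while (abs(r.sig) >= 16) { r.sig /= 2; r.exp++; }
    while (abs(r.sig) < 8)   { r.sig *= 2; r.exp--; }

    /* Step 4 (round and renormalize) is not modeled. */
    return r;
}
```

For the slide's operands, {8, –1} and {–14, –2}, alignment turns –1.110₂ × 2^–2 into –0.111₂ × 2^–1 (–7), the sum is 0.001₂ × 2^–1 (1), and normalization produces {8, –4}, that is 1.000₂ × 2^–4 = 0.0625.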

FP Adder Hardware
- Much more complex than integer adder
- Doing it in one clock cycle would take too long
  - Much longer than integer operations
  - Slower clock would penalize all instructions
- FP adder usually takes several cycles
  - Can be pipelined

FP Adder Hardware
[Figure: FP adder hardware, annotated with Step 1, Step 2, Step 3, Step 4]

FP Arithmetic Hardware
- FP multiplier is of similar complexity to FP adder
  - But uses a multiplier for significands instead of an adder
- FP arithmetic hardware usually does
  - Addition, subtraction, multiplication, division, reciprocal, square-root
  - FP ↔ integer conversion
- Operations usually take several cycles
  - Can be pipelined

FP Instructions in MIPS
- FP hardware is coprocessor 1
  - Adjunct processor that extends the ISA
- Separate FP registers
  - 32 single-precision: $f0, $f1, … $f31
  - Paired for double-precision: $f0/$f1, $f2/$f3, …
    - Release 2 of MIPS ISA supports 32 × 64-bit FP reg's
- FP instructions operate only on FP registers
  - Programs generally don't do integer ops on FP data, or vice versa
  - More registers with minimal code-size impact
- FP load and store instructions
  - lwc1, ldc1, swc1, sdc1
    - e.g., ldc1 $f8, 32($sp)

FP Instructions in MIPS
- Single-precision arithmetic
  - add.s, sub.s, mul.s, div.s
    - e.g., add.s $f0, $f1, $f6
- Double-precision arithmetic
  - add.d, sub.d, mul.d, div.d
    - e.g., mul.d $f4, $f4, $f6
- Single- and double-precision comparison
  - c.xx.s, c.xx.d (xx is eq, lt, le, …)
  - Sets or clears FP condition-code bit
    - e.g., c.lt.s $f3, $f4
- Branch on FP condition code true or false
  - bc1t, bc1f
    - e.g., bc1t TargetLabel

FP Example: °F to °C
- C code:

    float f2c (float fahr) {
      return ((5.0/9.0)*(fahr - 32.0));
    }

- fahr in $f12, result in $f0, literals in global memory space
- Compiled MIPS code:

    f2c: lwc1  $f16, const5($gp)    # $f16 = 5.0
         lwc1  $f18, const9($gp)    # $f18 = 9.0
         div.s $f16, $f16, $f18     # $f16 = 5.0 / 9.0
         lwc1  $f18, const32($gp)   # $f18 = 32.0
         sub.s $f18, $f12, $f18     # $f18 = fahr - 32.0
         mul.s $f0, $f16, $f18      # $f0 = (5/9) * (fahr - 32.0)
         jr    $ra                  # return

FP Example: Array Multiplication
- X = X + Y × Z
  - All 32 × 32 matrices, 64-bit double-precision elements
- C code:

    /* The row length is spelled out so the declaration is valid C. */
    void mm (double x[][32], double y[][32], double z[][32]) {
      int i, j, k;
      for (i = 0; i != 32; i = i + 1)
        for (j = 0; j != 32; j = j + 1)
          for (k = 0; k != 32; k = k + 1)
            x[i][j] = x[i][j] + y[i][k] * z[k][j];
    }

- Addresses of x, y, z in $a0, $a1, $a2, and i, j, k in $s0, $s1, $s2

FP Example: Array Multiplication
- MIPS code:

        li   $t1, 32          # $t1 = 32 (row size/loop end)
        li   $s0, 0           # i = 0; initialize 1st for loop
    L1: li   $s1, 0           # j = 0; restart 2nd for loop
    L2: li   $s2, 0           # k = 0; restart 3rd for loop
        sll  $t2, $s0, 5      # $t2 = i * 32 (size of row of x)
        addu $t2, $t2, $s1    # $t2 = i * size(row) + j
        sll  $t2, $t2, 3      # $t2 = byte offset of [i][j]
        addu $t2, $a0, $t2    # $t2 = byte address of x[i][j]
        l.d  $f4, 0($t2)      # $f4 = 8 bytes of x[i][j]
    L3: sll  $t0, $s2, 5      # $t0 = k * 32 (size of row of z)
        addu $t0, $t0, $s1    # $t0 = k * size(row) + j
        sll  $t0, $t0, 3      # $t0 = byte offset of [k][j]
        addu $t0, $a2, $t0    # $t0 = byte address of z[k][j]
        l.d  $f16, 0($t0)     # $f16 = 8 bytes of z[k][j]
        …

FP Example: Array Multiplication

        …
        sll   $t0, $s0, 5        # $t0 = i*32 (size of row of y)
        addu  $t0, $t0, $s2      # $t0 = i*size(row) + k
        sll   $t0, $t0, 3        # $t0 = byte offset of [i][k]
        addu  $t0, $a1, $t0      # $t0 = byte address of y[i][k]
        l.d   $f18, 0($t0)       # $f18 = 8 bytes of y[i][k]
        mul.d $f16, $f18, $f16   # $f16 = y[i][k] * z[k][j]
        add.d $f4, $f4, $f16     # $f4 = x[i][j] + y[i][k]*z[k][j]
        addiu $s2, $s2, 1        # k = k + 1
        bne   $s2, $t1, L3       # if (k != 32) go to L3
        s.d   $f4, 0($t2)        # x[i][j] = $f4
        addiu $s1, $s1, 1        # j = j + 1
        bne   $s1, $t1, L2       # if (j != 32) go to L2
        addiu $s0, $s0, 1        # i = i + 1
        bne   $s0, $t1, L1       # if (i != 32) go to L1

Accurate Arithmetic
- IEEE Std 754 specifies additional rounding control
  - Extra bits of precision (guard, round, sticky)
  - Choice of rounding modes
  - Allows programmer to fine-tune numerical behavior of a computation
- Not all FP units implement all options
  - Most programming languages and FP libraries just use defaults
- Trade-off between hardware complexity, performance, and market requirements

§ 3.6 Parallelism and Computer Arithmetic: Subword Parallelism
Subword Parallelism
- Graphics and audio applications can take advantage of performing simultaneous operations on short vectors
  - Example: 128-bit adder:
    - Sixteen 8-bit adds
    - Eight 16-bit adds
    - Four 32-bit adds
- Also called data-level parallelism, vector parallelism, or Single Instruction, Multiple Data (SIMD)
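Subword parallelism can be imitated in portable C by breaking the carry chain of an ordinary 64-bit add at each lane boundary. This sketch uses a 64-bit word with eight 8-bit lanes rather than the 128-bit adder of the example, and the name `add_8x8` is hypothetical.

```c
#include <stdint.h>

/* Add eight 8-bit lanes packed into 64 bits; each lane wraps on its own
   and no carry crosses from one lane into the next. */
uint64_t add_8x8(uint64_t a, uint64_t b)
{
    const uint64_t LOW7 = 0x7F7F7F7F7F7F7F7Full;   /* low 7 bits of every lane */
    const uint64_t TOP  = 0x8080808080808080ull;   /* top bit of every lane */

    /* Add the low 7 bits; any carry lands in the lane's own top bit. */
    uint64_t low = (a & LOW7) + (b & LOW7);

    /* Fold in the operands' top bits with XOR, discarding the carry-out. */
    return low ^ ((a ^ b) & TOP);
}
```

With lanes 0xFF and 0x01 alternating in one operand and 0x01 in every lane of the other, each 0xFF lane wraps to 0x00 without disturbing its neighbor, whereas a plain 64-bit add would let the carry spill over.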

§ 3.7 Real Stuff: Streaming SIMD Extensions and AVX in x86
x86 FP Architecture
- Originally based on 8087 FP coprocessor
  - 8 × 80-bit extended-precision registers
  - Used as a push-down stack
  - Registers indexed from TOS: ST(0), ST(1), …
- FP values are 32-bit or 64-bit in memory
  - Converted on load/store of memory operand
  - Integer operands can also be converted on load/store
- Very difficult to generate and optimize code
  - Result: poor FP performance

x86 FP Instructions

    Data transfer      Arithmetic           Compare         Transcendental
    FILD mem/ST(i)     FIADDP mem/ST(i)     FICOMP          FPATAN
    FISTP mem/ST(i)    FISUBRP mem/ST(i)    FIUCOMP         F2XM1
    FLDPI              FIMULP mem/ST(i)     FSTSW AX/mem    FCOS
    FLD1               FIDIVRP mem/ST(i)                    FPTAN
    FLDZ               FSQRT                                FPREM
                       FABS                                 FSIN
                       FRNDINT                              FYL2X

- Optional variations
  - I: integer operand
  - P: pop operand from stack
  - R: reverse operand order
  - But not all combinations allowed

Streaming SIMD Extension 2 (SSE2)
- Adds 8 × 128-bit registers
  - Extended to 16 registers in AMD64/EM64T
- Can be used for multiple FP operands
  - 2 × 64-bit double precision
  - 4 × 32-bit single precision
  - Instructions operate on them simultaneously
    - Single-Instruction Multiple-Data

§ 3.8 Going Faster: Subword Parallelism and Matrix Multiply
Matrix Multiply
- Unoptimized code:

    void dgemm (int n, double* A, double* B, double* C)
    {
      for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
        {
          double cij = C[i+j*n];          /* cij = C[i][j] */
          for (int k = 0; k < n; k++)
            cij += A[i+k*n] * B[k+j*n];   /* cij += A[i][k]*B[k][j] */
          C[i+j*n] = cij;                 /* C[i][j] = cij */
        }
    }

Matrix Multiply
- x86 assembly code:

    vmovsd (%r10), %xmm0                  # Load 1 element of C into %xmm0
    mov    %rsi, %rcx                     # register %rcx = %rsi
    xor    %eax, %eax                     # register %eax = 0
    vmovsd (%rcx), %xmm1                  # Load 1 element of B into %xmm1
    add    %r9, %rcx                      # register %rcx = %rcx + %r9
    vmulsd (%r8,%rax,8), %xmm1, %xmm1     # Multiply %xmm1, element of A
    add    $0x1, %rax                     # register %rax = %rax + 1
    cmp    %eax, %edi                     # compare %eax to %edi
    vaddsd %xmm1, %xmm0, %xmm0            # Add %xmm1, %xmm0
    jg     30 <dgemm+0x30>                # jump if %edi > %eax
    add    $0x1, %r11d                    # register %r11 = %r11 + 1
    vmovsd %xmm0, (%r10)                  # Store %xmm0 into C element

Matrix Multiply
- Optimized C code:

    #include <x86intrin.h>
    void dgemm (int n, double* A, double* B, double* C)
    {
      for ( int i = 0; i < n; i+=4 )
        for ( int j = 0; j < n; j++ ) {
          __m256d c0 = _mm256_load_pd(C+i+j*n);   /* c0 = C[i][j] */
          for( int k = 0; k < n; k++ )
            c0 = _mm256_add_pd(c0,                /* c0 += A[i][k]*B[k][j] */
                   _mm256_mul_pd(_mm256_load_pd(A+i+k*n),
                                 _mm256_broadcast_sd(B+k+j*n)));
          _mm256_store_pd(C+i+j*n, c0);           /* C[i][j] = c0 */
        }
    }

Matrix Multiply
- Optimized x86 assembly code:

    vmovapd      (%r11), %ymm0            # Load 4 elements of C into %ymm0
    mov          %rbx, %rcx               # register %rcx = %rbx
    xor          %eax, %eax               # register %eax = 0
    vbroadcastsd (%rax,%r8,1), %ymm1      # Make 4 copies of B element
    add          $0x8, %rax               # register %rax = %rax + 8
    vmulpd       (%rcx), %ymm1, %ymm1     # Parallel mul %ymm1, 4 A elements
    add          %r9, %rcx                # register %rcx = %rcx + %r9
    cmp          %r10, %rax               # compare %r10 to %rax
    vaddpd       %ymm1, %ymm0, %ymm0      # Parallel add %ymm1, %ymm0
    jne          50 <dgemm+0x50>          # jump if %r10 != %rax
    add          $0x1, %esi               # register %esi = %esi + 1
    vmovapd      %ymm0, (%r11)            # Store %ymm0 into 4 C elements

§ 3.9 Fallacies and Pitfalls
Right Shift and Division
- Left shift by i places multiplies an integer by 2^i
- Right shift divides by 2^i?
  - Only for unsigned integers
- For signed integers
  - Arithmetic right shift: replicate the sign bit
  - e.g., –5 / 4
    - 11111011₂ >> 2 = 11111110₂ = –2
    - Rounds toward –∞
  - c.f. 11111011₂ >>> 2 = 00111110₂ = +62
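The pitfall is easy to reproduce in C. In this sketch, `asr` and `lsr8` are hypothetical helper names; `asr` builds an arithmetic right shift out of operations on non-negative values because C leaves `>>` on a negative signed integer implementation-defined.

```c
#include <stdint.h>

/* Arithmetic right shift (sign bit replicated), for 0 <= s < 32.
   For negative x, ~x is non-negative, so the shift is well defined. */
int32_t asr(int32_t x, unsigned s)
{
    return x < 0 ? ~(~x >> s) : x >> s;
}

/* Logical right shift of an 8-bit pattern (zeros shifted in). */
uint8_t lsr8(uint8_t x, unsigned s)
{
    return (uint8_t)(x >> s);
}
```

Here `asr(-5, 2)` is –2 (rounding toward –∞), while the C expression `-5 / 4` is –1 (truncation toward zero), so the shift is not a drop-in replacement for signed division. The logical shift of the same 8-bit pattern, `lsr8(0xFB, 2)`, gives 62.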

Associativity
- Parallel programs may interleave operations in unexpected orders
  - Assumptions of associativity may fail
- Need to validate parallel programs under varying degrees of parallelism
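A short C sketch shows how floating-point addition can fail to be associative. The operand values are chosen here for illustration and the function names are hypothetical; the code assumes IEEE 754 arithmetic and a compiler that does not reassociate floating-point operations (so no fast-math style options).

```c
/* Same three operands, two groupings. */
float sum_left(float x, float y, float z)
{
    return (x + y) + z;     /* x and y are combined first */
}

float sum_right(float x, float y, float z)
{
    return x + (y + z);     /* y absorbs z before x is added */
}
```

With x = –1.5e38f, y = 1.5e38f, and z = 1.0f, the left grouping gives 1.0 because x + y is exactly 0, while the right grouping gives 0.0 because 1.0 is far too small to change 1.5e38. A parallel reduction that changes the grouping can therefore change the result.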

Who Cares About FP Accuracy?
- Important for scientific code
  - But for everyday consumer use?
    - "My bank balance is out by 0.0002¢!"
- The Intel Pentium FDIV bug
  - The market expects accuracy
  - See Colwell, The Pentium Chronicles

§ 3.10 Concluding Remarks
Concluding Remarks
- Bits have no inherent meaning
  - Interpretation depends on the instructions applied
- Computer representations of numbers
  - Finite range and precision
  - Need to account for this in programs

Concluding Remarks
- ISAs support arithmetic
  - Signed and unsigned integers
  - Floating-point approximation to reals
- Bounded range and precision
  - Operations can overflow and underflow
- MIPS ISA
  - Core instructions: 54 most frequently used
    - 100% of SPECINT, 97% of SPECFP
  - Other instructions: less frequent