Representing fractions – Fixed point
The problem:
- How to represent fractions with a finite number of bits?

Representing fractions – Fixed point
A number with 10 bits: $a_1 a_2 a_3 a_4 a_5 a_6 a_7 a_8 a_9 a_{10}$
Fixing the point: $a_1 a_2 a_3 a_4 a_5 a_6 a_7 a_8 \,.\, a_9 a_{10}$

Representing fractions – Fixed point
Range of representation:
Number, signed (2's complement): $-2^{7} \dots 2^{7} - 1$
Number, unsigned: $0 \dots 2^{8} - 1$
Fraction: quanta of 0.25
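
A minimal sketch of this 10-bit layout, assuming 8 integer bits and 2 fraction bits stored as one signed 2's complement integer. The names `to_fixed` and `from_fixed` are hypothetical helpers introduced only for illustration.

```python
# Assumed layout: 10 bits in total, of which 2 are fraction bits.
TOTAL_BITS = 10
FRACTION_BITS = 2
SCALE = 1 << FRACTION_BITS              # 4, so one step (quantum) is 1/4 = 0.25

RAW_MIN = -(1 << (TOTAL_BITS - 1))      # -512, which stands for -128.00
RAW_MAX = (1 << (TOTAL_BITS - 1)) - 1   #  511, which stands for  127.75


def to_fixed(x: float) -> int:
    """Quantize x to the nearest multiple of 0.25 and return the raw integer."""
    raw = round(x * SCALE)
    if not RAW_MIN <= raw <= RAW_MAX:
        raise OverflowError(f"{x} is outside the representable range")
    return raw


def from_fixed(raw: int) -> float:
    """Convert the raw integer back to the value it stands for."""
    return raw / SCALE


print(from_fixed(to_fixed(3.3)))        # 3.3 is stored as the nearby multiple 3.25
```

The narrow range is visible directly: `to_fixed(128.0)` already raises `OverflowError` in this sketch.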

Fixed point: the problem
- Cannot represent wide ranges of numbers, such as those needed in scientific applications.

Representing fractions – Floating point
$10 = (-1)^{0} \times 1 \times 10^{1}$
$-0.123 = (-1)^{1} \times 1.23 \times 10^{-1}$
- Sign bit: the power of $(-1)$ (0 and 1 above)
- Number (mantissa): 1 and 1.23
- Base (radix) r: 10
- Exponent: 1 and -1

Problem of uniqueness
$0.1 = 1000 \times 10^{-4} = 0.001 \times 10^{2}$
The representation is not unique.

Problem of uniqueness: normalization (standardization)
$0.61 = 6100 \times 10^{-4} = 0.0061 \times 10^{2} = 6.1 \times 10^{-1}$
Normalized form: one nonzero digit to the left of the point, here $6.1 \times 10^{-1}$.

Normalized binary floating point
$D = (-1)^{a_0} \times (1.a_1 a_2 a_3 \dots) \times 2^{b_1 b_2 b_3 \dots}$
String of bits: $a_0 \; b_1 b_2 \dots b_n \; a_1 a_2 a_3 \dots a_m$
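
As an illustration of the normalized form, the sketch below splits a nonzero finite Python float into a sign bit, a significand in [1, 2) and a power of two. The helper name `normalize` is hypothetical. It relies on `math.frexp`, which returns a mantissa in [0.5, 1), so the result is shifted by one binary place.

```python
import math


def normalize(x: float):
    """Return (a0, significand, exponent) with x = (-1)**a0 * significand * 2**exponent.

    Assumes x is finite and nonzero; the significand lies in [1, 2).
    """
    a0 = 1 if x < 0 else 0
    m, e = math.frexp(abs(x))   # abs(x) == m * 2**e with 0.5 <= m < 1
    return a0, m * 2, e - 1     # shift one place to get the form 1.a1a2a3...


print(normalize(0.15625))       # 0.15625 = 1.25 * 2**-3
print(normalize(-6.0))          # -6.0 = -(1.5 * 2**2)
```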

Floating point – Questions
- Representing the (signed) exponent
- How to represent zero?
- And NaN, infinity?
- How to add, subtract and multiply?
- Rounding errors

Floating point – Representing the exponent
How to represent a signed number?
- Sign bit?
- 2's complement?
Neither.

Floating point – Representing the exponent
- We want exponents to be ordered the same way as their bit patterns read as unsigned binary numbers: 0000 < 0001 < … < 1000 < … < 1111

Floating point – Representing the exponent
Exponent value = stored number - B (the bias)
Usually $B = 2^{n-1} - 1$, where n is the number of exponent bits.
The extreme exponents are defined by their stored bit patterns:
emin: 000…0001
emax: 111…1110
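
A small sketch of the biased exponent, assuming the usual bias $B = 2^{n-1} - 1$. The function names are hypothetical. The all-zeros and all-ones patterns are excluded here because they are reserved for special values.

```python
def bias(n: int) -> int:
    """Bias B for an exponent field that is n bits wide."""
    return 2 ** (n - 1) - 1


def encode_exponent(e: int, n: int) -> int:
    """Stored (unsigned) number for the true exponent e."""
    stored = e + bias(n)
    if not 1 <= stored <= 2 ** n - 2:      # 000...0 and 111...1 are reserved
        raise ValueError(f"exponent {e} does not fit in {n} bits")
    return stored


def decode_exponent(stored: int, n: int) -> int:
    """True exponent for a stored number."""
    return stored - bias(n)


# An 8-bit field gives emin = -126 and emax = 127.
print(decode_exponent(0b00000001, 8), decode_exponent(0b11111110, 8))
```

Because the stored number is simply the exponent plus a constant, a larger exponent always has a larger bit pattern, which is the ordering property asked for above.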

Floating point – Representing zero, NaN, ±∞
IEEE 754 special values:
Exponent            Mantissa   Represents
e = emin - 1        M = 0      ±0
e = emin - 1        M ≠ 0      $0.M \times 2^{emin}$ (denormalized number)
emin ≤ e ≤ emax     any M      $1.M \times 2^{e}$ (normalized number)
e = emax + 1        M = 0      ±∞
e = emax + 1        M ≠ 0      NaN

IEEE 754
Parameter                                            Single   Double
Mantissa (bits, including the implicit leading 1)    24       53
emax                                                 127      1023
emin                                                 -126     -1022
Exponent width (bits)                                8        11
Format width (bits, including the sign bit)          32       64
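
To make the single-precision layout concrete, here is a sketch that unpacks the three fields of a 32-bit float with Python's `struct` module and classifies the value by its stored exponent. The helper names `decode_float32` and `classify` are hypothetical.

```python
import struct


def decode_float32(x: float):
    """Return (sign, stored_exponent, fraction) of x rounded to single precision."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                  # 1 bit
    stored_exp = (bits >> 23) & 0xFF   # 8 bits, biased by 127
    fraction = bits & 0x7FFFFF         # 23 stored mantissa bits
    return sign, stored_exp, fraction


def classify(x: float) -> str:
    """Name the kind of value encoded by the exponent and mantissa fields."""
    _, stored_exp, fraction = decode_float32(x)
    if stored_exp == 0:                # stored 0 corresponds to e = emin - 1
        return "zero" if fraction == 0 else "denormalized"
    if stored_exp == 0xFF:             # stored 255 corresponds to e = emax + 1
        return "infinity" if fraction == 0 else "NaN"
    return "normalized"


print(decode_float32(1.0))             # exponent field holds 127, i.e. e = 0
print(classify(2.0 ** -130))           # below 2**-126, so denormalized
```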

What is NaN (not a number)?
Operation   NaN produced by
+           ∞ + (-∞)
*           0 * ∞
/           0/0, ∞/∞
Any operation with a NaN operand also gives NaN: X ± NaN, X * NaN, X / NaN
(Partial list)
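
The rows above can be checked with ordinary Python floats, which are IEEE 754 doubles on common platforms. One caveat of this sketch: Python raises `ZeroDivisionError` for `0.0 / 0.0` instead of returning NaN, so that row is left out.

```python
import math

inf = math.inf

# Operations that produce NaN.
produced = {
    "inf + (-inf)": inf + (-inf),
    "0 * inf": 0.0 * inf,
    "inf / inf": inf / inf,
}

# NaN propagates through further arithmetic.
nan = produced["inf + (-inf)"]
propagated = [1.0 + nan, 2.0 * nan, 3.0 / nan]

for name, value in produced.items():
    print(name, "->", value)
print(all(math.isnan(v) for v in propagated))
```

Since NaN compares unequal to everything, including itself, `math.isnan` is the reliable way to test for it.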

Infinity
- Provides a safe way to continue a calculation when overflow is encountered.
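
A short sketch of this behaviour with Python floats (IEEE 754 doubles on common platforms). Note that this holds for float multiplication and division; some other Python operations, such as `**` on floats, raise `OverflowError` instead.

```python
import math

big = 1e308
overflowed = big * 10          # too large for a double, so the result is inf

# The calculation continues with sensible results.
ratio = 1.0 / overflowed       # 1 / inf is 0.0
is_larger = overflowed > big   # inf compares greater than every finite number

print(overflowed, ratio, is_larger)
```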

Calculations with floating point numbers
Addition:
- Equalize the exponents (smaller → larger exponent)
- Sum the mantissas
- Renormalize if necessary

Calculations with floating point numbers
Example (in base 10): |E| = 1, |M| = 3
$91 = 9.10 \times 10^{1}$
$9.7 = 9.70 \times 10^{0}$
$9.10 \times 10^{1} + 9.70 \times 10^{0}$: the exponents are not of the same order.
Equalize: $9.10 \times 10^{1} + 0.97 \times 10^{1} = 10.07 \times 10^{1}$
Renormalize: $1.007 \times 10^{2}$, stored with 3 mantissa digits as $1.01 \times 10^{2}$

Calculations with floating point numbers
Example II (in base 10): |E| = 1, |M| = 3
$91 = 9.10 \times 10^{1}$
$9.75 = 9.75 \times 10^{0}$
$9.10 \times 10^{1} + 9.75 \times 10^{0}$: the exponents are not of the same order.
Equalize: $9.10 \times 10^{1} + 0.975 \times 10^{1} = 10.075 \times 10^{1}$
Renormalize: $1.0075 \times 10^{2}$, which needs 5 digits; with 3 mantissa digits it is stored as $1.01 \times 10^{2}$ (rounding error)
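
The three addition steps can be sketched for the base-10 toy format with |M| = 3. This is an illustration under stated assumptions, not a hardware algorithm: mantissas are non-negative integers (910 stands for 9.10), the name `fp_add` is hypothetical, the final rounding is round-half-to-even, and the operands are aligned on the smaller exponent so that the integer sum stays exact before it is rounded once.

```python
def fp_add(m1: int, e1: int, m2: int, e2: int, p: int = 3):
    """Add m1 * 10**(e1 - (p - 1)) and m2 * 10**(e2 - (p - 1)).

    Mantissas are non-negative integers with p digits, so (910, 1) is 9.10 * 10**1.
    Returns a (mantissa, exponent) pair in the same convention.
    """
    # Step 1: equalize the exponents (aligned on the smaller one to stay exact).
    e = min(e1, e2)
    a = m1 * 10 ** (e1 - e)
    b = m2 * 10 ** (e2 - e)

    # Step 2: sum the mantissas.
    total = a + b

    # Step 3: renormalize to p digits, rounding the dropped digits once.
    excess = len(str(total)) - p
    if excess > 0:
        unit = 10 ** excess
        total, rem = divmod(total, unit)
        if 2 * rem > unit or (2 * rem == unit and total % 2 == 1):
            total += 1
        if total == 10 ** p:       # rounding carried into a new digit
            total //= 10
            excess += 1
        e += excess
    return total, e


print(fp_add(910, 1, 970, 0))      # 91 + 9.70 = 100.70, stored as 1.01 * 10**2
print(fp_add(910, 1, 975, 0))      # 91 + 9.75 = 100.75, stored as 1.01 * 10**2
```

Both sums need more than 3 digits after renormalizing, so both results carry a rounding error.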

Rounding errors
The problem: squeezing infinitely many real numbers into a finite number of bits.

Measuring rounding errors
- Units in the last place (ulps)
- Relative error

Measuring rounding errors – ULP
If $d.dd \dots d \times r^{e}$ (p digits) represents z, then
error $= |d.dd \dots d - z / r^{e}| \times r^{p-1}$ ulps

Measuring rounding errors – ULP
Example I: r = 10, p = 3
The number $3.14 \times 10^{-2}$ represents 0.0314159
Error = 0.159 ulps

Measuring rounding errors – ULP
- What is the maximum error, in ulps, if rounding is toward the nearest representable number? 0.5 ulp

Measuring rounding errors – Relative error
If $d.dd \dots d \times r^{e}$ (p digits) represents z, then
relative error $= |d.dd \dots d \times r^{e} - z| / |z|$

Measuring rounding errors – Relative error
Example I: r = 10, p = 3
The number $3.14 \times 10^{-2}$ represents 0.0314159
Relative error ≈ 0.0005
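
Both error measures can be checked on the example with Python's `decimal` module, which keeps the decimal arithmetic exact at these sizes. The function names `ulp_error` and `relative_error` are hypothetical, and the inputs are passed as `Decimal` values.

```python
from decimal import Decimal


def ulp_error(d: Decimal, e: int, z: Decimal, r: int = 10, p: int = 3) -> Decimal:
    """Error in ulps when d * r**e (p digits) represents z."""
    return abs(d - z / Decimal(r) ** e) * Decimal(r) ** (p - 1)


def relative_error(d: Decimal, e: int, z: Decimal, r: int = 10) -> Decimal:
    """Relative error when d * r**e represents z (z must be nonzero)."""
    return abs(d * Decimal(r) ** e - z) / abs(z)


d, e, z = Decimal("3.14"), -2, Decimal("0.0314159")
print(ulp_error(d, e, z))        # 0.159 ulps, possibly shown with trailing zeros
print(relative_error(d, e, z))   # roughly 0.0005
```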