
Floating point
Number system corresponding to decimal scientific notation, e.g. $1.837 \times 10^{4}$, where 1.837 is the significand and 4 is the exponent.
A great number of corresponding binary standards exist, but there is one common standard: IEEE 754-1985 (IEC 559).
Datorteknik FloatingPoint, slide 1

IEEE 754-1985 Number representation
Single precision (32 bits): sign 1 bit, exponent 8 bits, fraction 23 bits.
Double precision (64 bits): sign 1 bit, exponent 11 bits, fraction 52 bits.
Single-extended and double-extended numbers exist inside the floating-point hardware.

IEEE 754-1985 Single precision
Fields: sign S (1 bit), exponent E (8 bits), mantissa M (23 bits).
Exponent: excess-127 binary integer; the actual exponent is e = E - 127.
Mantissa: sign + magnitude, normalized binary significand with hidden integer bit: 1.M.
$N = (-1)^{S} \times 2^{E-127} \times (1.M)$, for $0 < E < 255$.
Examples: 0 = 0 00000000 0...0; -1.5 = 1 01111111 10...0.
The magnitude of the normalized numbers that can be represented is in the range $2^{-126} \times 1.0$ to $2^{127} \times (2 - 2^{-23})$, which is approximately $1.18 \times 10^{-38}$ to $3.40 \times 10^{38}$.
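The field layout and the formula for N can be checked with a short Python sketch. The helper names `decode_single` and `value_of` are made up for this illustration; only the standard `struct` module is used, and only normalized numbers (0 < E < 255) are handled.

```python
import struct

def decode_single(x):
    """Round x to single precision and split the 32 bits into (S, E, M)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                 # 1 bit
    exponent = (bits >> 23) & 0xFF    # 8 bits, excess 127
    fraction = bits & 0x7FFFFF        # 23 bits, hidden "1." not stored
    return sign, exponent, fraction

def value_of(sign, exponent, fraction):
    """Evaluate N = (-1)^S * 2^(E-127) * (1.M) for a normalized number."""
    if not 0 < exponent < 255:
        raise ValueError("this sketch only handles normalized numbers")
    significand = 1 + fraction / 2**23
    return (-1) ** sign * 2.0 ** (exponent - 127) * significand

print(decode_single(-1.5))             # (1, 127, 4194304), i.e. 1 01111111 10...0
print(value_of(1, 127, 0x400000))      # -1.5
print(value_of(0, 1, 0))               # smallest normalized magnitude, about 1.18e-38
print(value_of(0, 254, 0x7FFFFF))      # largest magnitude, about 3.40e+38
```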

IEEE 754-1985
Fraction part: 23 / 52 bits; 0 ≤ x < 1.
Significand: 1 + fraction part. The "1" is not stored ("hidden bit"). This corresponds to about 7 and 16 decimal digits, respectively.
Exponent: 127 / 1023 is added to the exponent ("biased exponent"). This corresponds to a range of roughly $10^{-39}$ to $10^{39}$ and $10^{-308}$ to $10^{308}$, respectively.
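As a rough check of these figures, the decimal precision and the normalized range can be computed from the fraction width and the bias. This is a minimal sketch; `format_summary` is a hypothetical helper name, and it assumes Python floats are IEEE 754 doubles, which is the case on common platforms.

```python
import math

def format_summary(fraction_bits, bias):
    """Return (decimal digits, smallest normalized, largest) for a format."""
    significand_bits = fraction_bits + 1            # stored fraction + hidden bit
    decimal_digits = significand_bits * math.log10(2)
    smallest = 2.0 ** (1 - bias)                    # E = 1, fraction = 0
    largest = (2 - 2.0 ** -fraction_bits) * 2.0 ** bias
    return decimal_digits, smallest, largest

for name, fraction_bits, bias in (("single", 23, 127), ("double", 52, 1023)):
    digits, smallest, largest = format_summary(fraction_bits, bias)
    print(f"{name}: {digits:.1f} digits, {smallest:.3g} to {largest:.3g}")
```

The single-precision line should show a little over 7 digits and the double-precision line just under 16, in line with the slide.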

IEEE 754-1985 Special features
Correct rounding of "halfway" results (to the even number).
Includes special values: NaN (not a number), ∞ (infinity), -∞ (negative infinity).
Uses denormal numbers to represent numbers smaller than $2^{E_{min}}$.
Rounds to nearest by default; three other rounding modes exist.
Sophisticated exception handling.
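Several of these features can be observed directly from Python, assuming (as on common platforms) that a Python float is an IEEE 754 double. The variable names below are only for illustration.

```python
import math

overflow = 1e308 * 10             # too large for a double: becomes +infinity
undefined = overflow - overflow   # infinity minus infinity: NaN
smallest_normal = 2.0 ** -1022    # smallest normalized double
smallest_denormal = 5e-324        # 2**-1074, the smallest positive denormal

print(math.isinf(overflow), math.isnan(undefined))   # True True
print(undefined == undefined)                        # False: NaN is unequal to itself
print(0.0 < smallest_denormal < smallest_normal)     # True
print(smallest_denormal / 2)                         # 0.0: tie between 0 and the denormal goes to even

# Halfway cases round to the neighbour with an even significand.
print(float(2**53 + 1) == 2.0**53)                   # True
print(float(2**53 + 3) == 2.0**53 + 4)               # True
```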

Multiplication
$(s_1 \times 2^{e_1}) \times (s_2 \times 2^{e_2}) = (s_1 \times s_2) \times 2^{e_1 + e_2}$
So, multiply the significands and add the exponents.
Problem: the significand is coded in sign-magnitude, so use unsigned multiplication and take care of the sign separately.
Round the 2n-bit significand product to an n-bit significand.
Compute the new exponent with respect to the bias.
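The last step can be made concrete for single precision. The derivation below is a worked sketch using the excess-127 convention; $E_1$, $E_2$ and $E$ denote the stored (biased) exponents of the operands and the result, which are names introduced here for illustration.

```latex
\begin{align*}
e_1 + e_2 &= (E_1 - 127) + (E_2 - 127) \\
E &= (e_1 + e_2) + 127 = E_1 + E_2 - 127 \\[4pt]
\text{Example: } 3.0 \times 2.5 &= (1.5 \times 2^{1}) \times (1.25 \times 2^{1}),
  \quad E_1 = E_2 = 128 \\
E &= 128 + 128 - 127 = 129, \quad s_1 s_2 = 1.875 \\
\text{result} &= 1.875 \times 2^{129 - 127} = 7.5
\end{align*}
```

Adding the two stored exponents directly would count the bias twice, which is why it is subtracted once. If the significand product is 2 or larger, the exponent is additionally incremented during normalization.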

Rounding 1
Multiply the two significands to get the 2n-bit product, held in the register pair P, A (shown for n = 6):
  P = x0 x1 x2 x3 x4 x5    A = g r s s s s
g is the guard bit, r is the round bit, and the four s bits are ORed together to form the "sticky bit".
Case 1: x0 = 0, shift needed:
  P = x1 x2 x3 x4 x5 g    A = r s s s s
Case 2: x0 = 1, increment the exponent, set r = g and fold the old r into the sticky bit (s = s OR r):
  P = x0 x1 x2 x3 x4 x5    A = r s s s s s

Rounding 2
For both cases:
  If r = 0, P is the correctly rounded product.
  If r = 1 and s = 1, P + 1 is the correctly rounded product.
  If r = 1 and s = 0 (the "halfway case"), then:
    P is the correctly rounded product if x5 (or g), i.e. the least significant bit of P, is 0;
    P + 1 is the correctly rounded product if x5 (or g) is 1.
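The two cases and the rounding rule can be tried out on a toy format. The sketch below is an illustration under stated assumptions, not the hardware algorithm itself: significands are plain integers of p bits (p = 6 by default) standing for 1.xxxxx scaled by 2^(p-1), signs are left out, and `multiply_significands` is a hypothetical name.

```python
def multiply_significands(s1, s2, e1, e2, p=6):
    """Multiply two normalized p-bit significands, rounding to nearest even.

    s1, s2 are integers in [2**(p-1), 2**p); e1, e2 are unbiased exponents.
    Returns (significand, exponent) in the same convention.
    """
    lo, hi = 1 << (p - 1), 1 << p
    if not (lo <= s1 < hi and lo <= s2 < hi):
        raise ValueError("significands must be normalized p-bit integers")
    product = s1 * s2                 # 2p-bit product
    exponent = e1 + e2
    if product >> (2 * p - 1):        # case 2: x0 = 1, keep the top p bits
        shift = p
        exponent += 1
    else:                             # case 1: x0 = 0, shift left by one
        shift = p - 1
    kept = product >> shift                           # P after the case split
    r = (product >> (shift - 1)) & 1                  # round bit
    s = (product & ((1 << (shift - 1)) - 1)) != 0     # sticky: OR of the rest
    if r and (s or kept & 1):         # above halfway, or halfway with odd P
        kept += 1
        if kept == hi:                # rounding carried out of the top bit
            kept >>= 1
            exponent += 1
    return kept, exponent

print(multiply_significands(40, 48, 0, 0))  # (60, 0): 1.25 * 1.5 = 1.875
print(multiply_significands(36, 36, 0, 0))  # (40, 0): halfway, P even, keep P
print(multiply_significands(36, 44, 0, 0))  # (50, 0): halfway, P odd, use P + 1
```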

Add / Sub
$(s_1 \times 2^{e_1}) + (s_2 \times 2^{e_2}) = s_3 \times 2^{e_3}$
1: Shift the summands so they have the same exponent (e.g. if e2 < e1: shift s2 right and increment e2 until e1 = e2).
2: Add the significands to get s3.
3: Normalize the number (shift s3 left and decrement e3 until MSB = 1).
4: Round s3 correctly (under the common assumption that more than 23 / 52 bits are used internally for the addition).
Subtraction uses the same method.
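The four steps can be sketched on a toy format for the simplest situation, two positive operands. The assumptions here are not from the slides: a number is an integer p-bit significand s times 2^e, three extra bits (guard, round, sticky) are kept internally, and `add_positive` is a made-up name. With two positive normalized operands the sum can only need a right shift (carry out); the left-shift normalization in step 3 is needed when subtraction cancels leading bits, which this sketch does not cover.

```python
def add_positive(s1, e1, s2, e2, p=6):
    """Add s1 * 2**e1 and s2 * 2**e2 (both positive), rounding to nearest even.

    s1, s2 are integers in [2**(p-1), 2**p). Returns (significand, exponent).
    """
    lo, hi = 1 << (p - 1), 1 << p
    if not (lo <= s1 < hi and lo <= s2 < hi):
        raise ValueError("significands must be normalized p-bit integers")
    # Step 1: align. Make operand 1 the larger exponent, shift operand 2 right.
    if e1 < e2:
        s1, e1, s2, e2 = s2, e2, s1, e1
    d = e1 - e2
    a = s1 << 3                                   # room for guard, round, sticky
    b = s2 << 3
    lost = int((b & ((1 << d) - 1)) != 0)         # any 1 shifted out is sticky
    b = (b >> d) | lost
    # Step 2: add the significands.
    total = a + b
    e = e1
    # Step 3: normalize. A carry out means one right shift, keeping stickiness.
    if total >> (p + 3):
        total = (total >> 1) | (total & 1)
        e += 1
    # Step 4: round to nearest even using the three extra bits.
    kept = total >> 3
    r = (total >> 2) & 1
    s = (total & 3) != 0
    if r and (s or kept & 1):
        kept += 1
        if kept == hi:                            # rounding carried out
            kept >>= 1
            e += 1
    return kept, e

print(add_positive(48, 0, 40, 0))   # (44, 1): 48 + 40 = 88 = 44 * 2
print(add_positive(32, 1, 33, 0))   # (48, 1): 64 + 33 = 97, halfway, goes to 96
print(add_positive(33, 1, 33, 0))   # (50, 1): 66 + 33 = 99, halfway, goes to 100
```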

Division
$(s_1 \times 2^{e_1}) / (s_2 \times 2^{e_2}) = (s_1 / s_2) \times 2^{e_1 - e_2}$
So, divide the significands and subtract the exponents.
Problem: the significand is coded in sign-magnitude, so use unsigned division (different algorithms exist) and take care of the sign separately.
Round the (n + 2)-bit significand (n bits plus guard and round bits) to an n-bit significand.
Compute the new exponent with respect to the bias.
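A toy version of this procedure is sketched below. It is one possible illustration, not one of the hardware division algorithms the slide alludes to: Python's integer division stands in for the unsigned divider, the remainder serves as the sticky information, a number is an integer p-bit significand s times 2^e, signs are omitted, and `divide_significands` is a hypothetical name.

```python
def divide_significands(s1, e1, s2, e2, p=6):
    """Divide s1 * 2**e1 by s2 * 2**e2, rounding to nearest even.

    s1, s2 are integers in [2**(p-1), 2**p). Returns (significand, exponent).
    """
    lo, hi = 1 << (p - 1), 1 << p
    if not (lo <= s1 < hi and lo <= s2 < hi):
        raise ValueError("significands must be normalized p-bit integers")
    # Quotient with p + 2 fractional bits; the remainder records lost bits.
    q, rem = divmod(s1 << (p + 2), s2)
    extra = q.bit_length() - p         # 3 if s1 >= s2, otherwise 2
    kept = q >> extra
    r = (q >> (extra - 1)) & 1                             # round bit
    s = (q & ((1 << (extra - 1)) - 1)) != 0 or rem != 0    # sticky
    e = e1 - e2 - (p + 2) + extra
    if r and (s or kept & 1):
        kept += 1
        if kept == hi:                 # rounding carried out of the top bit
            kept >>= 1
            e += 1
    return kept, e

print(divide_significands(48, 0, 32, 0))   # (48, -5): 48 / 32 = 1.5
print(divide_significands(32, 0, 48, 0))   # (43, -6): 2/3 rounds to 43 / 64
```

In a full implementation the result sign would be the XOR of the operand signs, and the exponent difference would have the bias added back once.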