
Floating Point Number System

A floating-point number system corresponds to decimal scientific notation such as $1.837 \times 10^{4}$, where 1.837 is the significand and 4 is the exponent. Many binary counterparts exist; one common standard is IEEE 754-1985 (IEC 559).

Computer Engineering, Floating Point, page 1
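The binary counterpart of $1.837 \times 10^{4}$ writes a number as a significand in $[1, 2)$ times a power of two. A minimal Python sketch: `normalize_binary` is a hypothetical helper built on the standard `math.frexp`, which returns a mantissa in $[0.5, 1)$, so the result is rescaled to the $1.F$ convention used in the following slides.

```python
import math

def normalize_binary(x: float) -> tuple[float, int]:
    """Return (s, e) with x == s * 2**e and 1 <= |s| < 2 (x finite and nonzero)."""
    m, e = math.frexp(x)   # x == m * 2**e with 0.5 <= |m| < 1
    return m * 2, e - 1    # rescale so that 1 <= |s| < 2

print(normalize_binary(18370.0))  # 1.837 * 10**4 -> (1.1212158203125, 14)
print(normalize_binary(-0.75))    # (-1.5, -1)
```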


IEEE 754-1985

Number representations:
- Single precision (32 bits): sign 1 bit, exponent 8 bits, fraction 23 bits
- Double precision (64 bits): sign 1 bit, exponent 11 bits, fraction 52 bits
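The field widths can be checked by reinterpreting a value's bits. A minimal sketch using Python's standard `struct` module (big-endian `>f`/`>d` packing, assuming the host uses IEEE 754 formats); `fields` is a hypothetical helper name.

```python
import struct

def fields(x: float, precision: str = "single") -> tuple[int, int, int]:
    """Split x into its (sign, exponent, fraction) bit fields."""
    if precision == "single":
        (bits,) = struct.unpack(">I", struct.pack(">f", x))  # 32-bit pattern
        exp_bits, frac_bits = 8, 23
    elif precision == "double":
        (bits,) = struct.unpack(">Q", struct.pack(">d", x))  # 64-bit pattern
        exp_bits, frac_bits = 11, 52
    else:
        raise ValueError("precision must be 'single' or 'double'")
    sign = bits >> (exp_bits + frac_bits)                      # top bit
    exponent = (bits >> frac_bits) & ((1 << exp_bits) - 1)     # middle field
    fraction = bits & ((1 << frac_bits) - 1)                   # low field
    return sign, exponent, fraction

print(fields(-0.75))            # (1, 126, 4194304)
print(fields(-0.75, "double"))  # (1, 1022, 2251799813685248)
```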

Single Precision Format

Fields: sign S (1 bit) | exponent E (8 bits) | fraction F (23 bits)

- Exponent E: excess-127 binary integer; the actual exponent is $e = E - 127$.
- Mantissa M (24 bits): normalized binary significand with a hidden integer bit, $M = 1.F$.

$N = (-1)^{S} \times (1.F)_2 \times 2^{e}$
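The formula $N = (-1)^{S} \times (1.F)_2 \times 2^{E - 127}$ can be evaluated directly from the three fields. A sketch for normalized numbers only: the field values $E = 0$ and $E = 255$ are reserved for zero/denormals and infinity/NaN, so the hypothetical `decode_single` rejects them.

```python
def decode_single(s: int, e_field: int, f: int) -> float:
    """Evaluate N = (-1)**S * (1.F) * 2**(E - 127) for a normalized single-precision number."""
    if not 1 <= e_field <= 254:
        raise ValueError("E = 0 and E = 255 are reserved for special values")
    significand = 1 + f / 2**23   # hidden bit 1 plus the 23-bit fraction
    e = e_field - 127             # remove the excess-127 bias
    return (-1) ** s * significand * 2.0 ** e

print(decode_single(0, 128, 1 << 22))  # 1.5 * 2**1 = 3.0
```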

Example 1

S = 1, E = 01111110, F = 10000000000000000000000 (a 1 followed by 22 zeros)

$e = E - 127 = 126 - 127 = -1$

$N = (-1)^{1} \times (1.1)_2 \times 2^{-1} = -1 \times (0.11)_2 = -1 \times (1 \cdot 2^{-1} + 1 \cdot 2^{-2}) = -1 \times (0.5 + 0.25) = -0.75$
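The example can be checked by asking the host for the bit pattern of $-0.75$. A minimal sketch with the standard `struct` module (assuming IEEE 754 single precision on the host); `single_bits` is a hypothetical helper that prints the S, E and F fields separated by spaces.

```python
import struct

def single_bits(x: float) -> str:
    """Return the 32-bit single-precision pattern of x as 'S EEEEEEEE FFF...F'."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    b = f"{bits:032b}"                 # zero-padded 32-character bit string
    return f"{b[0]} {b[1:9]} {b[9:]}"  # sign | exponent | fraction

print(single_bits(-0.75))
# 1 01111110 10000000000000000000000
```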

Single Precision Range

The magnitude of normalized numbers that can be represented lies in the range $2^{-126} \times 1.0$ to $2^{127} \times (2 - 2^{-23})$, which is approximately $1.2 \times 10^{-38}$ to $3.4 \times 10^{38}$.
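Both bounds follow from the extreme field values: the smallest normalized number has $E = 1$, $F = 0$, and the largest has $E = 254$ with all fraction bits set. A short Python check (the helper name `single_range` is hypothetical):

```python
def single_range() -> tuple[float, float]:
    """Smallest and largest positive normalized single-precision magnitudes."""
    smallest = 2.0 ** -126 * 1.0               # E = 1, F = 0
    largest = 2.0 ** 127 * (2 - 2.0 ** -23)    # E = 254, F = all ones
    return smallest, largest

lo, hi = single_range()
print(f"{lo:.3e} {hi:.3e}")
# 1.175e-38 3.403e+38
```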

IEEE 754-1985

| | Single precision (32 bits) | Double precision (64 bits) |
|---|---|---|
| Fraction part | 23 bits, $0 \le x < 1$ | 52 bits, $0 \le x < 1$ |
| Significand | 1 + fraction part; the leading 1 is not stored ("hidden bit") | 1 + fraction part; the leading 1 is not stored ("hidden bit") |
| Precision | about 7 decimal digits | about 16 decimal digits |
| Exponent | 127 added to the exponent | 1023 added to the exponent |
| Range | about $10^{-38}$ to $10^{38}$ | about $10^{-308}$ to $10^{308}$ |
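The bias and the decimal-digit figures both follow from the field widths: a k-bit exponent field uses the bias $2^{k-1} - 1$, and a p-bit significand (fraction bits plus the hidden bit) carries about $p \log_{10} 2$ decimal digits. A small Python sketch; `format_params` is a hypothetical helper name.

```python
import math

def format_params(exp_bits: int, frac_bits: int) -> tuple[int, float]:
    """Return (bias, approximate decimal digits) for a binary format with these field widths."""
    bias = 2 ** (exp_bits - 1) - 1              # 127 for 8 bits, 1023 for 11 bits
    digits = (frac_bits + 1) * math.log10(2)    # the hidden bit adds one bit of precision
    return bias, digits

for name, k, f in [("single", 8, 23), ("double", 11, 52)]:
    bias, digits = format_params(k, f)
    print(f"{name}: bias={bias}, about {digits:.1f} decimal digits")
# single: bias=127, about 7.2 decimal digits
# double: bias=1023, about 16.0 decimal digits
```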

IEEE 754-1985

Special features:
- Correct rounding of "halfway" results (ties go to the even value).
- Special values: NaN (Not a Number) and Infinity.
- Denormal numbers represent magnitudes smaller than the smallest normalized number, $2^{e_{\min}}$ ($2^{-126}$ in single precision).
- Rounds to nearest by default; three other rounding modes exist (toward zero, toward $+\infty$, toward $-\infty$).
- Sophisticated exception handling (invalid operation, division by zero, overflow, underflow, inexact).
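Several of these features can be observed from Python by rounding values to single precision with the standard `struct` module. This sketch assumes the host's C `float` conversion is IEEE 754 with the default round-to-nearest mode (true on common platforms); `to_single` and `single_fields` are hypothetical helper names.

```python
import math
import struct

def to_single(x: float) -> float:
    """Round x to the nearest single-precision value and return it as a Python float."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

def single_fields(x: float) -> tuple[int, int, int]:
    """(sign, exponent field E, fraction field F) of x in single precision."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

# Ties to even: 1 + 2**-24 lies exactly halfway between 1.0 (F ends in 0, even)
# and 1 + 2**-23 (F ends in 1, odd), so it rounds to 1.0.
print(to_single(1 + 2**-24))        # 1.0
# Infinity: E = 255 (all ones), F = 0.
print(single_fields(math.inf))      # (0, 255, 0)
# NaN: E = 255, F != 0.
_, e, f = single_fields(math.nan)
print(e == 255 and f != 0)          # True
# Denormal: E = 0, value 0.F * 2**-126; 2**-149 is the smallest positive one.
print(single_fields(2.0**-149))     # (0, 0, 1)
```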

Add / Sub

$(s_1 \times 2^{e_1}) \pm (s_2 \times 2^{e_2}) = (s_1 \pm s_2) \times 2^{e_3} = s_3 \times 2^{e_3}$, once both operands share the common exponent $e_3$.

- $s = 1.F$: the hidden bit is used during the operation.

1. Shift the summands so they have the same exponent: e.g., if $e_2 < e_1$, shift $s_2$ right and increment $e_2$ until $e_2 = e_1$.
2. Add or subtract the significands using the sign bits of $s_1$ and $s_2$; set the sign bit of the result accordingly.
3. Normalize the result (sign bit kept separate): shift $s_3$ left and decrement $e_3$ until the MSB is 1. If the addition produced a carry out, shift $s_3$ right by one and increment $e_3$ instead.
4. Round $s_3$ correctly: more than 23 / 52 bits are used internally for the addition.
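The four steps can be sketched on integer significands in Python. This is an illustrative model, not hardware: it assumes normalized single-precision operands only (no zeros, denormals, infinities or NaNs), uses round to nearest with ties to even, and keeps three extra bits (guard, round, sticky) as one common way of "using more than 23 bits internally". The helper names `unpack_single`, `pack_single` and `fp_add` are hypothetical.

```python
import struct

def unpack_single(x: float) -> tuple[int, int, int]:
    """Return (sign, unbiased exponent e, 24-bit significand 1.F as an integer)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    s, E, F = bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF
    if not 0 < E < 255:
        raise ValueError("sketch handles normalized numbers only")
    return s, E - 127, (1 << 23) | F          # restore the hidden bit

def pack_single(s: int, e: int, m: int) -> float:
    """Build a float from sign, unbiased exponent and 24-bit significand m in [2**23, 2**24)."""
    if not -126 <= e <= 127:
        raise ValueError("result exponent out of range (no denormals or infinity here)")
    bits = (s << 31) | ((e + 127) << 23) | (m & 0x7FFFFF)   # drop the hidden bit
    return struct.unpack(">f", struct.pack(">I", bits))[0]

def fp_add(x: float, y: float) -> float:
    s1, e1, m1 = unpack_single(x)
    s2, e2, m2 = unpack_single(y)
    m1, m2 = m1 << 3, m2 << 3                 # 3 extra bits: guard, round, sticky
    # Step 1: align; operand 1 gets the larger exponent, operand 2 is shifted right.
    if e1 < e2:
        s1, e1, m1, s2, e2, m2 = s2, e2, m2, s1, e1, m1
    shift = e1 - e2
    lost = m2 & ((1 << shift) - 1)
    m2 = (m2 >> shift) | (1 if lost else 0)   # sticky bit remembers any 1s shifted out
    # Step 2: add/subtract the significands using the sign bits.
    v = (-m1 if s1 else m1) + (-m2 if s2 else m2)
    if v == 0:
        return 0.0
    s3, m3, e3 = (1 if v < 0 else 0), abs(v), e1
    # Step 3: normalize so the leading 1 sits at bit 26 (24 significand bits + 3 extra).
    if m3 >= 1 << 27:                         # carry out of the addition
        m3 = (m3 >> 1) | (m3 & 1)
        e3 += 1
    while m3 < 1 << 26:                       # leading zeros after a subtraction
        m3 <<= 1
        e3 -= 1
    # Step 4: round to nearest, ties to even, using the guard/round/sticky bits.
    grs, m3 = m3 & 0b111, m3 >> 3
    if grs > 0b100 or (grs == 0b100 and m3 & 1):
        m3 += 1
        if m3 == 1 << 24:                     # rounding overflowed to 10.000...0
            m3 >>= 1
            e3 += 1
    return pack_single(s3, e3, m3)

print(fp_add(1.5, 2.25), fp_add(1.0, -0.75))  # 3.75 0.25
```

The extra bits matter in cases like `fp_add(1.0, 2.0**-24)`: the aligned operand lands exactly on the guard bit, and the tie is resolved to the even result 1.0.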

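Multiplication

$(s_1 \times 2^{e_1}) \times (s_2 \times 2^{e_2}) = (s_1 \times s_2) \times 2^{e_1 + e_2}$

So multiply the significands and add the exponents.

Problem: the significand is coded in sign-magnitude form; use unsigned multiplication and handle the sign separately (the result is negative exactly when the operand signs differ).
- Normalize the result: the product of two significands in $[1, 2)$ lies in $[1, 4)$, so at most one right shift is needed.
- Round the 2n-bit product significand to an n-bit significand.
- Compute the new exponent with respect to the bias: $E_3 = E_1 + E_2 - \text{bias}$ (bias 127 or 1023).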
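A Python sketch of the multiplication steps, under the same assumptions as a teaching model: normalized single-precision operands only, round to nearest with ties to even, and hypothetical helper names (`unpack_single`, `pack_single`, `fp_mul`). Because the full 48-bit product is available, the sketch rounds from the exact remainder rather than from separate guard bits.

```python
import struct

def unpack_single(x: float) -> tuple[int, int, int]:
    """Return (sign, unbiased exponent e, 24-bit significand 1.F as an integer)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    s, E, F = bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF
    if not 0 < E < 255:
        raise ValueError("sketch handles normalized numbers only")
    return s, E - 127, (1 << 23) | F          # restore the hidden bit

def pack_single(s: int, e: int, m: int) -> float:
    """Build a float from sign, unbiased exponent and 24-bit significand m in [2**23, 2**24)."""
    if not -126 <= e <= 127:
        raise ValueError("result exponent out of range (no denormals or infinity here)")
    bits = (s << 31) | ((e + 127) << 23) | (m & 0x7FFFFF)
    return struct.unpack(">f", struct.pack(">I", bits))[0]

def fp_mul(x: float, y: float) -> float:
    s1, e1, m1 = unpack_single(x)
    s2, e2, m2 = unpack_single(y)
    s3 = s1 ^ s2                    # sign handled separately from the magnitude
    p = m1 * m2                     # unsigned 48-bit product, value p * 2**(e1 + e2 - 46)
    e3 = e1 + e2                    # add exponents (biased form: E1 + E2 - 127)
    if p >= 1 << 47:                # product significand in [2, 4): one right shift
        e3 += 1
        shift = 24
    else:                           # product significand already in [1, 2)
        shift = 23
    # Round the 2n-bit product to n = 24 bits, ties to even.
    q, r = p >> shift, p & ((1 << shift) - 1)
    half = 1 << (shift - 1)
    if r > half or (r == half and q & 1):
        q += 1
        if q == 1 << 24:            # rounding overflowed to 10.000...0
            q >>= 1
            e3 += 1
    return pack_single(s3, e3, q)

print(fp_mul(1.5, 1.5), fp_mul(-0.75, 2.0))  # 2.25 -1.5
```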

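Division

$(s_1 \times 2^{e_1}) / (s_2 \times 2^{e_2}) = (s_1 / s_2) \times 2^{e_1 - e_2}$

So divide the significands and subtract the exponents.

Problem: the significand is coded in sign-magnitude form; use unsigned division (different algorithms exist) and handle the sign separately.
- Normalize: the quotient of two significands in $[1, 2)$ lies in $(0.5, 2)$, so at most one left shift is needed.
- Round the (n + 2)-bit significand (n bits plus guard and round bits) to an n-bit significand.
- Compute the new exponent with respect to the bias: $E_3 = E_1 - E_2 + \text{bias}$ (bias 127 or 1023).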
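A Python sketch of the division steps, again only a model: normalized single-precision operands, round to nearest with ties to even, and hypothetical helper names (`unpack_single`, `pack_single`, `fp_div`). It uses Python's integer `divmod` in place of a hardware division algorithm, produces n + 2 bits (guard and round), and additionally treats a nonzero remainder as a sticky bit so that exact ties can be told apart from values just above them.

```python
import struct

def unpack_single(x: float) -> tuple[int, int, int]:
    """Return (sign, unbiased exponent e, 24-bit significand 1.F as an integer)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    s, E, F = bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF
    if not 0 < E < 255:
        raise ValueError("sketch handles normalized numbers only")
    return s, E - 127, (1 << 23) | F          # restore the hidden bit

def pack_single(s: int, e: int, m: int) -> float:
    """Build a float from sign, unbiased exponent and 24-bit significand m in [2**23, 2**24)."""
    if not -126 <= e <= 127:
        raise ValueError("result exponent out of range (no denormals or infinity here)")
    bits = (s << 31) | ((e + 127) << 23) | (m & 0x7FFFFF)
    return struct.unpack(">f", struct.pack(">I", bits))[0]

def fp_div(x: float, y: float) -> float:
    s1, e1, m1 = unpack_single(x)
    s2, e2, m2 = unpack_single(y)
    s3 = s1 ^ s2                     # sign handled separately from the magnitude
    e3 = e1 - e2                     # subtract exponents (biased form: E1 - E2 + 127)
    if m1 < m2:                      # quotient would be below 1: normalize with one left shift
        m1 <<= 1
        e3 -= 1
    # Unsigned division yielding 24 significand bits plus guard and round bits.
    q, r = divmod(m1 << 25, m2)      # 2**25 <= q < 2**26
    low, q = q & 0b11, q >> 2        # low = guard and round bits
    sticky = r != 0                  # nonzero remainder: more nonzero bits below the round bit
    if low > 0b10 or (low == 0b10 and (sticky or q & 1)):
        q += 1
        if q == 1 << 24:             # rounding overflowed to 10.000...0
            q >>= 1
            e3 += 1
    return pack_single(s3, e3, q)

print(fp_div(-3.0, 2.0), fp_div(1.0, 4.0))  # -1.5 0.25
```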