IEEE Floating Point: Handling Denormalized Numbers
1/8/2007 - L24

Lecture overview
o Setting up a denormalized number for an operation with a normalized number

The floating point standard
o Single precision
o Value of the bits stored in the representation is:
   - If e = 255 and f /= 0, then v is NaN regardless of s
   - If e = 255 and f = 0, then \( v = (-1)^{s} \, \infty \)
   - If 0 < e < 255, then \( v = (-1)^{s} \, 2^{e-127} \, (1.f) \): normalized number
   - If e = 0 and f /= 0, then \( v = (-1)^{s} \, 2^{-126} \, (0.f) \)
      o Denormalized numbers: allow for graceful underflow
   - If e = 0 and f = 0, then \( v = (-1)^{s} \, 0 \) (zero)
1/8/2007 - L24 IEEE Floating Point Basics. Copyright 2006 - Joanne DeGroat, ECE, OSU
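The case rules above can be sketched as a small decoder. This is only an illustration under stated assumptions: the names `decode_single` and `hardware_value` are hypothetical and do not come from the lecture, and the cross-check relies on Python's standard `struct` module to reinterpret the same 32 bits as a machine single.

```python
import math
import struct

def decode_single(bits: int) -> float:
    """Decode a 32-bit pattern with the slide's case rules (hypothetical helper)."""
    s = (bits >> 31) & 0x1          # sign bit
    e = (bits >> 23) & 0xFF         # 8-bit biased exponent
    f = bits & 0x7FFFFF             # 23-bit fraction field
    sign = -1.0 if s else 1.0
    frac = f / 2**23                # numeric value of 0.f

    if e == 255:
        # f /= 0 gives NaN regardless of s; f = 0 gives signed infinity
        return math.nan if f != 0 else sign * math.inf
    if e > 0:
        return sign * 2.0**(e - 127) * (1.0 + frac)   # normalized: 1.f
    return sign * 2.0**-126 * frac                    # denormalized, or zero when f = 0

def hardware_value(bits: int) -> float:
    """Cross-check: let the machine interpret the same 32 bits as a single."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]
```

For example, `decode_single(0x00000001)` takes the denormalized branch and should agree with `hardware_value(0x00000001)`, the smallest positive single.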

Consider the example
o A = 100 = $42C8 0000
   - 0100 0010 1100 1000 0000 0000 0000 0000
   - S = 0
   - E = 1000 0101 = 133, and 133 - 127 = 6
   - F = 1001 0000 ...
   - ManA = 1.100100000
o B = 25 = $41C8 0000
   - 0100 0001 1100 1000 0000 0000 0000 0000
   - S = 0
   - E = 1000 0011 = 131, and 131 - 127 = 4
   - F = 1001 0000 ...
   - ManB = 1.100100000
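The field values in this example can be checked mechanically. The sketch below assumes Python's standard `struct` module; the helper name `fields` is hypothetical and is used only for illustration.

```python
import struct

def fields(x: float):
    """Return (bits, s, e, f) for x stored as an IEEE single (hypothetical helper)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    s = (bits >> 31) & 0x1      # sign
    e = (bits >> 23) & 0xFF     # biased exponent
    f = bits & 0x7FFFFF         # 23-bit fraction
    return bits, s, e, f

# Print the hex word, sign, exponent (biased and unbiased), and fraction
# for the two operands used in the example.
for name, x in (("A", 100.0), ("B", 25.0)):
    bits, s, e, f = fields(x)
    print(f"{name} = {x}: ${bits:08X}  S={s}  E={e:08b} ({e} - 127 = {e - 127})  F={f:023b}")
```

Both operands share the same fraction field because 100 and 25 differ only by a factor of 4, that is, by 2 in the exponent.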

Example continued
o For A + B, need to align the binary point by 2 places
   - ManA = 1.10010000000
   - ShfManB = 0.01100100000 (aligned to the exponent of 6)
   - Sum is 1.1111010000 with a binary exponent of 6
o Result: 0 100 0010 1 111 1010 0000 ... = $42FA 0000
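The alignment and add step can be sketched with integer mantissas. This is a simplified illustration, not the lecture's hardware design: it assumes both inputs are positive and normalized, truncates instead of rounding, and does not handle exponent overflow. The names `split` and `add_positive_normalized` are hypothetical.

```python
def split(bits: int):
    """Split a single into (biased exponent, 24-bit mantissa with the hidden 1)."""
    e = (bits >> 23) & 0xFF
    man = (1 << 23) | (bits & 0x7FFFFF)   # 1.f as an integer scaled by 2**23
    return e, man

def add_positive_normalized(a_bits: int, b_bits: int) -> int:
    """Add two positive normalized singles (assumption: truncation, no rounding)."""
    ea, ma = split(a_bits)
    eb, mb = split(b_bits)
    if ea < eb:                           # keep the larger exponent in (ea, ma)
        ea, ma, eb, mb = eb, mb, ea, ma
    mb >>= (ea - eb)                      # align: shift the smaller operand right
    total = ma + mb
    if total >> 24:                       # sum reached 2.0 or more: renormalize
        total >>= 1
        ea += 1
    return (ea << 23) | (total & 0x7FFFFF)   # drop the hidden 1 and repack

result = add_positive_normalized(0x42C80000, 0x41C80000)   # A + B from the example
print(f"${result:08X}")
```

With the slide's operands, the smaller mantissa is shifted right by 2 and no renormalization is needed, since the sum 1.111101 is still below 2.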

Some basics
o Consider 11001, or 25
o This could be represented as \( 1.1001 \times 2^{4} \) OR \( 0.011001 \times 2^{6} \) to have the same value
o Consider 0.00110000 as the fractional part of a denormalized number
o This has value \( 0.00110000 \times 2^{-126} \)
o Or, with a shift by 1 position, \( 0.0110000 \times 2^{-127} \) (note this is a shift left on a fixed binary point)

Why do this
o With the shift it has the same value, but the format becomes:
   - \( v = (-1)^{s} \, 2^{-127} \, (x.xxxx) = (-1)^{s} \, 2^{e-127} \, (x.xxxx) \)
   - where x.xxxx is the shifted fractional part
   - and the e here is 0
o This is the same format as a normalized number when e = 0, and the operation now does not care that one of the inputs was denormalized.
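One way to picture this setup step: every operand is handed to the arithmetic as a mantissa paired with the exponent e - 127, and only the mantissa is prepared differently for the denormalized case. The sketch below uses exact rational arithmetic from Python's `fractions` module; it ignores the sign bit and the e = 255 cases, and the names `operand` and `value` are hypothetical.

```python
from fractions import Fraction

def operand(bits: int):
    """Return (exponent, mantissa) with exponent always equal to e - 127.

    The mantissa is an integer scaled by 2**23 (hypothetical layout for illustration).
    """
    e = (bits >> 23) & 0xFF
    f = bits & 0x7FFFFF
    if e == 0:
        man = f << 1            # 0.f shifted left once: 2**-126 * 0.f == 2**-127 * x.xxxx
    else:
        man = (1 << 23) | f     # 1.f with the hidden bit made explicit
    return e - 127, man

def value(bits: int) -> Fraction:
    """Exact value implied by operand(); sign bit ignored (positive inputs assumed)."""
    exp, man = operand(bits)
    return Fraction(man, 2**23) * Fraction(2) ** exp

# $0018 0000 is a denormalized number whose fraction field starts 0011...,
# i.e. the 0.00110000 pattern used on the slide.
print(operand(0x00180000))
```

Because the exponent is e - 127 in both branches, later steps such as alignment can treat the two kinds of input identically.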

General rule
o Fixed binary point and shift digits
   - 000.1000 to 001.0000: shift left, subtract 1 from exponent
   - 001.0000 to 000.1000: shift right, add 1 to exponent
o Fixed binary digits and move binary point
   - 000.1000 to 0001.000: move right, subtract 1 from exponent
   - 0001.000 to 000.1000: move left, add 1 to exponent
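Both forms of the rule can be checked with exact arithmetic. In this sketch a mantissa is modeled as an integer `digits` with `frac_bits` digits to the right of the binary point; that modeling and the function names are assumptions made for illustration only.

```python
from fractions import Fraction

def value(digits: int, frac_bits: int, exponent: int) -> Fraction:
    """Value of the fixed-point mantissa digits / 2**frac_bits times 2**exponent."""
    return Fraction(digits, 2**frac_bits) * Fraction(2) ** exponent

def shift_left(digits: int, exponent: int):
    """Fixed binary point, digits move left: subtract 1 from the exponent."""
    return digits << 1, exponent - 1

def shift_right(digits: int, exponent: int):
    """Fixed binary point, digits move right: add 1 to the exponent.

    Assumes the bit shifted out is 0; otherwise precision is lost.
    """
    return digits >> 1, exponent + 1

# 000.1000 with 4 fraction bits is the integer 0b0001000.
start = (0b0001000, 0)
left = shift_left(*start)                     # 001.0000, exponent -1
assert value(start[0], 4, start[1]) == value(left[0], 4, left[1])

# Fixed digits, binary point moves right: 000.1000 becomes 0001.000,
# so there is one fewer fraction bit and the exponent drops by 1.
assert value(0b0001000, 4, 0) == value(0b0001000, 3, -1)
```

In each case the represented value is unchanged: a factor of 2 gained in the mantissa is paid back by the exponent, and the reverse.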