Fractions and the Real Numbers Decimal Numbers 1

Many interesting quantities are not normally integer-valued:

- the mass of a rocket payload
- the batting average of a baseball player
- the average score on an assignment

Therefore, we have fractions, and many fractions can be represented using a positional notation in which the powers of the base may be negative as well as non-negative.

Of course, some fractions cannot be represented this way in a finite manner.

So, rational numbers are a handy idea, but we'll ignore that for now…

CS@VT August 2009, Computer Organization I, © 2006-09 McQuain, Feng & Ribbens
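As an illustration of positional notation with negative powers of the base, the following minimal sketch evaluates a digit string in a given base. The helper name `positional_value` is hypothetical, and the sketch assumes bases from 2 to 10 and non-negative values.

```python
from fractions import Fraction

def positional_value(digits, base):
    """Evaluate a digit string such as "11.1101" in the given base (2..10)."""
    whole, _, frac = digits.partition(".")
    value = Fraction(0)
    # Digits left of the point carry powers base^0, base^1, ...
    for power, digit in enumerate(reversed(whole)):
        value += int(digit) * Fraction(base) ** power
    # Digits right of the point carry powers base^-1, base^-2, ...
    for power, digit in enumerate(frac, start=1):
        value += int(digit) * Fraction(base) ** -power
    return value

print(positional_value("3.8125", 10))   # 61/16
print(positional_value("11.1101", 2))   # 61/16
```

Exact rational arithmetic is used here so that the two digit strings can be compared without any rounding.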

Fractions: Fixed-Point

How can we represent fractions?

- Use a “binary point” to separate non-negative from negative powers of two, just like a “decimal point.”
- 2's complement addition and subtraction still work (if the binary points are aligned).

But, with a reasonable fixed number of bits, fixed-point notation cannot represent extremely large or extremely small values if we must specify an alignment for the binary point…
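A minimal sketch of the fixed-point idea, assuming four bits to the right of the binary point. The names `FRACTION_BITS`, `to_fixed`, and `from_fixed` are hypothetical, and Python integers stand in for a fixed-width 2's complement register.

```python
FRACTION_BITS = 4  # assumed position of the binary point

def to_fixed(value):
    """Scale a real value to an integer holding value * 2^FRACTION_BITS."""
    return round(value * (1 << FRACTION_BITS))

def from_fixed(raw):
    """Undo the scaling to recover the represented value."""
    return raw / (1 << FRACTION_BITS)

a = to_fixed(3.8125)      # 61, that is 11.1101 with the point removed
b = to_fixed(-1.25)       # -20
total = a + b             # ordinary integer addition, points are aligned
print(from_fixed(total))  # 2.5625

# The limitation: values smaller than the fixed resolution are lost.
print(to_fixed(0.01))     # 0
```

Because both operands use the same alignment, integer addition gives the right answer, but 0.01 falls below the resolution of 1/16 and collapses to zero.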

Floating-point Notation

Some numbers are so large or so small that it's inconvenient to write them in the usual manner. Therefore, we have scientific or floating-point notation.

But, of course, such a representation is not unique, since the point can be moved as long as the exponent is adjusted to match. So, we frequently adopt a normalized representation requiring the decimal point to be in a specific location, say immediately after the first digit.

Anatomy of a Floating-point Value

A floating-point value can be viewed as a combination of three distinct components:

  sign          the sign of the number
  significand   the normalized value (to be shifted by the size of the exponent)
  exponent      the amount by which the significand is shifted to obtain the true value

(The base of representation is implicit.)
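The three components can be pulled out of a value in base 2 with a short sketch. The helper name `decompose` is hypothetical; it relies on the standard `math.frexp`, which returns a significand in [0.5, 1), so the result is rescaled to put the binary point immediately after the first digit.

```python
import math

def decompose(x):
    """Return (sign, significand, exponent) with 1 <= significand < 2."""
    if x == 0 or math.isinf(x) or math.isnan(x):
        raise ValueError("only finite nonzero values are handled")
    sign = 1 if math.copysign(1.0, x) < 0 else 0
    m, e = math.frexp(abs(x))   # abs(x) == m * 2**e with 0.5 <= m < 1
    return sign, m * 2, e - 1   # rescale so the leading digit is 1

print(decompose(3.8125))    # (0, 1.90625, 1)
print(decompose(-0.1875))   # (1, 1.5, -3)
```

Here 1.90625 is the base-10 rendering of 1.11101 in base 2, and 1.5 is 1.1 in base 2.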

Converting base-10 to base-2

At the hardware level, we'll represent values in base-2.

In general, a base-10 value between 0 and 1 can be converted to base-2 by successively multiplying by 2 and recording whether there was a carry across the decimal point, stopping when you obtain zero (or enough bits to satisfy your needs):

  fractional value   carry-over?
  .625
  .25                1
  .5                 0
  .0                 1

So, 0.625 converts to 0.101 in base-2.
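The multiply-by-2 procedure can be sketched as follows. The function name `fraction_to_binary` and the `max_bits` cutoff are hypothetical, and the input is given as a string so that exact rational arithmetic can be used for the base-10 value.

```python
from fractions import Fraction

def fraction_to_binary(value, max_bits=12):
    """Convert a base-10 value in [0, 1), given as a string, to base-2 digits."""
    frac = Fraction(value)
    if not 0 <= frac < 1:
        raise ValueError("value must be between 0 and 1")
    bits = []
    # Stop at zero, or when enough bits have been produced.
    while frac != 0 and len(bits) < max_bits:
        frac *= 2
        carry = int(frac >= 1)    # did the doubling carry across the point?
        bits.append(str(carry))
        frac -= carry             # keep only the fractional part
    return "0." + "".join(bits) if bits else "0.0"

print(fraction_to_binary("0.625"))    # 0.101
print(fraction_to_binary("0.8125"))   # 0.1101
```

Each pass through the loop corresponds to one row of the table: the carry is the recorded bit, and the remaining fractional part feeds the next row.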

Converting base-10 to base-2

Here are a couple more examples:

3.8125

  fractional value   carry-over?
  .8125
  .625               1
  .25                1
  .5                 0
  .0                 1

So, 3.8125 converts to 11.1101 in base-2.

3.1416

  fractional value   carry-over?
  .1416
  .2832              0
  .5664              0
  .1328              1
  .2656              0
  .5312              0
  .0624              1
  .1248              0
  .2496              0
  .4992              0
  .9984              0
  .9968              1
  .9936              1

So, 3.1416 is about 11.001001000011 in base-2.

Taking It to Hardware

We have to decide how to handle the three components of the floating-point representation.

A single bit suffices to represent the sign of the number.

We have to allocate bits for the significand and for the exponent.

- more bits for the significand provide more accuracy (significant digits)
- more bits for the exponent provide a larger range of representation

What about normalization?

- except for zero, every number will have a leftmost (most significant) bit of 1
- why store it?

IEEE 754 Floating Point Standard(s)

The IEEE 754-1985 floating point standard defines fundamental formats:

- single precision: 8 bit exponent, 23 bit significand, 1 sign bit
- double precision: 11 bit exponent, 52 bit significand, 1 sign bit

For both:

- the exponent is stored as a non-negative integer, with a bias (127 for single, 1023 for double)
- the significand is normalized so that the binary point is to the right of the first nonzero bit (0 being an exception)
- the first bit of the significand is not stored (phantom bit)
- the first bit of the significand equals 1 unless the biased exponent equals 0
- in single precision, +0 is represented by 32 zeros; there is also a -0!

There are lots of other details, including denormalized numbers; we will ignore them.
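To see these fields in practice, a minimal sketch can split the 32 bits of a single-precision value using Python's standard `struct` module. The helper name `single_fields` is hypothetical.

```python
import struct

def single_fields(x):
    """Return (sign, biased_exponent, fraction) of x stored as an IEEE single."""
    # Pack as a big-endian single, then reread the same 4 bytes as an integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31
    biased_exponent = (bits >> 23) & 0xFF   # 8 exponent bits
    fraction = bits & 0x7FFFFF              # 23 stored significand bits
    return sign, biased_exponent, fraction

print(single_fields(1.0))    # (0, 127, 0)
print(single_fields(0.0))    # (0, 0, 0)
print(single_fields(-0.0))   # (1, 0, 0)
```

The value 1.0 has a true exponent of 0, so the stored exponent is just the bias, 127, and the fraction is empty because the leading 1 is not stored.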

General Format

The general format is:

  sign (1 bit) | biased exponent (e bits) | fraction (f bits)

  fraction   significand excluding the high-order bit
  f          # of bits in the fraction
  e          # of bits in the biased exponent

IEEE 754 Floating Point Examples

Consider the base-10 value 3.8125. We saw earlier that this converts to 11.1101 in base-2.

This normalizes to 1.11101 x 2^1.

  Sign bit is 0 since the number is nonnegative:                 0
  Stored exponent is 1 + 127 = 128:                              10000000
  Normalized fraction is 11101, padded with 0s to 23 bits:       11101000000000000000000
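The three fields of this example can be assembled by hand and checked against the machine's own encoding. This is a minimal sketch; the variable names are hypothetical, and `struct` is used only to reinterpret the 32-bit pattern as a single-precision value.

```python
import struct

sign = 0
stored_exponent = 1 + 127            # true exponent plus the bias
fraction = 0b11101 << (23 - 5)       # pad 11101 with zeros to 23 bits

bits = (sign << 31) | (stored_exponent << 23) | fraction
print(f"{bits:032b}")                # 01000000011101000000000000000000

# Reinterpret the 32-bit pattern as a single-precision value.
(value,) = struct.unpack(">f", struct.pack(">I", bits))
print(value)                         # 3.8125
```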

Floating Point Addition

Addition:

- shift so that the exponents are equal
- add the mantissas
- normalize the result

In IEEE single format:

  32.25 (base 10)  = 100000.01 (base 2) = 1.0000001 x 2^5
  0.1875 (base 10) = 0.0011 (base 2)    = 1.1 x 2^-3

    1.0000001   x 2^5
  + 0.000000011 x 2^5
  -------------------
    1.000000111 x 2^5 = 100000.0111 (base 2) = 32.4375 (base 10)

The result is already normalized to the IEEE format.
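The three addition steps can be sketched on integer significands. This is a simplified illustration and not how any particular hardware does it: it assumes positive, normalized operands, a 23-bit fraction, and plain truncation instead of IEEE rounding. The names `fp_add` and `to_float` are hypothetical.

```python
FRACTION_BITS = 23  # fraction width of an IEEE single

def fp_add(sig_a, exp_a, sig_b, exp_b):
    """Add two positive values, each equal to sig / 2**FRACTION_BITS * 2**exp."""
    # Step 1: shift the operand with the smaller exponent to the right.
    if exp_a < exp_b:
        sig_a, exp_a, sig_b, exp_b = sig_b, exp_b, sig_a, exp_a
    sig_b >>= (exp_a - exp_b)          # bits shifted out are simply dropped
    # Step 2: add the significands.
    total = sig_a + sig_b
    # Step 3: normalize so that 1 <= significand < 2.
    exp = exp_a
    while total >= (2 << FRACTION_BITS):
        total >>= 1
        exp += 1
    return total, exp

def to_float(sig, exp):
    """Convert a (significand, exponent) pair back to a Python float."""
    return sig / (1 << FRACTION_BITS) * 2.0 ** exp

a = (0b10000001 << (FRACTION_BITS - 7), 5)    # 1.0000001 x 2^5 = 32.25
b = (0b11 << (FRACTION_BITS - 1), -3)         # 1.1 x 2^-3      = 0.1875
print(to_float(*fp_add(*a, *b)))              # 32.4375
```

Shifting the smaller operand right by 8 places is the same alignment shown on the slide, where 1.1 x 2^-3 becomes 0.000000011 x 2^5.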

Floating Point Multiplication

Multiplication:

- multiply the mantissas
- add the exponents
- normalize the result

In IEEE single format:

  32.25 (base 10)  = 100000.01 (base 2) = 1.0000001 x 2^5
  0.1875 (base 10) = 0.0011 (base 2)    = 1.1 x 2^-3

  1.0000001 x 1.1 = 1.10000011

The exponent would be 5 + (-3) = 2, so the product equals:

  1.10000011 x 2^2 = 110.000011 (base 2) = 6.046875 (base 10)

Again, the result is already normalized to the IEEE format.
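The multiplication steps can be sketched the same way on integer significands. As a simplification, this assumes positive, normalized operands with a 23-bit fraction and truncates extra bits rather than applying IEEE rounding. The names `fp_multiply` and `to_float` are hypothetical.

```python
FRACTION_BITS = 23  # fraction width of an IEEE single

def fp_multiply(sig_a, exp_a, sig_b, exp_b):
    """Multiply two positive values, each equal to sig / 2**FRACTION_BITS * 2**exp."""
    # Step 1: multiply the significands; the raw product carries twice as
    # many fraction bits, so the extra low-order bits are dropped.
    product = (sig_a * sig_b) >> FRACTION_BITS
    # Step 2: add the exponents.
    exp = exp_a + exp_b
    # Step 3: normalize; the product of two values in [1, 2) is below 4,
    # so at most one right shift is needed.
    if product >= (2 << FRACTION_BITS):
        product >>= 1
        exp += 1
    return product, exp

def to_float(sig, exp):
    """Convert a (significand, exponent) pair back to a Python float."""
    return sig / (1 << FRACTION_BITS) * 2.0 ** exp

a = (0b10000001 << (FRACTION_BITS - 7), 5)    # 1.0000001 x 2^5 = 32.25
b = (0b11 << (FRACTION_BITS - 1), -3)         # 1.1 x 2^-3      = 0.1875
print(to_float(*fp_multiply(*a, *b)))         # 6.046875
```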

Floating Point Complexities

Operations are somewhat more complicated than integer operations.

In addition to overflow we can have “underflow”.

Representable values are "relatively" equally spaced.

Accuracy can be a big problem:

- IEEE 754 keeps two extra bits, guard and round
- four rounding modes
- positive divided by zero yields “infinity”
- zero divided by zero yields “not a number” (NaN)
- other complexities

Not following the standard can be even worse: see text for description of 80x86 and Pentium bug!
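Overflow, underflow, and NaN can be observed with a few lines of Python, whose floats are IEEE doubles. One caveat: Python raises `ZeroDivisionError` for floating-point division by zero instead of returning infinity, so this sketch reaches the same special values through other operations.

```python
import math

big = 1e308 * 10            # overflow: too large for a double, becomes inf
tiny = 5e-324 / 2           # underflow: too small for a double, becomes 0.0
not_a_number = big - big    # inf - inf has no meaningful value, becomes nan

print(big, tiny, not_a_number)        # inf 0.0 nan
print(not_a_number == not_a_number)   # False: NaN compares unequal to itself
```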

Unrepresentable Numbers

Of course we know that many fractions cannot be represented in a finite number of digits. But things may be worse than we would naturally expect:

0.1

  fractional value   carry-over?
  .1
  .2                 0
  .4                 0
  .8                 0
  .6                 1
  .2                 1

The fractional value .2 has come up again, so the bits 0011 repeat forever.

So, 0.1 in base-2 is: 0.000110011...
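The consequence is easy to observe in Python, whose floats are IEEE doubles. This short sketch uses only the standard `decimal` module and built-in float methods to show the value actually stored for 0.1.

```python
from decimal import Decimal

# The exact value of the double nearest to 0.1.
print(Decimal(0.1))
# 0.1000000000000000055511151231257827021181583404541015625

# The same stored value as a ratio of integers; the denominator is 2^55.
print((0.1).as_integer_ratio())   # (3602879701896397, 36028797018963968)

# Small storage errors show up in ordinary arithmetic.
print(0.1 + 0.2 == 0.3)           # False
```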

Unrepresentable Results

Suppose that X = 1.0 and Y = 1.0 x 2^-24. Both are representable as normalized IEEE singles. But, in base-2:

  X + Y = 1.000000000000000000000000 + 0.000000000000000000000001 = 1.000000000000000000000001

That sum needs 24 bits after the binary point, and an IEEE single stores only 23. So that value cannot be stored correctly as an IEEE single; in fact, the result will either be truncated to 1.0 or rounded up when it is stored. Either way, an incorrect value will have been stored.

Using IEEE doubles merely changes the scale of the problem…
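This loss can be demonstrated by forcing a value through single-precision storage. In this sketch the helper `to_single` is hypothetical; it uses the standard `struct` module to round a Python float (a double) to a single and read it back.

```python
import struct

def to_single(x):
    """Round a Python float (a double) to the nearest IEEE single."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

x = 1.0
y = 2.0 ** -24
print(to_single(y) == y)          # True: Y alone is representable

exact = x + y                     # the sum is exact in double precision
print(exact == 1.0)               # False
print(to_single(exact) == 1.0)    # True: Y is lost when stored as a single
```

On the usual round-to-nearest setting the stored result is 1.0, which is the first of the two outcomes described above.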

Some Floating Point Issues

Machine epsilon is defined to be ε = 2^-t, where t is the number of bits in the mantissa. This is the smallest distinguishable relative difference between two numbers that have different floating-point representations.

Storage errors are inevitable since finitely many bits are used in the representation. The most we can expect is that:

  fl(x) = x(1 + δ), where |δ| <= ε

Round-off errors are also inevitable, and may be magnified in interesting ways.

Take Numerical Analysis or Numerical Methods to learn more…
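Machine epsilon can be found experimentally by halving a candidate until adding half of it to 1.0 no longer changes the result. This is a minimal sketch for Python floats, which are IEEE doubles, taking t to be the 52 stored fraction bits.

```python
import sys

eps = 1.0
# Keep halving while 1.0 + eps/2 is still distinguishable from 1.0.
while 1.0 + eps / 2 > 1.0:
    eps /= 2

print(eps == 2.0 ** -52)               # True: t = 52 for a double
print(eps == sys.float_info.epsilon)   # True: matches the library constant
```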