
Floating Point Numbers
Material on Data Representation can be found in Chapter 2 of Computer Architecture (Nicholas Carter)
CSIT 301 (Blum)

Fractions
• Similar to what we're used to with decimal numbers:
  – $3.14159 = 3\cdot 10^{0} + 1\cdot 10^{-1} + 4\cdot 10^{-2} + 1\cdot 10^{-3} + 5\cdot 10^{-4} + 9\cdot 10^{-5}$
  – $11.001001_2 = 1\cdot 2^{1} + 1\cdot 2^{0} + 0\cdot 2^{-1} + 0\cdot 2^{-2} + 1\cdot 2^{-3} + 0\cdot 2^{-4} + 0\cdot 2^{-5} + 1\cdot 2^{-6}$
• ($11.001001_2 = 3.140625$)
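
The positional rule above can be checked with a small Python sketch. The helper name `binary_to_decimal` is hypothetical (not from the slides); it simply adds up each bit times its power of 2, with negative powers to the right of the point.

```python
def binary_to_decimal(bits: str) -> float:
    """Hypothetical helper: evaluate a binary numeral such as '11.001001' by positional weights."""
    whole, _, frac = bits.partition(".")
    value = 0.0
    # Digits left of the point carry weights ..., 2^2, 2^1, 2^0 (read from the right).
    for power, digit in enumerate(reversed(whole)):
        value += int(digit) * 2 ** power
    # Digits right of the point carry weights 2^-1, 2^-2, 2^-3, ...
    for power, digit in enumerate(frac, start=1):
        value += int(digit) * 2 ** -power
    return value

print(binary_to_decimal("11.001001"))  # 3.140625
```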

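Converting decimal to binary II
• 98.61
  – Integer part: divide by 2 repeatedly and record the remainders
    • 98 / 2 = 49 remainder 0
    • 49 / 2 = 24 remainder 1
    • 24 / 2 = 12 remainder 0
    • 12 / 2 = 6 remainder 0
    • 6 / 2 = 3 remainder 0
    • 3 / 2 = 1 remainder 1
    • 1 / 2 = 0 remainder 1
  – Reading the remainders from the last one back to the first gives 1100010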
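
The repeated-division procedure can be sketched in Python as follows. The function name `int_to_binary` is a hypothetical helper; Python's built-in `format(n, "b")` gives the same answer, but the loop mirrors the slide step by step.

```python
def int_to_binary(n: int) -> str:
    """Hypothetical helper: convert a non-negative integer to binary by repeated division by 2."""
    if n == 0:
        return "0"
    remainders = []
    while n > 0:
        n, r = divmod(n, 2)      # quotient feeds the next step; remainder is one bit
        remainders.append(str(r))
    # The first remainder is the least significant bit, so read the list backwards.
    return "".join(reversed(remainders))

print(int_to_binary(98))   # 1100010
print(int_to_binary(123))  # 1111011
```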

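Converting decimal to binary III
• 98.61
  – Fractional part: multiply by 2 repeatedly; the whole-number part of each product is the next bit, and only the fractional part is carried forward
    • 0.61 × 2 = 1.22
    • 0.22 × 2 = 0.44
    • 0.44 × 2 = 0.88
    • 0.88 × 2 = 1.76
    • 0.76 × 2 = 1.52
    • 0.52 × 2 = 1.04
  – Reading the whole-number parts from the first product to the last gives .100111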
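
A minimal sketch of the repeated-doubling method, assuming exact rational arithmetic (Python's `fractions.Fraction`) so that the conversion itself is not disturbed by binary rounding. The name `fraction_to_binary` and the `digits` parameter are hypothetical.

```python
import math
from fractions import Fraction

def fraction_to_binary(frac: Fraction, digits: int) -> str:
    """Hypothetical helper: convert 0 <= frac < 1 to `digits` binary places by repeated doubling."""
    bits = []
    for _ in range(digits):
        frac *= 2
        bit = math.floor(frac)   # whole-number part (0 or 1) is the next bit
        bits.append(str(bit))
        frac -= bit              # carry only the fractional part forward
    return "".join(bits)

print(fraction_to_binary(Fraction("0.61"), 6))   # 100111
print(fraction_to_binary(Fraction("0.456"), 7))  # 0111010
```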

Another Example (Whole number part)
• 123.456
  – Integer part
    • 123 / 2 = 61 remainder 1
    • 61 / 2 = 30 remainder 1
    • 30 / 2 = 15 remainder 0
    • 15 / 2 = 7 remainder 1
    • 7 / 2 = 3 remainder 1
    • 3 / 2 = 1 remainder 1
    • 1 / 2 = 0 remainder 1
  – 1111011

Checking: Find Calculator on the menu

Put the calculator in Programmer view

Enter the number in decimal mode and read off the binary, or switch to binary mode if you want to use copy/paste


Another Example (fractional part)
• 123.456
  – Fractional part
    • 0.456 × 2 = 0.912
    • 0.912 × 2 = 1.824
    • 0.824 × 2 = 1.648
    • 0.648 × 2 = 1.296
    • 0.296 × 2 = 0.592
    • 0.592 × 2 = 1.184
    • 0.184 × 2 = 0.368
    • …
  – .0111010…

With the fractional bits (0111010) entered in binary mode, convert to decimal mode, then

Ctrl-C to copy the displayed number. Switch to Scientific view. Ctrl-V to paste.

Divide by 2 raised to the number of digits (in this case 7, including the leading zero)


Finally, hit the equals sign. In most cases the result will not be exact: here $58 / 2^{7} = 0.453125$ rather than 0.456.
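
The calculator check amounts to reading the fractional bits as a whole number and dividing by 2 raised to the number of bits. A minimal Python sketch (the helper name `bits_to_fraction` is hypothetical):

```python
def bits_to_fraction(bits: str) -> float:
    """Hypothetical helper: read fractional bits as an integer, then divide by 2**(number of bits)."""
    return int(bits, 2) / 2 ** len(bits)

approx = bits_to_fraction("0111010")   # 58 / 2**7
print(approx)          # 0.453125
print(0.456 - approx)  # the error left by keeping only 7 bits
```

The error is always smaller than $2^{-7}$, the weight of the last bit kept.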


Other way around
• Multiply the fraction by 2 raised to the desired number of digits in the fractional part. For example:
  – $0.456 \times 2^{7} = 58.368$
• Throw away the fractional part and represent the whole number in binary:
  – 58 → 111010
• But note that we specified 7 digits and the result above uses only 6. Therefore we need to put in the leading 0:
  – 0111010
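
The "other way around" shortcut fits in a few lines of Python. This is a sketch, not the course's code: the helper name `fraction_bits` is hypothetical, and the padding to a fixed width is what supplies the leading 0.

```python
import math

def fraction_bits(frac: float, digits: int) -> str:
    """Hypothetical helper: multiply by 2**digits, drop the fraction, pad to `digits` bits."""
    whole = math.floor(frac * 2 ** digits)   # e.g. 0.456 * 2**7 = 58.368 -> 58
    return format(whole, f"0{digits}b")      # leading zeros keep the width at `digits`

print(fraction_bits(0.456, 7))  # 0111010
```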

Fixed point
• If one has a set number of bits reserved for representing the whole-number part and another set number of bits reserved for representing the fractional part of a number, then one is said to be using fixed-point representation.
  – The point dividing the whole number from the fraction has an unchanging (fixed) place in the number.

Limits of the fixed-point approach
• Suppose you use 4 bits for the whole-number part and 4 bits for the fractional part (ignoring sign for now).
• The largest number would be 1111.1111 = 15.9375
• The smallest non-zero number would be 0000.0001 = 0.0625
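
Because the point is fixed, an 8-bit fixed-point pattern is just an integer divided by $2^{4}$. A minimal sketch, assuming the unsigned 4.4 layout described above (the constant and function names are hypothetical):

```python
FRAC_BITS = 4    # assumed layout: 4 whole-number bits + 4 fractional bits, unsigned
TOTAL_BITS = 8

def fixed_to_value(code: int) -> float:
    """Hypothetical helper: interpret an 8-bit unsigned code as a 4.4 fixed-point number."""
    return code / 2 ** FRAC_BITS   # the point sits 4 places from the right

largest = fixed_to_value(2 ** TOTAL_BITS - 1)   # 1111.1111
smallest = fixed_to_value(1)                    # 0000.0001
print(largest, smallest)  # 15.9375 0.0625
```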

Floating point representation
• Floating point representation allows one to represent a wider range of numbers using the same number of bits.
• It is like scientific notation.

Scientific notation
• Used to represent very large and very small numbers.
  – Ex. Avogadro's number
    • $6.0221367 \times 10^{23}$ particles
    • 602213670000000000000000
  – Ex. Fundamental charge e
    • $1.60217733 \times 10^{-19}$ C
    • 0.000000000000000000160217733 C

Scientific notation: all of these are the same number
• $12345.6789 = 12345.6789 \times 10^{0}$
• $1234.56789 \times 10 = 1234.56789 \times 10^{1}$
• $123.456789 \times 100 = 123.456789 \times 10^{2}$
• $12.3456789 \times 10^{3}$
• $1.23456789 \times 10^{4}$
Rule: Shift the point to the left and increment the power of ten.

Small numbers
• 0.000001234
• $0.00001234 \times 10^{-1}$
• $0.0001234 \times 10^{-2}$
• $0.001234 \times 10^{-3}$
• $0.01234 \times 10^{-4}$
• $0.1234 \times 10^{-5}$
• $1.234 \times 10^{-6}$
Rule: Shift the point to the right and decrement the power.

IEEE 754 standards
• The standard for floating-point numbers is known as IEEE 754.
• Starting with the fixed-point binary representation, shift the point and adjust the power (of 2, now that we're in binary) to compensate.
• As in scientific notation, shift so that the number has exactly one non-zero digit before the point. In binary the only non-zero digit is 1, so the number takes the form 1.xxx, and the remaining digits are fractional bits.

Floats (98.61)
• Shift the point so the number is between 1 and 2, and keep track of the number of shifts
  – $1100010.10011100001010001_2$
  – $1.10001010011100001010001_2 \times 2^{6}$
• Express the number of shifts in binary
  – $1.10001010011100001010001_2 \times 2^{00000110_2}$
• We're not done yet, so this exponent will change.

Mantissa and Exponent and Sign
• $+1.10001010011100001010001_2 \times 2^{00000110_2}$
  – Mantissa (significand): 1.10001010011100001010001
  – Exponent: 00000110
  – Sign: +. The number may be negative, so one bit (the sign bit) is reserved to indicate whether the number is positive or negative.

Small numbers
• $0.000010101110_2$
• $1.0101110_2 \times 2^{-5}$
• The power (a.k.a. the exponent) could be negative, so we have to be able to deal with that.
• Floating-point numbers use a procedure known as biasing to handle the negative-exponent problem.
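
Normalizing means shifting until the number lies in $[1, 2)$ and counting the shifts; shifts to the left give a positive exponent, shifts to the right a negative one. A minimal sketch using exact fractions (the helper name `normalize` is hypothetical and handles positive numbers only):

```python
from fractions import Fraction

def normalize(x: Fraction) -> tuple[Fraction, int]:
    """Hypothetical helper: return (significand, exponent) with 1 <= significand < 2
    and x == significand * 2**exponent. Positive x only."""
    if x <= 0:
        raise ValueError("expects a positive number")
    exponent = 0
    while x >= 2:      # shift the point left, increment the power
        x /= 2
        exponent += 1
    while x < 1:       # shift the point right, decrement the power
        x *= 2
        exponent -= 1
    return x, exponent

print(normalize(Fraction("98.61"))[1])                        # 6
print(normalize(Fraction(int("000010101110", 2), 2 ** 12)))   # (Fraction(87, 64), -5)
```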

Biasing
• Actually, the exponent is not represented as shown previously.
• In the 98.61 example, 8 bits were used to represent the exponent, which means 256 different values could be represented.
• Since the exponent could be negative (to represent numbers less than 1), we choose roughly half of the range to be positive and half to be negative.

Biasing (Cont.)
• In biasing, one does not use 2's complement or a sign bit.
• Instead, one adds a bias (equal to the magnitude of the most negative exponent) to the exponent and stores the result of that addition.

Biasing (Cont.)
• The exponent of all 1's is reserved for special purposes, as is the exponent of all 0's.
• Thus with 8 bits, the bias is 127 ($= 2^{8-1} - 1$, that is, 2 raised to one less than the number of exponent bits, minus one).
• In our 98.61 example, we had to shift the point 6 places to the left, corresponding to an exponent of +6.
• We add that shift to the bias: 127 + 6 = 133.
• That is the number we put in the exponent portion: 133 → 10000101.
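
Biasing is a single addition followed by writing the result in 8 bits. A minimal sketch (the helper name `biased_exponent` is hypothetical); it rejects true exponents whose biased value would land on the reserved all-0s or all-1s patterns.

```python
EXPONENT_BITS = 8
BIAS = 2 ** (EXPONENT_BITS - 1) - 1   # 127 for single precision

def biased_exponent(shift: int) -> str:
    """Hypothetical helper: add the bias to the true exponent and show the 8-bit field."""
    stored = shift + BIAS
    if not 1 <= stored <= 2 ** EXPONENT_BITS - 2:
        raise ValueError("exponent out of normal range (all 0s / all 1s are reserved)")
    return format(stored, f"0{EXPONENT_BITS}b")

print(biased_exponent(6))    # 10000101
print(biased_exponent(-8))   # 01110111
```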

Big floats – a quick comparison
• Assume we use 8 bits, 4 for the mantissa and 4 for the exponent (neglecting sign). What is the largest float?
• Mantissa: 1111, Exponent: 1111
• $0.9375 \times 2^{7} = 120$
• (In this simplified format the mantissa is read as 0.mmmm and the 4-bit exponent is biased by 8, so 1111 stands for 7 and 0000 for −8.)
• (Compare this to the largest fixed-point number using the same amount of space: 15.9375)

Small floats – a quick comparison
• Assume we use 8 bits, 4 for the mantissa and 4 for the exponent (neglecting sign). What is the smallest float?
• Mantissa: 1000, Exponent: 0000
• $0.5 \times 2^{-8} = 0.001953125$
• (Compare this to the smallest fixed-point number using the same amount of space: 0.0625)
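
The two comparison slides can be reproduced with a tiny model of the 8-bit toy format. This is a sketch under stated assumptions inferred from the slide arithmetic: the mantissa is read as 0.mmmm, the exponent is biased by 8, and no exponents are reserved. It is not IEEE 754, and the name `toy_float` is hypothetical.

```python
def toy_float(mantissa: str, exponent: str) -> float:
    """Hypothetical helper: value of the toy 8-bit float 0.mmmm * 2**(eeee - 8)."""
    m = int(mantissa, 2) / 2 ** len(mantissa)   # 0.mmmm as a fraction
    e = int(exponent, 2) - 8                    # assumed bias of 8
    return m * 2 ** e

print(toy_float("1111", "1111"))  # 120.0
print(toy_float("1000", "0000"))  # 0.001953125
```

Against the 4.4 fixed-point range (0.0625 to 15.9375), the same 8 bits now reach from about 0.002 up to 120, at the cost of fewer significant bits.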

Mantissa Storage
• $1.10001010011100001010001_2 \times 2^{00000110_2}$
• (Significand) Mantissa
• Our rules have us starting with 1.something (there are a few exceptions).
• The standards come from a time when storage was “expensive”, so why store a digit that is always 1? The standard does not store the 1; it is implied.

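The pieces
• One bit for a sign
• Eight bits for an exponent, biased by 127
• Twenty-three bits for the mantissa, which does not include the implied 1
• +98.61
  – Sign: 0
  – Exponent: 1000 0101
  – Mantissa: 1000 1010 0111 0000 1010 001
  – (These 23 bits are simply truncated. The bits that follow are 11..., so IEEE 754's default round-to-nearest mode actually stores 1000 1010 0111 0000 1010 010.)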
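
One way to see the three pieces the hardware actually stores is to let Python's `struct` module pack a value as an IEEE 754 single (format `"f"`) and split the 32 bits. The helper name `float32_bits` is hypothetical. Note that the last mantissa bits show the effect of rounding to nearest rather than truncating.

```python
import struct

def float32_bits(x: float) -> str:
    """Hypothetical helper: the 32 bits of x stored as an IEEE 754 single (big-endian)."""
    (as_int,) = struct.unpack(">I", struct.pack(">f", x))
    return format(as_int, "032b")

bits = float32_bits(98.61)
sign, exponent, mantissa = bits[0], bits[1:9], bits[9:]
print(sign, exponent, mantissa)
# 0 10000101 10001010011100001010010
```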

https://www.h-schmidt.net/FloatConverter/IEEE754.html

Adding Floats
• Consider adding the following numbers expressed in scientific notation:
  – $3.456789 \times 10^{3}$
  – $1.212121 \times 10^{-2}$
• The first step is to re-express the number with the smaller magnitude so that it has the same exponent as the other number.

Adding Floats (Cont.)
• $1.212121 \times 10^{-2}$
• $0.1212121 \times 10^{-1}$
• $0.01212121 \times 10^{0}$
• $0.001212121 \times 10^{1}$
• $0.0001212121 \times 10^{2}$
• $0.00001212121 \times 10^{3}$
• The number was shifted 5 times (3 − (−2)).

Adding Floats (Cont.)
• When the exponents are equal, the mantissas can be added:
  – $3.456789 \times 10^{3} + 0.00001212121 \times 10^{3} = 3.45680112121 \times 10^{3}$
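
The align-then-add procedure can be sketched with Python's `decimal` module, which keeps the decimal digits exact. The function name `add_scientific` is hypothetical; `Decimal.scaleb` moves the point by the given number of places.

```python
from decimal import Decimal

def add_scientific(m1: Decimal, e1: int, m2: Decimal, e2: int) -> tuple[Decimal, int]:
    """Hypothetical helper: add m1*10**e1 + m2*10**e2 by aligning to the larger exponent."""
    if e1 < e2:                      # make (m1, e1) the operand with the larger exponent
        m1, e1, m2, e2 = m2, e2, m1, e1
    shifted = m2.scaleb(e2 - e1)     # move the point left (e1 - e2) places
    return m1 + shifted, e1

result = add_scientific(Decimal("3.456789"), 3, Decimal("1.212121"), -2)
print(result)  # (Decimal('3.45680112121'), 3)
```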

Rounding
• In a computer, there is a finite number of bits used to represent a number.
• When the smaller floating-point number is shifted to make the exponents equal, some of its less significant bits are lost.
• This loss of information (precision) is known as rounding.
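
A quick way to watch bits fall off: add a tiny number to 1.0, then round the sum to single precision. This sketch approximates a single-precision addition by adding in Python's double precision and rounding the result with `struct`; for these operands the double-precision sum is exact, so the single rounding step is what a float adder would produce. The helper name `to_float32` is hypothetical.

```python
import struct

def to_float32(x: float) -> float:
    """Hypothetical helper: round a double to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

big, small = 1.0, 2.0 ** -30
print(big + small == big)              # False: a double keeps the 2**-30 bit
print(to_float32(big + small) == big)  # True: 23 fraction bits cannot hold it
```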

One more fine point about floating-point representation
• As discussed so far, the mantissa (significand) always starts with a 1.
• When storage was expensive, designers opted not to represent this bit, since it is always 1.
• It had to be inserted for various operations on the number (adding, multiplying, etc.), but it did not have to be stored.

Still another fine point
• When we assume that the mantissa must start with a 1, we lose 0.
• Zero is too important a number to lose, so we interpret a mantissa of all zeros together with an exponent of all zeros as zero
  – Even though ordinarily we would assume the mantissa started with a 1 that we didn't store.

Yet another fine point
• In the IEEE 754 format for floats, you bias by one less than 128 (that is, by 127) and reserve the exponents 0000 0000 and 1111 1111 for special purposes.
• One of these special purposes is “Not a Number” (NaN).
• Another is “Infinity”, which is the floating-point version of overflow.
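
The reserved exponent patterns can be inspected directly by packing values as singles and testing the fields. A minimal sketch (the helper name `classify_float32` is hypothetical). It also labels the all-0s exponent with a non-zero mantissa as "subnormal": IEEE 754 uses that pattern for denormalized numbers, one of the exceptions to the implied leading 1.

```python
import struct

def classify_float32(x: float) -> str:
    """Hypothetical helper: classify x (stored as an IEEE 754 single) by its fields."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    exponent = (bits >> 23) & 0xFF   # 8 exponent bits
    mantissa = bits & 0x7FFFFF       # 23 stored mantissa bits
    if exponent == 0xFF:             # all 1s
        return "NaN" if mantissa else "infinity"
    if exponent == 0:                # all 0s
        return "zero" if mantissa == 0 else "subnormal"
    return "normal"

for value in (0.0, 98.61, float("inf"), float("nan"), 1e-40):
    print(value, classify_float32(value))
```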

An example
• Represent -9087.8735 as a float using 23 bits for the mantissa, 8 for the exponent, and one for the sign.
• The float stores 23 bits, but there is an implied bit, so we will talk about 24.
• Convert the whole-number magnitude 9087 to binary: 10 0011 0111 1111
• That uses up 14 of the 24 bits for the mantissa (23 stored), leaving 10 for the fractional part.


An example (Cont.)
• Multiply the fractional part by $2^{10}$ and convert the whole-number part of the result to binary; make sure it uses 10 bits (add leading 0's if it doesn't).
• $0.8735 \times 2^{10} = 894.464$
• 894 → 1101111110

An example (Cont.)
• $10001101111111.1101111110_2$
• $1.00011011111111101111110_2 \times 2^{13}$
• Mantissa: (1) 0001 1011 1111 1110 1111 110
• Exponent: 13 + 127 = 140 → 10001100
• Sign bit: 1 (because the number was negative)
• The actual order is sign-exponent-mantissa.
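
The three fields from this example can be concatenated in sign-exponent-mantissa order and decoded with `struct` as a cross-check, much like the converter page. A minimal sketch; the variable names are illustrative only.

```python
import struct

sign = "1"                             # negative number
exponent = format(13 + 127, "08b")     # 10001100
mantissa = "00011011111111101111110"   # 23 bits after the implied 1
bits = sign + exponent + mantissa      # sign-exponent-mantissa order

value = struct.unpack(">f", int(bits, 2).to_bytes(4, "big"))[0]
print(hex(int(bits, 2)), value)  # 0xc60dff7e -9087.873046875
```

The stored value is $-(9087 + 894/1024)$; the remaining 0.464/1024 of the original fraction is lost to rounding.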

Check

Example 2
• 0.0076534
• No whole-number part. Begin by using all 24 mantissa bits (23 stored plus the implied 1) for the fractional part.
• $0.0076534 \times 2^{24} = 128402.7449344$
• 128402 → 11111010110010010
• This uses only 17 of the 24 places, which means that so far the number starts with 7 zeros. But float mantissas are supposed to start with 1.
• .000000011111010110010010
• $1.1111010110010010_2 \times 2^{-8}$
• (But we need more digits for our mantissa.)

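Example 2 (Cont.)
• The 7 leading zeros do not count toward the mantissa, so use 24 + 7 = 31 bits for the fraction:
• $0.0076534 \times 2^{31} = 16435551.3516032$
• 16435551 → 1111 1010 1100 1001 0101 1111
• Above is the mantissa, including the implied leading 1 (the 23 stored bits are everything after it)
• Exponent: 127 − 8 = 119 → 0111 0111
• Sign bit: 0 (positive number)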
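
The "keep multiplying by 2 until the whole-number part fills 24 bits" idea can be sketched as a loop. This is a sketch under stated assumptions: `scale_to_24_bits` is a hypothetical helper, it expects a positive number less than 1, and it truncates rather than rounds (which happens to agree with round-to-nearest here, since the dropped fraction .35 is below one half).

```python
import math
import struct

def scale_to_24_bits(x: float) -> tuple[int, int]:
    """Hypothetical helper: find k so floor(x * 2**k) fills exactly 24 bits (0 < x < 1)."""
    k = 0
    while math.floor(x * 2 ** k).bit_length() < 24:
        k += 1
    return k, math.floor(x * 2 ** k)

k, significand = scale_to_24_bits(0.0076534)
exponent = 23 - k                            # significand * 2**-k == 1.xxx * 2**(23 - k)
stored_bits = format(significand, "b")[1:]   # drop the implied leading 1
print(k, significand, exponent + 127)        # 31 16435551 119

# Cross-check against Python's own single-precision packing.
(packed,) = struct.unpack(">I", struct.pack(">f", 0.0076534))
print(format(packed, "032b")[9:] == stored_bits)  # True
```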

Check

Reverse
• 10000111101010001100100111110101
• Sign bit is 1, so the number is negative
• Exponent: 00001111 → 15; 15 − 127 (unbias) = −112
• Mantissa (with the implied 1 restored): 1.01010001100100111110101

Reverse (Cont.)
• $1.01010001100100111110101_2 \times 2^{23} / 2^{23}$
• $101010001100100111110101_2 / 2^{23}$
• $11061749 / 2^{23}$

Reverse (Cont.)
• $-1.31866323947906494140625 \times 2^{-112}$
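
The reverse procedure (split the fields, unbias the exponent, restore the implied 1) can be sketched with exact fractions and cross-checked against `struct`. The helper name `decode_float32` is hypothetical, and it handles normal numbers only (not zero, subnormals, infinity, or NaN).

```python
import struct
from fractions import Fraction

def decode_float32(bits: str) -> Fraction:
    """Hypothetical helper: decode a 32-character bit string (normal numbers only) exactly."""
    sign = -1 if bits[0] == "1" else 1
    exponent = int(bits[1:9], 2) - 127        # unbias
    significand = int("1" + bits[9:], 2)      # restore the implied 1 (24 bits)
    return sign * Fraction(significand, 2 ** 23) * Fraction(2) ** exponent

bits = "10000111101010001100100111110101"
value = decode_float32(bits)
print(value == -Fraction(11061749, 2 ** 23) * Fraction(2) ** -112)  # True

(same,) = struct.unpack(">f", int(bits, 2).to_bytes(4, "big"))
print(float(value) == same)  # True
```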

Check

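Range
• Big float: approximately $2 \times 2^{127} \approx 3.403 \times 10^{38}$ (exactly $(2 - 2^{-23}) \times 2^{127}$)
• Small float (smallest normalized): $1 \times 2^{-126} \approx 1.175 \times 10^{-38}$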
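
The two extremes correspond to specific bit patterns: the largest finite single has exponent 1111 1110 with an all-1s mantissa, and the smallest normalized single has exponent 0000 0001 with an all-0s mantissa. A minimal check with `struct` (the helper name `from_bits` is hypothetical):

```python
import struct

def from_bits(pattern: int) -> float:
    """Hypothetical helper: interpret a 32-bit integer as an IEEE 754 single."""
    return struct.unpack(">f", pattern.to_bytes(4, "big"))[0]

largest = from_bits(0x7F7FFFFF)          # exponent 1111 1110, mantissa all 1s
smallest_normal = from_bits(0x00800000)  # exponent 0000 0001, mantissa all 0s
print(largest == (2 - 2 ** -23) * 2.0 ** 127)   # True
print(smallest_normal == 2.0 ** -126)           # True
print(f"{largest:.4g} {smallest_normal:.4g}")   # 3.403e+38 1.175e-38
```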

References
• Computer Organization and Design, Patterson & Hennessy, pp. 206–215
• Computer Architecture, Nicholas Carter
• Computer Systems: Organization and Architecture, John Carpinelli