CHAPTER II Fixed Point Format and Floating Point

unsigned numbers encoded in pure binary Posiive integer 7 6 5 4 3

different binary representations Relative integers +3 +2 +1 0 0 -1 -2 -3 -4

2's complement method Step 1: compute the 1's complement of the binary number representing

Application Representation of -18 : take the absolute value and represent it in binary

Notes Note 1: - the max positive signed number of n- bits is (2

Some data types are: 1. short: of size 16 bits represented as 2’s

$Fractional number representation Consider a fractional binary number having the following form: a$

Conversion of a decimal number into a binary number Consider the decimal number 5.

Fixed point representation: CHAPTERII: fixed point and floating point formats DR. Yasser MOHANNA 12

CHAPTERII: fixed point and floating point formats DR. Yasser MOHANNA 13

To correctly multiply – 0. 25 by 0. 25 using Q 8 (1.

CHAPTERII: fixed point and floating point formats DR. Yasser MOHANNA 15

Floating point representation CHAPTERII: fixed point and floating point formats DR. Yasser MOHANNA 16

Floating point representation EXPONENT 2’s complement 0 1 -E 2: -22 E 1: 21

Comparison: fixed point & floating point The scanning fixed-point N bit corresponds to a

Slides: 18

Download presentation

CHAPTER II Fixed Point Format and Floating Point Format Binary representation of relative integers: Several binary representations integers are commonly used, especially in converters analog / digital. These include: • 2’s complement. • 1’s complement. • Absolute sign. • Binary shifted. CHAPTERII: fixed point and floating point formats DR. Yasser MOHANNA 1

unsigned numbers encoded in pure binary Posiive integer 7 6 5 4 3 2 1 0 Pure binary 111 110 101 100 011 010 001 000 CHAPTERII: fixed point and floating point formats DR. Yasser MOHANNA 2

different binary representations Relative integers +3 +2 +1 0 0 -1 -2 -3 -4 Binary shift 111 110 101 100 011 010 001 000 Sign plus absolute value 011 010 001 000 101 110 111 Complement 1’s 011 010 001 000 111 110 101 100 Complement 2’s 011 010 001 000 111 110 101 100 CHAPTERII: fixed point and floating point formats DR. Yasser MOHANNA 3

2's complement method Step 1: compute the 1's complement of the binary number representing the absolute value. 1's complement means toggling each bit of the number: the "0" becomes "1" and vice versa. Step 2: add "1" to the result obtained. CHAPTERII: fixed point and floating point formats DR. Yasser MOHANNA 4

Application Representation of -18 : take the absolute value and represent it in binary form: 18 = 00010010 represent the "1's" complement: 11101101 add 1 to the result: 1110 = -18. Check: 18 + (-18) = 0 00010010 + 1110 0000 CHAPTERII: fixed point and floating point formats DR. Yasser MOHANNA 5

Notes Note 1: - the max positive signed number of n- bits is (2 n-1 -1). - the minimum negative signed number is (-2 n-1). Note 2: How to compute a negative binary number? Two methods are considered : By computing its 2's complement, so we are finding its absolute value. by computing it the manner considered in the following example: 1010 is a negative number; It equals to: (-23)*1 + 22*0 + 21*1 + 20*0 = -8 + 2 = -6. Note what we used for the MSB. CHAPTERII: fixed point and floating point formats DR. Yasser MOHANNA 6

Some data types are: 1. short: of size 16 bits represented as 2’s complement with a range from -215 to (215 - 1) 2. int or signed int: of size 32 bits represented as 2’s complement with a range from -231 to (231 - 1) 3. float: of size 32 bits represented as IEEE 32 -bit with a range from 2 -126 =1. 175494 X 10 -38 to 2+128 = 3. 40282346 X 1038 4. double: of size 64 bits represented as IEEE 64 -bit with a range from 2 -1022 =2. 22507385 X 10 -308 to 2+1024 = 1. 79769313 X 10+308

26 - 23 = 64 - 8 = 56 = 00111000

$Fractional number representation Consider a fractional binary number having the following form: a$

Fractional number representation Consider a fractional binary number having the following form: a 1 a 2 a 3 a 4. b 1 b 2 b 3 b 4 where: ai ss, bi are "ones" or "zeros". The fractional part is: 0. b 1 b 2 b 3 b 4 It can be represented in the decimal form by: N = b 1*2 -1 + b 2*2 -2 + b 3*2 -3 + b 4*2 -4 Example : (11. 101)2 = (? )10 11 = 3) 10 0. 101 = 1*2 -1 + 0*2 -2 + 1*2 -3 = 1/2 + 1/8 = 0. 625. So: (11. 101)2 = (3. 625) 10 CHAPTERII: fixed point and floating point formats DR. Yasser MOHANNA 10

Conversion of a decimal number into a binary number Consider the decimal number 5. 375. To convert this number into a binary one we take decimal part "5" and the fractional part "0. 375" at else. 5) 10 = 101) 2 0. 375) 10 = (? )2 The method of converting a decimal fractional number into a binary one is the successive multiplication by 2: e. g. 0. 375 0. 5 0 * * 2 2 _______ ____ 0. 750 1. 5 1. 0 When a multiplication process gives a number >= 1 then we subtract 1 from it and continue multiplication until reaching a zero result. The final result is the collection of bits written in BOLD and underlined of each multiplication result. => 5. 375) 10 = 101. 011) 2. CHAPTERII: fixed point and floating point formats DR. Yasser MOHANNA 11

Fixed point representation: CHAPTERII: fixed point and floating point formats DR. Yasser MOHANNA 12

CHAPTERII: fixed point and floating point formats DR. Yasser MOHANNA 13

To correctly multiply – 0. 25 by 0. 25 using Q 8 (1. 7) notation, the steps are: 1. Sign-extend the two values to be multiplied: – 0. 25: 1. 110 0000 b → 1 1. 110 0000 b 0. 25: 0. 010 0000 b → 0 0. 010 0000 b 2. Multiply the two values together: 1 1. 110 0000 b × 00. 010 0000 b 11 1100 0000 b 0000 0000 b 0 0000 b 0 0001 1110 0000 b 3. Left-shift the result by 1 position: 0011 1100 0000 b 4. Place the implied decimal point prior to the 13 th least significant bit: 001. 1 1100 0000 b = – 0. 0625 decimal CHAPTERII: fixed point and floating point formats DR. Yasser MOHANNA 14

CHAPTERII: fixed point and floating point formats DR. Yasser MOHANNA 15

Floating point representation CHAPTERII: fixed point and floating point formats DR. Yasser MOHANNA 16

Floating point representation EXPONENT 2’s complement 0 1 -E 2: -22 E 1: 21 MANTISSA E 0: 20 -b 0 -20 . b-1 2 -1 Q 1. 4 b-2 2 -2 1 . 0 1 1 Q 1. 4 MANTISSA Q 1. 4 b-3 2 -3 1 MANTISSA Q 1. 4 b-4 2 -4 0 • Example of floating point representation of 8 bits with m = 5 and e = 3 The exponent is expressed in q 1. 4 format. The mantissa is expressed in 2’s complement and is normalized because for the same x we may get two solutions: E 1, M 1 and E 2, M 2 : ½ ≤ | M | <1 Mantissa M 01110 01100 10010 01000 01111 Representation binary exponent E 010 100 011 Value decimal M. 2 E 1. 75 0. 046875 -7 0. 03125 7. 5 CHAPTERII: fixed point and floating point formats DR. Yasser MOHANNA 17

Comparison: fixed point & floating point The scanning fixed-point N bit corresponds to a linear scale and a quantification uniform. The digitization floating point N bit corresponds to a scale non-linear geometric progression reason 2 and a quantification of non-uniform type almost logarithmic. CHAPTERII: fixed point and floating point formats DR. Yasser MOHANNA 18