Floating Point Representation
Major: All Engineering Majors
Authors: Autar Kaw, Matthew Emmons
http://numericalmethods.eng.usf.edu
Numerical Methods for STEM undergraduates
10/3/2020

Floating Decimal Point: Scientific Form

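Example
The form is $\sigma \times m \times 10^{e}$ or $(-1)^{s} \times m \times 10^{e}$, where $\sigma$ is the sign of the number ($+1$ or $-1$), $s$ is $0$ for a positive and $1$ for a negative number, $m$ is the mantissa with $1 \le m < 10$, and $e$ is the integer exponent.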
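A minimal Python sketch of the scientific form, assuming the normalization $1 \le m < 10$ above. `scientific_form` is a hypothetical helper, the inputs are illustrative values, and because Python floats are binary, the returned mantissa can differ from the exact decimal in its last digits.

```python
import math

def scientific_form(x: float):
    """Split x into (sigma, m, e) with x ~= sigma * m * 10**e and 1 <= m < 10."""
    if x == 0:
        raise ValueError("0 has no normalized scientific form")
    sigma = 1 if x > 0 else -1
    e = math.floor(math.log10(abs(x)))   # first guess for the exponent
    m = abs(x) / 10.0 ** e
    # Guard against log10 rounding pushing m just outside [1, 10).
    if m >= 10:
        m, e = m / 10, e + 1
    elif m < 1:
        m, e = m * 10, e - 1
    return sigma, m, e

print(scientific_form(-256.78))   # sigma = -1, m close to 2.5678, e = 2
print(scientific_form(0.003678))  # sigma = 1, m close to 3.678, e = -3
```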

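Floating Point Format for Binary Numbers
A binary number is written as $y = \sigma \times (1.m)_2 \times 2^{e}$, where $\sigma$ is the sign of the number, $m$ is the mantissa with $(1)_2 \le (1.m)_2 < (10)_2$, and $e$ is the integer exponent. The leading 1 is not stored, as it is always given to be 1.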
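The normalization to $(1.m)_2 \times 2^{e}$ can be sketched in Python with `math.frexp`, which returns a fraction in $[0.5, 1)$; doubling it gives the $1.m$ form. The helper name `binary_normal_form`, the 0/1 sign convention, and the choice to chop (rather than round) extra mantissa bits are assumptions for illustration.

```python
import math

def binary_normal_form(x: float, n_bits: int):
    """Return (sign, mantissa_bits, e) with |x| ~= (1.mantissa_bits)_2 * 2**e.

    sign is 0 for positive and 1 for negative numbers. Only n_bits bits after
    the binary point are kept (extra bits are chopped), and the leading 1 is
    not returned because it is always 1.
    """
    if x == 0:
        raise ValueError("0 has no normalized binary form")
    sign = 0 if x > 0 else 1
    f, e = math.frexp(abs(x))   # abs(x) = f * 2**e with 0.5 <= f < 1
    f, e = 2 * f, e - 1         # rescale so that 1 <= f < 2
    frac = f - 1                # the part after the implied leading 1
    bits = []
    for _ in range(n_bits):
        frac *= 2
        bit = int(frac)         # next binary digit, 0 or 1
        bits.append(str(bit))
        frac -= bit
    return sign, "".join(bits), e

# 54.75 = (110110.11)_2 = (1.1011011)_2 * 2**5; four mantissa bits keep 1011
print(binary_normal_form(54.75, 4))   # (0, '1011', 5)
```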

Example: 9-bit hypothetical word
• the first bit is used for the sign of the number,
• the second bit for the sign of the exponent,
• the next four bits for the mantissa, and
• the next three bits for the exponent.
We have the representation as

Sign of the number   Sign of the exponent   Mantissa   Exponent
0                    0                      1011       101

which stands for $+(1.1011)_2 \times 2^{+(101)_2} = 1.6875 \times 2^{5} = 54$.
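A small decoder makes the layout above concrete. This is a sketch: `decode_9bit` is a hypothetical name, and it assumes a sign bit of 0 means positive and 1 means negative, for both the number and the exponent.

```python
def decode_9bit(word: str) -> float:
    """Decode the hypothetical 9-bit word described above.

    Layout: sign of number | sign of exponent | 4 mantissa bits | 3 exponent bits.
    """
    if len(word) != 9 or set(word) - {"0", "1"}:
        raise ValueError("expected a string of nine 0/1 characters")
    sign = -1 if word[0] == "1" else 1
    exp_sign = -1 if word[1] == "1" else 1
    mantissa = int(word[2:6], 2)          # bits after the implied leading 1
    exponent = exp_sign * int(word[6:9], 2)
    return sign * (1 + mantissa / 2**4) * 2.0 ** exponent

print(decode_9bit("001011101"))  # 54.0, i.e. (1.1011)_2 * 2**5
```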

Machine Epsilon
Machine epsilon is a measure of accuracy. It is found as the difference between 1 and the next larger number that can be represented.
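A common way to find machine epsilon numerically follows this definition: keep halving a candidate until adding half of it to 1 no longer changes 1. The sketch assumes Python's `float` is an IEEE-754 double (true on virtually all platforms), so the result should match `sys.float_info.epsilon`.

```python
import sys

def machine_epsilon() -> float:
    """Smallest power of two eps such that 1 + eps is still greater than 1."""
    eps = 1.0
    # Stop once 1 + eps/2 rounds back to exactly 1.
    while 1.0 + eps / 2 > 1.0:
        eps /= 2
    return eps

eps = machine_epsilon()
print(eps, eps == sys.float_info.epsilon)  # 2.220446049250313e-16 True
# On Python 3.9+, math.nextafter(1.0, 2.0) - 1.0 gives the same gap.
```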

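Example: Ten-bit word
• sign of the number,
• sign of the exponent,
• next four bits for the exponent, and
• next four bits for the mantissa.
The number 1 is represented as 0 0 0 0 0 0 0 0 0 0, that is $(1.0000)_2 \times 2^{0} = 1$. The next number is 0 0 0 0 0 0 0 0 0 1, that is $(1.0001)_2 \times 2^{0} = 1.0625$. Hence $\epsilon_{mach} = 1.0625 - 1 = 2^{-4} = 0.0625$.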
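The same result can be checked in code. The sketch below decodes a ten-bit word under an assumed field order of sign, exponent sign, four exponent bits, then four mantissa bits (the order used in the relative-error example); `decode_10bit` is a hypothetical helper.

```python
def decode_10bit(word: str) -> float:
    """Decode the ten-bit word described above.

    Layout: sign of number | sign of exponent | 4 exponent bits | 4 mantissa bits.
    """
    if len(word) != 10 or set(word) - {"0", "1"}:
        raise ValueError("expected a string of ten 0/1 characters")
    sign = -1 if word[0] == "1" else 1
    exp_sign = -1 if word[1] == "1" else 1
    exponent = exp_sign * int(word[2:6], 2)
    mantissa = int(word[6:10], 2)            # bits after the implied leading 1
    return sign * (1 + mantissa / 2**4) * 2.0 ** exponent

one = decode_10bit("0000000000")             # (1.0000)_2 * 2**0
next_number = decode_10bit("0000000001")     # (1.0001)_2 * 2**0
print(next_number - one)                     # 0.0625, i.e. 2**-4
```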

Relative Error and Machine Epsilon
The absolute relative true error in representing a number will be less than the machine epsilon.
Example: 10-bit word (sign, sign of exponent, 4 bits for the exponent, 4 bits for the mantissa)

Sign of the number   Sign of the exponent   Exponent   Mantissa
0                    1                      0110       1100

which stands for $+(1.1100)_2 \times 2^{-(0110)_2} = 1.75 \times 2^{-6} = 0.02734375$.
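To check the bound numerically, this sketch chops random positive numbers to four mantissa bits, as in the 10-bit word, and measures the worst absolute relative true error. It assumes chopping (truncation) as the storage rule and ignores the word's limited exponent range; `chop` is a hypothetical helper, and the sample range and seed are arbitrary.

```python
import math
import random

def chop(x: float, n_bits: int) -> float:
    """Chop positive x to (1.b1...bn)_2 * 2**e, keeping n_bits mantissa bits.

    The exponent range of a real word is ignored in this sketch.
    """
    f, e = math.frexp(x)          # x = f * 2**e, 0.5 <= f < 1
    f, e = 2 * f, e - 1           # 1 <= f < 2
    kept = math.floor(f * 2**n_bits) / 2**n_bits   # drop the extra bits
    return math.ldexp(kept, e)

n_bits = 4
eps_mach = 2.0 ** -n_bits        # machine epsilon for four mantissa bits
random.seed(0)
samples = [random.uniform(1e-3, 1e3) for _ in range(10_000)]
worst = max(abs(x - chop(x, n_bits)) / x for x in samples)
print(worst < eps_mach)          # True
```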

IEEE 754 Standards for Single Precision Representation

IEEE-754 Floating Point Standard
• Standardizes representation of floating point numbers on different computers in single and double precision.
• Standardizes representation of floating point operations on different computers.
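Because the representation is standardized, the same value has the same bit pattern on any IEEE-754 machine. As a small illustration, Python's standard `struct` module can pack a number as a big-endian single (`'>f'`) or double (`'>d'`); the inputs 1.0 and -2.5 are arbitrary.

```python
import struct

for value in (1.0, -2.5):
    single = struct.pack(">f", value)   # 4 bytes: IEEE-754 single precision
    double = struct.pack(">d", value)   # 8 bytes: IEEE-754 double precision
    print(value, single.hex(), double.hex())
# 1.0 3f800000 3ff0000000000000
# -2.5 c0200000 c004000000000000
```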

One Great Reference
What every computer scientist (and even if you are not) should know about floating point arithmetic!
http://www.validlab.com/goldberg/paper.pdf

IEEE-754 Format: Single Precision
32 bits for single precision: 1 bit for the sign (s), 8 bits for the biased exponent (e′), and 23 bits for the mantissa (m).
Value $= (-1)^{s} \times (1.m)_2 \times 2^{e' - 127}$
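A sketch of reading the three fields with Python's standard `struct` module, which packs a float as a big-endian IEEE-754 single with the `'>f'` format. The helper names are hypothetical, and `value_from_fields` applies the formula above, so it is only valid for ordinary numbers ($1 \le e' \le 254$).

```python
import struct

def single_fields(x: float):
    """Return (s, e_biased, m) of x stored as an IEEE-754 single."""
    bits = int.from_bytes(struct.pack(">f", x), "big")  # 32-bit pattern
    s = bits >> 31                  # 1 sign bit
    e_biased = (bits >> 23) & 0xFF  # 8 biased-exponent bits
    m = bits & 0x7FFFFF             # 23 mantissa bits
    return s, e_biased, m

def value_from_fields(s: int, e_biased: int, m: int) -> float:
    """(-1)**s * (1.m)_2 * 2**(e' - 127); valid for 1 <= e' <= 254."""
    return (-1) ** s * (1 + m / 2**23) * 2.0 ** (e_biased - 127)

s, e_biased, m = single_fields(6.5)       # 6.5 = (1.101)_2 * 2**2
print(s, e_biased, m)                     # 0 129 5242880
print(value_from_fields(s, e_biased, m))  # 6.5
```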

Example #1
Word split into sign (s), biased exponent (e′), and mantissa (m): 1 1 0 0 0 1 0 1 0 0 0 0 0 …
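Decoding a word by hand amounts to splitting it into s, e′, and m and applying the formula for the value. The sketch below does that for a hypothetical 32-bit word (not the word of Example #1) and cross-checks the result with `struct.unpack`; `decode_single` is an illustrative name.

```python
import struct

def decode_single(bits: str) -> float:
    """Evaluate (-1)**s * (1.m)_2 * 2**(e' - 127) for 32 binary digits."""
    bits = bits.replace(" ", "")
    if len(bits) != 32 or set(bits) - {"0", "1"}:
        raise ValueError("expected 32 binary digits")
    s = int(bits[0])
    e_biased = int(bits[1:9], 2)
    m = int(bits[9:], 2)
    if e_biased in (0, 255):
        raise ValueError("e' = 0 or 255 encodes a special number")
    return (-1) ** s * (1 + m / 2**23) * 2.0 ** (e_biased - 127)

# Hypothetical word: s = 1, e' = (10000010)_2 = 130, m = (0101 0...0)_2
word = "1" + "10000010" + "0101" + "0" * 19
print(decode_single(word))  # -10.5
# Cross-check against the machine's own interpretation of the same bits.
print(struct.unpack(">f", int(word, 2).to_bytes(4, "big"))[0])  # -10.5
```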

Example #2
Represent $-5.5834 \times 10^{10}$ as a single precision floating point number, that is, find its sign (s), biased exponent (e′), and mantissa (m).
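One way to carry out Example #2 is to follow the format directly: find the sign, normalize the magnitude to $(1.m)_2 \times 2^{e}$, add the bias 127 to $e$, and round the fraction to 23 bits. The sketch below does this in Python (the helper name is hypothetical, and rounding to nearest with ties to even is assumed), then compares the bits with what `struct` produces. It only handles ordinary numbers with $1 \le e' \le 254$.

```python
import math
import struct

def encode_single(x: float) -> str:
    """Return the 32 IEEE-754 single-precision bits of x as a 0/1 string.

    Only ordinary nonzero numbers (1 <= e' <= 254) are handled.
    """
    if x == 0 or not math.isfinite(x):
        raise ValueError("zero, infinities and NaN are special cases")
    s = 0 if x > 0 else 1
    f, e = math.frexp(abs(x))        # abs(x) = f * 2**e, 0.5 <= f < 1
    f, e = 2 * f, e - 1              # normalize to (1.m)_2 * 2**e
    m = round((f - 1) * 2**23)       # 23 mantissa bits, round half to even
    if m == 2**23:                   # rounding carried into the leading 1
        m, e = 0, e + 1
    e_biased = e + 127               # add the bias
    if not 1 <= e_biased <= 254:
        raise ValueError("outside the ordinary single-precision range")
    return f"{s}{e_biased:08b}{m:023b}"

bits = encode_single(-5.5834e10)
print(bits[0], bits[1:9], bits[9:])   # sign, biased exponent, mantissa
# Cross-check with the platform's own single-precision packing.
print(int(bits, 2).to_bytes(4, "big") == struct.pack(">f", -5.5834e10))  # True
```

For this input the sign bit is 1 and $e' = 35 + 127 = 162 = (10100010)_2$, since $2^{35} \le 5.5834 \times 10^{10} < 2^{36}$.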

Exponent for 32 Bit IEEE-754
8 bits would represent $0 \le e' \le 255$.
The bias is 127, so subtract 127 from the representation: $e = e' - 127$, giving $-127 \le e \le 128$.

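Exponent for Special Cases
The actual range of $e'$ is $0 \le e' \le 255$, but $e' = 0$ and $e' = 255$ are reserved for special numbers.
The actual range of $e'$ for ordinary numbers is therefore $1 \le e' \le 254$; subtracting the bias 127 gives $-126 \le e \le 127$.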
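The bias and the ordinary range can be observed by reading the 8 exponent bits of a few values with Python's `struct` module. This is a sketch: `biased_exponent` is a hypothetical helper, and the sample values are chosen so that $2^{127}$ and $2^{-126}$ sit at the ends of the ordinary range while infinity and zero use the reserved codes.

```python
import struct

def biased_exponent(x: float) -> int:
    """Return the 8-bit biased exponent e' of x stored as an IEEE-754 single."""
    bits = int.from_bytes(struct.pack(">f", x), "big")
    return (bits >> 23) & 0xFF

for x in (1.0, 2.0, 0.5, 2.0**127, 2.0**-126):
    e_biased = biased_exponent(x)
    print(x, e_biased, e_biased - 127)   # value, e', e = e' - 127

# Reserved codes: e' = 255 for infinity, e' = 0 for zero.
print(biased_exponent(float("inf")), biased_exponent(0.0))  # 255 0
```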

Special Exponents and Numbers

s        e′          m           Represents
0        all zeros   all zeros   0
1        all zeros   all zeros   -0
0        all ones    all zeros   ∞
1        all ones    all zeros   -∞
0 or 1   all ones    non-zero    NaN
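The table can be checked by assembling each bit pattern and letting `struct.unpack` interpret it. In this sketch `from_fields` is a hypothetical helper, and the non-zero mantissa used for NaN, $2^{22}$, is an arbitrary choice; any non-zero $m$ with $e'$ all ones gives a NaN.

```python
import math
import struct

def from_fields(s: int, e_biased: int, m: int) -> float:
    """Build a single-precision value from its sign, biased exponent and mantissa."""
    bits = (s << 31) | (e_biased << 23) | m
    return struct.unpack(">f", bits.to_bytes(4, "big"))[0]

ALL_ONES = 0xFF                           # e' = 255
zero = from_fields(0, 0, 0)
neg_zero = from_fields(1, 0, 0)
pos_inf = from_fields(0, ALL_ONES, 0)
neg_inf = from_fields(1, ALL_ONES, 0)
nan = from_fields(0, ALL_ONES, 1 << 22)

print(zero, neg_zero, math.copysign(1.0, neg_zero))  # 0.0 -0.0 -1.0
print(pos_inf, neg_inf, math.isnan(nan))              # inf -inf True
```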

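IEEE-754 Format
The largest number by magnitude: $(1.1\ldots1)_2 \times 2^{127} = (2 - 2^{-23}) \times 2^{127} \approx 3.40 \times 10^{38}$
The smallest (normalized) number by magnitude: $(1.0\ldots0)_2 \times 2^{-126} = 2^{-126} \approx 1.18 \times 10^{-38}$
Machine epsilon: $\epsilon_{mach} = 2^{-23} \approx 1.19 \times 10^{-7}$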
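The three limits can be read off the corresponding bit patterns. This sketch writes them as 32-bit hexadecimal constants (sign 0 with $e' = 254$ and an all-ones mantissa; sign 0 with $e' = 1$ and a zero mantissa; and the word right after 1.0) and converts them with `struct`; the printed values are the exact numbers held in a Python double.

```python
import struct

def from_bits(bits: int) -> float:
    """Interpret a 32-bit integer as an IEEE-754 single-precision value."""
    return struct.unpack(">f", bits.to_bytes(4, "big"))[0]

largest = from_bits(0x7F7FFFFF)       # s = 0, e' = 254, m = all ones
smallest = from_bits(0x00800000)      # s = 0, e' = 1,   m = all zeros
eps = from_bits(0x3F800001) - 1.0     # next number after 1, minus 1

print(largest)    # 3.4028234663852886e+38
print(smallest)   # 1.1754943508222875e-38
print(eps)        # 1.1920928955078125e-07
```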

THE END