IEEE Floating Point Adder
Using the IEEE Floating Point Standard for an add/subtract execution unit
1/8/2007 - L25 Floating Point Adder. Copyright 2006 - Joanne DeGroat, ECE, OSU

Lecture overview
- The interface
- Part by part
- A floating point adder design

Adder is double precision
- Double precision
- Value of bits in word representation is:
  - If e = 2047 and f /= 0, then v is NaN regardless of s
  - If e = 2047 and f = 0, then $v = (-1)^s \, \infty$
  - If 0 < e < 2047, then $v = (-1)^s \, 2^{e-1023} \, (1.f)$ (normalized number)
  - If e = 0 and f /= 0, then $v = (-1)^s \, 2^{-1022} \, (0.f)$ (denormalized numbers, which allow for graceful underflow)
  - If e = 0 and f = 0, then $v = (-1)^s \, 0$ (zero)

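The five cases above depend only on whether e is all 1s or all 0s and whether f is all 0s. A minimal sketch of that classification follows, assuming the standard double precision layout (bit 63 sign, bits 62..52 exponent, bits 51..0 fraction). The entity name fp_classify and its port names are hypothetical and not taken from the lecture.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical helper: classifies one IEEE 754 double precision word.
entity fp_classify is
  port (
    x         : in  std_logic_vector(63 downto 0);
    is_nan    : out std_logic;
    is_inf    : out std_logic;
    is_norm   : out std_logic;
    is_denorm : out std_logic;
    is_zero   : out std_logic
  );
end entity fp_classify;

architecture behavioral of fp_classify is
  signal e_all1, e_all0, f_all0 : std_logic;
begin
  -- Field tests: e = 2047, e = 0, f = 0
  e_all1 <= '1' when unsigned(x(62 downto 52)) = 2047 else '0';
  e_all0 <= '1' when unsigned(x(62 downto 52)) = 0 else '0';
  f_all0 <= '1' when unsigned(x(51 downto 0)) = 0 else '0';

  -- One output per row of the value definition
  is_nan    <= e_all1 and not f_all0;
  is_inf    <= e_all1 and f_all0;
  is_norm   <= not e_all1 and not e_all0;
  is_denorm <= e_all0 and not f_all0;
  is_zero   <= e_all0 and f_all0;
end architecture behavioral;
```

Exactly one of the five outputs is '1' for any fully defined input word, and the sign bit plays no part in the classification.
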
Specification of an FPA
- Floating Point Add/Subtract Unit Specification
  - Inputs in IEEE 754 double precision
  - Must perform both addition and subtraction
  - Must handle the full floating point standard
    - Normalized numbers
    - Not a Numbers (NaNs)
    - +/- Infinity
    - Denormalized numbers

Specifications continued
  - Result will be an IEEE 754 double precision representation
  - Unit will correctly handle the invalid operation of adding $+\infty$ and $-\infty$, which gives NaN per the standard
  - Unit latches its inputs into registers from parallel 64-bit data busses
  - There is a separate signal line that indicates the operation, add or subtract

Specifications continued
- Outputs
  - The correctly represented result
  - Flags that are output are
    - Zero result
    - Overflow to infinity from normalized numbers as inputs
    - NaN result
    - Overshift (result is the larger of the two operands)
    - Denormalized result
    - Inexact (result was rounded)
    - Invalid operation for addition

High level block diagram
- Basic architecture interface
  - Data: 64-bit A, B, and C busses
  - Control signals: Latch, Add/Sub, Asel, Drive
  - Condition flags output: 7 flag signals
  - Clocks: Phi1 and Phi2 (a 2-phase clocked architecture)

Start the VHDL
- The entity interface

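The entity code shown on the original slide is not reproduced in this text. A sketch of what such an interface could look like follows, built only from the signal list on the block diagram slide and the flag list in the specification. All identifiers (fpadder, abus, add_sub and so on) and the polarity of add_sub are assumptions.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity for the floating point add/subtract unit.
entity fpadder is
  port (
    -- Data: 64-bit A and B input busses, C output bus
    abus, bbus : in  std_logic_vector(63 downto 0);
    cbus       : out std_logic_vector(63 downto 0);
    -- Control signals
    latch      : in  std_logic;  -- latch the input operands
    add_sub    : in  std_logic;  -- assumed: '0' = add, '1' = subtract
    asel       : in  std_logic;  -- purpose not stated in the slides
    drive      : in  std_logic;  -- drive the result onto the C bus
    -- Two-phase clocks
    phi1, phi2 : in  std_logic;
    -- The 7 condition flags
    zero, overflow, nan, overshift, denorm, inexact, invalid : out std_logic
  );
end entity fpadder;
```
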
Basic design
- Can be divided into functional sub-blocks
- First latch and drive

What goes in the other blocks
- From adjusting the inputs to prepare to add
- To renormalize
- To round

VHDL coding for the latches
- A first cut
- The input latches
- Note 2 phase

And on the output
- Drivers
- Note use of guarded blocks

And what goes in between?
- In the final design lots goes in between, but
- You first want to make sure that the latches are working properly
- So just pass one input to the output and check
- Once this works properly you can move on with the design

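The latch and driver code from the slides is not reproduced here. The first-cut pass-through described above can be sketched as follows, assuming the input is latched on Phi1 and the output is driven on Phi2 (the slides do not say which phase does what). The input latch uses a guarded block. For the output driver this sketch swaps in a plain conditional high-impedance assignment rather than the guarded block the slides mention. All names are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical first cut: latch the A bus and pass it to the C bus.
entity latch_drive is
  port (
    abus  : in  std_logic_vector(63 downto 0);
    cbus  : out std_logic_vector(63 downto 0);
    latch : in  std_logic;
    drive : in  std_logic;
    phi1  : in  std_logic;
    phi2  : in  std_logic
  );
end entity latch_drive;

architecture first_cut of latch_drive is
  signal a_reg : std_logic_vector(63 downto 0);
begin
  -- Input latch as a guarded block: a_reg follows abus while the guard
  -- is true and keeps its last value when the guard is false.
  in_latch : block (phi1 = '1' and latch = '1')
  begin
    a_reg <= guarded abus;
  end block in_latch;

  -- Output driver: pass the latched value through, high impedance otherwise.
  cbus <= a_reg when (phi2 = '1' and drive = '1') else (others => 'Z');
end architecture first_cut;
```

Once a value written to abus reappears on cbus in simulation, the datapath blocks can be inserted between a_reg and the driver.
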
The first section
- Prepare to add
- Identify type of inputs and appropriately adjust operands

The exponent unit portion
- Must get the larger exponent
- And the difference between the exponents, which is the shift distance
- Also several control signals
  - Exponent all 0s and all 1s
  - Exponent A > B, A < B, A = B

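The exponent unit outputs listed above can be sketched behaviorally as follows. The names expgt, expeq, expa1 and expb1 follow the identifiers that appear in the code highlights later in the lecture. The entity name and the remaining port names are assumptions.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical exponent unit for 11-bit double precision exponents.
entity exp_unit is
  port (
    expa, expb          : in  std_logic_vector(10 downto 0);
    exp_large           : out std_logic_vector(10 downto 0);  -- larger exponent
    shift_dist          : out std_logic_vector(10 downto 0);  -- |expa - expb|
    expa0, expb0        : out std_logic;                      -- all 0s
    expa1, expb1        : out std_logic;                      -- all 1s
    expgt, expeq, explt : out std_logic                       -- A>B, A=B, A<B
  );
end entity exp_unit;

architecture behavioral of exp_unit is
  signal ua, ub : unsigned(10 downto 0);
begin
  ua <= unsigned(expa);
  ub <= unsigned(expb);

  -- Relational control signals
  expgt <= '1' when ua > ub else '0';
  expeq <= '1' when ua = ub else '0';
  explt <= '1' when ua < ub else '0';

  -- Larger exponent and the non-negative difference (the shift distance)
  exp_large  <= expa when ua >= ub else expb;
  shift_dist <= std_logic_vector(ua - ub) when ua >= ub
                else std_logic_vector(ub - ua);

  -- All-0s and all-1s detection
  expa0 <= '1' when ua = 0 else '0';
  expb0 <= '1' when ub = 0 else '0';
  expa1 <= '1' when ua = 2047 else '0';
  expb1 <= '1' when ub = 2047 else '0';
end architecture behavioral;
```
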
Mantissa Processing Logic
- Need to examine the two fractional parts and generate several control signals that are required to prepare the operands
- Need relational signals M>, M=, M<
  - Needed to know which operand to shift
- Need to know if the stored fractional part is all 0s or not
  - Needed for NaN, 0, and $\infty$ determination

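A sketch of these mantissa control signals, under the assumption that the comparison is made on the raw 52-bit stored fractions. The names mangt, mana0 and manb0 match identifiers used in the code highlights later in the lecture. The entity name and the other ports are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical mantissa control logic on the stored fractions.
entity man_ctrl is
  port (
    mana, manb          : in  std_logic_vector(51 downto 0);
    mana0, manb0        : out std_logic;  -- fraction is all 0s
    mangt, maneq, manlt : out std_logic   -- M>, M=, M<
  );
end entity man_ctrl;

architecture behavioral of man_ctrl is
begin
  -- All-0s detection, used for the NaN, 0 and infinity determination
  mana0 <= '1' when unsigned(mana) = 0 else '0';
  manb0 <= '1' when unsigned(manb) = 0 else '0';

  -- Relational signals, used to decide which operand to shift
  mangt <= '1' when unsigned(mana) > unsigned(manb) else '0';
  maneq <= '1' when unsigned(mana) = unsigned(manb) else '0';
  manlt <= '1' when unsigned(mana) < unsigned(manb) else '0';
end architecture behavioral;
```
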
After generating control signals
- Step 1 is to select between a normalized mantissa and a denormalized mantissa
- For normalized: prepend NOT(Ex0)
  - If Ex0 is a 1 then the exponent is all 0s and you have a denormalized number or 0
  - When Ex0 is a 0 you have a NaN, infinity, or a normalized number
- Other selection is the fractional part shifted left by 1 and postpended by a 0
  - For denormalized numbers
  - Taking it from $2^{-1022}$ to $2^{-1023}$ (the double precision counterpart of $2^{-126}$ to $2^{-127}$ in single precision), so it can now be treated like a normalized number

Now select between these two
- Select the denormalized
  - WHEN Ex0 * (NOT Mx0)
  - When Ex0 is a 1 you have a denormalized number or 0
  - When Mx0 is a 0 there is at least 1 bit of the fractional part that is a 1 and thus you have a denormalized number
- Select the NaN, infinity, 0, normalized number
  - Select this case when Ex0 is a 0 or Mx0 is a 1
  - When Mx0 is a 1 you have infinity, 0, or a normalized number
  - When Ex0 is a 0 you have a normalized number, infinity, or NaN

Shown in table form
- Selection table to also point out this relationship
- Note that for a 0 you have NOT(Ex0) prepended to the fractional part, giving 0.00000…000

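The selection between the two mantissa forms can be sketched as a two-input mux. This assumes a 52-bit stored fraction and a 53-bit prepared mantissa of the form x.xxx…x. The entity and signal names are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical operand preparation mux for one input.
entity man_select is
  port (
    frac : in  std_logic_vector(51 downto 0);  -- stored fractional part
    ex0  : in  std_logic;                      -- exponent is all 0s
    mx0  : in  std_logic;                      -- fraction is all 0s
    man  : out std_logic_vector(52 downto 0)   -- prepared mantissa x.xxx...x
  );
end entity man_select;

architecture behavioral of man_select is
  signal norm_man, denorm_man : std_logic_vector(52 downto 0);
begin
  -- Normalized, NaN, infinity or zero: prepend NOT(Ex0) to the fraction
  norm_man   <= (not ex0) & frac;
  -- Denormalized: fraction shifted left by 1 with a 0 appended
  denorm_man <= frac & '0';

  -- Select the denormalized form when Ex0 * (NOT Mx0)
  man <= denorm_man when (ex0 = '1' and mx0 = '0') else norm_man;
end architecture behavioral;
```

In table form, the three cases are: Ex0 = 0 selects 1.f (normalized, NaN or infinity); Ex0 = 1 with Mx0 = 1 selects 0.000…0 (zero); Ex0 = 1 with Mx0 = 0 selects the left-shifted fraction (denormalized).
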
Selections are input to a crossbar
- The crossbar switch places the larger value on the right path and the smaller onto the left path
- The smaller is the operand to shift, if any shifting to align the binary point is needed
- The equation for exchange on the crossbar is
  - E> + (E= * M>), or shift the A input to the right side if the exponent of A is the larger OR the exponents are equal and the fractional part of A is larger

The next multiplexers
- Now have the smaller on the left path and the larger on the right path.
- On the left path, if either exponent is all 1s then that operand is NaN or infinity and has been crossbarred, or is equal, to the right path operand. In this case want to simply pass it through to the output by adding 0 to it. So a 0 is one choice of the left path mux.
- On the right path, select the right path value or mux in a hardwired NaN for an illegal operation

Linear shifting
- Next step is to linear shift the left operand
- The exponent unit generates the exponent > signals by subtracting the exponents, ExpA - ExpB and ExpB - ExpA
- Then, with the help of the control signals, the exponent difference is known and this value is sent to the shifter.

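The linear shift of the left path operand can be sketched as below. The operand width W, the port names, and the way the overshift condition is detected (distance at least W, so every bit is shifted out) are assumptions. Bits shifted out are simply dropped here, so no sticky bit is kept for rounding.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical alignment shifter for the smaller (left path) operand.
entity lin_shift is
  generic (W : positive := 54);  -- assumed operand width
  port (
    din       : in  std_logic_vector(W-1 downto 0);
    dist      : in  std_logic_vector(10 downto 0);  -- exponent difference
    dout      : out std_logic_vector(W-1 downto 0);
    overshift : out std_logic
  );
end entity lin_shift;

architecture behavioral of lin_shift is
begin
  process (din, dist)
    variable n : natural;
  begin
    n := to_integer(unsigned(dist));
    if n >= W then
      -- Everything is shifted out: the result is the larger operand
      dout      <= (others => '0');
      overshift <= '1';
    else
      -- Logical right shift aligns the binary points
      dout      <= std_logic_vector(shift_right(unsigned(din), n));
      overshift <= '0';
    end if;
  end process;
end architecture behavioral;
```
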
One last multiplexer
- The right path operand, the larger, is simply input to the ADDER.
- On the left path, the output of the linear shifter is sent to the ADDER for a + operation, OR
- The one's complement of the value is sent to the ADDER for a - operation. In this case the input carry is handled appropriately.

Code for this section - behavioral
- Most of the code is generation of various signals and movement of data in muxes

Xbar code highlight
- Code
    swap <= expgt OR (expeq AND mangt);
    xbar_r <= lxbarin when (swap = '1') else rxbarin;
    xbar_l <= rxbarin when (swap = '1') else lxbarin;

Hard code NaN VHDL code
- The code
    -- Control equation for mux
    in_mux_r_man <= expa1 AND mana0 AND expb1 AND manb0 AND (signa XOR signb);
    in_mux_r <= nan_man WHEN (in_mux_r_man = '1') ELSE xbar_r;

Now add the mantissas
- Simply add the two mantissas.
- As the sign of the B input was XORed with the operation, i.e., inverted if it was a subtract operation, the carry in is the XOR of the two signs. If the signs are different then a subtract is being performed and a '1' is being input to the carry in of the adder.
- The adder does two's complement addition. Inputs are of the form x.xxxxx…xx or 54 bits. The output is of the form xx.xxxx…xxx or 58 bits

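The last multiplexer and the adder can be sketched together. This assumes the crossbar has already put the operand of larger magnitude on the right path, so a subtract never goes negative. The widths are a generic W in and W+1 out, chosen for illustration and not meant to reproduce the bit counts quoted above. All names are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical mantissa adder with one's complement mux and carry in.
entity man_adder is
  generic (W : positive := 54);
  port (
    right_op : in  std_logic_vector(W-1 downto 0);  -- larger operand
    left_op  : in  std_logic_vector(W-1 downto 0);  -- smaller, aligned
    subtract : in  std_logic;                       -- XOR of the two signs
    sum      : out std_logic_vector(W downto 0)     -- xx.xxx...x
  );
end entity man_adder;

architecture behavioral of man_adder is
begin
  process (right_op, left_op, subtract)
    variable l_in : unsigned(W-1 downto 0);
    variable cin  : natural range 0 to 1;
    variable s    : unsigned(W downto 0);
  begin
    if subtract = '1' then
      l_in := not unsigned(left_op);  -- one's complement of the left path
      cin  := 1;                      -- carry in completes two's complement
    else
      l_in := unsigned(left_op);
      cin  := 0;
    end if;
    s := resize(unsigned(right_op), W + 1) + resize(l_in, W + 1) + cin;
    if subtract = '1' then
      s(W) := '0';  -- the carry out of a subtract is not part of the result
    end if;
    sum <= std_logic_vector(s);
  end process;
end architecture behavioral;
```

For an add, bit W of sum is a genuine carry out, which is what later calls for the one-place right shift in renormalization.
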
On to the next challenge
- This is perhaps the hardest part: renormalization of the result
- Have a result exponent (the exponent of the larger) and a mantissa in the form xx.xxxxxx…xxxx
- The following slide shows the processing needed

Renormalization Unit
- Have exponent and mantissa to deal with.

Many choices to deal with
- May need to shift the mantissa 1 position to the right on a fixed binary point.
- May be OK as is
- May have to shift left, and then need to know the position of the leading 1.
  - In a behavioral model can simply shift left once, increment a counter, and then check.
  - In hardware need a leading 1 detector that gives the position of the leading 1 so that the mantissa can be shifted left.

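The behavioral approach described above (shift left one place at a time and check) can be sketched as follows. The sketch assumes a mantissa of the form xx.xxx…x that is W+1 bits wide and an 11-bit biased exponent, and it stops shifting when the exponent reaches 0. Overflow to infinity, flag generation and the bit lost on a right shift are left out. All names are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical behavioral renormalization of the adder result.
entity renorm is
  generic (W : positive := 54);
  port (
    man_in  : in  std_logic_vector(W downto 0);    -- xx.xxx...x
    exp_in  : in  std_logic_vector(10 downto 0);   -- exponent of the larger
    man_out : out std_logic_vector(W-1 downto 0);  -- x.xxx...x
    exp_out : out std_logic_vector(10 downto 0)
  );
end entity renorm;

architecture behavioral of renorm is
begin
  process (man_in, exp_in)
    variable m : unsigned(W downto 0);
    variable e : unsigned(10 downto 0);
  begin
    m := unsigned(man_in);
    e := unsigned(exp_in);
    if m = 0 then
      e := (others => '0');    -- hardwired zero result
    elsif m(W) = '1' then
      m := shift_right(m, 1);  -- carry out: right shift by 1, exponent + 1
      e := e + 1;
    else
      -- Search for the leading 1: shift left one place per iteration and
      -- count the exponent down, stopping if the exponent reaches 0.
      while m(W-1) = '0' and e > 0 loop
        m := shift_left(m, 1);
        e := e - 1;
      end loop;
    end if;
    man_out <= std_logic_vector(m(W-1 downto 0));
    exp_out <= std_logic_vector(e);
  end process;
end architecture behavioral;
```

A hardware version would replace the loop with a leading 1 detector feeding a shifter, as the slide notes.
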
Interactions
- All shifts of mantissa result in exponent adjustment.
- There are 4 choices on the exponent
  - As is
  - Incremented by 1
  - Adjusted down by some amount depending on shift
  - Zero

Interactions
- There are 5 choices on the mantissa
  - As is
  - Right shifted by 1 (increment exp by 1)
  - Left shifted for leading 1
  - Left shifted and then right shifted by 1
  - Hardwired 0
- This part is the same for both addition and multiplication. Easy to do algorithmically.

Rounding Unit
- Once done with renormalization, will look at the guard bits to determine rounding.
- Standard specifies several rounding modes.
- Can also just truncate.

Rounding
- Can result in changes to both the mantissa and the exponent.
- After rounding, the final result is output in normalized form.

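As one example of how rounding can change both the mantissa and the exponent, here is a sketch of round to nearest with ties to even, which is one of the modes in the standard. The slides do not say which mode or how many guard bits the design uses, so the 3 guard bits, the 56-bit input format and all names here are assumptions.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical rounding unit: round to nearest, ties to even.
entity round_unit is
  port (
    man_in  : in  std_logic_vector(55 downto 0);  -- 53-bit mantissa & 3 guard bits
    exp_in  : in  std_logic_vector(10 downto 0);
    man_out : out std_logic_vector(52 downto 0);
    exp_out : out std_logic_vector(10 downto 0);
    inexact : out std_logic
  );
end entity round_unit;

architecture behavioral of round_unit is
begin
  process (man_in, exp_in)
    variable kept : unsigned(53 downto 0);  -- one extra bit for the round carry
    variable e    : unsigned(10 downto 0);
    variable g    : std_logic;              -- first guard bit
    variable rest : std_logic;              -- OR of the remaining guard bits
  begin
    kept := resize(unsigned(man_in(55 downto 3)), 54);
    e    := unsigned(exp_in);
    g    := man_in(2);
    rest := man_in(1) or man_in(0);

    -- Round up when above half, or exactly half with an odd kept LSB
    if g = '1' and (rest = '1' or kept(0) = '1') then
      kept := kept + 1;
    end if;

    -- Rounding carried out of the mantissa (1.11...1 became 10.00...0)
    if kept(53) = '1' then
      kept := shift_right(kept, 1);
      e    := e + 1;
    end if;

    man_out <= std_logic_vector(kept(52 downto 0));
    exp_out <= std_logic_vector(e);
    inexact <= g or rest;  -- any discarded 1 bit means the result was rounded
  end process;
end architecture behavioral;
```

Truncation, the alternative mentioned on the slide, would simply drop the three guard bits and leave kept unchanged.
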
And don't forget the flags
- Any arithmetic unit outputs flags on the status and validity of the result.
- The flags to be generated are output from various control signals or combinations of various control signals.

To test (verify) the design
- Must test for normal operation and boundary conditions
- Will check A by B
  - NaN
  - +/- infinity
  - +/- 0
  - Denorm
  - Norm
- For both direct and all crossed pairings

Boundary conditions
- Wish to check several boundary conditions
  - Denorm + Denorm = Max Denorm
  - Denorm + Denorm = Min Norm
  - Norm - Norm = Max Denorm
  - …
  - Rounding using first guard bit
  - Rounding using 1st and 2nd guard bits
  - …

Testing
- Testing of the design code is not necessarily the same as the testing that would be done on the chip.
- The "testing" of the design is called verification and must ensure that all possible input combinations produce the specified output.

Scan of entire architecture

Scan of the chip

Result cases in verification
- Input classes: NaN, Inf, 0, Norm, Dnorm
- Each class by other
  - NaN by all the others gives NaN
    - For the multiplier design this is easy since all results are NaN and NaN is generated directly by unit.
  - Inf: possible results are Inf, or NaN when illegal op of +Inf * -Inf; result generated directly
  - 0: result is 0, also generated directly

Result cases 2
- Norm and Denormalized cases
  - Norm by Norm
    - 3 possible results
    - Overflow to infinity
    - Normalized
    - Underflow to denormalized or 0
  - Denorm by Denorm
    - Denormalized
    - Underflow to 0

Denorm conditions
- Exponent possibilities
  - Exponent is 0 and 00.xxxxx: simply left shift
  - Exponent is negative and