CSE 477 VLSI Digital Circuits, Fall 2003
Lecture 20: Adder Design
Mary Jane Irwin (www.cse.psu.edu/~mji)
www.cse.psu.edu/~cg477
[Adapted from Rabaey's Digital Integrated Circuits, Second Edition, © 2003 Rabaey, A. Chandrakasan, B. Nikolic]
CSE 477 L20 Adder Design.1 J. Irwin&Vijay, PSU, 2003

Review: Basic Building Blocks
- Datapath
  - Execution units: adder, multiplier, divider, shifter, etc.
  - Register file and pipeline registers
  - Multiplexers, decoders
- Control
  - Finite state machines (PLA, ROM, random logic)
- Interconnect
  - Switches, arbiters, buses
- Memory
  - Caches (SRAMs), TLBs, DRAMs, buffers

The 1-bit Binary Adder
[Figure: 1-bit full adder (FA) with inputs A, B, Cin and outputs S, Cout]
G = A&B    P = A ⊕ B    K = !A & !B

A B Cin | Cout S | carry status
0 0 0   | 0    0 | kill
0 0 1   | 0    1 | kill
0 1 0   | 0    1 | propagate
0 1 1   | 1    0 | propagate
1 0 0   | 0    1 | propagate
1 0 1   | 1    0 | propagate
1 1 0   | 1    0 | generate
1 1 1   | 1    1 | generate

S = A ⊕ B ⊕ Cin = P ⊕ Cin
Cout = A&B | A&Cin | B&Cin = G | P&Cin (majority function)
- How can we use it to build a 64-bit adder?
- How can we modify it easily to build an adder/subtractor?
- How can we make it better (faster, lower power, smaller)?
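To check the truth table, here is a minimal Python sketch that models one FA bit with 0/1 integers. It computes S and Cout from the G/P form and confirms that the result matches both the majority form and the arithmetic sum A + B + Cin. The function names `full_adder` and `carry_status` are hypothetical and do not come from the lecture.

```python
from itertools import product


def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """Return (cout, s) for one bit, using the G/P form from the slide."""
    g = a & b                 # generate
    p = a ^ b                 # propagate (XOR form)
    s = p ^ cin               # S = P xor Cin
    cout = g | (p & cin)      # Cout = G | P&Cin
    return cout, s


def carry_status(a: int, b: int) -> str:
    """Classify a bit position as generate, propagate, or kill."""
    if a & b:
        return "generate"
    if a ^ b:
        return "propagate"
    return "kill"             # K = !A & !B


for a, b, cin in product((0, 1), repeat=3):
    cout, s = full_adder(a, b, cin)
    assert cout == (a & b) | (a & cin) | (b & cin)   # majority form agrees
    assert 2 * cout + s == a + b + cin               # arithmetic check
    print(a, b, cin, "|", cout, s, "|", carry_status(a, b))
```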

FA Gate Level Implementations
- The way you learned to design in CSE 271 and CSE 471
[Figure: two gate-level full adder implementations with inputs A, B, Cin, intermediate signals t0, t1, t2, and outputs Cout, S]

Review: XOR FA
[Figure: XOR-based full adder with inputs A, B, Cin and outputs S, Cout; 16 transistors]

Review: CPL FA
[Figure: complementary pass-transistor logic (CPL) full adder with dual-rail inputs A/!A, B/!B, Cin/!Cin and dual-rail outputs S/!S, Cout/!Cout]
- 20+8 transistors, dual rail
- Beware of threshold drops: an NMOS pass transistor passes a high level only up to VDD − VTn

Review: Mirror Adder
[Figure: 24+4 transistor mirror adder; the carry stage has kill, 0-propagate, 1-propagate, and generate paths and produces !Cout, the sum stage produces !S; transistor sizes are annotated]
Cout = A&B | B&Cin | A&Cin
S = A&B&Cin | !Cout&(A | B | Cin)
Sizing: each input in the carry circuit has a logical effort of 2, so the optimal fan-out for each is also 2. Since !Cout drives 2 internal and 2 inverter transistor gates (to form Cin for the next most significant bit's adder), the carry circuit should be oversized. PMOS/NMOS ratio of 2.
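The mirror adder avoids XOR gates by reusing !Cout in the sum stage. A logic-level Python sketch (no transistor or timing detail; `mirror_adder` is a hypothetical name) confirms that S = A&B&Cin | !Cout&(A | B | Cin) equals A ⊕ B ⊕ Cin for all eight input combinations:

```python
from itertools import product


def mirror_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """Logic-level model of the mirror adder; returns (s, cout).

    The carry stage produces !Cout, and the sum stage reuses !Cout
    instead of building S from XOR gates.
    """
    cout_n = 1 - ((a & b) | (b & cin) | (a & cin))          # carry stage: !Cout
    s_n = 1 - ((a & b & cin) | (cout_n & (a | b | cin)))    # sum stage: !S
    return 1 - s_n, 1 - cout_n                              # output inverters


for a, b, cin in product((0, 1), repeat=3):
    assert mirror_adder(a, b, cin) == (a ^ b ^ cin, (a & b) | (b & cin) | (a & cin))
```

The identity works because when Cout = 0 at most one input is 1, so A | B | Cin gives the sum; when Cout = 1 at least two inputs are 1, so the sum is 1 only if all three are.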

Mirror Adder Features
- The NMOS and PMOS chains are completely symmetrical, with a maximum of two series transistors in the carry circuitry, guaranteeing identical rise and fall transitions if the NMOS and PMOS devices are properly sized.
- When laying out the cell, the most critical issue is minimizing the capacitance at node !Cout (four diffusion capacitances, two internal gate capacitances, and two inverter gate capacitances). Shared diffusions can reduce the stack node capacitances.
- The transistors connected to Cin are placed closest to the output.
- Only the transistors in the carry stage have to be optimized for speed. All transistors in the sum stage can be minimum size.

A 64-bit Adder/Subtractor
- Ripple Carry Adder (RCA) built out of 64 FAs
- Subtraction: complement all subtrahend bits (XOR gates driven by the add/subt control) and set the low-order carry-in C0 to 1
- RCA
  - advantage: simple logic, so small (low cost)
  - disadvantage: slow (O(N) for N bits) and lots of glitching (so lots of energy consumption)
[Figure: 64 1-bit FAs in a chain; bit i takes Ai, Bi, and Ci and produces Si and Ci+1; C0 = Cin, C64 = Cout]
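The adder/subtractor can be sketched behaviorally in Python. This is a bit-serial loop that mirrors the ripple order, not a circuit. It assumes unsigned n-bit words held in Python ints, and `add_sub` is a hypothetical name. In subtract mode every subtrahend bit is XORed with the control and C0 = 1, which forms A + !B + 1 (two's complement).

```python
def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """One FA cell: returns (s, cout)."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout


def add_sub(a: int, b: int, subtract: bool, n: int = 64) -> tuple[int, int]:
    """Ripple-carry adder/subtractor on n-bit words.

    Returns (result mod 2**n, carry out Cn). For subtraction each
    subtrahend bit goes through an XOR with the add/subt control, and
    the low-order carry-in C0 is set to 1 (two's complement).
    """
    ctrl = 1 if subtract else 0
    carry = ctrl                          # C0 = Cin
    result = 0
    for i in range(n):                    # the carry ripples from bit 0 upward
        ai = (a >> i) & 1
        bi = ((b >> i) & 1) ^ ctrl        # XOR gate on the subtrahend bit
        s, carry = full_adder(ai, bi, carry)
        result |= s << i
    return result, carry


print(add_sub(5, 3, subtract=False, n=8))   # (8, 0)
print(add_sub(5, 3, subtract=True, n=8))    # (2, 1)
```

The N loop iterations correspond to the O(N) carry path. Each iteration needs the carry produced by the previous one.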

Ripple Carry Adder (RCA)
[Figure: 4-bit RCA; FAs take (A3, B3) through (A0, B0) and produce S3 through S0, with C0 = Cin and Cout = C4]
$T_{adder} \approx T_{FA}(A, B \rightarrow C_{out}) + (N-2)\,T_{FA}(C_{in} \rightarrow C_{out}) + T_{FA}(C_{in} \rightarrow S)$
T = O(N) worst-case delay
Real goal: make the fastest possible carry path
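The worst case in the delay equation occurs when a carry generated in bit 0 must ripple through every bit above it. The following sketch measures, for given operands, how many bit positions the longest single carry chain spans. It assumes Cin = 0, and `longest_carry_chain` is a hypothetical helper.

```python
def longest_carry_chain(a: int, b: int, n: int) -> int:
    """Length, in bit positions, of the longest carry chain for a + b (Cin = 0).

    A chain starts at a generate bit (Ai & Bi) and extends through the
    propagate bits (Ai ^ Bi) above it; a kill bit (!Ai & !Bi) ends it.
    """
    longest = run = 0
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        if ai & bi:            # generate: a new carry starts here
            run = 1
        elif ai ^ bi:          # propagate: extend a live carry, if any
            run = run + 1 if run else 0
        else:                  # kill: any live carry is absorbed
            run = 0
        longest = max(longest, run)
    return longest


# Worst case for 8 bits: generate at bit 0, propagate in bits 1..7
print(longest_carry_chain(0b00000001, 0b11111111, 8))   # 8
```

The chain of length N corresponds to the equation's terms: bit 0 (A, B to Cout), N − 2 middle bits (Cin to Cout), and the last bit (Cin to S).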

Inversion Property
- Inverting all inputs to a FA results in inverted values for all outputs
[Figure: an FA, and the same FA with all inputs and outputs inverted]
!S(A, B, Cin) = S(!A, !B, !Cin)
!Cout(A, B, Cin) = Cout(!A, !B, !Cin)
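Because the property involves only three inputs, an exhaustive check is short. A minimal Python sketch follows (`inversion_property_holds` is a hypothetical name). The property holds because inverting all three XOR inputs flips the result an odd number of times, and the majority function is self-dual.

```python
from itertools import product


def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """One FA cell: returns (s, cout)."""
    return a ^ b ^ cin, (a & b) | (a & cin) | (b & cin)


def inversion_property_holds() -> bool:
    """Check !S(A,B,Cin) = S(!A,!B,!Cin) and !Cout(A,B,Cin) = Cout(!A,!B,!Cin)."""
    for a, b, cin in product((0, 1), repeat=3):
        s, cout = full_adder(a, b, cin)
        s_inv, cout_inv = full_adder(1 - a, 1 - b, 1 - cin)
        if s_inv != 1 - s or cout_inv != 1 - cout:
            return False
    return True


print(inversion_property_holds())   # True
```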

Exploiting the Inversion Property
[Figure: 4-bit RCA alternating regular cells and inverted cells (FA'), with C0 = Cin and Cout = C4]
- Minimizes the critical path (the carry chain) by eliminating the inverters between the FAs (the transistor sizing on the carry chain portion of the mirror adder will need to increase).
- Now need two "flavors" of FAs: regular cells and inverted cells.
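The two-flavor chain can be checked at the logic level. This sketch assumes each cell has only inverting carry and sum gates, as a mirror adder does. Even (regular) cells receive true operands and emit an inverted carry. Odd (inverted) cells receive complemented A and B plus that inverted carry, and by the inversion property they emit a true carry and a true sum. The inverters on A and B and on the even-bit sum outputs are off the carry path. `alternating_rca` and the gate helpers are hypothetical names.

```python
def inv_carry(a: int, b: int, c: int) -> int:
    """Inverting carry gate, as in a mirror adder: returns !MAJ(a, b, c)."""
    return 1 - ((a & b) | (a & c) | (b & c))


def inv_sum(a: int, b: int, c: int) -> int:
    """Inverting sum gate: returns !(a ^ b ^ c)."""
    return 1 - (a ^ b ^ c)


def alternating_rca(a: int, b: int, n: int, cin: int = 0) -> tuple[int, int]:
    """RCA with no inverters on the carry chain; returns (sum mod 2**n, cout)."""
    carry = cin                     # true polarity entering bit 0
    total = 0
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        if i % 2 == 0:
            # regular cell: true inputs in, inverted carry out
            s = 1 - inv_sum(ai, bi, carry)           # inverter on the sum output only
            carry = inv_carry(ai, bi, carry)         # wire now holds !C(i+1)
        else:
            # inverted cell (FA'): complemented A, B and the inverted carry in
            s = inv_sum(1 - ai, 1 - bi, carry)       # equals Si by the inversion property
            carry = inv_carry(1 - ai, 1 - bi, carry) # C(i+1), true polarity again
        total |= s << i
    # after an odd number of cells the carry wire holds !Cout
    cout = carry if n % 2 == 0 else 1 - carry
    return total, cout


print(alternating_rca(7, 1, 3))   # (0, 1)
```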

Fast Carry Chain Design
- The key to fast addition is a low-latency carry network
- What matters is whether in a given position a carry is
  - generated: Gi = Ai & Bi
  - propagated: Pi = Ai ⊕ Bi (sometimes Ai | Bi is used)
  - annihilated (killed): Ki = !Ai & !Bi
- Giving a carry recurrence of Ci+1 = Gi | Pi&Ci
C1 = G0 | P0&C0
C2 = G1 | P1&G0 | P1&P0&C0
C3 = G2 | P2&G1 | P2&P1&G0 | P2&P1&P0&C0
C4 = G3 | P3&G2 | P3&P2&G1 | P3&P2&P1&G0 | P3&P2&P1&P0&C0
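The flattened expressions for C1 through C4 are the recurrence Ci+1 = Gi | Pi&Ci unrolled. The Python sketch below computes each Ck both ways and checks that the results agree. The helper names are hypothetical.

```python
def gen_prop(a: int, b: int, n: int) -> tuple[list[int], list[int]]:
    """Per-bit generate Gi = Ai & Bi and propagate Pi = Ai ^ Bi."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    return g, p


def ripple_carries(g: list[int], p: list[int], c0: int) -> list[int]:
    """Carries C0..CN from the recurrence C(i+1) = Gi | Pi & Ci."""
    c = [c0]
    for gi, pi in zip(g, p):
        c.append(gi | (pi & c[-1]))
    return c


def lookahead_carry(g: list[int], p: list[int], c0: int, k: int) -> int:
    """Ck from the flattened sum of products on the slide:
    G(k-1) | P(k-1)&G(k-2) | ... | P(k-1)&...&P0&C0."""
    result = 0
    for j in range(k - 1, -2, -1):        # j = -1 stands for the C0 term
        term = g[j] if j >= 0 else c0
        for m in range(j + 1, k):         # AND in every propagate above bit j
            term &= p[m]
        result |= term
    return result


g, p = gen_prop(0b1011, 0b0110, 4)
print(ripple_carries(g, p, 0), [lookahead_carry(g, p, 0, k) for k in range(5)])
# both lists: [0, 0, 1, 1, 1]
```

The recurrence needs k sequential steps to produce Ck, whereas each flattened term is a single wide AND. The fast adders that follow trade between these two forms.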

Manchester Carry Chain (MCC)
- Switches controlled by Gi and Pi
[Figure: dynamic (clk) MCC stage with switches Gi and Pi between nodes !Ci and !Ci+1]
- Total delay is the sum of
  - the time to form the switch control signals Gi and Pi
  - the setup time for the switches
  - the signal propagation delay through N switches in the worst case
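A logic-level sketch of one evaluate phase of a precharged MCC follows. It assumes Pi = Ai ⊕ Bi, so the Gi and Pi switches are never on together, and it ignores charge sharing and timing. Every !C node starts high. A Gi switch discharges its node, a Pi switch connects it to the node below, and a killed position keeps the node precharged, which means a carry of 0. `mcc_evaluate` is a hypothetical name.

```python
def mcc_evaluate(g: list[int], p: list[int], cin: int) -> list[int]:
    """Logic-level model of one evaluate phase of a precharged MCC.

    Returns the inverted carry nodes [!C0, !C1, ..., !CN]."""
    nodes = [1] * (len(g) + 1)        # precharge: every !C node starts high
    nodes[0] = 1 - cin                # !C0 is driven by the carry-in
    for i, (gi, pi) in enumerate(zip(g, p)):
        if gi:
            nodes[i + 1] = 0          # generate: Gi switch discharges !C(i+1)
        elif pi:
            nodes[i + 1] = nodes[i]   # propagate: Pi switch joins !C(i+1) to !Ci
        # kill (Gi = Pi = 0): the node stays precharged, so C(i+1) = 0
    return nodes


# a = 0b0111, b = 0b0001: bit 0 generates, bits 1 and 2 propagate, bit 3 kills
print(mcc_evaluate([1, 0, 0, 0], [0, 1, 1, 0], 0))   # [1, 0, 0, 0, 1]
```

In the worst case, the discharge from bit 0 passes through N series Pi switches. That is the third delay term above.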

4-bit Sliced MCC Adder
[Figure: 4-bit MCC slice; inputs A3..A0 and B3..B0 feed the G and P logic (clocked by clk), the carry chain runs from !C0 through !C1, !C2, !C3 to !C4, and the sum outputs are S3..S0]

8-bit MCC Adder
[Figure: two 4-bit MCC slices chained from !C0 to !C7]
- It's really hard to beat the speed of a well-designed MCC for word lengths of 8 bits or less!

Carry Skip Adders (aka Carry Bypass Adders)
$T = O(\sqrt{n})$, $A = O(n)$

Carry Skip Adder
[Figure: 4-bit ripple block ((A3, B3) through (A0, B0) produce S3 through S0) with a bypass path from C0 to C4]
BP = P0&P1&P2&P3 ("Block Propagate")
If (P0 & P1 & P2 & P3 = 1) then C4 = C0; otherwise the block itself kills or generates the carry internally
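The following logic-level sketch models one skip block, with bits listed LSB first. `skip_block` and `to_bits` are hypothetical names. When BP = 1 the ripple chain would also pass C0 through unchanged, so the mux never changes the logical result. What it changes is timing: the block carry-out no longer waits for the internal ripple.

```python
def to_bits(x: int, width: int) -> list[int]:
    """Bits of x, least significant first."""
    return [(x >> i) & 1 for i in range(width)]


def skip_block(a_bits: list[int], b_bits: list[int], cin: int) -> tuple[list[int], int, bool]:
    """One carry-skip block; returns (sum bits LSB first, block carry-out, BP)."""
    g = [ai & bi for ai, bi in zip(a_bits, b_bits)]
    p = [ai ^ bi for ai, bi in zip(a_bits, b_bits)]
    bp = all(p)                      # block propagate: P0 & P1 & ... & P(B-1)
    c = cin
    sums = []
    for gi, pi in zip(g, p):
        sums.append(pi ^ c)          # Si = Pi xor Ci
        c = gi | (pi & c)            # internal ripple
    cout = cin if bp else c          # skip mux: if BP then C4 = C0
    return sums, cout, bp


print(skip_block(to_bits(5, 4), to_bits(10, 4), 1))   # ([0, 0, 0, 0], 1, True)
```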

Carry-Skip Chain Implementation
[Figure: carry chain with switches P3..P0 and G3..G0 between the block carry-in (Cin) and !Cout, plus a BP-controlled bypass from the block carry-in to the block carry-out]

16-bit, 4-bit Block Carry Skip Adder
[Figure: four 4-bit blocks (bits 12 to 15, 8 to 11, 4 to 7, 0 to 3), each with setup, carry propagation, and sum stages]
- Worst-case delay: a carry generated in bit 0 ripples through bits 1, 2, and 3, skips the middle two groups (B is the group size in bits), then ripples through the last group from bit 12 to bit 15
$T_{add} = t_{setup} + B\,t_{carry} + ((N/B) - 2)\,t_{skip} + B\,t_{carry} + t_{sum}$

Optimal Skip Block Size and Add Time
- Assuming one stage of ripple (t_carry) has the same delay as one skip logic stage (t_skip), and both are 1:
$T_{CSkA} = 1 + B + (N/B - 2) + B + 1 = 2B + N/B$
(terms, left to right: t_setup, ripple in block 0, skips, ripple in the last block, t_sum)
- So the optimal block size, B, is
$dT_{CSkA}/dB = 2 - N/B^2 = 0 \Rightarrow B_{opt} = \sqrt{N/2}$
- And the optimal time is
$T_{CSkA,opt} = 2\sqrt{N/2} + N/\sqrt{N/2} = 4\sqrt{N/2} = 2\sqrt{2N}$
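A real adder must use an integer block size, so the continuous optimum is only a guide. The Python sketch below evaluates the single-level skip-adder formula, with unit delays by default, and searches the block sizes that divide N evenly. `t_cska` and `best_block_size` are hypothetical names, and the formula assumes at least two blocks.

```python
import math


def t_cska(n: int, b: float, t_setup: float = 1.0, t_carry: float = 1.0,
           t_skip: float = 1.0, t_sum: float = 1.0) -> float:
    """Worst-case add time of a one-level carry skip adder with blocks of b bits:
    ripple in the first block, skip n/b - 2 middle blocks, ripple in the last block."""
    return t_setup + b * t_carry + (n / b - 2) * t_skip + b * t_carry + t_sum


def best_block_size(n: int) -> int:
    """Fastest block size among those that divide n into at least two blocks."""
    candidates = [b for b in range(1, n + 1) if n % b == 0 and n // b >= 2]
    return min(candidates, key=lambda b: t_cska(n, b))


n = 64
print(math.sqrt(n / 2), 2 * math.sqrt(2 * n))         # continuous B_opt and T_opt
print(best_block_size(n), t_cska(n, best_block_size(n)))
```

For N = 64, B_opt ≈ 5.66 is not a divisor. B = 4 and B = 8 both give 24 unit delays, slightly above the continuous bound 2√128 ≈ 22.6.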

Carry Skip Adder Extensions
- Variable block sizes
  - A carry that is generated in, or absorbed by, one of the inner blocks travels a shorter distance through the skip blocks, so the inner blocks can be bigger without increasing the overall delay
[Figure: variable-size skip blocks between Cin and Cout]
- Multiple levels of skip logic
[Figure: skip level 1 and skip level 2 between Cin and Cout; the level-2 skip signal is the AND of the first-level skip signals (BPs)]

RCA, Carry Skip Adder Comparison
[Figure: RCA versus carry skip adders with block sizes B = 2, 3, 4, 5, 6]

Prefix Adders
$T = O(\log n)$, $A = O(n \log n)$

Parallel Prefix Adders (PPAs)
- Define the carry operator ∘ on (G, P) signal pairs:
  (G'', P'') ∘ (G', P') = (G, P), where
  G = G'' | P''&G'
  P = P''&P'
[Figure: carry cell ∘ with inputs (G'', P'') and (G', P') and outputs !G and P]
  - ∘ is associative, i.e.,
    [(G''', P''') ∘ (G'', P'')] ∘ (G', P') = (G''', P''') ∘ [(G'', P'') ∘ (G', P')]
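Associativity lets a prefix tree regroup the operator freely, but operand order still matters. The sketch below checks both facts exhaustively over all 0/1 (G, P) pairs. It models ∘ as a Python function (`carry_op` is a hypothetical name), with the first argument as the more significant pair.

```python
from itertools import product


def carry_op(hi: tuple[int, int], lo: tuple[int, int]) -> tuple[int, int]:
    """(G'', P'') o (G', P') = (G'' | P''&G', P''&P'); hi is the more significant pair."""
    g2, p2 = hi
    g1, p1 = lo
    return g2 | (p2 & g1), p2 & p1


pairs = list(product((0, 1), repeat=2))      # all (G, P) values
associative = all(
    carry_op(carry_op(x, y), z) == carry_op(x, carry_op(y, z))
    for x, y, z in product(pairs, repeat=3)
)
commutative = all(carry_op(x, y) == carry_op(y, x) for x, y in product(pairs, repeat=2))
print(associative, commutative)              # True False
```

For example, a generating upper bit (1, 0) over a killing lower bit (0, 0) yields a carry (1, 0), while the swapped order yields (0, 0).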

PPA General Structure
- Given the P and G terms for each bit position, computing all the carries is equivalent to finding all the prefixes in parallel:
  (GN-1, PN-1) ∘ (GN-2, PN-2) ∘ … ∘ (G1, P1) ∘ (G0, P0)
- Since ∘ is associative, we can group the terms in any order
  - but note that it is not commutative
- Structure: Pi, Gi logic (1 unit delay), then a parallel prefix logic tree producing the Ci (1 unit delay per level), then Si logic (1 unit delay)
- Measures to consider
  - number of ∘ cells
  - tree cell depth (time)
  - tree cell area
  - cell fan-in and fan-out
  - max wiring length
  - wiring congestion
  - delay path variation (glitching)
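As a reference point for the parallel trees, the sketch below computes every prefix (G i:0, P i:0) with a serial scan. That amounts to a chain of N − 1 carry cells. It then forms Ci+1 = G(i:0) | P(i:0)&Cin and Si = Pi ⊕ Ci. A parallel prefix tree produces the same prefixes in O(log N) levels by regrouping the operator. `prefix_add` is a hypothetical name.

```python
def carry_op(hi: tuple[int, int], lo: tuple[int, int]) -> tuple[int, int]:
    """(G'', P'') o (G', P') = (G'' | P''&G', P''&P'); hi is the more significant pair."""
    g2, p2 = hi
    g1, p1 = lo
    return g2 | (p2 & g1), p2 & p1


def prefix_add(a: int, b: int, n: int, cin: int = 0) -> tuple[int, int]:
    """Add two n-bit numbers by computing every prefix (G i:0, P i:0) serially."""
    gp = [((a >> i) & (b >> i) & 1, ((a >> i) ^ (b >> i)) & 1) for i in range(n)]
    carries = [cin]                                            # C0
    acc = None
    for i in range(n):
        acc = gp[i] if acc is None else carry_op(gp[i], acc)   # (G i:0, P i:0)
        g_pref, p_pref = acc
        carries.append(g_pref | (p_pref & cin))                # C(i+1)
    s = 0
    for i in range(n):
        s |= (gp[i][1] ^ carries[i]) << i                      # Si = Pi xor Ci
    return s, carries[n]


print(prefix_add(200, 100, 8))   # (44, 1)
```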

Brent-Kung PPA
[Figure: 16-bit Brent-Kung parallel prefix computation. Inputs (G0, P0) through (G15, P15); outputs C1 through C16. The forward tree has T = log2 N levels and the reverse tree T = log2 N - 2 levels; area annotations A = 2 log2 N and A = N/2]
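A behavioral sketch of a Brent-Kung prefix network for power-of-two widths follows. It consists of a forward (up-sweep) tree followed by a reverse (down-sweep) tree. Each cell fires as soon as both of its inputs are ready, and the sketch counts the cells and the resulting depth. Under that as-soon-as-possible assumption, a 16-bit network has 26 cells (2N − 2 − log N) and depth 6 (2 log N − 2). `brent_kung` and `bk_add` are hypothetical names.

```python
def carry_op(hi, lo):
    """(G'', P'') o (G', P') = (G'' | P''&G', P''&P'); hi is the more significant pair."""
    g2, p2 = hi
    g1, p1 = lo
    return g2 | (p2 & g1), p2 & p1


def brent_kung(gp):
    """Brent-Kung prefix network over (G, P) pairs listed LSB first.

    len(gp) must be a power of two. Returns (prefixes, cells, depth) where
    prefixes[i] = (G i:0, P i:0), cells counts carry operators, and depth is
    the number of cell levels when each cell fires as soon as its inputs are ready.
    """
    n = len(gp)
    k = n.bit_length() - 1
    assert n == 1 << k, "power-of-two width assumed"
    node, level = list(gp), [0] * n
    cells = 0

    def combine(i: int, j: int) -> None:
        nonlocal cells
        node[i] = carry_op(node[i], node[j])          # bit i is more significant
        level[i] = max(level[i], level[j]) + 1
        cells += 1

    for d in range(k):                                # forward tree
        for i in range((2 << d) - 1, n, 2 << d):
            combine(i, i - (1 << d))
    for d in range(k - 2, -1, -1):                    # reverse tree
        for i in range(3 * (1 << d) - 1, n, 2 << d):
            combine(i, i - (1 << d))
    return node, cells, max(level)


def bk_add(a: int, b: int, n: int, cin: int = 0) -> tuple[int, int]:
    """n-bit addition using carries C(i+1) = G(i:0) | P(i:0)&Cin from the BK tree."""
    gp = [((a >> i) & (b >> i) & 1, ((a >> i) ^ (b >> i)) & 1) for i in range(n)]
    prefixes, _, _ = brent_kung(gp)
    carries = [cin] + [g | (p & cin) for g, p in prefixes]
    s = sum((gp[i][1] ^ carries[i]) << i for i in range(n))
    return s, carries[n]


print(brent_kung([(0, 0)] * 16)[1:])   # (26, 6)
print(brent_kung([(0, 0)] * 64)[1])    # 120
```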

An Even Faster PPA
- The Brent-Kung (BK) adder has the time bound $T_{BK} = 1 + (2 \log N - 2) + 1$
- There are even faster PPA approaches, used in most modern-day machines for operands of 32 bits or more
- Kogge-Stone (KS)
  - faster prefix tree (log N for KS versus 2 log N - 2 for BK)
  - fan-out of the carry cell ∘ limited to two
  - takes more ∘ cells (N log N - N + 1 for KS versus 2N - 2 - log N for BK) and has more wiring

Kogge-Stone PPF Adder
[Figure: 16-bit Kogge-Stone parallel prefix computation. Inputs (G0, P0) through (G15, P15) and Cin; outputs C1 through C16; T = log2 N levels; area annotations A = log2 N and A = N]
$T_{add} = t_{setup} + \log_2 N \cdot t_{\circ} + t_{sum}$
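The Kogge-Stone network can be sketched in a few lines. On level d, every position i ≥ 2^d combines its current value with the value 2^d positions below it, both taken from the previous level. Each value therefore feeds at most two cells on the next level, which gives the fan-out of two. The sketch counts cells and levels. `kogge_stone` and `ks_add` are hypothetical names.

```python
def carry_op(hi, lo):
    """(G'', P'') o (G', P') = (G'' | P''&G', P''&P'); hi is the more significant pair."""
    g2, p2 = hi
    g1, p1 = lo
    return g2 | (p2 & g1), p2 & p1


def kogge_stone(gp):
    """Kogge-Stone prefix network over (G, P) pairs listed LSB first.

    Returns (prefixes, cells, depth) with prefixes[i] = (G i:0, P i:0)."""
    n = len(gp)
    node = list(gp)
    cells = depth = 0
    dist = 1
    while dist < n:
        prev = list(node)                    # every cell on a level reads the previous level
        for i in range(dist, n):
            node[i] = carry_op(prev[i], prev[i - dist])
            cells += 1
        depth += 1
        dist *= 2
    return node, cells, depth


def ks_add(a: int, b: int, n: int, cin: int = 0) -> tuple[int, int]:
    """n-bit addition using carries C(i+1) = G(i:0) | P(i:0)&Cin from the KS tree."""
    gp = [((a >> i) & (b >> i) & 1, ((a >> i) ^ (b >> i)) & 1) for i in range(n)]
    prefixes, _, _ = kogge_stone(gp)
    carries = [cin] + [g | (p & cin) for g, p in prefixes]
    s = sum((gp[i][1] ^ carries[i]) << i for i in range(n))
    return s, carries[n]


print(kogge_stone([(0, 0)] * 64)[1:])   # (321, 6)
```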

PPA Comparisons
Measure           | BK PPA                | N=64 | KS PPA          | N=64
# of ∘ cells      | 2N - 2 - log N        | 120  | N log N - N + 1 | 321
tree depth        | 2 log N - 2           | 10   | log N           | 6
tree area (W x H) | (N/2) * (2 log N - 2) | 320  | N * log N       | 384
cell fan-in       | 2                     | 2    | 2               | 2
cell fan-out      | log N                 | 6    | 2               | 2
max wire length   | N/4                   | 16   | N/2             | 32
wiring density    | sparse                |      | dense           |
glitching         | high                  |      | low             |
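The N = 64 columns follow from the closed-form measures. The sketch below evaluates them for a power-of-two N, with log N taken as log2 N. `ppa_measures` is a hypothetical name.

```python
def ppa_measures(n: int) -> dict[str, tuple[int, int]]:
    """(BK, KS) values of the closed-form measures in the table, for N a power of two."""
    lg = n.bit_length() - 1                   # log2 N for a power of two
    return {
        "cells": (2 * n - 2 - lg, n * lg - n + 1),
        "tree depth": (2 * lg - 2, lg),
        "tree area": ((n // 2) * (2 * lg - 2), n * lg),
        "cell fan-out": (lg, 2),
        "max wire length": (n // 4, n // 2),
    }


for name, (bk, ks) in ppa_measures(64).items():
    print(f"{name:16s} BK={bk:4d} KS={ks:4d}")
```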

More Adder Comparisons

Next Lecture and Reminders
- Next lecture
  - Multiplier Design
    - Reading assignment: Rabaey, et al, 11.4
- Reminders
  - HW#4 due November 11th (not Nov 4th as on the outline)
  - HW#5 will be optional (due November 20th)
  - Project final reports due December 4th
  - Final grading negotiations/corrections (except for the final exam) must be concluded by December 10th
  - Final exam scheduled
    - Tuesday, December 16th from 10:10 to noon in 118 and 113 Thomas