CSE 477 VLSI Digital Circuits Fall 2003 Lecture
- Slides: 32
CSE 477 VLSI Digital Circuits, Fall 2003
Lecture 20: Adder Design
Mary Jane Irwin (www.cse.psu.edu/~mji)
www.cse.psu.edu/~cg477
[Adapted from Rabaey's Digital Integrated Circuits, Second Edition, © 2003 J. Rabaey, A. Chandrakasan, B. Nikolic]
CSE 477 L20 Adder Design, Irwin&Vijay, PSU, 2003
Review: Basic Building Blocks
- Datapath
  - Execution units: adder, multiplier, divider, shifter, etc.
  - Register file and pipeline registers
  - Multiplexers, decoders
- Control
  - Finite state machines (PLA, ROM, random logic)
- Interconnect
  - Switches, arbiters, buses
- Memory
  - Caches (SRAMs), TLBs, DRAMs, buffers
The 1-bit Binary Adder
[Figure: 1-bit Full Adder (FA) block with inputs A, B, Cin and outputs Cout, S]
G = A & B
P = A ⊕ B
K = !A & !B

A  B  Cin | Cout  S | carry status
0  0  0   | 0     0 | kill
0  0  1   | 0     1 | kill
0  1  0   | 0     1 | propagate
0  1  1   | 1     0 | propagate
1  0  0   | 0     1 | propagate
1  0  1   | 1     0 | propagate
1  1  0   | 1     0 | generate
1  1  1   | 1     1 | generate

S = A ⊕ B ⊕ Cin = P ⊕ Cin
Cout = A&B | A&Cin | B&Cin = G | P&Cin (majority function)
- How can we use it to build a 64-bit adder?
- How can we modify it easily to build an adder/subtractor?
- How can we make it better (faster, lower power, smaller)?
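A minimal Python sketch of the full adder equations above, useful for regenerating the truth table and carry status column. The function name `full_adder` and the status strings are illustrative choices, not part of the lecture.

```python
from itertools import product

def full_adder(a, b, cin):
    """Return (cout, s, status) for one bit position."""
    g = a & b                # generate
    p = a ^ b                # propagate
    k = (1 - a) & (1 - b)    # kill
    s = p ^ cin
    cout = g | (p & cin)
    status = "generate" if g else "kill" if k else "propagate"
    return cout, s, status

# Print the eight rows of the truth table.
for a, b, cin in product((0, 1), repeat=3):
    print(a, b, cin, *full_adder(a, b, cin))
```

The carry status depends only on A and B, which is why it can be computed before Cin arrives.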
FA Gate Level Implementations
- The way you learned to design in CSE 271 and CSE 471
[Figure: gate-level full adder with inputs A, B, Cin, internal nodes t0, t1, t2, and outputs Cout, S]
Review: XOR FA
[Figure: XOR-based full adder with inputs A, B, Cin and outputs S, Cout]
16 transistors
Review: CPL FA
[Figure: complementary pass-transistor logic full adder with dual-rail inputs A, !A, B, !B, Cin, !Cin and dual-rail outputs S, !S, Cout, !Cout]
20+8 transistors, dual rail; beware of threshold drops
Review: Mirror Adder
24+4 transistors
[Figure: mirror adder schematic with transistor sizes; the carry stage is labelled kill, 0-propagate, 1-propagate and generate, and produces !Cout; the sum stage produces !S]
Cout = A&B | B&Cin | A&Cin
SUM = A&B&Cin | !Cout&(A | B | Cin)
Sizing: Each input in the carry circuit has a logical effort of 2, so the optimal fan-out for each is also 2. Since !Cout drives two internal transistor gates and two inverter transistor gates (to form Cin for the next more significant bit adder), the carry circuit should be oversized. PMOS/NMOS ratio of 2.
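The sum stage of the mirror adder reuses the complemented carry-out. The sketch below checks, under that assumption, that SUM = A&B&Cin | !Cout&(A | B | Cin) matches ordinary binary addition for all eight input combinations. The function name `mirror_adder` is hypothetical.

```python
from itertools import product

def mirror_adder(a, b, cin):
    """Return (cout, s) using the mirror adder equations."""
    ncout = 1 - ((a & b) | (b & cin) | (a & cin))   # carry stage output !Cout
    s = (a & b & cin) | (ncout & (a | b | cin))     # sum stage reuses !Cout
    return 1 - ncout, s

mirror_ok = all(
    2 * mirror_adder(a, b, c)[0] + mirror_adder(a, b, c)[1] == a + b + c
    for a, b, c in product((0, 1), repeat=3)
)
print(mirror_ok)
```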
Mirror Adder Features
- The NMOS and PMOS chains are completely symmetrical with a maximum of two series transistors in the carry circuitry, guaranteeing identical rise and fall transitions if the NMOS and PMOS devices are properly sized.
- When laying out the cell, the most critical issue is the minimization of the capacitances at node !Cout (four diffusion capacitances, two internal gate capacitances, and two inverter gate capacitances). Shared diffusions can reduce the stack node capacitances.
- The transistors connected to Cin are placed closest to the output.
- Only the transistors in the carry stage have to be optimized for optimal speed. All transistors in the sum stage can be minimal size.
A 64-bit Adder/Subtractor
- Ripple Carry Adder (RCA) built out of 64 FAs
- Subtraction: complement all subtrahend bits (xor gates) and set the low order carry-in
- RCA
  - advantage: simple logic, so small (low cost)
  - disadvantage: slow (O(N) for N bits) and lots of glitching (so lots of energy consumption)
[Figure: chain of 64 1-bit FAs with an add/subt control; bit i takes Ai, Bi and Ci and produces Si and Ci+1, with C0 = Cin and C64 = Cout]
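A behavioral sketch of the adder/subtractor described above: the `subtract` control drives both the xor gates on the subtrahend and the low order carry-in. The function name, argument names and the integer bit-vector encoding are assumptions made for illustration.

```python
def ripple_add_sub(a, b, subtract=0, n=64):
    """N-bit ripple carry adder/subtractor. Returns (result, carry_out)."""
    carry = subtract                       # low order carry-in is 1 for subtraction
    result = 0
    for i in range(n):                     # one full adder per bit, carry ripples upward
        ai = (a >> i) & 1
        bi = ((b >> i) & 1) ^ subtract     # xor gate complements the subtrahend bit
        s = ai ^ bi ^ carry
        carry = (ai & bi) | (ai & carry) | (bi & carry)
        result |= s << i
    return result, carry

print(ripple_add_sub(5, 3))      # add
print(ripple_add_sub(5, 3, 1))   # subtract
```

The loop runs N times in sequence, which mirrors the O(N) delay of the hardware carry chain.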
Ripple Carry Adder (RCA)
[Figure: 4-bit ripple carry adder; four FAs with inputs A3..A0, B3..B0, sums S3..S0, C0 = Cin and Cout = C4]
Tadder ≈ TFA(A, B → Cout) + (N-2) TFA(Cin → Cout) + TFA(Cin → S)
T = O(N) worst case delay
Real Goal: Make the fastest possible carry path
Inversion Property
- Inverting all inputs to a FA results in inverted values for all outputs
[Figure: an FA with inputs A, B, Cin and outputs Cout, S, shown equivalent to an FA with all inputs and outputs inverted]
!S(A, B, Cin) = S(!A, !B, !Cin)
!Cout(A, B, Cin) = Cout(!A, !B, !Cin)
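The inversion property can be checked exhaustively. This is a small sketch; the helper names `fa` and `inv` are illustrative.

```python
from itertools import product

def fa(a, b, cin):
    """Full adder, returns (Cout, S)."""
    return (a & b) | (a & cin) | (b & cin), a ^ b ^ cin

def inv(x):
    """Logical complement of a single bit."""
    return 1 - x

# Inverting every input must invert every output.
inversion_holds = all(
    fa(inv(a), inv(b), inv(c)) == tuple(inv(x) for x in fa(a, b, c))
    for a, b, c in product((0, 1), repeat=3)
)
print(inversion_holds)
```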
Exploiting the Inversion Property
[Figure: 4-bit ripple carry adder alternating regular cells (FA) and inverted cells (FA'), with inputs A3..A0, B3..B0, sums S3..S0, C0 = Cin and Cout = C4]
- Minimizes the critical path (the carry chain) by eliminating inverters between the FAs (will need to increase the transistor sizing on the carry chain portion of the mirror adder).
- Now need two "flavors" of FAs
Fast Carry Chain Design
- The key to fast addition is a low latency carry network
- What matters is whether in a given position a carry is
  - generated: Gi = Ai & Bi
  - propagated: Pi = Ai ⊕ Bi (sometimes use Ai | Bi)
  - annihilated (killed): Ki = !Ai & !Bi
- Giving a carry recurrence of Ci+1 = Gi | Pi&Ci
  C1 = G0 | P0&C0
  C2 = G1 | P1&G0 | P1&P0&C0
  C3 = G2 | P2&G1 | P2&P1&G0 | P2&P1&P0&C0
  C4 = G3 | P3&G2 | P3&P2&G1 | P3&P2&P1&G0 | P3&P2&P1&P0&C0
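The expanded expressions for C1 to C4 follow from unrolling the recurrence. The sketch below compares both forms for every 4-bit operand pair; the function names are hypothetical.

```python
from itertools import product

def carries_recurrence(g, p, c0):
    """Apply Ci+1 = Gi | Pi&Ci bit by bit; returns [C0, C1, ..., C4]."""
    c = [c0]
    for gi, pi in zip(g, p):
        c.append(gi | (pi & c[-1]))
    return c

def carries_expanded(g, p, c0):
    """Flattened two-level expressions for C1..C4."""
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    return [c0, c1, c2, c3, c4]

def gen_prop(a, b, n=4):
    """Per-bit generate and propagate signals of two n-bit operands."""
    g = [((a >> i) & (b >> i)) & 1 for i in range(n)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    return g, p

forms_agree = all(
    carries_recurrence(*gen_prop(a, b), c0) == carries_expanded(*gen_prop(a, b), c0)
    for a, b, c0 in product(range(16), range(16), (0, 1))
)
print(forms_agree)
```

The flattened form trades the serial dependency for wider gates, which is why the number of terms grows with the bit position.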
Manchester Carry Chain (MCC)
- Switches controlled by Gi and Pi
[Figure: one Manchester carry chain stage between !Ci and !Ci+1, with switches driven by Gi, Pi and clk]
- Total delay of
  - time to form the switch control signals Gi and Pi
  - setup time for the switches
  - signal propagation delay through N switches in the worst case
4-bit Sliced MCC Adder
[Figure: 4-bit Manchester carry chain adder slice; inputs A3..A0, B3..B0 and clk feed the G and P logic, the chain runs from !C0 through !C1, !C2, !C3 to !C4, and the outputs are S3..S0]
8-bit MCC Adder
[Figure: 8-bit adder built from two 4-bit MCC slices]
- It's really hard to beat the speed of a well designed MCC for word lengths of 8 bits or less!
Carry Skip Adders (aka Carry Bypass Adders)
T = O(√n)
A = O(n)
Carry Skip Adder
[Figure: 4-bit block of FAs with inputs A3..A0, B3..B0, sums S3..S0, carry-in C0, carry-out C4, and a bypass path selected by BP]
BP = P0&P1&P2&P3 ("Block Propagate")
If (P0 & P1 & P2 & P3 = 1) then C4 = C0; otherwise the block itself kills or generates the carry internally
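A behavioral sketch of one skip block, assuming a simple multiplexer selects between the bypassed carry-in and the rippled carry. The function name `skip_block` and its return tuple are illustrative.

```python
from itertools import product

def skip_block(a, b, c0, width=4):
    """One carry-skip block. Returns (sum_bits, c_out, bp)."""
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]
    g = [((a >> i) & (b >> i)) & 1 for i in range(width)]
    c = c0
    s = 0
    for i in range(width):           # ripple path inside the block
        s |= (p[i] ^ c) << i
        c = g[i] | (p[i] & c)
    bp = int(all(p))                 # block propagate: P0&P1&P2&P3
    c_out = c0 if bp else c          # bypass mux: skip the ripple when BP = 1
    return s, c_out, bp

block_ok = all(
    skip_block(a, b, c0)[0] + (skip_block(a, b, c0)[1] << 4) == a + b + c0
    for a, b, c0 in product(range(16), range(16), (0, 1))
)
print(block_ok)
```

The bypass does not change the logical result; it only shortens the path a carry-in takes when every bit in the block propagates.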
Carry-Skip Chain Implementation
[Figure: carry-skip chain for one block; the chain from block carry-in (Cin) to block carry-out (!Cout) is controlled by P3..P0 and G3..G0, with a bypass switch driven by BP]
16 bit, 4-bit Block Carry Skip Adder
[Figure: four blocks (bits 12 to 15, bits 8 to 11, bits 4 to 7, bits 0 to 3), each with Setup, Carry Propagation and Sum stages]
Worst-case delay, carry from bit 0 to bit 15 = carry generated in bit 0, ripples through bits 1, 2, and 3, skips the middle two groups (B is the group size in bits), ripples in the last group from bit 12 to bit 15
Tadd = tsetup + B tcarry + ((N/B) - 2) tskip + B tcarry + tsum
Optimal Skip Block Size and Add Time
- Assuming one stage of ripple (tcarry) has the same delay as one skip logic stage (tskip) and both are 1
  TCSkA = 1 + B + (N/B - 2) + B + 1 = 2B + N/B
  (the terms are, in order: tsetup, ripple in block 0, skips, ripple in last block, tsum)
- So the optimal block size, B, is
  dTCSkA/dB = 0, which gives Bopt = √(N/2)
- And the optimal time is
  Optimal TCSkA = 2 Bopt + N/Bopt = 4√(N/2) = 2√(2N)
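A numeric sketch of the unit-delay model, using the delay sum 1 + B + (N/B - 2) + B + 1 from this slide. It evaluates the delay at B = √(N/2) and at a few integer block sizes for N = 64. The function name `t_cska` is hypothetical, and non-integer N/B is allowed here purely to trace the curve.

```python
import math

def t_cska(n, b):
    """Carry-skip delay in unit delays: setup + ripple + skips + ripple + sum."""
    return 1 + b + (n / b - 2) + b + 1

n = 64
b_opt = math.sqrt(n / 2)          # continuous optimum, about 5.66 for N = 64
t_opt = t_cska(n, b_opt)          # equals 2 * sqrt(2 * N) under this model

for b in (2, 4, 8, 16):
    print(b, t_cska(n, b))
print(round(b_opt, 2), round(t_opt, 2))
```

For N = 64 the block sizes 4 and 8 both give 24 unit delays, on either side of the continuous optimum.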
Carry Skip Adder Extensions
- Variable block sizes
  - A carry that is generated in, or absorbed by, one of the inner blocks travels a shorter distance through the skip blocks, so can have bigger blocks for the inner carries without increasing the overall delay
  [Figure: carry-skip chain from Cin to Cout with larger blocks in the middle]
- Multiple levels of skip logic
  [Figure: carry-skip chain from Cin to Cout with skip level 1 and skip level 2; the second level uses the AND of the first level skip signals (BP's)]
RCA, Carry Skip Adder Comparison
[Figure: comparison of the RCA and carry skip adders, with regions marked B=2, B=3, B=4, B=5, B=6]
Prefix Adders
T = O(log n)
A = O(n log n)
Parallel Prefix Adders (PPAs)
- Define carry operator € on (G, P) signal pairs
  (G, P) = (G'', P'') € (G', P')
  where G = G'' | P''&G'
        P = P''&P'
  [Figure: a € cell taking (G'', P'') and (G', P') and producing (G, P), with its gate-level implementation]
  - € is associative, i.e.,
    [(g''', p''') € (g'', p'')] € (g', p') = (g''', p''') € [(g'', p'') € (g', p')]
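Associativity of the carry operator can be confirmed by brute force over all (G, P) pairs. In this sketch the function `carry_op` stands in for €, with the left argument playing the role of (G'', P''); the name is an illustrative choice.

```python
from itertools import product

def carry_op(hi, lo):
    """(G'', P'') op (G', P') = (G'' | P''&G', P''&P')."""
    g2, p2 = hi
    g1, p1 = lo
    return (g2 | (p2 & g1), p2 & p1)

pairs = list(product((0, 1), repeat=2))   # all four (G, P) combinations

associative = all(
    carry_op(carry_op(x, y), z) == carry_op(x, carry_op(y, z))
    for x in pairs for y in pairs for z in pairs
)
print(associative)
```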
PPA General Structure
- Given P and G terms for each bit position, computing all the carries is equal to finding all the prefixes in parallel
  (G0, P0) € (G1, P1) € (G2, P2) € … € (GN-2, PN-2) € (GN-1, PN-1)
- Since € is associative, we can group them in any order
  - but note that it is not commutative
- Adder stages
  - Pi, Gi logic (1 unit delay)
  - Ci parallel prefix logic tree (1 unit delay per level)
  - Si logic (1 unit delay)
- Measures to consider
  - number of € cells
  - tree cell depth (time)
  - tree cell area
  - cell fan-in and fan-out
  - max wiring length
  - wiring congestion
  - delay path variation (glitching)
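A sketch of carries as prefixes, computed serially here for clarity rather than in a tree. It assumes the more significant pair is the left operand of the operator, so the prefix for bit i combines pairs i down to 0. The names `carry_op` and `prefix_carries` are hypothetical.

```python
from functools import reduce

def carry_op(hi, lo):
    """Carry operator on (G, P) pairs; hi is the more significant operand."""
    g2, p2 = hi
    g1, p1 = lo
    return (g2 | (p2 & g1), p2 & p1)

def prefix_carries(a, b, n, cin=0):
    """Return [C0, C1, ..., Cn] from the prefixes of the (G, P) pairs."""
    gp = [(((a >> i) & (b >> i)) & 1, ((a >> i) ^ (b >> i)) & 1) for i in range(n)]
    carries = [cin]
    for i in range(n):
        # Prefix over bits i..0, folding each higher pair onto the running result.
        g, p = reduce(lambda acc, x: carry_op(x, acc), gp[:i + 1])
        carries.append(g | (p & cin))
    return carries

print(prefix_carries(0b1011, 0b0110, 4))

# Operand order matters: the operator is not commutative.
print(carry_op((1, 0), (0, 0)), carry_op((0, 0), (1, 0)))
```

Because the operator is associative, a tree can evaluate the same prefixes with any grouping, as long as the operand order is preserved.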
Brent-Kung PPA
[Figure: 16-bit Brent-Kung parallel prefix computation on inputs (G0, P0) to (G15, P15), producing C1 to C16. The two halves of the tree are annotated T = log2 N and T = log2 N - 2; the tree dimensions are annotated A = N/2 and A = 2 log2 N.]
A Faster Yet PPA
- Brent-Kung (BK) adder has the time bound of
  TBK = 1 + (2 logN - 2) + 1
- There are even faster PPA approaches that are used in most modern day machines for operands of 32 bits or greater
- Kogge-Stone (KS)
  - faster pp tree (logN for KS versus 2 logN - 2 for BK)
  - fan-out of carry cell € limited to two
  - takes more € cells (NlogN - N + 1 for KS versus 2N - 2 - logN for BK) and has more wiring
Kogge-Stone PPF Adder
[Figure: 16-bit Kogge-Stone parallel prefix computation on inputs (G0, P0) to (G15, P15) and Cin, producing C1 to C16. Annotations: T = log2 N, tree dimensions A = N and A = log2 N.]
Tadd = tsetup + log2 N t€ + tsum
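A behavioral sketch of a Kogge-Stone style prefix network, assuming N is a power of two and that each level combines position i with position i minus the current distance. It also counts levels and operator cells so they can be compared with log2 N and NlogN - N + 1. The function names are illustrative, not taken from the lecture.

```python
def carry_op(hi, lo):
    """Carry operator on (G, P) pairs; hi is the more significant operand."""
    g2, p2 = hi
    g1, p1 = lo
    return (g2 | (p2 & g1), p2 & p1)

def kogge_stone_add(a, b, n=16, cin=0):
    """Return (sum including carry-out, tree levels, operator cells used)."""
    gp = [(((a >> i) & (b >> i)) & 1, ((a >> i) ^ (b >> i)) & 1) for i in range(n)]
    p_bits = [p for _, p in gp]            # keep the bit-level P for the sum logic
    levels = cells = 0
    dist = 1
    while dist < n:
        nxt = list(gp)
        for i in range(dist, n):           # every cell in a level works in parallel
            nxt[i] = carry_op(gp[i], gp[i - dist])
            cells += 1
        gp = nxt
        dist *= 2
        levels += 1
    # gp[i] now spans bits i..0, so C[i+1] = G | P&Cin.
    carries = [cin] + [g | (p & cin) for g, p in gp]
    total = carries[n] << n
    for i in range(n):
        total |= (p_bits[i] ^ carries[i]) << i
    return total, levels, cells

print(kogge_stone_add(0xFFFF, 1))
```

For N = 16 the sketch uses 4 levels and 49 cells, matching log2 N and NlogN - N + 1.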
PPA Comparisons

Measure          | BK PPA               | N=64 | KS PPA         | N=64
# of € cells     | 2N - 2 - logN        | 120  | NlogN - N + 1  | 321
tree depth       | 2logN - 2            | 10   | logN           | 6
tree area (WxH)  | (N/2) * (2logN - 2)  | 320  | N * logN       | 384
cell fan-in      | 2                    |      | 2              |
cell fan-out     | logN                 | 6    | 2              | 2
max wire length  | N/4                  | 16   | N/2            | 32
wiring density   | sparse               |      | dense          |
glitching        | high                 |      | low            |
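The N = 64 columns can be reproduced by evaluating the closed-form expressions in the table. This is a sketch assuming log means log base 2 and N is a power of two; the function and key names are hypothetical.

```python
import math

def bk_measures(n):
    """Brent-Kung measures from the table's formulas."""
    lg = int(math.log2(n))
    return {
        "cells": 2 * n - 2 - lg,
        "depth": 2 * lg - 2,
        "area": (n // 2) * (2 * lg - 2),
        "fan_out": lg,
        "max_wire": n // 4,
    }

def ks_measures(n):
    """Kogge-Stone measures from the table's formulas."""
    lg = int(math.log2(n))
    return {
        "cells": n * lg - n + 1,
        "depth": lg,
        "area": n * lg,
        "fan_out": 2,
        "max_wire": n // 2,
    }

print(bk_measures(64))
print(ks_measures(64))
```

With these expressions the Brent-Kung cell count for N = 64 evaluates to 2(64) - 2 - 6 = 120.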
More Adder Comparisons
[Figure: adder comparison chart]
Next Lecture and Reminders
- Next lecture
  - Multiplier Design
    - Reading assignment: Rabaey, et al, 11.4
- Reminders
  - HW#4 due November 11th (not Nov 4th as on outline)
  - HW#5 will be optional (due November 20th)
  - Project final reports due December 4th
  - Final grading negotiations/correction (except for the final exam) must be concluded by December 10th
  - Final exam scheduled
    - Tuesday, December 16th from 10:10 to noon in 118 and 113 Thomas