ECE 425 VLSI Circuit Design Lecture 23 Subsystem

ECE 425 - VLSI Circuit Design
Lecture 23 Subsystem Design
Spring 2007
Prof. John Nestor
ECE Department, Lafayette College
Easton, Pennsylvania 18042
nestorj@lafayette.edu
ECE 425 Spring 2007 Lecture 23 - Subsystem Design

Announcements
- Reading
  - Wolf: 6.1-6.9
- These notes are drawn in part from handouts by
  - J. Rabaey, Digital Integrated Circuits, © Prentice-Hall 1995.

Where We Are:
- Last Time:
  - More about Project Pad Frame & Comparator
  - Streaming Video: Two Talks on Chip-Level Design
- Today:
  - Custom Subsystem Design
    • General approach
    • Shifters
    • Adders

Subsystem Design
- General Techniques
  - Goals
  - Pipelining
  - Datapath Design
- Common Subsystems
  - Shifters
  - Adders
  - ALUs
  - Multipliers
  - Memories
  - Structure Logic

Subsystem Design (Ch. 6)
- Goals of Custom Subsystem Design
  - Maximize performance
  - Minimize area
  - Fit together with other subsystems
- Key idea: optimize across levels of abstraction
  - Layout
  - Circuit
  - Logic
  - Register-Transfer and Higher

Optimizing for Performance and/or Area
- Layout level
  - Microscopic changes:
    • move wires, change wire sizing
    • add vias
    • reduce source/drain cap, etc.
  - Macroscopic changes:
    • cell placement
    • design of hierarchy
- Circuit level
  - Transistor sizing
  - Advanced circuits (e.g., dynamic logic)

Optimizing for Performance and/or Area (cont'd)
- Logic level
  - Use specialized designs (e.g., shifters, ALUs, etc.)
  - Flatten to reduce delay
  - Restructure
- Register-transfer level (and above)
  - Place latches/flip-flops to maximize performance (retiming)
  - Encode FSMs to minimize area/delay
  - Perform computations in parallel with extra hardware if cost permits
  - Pipeline logic to increase performance

Pipelining
- Key idea: partition a combinational function with latches / flip-flops
  - Each partition is called a stage
  - Time between successive results: one clock period
  - Latency: number of clock cycles before a result appears (== number of stages)

Example - Before Pipelining
- Comb. logic delay
  - tp = 80 ns*
- Latch/flip-flop setup
  - tsu = 5 ns
- Clock period
  - tclk = tp + tsu = 85 ns
* Archaic TTL timing values - divide by approx. 100 for VLSI!

Example - After Pipelining
- Comb. logic delay
  - tp1 = tp2 = 40 ns
- Latch setup
  - tsu = 5 ns
- Clock period
  - tclk = 40 ns + 5 ns = 45 ns
- Latency: 2 cycles
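The before/after numbers can be reproduced with a short calculation. This is a minimal sketch that follows the slides' simplified model (clock period = slowest stage delay + setup time, ignoring clock-to-Q delay and skew); the function and key names are hypothetical.

```python
def clock_period_ns(stage_delays_ns, t_setup_ns):
    """Clock period is set by the slowest stage plus the register setup time."""
    return max(stage_delays_ns) + t_setup_ns


def summarize(stage_delays_ns, t_setup_ns):
    """Return clock period, latency, and throughput for a list of stage delays."""
    t_clk = clock_period_ns(stage_delays_ns, t_setup_ns)
    stages = len(stage_delays_ns)
    return {
        "t_clk_ns": t_clk,
        "latency_cycles": stages,          # one cycle per stage
        "latency_ns": stages * t_clk,
        "results_per_us": 1000.0 / t_clk,  # one result per clock period
    }


before = summarize([80], 5)      # unpipelined: a single 80 ns stage
after = summarize([40, 40], 5)   # two balanced 40 ns stages

print(before)
print(after)
print("throughput gain:", before["t_clk_ns"] / after["t_clk_ns"])
```

In this model the throughput gain is 85/45, a little under 2x, while the time for one result to emerge grows from 85 ns to 90 ns: pipelining buys throughput, not latency.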

Pipelining Comments
- Impact on performance
  - Increases operations per unit time
  - Increases latency
  - Added overhead due to register setup times
- Design concerns / limits
  - Balance stage delays for best performance
  - Structure of logic may limit number of stages

Effect of Adding Pipeline Stages
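The figure for this slide is not reproduced here. As a rough illustration only, the sketch below assumes the 80 ns logic block from the earlier example can be split into perfectly balanced stages, each paying the same 5 ns setup overhead; real logic rarely divides this evenly, and the function name is hypothetical.

```python
def pipelined_clock_ns(t_logic_ns, n_stages, t_setup_ns):
    """Idealized clock period: logic split evenly, plus setup time per stage."""
    return t_logic_ns / n_stages + t_setup_ns


T_LOGIC_NS = 80   # total combinational delay (from the earlier example)
T_SETUP_NS = 5    # register setup time (from the earlier example)

baseline = pipelined_clock_ns(T_LOGIC_NS, 1, T_SETUP_NS)
for n in (1, 2, 4, 8, 16):
    t_clk = pipelined_clock_ns(T_LOGIC_NS, n, T_SETUP_NS)
    print(f"{n:2d} stages: t_clk = {t_clk:5.1f} ns, "
          f"speedup = {baseline / t_clk:4.2f}x, "
          f"latency = {n * t_clk:5.1f} ns")
```

Under these assumptions the speedup can never exceed (80 + 5) / 5 = 17, because the setup overhead is paid in every stage no matter how finely the logic is divided.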

Custom Datapath Design
- Goal: create a tight design of several elements
  - Arithmetic / Logic Functions, Shifters
  - Storage: Registers, Register Files
  - Interconnect: wires, buses

Datapath Physical Design
- Bit-sliced layout of each component
- Connection by abutment
- "Pitch-matched" connections
- Designed using "wiring plan"
[Figure: wiring plan]

Bus Design in Datapaths
- Key idea: replace multiplexers with distributed drivers for long connections
- Pseudo-nMOS NOR: Fig. 6-8, p. 318
  • high power
  • simple design
- Precharged: Fig. 6-9, p. 319
  • lower power
  • more complex design
- In either case, careful circuit design and interconnect modeling are essential
[Figures: pseudo-nMOS bus; precharged bus]

Ancient Example: the Motorola 68K

Shifter Design
- Why shift?
  - Arithmetic operations
  - Floating-point
  - Bit field extraction
- Shift register - one shift per clock cycle
- Hardware shifters - implement as comb. logic
  - Single-bit shifters
  - Barrel shifters
  - Logarithmic shifters

Single-Bit Shifter
- Essentially a MUX made from pass transistors
Source: J. Rabaey, Digital

Barrel Shifter
- pass transistors connect input bit to chosen output
- regular layout
- each signal flows through only one trans. gate
- area dominated by pitch of metal wires
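A behavioral sketch of the barrel-shifter idea: one control line per shift amount (one-hot), and each output bit connected to exactly one input bit through a single switch. The right-shift direction, the `fill` bit used for vacated positions, and all names are assumptions made for illustration; the circuit in the figure may treat vacated bits differently.

```python
def barrel_shift_right(bits, one_hot_select, fill=0):
    """Model a crosspoint array. bits[0] is the LSB.

    one_hot_select[k] == 1 requests a shift by k positions; exactly one
    control line may be active, so every output sees exactly one switch.
    """
    if sum(one_hot_select) != 1:
        raise ValueError("exactly one shift-control line must be active")
    n = len(bits)
    k = one_hot_select.index(1)
    out = []
    for i in range(n):
        src = i + k  # the single crosspoint turned on for output i
        out.append(bits[src] if src < n else fill)
    return out


# 13 = 0b1101, stored LSB first; shifting right by 1 gives 6 = 0b0110.
word = [1, 0, 1, 1]
print(barrel_shift_right(word, [0, 1, 0, 0]))
```

An n-bit shifter of this kind needs on the order of n x n switches and n control wires, which fits the slide's remark that the wiring dominates the area.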

4x4 Barrel Shifter - Layout
Width_barrel ≈ 2 p_m M (p_m: metal-wire pitch)
Source: J.

Logarithmic Shifter
- Combine shifts of powers-of-two
Source: J. Rabaey, Digital Integrated Circuits ©
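Combining power-of-two shifts can be sketched as a cascade of stages that shift by 1, 2, 4, ... positions, with stage j enabled by bit j of the binary shift amount. The right-shift direction, zero fill, and the helper names below are assumptions for illustration.

```python
def to_bits(value, n):
    """Integer -> list of n bits, LSB first."""
    return [(value >> i) & 1 for i in range(n)]


def from_bits(bits):
    """List of bits (LSB first) -> integer."""
    return sum(bit << i for i, bit in enumerate(bits))


def log_shift_right(bits, amount, fill=0):
    """Shift right by `amount` using stages of 1, 2, 4, ... positions."""
    n = len(bits)
    if not 0 <= amount < n:
        raise ValueError("shift amount must be in the range 0..n-1")
    stage_shift = 1
    while stage_shift < n:
        if amount & stage_shift:  # this stage's control bit is set
            bits = [bits[i + stage_shift] if i + stage_shift < n else fill
                    for i in range(n)]
        stage_shift *= 2
    return bits


print(from_bits(log_shift_right(to_bits(0b10110100, 8), 5)))
```

An 8-bit shifter needs only three such stages, but each signal now passes through one switch per stage rather than the single switch of the barrel shifter.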

Logarithmic Shifter - Layout Source: J. Rabaey, Digital Integrated Circuits © 1995 Prentice-Hall

Adder Design
- Review: Full Adder
  - Sum:   s_i = a_i XOR b_i XOR c_i
  - Carry: c_{i+1} = a_i*b_i + a_i*c_i + b_i*c_i

  A_i B_i C_i | S_i C_{i+1}
   0   0   0  |  0    0
   0   0   1  |  1    0
   0   1   0  |  1    0
   0   1   1  |  0    1
   1   0   0  |  1    0
   1   0   1  |  0    1
   1   1   0  |  0    1
   1   1   1  |  1    1
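The sum and carry equations translate directly into bitwise operations. A minimal sketch (the function name is hypothetical) that also prints the full truth table:

```python
def full_adder(a, b, c):
    """One-bit full adder: returns (sum, carry_out) for input bits a, b, c."""
    s = a ^ b ^ c                        # sum: XOR of the three inputs
    c_out = (a & b) | (a & c) | (b & c)  # carry: majority of the three inputs
    return s, c_out


print("a b c | s c_out")
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            s, c_out = full_adder(a, b, c)
            print(f"{a} {b} {c} | {s} {c_out}")
```

A quick sanity check is that `s + 2 * c_out` always equals `a + b + c`, the arithmetic sum of the three input bits.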

Adder Design (cont'd)
- Ripple: constructed from n full adders
  - Compact, but delay proportional to n
  - May be tolerable when n=8, BUT
  - What about n=32? Potential worst cases:
    • A0 or B0 to S31
    • A0 or B0 to C32
[Figure: 4-bit ripple-carry adder; full adder i takes A_i, B_i, C_i and produces S_i, C_{i+1}, with C_0 = 0]
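A behavioral sketch of the ripple structure: the loop below visits the bit positions in order because each stage needs the carry produced by the previous one, which is the software analogue of delay proportional to n. The names are hypothetical, and operands are passed as plain integers for convenience.

```python
def full_adder(a, b, c):
    """One-bit full adder: returns (sum, carry_out)."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def ripple_add(a, b, n, c0=0):
    """Add two n-bit integers by chaining n full adders.

    Returns (n-bit sum, carry out of the most significant stage).
    """
    carry = c0
    total = 0
    for i in range(n):  # stage i must wait for the carry from stage i-1
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    return total, carry


# Worst case for n = 32: a carry created at bit 0 ripples through every stage.
print(ripple_add(0xFFFFFFFF, 1, 32))
```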

Full Adder - Static CMOS Source: J. Rabaey, Digital Integrated Circuits © 1995 Prentice-Hall

Inversion Property Source: J. Rabaey, Digital Integrated Circuits © 1995 Prentice-Hall
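The slide itself is a figure. The property it names, as commonly stated for the full adder, is that complementing all three inputs complements both outputs. The sketch below checks this exhaustively; the function names are hypothetical.

```python
def full_adder(a, b, c):
    """One-bit full adder: returns (sum, carry_out)."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def inversion_property_holds():
    """Check that inverting every input inverts both sum and carry."""
    for a in (0, 1):
        for b in (0, 1):
            for c in (0, 1):
                s, c_out = full_adder(a, b, c)
                s_inv, c_inv = full_adder(1 - a, 1 - b, 1 - c)
                if (s_inv, c_inv) != (1 - s, 1 - c_out):
                    return False
    return True


print(inversion_property_holds())
```

This is what allows a carry chain to alternate true and complemented signals from stage to stage instead of inserting an inverter in every cell.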

Inversion Adder Source: J. Rabaey, Digital Integrated Circuits © 1995 Prentice-Hall

Mirror Adder: A Better Structure Source: J. Rabaey, Digital Integrated Circuits © 1995 Prentice-Hall

Mirror adder notes
• The NMOS and PMOS chains are completely symmetrical. This guarantees identical rising and falling transitions if the NMOS and PMOS devices are properly sized. A maximum of two series transistors can be observed in the carry-generation circuitry.
• When laying out the cell, the most critical issue is the minimization of the capacitance at node Co. The reduction of the diffusion capacitances is particularly important.
• The capacitance at node Co is composed of four diffusion capacitances, two internal gate capacitances, and six gate capacitances in the connecting adder cell.
• The transistors connected to Ci are placed closest to the output.
• Only the transistors in the carry stage have to be optimized for optimal speed. All transistors in the sum stage can be minimal size.
Source: J. Rabaey, Digital Integrated Circuits © 1995 Prentice-Hall

Dynamic Adder - np-CMOS 17 transistors Source: J. Rabaey, Digital Integrated Circuits © 1995

Layout - Dynamic np-CMOS adder Source: J. Rabaey, Digital Integrated Circuits © 1995 Prentice-Hall

Speeding up Carry - Carry Lookahead
- Key idea: trade off delay against amount of logic used
  - Benefit: faster addition
  - Cost: much more logic
- Define two signals for each adder stage:
  - Generate:  g_i = a_i*b_i
  - Propagate: p_i = a_i + b_i
- Why use these names?
  - Adder stage i will always generate a carry if a_i and b_i are both true
  - Adder stage i will propagate a carry input if either or both of a_i, b_i are true
[Figure: full-adder stage 0 (A_0, B_0, C_0, S_0, C_1) annotated with example bit values]
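A small check that the generate/propagate signals capture the carry exactly: with g = a AND b and p = a OR b, the carry out equals g + p*c for every input combination. The function names are hypothetical, and the "kill" label for the case a = b = 0 is a commonly used term that does not appear on the slide.

```python
def carry_majority(a, b, c):
    """Carry out from the original full-adder equation."""
    return (a & b) | (a & c) | (b & c)


def carry_gp(a, b, c):
    """Carry out rewritten with generate and propagate."""
    g = a & b  # generate: carry out regardless of carry in
    p = a | b  # propagate: carry out follows carry in
    return g | (p & c)


def stage_behavior(a, b):
    """Classify what a stage does with an incoming carry."""
    if a & b:
        return "generate"
    if a | b:
        return "propagate"
    return "kill"  # assumption: standard name, not used on the slide


for a in (0, 1):
    for b in (0, 1):
        print(a, b, stage_behavior(a, b),
              [carry_gp(a, b, c) for c in (0, 1)])
```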

Carry Lookahead (cont’d)
- Now rewrite the carry output as a function of p_i, g_i, and c_i
  - Original eqn: c_{i+1} = a_i*b_i + a_i*c_i + b_i*c_i
  - New eqn:      c_{i+1} = g_i + p_i*c_i
- "Flatten" the carry function in terms of g_i, p_i:
    c_1 = g_0 + p_0*c_0
    c_2 = g_1 + p_1*c_1 = g_1 + p_1*(g_0 + p_0*c_0) = g_1 + p_1*g_0 + p_1*p_0*c_0
    c_3 = g_2 + p_2*g_1 + p_2*p_1*g_0 + p_2*p_1*p_0*c_0
    c_4 = g_3 + p_3*g_2 + p_3*p_2*g_1 + p_3*p_2*p_1*g_0 + p_3*p_2*p_1*p_0*c_0
- Add carry lookahead logic that computes c_1 through c_4 in terms of p_0 through p_3, g_0 through g_3, and c_0
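The flattened carry equations for a 4-bit block can be written out and checked against ordinary addition. In this sketch each carry is computed directly from the g, p, and c_0 signals (the last term of c_k is the product of all propagate signals below bit k with c_0), never from another carry. The function name and the bit-list interface are hypothetical.

```python
def cla4(a_bits, b_bits, c0):
    """4-bit carry-lookahead adder. Bit lists are LSB first.

    Returns (list of four sum bits, carry out c4).
    """
    g = [a & b for a, b in zip(a_bits, b_bits)]  # generate signals
    p = [a | b for a, b in zip(a_bits, b_bits)]  # propagate signals

    # Every carry depends only on g, p, and c0 (two-level logic).
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = (g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
          | (p[2] & p[1] & p[0] & c0))
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0])
          | (p[3] & p[2] & p[1] & p[0] & c0))

    carries_in = [c0, c1, c2, c3]
    sums = [a ^ b ^ c for a, b, c in zip(a_bits, b_bits, carries_in)]
    return sums, c4


# 11 + 6 = 17: sum bits (LSB first) encode 1, carry out is 1.
print(cla4([1, 1, 0, 1], [0, 1, 1, 0], 0))
```

The widest product term grows with the bit position, which is the "much more logic" cost noted earlier and the reason lookahead is usually applied to small blocks.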

Logarithmic Lookahead: Brent-Kung Adder Source: J. Rabaey, Digital Integrated Circuits © 1995 Prentice-Hall
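The slide shows only a figure. As a rough behavioral sketch, the code below uses the up-sweep/down-sweep prefix structure usually associated with the Brent-Kung adder: (g, p) pairs of adjacent groups are merged with an associative operator so that every carry is available after roughly 2*log2(n) levels of logic. The power-of-two width restriction and all names are assumptions made for illustration, not taken from the figure.

```python
def combine(hi, lo):
    """Merge (g, p) of a more-significant group with a less-significant one."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return g_hi | (p_hi & g_lo), p_hi & p_lo


def brent_kung_prefix(gp):
    """Return x where x[i] is the group (G, P) for bit positions 0..i."""
    x = list(gp)
    n = len(x)
    if n & (n - 1):
        raise ValueError("this sketch assumes a power-of-two width")
    d = 1
    while d < n:  # up-sweep: build groups of width 2, 4, 8, ...
        for i in range(2 * d - 1, n, 2 * d):
            x[i] = combine(x[i], x[i - d])
        d *= 2
    d = n // 4
    while d >= 1:  # down-sweep: fill in the remaining prefixes
        for i in range(3 * d - 1, n, 2 * d):
            x[i] = combine(x[i], x[i - d])
        d //= 2
    return x


def brent_kung_add(a, b, n, c0=0):
    """Add two n-bit integers; returns (n-bit sum, carry out)."""
    gp = [(((a >> i) & (b >> i)) & 1, ((a >> i) | (b >> i)) & 1)
          for i in range(n)]
    prefix = brent_kung_prefix(gp)
    # carries[i] is the carry into bit i; carries[n] is the carry out.
    carries = [c0] + [g | (p & c0) for g, p in prefix]
    total = 0
    for i in range(n):
        total |= (((a >> i) ^ (b >> i) ^ carries[i]) & 1) << i
    return total, carries[n]


print(brent_kung_add(200, 100, 8))
```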

Coming Up:
- More about adders
- ALUs
- Memories
- Structure Logic