Introduction to CMOS VLSI Design
Lecture 19: Design for Skew
Credits: David Harris, Harvey Mudd College (material taken/adapted from Harris’ lecture notes)
19: Design for Skew 1

Outline
• Clock Distribution
• Clock Skew-Tolerant Static Circuits
• Traditional Domino Circuits
• Skew-Tolerant Domino Circuits

Clocking
• Synchronous systems use a clock to keep operations in sequence
  – Distinguish this operation from the previous or next one
  – Determine the speed at which the machine operates
• Clock must be distributed to all the sequencing elements
  – Flip-flops and latches
• Also distribute clock to other elements
  – Domino circuits and memories

Clock Distribution
• On a small chip, the clock distribution network is just a wire
  – And possibly an inverter for clkb (clock’s complement)
• On practical chips, the RC delay of the wire resistance and gate load is very long
  – Variations in this delay cause clock to get to different elements at different times
  – This is called clock skew
• Most chips use repeaters to buffer the clock and equalize the delay
  – Reduces but doesn’t eliminate skew
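
One way to see how wire RC turns into skew is a first-order Elmore estimate. The sketch below is not from the lecture: it assumes a lumped pi-model wire, and the driver resistance, per-millimeter wire parasitics, and load capacitance are made-up illustrative values rather than data for any real process.

```python
def elmore_delay(r_driver, r_wire, c_wire, c_load):
    """First-order Elmore delay (seconds) of a driver, a pi-model wire, and a lumped load."""
    # The driver resistance charges all downstream capacitance; the wire
    # resistance charges the far half of the wire capacitance plus the load.
    return r_driver * (c_wire + c_load) + r_wire * (c_wire / 2 + c_load)


def branch_delay(length_mm, r_driver=100.0, c_load=50e-15,
                 r_per_mm=50.0, c_per_mm=0.2e-12):
    """Delay of one clock branch. All default values are hypothetical."""
    r_wire = length_mm * r_per_mm
    c_wire = length_mm * c_per_mm
    return elmore_delay(r_driver, r_wire, c_wire, c_load)


def skew(length_a_mm, length_b_mm):
    """Arrival-time difference between two branches fed by identical drivers."""
    return abs(branch_delay(length_a_mm) - branch_delay(length_b_mm))


if __name__ == "__main__":
    print(f"2 mm branch: {branch_delay(2.0) * 1e12:.1f} ps")
    print(f"4 mm branch: {branch_delay(4.0) * 1e12:.1f} ps")
    print(f"skew:        {skew(2.0, 4.0) * 1e12:.1f} ps")
```

Because the wire term grows with the square of the length in this model, unequal branches diverge quickly, which is consistent with the slide's point that repeaters reduce but do not eliminate skew.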

Example
• Skew comes from differences in gate and wire delay
  – With right buffer sizing, clk1 and clk2 could ideally arrive at the same time
  – But power supply noise changes buffer delays
  – clk2 and clk3 will always see RC skew

Review: Skew Impact
• Ideally full cycle is available for work
• Skew adds sequencing overhead
• Makes hold time worse
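
The two effects named above can be made concrete with the usual flip-flop timing constraints. This is a sketch under stated assumptions: the parameter names (t_pcq, t_ccq, t_pd, t_cd, t_setup, t_hold, t_skew) follow common textbook notation and do not appear on the slide, and the numbers are hypothetical picoseconds.

```python
def min_cycle_time(t_pcq, t_pd, t_setup, t_skew):
    """Setup constraint: shortest cycle that still meets setup with skew present."""
    # Data launches t_pcq after a (possibly late) clock edge, travels through
    # t_pd of logic, and must arrive t_setup before a (possibly early) next edge.
    return t_pcq + t_pd + t_setup + t_skew


def hold_slack(t_ccq, t_cd, t_hold, t_skew):
    """Hold constraint: positive means safe, negative means a hold violation."""
    # Skew can make the capturing flop see its edge late, so it is
    # subtracted from the minimum-delay path.
    return t_ccq + t_cd - t_hold - t_skew


if __name__ == "__main__":
    # Hypothetical values in ps.
    print(min_cycle_time(t_pcq=50, t_pd=500, t_setup=40, t_skew=0))   # no skew
    print(min_cycle_time(t_pcq=50, t_pd=500, t_setup=40, t_skew=60))  # with skew
    print(hold_slack(t_ccq=30, t_cd=20, t_hold=25, t_skew=0))
    print(hold_slack(t_ccq=30, t_cd=20, t_hold=25, t_skew=60))
```

In this model skew is added directly to the cycle time and subtracted directly from the hold margin, so a path that was hold-safe without skew can fail once skew is budgeted.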

Cycle Time Trends
• Much of CPU performance comes from higher clock frequency f
  – f is improving faster than simple process shrinks
  – Sequencing overhead is a bigger part of the cycle

Solutions
• Reduce clock skew
  – Careful clock distribution network design
  – Plenty of metal wiring resources
• Analyze clock skew
  – Only budget actual, not worst case skews
  – Local vs. global skew budgets
• Tolerate clock skew
  – Choose circuit structures insensitive to skew

Clock Dist. Networks
• Ad hoc
• Grids
• H-tree
• Hybrid

Clock Grids
• Use grid on two or more levels to carry clock
• Make wires wide to reduce RC delay
• Ensures low skew between nearby points
• But possibly large skew across die

Alpha Clock Grids
[Figure: Alpha clock grids]

H-Trees
• Fractal structure
  – Gets clock arbitrarily close to any point
  – Matched delay along all paths
• Delay variations cause skew
• A and B might see big skew
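
The "matched delay along all paths" property follows from the geometry: every leaf is reached by the same sequence of wire lengths. Below is a minimal sketch that builds an idealized H-tree recursively and records the routed length to each leaf. The function name and dimensions are hypothetical, and the model ignores buffers, obstructions, and process variation, which are exactly what introduce skew in practice.

```python
def h_tree_leaves(x, y, half, levels, path=0.0):
    """Return (leaf_x, leaf_y, wire_length_from_root) for an idealized H-tree.

    At each level the clock runs `half` horizontally along the bar of the H,
    then `half` vertically along a leg, and the next H is half the size.
    """
    if levels == 0:
        return [(x, y, path)]
    leaves = []
    for dx in (-half, half):
        for dy in (-half, half):
            leaves.extend(
                h_tree_leaves(x + dx, y + dy, half / 2, levels - 1,
                              path + abs(dx) + abs(dy))
            )
    return leaves


if __name__ == "__main__":
    leaves = h_tree_leaves(0.0, 0.0, half=4.0, levels=3)
    lengths = {length for _, _, length in leaves}
    print(len(leaves), "leaves, distinct path lengths:", lengths)
```

With three levels the tree reaches 64 evenly spaced points, all at the same wire length from the root, so any skew between two leaves has to come from delay variation rather than from nominal routing.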

Itanium 2 H-Tree
• Four levels of buffering:
  – Primary driver
  – Repeater
  – Second-level clock buffer
  – Gater
• Route around obstructions

Hybrid Networks
• Use H-tree to distribute clock to many points
• Tie these points together with a grid
• Ex: IBM Power4, PowerPC
  – H-tree drives 16-64 sector buffers
  – Buffers drive total of 1024 points
  – All points shorted together with grid

Skew Tolerance
• Flip-flops are sensitive to skew because of hard edges
  – Data launches at latest rising edge of clock
  – Must set up before earliest next rising edge of clock
  – Overhead would shrink if we could soften the edge
• Latches tolerate moderate amounts of skew
  – Data can arrive anytime latch is transparent

Skew: Latches
[Figures: 2-Phase Latches; Pulsed Latches]

Dynamic Circuit Review
• Static circuits are slow because fat pMOS transistors load the inputs
• Dynamic gates use precharge to remove pMOS transistors from the inputs
  – Precharge: φ = 0, output forced high
  – Evaluate: φ = 1, output may pull low
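
The precharge/evaluate behavior can be captured in a few lines of behavioral code. This is only an illustrative logic-level model, not a circuit simulation; the class name and the `pulldown` callback (which stands in for the nMOS pulldown network) are hypothetical.

```python
class DynamicGate:
    """Logic-level model of a dynamic gate with clock phi."""

    def __init__(self, pulldown):
        # pulldown(*inputs) returns True when the nMOS network conducts.
        self.pulldown = pulldown
        self.out = 1

    def step(self, phi, *inputs):
        if phi == 0:
            self.out = 1              # precharge: output forced high
        elif self.pulldown(*inputs):
            self.out = 0              # evaluate: output pulled low
        # During evaluate nothing can pull the output back up, so a
        # discharged node stays low until the next precharge.
        return self.out


if __name__ == "__main__":
    gate = DynamicGate(lambda a, b: bool(a and b))  # dynamic NAND-style pulldown
    print(gate.step(0, 0, 0))  # precharge
    print(gate.step(1, 1, 1))  # evaluate, both inputs high
    print(gate.step(1, 0, 0))  # inputs fall during evaluate
    print(gate.step(0, 0, 0))  # next precharge
```

The third call shows why input behavior during evaluation matters: once the output has discharged, inputs that fall again cannot restore it within the same phase.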

Domino Circuits
• Dynamic inputs must monotonically rise during evaluation
  – Place inverting stage between each dynamic gate
  – Dynamic / static pair called domino gate
• Domino gates can be safely cascaded

Domino Timing
• Domino gates are 1.5–2x faster than static CMOS
  – Lower logical effort because of reduced Cin
• Challenge is to keep precharge off critical path
• Look at clocking schemes for precharge and eval
  – Traditional schemes have severe overhead
  – Skew-tolerant domino hides this overhead

Traditional Domino Ckts
• Hide precharge time by ping-ponging between half-cycles
  – One evaluates while other precharges
  – Latches hold results during precharge

Clock Skew
• Skew increases sequencing overhead
  – Traditional domino has hard edges
  – Evaluate at latest rising edge
  – Setup at latch by earliest falling edge

Time Borrowing
• Logic may not exactly fit half-cycle
  – No flexibility to borrow time to balance logic between half cycles
• Traditional domino sequencing overhead is about 25% of cycle time in fast systems!
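
A rough way to see where an overhead of this size can come from: with hard edges, each half-cycle loses the latch setup time plus the clock skew. The sketch below assumes exactly that simple model and ignores the additional loss from logic that does not balance between half-cycles. The function names and the picosecond values are hypothetical, chosen only to illustrate the arithmetic.

```python
def traditional_domino_overhead(t_setup, t_skew):
    """Per-cycle overhead when each of the two half-cycles loses t_setup + t_skew."""
    return 2 * (t_setup + t_skew)


def overhead_fraction(t_cycle, t_setup, t_skew):
    """Fraction of the cycle unavailable for logic under the model above."""
    return traditional_domino_overhead(t_setup, t_skew) / t_cycle


if __name__ == "__main__":
    # Hypothetical numbers in ps, not measurements from any design.
    t_cycle, t_setup, t_skew = 1000, 75, 50
    print(traditional_domino_overhead(t_setup, t_skew), "ps lost per cycle")
    print(f"{overhead_fraction(t_cycle, t_setup, t_skew):.0%} of the cycle")
```

Since the overhead is a fixed time while the cycle keeps shrinking, the fraction grows as the clock gets faster.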

Relaxing the Timing
• Sequencing overhead caused by hard edges
  – Data departs dynamic gate on late rising edge
  – Must set up at latch on early falling edge
• Latch functions
  – Prevent glitches on inputs of domino gates
  – Hold results during precharge
• Is the latch really necessary?
  – No glitches if inputs come from other domino
  – Can we hold the results in another way?

Skew-Tolerant Domino
• Use overlapping clocks to eliminate latches at phase boundaries
  – Second phase evaluates using results of first

Full Keeper
• After second phase evaluates, first phase precharges
• Input to second phase falls
  – Violates monotonicity?
• But we no longer need the value
• Now the second gate has a floating output
  – Need full keeper to hold it either high or low

Time Borrowing
• Overlap can be used to
  – Tolerate clock skew
  – Permit time borrowing
• No sequencing overhead

Multiple Phases
• With more clock phases, each phase overlaps more
  – Permits more skew tolerance and time borrowing
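
The growth in overlap follows from the clock geometry. If N phases are evenly spaced by Tc/N and each stays high for a fraction `duty` of the cycle, consecutive phases are both high for duty*Tc - Tc/N. The sketch below assumes that idealized waveform, and further assumes (as a simplification not stated on the slide) that the overlap left after subtracting skew and a hold margin is what remains for time borrowing. All names and numbers are hypothetical.

```python
def phase_windows(n_phases, t_cycle, duty=0.5):
    """(rise, fall) time of each phase; phase i is delayed by i * t_cycle / n_phases."""
    return [(i * t_cycle / n_phases, i * t_cycle / n_phases + duty * t_cycle)
            for i in range(n_phases)]


def overlap(n_phases, t_cycle, duty=0.5):
    """Time during which two consecutive phases are both in evaluation."""
    return duty * t_cycle - t_cycle / n_phases


def borrowable_time(n_phases, t_cycle, t_skew, t_hold=0.0, duty=0.5):
    """Overlap remaining for time borrowing after budgeting skew and hold (assumed model)."""
    return max(0.0, overlap(n_phases, t_cycle, duty) - t_skew - t_hold)


if __name__ == "__main__":
    t_cycle = 1000.0  # ps, hypothetical
    for n in (2, 4, 8):
        print(n, "phases: overlap", overlap(n, t_cycle), "ps,",
              "borrowable", borrowable_time(n, t_cycle, t_skew=100.0), "ps")
```

Under these assumptions two 50% duty-cycle phases do not overlap at all, so a two-phase scheme needs clocks that stay high for more than half the cycle, while four or more phases overlap even at 50% duty.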

Clock Generation
[Figure: clock generation]

Summary
• Clock skew effectively increases setup and hold times in systems with hard edges
• Managing skew
  – Reduce: good clock distribution network
  – Analyze: local vs. global skew
  – Tolerate: use systems with soft edges
• Flip-flops and traditional domino are costly
• Latches and skew-tolerant domino perform at full speed even with moderate clock skews