CMPEN 411 VLSI Digital Circuits Spring 2009 Lecture

CMPEN 411 VLSI Digital Circuits, Spring 2009
Lecture 17: Dynamic Sequential Circuits and Timing Issues
[Adapted from Rabaey’s Digital Integrated Circuits, Second Edition, © 2003 J. Rabaey, A. Chandrakasan, B. Nikolic]
Sp09 CMPEN 411 L17 S.1

This Lecture
  • Reading
      - Dynamic sequential circuits
          - Reading assignment: Rabaey, et al., 7.3, 7.7
      - Timing issues, intro to datapath design
          - Reading assignment: Rabaey, et al., 10.1-10.3.3; 11.1-11.2
  • Next lecture
      - Intro to datapath design
          - Reading assignment: Rabaey, et al., 11.1-11.2
      - Adder design
          - Reading assignment: Rabaey, et al., 11.3

Last Lecture: Static MS ET Implementation
[Figure: static master-slave edge-triggered flip-flop with a master stage and a slave stage, built from inverters I1-I6 and transmission gates T1-T4; input D, internal node QM, output Q; clocked by clk and !clk]

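Dynamic ET Flipflop
[Figure: dynamic master-slave edge-triggered flip-flop. Master: transmission gate T1, storage capacitance C1, inverter I1, output QM. Slave: transmission gate T2, storage capacitance C2, inverter I2, output Q. Clocked by clk and !clk. Clock phases annotated: master transparent / slave hold, then master hold / slave transparent]
  • $t_{su} = t_{pd\_tx}$
  • $t_{hold} = 0$
  • $t_{c-q} = 2\,t_{pd\_inv} + t_{pd\_tx}$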
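
The timing relations on this slide can be evaluated numerically. The following is a minimal sketch: the function name `dynamic_ff_timing` and the example delays are illustrative assumptions, not values from the lecture.

```python
def dynamic_ff_timing(tpd_tx, tpd_inv):
    """Timing of the dynamic ET flip-flop, using the relations on the slide.

    tpd_tx  -- transmission-gate propagation delay
    tpd_inv -- inverter propagation delay
    Both arguments must use the same time unit.
    """
    return {
        "t_su": tpd_tx,                # tsu = tpd_tx
        "t_hold": 0,                   # thold = zero
        "t_cq": 2 * tpd_inv + tpd_tx,  # tc-q = 2 tpd_inv + tpd_tx
    }


# Hypothetical delays in picoseconds, chosen only for illustration.
timing = dynamic_ff_timing(tpd_tx=40, tpd_inv=30)
print(timing)
```

Because all three quantities depend only on the two gate delays, a faster transmission gate shortens both the setup time and the clock-to-Q delay.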

Pseudostatic Dynamic Latch
  • Robustness considerations limit the use of dynamic FFs
      - Coupling between signal nets and internal storage nodes can inject significant noise and destroy the FF state
      - Leakage currents cause the state to leak away with time
      - Internal dynamic nodes don’t track fluctuations in VDD, which reduces noise margins
  • A simple fix is to make the circuit pseudostatic
[Figure: pseudostatic latch stage between QM and Q, clocked by clk and !clk]
  • Add the above logic to all dynamic latches

Dynamic ET FF Race Conditions
[Figure: the dynamic ET flip-flop (T1, C1, I1, QM, T2, C2, I2, Q) with the clk and !clk overlap intervals marked]
  • 0-0 overlap race condition: $t_{overlap\,0-0} < t_{T1} + t_{I1} + t_{T2}$
  • 1-1 overlap race condition: $t_{overlap\,1-1} < t_{hold}$
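
The two overlap constraints on this slide can be checked mechanically. The sketch below is illustrative only: the function name `overlap_is_safe` and all numeric delays are assumptions made for the example.

```python
def overlap_is_safe(t_overlap_00, t_overlap_11, t_T1, t_I1, t_T2, t_hold):
    """Evaluate the two clock-overlap constraints from the slide.

    Returns a pair (ok_00, ok_11):
      ok_00 -- True if t_overlap_00 < t_T1 + t_I1 + t_T2
      ok_11 -- True if t_overlap_11 < t_hold
    All times must use the same unit.
    """
    ok_00 = t_overlap_00 < t_T1 + t_I1 + t_T2
    ok_11 = t_overlap_11 < t_hold
    return ok_00, ok_11


# Hypothetical delays in picoseconds.
small_overlap = overlap_is_safe(50, 10, t_T1=40, t_I1=30, t_T2=40, t_hold=20)
large_overlap = overlap_is_safe(120, 25, t_T1=40, t_I1=30, t_T2=40, t_hold=20)
print(small_overlap, large_overlap)
```

With these example numbers the small overlaps satisfy both constraints, while the large overlaps violate both.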

Fix 1: Dynamic Two-Phase ET FF
  • Keep the clock non-overlap time large enough, at the cost of having four clock signals to route (clk1, !clk1, clk2, !clk2)
[Figure: dynamic flip-flop (D, T1, C1, I1, QM, T2, C2, I2, Q) with T1 clocked by clk1/!clk1 and T2 clocked by clk2/!clk2. Waveforms: clk1 (master transparent, slave hold) and clk2 (master hold, slave transparent), separated by $t_{non\_overlap}$]

Fix 2: C2MOS (Clocked CMOS) ET Flipflop
  • A clock-skew insensitive FF
[Figure: C2MOS master-slave flip-flop. Master: transistors M1-M4 driving node QM (capacitance C1) from input D. Slave: transistors M5-M8 driving output Q (capacitance C2). Clocked by clk and !clk]

C2MOS (Clocked CMOS) ET Flipflop
  • A clock-skew insensitive FF
[Figure: the same C2MOS flip-flop with the clocked transistors annotated for the two phases. Master transparent, slave hold: M3 and M4 on, M7 and M8 off. Master hold, slave transparent: M3 and M4 off, M7 and M8 on]

C2MOS FF 0-0 Overlap Case
  • Clock-skew insensitive as long as the rise and fall times of the clock edges are sufficiently small
[Figure: C2MOS flip-flop during a 0-0 clock overlap. The clocked devices M4 and M8 have 0 on their gates; the devices shown are M1, M2, M4 in the master (node QM, C1) and M5, M6, M8 in the slave (output Q, C2)]

C2MOS FF 1-1 Overlap Case
[Figure: C2MOS flip-flop during a 1-1 clock overlap. The clocked devices M3 and M7 have 1 on their gates; labels visible in the figure: D, M2, M3, QM, C1, M5, M6, M7, Q, C2]

Fix 3: True Single Phase Clocked (TSPC) Latches
  • Negative latch: transparent when clk = 0, hold when clk = 1
  • Positive latch: transparent when clk = 1, hold when clk = 0
[Figure: TSPC negative latch and TSPC positive latch, each with input In, output Q, and a single clock clk]

Embedding Logic in TSPC Latch
[Figure: TSPC latch with an embedded pull-up network (PUN) and pull-down network (PDN) between In and Q, shown alongside an example with inputs A and B]

TSPC ET FF
[Figure: TSPC master-slave edge-triggered flip-flop (input D, internal node QM, output Q) clocked by a single clk, with the clocked transistors annotated on/off for the two phases: master transparent / slave hold, and master hold / slave transparent]

Choosing a Clocking Strategy
  • Choosing the right clocking scheme affects the functionality, speed, and power of a circuit
  • Two-phase designs
      + robust and conceptually simple
      - need to generate and route two clock signals
      - have to design to accommodate possible skew between the two clock signals
  • Single-phase designs
      + only need to generate and route one clock signal
      + supported by most automated design methodologies
      + don’t have to worry about skew between the two clocks
      - have to have guaranteed slopes on the clock edges

Review: Sequential Definitions
  • Use two level-sensitive latches of opposite type to build one master-slave flipflop that changes state on a clock edge (when the slave is transparent)
  • Static storage
      - Static storage uses a bistable element with feedback to store its state, and thus preserves state as long as the power is on
          - Loading new data into the element: 1) cutting the feedback path (mux based); 2) overpowering the feedback path (SRAM based)
  • Dynamic storage
      - Dynamic storage keeps state on parasitic capacitors, so the state is held for only a period of time (milliseconds) and requires periodic refresh
      - Dynamic storage is usually simpler (fewer transistors), faster, and lower power, but because of noise immunity issues the circuit is always modified (by adding a feedback loop on the output) so that it is pseudostatic
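
The first point, building an edge-triggered flip-flop from two level-sensitive latches of opposite type, can be illustrated with a small behavioral model. This is a sketch under stated assumptions: the class names are hypothetical, the master is taken to be the negative latch (so the flip-flop is positive-edge triggered), and all circuit-level effects (delays, leakage, races) are ignored.

```python
class Latch:
    """Level-sensitive latch: transparent when clk equals transparent_level."""

    def __init__(self, transparent_level):
        self.transparent_level = transparent_level
        self.q = 0  # stored state, assumed to start at 0

    def update(self, d, clk):
        if clk == self.transparent_level:
            self.q = d  # transparent: output follows input
        return self.q   # otherwise hold the stored value


class MasterSlaveFF:
    """Negative (master) latch followed by a positive (slave) latch."""

    def __init__(self):
        self.master = Latch(transparent_level=0)
        self.slave = Latch(transparent_level=1)

    def update(self, d, clk):
        qm = self.master.update(d, clk)
        return self.slave.update(qm, clk)


ff = MasterSlaveFF()
clk_wave = [0, 0, 1, 1, 0, 0, 1, 1]
d_wave = [1, 1, 0, 0, 0, 0, 1, 1]
q_wave = [ff.update(d, clk) for d, clk in zip(d_wave, clk_wave)]
print(q_wave)  # [0, 0, 1, 1, 1, 1, 0, 0]
```

Q takes the value D had just before each rising clock edge; the changes of D while clk = 1 are ignored because the master is holding.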

Timing Classifications
  • Synchronous systems
      - All memory elements in the system are simultaneously updated using a globally distributed periodic synchronization signal (i.e., a global clock signal)
      - Functionality is ensured by strict constraints on the clock signal generation and distribution to minimize
          - Clock skew (spatial variations in clock edges)
          - Clock jitter (temporal variations in clock edges)
  • Asynchronous systems
      - Self-timed (controlled) systems
      - No need for a globally distributed clock, but have asynchronous circuit overheads (handshaking logic, etc.)
  • Hybrid systems
      - Synchronization between different clock domains
      - Interfacing between asynchronous and synchronous domains

Review: Synchronous Timing Basics
[Figure: register R1 (clocked at $t_{clk1}$) drives combinational logic that feeds register R2 (clocked at $t_{clk2}$). Register parameters: $t_{c-q}$, $t_{su}$, $t_{hold}$, $t_{cdreg}$. Logic parameters: $t_{plogic}$, $t_{cdlogic}$]
  • Under ideal conditions (i.e., when $t_{clk1} = t_{clk2}$)
      $T \ge t_{c-q} + t_{plogic} + t_{su}$
      $t_{hold} \le t_{cdlogic} + t_{cdreg}$
  • Under real conditions, the clock signal can have both spatial (clock skew) and temporal (clock jitter) variations
      - Skew is constant from cycle to cycle (by definition); skew can be positive (clock and data flowing in the same direction) or negative (clock and data flowing in opposite directions)
      - Jitter causes T to change on a cycle-by-cycle basis
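
The two ideal-case constraints translate directly into a setup check (minimum clock period) and a hold check. The following is a minimal sketch; the function names and the example delays are assumptions for illustration, not data from the lecture.

```python
def min_clock_period(t_cq, t_plogic, t_su):
    """Smallest T that satisfies T >= tc-q + tplogic + tsu (zero skew, zero jitter)."""
    return t_cq + t_plogic + t_su


def hold_is_met(t_hold, t_cdlogic, t_cdreg):
    """True if thold <= tcdlogic + tcdreg (zero skew, zero jitter)."""
    return t_hold <= t_cdlogic + t_cdreg


# Hypothetical delays in picoseconds.
t_min = min_clock_period(t_cq=100, t_plogic=800, t_su=50)
hold_ok = hold_is_met(t_hold=30, t_cdlogic=60, t_cdreg=40)
print(t_min, hold_ok)  # 950 True
```

Note that the hold check does not involve T at all, which is why a hold violation cannot be repaired by slowing the clock.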

Sources of Clock Skew and Jitter in Clock Network
[Figure: clock network with numbered sources of variation: 1 clock generation (PLL), 2 clock drivers, 3 interconnect, 4 power supply, 5 temperature, 6 capacitive load, 7 capacitive coupling]
  • Skew
      - manufacturing device variations in clock drivers
      - interconnect variations
      - environmental variations (power supply and temperature)
  • Jitter
      - clock generation
      - capacitive loading and coupling
      - environmental variations (power supply and temperature)

Positive Clock Skew
  • Clock and data flow in the same direction
[Figure: R1 drives combinational logic feeding R2; the clock reaches R1 at $t_{clk1}$ and, after a delay, R2 at $t_{clk2}$. Timing diagram with edges 1-4, marking $T$, $T + \delta$, and $\delta + t_{hold}$, with $\delta > 0$]
  • T: $T + \delta \ge t_{c-q} + t_{plogic} + t_{su}$, so $T \ge t_{c-q} + t_{plogic} + t_{su} - \delta$
  • $t_{hold}$: $t_{hold} + \delta \le t_{cdlogic} + t_{cdreg}$, so $t_{hold} \le t_{cdlogic} + t_{cdreg} - \delta$
  • $\delta > 0$: Improves performance, but makes $t_{hold}$ harder to meet. If $t_{hold}$ is not met (race conditions), the circuit malfunctions independent of the clock period!

Negative Clock Skew
  • Clock and data flow in opposite directions
[Figure: R1 drives combinational logic feeding R2; the clock reaches R2 at $t_{clk2}$ first and, after a delay, R1 at $t_{clk1}$. Timing diagram with edges 1-4, marking $T$ and $T + \delta$, with $\delta < 0$]
  • T: $T + \delta \ge t_{c-q} + t_{plogic} + t_{su}$, so $T \ge t_{c-q} + t_{plogic} + t_{su} - \delta$
  • $t_{hold}$: $t_{hold} + \delta \le t_{cdlogic} + t_{cdreg}$, so $t_{hold} \le t_{cdlogic} + t_{cdreg} - \delta$
  • $\delta < 0$: Degrades performance, but $t_{hold}$ is easier to meet (eliminating race conditions)
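
The positive-skew and negative-skew slides use the same two inequalities with a signed skew (written `skew` below, positive when clock and data flow in the same direction). The sketch shows how the sign trades clock period against hold margin; the function name and the delays are illustrative assumptions.

```python
def timing_with_skew(t_cq, t_plogic, t_su, t_hold, t_cdlogic, t_cdreg, skew):
    """Return (t_min, hold_slack) for a signed clock skew.

    t_min      -- smallest T with T >= tc-q + tplogic + tsu - skew
    hold_slack -- (tcdlogic + tcdreg - skew) - thold; negative means a hold violation
    """
    t_min = t_cq + t_plogic + t_su - skew
    hold_slack = t_cdlogic + t_cdreg - skew - t_hold
    return t_min, hold_slack


# Hypothetical delays in picoseconds.
delays = dict(t_cq=100, t_plogic=800, t_su=50, t_hold=30, t_cdlogic=60, t_cdreg=40)

positive = timing_with_skew(**delays, skew=50)    # shorter period, less hold slack
negative = timing_with_skew(**delays, skew=-50)   # longer period, more hold slack
too_much = timing_with_skew(**delays, skew=150)   # hold slack goes negative
print(positive, negative, too_much)
```

In the last case the circuit fails the hold check regardless of how long the clock period is made.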

Clock Jitter
  • Jitter causes T to vary on a cycle-by-cycle basis
[Figure: register R1 with combinational logic, clocked at $t_{clk}$; each clock edge can move between $-t_{jitter}$ and $+t_{jitter}$ around its nominal position, one period $T$ apart]
  • T: $T - 2t_{jitter} \ge t_{c-q} + t_{plogic} + t_{su}$, so $T \ge t_{c-q} + t_{plogic} + t_{su} + 2t_{jitter}$
  • Jitter directly reduces the performance of a sequential circuit
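
One way to read the factor of two in the jitter bound is to take the worst case for a single cycle: the launching edge arrives as late as possible and the capturing edge as early as possible. The edge times $t_1$ and $t_2$ below are symbols introduced only for this sketch, measured from the nominal position of the launching edge.

```latex
\begin{align*}
t_1 &= +\,t_{jitter} && \text{(launching edge, latest arrival)} \\
t_2 &= T - t_{jitter} && \text{(capturing edge, earliest arrival)} \\
t_2 - t_1 &= T - 2\,t_{jitter} \;\ge\; t_{c-q} + t_{plogic} + t_{su} \\
\Rightarrow\quad T &\ge t_{c-q} + t_{plogic} + t_{su} + 2\,t_{jitter}
\end{align*}
```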

Combined Impact of Skew and Jitter
  • Constraints on the minimum clock period ($\delta > 0$)
[Figure: R1 drives combinational logic feeding R2, clocked at $t_{clk1}$ and $t_{clk2}$. Timing diagram marking $T$, $T + \delta$ with $\delta > 0$, and $-t_{jitter}$ on the clock edges]
  • $T \ge t_{c-q} + t_{plogic} + t_{su} - \delta + 2t_{jitter}$
  • $t_{hold} \le t_{cdlogic} + t_{cdreg} - \delta - 2t_{jitter}$
  • $\delta > 0$ with jitter: Degrades performance, and makes $t_{hold}$ even harder to meet. (The acceptable skew is reduced by jitter.)
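
The remark that jitter reduces the acceptable skew follows from rearranging the hold constraint for the skew. The sketch below computes that bound together with the minimum period; the function names and the delays are assumptions made for the example.

```python
def max_acceptable_skew(t_hold, t_cdlogic, t_cdreg, t_jitter):
    """Largest skew that still satisfies thold <= tcdlogic + tcdreg - skew - 2*tjitter."""
    return t_cdlogic + t_cdreg - t_hold - 2 * t_jitter


def min_period(t_cq, t_plogic, t_su, skew, t_jitter):
    """Smallest T with T >= tc-q + tplogic + tsu - skew + 2*tjitter."""
    return t_cq + t_plogic + t_su - skew + 2 * t_jitter


# Hypothetical delays in picoseconds.
skew_limit_no_jitter = max_acceptable_skew(t_hold=30, t_cdlogic=60, t_cdreg=40, t_jitter=0)
skew_limit_jitter = max_acceptable_skew(t_hold=30, t_cdlogic=60, t_cdreg=40, t_jitter=20)
t_min = min_period(t_cq=100, t_plogic=800, t_su=50, skew=skew_limit_jitter, t_jitter=20)
print(skew_limit_no_jitter, skew_limit_jitter, t_min)  # 70 30 960
```

With these numbers, 20 ps of jitter cuts the usable positive skew from 70 ps to 30 ps, and the period ends up longer than the 950 ps needed with zero skew and zero jitter.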

Clock Distribution Networks
  • Clock skew and jitter can ultimately limit the performance of a digital system, so designing a clock network that minimizes both is important
      - In many high-speed processors, a majority of the dynamic power is dissipated in the clock network.
      - To reduce dynamic power, the clock network must support clock gating (shutting down units by disabling their clock)
  • Clock distribution techniques
      - Balanced paths (H-tree network, matched RC trees)
          - In the ideal case, can eliminate skew
          - Could take multiple cycles for the clock signal to propagate to the leaves of the tree
      - Clock grids
          - Typically used in the final stage of the clock distribution network
          - Minimizes absolute delay (not relative delay)

H-Tree Clock Network
  • If the paths are perfectly balanced, clock skew is zero
  • Can insert clock gating at multiple levels in the clock tree
  • Can shut off an entire subtree if all gating conditions are satisfied
[Figure: H-tree clock network driven from Clock; a gate combining Clock with an idle condition produces the gated clock]
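
The balanced-path property can be seen in an idealized geometric model: every leaf of an H-tree is reached through the same sequence of segment lengths. The sketch below is an assumption-laden illustration (the function name, the rule that segment length halves after every vertical branch, and the use of wire length as a stand-in for delay are all choices made here, not taken from the lecture).

```python
def h_tree_leaf_lengths(levels, half_span=8.0, horizontal=True, length=0.0):
    """Return the root-to-leaf wire length for every leaf of an idealized H-tree.

    levels     -- number of binary splits between the root and the leaves
    half_span  -- length of each of the two branches at the current split
    horizontal -- True if the current split runs horizontally
    length     -- wire length accumulated so far
    """
    if levels == 0:
        return [length]
    # Assumed geometry: branch length halves after each vertical split.
    next_half = half_span if horizontal else half_span / 2
    lengths = []
    for _ in range(2):  # two symmetric branches at every split
        lengths += h_tree_leaf_lengths(
            levels - 1, next_half, not horizontal, length + half_span
        )
    return lengths


lengths = h_tree_leaf_lengths(levels=4)
length_spread = max(lengths) - min(lengths)  # proxy for skew in this ideal model
print(len(lengths), lengths[0], length_spread)  # 16 24.0 0.0
```

In a real network the segments do not have identical delay (device, interconnect, and environmental variations), so the spread is zero only in this ideal case.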

Clock Grid Network
  • Distributed buffering reduces absolute delay and makes clock gating easier, but is sensitive to variations in the buffer delay
  • The secondary buffers isolate the local clock nets from the upstream load and amplify the clock signals degraded by the RC network
      - decreases absolute skew
      - gives steeper clocks
  • Only have to bound the skew within the local logic area
[Figure: clock grid with a main clock buffer driven by Clock, secondary clock buffers, and a local logic area]

DEC Alpha 21164 (EV5) Example
  • 300 MHz clock (9.3 million transistors on a 16.5 x 18.1 mm die in 0.5 micron CMOS technology)
      - single phase clock
  • 3.75 nF total clock load
      - Extensive use of dynamic logic
  • 20 W (out of 50) in clock distribution network
  • Two level clock distribution
      - Single 6 inverter stage main clock buffer at the center of the chip
      - Secondary clock buffers drive the left and right sides of the clock grid in m3 and m4
  • Total equivalent driver size of 58 cm !!

Secondary Clock Buffers

Clock Skew in Alpha Processor
  • Absolute skew smaller than 90 ps
  • The critical instruction and execution units all see the clock within 65 ps

ASIC example

Microprocessor example

Dealing with Clock Skew and Jitter
  • To minimize skew, balance clock paths using H-tree or matched-tree clock distribution structures.
  • If possible, route data and clock in opposite directions; this eliminates races at the cost of performance.
  • The use of gated clocks to help with dynamic power consumption makes jitter worse.
  • Shield clock wires (route power lines, VDD or GND, next to clock lines) to minimize/eliminate coupling with neighboring signal nets.
  • Use dummy fills to reduce skew by reducing variations in interconnect capacitances due to interlayer dielectric thickness variations.
  • Beware of temperature and supply rail variations and their effects on skew and jitter. Power supply noise fundamentally limits the performance of clock networks.

Clock Skew Scheduling
[Figure: circuit with path delays labeled 12 and 16 (other labels: A, C); annotation: minimum clock period with zero skew]

Clock Skew Scheduling
[Figure: registers i, j, k with path delays labeled 12 and 16; T = 15; annotations: pulse at i, k; tight pulse at j; max]

Clock Scaling

Next Lecture and Reminders
  • Next lecture
      - Intro to datapath design
          - Reading assignment: Rabaey, et al., 11.1-11.2
      - Adder design
          - Reading assignment: Rabaey, et al., 11.3