
Lecture #23: Performance
OUTLINE
• Timing diagrams (from Lecture 22)
• Delay analysis (from Lecture 22)
• Maximum clock frequency: three figures of merit
• Continuously-switched inverters
• Ring oscillators
Reading (Rabaey et al.): parts of Ch. 5, pages 179-184; 193-203; 212-217; 220-227; 230-232; Perspective and Summary
EECS 40, Fall 2004, Lecture 23 Performance, Slide 1, Prof. White

Propagation Delay in Timing Diagrams
• To simplify the drawing of timing diagrams, we can approximate the signal transitions as abrupt (though in reality they are exponential).
[Figure: timing diagram of input A and output F versus time t, with the delays tpHL and tpLH marked.]
To further simplify timing analysis, we can define the propagation delay as the average of the two: $t_p = (t_{pHL} + t_{pLH})/2$

Glitching Transitions
The propagation delay from one logic gate to the next can cause spurious transitions, called glitches, to occur. (A node can exhibit multiple transitions before settling to the correct logic level.)
[Figure: logic circuit with inputs A, B, C, internal nodes A+B and B•C, and output F; timing diagrams of A, B, C, A+B, B•C, and F versus time t, with tp, 2tp, and 3tp marked.]

Glitch Reduction
• Spurious transitions can be minimized by balancing signal paths.
Example: F = A • B • C • D

MOSFET Layout and Cross-Section
[Figures: top view and cross section of a MOSFET.]

Source and Drain Junction Capacitance
$C_{source} = C_j \cdot \mathrm{AREA} + C_{jsw} \cdot \mathrm{PERIMETER} = C_j L_S W + C_{jsw}(2 L_S + W)$
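As a rough illustration, the junction-capacitance formula can be evaluated with a few lines of Python. This is only a sketch: the function name and the source length `ls` used in the example call are assumptions chosen for illustration, not values taken from the lecture.

```python
def junction_capacitance(cj, cjsw, ls, w):
    """Source/drain junction capacitance in fF.

    cj   : bottom-plate capacitance per unit area (fF/um^2)
    cjsw : sidewall capacitance per unit length (fF/um)
    ls   : length of the source/drain region (um)
    w    : transistor width (um)
    """
    area = ls * w
    perimeter = 2 * ls + w  # three sides, as in the formula 2*L_S + W
    return cj * area + cjsw * perimeter


# Hypothetical geometry: ls = 0.625 um is an assumed value for illustration.
c_source = junction_capacitance(cj=2.0, cjsw=0.28, ls=0.625, w=0.375)
print(f"C_source = {c_source:.3f} fF")
```

The bottom-plate term scales with area and the sidewall term with perimeter, so widening a device raises both contributions.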

Computing the Output Capacitance
Example 5.4 (pp. 197-203), with 2λ = 0.25 µm
[Figure: inverter layout with VDD, GND, In, Out, Poly-Si, and Metal 1 labeled; PMOS W/L = 9λ/2λ, NMOS W/L = 3λ/2λ.]

Capacitances for 0.25 µm technology (2λ = 0.25 µm; PMOS W/L = 9λ/2λ, NMOS W/L = 3λ/2λ):
Gate capacitances:
• Cox(NMOS) = Cox(PMOS) = 6 fF/µm²
Overlap capacitances:
• CGDO(NMOS) = Con = 0.31 fF/µm
• CGDO(PMOS) = Cop = 0.27 fF/µm
Bottom junction capacitances:
• CJ(NMOS) = Keqbpn Cj = 2 fF/µm²
• CJ(PMOS) = Keqbpp Cj = 1.9 fF/µm²
Sidewall junction capacitances:
• CJSW(NMOS) = Keqswn Cjsw = 0.28 fF/µm
• CJSW(PMOS) = Keqswp Cjsw = 0.22 fF/µm
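To make the per-unit values concrete, the sketch below multiplies them by the drawn device dimensions. It only computes the gate-area term W × L × Cox and the overlap term CGDO × W; the function and dictionary names are illustrative assumptions, and this is not a reproduction of the full calculation in the textbook example.

```python
LAMBDA_UM = 0.125  # from 2*lambda = 0.25 um

COX = 6.0                               # gate capacitance, fF/um^2
CGDO = {"NMOS": 0.31, "PMOS": 0.27}     # overlap capacitance, fF/um
WIDTH_LAMBDA = {"NMOS": 3, "PMOS": 9}   # drawn widths in units of lambda
LENGTH_LAMBDA = 2                       # drawn length in units of lambda


def gate_caps(device):
    """Return (gate-area capacitance, overlap capacitance) in fF."""
    w = WIDTH_LAMBDA[device] * LAMBDA_UM
    l = LENGTH_LAMBDA * LAMBDA_UM
    c_gate = COX * w * l          # W x L x Cox
    c_overlap = CGDO[device] * w  # CGDO x W
    return c_gate, c_overlap


for dev in ("NMOS", "PMOS"):
    c_gate, c_ov = gate_caps(dev)
    print(f"{dev}: gate = {c_gate:.4f} fF, overlap = {c_ov:.4f} fF")
```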


Typical MOSFET Parameter Values
• For a given MOSFET fabrication process technology, the following parameters are known:
– VT (~0.5 V)
– Cox and k (<0.001 A/V²)
– VDSAT (1 V)
– λ (0.1 V⁻¹)
Example Req values for 0.25 µm technology (W = L):
[Table of Req values.]

Compute propagation delays

Examples of Propagation Delay
Product       CMOS technology generation   Clock frequency, f   Fan-out=4 inverter delay
Pentium II    0.25 µm                      600 MHz              ~100 ps
Pentium III   0.18 µm                      1.8 GHz              ~40 ps
Pentium IV    0.13 µm                      3.2 GHz              ~20 ps
Typical clock periods:
• high-performance µP: ~15 FO4 delays
• PlayStation 2: 60 FO4 delays
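The "~15 FO4 delays" figure can be checked against the clock frequencies and fan-out-of-4 delays quoted on this slide. The short sketch below divides each clock period by the corresponding FO4 delay; the variable and function names are illustrative.

```python
# (product, clock frequency in Hz, FO4 inverter delay in seconds)
chips = [
    ("Pentium II", 600e6, 100e-12),
    ("Pentium III", 1.8e9, 40e-12),
    ("Pentium IV", 3.2e9, 20e-12),
]


def period_in_fo4(freq_hz, fo4_delay_s):
    """Clock period expressed as a number of FO4 inverter delays."""
    return 1.0 / (freq_hz * fo4_delay_s)


fo4_counts = {name: period_in_fo4(f, d) for name, f, d in chips}
for name, count in fo4_counts.items():
    print(f"{name}: {count:.1f} FO4 delays per clock period")
```

All three products land between roughly 14 and 17 FO4 delays per cycle, consistent with the quoted typical value.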

STATIC CMOS DRIVING LARGE LOADS
[Figure: CMOS inverter (MP1, MN1) with input vin, driving load capacitance CL at vout.]
The load, CL, may be the capacitance of a long line on the chip (e.g. up to 1 pF), or it may be the load on one of the chip output pins (e.g. up to 50 pF).
We have seen that the typical driving resistance R for a minimum-sized inverter is in the range of 10 kΩ. Even a 1 kΩ resistor driving a 50 pF load would have a stage delay of 35 ns (0.69 × 1 kΩ × 50 pF), huge in comparison to normal stage delays.
Thus we need to use larger devices to drive large capacitive loads, that is, greatly increase W/L. However, increasing the W/L of a stage increases the load it presents to the stage driving it, so we just move the delay problem back one stage.
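The 35 ns figure follows from the 0.69·R·C stage-delay estimate used throughout the lecture. A minimal sketch (the function name is an assumption) that evaluates it for the resistances and loads mentioned above:

```python
def stage_delay(r_ohm, c_farad):
    """Time for an RC step response to reach 50%: 0.69 * R * C, in seconds."""
    return 0.69 * r_ohm * c_farad


cases = [
    ("1 kOhm driving 50 pF", 1e3, 50e-12),
    ("10 kOhm driving 50 pF", 10e3, 50e-12),
    ("10 kOhm driving 1 pF", 10e3, 1e-12),
]
for label, r, c in cases:
    print(f"{label}: {stage_delay(r, c) * 1e9:.1f} ns")
```

With the typical 10 kΩ of a minimum-sized inverter the delay into 50 pF is ten times worse than the 1 kΩ case, which is why larger devices are needed.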

STATIC CMOS DRIVING LARGE LOADS
PROBLEM: A minimum-sized inverter drives a large load, CL, leading to excessive delay, even with a buffer stage.
[Figure: inverter MP1/MN1 followed by a single buffer MPB/MNB driving CL.]
PROPOSED SOLUTION: Insert several simple inverter stages with increasing W/L between Inverter 1 and the load CL. The total delay through the multiple stages will be less than the delay of one single stage driving CL.
[Figure: inverter MP1/MN1 followed by buffers MPB1/MNB1, MPB2/MNB2, MPB3/MNB3 driving CL.]

STATIC CMOS DRIVING LARGE LOADS
Example: a 2.5 V, 0.25 µm CMOS inverter driving a 50 pF load.
Properties: W/L|N = 1/0.25, W/L|P = 2/0.25, VDD = 2.5 V, VT = 0.5 V.
Rn = 13 kΩ/4 = 3.25 kΩ; Rp = 31 kΩ/8 ≈ 3.9 kΩ.
5 nm oxide thickness, Cox = 6.9 fF/µm².
NMOS: CGn = W × L × Cox = 1.7 fF. PMOS: CGp = W × L × Cox = 3.4 fF. Thus CIN = 5.2 fF.
Basic gate delay (0.69 RC) is about 10 ps.
If we size one buffer inverter to drive the load with this same delay, its W/L must increase by a factor of 50 pF/5.2 fF ≈ 9615. But then CIN = 50,000 fF = 50 pF for the buffer gate! Thus the gate delay for the first stage, which drives the buffer, is (50,000/5.2) × 10 ps ≈ 96.1 ns.
Total delay = 96.1 + 0.01 = 96.11 ns. TOO LONG and NO IMPROVEMENT!
Note: We are ignoring drain capacitance in these examples.
[Figure: inverter MP1/MN1 (W/L = 4) driving buffer MPB/MNB (W/L = 9615), which drives the 50 pF load.]
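The arithmetic of this example can be replayed in a few lines. The sketch below uses the slide's rounded values (5.2 fF input capacitance, 10 ps basic gate delay) and, like the slide, ignores drain capacitance; the variable names are illustrative.

```python
C_IN_FF = 5.2          # input capacitance of the minimum inverter, fF
C_LOAD_FF = 50_000.0   # 50 pF load expressed in fF
T0_PS = 10.0           # basic gate delay driving one identical gate, ps

# Buffer sized so that it drives the load with the basic gate delay.
scale = C_LOAD_FF / C_IN_FF

# The first (minimum) inverter now sees the buffer's input capacitance,
# which has grown by the same factor.
first_stage_ps = scale * T0_PS
buffer_stage_ps = T0_PS
total_ns = (first_stage_ps + buffer_stage_ps) / 1000.0

# For comparison: the minimum inverter driving the load with no buffer.
direct_ns = scale * T0_PS / 1000.0

print(f"scale factor        = {scale:.0f}")
print(f"with one big buffer = {total_ns:.2f} ns")
print(f"driving CL directly = {direct_ns:.2f} ns")
```

The two totals differ only by the 10 ps of the buffer itself, which is the "no improvement" conclusion of the slide.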

STATIC CMOS DRIVING LARGE LOADS
Same example with tapered device sizes (geometric series)
Case 1: Same example, but with the buffer devices scaled by a factor of 98 (98² ≈ 9615).
Stage 1 load = 98 × 5.2 fF (R = 3.5 kΩ)
Stage 2 load = 50 pF (R = 3.5 kΩ/98)
Delay = 98 × 10 ps + 96 ns/98 = 0.98 ns + 0.98 ns ≈ 2 ns
Case 2: Now taper through 3 buffer stages, each with W/L larger than the previous stage by a ratio of 9.9 (9.9⁴ ≈ 9615).
[Figure: inverter MP1/MN1 followed by buffers MPB1/MNB1, MPB2/MNB2, MPB3/MNB3 driving CL.]
4 equal gate delays of 9.9 × 10 ps = 99 ps
Total = 4 × 0.099 ns ≈ 0.4 ns
Gate delay through 4 gates is much less than through 2!
Note: We are ignoring drain capacitance in these examples.
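Both cases follow the same pattern: with a geometric taper every stage sees the same fan-out, so every stage has the same delay. A small sketch of that calculation, assuming the slide's simplifications (no drain or wiring capacitance) and using an illustrative function name:

```python
def tapered_delay(load_ratio, n_buffers, t0):
    """Total delay of a minimum inverter plus n_buffers tapered buffers.

    load_ratio : C_load / C_in of the minimum inverter
    n_buffers  : number of buffer stages inserted before the load
    t0         : delay of a minimum inverter driving one identical gate
    """
    stages = n_buffers + 1                    # inverter 1 plus the buffers
    fan_out = load_ratio ** (1.0 / stages)    # same size ratio at every stage
    return stages * fan_out * t0


RATIO = 50_000.0 / 5.2   # about 9615
T0 = 10e-12              # 10 ps basic gate delay

case1 = tapered_delay(RATIO, 1, T0)   # one buffer, ratio about 98
case2 = tapered_delay(RATIO, 3, T0)   # three buffers, ratio about 9.9
print(f"1 buffer : {case1 * 1e9:.2f} ns")
print(f"3 buffers: {case2 * 1e9:.2f} ns")
```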

STATIC CMOS DRIVING LARGE LOADS
Comments
In our example we got better results with 3 buffer stages than with 1. Seven buffer stages would do even better. How many buffer stages are optimum?
Under these simple assumptions (such as ignoring drain and wiring capacitance, and operating asynchronously) you can show that the number of buffer stages, N, obeys N + 1 = ln(R), where R is the ratio of the load capacitance to the capacitance of a minimum-sized stage.
This formula is not important, but you should remember the concept that buffering with multiple stages usually leads to lower net delay if the load is large.
[Figure: inverter MP1/MN1 followed by buffers MPB1/MNB1, MPB2/MNB2, MPB3/MNB3 driving CL.]
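The rule N + 1 = ln(R) can be checked numerically for the running example (R ≈ 9615) by sweeping the number of buffer stages. This is a sketch under the same idealized assumptions as the slide; the function name and the sweep range are illustrative choices.

```python
import math


def tapered_delay(load_ratio, n_buffers, t0):
    """Delay of a geometrically tapered chain (no drain or wiring capacitance)."""
    stages = n_buffers + 1
    return stages * load_ratio ** (1.0 / stages) * t0


RATIO = 50_000.0 / 5.2   # load capacitance / minimum-stage capacitance
T0 = 10e-12              # basic gate delay, seconds

delays = {n: tapered_delay(RATIO, n, T0) for n in range(0, 16)}
best_n = min(delays, key=delays.get)

print(f"ln(R) = {math.log(RATIO):.2f}")
print(f"best number of buffer stages = {best_n}")
for n in (1, 3, 7, best_n):
    print(f"N = {n}: {delays[n] * 1e12:.0f} ps")
```

The minimum is shallow: the delays for 7, 8, and 9 buffer stages differ by only a few picoseconds, so in this idealized model the exact stage count matters little near the optimum.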

How to measure inverter performance?
[Figure: two cascaded inverters (MP3/MN1 and MP4/MN2) with input vin1; vout1 = vin2.]
1) We have defined the unit delay tp as the time until Vout1 reaches VDD/2, starting at either 0 V (rising) or VDD (falling). Vin1 is a step function.
There are two other measures of performance which we can also consider:
2) The stage delay when the input is a continuous square-wave clock input.
3) The delay of a pulse through a multi-stage "ring oscillator".

Unit gate delay performance measurement
[Figure: two cascaded inverters with vout1 = vin2, and a plot of Vout1 versus t falling from VDD through 0.5 VDD, with tp marked.]
Suppose Vin1 goes from low to high. Vout1 goes from VDD to ground.
We defined the inverter delay tpHL as the time until Vout1 reaches VDD/2, because when it reaches this value the following stage will sense that its input has switched from high to low. Similarly, tpLH is the time for the output to rise from zero to VDD/2 when the input is falling.
Maximum frequency is just 1/(tpHL + tpLH).
A properly designed stage will have a similar delay time for a rising input as for a falling input. (Design the proper ratio of Wp to Wn.)

Driving Inverters (or gates) with Square-Wave Clock
[Figure: inverter chain with input In and node X; waveforms of VIN and VX between 0 and VDD versus time, with period 1/f, switching instants t1 to t5, and levels Vh and Vl marked.]
Node X is loaded by CX. Inverter 1 has output resistance Rp or Rn.
Let's follow VX for VIN starting at t = 0. The output slowly converges to a sawtooth waveform. Let's find the relationship between the max and min values vh and vl after many cycles, where Δt is the time spent in each half cycle:
(1) Pull down: $v_l = v_h e^{-\Delta t/R_n C_X}$
(2) Pull up: $v_h = V_{DD} + (v_l - V_{DD}) e^{-\Delta t/R_p C_X}$
These can be solved simultaneously given Δt/RC.
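One way to see the convergence to a sawtooth is to step the node voltage through alternating pull-up and pull-down half cycles and compare the result with the closed-form solution of the two steady-state relations. The sketch below does both; the function names are assumptions, and the numbers (VDD = 2.5 V, RC = 15 ps, half cycle of 6.1 ps) are borrowed from the example later in the lecture.

```python
import math


def steady_state_levels(vdd, dt, rn_c, rp_c):
    """Closed-form v_h and v_l from the pull-down and pull-up relations."""
    a_n = math.exp(-dt / rn_c)   # decay factor while pulling down
    a_p = math.exp(-dt / rp_c)   # decay factor while pulling up
    v_h = vdd * (1 - a_p) / (1 - a_n * a_p)
    v_l = v_h * a_n
    return v_h, v_l


def simulate(vdd, dt, rn_c, rp_c, cycles=50):
    """Step V_X through repeated half cycles, starting from 0 V."""
    v = 0.0
    v_h = v_l = v
    for _ in range(cycles):
        v = vdd + (v - vdd) * math.exp(-dt / rp_c)   # pull up for dt
        v_h = v
        v = v * math.exp(-dt / rn_c)                 # pull down for dt
        v_l = v
    return v_h, v_l


closed = steady_state_levels(2.5, 6.1e-12, 15e-12, 15e-12)
stepped = simulate(2.5, 6.1e-12, 15e-12, 15e-12)
print(f"closed form: v_h = {closed[0]:.3f} V, v_l = {closed[1]:.3f} V")
print(f"simulated  : v_h = {stepped[0]:.3f} V, v_l = {stepped[1]:.3f} V")
```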

Square-Wave Drive
[Figure: inverter chain with input In; waveform of VX versus time with period 1/f, switching instants t1 to t5, and levels Vih and Vil marked.]
Inverter 2 will operate correctly so long as VX passes through vil and vih. We approximate the response of the devices in inverter 2 as instantaneous (remember the steep transfer curve).
Let's look at VX after a long time. When VX crosses down through vil, inverter 2 switches, and when it crosses up through vih, it switches back.

If frequency increases, when will the inverter fail?
It fails if VX does not pass through Vil or Vih because the frequency is too high.
MAXIMUM CLOCK FREQUENCY fmax: Increase f until inverter 2 fails to toggle because its input does not pass through its threshold(s).
In general, Rp ≠ Rn, so either the rise or the fall is slower.

Example: Take R = 3 kΩ, C = 5 fF.
tpHL = tpLH = 0.69 RC ≈ 10 ps, so fmax1 = 1/(tpHL + tpLH) = 50 GHz.
Now consider the square-wave drive case. Take VDD = 2.5 V, Vih = 1.5 V, Vil = 1 V, so in this symmetric case:
$V_{il} = V_{ih} e^{-\Delta t/R_n C}$ and $V_{ih} = V_{DD} + (V_{il} - V_{DD}) e^{-\Delta t/R_p C}$
Solving either equation with RC = 15 ps gives Δt = 6.1 ps; fmax2 = 10¹²/12.2 Hz = 82 GHz.
(Obviously this result depends on our somewhat arbitrary choice for Vih and Vil.)
[Figure: VX waveform swinging between Vil and Vih.]
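Both figures of merit in this example can be computed directly. The sketch below solves each exponential relation for Δt and, for the symmetric case only, takes fmax2 = 1/(2Δt); the variable names are illustrative. It uses the unrounded 0.69·RC, so fmax1 comes out slightly below the slide's rounded 50 GHz.

```python
import math

R = 3e3      # ohms
C = 5e-15    # farads
RC = R * C   # 15 ps
VDD, VIH, VIL = 2.5, 1.5, 1.0

# Figure of merit 1: unit gate delay (step input, output reaches VDD/2).
t_p = 0.69 * RC
f_max1 = 1.0 / (2.0 * t_p)   # 1 / (tpHL + tpLH) with tpHL = tpLH

# Figure of merit 2: square-wave drive, V_X swings between Vih and Vil.
dt_down = RC * math.log(VIH / VIL)                  # from Vil = Vih*exp(-dt/RC)
dt_up = RC * math.log((VDD - VIL) / (VDD - VIH))    # from the pull-up relation
f_max2 = 1.0 / (2.0 * dt_down)                      # symmetric case only

print(f"t_p    = {t_p * 1e12:.2f} ps, f_max1 = {f_max1 / 1e9:.1f} GHz")
print(f"dt     = {dt_down * 1e12:.2f} ps, f_max2 = {f_max2 / 1e9:.1f} GHz")
```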

Ring Oscillator
[Figure: ring of inverters 1, 2, 3, 4, …, n; odd number of stages.]
As soon as inverter 1 drives inverter 2's input past Vil (falling) or Vih (rising), inverter 2 switches and starts driving the input node of inverter 3 toward its switch point, etc.
Note: V starts at 0 V (rising) or VDD (falling). WHY?
Result: The signal propagates along the chain at another kind of maximum clock frequency, fmax* (really a maximum propagation frequency).
Let the average delay per stage be tMIN; then the time around the loop is N tMIN. One period is twice around the loop, something very easy to measure. [If tMIN is 20 ps but N is 1001, the period 1/fRO is 40 ns.]
Now we define fmax* by $f_{max}^* = N f_{RO} = 1/(2 t_{MIN})$, so fRO could be 1001 times lower than fmax* and is easy to measure (low frequency).
NOTE: fmax* < fmax2. WHY?

Ring Oscillator
[Figure: ring of an odd number of inverters, closed by a switch; logic 0 = 0 V, logic 1 = VDD.]
As soon as the switch closes, inverter 5 drives inverter 1's input up (starting at 0 V). When it reaches Vih, inverter 1 switches and starts driving the input node of inverter 2 down, starting at VDD.
We note that the transient always starts at 0 or VDD and ends at Vih or Vil, respectively. This clearly takes longer than the transient in the clock-driven chain of inverters.
We need to solve the same exponential equations as in square-wave drive, but with different limits:
Up: start at 0, end at $V_{ih} = V_{DD}[1 - \exp(-t_{LH}/R_p C)]$
Down: start at VDD, end at $V_{il} = V_{DD} \exp(-t_{HL}/R_n C)$
Solve for tLH and tHL and average to get tMIN: $t_{MIN} = (t_{LH} + t_{HL})/2$

Ring Oscillator Example
[Figure: ring of inverters closed by a switch; logic 0 = 0 V, logic 1 = VDD.]
101 stages, same parameters (RC = 15 ps):
From $V_{ih} = V_{DD}[1 - \exp(-t_{LH}/R_p C)]$ we find tLH = 13.7 ps.
Similarly, from $V_{il} = V_{DD} \exp(-t_{HL}/R_n C)$ we find tHL = 13.7 ps.
Thus the delay through 101 stages, twice, is 202 × 13.7 ps ≈ 2.78 ns. The ring oscillator frequency is 10⁹/2.78 Hz = 360 MHz.
Finally, fmax* = 360 MHz × 101 = 36 GHz. This is of course less than either the 50 GHz estimated from unit gate delay or the 82 GHz estimated from the square-wave-driven maximum toggle frequency.
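The ring-oscillator numbers can be reproduced by solving the two exponential relations for tLH and tHL. A minimal sketch with illustrative variable names, using the parameters of this example (VDD = 2.5 V, Vih = 1.5 V, Vil = 1 V, RC = 15 ps, 101 stages):

```python
import math

RC = 15e-12
VDD, VIH, VIL = 2.5, 1.5, 1.0
N_STAGES = 101

# Up:   Vih = VDD * (1 - exp(-t_LH / RC))  ->  t_LH = -RC * ln(1 - Vih/VDD)
# Down: Vil = VDD * exp(-t_HL / RC)        ->  t_HL = -RC * ln(Vil/VDD)
t_lh = -RC * math.log(1.0 - VIH / VDD)
t_hl = -RC * math.log(VIL / VDD)
t_min = (t_lh + t_hl) / 2.0

period = 2 * N_STAGES * t_min   # one period is twice around the loop
f_ro = 1.0 / period             # measured ring oscillator frequency
f_max_star = N_STAGES * f_ro    # equals 1 / (2 * t_min)

print(f"t_LH = {t_lh * 1e12:.2f} ps, t_HL = {t_hl * 1e12:.2f} ps")
print(f"period = {period * 1e9:.2f} ns, f_RO = {f_ro / 1e6:.0f} MHz")
print(f"f_max* = {f_max_star / 1e9:.1f} GHz")
```

Because each transient here starts from a full rail (0 V or VDD) rather than from Vil or Vih, the per-stage time is longer than the 6.1 ps of the square-wave case, which is why fmax* is the lowest of the three estimates.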