Digital System Clocking: High-Performance and Low-Power Aspects
Vojin G. Oklobdzija, Vladimir M. Stojanovic, Dejan M. Markovic, Nikola M. Nedovic
Chapter 9: Microprocessor Examples
Wiley-Interscience and IEEE Press, January 2003

Microprocessor Examples
• Clocking for Intel® Microprocessors
  • IA-32 Pentium® Pro
  • First IA-64 Microprocessor
  • Pentium 4
• Sun Microsystems UltraSPARC-III® Clocking
  • Clocking and CSEs
• Alpha® Clocking: A Historical Overview
  • Clocking and CSEs
• IBM® Microprocessors
  • Level-Sensitive Scan Design
  • Examples of CSEs
Nov. 14, 2003 Digital System Clocking: Oklobdzija, Stojanovic, Markovic, Nedovic

Intel® Microprocessor Features
                   Pentium II     Pentium III    Pentium 4
MPR Issue          June 1997      April 2000     Dec 2001
Clock Speed        266 MHz        1 GHz          2 GHz
Pipeline Stages    12/14          12/14          22/24
Transistors        7.5 M          24 M           42 M
Cache (I/D/L2)     16K/16K/-      16K/256K       12K/8K/256K
Die Size           203 mm²        106 mm²        217 mm²
IC Process         0.28 µm, 4M    0.18 µm, 6M    0.18 µm, 6M
Max Power          27 W           23 W           67 W
Source: Microprocessor Report Journal

IA-32 Pentium® Pro
[Figure: Clock distribution network with deskewing circuit (Geannopoulos and Dai 1998), Copyright © 1998 IEEE. Labels: Ext Clk, CLK Gen, FB Clk, Delay Line (×2), Delay SR (×2), Deskew Control, PD, Left Spine, Right Spine, Core.]

Adaptive Deskewing Technique
• Equalization of two clock distribution spines by compensating for delay mismatch
  • Delay lines
  • Phase detector
  • Controller
• Result: global clock skew of only 15 ps
  • 0.25 µm technology
  • 7.5 M transistors

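The interaction of delay lines, phase detector, and controller can be pictured with a small behavioral model. The sketch below is an illustration only, not the circuit of Geannopoulos and Dai: the function names, the 12 ps delay step, the 16 taps per delay line, and the half-step dead zone are all assumptions chosen for the example.

```python
def spine_skew_ps(left_ps, right_ps, left_taps, right_taps, step_ps):
    """Left-minus-right arrival time; a positive value means the right spine leads."""
    return (left_ps + left_taps * step_ps) - (right_ps + right_taps * step_ps)


def deskew(left_ps, right_ps, step_ps=12.0, taps=16):
    """Add delay-line steps to whichever spine leads until the two spines match.

    step_ps and taps are hypothetical values, not figures from the slide.
    """
    left_taps = right_taps = 0
    for _ in range(2 * taps):
        skew = spine_skew_ps(left_ps, right_ps, left_taps, right_taps, step_ps)
        if abs(skew) <= step_ps / 2:
            break                      # inside the dead zone: stop adjusting
        if skew > 0 and right_taps < taps - 1:
            right_taps += 1            # right spine leads, so delay it one step
        elif skew < 0 and left_taps < taps - 1:
            left_taps += 1             # left spine leads, so delay it one step
        else:
            break                      # the delay line has run out of range
    residual = spine_skew_ps(left_ps, right_ps, left_taps, right_taps, step_ps)
    return left_taps, right_taps, residual


print(deskew(500.0, 440.0))  # (0, 5, 0.0)
print(deskew(400.0, 447.0))  # (4, 0, 1.0)
```

The residual skew in such a loop is bounded by the delay step, which is why a fine-grained delay line matters for reaching a figure like the 15 ps quoted above.
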
IA-32 Pentium® Pro
[Figure: Delay shift register (Geannopoulos and Dai 1998), Copyright © 1998 IEEE. Labels: In, Out, Delay Line, Load<1:15,2>, Load<0:14,2>, Delay Shift Register.]

IA-32 Pentium® Pro
[Figure: Phase detector (Geannopoulos and Dai 1998), Copyright © 1998 IEEE. Labels: Right Clk, Left Clk, Bandwidth Control, Delay = n, Phase Detector 1, Phase Detector 2, Left Leads, Right Leads.]

First IA-64 Microprocessor
[Figure: Clock distribution topology (Rusu and Tam 2000), Copyright © 2000 IEEE. Labels: PLL, Reference, Core Clock, Deskew Cluster, RCDs.]

Programmable Deskew Units
• Strategy similar to that in IA-32
• External differential clock
  • System bus frequency
• PLL generates internal clock
  • 2× frequency
• Clock distribution architecture
  • Balanced global clock tree
  • Multiple deskew buffers
  • Multiple local clock buffers

First IA-64 Microprocessor
[Figure: Deskew buffer architecture (Rusu and Tam 2000), Copyright © 2000 IEEE. Labels: Global Clock, Reference Clock, Deskew Buffer, RCD, Regional Clock Grid, Regional Feedback Clock, Phase Detector, Digital Filter, Control FSM, Deskew Settings, TAP Interface.]

First IA-64 Microprocessor
[Figure: Digitally controlled delay line (Rusu and Tam 2000), Copyright © 2000 IEEE. Labels: Input, Output, Enable, Delay Control Register.]

First IA-64 Microprocessor
[Figure: Simulated regional clock-grid skew (Rusu and Tam 2000), Copyright © 2000 IEEE]

First IA-64 Microprocessor
[Figure: Measured regional clock skew (Rusu and Tam 2000), Copyright © 2000 IEEE]

Pentium® 4
[Figure: Core and I/O clock generation (Kurd et al. 2001), Copyright © 2001 IEEE. Labels: Core PLL, I/O PLL, bus clock, bus clock#, core clock, core Clk distribution, I/O Clk distribution, I/O feedback clock, clock enable generator, clock enable distribution & sync, 1x-Clk and 2x-Clk enables, core clock divide by 4, outbound deskew state machine, inbound clocks gen state machine, inbound latching clocks, input buffers, MSFF, strobe glitch protection and detection, data bus and addr. bus outbound clocks, data to core, data from core.]

Multi-GHz Clock Network in Pentium 4
• Three core and three I/O frequencies (total of 6 frequencies running concurrently)
• Differential off-chip reference clock
• PLL synthesizes core and I/O clocks
• Global core clock distribution
  • 47 independent clock domains
  • Each domain has a 5-bit deskew control register
• Clock skew < 20 ps

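To make the role of a per-domain deskew register concrete, the following sketch picks a 5-bit delay code for each domain so that all domains line up with the slowest one. It is a simplified illustration under stated assumptions: the function name, the uniform delay step, the sample arrival times, and the idea of computing all codes in one pass are hypothetical and are not taken from Kurd et al.

```python
def deskew_codes(arrival_ps, step_ps, bits=5):
    """Choose one delay code per clock domain and report the remaining skew.

    arrival_ps: hypothetical clock arrival time of each domain, in ps.
    step_ps:    assumed delay added per code increment (not from the slide).
    """
    max_code = (1 << bits) - 1                 # 5 bits give codes 0..31
    target = max(arrival_ps)                   # delay can only be added, so align to the latest
    codes = [min(max_code, round((target - t) / step_ps)) for t in arrival_ps]
    adjusted = [t + c * step_ps for t, c in zip(arrival_ps, codes)]
    return codes, max(adjusted) - min(adjusted)


codes, skew = deskew_codes([100.0, 92.0, 85.0, 100.0], step_ps=4.0)
print(codes, skew)  # [0, 2, 4, 0] 1.0
```

With rounding to the nearest code, the residual skew stays within one delay step as long as no domain needs more than the 31 available steps.
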
Pentium® 4
[Figure: Logical diagram of core clock distribution (Kurd et al. 2001), Copyright © 2001 IEEE. Labels: PLL, 3-stage binary tree of clock repeaters, Domain Buffer 1 through Domain Buffer 47, Phase Detector, To Test Access Port, Local Clock Macro, Sequential Elements.]

Pentium® 4
[Figure: Example of local clock buffers generating various frequency, phase and types of clocks (Kurd et al. 2001), Copyright © 2001 IEEE. Inputs: Gclk, Enable 1, Enable 2, Stretch 0, Stretch 1. Blocks: Adjustable Delay Buffer, ClkBuf Type 1, ClkBuf Type 2, ClkBuf Type 3, SlowClkSync. Outputs: fast freq. pulse clk; medium freq. pulse clk phase 1; medium freq. pulse clk phase 2; medium freq. normal clk phase 1; slow freq. pulse clk phase 1.]

Intel Clocking: Summary
• Increasing clock speeds and die size
• Balancing the clock skew in large designs using simple RC trees is becoming less effective
• Insertion delay of 7-8 FO4 due to increased die size
  • Comparable to the clock period
• Clock skew control has been getting harder due to increased PVT variations
• Inductive effects at multi-GHz rates
• Use of active deskewing circuits

UltraSPARC® Family Characteristics
                    UltraSPARC-I®      UltraSPARC-II®     UltraSPARC-III®
Year                1995               1997               2000
Architecture        SPARC V9, 4-issue (all three)
Die size            17.7 × 17.8 mm²    12.5 × 12.5 mm²    15 × 15.5 mm²
# of transistors    5.2 M              5.4 M              23 M
Clock Frequency     167 MHz            330 MHz            1 GHz
Supply voltage      3.3 V              2.5 V              1.6 V
Process             0.5 µm CMOS        0.35 µm CMOS       0.15 µm CMOS
Metal layers        4 (Al)             5 (Al)             7 (Al)
Power consumption   <30 W              <30 W              <80 W

UltraSPARC-III: Clocking
• Performance-driven high-power clock distribution
• Eight logic gates per cycle
• High-speed semi-dynamic flip-flops with logic embedding
• Large hold time mandates use of advanced tools for fixing fast-path violations

UltraSPARC-III®: Clocking
[Figure: Clock distribution delay in UltraSPARC-III (Heald et al. 2000), Copyright © 2000 IEEE]

UltraSPARC-III: Clock Storage Elements
[Figure: Semidynamic flip-flop (Klass 1998), Copyright © 1998 IEEE]
• Single-ended dynamic structure with use of keepers for static operation and use of clock pulsing
• Positive feedback (NAND) improves low-to-high setup time
• Fast, at the price of high internal and clock power

UltraSPARC-III: Clock Storage Elements
[Figure: Logic embedding in a semi-dynamic flip-flop: two-input XOR function (Klass 1998), Copyright © 1998 IEEE]
• A non-inverting logic function can be embedded by replacing the input D transistor with an n-MOS logic network
• Necessary for fitting 8 logic stages in cycle time, also used for scan
• Complexity of embedded logic limited by the n-MOS stack depth

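The stack-depth limit on embedded logic can be illustrated by modeling the n-MOS network as a series/parallel tree. This is a hedged sketch, not the flip-flop of Klass: the tuple encoding, the function names, and the use of complementary inputs a_n and b_n for the XOR are assumptions made for the example, and the count ignores any clocked transistors that sit in series with the network in the real circuit.

```python
def stack_depth(net):
    """Longest chain of series transistors in a series/parallel n-MOS network."""
    if isinstance(net, str):
        return 1                       # a leaf is one transistor gated by that input
    kind, *parts = net
    depths = [stack_depth(p) for p in parts]
    return sum(depths) if kind == "series" else max(depths)


def conducts(net, inputs):
    """True when the network forms a conducting path for the given input values."""
    if isinstance(net, str):
        return inputs[net]
    kind, *parts = net
    results = [conducts(p, inputs) for p in parts]
    return all(results) if kind == "series" else any(results)


# Two-input XOR as two parallel branches of two series devices each
# (assumes true and complemented inputs are both available).
xor_net = ("parallel", ("series", "a", "b_n"), ("series", "a_n", "b"))

print(stack_depth(xor_net))  # 2
for a in (False, True):
    for b in (False, True):
        values = {"a": a, "a_n": not a, "b": b, "b_n": not b}
        print(a, b, conducts(xor_net, values))
```

A wider function such as a three-input AND would need three devices in series, which is where the stack-depth limit starts to cost speed.
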
UltraSPARC-III: Clock Storage Elements
[Figure: Differential dynamic SDFF and single-ended dynamic SDFF (Klass 1998), Copyright © 1998 IEEE]
• Dynamic version of SDFF used in dynamic logic paths
• Outputs exercise precharge-evaluate sequence to ensure monotonicity

UltraSPARC-III: Clock Storage Elements
[Figure: UltraSPARC-III flip-flop (Heald et al. 2000), Copyright © 2000 IEEE]
• Final UltraSPARC-III flip-flop modified by decoupling keepers to increase immunity to α-particles
• Somewhat degraded speed and logic embedding property

Alpha® Microprocessor Features
                     21064          21164            21264          21364
# transistors [M]    1.68           9.3              15.2           152
Die Size [mm²]       16.8 × 13.9    18.1 × 16.5      16.7 × 18.8    21.1 × 18.8
Process              0.75 µm        (not in source)  0.35 µm        0.18 µm
Supply [V]           3.3            3.3              2.2            1.5
Power [W]            30             50               72             125
Clk Freq. [MHz]      200            300              600            1200
Gates/Cycle          16             14               12             12

Alpha® Microprocessors: Clocking
[Figure: Alpha microprocessor final clock driver location: (a) 21064, (b) 21164, (c) 21264. Label: clock grid.]

Alpha® Microprocessors: Clocking
[Figure: 21064 clock skew (Gronowski et al. 1998), Copyright © 1998 IEEE]

Alpha® Microprocessors: Clocking
[Figure: 21164 clock skew (Gronowski et al. 1998), Copyright © 1998 IEEE]

Alpha® Microprocessors: Clocking
[Figure: 21264 clock hierarchy (Gronowski et al. 1998), Copyright © 1998 IEEE. Labels: ext. clk, PLL, GCLK Grid, Box Clk Grid, local clk, cond. local clk.]

Alpha® Microprocessors: Clocking
[Figure: 21264 clock skew (Gronowski et al. 1998), Copyright © 1998 IEEE]

Alpha® Microprocessors: Clocking
[Figure: 21364 major clock domains (Xanthopoulos et al. 2001), Copyright © 2001. Labels: GCLK grid, NCLK, L2LClk, L2RClk, DLL (×2).]

Alpha® Microprocessors: Clocking
[Figure: 21364, NCLK clock skew (Xanthopoulos et al. 2001), Copyright © 2001 IEEE]

Alpha® µP: Clock Storage Elements
[Figure: 21064 modified TSPC latches (Gronowski et al. 1998), Copyright © 1998]

Alpha® µP: Clock Storage Elements
[Figure: 21164: (a) phase-A latch, (b) phase-B latch (Gronowski et al. 1998), Copyright © 1998 IEEE. Labels: D, Clk, X, Q.]

Alpha® µP: Clock Storage Elements
[Figure: Embedding of logic into a latch: (a) 21064 TSPC latch, one level of logic; (b) 21164 latch, two levels of logic (Gronowski et al. 1998), Copyright © 1998 IEEE. Labels: D1, D2, D3, D4, X, X1, X2, Clk, Q.]

Alpha® µP: Clock Storage Elements
[Figure: 21264 flip-flop (Gronowski et al. 1998), Copyright © 1998 IEEE. Labels: D, Clk, Q.]

Alpha® Microprocessors: Timing
[Figure: Critical-path and race analysis for clock buffering and conditioning (Gronowski et al. 1998), Copyright © 1998 IEEE. Labels: GCLK, Logic, D, R, cond.]
Critical Path Definition and Criteria
- Identify common clock, D and R
- Maximize D
- Minimize R
- D + U - R ≤ Tcycle
Race Definition and Criteria
- Identify common clock, D and R
- Minimize D
- Maximize R
- D ≥ R + H

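The two criteria on this slide can be turned into a pair of checks. The sketch below reads them as D + U - R ≤ Tcycle for the critical path and D ≥ R + H for the race; the inequality signs are an interpretation, since they did not survive in the slide text. It also assumes that D is the delay from the common clock point to the data input of the receiving element, R is the delay from the same point to the receiving element's clock, U is setup time, and H is hold time. The function names and all numbers are hypothetical.

```python
def critical_path_ok(d_max_ps, r_min_ps, setup_ps, t_cycle_ps):
    """Setup check: use the largest D and the smallest R (the pessimistic corner)."""
    return d_max_ps + setup_ps - r_min_ps <= t_cycle_ps


def race_ok(d_min_ps, r_max_ps, hold_ps):
    """Hold check: use the smallest D and the largest R (the pessimistic corner)."""
    return d_min_ps >= r_max_ps + hold_ps


# Hypothetical path: 1000 ps cycle, 50 ps setup, 80 ps hold.
print(critical_path_ok(d_max_ps=950, r_min_ps=20, setup_ps=50, t_cycle_ps=1000))  # True
print(race_ok(d_min_ps=120, r_max_ps=60, hold_ps=80))   # False: fast-path violation
print(race_ok(d_min_ps=150, r_max_ps=60, hold_ps=80))   # True after adding 30 ps of delay
```

Note that the race check does not involve Tcycle, so a fast-path violation cannot be cured by slowing the clock; the data path has to be padded or the receiving clock path shortened.
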
Hazard-Free Level-Sensitive Polarity-Hold Latch
[Figure: latch schematic, Eichelberger 1983. Labels: Data, +Clock, -Clock, Out.]

General LSSD Configuration
[Figure: Combinational Logic block with Inputs (X) and Outputs (Y), and Clocked Storage Elements holding Present State S_n and receiving Next State S_{n+1}. Other labels: Clock, Scan-In, Scan-Out.]
Y = Y(X, S_n)
S_{n+1} = f{S_n, X}

LSSD Shift Register Latch
[Figure: L1 Latch and L2 Latch. Inputs: -Data, -Scan_In, -C Clk, +A Clk, +B Clk. Outputs: -L1, +L1, -L2.]

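The behavior of a shift register latch in a scan chain can be sketched with a small model. This is a behavioral illustration under assumptions, not the gate-level circuit on the slide: it ignores the signal polarities shown in the figure, treats each clock as an ideal non-overlapping pulse, and uses hypothetical class and method names.

```python
class ShiftRegisterLatch:
    """L1/L2 latch pair with a system clock (C) and two shift clocks (A, B)."""

    def __init__(self):
        self.l1 = 0
        self.l2 = 0

    def pulse_c(self, data):
        self.l1 = data          # system clock C: L1 captures the Data input

    def pulse_a(self, scan_in):
        self.l1 = scan_in       # shift clock A: L1 captures Scan_In

    def pulse_b(self):
        self.l2 = self.l1       # shift clock B: L2 copies L1


def scan_shift(chain, bit_in):
    """One shift step: pulse A on every latch, then pulse B on every latch."""
    # Each L1 sees the L2 of the previous latch, which is stable while B is low.
    scan_ins = [bit_in] + [srl.l2 for srl in chain[:-1]]
    for srl, s in zip(chain, scan_ins):
        srl.pulse_a(s)
    for srl in chain:
        srl.pulse_b()


chain = [ShiftRegisterLatch() for _ in range(3)]
for bit in (1, 0, 1):
    scan_shift(chain, bit)
print([srl.l2 for srl in chain])  # [1, 0, 1]
```

Because A and B never overlap, each L1 samples a stable L2 from its neighbor, so the shift works regardless of the relative delays between latches. That property is what makes the design level-sensitive rather than edge-dependent.
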
LSSD Double Latch Design
[Figure: Combinational Logic with Primary Inputs X (X1, X2, X3, ... Xn), Primary Outputs Z, and State S_n fed back from a chain of L1/L2 latch pairs. Other labels: C1, A Shift, B Shift, Scan In, Scan Out.]

IBM® S/390 Parallel Server Processor
[Figure: LSSD SRL with multiplexer used in the IBM S/390 G4 processor (Sigal et al. 1997), reproduced by permission. Labels: L1, L2, A_CLK, B_CLK, CLKG, CLKL, CLK_ENABLE, TEST_DISABLE, SCAN_IN, IN_A, IN_B, SELECT_A, SELECT_N, Q (SCAN_OUT).]

IBM® S/390 Parallel Server Processor
[Figure: Static multiplexer version of the SRL used in the IBM S/390 G4 (Sigal et al. 1997), reproduced by permission. Labels: A_CLK, B_CLK, CLKL, TEST_DISABLE, SCAN_IN, mux_A, mux_M_N, IN_B, IN_C, IN_M, IN_N, SELECT_A, SELECT_N, Q (SCAN_OUT).]

IBM® S/390 Parallel Server Processor
[Figure: Clocked storage element used in the non-timing-critical macros of the IBM S/390 G4 processor (Sigal et al. 1997), reproduced by permission. Labels: L1, L2, IN, SCAN_IN, Q (SCAN_OUT), A_CLK, B_CLK, CLKG, C1, C2, C1_DISABLE, C2_ENABLE.]

IBM® S/390 Parallel Server Processor
[Figure: The clock-generation element used to detect problems created by fast paths, IBM S/390 G4 processor (Sigal et al. 1997), reproduced by permission. Labels: CLKG, B_CLK, C1, C2, C1_DISABLE, C2_ENABLE, UNOVERLAP.]

IBM® PowerPC Processor
[Figure: The experimental IBM PowerPC processor, panels (a) and (b) (Silberman et al. 1998), reproduced by permission. Labels: SCAN_GATE, SG, SEL_EXTi, SELi, NCLK, CLK, SEL0 ... SELn-1, D0 ... Dn-1, True Mux, Complement Mux, Master Latch, Slave Latch, OT, OC, SO, SR.]

IBM® PowerPC 603: Master-Slave Latch
[Figure: The PowerPC 603 MSL (Gerosa et al. 1994), Copyright © 1994 IEEE. Labels: VDD, Din, Dout, SCANin, ACLK, C1, C2.]

IBM® PowerPC 603: Local Clk Generator
[Figure: The PowerPC 603 local clock regenerator (Gerosa et al. 1994), Copyright © 1994 IEEE. Labels: GCLK, ACLK, C1, SCAN_C1, C1_FREEZE, C1_TEST, C2_FREEZE, C2_TEST, WAITCLK, OVERRIDE.]

Summary
• Intel® Microprocessors
  • Active clock deskewing in Pentium® processors
• Sun Microsystems® Processors
  • Semidynamic flip-flop (one of the fastest single-ended flip-flops today, “soft-edge”)
• Alpha® Processors
  • Performance leader in the ’90s
  • Incorporating logic into CSEs
• IBM® Processors
  • Design for testability techniques
  • Low-power champion PowerPC 603