ELEC 7770 Advanced VLSI Design Spring 2007 Power
- Slides: 26
ELEC 7770 Advanced VLSI Design
Spring 2007
Power Aware Microprocessors
Vishwani D. Agrawal
James J. Danaher Professor
ECE Department, Auburn University
Auburn, AL 36849
vagrawal@eng.auburn.edu
http://www.eng.auburn.edu/~vagrawal/COURSE/E7770_Spr07
Spring 07, Feb 22 ELEC 7770: Advanced VLSI Design (Agrawal) 1

SIA Roadmap for Processors (1999)
Year                       1999   2002   2005   2008   2011   2014
Feature size (nm)           180    130    100     70     50     35
Logic transistors/cm²      6.2M    18M    39M    84M   180M   390M
Clock (GHz)                1.25    2.1    3.5    6.0   10.0   16.9
Chip size (mm²)             340    430    520    620    750    900
Power supply (V)            1.8    1.5    1.2    0.9    0.6    0.5
High-perf. power (W): 90, 130, 160, 175, 183
Source: http://www.semichips.org

Power Reduction in Processors
§ Just about everything is used.
§ Hardware methods:
  § Voltage reduction for dynamic power
  § Dual-threshold devices for leakage reduction
  § Clock gating, frequency reduction
  § Sleep mode
§ Architecture:
  § Instruction set
  § Hardware organization
§ Software methods

SPEC CPU2000 Benchmarks
§ Twelve integer and 14 floating point programs, CINT2000 and CFP2000.
§ Each program run time is normalized to obtain a SPEC ratio with respect to the run time of Sun Ultra 5_10 with a 300 MHz processor.
§ CINT2000 and CFP2000 summary measurements are the geometric means of SPEC ratios.

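The normalization and summary described above can be sketched in a few lines of Python. This is an illustrative sketch, not SPEC's own tooling: the function names are hypothetical, and the default scale of 100 assumes the usual convention that the reference machine scores 100.

```python
import math


def spec_ratio(reference_time_s, measured_time_s, scale=100.0):
    """Ratio of reference run time to measured run time.

    The scale of 100 is an assumption (reference machine scores 100).
    """
    return scale * reference_time_s / measured_time_s


def spec_summary(ratios):
    """Geometric mean of the per-program SPEC ratios."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical run times in seconds, chosen only for illustration.
ratios = [spec_ratio(1400.0, 700.0), spec_ratio(1800.0, 450.0)]
summary = spec_summary(ratios)
```

The geometric mean is used so that the summary does not depend on which program happens to have the longest run time.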
Reference CPU s: Sun Ultra 5_10 300 MHz Processor

CINT2000: 3.4 GHz Pentium 4, HT Technology (D850MD Motherboard)
SPECint2000_base = 1341
SPECint2000 = 1389
Source: www.spec.org

Two Benchmark Results
§ Baseline: A uniform configuration not optimized for any specific program:
  § Same compiler with the same settings and flags used for all benchmarks
  § Other restrictions
§ Peak: The run is optimized to obtain the peak performance for each benchmark program.

CFP2000: 3.6 GHz Pentium 4, HT Technology (D925XCV/AA-400 Motherboard)
SPECfp2000_base = 1627
SPECfp2000 = 1630
Source: www.spec.org

CINT2000: 1.7 GHz Pentium 4 (D850MD Motherboard)
SPECint2000_base = 579
SPECint2000 = 588
Source: www.spec.org

CFP2000: 1.7 GHz Pentium 4 (D850MD Motherboard)
SPECfp2000_base = 648
SPECfp2000 = 659
Source: www.spec.org

Energy SPEC Benchmarks
§ Energy efficiency mode: Besides the execution time, energy efficiency of SPEC benchmark programs is also measured. Energy efficiency of a benchmark program is given by:
$$\text{Energy efficiency} = \frac{1/(\text{Execution time})}{\text{joules consumed}}$$

Energy Efficiency
§ Efficiency averaged on n benchmark programs:
$$\text{Efficiency} = \left( \prod_{i=1}^{n} \text{Efficiency}_i \right)^{1/n}$$
  where $\text{Efficiency}_i$ is the efficiency for program i.
§ Relative efficiency:
$$\text{Relative efficiency} = \frac{\text{Efficiency of a computer}}{\text{Efficiency of reference computer}}$$

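As a small illustration of the definitions above, the following Python sketch computes the per-program energy efficiency, its geometric mean over n programs, and the relative efficiency against a reference machine. The function names and the sample numbers are hypothetical and only serve the example.

```python
import math


def energy_efficiency(exec_time_s, energy_j):
    """(1 / execution time) divided by the joules consumed."""
    return (1.0 / exec_time_s) / energy_j


def mean_efficiency(efficiencies):
    """Geometric mean of per-program efficiencies."""
    return math.prod(efficiencies) ** (1.0 / len(efficiencies))


def relative_efficiency(machine_effs, reference_effs):
    """Mean efficiency of a computer over that of the reference computer."""
    return mean_efficiency(machine_effs) / mean_efficiency(reference_effs)


# Hypothetical (time in s, energy in J) pairs for two programs.
machine = [energy_efficiency(2.0, 5.0), energy_efficiency(4.0, 10.0)]
reference = [energy_efficiency(4.0, 10.0), energy_efficiency(8.0, 20.0)]
rel = relative_efficiency(machine, reference)
```

In this made-up case the machine finishes each program in half the time and with half the energy of the reference, so each per-program efficiency is four times higher and so is the relative efficiency.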
SPEC 2000 Relative Energy Efficiency
[Figure: relative energy efficiency for three configurations: Always max. clock, Laptop adaptive clk., Min. power min. clock]

Voltage Scaling
§ Dynamic: Reduce voltage and frequency during idle or low activity periods.
§ Static: Clustered voltage scaling
  § Logic on non-critical paths given lower voltage.
  § 47% power reduction with 10% area increase reported.
  § M. Igarashi et al., “Clustered Voltage Scaling Techniques for Low-Power Design,” Proc. IEEE Symp. Low Power Design, 1997.

Pipeline Gating
§ A pipeline processor uses speculative execution.
§ Incorrect branch prediction results in pipeline stalls and wasted energy.
§ Idea: Stop fetching instructions if a branch hazard is expected:
  § If the count (M) of incorrect predictions exceeds a pre-specified number (N), then suspend fetching instructions for some k cycles.
§ Ref.: S. Manne, A. Klauser and D. Grunwald, “Pipeline Gating: Speculation Control for Energy Reduction,” Proc. 25th Annual International Symp. Computer Architecture, June 1998.

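The gating rule above can be sketched as a tiny cycle-by-cycle model. This is a simplified illustration of the rule as stated on the slide, not the mechanism from the cited paper. It assumes that M is reset once fetch is suspended and that no predictions are counted while fetch is suspended; the function name is hypothetical.

```python
def gated_fetch(mispredicts, n_threshold, k_cycles):
    """Return a per-cycle list that is True where instruction fetch is enabled.

    mispredicts: per-cycle booleans, True if a prediction was found incorrect.
    n_threshold: N, the pre-specified limit on the misprediction count M.
    k_cycles:    k, the number of cycles for which fetch is suspended.
    """
    fetch_enabled = []
    m_count = 0      # M: running count of incorrect predictions
    stall_left = 0   # remaining cycles of suspended fetch
    for mispredicted in mispredicts:
        if stall_left > 0:
            # Fetch is gated; assumption: nothing is counted in these cycles.
            fetch_enabled.append(False)
            stall_left -= 1
            continue
        fetch_enabled.append(True)
        if mispredicted:
            m_count += 1
        if m_count > n_threshold:
            stall_left = k_cycles
            m_count = 0  # assumption: the count restarts after gating
    return fetch_enabled


trace = [True, True, True, False, False, False, True]
enabled = gated_fetch(trace, n_threshold=2, k_cycles=2)
```

With N = 2 and k = 2, the third misprediction in the sample trace pushes M above N, so fetch is suspended for the following two cycles.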
Slack Scheduling
§ Application: Superscalar, out-of-order execution:
  § An instruction is executed as soon as data and resources it needs become available.
  § A commit unit reorders the results.
§ Delay the execution of instructions whose result is not immediately needed.
§ Example of RISC instructions:
  § add r0, r1, r2; (A)
  § sub r3, r4, r5; (B)
  § and r9, x1, r9; (C)
  § or r5, r9, r10; (D)
  § xor r2, r10, r11; (E)
§ J. Casmira and D. Grunwald, “Dynamic Instruction Scheduling Slack,” Proc. ACM Kool Chips Workshop, Dec. 2000.

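One way to see which of the five instructions can be delayed is to compute each instruction's slack as the gap between its earliest and latest possible issue slot. The sketch below is an illustration under stated assumptions, not the hardware scheme of the cited paper: the last operand is taken as the destination, only read-after-write dependences are tracked, every instruction takes one cycle, and execution units are unlimited.

```python
def instruction_slack(instrs):
    """Return {name: slack in cycles} for (name, sources, destination) tuples."""
    last_writer = {}
    preds = {name: set() for name, _, _ in instrs}
    for name, srcs, dst in instrs:
        for reg in srcs:
            if reg in last_writer:          # read-after-write dependence
                preds[name].add(last_writer[reg])
        last_writer[dst] = name

    succs = {name: set() for name in preds}
    for name, ps in preds.items():
        for p in ps:
            succs[p].add(name)

    # Earliest slot: one cycle after the latest producer.
    asap = {}
    for name, _, _ in instrs:
        asap[name] = max((asap[p] + 1 for p in preds[name]), default=0)

    # Latest slot that does not lengthen the schedule.
    last_slot = max(asap.values())
    alap = {}
    for name, _, _ in reversed(instrs):
        alap[name] = min((alap[s] - 1 for s in succs[name]), default=last_slot)

    return {name: alap[name] - asap[name] for name in asap}


# Assumption: operands are (source, source, destination).
program = [
    ("A", ["r0", "r1"], "r2"),
    ("B", ["r3", "r4"], "r5"),
    ("C", ["r9", "x1"], "r9"),
    ("D", ["r5", "r9"], "r10"),
    ("E", ["r2", "r10"], "r11"),
]
slack = instruction_slack(program)
```

Under these assumptions D needs the results of B and C, and E needs the results of A and D, so only A has slack: its result is not consumed until E issues.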
Slack Scheduling Example
[Figure: issue timelines of instructions A, B, C, D, E under standard scheduling and under slack scheduling]

Slack Scheduling logic
[Figure: block diagram; labels: Re-order buffer, Slack bit, Low-power execution units]

Clock Distribution
[Figure: clock distribution network; label: clock]

Clock Power
$$P_{clk} = C_L V_{DD}^2 f + \frac{C_L V_{DD}^2 f}{\lambda} + \frac{C_L V_{DD}^2 f}{\lambda^2} + \cdots = C_L V_{DD}^2 f \sum_{n=0}^{\text{stages}-1} \frac{1}{\lambda^n}$$
where
$C_L$ = total load capacitance
$\lambda$ = constant fanout at each stage in distribution network
Clock consumes about 40% of total processor power.

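A quick numerical sketch of the clock power sum is given below. The function name and the sample values (1 nF load, 2 V supply, 100 MHz, fanout 2, three stages) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def clock_power(c_load_f, vdd_v, freq_hz, fanout, stages):
    """Clock network power: C_L * VDD^2 * f * sum_{n=0}^{stages-1} 1/fanout^n."""
    base = c_load_f * vdd_v ** 2 * freq_hz           # power of the final stage
    series = sum(1.0 / fanout ** n for n in range(stages))
    return base * series


# Hypothetical numbers: the last stage alone dissipates 0.4 W,
# and the driver stages add 0.2 W + 0.1 W on top of it.
p_clk = clock_power(1e-9, 2.0, 100e6, fanout=2, stages=3)
```

Since the series is geometric, a larger fanout λ makes the buffer stages a smaller fraction of the total, and the load term $C_L V_{DD}^2 f$ dominates.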
Clock Network Examples
Processors: Alpha 21064, Alpha 21164, Alpha 21264
Technology: 0.75μ CMOS, 0.35μ CMOS
Frequency (MHz): 200, 300, 600
Total capacitance: 12.5 nF
Clock load: 3.25 nF, 3.75 nF
Clock power: 40% (20 W)
Max. clock skew: 200 ps (<10%), 90 ps
D. W. Bailey and B. J. Benschneider, “Clocking Design and Analysis for a 600-MHz Alpha Microprocessor,” IEEE J. Solid-State Circuits, vol. 33, no. 11, pp. 1627-1633, Nov. 1998.

Power Reduction Example
§ Alpha 21064: 200 MHz @ 3.45 V, power dissipation = 26 W
§ Reduce voltage to 1.5 V, power (5.3x) = 4.9 W
§ Eliminate FP, power (3x) = 1.6 W
§ Scale 0.75→0.35μ, power (2x) = 0.8 W
§ Reduce clock load, power (1.3x) = 0.6 W
§ Reduce frequency 200→160 MHz, power (1.25x) = 0.5 W
§ J. Montanaro et al., “A 160-MHz, 32-b, 0.5-W CMOS RISC Microprocessor,” IEEE J. Solid-State Circuits, vol. 31, no. 11, pp. 1703-1714, Nov. 1996.

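The chain of reductions above can be checked with a few lines of Python. The step labels and factors are taken from the list; the helper name is hypothetical. The voltage factor is also recomputed from the quadratic dependence of dynamic power on supply voltage.

```python
def apply_reductions(start_power_w, steps):
    """Divide the power by each reduction factor in turn; return the trail."""
    power = start_power_w
    trail = []
    for label, factor in steps:
        power /= factor
        trail.append((label, power))
    return trail


steps = [
    ("Reduce voltage 3.45 V -> 1.5 V", 5.3),
    ("Eliminate FP", 3.0),
    ("Scale 0.75 -> 0.35 micron", 2.0),
    ("Reduce clock load", 1.3),
    ("Reduce frequency 200 -> 160 MHz", 1.25),
]
trail = apply_reductions(26.0, steps)
for label, power in trail:
    print(f"{label}: {power:.1f} W")

# Dynamic power scales with V^2, which gives the 5.3x voltage factor.
voltage_factor = (3.45 / 1.5) ** 2
# The frequency factor is simply the ratio of clock rates.
frequency_factor = 200.0 / 160.0
```

Rounded to one decimal place, the intermediate values reproduce the 4.9, 1.6, 0.8, 0.6 and 0.5 W figures in the list.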
Parallel Architecture
[Figure: a single processor clocked at f, next to two processors in parallel, each clocked at f/2, with input and output running at f]
Single processor: Capacitance = C, Voltage = V, Frequency = f, Power = CV²f
Two parallel processors: Capacitance = 2.2C, Voltage = 0.6V, Frequency = 0.5f, Power = 0.396CV²f

Pipeline Architecture
[Figure: a single processor between input and output registers clocked at f, next to the same processor split into two ½ Proc. stages with a register between them, also clocked at f]
Single processor: Capacitance = C, Voltage = V, Frequency = f, Power = CV²f
Two-stage pipeline: Capacitance = 1.2C, Voltage = 0.6V, Frequency = f, Power = 0.432CV²f

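The power figures for the parallel and pipelined designs follow directly from P = CV²f. A minimal Python check is sketched below; the function name is hypothetical, and each argument is the factor relative to the single-processor values C, V and f.

```python
def relative_power(cap_factor, volt_factor, freq_factor):
    """Power relative to C*V^2*f for scaled capacitance, voltage, frequency."""
    return cap_factor * volt_factor ** 2 * freq_factor


# Two parallel processors: 2.2C, 0.6V, 0.5f
parallel = relative_power(2.2, 0.6, 0.5)
# Two-stage pipeline: 1.2C, 0.6V, f
pipeline = relative_power(1.2, 0.6, 1.0)
```

Both designs keep the throughput of the original processor; the saving comes from the quadratic voltage term, which outweighs the added capacitance.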
Approximate Trend
               n-parallel proc.   n-stage pipeline proc.
Capacitance    nC                 C
Voltage        V/n                V/n
Frequency      f/n                f
Power          CV²f/n²            CV²f/n²
Chip area      n times            10-20% increase
G. K. Yeap, Practical Low Power Digital VLSI Design, Boston: Kluwer Academic Publishers, 1998.

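The power entry of the trend table can be verified from $P = CV^2f$, assuming the supply voltage is reduced to $V/n$ in both the n-parallel and the n-stage pipeline case:

```latex
\begin{align*}
P_{\text{parallel}} &= (nC)\left(\frac{V}{n}\right)^{2}\frac{f}{n} = \frac{CV^{2}f}{n^{2}} \\
P_{\text{pipeline}} &= C\left(\frac{V}{n}\right)^{2} f = \frac{CV^{2}f}{n^{2}}
\end{align*}
```

Both approaches reach the same approximate power; they differ in cost, with the parallel design needing about n times the chip area and the pipeline a 10-20% increase.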
For More on Microprocessors
§ T. D. Burd and R. W. Brodersen, Energy Efficient Microprocessor Design, Springer, 2002.
§ R. Graybill and R. Melhem, Power Aware Computing, New York: Plenum Publishers, 2002.