ELEC 7770 Advanced VLSI Design, Spring 2007: Power Aware Microprocessors


ELEC 7770 Advanced VLSI Design, Spring 2007
Power Aware Microprocessors
Vishwani D. Agrawal
James J. Danaher Professor
ECE Department, Auburn University, Auburn, AL 36849
vagrawal@eng.auburn.edu
http://www.eng.auburn.edu/~vagrawal/COURSE/E7770_Spr07
Spring 07, Feb 22 ELEC 7770: Advanced VLSI Design (Agrawal) 1

SIA Roadmap for Processors (1999)

Year                      1999   2002   2005   2008   2011   2014
Feature size (nm)          180    130    100     70     50     35
Logic transistors/cm²     6.2M    18M    39M    84M   180M   390M
Clock (GHz)               1.25    2.1    3.5    6.0   10.0   16.9
Chip size (mm²)            340    430    520    620    750    900
Power supply (V)           1.8    1.5    1.2    0.9    0.6    0.5

High-perf. power (W): 90, 130, 160, 175, 183 (five values survive; their year columns are not recoverable)
Source: http://www.semichips.org

Power Reduction in Processors
§ Just about everything is used.
§ Hardware methods:
    § Voltage reduction for dynamic power
    § Dual-threshold devices for leakage reduction
    § Clock gating, frequency reduction
    § Sleep mode
§ Architecture:
    § Instruction set
    § Hardware organization
§ Software methods

SPEC CPU 2000 Benchmarks
§ Twelve integer and 14 floating point programs, CINT2000 and CFP2000.
§ Each program's run time is normalized to obtain a SPEC ratio with respect to the run time on a Sun Ultra 5_10 with a 300 MHz processor.
§ CINT2000 and CFP2000 summary measurements are the geometric means of the SPEC ratios.
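The normalization and summary described above can be sketched in a few lines of Python. This is a minimal illustration, not SPEC's tooling: the run times are hypothetical, and the scale factor of 100 (so that the reference machine itself scores 100) is stated here as an assumption.

```python
import math

def spec_ratio(reference_seconds, measured_seconds, scale=100.0):
    """Reference run time divided by measured run time, times a scale factor.

    The scale of 100 is an assumption of this sketch.
    """
    return scale * reference_seconds / measured_seconds

def geometric_mean(values):
    """n-th root of the product of n values, computed through logarithms."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Hypothetical (reference machine, machine under test) run times in seconds.
runs = [(1400.0, 140.0), (1800.0, 150.0), (1100.0, 100.0)]
ratios = [spec_ratio(ref, t) for ref, t in runs]
summary = geometric_mean(ratios)
print([round(r) for r in ratios], round(summary, 1))
```

The geometric mean is used rather than the arithmetic mean so that the ranking of two machines does not depend on which machine is chosen as the reference.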

Reference CPU: Sun Ultra 5_10, 300 MHz Processor

CINT2000: 3.4 GHz Pentium 4, HT Technology (D850MD Motherboard)
SPECint2000_base = 1341
SPECint2000 = 1389
Source: www.spec.org

Two Benchmark Results
§ Baseline: A uniform configuration, not optimized for any specific program:
    § Same compiler with the same settings and flags used for all benchmarks
    § Other restrictions
§ Peak: The run is optimized to obtain the peak performance for each benchmark program.

CFP2000: 3.6 GHz Pentium 4, HT Technology (D925XCV/AA-400 Motherboard)
SPECfp2000_base = 1627
SPECfp2000 = 1630
Source: www.spec.org

CINT2000: 1.7 GHz Pentium 4 (D850MD Motherboard)
SPECint2000_base = 579
SPECint2000 = 588
Source: www.spec.org

CFP2000: 1.7 GHz Pentium 4 (D850MD Motherboard)
SPECfp2000_base = 648
SPECfp2000 = 659
Source: www.spec.org

Energy SPEC Benchmarks
§ Energy efficiency mode: Besides the execution time, the energy efficiency of SPEC benchmark programs is also measured. The energy efficiency of a benchmark program is given by:

$$\text{Energy efficiency} = \frac{1/(\text{Execution time})}{\text{joules consumed}}$$

Energy Efficiency
§ Efficiency averaged over n benchmark programs:

$$\text{Efficiency} = \left( \prod_{i=1}^{n} \text{Efficiency}_i \right)^{1/n}$$

  where Efficiency_i is the efficiency for program i.
§ Relative efficiency:

$$\text{Relative efficiency} = \frac{\text{Efficiency of a computer}}{\text{Efficiency of reference computer}}$$
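As a rough illustration of the two formulas above, the following Python sketch computes per-program energy efficiency, the geometric-mean efficiency, and the relative efficiency against a reference machine. The function names and all measurements are hypothetical.

```python
import math

def energy_efficiency(execution_seconds, joules):
    """(1 / execution time) / joules consumed, for one benchmark program."""
    return (1.0 / execution_seconds) / joules

def geometric_mean(values):
    """n-th root of the product of n values, computed through logarithms."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))

def relative_efficiency(machine_runs, reference_runs):
    """Ratio of the geometric-mean efficiencies of two machines.

    Each argument is a list of (execution_seconds, joules) pairs, one per program.
    """
    machine = geometric_mean(energy_efficiency(t, e) for t, e in machine_runs)
    reference = geometric_mean(energy_efficiency(t, e) for t, e in reference_runs)
    return machine / reference

# Hypothetical measurements: the machine under test needs half the time
# and half the energy of the reference on both programs.
reference_runs = [(100.0, 5000.0), (200.0, 8000.0)]
machine_runs = [(50.0, 2500.0), (100.0, 4000.0)]
print(relative_efficiency(machine_runs, reference_runs))
```

Halving both time and energy multiplies each per-program efficiency by four, so the relative efficiency in this made-up case comes out close to 4.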

SPEC 2000 Relative Energy Efficiency
[Chart: relative energy efficiency; category labels: Always max. clock, Laptop adaptive clk., Min. power min. clock]

Voltage Scaling
§ Dynamic: Reduce voltage and frequency during idle or low activity periods.
§ Static: Clustered voltage scaling
    § Logic on non-critical paths is given a lower voltage.
    § 47% power reduction with 10% area increase reported.
    § M. Igarashi et al., "Clustered Voltage Scaling Techniques for Low-Power Design," Proc. IEEE Symp. Low Power Design, 1997.

Pipeline Gating
§ A pipelined processor uses speculative execution.
§ Incorrect branch prediction results in pipeline stalls and wasted energy.
§ Idea: Stop fetching instructions if a branch hazard is expected:
    § If the count (M) of incorrect predictions exceeds a pre-specified number (N), then suspend instruction fetch for some k cycles.
§ Ref.: S. Manne, A. Klauser and D. Grunwald, "Pipeline Gating: Speculation Control for Energy Reduction," Proc. 25th Annual International Symp. Computer Architecture, June 1998.
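The gating rule above (suspend fetch for k cycles once M exceeds N) can be modeled with a small counter. The Python sketch below is a toy cycle-level model, not the mechanism of the cited paper: the class and function names are hypothetical, and resetting M to zero when the gate closes is an assumption, since the slide does not say how M is cleared.

```python
class PipelineGate:
    """Toy fetch gate: stall for k cycles once mispredictions exceed n."""

    def __init__(self, n_threshold, k_cycles):
        self.n = n_threshold
        self.k = k_cycles
        self.m = 0            # running count of incorrect predictions
        self.stall_left = 0   # remaining cycles with fetch suspended

    def fetch_allowed(self):
        """Call once per cycle; returns False while fetch is gated."""
        if self.stall_left > 0:
            self.stall_left -= 1
            return False
        return True

    def record_misprediction(self):
        self.m += 1
        if self.m > self.n:
            self.stall_left = self.k
            self.m = 0        # assumption: the counter restarts after gating

def simulate(mispredict_by_cycle, n, k):
    """Return how many cycles fetched an instruction."""
    gate = PipelineGate(n, k)
    fetched = 0
    for mispredicted in mispredict_by_cycle:
        if gate.fetch_allowed():
            fetched += 1
        if mispredicted:
            gate.record_misprediction()
    return fetched

# Three mispredictions in a row with N = 2 close the gate for k = 3 cycles.
trace = [True, True, True, False, False, False, False, False]
print(simulate(trace, n=2, k=3), "of", len(trace), "cycles fetched")
```

In this model the energy saving would come from the cycles in which no instruction is fetched down a likely wrong path; the performance cost is the same set of cycles when the prediction would have been right.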

Slack Scheduling
§ Application: Superscalar, out-of-order execution:
    § An instruction is executed as soon as the data and resources it needs become available.
    § A commit unit reorders the results.
§ Delay the execution of instructions whose result is not immediately needed.
§ Example of RISC instructions:
    § add r0, r1, r2; (A)
    § sub r3, r4, r5; (B)
    § and r9, x1, r9; (C)
    § or r5, r9, r10; (D)
    § xor r2, r10, r11; (E)
§ Ref.: J. Casmira and D. Grunwald, "Dynamic Instruction Scheduling Slack," Proc. ACM Kool Chips Workshop, Dec. 2000.
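One way to see which of the five instructions can be delayed is to compute, for each, the gap between its earliest and latest possible issue cycle. The Python sketch below does this under several assumptions that are not stated on the slide: the last operand is the destination register, every instruction takes one cycle, only read-after-write dependences matter (register renaming removes the others), and resources are unlimited.

```python
# (label, source registers, destination register); destination-last is assumed.
PROGRAM = [
    ("A", ["r0", "r1"], "r2"),
    ("B", ["r3", "r4"], "r5"),
    ("C", ["r9", "x1"], "r9"),
    ("D", ["r5", "r9"], "r10"),
    ("E", ["r2", "r10"], "r11"),
]

def raw_dependencies(program):
    """Map each instruction to the earlier instructions whose results it reads."""
    last_writer = {}
    deps = {}
    for label, sources, dest in program:
        deps[label] = {last_writer[r] for r in sources if r in last_writer}
        last_writer[dest] = label
    return deps

def slack(program):
    """Latest minus earliest issue cycle for each instruction (unit latency)."""
    deps = raw_dependencies(program)
    order = [label for label, _, _ in program]
    earliest = {}
    for label in order:  # program order is already a topological order
        earliest[label] = max((earliest[d] + 1 for d in deps[label]), default=0)
    last_cycle = max(earliest.values())
    latest = {label: last_cycle for label in order}
    for label in reversed(order):  # consumers are finalized before producers
        for d in deps[label]:
            latest[d] = min(latest[d], latest[label] - 1)
    return {label: latest[label] - earliest[label] for label in order}

print(slack(PROGRAM))
```

Under these assumptions only A has slack: its result is first read by E, which must wait for D anyway, so A could be issued one cycle later (or on a slower, lower-power unit) without lengthening the schedule.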

Slack Scheduling Example
[Figure: schedules of instructions A, B, C, D, E under standard scheduling and under slack scheduling]

Slack Scheduling logic
[Figure: block diagram; labels: Re-order buffer, Slack bit, Low-power execution units]

Clock Distribution
[Figure: clock distribution network; label: clock]

Clock Power

$$P_{clk} = C_L V_{DD}^2 f + \frac{C_L V_{DD}^2 f}{\lambda} + \frac{C_L V_{DD}^2 f}{\lambda^2} + \cdots = C_L V_{DD}^2 f \sum_{n=0}^{\text{stages}-1} \frac{1}{\lambda^n}$$

where
    C_L = total load capacitance
    λ = constant fanout at each stage in the distribution network

Clock consumes about 40% of total processor power.
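The clock power sum can be evaluated directly. The Python sketch below is only an illustration of the formula; the function name and the numeric values (1 nF load, 1 V supply, 1 GHz clock, fanout 4, three stages) are hypothetical.

```python
def clock_power(c_load, vdd, freq, fanout, stages):
    """C_L * VDD^2 * f * sum over n = 0 .. stages-1 of 1 / fanout^n."""
    series = sum(1.0 / fanout ** n for n in range(stages))
    return c_load * vdd ** 2 * freq * series

# Hypothetical network: the final stage drives 1 nF at 1 V and 1 GHz.
load_only = clock_power(1e-9, 1.0, 1e9, fanout=4, stages=1)
whole_tree = clock_power(1e-9, 1.0, 1e9, fanout=4, stages=3)
print(load_only, whole_tree)
```

Because the series is geometric, the buffer stages add at most a factor of λ/(λ - 1) over the power of the final load alone, which is why a larger fanout per stage keeps the overhead of the distribution tree small.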

Clock Network Examples
Processors compared: Alpha 21064, Alpha 21164, Alpha 21264

Technology:          0.75μ CMOS, 0.35μ CMOS
Frequency (MHz):     200, 300, 600
Total capacitance:   12.5 nF
Clock load:          3.25 nF, 3.75 nF
Clock power:         40% (20 W)
Max. clock skew:     200 ps (<10%), 90 ps

Rows with fewer than three values had empty cells in the original table; their column positions are not recoverable.
D. W. Bailey and B. J. Benschneider, "Clocking Design and Analysis for a 600-MHz Alpha Microprocessor," IEEE J. Solid-State Circuits, vol. 33, no. 11, pp. 1627-1633, Nov. 1998.

Power Reduction Example
§ Alpha 21064: 200 MHz @ 3.45 V, power dissipation = 26 W
§ Reduce voltage to 1.5 V, power (5.3x) = 4.9 W
§ Eliminate FP, power (3x) = 1.6 W
§ Scale 0.75→0.35μ, power (2x) = 0.8 W
§ Reduce clock load, power (1.3x) = 0.6 W
§ Reduce frequency 200→160 MHz, power (1.25x) = 0.5 W
§ J. Montanaro et al., "A 160-MHz, 32-b, 0.5-W CMOS RISC Microprocessor," IEEE J. Solid-State Circuits, vol. 31, no. 11, pp. 1703-1714, Nov. 1996.
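The chain of reductions above can be checked with a few lines of Python. The starting power and the reduction factors are taken from the list; the variable names are illustrative, and the two derived factors simply restate the quadratic dependence of dynamic power on voltage and its linear dependence on frequency.

```python
START_WATTS = 26.0  # Alpha 21064 at 200 MHz and 3.45 V

# (step, power reduction factor) as listed above
STEPS = [
    ("Reduce voltage 3.45 V to 1.5 V", 5.3),
    ("Eliminate FP", 3.0),
    ("Scale 0.75 to 0.35 micron", 2.0),
    ("Reduce clock load", 1.3),
    ("Reduce frequency 200 to 160 MHz", 1.25),
]

powers = []
watts = START_WATTS
for step, factor in STEPS:
    watts /= factor
    powers.append(watts)
    print(f"{step}: {watts:.2f} W")

# Where two of the factors come from:
voltage_factor = (3.45 / 1.5) ** 2   # dynamic power scales with V^2
frequency_factor = 200.0 / 160.0     # and linearly with f
print(f"{voltage_factor:.2f} {frequency_factor:.2f}")
```

Carrying the unrounded values through gives the same one-decimal figures as the list, so the overall reduction is roughly a factor of 50.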

Parallel Architecture
[Figure: a single processor clocked at f, compared with two parallel processors each clocked at f/2 whose input and output run at f]

Single processor:          Capacitance = C, Voltage = V, Frequency = f, Power = CV²f
Two parallel processors:   Capacitance = 2.2C, Voltage = 0.6V, Frequency = 0.5f, Power = 0.396CV²f

Pipeline Architecture
[Figure: a single processor between input and output registers, compared with two ½ Proc. stages separated by a register, both clocked at f]

Single processor:      Capacitance = C, Voltage = V, Frequency = f, Power = CV²f
Two-stage pipeline:    Capacitance = 1.2C, Voltage = 0.6V, Frequency = f, Power = 0.432CV²f
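The power figures quoted for the parallel and pipelined designs follow from scaling each term of CV²f. A short Python check, with a hypothetical helper name, expresses power relative to the single-processor value:

```python
def relative_power(cap_scale, volt_scale, freq_scale):
    """Power relative to C * V^2 * f when C, V and f are each scaled."""
    return cap_scale * volt_scale ** 2 * freq_scale

parallel = relative_power(2.2, 0.6, 0.5)   # two processors, each at f/2
pipeline = relative_power(1.2, 0.6, 1.0)   # two half-length stages at f
print(round(parallel, 3), round(pipeline, 3))
```

In both cases the saving comes from the lower supply voltage that the relaxed timing allows; the extra capacitance (duplicated hardware, or pipeline registers) eats into it.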

Approximate Trend

               n-parallel proc.    n-stage pipeline proc.
Capacitance    nC                  C
Voltage        V/n                 V/n
Frequency      f/n                 f
Power          CV²f/n²             CV²f/n²
Chip area      n times             10-20% increase

G. K. Yeap, Practical Low Power Digital VLSI Design, Boston: Kluwer Academic Publishers, 1998.
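The common power entry can be checked by substituting the capacitance, voltage and frequency entries into P = CV²f. This is an idealized calculation that takes the supply voltage as V/n in both cases and ignores the overhead capacitance seen in the two-unit examples:

```latex
\begin{align*}
P_{\text{parallel}} &= (nC)\left(\frac{V}{n}\right)^{2}\left(\frac{f}{n}\right) = \frac{CV^{2}f}{n^{2}} \\
P_{\text{pipeline}} &= C\left(\frac{V}{n}\right)^{2} f = \frac{CV^{2}f}{n^{2}}
\end{align*}
```

Both organizations reach the same power in this approximation; they differ in cost, with the parallel design paying n times the chip area and the pipeline only the area of the added registers.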

For More on Microprocessors
§ T. D. Burd and R. W. Brodersen, Energy Efficient Microprocessor Design, Springer, 2002.
§ R. Graybill and R. Melhem, Power Aware Computing, New York: Plenum Publishers, 2002.