Low-Power Design of Digital VLSI Circuits: Multicore Design

Multicore Design for Low Power
Vishwani D. Agrawal
James J. Danaher Professor
Dept. of Electrical and Computer Engineering
Auburn University, Auburn, AL 36849
vagrawal@eng.auburn.edu
http://www.eng.auburn.edu/~vagrawal
Copyright Agrawal, 2011. Lecture 15: Multicore Design

Low-Power Datapath Architecture
- Lower the supply voltage:
  - this slows down the circuit;
  - use parallel computing to gain the speed back;
  - this works well when the threshold voltage is also lowered.
- About 60% reduction in power is obtainable.
Reference: A. P. Chandrakasan and R. W. Brodersen, Low Power Digital CMOS Design, Boston: Kluwer Academic Publishers (now Springer), 1995.

A Reference Datapath
[Figure: Input → input register → combinational logic → output register → Output, with both registers clocked by CK.]
- Supply voltage: $V_{ref}$
- Total capacitance switched per cycle: $C_{ref}$
- Clock frequency: $f$
- Power consumption: $P_{ref} = C_{ref} V_{ref}^2 f$

A Parallel Architecture
[Figure: the input feeds N input registers, each clocked at f/N and followed by its own copy of the combinational logic (Copy 1 to Copy N); an N-to-1 multiplexer selects among the copies and the output register is clocked at f. A multiphase clock generator and mux control, driven by CK, produce the f/N register clocks and the multiplexer select.]
- N = degree of parallelism.
- Each copy processes every Nth input and operates at f/N.
- Supply voltage: $V_N \le V_1 = V_{ref}$, i.e., the copies run at a reduced voltage.
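
The round-robin scheme (copy k handles inputs k, k + N, k + 2N, ...) can be checked with a purely functional Python sketch. It ignores timing and supply voltage entirely and only confirms that spreading the inputs across N copies and multiplexing the results back in the same order reproduces the serial datapath's output stream. The function names and the stand-in logic `square_plus_one` are hypothetical.

```python
from typing import Callable, List

def serial_datapath(inputs: List[int], comb: Callable[[int], int]) -> List[int]:
    """Reference datapath: a single copy of the logic handles every input."""
    return [comb(x) for x in inputs]

def parallel_datapath(inputs: List[int], comb: Callable[[int], int], n: int) -> List[int]:
    """N copies of the logic; copy k receives inputs k, k + N, k + 2N, ..."""
    # Each copy stores its results in the order it received its inputs.
    per_copy: List[List[int]] = [[] for _ in range(n)]
    for i, x in enumerate(inputs):
        per_copy[i % n].append(comb(x))
    # The N-to-1 multiplexer visits the copies in the same round-robin order,
    # so output i comes from copy i % N, entry i // N.
    return [per_copy[i % n][i // n] for i in range(len(inputs))]

def square_plus_one(x: int) -> int:
    # Hypothetical stand-in for the combinational logic block.
    return x * x + 1

data = list(range(10))
print(parallel_datapath(data, square_plus_one, 4))
# [1, 2, 5, 10, 17, 26, 37, 50, 65, 82]
```

Any N works here; in hardware, the choice of N is what lets each copy run at f/N and hence at a lower supply.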

Level Converter: L to H
[Figure: circuit converting an input Vin_L from the VDDL domain to an output Vout_H in the VDDH domain; some transistors have thicker oxide and longer channels.]
N. H. E. Weste and D. Harris, CMOS VLSI Design, Third Edition, Section 12.4.3, Addison-Wesley, 2005.

Level Converter: H to L
[Figure: circuit converting an input Vin_H to an output Vout_L in the VDDL domain; some transistors have thicker oxide and longer channels.]
N. H. E. Weste and D. Harris, CMOS VLSI Design, Third Edition, Section 12.4.3, Addison-Wesley, 2005.

Control Signals, N = 4
[Figure: timing diagram of the master clock CK and the four phase signals, Phase 1 to Phase 4.]
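
One plausible reading of the N = 4 timing diagram is that each phase is a one-period enable pulse that repeats every N periods of CK, with successive phases offset by one period, so that input register k is loaded once every N cycles. Under that assumption (the exact pulse widths of the original figure are not recoverable), the phase signals can be generated as below; `phase_waveforms` is a hypothetical helper.

```python
from typing import List

def phase_waveforms(n: int, cycles: int) -> List[List[int]]:
    """Sample n phase signals once per master-clock (CK) cycle.

    Assumption: phase k (0-based here) is high only in cycles k, k + n, k + 2n, ...
    """
    return [[1 if c % n == k else 0 for c in range(cycles)] for k in range(n)]

for k, wave in enumerate(phase_waveforms(4, 8), start=1):
    print(f"Phase {k}: " + "".join("#" if bit else "_" for bit in wave))
# Phase 1: #___#___
# Phase 2: _#___#__
# Phase 3: __#___#_
# Phase 4: ___#___#
```

Exactly one phase is high in every CK cycle, which is what lets the same counter drive both the register enables and the N-to-1 multiplexer select.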
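
Power
$P_N = P_{proc} + P_{overhead}$
$P_{proc} = N (C_{inreg} + C_{comb}) V_N^2 \frac{f}{N} + C_{outreg} V_N^2 f = (C_{inreg} + C_{comb} + C_{outreg}) V_N^2 f = C_{ref} V_N^2 f$
$P_{overhead} = C_{overhead} V_N^2 f \approx \delta C_{ref} (N - 1) V_N^2 f$, where δ is the overhead capacitance added by each extra copy, as a fraction of $C_{ref}$
$P_N \approx [1 + \delta (N - 1)] C_{ref} V_N^2 f$
With $P_1 = C_{ref} V_{ref}^2 f$:
$\frac{P_N}{P_1} \approx [1 + \delta (N - 1)] \frac{V_N^2}{V_{ref}^2}$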
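
The bookkeeping above can be checked numerically. The sketch below evaluates $P_N$ term by term and compares it with the closed form $[1 + \delta(N - 1)] V_N^2 / V_{ref}^2$ relative to $P_1 = C_{ref} V_{ref}^2 f$. The capacitance, voltage and frequency values are arbitrary illustrative numbers, not data from the lecture, and the overhead capacitance is modeled as exactly $\delta C_{ref}(N - 1)$, as in the approximation.

```python
import math

def parallel_power(c_inreg, c_comb, c_outreg, c_overhead, v_n, f, n):
    """P_N = P_proc + P_overhead for the N-way parallel datapath (SI units)."""
    # N input registers and N logic copies, each switching at f/N
    p_copies = n * (c_inreg + c_comb) * v_n**2 * (f / n)
    # The single output register switches at the full rate f
    p_outreg = c_outreg * v_n**2 * f
    # Multiplexer, clock generation and routing overhead
    p_overhead = c_overhead * v_n**2 * f
    return p_copies + p_outreg + p_overhead

def power_ratio(n, delta, v_n, v_ref):
    """Closed form P_N / P_1 = [1 + delta (N - 1)] (V_N / V_ref)^2."""
    return (1 + delta * (n - 1)) * (v_n / v_ref) ** 2

# Illustrative (made-up) values: C_ref = 10 pF split 1 + 8 + 1 pF, f = 100 MHz
c_in, c_comb, c_out = 1e-12, 8e-12, 1e-12
c_ref = c_in + c_comb + c_out
n, delta, v_ref, v_n, f = 4, 0.1, 5.0, 2.9, 100e6

p_1 = c_ref * v_ref**2 * f
p_n = parallel_power(c_in, c_comb, c_out, delta * c_ref * (n - 1), v_n, f, n)
print(p_n / p_1, power_ratio(n, delta, v_n, v_ref))  # both about 0.437
print(math.isclose(p_n / p_1, power_ratio(n, delta, v_n, v_ref)))  # True
```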

Voltage vs. Speed
Delay of a gate: $T \approx \frac{C_L V_{ref}}{I} = \frac{C_L V_{ref}}{k (W/L) (V_{ref} - V_t)^2}$
where I is the saturation current, k is a technology parameter, W/L is the width-to-length ratio of the transistor, and $V_t$ is the threshold voltage.
[Figure: normalized gate delay T (0 to 4) versus supply voltage for 1.2 μm CMOS. Delays of 1, 2 and 3 (N = 1, 2, 3) are reached at $V_{ref}$ = 5 V, $V_2$ = 2.9 V and $V_3$; successive voltage reductions become smaller as the supply approaches $V_t$.]
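
Inverting this delay model gives the supply $V_N$ at which each copy may run N times slower. Because $C_L$, k and W/L cancel in the normalized delay, only $V_{ref}$ and $V_t$ matter. The sketch below solves for $V_N$ by bisection, which works because $V/(V - V_t)^2$ decreases monotonically for $V > V_t$. It uses the idealized square-law expression above, so its voltages need not match the measured 1.2 μm curve exactly; the function names are hypothetical.

```python
def normalized_delay(v: float, v_ref: float, v_t: float) -> float:
    """Gate delay at supply v divided by the delay at v_ref (C_L, k and W/L cancel)."""
    return (v / (v - v_t) ** 2) / (v_ref / (v_ref - v_t) ** 2)

def supply_for_parallelism(n: int, v_ref: float, v_t: float, tol: float = 1e-9) -> float:
    """Supply V_N at which the gate delay is n times the delay at v_ref.

    Bisection on (v_t, v_ref]: the delay grows without bound as v -> v_t
    and equals 1 at v_ref, so for n >= 1 the root lies in this interval.
    """
    lo, hi = v_t, v_ref
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if normalized_delay(mid, v_ref, v_t) > n:
            lo = mid  # still slower than allowed: the supply must be higher
        else:
            hi = mid  # fast enough: try a lower supply
    return hi

for v_t in (0.8, 0.4, 0.0):
    volts = [supply_for_parallelism(n, 5.0, v_t) for n in (1, 2, 3)]
    print(f"V_t = {v_t} V:", ", ".join(f"{v:.2f} V" for v in volts))
```

With $V_t = 0$ the solver returns $V_{ref}/N$, and a larger $V_t$ leaves less room to lower the supply for the same N.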

Increasing Multiprocessing
[Figure: $P_N/P_1$ versus N = 1 to 12 for 1.2 μm CMOS with $V_{ref}$ = 5 V; curves for $V_t$ = 0.8 V, $V_t$ = 0.4 V and $V_t$ = 0 V (extreme case).]
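
Extreme Case: $V_t = 0$
Delay $T \propto 1/V_{ref}$.
For N processing elements, the allowed delay is NT, so $V_N = V_{ref}/N$ and
$\frac{P_N}{P_1} = [1 + \delta (N - 1)] \frac{1}{N^2}$, which for large N approaches $\delta/N$, i.e., falls off only as 1/N.
For negligible overhead, $\delta \to 0$: $\frac{P_N}{P_1} \approx \frac{1}{N^2}$.
For $V_t > 0$, the power reduction is smaller and there will be an optimum value of N: $V_N$ cannot drop below $V_t$, while the overhead term $\delta (N - 1)$ keeps growing.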
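
With $V_t = 0$ the ratio has a closed form, so the effect of the overhead term is easy to tabulate. The sketch below compares the 10% overhead per extra copy used in the multiplier example ($\delta = 0.1$) with the ideal $\delta = 0$ case; `vt0_power_ratio` is a hypothetical helper name.

```python
def vt0_power_ratio(n: int, delta: float) -> float:
    """P_N / P_1 for V_t = 0, where V_N = V_ref / N: [1 + delta (N - 1)] / N^2."""
    return (1 + delta * (n - 1)) / n**2

for n in (1, 2, 4, 8, 12):
    with_overhead = vt0_power_ratio(n, 0.1)
    ideal = vt0_power_ratio(n, 0.0)  # the 1/N^2 limit
    print(f"N={n:2d}  delta=0.1: {with_overhead:.4f}  delta=0: {ideal:.4f}")
```

For large N the product $N \cdot P_N/P_1 = [1 + \delta(N - 1)]/N$ tends to δ, so with any nonzero overhead the savings eventually shrink to the 1/N rate rather than $1/N^2$.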

Example: Multiplier Core
Specification:
- 200 MHz clock
- 15 W dissipation at 5 V
- Low-voltage operation, $V_{DD} \ge 1.5$ V
- Relative clock rate $= \frac{(V_{DD} - 0.5)^2}{20.25}$, normalized to 1 at $V_{DD}$ = 5 V since $(5 - 0.5)^2 = 20.25$
Problem:
- Integrate the multiplier core on an SoC
- Power budget for the multiplier: about 5 W

A Multicore Design
[Figure: five multiplier cores (Core 1 to Core 5), each fed by its own input register clocked at 40 MHz; a 5-to-1 multiplexer delivers the output at 200 MHz; a multiphase clock generator and mux control are driven by the 200 MHz CK.]
Core clock frequency = 200/N MHz, so N should divide 200.

How Many Cores?
For N cores:
- Clock frequency = 200/N MHz
- Supply voltage, $V_{DDN} = 0.5 + (20.25/N)^{1/2}$ V, obtained by setting the relative clock rate to 1/N
- Assuming 10% overhead per core, power dissipation $= 15 [1 + 0.1 (N - 1)] \left(\frac{V_{DDN}}{5}\right)^2$ W

Design Tradeoffs
Number of cores, N   Clock (MHz)   Core supply $V_{DDN}$ (V)   Total power (W)
1                    200           5.00                        15.0
2                    100           3.68                        8.94
4                    50            2.75                        5.90
5                    40            2.51                        5.29
8                    25            2.10                        4.50
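
The table follows directly from the formulas for N cores. The sketch below regenerates it; the constants come from the specification and the 10% overhead assumption, and only the helper names are hypothetical. Computed without intermediate rounding, the N = 8 row comes out near 2.09 V and 4.46 W, slightly below the tabulated 2.10 V and 4.50 W; the other rows agree to within about 0.01.

```python
import math

P_REF_W = 15.0     # single-core dissipation at 5 V (specification)
V_REF = 5.0        # reference supply voltage
F_REF_MHZ = 200.0  # single-core clock
DELTA = 0.1        # 10% overhead per additional core, as assumed on the slide
V_MIN = 1.5        # lowest allowed supply (specification)

def core_supply(n: int) -> float:
    """V_DDN obtained by setting the relative clock rate (V_DD - 0.5)^2 / 20.25 to 1/N."""
    return 0.5 + math.sqrt(20.25 / n)

def total_power(n: int) -> float:
    """15 [1 + 0.1 (N - 1)] (V_DDN / 5)^2 watts."""
    return P_REF_W * (1 + DELTA * (n - 1)) * (core_supply(n) / V_REF) ** 2

for n in (1, 2, 4, 5, 8):
    assert 200 % n == 0, "core clock 200/N MHz requires N to divide 200"
    assert core_supply(n) >= V_MIN, "core supply below the 1.5 V minimum"
    print(f"N={n}: {F_REF_MHZ / n:6.1f} MHz  {core_supply(n):.2f} V  {total_power(n):5.2f} W")
```

Under this model the 1.5 V floor limits N to at most 20, since $0.5 + (20.25/N)^{1/2} \ge 1.5$ requires $N \le 20.25$.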

Power Reduction in Processors
- Just about everything is used.
- Hardware methods:
  - voltage reduction for dynamic power
  - dual-threshold devices for leakage reduction
  - clock gating, frequency reduction
  - sleep mode
- Architecture:
  - instruction set
  - hardware organization
- Software methods
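
Parallel Architecture
[Figure: a single processor between input and output at frequency f, compared with two processors, each clocked at f/2, sharing the input and combined at the output at f.]
- One processor: capacitance = C, voltage = V, frequency = f, power = $CV^2 f$.
- Two parallel processors: capacitance = 2.2C, voltage = 0.6V, frequency = 0.5f, power = $2.2C \cdot (0.6V)^2 \cdot 0.5f = 0.396\,CV^2 f$.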

Pipeline Architecture
[Figure: a processor between input and output registers clocked at f, compared with the processor split into two half-processor stages (½ Proc.) separated by a pipeline register, all clocked at f.]
- One processor: capacitance = C, voltage = V, frequency = f, power = $CV^2 f$.
- Two-stage pipeline: capacitance = 1.2C, voltage = 0.6V, frequency = f, power = $1.2C \cdot (0.6V)^2 \cdot f = 0.432\,CV^2 f$.

Approximate Trend
             n-parallel proc.    n-stage pipeline proc.
Capacitance  nC                  C
Voltage      V/n                 V/n
Frequency    f/n                 f
Power        $CV^2 f/n^2$        $CV^2 f/n^2$
Chip area    n times             10–20% increase
G. K. Yeap, Practical Low Power Digital VLSI Design, Boston: Springer, 1998.
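
The power figures for the two-way parallel and pipelined processors follow from multiplying the capacitance, voltage-squared and frequency factors, and the trend table is the same arithmetic with the extra capacitance (the 0.2C in 2.2C and 1.2C) neglected. A minimal Python check, with hypothetical helper names:

```python
def relative_power(cap: float, volt: float, freq: float) -> float:
    """Dynamic power C V^2 f, with each argument a scale factor relative to the reference."""
    return cap * volt**2 * freq

# Two-processor parallel version: 2.2C, 0.6V, 0.5f
parallel_2 = relative_power(2.2, 0.6, 0.5)
# Two-stage pipelined version: 1.2C, 0.6V, f
pipeline_2 = relative_power(1.2, 0.6, 1.0)
print(f"{parallel_2:.3f} {pipeline_2:.3f}")  # 0.396 0.432

def ideal_trend(n: int) -> tuple:
    """Idealized n-way factors from the trend table: (parallel, pipeline)."""
    return relative_power(n, 1 / n, 1 / n), relative_power(1, 1 / n, 1)

print(ideal_trend(2))  # (0.25, 0.25)
```

Both idealized columns reduce to $1/n^2$; the difference between the architectures shows up in chip area and in the overhead capacitance, not in this first-order power factor.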

Multicore Processors
[Figure: performance based on the SPECint 2000 and SPECfp 2000 benchmarks, single-core versus multicore processors, 2000 to 2008. Source: Computer, May 2005, p. 12.]

Multicore Processors
- D. Geer, “Chip Makers Turn to Multicore Processors,” Computer, vol. 38, no. 5, pp. 11–13, May 2005.
- A. Jerraya, H. Tenhunen and W. Wolf, “Multiprocessor Systems-on-Chips,” Computer, vol. 38, no. 7, pp. 36–40, July 2005; this special issue contains three more articles on multicore processors.
- S. K. Moore, “Winner: Multimedia Monster – Cell’s Nine Processors Make It a Supercomputer on a Chip,” IEEE Spectrum, vol. 43, no. 1, pp. 20–23, January 2006.

Cell: Cell Broadband Engine Architecture
[Figure: photo, © IEEE Spectrum, January 2006; left to right: Atsushi Kameyama, Toshiba; James Kahle, IBM; Masakazu Suzuoki, Sony.]
Nine-processor chip: 192 Gflops

Cell’s Nine-Processor Chip
[Figure: Cell’s nine-processor chip, © IEEE Spectrum, January 2006.]
Eight identical processors; f = 5.6 GHz (max); 44.8 Gflops