
ELEC 5970-001/6970-001 (Fall 2005)
Special Topics in Electrical Engineering: Low-Power Design of Electronic Circuits
Low-Power Logic Design and Parallelism
Vishwani D. Agrawal, James J. Danaher Professor
Department of Electrical and Computer Engineering, Auburn University
http://www.eng.auburn.edu/~vagrawal
vagrawal@eng.auburn.edu
11/01/05, ELEC 5970-001/6970-001, Lecture 17

State Encoding
• Two-bit binary counter:
  – State sequence: 00→01→10→11→00
  – Six bit transitions in four clock cycles
  – 6/4 = 1.5 transitions per clock
• Two-bit Gray-code counter:
  – State sequence: 00→01→11→10→00
  – Four bit transitions in four clock cycles
  – 4/4 = 1.0 transition per clock
• The Gray-code counter is more power efficient.
G. K. Yeap, Practical Low Power Digital VLSI Design, Boston: Kluwer Academic Publishers (now Springer), 1998.

Three-Bit Counters
Binary counter          Gray-code counter
State  No. of toggles   State  No. of toggles
000    -                000    -
001    1                001    1
010    2                011    1
011    1                010    1
100    3                110    1
101    1                111    1
110    2                101    1
111    1                100    1
000    3                000    1
Total  14               Total  8

N-Bit Counter: Toggles in Counting Cycle
• Binary counter: $T(\text{binary}) = 2(2^N - 1)$
• Gray-code counter: $T(\text{gray}) = 2^N$
• $T(\text{gray})/T(\text{binary}) = 2^{N-1}/(2^N - 1) \rightarrow 0.5$ as $N \rightarrow \infty$
Bits  T(binary)  T(gray)  T(gray)/T(binary)
1     2          2        1.0000
2     6          4        0.6667
3     14         8        0.5714
4     30         16       0.5333
5     62         32       0.5161
6     126        64       0.5079
∞     -          -        0.5000
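The toggle counts for binary and Gray-code counters can be checked by brute force. The following is a minimal sketch that enumerates one full counting cycle (including the wrap-around to the all-zero state) and sums the Hamming distances between consecutive states. The function names are illustrative and do not come from the lecture; the Gray sequence is assumed to be the standard reflected Gray code.

```python
def count_toggles(states):
    """Sum of bit flips between consecutive states, including the wrap-around."""
    total = 0
    for i, state in enumerate(states):
        nxt = states[(i + 1) % len(states)]
        total += bin(state ^ nxt).count("1")  # Hamming distance
    return total


def binary_toggles(n_bits):
    """Toggles in one counting cycle of an n-bit binary counter."""
    return count_toggles(list(range(2 ** n_bits)))


def gray_toggles(n_bits):
    """Toggles in one counting cycle of an n-bit reflected Gray-code counter."""
    # i ^ (i >> 1) converts a binary count to its reflected Gray code.
    return count_toggles([i ^ (i >> 1) for i in range(2 ** n_bits)])


if __name__ == "__main__":
    for n in range(1, 7):
        tb, tg = binary_toggles(n), gray_toggles(n)
        print(n, tb, tg, round(tg / tb, 4))
```

Running the loop reproduces the per-width totals, which agree with the closed forms 2(2^N - 1) and 2^N.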

Bus Encoding
• Example: four-bit bus
  – 0000→1110 has three transitions.
  – If the bits of the second pattern are inverted, then 0000→0001 will have only one transition.
• Bit-inversion encoding for an N-bit bus:
[Figure: number of bit transitions after inversion encoding (vertical axis, ticks 0, N/2, N) versus number of bit transitions (horizontal axis, ticks 0, N/2, N).]
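The inversion rule can be sketched in a few lines. This is an illustrative model, not the circuit from the lecture: it assumes the bus starts at all zeros, inverts a word whenever sending it unmodified would flip more than N/2 data lines, and carries one extra polarity bit so the receiver can undo the inversion. Toggles of the polarity line itself are not counted here. All function names are hypothetical.

```python
def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def bus_invert_encode(words, width, initial_bus=0):
    """Return a list of (bus_value, polarity_bit) pairs for the given words."""
    mask = (1 << width) - 1
    bus = initial_bus & mask
    encoded = []
    for word in words:
        word &= mask
        if hamming(bus, word) > width / 2:
            bus, polarity = ~word & mask, 1   # send the inverted word
        else:
            bus, polarity = word, 0           # send the word unchanged
        encoded.append((bus, polarity))
    return encoded


def bus_invert_decode(encoded, width):
    """Recover the original words from (bus_value, polarity_bit) pairs."""
    mask = (1 << width) - 1
    return [(~bus & mask) if polarity else bus for bus, polarity in encoded]


def data_line_transitions(encoded, initial_bus=0):
    """Total toggles on the data lines (the polarity line is excluded)."""
    total, prev = 0, initial_bus
    for bus, _ in encoded:
        total += hamming(prev, bus)
        prev = bus
    return total
```

With a four-bit bus starting at 0000, `bus_invert_encode([0b1110], 4)` sends 0001 with the polarity bit set, so only one data line toggles instead of three.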

Bus-Inversion Encoding Logic
[Figure: block diagram labeled with sent data, polarity decision logic, bus register, polarity bit, and received data.]
M. Stan and W. Burleson, “Bus-Invert Coding for Low Power I/O,” IEEE Trans. VLSI Systems, vol. 3, no. 1, pp. 49-58, March 1995.

FSM State Encoding
• Transition probabilities are based on primary input (PI) statistics.
[Figure: two state transition graphs with edges labeled by transition probabilities; the states are encoded 11, 00, 01 in the first graph and 01, 00, 11 in the second.]
• Expected number of state-bit transitions:
  – First encoding: 2(0.3+0.4) + 1(0.1+0.1) = 1.6
  – Second encoding: 1(0.3+0.4+0.1) + 2(0.1) = 1.0
• State encoding can be selected using a power-based cost function.
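A power-based cost function of this kind can be sketched as the probability-weighted Hamming distance between the codes of the source and destination states. The edge list below is an assumption: the state names A, B, C and the edge directions are hypothetical, chosen only so that the two encodings reproduce the 1.6 and 1.0 totals given above. Self-loops are omitted because they flip no state bits.

```python
def hamming(a, b):
    """Number of positions in which two equal-length code strings differ."""
    return sum(x != y for x, y in zip(a, b))


def expected_transitions(edges, encoding):
    """Expected state-bit flips per clock for a given state encoding.

    edges: list of (source, destination, probability) tuples.
    encoding: dict mapping state name to its binary code string.
    """
    return sum(p * hamming(encoding[src], encoding[dst]) for src, dst, p in edges)


# Hypothetical three-state graph (names and directions are assumptions).
EDGES = [("A", "B", 0.3), ("B", "A", 0.4), ("A", "C", 0.1), ("C", "B", 0.1)]

ENCODING_1 = {"A": "11", "B": "00", "C": "01"}
ENCODING_2 = {"A": "01", "B": "00", "C": "11"}

if __name__ == "__main__":
    print(expected_transitions(EDGES, ENCODING_1))
    print(expected_transitions(EDGES, ENCODING_2))
```

The second encoding costs less because the two high-probability edges now connect codes that differ in a single bit.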

FSM: Clock-Gating
• Moore machine: outputs depend only on the state variables.
  – If a state has a self-loop in the state transition graph (STG), then the clock can be stopped whenever that self-loop is to be executed.
[Figure: STG fragment with states Si, Sj, Sk and edges labeled Xi/Zk, Xj/Zk, Xk/Zk.]
• The clock can be stopped when the (Xk, Sk) combination occurs.
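Finding the clock-stopping conditions amounts to listing the self-loops of the STG. The sketch below assumes the next-state function is available as a table; the three-entry table is hypothetical and only mirrors the state and input labels (Si, Sj, Sk, Xi, Xj, Xk) used above, with the edge directions assumed.

```python
def clock_stop_conditions(next_state):
    """Return the (input, state) pairs for which the FSM stays in the same state.

    next_state: dict mapping (state, input) to the next state.
    For a Moore machine the outputs depend only on the state, so nothing
    changes on these pairs and the clock edge can be suppressed.
    """
    return {(x, s) for (s, x), nxt in next_state.items() if nxt == s}


# Hypothetical transition table; edge directions are an assumption.
NEXT_STATE = {
    ("Si", "Xi"): "Sk",
    ("Sj", "Xj"): "Sk",
    ("Sk", "Xk"): "Sk",  # self-loop
}

if __name__ == "__main__":
    print(clock_stop_conditions(NEXT_STATE))
```

In hardware, the returned set would be the on-set of the clock activation function; here it is just enumerated.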

Clock-Gating in Moore FSM
[Figure: block diagram labeled with primary inputs (PI), clock activation logic, latch, clock CK, flip-flops, combinational logic, and primary outputs (PO).]
L. Benini and G. De Micheli, Dynamic Power Management, Boston: Springer, 1998.

Clock-Gating in Low-Power Flip-Flop
[Figure: flip-flop circuit with data input D, output Q, and clock CK.]

Low-Power Datapath Architecture
• Lower the supply voltage.
  – This slows down circuit speed.
  – Use parallel computing to gain the speed back.
• Works well when the threshold voltage is also lowered.
• About 60% reduction in power is obtainable.
• Reference: A. P. Chandrakasan and R. W. Brodersen, Low Power Digital CMOS Design, Boston: Kluwer Academic Publishers (now Springer), 1995.

A Reference Datapath
[Figure: input, register, combinational logic, register, and output, clocked by CK.]
• Supply voltage = $V_{ref}$
• Total capacitance switched per cycle = $C_{ref}$
• Clock frequency = $f$
• Power consumption: $P_{ref} = C_{ref} V_{ref}^2 f$

A Parallel Architecture
[Figure: block diagram labeled with the input, registers clocked at f/N, N copies of the combinational logic (Copy 1, Copy 2, ..., Copy N), an N-to-1 multiplexer, an output register clocked at f, and a multiphase clock generator and mux control driven by CK.]
• N = degree of parallelism
• Supply voltage: $V_N \le V_1 = V_{ref}$
• Each copy processes every Nth input and operates at reduced voltage.

Control Signals, N = 4
[Figure: timing diagram showing CK and the four clock phases, Phase 1 through Phase 4.]

Power
$P_N = P_{proc} + P_{overhead}$
$P_{proc} = N(C_{inreg} + C_{comb})V_N^2 \frac{f}{N} + C_{outreg}V_N^2 f = (C_{inreg} + C_{comb} + C_{outreg})V_N^2 f = C_{ref}V_N^2 f$
$P_{overhead} = C_{overhead}V_N^2 f \approx \delta C_{ref}(N - 1)V_N^2 f$
$P_N = [1 + \delta(N - 1)]C_{ref}V_N^2 f$
$\frac{P_N}{P_1} = [1 + \delta(N - 1)]\frac{V_N^2}{V_{ref}^2}$
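The final ratio is easy to evaluate numerically. The sketch below is a direct transcription of the last formula; the function name is illustrative, and the overhead factor δ = 0.15 and the voltages in the usage line are assumed values for demonstration, not figures taken from this slide.

```python
def parallel_power_ratio(n, v_n, v_ref, delta):
    """P_N / P_1 = [1 + delta * (N - 1)] * (V_N / V_ref) ** 2.

    n: degree of parallelism (N >= 1)
    v_n: reduced supply voltage of the N-way parallel design
    v_ref: supply voltage of the reference (N = 1) design
    delta: overhead capacitance per extra copy, as a fraction of C_ref
    """
    if n < 1:
        raise ValueError("degree of parallelism must be at least 1")
    return (1 + delta * (n - 1)) * (v_n / v_ref) ** 2


if __name__ == "__main__":
    # Assumed example: two copies, 2.9 V instead of 5 V, 15% overhead.
    print(parallel_power_ratio(2, 2.9, 5.0, 0.15))
```

Because C_ref and f cancel in the ratio, only the voltage scaling and the overhead term determine the saving.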

Voltage vs. Speed
Delay of a gate: $T \approx \frac{C_L V_{ref}}{I} = \frac{C_L V_{ref}}{k(W/L)(V_{ref} - V_t)^2}$
where
• $I$ is the saturation current
• $k$ is a technology parameter
• $W/L$ is the width-to-length ratio of the transistor
• $V_t$ is the threshold voltage
[Figure: normalized gate delay T versus supply voltage for 1.2μ CMOS, with operating points N = 1 at Vref = 5 V, N = 2 at V2 = 2.9 V, and N = 3 at V3.]
• Voltage reduction slows down as we get closer to Vt.
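Under this delay model, the reduced supply voltage for an N-way parallel design is the voltage at which the gate delay grows by a factor of N. The sketch below solves that condition in closed form: setting r = N·V_ref/(V_ref − V_t)², the equation V/(V − V_t)² = r is a quadratic in V whose larger root lies above V_t. The function names are illustrative, and the result depends on the simplified model and on the threshold voltage chosen, so it need not coincide with voltages read from a plotted curve.

```python
import math


def gate_delay(v, v_t):
    """Gate delay up to the constant factor C_L / (k * W/L)."""
    return v / (v - v_t) ** 2


def supply_for_parallelism(n, v_ref, v_t):
    """Supply voltage at which the gate delay is n times the delay at v_ref.

    Solves r * V**2 - (2 * r * v_t + 1) * V + r * v_t**2 = 0 and returns
    the root above the threshold voltage.
    """
    r = n * gate_delay(v_ref, v_t)
    return ((2 * r * v_t + 1) + math.sqrt(4 * r * v_t + 1)) / (2 * r)


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, supply_for_parallelism(n, 5.0, 0.8))
```

As N grows, each further increase buys a smaller voltage reduction, since the solution can only approach V_t from above.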

Increasing Multiprocessing
[Figure: PN/P1 versus N (1 to 12) for 1.2μ CMOS with Vref = 5 V; curves for Vt = 0.8 V, Vt = 0.4 V, and Vt = 0 V (extreme case).]

Extreme Case: Vt = 0
• Delay: $T \propto 1/V_{ref}$
• For N processing elements, delay = $NT$ → $V_N = V_{ref}/N$
• $\frac{P_N}{P_1} = [1 + \delta(N - 1)]\frac{1}{N^2} \rightarrow 1/N$
• For negligible overhead, $\delta \rightarrow 0$: $\frac{P_N}{P_1} \approx \frac{1}{N^2}$
• For $V_t > 0$, power reduction is less and there will be an optimum value of N.
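The steps behind the extreme-case result can be written out as follows. This is a worked sketch using only the gate-delay model and the power ratio from the earlier slides; the numerical values at the end (N = 4, δ = 0.1) are assumed for illustration.

```latex
\begin{align*}
T(V) &\propto \frac{V}{(V - V_t)^2}
  \quad\Longrightarrow\quad T(V) \propto \frac{1}{V} \quad \text{for } V_t = 0 \\
T(V_N) = N\,T(V_{ref})
  &\quad\Longrightarrow\quad \frac{1}{V_N} = \frac{N}{V_{ref}}
  \quad\Longrightarrow\quad V_N = \frac{V_{ref}}{N} \\
\frac{P_N}{P_1} &= [1 + \delta(N - 1)]\,\frac{V_N^2}{V_{ref}^2}
  = \frac{1 + \delta(N - 1)}{N^2} \\
\text{Example } (N = 4,\ \delta = 0.1):\quad
\frac{P_4}{P_1} &= \frac{1 + 0.3}{16} \approx 0.081
\end{align*}
```

Two limits are worth noting: with δ = 0 the ratio is exactly 1/N², and with δ = 1 it is exactly 1/N. For any fixed δ > 0 and large N it behaves like δ/N.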

Reduced-Power Shift Register
[Figure: D flip-flops clocked by CK(f/2) and a multiplexer that drives the output.]
• Flip-flops are operated at full voltage and half the clock frequency.

Power Consumption of Shift Reg.
16-bit shift register, 2μ CMOS
Deg. of parallelism  Freq (MHz)  Power (μW)
1                    33.0        1535
2                    16.5        887
4                    8.25        738
$P = C' V_{DD}^2 f / n$
[Figure: normalized power versus degree of parallelism, n = 1, 2, 4.]
C. Piguet, “Circuit and Logic Level Design,” pages 103-133 in W. Nebel and J. Mermet (ed.), Low Power Design in Deep Submicron Electronics, Boston: Kluwer Academic Publishers, 1997.
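The tabulated powers can be normalized to the n = 1 value and compared with the 1/n scaling that the formula would predict if C' and VDD stayed constant. The sketch below does only that arithmetic; the constant-C' assumption and the names are illustrative, and the data are the three rows of the table.

```python
# Degree of parallelism -> (clock frequency in MHz, power in microwatts)
MEASURED = {1: (33.0, 1535.0), 2: (16.5, 887.0), 4: (8.25, 738.0)}


def normalized_power(measured):
    """Power of each configuration divided by the power at n = 1."""
    base = measured[1][1]
    return {n: power / base for n, (_, power) in measured.items()}


def ideal_normalized_power(n):
    """1/n scaling, assuming C' and V_DD do not change with n."""
    return 1.0 / n


if __name__ == "__main__":
    for n, ratio in sorted(normalized_power(MEASURED).items()):
        print(n, round(ratio, 3), ideal_normalized_power(n))
```

The normalized values stay above 1/n, which is consistent with the added multiplexing and clocking circuitry contributing capacitance of its own.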

Multicore Processors
• D. Geer, “Chip Makers Turn to Multicore Processors,” Computer, vol. 38, no. 5, pp. 11-13, May 2005.
• A. Jerraya, H. Tenhunen and W. Wolf, “Multiprocessor Systems-on-Chips,” Computer, vol. 38, no. 7, pp. 36-40, July 2005; this special issue contains three more articles on multicore processors.

Multicore Processors
[Figure: performance based on SPECint2000 and SPECfp2000 benchmarks for multicore and single-core processors, plotted over 2000, 2004, and 2008. Source: Computer, May 2005, p. 12.]