ESE 680-002 (ESE 534): Computer Organization
Day 7: January 31, 2007
Energy and Power
Penn ESE 680-002 Spring 2007 -- DeHon

Today
• Energy Tradeoffs?
• Voltage limits and leakage?
• Thermodynamics meets Information Theory
• Adiabatic Switching
• [This is an ambitious lecture]

At Issue
• Many now argue power will be the ultimate scaling limit
  – (not lithography, costs, …)
• Proliferation of portable and handheld devices
  – …battery size and life are the biggest issues
• Cooling and energy costs may dominate the cost of electronics

What can we do about it?
• $\tau_{gd} = Q/I = (CV)/I$
• $I_d = (\mu C_{OX}/2)(W/L)(V_{gs} - V_{TH})^2$

Tradeoff
• $E \propto V^2$
• $\tau_{gd} \propto 1/V$
• We can trade speed for energy
• $E \times \tau_{gd}^2$ = constant
Martin et al., Power-Aware Computing, Kluwer 2001, http://caltechcstr.library.caltech.edu/308/
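The next sketch is a quick numerical check of the two proportionalities. It reuses the square-law drive current from the previous slide and assumes $V_{TH} = 0$, so that $\tau_{gd} \propto 1/V$ holds exactly. The values of C and K (K stands in for $\mu C_{OX} W/L$) are hypothetical. With a nonzero $V_{TH}$ the product is no longer constant; it grows as V approaches $V_{TH}$.

```python
def gate_delay(V, C=1e-15, K=1e-4, V_TH=0.0):
    """tau_gd = CV / I_d, with square-law I_d = (K/2)(V - V_TH)^2 (K is hypothetical)."""
    I_d = (K / 2) * (V - V_TH) ** 2
    return C * V / I_d

def switch_energy(V, C=1e-15):
    """Energy per switching event, E = CV^2."""
    return C * V ** 2

# With V_TH = 0, E * tau^2 = 4 C^3 / K^2 for every supply voltage.
for V in (1.0, 0.5, 0.25):
    E = switch_energy(V)
    t = gate_delay(V)
    print(f"V={V:4.2f} V  E={E:.2e} J  t_gd={t:.2e} s  E*t^2={E * t**2:.3e}")
```

Halving V quarters the energy and doubles the delay, so the printed product stays fixed.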

Questions
• How far can this go?
  – (return to later in lecture)
• What do we do about slowdown?

Parallelism
• We have Area-Time tradeoffs
• Compensate slowdown with additional parallelism
• …trade Area for Energy
• (Architectural Option)

Ideal Example
• Perhaps: 1 nJ/32b op, 10 ns cycle
• Cut voltage in half: 0.25 nJ/32b op, 20 ns cycle ($E \propto V^2$, $\tau_{gd} \propto 1/V$)
• Two in parallel to complete 2 ops/20 ns
• 75% energy reduction
  – Also 75% power reduction
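Here is the ideal example's arithmetic written out as a sketch. It assumes the idealized scaling from the Tradeoff slide (energy $\propto V^2$, cycle time $\propto 1/V$) and a perfect parallel speedup.

```python
def scaled_design(E_op, t_cycle, v_scale, n_parallel):
    """Scale supply voltage by v_scale and replicate the datapath n_parallel times."""
    E_new = E_op * v_scale ** 2          # energy per op drops as V^2
    t_new = t_cycle / v_scale            # cycle time grows as 1/V
    throughput = n_parallel / t_new      # ops per second across all units
    power = n_parallel * E_new / t_new   # each unit finishes one op per cycle
    return E_new, throughput, power

E0, t0 = 1e-9, 10e-9                     # 1 nJ/op, 10 ns cycle
base = scaled_design(E0, t0, 1.0, 1)
half = scaled_design(E0, t0, 0.5, 2)     # half voltage, two units in parallel
print("energy/op reduction:", 1 - half[0] / base[0])
print("throughput ratio:   ", half[1] / base[1])
print("power reduction:    ", 1 - half[2] / base[2])
```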

Power Density Constrained Example
• Logic density: 1 foo-op/mm²
• Energy cost: 10 nJ/foo-op @ 10 GHz
• Cooling limit: 100 W/cm²
• How many foo-ops/cm²/s?
  – 10 nJ/mm² × 100 mm²/cm² = 1000 nJ/cm²
  – Top speed: 100 W/cm² ÷ 1000 nJ/cm² per cycle = 100 MHz
  – 100 M × 100 foo-ops = $10^{10}$ foo-ops/cm²/s

Response
• How many foo-ops/cm²/s?
  – 10 nJ/mm² × 100 mm²/cm² = 1000 nJ/cm²
  – Top speed 100 MHz
  – 100 M × 100 foo-ops = $10^{10}$ foo-ops/cm²/s
• Power constraint won’t let us run at 10 GHz
  – Might as well lower voltage, save energy
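The power-density arithmetic above can be expressed as a small function. This sketch assumes that every foo-op site switches once per clock cycle, which is what the slide's calculation implies.

```python
def power_limited_rate(ops_per_cm2, energy_per_op, cooling_w_per_cm2):
    """Highest clock (Hz) and ops/cm^2/s that the cooling limit allows."""
    energy_per_cycle = ops_per_cm2 * energy_per_op     # J per cm^2 per cycle
    f_max = cooling_w_per_cm2 / energy_per_cycle        # cycles per second
    return f_max, f_max * ops_per_cm2

# 1 foo-op/mm^2 = 100 foo-ops/cm^2, 10 nJ each, 100 W/cm^2 cooling
f, rate = power_limited_rate(ops_per_cm2=100, energy_per_op=10e-9, cooling_w_per_cm2=100)
print(f"f_max = {f:.3g} Hz, throughput = {rate:.3g} foo-ops/cm^2/s")
```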

What can we support?
• $E \times \tau_{gd}^2$ = constant
• $10\,\text{nJ} \times (100\,\text{ps})^2 = E \times t_{cycle}^2$

(Pushing through the Math)
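The worked math for this slide did not survive extraction. It can be reconstructed from the slide before it, but what follows is a reconstruction, not the original slide content. With $E \times t^2$ fixed, the energy per foo-op at clock f is $E(f) = E_0 (t_0 f)^2$. Power density is therefore $P = N E_0 t_0^2 f^3$, and the cooling limit gives $f = (P_{max}/(N E_0 t_0^2))^{1/3}$. The sketch evaluates this with the example's numbers: N = 100 foo-ops/cm², $E_0$ = 10 nJ, $t_0$ = 100 ps, and $P_{max}$ = 100 W/cm².

```python
def power_limited_design(ops_per_cm2=100, E0=10e-9, t0=100e-12, P_max=100.0):
    """Fastest clock the cooling limit allows when E * t^2 is held constant."""
    # E(f) = E0 * (t0 * f)^2 and P = ops_per_cm2 * E(f) * f, so P grows as f^3
    f = (P_max / (ops_per_cm2 * E0 * t0 ** 2)) ** (1 / 3)
    E = E0 * (t0 * f) ** 2            # energy per foo-op at that clock
    v_ratio = t0 * f                  # t ~ 1/V, so V / V0 = t0 / t_cycle
    return f, E, v_ratio, f * ops_per_cm2

f, E, v_ratio, rate = power_limited_design()
print(f"clock {f/1e9:.2f} GHz, {E*1e9:.2f} nJ/foo-op, "
      f"voltage {1/v_ratio:.1f}x lower, {rate:.2e} foo-ops/cm^2/s")
```

The result is roughly 2 GHz at roughly 4.6× lower voltage, which matches the next slide's rounded figures.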

Improved Power
• How many foo-ops/cm²/s?
  – 2 GHz × 100 foo-ops = $2 \times 10^{11}$ foo-ops/cm²/s
  – At 5× lower voltage
  – [vs. 100 M × 100 foo-ops = $10^{10}$ foo-ops/cm²/s]

How far?

Limits
• Ability to turn off the transistor
• Noise
• Parameter Variations

Sub Threshold Conduction
• To avoid leakage, want $I_{off}$ very small
• Use $I_{on}$ for logic
  – Determines speed
• Want $I_{on}/I_{off}$ large
[Frank, IBM J. R&D v46 n2/3 p235]

Sub Threshold Conduction
• $S \approx 90$ mV/decade for single gate
• $S \approx 70$ mV/decade for double gate
• 4 orders of magnitude $I_{V_T}/I_{off}$ ⇒ $V_T > 280$ mV
[Frank, IBM J. R&D v46 n2/3 p235]
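Each decade of current below threshold costs S millivolts of gate swing, so the required threshold is simply decades × S. The sketch below reproduces the slide's 280 mV figure. It also computes the ideal room-temperature floor on the swing, $(kT/q)\ln 10 \approx 60$ mV/decade. That floor is a standard result and is not stated on the slide.

```python
import math

def min_threshold_mV(decades, swing_mV_per_decade):
    """Threshold needed for I(V_T)/I_off to span `decades` orders of magnitude."""
    return decades * swing_mV_per_decade

for name, S in (("single gate", 90), ("double gate", 70)):
    print(f"{name}: V_T > {min_threshold_mV(4, S)} mV for 4 decades")

# Ideal subthreshold swing at 300 K: (kT/q) * ln(10)
k_B, q, T = 1.380649e-23, 1.602176634e-19, 300.0
S_min_mV = 1e3 * (k_B * T / q) * math.log(10)
print(f"ideal S at 300 K: {S_min_mV:.1f} mV/decade")
```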

ITRS 2005 – High Performance (Table 40a)

ITRS 2005 – Low Power (Table 41c)

Thermodynamics

Lower Bound?
• Reducing entropy costs energy
• Single-bit gate output
  – Set from previous value to 0 or 1
  – Reduce state space by a factor of 2
  – Entropy: $\Delta S = k \ln(\text{before}/\text{after}) = k \ln 2$
  – Energy $= T \Delta S = kT \ln 2$
• Naively setting a bit costs at least $kT \ln 2$

Numbers (ITRS 2005)
• $kT \ln 2 = 2.87 \times 10^{-21}$ J (at room temperature, T = 300 K)
• W/L = 3, W = 21 nm = 0.021 µm (Table 41d)
• $C \approx 8 \times 10^{-18}$ F $\approx 10^{-17}$ F
• $E_{op} = CV^2 = 2.5 \times 10^{-18}$ J
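The slide's two numbers can be recomputed directly. The sketch assumes V = 0.5 V, the value used on the following slide, and C = $10^{-17}$ F. It shows that a conventional switching event costs roughly 870 times the $kT \ln 2$ bound.

```python
import math

k_B = 1.380649e-23        # Boltzmann constant, J/K (exact SI value)
T = 300.0                 # room temperature, K
landauer = k_B * T * math.log(2)

C = 1e-17                 # node capacitance, F
V = 0.5                   # supply voltage, V
E_op = C * V ** 2

print(f"kT ln 2 = {landauer:.3g} J")
print(f"CV^2    = {E_op:.3g} J  ({E_op / landauer:.0f}x above kT ln 2)")
```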

Sanity Check
• V = 0.5 V
• $Q = CV = 0.5 \times 10^{-17}$ coulombs
• $e = 1.6 \times 10^{-19}$ coulombs
• $Q \approx 30$ electrons?
• Energy in a particle?
  – $10^5$–$10^6$ electrons?
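This one-line check of the electron count uses the exact SI elementary charge:

```python
e = 1.602176634e-19       # elementary charge, C (exact SI value)
C, V = 1e-17, 0.5         # node capacitance (F) and supply (V) from the slide
Q = C * V
print(f"Q = {Q:.2g} C = about {Q / e:.0f} electrons")
```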

Hmm…
• $CV^2 = 2.5 \times 10^{-18}$ J
• 18 billion transistors in 2.5 cm²
  – Generous: assumes no interconnect capacitance
• $4.5 \times 10^{-8}$ J / 2.5 cm² ≈ $2 \times 10^{-8}$ J/cm²
• Cooling limit of 100 W/cm²
• Maximum operating frequency? About 5 GHz
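The frequency estimate comes from dividing the cooling limit by the energy dissipated per cm², assuming every transistor switches once per cycle. Without the slide's rounding to $2 \times 10^{-8}$ J/cm², the bound is about 5.6 GHz.

```python
E_switch = 2.5e-18        # CV^2 per transistor, J
n_transistors = 18e9      # transistors on the die
area_cm2 = 2.5            # die area, cm^2
cooling = 100.0           # cooling limit, W/cm^2

# Energy per cm^2 if every transistor switches once per cycle
energy_density = E_switch * n_transistors / area_cm2
f_max = cooling / energy_density
print(f"{energy_density:.2g} J/cm^2 per cycle -> f_max = {f_max / 1e9:.1f} GHz")
```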

Recycling…
• Thermodynamics only says we have to dissipate energy if we discard information
• Can we compute without discarding information?
• Can we use this?

Three Reversible Primitives

Universal Primitives
• These primitives
  – Are universal
  – Are all reversible
• If we keep all the intermediates they produce
  – Discard no information
  – Can run the computation in reverse
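The figure showing the three primitives did not survive extraction, so the following sketch does not attempt to reproduce it. As a stand-in, it uses the Toffoli (controlled-controlled-NOT) gate. Toffoli is a standard reversible gate that is universal by itself when constant inputs are allowed. The sketch checks both properties listed above: the gate is a bijection, so it can run in reverse, and it can realize NAND.

```python
from itertools import product

def toffoli(a, b, c):
    """Controlled-controlled-NOT: flips c when a and b are both 1."""
    return a, b, c ^ (a & b)

inputs = list(product((0, 1), repeat=3))
outputs = [toffoli(*x) for x in inputs]

# Reversible: the gate permutes the 8 input states ...
assert sorted(outputs) == inputs
# ... and is its own inverse, so no information is discarded.
assert all(toffoli(*toffoli(*x)) == x for x in inputs)

# Universal: with the target preset to 1, the third output is NAND(a, b).
def nand(a, b):
    return toffoli(a, b, 1)[2]

print([nand(a, b) for a, b in product((0, 1), repeat=2)])
```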

Cleaning Up
• Can “erase” unwanted intermediates with a reverse circuit
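Here is a minimal sketch of the cleanup idea. It assumes the common compute, keep-the-result, uncompute pattern (Bennett's technique), built from Toffoli gates. The three-input AND and the wire names are hypothetical. After the result is written to its own wire, replaying the first gate returns the scratch wire to 0. The intermediate is thus erased without being discarded.

```python
from itertools import product

def toffoli(x, y, z):
    """Controlled-controlled-NOT: flips z when x and y are both 1."""
    return x, y, z ^ (x & y)

def and3_reversible(a, b, c):
    """Compute a AND b AND c reversibly; return (a, b, c, scratch, out)."""
    scratch, out = 0, 0
    a, b, scratch = toffoli(a, b, scratch)      # forward: scratch = a & b
    scratch, c, out = toffoli(scratch, c, out)  # forward: out = scratch & c
    a, b, scratch = toffoli(a, b, scratch)      # reverse the first gate: scratch -> 0
    return a, b, c, scratch, out

for a, b, c in product((0, 1), repeat=3):
    *_, scratch, out = and3_reversible(a, b, c)
    assert scratch == 0              # intermediate cleaned up
    assert out == (a & b & c)        # result kept
```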

Thermodynamics
• In theory, at least, thermodynamics does not demand that we dissipate any energy (power) in order to compute

Adiabatic Switching

Two Observations
1. We dissipate power in the on-transistor while charging the capacitance
2. We discard the capacitor's charge at the end of the cycle

Charge Cycle
• Charging capacitor
  – $Q = CV$
  – $E = QV$ (drawn from the supply)
  – $E = CV^2$
  – Half in capacitor, half dissipated in pullup
[Athas/Koller/Svensson, USC/ISI ACMOS-TR-2 1993]
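The half-and-half split can be checked with a small RC simulation. This sketch assumes the on-transistor behaves as a fixed resistor R, uses hypothetical values for R and C, and integrates with forward Euler steps. The dissipated share comes out at $CV^2/2$ regardless of R.

```python
def step_charge(C=1e-15, V=1.0, R=1e4, steps_per_tau=1000, n_tau=20):
    """Charge C through R after the supply steps to V.
    Returns (energy drawn from supply, energy stored on C, energy dissipated in R)."""
    tau = R * C
    dt = tau / steps_per_tau
    vc = 0.0
    e_diss = 0.0
    for _ in range(steps_per_tau * n_tau):
        i = (V - vc) / R              # current through the on-transistor, modeled as R
        e_diss += i * i * R * dt      # I^2 R heating in the pullup
        vc += i * dt / C              # charge delivered to the node
    e_supply = C * vc * V             # E = QV drawn from the constant-voltage supply
    e_cap = 0.5 * C * vc ** 2         # energy left on the capacitor
    return e_supply, e_cap, e_diss

s, c, d = step_charge()
print(f"supply {s:.3e} J, capacitor {c:.3e} J, dissipated {d:.3e} J")
```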

Adiabatic Switching
• Current-source charging:
  – Ramp supplies slowly so they supply a constant current
  – $P = I^2 R$
  – $E_{total} = P \times T$
  – $Q = IT = CV$
  – $I = CV/T$
  – $E_{total} = I^2 R T = (CV/T)^2 R T = (RC/T)\,CV^2$
• Ignores leakage…
• May require large $V_t$
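The same hypothetical RC model can be driven by a supply that ramps linearly from 0 to V over time T and then holds, which lets the $(RC/T)\,CV^2$ estimate be checked numerically. The slide's derivation assumes a perfectly constant current. With a linear ramp the node lags the supply slightly at first, so the simulated value approaches the formula from below as T grows past RC.

```python
def ramp_charge_dissipation(T, C=1e-15, V=1.0, R=1e4, steps_per_tau=200, settle_tau=10):
    """Energy dissipated in R while the supply ramps 0 -> V over T, then holds at V."""
    tau = R * C
    dt = tau / steps_per_tau
    n_ramp = round(T / dt)
    n_total = n_ramp + steps_per_tau * settle_tau   # let the node settle after the ramp
    vc = 0.0
    e_diss = 0.0
    for n in range(n_total):
        vs = V * min(1.0, n / n_ramp)   # linear ramp, then constant
        i = (vs - vc) / R
        e_diss += i * i * R * dt
        vc += i * dt / C
    return e_diss

C, V, R = 1e-15, 1.0, 1e4
for k in (10, 100, 1000):
    T = k * R * C
    measured = ramp_charge_dissipation(T, C, V, R)
    predicted = (R * C / T) * C * V ** 2
    print(f"T = {k:>4} RC: dissipated {measured:.3e} J, (RC/T)CV^2 = {predicted:.3e} J")
```

A ramp ten times slower dissipates about ten times less energy, while the step case above always loses $CV^2/2$.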

Impact of Adiabatic Switching
• $E_{total} = I^2 R T = (RC/T)\,CV^2$
• $RC \approx \tau_{gd}$
• $E_{total} \propto (\tau_{gd}/T)$
  – Without reducing V
• Can trade energy and time
  – $E \times T$ = constant
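The next sketch puts the two scaling rules side by side as a back-of-the-envelope comparison, assuming both relations hold exactly. Slowing an operation by a factor s cuts energy by $s^2$ under voltage scaling ($E \times \tau_{gd}^2$ constant). Under adiabatic switching ($E \times T$ constant), the same slowdown cuts energy only by s. The voltage limits discussed earlier bound how far the first rule can be pushed, while the second applies at a fixed V.

```python
def energy_after_slowdown(E0, s):
    """Energy per op after slowing an operation by factor s (s >= 1)."""
    voltage_scaled = E0 / s ** 2   # E * t^2 constant: lower V
    adiabatic = E0 / s             # E * T constant: same V, slower ramps
    return voltage_scaled, adiabatic

for s in (2, 4, 10):
    v, a = energy_after_slowdown(1.0, s)
    print(f"slowdown {s:>2}x: voltage scaling {v:.3f}, adiabatic {a:.3f} (x E0)")
```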

Adiabatic Discipline
• Never turn on a device with a large voltage differential across it.
• $P = V^2/R$

SCRL (Split-level Charge Recovery Logic) Inverter
• φ's and nodes at Vdd/2
• P1 at ground
• Slowly turn on P1
• Slowly split φ's
• Slowly turn off P1's
• Slowly return φ's to Vdd/2
[Younis/Knight ISLPED(?) 1994]

SCRL Inverter
• Basic operation
  – Set inputs
  – Split rails to compute output adiabatically
  – Isolate output
  – Bring rails back together
• Have transferred logic to output
• Still need to worry about resetting output adiabatically

SCRL NAND
• Same basic idea works for NAND gate
  – Set inputs
  – Adiabatically switch output
  – Isolate output
  – Reset power rails

SCRL Cascade
• Cascade like domino logic
  – Compute phase 1
  – Compute phase 2 from phase 1…
• How do we restore the output?

SCRL Pipeline
• We must uncompute the logic
  – Forward gates compute output
  – Reverse gate restores to Vdd/2

SCRL Pipeline
• P1 high ($F_1$ on; $F_1$ inverse off)
• φ1 split: $a = F_1(a_0)$
• φ2 split: $b = F_2(F_1(a_0))$
• $F_2^{-1}(F_2(F_1(a_0))) = a$
• P1 low: now $F_2^{-1}$ drives a
• $F_1$ restore by φ1 converge
• …restore $F_2$
• Use $F_2^{-1}$ to restore a to Vdd/2 adiabatically
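This logic-level sketch shows why the reverse gate can restore node a adiabatically. Driven from b, $F_2^{-1}$ produces exactly the value that a already holds, so connecting it puts no voltage differential across the switch. The stage functions here are hypothetical 2-bit bijections chosen for illustration; any invertible stage would do.

```python
from itertools import product

def f1(x):
    """Hypothetical stage 1: bitwise NOT of a 2-bit value (a bijection)."""
    return (1 - x[0], 1 - x[1])

def f2(x):
    """Hypothetical stage 2: CNOT, second bit ^= first bit (a bijection)."""
    return (x[0], x[1] ^ x[0])

def f2_inv(y):
    """Inverse of f2 (CNOT happens to be its own inverse)."""
    return (y[0], y[1] ^ y[0])

for a0 in product((0, 1), repeat=2):
    a = f1(a0)            # phi1 split: node a = F1(a0)
    b = f2(a)             # phi2 split: node b = F2(F1(a0))
    driven = f2_inv(b)    # reverse gate computes F2^-1(b)
    # The reverse gate drives the value already on node a, so switching it on
    # while P1 isolates F1 puts no voltage differential across it.
    assert driven == a
```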

SCRL Rail Timing

SCRL
• Requires reversible gates to uncompute each intermediate
• All switching (except I/O) is adiabatic
• Can, in principle, compute at any energy

Trickiness
• Generating the ramped clock rails
• Use LC circuits
• Need high-Q resonators
• Making this efficient is key to practical implementation
  – Some claim it is not possible in practice

Big Ideas
• Can trade time for energy
  – …area for energy
• Noise and subthreshold conduction limit voltage scaling
• Thermodynamically admissible to compute without dissipating energy
• Adiabatic switching is an alternative to voltage scaling
• Can base CMOS logic on these observations