Circuit Design with Alternative Energy-Efficient Devices

Elad Alon, Dept. of EECS, UC Berkeley
Collaborators: Hei Kam, Fred Chen (MIT), Tsu-Jae King-Liu, Vladimir Stojanovic (MIT), Dejan Markovic (UCLA), Mark Horowitz (Stanford)

CMOS is Scaling, Power Can Not
[Figure: processor power (W, log scale, 0.1 to 1000) versus year (1970 to 2010), from the 4004 through the Pentium family and Itanium II; predictions made ca. 2000 compared with reality (Core 2). Source: S. Borkar, Intel.]

Supply and Threshold Voltages Scaling
[Figure: drain current Id versus gate voltage Vg, with Vth and Vdd marked. Source: Ed Nowak, IBM.]
• kT/q doesn’t scale, so lowering Vth increases leakage
• With fixed Vth and Vdd, power density doesn’t scale well

Alternative Devices to the Rescue?
• Many new devices with S⁻¹ < 60 mV/dec proposed
• But, many of these are slow (low Ion)
  – And/or have other “weird” characteristics
[Figure: drain current Id versus gate voltage Vg for a MOSFET and a new device, with slope = S⁻¹.]
• Can these devices reduce energy? If so, at what performance?
  – Need to look at the circuits

Outline
• Energy-Performance Analysis
• Circuit Design with Relays
• Conclusions

Processor Power Breakdown
• Most components track performance vs. energy curves of logic
  – Control, Datapath, Clock
• Use proxy circuit to examine tradeoffs

Proxy Circuit for Static Logic
[Figure: proxy circuit of Ld stages, with input, output, Vdd, and 0 V marked.]
Switching activity factor = α, gate capacitance per stage = C
• $t_{delay} = L_d C V_{dd} / (2 I_{on})$
• $E_{dyn} + E_{leak} = \alpha L_d C V_{dd}^2 + L_d I_{off} V_{dd} t_{delay}$
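As a rough numerical illustration of the two proxy-circuit expressions, the sketch below evaluates the delay and the dynamic and leakage energy per cycle. The function names and every parameter value are hypothetical placeholders chosen for the example, not figures from the talk.

```python
def proxy_delay(l_d, c, v_dd, i_on):
    """Delay through l_d stages: t_delay = L_d * C * V_dd / (2 * I_on)."""
    return l_d * c * v_dd / (2.0 * i_on)


def proxy_energy(alpha, l_d, c, v_dd, i_on, i_off):
    """Return (E_dyn, E_leak) per cycle for the proxy circuit."""
    t_delay = proxy_delay(l_d, c, v_dd, i_on)
    e_dyn = alpha * l_d * c * v_dd ** 2        # switched-capacitance energy
    e_leak = l_d * i_off * v_dd * t_delay      # leakage integrated over one cycle
    return e_dyn, e_leak


# Hypothetical operating point: 20 stages, 1% activity, 1 fF per stage,
# 1 V supply, 100 uA on-current, 1 nA off-current.
t = proxy_delay(20, 1e-15, 1.0, 1e-4)
e_dyn, e_leak = proxy_energy(0.01, 20, 1e-15, 1.0, 1e-4, 1e-9)
print(f"t_delay = {t:.3e} s, E_dyn = {e_dyn:.3e} J, E_leak = {e_leak:.3e} J")
```

Sweeping `v_dd` together with a device model for `i_on` and `i_off` would trace out an energy versus performance curve; the device model itself is left out here because it depends on the device being studied.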

Simple Optimization Rule
• Optimal Ion/Ioff ∝ Ld/α
  – Derived in CMOS
  – But holds for nearly all switching devices
• Pleak/Pdyn ~constant
  – ~30-50% across wide range of parameters
Reference: Nose and Sakurai
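One way to see how a constant leakage-to-dynamic ratio ties Ion/Ioff to Ld/α is to substitute the proxy-circuit delay into the leakage term. This is a sketch that assumes the proxy-circuit expressions for delay and energy, with the symbol r introduced here for the ratio; it is not the derivation from the cited work.

```latex
\begin{align*}
\frac{E_{leak}}{E_{dyn}}
  &= \frac{L_d I_{off} V_{dd}\, t_{delay}}{\alpha L_d C V_{dd}^2}
   = \frac{I_{off}}{\alpha C V_{dd}} \cdot \frac{L_d C V_{dd}}{2 I_{on}}
   = \frac{L_d}{2\alpha} \cdot \frac{I_{off}}{I_{on}} \\
\Rightarrow \quad \frac{I_{on}}{I_{off}}
  &= \frac{1}{2r} \cdot \frac{L_d}{\alpha},
  \qquad r \equiv \frac{E_{leak}}{E_{dyn}}
\end{align*}
```

With r in the range of about 0.3 to 0.5, the prefactor 1/(2r) is roughly 1 to 1.7, so under these assumptions the optimal Ion/Ioff tracks Ld/α closely.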

Using the Rule to Compare
[Figure: left, drain current Id versus gate voltage Vg for the MOSFET and the “new device”, with Vddx marked; right, energy versus performance for the MOSFET and the “new device”.]
• Match Ioff by adjusting “VT”
• New device wins if: Ion,new(Vdd) > Ion,MOS(Vdd)

What Else Matters: Variability
• Leakage:
  – E(Ioff) vs. E(Vth)
• Delay:
  – Finite Ld
  – Cycle time set by worst-case
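One reading of the leakage bullet is that average leakage follows the average of Ioff rather than Ioff evaluated at the average Vth. The sketch below illustrates that gap under two assumptions that are not stated in the slides: an exponential subthreshold model Ioff = I0·10^(−Vth/S) and a Gaussian Vth. All numbers are hypothetical.

```python
import math
import random


def ioff(v_th, i0=1e-6, s=0.1):
    """Assumed subthreshold model: Ioff = I0 * 10**(-Vth / S), S in V/decade."""
    return i0 * 10.0 ** (-v_th / s)


def mean_ioff(mu_vth, sigma_vth, i0=1e-6, s=0.1):
    """Closed-form E(Ioff) for Gaussian Vth (Ioff is then lognormal)."""
    k = math.log(10.0) / s
    return i0 * math.exp(-k * mu_vth + 0.5 * (k * sigma_vth) ** 2)


mu, sigma = 0.3, 0.03            # hypothetical mean and sigma of Vth, in volts
nominal = ioff(mu)               # leakage of the "average" device
average = mean_ioff(mu, sigma)   # average leakage over all devices

# Monte Carlo cross-check of the closed form (fixed seed for repeatability)
rng = random.Random(0)
samples = [ioff(rng.gauss(mu, sigma)) for _ in range(50000)]
monte_carlo = sum(samples) / len(samples)

print(f"Ioff at mean Vth: {nominal:.3e} A")
print(f"Mean Ioff (closed form): {average:.3e} A")
print(f"Mean Ioff (Monte Carlo): {monte_carlo:.3e} A")
```

Because the model is exponential in Vth, the low-Vth tail dominates the average, so the mean leakage exceeds the leakage at the mean threshold.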

What Else Matters: Wires & Area
[Figure: proxy circuit with wire capacitances Cw added; Vdd, 0 V, input, and output marked.]
• Devices don’t drive just other devices
• Need to look at extrinsic cap (wires) too
  – Especially if device has area overhead

Parallelism
[Figure: energy versus performance for the MOSFET and the “new device”. Serial: Perf. f. Parallel: Perf. 2f, E/op ~const.]
• If available, parallelism allows slower devices
  – Extends energy benefit to higher performance
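The parallelism argument can be sketched with a small idealized calculation: replicating a unit multiplies throughput while energy per operation stays roughly constant. The function names and the two device operating points are hypothetical, and the model ignores any overhead for distributing and merging the work.

```python
import math


def units_needed(target_ops_per_s, unit_ops_per_s):
    """Number of parallel units needed to reach the target throughput."""
    return math.ceil(target_ops_per_s / unit_ops_per_s)


def total_power(target_ops_per_s, energy_per_op):
    """Power at the target throughput, assuming E/op is unchanged by replication."""
    return target_ops_per_s * energy_per_op


target = 1e9  # ops/s, hypothetical throughput requirement

# Hypothetical operating points: (ops/s per unit, joules per op)
devices = {
    "MOSFET": (1e9, 10e-15),
    "new device": (1e8, 1e-15),
}

for name, (rate, e_op) in devices.items():
    n = units_needed(target, rate)
    p = total_power(target, e_op)
    print(f"{name}: {n} unit(s), {p:.2e} W")
```

In this idealized picture the slower device pays for its energy advantage with ten times the unit count, which is why area matters once parallelism is used this way.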

Minimum Energy
[Figure: left, drain current Id versus gate voltage Vg with slope Seff⁻¹; right, normalized energy/cycle (0.5 to 2.0) versus Vdd (0.1 to 0.3 V), labeled “Lower Seff”.]
• At low performance or high parallelism:
  – Lowest Vdd for required Ion/Ioff wins
• Vdd,min ∝ Seff, Emin ∝ Seff²
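The scaling of minimum supply and minimum energy with the effective slope can be illustrated with a simple model. The sketch assumes the current rises one decade per Seff volts over the whole swing, so the supply needed for a given Ion/Ioff is Seff·log10(Ion/Ioff), and that energy scales as C·Vdd². Both the model and the numbers are assumptions made for illustration.

```python
import math


def vdd_min(s_eff, on_off_ratio):
    """Supply needed for the required Ion/Ioff, assuming one decade per s_eff volts."""
    return s_eff * math.log10(on_off_ratio)


def switching_energy(c, v_dd):
    """Energy scale for switching capacitance c at supply v_dd: C * Vdd**2."""
    return c * v_dd ** 2


ratio = 1e4   # hypothetical required Ion/Ioff
cap = 1e-15   # hypothetical capacitance, 1 fF

for s_eff in (0.060, 0.030):  # V/decade
    v = vdd_min(s_eff, ratio)
    e = switching_energy(cap, v)
    print(f"Seff = {s_eff * 1e3:.0f} mV/dec: Vdd,min = {v:.3f} V, E = {e:.3e} J")
```

Halving Seff halves the required supply and cuts the C·Vdd² energy to a quarter, which is the linear and quadratic dependence stated on the slide.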

Example: Tunneling FET
$I_{on} \approx A (V_{gs} + V_T) \exp[-B / (V_{gs} + V_T)]$
[1] J. Chen et al., IEEE Electron Device Lett., vol. EDL-8, no. 11, pp. 515–517, Nov. 1987.
[Figure: device cross-section (P source, gate, N drain) and drain current Id (A/mm), 1E-15 to 1E-05, versus gate voltage Vg (V), 0 to 1.]
• Band-to-band tunneling device
  – Steep transition (<60 mV/dec) at low current
  – Low Ion (<~100 μA)
• Assume work function can be tuned
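To see why this current expression gives a steep transition only at low current, the sketch below evaluates it along with the local swing dVg/d(log10 Id). The swing follows from d(ln Id)/dx = 1/x + B/x² with x = Vgs + VT. The values of A and B are hypothetical fitting parameters picked for illustration, not values from the talk or the cited paper.

```python
import math


def tfet_current(v_gs, v_t, a, b):
    """Id = A * (Vgs + VT) * exp(-B / (Vgs + VT))."""
    x = v_gs + v_t
    return a * x * math.exp(-b / x)


def tfet_swing(v_gs, v_t, b):
    """Local swing in V/decade: ln(10) / (1/x + B/x**2), with x = Vgs + VT."""
    x = v_gs + v_t
    return math.log(10.0) / (1.0 / x + b / x ** 2)


A_FIT = 1e-4  # hypothetical prefactor, A/V
B_FIT = 5.0   # hypothetical exponent parameter, V
V_T = 0.0     # hypothetical offset, V

for v in (0.1, 0.2, 0.5, 1.0):
    i_d = tfet_current(v, V_T, A_FIT, B_FIT)
    s = tfet_swing(v, V_T, B_FIT)
    print(f"Vgs = {v:.1f} V: Id = {i_d:.3e} A, swing = {s * 1e3:.1f} mV/dec")
```

With these placeholder parameters the swing is well below 60 mV/dec near Vgs = 0.1 V and degrades past 60 mV/dec as the gate voltage and current rise, which matches the qualitative behavior described in the bullets.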

Outline
• Energy-Performance Analysis
• Circuit Design with Relays
• Conclusions

Nano-Electro-Mechanical Relay
[Figure: conductance versus gate voltage Vg (V), with Gon, Vrl, and Vpi marked.]
• Based on mechanically making and breaking contact
  – No leakage, perfectly abrupt transition
• Reliability is the key challenge

Circuit Design with Relays
[Figure: CMOS implementation compared with relay implementation.]
• CMOS delay set by electrical time constant
  – Distribute logical/electrical effort over many stages
• Relay: mechanical delay (~10 ns) >> electrical τ (~1 ps)
  – Implement logic as a single complex gate
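A back-of-the-envelope comparison shows why a single complex gate suits relays. The sketch uses the ~10 ns mechanical delay and ~1 ps electrical time constant quoted on the slide, and it assumes a uniform RC ladder so that the electrical delay through n series relays can be estimated with the Elmore sum n(n+1)/2·τ. The function names and the ladder model are assumptions for illustration.

```python
T_MECH = 10e-9    # mechanical switching delay, ~10 ns per the slide
TAU_ELEC = 1e-12  # electrical time constant per relay, ~1 ps per the slide


def cascaded_delay(n_stages, t_mech=T_MECH, tau=TAU_ELEC):
    """CMOS-style cascade: each stage waits for the previous relay to actuate."""
    return n_stages * (t_mech + tau)


def complex_gate_delay(n_series, t_mech=T_MECH, tau=TAU_ELEC):
    """Single complex gate: all relays actuate at once (one mechanical delay),
    then the signal crosses n_series relays with an Elmore RC delay."""
    return t_mech + tau * n_series * (n_series + 1) / 2


n = 30
print(f"{n} cascaded stages:       {cascaded_delay(n) * 1e9:.3f} ns")
print(f"{n} relays in one gate:    {complex_gate_delay(n) * 1e9:.3f} ns")
```

Even with the quadratic growth of the Elmore term, the electrical delay of a 30-relay stack stays under half a nanosecond in this model, so the single mechanical delay dominates.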

Relay Energy-Perf. Tradeoff
[Figure: energy (J) versus performance (GHz) for the TFET, MOSFET, and relay.]
• No leakage
• Stack of 30 series relays
  – Vdd,min set only by functionality (surface force)
• How about real logic circuits?

Relay-Based Adder
• Manchester carry chain
• Ripple carry
  – Cascade full adder cells
• N-bit adder still 1 mechanical delay
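The following is a behavioral sketch of a ripple-carry adder built around a Manchester-style carry chain; it models the logic only and is not the relay circuit from the talk. The propagate and generate signals of every bit depend only on the inputs, so in a relay implementation all switches could actuate in the same mechanical delay, and the loop below stands in for the carry moving electrically along the chain. The function name is hypothetical.

```python
def manchester_add(a, b, n_bits, carry_in=0):
    """Add two n_bits-wide unsigned integers; return (sum, carry_out)."""
    carry = carry_in
    result = 0
    for i in range(n_bits):
        a_i = (a >> i) & 1
        b_i = (b >> i) & 1
        propagate = a_i ^ b_i   # pass the incoming carry along the chain
        generate = a_i & b_i    # force the outgoing carry high
        # Sum bit uses the carry arriving at this position
        result |= (propagate ^ carry) << i
        # Carry chain: generate sets it, propagate passes it, otherwise it is killed
        carry = generate | (propagate & carry)
    return result, carry


print(manchester_add(5, 3, 4))    # 4-bit example without overflow
print(manchester_add(15, 1, 4))   # 4-bit example that overflows into carry_out
```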

Adder Energy-Delay
• Compare vs. optimal CMOS adder
• ~10-40x slower
  – Low Rcont not critical
• ~10-100x lower E/op
  – Lower Cg
  – Fewer devices, all minimum size
  – Lower Vdd,min

Parallelism and Area
• If parallelism available, can trade area for throughput
• Competing with sub-threshold CMOS
  – Area-overhead bounded

Power Breakdown Revisited
• With better logic, “uncore” power becomes dominant
• Need to analyze (and leverage) devices for entire system…
  – Relay DRAM or NVM (not SRAM)?
  – Relay ADC/DACs?

Outline
• Simple Energy-Performance Analysis
• Circuit Design with Relays
• Conclusions

Summary
• New devices need circuit-level analysis
• Ion/Ioff set by logic depth, activity factor
• Don’t forget about variability, wires
• Tailor circuit style to the device
• If available, parallelism may allow slower (low Ion) devices
• Don’t forget about the rest of the system

Good News/Bad News
• Parallelism still available in CMOS
• But eventually limited by Emin
• Opportunity for new devices…
• At least in sub-100 MHz applications
[Figure: two panels, “Today: Parallelism lowers E/op” and “Future: Parallelism doesn’t help”.]

Acknowledgements
• Berkeley Wireless Research Center
• NSF
• DARPA
• FCRP