Attacking the Power-Wall by Using Near-threshold Cores
Liang Wang
liang@cs.virginia.edu
Power Wall
• The end of classical scaling.
– Vdd: almost constant
– Power density: increases roughly exponentially
– Utilization: decreases roughly exponentially
* From Venkatesh et al., ASPLOS '10
• We can fabricate more cores than we can power up: dark silicon.
Liang Wang, ECE 6332 Final
Near-threshold Cores (NVt Cores)
• Pros
– Low power per core.
– More cores per chip.
• Limitations
– Low per-core frequency, which reduces the throughput gains from parallelization.
– Variations, which harm performance and functionality.
Will NVt cores be a viable solution to push down the power wall?
Outline
• Performance Model
• Analyses and Results
• Conclusion
System Modeling
• Symmetric multi-core system
– Chip: area A, power P
– Single core at supply voltage v: area a, power p(v), frequency f(v)
• Core power p(v): dynamic power plus static power
• Frequency f(v): fitted to circuit simulation
• Number of active cores
• Application: Amdahl's Law with a given parallel ratio
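The model terms above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the model used in the project: the frequency curve here is an alpha-power-law stand-in for the circuit-simulation fit, the active-core count is assumed to be the smaller of the area-limited and power-limited counts, and every constant (`V_TH`, `ALPHA`, `P_DYN_NOM`, `P_STAT_NOM`, `CORE_AREA`, and so on) is a hypothetical placeholder.

```python
# Hypothetical per-core constants, chosen only for illustration.
V_NOM = 1.0        # nominal supply voltage (V)
V_TH = 0.3         # threshold voltage (V)
ALPHA = 1.5        # alpha-power-law exponent
F_NOM = 3.0e9      # core frequency at V_NOM (Hz)
P_DYN_NOM = 8.0    # dynamic power per core at V_NOM (W)
P_STAT_NOM = 2.0   # static power per core at V_NOM (W)
CORE_AREA = 5.0    # area per core, a (mm^2)


def _shape(v):
    """Unnormalized alpha-power-law frequency shape."""
    return (v - V_TH) ** ALPHA / v


def freq(v):
    """f(v): stand-in for the circuit-simulation fit (assumption)."""
    if v <= V_TH:
        raise ValueError("this stand-in model is only defined above V_TH")
    return F_NOM * _shape(v) / _shape(V_NOM)


def core_power(v):
    """p(v) = dynamic power + static power."""
    dynamic = P_DYN_NOM * (v / V_NOM) ** 2 * (freq(v) / F_NOM)  # ~ C * v^2 * f
    static = P_STAT_NOM * (v / V_NOM)  # leakage current assumed constant
    return dynamic + static


def active_cores(v, chip_area, chip_power):
    """Active cores: the tighter of the area budget A and the power budget P."""
    by_area = int(chip_area // CORE_AREA)
    by_power = int(chip_power // core_power(v))
    return min(by_area, by_power)


def speedup(v, chip_area, chip_power, parallel_ratio):
    """Amdahl's Law speedup relative to one core running at V_NOM."""
    n = active_cores(v, chip_area, chip_power)
    if n < 1:
        return 0.0  # the power budget cannot feed even one core
    per_core = freq(v) / F_NOM
    return per_core / ((1.0 - parallel_ratio) + parallel_ratio / n)
```

Sweeping `v` from `V_NOM` down toward `V_TH` with, for example, `speedup(v, 400.0, 100.0, 0.99)` shows the trade-off the model captures: lower voltage slows each core but lets the power budget feed more of them, until the area budget becomes the limit.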
Simulation Setup
• Circuit
– A single inverter
– Ripple-carry adder (32, 16, 8, and 4 bits)
• Technology Library
– A modified version of the Predictive Technology Model (PTM)
• Technology Nodes
– 45 nm, 32 nm, 22 nm, 16 nm
• Process Variants
– HKMGS: high-performance process with high-k metal gate and stress effect
– LP: low-power process
• CAD Tools
– RC Compiler
– Spectre driven by OCEAN
Voltage-Frequency Scaling
• LP has a much larger frequency drop than HP for the same change in Vdd (plot annotations: ~8x, ~400x).
• 16 nm has a larger frequency drop than 45 nm for the same change in Vdd (plot annotations: ~15x, ~103x).
Design Space Exploration (Area)
• Setup: 45 nm, HKMGS, IO cores, 100 W, parallel ratio = 0.99
• Plot annotations: peak is capped by total area; 2x; peak from 200 to 6.4K; saturating
Cross-technology Study
• Panels: 500 mm², 80 W and 400 mm², 100 W
Comparison with Dark Silicon
• Plot: available cores on chip (500 mm², 80 W, HKMGS)
• NVt cores alleviate the issue of low utilization.
• NVt cores have better performance (up to 2x).
Variation
• NVt cores are very sensitive to variations
– Functionality (ratioed circuits)
– Performance (the focus of this project)
• Monte Carlo simulation
– Performed at every VDD setting
– 100 iterations per VDD
– Process and mismatch variation
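The shape of such a Monte Carlo sweep can be sketched as follows. This is a hedged illustration only: the project ran process and mismatch Monte Carlo in a circuit simulator, whereas this sketch swaps in a Gaussian threshold-voltage shift and an alpha-power-law delay model. The names `rel_freq`, `worst_case_slowdown`, and `sweep`, and the constants `V_TH`, `SIGMA_VTH`, and `ALPHA`, are hypothetical; only the 100 iterations per VDD comes from the slide.

```python
import random

V_TH = 0.3         # hypothetical nominal threshold voltage (V)
SIGMA_VTH = 0.02   # hypothetical std. dev. of the threshold shift (V)
ALPHA = 1.5        # hypothetical alpha-power-law exponent
ITERATIONS = 100   # from the slide: 100 iterations per VDD


def rel_freq(vdd, vth):
    """Stand-in frequency model (assumption); 0.0 if the device cannot switch."""
    if vdd <= vth:
        return 0.0
    return (vdd - vth) ** ALPHA / vdd


def worst_case_slowdown(vdd, rng):
    """Ratio of the variation-free frequency to the slowest sampled frequency."""
    nominal = rel_freq(vdd, V_TH)
    slowest = min(
        rel_freq(vdd, rng.gauss(V_TH, SIGMA_VTH)) for _ in range(ITERATIONS)
    )
    return float("inf") if slowest == 0.0 else nominal / slowest


def sweep(vdds, seed=0):
    """Run the Monte Carlo loop at every VDD setting with a reproducible seed."""
    rng = random.Random(seed)
    return {vdd: worst_case_slowdown(vdd, rng) for vdd in vdds}
```

Calling `sweep([1.0, 0.8, 0.6, 0.4])` returns one worst-case slowdown per VDD. In this stand-in model the slowdown grows as VDD approaches the threshold, because the same threshold shift takes a larger share of the remaining overdrive voltage.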
Voltage-Frequency Scaling Revisited
First panel:
• HKMGS
– Up to 5x slowdown
• LP
– Up to 10x slowdown
Second panel:
• HKMGS
– Up to 10x slowdown
• LP
– Up to 100x slowdown
Impact of Variation
• Setup: 400 mm², 100 W, IO cores
• Worse performance
• Lower utilization
• Flattened Vdd
Conclusion
• In terms of performance
– A simple core (IO) is better.
– The HP process (HKMGS) is better.
• Lowering VDD reduces dark silicon and improves throughput.
• NVt cores are vulnerable to process variation.