Temperature Variation Aware Energy Optimization in Heterogeneous MPSoCs

Temperature Variation Aware Energy Optimization in Heterogeneous MPSoCs
Mohammadsadegh Sadri
Department of Electrical, Electronic and Information Engineering (DEI), University of Bologna, Italy
Supervisor: Prof. Luca Benini
{mohammadsadegh.sadr2, luca.benini}@unibo.it
Ver 4 - last update 30-jan-2014

Introduction
  • MPSoCs, many-cores, 3D integrated circuits, …
  • CMOS 65 nm, CMOS 40 nm, CMOS 28 nm
  • Increasing power density! Hotspots!
  • Significant spatial and temporal temperature changes (variations).
  • Results:
      - System operation failure!
      - Accelerated aging!
      - Energy and design inefficiency!
      - …
Mohammadsadegh Sadri – Temperature Variation Aware Energy Optimization in Heterogeneous MPSoCs
(c) Luca

Outline
  • Introduction
  • MiMAPT: Temperature Variation Aware Design Analysis
  • Energy Optimization in 3D MPSoC with Wide-IO DRAM
  • A Heterogeneous Many-core Architecture using ZYNQ
  • Conclusion & Future works

Part II
MiMAPT: Temperature Variation Aware Delay, Power and Thermal Analysis

Necessity of Fast & Accurate Thermal Analysis
  • High power densities
  • Temporal variability of workload
  • High spatial resolution for simulation
  • Transient thermal simulation over long intervals
  • For today's designs, this is:
      - Very time consuming!
      - Practically impossible!
  • Need for a short-cut!
      - Early detection of suspicious cases
      - Trigger fine-grain simulation only when needed!
  • Non-regular layouts for RTL entities: build a versatile method to define the thermal floorplan
      - The thermal floorplan is different from the layout floorplan!

Temperature Distribution
  • Non-uniform
  • Bell shapes
  • Other cases: horizontal or vertical gradients
  • [Figure: die temperature maps with labels 25 C and 110 C; self heating]
Conclusion:
  • Delay/power analysis may need to be done:
      - For every possible design operating condition (not only characterized corners).
      - Considering non-uniform die temperature.
  • You need a tool:
      - To arm the timing/power analysis tool (e.g. Synopsys PrimeTime)
      - To account for non-uniform temperature of standard cells in delay/power analysis

MiMAPT
Micrel's Multi-scale Analyzer for Power and Temperature
  • Understands:
      - Standard design flow file formats: .LIB, .LEF, .DEF, .TCL
      - Standard tool report formats: synthesizer power report, timing/power analysis tool power/delay reports
  • MiMAPT integrates into the standard ASIC design flow:
      - Cadence flow: RTL Compiler (RC) (v. 10.1), SoC Encounter (v. 10.1)
      - Synopsys flow: Design Compiler (v2010.03), ICC Compiler (v2010.03), PrimeTime (v2010.06)
  • Acceleration:
      1. Do thermal simulation at RT level
      2. Switch to gate level when necessary
  • MiMAPT performs:
      - Fast & accurate detection of hotspots (spatial and temporal coordinates)
      - Std-cell delay/power analysis considering temperature non-uniformities
  • MiMAPT is not limited to a specific thermal simulation engine (currently uses HotSpot).
  • Even if the final chip is not ready, you can obtain thermal estimates.
  • [Figure labels: Std-cell Lib., merged physical info, virtual chip]
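The two-step acceleration listed on this slide (simulate at RT level, switch to gate level only when necessary) can be sketched roughly as below. This is an illustration of the general idea only, not MiMAPT's actual implementation: the function name, the per-block temperature callbacks and the trigger threshold are all hypothetical.

```python
from typing import Callable, Dict, List


def adaptive_thermal_analysis(
    blocks: List[str],
    coarse_temp: Callable[[str], float],  # hypothetical fast RT-level estimate (C)
    fine_temp: Callable[[str], float],    # hypothetical slow gate-level estimate (C)
    trigger_c: float,                     # hypothetical threshold for a "suspicious" block
) -> Dict[str, float]:
    """Estimate every block coarsely; refine only the suspicious ones."""
    result: Dict[str, float] = {}
    for name in blocks:
        temp = coarse_temp(name)      # cheap estimate for every block
        if temp >= trigger_c:         # possible hotspot detected early
            temp = fine_temp(name)    # pay for fine-grain analysis only here
        result[name] = temp
    return result


if __name__ == "__main__":
    coarse = {"alu": 95.0, "regfile": 60.0}  # made-up coarse estimates
    temps = adaptive_thermal_analysis(
        ["alu", "regfile"], coarse.__getitem__, lambda _name: 97.5, trigger_c=90.0
    )
    print(temps)
```

Only the block whose coarse estimate crosses the threshold is re-analysed, which is where the time saving of a multi-resolution approach would come from.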

Non-uniform Temperature Map
  • [Plots: dynamic power, static power, total power and critical timing path period, compared with the value at uniform 50 C; X: pattern number]
  • 40 nm LP, VDD = 0.81 V: 17 MHz (real running frequency: 271 MHz, estimated one: 288 MHz)
  • 40 nm LP, VDD = 1.21 V: 5.4 mW
  • Example chip, Intel SCC: ~3 W difference between the real static power and the estimated one
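As a small worked example, the 17 MHz gap quoted on this slide can be expressed as a relative error. Only the two frequencies come from the slide; the variable names are illustrative.

```python
real_mhz = 271.0       # real running frequency quoted on the slide
estimated_mhz = 288.0  # estimated frequency quoted on the slide

error_mhz = estimated_mhz - real_mhz
relative_error = error_mhz / real_mhz

print(f"overestimate: {error_mhz:.0f} MHz ({relative_error:.1%})")
```

In other words, the estimate is optimistic by roughly 6% of the real frequency.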

Example MiMAPT Operation

MiMAPT vs. Fine-Grain
  • Compared on:
      - Execution time
      - Hotspots: spatial/temporal coordinates, temperature
  • Test case design:
      - MiMAPT execution time: 613 s
      - Fine-grain execution time: 19186 s
  • Temperature difference for hotspots estimated by MiMAPT vs. fine-grain: 0.02 K
  • Spatial distance between the hotspot detected by MiMAPT vs. fine-grain: ~0.0 um
  • Further descriptions: [THERMINIC 12], [VLSI INTEGRATION]
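The two execution times on this slide translate into a speedup as follows. The sketch assumes, as the slide layout suggests, that 613 s belongs to MiMAPT and 19186 s to the fine-grain run; the variable names are illustrative.

```python
mimapt_s = 613.0        # MiMAPT execution time from the slide (assumed attribution)
fine_grain_s = 19186.0  # fine-grain execution time from the slide

speedup = fine_grain_s / mimapt_s
time_saved = 1.0 - mimapt_s / fine_grain_s

print(f"speedup: {speedup:.1f}x, time saved: {time_saved:.1%}")
```

Under that assumption the speedup is about 31x, for a hotspot temperature difference of 0.02 K.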

Part III
Temperature Variation Aware Energy Optimization in 3D MPSoCs With Wide-I/O DRAM

3D MPSoCs with Stacked DRAMs
3D Integration
  • Pros: higher bandwidth, lower energy, …
  • Cons: difficult to manufacture, thermal issues, …
Samsung Wide-I/O DRAM
  • [Figure: one die (top view); DRAM channels, DRAM dies, core die]
  • 1 DRAM channel:
      - Spans 4 silicon dies & contains 8 banks (2 banks/die).
      - Data bus width: 128 bits
      - Max clock: 200/300 MHz
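From the bus width and clock figures on this slide, a rough peak bandwidth per channel can be computed. The sketch assumes one transfer per clock cycle (single data rate); that assumption and the function name are not stated on the slide.

```python
def peak_bandwidth_mbytes_per_s(
    bus_bits: int, clock_mhz: float, transfers_per_clock: int = 1
) -> float:
    """Peak bandwidth of one channel in MBytes/s.

    transfers_per_clock=1 is an assumption (single data rate).
    """
    return bus_bits / 8 * clock_mhz * transfers_per_clock


for clock_mhz in (200.0, 300.0):  # the two max clocks listed on the slide
    bw = peak_bandwidth_mbytes_per_s(128, clock_mhz)
    print(f"{clock_mhz:.0f} MHz: {bw:.0f} MBytes/s per channel")
```

With these assumptions a 128-bit channel peaks at 3200 MBytes/s at 200 MHz and 4800 MBytes/s at 300 MHz.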

Transaction Level Modeling
  • The need for modeling more complex hardware (RTL too slow!)
  • Example: Synopsys Platform Studio, running Android on the TLM platform
  • Transaction Level Models (TLM):
      - Fast models for hardware components
      - Speed/accuracy balance:
          o Loosely Timed (LT)
          o Approximately Timed (AT)
          o Cycle Accurate (CA)
  • Uses:
      - Concurrent HW/SW development
      - Design space exploration
      - Sophisticated design debugging & analysis
      - Early power/performance analysis

TLM Virtual Infrastructure
  • CPU TLM models of Synopsys are Loosely Timed and not accurate!
  • Cycle Accurate TLM models for CPUs (e.g. Carbon) are expensive!
  • gem5 used to model CPU operation:
      - gem5 simulates a multi-core ARM system.
      - Android OS with real-world benchmarks.
      - DRAM accesses trace captured: timing annotations, performance metrics of CPUs
  • TLM environment:
      - Re-play the recorded trace: timings adjusted
      - Power models, 3D-ICE thermal model & governors (in Python)

Temperature Variation Aware Bank-wise Refresh
  • [Plot: required refresh rate vs. temperature (32 Mbit bank)]
  • [Figure: sample thermal profile of the 3D chip]
  • Lateral difference (variation) in temperature of 2 adjacent banks of one DRAM channel: 3.3 C
  • Vertical variation in temperature of 2 banks of one DRAM channel in 2 different dies: 5.6 C
  • An idea: different refresh rates for each of the DRAM banks according to its own temperature!
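The bank-wise refresh idea can be illustrated with a toy comparison between a single refresh period set by the hottest bank and a per-bank period chosen from each bank's own temperature. Everything numeric here is hypothetical: the temperature-to-period table, the bank temperatures and the function names are made up for illustration and are not the measured curve or results from the slides.

```python
import bisect
from typing import List

# Hypothetical lookup table: a bank at or below TEMP_BOUNDS_C[i] may use
# REFRESH_PERIOD_MS[i]; the last period applies above the highest bound.
TEMP_BOUNDS_C = [45.0, 65.0, 85.0]
REFRESH_PERIOD_MS = [256.0, 128.0, 64.0, 32.0]


def refresh_period_ms(temp_c: float) -> float:
    """Longest allowed refresh period for a bank at temp_c (hypothetical table)."""
    return REFRESH_PERIOD_MS[bisect.bisect_left(TEMP_BOUNDS_C, temp_c)]


def refresh_passes_per_second(bank_temps_c: List[float], bank_wise: bool) -> float:
    """Total full-bank refresh passes per second over all banks."""
    if bank_wise:
        # Each bank follows its own temperature.
        periods = [refresh_period_ms(t) for t in bank_temps_c]
    else:
        # Conventional policy: the hottest bank dictates the period for all.
        periods = [refresh_period_ms(max(bank_temps_c))] * len(bank_temps_c)
    return sum(1000.0 / p for p in periods)


if __name__ == "__main__":
    temps = [62.0, 64.5, 63.0, 66.0, 67.8, 68.6, 69.3, 66.5]  # made-up 8-bank profile
    uniform = refresh_passes_per_second(temps, bank_wise=False)
    per_bank = refresh_passes_per_second(temps, bank_wise=True)
    print(f"uniform: {uniform}, bank-wise: {per_bank}, saving: {1 - per_bank / uniform:.2%}")
```

The saving depends entirely on how many banks fall into a cooler bracket of the table, which is why a temperature spread of a few degrees between banks, as on this slide, can matter.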

Temperature Variation Aware Bank-wise Refresh
  • Improvement in refresh rate: 24%
  • Improvement in averaged refresh power: 16%
  • Further description: [DATE 14], [DAC 14]

Part IV
A Heterogeneous Architecture for Temperature Variation Aware Hardware Acceleration Research

Hardware Acceleration: Motivations
  • 1951, UNIVAC I: 0.015 operations per watt-second
  • Half a century later! 2012, ST P2012: 40 billion operations per watt-second
  • Performance per watt!!
  • Problem: perform more computations with less energy!
  • Solution: specialized functional units (accelerators)
(c) Luca

Hardware Acceleration: Issues
  • Better performance per watt! Faster!
  • [Figure: CPU running TASK 1 to TASK 4 with L1$ and MMU; variables var1, var2, var3 cached or in DRAM]
  • How is the address passed to the accelerator?
      - Case 1: virtual
      - Case 2: physical
  • What about cached variables? Shouldn't the CPU flush the cache?

Hardware Acceleration: Issues
  • [Figure: CPU (TASK 1 to TASK 4, L1$), two accelerators (specialized hardware) and DRAM sharing var1, var2, var3 (cached); temperature labels 60 C, 75 C, 90 C]
  • Need … a real-world platform to perform experiments!

Xilinx ZYNQ Architecture
[Block diagram with PL and PS sides]
  • PS blocks: ARM A9 (NEON, MMU, L1), L2 (PL310), snoop, OCM, DRAM controller (Synopsys IntelliDDR MPMC), interconnect (ARM NIC-301), DMA controller (ARM PL330), peripherals (UART, USB, network, SD, GPIO, …)
  • Ports for AXI masters in the PL: SGP0, SGP1, HP0, HP1, HP2, HP3, ACP
  • Ports for AXI slaves in the PL: MGP0, MGP1

Primary Performance Explorations
  • Which method is better to share data between CPU and accelerator?
  • For each method:
      - What is the data transfer speed?
      - How much is the energy consumption?
      - Effect of background workload on performance?
  • [Block diagram: AXI master (accelerator) in the PL attached through HP0 and ACP; PS with DRAM controller, L2 (PL310), OCM, snoop, L1, ARM A9, NEON, MMU]

Speed Comparison
  • ACP loses!
  • CPU OCM is between CPU ACP & CPU HP
  • Plot annotations: 298 MBytes/s, 239 MBytes/s
  • [Plot X axis: 4K, 16K, 64K, 128K, 256K, 1 MBytes]

Energy Comparison
  • CPU-only methods: worst case!
  • CPU OCM: always between CPU ACP and CPU HP
  • CPU ACP: always better energy than CPU HP0
  • When the image size grows, CPU ACP converges to CPU HP0

Heterogeneous Hardware Architecture
  • A heterogeneous architecture:
      - ARM host
      - Computational clusters:
          o OpenRISC CPU cores
          o Hardware accelerators
  • Resource utilization: 8 OpenRISC cores – XC7045 (ZC-706 board)
  • [Figure: ZYNQ with the ARM host in the PS; Cluster 0, Cluster 1 and Cluster 2 in the PL, with OR1K cores and HW ACC]

Part V
Conclusions & Future Work

Conclusions
1. A thermal model for Intel SCC.
  • Comparison with calibrated sensor readings.
2. Effect of on-die temperature variation on power/delay of circuits.
  • MiMAPT evaluates designs considering temperature variation.
  • MiMAPT is significantly faster than traditional methods.
3. TLM platform for thermal/performance exploration of 3D MPSoCs.
  • Temperature variation aware bank-wise refresh improves power.
4. Developed a complete heterogeneous hardware platform.
  • Enables future research regarding temperature variation aware control policies.

Outputs!
1. SCC thermal calibration software
2. MiMAPT tool
3. 3D DRAM modeling TLM platform
4. OpenRISC cluster for Xilinx ZYNQ

Ideas for Future Work
1. MiMAPT
  • 3D MiMAPT
  • Evaluation of designs containing blocks of memories
  • Considering new fabrication technologies
2. TLM Platform
  • Development of efficient thermal management policies (MPC)
  • Extension of modeling capabilities to other variants of 3D logic.
  • Integration of the gem5 core into the TLM platform.
3. Heterogeneous Cluster
  • Exploration of temperature variation aware hardware reconfiguration ideas
  • Architectural enhancements

Publications
  • [VLSI INTEGRATION] Mohammadsadegh Sadri, Andrea Bartolini, and Luca Benini. Temperature variation aware multi-scale delay, power and thermal analysis at RT and gate level. (Submitted.)
  • [THERMINIC 11] Mohammadsadegh Sadri, Andrea Bartolini, and Luca Benini. Single-chip cloud computer thermal model.
  • [THERMINIC 12] Mohammadsadegh Sadri, Andrea Bartolini, and Luca Benini. MiMAPT: Adaptive multi-resolution thermal analysis at RT and gate level.
  • [DATE 14] Mohammadsadegh Sadri, Matthias Jung, Christian Weis, Norbert Wehn, and Luca Benini. Energy optimization in 3D MPSoCs with Wide-I/O DRAM using temperature variation aware bank-wise refresh.
  • [FPGAWORLD 13] Mohammadsadegh Sadri, Christian Weis, Norbert Wehn, and Luca Benini. Energy and performance exploration of accelerator coherency port using Xilinx ZYNQ.
  • [DAC 14] Matthias Jung, Christian Weis, Mohammadsadegh Sadri, Norbert Wehn, and Luca Benini. Optimized active and power-down mode refresh control in 3D-DRAMs. (Submitted.)
  • [PATMOS 11] Andrea Bartolini, Mohammadsadegh Sadri, Francesco Beneventi, and others. A system level approach to multi-core thermal sensors calibration.
  • [DATE 12] Andrea Bartolini, Mohammadsadegh Sadri, J. Furst, A. K. Coskun, and L. Benini. Quantifying the impact of frequency scaling on the energy efficiency of the single-chip cloud computer.

Temperature Variation Aware Energy Optimization in Heterogeneous MPSoCs
Mohammadsadegh Sadri
Department of Electrical, Electronic and Information Engineering (DEI), University of Bologna, Italy
Supervisor: Prof. Luca Benini
{mohammadsadegh.sadr2, luca.benini}@unibo.it
Ver 3 - last update 28-jan-2014