Optimal Power Allocation for Multiprogrammed Workloads on Single-chip Heterogeneous Processors
- Slides: 19
Optimal Power Allocation for Multiprogrammed Workloads on Single-chip Heterogeneous Processors
Euijin Kwon (1, 2), Jae Young Jang (2), Jae W. Lee (2), Nam Sung Kim (2, 3)
Single-chip heterogeneous processors
• Compared to systems based on discrete components
  - Lower communication overhead
  - Lower power consumption
  - Lower cost (less silicon)
  - Friendly to emerging applications (sequential + parallel processing)
• Examples: AMD’s Llano, Intel’s Sandy Bridge, and Samsung’s Exynos (sources: AMD, Intel, and Samsung)
Challenges
• The performance of a single-chip heterogeneous processor (SCHP) is limited by its power budget
  - Total chip power budget
  - CPU/GPU power budget
• Multiprogrammed workload
  - Workload-aware power allocation
  - Considering characteristics and metrics
• How can we optimize overall performance within a limited power budget?
Outline
• Motivation
• Target platform: SCHP + MW
• Workload-aware power allocation
  - Characteristics of programs
  - Evaluation metrics
• Methodology
  - Power configuration
  - Benchmark programs
• Evaluation
• Algorithm
• Conclusion
Target platform: SCHP + MW (multiprogrammed workload)
• 4-core CPU + 16-SM GPU
  - Multiple V/F domains
  - DVFS
• 2 programs running
  - Hardware resources evenly divided
[Figure: block diagram of the multiprogrammed workload. Program 1 and Program 2 share CPU Cores 0 to 3 (per-core CPU V/F domains) and GPU 0 and GPU 1 (one V/F domain each); the memory controllers (MCs) have their own V/F domain.]
Workload-aware power allocation
• Characteristics of programs
  - Non-uniform performance sensitivities
• Evaluation metrics
  - Throughput vs. energy efficiency
[Figure: normalized throughput (y-axis, 0.8 to 2.0) versus power allocation using the same HW (x-axis: 28.6, 34.2, 39.8, 48.6, 59.0) for compute-bound (mri-q) and memory-bound (stream-copy) programs. Annotation: allocating more power to mri-q.]
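The two evaluation metrics can point at different power configurations. The sketch below illustrates this, assuming throughput is measured in instructions per second (IPS) and energy efficiency as IPS per watt; the function names and all numbers are hypothetical and are not taken from the slides.

```python
def energy_efficiency(ips: float, watts: float) -> float:
    """Energy efficiency as IPS per watt (assumed definition)."""
    if watts <= 0:
        raise ValueError("power must be positive")
    return ips / watts


def normalized(value: float, baseline: float) -> float:
    """Express a metric relative to a baseline configuration."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return value / baseline


# Hypothetical measurements for one program at two power levels.
base_ips, base_w = 1.0e9, 25.0
boost_ips, boost_w = 1.5e9, 50.0

throughput_gain = normalized(boost_ips, base_ips)
efficiency_gain = normalized(
    energy_efficiency(boost_ips, boost_w),
    energy_efficiency(base_ips, base_w),
)
print(throughput_gain, efficiency_gain)
```

With these made-up numbers the higher power level improves throughput but lowers energy efficiency, which is why the best allocation depends on the metric chosen.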
Methodology: shared power budget
• Can change the power budget for CPU 1, CPU 2, GPU 1, and GPU 2
• Total chip power budget = 100 W
• CPU power budget = 80 W
• GPU power budget = 64 W
• Baseline configuration
  - Evenly divided (25 W for each CPU/GPU group)
[Figure: power configuration for CPU 1, CPU 2, GPU 1, and GPU 2, with throughput and energy efficiency as outputs. Power values shown: 11.2, 16.8, 17.4, 22.4, 24.8, 31.2, 34.2, 41.6, 46.4, 62.8.]
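A candidate power configuration can be checked against the three budgets before it is evaluated. The sketch below is one way to do that, assuming the CPU budget caps the sum of both CPU groups and the GPU budget caps the sum of both GPU groups; that interpretation and the function name are assumptions, while the budget values come from the slide.

```python
# Budgets stated on the slide, in watts.
TOTAL_BUDGET_W = 100.0
CPU_BUDGET_W = 80.0
GPU_BUDGET_W = 64.0
EPS = 1e-9  # tolerance for floating-point sums


def is_feasible(cpu1: float, cpu2: float, gpu1: float, gpu2: float) -> bool:
    """Return True if the configuration fits all three budgets.

    Assumption: each budget caps the sum of the matching groups.
    """
    cpu = cpu1 + cpu2
    gpu = gpu1 + gpu2
    return (
        cpu <= CPU_BUDGET_W + EPS
        and gpu <= GPU_BUDGET_W + EPS
        and cpu + gpu <= TOTAL_BUDGET_W + EPS
    )


# Baseline from the slide: 25 W for each CPU/GPU group.
baseline = (25.0, 25.0, 25.0, 25.0)
print(is_feasible(*baseline))
```

The small tolerance avoids rejecting a configuration that sits exactly on a budget because of floating-point rounding.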
Methodology: benchmark programs
• Used 6 benchmark programs
• Divided into 3 groups depending on characteristics

Benchmark | Acronym | Source | Characteristics
Magnetic Resonance Imaging Q | MRQ | Parboil | Compute-bound
Stream Cluster | SCL | Rodinia | Compute-bound
Hotspot | HOT | Rodinia | Neutral
Sum of Absolute Difference | SAD | Parboil | Neutral
Stencil | STN | Parboil | Memory-bound
Stream Copy | SCP | CS Virginia | Memory-bound
Evaluation: case study 1 (compute- vs. memory-bound)
• 19% throughput improvement
• 32% energy efficiency improvement
• Allocating more power to the compute-bound program
• Optimal points vary depending on the metric
Evaluation: case study 2 (memory- vs. memory-bound)
• 10% throughput improvement
• 32% energy efficiency improvement
• Equally allocated power
• Again, the optimal point depends on
  - Evaluation metric
  - Workload characteristics (compute- or memory-bound)
Evaluation: variation of optimal configuration
• Depending on programs’ characteristics and evaluation metrics
[Table: optimal power configuration for each program pair (P1, P2), giving CPU and GPU power in watts for P1 and P2 under Metric 1 (throughput) and Metric 2 (energy efficiency). Programs are labeled C (compute-bound), N (neutral), or M (memory-bound). Values listed: CPU 17.4; GPU 11.2, 16.8, 22.4, 31.2, 41.6.]
Evaluation: performance improvement from optimal power allocation
• Achieved significant improvement
  - 12% for throughput
  - 18% for energy efficiency
[Figure: normalized IPS/W (y-axis, 0.9 to 1.4) for the pairs MRQ vs. SCL (CC), SCP vs. STN (MM), SAD vs. HOT (NN), MRQ vs. SCP (CM), SCL vs. SCP (CM), HOT vs. MRQ (NC), MRQ vs. SAD (CN), SCL vs. SAD (CN), HOT vs. STN (NM), SAD vs. SCP (NM), and GEOMEAN.]
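The figure summarizes the per-pair results with a geometric mean (GEOMEAN). As a reminder of how such a summary is computed, here is a minimal sketch; the input values are hypothetical placeholders, not the results from the slides.

```python
import math


def geomean(values):
    """Geometric mean of positive normalized values."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("values must be non-empty and positive")
    # Averaging logarithms avoids overflow from a long product.
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical normalized improvements for three workload pairs.
example = [1.05, 1.20, 1.30]
print(round(geomean(example), 3))
```

The geometric mean is the usual choice for averaging ratios such as normalized IPS/W, because it treats a gain of 2x and a loss of 0.5x symmetrically.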
Algorithm for throughput maximization
• Flowchart steps
  - calculate(slope) for both programs (sp1, sp2)
  - If abs(sp1 - sp2) < threshold: alloc(equally)
  - Otherwise, if sp1 > sp2: alloc(p1_more); else: alloc(p2_more)
  - wait(regular_time), then repeat
[Figure: normalized throughput (y-axis, 0.8 to 2.0) versus power allocation (x-axis: 28.6, 34.2, 39.8, 48.6, 59.0) for compute-bound (mri-q) and memory-bound (stream-copy) programs.]
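One decision step of this flowchart can be sketched as follows. The slide does not say how the slope is computed, so the finite-difference definition here (change in normalized throughput per watt) is an assumption, as are the function names and the example numbers.

```python
def slope(tp_before: float, tp_after: float,
          power_before: float, power_after: float) -> float:
    """Throughput sensitivity to power (assumed finite difference)."""
    return (tp_after - tp_before) / (power_after - power_before)


def choose_allocation(sp1: float, sp2: float, threshold: float) -> str:
    """Mirror the flowchart: equal split if the slopes are close,
    otherwise give more power to the more sensitive program."""
    if abs(sp1 - sp2) < threshold:
        return "equally"
    return "p1_more" if sp1 > sp2 else "p2_more"


# Hypothetical slopes: program 1 gains much more per watt than program 2.
sp1 = slope(1.0, 1.5, 10.0, 20.0)
sp2 = slope(1.0, 1.05, 10.0, 20.0)
print(choose_allocation(sp1, sp2, threshold=0.01))
```

In a run-time setting this decision would be repeated after each wait(regular_time) interval, using freshly measured slopes.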
Algorithm for energy efficiency maximization
• Gradient search from the minimum power allocation
  - final = min_power
  - MAX = max( EE(final), EE(final, p1++), EE(final, p2++) )
  - If EE(final) == MAX: exit
  - Otherwise, if EE(final, p1++) > EE(final, p2++): final = (final, p1++); else: final = (final, p2++)
  - Repeat from the MAX computation
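The gradient search above can be sketched as a greedy climb over discrete power levels. This is a minimal illustration, assuming each program has an ordered list of power levels and that an energy-efficiency function `ee(p1, p2)` is available; the function name, the toy `ee` model, and the reuse of the slide's power values as levels are all assumptions.

```python
def ee_gradient_search(levels, ee):
    """Start at the minimum power for both programs and step up one
    program at a time while energy efficiency keeps improving."""
    i1, i2 = 0, 0  # indices into levels for program 1 and program 2
    while True:
        here = ee(levels[i1], levels[i2])
        # Neighbors: raise p1 or p2 by one level, if a higher level exists.
        up1 = ee(levels[i1 + 1], levels[i2]) if i1 + 1 < len(levels) else float("-inf")
        up2 = ee(levels[i1], levels[i2 + 1]) if i2 + 1 < len(levels) else float("-inf")
        if here >= max(up1, up2):
            return levels[i1], levels[i2]  # EE(final) == MAX: exit
        if up1 > up2:
            i1 += 1  # final = (final, p1++)
        else:
            i2 += 1  # final = (final, p2++)


# Illustrative levels and a toy EE model with a single peak (hypothetical).
LEVELS = [11.2, 16.8, 22.4, 31.2, 41.6]


def toy_ee(p1, p2):
    return -((p1 - 16.8) ** 2 + (p2 - 22.4) ** 2)


print(ee_gradient_search(LEVELS, toy_ee))
```

Because each step only raises one level, the loop ends after at most two passes over the level list. Like any greedy search, it can stop at a local optimum when the energy-efficiency surface has more than one peak.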
Conclusion
• We propose a solution for optimal power allocation
  - Workload-aware power allocation
  - By using program characteristics and evaluation metrics
• Significant performance improvement achieved
  - 12% for throughput
  - 18% for energy efficiency
• Run-time algorithms effectively find (near-)optimal power allocation
Backup slides
Simulator
• Integrated CPU + GPU simulator
  - gem5 + GPGPU-Sim
  - http://cpu-gpu-sim.ece.wisc.edu/
  - H. Wang, V. Sathish, R. Singh, M. Schulte, and N. Kim, "Workload and Power Budget Partitioning for Single-Chip Heterogeneous Processors," in PACT, 2012.
• Adaptive power allocation for multiprogrammed workload
  - Per-core V/F domains for CPU
  - 2 V/F domains for GPU