Mirage Cores: The Illusion of Many Out-of-order Cores
- Slides: 23
Mirage Cores: The Illusion of Many Out-of-order Cores Using In-order Hardware
Shruti Padmanabha, Andrew Lukefahr*, Reetuparna Das, Scott Mahlke
Micro-50, Boston, Oct 18, 2017
University of Michigan, Electrical Engineering and Computer Science
*Now at Indiana University
General purpose computer architectures
[Figure axes: System throughput, Energy efficiency, Single-thread performance]
Heterogeneous CMP architectures: Out-of-order (OoO) core
- High performance
- Dynamically reorders instructions
- Large, complex, one-size-fits-all design
- Low energy efficiency
[Figure: OoO core on the CMP area; axes: System throughput, Energy efficiency, Single-thread performance]
Heterogeneous CMP architectures: In-order (InO) core
- Smaller, simplistic design
- Low area
- Low power
- Issues instructions in program order
- Low performance
[Figure: InO core next to the OoO core on the chip area; axes: System throughput, Energy efficiency, Single-thread performance]
Heterogeneous CMP architectures
[Figure: chip area holding InO, InO, and OoO cores; axes: System throughput, Energy efficiency, Single-thread performance]
Mirage Cores - Objective
- More InO cores
- More OoO-like InO cores
[Figure axes: Single-thread performance, System throughput, Energy efficiency]
Background: DynaMOS ("More OoO-like InO cores")
- Program traces are scheduled by the OoO, memoized in a schedule cache (Sched$), and replayed on InO hardware (OinO HW)
- 70% of the traces have equivalent schedules for most of their lifetimes
- OinO vs InO: Performance 1.4X, Area 1.2X, Energy 1.4X
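The memoization idea on this slide can be illustrated with a small sketch. It assumes a schedule cache keyed by a trace identifier; the class name `ScheduleCache`, its capacity, and the least-recently-used eviction are hypothetical choices made for illustration and are not taken from the talk.

```python
from collections import OrderedDict


class ScheduleCache:
    """Toy Sched$: maps a trace id to a memoized instruction order (hypothetical model)."""

    def __init__(self, capacity=4):
        self.capacity = capacity      # number of traces held; assumed, not from the slides
        self.entries = OrderedDict()  # trace_id -> tuple of instruction indices
        self.hits = 0
        self.misses = 0

    def lookup(self, trace_id):
        """Return the memoized schedule, or None on a miss."""
        if trace_id in self.entries:
            self.entries.move_to_end(trace_id)  # mark as most recently used
            self.hits += 1
            return self.entries[trace_id]
        self.misses += 1
        return None

    def memoize(self, trace_id, schedule):
        """Record a schedule produced by the OoO; evict the LRU entry if over capacity."""
        self.entries[trace_id] = tuple(schedule)
        self.entries.move_to_end(trace_id)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)


def run_trace(cache, trace_id, program_order):
    """Replay a memoized schedule if one exists; otherwise fall back to program order."""
    schedule = cache.lookup(trace_id)
    return list(schedule) if schedule is not None else list(program_order)
```

In this toy model, a trace that misses in the cache simply runs in program order, which mirrors the plain InO behaviour described earlier in the deck.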
Mirage Cores: Motivations
- Memoization opportunities vary based on program/phase characteristics
- InO cores can utilize memoized traces for phases of millions of instructions
[Figure: execution alternating between OinO, OoO, and OinO]
Mirage Cores: Concept
[Figure: one OoO+ core (OoO with schedule producer) and several InO+ (OinO) cores on the chip area; axes: System throughput, Energy efficiency, Single-thread performance]
Mirage Cores: Challenges
- Efficiently time-share the OoO
  - Architecture
  - Number of OinOs per OoO
  - Minimize overheads
- Effectively arbitrate between applications
  - Metrics?
  - Goals?
[Figure: OoO+ (OoO with schedule producer) shared by InO+ (OinO) cores]
Mirage Cores: Architecture
[Figure: an OoO core with L1i$, L1d$, and Sched$, and several InO+ (OinO) cores, each with its own L1i$, L1d$, and Sched$, connected through an interconnect and an arbitrator; the cores connect to a shared L2]
Arbitration Between Applications
[Figure: timeline in 1-million-cycle intervals; the arbitrator picks the OoO candidate among App0, App1, App2, App3, … based on execution metrics (??)]
Metrics for arbitration
- Execution metric: Memoizability; Determines: Single-application speedup; Measure: ∆Sched$-MPKI
Memoizability - ∆Sched$-MPKI
[Figure: relationship between performance (IPC) and Sched$-MPKI for bzip2; x-axis: program in increasing order of 1M-cycle intervals]
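A rough sketch of how such a metric could be computed per interval. MPKI is the usual misses-per-thousand-instructions ratio; treating ∆Sched$-MPKI as the difference between consecutive 1M-cycle intervals is an assumption, since the slide only names the metric. The function names and the counter layout are hypothetical.

```python
def sched_mpki(misses, instructions):
    """Schedule-cache misses per thousand committed instructions in one interval."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    return 1000.0 * misses / instructions


def delta_sched_mpki(intervals):
    """Change in Sched$-MPKI between consecutive intervals.

    intervals: list of (misses, instructions) pairs, one per 1M-cycle interval.
    Assumed definition: delta[i] = mpki[i + 1] - mpki[i].
    """
    mpki = [sched_mpki(m, n) for m, n in intervals]
    return [curr - prev for prev, curr in zip(mpki, mpki[1:])]
```

Under this assumed definition, a positive delta means the schedule cache has started missing more often than in the previous interval.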
Metrics for arbitration
- Execution metric: Memoizability; Determines: Single-application speedup; Measure: ∆Sched$-MPKI
- Execution metric: Slowdown; Determines: System throughput
- Execution metric: Time on OoO; Determines: Fairness
Goals for arbitration
1. Maximize energy efficiency
2. Maximize system throughput
3. Guarantee fair/priority based resource allocation
[Figure: comparison against the OoO in a traditional heterogeneous CMP]
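To make the interplay between the three execution metrics and these goals concrete, here is a purely illustrative weighted-score selection. It is not the arbitration policy from the talk: the `AppStats` fields, the weights, and the linear score are all assumptions introduced for this sketch.

```python
from dataclasses import dataclass


@dataclass
class AppStats:
    name: str
    delta_sched_mpki: float  # memoizability signal for the last interval
    slowdown: float          # estimated slowdown of the app on its InO core
    ooo_intervals: int       # intervals this app has already spent on the OoO


def pick_ooo_candidate(apps, w_memo=1.0, w_slow=1.0, w_fair=1.0):
    """Pick the app with the highest hypothetical score for the next interval.

    The score rewards a rising Sched$-MPKI and a high slowdown, and penalizes
    apps that already received a large share of OoO time (fairness term).
    """
    total = sum(a.ooo_intervals for a in apps) or 1  # avoid division by zero

    def score(a):
        share = a.ooo_intervals / total
        return w_memo * a.delta_sched_mpki + w_slow * a.slowdown - w_fair * share

    return max(apps, key=score)
```

Setting `w_memo` and `w_slow` to zero reduces the sketch to a pure fairness policy, which shows how the weights could trade the goals off against each other.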
Evaluation Methodology (Architectural Feature: Parameters)
- OoO Core: 3 wide O3 @ 2 GHz; 12 stage pipeline; 128 ROB entries; 128 entry PRF, 32 entry LSQ
- InO Core: 3 wide InOrder @ 2 GHz; 8 stage pipeline; 128 entry PRF, 32 entry LSQ
- Memory System: 32 KB L1 i/d cache, 2 cycle access; 8 KB Schedule cache, 1 cycle access; 1 MB L2 cache, 15 cycle access; 1 GB Main Mem, 100 cycle access
- Simulator: Gem5
- Energy Model: McPAT
Evaluation: Experimental parameters
- Number of cores: n-InO + 1 OoO
- Baseline: n-OoO
- Workloads: random mixes of n benchmarks from SPEC2k6, each run for a 1 billion instruction simpoint
Architectures for comparison (8:1 configuration)
- Traditional Heterogeneous CMP: InO cores plus an OoO (booster)
- Mirage Core CMP: OinO cores plus an OoO with schedule producer
- Homogeneous OoO CMP: OoO cores only
- Homogeneous InO CMP: InO cores only
Results: 8 InOs with 1 OoO
- 54% energy savings over homogeneous OoO CMP with 16% STP loss
- OR 24% STP gains over homogeneous InO CMP with 14% energy overhead
[Figure: performance relative to Homo-OoO and energy relative to Homo-OoO for App0 to App7; series: Traditional Het-CMP (no memoization), Mirage Cores, Homo-OoO, Homo-InO]
Size of cluster
- The OoO is oversubscribed for n >= 12
[Figure: performance relative to Homo-OoO and utilization of the OoO versus the number of OinO cores per OoO (4, 8, 12, 16); series: Traditional Het-CMP (no memoization), Mirage Cores, Homo-OoO, Homo-InO]
Conclusion
- Achieve > 80% of a homogeneous OoO CMP with > 20% area and > 50% energy savings
[Figure: OoO+ (OoO with schedule producer) supplying reordered schedules to InO+ (OinO) cores; axes: System throughput, Energy efficiency, Single-thread performance]
Mirage Cores: The Illusion of Many Out-of-order Cores Using In-order Hardware Questions? Shruti Padmanabha, Andrew Lukefahr, Reetuparna Das, Scott Mahlke Micro-50, Boston Oct 18, 2017 University of Michigan Electrical Engineering and Computer Science