Prefetch-Aware Shared-Resource Management for Multi-Core Systems
Eiman Ebrahimi* Chang Joo Lee*+ Onur Mutlu‡ Yale N. Patt*
* HPS Research Group, The University of Texas at Austin
‡ Computer Architecture Laboratory, Carnegie Mellon University
+ Intel Corporation
Background and Problem
[System diagram: Cores 0..N, each with its own prefetcher, share on-chip memory resources (shared cache, memory controller) and, beyond the chip boundary, off-chip DRAM Banks 0..K.]
Background and Problem
Understand the impact of prefetching on previously proposed shared-resource management techniques:
- Fair cache management techniques
- Fair memory controllers: Network Fair Queuing (Nesbit et al., MICRO'06), Parallelism-Aware Batch Scheduling (Mutlu et al., ISCA'08)
- Fair management of on-chip interconnect
- Fair management of multiple shared resources: Fairness via Source Throttling (Ebrahimi et al., ASPLOS'10)
Background and Problem
Fair memory scheduling technique: Network Fair Queuing (NFQ)
- Improves fairness and performance with no prefetching
- Significant degradation of performance and fairness in the presence of prefetching
[Bar chart: normalized performance and maximum slowdown for FR-FCFS vs. NFQ, with no prefetching and with aggressive stream prefetching]
Background and Problem
Understanding the impact of prefetching on previously proposed shared-resource management techniques: fair cache management, fair memory controllers, fair management of the on-chip interconnect, and fair management of multiple shared resources
Goal: Devise general mechanisms for taking prefetch requests into account in fairness techniques
Background and Problem
Prior work addresses inter-application interference caused by prefetches:
- Hierarchical Prefetcher Aggressiveness Control (Ebrahimi et al., MICRO'09) dynamically detects interference caused by prefetches and throttles down overly aggressive prefetchers
Even with controlled prefetching, fairness techniques should be made prefetch-aware
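The accuracy-driven throttling idea above can be sketched in a few lines. This is an illustrative sketch only, not HPAC itself: the class name, aggressiveness levels, and thresholds are made up, and the real mechanism also tracks cache pollution and inter-core interference.

```python
class PrefetcherThrottle:
    """Illustrative accuracy-based prefetcher throttling (HPAC-style sketch).

    Levels and thresholds are hypothetical; the real HPAC mechanism
    also considers pollution and interference with other cores.
    """
    LEVELS = [1, 2, 4, 8, 16]  # prefetch degree per trigger (hypothetical)

    def __init__(self):
        self.level = 2    # start at a mid-range aggressiveness
        self.issued = 0   # prefetches sent to memory this interval
        self.useful = 0   # prefetched lines later hit by a demand

    def record_issue(self):
        self.issued += 1

    def record_useful_hit(self):
        self.useful += 1

    def accuracy(self):
        return self.useful / self.issued if self.issued else 0.0

    def end_interval(self, hi=0.75, lo=0.40):
        """Throttle up on high accuracy, down on low accuracy;
        returns the prefetch degree for the next interval."""
        acc = self.accuracy()
        if acc >= hi and self.level < len(self.LEVELS) - 1:
            self.level += 1
        elif acc < lo and self.level > 0:
            self.level -= 1
        self.issued = self.useful = 0
        return self.LEVELS[self.level]
```

An inaccurate prefetcher thus converges to a low degree, limiting the useless requests it injects into the shared memory system.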
Outline
- Problem Statement
- Motivation for Special Treatment of Prefetches
- Prefetch-Aware Shared Resource Management
- Evaluation
- Conclusion
Parallelism-Aware Batch Scheduling (PAR-BS) [Mutlu & Moscibroda, ISCA'08]
Principle 1: Parallelism-awareness
- Schedules requests from each thread to different banks back to back
- Preserves each thread's bank parallelism
Principle 2: Request batching
- Marks a fixed number of oldest requests from each thread to form a "batch"
- Eliminates starvation and provides fairness
[Diagram: per-bank request queues (Bank 0, Bank 1) with the oldest requests from each thread T0-T3 marked as the current batch]
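The batching principle can be sketched as follows. This is a simplified illustration under stated assumptions (no row-buffer state, requests already ordered oldest-first); the function name and queue representation are ours, not from the PAR-BS paper.

```python
from collections import defaultdict

def form_batch(queues, marking_cap=2):
    """Mark up to `marking_cap` oldest requests per thread per bank.

    `queues` maps bank id -> list of (thread_id, arrival_time) pairs,
    oldest first. Marked requests are scheduled before all unmarked
    ones, which bounds how long any thread's oldest requests can wait
    and thus eliminates starvation.
    """
    marked = set()
    count = defaultdict(int)  # (thread, bank) -> marked so far
    for bank, reqs in queues.items():
        for req in reqs:  # oldest first
            thread, _arrival = req
            if count[(thread, bank)] < marking_cap:
                marked.add((bank, req))
                count[(thread, bank)] += 1
    return marked
```

Within a batch, PAR-BS then ranks threads to preserve each thread's bank-level parallelism; the sketch covers only batch formation.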
Impact of Prefetching on Parallelism-Aware Batch Scheduling
- Policy (a): Include prefetches and demands alike when generating a batch
- Policy (b): Prefetches are not included alongside demands when generating a batch
Impact of Prefetching on Parallelism-Aware Batch Scheduling
[Timeline diagram: service order at two DRAM banks for two cores, distinguishing accurate and inaccurate prefetches. Under Policy (a), mark prefetches in PAR-BS: Core 2's inaccurate prefetches join the batch and delay Core 1's demands, stalling Core 1. Under Policy (b), don't mark prefetches in PAR-BS: inaccurate prefetches stay out of the batch, but accurate prefetches arrive too late to turn demand misses into hits and save cycles.]
Impact of Prefetching on Parallelism-Aware Batch Scheduling
Policy (a): Include prefetches and demands alike when generating a batch
- Pros: Accurate prefetches will be more timely
- Cons: Inaccurate prefetches from one thread can unfairly delay demands and accurate prefetches of others
Policy (b): Prefetches are not included alongside demands when generating a batch
- Pros: Inaccurate prefetches cannot unfairly delay demands of other cores
- Cons: Accurate prefetches will be less timely, so less performance benefit from prefetching
Prefetch-Aware Shared Resource Management
Three key ideas:
- Fair memory controllers: Extend underlying prioritization policies to distinguish between prefetches based on prefetch accuracy
- Fairness via source throttling: Coordinate core and prefetcher throttling decisions
- Demand boosting for memory non-intensive applications
Prefetch-Aware PAR-BS (P-PARBS)
[Timeline diagram, Policy (a), mark prefetches in PAR-BS: Core 2's inaccurate prefetches enter the batch and delay Core 1's accurate prefetches and demands, stalling Core 1 while Core 2 computes.]
Prefetch-Aware PAR-BS (P-PARBS)
[Timeline diagram, Policy (b), don't mark prefetches in PAR-BS: inaccurate prefetches stay out of the batch, but accurate prefetches arrive too late and both cores stall on misses. Our policy, mark accurate prefetches only: accurate prefetches stay timely and save cycles, while inaccurate prefetches cannot delay other cores' demands.]
Key idea: Underlying prioritization policies need to distinguish between prefetches based on accuracy
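The marking rule above reduces to a small predicate. A minimal sketch, assuming per-core prefetch accuracy is already measured (e.g., by the throttling hardware); the function name, request encoding, and threshold value are illustrative, not from the paper.

```python
def should_mark(req, prefetch_accuracy, threshold=0.70):
    """Decide whether a request joins the next PAR-BS batch.

    Demands are always marked. A prefetch is marked only if its core's
    measured prefetcher accuracy (useful / issued) is at or above
    `threshold`, so accurate prefetches stay timely while inaccurate
    ones cannot delay other cores' demands.

    `req` is a dict with keys 'core' and 'is_prefetch';
    `prefetch_accuracy` maps core id -> accuracy in [0, 1].
    """
    if not req['is_prefetch']:
        return True  # demands are always batched
    return prefetch_accuracy[req['core']] >= threshold
```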
Demand Boosting
[Diagram: service order at Bank 1 and Bank 2 without and with demand boosting. Core 1 is memory non-intensive; Core 2 is memory intensive. Without boosting, Core 1's demands are serviced last, behind Core 2's demands and prefetches; with boosting, they are serviced first.]
Demand boosting eliminates starvation of memory non-intensive applications
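As a sketch, demand boosting can be expressed as a priority rule in the scheduler's sort order. This is an illustration under our own assumptions: classifying intensity by a misses-per-kilo-instruction (MPKI) threshold is a common proxy, and the function name, request encoding, and threshold value are hypothetical.

```python
def schedule_order(requests, mpki, intensive_threshold=5.0):
    """Order memory requests with demand boosting (illustrative sketch).

    A demand from a memory non-intensive core (low MPKI) is boosted
    above all other requests, so it cannot starve behind an intensive
    core's stream of demands and prefetches. Ties fall back to age.

    Each request: dict with keys 'core', 'is_prefetch', 'arrival'.
    `mpki` maps core id -> misses per kilo-instruction.
    """
    def priority(req):
        boosted = (not req['is_prefetch']
                   and mpki[req['core']] < intensive_threshold)
        # Boosted requests sort first; within a class, oldest first.
        return (0 if boosted else 1, req['arrival'])
    return sorted(requests, key=priority)
```

Because the non-intensive core issues few requests, boosting them costs the intensive core little while removing the starvation shown in the diagram.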
Evaluation Methodology
x86 cycle-accurate simulator
Baseline processor configuration:
- Per core: 4-wide issue, out-of-order, 256-entry ROB
- Shared (4-core system): 128 MSHRs; 2 MB, 16-way L2 cache
- Main memory: DDR3-1333; latency of 15 ns per command (tRP, tRCD, CL); 8 B-wide core-to-memory bus
System Performance Results
[Bar chart: normalized system performance for NFQ, PARBS, and FST (core throttling), each shown with no prefetching, aggressive prefetching, HPAC, and the prefetch-aware variant. The prefetch-aware variants improve performance by 11%, 10.9%, and 11.3%, respectively.]
Max Slowdown Results
[Bar chart: maximum slowdown for NFQ, PARBS, and FST (core throttling), each shown with no prefetching, aggressive prefetching, HPAC, and the prefetch-aware variant. The prefetch-aware variants reduce maximum slowdown by 9.9%, 18.4%, and 14.5%, respectively.]
Conclusion
- State-of-the-art fair shared-resource management techniques can be harmful in the presence of prefetching
- Their underlying prioritization techniques need to be extended to differentiate prefetches based on accuracy
- Core and prefetcher throttling should be coordinated with source-based resource management techniques
- Demand boosting eliminates starvation of memory non-intensive applications
- Our mechanisms improve both fair memory schedulers and source throttling in both system performance and fairness by more than 10%