DICE: Compressing DRAM Caches for Bandwidth and Capacity
Vinson Young, Prashant Nair, Moinuddin Qureshi
MOORE’S LAW HITS BANDWIDTH WALL. Moore’s scaling encounters the bandwidth wall.
3D-DRAM MITIGATES BANDWIDTH WALL. 3D-DRAM examples: Hybrid Memory Cube (HMC) from Micron, High Bandwidth Memory (HBM) from Samsung. 3D-DRAM improves bandwidth, but does not have the capacity to replace conventional DIMM memory.
3D-DRAM AS A CACHE (3D-DRAM CACHE). [Diagram: memory hierarchy from fast to slow: CPU, L1$, L2$, L3$, 3D-DRAM cache (MCDRAM from Intel, HBC from AMD), system memory (OS-visible space).] Architecting 3D-DRAM as a cache can improve memory bandwidth (and avoid OS/software changes).
PRACTICAL 3D-DRAM CACHE: ALLOY CACHE. Tags are “part-of-line”: Alloy stores Tag+Data together and avoids tag serialization (one “Tag+Data” access). Similar to the DRAM cache in KNL: direct-mapped, tags in ECC. Practical DRAM cache: low latency and bandwidth-efficient.
3D-DRAM CACHE BANDWIDTH IS IMPORTANT. On an 8-CPU, 1 GB DRAM cache configuration. [Figure: speedup (y-axis, 0.60 to 1.80) for “2x Capacity” and “2x Bandwidth, 2x Capacity” across SPEC RATE, SPEC MIX, GAP, and ALL26.] A 2x-capacity cache improves performance by 10%, and an additional 2x bandwidth increases the speedup to 22%. Improving both bandwidth and capacity is valuable.
INTRODUCTION: DRAM CACHE. Baseline: direct-mapped, one data block per access. [Diagram: lines A, B, C, D in the baseline cache, next to Traditional Compression and Spatial Indexing layouts for compressible and incompressible lines.]
INTRODUCTION: COMPRESSED DRAM CACHE. Compression adds capacity; does it improve bandwidth? [Diagram build, Traditional Compression: 1x bandwidth.]
INTRODUCTION: COMPRESSED DRAM CACHE. Compression adds capacity; does it improve bandwidth? [Diagram build, Traditional Compression: 4 accesses at 1x-2x capacity.]
INTRODUCTION: COMPRESSED DRAM CACHE. Compression adds capacity; does it improve bandwidth? [Diagram build, Spatial Indexing with compressible lines: 2x bandwidth.]
INTRODUCTION: COMPRESSED DRAM CACHE. Compression adds capacity; does it improve bandwidth? [Diagram build, Spatial Indexing with incompressible lines: where are B and D? Less than 1x bandwidth.]
INTRODUCTION: COMPRESSED DRAM CACHE. Compression adds capacity; does it improve bandwidth? Traditional Compression: 1x bandwidth (compressible), 1x bandwidth (incompressible). Spatial Indexing: 2x bandwidth (compressible), <1x bandwidth (incompressible).
INTRODUCTION: TRADITIONAL COMPRESSION. Chart annotations: improves capacity, little speedup (7%), no degradation. [Figure: speedup (y-axis, 0.60 to 1.80) across SPEC RATE, SPEC MIX, GAP, and ALL26.] Compression for capacity (TSI) sees little speedup (7%) due to diminishing returns on giga-scale caches.
INTRODUCTION: SPATIAL INDEXING. Chart annotations: improves bandwidth, can degrade, no speedup. [Figure: speedup (y-axis, 0.60 to 1.80) across SPEC RATE, SPEC MIX, GAP, and ALL26.] Spatial Indexing compression gets the benefits of both bandwidth and capacity when lines are compressible, but it hurts performance when lines are incompressible.
INTRODUCTION: COMPRESSED DRAM CACHE. Goal: compression for capacity AND bandwidth. Traditional Compression: 1x bandwidth (compressible), 1x bandwidth (incompressible). Spatial Indexing: 2x bandwidth (compressible), <1x bandwidth (incompressible). DICE (Dynamic Index): 19% speedup and 36% better EDP (energy-delay product).
DICE OVERVIEW • Compressed DRAM Cache Organization • Flexible Mapping for Quick Switching • Dynamic Indexing Compression (DICE) – Insertion Policy – Index Prediction
PRACTICAL DRAM CACHE COMPRESSION. [Diagram: on-chip L3 cache and L4 cache controller; reads pass through decompression logic, writebacks and installs pass through compression logic; the off-chip DRAM cache holds compressed data in front of memory.] Compression: simple changes within the controller.
DRAM CACHE TAG FORMAT. [Diagram: Tag A (8 bytes), tag boundary, Data A (64 bytes).] The cache controller receives 72 B of tag+data. It can flexibly interpret bits as tag bits or data bits.
PROPOSED FLEXIBLE TAG FORMAT. [Diagram: fields on either side of a movable tag boundary carry an “Is Tag?” marker.] We create tag space as needed, for up to 28 lines. Achieves 1.6x effective capacity.
DICE OVERVIEW • Compressed DRAM Cache Organization • Flexible Mapping for Quick Switching • Dynamic Indexing Compression (DICE) – Insertion Policy – Index Prediction
FLEXIBLE MAPPING (TSI OR BAI). Example with lines 0-7 and 4 sets (set contents listed in set order). Traditional Set Indexing (TSI): {0,4}, {1,5}, {2,6}, {3,7}. Naïve Spatial Indexing: {0,1}, {2,3}, {4,5}, {6,7}. Bandwidth-Aware Indexing (BAI): {0,1}, {4,5}, {2,3}, {6,7}. BAI facilitates quick switching between the two indices, TSI and BAI.
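A minimal sketch of the two index functions, assuming a power-of-two number of sets. The TSI function is the usual low-order-bits mapping. The BAI function here is only one bit manipulation that reproduces the 8-line, 4-set example on the slide (BAI sets {0,1}, {4,5}, {2,3}, {6,7}); it is inferred from that example and is not taken from the authors' implementation. The names `tsi_set` and `bai_set` are hypothetical.

```python
def tsi_set(line: int, num_sets: int) -> int:
    """Traditional Set Indexing: the low-order line-address bits pick the set."""
    return line & (num_sets - 1)


def bai_set(line: int, num_sets: int) -> int:
    """Illustrative Bandwidth-Aware Indexing (assumed form, inferred from the slide).

    Keep the TSI index, but replace its lowest bit with the address bit just
    above the index bits. Lines 2m and 2m+1 then land in the same set, and the
    result always equals the TSI set or its neighbor (TSI set XOR 1).
    """
    index_bits = num_sets.bit_length() - 1  # num_sets must be a power of two >= 2
    return (tsi_set(line, num_sets) & ~1) | ((line >> index_bits) & 1)


if __name__ == "__main__":
    NUM_SETS = 4
    for line in range(8):
        print(f"line {line}: TSI set {tsi_set(line, NUM_SETS)}, "
              f"BAI set {bai_set(line, NUM_SETS)}")
```

In this sketch, half of the lines (0, 2, 5, 7 in the example) map to the same set under both indices, and every other line maps to the adjacent set, which is what makes switching between the two indices cheap.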
DICE OVERVIEW • Compressed DRAM Cache Organization • Flexible Mapping for Quick Switching • Dynamic Indexing Compression (DICE) – Insertion Policy – Index Prediction
DICE: DYNAMIC-INDEXED COMPRESSED CACHE. [Diagram: on install, compressibility-based insertion chooses between the Traditional Set Index and the Bandwidth-Aware Index; on read, cache index prediction chooses the index; label: TSI = BAI.] DICE (Dynamic-Indexing Cache comprEssion) decides the index on install and predicts the index on read.
COMPRESSIBILITY-BASED INSERTION. [Diagram: on install, lines > ½-size use the Traditional Set Index and lines <= ½-size use the Bandwidth-Aware Index; label: TSI = BAI; on read, the index is unknown, but checking both wastes bandwidth.] No explicit swaps: eviction and install decide the policy. Compressibility-based insertion uses Bandwidth-Aware Indexing when lines are compressible, and TSI otherwise.
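The insertion rule above can be sketched in a few lines. This assumes the 64-byte data block from the tag-format slide and takes the compressed size as an input, since the compressor itself (FPC + BDI in the methodology) is out of scope here. The function name `choose_install_index` and the constant `LINE_BYTES` are hypothetical.

```python
LINE_BYTES = 64  # uncompressed data block size, per the tag-format slide


def choose_install_index(compressed_bytes: int) -> str:
    """Pick the index used when installing a line into the DRAM cache.

    Lines that compress to at most half a line use the Bandwidth-Aware Index
    (BAI); all other lines use the Traditional Set Index (TSI).
    """
    if compressed_bytes <= LINE_BYTES // 2:
        return "BAI"
    return "TSI"


if __name__ == "__main__":
    for size in (16, 32, 33, 64):
        print(f"{size:2d} B compressed -> install with {choose_install_index(size)}")
```

Because the decision is made only at install time, a line changes index only after it is evicted and installed again, which matches the "no explicit swaps" note on the slide.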
SIMILAR INTRA-PAGE COMPRESSIBILITY. Indices seen in a compressible page: installs (<= ½-size) use the Bandwidth-Aware Index, and reads use BAI. Lines within a page have similar compressibility, so DICE is likely to install the lines of a page with the same index.
SIMILAR INTRA-PAGE COMPRESSIBILITY. Indices seen in an incompressible page: installs (> ½-size) use the Traditional Set Index, and reads use TSI; a second access is needed only on a mispredict. Lines within a page have similar compressibility. Thus, page-based last-time prediction of the index can be accurate (94%).
PAGE-BASED CACHE INDEX PREDICTOR (CIP). [Diagram: on a demand access, the page number is hashed into the Last-Time Table (LTT); each entry holds 0 = Traditional Set Index or 1 = Bandwidth-Aware Index, and the entry read is the prediction.] Page-based last-time prediction exploits similar intra-page compressibility to achieve high prediction accuracy (94%).
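A minimal sketch of a page-based last-time predictor along the lines of the CIP slide. The table size (1024 entries), the 4 KB page size, and the modulo hash are assumptions made for illustration; the slide does not give them. The class name `CacheIndexPredictor` and its methods are hypothetical.

```python
class CacheIndexPredictor:
    """Last-Time Table indexed by a hash of the page number (0 = TSI, 1 = BAI)."""

    def __init__(self, entries: int = 1024, page_bytes: int = 4096):
        # Both defaults are assumed values, not taken from the slides.
        self.entries = entries
        self.page_bytes = page_bytes
        self.table = [0] * entries  # every page starts out predicted as TSI

    def _slot(self, addr: int) -> int:
        page = addr // self.page_bytes
        return page % self.entries  # stand-in for the slide's hash function

    def predict(self, addr: int) -> str:
        """Predict which index to probe first for a demand access."""
        return "BAI" if self.table[self._slot(addr)] else "TSI"

    def update(self, addr: int, actual: str) -> None:
        """Record the index under which the line was actually found or installed."""
        self.table[self._slot(addr)] = 1 if actual == "BAI" else 0


if __name__ == "__main__":
    cip = CacheIndexPredictor()
    cip.update(0x1000, "BAI")   # a line in page 1 was found under BAI
    print(cip.predict(0x1FC0))  # another line in page 1
    print(cip.predict(0x2000))  # a line in a page not seen yet
```

A wrong prediction costs a second access with the other index, after which `update` records the index that was correct for that page.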
DICE OVERVIEW • Compressed DRAM Cache Organization • Flexible Mapping for Quick Switching • Dynamic Indexing (DICE) – Insertion Policy – Index Prediction • Results
METHODOLOGY (1/8TH KNIGHTS LANDING). [Diagram: CPU with stacked DRAM and commodity DRAM.] Core chip: 3.2 GHz 4-wide out-of-order cores; 8 cores, 8 MB shared last-level cache. Compression: FPC + BDI.
METHODOLOGY (1/8TH KNIGHTS LANDING). Other sensitivities in paper.
            Stacked DRAM            Commodity DRAM
Capacity    1 GB                    32 GB
Bus         DDR 1.6 GHz, 128-bit    DDR 1.6 GHz, 64-bit
Channels    4                       1
Bandwidth   100 GBps                12.5 GBps
Latency     35 ns
DICE RESULTS. [Figure: speedup (y-axis, 0.60 to 1.80) for Traditional Set Indexing, Spatial Indexing, and DICE across SPEC RATE, SPEC MIX, GAP, and ALL26; annotations: performs as Spatial Indexing, performs as Traditional Indexing, DICE outperforms both.] DICE improves performance over both Spatial Indexing and Traditional Indexing with fine-grain decisions (19%).
INTRODUCTION: COMPRESSED DRAM CACHE. Goal: compression for capacity AND bandwidth. Traditional Compression: 1x bandwidth (compressible), 1x bandwidth (incompressible). Spatial Indexing: 2x bandwidth (compressible), <1x bandwidth (incompressible). DICE (Dynamic Index): 19% speedup and 36% better EDP (energy-delay product).
THANK YOU
EXTRA SLIDES
DIFFERENT CACHE SENSITIVITIES
COMPARISON TO PREFETCH
COMPARISON TO SRAM/MEMORY COMPRESSION
FULL RESULTS (MIXED COMPRESSIBILITY)
SRAM CACHE COMPRESSION ON DRAM CACHE
DISTRIBUTION FOR INDEX DECISION
DICE INSERTION THRESHOLD
EFFECTIVE CAPACITY
L3 HIT RATE IMPROVEMENT
LARGER TSI VS. BAI EXAMPLE