SoftMC: A Flexible and Practical Open-Source Infrastructure
SoftMC: A Flexible and Practical Open-Source Infrastructure for Enabling Experimental DRAM Studies
Hasan Hassan, Nandita Vijaykumar, Samira Khan, Saugata Ghose, Kevin Chang, Gennady Pekhimenko, Donghyuk Lee, Oguz Ergin, Onur Mutlu
Executive Summary
• Two critical problems of DRAM: Reliability and Performance
  ‒ Recently-discovered bug: RowHammer
• Characterize, analyze, and understand DRAM cell behavior
• We design and implement SoftMC, an FPGA-based DRAM testing infrastructure
  ‒ Flexible and Easy to Use (C++ API)
  ‒ Open-source (github.com/CMU-SAFARI/SoftMC)
• We implement two use cases
  ‒ A retention time distribution test
  ‒ An experiment to validate two latency reduction mechanisms
• SoftMC enables a wide range of studies
Outline
1. DRAM Basics & Motivation
2. SoftMC
3. Use Cases
  – Retention Time Distribution Study
  – Evaluating Recently-Proposed Ideas
4. Future Research Directions
5. Conclusion
DRAM Operations
[Figure: a CPU and memory controller connected over the memory bus to DRAM, showing a DRAM row of cells and the sense amplifier; operations shown: Activate, Read, Precharge]
DRAM Latency
[Figure: timeline from 0 (refresh) to 64 ms for a DRAM cell and sense amplifier, with the command sequence Activate, Read, Precharge, Activate and the intervals Activation Latency, Ready-to-access Latency, and Precharge Latency]
Retention Time: The interval during which the data is retained correctly in the DRAM cell without accessing it
Latency vs. Reliability
[Figure: timeline for a DRAM cell and sense amplifier, with the command sequence Activate, Read, Precharge, Activate and the intervals Activation Latency, Ready-to-access Latency, and Precharge Latency]
Violating latencies negatively affects DRAM reliability
Other Factors Affecting Reliability and Latency
• Temperature
• Voltage
• Inter-cell Interference
• Manufacturing Process
• Retention Time
• …
To develop new mechanisms improving reliability and latency, we need to better understand the effects of these factors
Characterizing DRAM
Many of the factors affecting DRAM reliability and latency cannot be properly modeled
We need to perform experimental studies of real DRAM chips
Goals of a DRAM Testing Infrastructure
• Flexibility
  ‒ Ability to test any DRAM operation
  ‒ Ability to test any combination of DRAM operations and custom timing parameters
• Ease of use
  ‒ Simple programming interface (C++)
  ‒ Minimal programming effort and time
  ‒ Accessible to a wide range of users
    • who may lack experience in hardware design
SoftMC: High-level View
FPGA-based memory characterization infrastructure
Prototype using Xilinx ML605
Easily programmable using the C++ API
SoftMC: Key Components
1. SoftMC API
2. PCIe Driver
3. SoftMC Hardware
SoftMC API
Writing data to DRAM:
    InstructionSequence iseq;
    iseq.insert(genACT(bank, row));
    iseq.insert(genWAIT(tRCD));
    iseq.insert(genWR(bank, col, data));
    iseq.insert(genWAIT(tCL + tBL + tWR));
    iseq.insert(genPRE(bank));
    iseq.insert(genWAIT(tRP));
    iseq.insert(genEND());
    iseq.execute(fpga);
(genACT, genWAIT, genWR, genPRE, and genEND are instruction generator functions.)
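The write sequence above suggests what a matching read sequence could look like. The sketch below is a minimal, self-contained illustration only: `Instr`, `Timing`, `genRD`, `waitCycles`, and `buildReadSequence` are hypothetical stand-ins written for this example, not the SoftMC headers, and the wait of `tCL + tBL` after the read is an assumption that simply mirrors the write case.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; these are NOT the SoftMC headers.
struct Instr {
    std::string op;   // "ACT", "RD", "PRE", "WAIT", or "END"
    std::uint32_t a;  // bank, or the cycle count for WAIT
    std::uint32_t b;  // row or column (unused for PRE, WAIT, END)
};

Instr genACT(std::uint32_t bank, std::uint32_t row) { return {"ACT", bank, row}; }
Instr genRD(std::uint32_t bank, std::uint32_t col)  { return {"RD", bank, col}; }
Instr genPRE(std::uint32_t bank)                    { return {"PRE", bank, 0}; }
Instr genWAIT(std::uint32_t cycles)                 { return {"WAIT", cycles, 0}; }
Instr genEND()                                      { return {"END", 0, 0}; }

class InstructionSequence {
public:
    void insert(const Instr& i) { instrs_.push_back(i); }
    const std::vector<Instr>& instructions() const { return instrs_; }

    // Total cycles spent in explicit WAIT instructions.
    std::uint32_t waitCycles() const {
        std::uint32_t total = 0;
        for (const Instr& i : instrs_) {
            if (i.op == "WAIT") total += i.a;
        }
        return total;
    }

private:
    std::vector<Instr> instrs_;
};

// Timing parameters in cycles; the caller supplies the values.
struct Timing {
    std::uint32_t tRCD, tCL, tBL, tRP;
};

// Assumed read counterpart of the write sequence: activate the row, wait for
// the ready-to-access latency, read, wait for the data, then precharge.
InstructionSequence buildReadSequence(std::uint32_t bank, std::uint32_t row,
                                      std::uint32_t col, const Timing& t) {
    InstructionSequence iseq;
    iseq.insert(genACT(bank, row));
    iseq.insert(genWAIT(t.tRCD));
    iseq.insert(genRD(bank, col));
    iseq.insert(genWAIT(t.tCL + t.tBL));
    iseq.insert(genPRE(bank));
    iseq.insert(genWAIT(t.tRP));
    iseq.insert(genEND());
    return iseq;
}
```

Because the wait after `genACT` is an ordinary argument, a test could pass a value below the standard `tRCD` to exercise a custom ready-to-access latency.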
SoftMC: Key Components
1. SoftMC API
2. PCIe Driver*
  ‒ Communicates raw data with the FPGA
3. SoftMC Hardware
* Jacobsen, Matthew, et al. "RIFFA 2.1: A reusable integration framework for FPGA accelerators." TRETS, 2015
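SoftMC Hardware
[Figure: block diagram. The Host Machine sends instructions through the PCIe Controller to the SoftMC Hardware (FPGA), which contains the Instruction Receiver, Instruction Queue, Instruction Dispatcher, Autorefresh Controller, Calibration Controller, Read Capture, and DDR PHY connected to the DRAM. Example instruction stream: Activate, Wait (Ready-to-access Latency), Read; read data returns to the host]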
Retention Time Distribution Study
1. Write Reference Data to a Row
2. Wait (Refresh Interval)
3. Read Back
4. Observe Errors
5. Increase the refresh interval and repeat
Can be implemented with just ~100 lines of code
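The test loop above can be sketched in software against a toy model of a DRAM row. This is a minimal illustration, not the SoftMC test program: `ToyRow` and `retentionSweep` are hypothetical names, and the model assumes each byte simply loses its data once the wait exceeds that byte's retention time.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Toy software model of one DRAM row (illustrative assumption, not real
// hardware behavior): each byte has its own retention time in seconds.
class ToyRow {
public:
    explicit ToyRow(std::vector<double> retentionSeconds)
        : retention_(std::move(retentionSeconds)), data_(retention_.size(), 0) {}

    // Write the same reference value to every byte of the row.
    void write(std::uint8_t value) {
        for (std::uint8_t& b : data_) b = value;
    }

    // Model a wait with refresh disabled: bytes whose retention time is
    // shorter than the wait are corrupted (all bits flipped).
    void waitWithoutRefresh(double seconds) {
        for (std::size_t i = 0; i < data_.size(); ++i) {
            if (retention_[i] < seconds) {
                data_[i] = static_cast<std::uint8_t>(~data_[i]);
            }
        }
    }

    const std::vector<std::uint8_t>& read() const { return data_; }

private:
    std::vector<double> retention_;
    std::vector<std::uint8_t> data_;
};

// For each refresh interval: write reference data, wait, read back, and
// count the erroneous bytes. Returns one error count per interval.
std::vector<std::size_t> retentionSweep(ToyRow& row, std::uint8_t pattern,
                                        const std::vector<double>& intervals) {
    std::vector<std::size_t> errors;
    for (double interval : intervals) {
        row.write(pattern);
        row.waitWithoutRefresh(interval);
        std::size_t bad = 0;
        for (std::uint8_t b : row.read()) {
            if (b != pattern) ++bad;
        }
        errors.push_back(bad);
    }
    return errors;
}
```

On real hardware the write, wait, and read-back steps would be instruction sequences executed on the FPGA; only the outer loop and the comparison against the reference pattern carry over from this sketch.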
Retention Time Test: Results
[Figure: Number of Erroneous Bytes (0 to 8000) vs. Refresh Interval (0 to 8 s) for Module A, Module B, and Module C at ~20°C (room temperature)]
Validates the correctness of the SoftMC Infrastructure
Accessing Highly-charged Cells Faster
• NUAT (Shin+, HPCA 2014)
• ChargeCache (Hassan+, HPCA 2016)
A highly-charged cell can be accessed with low latency
How Is a Highly-Charged Cell Accessed Faster?
[Figure: timeline from 0 (refresh) to 64 ms for a DRAM cell and sense amplifier, with the commands Activate, Read, Precharge and the intervals Activation Latency, Ready-to-access Latency, and Precharge Latency]
Ready-to-access Latency Test
1. Write Reference Data
2. Wait for the Wait Interval
3. Read Back (with custom ready-to-access latency parameter)
4. Observe Errors
5. Change the Wait Interval and repeat
• Longer wait: lower cell charge
• Shorter wait: higher cell charge
Can be implemented with just ~150 lines of code
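The two-parameter sweep in this test (wait interval and ready-to-access latency) can be organized as a small harness. The sketch below assumes a hypothetical `Trial` callback that performs one write, wait, and read-back on the device and returns the number of erroneous bytes; `Trial`, `ErrorTable`, `sweepReadyToAccessLatency`, and `independentOfWait` are names invented for this example.

```cpp
#include <functional>
#include <map>
#include <utility>
#include <vector>

// Hypothetical hook: runs one write / wait / read-back trial and returns the
// number of erroneous bytes for the given wait interval and latency.
using Trial = std::function<int(int waitMs, int latencyCycles)>;

// Error counts keyed by (latencyCycles, waitMs).
using ErrorTable = std::map<std::pair<int, int>, int>;

// Run one trial for every combination of latency and wait interval.
ErrorTable sweepReadyToAccessLatency(const Trial& runTrial,
                                     const std::vector<int>& waitsMs,
                                     const std::vector<int>& latencies) {
    ErrorTable table;
    for (int lat : latencies) {
        for (int wait : waitsMs) {
            table[std::make_pair(lat, wait)] = runTrial(wait, lat);
        }
    }
    return table;
}

// Returns true if, for the given latency, the error count is identical at
// every wait interval, i.e. the wait interval has no visible effect.
bool independentOfWait(const ErrorTable& table, int latency,
                       const std::vector<int>& waitsMs) {
    if (waitsMs.empty()) return true;
    const int first = table.at(std::make_pair(latency, waitsMs.front()));
    for (int wait : waitsMs) {
        if (table.at(std::make_pair(latency, wait)) != first) return false;
    }
    return true;
}
```

Keeping the device access behind a callback lets the same sweep run against real hardware or a software model; a check like `independentOfWait` is one simple way to ask whether a shorter wait actually changes the error count at a reduced latency.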
Ready-to-access Latency: Results
[Figure: Number of Erroneous Bytes (0 to 500) vs. Wait Interval (8 to 488 ms) at 80°C, one curve per latency (3, 4, 5, and 6 cycles), comparing Expected Curves with Real Curves; the Refresh Interval is marked on the plot]
We do not observe the expected latency reduction effect in existing DRAM chips
Why Don’t We See the Latency Reduction Effect?
• The memory controller cannot externally control when a sense amplifier gets enabled in existing DRAM chips
[Figure: cell charge level over time between ACT and R/W for Data 1 and Data 0, marking Enabling the Sense Amplifier, the Ready to Access points, the Potential Reduction, and "Fixed Latency!"]
Future Research Directions
• More Characterization of DRAM
  ‒ How are the cell characteristics changing with different generations of technology nodes?
  ‒ What types of usage accelerate aging?
• Characterization of Non-volatile Memory
• Extensions
  ‒ Memory Scheduling
  ‒ Workload Analysis
  ‒ Testbed for in-memory Computation
Conclusion
• SoftMC: First publicly-available FPGA-based DRAM testing infrastructure
• Flexible and Easy to Use
• Implemented two use cases
  ‒ Retention Time Distribution Study
  ‒ Evaluation of two recently-proposed latency reduction mechanisms
• SoftMC can enable many other studies, ideas, and methodologies in the design of future memory systems
• Download our first prototype: github.com/CMU-SAFARI/SoftMC
Backup Slides
Key SoftMC Instructions
SoftMC @ GitHub
Ready-to-Access Latency Test Results
[Figure: three panels (Module A, Module B, Module C) plotting Number of Erroneous Bytes vs. Wait Interval (8 to 488 ms), one curve per latency (3, 4, 5, and 6 cycles)]
Activation Latency Test
1. Write Reference Data
2. Wait for the Wait Interval
3. ACT-PRE (with low activation latency parameter)
4. Wait for the Wait Interval
5. Read Back
6. Observe Errors
7. Change the wait interval and repeat
Activation Latency Test Results
[Figure: three panels (Module A, Module B, Module C) plotting Number of Erroneous Bytes vs. Wait Interval (8 to 488 ms), one curve per latency (2, 5, 8, 11, and 14 cycles)]