RAIDR: Retention-Aware Intelligent DRAM Refresh
Jamie Liu, Ben Jaiyen, Richard Veras, Onur Mutlu
Carnegie Mellon University
Presented by Robin Burkhard, 14 November 2018
Background, Problem & Goal
Background: DRAM Refresh
- DRAM cells leak charge over time
  - Rows need to be refreshed periodically
- Cells have different retention times
  - Refresh each row every 64 ms
- Refresh degrades performance
  - Bank unavailable while refreshed
  - Memory accesses have to wait
- Each refresh consumes energy
Background: DRAM Refresh
[Figure not recovered; surviving labels: 46%, ~50/50, 8%]
Retention Time Distribution
- The 64 ms refresh interval is dictated by very few weak cells, so applying it to every row wastes energy and time
- Idea of RAIDR: refresh weak rows at high frequency and all other rows at low frequency
Mechanism & Implementation
RAIDR Operation
1. Profiling: Get the retention time for each row
2. Binning: Store rows into different bins based on retention time
3. Refreshing: Use different refresh rates for different bins
RAIDR Profiling
1. Write all bits in the row to 1
2. Deactivate refresh
3. Wait for the first bit to change (e.g. 1 1 1 1 0 1 1 1)
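The profiling steps above can be sketched as a toy model. This is only an illustration under assumptions: real profiling runs on hardware, whereas here the per-cell retention times are given as a hypothetical input list, and `profile_row` and its test waits of 64, 128 and 256 ms are names and values chosen for this sketch.

```python
def profile_row(cell_retention_ms, waits_ms=(64, 128, 256)):
    """Return the longest tested wait (ms) the row survives without a bit flip.

    cell_retention_ms is a hypothetical list of per-cell retention times.
    All cells are assumed to start at 1 and refresh is assumed to be off.
    """
    weakest = min(cell_retention_ms)  # the first bit to change decides the row
    survived = 0
    for wait in waits_ms:
        if weakest < wait:
            break  # a bit would have flipped before this wait elapsed
        survived = wait
    return survived


print(profile_row([100, 300, 500]))  # weakest cell fails before 128 ms
print(profile_row([300, 500, 900]))  # no cell fails within 256 ms
```

A row that survives only the 64 ms wait would land in the 64-128 ms bin, one that survives 128 ms in the 128-256 ms bin.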
RAIDR Binning
Default configuration of RAIDR for a 32 GB DRAM system:

Bin               Refresh interval   Rows
64-128 ms bin     64 ms              ~30 rows
128-256 ms bin    128 ms             ~1000 rows
remaining rows    256 ms             ~4 million rows

Space-efficient implementation using Bloom filters
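As a rough worked example, the approximate row counts from this slide give the idealized saving in refresh operations compared with refreshing every row every 64 ms. This is a back-of-the-envelope sketch: it ignores Bloom filter false positives and any temperature scaling, and the variable names are chosen for illustration.

```python
# Approximate row counts per bin, taken from the slide.
BINS = {64: 30, 128: 1_000, 256: 4_000_000}  # refresh interval (ms) -> rows

WINDOW_MS = 256  # compare both schemes over one 256 ms window
total_rows = sum(BINS.values())

# Baseline: every row is refreshed every 64 ms.
baseline = total_rows * (WINDOW_MS // 64)

# RAIDR: each bin is refreshed at its own interval.
raidr = sum(rows * (WINDOW_MS // interval) for interval, rows in BINS.items())

reduction = 1 - raidr / baseline
print(f"{reduction:.1%}")  # prints 75.0%
```

Almost all of the saving comes from the remaining rows, which need only one refresh per window instead of four.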
RAIDR Binning: Bloom filter example (16 bits size, 3 hash functions)
[Figures: three animation steps in which bits of the initially all-zero array are set to 1 by insert(x) and insert(y)]
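RAIDR Binning
Bloom filter (16 bits size, 3 hash functions) after insert(x) and insert(y):
- test(x) = 1 & 1 = 1 (present)
- test(z) = 1 & 0 = 0 (not present)
- test(w) = 1 & 1 = 1 (false positive: reported present although w was never inserted)
Properties:
- No overflow: the filter never runs out of space, however many elements are inserted
- False positives are acceptable (a row is merely refreshed more often than needed)
- The inability to remove elements is acceptable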
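A minimal Bloom filter sketch with the parameters from the slide (16 bits, 3 hash functions) may make the insert and test operations concrete. The hash construction here (SHA-256 salted with the hash index) is an assumption made for the sketch, not the hash functions used by RAIDR.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits=16, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [0] * num_bits

    def _positions(self, item):
        # Assumption: derive each hash by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:4], "big") % self.num_bits

    def insert(self, item):
        # Set every bit the item hashes to; bits are never cleared.
        for pos in self._positions(item):
            self.bits[pos] = 1

    def test(self, item):
        # Present only if all hashed bits are set; may be a false positive.
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter()
bf.insert("x")
bf.insert("y")
print(bf.test("x"), bf.test("y"))  # inserted items always test as present
print(bf.bits)
```

Inserted elements can never test as absent, which is the property RAIDR relies on: a weak row is never mistaken for a strong one.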
RAIDR Refreshing
Choose all rows as refresh candidates every 64 ms. For each candidate row:
1. Row in 64-128 ms bin? Yes: refresh the row. No: go to 2.
2. Row in 128-256 ms bin and last refresh 128 ms ago? Yes: refresh the row. No: go to 3.
3. Last refresh 256 ms ago? Yes: refresh the row. No: do not refresh the row.
- Implemented in the memory controller using a row counter and a period counter
- Rate Scaler increases the refresh rate in extended temperature mode (>85°C)
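The decision flow above can be sketched as a small simulation. This is an illustration under assumptions: the bins are plain Python sets rather than Bloom filters, the elapsed time since the last refresh is tracked per row, the Rate Scaler is not modelled, and `should_refresh` and `simulate` are hypothetical names.

```python
def should_refresh(row, ms_since_refresh, bin_64_128, bin_128_256):
    """Apply the three checks from the flowchart to one candidate row."""
    if row in bin_64_128:
        return True
    if row in bin_128_256 and ms_since_refresh >= 128:
        return True
    return ms_since_refresh >= 256


def simulate(rows, bin_64_128, bin_128_256, duration_ms=1024):
    """Count refreshes per row when every row is a candidate every 64 ms."""
    since = {row: 0 for row in rows}
    counts = {row: 0 for row in rows}
    for _ in range(0, duration_ms, 64):
        for row in rows:
            since[row] += 64
            if should_refresh(row, since[row], bin_64_128, bin_128_256):
                counts[row] += 1
                since[row] = 0
    return counts


counts = simulate(["a", "b", "c"], bin_64_128={"a"}, bin_128_256={"b"})
print(counts)  # {'a': 16, 'b': 8, 'c': 4}
```

Over 1024 ms, the row in the 64-128 ms bin is refreshed 16 times, the row in the 128-256 ms bin 8 times, and the unbinned row 4 times.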
Evaluation & Results
Evaluation System
- 1.25 KB in Memory Controller
Other Mechanisms Compared
- Auto Refresh
  - Memory controller sends auto-refresh commands, causing the DRAM to refresh several rows in one rank automatically
  - Common mechanism in existing systems
- Distributed Refresh
  - Memory controller sends the address of the row to be refreshed and can make use of bank-level parallelism
  - Same number of refreshes as Auto Refresh
- Smart Refresh
  - Timeout counter for each row that is reset on access or refresh
  - Rows only get refreshed when the counter has expired
- No Refresh
  - Not used in practice
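The timeout-counter idea behind Smart Refresh, as described in the list above, can be illustrated with a short sketch. The class name, the tick granularity and the timeout of 4 ticks are hypothetical choices for this example and do not reflect the actual Smart Refresh hardware design.

```python
class SmartRefreshRow:
    """One row with a timeout counter that is reset on access or refresh."""

    def __init__(self, timeout_ticks=4):
        self.timeout_ticks = timeout_ticks
        self.counter = timeout_ticks
        self.refreshes = 0

    def access(self):
        # A read or write restores the row's charge, so the counter restarts.
        self.counter = self.timeout_ticks

    def tick(self):
        # Refresh only when the counter has expired.
        self.counter -= 1
        if self.counter == 0:
            self.refreshes += 1
            self.counter = self.timeout_ticks


idle, busy = SmartRefreshRow(), SmartRefreshRow()
for t in range(8):
    if t % 3 == 0:
        busy.access()  # frequently accessed row
    idle.tick()
    busy.tick()
print(idle.refreshes, busy.refreshes)  # prints 2 0
```

The frequently accessed row never needs an explicit refresh, while the idle row is refreshed each time its counter expires.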
Results: Number of Refresh Operations
Results: System Performance
Results: Energy Consumption
Strengths & Weaknesses
Strengths
- Significant reduction in the number of refreshes is achieved by a very simple and clever idea
- Benefits of RAIDR increase with higher DRAM capacity
- Modifications to the memory controller are small and not too complex
- No modifications to DRAM or software needed
- Applicable to many different types of DRAM (e.g. 3D-stacked DRAMs, eDRAMs)
Weaknesses
- RAIDR's profiling method might not identify all weak rows
  - Retention times of cells depend on the values stored in nearby cells (Data Pattern Dependence, DPD)
  - The all-1 pattern used by RAIDR can cause cells to show higher retention times and be profiled incorrectly
- RAIDR is unable to adapt the refresh rate for rows at runtime
  - Cells might shift randomly between different retention time states (Variable Retention Time, VRT)
  - A later change in retention time can cause a cell to be refreshed at too low a rate
- DPD/VRT can cause retention errors, which make the system less reliable
- One weak cell still causes a whole row (64 K cells) to be refreshed at the high rate
- Setting the default interval much higher than 256 ms becomes inefficient again
Thoughts and Ideas
- Deactivate rows with a retention time lower than 256 ms?
- Reduce the refresh rate for everything and increase fault tolerance?
- Skip refresh of rows that contain only '0's?
- Handle VRT failures at runtime and adapt the refresh rate?
- Replace DRAM with devices that don't need to be refreshed?
Questions & Discussion
Backup Slides
DRAM System Organization
Idle Power Consumption
RAIDR Configurations
Scalability
Period Counter
Choose all rows as refresh candidates every 64 ms. The period counter value selects the check:
- 00: Refresh row
- 01 / 11: Row in 64-128 ms bin? Yes: refresh row. No: do not refresh row.
- 10: Row in 64-128 ms bin? Yes: refresh row. No: Row in 128-256 ms bin? Yes: refresh row. No: do not refresh row.
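The period-counter variant of the decision can be sketched as follows. It assumes a 2-bit counter that advances once per 64 ms period and uses plain sets for the bins instead of Bloom filters; `refresh_with_period_counter` is a hypothetical name.

```python
def refresh_with_period_counter(row, period, bin_64_128, bin_128_256):
    """Decide whether to refresh a candidate row for a 2-bit period value."""
    if period == 0b00:
        return True  # every row is refreshed once per 256 ms
    if row in bin_64_128:
        return True  # weakest rows are refreshed in every period
    if period == 0b10:
        return row in bin_128_256  # half-way point of the 256 ms cycle
    return False  # periods 01 and 11: all other rows are skipped


per_cycle = {
    row: sum(refresh_with_period_counter(row, p, {"a"}, {"b"}) for p in range(4))
    for row in ("a", "b", "c")
}
print(per_cycle)  # {'a': 4, 'b': 2, 'c': 1}
```

Over one full cycle of the counter (256 ms), the three kinds of rows are refreshed 4, 2 and 1 times, matching the 64 ms, 128 ms and 256 ms intervals without storing a timestamp per row.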
Papers
- https://users.ece.cmu.edu/~omutlu/pub/raidr-dram-refresh_isca12.pdf
- http://www.pdl.cmu.edu/PDL-FTP/NVM/dram-retention_isca13.pdf
- http://www.istc-cc.cmu.edu/publications/papers/2015/avatar-dramrefresh_dsn15.pdf
- http://www.xcg.cs.pitt.edu/papers/baek-tc13.pdf
- https://prashantnair.bitbucket.io/isca40/ArchShield.pdf