Computer Architecture, Lecture 5b: RowHammer in 2020: Revisiting RowHammer
Prof. Onur Mutlu
ETH Zürich
Fall 2020
1 October 2020

Revisiting RowHammer

RowHammer in 2020 (I)
Jeremie S. Kim, Minesh Patel, A. Giray Yaglikci, Hasan Hassan, Roknoddin Azizi, Lois Orosa, and Onur Mutlu, "Revisiting RowHammer: An Experimental Analysis of Modern Devices and Mitigation Techniques," Proceedings of the 47th International Symposium on Computer Architecture (ISCA), Valencia, Spain, June 2020.

Revisiting RowHammer: An Experimental Analysis of Modern Devices and Mitigation Techniques
Jeremie S. Kim, Minesh Patel, A. Giray Yağlıkçı, Hasan Hassan, Roknoddin Azizi, Lois Orosa, Onur Mutlu

Executive Summary
• Motivation: Denser DRAM chips are more vulnerable to RowHammer, but no characterization-based study demonstrates how vulnerability scales
• Problem: It is unclear whether existing mitigation mechanisms will remain viable for future DRAM chips that are likely to be more vulnerable to RowHammer
• Goal:
  1. Experimentally demonstrate how vulnerable modern DRAM chips are to RowHammer and study how this vulnerability will scale going forward
  2. Study viability of existing mitigation mechanisms on more vulnerable chips
• Experimental Study: First rigorous RowHammer characterization study across a broad range of DRAM chips
  - 1580 chips of different DRAM {types, technology node generations, manufacturers}
  - We find that RowHammer vulnerability worsens in newer chips
• RowHammer Mitigation Mechanism Study: How five state-of-the-art mechanisms are affected by worsening RowHammer vulnerability
  - Reasonable performance loss (8% on average) on modern DRAM chips
  - Scale poorly to more vulnerable DRAM chips (e.g., 80% performance loss)
• Conclusion: It is critical to research more effective solutions to RowHammer for future DRAM chips that will likely be even more vulnerable to RowHammer

Motivation
- Denser DRAM chips are more vulnerable to RowHammer
- Three prior works over the last six years, including [Kim+, ISCA’14] and [Park+, MR’16], provide RowHammer characterization data on real DRAM
- However, there is no comprehensive experimental study that demonstrates how vulnerability scales across DRAM types and technology node generations
- It is unclear whether current mitigation mechanisms will remain viable for future DRAM chips that are likely to be more vulnerable to RowHammer

Goal
1. Experimentally demonstrate how vulnerable modern DRAM chips are to RowHammer and predict how this vulnerability will scale going forward
2. Examine the viability of current mitigation mechanisms on more vulnerable chips

DRAM Testing Infrastructures
Three separate testing infrastructures:
1. DDR3: FPGA-based SoftMC [Hassan+, HPCA’17] (Xilinx ML605)
2. DDR4: FPGA-based SoftMC [Hassan+, HPCA’17] (Xilinx Virtex UltraScale 95)
3. LPDDR4: In-house testing hardware for LPDDR4 chips
All provide fine-grained control over DRAM commands, timing parameters, and temperature
[Figure: DDR4 DRAM testing infrastructure]

DRAM Chips Tested
1580 total DRAM chips tested from 300 DRAM modules
• Three major DRAM manufacturers {A, B, C}
• Three DRAM types or standards {DDR3, DDR4, LPDDR4}
  - LPDDR4 chips we test implement on-die ECC
• Two technology nodes per DRAM type {old/new, 1x/1y}
  - Categorized based on manufacturing date, datasheet publication date, purchase date, and characterization results
Type-node: configuration describing a chip’s type and technology node generation: DDR3-old/new, DDR4-old/new, LPDDR4-1x/1y

Effective RowHammer Characterization
To characterize our DRAM chips at worst-case conditions, we:
1. Prevent sources of interference during the core test loop
  - We disable:
    • DRAM refresh: to avoid refreshing the victim row
    • DRAM calibration events: to minimize variation in test timing
    • RowHammer mitigation mechanisms: to observe circuit-level effects
  - Test for less than the refresh window (32 ms) to avoid retention failures
2. Worst-case access sequence
  - We use the worst-case access sequence based on prior works’ observations
  - For each row, repeatedly access the two directly physically-adjacent rows as fast as possible
[More details in the paper]

Testing Methodology
[Figure: rows 0 to 5 of a DRAM bank, with the victim row between two aggressor rows; rows are shown open or closed as the test proceeds]
- Disable refresh to prevent interruptions in the core loop of our test from refresh operations
- Induce RowHammer bit flips on a fully charged row
- Core test loop where we alternate accesses to adjacent rows
  • 1 Hammer (HC) = two accesses
- Prevent further retention failures
- Record bit flips for analysis
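The steps above can be sketched in Python against a toy model. Everything here is illustrative: `ToyDram`, its per-cell thresholds, and `count_flips` are hypothetical names and made-up behavior, not the SoftMC interface or real chip data. The sketch only mirrors the order of operations: write a known pattern, hammer with refresh assumed disabled, read back, and record bit flips.

```python
import random


class ToyDram:
    """HYPOTHETICAL toy model, not real hardware behavior: a cell flips once
    its row has been hammered at least `threshold` times since the last write."""

    def __init__(self, rows, bits_per_row, seed=0):
        rng = random.Random(seed)
        self.data = [[0] * bits_per_row for _ in range(rows)]
        # Made-up per-cell thresholds, in hammers.
        self.threshold = [
            [rng.randint(5_000, 200_000) for _ in range(bits_per_row)]
            for _ in range(rows)
        ]

    def write_row(self, row, bit):
        self.data[row] = [bit] * len(self.data[row])

    def hammer(self, victim, hammer_count):
        # One hammer = one access to each aggressor row (victim - 1, victim + 1).
        for i, limit in enumerate(self.threshold[victim]):
            if hammer_count >= limit:
                self.data[victim][i] ^= 1

    def read_row(self, row):
        return list(self.data[row])


def count_flips(dram, victim, hammer_count, bit=1):
    dram.write_row(victim, bit)        # start from a known, fully charged state
    dram.hammer(victim, hammer_count)  # core loop; refresh is assumed disabled
    observed = dram.read_row(victim)   # read back the victim row
    return [i for i, value in enumerate(observed) if value != bit]


dram = ToyDram(rows=8, bits_per_row=64)
for hc in (0, 20_000, 100_000, 200_000):
    print(hc, len(count_flips(dram, victim=3, hammer_count=hc)))
```

In this toy model the number of flipped cells can only grow with the hammer count, which is the qualitative trend the later slides report for real chips.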

Key Takeaways from 1580 Chips
• Chips of newer DRAM technology nodes are more vulnerable to RowHammer
• There are chips today whose weakest cells fail after only 4800 hammers
• Chips of newer DRAM technology nodes can exhibit RowHammer bit flips 1) in more rows and 2) farther away from the victim row.

1. RowHammer Vulnerability
Q. Can we induce RowHammer bit flips in all of our DRAM chips?
All chips are vulnerable, except many DDR3 chips
• A total of 1320 out of all 1580 chips (84%) are vulnerable
• Within DDR3-old chips, only 12% of chips (24/204) are vulnerable
• Within DDR3-new chips, 65% of chips (148/228) are vulnerable
Newer DRAM chips are more vulnerable to RowHammer
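As a quick arithmetic check, the percentages on this slide follow directly from the chip counts. The snippet below is only a sketch; the dictionary keys are labels chosen here, while the counts are the ones quoted on the slide.

```python
# (vulnerable chips, chips tested), as quoted on the slide
counts = {
    "all chips": (1320, 1580),
    "DDR3-old": (24, 204),
    "DDR3-new": (148, 228),
}


def percent(part, total):
    """Share of vulnerable chips, rounded to the nearest whole percent."""
    return round(100 * part / total)


for label, (vulnerable, tested) in counts.items():
    print(f"{label}: {percent(vulnerable, tested)}% vulnerable")
```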

2. Data Pattern Dependence
Q. Are some data patterns more effective in inducing RowHammer bit flips?
• We test several data patterns typically examined in prior work to identify the worst-case data pattern
• The worst-case data pattern is consistent across chips of the same manufacturer and DRAM type-node configuration
• We use the worst-case data pattern per DRAM chip to characterize each chip at worst-case conditions and minimize the extensive testing time
[More detail and figures in paper]

3. Hammer Count (HC) Effects
Q. How does the Hammer Count affect the number of bit flips induced?
[Plot: Mfr. A, DDR4-new]
Hammer Count = 2 accesses, one to each adjacent row of the victim

3. Hammer Count (HC) Effects
• RowHammer bit flip rates increase when going from old to new DDR4 technology node generations
• RowHammer bit flip rates (i.e., RowHammer vulnerability) increase with technology node generation

4. Spatial Effects: Row Distance
Q. Where do RowHammer bit flips occur relative to aggressor rows?
[Plot: Mfr. A, DDR4-old, with the aggressor rows marked]
The number of RowHammer bit flips that occur in a given row decreases as the distance from the victim row (row 0) increases.

4. Spatial Effects: Row Distance
We normalize data by inducing a bit flip rate of 10^-6 in each chip
Chips of newer DRAM technology nodes can exhibit RowHammer bit flips 1) in more rows and 2) farther away from the victim row.

4. Spatial Effects: Row Distance
We plot this data for each DRAM type-node configuration per manufacturer
[More analysis in the paper]

4. Spatial Distribution of Bit Flips
Q. How are RowHammer bit flips spatially distributed across a chip?
We normalize data by inducing a bit flip rate of 10^-6 in each chip
[Plots: one panel representative of DDR3/DDR4 chips, one panel representative of LPDDR4 chips]
• The distribution of RowHammer bit flip density per word in LPDDR4 chips differs significantly from that of other DRAM types
• At a bit flip rate of 10^-6, a 64-bit word can contain up to 4 bit flips. Even at this very low bit flip rate, a very strong ECC is required
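To see why up to 4 bit flips in one 64-bit word calls for strong ECC, the sketch below groups flipped bit positions into 64-bit words and compares each count against a SECDED code (single-error correction, double-error detection). SECDED is used here only as a familiar reference point and is an assumption of this example; the slide does not name a specific code. The function names and the sample bit positions are hypothetical.

```python
from collections import Counter

WORD_BITS = 64  # word size used on the slide


def flips_per_word(flipped_bit_indices):
    """Count bit flips in each 64-bit word, given flipped bit positions in a row."""
    return Counter(index // WORD_BITS for index in flipped_bit_indices)


def secded_outcome(flips_in_word):
    """What a SECDED code guarantees for a word with this many bit flips."""
    if flips_in_word == 0:
        return "clean"
    if flips_in_word == 1:
        return "corrected"
    if flips_in_word == 2:
        return "detected but not corrected"
    return "beyond SECDED guarantees"


# HYPOTHETICAL flipped bit positions within one row
flipped = [3, 70, 71, 128, 129, 130, 131]
for word, count in sorted(flips_per_word(flipped).items()):
    print(f"word {word}: {count} flip(s), {secded_outcome(count)}")
```

A word with 3 or 4 flips falls outside what SECDED guarantees, which is consistent with the slide's point that a much stronger code would be needed.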

4. Spatial Distribution of Bit Flips
We plot this data for each DRAM type-node configuration per manufacturer
[More analysis in the paper]

5. First RowHammer Bit Flips per Chip
What is the minimum Hammer Count required to cause bit flips (HCfirst)?
[Box-and-whisker plot legend: whisker; Q3: 75% point; median: 50% point; Q1: 25% point; whisker]
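For readers less familiar with box-and-whisker plots, the quartile markers in the legend can be computed as sketched below. The HCfirst values are hypothetical placeholders, not measurements from the study, and the slide does not say how the whiskers are defined, so this sketch simply reports the minimum and maximum alongside the quartiles.

```python
import statistics


def box_summary(hc_first_values):
    """Quartiles of per-chip HCfirst values, plus the observed extremes."""
    q1, median, q3 = statistics.quantiles(hc_first_values, n=4, method="inclusive")
    return {
        "min": min(hc_first_values),
        "q1": q1,          # 25% point
        "median": median,  # 50% point
        "q3": q3,          # 75% point
        "max": max(hc_first_values),
    }


# HYPOTHETICAL per-chip HCfirst values, for illustration only
print(box_summary([10_000, 12_000, 14_000, 16_000, 18_000]))
```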

5. First RowHammer Bit Flips per Chip
What is the minimum Hammer Count required to cause bit flips (HCfirst)?
• We note the different DRAM types on the x-axis: DDR3, DDR4, LPDDR4
• We focus on trends across chips of the same DRAM type to draw conclusions

5. First RowHammer Bit Flips per Chip
• Within a DRAM type, HCfirst reduces significantly from old to new chips, i.e., DDR3: 69.2k to 22.4k, DDR4: 17.5k to 10k, LPDDR4: 16.8k to 4.8k
• There are chips whose weakest cells fail after only 4800 hammers
• Newer chips from a given DRAM manufacturer are more vulnerable to RowHammer
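To put the old-to-new drop in perspective, the snippet below computes the reduction factor for each DRAM type from the HCfirst values quoted on the slide. The variable names are chosen here for illustration.

```python
# (old node, new node) HCfirst values quoted on the slide, in hammers
hc_first = {
    "DDR3": (69_200, 22_400),
    "DDR4": (17_500, 10_000),
    "LPDDR4": (16_800, 4_800),
}

# How many times fewer hammers the new node needs for its first bit flip
reduction = {dram: old / new for dram, (old, new) in hc_first.items()}

for dram, factor in reduction.items():
    print(f"{dram}: {factor:.2f}x fewer hammers")
```

The factors come out to about 3.09x for DDR3, 1.75x for DDR4, and 3.5x for LPDDR4.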

Evaluation Methodology
• Cycle-level simulator: Ramulator [Kim+, CAL’15] https://github.com/CMU-SAFARI/ramulator
  - 4 GHz, 4-wide, 128-entry instruction window
  - 48 8-core workload mixes randomly drawn from SPEC CPU2006 (10 < MPKI < 740)
• Metrics to evaluate mitigation mechanisms
  1. DRAM Bandwidth Overhead: fraction of total system DRAM bandwidth consumption from the mitigation mechanism
  2. Normalized System Performance: weighted speedup normalized to a 100% baseline
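The two metrics can be sketched as follows. This assumes the common definition of weighted speedup (the sum, over cores, of each application's IPC when sharing the system divided by its IPC when running alone); the slide names the metric but does not spell out the formula. All function names and numbers below are hypothetical.

```python
def weighted_speedup(ipc_shared, ipc_alone):
    """Sum over cores of IPC when sharing the system divided by IPC when alone."""
    return sum(shared / alone for shared, alone in zip(ipc_shared, ipc_alone))


def normalized_performance(ws_with_mitigation, ws_baseline):
    """Weighted speedup as a percentage of a baseline without mitigation."""
    return 100.0 * ws_with_mitigation / ws_baseline


def bandwidth_overhead(mitigation_traffic, total_traffic):
    """Percentage of total DRAM bandwidth consumed by the mitigation mechanism."""
    return 100.0 * mitigation_traffic / total_traffic


# HYPOTHETICAL two-core example
alone = [2.0, 1.0]
baseline = weighted_speedup([1.6, 0.8], alone)
mitigated = weighted_speedup([1.2, 0.6], alone)
print(normalized_performance(mitigated, baseline))
print(bandwidth_overhead(mitigation_traffic=5, total_traffic=100))
```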

Evaluation Methodology
• We evaluate five state-of-the-art mitigation mechanisms:
  - Increased Refresh Rate [Kim+, ISCA’14]
  - PARA [Kim+, ISCA’14]
  - ProHIT [Son+, DAC’17]
  - MRLoc [You+, DAC’19]
  - TWiCe [Lee+, ISCA’19]
• and one ideal refresh-based mitigation mechanism:
  - Ideal
• More details in the paper on:
  - Descriptions of the mechanisms (see also the original publications)
  - How we scale each mechanism to more vulnerable DRAM chips (lower HCfirst)

Mitigation Mechanism Evaluation (Increased Refresh)
[Plots: Increased Refresh Rate; x-axis: HCfirst (number of hammers required to induce first RowHammer bit flip), ticks 10^5, 10^4, 10^3, 10^2]
• Substantial overhead for high HCfirst values
• This mechanism does not support HCfirst < 32k due to the prohibitively high refresh rates required
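A back-of-the-envelope model suggests why refresh rates become prohibitive as HCfirst drops. The sketch below assumes a victim row must be refreshed before an attacker can complete HCfirst hammers at full speed. The row cycle time of 50 ns and the nominal 64 ms refresh window are illustrative assumptions, and this simplified model is not necessarily the scaling method used in the paper.

```python
ACCESSES_PER_HAMMER = 2  # one access to each aggressor row, as defined on the slides
T_RC_NS = 50.0           # ASSUMPTION: illustrative row cycle time in nanoseconds
NOMINAL_WINDOW_MS = 64.0 # ASSUMPTION: nominal refresh window for comparison


def max_refresh_window_ms(hc_first, t_rc_ns=T_RC_NS):
    """Longest refresh window that still beats HCfirst back-to-back hammers."""
    return hc_first * ACCESSES_PER_HAMMER * t_rc_ns / 1e6


def refresh_rate_multiplier(hc_first, nominal_ms=NOMINAL_WINDOW_MS):
    """How many times faster than nominal the refresh must run (at least 1x)."""
    return max(1.0, nominal_ms / max_refresh_window_ms(hc_first))


for hc in (100_000, 32_000, 4_800):
    print(hc, round(max_refresh_window_ms(hc), 2), round(refresh_rate_multiplier(hc), 1))
```

Under these assumptions the required refresh rate grows in inverse proportion to HCfirst, so each tenfold drop in HCfirst costs roughly ten times more refresh operations.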

Mitigation Mechanism Evaluation (PARA)
[Plots: PARA; regions labeled "Low Performance Overhead" and "High Performance Overhead"; annotation: 80% performance loss; x-axis: HCfirst (number of hammers required to induce first RowHammer bit flip), ticks 10^5, 10^4, 10^3, 10^2]
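For context, PARA [Kim+, ISCA’14] refreshes a row adjacent to an activated row with a small probability on each activation. The sketch below is a simplified illustration of that idea, not the authors' implementation: the class and function names, the choice of one neighbor per trigger, and the closed-form miss probability are assumptions made for this example.

```python
import random


class Para:
    """Simplified PARA-style policy: on each activation, with a fixed
    probability, refresh one of the two physically adjacent rows."""

    def __init__(self, probability, seed=0):
        self.probability = probability
        self.rng = random.Random(seed)

    def on_activate(self, row):
        """Return the list of neighbor rows to refresh for this activation."""
        if self.rng.random() < self.probability:
            return [self.rng.choice((row - 1, row + 1))]
        return []


def miss_probability(probability, activations):
    """Chance that one given neighbor is never refreshed over `activations`
    activations of a single aggressor row (each hits that neighbor with p/2)."""
    return (1.0 - probability / 2.0) ** activations


para = Para(probability=0.01)
refreshes = sum(len(para.on_activate(100)) for _ in range(10_000))
print(refreshes, miss_probability(0.01, 10_000))
```

In this simplified model, keeping the miss probability fixed as HCfirst shrinks requires a larger refresh probability, which means more extra refreshes per activation. That is one way to read the performance loss PARA shows at low HCfirst.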

Mitigation Mechanism Evaluation (ProHIT)
[Plots: ProHIT and PARA; x-axis: HCfirst (number of hammers required to induce first RowHammer bit flip), ticks 10^5, 10^4, 10^3, 10^2]

Mitigation Mechanism Evaluation (MRLoc)
[Plots: MRLoc and PARA; regions labeled "Supported" and "Not supported"; x-axis: HCfirst (number of hammers required to induce first RowHammer bit flip), ticks 10^5, 10^4, 10^3, 10^2]
Models for scaling ProHIT and MRLoc for HCfirst < 2k are not provided and how to do so is not intuitive

Mitigation Mechanism Evaluation (TWiCe)
[Plots: TWiCe-ideal and PARA; regions labeled "Supported" and "Not supported"; x-axis: HCfirst (number of hammers required to induce first RowHammer bit flip), ticks 10^5, 10^4, 10^3, 10^2]
• TWiCe does not support HCfirst < 32k
• We evaluate an ideal scalable version (TWiCe-ideal) assuming it solves two critical design issues

Mitigation Mechanism Evaluation (Ideal)
[Plots: Ideal, TWiCe-ideal, and PARA; annotation: 6% performance loss; x-axis: HCfirst (number of hammers required to induce first RowHammer bit flip), ticks 10^5, 10^4, 10^3, 10^2]
The ideal mechanism issues a refresh command to a row only right before the row can potentially experience a RowHammer bit flip
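One way to picture the ideal mechanism is as a perfect per-row counter that refreshes a victim one step before its disturbance count reaches the failure threshold. The sketch below is a toy illustration under stated assumptions: the threshold is counted in neighbor activations rather than hammers, only directly adjacent rows are disturbed, the refresh itself is treated as free of side effects, and the class name `IdealMitigation` is hypothetical.

```python
from collections import defaultdict


class IdealMitigation:
    """Toy oracle: track how often each row's neighbors were activated since
    the row was last restored, and refresh just before the limit is reached."""

    def __init__(self, disturb_limit):
        self.limit = disturb_limit
        self.disturb = defaultdict(int)
        self.refreshes = 0

    def on_activate(self, row):
        """Process one activation; return the rows refreshed as a result."""
        self.disturb[row] = 0  # activating a row restores its own charge
        refreshed = []
        for victim in (row - 1, row + 1):
            self.disturb[victim] += 1
            if self.disturb[victim] >= self.limit - 1:
                self.disturb[victim] = 0  # refresh right before a possible flip
                self.refreshes += 1
                refreshed.append(victim)
        return refreshed


mitigation = IdealMitigation(disturb_limit=4)
for aggressor in (9, 11, 9, 11, 9, 11):  # double-sided hammering of row 10
    print(aggressor, mitigation.on_activate(aggressor))
```

Because refreshes are issued only when a row is about to fail, the number of extra refreshes is the minimum any refresh-based scheme could achieve in this model, which is why the slide uses it as a lower bound on overhead.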

Mitigation Mechanism Evaluation
[Plots: Ideal, TWiCe-ideal, and PARA, with markers for LPDDR4-1y, DDR4-new, DDR3-new, DDR4-old, LPDDR4-1x, and DDR3-old; x-axis: HCfirst (number of hammers required to induce first RowHammer bit flip), ticks 10^5, 10^4, 10^3, 10^2]
• PARA, ProHIT, and MRLoc mitigate RowHammer bit flips in worst chips today with reasonable system performance (92%, 100%)
• Only PARA’s design scales to low HCfirst values but has very low normalized system performance
• Ideal mechanism is significantly better than any existing mechanism for HCfirst < 1024
• Significant opportunity for developing a RowHammer solution with low performance overhead that supports low HCfirst

Key Takeaways from Mitigation Mechanisms
• Existing RowHammer mitigation mechanisms can prevent RowHammer attacks with reasonable system performance overhead in DRAM chips today
• Existing RowHammer mitigation mechanisms do not scale well to DRAM chips more vulnerable to RowHammer
• There is still significant opportunity for developing a mechanism that is scalable with low overhead

Additional Details in the Paper
• Single-cell RowHammer bit flip probability
• More details on our data pattern dependence study
• Analysis of Error Correcting Codes (ECC) in mitigating RowHammer bit flips
• Additional observations on our data
• Methodology details for characterizing DRAM
• Further discussion on comparing data across different infrastructures
• Discussion on scaling each mitigation mechanism

RowHammer Solutions Going Forward
Two promising directions for new RowHammer solutions:
1. DRAM-system cooperation
  - We believe the DRAM and system should cooperate more to provide a holistic solution that can prevent RowHammer at low cost
2. Profile-guided
  - An accurate profile of RowHammer-susceptible cells in DRAM provides a powerful substrate for building targeted RowHammer solutions, e.g.:
    • Only increase the refresh rate for rows containing RowHammer-susceptible cells
  - A fast and accurate profiling mechanism is a key research challenge for developing low-overhead and scalable RowHammer solutions
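The profile-guided example above (raising the refresh rate only for rows with susceptible cells) can be sketched as follows. The profile contents, the two refresh intervals, and the function names are hypothetical values chosen for illustration; the slide does not specify any of them.

```python
def refresh_interval_ms(row, susceptible_rows, nominal_ms=64.0, fast_ms=8.0):
    """Refresh profiled (susceptible) rows more often; leave other rows nominal."""
    return fast_ms if row in susceptible_rows else nominal_ms


def relative_refresh_cost(num_rows, susceptible_rows, nominal_ms=64.0, fast_ms=8.0):
    """Refresh operations per unit time, relative to refreshing every row
    at the nominal interval."""
    fast_rows = len(susceptible_rows)
    targeted = (num_rows - fast_rows) / nominal_ms + fast_rows / fast_ms
    return targeted / (num_rows / nominal_ms)


# HYPOTHETICAL profile: 10 susceptible rows out of 1000
profile = set(range(10))
print(refresh_interval_ms(3, profile), refresh_interval_ms(500, profile))
print(relative_refresh_cost(1000, profile))
```

With these made-up numbers, targeted refresh costs about 1.07x the nominal refresh work, compared with 8x if every row were refreshed at the faster interval. The saving depends entirely on how accurate and how small the profile is.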

Conclusion
• We characterized 1580 DRAM chips of different DRAM types, technology nodes, and manufacturers
• We studied five state-of-the-art RowHammer mitigation mechanisms and an ideal refresh-based mechanism
• We made two key observations
  1. RowHammer is getting much worse. It takes far fewer hammers to induce RowHammer bit flips in newer chips
    • e.g., DDR3: 69.2k to 22.4k, DDR4: 17.5k to 10k, LPDDR4: 16.8k to 4.8k
  2. Existing mitigation mechanisms do not scale to DRAM chips that are more vulnerable to RowHammer
    • e.g., 80% performance loss when the hammer count to induce the first bit flip is 128
• We conclude that it is critical to do more research on RowHammer and develop scalable mitigation mechanisms to prevent RowHammer in future systems

Future Memory Reliability/Security Challenges
