BlockHammer: Preventing RowHammer at Low Cost by Blacklisting Rapidly-Accessed DRAM Rows
Abdullah Giray Yağlıkçı, Minesh Patel, Jeremie S. Kim, Roknoddin Azizi, Ataberk Olgun, Lois Orosa, Hasan Hassan, Jisung Park, Konstantinos Kanellopoulos, Taha Shahroodi, Saugata Ghose*, Onur Mutlu

Executive Summary
• Motivation: RowHammer is a worsening DRAM reliability and security problem
• Problem: Mitigation mechanisms have limited support for current and future chips
  - Scalability with worsening RowHammer vulnerability
  - Compatibility with commodity DRAM chips
• Goal: Efficiently and scalably prevent RowHammer bit-flips without knowledge of or modifications to DRAM internals
• Key Idea: Selectively throttle memory accesses that may cause RowHammer bit-flips
• Mechanism: BlockHammer
  - Tracks activation rates of all rows by using area-efficient Bloom filters
  - Throttles row activations that could cause RowHammer bit-flips
  - Identifies and throttles threads that perform RowHammer attacks
• Scalability with Worsening RowHammer Vulnerability:
  - Competitive with state-of-the-art mechanisms when there is no attack
  - Superior performance and DRAM energy when a RowHammer attack is present
• Compatibility with Commodity DRAM Chips:
  - No proprietary information about DRAM internals
  - No modifications to DRAM circuitry

Outline
• DRAM and RowHammer Background
• Motivation and Goal
• BlockHammer
  - RowBlocker
  - AttackThrottler
• Evaluation
• Conclusion

Organizing and Accessing DRAM Cells
• A DRAM cell consists of a capacitor and an access transistor
• A row needs to be activated to access its content

DRAM Refresh
[Figure: capacitor voltage (as a percentage of Vdd) over time; the voltage decays toward Vmin and is restored by REF operations within each refresh window (tREFW)]
Periodic refresh operations preserve stored data [Patel+ ISCA'17, Kim+ ISCA'20]

The RowHammer Phenomenon
[Figure: DRAM bank in which Row 2 is the aggressor row, repeatedly opened and closed, and Rows 0, 1, 3, and 4 are victim rows]
Repeatedly opening (activating) and closing (precharging) a DRAM row causes RowHammer bit-flips in nearby cells [Kim+ ISCA'20]

RowHammer Mitigation Approaches
• Increased refresh rate: the REF-to-REF time shrinks, so fewer activations can fit between two refreshes
• Physical isolation: isolation rows keep a large-enough distance between the aggressor row and the victim rows
• Reactive refresh: the victim rows of a rapidly activated (hammered) aggressor row are refreshed
• Proactive throttling: fewer activations can be performed

Two Key Challenges
1. Scalability with worsening RowHammer vulnerability
2. Compatibility with commodity DRAM chips

Scalability with Worsening RowHammer Vulnerability
• DRAM chips are more vulnerable to RowHammer today
• RowHammer bit-flips occur at much lower activation counts (more than an order of magnitude decrease):
  - from 139.2K [Y. Kim+, ISCA 2014] to 9.6K [J. S. Kim+, ISCA 2020]
• RowHammer blast radius has increased by 33%:
  - from 9 rows [Y. Kim+, ISCA 2014] to 12 rows [J. S. Kim+, ISCA 2020]
• In-DRAM mitigation mechanisms are ineffective [Frigo+, S&P 2020]
RowHammer is a more serious problem than ever

Mitigation Approaches with Worsening RowHammer Vulnerability
• Increased refresh rate: the REF-to-REF time shrinks further, so even fewer activations can fit
• Physical isolation: a larger distance is needed, which requires more isolation rows
• Reactive refresh: victim rows must be refreshed more frequently, and more rows must be refreshed
• Proactive throttling: row activations are throttled more aggressively
Mitigation mechanisms face the challenge of scalability with worsening RowHammer

Two Key Challenges
1. Scalability with worsening RowHammer vulnerability
2. Compatibility with commodity DRAM chips

Compatibility with Commodity DRAM Chips
[Figure: address translation layers]
• Application level: virtual memory address
• System level: physical memory address
• Memory controller: DRAM bus addresses (channel, rank, bank group, bank, row, column)
• DRAM chip: in-DRAM mapping to physical rows and columns
Only the layers up to the DRAM bus addresses are visible within the processor

Compatibility with Commodity DRAM Chips
Vendors apply in-DRAM mapping for two reasons:
• Design optimizations: simplifying DRAM circuitry to provide better density, performance, and power
• Yield improvement: mapping faulty rows and columns to redundant ones
The in-DRAM mapping scheme reveals insights into chip design and manufacturing quality
In-DRAM mapping is proprietary information

RowHammer Mitigation Approaches
• Increased refresh rate: the REF-to-REF time shrinks, so fewer activations can fit between two refreshes
• Physical isolation: isolation rows separate the aggressor row from the victim rows
• Reactive refresh: the victim rows of an aggressor row are refreshed
• Proactive throttling: fewer activations can be performed
Identifying victim and isolation rows requires proprietary knowledge of in-DRAM mapping

Our Goal
To prevent RowHammer efficiently and scalably without knowledge of or modifications to DRAM internals

BlockHammer: Key Idea
Selectively throttle memory accesses that may cause RowHammer bit-flips

BlockHammer: Overview of Approach
• RowBlocker
  - Tracks row activation rates using area-efficient Bloom filters
  - Blacklists rows that are activated at a high rate
  - Throttles activations targeting a blacklisted row
  No row can be activated at a high enough rate to induce bit-flips
• AttackThrottler
  - Identifies threads that perform a RowHammer attack
  - Reduces memory bandwidth usage of identified threads
  Greatly reduces the performance degradation and energy wastage a RowHammer attack inflicts on a system

RowBlocker
• Modifies the memory request scheduler to throttle row activations
• Blacklists rows with a high activation rate (blacklisting logic, RowBlocker-BL) and delays subsequent activations targeting blacklisted rows (delaying logic, RowBlocker-HB)
• Blocks a row activation if the row is both blacklisted and recently activated
• When a row activation is performed, both RowBlocker-BL and RowBlocker-HB are updated with the row activation information

RowBlocker-BL: Blacklisting Logic
• Blacklists a row when the row's activation count in a time window exceeds a threshold
• Employs two counting Bloom filters for area-efficient activation rate tracking

Counting Bloom Filters
• Blacklisting logic counts activations using counting Bloom filters
• A row's observed activation count
  - can be higher than the true count (false positive)
  - cannot be lower than the true count (no false negatives)
• To avoid saturating counters, we use a time-interleaving approach
[Figure: an ACT to Row A increments the counters selected by the hash functions; a test for Row B returns the minimum of the counters selected for Row B]

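The insert and test operations described above can be sketched as follows. This is a minimal illustration, not the hardware design from the talk: the class and method names are hypothetical, the defaults of 1,024 counters and 4 hash functions mirror the backup slide on hardware complexity, and salted BLAKE2b stands in for the H3 hash functions that slide names. Counter saturation is not modeled.

```python
import hashlib


class CountingBloomFilter:
    """Sketch of a counting Bloom filter for per-row activation counts."""

    def __init__(self, num_counters=1024, num_hashes=4):
        self.num_counters = num_counters
        self.num_hashes = num_hashes
        self.counters = [0] * num_counters

    def _indices(self, row):
        # One counter index per hash function; the salt selects the function.
        # A set avoids incrementing a counter twice when two functions collide.
        return {
            int.from_bytes(
                hashlib.blake2b(
                    row.to_bytes(8, "little"), digest_size=8, salt=bytes([i])
                ).digest(),
                "little",
            )
            % self.num_counters
            for i in range(self.num_hashes)
        }

    def insert(self, row):
        # Called on every activation (ACT) of the row.
        for idx in self._indices(row):
            self.counters[idx] += 1

    def test(self, row):
        # The minimum over the row's counters never undercounts the row:
        # other rows can only push shared counters higher.
        return min(self.counters[idx] for idx in self._indices(row))


cbf = CountingBloomFilter()
for _ in range(3):
    cbf.insert(0x1A2B)
estimate = cbf.test(0x1A2B)
```

Because every activation of a row increments all of that row's counters, the minimum is a safe upper-bounded estimate from the defender's point of view: aliasing can only cause a row to be blacklisted early, never late.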
RowBlocker-BL: Blacklisting Logic
• Blacklisting logic employs two counting Bloom filters (CBFA and CBFB)
• A new row activation is inserted in both filters
• Only one filter (the active filter) responds to test queries
• The active filter changes at every epoch: while CBFA is active, CBFB is passive, and vice versa
• Blacklists a row if its activation count reaches the blacklisting threshold (NBL)
  - Count at or above NBL: assume that the row is activated at a high rate
  - Count below NBL: assume that the row is not activated at a high rate

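A sketch of the two-filter, time-interleaved scheme described above. The names are hypothetical, and one detail is an assumption rather than something the slide text states: when an epoch ends, the filter that was active is cleared and the passive one takes over. The compact filter uses salted BLAKE2b as a stand-in hash.

```python
import hashlib


class CBF:
    """Compact counting Bloom filter (stand-in hash: salted BLAKE2b)."""

    def __init__(self, size=1024, hashes=4):
        self.size, self.hashes = size, hashes
        self.counters = [0] * size

    def _idx(self, row):
        data = row.to_bytes(8, "little")
        return {
            int.from_bytes(
                hashlib.blake2b(data, digest_size=8, salt=bytes([i])).digest(),
                "little",
            )
            % self.size
            for i in range(self.hashes)
        }

    def insert(self, row):
        for i in self._idx(row):
            self.counters[i] += 1

    def test(self, row):
        return min(self.counters[i] for i in self._idx(row))

    def clear(self):
        self.counters = [0] * self.size


class DualCBFBlacklist:
    """Two time-interleaved filters; only the active one answers queries."""

    def __init__(self, n_bl):
        self.n_bl = n_bl              # blacklisting threshold (NBL)
        self.filters = [CBF(), CBF()]
        self.active = 0

    def on_activate(self, row):
        # Every activation is inserted into both filters.
        for f in self.filters:
            f.insert(row)

    def is_blacklisted(self, row):
        return self.filters[self.active].test(row) >= self.n_bl

    def end_epoch(self):
        # Assumption: clear the filter that was active, then swap roles.
        self.filters[self.active].clear()
        self.active ^= 1


bl = DualCBFBlacklist(n_bl=4)
for _ in range(3):
    bl.on_activate(7)
below_threshold = bl.is_blacklisted(7)
bl.on_activate(7)
at_threshold = bl.is_blacklisted(7)
bl.end_epoch()
after_one_epoch = bl.is_blacklisted(7)   # passive filter kept the history
bl.end_epoch()
after_two_epochs = bl.is_blacklisted(7)  # both filters cleared since then
```

The passive filter has already seen the activations of the current epoch when it becomes active, so a row cannot escape tracking by straddling an epoch boundary.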
Limiting the Row Activation Rate
• The activation rate is RowHammer-safe if a row receives at most RowHammer threshold (NRH) activations in a refresh window (tREFW)
• RowBlocker limits the activation count (NCBF) in a CBF's lifetime (tCBF)
[Figure: timeline marking the Clear CBFA and Clear CBFB events and the CBF lifetime tCBF; the bound on NCBF is labeled the RowHammer safety constraint]

RowBlocker-HB: Limiting the Row Activation Rate
• Ensures that all rows experience a RowHammer-safe activation rate
[Figure: timeline of one CBF lifetime (tCBF). The first NBL activations of a row take at least tRC x NBL; after that the row is blacklisted and consecutive activations are spaced tDelay apart for the remaining tCBF - (tRC x NBL), giving NCBF activations in total]
• We limit NCBF by configuring tDelay

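The equation for tDelay did not survive in this text. One reconstruction that is consistent with the quantities on the timeline, offered as a sketch rather than as the authors' exact formula: a row can receive at most NBL + (tCBF - tRC x NBL) / tDelay activations in one CBF lifetime, and this must not exceed the RowHammer-safe budget NRH x tCBF / tREFW. Solving for the smallest tDelay gives the function below. The function name and the numeric values in the example are illustrative assumptions.

```python
def min_t_delay_ns(t_cbf_ns, t_refw_ns, t_rc_ns, n_rh, n_bl):
    """Smallest tDelay (ns) that keeps NCBF within the RowHammer-safe budget.

    Assumed model: NCBF = NBL + (tCBF - tRC * NBL) / tDelay, and
    NCBF <= NRH * tCBF / tREFW.
    """
    n_cbf_budget = n_rh * t_cbf_ns / t_refw_ns
    if n_cbf_budget <= n_bl:
        raise ValueError("NBL must be below the per-lifetime activation budget")
    return (t_cbf_ns - n_bl * t_rc_ns) / (n_cbf_budget - n_bl)


# Illustrative values only: 64 ms CBF lifetime and refresh window,
# tRC = 50 ns, an allowed budget of 16,384 activations, NBL = 8,192.
t_delay = min_t_delay_ns(
    t_cbf_ns=64_000_000, t_refw_ns=64_000_000, t_rc_ns=50, n_rh=16_384, n_bl=8_192
)
n_cbf = 8_192 + (64_000_000 - 8_192 * 50) / t_delay
```

With these numbers the delay comes out to a few microseconds, which only rows that have already been activated NBL times ever experience.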
RowBlocker-HB: Delaying Row Activations
• RowBlocker-HB ensures that no subsequent activation of a blacklisted row is performed sooner than tDelay after the previous one
• RowBlocker-HB implements a history buffer that holds the row activations that can fit in a tDelay time window
• A blacklisted row activation is blocked as long as a valid activation record of the row exists in the history buffer
No row can be activated at a high enough rate to induce bit-flips

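The blocking rule above can be sketched with a small software model of the history buffer. The names are hypothetical, and the model uses an unbounded deque with timestamp-based expiry where the hardware would use a fixed number of entries.

```python
from collections import deque


class HistoryBuffer:
    """Keeps activation records that are younger than tDelay."""

    def __init__(self, t_delay_ns):
        self.t_delay_ns = t_delay_ns
        self.records = deque()  # (timestamp_ns, row), oldest first

    def _expire(self, now_ns):
        # A record is valid only while it is younger than tDelay.
        while self.records and now_ns - self.records[0][0] >= self.t_delay_ns:
            self.records.popleft()

    def recently_activated(self, row, now_ns):
        self._expire(now_ns)
        return any(r == row for _, r in self.records)

    def record(self, row, now_ns):
        self._expire(now_ns)
        self.records.append((now_ns, row))


def activation_allowed(row, now_ns, is_blacklisted, history):
    # Block only when the row is BOTH blacklisted and recently activated.
    return not (is_blacklisted(row) and history.recently_activated(row, now_ns))


history = HistoryBuffer(t_delay_ns=7_700)
blacklist = {5}
history.record(5, now_ns=0)
history.record(6, now_ns=0)
blocked_hot_row = not activation_allowed(5, 1_000, blacklist.__contains__, history)
benign_row_ok = activation_allowed(6, 1_000, blacklist.__contains__, history)
hot_row_after_delay = activation_allowed(5, 7_700, blacklist.__contains__, history)
```

Rows that are not blacklisted are never delayed, even if they were activated a moment ago, which is why benign workloads see little overhead.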
AttackThrottler
• Tackles the performance degradation and energy wastage that a RowHammer attack inflicts on a system
• A RowHammer attack intrinsically keeps activating blacklisted rows
• RowHammer Likelihood Index (RHLI): number of activations that target blacklisted rows, normalized to the maximum possible activation count
  - RHLI = 0.0: benign application, no blacklisted row activations
  - RHLI = 1.0: RowHammer attack, blacklisted row activation count approaches the RowHammer threshold
RHLI is larger when the thread's access pattern is more similar to a RowHammer attack

AttackThrottler
• Applies a smaller quota to a thread's in-flight request count as RHLI increases
  - RHLI = 0.0: benign application, no blacklisted row activations, no quota applied
  - RHLI = 1.0: RowHammer attack, blacklisted row activation count approaches the RowHammer threshold, no request is allowed
• Reduces a RowHammer attack's memory bandwidth consumption, leaving more memory bandwidth for concurrent benign applications
• RHLI can also be used as a RowHammer attack indicator by the system software
Greatly reduces the performance degradation and energy wastage a RowHammer attack inflicts on a system

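A sketch of how RHLI could drive a per-thread quota. The slides fix only the two endpoints (no quota at RHLI = 0.0, no requests at RHLI = 1.0); the linear interpolation between them, the function names, and the parameter names are assumptions made for illustration.

```python
def rhli(blacklisted_activations, max_blacklisted_activations):
    """Blacklisted-row activations normalized to the maximum possible count."""
    return min(1.0, blacklisted_activations / max_blacklisted_activations)


def inflight_quota(rhli_value, max_inflight_requests):
    """In-flight request quota for a thread; None means no quota is applied.

    Assumption: the quota shrinks linearly as RHLI grows.
    """
    if rhli_value <= 0.0:
        return None
    return int(max_inflight_requests * (1.0 - rhli_value))


benign_quota = inflight_quota(rhli(0, 1_000), 32)
suspect_quota = inflight_quota(rhli(500, 1_000), 32)
attacker_quota = inflight_quota(rhli(1_000, 1_000), 32)
```

Any monotonically decreasing mapping would fit the description on the slide; the linear one is simply the easiest to reason about.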
Evaluation: BlockHammer's Hardware Complexity (NRH=32K)
• We analyze six state-of-the-art mechanisms and BlockHammer
• We calculate area, access energy, and static power consumption*

Mitigation Mechanism   SRAM (KB)   CAM (KB)   Area (mm2 / % CPU)   Access Energy (pJ)   Static Power (mW)
BlockHammer            51.48       1.73       0.14 / 0.06          20.30                22.27
PARA [73]              -           -          <0.01                -                    -
ProHIT [137]           -           0.22       <0.01                3.67                 0.14
MRLoc [161]            -           0.47       <0.01                4.44                 0.21
CBT [132]              16.00       8.50       0.20 / 0.08          9.13                 35.55
TWiCe [84]             23.10       14.02      0.15 / 0.06          7.99                 21.28
Graphene [113]         -           5.22       0.04 / 0.02          40.67                3.11

BlockHammer is low cost and competitive with state-of-the-art mechanisms
*Assuming a high-end 28-core Intel Xeon processor system with 4-channel single-rank DDR4 DIMMs with a RowHammer threshold (NRH) of 32K

Evaluation: BlockHammer's Hardware Complexity (NRH=1K, with scaling relative to NRH=32K)

Mitigation Mechanism   SRAM (KB)   CAM (KB)   Area (mm2 / % CPU)    Access Energy (pJ)   Static Power (mW)
BlockHammer            441.33      55.58      1.57 / 0.64 (10x)     99.64 (5x)           220.99
PARA [73]              -           -          <0.01                 -                    -
ProHIT [137]           x           x          x                     x                    x
MRLoc [161]            x           x          x                     x                    x
CBT [132]              512.00      272.00     3.95 / 1.60 (20x)     127.93               535.50 (15x)
TWiCe [84]             738.32      448.27     5.17 / 2.10 (35x)     124.79               631.98 (30x)
Graphene [113]         -           166.03     1.14 / 0.46 (30x)     917.55 (23x)         93.96

BlockHammer's hardware complexity scales more efficiently than state-of-the-art mechanisms

Evaluation: Performance and DRAM Energy
• Cycle-level simulations using Ramulator and DRAMPower
• System configuration:

Processor             3.2 GHz, {1, 8} core, 4-wide issue, 128-entry instr. window
LLC                   64-byte cacheline, 8-way set-associative, {2, 16} MB
Memory scheduler      FR-FCFS
Address mapping       Minimalistic Open Pages
DRAM                  DDR4, 1 channel, 1 rank, 4 bank groups, 4 banks per bank group
RowHammer threshold   32K

• Single-core benign workloads:
  - 22 SPEC CPU2006
  - 4 YCSB Disk I/O
  - 2 Network Accelerator Traces
  - 2 Bulk Data Copy with Non-Temporal Hint (movnti)
• Randomly chosen multiprogrammed workloads:
  - 125 workloads containing 8 benign applications
  - 125 workloads containing 7 benign applications and 1 RowHammer attack thread

Evaluation: Performance and DRAM Energy
• We classify single-core workloads into three categories based on row buffer conflicts per thousand instructions (RBCPKI): Low (L) from 0.0 to 1.0, Medium (M) from 1.0 to 5.0, and High (H) above 5.0
• No application's row activation count exceeds BlockHammer's blacklisting threshold (NBL)
BlockHammer does not incur performance or DRAM energy overheads for single-core benign applications

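The RBCPKI classification can be written down in a few lines. The thresholds 1.0 and 5.0 come from the slide; which category a value exactly on a threshold falls into is not recoverable from the text, so the boundary handling below is an assumption, as are the function names.

```python
def rbcpki(row_buffer_conflicts, instructions):
    """Row buffer conflicts per thousand instructions."""
    return 1000.0 * row_buffer_conflicts / instructions


def classify(rbcpki_value):
    # Assumption: a value exactly on a threshold goes to the higher category.
    if rbcpki_value < 1.0:
        return "L"
    if rbcpki_value < 5.0:
        return "M"
    return "H"


low = classify(rbcpki(500, 1_000_000))
medium = classify(rbcpki(3_000, 1_000_000))
high = classify(rbcpki(7_000, 1_000_000))
```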
Evaluation: Performance and DRAM Energy
• Metrics:
  - System throughput (weighted speedup)
  - Job turnaround time (harmonic speedup)
  - Unfairness (maximum slowdown)
  - DRAM energy consumption
• No RowHammer attack: BlockHammer introduces very low performance (<0.5%) and DRAM energy (<0.4%) overheads
• RowHammer attack present: BlockHammer significantly increases benign application performance (by 45% on average) and reduces DRAM energy consumption (by 29% on average)

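For readers unfamiliar with the three multiprogrammed metrics, the sketch below uses their commonly used definitions in terms of each application's IPC when running alone and when sharing the system. The slides name the metrics without defining them, so the formulas and the function names are stated here as assumptions.

```python
def weighted_speedup(ipc_shared, ipc_alone):
    """System throughput: sum of per-application speedups."""
    return sum(s / a for s, a in zip(ipc_shared, ipc_alone))


def harmonic_speedup(ipc_shared, ipc_alone):
    """Job turnaround time: harmonic mean of per-application speedups."""
    return len(ipc_shared) / sum(a / s for s, a in zip(ipc_shared, ipc_alone))


def maximum_slowdown(ipc_shared, ipc_alone):
    """Unfairness: slowdown of the most penalized application."""
    return max(a / s for s, a in zip(ipc_shared, ipc_alone))


# Two applications, each running at half of its stand-alone IPC.
alone = [2.0, 1.0]
shared = [1.0, 0.5]
ws = weighted_speedup(shared, alone)
hs = harmonic_speedup(shared, alone)
ms = maximum_slowdown(shared, alone)
```

Higher is better for weighted and harmonic speedup, while lower is better for maximum slowdown.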
Evaluation: Scaling with RowHammer Vulnerability
• Metrics:
  - System throughput (weighted speedup)
  - Job turnaround time (harmonic speedup)
  - Unfairness (maximum slowdown)
  - DRAM energy consumption
• No RowHammer attack: BlockHammer's performance and energy overheads remain negligible (<0.6%)
• RowHammer attack present: BlockHammer scalably provides much higher performance (71% on average) and lower energy consumption (32% on average) than state-of-the-art mechanisms

More in the Paper
• Security proof
  - We mathematically represent all possible access patterns
  - We show that no row can be activated enough times to induce bit-flips when BlockHammer is configured correctly
• Addressing many-sided attacks
• Evaluation of 14 mechanisms representing four mitigation approaches, compared on:
  - Comprehensive protection
  - Compatibility with commodity DRAM chips
  - Scalability with RowHammer vulnerability
  - Deterministic protection

Conclusion
• Motivation: RowHammer is a worsening DRAM reliability and security problem
• Problem: Mitigation mechanisms have limited support for current and future chips
  - Scalability with worsening RowHammer vulnerability
  - Compatibility with commodity DRAM chips
• Goal: Efficiently and scalably prevent RowHammer bit-flips without knowledge of or modifications to DRAM internals
• Key Idea: Selectively throttle memory accesses that may cause RowHammer bit-flips
• Mechanism: BlockHammer
  - Tracks activation rates of all rows by using area-efficient Bloom filters
  - Throttles row activations that could cause RowHammer bit-flips
  - Identifies and throttles threads that perform RowHammer attacks
• Scalability with Worsening RowHammer Vulnerability:
  - Competitive with state-of-the-art mechanisms when there is no attack
  - Superior performance and DRAM energy when a RowHammer attack is present
• Compatibility with Commodity DRAM Chips:
  - No proprietary information about DRAM internals
  - No modifications to DRAM circuitry

Backup Slides

Timing Constraints for DRAM Row Activations
• Timing row activations is critical to meet reliability and power constraints
• Two timing constraints limit row activation rates:
  - tRC (~45-50 ns): minimum delay between two consecutive activations in a bank
  - tFAW (~30-35 ns): rolling time window in which at most four rows can be activated in a rank
[Figure: timeline of seven ACT commands to rows in Banks A through F; two activations in the same bank are more than tRC apart, and an activation and the fourth activation after it in the same rank are more than tFAW apart]

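The two constraints can be sketched as a small checker that a memory controller model might consult before issuing an ACT. The class name, method names, and the concrete values of 46 ns and 35 ns are illustrative assumptions within the ranges given on the slide; the sketch models a single rank.

```python
from collections import deque


class ActTimingChecker:
    """Checks tRC (per bank) and tFAW (per rank) before an activation."""

    def __init__(self, t_rc_ns=46, t_faw_ns=35):
        self.t_rc_ns = t_rc_ns
        self.t_faw_ns = t_faw_ns
        self.last_act_in_bank = {}               # bank -> timestamp of last ACT
        self.recent_rank_acts = deque(maxlen=4)  # last four ACTs in the rank

    def can_activate(self, bank, now_ns):
        last = self.last_act_in_bank.get(bank)
        if last is not None and now_ns - last < self.t_rc_ns:
            return False  # tRC: too soon after the previous ACT in this bank
        if (
            len(self.recent_rank_acts) == 4
            and now_ns - self.recent_rank_acts[0] < self.t_faw_ns
        ):
            return False  # tFAW: this would be a fifth ACT inside the window
        return True

    def activate(self, bank, now_ns):
        if not self.can_activate(bank, now_ns):
            raise RuntimeError("activation violates tRC or tFAW")
        self.last_act_in_bank[bank] = now_ns
        self.recent_rank_acts.append(now_ns)


checker = ActTimingChecker()
for bank, t in enumerate([0, 5, 10, 15]):
    checker.activate(bank, t)
fifth_too_early = checker.can_activate(4, 20)
fifth_after_window = checker.can_activate(4, 35)
same_bank_too_early = checker.can_activate(0, 40)
same_bank_after_trc = checker.can_activate(0, 46)
```

These constraints bound how fast any row can be activated even without a mitigation mechanism, which is what makes a finite history buffer sufficient.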
BlockHammer Hardware Complexity
• RowBlocker
  - RowBlocker-BL: implemented per bank
    • 1K counters in a CBF
    • 4 H3 hash functions
  - RowBlocker-HB: implemented per rank
    • 887 entries
• AttackThrottler
  - Two counters per <Bank, Thread> pair

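For context, an H3 hash function maps an input to the XOR of fixed random words selected by the input's set bits, which makes it cheap to build from XOR gates. The sketch below is a software illustration under assumptions: the 17-bit row address width and the seed are hypothetical, and the 10 output bits are chosen to index the 1K counters mentioned above.

```python
import random


def make_h3(input_bits, output_bits, seed):
    """Builds one H3 hash function from a random bit matrix."""
    rng = random.Random(seed)
    # One random output-width word per input bit position.
    q = [rng.getrandbits(output_bits) for _ in range(input_bits)]

    def h3(x):
        out = 0
        for i in range(input_bits):
            if (x >> i) & 1:
                out ^= q[i]  # XOR in the word for every set input bit
        return out

    return h3


# Four independent functions, each indexing a 1,024-counter filter.
hash_functions = [make_h3(input_bits=17, output_bits=10, seed=s) for s in range(4)]
indices = [h(0x1ABCD) for h in hash_functions]
```

Each function is linear over XOR, so h(a ^ b) equals h(a) ^ h(b); using several functions with different matrices keeps two rows from aliasing in every counter at once.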
RowHammer Characteristics
• RowHammer Threshold (NRH): the minimum row activation count in a refresh window that induces a RowHammer bit-flip
• Blast Radius (rBlast): the maximum physical distance from the aggressor row at which RowHammer bit-flips can be observed
• Blast Impact Factor (ci): set of coefficients that scale a RowHammer attack's impact on victim rows based on their physical distance to the aggressor row

Many-Sided Attacks
• NRH: RowHammer threshold for a single-sided attack
• NRH*: maximum activation count that BlockHammer allows in a refresh window
• rBlast: blast radius
• ci: blast impact factor
• We configure NRH* such that hammering all rows NRH* times does not cause bit-flips

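One way to satisfy the last bullet, sketched under stated assumptions: if every row within the blast radius on both sides of a victim is activated NRH* times, the victim's cumulative disturbance is 2 x (c1 + ... + c_rBlast) x NRH*, and this is kept at or below NRH. The formula, the function name, and the example impact factors (halving with each row of distance, blast radius of 6) are illustrative assumptions, not values taken from the slides.

```python
import math


def n_rh_star(n_rh, impact_factors):
    """Largest per-row activation budget under the assumed worst case.

    impact_factors[i] is the blast impact factor for distance i + 1.
    """
    return math.floor(n_rh / (2 * sum(impact_factors)))


# Hypothetical impact factors: 1, 1/2, 1/4, ... out to a blast radius of 6.
factors = [0.5**k for k in range(6)]
many_sided_budget = n_rh_star(32_768, factors)
double_sided_budget = n_rh_star(32_768, [1.0])
```

With only the immediately adjacent rows contributing, the budget reduces to half of NRH, the familiar double-sided case.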
DRAM Organization
• A DRAM bank is hierarchically organized into subarrays
• Columns of cells in a subarray share a local bitline
• Rows of cells in a subarray share a wordline
[Figure: DRAM bank and subarray, labeling a DRAM cell, a DRAM row, a wordline, and a local bitline]

DRAM Operation
[Figure: the row decoder activates a row into the local row buffer, and READ commands fetch cache lines from the row buffer]
DRAM command sequence over time: ACT R0, RD, RD, RD, PRE R0, ACT R1, RD, RD, RD

DRAM Cell
• Each cell encodes information in a leaky capacitor
[Figure: DRAM cell with its wordline, access transistor, bitline, capacitor, and charge leakage paths]
Stored data is corrupted if too much charge leaks (i.e., the capacitor voltage degrades too much) [Patel+ ISCA'17, Kim+ ISCA'20]

Security Analysis
No permutation of epochs can satisfy the necessary constraints of a successful attack