Bit-Exact ECC Recovery (BEER): Determining DRAM On-Die ECC Functions by Exploiting DRAM Data Retention Characteristics
Minesh Patel, Jeremie S. Kim, Taha Shahroodi, Hasan Hassan, Onur Mutlu
MICRO 2020 (Session 2C – Memory)
What are the DRAM chip's reliability characteristics?
- Minimum viable operating timings?
- Aggregate failure rates?
- 'Weak' cell locations?
- Temperature dependence?
- Inter-chip variation?
- Statistical error distributions?
Third-party DRAM consumers: system designer, test/validation engineer, research scientist
DRAM Testing and Error Characterization
Third-party testing studies the bit flips observed at the chip interface, but those bit flips are obfuscated by on-die ECC.
- The on-die ECC function is unknown and proprietary
- The chip gives the CPU no feedback upon error correction
On-die ECC complicates third-party DRAM testing.
Overcoming Challenges of On-Die ECC
Our goal: Determine exactly how on-die ECC obfuscates errors (i.e., its parity-check matrix)
(Figure: DRAM chip containing data, ECC logic, and I/O)
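To make the obfuscation concrete, here is a minimal sketch of how a parity-check matrix decides what a reader of the chip sees. It assumes a toy (7,4) Hamming code, chosen only for illustration; real on-die ECC functions are proprietary, and the names `H_COLS`, `syndrome`, and `ondie_read` are hypothetical.

```python
# Toy (7,4) Hamming code: a hypothetical stand-in for a proprietary on-die ECC.
# Each column of the parity-check matrix H is stored as a 3-bit integer.
H_COLS = [3, 5, 6, 7, 1, 2, 4]  # 4 data-bit columns, then 3 parity-bit columns
K = 4                           # number of data bits visible outside the chip

def syndrome(word):
    """XOR together the H columns of every bit that is set (H * word mod 2)."""
    s = 0
    for bit, col in zip(word, H_COLS):
        if bit:
            s ^= col
    return s

def ondie_read(stored):
    """Model a read: silently flip the bit the syndrome points to, return data only."""
    word = list(stored)
    s = syndrome(word)
    if s in H_COLS:                      # nonzero syndrome matching a column
        word[H_COLS.index(s)] ^= 1       # single-error "correction"
    return word[:K]

codeword = [1, 0, 1, 1, 0, 1, 0]         # data 1011 plus its parity bits
assert syndrome(codeword) == 0           # a valid codeword has a zero syndrome

one_error = [1, 0, 0, 1, 0, 1, 0]        # data bit 2 decayed 1 -> 0
two_errors = [0, 0, 0, 1, 0, 1, 0]       # data bits 0 and 2 decayed 1 -> 0

print(ondie_read(one_error))             # [1, 0, 1, 1]: the raw error is hidden
print(ondie_read(two_errors))            # [0, 1, 0, 1]: bit 1 is miscorrected
```

In the two-error case the reader sees three wrong data bits, one of which never failed in the array. Which bit gets miscorrected depends entirely on the columns of the parity-check matrix, which is why knowing that matrix matters.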
BEER Methodology
1. Induce uncorrectable errors
2. Aggregate unique error patterns
3. Solve for the parity-check matrix
BEER requires no special hardware or knowledge
https://github.com/CMU-SAFARI/BEER
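The three steps can be sketched end to end on a toy code. This is an illustration under stated assumptions, not the released BEER implementation: it assumes a systematic (7,4) Hamming code, assumes every cell stores 1 as the charged state so retention errors only flip 1 to 0, and replaces whatever solver the real tool uses with brute-force enumeration of candidate matrices. All function names are hypothetical.

```python
from itertools import combinations, permutations

K, R = 4, 3  # data bits and parity bits of the toy code (assumption)

def h_columns(p_cols):
    """Columns of H = [P | I], each stored as an R-bit integer."""
    return list(p_cols) + [1 << i for i in range(R)]

def encode(data, p_cols):
    """Systematic codeword: the data bits followed by their parity bits."""
    parity = 0
    for bit, col in zip(data, p_cols):
        if bit:
            parity ^= col
    return list(data) + [(parity >> i) & 1 for i in range(R)]

def decode(word, p_cols):
    """Syndrome decoding: flip the bit whose H column equals the syndrome."""
    cols = h_columns(p_cols)
    s = 0
    for bit, col in zip(word, cols):
        if bit:
            s ^= col
    word = list(word)
    if s in cols:
        word[cols.index(s)] ^= 1
    return word[:K]

def miscorrection_profile(p_cols):
    """Steps 1 and 2: induce retention errors, aggregate observed 0 -> 1 flips."""
    profile = {}
    for n_charged in (1, 2):                       # test patterns with 1 or 2 set data bits
        for charged in combinations(range(K), n_charged):
            data = [1 if i in charged else 0 for i in range(K)]
            codeword = encode(data, p_cols)
            ones = [i for i, b in enumerate(codeword) if b]
            flipped = set()
            for n_err in range(1, len(ones) + 1):  # every subset of charged cells may decay
                for errs in combinations(ones, n_err):
                    raw = list(codeword)
                    for i in errs:
                        raw[i] = 0
                    read = decode(raw, p_cols)
                    # Retention cannot turn 0 into 1, so such a flip is a miscorrection.
                    flipped |= {i for i in range(K) if data[i] == 0 and read[i] == 1}
            profile[charged] = frozenset(flipped)
    return profile

def solve(observed):
    """Step 3: keep every candidate P whose predicted profile matches the observations."""
    return [p for p in permutations([3, 5, 6, 7])
            if miscorrection_profile(p) == observed]

secret = (7, 3, 5, 6)                     # stands in for the chip's unknown matrix
observed = miscorrection_profile(secret)  # what a tester would measure
solutions = solve(observed)
print(len(solutions), secret in solutions)
```

The search may return more than one candidate, since matrices that differ only in how the hidden parity bits are labeled can produce the same externally visible behavior. Brute force is workable here only because the toy code has 24 candidates.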
Two-Part Evaluation
Experimental demonstration: 80 LPDDR4 DRAM chips (3 major manufacturers)
1. Different manufacturers appear to use different parity-check matrices
2. Chips of the same model appear to use identical parity-check matrices
Simulated correctness and practicality: over 100,000 representative ECC codes of varying word lengths (4 – 247 bits)
1. BEER works for all simulated test cases
2. BEER is practical in both runtime and memory usage
BEER Use Cases
Categories: Testing and Validation, Characterizing Errors, Designing Systems
- Crafting worst-case test patterns
- Profiling for error-prone physical cells
- Studying raw bit error properties
- Root-cause failure analysis
- Improving on-die ECC
- System-level error-mitigation mechanisms
- Slides: 10