
Data Retention in MLC NAND Flash Memory: Characterization, Optimization, and Recovery
Yu Cai, Yixin Luo, Erich F. Haratsch, Ken Mai, Onur Mutlu
Carnegie Mellon University, LSI Corporation
2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA 2015)
Nicolas Wicki | 07.11.2018 | 1

Why flash memory?

Devices using Flash Memory
[Figure: solid state drive, smartphone, USB flash stick, computer]

NAND Flash Memory Device
§ We erase in blocks consisting of multiple pages.
§ We program and read in pages.
[Figure: flash controller connected to flash memory; a block holds Page 0 to Page 255, each page 4-16 KB]


NAND Flash Memory Device
§ We erase in blocks consisting of multiple pages.
§ We program and read in pages.
§ After programming we erase old copied pages.
§ The flash controller manages operations.
§ The ECC controller corrects data.
[Figure: flash controller with ECC connected to a block in flash memory]
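
The difference in granularity between erase and program can be illustrated with a toy model. This is only a sketch: the class name `Block` and its methods are hypothetical, and the 256 pages per block are taken from the figure (Page 0 to Page 255). It assumes, as is usual for NAND flash, that a programmed page cannot be overwritten before its block is erased.

```python
class Block:
    """Toy NAND block: pages are programmed one by one, erased all together."""

    def __init__(self, pages_per_block=256):
        # None marks an erased (not yet programmed) page.
        self.pages = [None] * pages_per_block

    def program(self, page_index, data):
        # A programmed page cannot be overwritten in place.
        if self.pages[page_index] is not None:
            raise ValueError("page already programmed; erase the block first")
        self.pages[page_index] = data

    def read(self, page_index):
        return self.pages[page_index]

    def erase(self):
        # Erase only works at block granularity.
        self.pages = [None] * len(self.pages)
```

Updating a page therefore means programming the new data elsewhere and erasing the old block later, which is the bookkeeping the flash controller takes care of.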

Flash Memory Cells
§ A cell stores charge as electrons in the floating gate.
§ We program cells by applying a high positive voltage to the control gate.
§ The positive voltage attracts electrons from the substrate through the tunnel oxide.
[Figure: cell cross-section with +10 V on the control gate; layers from top to bottom: control gate, gate oxide, floating gate, tunnel oxide, substrate with source and drain]

Erasing Cells
§ We erase cells by applying a high negative voltage.
[Figure: cell cross-section with -20 V on the control gate]

Trap Assisted Tunnelling
§ Repeated program/erase cycles trap electrons in the tunnel oxide.
§ The trapped charge creates an electric field.
§ Charge from the floating gate leaks to the substrate.
[Figure: cell cross-section: control gate, gate oxide, floating gate, tunnel oxide, substrate with source and drain]


Charge De-Trapping
§ Trapped charge leaves the tunnel oxide.
§ This increases the floating gate charge,
§ or decreases it if the charge de-traps towards the substrate.
§ De-trapping leaves a positive charge that attracts charge from the floating gate.
[Figure: cell cross-section: control gate, gate oxide, floating gate, tunnel oxide, substrate with source and drain]

Read Reference Voltage
§ Use charge as an indicator for bit values.
§ Assign 0 to a high charge and 1 to a low charge.
§ The read reference voltage separates differently charged cells.
§ Charge leaks over time because of trap assisted tunnelling or charge de-trapping.
§ Changed values introduce retention errors.
[Figure: retention loss of a 0 cell and a 1 cell over time and over multiple P/E cycles]

Threshold Voltage Distribution
[Figure: probability density function over threshold voltage; Vref separates the distribution read as 1 from the distribution read as 0]

Threshold Voltage Distribution in Multi Level Cell Flash Memory
[Figure: probability density function over threshold voltage; four distributions with the values 11, 10, 00 and 01, separated by Vref-1, Vref-2 and Vref-3]
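
Reading an MLC cell amounts to finding which of the four voltage windows its threshold voltage falls into. The sketch below shows this under assumed numbers: the function name `read_cell` and the reference voltages used in the usage example are hypothetical, while the state order and bit values (Erased = 11, P1 = 10, P2 = 00, P3 = 01) follow the slides.

```python
from bisect import bisect_right

# States from lowest to highest threshold voltage, with their 2-bit values.
STATES = [("Erased", "11"), ("P1", "10"), ("P2", "00"), ("P3", "01")]


def read_cell(v_th, v_refs):
    """Return (state, bits) for a cell with threshold voltage v_th.

    v_refs holds the three read reference voltages in ascending order.
    """
    # Number of reference voltages at or below v_th selects the state.
    index = bisect_right(v_refs, v_th)
    return STATES[index]
```

For example, with the made-up references `[1.0, 2.0, 3.0]`, `read_cell(2.5, [1.0, 2.0, 3.0])` yields `("P2", "00")`. If the cell leaks enough charge to drop below 2.0, the same call reports P1 instead, which is a retention error.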

Read-Retry
[Flowchart: read page from flash memory, then run ECC. If the data is corrected, forward the data. If not, adjust the read reference voltage and read the page again.]
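
The read-retry loop from the flowchart can be sketched as follows. The callables `read_page` and `ecc_decode`, the fixed `step`, and the retry limit are assumptions made for illustration; the slide only says that the read reference voltage is adjusted, so the downward direction is also an assumption (retention loss lowers threshold voltages).

```python
def read_retry(read_page, ecc_decode, v_ref, step, max_retries):
    """Re-read a page with an adjusted reference voltage until ECC succeeds.

    read_page(v_ref) returns the raw page data (hypothetical helper).
    ecc_decode(raw) returns corrected data, or None if uncorrectable.
    Returns (data, number_of_reads); data is None if every read failed.
    """
    for attempt in range(1, max_retries + 1):
        data = ecc_decode(read_page(v_ref))
        if data is not None:
            return data, attempt  # forward the corrected data
        v_ref -= step  # assumed direction: follow the leaked charge downwards
    return None, max_retries
```

Every failed attempt costs a full page read plus an ECC decode, which is why the number of retries dominates read latency.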

Executive Summary
§ Problem
  § Rising flash memory density diminishes lifetime.
  § Correcting errors increases read latency.
§ Goal
  § Deepen understanding of threshold voltage distributions of flash memory.
  § Improve both lifetime and system performance.
  § Recover non-correctable data.
§ Method
  § Retention Optimized Reading
  § Improved Read-Retry
  § Retention Failure Recovery
§ Result
  § Lifetime improvement by 64%.
  § Read latency reduction by 70.4%.
  § Raw bit error rate drop by 50%.

Problem
§ Multi level cell
  § Higher error rate due to smaller threshold windows.
§ Lifetime
  § Retention errors limit the time for which flash memory can be read.
  § Retention errors may lead to losing data.
§ Read Latency
  § Retention errors introduce overhead through error correction codes.
  § Retention errors increase the number of read-retries.

Goal
§ Build a strong understanding, characterization, and analysis of the threshold voltage distribution over retention age.
§ Introduce a dynamic technique that improves lifetime and read latency.
§ Devise a new mechanism to recover non-correctable data.

FPGA-Based Flash Memory Testing Platform
§ Different numbers of program/erase cycles for multiple groups of flash memory.
§ Data for retention ages ranging from 0 to 40 days.
§ All experiments were conducted at room temperature (20°C).
Source: Y. Cai et al., "FPGA-Based Solid-State Drive Prototyping Platform", FCCM 2011

Retention Optimized Reading

Threshold Voltage Distribution over Time
§ 3 read reference voltages (Vref-1, Vref-2, Vref-3) separate 4 states: Erased (11), P1 (10), P2 (00), P3 (01).
§ The erased state distribution can be neglected.
§ Distribution shifts cause raw bit errors.
§ Optimized read reference voltages (OPT2, OPT3) minimize raw bit errors.
[Figure: probability density function over threshold voltage for P1, P2 and P3, showing the default references Vref-2 and Vref-3, the optimized references OPT2 and OPT3, and the region of raw bit errors]
Source: Slides adapted from "Data Retention in MLC NAND Flash Memory…", Yixin Luo
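
Why a shifted reference voltage reduces raw bit errors can be shown with a small count. This is a sketch on made-up threshold voltages; the function names and all numbers are illustrative and not taken from the paper.

```python
def raw_bit_errors(lower, upper, v_ref):
    """Count misread cells for one reference voltage.

    lower: threshold voltages of cells programmed to the lower state.
    upper: threshold voltages of cells programmed to the upper state.
    A lower-state cell at or above v_ref, or an upper-state cell below
    v_ref, is read as the wrong state.
    """
    return sum(v >= v_ref for v in lower) + sum(v < v_ref for v in upper)


def optimal_v_ref(lower, upper, candidates):
    """Return the candidate reference voltage with the fewest raw bit errors."""
    return min(candidates, key=lambda v: raw_bit_errors(lower, upper, v))
```

With hypothetical P2 cells at `[1.8, 1.9, 2.0, 2.1]` and leaked P3 cells at `[2.3, 2.5, 2.6, 2.8]`, a default reference of 2.55 misreads two P3 cells, while a reference of 2.2 misreads none.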

Retention Optimized Reading
[Flowchart: read the last page using Vdefault and record the voltage and the number of errors reported by ECC. Step the voltage (Vdefault++ or Vdefault--) and read again. If #errors < record, store the new voltage and error count as the record; if #errors > record, stop stepping in that direction. Finally set Vdefault = Vrecord and use Vdefault.]
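
One way to read this flowchart is as a simple hill-climbing search on the error count. The sketch below follows that reading and is not the authors' implementation: `count_errors`, `step` and `max_steps` are hypothetical, and the choice to try lower voltages before higher ones is an assumption.

```python
def optimize_v_ref(count_errors, v_default, step, max_steps=16):
    """Search for the read reference voltage with the fewest raw bit errors.

    count_errors(v) reads the page at voltage v and returns the number of
    raw bit errors reported by ECC (hypothetical helper).
    Returns (best_voltage, best_error_count).
    """
    best_v, best_errors = v_default, count_errors(v_default)
    for direction in (-1, +1):  # assumed order: lower voltages first
        v = best_v
        for _ in range(max_steps):
            v += direction * step
            errors = count_errors(v)
            if errors >= best_errors:
                break  # no improvement: stop stepping in this direction
            best_v, best_errors = v, errors  # new record
    return best_v, best_errors
```

The recorded voltage can then serve as the starting point for later reads of the same block, so that most reads succeed on the first attempt.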

Improved Read-Retry
§ OPT is taken from the last page of a block, whose cells have the lowest retention age and therefore the smallest charge leakage.
[Flowchart: read page with OPT, then run ECC. If the data is corrected, forward the data. If not, decrease the read reference voltage and read again.]

Raw Bit Error Rate over Program/Erase Cycles
[Figure: raw bit error rate plotted against program/erase cycles, with the ECC threshold marked; OPT = optimized read reference voltage]
Source: Y. Cai et al., "Data retention in MLC NAND flash memory: …", in IEEE 21st Int. Symp. HPCA, 2015

Evaluation
§ Lifetime: both naive read-retry and retention optimized reading provide a 64% lifetime increase over the baseline (fixed threshold).
§ Read latency: retention optimized reading (improved read-retry) gives a 70.4% latency reduction compared to naive read-retry and a 2.4% latency reduction compared to the baseline.
[Figure: bar charts of lifetime (P/E cycles x 1000) and read latency (percent points, Stage-0 and Stage-1) for baseline, naive read-retry and retention optimized reading]

Evaluation
§ We have a storage overhead of 768 KB out of 512 GB (0.00015% overhead).
§ Execution overhead depends on program/erase cycles, retention age, and the amount of data written.
Retention Age | P/E Cycles | Latency
1 day         | 8000       | 3 seconds
7 days        | 8000       | 15 seconds
30 days       | 8000       | 23 seconds
Assuming flash capacity is full (512 GB).
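
The quoted storage overhead can be checked with a few lines of arithmetic. The sketch assumes decimal units (1 KB = 1000 bytes, 1 GB = 10^9 bytes); with binary units the result would be roughly 0.000143%, which rounds to the same figure.

```python
overhead_bytes = 768 * 10**3   # 768 KB, decimal units assumed
capacity_bytes = 512 * 10**9   # 512 GB, decimal units assumed

overhead_percent = 100 * overhead_bytes / capacity_bytes
print(f"{overhead_percent:.5f}%")  # prints 0.00015%
```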

Retention Failure Recovery

Fast and Slow Leaking Cells
§ Separate cells into fast and slow leaking cells.
§ Over the same time t, fast leaking cells leak more charge than slow leaking cells.
§ The threshold separating the two groups is the average threshold voltage shift.
[Figure: charge of a fast leaking cell and of a slow leaking cell over time t, compared against the average threshold voltage shift]

Retention Failure Recovery
§ Cells around OPT between P2 and P3 fall into four types:
  1. Cells correctly in P2
  2. Fast leaking cells from P3 wrongly in P2
  3. Slow leaking cells from P2 wrongly in P3
  4. Cells correctly in P3
§ Flip type 2 and 3 cells.
[Figure: P2 and P3 distributions around OPT divided into regions 1 to 4; flowchart boxes: Failed Data, Backup Data, Find Risky Cells, Find Fast & Slow Leaking Cells]
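
The classification into the four cell types can be sketched as below. This is an illustration under assumptions rather than the paper's procedure: it assumes each cell's threshold voltage is available at two points in time, treats cells within a hypothetical `risky_margin` of OPT as risky, and uses the average threshold voltage shift to separate fast from slow leaking cells.

```python
def recover(cells, v_opt, risky_margin):
    """Guess the original state (P2 or P3) of each cell.

    cells: list of (v_before, v_after) threshold voltages, measured some
    time apart (assumed to be available). Returns a list of "P2"/"P3".
    """
    shifts = [before - after for before, after in cells]
    average_shift = sum(shifts) / len(shifts)
    states = []
    for (before, after), shift in zip(cells, shifts):
        state = "P3" if after >= v_opt else "P2"  # state as currently read
        if abs(after - v_opt) <= risky_margin:    # risky cell close to OPT
            fast = shift > average_shift
            if fast and state == "P2":
                state = "P3"  # type 2: fast leaker that dropped out of P3
            elif not fast and state == "P3":
                state = "P2"  # type 3: slow leaker that stayed above OPT
        states.append(state)
    return states
```

Cells far from OPT keep the state they are read as; only risky cells whose leak speed contradicts their current state are flipped.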

Retention Failure Recovery: Evaluation
§ Raw bit error rate is expected to drop by 50%.
Source: Y. Cai et al., "Data retention in MLC NAND flash memory: …", in IEEE 21st Int. Symp. HPCA, 2015


Strengths
§ Retention optimized reading enhances memory lifetime with low overhead.
§ Retention failure recovery decreases raw bit error rate.
§ The mechanisms complement each other, but can be implemented individually.
§ We may adjust ECC capabilities to increase power efficiency.
§ Paper
  § Presents a simple and intuitive algorithm.
  § Conducts research with high potential impact.

Weaknesses
§ How does temperature affect threshold voltage shifts?
§ How many flash memory devices were used?
§ How does retention failure recovery affect storage overhead?
§ The paper is hard to understand in detail and covers a lot of topics.
§ Why was retention optimized reading not compared to adaptive read voltage thresholds? [1]
§ The paper has many similarities with previously published papers.
§ Figure explanations are sparse.
[1] Papandreou et al., "Using Adaptive Read Voltage Thresholds to Enhance the Reliability of MLC NAND…", Proceedings of the 24th edition of the Great Lakes Symposium on VLSI, 2014

Key Takeaways
§ Retention errors limit flash memory lifetime.
§ Read-retry increases read latency.
§ We gained a clear understanding of threshold voltage distributions.
§ Retention optimized reading improves lifetime and read latency.
§ Retention failure recovery reduces errors.

Open Discussion
§ In what order should we assign our 2-bit values to our 4 states?
§ They are often assigned this way: Erased - 11, P1 - 10, P2 - 00, P3 - 01.
§ With this order, a threshold voltage that shifts to the left into the neighbouring state causes only one bit error.
[Figure: two neighbouring distributions separated by OPT]
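
The one-bit-error property of this assignment can be verified directly. The sketch counts differing bits between neighbouring states; the names `MAPPING` and `hamming` are illustrative.

```python
# Bit values per state in order of increasing threshold voltage:
# Erased, P1, P2, P3.
MAPPING = ["11", "10", "00", "01"]


def hamming(a, b):
    """Number of positions in which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


# Bit errors caused when a cell is misread as its neighbouring state.
adjacent_errors = [hamming(a, b) for a, b in zip(MAPPING, MAPPING[1:])]
print(adjacent_errors)  # prints [1, 1, 1]
```

A plain descending binary order (11, 10, 01, 00) would instead flip both bits between the second and third state.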

Open Discussion
§ How should we assign our 2-bit values to pages?
[Table: row index (0 to 127) against the pages that hold the LSB and the MSB of each cell, for rows of 217 cells; surviving entries: row 0: Page 2; row 1: Page 4; row 2: Page 3, Page 6; row 127: Page 253, Page 255. Example cell values: 01, 00, 10, 11]
Source: Table adapted from Wang, Wei, et al., "Reducing MLC flash memory retention errors through programming initial step only", MSST 31st Symposium on. IEEE, 2015

Open Discussion
§ We have seen that reducing the number of read-retries has a great impact on read latency.
§ Can you think of yet another method to reduce the number of read-retries?
§ My idea would be to build binary search into our current read-retry mechanism.

Improved Read-Retry
[Flowchart: read page with OPT, then run ECC. If the data is corrected, forward the data. If not, decrease or increase the read reference voltage by half and read again.]

Additional Papers
§ Bez et al., "Introduction to Flash Memory", 2003
§ Cai et al., "Flash Correct-and-Refresh: Retention-Aware Error Management for Increased Flash Memory Lifetime", 2012
§ Cai et al., "Error Analysis and Retention-Aware Error Management For NAND Flash Memory", 2013
§ Cai et al., "Threshold Voltage Distribution in MLC NAND Flash Memory: Characterization, Analysis, and Modeling", 2013
§ Papandreou et al., "Using Adaptive Read Voltage Thresholds to Enhance the Reliability of MLC NAND Flash Memory Systems", 2014
§ Aslam et al., "Read and Write Voltage Signal Optimization for Multi-Level-Cell (MLC) NAND Flash Memory", 2016
§ Coutet et al., "Influence of temperature of storage, write and read operations on multiple level cells NAND flash memories", 2018

Big thanks to Giray & Mohammed for their support. No, really, thanks.

Backup Slides

Flash Correct-and-Refresh
§ Read page with a fixed read reference voltage.
§ Error correction informs about the range of the actual threshold voltage.
§ Identify cells in a wrong state.
§ Identify right shift errors and left shift errors.
§ Left shift errors are caused by retention loss.
§ Right shift errors are caused by cell-to-cell interference when programming other cells.
[Flowchart: choose a block to be refreshed; read page; error correction; cell threshold voltage comparison; if # right shift errors < threshold, re-program in place, otherwise re-map to the new block; page number++; repeat until the last page in the block.]
Source: Figure adapted from Y. Cai et al., "Flash Correct-and-Refresh: …", 2012.
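
The per-page decision in this flowchart can be sketched as follows. The encoding of states as indices 0 (Erased) to 3 (P3), the function names, and the threshold value are assumptions for illustration. The sketch also assumes the Yes branch of the comparison leads to re-programming in place.

```python
def classify_shift(read_state, corrected_state):
    """Compare the state index read from the cell with the ECC-corrected one."""
    if read_state < corrected_state:
        return "left"   # cell lost charge: retention loss
    if read_state > corrected_state:
        return "right"  # cell gained charge: cell-to-cell interference
    return "none"


def refresh_action(read_states, corrected_states, right_shift_threshold):
    """Decide how to refresh one page based on its right shift errors."""
    shifts = [classify_shift(r, c) for r, c in zip(read_states, corrected_states)]
    if shifts.count("right") < right_shift_threshold:
        return "re-program in place"
    return "re-map to the new block"
```

The asymmetry is plausible because programming can only add charge to a cell: a left shift can be repaired in place, whereas a right shift cannot be undone without an erase.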

Fast and Slow Leaking Cells