Computer Architecture
Lecture 16a: Opportunities and Challenges of Emerging Memory Tech.

Prof. Onur Mutlu
ETH Zürich
Fall 2020
19 November 2020

Emerging Memory Technologies

Recall: Emerging Memory Technologies
• Some emerging resistive memory technologies seem more scalable than DRAM (and they are non-volatile)
• Example: Phase Change Memory
  - Data stored by changing phase of material
  - Data read by detecting material's resistance
  - Expected to scale to 9 nm (2022 [ITRS 2009])
  - Prototyped at 20 nm (Raoux+, IBM JRD 2008)
  - Expected to be denser than DRAM: can store multiple bits/cell
• But, emerging technologies have (many) shortcomings
  - Can they be enabled to replace/augment/surpass DRAM?

Recall: Phase Change Memory: Pros and Cons
• Pros over DRAM
  - Better technology scaling (capacity and cost)
  - Non-volatile, hence persistent
  - Low idle power (no refresh)
• Cons
  - Higher latencies: ~4-15x DRAM (especially write)
  - Higher active energy: ~2-50x DRAM (especially write)
  - Lower endurance (a cell dies after ~10^8 writes)
  - Reliability issues (resistance drift)
• Challenges in enabling PCM as DRAM replacement/helper:
  - Mitigate PCM shortcomings
  - Find the right way to place PCM in the system
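
To get a feel for the endurance limit, the following back-of-the-envelope sketch estimates how long a PCM main memory would last if writes were spread perfectly evenly over all cells (ideal wear leveling). The function name and the example numbers used afterwards (16 GiB capacity, 1 GiB/s sustained writes) are assumptions chosen for illustration, not figures from the lecture.

```c
#include <assert.h>

/* Seconds in a (non-leap) year, used to convert the result to years. */
#define SECONDS_PER_YEAR (365.0 * 24.0 * 3600.0)

/*
 * Hypothetical helper: estimated lifetime, in years, of a memory with
 * `capacity_bytes` bytes whose cells each survive `endurance_writes`
 * writes, under a sustained write rate of `write_bytes_per_sec`.
 * Assumes ideal wear leveling (every cell is written equally often).
 */
double pcm_lifetime_years(double capacity_bytes,
                          double endurance_writes,
                          double write_bytes_per_sec)
{
    assert(capacity_bytes > 0.0 && endurance_writes > 0.0);
    assert(write_bytes_per_sec > 0.0);

    /* Total bytes that can be written before the cells wear out. */
    double total_writable_bytes = capacity_bytes * endurance_writes;

    return total_writable_bytes / write_bytes_per_sec / SECONDS_PER_YEAR;
}
```

With the assumed 16 GiB, 10^8 writes per cell, and 1 GiB/s, the estimate is 1.6 x 10^9 seconds, roughly 50 years. Without wear leveling, a workload that keeps writing the same cells would wear them out far sooner.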

Recall: PCM-based Main Memory (I)
• How should PCM-based (main) memory be organized?
• Hybrid PCM+DRAM [Qureshi+ ISCA'09, Dhiman+ DAC'09]:
  - How to partition/migrate data between PCM and DRAM

Recall: PCM-based Main Memory (II)
• How should PCM-based (main) memory be organized?
• Pure PCM main memory [Lee et al., ISCA'09, Top Picks'10]:
  - How to redesign entire hierarchy (and cores) to overcome PCM shortcomings

Emerging Memory Technologies: Opportunities and Challenges

Other Opportunities with Emerging Technologies
• Merging of memory and storage
  - e.g., a single interface to manage all data
• New applications
  - e.g., ultra-fast checkpoint and restore
• More robust system design
  - e.g., reducing data loss
• Processing tightly-coupled with memory
  - e.g., enabling efficient search and filtering

TWO-LEVEL STORAGE MODEL
[Figure: the CPU accesses MEMORY with Ld/St and STORAGE with FILE I/O]
• MEMORY (DRAM): VOLATILE, FAST, BYTE ADDR
• STORAGE: NONVOLATILE, SLOW, BLOCK ADDR
• NVM (PCM, STT-RAM): Non-volatile memories combine characteristics of memory and storage

Two-Level Memory/Storage Model
• The traditional two-level storage model is a bottleneck with NVM
  - Volatile data in memory → a load/store interface
  - Persistent data in storage → a file system interface
  - Problem: Operating system (OS) and file system (FS) code to locate, translate, buffer data become performance and energy bottlenecks with fast NVM stores
[Figure: Two-Level Store. Processor and caches reach main memory through Load/Store (virtual memory, address translation) and reach storage (SSD/HDD) through fopen, fread, fwrite, … via the operating system and file system. Persistent (e.g., Phase-Change) Memory.]

Unified Memory and Storage with NVM
• Goal: Unify memory and storage management in a single unit to eliminate wasted work to locate, transfer, and translate data
  - Improves both energy and performance
  - Simplifies programming model as well
[Figure: Unified Memory/Storage. Processor and caches issue Load/Store to the Persistent Memory Manager, which manages Persistent (e.g., Phase-Change) Memory, with Feedback.]
Meza+, "A Case for Efficient Hardware-Software Cooperative Management of Storage and Memory," WEED 2013.

PERSISTENT MEMORY
[Figure: the CPU accesses PERSISTENT MEMORY (NVM) with Ld/St]
Provides an opportunity to manipulate persistent data directly
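
On today's systems, the closest widely available software analogue to "load/store access to persistent data" is memory-mapping a file with POSIX mmap. The sketch below updates a counter stored in a file with plain loads and stores instead of fread/fwrite. It is only an illustration of the programming model: the function name and file layout are hypothetical, and the operating system is still involved in setting up the mapping and writing the page back.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Increment a 64-bit counter stored at the start of the file `path`
 * using ordinary loads and stores on a memory mapping.
 * Returns the new value, or -1 on error.
 * Note: the file is resized to exactly hold the counter.
 */
int64_t increment_persistent_counter(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;

    /* A newly created file is zero-filled, so the counter starts at 0. */
    if (ftruncate(fd, (off_t)sizeof(int64_t)) != 0) {
        close(fd);
        return -1;
    }

    int64_t *counter = mmap(NULL, sizeof(int64_t), PROT_READ | PROT_WRITE,
                            MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (counter == MAP_FAILED)
        return -1;

    int64_t value = ++(*counter); /* a plain load and store, no write() */

    /* Ask the OS to write the dirty page back before returning. */
    int rc = msync(counter, sizeof(int64_t), MS_SYNC);
    munmap(counter, sizeof(int64_t));
    return rc == 0 ? value : -1;
}
```

Calling the function twice on the same path returns 1 and then 2, because the second call reads the value that the first call left in the file.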

The Persistent Memory Manager (PMM)
[Figure: persistent objects mapped by the PMM onto a heterogeneous array of devices]
PMM uses access and hint information to allocate, migrate and access data in the heterogeneous array of devices

The Persistent Memory Manager (PMM)
• Exposes a load/store interface to access persistent data
  - Applications can directly access persistent memory → no conversion, translation, location overhead for persistent data
• Manages data placement, location, persistence, security
  - To get the best of multiple forms of storage
• Manages metadata storage and retrieval
  - This can lead to overheads that need to be managed
• Exposes hooks and interfaces for system software
  - To enable better data placement and management decisions
Meza+, "A Case for Efficient Hardware-Software Cooperative Management of Storage and Memory," WEED 2013.

Efficient Data Mapping among Heterogeneous Devices
• A persistent memory exposes a large, persistent address space
  - But it may use many different devices to satisfy this goal
  - From fast, low-capacity volatile DRAM to slow, high-capacity nonvolatile HDD or Flash
  - And other NVM devices in between
• Performance and energy can benefit from good placement of data among these devices
  - Utilizing the strengths of each device and avoiding their weaknesses, if possible
  - For example, consider two important application characteristics: locality and persistence

Efficient Data Mapping among Heterogeneous Devices
[Figure: example data items marked on a chart and mapped to devices]
• Columns in a column store that are scanned through only infrequently → place on Flash
• Frequently-updated index for a Content Delivery Network (CDN) → place in DRAM
• Applications or system software can provide hints for data placement
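
One way to picture hint-driven placement is a small policy function that maps the two characteristics named earlier (locality and persistence) to a device. The sketch below is a hypothetical policy, not the PMM's actual algorithm: the hint structure, the threshold, and the three-way device choice are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Devices in the heterogeneous array, taken from the slide's examples. */
typedef enum { DEV_DRAM, DEV_NVM, DEV_FLASH } device_t;

/* Hypothetical hint that an application or system software might pass. */
typedef struct {
    double accesses_per_sec; /* locality: how hot the data is */
    bool   must_persist;     /* persistence: must survive power loss */
} placement_hint_t;

/* Assumed threshold separating "hot" from "cold" data; illustrative only. */
#define HOT_ACCESSES_PER_SEC 1000.0

/* Pick a device for a piece of data based on its hint. */
device_t choose_device(placement_hint_t hint)
{
    assert(hint.accesses_per_sec >= 0.0);

    bool hot = hint.accesses_per_sec >= HOT_ACCESSES_PER_SEC;

    if (hot && !hint.must_persist)
        return DEV_DRAM;  /* e.g., a frequently updated CDN index */
    if (hot)
        return DEV_NVM;   /* hot and persistent: a fast non-volatile device */
    return DEV_FLASH;     /* e.g., rarely scanned column-store columns */
}
```

A real manager would also have to weigh capacity, endurance, and the cost of migrating data between devices, which this sketch ignores.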

Evaluated Systems
• HDD Baseline
  - Traditional system with volatile DRAM memory and persistent HDD storage
  - Overheads of operating system and file system code and buffering
• NVM Baseline (NB)
  - Same as HDD Baseline, but HDD is replaced with NVM
  - Still has OS/FS overheads of the two-level storage model
• Persistent Memory (PM)
  - Uses only NVM (no DRAM) to ensure full-system persistence
  - All data accessed using loads and stores
  - Does not waste time on system calls
  - Data is manipulated directly on the NVM device

Performance Benefits of a Single-Level Store
[Figure: performance comparison of the evaluated systems; annotated improvements: ~24X and ~5X]
Meza+, "A Case for Efficient Hardware-Software Cooperative Management of Storage and Memory," WEED 2013.

Energy Benefits of a Single-Level Store
[Figure: energy comparison of the evaluated systems; annotated improvements: ~16X and ~5X]
Meza+, "A Case for Efficient Hardware-Software Cooperative Management of Storage and Memory," WEED 2013.

On Persistent Memory Benefits & Challenges
• Justin Meza, Yixin Luo, Samira Khan, Jishen Zhao, Yuan Xie, and Onur Mutlu,
  "A Case for Efficient Hardware-Software Cooperative Management of Storage and Memory"
  Proceedings of the 5th Workshop on Energy-Efficient Design (WEED), Tel-Aviv, Israel, June 2013.

Challenge and Opportunity
Combined Memory & Storage

Challenge and Opportunity
A Unified Interface to All Data

Intel Optane Persistent Memory (2019)
• Non-volatile main memory
• Based on 3D XPoint Technology
https://www.storagereview.com/intel_optane_dc_persistent_memory_module_pmm

UPMEM Processing-in-DRAM Engine (2019)
• Processing in DRAM Engine
• Includes standard DIMM modules, with a large number of DPU processors combined with DRAM chips.
• Replaces standard DIMMs
  - DDR4 R-DIMM modules
    - 8GB+128 DPUs (16 PIM chips)
    - Standard 2x-nm DRAM process
  - Large amounts of compute & memory bandwidth
https://www.anandtech.com/show/14750/hot-chips-31-analysis-inmemory-processing-by-upmem
https://www.upmem.com/video-upmem-presenting-its-true-processing-in-memory-solution-hot-chips-2019/

One Key Challenge in Persistent Memory
• How to ensure consistency of system/data if all memory is persistent?
• Two extremes
  - Programmer transparent: Let the system handle it
  - Programmer only: Let the programmer handle it
• Many alternatives in-between…

CRASH CONSISTENCY PROBLEM
Add a node to a linked list
  1. Link to next
  2. Link to prev
System crash can result in inconsistent memory state
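
The problem can be reproduced in a few lines of code. The sketch below inserts a node into a doubly linked list with two separate pointer updates and optionally "crashes" (returns early) between them. The type and function names are hypothetical, and which pointer update the slide labels as step 1 or step 2 is an assumption here. In a persistent memory, the half-updated list would survive the crash.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct node {
    int          value;
    struct node *prev;
    struct node *next;
} node_t;

/*
 * Insert `n` right after `pos`. If `crash_midway` is true, return after
 * the first link update to mimic a crash between the two updates.
 */
void insert_after(node_t *pos, node_t *n, bool crash_midway)
{
    assert(pos != NULL && n != NULL);

    node_t *succ = pos->next;
    n->prev = pos;
    n->next = succ;

    pos->next = n;        /* step 1: forward link now reaches the new node */
    if (crash_midway)
        return;           /* simulated crash: step 2 never happens */
    if (succ != NULL)
        succ->prev = n;   /* step 2: backward link now reaches the new node */
}

/* A list is consistent if every forward link has a matching back link. */
bool list_is_consistent(const node_t *head)
{
    for (const node_t *cur = head; cur != NULL && cur->next != NULL;
         cur = cur->next) {
        if (cur->next->prev != cur)
            return false;
    }
    return true;
}
```

After a simulated crash, a forward traversal sees the new node but a backward traversal skips it, so `list_is_consistent` reports false.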

CURRENT SOLUTIONS
Explicit interfaces to manage consistency
  - NV-Heaps [ASPLOS'11], BPFS [SOSP'09], Mnemosyne [ASPLOS'11]

    AtomicBegin {
        Insert a new node;
    } AtomicEnd;

• Limits adoption of NVM
• Have to rewrite code with clear partition between volatile and non-volatile data
• Burden on the programmers
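
One common way to implement such an atomic section is undo logging: record the old value before every store, and roll the records back if a crash interrupts the section. The sketch below simulates this in ordinary memory. It is not the mechanism of NV-Heaps, BPFS, or Mnemosyne, and all names (`pm_atomic_begin`, `pm_logged_store`, and so on) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define UNDO_LOG_CAPACITY 16

/* One undo record: where a word lives and what it held before the update. */
typedef struct {
    long *addr;
    long  old_value;
} undo_entry_t;

typedef struct {
    undo_entry_t entries[UNDO_LOG_CAPACITY];
    size_t       count;
    bool         active; /* true between begin and end of a section */
} undo_log_t;

/* Start an atomic section with an empty log. */
void pm_atomic_begin(undo_log_t *ulog)
{
    assert(ulog != NULL);
    ulog->count = 0;
    ulog->active = true;
}

/* Record the old value first, then perform the store. */
void pm_logged_store(undo_log_t *ulog, long *addr, long value)
{
    assert(ulog != NULL && addr != NULL);
    assert(ulog->active && ulog->count < UNDO_LOG_CAPACITY);
    ulog->entries[ulog->count].addr = addr;
    ulog->entries[ulog->count].old_value = *addr;
    ulog->count++;
    *addr = value;
}

/* Commit: the updates stand, so the undo records are discarded. */
void pm_atomic_end(undo_log_t *ulog)
{
    assert(ulog != NULL);
    ulog->count = 0;
    ulog->active = false;
}

/* After a crash: roll back an unfinished section, newest record first. */
void pm_recover(undo_log_t *ulog)
{
    assert(ulog != NULL);
    if (!ulog->active)
        return;
    while (ulog->count > 0) {
        ulog->count--;
        *ulog->entries[ulog->count].addr = ulog->entries[ulog->count].old_value;
    }
    ulog->active = false;
}
```

On real persistent memory, each log record would also have to reach the non-volatile device before the corresponding data store does, which requires explicit cache flushes and ordering fences that this simulation leaves out.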

CURRENT SOLUTIONS
Explicit interfaces to manage consistency
  - NV-Heaps [ASPLOS'11], BPFS [SOSP'09], Mnemosyne [ASPLOS'11]

Example code: update a node in a persistent hash table

    void hashtable_update(hashtable_t* ht, void *key, void *data)
    {
        /* Find the bucket chain that holds this key. */
        list_t* chain = get_chain(ht, key);
        pair_t* pair;
        pair_t updatePair;

        /* Look up the existing (key, value) pair and overwrite its value. */
        updatePair.first = key;
        pair = (pair_t*) list_find(chain, &updatePair);
        pair->second = data;
    }

CURRENT SOLUTIONS
The same update, rewritten for an explicit consistency interface:

    void TMhashtable_update(TMARCGDECL hashtable_t* ht, void *key, void *data)
    {
        list_t* chain = get_chain(ht, key);
        pair_t* pair;
        pair_t updatePair;

        updatePair.first = key;
        pair = (pair_t*) TMLIST_FIND(chain, &updatePair);
        pair->second = data;
    }

Issues annotated on this code:
• Manual declaration of persistent components
• Need a new implementation
• Third party code can be inconsistent
• Prohibited operation
Burden on the programmers

OUR APPROACH: ThyNVM
Goal: Software transparent consistency in persistent memory systems
Key Idea: Periodically checkpoint state; recover to previous checkpoint on crash
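
The key idea can be illustrated with a purely software toy: keep a working copy of the state and a last-known-good checkpoint, copy working to checkpoint at the end of each epoch, and copy back after a crash. This is only a sketch of the concept. ThyNVM does this in hardware and far more efficiently, and the names and the fixed-size state below are assumptions.

```c
#include <assert.h>
#include <string.h>

#define STATE_WORDS 4

/* Working state plus the last consistent checkpoint of it. */
typedef struct {
    long working[STATE_WORDS];
    long checkpoint[STATE_WORDS];
} checkpointed_state_t;

/* End of an epoch: save the working state as the new checkpoint. */
void take_checkpoint(checkpointed_state_t *s)
{
    assert(s != NULL);
    memcpy(s->checkpoint, s->working, sizeof s->working);
}

/* After a crash: discard partial updates and return to the checkpoint. */
void recover_from_crash(checkpointed_state_t *s)
{
    assert(s != NULL);
    memcpy(s->working, s->checkpoint, sizeof s->checkpoint);
}
```

The program itself updates `working` with plain stores and needs no annotations, which is the sense in which the approach is software transparent. The price is that updates made since the last checkpoint are lost on a crash.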

ThyNVM: Summary
A new hardware-based checkpointing mechanism
• Checkpoints at multiple granularities to reduce both checkpointing latency and metadata overhead
• Overlaps checkpointing and execution to reduce checkpointing latency
• Adapts to DRAM and NVM characteristics
Performs within 4.9% of an idealized DRAM with zero cost consistency

2. OVERLAPPING CHECKPOINTING AND EXECUTION
[Figure: two timelines over time, showing the Running and Checkpointing phases of Epoch 0, Epoch 1, and Epoch 2]
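
A simple timing model shows why overlapping helps. The sketch below compares a scheme in which execution stalls during every checkpoint with one in which the checkpoint of an epoch proceeds while the next epoch runs. The model and its function names are simplifying assumptions for illustration, not measurements or formulas from the ThyNVM paper.

```c
#include <assert.h>

/* Stalled scheme: execution waits for every checkpoint to finish. */
double total_time_stalled(int epochs, double run_time, double ckpt_time)
{
    assert(epochs >= 0 && run_time >= 0.0 && ckpt_time >= 0.0);
    return epochs * (run_time + ckpt_time);
}

/*
 * Overlapped scheme: the checkpoint of epoch i proceeds while epoch i+1
 * runs. Assumption: the next checkpoint cannot start until the previous
 * one is done, so each overlapped step takes max(run_time, ckpt_time).
 */
double total_time_overlapped(int epochs, double run_time, double ckpt_time)
{
    assert(epochs >= 0 && run_time >= 0.0 && ckpt_time >= 0.0);
    if (epochs == 0)
        return 0.0;

    double step = run_time > ckpt_time ? run_time : ckpt_time;

    /* The first epoch runs alone; the last checkpoint runs alone. */
    return run_time + (epochs - 1) * step + ckpt_time;
}
```

For three epochs with an assumed run time of 10 units and checkpoint time of 2 units, the stalled scheme takes 36 units and the overlapped one 32, because only the final checkpoint remains exposed.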

More About ThyNVM
• Jinglei Ren, Jishen Zhao, Samira Khan, Jongmoo Choi, Yongwei Wu, and Onur Mutlu,
  "ThyNVM: Enabling Software-Transparent Crash Consistency in Persistent Memory Systems"
  Proceedings of the 48th International Symposium on Microarchitecture (MICRO), Waikiki, Hawaii, USA, December 2015.

Another Key Challenge in Persistent Memory
Programming Ease to Exploit Persistence

Tools/Libraries to Help Programmers
• Himanshu Chauhan, Irina Calciu, Vijay Chidambaram, Eric Schkufza, Onur Mutlu, and Pratap Subrahmanyam,
  "NVMove: Helping Programmers Move to Byte-Based Persistence"
  Proceedings of the 4th Workshop on Interactions of NVM/Flash with Operating Systems and Workloads (INFLOW), Savannah, GA, USA, November 2016.

Another Key Challenge in Persistent Memory
Security and Data Privacy Issues

Security and Privacy Issues of NVM
• Endurance problems → Wearout attacks
• Hybrid memories → Performance attacks
• Data not erased after power-off → Privacy breaches

Conclusion

The Future of Emerging Technologies is Bright
• Regardless of challenges
  - in underlying technology and overlying problems/requirements
Can enable:
- Orders of magnitude improvements
- New applications and computing systems
Yet, we have to
- Think across the stack
- Design enabling systems
[Figure: the computing stack: Problem, Algorithm, Program/Language, System Software, SW/HW Interface, Micro-architecture, Logic, Devices, Electrons]

If In Doubt, Refer to Flash Memory
• A very “doubtful” emerging technology
  - for at least two decades
Proceedings of the IEEE, Sept. 2017
https://arxiv.org/pdf/1706.08642

Many Research & Design Opportunities
• Enabling completely persistent memory
• Computation in/using NVM based memories
• Hybrid memory systems
• Security and privacy issues in persistent memory
• Reliability and endurance related problems
• Virtual memory systems for NVM (virtual block interface)
• …
