Rethinking System Support for Persistent Memory Samira Khan


TWO-LEVEL STORAGE MODEL
• MEMORY (DRAM): CPU access with Ld/St; volatile, fast, byte-addressable
• STORAGE: access through file I/O; nonvolatile, slow, block-addressable
• NVM: PCM, STT-RAM
Non-volatile memories combine characteristics of memory and storage.

VISION: UNIFY MEMORY AND STORAGE
• The CPU accesses NVM as PERSISTENT MEMORY with Ld/St
• Provides an opportunity to manipulate persistent data directly in memory
• Avoids reading and writing back data to/from storage

CHALLENGE: MEMORY & STORAGE SYSTEM SUPPORT
• Two-level model: APPLICATION on top of OS/SYSTEM, with MEMORY (Ld/St) and STORAGE (FILE I/O)
• System support: crash consistency, availability, compression, integrity check, encryption
• With NVM: APPLICATION, OS/SYSTEM, PERSISTENT MEMORY (Ld/St)
Overhead in the OS/storage layer overshadows the benefit of the nanosecond access latency of NVM.

CHALLENGE: MEMORY & STORAGE SYSTEM SUPPORT
• Two-level model: APPLICATION, OS/SYSTEM, MEMORY (Ld/St), STORAGE (FILE I/O); crash consistency and availability
• With NVM: APPLICATION accesses PERSISTENT MEMORY with Ld/St
In PM, the application layer, not the operating system, is responsible for crash consistency.

CHALLENGE: MEMORY & STORAGE SYSTEM SUPPORT
• Software: crash consistency, availability
• Hardware: compression, integrity check, encryption
• Stack with NVM: APPLICATION, PERSISTENT MANAGER, PERSISTENT MEMORY
In PM, hardware, not the operating system, is responsible for much of the system support.

GOAL: END-TO-END SYSTEM FOR PERSISTENT MEMORY
Stack (software to hardware): PROBLEM, APPLICATION, COMPILER, OS, PERSISTENT MANAGER, ARCHITECTURE, CIRCUITS; the CPU reaches PERSISTENT MEMORY with Ld/St
• How to write consistent code?
• How to test the code is correct?
• How to recover and resume application and OS?
• How to provide efficient hardware support?
Full-stack support for persistent memory applications

CURRENT WORKS SPAN THE WHOLE STACK
• Efficient Persistent Programming and Resumption of the System (WEED'15; submitted to ASPLOS'20)
• Runtime Consistency Testing (ASPLOS'19)
• Pre-Execution of System Support (ISCA'19)
• Efficient Logging Mechanisms (HPCA'18, MICRO'15)
Programming and testing techniques for persistent memory applications
Efficient hardware and ISA support for persistent memory

Rethinking System Support
• NON-VOLATILE MEMORY, PERSISTENT MEMORY: Unified Memory and Storage
• PMTEST: Testing for Correctness (ASPLOS'19)
• JANUS: Optimizing for Efficiency (ISCA'19)
• Conclusion

PERSISTENT MEMORY PROGRAMMING
• Support for crash consistency has two fundamental guarantees
  • Durability: writes become persistent in PM
  • Ordering: one write becomes persistent in PM before another
• Durability guarantee: write data back from the volatile cache to the persistent PM-DIMM (flush A)
• Ordering guarantee: to write A before B, write back A, issue a barrier, then write back B

PERSISTENT MEMORY PROGRAMMING
Two different ways to program persistent applications:
• Expert: uses low-level primitives; understands the hardware; understands the algorithm
• Normal: uses a high-level interface; does not need to know details of hardware or algorithm

PERSISTENT MEMORY PROGRAMMING (LOW-LEVEL)
• Hardware provides low-level primitives for crash consistency
• Exposes instructions for cache flush and barriers
  • sfence, clwb from x86
  • dc cvap from ARM
  • Academic proposals, e.g., ofence, dfence
• Paths to the PM-DIMM: x86 (clwb, sfence), ARM (dc cvap, dsb), new instructions
[Kiln'13, ThyNVM'15, DPO'16, JUSTDOLogging'16, ATOM'17, HOPS'17, etc.]

PROGRAMMING USING LOW-LEVEL PRIMITIVES

    void listAppend(item_t new_val) {
        node_t* new_node = new node_t(new_val);  // create new_node
        new_node->next = head;
        head = new_node;                         // update head pointer
        persist_barrier();                       // write back updates
    }

Writes to PM can reorder: the head update can reach PM while new_node is still in the cache. In that case new_node is lost after a failure, leaving an inconsistent linked list.

PROGRAMMING USING LOW-LEVEL PRIMITIVES

    void listAppend(item_t new_val) {
        node_t* new_node = new node_t(new_val);
        new_node->next = head;
        persist_barrier();   // enforce writeback of new_node before changing head
        head = new_node;
        persist_barrier();
    }

The linked list is now consistent: new_node is in PM before head can point to it.
Ensuring crash consistency with low-level primitives is HARD!
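The effect of the extra barrier can be checked with a small simulation. The sketch below is not from the talk: it assumes a toy model in which stores land in a volatile cache, the hardware may evict any subset of dirty lines on its own, and persist_barrier() writes everything back. PmModel, crash_images, consistent, and append_is_crash_safe are hypothetical names chosen for illustration.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

using Image = std::map<std::string, int>;  // location name -> value

// Toy model (an assumption, not real hardware): a volatile write-back cache
// in front of persistent memory.
struct PmModel {
    Image pm;     // contents that survive a crash
    Image cache;  // dirty lines, lost on a crash unless written back

    void store(const std::string& loc, int value) { cache[loc] = value; }

    void persist_barrier() {  // write back every dirty line
        for (const auto& line : cache) pm[line.first] = line.second;
        cache.clear();
    }

    // Every PM image a crash could leave behind right now: any subset of the
    // dirty lines may already have been evicted to PM.
    std::vector<Image> crash_images() const {
        std::vector<std::pair<std::string, int>> dirty(cache.begin(), cache.end());
        std::vector<Image> images;
        for (std::size_t mask = 0; mask < (std::size_t{1} << dirty.size()); ++mask) {
            Image img = pm;
            for (std::size_t i = 0; i < dirty.size(); ++i)
                if (mask & (std::size_t{1} << i)) img[dirty[i].first] = dirty[i].second;
            images.push_back(img);
        }
        return images;
    }
};

// List invariant: head is 0 (empty list) or names a node whose fields are in PM.
bool consistent(const Image& img) {
    auto head = img.find("head");
    if (head == img.end() || head->second == 0) return true;
    const std::string node = "node" + std::to_string(head->second);
    return img.count(node + ".val") && img.count(node + ".next");
}

// Appends node 1 to an empty list and checks every crash point.
// Returns true if no crash can expose an inconsistent list.
bool append_is_crash_safe(bool barrier_before_head) {
    PmModel m;
    m.pm["head"] = 0;
    bool safe = true;
    auto check = [&] {
        for (const auto& img : m.crash_images()) safe = safe && consistent(img);
    };
    m.store("node1.val", 42);  check();
    m.store("node1.next", 0);  check();
    if (barrier_before_head) m.persist_barrier();  // the fix from the slide
    m.store("head", 1);        check();
    m.persist_barrier();       check();
    return safe;
}
```

In this model, append_is_crash_safe(false) finds the crash image in which head reached PM but the node did not, while append_is_crash_safe(true) finds none.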


PERSISTENT MEMORY PROGRAMMING (HIGH-LEVEL)
• Libraries provide transactions on top of low-level primitives
  • Intel's PMDK
  • Academic proposals
• Example: AtomicBegin { Append a new node; } AtomicEnd;
• Uses logging mechanisms to atomically commit the updates
[NV-Heaps'11, Mnemosyne'11, ATLAS'14, REWIND'15, NVL-C'16, NVThreads'17, LSNVMM'17, etc.]

PROGRAMMING USING TRANSACTIONS

    void ListAppend(item_t new_val) {
        TX_BEGIN {
            node_t *new_node = makeNode(new_val);  // create new_node
            TX_ADD(List.head, sizeof(node_t*));    // back up head
            List.head = new_node;                  // update head
            List.length++;                         // update length
        } TX_END
    }

length is not backed up before the update!

PROGRAMMING USING TRANSACTIONS

    void ListAppend(item_t new_val) {
        TX_BEGIN {
            node_t *new_node = makeNode(new_val);
            TX_ADD(List.head, sizeof(node_t*));
            List.head = new_node;
            TX_ADD(List.length, sizeof(unsigned));  // back up length before update
            List.length++;
        } TX_END
    }

Ensuring crash consistency with transactions is still HARD!
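To see why the missing backup matters, here is a minimal in-memory sketch of undo logging. It is not PMDK's implementation: UndoLog, tx_add, rollback, and append_then_crash are hypothetical names, and the persistence of the log itself is not modeled. A crash before the end of the transaction is simulated by rolling back whatever was logged.

```cpp
#include <cstddef>
#include <cstring>
#include <utility>
#include <vector>

// Hypothetical undo log: tx_add() snapshots a byte range before it is
// modified; rollback() restores the snapshots in reverse order, which is
// what recovery would do for a transaction that never committed.
class UndoLog {
public:
    void tx_add(void* addr, std::size_t size) {
        Entry e{addr, std::vector<unsigned char>(size)};
        std::memcpy(e.old_bytes.data(), addr, size);
        entries_.push_back(std::move(e));
    }
    void rollback() {
        for (auto it = entries_.rbegin(); it != entries_.rend(); ++it)
            std::memcpy(it->addr, it->old_bytes.data(), it->old_bytes.size());
        entries_.clear();
    }

private:
    struct Entry {
        void* addr;
        std::vector<unsigned char> old_bytes;
    };
    std::vector<Entry> entries_;
};

struct Node { int value; Node* next; };
struct List { Node* head = nullptr; unsigned length = 0; };

// Runs the append, then simulates a crash before TX_END by rolling back.
// Returns true if the list is back in its pre-transaction state.
bool append_then_crash(List& list, Node& node, bool log_length) {
    Node* const old_head = list.head;
    const unsigned old_length = list.length;

    UndoLog log;
    node.next = list.head;
    log.tx_add(&list.head, sizeof(list.head));  // back up head
    list.head = &node;
    if (log_length) log.tx_add(&list.length, sizeof(list.length));
    list.length++;

    log.rollback();  // crash: recovery undoes only what was logged
    return list.head == old_head && list.length == old_length;
}
```

Without the second tx_add, the rollback restores head but leaves length incremented, so the recovered list reports one element more than it holds.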

PERSISTENT MEMORY PROGRAMMING IS HARD
• Expert: uses low-level primitives; understands the hardware; understands the algorithm
• Normal: uses a high-level interface; does not need to know details of hardware or algorithm
Both expert and normal programmers can make mistakes.

PERSISTENT MEMORY PROGRAMMING IS HARD
We need a tool to detect crash consistency bugs!

REQUIREMENTS OF THE TOOL
• Fast
• Flexible
  • PM Libraries [PMDK, NV-Heaps'11, Mnemosyne'11, ATLAS'14, REWIND'15, NVL-C'16, NVThreads'17, LSNVMM'17, etc.]
  • Kernel Modules [PMFS'14, BPFS'09, NOVA'16, NOVA-Fortis'17, Strata'17, SCMFS'11, etc.]
  • Custom Programs (e.g., custom database, key-value store, etc.)
  • Existing HW [x86, ARM, etc.]
  • Future HW and Models [DPO'16, HOPS'17, etc.]

Our work: PMTest
• Flexible: kernel modules, PM libraries, custom programs, existing HW, academic proposals
• Fast: less than 2X overhead in real workloads
• PMTest has detected new bugs in PMFS and PMDK applications
• Artifact available at pmtest.persistentmemory.org

PMTEST KEY IDEAS: FLEXIBLE
• Many different programming models and hardware primitives are available
  • PM Program, call library, PMDK Library: write, sfence, clwb (x86)
  • PM Program, call library, Mnemosyne Library: write, dc cvap, dsb (ARM)
  • PM Kernel Module: write, sfence, clwb (x86)
The challenge is to support different hardware and software models.

PMTEST KEY IDEAS: FLEXIBLE
• Operations that maintain crash consistency are similar: ordering and durability guarantees
Our key idea is to test for these two fundamental guarantees, which in turn can cover all hardware-software variations.

PMTEST KEY IDEAS: FAST
• Prior work [Yat'14] uses exhaustive testing: O(n!) orderings for n writes between two sfences
  • write A, write B, write C, ...
  • write B, write A, write C, ...
  • write C, write B, write A, ...
  • write A, write C, write B, ...
  • write B, write C, write A, ...
  • write C, write A, write B, ...
• Each ordering is checked: recoverable?
Exhaustive testing is time consuming and not practical.
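The growth behind the O(n!) figure is easy to reproduce. The sketch below only enumerates the orders in which n writes between two fences could persist; it is an illustration of the count, not a description of how Yat is implemented. persist_orders and factorial are hypothetical helper names.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// All orders in which the writes issued between two sfences could reach PM.
std::vector<std::vector<std::string>> persist_orders(std::vector<std::string> writes) {
    std::sort(writes.begin(), writes.end());  // next_permutation needs a sorted start
    std::vector<std::vector<std::string>> orders;
    do {
        orders.push_back(writes);
    } while (std::next_permutation(writes.begin(), writes.end()));
    return orders;
}

// Number of orderings for n distinct writes (fits in 64 bits up to n = 20).
std::uint64_t factorial(unsigned n) {
    std::uint64_t result = 1;
    for (unsigned i = 2; i <= n; ++i) result *= i;
    return result;
}
```

Three writes give the six orderings listed on the slide; ten writes between two fences already give 3,628,800, each of which would need its own recovery check.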

PMTEST KEY IDEAS: FAST
• Reduce test time by using only one dynamic trace
• Runtime trace from the persistent memory application: sfence, write C, write B, write A, ..., sfence; recoverable?
A significant improvement over O(n!) testing.

PMTEST KEY IDEAS: FAST
• PMTest infers the persistence interval from the PM operation trace: the interval in which a write can possibly become persistent
• Trace: write A, clwb A, sfence, write B, clwb B, sfence
  • The intervals of A and B are disjoint on the timeline: A persists before B
  • A disjoint interval indicates that no reordering in the hardware will lead to a case where A does not persist before B
• Trace: write A, write B, clwb A, sfence, clwb B, sfence
  • The intervals of A and B interleave: A may NOT persist before B
  • An overlapping interval indicates that there is a case where A does not persist before B
• Query on the second trace: does A persist before B? No
Querying the trace can detect any violation of the ordering and durability guarantees at runtime.
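The interval idea can be sketched in a few lines. This is a simplified x86-style model written for illustration, not PMTest's code: a write's interval is assumed to open at the write and to close at the first sfence that follows a clwb of the same address. Event, Interval, infer_intervals, is_persisted, and is_ordered_before are hypothetical names.

```cpp
#include <cstddef>
#include <limits>
#include <map>
#include <set>
#include <string>
#include <vector>

enum class Op { Write, Clwb, Sfence };

struct Event {
    Op op;
    std::string addr;  // ignored for Sfence
};

// Marks an interval that the trace never closes (write not guaranteed durable).
constexpr std::size_t kOpen = std::numeric_limits<std::size_t>::max();

struct Interval {
    std::size_t begin;  // trace index of the write
    std::size_t end;    // trace index of the closing sfence, or kOpen
};

using Intervals = std::map<std::string, Interval>;

// One pass over a single trace: O(n) instead of trying every ordering.
Intervals infer_intervals(const std::vector<Event>& trace) {
    Intervals intervals;
    std::set<std::string> flushed;  // written back, waiting for a fence
    for (std::size_t t = 0; t < trace.size(); ++t) {
        const Event& e = trace[t];
        if (e.op == Op::Write) {
            intervals[e.addr] = Interval{t, kOpen};
            flushed.erase(e.addr);  // a new write needs a new flush
        } else if (e.op == Op::Clwb) {
            auto it = intervals.find(e.addr);
            if (it != intervals.end() && it->second.end == kOpen) flushed.insert(e.addr);
        } else {  // Sfence: every flushed write is persistent from here on
            for (const std::string& addr : flushed) intervals[addr].end = t;
            flushed.clear();
        }
    }
    return intervals;
}

// Durability check: the interval was closed by a flush plus fence.
bool is_persisted(const Intervals& iv, const std::string& a) {
    auto it = iv.find(a);
    return it != iv.end() && it->second.end != kOpen;
}

// Ordering check: a's interval ends before b's interval begins (disjoint).
bool is_ordered_before(const Intervals& iv, const std::string& a, const std::string& b) {
    auto ia = iv.find(a);
    auto ib = iv.find(b);
    return ia != iv.end() && ib != iv.end() && ia->second.end < ib->second.begin;
}
```

For the first trace on the slide the model gives A the interval [0, 2] and B the interval [3, 5], so the ordering query succeeds. For the second trace it gives A [0, 3] and B [1, 5], which overlap, so the same query answers no.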

PMTEST OVERVIEW
• Offline: Testing Annotation, Persistent Memory Application
• Online: Checking Rules, PMTest, Testing Results

SUMMARY SO FAR
• It is hard to guarantee crash consistency in persistent memory applications
• Our tool PMTest is fast and flexible
  • Flexible: supports kernel modules, custom PM programs, transaction-based programs
  • Fast: incurs < 2X overhead in real-workload applications
• PMTest has detected 3 new bugs in PMFS and PMDK applications
• PMTest: pmtest.persistentmemory.org

CHALLENGE: MEMORY & STORAGE SYSTEM SUPPORT
• Software: crash consistency (PMTest)
• Hardware: compression, integrity check, encryption
• Stack with NVM: APPLICATION (Ld/St), PERSISTENT MANAGER, PERSISTENT MEMORY
In PM, hardware, not the operating system, is responsible for much of the system support.


MEMORY AND STORAGE SUPPORT
The memory and storage support is designed for:
• Security: prevent attackers from stealing or tampering with data (encryption, integrity verification, etc.)
• Bandwidth: improve NVM's limited bandwidth (deduplication, compression, etc.)
• Endurance: extend NVM's limited lifetime (wear-leveling, error correction, etc.)
We refer to the memory and storage support as backend memory operations.

BACKEND MEMORY OPERATION LATENCY
• Write access timeline: core, cache writeback, memory controller, backend memory operations, NVM access
• Recent NVM support guarantees that writes accepted by the memory controller are non-volatile
• Latency to persistence: ~15 ns for the cache writeback, >100 ns with backend memory operations

WHY IS WRITE LATENCY IMPORTANT?
• NVM programs need to use crash consistency mechanisms that enforce data writeback
• persist_barrier: writes data back from the volatile cache (core side) to non-volatile NVM

WRITE LATENCY IN NVM PROGRAMS
• Example: steps in an undo logging transaction: Backup, Update, Commit, separated by writebacks from cache (persist_barrier)
• Execution cannot continue until the writeback completes
• The crash consistency mechanism puts write latency on the critical path
• Backend memory operations increase the writeback latency, which delays Commit
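A back-of-the-envelope version of this argument, as a sketch: assume each of the three steps does a fixed amount of work and then stalls for one writeback. The 10 ns of work per step is a made-up figure for illustration; 15 ns and 100 ns stand in for the ~15 ns and >100 ns latencies quoted earlier in the talk. transaction_time_ns is a hypothetical helper.

```cpp
// Critical-path time of a transaction in which every step waits for its
// writeback (persist_barrier) before the next step may start.
int transaction_time_ns(int steps, int step_work_ns, int writeback_ns) {
    return steps * (step_work_ns + writeback_ns);
}
```

Under these assumptions, three steps take 3 * (10 + 15) = 75 ns with a 15 ns writeback and 3 * (10 + 100) = 330 ns with a 100 ns writeback, so the time is dominated by the writeback latency rather than by the work.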

Backend memory operations are on the critical path.
How to reduce the latency?

OBSERVATION
• Each backend memory operation seems indivisible
• Integration leads to serialized operations: counter-mode encryption, then integrity verification, then deduplication

OBSERVATION
• However, it is possible to decompose them into sub-operations
• Example: counter-mode encryption decomposes into generate counter, encrypt counter, and combining the data with the encrypted counter
• Generate MAC (for integrity verification)

KEY IDEA I: PARALLELIZATION
After decomposing the example operations (counter-mode encryption, integrity verification, deduplication), there are two types of dependencies:
1. Intra-operation dependency: dependency within each operation
2. Inter-operation dependency: dependency across different operations when they cooperate
Sub-operations without dependency can execute in parallel.
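The gain from parallelization is the difference between the sum of all sub-operation latencies and the longest dependency chain. The sketch below computes both for a small graph. The sub-operation names follow the slides, but the latencies and the exact dependency edges are assumptions made up for illustration, and SubOp, serialized_latency, parallel_latency, and example_ops are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

struct SubOp {
    std::string name;
    int latency_ns;                 // hypothetical latency
    std::vector<std::size_t> deps;  // indices of earlier sub-operations
};

// Serialized execution: every sub-operation waits for the previous one.
int serialized_latency(const std::vector<SubOp>& ops) {
    int total = 0;
    for (const SubOp& op : ops) total += op.latency_ns;
    return total;
}

// Parallel execution: a sub-operation starts once all of its dependencies
// have finished. Assumes ops are listed in topological order.
int parallel_latency(const std::vector<SubOp>& ops) {
    std::vector<int> finish(ops.size(), 0);
    int total = 0;
    for (std::size_t i = 0; i < ops.size(); ++i) {
        int start = 0;
        for (std::size_t d : ops[i].deps) start = std::max(start, finish[d]);
        finish[i] = start + ops[i].latency_ns;
        total = std::max(total, finish[i]);
    }
    return total;
}

// Illustrative graph only: edges 0->1->2 are intra-operation (encryption),
// edge 2->3 is inter-operation (MAC over the encrypted data), and the
// deduplication hash has no dependency on the others.
std::vector<SubOp> example_ops() {
    return {
        {"generate counter", 5, {}},
        {"encrypt counter", 40, {0}},
        {"combine data with encrypted counter", 1, {1}},
        {"generate MAC", 40, {2}},
        {"deduplication hash", 30, {}},
    };
}
```

With these made-up numbers the serialized latency is 116 ns and the parallel latency is 86 ns: the deduplication hash overlaps with the encryption chain, while the dependent chain still runs in order.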

KEY IDEA II: PRE-EXECUTION
• A write consists of an address and data (external dependency)
• Sub-operations can pre-execute as soon as their data/address dependency is resolved
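Pre-execution can be sketched as a scheduling question: how much backend latency is still left after the writeback is issued? In the model below each sub-operation declares whether it needs the address, the data, or only earlier sub-operations. The latencies, dependency edges, and timings are assumptions for illustration, and SubOp, WriteTiming, and exposed_latency are hypothetical names, not the Janus interface.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

struct SubOp {
    std::string name;
    int latency_ns;                 // hypothetical latency
    bool needs_addr;                // external dependency on the write address
    bool needs_data;                // external dependency on the write data
    std::vector<std::size_t> deps;  // indices of earlier sub-operations
};

struct WriteTiming {
    int addr_ready_ns;  // when the address is known
    int data_ready_ns;  // when the data is known
    int writeback_ns;   // when the writeback is issued (inputs are ready by then)
};

// Backend latency that remains on the critical path after the writeback is
// issued. Without pre-execution nothing starts before the writeback; with
// it, a sub-operation starts as soon as its own inputs are resolved.
// Assumes ops are listed in topological order.
int exposed_latency(const std::vector<SubOp>& ops, const WriteTiming& t, bool pre_execute) {
    std::vector<int> finish(ops.size(), 0);
    int done = t.writeback_ns;
    for (std::size_t i = 0; i < ops.size(); ++i) {
        int start = pre_execute ? 0 : t.writeback_ns;
        if (pre_execute && ops[i].needs_addr) start = std::max(start, t.addr_ready_ns);
        if (pre_execute && ops[i].needs_data) start = std::max(start, t.data_ready_ns);
        for (std::size_t d : ops[i].deps) start = std::max(start, finish[d]);
        finish[i] = start + ops[i].latency_ns;
        done = std::max(done, finish[i]);
    }
    return done - t.writeback_ns;
}

// Illustrative chain: the counter needs only the address, so it and its
// encryption can run before the data exists.
std::vector<SubOp> example_ops() {
    return {
        {"generate counter", 5, true, false, {}},
        {"encrypt counter", 40, false, false, {0}},
        {"combine data with encrypted counter", 1, false, true, {1}},
        {"generate MAC", 40, false, false, {2}},
    };
}
```

With the address known at 0 ns, the data at 80 ns, and the writeback issued at 100 ns, this model leaves 86 ns exposed without pre-execution and 21 ns with it. The address-only work is hidden completely, and only the part that had to wait for the data remains.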

OUR PROPOSAL: JANUS
Janus is a Roman god with two faces: one looks into the past, and another into the future.
• Past: when dependent data and address become available
• Future: pre-execute operations with dependency resolved

JANUS OVERVIEW
• Timeline of a transaction: Backup, Update, Commit
• Original writeback latency: backend memory operations are serialized
• Parallelization reduces the latency of each operation
• Pre-execution moves their latency off the critical path

PERFORMANCE
• Janus provides a software interface to issue pre-execution
• Compared to a baseline with serialized operations:
  • Janus, manual: 2.35X speedup
  • Janus, automated: 2X speedup


NEEDS STORAGE AND MEMORY SYSTEM SUPPORT
• Software: crash consistency (PMTest, pmtest.persistentmemory.org)
• Hardware: compression, integrity check, encryption (Janus)
• PMTest focuses on correctness of persistent memory applications
• Janus focuses on reducing the overhead of system support

GOAL: END-TO-END SYSTEM FOR PERSISTENT MEMORY
• How to write consistent code?
• How to test the code is correct?
• How to recover and resume application and OS?
• How to provide efficient hardware support?
Full-stack support for persistent memory applications. Many directions to explore!
