Rethinking System Support for Persistent Memory Samira Khan

TWO-LEVEL STORAGE MODEL
• MEMORY (DRAM), accessed by the CPU with Ld/St: volatile, fast, byte-addressable
• STORAGE, accessed with file I/O: nonvolatile, slow, block-addressable
• NVM (PCM, STT-RAM): non-volatile memories combine characteristics of memory and storage

VISION: UNIFY MEMORY AND STORAGE
• The CPU accesses NVM as PERSISTENT MEMORY with Ld/St
• Provides an opportunity to manipulate persistent data directly in memory
• Avoids reading and writing back data to/from storage

CHALLENGE: MEMORY & STORAGE SYSTEM SUPPORT
• Two-level model: APPLICATION, OS/SYSTEM, then MEMORY (Ld/St) and STORAGE (file I/O)
• System support: crash consistency, availability, compression, data integrity, encryption
• Persistent memory: APPLICATION, OS/SYSTEM, then PERSISTENT MEMORY (NVM) with Ld/St
• Overhead in the OS/storage layer overshadows the benefit of the nanosecond access latency of NVM

CHALLENGE: MEMORY & STORAGE SYSTEM SUPPORT
• System support: crash consistency, availability
• The APPLICATION accesses PERSISTENT MEMORY (NVM) directly with Ld/St
• The application layer, not the operating system, is responsible for crash consistency in PM

CHALLENGE: MEMORY & STORAGE SYSTEM SUPPORT
• Software (APPLICATION): crash consistency, availability
• Hardware (PERSISTENT MEMORY MANAGER): compression, data integrity, encryption
• Hardware, not the operating system, is responsible for much of the system support in PM

GOAL: END-TO-END SYSTEM FOR PERSISTENT MEMORY
• Stack, from software to hardware: APPLICATION, COMPILER, OS, PERSISTENT MANAGER, ARCHITECTURE, CIRCUITS
• How to write consistent code?
• How to test that the code is correct?
• How to recover and resume the application and OS?
• How to provide efficient hardware support?
• Full-stack support for persistent memory applications

CURRENT WORKS SPAN THE WHOLE STACK
• Efficient Persistent Programming (WEED'15)
• Resumption of the System (Submitted to ASPLOS'20)
• Runtime Consistency Testing (ASPLOS'19)
• Pre-Execution of System Support (ISCA'19)
• Efficient Logging Mechanisms (HPCA'18, MICRO'15)
• Software: programming and testing techniques for persistent memory applications
• Hardware: efficient hardware and ISA support for persistent memory

PERSISTENT MEMORY (NON-VOLATILE MEMORY): Unified Memory and Storage
• Challenge: High Overhead
• Observation and Key Ideas
• JANUS: Interface and Mechanism (ISCA'19)
• Conclusion

MEMORY AND STORAGE SUPPORT
The memory and storage support is designed for:
• Security: prevent attackers from stealing or tampering with data (encryption, integrity verification, etc.)
• Bandwidth: improve NVM's limited bandwidth (deduplication, compression, etc.)
• Endurance: extend NVM's limited lifetime (wear-leveling, error correction, etc.)
We refer to the memory and storage support as backend memory operations

BACKEND MEMORY OPERATION LATENCY
• Write access timeline: Core, Cache, cache writeback to the Memory Controller, backend memory operations, NVM access
• Data in the core and cache is volatile
• Recent NVM support guarantees that writes accepted by the memory controller are non-volatile

BACKEND MEMORY OPERATION LATENCY
• Write access timeline: cache writeback, backend memory operations in the memory controller, NVM access
• Latency to persistence: ~15 ns for the volatile cache writeback, >100 ns for the backend memory operations before the write becomes non-volatile

WHY IS WRITE LATENCY IMPORTANT?
• NVM programs need to use crash consistency mechanisms that enforce data writeback
• A persist_barrier forces data from the volatile cache (Core, Cache) to the non-volatile NVM

WRITE LATENCY IN NVM PROGRAMS
• Example: steps in an undo logging transaction (Backup, Update, Commit)
• Each step requires a writeback from cache, enforced by persist_barrier
• Execution cannot continue until the writeback completes
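
The three steps of the undo logging transaction can be sketched in C as follows. This is a minimal illustration, not the deck's implementation: `persist_barrier` here is a hypothetical stand-in that only counts stalls (a real one would write cache lines back and fence), and `undo_log_t`, `tx_backup`, `tx_update`, `tx_commit`, and `tx_recover` are names assumed for this sketch.

```c
#include <assert.h>

/* Hypothetical stand-in for a real persist barrier (cache-line writeback
 * plus fence). It only counts how often the program must stall, so the
 * sketch runs on any machine. */
static int barrier_count = 0;
static void persist_barrier(void) { barrier_count++; }

typedef struct { int key; int val; } node_t;

/* One-entry undo log: which location is being updated and its old contents. */
typedef struct {
    node_t *addr;   /* location being updated */
    node_t  old;    /* copy of the old contents */
    int     valid;  /* 1 while a transaction is in flight */
} undo_log_t;

/* Step 1 (Backup): record the old value and make the log entry durable. */
static void tx_backup(undo_log_t *ulog, node_t *loc) {
    ulog->addr = loc;
    ulog->old = *loc;
    ulog->valid = 1;
    persist_barrier();   /* execution waits here for the writeback */
}

/* Step 2 (Update): modify in place and make the new value durable. */
static void tx_update(node_t *loc, int val) {
    loc->val = val;
    persist_barrier();
}

/* Step 3 (Commit): invalidate the log so recovery will not roll back. */
static void tx_commit(undo_log_t *ulog) {
    ulog->valid = 0;
    persist_barrier();
}

/* Recovery after a crash: roll back if a transaction was still in flight. */
static void tx_recover(undo_log_t *ulog) {
    if (ulog->valid) {
        *ulog->addr = ulog->old;
        ulog->valid = 0;
        persist_barrier();
    }
}
```

Calling `tx_backup`, `tx_update`, and `tx_commit` in order costs three barriers per transaction, which is why any extra latency added to each writeback shows up directly in transaction time.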

WRITE LATENCY IN NVM PROGRAMS
• Example: steps in an undo logging transaction (Backup, Update, Commit)
• The crash consistency mechanism puts write latency on the critical path

WRITE LATENCY IN NVM PROGRAMS
• Timeline: Backup, Update, Commit, with Commit delayed by the increased latency
• Backend memory operations increase the writeback latency

Backend memory operations are on the critical path
How to reduce the latency?

OBSERVATION
• Each backend memory operation seems indivisible
• Integration leads to serialized operations: Counter-mode Encryption, Integrity Verification, Deduplication

OBSERVATION
• However, it is possible to decompose them into sub-operations
• Counter-mode Encryption: generate counter, encrypt counter, combine the encrypted counter with the data
• Generate MAC (for integrity verification)
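
The decomposition can be made concrete with a small C sketch. Everything here is an assumption for illustration: `toy_cipher` is a toy mixer standing in for a real block cipher and is not secure, and the function names and 64-bit "lines" are hypothetical. The point is only which inputs each sub-operation needs.

```c
#include <assert.h>
#include <stdint.h>

/* Toy 64-bit mixer standing in for a block cipher such as AES.
 * It is NOT cryptographically secure; it only makes the data flow visible. */
static uint64_t toy_cipher(uint64_t key, uint64_t x) {
    x ^= key;
    x *= 0x9E3779B97F4A7C15ULL;
    x ^= x >> 32;
    return x;
}

/* Sub-operation 1: generate the counter for a line (needs only the address). */
static uint64_t gen_counter(uint64_t *counters, uint64_t line) {
    return ++counters[line];
}

/* Sub-operation 2: encrypt the counter into a one-time pad
 * (needs the address and the counter, but not the data). */
static uint64_t gen_pad(uint64_t key, uint64_t line, uint64_t counter) {
    return toy_cipher(key, (line << 32) ^ counter);
}

/* Sub-operation 3: XOR the pad with the data (the only step that needs data). */
static uint64_t apply_pad(uint64_t pad, uint64_t data) {
    return pad ^ data;
}

/* Sub-operation 4: MAC over ciphertext and counter, for integrity verification. */
static uint64_t gen_mac(uint64_t key, uint64_t cipher, uint64_t counter) {
    return toy_cipher(key ^ 0xA5A5A5A5A5A5A5A5ULL, cipher ^ (counter << 1));
}
```

Because `gen_counter` and `gen_pad` never touch the data, they can run before the data of the write is known, and applying the same pad twice recovers the plaintext.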

KEY IDEA I: PARALLELIZATION
• After decomposing the example operations: Counter-mode Encryption, Integrity Verification, Deduplication

KEY IDEA I: PARALLELIZATION
There are two types of dependencies:
1. Intra-operation dependency: dependency within each operation
2. Inter-operation dependency: dependency across different operations when they cooperate
Example operations: Counter-mode Encryption, Integrity Verification, Deduplication

KEY IDEA I: PARALLELIZATION
There are two types of dependencies: intra-operation dependency and inter-operation dependency
• Parallelizable: sub-operations without dependency can execute in parallel
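
The benefit of running independent sub-operations in parallel can be sketched as a critical-path calculation over a dependency graph. The `subop_t` type, the function names, and any latencies passed in are hypothetical; the deck does not give per-sub-operation latencies.

```c
#include <assert.h>

#define MAX_OPS 8

typedef struct {
    int latency;          /* hypothetical latency units */
    int ndeps;            /* number of dependencies */
    int deps[MAX_OPS];    /* indices of earlier sub-operations this one waits for */
} subop_t;

/* Serialized execution: every sub-operation waits for the previous one. */
static int serialized_latency(const subop_t *ops, int n) {
    int total = 0;
    for (int i = 0; i < n; i++) total += ops[i].latency;
    return total;
}

/* Parallel execution: a sub-operation starts as soon as all of its
 * dependencies have finished. Assumes ops are listed in topological
 * order, so every dependency has a smaller index. */
static int parallel_latency(const subop_t *ops, int n) {
    int finish[MAX_OPS] = {0};
    int longest = 0;
    assert(n <= MAX_OPS);
    for (int i = 0; i < n; i++) {
        int start = 0;
        for (int d = 0; d < ops[i].ndeps; d++) {
            int f = finish[ops[i].deps[d]];
            if (f > start) start = f;
        }
        finish[i] = start + ops[i].latency;
        if (finish[i] > longest) longest = finish[i];
    }
    return longest;
}
```

With made-up latencies of 10, 40, 1, 30, and 40 units, where the second depends on the first, the third on the second, the fifth on the third, and the fourth on nothing, the serialized total is 121 units while the parallel critical path is 91 units.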

KEY IDEA II: PRE-EXECUTION
A write consists of: Address, Data
• External dependency: sub-operations depend on the address, the data, or both
• Sub-operations can pre-execute as soon as their data/address dependency is resolved

KEY IDEA II: PRE-EXECUTION
A write consists of: Address, Data
• Address-dependent sub-operations can pre-execute as soon as the address of the write is available
• Data-dependent sub-operations can pre-execute as soon as the data of the write is available
• Both-dependent sub-operations can pre-execute as soon as both the data and address of the write are available
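
The three cases above reduce to one readiness rule, sketched here in C. The `ext_dep_t` enum and `can_pre_execute` are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* External dependency of a sub-operation, encoded as a bit mask. */
typedef enum { DEP_ADDR = 1, DEP_DATA = 2, DEP_BOTH = 3 } ext_dep_t;

/* A sub-operation can pre-execute once every external input it needs
 * (address, data, or both) is known. */
static bool can_pre_execute(ext_dep_t dep, bool addr_known, bool data_known) {
    bool need_addr = (dep & DEP_ADDR) != 0;
    bool need_data = (dep & DEP_DATA) != 0;
    return (!need_addr || addr_known) && (!need_data || data_known);
}
```

For example, with only the address known, an address-dependent sub-operation is ready while data-dependent and both-dependent ones are not.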

KEY IDEA II: PRE-EXECUTION
A write consists of: Address, Data
• How can we know the address/data ahead of time?

AVAILABILITY OF ADDRESS AND DATA
Example: update tree node <Key, Value> using undo log
• Steps: traverse tree with Key, Backup, Update, Commit
• The data for the update is known before the traversal
• The location is known after the traversal
• During backup: the address and data of the update are known, so address- and data-dependent sub-operations can pre-execute
• Take the pre-executed results when writing back the update

OUR PROPOSAL: JANUS
• Janus is a Roman god with two faces: one looks into the past, and another into the future
• When the dependent data and address become available (past), pre-execute the operations whose dependencies are resolved (future)

JANUS OVERVIEW
Janus:
• Parallelization
Timeline: Backup, Update, Commit, comparing the original writeback latency of serialized backend memory operations with the parallelized case
Parallelization reduces the latency of each operation

JANUS OVERVIEW
Janus:
• Parallelization
• Pre-execution
Timeline: Backup, Update, Commit, comparing serialized, parallelized, and pre-executed backend memory operations
Pre-execution moves their latency off the critical path

JANUS OVERVIEW
• Janus combines parallelization and pre-execution
• Software: NVM Program with the Janus SW Interface
• Hardware: CPU Core and Memory Controller with Janus HW
• The Janus software interface enables pre-execution

SOFTWARE INTERFACE
• Janus provides functions for pre-executing address- and data-dependent sub-operations at object granularity
• The Janus interface is hardware-independent: it only takes address and data
Example: a tree update with undo logging, before instrumentation

    void updateTree(int key, item_t val) {  // the data for the update is known here
        // find tree node with key
        node* location = find(key);         // the address for the update is known here
        // Backup: add old val to undo log
        undo_log(location);
        // Update: update val
        location->val = val;
        persist_barrier();
        // Commit: commit updates
        commit();
    }

SOFTWARE INTERFACE
• Janus provides functions for pre-executing address- and data-dependent sub-operations at object granularity
• The Janus interface is hardware-independent: it only takes address and data
• Keep track of pre-execution: PRE_ID, Thread_ID, Transaction_ID, Address, Size

    void updateTree(int key, item_t val) {
        pre_obj;  // pre-execution object (PRE_ID, Thread_ID, Transaction_ID, Address, Size)
        // pre-execute data-dependent sub-operations
        PRE_DATA(&pre_obj, &val, sizeof(item_t));
        // find tree node with key
        node* location = find(key);
        // pre-execute address-dependent sub-operations
        PRE_ADDR(&pre_obj, location, sizeof(item_t));
        // add old val to undo log
        undo_log(location);
        // update val
        location->val = val;
        persist_barrier();
        // commit updates
        commit();
    }

AUTOMATED INSTRUMENTATION
• Manual instrumentation is effective at improving performance, but requires significant programmer effort
• Janus provides a compiler pass to automatically instrument programs with the interface
• Compiler pass: takes the NVM program, runs dependency analysis (data, address), then instruments it with the Janus interface

HARDWARE MECHANISM
• Converter: converts pre-execution from object granularity to cache line granularity
• Intermediate result buffer: stores pre-execution results to avoid changing processor/memory state
• Correctness check: invalidates incorrect pre-execution
• Data path: Core, converter, backend memory operations, intermediate result buffer, correctness check, Memory Controller

HARDWARE MECHANISM
Example:
• PRE_BOTH X: the converter splits object X into cache lines X1 and X2
• Backend memory operations produce results R1 and R2, which the intermediate result buffer stores as (X1, R1) and (X2, R2)
• WRITE X: after the correctness check, the memory controller takes the buffered results and the write completes
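
The converter and the intermediate result buffer can be modelled in software as below. This is a sketch under stated assumptions, not the Janus hardware design: the 64-byte line size, the `data_tag` field used for the correctness check, and all type and function names are hypothetical. The 64-entry capacity follows the methodology slide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LINE_SIZE   64u  /* assumed cache-line size in bytes */
#define IRB_ENTRIES 64   /* intermediate result buffer entries per core */

/* Converter: number of cache lines touched by the object [addr, addr + size). */
static size_t lines_touched(uint64_t addr, size_t size) {
    if (size == 0) return 0;
    uint64_t first = addr / LINE_SIZE;
    uint64_t last = (addr + size - 1) / LINE_SIZE;
    return (size_t)(last - first + 1);
}

typedef struct {
    uint64_t line_addr;  /* cache-line-aligned address */
    uint64_t data_tag;   /* stand-in for the data the result was computed from */
    uint64_t result;     /* pre-executed result */
    int      valid;
} irb_entry_t;

typedef struct { irb_entry_t e[IRB_ENTRIES]; } irb_t;

/* Store a pre-executed result; returns 0 if the buffer is full. */
static int irb_insert(irb_t *b, uint64_t line_addr, uint64_t data_tag,
                      uint64_t result) {
    for (int i = 0; i < IRB_ENTRIES; i++) {
        if (!b->e[i].valid) {
            b->e[i] = (irb_entry_t){line_addr, data_tag, result, 1};
            return 1;
        }
    }
    return 0;
}

/* On the actual write: take the result only if the data still matches what
 * was pre-executed. The entry is consumed or invalidated either way. */
static int irb_take(irb_t *b, uint64_t line_addr, uint64_t data_tag,
                    uint64_t *out) {
    for (int i = 0; i < IRB_ENTRIES; i++) {
        if (b->e[i].valid && b->e[i].line_addr == line_addr) {
            b->e[i].valid = 0;
            if (b->e[i].data_tag == data_tag) {
                *out = b->e[i].result;
                return 1;
            }
            return 0;  /* stale pre-execution: redo the operation normally */
        }
    }
    return 0;
}
```

Keeping results in a separate buffer means a wrong or unused pre-execution can simply be dropped without having changed processor or memory state.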

METHODOLOGY
• Gem5 Simulator:
    Processor                        Out-of-Order, 4 GHz
    L1 D/I, L2 cache                 64/32 KB, 2 MB per core (shared)
    Backend memory operation units   512 KB per core for each operation (shared), 4 units per core
    Intermediate result buffer       64 entries per core (shared)
• Design points:
    • Serialized: all backend memory operations are serialized
    • Janus: pre-execute parallelized backend memory operations
• Instrumentation of Janus functions: Manual, Automated

JANUS VS. BASELINE
[Figure: speedup over serialized baseline (axis 1 to 6) for Parallelization and Pre-execution on Array Swap, Queue, Hash Table, B-Tree, RB-Tree, TATP, TPCC, and their geometric mean; 2.35X speedup marked]
Janus provides 2.35X speedup on average

JANUS VS. BASELINE
[Figure: speedup over serialized baseline (axis 1 to 6) for Parallelization and Pre-execution on Array Swap, Queue, Hash Table, B-Tree, RB-Tree, TATP, TPCC, and their geometric mean]
• Parallelization reduces the latency
• Pre-execution moves the latency off the critical path
• Pre-execution provides more speedup

MANUAL VS. AUTO INSTRUMENTATION
[Figure: speedup over serialized baseline (axis 1 to 6) for Janus (Manual) and Janus (Auto) on Array Swap, Queue, Hash Table, B-Tree, RB-Tree, TATP, TPCC, and their geometric mean; annotation: 15% slower]
Compiler pass provides close-to-manual performance

PRE-EXECUTION PERCENTAGE
[Figure: percentage over total operations (axis 0 to 1), split into Complete and Incomplete, for Array Swap, Queue, Hash Table, B-Tree, RB-Tree, TATP, TPCC]
Janus provides reasonable coverage

NEEDS STORAGE AND MEMORY SYSTEM SUPPORT
• Software (APPLICATION): crash consistency; PMTest (pmtest.persistentmemory.org)
• Hardware (PERSISTENT MANAGER, PERSISTENT MEMORY): compression, data integrity, encryption; Janus
• PMTest focuses on the correctness of persistent memory applications
• Janus focuses on reducing the overhead of system support

GOAL: END-TO-END SYSTEM FOR PERSISTENT MEMORY
• Stack, from software to hardware: APPLICATION, COMPILER, OS, PERSISTENT MANAGER, ARCHITECTURE, CIRCUITS
• How to write consistent code?
• How to test that the code is correct?
• How to recover and resume the application and OS?
• How to provide efficient hardware support?
• Full-stack support for persistent memory applications
Many directions to explore!
