Architected for Performance Delivering Continued Innovation for Storage

NVM Express® Specification Updates
Sponsored by NVM Express organization, the owner of NVMe™, NVMe-oF™ and NVMe-MI™ standards

Speaker
Jonmichael Hands
Product Manager / Strategic Planner, Intel
NVM Express, Inc. Marketing Co-Chair

NVM Express® Roadmap (2014 to 2019, by quarter)

NVM Express (NVMe)
• NVMe 1.2 (Nov '14): Namespace Management, Controller Memory Buffer, Host Memory Buffer, Live Firmware Update
• NVMe 1.2.1 (May '16)
• NVMe 1.3 (May '17): Sanitize, Streams, Virtualization
• NVMe (next): IO Determinism, Persistent Memory Region, Persistent Event Log, Multipathing

NVMe Over Fabrics (NVMe-oF)
• NVMe-oF 1.0 (May '16): Transport and protocol, RDMA binding
• NVMe-oF (next): Enhanced Discovery, In-band Authentication, TCP Transport Binding

NVMe Mgmt. Interface (NVMe-MI)
• NVMe-MI 1.0 (Nov '15): Out-of-band management, Device discovery, Health & temp monitoring, Firmware Update
• NVMe-MI 1.0a (April '17)
• NVMe-MI 1.1: Enclosure Management, In-band Mechanism, Storage Device Extension

Legend: Released NVMe specification / Planned release

Agenda - New Features for Next Version of NVMe
(Category, then Feature: Benefit)

Hyperscale performance
• NVM Sets: Improved multi-tenant quality of service through physical isolation / separation
• Read Recovery Levels: Improved read latency with host-to-drive tradeoff on UBER
• IO Determinism: Read-only-like latencies for mixed read/write workloads
• Multi-Host Shared Write Streams: Improve SSD endurance by tagging data into streams, new use cases on dealing with data from multiple hosts

New Use Cases
• Persistent Memory Region: Multi-purpose persistent memory for innovative use cases

Manageability / Triage
• Administrative Controller: Splits NVMe controller up into administrative, I/O, and discovery controllers. Admin controller used for enclosure management.
• Persistent Event Log: SSD keeps log of events that host (e.g. OS) can read
• NVMe-oF Multipathing and Namespace Sharing (ANA): Discover optimal path to namespace

Data integrity, configurations
• Rebuild Assist: Drive can discover unrecoverable data and ask host to rebuild from other copies
• Enhanced Command Retry: Host-configurable retry status for commands with time delay
• Namespace Granularity: Create namespace size that is optimal for the SSD media layout
• Verify: Verify data integrity on drive without sending data to host
• Namespace write protect: Lock down namespace for read-only and boot use cases

IO Determinism - The problem
“In practice, a single user request may result in thousands of subqueries, with a critical path that is dozens of subqueries long. The fork/join structure of subqueries causes latency outliers to have a disproportionate effect on total latency.”
Source: Challenges to Adopting Stronger Consistency at Scale, Ajoux et al. (Facebook & USC), 2015
Great resources from FMS this year!
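The disproportionate effect of outliers in a fork/join request can be illustrated with a small calculation. This is a minimal sketch, assuming subqueries are independent and each one has the same chance of being a latency outlier; the function name and the 0.1% outlier rate are illustrative assumptions, not figures from the cited paper.

```python
def p_request_hits_outlier(num_subqueries: int, p_outlier: float) -> float:
    """Probability that at least one of the subqueries is a latency outlier.

    Assumes independent subqueries with an identical outlier probability.
    A fork/join request waits for its slowest subquery, so one outlier
    is enough to slow the whole request.
    """
    return 1.0 - (1.0 - p_outlier) ** num_subqueries


if __name__ == "__main__":
    # Hypothetical 0.1% per-subquery outlier rate (a "99.9th percentile" event).
    for n in (1, 100, 1000):
        print(n, round(p_request_hits_outlier(n, 0.001), 3))
```

Under these assumptions, an event that is rare for one I/O becomes common for a request that fans out to a thousand of them, which is why tail latency rather than average latency drives the design.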

Hyper-scale SSD / NVM Form Factor Characteristics

Important:
• Scalable & Flexible
• High volume & Low cost
• Power & Thermal Efficiency
• Hot-swappable & Serviceable
• Performance per TB & Quality of Service

Less important:
• Backwards compatible
• Support for non-NVM media
• Maximum density
• Peak performance (peak IOPS/BW)

Source: NVMe™: Hardware Implementations and Key Benefits in environments, Flash Memory Summit 2018

NVM Sets
• An NVM Set is a collection of NVM that is separate (logically and potentially physically) from NVM in other NVM Sets.
• The goal is to isolate each NVM Set to specific channels, dies, and resources so that workloads on one Set do not impact the others (this solves the noisy neighbor problem).

Read Recovery Levels
• Options to change the amount of error recovery done on reads in order to decrease read latencies.
• A host may accept a higher uncorrectable bit error rate (UBER) in exchange for faster reads if it has multiple copies of the data.
• Read Recovery Levels tell the host about this tradeoff and offer various levels.

Predictable Latency Mode – IO Determinism
When Predictable Latency Mode is enabled:
• NVM Sets and their associated namespaces have vendor specific quality of service attributes;
• I/O commands that access NVM in the same NVM Set have the same quality of service attributes; and
• I/O commands that access NVM in one NVM Set do not impact the quality of service of I/O commands that access NVM in a different NVM Set.

Over time, an NVM Set alternates between two windows:
• Deterministic Window (DTWIN): the host agrees not to send writes or trims.
• Non-Deterministic Window (NDWIN): the drive agrees to do its background operations.
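The host side of the window agreement can be sketched as a small scheduler that holds back writes and trims while the NVM Set is in DTWIN and releases them when it enters NDWIN. This is an illustrative model only: the `HostScheduler` class, its method names, and the tuple format for operations are hypothetical, and how a real host learns about or requests window changes is not modeled.

```python
from dataclasses import dataclass, field
from enum import Enum


class Window(Enum):
    DTWIN = "deterministic"
    NDWIN = "non-deterministic"


@dataclass
class HostScheduler:
    """Hypothetical host-side queue for one NVM Set."""
    window: Window = Window.DTWIN
    deferred: list = field(default_factory=list)   # writes/trims held back
    submitted: list = field(default_factory=list)  # commands sent to the drive

    def submit(self, op: str, lba: int) -> None:
        # During DTWIN the host keeps its side of the agreement:
        # reads go through, writes and trims wait.
        if self.window is Window.DTWIN and op in ("write", "trim"):
            self.deferred.append((op, lba))
        else:
            self.submitted.append((op, lba))

    def enter(self, window: Window) -> None:
        self.window = window
        if window is Window.NDWIN:
            # NDWIN is when the drive does background work, so the
            # deferred writes and trims are flushed here.
            self.submitted.extend(self.deferred)
            self.deferred.clear()


if __name__ == "__main__":
    sched = HostScheduler()
    sched.submit("read", 10)
    sched.submit("write", 20)
    print(sched.submitted, sched.deferred)
    sched.enter(Window.NDWIN)
    print(sched.submitted, sched.deferred)
```

In a deployment with several NVM Sets, a host could stagger the windows so that at least one copy of the data is always in DTWIN and readable with predictable latency.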

Facebook Demonstration for IO Determinism
Read-only-like latencies are achievable for mixed workloads with Predictable Latency Mode (PLM) enabled.

Persistent Memory Region
• The Controller Memory Buffer (CMB) is a region of general purpose read/write memory on the controller that may be used for a variety of purposes.
• CMB enabled memory within the NVMe controller to be used for things like submission queue support, and read and write data support for peer-to-peer and NVMe over Fabrics.
• The Persistent Memory Region (PMR) is an optional region of general purpose PCI Express read/write persistent memory that may be used for a variety of purposes.
• A small PMR can be used for write cache, RAID logs, journaling, scratch pad for metadata, dedupe and compression staging.
• A large PMR can enable persistent memory over fabrics!

See presentations from FMS 2018 for more info!
• Important new NVMe features for optimizing the data pipeline, Dr. Stephen Bates, CTO, Eideticom
• FMS 2018 NVMe SSDs with Persistent Memory Regions, Chander Chadha, Sr. Manager Product Marketing, Toshiba Memory America, Inc.

Persistent Event Log
The log is intended to persistently capture significant events for use by software/system vendors that are not the NVMe subsystem manufacturer, such as operating systems, management software, storage system vendors, etc.

Phases: First version (TP 4007), Second version, Future work

Event types across these phases:
• SMART / Health Log Snapshot
• Subsystem hardware error
• Power Excursion
• Firmware Commit Event
• Set Feature
• Voltage Excursion
• Timestamp Change
• Format
• Rebuild assist notification
• Power-On or Reset
• Sanitize
• NVMe-MI failures
• Vendor Specific
• Namespace Create/Delete
• IO Determinism
• TCG
• Performance stats
• Temperature Excursion

Administrative Controller
The new NVMe spec will have 3 controller types:
• Administrative Controller
• I/O Controller
• Discovery Controller: a special type of controller used in NVMe over Fabrics to provide access to a Discovery Log Page

Command support per controller type is grouped as: Admin required, IO required, Admin prohibited.
Commands listed: Create/delete completion and submission queues, Get Log Page, Identify, Abort, Set/Get Features, Flush, Read, Write.

Discovery
A host connects to a DISCOVERY controller to find out what NVMe™ stuff is “out there”:
• The discovery controller has a list of available devices (available NVMe subsystems, NVMe ports)
• The host can then connect to the things it has discovered and find namespaces to access
• One discovery service can point to other discovery services (nesting)

The “root” of discovery must be manually configured.

A discovery service can’t tell a host if something changes:
• Like if a new device shows up; or
• If a new port shows up; or
• If a completely new discovery service shows up

Special Thanks: Fred Knight, NetApp
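The nesting described above, where one discovery service points to others, amounts to a graph walk starting from the manually configured root. The following is a minimal sketch of that walk over an in-memory table; the `discover` function, the `directory` layout, and the service and subsystem names are all hypothetical stand-ins for what a host would read from each Discovery Log Page.

```python
def discover(root: str, directory: dict) -> list:
    """Walk discovery services starting at `root` and collect subsystems.

    `directory` is a hypothetical stand-in for the network: it maps a
    discovery service name to its entries, where each entry is either
    ("subsystem", name) or ("referral", other_service_name).
    """
    found = []
    seen = set()
    pending = [root]
    while pending:
        service = pending.pop()
        if service in seen:
            continue  # guard against services that refer back to each other
        seen.add(service)
        for kind, name in directory.get(service, []):
            if kind == "subsystem" and name not in found:
                found.append(name)
            elif kind == "referral":
                pending.append(name)
    return found


if __name__ == "__main__":
    directory = {
        "root": [("subsystem", "subsys-a"), ("referral", "rack2")],
        "rack2": [("subsystem", "subsys-b"), ("referral", "root")],
    }
    print(discover("root", directory))
```

Because a discovery service cannot notify the host of changes, a host using this approach would have to repeat the walk to notice new devices, ports, or services.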

NVMe™ Multipathing and Namespace Sharing
Technical Term: Asymmetric Namespace Access (ANA)
• NVMe™ Multipathing I/O refers to two or more completely independent PCI Express paths between a single host and a namespace.
• Namespace sharing enables two or more hosts to access a common shared namespace using different NVM Express controllers.
• Both multi-path I/O and namespace sharing require that the NVM subsystem contain two or more controllers.

[Figure: two panels. "NVMe™ Multipathing": a Host, Ports, NVMe™ Controllers, and a Namespace. "Namespace Sharing": Host A, Host B, and Host C, Ports, NVMe™ Controllers, and a shared Namespace.]
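With more than one controller leading to the same namespace, the host needs a rule for choosing among paths. The sketch below assumes each path reports an access state and that the host prefers an optimized path, falls back to a non-optimized one, and skips anything else. The state labels are simplified and should be checked against the specification; the `pick_path` function and the dictionary layout are hypothetical.

```python
# Lower rank is preferred. States not listed here are treated as unusable.
RANK = {"optimized": 0, "non-optimized": 1}


def pick_path(paths: list):
    """Return the preferred usable path, or None if no path is usable.

    Each path is a dict such as
    {"controller": 1, "port": "A", "ana_state": "optimized"}.
    """
    usable = [p for p in paths if p["ana_state"] in RANK]
    if not usable:
        return None
    # min() keeps the first path among equally ranked candidates.
    return min(usable, key=lambda p: RANK[p["ana_state"]])


if __name__ == "__main__":
    paths = [
        {"controller": 1, "port": "A", "ana_state": "non-optimized"},
        {"controller": 2, "port": "B", "ana_state": "optimized"},
        {"controller": 3, "port": "B", "ana_state": "inaccessible"},
    ]
    print(pick_path(paths))
```

A real multipath layer would also re-evaluate the choice when a path's state changes; that part is left out here.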

Rebuild Assist
• Introduces a new NVMe command, Get LBA Status, to get a list of Potentially Unrecoverable LBAs
  - Tracked LBAs: done in background by drive
  - Untracked LBAs: initiated by host, informs the drive to scan for affected LBAs
• Introduces a new log page: LBA Status Information
• Introduces a new Set Features command: LBA Status Information Attributes

Rebuild Assist – Untracked List Example
1. Controller:
   • Detects die failure; NS 1 and NS 2 affected
   • Updates LBA Status Information log page
   • Issues asynchronous event
2. Host:
   • Reads LBA Status Information log page
3. Controller:
   • Removes Untracked LBAs from log page
4. Host:
   • Issues Get LBA Status command for NS 1 and NS 2
5. Controller:
   • Scans indirection table to find Untracked List
   • Returns Untracked List
   • Removes LBAs from Untracked List
6. Host:
   • Writes NS 1 LBAs A – B
   • Writes NS 1 LBAs C – D
   • Writes NS 2 LBAs A – Z
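The exchange above can be followed end to end in a toy simulation. This is only a model of the message flow, assuming LBA ranges are simple `(start, end)` tuples; the `Drive` class, its method names, and the `rebuild` helper are hypothetical and do not correspond to real command encodings or log page layouts.

```python
class Drive:
    """Toy model of the controller side of the Untracked List flow."""

    def __init__(self):
        self.untracked = {}     # nsid -> list of (start_lba, end_lba)
        self.log_page = set()   # namespaces flagged in the log page

    def detect_die_failure(self, affected: dict) -> str:
        # Record affected ranges, update the log page, raise an event.
        for nsid, ranges in affected.items():
            self.untracked.setdefault(nsid, []).extend(ranges)
            self.log_page.add(nsid)
        return "async_event"

    def read_lba_status_log(self) -> list:
        # Reading the log page clears the flagged entries.
        flagged = sorted(self.log_page)
        self.log_page.clear()
        return flagged

    def get_lba_status(self, nsid: int) -> list:
        # Return the Untracked List and remove those LBAs from it.
        return self.untracked.pop(nsid, [])


def rebuild(drive: Drive, rewrite) -> None:
    """Host side: find affected namespaces, fetch ranges, rewrite them."""
    for nsid in drive.read_lba_status_log():
        for start, end in drive.get_lba_status(nsid):
            rewrite(nsid, start, end)  # data comes from another copy


if __name__ == "__main__":
    drive = Drive()
    drive.detect_die_failure({1: [(10, 20), (30, 40)], 2: [(0, 99)]})
    rebuild(drive, lambda ns, s, e: print(f"write NS {ns} LBAs {s} - {e}"))
```

The `rewrite` callback stands in for whatever redundancy scheme the host uses, such as reading from a mirror or reconstructing from parity.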

Miscellaneous management and data integrity

Enhanced Command Retry
• New Get Features command: Host Behavior Support
• New status code: Command Interrupted
• New Identify Controller structure: Command Retry Delay Time

Namespace Granularity: hints for optimal drive utilization during namespace creation
• Namespace Granularity Descriptor List

Verify: controller verifies integrity of data and protection information without sending data to the host
• New command in standard NVM Command Set
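The idea behind Enhanced Command Retry, a drive interrupting a command and telling the host how long to wait before retrying, can be sketched as a retry loop. Everything concrete here is an assumption made for illustration: the `submit_with_retry` function, the status strings, the delay table keyed by an index returned with the completion, and the use of milliseconds are not taken from the slide.

```python
import time


def submit_with_retry(submit, retry_delay_ms: dict, max_attempts: int = 3,
                      sleep=time.sleep) -> str:
    """Retry a command that the drive reports as interrupted.

    `submit` is a callable returning (status, delay_index). When status is
    "command_interrupted", the host waits retry_delay_ms[delay_index]
    milliseconds before trying again. All of these names are hypothetical.
    """
    for attempt in range(max_attempts):
        status, delay_index = submit()
        if status != "command_interrupted":
            return status
        if attempt < max_attempts - 1:
            sleep(retry_delay_ms[delay_index] / 1000.0)
    return "command_interrupted"


if __name__ == "__main__":
    replies = iter([("command_interrupted", 1), ("success", 0)])
    waits = []
    # Inject a fake sleep so the example runs instantly.
    print(submit_with_retry(lambda: next(replies), {0: 0, 1: 200},
                            sleep=waits.append))
    print(waits)
```

Letting the drive name the delay avoids both immediate retries that hit the same busy condition and overly conservative host-side timeouts.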

NVMe™ 1.4 – Namespace Write Protection
• Namespace Write Protection is an optional configurable controller capability that enables the host to control the write protection state of a namespace (exactly what you think it does).
• Could be used for secure space on drive, bootloader, backup image, important system files.

Architected for Performance
