Session id 36777: User-mode I/O in Oracle 10g

Session id: 36777
User-mode I/O in Oracle 10g with ODM and DAFS
Jeff Silberman, Systems Architect, Network Appliance
Margaret Susairaj, Server Technologies, Oracle Corp

Agenda
• The Transportation Revolution
• Concepts: RDMA, DAT, DAPL, DAFS
• RDMA and Oracle 10g
• The DAFS API: User-mode I/O and OS bypass
• ODM: The File I/O API for Oracle 10g
• RAC and InfiniBand
• Performance
• Summary, Q&A

The Transportation Revolution
• “Dumb” networks vs. reliable data movers
• Data copies vs. RDMA
• Ethernet vs. InfiniBand
• Kernel-mode I/O vs. user-mode I/O
• Unix I/O vs. ODM

Concepts
• Remote Direct Memory Access (RDMA)
• Direct Access Transports (DAT)
• Direct Access Provider Library (DAPL)
• Direct Access File System (DAFS)

RDMA
• Memory-to-memory access over a network
• Requires both intelligent transports and intelligent network interface cards (NICs)
• Cannot be done over “standard” Gigabit Ethernet
• Operations defined with respect to the server
• Examples:
  – FC/VI, GbE/VI, DAPL/IB

Direct Access Transports (DAT)
• Both RDMA read and RDMA write operations supported
• Multiple concurrent virtual connections
• Asynchronous I/O
• Direct Data Placement
• Kernel Bypass
DAT is transport agnostic

Direct Access Provider Library (DAPL)
• Standards-based API for DAT
  – DAT Collaborative: Over 40 companies including both Oracle and IBM
• Designed to facilitate higher-level RDMA protocols
  – Examples: DAFS, Oracle RAC
• DAPL “providers” are typically the NIC providers
• A portable API for RDMA transports
  – uDAPL for user-level access
  – kDAPL for kernel-based access

Direct Access File System (DAFS)
• DAFS is a remote file access protocol
• DAFS derives heavily from NFSv4
• Target is local data-center file sharing
• Ideal cluster file system for RAC
• Rich set of Oracle-inspired semantics
• Will always perform better than TOEs
  – Zero touch, zero data copy

Oracle 10g and RDMA
[Diagram, built up over five slides. Database server stack, top to bottom: 10g SGA buffers; Oracle Disk Manager (Oracle file I/O API); DAFS API (DAFS user-level I/O library); DAT (DAT library vector); DAPL providers (Direct Access Provider Libraries); HCA drivers (transport-specific device drivers); InfiniBand adapter, an RDMA NIC (RNIC). Peer: DAFS file server with its buffers and DAFS engine. Paths labeled “Direct Data” and “Control”.]

Oracle 10g and RDMA
• Low latency
• High bandwidth
• Memory-to-memory transfer
• Minimal CPU intervention
• User-mode I/O
✓ Storage I/O requests
✓ Data block transfers for Cache Fusion
✓ Lock request messages
✓ Parallel Query internode messages

DAFS API: User-Mode I/O
• Memory Registration
• Asynchronous I/O
• Security / Authentication
• I/O Fencing
• I/O Completion Groups
• Multi-path I/O
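
Two of the items on this slide, asynchronous I/O and I/O completion groups, can be illustrated with a minimal sketch. This is not the DAFS API: it uses ordinary POSIX positional reads on a thread pool, and `read_block`, `read_group`, and `BLOCK` are hypothetical names chosen for illustration. The buffers are allocated once before any I/O is issued, which is only loosely analogous to memory registration.

```python
import os
from concurrent.futures import ThreadPoolExecutor, wait

BLOCK = 4096  # illustrative block size, not taken from the slides


def read_block(fd, offset, buf):
    """Read up to len(buf) bytes at `offset` into a caller-owned buffer."""
    data = os.pread(fd, len(buf), offset)
    buf[:len(data)] = data
    return len(data)


def read_group(path, offsets):
    """Issue every read without blocking, then wait for the whole group."""
    # Buffers exist before any I/O is issued (a stand-in for registration).
    buffers = [bytearray(BLOCK) for _ in offsets]
    fd = os.open(path, os.O_RDONLY)
    try:
        with ThreadPoolExecutor(max_workers=4) as pool:
            group = [pool.submit(read_block, fd, off, buf)
                     for off, buf in zip(offsets, buffers)]
            wait(group)  # a single wait covers all outstanding requests
        sizes = [f.result() for f in group]
    finally:
        os.close(fd)
    return [bytes(buf[:n]) for buf, n in zip(buffers, sizes)]
```

Unlike the user-mode path the slides describe, each `os.pread` here still traps into the kernel; the sketch shows only the submit-then-wait-on-a-group pattern.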

DAFS Implementation Models
[Diagram: three implementation models (Kernel File System, Raw Device Driver, User Library) arranged along an axis from Application Transparency to Performance and split between User Space and OS Kernel. Labels: Application (unchanged), Application (modified), Buffers, File I/O Syscalls, Disk I/O Syscalls, File System, Device Driver, DAFS Layer, uDAFS Library, DA Provider Library, Adapter Driver, HBA/HCA.]

Oracle 10g
• Grid-based computing
  – Easily scale the number of servers
  – Easily scale the storage
  – Easily share all resources
• Ease of manageability
• Improved performance capability
• Support for new technologies

Oracle Disk Manager (ODM)
• The File I/O API for Oracle
• Performance of raw disk with the manageability of files

Oracle Disk Manager (ODM)

Problem: No consistent standard. I/O interfaces vary with each operating system variant.
Solution: The ODM API semantics are invariant across all OS platforms, including Windows.

Problem: No standard asynchronous I/O model for regular files. Asynchronous I/O, if it was provided, relied on special kernel-based device drivers.
Solution: ODM supports both synchronous and asynchronous I/O for any regular file in an ODM file system.

Problem: No standard for batching I/O requests within a single I/O call.
Solution: The odm_io() function provides batch I/O capability, which minimizes the number of system calls and kernel traps.

Problem: Excess system resources are consumed when each process in an Oracle instance must open each datafile in the instance.
Solution: ODM provides shared file identifiers. A given fileid can be used by any process in the instance, thereby reducing the number of opens instance-wide.
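
The batching point can be made concrete with a toy model. The sketch below is not the ODM API and does not reproduce the signature of odm_io(); `IoRequest`, `CountingBackend`, `read_one_at_a_time`, and `read_batched` are hypothetical names, and the backend merely counts how many times a simulated kernel boundary is crossed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IoRequest:
    file_id: int   # shared identifier, usable by any caller
    offset: int
    length: int


class CountingBackend:
    """Simulated kernel boundary: each submit() counts as one trap."""

    def __init__(self, files):
        self.files = files   # maps file_id -> bytes
        self.calls = 0

    def submit(self, requests):
        self.calls += 1
        return [self.files[r.file_id][r.offset:r.offset + r.length]
                for r in requests]


def read_one_at_a_time(backend, requests):
    """One call per request: len(requests) boundary crossings."""
    return [backend.submit([r])[0] for r in requests]


def read_batched(backend, requests):
    """All requests in one call: a single boundary crossing."""
    return backend.submit(list(requests))
```

Both functions return the same data; only `backend.calls` differs, which is the saving the slide attributes to batch I/O.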

ODM Advanced File Semantics
• Open with ‘share’ key
• Files not visible until file is initialized
• Files cannot be deleted if open references exist

ODM version 2
• Zero data copy
  – Zero touch of data, from storage to SGA
• Memory registration
• User-mode I/O: Reduced context switches
• NIC provisioning
• I/O hints and priorities
• Non-shared file ids
  – Same semantics as with Unix file descriptors
• Portability
  – Advanced semantics are invariant across platforms

Oracle 10g RAC
[Diagram labels: Internet; Application Servers; RAC Servers; InfiniBand Switches; File Storage; Data Center; “Redundant paths for high availability or load balancing”.]

Performance
• Thanks to Ariel Cohen from Topspin* Communications
• One client / one server
  – 1.8 GHz Xeon CPU
  – 133 MHz PCI-X bus
  – 4X IB HCA (10 Gb/s)
  – Gigabit Ethernet w/ checksum offload support
    • Jumbo frame size of 9000
  – Red Hat Linux 7.3
*Ariel Cohen. “A Performance Analysis of 4X InfiniBand Data Transfer Operations”. Proceedings of the International Parallel and Distributed Processing Symposium – Workshop on Communication Architecture for Clusters, April 2003
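
For context on the test bed, the peak rates implied by these figures can be worked out with simple arithmetic. The sketch assumes a 64-bit PCI-X bus (the slide gives only the clock) and the 8b/10b line encoding used by 4X InfiniBand at 10 Gb/s signaling. Function names are illustrative, and the values are theoretical peaks, not measured results.

```python
def pci_x_peak_gbps(clock_mhz=133.0, bus_width_bits=64):
    """Peak PCI-X transfer rate in Gb/s (64-bit width is an assumption)."""
    return clock_mhz * 1e6 * bus_width_bits / 1e9


def ib_data_rate_gbps(signal_gbps=10.0):
    """Usable data rate after 8b/10b encoding (8 data bits per 10 sent)."""
    return signal_gbps * 8 / 10


def block_wire_time_usec(block_bytes, rate_gbps):
    """Microseconds to move one block at the given rate, ignoring latency."""
    return block_bytes * 8 / (rate_gbps * 1e3)
```

Under these assumptions the bus peak (about 8.5 Gb/s) is close to the link's 8 Gb/s data rate, so the host bus leaves little headroom above the link.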

Performance

NFS and RDMA

Evolution and Revolution
• Hungry apps and databases must look elsewhere for extra CPU power
  – OS bypass for I/O
• High-performance transports are here today
  – InfiniBand offers 10 Gb/s w/ 10 usec latency
• Unix and Windows do not provide user-level I/O
  – The DAFS API does
• Oracle 10g RAC w/ a single pipe
  – Both RAC/IPC and user-level file I/O over one IB pipe
“Please keep your seatbelts fastened …”

Next Steps
High Availability Sessions from Oracle
Tuesday in Moscone Room 304 Wednesday in Moscone Room 304 11:00 AM 8:30 AM How Oracle Database 10g Revolutionizes Availability and Enables the Grid Oracle Database 10g - RMAN and ATA Storage in Action 11:00 AM 3:30 PM Oracle Recovery Manager (RMAN) 10g: Reloaded Oracle Data Guard: Maximum Data Protection at Minimum Cost 1:00 PM 5:00 PM Proven Techniques for Maximizing Availability Oracle Database 10g Time Navigation: Human-Error Correction 4:30 PM Data Guard SQL Apply: Back to the Future
For More Info On Oracle HA Go To http://otn.oracle.com/deploy/availability/

Next Steps
High Availability Sessions from Oracle

Thursday
• 8:30 AM in Moscone Room 304: Oracle Database 10g Data Warehouse Backup and Recovery: Automatic, Simple, Reliable
• 8:30 AM in Moscone Room 104: Building RAC Clusters over InfiniBand

Database HA Demos All Four Days In The Oracle Demo Campground
• Real Application Clusters
• Data Guard
• Database Backup & Recovery
• Flashback Recovery
• LogMiner, Online Redefinition, and Cross Platform Transportable Tablespaces

For More Info On Oracle HA Go To http://otn.oracle.com/deploy/availability/

Reminder – please complete the OracleWorld online session survey. Thank you.

Q&A: Questions and Answers
