OFED 1.2 Status and Contents
April 07
Tziporet Koren
http://openfabrics.org/

Agenda
• What is OFED?
• OFED Architecture and Components
• OFED 1.2 New Features
• OFED 1.2 Status
• What’s next?

Mellanox Technologies

OFED - Open Fabrics Enterprise Distribution
• Enterprise Working Group (EWG) within OpenFabrics Alliance (OFA)
• Collaborative effort to test & release OFA software
• Broader test participation
• Multi-vendor interoperability
• Ready for OS vendor adoption
• Support for many distributions
• Components – Kernel & User Space
• Add-on components for vendors to differentiate above OFA

Reduces deployment complexity and cost


OFA Linux Software Stack

[Figure: layered diagram of the OFA Linux software stack for InfiniBand and iWARP, spanning user space and kernel space, with kernel bypass paths. Labels recoverable from the slide:]
• Application Level: IP Based App Access, Sockets Based Access, Various MPIs, Block Storage Access, Clustered DB Access, Access to File Systems, Diag Tools, Open SM
• User APIs: User Level MAD API, UDAPL, SDP Lib, OpenFabrics User Level Verbs & CMA / API
• Upper Layer Protocol: IPoIB, SDP, SRP, iSER, RDS, VNIC, NFS-RDMA RPC, Cluster File Sys
• Mid-Layer: Connection Manager Abstraction (CMA), SA Client, MAD, SMA, Connection Manager, OpenFabrics Kernel Level Verbs / API
• Provider: Hardware Specific Driver
• Hardware: InfiniBand HCA, iWARP R-NIC

Key:
• SA: Subnet Administrator
• MAD: Management Datagram
• SMA: Subnet Manager Agent
• PMA: Performance Manager Agent
• IPoIB: IP over InfiniBand
• SDP: Sockets Direct Protocol
• SRP: SCSI RDMA Protocol (Initiator)
• iSER: iSCSI RDMA Protocol (Initiator)
• RDS: Reliable Datagram Service
• VNIC: Virtual NIC
• UDAPL: User Direct Access Programming Lib
• HCA: Host Channel Adapter
• R-NIC: RDMA NIC

OFED 1.2 Components
• HCA/NIC Drivers
  • Mellanox, QLogic, IBM, Chelsio
• Core: verbs, MAD, SMA, CMA, SA cache
• IPoIB
• SDP
• SRP, iSER
• RDS
• VNIC
• UDAPL
• OSM
• Diagnostic tools
• Bonding module
• MPI Components:
  • MVAPICH
  • Open MPI
  • MVAPICH2
  • MPI tests: OSU benchmarks, Intel MPI benchmarks, Presta

Legend: OFA development, Add on, New in 1.2

Agenda
• What is OFED?
• OFED Architecture and Components
• OFED 1.2 New Features
  • Kernel
  • High Availability
  • User Level
  • Management
  • iWARP
  • MPI
• OFED 1.2 Status
• What’s next?

Main New Features - Kernel
• GA level for the EDC market:
  • SDP, RDS, High Availability, Storage (iSER and SRP)
• Stability improvements
• Performance improvements:
  • New - IPoIB Connected Mode (~1000 MB/sec)
  • New - RDS for Oracle
  • SDP message BW:
    • 10x for small messages
    • 5x for medium messages
• Scalability:
  • SDP memory consumption limit

High Availability - IPoIB
• Fails over from one interface to another on carrier off
• Two solutions:
  • User space: a script that detects carrier off/on events, reconfigures interfaces, and sends RARP to notify the remote side
    • Status: GA
  • Kernel module: Bonding
    • Covered in a separate talk
    • Status: Beta
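
The user-space failover flow described on this slide can be sketched roughly as follows. This is a minimal illustration in Python, not the OFED script itself: the class name `FailoverMonitor`, the interface names `ib0`/`ib1`, and the `on_switch` callback are all hypothetical, and the real script's address reconfiguration and RARP notification are represented only by the callback.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class FailoverMonitor:
    """Tracks carrier state of several interfaces and keeps one active.

    Hypothetical helper for illustration; not part of OFED.
    """
    interfaces: List[str]                              # ordered by preference
    on_switch: Callable[[Optional[str], str], None]    # called as on_switch(old, new)
    carrier: Dict[str, bool] = field(default_factory=dict)
    active: Optional[str] = None

    def __post_init__(self) -> None:
        # Assumption: every link is up at start and the first one is active.
        self.carrier = {name: True for name in self.interfaces}
        self.active = self.interfaces[0]

    def handle_event(self, name: str, carrier_up: bool) -> Optional[str]:
        """Record a carrier on/off event and return the active interface."""
        self.carrier[name] = carrier_up
        if self.active is not None and self.carrier[self.active]:
            return self.active  # active link still has carrier, nothing to do
        # The active link lost carrier: pick the first interface that is up.
        old = self.active
        self.active = next((n for n in self.interfaces if self.carrier[n]), None)
        if self.active is not None and self.active != old:
            # A real script would reconfigure the interface and send RARP here.
            self.on_switch(old, self.active)
        return self.active


if __name__ == "__main__":
    monitor = FailoverMonitor(["ib0", "ib1"],
                              lambda old, new: print(f"failover {old} -> {new}"))
    monitor.handle_event("ib0", False)   # carrier off on the active interface
```

The sketch does not fail back automatically when the original interface recovers; whether the actual script does so is not stated on the slide.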

High Availability - SRP
• Failover between two ports/HCAs
• SRP HA is built of two parts:
  • srp_daemon – discovers and sets up all possible paths to SRP targets on the fabric
  • Multipath tool – switches to a different path when a path fails. The current version uses Device Mapper multipath
• Device Mapper (DM) driver from the Linux kernel
• The persistent binding and HA are provided by user space apps (dm-multipath & dm-multipathd)
• Solution works for Red Hat EL 4 and SLES 10
• Status: Beta
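
The two-part design above (set up every possible path, then switch when one fails) can be illustrated with a small sketch. It is only a model of the idea, assuming a path is a pair of local port and target port; the function names and the port labels such as `hca0:1` are hypothetical and do not reflect the srp_daemon or Device Mapper interfaces.

```python
from itertools import product
from typing import List, Optional, Set, Tuple

Path = Tuple[str, str]  # (local port, target port); labels are hypothetical


def enumerate_paths(local_ports: List[str], target_ports: List[str]) -> List[Path]:
    """List every local-port/target-port combination, as a discovery step might."""
    return list(product(local_ports, target_ports))


def select_path(paths: List[Path], failed: Set[Path]) -> Optional[Path]:
    """Return the first path not marked as failed, or None if all have failed."""
    for path in paths:
        if path not in failed:
            return path
    return None


if __name__ == "__main__":
    paths = enumerate_paths(["hca0:1", "hca1:1"], ["target-A"])
    current = select_path(paths, failed=set())
    print("using", current)
    # Simulate a failure of the current path and switch to the next one.
    print("after failure:", select_path(paths, failed={current}))
```

Keeping path discovery separate from path selection mirrors the split on the slide: one component builds the list, another decides which entry to use.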

High Availability – RDS & SDP
• RDS:
  • RDS implementation fully supports HA
  • Requires IPoIB HA to work
• SDP:
  • Does not support HA
  • Can be implemented for the same HCA using APM
  • Multi-HCA support requires a protocol change

Main New Features - User Level
• libibverbs 1.1:
  • Fork support (requires apps change)
  • Better low-level driver handling, including multiple drivers linked in statically
  • Documentation: man pages
• librdmacm (uCMA) 1.0:
  • Multicast joining from user space
  • UD support

Main New Features - Management
• OpenSM:
  • Routing improvements
    • Performance improvement to min hop and up/down of over an order of magnitude
    • New fat-tree and LASH algorithms
  • SA optional record support “virtually” complete
  • IB router enablement
  • SA database dump/restore
• Many diagnostic improvements since OFED 1.1
  • Covered in DoE tools talk
• ibdiagui
  • GUI for ibdiagnet
  • Used at SC06
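
To make the "min hop" idea concrete, here is a minimal sketch of how shortest hop counts between switches can be computed with a breadth-first search, and how the neighbors lying on a shortest path can be picked out. It illustrates the general technique only; it is not OpenSM's implementation, and the topology, switch names, and function names are assumptions made for the example.

```python
from collections import deque
from typing import Dict, List


def min_hops(adjacency: Dict[str, List[str]], source: str) -> Dict[str, int]:
    """Breadth-first search: hop count from `source` to every reachable node."""
    hops = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    return hops


def min_hop_neighbors(adjacency: Dict[str, List[str]], switch: str, dest: str) -> List[str]:
    """Neighbors of `switch` that lie on a shortest path toward `dest`."""
    dist = min_hops(adjacency, dest)
    if switch not in dist:
        return []  # destination is unreachable from this switch
    return [n for n in adjacency[switch] if dist.get(n) == dist[switch] - 1]


if __name__ == "__main__":
    # Hypothetical four-switch fabric: A-B, A-C, B-D, C-D.
    fabric = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"]}
    print(min_hops(fabric, "A"))
    print(min_hop_neighbors(fabric, "A", "D"))
```

When several neighbors are equally close to the destination, a routing engine still has to choose among them, for example to balance load; that choice is outside this sketch.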

OFED 1.2 - iWARP Support
• Chelsio NIC supported
• Verbs and CMA APIs are the same as InfiniBand
• ULPs that are supported:
  • MPI (MVAPICH2 tested)
  • uDAPL
• Basic testing:
  • uDAPL
  • MVAPICH2
  • NFS-RDMA
• Status: Beta

Main New Features - MPI
• MPI implementations:
  • MVAPICH: version 0.9.9
  • Open MPI: version 1.2.1
  • MVAPICH2: version 0.9.8 (New)
• Common MPI setup sourcing:
  • Simple menu-driven interface to choose which MPI implementation to set as the default on a per-user and/or system-wide basis

Main New Features - MVAPICH
• MVAPICH – Version 0.9.9
• Improved message coalescing:
  • Reduction of per-QP send queues to reduce memory requirements
  • Increases the small-message messaging rate significantly
• Multi-core optimizations:
  • Optimized scalable shared memory design
  • Optimized, high-performance shared-memory-aware collective operations
• Multi-port support for enabling user processes to bind to different IB ports for balanced communication performance
• On-demand connection management using native IB UD support
• Multi-path support for hot-spot avoidance in large-scale clusters using LMC
• Memory hook support provided by integration with the ptmalloc2 library
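
The LMC-based multi-path item can be illustrated with a short sketch. In InfiniBand, a port with LID Mask Control value LMC answers to 2^LMC consecutive LIDs starting at an aligned base LID, so a sender can spread traffic over different routes by varying the destination LID. The round-robin selection below is an assumption chosen for simplicity, not a description of how MVAPICH picks paths, and the function names are hypothetical.

```python
from typing import List


def lids_for_port(base_lid: int, lmc: int) -> List[int]:
    """Return the 2**lmc consecutive LIDs assigned to a port."""
    if not 0 <= lmc <= 7:
        raise ValueError("LMC is a 3-bit value, so it must be in 0..7")
    count = 1 << lmc
    if base_lid % count != 0:
        raise ValueError("base LID must be aligned to 2**lmc")
    return list(range(base_lid, base_lid + count))


def pick_destination_lid(base_lid: int, lmc: int, message_index: int) -> int:
    """Round-robin over the port's LIDs (an illustrative policy only)."""
    lids = lids_for_port(base_lid, lmc)
    return lids[message_index % len(lids)]


if __name__ == "__main__":
    # Hypothetical port with base LID 16 and LMC 2, giving four LIDs.
    print(lids_for_port(16, 2))
    print([pick_destination_lid(16, 2, i) for i in range(6)])
```

Each LID can be routed differently by the subnet manager, which is what lets a sender steer around a congested link.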

Main New Features - Open MPI
• Open MPI - 1.2.1
• Improvements to scalability of launching applications on large numbers of nodes
• "Installdirs" functionality (install OMPI into one place and then be able to move it elsewhere; good for ISVs)
• Support for fork() when using the OF libibverbs
• Support for setting fixed limits on registered memory
• Fixes for heterogeneous network environments (e.g., different number of IB ports on different hosts)

MPI - MVAPICH2
• Includes most of the features of MVAPICH
• Performance and scalability comparable to MVAPICH for two-sided communication
• Added MPI-2 features (one-sided communication, collectives and datatype)
• Integrated multi-rail support
• Multi-threading support (MPI_THREAD_MULTIPLE)
• RDMACM support for InfiniBand and iWARP
• Checkpoint/Restart support for application-transparent, system-level fault tolerance

Agenda
• What is OFED?
• OFED Architecture and Components
• OFED 1.2 New Features
• OFED 1.2 Status
  • OFED 1.2 Release Status
  • OFED 1.2 System Matrix
  • Third Party Components Testing
• What’s next?

OFED 1.2 Release Status
• Feature freeze: Feb 2
• Alpha: Feb 14
• Beta: Mar 14
• RC1: Apr 4
• RC2: Apr 18
• RC3: May 3
• Release: May 16

OFED 1.2 System Matrix
• CPU Arch:
  • x86, x86_64, PPC64, ia64 (IB only)
• kernel.org: kernel 2.6.20 and 2.6.19
• Novell:
  • SLES 9 SP3
  • SLES 10 (SP1)
• Red Hat:
  • RHEL 4 (Update 3 and Update 4)
  • RHEL 5
• Free distros (Fedora, SuSE Pro, Ubuntu)
  • Basic testing only

Third Party Components Testing
• Proprietary MPIs:
  • Intel
  • HP (over uDAPL)
• Proprietary SMs:
  • Cisco, Voltaire, QLogic
• Storage Targets:
  • iSER: IPStor (FalconStor), Voltaire FC GW
  • SRP: Engenio, MTD 2000, Areca-1220, DDN, Cisco GW


What’s Next?
• OFED 1.3 – Oct/Nov 07
• Features that didn’t make it into 1.2
• Minimize integration effort into OS distribution
• Definition immediately after 1.2 is out
• QoS - collaborate with IBTA to align schedule of software delivery
• IPoIB: NAPI
• NFS over RDMA integration
• Mellanox ConnectX IB HCA support
  • Including new features
• Other features to be agreed upon by OFA and EWG at the conference

Summary
• OFED becomes the industry standard
• OFED 1.2 for the EDC market:
  • Stability
  • Performance
  • High Availability
  • Scalability
• OFED 1.2 for the HPC market:
  • Scalable for large clusters
  • Multi-core support
  • Multi-rail
  • Performance improvements
• Successful collaboration between all participants

Thank You
http://openfabrics.org/