
SCI SOCKET: The fastest socket on earth?
Atle Vesterkjær, atleve@dolphinics.com
http://www.dolphinics.com
Olaf Helsets vei 6, NO-0619 Oslo, Norway
Phone: +47 23 16 70 00, Fax: +47 23 16 71 80
LCSC 2004

SCI SOCKET - Outline
• The fastest socket on earth and the impact on storage and applications
  Ø SCI technology
  Ø SCI SOCKET for storage and applications
  Ø SCI SOCKET benchmarks

Highlights of the Dolphin SCI Technology
• Ultra Low Latency
  Ø CPU has direct access to remote memory
  Ø No protocol overhead
    • 1.4 µs 4-byte write
    • < 3 µs 512-byte write
    • 0.2 µs pipelined write
  Ø Fast failover for HA systems
• Highly efficient bus bridging
  Ø Bus requests and responses (CPU load/store operations) are translated directly in hardware to request and response packets
  Ø Point-to-point links give bus performance and latency over distance
    • High data throughput: ~346 MByte/s
    • 0.2 µs pipelined write

Highlights of the Dolphin SCI Technology
• Wide Application Area - Common Mode: Multiprocessing
  Ø Storage, Clustering, Multiprocessing, Embedded Systems, Telecommunication, Defense, Medical Imaging
• Choice of Topologies: Ring, Torus, Switched
• Shipping in Critical Applications for more than 10 years
• Based on ANSI/IEEE 1596-1992 Scalable Coherent Interface (SCI) Standard

Linköping University - NSC - SCI Clusters
Also in Sweden: Umeå University, 120 Athlon nodes
• Monolith: 200 nodes, 2 x Xeon, 2.2 GHz, 3D SCI
• INGVAR: 32 nodes, AMD 900 MHz, 2D SCI
• Otto: 48 nodes, P4 2.26 GHz, 2D SCI
• Maxwell: 40 nodes, 2 x Xeon, 2D SCI
• Bris: 16+2, 2 x Xeon
• Total 336 SCI Nodes

Applications, Database Clustering
Ultra Enterprise Cluster
• SUN's High End servers are clustered with Dolphin Cards
  Ø Money Transaction and Database Applications
  Ø High Availability and Performance
  Ø Dolphin Ships: Cards and Switches
  Ø 7th year of shipments
  Ø Oracle 9i Performance and Scalability
  Ø SCI runs natively on SUN's RSM (Remote Shared Memory) API

Mirage 2000 Upgrade, First Test Flight January 2001
Thales uses Dolphin's technology as the main interconnect in the on-board multiprocessor.
Offered with systems like Mirage 2000-9, Mirage 2000-5, Rafale and more.

Space Mission Application
Dolphin's technology is chosen for evaluation
http://sim.jpl.nasa.gov/
Dolphins in Space!

SCI Adapter Cards - 64 bit 66 MHz
• PCI-, PMC(VME)- and CompactPCI™ SCI Adapter Card
• Industry-best latency
  Ø 1.4 microseconds 4-byte write
  Ø < 3 microseconds 512-byte write
  Ø 0.2 microseconds pipelined write
• High data throughput: ~346 MBytes/s
• Supports:
  Ø Direct Memory Access (DMA)
  Ø Remote Memory Access (RMA)
  Ø Remote Interrupt
• Hot-pluggable cabling
• Redundant SCI adapters can be used for fault tolerance
[Diagram labels: PCI, LC, PSB, SCI; Cluster Adapter, PCI to PCI Bridge, PCI Extension, Reflected Memory]

Dolphin Products: Switches, Chips and Cards

Torus Topology
• 1D Topology (Ring): up to 10 nodes
• 2D Torus Topology: up to 100+ nodes
• 3D Torus Topology: up to 1000s of nodes
[Diagram: adapter cards with PCI, PSB and one, two or three SCI link controllers (LC) for the 1D, 2D and 3D cases]
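The node counts above are consistent with roughly ten nodes per ring in each dimension (10, 10 x 10, 10 x 10 x 10). As a minimal sketch of how a torus is addressed, the C functions below compute the total node count and the wrap-around neighbour of a node along one dimension. The node numbering (dimension 0 varying fastest) and the function names are assumptions made for this illustration; they are not taken from Dolphin's software.

```c
#include <stddef.h>

/* Number of nodes in a torus whose ring size along dimension i is dims[i]. */
unsigned torus_node_count(const unsigned *dims, size_t ndims)
{
    unsigned total = 1;
    for (size_t i = 0; i < ndims; i++)
        total *= dims[i];
    return total;
}

/*
 * Neighbour of `node` one hop along dimension `dim`.
 * step > 0 moves forward on the ring, otherwise backward; both wrap around.
 * Assumption of this sketch: nodes are numbered with dimension 0 varying fastest.
 */
unsigned torus_neighbor(unsigned node, const unsigned *dims, size_t dim, int step)
{
    unsigned stride = 1;
    for (size_t i = 0; i < dim; i++)
        stride *= dims[i];

    unsigned coord = (node / stride) % dims[dim];
    unsigned next  = (coord + (step > 0 ? 1u : dims[dim] - 1u)) % dims[dim];

    /* Swap the old coordinate for the new one, leaving the others untouched. */
    return node - coord * stride + next * stride;
}
```

For a 10 x 10 torus, `torus_neighbor(9, dims, 0, +1)` wraps from the end of a ring back to node 0, which is the property that lets every node reach its neighbours over point-to-point links without a central switch.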

Dolphin SW
• All Dolphin SW is free open source (GPL or LGPL)
• SISCI - shared memory interface
• SCI-Sockets
  Ø Low Latency Socket Library
  Ø TCP and UDP Replacement
  Ø User and Kernel level support
  Ø Release 2.3 available
• SCI-MPICH (RWTH Aachen)
  Ø MPICH 1.2 and some MPICH2 features. MPICH2 in development.
  Ø New release is being prepared, beta available
• SCI Interconnect Manager
  Ø Automatic failover recovery
  Ø No single point of failure in 2D and 3D networks
• Other
  Ø SCI Reflective Memory, Scali MPI, Linux Labs SCI Cluster Cray-compatible shmem and Clugres PostgreSQL, MandrakeSoft Clustering HPC solution, Xprimes X1 Database Performance Cluster for Microsoft SQL Servers, ClusterFrame from Qlusters and SunCluster 3.1 (Oracle 9i), MySQL Cluster

Latency vs SW

SW                    Latency (1/2 ping-pong round trip)
SISCI (Direct HW)     1.4 µs
SCI-Sockets           2.3 µs
Scali MPI Connect     3.5 µs
SCI-MPICH             3.8 µs

SCI SOCKET
[Diagram: Legacy Socket Applications / SCI SOCKET / Low Latency SCI Interconnect]

Motivation
• Link level speeds of interconnects are increasing
  Ø Communication bottleneck moved to protocol software
  Ø High speed networks provide their own efficient interfaces
• On the other hand:
  Ø A large number of applications are built around legacy protocols such as the TCP/IP suite
  Ø De-facto standard: Berkeley Sockets API
  Ø Porting to hardware-specific APIs is unprofitable in many cases
• SCI SOCKET aims to bring together:
  Ø Legacy Socket Applications
  Ø Low Latency SCI Interconnect

Berkeley Sockets over SCI
• High Speed, Low Latency Replacement for Gigabit Ethernet for Critical Applications
• Bypassing traditional network stacks like TCP/UDP/IP
  Ø Eliminating protocol overhead and reducing latency
• Transparent to applications, no modifications or recompilation required
• Ultra low latency
  Ø 2.27 µs socket send/receive latency

Berkeley Sockets over SCI
• Data transfer through remote shared memory
• Offers new socket transport family AF_SCI
• Flexible using configuration files
  Ø Specifying cluster nodes
  Ø Specifying ports

LD_PRELOAD
• Standard mechanism to preload C library functions
• User-defined library functions are called instead of the C library's
• AF_INET selects traditional TCP/IP path
• AF_SCI selects SCI_SOCKET

    int socket(int family, int type, int protocol)
    {
        /* TCP and UDP sockets (SOCK_STREAM, SOCK_DGRAM) are redirected to SCI. */
        if (family == AF_INET && (type == SOCK_STREAM || type == SOCK_DGRAM))
            return socket_lib(AF_SCI, type);
        else
            return socket_lib(family, type);
    }
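The preload mechanism can be sketched as a small shared library that defines its own `socket()` and forwards to the next definition found by the dynamic linker. This is an illustration only, under several assumptions: the numeric value of `AF_SCI` below is a placeholder (the real one would come from Dolphin's headers), the helper `sci_pick_family` is a hypothetical name, and the sketch assumes that an AF_SCI family is reachable through the next `socket()` in the lookup chain. It also falls back to the requested family when the AF_SCI attempt fails, which is a design choice of this sketch and not something the slides describe.

```c
#define _GNU_SOURCE            /* needed for RTLD_NEXT on glibc */
#include <dlfcn.h>
#include <stddef.h>
#include <sys/socket.h>

/* PLACEHOLDER value for illustration; not the real AF_SCI number. */
#ifndef AF_SCI
#define AF_SCI 250
#endif

/* Decide which address family a socket request should use. */
int sci_pick_family(int family, int type)
{
    if (family == AF_INET && (type == SOCK_STREAM || type == SOCK_DGRAM))
        return AF_SCI;
    return family;
}

/* Found before libc's socket() when this library is listed in LD_PRELOAD. */
int socket(int family, int type, int protocol)
{
    static int (*real_socket)(int, int, int);

    if (real_socket == NULL)
        real_socket = (int (*)(int, int, int))dlsym(RTLD_NEXT, "socket");
    if (real_socket == NULL)
        return -1;

    int fd = real_socket(sci_pick_family(family, type), type, protocol);
    if (fd < 0 && family == AF_INET)
        fd = real_socket(family, type, protocol);   /* fall back to TCP/IP */
    return fd;
}
```

Built with something like `gcc -shared -fPIC preload.c -o libpreload.so -ldl` and started with `LD_PRELOAD=./libpreload.so ./app` (both file names hypothetical), an unmodified application would have its `socket()` calls pass through this function first, which is why no recompilation is needed. The sketch does not handle type flags such as SOCK_NONBLOCK being ORed into `type`.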

SCI SOCKET
• Easy installation of the SCI socket library
[Diagram labels: Application, Configuration file, SCI Socket library, SCI, Standard Socket library, Ethernet]

Configuration File
/etc/scisock.conf
• Selects which machines can be reached using SCI
• Optionally, /etc/scisock_opt.conf selects which ports can be reached using SCI

    #This is a SCI socket config file
    #Should be placed in /etc/sci
    #
    #hostname          SCI NodeId
    nodeA              4
    193.71.152.89      8
    Mailhost           16
    File-serv          20

    #This is a SCI socket_opt config file
    #Should be placed in /etc/sci directory
    #
    #-key                  -Type      -value
    EnablePortsByDefault              yes/no
    EnablePort             tcp|udp    'portnumber'
    DisablePort            tcp|udp    'portnumber'
    EnablePortRange        tcp|udp    'start_port end_port'
    DisablePortRange       tcp|udp    'start_port end_port'
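How such port options might combine into a per-connection decision can be sketched as a small rule table. Everything here is an assumption for illustration: the struct layout, the function name `sci_port_enabled`, and in particular the precedence (later rules override earlier ones, and ports no rule mentions fall back to the EnablePortsByDefault setting). The slides do not state how the real library resolves overlapping entries.

```c
#include <stdbool.h>
#include <stddef.h>

enum proto { PROTO_TCP, PROTO_UDP };

/* One EnablePort / DisablePort / EnablePortRange / DisablePortRange entry. */
struct port_rule {
    enum proto proto;
    unsigned   first;    /* first port, inclusive */
    unsigned   last;     /* last port, inclusive; equals first for a single port */
    bool       enable;   /* true for Enable*, false for Disable* */
};

/*
 * Returns true if traffic on (proto, port) should use SCI.
 * Assumed precedence: the last matching rule wins; with no match the
 * EnablePortsByDefault value applies.
 */
bool sci_port_enabled(const struct port_rule *rules, size_t nrules,
                      bool enable_by_default, enum proto proto, unsigned port)
{
    bool enabled = enable_by_default;
    for (size_t i = 0; i < nrules; i++) {
        if (rules[i].proto == proto &&
            port >= rules[i].first && port <= rules[i].last)
            enabled = rules[i].enable;
    }
    return enabled;
}
```

With EnablePortsByDefault set to no, an EnablePortRange for tcp 5000 5999 followed by a DisablePort for tcp 5500 would route TCP port 5001 over SCI while leaving 5500 and all UDP ports on the ordinary network path.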

Linux Kernel Socket Switch
[Diagram labels: User space: User App. Kernel space: Cluster File System, iSCSI, Linux Kernel Socket Switch, SCI Native SOCKET, TCP/UDP/IP, Socket lib, Ethernet driver. Hardware: Ethernet HW, SCI HW]

Small Message Latency

TCP STREAM

TCP-RR SCI SOCKET vs Gigabit Ethernet

Scali MPI over SCI SOCKET
• SCI SOCKET is 1.6 - 6.0 times faster than TCP/GigE

Why is SCI SOCKET so fast?
• Small messages are sent using basic CPU instructions
  Ø Data are normally located in CPU cache
  Ø Low cost write post to local memory address
  Ø Single store CPU instruction to send 8 bytes
  Ø Raw send latency for 8 bytes is approximately 210 nanoseconds
  Ø No need to lock down or register memory
• Large messages are sent using DMA
• Streamlined and lock-free messaging protocol on top of shared memory
• Combination of polling and interrupts
• Receiving a message causes the received message to be cached
  Ø No additional memory access
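A lock-free protocol on top of shared memory, with 8-byte messages sent by a single store and picked up by polling, can be illustrated with a generic single-producer, single-consumer ring. This is a textbook technique written with C11 atomics in ordinary process memory; it is not Dolphin's actual protocol, and the names `spsc_ring`, `ring_send` and `ring_poll` are hypothetical. On SCI the slots would live in a remotely mapped memory segment, which this sketch does not model.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SLOTS 64u   /* must be a power of two */

struct spsc_ring {
    _Atomic uint32_t head;        /* next slot the producer will write */
    _Atomic uint32_t tail;        /* next slot the consumer will read  */
    uint64_t slot[RING_SLOTS];    /* one 8-byte message per slot       */
};

/* Producer side: returns false if the ring is full. Never blocks or locks. */
bool ring_send(struct spsc_ring *r, uint64_t msg)
{
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

    if (head - tail == RING_SLOTS)
        return false;

    r->slot[head % RING_SLOTS] = msg;   /* the payload is one 8-byte store */
    /* Release: the payload becomes visible before the new head does. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}

/* Consumer side: polls once; returns false if no message is waiting. */
bool ring_poll(struct spsc_ring *r, uint64_t *msg)
{
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);

    if (head == tail)
        return false;

    *msg = r->slot[tail % RING_SLOTS];
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}
```

Because only the producer writes `head` and only the consumer writes `tail`, neither side ever waits on a lock; a receiver can spin on `ring_poll` for low latency and fall back to an interrupt-driven wait when the ring stays empty, which is the polling and interrupt combination the slide mentions.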

Cluster File Systems
• SCI SOCKET: A typical cluster file system will run out of the box
• PVFS
  Ø Open Source / GPL software
  Ø http://www.parl.clemson.edu/pvfs/desc.html
• Lustre
  Ø Open Source / GPL software
  Ø http://www.lustre.org/
• GFS
  Ø Global File System
  Ø Commercial file system available from Sistina
  Ø www.sistina.com/products_gfs.htm

iSCSI
• SCSI over IP
  Ø Protocol for encapsulating SCSI commands into IP packets
  Ø I/O block data transport over IP networks
• iSCSI and SCI SOCKET can be used to build scalable SAN / NAS solutions
[Diagram: iSCSI Driver, TCP/IP, NIC, IP network, NIC, TCP/IP, SCSI Driver]

iSCSI over SCI SOCKET
• Latency is approximately 10 x better than Gigabit Ethernet
  Ø Latency is reported by Intel's 'ktest'

                  Gigabit Ethernet    SCI SOCKET
SCSI op 0x28      250 µs              29 µs
SCSI op 0x2A      250 µs              31 µs
SCSI op 0x25      250 µs              27 µs
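In the SCSI command set, the opcodes in the table are READ(10) (0x28), WRITE(10) (0x2A) and READ CAPACITY(10) (0x25). As a minimal sketch of what iSCSI encapsulates, the function below fills in the 10-byte command descriptor block (CDB) for a READ(10) or WRITE(10). The function and enum names are hypothetical, and the sketch stops at the CDB: it does not build the iSCSI PDU that carries it.

```c
#include <stdint.h>
#include <string.h>

enum {
    SCSI_READ_CAPACITY_10 = 0x25,
    SCSI_READ_10          = 0x28,
    SCSI_WRITE_10         = 0x2A
};

/*
 * Fill a 10-byte CDB for READ(10) or WRITE(10).
 * Layout: byte 0 opcode, bytes 2-5 logical block address (big-endian),
 * bytes 7-8 transfer length in blocks (big-endian), byte 9 control.
 * Flag bits (byte 1) and the group number (byte 6) are left at zero.
 */
void scsi_build_rw10(uint8_t cdb[10], uint8_t opcode,
                     uint32_t lba, uint16_t blocks)
{
    memset(cdb, 0, 10);
    cdb[0] = opcode;
    cdb[2] = (uint8_t)(lba >> 24);
    cdb[3] = (uint8_t)(lba >> 16);
    cdb[4] = (uint8_t)(lba >> 8);
    cdb[5] = (uint8_t)lba;
    cdb[7] = (uint8_t)(blocks >> 8);
    cdb[8] = (uint8_t)blocks;
}
```

Each such command costs at least one request and one response on the wire, so the per-message socket latency feeds directly into the per-operation times shown in the table.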

iSCSI over SCI SOCKET
• Throughput is 2-4 times Gigabit Ethernet

SCI SOCKET comparison

Technology       Latency    Throughput    Reference
SCI              2.26 µs    2016 Mbps     www.dolphinics.com
Myrinet          12 µs      1818 Mbps     www.myrinet.com
Gbit Ethernet    23 µs      936 Mbps      www.dolphinics.com
Infiniband       28 µs      3768 Mbps     IEEE Symposium IPASS 2004
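One way to read the latency and throughput columns together is a simple first-order model: the time to deliver a message is the latency plus the message size divided by the throughput. This model is an assumption added here for illustration (it ignores protocol effects and treats the quoted throughput as sustained), and the function names are hypothetical. Under it, the lower-latency link wins for small messages and the higher-throughput link wins above a crossover size.

```c
/*
 * Estimated one-way transfer time in microseconds.
 * 1 Mbps is one bit per microsecond, so bits / Mbps yields microseconds.
 */
double transfer_time_us(double latency_us, double throughput_mbps, double bytes)
{
    return latency_us + bytes * 8.0 / throughput_mbps;
}

/*
 * Message size in bytes above which link B (higher latency, higher
 * throughput) becomes faster than link A. Returns -1.0 when B is not
 * both slower to start and faster to stream, so no such crossover exists.
 */
double crossover_bytes(double lat_a_us, double mbps_a,
                       double lat_b_us, double mbps_b)
{
    if (lat_b_us <= lat_a_us || mbps_b <= mbps_a)
        return -1.0;

    double us_per_byte_a = 8.0 / mbps_a;
    double us_per_byte_b = 8.0 / mbps_b;
    return (lat_b_us - lat_a_us) / (us_per_byte_a - us_per_byte_b);
}
```

Plugging in the table's SCI and Infiniband rows, `crossover_bytes(2.26, 2016, 28, 3768)` comes out at roughly 14 kB: under this simplified model SCI would deliver anything smaller than that sooner, which is why latency dominates for the small-message workloads discussed in this talk.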

SCI vs other interconnects
• As reported by Ames Lab (Iowa State University, USA)
  Ø Netpipe benchmark

Applications running SCI SOCKET
Ø Intel iSCSI
Ø PVFS
Ø LUSTRE
Ø MySQL Cluster
Ø LAM-MPI
Ø MPICH2
Ø PVM
Ø Oracle (Client/Server sqlplus)
Ø TerraGrid (tm) by Terrascale
Ø Scali MPI Connect™
Ø Latency_bench
Ø Netpipe TCP/PVM
Ø Netperf

Current Development
Ø Available on x86 and x86_64, Linux 2.4 and 2.6
Ø Itanium beta release is ready
Ø Porting to Windows in progress
Ø Support for multiple adapters in progress
  • Data striping multiplies throughput with no latency penalty or extra CPU load
  • Redundancy and transparent failover to other SCI adapter and Ethernet


http://www.gria.org/
• Would you like your computers to earn you extra money?
• Would you like to have cheap access to tons of computing power?
  Ø The GRIA project will take Grid technology into the real world, enabling industrial users to trade computational resources on a commercial basis to meet their needs more cost effectively.
• GRIA enables organizations to:
  Ø Outsource computation.
    • If you need short-term computation, and cannot justify the expense of the hardware purchase, GRIA provides a mechanism to discover, negotiate and utilize other organizations' spare computing resources.
  Ø Rent out spare CPU cycles.
    • GRIA provides a mechanism allowing you to commercially offer your spare computing resources on the Grid.

Acknowledgement
• SCI SOCKET kernel module has been developed in the IST-33240 project GRIA (http://www.gria.org)
• SCI SOCKET user space software library has been developed in the ITEA project HYADES (http://www.hyades-itea.org)
• The SCI SOCKET software is open source and available under GPL/LGPL. Dolphin strongly appreciates the contribution to the code and testing done by volunteer programmers and partners.
• More information about SCI SOCKET can be found at http://www.dolphinics.com/products/software/sci_sockets.html