High Speed Supercomputer Communications in Broadband Networks

Ralph Niederberger
Research Center Jülich GmbH
R.Niederberger@fz-juelich.de

Helmut Grund, Ferdinand Hommes, Eva Pless
GMD - German National Research Center for Information Technology
Helmut.Grund@gmd.de, Ferdinand.Hommes@gmd.de, Eva.Pless@gmd.de

Cray User Group Meeting, 24-28 May 1999, Minneapolis, USA

• Introduction
• GTB West
  – Goals, Projects, Timeframes and Configuration
  – Supercomputer Impediments and Solutions
• Status of Cray Supercomputer Communications
• Future Tests
• Summary

Introduction
• New kinds of microprocessors and larger internal storage lead to new kinds of supercomputing systems, each best suited to different kinds of problems.
• The two best-known types of supercomputers are massively parallel systems and vector systems.
• A new kind of supercomputer is the metacomputer.
• A metacomputer distributes an application across two or more identical or different machines that are coupled dynamically via an external network.
• The distribution may be done by quality (functional distribution) or by quantity.

Introduction
Distributing an application across more than one system is recommended only if the computation time can be decreased significantly. This depends on the degree of parallelization and on the communication time between processes.
Communication time depends on:
• communication medium and protocol
• length of the communication link
• number of intermediate systems
• performance of the communicating systems (CPU, internal communication, ...)

GTB West
Project sponsored by BMBF and DFN with financial participation of the project partners.
Partners:
• Research Center Jülich GmbH, http://www.fz-juelich.de
• GMD - German National Research Center for Information Technology, http://www.gmd.de
• Deutsches Klimarechenzentrum, http://www.dkrz.de
• Alfred Wegener Inst. for Polar & Marine Res., http://www.awi.de
• Pallas GmbH, http://www.pallas.de
• o.tel.o, http://www.o-tel-o.de
Runtime: Aug 1st 1997 - Jan 31st 2000
More info: http://www.fz-juelich.de/gigabit

GTB West - Projects
• Giganet - Configuration, Management and Performance Analysis of the Gigabit Testbed
• Methods and Tools, Software Support
• Solute Transport in Ground Water
• Algorithmic Analysis of Magnetoencephalography Data
• Complex Visualization over a Gigabit WAN
• Multimedia Applications in a Gigabit WAN
• Distributed Calculations of Climate and Weather Models
• Porting Parallel and Distributed Applications from the CEC CISPAR Project

GTB West - Goals
• Demonstrate the usefulness of high-speed wide-area communication networks for scientific computing
• Engage in selected applications that are known to need very high communication bandwidth
• Major objective:
  – coupling of architecturally different supercomputers, i.e. vector computers and massively parallel computers, to build a new kind of metacomputer
• Strengthen the know-how in
  – high-speed computer communications
  – metacomputing in LAN and WAN environments
  – coupling of the supercomputer centers in Germany

Status (Phase 1)
• Installation on top of o.tel.o high-tension cables
• Most problems at the last mile
  – at GMD: underground work necessary
  – at Research Center Jülich: installation of fiber cables together with the hot water supply
• o.tel.o offers SDH infrastructure and uses Lucent technology
• No major problems have been found using the o.tel.o trunk lines

Trunk lines (Phase 1)
[Figure: trunk lines with repeater]

Status
• 622 Mbit/s link stable for one year
• CRAY T3E supercomputer connected with 155 Mbit/s ATM
• FZJ - GMD link upgrade (622 Mbit/s to 2.4 Gbit/s): end of July 1998
• Aug 5, 1998: first ATM WAN connection with 2.4 Gbit/s user data (8 workstations with 155 / 622 Mbit/s interfaces)
  – 96.4% (TCP)
  – 99.97% (UDP) (high packet loss)
• Beta test of FORE ASX-4000 ended
• Beta test of HiPPI to ATM gateway (SUN and SGI) ongoing
• Throughput and delay measurements ongoing
• Monitoring and accounting of the trunk line with HP OpenView at GMD

IBM SP2 internals
[Diagram: SP2 nodes in Frames 1 to 3 with HP-Switches; 10 Mbps internal Ethernet and 10 Mbps external Ethernet; external machines attached through a HIPPI switch (800 Mbps) and an ATM switch (155 Mbps, 622 Mbps).]

Cray T3E internals
[Diagram: T3E 3D torus of T3E processors (groups of 4 processors) and communication nodes; GigaRing with HIPPI, ATM, FDDI and Ethernet interfaces; external machines attached through a HIPPI switch (800 Mbps), an ATM switch (155 Mbps), an FDDI ring (100 Mbps) and Ethernet (10 Mbps).]

Cray T3E internals (2)
[Diagram: User PE, Support/OS PE and device driver PE; I/O controllers attached to GigaRings; D-MPN, HPN and MPN nodes with ATM, FDDI and Ethernet interfaces. MPN: Sbus system with 200 MB.]

Impediments
Current problem: communication throughput within supercomputers and between supercomputers differs extremely.
Example: Cray T3E with an internal communication throughput of 500 MB/s, bidirectional, in three dimensions (3D torus).
High-speed external connections:
• (Fast) Ethernet (10 - 100 Mb/s)
• FDDI (100 Mb/s)
• HiPPI (800 Mb/s - 1600 Mb/s)
• Super HiPPI (6400 Mb/s)
• ATM (155 Mb/s, 622 Mb/s - 2.4 Gb/s)
• Gigabit Ethernet (1 Gb/s)

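To make the gap between internal and external throughput concrete, the following minimal sketch converts the 500 MB/s internal figure to Mb/s and divides it by each external link rate from the list above. It assumes decimal prefixes (1 MB/s = 8 Mb/s) and considers a single internal link in one direction; the dictionary keys and function name are illustrative only.

```python
# Internal T3E link throughput from the slide, converted to Mb/s.
# Assumption: decimal prefixes, one link, one direction.
T3E_INTERNAL_MBYTE_S = 500
T3E_INTERNAL_MBIT_S = T3E_INTERNAL_MBYTE_S * 8

# Nominal external link rates in Mb/s, as listed on the slide.
EXTERNAL_LINKS_MBIT_S = {
    "Ethernet": 10,
    "Fast Ethernet": 100,
    "FDDI": 100,
    "ATM 155": 155,
    "ATM 622": 622,
    "HiPPI": 800,
    "Gigabit Ethernet": 1000,
    "ATM 2.4G": 2400,
    "Super HiPPI": 6400,
}


def internal_to_external_ratio(link_mbit_s):
    """How many times faster the internal link is than an external one."""
    return T3E_INTERNAL_MBIT_S / link_mbit_s


for name, rate in EXTERNAL_LINKS_MBIT_S.items():
    ratio = internal_to_external_ratio(rate)
    print(f"{name:18s} {rate:5d} Mb/s  internal/external = {ratio:6.1f}")
```

A ratio above 1 means the external connection is slower than a single internal torus link under these assumptions.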
Cray Systems Network Environment
[Diagram: CRAY/T3E 512, CRAY/T3E 256, CRAY/T90, CRAY/J90 compute server and CRAY/J90 file server, connected through Essential HiPPI EPS 1004, an FDDI concentrator, 155 Mb/s ATM and Cisco routers to JuNet and the worldwide Internet.]
Connecting a Cray system with n systems requires 2 * n PVC entries.

PVC configuration not recommended

High speed communication
Alternatives for communication between CRAY/T3E and IBM/SP2:
• raw HiPPI (800 Mb/s)
  – HiPPI tunneling (622 Mb/s, currently MTU 9180)
  – HiPPI SONET extender (currently 155 Mb/s or 932 Mb/s)
• TCP/IP via HiPPI (622 Mb/s, currently MTU 9180 because of routing)
• native ATM (155 Mb/s, 622 Mb/s) (Hardware?, Software?)
• TCP/IP via ATM (155 Mb/s, 622 Mb/s) (Hardware?)

Throughput considerations
• Transmission time in fiber optic cables:
  tt = length of medium / (0.66 * c), with c = 300,000 km/s
  plus additional delays in routers, switches etc.
  tt_opt = 100 km / (0.66 * 300,000 km/s) ≈ 1/2000 s = 0.5 ms
• Use path MTU discovery
• Adapt socket buffers to the bandwidth-delay product
  BDP = B * RTT = 622 Mb/s * 0.5 ms = 311 kb ≈ 40 kB
  (0.5 ms is the one-way delay; with the full round trip of 1 ms the BDP doubles)
• Use setsockopt to set:
  – SO_SNDBUF and SO_RCVBUF: 1 MB
  – TCP_NODELAY=1 and TCP_WINSHIFT=4

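The two formulas on this slide can be checked with a few lines of arithmetic. This is a minimal sketch using the slide's own values (100 km, factor 0.66, c = 300,000 km/s, 622 Mb/s); the function and constant names are illustrative and not taken from the presentation.

```python
FIBER_VELOCITY_FACTOR = 0.66      # fraction of c reached in the fiber (from the slide)
SPEED_OF_LIGHT_KM_S = 300_000.0   # c as used on the slide


def propagation_delay_s(length_km):
    """One-way transmission time in fiber, ignoring routers and switches."""
    return length_km / (FIBER_VELOCITY_FACTOR * SPEED_OF_LIGHT_KM_S)


def bdp_bits(bandwidth_bit_s, delay_s):
    """Bandwidth-delay product in bits."""
    return bandwidth_bit_s * delay_s


one_way_s = propagation_delay_s(100.0)       # about 0.5 ms
bdp_slide_bits = bdp_bits(622e6, 0.5e-3)     # the slide's value: 311 kb
bdp_slide_bytes = bdp_slide_bits / 8         # about 39 kB
bdp_round_trip_bits = bdp_bits(622e6, 1e-3)  # using the 1 ms round trip

print(f"one-way delay: {one_way_s * 1e3:.3f} ms")
print(f"BDP (0.5 ms):  {bdp_slide_bits / 1e3:.0f} kb = {bdp_slide_bytes / 1e3:.1f} kB")
print(f"BDP (1 ms):    {bdp_round_trip_bits / 1e3:.0f} kb")
```

Either way, the 1 MB socket buffers recommended on the slide leave ample headroom over the computed product.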
Throughput considerations (2)

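The socket options named under the throughput considerations can be set as in the sketch below, which uses Python's standard socket module rather than the C setsockopt call used on the Cray systems. The helper name make_tuned_socket is hypothetical. TCP_WINSHIFT is a platform-specific option from the slide and, as far as we know, is not exposed by Python's socket module, so it is omitted here; on current systems window scaling is usually negotiated by the operating system.

```python
import socket

BUFFER_BYTES = 1 * 1024 * 1024  # 1 MB send and receive buffers, as on the slide


def make_tuned_socket(buffer_bytes=BUFFER_BYTES):
    """Create a TCP socket with enlarged buffers and Nagle's algorithm disabled.

    The buffers are set before connect() so that the TCP window can be
    negotiated with the larger size in mind.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, buffer_bytes)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, buffer_bytes)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


with make_tuned_socket() as sock:
    # The operating system may clamp or round the requested sizes,
    # so read back what was actually granted.
    effective_sndbuf = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
    effective_rcvbuf = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    nodelay = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)

print("SO_SNDBUF:", effective_sndbuf)
print("SO_RCVBUF:", effective_rcvbuf)
print("TCP_NODELAY:", nodelay)
```

Reading the values back matters because many systems silently limit the buffer size to a configured maximum.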
Supercomputer - Impediments
CRAY T3E communication throughput measured:
• Maximum of 115 Mb/s via TCP/IP over ATM
  MTU 9180 (default MTU from the standard)
• Maximum of 430 Mb/s via TCP/IP over HiPPI
  MTU 64 KB because of IP header fields
• Maximum of 530 Mb/s via raw HiPPI
  no real MTU limitation
Netperf between SUN Ultra 60 and SGI Origin 200:
• Maximum of 535 Mb/s user data via 622 Mb/s ATM

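The measured maxima are easier to compare as a fraction of the nominal link rate. The sketch below pairs each measurement with a nominal rate; the pairing is an assumption based on the rates quoted elsewhere in the slides (155 Mb/s ATM on the T3E, 800 Mb/s HiPPI, 622 Mb/s ATM for the workstations).

```python
# (description, measured Mb/s, assumed nominal Mb/s)
MEASUREMENTS = [
    ("T3E, TCP/IP over ATM (MTU 9180)", 115, 155),
    ("T3E, TCP/IP over HiPPI (MTU 64 KB)", 430, 800),
    ("T3E, raw HiPPI", 530, 800),
    ("SUN Ultra 60 / SGI Origin 200, ATM", 535, 622),
]


def efficiency(measured_mbit_s, nominal_mbit_s):
    """Fraction of the nominal link rate that was actually achieved."""
    return measured_mbit_s / nominal_mbit_s


efficiencies = {
    name: efficiency(measured, nominal)
    for name, measured, nominal in MEASUREMENTS
}

for name, value in efficiencies.items():
    print(f"{name:38s} {value:6.1%}")
```

Under these assumptions the workstation pair comes closest to its nominal rate, while TCP/IP over HiPPI on the T3E uses a little over half of it.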
Net topology
[Diagram: FZJ and GMD sites linked over the GMD - FZJ trunk. Labels: SUN Ultra 2 (2 proc.), SUN Ultra 60 (2 proc.), SUN Enterprise 4000, SGI O200, IBM SP2, CRAY T3E (256 proc.), CRAY T3E (512 proc.), CRAY T90 (16 proc.), 8 x HIPPI switch, Fore ASX-1000, Fore ASX-4000, Cisco A100, Cisco LS1010; hosts atmsun and atmfore. Legend: 155 Mbit/s, 622 Mbit/s, 800 Mbit/s, 2.4 Gbit/s.]

Gigabit Testbed West Network Layout
[Diagram: IBM/SP2 and CRAY/T3E attached via HiPPI (800 Mb/s, MTU 64K) to SGI/SUN gateways (HiPPI/Sbus, HiPPI/PCI); ATM at 622 Mb/s with 64K MTU towards the ATM ASX-4000; 2.4 Gb/s link over 110 km between FZJ and GMD; Cisco routers at FZJ and GMD attached via ATM at 155 / 622 Mb/s with 9K MTU.]

Gigabit Tests, July 30th 1998
[Diagram at GMD. Labels: filou, 2.4 Gbit interface, SUN Enterprise 5000, 2.4 Gbps, baloo, 622 Mbps, ATM switch, ATM/SDH, SUN Ultra 60.]

Gigabit Speed Record, August 5th 1998
[Diagram: ATM switches at FZJ and GMD connected via 2.4 Gbps ATM/SDH; workstations attached at 622 Mbps (including 3 * 622 Mbps) and 155 Mbps.]

Gigabit Testbed West
Connecting CRAY T3E and IBM SP2 via a separate network
Problem: interrupt rate of CRAY/T3E systems
Solution: create two logical networks on one physical network
• network 1 with 64k MTU between gateway systems (exact MTU 65280), as specified for CRAY systems on HiPPI networks
• network 2 with MTU 9180 between directly connected ATM systems
Advantage: path MTU discovery on the end systems will find the maximum value to use.
[Figure: MTU values of the network segments: 9180, 4356, 1500, 9180, 65280]

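The effect of the MTU on the interrupt rate can be illustrated with a small sketch. It assumes that path MTU discovery yields the smallest MTU along a path, that each packet carries 40 bytes of IPv4 and TCP headers without options, and that roughly one interrupt is raised per packet. The example paths and all names are hypothetical; only the MTU values come from the slide.

```python
import math

# Assumption: 20 bytes IPv4 header + 20 bytes TCP header, no options.
IP_TCP_HEADER_BYTES = 40


def path_mtu(hop_mtus):
    """The usable MTU of a path is the smallest MTU of any hop on it."""
    return min(hop_mtus)


def packets_needed(payload_bytes, mtu):
    """Number of packets required to carry payload_bytes at the given MTU."""
    return math.ceil(payload_bytes / (mtu - IP_TCP_HEADER_BYTES))


ONE_MIB = 1024 * 1024

# Hypothetical paths built from the MTU values shown on the slide.
gateway_path = [65280, 65280, 65280]   # logical network 1 end to end
atm_path = [65280, 9180, 65280]        # crosses a 9180-byte ATM segment
ethernet_path = [9180, 1500]           # ends on a 1500-byte segment

for name, hops in [("gateway", gateway_path),
                   ("ATM", atm_path),
                   ("Ethernet", ethernet_path)]:
    mtu = path_mtu(hops)
    print(f"{name:9s} path MTU {mtu:5d} -> "
          f"{packets_needed(ONE_MIB, mtu)} packets per MiB")
```

Fewer, larger packets mean fewer interrupts per transferred megabyte, which is the motivation for keeping the 64k MTU between the gateway systems.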
Status
CRAY HiPPI Testbed configuration
[Diagram: CRAY/T90, CRAY/J90, CRAY/T3E 256 and CRAY/T3E 512 (with HPN 1) attached to a HiPPI switch (ports 0 to 15; Ethernet module, parallel HiPPI cards, serial HiPPI cards), together with SUN Ultra 60 and SGI O200, which connect to Fore ASX-4000 switches. Addresses shown: 134.94.72.1 to 134.94.72.5; 192.168.115.5, 192.168.115.6, 192.168.115.9, 192.168.115.10, 192.168.115.25, 192.168.115.26 (gmdsp2); 192.168.110.36 / 192.168.116.36; 192.168.110.3 / 192.168.116.3 (gmdsun); 192.168.110.49 / 192.168.116.49.]

Communication: nominal and real throughput
[Diagram: CRAY T3E/256, CRAY T3E/512 and CRAY T90 at FZJ, connected via HIPPI, H/A router, ATM switch and ATM/SDH to the ATM switch, H/A router, HIPPI and IBM SP2 at GMD.]
Nominal: 800 Mbps, 622 Mbps, 2.4 Gbps, 622 Mbps, 800 Mbps
Real: 430 Mbps, 530 Mbps, 370 Mbps, 370 Mbps

Gigabit Testbed West
TCP gateway layout (beta tests in Jülich)
[Diagram: CRAY/T3E (256) and CRAY/T3E (512) on a HiPPI switch (ports 0 to 15; Ethernet module; parallel HiPPI, 800 Mb/s, MTU 64K; serial HiPPI, 800 Mb/s, MTU 64K) with SUN HiPPI/PCI and SGI HiPPI/PCI gateways and ATM at 622 Mb/s (MTU 9180 or 64K). Throughput values shown: 430 (direct), 340 (gate), 350 (direct), 270 (gate), 430, 370, 320, 380, 350, 315, 440, 250, 535, 415.]

Future Tests
CRAY HiPPI Testbed configuration
• Solve the HiPPI problem: using large MTU sizes (65280 bytes) does not work correctly
• Test the other Cray systems (T90, J90) with the HiPPI to ATM gateway
• Test different configurations if the testbed is available
  – using 2 HPN 1
  – using 2 communication nodes within the CRAY/T3E
  – using one gateway for more than one machine
  – using the same HiPPI device for local and remote communication
  – using multiple HiPPI devices for higher throughput

Possible future test scenario: multiple HiPPI
[Diagram: two machine groups M1 ... Mm and N1 ... Nn, each connected via HiPPI to gateways; gateways attached via 4 * ATM 622 Mb/s to ASX-4000 switches, which are linked by ATM 2.4 Gb/s. Protocol path: IP over HiPPI, IP over ATM, IP over HiPPI.]
Internal communication: M1 ... Mm, N1 ... Nn
External communication:
• Mm-k+1, Mm-k+2, ..., Mm (multiplex of k HiPPI interfaces)
• Nn-j, Nn-j+1, ..., Nn (multiplex of j HiPPI interfaces)

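A rough way to size the multiplex of HiPPI interfaces in this scenario is to divide the trunk rate by the throughput of one interface. The sketch below assumes that throughput scales linearly with the number of interfaces, which the slides do not claim; the per-interface rates are the nominal and measured values quoted earlier in the presentation, and the function name is illustrative.

```python
import math

TRUNK_MBIT_S = 2400  # ATM 2.4 Gb/s trunk between the ASX-4000 switches


def interfaces_needed(trunk_mbit_s, per_interface_mbit_s):
    """Interfaces required to fill the trunk, assuming linear scaling."""
    return math.ceil(trunk_mbit_s / per_interface_mbit_s)


# Per-interface rates in Mb/s taken from earlier slides.
PER_INTERFACE_MBIT_S = {
    "HiPPI nominal": 800,
    "TCP/IP over HiPPI, measured": 430,
    "through gateway, measured": 370,
}

needed = {
    name: interfaces_needed(TRUNK_MBIT_S, rate)
    for name, rate in PER_INTERFACE_MBIT_S.items()
}

for name, count in needed.items():
    print(f"{name:30s} {count} interfaces")
```

With the measured rates, about twice as many interfaces would be needed as the nominal HiPPI rate suggests.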
Summary
• No problems left with the 2.4 Gbit/s ATM/SDH trunk line
• Workstation systems can generate and transfer data streams that saturate a 622 Mbit/s ATM link
• Coupling of supercomputer systems over high-bandwidth WANs is currently only possible with a HiPPI to ATM gateway solution and special configuration
But the time is ripe for gigabit transmissions.

Summary
• Applications are capable of using gigabit networks.
• Metacomputing may become reality in LAN as well as in WAN environments.
• Therefore supercomputer system designers have to equip their systems with gigabit communication interfaces.
"The net is the computer and the computer is the net"
((SuperComputer) Communications) != (Super (ComputerCommunications))