High Bandwidth, High Throughput in the MB-NG & DataTAG Projects
Richard Hughes-Jones, Stephen Dallison, Gareth Fairey, Dept. of Physics and Astronomy, University of Manchester
Robin Tasker, Daresbury Laboratory, CLRC
Miguel Rio, Yee Ting Li, Dept. of Physics and Astronomy, University College London
e-Science All Hands Meeting, 1-4 Sep 03, R. Hughes-Jones, Manchester
Topology of the MB-NG Network
[Network diagram] Manchester domain (hosts man01-man03, edge router Cisco 7609) and UCL domain (hosts lon01-lon03, edge router Cisco 7609), connected via boundary routers (Cisco 7609) across the UKERNA Development Network; RAL domain also attached.
Key: Gigabit Ethernet; 2.5 Gbit POS access; MPLS; admin. domains.
DataTAG Testbed
[Network diagram, Chicago-Geneva] Host clusters w01-w06 and v10-v13 in Chicago; w01-w06, w20, v02 and v03 in Geneva. Routers and switches: Alcatel 7770 (r06 gva, r06 chi), Juniper M10 (r05 gva, r05 chi), Cisco 7609 (r04 chi, cernh4, cernh7), Cisco 7606 (r04 gva, ar3-chicago), Extreme S5i (s01 chi) and S1i (s01 gva), Cisco 5505 and 2950 for management, ONS 15454 and Alcatel 1670 SDH/SONET equipment. Links: STM-64 Chicago-Geneva (GC); STM-16 to VTHD/INRIA (France Telecom), CNAF (Colt, backup + projects) and GEANT (Swisscom); STM-4 (DTag); peerings with CANARIE, SURFnet, CESNET and the CERN/Caltech production network.
Key: 1000baseSX; 1000baseT; 10GbaseLX; SDH/SONET; CCC tunnel. edoardo.martelli@cern.ch, last update 2003-07-01.
End Hosts: how good are they really?
End Hosts: b2b & end-to-end UDP Tests
- Tests with UDPmon on Supermicro P4DP6; PCI 64 bit / 66 MHz
- Max throughput 975 Mbit/s; 20% CPU utilisation on the receiver for packets > 1000 bytes, 40% for smaller packets
- Latency 6.1 ms and well behaved; latency slope 0.0761 µs/byte over the 6-router path
- B2B expected slope 0.0118 µs/byte (PCI + GigE + PCI, with PCI = 0.00188 µs/byte)
- Jitter small: 2-3 µs FWHM
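The expected B2B slope of 0.0118 µs/byte is just the per-byte serialisation cost of the path's segments added up; a minimal sketch, assuming the theoretical PCI burst rate, that reproduces the slide's figures:

```python
# Serialisation cost per byte on each segment of a back-to-back path.
# Figures from the slide: PCI 64 bit / 66 MHz and Gigabit Ethernet.

def slope_us_per_byte(bytes_per_sec):
    """Time to move one byte across a link, in microseconds."""
    return 1e6 / bytes_per_sec

PCI_66MHZ = 64 / 8 * 66e6   # 528 MB/s theoretical burst rate
GIG_E = 1e9 / 8             # 125 MB/s wire rate

pci = slope_us_per_byte(PCI_66MHZ)   # ~0.00189 us/byte, the slide's 0.00188
gige = slope_us_per_byte(GIG_E)      # 0.008 us/byte

# A packet crosses the sender's PCI bus, the wire, and the receiver's PCI bus:
expected = 2 * pci + gige
print(round(expected, 4))   # ~0.0118 us/byte, matching the slide
```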
Signals on the PCI bus
- 1472 byte packets every 15 µs, Intel Pro/1000
- PCI 64 bit / 33 MHz: send setup, send PCI, receive PCI, receive transfers; 82% usage
- PCI 64 bit / 66 MHz: same sequence; 65% usage, data transfers half as long
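The halving of the data-transfer phase at 66 MHz follows directly from the bus's raw byte rate; a rough sketch (setup and addressing overhead are not modelled, which is why the measured bus usage is higher than this data-phase estimate):

```python
# Rough PCI data-phase occupancy for 1472-byte packets sent every 15 us,
# assuming the theoretical burst rate of the bus.

PACKET_BYTES = 1472
INTERVAL_US = 15.0

def data_phase_us(bus_bits, clock_hz):
    """Time the packet's data phase holds the PCI bus, in microseconds."""
    rate = bus_bits / 8 * clock_hz     # bytes/s
    return PACKET_BYTES / rate * 1e6

t33 = data_phase_us(64, 33e6)   # ~5.6 us of every 15 us interval
t66 = data_phase_us(64, 66e6)   # ~2.8 us

print(t66 / t33)   # 0.5: data transfers half as long at 66 MHz
```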
Interrupt Coalescence Investigations
- Kernel parameters for socket buffer size: rtt*BW
- TCP mem-mem lon2-man1
- Tx 64, Tx-abs 64, Rx 0, Rx-abs 128: 820-980 Mbit/s ± 50 Mbit/s
- Tx 64, Tx-abs 64, Rx 20, Rx-abs 128: 937-940 Mbit/s ± 1.5 Mbit/s
- Tx 64, Tx-abs 64, Rx 80, Rx-abs 128: 937-939 Mbit/s ± 1 Mbit/s
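The stabilising effect of receive coalescence can be estimated from the packet arrival rate; a rough model (assumed 1500-byte frames, idealised batching) of how the Rx settings above change the interrupt load:

```python
# Interrupt rate vs receive coalescence at Gigabit line rate.
# Assumes 1500-byte frames at 940 Mbit/s; real NIC behaviour also
# depends on the absolute-timer (Rx-abs) setting, not modelled here.

FRAME_BYTES = 1500
RATE_BPS = 940e6

pkts_per_sec = RATE_BPS / (FRAME_BYTES * 8)   # ~78 kpps
gap_us = 1e6 / pkts_per_sec                   # ~12.8 us between frames

for rx_coal_us in (0, 20, 80):
    batch = max(1, rx_coal_us / gap_us)   # frames handled per interrupt
    irqs = pkts_per_sec / batch           # interrupts per second
    print(rx_coal_us, round(irqs))
```

With Rx = 0 every frame interrupts the host (~78 k interrupts/s); Rx = 20 or 80 batches several frames per interrupt, which is consistent with the much lower throughput variance measured above.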
txqueuelen vs send stalls
- Tx queue located between the IP stack & the NIC driver
- TCP treats 'queue full' as congestion!
- Results for Lon-Man
- Select txqueuelen = 2000
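The choice of 2000 can be sanity-checked against the path's bandwidth-delay product; a rough sizing sketch (assumed 1500-byte MTU) using the MB-NG figures from these slides:

```python
# Transmit queue sizing: a queue shorter than the bandwidth-delay
# product in packets can report 'queue full' during normal bursts,
# which TCP then misreads as congestion.

RATE_BPS = 1e9     # Gigabit Ethernet
RTT_S = 6.2e-3     # MB-NG London-Manchester round-trip time
MTU = 1500         # bytes per queued packet (assumed)

bdp_bytes = RATE_BPS * RTT_S / 8
bdp_packets = bdp_bytes / MTU
print(round(bdp_packets))   # ~517: txqueuelen=2000 leaves ample headroom
```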
Network Investigations
Network Bottlenecks
- Backbones: 2.5 and 10 Gbit, usually good (in Europe)
- Access links need care: GEANT-NRN and campus - SuperJANET4
- NNW - SJ4 access, an example of good forward planning: 1 Gbit link from 10 November 2002, upgraded to 2.5 Gbit 24-26 February 2003
- Trunking: use of multiple 1 Gbit Ethernet links
24 Hours HighSpeed TCP mem-mem
- TCP mem-mem lon2-man1
- Tx 64, Tx-abs 64, Rx-abs 128
- 941.5 Mbit/s ± 0.5 Mbit/s
TCP sharing man1-lon2
- 1 stream added every 60 s: man1→lon2, man2→lon2, man3→lon2; sampled every 10 ms
- 1 stream: average 940 Mbit/s, no dup ACKs, no send stalls
- 2 streams: average ~500 Mbit/s, many dup ACKs, cwnd reduced
- 3 streams: average ~300 Mbit/s
2 TCP streams man1-lon2
- Dips in throughput due to dup ACKs; ~4 losses/sec - a bit regular?
- Cwnd decreases: 1 point 33%; ramp starts at 62%; slope 70 bytes/µs
TCP Protocol Stack Comparisons
- Standard TCP, HighSpeed TCP, Scalable TCP
- MB-NG: kernel on the receiver dropped packets periodically
- MB-NG network: rtt 6.2 ms, recovery time 1.6 s
- DataTAG network: rtt 119 ms, recovery time 590 s (9.8 min)
- Throughput on the DataTAG network was a factor ~5 lower than on the MB-NG network
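Both recovery times follow from standard TCP's additive increase of one segment per RTT after halving cwnd on a loss; a minimal sketch (assuming a 1500-byte segment and Gigabit line rate) that reproduces the slide's two numbers:

```python
# Standard-TCP recovery time after a single loss: cwnd halves, then
# grows by one MSS per RTT, so recovering the pipe takes ~cwnd/2 RTTs,
# where cwnd is the bandwidth-delay product in segments.

MSS = 1500  # bytes, assumed segment size

def recovery_time_s(rate_bps, rtt_s):
    cwnd_segments = rate_bps * rtt_s / 8 / MSS
    return (cwnd_segments / 2) * rtt_s

print(round(recovery_time_s(1e9, 6.2e-3), 1))   # ~1.6 s on MB-NG
print(round(recovery_time_s(1e9, 119e-3)))      # ~590 s on DataTAG
```

The quadratic dependence on rtt is why the long DataTAG path suffers so much more than MB-NG, and why the HighSpeed and Scalable stacks matter there.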
Application Throughput
MB-NG SuperJANET4 Development Network
[Network diagram] MAN/MCC and UCL sites, each with PCs and 3ware RAID0 disk, attached via OSM-1OC48-POS-SS interfaces to the SJ4 development network.
Key: Gigabit Ethernet; 2.5 Gbit POS access; 2.5 Gbit POS core; MPLS; admin. domains.
Gridftp Throughput, HighSpeedTCP
- RAID0 disk tests: 120 Mbytes/s read, 100 Mbytes/s write
- Int Coal 64 128; txqueuelen 2000; TCP buffer 1 Mbyte (rtt*BW = 750 kbytes)
- Data rate 520 Mbit/s (plots: data traffic, TCP ACK traffic, interface throughput)
- Same for b2b tests - so it's not that simple!
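The rtt*BW product quoted above fixes the minimum useful TCP buffer; a minimal sketch of requesting such buffers through the standard sockets API (the 1 Mbyte figure is the slide's; note that Linux silently caps the request at the net.core.wmem_max / rmem_max sysctls, so those must be raised first):

```python
import socket

# Request TCP buffers covering the path's bandwidth-delay product
# (rtt * bandwidth ~= 750 kbytes on MB-NG; 1 Mbyte used on the slide).
BUF = 1 * 1024 * 1024

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, BUF)
s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, BUF)

# Read back what the kernel actually granted (Linux reports a doubled,
# possibly capped value) - always worth checking before a transfer.
eff = s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
print(eff)
s.close()
```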
Gridftp Throughput + Web100
- Throughput alternates between 600-800 Mbit/s and zero
- Cwnd smooth
- No dup ACKs / send stalls / timeouts
http data transfers, HighSpeed TCP
- Bulk data moved by web servers
- Apache web server out of the box!
- Prototype client: curl http library
- 1 Mbyte TCP buffers; 2 Gbyte file
- Throughput ~720 Mbit/s
- Cwnd: some variation; no dup ACKs / send stalls / timeouts
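The same bulk-over-http idea can be exercised end to end with stock tools; a minimal, scaled-down sketch using Python's standard library as a stand-in for the Apache server and curl client on the slide (loopback address, 1 MB file size and the server itself are all illustrative substitutes):

```python
import http.server
import os
import tempfile
import threading
import time
import urllib.request

# Serve a file from a stock web server and time a whole-file fetch.
tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, "bulk.dat"), "wb") as f:
    f.write(os.urandom(1 << 20))   # 1 MB test file (2 GB on the slide)

handler = lambda *a, **kw: http.server.SimpleHTTPRequestHandler(
    *a, directory=tmpdir, **kw)
srv = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
threading.Thread(target=srv.serve_forever, daemon=True).start()

t0 = time.perf_counter()
url = f"http://127.0.0.1:{srv.server_address[1]}/bulk.dat"
data = urllib.request.urlopen(url).read()
elapsed = time.perf_counter() - t0

mbit_s = len(data) * 8 / elapsed / 1e6
print(f"{len(data)} bytes in {elapsed:.3f} s = {mbit_s:.0f} Mbit/s")
srv.shutdown()
```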
BaBar Case Study: Disk Performance
- BaBar disk server: Tyan Tiger S2466N motherboard; one 64 bit / 66 MHz PCI bus; Athlon MP 2000+ CPU; AMD-760 MPX chipset; 3Ware 7500-8 RAID5; 8 × 200 GB Maxtor IDE 7200 rpm disks
- Note the VM parameter readahead max
- Disk to memory (read): max throughput 1.2 Gbit/s (150 Mbytes/s)
- Memory to disk (write): max throughput 400 Mbit/s (50 Mbytes/s) [not as fast as RAID0]
BaBar Case Study: Throughput & PCI Activity
- 3Ware forces the PCI bus to 33 MHz
- BaBar Tyan to MB-NG SuperMicro: network mem-mem 619 Mbit/s
- Disk-disk throughput with bbcp: 40-45 Mbytes/s (320-360 Mbit/s)
- PCI bus effectively full! (plots: read from RAID5 disks, write to RAID5 disks)
Conclusions
The MB-NG project has achieved:
- Continuous memory-to-memory data transfers with an average user data rate of 940 Mbit/s for over 24 hours using the HighSpeed TCP stack.
- Sustained high throughput data transfers of 2 GByte files between RAID0 disk systems using Gridftp and bbcp.
- Transfers of 2 GByte files using the http protocol from a standard Apache web server and HighSpeed TCP, achieving data rates of ~725 Mbit/s.
- Ongoing operation and comparison of different transport protocols - optical switched networks.
- Detailed investigation of routers, NICs & end-host performance.
- Working with e-Science groups to get high performance to the user.
Lessons:
- Sustained data flows at Gigabit rates are achievable.
- Use server-quality PCs, not supermarket PCs, and take care with interfaces.
- Be kind to the Wizards!
More Information - Some URLs
- MB-NG project web site: http://www.mb-ng.net/
- DataTAG project web site: http://www.datatag.org/
- UDPmon / TCPmon kit + writeup: http://www.hep.man.ac.uk/~rich/net
- Motherboard and NIC tests: www.hep.man.ac.uk/~rich/net/nic/GigEth_tests_Boston.ppt & http://datatag.web.cern.ch/datatag/pfldnet2003/
- TCP tuning information: http://www.ncne.nlanr.net/documentation/faq/performance.html & http://www.psc.edu/networking/perf_tune.html
Backup Slides
EU Review Demo
Consisted of: data over TCP streams; GridFTP between RAID0 disks; node monitoring; site monitoring; Dante monitoring.
Throughput on the day!
[Plot] Data ~400 Mbit/s, with TCP ACK traffic shown.
Some Measurements of Throughput: CERN-SARA
- Using the GÉANT backup link; 1 GByte file transfers (blue: data, red: TCP ACKs)
- Standard TCP: average throughput 167 Mbit/s - users see 5-50 Mbit/s!
- HighSpeed TCP: average throughput 345 Mbit/s
- Scalable TCP: average throughput 340 Mbit/s
What the Users Really Find
- CERN-RAL using production GÉANT: CMS tests, 8 streams, 50 Mbit/s @ 15 MB buffer; firewall limits to 100 Mbit/s
- NNW-SJ4 access: 1 Gbit link