Network Monitoring for SCIC (Les Cottrell, SLAC, ICFA/SCIC)

Network Monitoring for SCIC
Les Cottrell, SLAC
ICFA/SCIC meeting, August 24, 2005
www.slac.stanford.edu/grp/scs/net/talk05/icfa-netmonaug05.ppt
Initially funded by DoE Field Work proposal. Currently partially funded by US Department of State/Pakistan Ministry of Science & Technology

Coverage
• Measure the network performance for developing regions
  – From developed to developing & vice versa
  – Between developing regions & within developing regions
• Originated in High Energy Physics, now focused on DD
  – Adding monitoring sites in: Africa, S. America, Russia, Pakistan, India
  – Working with Turkey but ISP blocks pings
• http://www-iepm.slac.stanford.edu/pingerworld/
  – Interactive: zoom/pan, mouseover, clickable
[Map: PingER coverage, Aug 2005; legend: monitoring site, remote site]

PingER Management
• No funding for ongoing PingER operational management (40% FTE at the moment)
  – Develop tools to simplify, automate, reduce manual effort
  – New installation procedures for monitor sites
  – Assistance in producing executive plots
  – Provide alerts for unreachable remote sites
  – Provide alerts if unable to gather data from monitor sites
  – Check sanity of data and the configuration database
  – Check hosts are where we think they are…

Triangulation 1/2
• Web hosts with TLDs in many developing countries have proxies in developed countries
  – E.g. 50% of initially chosen Pakistan universities had web proxies outside Pakistan
  – Use IP2Location.com & traceroute to verify location
  – Working on triangulation
    • Make RTTmin measurements to a given host from known landmarks
    • Estimate distance from landmark using d = a_L * RTTmin + b_L
      – Initial a_L ~ 50 km/ms (speed of light in fiber, factor of 2 for right-of-way paths, non-great-circle-route hop locations), b_L = 0
    • Optimize a_L, b_L using RTTmin for known PingER pairs
    • Locate host lat/long with confidence estimates
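
The distance model on this slide can be sketched in a few lines of Python. One reading of the initial a_L of about 50 km/ms: light travels roughly 200 km per ms in fiber, a round trip halves that to 100 km per ms of RTT, and the factor of 2 for indirect paths halves it again. The slide does not say how a_L and b_L are optimized, so the ordinary least-squares fit below is an assumption, as are the function names and the sample numbers.

```python
def estimate_distance_km(rtt_min_ms, a_l=50.0, b_l=0.0):
    """Distance estimate d = a_L * RTTmin + b_L, with the slide's initial values."""
    return a_l * rtt_min_ms + b_l


def fit_distance_model(pairs):
    """Least-squares fit of a_L and b_L (an assumed optimization method).

    pairs: list of (rtt_min_ms, known_distance_km) for host pairs whose
    locations are already known, e.g. existing monitor/remote pairs.
    """
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two (rtt, distance) pairs")
    mean_rtt = sum(r for r, _ in pairs) / n
    mean_d = sum(d for _, d in pairs) / n
    sxx = sum((r - mean_rtt) ** 2 for r, _ in pairs)
    if sxx == 0:
        raise ValueError("all RTT values are identical; slope is undefined")
    sxy = sum((r - mean_rtt) * (d - mean_d) for r, d in pairs)
    a_l = sxy / sxx
    b_l = mean_d - a_l * mean_rtt
    return a_l, b_l


# Hypothetical calibration data: (RTTmin in ms, known distance in km)
calibration = [(10.0, 500.0), (20.0, 1000.0), (40.0, 2000.0)]
a_l, b_l = fit_distance_model(calibration)
```

With fitted values in hand, `estimate_distance_km(rtt, a_l, b_l)` gives the radius around each landmark within which the host is expected to lie; intersecting several such circles is the triangulation step itself, which is not shown here.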

Triangulation 2/2
• Landmarks:
  – Using Looking Glass servers (provide pings)
  – Install web-accessible, on-demand ping tool at PingER monitoring sites
  – Use GeoLIM landmarks (for US & W. Europe)
    • Installing GeoLIM landmark at NIIT
• Will build tool to validate where PingER nodes are really located and fix database or replace
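
A minimal sketch of one check such a validation tool could make, not the tool itself: a host cannot be farther from a landmark than light in fiber can cover in half the minimum RTT. The 100 km per ms of RTT bound (about 200 km/ms in fiber, halved for the round trip) and all names below are assumptions for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0
# Assumed physical bound: ~200 km/ms in fiber, halved because RTT is a round trip.
MAX_KM_PER_RTT_MS = 100.0


def great_circle_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/long points (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


def location_is_plausible(claimed, landmark, rtt_min_ms):
    """False if the claimed (lat, lon) is too far from the landmark for this RTTmin."""
    distance = great_circle_km(claimed[0], claimed[1], landmark[0], landmark[1])
    return distance <= MAX_KM_PER_RTT_MS * rtt_min_ms
```

The check is one-sided by design: a long RTT to a nearby host can simply mean an indirect route, so only a host that answers faster than its claimed distance allows is flagged as misplaced (for example, a web proxy hosted in another country).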

Integrate with MonALISA
• Mainly to look at closer to real-time displays
  – Code is ready, looking for host and disk space to save data

Case study on Pakistan
• Two sites to join LCG (NUST, QEA/NCP): is connectivity adequate?
• Prompted by two outages of SEA-ME-WE 3
  – Fiber cut off Karachi causes 12-day outage Jun-Jul '05
    • Huge losses of confidence and business

Fiber Outage Jun 27-Jul 8 '05
• Looked at 9 sites in Pakistan measured from within and outside Pakistan
  – Saw big (300=>600 ms) increase in min-RTT as some sites switched to satellite
  – Losses 2-3% => >10%
  – Unreachability 1-2% => 20%
  – Effect varied by site
[Plot: Pakistan loss from SLAC, loss % (0 to 14), Jan 04 to Jun 05; curves for 75%, median, 25%]
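
The three metrics quoted on this slide (min-RTT, loss, unreachability) can be computed from raw ping results roughly as follows. This is a sketch under assumptions: the data layout, the function name, and the definition of unreachability as the share of ping sets in which every ping was lost are illustrative and not taken from the slides.

```python
def summarize_ping_sets(ping_sets):
    """Summarize ping measurements from one monitor to one remote site.

    ping_sets: list of measurement sets; each set is a list with one entry
    per ping sent, holding the RTT in ms or None if the ping was lost.
    """
    nonempty = [s for s in ping_sets if s]
    sent = sum(len(s) for s in nonempty)
    if sent == 0:
        raise ValueError("no pings were sent")
    replies = [rtt for s in nonempty for rtt in s if rtt is not None]
    # Assumed definition: a set counts as unreachable when no ping was answered.
    unreachable_sets = sum(1 for s in nonempty if all(rtt is None for rtt in s))
    return {
        "loss_pct": 100.0 * (sent - len(replies)) / sent,
        "min_rtt_ms": min(replies) if replies else None,
        "unreachable_pct": 100.0 * unreachable_sets / len(nonempty),
    }


# Hypothetical data: one partly successful set and one fully lost set
summary = summarize_ping_sets([[300.0, 310.0, None, 305.0], [None, None, None, None]])
```

A jump in `min_rtt_ms` from about 300 ms to about 600 ms, as reported above, is the kind of signature that indicates a switch from fiber to a satellite path.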

Longer term
• Typically once a month losses go to 20%
• Another fiber outage, this time of 3 hours! Power cable dug up by excavators of Karachi Water & Sewage Board
• Infrastructure appears fragile
• Losses to QEA & NIIT are 3-8% averaged over month
[Plot: RTT (ms) and loss %, Feb 05 to Jul 05; Jun/Jul outage marked]

Pakistan: Next steps
• Established contacts with PERN (manages E&R net connections), NTC (carrier, government monopoly) and PIE (Pakistan Internet Exchange - international carrier interface)
  – Monitoring PIE backbone router in Karachi
    • NTC router deprecates pings so can't monitor it
  – Establishing PingER monitors in PERN and NTC
    • Already have one at NIIT
• Want to pinpoint causes of poor performance (losses, unreachability)
  – Monitoring to NIIT via NTC and Broadband/DSL provider to compare providers

First results from S. Africa
• Host at Tertiary Education Network (TENET) site at Rondebosch
  – TENET secures for ZA universities & technical colleges management of service contracts, operational functions, other value-added services
• Monitoring about 45 beacon sites worldwide
• Land line links to world, min-RTTs:
  – Europe: ~215 ms; US: ~250 ms; Russia: ~235 ms
  – L. America: ~415 ms; E. Asia: ~450 ms; Pakistan: ~465 ms; Australia: ~480 ms
• Evaluating what sites in Africa to monitor

Collaborations/funding
• Good news:
  – Active collaboration with NIIT Pakistan to develop network monitoring including PingER (in particular management)
    • Travel funded by US State Department & Pakistan MOST for 1 year
    • Have submitted a follow-on proposal to USAID
  – FNAL & SLAC continue support for PingER management and coordination
• Bad news (currently unfunded, could disappear):
  – DoE funding for PingER terminated
  – Harder to cover from SLAC HEP budget, given new project-oriented budgeting
  – Proposal to EC 6th Framework with ICTP, ICT Cambridge UK, CONAE Argentina, Usikov Inst Ukraine, STAC Vietnam, VUB Belgium rejected; proposal to IDRC/Canada February '04 also rejected
  – Working with ICTP proposal
• Hard to get funding for operational needs (~0.4 FTE)
  – For quality data need constant vigilance (hosts disappear/move, security blocks pings, need to update remote host lists…), harder as more/remoter hosts

Overall Situation
• Performance from U.S. & Europe is improving all over, for losses, RTT & throughput
• Performance to developed countries is orders of magnitude better than to developing countries
• Poorer regions 5-10 years behind
• Poorest regions: Africa, Central & S. Asia
• Some regions are:
  – catching up (SE Europe, Russia),
  – keeping up (Latin America, Mid East, China),
  – falling further behind (e.g. India, Africa)

Future Foci
• First view of Africa from within Africa
• Impact of GLORIAD for Russian connectivity
• Impact of new RNP initiatives for Brazil
• More on India (preparation for CHEP 06)
• Finish off the study of Pakistan
• Impact of new connectivity in E. Asia
• Others (suggestions welcome…)

Further Information
• PingER project home site
  – www-iepm.slac.stanford.edu/pinger/
• PingER methodology (presented at I2 Apr 22 '04)
  – www.slac.stanford.edu/grp/scs/net/talk03/i2-method-apr04.ppt
• ICFA/SCIC Network Monitoring report
  – www.slac.stanford.edu/xorg/icfa-net-paper-jan05/20050206netmon.doc
• ICFA/SCIC home site
  – http://icfa-scic.web.cern.ch/ICFA-SCIC/
• SLAC/NIIT collaboration
  – http://maggie.niit.edu.pk/
• Pakistan outage
  – www.slac.stanford.edu/grp/scs/net/case/pakjul05/jun-july.htm