ESnet End-to-end Internet Monitoring
Les Cottrell and Warren Matthews, SLAC and David Martin, HEPNRC
Presented at the ESSC Review Meeting, Berkeley, May 1998
Partially funded by DOE/MICS Field Work Proposal on Internet End-to-end Performance Monitoring (IEPM)
3/4/98 z: cottrellesccmay 98esscmay 98. ppt 1
Outline of Talk
• Why are we (ESnet/HENP community) measuring?
• What are we measuring & how?
• What do we see?
• What does it mean?
• Summary
  – Deployment/development, Internet Performance, Next Steps
  – Collaborations
Why go to the effort?
• Internet woefully under-measured & under-instrumented
• Internet very diverse - no single path typical
• Users need end-to-end measurements for:
  – realistic expectations, planning information
  – guidelines for setting and validating SLAs
  – information to help in identifying problems
  – help to decide where to apply resources
• Complements ESnet utilization measurements
• Provides information for reporting problems to NOC
Our Main Tool (PingER) is Ping Based
• “Universally available”, easy to understand
  – no software for clients to install
• Low network impact
• Provides useful real world measures of response time, loss, reachability, unpredictability
• Now 14 monitoring sites in 8 countries, monitoring > 500 links to > 300 sites in 22 countries
• Resources: 6 bps/link, ~600 kBytes/month/link
Measurement Architecture
[Figure: architecture diagram. Labels: Monitoring, Remote, Ping, HTTP, Cache, Archive, Analysis, WWW, Reports & Data, SLAC, HEPNRC]
Ping Loss Quality
• Want quick to grasp indicator of link quality
• Loss is the most sensitive indicator
  – Studies on economic value of response time by IBM showed there is a threshold around 4-5 secs where complaints increase.
  – loss of a packet requires ~4 sec TCP retry timeout
  – For packet loss we use the following thresholds:
    • 0-1% = Good
    • 1-2.5% = Acceptable
    • 2.5%-5% = Poor
    • 5%-12% = Very Poor
    • > 12% = Bad (unusable for interactive work)
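The loss thresholds above can be expressed as a small lookup. This is only an illustrative sketch, not PingER code: the function name `loss_quality` is hypothetical, and because the slide's bands share their endpoints, the treatment of exact boundary values (a value on a boundary falls into the worse band) is an assumption.

```python
def loss_quality(loss_pct: float) -> str:
    """Map a packet-loss percentage to the quality label used on the slide.

    Assumption: a value exactly on a boundary (1, 2.5, 5, 12) is placed
    in the worse band; the slide does not say which band owns the edge.
    """
    if not 0.0 <= loss_pct <= 100.0:
        raise ValueError("loss must be a percentage between 0 and 100")
    if loss_pct >= 12.0:
        return "bad"          # unusable for interactive work
    if loss_pct >= 5.0:
        return "very poor"
    if loss_pct >= 2.5:
        return "poor"
    if loss_pct >= 1.0:
        return "acceptable"
    return "good"


if __name__ == "__main__":
    for sample in (0.1, 1.5, 3.0, 7.0, 15.0):
        print(f"{sample:5.1f}% loss -> {loss_quality(sample)}")
```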
Quality Distributions from SLAC
• ESnet median good quality
• Other groups poor or very poor
Aggregation/Grouping
• Critical for 14 monitoring sites & > 500 links
• Group measurements by:
  – area (e.g. N. America W, N. America E, W. Europe, Japan, Asia, others, or by country, or TLD)
  – trans-oceanic links, intercontinental links, crossing IXP
  – ISP (ESnet, vBNS/I2, TEN-34...)
  – by monitoring site
  – one site seen from multiple sites
  – common interest/affiliation (XIWT, HENP, Expmt …)
• Beware: reduces statistics, choice of sites critical
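As a rough illustration of grouping, the sketch below collects per-link loss values under a chosen grouping key and reports the median for each group. The record layout (`isp`, `loss_pct`) and the sample values are hypothetical and are not taken from the PingER data.

```python
from collections import defaultdict
from statistics import median


def median_loss_by_group(links, key):
    """Group link records by `key` and return the median loss per group."""
    groups = defaultdict(list)
    for link in links:
        groups[link[key]].append(link["loss_pct"])
    return {name: median(values) for name, values in groups.items()}


if __name__ == "__main__":
    # Hypothetical sample records, for illustration only.
    sample_links = [
        {"isp": "ESnet", "loss_pct": 0.1},
        {"isp": "ESnet", "loss_pct": 0.3},
        {"isp": "ESnet", "loss_pct": 0.2},
        {"isp": "vBNS", "loss_pct": 1.0},
        {"isp": "vBNS", "loss_pct": 3.0},
    ]
    print(median_loss_by_group(sample_links, "isp"))
```

With only a few links in a group the median is not very stable, which is the "reduces statistics" caveat on the slide.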
Tabular Navigation Tool
• Select monitoring site
• Select grouping, e.g. Intercontinental, TLDs, Site to site...
• Select metric: Response, Loss, Quiescence, Reachability...
• Select month (goes back to Jul-97)
• Table of monitoring site versus remote site, colored by quality:
  – < 62.5 ms excellent (white)
  – < 125 ms good (green)
  – < 250 ms poor (yellow)
  – < 500 ms very poor (pink)
  – > 500 ms bad (red)
• Drill down:
  – Site to show all sites monitoring it
  – Value to see all links contributing
• MouseOver:
  – To see number of links
  – To see country
  – To see monitoring site
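The response-time color coding can be sketched as a table-driven lookup. The names `RTT_BANDS` and `rtt_quality` are hypothetical, and since the slide leaves exactly 500 ms unassigned, placing it in "bad" is an assumption.

```python
# (upper bound in ms, label, cell color) as listed on the slide
RTT_BANDS = [
    (62.5, "excellent", "white"),
    (125.0, "good", "green"),
    (250.0, "poor", "yellow"),
    (500.0, "very poor", "pink"),
]


def rtt_quality(rtt_ms: float) -> tuple[str, str]:
    """Return (label, color) for a response time in milliseconds.

    Assumption: a value of exactly 500 ms is treated as "bad".
    """
    if rtt_ms < 0:
        raise ValueError("response time cannot be negative")
    for upper_ms, label, color in RTT_BANDS:
        if rtt_ms < upper_ms:
            return label, color
    return "bad", "red"


if __name__ == "__main__":
    for rtt in (40, 100, 200, 400, 800):
        print(rtt, "ms ->", rtt_quality(rtt))
```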
Drill down (all sites monitoring CERN)
• Also provides Excel for DIY
• Select one of these groups
• Sort
[Figure: drill-down table. Site labels: CMU, CNAF, RL, FNAL, SLAC, DESY, Carleton, RMKI, CERN, KEK]
Overall Improvements Jan-95 to Nov-97
• For about 80 remote sites seen from SLAC
• Response time improved between 1 and 2.5% / month
• Loss - similar (closer to 2.5%/month)
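To get a feel for what a monthly improvement of 1 to 2.5% adds up to over the period, the sketch below compounds the rate month by month. Treating the improvement as a compounding reduction (each month's value is a fixed fraction of the previous month's) is an assumption; the slide does not say how the rate was fitted. The function name is hypothetical.

```python
def remaining_fraction(monthly_improvement: float, months: int) -> float:
    """Fraction of the original value left after compounding a monthly
    improvement, e.g. 0.01 for 1% per month. Assumes a constant rate."""
    if not 0.0 <= monthly_improvement < 1.0:
        raise ValueError("monthly_improvement must be in [0, 1)")
    if months < 0:
        raise ValueError("months must be non-negative")
    return (1.0 - monthly_improvement) ** months


# Jan-95 to Nov-97 spans 34 months.
MONTHS_JAN95_TO_NOV97 = (1997 - 1995) * 12 + (11 - 1)

if __name__ == "__main__":
    for rate in (0.01, 0.025):
        left = remaining_fraction(rate, MONTHS_JAN95_TO_NOV97)
        print(f"{rate:.1%}/month over {MONTHS_JAN95_TO_NOV97} months "
              f"-> {left:.2f} of the Jan-95 value")
```

Under this assumption, 1%/month leaves roughly 0.71 of the starting value after 34 months, and 2.5%/month leaves roughly 0.42.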
How does it look for ESnet Researchers getting to US sites (280 links, 28 States)?
• Within ESnet excellent (median loss 0.1%)
• To vBNS sites very good (~2 * loss for ESnet)
• DOE funded Universities not on vBNS/ESnet
  – acceptable to poor, getting better (factor 2 in 6 months)
  – lot of variability, e.g.:
    • Brown (T), UMass (T) = unacceptable (>= 12%)
    • Pitt*, SC*, ColoState*, UNM (T), UOregon (T), Rochester*, UC*, OleMiss*, Harvard (1Q98), UWashington (T) = very poor (> 5%)
    • Syracuse (T), Purdue (T), Hawaii* = poor (>= 2.5%)
  – * = no vBNS plans, T = vBNS date TBD, V = on vBNS
University access changes in last year
• A year ago we looked at Universities with large DOE programs
• Identified ones with poor (> 2.5%) or worse (> 5%) performance
  – UOregon (T), Harvard (1Q98), UWashington (T) = very poor (>= 5%)
  – JHU (V), Duke (V), UCSD (V), UMich (T), UColo (V), UPenn (T), UMN (V), UCI (T), UWisc (V) = acceptable (> 1%)/good
  – * = no vBNS plans, T = vBNS date TBD, V = on vBNS
Canada
• 20 links, 9 remote sites, 7 monitoring sites
• Seems to depend most on the remote site
  – UToronto bad to everyone
  – Carleton, Laurentian, McGill poor
  – Montreal, UVic acceptable/good
  – TRIUMF good with ESnet, poor to CERN
Europe
• Divides up into 2 groups:
  – TEN-34 backbone sites (de, uk, nl, ch, fr, it, at)
    • within Europe good performance
    • from ESnet good to acceptable, except nl, fr (Renater) & uk, which are bad
  – Others
    • within Europe performance poor
    • from ESnet bad to es, il, hu, pl; acceptable for cz
Asia
• Israel bad
• KEK & Osaka good from US, very poor from Canada
• Tokyo poor from US
• Japan-CERN/Italy acceptable, Japan-DESY bad
• FSU bad to Moscow, acceptable to Novosibirsk
• China is bad everywhere
Intercontinental Grouping (Loss)
• Looks pretty bad for intercontinental use
• Improving (about factor of 2 in last 6 months)
Summary 1/5
• Deployment/Development
  – ESnet/HENP/ICFA has 14 collection sites in 8 countries collecting data on > 500 links involving 22 countries
  – HEPNRC archiving/analyzing, SLAC analyzing
  – 600 KB/month/link, 6 bps/link, 0.25 FTE @ archive site, 1.5-2.5 FTE on analysis
  – reports available worldwide to end-users to access, navigate, review & customize (via Excel) & see quality
  – 4 GBytes of data available to experts for analysis
  – tools available for others to monitor, archive, analyze
• XIWT/IPWT chose & deployed PingER; ~10 collection sites are now monitoring 41 beacon sites
Summary 2/5
• Deployment/Development
• Next Steps
  – Improve tools:
    • Improve statistical robustness - Poisson sampling, medians
    • More groupings, beacon sites, matched pairs, for comparison
    • More navigation features to drill down
    • Better/easier identification of common bottlenecks
    • Prediction (extrapolations, develop models, configure and validate with data)
  – Pursuing deployment of dedicated PC based monitor platforms: IETF Surveyor & NIMI/LBNL
    • NIMIs up & running at PSC, LBNL, FNAL, SLAC, CERN (CH), working with RAL (UK), KEK (JP), DESY (DE)
    • Will provide throughput, traceroute & one way ping measurements
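Poisson sampling, listed above as a planned improvement, means spacing the measurements by exponentially distributed random intervals instead of a fixed period, so that the probes do not lock onto periodic network behavior. The sketch below is a generic illustration using Python's standard library, not the planned PingER implementation; the function name and parameters are hypothetical.

```python
import random


def poisson_schedule(mean_interval_s: float, count: int, seed=None) -> list[float]:
    """Return `count` probe times (seconds from start) whose gaps are
    exponentially distributed with the given mean, i.e. a Poisson process."""
    if mean_interval_s <= 0:
        raise ValueError("mean_interval_s must be positive")
    rng = random.Random(seed)
    now = 0.0
    times = []
    for _ in range(count):
        # expovariate takes the rate (1 / mean) of the exponential distribution
        now += rng.expovariate(1.0 / mean_interval_s)
        times.append(now)
    return times


if __name__ == "__main__":
    # Example: five probe times with a 30-minute mean gap (value chosen for illustration).
    for t in poisson_schedule(1800.0, 5, seed=1):
        print(f"probe at t = {t:8.1f} s")
```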
Summary 3/5
• Deployment/Development
• Next Steps
• Internet Performance (summary for our 500 links)
  – Performance within ESnet is good
  – Performance to vBNS good (median loss ~2 * ESnet)
  – Performance to non ESnet/vBNS sites is acceptable to poor
  – Intercontinental performance is very poor to bad
  – Response time improving by 1-2% / month
  – Packet loss improving between SLAC & other sites by 3% / month since Jan-95
  – Very dynamic
Summary 4/5
• Deployment/Development
• Next Steps
• Internet Performance (continued):
  – Links to sites outside N. America vary from good (KEK) to bad
  – Canada a mixed bag, depending on remote site it is acceptable to bad
  – TEN-34 backbone countries (exc. UK) good to acceptable
  – Otherwise Europe poor to bad
  – Asia (apart from some Japanese sites) is bad
  – Rest of world generally poor to bad
Summary 5/5
• Deployment/Development
• Next Steps
• Internet Performance
• Lots of collaboration & sharing:
  – SLAC & HEPNRC leading effort on PingER
  – 14 monitoring sites, ~400 remote sites
  – Monitoring site tools: CERN & CNAF/INFN, Oxford/TracePing
  – MapPing/MAPNet working with NLANR
  – TRIUMF Traceroute topology Map
  – NIMI/LBNL & Surveyor/IETF/IPPM
  – Industry: XIWT/IPWT, also SBIR from NetPredict on prediction
  – Talks at IETF, XIWT, ICFA, ESSC, ESCC, Interface’98, CHEP…
  – Lots of support: DOE/MICS/ESSC/ESnet, ICFA, XIWT
More Information & extra info follows
• ICFA Monitoring WG home page (links to status report, meeting notes, how to access data, and code)
  – http://www.slac.stanford.edu/xorg/icfa/ntf/home.html
• WAN Monitoring at SLAC has lots of links
  – http://www.slac.stanford.edu/comp/net/wan-mon.html
• Tutorial on WAN Monitoring
  – http://www.slac.stanford.edu/comp/net/wan-mon/tutorial.html
• PingER History tables
  – http://www.slac.stanford.edu/xorg/iepm/pinger/table.html
• NIMI
  – http://www.psc.edu/~mahdavi/nimi_paper/NIMI.html
Perception of Packet Loss
• Above 4-6% packet loss video conferencing becomes irritating, and non-native language speakers become unable to communicate.
• The occurrence of long delays of 4 seconds or more at a frequency of 4-5% or more is also irritating for interactive activities such as telnet and X windows.
• Above 10-12% packet loss there is an unacceptable level of back to back loss of packets and extremely long timeouts, connections start to get broken, and video conferencing is unusable.
180 Day Ping Performance SLAC-CERN
Running 10 week averages
• Sorted on biggest change
• Standard deviation gives idea of loading
Quiescence
• Frequency of zero packet loss (for all time, not cut on prime time)
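Read literally, the quiescence metric is the fraction of measurement samples that saw no packet loss. A minimal sketch under that reading follows; the function name and the idea that each sample is one loss percentage per measurement interval are assumptions.

```python
def quiescence(loss_samples) -> float:
    """Fraction of samples with zero packet loss.

    `loss_samples` is assumed to hold one loss percentage per measurement
    interval, covering all hours (no prime-time cut).
    """
    samples = list(loss_samples)
    if not samples:
        raise ValueError("need at least one sample")
    zero_loss = sum(1 for loss in samples if loss == 0)
    return zero_loss / len(samples)


if __name__ == "__main__":
    # Hypothetical samples: three of four intervals had no loss.
    print(quiescence([0, 0, 5, 0]))
```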
Response & Loss Improvements
• Improved between 1 and 2.5% / month
• Response & Loss similar improvements
Top Level Domain Grouping (Loss)
• Diagonals are within TLD
• US good/acceptable for it, de, ch & cz
• Hungary is poor
• China unusable
• Canada poor to bad
• UK - US bad
US ESnet & vBNS
• ESnet: median 0.1%, links 36, unique remote sites 17, monitoring sites 6
• vBNS: median 0.3%, links 30, unique remote sites 18, monitoring sites 4
• .EDU, non ESnet/vBNS: median 1.5% (avg 3.2%), links 54, unique remote sites 36, monitoring sites 3
[Figure: Loss and Delay plots. Labels as extracted: "Advanced to U Chicago", "to Advanced"]
MapPing
• Java Applet, based on MapNet from NLANR
  – Colors links by performance
  – Selection:
    • collection site
    • performance metric
    • month
    • zoom level
  – Mouse over gives coords
Traceroute Topology Tool
• Reverse traceroute servers
• Traceping
• TopologyMap
  – Ellipses show node on route
  – Open ellipse is measurement node
  – Blue ellipse not reachable
  – Keeps history
[Figure: topology map from TRIUMF. Site labels: KEK, FNAL, DESY, CERN]