Network Monitoring: The GGF Perspective
1st EGEE Conference, Cork, April 2004
Mark Leese (Daresbury Laboratory), Paul Mealor (University College London)

Contents
Simple really:
• Use cases - why this is important
• What GGF is doing

The Grid?
• Basic Grid principle:
  [Diagram labels: Grid App, Middleware, Network, Resource (CE) in Uzbekistan, Resource (SE) at CERN]
• User applications (Grid apps) submit their work to the middleware, which selects the “best” resources available to run the job.
• Network performance information is essential, because…

Use Case 1: Resource Selection
• Resource Brokers (RBs) are responsible for finding the best resource (Computing Element, CE) to be used for a job, e.g.:
  - Run job at B, using copy of data from A, then store results at C
• All other things being equal, take into account the data access requirements of the job
• Out of the list of CEs capable of running the job, use a network cost function to identify the CE with the “best” data access:
  [Diagram: file source & destination, file size -> Network Cost Function -> estimated transfer time]
• Consider the “best” combination of data sources and sinks, e.g. IF source data = 10 GB AND resulting data will = 100 GB THEN pick CE based on performance to the result-storing SE (Storage Element).
• European Data Grid does something along these lines (Please, no one tell me that this is wrong)
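The resource selection idea above can be sketched in a few lines of Python. This is an illustration only: the slides do not define the network cost function, so `estimated_transfer_time` (file size divided by predicted throughput), `pick_ce`, the site names and every throughput figure below are assumptions, not the European Data Grid algorithm.

```python
# Hypothetical sketch: choose a CE by estimated data-transfer time.
# The cost function and all figures are assumptions made for illustration.

def estimated_transfer_time(size_gb, throughput_mbps):
    """Network cost function: seconds to move size_gb at throughput_mbps (megabits/s)."""
    return size_gb * 8 * 1000 / throughput_mbps  # 1 GB taken as 8000 megabits

def pick_ce(candidates, throughput, source_se, result_se, input_gb, output_gb):
    """Return (ce, cost) minimising input fetch time plus result store time."""
    def cost(ce):
        fetch = estimated_transfer_time(input_gb, throughput[(source_se, ce)])
        store = estimated_transfer_time(output_gb, throughput[(ce, result_se)])
        return fetch + store
    best = min(candidates, key=cost)
    return best, cost(best)

# Predicted throughput in Mbit/s between sites (made-up numbers).
throughput = {
    ("A", "CE1"): 800, ("CE1", "C"): 100,
    ("A", "CE2"): 200, ("CE2", "C"): 400,
}

# 10 GB of source data at A, 100 GB of results to be stored at C.
best_ce, best_cost = pick_ce(["CE1", "CE2"], throughput, "A", "C",
                             input_gb=10, output_gb=100)
print(best_ce, best_cost)
```

With these invented numbers CE1 has the faster link to the source data, but CE2 wins because the 100 GB of results dominates the total cost, which is the point of the 10 GB / 100 GB example.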

Use Case 2: Replica Selection
• File replication = proven technique for improving data access
• Spread multiple copies of the same file across the Grid
  - Do you really want to get everything from CERN, every time?
  - Do you really want to get everything from your geographically nearest site every time?
• A file has a Logical File Name (LFN) which maps to 1 or more Physical File Names (PFNs)
• Replica Manager should include a Replica Selection Service which uses network performance data (from somewhere) to find the “best” replica.
[Diagram: Grid App, Replica Catalogue, Replica Selection, Net Mon Service]
1. LFN
2. Multiple locations (PFNs)
3. Get performance data/predictions
4. Selected replica (PFN)
5. GridFTP commands
Note: GGF is looking at formally defining these (and other) use cases
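A minimal sketch of steps 1 to 4 of the replica selection flow, assuming an in-memory catalogue and a lookup table standing in for the network monitoring service. The names `REPLICA_CATALOGUE`, `PREDICTED_MBPS`, `host_of` and `select_replica`, the example hosts and the throughput predictions are all hypothetical; step 5 (the GridFTP transfer) is left out.

```python
# Hypothetical sketch of replica selection; all data below is invented.

# Steps 1-2: the catalogue maps an LFN to one or more PFNs.
REPLICA_CATALOGUE = {
    "lfn:higgs-run42.dat": [
        "gsiftp://se.cern.example/data/higgs-run42.dat",
        "gsiftp://se.ral.example/data/higgs-run42.dat",
    ],
}

# Step 3: stand-in for the monitoring service, giving predicted Mbit/s
# from each storage host to the client.
PREDICTED_MBPS = {"se.cern.example": 90.0, "se.ral.example": 450.0}

def host_of(pfn):
    """Extract the host part of a PFN of the form scheme://host/path."""
    return pfn.split("://", 1)[1].split("/", 1)[0]

def select_replica(lfn):
    """Step 4: return the PFN with the highest predicted throughput."""
    pfns = REPLICA_CATALOGUE[lfn]
    return max(pfns, key=lambda pfn: PREDICTED_MBPS[host_of(pfn)])

print(select_replica("lfn:higgs-run42.dat"))
```

Ranking by predicted throughput is only one possible policy; a real selection service could equally rank by estimated transfer time or by loss.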

How is GGF addressing the problem?
• Patience ;-) First we must look at web services.
• Essentially, an online application accessed using XML…
• …which makes it easier for other apps to use yours…
• …which allows the Grid middleware to access our data
[Diagram: WSP, UDDI registry, Client]
1. WSP registers service with registry
2. Client locates suitable service using registry
3. Client requests WSDL doc
4. WSDL tells client how to interact
5. Service and client interact using XML messages, sent via SOAP

How is GGF addressing the problem?
• By producing standards relating to network monitoring services.
• First with the Network Measurements Working Group (NM-WG):
  - Defining XML schemas for requesting tests and historic data, and publishing network measurements
    - Aims: to standardise communication, and…
    - …use XML, for web services and OGSI model
    - Simple use case…
  [Diagram: test request (request schema) -> Network Monitoring Service -> test results (publication schema)]
• All request & result messages can be formatted using standardised schemas = truly powerful combination
Note: DANTE, Internet2, SLAC etc. already using NM-WG work.

Standard measurements?
• Schemas based on NM-WG proposed measurement classification system:
  - describes a set of network characteristics and their classification hierarchy
  - used for creating common schemata for describing network monitoring data
  - using a standard classification maximises data portability
[Diagram: description + hierarchy]

So what can you ask for 1?
• Initial schema requirements set. Four sections: what, where, when, how
• What:
  - Use GGF metric names, e.g. path.delay.oneWay
  - Can request statistical data, with a specified sample interval, e.g. daily averages for one-way delay over the last month
  - After some “discussion”, multiple statistics in same request
  - Can limit number of returned results to avoid overload
• Where:
  - Source and destination
  - Flexible: IPv4|6, hostnames, or textual names such as “core router” and “edge router” (e.g. for security)
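To make the "statistical data with a specified sample interval" request concrete, here is a small sketch of what a service might compute for "daily averages for one-way delay". It does not use the NM-WG schema; the `samples` list, the millisecond unit and the `daily_averages` helper with its `max_results` cap are assumptions chosen for illustration.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Made-up samples for the metric path.delay.oneWay: (timestamp, delay in ms).
samples = [
    (datetime(2004, 4, 1, 9, 0), 12.0),
    (datetime(2004, 4, 1, 15, 0), 14.0),
    (datetime(2004, 4, 2, 9, 0), 20.0),
    (datetime(2004, 4, 2, 15, 0), 22.0),
    (datetime(2004, 4, 3, 9, 0), 11.0),
]

def daily_averages(samples, max_results=None):
    """Group samples by calendar day and average each group.

    max_results mimics the 'limit number of returned results' option.
    """
    by_day = defaultdict(list)
    for timestamp, delay in samples:
        by_day[timestamp.date()].append(delay)
    result = [(day, mean(values)) for day, values in sorted(by_day.items())]
    return result if max_results is None else result[:max_results]

for day, average in daily_averages(samples):
    print(day, average)
```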

So what can you ask for 2?
• When:
  - The primary means of specifying the time period we are interested in (for tests or data retrieval) is:
    - target time (an absolute time or “now”)
    - relative +ve and -ve time tolerances…
[Diagram: target_time = 14:00, with -ve time tolerance = 600 secs and +ve time tolerance = 600 secs, gives the period 13:50-14:10]
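The target time plus tolerances arithmetic can be checked with a short sketch. The `time_window` helper and the calendar date are assumptions; only the 14:00 target and the 600-second tolerances come from the slide.

```python
from datetime import datetime, timedelta

def time_window(target_time, neg_tolerance_s, pos_tolerance_s):
    """Return (start, end) of the acceptable period around target_time."""
    start = target_time - timedelta(seconds=neg_tolerance_s)
    end = target_time + timedelta(seconds=pos_tolerance_s)
    return start, end

# The date is arbitrary; the slide only gives the time of day.
start, end = time_window(datetime(2004, 4, 19, 14, 0), 600, 600)
print(start.strftime("%H:%M"), end.strftime("%H:%M"))  # prints "13:50 14:10"
```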

So what can you ask for 3?
  - Setting limit on number of results controls possibilities:
    - when number of results = “all”: supply all matching measurements in given time period
    - when number of results = 1: time data defines the period for which a measurement is considered to be acceptable, e.g. 14:00 +/- 10 minutes
  - Can also give start & end time if you wish, but values are mapped to target_time & number of results will = all
  - “testing interval” controls how often tests are run
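How the number of results interacts with the time period might look like the sketch below. The slide does not say which measurement is returned when only one is requested, so preferring the one closest to the target time is an assumption, as are the `select_results` name, the symmetric tolerance and the sample data.

```python
from datetime import datetime, timedelta

def select_results(measurements, target_time, tolerance_s, number_of_results="all"):
    """Filter (timestamp, value) pairs to the window around target_time."""
    tolerance = timedelta(seconds=tolerance_s)
    in_window = sorted(m for m in measurements
                       if abs(m[0] - target_time) <= tolerance)
    if number_of_results == "all":
        return in_window
    # Assumption: when a limit is given, prefer measurements nearest the target.
    closest_first = sorted(in_window, key=lambda m: abs(m[0] - target_time))
    return closest_first[:number_of_results]

day = datetime(2004, 4, 19)  # arbitrary date
measurements = [
    (day.replace(hour=13, minute=45), 1.0),
    (day.replace(hour=13, minute=55), 2.0),
    (day.replace(hour=14, minute=2), 3.0),
    (day.replace(hour=14, minute=20), 4.0),
]
target = day.replace(hour=14)

print(select_results(measurements, target, 600, "all"))
print(select_results(measurements, target, 600, 1))
```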

So what can you ask for 4096?
• How:
  - Can supply values to act as parameters for tests, or filters for querying past data, including tool name.
  - Uses param-specific tags or list of parameters:
      <remoteParamList>-a -b 10 -c</remoteParamList>
  - Possible to set ranges for parameters…
      <tcpBufferSize range="max">4194304</tcpBufferSize>
      <tcpBufferSize>1048576</tcpBufferSize>
      <tcpBufferSize range="min">1048576</tcpBufferSize>
    …and orders of preference.
  - Unspecified params use receiving system’s defaults
  - Can request reporting of actual param values used
  - Can control whether a test is ever run
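A receiving system could read the tcpBufferSize range fragment on this slide roughly as follows. This is a sketch, not the NM-WG schema: the `<params>` wrapper element, the reading of the untagged element as the preferred value, and the `read_range` and `choose_value` helpers are all assumptions.

```python
import xml.etree.ElementTree as ET

# The <params> wrapper is an assumption so the fragment parses as one document.
FRAGMENT = """
<params>
  <tcpBufferSize range="max">4194304</tcpBufferSize>
  <tcpBufferSize>1048576</tcpBufferSize>
  <tcpBufferSize range="min">1048576</tcpBufferSize>
</params>
"""

def read_range(xml_text, tag):
    """Collect the min, max and (assumed) preferred values for one parameter."""
    bounds = {"min": None, "max": None, "preferred": None}
    for elem in ET.fromstring(xml_text).findall(tag):
        # An element without a range attribute is treated as the preferred value.
        bounds[elem.get("range", "preferred")] = int(elem.text)
    return bounds

def choose_value(bounds, system_default):
    """Assumed policy: use the preferred value, else clamp the system default."""
    if bounds["preferred"] is not None:
        return bounds["preferred"]
    low = bounds["min"] if bounds["min"] is not None else system_default
    high = bounds["max"] if bounds["max"] is not None else system_default
    return min(max(system_default, low), high)

bounds = read_range(FRAGMENT, "tcpBufferSize")
print(bounds, choose_value(bounds, system_default=65536))
```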

Is that all GGF is doing?
• No, the GGF Grid High Performance Networking Research Group (GHPN) is also hard at work, modelling the network as a Grid resource so they can perform “advance reservation” etc.
• Computing, storage and interconnecting network are all resources:
  - Easier to manage
    - All can be reserved
    - Capability discovery
    - Exploit commonalities
  - Forms integrated stack
[Diagram: stack of Grid applications, middleware, then computing, network, storage]

The network as a resource
• To be achieved with a set of network subservices forming a holistic network service.
• Can't say more, as this is probably going to change quite a lot.
• Want to know more?
  - Then get involved!

Network monitoring service
• Historic measurement data
• Predictions
• Allow clients to run scheduled tests
• On-demand (real-time) tests
• Provide less-frequently monitored information (network route, topology…)
• Event notifications, for all of the above
• Across multiple administrative domains for all of the above
[Diagram shows potential clients: numerous and varied. Labels: Grid/Net Operations, GOC/NOC, Admin Software, Grid Applications, Grid Middleware, Automated Test Systems; network domains X, Y and Z, each with a Network Monitoring Service and Other Network Services]

Will this be easy?
• Probably not, but like all good car salespeople, I won’t tell you about the problems.
• But the potential benefits are worth the effort!

Conclusion
• Grid network monitoring is crucial to the Grid
  - But you all know that already!
• GHPN: looking at network services, inc. monitoring service
• NM-WG: looking at how to interface to network monitoring services
• Ambitious, but potential benefits justify efforts!
JRA4 SHOULD be involved!

The End
Questions???
m.j.leese@dl.ac.uk
pdm@hep.ucl.ac.uk
GET INVOLVED!
http://www-didc.lbl.gov/NMWG/
http://forge.gridforum.org/projects/ghpn-rg