Performance Comparison of Grid Information Services
Beth Plale, Computer Science Dept., Indiana University
Unified Relational GIS Project: collaborative project with Peter Dinda, Northwestern University
23 July 2002
Schemas in the performance evaluation are influenced by “Key Concepts and Services of a Grid Information Service”, Beth Plale, Peter Dinda, Gregor von Laszewski, IASTED Parallel and Distributed Computing Systems (PDCS), September 2002.
Types of Resource Information
Grid entity            Description
Organizations          Accountable bodies and owners of resources
People                 Resource admins, resource providers, GIS admins
Physical resources     Compute resources, network interfaces, benchmark results, number of users, load
Services               Job manager, load leveler, other GISs
Comm resources         Link capacity, switch capacity, error rate, drop rate
Software packages      BLAS, LAPACK, etc.
Event producers        Generators of event streams
Event channels         Event stream propagation vehicle
Event dictionaries     List of commonly used event types
Instruments            Radar systems, telescopes, etc.
Network paths          Available bandwidth and expected latency
Network topologies     Hosts, switches, routers
Wireless devices       Wireless hosts, wavepoints, cells, etc.
Virtual organizations  Groups of collaborators
Criteria for Inclusion in GIS
• Definition: an object in the repository represents an entity in the real-world grid
• A grid entity has a representation in the GIS repository if the grid entity:
– can be described
– has value to more than one application
– has persistency needs beyond a single application run
Services Provided by GIS
• Query interface: request for information through a query language
– e.g., SELECT … FROM … WHERE in SQL
• Update interface: request to add/update information in the repository
– e.g., UPDATE … in SQL
• Management interface: activation, deactivation of service
Additional GIS Functionality
• Replication
– Provision of replica transparency
• Distribution (a grid-driven necessity)
– Partitioning of information across sites
• Security interface
– Object level or column level?
– Access control
View of GIS Service Interoperability
[Figure: numbered flow with the labels (1) XPath query, (2) GCE testbed portal, (3) converter, (4) XML doc. Other labels: GCE testbed XML schema; SQL query, LDAP query, and XPath query leaving the converter; back ends mySQL, LDAP, and Xindice (XML db).]
Benchmark Evaluation of Alternate GIS Representations
• Evaluation of three databases: relational (mySQL), LDAP (openLDAP), and XML (Xindice)
• Database schemas: derived from a single E-R diagram and based partly on GLUE v8
• Benchmark: set of query and update use cases derived from Grid job submission
• Cost metric: minimized query response times, minimized update times, and minimized size of resulting query set
Benchmark Evaluation Assumptions
• Grid entities have complex relationships.
• The questions asked of GIS data are becoming more complex.
• Some entities require extremely rapid update rates.
• Thus a cost metric that considers multiple aspects:
– Minimized query response times,
– Minimized update times, and
– Minimized size of resulting query set.
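The three aspects of the cost metric can be recorded per use case. The following is a minimal sketch only: it uses Python's built-in SQLite as a stand-in (it is not one of the three databases evaluated here), and the `Cost` class, the `measure` function, and the `host` table are hypothetical names introduced for illustration.

```python
import sqlite3
import time
from dataclasses import dataclass


@dataclass
class Cost:
    query_seconds: float   # query response time
    update_seconds: float  # update time
    result_rows: int       # size of the resulting query set


def measure(conn, query, update, update_args=()):
    """Time one query and one update, and count the rows the query returns."""
    start = time.perf_counter()
    rows = conn.execute(query).fetchall()
    query_seconds = time.perf_counter() - start

    start = time.perf_counter()
    conn.execute(update, update_args)
    conn.commit()
    update_seconds = time.perf_counter() - start

    return Cost(query_seconds, update_seconds, len(rows))


# Hypothetical table and made-up rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE host (name TEXT, load REAL)")
conn.executemany("INSERT INTO host VALUES (?, ?)",
                 [("n1", 0.2), ("n2", 0.9), ("n3", 0.4)])

cost = measure(conn,
               "SELECT name FROM host WHERE load < 0.5",
               "UPDATE host SET load = ? WHERE name = ?",
               (0.1, "n2"))
print(cost)
```

A real comparison would presumably repeat each measurement many times and against each back end; a single timing as above is too noisy to rank databases.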
Benchmark Evaluation
[Figure: workflow. Start: the input schemas (GCE XML, GLUE v8) are represented as an E-R diagram, which is transformed into a schema for each of relational (mySQL), LDAP (openLDAP), and XML (Xindice). The databases are populated by scripts and existing data and evaluated against the Grid GIS Benchmark Use Cases (GCE job submission use cases).]
Set I: 05-’02, large multi-site project
Object classes: 30    Classes w/ instances: 10    Objects: 242
Top 5 classes
-- MDSDevice               36.5 %
-- HostInfo                24.5
-- MDSDeviceGroup          13.5
-- top                      8.5
-- MDSSoftware              7.0
                          ------
                           90.0 %

Set II: 01-’02, large academic HPC site
Object classes: 19    Classes w/ instances: 5    Objects: 106
Top 5 classes
-- GlobusQueue             42.0 %
-- GlobusServicesJobMgr    26.0
-- GlobusNetworkInterface  17.5
-- GlobusPhysicalResource   8.0
-- GlobusDaemon             6.0
                          ------
                          100.0 %

Set III: 11-’00, DOE site
Object classes: 31    Classes w/ instances: 19    Objects: 17531
Top 5 classes
-- GlobusFileInstance      80.0 %
-- GlobusQueueEntry         6.5
-- GlobusQueue              3.2
-- GlobusOrganization       1.8
-- GlobusServiceJobManager  1.8
                          ------
                           94.5 %
E-R Diagram (GLUE v8)
[Figure: entity-relationship diagram. Entities: computing elements, clusters, subclusters, hosts (compute nodes), nodes, network nodes, network cards, network paths, end points (host, port, protocol), end-to-end connections, network benchmarks (traceroute, packet loss, latency.roundtripDelay.ping, bandwidth.avail.TCP.singleStream), applications, application sources, users, user accounts. Relationship labels include has, is-a, use, and run on.]
Relational (table) representation
[Figure: tables for computing elements, clusters, subclusters, hosts (compute nodes), nodes, network nodes, network cards, network paths, end points (host, port, protocol), end-to-end connections, network benchmarks (traceroute, packet loss, latency.roundtripDelay.ping, bandwidth.avail.TCP.singleStream), applications, application sources, users, user accounts.]
Hierarchical representation
[Figure: tree rooted at EDTtop. Node labels: network nodes, compute elements, user, clusters, user accounts, connections, application, subclusters, hosts (compute nodes), endpoints, application sources, network path.]
Benchmark: set of Use Cases of GIS query and update
• Use cases based on job submission.
– examples drawn from HotPage (M. Thomas)
• Query 1: Suppose the user is part of the NPACI organization and knows his/her binary runs better on a T3E.
– “Of machines in NPACI organization, give me list of T3Es and their location for which availability is good, a binary is resident, and I have an account.”
SELECT C.CPUmodel, C.name, C.location    -- return machines and locations
FROM Cluster as C, SubCluster as SC, Host as H, Application as A, UserAccount as UA, User as U
WHERE
  -- cluster is NPACI and user has binary on machine
  C.Organization = "NPACI" and SC.OwningCluster = C.ClusterName and SC.CPUModel = "T3E"
  and A.OSName = SC.OSName and A.Owner = "Jane Lee" and A.Location = C.Location
  -- availability is good
  and For All H where H.OwningCluster = C.ClusterName: avg(H.SMPLoad1minX100) < 0.50
  -- user has valid account on cluster (-> GLUE v8)
  and C.ClusterUniqueID = UA.ID and UA.ID = U.ID and U.Name = "Jane Lee"
  and UA.ExpireDate > 21-July-2002 and UA.ActivateDate <= 21-July-2002
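The "For All H … avg(…)" clause in Query 1 is not standard SQL. One way to express it, sketched here under stated assumptions, is a correlated subquery. The sketch runs on Python's built-in SQLite rather than mySQL; the table layout is a simplified, hypothetical reduction of the slide's schema (the User table is folded into UserAccount, and the load column is named `SMPLoad1min`); and every row is made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Cluster     (ClusterName TEXT, Organization TEXT, Location TEXT);
CREATE TABLE SubCluster  (OwningCluster TEXT, CPUModel TEXT, OSName TEXT);
CREATE TABLE Host        (OwningCluster TEXT, SMPLoad1min REAL);
CREATE TABLE Application (Owner TEXT, OSName TEXT, Location TEXT);
CREATE TABLE UserAccount (ClusterName TEXT, UserName TEXT,
                          ActivateDate TEXT, ExpireDate TEXT);

-- Made-up sample rows, for illustration only.
INSERT INTO Cluster VALUES ('t3e-a', 'NPACI', 'site-1'),
                           ('t3e-b', 'NPACI', 'site-2'),
                           ('sp-c',  'NPACI', 'site-1');
INSERT INTO SubCluster VALUES ('t3e-a', 'T3E', 'UNICOS'),
                              ('t3e-b', 'T3E', 'UNICOS'),
                              ('sp-c',  'Power3', 'AIX');
INSERT INTO Host VALUES ('t3e-a', 0.20), ('t3e-a', 0.40),
                        ('t3e-b', 0.90), ('t3e-b', 0.70),
                        ('sp-c',  0.10);
INSERT INTO Application VALUES ('Jane Lee', 'UNICOS', 'site-1'),
                               ('Jane Lee', 'UNICOS', 'site-2');
INSERT INTO UserAccount VALUES
    ('t3e-a', 'Jane Lee', '2002-01-01', '2003-01-01'),
    ('t3e-b', 'Jane Lee', '2002-01-01', '2003-01-01');
""")

QUERY_1 = """
SELECT SC.CPUModel, C.ClusterName, C.Location
FROM Cluster AS C
JOIN SubCluster  AS SC ON SC.OwningCluster = C.ClusterName
JOIN Application AS A  ON A.OSName = SC.OSName AND A.Location = C.Location
JOIN UserAccount AS UA ON UA.ClusterName = C.ClusterName
WHERE C.Organization = 'NPACI'            -- cluster is NPACI
  AND SC.CPUModel = 'T3E'                 -- machine is a T3E
  AND A.Owner = :user                     -- user has binary on machine
  AND UA.UserName = :user                 -- user has valid account
  AND UA.ActivateDate <= :today AND UA.ExpireDate > :today
  AND (SELECT AVG(H.SMPLoad1min)          -- availability is good
       FROM Host AS H
       WHERE H.OwningCluster = C.ClusterName) < 0.50
"""

rows = conn.execute(QUERY_1, {"user": "Jane Lee", "today": "2002-07-21"}).fetchall()
print(rows)  # [('T3E', 't3e-a', 'site-1')]
```

With this sample data, cluster t3e-b is excluded because its average load (0.80) is not below 0.50, and sp-c is excluded because it is not a T3E. ISO-format date strings are used so that plain string comparison orders dates correctly.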
• “Of machines in NPACI organization, give me list of T3Es and their location for which availability is good, a binary is resident, and I have an account.”
• “Availability is good” could be defined differently:
-- Defined here as ‘average load over all nodes in an SMP is less than .50’.
-- More difficult is ‘existence of 20 contiguous nodes.’
• ‘Binary is resident’ is fairly easy; ‘binary is nearby’ is a harder question to answer.
• “Show histographic usage of my job or show historical usage of machine X for task Y where Y is job submission or transfer rate to HPSS”
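The two definitions of "availability is good" can be contrasted in a few lines. This is a sketch under assumptions: the function names are hypothetical, and treating position in the list as physical adjacency between nodes is a simplification that a real machine topology may not satisfy.

```python
def average_load_ok(loads, threshold=0.50):
    """True if the mean load over all nodes is below the threshold."""
    return bool(loads) and sum(loads) / len(loads) < threshold


def has_contiguous_free(free, needed=20):
    """True if at least `needed` consecutive nodes are free.

    `free` is a sequence of booleans indexed by node position; using list
    position as adjacency is an assumption made for this sketch.
    """
    run = 0
    for is_free in free:
        run = run + 1 if is_free else 0  # extend or reset the current run
        if run >= needed:
            return True
    return False


print(average_load_ok([0.20, 0.40]))                           # True
print(has_contiguous_free([True] * 19 + [False] + [True] * 19))  # False
```

The second call shows why the contiguity test is harder to satisfy: 38 of 39 nodes are free, yet no run of 20 exists. An aggregate such as an average cannot capture this, which is one reason the question is harder to pose against a GIS.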
Benchmark Evaluation
[Figure: workflow. Start: input schemas (GCE XML, GLUE v8), E-R diagram, then relational (mySQL), LDAP (openLDAP), and XML (Xindice), with scripts and existing data, and the Grid GIS Benchmark Use Cases (GCE job submission use cases).]
http://www.cs.indiana.edu/~plale