High-Performance Grid Computing and Research Networking: Grid Computing

  • Slides: 78
High-Performance Grid Computing and Research Networking
Grid Computing
Presented by Selim Kalayci
Instructor: S. Masoud Sadjadi
http://www.cs.fiu.edu/~sadjadi/Teaching/
sadjadi At cs Dot fiu Dot edu

Acknowledgements
• The content of many of the slides in these lecture notes has been adopted from online resources prepared previously by the people listed below. Many thanks!
• Henri Casanova
  • Principles of High Performance Computing
  • http://navet.ics.hawaii.edu/~casanova
  • henric@hawaii.edu
• Ian Foster
  • Presentations & Tutorials from www.globus.org

Agenda
• Grid Computing
• Grid Middleware - Globus
• Security in Globus
• Data Management
• Execution Management
• Monitoring
• Metaschedulers - GridWay

Multiple Computers
• Adding CPUs to a single computer becomes very expensive
• How about multiple computers together?
  → Linux clusters (60% of Top-500 list)
  → Blue Gene: 30K computers

Beyond the machine room?
• Need more capacity than available at (most) single sites
  • Everyone would like a 10K-node 100 GHz cluster
  • Very expensive (cooling, power)
  • More economical to have multiple sites
• Need to locate available resources now
• Data/instruments are inherently distributed
[Figure labels: Machine Room, Campus, Nation]

Grid Computing
A dynamic multi-institutional network of computers that come together to share resources for the purpose of coordinated problem solving.
Achieved through:
1. Open general-purpose protocols
2. Standard interfaces
[Figure labels: resource, application, institutional boundary]

Layers in Grid
[Figure: Layers in Grid]

A Grid Checklist
• coordinates resources that are not subject to centralized control …
• … using standard, open, general-purpose protocols and interfaces …
• … to deliver nontrivial qualities of service.
Virtual Organizations
• Group of individuals or institutions defined by sharing rules to share the resources of the "Grid" for a common goal.
• Example: application service providers, storage service providers, databases, crisis management team, consultants.

How is a grid different?
• Grids focus on site autonomy
• Grids involve heterogeneity
• Grids involve more resources than just computers and networks
• Grids focus on the user


Grid Infrastructure
• Distributed management
  • Of physical resources
  • Of software services
  • Of communities and their policies
• Unified treatment
  • Build on Web services framework
  • Use WS-RF, WS-Notification (or WS-Transfer/Man) to represent/access state
  • Common management abstractions & interfaces

Globus is Open Source Grid Infrastructure
• Implement key Web services standards
  • State, notification, security, …
• Software for Grid infrastructure
  • Service-enable new & existing resources
  • E.g., GRAM on computer, GridFTP on storage system, custom application services
  • Uniform abstractions & mechanisms
• Tools to build applications that exploit Grid infrastructure
  • Registries, security, data management, …
• Enabler of a rich tool & service ecosystem

GLOBUS TOOLKIT 4 - GT4
• Open source toolkit developed by The Globus Alliance that allows us to build Grid applications.
• Organized as a collection of loosely coupled components.
• Consists of services, programming libraries, and development tools.

GT Domain Areas
• Core runtime
  • Infrastructure for building new services
• Security
  • Apply uniform policy across distinct systems
• Execution management
  • Provision, deploy, & manage services
• Data management
  • Discover, transfer, & access large data
• Monitoring
  • Discover & monitor dynamic services

GT4 Components
[Figure: GT4 Components]

WSRF & WS-Notification
• Naming and bindings (basis for virtualization)
  • Every resource can be uniquely referenced, and has one or more associated services for interacting with it
• Lifecycle (basis for fault-resilient state mgmt)
  • Resources created by services following factory pattern
  • Resources destroyed immediately or scheduled
• Information model (basis for monitoring, discovery)
  • Resource properties associated with resources
  • Operations for querying and setting this info
  • Asynchronous notification of changes to properties
• Service groups (basis for registries, collective svcs)
  • Group membership rules & membership management
• Base Fault type


Security Services
• Forms the underlying communication medium for all the services
• Secure Authentication and Authorization
• Single Sign-on
  • Users need not explicitly authenticate themselves every time a service is requested
• Uniform Credentials
• Ex: GSI (Grid Security Infrastructure)

Grid Security Infrastructure - GSI
• Use GSI as a standard mechanism for bridging disparate security mechanisms
  • Doesn't solve the trust problem, but now things talk the same protocol and understand each other's identity credentials
  • Basic support for delegation, policy distribution
• Translate from other mechanisms to/from GSI as needed
• Convert from GSI identity to local identity for authorization

Grid Security Infrastructure - GSI
• Based on standard PKI technologies
  • SSL protocol or WS-Security for authentication, message protection
  • X.509 certificates for asserting identity
    • For users, services, hosts, etc.
  • CAs allow one-way, light-weight trust relationships (not just site-to-site)
• Proxy Certificates
  • GSI extension to X.509 certificates for delegation, single sign-on

Gridmap file
• A gridmap file at each site maps the grid id of a user to a local id
• The grid id of the user is his/her subject in the grid user certificate
• The local id is site-specific
  • Multiple grid ids can be mapped to a single local id
  • Usually a local id exists for each VO participating in that grid effort
• The local ids are then used to implement site-specific policies
  • Priorities etc.

Gridmap file entry
• The gridmap-file is maintained by the site administrator
• Each entry maps a Grid DN (distinguished name of the user; subject name) to a local user name

    # Distinguished Name                                          Local username
    "/DC=org/DC=doegrids/OU=People/CN=Laukik Chitnis 712960"      ivdgl
    "/DC=org/DC=doegrids/OU=People/CN=Richard Cavanaugh 710220"   grid3
    "/DC=org/DC=doegrids/OU=People/CN=JangUk In 712961"           ivdgl
    "/DC=org/DC=doegrids/OU=People/CN=Jorge Rodriguez 690211"     osg
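
The DN-to-local-user mapping above can be sketched in a few lines of Python. This is only an illustration of the file format shown on the slide, not code from the Globus Toolkit; the function name `parse_gridmap` and the sample DNs and user names are made up for the example.

```python
import shlex


def parse_gridmap(text):
    """Return a dict mapping each quoted Grid DN to its local user name."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        parts = shlex.split(line)  # keeps the quoted DN together as one token
        if len(parts) != 2:
            raise ValueError(f"malformed gridmap entry: {line!r}")
        dn, local_user = parts
        mapping[dn] = local_user
    return mapping


# Hypothetical entries: two grid ids share one local id, as the slide allows.
SAMPLE = '''
# Distinguished Name                                  Local username
"/DC=org/DC=example/OU=People/CN=Alice Example 1" vo_a
"/DC=org/DC=example/OU=People/CN=Bob Example 2" vo_a
"/DC=org/DC=example/OU=People/CN=Carol Example 3" vo_b
'''

users = parse_gridmap(SAMPLE)
print(users["/DC=org/DC=example/OU=People/CN=Bob Example 2"])
```

Because several DNs can map to the same local account, site policies (priorities and so on) can be attached to the local id rather than to individual users.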

How to create and use an Identity (1)
• Run the command below to generate a personal grid identity certificate.
    grid-cert-request
• This will create the following files in $HOME/.globus
    usercert_request.pem (request to sign certificate)
    userkey.pem (private key - encrypted)
    usercert.pem (public key - signed)

How to create and use an Identity (2)
• After you have created the request, mail it to the local certificate authority:
    cat $HOME/.globus/usercert_request.pem | mail skala001@cis.fiu.edu
  (or dvill013@cs.fiu.edu)
• The CA will then mail you back a signed certificate, which you should put into $HOME/.globus/usercert.pem (it can take up to a day for the CA to process the request)

Commands to log in / log out
• grid-proxy-init
  • This "logs you into" the Globus system.
• grid-proxy-info
  • Use this to see your status.
• grid-proxy-destroy
  • Use this to log out.
• A proxy is like a temporary ticket to use the Grid; its default lifetime in the above case is 12 hours.
• Once this is done, you should be able to run "grid jobs"
  • globus-job-run site-name command
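
The "temporary ticket" idea can be modelled as an issue time plus a lifetime. The sketch below is a toy model only: the `Proxy` class is hypothetical and does not read real proxy certificates; the 12-hour default is the figure quoted on the slide.

```python
from datetime import datetime, timedelta

DEFAULT_LIFETIME = timedelta(hours=12)  # default lifetime stated on the slide


class Proxy:
    """Toy model of a short-lived proxy credential (hypothetical class)."""

    def __init__(self, issued_at, lifetime=DEFAULT_LIFETIME):
        self.issued_at = issued_at
        self.expires_at = issued_at + lifetime

    def time_left(self, now):
        # Never report a negative remaining lifetime.
        return max(self.expires_at - now, timedelta(0))

    def is_valid(self, now):
        return now < self.expires_at


proxy = Proxy(issued_at=datetime(2007, 6, 13, 9, 0))
print(proxy.time_left(datetime(2007, 6, 13, 15, 0)))
print(proxy.is_valid(datetime(2007, 6, 13, 22, 0)))
```

Once the lifetime runs out, the user has to create a new proxy before submitting further jobs.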


GT4 Data Management
• Stage/move large data to/from nodes
  • GridFTP, Reliable File Transfer (RFT)
  • Alone, and integrated with GRAM
• Locate data of interest
  • Replica Location Service (RLS)
• Replicate data for performance/reliability
  • Distributed Replication Service (DRS)
• Provide access to diverse data sources
  • File systems, parallel file systems, hierarchical storage: GridFTP
  • Databases: OGSA-DAI

GridFTP
• What is GridFTP?
• A secure, robust, fast, efficient, standards-based, widely accepted data transfer protocol
• A Protocol
  • Multiple independent implementations can interoperate
    • This works. Both the Condor Project at UWis and Fermi Lab have home-grown servers that work with ours.
    • Lots of people have developed clients independent of the Globus Project.
• We also supply a reference implementation:
  • Server
  • Client tools (globus-url-copy)
  • Development Libraries

globus-url-copy
• GridFTP-compliant client from the Globus team
• Copy files from one URL to another URL
  • One URL is usually a gsiftp:// URL
  • Another URL is usually a file:/ URL
• To move a file from a remote GridFTP-enabled server to the local machine
    % globus-url-copy gsiftp://gcb.fiu.edu/tmp/jt file:/home/skala001/jt
• To put a file onto the server, reverse the URLs
    % globus-url-copy file:/home/skala001/jt gsiftp://gcb.fiu.edu/tmp/jt
• Monitor performance using the -vb flag
    % globus-url-copy -vb gsiftp://gcb.fiu.edu/tmp/jt file:/home/skala001/jt
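
When these transfers are scripted, the command line can be assembled programmatically. The sketch below only builds the argument list for the invocations shown on the slide and does not run anything; the helper names `url_copy_command`, `download`, and `upload` are made up for the example, and `-vb` is the only flag used because it is the only one the slide mentions.

```python
import shlex


def url_copy_command(source_url, dest_url, show_performance=False):
    """Build the argv list for a globus-url-copy call (nothing is executed)."""
    argv = ["globus-url-copy"]
    if show_performance:
        argv.append("-vb")  # performance-monitoring flag from the slide
    argv += [source_url, dest_url]
    return argv


def download(host, remote_path, local_path, **options):
    # Remote GridFTP server -> local file
    return url_copy_command(f"gsiftp://{host}{remote_path}",
                            f"file:{local_path}", **options)


def upload(host, local_path, remote_path, **options):
    # Same call with the two URLs reversed
    return url_copy_command(f"file:{local_path}",
                            f"gsiftp://{host}{remote_path}", **options)


cmd = download("gcb.fiu.edu", "/tmp/jt", "/home/skala001/jt",
               show_performance=True)
print(shlex.join(cmd))
```

The resulting list could be handed to `subprocess.run` on a machine where the Globus client tools are installed and a valid proxy exists.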

Reliable File Transfer - RFT
• WSRF-compliant, fault-tolerant, high-performance data transfer service
  • Soft state.
  • Notifications/Query
• Reliability on top of the high performance provided by GridFTP.
  • Fire and Forget.
  • Integrated Automatic Failure Recovery.
    • Network level failures.
    • System level failures etc.
• Essentially a data transfer scheduler with FIFO as its queue policy.
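
The "FIFO scheduler with automatic failure recovery" idea can be illustrated with a small in-memory queue. This is a conceptual sketch, not RFT's implementation: the `TransferQueue` class, the retry limit, and the simulated `flaky_copy` function are all assumptions made for the example.

```python
from collections import deque


class TransferQueue:
    """FIFO transfer scheduler that retries failed copies (illustrative only)."""

    def __init__(self, copy_fn, max_attempts=3):
        self.copy_fn = copy_fn
        self.max_attempts = max_attempts
        self.pending = deque()
        self.status = {}

    def submit(self, source, dest):
        # "Fire and forget": the caller queues the request and moves on.
        self.pending.append((source, dest))
        self.status[(source, dest)] = "Pending"

    def run(self):
        while self.pending:
            request = self.pending.popleft()  # oldest request first (FIFO)
            for _ in range(self.max_attempts):
                try:
                    self.copy_fn(*request)
                    self.status[request] = "Done"
                    break
                except OSError:
                    self.status[request] = "Retrying"
            else:
                self.status[request] = "Failed"  # every attempt failed


# Simulated copy: "a" fails once, "c" fails more often than we retry.
failures_left = {"a": 1, "c": 5}
completed = []


def flaky_copy(source, dest):
    if failures_left.get(source, 0) > 0:
        failures_left[source] -= 1
        raise OSError("simulated network failure")
    completed.append(source)


queue = TransferQueue(flaky_copy)
for name in ("a", "b", "c"):
    queue.submit(name, name + ".copy")
queue.run()
print(completed, queue.status)
```

A real service would also persist the queue so that transfers survive a restart of the service itself; that part is omitted here.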

[Figure: RFT architecture. Labels: RFT Client, SOAP Messages, Notifications (Optional), RFT Service, two GridFTP Servers (each with Protocol Interpreter, Master DSI, Slave DSI, IPC Link, IPC Receiver), Data Channel]


Execution Management
• Common WS interface to schedulers
  • Unix, Condor, LSF, PBS, SGE, …
• More generally: interface for process execution management
  • Lay down execution environment
  • Stage data
  • Monitor & manage lifecycle
  • Kill it, clean up
• A basis for application-driven provisioning

Grid Job Management Goals
Provide a service to securely:
• Create an environment for a job
• Stage files to/from environment
• Cause execution of job process(es)
  • Via various local resource managers
• Monitor execution
  • Signal important state changes to client
• Enable client access to output files
  • Streaming access during execution

GRAM
• GRAM: Globus Resource Allocation and Management
• GRAM is a Globus Toolkit component
  • For Grid job management
• GRAM is a unifying remote interface to Resource Managers
  • Yet preserves local site security/control
• GRAM is for stateful job control
  • Reliable operation
  • Asynchronous monitoring and control
  • Remote credential management
  • File staging via RFT and GridFTP

GT4 WS GRAM Architecture
[Figure: GT4 WS GRAM architecture. Labels: Client; Service host(s) and compute element(s); GT4 Java Container (GRAM services, Delegation, RFT File Transfer); sudo; GRAM adapter; SEG; Local scheduler; User job; Compute element; GridFTP; Remote storage element(s). Arrows: Job functions, Delegate, Transfer request, Local job control, Job events, GridFTP control, FTP data]

GT4 WS GRAM Architecture: delegated credential
Delegated credential can be:
• Made available to the application
• Used to authenticate with RFT
• Used to authenticate with GridFTP

A Simple Example
• Command example:
    % globusrun-ws -submit -c /bin/date
    Submitting job...Done.
    Job ID: uuid:002a6ab8-6036-11d9-bae6-0002a5ad41e5
    Termination time: 01/07/2005 22:55 GMT
    Current job state: Active
    Current job state: CleanUp
    Current job state: Done
    Destroying job...Done.
• A successful submission will create a new ManagedJob resource with its own unique EPR for messaging
• Use the -o option to create the EPR file
    % globusrun-ws -submit -o job.epr -c /bin/date
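
When job submission is driven from a script, the job ID and the sequence of states can be pulled out of output like the above. The sketch below parses a captured copy of the sample output; the function name `parse_submission` is made up, and the exact line layout is taken from the slide, so real output may differ in whitespace or wording.

```python
import re

# Captured sample output, copied from the slide.
SAMPLE_OUTPUT = """\
Submitting job...Done.
Job ID: uuid:002a6ab8-6036-11d9-bae6-0002a5ad41e5
Termination time: 01/07/2005 22:55 GMT
Current job state: Active
Current job state: CleanUp
Current job state: Done
Destroying job...Done.
"""


def parse_submission(output):
    """Return (job_id, [states in the order they were reported])."""
    job_id = re.search(r"^Job ID: (\S+)$", output, re.MULTILINE)
    states = re.findall(r"^Current job state: (\S+)$", output, re.MULTILINE)
    return (job_id.group(1) if job_id else None), states


job_id, states = parse_submission(SAMPLE_OUTPUT)
print(job_id)
print(states)
```

The last reported state tells the calling script whether the job finished.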

A Simple Example (2)
• To see the output, use the -s (stream) option
    % globusrun-ws -submit -s -c /bin/date
    Termination time: 06/14/2007 18:07 GMT
    Current job state: Active
    Current job state: CleanUp-Hold
    Wed Jun 13 14:07:54 EDT 2007
    Current job state: CleanUp
    Current job state: Done
    Destroying job...Done.
    Cleaning up any delegated credentials...Done.
• If you want to send the output to a file, use the -so option
    % globusrun-ws -submit -s -so job.out -c /bin/date
    …
    % cat job.out
    Wed Jun 13 14:07:54 EDT 2007

A Simple Example (3)
Submitting your job to different schedulers
• Fork
    % globusrun-ws -submit -Ft Fork -s -c /bin/date
  (Actually, the default is Fork, so you can skip it in this case.)
• SGE
    % globusrun-ws -submit -Ft SGE -s -c /bin/hostname

Batch Job Submissions
    % globusrun-ws -submit -batch -o job_epr -c /bin/sleep 50
    Submitting job...Done.
    Job ID: uuid:f9544174-60c5-11d9-97e3-0002a5ad41e5
    Termination time: 01/08/2005 16:05 GMT
    % globusrun-ws -status -j job_epr
    Current job state: Active
    % globusrun-ws -status -j job_epr
    Current job state: Done
    % globusrun-ws -kill -j job_epr
    Requesting original job description...Done.
    Destroying job...Done.

Complete Factory Contact
• Override default EPR
  • Select a different host/service
• Use "contact" shorthand for convenience
  • Relies on proprietary knowledge of EPR format!
• Command example:
    % globusrun-ws -submit -F gcb.fiu.edu -c /bin/date

Read RSL from File
• Command:
    % globusrun-ws -submit -f touch.xml
• Contents of touch.xml file:
    <job>
      <executable>/bin/touch</executable>
      <argument>touched_it</argument>
    </job>

Resource Specification Language (RSL)
• RSL is the language used by the clients to submit a job.
• All job submission requests are described in RSL, including the executable file and arguments.
• You can specify the type and capabilities of resources to execute your job.
• You can also coordinate stage-in and stage-out operations through RSL.
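
Since a job description is plain XML, it can be generated rather than written by hand. The sketch below builds a minimal description using only the `job`, `executable`, `argument`, and `stdout` elements that appear on these slides; the helper name `build_job` is made up, and the real schema has many more elements and may constrain their order.

```python
import xml.etree.ElementTree as ET


def build_job(executable, arguments=(), stdout=None):
    """Build a minimal <job> element from an executable and its arguments."""
    job = ET.Element("job")
    ET.SubElement(job, "executable").text = executable
    for arg in arguments:
        ET.SubElement(job, "argument").text = arg
    if stdout is not None:
        ET.SubElement(job, "stdout").text = stdout
    return job


job = build_job("/bin/touch", ["touched_it"])
rsl = ET.tostring(job, encoding="unicode")
print(rsl)
```

Writing `rsl` to a file gives something that could be passed to the `-f` option shown on the previous slide.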

Common/useful options
• globusrun-ws -J
  • Perform delegation as necessary for job
• globusrun-ws -S
  • Perform delegation as necessary for job's file staging
• globusrun-ws -s
  • Stream stdout/err during job execution to the terminal
• globusrun-ws -self
  • Useful for testing, when you have started the service using your credentials instead of host credentials

Staging job
    <job>
      <executable>/bin/echo</executable>
      <directory>/tmp</directory>
      <argument>Hello</argument>
      <stdout>job.out</stdout>
      <stderr>job.err</stderr>
      <fileStageOut>
        <transfer>
          <sourceUrl>file:///tmp/job.out</sourceUrl>
          <destinationUrl>gsiftp://host.domain:2811/tmp/stage.out</destinationUrl>
        </transfer>
      </fileStageOut>
    </job>

RSL Variable
• Enables late binding of values
  • Values resolved by GRAM service
• System-specific variables
  • ${GLOBUS_USER_HOME}
  • ${GLOBUS_LOCATION}
  • ${GLOBUS_SCRATCH_DIR}
    • Alternative directory that is shared with compute node
    • Typically providing more space than user's HOME dir

RSL Variable Example
    <job>
      <executable>/bin/echo</executable>
      <argument>HOME is ${GLOBUS_USER_HOME}</argument>
      <argument>SCRATCH = ${GLOBUS_SCRATCH_DIR}</argument>
      <argument>GL is ${GLOBUS_LOCATION}</argument>
      <stdout>${GLOBUS_USER_HOME}/echo.stdout</stdout>
      <stderr>${GLOBUS_USER_HOME}/echo.stderr</stderr>
    </job>
!!!/tmp/rslExample
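
Late binding means the same job description yields different concrete paths depending on where it runs. The sketch below imitates that with Python's `string.Template`, which happens to use the same `${NAME}` syntax; it illustrates the idea only and is not how the GRAM service performs the substitution. The site values are invented.

```python
from string import Template


def resolve(value, site_values):
    """Substitute ${NAME} placeholders using values known only at the site."""
    return Template(value).substitute(site_values)


# Hypothetical values two different sites might bind to the same variables.
site_a = {"GLOBUS_USER_HOME": "/home/alice",
          "GLOBUS_SCRATCH_DIR": "/scratch/alice",
          "GLOBUS_LOCATION": "/opt/globus"}
site_b = {"GLOBUS_USER_HOME": "/users/a/alice",
          "GLOBUS_SCRATCH_DIR": "/lustre/alice",
          "GLOBUS_LOCATION": "/usr/local/globus"}

stdout_template = "${GLOBUS_USER_HOME}/echo.stdout"
print(resolve(stdout_template, site_a))
print(resolve(stdout_template, site_b))
```

The client never needs to know the site-specific directories; it submits the template and the service fills in the values.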

GRAM Commands
• Run a job using:
    % globus-job-run localhost /bin/date
• Submit to Fork:
    % globus-job-run localhost/jobmanager-fork /bin/date
• Submit a batch job using:
    % globus-job-submit localhost /bin/sleep 50
• globus-job-status
• globus-job-get-output
• globus-job-cancel

Running a Script in GRAM
• Add this script to file "job"
    #!/bin/csh -f
    echo "Hello World from "; $GLOBUS_LOCATION/bin/globus-hostname
    echo arg1 = $1
    echo arg2 = $2
    echo -n "sum is "
    echo "$1+$2" | /usr/bin/bc -l
• Change the permissions for "job"
    % chmod +x job
• Run the job
    % globus-job-run localhost ./job 5 6
• You should get
    Hello World from
    gcb.fiu.edu
    arg1 = 5
    arg2 = 6
    sum is 11
!!!/tmp/job


What is MDS4?
• Grid-level monitoring system used most often for resource selection and error notification
  • Aid user/agent to identify host(s) on which to run an application
  • Make sure that they are up and running correctly
• Uses standard interfaces to provide publishing of data, discovery, and data access, including subscription/notification
  • WS-ResourceProperties, WS-BaseNotification, WS-ServiceGroup
• Functions as an hourglass to provide a common interface to lower-level monitoring tools

MDS4 Components
• Information providers
  • Monitoring is a part of every WSRF service
  • Non-WS services can also be used
• Higher-level services
  • Index Service: a way to aggregate data
  • Trigger Service: a way to be notified of changes
  • Both built on a common aggregator framework
• Clients
  • WebMDS
• All of the tools are schema-agnostic, but interoperability needs a well-understood common language

Information Providers
• GT4 information providers collect information from some system and make it accessible as WSRF resource properties
• Growing number of information providers
  • Ganglia, CluMon, Nagios
  • SGE, LSF, OpenPBS, PBSPro, Torque
• Many opportunities to build additional ones
  • E.g., network monitoring, storage systems, various sensors

Information Providers
• Data sources for the higher-level services
• Some are built into services
  • Any WSRF-compliant service publishes some data automatically
  • WS-RF gives us standard Query/Subscribe/Notify interfaces
  • GT4 services: ServiceMetaDataInfo element includes start time, version, and service type name
  • Most of them also publish additional useful information as resource properties

Information Providers: GT4 Services
• Reliable File Transfer Service (RFT)
  • Service status data, number of active transfers, transfer status, information about the resource running the service
• Community Authorization Service (CAS)
  • Identifies the VO served by the service instance
• Replica Location Service (RLS)
  • Note: not a WS
  • Location of replicas on physical storage systems (based on user registrations) for later queries

Information Providers (2)
• Other sources of data
  • Any executables
  • Other (non-WS) services
  • Interface to another archive or data store
  • File scraping
• Just need to produce a valid XML document

Information Providers: Cluster and Queue Data
• Interfaces to Hawkeye, Ganglia, CluMon, Nagios
  • Basic host data (name, ID), processor information, memory size, OS name and version, file system data, processor load data
  • Some Condor/cluster-specific data
  • This can also be done for sub-clusters, not just at the host level
• Interfaces to PBS, Torque, LSF
  • Queue information, number of CPUs available and free, job count information, some memory statistics and host info for head node of cluster

Higher-Level Services
• Index Service
  • Caching registry
• Trigger Service
  • Warn on error conditions
• All of these have common needs, and are built on a common framework

MDS4 Index Service
• Index Service is both registry and cache
  • Datatype and data provider info, like a registry (UDDI)
  • Last value of data, like a cache
• Subscribes to information providers
• In-memory default approach
  • DB backing store currently being discussed to allow for very large indexes
• Can be set up for a site or set of sites, a specific set of project data, or for user-specific data only
• Can be a multi-rooted hierarchy
  • No *global* index
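
The "registry plus cache" description can be made concrete with a small in-memory model. This is a conceptual sketch, not the MDS4 implementation: the `IndexService` class, its method names, and the provider names are invented for illustration.

```python
class IndexService:
    """In-memory registry of providers plus a cache of their last values."""

    def __init__(self):
        self.registry = {}  # provider name -> datatype (the registry role)
        self.cache = {}     # provider name -> last value seen (the cache role)

    def register(self, provider, datatype):
        self.registry[provider] = datatype

    def notify(self, provider, value):
        # Called when a subscribed provider reports a new value.
        if provider not in self.registry:
            raise KeyError(f"unregistered provider: {provider}")
        self.cache[provider] = value

    def query(self, provider):
        # Answer from the cache instead of contacting the provider again.
        return self.cache.get(provider)


index = IndexService()
index.register("cluster-a/free_cpus", "int")
index.notify("cluster-a/free_cpus", 12)
index.notify("cluster-a/free_cpus", 7)  # a newer value replaces the old one
print(index.query("cluster-a/free_cpus"))
```

Only the last value is kept, which is why the slide compares the service to a cache rather than to an archive.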

Container-wide Index
• Each GT4 container has a local index
• Collects information about services in that container
• Each service registers to the container index when correctly configured

VO-wide indexes
• Local indexes can be registered to VO-wide indexes
• A config file at the resource container or at the VO index contains the URL for the resource or VO index

MDS4 Trigger Service
• Subscribe to a set of resource properties
• Evaluate that data against a set of pre-configured conditions (triggers)
• When a condition matches, action occurs
  • Email is sent to pre-defined address
  • Website updated
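
The subscribe, evaluate, act cycle can be sketched as a list of condition/action pairs checked against incoming property values. The sketch is illustrative only: the `Trigger` class, the property name `FreeCPUs` as a dictionary key, and the "outbox" standing in for email are assumptions, not the Trigger Service's configuration format.

```python
class Trigger:
    """A named condition with the action to run when the condition matches."""

    def __init__(self, name, condition, action):
        self.name = name
        self.condition = condition
        self.action = action


def evaluate(triggers, properties):
    """Run the action of every trigger whose condition matches; return names."""
    fired = []
    for trigger in triggers:
        if trigger.condition(properties):
            trigger.action(properties)
            fired.append(trigger.name)
    return fired


outbox = []  # stands in for "email is sent to a pre-defined address"

triggers = [
    Trigger("no-free-cpus",
            condition=lambda p: p["FreeCPUs"] == 0,
            action=lambda p: outbox.append(f"{p['host']}: no free CPUs")),
]

print(evaluate(triggers, {"host": "cluster-a", "FreeCPUs": 4}))
print(evaluate(triggers, {"host": "cluster-a", "FreeCPUs": 0}))
print(outbox)
```

Each call to `evaluate` corresponds to one notification arriving from a subscribed resource property.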

Information models
• Each information source publishes information in XML according to some schema.
• Sometimes the author of the information source or the grid resource defines that schema.
• Some collaborative efforts define common schemas, for example GLUE for compute information
• Schema typically written in XSD, but not required

GLUE schema
• Grid Laboratory Uniform Environment
• Schema developed by DataTAG for EU/USA interoperability.
• Modelled in UML
• Implementations
  • XML version for MDS
    • Information collected from various cluster monitoring systems
  • Also: LDAP and SQL versions (used by older versions of MDS and other monitoring systems).

MDS user interfaces
• General purpose UIs
  • Web browser based interface - WebMDS
  • Command line tools
• Specialized clients
  • Brokers

WebMDS
• Web-based interface to display monitoring information
• Easily extensible for new data using XSLT

MDS4 - Command Line
• XPath queries to query the Index Service
• To see everything collected in the Index Service:
    wsrf-query -s https://gcb.fiu.edu:8443/wsrf/services/DefaultIndexService
• To see the number of free nodes:
    wsrf-query -s https://gcb.fiu.edu:8443/wsrf/services/DefaultIndexService "number(//*/glue:GLUECE//glue:ComputingElement/glue:State/@glue:FreeCPUs)"
• To see how many jobs are currently running:
    wsrf-query -s https://gcb.fiu.edu:8443/wsrf/services/DefaultIndexService "number(//*[local-name()='GLUECE']//glue:ComputingElement//glue:State/@glue:TotalJobs)"
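
To see what such a query selects, the same kind of lookup can be tried offline on a small XML document. The document below is a made-up stand-in: only the element and attribute names (GLUECE, ComputingElement, State, FreeCPUs, TotalJobs) come from the queries above, while the namespace URI, the nesting, and the values are assumptions. Python's `xml.etree.ElementTree` supports only a subset of XPath, so the attribute is read in a second step.

```python
import xml.etree.ElementTree as ET

GLUE_URI = "http://example.org/glue"  # hypothetical namespace URI
NS = {"glue": GLUE_URI}

# Toy index contents; a real Index Service response is far larger.
DOC = """\
<Index xmlns:glue="http://example.org/glue">
  <glue:GLUECE>
    <glue:ComputingElement glue:Name="cluster-a">
      <glue:State glue:FreeCPUs="12" glue:TotalJobs="3"/>
    </glue:ComputingElement>
  </glue:GLUECE>
</Index>
"""

root = ET.fromstring(DOC)

# Find the State element anywhere below the root, then read its attributes.
state = root.find(".//glue:ComputingElement/glue:State", NS)
free_cpus = int(state.get(f"{{{GLUE_URI}}}FreeCPUs"))
total_jobs = int(state.get(f"{{{GLUE_URI}}}TotalJobs"))
print(free_cpus, total_jobs)
```

The `number(...)` wrapper in the command-line queries plays the role of the `int(...)` conversion here.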

Configuring GRAM to use a cluster monitoring system
• GRAM extracts and publishes cluster information from either Ganglia or Hawkeye
• $GLOBUS_LOCATION/etc/globus_wsrf_mds_usefulrp/gluerp.xml
• The <defaultProvider> tag specifies whether to use Ganglia, Hawkeye, or none.
• Uncomment the appropriate example supplied in the config file


Grid Meta-Scheduler
• Local schedulers are not fit for a Grid environment
• Meta-scheduler(s) should interact with lower-level schedulers for scheduling decisions
• Resources (computational, data, network, etc.) and jobs are other entities the meta-scheduler should be aware of and interact with
• Meta-scheduler uses existing Grid services

GridWay
• Lightweight metascheduler on top of GT 2.4 to 4.x
• Properties:
  • Support of GGF DRMAA standard API for job submission and management
  • Support for JSDL
  • Simple scheduling mechanisms but extensible
  • Interoperability between different grid infrastructures and middlewares (Globus, EGEE, UNICORE…)
  • Allows job dependencies (workflow)
  • Supports job migration/adaptive execution (Grid- and application-initiated)

GridWay Architecture
[Figure: GridWay architecture. Labels: DRMAA Library, CLI, GridWay Core (Request Manager, Dispatch Manager, Scheduler, Job pool, Host pool), Transfer Manager, Execution Manager, Information Manager, Performance Monitor, RFT, GRAM, MDS, Resource. Captions: Job control operations; Matchmaking, execution and migration; Execution of jobs on LRM]

GridWay Modules
• Request Manager: interfaces with client commands
• Dispatch Manager: performs job scheduling
• Information Manager: resource monitoring and data gathering
• Execution Manager: executes job stages
• Performance Monitor: evaluates the job performance

Scheduling Strategy
• Dispatch Manager wakes up at every scheduling interval
• Uses Resource Selector to select the host(s) to submit the job
• Resource Selector interfaces with Grid Information Services, such as MDS
• Resource Selector returns a candidate list of hosts to submit the job by using a policy script
• You can implement your own policy script, so it is extensible
• Dispatch Manager then submits the job to the Execution Manager
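
The dispatch cycle above can be sketched with a pluggable policy function standing in for the policy script. This is a conceptual illustration, not GridWay code: the function names, the host and job dictionaries, and the "most free CPUs first" policy are all assumptions made for the example.

```python
def rank_by_free_cpus(hosts, job):
    """Example policy: hosts with enough free CPUs, most free CPUs first."""
    eligible = [h for h in hosts if h["free_cpus"] >= job["cpus"]]
    return sorted(eligible, key=lambda h: h["free_cpus"], reverse=True)


def dispatch(pending_jobs, hosts, policy, submit):
    """One scheduling interval: ask the policy for candidates, submit to the best."""
    for job in pending_jobs:
        candidates = policy(hosts, job)  # the Resource Selector step
        if candidates:
            submit(job, candidates[0])   # hand over to the Execution Manager
        # Jobs with no candidate host simply wait for the next interval.


# Host data as it might come from an information service such as MDS.
hosts = [{"name": "a", "free_cpus": 2},
         {"name": "b", "free_cpus": 8},
         {"name": "c", "free_cpus": 0}]
jobs = [{"id": 1, "cpus": 4}, {"id": 2, "cpus": 16}]

submitted = []
dispatch(jobs, hosts, rank_by_free_cpus,
         submit=lambda job, host: submitted.append((job["id"], host["name"])))
print(submitted)
```

Swapping `rank_by_free_cpus` for another function changes the scheduling behaviour without touching `dispatch`, which is the extensibility the slide refers to. The sketch does not update host load after a submission; a real scheduler would refresh that information each interval.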

GridWay Commands
• gwd - start the daemon
• gwhost - information about resources
• gwps - information about jobs
• gwuser - information about users
• gwsubmit - submits job
• gwkill - cancels a job