JDL Basic Examples, UI Command Line, JDL Advanced Attributes, Multi-Node Jobs
Marco Cecchi, Daniele Cesini (INFN CNAF)
Scuola Grid, Bologna, 27 November 2007
www.ccr.infn.it   GridSchool, INFN-CNAF-Bologna, 27 Nov 2007   http://grid.infn.it/

JDL and Basic Examples

Job Description Language (JDL)
• The gLite approach to request description
• Condor ClassAds-based language (key/value pairs)
• Fully extensible, flexible and high-level
Allows the user to specify the information needed for job execution:
• Characteristics of the application (Executable, Arguments, Input/Output Sandbox files, ...)
• Requirements and preferences about resources (computational, storage)
• Management hints for the WMS (number of retries, proxy renewal, LB server, ...)
Under investigation: the Job Submission Description Language (JSDL), an XML-based language: https://forge.gridforum.org/projects/jsdl-wg/

ClassAd Statements
• Key/value pair
• The value can be:
– A number
– A string " "
– A list { }
– A ClassAd [ ]
• Statements end with a semicolon

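As a rough illustration of the value types listed above, the following Python sketch renders Python values as ClassAd-style statements. The helper names `to_classad_value` and `to_statement` are hypothetical (they are not part of any gLite tool), and the quoting is deliberately naive: embedded quotes are not escaped.

```python
def to_classad_value(value):
    """Render a Python value using ClassAd literal syntax (simplified)."""
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, str):
        return '"' + value + '"'
    if isinstance(value, (list, tuple)):
        return "{" + ", ".join(to_classad_value(v) for v in value) + "}"
    if isinstance(value, dict):
        inner = " ".join(to_statement(k, v) for k, v in value.items())
        return "[ " + inner + " ]"
    raise TypeError(f"unsupported value type: {type(value).__name__}")


def to_statement(key, value):
    """Build one 'Key = value;' statement, terminated by a semicolon."""
    return f"{key} = {to_classad_value(value)};"


print(to_statement("Executable", "/bin/hostname"))
print(to_statement("InputSandbox", ["test.sh", "fileA"]))
```

Each call produces one statement, so a whole JDL can be assembled by joining the statements with newlines.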
ClassAd Operators
[slide figure: table of ClassAd operators]

ClassAd functions
Some ClassAd functions:
    floor(const x)
    isUndefined(any a), isError(any a), isString(any a), isInteger(any a), isReal(any a), isList(any a), isClassad(any a)
    int(const x), real(const x), round(const x), random(const x)
    strcat(any*)
    substr(string s, int offset [, int length ])
    toUpper(string s), toLower(string s)
    size(list l), size(classad c)
    min(list l), max(list l)
    member(const x, list l), identicalMember(const x, list l)
    regexp(string pattern, string target [, string options ])
Reference manual: http://www.cs.wisc.edu/condor/classad/refman/

JDL Hints
• Special characters such as &, \, |, >, <: if these characters should be escaped in the shell (for example, because they are part of a file name), they must be preceded by a triple backslash (\\\) in the JDL, or specified inside quoted strings.
• Comments must be preceded by a sharp character (#) or a double slash (//) at the beginning of each line. Multi-line comments must be enclosed between "/*" and "*/".
• Attention! The JDL is sensitive to blank characters and tabs. No blank characters or tabs should follow the semicolon at the end of a line.

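The last hint is easy to violate without noticing, because trailing blanks are invisible in most editors. A minimal Python sketch of a pre-submission check could look like this; `lines_with_trailing_blanks` is a hypothetical helper name, and the check covers only the "blank after the final semicolon" case described above.

```python
import re

# A semicolon followed only by blanks or tabs up to the end of the line.
TRAILING_BLANKS = re.compile(r";[ \t]+$")


def lines_with_trailing_blanks(jdl_text):
    """Return the 1-based numbers of lines with blanks after the final semicolon."""
    return [
        number
        for number, line in enumerate(jdl_text.splitlines(), start=1)
        if TRAILING_BLANKS.search(line)
    ]


sample = 'Executable = "/bin/hostname";\nStdOutput = "std.out"; \nStdError = "std.err";\t\n'
print(lines_with_trailing_blanks(sample))
```

In the sample, the second line ends with a space and the third with a tab, so both are reported.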
GLUESchema (Grid Laboratory Uniform Environment)
• Provides a standardized description of the Grid
• Allows resources and services to be presented to users and external services in a uniform way
• The intended uses are:
– resource discovery ("what is out there?")
– selection ("what are the properties?")
– monitoring ("what is the state of the system?")
• LDAP is used to publish GLUESchema information
• The published GLUESchema information can be used inside a JDL to write Requirements and Rank expressions

GlueSchema and IS for a site
[slide figure: GlueSchema information published by the IS for one site]

GlueSchema and IS for another site
[slide figure: GlueSchema information published by the IS for a second site]

A minimal JDL
    [cesini@lcg-ui corso]$ cat minimal.jdl
    Executable = "/bin/hostname";
    StdOutput = "std.out";
    StdError = "std.err";
Executable = <string>;
• Specifies the command to be run by the job.
• If the command is already present on the WN, it must be given as an absolute path.
• If it has to be copied from the UI, only the file name must be specified, and the path of the command on the UI must be given in the InputSandbox attribute.
StdOutput = <string>; and StdError = <string>;
• Define the names of the files that contain the standard output and standard error of the executable once the job output is retrieved.

A less minimal JDL
InputSandbox = < string | list of strings >
• Identifies the files to be copied to the WN before the job execution
• Its size can be limited on the server side (typically 10 MB)
• Different files MUST have different names
OutputSandbox = < string | list of strings >
• Identifies the files that have to be copied from the WN to the WMS after the job execution
• Only these files can be retrieved on the UI
• Its size can be limited on the server side
    Executable = "test.sh";
    InputSandbox = {"/home/cesini/corso/test.sh"};
    StdOutput = "std.out";
    StdError = "std.err";

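The two InputSandbox constraints above (unique file names, bounded total size) can be checked on the UI before submitting. The Python sketch below is only an illustration: `check_input_sandbox` is a hypothetical helper, the 10 MB default mirrors the "typical" value quoted on the slide rather than any real server setting, and the `size_of` parameter exists only so that the example can run without real files.

```python
import os


def check_input_sandbox(paths, max_bytes=10 * 1024 * 1024, size_of=os.path.getsize):
    """Return a list of problems found in an InputSandbox file list."""
    problems = []
    seen = {}
    for path in paths:
        name = os.path.basename(path)
        if name in seen:
            # Files land in one directory on the WN, so base names must differ.
            problems.append(f"duplicate file name {name!r}: {seen[name]} and {path}")
        else:
            seen[name] = path
    total = sum(size_of(path) for path in paths)
    if total > max_bytes:
        problems.append(f"sandbox is {total} bytes, above the {max_bytes}-byte limit")
    return problems


# Made-up paths and sizes, used instead of touching the file system.
fake_sizes = {"/home/user/a/test.sh": 200, "/home/user/b/test.sh": 300}
print(check_input_sandbox(list(fake_sizes), size_of=fake_sizes.get))
```

The example reports one problem, because both paths end in `test.sh`.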
Use small sandboxes
• The ISB and OSB are uploaded to the WMS and remain there for the whole job lifetime (up to the Cleared status)
– A WMS can handle up to 30k jobs/day
• To avoid filling the server hard disk, please USE SMALL SANDBOXES (well below the server limits)
• Use the data management tools to transfer big files
• (... see the afternoon talks ...)

A simple JDL
Arguments = < string >
• Used to pass arguments to the executable:
    Arguments = "fileA 10";
    [cesini@lcg-ui cesini]$ cat first.jdl
    Executable = "test.sh";
    Arguments = "fileA fileB";
    StdOutput = "std.out";
    StdError = "std.err";
    InputSandbox = {"test.sh", "fileA", "fileB"};
    OutputSandbox = {"std.out", "std.err"};
    [cesini@lcg-ui cesini]$ cat test.sh
    #!/bin/sh
    echo "First file: "
    cat $1
    echo "Second file: "
    cat $2

StdInput and VirtualOrganisation
StdInput = < string >
• An input file for the standard input can be specified in the same way:
    StdInput = "std.in";
• This means that the job is run as follows:
    $ job < std.in
VirtualOrganisation = < string >
• Specifies the VO of the user:
    VirtualOrganisation = "cms";
• It is superseded by the VO contained in the user proxy
• For normal proxies, the VO can be specified in the JDL, in the UI configuration files, or as an argument to the job submission command

UI (User Interface): command line interface, most important commands

Create the proxy
• First of all... the proxy!
    voms-proxy-init --voms <vo_name>
    voms-proxy-info --all
    [cesini@lcg-ui cesini]$ ll .globus/
    -rw------- 1 cesini 2126 Jul 7 2007 usercert.pem
    -r---- 1 cesini 1910 Jul 7 2007 userkey.pem

job-list-match
• Lists the resources that can execute the job and match the JDL requirements:
    glite-wms-job-list-match -a --rank -c wms_rb00.conf first.jdl
    [cesini@lcg-ui corso]$ glite-wms-job-list-match -a --rank -c wms_rb00.conf first.jdl
    Connecting to the service https://glite-rb-00.cnaf.infn.it:7443/glite_wms_wmproxy_server
    =====================================
    COMPUTING ELEMENT IDs LIST
    The following CE(s) matching your job requirements have been found:
    *CEId*                                                        *Rank*
    - bogrid5.bo.infn.it:2119/jobmanager-lcgpbs-cert               0
    - ce.grid.unipg.it:2119/jobmanager-lcgpbs-cert                 0
    - ce02-lhcb-t2.cr.cnaf.infn.it:2119/jobmanager-lcglsf-cert_t2  0
    - ce03-lcg.cr.cnaf.infn.it:2119/jobmanager-lcglsf-infngrid     0
    - ce04-lcg.cr.cnaf.infn.it:2119/blah-lsf-infngrid              0
    - ce05-lcg.cr.cnaf.infn.it:2119/jobmanager-lcglsf-infngrid     0
    - ce06-lcg.cr.cnaf.infn.it:2119/jobmanager-lcglsf-infngrid     0
    - cex.grid.unipg.it:2119/jobmanager-lcgpbs-cert                0
    - cmsrm-ce01.roma1.infn.it:2119/jobmanager-lcglsf-cmsgcert     0

Basic WMProxy client options
• -c wms_rb00.conf
– Selects the client configuration file to be used, which sets the WMS to be contacted, the default VO, the default Requirements and Rank, and many other things
– If it is not specified, the UI generally has a default configuration file per VO
• -a
– Automatically delegates a proxy to the WMProxy: simple but slower
• -d <delegID>
– Uses an explicitly delegated proxy: better performance, since delegation may take a non-negligible amount of time
    glite-wms-job-delegate-proxy -d <delegID>
    glite-wms-job-list-match -d <delegID> --rank -c wms_rb00.conf first.jdl

job-submit
Submitting a JDL:
    glite-wms-job-submit -a -c wms_rb00.conf first.jdl
    [cesini@lcg-ui corso]$ glite-wms-job-submit -a -c wms_rb00.conf first.jdl
    Connecting to the service https://glite-rb-00.cnaf.infn.it:7443/glite_wms_wmproxy_server
    =========== glite-wms-job-submit Success ===========
    The job has been successfully submitted to the WMProxy
    Your job identifier is:
    https://albalonga.cnaf.infn.it:9000/TWr2bZ0QlaWsBrd43zslAg
    ====================================
The identifier printed by the command is the UNIQUE JOB ID.

The JobID
Upon submission each job is assigned a unique, virtually non-recyclable job identifier in URL form:
    https://<LB_hostname>[:<port>]/<unique_string>
• <LB_hostname> is the hostname of the Logging and Bookkeeping (LB) server for the job. It is usually set by the WMS, but can be changed in the JDL.
• The remainder is a randomly generated sequence: the Grid is a highly decentralized system with no unified control, so identifiers cannot be handed out by a central authority.
• The JobID is used for every further operation on the job after submission.

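Because the identifier is an ordinary URL, its parts can be separated with standard tools. The following Python sketch uses the job identifier shown in the submission example; `parse_job_id` is a hypothetical helper name, not a gLite API.

```python
from urllib.parse import urlsplit


def parse_job_id(job_id):
    """Split a job identifier into LB hostname, port and unique string."""
    parts = urlsplit(job_id)
    if parts.scheme != "https" or not parts.hostname:
        raise ValueError(f"not a job identifier: {job_id!r}")
    return {
        "lb_host": parts.hostname,
        "lb_port": parts.port,  # None when the port is omitted
        "unique": parts.path.lstrip("/"),
    }


info = parse_job_id("https://albalonga.cnaf.infn.it:9000/TWr2bZ0QlaWsBrd43zslAg")
print(info)
```

The `lb_host` field tells which LB server will be queried by the status and logging commands.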
glite-wms-job-status
Retrieving the job status:
    glite-wms-job-status https://albalonga.cnaf.infn.it:9000/TWr2bZ0QlaWsBrd43zslAg
    [cesini@lcg-ui corso]$ glite-wms-job-status https://albalonga.cnaf.infn.it:9000/TWr2bZ0QlaWsBrd43zslAg
    *******************************
    BOOKKEEPING INFORMATION:
    Status info for the Job : https://albalonga.cnaf.infn.it:9000/TWr2bZ0QlaWsBrd43zslAg
    Current Status: Ready
    Destination: grid003.roma2.infn.it:2119/jobmanager-lcgpbs-cert
    Submitted: Mon Nov 19 15:09:42 2007 CET
    *******************************
• Verbosity can be increased with -v <1|2|3>

"-i" and "-o" options
• -o <file> appends the identifier of each submitted job to a file; -i <file> reads the job identifiers from a file
    [cesini@lcg-ui corso]$ export i=0
    [cesini@lcg-ui corso]$ while [ $i -le 10 ]; do glite-wms-job-submit -a -c wms_rb00.conf -o ID_file.txt first.jdl; let i=i+1; done >> submission.txt &
    [cesini@lcg-ui corso]$ glite-wms-job-status -i ID_file.txt
    ----------------------------------
    1 : https://albalonga.cnaf.infn.it:9000/8FjA0EJ05jYHdkgYX0JU3Q
    2 : https://albalonga.cnaf.infn.it:9000/vdNtnwDh2zZJqywu-nwfmA
    ...
    10: https://albalonga.cnaf.infn.it:9000/5ZWpn7uomUzXtjqxFxJe5g
    11: https://albalonga.cnaf.infn.it:9000/kdaNgNOSEwHzlzV47K1Fwg
    a : all
    q : quit
    ----------------------------------
    Choose one or more jobId(s) in the list - [1-11]all:
• Use --noint to get the status of all the job IDs directly, without the interactive prompt

Jobs State Machine
• Submitted: the job has been submitted from the UI but is still waiting to be accepted by the WMProxy
• Waiting: the job has been accepted by the WMProxy and is waiting to be processed by the WM
• Ready: the job has been processed by the WM but has not been transferred to the CE yet
• Scheduled: the job is waiting in the CE queue
• Running: the job is executing
• Done: the job has terminated, either successfully or because Condor-C considers it terminated with some error (e.g. due to unrecoverable errors on the CE side)
• Aborted: the job has been aborted by the WMS (e.g. it waited too long in a queue on the WM or on the CE, or its credentials expired)
• Cancelled: the job has been cancelled by the user
• Cleared: the output has been retrieved by the user, or removed because of a timeout

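When polling jobs from a script it helps to know which states can still change. The Python sketch below encodes the nine states as described on the slides. Two points are assumptions of the sketch rather than statements from the slides: that Aborted, Cancelled and Cleared are never left once reached, and that the successful path is strictly linear (resubmission can in practice move a job back to an earlier state).

```python
# States in the order of the normal, successful path.
HAPPY_PATH = ["Submitted", "Waiting", "Ready", "Scheduled", "Running", "Done", "Cleared"]

# States assumed to be final in this sketch.
FINAL_STATES = {"Aborted", "Cancelled", "Cleared"}


def is_final(state):
    """True when no further status change is expected."""
    return state in FINAL_STATES


def next_on_happy_path(state):
    """Return the next state on the successful path, or None if there is none."""
    if state not in HAPPY_PATH:
        return None
    index = HAPPY_PATH.index(state)
    return HAPPY_PATH[index + 1] if index + 1 < len(HAPPY_PATH) else None


print(next_on_happy_path("Ready"), is_final("Done"))
```

Done is not treated as final here because the output still has to be retrieved, after which the job becomes Cleared.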
job-output
• When the status of the job is Done (Success), the Output Sandbox can be retrieved with:
    glite-wms-job-output <Job_ID>
    [cesini@lcg-ui corso]$ glite-wms-job-output https://albalonga.cnaf.infn.it:9000/TWr2bZ0QlaWsBrd43zslAg
    Connecting to the service https://131.154.100.90:7443/glite_wms_wmproxy_server
    ========================================
    JOB GET OUTPUT OUTCOME
    Output sandbox files for the job:
    https://albalonga.cnaf.infn.it:9000/TWr2bZ0QlaWsBrd43zslAg
    have been successfully retrieved and stored in the directory:
    /tmp/glite-ui/cesini_TWr2bZ0QlaWsBrd43zslAg
    ========================================
• The output directory can be changed with --dir <OutDir>
• Note that the OSBs are periodically purged from the WMS: do not wait too long before retrieving them

job-cancel
• A job can be cancelled after submission using:
    glite-wms-job-cancel <Job_ID>
    [cesini@lcg-ui corso]$ glite-wms-job-cancel https://albalonga.cnaf.infn.it:9000/kFTSkFWGadkZqFgNb4m5WQ
    Are you sure you want to remove specified job(s) [y/n]y : y
    Connecting to the service https://131.154.100.90:7443/glite_wms_wmproxy_server
    =============== glite-wms-job-cancel Success ===============
    The cancellation request has been successfully submitted for the following job(s):
    - https://albalonga.cnaf.infn.it:9000/kFTSkFWGadkZqFgNb4m5WQ
    =========================================

job-logging-info
• All the information stored on the LB about a job can be queried using:
    glite-wms-job-logging-info <Job_ID>
    [cesini@lcg-ui corso]$ glite-wms-job-logging-info https://albalonga.cnaf.infn.it:9000/fzxo1Ii1K-sCjAFfHljQ3Q
    ***********************************
    LOGGING INFORMATION:
    Printing info for the Job : https://albalonga.cnaf.infn.it:9000/fzxo1Ii1K-sCjAFfHljQ3Q
    ---
    Event: RegJob
    - source = NetworkServer
    - timestamp = Mon Nov 19 15:27:36 2007 CET
    ---
    Event: Accepted
    - source = NetworkServer
    - timestamp = Mon Nov 19 15:27:37 2007 CET
    ---
    Event: EnQueued
    - result = START
    - source = NetworkServer
    - timestamp = Mon Nov 19 15:27:37 2007 CET
    ---
    Event: EnQueued
    - result = OK
    - source = NetworkServer
    - timestamp = Mon Nov 19 15:27:37 2007 CET
    ---
    Event: DeQueued
    - source = WorkloadManager
    - timestamp = Mon Nov 19 15:27:37 2007 CET
    ---
    Event: Match
    - dest_id = spacin-ce1.dma.unina.it:2119/jobmanager-lcgpbs-cert
    - source = WorkloadManager
    - timestamp = Mon Nov 19 15:27:41 2007 CET
• Try increasing the verbosity up to -v 3

JDL Advanced Attributes

RetryCount & ShallowRetryCount
These attributes control the resubmission of failed jobs.
• Deep resubmission: the user's job has started running on the WN, and then the job itself or the WMS JobWrapper has failed.
• Shallow resubmission: the WMS JobWrapper has failed before starting the actual user's job.
RetryCount = < non-negative integer >
• Sets how many deep resubmissions are attempted before the job is aborted
• Limited by MaxRetryCount on the server side
• A deep resubmission resets the shallow counter to zero
• 0 disables the deep retry
ShallowRetryCount = < integer greater than or equal to -1 >
• Sets how many shallow resubmissions are attempted before the job is aborted
• Limited by MaxShallowRetryCount on the server side
• -1 disables the shallow retry (this is different from 0)

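The value ranges and the server-side limits can be summarised in a few lines of Python. This is a sketch under stated assumptions: "limited by" is interpreted here as "capped at the server maximum", the function name `effective_retry_counts` is hypothetical, and the server maxima used in the example are made-up numbers, not real WMS defaults.

```python
def effective_retry_counts(retry_count, shallow_retry_count,
                           max_retry_count, max_shallow_retry_count):
    """Validate the user values and cap them at the server-side maxima."""
    if retry_count < 0:
        raise ValueError("RetryCount must be 0 or greater")
    if shallow_retry_count < -1:
        raise ValueError("ShallowRetryCount must be -1 or greater")
    deep = min(retry_count, max_retry_count)
    shallow = min(shallow_retry_count, max_shallow_retry_count)
    return {
        "deep": deep,
        "shallow": shallow,
        "deep_enabled": deep > 0,          # 0 disables the deep retry
        "shallow_enabled": shallow != -1,  # only -1 disables the shallow retry
    }


# Hypothetical server maxima: 3 deep and 10 shallow resubmissions.
print(effective_retry_counts(5, -1, 3, 10))
```

In the example the requested five deep retries are capped at three, and shallow resubmission is switched off by the value -1.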
RetryCount & ShallowRetryCount: example
    [cesini@lcg-ui Retry]$ cat retry1.jdl
    ####################################
    # JDL with retry control activated #
    ####################################
    Executable = "test.sh";
    Arguments = "fileA fileB";
    StdOutput = "std.out";
    StdError = "std.err";
    InputSandbox = {"test.sh", "fileA", "fileB"};
    OutputSandbox = {"std.out", "std.err"};
    # This will resubmit deeply once
    RetryCount = 1;
    # This will resubmit shallowly twice
    ShallowRetryCount = 2;
    # This is a trick: the prologue always fails, so the job will be resubmitted
    Prologue = "/bin/false";
Resulting status:
    Status info for the Job : https://albalonga.cnaf.infn.it:9000/PSpMatGORSXfUk-P7pihEA
    Current Status: Aborted
    Status Reason: hit job shallow retry count (2)
    [cesini@lcg-ui Retry]$ grep -i shallow log_infov3_retry1.txt
    ShallowRetryCount = 2;
    - result = SHALLOW
    - reason = hit job shallow retry count (2)

RetryCount & ShallowRetryCount: 0 versus -1
With ShallowRetryCount = 0:
    # This will resubmit deeply once
    RetryCount = 1;
    # This will resubmit shallowly zero times
    ShallowRetryCount = 0;
    # This is a trick: the prologue always fails, so the job will be resubmitted
    Prologue = "/bin/false";
    Status info for the Job : https://albalonga.cnaf.infn.it:9000/EXQTAHeA4kLOPo7XkATyHQ
    Current Status: Aborted
    Status Reason: hit job shallow retry count (0)
With ShallowRetryCount = -1:
    # This will resubmit deeply once
    RetryCount = 1;
    # -1 disables the shallow retry
    ShallowRetryCount = -1;
    # This is a trick: the prologue always fails, so the job will be resubmitted
    Prologue = "/bin/false";
    Status info for the Job : https://albalonga.cnaf.infn.it:9000/D4DvhrfV1BL95fz38x12xA
    Current Status: Aborted
    Status Reason: hit job retry count (1)

Advanced InputSandbox handling
InputSandbox = < string | list of strings >
• Identifies a list of files located on:
– the UI local file system
– a GridFTP server
– an HTTPS server (this requires the GridSite htcp client command to be installed on the WN, which is not the case in the current standard WN configuration)
    InputSandbox = {"test.sh", "gsiftp://grid007g.cnaf.infn.it/tmp/fileA", "gsiftp://grid007g.cnaf.infn.it/tmp/fileB"};
InputSandboxBaseURI = < string >
• Makes the InputSandbox paths relative to a location on a GridFTP server
    InputSandboxBaseURI = "gsiftp://grid007g.cnaf.infn.it/tmp";
means that
    InputSandbox = "myfile.dat";
is equivalent to
    InputSandbox = "gsiftp://grid007g.cnaf.infn.it/tmp/myfile.dat";

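The expansion rule can be written down as a small Python sketch. It assumes, following the examples in this deck, that entries which already carry a URI scheme (such as gsiftp:// or file://) or an absolute local path are left untouched, and that only bare relative names are prefixed with the base URI. `resolve_input_sandbox` is a hypothetical helper, not the WMS implementation.

```python
def resolve_input_sandbox(entries, base_uri=None):
    """Expand InputSandbox entries against an optional InputSandboxBaseURI."""
    resolved = []
    for entry in entries:
        has_scheme = "://" in entry
        is_absolute = entry.startswith("/")
        if base_uri and not has_scheme and not is_absolute:
            resolved.append(base_uri.rstrip("/") + "/" + entry)
        else:
            # Explicit URI or absolute local path: keep as written.
            resolved.append(entry)
    return resolved


print(resolve_input_sandbox(
    ["myfile.dat", "/home/cesini/corso/test.sh"],
    base_uri="gsiftp://grid007g.cnaf.infn.it/tmp",
))
```

The first entry is expanded to the GridFTP location, while the absolute path still refers to a local file on the UI.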
Advanced InputSandbox handling: examples
Make sure that the remote files are available! Otherwise your jobs will remain stuck forever waiting for the ISB.
    globus-url-copy file:///home/cesini/corso/fileA gsiftp://grid007g.cnaf.infn.it/tmp/fileA
    globus-url-copy file:///home/cesini/corso/fileB gsiftp://grid007g.cnaf.infn.it/tmp/fileB
    globus-url-copy file:///home/cesini/corso/test.sh gsiftp://grid007g.cnaf.infn.it/tmp/test.sh
    [cesini@lcg-ui SandBox]$ cat remoteISB.jdl
    ##################################
    # JDL with advanced ISB handling #
    ##################################
    Executable = "test.sh";
    Arguments = "fileA fileB";
    StdOutput = "std.out";
    StdError = "std.err";
    InputSandbox = {"test.sh", "gsiftp://grid007g.cnaf.infn.it/tmp/fileA", "gsiftp://grid007g.cnaf.infn.it/tmp/fileB" };
    OutputSandbox = {"std.out", "std.err"};
    [cesini@lcg-ui SandBox]$ cat remote-ISB-BaseURI.jdl
    ####################################
    # JDL with advanced ISB handling 2 #
    ####################################
    Executable = "test.sh";
    Arguments = "fileA fileB";
    StdOutput = "std.out";
    StdError = "std.err";
    InputSandboxBaseURI = "gsiftp://grid007g.cnaf.infn.it/tmp";
    InputSandbox = {"test.sh", "fileA", "fileB"};
    # You can force the use of a local file by indicating it explicitly with file://
    # InputSandbox = {"file://home/cesini/corso/SandBox/test.sh", "fileA", "fileB"};
    # or simply with the complete path
    # InputSandbox = {"/home/cesini/corso/SandBox/test.sh", "fileA", "fileB"};
    OutputSandbox = {"std.out", "std.err"};

Advanced OutputSandbox handling
OutputSandboxDestURI = < string | list of strings >
• Allows the output to be copied directly to the specified locations, which must run:
– a GridFTP server
– an HTTPS server (this requires the GridSite htcp client command to be installed on the WN, which is not the case in the current standard WN configuration)
• Note that output files managed in this way are not retrieved by the glite-wms-job-output command.
• The OutputSandboxDestURI list must have the same cardinality as the OutputSandbox list, otherwise the JDL is considered invalid.
OutputSandboxBaseDestURI = < string >
• Represents the base URI on a GridFTP/HTTPS server

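The cardinality rule lends itself to a quick client-side check. In the Python sketch below the two lists are paired by position, which is an assumption of the sketch, and `pair_output_destinations` is a hypothetical helper name.

```python
def pair_output_destinations(output_sandbox, dest_uris):
    """Pair each OutputSandbox file with its destination URI, by position."""
    if len(output_sandbox) != len(dest_uris):
        raise ValueError(
            f"OutputSandbox has {len(output_sandbox)} entries but "
            f"OutputSandboxDestURI has {len(dest_uris)}"
        )
    return dict(zip(output_sandbox, dest_uris))


pairs = pair_output_destinations(
    ["std.out", "std.err"],
    ["gsiftp://grid007g.cnaf.infn.it/tmp/std.out",
     "gsiftp://grid007g.cnaf.infn.it/tmp/std.err"],
)
print(pairs["std.err"])
```

A length mismatch raises an error before submission instead of producing a JDL that the WMS would reject.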
Advanced OutputSandbox handling: example
    [cesini@lcg-ui SandBox]$ cat remote-ISB-OSB.jdl
    ##########################################
    # JDL with advanced ISB and OSB handling #
    ##########################################
    Executable = "test.sh";
    Arguments = "fileA fileB";
    StdOutput = "std.out";
    StdError = "std.err";
    InputSandboxBaseURI = "gsiftp://grid007g.cnaf.infn.it/tmp";
    InputSandbox = {"test.sh", "fileA", "fileB"};
    OutputSandbox = {"std.out", "std.err"};
    OutputSandboxBaseDestURI = "gsiftp://grid007g.cnaf.infn.it/tmp";

AllowZippedISB = <bool>
• When set to true, the WMProxy client commands archive and compress all the job input sandbox files into a single gzipped tar file, which is then transferred to the WMS.
• Particularly useful when the job sandbox is composed of a large number of files
• Not mandatory: if it is not specified in the JDL, it is assumed to be false.
• If AllowZippedISB is set to true, the ZippedISB attribute is set by the client command, irrespective of what it contains

AllowZippedISB: example
    [cesini@lcg-ui Retry]$ cat ../SandBox/allowZippedISB.jdl
    ###########################################
    # Example JDL with AllowZippedISB enabled #
    ###########################################
    Executable = "test.sh";
    Arguments = "fileA fileB";
    StdOutput = "std.out";
    StdError = "std.err";
    InputSandbox = {"test.sh", "../fileA", "../fileB"};
    AllowZippedISB = true;
    OutputSandbox = {"std.out", "std.err"};

Requirements = < Boolean ClassAd expression >
• Describes which of the CEs in the IS are eligible to run the job
• The attributes that can be used are those published in the GlueSchema, written with the "other." prefix
• It is mandatory: if this attribute is not included in the JDL, the client sets it to:
    Requirements = other.GlueCEStateStatus == "Production";
Examples:
    Requirements = other.GlueCEInfoHostName == "gridit-ce-001.cnaf.infn.it";
    Requirements = other.GlueCEInfoTotalCPUs > 2 && other.GlueCEPolicyMaxRunningJobs < 2;
    Requirements = other.GlueCEPolicyMaxCPUTime >= 1800;
    Requirements = (other.GlueCEUniqueID == "gridit-ce-001.cnaf.infn.it:2119/jobmanager-lcgpbs-cert");
    Requirements = Member("INFN-CNAF", other.GlueHostApplicationSoftwareRunTimeEnvironment);

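Conceptually, Requirements is a predicate that the WMS evaluates once per CE, with "other." referring to the attributes of the CE under test. The Python sketch below mimics that idea with plain dictionaries. The CE names and attribute values are invented for the illustration, and real matchmaking is done by the WMS on ClassAds, not by code like this.

```python
# Invented CE descriptions; the keys follow GlueSchema attribute names.
ces = [
    {"GlueCEUniqueID": "ce-a.example.org:2119/jobmanager-lcgpbs-cert",
     "GlueCEStateStatus": "Production", "GlueCEPolicyMaxCPUTime": 2880},
    {"GlueCEUniqueID": "ce-b.example.org:2119/jobmanager-lcglsf-cert",
     "GlueCEStateStatus": "Production", "GlueCEPolicyMaxCPUTime": 600},
    {"GlueCEUniqueID": "ce-c.example.org:2119/jobmanager-lcgpbs-cert",
     "GlueCEStateStatus": "Draining", "GlueCEPolicyMaxCPUTime": 4000},
]


def requirement(other):
    """Python analogue of:
    other.GlueCEStateStatus == "Production" && other.GlueCEPolicyMaxCPUTime >= 1800
    """
    return (other["GlueCEStateStatus"] == "Production"
            and other["GlueCEPolicyMaxCPUTime"] >= 1800)


eligible = [ce["GlueCEUniqueID"] for ce in ces if requirement(ce)]
print(eligible)
```

Only the first CE satisfies both conditions, so it is the only one that would be passed on to the ranking step.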
An ATLAS Requirements expression
    Requirements = ( ( ( ( other.GlueCEStateStatus == "Production" )
      && ( !( RegExp(".*blah.*", other.GlueCEUniqueID)
              || RegExp("ce02.grid.acad.bg.*", other.GlueCEUniqueID)
              || RegExp(".*jobmanager-condor.*", other.GlueCEUniqueID)
              || RegExp(".*jinr.ru.*", other.GlueCEUniqueID)
              || RegExp("mars-ce2.mars.lesc.doc.ic.ac.uk.*", other.GlueCEUniqueID)
              || RegExp(".*.na.infn.it.*", other.GlueCEUniqueID)
              || RegExp(".*.ph.liv.ac.uk.*", other.GlueCEUniqueID)
              || RegExp("atlasce.lnf.infn.it.*", other.GlueCEUniqueID)
              || RegExp("ce-iepgrid.saske.sk.*", other.GlueCEUniqueID)
              || RegExp("ce.phy.bg.ac.yu.*", other.GlueCEUniqueID)
              || RegExp("ce.polgrid.pl.*", other.GlueCEUniqueID)
              || RegExp("grid-ce.physik.uni-wuppertal.de.*", other.GlueCEUniqueID)
              || RegExp(".*.cern.ch.*", other.GlueCEUniqueID) )
           && ( Member("VO-atlas-cloud-ES", other.GlueHostApplicationSoftwareRunTimeEnvironment)
                || RegExp("ce04.pic.es", other.GlueCEUniqueID)
                || RegExp("lcg2ce.ific.uv.es", other.GlueCEUniqueID)
                || RegExp("ce01.ific.uv.es", other.GlueCEUniqueID)
                || RegExp("ifaece01.pic.es", other.GlueCEUniqueID)
                || RegExp("grid003.ft.uam.es", other.GlueCEUniqueID)
                || RegExp("ce02.lip.pt", other.GlueCEUniqueID)
                || RegExp("grid006.lca.uc.pt", other.GlueCEUniqueID)
                || Member("VO-atlas-tierT0", other.GlueHostApplicationSoftwareRunTimeEnvironment)
                || Member("VO-atlas-tierT1", other.GlueHostApplicationSoftwareRunTimeEnvironment)
                || Member("VO-atlas-tierT2", other.GlueHostApplicationSoftwareRunTimeEnvironment) ) )
      && ( Member("VO-atlas-release-12.0.7", other.GlueHostApplicationSoftwareRunTimeEnvironment)
           || Member("VO-atlas-offline-12.0.7", other.GlueHostApplicationSoftwareRunTimeEnvironment)
           || Member("VO-atlas-production-12.0.7", other.GlueHostApplicationSoftwareRunTimeEnvironment) ) )
      && ( ( other.GlueCEPolicyMaxCPUTime * other.GlueHostBenchmarkSI00 ) >= 1333350 ) )
      && ( other.GlueHostMainMemoryRAMSize >= 800 ) )
      && ( other.GlueHostNetworkAdapterOutboundIP == true );

Requirements: effect on the number of matching CEs
    # No requirements, dteam VO
    [cesini@lcg-ui Requirements]$ glite-wms-job-list-match -a ../first.jdl | grep -c 2119
    356
    # Requirements = (other.GlueCEInfoLRMSType == "PBS" && other.GlueCEInfoTotalCPUs > 25); dteam VO
    [cesini@lcg-ui Requirements]$ glite-wms-job-list-match -a req1.jdl | grep -c 2119
    78
    # With the previous ATLAS requirements
    [cesini@lcg-ui Requirements]$ glite-wms-job-list-match -a req2.jdl | grep -c 2119
    46
    # False requirements
    [cesini@lcg-ui Requirements]$ glite-wms-job-list-match -a req3.jdl
    Connecting to the service https://glite-rb-00.cnaf.infn.it:7443/glite_wms_wmproxy_server
    ========== glite-wms-job-list-match failure ==========
    No Computing Element matching your job requirements has been found!
    ================================

Rank = < ClassAd floating-point expression >
• States how to rank the CEs that meet the Requirements
• The WMS submits the job to the CE with the highest rank
• It is mandatory: if it is not specified in the JDL, the clients on the UI add:
    Rank = -other.GlueCEStateEstimatedResponseTime;
  (the CE with the minimal estimated time for traversing the local batch system, as calculated by the CE itself)
Other examples:
    Rank = other.GlueCEPolicyMaxRunningJobs - other.GlueCEStateRunningJobs;
  (the CE with the maximum number of free slots)
    Rank = <some constant>
  (constant value: all CEs are treated in the same way by the WMS)
    Rank = random(<an integer>)
  (random value: CEs are chosen randomly)

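Once the eligible CEs are known, ranking amounts to computing one number per CE and taking the largest. The Python sketch below reproduces the "free slots" expression on invented data. The CE names and job counts are hypothetical, and ties are resolved here simply by list order, which is not necessarily what the WMS does.

```python
def free_slots_rank(other):
    """Python analogue of:
    other.GlueCEPolicyMaxRunningJobs - other.GlueCEStateRunningJobs
    """
    return other["GlueCEPolicyMaxRunningJobs"] - other["GlueCEStateRunningJobs"]


def best_ce(ces, rank):
    """Return the CE with the highest rank (first one wins on ties)."""
    return max(ces, key=rank)


# Invented CE descriptions.
ces = [
    {"GlueCEUniqueID": "ce-a.example.org:2119/q", "GlueCEPolicyMaxRunningJobs": 200, "GlueCEStateRunningJobs": 190},
    {"GlueCEUniqueID": "ce-b.example.org:2119/q", "GlueCEPolicyMaxRunningJobs": 100, "GlueCEStateRunningJobs": 20},
]

print(best_ce(ces, free_slots_rank)["GlueCEUniqueID"])
```

The second CE is smaller but has 80 free slots against 10, so it gets the higher rank.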
Rank = other.GlueCEPolicyMaxRunningJobs - other.GlueCEStateRunningJobs;
(the CE with the maximum number of free slots)
    [cesini@lcg-ui Rank]$ glite-wms-job-list-match --rank -a rank1.jdl
    Connecting to the service https://glite-rb-00.cnaf.infn.it:7443/glite_wms_wmproxy_server
    ====================================
    COMPUTING ELEMENT IDs LIST
    The following CE(s) matching your job requirements have been found:
    *CEId*                                                        *Rank*
    - ce01-lhcb-t2.cr.cnaf.infn.it:2119/jobmanager-lcglsf-cert_t2  2884
    - ce02-lhcb-t2.cr.cnaf.infn.it:2119/jobmanager-lcglsf-cert_t2  2884
    - gridce2.pi.infn.it:2119/jobmanager-lcglsf-cert4              1360
    - grid012.ct.infn.it:2119/jobmanager-lcglsf-cert               164
    - prod-ce-01.pd.infn.it:2119/jobmanager-lcglsf-cert            104
    - gridce.pi.infn.it:2119/jobmanager-lcglsf-cert                100

Rank = 1
    [cesini@lcg-ui Rank]$ glite-wms-job-list-match --rank -a rank2.jdl
    Connecting to the service https://glite-rb-00.cnaf.infn.it:7443/glite_wms_wmproxy_server
    ===================================
    COMPUTING ELEMENT IDs LIST
    The following CE(s) matching your job requirements have been found:
    *CEId*                                                        *Rank*
    - atlasce.lnf.infn.it:2119/jobmanager-lcgpbs-cert              1
    - atlasce01.na.infn.it:2119/jobmanager-lcgpbs-cert             1
    - beagle14.ba.itb.cnr.it:2119/jobmanager-lcgpbs-cert           1
    - bogrid5.bo.infn.it:2119/jobmanager-lcgpbs-cert               1
    - ce.grid.unipg.it:2119/jobmanager-lcgpbs-cert                 1
    - ce01-lhcb-t2.cr.cnaf.infn.it:2119/jobmanager-lcglsf-cert_t2  1
    - ce02-lhcb-t2.cr.cnaf.infn.it:2119/jobmanager-lcglsf-cert_t2  1
    - ce03-lcg.cr.cnaf.infn.it:2119/jobmanager-lcglsf-infngrid     1
    - ce05-lcg.cr.cnaf.infn.it:2119/jobmanager-lcglsf-infngrid     1
    - ce06-lcg.cr.cnaf.infn.it:2119/jobmanager-lcglsf-infngrid     1
    - cex.grid.unipg.it:2119/jobmanager-lcgpbs-cert                1

Rank = Random(1000)
    [cesini@lcg-ui Rank]$ glite-wms-job-list-match --rank -a rank3.jdl
    Connecting to the service https://glite-rb-00.cnaf.infn.it:7443/glite_wms_wmproxy_server
    ===================================
    COMPUTING ELEMENT IDs LIST
    The following CE(s) matching your job requirements have been found:
    *CEId*                                                        *Rank*
    - spacin-ce1.dma.unina.it:2119/jobmanager-lcgpbs-cert          973
    - prod-ce-01.pd.infn.it:2119/jobmanager-lcglsf-cert            969
    - t2-ce-01.lnl.infn.it:2119/jobmanager-lcglsf-certsl4          942
    - grid-ce.pr.infn.it:2119/jobmanager-lcgpbs-cert               927
    - gridit-ce-001.cnaf.infn.it:2119/jobmanager-lcgpbs-cert       895
    - t2-ce-01.mi.infn.it:2119/jobmanager-lcgpbs-cert              890
    - beagle14.ba.itb.cnr.it:2119/jobmanager-lcgpbs-cert           876
    - grid003.roma2.infn.it:2119/jobmanager-lcgpbs-cert            868
    - grid002.ca.infn.it:2119/jobmanager-lcglsf-cert               831
    - pamelace01.na.infn.it:2119/jobmanager-lcgpbs-cert            801
    - atlasce01.na.infn.it:2119/jobmanager-lcgpbs-cert             744
    - gridce.pg.infn.it:2119/jobmanager-lcgpbs-cert                722

Fuzzy Rank
FuzzyRank = <boolean>
• Enables fuzziness in the ranking computation.
• Forces the matchmaking algorithm to adopt a stochastic selection criterion while searching for the best matching CE.
• False by default.
• The rank value associated with each matching CE represents the probability that this CE is selected as the best matching one.
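The stochastic selection described above can be illustrated with a short Python sketch. It is not the WMS implementation: it simply assumes that ranks are non-negative and that a CE is drawn with probability proportional to its rank. The CE names and the `pick_ce` helper are hypothetical.

```python
import random

def pick_ce(ranked_ces, rng):
    """Draw one CE with probability proportional to its rank.

    Assumption: ranks are non-negative and at least one is positive.
    """
    ce_ids = list(ranked_ces)
    weights = [ranked_ces[ce] for ce in ce_ids]
    return rng.choices(ce_ids, weights=weights, k=1)[0]

# Hypothetical matchmaking result: CE identifier -> rank
ranked = {"ce-a.example.org": 973, "ce-b.example.org": 722, "ce-c.example.org": 0}

rng = random.Random(2007)  # fixed seed so that repeated runs give the same draws
picks = [pick_ce(ranked, rng) for _ in range(1000)]
counts = {ce: picks.count(ce) for ce in ranked}
print(counts)
```

With FuzzyRank disabled the best-ranked CE would always win; in this sketch a lower-ranked CE is still chosen some of the time, which spreads jobs over several resources.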
DataRequirements = <list of ClassAds>
• Represents the data requirements for a job.
• Each ClassAd in the list contains three attributes:
– InputData (the list of input data needed by the job)
– DataCatalogType (the type of data catalog that has to be targeted to resolve logical names)
– DataCatalog (the URI of the data catalog, if this is not the VO default one)
DataRequirements = {
  [
    DataCatalogType = "...";
    DataCatalog = "https://...";
    InputData = { "lfn:…", "guid:…", "lds:…", "query:…" };
  ],
  …
}
DataCatalogType
DataCatalogType = <string>
• Represents the type of the data catalog. Possible values:
– RLS: LCG Replica Location Service (lfn, guid)
– SI: gLite Storage Index (lfn, guid)
– DLI: LCG Data Location Interface (lfn, guid, lds, query)
• It is mandatory.
• Only DLI works with the lfcserver (currently used in production).
InputData = <string | list of strings>
• Represents Logical File Names (LFN), Grid Unique IDentifiers (GUID), Logical Datasets (LDS) and/or generic queries.
• Used by the WMS to query the related data catalog and get back the list of Physical File Names (PFN) that the job needs as input for processing.
• Listed names have to be prefixed with "lfn:", "guid:", "lds:" and "query:" to indicate that they are respectively LFNs, GUIDs, LDSs and generic queries.
InputData = {
  "lfn:/EO.test.file",
  "lds:cms.test.file",
  "guid:135b7b23-4a6a-11d7-87e7-9d101f8c8b70"
};
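As a small illustration of the prefix convention, the Python sketch below groups InputData entries by their prefix. The `classify_input_data` helper is hypothetical and only mirrors the four prefixes listed on the slide; it does not contact any catalog.

```python
KNOWN_PREFIXES = ("lfn:", "guid:", "lds:", "query:")

def classify_input_data(entries):
    """Group InputData entries by prefix, rejecting anything unprefixed."""
    grouped = {prefix.rstrip(":"): [] for prefix in KNOWN_PREFIXES}
    for entry in entries:
        for prefix in KNOWN_PREFIXES:
            if entry.startswith(prefix):
                grouped[prefix.rstrip(":")].append(entry[len(prefix):])
                break
        else:
            # No known prefix matched this entry
            raise ValueError(f"unknown InputData prefix: {entry!r}")
    return grouped

grouped = classify_input_data([
    "lfn:/EO.test.file",
    "lds:cms.test.file",
    "guid:135b7b23-4a6a-11d7-87e7-9d101f8c8b70",
])
print(grouped)
```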
DataCatalog and DataAccessProtocol
DataCatalog = <string>
• Indicates the data catalog service used to resolve the file names specified in the InputData attribute list.
• It should be specified only if it is different from the VO default one.
DataCatalog = "http://lfcserver.cnaf.infn.it:8085";
DataAccessProtocol = <string | list of strings>
• Represents the (list of) protocols that the application is able to "speak" for accessing the files listed in InputData.
• Mandatory if DataRequirements or InputData is specified.
DataAccessProtocol = { "https", "gsiftp" };
OutputSE & OutputData
OutputSE = <string>
• Represents the URI of the Storage Element where the user wants to store the output data.
• Used by the WMS to find a CE that is "close" to this SE and schedule the job there.
OutputSE = "grid001.cnaf.infn.it";
OutputData = <list of ClassAds> (*)
• Describes the output files of the job.
• Similar to DataRequirements.
• Automatically saves the files.
(*) still unsupported
www.ccr.infn.it Roma Tutorial WMS-JDL http://grid.infn.it/
OutputSE
[cesini@lcg-ui DataReq]$ cat output-se.jdl
###################
# JDL with OutputSE Requirements #
###################
Executable = "test.sh";
Arguments = "fileA fileB";
StdOutput = "std.out";
StdError = "std.err";
InputSandbox = {"test.sh", "fileA", "fileB"};
OutputSandbox = {"std.out", "std.err"};
OutputSE = "grid007g.cnaf.infn.it";
[cesini@lcg-ui DataReq]$ glite-wms-job-list-match -a output-se.jdl
Connecting to the service https://glite-rb-00.cnaf.infn.it:7443/glite_wms_wmproxy_server
=================================
COMPUTING ELEMENT IDs LIST
The following CE(s) matching your job requirements have been found:
*CEId*
- gridit-ce-001.cnaf.infn.it:2119/jobmanager-lcgpbs-cert
=================================
Prologue = <string>
• <string> is the name of a script/executable that must be run as a prologue within the job wrapper.
• It can be used for preliminary operations, e.g.:
– data transfers
– environment checks
– DB updates
– custom logging
• If the prologue fails and shallow resubmission is enabled, the job is shallow resubmitted; otherwise it is deeply resubmitted.
• Use PrologueArguments = <string> to pass arguments to the prologue.
Prologue = "my_prologue_script";
Prologue = "/bin/false"; # can be used to test shallow resubmission
Epilogue
Epilogue = <string>
• Sets the name of a script/executable that must be run as an epilogue.
• It can be used for post-execution operations, e.g.:
– data transfers
– DB updates
– custom logging
– job functionality checks
• If the epilogue fails, the job is deeply resubmitted.
• Use EpilogueArguments = <string> to pass arguments to the epilogue.
Epilogue = "my_epilogue_script";
Epilogue = "/bin/false"; # can be used to test deep resubmission
DataRequirements
[cesini@lcg-ui DataReq]$ cat data-req.jdl
#################
# JDL with Data Requirements #
#################
Executable = "calc-pi.sh";
# Arguments is the number of digits, must be < 1000000
Arguments = "1000";
StdOutput = "std.out";
StdError = "std.err";
Prologue = "prologue.sh";
FuzzyRank = true;
InputSandbox = {"calc-pi.sh", "fileA", "fileB"};
OutputSandbox = {"std.out", "std.err", "out-PI.txt", "oute.txt", "prologue.sh"};
Requirements = other.GlueCEInfoHostName != "spacin-ce1.dma.unina.it";
DataRequirements = {
  [
    DataCatalogType = "DLI";
    DataCatalog = "http://lfcserver.cnaf.infn.it:8085";
    InputData = {"lfn:/grid/infngrid/cesini/PI_1M.txt", "lfn:/grid/infngrid/cesini/e-2M.txt"};
  ]
};
DataAccessProtocol = "gsiftp";
[cesini@lcg-ui DataReq]$ lcg-lr --vo infngrid lfn:/grid/infngrid/cesini/e-2M.txt
Shows three replicas at MI, BA, PD, LNL, PI
[cesini@lcg-ui DataReq]$ glite-wms-job-list-match -a -c ../wms_rb00.conf data-req.jdl
Connecting to the service https://glite-rb00.cnaf.infn.it:7443/glite_wms_wmproxy_server
==================
COMPUTING ELEMENT IDs LIST
The following CE(s) matching your job requirements have been found:
*CEId*
- t2-ce-01.mi.infn.it:2119/jobmanager-lcgpbs-cert
- gridba2.ba.infn.it:2119/jobmanager-lcgpbs-cert
- prod-ce-01.pd.infn.it:2119/jobmanager-lcglsf-cert
….
MULTI NODE JOBS
JobType and Type
JobType = <string>
• Normal: a simple job
• Interactive: starts an interactive session with the user
• MPICH: an MPI job
• Parametric: a series of jobs depending on a parameter
• Checkpointable: DEPRECATED
• Partitionable: DEPRECATED
Type = <string>
• DAG: a Directed Acyclic Graph of jobs
• Collection: a flat DAG
DAG JOBS
• A DAG (directed acyclic graph) represents a set of jobs where the input, output, or execution of one or more jobs depends on one or more other jobs.
• The jobs are nodes (vertices) in the graph and the edges (arcs) identify the dependencies.
Nodes = <ClassAds>
• Describes the nodes that make up the DAG.
nodes = [
  a = [ /* node "a" */
    description = [ /* this ClassAd contains the JDL which describes the node */
      JobType = "Normal";
      Executable = "a.exe";
      InputSandbox = {…};
    ];
  ];
  b = [
    file = "node_b.jdl";
  ];
  …
];
DAG Dependencies
dependencies = <list of string lists>
• Describes the dependencies between nodes.
• The strings are the node names.
• Format: { { a, b }, { a, c }, { a, d }, { {a, b, c}, e } }
dependencies = {
  { a, b },         // node "b" depends on node "a"
  { a, c },         // node "c" depends on node "a"
  { {a, b, c}, e }  // node "e" depends on nodes "a", "b", "c"
};
max_nodes_running = <positive integer>
• Sets the maximum number of nodes that DAGMAN can submit to CEs at a given time.
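To make the dependency format concrete, the following Python sketch (Python 3.9 or later, for `graphlib`) turns a list of `{ parents, child }` pairs into a graph and derives one valid execution order. It only illustrates the meaning of the format and is not how DAGMAN schedules nodes; the `build_graph` helper and the tuple/list encoding of the JDL list are assumptions made for this example.

```python
from graphlib import TopologicalSorter

def as_list(item):
    """Accept either a single node name or a list of node names."""
    return list(item) if isinstance(item, (list, tuple)) else [item]

def build_graph(dependencies):
    """Map every node to the set of nodes it depends on."""
    graph = {}
    for parents, children in dependencies:
        for parent in as_list(parents):
            graph.setdefault(parent, set())
        for child in as_list(children):
            graph.setdefault(child, set()).update(as_list(parents))
    return graph

# Same shape as: { { a, b }, { a, c }, { a, d }, { {a, b, c}, e } }
deps = [("a", "b"), ("a", "c"), ("a", "d"), (["a", "b", "c"], "e")]

graph = build_graph(deps)
order = list(TopologicalSorter(graph).static_order())
print(order)
```

Any order printed here places "a" first and "e" after "b" and "c"; nodes with no mutual dependency, such as "b", "c" and "d", may run concurrently, subject to max_nodes_running.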
DAG example
[cesini@lcg-ui dag]$ cat dag.jdl
#################
# Example of a simple DAG job #
#################
[
  Type = "dag";
  InputSandbox = {"test.sh"};
  nodes = [
    nodeA = [
      description = [
        JobType = "Normal";
        Executable = "test.sh";
        InputSandbox = {"isb_nodeA", "test.sh"};
        Arguments = "isb_nodeA";
      ];
    ];
    mynode = [
      description = [
        JobType = "Normal";
        Executable = "test.sh";
        InputSandbox = {"isb_nodeMYNODE", "test.sh"};
        Arguments = "isb_nodeMYNODE";
      ];
    ];
    nodeD = [
      description = [
        JobType = "Normal";
        Executable = "test.sh";
        Arguments = "isb_nodeD";
        InputSandbox = {"isb_nodeD", "test.sh"};
      ];
    ];
    nodeC = [
      description = [
        JobType = "Normal";
        Executable = "test.sh";
        Arguments = "isb_nodeC";
        InputSandbox = {"isb_nodeC", "test.sh"};
      ];
    ];
    nodeB = [
      description = [
        JobType = "Normal";
        Executable = "test.sh";
        Arguments = "isb_nodeB";
        InputSandbox = {"isb_nodeB", root.InputSandbox};
      ];
    ];
  ];
  dependencies = { { nodeA, nodeB }, { nodeA, nodeC }, { nodeA, mynode }, { { nodeB, nodeC, mynode }, nodeD } };
];
DAG: attributes changes
• Type:
– Mandatory
• VirtualOrganisation, LBAddress, MyProxyServer:
– Must be the same for all nodes
– If declared in the common section they are inherited by the nodes
• AllowZippedISB:
– A single compressed archive is created for the input sandbox files of all DAG nodes
• ExpiryTime:
– If specified in the common section it is not considered for the DAG itself, but it is inherited by the nodes
– Can be specified for each node
DAG: attributes changes
• Requirements and Rank:
– All nodes that do not contain the Requirements and/or Rank expressions inherit them from the ones specified for the DAG.
• UserTags:
– Applied only to the DAG as a whole
• InputSandbox, InputSandboxBaseURI and OutputSandboxBaseURI:
– All nodes that do not contain the ISB and/or ISBBaseURI attributes inherit the values from the ones specified for the DAG
– The ISB declared in the common section can be considered as a "shared ISB"
– OSB is not inherited!!!
• PerusalFileEnable:
– All nodes that do not contain this attribute in their descriptions inherit it from the one specified for the DAG.
New Attributes for DAG
• NodesCollocation = <boolean>
– When set to true, forces the WMS to send all nodes to the same CE
• DEFAULTNODERETRYCOUNT = <integer>
– Specifies the value of the RetryCount attribute to be applied to all nodes of the DAG that do not specify their own RetryCount
• DEFAULTNODESHALLOWRETRYCOUNT
– Specifies the value of the ShallowRetryCount attribute to be applied to all nodes of the DAG that do not specify their own ShallowRetryCount
PAY ATTENTION: RetryCount and ShallowRetryCount are NOT inherited!
Collection
• A job collection is a set of independent jobs that, for some reason known to the user, have to be submitted, monitored and controlled as a single request.
• A job collection can be considered as a flat DAG
– It is not handled by DAGMAN
• As for a DAG, upon submission the collection is associated with a JobId, in addition to the identifiers of the sub-jobs
– Type = "collection"
– No description attributes: nodes are described directly by ClassAds
– No dependencies attribute
– Same inheritance rules as for DAGs
Collection Example
[cesini@lcg-ui collection]$ cat collection.jdl
####################
# Example of a simple COLLECTION job #
####################
[
  Type = "collection";
  InputSandbox = {"test.sh"};
  StdOutput = "std.out";
  StdError = "std.err";
  OutputSandbox = {"std.out", "std.err"};
  nodes = {
    [
      JobType = "Normal";
      Executable = "test.sh";
      InputSandbox = {"isb_nodeA", "test.sh"};
      Arguments = "isb_nodeA";
    ],
    [
      JobType = "Normal";
      Executable = "test.sh";
      InputSandbox = {"isb_nodeMYNODE", "test.sh"};
      Arguments = "isb_nodeMYNODE";
    ],
    [
      JobType = "Normal";
      Executable = "test.sh";
      Arguments = "isb_nodeD";
      InputSandbox = {"isb_nodeD", "test.sh"};
    ],
    [
      JobType = "Normal";
      Executable = "test.sh";
      Arguments = "isb_nodeC";
      InputSandbox = {"isb_nodeC", "test.sh"};
    ],
    [
      JobType = "Normal";
      Executable = "test.sh";
      Arguments = "isb_nodeB";
      InputSandbox = {"isb_nodeB", root.InputSandbox};
    ]
  };
]
Bulk MatchMaking
• Bulk MM allows the WMS to perform the matchmaking phase for the nodes of a collection more quickly.
• For all nodes that share identical values of some attributes (chosen by the user), matchmaking is done only once.
• Bulk MM must also be enabled on the server side.
SignificantAttributes = <list of strings>
• Indicates which attributes should be compared to consider nodes as equivalent for matchmaking.
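The idea of grouping equivalent nodes can be sketched in a few lines of Python. This is an illustration under simplifying assumptions (nodes are plain dictionaries and attribute values are compared as strings); the `group_for_matchmaking` helper is hypothetical and does not reproduce the WMS code.

```python
from collections import defaultdict

def group_for_matchmaking(nodes, significant_attributes):
    """Group node indexes by the values of their significant attributes."""
    groups = defaultdict(list)
    for index, node in enumerate(nodes):
        key = tuple(node.get(attr) for attr in significant_attributes)
        groups[key].append(index)
    return dict(groups)

# Hypothetical collection nodes; only Requirements is significant here
nodes = [
    {"Executable": "/bin/echo", "Requirements": "other.GlueCEStateEstimatedResponseTime == 0"},
    {"Executable": "/bin/date", "Requirements": "other.GlueCEStateEstimatedResponseTime < 2"},
    {"Executable": "/bin/true", "Requirements": "other.GlueCEStateEstimatedResponseTime == 0"},
]

groups = group_for_matchmaking(nodes, ["Requirements"])
print(len(groups), "matchmaking runs for", len(nodes), "nodes")
```

Each group needs a single matchmaking run whose result is reused for all its members, which is where the speed-up comes from.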
Bulk MM Example
[cesini@lcg-ui Collection]$ cat bulkMM-coll.jdl
[
  Type = "collection";
  requirements = true;
  SignificantAttributes = { "Requirements" };
  DefaultNodeShallowRetryCount = 3;
  DefaultNodeRetryCount = 0;
  nodes = {
    [
      JobType = "Normal";
      Executable = "/bin/echo";
      StdOutput = "test.out";
      StdError = "test.err";
      OutputSandbox = {"test.out", "test.err"};
      Requirements = other.GlueCEStateEstimatedResponseTime == 0;
      FuzzyRank = true;
    ],
    [
      JobType = "Normal";
      Executable = "/bin/echo";
      StdOutput = "test.out";
      StdError = "test.err";
      OutputSandbox = {"test.out", "test.err"};
      Requirements = (other.GlueCEStateEstimatedResponseTime > 0 && other.GlueCEStateEstimatedResponseTime < 2);
      FuzzyRank = true;
    ]
  };
];
Bulk MM Example
[cesini@lcg-ui Collection]$ glite-wms-job-status https://albalonga.cnaf.infn.it:9000/IfZUHIvMsEqrNOaiwwv9MA
*******************************
BOOKKEEPING INFORMATION:
Status info for the Job : https://albalonga.cnaf.infn.it:9000/IfZUHIvMsEqrNOaiwwv9MA
Current Status: Running
Submitted: Mon Nov 26 19:22:01 2007 CET
*******************************
- Nodes information:
Status info for the Job : https://albalonga.cnaf.infn.it:9000/QEjwhITaIsXvaYjsQePcZA
Current Status: Scheduled
Status Reason: Job successfully submitted to Globus
Destination: bogrid5.bo.infn.it:2119/jobmanager-lcgpbs-cert
Submitted: Mon Nov 26 19:22:01 2007 CET
*******************************
Status info for the Job : https://albalonga.cnaf.infn.it:9000/oMIPyS5yMCRad6phuZj_w
Current Status: Running
Status Reason: Job successfully submitted to Globus
Destination: virgo-ce.roma1.infn.it:2119/jobmanager-lcgpbs-cert
Submitted: Mon Nov 26 19:22:01 2007 CET
*******************************
Status info for the Job : https://albalonga.cnaf.infn.it:9000/yjERa75FvquCNxankHfj2A
Current Status: Waiting
Status Reason: BrokerHelper: no compatible resources
Submitted: Mon Nov 26 19:22:01 2007 CET
*******************************
FOR NODES 1 and 2
- stateEnterTimes =
Submitted : Mon Nov 26 19:22:01 2007 CET
Waiting : Mon Nov 26 19:22:10 2007 CET
Ready : Mon Nov 26 19:22:10 2007 CET
Scheduled : Mon Nov 26 19:22:32 2007 CET
Running : Mon Nov 26 19:23:48 2007 CET
DAG vs. COLLECTION
Whenever possible, collections are to be preferred to DAGs, since their handling has been highly optimized: Condor DAGMAN is not used for collections.
Parametric Job
JobType = "Parametric"
• A parametric job is a job having one or more parametric attributes in the JDL.
• The submission of a parametric job results in the submission of a set of jobs having the same description apart from the values of the parametric attributes.
• A JobId is assigned both to the parametric job as a whole and to each parametrized job.
• A special variable (_PARAM_) addresses the parameter.
• _PARAM_ can take its values automatically or from a defined list.
ParameterStart, ParameterStep, Parameters, NodesCollocation
Parameters = <integer | list of strings>
• If an integer, sets the maximum value of the parameter.
• If a list of strings, sets the values of the parameter.
ParameterStart, ParameterStep = <integer>
• ParameterStart sets the first value of the parameter.
• ParameterStep sets the difference between two consecutive values of the parameter.
• The submission results in the generation of N jobs, where N = (Parameters – ParameterStart) / ParameterStep.
NodesCollocation = <boolean>
• If true, forces the WMS to send all the parametrized jobs to the same CE.
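The formula for N can be checked with a short Python sketch. It assumes, consistently with that formula, that the value given in Parameters is an upper bound that is not itself used; the `parameter_values` helper is hypothetical and covers only the integer form of Parameters.

```python
def parameter_values(parameters, start, step):
    """List the values taken by the parameter for an integer Parameters.

    Assumption: the upper bound `parameters` is excluded, which matches
    N = (Parameters - ParameterStart) / ParameterStep when the division is exact.
    """
    return list(range(start, parameters, step))

# ParameterStart = 1000; Parameters = 1050; ParameterStep = 10
values = parameter_values(1050, start=1000, step=10)
n_jobs = (1050 - 1000) // 10
print(values, n_jobs)
```

With these numbers the parameter takes five values, from 1000 to 1040, so five jobs are generated.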
Parametric Job example
[cesini@lcg-ui Parametric]$ cat parametric.jdl
#####################
# Example of a parametric jdl using step #
#####################
JobType = "Parametric";
Executable = "test_parametric.sh";
StdInput = "input_PARAM_.txt";
StdOutput = "myoutput_PARAM_.txt";
StdError = "myerror_PARAM_.txt";
ParameterStart = 1000;
Parameters = 1050;
ParameterStep = 10;
InputSandbox = {"test_parametric.sh", "input_PARAM_.txt"};
OutputSandbox = {"myoutput_PARAM_.txt", "myerror_PARAM_.txt"};
Parametric Job example
[cesini@lcg-ui Parametric]$ cat parametric2.jdl
#########################
# Example of a parametric jdl using explicit parameters #
#########################
[
  JobType = "Parametric";
  Executable = "test_parametric2.sh";
  StdOutput = "myoutput_PARAM_.txt";
  StdError = "myerror_PARAM_.txt";
  Parameters = {Alfa, Beta, Gamma};
  Arguments = "_PARAM_";
  InputSandbox = {"test_parametric2.sh"};
  OutputSandbox = {"myoutput_PARAM_.txt", "myerror_PARAM_.txt"};
]
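The effect of _PARAM_ can be pictured as a plain text substitution applied once per parameter value. The Python sketch below expands a job template, modelled as a dictionary, into one description per value. It is an illustration only: the `substitute` and `expand_parametric` helpers are hypothetical and the real expansion is performed by the WMS.

```python
def substitute(attribute, value):
    """Replace _PARAM_ inside strings and, recursively, inside lists."""
    if isinstance(attribute, str):
        return attribute.replace("_PARAM_", str(value))
    if isinstance(attribute, list):
        return [substitute(item, value) for item in attribute]
    return attribute

def expand_parametric(template, values):
    """Build one job description per parameter value."""
    return [
        {name: substitute(attr, value) for name, attr in template.items()}
        for value in values
    ]

template = {
    "Executable": "test_parametric2.sh",
    "StdOutput": "myoutput_PARAM_.txt",
    "Arguments": "_PARAM_",
    "OutputSandbox": ["myoutput_PARAM_.txt", "myerror_PARAM_.txt"],
}

jobs = expand_parametric(template, ["Alfa", "Beta", "Gamma"])
for job in jobs:
    print(job["Arguments"], job["StdOutput"])
```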
MPI
JobType = "mpich";
NodeNumber = <integer>
• An integer greater than 1 specifying the number of CPUs needed by an MPI job.
• Only allowed if the job type is MPICH.
• The RB uses this attribute during matchmaking to select those CEs having a number of CPUs equal to or greater than the one specified in NodeNumber.
• Mandatory: yes (if the job type is MPICH).
MPI Example
[cesini@lcg-ui MPI]$ cat mpi.jdl
#################
# Simple JDL for MPI Job #
#################
[
  Type = "job";
  JobType = "mpich";
  // This is the minimum number of CPUs needed by the job
  NodeNumber = 3;
  Executable = "cpi";
  StdOutput = "cpi.out";
  StdError = "cpi.err";
  InputSandbox = {"cpi"};
  OutputSandbox = {"cpi.err", "cpi.out"};
  RetryCount = 3;
]
MORE JDL ATTRIBUTES and UI CLIENTS OPTIONS
The BrokerInfo mechanism
• The BrokerInfo file is a mechanism to access, at job execution time, certain information concerning the job.
glite-brokerinfo [-v] [-f filename] function [parameter] ...
where function is one of the following:
– getCE
– getDataAccessProtocol
– getInputData
– getSEs
– getCloseSEs
– getSEMountPoint <SE>
– getSEFreeSpace <SE>
– getLFN2SFN <LFN>
– getSEProtocols <SE>
– getSEPort <SE> <Protocol>
– getVirtualOrganisation
Perusal Usage
PerusalFileEnable = <boolean>
• Allows the user to inspect chunks of the files generated by the job at regular time intervals.
• By default the file chunks are uploaded to a location on the WMS.
• Use the glite-wms-job-perusal command to retrieve the files.
PerusalFileEnable = true;
PerusalTimeInterval = <positive integer>
• Sets the interval between perusal uploads (in seconds).
• It can be overridden by MinPerusalTimeInterval on the server side.
PerusalTimeInterval = 10;
PerusalFilesDestURI = <string>
• Sets the location on a GridFTP or HTTPS server where the chunks of files have to be copied.
PerusalFilesDestURI = "gsiftp://grid007g.cnaf.infn.it/tmp";
Perusal Usage
[cesini@lcg-ui Perusal]$ cat perusal.jdl
################
# JDL with perusal usage #
################
Executable = "perusal.sh";
StdOutput = "std.out";
StdError = "std.err";
InputSandbox = {"perusal.sh"};
OutputSandbox = {"std.out", "std.err", "date.txt"};
PerusalFileEnable = true;
PerusalTimeInterval = 10;
Note: perusal.sh simply writes the date.txt file every 10 seconds.
[cesini@lcg-ui Perusal]$ cat perusal-commands.txt
# to be launched when the job is running
glite-wms-job-perusal --set -f date.txt https://albalonga.cnaf.infn.it:9000/0WCHpJRMrd7bq3C758sOQQ
glite-wms-job-perusal --get -f date.txt https://albalonga.cnaf.infn.it:9000/0WCHpJRMrd7bq3C758sOQQ
Perusal Usage
…. while the job is still running….
[cesini@lcg-ui Perusal]$ glite-wms-job-perusal --get -f date.txt https://albalonga.cnaf.infn.it:9000/0WCHpJRMrd7bq3C758sOQQ
Connecting to the service https://131.154.100.90:7443/glite_wms_wmproxy_server
=========== glite-wms-job-perusal Success ===========
The retrieved files have been successfully stored in:
/tmp/jobOutput/cesini_0WCHpJRMrd7bq3C758sOQQ
=====================================
--------------------------------------
file 1/1: date.txt-20071122120418_22-20071122120603_31
--------------------------------------
Thu Nov 22 12:04:15 CET 2007
Thu Nov 22 12:04:25 CET 2007
Thu Nov 22 12:04:35 CET 2007
………
Gangmatching
Requirements = anyMatch(<list of ClassAds>) | whichMatch(<list of ClassAds>) | allMatch(<list of ClassAds>)
• "Standard" matchmaking involves only two entities: the job and the CE.
• Gangmatching makes it possible to take SE information into account in the matchmaking process, in addition to CE information.
• Typical use case for gangmatching: my job has to run on a CE close to an SE with at least 200 MB of available space:
Requirements = anyMatch(other.storage.CloseSEs, target.GlueSAStateAvailableSpace > 200);
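The meaning of the anyMatch requirement in the use case can be mimicked in Python: a CE is acceptable if at least one of its close SEs satisfies the condition. The data, the field name `available_space` and the `any_match` helper are hypothetical stand-ins for the information published by the CEs and SEs; units follow the slide (MB).

```python
def any_match(close_ses, predicate):
    """True if at least one close SE satisfies the predicate."""
    return any(predicate(se) for se in close_ses)

# Hypothetical CEs, each with the list of its close SEs
ces = {
    "ce1.example.org": [{"name": "se1.example.org", "available_space": 150}],
    "ce2.example.org": [
        {"name": "se2.example.org", "available_space": 50},
        {"name": "se3.example.org", "available_space": 4000},
    ],
    "ce3.example.org": [],
}

matching = [
    ce for ce, close_ses in ces.items()
    if any_match(close_ses, lambda se: se["available_space"] > 200)
]
print(matching)
```

Only the CE with at least one sufficiently large close SE survives; a CE with no close SE at all never matches.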
Long living jobs
• User proxies have to be short-lived.
• Jobs are killed when the proxy expires.
• What if my jobs have to run for a longer time?
Status Reason: Got a job held event, reason: Globus error 131: the user proxy expired (job is still running)
• Users need to store a long-lived proxy in the MyProxyServer.
[Diagram: UI/User, VOMS, MyProxyServer, WMS, proxy renewal service]
MyProxyServer
• The proxy renewal functionality is managed via the myproxy commands:
– $ myproxy-init -s <myproxy_server> -d -n
– $ myproxy-info -s <myproxy_server> -d
– $ myproxy-destroy -s <myproxy_server> -d
• As the renewal process starts 30 minutes before the old proxy expires, the initial proxy must be generated with a long enough lifetime, or the renewal may be triggered too late.
MyProxyServer = <string>
• Specifies the hostname of a MyProxy server where the user has registered her/his long-term proxy certificate.
MyProxyServer = "myproxy.cnaf.infn.it:7512";
LBAddress = <string>
• The address (<host>[:<port>]) of the LB server where the WMS components have to store job information.
• If not set, the default configured on the server side is used.
LBAddress = "lb-grid.ct.infn.it";
ExpiryTime = <integer>
• Represents the date and time (in seconds since the epoch) until which the job has to be considered valid by the WMS.
• The glite-wms-job-submit command provides options (--valid, --to) to specify this value in a user-friendly format.
• It can be overridden on the WMS side.
ExpiryTime = 1112339655;
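Since ExpiryTime is expressed in seconds since the epoch, a short Python sketch can help to read or produce such values. The `expiry_time` helper and the two-hour validity are illustrative assumptions; only the value 1112339655 comes from the slide.

```python
from datetime import datetime, timedelta, timezone

def expiry_time(now, valid_for):
    """Seconds since the epoch at which a job submitted `now` stops being valid."""
    return int((now + valid_for).timestamp())

# Decode the value used in the example above
slide_value = datetime.fromtimestamp(1112339655, tz=timezone.utc)
print(slide_value.isoformat())

# Hypothetical case: a job that must remain valid for two hours from a given instant
submitted = datetime(2007, 11, 27, 12, 0, tzinfo=timezone.utc)
print("ExpiryTime = %d;" % expiry_time(submitted, timedelta(hours=2)))
```

Working with timezone-aware datetimes avoids ambiguity between local time on the UI and the UTC-based epoch.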
UserTag
Environment = <string or list of strings>
• Sets environment variables on the execution machine.
Environment = {"JOB_LOG_FILE=/tmp/myjob.log", "ORACLE_SID=edg_rdbms_1", "JAVABIN=/usr/local/java"};
Useful Links
• JDL Attributes, WMProxy submission: https://edms.cern.ch/document/590869/1
• WMS API Documentation: http://egee-jra1-wm.mi.infn.it/egee-jra1-wm/api_doc/wms_jdl/index.html
• gLite User Guide: https://edms.cern.ch/cedar/plsql/cedarw.download_error?cookie=&p_doc_id=722398&p_version=1.1&p_file_name=&p_error_id=20003
Useful Links
• WMS Architecture overview: http://egee-jra1-wm.mi.infn.it/egee-jra1-wm/wms.shtml
• LB Architecture overview: http://egee-jra1-wm.mi.infn.it/egee-jra1-wm/lb.shtml