Enabling Grids for E-sciencE
Special Jobs
Valeria Ardizzone, INFN Catania
Roma, 18-21 April 2006
www.eu-egee.org
EGEE-II INFSO-RI-031688
Outline
• Overview of MPI
  – How to create an MPI job in the middleware
• Overview of WMProxy
  – DAG
  – Collection
  – Parametric
Marc-Elian Bégin - Demos - 1st EU review
Overview
• Running parallel jobs is essential for modern computing and its applications.
• The most widely used library for parallel jobs is MPI (Message Passing Interface).
• At present, a parallel job can run only inside a single Computing Element (CE).
  – Several projects are studying how to run parallel jobs on Worker Nodes (WNs) belonging to different CEs.
Requirements & Settings
• To guarantee that an MPI job can run, the following requirements MUST be satisfied:
  – The MPICH software must be installed, and its location included in the PATH environment variable, on all the WNs of the CE.
  – Some MPI applications require a filesystem shared among the WNs in order to run.
    § The variable VO_<name_of_VO>_SW_DIR contains the name of a directory if there is a SHARED filesystem.
    § The variable VO_<name_of_VO>_SW_DIR contains "." if there is NO SHARED filesystem.
Requirements & Settings
  – The Executable specified in the JDL must not be the MPI application itself, but a wrapper script that invokes the MPI application by calling the mpirun command.
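The VO_<name_of_VO>_SW_DIR convention above can be tested from a wrapper script before deciding whether files need to be mirrored. The following is a minimal sketch, not part of the original material: the function name sw_dir_is_shared is hypothetical, and the usage lines assume that the gilda VO exposes the variable as VO_GILDA_SW_DIR.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit status 0) when the given
# VO_<name_of_VO>_SW_DIR value names a shared directory, and fails
# when it is "." (no shared filesystem) or empty/unset.
sw_dir_is_shared() {
    case "$1" in
        ""|".") return 1 ;;
        *)      return 0 ;;
    esac
}

# Usage: VO_GILDA_SW_DIR is an assumed variable name for the gilda VO.
if sw_dir_is_shared "${VO_GILDA_SW_DIR:-}"; then
    echo "shared filesystem at ${VO_GILDA_SW_DIR}"
else
    echo "no shared filesystem: files must be mirrored to each WN"
fi
```

Passing the value as an argument keeps the check independent of how a given site names the variable.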
How to create an MPI Job
• From the user's point of view, a job is run as MPI by setting the JDL JobType attribute to MPICH and specifying the NodeNumber attribute as well, e.g.:
    JobType = "MPICH";
    NodeNumber = 4;
• The NodeNumber attribute defines the number of CPUs required by the application.
• When these two attributes are included in a JDL, the User Interface (UI) automatically adds the expression
    (other.GlueCEInfoTotalCPUs >= NodeNumber) && Member("MPICH", other.GlueHostApplicationSoftwareRunTimeEnvironment)
  to the JDL Requirements expression, in order to find the best resource where the job can be executed.
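To make the two conditions of that expression concrete, here is a small sketch that evaluates them locally for one CE. It is only an illustration of the logic, not how the WMS performs matchmaking; the function name ce_matches_mpich and the CE values in the usage lines are made up.

```shell
#!/bin/sh
# Hypothetical helper mirroring the expression added by the UI:
#   (TotalCPUs >= NodeNumber) && Member("MPICH", RunTimeEnvironment)
# Usage: ce_matches_mpich <total_cpus> <node_number> <tag>...
ce_matches_mpich() {
    total_cpus=$1
    node_number=$2
    shift 2
    # First condition: the CE must offer at least NodeNumber CPUs.
    [ "$total_cpus" -ge "$node_number" ] || return 1
    # Second condition: "MPICH" must be among the published tags.
    for tag in "$@"; do
        [ "$tag" = "MPICH" ] && return 0
    done
    return 1
}

# Made-up CE values, for illustration only.
if ce_matches_mpich 8 4 "GLITE-1.4" "MPICH"; then
    echo "CE is a candidate for the job"
fi
```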
MPI jobs with LCG middleware
The problem...
• Unfortunately, the LCG project did not follow the shared-filesystem requirement: disk space is not shared among the nodes of the same CE.
• We therefore had to devise an ad hoc but efficient workaround.
• The adopted solution bypasses the problem by putting some intelligence in the script passed in the InputSandbox.
... the solution
• In detail, each job has to mirror its files, via scp, on all the nodes dedicated to it.
• ssh host-based authentication MUST be properly configured between all the WNs.
mpi.jdl
    [
    Type = "Job";
    JobType = "MPICH";
    Executable = "MPItest.sh";
    NodeNumber = 5;
    Arguments = "cpi 5";
    StdOutput = "test.out";
    StdError = "test.err";
    InputSandbox = {"MPItest.sh", "cpi"};
    OutputSandbox = {"test.err", "test.out", "executable.out"};
    Requirements = other.GlueCEInfoLRMSType == "PBS" || other.GlueCEInfoLRMSType == "LSF";
    ]
• The number of processes specified with the NodeNumber attribute matches the second value in Arguments, which is used when invoking the mpirun command.
• Currently, the only supported Local Resource Managers are PBS and LSF.
MPItest.sh
    # the binary to run and the number of CPUs
    # (assumed to come from Arguments = "cpi 5" in mpi.jdl)
    EXE=$1
    CPU_NEEDED=$2
    for i in `cat $HOST_NODEFILE` ; do
      echo "Mirroring via SSH to $i"
      # creates the working directories on all the nodes allocated for parallel execution
      ssh $i mkdir -p `pwd`
      # copies the needed files on all the nodes allocated for parallel execution
      /usr/bin/scp -rp ./* $i:`pwd`
      # checks that all files are present on all the nodes allocated for parallel execution
      ssh $i ls `pwd`
    done
    # executes the parallel job with mpirun
    echo "Executing $EXE"
    chmod 755 $EXE
    mpirun -np $CPU_NEEDED -machinefile $HOST_NODEFILE `pwd`/$EXE > executable.out
• The environment variable $HOST_NODEFILE contains the list of WNs allocated for the parallel execution.
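The loop above contacts one host per entry in $HOST_NODEFILE. The sample run later in these slides shows two processes on each host, which suggests the file may list a host once per allocated slot; if so, each WN would be mirrored more than once. A possible refinement, sketched here under that assumption (unique_hosts is a hypothetical name and the node file is made up), is to iterate over the distinct hosts only.

```shell
#!/bin/sh
# Hypothetical helper: print each host listed in a node file once,
# so that the mirroring loop contacts every WN a single time.
unique_hosts() {
    sort -u "$1"
}

# Demonstration with a made-up node file (two slots per host).
nodefile=$(mktemp)
printf '%s\n' grid036 grid036 grid033 grid033 grid034 grid034 > "$nodefile"
for host in $(unique_hosts "$nodefile"); do
    echo "would mirror files to $host"
done
```

mpirun would still receive the full node file through -machinefile, so the number of processes per host is unchanged.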
MPI jobs with gLite middleware
• Unlike the LCG middleware, the gLite WMS supports both configurations (shared and non-shared filesystem) automatically, for both LSF and Torque.
• With gLite 1.4, the job wrapper takes care of mirroring the working directory on all the nodes dedicated to the MPI job if the home directories are not shared.
mpi-glite.jdl
    [
    JobType = "MPICH";
    NodeNumber = 6;
    Executable = "cpi";
    StdOutput = "cpi.out";
    StdError = "cpi.err";
    InputSandbox = {"cpi"};
    OutputSandbox = {"cpi.err", "cpi.out"};
    Requirements = (Member("GLITE-1.4", other.GlueHostApplicationSoftwareRunTimeEnvironment) && (other.GlueCEInfoTotalCPUs >= 10));
    ]
• The command sequence to submit these examples is:
    voms-proxy-init --voms gilda
    edg-job-submit mpi.jdl
    edg-job-status <JobID>
    glite-job-submit mpi-glite.jdl
    glite-job-status <JobID>
MPI job @ work!
    pi is approximately 3.1415926544231239, Error is 0.000008333307
    wall clock time = 10.008261
    Process 0 of 6 on grid036.ct.infn.it
    Process 1 of 6 on grid036.ct.infn.it
    Process 2 of 6 on grid033.ct.infn.it
    Process 3 of 6 on grid033.ct.infn.it
    Process 5 of 6 on grid034.ct.infn.it
    Process 4 of 6 on grid034.ct.infn.it
Overview of WMProxy
• WMProxy (Workload Manager Proxy)
  – is a new service providing access to the gLite Workload Management System (WMS) functionality through a simple Web Services based interface;
  – has been designed to handle efficiently a large number of job submission and control requests to the WMS;
  – has a service interface that follows the Web Services and SOA standards, in particular adhering to WS-I;
  – is developed in C++, using gSOAP 2.7.6b as the SOAP stub generator.
Valeria Ardizzone, INFN Catania, EGEE gLite Tutorial, 18-21 April 2006, Rome
New request types
• Support for the new types relies heavily on newly developed JDL converters and on the DAG submission support
  – All JDL conversions are performed on the server
  – A single submission covers several jobs
• All new request types can be monitored and controlled through a single handle (the request id)
  – Each sub-job can nevertheless be followed up and controlled independently through its own id
• "Smarter" WMS client commands/API
  – allow submission of DAGs, collections and parametric jobs exploiting the concept of "shared sandbox"
  – allow automatic generation and submission of collections and DAGs from sets of JDL files located in user-specified directories on the UI
WMProxy
• The delegation approach has changed
  – Delegation is no longer part of the authentication process
  – It can be done only once for multiple jobs
  – WMProxy imports the delegation port type provided by GridSite and shared by all gLite components
• LCMAPS is used for user mapping, as it is for the CE
  – the GSI-free flavour of LCMAPS
  – works with VOMS pool accounts/groups too
• Authorization is based on DN + FQAN
  – using GridSite GACL
Directed Acyclic Graph
JDL structure
Attribute: Nodes
Attribute: Dependencies
Job Collections
• A collection is a set of independent jobs that, for some reason known to the user, have to be submitted, monitored and controlled as a single request
• The JDL description of a collection is quite simple: it basically consists of a list of JDL descriptions (the sub-jobs)
• The same features as for DAGs are available
  – Shared sandboxes
  – Attribute inheritance
  – Attribute references between nodes and with the 'parent'
    [
    Type = "collection";
    nodes = {
      [ <job descr 1> ],
      [ <job descr 2> ],
      …
    };
    …
    ]
‘Scattered’ Input Sandboxes
• The InputSandbox can contain
  – file paths on the UI machine (i.e. the usual way)
  – URIs pointing to files on a remote GridFTP/HTTPS server
    InputSandbox = {
      "gsiftp://neo.datamat.it:2811/var/prg/sim.exe",
      "https://ghemon.cnaf.infn.it:8443/data/idat_1",
      "file:///home/pacio/myconf"
    };
• A base URI to be applied to all sandbox files can also be specified
    InputSandboxBaseURI = "gsiftp://matrix.datamat.it:2811/var";
• Only local files (file://) are uploaded to the WMS node
• Files pointed to by URIs are downloaded directly to the WN by the JobWrapper just before the job starts
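The last two bullets amount to a simple rule based on the URI scheme. As a rough sketch of that rule (the function name sandbox_source and the labels it prints are hypothetical, and the handling of other schemes is an assumption), each InputSandbox entry can be classified as follows.

```shell
#!/bin/sh
# Hypothetical helper: report where an InputSandbox entry is fetched from.
#   - local paths and file:// URIs are uploaded from the UI to the WMS node
#   - gsiftp:// and https:// URIs are downloaded on the WN by the JobWrapper
sandbox_source() {
    case "$1" in
        gsiftp://*|https://*) echo "wn-download" ;;
        file://*)             echo "wms-upload" ;;
        *://*)                echo "unknown-scheme" ;;
        *)                    echo "wms-upload" ;;
    esac
}

for entry in \
    "gsiftp://neo.datamat.it:2811/var/prg/sim.exe" \
    "https://ghemon.cnaf.infn.it:8443/data/idat_1" \
    "file:///home/pacio/myconf"
do
    printf '%s -> %s\n' "$entry" "$(sandbox_source "$entry")"
done
```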
‘Scattered’ Output Sandboxes
• The JDL has been enriched with new attributes for specifying the destinations of the files listed in the OutputSandbox attribute
    OutputSandbox = { "jobOutput", "run1/event1", "jobError" };
    OutputSandboxDestURI = {
      "gsiftp://matrix.datamat.it/var/jobOutput",
      "https://grid003.ct.infn.it:8443/home/cms/event1",
      "gsiftp://matrix.datamat.it/var/jobError"
    };
• A base URI to be applied to all sandbox files can also be specified
    OutputSandboxBaseDestURI = "gsiftp://neo.datamat.it/home/run1/";
• When the job has completed execution, the JobWrapper copies the files to the specified destinations without passing through the WMS node
Parametric job
• In a parametric job, multiple jobs are generated from a single JDL through the parametrization of one or more attributes
• Each generated JDL has a specific value of the parameter
• Parameters can be listed or enumerated
    InputSandbox = "input_PARAM_.txt";
    StdOutput = "myoutput_PARAM_.txt";
    StdError = "myerror_PARAM_.txt";
    Parameters = 2500;
    ParameterStep = 100;
    ParameterStart = 1000;
  (24 jobs generated)
• Job monitoring and management is always done through a unique jobID, as if the job were a single one (see collection), although it is possible to gather information on the individual jobs
Hands-on
MPI exercise
Create the following "mpi_glite.jdl" file in your home directory.
    [
    Type = "Job";
    JobType = "MPICH";
    Executable = "cpi";
    NodeNumber = 2;
    StdOutput = "cpi.out";
    StdError = "cpi.err";
    InputSandbox = {"cpi"};
    OutputSandbox = {"cpi.err", "cpi.out"};
    Requirements = Member("GLITE-1.4", other.GlueHostApplicationSoftwareRunTimeEnvironment);
    RetryCount = 0;
    ]
MPI submission
    [glite-tutor] /home/vardizzo > glite-job-submit -o id mpi_glite.jdl
    Selected Virtual Organisation name (from proxy certificate extension): gilda
    Connecting to host glite-rb.ct.infn.it, port 7772
    Logging to host glite-rb.ct.infn.it, port 9002
    ===== glite-job-submit Success =============
    The job has been successfully submitted to the Network Server.
    Use glite-job-status command to check job current status.
    Your job identifier is:
    - https://glite-rb.ct.infn.it:9000/bsrbbzbcXZWSzU3iUYlm6g
    The job identifier has been saved in the following file:
    /home/vardizzo/id
    =============================
MPI status and output
Query the status of the job using the following command:
    [glite-tutor] /home/vardizzo > glite-job-status -i id
    …
When the status of the job is "DONE", you can retrieve the output with the following command:
    [glite-tutor] /home/vardizzo > glite-job-output -i id
    …
WMProxy: submission & monitoring
• To submit a job with WMProxy, credential delegation is mandatory
    /home/giorgio > voms-proxy-info
    Couldn't find a valid proxy.
    /home/giorgio > voms-proxy-init --voms gilda
    [cut…]
    Your proxy is valid until Thu Apr 6 06:42:32 2006
    /home/giorgio > glite-wms-job-delegate-proxy -d delegationId
• The submission/monitoring commands are slightly different, but most of the "old" options are supported
    glite-wms-job-submit -d delegationId <other opts> jdl
    glite-job-output becomes glite-wms-job-output (note the added -wms-)
DAG job
• A DAG job is a set of jobs where the input, output or execution of one or more jobs depends on one or more other jobs
• Dependencies are represented through Directed Acyclic Graphs, where the nodes are jobs and the edges identify the dependencies
Figure: example graph with nodes nodeA, nodeB, nodeC, nodeF and nodeD
DAG jdl
    [
    type = "dag";
    max_nodes_running = 4;
    nodes = [
      nodeA = [ file = "nodes/nodeA.jdl"; ];
      nodeB = [ file = "nodes/nodeB.jdl"; ];
      nodeC = [ file = "nodes/nodeC.jdl"; ];
      nodeF = [ file = "nodes/nodeF.jdl"; ];
      nodeD = [ file = "nodes/nodeD.jdl"; ];
      dependencies = {
        {nodeA, nodeB}, {nodeA, nodeC}, {nodeA, nodeF},
        { {nodeB, nodeC, nodeF}, nodeD }
      }
    ];
    ]
• The node description can also be given inline here, instead of using a separate file
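One way to see what the dependencies attribute implies is to list its edges and ask for an execution order that respects all of them. The sketch below uses the standard tsort utility for this; it only illustrates the ordering constraint and is not how the WMS schedules DAG nodes. The function name dag_order is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read "parent child" pairs on standard input and
# print the nodes in an order that respects every dependency.
dag_order() {
    tsort
}

# The dependencies of the example DAG, one edge per line.
# { {nodeB, nodeC, nodeF}, nodeD } expands to three edges into nodeD.
edges="nodeA nodeB
nodeA nodeC
nodeA nodeF
nodeB nodeD
nodeC nodeD
nodeF nodeD"

printf '%s\n' "$edges" | dag_order
```

nodeA is always printed first and nodeD last; the relative order of nodeB, nodeC and nodeF is not constrained, which matches the fact that those three nodes may run concurrently.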
Example of a DAG job
(Out of order on gLite 1.4)
    [
    type = "dag";
    max_nodes_running = 2;
    InputSandbox = {"print_fathers.sh"};
    nodes = [
      NodeA = [
        description = [
          Executable = "/bin/hostname";
          StdOutput = "NodeA.out";
          StdError = "NodeA.err";
          OutputSandbox = {"NodeA.out", "NodeA.err"};
        ];
      ];
      NodeB = [
        description = [
          Executable = "/bin/sh";
          Arguments = "print_fathers.sh";
          StdOutput = "NodeB.out";
          StdError = "NodeB.err";
          InputSandbox = {"print_fathers.sh", root.nodes.NodeA.description.OutputSandbox[0]};
          OutputSandbox = {"NodeB.out", "NodeB.err"};
        ];
      ];
      dependencies = { {NodeA, NodeB} }
    ];
    ]
Job Collection
• A job collection is a set of independent jobs that the user wants to submit and monitor as a single request
• The jobs of a collection are submitted as DAG nodes without dependencies
• The JDL is a list of ClassAds, each describing a sub-job
    [
    Type = "collection";
    VirtualOrganisation = "gilda";
    nodes = {
      [ <job descr 1> ],
      [ <job descr 2> ],
      …
    };
    ]
GridFTP basic use
• Upload TO a GridFTP server
    globus-url-copy file:<local absolute path> gsiftp://<gsiftp server>/<remote absolute path>
  Ex:
    globus-url-copy file:/home/madrid01/file.txt gsiftp://glite-rb.ct.infn.it/tmp/file.txt
• Download FROM a GridFTP server
    globus-url-copy gsiftp://<gsiftp server>/<remote absolute path> file:<local absolute path>
  Ex:
    globus-url-copy gsiftp://glite-rb.ct.infn.it/tmp/file.txt file:/home/madrid01/file.txt
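Both forms require absolute paths on each side. A small dry-run helper can enforce that before anything is transferred; the sketch below only prints the command and never contacts a server. The function name gridftp_upload_cmd is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print (without running) the globus-url-copy
# command that uploads a local file to a GridFTP server.
# Usage: gridftp_upload_cmd <local abs path> <server> <remote abs path>
gridftp_upload_cmd() {
    # Both paths must be absolute, as in the usage shown above.
    case "$1" in
        /*) ;;
        *)  echo "local path must be absolute" >&2; return 1 ;;
    esac
    case "$3" in
        /*) ;;
        *)  echo "remote path must be absolute" >&2; return 1 ;;
    esac
    echo "globus-url-copy file:$1 gsiftp://$2$3"
}

gridftp_upload_cmd /home/madrid01/file.txt glite-rb.ct.infn.it /tmp/file.txt
```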
Job collection examples
    [
    Type = "collection";
    InputSandbox = {"start_hostname.sh"};
    RetryCount = 0;
    nodes = {
      [
        Executable = "/bin/sh";
        StdOutput = "host.out";
        StdError = "host.err";
        InputSandbox = root.InputSandbox;
        OutputSandbox = {"host.err", "host.out"};
        OutputSandboxDestURI = {"gsiftp://glite-rb.ct.infn.it:2811/tmp/host.out", "host.err"};
        Arguments = "start_hostname.sh";
      ],
      [
        Executable = "/bin/sh";
        StdOutput = "test.out";
        StdError = "test.err";
        InputSandbox = {"starter.sh", "gsiftp://glite-rb.ct.infn.it:2811/tmp/t01.txt"};
        OutputSandbox = {"test.err", "test.out"};
        Arguments = "starter.sh";
      ],
      [
        file = "hostname.jdl";
      ]
    };
    ]
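The third node above references an external JDL file. Building on that form, a collection can be assembled from every JDL file in a directory. This is a minimal sketch of the idea, not the client's own generator: the function name make_collection is hypothetical and the demonstration uses empty placeholder files.

```shell
#!/bin/sh
# Hypothetical helper: print a collection JDL whose nodes reference
# every *.jdl file found in the given directory.
make_collection() {
    dir=$1
    echo "["
    echo "  Type = \"collection\";"
    echo "  nodes = {"
    first=yes
    for f in "$dir"/*.jdl; do
        [ -e "$f" ] || continue          # directory without .jdl files
        # Separate nodes with a comma, except before the first one.
        if [ -n "$first" ]; then first=""; else printf ',\n'; fi
        printf '    [ file = "%s"; ]' "$f"
    done
    echo ""
    echo "  };"
    echo "]"
}

# Demonstration with a temporary directory holding two placeholder files.
demo_dir=$(mktemp -d)
: > "$demo_dir/a.jdl"
: > "$demo_dir/b.jdl"
make_collection "$demo_dir"
```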
Parametric Job
• A parametric job is a job in which one or more attributes are parametrized
• The values of those attributes vary according to the parameter
    [
    JobType = "Parametric";
    Executable = "/bin/echo";
    Arguments = "_PARAM_";
    #InputSandbox = "input_PARAM_.txt";
    StdOutput = "myoutput_PARAM_.txt";
    StdError = "myerror_PARAM_.txt";
    Parameters = 2500;
    ParameterStep = 100;
    ParameterStart = 1000;
    OutputSandbox = {"myoutput_PARAM_.txt"};
    ]
• Job monitoring and management is always done through a unique jobID, as if the job were a single one (see the submission of a collection)
Parametric job / 2
• The parameter can also be a list of strings
• The InputSandbox (if present) has to be consistent with the parameters
    [ui-test] /home/giorgio/param > cat param2.jdl
    [
    JobType = "Parametric";
    Executable = "/bin/cat";
    Arguments = "input_PARAM_.txt";
    InputSandbox = "input_PARAM_.txt";
    StdOutput = "myoutput_PARAM_.txt";
    StdError = "myerror_PARAM_.txt";
    Parameters = {earth, moon, mars};
    OutputSandbox = {"myoutput_PARAM_.txt"};
    ]
    [ui-test] /home/giorgio/param > ls
    inputEARTH.txt inputMARS.txt inputMOON.txt param2.jdl
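To preview which file names a list of parameters would produce, the _PARAM_ placeholder can be expanded by hand. The sketch below assumes the substitution is purely textual, which is an assumption about the WMS behaviour rather than something stated in these slides; the function name expand_param is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: replace every _PARAM_ in a template with each
# of the given values, printing one result per line.
# Usage: expand_param <template> <value>...
expand_param() {
    template=$1
    shift
    for value in "$@"; do
        # Values are assumed to contain no characters special to sed.
        printf '%s\n' "$template" | sed "s/_PARAM_/$value/g"
    done
}

expand_param "input_PARAM_.txt" earth moon mars
```

Under that assumption the generated names follow the parameter strings exactly as written, so the files placed in the submission directory need to match them.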
THE END