JSS Job Submission Service
Marco Verlato, Massimo Sgaravatto
INFN Padova

Current activities
- Analysis of Condor-G
- Development of a wrapper around Condor-G

Analysis of Condor-G
So far the following bugs/missing functionalities, with respect to the functionalities required for the PM 9 release, have been identified:
1. Support for the x509userproxy attribute
   - Since the JSS must submit jobs on behalf of different users
2. "Start running" event missing for very short jobs
3. Info about failure of a submission to a Globus resource missing in the job log file (only present in the gridmanager log file)
   - Necessary if we want to exploit libcondorapi.a to parse the log file
4. Info about when a job has been successfully sent to a Globus resource missing
   - Only the events "submitted", "running" and "completed" are recorded
   - Necessary to notify the L&B service
5. Support for refresh of user credential
6. API for condor_submit and condor_rm

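To make items 2 to 4 concrete, here is a minimal sketch of pulling the three recorded events out of a job log. It is not the libcondorapi.a approach named on the slide: it swaps in a plain regular expression, and it assumes the text user-log layout in which each event header starts with a three-digit code followed by the job id in parentheses. The function name `parse_events` and the code-to-name table are illustrative assumptions.

```python
import re

# Assumed mapping from user-log event codes to the three events the slide
# says are recorded; any other code is ignored by this sketch.
EVENT_NAMES = {"000": "submitted", "001": "running", "005": "completed"}

# Assumed header layout: "<code> (<cluster>.<proc>.<subproc>) <date> <time> ..."
HEADER = re.compile(r"^(\d{3}) \((\d+)\.(\d+)\.\d+\) ")


def parse_events(log_text):
    """Return a list of (condor_job_id, event_name) in log order."""
    events = []
    for line in log_text.splitlines():
        match = HEADER.match(line)
        if match and match.group(1) in EVENT_NAMES:
            code, cluster, proc = match.groups()
            job_id = f"{int(cluster)}.{int(proc)}"
            events.append((job_id, EVENT_NAMES[code]))
    return events


# Illustrative log text, not taken from a real run.
SAMPLE_LOG = """\
000 (012.000.000) 03/05 10:00:00 Job submitted from host: <10.0.0.1:1234>
...
001 (012.000.000) 03/05 10:00:05 Job executing on host: <10.0.0.2:1234>
...
005 (012.000.000) 03/05 10:00:09 Job terminated.
...
"""
```

A very short job whose "running" event is missing (item 2) would show up here as a "submitted" entry followed directly by "completed", so a consumer of this list should not assume all three events are present.
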
Analysis of Condor-G
This list of desiderata was submitted to the Condor team (to understand if, how and when they can address these issues, and how we could contribute)
Response of the Condor team
- 1, 2: hopefully addressed
  - Fixes just received: they seem to work
- 3, 4: they are going to address them in a short time
- 5: Problem!
  - The Globus GRAM API doesn't have a real way to refresh the x509 proxy
  - Functionality requested from the Globus team by the Condor team, but it is not clear if and how they are planning to address the problem
- 6: Not in the short term
  - We can survive without the condor_submit and condor_rm API (not too "elegant", but …)
- Access to the source code

Development of JSS
- Architecture
  - 2 processes
    - 1 process listening for incoming client (RB) requests for job submission/cancel
      - Client-server communications between RB and JSS achieved by means of API calls
        - Set of client APIs provided from JSS to RB for job submission and job cancel
        - API provided from RB to JSS for asynchronous notifications
    - 1 process parsing the job log file

Job submission
- Receive from RB the JDL expression (augmented with GlobusResourceContactString + QueueName + LocalPathname) + InputSandboxDir + OutputSandboxDir
- Build the job wrapper script
- Build the Condor-G submit file
  - Value for attribute x509userproxy missing
    - Need to agree on and implement a solution for user credential delegation (see CESNET's proposals on MyProxy)
- Issue condor_submit
- Save persistently info about job (dg_jobId, condor_jobId, ..)
- Notify RB (JOB_ACCEPTED/JOB_REFUSED)
- Notify L&B (JSSAcceptedEvent/JSSRefusedEvent)

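The "build the Condor-G submit file" step can be sketched as below. This is only an illustration under stated assumptions, not the JSS implementation: the `SubmitRequest` class and `build_submit_file` function are hypothetical, the submit commands shown (`universe = globus`, `globusscheduler`, `executable`, `log`, `x509userproxy`, `queue`) are assumed to be the ones the wrapper would emit, and QueueName and sandbox handling are left out. The proxy line is written only when a proxy path is known, which mirrors the open delegation issue noted above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SubmitRequest:
    """Hypothetical container for what the RB hands to the JSS."""
    dg_job_id: str
    globus_resource_contact_string: str
    wrapper_path: str                      # job wrapper script built earlier
    log_path: str                          # job log file parsed by the JSS
    user_proxy_path: Optional[str] = None  # unresolved until delegation is agreed


def build_submit_file(req: SubmitRequest) -> str:
    """Return the text of a Condor-G submit file for one job."""
    lines = [
        "universe = globus",
        f"globusscheduler = {req.globus_resource_contact_string}",
        f"executable = {req.wrapper_path}",
        f"log = {req.log_path}",
    ]
    # Only emitted once a per-user proxy is available to the JSS.
    if req.user_proxy_path is not None:
        lines.append(f"x509userproxy = {req.user_proxy_path}")
    lines.append("queue")
    return "\n".join(lines) + "\n"
```

The resulting text would be written to a file and passed to condor_submit; the host name and paths used to build a request are up to the caller.
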
Job removal
- Receive from RB the list of dg_jobIds to remove
- For each job
  - Find the corresponding condor_jobId
  - Issue condor_rm

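The removal loop above can be sketched as follows, assuming the persistently saved job info is available as a simple mapping from dg_jobId to condor_jobId. The function name `removal_commands` and the in-memory dictionary are hypothetical; the sketch only builds the condor_rm command lines and does not run them.

```python
def removal_commands(dg_job_ids, job_table):
    """Map each dg_jobId to a condor_rm command line.

    job_table is an assumed dict {dg_jobId: condor_jobId}. Returns a pair
    (commands, unknown): the argv lists to execute, and the dg_jobIds for
    which no condor_jobId was found.
    """
    commands = []
    unknown = []
    for dg_id in dg_job_ids:
        condor_id = job_table.get(dg_id)
        if condor_id is None:
            unknown.append(dg_id)
        else:
            commands.append(["condor_rm", condor_id])
    return commands, unknown
```

Returning the unknown ids separately lets the caller decide how to report them back to the RB, which the slides do not specify.
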
What is missing/issues
- Condor-G missing functionalities (in particular 3)
- Process for parsing the job log file
  - Cancelling the job if failures > RetryCount
  - Mail to UserContact when job starts running
  - Notifying the RB (JOB_DONE, JOB_CANCELLED, JOB_ABORTED)
  - Notifying the L&B service (JSSTransferEvent)
- Testing
- Improve implementation
- Issues
  - Refresh of user credential

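The "cancel if failures > RetryCount" rule for the log-parsing process can be sketched as a small counter. The class name `RetryTracker` and its method are hypothetical, and how a failure is detected in the log is left open, since that depends on missing functionality 3.

```python
from collections import defaultdict


class RetryTracker:
    """Counts failures per dg_jobId and applies the RetryCount rule."""

    def __init__(self):
        self.failures = defaultdict(int)

    def record_failure(self, dg_job_id, retry_count):
        """Register one failure; return True if the job should be cancelled.

        Follows the slide's condition literally: cancel only when the
        number of failures is strictly greater than RetryCount.
        """
        self.failures[dg_job_id] += 1
        return self.failures[dg_job_id] > retry_count
```

With RetryCount set to 2, the first two failures of a job leave it alive and the third one triggers cancellation; counters for different jobs are independent.
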