Stork: An Introduction
Condor Week 2006, Milan
Condor Project, Computer Sciences Department, University of Wisconsin-Madison
http://www.cs.wisc.edu/condor

Two Main Ideas
• Make data transfers a “first class citizen” in Condor
• Reuse items in the Condor toolbox

The tools
• ClassAds
• Matchmaking
• DAGMan

The data transfer problem
• Process large data sets at sites on the grid. For each data set:
  o stage in data from remote server
  o run CPU data processing job
  o stage out data to remote server

Simple Data Transfer Job

    #!/bin/sh
    globus-url-copy source dest

Often works fine for short, simple data transfers, but…

What can go wrong?
• Too many transfers at one time
• Service down; need to try later
• Service down; need to try alternate data source
• Partial transfers
• Time out; not worth waiting anymore
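The "try later" and "try alternate data source" cases above can be sketched in a few lines of Python. This is only an illustration of the failure-handling idea, not how Stork is implemented: `transfer_with_fallback`, the `copy` callable, and the mirror URL are hypothetical names introduced here, and `copy` stands in for a real transfer tool such as globus-url-copy.

```python
import time
from typing import Callable, Sequence


def transfer_with_fallback(
    sources: Sequence[str],
    dest: str,
    copy: Callable[[str, str], bool],
    max_attempts: int = 3,
    retry_delay: float = 0.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try each source in turn, for up to max_attempts rounds.

    Returns the source URL that succeeded, or raises RuntimeError.
    `copy` is a hypothetical stand-in for a real transfer tool; it
    returns True on success and False on failure.
    """
    for attempt in range(max_attempts):
        for src in sources:
            if copy(src, dest):
                return src
        # Every source failed in this round: wait, then try again later.
        if attempt < max_attempts - 1:
            sleep(retry_delay)
    raise RuntimeError(f"all sources failed after {max_attempts} attempts")


# Demo with a fake copy function: the primary server is "down",
# the (hypothetical) mirror succeeds.
def fake_copy(src: str, dest: str) -> bool:
    return src.startswith("gsiftp://mirror")


chosen = transfer_with_fallback(
    ["gsiftp://primary/path", "gsiftp://mirror/path"],
    "file:///dir/file",
    fake_copy,
)
print(chosen)
```

Writing this logic by hand for every transfer script is the kind of repetition a dedicated data placement scheduler is meant to remove.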

Stork
• What Schedd is to CPU jobs, Stork is to data placement jobs.
  o Job queue
  o Flow control
  o Failure-handling policies
  o Event log

Supported Data Transfers
• local file system
• GridFTP
• HTTP
• SRB
• NeST
• SRM
• other protocols via simple plugin

Stork Commands

    stork_submit  - submit a job
    stork_q       - list the job queue
    stork_status  - show completion status
    stork_rm      - cancel a job

Creating a Submit Description File
• A plain ASCII text file
• Tells Stork about your job:
  o source/destination
  o alternate protocols
  o proxy location
  o debugging logs
  o command-line arguments

Simple Submit File

    // c++ style comment lines
    [
        dap_type = "transfer";
        src_url = "gsiftp://server/path";
        dest_url = "file:///dir/file";
        x509proxy = "default";
        log = "stage-in.out.log";
        output = "stage-in.out";
        err = "stage-in.out.err";
    ]

Note: different format from Condor submit files
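When many transfers differ only in their URLs, a submit description like the one above can be generated rather than typed. A minimal Python sketch follows, assuming only the bracketed `name = "value";` layout shown on the slide; `render_submit` is a hypothetical helper, not a Stork tool, and it does not escape quotes inside values.

```python
def render_submit(attrs: dict) -> str:
    """Render a dict as a bracketed list of `name = "value";` lines.

    Hypothetical helper: mirrors the layout of the slide's example only,
    and does not escape double quotes inside values.
    """
    lines = ["["]
    for name, value in attrs.items():
        lines.append(f'    {name} = "{value}";')
    lines.append("]")
    return "\n".join(lines) + "\n"


# Attribute names and values are the ones from the slide's example.
stage_in = {
    "dap_type": "transfer",
    "src_url": "gsiftp://server/path",
    "dest_url": "file:///dir/file",
    "x509proxy": "default",
    "log": "stage-in.out.log",
    "output": "stage-in.out",
    "err": "stage-in.out.err",
}

text = render_submit(stage_in)
print(text)
```

The resulting text could be written to a file such as stage-in.stork and passed to stork_submit.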

Sample stork_submit

    # stork_submit stage-in.stork
    using default proxy: /tmp/x509up_u19100
    ========
    Sending request:
    [
        dest_url = "file:///dir/file";
        src_url = "gsiftp://server/path";
        err = "path/stage-in.out.err";
        output = "path/stage-in.out";
        dap_type = "transfer";
        log = "path/stage-in.out.log";
        x509proxy = "default"
    ]
    ========
    Request assigned id: 1    # returned job id
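A script that submits jobs usually needs the returned job id for later status queries or removal. The sketch below pulls it out of captured stork_submit output. It assumes the output always contains a line of the form `Request assigned id: N`, which is taken only from the sample above; `parse_job_id` is a hypothetical helper, and the bracketed request is abbreviated in the sample string.

```python
import re
from typing import Optional


def parse_job_id(output: str) -> Optional[int]:
    """Return the job id from stork_submit output, or None if absent.

    Assumes a line of the form 'Request assigned id: N', as in the
    sample on the slide.
    """
    match = re.search(r"^Request assigned id:\s*(\d+)", output, re.MULTILINE)
    return int(match.group(1)) if match else None


# Captured output, shortened from the slide's sample.
sample = (
    "using default proxy: /tmp/x509up_u19100\n"
    "========\n"
    "Sending request:\n"
    '[ dap_type = "transfer"; x509proxy = "default" ]\n'
    "========\n"
    "Request assigned id: 1\n"
)

job_id = parse_job_id(sample)
print(job_id)
```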

Sample Stork User Log

    000 (001.-01) 04/17 19:30:00 Job submitted from host: <128.105.121.
    ...
    001 (001.-01) 04/17 19:30:01 Job executing on host: <128.105.121.53
    ...
    008 (001.-01) 04/17 19:30:01 job type: transfer
        (001.-01) 04/17 19:30:01 src_url: gsiftp://server/path
        (001.-01) 04/17 19:30:01 dest_url: file:///dir/file
    ...
    005 (001.-01) 04/17 19:30:02 Job terminated.
        (1) Normal termination (return value 0)
            Usr 0 00:00, Sys 0 00:00 - Run Remote Usage
            Usr 0 00:00, Sys 0 00:00 - Run Local Usage
            Usr 0 00:00, Sys 0 00:00 - Total Remote Usage
            Usr 0 00:00, Sys 0 00:00 - Total Local Usage
        0 - Run Bytes Sent By Job
        0 - Run Bytes Received By Job
        0 - Total Bytes Sent By Job
        0 - Total Bytes Received By Job
    ...

Who needs Stork?
SRM exists. It provides a job queue, logging, etc. Why not use that?

Use whatever makes sense!
• Another way to view Stork: glue between DAGMan and the data transport or transport scheduler.
• So one DAG can describe a workflow, including both data movement and computation steps.

Stork jobs in a DAG
• A DAG is defined by a text file, listing each job and its dependents:

    # data-process.dag
    Data IN in.stork
    Job CRUNCH crunch.condor
    Data OUT out.stork
    Parent IN Child CRUNCH
    Parent CRUNCH Child OUT

• Each node will run the Condor or Stork job specified by the accompanying submit file.
Diagram: IN → CRUNCH → OUT
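To make the ordering implied by the Parent/Child lines concrete, the sketch below parses the example DAG text and computes one valid run order with a standard topological sort (Kahn's algorithm). It only illustrates the dependency idea; it is not DAGMan's parser, it handles just the Data, Job, and Parent ... Child lines used in the example, and `parse_dag` and `run_order` are hypothetical names.

```python
from collections import defaultdict, deque

DAG_TEXT = """\
# data-process.dag
Data IN in.stork
Job CRUNCH crunch.condor
Data OUT out.stork
Parent IN Child CRUNCH
Parent CRUNCH Child OUT
"""


def parse_dag(text):
    """Return (nodes, children) for the small DAG subset used above."""
    nodes = {}                    # node name -> (kind, submit file)
    children = defaultdict(list)  # parent name -> list of child names
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        words = line.split()
        keyword = words[0].lower()
        if keyword in ("data", "job"):
            nodes[words[1]] = (keyword, words[2])
        elif keyword == "parent":
            split = [w.lower() for w in words].index("child")
            for parent in words[1:split]:
                children[parent].extend(words[split + 1:])
    return nodes, dict(children)


def run_order(nodes, children):
    """Kahn's algorithm: a node becomes ready once all its parents are done."""
    indegree = {name: 0 for name in nodes}
    for kids in children.values():
        for kid in kids:
            indegree[kid] += 1
    ready = deque(name for name, deg in indegree.items() if deg == 0)
    order = []
    while ready:
        name = ready.popleft()
        order.append(name)
        for kid in children.get(name, []):
            indegree[kid] -= 1
            if indegree[kid] == 0:
                ready.append(kid)
    if len(order) != len(nodes):
        raise ValueError("cycle detected: not a DAG")
    return order


nodes, children = parse_dag(DAG_TEXT)
for name in run_order(nodes, children):
    kind, submit_file = nodes[name]
    print(name, kind, submit_file)
```

For the example file this prints IN, then CRUNCH, then OUT: the stage-in transfer, the CPU job, and the stage-out transfer.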

Important Stork Parameters
• STORK_MAX_NUM_JOBS limits number of active jobs
• STORK_MAX_RETRY limits job attempts, before job marked as failed
• STORK_MAXDELAY_INMINUTES specifies “hung job” threshold
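How the three limits interact can be seen in a toy scheduling loop. The Python sketch below is an assumption-laden model, not Stork's implementation: the `Job` class and `tick` function are hypothetical, the arguments `max_num_jobs`, `max_retry`, and `max_delay_minutes` merely play the roles of the three parameters, and the exact semantics (for example, whether the retry limit counts the first attempt) are guesses. In the demo every job hangs, so each one is stopped at the threshold and eventually marked failed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Job:
    name: str
    attempts: int = 0
    started_at: Optional[float] = None  # minutes; None when not running
    state: str = "idle"                 # "idle", "running", or "failed"


def tick(jobs: List[Job], now: float, max_num_jobs: int,
         max_retry: int, max_delay_minutes: float) -> None:
    """One scheduling pass of the toy queue at time `now` (in minutes)."""
    # 1. Jobs running longer than the threshold are treated as hung.
    #    They go back to idle, or to failed once attempts are used up.
    for job in jobs:
        if job.state == "running" and now - job.started_at > max_delay_minutes:
            job.started_at = None
            job.state = "failed" if job.attempts >= max_retry else "idle"
    # 2. Start idle jobs while staying under the active-job limit.
    active = sum(1 for job in jobs if job.state == "running")
    for job in jobs:
        if active >= max_num_jobs:
            break
        if job.state == "idle":
            job.state = "running"
            job.attempts += 1
            job.started_at = now
            active += 1


jobs = [Job("a"), Job("b"), Job("c")]
for now in (0, 11, 22):
    tick(jobs, now, max_num_jobs=2, max_retry=2, max_delay_minutes=10)
    print(now, [(j.name, j.state, j.attempts) for j in jobs])
```

At time 0 only two jobs start because of the active-job limit; at 11 both are judged hung and retried; at 22 they have used both attempts and are marked failed, which finally frees a slot for the third job.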

Features in Development
Matchmaking
  o Job ClassAd with site ClassAd
  o Global max transfers per site limits
  o Load balancing across sites
  o Dynamic reconfiguration of sites
  o Coordination of multiple instances of Stork
Working prototype developed with Globus GridFTP team
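The matchmaking items above (matching a job against sites, per-site transfer limits, load balancing) can be pictured with a small Python sketch. Everything here is hypothetical: plain dicts stand in for ClassAds, and `pick_site`, the site names, and the attribute names are invented for illustration. It does not describe the prototype mentioned on the slide.

```python
def pick_site(job, sites, active):
    """Return the name of the least-loaded site that can take the job, or None.

    job:    dict with a "protocol" key (stand-in for a job ClassAd)
    sites:  list of dicts with "name", "protocols", "max_transfers"
            (stand-ins for site ClassAds)
    active: dict mapping site name -> transfers currently running there
    """
    candidates = [
        site for site in sites
        if job["protocol"] in site["protocols"]
        and active.get(site["name"], 0) < site["max_transfers"]
    ]
    if not candidates:
        return None
    # Load balancing: prefer the site using the smallest share of its limit.
    best = min(
        candidates,
        key=lambda s: active.get(s["name"], 0) / s["max_transfers"],
    )
    return best["name"]


sites = [
    {"name": "siteA", "protocols": {"gsiftp", "http"}, "max_transfers": 2},
    {"name": "siteB", "protocols": {"gsiftp"}, "max_transfers": 4},
]
active = {"siteA": 1, "siteB": 1}

print(pick_site({"protocol": "gsiftp"}, sites, active))
print(pick_site({"protocol": "http"}, sites, active))
```

With these made-up numbers the gsiftp job goes to siteB (1 of 4 slots in use) and the http job to siteA, the only site that supports the protocol.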

Further Ahead
• Automatic startup of personal Stork server on demand
• Fair sharing between users
• Fit into new pluggable scheduling framework à la schedd-on-the-side

Summary
• Stork manages a job queue for data transfers.
• A DAG may describe a workflow containing both data movement and processing steps.

Additional Resources
• http://www.cs.wisc.edu/condor/stork/
• Condor Manual, Stork sections
• stork-announce@cs.wisc.edu list
• stork-discuss@cs.wisc.edu list