RISICO on the GRID architecture: First implementation
Mirko D'Andrea, Stefano Dal Pra

Outline of the presentation
➲ Porting features;
➲ Job management;
➲ Implementation tests and results;
➲ Conclusions and further development.

Porting features
➲ Entirely implemented in Python.
➲ Uses the same executable as the RISICO system (no changes needed).
➲ Easily configurable through a configuration file.
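
The slides do not show the configuration file itself. A minimal sketch of how such a file could be read, assuming an INI-style format parsed with Python's standard `configparser`; every section and key name below (`risico`, `catalog`, `jobs`, `n_jobs`, `timeout_minutes`, `max_retries`, ...) is hypothetical and only illustrates the idea.

```python
import configparser

# Hypothetical configuration: section and key names are illustrative,
# not taken from the actual GRID-RISICO code.
EXAMPLE_CONFIG = """
[risico]
executable = ./RISICO

[catalog]
input_dir = input
output_dir = output
domain_dir = celle

[jobs]
n_jobs = 30
timeout_minutes = 60
max_retries = 3
"""


def load_config(text: str) -> dict:
    """Parse the INI text and return the settings with proper types."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        "executable": parser.get("risico", "executable"),
        "input_dir": parser.get("catalog", "input_dir"),
        "output_dir": parser.get("catalog", "output_dir"),
        "domain_dir": parser.get("catalog", "domain_dir"),
        "n_jobs": parser.getint("jobs", "n_jobs"),
        "timeout_minutes": parser.getint("jobs", "timeout_minutes"),
        "max_retries": parser.getint("jobs", "max_retries"),
    }


if __name__ == "__main__":
    settings = load_config(EXAMPLE_CONFIG)
    print(settings["n_jobs"], settings["domain_dir"])
```

Keeping job count, time limit and catalog layout in one file means the same unchanged RISICO executable can be run with a different grid setup without touching code.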

The RISICO system
➲ Italy: 310,000 km²
➲ Current system: 300 k regular cells, 1 km side.
➲ Grid version: 30 M regular cells, 0.1 km side.
[Figure: GRIDIFICATION]
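
The two cell counts follow from the area and the cell side: shrinking the side tenfold multiplies the number of cells by 100. A quick check, assuming the square cells exactly tile the 310,000 km² area (the slide rounds the results to 300 k and 30 M):

```python
ITALY_AREA_KM2 = 310_000


def cell_count(area_km2: float, side_km: float) -> int:
    """Square cells of the given side needed to tile the area (rounded)."""
    return round(area_km2 / side_km ** 2)


current = cell_count(ITALY_AREA_KM2, 1.0)   # 1 km side
grid = cell_count(ITALY_AREA_KM2, 0.1)      # 0.1 km side
print(current, grid, grid // current)       # 310000 31000000 100
```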

RISICO vs GRID-RISICO
RISICO:
   Get input from database → Run RISICO → Write output to database
GRID-RISICO (GRIDIFICATION):
   Get input from database → Upload input into catalog → Create n jobs
   JOB i (i = 1, ..., n): Get input from catalog → Run RISICO on dataset i → Write output i to catalog
   Collect outputs from catalog → Write outputs to database
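
The GRID-RISICO branch is a scatter/gather pattern: one shared input, n independent runs on disjoint datasets, then a merge. A minimal local sketch of that control flow, using a `concurrent.futures` thread pool as a stand-in for grid worker nodes and a plain dict as a stand-in for the catalog; `run_risico` is a hypothetical placeholder for the unchanged RISICO executable, and cells are assumed to be computed independently (as the per-subset decomposition implies).

```python
from concurrent.futures import ThreadPoolExecutor


def run_risico(dataset: list, shared_input: dict) -> list:
    """Hypothetical stand-in for the RISICO executable: one value per cell."""
    return [cell * shared_input["factor"] for cell in dataset]


def grid_risico(datasets: list, shared_input: dict) -> list:
    catalog = {"input": shared_input}  # "Upload input into catalog"

    def job(i: int) -> None:
        inp = catalog["input"]  # "Get input from catalog"
        # "Run RISICO on dataset i" and "Write output i to catalog"
        catalog[f"output_{i}"] = run_risico(datasets[i], inp)

    # "Create n jobs" and wait for all of them (list() re-raises job errors)
    with ThreadPoolExecutor(max_workers=len(datasets)) as pool:
        list(pool.map(job, range(len(datasets))))

    # "Collect outputs from catalog", in dataset order
    merged = []
    for i in range(len(datasets)):
        merged.extend(catalog[f"output_{i}"])
    return merged


if __name__ == "__main__":
    cells = list(range(10))
    datasets = [cells[0:4], cells[4:7], cells[7:10]]
    print(grid_risico(datasets, {"factor": 2}))
```

Because the subsets are disjoint and ordered, the merged result equals a single run on the whole domain, which is what makes the unmodified executable reusable.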

Job submission
➲ A RISICO job is fully defined by a JDL (Job Description Language) file and by a parameters file.
➲ Each submitted job must terminate successfully within a defined time. The job activity is monitored by a software module called Job.Monitor.
➲ The job submission procedure is handled by a Job.Submitter, which creates a set of jobs and associates a Job.Monitor with each job.
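
A sketch of the two files a submitter could write per job, assuming gLite-style JDL (the `Executable`, `Arguments`, `StdOutput`, `StdError`, `InputSandbox` and `OutputSandbox` attributes are standard gLite JDL). The wrapper script `run_risico.sh`, the file names and the `key = value` parameters format are hypothetical; the slides only say that a JDL and a parameters file fully define a job.

```python
import tempfile
from pathlib import Path


def write_job_files(job_id: int, workdir: Path,
                    wrapper: str = "run_risico.sh") -> Path:
    """Write the JDL and parameters file for one job; return the JDL path.

    The wrapper script name and the parameter keys are hypothetical.
    """
    tag = f"{job_id:02d}"
    params_name = f"params_{tag}.txt"
    # Parameters file: associates this job with its dataset (spatial subset).
    (workdir / params_name).write_text(f"job = {tag}\ndataset = celle_{tag}\n")
    # JDL built from standard gLite attributes; the wrapper gets the params file.
    jdl_lines = [
        f'Executable = "{wrapper}";',
        f'Arguments = "{params_name}";',
        'StdOutput = "std.out";',
        'StdError = "std.err";',
        f'InputSandbox = {{"{wrapper}", "{params_name}"}};',
        'OutputSandbox = {"std.out", "std.err"};',
    ]
    jdl_path = workdir / f"job_{tag}.jdl"
    jdl_path.write_text("\n".join(jdl_lines) + "\n")
    return jdl_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(write_job_files(1, Path(d)).read_text())
```

The bulky data (domain, status, meteo input) is not shipped in the sandbox: in this design the job fetches it from the catalog, so the sandbox only carries the small wrapper and parameters file.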

Job Monitoring
➲ All the jobs are monitored by instances of a module called Job.Monitor.
➲ The Job.Monitor:
   checks the job status during execution;
   retrieves the job output from the catalog;
   tries to resubmit the job if it fails;
   logs the error if the job fails to run correctly.
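
The monitoring rules above amount to a poll/timeout/retry loop. A minimal sketch, with a hypothetical `JobMonitor` class standing in for the Job.Monitor module; `submit` and `status` are assumed callables wrapping the grid middleware, and the status strings `"Done"` and `"Failed"` and the retry limit are assumptions, not taken from the slides.

```python
import logging
import time
from typing import Callable

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("JobMonitor")


class JobMonitor:
    """Watch one job: poll its status, resubmit on failure, log the outcome.

    `submit` and `status` are hypothetical callables wrapping the grid
    middleware; the status strings "Done" and "Failed" are assumptions.
    """

    def __init__(self, submit: Callable[[], str], status: Callable[[str], str],
                 timeout_s: float, max_retries: int = 3, poll_s: float = 0.0):
        self.submit = submit
        self.status = status
        self.timeout_s = timeout_s
        self.max_retries = max_retries
        self.poll_s = poll_s

    def run(self) -> bool:
        """Return True once a submission reaches "Done", else False."""
        for attempt in range(1, self.max_retries + 2):
            job_id = self.submit()
            deadline = time.monotonic() + self.timeout_s
            state = self.status(job_id)
            # Poll until the job finishes or exceeds its allowed time.
            while state not in ("Done", "Failed") and time.monotonic() < deadline:
                time.sleep(self.poll_s)
                state = self.status(job_id)
            if state == "Done":
                return True  # output can now be fetched from the catalog
            # Failed or timed out: log it and resubmit (a real monitor would
            # also cancel a timed-out job before resubmitting).
            log.warning("job %s ended as %s (attempt %d)", job_id, state, attempt)
        log.error("giving up after %d attempts", self.max_retries + 1)
        return False


if __name__ == "__main__":
    submitted = []

    def fake_submit() -> str:
        submitted.append(f"job-{len(submitted) + 1}")
        return submitted[-1]

    def fake_status(job_id: str) -> str:
        return "Failed" if job_id == "job-1" else "Done"  # first try fails

    print(JobMonitor(fake_submit, fake_status, timeout_s=5).run(), len(submitted))
```

A job still running at its deadline is treated like a failure, which is one way to enforce the rule that every job must terminate within a defined time.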

Workflow: job creation, submission and data collection
➲ Downloads the input from the remote meteo-database, creates an archive and uploads it to the catalog;
➲ Creates a JDL and a parameters file for each job;
➲ Submits the jobs;
➲ Waits for the jobs' output;
➲ Gets the jobs' output from the catalog and aggregates it.
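
The first workflow step packs the meteo input into one archive, so that every job downloads a single shared file (the catalog paths on the job-definition slides end in `.tar.bz2`). A minimal sketch with Python's standard `tarfile`, assuming the meteo fields are already local files; the download from the meteo-database and the catalog upload are left out, and the field file name is hypothetical.

```python
import tarfile
import tempfile
from pathlib import Path


def pack_input(src_dir: Path, out_dir: Path, date: str) -> Path:
    """Pack the files in src_dir into out_dir/input_<date>.tar.bz2."""
    archive = out_dir / f"input_{date}.tar.bz2"
    with tarfile.open(archive, "w:bz2") as tar:
        for f in sorted(src_dir.iterdir()):
            if f.is_file():
                tar.add(f, arcname=f.name)  # flat layout inside the archive
    return archive


def unpack_input(archive: Path, dest: Path) -> list:
    """Job side: extract regular files by base name only; return their names."""
    names = []
    with tarfile.open(archive, "r:bz2") as tar:
        for member in tar.getmembers():
            if member.isfile():
                name = Path(member.name).name  # drop any directory part
                (dest / name).write_bytes(tar.extractfile(member).read())
                names.append(name)
    return sorted(names)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        root = Path(d)
        src, job_dir = root / "meteo", root / "job"
        src.mkdir()
        job_dir.mkdir()
        (src / "temperature.txt").write_text("12.5\n")  # hypothetical field
        archive = pack_input(src, root, "20070119")
        print(archive.name, unpack_input(archive, job_dir))
```

Extracting by base name only keeps a job from writing outside its working directory even if an archive member carries an unexpected path.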

Job definition (1)
➲ Each job works with a specific dataset defining a spatial domain (subset).
➲ Such subsets are created off-line and stored in the catalog.
➲ A parameters file states the association between a job and a dataset.
➲ Each job produces an output whose path in the catalog is known a priori.
[Figure: job 1 ... job n]
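
The slides say the subsets are built off-line but not how. A minimal sketch, assuming the simplest scheme: the ordered cell list is cut into n contiguous, disjoint blocks of near-equal size (a real partition would more likely follow geographic tiles). The `celle_<nn>` dataset names mirror the catalog paths on the following slides; the job-to-dataset mapping format is hypothetical.

```python
from typing import List, Sequence


def split_cells(cells: Sequence, n_jobs: int) -> List[Sequence]:
    """Split cells into n_jobs contiguous, disjoint subsets of near-equal size."""
    q, r = divmod(len(cells), n_jobs)
    subsets, start = [], 0
    for k in range(n_jobs):
        size = q + (1 if k < r else 0)  # the first r subsets take one extra cell
        subsets.append(cells[start:start + size])
        start += size
    return subsets


if __name__ == "__main__":
    # 30 M cells over 30 jobs (a range keeps this cheap): 1 M cells per dataset.
    parts = split_cells(range(30_000_000), 30)
    print(len(parts), {len(p) for p in parts})
    # Parameters-file content: job number -> dataset name (hypothetical format).
    params = {k + 1: f"celle_{k + 1:02d}" for k in range(len(parts))}
    print(params[1], params[30])
```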

Job definition (2)
Job 1:
   Domain: celle/celle_01.tar.bz2
   Status: celle/stato0_01.tar.bz2
   Input:  input/input_20070119.tar.bz2
   Output: output/output_01_20071119.tar.bz2
➲ Each job has its own domain.
➲ Job domain, status information and output refer to the same geographical domain.
➲ All jobs share the same input file.

Job definition (3)
Job 1:
   Domain: celle/celle_01.tar.bz2
   Status: celle/stato0_01.tar.bz2
   Input:  input/input_20070119.tar.bz2
   Output: output/output_01_20071119.tar.bz2
Job 2:
   Domain: celle/celle_02.tar.bz2
   Status: celle/stato0_02.tar.bz2
   Input:  input/input_20070119.tar.bz2
   Output: output/output_02_20071119.tar.bz2
Job n:
   Domain: celle/celle_nn.tar.bz2
   Status: celle/stato0_nn.tar.bz2
   Input:  input/input_20070119.tar.bz2
   Output: output/output_nn_20071119.tar.bz2
[Figure: CATALOG]
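
Every catalog path above depends only on the job index and a date, so the output location of a job can be computed before the job runs. A sketch of that naming scheme, assuming a two-digit zero-padded index (as `celle_01` and `celle_nn` suggest) and reading the status file as `stato0_<nn>`; input and output dates are separate parameters because the example uses 20070119 for the input and 20071119 for the output.

```python
def job_paths(job: int, input_date: str, output_date: str) -> dict:
    """Catalog paths for one job, following the naming on the slides."""
    tag = f"{job:02d}"  # two-digit job index: 01, 02, ..., nn
    return {
        "domain": f"celle/celle_{tag}.tar.bz2",
        "status": f"celle/stato0_{tag}.tar.bz2",
        "input": f"input/input_{input_date}.tar.bz2",  # shared by all jobs
        "output": f"output/output_{tag}_{output_date}.tar.bz2",
    }


if __name__ == "__main__":
    for job in (1, 2, 30):
        print(job_paths(job, "20070119", "20071119")["output"])
```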

Final version
➲ Estimated performance on the complete data set (30 M cells):
   Total CPU time: about 2 hours and 30 minutes;
   Optimal number of jobs: about 30 (5-10 minutes of CPU time per job);
   Storage: 30 GByte/day.
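
The estimates are mutually consistent: 150 minutes of CPU spread over 30 jobs gives 5 minutes per job, the lower end of the quoted 5-10 minute range, and each job handles about 1 M cells. Assuming decimal gigabytes, the storage figure works out to about 1 kB per cell per day.

```python
# Figures from the slide; decimal gigabytes (10**9 bytes) are an assumption.
TOTAL_CPU_MIN = 2 * 60 + 30      # "about 2 hours and 30 minutes"
N_JOBS = 30                      # "about 30" jobs
N_CELLS = 30_000_000             # complete data set
STORAGE_BYTES_PER_DAY = 30 * 10**9

cpu_per_job_min = TOTAL_CPU_MIN / N_JOBS
bytes_per_cell_per_day = STORAGE_BYTES_PER_DAY / N_CELLS
cells_per_job = N_CELLS // N_JOBS

print(cpu_per_job_min, bytes_per_cell_per_day, cells_per_job)  # 5.0 1000.0 1000000
```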

Test Results
➲ The porting has been tested on a subset (1 M cells) of the final working set of the RISICO system.
➲ 10 parallel jobs were used.
➲ Performance:
   Job CPU time: 30 seconds;
   Grid overhead: 2 minutes.
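
The test figures agree with the full-scale estimate of the final version: 10 jobs × 30 s = 300 s of CPU for 1 M cells, which scales to 9,000 s (2 h 30 min) for 30 M cells. They also suggest why fewer, longer jobs are preferred: a 30-second job spends most of its wall time in grid overhead. This sketch assumes CPU time scales linearly with the number of cells and that the 2 minutes is a fixed per-job overhead; neither is stated on the slide.

```python
def extrapolate_cpu_s(test_cells: int, test_jobs: int,
                      cpu_per_job_s: float, target_cells: int) -> float:
    """Total CPU seconds for target_cells, assuming linear scaling with cells."""
    return test_jobs * cpu_per_job_s * target_cells / test_cells


def efficiency(cpu_s: float, overhead_s: float) -> float:
    """Share of a job's wall time spent computing, with a fixed per-job overhead."""
    return cpu_s / (cpu_s + overhead_s)


total_s = extrapolate_cpu_s(1_000_000, 10, 30, 30_000_000)
print(total_s / 3600)                      # 2.5 hours
print(round(efficiency(30, 120), 2))       # test jobs: 30 s CPU, 2 min overhead
print(round(efficiency(5 * 60, 120), 2))   # ~5-minute jobs of the final version
```

Under these assumptions the efficiency rises from 0.2 for the test jobs to about 0.71 for 5-minute jobs.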

Conclusions
➲ RISICO represents a feasible and significant test case.
➲ The Grid architecture provides valuable benefits to operational activities.