The CSTminer application
Giacinto Donvito, INFN-Bari
Tutorial on "GRID Computing", EMBnet Conference 2008, 17 September 2008
Outline
- Application description
- Input files
- Output files
- Arguments
- Examples of challenges
- Optimizations
Application description
- “CSTminer compares two or more sequences to identify conserved tracts and classifies them as coding (likely belonging to protein coding genes) or non-coding (potential regulatory regions)”
- Static binary (easy to port to the grid)
- Good CPU efficiency
- Small input
- Fairly small output
- Made up of many small, independent tasks
Application description (2)
- Huge number of tasks (hundreds of thousands)
- Difficult to keep track of the status of each task
- The execution time of a single comparison is too short (a few seconds of CPU time)
- It is better to group more than one sequence into the same input file:
  - the whole genome needs to be split into a few files, sized according to the expected running time
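The grouping idea can be sketched as a greedy packing of sequences into batches that each fit a CPU-time budget. This is only an illustration: it assumes a per-sequence CPU-time estimate is available (for instance, proportional to sequence length), and the function name `group_sequences` and the `max_seconds` budget are hypothetical, not part of CSTminer.

```python
from typing import List, Tuple


def group_sequences(
    sequences: List[Tuple[str, float]], max_seconds: float
) -> List[List[str]]:
    """Greedily pack (sequence_id, estimated_cpu_seconds) pairs into batches.

    HYPOTHETICAL helper: each batch would become one input file / grid job.
    A batch is closed as soon as the next sequence would exceed max_seconds;
    a single sequence longer than the budget still gets a batch of its own.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    current_time = 0.0
    for seq_id, est in sequences:
        # Start a new batch if adding this sequence would exceed the budget.
        if current and current_time + est > max_seconds:
            batches.append(current)
            current, current_time = [], 0.0
        current.append(seq_id)
        current_time += est
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    seqs = [("g1", 2.0), ("g2", 3.0), ("g3", 4.0), ("g4", 1.0), ("g5", 6.0)]
    print(group_sequences(seqs, max_seconds=6.0))
    # [['g1', 'g2'], ['g3', 'g4'], ['g5']]
```

A greedy, order-preserving packing keeps neighbouring sequences together and is trivial to reproduce; a smarter bin-packing heuristic could balance batch sizes better at the cost of reordering.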
Application description (3)
- Each run needs two input files
- Each run produces a single output file
- The command line carries both the executable's configuration and the input/output file names
- The application is therefore easy to parametrize
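Because every run is fully described by two input files, one output file and a command line, the whole task list can be generated mechanically. The sketch below pairs every query chunk with every target chunk; the argument layout in `ARG_TEMPLATE`, the `result_NNNNNN.out` naming and the chunk file names are hypothetical placeholders, since the slides do not give CSTminer's actual options.

```python
from dataclasses import dataclass
from itertools import product
from typing import List


@dataclass
class Task:
    query: str       # first input file
    target: str      # second input file
    output: str      # the single output file of this run
    argv: List[str]  # full command line for this run


# HYPOTHETICAL argument layout: the real CSTminer options are not shown in
# the slides, so this template is only a placeholder.
ARG_TEMPLATE = ["./CSTminer", "{query}", "{target}", "{output}"]


def make_tasks(queries: List[str], targets: List[str]) -> List[Task]:
    """Build one Task per (query chunk, target chunk) pair."""
    tasks = []
    for i, (q, t) in enumerate(product(queries, targets)):
        out = f"result_{i:06d}.out"
        argv = [a.format(query=q, target=t, output=out) for a in ARG_TEMPLATE]
        tasks.append(Task(q, t, out, argv))
    return tasks


if __name__ == "__main__":
    for task in make_tasks(["vitis_000.fa", "vitis_001.fa"], ["ath_000.fa"]):
        print(" ".join(task.argv))
```

Keeping the template separate from the task generator means only one line has to change once the real command-line syntax is plugged in.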
Example of challenge: Vitis
- Comparisons between the Vitis genome and 3 other genomes:
  - 207,624 tasks
  - 3 CPU-years
  - 665 MB of compressed input
  - ~5 GB of compressed output
- Vitis on the Grid:
  - 5 days of running on the EGEE Grid
  - ~220 speedup factor
  - 3,276 different WNs (worker nodes)
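The ~220 speedup follows directly from the slide's figures: 3 CPU-years is about 1,095 CPU-days, run in 5 days of wall-clock time. The same numbers also give the average CPU time per task. This back-of-the-envelope check assumes 365-day years; the helper names are illustrative only.

```python
DAYS_PER_YEAR = 365
SECONDS_PER_DAY = 86_400


def speedup(cpu_years: float, wall_days: float) -> float:
    """Ratio of total sequential CPU time to wall-clock time on the grid."""
    return cpu_years * DAYS_PER_YEAR / wall_days


def mean_task_seconds(cpu_years: float, n_tasks: int) -> float:
    """Average CPU time per task, in seconds."""
    return cpu_years * DAYS_PER_YEAR * SECONDS_PER_DAY / n_tasks


if __name__ == "__main__":
    print(round(speedup(3, 5)))                   # 219, i.e. the "~220" above
    print(round(mean_task_seconds(3, 207_624)))   # 456 s, about 7.6 minutes
```

Since "3 CPU-years" is itself a rounded figure, the result should be read as an order of magnitude rather than an exact measurement.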
Example of challenge: Mouse vs Human
- ~800 M CSTminer runs (100 K)
- ~2 s of CPU time for each run
- 50 years on a single CPU
- 22 farms used
- > 900 different hosts used
- 2 months on the INFN-Grid infrastructure (our national gLite infrastructure)
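The "50 years on a single CPU" figure can be checked from the run count and the per-run CPU time, and comparing it with the 2 months actually spent gives the effective speedup. The sketch assumes 365-day years and treats a month as 1/12 of a year; the function names are illustrative.

```python
SECONDS_PER_YEAR = 365 * 86_400


def sequential_cpu_years(runs: float, cpu_seconds_per_run: float) -> float:
    """Total CPU time, in years, if every run executed back to back on one CPU."""
    return runs * cpu_seconds_per_run / SECONDS_PER_YEAR


def grid_speedup(cpu_years: float, wall_months: float) -> float:
    """Sequential CPU time divided by the wall-clock time actually spent."""
    return cpu_years / (wall_months / 12)


if __name__ == "__main__":
    years = sequential_cpu_years(800e6, 2.0)
    print(f"{years:.1f} CPU-years")                   # 50.7 CPU-years
    print(f"speedup ~ {grid_speedup(years, 2):.0f}")  # speedup ~ 304
```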
DEMO
Ready to test it!
- Portal: http://webcms.ba.infn.it/~pierro/JST/index.php
- Executable: http://192.168.0.1/CSTminer
- Input files: http://192.168.0.1/ATH-genes.tar.bz2; http://192.168.0.1/Vitis-genes.tar.bz2
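The input sets are distributed as `.tar.bz2` archives (the 192.168.0.1 addresses are on a private network, so they were presumably reachable only from the tutorial room). Once an archive has been downloaded, it can be unpacked and its contents listed with the standard `tarfile` module. The helper `unpack_inputs` is a hypothetical convenience function, not part of the demo portal.

```python
import tarfile
from pathlib import Path
from typing import List


def unpack_inputs(archive: str, dest: str) -> List[str]:
    """Extract a .tar.bz2 input archive into dest and return the sorted
    names of the regular files it contained (HYPOTHETICAL helper)."""
    Path(dest).mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:bz2") as tar:
        names = sorted(m.name for m in tar.getmembers() if m.isfile())
        # The "data" extraction filter (Python 3.12+, backported to some
        # earlier releases) rejects absolute paths and other unsafe members.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return names


if __name__ == "__main__":
    # Assumes Vitis-genes.tar.bz2 has already been downloaded locally.
    for name in unpack_inputs("Vitis-genes.tar.bz2", "vitis_inputs"):
        print(name)
```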