GRID Integration in LHCb
Vincenzo Vagnoni, INFN Bologna
Draft, INFN-GRID Technical Board, CNAF, 15th February 2005
How LHCb has been making use of the GRID (MC production)

- Initially, LHCb developed its own home-made distributed computing framework, DIRAC. When LCG came into production, it seemed very convenient to add a hook that integrated the already existing system into LCG almost for free:
  - A small bash script is sent to the CE through the LCG WMS.
  - When the script reaches the WN, it performs some checks to validate the WN (e.g. available disk space in the home directory, etc.).
  - If the environment is clean, it downloads the DIRAC framework and fetches the "real" job from the DIRAC job task database (if the experiment software is not yet installed in the VO shared directory, it installs it from scratch in the home directory), and finally executes the job.
  - If the job completes successfully, the produced data are moved to the SE by means of GridFTP and registered in the LHCb file catalogues (a hedged sketch of such a pilot script follows this slide).
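A minimal sketch of what such a pilot script might look like is given below. The DIRAC download URL, the install path, the agent command, and the SE endpoint are all hypothetical placeholders for illustration, not the actual LHCb production values; only the overall shape (validate the WN, bootstrap DIRAC, fetch and run the real job, ship the output) follows the slide.

```bash
#!/bin/bash
# Hedged sketch of an LHCb-style pilot script (illustrative only).
# URLs, paths and the agent command below are hypothetical placeholders.

REQUIRED_MB=2000   # assumed minimum free space in the home directory

# 1. Validate the WN: check available disk space in $HOME.
free_mb=$(df -Pm "$HOME" | awk 'NR==2 {print $4}')
if [ "$free_mb" -lt "$REQUIRED_MB" ]; then
    echo "WN validation failed: only ${free_mb} MB free" >&2
    exit 1   # the dummy job dies here; the real job stays in the DIRAC queue
fi

# 2. Download and unpack the DIRAC framework (hypothetical URL).
wget -q http://lhcb-dirac.example.org/dirac-install.tar.gz -O dirac.tar.gz \
    || exit 1
tar xzf dirac.tar.gz

# 3. Fetch the "real" job from the DIRAC task database and run it; the
#    agent also installs the experiment software from scratch if it is
#    missing from the VO shared area (hypothetical command and flags).
./dirac/run-agent --request-job --install-sw-if-missing || exit 1

# 4. On success, move the output to the SE via GridFTP; the endpoint and
#    file names are placeholders. Registration in the LHCb file
#    catalogues would follow, e.g. through the DIRAC tools.
globus-url-copy "file://$PWD/output.dst" \
    gsiftp://some-se.example.org/lhcb/prod/output.dst
```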
How LHCb is now going to make use of the GRID (data analysis)

- Data analysis in LHCb has two phases:
  - Data stripping: reduction of the full data sample by means of pre-selection algorithms, run in a scheduled way, i.e. centrally managed.
  - Chaotic data analysis: the reduced output of the data stripping (which is relatively small and should be kept on disk) is analysed chaotically by the end-physicists for the final analysis.
- Data stripping is starting these days.
  - There were some problems with the SRM at CNAF; it now seems OK.
- Both phases at the Tier 1 will now be carried out with the same LCG/DIRAC mechanism.
Some comments on the LHCb approach

- The LCG/DIRAC approach has proved very useful:
  - While waiting for a more reliable LCG infrastructure, it ensures that "real" jobs don't get lost in the UI/RB/CE/WN chain.
  - If a WN is misconfigured, the dummy job is lost, but not the real one.
  - All the "real" jobs are executed in the same order in which the central DIRAC job repository was populated.
    - Something similar could be achieved with plain LCG using advance reservation features, executing all jobs in a bunch.
- However, the current LHCb approach is not what we would like to have in the end:
  - It requires continuous automatic submission of dummy jobs to LCG, in order to keep the queues filled according to the presence of DIRAC jobs in the DIRAC task database (a sketch of such a submission loop follows this slide).
  - This is not the "true" spirit of LCG, even if it can be "acceptable" from the LCG point of view.
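As an illustration of the last point, the continuous submission of dummy jobs amounts to a loop of roughly the following shape. The dirac-count-waiting command, the pilot.jdl file, and the polling interval are hypothetical; edg-job-submit is the standard LCG-2 submission command, though the exact flags may vary by release.

```bash
#!/bin/bash
# Sketch of the dummy-job submission agent (illustrative only).
# dirac-count-waiting and pilot.jdl are hypothetical names.

while true; do
    # How many real jobs are waiting in the DIRAC task database?
    waiting=$(dirac-count-waiting --status Waiting)

    # Submit one dummy (pilot) job per waiting real job, so that the
    # LCG queues stay filled in step with the DIRAC task database.
    for i in $(seq 1 "$waiting"); do
        edg-job-submit --vo lhcb -o pilot-jobids.txt pilot.jdl
    done

    sleep 300   # assumed polling interval
done
```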
Understanding the WMS

- Study in detail the performance of the WMS:
  - The idea is to set up a testbed for a stress test of the WMS.
  - With a small amount of resources (e.g. one rack of machines) we could simulate the existence of more CEs and a large number of CPUs.
  - Positive discussion with Francesco Giacomini; we should now go ahead.
- In this way we could run many checks and study the scalability of the WMS in a well-controlled environment:
  - Study the performance by varying e.g. the job submission rate, the length of the jobs, the number of running jobs, the size of the input and output sandboxes, the number of files in the input and output sandboxes, etc. (see the sketch of a stress driver after this slide).
- This is also important for us (LHCb) to gain experience.
- It would be nice to do this work together with the other experiments.
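A minimal sketch of how such a stress test could be driven from a UI is shown below. The parameter values, the sandbox-generation step, and the job payload are assumptions for illustration, not an agreed test plan; the JDL attributes used are the standard LCG ones.

```bash
#!/bin/bash
# Sketch of a WMS stress-test driver (illustrative only).
# Varies submission rate and input-sandbox size; values are assumptions.

for rate in 1 5 10; do          # attempted submissions per second
  for sb_mb in 1 10; do         # input sandbox size in MB
    dd if=/dev/zero of=sandbox.dat bs=1M count="$sb_mb" 2>/dev/null

    cat > stress.jdl <<EOF
Executable    = "/bin/sleep";
Arguments     = "600";
InputSandbox  = {"sandbox.dat"};
StdOutput     = "std.out";
StdError      = "std.err";
OutputSandbox = {"std.out","std.err"};
EOF

    for i in $(seq 1 100); do   # 100 jobs per configuration point
        edg-job-submit --vo lhcb \
            -o "stress-ids-${rate}-${sb_mb}.txt" stress.jdl
        sleep "$(echo "scale=3; 1/$rate" | bc)"
    done
  done
done
```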
Pilot jobs

- There are plans to introduce features in the WMS to facilitate approaches à la LHCb.
- Apart from the LHCb interest, this could prove of general interest for the integration into the GRID of already existing distributed applications of any kind.
  - It might be important in a phase when the GRID really starts to spread beyond the limited environment of HEP applications.
- We can provide feedback and support to the developers.
  - A written draft of the specifications of the pilot-job implementation would ease the start of a discussion with the developers of the experiments' software.
Plain LCG instead of the LCG/DIRAC mix

- Once the features of the WMS are well understood (at least by us, see the "Understanding the WMS" slide), we can think of enforcing a plain LCG approach, at least on our own (Tier 1) resources.
  - But we first have to demonstrate convincingly that, with a proper configuration (or even fixes) of the WMS (number of brokers matched to the submission rate, etc.), the job efficiency approaches 100% (from the point of view of the WMS, not of the application, of course); a sketch of such an efficiency measurement follows this slide.
- For the software integration we need dedicated manpower at CERN, but that is another story…
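Measuring the WMS-level job efficiency could then be as simple as the sketch below, which polls the status of previously submitted jobs with the standard edg-job-status command and counts terminal states. The grep-based parsing of the command output is a rough assumption, and the exact flags may differ between releases.

```bash
#!/bin/bash
# Sketch of a WMS-level efficiency measurement (illustrative only).
# Reads the job ids collected at submission time and counts terminal
# states; the text parsing is an assumption about the output format.

status=$(edg-job-status --noint -i stress-ids.txt)

done_n=$(echo "$status"  | grep -c "Done")
aborted=$(echo "$status" | grep -c "Aborted")
total=$((done_n + aborted))

if [ "$total" -gt 0 ]; then
    echo "WMS efficiency: $(echo "scale=2; 100*$done_n/$total" | bc)%"
fi
```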