Enabling Grids for E-sciencE
LCG ARDA project: Status and plans
Massimo Lamanna / CERN
http://arda.cern.ch
INFSO-RI-508833
Overview
• ARDA in a nutshell
• ARDA prototypes
  – 4 experiments
• ARDA feedback on middleware
  – Middleware components on the development test bed
• Outlook and conclusions
Massimo Lamanna - OSG Applications Meeting (SLAC) - June 1st, 2005
The ARDA project
• ARDA is an LCG project
  – Main activity is to enable LHC analysis on the grid
  – ARDA is contributing to EGEE
    § Uses the entire CERN NA4-HEP resource (NA4 = Applications)
• Interface with the new EGEE middleware (gLite)
  – By construction, ARDA uses the new middleware
    § Use the grid software as it matures
  – Verify the components in an analysis environment
    § Contribution in the experiments' frameworks (discussion, direct contribution, benchmarking, …)
    § Users needed here, namely physicists who need distributed computing to perform their analyses
  – Provide early and continuous feedback
ARDA prototype overview
LHC experiment | Main focus | Basic prototype component/framework
LHCb | GUI to Grid | GANGA/DaVinci
ALICE | Interactive analysis | PROOF/AliROOT
ATLAS | High-level services | DIAL/Athena
CMS | Explore/exploit native gLite functionality | ORCA
Middleware (all prototypes): gLite
Ganga 4
• Major version
• Important contribution from the ARDA team
• Interesting concepts
• Note that GANGA is a joint ATLAS-LHCb project
• Contacts with CMS (exchange of ideas, code snippets, …)
GANGA Workshop, 13-15 June
GANGA Workshop: http://agenda.cern.ch/fullAgenda.php?ida=a052763
At Imperial College London (organised by U. Egede)
ALICE prototype: ROOT and PROOF
• ALICE provides
  – the UI
  – the analysis application (AliROOT)
• GRID middleware gLite provides all the rest
[Diagram labels: UI shell, Middleware, Application; "end to end"]
• ARDA/ALICE is evolving the ALICE analysis system
Demo at Supercomputing 04 and Den Haag
[Diagram labels: USER SESSION; PROOF MASTER SERVER; PROOF SLAVES at Site A, Site B and Site C]
Demo based on a hybrid system using the 2004 prototype
ARDA shell + C/C++ API
C++ access library for gLite has been developed by ARDA
• High performance
• Protocol quite proprietary...
Essential for the ALICE prototype
Generic enough for general use
Using this API, grid commands have been added seamlessly to the standard shell
Current Status
• Developed gLite C++ API and API Service
  – Providing a generic interface to any GRID service
• C++ API is integrated into ROOT
  – In the ROOT CVS
  – Job submission and job status query for batch analysis can be done from inside ROOT
• Bash interface for gLite commands with catalogue expansion has been developed
  – More powerful than the original shell
  – In use in ALICE
  – Considered a "generic" middleware contribution (essential for ALICE, interesting in general)
• First version of the interactive analysis prototype is ready
• Batch analysis model is improved
  – Submission and status query are integrated into ROOT
  – Job splitting based on XML query files
  – Application (AliROOT) reads files using xrootd without prestaging
ATLAS/ARDA
• Main component:
  – Contribute to the DIAL evolution
    § gLite analysis server
• "Embedded in the experiment"
  – AMI tests and interaction
  – Production and CTB tools
  – Job submission (ATHENA jobs)
  – Integration of the gLite Data Management within Don Quijote
  – Active participation in several ATLAS reviews
  – Plan to demonstrate GANGA+Prod service (coming soon)
• Benefit from the other experiments' prototypes
  – First look at interactivity/resiliency issues
    § E.g. use of DIANE
  – GANGA (principal component of the LHCb prototype, key component of the overall ATLAS strategy)
Tao-Sheng Chen, ASCC
Data Management: Don Quijote
• Locate and move data over grid boundaries
• ARDA has connected gLite
[Diagram labels: a DQ Client and four DQ servers, each with its own RLS and SE, for GRID3, Nordugrid, LCG and gLite]
Combined Test Beam
Real data processed at gLite
• Standard Athena for testbeam
• Data from CASTOR
• Processed on gLite worker node
Example: ATLAS TRT data analysis done by PNPI St Petersburg
[Plot: number of straw hits per layer]
DIANE
DIANE on gLite running Athena
ARDA/CMS
• Pattern of ARDA/CMS activity
  – Prototype (ASAP)
  – Contributions to CMS-specific components
    § RefDB/PubDB
  – Usage of components used by CMS
    § Notably MonALISA
  – Contribution to CMS-specific developments
    § PhySh
ARDA/CMS
• ARDA/CMS prototype
• RefDB redesign and PubDB
  – Taking part in the RefDB redesign
  – Developing the schema for PubDB and supervising development of the first PubDB version
• Analysis prototype connected to MonALISA
  – Tracking the progress of an analysis task is troublesome when the task is split into several (hundreds of) sub-jobs
  – The analysis prototype gives each sub-job a built-in "identity" and the capability to report its progress to the MonALISA system
  – The MonALISA service receives and combines the progress reports of the single sub-jobs and publishes the overall progress of the whole task
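The combination of sub-job progress reports into one figure for the whole task can be illustrated with a minimal sketch. It does not use the MonALISA API: the class name `TaskProgress`, the sub-job identifiers and the event counts are all hypothetical, and the only assumption is that each sub-job reports how many events it has processed so far.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class TaskProgress:
    """Hypothetical aggregator: combines per-sub-job reports for one task."""
    total_events: int
    processed: Dict[str, int] = field(default_factory=dict)  # sub-job id -> events done

    def report(self, subjob_id: str, events_done: int) -> None:
        # Keep the highest count seen, so a late or stale report
        # cannot move the sub-job's progress backwards.
        previous = self.processed.get(subjob_id, 0)
        self.processed[subjob_id] = max(previous, events_done)

    def overall(self) -> float:
        """Fraction of the whole task processed so far (0.0 to 1.0)."""
        if self.total_events == 0:
            return 1.0
        return sum(self.processed.values()) / self.total_events


if __name__ == "__main__":
    task = TaskProgress(total_events=1000)
    task.report("task42.sub1", 100)
    task.report("task42.sub2", 150)
    print(f"overall progress: {task.overall():.0%}")
```

Keying the reports by sub-job identity is what lets repeated reports from the same sub-job replace, rather than add to, its earlier contribution.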
ARDA/CMS
• PhySh
  – Physicist Shell
  – ASAP is Python-based and, like Clarens and PhySh, uses XML-RPC calls for client-server interaction
  – In addition, to enable future integration, the analysis prototype has a CVS repository structured similarly to that of the PhySh project
ARDA-CMS
• CMS prototype (ASAP = Arda Support for cms Analysis Processing)
  – The first version of the CMS analysis prototype, capable of creating, submitting and monitoring CMS analysis jobs on the gLite middleware, was developed by the end of 2004
    § Demonstrated at the CMS week in December 2004
  – The prototype was evolved to support both RB versions deployed at the CERN testbed (prototype task queue and gLite 1.0 WMS)
  – Currently submission to both RBs is available and completely transparent for the users (same configuration file, same functionality)
  – Plan to implement a gLite job submission handler for Crab
• Users?
  – Starting from February 2005, CMS users began working on the testbed, submitting jobs through ASAP
  – Positive feedback; suggestions from the users are implemented asap
  – Plan to involve more users as soon as the preproduction farm is available
  – Plan to try out in the prototype the new functionality provided by the WMS (DAGs, interactive jobs for testing purposes)
ASAP: Starting point for users
• The user is familiar with the experiment application needed to perform the analysis (ORCA application for CMS)
  – The user knows how to create an executable to run the analysis task (reading selected data samples, using the data to compute derived quantities, taking decisions, filling histograms, selecting events, etc.). The executable is based on the experiment framework
• The user has debugged the executable on small data samples, on a local computer or on computing services (e.g. lxplus at CERN)
• How to go for larger samples, which can be located at any regional centre CMS-wide?
• The users should not be forced:
  – to change anything in the compiled code
  – to change anything in the configuration file for ORCA
  – to know where the data samples are located
ASAP work and information flow: first scenario
The user defines in the configuration file:
• Application, application version
• Executable
• ORCA data cards
• Data sample
• Working directory
• Castor directory to save output
• Number of events to be processed
• Number of events per job
[Diagram labels: Monitoring system (MonALISA); CMS catalogs (RefDB, PubDB); ASAP UI (job generation, submission, querying job status); JDL; job monitoring directory; gLite; job running on the Worker Node; saving output; output files location]
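The two configuration items "number of events to be processed" and "number of events per job" are enough to derive the list of sub-jobs. The following is a minimal sketch of that arithmetic only; the function name `split_events` and the (first event, event count) representation are assumptions for illustration and are not taken from ASAP itself.

```python
from typing import List, Tuple


def split_events(total_events: int, events_per_job: int) -> List[Tuple[int, int]]:
    """Return one (first_event, n_events) pair per sub-job.

    Hypothetical helper: every sub-job gets events_per_job events,
    except the last one, which takes whatever remains.
    """
    if total_events < 0:
        raise ValueError("total_events must not be negative")
    if events_per_job <= 0:
        raise ValueError("events_per_job must be positive")
    return [
        (first, min(events_per_job, total_events - first))
        for first in range(0, total_events, events_per_job)
    ]


if __name__ == "__main__":
    for first, count in split_events(2500, 1000):
        print(f"sub-job: first event {first}, {count} events")
```

The number of sub-jobs is the ceiling of the total divided by the per-job count, so the user controls the granularity of the task with a single parameter.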
ASAP work and information flow: second scenario
The user provides to the ASAP UI:
• Application, application version
• Executable
• ORCA data cards
• Data sample
• Working directory
• Castor directory to save output
• Number of events to be processed
• Number of events per job
The ASAP UI delegates the user credentials using MyProxy and hands the job over for submission.
The ASAP Job Monitoring service takes care of:
• Job submission
• Checking job status
• Resubmission in case of failure
• Fetching results
• Storing results to Castor
• Publishing job status on the web
[Diagram labels: MonALISA; RefDB; PubDB; job monitoring directory; JDL; gLite; job running on the Worker Node; output files location]
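The "checking job status" and "resubmission in case of failure" duties of a monitoring service can be sketched as one monitoring pass over the known jobs. Everything here is hypothetical: the function name `check_jobs`, the status string `"FAILED"`, the callbacks and the retry limit are illustrative assumptions, not the behaviour of the actual ASAP Job Monitoring service or of gLite.

```python
from typing import Callable, Dict, List


def check_jobs(
    retries: Dict[str, int],
    get_status: Callable[[str], str],
    resubmit: Callable[[str], None],
    max_retries: int = 3,
) -> List[str]:
    """Run one monitoring pass and return the ids of abandoned jobs.

    retries maps each job id to the number of resubmissions so far.
    A failed job is resubmitted until max_retries is reached; after
    that it is reported back to the caller instead.
    """
    abandoned = []
    for job_id in list(retries):
        if get_status(job_id) != "FAILED":  # assumed status label
            continue
        if retries[job_id] < max_retries:
            retries[job_id] += 1
            resubmit(job_id)
        else:
            abandoned.append(job_id)
    return abandoned


if __name__ == "__main__":
    statuses = {"job-a": "DONE", "job-b": "FAILED"}
    counts = {"job-a": 0, "job-b": 0}
    gave_up = check_jobs(counts, statuses.get, lambda j: print("resubmitting", j))
    print("abandoned:", gave_up)
```

Bounding the number of resubmissions keeps a systematically failing job from being retried forever, at the cost of needing a person to look at the abandoned ones.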
CMS - Using MonALISA for user job monitoring
Demo at Supercomputing 04
• A single job is submitted to gLite
• JDL contains job-splitting instructions
• Master job is split by gLite into sub-jobs
• Dynamic monitoring of the total number of events processed by all sub-jobs belonging to the same Master job
Job Monitoring
• ASAP Monitor
Merging the results
First CMS users on gLite
• A demo of the first working version of the prototype was given to the CMS community in December 2004
• ASAP is the first ARDA prototype to have migrated to gLite version 1.0
• The first CMS physicists started to work on the gLite testbed using ASAP at the beginning of February 2005
• Currently we support 5 users from different physics groups (we cannot allow more before moving to the preproduction farm):
  – 3 users - Higgs group
  – 1 user - SUSY group
  – 1 user - Standard Model
• Positive feedback from the users, with many suggestions for improving the interface and functionality. Fruitful collaboration.
• ASAP has a support mailing list and a web page where we have started to create a user guide: http://arda-cms.cern.ch/asap/doc
H → 2τ → 2j analysis: bkg. data available (all signal events processed with Arda)
Bkg. samples | Sample size | Processed with | σ×Br, mb | Kine presel.
qcd, pT = 50-80 GeV/c | 100K | Arda | 2.08 x 10^-2 | 2.44 x 10^-4
qcd, pT = 80-120 GeV/c | 200K | crab | 2.94 x 10^-3 | 5.77 x 10^-3
qcd, pT = 120-170 GeV/c | 200K | Arda | 5.03 x 10^-4 | 4.19 x 10^-2
qcd, pT > 170 GeV/c | 1M | | 1.33 x 10^-4 | 2.12 x 10^-1
tt, W → τν | 80K | crab | 5.76 x 10^-9 | 4.88 x 10^-2
Wt, W → τν | 30K | Arda | 7.10 x 10^-10 | 1.38 x 10^-2
W+j, W → τν | 400K | crab | 5.74 x 10^-7 | 2.16 x 10^-2
Z/γ* → ττ, 130 < mττ < 300 GeV/c² | 70K | Arda | 1.24 x 10^-8 | 9.53 x 10^-2
Z/γ* → ττ, mττ > 300 GeV/c² | 60K | gross | 6.22 x 10^-10 | 3.23 x 10^-1
A. Nikitenko (CMS)
Higgs boson mass (Mττ) reconstruction
The Higgs boson mass was reconstructed after basic off-line cuts: reco ET(τ jet) > 60 GeV, ETmiss > 40 GeV.
The Mττ evaluation is shown for the consecutive cuts: pτ > 0 GeV/c, pν > 0 GeV/c, Δφ(j1,j2) < 175°.
σ(MH) ~ σ(ETmiss) / sin(φ(j1,j2))
Mττ and σ(Mττ) are in very good agreement with the old results of CMS Note 2001/040, Table 3: Mττ = 455 GeV/c², σ(Mττ) = 77 GeV/c² (ORCA 4, Spring 2000 production).
A. Nikitenko (CMS)
ARDA ASAP
• First users were able to process their data on gLite
  – The work of these pilot users can be regarded as a first round of validation of the gLite middleware and analysis prototypes
• The number of users should increase as soon as the preproduction system becomes available
  – Interest in having CPUs at the centres where the data sits (LHC Tier-1s)
• To enable user analysis on the Grid:
  – We will continue to work in close collaboration with the physics community and the gLite developers
    § Ensuring a good level of communication between them
    § Providing constant feedback to the gLite development team
• Key factors to progress:
  – Increasing number of users
  – Larger distributed systems
  – More middleware components
ARDA Feedback (gLite middleware)
• 2004:
  – Prototype available (CERN + Madison Wisconsin)
  – A lot of activity (4 experiments' prototypes)
  – Main limitation: size
    § Experiments' data available!
    § Just a handful of worker nodes
  [Callout: Access granted on May 18th 2004!]
• 2005:
  – Coherent move to prepare a gLite package to be deployed on the pre-production service
    § ARDA contribution:
    § Mentoring and tutorial
    § Actual tests!
  – Lot of testing during 05 Q1
  – PreProduction Service is about to start!
WMS monitor
– "Hello World!" jobs
– 1 per minute since last February
– Logging & Bookkeeping info on the web to help the developers
Data Management
• Central component together with the WMS
• Early tests started in 2004
• Two main components:
  – gLiteIO (protocol + server to access the data)
  – FiReMan (file catalogue)
  – The two components are not isolated: for example, gLiteIO uses the ACLs as recorded in FiReMan, and FiReMan exposes the physical location of files for the WMS to optimise the job submissions…
• Both LFC and FiReMan offer large improvements over RLS
  – LFC is the most recent LCG2 catalogue
• Still some issues remaining:
  – Scalability of FiReMan
  – Bulk entry for LFC missing
  – More work needed to understand performance and bottlenecks
  – Need to test some real use cases
  – In general, the validation of DM tools takes time!
FiReMan Performance - Queries
• Query rate for an LFN
[Plot: entries returned per second (0 to 1200) versus number of threads (5 to 50); series: Fireman Single, Fireman Bulk 100, Fireman Bulk 500, Fireman Bulk 1000, Fireman Bulk 5000]
FiReMan Performance - Queries
• Comparison with LFC:
[Plot: entries returned per second (0 to 1200) versus number of threads (1 to 100); series: Fireman - Single Entry, Fireman - Bulk 100, LFC]
More data coming…
C. Munro (ARDA & Brunel Univ.) at ACAT 05
Summary of gLite usage and testing
• Info available also under http://lcg.web.cern.ch/lcg/PEB/arda/LCG_ARDA_Glite.htm
• gLite version 1
  § WMS
    • Continuous monitor available on the web (active since 17th of February)
    • Concurrency tests
    • Usage with ATLAS and CMS jobs (using Storage Index)
    • Good improvements observed
  § DMS (FiReMan + gLiteIO)
    • Early usage and feedback (since Nov 04) on functionality, performance and usability
    • Considerable improvement in performance/stability observed since
    • Some of the tests given to the development team for tuning and to JRA1 to be used in the testing suite
    • Most of the tests given to JRA1 to be used in the testing suite
    • Performance/stability measurements: heavy-duty testing needed for real validation
  § Contribution to the common testing effort to finalise gLite 1 (with SA1, JRA1 and NA4-testing)
    • Migration of certification tests within the certification test suite (LCG gLite)
    • Comparison between LFC (LCG) and FiReMan
    • Mini tutorial to facilitate the usage of gLite within the NA4 testing
Metadata services on the Grid
• gLite has provided a prototype for the EGEE Biomed community (in 2004)
• Requirements in ARDA (HEP) were not all satisfied by that early version
• ARDA preparatory work
  – Stress testing of the existing experiment metadata catalogues
  – Existing implementations were shown to share similar problems
• ARDA technology investigation
  – On the other hand, the usage of extended file attributes in modern systems (NTFS, NFS, EXT2/3 SCL3, ReiserFS, JFS, XFS) was analysed: a sound POSIX standard exists!
• Prototype activity in ARDA
• Discussion in LCG and EGEE and UK GridPP Metadata group
• Synthesis:
  – New interface which will be maintained by EGEE, benefiting from the activity in ARDA (tests and benchmarking of different databases and direct collaboration with LHCb/GridPP)
ARDA Implementation
• Prototype
  – Validate our ideas and expose a concrete example to interested parties
• Multiple back ends
  – Currently: Oracle, PostgreSQL, SQLite
• Dual front ends
  – TCP Streaming
    § Chosen for performance
  – SOAP
    § Formal requirement of EGEE
    § Compare SOAP with TCP Streaming
• Also implemented as a standalone Python library
  – Data stored on the file system
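To make the idea of a pluggable back end concrete, here is a minimal sketch of an attribute store on top of SQLite, one of the back ends listed above. It is not the ARDA prototype's schema or interface: the class `SQLiteMetadataBackend`, its methods, the table layout and the example entry names are all assumptions chosen for illustration, using only Python's standard `sqlite3` module.

```python
import sqlite3
from typing import List, Optional


class SQLiteMetadataBackend:
    """Hypothetical back end: key/value attributes attached to named entries."""

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS attrs ("
            " entry TEXT NOT NULL,"
            " attr_name TEXT NOT NULL,"
            " attr_value TEXT,"
            " PRIMARY KEY (entry, attr_name))"
        )

    def set_attr(self, entry: str, name: str, value: str) -> None:
        # INSERT OR REPLACE overwrites an existing (entry, name) pair.
        self.db.execute(
            "INSERT OR REPLACE INTO attrs VALUES (?, ?, ?)", (entry, name, value)
        )
        self.db.commit()

    def get_attr(self, entry: str, name: str) -> Optional[str]:
        row = self.db.execute(
            "SELECT attr_value FROM attrs WHERE entry = ? AND attr_name = ?",
            (entry, name),
        ).fetchone()
        return None if row is None else row[0]

    def find(self, name: str, value: str) -> List[str]:
        """Return all entries whose attribute `name` equals `value`."""
        rows = self.db.execute(
            "SELECT entry FROM attrs WHERE attr_name = ? AND attr_value = ?"
            " ORDER BY entry",
            (name, value),
        )
        return [row[0] for row in rows]


if __name__ == "__main__":
    backend = SQLiteMetadataBackend()
    backend.set_attr("/jobs/1", "status", "running")
    backend.set_attr("/jobs/2", "status", "done")
    print(backend.find("status", "running"))
```

Keeping the SQL behind three small methods is what would allow the same front end to sit on another database; only this class would need a counterpart per back end.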
Dual Front End
TCP Streaming front end
• Text-based protocol
• Data streamed to the client in a single connection
• Implementations
  – Server: C++, multiprocess
  – Clients: C++, Java, Python, Perl, Ruby
SOAP front end
• Most operations are SOAP calls
• Based on iterators
  – Session created
  – Return initial chunk of data and session token
  – Subsequent requests: client calls nextQuery() using the session token
  – Session closed when:
    § End of data
    § Client calls endQuery()
    § Client timeout
• Implementations
  – Server: gSOAP (C++)
  – Clients: tested WSDL with gSOAP, ZSI (Python), AXIS (Java)
Clean way to study performance implications of protocols…
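The iterator-based session described for the SOAP front end can be sketched in-process, without any SOAP toolkit. This is an illustrative model only: the class `QueryServer`, the chunk size and the helper `fetch_all` are assumptions, the method names merely echo nextQuery() and endQuery() from the slide, and the client-timeout case is left out.

```python
import itertools
import uuid
from typing import Dict, Iterable, Iterator, List, Tuple


class QueryServer:
    """Hypothetical server side of a chunked, token-based query session."""

    def __init__(self, chunk_size: int = 2) -> None:
        self.chunk_size = chunk_size
        self.sessions: Dict[str, Iterator] = {}

    def query(self, rows: Iterable) -> Tuple[str, List]:
        # Session created: return the token and the initial chunk of data.
        token = uuid.uuid4().hex
        self.sessions[token] = iter(rows)
        return token, self.next_query(token)

    def next_query(self, token: str) -> List:
        rows = self.sessions.get(token)
        if rows is None:
            return []  # unknown or already closed session
        chunk = list(itertools.islice(rows, self.chunk_size))
        if len(chunk) < self.chunk_size:
            self.end_query(token)  # end of data closes the session
        return chunk

    def end_query(self, token: str) -> None:
        # The client may also call this early to release the session.
        self.sessions.pop(token, None)


def fetch_all(server: QueryServer, rows: Iterable) -> List:
    """Client loop: keep asking for chunks until an empty one arrives."""
    token, chunk = server.query(rows)
    result = []
    while chunk:
        result.extend(chunk)
        chunk = server.next_query(token)
    return result


if __name__ == "__main__":
    print(fetch_all(QueryServer(chunk_size=2), ["a", "b", "c", "d", "e"]))
```

Each chunk costs one request and one response, which is the per-call overhead that a single streamed connection avoids.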
More data coming…
N. Santos (ARDA & Coimbra Univ.) at ACAT 05
• Test protocol performance
  – No work done on the backend
  – Switched 100 Mbits LAN
• Language comparison
  – TCP-S with similar performance in all languages
  – SOAP performance varies strongly with toolkit
• Protocols comparison
  – Keepalive improves performance significantly
  – On Java and Python, SOAP is several times slower than TCP-S
• Measure scalability of protocols
  – Switched 100 Mbits LAN
• 1000 pings
• TCP-S 3x faster than gSOAP (with keepalive)
• Poor performance without keepalive
  – Around 1.000 ops/sec (both gSOAP and TCP-S)
Current Uses of the ARDA Metadata prototype
• Evaluated by LHCb bookkeeping
  – Migrated bookkeeping metadata to ARDA prototype
    § 20M entries, 15 GB
  – Feedback valuable in improving the interface and fixing bugs
  – Interface found to be complete
  – ARDA prototype showing good scalability
• Ganga (LHCb, ATLAS)
  – User analysis job management system
  – Stores job status on ARDA prototype
  – Highly dynamic metadata
• Discussed within the community
  – EGEE
  – UK GridPP Metadata group
ARDA workshops and related activities
• ARDA workshop (January 2004 at CERN; open)
• ARDA workshop (June 21-23 at CERN; by invitation)
  – "The first 30 days of EGEE middleware"
• NA4 meeting (15 July 2004 in Catania; EGEE open event)
• ARDA workshop (October 20-22 at CERN; open)
  – "LCG ARDA Prototypes"
  – Joint session with OSG
• NA4 meeting, 24 November (EGEE conference in Den Haag)
• ARDA workshop (March 7-8 2005 at CERN; open)
• ARDA workshop (October 2005; together with LCG Service Challenges)
• Wednesday afternoon meetings started in 2005:
  – Presentations from experts and discussion (not necessarily from ARDA people)
Available from http://arda.cern.ch
Conclusions (1/3)
• ARDA has been set up to
  – Enable distributed HEP analysis on gLite
    § Contacts have been established
      • With the experiments
      • With the middleware developers
• Experiment activities are progressing rapidly
  – Prototypes for ALICE, ATLAS, CMS & LHCb
    § Complementary aspects are studied
    § Good interaction with the experiments' environment
  – Always looking for users!!!
    § People more interested in physics than in middleware… we support them!
  – 2005 will be the key year (gLite version 1 is becoming available on the preproduction service)
Conclusions (2/3)
• ARDA provides special feedback to the development team
  – First use of components (e.g. gLite prototype activity)
  – Try to run real-life HEP applications
  – Dedicated studies offer complementary information
• Experiment-related ARDA activities produce elements of general use
  – Very important "by-product"
  – Examples:
    § Shell access (originally developed in ALICE/ARDA)
    § Metadata catalog (proposed and under test in LHCb/ARDA)
    § (Pseudo)-interactivity experience (something in/from all experiments)
Conclusions (3/3)
• ARDA is a privileged observatory to follow, contribute to and influence the evolution of HEP analysis
  – Analysis prototypes are a good idea!
    § Technically, they complement the data challenges' experience
    § Key point: these systems are exposed to users
  – The approach of 4 parallel lines is not too inefficient
    § Contributions in the experiments from day zero
      • Difficult environment
    § Commonality cannot be imposed…
  – We could do better in keeping a good connection with OSG
    § How?
Outlook
• Commonality is a very tempting concept, indeed…
  – Sometimes a bit fuzzy, maybe…
• Maybe it is becoming possible (and valuable)…
  – Lot of experience in the whole community!
  – Baseline services ideas
  – LHC schedule: physics is coming!
• Maybe it is emerging… (examples are not exhaustive)
  – Interactivity is a genuine requirement: e.g. PROOF and DIANE
  – Portals and toolkits for the users to build applications on top of the computing infrastructure: e.g. GANGA
  – Metadata/workflow systems open to the users: needed!
    § This area has yet to be "diagonalised"
  – Monitor and discovery services open to users: e.g. MonALISA in ASAP
• Strong preference for an "a posteriori" approach
  – All experiments still need their system…
  – Since it is really needed, then we* should do it
    § No doubt that technically we* can
We* = the HEP community in collaboration with the middleware experts
People
• Massimo Lamanna
• Frank Harris (EGEE NA4)
• Birger Koblitz
• Andrey Demichev
• Viktor Pose
• Victor Galaktionov
ALICE
• Derek Feichtinger
• Andreas Peters
ATLAS
• Hurng-Chun Lee
• Dietrich Liko
• Frederik Orellana
• Tao-Sheng Chen
CMS
• Julia Andreeva
• Juha Herrala
• Alex Berejnoi
LHCb
• Andrew Maier
• Kuba Moscicki
• Wei-Long Ueng
2 Ph.D. students:
• Craig Munro (Brunel Univ.): distributed analysis within CMS, working mainly with Julia
• Nuno Santos (Coimbra Univ.): metadata and resilient computing, working mainly with Birger
• Catalin Cirstoiu and Slawomir Biegluk (short-term LCG visitors)
Good collaboration with EGEE/LCG Russian institutes and with ASCC Taipei