Don Quijote Data Management for the ATLAS Automatic Production System
CHEP 2004, 27/09/2004
Miguel Branco, CERN PH-ATC
miguel.branco@cern.ch

Overview
• Introduction
• Architecture
• End-user tools and APIs
• Future Plans, Conclusion and Additional Information

ATLAS Data Challenges
• ATLAS decided to undertake a series of Data Challenges in order to validate its Computing Model, its software and its data model
• Started in summer 2004:
  - ATLAS DC-2
• Introduced the new ATLAS Automatic Production System:
  - Unsupervised production across many sites spread over three different Grids (US Grid 3, NorduGrid, LCG-2)
  - 3 major components:
    - Windmill: ATLAS Production Supervisor
    - Job Executors: one executor per “grid flavor”
    - Common Data Management system
• The decision was taken to implement a single data management system capable of accessing all ATLAS Data Challenges data

Don Quijote
• Don Quijote (DQ) is a high-level interface for grid data management for the ATLAS Automatic Production System
• Allows transparent registration and movement of replicas between all grid “flavors” used by ATLAS:
  - US Grid 3, NorduGrid and LCG-2
  - Find common features between tools and catalogs
  - “Bridge” them and provide a unified interface
  - Lightweight clients
• Avoid creating yet another replica and metadata catalog
• Use existing catalogs and data management tools
• Accessible as a Service
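The “bridge” idea above can be illustrated with a small Python sketch: each grid flavor sits behind a common catalog interface, and a unified front end queries all of them instead of keeping a catalog of its own. The class names, LFNs and PFNs are hypothetical, and the in-memory catalogs only stand in for real backends such as Globus RLS or LCG RLS; this is not the DQ API.

```python
from abc import ABC, abstractmethod


class ReplicaCatalog(ABC):
    """Hypothetical common interface that every grid-specific catalog adapter implements."""

    grid: str  # name of the grid flavor this adapter talks to

    @abstractmethod
    def list_replicas(self, lfn: str) -> list[str]:
        """Return the physical file names (PFNs) registered for a logical file name (LFN)."""


class InMemoryCatalog(ReplicaCatalog):
    """Stand-in for a real backend (e.g. an adapter over Globus RLS or LCG RLS)."""

    def __init__(self, grid: str, entries: dict[str, list[str]]):
        self.grid = grid
        self._entries = entries

    def list_replicas(self, lfn: str) -> list[str]:
        return list(self._entries.get(lfn, []))


class UnifiedCatalog:
    """Answers lookups by querying the existing per-grid catalogs; it stores nothing itself."""

    def __init__(self, catalogs: list[ReplicaCatalog]):
        self._catalogs = catalogs

    def list_replicas(self, lfn: str) -> dict[str, list[str]]:
        # Ask every grid and keep only the grids that actually hold replicas.
        found: dict[str, list[str]] = {}
        for catalog in self._catalogs:
            pfns = catalog.list_replicas(lfn)
            if pfns:
                found[catalog.grid] = pfns
        return found


# Usage with hypothetical LFNs and PFNs:
dq = UnifiedCatalog([
    InMemoryCatalog("LCG-2", {"dc2.simul.0042": ["srm://se.lcg.example.org/dc2/simul.0042"]}),
    InMemoryCatalog("NorduGrid", {"dc2.simul.0042": ["gsiftp://se.ng.example.org/dc2/simul.0042"]}),
    InMemoryCatalog("US Grid 3", {}),
])
replicas = dq.list_replicas("dc2.simul.0042")
```

Because the front end only delegates, adding a grid means writing one more adapter; no new catalog has to be populated or kept consistent.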

Architecture
• Servers
  - One per “Grid”
  - GSI-enabled version and insecure version (with service certificate)
  - Multiple configuration settings
• Client
  - C++ client API
  - User interface tools in Python
  - Configuration file indicating the endpoint of each server
[Diagram: the DQ client talks to one DQ server per grid: NorduGrid and US Grid 3 (each backed by Globus RLS 2.x) and LCG-2 (backed by LCG RLS).]
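A minimal sketch of the client-side configuration, assuming an INI-style file read with Python's standard configparser. The section names, keys and host names are hypothetical and do not reproduce DQ's actual configuration format; the `secure` flag is an assumed way of choosing between the GSI-enabled and insecure servers.

```python
import configparser

# Hypothetical client configuration: one section per grid flavor, each giving the
# endpoint of that grid's DQ server. Keys and host names are illustrative only.
EXAMPLE_CONFIG = """
[LCG-2]
endpoint = https://dq-lcg.example.org:8443/dq
secure = yes

[NorduGrid]
endpoint = https://dq-ng.example.org:8443/dq
secure = yes

[US Grid 3]
endpoint = http://dq-grid3.example.org:8000/dq
secure = no
"""


def load_endpoints(text: str) -> dict[str, tuple[str, bool]]:
    """Map each grid flavor to (server endpoint, whether the GSI-enabled server is used)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        grid: (parser[grid]["endpoint"], parser[grid].getboolean("secure"))
        for grid in parser.sections()
    }


endpoints = load_endpoints(EXAMPLE_CONFIG)
```

Keeping the endpoints in a file rather than in the client code is what lets the same lightweight client be pointed at different server setups.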

Architecture
[Diagram: replicating a file across grids. Participants: DQ Client, DQ-LCG server, DQ-Grid 3 server, DQ-NG server, the source storage (from NorduGrid) and the destination castorgrid.ific.uv.es, connected by a 3rd-party transfer. Recoverable messages: the client asks “Replicate this LFN to castorgrid.ific.uv.es”; the server replies “Ok. Taking care of this. Will let you know when it’s done.”; the DQ servers are asked “Who has replicas of the LFN?” and return their replicas; whoever owns castorgrid.ific.uv.es is asked to “stage this one and return me a GridFTP Transport URL” (TURL); a URL copy then moves the file between the Transport URLs and registers the replica in the replica catalog, maintaining its metadata attributes.]
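The replication flow in the diagram can be sketched as an in-memory simulation. It rests on several assumptions: the catalogs are plain dictionaries, staging is skipped (the source gsiftp PFN is used directly as its TURL), the first replica found is taken as the source, and the 3rd-party copy is recorded rather than performed. The LFN, source PFN and metadata values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    pfn: str
    metadata: dict[str, str] = field(default_factory=dict)


# Hypothetical per-grid replica catalogs: grid -> LFN -> replicas.
catalogs: dict[str, dict[str, list[Replica]]] = {
    "NorduGrid": {
        "dc2.simul.0042": [
            Replica("gsiftp://se.ng.example.org/dc2/simul.0042",
                    {"md5": "0123456789abcdef0123456789abcdef", "size": "1048576"}),
        ],
    },
    "LCG-2": {},
}
transfer_log: list[tuple[str, str]] = []  # records copies instead of running GridFTP


def replicate(lfn: str, dest_grid: str, dest_host: str) -> Replica:
    # 1. "Who has replicas of this LFN?": query every grid's catalog.
    sources = [r for catalog in catalogs.values() for r in catalog.get(lfn, [])]
    if not sources:
        raise LookupError(f"no replicas of {lfn}")
    source = sources[0]  # simplest possible choice of source replica
    # 2. Transport URLs: staging is skipped, the gsiftp PFN serves as the source TURL.
    src_turl = source.pfn
    dest_turl = f"gsiftp://{dest_host}/{lfn}"
    # 3. Third-party copy between the two storage elements (simulated).
    transfer_log.append((src_turl, dest_turl))
    # 4. Register the new replica at the destination, keeping the metadata attributes.
    new_replica = Replica(dest_turl, dict(source.metadata))
    catalogs[dest_grid].setdefault(lfn, []).append(new_replica)
    return new_replica


replica = replicate("dc2.simul.0042", "LCG-2", "castorgrid.ific.uv.es")
```

In a 3rd-party transfer the data flows directly between the two storage elements; the coordinating server only issues the copy and then updates the catalog.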

DQ modules
• Current structure:
  - DqCore (C++ Client Module)
  - Catalog and storage access modules: DqPoolRls, DqGlobusRls, DqLcgReplicaAccess, DqClassicReplicaAccess
  - Information service modules: DqLcgInfoService, DqVdtInfoService, DqNgInfoService
  - dq.py (Python Module, C++ Python wrapper)
  - dms.py (Production User Interface)
  - dms2.py (End-user Client tool)
  - DqFactory, DqConfigFile, DqInterface, DqMonitor, DqUI
  - Servers: DqServerLcg, DqServerNg, DqServerVdt
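With one information-service module per grid (DqLcgInfoService, DqVdtInfoService, DqNgInfoService), a factory such as DqFactory presumably selects the right implementation at run time. The Python sketch below shows that pattern under this assumption; the classes are stand-ins rather than the real C++ modules, and the registry layout is hypothetical.

```python
class InfoService:
    """Stand-in for a per-grid information-service module (the real DQ modules are C++)."""

    def __init__(self, grid: str):
        self.grid = grid


class LcgInfoService(InfoService):
    """Would query the LCG-2 information system."""


class VdtInfoService(InfoService):
    """Would query the VDT-based US Grid 3 information system."""


class NgInfoService(InfoService):
    """Would query the NorduGrid information system."""


class Factory:
    """Hypothetical factory in the spirit of DqFactory: maps a grid flavor to its module."""

    _registry: dict[str, type[InfoService]] = {
        "LCG-2": LcgInfoService,
        "US Grid 3": VdtInfoService,
        "NorduGrid": NgInfoService,
    }

    @classmethod
    def register(cls, grid: str, impl: type[InfoService]) -> None:
        """Plug in a module for an additional grid flavor."""
        cls._registry[grid] = impl

    @classmethod
    def info_service(cls, grid: str) -> InfoService:
        try:
            impl = cls._registry[grid]
        except KeyError:
            raise ValueError(f"unsupported grid flavor: {grid}") from None
        return impl(grid)


service = Factory.info_service("US Grid 3")
```

Callers only ever ask the factory, so supporting another grid amounts to one `register` call rather than changes throughout the client.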

Functionalities provided by API
• What can be done using the client API or command-line tools?
  - Search for replicas of logical files as well as metadata attributes
  - List storage locations
  - Replicate files between storage locations
  - Get a locally accessible physical file from a grid storage
  - Put a file into a grid storage
  - Validate a file (MD5 checksum, file size)
  - Subject to security:
    - Renaming logical files
    - Removing logical files and physical replicas
• All actions above can be executed within or across different grids
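File validation, as listed above, compares a local copy against the checksum and size recorded for it. A minimal sketch using Python's standard hashlib and os modules; in this sketch the expected values are simply passed in, whereas in practice they would come from the catalog metadata (an assumption about how DQ wires this up).

```python
import hashlib
import os
import tempfile


def md5_of(path: str, chunk_size: int = 1 << 20) -> str:
    """MD5 hex digest of a file, read in 1 MiB chunks so large files need little memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate(path: str, expected_md5: str, expected_size: int) -> bool:
    """True if the local file has the expected size and MD5 checksum."""
    # Cheap size check first: it catches truncated copies without reading the file.
    if os.path.getsize(path) != expected_size:
        return False
    return md5_of(path) == expected_md5.lower()


# Usage on a throwaway file containing b"hello" (MD5 5d41402abc4b2a76b9719d911017c592).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
ok = validate(tmp.name, "5d41402abc4b2a76b9719d911017c592", 5)
wrong_size = validate(tmp.name, "5d41402abc4b2a76b9719d911017c592", 6)
os.remove(tmp.name)
```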

End-user tools
• Provide a single tool for end-users to manage data files
  - Integrates all the tools that users would otherwise have to know about into a single one:
    - POOL, EDG, Globus, Castor, …
  - Acts as a Replica Manager
  - Although it is “POOL-aware”, there is nothing ATLAS- or HEP-specific about it
• Eases security requirements for end-users
  - Temporarily, and for some requests only!
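To illustrate the “single tool” idea, the sketch below uses Python's argparse to expose several of the operations listed earlier as sub-commands of one program. The program name `dq-tool`, the sub-command names and their arguments are hypothetical and are not the syntax of the actual DQ end-user tool; the sketch only parses arguments and does not contact any grid service.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """One entry point for operations that would otherwise need separate grid tools.

    Command names and arguments are hypothetical, for illustration only.
    """
    parser = argparse.ArgumentParser(prog="dq-tool")
    sub = parser.add_subparsers(dest="command", required=True)

    find = sub.add_parser("find", help="list replicas of a logical file")
    find.add_argument("lfn")

    rep = sub.add_parser("replicate", help="copy a file to another storage location")
    rep.add_argument("lfn")
    rep.add_argument("destination")

    get = sub.add_parser("get", help="fetch a locally accessible copy")
    get.add_argument("lfn")
    get.add_argument("--to", default=".", help="local directory for the copy")

    put = sub.add_parser("put", help="store a local file and register it")
    put.add_argument("path")
    put.add_argument("lfn")
    return parser


args = build_parser().parse_args(["replicate", "dc2.simul.0042", "castorgrid.ific.uv.es"])
```

The user sees one vocabulary (logical file names and storage locations) whichever grid or storage system sits underneath.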

Future plans
• Decouple DQ modules into a full Service Oriented Architecture
  - Outsource module implementations
  - Most commonly accessed files/partitions/datasets, …
  - Twiki-based
  - Prototype being developed by Frederik Orellana
• Monitoring of server requests
• Reliable File Transfer service (Tier 0 exercise)
• Working on documentation
• Interface to EGEE/gLite from the ARDA project
• Future? No plans for a major rewrite, only refactoring
• Most important is to maintain the same interface for end-users and for the production system

Conclusion
• Don Quijote is becoming the default grid data file access layer for ATLAS
  - “New catalogs are coming from grid projects; we should stick with our present DQ insulation layer” (ATLAS Database and Data Management project)
• Accomplished the goal of exposing the middleware of different grids through a unified interface
• Client tools for end-users as well as for production managers
• DQ usage:
  - Can access ~32 TB of data and ~140 K files produced so far by the ATLAS DC since early June
  - Total requests: over 600 000, mostly to the replica catalogs without file movement
  - File transfers: only around 3 TB so far; will increase to around 35 TB with the Tier 0 exercise in the coming weeks
• Overall, there is still some way to go to provide a unified system to access ATLAS production data
  - DQ aims to help build that unified system
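As a rough check on the usage figures above, a few ratios follow directly from them. This sketch assumes decimal units (1 TB = 10^12 bytes) and treats the approximate values on the slide as exact, so the results are order-of-magnitude indications only.

```python
# Figures quoted on the slide, taken as exact; decimal units assumed (1 TB = 10**12 bytes).
accessible_bytes = 32e12        # ~32 TB accessible through DQ
files = 140_000                 # ~140 K files
transferred_bytes = 3e12        # ~3 TB transferred so far
planned_transfer_bytes = 35e12  # ~35 TB expected with the Tier 0 exercise

avg_file_mb = accessible_bytes / files / 1e6                 # about 229 MB per file
transfer_ratio = transferred_bytes / accessible_bytes        # about 0.094 of the accessible volume
transfer_growth = planned_transfer_bytes / transferred_bytes  # about 11.7 times the current volume
```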

Additional information
• DQ web page:
  - http://cern.ch/mbranco/cern/donquijote/
• DQ docs (twiki):
  - https://uimon.cern.ch/twiki/bin/view/Atlas/DonQuijote
• Feel free to contact me:
  - miguel.branco@cern.ch