GridPP2 Metadata Management, Gavin McCance

GridPP2: Metadata Management
Gavin McCance, University of Glasgow
GridPP2 Workshop, UCL, 5 March 2004 (Data Management)
DataGrid is a project funded by the European Union. GridPP is funded by PPARC.

GridPP2 Middleware: Metadata Management

Work areas
• Metadata management and UK metadata group
• Storage management
  - See Jens’ talk

Metadata Management
• The focus is upon Grid-enabling metadata services for the experiments
  - Building upon our previous work in this area
  - Building upon experiments’ existing work in this area
• Formation of a UK metadata group within GridPP2
  - 1 generic Grid metadata post @ Glasgow
  - ~1 post per experiment
    - ATLAS @ Glasgow, LHCb @ Oxford, CMS @ Bristol
    - US expts, others??
  - The UK metadata group will form part of the work of these experiment posts
  - Interaction with the UK data management support teams

GridPP2 Metadata Group
• Purpose will be to
  - Take overall responsibility for common experiment metadata technologies in order to Grid-enable the experiments’ metadata
  - Identify the commonalities and experience across experiments and make sure these are recognized
    - i.e. technologies, schema: data product navigational problem
  - Come to agreement and feed this back into the wider ARDA process
• Work directly with interested groups forming the ARDA
  - EGEE JRA1 Data Management Group (@CERN)
  - LCG Deployment Teams (@CERN)
  - LCG Experiments
  - IT Database group (@CERN)

Metadata Responsibilities
• Generic metadata post @Glasgow:
  - Concentration on the technologies used to create scalable, manageable and fault-tolerant metadata services
    - The underlying Grid software stack
  - Emphasis upon the service, not just the product
    - 24/7 supportable production metadata services
  - Not prescribing things like the schema, or saying the ‘API must look like Spitfire’: prototype interfaces should be based upon experiments’ existing metadata interfaces
  - Will track, develop and adopt as necessary Grid metadata access standards
    - Feed into standards to make sure we’re in a position to benefit from the future production products that implement these standards
    - Feed PPE use-case and experience back into the wider world

Metadata Responsibilities
• Experiment metadata posts (~1 per experiment):
  - Document existing implementations from the experiments
  - Make sure all the experiments’ use-cases are satisfied by the products and the technologies being proposed by the group
  - Work within the group to ensure that commonalities and experience across experiments are recognized and effort is not wasted
    - At the technology level, e.g. using the same underlying Grid software stack
    - At the interface level, e.g. GANGA
    - Possibly at the schema level…
  - Feed this understanding and agreement back into the wider ARDA process and back into their own experiments
• ARDA terminology:
  - Dataset metadata: ARDA Metadata service
  - Data product navigation: ARDA Job Provenance service

Short term plans of the group…
• Immediate task of the group is information gathering
• Current work: http://www.gridpp.ac.uk/datamanagement/metadata/
• A review of how each experiment uses metadata:
  - What you mean by the term metadata: what does it include?
  - Details on this: how do you use the metadata?
  - Implementation and deployment details: how is it split into services, the size of metadata, details on the schema, technologies used, etc.
  - Relation to other products, e.g. POOL
  - Future directions already in people’s minds?

…Short term plans of the group
• The results of this review are being made available on a web page and should be pulled into a document
  - Common format to easily compare the different experiments’ uses of metadata
• This document will serve as input to a metadata workshop ~end of April..?
  - Still to be arranged… @Glasgow? VRVS?
  - Purpose of the workshop will be to identify areas of commonality and work on the future programme for the group
  - Generate ~short-lived sub-tasks within the group with a clear purpose and outcome
  - Continue regular planning meetings to guide these sub-tasks
• Should ensure we have input from other sciences as well
  - Can request input from the EDG WP9/10 groups and EGEE Biomed groups
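
As a rough illustration of what a common format for the review could look like, the sketch below stores each experiment's answers in one record and lists the technologies named by more than one experiment. The field names follow the review questions; the class, the function and the placeholder values ("ExpA", "tech-x" and so on) are hypothetical and are not taken from any experiment's actual response.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class MetadataReview:
    """One experiment's answers to the review questions (hypothetical layout)."""
    experiment: str
    definition: str = ""          # what the term "metadata" includes
    usage: str = ""               # how the metadata is used
    services: List[str] = field(default_factory=list)        # how it is split into services
    technologies: Set[str] = field(default_factory=set)      # technologies used
    related_products: Set[str] = field(default_factory=set)  # e.g. POOL
    future_directions: str = ""


def shared_technologies(reviews, min_experiments=2):
    """Return the technologies named by at least `min_experiments` reviews."""
    counts = Counter(tech for review in reviews for tech in review.technologies)
    return {tech for tech, n in counts.items() if n >= min_experiments}


# Placeholder data only: these are not real survey results.
reviews = [
    MetadataReview("ExpA", technologies={"tech-x", "tech-y"}),
    MetadataReview("ExpB", technologies={"tech-y", "tech-z"}),
]
print(shared_technologies(reviews))  # {'tech-y'}
```

The same counting could be applied to the other set-valued fields to look for commonality at the product level as well as the technology level.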

Links to other projects…
• We can’t do this ourselves…
• EGEE JRA1: The JRA1 data management development cluster of EGEE is based at CERN; we will build upon the relationship formed within EDG (it’s a similar team to EDG)
  - Primary interface to JRA1 will be the generic middleware post at Glasgow
  - Proposal to work directly with JRA1 DM
    - i.e. use the JRA1 CVS repository, use the same development tools and infrastructure, use the experience of the testing and integration teams of EGEE, deliver through this group
  - The large experiment participation in this UK metadata group is seen as very helpful within the JRA1 DM cluster
  - Lack of any formal agreement…

…Links to other projects…
• LCG / EGEE SA1: products delivered to LCG through EGEE JRA1??
  - See UK data management support posts later…
• Experiments: members of the experiments will form part of the metadata group
  - Feed back the work of this group into the experiments and verify that the proposed solutions will work for their experiments
  - Hope is to establish a recognized UK lead in metadata that is recognisably cross-experiment
• ARDA project: Some combination of the above
  - ARDA is now a real project at CERN, though the details of how we work need to be sorted out

…Links to other projects
• Direct testing of our products and solutions for other sciences
  - Planning to do this through the other EGEE application groups
  - e.g. biomed have very strict security requirements
  - Is there another avenue in the UK for this sort of cross-science activity??
• Various Grid and web-service forums:
  - Global Grid Forum
    - Mainly the DAIS group, with probable participation in the related Data Area groups
    - Due to EDG focus on stability and support, we lost touch with the GGF data area groups over the last year or so; re-establish…
  - W3C, OASIS?

Review of objectives and timelines
• Multiple experiment posts with different deliverables and focus
  - Not all of the experiment posts’ work will be within the scope of the metadata group, but all work done should be reported there so that commonalities can be identified early
• As an example of how the work will be divided and for the general timelines, I highlight the relevant objectives for:
  - The generic-middleware metadata post @Glasgow
  - The ATLAS post @Glasgow
  - Then discuss the timelines for the development

Generic middleware objectives
Proforma 2 + 3:
• Development of Grid technologies within a service-focussed architecture (such as WSRF) for use in metadata based applications for the experiments; delivery of fault-tolerant, reliable and manageable software for this purpose. The emphasis from the beginning will be upon developing services that meet the requirements of the experiments;
• Use of this technology for the enabling of existing experiments’ metadata based products in line with the Metadata Catalog service described in the ARDA document (from LCG SC2 RTAG 11);
• Participation in the Grid Forum data areas to ensure that particle physics is in a position to benefit from developments here. Promising developments will influence the design of the metadata services and we will feed back our requirements and experience into these forums.

ATLAS middleware objectives
Proforma 2 + 3:
• Gain a conceptual understanding of the existing ATLAS metadata structures and the ATLAS specific use-cases that drive them;
• Develop, with reference to the use-cases and interactions with other ATLAS developers, the metadata necessary to support the navigational use-cases. Both the schema itself and the optimal location of the metadata require study;
• Understand the analysis use-cases and optimise the event to file granularity for different types of analysis data (ESD, AOD, TAG) depending upon the use-case. Develop automated ways to monitor the best granularity of event data based on analysis access patterns;
• Implement fully working and documented solutions, working with the ATLAS and UK metadata teams to ensure that the developments here are fully integrated with the rest of the ATHENA/GAUDI software, in particular, with the ATLAS Metadata Infrastructure (AMI) product.
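
To make the event-to-file granularity objective concrete, here is a minimal sketch of how access patterns might be scored against candidate file sizes. It is only an illustration under stated assumptions, not the ATLAS approach: events are assumed to be numbered consecutively and packed into files in order, touching any event is assumed to read its whole file, and each file opened is assumed to cost a fixed overhead (`file_overhead`, expressed in event-equivalents). All names and numbers are hypothetical.

```python
def read_cost(requested_events, events_per_file, file_overhead):
    """Cost of one job: every touched file is read in full, plus a per-file overhead."""
    files_touched = {event // events_per_file for event in requested_events}
    return len(files_touched) * (file_overhead + events_per_file)


def best_granularity(access_log, candidates, file_overhead=5):
    """Return the candidate events-per-file with the lowest total cost over the log.

    `access_log` is a list of jobs, each an iterable of requested event numbers.
    """
    def total_cost(events_per_file):
        return sum(read_cost(job, events_per_file, file_overhead) for job in access_log)
    return min(candidates, key=total_cost)


# A job scanning 100 contiguous events favours large files ...
print(best_granularity([range(100)], [1, 10, 100]))          # 100
# ... while a job picking a few scattered events favours small ones.
print(best_granularity([[0, 100, 200, 300]], [1, 10, 100]))  # 1
```

Running such a scoring function periodically over recent access logs is one way the monitoring could be automated; the cost model would need to be replaced by measured I/O behaviour for each data type.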

Timescales for the deliverables…
• Pre: Participate in architecture discussions and prototyping
• PM1: Architecture and Planning “Report”
  - Placing exercise in response to the EGEE architecture
• PM2: Understanding of the Experiment Metadata Requirements (process started now…)
• PM3: Design of Grid Services (Release 1)
• PM7: Software and Associated Documentation (Release 1)
• PM9: Participate in LCG TDR Review
• PM10: Tier 1 and 2 Support “Report”
  - In collaboration with UK data management support posts
• PM11: Detailed Metadata Requirements “Report”
• PM11: Architecture and Planning (Release 2)

…Timescales for deliverables
• PM12: Design and Refactor of Grid Services (Release 2)
• PM16: Software and Associated Documentation (Release 2)
• PM21: Tier 1 and Tier 2 Detailed Support Plan
  - In collaboration with UK data management support posts
• PM23: Architecture and Planning (Release 3)
• PM26: Design and Refactor of Grid Services (Release 3)
• PM32: Software and Associated Documentation (Release 3)
• PM36: Final Report

Support team of GridPP2
• UK data management support posts
  - Aim: to provide first-level support for all DM software
    - first stop for UK system administrators
  - Work directly with the development and deployment teams (GridPP2 Metadata Group and Storage, EGEE and LCG)
  - Provide hands-on deployment help for data challenge support
  - Develop how-to portal to collect deployment experience
  - Feed back sys-admin issues and experience to developers
    - Site policies, quotas, firewalls: survey sysadmins
  - Develop site validation tools
  - Responsible for developing the overall support plan for the data management services beyond GridPP2
  - Need to fit all this in with the rest of the UK Support Plan