“Overview of UK Development and Deployment Programme”
LCG PEB Meeting, CERN, 16 September 2003
Tony Doyle (a.doyle@physics.gla.ac.uk) - University of Glasgow

Outline
• Management
• The Project Map
• GridPP Status
• UK Grid Users
• Deployment: LCG and UK perspective
• Current Resources
• EDG 2.0/LCG 1.0 Deployment Status
• Accounting
• Today’s Operations
• Future Operation Planning
• Middleware Status
• Middleware Evolution
• GridPP2 Planning Status

GridPP in Context
[Diagram, not to scale. Labels: Experiments, Institutes, Tier-2, Apps Int, GridPP, Tier-1/A, Core e-Science Programme, Grid Support Centre, Apps Dev, CERN, LCG, Middleware, EGEE]

GridPP Management
• CB (20 members) meets half-yearly to provide Institute overview
• PMB (12 members) meets weekly [via VC] to provide management of project
• TB (10 members) meets as required in response to technical needs and regularly via phone
• EB (14 members) meets quarterly to provide experiments input

GridPP Project Overview

Financial Breakdown
• Five components
  – Tier-1/A = Hardware + 10 CLRC e-Science Staff
  – DataGrid = 25 DataGrid Posts inc. CLRC PPD Staff
  – Applications = 17 Experiments Posts (to interface middleware)
  – Operations = Travel (~100 people) + Management + e Early Investment
  – CERN = 25 LCG posts + Tier-0 + e LTA

Quarterly Reporting
• Quarterly reporting allows comparison of delivered effort with expected effort
• Feedback loop as issues arise

Funded Effort Breakdown (Snapshot 2003 Q3)
• LCG effort is the largest single area of GridPP
• Future project priorities focussed on LCG and EGEE

GridPP Status: The Project Map

GridPP Status: Summary
• GridPP1 has now completed 2 of its 3 years
• All metrics are currently satisfied
• 103 of 182 tasks are complete
• 70 tasks are neither complete nor overdue
• 9 tasks are overdue:
  – 6 are associated with LCG
    • 2 of these are trivial (definition of future milestones)
    • 4 of these are related to the delay in LCG-1
  – 2 are associated with applications (CMS and D0)
  – 1 is associated with the UK infrastructure (test of a heterogeneous testbed)
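The task counts on the summary slide can be cross-checked with a few lines of Python. This is only a sketch: the variable names are hypothetical, and it assumes that the "70 tasks" figure means tasks that are neither complete nor overdue, so that the three categories partition the 182 tasks.

```python
# Task counts quoted on the "GridPP Status: Summary" slide.
total_tasks = 182
complete = 103
not_yet_due = 70  # assumption: neither complete nor overdue
overdue = 9

# Breakdown of the overdue tasks by area, as listed on the slide.
overdue_by_area = {"LCG": 6, "applications": 2, "UK infrastructure": 1}
# Breakdown of the LCG-related overdue tasks.
lcg_overdue = {"future milestone definitions": 2, "LCG-1 delay": 4}


def fraction_complete(done: int, total: int) -> float:
    """Return the completed share of tasks as a fraction of the total."""
    return done / total


categories_sum = complete + not_yet_due + overdue
print("categories partition the total:", categories_sum == total_tasks)
print("overdue breakdown consistent:", sum(overdue_by_area.values()) == overdue)
print(f"share complete: {fraction_complete(complete, total_tasks):.1%}")
```

Under that reading the three categories account for every task, and a little over half of the project's tasks were complete two thirds of the way through GridPP1.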

Risk Register (Status April 03)
• Scaling up to a production system (LCG-1 deployment)
• System management effort at UK Tier-2 sites (being addressed as part of GridPP2)

UK Certificates and VO membership
1. UK e-Science CA now used in production EDG testbed
2. PP “users” engaged from many institutes
3. UK participating in 6 of 9 EDG VOs

UK Deployment Overview
• Significant resources within EDG. Currently being upgraded to EDG 2.
• Integrating EDG on farms has been repeated many times but it is difficult.
• Sites are keen to take part within EDG 2 currently, with LCG 1 deployment after this.
• By the end of the year many HEP farms plan to be contributing to LCG 1 resources.
• Basis of Deployment Input to LCG Plan. Input from Tier-1 (~50%) initially and four distributed Tier-2s (50%) on ~Q1 2004 timescale.

Country        CPU (kSI2K)   Disk (TB)   Support (FTE)   Tape (TB)
CERN                   700         160               -        1000
Czech Repub             60           5             2.5           5
France                 420          81            10.2         540
Germany                207          40             9.0          62
Holland                124           3             4.0          12
Italy                  507          60            16.0         100
Japan                  220          45             5.0         100
Poland                  86           9             5.0          28
Russia                 120          30            10.0          40
Taiwan                 220          30             4.0         120
Spain                  150          30             4.0         100
Sweden                 179          40             2.0          40
Switzerland             26           5             2.0          40
UK                    1656         226            17.3         295
USA                    801         176            15.5        1741
Total                 5600        1169           120.0        4223

Tier-1 @ RAL
• UI within CSF.
• NM for EDG 2.
• Top level MDS for EDG.
• Various WP3 and WP5 dev nodes.
• VOMS for DEV TB.
• http://ganglia.gridpp.rl.ac.uk/
[Diagram labels: Tier 1/A, WP3 Testbed, LCG Testbed, LCG 0 Testbed, EDG Dev Testbed, LCG 1.0/EDG 2.0, EDG 2.0, CE, SE, MON, ADS, 5xWN, 230xWN, 1xWN]

London Grid: Imperial College
• Plan to be in LCG 1 and other testbeds.
• RB for EDG 2.0.
[Diagram labels: EDG Testbed, BaBar Farm, CMS-LCG 0, WP3 Testbed, EDG 2.0, CE, SE, MON, WNs, 1xWN]

London Grid: Queen Mary and UCL
• Queen Mary CE also feeds EDG jobs to 32 node e-Science farm.
• Plan to have LCG 1/EDG 2 running for the end of the year.
• Expansion with SRIF grants (64 WN + 2 TB in Jan 2004, 100 WN + 8 TB in Dec 2004).
• http://194.36.10.1/ganglia-webfrontend
• UCL Network Monitors for WP7 development.
• SRIF bid in place for 200 CPUs for the end of the year to join LCG 1.
[Diagram labels: two EDG Testbeds, each EDG 1.4 with CE and SE; 1xWN, 32xWN, 1xWN]

Southern Grid: Bristol
• GridPP RC.
• Plan to join LCG 1.
[Diagram labels: EDG Testbed, WP3 Testbed, CMS-LCG 0, CMS/LHCb Farm, BaBar Farm, EDG 2.0, EDG 1.4, CE, SE, MON, 1xWN, 24xWN, 78xWN]

Southern Grid: Cambridge and Oxford
• Some RH73 WNs for ongoing ATLAS challenge.
• Cambridge farm shared with local NA-48, GANGA users.
• 3 TB GridFTP-SE.
• Plan to join LCG 1/EDG 2 later in the year with an extra 50 CPUs.
• EDG jobs will be fed into local e-Science farm.
• Oxford: Plan to join EDG 2/LCG 1.
• http://farm002.hep.phy.cam.ac.uk/cavendish/
• Nagios monitoring has been set up.
• (RAL is also evaluating Nagios)
• Planning to send EDG jobs into 10 WN CDF farm.
• 128 node cluster being ordered now.
[Diagram labels: two EDG Testbeds, each EDG 1.4 with CE and SE; 15xWN, 2xWN]

Southern Grid: RAL PPD and Birmingham
• PPD User Interface
• Part of Southern Tier 2 Centre within LCG 1.
• 50 CPUs and 5 TB of disk expected for the end of year.
• Birmingham: Expansion to 60 CPUs and 4 TB.
• Expect to participate within LCG 1/EDG 2.
[Diagram labels: EDG Testbed (EDG 2.0), WP3 Testbed, EDG Testbed (EDG 1.4), CE, SE, MON, 1xWN, 9xWN, 1xWN]

NorthGrid: Manchester and Liverpool
• GridPP and BaBar VO Servers.
• User Interface
• Plan that DZero farm will join LCG.
• SRIF bid in place for significant HEP resources.
• Liverpool plan to follow EDG 2, possibly integrating newly installed Dell (funded by NW Development Agency) and BaBar farm. Largest single Tier-2 resource.
[Diagram labels: EDG Testbed, DZero Farm, BaBar Farm, EDG 1.4, CE, SE, SE (1.5 TB), SE (5 TB), 9xWN, 80xWN, 60xWN, 1xWN]

ScotGrid: Glasgow, Edinburgh and Durham
• WNs on a private network with outbound NAT in place.
• Various WP2 development boxes.
• 34 dual blade servers just arrived. 5 TB FastT500 expected soon.
• Shared resources (CDF and Bioinformatics)
• Edinburgh: 24 TB FastT700 and 8-way server just arrived.
• Durham: existing farm available.
• Plan to be part of LCG.
[Diagram labels: ScotGRID (EDG 1.4), WP3 Testbed (EDG 2.0), CE, SE, MON, 59xWN, CDF, LHC, BIO]

EDG 2.0 Deployment Status 12/9/03
• RAL (Tier 1A): Up and running with 2.0.1. UI gppui04 available (as part of CSF), with an offer of access to an LCFGng node to help people compare with their own LCFGng setup.
• IC: Existing WP3 testbed site is at 2.0.0. Standard 2.0 RB available.
• UCL: Trying to go to 2.0: SE up so far.
• QMUL: 2.0 installation ongoing.
• RAL (PPD): 2.0.0 site up and running.
• Oxford: wait until October for 2.0.
• Birmingham: Working on getting a 2.0 site up next week.
• Bristol: WP3 testbed site at 2.0.0. Also doing a new 2.0 site install. UI and MON up, still doing CE, SE and WN.
• Cambridge to follow.
• Manchester: Trying to get 2.0.1 set up.
• Glasgow: Concentrating on commissioning new hardware during the next month. Wait until then before going to 2.0.
• Edinburgh to follow.

Meeting Current LHC Requirements: Experiment Accounting
• Experiment-driven project.
• Priorities determined by Experiments Board.

Tier-1/A Accounting
• Annual accounting: ATLAS, CMS and LHCb jobs. Generally dominated by BaBar since January.
• Monthly accounting: Online Ganglia-based monitoring, see: http://www.gridpp.ac.uk/tier1a/
• Last month: CMS (and BaBar) jobs.

Today’s Operations
1. Support Team
  • Built from sysadmins. 4 funded by GridPP to work on EDG WP6, the rest are site sysadmins.
2. Methods
  • Email list, phone meetings, personal visits, job submission monitoring
  • RB, VO, RC for UK use to support non-EDG use
3. Rollout
  • Experience from RAL in EDG dev testbeds and IC and Bristol in CMS testbeds
  • 10 sites have been part of EDG app testbed at one time

GridPP2 Operations
• To move from testbed to production, GridPP plans a bigger team with a full-time Operations Manager
• Manpower will be from the Tier-1 and Tier-2 Centres who will contribute to the Production Team
• The team will run a UK Grid which will belong to various grids (EDG, LCG, ...) and also support other experiments
• RAL is also leading the LCG Security Group
  – written 4 documents setting out procedures and User Rules
  – working with GOC task force on Security Policy
  – Risk Analysis and further planning for LCG in 2004

LCG Operations
• RAL has led project to develop an Operations Centre for LCG 1
  – Applied GridPP and MapCenter monitoring to LCG 1
  – Dashboard combining several types of monitoring
  – Set up a web site with contact information
  – Developing Security Plan
  – Accounting (the current priority, building upon resource centre and experiment accounting)

EGEE
• The UK Production Team will be expanded as part of EGEE ROC and CIC posts to meet EGEE requirements
• To deliver an EGEE grid infrastructure that must also deliver to other communities and projects
• Could do this just within PP (matching funding available) but also want to engage fully with UK Core programme
[Diagram labels: EGEE CIC (4.5 FTE), UK Team (8 FTE), EGEE ROC (5 FTE), (2 FTE), UK GSC (2 FTE), Tier 1 (16.5 FTE)]

Tier-1/A Services [FTE]
• High quality data services
• National and International Role
• UK Focus for International Grid development
• Highest single priority within GridPP2

Service          Regained Programme [FTE]
CPU              2.0
Disk             1.5
AFS              0.0
Tape             2.5
Core Services    2.0
Operations       2.5
Networking       0.5
Security         0.0
Deployment       2.0
Experiments      2.0
Management       1.5
Total            16.5
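As a quick consistency check, the per-service FTE figures on this slide can be summed and compared with the quoted total. The sketch below simply transcribes the slide's numbers; the dictionary and variable names are hypothetical.

```python
# Tier-1/A staff effort per service in FTE, transcribed from the slide.
tier1_fte = {
    "CPU": 2.0,
    "Disk": 1.5,
    "AFS": 0.0,
    "Tape": 2.5,
    "Core Services": 2.0,
    "Operations": 2.5,
    "Networking": 0.5,
    "Security": 0.0,
    "Deployment": 2.0,
    "Experiments": 2.0,
    "Management": 1.5,
}
quoted_total = 16.5  # "Total" row on the slide

# Multiples of 0.5 are exact in binary floating point, so == is safe here.
computed_total = sum(tier1_fte.values())
print("computed total:", computed_total)
print("matches quoted total:", computed_total == quoted_total)
```

The same 16.5 FTE figure is the one attached to the Tier 1 box on the EGEE slide.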

Tier-2 Services [FTE]
• Four Regional Tier-2 Centres
  – London: Brunel, Imperial College, QMUL, RHUL, UCL.
  – SouthGrid: Birmingham, Bristol, Cambridge, Oxford, RAL PPD.
  – NorthGrid: CLRC Daresbury, Lancaster, Liverpool, Manchester, Sheffield.
  – ScotGrid: Durham, Edinburgh, Glasgow.
• Hardware provided by Institutes
• GridPP provides added manpower

Current Planning        Y1      Y2      Y3
Hardware Support        4.0     8.0
Core Services           4.0
User Support            1.0     2.0
Specialist Services
  Security              1.0
  Resource Broker       1.0
  Network               0.5
  Data Management       2.0
  VO Management         0.5
Subtotal                14.0    19.0
Existing Staff          -4.0
GridPP2                 10.0    15.0
Total SY                40.0
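The Tier-2 planning figures survive only partially in this transcript, so a short calculation helps show how they fit together. The sketch below rests on two assumptions that the slide text does not state: a row with a single value keeps that value in every year, and Y3 repeats Y2. All names are hypothetical.

```python
# Tier-2 effort in FTE as (Y1, Y2). Rows with one value on the slide are
# ASSUMED constant across years; that is not stated in the source.
tier2_rows = {
    "Hardware Support": (4.0, 8.0),
    "Core Services": (4.0, 4.0),
    "User Support": (1.0, 2.0),
    "Security": (1.0, 1.0),
    "Resource Broker": (1.0, 1.0),
    "Network": (0.5, 0.5),
    "Data Management": (2.0, 2.0),
    "VO Management": (0.5, 0.5),
}
existing_staff = 4.0  # shown as -4.0 on the slide

y1_total = sum(y1 for y1, _ in tier2_rows.values())
y2_total = sum(y2 for _, y2 in tier2_rows.values())

# New GridPP2-funded effort is the yearly total minus existing staff.
gridpp2_y1 = y1_total - existing_staff
gridpp2_y2 = y2_total - existing_staff
gridpp2_y3 = gridpp2_y2  # ASSUMPTION: Y3 repeats Y2

total_staff_years = gridpp2_y1 + gridpp2_y2 + gridpp2_y3
print("yearly totals:", y1_total, y2_total)
print("GridPP2-funded:", gridpp2_y1, gridpp2_y2, gridpp2_y3)
print("total staff-years:", total_staff_years)
```

Under these assumptions the calculation reproduces every figure quoted on the slide (14.0 and 19.0, 10.0 and 15.0, and 40.0 staff-years), which suggests the reading is plausible without confirming it.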

Operational Roles
• Core Infrastructure Services (CIC)
  – Grid information services
  – Monitoring services
  – Resource brokering
  – Allocation and scheduling services
  – Replica data catalogues
  – Authorisation services
  – Accounting services
• Core Operational Tasks (ROC)
  – Monitor infrastructure, components and services
  – Troubleshooting
  – Verification of new sites joining Grid
  – Acceptance tests of new middleware releases
  – Verify suppliers are meeting SLA
  – Performance tuning and optimisation
  – Publishing use figures and accounts
• Still to be defined fully in EGEE

SRB for CMS
• UK eScience has been interested in SRB for several years.
• CCLRC has gained expertise for other projects and is collaborating with SDSC
• Now hosting MCAT for worldwide CMS pre-DC04
• Interfaced to RAL Datastore
  – Service started 1 July 2003
• 183,000 files registered
• 10 TB of data stored in system
• Used across 13 sites worldwide including CERN and Fermilab
• 30 storage resources managed across the sites
[Diagram labels: SRB Client, SRB A Server, SRB B Server, MCAT Server, MCAT Database, steps a to g]

EDG StorageElement
• Not initially adopted by LCG 1
• Since then limited SRM functionality has been added to support GFAL
  – available for test by LCG
• Full SRMv1 functionality has been developed and is currently being integrated on internal testbed
• GACLs being integrated

RGMA - Status
• Running on WP3, EDG-development and EDG-application testbeds
• Application Deployment: 29 CEs, 11 SEs, 10 sites in 6 countries
  – RGMA browser access in < 1 sec
• Monitoring scripts being run on the testbeds and results linked from the WP3 web page
  – http://hepunx.rl.ac.uk/edg/wp3/
• Registry replication is being tested on WP3 testbed
  – Better performance & higher reliability required
• Authentication successfully tested on WP3 testbed
• Two known bugs remain
  – Excessive threads requiring GOUT machine restart
    • New code has been developed with extensive unit tests. Now being tested on WP3 testbed
    • This new code will support at least 90 sites
  – Latest Producer choosing algorithm failing to reject bad LPs; shows up as intermittent absence of information
    • Revised algorithm needs coding (localised change)

RGMA - Users
• Users and Interfaces to other systems:
  – Resource Broker
  – CMS (Boss)
  – Service and Service Status for all EDG services
  – Network Monitoring & Network Cost Function
  – MapCenter
  – Logging & Bookkeeping
  – UK e-Science, CrossGrid and BaBar evaluating
  – Replica Manager
  – MDS (GIN/GOUT)
  – Nagios
  – Ganglia (Ranglia)
• Future: RB direct use of RGMA (no GOUT)
  – Better performance and reliability

Middleware, Security and Network Service Evolution
• Information Services [5+5 FTE] and Networking [1.5+1.5 FTE]: strategic roles within EGEE
• Security expands to meet requirements
• Data and Workload Management continue
• No further configuration management development
• Programme defined by
  – mission criticality (experiment requirements driven)
  – International/UK-wide lead
  – leverage of EGEE, UK core and LCG developments

Activity          Current Planning [FTE]
Security          3.5
Info-Mon.         4.0
Data & Storage    4.0
Workload          1.5
Networking        3.0
TOTAL             16.0

GridPP2 Proposal
http://www.gridpp.ac.uk/docs/gridpp2/
~30 page proposal + figures/tables + 11 planning documents:
15. Tier-0
16. Tier-1
17. Tier-2
18. The Network Sector
19. Middleware
20. Applications
21. Hardware Requirements
22. Management
23. Travel
24. Dissemination
25. From Testbed to Production

Current planning based upon £19.6m Funding Scenario
PPARC Review Timeline:
• Projects Peer Review Panel (14-15/7/03)
• Grid Steering Committee (28-29/7/03)
• Science Committee (October 03)

Timeline

GridPP2 Project Map
• Need to build this in: to identify progress…

Experiment Requirements: UK only
[Chart: Total Requirement]

Meeting the Experiments’ Hardware Requirements
• Significant… Production Grid inc. Tier-2 resources needed…

Projected Hardware Resources
[Charts: Total Resources for 2004 and 2007 (note x2 scale change)]

Application Interfaces - Service Evolution
• Applications
  – 18 FTEs: ongoing programme of work can continue
  – Difficult to involve experiment activity not already engaged within GridPP
• Project would need to build on cross-experiment collaboration
  – GridPP1 already has experience
  – GANGA: ATLAS & LHCb
  – SAM: CDF & D0
  – Persistency: CMS & BaBar
• Encourage new joint developments across experiments

Conclusions
• Management under control via the Project Map and Project Plan
• GridPP Status is defined in terms of high level tasks and metrics: under control
• Major component is LCG
  – We contribute significantly to LCG and our success depends critically on LCG
• Deployment: high and low level perspectives merge via accounting
• Resource centre and experiment accounting are both important
• Comprehensive accounting is a priority, built up from existing systems
• Today’s operations in the UK are built around a small team
• Future operations planning expands this team significantly: Production Manager being appointed
• Middleware deployment focussing on Information Service performance issues
• Existing IS team will be reinforced in UK within EGEE
• Security (deployment and policy) is emphasised
• GridPP2 planning status: formal feedback in November