SEE-GRID-SCI SA1 Report
www.see-grid-sci.eu
PSC03 Meeting, Bucharest, 15-16 Jan 2009

Antun Balaz, SA1 Leader
Institute of Physics Belgrade
antun@phy.bg.ac.yu

The SEE-GRID-SCI initiative is co-funded by the European Commission under the FP7 Research Infrastructures contract no. 211338

Overview
Objectives and metrics
Deliverables, milestones
VO resources and services/deployment status
Grid operations/interoperations
Operational and monitoring tools
Action points
PSC03 Meeting, Bucharest, 15-16 January 2009

Infrastructure and operations related objectives of SEE-GRID-SCI
The DoW states that SEE-GRID-SCI will provide and operate the next-generation eInfrastructure for the SEE region. In the SEE-GRID-SCI context this refers to operating the Grid infrastructure and specific end-user services for the benefit of new user communities.
Objective 2: Providing infrastructure for new communities
  § O2.1: Expand the current infrastructure
    q MTSA1.1: Increase in the number of computing and storage resources (tables given in DoW)
  § O2.2: Inclusion of Armenia and Georgia
    q MTSA1.2: Number of Grid sites and processing and storage resources (tables given in DoW)
  § O2.3: Achieve high reliability, availability and automation
    q MTSA1.3: Increase of the average overall Grid site availability (M01 >= 70%, M12 >= 75%, M24 >= 81%)
    q MTSA1.4: Number of successful jobs run as % of total jobs (M01 >= 50%, M12 >= 55%, M24 >= 60%)
    q MTSA1.5: Number of management tools expanded or developed (+ achieving tools integration and automation)
  § O2.4: Provision of the network link to Moldova
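The thresholds for MTSA1.3 and MTSA1.4 lend themselves to a mechanical check. The following is a minimal Python sketch, not part of the project tooling: the function names are hypothetical, and it assumes that each threshold applies from its milestone month until the next one is reached (the slide only lists the three milestone values).

```python
# Thresholds quoted on the slide, in percent, keyed by project month.
TARGETS = {
    "MTSA1.3": {1: 70, 12: 75, 24: 81},  # average overall Grid site availability
    "MTSA1.4": {1: 50, 12: 55, 24: 60},  # successful jobs as % of total jobs
}


def target_for(metric, month):
    """Return the threshold in force at the given project month.

    Assumption: a threshold stays in force until the next milestone month.
    """
    milestones = TARGETS[metric]
    reached = [m for m in milestones if m <= month]
    if not reached:
        raise ValueError("no threshold defined before month %d" % month)
    return milestones[max(reached)]


def meets_target(metric, month, measured_percent):
    """True if the measured value reaches the threshold for that month."""
    return measured_percent >= target_for(metric, month)
```

For instance, `meets_target("MTSA1.3", 12, 87)` evaluates to `True`, since the M12 availability threshold is 75%.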

SA1 deliverables
DSA1.1a: Infrastructure Deployment Plan (M04)
  § CERN, Editor: D. Stojiljkovic
DSA1.2: SLA detailed specification and related monitoring tools (M05)
  § UOBL, Editor: M. Savic
DSA1.3a: Infrastructure overview and assessment (M12)
  § UKIM, Editor: B. Jakimovski
DSA1.1b: Infrastructure Deployment Plan (M14)
  § UOB-IPB, Editor: A. Balaz
DSA1.3b: Infrastructure overview and assessment (M23)
  § UKIM, Editor: B. Jakimovski

SA1 milestones
MSA1.1: Infrastructure deployment plan defined (M04)
  § CERN (verified by DSA1.1a)
MSA1.2: SLA structure and enforcement plan defined (M05)
  § UoBL (verified by DSA1.2)
MSA1.3: Network link for Moldova established (M23)
  § RENAM (verified by the operational link to MD and DSA1.3b)
MSA1.4: Infrastructure performance and usage assessed (M23)
  § UKIM (verified by DSA1.3b)

VO resources
Per-country commitments gathered from (almost) all partners: SEEGRIDSCI-SA1-RS-006-SA1-Commitments-b-2009-01-16.xls
DoW commitments must be met
  § CPUs: at least kSI2K numbers
Note that the committed resources must be available to SEE-GRID-SCI NA4 VOs whenever needed
They must be declared in HGSM; current situation in SEEGRIDSCI-SA1-RS-034-SA1-HGSM-VO-commitments-a-2009-01-16.xls, still pending:
  § AL-01-FIT, BA-03-ETFSA, BA-04-PMFSA, BG01-IPP, BG04-ACAD, BG05-SUGrid, BG06-GPHI, HR-*, MD-*, RO-* (except RO-01-ICI), SZTAKI, TR-*

CPU commitments from DoW
         No. of sites            No. of CPUs             Aggregated kSI2k
Country  May-08 May-09 May-10    May-08 May-09 May-10    May-08 May-09 May-10
GR            2      2      2        77    115    150        77    149    224
BG            3      4      4       180    210    260       230    260    280
RO            5      5      5        50     60     80        60     70    100
TR            2      3      3        64    128    200        75    217    340
HU            1      1      1         8     10     12         8     10     13
AL            4      6      6        25     35     35        40     50     50
BA            4      4      4        44     60     84        40     74     92
MK            2      2      2        50     70     70        60     84     84
RS            5      5      6       100    140    200       110    180    260
ME            1      2      2        36     48     60        32     44     56
MD            4      5      6        26     31     39        37     41     49
HR            2      2      3        20     40     60        20     50     80
AM            1      2      3        32     38     42        30     36     39
GE            0      1      1         0      8     14         0     10     18
OVERALL      36     44     48       712    993   1306       819   1275   1685
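As a consistency check on the table, the OVERALL row can be recomputed from the per-country rows. The Python sketch below transcribes the slide's figures; the names `COMMITMENTS`, `OVERALL` and `column_totals` are illustrative. It assumes that RO and BA, which show a single site count on the slide, keep that count for all three dates (5 and 4 respectively), which is the reading that reproduces the OVERALL site totals.

```python
# Per-country DoW commitments: (sites, CPUs, kSI2k),
# each a (May-08, May-09, May-10) triple.
# Assumption: RO and BA site counts are constant over the three dates.
COMMITMENTS = {
    "GR": ((2, 2, 2), (77, 115, 150), (77, 149, 224)),
    "BG": ((3, 4, 4), (180, 210, 260), (230, 260, 280)),
    "RO": ((5, 5, 5), (50, 60, 80), (60, 70, 100)),
    "TR": ((2, 3, 3), (64, 128, 200), (75, 217, 340)),
    "HU": ((1, 1, 1), (8, 10, 12), (8, 10, 13)),
    "AL": ((4, 6, 6), (25, 35, 35), (40, 50, 50)),
    "BA": ((4, 4, 4), (44, 60, 84), (40, 74, 92)),
    "MK": ((2, 2, 2), (50, 70, 70), (60, 84, 84)),
    "RS": ((5, 5, 6), (100, 140, 200), (110, 180, 260)),
    "ME": ((1, 2, 2), (36, 48, 60), (32, 44, 56)),
    "MD": ((4, 5, 6), (26, 31, 39), (37, 41, 49)),
    "HR": ((2, 2, 3), (20, 40, 60), (20, 50, 80)),
    "AM": ((1, 2, 3), (32, 38, 42), (30, 36, 39)),
    "GE": ((0, 1, 1), (0, 8, 14), (0, 10, 18)),
}

# OVERALL row as printed on the slide.
OVERALL = ((36, 44, 48), (712, 993, 1306), (819, 1275, 1685))


def column_totals(table):
    """Sum every column of the table across all countries."""
    totals = [[0, 0, 0] for _ in range(3)]
    for groups in table.values():
        for g, triple in enumerate(groups):
            for i, value in enumerate(triple):
                totals[g][i] += value
    return tuple(tuple(t) for t in totals)


if __name__ == "__main__":
    # Prints True when the per-country rows add up to the OVERALL row.
    print(column_totals(COMMITMENTS) == OVERALL)
```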

Number of sites (figure)

Number of CPUs (figure)

Storage commitments from DoW
Storage resources [TB]
Country     May-08    May-09    May-10
GR               1         2         2
BG               5         7         8
RO             1.3         2       2.5
TR               4        10        10
HU            0.02         1         2
AL             1.2       1.8       1.8
BA            0.75         1      1.25
MK               2         3         3
RS               4         8        12
ME             0.8       1.2         2
MD               6         9        12
HR               1         2         2
AM               1         2         2
GE               0         1         1
OVERALL      28.07        51     61.55

Storage (figure)

VO services/deployment of VOs
Core services for VOs: deployment status in SEEGRIDSCI-SA1-RS-006-SA1-Commitments-b-2009-01-16.xls
VO deployment status in SEEGRIDSCI-SA1-RS-033-SA1-VO-Support-a-2009-01-16.xls
  § meteo not supported on: BA-*, BG01-IPP, HR-*, MD-*, MK-01-UKIM_II, RO-*, SZTAKI
  § seismo not supported on: AEGIS02-RCUB, AEGIS03-ELEF-LEDA, BA-*, BG01-IPP, BG02-IM, BG06-GPHI, HR-*, MD-*, MK-01-UKIM_II, MREN-01-CIS, RO-*, SZTAKI
  § env not supported on: BA-*, HR-*, MD-*, MK-01-UKIM_II, MREN-01-CIS, SZTAKI
Sites not supporting ANY of the discipline VOs: BA-*, HR-*, MD-*, MK-01-UKIM_II, SZTAKI
sgdemo supported on: AEGIS02-RCUB, AEGIS04-KG, AEGIS05-ETFBG, AL-01-FIT, AM-01-IIAP-NAS-RA, BG03-NGCC, BG04-ACAD, HR-01-RBI, MK-01-UKIM_II, MK-02-ETF, RO-03-UPB, SZTAKI, TR-01-ULAKBIM
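The list of sites supporting none of the discipline VOs is the intersection of the three per-VO "not supported" lists. A minimal Python sketch of that derivation follows. It treats wildcard entries such as `BA-*` as literal list items rather than expanding them to individual sites, and it writes the site names with the extraction spacing removed (for example `BG01-IPP`); the names `NOT_SUPPORTED` and `supporting_none` are hypothetical.

```python
# Sites (or country wildcards) on which each discipline VO is NOT supported,
# as listed on the slide. Wildcards are kept as literal entries.
NOT_SUPPORTED = {
    "meteo": {
        "BA-*", "BG01-IPP", "HR-*", "MD-*", "MK-01-UKIM_II", "RO-*", "SZTAKI",
    },
    "seismo": {
        "AEGIS02-RCUB", "AEGIS03-ELEF-LEDA", "BA-*", "BG01-IPP", "BG02-IM",
        "BG06-GPHI", "HR-*", "MD-*", "MK-01-UKIM_II", "MREN-01-CIS", "RO-*",
        "SZTAKI",
    },
    "env": {
        "BA-*", "HR-*", "MD-*", "MK-01-UKIM_II", "MREN-01-CIS", "SZTAKI",
    },
}


def supporting_none(not_supported):
    """Entries that appear in every per-VO 'not supported' list."""
    return set.intersection(*not_supported.values())


if __name__ == "__main__":
    print(sorted(supporting_none(NOT_SUPPORTED)))
```

With the lists above, the intersection contains BA-*, HR-*, MD-*, MK-01-UKIM_II and SZTAKI, which matches the summary line on the slide.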

Grid operations (1)
Convergence with EGEE-SEE
  § ops.vo.egee-see.org (seeops) core services deployed
  § monitoring switched to new core services, but not to the new VO
  § sites not supporting new seeops VO: BA-*, BG01-IPP, HR-*, MD-*, MK-01-UKIM_II, SZTAKI
Critical SAM tests reviewed
New version of BBmSAM released: https://c01.grid.etfbl.net/bbmsam/
  § Documentation draft put on the Wiki: http://wiki.egee-see.org/index.php/SG_BBmSAM
Ongoing integration of SEE-GRID and EGEE-SEE helpdesks
  § Interface with GGUS and other relevant support systems
  § http://wiki.egee-see.org/index.php/Proposed_Helpdesk_Structure
  § Support groups in support_groups.xls
Operations manual (based on the EGEE one)
  § Todor/Emanouil prepared a draft, will be circulated soon: SEEGRIDSCI-SA1-IPP-011-OperationalManual-a-2008-12-22.doc

Grid operations (2)
MPI WG
  § http://wiki.egee-see.org/index.php/SEE-GRID_MPI_User_Guide
  § http://wiki.egee-see.org/index.php/SEE-GRID_MPI_Admin_Guide
Deployment of AL, MD, ME, AM, GE sites
  § AL-01-FIT: some progress made
  § MD-01-TUM: not operational, does not pass SAM tests
  § MD-04-RENAM: not operational, does not pass SAM tests
  § ME has to deploy another site
  § AM site has to become operational, and another one to be deployed by M12
  § GE first site already deployed (this is M12 target)
Current resources:
  § 34 sites (Y1 target 44)
  § 1659 CPUs (Y1 target 993, kSI2k 1275)
  § Storage available 160 TB, 40 TB used (Y1 target 51 TB)
  § Compared to Y1 target, AL, BA, GR, MK, ME, MD, TR? and AM are not meeting CPU commitments
  § Compared to Y1 target, only BG, RO, TR, RS are meeting storage commitments
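The comparison of current resources with the Y1 targets can be expressed as a ratio per resource type. The short Python sketch below uses the aggregate figures from the slide (34 sites, 1659 CPUs, 160 TB of available storage, against targets of 44, 993 and 51 TB); the dictionary keys and function names are illustrative only, and per-country figures are not reproduced because the slide does not give them.

```python
# Aggregate figures quoted on the slide.
Y1_TARGET = {"sites": 44, "cpus": 993, "storage_tb": 51}
CURRENT = {"sites": 34, "cpus": 1659, "storage_tb": 160}


def progress(current, target):
    """Current value as a percentage of the Y1 target, per resource type."""
    return {key: 100.0 * current[key] / target[key] for key in target}


def shortfalls(current, target):
    """Resource types whose current value is still below the Y1 target."""
    return sorted(key for key in target if current[key] < target[key])


if __name__ == "__main__":
    for key, percent in progress(CURRENT, Y1_TARGET).items():
        print("%-12s %6.1f%% of Y1 target" % (key, percent))
    print("below target:", shortfalls(CURRENT, Y1_TARGET))
```

On these aggregate numbers only the site count is below its target; the CPU and storage totals already exceed theirs, even though individual countries are behind on their own commitments.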

SLA/availabilities
New SLA defined (operational metrics redefined to 80% availability requested)
New SLA to be signed by all sites and the project (not legally binding)
Conformance to SLA monitored by the SLA Enforcement Team (SET) and reported to SA1 and NA1
Members of SET
  § M. Savic (UOBL)
  § B. Jakimovski (UKIM)
  § I. Liabotis (GRNET)
As reported in SEEGRIDSCI-SA1-MK-002-SLA-Q2-b-2008-11-11.x, for Aug-Oct 2008 the average weighted availability is 87% (M12 target 75%, M24 target 81%)
Sites fully conforming to the SLA (availability > 80%) upgraded to the new status in HGSM: seegrid_certified
  § The aim is to include them into production BDII configuration of EGEE-SEE, so that reliable resources are visible from all SEE top-level BDIIs
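To make the two quantities on this slide concrete, here is a hedged Python sketch of a weighted average availability and of the "availability > 80%" conformance test. The slide does not say how sites are weighted; weighting by CPU count is an assumption made here purely for illustration, and the site names and numbers are hypothetical sample data, not project measurements.

```python
SLA_THRESHOLD = 80.0  # percent, from the new SLA


def weighted_availability(sites):
    """Weighted mean availability over {site: (availability_percent, weight)}.

    Assumption: the weight is the site's CPU count; the report does not
    specify the weighting scheme.
    """
    total_weight = sum(weight for _, weight in sites.values())
    if total_weight == 0:
        raise ValueError("total weight must be positive")
    return sum(avail * weight for avail, weight in sites.values()) / total_weight


def certified(sites, threshold=SLA_THRESHOLD):
    """Sites whose availability is strictly above the SLA threshold."""
    return sorted(name for name, (avail, _) in sites.items() if avail > threshold)


# Hypothetical sample data: (availability in %, CPU count).
SAMPLE = {
    "SITE-A": (95.0, 100),
    "SITE-B": (80.0, 50),
    "SITE-C": (60.0, 50),
}

if __name__ == "__main__":
    print(weighted_availability(SAMPLE))
    print(certified(SAMPLE))
```

In this sample only SITE-A would qualify for the `seegrid_certified` status, because the condition on the slide is strict (availability above 80%, not equal to it).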

Operational and monitoring tools improvements
HGSM: interface to GOCDB in progress, OAT work
Helpdesk: integration with EGEE-SEE; statistics; GGUS; reorganization of support groups
Wiki: reorganization; status?
Accounting Portal: further improvements needed (job success rates for metrics!)
Nagios: integration of alarms/automatic submission of tickets
WMSMON tool improved (former rbwmsmon); client should be deployed by all WMS servers
New security-related Pakiti-based tool in development by SZTAKI

WMSMON (1/3)
Computing resources discovery and management in the gLite environment is done by the WMS
Current implementation of the Grid Service Availability Monitoring framework does not include direct probes of WMS
WMSMON: newly developed gLite WMS monitoring tool
  § site-independent gLite WMS monitoring
  § centralized gLite WMS monitoring
  § uniform gLite WMS monitoring

WMSMON (2/3)
WMSMON is based on the server-client architecture
  § aggregated status view of all monitored WMS services
  § detailed status page for each WMS service
  § links to the appropriate troubleshooting guides

WMSMON (3/3) (figure)

Action points (1)
AP38: Operations manual (Ongoing, Emanouil, 15 Jun 08)
AP40: HGSM site application support (Hakan, 11 Dec 08)
AP42: HGSM interface with GOCDB (ongoing)
AP43: Helpdesk statistics (AlexS, 15 Sep 08)
AP44: Helpdesk integration with EGEE-SEE, consolidation of all support groups etc. (AlexS, 30 Jun 08)
AP46: BBmSAM portal improvements (SLA) (Mihajlo, ongoing)
AP49: Nagios – integration of all alarms; "CIC dashboard" (Ongoing, Emanouil, 30 Jun 08)
AP53: MPI WG (Antun, 30 Nov 08)
AP54: Wiki reorganization (Boro, 30 Jun 08)
AP55: Core services deployment (site admins, 30 Nov 08) - Closed
AP56: NA4 VO support on all sites (all site admins, 23 Nov 08)

Action points (2)
AP57: Deployment of all MD and AL sites (MD GIM, AL GIM, 30 Jun 08)
AP59: Migration of all gLite-3.0 services to gLite-3.1 (all site admins, 30 Nov 08): still waiting for MK-02-ETF
AP94: Core services for ops.vo.egee-see.org (Mihajlo, Luka, 15 Sep 08): BA ok, ME?
AP95: Enabling support for ops.vo.egee-see.org on all sites (all site admins, 23 Nov 08)
AP96: SAM testing migration to ops.vo.egee-see.org (Mihajlo, 30 Nov 08)
AP97: WMS monitoring tool improvements (Dusan, 30 Nov 08)
AP98: WMS monitoring supported on all WMS+LBs (site admins, 15 Dec 08)
AP99: EGEE-SEE and SEE-GRID working group providing input to EGEE COD about regional monitoring. This should include MJRA1.1 conclusions (Emanouil, 31 Oct 08, status?)

Action points (3)
AP100: New accounting solution to include MPI jobs properly (Emanouil, 31 October)
AP111: Sites to add NA4 VO commitments to HGSM (all sites, 15 Dec 08)
New AP: Statistics on successful jobs to be produced in a form suitable for the review and made available through the accounting portal (Emanouil, 15 Feb 09)
New AP: BBmSAM documentation to be finalized (Mihajlo, 15 Feb 09)
New AP: SLA conformance statistics to be produced automatically by BBmSAM in a form suitable for the review (Mihajlo, 15 Feb 09)