Tier 1 Grid Services, Ian Collier, GridPP

Tier 1 (Grid) Services, Ian Collier, GridPP Review, June 20th 2012

Past Year
• EMI Updates
  – Migration off gLite to EMI(2)
  – Formally engaged with Staged Rollout & Early Adopters process
• Virtualisation
  – (Nearly) all services on (Hyper-V) virtualised platform
  – Much easier to set up & manage than collection of bare metal
  – Quick recovery after power events notable
• CVMFS Stratum 0 for non-LHC VOs
  – Actively used now
  – Responses have been very positive
  – WeNMR latest, enthusiastic, users

Operational Issues
• Batch start rates
  – Limited
  – Have been testing alternatives to torque/maui
  – Condor & SLURM frontrunners
    • Condor looking very good
    • Have been hitting scaling limits with SLURM
  – As side effect also looking at ARC CE
  – Next step: test with half of the retiring 2007 WNs on SL6 with new Condor & ARC CE
• (cvmfs) Job timeout failures
  – Low but persistent rate (~5% varying)
  – Have been testing 2.1.x client
    • Found much worse problems
  – Investigation continuing

Coming Year
• Continue Updates
  – Starting on EMI-3
  – Further Staged Rollout & Early adoption
  – Complete SL6 migrations
• Virtualisation
  – Shared storage just coming on-line
    • Investigations to make full use of that
    • Replication between buildings, etc.
• Distribute services
  – Between R89 & Atlas ‘outpost’ as it develops
  – i.e. BDIIs, FTSs, CEs, etc., spread between 2 buildings
• CVMFS Stratum 0
  – Erasmus project to build web interface for SW upload
  – Negotiating for sites to replicate
    • Reference architecture may be different from WLCG
    • EGI have picked coordinating network of repositories & replicas
    • Nikhef & OSG, maybe CERN
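
As an illustration of the "Distribute services" item, here is a minimal Python sketch of spreading service instances across two buildings in round-robin fashion. It is not the Tier 1's actual tooling: the function names, the building labels used as dictionary keys, and the instance counts are all assumptions made for this example.

```python
from collections import defaultdict
from itertools import cycle

# Labels taken from the slide; their use as dictionary keys is an assumption.
BUILDINGS = ["R89", "Atlas"]


def spread_instances(instances_per_service, buildings=BUILDINGS):
    """Assign each service's instances to buildings round-robin.

    Returns a dict mapping building name to a list of instance names.
    """
    placement = defaultdict(list)
    for service, count in instances_per_service.items():
        sites = cycle(buildings)  # restart at the first building per service
        for i in range(1, count + 1):
            placement[next(sites)].append(f"{service}-{i}")
    return dict(placement)


def services_without_redundancy(instances_per_service, buildings=BUILDINGS):
    """Services with too few instances to be present in every building."""
    return sorted(
        service
        for service, count in instances_per_service.items()
        if count < len(buildings)
    )


if __name__ == "__main__":
    # Instance counts are invented purely for illustration.
    demo = {"BDII": 2, "FTS": 2, "CE": 4}
    for building, hosts in spread_instances(demo).items():
        print(building, hosts)
    print("No cross-building redundancy:", services_without_redundancy(demo))
```

Because each service restarts at the first building, services with an odd number of instances place the extra one in the first-listed building; a real placement would also have to weigh capacity and network constraints.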

Configuration Management
• Quattor working well
  – Although we benefit from QWG, we could do so more
  – Made some ‘expedient’ choices early on, ready to revisit now
• Quattor community more active recently
  – No longer held back by backward compatibility for CERN
• Migration to Aquilon
  – Opportunity to refactor
  – Will allow more automation
  – Will improve workflows
• Of course track other activities & developments

Cloud
• SCD Cloud
  – Concept well proven
  – ~300 cores, 90-95% use
  – Adding half of 2007 WNs
  – Member of staff (not rotating graduate) in plan
• Storage
  – Have small Ceph cluster to deploy
    • Image store
    • Object (S3) store - service
• Active use cases:
  – Internal (Tier 1 & SCT) development & testbeds
    • High level of user trust
• Developing use cases
  – Other users in STFC (ISIS, RAL Space)
  – EGI, GridPP & WLCG Cloud work

Looking to Future
Starting to think about:
• Post GridPP4
• Cloud is great for ‘disposable’ resources
  – What would it take for us to consider it to be solid enough for services now on Hyper-V?
  – What about layer (& interface) in batch farm?